Encryption support #9

Open
ElDavoo opened this issue Feb 21, 2022 · 37 comments
@ElDavoo
Owner

ElDavoo commented Feb 21, 2022

I really don't have the time and the need to do that, but now that the file formats are documented and almost 100% parsed, it is now definitely possible to create an encrypter. If someone else wants to do that I will happily accept a PR.

@ElDavoo ElDavoo added the enhancement New feature or request label Feb 21, 2022
@georg-lam

I had a look at the encryption for crypt14. The encrypted database seems to consist of a header (about 190 bytes), the actual encrypted SQLite database and a footer (of 32 bytes). How can I determine/calculate these trailing bytes? I read that for crypt12 it is 20 bytes (md5 hash and the last two digits of your account).

@ElDavoo
Owner Author

ElDavoo commented Dec 10, 2022

There is no footer in .crypt14-15 files. (edit: false, see below)
Everything is documented in the code, it just kind of needs to be executed in the reverse order

@georg-lam

georg-lam commented Dec 11, 2022

You are absolutely right. I investigated this further and found that the footer came from the compression (probably different settings for zlib). The footer is irrelevant.

@ElDavoo
Owner Author

ElDavoo commented Dec 11, 2022

Overview of what needs to be done:

  • Build the protobuf header
  • Get its length and write it at the beginning of the file
  • Write the DB type (if I remember correctly, it's 0x08 or 0x01)
  • Write the header and the compressed and encrypted stream
  • Extensive tests!

I'm going by memory so I might be wrong. The code is the truth.
No idea if it's better to make a separate python script or to embed everything into the main file.

@georg-lam

Hi ElDavoo,

I had some time to investigate this a bit further. It seems that the last 32 bytes of the crypt14 file are not needed for the decryption/extraction. Just shrink your crypt14 file by 32 bytes and then decrypt it with your algorithm. You will end up with the same sqlite database.

Another way of looking at it is the following. Compare the following:

  • Take the crypt14 database and (only) decrypt it -> then you have a compressed database
  • Take the sqlite database and compress it with "zlib.compress(, level=1)"
    The first (compressed) database is 32 bytes larger than the second; otherwise they are identical.

Here is some code that you can use to check this. Of course you need to first identify the offset and the IV (here 191 and 67). You could use your script with the parameter --no-protobuf.

from Cryptodome.Cipher import AES
import zlib

with open("key", "rb") as file_handler:
    key_data = file_handler.read()
key = key_data[126:]

with open("msgstore.db.crypt14", "rb") as file_handler:
    encrypted_database_data = file_handler.read()

iv = encrypted_database_data[67:83]
header = encrypted_database_data[:191]
encrypted_data = encrypted_database_data[191:]

decrypt_cipher = AES.new(key, AES.MODE_GCM, iv)

decrypted_filepath = "msgstore.db"
with open(decrypted_filepath, "wb") as file_handler:
    decrypted_data = decrypt_cipher.decrypt(encrypted_data)
    database_data = zlib.decompress(decrypted_data)
    file_handler.write(database_data)

# now we do the reverse
encrypt_cipher = AES.new(key, AES.MODE_GCM, iv)

with open("re-encrypted_msgstore.db.crypt14", "wb") as file_handler:
    compressed_data = zlib.compress(database_data, level=1)
    re_encrypted_database_data = header + encrypt_cipher.encrypt(compressed_data)
    file_handler.write(re_encrypted_database_data)


# Compare the freshly compressed stream with the decrypted one, byte by byte.
for index in range(len(compressed_data)):
    if compressed_data[index] != decrypted_data[index]:
        print("The two byte arrays differ.")  # You will not see this message: they are equal (up to the last 32 bytes)
        break

# The decrypted data is 32 bytes longer than the zlib-compressed data.
print(len(decrypted_data) - len(compressed_data))
print(decrypted_data[-32:])

Note, if you use re-encrypted_msgstore.db.crypt14 in WhatsApp to restore your database then it will fail. Those 32 bytes seem to make the difference.

Here is a guide that I follow in order to restore from a local backup. https://www.quora.com/If-I-have-both-a-Google-Drive-and-a-local-backup-of-WhatsApp-chats-how-do-I-ensure-WhatsApp-picks-up-the-local-version-to-restore-when-I-reinstall

This means it makes no sense to implement the encryption unless we know the definition of those last 32 bytes. I tried various hashes with no luck.

@ElDavoo
Owner Author

ElDavoo commented Dec 15, 2022

Interesting findings, thanks.
So in other words, in the decompressed data there are 32 bytes of "something". Did I get it right?
This must also mean that the programs I used to open the SQLite database do not detect this "padding" (and this also means the decrypt script should cut off these bytes if they are not needed).
I will try to take a look at this this weekend. Again, thanks for your time. Yeah that must be some sort of checksum.

@georg-lam

Yes, this is how I see the situation.

The last 32 bytes of the crypt14 file are probably either a checksum or an encrypted checksum. But those bytes are not needed to unpack the SQLite database. It would be great if you could confirm this. I only analysed one database.

@georg-lam

I also looked at zstandard (a compression algorithm from Meta, separate from zlib) to see whether it is a better fit than zlib with level 1 compression... no luck here.

Also I found/confirmed that the bytes 33 to 36 counted from the end of the decrypted file coincide with the adler32 checksum....another indication that zlib ends there... and the last 32 bytes are something else..

@ElDavoo
Owner Author

ElDavoo commented Dec 18, 2022

The last 16 bytes of the backup are the MD5 of the encrypted file, including the header. In crypt12 files, the user's JID was also taken into account (re-added at the end as a suffix in the clear (the last four bytes)).

@ElDavoo
Owner Author

ElDavoo commented Dec 18, 2022

To complicate things further, it seems the checksum is not added to all backup files (stickers and wallpapers), but the script has no knowledge of which file it is decrypting (WhatsApp knows it thanks to the file name), so I won't make a failed checksum check an error and will add an explanation.

@ElDavoo
Owner Author

ElDavoo commented Dec 18, 2022

Ok, now the scripts check the MD5 at the end of the file. You should now be able to generate a file that both the script and the official app mark as correct by just appending the checksum of everything you have written. While your tests found out you can delete up to 32 bytes, I'm pretty sure there is no further footer information - just the MD5.
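
The MD5 footer described here can be sketched with the standard library alone. This is only an illustration, not the code from the repository: the function names `append_md5` and `md5_footer_ok` are made up for this example. The check returns False instead of raising, in line with the decision above not to treat a missing checksum as an error.

```python
import hashlib

MD5_LEN = 16  # size of an MD5 digest in bytes


def append_md5(body: bytes) -> bytes:
    """Return body followed by the MD5 digest of body (hypothetical helper)."""
    return body + hashlib.md5(body).digest()


def md5_footer_ok(file_data: bytes) -> bool:
    """Check whether the last 16 bytes are the MD5 of everything before them."""
    if len(file_data) < MD5_LEN:
        return False
    body, footer = file_data[:-MD5_LEN], file_data[-MD5_LEN:]
    return hashlib.md5(body).digest() == footer
```

Here `body` stands for everything written to the file so far, header included.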

@georg-lam

Thank you so much for all the changes. I only found the time to read your changes. I am not sure yet whether and when I can test the encryption.

@ElDavoo
Owner Author

ElDavoo commented Dec 29, 2022

I only found the time to read your changes.

No worries, we took months to even notice there was a checksum at the end

@georg-lam

I gave it a shot this morning with the re-encryption. I used the md5 sum of the header and the encrypted sqlite database as the footer (like you found out). Unfortunately, the restore process in WhatsApp with this re-encrypted file failed. I noticed however, that I got a little further than before. When you use a file without a footer, the restore process stops right at the beginning (at 0%). Now, with this new md5 footer the restore process starts, but then stops at 24%.
This suggests that it decrypts the backup database and then runs another check. Hopefully it is a checksum of the decrypted SQL database.

@ElDavoo
Owner Author

ElDavoo commented Dec 30, 2022

24%? strange

@georg-lam

You see a progress bar. It is not my estimate.

And then I used the official backup to restore. It ran from 0 to 24%, paused a little and then jumped immediately to 100%.

@ElDavoo
Owner Author

ElDavoo commented Dec 30, 2022

only way we can understand what's going on is seeing the application logs/logcat

@georg-lam

My phone is not rooted so I need to see how I can get access to the logs.

A pro could potentially also decompile the app itself.

I will probably compare some backup files by looking at bytes "-32 to -17". Maybe I can see something there. And also try some hashes... Unfortunately, I have to go now. Maybe I will find some time tonight.

@georg-lam

I had no luck so far with my analysis of the "missing" bytes, i.e. decrypted_data[-32:-16].

However, I again analysed decrypted_data[-36:-32]. And here I confirmed for 16 different crypt14 database files that this is the adler32 checksum. You can see that by comparing int.from_bytes(decrypted_data[-36:-32], "big") with zlib.adler32(decompressed_data).
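
The adler32 observation can be reproduced without a real backup. A minimal sketch using only Python's `zlib` module; `sample` is made-up stand-in data, not an actual database:

```python
import zlib

# Stand-in for the decompressed SQLite database (made-up data, for illustration only).
sample = b"SQLite format 3\x00" + bytes(range(256)) * 64

# A zlib stream ends with the Adler-32 checksum of the *uncompressed* data,
# stored as a 4-byte big-endian integer (RFC 1950).
stream = zlib.compress(sample, level=1)
trailer = int.from_bytes(stream[-4:], "big")

print(trailer == zlib.adler32(sample))
```

In the crypt14 case the same four bytes sit at `decrypted_data[-36:-32]` because 32 further bytes follow the end of the zlib stream.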

@georg-lam

This issue was annoying me so much and I could not let go. Finally, I found the solution. Only just by chance.

The crypt14 file consists of a header (as you know it), the encrypted compressed database, an authentication tag and finally the md5 checksum of all bytes before. Here, the tag is 16 bytes long.

Or put differently use
ciphertext, tag = cipher.encrypt_and_digest(plaintext)
and
plaintext = cipher.decrypt_and_verify(ciphertext, tag)
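
The layout described above can be sketched as a small splitter. This assumes the header length is already known for the file at hand (191 in the earlier example); the names `Crypt14Parts` and `split_crypt14` are hypothetical and not part of the project.

```python
from typing import NamedTuple

TAG_LEN = 16  # GCM authentication tag length reported above
MD5_LEN = 16  # MD5 digest length


class Crypt14Parts(NamedTuple):
    header: bytes
    ciphertext: bytes
    tag: bytes
    md5: bytes


def split_crypt14(file_data: bytes, header_len: int) -> Crypt14Parts:
    """Split a crypt14 file into header, ciphertext, GCM tag and MD5 footer."""
    if len(file_data) < header_len + TAG_LEN + MD5_LEN:
        raise ValueError("file too short for the assumed layout")
    header = file_data[:header_len]
    md5 = file_data[-MD5_LEN:]
    tag = file_data[-(TAG_LEN + MD5_LEN):-MD5_LEN]
    ciphertext = file_data[header_len:-(TAG_LEN + MD5_LEN)]
    return Crypt14Parts(header, ciphertext, tag, md5)
```

The resulting `ciphertext` and `tag` are what `cipher.decrypt_and_verify(ciphertext, tag)` expects.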

@ElDavoo
Owner Author

ElDavoo commented Jan 5, 2023

Excellent work! I will read up on what an auth tag is tomorrow (since I do not know), then I will implement the verification in the script (which should be easy to do). So you are able to perfectly re-encrypt now?

@georg-lam

Yes.

I somehow came across the following post which triggered my eureka moment.
https://stackoverflow.com/questions/67028762/why-aes-256-with-gcm-adds-16-bytes-to-the-ciphertext-size

@ElDavoo
Owner Author

ElDavoo commented Jan 6, 2023

The decrypt script is very large, more than 800 lines. Moreover, as you can guess, the encrypt script would share lots of code with the decrypt script. So I'm afraid that I need to let go of the "one autonomous python file that can do everything on its own" and split the project into multiple files.

ElDavoo added a commit that referenced this issue Jan 6, 2023
@ElDavoo
Owner Author

ElDavoo commented Jan 6, 2023

@georg-lam btw, did you know that you can just feed an unencrypted backup to whatsapp and it should restore it?

@georg-lam

I tried that but it did not work for me. Do you have any instructions for it?

At the moment I am doing it the following way:
(1) Reinstall the app and stop when you are asked for your telephone number.
(2) Remove your Google Account in settings
(3) Give Whatsapp all the permissions
(4) Copy the msgstore.db.crypt14 to WhatsApp\Databases. There I will only keep this file and no other backup.
(5) Go back to the app and finish up the installation/restore process.
(6) Re-add the Google Account.

@ElDavoo
Owner Author

ElDavoo commented Jan 6, 2023

No. You should just copy the msgstore.db file in the database folder, it should just work. If it doesn't, myth busted.

@ElDavoo ElDavoo pinned this issue Jan 6, 2023
@ElDavoo
Owner Author

ElDavoo commented Jan 6, 2023

First experimental encryptor pushed into the encryption branch. I could recreate an identical msgstore.db.crypt15 file. However, there is a minor problem and a major one:
Minor) The zlib compressor does not compress in the same way as the original, giving two different zlib streams;
Major) In the header, there is a "feature table" of the features supported by your backup (It's a bitfield). It would be very complicated to figure out which features are supported. Moreover, the crypt14 header has some fields (googleID salt and server salt) the user has no way to know in advance.
For these reasons, I think that it's impossible to encrypt a file from scratch; I mean it's doable, but it would be different from the original and not guaranteed to work. The only way would be to have a "reference" encrypted file and to copy the header from there (as the code you posted actually does)

@georg-lam

For the zlib stream you could play around with the compression level and the wbits parameter. For the crypt14 files level = 1 was the correct setting.
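
One way to "play around" with these parameters is a brute-force search against a reference stream. A rough sketch, assuming Python's `zlib` module; the function name `find_zlib_settings` is made up, and memLevel and strategy are left at their defaults. A stream written by a different zlib version or implementation may not be reproducible with any setting.

```python
import zlib
from typing import Optional, Tuple


def find_zlib_settings(raw: bytes, reference_stream: bytes) -> Optional[Tuple[int, int]]:
    """Return the first (level, wbits) pair that reproduces reference_stream, or None."""
    for level in range(0, 10):
        for wbits in range(9, 16):  # zlib-wrapped streams, window sizes 2**9 .. 2**15
            compressor = zlib.compressobj(level, zlib.DEFLATED, wbits)
            candidate = compressor.compress(raw) + compressor.flush()
            if candidate == reference_stream:
                return level, wbits
    return None
```

Here `raw` would be the decompressed database and `reference_stream` the decrypted zlib stream with the trailing bytes after the adler32 checksum cut off.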

Yes, I agree you need a reference encrypted file. Otherwise it is too complicated.

@courious875

Yes.
I somehow came across the following post which triggered my eureka moment.

@georg-lam Congrats, nice work from both of you. If I'm not mistaken, you managed to perfectly re-encrypt a crypt14 file, correct?

@georg-lam

Yes, bit for bit.

@courious875

@georg-lam What code have you used? The encrypt.py in the encryption branch does not seem to have crypt14 support.

@georg-lam

As pointed out above you can either construct the header from scratch (more difficult) or use an existing one. So do the following.
(1) Identify how long the header is for your msgstore.db.crypt14. E.g. 191 bytes or so.
(2) Then compress (with zlib, level=1) and encrypt (e.g. ciphertext, tag = cipher.encrypt_and_digest(database)) the database.
(3) Then combine header, ciphertext, tag to one file.
(4) Calculate the md5 sum and append this sum to this file.
That's it.
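
The four steps can be sketched as a single function. This is only an illustration under assumptions: `build_crypt14` is a hypothetical name, and the encryption step is passed in as a callable so that the sketch itself needs nothing beyond the standard library.

```python
import hashlib
import zlib
from typing import Callable, Tuple


def build_crypt14(
    header: bytes,
    database: bytes,
    encrypt_and_digest: Callable[[bytes], Tuple[bytes, bytes]],
) -> bytes:
    """Assemble header + ciphertext + tag + MD5, following steps (1) to (4)."""
    # Step 2: compress with zlib level 1, then encrypt and obtain the tag.
    compressed = zlib.compress(database, level=1)
    ciphertext, tag = encrypt_and_digest(compressed)
    # Step 3: header (copied from a reference file), ciphertext and tag in one file.
    body = header + ciphertext + tag
    # Step 4: append the MD5 of everything written so far.
    return body + hashlib.md5(body).digest()
```

With PyCryptodome, the callable would be `cipher.encrypt_and_digest` for a cipher created as `AES.new(key, AES.MODE_GCM, iv)`, with the header, key and IV taken from the reference file as in the script posted earlier in this thread.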

@ElDavoo
Owner Author

ElDavoo commented Jan 17, 2023

@courious875 support for crypt12 encryption implemented in da3bca7.
Very ugly code, but it works. command line:
python ./encrypt.py key msgstore.db msgstore-reencrypted.db.crypt12 --reference msgstore.db.crypt12 --type 12
Don't use --msgstore with crypt12 files.


@NelisMk7 NelisMk7 linked a pull request Apr 21, 2023 that will close this issue
@ElDavoo
Owner Author

ElDavoo commented Aug 26, 2023

I didn't manage to create an identical zlib stream. However, WhatsApp accepted my modified crypt15 without complaining :)
The encryptor should be working at this point, but it's 99% untested territory: What happens if we don't give a feature list? Does it work with non-msgstore files?


@ElDavoo
Owner Author

ElDavoo commented Apr 3, 2024

Encryption support officially pushed, in "beta". Might still keep this issue open until it's final, but unpinning it.

@ElDavoo ElDavoo unpinned this issue Apr 3, 2024