
Issue while compiling signalbackup-tools under Fedora Live 30 (Hardware) #4

Closed
elbrutalo opened this issue Sep 4, 2019 · 79 comments

@elbrutalo

Hello, when compiling under Fedora Live 30 I'm getting the following error:


LINKING: /usr/bin/g++ -Wall -Wextra -Wl,-z,now -O3 -s -flto=4 "cryptbase/o/cryptbase.o" "cryptbase/o/getbackupkey.o" "cryptbase/o/getcipherandmac.o" "endframe/o/statics.o" "filedecryptor/o/getframe.o" "filedecryptor/o/initbackupframe.o" "filedecryptor/o/getattachment.o" "filedecryptor/o/filedecryptor.o" "arg/o/arg.o" "sharedprefframe/o/statics.o" "avatarframe/o/statics.o" "attachmentframe/o/statics.o" "fileencryptor/o/encryptattachment.o" "fileencryptor/o/encryptframe.o" "fileencryptor/o/fileencryptor.o" "sqlstatementframe/o/statics.o" "sqlstatementframe/o/buildstatement.o" "signalbackup/o/importthread.o" "signalbackup/o/buildsqlstatementframe.o" "signalbackup/o/croptothread.o" "signalbackup/o/exportbackup.o" "signalbackup/o/updatethreadsentries.o" "signalbackup/o/signalbackup.o" "o/main.o" "backupframe/o/init.o" "sqlitedb/o/sqlitedb.o" "databaseversionframe/o/statics.o" "headerframe/o/statics.o" "stickerframe/o/statics.o" -lcryptopp -lsqlite3 -o "signalbackup-tools"
lto-wrapper: schwerwiegender Fehler: execvp: No such file or directory
Kompilierung beendet.
/usr/bin/ld: error: lto-wrapper failed
collect2: Fehler: ld gab 1 als Ende-Status zurück

I did the following steps on a Fedora30 on hardware:

$ git clone https://github.com/bepaald/signalbackup-tools.git
$ sudo dnf install gcc-g++ cryptopp-devel sqlite-devel
$ cd signalbackup-tools && chmod +x BUILDSCRIPT.sh
$ sh BUILDSCRIPT.sh

Can anyone help? My linux skills are limited.

@bepaald
Owner

bepaald commented Sep 4, 2019

Thanks. Is this running in a VM? I can reproduce this in a VM, and I just pushed a fix for that. Could you try the same commands again and let me know?

If you are not running in a virtual machine, I don't know what's happening, but you might be able to fix the build by just changing line 7 of the script to NUMCPU=1.
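For example, a minimal way to make that change from the terminal (assuming line 7 of BUILDSCRIPT.sh currently reads NUMCPU=$(nproc); editing the file in any text editor works just as well):

$ sed -i '7s/.*/NUMCPU=1/' BUILDSCRIPT.sh
$ grep -n '^NUMCPU' BUILDSCRIPT.sh    # should now show: 7:NUMCPU=1
$ sh BUILDSCRIPT.sh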

@elbrutalo
Author

elbrutalo commented Sep 4, 2019

Thank you for your quick reply. Fedora Live is not running in a VM.
I didn't install it to the hard drive, but booted it from a USB stick.

Unfortunately, the error also occurs with the fix. The adjustment of the code in line 7 leads to the same error message.

Can I create a more detailed error log to better narrow down the error?

Error with modified line 7:

LINKING: /usr/bin/g++ -Wall -Wextra -Wl,-z,now -O3 -s -flto=4 "cryptbase/o/cryptbase.o" "cryptbase/o/getbackupkey.o" "cryptbase/o/getcipherandmac.o" "endframe/o/statics.o" "filedecryptor/o/getframe.o" "filedecryptor/o/initbackupframe.o" "filedecryptor/o/getattachment.o" "filedecryptor/o/filedecryptor.o" "arg/o/arg.o" "sharedprefframe/o/statics.o" "avatarframe/o/statics.o" "attachmentframe/o/statics.o" "fileencryptor/o/encryptattachment.o" "fileencryptor/o/encryptframe.o" "fileencryptor/o/fileencryptor.o" "sqlstatementframe/o/statics.o" "sqlstatementframe/o/buildstatement.o" "signalbackup/o/importthread.o" "signalbackup/o/buildsqlstatementframe.o" "signalbackup/o/croptothread.o" "signalbackup/o/exportbackup.o" "signalbackup/o/updatethreadsentries.o" "signalbackup/o/signalbackup.o" "o/main.o" "backupframe/o/init.o" "sqlitedb/o/sqlitedb.o" "databaseversionframe/o/statics.o" "headerframe/o/statics.o" "stickerframe/o/statics.o" -lcryptopp -lsqlite3 -o "signalbackup-tools"
lto-wrapper: schwerwiegender Fehler: execvp: No such file or directory
Kompilierung beendet.
/usr/bin/ld: error: lto-wrapper failed
collect2: Fehler: ld gab 1 als Ende-Status zurück
[liveuser@localhost-live signalbackup-tools]$

Error with line 7 unmodified ( NUMCPU=$(nproc) ):

LINKING: /usr/bin/g++ -Wall -Wextra -Wl,-z,now -O3 -s -flto=4 "cryptbase/o/cryptbase.o" "cryptbase/o/getbackupkey.o" "cryptbase/o/getcipherandmac.o" "endframe/o/statics.o" "filedecryptor/o/getframe.o" "filedecryptor/o/initbackupframe.o" "filedecryptor/o/getattachment.o" "filedecryptor/o/filedecryptor.o" "arg/o/arg.o" "sharedprefframe/o/statics.o" "avatarframe/o/statics.o" "attachmentframe/o/statics.o" "fileencryptor/o/encryptattachment.o" "fileencryptor/o/encryptframe.o" "fileencryptor/o/fileencryptor.o" "sqlstatementframe/o/statics.o" "sqlstatementframe/o/buildstatement.o" "signalbackup/o/importthread.o" "signalbackup/o/buildsqlstatementframe.o" "signalbackup/o/croptothread.o" "signalbackup/o/exportbackup.o" "signalbackup/o/updatethreadsentries.o" "signalbackup/o/signalbackup.o" "o/main.o" "backupframe/o/init.o" "sqlitedb/o/sqlitedb.o" "databaseversionframe/o/statics.o" "headerframe/o/statics.o" "stickerframe/o/statics.o" -lcryptopp -lsqlite3 -o "signalbackup-tools"
lto-wrapper: schwerwiegender Fehler: execvp: No such file or directory
Kompilierung beendet.
/usr/bin/ld: error: lto-wrapper failed
collect2: Fehler: ld gab 1 als Ende-Status zurück
[liveuser@localhost-live signalbackup-tools]$

@bepaald
Owner

bepaald commented Sep 4, 2019

thank you for your quick reply. Fedora Live does not run in a VM.
I didn't install it to the Harddrive but booting it from a USB Stick.

Hm, that should work just fine; I just did the exact same thing here with an F30 Live USB, so it should work the same for you.

I think the problem is with the -flto=4 part of the command, but if you set NUMCPU=1, that should also change to -flto=1. Are you sure you changed that line and saved the script? (And of course don't update from git after manually editing the script, as that would undo the changes.)

If it still fails, maybe you could try not running the script at all and just run:

g++ -std=c++2a -Wall -Wextra -Wshadow -Wold-style-cast -Woverloaded-virtual -pedantic -fomit-frame-pointer -O3 -march=native -flto -s -o signalbackup-tools */*.cc *.cc -lcryptopp -lsqlite3

or, the same with the -flto removed:

g++ -std=c++2a -Wall -Wextra -Wshadow -Wold-style-cast -Woverloaded-virtual -pedantic -fomit-frame-pointer -O3 -march=native -s -o signalbackup-tools */*.cc *.cc -lcryptopp -lsqlite3

@elbrutalo
Author

Thank you, that worked (compiling), but now the signalbackup-tools command doesn't start:

LINKING: /usr/bin/g++ -Wall -Wextra -Wl,-z,now -O3 -s -flto=1 "cryptbase/o/cryptbase.o" "cryptbase/o/getbackupkey.o" "cryptbase/o/getcipherandmac.o" "endframe/o/statics.o" "filedecryptor/o/getframe.o" "filedecryptor/o/initbackupframe.o" "filedecryptor/o/getattachment.o" "filedecryptor/o/filedecryptor.o" "arg/o/arg.o" "sharedprefframe/o/statics.o" "avatarframe/o/statics.o" "attachmentframe/o/statics.o" "fileencryptor/o/encryptattachment.o" "fileencryptor/o/encryptframe.o" "fileencryptor/o/fileencryptor.o" "sqlstatementframe/o/statics.o" "sqlstatementframe/o/buildstatement.o" "signalbackup/o/importthread.o" "signalbackup/o/buildsqlstatementframe.o" "signalbackup/o/croptothread.o" "signalbackup/o/exportbackup.o" "signalbackup/o/updatethreadsentries.o" "signalbackup/o/signalbackup.o" "o/main.o" "backupframe/o/init.o" "sqlitedb/o/sqlitedb.o" "databaseversionframe/o/statics.o" "headerframe/o/statics.o" "stickerframe/o/statics.o" -lcryptopp -lsqlite3 -o "signalbackup-tools"
[liveuser@localhost-live signalbackup-tools]$ signalbackups-tools ../run/media/liveuser/UNTITLED/signal-2019-07-30-10-56-13.backup 55944 05382 40148 73738 38418 61176 --output ../run/media/liveuser/UNTITLED/signal-2019-09-04-10-56-13.backup --opassword 55944 05382 40148 73738 38418 61176
bash: signalbackups-tools: Befehl nicht gefunden...

@bepaald
Owner

bepaald commented Sep 4, 2019

Good!

To run executables in Linux they need to be in your PATH (which this one is not), or you need to specify the full location. Long story short, use ./signalbackup-tools (note the ./). Also, the passphrase needs to be one string: you can only put spaces in it if you quote it (i.e. "55944 05382 40148 73738 38418 61176"), or use . or - instead, or just write all the numbers without anything in between.
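So, with your paths from above and the passphrase quoted as one string, the full call would look roughly like this (just a sketch of the syntax; I'm assuming the stick is still mounted at /run/media/liveuser/UNTITLED):

$ cd signalbackup-tools
$ ./signalbackup-tools /run/media/liveuser/UNTITLED/signal-2019-07-30-10-56-13.backup "55944 05382 40148 73738 38418 61176" --output /run/media/liveuser/UNTITLED/signal-2019-09-04-10-56-13.backup --opassword "55944 05382 40148 73738 38418 61176"

or, equivalently, with the passphrase written as one unbroken string of digits: 559440538240148737383841861176.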

@elbrutalo
Author

I'm sorry for asking such dumb questions, but I have been trying to solve this for 30 minutes now and it still won't start the executable:
[liveuser@localhost-live signalbackup-tools]$ ./signalbackups-tools ../run/media/liveuser/UNTITLED/signal-2019-07-30-10-56-13.backup "55944 05382 40148 73738 38418 61176" --output ../run/media/liveuser/UNTITLED/signal-2019-09-04-10-56-13.backup --opassword "55944 05382 40148 73738 38418 61176"
bash: ./signalbackups-tools: No such file or directory
[liveuser@localhost-live signalbackup-tools]$
Did I get you wrong? Thank you so much for helping.

@bepaald
Owner

bepaald commented Sep 4, 2019

Nope, I don't see anything wrong. If the build succeeded you should have the executable in the directory, can you see it when you type ls (lists the contents of the current directory)?

@elbrutalo
Author

elbrutalo commented Sep 4, 2019

I think it is there: signalbackup-tools and BUILDSCRIPT.sh are the only green entries among the otherwise blue and white ones:

[liveuser@localhost-live signalbackup-tools]$ ls arg BUILDWINDOWS.sh framewithattachment signalbackup attachmentframe common_be.h headerframe signalbackup-tools avatarframe cryptbase LICENSE sqlitedb backupframe databaseversionframe main.cc sqlstatementframe base64 endframe o stickerframe basedecryptor filedecryptor README.md BUILDSCRIPT.sh fileencryptor sharedprefframe

Since the Signal backup file is 4-5 GB I can't put it inside the signalbackup-tools folder (that's why I have to put the path to the volume/backup file in the command), but I can't execute the executable right now...

@bepaald
Owner

bepaald commented Sep 4, 2019

Hm, I'm not sure what's going on then, as far as I can tell it should just work. I've made a little video of the process, right from the start of booting the Live image (it hangs for a bit while installing the packages, so it's slightly long, but maybe check it out, see where the difference is): https://send.firefox.com/download/e9706d671e830f1f/#tGohrdcjyzc0ezuc4i7lnw

It ends with an error btw, but only because I don't supply any arguments to the program; this is expected.

@elbrutalo
Author

Thank you so much! It's working now. The tool is processing my corrupted backup to a new backup file.

Unfortunately, it does not seem to correct the error. When trying to import the new backup file in Signal, the import process stops at 67101 messages.

Is there anything in the syntax that can fix my backup? The terminal log of signalbackup-tools was this:

Reading backup file...
FRAME 66756 (099.2%)... STOPPING BEFORE END OF ATTACHMENT!!!
done!
Exporting backup to '/run/media/liveuser/signal/signal-neu.backup'
Writing HeaderFrame...
Writing DatabaseVersionFrame...
Writing SqlStatementFrame(s)...
Dealing with table 'sms'... 59436/59436 entries...done
Dealing with table 'mms'... 2410/2410 entries...done
Dealing with table 'part'... 70/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 345/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 562/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 583/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1504/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1505/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1506/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1507/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1508/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1509/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1510/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1511/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1512/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1513/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1514/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1872/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 2439/2439 entries...Warning: attachment data not found
done
Dealing with table 'thread'... 0/0 entries...
Dealing with table 'identities'... 0/0 entries...
Dealing with table 'drafts'... 0/0 entries...
Dealing with table 'push'... 0/0 entries...
Dealing with table 'groups'... 0/0 entries...
Dealing with table 'recipient_preferences'... 0/0 entries...
Dealing with table 'group_receipts'... 0/0 entries...
Dealing with table 'job_spec'... 0/0 entries...
Dealing with table 'constraint_spec'... 0/0 entries...
Dealing with table 'dependency_spec'... 0/0 entries...
Dealing with table 'sticker'... 0/0 entries...
Writing SharedPrefFrame(s)...
Writing EndFrame...
Error: EndFrame not found

@bepaald
Owner

bepaald commented Sep 5, 2019

Hm.. it looks very much like your backup file is not corrupted, but simply incomplete! That is quite a big problem, since the end of the backup file contains important data for the backup. (There is a small possibility of corruption though: if the size of that last attachment is incorrect, it might try to read past the end of the file.)

I need to think about a way to verify what's going on and how to fix it. Obviously, the missing data is simply gone, but I might be able to at least get the data that is still there imported. I'm assuming this is somewhat important to you because it is going to be a somewhat complicated procedure (if it works at all), so be prepared for some complicated instructions. Also it will probably take a while for me to think up possible solutions.

@elbrutalo
Author

Thank you so much for your help! Yes, for personal reasons it is very important to me to restore as many conversations as possible from this backup, so I will gladly accept whatever effort it takes. If you find a way to restore the conversations up to the EOF I would be happy to pay you for the time you invested, I appreciate the effort!

I can't tell whether the backup was aborted prematurely or whether an attachment is corrupt. I think the backup process ran smoothly and there was enough space on the device.

@bepaald
Owner

bepaald commented Sep 5, 2019

I will be working on this, I have some ideas, but first:

I can't find it in this thread (maybe you deleted it?), but in my mailbox I have a message from you saying you copied the backup file to a FAT32-formatted USB stick? Is this true and still the case? Earlier you also said the backup file was 4-5 GB?

Files on a FAT32 filesystem have a maximum size of exactly 4 GB; if you copy a larger file onto it, it will be truncated to 4 GB. Can you check the file sizes? Do you still have the original? I suggest formatting the USB storage to a more modern filesystem (NTFS should work out of the box on both Linux and Windows).
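To compare the exact sizes in bytes (rather than the rounded values a file manager shows), something like this should work (the paths are just placeholders):

$ stat -c %s /run/media/liveuser/UNTITLED/signal-2019-07-30-10-56-13.backup
$ stat -c %s /path/to/original/signal-2019-07-30-10-56-13.backup

(On macOS the equivalent would be stat -f %z <file>.) If the copy came out at exactly 4294967295 bytes, that would point to FAT32 truncation.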

@elbrutalo
Author

Dear bepaald, thank you for your input. It's correct that I copied the backup file to a FAT32-formatted USB drive to access it from within Fedora Live. I checked the file size, and it is identical to the original on my hard disk (4,039,229,440 bytes).

Since there was no error message when copying to the USB stick, this seems to still be within the FAT32 limit.

I am almost 100% sure that I transferred the Signal backup directly from the phone to the computer via SmartSwitch, so I don't know why and when the file was cropped.

@bepaald
Owner

bepaald commented Sep 5, 2019

Ok, in your message on the Signal github you mentioned 4,7GB (signalapp/Signal-Android#7637 (comment)), so I figured that was way too big for FAT32 (maybe you meant 3,7GB, that's about 4039229440 bytes?).

Anyway, if you don't have any larger versions of the backup, it doesn't matter how it was cropped; I'll get working on a way to fix it. I've done some investigating and I have an idea to get the messages imported (it may take a little time though). Also, from refreshing my memory by looking at the code, I'm pretty much certain there was no corruption but truncation (corruption would have resulted in a bad MAC before reaching EOF).

@elbrutalo
Author

I had 4.7 GB in memory, but the original backup file still exists and has 4,039,229,440 bytes (4.04 GB under macOS). I guess I remembered it wrong, because I copied the backup file directly from the smartphone to the computer without any detours (via FAT32).

If the file is cropped, not much can be missing and my hope would be that at least most of the conversations could still be recovered.

@bepaald
Owner

bepaald commented Sep 8, 2019

If the file is cropped, not much can be missing and my hope would be that at least most of the conversations could still be recovered.

I think I can tell from the output you posted earlier that it is probably only the last attachment that is missing. From the order in which the backup data are written, if I'm correct, you should have all messages (the text parts of the messages), including messages received after that last incomplete attachment. Unfortunately, some important stuff that was written after the attachments is now also gone.

I have just pushed a commit that tries to generate the most important tables from the information in the messages. It fills in data for the thread table (otherwise your list of conversations would appear empty, even though the messages are in the db) and the groups info. I was worried about the 'identities' table remaining empty, but from my testing the app seems to accept this, it will just fill in new data after restoring. Check out the current code and compile (no need to edit the buildscript anymore!), then run like this:

./signalbackup-tools --generatefromtruncated /path/to/truncatedbackupfile.backup 01234560123456012345601234560123456 --output [newfixedbackup] --opassword [newpassword]

I'm extremely tired, but some notes I can think of right now:

  • all your contacts will get a message that your safety number has changed
  • any drafts you had are gone
  • all read_ and delivery_receipts for group messages are probably gone, so in group messages, if you check the details of a certain message, it can not tell which specific members have gotten the message
  • I think probably all avatars will be missing (they are backed up after the attachments), I'm guessing the app will resync the profile at some point. Also contact preferences (like custom color and such) are gone
  • all groups will be 'unnamed group', updating the group will probably fix this. Alternatively, leave the group and get yourself readded.
  • there may be a message with a broken attachment (the one where the file was truncated), I had this in testing. It seems not to cause a problem now but it might be wise to delete that message anyway if it isn't too important.
  • Please check the group members for each group! This was a hard part to restore, I am actually scanning all group conversations and keeping track of the 'joined' and 'quit' messages to determine the group members. If I made a mistake in this, messages could end up with people that are not supposed to be in a group (for example old members who have left already). It worked well for me in testing, but depending on your situation this could be a costly error. Also, changes to groups that have occurred after this backup was made are obviously also not included.
  • For complicated reasons, if a conversation is a group conversation but has no outgoing messages, it is not restored (I don't think it's possible but I'm not sure). There is a warning if such a conversation is found, so at least you know it is missing. If this group has only two members (including yourself) it can not be distinguished from a normal one-on-one conversation and will be added as such (unless a one-on-one conversation with this same person already exists).

Hm... that's all I can think of right now. I would love the full output of the command above (censor anything you need, but I don't think it reveals much sensitive information). Also, if you notice anything about your restored backup (missing or incorrect things) I would like to know. You might also want to test actually sending and receiving messages.

@elbrutalo
Author

Dear bepaald, thank you so much for your efforts!
I was able to compile the new package, but I can't start the command now:

[liveuser@localhost-live signalbackup-tools]$ ./signalbackup-tools --generatefromtruncated /run/media/liveuser/KASTI32/signal-2019-07-30-10-56-13.backup 559440538240148737383841861176 --output /run/media/liveuser/KASTI32/signal-neu --opassword 559440538240148737383841861176
bash: ./signalbackup-tools: No such file or directory
Within the folder "signalbackup-tools" the compiled executable's name is now "signalbackup-tools2" - I tried adding the "2" to the command, but that also results in an error:

[liveuser@localhost-live ~]$ ./signalbackup-tools2 --generatefromtruncated /run/media/liveuser/KASTI32/signal-2019-07-30-10-56-13.backup 559440538240148737383841861176 --output /run/media/liveuser/KASTI32/signal-neu --opassword 559440538240148737383841861176
bash: ./signalbackup-tools2: No such file or directory
I followed the same process as last time. No problems when compiling the package, no editing of BUILDSCRIPT.sh needed.

Thanks again so much!

@bepaald
Owner

bepaald commented Sep 9, 2019

Whoops, sorry that was a stupid mistake, I fixed it now. If you check the code out again the buildscript should be fixed.

However, just adding the "2" should have worked. This looks like the exact same problem you had earlier (#4 (comment)), which I didn't understand either. How did you fix that one?

@bepaald
Owner

bepaald commented Sep 9, 2019

Sorry, I just noticed: in your first command you were inside the signalbackup-tools directory ([liveuser@localhost-live signalbackup-tools]$), that only didn't work because I uploaded the wrong buildscript (which built signalbackup-tools2, with a "2" added).

However, in your second attempt (where you cleverly added the "2") you are in the wrong directory [liveuser@localhost-live ~], could that be the problem?

@bepaald
Owner

bepaald commented Sep 13, 2019

@elbrutalo Any luck so far? I could make another video if you need one...

@elbrutalo
Author

Dear bepaald, please forgive my late feedback, I was traveling and only now had the chance to report back.

The recovery process ran through to the end, and most of the messages could be restored (especially the old messages were all there; at the end some days might be missing).

The attachments of the last weeks (photos, voice messages, videos) were not restored. They appear as speech bubbles in the conversations, but are empty (see screenshots).

But I'm overjoyed about the older messages that could still be saved! Therefore I thank you infinitely for your help! I have also sent you via Paypal a small expense allowance and appreciate your commitment here for the community and me very much!

Unfortunately, I have now caused a new problem due to carelessness:

  • the new instance of Signal was set to keep only 500 messages per conversation
  • i.e. Signal imports the old data, but then immediately deletes it again.

In the months since the broken backup I have been working with a new instance of Signal and have received and sent several hundred messages. These messages exist in a second backup file with a different passphrase.

This means I now have the recovered backup file (with passphrase 1) and the new one from another signal instance (with passphrase 2).

Could I merge them with your tool? I found some threads on the net where people were facing the same problem. There doesn't seem to be a solution.

Maybe it is possible in my case because the conditions are favorable. The backup file from the new signal contains messages from the same contacts as in the old corrupt backup. So numbers and contact names are the same.

Is there a way to solve this with your script?

Best regards, elb

@bepaald
Owner

bepaald commented Sep 16, 2019

Dear bepaald, please forgive my late feedback, I was traveling and only now had the chance to report back.

No problem!

The recovery process ran through to the end, and most of the messages could be restored (especially the old messages were all there; at the end some days might be missing).

Good! If I had to guess, I'd say all messages that were in the database were restored. I think I was pretty careful about that. Any messages that could not be placed in a thread should have produced some output stating that. For example, with my own (truncated) testing backup:

Creating threads from 'mms' table data
Creating threads from 'sms' table data
Thread for this conversation partner already exists. This may be a group with only two members and only incoming messages. This case is not supported.
  !!! WARNING !!! Unable to generate thread data for messages belonging to this thread (no outgoing messages in conversation)
----------------------------------
| union_thread_id | address      |
----------------------------------
| 3               | +3164XXXXXXX |
----------------------------------

The attachments of the last weeks (photos, voice messages, videos) were not restored. They appear as speech bubbles in the conversations, but are empty (see screenshots).

I don't see any screenshots :) But that's okay, in my testing backup I also had one attachment missing and it also showed an empty bubble. I don't think they will pose a problem, but if the message body is empty (it was just an attachment with no actual text message) you might as well delete the messages to be on the safe side.

I think any missing attachments were simply not present in the backup file anymore, but you could test this. The program has had the ability to dump the entire decrypted database to a folder for a while now. You could use this option to see if there were any attachments in the database that haven't been placed in the fixed backup (I would guess not, but it's possible if the message they belong to is gone). To do this:


[~/programming/signalbackup-tools] $ mkdir RAWOUTPUT
[~/programming/signalbackup-tools] $ ./signalbackup-tools DEVsignal-2019-09-05-09-18-22.TRUNCATED 005708563826394701887625524302 --output RAWOUTPUT/
signalbackup-tools build 20190913.093016
IV: (hex:) 78 fe 4a 10 eb 1a 55 e0 b8 7c 85 6e cc b0 da f4 (size: 16)
SALT: (hex:) 4b b9 3c 58 dd 29 85 e3 4d 38 d3 78 d5 83 50 ef fb b5 0b c7 dd 02 e5 c8 5a ad d4 04 ff 56 fc e2 (size: 32)
BACKUPKEY: (hex:) fe 95 73 d2 58 80 4f d5 68 80 56 b0 94 9c e0 40 bb f7 be b4 4c 35 9f 91 09 26 7f 8b 54 ef 88 16 (size: 32)
CIPHERKEY: (hex:) b8 fa 66 66 2d aa 37 0a 90 a9 26 cf 41 ab 38 35 c8 ed df 8c 15 23 f2 07 28 36 a4 59 ae 58 f8 49 (size: 32)
MACKEY: (hex:) 3f 63 69 36 1a ed d2 5f 30 b5 31 93 65 cf 0f 24 b1 9b a1 8c f5 45 ae e8 e1 0b ff ff 70 36 8d 97 (size: 32)
COUNTER: 2029931024
Reading backup file...
FRAME 341 (099.2%)...  STOPPING BEFORE END OF ATTACHMENT!!! (EOF) 
Failed to get attachment data for FrameWithAttachment... info:
Frame number: 342
        Type: ATTACHMENT
         - row id          : 98 (8 bytes)
         - attachment id   : 1567667827993 (8 bytes)
         - length          : 1588447 (8 bytes)
done!
Writing HeaderFrame...
Writing DatabaseVersionFrame...
Writing Attachments...
Writing Avatars...
Writing SharedPrefFrame(s)...
Writing StickerFrames...
Writing EndFrame...
Error: asked to write nullptr frame to disk
Writing database...

It will obviously fail at the end, because the database is no good, but it will write the attachment data to the directory as it finds them. The filenames will not be very helpful, but you can manually inspect the attachment files (they should also be pretty much chronologically stored in the backup, so the attachment with the highest number should be the most recent one).
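Since the dumped files won't have meaningful names or extensions, a quick way to get an overview (just a suggestion, assuming the dump went into RAWOUTPUT/ as above) is to let the file utility guess the content types and to sort the listing numerically:

$ file RAWOUTPUT/*      # prints a best guess of each file's type (JPEG, PNG, MP4, ...)
$ ls -v RAWOUTPUT/      # -v sorts numerically, so the highest-numbered (most recent) files come last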

But I'm overjoyed about the older messages that could still be saved! Therefore I thank you infinitely for your help! I have also sent you via Paypal a small expense allowance and appreciate your commitment here for the community and me very much!

Thanks a lot! I just noticed that this morning and did not know who it was from. Though of course I would have helped you anyway (or tried to at least), I really do appreciate it a lot!

Unfortunately, I have now caused a new problem due to carelessness:

* the new instance of Signal was set to keep only 500 messages per conversation

* i.e. Signal imports the old data, but then immediately deletes it again.

In the months since the broken backup I have been working with a new instance of Signal and have received and sent several hundred messages. These messages exist in a second backup file with a different passphrase.

This means I now have the recovered backup file (with passphrase 1) and the new one from another signal instance (with passphrase 2).

Could I merge them with your tool? I found some threads on the net where people were facing the same problem. There doesn't seem to be a solution.

Maybe it is possible in my case because the conditions are favorable. The backup file from the new signal contains messages from the same contacts as in the old corrupt backup. So numbers and contact names are the same.

Is there a way to solve this with your script?

Haha, wow, that is some bad luck! But also good news! I have already implemented this feature a couple of weeks ago! In fact, when I woke up this morning I just decided to post a message in this thread to let those people know they could test it out, but now I will let you be the brave tester.

I have only tested with a few handcrafted, very small backups, you are really the first to try it seriously. Be prepared for it not to work (at least not the first time), it may need more work. Also, it is not a fully automated procedure, there are some slightly more complicated instructions than before. Example:

Assuming a current backup current.backup and an old one source.backup. First you need to get a list of threads from the old backup:

[~/programming/signalbackup-tools] $ ./signalbackup-tools --listthreads source.backup 871668681636341580140408145422 
IV: (hex:) e3 1c c6 5c 46 e3 b6 62 ff a7 95 40 e6 99 f9 eb (size: 16)
SALT: (hex:) 35 6a 55 bc 80 e9 da dd 2c ed 28 e2 da 22 f4 b8 3b e3 5d 44 2b 94 18 3e d0 6a 6f e3 79 d9 5d 1f (size: 32)
BACKUPKEY: (hex:) 3e 91 7a 80 66 9f 6b 1f a9 77 b1 3e fb 10 ae 15 b2 b4 69 7f 99 ad 40 16 57 61 00 ce cc 8e d0 cc (size: 32)
CIPHERKEY: (hex:) 69 67 67 1b d7 51 41 84 33 88 6b 1e 3b 21 e9 96 6e f7 f0 0d 61 ec 64 6e 44 29 97 3e df 49 83 26 (size: 32)
MACKEY: (hex:) 0b 30 c9 d3 56 6b 8c 81 f3 f5 bb cf 1d b4 ed 57 66 dc c8 72 11 58 04 42 07 db b8 a1 61 62 2e c3 (size: 32)
COUNTER: 3810313820
Reading backup file...
FRAME 93 (100.0%)... Read entire backup file...
done!
----------------------------------------------------------------------------------------------------------
| _id | recipient_ids                  | snippet                        | COALESCE(recipient_prefer[...] |
----------------------------------------------------------------------------------------------------------
| 1   | +161765XXXXX                   | <#> Your Signal verificat[...] | (NULL)                         |
| 2   | __textsecure_group__!a2c3[...] | Ok                             | devgroup                       |
| 3   | +316474XXXXX                   | Last msg                       | Master Phone                   |
| 4   | +316836XXXXX                   | Ok                             | Devphone Red                   |
----------------------------------------------------------------------------------------------------------

Then, you can import a selection of threads from the source file into your current backup and export to a new backup file. Expect tons of output (I really need to clean that up sometime):

[~/programming/signalbackup-tools] $ ./signalbackup-tools --importthreads 2,3,4 --source source.backup --sourcepassword 871668681636341580140408145422 --output merged.backup --opassword 000000000000000000000000000000 current.backup 420676745407910020904427069666
IV: (hex:) e2 dd c7 b0 d7 c1 81 01 6b db f8 24 47 98 5c 35 (size: 16)
SALT: (hex:) 47 a8 83 be 1f 9f d7 a4 db 6c 82 bd c4 d2 e9 4b 5e 90 d7 fd a4 98 81 4a 62 f1 0e d6 e5 52 f7 ee (size: 32)
BACKUPKEY: (hex:) b1 59 c2 ec ce cf dc de 37 6f bd af 15 79 06 c7 30 c4 56 3f 5f 60 f8 74 67 34 90 7b a5 c5 44 2b (size: 32)
CIPHERKEY: (hex:) 69 12 52 65 c2 5d 96 e0 26 dc 46 6c 95 92 18 f0 e7 69 31 7d 07 f7 ce 7e 4d 74 76 10 d1 78 da d8 (size: 32)
MACKEY: (hex:) 37 4c bc 18 c2 93 47 60 67 63 0d 81 24 65 9e ab 55 a3 c6 17 fb 95 26 2e 4a 68 e8 aa 5c a5 0b 7e (size: 32)
COUNTER: 3806185392
Reading backup file...
FRAME 88 (100.0%)... Read entire backup file...
done!
Importing thread 2 from source file: source.backup
IV: (hex:) e3 1c c6 5c 46 e3 b6 62 ff a7 95 40 e6 99 f9 eb (size: 16)
SALT: (hex:) 35 6a 55 bc 80 e9 da dd 2c ed 28 e2 da 22 f4 b8 3b e3 5d 44 2b 94 18 3e d0 6a 6f e3 79 d9 5d 1f (size: 32)
BACKUPKEY: (hex:) 3e 91 7a 80 66 9f 6b 1f a9 77 b1 3e fb 10 ae 15 b2 b4 69 7f 99 ad 40 16 57 61 00 ce cc 8e d0 cc (size: 32)
CIPHERKEY: (hex:) 69 67 67 1b d7 51 41 84 33 88 6b 1e 3b 21 e9 96 6e f7 f0 0d 61 ec 64 6e 44 29 97 3e df 49 83 26 (size: 32)
MACKEY: (hex:) 0b 30 c9 d3 56 6b 8c 81 f3 f5 bb cf 1d b4 ed 57 66 dc c8 72 11 58 04 42 07 db b8 a1 61 62 2e c3 (size: 32)
COUNTER: 3810313820
Reading backup file...
FRAME 93 (100.0%)... Read entire backup file...
done!
Deleting messages not belonging to requested thread(s) from 'sms'
Deleting messages not belonging to requested thread(s) from 'mms'
Deleting attachment entries from 'part' not belonging to remaining mms entries
Deleting other threads from 'thread'...
Dealing with thread id: 2
  Updating msgcount
  Setting last msg date
  Updating snippet
  Updating snippet type
Deleting removed groups...
Delete others from 'identities'
Deleting group receipts entries from deleted messages...
Deleting drafts from deleted threads...
Adjusting indexes in tables...
Compacting table: sms
Compacting table: mms
Compacting table: part
Compacting table: recipient_preferences
Compacting table: groups
Compacting table: identities
Compacting table: group_receipts
Compacting table: drafts
Found existing thread for this recipient in target database, merging into thread 5
Importing statements from source table 'sms'...4 entries...
Importing statements from source table 'mms'...3 entries...
Importing statements from source table 'part'...0 entries...
Importing statements from source table 'drafts'...0 entries...
Importing statements from source table 'push'...0 entries...
Importing statements from source table 'group_receipts'...6 entries...
Importing statements from source table 'sticker'...0 entries...
Importing statements from source table 'job_spec'...0 entries...
Importing statements from source table 'constraint_spec'...0 entries...
Importing statements from source table 'dependency_spec'...0 entries...
Importing thread 3 from source file: source.backup
IV: (hex:) e3 1c c6 5c 46 e3 b6 62 ff a7 95 40 e6 99 f9 eb (size: 16)
SALT: (hex:) 35 6a 55 bc 80 e9 da dd 2c ed 28 e2 da 22 f4 b8 3b e3 5d 44 2b 94 18 3e d0 6a 6f e3 79 d9 5d 1f (size: 32)
BACKUPKEY: (hex:) 3e 91 7a 80 66 9f 6b 1f a9 77 b1 3e fb 10 ae 15 b2 b4 69 7f 99 ad 40 16 57 61 00 ce cc 8e d0 cc (size: 32)
CIPHERKEY: (hex:) 69 67 67 1b d7 51 41 84 33 88 6b 1e 3b 21 e9 96 6e f7 f0 0d 61 ec 64 6e 44 29 97 3e df 49 83 26 (size: 32)
MACKEY: (hex:) 0b 30 c9 d3 56 6b 8c 81 f3 f5 bb cf 1d b4 ed 57 66 dc c8 72 11 58 04 42 07 db b8 a1 61 62 2e c3 (size: 32)
COUNTER: 3810313820
Reading backup file...
FRAME 93 (100.0%)... Read entire backup file...
done!
Deleting messages not belonging to requested thread(s) from 'sms'
Deleting messages not belonging to requested thread(s) from 'mms'
Deleting attachment entries from 'part' not belonging to remaining mms entries
Deleting other threads from 'thread'...
Dealing with thread id: 3
  Updating msgcount
  Setting last msg date
  Updating snippet
  Updating snippet type
Deleting removed groups...
Delete others from 'identities'
Deleting group receipts entries from deleted messages...
Deleting drafts from deleted threads...
Adjusting indexes in tables...
Compacting table: sms
Compacting table: mms
Compacting table: part
Compacting table: recipient_preferences
Compacting table: groups
Compacting table: identities
Compacting table: group_receipts
Compacting table: drafts
Found existing thread for this recipient in target database, merging into thread 6
Importing statements from source table 'sms'...5 entries...
Importing statements from source table 'mms'...1 entries...
Importing statements from source table 'part'...1 entries...
Importing statements from source table 'drafts'...0 entries...
Importing statements from source table 'push'...0 entries...
Importing statements from source table 'group_receipts'...0 entries...
Importing statements from source table 'sticker'...0 entries...
Importing statements from source table 'job_spec'...0 entries...
Importing statements from source table 'constraint_spec'...0 entries...
Importing statements from source table 'dependency_spec'...0 entries...
Importing thread 4 from source file: source.backup
IV: (hex:) e3 1c c6 5c 46 e3 b6 62 ff a7 95 40 e6 99 f9 eb (size: 16)
SALT: (hex:) 35 6a 55 bc 80 e9 da dd 2c ed 28 e2 da 22 f4 b8 3b e3 5d 44 2b 94 18 3e d0 6a 6f e3 79 d9 5d 1f (size: 32)
BACKUPKEY: (hex:) 3e 91 7a 80 66 9f 6b 1f a9 77 b1 3e fb 10 ae 15 b2 b4 69 7f 99 ad 40 16 57 61 00 ce cc 8e d0 cc (size: 32)
CIPHERKEY: (hex:) 69 67 67 1b d7 51 41 84 33 88 6b 1e 3b 21 e9 96 6e f7 f0 0d 61 ec 64 6e 44 29 97 3e df 49 83 26 (size: 32)
MACKEY: (hex:) 0b 30 c9 d3 56 6b 8c 81 f3 f5 bb cf 1d b4 ed 57 66 dc c8 72 11 58 04 42 07 db b8 a1 61 62 2e c3 (size: 32)
COUNTER: 3810313820
Reading backup file...
FRAME 93 (100.0%)... Read entire backup file...
done!
Deleting messages not belonging to requested thread(s) from 'sms'
Deleting messages not belonging to requested thread(s) from 'mms'
Deleting attachment entries from 'part' not belonging to remaining mms entries
Deleting other threads from 'thread'...
Dealing with thread id: 4
  Updating msgcount
  Setting last msg date
  Updating snippet
  Updating snippet type
Deleting removed groups...
Delete others from 'identities'
Deleting group receipts entries from deleted messages...
Deleting drafts from deleted threads...
Adjusting indexes in tables...
Compacting table: sms
Compacting table: mms
Compacting table: part
Compacting table: recipient_preferences
Compacting table: groups
Compacting table: identities
Compacting table: group_receipts
Compacting table: drafts
Importing statements from source table 'sms'...4 entries...
Importing statements from source table 'mms'...0 entries...
Importing statements from source table 'part'...0 entries...
Importing statements from source table 'thread'...1 entries...
Importing statements from source table 'identities'...0 entries...
Importing statements from source table 'drafts'...0 entries...
Importing statements from source table 'push'...0 entries...
Importing statements from source table 'groups'...0 entries...
Importing statements from source table 'recipient_preferences'...0 entries...
Importing statements from source table 'group_receipts'...0 entries...
Importing statements from source table 'sticker'...0 entries...
Importing statements from source table 'job_spec'...0 entries...
Importing statements from source table 'constraint_spec'...0 entries...
Importing statements from source table 'dependency_spec'...0 entries...
Exporting backup to 'merged.backup'
Writing HeaderFrame...
Writing DatabaseVersionFrame...
Writing SqlStatementFrame(s)...
  Dealing with table 'sms'... 25/25 entries...done
  Dealing with table 'mms'... 7/7 entries...done
  Dealing with table 'part'... 2/2 entries...done
  Dealing with table 'thread'... 7/7 entries...done
  Dealing with table 'identities'... 3/3 entries...done
  Dealing with table 'drafts'... 0/0 entries...
  Dealing with table 'push'... 0/0 entries...
  Dealing with table 'groups'... 1/1 entries...done
  Dealing with table 'recipient_preferences'... 4/4 entries...done
  Dealing with table 'group_receipts'... 8/8 entries...done
  Dealing with table 'sticker'... 0/0 entries...
  Dealing with table 'job_spec'... 1/1 entries...done
  Dealing with table 'constraint_spec'... 0/0 entries...
  Dealing with table 'dependency_spec'... 0/0 entries...
Writing SharedPrefFrame(s)...
Writing EndFrame...
Done!

The program automatically tries to determine into which thread of the current db the old messages should be inserted. This might fail if one of the backups has a contact with a country code (+316.....) and the other omits it (06....).

Please let me know if you need any more help in running this function, I'm not sure the above instructions are very clear. And of course, if you do manage to get it going I would love to hear the results.

Good luck!

@elbrutalo
Author

Dear bepaald,

ah, great that this feature already exists!

I'll test it right now, but just to be on the safe side I'll ask the question first:

What exactly can go wrong if there are foreign contacts with an international prefix in one of the backups and not in the other? Does the whole process then stop?

In my case there are many threads with foreign phone numbers (some only in the old backup, some in both the new and the old one).

Is there anything else to consider before I start?

+41 / +43 / +44 / +17 / +35 / +33 / +21 / +39 etc.
Then there are also threads where the counterpart's identification is only text and not a number (service numbers, chatbots, messages sent via various messengers).

@bepaald
Owner

bepaald commented Sep 16, 2019

(some of the following is guesswork, as I said, the code is not extensively tested. Keep in mind that the input backups are opened read-only, so you really can't end up any worse than you start :) )

Well nothing can go wrong exactly, but signal internally uses the phone numbers as the contact id. It is by this id that this program matches the threads in the old and new databases. So nothing can go wrong, just if you have 0611111111 in one backup and +31611111111 in the other, the threads will not be identified as the same person and will not be merged, they will just turn into two separate threads. You could check this by running the tool with --listthreads (as in the example above): the column "recipient_ids" is the identifier by which threads are merged. Any threads whose recipient_ids is not found in the other backup will get their own, new thread in the output file. I think this is the natural way, if a contact of yours loses his phone and gets a new one (with a new number), those messages would also become a new thread (so you'd have two for the same contact) because his phone number (= recipient_ids) has changed.
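If you want to check this in advance, you could simply run the same --listthreads command from my earlier message on both files and compare the recipient_ids columns (the passwords below are placeholders):

./signalbackup-tools --listthreads source.backup [sourcepassword]
./signalbackup-tools --listthreads current.backup [password]

Any number that shows up in a different format in the two outputs (for instance 06... in one and +316... in the other) will end up as a separate thread after merging.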

then there are still threads with counterparts where the sender identification is only a text and not a number (service numbers, chatbots, messages sent via various messengers).

I have very little experience with this, but I kind of expect it to work even if it's not a number, as long as it's the same string in both databases they hopefully get merged.

Is there anything else to consider before I start?

I can't think of anything else. I just tested merging the fixed backup I had truncated for testing. It seems to have worked fine, even with the missing attachments. There was also a service number in there which also seemed to work. Now all the messages in it are doubled (because I merged it with itself). I think the best way to get answers is to try it! I'm very curious myself actually, so if you have the time and feel up to it, please try it out.

PS If you have a lot of threads, it might be tedious to write --importthreads 1,2,3,4,5,6,7,9,10,11,12,13 in the command line. The program will also accept ranges of thread id's, so the previous could be written as --importthreads 1-7,9-13.

@bepaald
Owner

bepaald commented Sep 25, 2019

@elbrutalo Did you try it out yet?

Obviously, you do not have to if you don't want, but I just glanced over the code changes for the upcoming 4.48 version of Signal (currently in beta testing), and the changes will definitely break my current merging code. So, if you are going to try, please do so before the new version comes out and before updating (or wait until I've updated my code, but it could take quite a while).

@elbrutalo
Author

elbrutalo commented Sep 25, 2019 via email

@elbrutalo
Author

Dear bepaald,
I tried to merge the two backups (all threads and also only single threads), but somehow after a short time the program freezes my Fedora. After that I can only do a hard reset.

The problem occurs both when I select all threads (1-587) and when I select only some threads (e.g. 1-4).

Since the computer freezes I cannot send you the terminal log but instead only two screenshots of the frozen screen:

https://www.dropbox.com/s/ekheqaz1v3yaia2/20190926_121934.jpg?dl=0
https://www.dropbox.com/s/taiwtp662eoj66l/20190926_121931.jpg?dl=0
https://www.dropbox.com/s/o2qv4501met4d9e/20190926_114333.jpg?dl=0
https://www.dropbox.com/s/s6a4uv3qi51h9cn/20190926_114323.jpg?dl=0

Do you have any idea how I can avoid this problem? thank you so much.

@bepaald
Owner

bepaald commented Sep 26, 2019

Hi,

Thanks for trying!

Not sure what's going on here, but I can imagine the machine is running out of memory. Do you think that is possible? Though I already had some low-memory options prepared, I never bothered to enable them. The merging code was very memory hungry (more than the other functions), and in combination with your huge backup file, I can imagine the machine is getting low on RAM.

I've spent the last couple of hours enabling the low-memory mode for the merging routine (I had not maintained it properly, so it took a little work) and testing if I didn't break any other functionality. For my tests (merging 10 threads from two 96MB files) memory usage went from 520MB to around 150MB (also, it should go a bit quicker).

Please try again with the current code, I hope it helps. If it still hangs, does it at least get further along?

EDIT: Thinking about it, the max RAM used is probably around the same as the total size of the two backups you are trying to merge (in the new version), so if you have less RAM than the size of your two backups combined you might still get in trouble.

EDIT2: I've further reduced memory usage; if RAM was the problem, I don't think it can be anymore. Also, with the number of threads you are merging, the output will be waaay too big to capture from the terminal, so if you append | tee OUTPUT to your command (so it will look like ./signalbackup-tools --importthreads 2,3,4 --source source.backup --sourcepassword 871668681636341580140408145422 --output merged.backup --opassword 000000000000000000000000000000 current.backup 420676745407910020904427069666 | tee OUTPUT), all the output of the program will be stored in a file called "OUTPUT".

EDIT3: Sorry for all the messages. I just found a stupid bug in the merging code, but I'm too tired to fix it now, so please wait a little while, I'll have it fixed tomorrow. (fixed)

@Esokrates

Esokrates commented Dec 10, 2019

@bepaald Yep, they belong all to that thread :-)

600|831|(NULL)|1472167059569|(NULL)
601|832|(NULL)|1472167059885|(NULL)
602|833|(NULL)|1472167060310|(NULL)
603|834|(NULL)|1472167061149|(NULL)
604|835|(NULL)|1472167061458|(NULL)
606|837|(NULL)|1472167062803|(NULL)
607|838|(NULL)|1472167063153|(NULL)
608|839|(NULL)|1472167063337|(NULL)

and the unique IDs are exactly the missing attachment errors from #7 (comment)
Btw, I only left out the name of the group in the output; the contact name is NULL.

@bepaald
Owner

bepaald commented Dec 10, 2019

@Esokrates Ok good! Since the attachments are already missing anyway, the best thing to do is just delete these entries from the part database (they contain no other useful information):

./signalbackup-tools [input] [password] --runsqlquery "DELETE FROM part WHERE part._id BETWEEN 600 AND 608" --output [output] --opassword [password]

I sometimes see messages for missing attachments (like in #7 (comment)) in my own backups, but they are not a problem (I think they're just deleted attachments, where the part-entry is not removed). However, none of them have NULL for size, and that seems to be the problem in your case.

Let me know if it's fixed! If it's not, we may need to do something with the mms messages these part-entries belong to (i really don't think that's necessary though). But I probably won't have anything new before tomorrow (if at all), because it's getting late here.

@Esokrates

Esokrates commented Dec 10, 2019

@bepaald Thanks, I'll try that tomorrow, however could you tell me how to exclude 605 from the deletion, since that entry does not seem to be affected?

Am I understanding this right, that this only deletes the attachments but not the text of the messages?

@bepaald
Owner

bepaald commented Dec 10, 2019

@Esokrates

however could you tell me how to exclude 605 from the deletion

Good catch! I'm glad one of us is paying attention :) Change the command to:

./signalbackup-tools [input] [password] --runsqlquery "DELETE FROM part WHERE part._id BETWEEN 600 AND 604 OR part._id BETWEEN 606 AND 608" -o [output] -op [password]

Am I understanding this right, that this only deletes the attachments but not the text of the messages?

Correct, the part database does not hold any of that data. Each entry in the part database belongs to a message in the mms database, which holds the actual message body (which might be empty; I often send pictures without any text). You can print out some info on the messages they belong to using this command (it will show the id, the message body, and the date(_received) in milliseconds):

./signalbackup-tools [input] [password] --runsqlquery  "SELECT _id,body,date,date_received FROM mms WHERE _id BETWEEN 831 AND 835 OR _id BETWEEN 837 AND 839"

Where that last number is the 'part.mid' (mid == mmsid) as you just posted them (#4 (comment)). Again, we are not actually touching these messages, but if you remember this info, you can hopefully see they are all still there after you restore the backup. To convert the date to a more readable format, you could probably just type date -d @[number] on the command line. I assume the date command is present by default. Also, leave out the last three digits of the date, since the date command expects seconds, not milliseconds.

Example:

[~/programming/signalbackup-tools] $ ./signalbackup-tools file.backup 000001111122222333334444455555 --runsqlquery "SELECT _id,body,date,date_received FROM mms WHERE _id = 831"
 (...)
done!
Executing query: SELECT _id,body,date,date_received FROM mms WHERE _id = 831
_id|body|date|date_received
831|(NULL)|1514250727022|1514250828000
[~/programming/signalbackup-tools] $ date -d @1514250727
di 26 dec 2017  2:12:07 CET

The date string is localized so it will probably look different (more normal) to you when you do it on your own machine.

@Esokrates

@bepaald I tried it and now the thread doesn't crash anymore :-). Now it seems I have some zombie messages that display like this:

Example 1:
[screenshot]
Sometimes also appearing as:
[screenshot]
Example 2:
[screenshot]

It looks like those are empty messages, and in case of example 2 I can't even select that message in Signal, so could you help me find those empty messages and delete them? That would certainly make a great general option for the tool, something like --fix-missing-attachments, where NULLs are detected and removed and, if the resulting message is empty, the message is dropped altogether.

If I remember correctly those missing attachments were caused by the sender not being able to send the attached pictures so the spinner kept spinning indefinitely for the attached pictures.

@bepaald
Owner

bepaald commented Dec 12, 2019

I tried it and now the thread doesn't crash anymore :-).

Yippie!

Now it seems I have some zombie messages [...] I can't even select that message in Signal, so could you help me find those empty messages and delete them?

Hmmm.... I assume these zombie messages correspond to the ones we deleted the 'part' entries from (are there 8 of them, in that same thread, around that same date)? I'm not sure why the app thinks there is still an attachment for them though... I think it has something to do with the fact that the entries in the 'mms' table still have a non-zero 'part_count' field. Could you run the following command to list all messages in the mms database with a part_count > 0 and no entry in the 'part' table:

./signalbackup-tools [input] [password] --runsqlquery "SELECT _id,body,thread_id,DATETIME(ROUND(date / 1000), 'unixepoch') AS isodate,date,address FROM mms WHERE _id NOT IN (SELECT mid from part) AND part_count > 0"

IF the list is exactly the zombie messages (as far as you can tell, I hope the date and address help you determine this) you could delete this same selection of messages by running

./signalbackup-tools [input] [password] --runsqlquery "DELETE FROM mms WHERE _id NOT IN (SELECT mid from part) AND part_count > 0" -o [output] -op [password]

If I remember correctly those missing attachments were caused by the sender not being able to send the attached pictures so the spinner kept spinning indefinitely for the attached pictures.

That is probably useful info for the bug report you made (assuming someone will look at it sometime).

I totally missed the editgroupmembers option! I tried it now and it works perfectly!

Excellent

Tested, --importthreads ALL works as expected now :-).

Thank you for reporting and testing!

Removing duplicate messages
EDIT: Will --removedoubles work in my case?

No, --removedoubles will not work, that is for actual duplicates (not just the same contents, but everything, including the same delivery method).

I've tried to write a function that will scan for doubles in your case, but it's hard to test properly. Are there many of these duplicate messages? Is it doable to check the results manually or are there way too many to do that? Try to run the program like this ./signalbackup-tools [input] [password] --esokrates (obviously, pull latest version to have the new function). It will get a list of all 'unsecured messages' (I'm hoping this means the sms-type) and then search for secure messages with the same timestamp, body and address. It will say it is deleting them but this is only in memory! As long as you don't supply an -o [output] -op [password], nothing is written to disk. Please try as best you can to verify if the messages selected for deletion are ok to delete. If they are, run with --esokrates -o [output] -op [password].

@Esokrates

Esokrates commented Dec 12, 2019

@bepaald First of all, let me say that you are amazing! :-)
The output of the first command is 831-835 and 837-839.
837-839 have NULL as body; the other entries contain some of the most important text of the whole thread for me, and it would hurt me to lose that.
So I would like to try to remove 837-839 and see what happens. Could you tell me the syntax for that?

I would guess
--runsqlquery "DELETE FROM mms WHERE _id BETWEEN 837 AND 839"

Strangely all of the matched messages have a timestamp of Aug. 25 between 18:00 and 22:00, so I see no logical reason for example 1.

EDIT:
So I queried the messages in the affected thread that have null body:

Executing query: SELECT _id,body,thread_id,DATETIME(ROUND(date / 1000), 'unixepoch') AS isodate,date,address FROM mms WHERE body IS NULL and thread_id is 8
_id|body|thread_id|isodate|date|address
787|(NULL)|8|2016-08-17 14:40:43|1471444843444|__textsecure_group__!d9420f58768144098b9c287a10215fa2
788|(NULL)|8|2016-08-17 14:40:48|1471444848692|__textsecure_group__!d9420f58768144098b9c287a10215fa2
791|(NULL)|8|2016-08-17 14:42:10|1471444930459|__textsecure_group__!d9420f58768144098b9c287a10215fa2
792|(NULL)|8|2016-08-17 14:42:22|1471444942223|__textsecure_group__!d9420f58768144098b9c287a10215fa2
795|(NULL)|8|2016-08-17 14:44:23|1471445063020|__textsecure_group__!d9420f58768144098b9c287a10215fa2
796|(NULL)|8|2016-08-17 14:45:01|1471445101334|__textsecure_group__!d9420f58768144098b9c287a10215fa2
799|(NULL)|8|2016-08-17 14:46:13|1471445173553|__textsecure_group__!d9420f58768144098b9c287a10215fa2
800|(NULL)|8|2016-08-17 14:46:22|1471445182901|__textsecure_group__!d9420f58768144098b9c287a10215fa2
837|(NULL)|8|2016-08-25 21:57:41|1472162261399|+4<OMITTED>
838|(NULL)|8|2016-08-25 21:58:43|1472162323526|+4<OMITTED>
839|(NULL)|8|2016-08-25 21:58:56|1472162336666|+4<OMITTED>
840|(NULL)|8|2016-08-25 21:59:34|1472162374578|+4<OMITTED>
844|(NULL)|8|2016-08-26 15:32:12|1472225532124|__textsecure_group__!d9420f58768144098b9c287a10215fa2
845|(NULL)|8|2016-08-26 15:33:06|1472225586725|__textsecure_group__!d9420f58768144098b9c287a10215fa2
846|(NULL)|8|2016-08-26 15:33:22|1472225602555|__textsecure_group__!d9420f58768144098b9c287a10215fa2
847|(NULL)|8|2016-08-26 15:33:39|1472225619464|__textsecure_group__!d9420f58768144098b9c287a10215fa2
848|(NULL)|8|2016-08-26 15:34:05|1472225645888|__textsecure_group__!d9420f58768144098b9c287a10215fa2
849|(NULL)|8|2016-08-26 15:34:35|1472225675743|__textsecure_group__!d9420f58768144098b9c287a10215fa2
850|(NULL)|8|2016-08-26 15:35:00|1472225700716|__textsecure_group__!d9420f58768144098b9c287a10215fa2
851|(NULL)|8|2016-08-26 15:35:18|1472225718472|__textsecure_group__!d9420f58768144098b9c287a10215fa2
852|(NULL)|8|2016-08-26 15:35:54|1472225754535|__textsecure_group__!d9420f58768144098b9c287a10215fa2
853|(NULL)|8|2016-08-26 15:36:11|1472225771118|__textsecure_group__!d9420f58768144098b9c287a10215fa2
854|(NULL)|8|2016-08-26 15:37:26|1472225846111|__textsecure_group__!d9420f58768144098b9c287a10215fa2
855|(NULL)|8|2016-08-26 15:38:11|1472225891241|__textsecure_group__!d9420f58768144098b9c287a10215fa2
856|(NULL)|8|2016-08-26 15:38:57|1472225937206|__textsecure_group__!d9420f58768144098b9c287a10215fa2
857|(NULL)|8|2016-08-26 15:39:06|1472225946843|__textsecure_group__!d9420f58768144098b9c287a10215fa2
858|(NULL)|8|2016-08-26 15:40:16|1472226016264|__textsecure_group__!d9420f58768144098b9c287a10215fa2
859|(NULL)|8|2016-08-26 15:41:15|1472226075819|__textsecure_group__!d9420f58768144098b9c287a10215fa2
860|(NULL)|8|2016-08-26 15:41:35|1472226095158|__textsecure_group__!d9420f58768144098b9c287a10215fa2

Is it expected that the addresses of 837-840 differ from the rest?

@bepaald
Owner

bepaald commented Dec 12, 2019

The output of the first command is _ids 831-835 and 837-839.

Right, that's good, I actually expected that! They are indeed the same messages, with the NULL for data_size in the part table (which we deleted). You can see the same _id's in your previous message here: #4 (comment)

I would guess
--runsqlquery "DELETE FROM mms WHERE _id BETWEEN 837 AND 839"

Yes, that looks correct to me; don't forget to add --output to save the changes. If, after deleting these empty messages, you still have problems with the other ones (the ones with the important text), you could try to fix them by setting their 'part_count' field to 0: --runsqlquery "UPDATE mms SET part_count = 0 WHERE _id BETWEEN 831 AND 835". This is a bit of a guess, so do make sure to keep the [input] file safe in case this does not do what I think it does.

So I queried the messages in the affected thread that have null body:

You're really getting the hang of this! Soon you won't need my help! ;-)
But there are some phone numbers in the output; you might want to censor those. You beat me to it! The previous versions are still in the revision history of your post, I don't know if those are public, but I'll delete them now.

Is it expected that the addresses of 837-840 differ from the rest?

Well, at least it's not expected that they are the same. In group threads, incoming messages have the address of the sender and outgoing messages have the group_id as address, so it's normal for messages in group threads to have many different addresses (all participants' phone numbers and the group_id).
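If you're curious, you can see this for yourself by listing the distinct addresses that occur in that thread (just an illustration, using thread 8 from your output above):

./signalbackup-tools [input] [password] --runsqlquery "SELECT DISTINCT address FROM mms WHERE thread_id = 8"

You should see the group_id plus the phone numbers of the participants who sent something in that thread.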

@Esokrates

You can see the same _id's in your previous message here: #4 (comment)

Yeah, I happily noticed that :-), but I am just testing deleting the 3 messages with empty body to see if it fixes the rendering issues; it'll take some time as the files are big.
Once it's imported, I'll also look into the new deduplication function.

The previous versions are still in the revision history of your post, I don't know if those are public, but I'll delete them now.

You're great, thanks, but luckily that number is from the old backup and doesn't exist anymore :-).

@Esokrates

Esokrates commented Dec 12, 2019

@bepaald Good news: Deleting the 3 messages with NULL body indeed fixed the problem :-).

Regarding the duplicate function, it complains about syntax errors:

Searching for possible duplicates of 4953 unsecured messages
SQL Error: near "heute": syntax error
SQL Error: near "heute": syntax error
SQL Error: near "s": syntax error
SQL Error: near "s": syntax error
SQL Error: near "s": syntax error
SQL Error: near "s": syntax error
SQL Error: near "s": syntax error
SQL Error: near "s": syntax error
SQL Error: near "s": syntax error
SQL Error: near "s": syntax error
...

Could you tell me how to query sms messages?

@bepaald
Owner

bepaald commented Dec 12, 2019

Good news: Deleting the 3 messages with NULL body indeed fixed the problem :-).

That's great! So, just the duplicate messages left to fix right?

it complains about syntax errors:

I think there are some issues when the message bodies contain quotes... I'll work on a fix and let you know when I have something.
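To illustrate what I think is going wrong (just a sketch, not the actual code): the search query presumably ends up containing the message body verbatim, and in SQLite a single quote inside a string literal has to be escaped by doubling it, otherwise the literal ends early and you get exactly the kind of SQL Error: near "s": syntax error you saw:

SELECT _id FROM sms WHERE body = 'that's a problem';  -- breaks: the quote after 'that' ends the string literal too early
SELECT _id FROM sms WHERE body = 'that''s fine';      -- works: '' inside a literal is an escaped single quote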

Could you tell me how to query sms messages?

Well, they are in a table called 'sms', so a simple "SELECT * FROM sms" will print everything (but that's a lot). Messages have a 'type', which is a number that can be bitmasked to get certain properties (see: https://github.com/signalapp/Signal-Android/blob/master/src/org/thoughtcrime/securesms/database/MmsSmsColumns.java#L27). My guess (and I tested this on my own databases, but I only have few examples) is that for Signal messages the 'SECURE_MESSAGE_BIT' is set, and for normal sms messages it is not. So to get all sms messages in your database (the 4953 you found in your message above), you would do "SELECT * FROM sms WHERE (type & 0x800000) IS 0"
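If you want to sanity-check that bit first, you could count the messages per value of that bit (just an illustration of the same bitmask, using --runsqlquery as above):

./signalbackup-tools [input] [password] --runsqlquery "SELECT (type & 0x800000) != 0 AS secure, COUNT(*) AS messages FROM sms GROUP BY secure"

If the guess is right, the row with secure = 0 should contain roughly your 4953 unsecured messages.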

@bepaald
Owner

bepaald commented Dec 12, 2019

Ok, I've fixed the quote-issue. Still don't know if it'll be the solution, but at least it shouldn't give errors now. Please try again

@Esokrates

That's great! So, just the duplicate messages left to fix right?

Yeah :-)

WHERE (type & 0x800000) IS 0

I tested this and it's not completely safe: I found some messages that were not sent unencrypted but are now marked as unencrypted. That seems to have been a bug in Signal, because those messages were never delivered to my conversation partner; they had only one tick indefinitely. This happened a few times, years ago, but luckily not very often.

@Esokrates

Ok, I've fixed the quote-issue. Still don't know if it'll be the solution, but at least it shouldn't give errors now. Please try again

Searching for possible duplicates of 4843 unsecured messages
Deleting 0 duplicates...
Deleted 0 entries

So something is wrong unfortunately.

@bepaald
Owner

bepaald commented Dec 12, 2019

Ok, well, that's not working. I have 1 message in my database which I sent as a Signal message first and then as a regular sms a minute or so later (in the same thread, to the same person), so that's almost your situation, I think, except the timestamp will be a bit different between the messages. Could you try to find one of your duplicate messages with some identifying body-text and print some info on them? An example with that one message in my database:

./signalbackup-tools ~/PHONE/signal-2019-10-17-11-42-09.backup 000001111122222333334444455555 --runsqlquery "SELECT * FROM sms WHERE body LIKE '%het kaartje?%'"
signalbackup-tools source version 20191212.185330
(...)
Reading backup file...
FRAME 46397 (100.0%)... Read entire backup file...

done!
 * Executing query: SELECT * FROM sms WHERE body LIKE '%IDENTIFYING SUBSTRING%'
_id|thread_id|address|address_device_id|person|date|date_sent|protocol|read|status|type|reply_path_present|delivery_receipt_count|subject|body|mismatched_identities|service_center|subscription_id|expires_in|expire_started|notified|read_receipt_count|unidentified
2385|1|+XXXXXXXXX86|1|(NULL)|1505307658292|1505307658288|(NULL)|1|-1|10485783|(NULL)|1|(NULL)|(...censored...)IDENTIFYING SUBSTRING(...censored...)|(NULL)|(NULL)|-1|0|0|0|0|0
2386|1|+XXXXXXXXX86|1|(NULL)|1505308118963|1505308118961|(NULL)|1|-1|87|(NULL)|0|(NULL)|(...censored...)IDENTIFYING SUBSTRING(...censored...)|(NULL)|(NULL)|-1|0|0|0|0|0

Please post the output, but remember there is a phone number and the message body in there, so censor those (but tell me if they are not the same). Hopefully this way we can see more easily what exactly is the same and what is different in these duplicate messages. (As you can see, in mine only the 'type' is different (and the timestamp), so that's why I thought that would be the way to identify your duplicates)

@Esokrates

Esokrates commented Dec 12, 2019

17|2|+4<PRIVATE>|1|(NULL)|1453138161138|1453138154075|31337|1|-1|10485780|1|0|(NULL)|<PRIVATE>|(NULL)|GCM|-1|0|0|0|0|0
4839|2|+4<PRIVATE>|1|(NULL)|1453138161138|1453138161138|0|1|-1|20|(NULL)|0|(NULL)|<PRIVATE>|(NULL)|(NULL)|-1|0|0|0|0|0

This implies that the offset should be 4822, so I guess this is the real number of duplicate messages.

I think I'll be reasonably safe with

FROM sms WHERE (type & 0x800000) IS 0 AND thread_id IS 2 AND body IS NOT NULL

which outputs messages with IDs from 4835 to 9646, so I guess it is actually safe to delete all messages with IDs between those numbers.
9646-4835=4811, which is actually less than the offset, which should be OK; it would be more worrying if it were greater than the offset.

@Esokrates

Esokrates commented Dec 12, 2019

Now I am a bit puzzled: the _ids in the output are strictly monotonically increasing, the first message has ID 4835 and the last has ID 9657, which makes for 4812 messages, yet counting the lines with

--runsqlquery "SELECT _id FROM sms WHERE (type & 0x800000) IS 0 AND thread_id IS 2 AND body IS NOT NULL" | wc -l

gives me 4820. How is that possible?

EDIT: Ah, I forgot about your debug output, which sadly I can't suppress.
EDIT 2:

--runsqlquery "SELECT _id FROM sms WHERE (type & 0x800000) IS 0 AND thread_id IS 2 AND body IS NOT NULL" | grep -E "^[0-9]" | wc -l
4807

so I'm reasonably certain that it's safe to delete all of those, since the messages before 4835 are the same as the last of the matches.

@Esokrates

@bepaald Is it safe to specify an existing backup as output file? Will it safely overwrite that file or may that lead to something unexpected?

@bepaald
Owner

bepaald commented Dec 12, 2019

I think I just found why the --esokrates failed before and fixed it, so you might want to try it again. I've tested it this time with the data you provided before. If you're tired of it by now, your own idea of just doing DELETE FROM sms WHERE (type & 0x800000) IS 0 AND thread_id IS 2 AND body IS NOT NULL could work as well. The --esokrates function will give a lot of output if there really are over 4800 duplicates, but you won't manually go over all of them anyway, so just look at a couple of them and check the list of _id's (it prints that right at the end).

Is it safe to specify an existing backup as output file? Will it safely overwrite that file or may that lead to something unexpected?

Should be safe, as long as it's not the same as the input file (even that might actually work, I just wouldn't dare try it with important data).

@Esokrates

@bepaald The --esokrates function works great now. I wrote a shell script to check all the candidates between 4835 and 9646 that were not listed by your function (a rough sketch of that check is below). Those were all status messages from Signal that were duplicated too, but in a slightly different way (for example, the contact name was included in the status message duplicate, while not being included in the original one). So I ended up doing

--runsqlquery "DELETE FROM part WHERE part._id BETWEEN 600 AND 604 OR part._id BETWEEN 606 AND 608" --runsqlquery "DELETE FROM mms WHERE _id BETWEEN 837 AND 839" --runsqlquery "DELETE FROM sms WHERE _id BETWEEN 4835 AND 9646"

overall to fix all the mentioned problems. As far as I can see, everything now works as I wanted it to.
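For reference, the check my script did corresponds roughly to listing the messages in that range that have no secure counterpart; a SQL-only sketch of it (not exactly what I ran, just the same bitmask and columns as above) would be:

./signalbackup-tools [input] [password] --runsqlquery "SELECT s1._id, s1.body FROM sms AS s1 WHERE s1._id BETWEEN 4835 AND 9646 AND NOT EXISTS (SELECT 1 FROM sms AS s2 WHERE (s2.type & 0x800000) != 0 AND s2.date = s1.date AND s2.body = s1.body AND s2.address = s1.address)"

All that turned up were the duplicated status messages mentioned above.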

So I'll stick to the merged backup file now. Is there anything I could test for you? I'll wait for your reply before upgrading to the latest Signal release.

This won't be my final message here, but I want to say a very very big thank you for the amazing tool and your incredible support and efforts! As I said earlier, you are amazing! Kudos!

@bepaald
Owner

bepaald commented Dec 13, 2019

@Esokrates

As far as I can see, everything now works as I wanted it to.

Excellent! Very happy for you. I had forgotten that the plaintext backup would not export the raw status messages, but would decode them first, so indeed the body would not match the original body. I could have tried to write something for that as well, but your script was probably a lot quicker.

So I'll stick to the merged backup file now. Is there anything I could test for you? I'll wait for your reply before upgrading to the latest Signal release.

Thanks! I'm trying to think of something, but I really can't right now. You should probably just upgrade. Make sure to save a backup before updating, just in case something goes wrong during the database migrations the app is going to have to do coming from your old version.

This won't be my final message here, but I want to say a very very big thank you for the amazing tool and your incredible support and efforts! As I said earlier, you are amazing! Kudos!

No problem! You certainly had some unexpected problems and requests, but it's fun trying to get everything fixed and even more fun succeeding :-) Very happy to have been able to help!

@Esokrates

@bepaald The upgrade seems to have worked very well. :-) The only issue now is that Signal makes a backup every day, and making one backup drains the battery badly because of the much bigger size now, but I guess I'll just disable the backups and manually back up every now and then / request a feature to manually specify the backup interval.

A couple of questions / thoughts regarding signalbackup-tools:

  1. Will you port all the merging features forward to the new database?
  2. Could you make -op an optional argument? When not specified, it will just use the input password.
  3. As the backups in my case are big files and I do not want to allocate too much space over time, it would be great to have some kind of diff utility to check whether the newer backup file contains everything the old backup contains. I guess that might become difficult over time as Signal continues to change the database, but maybe you are interested in implementing such a feature. Just an idea.

@bepaald
Owner

bepaald commented Dec 19, 2019

@Esokrates

The upgrade seems to have worked very well. :-) The only issue now is that Signal makes a backup every day, and making one backup drains the battery badly because of the much bigger size now, but I guess I'll just disable the backups and manually back up every now and then / request a feature to manually specify the backup interval.

Good! I seem to remember an editable backup interval has been requested in the past, but nothing seems to have come of it. I would love that feature too, though; I'd probably set it to weekly.

Will you port all the merging features forward to the new database?

I plan on it. The normal merge was already ported, and I ported the --mergerecipients option earlier this week (I just had a bunch of other half-complete changes, so I couldn't push the update yet; I will later today though). I think it's only the --mergegroups option that still needs porting, and I think that is quite an easy one to do; maybe I'll get around to it this weekend, but I've been very busy this week. Of course, even though the new version's code is very similar to the old code, it's technically untested.

Could you make -op an optional argument? When not specified, it will just use the input password.

Yes. It always used to be like that, but when I added support for the input and output to be an unencrypted directory, I had to remove it: when outputting to a directory, no password is given (as it's unencrypted), and I used the fact that no output password was supplied to determine that the export should go to a directory. Anyway, locally I've already made the changes to have the program check the filesystem to find out whether the output is a file or a dir, so now the output password is optional again. Also, the program now refuses to overwrite files unless the --overwrite option is supplied. These changes will also be in the push later today.
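So, assuming these changes (file names are just placeholders), an invocation like this should work after the update:

./signalbackup-tools [input] [password] -o [output] --overwrite

The output will then be encrypted with the input password, unless you explicitly pass -op [password] to set a different one.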

As the backups in my case are big files and I do not want to allocate too much space over time, it would be great to have some kind of diff utility to check whether the newer backup file contains everything the old backup contains. I guess that might become difficult over time as Signal continues to change the database, but maybe you are interested in implementing such a feature. Just an idea.

Yes, I would love something like that, and I have given it quite a bit of thought, as it would be very useful in the test scripts I run after big changes (that's why I haven't pushed yet, still waiting for the tests to finish). But I really have not come up with any (simple) way to guarantee that two backups are equal in all cases. In simple cases (decode the backup to a directory, then pack the directory up using the same password) the backups are bit-for-bit identical, so that's easy. With some operations (split the backup in two parts, then combine them with --importthreads) the resulting file should be the same, but because the split+merge has changed most of the _id's in the database, I can only decode both files, check that all attachments are still there, and run some commands to compare the database contents (count the number of threads, the number of messages in each thread, and the number of attachments in each thread). That only makes it more plausible that the backups are the same, not a mathematical certainty.
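To give an idea of the kind of checks I mean: run the same counting query on both files and diff the results. Just a sketch (file names are placeholders, and the grep strips the program's progress output like you did earlier):

./signalbackup-tools [backup_old] [password] --runsqlquery "SELECT thread_id, COUNT(*) FROM sms GROUP BY thread_id" | grep -E "^[0-9]" > counts_old.txt
./signalbackup-tools [backup_new] [password] --runsqlquery "SELECT thread_id, COUNT(*) FROM sms GROUP BY thread_id" | grep -E "^[0-9]" > counts_new.txt
diff counts_old.txt counts_new.txt

The same could be done for the mms and part tables, but again, identical counts only make it plausible the backups match.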

In your specific use case, it might even be a little more difficult because you actually know and expect the files to be different. I will continue to think about this, but it will certainly take a while.

@bepaald
Owner

bepaald commented Nov 1, 2020

Since this issue has not had any comments in almost a year, just to clean up, I am closing this. Feel free to start a new one.

@bepaald bepaald closed this as completed Nov 1, 2020
devnoname120 added a commit to devnoname120/signalbackup-tools that referenced this issue Mar 8, 2023
Fix bepaald#9
Fix bepaald#85
Fix bepaald#70
Fix bepaald#53
Fix bepaald#38
Fix bepaald#4
Fix bepaald#1

Note: we could easily add support for g++-8 by copy/pasting the following changes: https://github.com/InfiniTimeOrg/InfiniSim/pull/83/files