-
-
Notifications
You must be signed in to change notification settings - Fork 764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Messages received while client closed not decrypted #2733
Comments
Yup, seeing the same but didn't repro when I was using debug :) Basically they do get decrypted (afaics), since the notification shows them just fine (!!!) but on screen I get a red text with an error, then the text disappears and nothing is left. Will try to grab a log asap. |
FYI, I ran the same test, but instead of closing the client, I turned wifi off to the computer so that the client disconnected. When I turned wifi on and the client reconnected, I got the same result (not able to decrypt) for messages sent to both MUC and 1:1 while disconnected. Why would |
It looks like disconnecting/reconnecting (for instance in Then, I think at some point during the process but after MAM has returned values that can't be decrypted because the omemo_store isn't repopulated, |
Thanks for your detailed bug report and comments @afriedmanGlacier, it helps a lot. Based on everything you've wrote, your suspicion that it might be a race condition and looking at the code, I have a hunch as to what the issue is. It could be that MAM messages arrive and start to get parsed, before the OMEMO session has been fully restored, which is why I'm working on a fix. |
before parsing message stanza for encryption parameters. Otherwise we might not know what our own device-id/sid is, and therefore can't decrypt the incoming message. Updates #2733
Initial tests look good! Sent messages while client was closed and while client was disconnected. We'll test a bit more, but it looks promising! |
@afriedmanGlacier: I'm done with the work in this ticket's branch. Sounds like the issue is solved now? If so I'll merge the branch and close. |
Yep! That should be good, thanks!
…On Wed, Nov 24, 2021 at 4:46 PM JC Brand ***@***.***> wrote:
@afriedmanGlacier <https://github.com/afriedmanGlacier>: I'm done with
the work in this ticket's branch.
Sounds like the issue is solved now? If so I'll merge the branch and close.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2733 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFMALWQYSR2D3I4ROPJCRNTUNVMJ3ANCNFSM5ITU3C7A>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
Thanks everyone! |
So this is really fixed for you? Still seeing this with f25f110 I've written some messages to the Converse user while Converse was not online, so I login and see the errors: ...but the notifications look fine: And then it disappears totally. Console, with debug says (kinda for each message):
|
Sorry, I have not been seeing that error. It's possible that we changed something in our local copy that affects that issue, but I'd have to go back and compare to see if that's true. When I get the chance to look, I'll let you know if I find anything that looks like it might be related. Perhaps put your comments above in another ticket? |
Actually, we are seeing something like this again. I think it's the same issue. Can we re-open this ticket? Again, we are using a version of converse-desktop that has been updated with most of the latest commits. We are using IndexedDB, websockets, ejabberd 21.01, MAM When we lose connection due to desktop device going to sleep or loss of wifi, and messages are sent while there is no connection, and then we reconnect, we can still lose the messages in spite of the fixes in this ticket. This now seems to occur more in 1:1 messages than groups, and it may be due to timing. Here's the sequence I traced out:
ERROR: TypeError: Cannot read property 'get' of undefined at Object.getDeviceID When this happens, the message gets lost. For me, it brings up two questions:
EDIT: for the 2nd question, I just realized that even if we don't delete omemo_store, the sessions probably also need to be rebuilt before it tries to decrypt? |
I realized that promises are added to the api.promises array, but never removed. So, maybe once "OMEMOInitialized" happens the first time, the "waitUntil" method will always return immediately instead of waiting because the Promise has already been resolved? Maybe that Promise needs to be removed on disconnect and re-added? EDIT: I ended up adding "api.promises.add(['OMEMOInitialized']);" right after "delete _converse.omemo_store;" after the connection teardown in order to replace the Promise with a new one. That worked to cause getDeviceId and other things to wait for OMEMOInitialized to happen again, but then it still failed in "decryptWhisperMessage". So I'm not sure if there are other things that need to be re-initialized or not. |
Yup, still an issue with 0f648ed |
@licaon-kter in my testing that latest commit does fix most of the issue under the hood, but the messages are still lost so its not any different for the user. I am hopeful that #2803 should complete the fix. It works in my testing, I look forward to seeing if it works for you. |
@afriedmanGlacier I'll rebuild... |
@afriedmanGlacier: yesterday I made some small changes to what you did in #2803 to get the tests to pass. The main issue was that the parsed messages need to be reassigned to You might want to double check again that things work correctly for you. See here: 46e6cfb |
@jcbrand I tried almost that exact same thing before, and couldn't get it to work. Since then, I realized there are two different scenarios that react differently (and I was testing the one that didn't work).
In almost all of my initial testing I was turning off the wifi and waiting for the However, if the connection is gone for more than about 5 minutes, there is a message in converse saying that the session cannot be resumed and a new one will be created. There is a In real life, the more likely scenario is the second one (being offline more than 5 minutes), such as when a computer is locked or goes to sleep overnight. In this case, in my testing so far, your fix seems to be working. However, if I receive a message after the websocket is closed but before the time period when the session is reset, it doesn't always work. Another thing I noticed, if I am open to a conversation in the desktop app in scenario 1 described above, I am more likely to have the decryption issues on a message sent than if a message was sent to a different conversation. I've tried various things and found one solution that seems to work in both/all cases. It's a bit different than your commit. Assuming it proves to be solid after we test it a bit more, I'll either submit a PR or just post it here to see what you think. |
Thanks for the explanation @afriedmanGlacier. I think the difference between scenario 1 and 2 might be that in scenario 1 the delayed message is not received via MAM. If the session is resumed via SMACKS, then delayed messages are received as normal messages, not as MAM messages. The fix you made recently was for MAM messages. So the fix for scenario 1 is likely somewhere else, unrelated to MAM. Probably again some kind of race condition. |
Sounds like what I saw, mentioned here: #2803 (comment) |
Based on what I was seeing, that makes sense, but I think the race condition is actually related to MAM. I noticed yesterday that messages were coming in via both MAM and normal method (testing scenario 1 above), but if I remember correctly, I think the MAM message was coming in more quickly and was not getting decrypted for some reason. Then when the message came in normally and was properly decrypted, it was considered a duplicate and was thrown out. I added something from my local testing to your fix, and it seems to work although I'm not sure it's the best solution. In the handleMAMResult function of headless/plugins/mam/utils.js, I changed
to
Thoughts? |
That sounds like the real bug then. We need to figure out the "some reason", e.g. why certain messages don't get decrypted. Concerning your fix, that seems to be a bandaid around the fact that the message doesn't get decrypted. Better than nothing, but not ideal because the underlying issue isn't fixed. In your fix you're basically preventing the undecrypted message from being processed further and shown. You're however implicitly relying on the fact that the message will come again in a decryptable form, which isn't guaranteed. |
Thinking about this some more... the double-ratchet mechanism of libsignal prevents the same message from being decrypted twice. So it makes sense that when the same message comes in twice, once via MAM and once via SMACKS resumption, that decryption for one of them will fail. I think that might be the "some reason". If this is the case, it means the first message comes in, gets decrypted, then the second comes in, can't get decrypted, but it then still overwrites the first message, thereby removing the decrypted plaintext. In light of that, I think your fix might be on the right track. The code needs to be updated to be aware of the fact that an already decrypted message can't be decrypted again and it shouldn't be allowed to overwrite the already decrypted message (which seems to me is what's happening). There is message deduplicating code already, perhaps that code needs to be updated to deal with this scenario as well. |
Sounds like you have a much better idea of how to proceed than I do :) |
I'll see if I can reproduce it with a test and come up with a fix. |
@afriedmanGlacier Please check whether this commit fixes it for you: 906f8d6 EDIT: I noticed an error and force-pushed to the original commit |
I was testing with the original push, and was about to respond that it wasn't working quite right, but I'll test the update tomorrow. Thanks! |
Progress! But not quite there. Now it is decrypting, updating, and showing on screen briefly, but the message is still disappearing a few seconds later as if the I tried also adding this to the updated attributes, but it didn't help:
One thing I noticed is that the one message received after the session is resumed by SMACKS starts decrypting (getting into libsignal and thereby "winning" the race), but the other MAM message is received at almost the same time, fails and completes processing more quickly. So the message object that is saved is the failed one and should be updated, but I'm not sure why after it updates it still disappears as if it was ephemeral. Every once in a while the non-MAM message finishes decryption before the MAM message starts processing, and then the message stays on screen and doesn't disappear, like we would expect. |
I think I figured this last issue out. When Message is initialized (headles/plugins/chat/message.js), So I locally moved the below code outside of the
|
Testing with HEAD (0fb4dca) I saw the error again at one point, red text, notification with actual message decrypted. After an OS restart I shared a link from another account to this user, then I logged in Converse. That 1:1 chat had 2 pics shared. Waiting for the pictures to appear the link appeared first, but was positioned above the pics, basically was shown 5h earlier than the actual post time |
- Clear timer when a messages changes from epehemeral to non-ephemeral - Set MUC occupant on `groupchat` message when `type` changes to `groupchat` (from `error`) - Set roster contact on `chat` message when `type` changes to `chat` (from `error`) Thanks @afriedmanGlacier
@afriedmanGlacier: Great work, thanks!
Looks like a bug to me. I've pushed a commit that fixes it based on your analysis. Please let me know whether this is enough to close this ticket now |
Since you are listening for a change in
I'll test today and get back to you. Thanks! |
Looks like I missed |
Tests look good so far, I think this can be closed! Thank you @jcbrand! |
Looks okaish from here too |
Describe the bug
When re-opening the client after it has been closed, during which time encrypted messages have been sent to that user, the messages are not decrypted.
Note: This was originally part of #2241 but seems to be a different issue.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The decryption issue does not happen 100% of the time, but quite often. Messages should always be able to be decrypted.
Environment (please complete the following information):
Additional context
After the update for getting sender real JID (33f3cb7), we are correctly receiving the JID, but the decryption still often fails when opening the app for messages that arrived while closed. In looking at the message attributes during the decryption process, I noticed that it fails with no warning in plugins/omemo/utils.parseEncryptedMessage because it doesn't have attrs.encrypted.key.
In the attrs: "is_encrypted":true,"encrypted":{"device_id":"406865841"}. In other words it doesn't have the iv, key, etc.
However, the correct rid is clearly among the keys in the message. In headless/shared/parsers.getEncryptionAttributes, the line const device_id = _converse.omemo_store?.get('device_id'); doesn't return a device_id. The store appears to exist but device_id doesn't seem to be accessible yet when the first MAM messages come in for some reason.
In #2241 , @licaon-kter made the comment "Messages sent while Converse was not connected are almost surely not decrypted. Note: no key change, no IndexDB clean up. The old messages (already decrypted) are shown fine though." which sounds like it might be related to this particular error.
The text was updated successfully, but these errors were encountered: