Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Do not lose tweets on TLS issues #9

Closed
IBBoard opened this issue Sep 1, 2019 · 24 comments
Closed

Do not lose tweets on TLS issues #9

IBBoard opened this issue Sep 1, 2019 · 24 comments
Labels
enhancement New feature or request external Problems caused by dependencies and other systems

Comments

@IBBoard
Copy link
Owner

IBBoard commented Sep 1, 2019

Sometimes (possibly just on one network) I get TLS connection errors. Normally this happens when RTing, but occasionally it happens when tweeting. Normally, Corebird/Cawbird keeps the last tweet, but in one specific situation that I think is related to the TLS errors then it forgets it. My assumption is that it doesn't handle the error correctly and treats it as posted.

@IBBoard IBBoard added bug Something isn't working info required This doesn't seem right or more details are required and removed info required This doesn't seem right or more details are required labels Sep 1, 2019
@IBBoard
Copy link
Owner Author

IBBoard commented Sep 3, 2019

Error is:

Exception: Error performing TLS handshake: An unexpected TLS packet was received in TweetUtils.vala:91

At least it's from Vala code and we get a line number!

@IBBoard
Copy link
Owner Author

IBBoard commented Sep 3, 2019

This may be handled correctly on sending tweets. start_send_tweet in ComposeTweetWindow.vala only wipes the last message when the async finishes and the tweet succeeds (src).

But save_last_tweet only saves the text when we're not replying to something. So maybe we lose replies.

IBBoard added a commit that referenced this issue Sep 7, 2019
Previous code assumed a best-case situation that never failed because of network errors.
It also forgot to bail after an exception.

RTs now say whether they succeeded or not.
IBBoard added a commit that referenced this issue Sep 7, 2019
Added bonus: Favouriting changes the fav count!

Also, added code reuse
IBBoard added a commit that referenced this issue Sep 7, 2019
Still need to send a DELETE message so that the tweet
disappears everywhere.

Probably also need to handle DELETE on ProfilePage.
IBBoard added a commit that referenced this issue Sep 8, 2019
This also has to:
 * Load the tweet asynchronously
 * Add fields to save the tweet ID
 * Change when we save/clear
 * Add extra UI for clearing saved state (e.g. if replying to a
    now deleted tweet)
@IBBoard
Copy link
Owner Author

IBBoard commented Sep 10, 2019

From initial testing then the patches are working. I've had an RT that failed that then reset back to the "not RTed" status so that I could RT properly. Previously the UI would say RTed, but the dialog says it failed, then you'd un-RT and it'd complain that the tweet didn't exist, and then you'd RT.

The only problem is that I've had to add a "Save tweet?" dialog to give users a way out when they quote a tweet, it fails for transient reasons (like the TLS issue), and then by the time they go to post it again then it fails to load the tweet and so they want to clear the draft tweet.

One thing I need to think about is attached media. I'm not sure how well we can handle keeping draft attached images.

@IBBoard
Copy link
Owner Author

IBBoard commented Sep 13, 2019

Today was a REALLY bad day for these errors. Happened repeatedly. Not a clue what's happening with Twitter/the network. I briefly tried Wireshark but it wasn't hugely helpful.

However, I did find that I've got more work to do:

  • Fix the double notification (I'm considering rearchitecting how that works)
  • Don't inject the DM until we know it sent
  • Reset the "Send" status and stop the spinner when tweeting fails

Also, I still need to look at attached media.

IBBoard added a commit that referenced this issue Sep 14, 2019
 * Set status based on correct flag
 * Don't close action panel until success (as per RT button)
IBBoard added a commit that referenced this issue Sep 14, 2019
 * Alert is now centred
 * Emoji button gets disabled
 * Send button gets reactivated
 * Title goes back to text, not spinner
 * Image buttons become active again
IBBoard added a commit that referenced this issue Sep 14, 2019
 * Errors are bubbled out to the caller (through async pattern
   used for `download_avatar` method)
 * Alert is now centred
 * Emoji button gets disabled
 * Send button gets reactivated
 * Title goes back to text, not spinner
 * Image buttons become active again
IBBoard added a commit that referenced this issue Sep 14, 2019
 * Swap to injecting DM message that Twitter gives in response
 * Only inject DM when it succeeds
 * Track whether we've seen DMs, not whether it's newer than the newest
   (in case the other party replied before our last message, but we
   don't poll until we added our message to the model)
IBBoard added a commit that referenced this issue Sep 15, 2019
This solves a crash from the null assertion in cb_compose_job_send_async
@IBBoard
Copy link
Owner Author

IBBoard commented Sep 15, 2019

Okay, I think that tweet, quote, reply, DMs, RT/unRT, favourite/un-favourite and delete are all working now (as in they catch the error and report it to the user, then put the system in a sensible state).

We've also got the "Save?" dialog when you close the compose window with text. I'm not hugely keen on it, but it gives us a way out of the "compose a reply, sending fails, re-open compose later, try to load tweet that you were replying to, tweet doesn't exist, cancel button saves tweet, re-open compose, try to load tweet, tweet doesn't exist…" loop. But as I type this then I'm wondering whether we're better off asking the user whether to ditch the saved tweet when fetching the reply fails.

Now I need to look at saving attached images and work out whether it's better to save the local paths and handle the user deleting them, or save the remote paths (which would lose incomplete uploads) and work out how to download the thumbnail.

Technical detail

From running Wireshark and decrypting my comms (instructions) and then running with GNUTLS_DEBUG_LEVEL set and poking around the gnutls source code then it looks like the problem comes when Twitter sends a "Server Hello" with "New Session Ticket" as well as "Change Cipher Spec".

gnutls[4]: HSK[…]: Server's version: 3.3
gnutls[4]: HSK[…]: SessionID length: 32
gnutls[4]: HSK[…]: SessionID: 5fe…
gnutls[4]: EXT[…]: Parsing extension 'Safe Renegotiation/65281' (1 bytes)
gnutls[4]: HSK[…]: Safe renegotiation succeeded
gnutls[5]: REC[…]: SSL 3.3 Handshake packet received. Epoch 0, length: 186
gnutls[5]: REC[…]: Expected Packet ChangeCipherSpec(20)
gnutls[5]: REC[…]: Received Packet Handshake(22) with length: 186
gnutls[5]: REC[…]: Decrypted Packet[1] Handshake(22) with length: 186
gnutls[3]: ASSERT: record.c[recv_hello_request]:774
gnutls[3]: ASSERT: record.c[_gnutls_recv_in_buffers]:1579
gnutls[3]: ASSERT: record.c[_gnutls_recv_int]:1777
gnutls[3]: ASSERT: handshake.c[recv_handshake_final]:3272
gnutls[3]: ASSERT: handshake.c[handshake_client]:3074
gnutls[3]: ASSERT: buffers.c[_gnutls_io_read_buffered]:589
gnutls[3]: ASSERT: record.c[_gnutls_recv_int]:1777
(cawbird:14730): cawbird-WARNING **: 11:43:55.926: Could not send tweet: Error performing TLS handshake: An unexpected TLS packet was received.

I still don't know why this is a problem, but the gnutls state machine appears to be in an incorrect state (or Twitter are sending things out of order) and it's expecting the wrong bit.

@IBBoard
Copy link
Owner Author

IBBoard commented Sep 15, 2019

Question answered on the "save local image path or uploaded image" - from the docs:

  • The returned media_id and media_key are only valid for expires_after_secs seconds. Any attempt to use either after this time period in other endpoints will result in an HTTP 4xx Bad Request.

(Emphasis mine)

Given that we don't know when we'll come back to it, we'll have to assume that users keep the images that they're attaching (and if they don't then it's their fault!). The example shows 24h, which will generally be sufficient, but we can't guarantee that'll be the case.

IBBoard added a commit that referenced this issue Sep 15, 2019
 * Only prompt the user about saving when a tweet load fails
   * If it fails and they say no, reset everything
   * If it fails and they say yes, close so they can try again
     (maybe it was a transient error)
 * Follow the "throw errors" pattern for `TweetUtils.get_tweet`
IBBoard added a commit that referenced this issue Sep 15, 2019
File/line numbers are already recorded, and we can look up the
readable text for the domain quark
@IBBoard
Copy link
Owner Author

IBBoard commented Sep 15, 2019

Okay, I've improved how we handle saving - most users won't see a prompt. You'll now only get prompted if you compose with a draft reply/quote and it fails to load the referenced tweet (e.g. TLS error).

Only thing to do now is load/save draft images.

I've patched my custom version of Cawbird with the branch as it stands, so I'll do some testing before I merge.

IBBoard added a commit that referenced this issue Sep 20, 2019
This is in-line with RT and Favourite buttons

(Reply and Quote are different because the dialog can't fail, and
the compose dialog is then responsible for handling TLS errors)
IBBoard added a commit that referenced this issue Sep 28, 2019
Baedert added "draft" texts, but now we support draft images as well.

If a TLS error happens and the user doesn't close the compose window
then we can use the existing IDs because they're live for ~24h and the
user probably won't wait that long.

If they close the window then it could be re-opened the next day so we
can't rely on the image IDs on the Twitter server still being live, so
we need to re-upload them.
IBBoard added a commit that referenced this issue Sep 28, 2019
We hope the images are still there when we re-open the compose window.
But maybe they aren't.
@IBBoard
Copy link
Owner Author

IBBoard commented Sep 28, 2019

I've put in some load/save draft images stuff now. Possibly not so relevant since we now don't close the window and lose the tweet if it fails anyway, but a) useful for when people give up and try again later and b) it's let me spot another TLS error case to handle!

Still to go:

  • TLS error during timeline loading
  • TLS error during image upload

Anything I missed?

I'm vaguely suspicious that it's a GnuTLS bug underlying all of this, but I can't track it down and I don't know that level of the network stack enough to just drop in to their community and go "what's going on here?". Also, it could be GnuTLS following the spec and Twitter not doing.

IBBoard added a commit that referenced this issue Sep 28, 2019
This required a swap to a button (not a widget) and hooking up events

We also remove classes on upload completion to make sure we don't
get conflict
IBBoard added a commit that referenced this issue Sep 28, 2019
If timeline/mentions/favourites/DMs fails because of the specific
SSL error code then retry.

This appears to work, but it *might* cause problems (e.g. hammering out
requests with no back-off approach when the certificate is broken)
@IBBoard
Copy link
Owner Author

IBBoard commented Sep 28, 2019

Okay, timeline loading issues have a basic cludge. Image upload is sorted. I'm going to run a test build for a while to see if I hit any issues.

@IBBoard
Copy link
Owner Author

IBBoard commented Oct 2, 2019

It looks like GnuTLS are acknowledging this as a bug, so all of these changes will be defensive and shouldn't be needed in future. It appears to be something that Twitter changed that GnuTLS doesn't handle cleanly.

https://gitlab.com/gnutls/gnutls/issues/841#note_225110002

IBBoard added a commit that referenced this issue Oct 5, 2019
Paging isn't forcibly reloaded because the user can just try again
but these functions are "load at start" actions
@IBBoard
Copy link
Owner Author

IBBoard commented Oct 5, 2019

Okay, I think this is about done now. I've still got to look in to the GnuTLS side, but most things should now be handled correctly, fail and retry, or fail silently for some remaining less important things.

I've merged to Master and will release soon, so give me a shout if there's anything else that people find.

@IBBoard
Copy link
Owner Author

IBBoard commented Oct 7, 2019

It looks like there's a simple fix upstream in GnuTLS that appears to work for me.

It's currently scheduled for v3.6.11 in December.

In the mean-time, I'll consider this as fixed as it's going to be, unless someone tells me otherwise.

@schmittlauch
Copy link
Contributor

@IBBoard Have you really checked compiling cawbird with GnuTLS from master?
I tried backporting the patches from this commit and compiled cawbird with a glib-networking that uses this patched GnuTLS in NixOS, but I still get TLS errors.

I want to test building cawbird agains the latest GnuTLS master but currently fail at building the latter.

@IBBoard
Copy link
Owner Author

IBBoard commented Nov 1, 2019

I've been running with a custom build of GnuTLS for a few days that basically just adds MR 1087 to the 3.6.10 code and I've not seen a single TLS error.

I don't believe it's glib-networking that needs recompiling with the updated GnuTLS. I think it's libsoup. I've rebuilt GnuTLS but not glib-networking.

@schmittlauch
Copy link
Contributor

I don't believe it's glib-networking that needs recompiling with the updated GnuTLS. I think it's libsoup.

That is pretty interesting, as libsoup does not even take gnutls as a dependency in NixOS but just depends on glib-networking.
And that does take gnutls as a dependency.

@IBBoard
Copy link
Owner Author

IBBoard commented Nov 2, 2019

Huh, maybe you're right. Maybe I was reading old docs where libsoup used GnuTLS directly. I just rebuilt them both and it worked, but looking at build specs and ldd output then libsoup doesn't reference gnutls but does reference GIO's g_tls_… functions and glib-networking provides /usr/lib64/gio/modules/libgiognutls.so, which loads libgnutls.so.30. I might not need to rebuild libsoup.

@schmittlauch
Copy link
Contributor

Good news, having built Cawbird against NixOS-unstable with the latest GnuTLS release, I haven't encountered any TLS issues for several hours so far \0/

Good job.

@IBBoard
Copy link
Owner Author

IBBoard commented Jan 8, 2020

I just debugged it. The GnuTLS people knew the protocols enough to understand what to do to fix it!

@philipzae
Copy link

I'm running Ubuntu 20.04 with GnuTLS 3.6.13 and still getting the error.

@philipzae
Copy link

Well i'm running the snap which has a 18.04 base, so maybe that's the issue then.

@j1warren
Copy link

I've never stopped having this error the first time I try to open any tweet in the timeline.
Pressing back and trying again then works on any tweet.
18.04, 1.0.5 now.

@IBBoard
Copy link
Owner Author

IBBoard commented Apr 18, 2020

As far as I can tell, it looks like Ubuntu 18.04 is still using an old version of GnuTLS (possibly 3.5.18).

From what I understand of Snap, I think it bundles its own version of many libraries and I assume it is also outdated if you're seeing the errors.

There's nothing we can do. The changes in this ticket put in the best work-arounds we can without causing different issues. The only real solution is for distros and "application distribution systems" that insist on bundling their own copies of libraries to update their libraries to fix the real bug.

@philipzae
Copy link

I've filed a bug with the snap dev of cawbird as this is fixable in snaps.

@philipzae
Copy link

Patch to fix it in 18.04 and 19.10 was submitted by the snap maintainer.
https://bugs.launchpad.net/ubuntu/+source/gnutls28/+bug/1873565

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request external Problems caused by dependencies and other systems
Projects
None yet
Development

No branches or pull requests

4 participants