HTTP 499 for polling iOS clients #677
This can't possibly be a timeout issue. The GET start and end are only 2 seconds apart. That's still long, but it shouldn't lead to a client-side timeout. Sadly, it looks like I cannot use Wireshark/tshark on Unix sockets, so I'm not sure I can see what's happening on the wire. 🤔
nginx:
ntfy:
I couldn't figure out how to intercept the traffic without restarting or messing with nginx. I am seeing evidence that this is not related to iOS at all, though, and is more likely caused by the thundering herd. It occasionally happens to the web app and Android clients as well. I am too tired now, but I will look more tomorrow.
Hmm. OK, I can reproduce it on a topic that has a lot of messages, just by doing:
Not surprising, but I am wondering if the non-iOS failures we're seeing are just these cases...
More data: as we already know, it happens every 20 minutes, almost exactly to the second:
It happens almost exclusively on iOS (ntfy/1.2), but that could be a false hint, since those clients all connect at the same time:
I am not really any closer. I added request timing to the nginx logs and tuned nginx.conf:
What I see is that during the thundering herd, ntfy requests take very long to complete in the backend (6-8 seconds), but the nginx timings do not match the ntfy backend at all. If requests really take this long, it is entirely conceivable that iOS hangs up.
In the example above (which is representative), the backend request took almost 9 seconds, but nginx reports a response time of 0.7s and a status code of 499. So it could be that the iOS client really did hang up after 0.7s (unlikely), and the backend request just kept going.
So: my profiling shows that the 499 spikes are most likely caused by CPU exhaustion. With this many queries, the bottleneck seems to be the SQLite queries themselves, not anything in particular in the code. I think we'd be fine with a bigger box, which is good to know, since the box that runs ntfy.sh is pretty small. As a concrete step, I will likely just disable iOS polling entirely (for now). This should not have any impact on iOS (hopefully), since polling is only a secondary measure, and almost all of the polls seem to fail right now anyway. It may even be beneficial, because the duplicate delivery issues will disappear. There is no good way to test this IMHO, so I'll just do it and see if iOS users scream. 😬
Disabled entirely in bdae48a