Some publish requests on ntfy.sh take up to 15 seconds #338
In trying to fix this, I have encountered a horrible data race that I have not been able to figure out in quite some time. It appears to be happening when the Go HTTP code reads from the socket when closing the request, and causes a data race with the Here's what the data race stack looks like (see https://github.com/binwiederhier/ntfy/runs/6994933562?check_suite_focus=true):
This stack in particular indicates that something inside the Go stdlib is reading from the underlying connection/socket
Hopefully fixed; this will go out with the next server release.
Wrote this test to see if it fixed it and how much performance has improved. Before (note:
After (note:
So I've been noticing that every now and then some requests against ntfy.sh have been taking 11-15s (as opposed to <1s). At first I thought it was a problem with the Linux kernel tuning variables (somaxconn, nofile, ...). Then I thought it was nginx. After randomly poking around I found that the updateStatsAndPrune() code is likely to blame, because it locks the server mutex for a very long time (or so it appears). Here's what I saw:
This happened even when doing it against localhost:11080 (= not through nginx), meaning DNS and nginx could be ruled out.
I briefly turned on trace logging in ntfy and saw this:
This corresponds to this block of code:
ntfy/server/server.go, lines 1114 to 1150 in 4e29216
ntfy/server/server.go, line 1083 in 4e29216
Note the timestamps, 18:51:05 and 18:51:20: that's 15 seconds to run this code, meaning that all POST/PUT requests have to wait on the lock this entire time.
This is likely relatively easy to fix, and looking at the code it is obviously pretty inefficient.