Stats get corrupted after flood of queries #4358
Comments
@cadusilva, hello and apologies for a late response. To troubleshoot the issue we'd like to check your verbose log. Could you please reproduce the issue and collect it? You may send it to devteam@adguard.com with something like "Issue 4358" in the subject. It'd also be really helpful if you attach the query log file and the corrupted |
hello @EugeneOne1, currently my installation is set to answer queries only from a limited list of CIDR ranges (from my country). Since then, I didn't see the problem surface again. I'll clear the list of CIDRs and wait to see if the problem comes back again. Also, there's already an e-mail that I sent some weeks ago that contains part of the info requested in your reply. As soon as the problem appears again, I'll send another e-mail. Thank you! |
Hello again, here's a follow-up: as soon as the CIDR list was cleared, the issue surfaced again and the e-mail with the files is on its way. Today I saw that the stats were corrupted again. The only catch is that the query log only contains the data from the last 6 hours as I forgot to expand the retention window. But I hope it helps anyway. Any extra step or additional info, just say the word. |
@cadusilva, we've received it and are investigating, thanks. |
@cadusilva, hello again. Unfortunately, we can't reproduce the issue. Could you please answer a couple of questions to shed some light on the problem:
Also, could you please collect the browser's logs the next time you catch it? This would really help us. Thanks. |
Hello @EugeneOne1,
Currently, even limiting the CIDRs that can query my server to accept only IPs from Brazil isn't preventing the stats from being corrupted, as some foreign queries show up in the stats. All I know is that, from time to time, when I access the AGH WebUI, everything is zeroed and there's an error message in the corner. At the moment there's no CIDR filtering in place, so I'm just waiting for the next time the stats get corrupted. Then I'll send the browser console logs to you guys. This can be confusing and I'm not sure how it all happens, but it does happen. What I can send in the form of logs will be sent. |
Hello @EugeneOne1, I just sent a new e-mail with a new set of files, including the console log from Chrome. I noticed a few minutes ago that the problem had happened again, so I immediately gathered the files and sent the new e-mail. Hope this helps. Thank you. |
@cadusilva, we've received the data, many thanks. The issue seems kind of nontrivial, we'll dig further. |
@cadusilva, hello again and apologies for the long wait. Were the logs you sent recorded while the issue occurred? Also, does flushing the statistics fix the issue for some noticeable period of time? |
Hello there! Yes, I set the verbose log on and waited for the problem to happen. I don't know if flushing the statistics from the webui fixes it, but I just delete the |
Update: flushing the statistics from the webui General Settings also works after the problem occurs, and that's what just happened. |
Merge in DNS/adguard-home from 4358-fix-stats to master
Updates #4358. Updates #4342.
Squashed commit of the following:
commit 5683cb3 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Thu Aug 4 18:20:54 2022 +0300 stats: rm races test
commit 63dd676 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Thu Aug 4 17:13:36 2022 +0300 stats: try to imp test
commit 59a0f24 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Thu Aug 4 16:38:57 2022 +0300 stats: fix nil ptr deref
commit 7fc3ff1 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Thu Apr 7 16:02:51 2022 +0300 stats: fix races finally, imp tests
commit c63f5f4 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Thu Aug 4 00:56:49 2022 +0300 aghhttp: add register func
commit 61adc7f Merge: edbdb2d 9b3adac Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Thu Aug 4 00:36:01 2022 +0300 Merge branch 'master' into 4358-fix-stats
commit edbdb2d Merge: a91e4d7 a481ff4 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Wed Aug 3 21:00:42 2022 +0300 Merge branch 'master' into 4358-fix-stats
commit a91e4d7 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Wed Aug 3 18:46:19 2022 +0300 stats: imp code, docs
commit c5f3814 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Wed Aug 3 18:16:13 2022 +0300 all: log changes
commit 5e6caaf Merge: 091ba75 eb8e816 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Wed Aug 3 18:09:10 2022 +0300 Merge branch 'master' into 4358-fix-stats
commit 091ba75 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Wed Aug 3 18:07:39 2022 +0300 stats: imp docs, code
commit f2b2de7 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Tue Aug 2 17:09:30 2022 +0300 all: refactor stats & add mutexes
commit b3f11c4 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Wed Apr 27 15:30:09 2022 +0300 WIP
@cadusilva, hello again and apologies for the delayed response. We've finally improved the concurrency logic in the statistics module so that it handles shared memory more carefully. Could you please check the latest build in the FYI, our tests didn't show any significant performance losses relative to the old version, but we'd like to get your feedback as well. |
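For readers wondering what "working with shared memory more carefully" can look like in Go, here is a minimal sketch. It is not AdGuard Home's actual code: the `unit` type, its fields, and the `flood` helper are hypothetical names made up for illustration. It only shows the general technique of guarding shared counters with a mutex so that a burst of concurrent queries cannot lose increments or corrupt a map.

```go
package main

import (
	"fmt"
	"sync"
)

// unit is a hypothetical stand-in for one statistics bucket.  It is not
// AdGuard Home's real type.
type unit struct {
	mu      sync.Mutex
	total   uint64
	domains map[string]uint64
}

func newUnit() *unit {
	return &unit{domains: map[string]uint64{}}
}

// add records one query.  The mutex serializes writers, so concurrent
// calls can neither lose increments nor write to the map at the same time.
func (u *unit) add(domain string) {
	u.mu.Lock()
	defer u.mu.Unlock()

	u.total++
	u.domains[domain]++
}

// snapshot returns the total and a copy of the per-domain counters, so
// the caller never reads the map while a writer is modifying it.
func (u *unit) snapshot() (total uint64, domains map[string]uint64) {
	u.mu.Lock()
	defer u.mu.Unlock()

	domains = make(map[string]uint64, len(u.domains))
	for k, v := range u.domains {
		domains[k] = v
	}

	return u.total, domains
}

// flood simulates a burst of queries arriving on many goroutines at once.
func flood(u *unit, workers, perWorker int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				u.add(fmt.Sprintf("host%d.example", id%4))
			}
		}(i)
	}
	wg.Wait()
}

func main() {
	u := newUnit()
	flood(u, 16, 1000)

	total, domains := u.snapshot()
	fmt.Println("total queries:", total, "distinct domains:", len(domains))
}
```

Running a program like this with `go run -race` is a common way to check that no unsynchronized access remains. Without the mutex, the race detector would be expected to flag the concurrent map writes.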
Hello @EugeneOne1, no problem. I just installed version |
Bad news @EugeneOne1: it happened again. Yesterday everything was fine, then a few minutes ago I accessed the WebUI and the stats were corrupted again. Unfortunately, the verbose log wasn't enabled this time. |
@cadusilva, thanks for the time you're contributing. We're going to investigate it further. |
Thank you Eugene for looking into it. I have now enabled verbose log so when it happens again I'll have more details for you guys to debug the issue. |
Hello @EugeneOne1, I've just sent another e-mail with a tarball containing everything including logs so you guys can take a look into the latest occurrence. Thank you! |
@cadusilva, we've received it and are looking into it, thanks. |
@cadusilva, could you please also try to access the Web UI directly (bypassing Nginx) while using the same "bad" file? Does the issue reproduce? If yes, does it also reproduce in other browsers? |
Sure, tonight I'll run the tests and, when I get the results, I'll post them here @EugeneOne1. |
Hello @EugeneOne1, I just did some tests and found interesting results. By the way, the issue had just happened, so I downloaded the "corrupted" installation and cleared the zeroed statistics via the dashboard so it started to count queries again. Great, so the next thing I did was to rename Then I stopped the nginx and AdGuardHome services and changed the AGH ports so it would listen on 80 and 443, besides the other default ports for DNS over TLS and DNS over QUIC. Next I restarted AdGuardHome to apply the new port settings and to use the "corrupt" Then everything was zeroed. To be sure, I stopped nginx one more time and restarted AGH set to use ports 80 and 443 directly instead of being reverse-proxied by nginx. And there were all the stats from the "corrupt" For closure, I did these steps a third time and yes, the "corrupt" Here are some pictures: AdGuard Home directly: AdGuard Home via nginx: |
@cadusilva, it seems we've finally found a source of the issue. I'd say you may want to revisit your Nginx configuration. I'm not really familiar with it, but we've received a couple of related issues (e.g. #4727). Perhaps you could configure Nginx to collect some logs and let us see them, so that we can enhance the documentation about using a reverse proxy. |
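The direct-versus-proxied comparison above can also be done without a browser. The sketch below is only an illustration under stated assumptions: both URLs are supplied by the user on the command line, the `AGH_USER` and `AGH_PASS` environment variable names are made up for this example, and it assumes the endpoint being queried returns JSON and accepts HTTP basic auth. It fetches the same resource once directly and once through the proxy, then reports the status codes, body sizes, and whether the two payloads decode to the same JSON value.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"reflect"
	"time"
)

// sameJSON reports whether two payloads decode to equal JSON values,
// ignoring whitespace and object key order.
func sameJSON(a, b []byte) (bool, error) {
	var va, vb interface{}
	if err := json.Unmarshal(a, &va); err != nil {
		return false, fmt.Errorf("decoding first payload: %w", err)
	}
	if err := json.Unmarshal(b, &vb); err != nil {
		return false, fmt.Errorf("decoding second payload: %w", err)
	}

	return reflect.DeepEqual(va, vb), nil
}

// fetch performs a GET request, optionally with basic auth, and returns
// the status code and the full response body.
func fetch(c *http.Client, url, user, pass string) (int, []byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, nil, err
	}
	if user != "" {
		req.SetBasicAuth(user, pass)
	}

	resp, err := c.Do(req)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)

	return resp.StatusCode, body, err
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: compare DIRECT_URL PROXIED_URL")
		return
	}

	c := &http.Client{Timeout: 10 * time.Second}
	// Hypothetical variable names; use whatever credentials your setup needs.
	user, pass := os.Getenv("AGH_USER"), os.Getenv("AGH_PASS")

	ds, db, err := fetch(c, os.Args[1], user, pass)
	if err != nil {
		fmt.Println("direct request failed:", err)
		return
	}
	ps, pb, err := fetch(c, os.Args[2], user, pass)
	if err != nil {
		fmt.Println("proxied request failed:", err)
		return
	}

	fmt.Printf("direct:  status %d, %d bytes\n", ds, len(db))
	fmt.Printf("proxied: status %d, %d bytes\n", ps, len(pb))

	same, err := sameJSON(db, pb)
	if err != nil {
		fmt.Println("could not compare as JSON:", err)
		return
	}
	fmt.Println("payloads equal:", same)
}
```

On a live server the counters can change between the two requests, so small differences are expected. A different status code, a much shorter body, or a body that fails to decode on only one of the two paths is the more telling signal.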
Hello @EugeneOne1, what kind of logs do you guys want? Currently there's already an error log going on. Would it suffice? Next time the issue happens, I'll send both AGH and nginx logs so you guys can investigate. I'm also not very familiar with nginx configs to pinpoint the problem and set things right but maybe with these logs a direction can be found. I'll also do a little research about how I can improve the communication between nginx and AGH and see if things improve. Thank you. |
First thing I did was to change my configuration. The line To test, I set Firefox to use encrypted DNS and everything works fine and quicker so far. By the way, as a result, AGH log doesn't see the query as encrypted anymore and shows it as "Plain DNS". But I think this is expected, as nginx is the one handling encryption right now and communicating with AGH without encrypting anything in Updates:
|
@EugeneOne1 so these are the initial findings: most clients and the server itself don't get along with the changes in the configuration, and AGH seems to reset the connection. Firefox, for some reason, is one of a kind and doesn't complain. |
Based on this blog post by Nginx staff, I did some other changes. dns.conf:
nginx.conf:
There's also this cache thingy but the specified folder remains empty.
I'll keep an eye now to see how everything works. The |
Hello @EugeneOne1, unfortunately the changes didn't solve the problem. I just became aware that it happened again. It always happens not too far beyond 100k queries. I'll check the nginx log and send everything to the devs e-mail. Thank you. |
@cadusilva, hello. Just to make sure, did it happen without the Nginx proxy? We'll dig further then. |
It happened with nginx as middleman, I cannot bypass nginx or the other sites will become offline. I was testing the new nginx settings to see if it'd solve the problem without taking nginx out of the equation. |
@cadusilva, I've looked through the Nginx docs and came up with a few suggestions:
Also, I've just noticed that you mentioned the Nginx log. Is it possible to get the part of it that covers the error occurrence? |
Hello @EugeneOne1, I've just applied your suggestions. Here are the relevant bits:
In this file there's also
About the end of your last message: I sent an e-mail two days ago with a few files, including the nginx log, but I couldn't find any relevant bit about the issue myself. I guess I'll check the log level and watch what happens now with the edits to the files. Thank you. |
I'm afraid I can't tell for sure. Actually, the main suspect for the moment is the |
@EugeneOne1 I'm still monitoring to see if the issue happens again. But I'm not sure it will, as I am now running AdGuardHome on a machine way more powerful than the previous Raspberry Pi (now sold to someone else). It's now a Ryzen 5 3550H, soon to have 24 GB of DDR4 2400 MHz RAM. If the problem doesn't come back, maybe the issue has to do with the RPi not being powerful enough to deal with this I'll keep watching and will comment here if something new happens. |
@cadusilva, I'd be surprised if this was the actual cause, since the out-of-resources kind of problem usually causes issues in all parts of the system. Although, extra usage data never harms, so you're always welcome to share your findings. Besides, there is a quick way to check it out by simply replacing the stats and query log data files in the current setup, creating backups of the existing data beforehand. |
@EugeneOne1 I mean, there's now a lot more processing power than before. This is where the RPi struggles. It is a very competent piece of hardware and I hosted all kinds of stuff using it, but it's not powerful and it's not even meant to outperform a gaming rig or something. My SBC was the 4B version with 8 GB of RAM, so it had a lot of resources and used less than 2/8 of them (at least speaking of RAM usage). So processing is the Achilles' heel of this little computer, and it sometimes struggled with things like refreshing all the blocking lists I use to see if there's an update and then processing it, for example. And I also have a lot of clients with a lot of CIDRs, so when you mentioned the But here's another piece of information: I replaced the current Again, all the stats were there. Currently, I'm running the edge release But these are my latest findings so you guys can analyse. So far, there are no new occurrences. They used to happen shortly after hitting 100k queries; I couldn't get to 200k before the zeroed dashboard happened. I'll keep watching. Thank you. |
Hello @EugeneOne1, so far there are no new occurrences. I'm almost at 300,000 queries on record, with zero new issues. Maybe the number of clients and their CIDRs, the blocklists and the weight of the dashboard are too much for the Raspberry Pi to handle when these three factors come together. It's just a theory. |
@cadusilva, I don't recall whether you've already mentioned it, but does Nginx run on the same machine as AGH? |
Yes, everything is in the same machine and nginx acts as a reverse proxy. |
If there were no new occurrences, I'd say that it was an HTTP proxy issue. We'll close this issue, if you don't mind. |
It's okay, there really are no new occurrences so far. They stopped after I moved from a Raspberry Pi, 8 GB, SSD to a Ryzen 5 3550H, 16 GB RAM and NVMe (as of now). The setup is the same, with HTTP proxy and everything in between, but the hardware has changed to a much more powerful one and the problem is now gone. Thank you guys for looking into it, if it ever happens again I'll let you know. |
Merge in DNS/adguard-home from 4358-stats-races to master
Updates AdguardTeam#4358
Squashed commit of the following:
commit 162d17b Merge: 17732cf d4c3a43 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Wed Aug 17 14:04:20 2022 +0300 Merge branch 'master' into 4358-stats-races
commit 17732cf Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Wed Aug 17 13:53:42 2022 +0300 stats: imp docs, locking
commit 4ee0908 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Tue Aug 16 20:26:19 2022 +0300 stats: revert const
commit a7681a1 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Tue Aug 16 20:23:00 2022 +0300 stats: imp concurrency
commit a6c6c1a Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Tue Aug 16 19:51:30 2022 +0300 stats: imp code, tests, docs
commit 954196b Merge: 281e00d 6e63757 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Tue Aug 16 13:07:32 2022 +0300 Merge branch 'master' into 4358-stats-races
commit 281e00d Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Fri Aug 12 16:22:18 2022 +0300 stats: imp closing
commit ed036d9 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Fri Aug 12 16:11:12 2022 +0300 stats: imp tests more
commit f848a12 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Fri Aug 12 13:54:19 2022 +0300 stats: imp tests, code
commit 60e11f0 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Thu Aug 11 16:36:07 2022 +0300 stats: fix test
commit 6d97f1d Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Thu Aug 11 14:53:21 2022 +0300 stats: imp code, docs
commit 20c70c2 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Wed Aug 10 20:53:36 2022 +0300 stats: imp shared memory safety
commit 8b39456 Author: Eugene Burkov <E.Burkov@AdGuard.COM> Date: Wed Aug 10 17:22:55 2022 +0300 stats: imp code
Issue Details
Version of AdGuard Home server:
How did you install AdGuard Home:
How did you setup DNS configuration:
CPU architecture:
Operating system and version:
Expected Behavior
Stats page working normally showing all the info it shows.
Actual Behavior
From time to time the stats page shows zero everywhere and empty domain lists (most blocked, most queried, etc.). When accessing the page at the time it happens, an error appears in the bottom-right corner saying:
AdGuard apparently continues to work normally, resolving names, and the other pages keep working too, but the stats page is broken. My guess is that after a flood of queries from a bunch of CN/HK/SG addresses, the stats file somehow gets corrupted.
The last batch was repeatedly composed of queries like this:
Additional Information
The upstream is Unbound 1.13.1. If possible, please provide an e-mail where I can forward the more sensitive info like query and stats file for analysis if needed.