
Stats get corrupted after flood of queries #4358

Closed
cadusilva opened this issue Mar 4, 2022 · 53 comments
@cadusilva

Issue Details

  • Version of AdGuard Home server: v0.107.4
  • How did you install AdGuard Home: GitHub releases
  • How did you setup DNS configuration: System
  • CPU architecture: ARM64/aarch64
  • Operating system and version: Debian 11

Expected Behavior

The stats page works normally, showing all the information it usually shows.

Actual Behavior

From time to time the stats page shows zeroes everywhere and empty domain lists (most blocked, most queried, etc.). When accessing the page while it happens, an error appears at the bottom-right corner saying:

Error: control/clients/find?ip0=127.0.0.1&ip1=206.42.33.166&ip2=10.0.0.3&ip3=45.164.223.2&ip4=motog&ip5=joaouerj&ip6=131.159.24.242&ip7=179.190.172.16&ip8=220.180.241.26&ip9=177.25.155.98&ip10=177.25.153.160&ip11=177.25.150.57&ip12=154.202.55.217&ip13=177.25.153.142&ip14=45.205.48.180&ip15=185.249.221.238&ip16=183.90.186.240&ip17=177.25.156.157&ip18=177.25.144.182&ip19=154.198.205.174&ip20=45.205.35.196&ip21=156.229.9.224&ip22=154.198.219.21&ip23=177.25.153.187&ip24=177.25.144.89&ip25=177.25.150.41&ip26=10.0.0.1&ip27=177.25.153.233&ip28=177.25.159.130&ip29=177.25.159.162&ip30=191.47.16.69&ip31=177.25.153.174&ip32=177.25.155.148&ip33=177.25.144.221&ip34=177.25.153.216&ip35=177.25.156.1&ip36=177.25.159.132&ip37=177.25.155.166&ip38=177.25.145.82&ip39=177.25.151.167&ip40=177.25.150.32&ip41=88.80.186.137&ip42=177.25.156.159&ip43=177.25.153.172&ip44=177.25.155.73&ip45=177.25.145.50&ip46=177.25.157.85&ip47=14.1.112.177&ip48=170.106.176.49&ip49=177.25.150.119&ip50=131.159.25.7&ip51=146.88.240.4&ip52=103.203.59.3&ip53=209.141.45.192&ip54=39.129.8.129&ip55=45.83.67.25&ip56=177.25.156.143&ip57=129.250.206.86&ip58=174.138.40.30&ip59=162.142.125.133&ip60=71.6.232.7&ip61=141.22.28.227&ip62=27.98.224.20&ip63=184.105.139.117&ip64=64.62.197.37&ip65=45.79.15.228&ip66=37.44.239.30&ip67=159.89.194.175&ip68=202.112.238.56&ip69=177.25.156.77&ip70=185.180.143.142&ip71=185.180.143.73&ip72=162.142.125.212&ip73=54.173.29.204&ip74=185.180.143.76&ip75=146.88.240.12&ip76=141.212.123.193 | Network Error

AdGuard apparently continues to work normally and keeps resolving names, and the other pages also work, but the stats page breaks. My guess is that after a flood of queries from a bunch of CN/HK/SG addresses, the stats file somehow gets corrupted.

The last batch was composed of repeated queries like this one:

{"T":"2022-03-04T02:39:59.960278275-03:00","QH":"microsoft.com","QT":"TXT","QC":"IN","CP":"","Answer":"APyBgAABABAAAAABCW1pY3Jvc29mdANjb20AABAAAcAMABAAAQAADR8AKyphcHBsZS1kb21haW4tdmVyaWZpY2F0aW9uPTBnTWVhWXlZeTZHTFZpR2/ADAAQAAEAAA0fAEVEZ29vZ2xlLXNpdGUtdmVyaWZpY2F0aW9uPXBqUE9hdVNQY3JmWE9aUzlqblBQYTVheG93Y0hHQ0RBbDFfODZkQ3FGcGvADAAQAAEAAA0fABsaZmcydDBnb3Y5NDI0cDJ0ZGN1bzk0Z29lOWrADAAQAAEAAA0fABsadDdzZWJlZTUxanJqN3ZtOTMyazUzMWhpcGHADAAQAAEAAA0fAEVEZ29vZ2xlLXNpdGUtdmVyaWZpY2F0aW9uPU0tLUNWZm5fWXdzVi0yRkdiQ3BfSEZhRWoyM0JtVDBjVEY0bDhoWGdwdk3ADAAQAAEAAA0fACEgcGJjcGN3ODRzZms3dzRuaG03ZHd5ZzJrM2d4MHQ0eHLADAAQAAEAAA0fAC4tZG9jdXNpZ249ZDVhMzczN2MtYzIzYy00YmQwLTkwOTUtZDJmZjYyMWYyODQwwAwAEAABAAANHwC+vXY9c3BmMSBpbmNsdWRlOl9zcGYtYS5taWNyb3NvZnQuY29tIGluY2x1ZGU6X3NwZi1iLm1pY3Jvc29mdC5jb20gaW5jbHVkZTpfc3BmLWMubWljcm9zb2Z0LmNvbSBpbmNsdWRlOl9zcGYtc3NnLWEubWljcm9zb2Z0LmNvbSBpbmNsdWRlOnNwZi1hLmhvdG1haWwuY29tIGluY2x1ZGU6X3NwZjEtbWVvLm1pY3Jvc29mdC5jb20gLWFsbMAMABAAAQAADR8AODdhZG9iZS1zaWduLXZlcmlmaWNhdGlvbj1jMWZlYTliNGNkZDRkZjBkNTc3ODUxN2YyOWUwOTM0wAwAEAABAAANHwAuLWRvY3VzaWduPTUyOTk4NDgyLTM5M2QtNDZmNy05NWQ0LTE1YWM2NTA5YmZkZMAMABAAAQAADR8AXVxhZG9iZS1pZHAtc2l0ZS12ZXJpZmljYXRpb249OGFhMzVjNTI4YWY1ZDcyYmViMTliMWJkM2VkOWI4NmQ4N2VhN2YyNGIyYmEzYzk5ZmZjZDAwYzI3ZTlkODA5Y8AMABAAAQAADR8AJSRkMzY1bWt0a2V5PTRkOGJueWN4NDBmeTM1ODFwZXR0YTRnc2bADAAQAAEAAA0fAFlYOFJQRFhqQnpCUzl0dTdQYnlzdTdxQ0FDcndYUG9EVjhadExmdGhUbkM0eTlWSkZMZDg0aXQ1c1FsRUlUZ1NMSjRLT0lBOHBCWnhteXZQdWp1VXZoT2c9PcAMABAAAQAADR8ARURnb29nbGUtc2l0ZS12ZXJpZmljYXRpb249MVRlSzhxME96aUZsNFQxdEYtUVI2NUprekhaMXJjZGdOY2NERnA3OGlUa8AMABAAAQAADR8AJSRkMzY1bWt0a2V5PTN1YzFjZjgyY3B2NzUwbHprNzB2OWJ2ZjLADAAQAAEAAA0fADw7ZmFjZWJvb2stZG9tYWluLXZlcmlmaWNhdGlvbj1md3p3aGJiendtZzVmemdvdGMyZ281MW9sYzM1NjYAACkQAAAAAAAAAA==","Result":{},"Upstream":"127.0.0.1:5300","IP":"156.229.9.224","Elapsed":1031547,"Cached":true}

Additional Information

The upstream is Unbound 1.13.1. If possible, please provide an e-mail address where I can forward the more sensitive info, like the query log and stats file, for analysis if needed.

@EugeneOne1 (Member)

EugeneOne1 commented Apr 5, 2022

@cadusilva, hello and apologies for the late response. To troubleshoot the issue, we'd like to check your verbose log. Could you please reproduce the issue and collect it? You can send it to devteam@adguard.com with something like "Issue 4358" in the subject. It'd also be really helpful if you attached the query log file and the corrupted stats.db file as well.

@EugeneOne1 EugeneOne1 added the "waiting for data" label Apr 5, 2022
@cadusilva (Author)

hello @EugeneOne1, currently my installation is set to answer queries only from a limited list of CIDR ranges (from my country). Since then, I haven't seen the problem surface again. I'll clear the list of CIDRs and wait to see if the problem comes back. Also, I already sent an e-mail some weeks ago that contains part of the info requested in your reply. As soon as the problem appears again, I'll send another e-mail. Thank you!

@cadusilva (Author)

Hello again, here's a follow-up: as soon as the CIDR list was cleared, the issue surfaced again, and the e-mail with the files is on its way. Today I saw that the stats were corrupted again. The only catch is that the query log only contains data from the last 6 hours, as I forgot to expand the retention window. I hope it helps anyway. If you need any extra steps or additional info, just say the word.

@EugeneOne1 (Member)

@cadusilva, we've received it and are investigating, thanks.

@EugeneOne1 (Member)

EugeneOne1 commented Apr 25, 2022

@cadusilva, hello again. Unfortunately, we can't reproduce the issue. Could you please answer a couple of questions to shed some light on the problem:

  1. What kind of machine is running AGH? Is it a router?
  2. What do you mean by "gets corrupt somehow"? Did the problem occur strictly after the flood of queries?

Also, could you please collect the browser's logs next time you catch it? This would really help us. Thanks.

@cadusilva (Author)

Hello @EugeneOne1,

  1. AGH is currently running on a Raspberry Pi 4B with 8 GB of RAM, on the official Raspberry Pi OS x64 based on Debian 11.3.

  2. This is because I'm not quite sure how the stats get corrupted, hence the "somehow". Initially, when this corruption first occurred, there was a flood of TXT queries to microsoft.com from APAC IPs, but since then I'm not so sure how it keeps getting corrupted after some time and who is to blame.

Currently, even limiting the CIDRs that can query my server to accept only IPs from Brazil isn't preventing the stats from being corrupted, as some foreign queries show up in the stats.

All I know is that, from time to time, when I access the AGH web UI, everything is zeroed and there's an error message in the corner. At the moment there's no CIDR filtering in place, so I'm just waiting for the next time the stats get corrupted. Then I'll send the browser console logs to you guys.

This may be confusing and I'm not sure how it all happens, but it does happen. Whatever logs I can send, I will send.

@cadusilva (Author)

Hello @EugeneOne1, I just sent a new e-mail with a new set of files, including the console log from Chrome. I noticed a few minutes ago that the problem had happened again, so I immediately gathered the files and sent the new e-mail. Hope this helps. Thank you.

[screenshot]

@EugeneOne1 (Member)

@cadusilva, we've received the data, many thanks. The issue seems kind of nontrivial; we'll dig further.

@EugeneOne1 EugeneOne1 added the "needs investigation" label and removed the "waiting for data" label May 30, 2022
@EugeneOne1 (Member)

@cadusilva, hello again and apologies for the long wait. Were the logs you sent recorded while the issue was occurring? And does flushing the statistics fix the issue for some noticeable period of time?

@cadusilva (Author)

Hello there! Yes, I turned verbose logging on and waited for the problem to happen. I don't know if flushing the statistics from the web UI fixes it, as I just delete stats.db when the problem happens. It has occurred once or twice since the last post.

@cadusilva (Author)

Update: flushing the statistics from the web UI's General Settings also works after the problem occurs, which is what just happened.

@ainar-g ainar-g added this to the v0.107.9 milestone Jul 25, 2022
@ainar-g ainar-g modified the milestones: v0.107.9, v0.107.10 Aug 3, 2022
adguard pushed a commit that referenced this issue Aug 4, 2022
Merge in DNS/adguard-home from 4358-fix-stats to master

Updates #4358.
Updates #4342.

Squashed commit of the following:

commit 5683cb3
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Thu Aug 4 18:20:54 2022 +0300

    stats: rm races test

commit 63dd676
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Thu Aug 4 17:13:36 2022 +0300

    stats: try to imp test

commit 59a0f24
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Thu Aug 4 16:38:57 2022 +0300

    stats: fix nil ptr deref

commit 7fc3ff1
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Thu Apr 7 16:02:51 2022 +0300

    stats: fix races finally, imp tests

commit c63f5f4
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Thu Aug 4 00:56:49 2022 +0300

    aghhttp: add register func

commit 61adc7f
Merge: edbdb2d 9b3adac
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Thu Aug 4 00:36:01 2022 +0300

    Merge branch 'master' into 4358-fix-stats

commit edbdb2d
Merge: a91e4d7 a481ff4
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Wed Aug 3 21:00:42 2022 +0300

    Merge branch 'master' into 4358-fix-stats

commit a91e4d7
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Wed Aug 3 18:46:19 2022 +0300

    stats: imp code, docs

commit c5f3814
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Wed Aug 3 18:16:13 2022 +0300

    all: log changes

commit 5e6caaf
Merge: 091ba75 eb8e816
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Wed Aug 3 18:09:10 2022 +0300

    Merge branch 'master' into 4358-fix-stats

commit 091ba75
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Wed Aug 3 18:07:39 2022 +0300

    stats: imp docs, code

commit f2b2de7
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Tue Aug 2 17:09:30 2022 +0300

    all: refactor stats & add mutexes

commit b3f11c4
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Wed Apr 27 15:30:09 2022 +0300

    WIP
@EugeneOne1 (Member)

@cadusilva, hello again and apologies for the delayed response. We've finally improved the concurrency logic in the statistics module to make it work with shared memory more carefully. Could you please check the latest build in the edge channel and tell us whether it works properly and doesn't spoil the database file? Thanks.

FYI, our tests didn't show any significant performance loss relative to the old version, but we'd like to get your feedback as well.

@EugeneOne1 EugeneOne1 added the "bug" and "P3: Medium" labels and removed the "needs investigation" label Aug 4, 2022
@cadusilva (Author)

Hello @EugeneOne1, no problem. I just installed version v0.108.0-a.189+4293cf59 and will report back if the issue happens again. I'll keep an eye on the performance as well. Thank you!

@cadusilva (Author)

Bad news @EugeneOne1: it happened again.

[screenshot]

Yesterday everything was fine; then, a few minutes ago, I accessed the web UI and the stats were corrupted again. Unfortunately, verbose logging wasn't enabled this time.

@EugeneOne1 (Member)

@cadusilva, thanks for the time you're contributing. We're going to investigate it further.

@cadusilva (Author)

Thank you, Eugene, for looking into it. I have now enabled verbose logging, so when it happens again I'll have more details for you guys to debug the issue.

@cadusilva (Author)

Hello @EugeneOne1, I've just sent another e-mail with a tarball containing everything, including logs, so you guys can take a look at the latest occurrence. Thank you!

@EugeneOne1 (Member)

@cadusilva, we've received it and are looking into it, thanks.

@EugeneOne1 (Member)

@cadusilva, could you please also try to access the web UI directly (bypassing Nginx) while using the same "bad" file? Does the issue reproduce? If yes, does it reproduce in other browsers?

@cadusilva (Author)

Sure, tonight I'll run the tests, and when I get the results I'll post them here, @EugeneOne1.

@cadusilva (Author)

cadusilva commented Sep 6, 2022

Hello @EugeneOne1, I just did some tests and found interesting results. By the way, the issue had just happened, so I downloaded the "corrupt" stats.db file and cleared the zeroed statistics via the dashboard so it would start counting queries again.

Next, I renamed stats.db to stats.db.bak and uploaded the "corrupt" stats.db file I had downloaded previously.

Then I stopped the nginx and AdGuardHome services and changed the AGH ports so it would listen on 80 and 443, besides the other default ports for DNS-over-TLS and DNS-over-QUIC.

Next, I restarted AdGuardHome to apply the new port settings and use the "corrupt" stats.db file. To my surprise, all the stats were back! Then I undid the port settings in the YAML file and started nginx again.

Then everything was zeroed. To be sure, I stopped nginx one more time and restarted AGH listening on ports 80 and 443 directly instead of being reverse-proxied by nginx. And there were all the stats from the "corrupt" stats.db file.

To wrap up, I went through these steps a third time, and yes: the "corrupt" stats.db file works just fine when the AGH dashboard is accessed directly (AGH on ports 80 and 443) and shows everything as zeroed when nginx is the middleman.

Here are some pictures:

AdGuard Home directly:

[screenshot]

AdGuard Home via nginx:

[screenshot]

@EugeneOne1 (Member)

@cadusilva, it seems we've finally found the source of the issue. I'd say you may want to revisit your Nginx configuration. I'm not really familiar with it, but we've received a couple of related issues (e.g. #4727).

Perhaps you could configure Nginx to collect some logs and let us see them, so that we could enhance the documentation about using a reverse proxy.

@cadusilva (Author)

cadusilva commented Sep 6, 2022


Hello @EugeneOne1, what kind of logs do you guys want? There's already an error log going. Would it suffice? Next time the issue happens, I'll send both the AGH and nginx logs so you guys can investigate. I'm also not familiar enough with nginx configs to pinpoint the problem and set things right, but maybe with these logs a direction can be found.

I'll also do a little research on how I can improve the communication between nginx and AGH and see if things get better.

Thank you.

@cadusilva (Author)

cadusilva commented Sep 6, 2022

The first thing I did was change my configuration.

The line proxy_pass https://dns_doh_servers; became proxy_pass http://127.0.0.1:4430;. Also, in the AGH YAML file, I changed allow_unencrypted_doh to true, so nginx handles the HTTPS side on its own.

To test, I set Firefox to use encrypted DNS, and everything works fine and faster so far. As a result, the AGH log no longer sees the query as encrypted and shows it as "Plain DNS". But I think this is expected, as nginx is now the one handling encryption and it talks to AGH unencrypted on localhost. Let's see what happens.

Updates:

  • Google Chrome didn't like the new setup as much as Firefox did, and it can't resolve any address when DoH is enabled.
  • There are a lot of errors like this one in the nginx log: recv() failed (104: Connection reset by peer) while reading upstream, client: xxx.xxx.xxx.xxx, server: dns.alto.win, request: "POST /dns-query HTTP/2.0", upstream: "http://127.0.0.1:4430/dns-query", host: "dns.alto.win"
  • The dnslookup tool says: Cannot make the DNS request: got status code 400 from https://dns.alto.win:443/dns-query. I don't know how Firefox doesn't care and just works, but some clients care a lot and do not work.
  • Android's Intra DNS app also can't communicate via DoH with the changed settings.

@cadusilva (Author)

@EugeneOne1 so these are the initial findings: it seems that most clients, and the server itself, don't get along with the configuration changes, with AGH apparently resetting the connection. Firefox, for some reason, is one of a kind and doesn't complain.

@cadusilva (Author)

cadusilva commented Sep 6, 2022

Based on this blog post by Nginx staff, I made some other changes.

dns.conf:
The new entry for /dns-query is as follows:

	location /dns-query {
		proxy_http_version		1.1;
		proxy_set_header		Connection "";
		proxy_set_header		Host			$http_host;
		proxy_set_header		X-Real-IP		$realip_remote_addr;
		proxy_set_header		X-Forwarded-For		$proxy_add_x_forwarded_for;
		proxy_cache			doh_cache;
		proxy_cache_key			$scheme$proxy_host$uri$is_args$args$request_body;
		proxy_cache_methods		GET POST;
		proxy_pass			https://dohloop;
	}

nginx.conf:
The upstream directive is now as follows:

	upstream dohloop {
		zone dohloop			64k;
		server				127.0.0.1:4430;
		keepalive_timeout		60s;
		keepalive_requests		100;
		keepalive			10;
	}

There's also this cache setup, but the specified folder remains empty:

	proxy_cache_path /mnt/ram/nginx/doh_cache levels=1:2 keys_zone=doh_cache:10m;

I'll keep an eye on things now to see how everything works. The proxy_pass uses https:// again, so there's currently no problem with clients.

@cadusilva (Author)

cadusilva commented Sep 10, 2022

Hello @EugeneOne1, unfortunately the changes didn't solve the problem. I just became aware that it happened again. It always happens not far beyond 100k queries. I'll check the nginx log and send everything to the dev team's e-mail. Thank you.

@EugeneOne1 (Member)

@cadusilva, hello. Just to be sure, did it happen without the Nginx proxy? We'll dig further then.

@cadusilva (Author)

It happened with nginx as the middleman; I cannot bypass nginx, or the other sites will go offline. I was testing the new nginx settings to see if they'd solve the problem without taking nginx out of the equation.

@EugeneOne1 (Member)

@cadusilva, I've looked through the Nginx docs and came up with a few suggestions:

  • large_client_header_buffers defaults to 4 buffers of 8k each. Perhaps setting it to something like 4 16k may help;
  • enabling proxy_buffering may also help, but the buffer sizes should be chosen properly.
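Concretely, those two suggestions would land in the nginx config roughly like this. This is only an illustrative sketch: the sizes are not tested values, and the 127.0.0.1:8081 upstream address is borrowed from the configs shared elsewhere in this thread:

```nginx
# In the http {} or server {} context: raise the request-header buffers
# from the default "4 8k" so long /control/... request URLs always fit.
large_client_header_buffers 4 16k;

# In the location proxying the AGH web UI: buffer upstream responses.
location / {
    proxy_pass        http://127.0.0.1:8081;
    proxy_buffering   on;
    proxy_buffers     8 16k;    # count and size are illustrative, tune as needed
    proxy_buffer_size 16k;
}
```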

Also, I've just noticed you mentioned the Nginx log. Is it possible to get the part of it around the error occurrence?

@cadusilva (Author)

cadusilva commented Sep 12, 2022

Hello @EugeneOne1, I've just applied your suggestions. Here are the relevant bits:

nginx.conf

	proxy_buffering				on;
	proxy_request_buffering			on;
	proxy_buffers				8 4k;
	proxy_buffer_size			4k; 
	proxy_busy_buffers_size			16k;

In this file there's also proxy_set_header Early-Data $ssl_early_data;. Do you think it could be playing a part in breaking the stats when viewing them with nginx as the middleman?

dns.conf

	location / {
		proxy_pass			http://127.0.0.1:8081;
		proxy_set_header		X-Forwarded-For		$proxy_add_x_forwarded_for;
#		proxy_buffering			off;
#		proxy_redirect			off;
	}

	location /dns-query {
		proxy_http_version		1.1;
		proxy_set_header		Connection "";
		proxy_set_header		Host			$http_host;
		proxy_set_header		X-Real-IP		$realip_remote_addr;
		proxy_set_header		X-Forwarded-For		$proxy_add_x_forwarded_for;
		proxy_cache			doh_cache;
		proxy_cache_key			$scheme$proxy_host$uri$is_args$args$request_body;
		proxy_cache_methods		GET POST;
		proxy_pass			https://dohloop;
	}

About the end of your last message: I sent an e-mail two days ago with a few files, including the nginx log, but I couldn't find any bit of it relevant to the issue we're digging into. I guess I'll check the log level and watch what happens now with the edits to the files.

Thank you.

@EugeneOne1 (Member)

> In this file there's also proxy_set_header Early-Data $ssl_early_data;. Do you think it could be playing a part in breaking the stats when viewing them with nginx as the middleman?

I'm afraid I can't tell for sure. Actually, the main suspect at the moment is the GET /control/stats endpoint's API. It involves a huge number of parameters, which significantly increases the URL length.

@cadusilva (Author)

@EugeneOne1 I'm still monitoring to see if the issue happens again. But I'm not sure it will, as I am now running AdGuardHome on a machine way more powerful than the previous Raspberry Pi (now sold to someone else). It's a Ryzen 5 3550H, soon to have 24 GB of DDR4-2400 RAM.

If the problem doesn't come back, maybe the issue has to do with the RPi not being powerful enough to deal with this GET /control/stats thing, together with a big list of clients (as was, and is, my case).

I'll keep watching and will comment here if something new happens.

@EugeneOne1 (Member)

EugeneOne1 commented Sep 15, 2022

@cadusilva, I'd be surprised if this were the actual cause, since out-of-resources problems usually cause issues in all parts of the system. Still, extra usage data never hurts, so you're always welcome to share your findings.

Besides, there is a quick way to check it: simply replace the stats and query log data files in the current setup, creating backups of the existing data beforehand.

@cadusilva (Author)

@EugeneOne1 I mean, there's now a lot more processing power than before, and that is where the RPi struggles. It is a very competent piece of hardware and I hosted all kinds of stuff on it, but it's not that powerful, and it's not meant to outperform a gaming rig or anything.

My SBC was the 4B version with 8 GB of RAM, so it had plenty of resources and used less than 2 of its 8 GB (at least speaking of RAM usage). Processing power is the Achilles' heel of this little computer, and it sometimes struggled with things like refreshing all the blocklists I use, checking for updates and then processing them, for example.

And I also have a lot of clients with a lot of CIDRs, so when you mentioned the GET part, I thought (as a layman, of course) that maybe it is about processing power.

But here's another piece of information: I replaced the current stats.db file with the "corrupted" one I last sent you via e-mail, restarted the AdGuard service, and then... it worked. The old stats were right there in the dashboard. Just to be sure, I opened an older e-mail with a previous "corrupted" stats.db file and tried that one too.

Again, all the stats were there. Currently, I'm running the edge release v0.108.0-a.287+fc62796e Linux amd64. I just updated to this version, but the previous one showed the same behaviour. The older stats.db file worked. Maybe there's something going on with the arm64 release? Maybe it's something with the RPi's processing power? Maybe it's something else.

But these are my latest findings for you guys to analyse. So far, there are no new occurrences. They used to happen shortly after hitting 100k queries; I couldn't get to 200k before the dashboard zeroed out. I'll keep watching.

Thank you.

@cadusilva (Author)

Hello @EugeneOne1, so far there are no new occurrences. I'm almost at 300,000 queries on record, but zero new issues. Maybe the number of clients and their CIDRs, the blocklists, and the weight of the dashboard are too much for the Raspberry Pi to handle when these three factors come together. It's just a theory.

@EugeneOne1 (Member)

@cadusilva, I don't recall if you've said already, but does Nginx run on the same machine as AGH?

@cadusilva (Author)

Yes, everything is on the same machine and nginx acts as a reverse proxy.

@ainar-g (Contributor)

ainar-g commented Sep 29, 2022

If there have been no new occurrences, I'd say it was an HTTP proxy issue. We'll close this issue, if you don't mind.

@cadusilva (Author)

It's okay, there really have been no new occurrences so far. They went away after I moved from a Raspberry Pi (8 GB, SSD) to a Ryzen 5 3550H with 16 GB of RAM and NVMe storage (as of now). The setup is the same, with the HTTP proxy and everything in between, but the hardware changed to a much more powerful one and the problem is now gone.

Thank you guys for looking into it, if it ever happens again I'll let you know.

heyxkhoa pushed a commit to heyxkhoa/AdGuardHome that referenced this issue Mar 20, 2023
Merge in DNS/adguard-home from 4358-fix-stats to master

Updates AdguardTeam#4358.
Updates AdguardTeam#4342.

heyxkhoa pushed a commit to heyxkhoa/AdGuardHome that referenced this issue Mar 20, 2023
Merge in DNS/adguard-home from 4358-stats-races to master

Updates AdguardTeam#4358

Squashed commit of the following:

commit 162d17b
Merge: 17732cf d4c3a43
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Wed Aug 17 14:04:20 2022 +0300

    Merge branch 'master' into 4358-stats-races

commit 17732cf
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Wed Aug 17 13:53:42 2022 +0300

    stats: imp docs, locking

commit 4ee0908
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Tue Aug 16 20:26:19 2022 +0300

    stats: revert const

commit a7681a1
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Tue Aug 16 20:23:00 2022 +0300

    stats: imp concurrency

commit a6c6c1a
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Tue Aug 16 19:51:30 2022 +0300

    stats: imp code, tests, docs

commit 954196b
Merge: 281e00d 6e63757
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Tue Aug 16 13:07:32 2022 +0300

    Merge branch 'master' into 4358-stats-races

commit 281e00d
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Fri Aug 12 16:22:18 2022 +0300

    stats: imp closing

commit ed036d9
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Fri Aug 12 16:11:12 2022 +0300

    stats: imp tests more

commit f848a12
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Fri Aug 12 13:54:19 2022 +0300

    stats: imp tests, code

commit 60e11f0
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Thu Aug 11 16:36:07 2022 +0300

    stats: fix test

commit 6d97f1d
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Thu Aug 11 14:53:21 2022 +0300

    stats: imp code, docs

commit 20c70c2
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Wed Aug 10 20:53:36 2022 +0300

    stats: imp shared memory safety

commit 8b39456
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Wed Aug 10 17:22:55 2022 +0300

    stats: imp code