Server began to freeze after the last update #5661

Closed
3 tasks done
ammnt opened this issue Mar 30, 2023 · 40 comments

@ammnt

ammnt commented Mar 30, 2023

Prerequisites

  • I have checked the Wiki and Discussions and found no answer

  • I have searched other issues and found no duplicates

  • I want to report a bug and not ask a question

Operating system type

Linux, Other (please mention the version in the description)

CPU architecture

AMD64

Installation

GitHub releases or script from README

Setup

On one machine

AdGuard Home version

v0.108.0-b.31

Description

What did you do?

Classic GH-script installation. Debian 11 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye

Nothing special. I just updated to the latest beta version and kept using it as always, but now the server freezes. The verbose log is attached. Please start looking from 22:43 MSK:
log.txt

Expected result

Actual result

Screenshots (if applicable)

Additional information

@ammnt
Author

ammnt commented Mar 30, 2023

It looks like the HTTP server stops responding when I use the journal section to search for some domains🤔

@cjom

cjom commented Mar 30, 2023

It's not only the HTTP server that stops responding: connected devices, both new and already connected, also start reporting "no internet access".

My AGH is running directly on a Xiaomi AX3600 router with OpenWRT (amd64 edge build).
This happens some hours after a reboot, but a reboot does not always fix the issue.

@Aiolos-Wang

It is the same for me.
After updating to the latest version, the server started to get stuck on its web page. Connected devices cannot make DNS requests.
Rebooting the server only relieves it for a while.

@tomamplius

It is the same for me.

@muok

muok commented Mar 31, 2023

Same issue here. There's a config file error. AGH modifies the config file and there's a syntax error.

The only way I resolved this is by restoring a backup.

@Gandulf78

Same here on ARMv7.

@Fooose

Fooose commented Mar 31, 2023

The error also occurs here, on an x86 pfSense system.
AdGuard freezes, and only a restart brings relief.

@ainar-g ainar-g self-assigned this Mar 31, 2023
@ainar-g ainar-g added this to the v0.107.27 milestone Mar 31, 2023
@ainar-g ainar-g pinned this issue Mar 31, 2023
@ainar-g
Contributor

ainar-g commented Mar 31, 2023

Thanks for the reports, everyone. We're investigating but cannot reproduce it so far. Enabling and sharing verbose logs either here or through devteam@adguard.com, like OP did, would really help.

@ammnt, if it really is the query log, does disabling it fix the issue? Also, are there any panics in e.g. /var/log/AdGuardHome.err and /var/log/AdGuardHome.out?

@muok, could you please elaborate? What kind of error do you see? There shouldn't be any syntax errors, but if you were following Edge, you could be affected by #5627, which shouldn't affect Beta users.
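
For anyone checking the log files mentioned above for panics, here is a minimal sketch. It assumes the service writes Go's standard panic output (a line containing `panic: ` followed by goroutine stack traces) to those files; the function name `find_panics` is made up for illustration.

```shell
# Sketch: search AdGuard Home's service logs for Go panic traces.
# The two default paths are the ones named in the comment above;
# pass other files as arguments if your logs live elsewhere.
find_panics() {
    if [ "$#" -eq 0 ]; then
        set -- /var/log/AdGuardHome.err /var/log/AdGuardHome.out
    fi
    for f in "$@"; do
        if [ -r "$f" ]; then
            # -H prints the file name, -n the line number, and -A 20
            # keeps the first lines of the stack trace after each match.
            grep -H -n -A 20 'panic: ' "$f"
        else
            echo "skipping $f: not readable" >&2
        fi
    done
}
```

Running `find_panics` with no arguments checks both default files; no output means no panic line was found in the readable ones.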

@ammnt
Author

ammnt commented Mar 31, 2023

@ainar-g, I have just sent you a message. Please check the devteam@adguard.com mailbox🙂

@ammnt
Author

ammnt commented Mar 31, 2023

@ainar-g, I recorded a video reproducing the problem, along with the log output at that point. I will send it to you within the next 10 minutes.

adguard pushed a commit that referenced this issue Mar 31, 2023
Updates #5661.

Squashed commit of the following:

commit 3fac63f
Author: Ainar Garipov <A.Garipov@AdGuard.COM>
Date:   Fri Mar 31 17:24:00 2023 +0300

    querylog: imp locks even more

commit bf14ab9
Author: Ainar Garipov <A.Garipov@AdGuard.COM>
Date:   Fri Mar 31 17:09:25 2023 +0300

    querylog: imp locks more

commit 40e885f
Author: Ainar Garipov <A.Garipov@AdGuard.COM>
Date:   Fri Mar 31 16:26:15 2023 +0300

    querylog: imp locks, logs
@ainar-g
Contributor

ainar-g commented Mar 31, 2023

@ammnt, thanks for the information. We have a few theories. The latest Edge release contains a fix for one of them. We're working on the other ones.

adguard pushed a commit that referenced this issue Mar 31, 2023
Updates #5661.

Squashed commit of the following:

commit 0a1425d
Author: Ainar Garipov <A.Garipov@AdGuard.COM>
Date:   Fri Mar 31 18:31:19 2023 +0300

    querylog: opt locks more
@ainar-g
Contributor

ainar-g commented Mar 31, 2023

v0.108.0-a.496+3575aa05 contains the second part of the tentative fix.

@qqsir-dev

Same problem with v0.108.0-a.496+3575aa05:
CONTAINER ID   IMAGE                      COMMAND                  CREATED         STATUS                     PORTS   NAMES
8733c87cb5fd   adguard/adguardhome:edge   "/sbin/tini -- /opt/…"   2 minutes ago   Up 2 minutes (unhealthy)           adguardhome

@fabricionaweb

fabricionaweb commented Apr 1, 2023

It may have happened to me 3 (many) times today. v0.108.0-b.31 running on OpenWRT 22.03.3.
I have turned verbose logging on to try to catch some logs, but nothing yet. I will be back if I find anything useful.

I can't get any logs, because when it crashes the logging stops as well.

@ammnt
Author

ammnt commented Apr 2, 2023

@ainar-g, I have not tested the latest version on the Edge channel, but I can say that my server has not frozen over the past few days, as I did not use the web interface🤔

@qqsir-dev

I am running v0.108.0-a.496+3575aa05 on Ubuntu 22.04.2 with Docker. It runs for several minutes and then goes unhealthy. Please check. Thanks.

@qqsir-dev

> I am running v0.108.0-a.496+3575aa05 on Ubuntu 22.04.2 with Docker. It runs for several minutes and then goes unhealthy. Please check. Thanks.

Same with version v0.108.0-a.497+2a0d0629:
CONTAINER ID   IMAGE                      COMMAND                  CREATED         STATUS                     PORTS   NAMES
355ff4b31d43   adguard/adguardhome:edge   "/sbin/tini -- /opt/…"   3 minutes ago   Up 3 minutes (unhealthy)           adguardhome

@ammnt
Author

ammnt commented Apr 3, 2023

@ainar-g, I am sure that the Docker container goes down by itself because the healthcheck probes the HTTP port. We should try running the container without the healthcheck: if it does not go down, then the problem is in HTTP processing and the library responsible for it.
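
A rough sketch of that experiment, for anyone who wants to try it. It relies on Docker's `--no-healthcheck` flag for `docker run`; the function names are made up for illustration, the container name and image tag are the ones from the `docker ps` output earlier in this thread, and the usual volume and port options are left out and would need to be added.

```shell
# Sketch: start AdGuard Home with the image's HEALTHCHECK disabled,
# then check whether the container itself is still running.
# Add your usual -v and -p options to the docker run line.
run_without_healthcheck() {
    name=${1:-adguardhome}
    image=${2:-adguard/adguardhome:edge}
    docker run -d --name "$name" --no-healthcheck "$image"
}

container_running() {
    # Prints "true" or "false" from the container's state.
    docker inspect --format '{{.State.Running}}' "${1:-adguardhome}"
}
```

If `container_running` still prints `true` while the web interface no longer answers, the container is not the part that is failing.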

@ainar-g
Contributor

ainar-g commented Apr 3, 2023

@qqsir-dev, please report Docker healthcheck issues to #3290. Thanks.

@ammnt, our current theory is that long-running querylog searches prevented new requests from being counted. We've made a few changes that address those issues. Were you (or anyone watching the issue) able to test the most recent Edge releases?

@ZeroClover

ZeroClover commented Apr 3, 2023

I also encountered this problem on one of my AGH servers with a fairly high query volume, version v0.108.0-b.31. AGH is running on a Debian 11 x64 VM.

image

Tried to get the logs, but when the problem occurs, the AGH binary no longer responds and does not output any logs.

The systemd unit is also unable to shut down AGH and only forces it to stop with SIGKILL after a timeout.

image

The way to reproduce the problem is to click on "Query Log" in the WebGUI and after tens of seconds or minutes, AGH stops responding.
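
To time how long it takes from opening the Query Log until AGH stops answering, a small polling loop can help. This is only a sketch: the web UI URL, the DNS address, and the test domain are assumptions to adjust for your setup, the function names are made up, and it needs `curl` and `dig` installed.

```shell
# Sketch: poll the web UI and the DNS listener until one stops answering.
probe_http() {
    # --max-time keeps a hung server from blocking the probe itself.
    curl --silent --output /dev/null --max-time "${2:-5}" "$1"
}

probe_dns() {
    # One try with a short timeout; dig exits non-zero if no reply arrives.
    dig +time=2 +tries=1 +short "@$1" "${2:-example.com}" >/dev/null
}

watch_agh() {
    url=${1:-http://127.0.0.1:3000/}   # assumed web UI address
    dns=${2:-127.0.0.1}                # assumed DNS listener address
    interval=${3:-10}                  # seconds between probes
    while probe_http "$url" && probe_dns "$dns"; do
        sleep "$interval"
    done
    echo "$(date '+%F %T') AdGuard Home stopped responding"
}
```

Start `watch_agh` in a terminal, then open "Query Log" in the browser; the timestamp printed at the end marks the first failed probe.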

@blamaz

blamaz commented Apr 3, 2023

Same problem here on an OpenWrt master snapshot after updating to AdGuard Home v0.108.0-a.497+2a0d0629.

@ammnt
Author

ammnt commented Apr 4, 2023

@ainar-g, I have now checked the latest version of the Docker image. Nothing has changed: the server still goes down after a while. There is nothing unusual in the logs🤷🏻‍♂️

@ainar-g
Contributor

ainar-g commented Apr 5, 2023

We've been able to reproduce the issue overnight. We have a few new theories, but most of them have to do with bugs in the Safe Search feature. @ammnt, could you please disable Safe Search, restart AGH, and see if the situation improves? We'll keep investigating the code in the meantime.

@ammnt
Author

ammnt commented Apr 5, 2023

@ainar-g, unfortunately, the same issue occurs with the Safe Search function disabled. The verbose log is attached:
log.txt

EDIT:
Or do I need to test it on the latest Edge build?😀

@ainar-g
Contributor

ainar-g commented Apr 5, 2023

Thanks for trying it out. We are still investigating.

@fabricionaweb

I have rolled back to stable. But before that, if it helps, I had one client that uses Safe Search, and I did notice some "adult websites" being blocked, which may be related.

@ammnt
Author

ammnt commented Apr 5, 2023

I have tested the latest beta build. It works much better for now. I'll keep trying to reproduce it🫡

@qqsir-dev

> I have tested the latest beta build. It works much better for now. I'll keep trying to reproduce it🫡

Me too, it has been working well for more than 10 minutes.

@ammnt
Author

ammnt commented Apr 5, 2023

But I completely disabled the Safe Search service, as @ainar-g suggested earlier👆🏼

@qqsir-dev

> But I completely disabled the Safe Search service, as @ainar-g suggested earlier👆🏼

I didn't use Safe Search; it's working well now.

@qqsir-dev

But toggling Encryption on or off will stop the HTTP service.

@ammnt
Author

ammnt commented Apr 5, 2023

I don't use the built-in encryption, but it now works fine with Safe Search enabled too.

@virtualm2000

virtualm2000 commented Apr 5, 2023

I've tried v0.108.0-b.32. It doesn't work if Safe Search is enabled in client settings. It becomes unresponsive as before.
If Safe Search is disabled everywhere, it seems to work.
I use encryption too.

@ainar-g
Contributor

ainar-g commented Apr 6, 2023

@ammnt, we've made a few improvements to safe search as well as the debugging API. Could you please update to v0.108.0-a.505+b1120221 or later and set debug_pprof to true in the config file? Then, if you are able to reproduce the error, can you please save and send us the following files:

curl -o /tmp/goroutine.pprof 'http://localhost:6060/debug/pprof/goroutine?debug=1'
curl -o /tmp/mutex.pprof 'http://localhost:6060/debug/pprof/mutex?debug=1'
curl -o /tmp/block.pprof 'http://localhost:6060/debug/pprof/block?debug=1'

These three files will contain the information that should help us find out what is causing the issue. We're doing the same on our testing machines, but it'll help to have more data.
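
The three commands above can also be wrapped into one helper, sketched below. It assumes `debug_pprof` is enabled and that the profiles are served at `http://localhost:6060`, as in the URLs above; the function name, the output directory, and the 30-second timeout are illustrative choices rather than anything the project prescribes.

```shell
# Sketch: save the goroutine, mutex, and block profiles into one directory.
collect_pprof() {
    base=${1:-http://localhost:6060}
    dir=${2:-/tmp/agh-pprof-$(date +%Y%m%dT%H%M%S)}
    mkdir -p "$dir" || return 1
    for profile in goroutine mutex block; do
        # --max-time: a frozen process may accept the connection
        # and then never answer, so do not wait forever.
        curl --silent --show-error --max-time 30 \
            --output "$dir/$profile.pprof" \
            "$base/debug/pprof/$profile?debug=1" ||
            echo "failed to fetch $profile profile" >&2
    done
    # Print the directory so the files are easy to find and attach.
    echo "$dir"
}
```

Calling `collect_pprof` right after the freeze is noticed prints the directory holding the files to send.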

adguard pushed a commit that referenced this issue Apr 6, 2023
Updates #5661.

Squashed commit of the following:

commit 02e83c7
Author: Ainar Garipov <A.Garipov@AdGuard.COM>
Date:   Thu Apr 6 19:28:17 2023 +0300

    dnsforward: imp logs

commit 0f27265
Author: Ainar Garipov <A.Garipov@AdGuard.COM>
Date:   Thu Apr 6 19:18:19 2023 +0300

    dnsforward: imp locks
@ainar-g
Contributor

ainar-g commented Apr 6, 2023

Update: we have been able to reliably reproduce the freeze and have pushed a change that fixes the issue in our tests. It is available in Edge build v0.108.0-a.506+5d5a7295.

@virtualm2000

Running v0.108.0-a.506+5d5a7295 for about one hour and no issues anymore.

@Aiolos-Wang

I updated my AdGuard in LXC to v0.108.0-a.506+5d5a7295. It has been running for over 1 hour and works well so far.

@blamaz

blamaz commented Apr 7, 2023

Updated to v0.108.0-a.506+5d5a7295 and running fine again

@cjom

cjom commented Apr 8, 2023

Version: v0.108.0-a.508+15bba281 running on a Xiaomi 3600 router with Safe Search enabled.
Running without issues for around 12h.

@ainar-g
Contributor

ainar-g commented Apr 10, 2023

Thanks everyone for testing! Since a lot of people are indicating that AGH runs well after the fix, we'll close this issue. The fix will be included in the next Beta and Release builds, which should come in the next few days.

@ainar-g ainar-g closed this as completed Apr 10, 2023
@ainar-g ainar-g unpinned this issue Apr 10, 2023
@ainar-g ainar-g modified the milestones: v0.107.29, v0.107.28 Apr 12, 2023