-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
check-for-changes.sh uses a lot of resources #2348
Comments
I've flagged this as bug, because if the script indeed does not behave well, it is a bug :) ping @docker-mailserver/maintainers I will have a look at this tomorrow too. |
mine is not growing as quickly as OPs, but it is certainly growing. |
The only diff for changes on that file between the two releases was related to fixing an issue with I guess the looping logic is causing a memory leak since it never ends and bash doesn't know any better? If you're ok with I would like to use file watchers to trigger scripts instead of a polling loop that we have currently, but some users apparently require polling. I have plans to support both, time permitting that should hopefully happen in January. For now If you're not using Traefik for provisioning certs (I know |
With nothing changing to run the other logic, this is the repeated loop iteration over and over:
docker-mailserver/target/scripts/check-for-changes.sh Lines 48 to 58 in 014ddda
docker-mailserver/target/scripts/check-for-changes.sh Lines 135 to 141 in 014ddda
Could you try with a local bind mount to a host directory instead of a Docker data volume? Curious if the constant writing of a new file that we rename with
If this isn't filesystem cache memory (which is common to use available memory, and all these written/read files could eat into that but the memory itself should be available to any process that actually needs it), and not some CoW like issue with the file management/writes, then maybe the file itself is growing excessively large? It shouldn't AFAIK, hard to say. |
after a restart, i can confirm it's not growing anymore. so i can no longer reproduce. fyi. |
Then it might have been something related to the file locking logic, that's also going to be addressed and should improve things. |
Thank you very much for all these replies!
And the other possible change I could think about, could be a new version of
Sure. This process is really using memory, it is not the filesystem cache memory.
If I am correct, this means the file
If I am not wrong, the file
That's strange. In my case, after a restart it is growing again ;-) Again, thank you very much for all these informations and thoughts! |
Hmmm... perhaps temporarily try bind mounting a local volume for the entire
Are you saying it has always contained more certificates, or there is more after restarting?
It could be due to the image upgrade.. if you're comfortable building a local image to verify, you can do so this way: git clone https://github.com/docker-mailserver/docker-mailserver.git --recurse-submodules
cd docker-mailserver
# Change the Docker image back to Buster (Debian 10)..
# 1st line of `Dockerfile`:
# `FROM docker.io/debian:11-slim` to `FROM docker.io/debian:10-slim` should work.
# Build the image
make build
# You now have a new local docker-mailserver image called `mailserver-testing:ci`, as v10.4.0 but with Debian 10 base image instead A related change between docker-mailserver/target/scripts/helper-functions.sh Lines 221 to 255 in 014ddda
As the checksum file is not changing in size, it seems unlikely that this would introduce a memory leak, the logic should all be the same apart from now checking the |
Ok, I copied the whole /tmp directory to a local volume on the host and startet up with using this local directory.
There is no difference (means everything works like before and the memory consumption is constantly growing).
Debian 11.2 (amd64)
Just to clarify: I am using
There are 33 certificates in this directory, as this machine additionally provides some simple static websites on a bunch of (sub)domains.
No, the (re)start of
As I am still a bit new to Docker and never built a local image myself, I am - honestly speaking - not so comfortable with doing it the first time - but I tried ;-)
For sure I missed something as I never built an image before. |
I tried to dig a bit deeper and maybe found some related thing. When DMS is starting up, the output looks like this:
Some lines of this output are obviously generated by It looks like
Or:
But none of these outputs are visible during startup of DMS (see above). Maybe this shows that I don't know if this could be an interesting find, or if it is completely irrelevant and output from |
To see inf/task notifications, |
Not necessarily, as Docker builds on different OS can be compromised by different problems (and the build for this image has some quirks / difficulties - so no worries). The error you are experiencing however, does not make a lot of sense to me. The package As mentioned, to see a more detailed log output, set the environment variable I'm not sure whether I had a look at my setup but could not make out such a problem (running on ENABLE_CLAMAV=1
ENABLE_AMAVIS=1
SSL_TYPE=manual
ENABLE_QUOTAS=0 |
I tried building it again, now with With Debian 11 the build process looks fine, reaching the test process (which I cancelled). Wondering if this problem just occurs with SSL type Let'sEncrypt. I will try to use |
You can manually remove |
Thank you! The built of DMS 10.4.0 with Debian 10 was successful and is running right now. The excessive growth of the process So it looks like this bug was introduced with the switch to Debian 11. I don't have any idea about the reason, sadly. For now I will keep running this local built image, if there is no reason not to do so (if there is, please tell me). |
Thanks for the detailed responses and sharing your findings, much appreciated! ❤️
Did you give that a try?
Probably difficult to attempt debugging that, but it's good to know it's related to the base image change. There was another recent issue where the problem was due to running the host OS Debian 11, but no issue on Ubuntu; It may be related here too:
|
I just tested the script with Valgrind, and there are no leaks on the heap. So this does not seem to be an issue with Bash or the script... Maybe this is indeed a problem with the specific version of Docker. As @polarathene mentioned, did you give |
I tried that yesterday quickly, but ran into errors. For sure I just made a mistake when specifying the path to the certificates, as they have not been found at startup of DMS. Unfortunately I had no time to figure this out and correct it, as I had to bring up the mailserver again to reduce downtime. Sorry that I can not provide information about this topic now.
Same for me, and I am running this version already.
I updated compose from As this problem is hard to figure out, maybe it would be an idea to temporarily provide two different flavours of DMS for the next releases, one based on Debian 11, the other one based on Debian 10? |
We're not yet 100% sure whether it is actually Debian 11 or a package with Debian 11 that causes this, or something else entirely. The fact that all works well again with Debian 10 seems to be "obvious" as v10.4 is mainly about Debian 11, albeit not entirely. We will need to investigate further. I have made sure I'm on |
Did someone achive this local first with success? Would avoid it if it's effect is unclear |
@ontheair81 have you tried whether this issue persists with the current |
Just tested |
Alright. Then rebuilding is not going to change anything. One more thing we can rule out. For the future, we may want to try out some fixes. We will let you know when we need some assistance. Everyone keep us posted out their findings. |
Can confirm identical behavior. This morning RAM was at 5%, CPU usage at 1%~2%, now late at night, RAM is at 10%, CPU usage at 2%~5%. |
We switched from Debian 10 to 11 with the To rule out a regression from that, did someone already try to build the 10.4.0 release for Debian 10? I think undoing this commit should be sufficient: 8a47e7d#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557 |
Yes, I am running a local built version of 10.4.0 based on Debian 10 since December 29. It is working without any problems. |
#2401 was merged which refactored a part of the changedetector functionality that could behave weird. Can someone please try |
I guess it is time to open a bottle of champagne! Trying Maybe other people affected by this issue can give it a try and report back, too? |
If this ends up being the solution, this was a huge blocker for #2157 and may allow me to easily finalize it :) |
Very nice! It would be good to hear from all the others whether their issue is resolved as well (trying |
I've been on |
Running more than 6hrs on edge. So far looks good! No increase in memory consumption.
In addition, the script used 13 seconds of CPU time during this time. |
Looks very promising. For those who want to reduce the CPU load further (like me), you can use sed -i -E 's| sleep.*| sleep X|' /usr/local/bin/check-for-changes.sh to adjust the sleep time to |
Awesome that it appears to be fixed from the responses 🎉 Bit surprised it was an issue/bug from using a sub-shell though? 🤯
I have been lacking time, but I really think we should approach that as my comment on the PR describes. Hopefully I can find time to contribute to the project again at some point 🤞 I'm pretty sure that the old file locking approach without sleep polling can work with NFS, or we can fallback to polling for that specifically while the majority of users can use more efficient change detection. |
I noticed this issue as well. Fixed it by setting sleep from 1 second to 10 hours like @georglauterbach |
The proper fix should be included in the upcoming release ( If the issue comes up again, please let us know :D |
This issue has become stale because it has been open for 20 days without activity.
|
Looks like this is fixed now. political statement @tyedco? 🇺🇦 🤣 nice! |
@MichaelSp no one was thinking of that at that time. |
The fix is included in the current 10.5.0 release. |
Subject
One day after upgrading to
10.4.0
I realized the processcheck-for-changes.sh
consumes a lot of memory and cpu resources.I did not see this behaviour with
10.3.0
.Description
The next day after the upgrade the container used 2.5G of memory (this usually was less than 800M) and the four cpu cores have been very busy. After restarting
docker-mailserver
everything was as usual again.Some hours later I realized that memory and cpu usage was growing again. It seems to be caused by the process
bin/bash /usr/local/bin/check-for-changes.sh
.After starting
docker-mailserver
, the memory consumption of this process is around 6M, but growing 800k per minute (ca. 50M per hour). The higher the memory usage goes, the higher the cpu load.This makes it necessary for me to restart
docker-mailserver
periodically (let's say: every day) as I have 4G of RAM available.Question
It would be nice to hear if there are other people experiencing the same issue, or if it is just me ;-) Help is very appreciated!
I am not an expert in docker (yet), but I am very willing to learn as much as possible and to provide more information as needed and requested.
Thank you very much in advance!
(I did not label this issue as "bug" as I am not sure if this is a general issue or just happening in my configuration.)
Environment
Debian Bullseye, latest patches
Let'sEncrypt-certificates are generated from nginx-reverse-proxy with acme-companion.
And some (maybe useful and related) informations from .env. Please ask if there is more information needed!
The text was updated successfully, but these errors were encountered: