Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check-for-changes.sh performance concerns on NFS #2098

Closed
NorseGaud opened this issue Jul 24, 2021 · 9 comments
Closed

check-for-changes.sh performance concerns on NFS #2098

NorseGaud opened this issue Jul 24, 2021 · 9 comments
Labels
kind/improvement Improve an existing feature, configuration file or the documentation meta/closed due to age or inactivity This issue / PR has been closed due inactivity meta/stale This issue / PR has become stale and will be closed if there is no further activity priority/low

Comments

@NorseGaud
Copy link
Member

NorseGaud commented Jul 24, 2021

Subject

#2096 sparked my interest in this. The addmailuser script seems to be a bit slow for some users (NFS mostly):

I notice that when calling addmailuser it becomes very slow when number of mailboxes increases.

I did not notice slowness too much when using non-NFS e.g. local HDD or AWS EBS storage

Now I am using NFS (actually AWS EFS) storage since deploying as docker in AWS ECS.

For example I have 2500 mailboxes and a call to addmailuser takes 60 seconds.

Ok, so this is understandable due to the following code I added in one of my last PRs:

if [[ -e "/tmp/docker-mailserver-config-chksum" ]] # Prevent infinite loop in tests like "checking accounts: user3 should have been added to /tmp/docker-mailserver/postfix-accounts.cf even when that file does not exist"
then
while [[ ! -d "/var/mail/${DOMAIN}/${USER}" ]]
do
echo "Waiting for dovecot to create /var/mail/${DOMAIN}/${USER}..."
sleep 1
done
fi

This code assists with NFS or other slow volumes in use by the mail server. We could debate why slow volumes are being used, but I'll save that for another ticket :)

So, with all of this said, it doesn't actually seem as if slow volumes are actually the main cause of the root problem. The root problem seems to be that check-for-changes.sh takes much too long to update what it needs to. I therefore did some digging in a forked branch: https://github.com/NorseGaud/docker-mailserver/commits/check-for-changes-performance

Here are the results while running a modified check-for-changes.sh that outputs run time and STDOUT/ERR:


  1. Adding emails one by one using for n in {1..10}; do ./usr/local/bin/addmailuser test${n}@pierce.us XXXXX; done in the running container in my production setup using NFS. The check-for-changes.sh was only modified to show the run time, and no other changes were made:
root@ip-172-31-3-247:/# ./usr/local/bin/check-for-changes.sh
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
---------- Created test1@pierce.us ----------------------
[ WARNING ]  File not found for certificate in check_for_changes.sh
postfix: stopped
postfix: started
dovecot: stopped
dovecot: started
DONE TOTAL RUNTIME SECONDS: 6
---------- Created test2@pierce.us ----------------------
[ WARNING ]  File not found for certificate in check_for_changes.sh
postfix: stopped
postfix: started
dovecot: stopped
dovecot: started
DONE TOTAL RUNTIME SECONDS: 18

All other emails added took ~18 seconds.

  1. I then added code that backgrounded almost all of the commands check-for-changes.sh runs, does a wait for each PID (wait "${WAIT_FOR_PIDS[*]}"), and then lets supervisorctl restart commands happen. The results were exactly the same, indicating to me that postfix/dovecot restarting was the primary target for the delay. I commented the dovecot restart out and found that actually postfix itself took ~15 seconds to restart ON THE SECOND RESTART.

While parallelization of certain things in check-for-changes.sh might be a good idea, postfix's restart time is crazy long.

I'm going to dig into if there is anything we can do about this, but I wanted to post it here for the community to also make some recommendations.

@NorseGaud NorseGaud added meta/needs triage This issue / PR needs checks and verification from maintainers meta/help wanted The OP requests help from others - chime in! :D kind/question Someone asked a question - feel free to answer priority/low labels Jul 24, 2021
@NorseGaud NorseGaud changed the title check-for-changes.sh performance check-for-changes.sh performance concerns on NFS Jul 24, 2021
@NorseGaud NorseGaud added area/issue kind/improvement Improve an existing feature, configuration file or the documentation and removed kind/question Someone asked a question - feel free to answer labels Jul 25, 2021
@NorseGaud
Copy link
Member Author

I found the same behavior with deleting emails.

@NorseGaud
Copy link
Member Author

Supervisor logs for restarting postfix:

-- first restart, took 5 seconds --
Stopping Postfix Mail Transport Agent: postfix.
/usr/local/bin/postfix-wrapper.sh: line 27: kill: (31219) - No such process
Starting Postfix Mail Transport Agent: postfix

-- second restart, took 17 seconds -- 
. (this took the majority of time)
Stopping Postfix Mail Transport Agent: postfix. (fast-ish)
/usr/local/bin/postfix-wrapper.sh: line 27: kill: (828) - No such process
Starting Postfix Mail Transport Agent: postfix (fast-ish)

etc etc

@NorseGaud
Copy link
Member Author

I finally found https://github.com/NorseGaud/docker-mailserver/blob/check-for-changes-performance/target/scripts/postfix-wrapper.sh which is doing while loops and sleeping and likely contributing to the issue. Looking for optimizations...

@NorseGaud
Copy link
Member Author

Side note: delmailuser returns right away, and if I'm deleting a ton of emails postfix restarts multiple times. There is room to improve this by blocking issuing a restart right away if there is still a pending lock file? I dunno...

@NorseGaud
Copy link
Member Author

NorseGaud commented Jul 29, 2021

It looks as if check-for-changes.sh is the issue here. It should have a delay before it performs the updates and restarts services to ensure that a lot of changes all happening at once get a chance to settle. I think the PR I opened is a great first step to solve this.

I was able to finally get the restart time down by using & and wait + changing sleeps to smaller amounts in the postfix-wrapper.

@github-actions
Copy link
Contributor

This issue has become stale because it has been open for 20 days without activity. Remove the label and comment or this issue will be closed in 10 days.

@github-actions github-actions bot added the meta/stale This issue / PR has become stale and will be closed if there is no further activity label Aug 21, 2021
@github-actions
Copy link
Contributor

This issue was closed due to inactivity.

@github-actions github-actions bot added the meta/closed due to age or inactivity This issue / PR has been closed due inactivity label Aug 31, 2021
@NorseGaud NorseGaud reopened this Aug 31, 2021
@NorseGaud NorseGaud removed meta/stale This issue / PR has become stale and will be closed if there is no further activity meta/needs triage This issue / PR needs checks and verification from maintainers meta/help wanted The OP requests help from others - chime in! :D labels Aug 31, 2021
@github-actions github-actions bot removed the meta/closed due to age or inactivity This issue / PR has been closed due inactivity label Sep 1, 2021
@github-actions
Copy link
Contributor

This issue has become stale because it has been open for 20 days without activity. Remove the label and comment or this issue will be closed in 10 days.

@github-actions github-actions bot added the meta/stale This issue / PR has become stale and will be closed if there is no further activity label Oct 10, 2021
@github-actions
Copy link
Contributor

This issue was closed due to inactivity.

@github-actions github-actions bot added the meta/closed due to age or inactivity This issue / PR has been closed due inactivity label Oct 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/improvement Improve an existing feature, configuration file or the documentation meta/closed due to age or inactivity This issue / PR has been closed due inactivity meta/stale This issue / PR has become stale and will be closed if there is no further activity priority/low
Projects
None yet
Development

No branches or pull requests

2 participants