check-for-changes.sh performance concerns on NFS #2098

NorseGaud · 2021-07-24T14:28:27Z

Subject

#2096 sparked my interest in this. The addmailuser script seems to be a bit slow for some users (NFS mostly):

I notice that when calling addmailuser it becomes very slow when number of mailboxes increases.

I did not notice slowness too much when using non-NFS e.g. local HDD or AWS EBS storage

Now I am using NFS (actually AWS EFS) storage since deploying as docker in AWS ECS.

For example I have 2500 mailboxes and a call to addmailuser takes 60 seconds.

Ok, so this is understandable due to the following code I added in one of my last PRs:

docker-mailserver/target/bin/addmailuser

Lines 68 to 75 in 45345b2

if [[ -e "/tmp/docker-mailserver-config-chksum" ]] # Prevent infinite loop in tests like "checking accounts: user3 should have been added to /tmp/docker-mailserver/postfix-accounts.cf even when that file does not exist"
then
  while [[ ! -d "/var/mail/${DOMAIN}/${USER}" ]]
  do
    echo "Waiting for dovecot to create /var/mail/${DOMAIN}/${USER}..."
    sleep 1
  done
fi

This code assists with NFS or other slow volumes in use by the mail server. We could debate why slow volumes are being used, but I'll save that for another ticket :)
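For illustration only: the loop in addmailuser waits for as long as it takes dovecot to create the directory. A bounded variant could look roughly like the sketch below. The function name wait_for_dir and the 30 second default are assumptions made for this example and are not part of docker-mailserver.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in docker-mailserver): wait until a directory
# exists, giving up after a timeout so a caller can never block forever.
wait_for_dir() {
  local dir="$1" timeout="${2:-30}"
  # SECONDS is a bash built-in counter of seconds since the shell started.
  local deadline=$(( SECONDS + timeout ))
  until [[ -d "${dir}" ]]; do
    if (( SECONDS >= deadline )); then
      echo "Timed out waiting for ${dir}" >&2
      return 1
    fi
    sleep 1
  done
}
```

A caller might then use wait_for_dir "/var/mail/${DOMAIN}/${USER}" 60 and decide for itself whether a timeout should be treated as an error.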

So, with all of this said, slow volumes don't actually seem to be the root cause. The root problem seems to be that check-for-changes.sh takes much too long to apply its updates. I therefore did some digging in a forked branch: https://github.com/NorseGaud/docker-mailserver/commits/check-for-changes-performance

Here are the results while running a modified check-for-changes.sh that outputs run time and STDOUT/ERR:

Adding emails one by one using for n in {1..10}; do ./usr/local/bin/addmailuser test${n}@pierce.us XXXXX; done in the running container in my production setup using NFS. The check-for-changes.sh was only modified to show the run time, and no other changes were made:

root@ip-172-31-3-247:/# ./usr/local/bin/check-for-changes.sh
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
---------- Created test1@pierce.us ----------------------
[ WARNING ]  File not found for certificate in check_for_changes.sh
postfix: stopped
postfix: started
dovecot: stopped
dovecot: started
DONE TOTAL RUNTIME SECONDS: 6
---------- Created test2@pierce.us ----------------------
[ WARNING ]  File not found for certificate in check_for_changes.sh
postfix: stopped
postfix: started
dovecot: stopped
dovecot: started
DONE TOTAL RUNTIME SECONDS: 18

All other emails added took ~18 seconds.
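The exact timing change lives in the forked branch, so the following is only a guess at the general shape: a minimal sketch of how a "DONE TOTAL RUNTIME SECONDS" line like the ones above could be produced. timed_run is a hypothetical helper name used for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, then report its wall-clock run time
# in whole seconds using bash's built-in SECONDS counter.
timed_run() {
  local start="${SECONDS}"
  local status=0
  "$@" || status=$?
  echo "DONE TOTAL RUNTIME SECONDS: $(( SECONDS - start ))"
  # Pass the wrapped command's exit status through to the caller.
  return "${status}"
}
```

SECONDS only has one-second resolution, which is enough to tell a 0 second run from an 18 second one.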

I then added code that backgrounded almost all of the commands check-for-changes.sh runs, does a wait for each PID (wait "${WAIT_FOR_PIDS[*]}"), and then lets the supervisorctl restart commands happen. The results were exactly the same, indicating to me that restarting postfix/dovecot was the primary cause of the delay. I commented the dovecot restart out and found that postfix itself took ~15 seconds to restart ON THE SECOND RESTART.
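As a rough sketch of the background-and-wait approach described above (not the code from the branch): update_step is a hypothetical stand-in for the independent commands the script runs, and the step names are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for one independent update step.
update_step() {
  local name="$1" seconds="$2"
  sleep "${seconds}"
  echo "finished ${name}"
}

# Start every step in the background, remember its PID, then wait for
# each PID in turn and count how many steps failed.
run_steps_in_parallel() {
  local pids=() pid failures=0
  update_step "accounts" 1 &
  pids+=("$!")
  update_step "aliases" 1 &
  pids+=("$!")
  update_step "relay" 1 &
  pids+=("$!")
  for pid in "${pids[@]}"; do
    wait "${pid}" || failures=$(( failures + 1 ))
  done
  return "${failures}"
}
```

The sketch expands the array with [@] so that every PID is a separate word, and it waits on one PID at a time so each step's exit status can be checked. Service restarts would only be issued after run_steps_in_parallel returns.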

While parallelization of certain things in check-for-changes.sh might be a good idea, postfix's restart time is crazy long.

I'm going to dig into whether there is anything we can do about this, but I wanted to post it here so the community can also make some recommendations.


NorseGaud · 2021-07-29T11:22:57Z

I found the same behavior with deleting emails.

NorseGaud · 2021-07-29T11:31:15Z

Supervisor logs for restarting postfix:

-- first restart, took 5 seconds --
Stopping Postfix Mail Transport Agent: postfix.
/usr/local/bin/postfix-wrapper.sh: line 27: kill: (31219) - No such process
Starting Postfix Mail Transport Agent: postfix

-- second restart, took 17 seconds -- 
. (this took the majority of time)
Stopping Postfix Mail Transport Agent: postfix. (fast-ish)
/usr/local/bin/postfix-wrapper.sh: line 27: kill: (828) - No such process
Starting Postfix Mail Transport Agent: postfix (fast-ish)

etc etc

NorseGaud · 2021-07-29T11:32:01Z

I finally found https://github.com/NorseGaud/docker-mailserver/blob/check-for-changes-performance/target/scripts/postfix-wrapper.sh which is doing while loops and sleeping and likely contributing to the issue. Looking for optimizations...

NorseGaud · 2021-07-29T11:48:00Z

Side note: delmailuser returns right away, and if I'm deleting a ton of emails postfix restarts multiple times. There is room to improve this, perhaps by holding off on a restart while there is still a pending lock file? I dunno...

NorseGaud · 2021-07-29T18:34:38Z

It looks as if check-for-changes.sh is the issue here. It should have a delay before it performs the updates and restarts services to ensure that a lot of changes all happening at once get a chance to settle. I think the PR I opened is a great first step to solve this.
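To make the "wait for settle" idea concrete, here is a minimal sketch under my own assumptions; it is not the code from the PR. It fingerprints a config directory and only returns once the fingerprint has stayed the same for a number of consecutive polls. The function names, the default of 3 polls, and the 1 second interval are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one checksum that summarises every file
# under the given directory.
config_fingerprint() {
  local dir="$1"
  find "${dir}" -type f -exec cksum {} + 2>/dev/null | sort | cksum
}

# Hypothetical helper: block until the fingerprint has been unchanged
# for quiet_checks consecutive polls, interval seconds apart.
wait_for_settle() {
  local dir="$1" quiet_checks="${2:-3}" interval="${3:-1}"
  local previous current stable=0
  previous="$(config_fingerprint "${dir}")"
  while (( stable < quiet_checks )); do
    sleep "${interval}"
    current="$(config_fingerprint "${dir}")"
    if [[ "${current}" == "${previous}" ]]; then
      stable=$(( stable + 1 ))
    else
      # Something changed: restart the quiet period from this state.
      stable=0
      previous="${current}"
    fi
  done
}
```

With something like this in front of the restart logic, a burst of addmailuser or delmailuser calls would lead to one restart after the burst ends rather than one restart per change, at the cost of a fixed minimum delay of quiet_checks times interval seconds.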

I was finally able to get the restart time down by using & and wait, plus shortening the sleeps in the postfix-wrapper.

github-actions · 2021-08-21T01:21:37Z

This issue has become stale because it has been open for 20 days without activity. Remove the label and comment or this issue will be closed in 10 days.

github-actions · 2021-08-31T01:22:25Z

This issue was closed due to inactivity.

github-actions · 2021-10-10T01:24:15Z

This issue has become stale because it has been open for 20 days without activity. Remove the label and comment or this issue will be closed in 10 days.

github-actions · 2021-10-20T01:25:15Z

This issue was closed due to inactivity.

NorseGaud added meta/needs triage This issue / PR needs checks and verification from maintainers meta/help wanted The OP requests help from others - chime in! :D kind/question Someone asked a question - feel free to answer priority/low labels Jul 24, 2021

NorseGaud changed the title ~~check-for-changes.sh performance~~ check-for-changes.sh performance concerns on NFS Jul 24, 2021

NorseGaud added area/issue kind/improvement Improve an existing feature, configuration file or the documentation and removed kind/question Someone asked a question - feel free to answer labels Jul 25, 2021

NorseGaud mentioned this issue Jul 29, 2021

check-for-changes: performance improvements + wait for settle #2104

Merged

10 tasks

github-actions bot added the meta/stale This issue / PR has become stale and will be closed if there is no further activity label Aug 21, 2021

NorseGaud mentioned this issue Aug 28, 2021

check-for-changes: performance improvements + wait for settle v2 #2157

Closed

10 tasks

github-actions bot added the meta/closed due to age or inactivity This issue / PR has been closed due inactivity label Aug 31, 2021

github-actions bot closed this as completed Aug 31, 2021

NorseGaud reopened this Aug 31, 2021

NorseGaud removed meta/stale This issue / PR has become stale and will be closed if there is no further activity meta/needs triage This issue / PR needs checks and verification from maintainers meta/help wanted The OP requests help from others - chime in! :D labels Aug 31, 2021

NorseGaud mentioned this issue Aug 31, 2021

[BUG] More than 20s delay before postfix server responds #2161

Closed

github-actions bot removed the meta/closed due to age or inactivity This issue / PR has been closed due inactivity label Sep 1, 2021

georglauterbach removed the area/issue label Sep 19, 2021

github-actions bot added the meta/stale This issue / PR has become stale and will be closed if there is no further activity label Oct 10, 2021

github-actions bot added the meta/closed due to age or inactivity This issue / PR has been closed due inactivity label Oct 20, 2021

github-actions bot closed this as completed Oct 20, 2021
