Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(check-for-changes.sh): Drop redundant guards #2623

Conversation

polarathene
Copy link
Member

@polarathene polarathene commented Jun 6, 2022

Description

Housekeeping PR mostly, added a few extra configs that should be monitored for changes.

I looked into some conditional guards and why they were needed. Turns out they're not, most of it was introduced from early support of Change Detection service and no longer relevant.

Origin of conditional guards:

  • check-for-changes.sh originally used relative paths to supported config files to monitor. This and checksum file creation relied on changing the working directory to /tmp/docker-mailserver. No longer necessary as we use absolute paths.
  • This carried over to associated scripts to only run logic if the directory existed (such as here in a Aug 2019 commit). We support monitoring changes elsewhere now (certs), the guard is legacy and can be dropped.
  • The postfix-accounts.cf guard was introduced as part of the original check-for-changes.sh feature PR in Oct 2017, shortly followed up with a fix to the guard to exit if the config file didn't exist to prevent log spam.
  • postfix-aliases.cf was always generated, for what seemed to be a compatibility workaround for check-for-changes.sh support. That logic was originally introduced in a Nov 2018 PR. That PR did not introduce the original alias support requiring POSTMASTER_ADDRESS, but did carry over the dependency for it into check-for-changes.sh. As aliases.sh has since de-duplicated the alias code and check-for-changes.sh is in much better shape, this workaround is no longer necessary.

I've additionally added 3 user configs that make sense for change detection to support as well since we already run recreation logic involving them: postfix-sasl-password.cf, postfix-relaymap.cf, postfix-regexp.cf.

All these /tmp/docker-mailserver config files are wrapped into the directory check before adding them to STAGING_FILES, although with shopt -s nullglob and the filtering to CHANGED_FILES array, it probably makes no difference and the conditional could be dropped?

Type of change

  • Improvement (non-breaking change that does improve existing functionality)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing unit tests pass locally with my changes

@polarathene polarathene added area/scripts kind/improvement Improve an existing feature, configuration file or the documentation labels Jun 6, 2022
@polarathene polarathene added this to the v11.1.0 milestone Jun 6, 2022
@polarathene polarathene self-assigned this Jun 6, 2022
georglauterbach
georglauterbach previously approved these changes Jun 6, 2022
Copy link
Member

@georglauterbach georglauterbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

@polarathene

This comment was marked as resolved.

casperklein
casperklein previously approved these changes Jun 6, 2022
@georglauterbach georglauterbach self-assigned this Jun 6, 2022
@georglauterbach
Copy link
Member

georglauterbach commented Jun 6, 2022

test/tests.bats - "checking restart of process: postfix":

@test "checking restart of process: postfix" {
  run docker exec mail /bin/bash -c "pkill master && sleep 10 && ps aux --forest | grep -v grep | grep '/usr/lib/postfix/sbin/master'"
  assert_success
}

The test command looks awful... grep -v grep is used to "filter out" the grep command itself, which is a very interesting decision.

Not quite sure why that's happening with these changes tbh. I cannot seem to reliably reproduce this failure. I'll re-run the tests.

I do not see a reason for the failure as well ATM... Nevermind, possiby found something, see #2623 (comment)

@polarathene
Copy link
Member Author

polarathene commented Jun 7, 2022

Outdated - Early investigation

Bizarrely the test failure was avoided by reverting the commit adding 3 new config files to monitor. Now running tests with each config line added as single commits to try see where the CI is choking.

Current progress:

  • Add postfix-regexp.cf - Passed
  • Add postfix-sasl-password.cf (actually was postfix-relaymap.cf) - Failed
  • Add postfix-relaymap.cf (actually was postfix-sasl-password.cf) - Passed

postfix-relaymap.cf is the problem. Investigating...


The test command looks awful... grep -v grep is used to "filter out" the grep command itself, which is a very interesting decision.

Do you have a suggestion for improving the test? They originated from the Aug 2017 PR introducing supervisord into the project.

@polarathene
Copy link
Member Author

polarathene commented Jun 7, 2022

Cause identified as earlier tests updating the newly monitored file postfix-relaymap.cf, triggering change detection event at a variable time, resulting in random tests failure due to Postfix being down temporarily due to supervisorctl restart postfix.

PR to extract the setup.sh tests now up: #2629


Troubleshooting and experimentation

Prior to the supervisord kill tests, just a little bit earlier is setup.sh relay command tests that manipulate postfix-relaymap.cf:

@test "checking setup.sh: setup.sh relay add-domain" {
./setup.sh -c mail relay add-domain example1.org smtp.relay1.com 2525
./setup.sh -c mail relay add-domain example2.org smtp.relay2.com
./setup.sh -c mail relay add-domain example3.org smtp.relay3.com 2525
./setup.sh -c mail relay add-domain example3.org smtp.relay.com 587
# check adding
run /bin/sh -c "cat $(private_config_path mail)/postfix-relaymap.cf | grep -e \"^@example1.org\s\+\[smtp.relay1.com\]:2525\" | wc -l | grep 1"
assert_success
# test default port
run /bin/sh -c "cat $(private_config_path mail)/postfix-relaymap.cf | grep -e \"^@example2.org\s\+\[smtp.relay2.com\]:25\" | wc -l | grep 1"
assert_success
# test modifying
run /bin/sh -c "cat $(private_config_path mail)/postfix-relaymap.cf | grep -e \"^@example3.org\s\+\[smtp.relay.com\]:587\" | wc -l | grep 1"
assert_success
}
@test "checking setup.sh: setup.sh relay add-auth" {
./setup.sh -c mail relay add-auth example.org smtp_user smtp_pass
./setup.sh -c mail relay add-auth example2.org smtp_user2 smtp_pass2
./setup.sh -c mail relay add-auth example2.org smtp_user2 smtp_pass_new
# test adding
run /bin/sh -c "cat $(private_config_path mail)/postfix-sasl-password.cf | grep -e \"^@example.org\s\+smtp_user:smtp_pass\" | wc -l | grep 1"
assert_success
# test updating
run /bin/sh -c "cat $(private_config_path mail)/postfix-sasl-password.cf | grep -e \"^@example2.org\s\+smtp_user2:smtp_pass_new\" | wc -l | grep 1"
assert_success
}
@test "checking setup.sh: setup.sh relay exclude-domain" {
./setup.sh -c mail relay exclude-domain example.org
run /bin/sh -c "cat $(private_config_path mail)/postfix-relaymap.cf | grep -e \"^@example.org\s*$\" | wc -l | grep 1"
assert_success
}

As this is running in test/tests.bats each test is sharing the same container and running sequentially, so I suspect one of these earlier tests is impacting it.


UPDATE: Yep... that seems to be it. Likely change detection restarting Postfix and the tests that follow not expecting Postfix to be down? (presumably something like that is happening, and it's a race condition hence why my local system isn't affected)

PR to extract the setup.sh tests now up: #2629

Alternatively, the supervisord tests could have some delay added, but I imagine we'd want to split up test/tests.bats anyway :)


I'll play around with the supervisord sleep value to see if it's too short, or if the earlier pkill is at fault.

  • Duration after setup.sh relay test to the start of the supervisord postfix kill test was 13 seconds.
  • postfix-wrapper.sh has a while loop trigger that will check for a PID file and sleep 5 seconds between checking the condition again. I assume that's a variable offset that the pkill test introduces?
  • check-for-changes.sh has a similar variable offset of 2 second sleep to poll for changed files, and some variable amount of time to perform those changes along with the Postfix restart (tests tend to give 10-15 sec sleep to wait on that).
  • So it's likely check-for-changes.sh processes a change that performs the supervisorctl restart postfix step while the pkill postfix test is idle with sleep?

sleep 60 worked fine.


Reverted back commits to prior state before experimenting. The current failure has changed, and is tests between the setup.sh relay tests and supervisord Postfix one. So other tests are also prone to failure due to the change detection variability..

not ok 309 checking spoofing: rejects sender forging in 2359ms
# (from function `assert_output' in file test/test_helper/bats-assert/src/assert.bash, line 231,
#  in test file test/tests.bats, line 1183)
#   `assert_output --partial 'Sender address rejected: not owned by user'' failed
# 
# -- output does not contain substring --
# substring (1 lines):
#   Sender address rejected: not owned by user
# output (19 lines):
#   220 mail.my-domain.com ESMTP
#   250-mail.my-domain.com
#   250-PIPELINING
#   250-SIZE 10240000
#   250-ETRN
#   250-STARTTLS
#   250-AUTH PLAIN LOGIN
#   250-AUTH=PLAIN LOGIN
#   250-ENHANCEDSTATUSCODES
#   250-8BITMIME
#   250-DSN
#   250 CHUNKING
#   334 UGFzc3dvcmQ6
#   535 5.7.8 Error: authentication failed: UGFzc3dvcmQ6
#   250 2.1.0 Ok
#   250 2.1.5 Ok
#   354 End data with <CR><LF>.<CR><LF>
#   250 2.0.0 Ok: queued as 78BE61086CE
#   221 2.0.0 Bye
# --
# 
not ok 310 checking spoofing: accepts sending as alias in 2318ms
# (from function `assert_success' in file test/test_helper/bats-assert/src/assert.bash, line 114,
#  in test file test/tests.bats, line 1188)
#   `assert_success' failed
# 
# -- command failed --
# status : 1
# output : 
# --

Probably related to the cause for this issue.

This is an old requirement from when the change detector service was first introduced. It's no longer relevant.
The config was created regardless to workaround early change detection support. No longer necessary to require the file to exist.
Legacy guards when this was the only location change detection location supported.

There does not appear to be any need for changing into this directory at the start of `check-for-changes.sh` as we use absolute filepaths (originally monitored files were checked with relative paths to this config dir).
These are also handled at run-time in the current change detection support, so it makes sense to allows these config updates to also trigger change events.
@polarathene polarathene force-pushed the chore/changedetection-drop-redundant-guards branch from 4c4b2ed to 59b7854 Compare June 7, 2022 05:45
@polarathene polarathene force-pushed the chore/changedetection-drop-redundant-guards branch from 5b0c24c to 59b7854 Compare June 7, 2022 07:54
@georglauterbach georglauterbach added the pr/waiting for other PR to get merged This PR is waiting for another / other PR(s) to get merged label Jun 7, 2022
@georglauterbach
Copy link
Member

Nice detective work :D

@polarathene polarathene removed the pr/waiting for other PR to get merged This PR is waiting for another / other PR(s) to get merged label Jun 7, 2022
@polarathene polarathene requested a review from a team June 7, 2022 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/scripts kind/improvement Improve an existing feature, configuration file or the documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants