Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix logrotateadditionalfiles deletion. #2081

Merged
merged 5 commits into from
Mar 19, 2020

Conversation

baconmania
Copy link
Contributor

@baconmania baconmania commented Mar 12, 2020

This PR aims to fix a scenario whereby logrotateAdditionalFiles marked with deleteInExecutorCleanup would never actually be deleted. The scenario was:

I’ve plumbed through some logic for log cleanup which is similar to the logic for the sandbox app/ directory cleanup. Which is to say, if it hasn’t been 15 minutes since task termination, we return a WAITING state from the “immediate” cleanup process. This way, we allow scheduled cleanup to take another look at it later, and properly delete files which we had been logrotating for the lifetime of the task.

@baconmania
Copy link
Contributor Author

baconmania commented Mar 12, 2020

Note that there are, broadly, three things to be cleaned up:

  1. app dir
  2. log files uploaded by Singularity (service.log and s3UploaderAdditionalFiles)
  3. log files rotated by Singularity (logrotateAdditionalFiles)

With this PR, we're set up so that 2 and 3 are either performed or skipped together. Is this what we want? Should 1 and 2 happen together, with the 15 minute delay applying only to 3?

@ssalinas
Copy link
Member

I think I agree that the 15 minute delay should apply to both 2 and 3. In particular because the S3Uploader needs a chance to get a at the logs for 2, which the 15 minute window is nice for.

Overall PR seems fine. I think it's gonna be a matter of testing the common cases in staging. Good find on where we clean the task definition file. Couldn't remember earlier where we actually were informed about a sandbox still hanging around

@ssalinas ssalinas added this to the 1.3.0 milestone Mar 12, 2020
@baconmania
Copy link
Contributor Author

baconmania commented Mar 12, 2020

Tested successfully. At task termination time, app directory cleanup and logrotation happen immediately. logrotateAdditionalFiles deletion is the only piece that's subject to the 15 minute delay. The scenario described above now proceeds as follows:

  • Task terminates successfully
  • Executor runs the “immediate” cleanup logic
  • app/ directory is cleaned
  • tail_of_service.log is created by grabbing a snippet of service.log
  • service.log is deleted
  • All configured logfiles that are configured to be rotated, are rotated
  • Deletion of logrotateAdditionalFiles is skipped because it hasn't been 15 minutes since task termination
  • The immediate cleanup process returns a WAITING state and the task definition file isn't deleted
  • ... 15 minutes later...
  • Scheduled cleanup process discovers the task definition file
  • logrotateAdditionalFiles deletion proceeds
  • Task definition file is deleted
  • Scheduled cleanup process returns SUCCESS

@ssalinas
Copy link
Member

🚢

@ssalinas ssalinas merged commit 9dc5ede into master Mar 19, 2020
@ssalinas ssalinas deleted the fix-logrotateadditionalfiles-deletion branch March 19, 2020 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants