add log-cleanup sidecar to scheduler/worker pods #502

Closed
thesuperzapper opened this issue Jan 11, 2022 · 1 comment · Fixed by #554
Labels: kind/enhancement (kind - new features or changes)
Milestone: airflow-8.7.0

thesuperzapper commented Jan 11, 2022

Right now, if worker/scheduler pods run for long periods of time, they can accumulate enough log files under logs.path to cause disk-usage issues.

We should allow users to deploy a sidecar container that is responsible for removing log files after a configurable retention period.

Here are some example values showing what this feature might look like:

scheduler:
  logCleanup:
    ## whether the log-cleanup sidecar container is added to the scheduler Pod (default: true)
    ##
    enabled: true

    ## resources for the sidecar's container spec
    ##
    resources: {}

    ## the number of minutes to retain log files (by last-modified time)
    ##
    retentionMinutes: 21600

    ## the number of seconds between each check for files to delete
    ##
    intervalSeconds: 900

workers:
  logCleanup:
    ## SAME AS ABOVE
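For reference, the sidecar itself could be little more than a loop over `find`. A minimal sketch, assuming the container receives the values above through environment variables (the variable names and the default log path are illustrative, not part of the chart):

```shell
#!/bin/sh
# Hypothetical cleanup loop for the log-cleanup sidecar container.
# LOG_PATH would be wired from `logs.path`; RETENTION_MINUTES and
# INTERVAL_SECONDS from the `logCleanup` values shown above.
LOG_PATH="${LOG_PATH:-/opt/airflow/logs}"
RETENTION_MINUTES="${RETENTION_MINUTES:-21600}"
INTERVAL_SECONDS="${INTERVAL_SECONDS:-900}"

while true; do
  # delete log files whose last-modified time is older than the retention window
  find "${LOG_PATH}" -type f -name "*.log" -mmin "+${RETENTION_MINUTES}" -delete
  sleep "${INTERVAL_SECONDS}"
done
```

Because `-mmin` keys off last-modified time, actively written log files are never touched, which matches the "retain by last-modified time" semantics of `retentionMinutes`.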

WARNING: we should NOT allow {scheduler,workers}.logCleanup.enabled and logs.persistence.enabled to be true at the same time (enforced via ./_helpers/validate-values.tpl), because multiple sidecars sharing the same persistent volume could race to delete each other's files. The error message should suggest that users create an airflow cleanup job instead.
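That mutual-exclusion check could be sketched in Helm template syntax roughly as follows; the helper name and error message are assumptions, only `fail` and the values paths come from the source:

```
{{/* illustrative sketch of a check in ./_helpers/validate-values.tpl */}}
{{- if and .Values.logs.persistence.enabled (or .Values.scheduler.logCleanup.enabled .Values.workers.logCleanup.enabled) -}}
{{- fail "`logCleanup.enabled` cannot be used with `logs.persistence.enabled`; consider creating an airflow cleanup job instead" -}}
{{- end -}}
```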

@thesuperzapper thesuperzapper added the kind/enhancement kind - new features or changes label Jan 11, 2022
@thesuperzapper thesuperzapper added this to the airflow-8.7.0 milestone Jan 11, 2022
@thesuperzapper thesuperzapper added this to Unsorted in Issue Triage and PR Tracking via automation Jan 11, 2022
@thesuperzapper thesuperzapper moved this from Unsorted to PR | Needed in Issue Triage and PR Tracking Jan 11, 2022
thesuperzapper (Member Author) commented:

This issue is related to memory usage increasing over time when our scheduler liveness probe is enabled.

Whatever fix landed for apache/airflow#14924 does not resolve the issue for our scheduler liveness probe, which still shows ever-increasing cache memory usage (with some improvement gained by deleting the scheduler log files).

@thesuperzapper thesuperzapper moved this from Triage | Needs PR to Triage | Work Started in Issue Triage and PR Tracking Mar 30, 2022
Issue Triage and PR Tracking automation moved this from Triage | Work Started to Done Mar 31, 2022