fix: Add caps to celery work #301

Draft: wants to merge 1 commit into base: main

@suejung-sentry (Contributor) commented Jul 9, 2025

[UPDATE: this PR as-is won't work unless I keep track of the start ID somewhere durable. Still thinking about it, TBD.]

A previous PR removed the bounds from the cleanup_flare() function, so it processed potentially millions of records in one run instead of limited batches. That opened too many concurrent file/socket connections (Redis, Postgres, GCS), exceeding the system's file descriptor limit (ulimit) and resulting in "Error 24: Too many open files" in Celery.

Fix: Added back bounded processing (max 200 batches per run), used .iterator() for memory efficiency, and added throttling (time.sleep(0.005)) between file deletions to prevent connection spikes. Note that this means it will take more daily jobs to pay down the rows needing cleanup, but we'll do it in more manageable batches.
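
To illustrate the shape of the fix described above, here is a minimal, framework-free sketch of bounded and throttled cleanup. It is not the code in this PR: `cleanup_bounded`, `delete_file`, and `batch_size` are hypothetical names chosen for the example, and a plain iterable of IDs stands in for the Django queryset that the real task would stream with .iterator(). Only the cap of 200 batches and the 0.005 second pause come from the description.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(ids: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Lazily yield lists of at most batch_size IDs without loading everything."""
    it = iter(ids)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def cleanup_bounded(
    ids: Iterable[int],
    delete_file: Callable[[int], None],  # hypothetical: deletes one stored file
    batch_size: int = 1000,              # assumption: not stated in the PR
    max_batches: int = 200,              # cap from the PR description
    pause_seconds: float = 0.005,        # throttle from the PR description
) -> int:
    """Process at most max_batches batches, pausing after each deletion."""
    deleted = 0
    # islice stops pulling batches once the cap is reached, so the
    # remaining IDs are never read during this run.
    for batch in islice(batched(ids, batch_size), max_batches):
        for item_id in batch:
            delete_file(item_id)
            deleted += 1
            time.sleep(pause_seconds)  # spread out connection usage
    return deleted


if __name__ == "__main__":
    seen: List[int] = []
    count = cleanup_bounded(range(10), seen.append, batch_size=2, max_batches=3, pause_seconds=0)
    print(count, seen)
```

With a cap like this, each run touches at most `batch_size * max_batches` rows, which is why the remaining backlog has to be paid down over several daily runs.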

Also note that this PR retains the fix the original PR above was meant to make (i.e., look at every row instead of just IDs 5000-500000, so we correctly process 1-4999 and 500001+).

Closes https://linear.app/getsentry/issue/CCMRG-1352/fix-ongoing-intermittent-celery-task-failures-reported-issue

@suejung-sentry marked this pull request as draft July 9, 2025 00:42
@suejung-sentry force-pushed the sshin/task-caps branch 3 times, most recently from 25d301a to efe0edb, July 9, 2025 03:17

codecov bot commented Jul 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.32%. Comparing base (f97b2d3) to head (efe0edb).

Current head efe0edb differs from pull request most recent head 4928645

Please upload reports for the commit 4928645 to get more accurate results.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #301   +/-   ##
=======================================
  Coverage   94.32%   94.32%           
=======================================
  Files        1228     1228           
  Lines       45278    45286    +8     
  Branches     1441     1441           
=======================================
+ Hits        42709    42717    +8     
  Misses       2268     2268           
  Partials      301      301           
Flag Coverage Δ
workerintegration 61.57% <4.76%> (-0.03%) ⬇️
workerunit 90.61% <100.00%> (+<0.01%) ⬆️



@@ -14,6 +14,11 @@
@sentry_sdk.trace
def cleanup_flare(
    context: CleanupContext,
    start_id: int = 1,
Contributor Author

Hmmm actually this totally won't work unless I persist the start somewhere durable. Maybe need to write a postgres table for this... will think about it
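
One possible shape for the durable start position, sketched under loud assumptions: the table name `cleanup_cursor`, the helpers `load_cursor`, `save_cursor`, and `run_once`, and the `ids_per_run` parameter are all hypothetical, and the stdlib `sqlite3` module stands in for Postgres so the example runs on its own. Nothing here is taken from the actual diff.

```python
import sqlite3
from typing import List


def load_cursor(conn: sqlite3.Connection, task_name: str, default: int = 1) -> int:
    """Return the next ID to process, or default if no run has been recorded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cleanup_cursor ("
        "task_name TEXT PRIMARY KEY, next_id INTEGER NOT NULL)"
    )
    row = conn.execute(
        "SELECT next_id FROM cleanup_cursor WHERE task_name = ?", (task_name,)
    ).fetchone()
    return row[0] if row else default


def save_cursor(conn: sqlite3.Connection, task_name: str, next_id: int) -> None:
    """Persist the position so the next scheduled run resumes from it."""
    # SQLite syntax; on Postgres this would be INSERT ... ON CONFLICT DO UPDATE.
    conn.execute(
        "INSERT OR REPLACE INTO cleanup_cursor (task_name, next_id) VALUES (?, ?)",
        (task_name, next_id),
    )
    conn.commit()


def run_once(conn: sqlite3.Connection, task_name: str, max_id: int, ids_per_run: int) -> List[int]:
    """Process one bounded window of IDs and advance the stored cursor."""
    start = load_cursor(conn, task_name)
    end = min(start + ids_per_run, max_id + 1)  # exclusive upper bound
    processed = list(range(start, end))  # placeholder for the real cleanup work
    save_cursor(conn, task_name, end)
    return processed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_once(conn, "cleanup_flare", max_id=5, ids_per_run=3))
    print(run_once(conn, "cleanup_flare", max_id=5, ids_per_run=3))
```

Because the cursor lives in the database rather than in a function default such as `start_id: int = 1`, each scheduled run would pick up where the previous one stopped instead of restarting from the first row.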
