Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Garbage collecting completed jobs #82

Open
gregwebs opened this issue Oct 2, 2023 · 6 comments
Open

Garbage collecting completed jobs #82

gregwebs opened this issue Oct 2, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@gregwebs
Copy link

gregwebs commented Oct 2, 2023

What is an appropriate strategy for garbage collecting old jobs?

gue deletes on completion and suggests inserting into a new table. Gue suggests doing this in a hook, but doesn't explain how to handle the edge case of a hook panic (job will get automatically retried).

I like the concept of moving data into an archive table (as suggested in the Gue README) but having the framework manage that to ensure error conditions are handled well.

@acaloiaro
Copy link
Owner

I think you're right that GC is best left to the library. This request has been sitting in my mental queue for a while.

How I'd like to proceed is:

  1. Be opinionated: make it "the neoq way" that jobs get moved to neoq_jobs_complete by default.
  2. Provide a configuration option that allows jobs to simply be purged instead of moved to neoq_jobs_complete

Thoughts?

@gregwebs
Copy link
Author

gregwebs commented Oct 2, 2023

That sound perfect to me. Should it be neoq_jobs_completed past tense?

For actual GC of the completed table, I think it is fine to leave that to the user for now as an out of band, and some users might want to keep it that way. For example if delete permission was not be available to neoq, then actual GC would fail and you would be back to some kind of error handling callback. It is also possible to provide an example neoq periodic job that does the completed GC. If that job were to fail then job failure monitoring would kick in.

It seems needed for backwards compatibility to have an option to maintain the current behavior. I can see how there might be a use case where the existing behavior is preferred because everything is in just one table. Although that seems limited if the tables can be combined with UNION.

@acaloiaro
Copy link
Owner

acaloiaro commented Oct 2, 2023

That sound perfect to me. Should it be neoq_jobs_completed past tense?

Yep!

For actual GC of the completed table, I think it is fine to leave that to the user for now as an out of band, and some users might want to keep it that way. For example if delete permission was not be available to neoq, then actual GC would fail and you would be back to some kind of error handling callback. It is also possible to provide an example neoq periodic job that does the completed GC. If that job were to fail then job failure monitoring would kick in.

I'd like to give this more thought, but I what I feel most inclined to do here is provide a WithJobArchive config option which would accept options such as MOVE, KEEP, DESTROY.

This behavior would likely live outside the job's transaction, but dependent upon its success. The reason being: if the job succeeds and its status is updated, we wouldn't want archival to affect the job's success. Failure to archive is a neoq problem; not the application embedding it.

It seems needed for backwards compatibility to have an option to maintain the current behavior. I can see how there might be a use case where the existing behavior is preferred because everything is in just one table. Although that seems limited if the tables can be combined with UNION.

I don't feel a strong obligation relative to neoq behavior until it hits 1.0. I have some obligation relative to API stability at this point (I guaranteed no changes to the neoq.Neoq API before 1.0). Currently, my thinking about when it'll reach 1.0 is in another issue. That writeup probably needs to live somewhere else; perhaps in the Status section of the README. I'll give this some thought as well.

@gregwebs
Copy link
Author

gregwebs commented Oct 2, 2023

This behavior would likely live outside the job's transaction, but dependent upon its success. The reason being: if the job succeeds and its status is updated, we wouldn't want archival to affect the job's success. Failure to archive is a neoq problem; not the application embedding it.

I like that. At some point it is a user problem though, so the mechanisms for reporting the problem to the user need to be well defined. Logging at error level is a standard fallback of course. But I use Sentry for error monitoring and it would be nice to have a hook to send such errors there, although I could make it with a user-defined periodic job that would query for both jobs in the completed state and jobs not archived (at some point these errors fall back to logging if reporting to Sentry doesn't work).

@acaloiaro
Copy link
Owner

acaloiaro commented Oct 2, 2023

Yeah I think a well-structured, error-level log is the bare minimum here. I'm hesitant to provide a use-case-specific hook for this, but conceptually, I think a hook is the best way to truly handle failure.

That brings us to a much broader topic of generalized hooks, which I'm personally not prepared to tackle without some outside help :)

@acaloiaro acaloiaro added the enhancement New feature or request label Oct 6, 2023
@acaloiaro acaloiaro added the in progress Work is in progress for an issue label Dec 28, 2023
@acaloiaro acaloiaro removed the in progress Work is in progress for an issue label Jan 28, 2024
@acaloiaro
Copy link
Owner

Pulling this out of "In Progress", as I think better telemetry and/or event "hooks" should come first.

@acaloiaro acaloiaro changed the title Garbage collecting completed jobs? Garbage collecting completed jobs Jan 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants