Secondary tasks never run when repo count is small #2349

Closed
ABrain7710 opened this issue Apr 21, 2023 · 1 comment
ABrain7710 commented Apr 21, 2023

Description:
When running on the default repos, I noticed the secondary tasks weren't getting run. This happens because core collection finishes very quickly and updates the weight to a large number. The next time around there are no new repos to collect, so the scheduler picks all the old core repos for recollection. As a result, the core repos are always in the collecting state whenever the start_secondary_collection function runs.

How to reproduce:

  1. Create a new database with the default repos
  2. Run collection while watching the collection status table
  3. See how the core status will go from collecting to success to collecting to success, all while the secondary status stays pending

Expected behavior:
I think we should only consider repos for recollection when they are a day or more old. This would leave the core status as success for one whole day after finishing, so the secondary collection has a chance to start. It is also useful because a small instance collecting 30 repos won't recollect each of those repos multiple times a day just because there is nothing else to collect.
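
A minimal sketch of the age filter proposed above, not the scheduler's actual code. The `RepoStatus` class and the `repos_due_for_recollection` function are hypothetical stand-ins for a row of the collection status table and the scheduler's selection step; only the status values and the `core_data_last_collected` column name come from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class RepoStatus:
    # Hypothetical stand-in for one row of the collection status table.
    repo_id: int
    core_status: str  # "pending", "collecting", or "success"
    core_data_last_collected: Optional[datetime]


def repos_due_for_recollection(
    rows: List[RepoStatus],
    now: datetime,
    min_age: timedelta = timedelta(days=1),
) -> List[RepoStatus]:
    """Return successfully collected repos whose last core run is at least min_age old."""
    due = []
    for row in rows:
        # Skip repos that never finished core collection.
        if row.core_status != "success" or row.core_data_last_collected is None:
            continue
        # Only repos collected a day or more ago are candidates.
        if now - row.core_data_last_collected >= min_age:
            due.append(row)
    # Recollect the stalest repos first.
    due.sort(key=lambda r: r.core_data_last_collected)
    return due
```

With a filter like this, a repo that just finished core collection stays at success until the minimum age has passed, which gives start_secondary_collection a window in which to pick it up.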

@IsaacMilarky

I think the solution to this issue is to have the secondary task hook check the core_data_last_collected column instead of the status. That said, I think it also makes sense to have a hard limit so we don't collect data for repositories too often.
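
A rough sketch of that check, assuming the hook can read `core_data_last_collected` and the secondary status for each repo. The function name `ready_for_secondary` and its parameters are hypothetical and not taken from the codebase.

```python
from datetime import datetime
from typing import Optional


def ready_for_secondary(
    core_data_last_collected: Optional[datetime],
    secondary_status: str,
) -> bool:
    """Decide whether a repo can start secondary collection (hypothetical helper).

    The repo qualifies once core data has been collected at least once,
    even if the core status is currently "collecting" because of a recollection.
    """
    has_core_data = core_data_last_collected is not None
    return has_core_data and secondary_status == "pending"
```

Because the check no longer looks at the core status, a repo that is being recollected would still be eligible for secondary collection.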
