Description:
When running on the default repos, I noticed the secondary tasks weren't getting run. Core collection finishes very quickly and updates the weight to a large number. On the next pass there are no new repos to collect, so the scheduler picks all the old core repos to recollect. As a result, the core repos are always collecting whenever the start_secondary_collection function runs.
How to reproduce:
Create a new database with the default repos
Run collection while watching the collection status table
See how the core status will go from collecting to success to collecting to success, all while the secondary status stays pending
Expected behavior:
I think we should only consider repos for recollection when their last collection is a day or more old. This would leave the core status as success for one whole day after finishing, so the secondary collection has a chance to start. It is also useful because a small instance collecting 30 repos won't recollect each of those repos multiple times a day just because there is nothing else to collect.
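A minimal sketch of the one-day rule, just to illustrate the idea. The function name `is_due_for_recollection` and the constant `MIN_RECOLLECTION_AGE` are hypothetical, not existing code, and how the scheduler actually stores the last collection time is an assumption here.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed threshold: a repo must be at least one day old before recollection.
MIN_RECOLLECTION_AGE = timedelta(days=1)


def is_due_for_recollection(
    last_collected: Optional[datetime],
    now: datetime,
    min_age: timedelta = MIN_RECOLLECTION_AGE,
) -> bool:
    """Return True if a repo may be picked for (re)collection.

    A repo that has never been collected is always eligible. A repo that
    has been collected is eligible only once min_age has elapsed.
    """
    if last_collected is None:
        return True
    return now - last_collected >= min_age
```

With a filter like this, a repo that finished core collection an hour ago would be skipped, so its core status would stay at success long enough for secondary collection to pick it up.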
I think the solution to this issue is to have the secondary task hook check the core_data_last_collected column instead of the status. That said, I think it also makes sense to have a hard limit so we don't collect data for repositories too often.
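Roughly what I mean by checking the column instead of the status, as a sketch only. The `CollectionStatus` dataclass, the `secondary_status` field, the lowercase status strings, and `ready_for_secondary` are stand-ins I made up for illustration; only core_data_last_collected is a name taken from the actual table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CollectionStatus:
    # Hypothetical stand-in for one row of the collection status table.
    core_status: str
    core_data_last_collected: Optional[datetime]
    secondary_status: str


def ready_for_secondary(row: CollectionStatus) -> bool:
    """Decide eligibility from the timestamp, not from core_status.

    A repo qualifies once core data has been collected at least once,
    even if core collection happens to be running again right now.
    """
    return (
        row.core_data_last_collected is not None
        and row.secondary_status == "pending"
    )
```

The point of the design is that a repo whose core status has flipped back to collecting still counts as ready, because the timestamp records that core data already exists.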