Regularly restart GitPython-based services #156
Closed
Problems
For months now, the bot has been skipping a lot of runs because of the checks from #112 and #115.
The current pass is also taking an extremely long time to complete: over 10 hours to get from 'A' to 'P', where it used to take 10-20 minutes for 'A' to 'Z'.
This has meant a very long delay in the post-merge handling of KSP-CKAN/NetKAN#7831 (and who knows what else).
Cause
We still don't know exactly why this happens, but based on graphs that @techman83 has shared previously, it seems to start out mild and then get gradually worse the longer the containers run. It may be due to some aspect of GitPython, but we don't have proof of that, and rewriting to use another tool is a big project (see #148).
Changes
There's already a RestartWebhooks service to handle changes in certs once a week.
This pull request creates new copies of that service to automatically restart the Indexer, Adder, and Mirrorer every 3 days. This should reset whatever the problem is with the Python-based services and hopefully give us better service availability.
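As a rough illustration of the schedule described above (not the actual infrastructure code in this pull request), the restart cadence can be sketched as a small check of which services are due. The `RESTART_INTERVALS` mapping and the `services_due_for_restart` helper are hypothetical names made up for this sketch; only the service names and intervals come from the description.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping for illustration only. The service names and
# intervals follow the description; the real scheduling mechanism
# used by the infrastructure is not shown here.
RESTART_INTERVALS = {
    "Webhooks": timedelta(days=7),   # existing weekly restart for cert changes
    "Indexer": timedelta(days=3),
    "Adder": timedelta(days=3),
    "Mirrorer": timedelta(days=3),
}


def services_due_for_restart(last_restarts, now):
    """Return the names of services whose restart interval has elapsed.

    last_restarts maps a service name to the datetime of its last restart.
    A service with no recorded restart is treated as due.
    """
    due = []
    for name, interval in RESTART_INTERVALS.items():
        last = last_restarts.get(name)
        if last is None or now - last >= interval:
            due.append(name)
    return due


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last = {
        "Webhooks": now - timedelta(days=2),
        "Indexer": now - timedelta(days=4),
        "Adder": now - timedelta(days=1),
        "Mirrorer": now - timedelta(days=3),
    }
    print(services_due_for_restart(last, now))
```

A fixed interval like this does not address the underlying slowdown; it only bounds how long a container can run before being reset.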