
Regularly restart GitPython-based services #156

Closed
wants to merge 2 commits into from


HebaruSan
Member

Problems

For months now, the bot has been skipping a lot of runs based on the checks from #112 and #115.


The current pass is also taking an extremely long time to complete: over 10 hours to get from 'A' to 'P', where a full 'A' to 'Z' pass used to take 10-20 minutes.


This has meant a very long delay in the post-merge handling of KSP-CKAN/NetKAN#7831 (and who knows what else).

Cause

We still don't know exactly why this happens, but based on graphs that @techman83 has shared previously, it seems to start out mild and get gradually worse the longer the containers run. It may be due to some aspect of GitPython, but we don't have proof of that, and rewriting to use another tool is a big project (see #148).

Changes

There's already a RestartWebhooks service that restarts the webhooks once a week to pick up certificate changes.

This PR adds duplicates of that service to automatically restart the Indexer, Adder, and Mirrorer every 3 days. This should reset whatever is degrading the Python-based services and hopefully improve service availability.
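As a rough illustration of what a scheduled restart amounts to, here is a minimal Python sketch that forces each service to replace its running containers. It assumes the services run on Amazon ECS and are restarted through boto3's `update_service(..., forceNewDeployment=True)`; neither detail is stated in this PR, this is not the RestartWebhooks implementation, and the cluster name `NetKANCluster` and the `restart_services` helper are hypothetical.

```python
from typing import Any, Iterable, List

# Hypothetical names: the real ECS cluster and service identifiers are not given in this PR.
CLUSTER = "NetKANCluster"
SERVICES = ("Indexer", "Adder", "Mirrorer")


def restart_services(ecs_client: Any, cluster: str, services: Iterable[str]) -> List[str]:
    """Force a new deployment of each ECS service so its tasks are replaced with fresh containers."""
    restarted = []
    for service in services:
        # forceNewDeployment=True asks ECS to start new tasks from the same task
        # definition and stop the old ones, so state accumulated in the old
        # containers (memory, local git checkouts) is discarded.
        ecs_client.update_service(
            cluster=cluster,
            service=service,
            forceNewDeployment=True,
        )
        restarted.append(service)
    return restarted
```

With boto3 this would be called as `restart_services(boto3.client("ecs"), CLUSTER, SERVICES)` from whatever fires on the 3-day schedule, for example an EventBridge (CloudWatch Events) rule with the schedule expression `rate(3 days)`. Taking the client as a parameter keeps the function testable without AWS credentials.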

HebaruSan added the Indexer (Receives inflated modules and adds them to CKAN-meta) and Mirrorer (Uploads mods to archive.org) labels on Apr 10, 2020
@techman83
Member

The only thing is, we'd have to restart these much more frequently than every 3 days. But we can implement it, and now that things have settled at work, I should be able to get back to looking at pygit.

I'll check this out in the next couple of days, unless I think of something else that might give us some breathing space. I can tell you, though, that the problem lies in the Indexer, and I believe it has to do with all the branching: the other containers do lots of git work without issue, and branching is one of the key differences. The reflog may also be an issue, but I can't see how it could be.

@HebaruSan
Member Author

KSP-CKAN/CKAN#3021 and KSP-CKAN/CKAN#3031 seem to have mostly solved the skipped runs problem by reducing the amount of staging, which @techman83 reports is important because git branch operations are costly and performed regardless of whether the files have changed. I think we can set this aside for now. Even if the problem returns, we know we can try to attack it by reducing usage of staging.

@HebaruSan HebaruSan closed this May 6, 2020
@techman83
Member

I don't know that we should reduce our use of staging, but maybe we can get a bit smarter about creating branches. That said, staging implies manual intervention, so being smart about when we stage is likely just as useful for less technical reasons.
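One way to be "smarter about creating branches" is to pay for the branch only when a file's content would actually change, since branch operations are costly and currently happen regardless. The Python sketch below shows that check under stated assumptions: the helpers `content_changed` and `stage_if_changed` and the `create_branch` callback are hypothetical, and this is not the logic from KSP-CKAN/CKAN#3021 or #3031 or how the Indexer actually decides what to stage.

```python
from pathlib import Path
from typing import Callable


def content_changed(path: Path, new_text: str) -> bool:
    """Return True if writing new_text would actually change the file at path."""
    if not path.exists():
        # A brand-new file always counts as a change.
        return True
    return path.read_text(encoding="utf-8") != new_text


def stage_if_changed(
    path: Path,
    new_text: str,
    create_branch: Callable[[Path, str], None],
) -> bool:
    """Call the (hypothetical) create_branch callback only when the file differs.

    Returns True if a branch was requested, False if staging was skipped.
    """
    if not content_changed(path, new_text):
        # Nothing new to review: skip the costly branch/commit/push cycle entirely.
        return False
    create_branch(path, new_text)
    return True
```

Comparing content before touching git keeps the expensive operations proportional to real changes rather than to the number of modules processed.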

@HebaruSan
Member Author

being smart about when we stage is likely just as useful for less technical reasons

Yup, my motivation for those PRs was not having to deal with unnecessary decisions. It was sheer dumb luck that it happened to also help with other problems. But now we know that if any future change increases the use of staging, there's a trade-off to be considered.

Labels
Indexer (Receives inflated modules and adds them to CKAN-meta), Mirrorer (Uploads mods to archive.org)
2 participants