Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Stop caching anything with CircleCI for Exchange packs #22

Closed
blag opened this issue May 16, 2020 · 9 comments
Closed

Proposal: Stop caching anything with CircleCI for Exchange packs #22

blag opened this issue May 16, 2020 · 9 comments

Comments

@blag
Copy link

blag commented May 16, 2020

We've had a few difficult-to-debug problems with caching in our Travis-CI and CircleCI builds, both with packs in Exchange, and with ST2 itself.

Recent-ish issues around CircleCI caching:

(I know there are more, but I'm pressed for time at the moment, I'll fill this list in later - also @nmaludy might have a few ideas for this list)

Cache coherency, along with naming things and off-by-one errors, are one of the two hard problems in computer science, and we seem to invite these problems into our lives when their benefits are mostly rare (thinking about Exchange packs) and marginal (especially when compared to how slow the rest of CircleCI builds are).

I propose that, along with removing FORCE_CHECK_ALL_FILES in proposal #7, that we remove any caching in our CircleCI jobs - this means no pip/PyPI cache, no RPM caching, and no Apt caching. Furthermore, a common feature request for CircleCI is "how do I generate the cache key from multiple files", and as of this writing there simply isn't a good way to do so.

I'm intentionally restricting this proposal to just packs on Exchange. The main ST2 builds happen frequently enough, and receive enough attention from multiple "frequent flier" troubleshooters, to have relatively quick fixes, so the benefit of caching there might actually outweigh the additional test complexity and cost of breakage.

Our pack CI is complicated enough without having to also deal with CircleCI caching, so let's keep our tests as simple and straightforward as we possibly can, even if that means they take a bit longer to complete and use a bit more bandwidth.

@nmaludy
Copy link
Member

nmaludy commented May 16, 2020

I'm so +1 on this it isn't even funny. Been bit by this more than once recently, both with the pip cache and now currently with the apt cache. Problems like this cause you to have to touch every repo...

Doing this may slow down the pack builds slightly but will increase maintainer happiness (especially mine) tremendously.

It's been so bad I've stopped looking into build caches for our internal builds because of the problems we're experiencing in StackStorm.

I agree, let's start with exchange packs, this should be a very low impact change.

@punkrokk
Copy link
Member

punkrokk commented May 16, 2020 via email

@arm4b
Copy link
Member

arm4b commented May 18, 2020

I'm wondering how the overall and aggregated build timing for packs will look like if we remove the caching alltogether? At the same time how frequently we deal with the caching issues in StackStorm Exchange org?
This will help us to understand whether removing CircleCI caching will bring more good comparing to harm or vice-versa.

@blag BTW as noted in #7 (comment) it doesn't look like we're on CircleCI OpenSource plan yet for StackStorm-Exchange Github org.

@arm4b
Copy link
Member

arm4b commented May 18, 2020

We dig a little bit deeper with @nmaludy in StackStorm-Exchange/stackstorm-napalm#65 and spot several clear problems and conclusions from them:

v1-dependency-cache-py27-{{ checksum "requirements.txt" }} should not include apt cache

Check the CircleCI config, currently {{ checksum "requirements.txt" }} is used for cache invalidation in every pack, however besides of pip caching this bucket includes apt metadata and packages. And so with this approach there is no way to invalidate apt cache. That's the problem we're experiencing right now when apt upstream repo was updated.

Besides of that, to address the previous issues with caching, the key should rely not just on {{ checksum "requirements.txt" }}, but also on st2 pip requirements {{ checksum "pip-requirements.txt" }}.

- v1-dependency-cache-py27-{{ checksum "st2-requirements.txt" }}-{{ checksum "requirements.txt" }}

CircleCI should be using Docker containers with our pre-packaged apt tools

StackStorm-Exchange CI process includes installing some of the pre-req tools, example:

0 upgraded, 190 newly installed, 0 to remove and 0 not upgraded.
Need to get 129 MB of archives.
After this operation, 422 MB of additional disk space will be used.

Instead of installing them over and over (even from apt cache) we'll need to build Docker containers that would pre-built these apt requirements.
I believe it's just something we missed during the CircleCI v1 -> v2 transition when the workflow was transitioned but the missed the benefits of CircleCI is Docker-friendly in its v2.
So codifying Dockerfiles to Exchange CI repo, building them with the tools we need, uploading to Docker hub and relying on them in all Exchange CI/CD jobs would help.

Further improvements

Would be a good idea to explore further pipfile.lock approach which would include the real tree of pip dependencies like in other package managers, see https://circleci.com/docs/2.0/caching/#pip-python and https://github.com/pypa/pipenv. I think that's also our issue in stackstorm/st2 as we currently don't have real reproducible builds per https://reproducible-builds.org/ (which is BTW important in the context of our Linux Foundation application) and best practices.

@arm4b
Copy link
Member

arm4b commented May 18, 2020

Based on further SSH debug research in StackStorm-Exchange/stackstorm-napalm#65 (comment), looks like the apt cache was actually never stored/used in CircleCI 2.0. This is, in fact, not CircleCI cache issue at all.

So what's happening there is that apt repository metadata is based on a very old state where it points to the package URLs which were already deleted from upstream repos. You might find similar if you take unsupported/old .iso image and try to install the packages straight away without doing apt-get update.

I guess for now our short-term workaround to fix StackStorm-Exchange repos is to add apt-get update before doing apt-get install,
while long-term enhancement is to build Docker images so we don't need to get those packages installed every time during the CI build, see https://discuss.circleci.com/t/caching-apt-get-intall-results-in-2-0/12883/2

@nmaludy
Copy link
Member

nmaludy commented May 19, 2020

@armab I will work today on creating PRs with the apt-get update change for all existing packs.

With regards to previous cache problems, i don't have PRs, but several months ago there was a change in one of the st2 repos, maybe during one of the dependency updates of st2 where a requirements.txt file was changed in the upstream repo and the exchange packs were still using the old pip cache. This caused issues with unit tests i believe in one specific version of tests, either Python2 only, or Python 3 only (can't remember). The fix was to invalidate the cache for those repos so it would download new content.

I'm still +1 for not relying on cache. I agree with @armab on installing system-level dependencies in the container.

I also agree with others that have mentioned our dependency system could use an overhaul (not biased here, i just am feeling the pain and would welcome a change).

@arm4b
Copy link
Member

arm4b commented May 19, 2020

@nmaludy creating PR for all the Exchange repos would be really time consuming. I think we can just commit the build fix directly to master (like StackStorm-Exchange/stackstorm-napalm@179dfe1), assuming there is no potential code/diff conflict.

@blag BTW maybe you could share any script samples that would iterate over the stackstorm-exchange packs?

@arm4b
Copy link
Member

arm4b commented May 19, 2020

OK looks like https://github.com/StackStorm-Exchange/index/blob/master/v1/index.json#L6 is the object we can iterate on.

@nmaludy
Copy link
Member

nmaludy commented May 19, 2020

Pushing change to all repos started this morning. Should be done sometime today.

As a side note, i could have done this very fast, except that the Exchange Deploy is single threaded and if >1 thing tries to deploy concurrently the build fails. Right now i'm having to sleep ~5m30s between each push so we space it out enough for deploys to finish.

Would be super nice if i could just push all of the repos at once, then let the build system figure it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants