Use docker cache across runs #17
See these links: However, note this warning:
This would essentially mean that we'd only be able to run one build at a time, which isn't really acceptable. However, I think there are at least partial solutions. If I follow the documentation above, it lays out how to create a volume for your container which you then mount as the docker cache dir. I think this would work similarly to how secret volumes work. For the docker cache it would look something like this:
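A minimal sketch of what I have in mind (the volume and image names below are placeholders, and it assumes the runner container is started with the Sysbox runtime and runs its own inner dockerd):

```sh
# Hypothetical example: create a named volume and mount it at the inner
# dockerd's data dir (/var/lib/docker) so layer and build caches survive
# across runs. Names here are made up for illustration.
docker volume create job-docker-cache

docker run -d --runtime=sysbox-runc --name job-runner \
  -v job-docker-cache:/var/lib/docker \
  some-dind-capable-image
```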
I believe this is doable and a pretty clean solution. One question to be answered though: what are the implications of allowing all instances of the same job to have access to the same cache? In particular, what are the security concerns?
Another thought after revisiting the previous comment. I think we could get around the limitation of restricting jobs to a single instance if we just don't make it an expectation that there will always be a cache, and instead treat it as a performance perk when possible. After all, that's really what the docker build cache is anyway: when you build something for the first time there is no cache, so it isn't something the user should expect to always exist. Of course, the user may expect that once they do one build, subsequent builds will be cached, but what about caches for builds running in parallel? How does docker itself handle cache layers for parallel builds? Do cached layers become available as soon as they are built, or not until the entire build finishes? And likewise for consuming a cached layer?
(1) is fairly straightforward and may be a good place to start. (2) brings a lot more into play. How do we proceed after these jobs have finished and another starts? Which volume will be used? I imagine we'd want to use the most recent one, but then should we just delete the other one? Image caches from older runs may still be useful in the future, so we likely won't want to delete them. I also think it would be best to give the impression that all the caches from all previous runs are in place unless they were removed explicitly. It would be confusing to suddenly be missing caches just because two jobs ran at the same time.

I wonder if it would be possible to somehow merge caches? This seems like it would be the ideal solution, but it could be complex. The idea would be that once two given volumes are unmounted and no longer used by any containers, the caches from both could be merged together, if that's even possible. Then the next job would have access to everything, as if the previous two concurrent jobs had run serially.

That said, even in the best case a build running in parallel with another won't have the cache from that other run. I think that's fine, since the build cache is really just a performance improvement and any dependence on an "ordered" cache shouldn't be assumed with docker anyway. If the user wants that, they should disable parallel builds (disabling parallel builds isn't yet supported).

It's also worth considering what effect this will have on the expectation that completely different docker images normally share the layers they have in common. Could this throw off someone's workflow? I don't have any insight here yet.
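For the "use the most recent one" option, a rough sketch of what selection could look like. This is hedged: it assumes per-run cache volumes are created with a label like `cache=docker-build`, which is not something we do today.

```sh
# Hypothetical: among labeled cache volumes, pick the one created most
# recently and mount that one for the next job.
latest=$(docker volume ls -q --filter label=cache=docker-build \
  | xargs -r docker volume inspect --format '{{.CreatedAt}} {{.Name}}' \
  | sort | awk 'END {print $2}')
echo "would mount cache volume: $latest"
```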
Another note of emphasis: the reason we can't mount the same build cache volume in every container is because
as mentioned in an above comment.
The approach we ended up taking was to simply mount a volume meant for a specific job. We don't check whether the volume is already mounted elsewhere. Sysbox's docs mention that
From what I can tell, these errors are reported in the inner dockerd's logs, which won't be visible in the job output (depending on how the user has set up their job). The job then fails without further output. This is not ideal; some error should be reported to the user (other than the 400). We should check, before the job is run, that the volume is not already in use and report this back to the user. A follow-up issue will be made for this work.
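One way such a pre-flight check could look (a sketch, not the actual implementation; the volume name is a placeholder):

```sh
# Hypothetical check before starting the job: if any container already has
# the cache volume mounted, fail fast with a clear message instead of
# letting the inner dockerd fail silently in its own logs.
if [ -n "$(docker ps -q --filter volume=job-docker-cache)" ]; then
  echo "cache volume job-docker-cache is already in use by another container" >&2
  exit 1
fi
```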
This will bring builds back down to their previous duration and, just as importantly (or maybe more so), will keep disk space from being used up so quickly.
I know there is documentation in the sysbox GitHub repo. That would be a good place to start.