caching and “apt-get update” #3313
How are you trying to install postgresql? Are you logging in and running For Debian or Ubuntu, if you're appending to a Dockerfile and want to use a cached build, another There may be a better way to do this - other suggestions welcome. There's some discussion of this in #880 For the |
I’m installing it with another With regards to your suggestion to run another What I’d suggest is having an expiration date for RUN instructions, so that e.g. |
I have made the following script to run
If you run this on |
My preferred method to combat this in a natural way that busts the cache only when necessary is to couple the update lines with the install lines, like so:
Then, if you ever change the list of packages, the cache is naturally and normally invalidated properly, causing Also, with the changes coming from stackbrew/debian (that will become just "debian" soon) and stackbrew/ubuntu, |
This solves the problem when you add a new package, but what about when you want to create a docker image that uses the same packages, but newer versions? Say you build your image three months ago, and since a new version of some package was released that you want in your docker image. However, docker won't install it, because it gets the old package list and old package version from the cache. |
This is where version pinning comes in handy (especially if the version of said package is actually important to your image), like
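The snippet that followed this comment was not preserved. As a rough sketch of what pinning could look like, assuming `debian:testing` as the base image and a hypothetical version string (`9.3.2-1` is a placeholder, not a version known to exist on any mirror):

```dockerfile
FROM debian:testing

# Pin the package to an exact version with apt's name=version syntax.
# The version below is a placeholder for illustration; replace it with
# one that `apt-cache policy postgresql` reports as available.
RUN apt-get update && apt-get install -y \
    postgresql=9.3.2-1
```

Bumping the pinned version changes the text of the `RUN` line, which invalidates the cached layer and reruns `apt-get update` along with the install.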
tianon, thanks for sharing your way of doing this. I don’t feel like this is a satisfactory solution for my case, though, where the version number is not important to me. What’s important is just that the build file keeps working, ideally with some sort of caching. With my expiration time suggestion, that’d be the case. |
Or you can add a line like this as the third line in your Dockerfiles (after the ENV LAST_UPDATED 2013-12-20 To update the cache, simply change the LAST_UPDATED line and it'll invalidate everything below it, including any |
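A minimal sketch of this approach, assuming `debian:testing` as the base image and `postgresql` as the package being installed. The `ENV` line is placed directly after `FROM` here because the lines that preceded it in the original comment were not preserved:

```dockerfile
FROM debian:testing

# Editing this date changes the instruction, so Docker stops reusing
# cached layers from this line downward.
ENV LAST_UPDATED 2013-12-20

# Both steps below are rebuilt whenever LAST_UPDATED changes.
RUN apt-get update
RUN apt-get install -y postgresql
```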
@ydavid365 in that case, why not just use -no-cache when you want to rebuild? |
@tianon I thought that if you select the `-no-cache` option it won't use the existing cache and also won't create any new cache for subsequent builds.
Nope, your new lines are prime for use in the next cache use (unless you specify |
wow that's not the behavior that I expected. hm. thanks for pointing that out
Building with |
I wonder if there should be some "Dockerfile best practices" documentation which includes patterns like |
could be a good idea to have a community wiki or somethin. not sure if there's one 'best' way to do things at the moment. it depends
Why don't we just add a |
I've had the same concerns and I have a few ideas... package managers improved to support limiting updates to a specific date
The result is that your Dockerfile intrinsically documents the date that Downsides:
maintain your own date-labelled base images
Downsides:
/shrug |
Per my upstream package-manager suggestion, I found:
Ran into this. As a cheap easy way to not have to invalidate everything via
As containers are generally supposed to be focused on a particular process, I've given up trying to manage this particular aspect (although @jjbohn's comment idea is probably the most practical for now). I just try to make sure that specific versions of certain packages are installed by
Yeah, as I was writing it down and messing with it, I realized that by using a date comment, I'm pretty much just making a temporal association the same way I would with version numbers. Might as well stick with tying to a specific version.
Running `RUN apt-get update` by itself will cause Docker to reuse the cached result when rebuilding the Dockerfile. This can lead to `apt-get install` failing once packages on the mirrors have been updated. The preferred method is `RUN apt-get update && apt-get install -y <any apt-get installs you need>`. This forces `apt-get update` to run again whenever a package version on that line is changed or new packages are added. moby/moby#3313
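As a sketch of the combined form described above (the base image and package names are illustrative assumptions):

```dockerfile
FROM debian:testing

# One RUN instruction: the package lists are refreshed in the same layer
# that installs the packages, so a stale cached `apt-get update` layer is
# never paired with a newer install step.
RUN apt-get update && apt-get install -y \
    git \
    postgresql
```

Adding or removing a package changes the instruction text, so the whole line, including `apt-get update`, runs again. If the line is unchanged, the cached layer is still reused, so this does not by itself pick up newer versions of the same packages.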
In fact I think there are bigger issues here. One is that if there are security updates to Debian packages, my images are not rebuilt. So a better approach would be some cloud service which would trigger Docker Hub rebuilds when packages you installed get security updates. That service could also re-trigger rebuilding without cache the layer where you cached the Debian package database. In this way it would be granular and images would be updated only when necessary. The other approach (probably not nice for Docker Hub) would be to have a base image with
Found this, a bit related: https://coreos.com/blog/vulnerability-analysis-for-containers/ |
@mitar you may be interested in "Project Nautilus", which was announced during DockerCon EU. I don't think there's a product page for that yet, but here are some slides; http://www.slideshare.net/Docker/official-repos-and-project-nautilus |
…docker images * See here: moby/moby#3313
This means that the result of the command is still stale, say, if a developer created an image in 2016 and then created a similar image in 2017 – the cache from 2016 will be used and the latest versions of the packages still won't be installed. |
@halt-hammerzeit What would you say was the desired behavior? Do you have a solution? |
https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#add-or-copy
If you're advocating the software being more "transparent" and "obvious" then caching The desired behaviour would be the software behaving more "obvious" and "transparent". |
@Sjord Ok, my suggestions then:
To avoid caching old package lists, every `apt-get install` should be prefixed with `apt-get update`. More info on the matter: - https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#apt-get - moby/moby#3313
This issue is about clarifying the following scenario:
I created a docker container a couple weeks ago. The dockerfile can be found at https://index.docker.io/u/stapelberg/git-daemon/. As you can see, it uses `RUN apt-get update`, which gets cached, as it should.
Now, a couple weeks later, the package lists have changed, and with the old package lists, I cannot install postgresql (I get 404s for the files that are no longer on the Debian mirrors).
Obviously, when running `docker build -no-cache -t=stapelberg/postgresql .`, this is not a problem, because the cache does not get used.
But that implies that I need to run every build that is based on Debian with -no-cache and can never make use of the cache. I have a hard time believing that this is how it’s supposed to be used.
I then tried to run `docker rmi` on the cached image:
That error message is horrible. It doesn’t tell me any details about the conflict, so I have no clue what’s going on. My guess is that the issue is that 3702cc3eb5c9 is still in the “image chain” for e.g. stapelberg/git-daemon, which I do want to keep.
So, how can one specify that a certain step should not be cached for longer than a day?
Or how is running “apt-get update” supposed to work in Docker?
Note that my images inherit from Debian testing, which makes the problem really obvious. But with any Debian(-based) operating system this problem exists. Even the stable release gets security updates when appropriate, or point releases. So one needs to have updated apt lists at all times.
Any clarification about what I’m doing wrong is appreciated. Thanks.
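One way to approximate a daily expiry, which is not proposed in this thread and assumes a Docker version that supports the `ARG` instruction and `--build-arg`, is to pass the current date into the build. `CACHE_DATE` is a hypothetical name chosen for illustration:

```dockerfile
FROM debian:testing

# Hypothetical build argument; pass it with, for example:
#   docker build --build-arg CACHE_DATE=$(date +%Y-%m-%d) -t stapelberg/postgresql .
ARG CACHE_DATE=unset

# The RUN step references CACHE_DATE, so a new value (a new day) causes a
# cache miss here, while builds on the same day can reuse the cached layer.
RUN echo "package lists as of ${CACHE_DATE}" \
    && apt-get update \
    && apt-get install -y postgresql
```

Instructions above the `ARG` line are unaffected, so only the package steps are rebuilt when the date changes.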