Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dockerfile: Split the ADD command #3050

Closed
amluto opened this issue Dec 4, 2013 · 22 comments
Closed

dockerfile: Split the ADD command #3050

amluto opened this issue Dec 4, 2013 · 22 comments
Labels
area/builder exp/beginner kind/enhancement Enhancements are not bugs or new features but can improve usability or performance.

Comments

@amluto
Copy link

amluto commented Dec 4, 2013

Currently the ADD command is IMO far too magical. It can add local and remote files. It will sometimes untar a file and it will sometimes not untar a file. If a file is a tarball that you want to copy, you accidentally untar it. If the file is a tarball in some unrecognized compressed format that you want to untar, you accidentally copy it.

There's also essentially no security for downloads.

Can we have unambiguous options, please. For example:

COPY-IN <local file or directory> <target in image>

UNTAR <local tarball> <target directory in image>

DOWNLOAD NOVERIFY <URL> <target>

DOWNLOAD SHA256 <sha256 hash> <URL> <target>

etc.

@vieux
Copy link
Contributor

vieux commented Dec 4, 2013

👍 👍 👍 👍 👍

I agree with you Currently the ADD command is IMO far too magical

Here are my comments:

  • We should keep options in one keyword
  • Dockerfile syntax should be easy to learn, no need to have so many commands
  • Right now with your example there is no way to untar a remote tarball, I would suggest something like:
    • COPY/INJECT -> copy local/remote files/directory/archive in the image
    • UNTAR -> copy local/remote archive in the image and untar them

@tianon
Copy link
Member

tianon commented Dec 4, 2013

Massive +1s, and I love @vieux's suggestion of just COPY and UNTAR commands to remove some of the magic here. This is a major roadblock for a couple of my projects right now.

As a side note, the hash verification stuff was already covered in #2579, and I think that'd be the appropriate place to discuss that further, especially since splitting ADD alone is a big enough topic for one issue. :)

I think the "remote download" items could also do with Last-Modified and/or ETag HEAD lookups for when we eventually get "ADD caching", but that's not really the discussion here either. 👍

@amluto
Copy link
Author

amluto commented Dec 4, 2013

I can work with COPY and UNTAR.

@scalp42
Copy link

scalp42 commented Dec 8, 2013

explicit > implicit +1

@SvenDowideit
Copy link
Contributor

why UNTAR rather than RUN tar zxvf?

ie, INSERT a_tarfile then (do i need to set a WORKDIR?) RUN tar zxvf a_tarfile

first up, INSERT is thus the same as docker insert and when i need to run something...

@shykes
Copy link
Contributor

shykes commented Dec 11, 2013

One difference is that "run tar" depends on the version of tar available in the image. It's better to guarantee the same version everywhere (ideally docker would embed its own implementation) , and insome scenarios the images will not have tar at all.

On Tue, Dec 10, 2013 at 11:05 PM, Sven Dowideit notifications@github.com
wrote:

why UNTAR rather than RUN tar zxvf?
ie, INSERT a_tarfile then (do i need to set a WORKDIR?) RUN tar zxvf a_tarfile

first up, INSERT is thus the same as docker insert and when i need to run something...

Reply to this email directly or view it on GitHub:
#3050 (comment)

@SvenDowideit
Copy link
Contributor

understood

@tianon
Copy link
Member

tianon commented Dec 11, 2013

Also because that would double the size of our base images, not to mention where does the original tar come from when we start with nothing?

I think it's very important that we keep the ability to create a layer entirely from a tar (for our base images at least), I would just be extremely grateful for a way to do so with a remote tar too, and if we could disentangle ADD while we're at it, I see that as a win, personally.

@tonyarkles
Copy link

Another vote for disentangling it. I've just run into a problem, and even my hacky workaround wasn't enough to get around the ADD magic.

I've got a tarball in my context (local) that needs to get copied into the image compressed (it gets used later on directly, not decompressed). I started out trying to ADD it, but discovered that it got untarred. As a workaround, I renamed it to .foo instead of .tar.gz and it still got decompressed!

Not sure what my next attempt is going to be, but it'd be awesome if there were a straightforward way to just say "copy this file as-is".

Edit: A final workaround that actually worked:

Before hand:
openssl enc -aes-256-cbc -k password -in myfilename.tar.gz -out myfilename.encrypted

In Dockerfile:
ADD ./myfilename.encrypted /root/
RUN openssl enc -d -aes-256-cbc -k password -in /root/myfilename.encrypted -out /root/myfilename.tar.gz

@pda
Copy link
Contributor

pda commented Mar 17, 2014

Dockerfile / docker build should implement minimal functionality, and instead delegate to commands which are available on the CLI as docker <cmd> ….

I think ADD is the biggest offender on this front. Some symptoms:

  • ADD in a Dockerfile is confusing and doesn't have its own documentation page.
  • docker add doesn't exist; no CLI mechanism to recursively add a local directory to an image.

I imagine ADD is tricky because it depends on the build context (tarred $PWD sent to the docker daemon). But I'm sure we can work around that. If running with a build context, use that; if not, tar and send the specified directory to be added to the image.

Maybe this warrants an issue, not a comment, but it seems very relevant here.

@pnasrat
Copy link
Contributor

pnasrat commented Apr 19, 2014

We have a bunch of bugs on ADD at the moment. What's the best way to pitch a strawman implementation?

@thedeeno
Copy link

+1 this magical ADD also has surprising results when you WANT an archive in the image. For instance, when adding a locally built gem I need to tar it first so the expansion doesn't break my RUN gem install xxx command.

@snarlysodboxer
Copy link

+1 for no magical untar. - this is not intuitional IMO.

This has caused hours of headache and troubleshooting.
Example:

ADD ./database_to_import.tar /tmp/database_to_import.tar
RUN pg_restore --clean --format tar -d database_name /tmp/database_to_import.tar

fails with confusing errors that are hard to troubleshoot because /tmp/database_to_import.tar becomes a directory inside the container without warning (except for one sentence deep in the ADD documentation.)

For now the only way I can find to get a tar file into an image without it auto-untarring is to make an extra directory and put the tar file in there like this:

ADD ./database_dir/ /tmp/
RUN pg_restore --clean --format tar -d database_name /tmp/database_to_import.tar

But that quickly becomes cumbersome.

@unclejack
Copy link
Contributor

There's a new COPY instruction which can be used to copy local files to the image. This COPY instruction doesn't extract, it only copies.

Splitting up the ADD instruction still requires some discussion. Decompressing and extracting a tarball should still be possible after splitting up ADD.

@EdBoraas
Copy link

EdBoraas commented Aug 7, 2014

So, now that we have COPY, is there any hope at all of having ADD unpack remote archives as it does for local archvies? The consistency would be a huge plus, and I have some base image cases that would benefit from this. (Alternatively, if ADD's behavior should just be left alone, how about an UNPACK command that always unpacks; i.e. the flip-side of the COPY case?)

@tiborvass
Copy link
Contributor

@EdBoraas We thought about that, and although we definitely would like to have UNPACK or EXTRACT commands, we're also thinking we probably should not have such commands automatically download remote files. Which brings to DOWNLOAD and the problem on how to combine those now 3 commands: optional commands (DOWNLOAD vs UNPACK DOWNLOAD, and COPY vs UNPACK COPY), or using pipes, etc. The goal is to deprecate ADD without changing its current quirky behavior that many people rely on.

@Bilge
Copy link

Bilge commented Oct 8, 2014

👍 I require the ability to use an archive as the rootfs for my image stored remotely. Currently there is no way to do this with a Dockerfile. ADD will only extract local images which limits my options.

Whether ADD should support extracting remote archives (which introduces a BC break) or a new command is added to extract remote archives (which preserves BC and could be named more intuitively) is of little concern to me since at this moment in time the feature to extract remote archives into the file system is completely missing but I would make good use of it if it were available.

However, I'd like to cast a vote in favour of UNPACK or EXTRACT.

@jessfraz jessfraz added kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. white-belt labels Feb 26, 2015
@dreamcat4
Copy link

Need the following:

  • To have ADD or UNPACK or EXTRACT work for at least a minimum of compressed .tar.gz and tgz compressed format. But also as many others as is possible.
  • Have ADD or UNPACK or EXTRACT be able to fetch and download and unpack in 1 step. From remote http:// curl/wget URL - whether it is ssl verified or not.

@duglin
Copy link
Contributor

duglin commented Mar 1, 2015

Once we have #10775 merged we can look to add options to ADD to allow for finer control over what happens. I would prefer to do that than to add a ton of new top level Dockerfile commands that almost do the same thing as each other. We can even then look to merge COPY and ADD, even having just those two causes confusion.

@thaJeztah
Copy link
Member

Once we have #10775 merged we can look to add options to ADD to allow for finer control over what happens

👍

@dreamcat4
Copy link

We also need a reliable caching of ADD'ed tarballs. Docker build is supposed to set mtime from last-modified HTTP header. But isn't working for us, neither from github releases URLS (redirected to s3 with a session key). Nor from local Apache webserver.

Where #7748 is as another possible solution to our caching problem. It downloads them every time, which sucks.

@jessfraz
Copy link
Contributor

Hello!
We are no longer accepting patches to the Dockerfile syntax as you can read about here: https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax

Mainly:

Allowing the Builder to be implemented as a separate utility consuming the Engine's API will open the door for many possibilities, such as offering alternate syntaxes or DSL for existing languages without cluttering the Engine's codebase

Then from there, patches/features like this can be re-thought. Hope you can understand.

ntgussoni added a commit to ntgussoni/backstage that referenced this issue Mar 24, 2021
I spent a good 4 hours trying to understand why the images would build correctly locally but when built in jenkins/kaniko they would miss the `dist` folders.

Turns out ADD is too magical, and in some occasions it would not untar.  See: moby/moby#3050
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/builder exp/beginner kind/enhancement Enhancements are not bugs or new features but can improve usability or performance.
Projects
None yet
Development

No branches or pull requests