COPY with excluded files is not possible #15771

Open
bronger opened this issue Aug 22, 2015 · 94 comments
Labels
area/builder kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny

Comments

@bronger

bronger commented Aug 22, 2015

I need to COPY a part of a context directory to the container (the other part is subject to another COPY). Unfortunately, the current possibilities for this are suboptimal:

  1. COPY and prune. I could remove the unwanted material after an unlimited COPY. The problem is that the unwanted material may have changed, so the cache is invalidated.
  2. COPY every file in a COPY instruction of its own. This adds a lot of unnecessary layers to the image.
  3. Writing a wrapper around the "docker build" call that prepares the context in some way so that the Dockerfile can comfortably copy the wanted material. Cumbersome and difficult to maintain.
@cpuguy83
Member

See https://docs.docker.com/reference/builder/#dockerignore-file
You can add entries to a .dockerignore file in the root of the project.

@bronger
Author

bronger commented Aug 22, 2015

.dockerignore does not solve this issue. As I wrote, "the other part is subject to another COPY".

@cpuguy83
Member

So you want to conditionally copy based on some other copy?

@bronger
Author

bronger commented Aug 22, 2015

The context contains a lot of directories A1...A10 and a directory B. A1...A10 have one destination, B has another:

COPY A1 /some/where/A1/
COPY A2 /some/where/A2/
...
COPY A10 /some/where/A10/
COPY B some/where/else/B/

And this is awkward.

@cpuguy83
Member

What part of it is awkward? Listing them all individually?

COPY A* /some/where/
COPY B /some/where/else/

Does this work?

@bronger
Author

bronger commented Aug 22, 2015

The names A1..A10, B were fake. Besides, COPY A* ... throws together the contents of the directories.

There are a couple of options, I admit, but I think that all of them are awkward. I mentioned three in my original posting. A fourth option is to rearrange my source code permanently so that A1..A10 are moved into a new directory A. I was hoping that this would not be necessary, because an additional nesting level is not something to wish for, and my current tools would then need to special-case my dockerised projects.

(BTW, #6094 (following symlinks) would help in this case. But apparently, this is no option either.)

@cpuguy83
Member

@bronger if COPY behaved exactly like cp, would that solve your use-case?

I'm not sure I 100% understand.
Maybe @duglin can have a look.

@cpuguy83 cpuguy83 reopened this Aug 22, 2015
@duglin
Contributor

duglin commented Aug 22, 2015

@bronger I think @cpuguy83 asked the right question: how would you solve this if you were using 'cp'? I looked and didn't notice any kind of excludes option on 'cp', so I'm not sure how you would solve this outside of a 'docker build' either.

@bronger
Author

bronger commented Aug 22, 2015

With cp behaviour, I could ameliorate the situation by saying

COPY ["A1", ... "A10", "/some/where/"]

It's still a mild maintenance problem because I would have to think of that line if I added an "A11" directory. But that would be acceptable.

Besides, cp does not need excludes, because copying everything and removing the unwanted parts has almost no performance impact beyond the copying itself. With docker's COPY, it means wrongly invalidated cache every time B is changed, and bigger images.

@duglin
Contributor

duglin commented Aug 22, 2015

@bronger you can do:

COPY a b c d /some/where

just like you were suggesting.

As for doing a RUN rm ... after the COPY ..., yes you'll have one extra layer, but you still should be able to use the cache. If you see a cache miss due to it let me know, I don't think you should.

@bronger
Author

bronger commented Aug 22, 2015

But

COPY a b c d /some/where/

copies the contents of the directories a b c d together, instead of creating the directories /some/where/{a,b,c,d}. It works like rsync with a slash appended to the src directory. Therefore, the four instructions

COPY a /some/where/a/
COPY b /some/where/b/
COPY c /some/where/c/
COPY d /some/where/d/

are needed.
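
For readers who want to see the two behaviours side by side without Docker, here is a minimal sketch using plain cp. It is only an analogy for the COPY behaviour described above, not how Docker implements it, and the directory names a, b, by_name and merged are made up for the demo.

```shell
#!/bin/sh
# Sketch: "copy the directories by name" versus "copy their contents".
# All names below are hypothetical and live in a throwaway temp dir.
set -eu

demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b" "$demo/by_name" "$demo/merged"
touch "$demo/a/x" "$demo/b/y"

# Plain directory operands: each source keeps its own level in the target.
cp -R "$demo/a" "$demo/b" "$demo/by_name/"

# A trailing "/." copies only the contents, so the files of a and b end up
# side by side, like the result described for COPY a b /some/where/.
for d in a b; do
    cp -R "$demo/$d/." "$demo/merged/"
done

# List what each variant produced.
find "$demo/by_name" "$demo/merged" -type f | sort
```

The first variant is what a single multi-directory COPY would need to do in order to replace the four separate instructions.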

As for the cache ... if I say

COPY . /some/where/
RUN rm -Rf /some/where/e

then the cache is not used if e changes, although e is not effectively included in the operation.
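
To make the cache argument concrete, here is a rough sketch that checksums "everything a COPY would pick up" with and without the unwanted directory. It assumes nothing about Docker's real cache key, which is computed differently. The helper context_sum and the directories a and e are hypothetical.

```shell
#!/bin/sh
# Sketch only: a stand-in for "checksum of the copied files".
# Docker's actual cache key differs; this just shows which inputs matter.
set -eu

# context_sum DIR [SKIP]: checksum names and contents of all regular files
# under DIR, optionally leaving out the top-level directory SKIP.
context_sum() {
    (
        cd "$1"
        find . -path "./${2:-.no-such-dir}" -prune -o -type f -print |
            LC_ALL=C sort |
            while IFS= read -r f; do
                printf '%s\n' "$f"
                cat "$f"
            done | cksum
    )
}

ctx=$(mktemp -d)
mkdir -p "$ctx/a" "$ctx/e"
echo one > "$ctx/a/file"
echo two > "$ctx/e/file"

full_before=$(context_sum "$ctx")
part_before=$(context_sum "$ctx" e)

echo changed > "$ctx/e/file"   # modify only the directory we do not want

full_after=$(context_sum "$ctx")
part_after=$(context_sum "$ctx" e)

if [ "$full_before" != "$full_after" ]; then
    echo "whole context: checksum changed (COPY . would miss the cache)"
fi
if [ "$part_before" = "$part_after" ]; then
    echo "context without e: checksum unchanged (an exclude could keep the cache)"
fi
```

The later RUN rm cannot help here, because the checksum is taken over the input of the COPY, before anything is removed.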

@duglin
Contributor

duglin commented Aug 23, 2015

@bronger yep, sadly you're correct. I guess we could add a --exclude zzz type of flag, but per https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax it may not get a lot of traction right now.

@bronger
Author

bronger commented Aug 23, 2015

Fair enough. Then I will use a COPY+rm for the time being and add a FixMe comment. Thank you for your time!

@pwaller
Contributor

pwaller commented Aug 23, 2015

Just to 👍 this issue. I regularly regret that COPY doesn't mirror rsync's trailing slash semantics. It means you can't COPY multiple directories in a single statement, leading to layer proliferation.

I regularly encounter a case where I want to copy many directories except for one (which will be copied later, because I want it to have different layer-invalidation effects), so --exclude would be useful, as well.

Also, from man rsync:

       A trailing slash on the source changes this behavior to avoid creating
       an additional directory level at the destination. You can think of a
       trailing / on a source as meaning "copy the contents of this directory"
       as opposed to "copy the directory by name", but in both cases the
       attributes of the containing directory are transferred to the
       containing directory on the destination. In other words, each of the
       following commands copies the files in the same way, including their
       setting of the attributes of /dest/foo:

              rsync -av /src/foo /dest
              rsync -av /src/foo/ /dest/foo

I guess it can't be changed now without breaking a lot of wild Dockerfiles.

@pwaller
Contributor

pwaller commented Aug 23, 2015

As a concrete example, let's say I have a directory looking like this:

/vendor
/part1
/part2
/part3
/...
/partN

I want something that looks like:

COPY /vendor /docker/vendor
RUN /vendor/build
COPY /part1 /part2 ... /partN /docker/ # copy directories part1-N to /docker/part{1..N}/
RUN /docker/build1-N.sh

So that part1-N doesn't invalidate building of /vendor. (since /vendor is rarely updated compared to part1-N).

I have previously worked around this by putting part1-N in their own directory, so:

/vendor
/src/part1-N

But I have also encountered this problem in projects that I am not at liberty to rearrange quite so easily.

@antoineco

@pwaller good example, we're facing the exact same issue. The main problem is that Go's filepath.Match doesn't allow much creativity compared to regular expressions (i.e. there is no way to negate a pattern).

@jason-kane

I just came up with a somewhat crack-brained workaround for this. COPY can't exclude directories, but ADD can expand tgz.

It's one extra build step:
tar --exclude='./deferred_copy' -czf all_but_deferred.tgz .
docker build ...

Then in your Dockerfile:
ADD ./all_but_deferred.tgz /application_dir/
.. stuff in the rarely changing layers ..
ADD . /application_dir/
.. stuff in the often changing layers

That gives the full syntax of tar for including/excluding/whatever without gobs of wasted layers trying to include/exclude.

@mikeknep

@jason-kane This is a nice trick, thanks for sharing. One small point: it looks like you can't add the z (gzip) flag to tar—it changes the sha256 checksum value, which invalidates the Docker cache. Otherwise this approach works great for me.
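
A small sketch of the uncompressed variant, for anyone trying this out. It writes the archive outside the context so that tar never sees its own output, and it checks that the archive bytes stay the same when only the excluded directory changes. pack_context, the app directory and the file names are hypothetical. The result only holds as long as nothing else in the context, including timestamps, has changed between the two runs.

```shell
#!/bin/sh
# Sketch: plain tar (no -z), so the archive depends only on the files
# that are actually included. pack_context is a hypothetical helper.
set -eu

# pack_context CONTEXT_DIR ARCHIVE_PATH
# The --exclude option is given before the member list on purpose.
pack_context() {
    tar -C "$1" --exclude='./deferred_copy' -cf "$2" .
}

ctx=$(mktemp -d)   # stands in for the build context
out=$(mktemp -d)   # archive goes here, outside the context
mkdir -p "$ctx/app" "$ctx/deferred_copy"
echo stable > "$ctx/app/main.txt"
echo volatile > "$ctx/deferred_copy/notes.txt"

pack_context "$ctx" "$out/all_but_deferred.tar"
first=$(cksum < "$out/all_but_deferred.tar")

echo edited > "$ctx/deferred_copy/notes.txt"   # change only the excluded part

pack_context "$ctx" "$out/all_but_deferred.tar"
second=$(cksum < "$out/all_but_deferred.tar")

if [ "$first" = "$second" ]; then
    echo "archive unchanged"
fi
```

The archive would still have to be placed in the build context before docker build so that ADD can see it. If compression is wanted, gzip's -n option omits the name and timestamp from the gzip header, which may be worth trying.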

@matthewmueller
Contributor

+1 for this issue, I think it could be supported in the same way a lot of glob libraries support it:

Here's a proposal to copy everything except node_modules

COPY . /app -node_modules/

@duypm

duypm commented Aug 1, 2016

I came across the same problem as well, and it's kind of painful for me when my Java webapp is about 900 MB but almost 80% of that rarely changes.
My application is at an early stage and the folder structure is somewhat stable, so I don't mind adding 6-7 COPY layers to be able to use the cache, but it will surely hurt in the long term when more and more files and directories are added.

@jfroffice

👍

@kkozmic-seek

I have the same problem although with docker cp, I want to copy all files from a folder except for one

@oaxlin

oaxlin commented Sep 9, 2016

Exact same issue here. I want to copy a git repo and exclude the .git directory.

@antoineco

@oaxlin you could use the .dockerignore file for that.

@kkozmic-seek

@antoineco are you sure that will work? It's been a while since I tried but I'm pretty sure .dockerignore didn't work with docker cp, at least at the time

@antoineco

@kkozmic-seek absolutely sure :) But the docker cp CLI subcommand you mentioned is different from the COPY statement found in the Dockerfile, which is the scope of this issue.

docker cp has indeed nothing to do with Dockerfile and .dockerignore, but on the other hand it's not used for building images.

@maresja1

Would really like this as well - to speed up build I could copy some folder in earlier parts of the build and then cache would help me out ...

@olalonde

I'm not sure I understand what the use case is but wouldn't just touching the files to exclude before COPY solve the problem?

RUN touch /app/node_modules
COPY . /app
RUN rm /app/node_modules

AFAIK COPY doesn't overwrite files, which is why I think this might work.

@asbjornu

asbjornu commented Jun 16, 2020

I don't like the suggestions to have to repeat everything inside the .dockerignore file for every COPY statement in the Dockerfile. Being able to remain DRY with what's going to be a part of the image and what's not should be a priority, imho.

Looking at #33923, I don't think it's coincidental that what you want to exclude from the build context is exactly the same stuff you want to be excluded from COPY statements. I believe something like this would be a good solution:

COPY --use-dockerignore <source> <target>

Or perhaps even something like this:

COPY --use-ignorefile=".gitignore" <source> <target>

Seeing how .dockerignore is usually a 90% reproduction of .gitignore already, it feels extra annoying having to repeat every ignored file and folder yet again for each and every COPY statement. There has to be a better way.

@Antiarchitect

@asbjornu .gitignore and .dockerignore are not the same thing at all, especially for multistage builds, where artifacts are generated in a build stage and are not present in git at all but should nevertheless be included in the resulting image.
And yes, with multistage builds introduced THERE SHOULD BE an ability to use different .dockerignore files per stage - absolutely.

@richardrl

I often want to copy outside of "docker build". In these cases, .dockerignore does nothing. We need an amendment to "docker cp"; it's the only sensible solution.

@Gaura

Gaura commented Sep 24, 2020

It's been 5 years since this issue was opened. In September 2020, I still want this. A lot of people have suggested hacks to work around it, but almost all of them, and others, have requested an exclude flag in some form or another. Please don't let this issue go unresolved any longer.

@cpuguy83
Member

If you want something, you need to work on it or find someone to work on it.

@bronger
Author

bronger commented Sep 24, 2020

> If you want something, you need to work on it or find someone to work on it.

First we need to know whether upstream wants this.

@cpuguy83
Member

#15771 (comment)

@stalkerg

stalkerg commented Oct 19, 2020

After reviewing the source code, I think we should first extend the copy function in https://github.com/tonistiigi/fsutil/blob/master/copy/copy.go and then extend backend.go in libsolver; only after that will it be possible to extend the AST and frontend of buildkit.
But after that, the copy will be closer to rsync semantics than to Unix cp.

UPDATE: yes, after extending copy.go everything will be close to moby/buildkit#1492 plus parsing list of excludes.

@hetii

hetii commented Mar 4, 2021

Here #33923 (comment) I describe my workaround, which uses any .dockerignore in the project.

@smac89

smac89 commented Apr 19, 2021

I just wanted to leave a comment here to say that any suggestion which involves first doing COPY followed by RUN rm ... defeats the purpose of having immutable builds and the advantage of caching layers between builds.

The problem is that the moment that ignored file is modified, the next build becomes invalidated and has to discard its cache, which makes the build take longer. See the note at COPY:

Note
The first encountered COPY instruction will invalidate the cache for all following instructions from the Dockerfile if the contents of <src> have changed. This includes invalidating the cache for RUN instructions. See the Dockerfile Best Practices guide – Leverage build cache for more information.

So really any solution that will work has to strive to preserve the build cache, otherwise it's not worth it

@srstsavage

Looks like progress has been made in moby/buildkit#2082 but that selective COPY still isn't available in Docker.

Looking forward to this feature. I have a situation now where I want to copy a large directory of data assets into an image in a step before copying in the rest of the project's assets. The data directory rarely changes, so I want to avoid copying it in and creating another large image layer every time a change happens in a small text file outside of that large directory. Currently this doesn't seem possible unless I exhaustively specify every asset outside of the data directory in the COPY directive or move all non-data assets into a subdirectory in the project.

@gtarsia

gtarsia commented Mar 2, 2022

TLDR for people looking for a solution: I've read this whole thread and the most viable solution I've seen is the one where he tars the entire directory and then uses the ADD instruction in the Dockerfile to get those files. Not ideal, but probably the best we have. #15771 (comment).

I would suggest this as a better solution: DON'T use the --exclude flag in tar because there's a lot of weirdness to it (like the archive trying to archive itself because of using . as argument, even if you exclude it with --exclude, so if you use * as argument, so as to not archive the archive, it will pick everything overriding whatever --exclude flags you had, and sure, we could archive the pkg outside of our project, but that's not clean or safe, did I mention --exclude flags should be placed as first flags when using tar for some weird reason otherwise it doesn't work? takes deep breath).

Instead, pipe your list of files to tar; that way we don't have to deal with any of tar's insanity.

ls -A -I <file-or-dir-to-ignore> -I node_modules -I '.git*' | xargs tar --mtime='1970-01-01' -zcf pkg.tgz
docker build ......

Here I used the -I flag in ls to ignore these files/dirs.
Also, I'm forcing the modified time of the archived files to the same date so even if you resave the non-excluded files with same contents, the cache won't be invalidated because of different modified time attributes.

In case it's unclear to someone, this is helpful for multi stage builds, if you don't have a multi stage build just ignore files in .dockerignore.

Ontopic now:

Hoping 2022 is finally the year this gets implemented.

It's funny to think that this problem originates because docker uses Golang's Match, which was written with time guarantees as a central aspect. Apparently, it's very hard to implement a regex with both time guarantees and negative lookaheads, otherwise something like COPY [(?!excludedfile)] . would have worked just fine.

@alahijani

alahijani commented Aug 10, 2022

There is another workaround using multistage builds that also plays nice with the cache. You would need to add a new FROM stage before your existing stages and delete the excluded files in that stage. I call it the scratchpad stage:

# The scratchpad where we curate the files before the actual build:
FROM alpine AS scratchpad
WORKDIR /files
COPY maindir maindir
RUN rm -rf maindir/ex1 maindir/ex2 maindir/ex3...

# Your original Dockerfile goes here:
FROM ...
...
COPY --from=scratchpad /files/maindir maindir
RUN <stuff that depends on maindir...>

If you make a change to any of the excluded files it will only invalidate the cache for the scratchpad, but since the end result after rm is the same, the second FROM stage still uses its cached layers.

@FelipeJz

I can't believe @matthewmueller's proposal hasn't been implemented after all these years, wow.

@criscola

criscola commented Oct 4, 2023

Can we just get one simple --exclude option for COPY, please?

@ye7iaserag

This is very annoying, since now we can use --mount=type=bind, which has to be inside the context, and if you add the bound directory to the .dockerignore you get an error.
In my use case I use this bind mount to install stuff and it has 10GB of data, we are forced to have those useless 10GBs in the final image which makes no sense...

@thaJeztah
Member

This is something that needs to be implemented in BuildKit, and there's tracking issues that are still open;

That said, using a --mount allows you to selectively copy files, using tools such as rsync (which may be more powerful than .dockerignore as well). With that approach, you can use a RUN instead of COPY to select the files to copy. Note, however, that the build-context still needs to be sent to the builder (as --mount during build will always use a sandboxed / copy of files).

Here's a quick example;

Create a "project" with some directories and files, some of which to be excluded

mkdir cpexclude && cd cpexclude
mkdir -p exclude_me include_me/dir
touch one two three exclude_me/four exclude_me/five exclude_me/six include_me/foo include_me/dir/bar

build Dockerfile, using a mount for the build-context, and use rsync instead of COPY (I'm using )

# syntax=docker/dockerfile:1

FROM alpine
WORKDIR /app
RUN apk add --no-cache rsync tree
RUN --mount=type=bind,target=/temp/src \
  rsync -ar --progress /temp/src/ /app/ --exclude exclude_me

Verify that the expected files are included, and the exclude_me dir is not present;

docker run --rm foo tree /app

/app
├── include_me
│   ├── dir
│   │   └── bar
│   └── foo
├── one
├── three
└── two

But it can be bind-mounted at runtime (if needed);

docker run --rm --mount type=bind,src=$(pwd)/exclude_me,dst=/app/exclude_me foo tree /app
/app
├── exclude_me
│   ├── five
│   ├── four
│   └── six
├── include_me
│   ├── dir
│   │   └── bar
│   └── foo
├── one
├── three
└── two

@ye7iaserag

Thanks @thaJeztah
I wrote a bash script to generate the COPY instructions for all folders except ignored ones, and I update them in the Dockerfile, which feels like a very bad solution, but at least it's working for now.
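
The script itself was not posted, so the following is only a guess at what such a generator might look like. gen_copy_lines, the /app destination and the directory names are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch of a generator for per-directory COPY lines.
# Not the script from the comment above; every name here is hypothetical.
set -eu

# gen_copy_lines CONTEXT_DIR DEST_ROOT [IGNORED_DIR...]
# Prints one COPY instruction per top-level directory that is not ignored.
gen_copy_lines() {
    src_root=$1
    dest_root=$2
    shift 2
    for path in "$src_root"/*/; do
        if [ ! -d "$path" ]; then
            continue                    # glob matched nothing
        fi
        name=$(basename "$path")
        skip=no
        for ignored in "$@"; do
            if [ "$name" = "$ignored" ]; then
                skip=yes
            fi
        done
        if [ "$skip" = no ]; then
            printf 'COPY %s %s/%s/\n' "$name" "$dest_root" "$name"
        fi
    done
}

demo=$(mktemp -d)
mkdir -p "$demo/api" "$demo/web" "$demo/data"

gen_copy_lines "$demo" /app data
# prints:
# COPY api /app/api/
# COPY web /app/web/
```

Each generated line is still its own layer, and regenerating the lines changes the Dockerfile itself, so this trades one problem for another.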

@Nowaker

Nowaker commented Oct 29, 2023

> but at least it's working for now

It's not an acceptable solution: changing a command causes a cache miss and fresh layers are built, which is a waste.

@thaJeztah
Member

> It's not an acceptable solution

It was an example for an alternative / workaround.

As mentioned at the start of that comment, ultimately, this is something that requires changes in BuildKit, and there's tracking issues for that;

> This is something that needs to be implemented in BuildKit, and there's tracking issues that are still open;

I'm locking the conversation on this ticket, as there's nothing actionable in this repository until this is supported in BuildKit. Once supported by BuildKit, and the BuildKit build-time dependency is updated in this repository, this ticket can be resolved.

@moby moby locked and limited conversation to collaborators Oct 30, 2023