dockerignore implementation is relatively slow compared to Docker's implementation #859

@thomasboyt


I ran into an issue in a project where my builds (run through docker-compose) were taking an awfully long time, around 60 seconds, during the context build/upload stage. strace showed that a ton of time was being spent stat()ing files that were covered by my .dockerignore rules, which I found curious.

Oddly, when I simply used docker build to build the container, I didn't have this issue, and context build/upload took about 3-5 seconds. I couldn't figure out what was going wrong, so I investigated docker-py and found that almost all of my execution time was spent in this get_paths call.

It appears that the difference in execution time is because docker-py's implementation of dockerignore/tar exclusion is far slower than Docker's:

Docker's implementation of the dockerignore exclusion algorithm (seen here) walks through each folder, but does not descend into a directory if it matches an exclusion pattern. Meanwhile, docker-py first builds an array of every single file in the context folder, and only then applies a filter to that array. This seems to be what causes the massive difference in execution time when I build my project: docker-py iterates over thousands of files that Docker never visits at all.
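To make the difference concrete, here is a minimal sketch of the two strategies. It is not docker-py's or Docker's actual code: the function names are hypothetical, and Python's `fnmatch` stands in for Docker's Go pattern matcher (note that `fnmatch`'s `*` also matches `/`, which is looser than Docker's rules). Exception rules (`!foo`) are not handled here.

```python
import fnmatch
import os


def is_excluded(rel_path, patterns):
    """True if rel_path, or any of its parent directories, matches a pattern.

    Assumption: fnmatch is only an approximation of Docker's matcher.
    """
    parts = rel_path.replace(os.sep, "/").split("/")
    prefixes = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(fnmatch.fnmatch(p, pat) for p in prefixes for pat in patterns)


def list_then_filter(root, patterns):
    """Slow strategy: collect every file first, then filter the list."""
    all_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            all_paths.append(rel.replace(os.sep, "/"))
    return sorted(p for p in all_paths if not is_excluded(p, patterns))


def walk_with_pruning(root, patterns):
    """Fast strategy: never descend into a directory that is excluded."""
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root).replace(os.sep, "/")

        def rel(name):
            return name if rel_dir == "." else rel_dir + "/" + name

        # Assigning to dirnames[:] tells os.walk (top-down mode) to skip
        # the removed directories entirely, so their files are never stat()ed.
        dirnames[:] = [d for d in dirnames if not is_excluded(rel(d), patterns)]
        kept.extend(rel(f) for f in filenames if not is_excluded(rel(f), patterns))
    return sorted(kept)
```

Both functions return the same file list for plain exclusion patterns; the second one simply avoids touching anything under an excluded directory such as `node_modules`.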

I started on a fix, using what I believe are the same rules as Docker's algorithm: thomasboyt@9f302f6

This runs just as fast as Docker's implementation, but doesn't fully implement exception rules (e.g. !foo), leading it to fail a few tests. Before I go through and add this feature, I wanted to confirm that I'm on the right path (and that no one else has a better solution/algorithm to apply).
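For discussion, here is one possible way exception rules could coexist with pruning. This is a sketch under my own assumptions, not the code in my commit and not necessarily what Docker does: an excluded directory is only skipped when no `!` rule could plausibly apply beneath it. All function names are hypothetical, `fnmatch` is again a loose stand-in for Docker's matcher, and any exception rule containing a wildcard conservatively disables pruning.

```python
import fnmatch
import os


def split_rules(lines):
    """Parse ignore lines into ordered (pattern, is_exception) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rules.append((line[1:], True) if line.startswith("!") else (line, False))
    return rules


def is_excluded(rel_path, rules):
    """Last matching rule wins; a rule also applies if it matches a parent dir."""
    parts = rel_path.split("/")
    prefixes = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    excluded = False
    for pattern, is_exception in rules:
        if any(fnmatch.fnmatch(p, pattern) for p in prefixes):
            excluded = not is_exception
    return excluded


def may_hold_exception(rel_dir, rules):
    """Conservative check: could some '!' rule re-include a path under rel_dir?"""
    prefix = rel_dir + "/"
    for pattern, is_exception in rules:
        if not is_exception:
            continue
        # Wildcard exceptions might match anywhere, so never prune for them.
        if pattern.startswith(prefix) or any(c in pattern for c in "*?["):
            return True
    return False


def walk_with_exceptions(root, lines):
    rules = split_rules(lines)
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root).replace(os.sep, "/")

        def rel(name):
            return name if rel_dir == "." else rel_dir + "/" + name

        # Prune an excluded directory only if no exception can apply inside it.
        dirnames[:] = [
            d for d in dirnames
            if not is_excluded(rel(d), rules) or may_hold_exception(rel(d), rules)
        ]
        kept.extend(rel(f) for f in filenames if not is_excluded(rel(f), rules))
    return sorted(kept)
```

With rules `docs`, `node_modules`, and `!docs/README.md`, this still skips `node_modules` entirely but descends into `docs` so that `docs/README.md` can be re-included. The trade-off is that a broad exception such as `!*.md` gives up pruning altogether.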
