Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a docker image for caffe development. #3518

Merged
merged 1 commit into from
Feb 27, 2016

Conversation

elezar
Copy link
Contributor

@elezar elezar commented Jan 5, 2016

The idea is to add docker files for Caffe. Both for development purposes (my original motivation) and for providing pre-built Caffe images. The following steps can be considered:

@elezar
Copy link
Contributor Author

elezar commented Jan 6, 2016

Thanks for the pointers, I will have a look at the best practices and adjust the docker file accordingly. The code to change the user started as a more general script, and I can try to simplify it to be better suited to the task of building caffe.

In terms of usefulness, I use the image locally and thought I would submit it as a pull request so that it is part of caffe as it was also mentioned in #2313 (comment).

If it is not found to be useful, we can close the pull request without merging, and I will move the utilities to its own repository.

@flx42
Copy link
Contributor

flx42 commented Jan 6, 2016

I just think there is a lot of code to set the user, it might scare Docker beginners.

But don't get me wrong, I think this PR is very useful, the Caffe users list is full of topics where people are struggling to compile Caffe on their machine. You don't need to convince me of the benefits offered by Docker, but I'm not a maintainer of Caffe.
I think it would be great to have an additional Dockerfile where Caffe and all its dependencies would be built, this would be for users that are not interested in doing low-level Caffe development. Using Caffe won't even require cloning the github repo anymore, it would be done in the Dockerfile, using Caffe would be as simple as this:

docker build -t caffe github.com/BVLC/caffe#:docker
nvidia-docker run -ti caffe [args...]

@AwokeKnowing
Copy link

This is excellent. This will allow a reliable way to guide students through tutorials, as well as let programmers cleanly keep up to date with caffe releases as well as keep their scripts running on previous versions if breaking changes occur. Hopefully there will be a repository on dockerhub of images of past versions (from here on out). I hope the dockerfile can get some attention and get "generalized" or whatever it needs to be a standard way of using caffe.

@elezar
Copy link
Contributor Author

elezar commented Jan 7, 2016

Given that there seems to be demand, what needs to be changed/added so that this is as useful as possible? Some thoughts:

Something that would be nice to have would be that the docker images be tagged using the Caffe version. What is the status of #3311?

@flx42
Copy link
Contributor

flx42 commented Jan 11, 2016

I think those are good ideas, I would keep things simple as a start, it will be easier for the project maintainers to review, especially if they don't have any prior experience with Docker.
I think we should start with a single Docker image where Caffe if entirely pre-built, new users will be able to start a shell with Caffe installed with just a few commands.
Then, if this first approach is successful, we could add another image for development, for advanced users.

@elezar
Copy link
Contributor Author

elezar commented Jan 25, 2016

I have made some changes to the Dockerfiles. There are now two (CPU and GPU) Dockerfiles which provide runtime containers. These can be used as replacements for the caffe executable?

I have also simplified the script for running the development container somewhat.

I would appreciate input wrt finishing this pull request off? Does BVLC have a Docker Hub orgainisation, for example, so that we can set up triggered builds there? Are there any comments related to where the Dockerfiles are specified, and whether other flavors (e.g. OpenBLAS, CUDA 7.0, non-CUDNN) should be made available?

@flx42
Copy link
Contributor

flx42 commented Jan 26, 2016

I think this is a good start, I don't think other flavors are required right now. The benefit of Docker is to bundle all the dependencies, so for instance cuDNN is already included in the image. It will be transparent for users.

I would use the cmake build system actually instead of having to copy Makefile.config.example and then using sed to modify the options.

@elezar
Copy link
Contributor Author

elezar commented Jan 29, 2016

Ok. I will switch the docker files to cmake instead and only have the two flavours (CPU and GPU with the latest tested CUDA and CUDNN).

Any comments on the location of the docker files? Or are docker/runtime/cpu and docker/runtime/gpu ok for this purpose?

@flx42
Copy link
Contributor

flx42 commented Jan 29, 2016

Regarding the location: ask the maintainers, they are all labeled to this PR so they will probably answer when they have the time :)

@flx42
Copy link
Contributor

flx42 commented Feb 10, 2016

@elezar Are you done working on this PR? If you're done let me know so I can review it again.

@shelhamer: you tagged this PR, is Docker going to be on your roadmap in the future?

@elezar
Copy link
Contributor Author

elezar commented Feb 10, 2016

From my side, the Docker images are done. If someone could check out the PR
and merge, or give me feedback, that would be greatly appreciated.

@flx42
Copy link
Contributor

flx42 commented Feb 11, 2016

Please squash all your patches into one for easier review (see git rebase --interactive).

@elezar
Copy link
Contributor Author

elezar commented Feb 12, 2016

@flx42 I have squashed the changes, and tried to make the docker files more in keeping with the steps given on the installation website.

I have also removed the additions (such as jupyter ports) which are not core to the requirements of the images. I have added Dockerfile generation options to the makefile (although they are themselves checked in) so that these are generated automatically from templates. This ensures that they are uniform, and adds some options in terms of extending these for future versions.

wget

# Install the python requirements.
RUN wget https://raw.githubusercontent.com/BVLC/caffe/master/python/requirements.txt -O /tmp/requirements.txt && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never used docker so I'm definitely not an expert, but is it necessary to wget requirements.txt like this rather than getting it from the (presumably already downloaded) local copy of caffe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in response to one of @flx42 's comments, the development
image is intended to be used to build caffe from source locally (with
folders mounted as volumes. See start_caffe_docker.sh for example). This
means that the caffe source (and thus requirements.txt) is not available to
the image at build time. You will note that the runtime Dockerfiles (and
their corresponding templates) do use the locally available requirements
files.

Of course, if someone has a suggestion as to how this could be improved,
I'm all ears.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that the caffe source (and thus requirements.txt) is not available to the image at build time.

Why not? The source is available locally because it was checked out (in order to get the Dockerfile in the first place). Yes, you may need to run docker build at a higher level (run it at the root level of the repo, and ignore some things in .dockerignore to make it build faster), but everything is visible to the docker build script.

Why not just ADD the single file to /tmp/ (ADD python/requirements.txt /tmp/).

  1. If you use ADD and change requirements.txt, re-building the docker image will see the new version of the file, and use that. That will not happen with RUN since the command does not change (and thus you will use a stale copy).
  2. If you use RUN, the build will always use the remote master repository version (via wget), leading to some unintuitive errors if one tries to edit requirements.txt in a local copy and re-build the image.

@elezar elezar force-pushed the feature/docker_images branch 2 times, most recently from 462d18a to 3141521 Compare February 13, 2016 11:05

CMD bash
# Clone Caffe repo and move into it
RUN cd /opt && git clone https://github.com/BVLC/caffe.git && cd caffe && \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not instead require that you check out the repository, and then run docker build inside it? The way you've written it, you can only create a runtime docker image for the remote master branch. What if I want to make a runtime image for my own build? People modify caffe and have local versions all the time.

A more flexible approach would be to use ADD to add the local source files into /opt/. You check out the code first, then run docker build -t <some-tag> docker/runtime/gpu/Dockerfile.

@seanbell
Copy link

@elezar How do you see these images as being used? I understand that you want to create a canonical version that uses the official master-branch code, but I don't the proposed code achieves that.

The way it is written, the RUN command always checks out from master which doesn't have any version information. That isn't reliable since (a) the docker cache won't re-run the RUN commands by default, and (b) two people who build the same container will get different results since the git HEAD of the master branch changes over time.

I agree that for development you want to mount in the source code, and that is indeed helpful. But for everything else that gets copied into the image itself, I think ADD is the right way to go, not git clone or wget. If you do it with ADD, two people building the same container (from the same commit or git tag) will get the same result.

Alternatively, if you really want to use wget or git clone (which I think is a bad idea), you need to change the URL to refer to a specific commit or version number.

@flx42
Copy link
Contributor

flx42 commented Feb 13, 2016

Thank you for joining the debate, @seanbell! Disclaimer: if you don't know, I'm a maintainer of https://github.com/NVIDIA/nvidia-docker

Let's merge the answers into the main thread.

Why not? The source is available locally because it was checked out (in order to get the Dockerfile in the first place).

Not with his approach, he wants users to compile Caffe after shelling inside the container.
The docker build step will simply setup an environment with all the dependencies installed, the current code is mounted when executing:
https://github.com/zalando/caffe/blob/314152192e69b1fd83d7db962cb88add2fb64c0f/docker/start_caffe_docker.sh#L36
I think the idea is that you do all your development (coding, compiling, debugging) while staying inside the same container shell and the file edits will be persisted to your host filesystem (because of the volume).
But this looks like an advanced use case and I'm not sure we should encourage this method, at least not now.
Also, @seanbell is right, the wget approach looks error prone.

Why not instead require that you check out the repository, and then run docker build inside it? The way you've written it, you can only create a runtime docker image for the remote master branch. What if I want to make a runtime image for my own build? People modify caffe and have local versions all the time.

I think this use case is more suited to the "devel" image. The runtime image is precisely for people that don't want to do that. They want to start using Caffe, or they are already advanced users but they don't have custom layers. People do use deb packages sometimes, right? :)
Ideally, the runtime image could simply install a Caffe deb package and all the dependencies, this is what we did for NVIDIA/caffe:
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/caffe/Dockerfile

An advantage of this self-contained approach is that you don't need to clone the code, you can build a Dockerfile from GitHub directly:

docker build -t caffe github.com/BVLC/caffe#:docker

I like the convenience of avoiding to copy the whole repo on my local file system. And the point of this runtime image is to rely on a pristine, well-tested version, you don't want to ADD your local changes by mistake. Since there are no deb packages currently, we can git clone to a specific hash or tag instead. Always checking out the latest master is not good for reproducibility, indeed.

@seanbell, what you suggest sounds fine for the devel images and maybe that's what should be done, instead of having to mount the source at container launch time. Developers would then still be able to use their own branches.

Retrospectively, I think it might be better to start simple by only including the runtime images in this PR. This would be sufficient for caffe newcomers during hands-on tutorials.
The runtime images should be easier for the maintainers to review, it's pretty straightforward: it just looks like a sequence of build steps.
The devel image could be in a latter PR once people are more familiar with Docker.

@elezar
Copy link
Contributor Author

elezar commented Feb 14, 2016

Thanks for the comments @seanbell, and also the nice summary of the use cases @flx42. I think you may be right in suggesting that this pull request only include the runtime images. This is the use case that probably caters most to the target audience. That is to say people who want to quickly try caffe, or for a sort of archiving for particular versions. I will remove the other files for the time being and add them as a separate pull request (I should still be able to address my own use-case with some shell scripts, although the images may be larger than needed).

With regards to:

The way it is written, the RUN command always checks out from master which doesn't have any version information. That isn't reliable since (a) the docker cache won't re-run the RUN commands by default, and (b) two people who build the same container will get different results since the git HEAD of the master branch changes over time.

Yes, that is true. I do have the following to add:
Even if the command were to be run again, anyone building the images would have a bleeding edge (or possibly broken) version. One option would be to use the release tags (once they are available), or even just specify a particular hash. Ideally the update of the tag/hash in the docker file should be automated so that the overhead of releasing a new tag is reduced. I'm not sure how to get past the different version problem -- other than providing tagged images on Docker Hub or the like.

@shelhamer
Copy link
Member

I was able to build the caffe:{cpu,gpu} images and run with docker, nvidia-docker respectively to invoke caffe --version and open a shell.

However, the LeNet docker example fails because nvidia-docker neither recognizes -v nor -w and instead requires the full --volume= and --workdir= arguments. @elezar could you update the example script? @flx42 is this normal with nvidia-docker or did I mess up?

Here is an example call

nvidia-docker run --rm -ti -u 1000:1000 --volume=/home/shelhamer/caffe-dev:/workspace --workdir=/workspace caffe:gpu caffe train --solver=examples/mnist/lenet_solver.prototxt

that ran fine.

@flx42
Copy link
Contributor

flx42 commented Feb 27, 2016

@shelhamer Yeah that's a bug in nvidia-docker caused by the Docker 1.10 update, their options are now ambiguous: NVIDIA/nvidia-docker#46
I have a patch for this, I will push it Monday if it passes my tests.
The current workaround is to use long options as you discovered. So, it's not related to this PR; but it doesn't hurt to use long options since it's more explicit anyway.

@shelhamer
Copy link
Member

Alright, well I'm happy to merge once the example works whether by long options in this PR or a fix to nvidia-docker. @elezar feel free to force push an amended commit.

(If you do amend, consider formatting the commit message so it has a "title" line <80 chars and then further description after a newline as it formats better on the command line and on github, but no worries.)

These can be used as direct replacements for the Caffe executable.
@elezar
Copy link
Contributor Author

elezar commented Feb 27, 2016

I have switched to the --volume and --workdir options instead of -v and -w, respectively. These changes have also been reflected in the README. I was only able to run a quick sanity check on OSX. I don't currently have access to a Linux box. @shelhamer if you could just try to run the example again, and merge if you are happy.

shelhamer added a commit that referenced this pull request Feb 27, 2016
[build] Add docker images for running caffe out-of-the-box (caffe:cpu, caffe:gpu)
@shelhamer shelhamer merged commit 59d099c into BVLC:master Feb 27, 2016
@shelhamer
Copy link
Member

Thanks for the Caffe containers @elezar ! Next we'll figure out how to include these on the docker hub.

shelhamer added a commit to shelhamer/caffe that referenced this pull request Feb 27, 2016
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
shelhamer added a commit that referenced this pull request Feb 27, 2016
@flx42
Copy link
Contributor

flx42 commented Feb 27, 2016

Thank you for the additional fix @shelhamer, this bug is embarrassing for us, we will fix it quickly.

@elezar elezar deleted the feature/docker_images branch February 28, 2016 10:38
@elezar
Copy link
Contributor Author

elezar commented Feb 28, 2016

Thanks @shelhamer.

As for getting the images onto Docker Hub, I see two options. Either one sets up an automated build on Docker Hub itself (I have already done this for https://hub.docker.com/r/elezar/caffe), or one builds the images as part of some CI/CD steps and pushes them. I think the latter is the route that NVIDIA follows for https://hub.docker.com/r/nvidia/. Maybe @flx42 could comment on this?

I should add that for the automated builds on Docker Hub seem to be failing for the GPU images (with no log output). @flx42 do you have an idea what the reason for this may be?

@flx42
Copy link
Contributor

flx42 commented Feb 29, 2016

@elezar: we tried using the automated build on the DockerHub but we gave up on this idea because it was not suitable for our complex model with multiple images (see the discussion here).
In your case, I think both options should work. But be aware that doing automated testing for the GPU images could be challenging.

I don't know why it would fail on the Docker Hub, it used to work when I tested.

BlGene pushed a commit to BlGene/caffe that referenced this pull request Feb 29, 2016
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
BlGene pushed a commit to BlGene/caffe that referenced this pull request Mar 4, 2016
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Mar 17, 2016
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
SvenTwo pushed a commit to SvenTwo/caffe that referenced this pull request Apr 6, 2016
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
fxbit pushed a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
fxbit pushed a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016
[build] Add docker images for running caffe out-of-the-box (caffe:cpu, caffe:gpu)
fxbit pushed a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016
zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Oct 24, 2016
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Oct 24, 2016
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
acmiyaguchi pushed a commit to acmiyaguchi/caffe-fast-rcnn that referenced this pull request Nov 13, 2017
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC/caffe#3518 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants