Add a docker image for caffe development. #3518

Merged 1 commit on Feb 27, 2016

Contributor

elezar commented Jan 5, 2016

The idea is to add docker files for Caffe, both for development purposes (my original motivation) and for providing pre-built Caffe images. The following steps can be considered:

  • Create an "official" Caffe image that allows for caffe to be run without first building it (NVIDIA has their own version at https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/caffe/Dockerfile, for example). This would be used as in #3518 (comment).
  • Automatically build the Docker image and push it to Docker Hub (or some other repository).
  • Clean up the development Docker file (and the script around it) to be more concise and suited to starting a Caffe build environment.
Contributor

elezar commented Jan 6, 2016

Thanks for the pointers, I will have a look at the best practices and adjust the Dockerfile accordingly. The code to change the user started as a more general script, and I can try to simplify it to better suit the task of building caffe.

In terms of usefulness, I use the image locally and thought I would submit it as a pull request so that it becomes part of caffe, as was also mentioned in #2313 (comment).

If it is not found to be useful, we can close the pull request without merging, and I will move the utilities to their own repository.

Contributor

flx42 commented Jan 6, 2016

I just think there is a lot of code to set the user, it might scare Docker beginners.

But don't get me wrong, I think this PR is very useful, the Caffe users list is full of topics where people are struggling to compile Caffe on their machine. You don't need to convince me of the benefits offered by Docker, but I'm not a maintainer of Caffe.
I think it would be great to have an additional Dockerfile where Caffe and all its dependencies would be built; this would be for users who are not interested in doing low-level Caffe development. Using Caffe wouldn't even require cloning the GitHub repo anymore, since that would be done in the Dockerfile. Using Caffe would be as simple as this:

docker build -t caffe github.com/BVLC/caffe#:docker
nvidia-docker run -ti caffe [args...]

This is excellent. It will give us a reliable way to guide students through tutorials, let programmers cleanly keep up to date with caffe releases, and keep their scripts running on previous versions if breaking changes occur. Hopefully there will be a repository on Docker Hub of images of past versions (from here on out). I hope the Dockerfile can get some attention and be generalized into a standard way of using caffe.

Contributor

elezar commented Jan 7, 2016

Given that there seems to be demand, what needs to be changed/added so that this is as useful as possible? Some thoughts:

  • Create an "official" Caffe image that allows for caffe to be run without first building it (NVIDIA has their own version at https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/caffe/Dockerfile, for example). This would be used as in #3518 (comment).
  • Automatically build the Docker image and push it to Docker Hub (or some other repository).
  • Clean up the development Docker file (and the script around it) to be more concise and suited to starting a Caffe build environment.

Something that would be nice to have would be that the docker images be tagged using the Caffe version. What is the status of #3311?

Contributor

flx42 commented Jan 11, 2016

I think those are good ideas, but I would keep things simple as a start; it will be easier for the project maintainers to review, especially if they don't have any prior experience with Docker.
I think we should start with a single Docker image where Caffe is entirely pre-built, so new users can start a shell with Caffe installed with just a few commands.
Then, if this first approach is successful, we could add another image for development, for advanced users.

Contributor

elezar commented Jan 25, 2016

I have made some changes to the Dockerfiles. There are now two Dockerfiles (CPU and GPU) which provide runtime containers; these can be used as replacements for the caffe executable.

I have also simplified the script for running the development container somewhat.

I would appreciate input on finishing this pull request off. Does BVLC have a Docker Hub organisation, for example, so that we can set up triggered builds there? Are there any comments on where the Dockerfiles are located, and on whether other flavors (e.g. OpenBLAS, CUDA 7.0, non-cuDNN) should be made available?

Contributor

flx42 commented Jan 26, 2016

I think this is a good start, I don't think other flavors are required right now. The benefit of Docker is to bundle all the dependencies, so for instance cuDNN is already included in the image. It will be transparent for users.

I would use the cmake build system actually instead of having to copy Makefile.config.example and then using sed to modify the options.
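A cmake-based build step inside a Dockerfile might look roughly like this (a sketch only; the clone location, install prefix, and the CPU_ONLY flag are illustrative assumptions, not the PR's actual contents):

```dockerfile
# Illustrative sketch: build Caffe with CMake instead of copying
# Makefile.config.example and patching it with sed.
RUN git clone --depth 1 https://github.com/BVLC/caffe.git /opt/caffe && \
    mkdir /opt/caffe/build && cd /opt/caffe/build && \
    cmake -DCPU_ONLY=1 .. && \
    make -j"$(nproc)" && make install
```

This keeps the build configuration in CMake flags rather than in a string-edited config file.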

Contributor

elezar commented Jan 29, 2016

Ok. I will switch the Dockerfiles to cmake instead and only have the two flavours (CPU, and GPU with the latest tested CUDA and cuDNN).

Any comments on the location of the docker files? Or are docker/runtime/cpu and docker/runtime/gpu ok for this purpose?

Contributor

flx42 commented Jan 29, 2016

Regarding the location: ask the maintainers; they are all tagged on this PR, so they will probably answer when they have the time :)

Contributor

flx42 commented Feb 10, 2016

@elezar Are you done working on this PR? If you're done let me know so I can review it again.

@shelhamer: you tagged this PR, is Docker going to be on your roadmap in the future?

Contributor

elezar commented Feb 10, 2016

From my side, the Docker images are done. If someone could check out the PR and merge, or give me feedback, that would be greatly appreciated.

Contributor

flx42 commented Feb 11, 2016

Please squash all your patches into one for easier review (see git rebase --interactive).

Contributor

elezar commented Feb 12, 2016

@flx42 I have squashed the changes, and tried to make the Dockerfiles more in keeping with the steps given on the installation website.

I have also removed the additions (such as the jupyter ports) which are not core to the requirements of the images. I have added Dockerfile generation options to the Makefile (although the generated files are themselves checked in) so that they are generated automatically from templates. This ensures that they are uniform, and makes it easier to extend them for future versions.

@jeffdonahue jeffdonahue and 2 others commented on an outdated diff Feb 12, 2016

docker/templates/devel.template
+# Use --no-install-recommends for the boost libs.
+RUN apt-get install -y --no-install-recommends \
+ libboost-all-dev
+
+# Install utilities used for the build.
+RUN apt-get install -y \
+ build-essential \
+ cmake \
+ cmake-curses-gui \
+ git \
+ ca-certificates \
+ bc \
+ wget
+
+# Install the python requirements.
+RUN wget https://raw.githubusercontent.com/BVLC/caffe/master/python/requirements.txt -O /tmp/requirements.txt && \
@jeffdonahue

jeffdonahue Feb 12, 2016

Contributor

I've never used docker so I'm definitely not an expert, but is it necessary to wget requirements.txt like this rather than getting it from the (presumably already downloaded) local copy of caffe?

@elezar

elezar Feb 13, 2016

Contributor

As I mentioned in response to one of @flx42's comments, the development image is intended to be used to build caffe from source locally (with folders mounted as volumes; see start_caffe_docker.sh for example). This means that the caffe source (and thus requirements.txt) is not available to the image at build time. You will note that the runtime Dockerfiles (and their corresponding templates) do use the locally available requirements files.

Of course, if someone has a suggestion as to how this could be improved, I'm all ears.

@seanbell

seanbell Feb 13, 2016

Contributor

This means that the caffe source (and thus requirements.txt) is not available to the image at build time.

Why not? The source is available locally because it was checked out (in order to get the Dockerfile in the first place). Yes, you may need to run docker build at a higher level (run it at the root of the repo, and exclude some things via .dockerignore to make it build faster), but everything is visible to the docker build script.

Why not just ADD the single file to /tmp/ (ADD python/requirements.txt /tmp/)?

  1. If you use ADD and change requirements.txt, re-building the docker image will see the new version of the file, and use that. That will not happen with RUN since the command does not change (and thus you will use a stale copy).
  2. If you use RUN, the build will always use the remote master repository version (via wget), leading to some unintuitive errors if one tries to edit requirements.txt in a local copy and re-build the image.
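The suggestion, expressed as a Dockerfile fragment (a sketch, assuming docker build is run from the repository root so the file is in the build context):

```dockerfile
# Copy the local requirements file into the image instead of fetching
# it from the remote master branch. A changed file invalidates the
# cache for this and later layers, so rebuilds pick up local edits.
ADD python/requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
```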

@seanbell seanbell commented on an outdated diff Feb 13, 2016

docker/runtime/gpu/Dockerfile
+ ca-certificates \
+ bc \
+ wget
+
+# Install the python requirements.
+RUN wget https://raw.githubusercontent.com/BVLC/caffe/master/python/requirements.txt -O /tmp/requirements.txt && \
+ cat /tmp/requirements.txt && \
+ cat /tmp/requirements.txt | xargs -n1 pip install
+
+# Clean up the apt-get cache and remove unused packages.
+RUN apt-get clean \
+ && rm -rf /var/lib/apt/lists/*
+
+CMD bash
+# Clone Caffe repo and move into it
+RUN cd /opt && git clone https://github.com/BVLC/caffe.git && cd caffe && \
@seanbell

seanbell Feb 13, 2016

Contributor

Why not instead require that you check out the repository, and then run docker build inside it? The way you've written it, you can only create a runtime docker image for the remote master branch. What if I want to make a runtime image for my own build? People modify caffe and have local versions all the time.

A more flexible approach would be to use ADD to add the local source files into /opt/. You check out the code first, then run docker build -t <some-tag> -f docker/runtime/gpu/Dockerfile .
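The ADD-based runtime image described here might be sketched as follows (hypothetical fragment; it assumes the build is invoked from the repository root, and the build commands are illustrative):

```dockerfile
# Bake the local checkout, including any modifications, into the image.
ADD . /opt/caffe
WORKDIR /opt/caffe
RUN mkdir build && cd build && cmake .. && make -j"$(nproc)"
```

Two people building from the same commit would then get the same image, and local changes are included by construction.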

@seanbell seanbell commented on an outdated diff Feb 13, 2016

docker/runtime/cpu/Dockerfile
+ ca-certificates \
+ bc \
+ wget
+
+# Install the python requirements.
+RUN wget https://raw.githubusercontent.com/BVLC/caffe/master/python/requirements.txt -O /tmp/requirements.txt && \
+ cat /tmp/requirements.txt && \
+ cat /tmp/requirements.txt | xargs -n1 pip install
+
+# Clean up the apt-get cache and remove unused packages.
+RUN apt-get clean \
+ && rm -rf /var/lib/apt/lists/*
+
+CMD bash
+# Clone Caffe repo and move into it
+RUN cd /opt && git clone https://github.com/BVLC/caffe.git && cd caffe && \
@seanbell

seanbell Feb 13, 2016

Contributor

See comment on the gpu version -- cloning the official master branch only is limiting.

Contributor

seanbell commented Feb 13, 2016

@elezar How do you see these images as being used? I understand that you want to create a canonical version that uses the official master-branch code, but I don't think the proposed code achieves that.

The way it is written, the RUN command always checks out from master which doesn't have any version information. That isn't reliable since (a) the docker cache won't re-run the RUN commands by default, and (b) two people who build the same container will get different results since the git HEAD of the master branch changes over time.

I agree that for development you want to mount in the source code, and that is indeed helpful. But for everything else that gets copied into the image itself, I think ADD is the right way to go, not git clone or wget. If you do it with ADD, two people building the same container (from the same commit or git tag) will get the same result.

Alternatively, if you really want to use wget or git clone (which I think is a bad idea), you need to change the URL to refer to a specific commit or version number.

Contributor

flx42 commented Feb 13, 2016

Thank you for joining the debate, @seanbell! Disclaimer: if you don't know, I'm a maintainer of https://github.com/NVIDIA/nvidia-docker

Let's merge the answers into the main thread.

Why not? The source is available locally because it was checked out (in order to get the Dockerfile in the first place).

Not with his approach, he wants users to compile Caffe after shelling inside the container.
The docker build step will simply set up an environment with all the dependencies installed; the current code is mounted when executing:
https://github.com/zalando/caffe/blob/314152192e69b1fd83d7db962cb88add2fb64c0f/docker/start_caffe_docker.sh#L36
I think the idea is that you do all your development (coding, compiling, debugging) while staying inside the same container shell and the file edits will be persisted to your host filesystem (because of the volume).
But this looks like an advanced use case and I'm not sure we should encourage this method, at least not now.
Also, @seanbell is right, the wget approach looks error prone.

Why not instead require that you check out the repository, and then run docker build inside it? The way you've written it, you can only create a runtime docker image for the remote master branch. What if I want to make a runtime image for my own build? People modify caffe and have local versions all the time.

I think this use case is more suited to the "devel" image. The runtime image is precisely for people that don't want to do that. They want to start using Caffe, or they are already advanced users but they don't have custom layers. People do use deb packages sometimes, right? :)
Ideally, the runtime image could simply install a Caffe deb package and all the dependencies, this is what we did for NVIDIA/caffe:
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/caffe/Dockerfile

An advantage of this self-contained approach is that you don't need to clone the code, you can build a Dockerfile from GitHub directly:

docker build -t caffe github.com/BVLC/caffe#:docker

I like the convenience of not having to copy the whole repo to my local file system. And the point of this runtime image is to rely on a pristine, well-tested version; you don't want to ADD your local changes by mistake. Since there are no deb packages currently, we can git clone a specific hash or tag instead. Always checking out the latest master is not good for reproducibility, indeed.

@seanbell, what you suggest sounds fine for the devel images and maybe that's what should be done, instead of having to mount the source at container launch time. Developers would then still be able to use their own branches.

Retrospectively, I think it might be better to start simple by only including the runtime images in this PR. This would be sufficient for caffe newcomers during hands-on tutorials.
The runtime images should be easier for the maintainers to review; it's pretty straightforward, since it just looks like a sequence of build steps.
The devel image could come in a later PR once people are more familiar with Docker.

Contributor

elezar commented Feb 14, 2016

Thanks for the comments @seanbell, and also the nice summary of the use cases @flx42. I think you may be right in suggesting that this pull request only include the runtime images. This is the use case that probably caters most to the target audience: people who want to quickly try caffe, or to archive particular versions. I will remove the other files for the time being and add them as a separate pull request (I should still be able to address my own use case with some shell scripts, although the images may be larger than needed).

With regards to:

The way it is written, the RUN command always checks out from master which doesn't have any version information. That isn't reliable since (a) the docker cache won't re-run the RUN commands by default, and (b) two people who build the same container will get different results since the git HEAD of the master branch changes over time.

Yes, that is true. I do have the following to add: even if the command were to be run again, anyone building the images would get a bleeding-edge (or possibly broken) version. One option would be to use the release tags (once they are available), or even just to specify a particular hash. Ideally, updating the tag/hash in the Dockerfile should be automated so that the overhead of releasing a new tag is reduced. I'm not sure how to get past the different-versions problem, other than providing tagged images on Docker Hub or the like.

Owner

shelhamer commented Feb 14, 2016

@elezar

I think you may be right in suggesting that this pull request only include the runtime images. This is the use case that probably caters most to the target audience. That is to say people who want to quickly try caffe, or for a sort of archiving for particular versions.

As I'm still unfamiliar with Docker and would like to include an out-of-the-box method to run Caffe, I like the suggestion of splitting off the dev image and focusing on the runtime image in this PR. The dev image seems useful in itself, but new users need the most help, and it sounds like the runtime image will be easier to review (giving me more time to get up to speed with Docker and figure out the dev image).

Thanks for the work on this @elezar, the perspective @flx42, and the review @seanbell!

Contributor

elezar commented Feb 14, 2016

I have removed the development images from this pull request (I will create a new pull request for this as soon as I get a chance).

@seanbell, I have not yet addressed your concerns with regards to the clone command in the Dockerfile. Suggestions would be welcome.

Contributor

flx42 commented Feb 14, 2016

Also, maybe we can find a better name than runtime for this case.
What about deploy?

Contributor

elezar commented Feb 14, 2016

Maybe standalone?

Contributor

flx42 commented Feb 15, 2016

We're almost there I think, here's what I suggest:
https://gist.github.com/flx42/7769849495a91bea52af

  • Only one apt-get install command.
  • We can use --no-install-recommends for all the packages, it will limit the size of the image.
  • apt-get clean is not needed since it's automatically done for Docker Ubuntu images.
  • gfortran is not needed anymore
  • Use git clone --depth 1 for a faster clone.
  • Use a for loop instead of xargs; with xargs you won't stop at the first error, and finding the error message is troublesome afterwards.
  • Don't build the tests, it will just take more space in the image.
  • Regarding the entrypoint, I don't know what the best option is. Since caffe is in the path already, I'm not sure we want to fix the entrypoint. For running a small python tutorial it might actually get in the way.
  • Concerning the install path, I'm still unsure what we should do. I think that's the last question to answer.

I verified it briefly, it seems to work.
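Put together, the bullet points above might look like the following fragment (abbreviated and illustrative; the package list is a placeholder, and the linked gist is the authoritative version):

```dockerfile
# Single apt-get layer with --no-install-recommends to keep the image small.
# Package list abbreviated for illustration.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake git wget \
        libboost-all-dev python-dev python-pip && \
    rm -rf /var/lib/apt/lists/*

# Shallow clone for speed; a for loop (unlike xargs) stops at the
# first pip failure, making the offending package easy to find.
RUN git clone --depth 1 https://github.com/BVLC/caffe.git /opt/caffe && \
    for req in $(cat /opt/caffe/python/requirements.txt); do \
        pip install "$req" || exit 1; \
    done
```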

Contributor

elezar commented Feb 15, 2016

Thanks for the comments. I have updated the Dockerfiles accordingly. Some notes that I would like to add:

  • I have kept the WORKDIR /workspace line for the time being. In order for the docker image to be used as an executable, files on the local system need to be mapped into the container using volumes, and it is useful to have a fixed mount point in this case (the alternative would be to require the -w docker command-line flag).
  • I have left the ENTRYPOINT ["caffe"] line in. I think for the primary use case (running caffe as an executable), this is the way to go. The alternative would be to require something like docker run -ti caffe caffe, and I'm not a big fan of the stuttering. Note that if one wants to run python scripts or the like, one could change the entry point on the command line accordingly: docker run -ti --entrypoint python caffe some_python_script.py.

I don't feel that strongly about these though, and since they can both be addressed from the command line, I am willing to make the changes if they are preferred.
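The two settings under discussion, as a fragment (the run command below is illustrative; the solver path is a placeholder):

```dockerfile
# A fixed working directory gives volume mounts a known target,
# and the entrypoint makes the image behave like the caffe binary.
WORKDIR /workspace
ENTRYPOINT ["caffe"]
```

With these in place, something like `docker run -ti -v $(pwd):/workspace caffe train --solver=solver.prototxt` runs caffe against files in the current directory, while `--entrypoint python` swaps in a different executable.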

Contributor

flx42 commented Feb 15, 2016

Alright, let's see if someone else has an opinion on the entrypoint.
Otherwise: LGTM now!
Thanks for the effort :)

Contributor

elezar commented Feb 15, 2016

Thanks! @seanbell and @shelhamer do you have any final comments then?

@seanbell seanbell commented on an outdated diff Feb 17, 2016

docker/Makefile
@@ -0,0 +1,47 @@
+# A makefile to build the docker images for caffe.
+
+DOCKER ?= docker
+
+all: docker_files standalone
+
+.PHONY: standalone devel
+
+standalone: cpu_standalone gpu_standalone
+
+
+cpu_standalone: standalone/cpu/Dockerfile
+ $(DOCKER) build -t caffe standalone/cpu
@seanbell

seanbell Feb 17, 2016

Contributor

Should this be -t caffe:cpu, for symmetry with the GPU version?

@seanbell seanbell commented on an outdated diff Feb 17, 2016

docker/README.md
@@ -0,0 +1,46 @@
+# Caffe standalone Dockerfiles.
+
+The `standalone` subfolder contains docker files for generating both CPU and GPU executable images for Caffe. The images can be built using make, or by running:
+
+```
+docker build -t caffe standalone/gpu
@seanbell

seanbell Feb 17, 2016

Contributor

This is inconsistent with the Makefile. In the makefile, the name caffe (by itself) refers to the CPU version.

I think it would be clearest to tag them with caffe:gpu and caffe:cpu, or make caffe by itself refer to the GPU version since that's the default.

@seanbell seanbell commented on an outdated diff Feb 17, 2016

docker/README.md
+The `standalone` subfolder contains docker files for generating both CPU and GPU executable images for Caffe. The images can be built using make, or by running:
+
+```
+docker build -t caffe standalone/gpu
+```
+for example.
+
+Note that the GPU standalone requires a CUDA 7.5 capable driver to be installed on the system and [nvidia-docker|https://github.com/NVIDIA/nvidia-docker] for running the Docker containers.
+
+# Running Caffe using the docker image
+
+In order to test the Caffe image, run:
+```
+docker run -ti caffe --version
+```
+or
@seanbell

seanbell Feb 17, 2016

Contributor

This "or" is confusing since one is CPU and one is GPU (according to the Makefile). I think it would be clearer to explain the convention somewhere near the top of this file. There seem to be conflicting conventions right now.

Contributor

seanbell commented Feb 17, 2016

Sorry for the delay in replying.

Regarding entrypoint/workdir:
I personally prefer not having an entrypoint, as it leads to yet another thing to explain and mess up in tutorials, but that's just my opinion. The complexity vs "amount of typing saved" tradeoff isn't enough imo. It also looks less weird if you change the name of the image to caffe:gpu.

That being said, I'm okay with keeping the entrypoint as long as it's explained in the README (e.g. what to do if you want another command like bash or python). It adds complexity rather than reducing it.

Regarding RUN vs git clone:
I think there are 2 use cases for a runtime docker file right now, and the current code only addresses one of them, but could address both easily.

  1. Provide a canonical container artifact for trying out caffe, out-of-the-box. I think the best way to achieve this is to publish a docker image (e.g. on the hub) and refer to that. With the hub, people don't need to check out the repository. They just use a single docker run command and it pulls in the docker image. With that approach, it doesn't really matter whether you use git clone or ADD.
  2. Allow users to modify caffe (e.g. add new layers) and then use/share it as a standalone artifact. With this, you need to use ADD to get the locally modified version. You could modify the git clone step, which isn't elegant.

@flx42 Regarding ADD potentially including modifications, I don't think that this adds any confusion. In general, if you download a library, modify it, and then re-build it locally, you would expect the resulting artifact to contain your modifications. The current proposed standalone Dockerfile does not have that property. If you wanted the pristine artifact, you would use Docker Hub.

@flx42 Regarding "And the point of this runtime image is to rely on a pristine, well-tested version, you don't want to ADD your local changes by mistake": I think that using Docker Hub addresses that concern. The pristine, well-tested version is the remote artifact.

I still think that using git clone is more confusing/problematic than ADD, but I don't want to get in the way of progress. At a minimum, the known issues associated with it should be clearly documented in the README, i.e. the fact that this builds the bleeding-edge version and that the docker build cache won't re-clone the newest version of caffe master unless the cache is cleared.

Contributor

flx42 commented Feb 17, 2016

I think the PR is currently building the bleeding-edge version as a placeholder; we really need Caffe to tag version 1.0.0 soon. It would be useful for Docker Hub too.

Contributor

elezar commented Feb 18, 2016

Thanks for all the comments @seanbell (and @flx42) again.

I have tried to clean up the README, and to use the tags (:cpu and :gpu) consistently. I have removed the predefined entrypoint. As was pointed out, it does make things a little more complex.

With regard to cloning and building a specific branch, I have added a docker build arg (CLONE_TAG) to the Dockerfiles. This currently defaults to master (and thus the bleeding edge), but as soon as we have a tagged release we can change this accordingly. We can look into whether Docker Hub allows for the specification of build tags. It may be feasible to have both stable (tagged) images and the bleeding edge available there.
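The mechanism described might look like this (a sketch; the tag name in the usage example is hypothetical):

```dockerfile
# CLONE_TAG selects the branch or tag to build; defaults to master.
ARG CLONE_TAG=master
RUN git clone -b ${CLONE_TAG} --depth 1 https://github.com/BVLC/caffe.git /opt/caffe
```

A tagged image could then be built with something like `docker build --build-arg CLONE_TAG=rc3 -t caffe:rc3 standalone/cpu`.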

Contributor

seanbell commented Feb 18, 2016

It would be good to merge the pip install pydot command (that you just added) into the command that installs the other pip packages with &&. There's no reason for it to be a separate Docker filesystem layer.
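In Dockerfile terms, the suggestion is simply this (a sketch; the exact install command in the PR may differ):

```dockerfile
# One RUN keeps the requirements and pydot in a single filesystem layer.
RUN pip install -r /tmp/requirements.txt && pip install pydot
```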

Otherwise, this looks good to me! Thanks for all the changes.

Hi guys, I'm so glad the docker solution is being implemented. I am the "new guy who just wants to try out caffe" use case. The workflow I expect to use is like this:

  • I make a folder in my home directory called ~/digits/
  • In that folder, I just put digits.csv, and my layer defs and training script
  • (Here I don't know what to do, but what I want is to iteratively "edit a layer, type one docker command, and see the new training results", with the weights in my current folder (or wherever my training script specifies))
  • If I like the training results, I can just go to another computer (robot) that has ROS and docker installed, copy the weights and a script into a folder, and type one docker run command that will start up caffe, run the script, and generate predictions as the script gets new data (e.g. from ROS)

Is there a tutorial example of doing that with docker (Ubuntu 14.04) yet? What I want is to NOT install caffe on any computer. In fact, I don't want to install the CUDA toolkit or anything else: only the layers/scripts plus the docker image.

Contributor

elezar commented Feb 19, 2016

@seanbell I have now moved the pydot install to the point where the requirements file is processed.

@AwokeKnowing It sounds as if the images would be useful for your use case. If you want to look at how the images can be used to train a model, the pull request includes an mnist example at examples/mnist/train_lenet_docker.sh.

Contributor

seanbell commented Feb 19, 2016

If everyone's happy with the current state, then the next step is to squash everything into a single commit, for a clean git history.

@shelhamer any other comments?

Contributor

elezar commented Feb 19, 2016

Sure. As soon as I get the go-ahead, I will squash and push again.

(By the way, it seems that Docker Hub does not yet support the ARG keyword in Dockerfiles.) I have set up a test automatic build at https://hub.docker.com/r/elezar/caffe/

@shelhamer shelhamer commented on an outdated diff Feb 19, 2016

docker/standalone/cpu/Dockerfile
@@ -0,0 +1,44 @@
+FROM ubuntu:14.04
+#MAINTAINER fixme@example.com
@shelhamer

shelhamer Feb 19, 2016

Owner

Could you set this to caffe-maint@googlegroups.com ? We've set up this forwarding list to handle packaging and distribution maintenance. Thanks, and sorry for the wait while that was set up.

@shelhamer shelhamer commented on an outdated diff Feb 19, 2016

examples/mnist/train_lenet_docker.sh
@@ -0,0 +1,33 @@
+#!/usr/bin/env sh
+set -e
+# The following example allows for the MNIST example (using LeNet) to be run
+# using the caffe docker image instead of building from source.
+# The GPU-enabled version of Caffe can be used, assuming that nvidia-docker
+# is installed, and the GPU-enabled Caffe image has been built.
+# Setting the GPU environment variable to 1 will enable the use of nvidia-docker.
+# e.g.
+# GPU=1 ./examples/mnist/train_lenet_docker.sh
+#
+# Not the use of the -u, -v, and -w command line options to ensure that
@shelhamer

shelhamer Feb 19, 2016

Owner

Not -> Note

@shelhamer shelhamer and 1 other commented on an outdated diff Feb 19, 2016

examples/mnist/train_lenet_docker.sh
+then
+DOCKER_CMD=nvidia-docker
+IMAGE=caffe:gpu
+else
+DOCKER_CMD=docker
+IMAGE=caffe:cpu
+fi
+
+echo "Using $DOCKER_CMD to launch $IMAGE"
+
+$DOCKER_CMD run --rm -ti \
+ -u $(id -u):$(id -g) \
+ -v $(pwd):/workspace \
+ -w /workspace \
+ $IMAGE \
+ caffe train --solver=examples/mnist/lenet_solver.prototxt $*
@shelhamer

shelhamer Feb 19, 2016

Owner

Won't this fail since the MNIST data hasn't been downloaded nor have the lmdb been created? It needs to call the data/mnist and examples/mnist scripts first.

@elezar

elezar Feb 20, 2016

Contributor

@shelhamer You are correct. I have spent the morning looking at the script a little to make it as "standalone" as possible.

One problem is that I am currently on my Macbook, and as Docker runs in a VM there, there are some issues with permissions when trying to write to mapped volumes. I am hoping to get something pushed today though (along with the other fixes required).
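As a self-contained sketch, the CPU/GPU dispatch at the top of the quoted script behaves like this (GPU=1 selects nvidia-docker and the caffe:gpu image; anything else selects plain docker and caffe:cpu):

```shell
# Dispatch logic: the GPU environment variable chooses the launcher/image pair.
if [ "${GPU:-0}" -eq 1 ]; then
    DOCKER_CMD=nvidia-docker
    IMAGE=caffe:gpu
else
    DOCKER_CMD=docker
    IMAGE=caffe:cpu
fi
echo "Using $DOCKER_CMD to launch $IMAGE"
```

Running it with GPU unset falls through to the CPU pair.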

Owner

shelhamer commented Feb 19, 2016

(By the way, it seems that Docker Hub does not yet support the ARG keyword in Dockerfiles.) I have set up a test automatic build at https://hub.docker.com/r/elezar/caffe/

@elezar right, I just had this issue in trying to use the caffe:cpu image locally. Can you push a fix? Apart from that and my minor comments this looks good to me.

Thanks!

Contributor

elezar commented Feb 20, 2016

While addressing some of the last comments by @shelhamer I noted the following:

One problem is that I am currently on my Macbook, and as Docker runs in a VM there, there are some issues with permissions when trying to write to mapped volumes. I am hoping to get something pushed today though (along with the other fixes required).

When running an operation that creates an LMDB env (for reading OR writing) in a path rooted at a host volume, the following error is shown (for create_mnist.sh):

F0220 17:45:17.755547    19 convert_mnist_data.cpp:98] Check failed: mdb_env_open(mdb_env, db_path, 0, 0664) == 0 (22 vs. 0) mdb_env_open failed

and for caffe:

F0220 17:03:59.670662    10 db_lmdb.hpp:14] Check failed: mdb_status == 0 (22 vs. 0) Invalid argument

When performing both these operations in a non host-mapped path they are successful, and one is also able to copy any generated data to the host path successfully.

I should add that I don't think this is caffe-specific; I have just tried something similar with the NVIDIA DIGITS image. That is to say, when I run:

docker run --name digits -d -p 8080:34448 -v /Users/elezar/tmp/mnist:/data/mnist -v /Users/elezar/tmp/digits-jobs:/usr/share/digits/digits/jobs nvidia/digits

The creation of the LMDB database fails with the following log output:

2016-02-20 18:01:36 [20160220-180114-e35f] [ERROR] Create DB (train): InvalidParameterError: /usr/share/digits/digits/jobs/20160220-180114-e35f/train_db: Invalid argument
2016-02-20 18:01:36 [20160220-180114-e35f] [WARNING] Create DB (train) unrecognized output: Traceback (most recent call last):
2016-02-20 18:01:36 [20160220-180114-e35f] [WARNING] Create DB (train) unrecognized output: File "/usr/share/digits/tools/create_db.py", line 770, in <module>
2016-02-20 18:01:36 [20160220-180114-e35f] [WARNING] Create DB (train) unrecognized output: hdf5_dset_limit = args['hdf5_dset_limit'],
2016-02-20 18:01:36 [20160220-180114-e35f] [WARNING] Create DB (train) unrecognized output: File "/usr/share/digits/tools/create_db.py", line 283, in create_db
2016-02-20 18:01:36 [20160220-180114-e35f] [WARNING] Create DB (train) unrecognized output: mean_files, **kwargs)
2016-02-20 18:01:36 [20160220-180114-e35f] [WARNING] Create DB (train) unrecognized output: File "/usr/share/digits/tools/create_db.py", line 318, in _create_lmdb
2016-02-20 18:01:36 [20160220-180114-e35f] [WARNING] Create DB (train) unrecognized output: max_dbs=0)
2016-02-20 18:01:36 [20160220-180114-e35f] [WARNING] Create DB (train) unrecognized output: lmdb.InvalidParameterError: /usr/share/digits/digits/jobs/20160220-180114-e35f/train_db: Invalid argument

@flx42 have you come across something like this before?

I have previously tested the DIGITS commands on a Linux system at work with no problems there. I will now test the scripts that I have been working on to ensure that they work on Linux before pushing them. We can then discuss what our options are for OS X.
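The "(22 vs. 0)" in the check failures above is errno 22 (EINVAL). LMDB memory-maps its data file inside mdb_env_open, and the VirtualBox shared folders used by the Docker VM on OS X at the time did not support writable mmap, so the call fails. A minimal stdlib-only sketch of the operation that breaks (on a normal filesystem it succeeds; the temp path is illustrative):

```python
import mmap
import os
import tempfile

# LMDB memory-maps its data file (mdb_env_open -> mmap). On a vboxsf-mounted
# host volume, mmap of a writable file fails with EINVAL (errno 22), which
# surfaces as the "(22 vs. 0)" check failure in the Caffe log.
path = os.path.join(tempfile.mkdtemp(), "data.mdb")
with open(path, "wb") as f:
    f.truncate(1 << 20)  # pre-size the file, as LMDB sizes its map

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)  # the call that fails on vboxsf
    m[:5] = b"hello"
    m.close()

print("mmap write ok")
```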

@shelhamer shelhamer commented on an outdated diff Feb 20, 2016

docker/README.md
@@ -0,0 +1,52 @@
+# Caffe standalone Dockerfiles.
+
+The `standalone` subfolder contains docker files for generating both CPU and GPU executable images for Caffe. The images can be built using make, or by running:
+
+```
+docker build -t caffe:cpu standalone/cpu
+```
+for example. (Here `gpu` can be substituted for `cpu`, but to keep the readme simple, only the `cpu` case will be discussed in detail).
+
+Note that the GPU standalone requires a CUDA 7.5 capable driver to be installed on the system and [nvidia-docker|https://github.com/NVIDIA/nvidia-docker] for running the Docker containers. Here it is generally sufficient to user `nvidia-docker` instead of `docker` in any of the commands mentioned.
@shelhamer

shelhamer Feb 20, 2016

Owner

to user -> to use

Owner

shelhamer commented Feb 20, 2016

I just realized that this will do the most good if it's highlighted in the installation docs defined by caffe/docs/installation.md. @elezar feel free to include a patch for this or I can follow-up after merge with a docs edit.

Contributor

elezar commented Feb 22, 2016

@shelhamer Sure, I will update the docs too. Could you (and @flx42 and @seanbell) have a look at the updated examples/mnist/train_lenet_docker.sh and let me know if something along these lines would be ok? I have changed the script to be totally standalone (even more so once "official" images are available) so that it is the only file that is required to run the LeNet MNIST example using docker.

Note that the script works under Linux, but after the trouble I had under OSX over the weekend, I don't expect it to be so simple there (or under Windows). I will have to come up with a better solution there, but would like to get this PR in before investigating that further.

On a side note, I am currently building this branch at https://hub.docker.com/r/elezar/caffe but it may be a good idea to get this setup so that the image is associated with a BVLC account. In this case, we can change the script to use a BVLC/caffe image, meaning that no images need to be built locally.
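As a rough illustration of the "totally standalone" idea above, a wrapper only needs to compose a single docker invocation around a prebuilt image. The sketch below is a dry run that just prints the command it would execute; the image name, mount paths, and solver path are illustrative, not the actual train_lenet_docker.sh:

```shell
#!/bin/sh
# Dry-run sketch of a standalone wrapper: compose and print the docker
# command for the LeNet MNIST example. IMAGE defaults to a local tag here;
# with an official BVLC image on Docker Hub, no local build would be needed.
set -e
IMAGE="${IMAGE:-caffe:cpu}"
CMD="docker run --rm --volume=$(pwd):/workspace --workdir=/workspace \
${IMAGE} caffe train --solver=examples/mnist/lenet_solver.prototxt"
echo "${CMD}"
```

Replacing the final `echo` with `eval` (and `docker` with `nvidia-docker` for the GPU image) would actually run the example.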

@flx42 flx42 commented on an outdated diff Feb 24, 2016

docker/README.md
@@ -0,0 +1,52 @@
+# Caffe standalone Dockerfiles.
+
+The `standalone` subfolder contains docker files for generating both CPU and GPU executable images for Caffe. The images can be built using make, or by running:
+
+```
+docker build -t caffe:cpu standalone/cpu
+```
+for example. (Here `gpu` can be substituted for `cpu`, but to keep the readme simple, only the `cpu` case will be discussed in detail).
+
+Note that the GPU standalone requires a CUDA 7.5 capable driver to be installed on the system and [nvidia-docker|https://github.com/NVIDIA/nvidia-docker] for running the Docker containers. Here it is generally sufficient to use `nvidia-docker` instead of `docker` in any of the commands mentioned.
@flx42

flx42 Feb 24, 2016

Contributor

Bogus link!
[nvidia-docker| -> [nvidia-docker]

@flx42 flx42 commented on an outdated diff Feb 24, 2016

docker/templates/Dockerfile.template
+ libprotobuf-dev \
+ libsnappy-dev \
+ protobuf-compiler \
+ python-dev \
+ python-numpy \
+ python-pip \
+ python-scipy && \
+ rm -rf /var/lib/apt/lists/*
+
+ENV CAFFE_ROOT=/opt/caffe
+WORKDIR $CAFFE_ROOT
+
+# FIXME: clone a specific git tag and use ARG instead of ENV once DockerHub supports this.
+ENV CLONE_TAG=master
+
+RUN git clone -b ${CLONE_TAG} --single-branch --depth 1 https://github.com/BVLC/caffe.git . && \
@flx42

flx42 Feb 24, 2016

Contributor

--single-branch and --depth 1 look redundant, according to the manual:

--[no-]single-branch
    Clone only the history leading to the tip of a single branch, either specified by the --branch option or the primary branch remote’s HEAD points at.
    When creating a shallow clone with the --depth option, this is the default

@flx42 flx42 commented on an outdated diff Feb 24, 2016

docker/templates/Dockerfile.template
+ python-pip \
+ python-scipy && \
+ rm -rf /var/lib/apt/lists/*
+
+ENV CAFFE_ROOT=/opt/caffe
+WORKDIR $CAFFE_ROOT
+
+# FIXME: clone a specific git tag and use ARG instead of ENV once DockerHub supports this.
+ENV CLONE_TAG=master
+
+RUN git clone -b ${CLONE_TAG} --single-branch --depth 1 https://github.com/BVLC/caffe.git . && \
+ for req in $(cat python/requirements.txt) pydot; do pip install $req; done && \
+ mkdir build && cd build && \
+ cmake ${CMAKE_ARGS} .. && \
+ make -j"$(nproc)" all && \
+ make -j"$(nproc)" pycaffe
@flx42

flx42 Feb 24, 2016

Contributor

@lukeyeager told me that you don't need to do make pycaffe when using cmake. So just do make -j"$(nproc)" instead, it would save us a line!
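Taken together, the two review suggestions above (drop the redundant --single-branch, and let a plain make build pycaffe under CMake) would reduce the quoted RUN step to something like the following. This is a sketch of the suggested shape only, not the exact file merged in this PR:

```dockerfile
# --depth 1 already implies --single-branch, and with CMake the default
# `make` target builds pycaffe as well, so neither needs to be spelled out.
RUN git clone -b ${CLONE_TAG} --depth 1 https://github.com/BVLC/caffe.git . && \
    for req in $(cat python/requirements.txt) pydot; do pip install $req; done && \
    mkdir build && cd build && \
    cmake ${CMAKE_ARGS} .. && \
    make -j"$(nproc)"
```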

Contributor

elezar commented Feb 26, 2016

Should I squash the commits so that this can be merged? I can then update the docs, and use the standalone images as a basis for other development utilities.

Contributor

flx42 commented Feb 26, 2016

On my side, the PR looks good now. But I didn't have time to extensively test the images, mind you.
You could go ahead and squash the commits I guess.

Contributor

elezar commented Feb 26, 2016

I have squashed the commit.

Owner

shelhamer commented Feb 27, 2016

I was able to build the caffe:{cpu,gpu} images and run with docker, nvidia-docker respectively to invoke caffe --version and open a shell.

However, the LeNet docker example fails because nvidia-docker recognizes neither -v nor -w and instead requires the full --volume= and --workdir= arguments. @elezar could you update the example script? @flx42 is this normal with nvidia-docker or did I mess up?

Here is an example call

nvidia-docker run --rm -ti -u 1000:1000 --volume=/home/shelhamer/caffe-dev:/workspace --workdir=/workspace caffe:gpu caffe train --solver=examples/mnist/lenet_solver.prototxt

that ran fine.

Contributor

flx42 commented Feb 27, 2016

@shelhamer Yeah that's a bug in nvidia-docker caused by the Docker 1.10 update, their options are now ambiguous: NVIDIA/nvidia-docker#46
I have a patch for this, I will push it Monday if it passes my tests.
The current workaround is to use long options as you discovered. So, it's not related to this PR; but it doesn't hurt to use long options since it's more explicit anyway.

Owner

shelhamer commented Feb 27, 2016

Alright, well I'm happy to merge once the example works whether by long options in this PR or a fix to nvidia-docker. @elezar feel free to force push an amended commit.

(If you do amend, consider formatting the commit message so it has a "title" line <80 chars and then further description after a newline as it formats better on the command line and on github, but no worries.)

Evan Lezar Add Dockerfiles for creating Caffe executable images.
These can be used as direct replacements for the Caffe executable.
6cba462
Contributor

elezar commented Feb 27, 2016

I have switched to the --volume and --workdir options instead of -v and -w, respectively. These changes are also reflected in the README. I was only able to run a quick sanity check on OSX, as I don't currently have access to a Linux box. @shelhamer could you try running the example again, and merge if you are happy with it?

@shelhamer shelhamer added a commit that referenced this pull request Feb 27, 2016

@shelhamer shelhamer Merge pull request #3518 from zalando/feature/docker_images
[build] Add docker images for running caffe out-of-the-box (caffe:cpu, caffe:gpu)
59d099c

@shelhamer shelhamer merged commit 59d099c into BVLC:master Feb 27, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Owner

shelhamer commented Feb 27, 2016

Thanks for the Caffe containers @elezar ! Next we'll figure out how to include these on the docker hub.

@shelhamer shelhamer added a commit to shelhamer/caffe that referenced this pull request Feb 27, 2016

@shelhamer shelhamer fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
cfa2c0c

@shelhamer shelhamer added a commit that referenced this pull request Feb 27, 2016

@shelhamer shelhamer Merge pull request #3740 from shelhamer/fix-docker-flags
Fix flags for nvidia-docker from #3518
04aa36e
Contributor

flx42 commented Feb 27, 2016

Thank you for the additional fix @shelhamer, this bug is embarrassing for us, we will fix it quickly.

elezar deleted the elezar:feature/docker_images branch Feb 28, 2016

Contributor

elezar commented Feb 28, 2016

Thanks @shelhamer.

As for getting the images onto Docker Hub, I see two options. Either one sets up an automated build on Docker Hub itself (I have already done this for https://hub.docker.com/r/elezar/caffe), or one builds the images as part of some CI/CD steps and pushes them. I think the latter is the route that NVIDIA follows for https://hub.docker.com/r/nvidia/. Maybe @flx42 could comment on this?

I should add that the automated builds on Docker Hub seem to be failing for the GPU images (with no log output). @flx42 do you have an idea what the reason for this may be?

Contributor

flx42 commented Feb 29, 2016

@elezar: we tried using the automated build on the DockerHub but we gave up on this idea because it was not suitable for our complex model with multiple images (see the discussion here).
In your case, I think both options should work. But be aware that doing automated testing for the GPU images could be challenging.

I don't know why it would fail on the Docker Hub, it used to work when I tested.

@BlGene BlGene added a commit to BlGene/caffe that referenced this pull request Feb 29, 2016

@shelhamer @BlGene shelhamer + BlGene fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
5bc9aaa

@BlGene BlGene added a commit to BlGene/caffe that referenced this pull request Mar 4, 2016

@shelhamer @BlGene shelhamer + BlGene fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
06690ff

@zouxiaochuan zouxiaochuan added a commit to zouxiaochuan/caffe that referenced this pull request Mar 17, 2016

@shelhamer @zouxiaochuan shelhamer + zouxiaochuan fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
4ca67f2

Is wget used here? I am not seeing it. Is it hidden in some other step?

Contributor

elezar replied Mar 18, 2016

Ok, thanks for clarifying. Is there a way of disabling this download? I imagine these datasets can get quite big depending on how much you are including.

I noticed here that pydot is being installed, but I didn't see graphviz being installed. Is pydot actually used for anything other than resolving Python imports?

Contributor

elezar replied Mar 22, 2016

@SvenTwo SvenTwo added a commit to SvenTwo/caffe that referenced this pull request Apr 6, 2016

@shelhamer @SvenTwo shelhamer + SvenTwo fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
f6c4879

@fxbit fxbit added a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016

@shelhamer @fxbit shelhamer + fxbit fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
94c4db6

@fxbit fxbit added a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016

@shelhamer @fxbit shelhamer + fxbit Merge pull request #3518 from zalando/feature/docker_images
[build] Add docker images for running caffe out-of-the-box (caffe:cpu, caffe:gpu)
4a75fa9

@fxbit fxbit added a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016

@shelhamer @fxbit shelhamer + fxbit Merge pull request #3740 from shelhamer/fix-docker-flags
Fix flags for nvidia-docker from #3518
8ebb8bf

@zouxiaochuan zouxiaochuan added a commit to zouxiaochuan/caffe that referenced this pull request Oct 24, 2016

@shelhamer @zouxiaochuan shelhamer + zouxiaochuan fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
a175281

@zouxiaochuan zouxiaochuan added a commit to zouxiaochuan/caffe that referenced this pull request Oct 24, 2016

@shelhamer @zouxiaochuan shelhamer + zouxiaochuan fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
0770031

@zouxiaochuan zouxiaochuan added a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017

@shelhamer @zouxiaochuan shelhamer + zouxiaochuan fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
f9658a6

@zouxiaochuan zouxiaochuan added a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017

@shelhamer @zouxiaochuan shelhamer + zouxiaochuan fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
844f732

@zouxiaochuan zouxiaochuan added a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017

@shelhamer @zouxiaochuan shelhamer + zouxiaochuan fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
7f1dc98

@zouxiaochuan zouxiaochuan added a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017

@shelhamer @zouxiaochuan shelhamer + zouxiaochuan fix flags in #3518 for nvidia-docker
nvidia-docker requires long args with equal sign as of docker 1.10:
see BVLC#3518 (comment)
029f431