This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Making the nvidia/cuda automated repo #18

Closed
ruffsl opened this issue Dec 3, 2015 · 15 comments

Comments

@ruffsl
Contributor

ruffsl commented Dec 3, 2015

Thanks for making a CUDA docker repo! One suggestion I'd make would be to turn the nvidia/cuda Docker Hub repo into an automated repo. This repo could be useful as a testbed for trying out any future tags before official submission, but making it automated could really save time on maintenance in keeping the images up to date with the Dockerfiles. That's how we use them at osrf/ros. A neat thing also is using the git repo's README.md to render the description in the docker repo, see Understand the build process.

@flx42
Member

flx42 commented Dec 3, 2015

Yes, it's definitely on our list!

@sheerun

sheerun commented Dec 4, 2015

btw, sometimes it's easier to set up CircleCI and push the build to the hub instead

@ruffsl
Contributor Author

ruffsl commented Dec 4, 2015

Another argument for automated repos would be that for others who create automated repos that happen to build from your image, it becomes trivial for those same people to enable a triggered build within the Docker Hub ecosystem. So when the NVIDIA image updates with fixes, so too do the users' images. Again, the same could be done using web hooks and API calls, but keeping it simple with the Docker Hub interface makes it pleasant for newer users.
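As a rough sketch of that trigger mechanism (the user, repo, and trigger token below are placeholders, and the URL format follows the Docker Hub automated-build docs of this era), a downstream automated repo can be rebuilt with a single POST to its build trigger:

$ curl -H "Content-Type: application/json" \
       --data '{"build": true}' \
       -X POST https://registry.hub.docker.com/u/<user>/<repo>/trigger/<trigger-token>/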

@UniqueFool

The Phoronix test suite comes with OpenCL support, so it could be useful for regression-testing the automated repo: http://www.phoronix.com/scan.php?page=article&item=nvidia-amd-opencl-2015&num=1

@flx42
Member

flx42 commented Dec 10, 2015

@ruffsl For osrf/ros it looks like you also have multiple Dockerfiles with dependencies that mandate a specific build order. How did you set up an automated build with these constraints? All the builds seem to start in parallel, and thus I can't create the devel images properly since they depend on the runtime images.

@UniqueFool The problem with CI and testing is that I'm not currently aware of an open-source CI solution that would allow us to run GPU tests. We have internal solutions of course, but they will be more complex to integrate with GitHub. I will continue evaluating the options.

@sheerun

sheerun commented Dec 10, 2015

You can just run the build on CI, without testing:
https://circleci.com/docs/docker#deployment-to-a-docker-registry
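For instance, the deployment step of a circle.yml can boil down to a few shell commands like these (the image name, the credential environment variables, and the reuse of this repo's 7.5 runtime build context are illustrative assumptions, not a prescribed setup):

$ docker build -t <user>/cuda:7.5-runtime ubuntu-14.04/cuda/7.5/runtime
$ docker login -u "$DOCKER_USER" -p "$DOCKER_PASS"
$ docker push <user>/cuda:7.5-runtime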

@flx42
Member

flx42 commented Dec 10, 2015

Sure, but it would be more convenient to deploy and test with the same solution. That said, the short-term approach could be to only automate the builds for now.

@sheerun

sheerun commented Dec 10, 2015

You need to build it on CircleCI before testing anyway ;) So building + uploading is a good first step.

@ruffsl
Contributor Author

ruffsl commented Dec 10, 2015

@flx42, yes, I've noticed this. Looking at the build details recorded in the build logs, I'm seeing the start times for each tag triggered roughly simultaneously, with one of my higher-level tags starting first. I'm rather sure the official repos do not suffer the same shortcoming (although perhaps I've not noticed, thanks to how often the upstream Ubuntu image rebuilds and triggers everything else), but I'm uncertain how to enforce the same build order in a single user repo.

I've asked about this before, but was told to just re-trigger the build until the cascading images reach a steady state, which I think is a bit silly. Another approach I first used was to break up my tags into separate repos, like suggested here. This was a bit of a hassle to manage, but it did ensure that a sequential order was followed. Perhaps the cuda runtime and development docker repos could be separate, but the lack of tag-level vs repo-level triggering would hamper further tag-specific builds. Let me dig around, perhaps something has come along since I last looked into this. Pinging @yosifkit or @tianon ?

@yosifkit

I've not seen any change on the Docker Hub that would allow images to depend upon another tag in the same repo. This is one of the reasons that the official images do not use automated builds.

@flx42
Member

flx42 commented Dec 11, 2015

@ruffsl It looks like it's worse than this. When I start my build using a POST request, all the builds start in parallel and then all the devel builds immediately fail since they depend on the runtime images.

Since all the runtime Dockerfiles for 6.5, 7.0 and 7.5 start with these lines:
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/Dockerfile#L1-L10
My runtime images should be able to share the layers for those commands, but since they are built in parallel, it's not the case (except for the ubuntu layers, obviously):

$ docker history flx42/cuda:7.0-runtime
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
[...]             
9a4be293a841        19 minutes ago      /bin/sh -c #(nop) ENV CUDA_VERSION=7.0          0 B                 
7410b9a2414b        19 minutes ago      /bin/sh -c apt-key adv --fetch-keys http://de   25.66 kB            
bac2ad43afa4        19 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_FPR=889be   0 B                 
18e862dcdeec        19 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_SUM=bd841   0 B                 
62e3850cc26d        19 minutes ago      /bin/sh -c #(nop) MAINTAINER NVIDIA CORPORATI   0 B                 
89d5d8e8bafb        2 days ago          /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B                 
e24428725dd6        2 days ago          /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB            
1796d1c62d0c        2 days ago          /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB            
0bf056161913        2 days ago          /bin/sh -c #(nop) ADD file:9b5ba3935021955492   187.7 MB 

$ docker history flx42/cuda:7.5-runtime
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
[...]             
92aaf1c5e65b        19 minutes ago      /bin/sh -c #(nop) ENV CUDA_VERSION=7.5          0 B                 
83968d3d71cb        19 minutes ago      /bin/sh -c apt-key adv --fetch-keys http://de   25.66 kB            
ee4242ccf3fd        19 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_FPR=889be   0 B                 
919a687073ec        19 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_SUM=bd841   0 B                 
04c48fe576ca        19 minutes ago      /bin/sh -c #(nop) MAINTAINER NVIDIA CORPORATI   0 B                 
89d5d8e8bafb        2 days ago          /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B                 
e24428725dd6        2 days ago          /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB            
1796d1c62d0c        2 days ago          /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB            
0bf056161913        2 days ago          /bin/sh -c #(nop) ADD file:9b5ba3935021955492   187.7 MB 

Images runtime and 7.5-runtime use the same Dockerfile, but for the same reason they have no common layers except the ones from ubuntu. I didn't find a way to output multiple tags from a single automated build.

In my personal GitHub fork (https://github.com/flx42/nvidia-docker) I modified the devel images to do FROM flx42/cuda:tag instead of FROM cuda:tag. This should allow me to build my devel images with a second POST request, right?
Well, yes, but it rebuilds all the images, even the runtime images. This is costly and it also means that my runtime images will get overwritten.
My devel images will build this time, but they will be built on top of the older runtime images:

$ docker history flx42/cuda:7.5-devel
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
[...]              
92aaf1c5e65b        30 minutes ago      /bin/sh -c #(nop) ENV CUDA_VERSION=7.5          0 B                 
83968d3d71cb        30 minutes ago      /bin/sh -c apt-key adv --fetch-keys http://de   25.66 kB            
ee4242ccf3fd        30 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_FPR=889be   0 B                 
919a687073ec        30 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_SUM=bd841   0 B                 
04c48fe576ca        30 minutes ago      /bin/sh -c #(nop) MAINTAINER NVIDIA CORPORATI   0 B                 
89d5d8e8bafb        2 days ago          /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B                 
e24428725dd6        2 days ago          /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB            
1796d1c62d0c        2 days ago          /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB            
0bf056161913        2 days ago          /bin/sh -c #(nop) ADD file:9b5ba3935021955492   187.7 MB 

The first layers are the same as above for flx42/cuda:7.5-runtime.
But now flx42/cuda:7.5-runtime is different:

$ docker pull flx42/cuda:7.5-runtime
7.5-runtime: Pulling from flx42/cuda
[redownloading everything]

$ docker history flx42/cuda:7.5-runtime
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
[...]               
d6f056622afd        11 minutes ago      /bin/sh -c #(nop) ENV CUDA_VERSION=7.5          0 B                 
52756da1d17b        11 minutes ago      /bin/sh -c apt-key adv --fetch-keys http://de   25.66 kB            
b179bdd62a38        11 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_FPR=889be   0 B                 
27ffaa5d4438        11 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_SUM=bd841   0 B                 
3cec27432703        11 minutes ago      /bin/sh -c #(nop) MAINTAINER NVIDIA CORPORATI   0 B                 
89d5d8e8bafb        2 days ago          /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B                 
e24428725dd6        2 days ago          /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB            
1796d1c62d0c        2 days ago          /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB            
0bf056161913        2 days ago          /bin/sh -c #(nop) ADD file:9b5ba3935021955492   187.7 MB 

So, we are needlessly duplicating layers that are physically the same. And since everything is rebuilt all the time, the user will have to fetch new layers even when the image they use didn't change.
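For comparison, building in dependency order on a single machine (e.g. from a CI job) avoids this duplication, since the devel build reuses the runtime image from the local cache and the shared layers stay byte-identical. A minimal sketch, assuming the matching devel Dockerfile path in this repo and using illustrative tags:

$ docker build -t flx42/cuda:7.5-runtime ubuntu-14.04/cuda/7.5/runtime
$ docker build -t flx42/cuda:7.5-devel ubuntu-14.04/cuda/7.5/devel
$ docker push flx42/cuda:7.5-runtime
$ docker push flx42/cuda:7.5-devel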

@ruffsl
Contributor Author

ruffsl commented Jan 9, 2016

@flx42, you are correct. Given the limitations of the automated build mechanics, it doesn't seem currently possible to host an automated repo on Docker Hub. Like yosifkit mentioned, official images are not built using the same rules, so once the commits to the cuda Dockerfiles settle down, that would be a nice channel for distributing images updated with upstream sources.

This all takes the wind out of the sails for CI-testing the master branch, but I suppose sheerun's or UniqueFool's suggestions, along with the Makefiles you've already written, would work well for automating pushes of current images to the NVIDIA org repo for public review on triggered events against the master branch.

@flx42
Member

flx42 commented Jan 11, 2016

Let's give up on the Docker Hub automated repo for now. CI remains an option so I will not close this issue yet.

@ruffsl
Contributor Author

ruffsl commented Feb 1, 2016

@flx42, on a side note, you may want to keep these links around or put them in the readme/wiki somewhere for others (at least for the ubuntu tags):

[image: screenshot of the links]

I just added something similar for the official ros repo and found it to be a nice way to visually verify parent image lineage.

@flx42
Member

flx42 commented Jan 18, 2017

One year later... it's finally automated! We decided to use GitLab CI since it gives us more control over what we can do. Example of a pipeline run: https://gitlab.com/nvidia/cuda/pipelines/5876874
With GitLab CI it will also be possible to add our own machines to run GPU tests on the generated images. We already do this internally.
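As a rough illustration only (the tag is just an example), the kind of GPU smoke test a self-hosted runner can execute against a freshly built image would look like:

$ nvidia-docker run --rm nvidia/cuda:8.0-devel nvidia-smi
$ nvidia-docker run --rm nvidia/cuda:8.0-devel nvcc --version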
Closing, finally.
