This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Making the nvidia/cuda automated repo #18

Closed
ruffsl opened this issue Dec 3, 2015 · 15 comments

Comments

@ruffsl
Contributor

ruffsl commented Dec 3, 2015

Thanks for making a CUDA docker repo! One suggestion I'd make would be to turn the nvidia/cuda Docker Hub repo into an automated repo. This repo could be useful as a testbed for trying out any future tags before official submission, but making it automated could really save time on maintenance in keeping the images up to date with the Dockerfiles. That's how we use them at osrf/ros. A neat thing also is using the git repo's README.md to render the description in the docker repo, see Understand the build process.

@flx42
Member

flx42 commented Dec 3, 2015

Yes, it's definitely on our list!

@sheerun

sheerun commented Dec 4, 2015

btw, sometimes it's easier to set up CircleCI and push the build to the hub instead

@ruffsl
Contributor Author

ruffsl commented Dec 4, 2015

Another argument for automated repos would be that for others who create automated repos that happen to build from your image, it becomes trivial for those same people to enable a triggered build within the Docker Hub ecosystem. So when the NVIDIA image updates with fixes, so too do the users' images. Again, the same could be done using web hooks and API calls, but keeping it simple with the Docker Hub interface makes it pleasant for newer users.
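As a rough sketch of that trigger mechanism (the user, repo, and trigger token below are placeholders, and the URL format follows the Docker Hub automated-build docs of this era), a downstream automated repo can be rebuilt with a single POST to its build trigger:

$ curl -H "Content-Type: application/json" \
       --data '{"build": true}' \
       -X POST https://registry.hub.docker.com/u/<user>/<repo>/trigger/<trigger-token>/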

@UniqueFool

The Phoronix test suite comes with OpenCL support, so it could be useful for regression-testing the automated repo: http://www.phoronix.com/scan.php?page=article&item=nvidia-amd-opencl-2015&num=1

@flx42
Member

flx42 commented Dec 10, 2015

@ruffsl For osrf/ros it looks like you also have multiple Dockerfiles with dependencies that mandate a specific build order. How did you set up an automated build with these constraints? All the builds seem to start in parallel, and thus I can't create the devel images properly since they depend on the runtime images.

@UniqueFool The problem with CI and testing is that I'm not currently aware of an open-source CI solution that would allow us to run GPU tests. We have internal solutions of course, but they will be more complex to integrate with GitHub. I will continue evaluating the options.

@sheerun

sheerun commented Dec 10, 2015

You can just run the build on CI, without testing:
https://circleci.com/docs/docker#deployment-to-a-docker-registry
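For instance, the deployment step of a circle.yml can boil down to a few shell commands like these (the image name, the credential environment variables, and the reuse of this repo's 7.5 runtime build context are illustrative assumptions, not a prescribed setup):

$ docker build -t <user>/cuda:7.5-runtime ubuntu-14.04/cuda/7.5/runtime
$ docker login -u "$DOCKER_USER" -p "$DOCKER_PASS"
$ docker push <user>/cuda:7.5-runtime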

@flx42
Member

flx42 commented Dec 10, 2015

Sure, but it would be more convenient to deploy and test with the same solution. That said, the short-term approach could be to only automate the builds for now.

@sheerun

sheerun commented Dec 10, 2015

You need to build it on CircleCI before testing anyway ;) So building + uploading is a good first step.

@ruffsl
Contributor Author

ruffsl commented Dec 10, 2015

@flx42, yes, I've noticed this. Looking at the build details recorded in the build logs, I'm seeing the start times for each tag triggered roughly simultaneously, with one of my higher-level tags starting first. I'm rather sure the official repos do not suffer the same shortcoming (although perhaps I've not noticed, thanks to how often the upstream Ubuntu image rebuilds and triggers everything else), but I'm uncertain how to enforce the same build order in a single user repo.

I've asked about this before, but was told to just re-trigger the build until the cascading images reach a steady state, which I think is a bit silly. Another approach I first used was to break up my tags into separate repos, like suggested here. This was a bit of a hassle to manage, but it did ensure that a sequential order was followed. Perhaps the cuda runtime and development docker repos could be separate, but the lack of tag-level vs repo-level triggering would hamper further tag-specific builds. Let me dig around, perhaps something has come along since I last looked into this. Pinging @yosifkit or @tianon ?

@yosifkit

I've not seen any change on the Docker Hub that would allow images to depend upon another tag in the same repo. This is one of the reasons that the official images do not use automated builds.

@flx42
Member

flx42 commented Dec 11, 2015

@ruffsl It looks like it's worse than this. When I start my build using a POST request, all the builds start in parallel and then all the devel builds immediately fail since they depend on the runtime images.

Since all the runtime Dockerfiles for 6.5, 7.0 and 7.5 start with these lines:
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/Dockerfile#L1-L10
My runtime images should be able to share the layers for those commands, but since they are built in parallel, it's not the case (except for the ubuntu layers, obviously):

$ docker history flx42/cuda:7.0-runtime
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
[...]             
9a4be293a841        19 minutes ago      /bin/sh -c #(nop) ENV CUDA_VERSION=7.0          0 B                 
7410b9a2414b        19 minutes ago      /bin/sh -c apt-key adv --fetch-keys http://de   25.66 kB            
bac2ad43afa4        19 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_FPR=889be   0 B                 
18e862dcdeec        19 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_SUM=bd841   0 B                 
62e3850cc26d        19 minutes ago      /bin/sh -c #(nop) MAINTAINER NVIDIA CORPORATI   0 B                 
89d5d8e8bafb        2 days ago          /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B                 
e24428725dd6        2 days ago          /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB            
1796d1c62d0c        2 days ago          /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB            
0bf056161913        2 days ago          /bin/sh -c #(nop) ADD file:9b5ba3935021955492   187.7 MB 

$ docker history flx42/cuda:7.5-runtime
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
[...]             
92aaf1c5e65b        19 minutes ago      /bin/sh -c #(nop) ENV CUDA_VERSION=7.5          0 B                 
83968d3d71cb        19 minutes ago      /bin/sh -c apt-key adv --fetch-keys http://de   25.66 kB            
ee4242ccf3fd        19 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_FPR=889be   0 B                 
919a687073ec        19 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_SUM=bd841   0 B                 
04c48fe576ca        19 minutes ago      /bin/sh -c #(nop) MAINTAINER NVIDIA CORPORATI   0 B                 
89d5d8e8bafb        2 days ago          /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B                 
e24428725dd6        2 days ago          /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB            
1796d1c62d0c        2 days ago          /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB            
0bf056161913        2 days ago          /bin/sh -c #(nop) ADD file:9b5ba3935021955492   187.7 MB 

Images runtime and 7.5-runtime use the same Dockerfile, but for the same reason they have no common layers except the ones from ubuntu. I didn't find a way to output multiple tags from a single automated build.

In my personal GitHub fork (https://github.com/flx42/nvidia-docker) I modified the devel images to do FROM flx42/cuda:tag instead of FROM cuda:tag. This should allow me to build my devel images with a second POST request, right?
Well, yes, but it rebuilds all the images, even the runtime images. This is costly and it also means that my runtime images will get overwritten.
My devel images will build this time, but they will be built on top of the older runtime images:

$ docker history flx42/cuda:7.5-devel
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
[...]              
92aaf1c5e65b        30 minutes ago      /bin/sh -c #(nop) ENV CUDA_VERSION=7.5          0 B                 
83968d3d71cb        30 minutes ago      /bin/sh -c apt-key adv --fetch-keys http://de   25.66 kB            
ee4242ccf3fd        30 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_FPR=889be   0 B                 
919a687073ec        30 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_SUM=bd841   0 B                 
04c48fe576ca        30 minutes ago      /bin/sh -c #(nop) MAINTAINER NVIDIA CORPORATI   0 B                 
89d5d8e8bafb        2 days ago          /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B                 
e24428725dd6        2 days ago          /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB            
1796d1c62d0c        2 days ago          /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB            
0bf056161913        2 days ago          /bin/sh -c #(nop) ADD file:9b5ba3935021955492   187.7 MB 

The first layers are the same as above for flx42/cuda:7.5-runtime.
But now flx42/cuda:7.5-runtime is different:

$ docker pull flx42/cuda:7.5-runtime
7.5-runtime: Pulling from flx42/cuda
[redownloading everything]

$ docker history flx42/cuda:7.5-runtime
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
[...]               
d6f056622afd        11 minutes ago      /bin/sh -c #(nop) ENV CUDA_VERSION=7.5          0 B                 
52756da1d17b        11 minutes ago      /bin/sh -c apt-key adv --fetch-keys http://de   25.66 kB            
b179bdd62a38        11 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_FPR=889be   0 B                 
27ffaa5d4438        11 minutes ago      /bin/sh -c #(nop) ENV NVIDIA_GPGKEY_SUM=bd841   0 B                 
3cec27432703        11 minutes ago      /bin/sh -c #(nop) MAINTAINER NVIDIA CORPORATI   0 B                 
89d5d8e8bafb        2 days ago          /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B                 
e24428725dd6        2 days ago          /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB            
1796d1c62d0c        2 days ago          /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB            
0bf056161913        2 days ago          /bin/sh -c #(nop) ADD file:9b5ba3935021955492   187.7 MB 

So, we are needlessly duplicating layers that are physically the same. And since everything is rebuilt all the time, the user will have to fetch new layers even when the image they use didn't change.
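For comparison, building in dependency order on a single machine (e.g. from a CI job) avoids this duplication, since the devel build reuses the runtime image from the local cache and the shared layers stay byte-identical. A minimal sketch, assuming the matching devel Dockerfile path in this repo and using illustrative tags:

$ docker build -t flx42/cuda:7.5-runtime ubuntu-14.04/cuda/7.5/runtime
$ docker build -t flx42/cuda:7.5-devel ubuntu-14.04/cuda/7.5/devel
$ docker push flx42/cuda:7.5-runtime
$ docker push flx42/cuda:7.5-devel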

@ruffsl
Contributor Author

ruffsl commented Jan 9, 2016

@flx42, you are correct. Given the limitations of the automated build mechanics, it doesn't seem currently possible to host an automated repo on Docker Hub. Like yosifkit mentioned, official images are not built using the same rules, so once the commits to the cuda Dockerfiles settle down, that would be a nice channel for distributing images updated with upstream sources.

This all takes the wind out of the sails for CI-testing the master branch, but I suppose sheerun's or UniqueFool's suggestions, along with the Makefiles you've already written, would work well for automating pushes of current images to the NVIDIA org repo for public review on triggered events against the master branch.

@flx42
Member

flx42 commented Jan 11, 2016

Let's give up on the Docker Hub automated repo for now. CI remains an option so I will not close this issue yet.

@ruffsl
Contributor Author

ruffsl commented Feb 1, 2016

@flx42, on a side note, you may want to keep these links around or put them in the readme/wiki somewhere for others (at least for the ubuntu tags):

[image: screenshot of the links]

I just added something similar for the official ros repo and found it to be a nice way to visually verify parent image lineage.

@flx42
Member

flx42 commented Jan 18, 2017

One year later... it's finally automated! We decided to use GitLab CI since it gives us more control over what we can do. Example of a pipeline run: https://gitlab.com/nvidia/cuda/pipelines/5876874
With GitLab CI it will also be possible to add our own machines to run GPU tests on the generated images. We already do this internally.
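As a rough illustration only (the tag is just an example), the kind of GPU smoke test a self-hosted runner can execute against a freshly built image would look like:

$ nvidia-docker run --rm nvidia/cuda:8.0-devel nvidia-smi
$ nvidia-docker run --rm nvidia/cuda:8.0-devel nvcc --version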
Closing, finally.
