New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pre-built docker images to speed up CI #432
Conversation
ci/htcondor.sh
Outdated
@@ -3,26 +3,21 @@ | |||
function jobqueue_before_install { | |||
docker version | |||
|
|||
docker run -d --name jobqueue-htcondor-mini htcondor/mini:el7 # might fail if called as script | |||
docker run --rm -d --name jobqueue-htcondor-mini tillkit/dask-jobqueue-ci:htcondor #TODO: changeme on final |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
htcondor will be changed to docker-compose too, I did not want to put it into one PR, but same as compose: imagename needs to be adjusted once an official autobuild is there
ci/htcondor.sh
Outdated
pip3 install -e .; | ||
pip3 install pytest; | ||
rm -f /var/log/condor/* | ||
pip3 install --upgrade --upgrade-strategy eager -e .; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is something that should probably done on the images to check changes in dask and distributed
ci/htcondor.sh
Outdated
chown -R submituser:submituser /dask-jobqueue | ||
pip3 freeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a bit unrelated but I could not help it :)
ci/htcondor.sh
Outdated
@@ -40,3 +35,8 @@ function jobqueue_after_script { | |||
" | |||
} | |||
[[ "${BASH_SOURCE[0]}" != "${0}" ]] || jobqueue_after_script # excute if called as script | |||
|
|||
function jobqueue_cleanup { | |||
docker stop jobqueue-htcondor-mini |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably this should also not be in the PR
ci/htcondor/Dockerfile
Outdated
@@ -0,0 +1,9 @@ | |||
# base image | |||
ARG HTCONDOR_VERSION | |||
FROM htcondor/mini:${HTCONDOR_VERSION}${HTCONDOR_VERSION:+-}el7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needed to have a HTCondor image. Actually for HTCondor there is close to 0 speedup.
@@ -6,6 +6,7 @@ function jobqueue_before_install { | |||
|
|||
# start pbs cluster | |||
cd ./ci/pbs | |||
docker-compose pull |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be needed to not build the docker files (I kept the build lines)
ci/pbs/docker-compose.yml
Outdated
@@ -3,6 +3,7 @@ version: "2" | |||
services: | |||
|
|||
master: | |||
image: tillkit/dask-jobqueue-ci:pbs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be changed to the official
@@ -1,4 +1,4 @@ | |||
FROM ubuntu:14.04 | |||
FROM ubuntu:14.04 as base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this a multistage build because otherwise this will generate independent layers and the pull time increases
@@ -14,6 +14,11 @@ RUN conda install -c conda-forge python=$PYTHON_VERSION dask distributed pytest | |||
|
|||
COPY ./*.sh / | |||
COPY ./*.txt / | |||
|
|||
FROM base as slave |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tag this target with a hook later
@@ -1,11 +1,12 @@ | |||
version: "2.1" | |||
version: "3.4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed to support targets for the builds
build: | ||
context: . | ||
dockerfile: Dockerfile-master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed and replaced with multistage target
ci/sge/hooks/post_build
Outdated
@@ -0,0 +1,3 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is automatically executed once the master image is build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file seems related to autodocker IIUC (e.g. https://docs.docker.com/docker-hub/builds/advanced/#custom-build-phase-hooks)? Can we remove it?
Actually slurm is still building. damn |
@lesteve , it works now. I have 2 questions:
|
Very nice to see this progress!
In general I prefer to keep PRs focused on one thing as much as possible because it makes it so much easier to review. If this is not too much work to remove the HTCondor stuff, that would be great!
No strong opinion on this. I would say this would be probably faster to create our own project in dockerhub e.g. dask-jobqueue (or daskjobqueue maybe you can not have dashes in the name?). Another Dask project that has its own project on dockerhub it like this is dask-gateway (https://hub.docker.com/u/daskgateway). I am not sure what the daskdev images are used for to be honest. Also the potentiality for reuse of our docker images outside of Dask-Jobqueue is very small I would say. |
This reverts commit 0097335.
I see no reason why we couldn't add folks to the existing project.
There are a bunch of things in dask-gateway that I would like to consolidate back into mainstream dask. Consolidating the helm repo is one, docker images another.
They are used by dask-kubernetes, dask-cloudprovider, the Dask Helm Chart and generally just folks wanting to use Dask in Docker.
While true I see no harm in consolidating in a single location. Also helps share out maintenance with multiple people being able to access them. In the same way the dask-jobqueue repo lives under the dask org on GitHub. |
done (was luckily in a separate commit)
I see some potential use for people to test their jobs (at home) before submitting to clusters (I like to do this sometimes without good internet). Sure would need a bit more docs |
Thanks @jacobtomlinson for your detailed answer! I am a complete newbie regarding dockerhub, so I guess what we would need is something like:
Is there is anything else I am missing? |
Sure. Happy to add you to the org on Docker Hub. I'll follow the standard Dask process of requesting another owner approval by email. I'll let you know when this is done. I tend to avoid the auto builds on Docker Hub as they are often slow and you have limited control over them. They work by waiting for GitHub webhooks and trigger a build on their side when you push to master. However we are likely already building the Docker image as part of CI on master in order to test it, so it makes sense to push that image up to Docker Hub instead. Checkout the deploy script we have in dask/dask-docker. It needs you to configure a |
it is a bit of a chicken and egg problem here as the goal was not rebuilding them too often, but I left the code for building in there.
Maybe travis could check the age and rather use build instead of pull once a week. everything seems a bit hacky.
Sure an option would be creating a separate repo like in dask-docker but this would quite a change.
Maybe a branch with a different CI target?
…________________________________
Von: Jacob Tomlinson <notifications@github.com>
Gesendet: Freitag, 22. Mai 2020 12:23
An: dask/dask-jobqueue
Cc: Till Riedel; Author
Betreff: Re: [dask/dask-jobqueue] [WIP] use docker autobuilds to speed up CI (#432)
Sure. Happy to add you to the org on Docker Hub. I'll follow the standard Dask process of requesting another owner approval by email. I'll let you know when this is done.
I tend to avoid the auto builds on Docker Hub as they are often slow and you have limited control over them. They work by waiting for GitHub webhooks and trigger a build on their side when you push to master.
However we are likely already building the Docker image as part of CI on master in order to test it, so it makes sense to push that image up to Docker Hub instead.
Checkout the deploy script we have in dask/dask-docker. It needs you to configure a DOCKER_USERNAME and DOCKER_PASSWORD in the CI secrets but then pushes all images based on the Docker compose config.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
I would be in favour favour of something like this:
|
I've added @lesteve to Docker Hub, @guillaumeeb if you can share your Docker Hub username I'll add you too. Feel free to ping me if you want any more input. |
I will quickly add a build target to the ci/x.sh and you can configure travis. Ideally the travis job will run the push after running the other tests though to keep the CI from breaking.
would be perfect if it could be triggered manually on PRs that contain dockerfile changes. But this is an extra and i am no travis guru...
________________________________
Von: Loïc Estève <notifications@github.com>
Gesendet: Freitag, 22. Mai 2020 13:24
An: dask/dask-jobqueue
Cc: Till Riedel; Author
Betreff: Re: [dask/dask-jobqueue] [WIP] use docker autobuilds to speed up CI (#432)
Maybe travis could check the age and rather use build instead of pull once a week. everything seems a bit hacky.
I would be in favour favour of something like this:
a weekly Travis Cron job builds and pushes the Docker image
the "normal" CI would reuse the Docker image and not build it
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
I know it's possible to do this kind of thing on GitHub Actions. Haven't tried on Travis. |
It's also guillaumeeb! Thanks @jacobtomlinson. |
It seems you cannot distinguish multiple cron jobs on one branch easily (travis-ci/travis-ci#8820). I check the day of the week : I also looked at the deploy script, that looks really complicated: Is there any reason not to just use If you set up a docker repo and set the DOCKER_USERNAME and DOCKER_PASSWORD in the CI secrets I can give it a try. |
I am going to push the docker images by hand to One that is done I will merge this PR. We can work on building the Docker images automatically in a weekly Travis Cron job in a separate PR. Let me know if you see a problem with this approach! |
Thanks @lesteve for this very good suggestion! Learning a lot on how to do PR in small batches. (I am always worried that once the normal work week starts I will run out of time and thus I am really happy to get things merged early, so nobody needs to wait for me). I will then first be able to finally rework the htcondor bit. (Maybe we can then even hold the weekly schedule before the next PR :) ) |
I removed what I thought was a autodocker-related file in the last commit. If this passes, I'll 'merge this one. |
OK the build is green, merging this one, thanks a lot @riedel! About small PRs yeah as a maintainer this is so much easier to review ... as a PR submitter it is a bit of a pain sometimes ... so need to find a compromise in between (not that easy). To be fair, in this particular case, the fact that you don't have permissions to set-up things yourself adds some friction. |
Just a note I have found creating CI jobs with various cron schedules is much easier in GitHub Actions. Don't know if you were thinking about migrating things there, but it is nice to configure and use. |
It seems that some projects have migrated to Github Actions, I have to say I was not plannning to do it in the near future ... but if someone wants to have a look and do a PR that would be more than welcome ! |
addressing #424, started with htcondor just to check travis, then will add others