
Use pre-built docker images to speed up CI #432

Merged: 6 commits into dask:master on May 23, 2020

Conversation

@riedel (Member) commented May 21, 2020

Addressing #424. Started with HTCondor just to check Travis; will add the others next.

@riedel (Member Author) commented May 21, 2020

This is the current autobuild config:
[screenshot: current Docker Hub autobuild configuration]

@riedel riedel changed the title from "[WIP] Try docker autobuilds to speed up CI" to "[WIP] use docker autobuilds to speed up CI" on May 21, 2020
ci/htcondor.sh Outdated
@@ -3,26 +3,21 @@
function jobqueue_before_install {
docker version

docker run -d --name jobqueue-htcondor-mini htcondor/mini:el7 # might fail if called as script
docker run --rm -d --name jobqueue-htcondor-mini tillkit/dask-jobqueue-ci:htcondor #TODO: changeme on final
@riedel (Member Author) commented:
HTCondor will be changed to docker-compose too; I did not want to put that into this PR. Same as for compose: the image name needs to be adjusted once an official autobuild exists.

ci/htcondor.sh Outdated
pip3 install -e .;
pip3 install pytest;
rm -f /var/log/condor/*
pip3 install --upgrade --upgrade-strategy eager -e .;
@riedel (Member Author) commented:

This is something that should probably be done in the images, to pick up changes in dask and distributed.

ci/htcondor.sh Outdated
chown -R submituser:submituser /dask-jobqueue
pip3 freeze
@riedel (Member Author) commented:

Also a bit unrelated but I could not help it :)

ci/htcondor.sh Outdated
@@ -40,3 +35,8 @@ function jobqueue_after_script {
"
}
[[ "${BASH_SOURCE[0]}" != "${0}" ]] || jobqueue_after_script # excute if called as script

function jobqueue_cleanup {
docker stop jobqueue-htcondor-mini
@riedel (Member Author) commented:

This should probably also not be in this PR.
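
As an aside, the [[ "${BASH_SOURCE[0]}" != "${0}" ]] guard in the hunk above is a common bash idiom for running a function only when the file is executed directly, not when it is sourced; a minimal sketch of how it behaves (file name is hypothetical):

    #!/bin/bash
    # demo.sh -- when run as ./demo.sh, BASH_SOURCE[0] equals $0, the test is
    # false, and the right-hand side of || runs; when loaded via "source demo.sh",
    # $0 stays the parent shell's name, the test is true, and only the function
    # definition is loaded.
    function jobqueue_after_script {
        echo "after_script ran"
    }
    [[ "${BASH_SOURCE[0]}" != "${0}" ]] || jobqueue_after_script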

@@ -0,0 +1,9 @@
# base image
ARG HTCONDOR_VERSION
FROM htcondor/mini:${HTCONDOR_VERSION}${HTCONDOR_VERSION:+-}el7
@riedel (Member Author) commented:

Needed to have an HTCondor image. For HTCondor itself there is close to zero speedup, though.
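
For reference, the ${HTCONDOR_VERSION:+-} expansion only inserts the dash when the build arg is set and non-empty, so the resulting tag is valid either way; a quick shell illustration (the version number is made up):

    # ${VAR:+-} expands to "-" if VAR is set and non-empty, else to nothing
    HTCONDOR_VERSION=""
    echo "htcondor/mini:${HTCONDOR_VERSION}${HTCONDOR_VERSION:+-}el7"
    # -> htcondor/mini:el7
    HTCONDOR_VERSION="8.8"
    echo "htcondor/mini:${HTCONDOR_VERSION}${HTCONDOR_VERSION:+-}el7"
    # -> htcondor/mini:8.8-el7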

@@ -6,6 +6,7 @@ function jobqueue_before_install {

# start pbs cluster
cd ./ci/pbs
docker-compose pull
@riedel (Member Author) commented:

This seems to be needed so that docker-compose does not build the Dockerfiles (I kept the build lines in).

@@ -3,6 +3,7 @@ version: "2"
services:

master:
image: tillkit/dask-jobqueue-ci:pbs
@riedel (Member Author) commented:

Needs to be changed to the official image.

@@ -1,4 +1,4 @@
FROM ubuntu:14.04
FROM ubuntu:14.04 as base
@riedel (Member Author) commented:

I made this a multi-stage build because otherwise the images generate independent layers and the pull time increases.
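
Roughly the idea: every final image extends one shared stage, so the common layers are built, stored, and pulled only once; a minimal sketch (the stage names beyond "base" and "slave" and the copied scripts are illustrative, not the exact contents of this PR):

    FROM ubuntu:14.04 as base
    # layers shared by all targets, pulled once
    RUN apt-get update && apt-get install -y --no-install-recommends curl

    FROM base as master
    # master-only layers on top of the shared base
    COPY ./start-master.sh /

    FROM base as slave
    # slave-only layers on top of the shared base
    COPY ./start-slave.sh /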

@@ -14,6 +14,11 @@ RUN conda install -c conda-forge python=$PYTHON_VERSION dask distributed pytest

COPY ./*.sh /
COPY ./*.txt /

FROM base as slave
@riedel (Member Author) commented:

We tag this target with a hook later

@@ -1,11 +1,12 @@
version: "2.1"
version: "3.4"
@riedel (Member Author) commented:

Needed to support build targets.
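
With compose file format 3.4+ a service can build a specific stage of a multi-stage Dockerfile via the "target" key; a sketch of what that looks like (service name and image tag are illustrative):

    version: "3.4"
    services:
      slave:
        image: tillkit/dask-jobqueue-ci:example-slave   # illustrative tag
        build:
          context: .
          target: slave   # build only up to the "slave" stage of the Dockerfile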

build:
context: .
dockerfile: Dockerfile-master
@riedel (Member Author) commented:

Removed and replaced with a multi-stage build target.

@@ -0,0 +1,3 @@
#!/bin/bash
@riedel (Member Author) commented:

This is automatically executed once the master image is built.
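
The file's remaining lines are elided above. As a rough idea only, a Docker Hub custom build hook that additionally builds and tags the slave stage could look something like this; IMAGE_NAME is one of the environment variables Docker Hub exposes to hook scripts, and the -slave suffix is purely illustrative, not the file's actual contents:

    #!/bin/bash
    # hooks/post_build (hypothetical sketch): after the main image is built,
    # also build the "slave" stage and tag it alongside the main image
    docker build --target slave -t "${IMAGE_NAME}-slave" .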

A maintainer (Member) commented:

This file seems related to autodocker IIUC (e.g. https://docs.docker.com/docker-hub/builds/advanced/#custom-build-phase-hooks)? Can we remove it?

@riedel (Member Author) commented May 21, 2020

Some jobs show a speedup, but some are slower. Travis's caching is probably relevant here: the package caches might be so much "warmer" that they beat the Docker image pull.

before:

[screenshot: CI build times before]

after:

[screenshot: CI build times after]

@riedel (Member Author) commented May 21, 2020

Actually, slurm is still building. Damn.

@riedel (Member Author) commented May 21, 2020

There is probably a lot of room for optimization, but now I at least see either the same speed or a speedup:

[screenshot: updated CI build times]

@riedel (Member Author) commented May 21, 2020

@lesteve, it works now. I have two questions:

  • Is it OK if I remove all the HTCondor stuff from this PR to make it smaller, and do it together with everything related to #427?
  • Should the autobuilt docker images go into hub.docker.com/u/daskdev? (@jacobtomlinson, would you be the right one to do this?)

@lesteve (Member) commented May 22, 2020

Very nice to see this progress!

Is it OK if I remove all the HTCondor stuff from this PR to make it smaller, and do it together with everything related to #427?

In general I prefer to keep PRs focused on one thing as much as possible because it makes it so much easier to review. If this is not too much work to remove the HTCondor stuff, that would be great!

Should the autobuilt docker images go into hub.docker.com/u/daskdev? (@jacobtomlinson, would you be the right one to do this?)

No strong opinion on this. I would say it would probably be faster to create our own project on Docker Hub, e.g. dask-jobqueue (or daskjobqueue; maybe you cannot have dashes in the name?). Another Dask project that has its own Docker Hub project like this is dask-gateway (https://hub.docker.com/u/daskgateway). I am not sure what the daskdev images are used for, to be honest. Also, the potential for reuse of our docker images outside of Dask-Jobqueue is very small, I would say.

@guillaumeeb (Member) commented:

If I'm not mistaken, this PR will fix #52. In this one @mrocklin offered to give daskdev credentials if needed. This could be a good place.

@jacobtomlinson (Member) commented:

I would say it would probably be faster to create our own project on Docker Hub

I see no reason why we couldn't add folks to the existing project.

Another Dask project that has its own Docker Hub project like this is dask-gateway (https://hub.docker.com/u/daskgateway).

There are a bunch of things in dask-gateway that I would like to consolidate back into mainstream dask. Consolidating the helm repo is one, docker images another.

I am not sure what the daskdev images are used for, to be honest.

They are used by dask-kubernetes, dask-cloudprovider, the Dask Helm Chart and generally just folks wanting to use Dask in Docker.

Also, the potential for reuse of our docker images outside of Dask-Jobqueue is very small, I would say.

While true, I see no harm in consolidating in a single location. It also helps share out maintenance, with multiple people being able to access them, in the same way the dask-jobqueue repo lives under the dask org on GitHub.

@riedel (Member Author) commented May 22, 2020

In general I prefer to keep PRs focused on one thing as much as possible because it makes it so much easier to review. If this is not too much work to remove the HTCondor stuff, that would be great!

Done (luckily it was in a separate commit).

Also, the potential for reuse of our docker images outside of Dask-Jobqueue is very small, I would say.

I see some potential use for people to test their jobs (at home) before submitting to clusters; I like to do this sometimes without good internet. Sure, it would need a bit more docs.

@lesteve (Member) commented May 22, 2020

Thanks @jacobtomlinson for your detailed answer!

I am a complete newbie regarding dockerhub, so I guess what we would need is something like:

  • necessary rights on dockerhub to push Docker images to daskdev. I am guessing this is done at an individual level? If that's the case, could you give the necessary rights to @guillaumeeb and me as a start? My username on dockerhub is lesteve.
  • some inputs (e.g. a link) as to how I enable autodocker on the dask-jobqueue/dask-jobqueue repo and configure it to push into daskdev/dask-jobqueue-ci on DockerHub

Is there anything else I am missing?

@jacobtomlinson (Member) commented:

Sure. Happy to add you to the org on Docker Hub. I'll follow the standard Dask process of requesting another owner approval by email. I'll let you know when this is done.

I tend to avoid the auto builds on Docker Hub as they are often slow and you have limited control over them. They work by waiting for GitHub webhooks and trigger a build on their side when you push to master.

However we are likely already building the Docker image as part of CI on master in order to test it, so it makes sense to push that image up to Docker Hub instead.

Check out the deploy script we have in dask/dask-docker. It needs you to configure a DOCKER_USERNAME and DOCKER_PASSWORD in the CI secrets, but then pushes all images based on the Docker compose config.

@riedel (Member Author) commented May 22, 2020 via email

@lesteve (Member) commented May 22, 2020

Maybe Travis could check the age and use build instead of pull once a week. Everything seems a bit hacky.

I would be in favour of something like this:

  • a weekly Travis Cron job builds and pushes the Docker image
  • the "normal" CI would reuse the Docker image and not build it

@jacobtomlinson (Member) commented:

I've added @lesteve to Docker Hub. @guillaumeeb, if you can share your Docker Hub username I'll add you too.

Feel free to ping me if you want any more input.

@riedel (Member Author) commented May 22, 2020 via email

@jacobtomlinson (Member) commented:

It would be perfect if it could be triggered manually on PRs that contain Dockerfile changes.

I know it's possible to do this kind of thing on GitHub Actions. Haven't tried on Travis.

@guillaumeeb (Member) commented:

@guillaumeeb if you can share your Docker Hub username I'll add you too.

It's also guillaumeeb! Thanks @jacobtomlinson.

@riedel (Member Author) commented May 22, 2020

It seems you cannot distinguish multiple cron jobs on one branch easily (travis-ci/travis-ci#8820).

I check the day of the week: [[ $TRAVIS_EVENT_TYPE == "cron" && $(date +%u) == 5 ]] && docker-compose build

I also looked at the deploy script; it looks really complicated. Is there any reason not to just use docker-compose push? (It works for me locally.)

If you set up a docker repo and set the DOCKER_USERNAME and DOCKER_PASSWORD in the CI secrets I can give it a try.
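
That is, something along these lines, assuming the secrets are exposed as environment variables (a sketch, untested in this CI):

    # log in with the CI secrets, then push all images named in docker-compose.yml
    echo "$DOCKER_PASSWORD" | docker login --username "$DOCKER_USERNAME" --password-stdin
    docker-compose push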

@lesteve (Member) commented May 23, 2020

I am going to push the docker images by hand to daskdev/dask-jobqueue (using this PR's code) and make sure that we can reuse them in this PR's CI.

Once that is done, I will merge this PR. We can work on building the Docker images automatically in a weekly Travis cron job in a separate PR.

Let me know if you see a problem with this approach!

@riedel riedel marked this pull request as ready for review May 23, 2020 08:52
@riedel riedel changed the title from "[WIP] use docker autobuilds to speed up CI" to "use docker autobuilds to speed up CI" on May 23, 2020
@riedel riedel changed the title from "use docker autobuilds to speed up CI" to "use docker builds to speed up CI" on May 23, 2020
@riedel riedel changed the title from "use docker builds to speed up CI" to "use pre-built docker images to speed up CI" on May 23, 2020
@riedel (Member Author) commented May 23, 2020

Thanks @lesteve for this very good suggestion! I am learning a lot about how to do PRs in small batches. (I am always worried that once the normal work week starts I will run out of time, so I am really happy to get things merged early and nobody needs to wait for me.) I will then first be able to finally rework the HTCondor bit. (Maybe we can then even keep the weekly schedule before the next PR :) )

@lesteve (Member) commented May 23, 2020

I removed what I thought was an autodocker-related file in the last commit. If this passes, I'll merge this one.

@lesteve (Member) commented May 23, 2020

OK the build is green, merging this one, thanks a lot @riedel!

About small PRs: yeah, as a maintainer this is so much easier to review... as a PR submitter it is a bit of a pain sometimes... so we need to find a compromise in between (not that easy).

To be fair, in this particular case, the fact that you don't have permissions to set things up yourself adds some friction.

@lesteve lesteve changed the title from "use pre-built docker images to speed up CI" to "Use pre-built docker images to speed up CI" on May 23, 2020
@lesteve lesteve merged commit b2a5ac8 into dask:master May 23, 2020
@jacobtomlinson (Member) commented:

Just a note: I have found creating CI jobs with various cron schedules is much easier in GitHub Actions. I don't know if you were thinking about migrating things there, but it is nice to configure and use.

@lesteve (Member) commented May 27, 2020

It seems that some projects have migrated to GitHub Actions. I have to say I was not planning to do it in the near future... but if someone wants to have a look and do a PR, that would be more than welcome!
