docker hub times out build v1.7.0 #22

Closed
hardingnj opened this issue Mar 1, 2016 · 13 comments

Comments

@hardingnj

This is due to addition of simupop, which takes ages.

Should we prune the Dockerfile or move to a push model?

@hardingnj
Author

For the moment, I've removed simupop; we can think about how to address this later.

@alimanfoo
Contributor

Thanks Nick. I have no immediate plans to use simupop, so it's fine to remove.
At some point in the not-too-distant future we might consider using bioconda
to install binaries instead of installing everything from source via pip,
but that needs some investigation; I haven't tried bioconda yet.


Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health http://cggh.org
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: alimanfoo@googlemail.com alimanfoo@gmail.com
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721

@hardingnj
Author

hardingnj commented Apr 25, 2016

I think I'm going to have a go with conda/bioconda

We have some decisions to make, though.

  1. Base our Docker image on Continuum's Anaconda image, which is built on Debian. This means we start with a lot of the heavy lifting done, and we can then use conda to add the bioconda channel.
  2. Use the bioconda Docker image for their installation environment, which is based on CentOS 5.
  3. Start with Ubuntu, install Miniconda (a stripped-down Anaconda) and do everything from scratch. I think with this option we may run into timeout issues again.

Bioconda is something I wasn't aware of. For several of the things in biipy, we may want to think about writing recipes for conda/bioconda. Jerome has done this for msprime. It doesn't seem to be a lot of additional work, very similar to what you (AM) did with basemap/treemix.

@hardingnj
Author

I think my preference is for 1. It does require us hitching our wagon to anaconda, but we can easily control versions using their tags. Would be interested to hear thoughts though.

@alimanfoo
Contributor

Do you know which steps are taking the most time in the build currently?

Whichever option we go for, I think we still want to build numpy (and
possibly scipy?) from scratch against openblas, rather than install
binaries. I know these steps are both very time consuming but the
performance improvement from building against openblas for things like PCA
is dramatic (order of magnitude).
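
To answer the question of which steps dominate the build, each install command could be timed on its own. The following is only a minimal sketch, not something used in this thread: `time_step` and the `steps` dictionary are hypothetical names, and the single no-op entry is a placeholder for the real install commands from the Dockerfile.

```python
import subprocess
import sys
import time


def time_step(command):
    """Run one build step and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, completed.returncode


if __name__ == "__main__":
    # Hypothetical step list: replace the no-op below with the actual
    # install commands (for example the pip installs of numpy and scipy).
    steps = {
        "noop": [sys.executable, "-c", "pass"],
    }
    for name, command in steps.items():
        elapsed, code = time_step(command)
        print(f"{name}: {elapsed:.1f}s (exit code {code})")
```

Running the steps one at a time like this, outside Docker Hub, gives a rough ranking of where the build time goes before deciding what to move into a base image.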


@hardingnj
Author

Numpy takes quite a while, but scipy takes ages... like > 40 minutes from source.

The other thing we could do is have a base image where we install numpy and scipy and pull from that?

Or, simplest of all, we could build locally and push images to Docker Hub instead of using the Docker Hub/GitHub automated build integration.

@alimanfoo
Contributor

On Mon, Apr 25, 2016 at 11:28 AM, Nick Harding notifications@github.com
wrote:

> Numpy takes quite a while, but scipy takes ages... like > 40 minutes from
> source.

Ouch.

> The other thing we could do is have a base image where we install numpy
> and scipy and pull from that?
>
> Or, most simple of all, we could build locally and push images to
> dockerhub instead of the docker hub/github interface

I have a mild preference for sticking with automated builds. It's less
convenient, but it's harder to screw something up.


@wrighting

My 2p: using an automated build to produce a base image, and then building on top of that image, is a good way to go.
(Does it help to give the build more resources?)

@alimanfoo
Contributor

Btw, I think it's also worth considering starting from an Ubuntu 16.04 base image. With Python 3.5 as the default, it would simplify a number of the existing steps.

@hardingnj
Author

I've made a start on this, splitting some of the overhead into a "base" image.

I don't know how to check if we are installing numpy from source with openblas. The installation takes very little time, so I suspect we are not.

Additionally, I am having issues installing ipython 4.2.0/llvmlite

I'll push my changes to a branch.
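
On the question of how to check whether numpy is linked against openblas: one possible approach is to inspect the output of `numpy.show_config()`, which prints the BLAS/LAPACK libraries NumPy was built with. The sketch below is an illustration only; the helper names `mentions_openblas` and `numpy_config_text` are made up for this example. Note that it reports what the installed NumPy links to, not whether it was compiled locally, since a prebuilt binary may also name a BLAS library.

```python
import contextlib
import io


def mentions_openblas(config_text):
    """Return True if a NumPy build-config dump names OpenBLAS."""
    return "openblas" in config_text.lower()


def numpy_config_text():
    """Capture what numpy.show_config() prints; None if NumPy is absent."""
    try:
        import numpy
    except ImportError:
        return None
    buffer = io.StringIO()
    # show_config() prints to stdout, so redirect it into a string buffer.
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    text = numpy_config_text()
    if text is None:
        print("NumPy is not installed")
    else:
        print("OpenBLAS mentioned:", mentions_openblas(text))
```

Run inside the built image, this gives a quick yes/no answer; the very short install time mentioned above is a separate hint that no compilation took place.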

@hardingnj
Author

Maybe we can discuss later in the week. Hit a bit of a wall here :/

@alimanfoo
Contributor

Sure, Skype tomorrow?

The latest pip installs a binary version of numpy, i.e., it bypasses
compilation. This has changed since the previous time we built a biipy
image. Basically, we just need to force pip to compile numpy. If openblas is
already installed, numpy will detect it during the build process and
build against it.
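
One way to force a source build is pip's `--no-binary` option, which tells pip not to use prebuilt wheels for the named package. The thread does not say which mechanism was eventually used, so treat this as an assumption; the helper name `pip_source_install_command` is hypothetical. The sketch only constructs and prints the command, since the build itself can take a long time.

```python
import sys


def pip_source_install_command(package):
    """Build a pip command that refuses prebuilt wheels for `package`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-binary", package,  # never use a binary wheel for this package
        package,
    ]


if __name__ == "__main__":
    # Print the command rather than running it here.
    print(" ".join(pip_source_install_command("numpy")))
```

The equivalent line in a Dockerfile would be a `RUN pip install --no-binary numpy numpy` step placed after the openblas install, so that the numpy build can detect it.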


@hardingnj
Author

Thanks all. Fixed in the newest version; I ended up splitting the Dockerfile.
