Reduce container size using neurodocker-minify #4

Closed
wants to merge 23 commits into from


Lestropie
Member

Tagging @kaczmarj.

Made a little bit of progress on this, but only in draft form for now.

@Lestropie Lestropie linked an issue Sep 16, 2020 that may be closed by this pull request
@Lestropie
Member Author

Todo: Install ANTs from source rather than through apt-get. This should permit specifying the installation location, which should in turn enable specifying this location as a directory to prune in the neurodocker-minify call. Currently the ANTs installation puts files in locations that can't be pruned, e.g. /usr/bin.
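
To illustrate why the install location matters: pruning works on whole directories, so it is only safe where every file belongs to the package being trimmed. The sketch below is not how neurodocker-minify is implemented. It assumes a hypothetical text file listing the paths the test commands were seen to touch, and prints what a prune of one directory would remove.

```shell
#!/usr/bin/env bash
# Illustrative only: NOT the neurodocker-minify implementation.
# list_prunable DIR USED_LIST
#   DIR        directory considered safe to prune (e.g. /opt/ants)
#   USED_LIST  hypothetical text file, one absolute path per line, naming
#              the files the test commands were observed to access
# Prints every regular file under DIR that does not appear in USED_LIST.
list_prunable() {
    local dir=$1 used=$2
    find "$dir" -type f | sort | while IFS= read -r f; do
        # -x: match the whole line, -F: fixed string, -q: no output
        grep -qxF -- "$f" "$used" || printf '%s\n' "$f"
    done
}

# Example (paths are assumptions):
#   list_prunable /opt/ants used-files.txt
```

Running the same listing against /usr/bin would also flag system tools that the tests never happened to call, which is why a dedicated prefix such as /opt/ants is preferable.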

- Install ANTs from source rather than via neurodebian, enabling installation to specific directory and hence pruning of unused files.
- Update README instructions based on first working experience with neurodocker-minify.
- Tweaks to tests.sh to remove unwanted files.
@Lestropie
Member Author

minify got the container from 19GB to 1.8GB. Most of that will be the FIRST atlases, I think.

That process does however seem to be wiping the environment variables set within the container recipe file...

@kaczmarj
Collaborator

> minify got the container from 19GB to 1.8GB

I'm sure Docker never envisioned 19GB "microservices" :)

The minification will indeed wipe the environment variables, so those will have to be reset. Also it might make more sense to have two separate Dockerfiles -- one full Dockerfile to be used for minification that includes the entire FSL and ANTs installations, and a second Dockerfile that installs the minified versions of FSL and ANTs.
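
One possible way to reset the variables, sketched under assumptions: read the environment recorded in the full image and turn it into `ENV` lines that can be pasted into whichever Dockerfile produces the final image. The function name and the image name `full-image` are hypothetical, and values are assumed to contain no double quotes.

```shell
#!/usr/bin/env bash
# env_to_dockerfile: read KEY=VALUE lines on stdin and print one Dockerfile
# ENV instruction per line. Assumes values contain no double quotes.
env_to_dockerfile() {
    local line key value
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue      # skip blank lines
        key=${line%%=*}                 # text before the first '='
        value=${line#*=}                # text after the first '='
        printf 'ENV %s="%s"\n' "$key" "$value"
    done
}

# Example ('full-image' is a hypothetical image name):
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' full-image \
#       | env_to_dockerfile
```

The same input could be turned into `export` lines for a profile script instead, but `ENV` instructions also take effect for non-login shells and for `docker run` commands.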

@Lestropie
Member Author

> The minification will indeed wipe the environment variables, so those will have to be reset.

Do you have an established recommendation for how to address this? I'd expected to go straight from minification to DockerHub upload; utilising a second Dockerfile just to set some environment variables seems clunky. Better to just write them explicitly to e.g. ~/.profile?

> Also it might make more sense to have two separate Dockerfiles -- one full Dockerfile to be used for minification that includes the entire FSL and ANTs installations, and a second Dockerfile that installs the minified versions of FSL and ANTs.

How would you envision this working? Would it simply be the case that, after following the instructions currently being drafted in the README, the minified container would then be uploaded to DockerHub, and a second one-liner Dockerfile would be defined that simply downloads such? Or are you thinking of something different?

@kaczmarj
Collaborator

kaczmarj commented Sep 17, 2020

I was thinking of having one Dockerfile with the full installations of ANTs, FSL, and MRtrix3. The file could be full.Dockerfile for example. This image would be used for minification. neurodocker-minify would prune the ANTs and FSL directories based on various MRtrix3 commands. Then, these pruned directories would be extracted, compressed into a tar.gz or similar, and then uploaded somewhere (even a GitHub release on this repo).

There would be a second Dockerfile, for example slim.Dockerfile, that installs MRtrix3 and the pruned ANTs and FSL from the tarball online. This second Docker image could be made very slim by excluding compilers and development libraries.
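
A rough sketch of the extract-and-compress step, assuming the pruned directories have already been copied out of the minified container. The container name `minified`, the staging directory and the archive name are all placeholders.

```shell
#!/usr/bin/env bash
# pack_pruned ROOT OUT NAME...
#   ROOT  staging directory that holds the pruned directories
#   OUT   path of the .tar.gz archive to create
#   NAME  one or more directory names inside ROOT (e.g. ants fsl)
pack_pruned() {
    local root=$1 out=$2
    shift 2
    # -C keeps archive paths relative to ROOT (ants/..., fsl/...)
    tar -czf "$out" -C "$root" "$@"
}

# Example (container and file names are assumptions):
#   mkdir -p pruned
#   docker cp minified:/opt/ants pruned/ants
#   docker cp minified:/opt/fsl  pruned/fsl
#   pack_pruned pruned deps.tar.gz ants fsl
```

Because the archive paths are relative, the slim image could unpack it with something like `tar -xzf deps.tar.gz -C /opt` to restore /opt/ants and /opt/fsl.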

I can make a full example later today.

@kaczmarj
Collaborator

@Lestropie - I wrote my ideas down in the branch https://github.com/MRtrix3/containers/tree/kaczmarj-minify

Can you please take a look? The Dockerfiles there use multi-stage builds, and with BuildKit (which ships with Docker nowadays), those stages can be built in parallel. I have outlined that in the README of that branch. Those Dockerfiles aren't yet complete, but they're close.
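
For readers unfamiliar with the pattern, here is a minimal skeleton of a multi-stage layout. It is not taken from the branch: the base image, stage names and placeholder `RUN` lines are assumptions. The function writes the skeleton to a file so it can be inspected or built.

```shell
#!/usr/bin/env bash
# write_multistage_skeleton PATH
# Writes a hypothetical multi-stage Dockerfile skeleton to PATH.
write_multistage_skeleton() {
    cat > "$1" <<'EOF'
# Hypothetical skeleton: base image and RUN lines are placeholders.
FROM debian:buster AS ants-builder
# The real ANTs build would go here.
RUN mkdir -p /opt/ants

FROM debian:buster AS mrtrix3-builder
# The real MRtrix3 build would go here.
RUN mkdir -p /opt/mrtrix3

# The final stage copies from both builders. Neither builder depends on the
# other, so BuildKit is free to build them concurrently.
FROM debian:buster
COPY --from=ants-builder /opt/ants /opt/ants
COPY --from=mrtrix3-builder /opt/mrtrix3 /opt/mrtrix3
EOF
}

# Example (file name is an assumption):
#   write_multistage_skeleton multistage.Dockerfile
#   DOCKER_BUILDKIT=1 docker build -f multistage.Dockerfile .
```

Only what is copied into the final stage ends up in the shipped image, so compilers and development libraries can stay behind in the builder stages.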

@Lestropie
Member Author

OK, there's some ideas in there I've not seen before, and your slim recipe is different to what I had been thinking, though yours potentially has greater overlap with MRtrix3/mrtrix3#2134. There might be a third option that merges ideas from both and provides a good solution for both. I suspect you've more experience with containers than I have, but I'll at least write down my thinking in full and we'll see if it makes sense.


What I had initially intended was:

  1. Complete the instructions for minification I've documented thus far.

  2. Upload the resulting image to DockerHub; this would be the image that users would access.

  3. The Dockerfile for building the container pre-minification would be renamed; a second Dockerfile would then be defined, which would simply contain something like:

    FROM mrtrix3:latest
    

    So it would just be essentially a proxy for any service that grabs software containers based on the presence of a Dockerfile in the repository.

The problems with this solution are:


Now what I'm thinking instead is:

  1. Using the solution I have here currently; but at the end of tests.sh, erase the contents of /opt/mrtrix3/ from the container.

  2. Instead of your suggestion of tarballing the contents of e.g. /opt/ants/ and /opt/fsl/, upload the whole container image, with MRtrix3 scrubbed, to DockerHub. This becomes a template intended for making MRtrix3, with all of the dependencies for building & running in place, but without MRtrix3 itself.

  3. In this repo (MRtrix3/containers), a second Dockerfile pulls this template container from DockerHub, builds MRtrix3 master as release, removes the compilation dependencies and unnecessary contents of /opt/mrtrix3/.

  4. In MRtrix3/script_test_action, pull the same template container from DockerHub, build the nominated MRtrix3 commitish as release but with asserts enabled, and don't bother with subsequent container cleanup as it's only for running CI tests.
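
Steps 3 and 4 differ only in the commitish and the configure flags, which the sketch below tries to show by rendering a Dockerfile from a shared template image. The template image name is a placeholder, the function is hypothetical, and the post-build cleanup from step 3 is left out.

```shell
#!/usr/bin/env bash
# render_mrtrix3_dockerfile TEMPLATE_IMAGE COMMITISH [CONFIGURE_FLAGS]
# Prints a Dockerfile that builds MRtrix3 on top of the template image.
# Cleanup of compilation dependencies (step 3) is deliberately omitted.
render_mrtrix3_dockerfile() {
    local template=$1 commitish=$2 flags=${3:-}
    # Unquoted heredoc: variables expand, and '\\' emits a single backslash.
    cat <<EOF
FROM ${template}
RUN git clone https://github.com/MRtrix3/mrtrix3.git /opt/mrtrix3 \\
    && cd /opt/mrtrix3 \\
    && git checkout ${commitish} \\
    && ./configure ${flags} \\
    && ./build
EOF
}

# Examples (image name 'example/template:latest' is an assumption):
#   render_mrtrix3_dockerfile example/template:latest master > Dockerfile
#   render_mrtrix3_dockerfile example/template:latest "$GITHUB_SHA" -assert > Dockerfile
```

The second example assumes the `-assert` flag of the MRtrix3 `configure` script is what "asserts enabled" refers to.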


From what you've got there, there are definitely optimisations for the initial build that could be introduced as a standalone build optimisation changeset, but I think my latest idea of having the full container there as a template to pull is cleaner than tarballing the dependencies and then starting the container build from scratch post-minify.

@Lestropie
Member Author

Okay, I think I have a working version:
http://hub.docker.com/repository/docker/mrtrix3/mrtrix3
Instructions in the README of this branch are all up-to-date.
This seems like a decent solution to me, and I think should help sort out the issues I was having with the Python script CI testing; but it's all open to reasonable criticism if there's a better alternative.
Also very

@jdtournier: If you want to make a Docker account, I can then add you as a member of the DockerHub organisation. Being a free account, we only get up to three members, and I can't make the base repository private.

@jdtournier
Member

OK, I've just created an account on DockerHub as jdtournier. Thanks!

@Lestropie
Member Author

Closed in favour of #11.

@Lestropie Lestropie closed this Oct 17, 2020
@Lestropie Lestropie mentioned this pull request Oct 17, 2020
@Lestropie Lestropie deleted the minify branch November 11, 2020 09:01
Development

Successfully merging this pull request may close these issues.

Strip down external dependencies?
3 participants