Conversation
```diff
@@ -1,18 +1,19 @@
-FROM python:3.8.6-slim-buster as dependencies
+FROM python:3.8.11-slim-buster as dependencies
```
General maintenance: keeping the base Python version updated.
```dockerfile
RUN apt-get update \
    && apt-get install -y tini \
    && apt-get clean \
```
`apt-get clean` is automatically run in this and most images via a config file (`/etc/apt/apt.conf.d/docker-clean`).
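One way to sanity-check that claim at build time (a sketch, not part of this PR) is a throwaway stage that fails if the config file is missing:

```dockerfile
# Sketch: fail the build if the docker-clean apt config is absent from the
# base image, i.e. if an explicit `apt-get clean` would actually be needed.
FROM python:3.8.11-slim-buster
RUN test -f /etc/apt/apt.conf.d/docker-clean
```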
```diff
-    aiohttp==3.7.2 \
+    aiohttp==3.7.4 \
     colorlog \
     cryptography \
-    traitlets==4.3.3 \
+    traitlets==5.0.5 \
     pyyaml \
-    kubernetes-asyncio==11.0.0
+    kubernetes-asyncio==12.1.1
```
Systematic version updates: I had to update aiohttp to get an arm64 (a.k.a. aarch64) compatible conda package in the dask-gateway image.
```diff
 LABEL MAINTAINER="Jim Crist-Harif"

-ARG CONDA_VERSION=py38_4.8.3
-ARG CONDA_SHA256=879457af6a0bf5b34b48c12de31d4df0ee2f06a8e68768e5758c3293b2daf688
```
Removed the checksums, since the amd64 and arm64 downloads have different checksums and it is a bit tedious to keep track of both.
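An alternative to dropping the checksums entirely would be to keep one per architecture. The sketch below assumes BuildKit, which provides `TARGETARCH` as an implicit build arg; the SHA256 values are placeholders, not real digests:

```dockerfile
# Sketch: one checksum per architecture instead of none.
# TARGETARCH is "amd64" or "arm64" under BuildKit; the values below
# are placeholders, not real Miniconda checksums.
FROM debian:buster-slim
ARG TARGETARCH
ARG CONDA_SHA256_AMD64=0000000000000000000000000000000000000000000000000000000000000000
ARG CONDA_SHA256_ARM64=1111111111111111111111111111111111111111111111111111111111111111
RUN if [ "$TARGETARCH" = "arm64" ]; then CONDA_SHA256="$CONDA_SHA256_ARM64"; \
    else CONDA_SHA256="$CONDA_SHA256_AMD64"; fi \
    && echo "${CONDA_SHA256}  miniconda.sh" | sha256sum --check -
```

The cost is that two values must be updated on every Miniconda bump, which is exactly the maintenance burden this change avoids.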
```dockerfile
# - Create user dask
RUN useradd -m -U -u 1000 dask

# - Install tini
# - Install miniconda build dependencies
# - Download miniconda and check the sha256 checksum
# - Install miniconda
RUN apt-get update \
    && apt-get install -y tini wget bzip2 \
    && rm -rf /var/lib/apt/lists/*
```
I split this out of a section below that did so many different things in the same RUN layer that it had become a bit hard to read.
```dockerfile
    && find /opt/conda/ -follow -type f -name '*.a' -delete \
    && /opt/conda/bin/conda clean -afy \
    && chown -R dask:dask /opt/conda \
    && apt-get autoremove --purge -y wget bzip2 \
```
Not removing wget / bzip2 doesn't make the image grow further: adding and then removing packages only saves space if both happen in the same layer, and since the install is no longer in the same layer, I don't remove these binaries later. The reason for having one more RUN statement was readability.
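The layer trade-off can be illustrated like this (a sketch with a placeholder download URL, not the actual Dockerfile):

```dockerfile
# In a single layer, install + use + purge leaves no trace of wget/bzip2
# in the final image:
RUN apt-get update \
    && apt-get install -y wget bzip2 \
    && wget -q https://example.com/installer.sh \
    && sh installer.sh && rm installer.sh \
    && apt-get autoremove --purge -y wget bzip2 \
    && rm -rf /var/lib/apt/lists/*

# Split across layers, the purge does NOT shrink the image: the files still
# exist in the earlier layer and are only masked in the later one.
# RUN apt-get update && apt-get install -y wget bzip2
# RUN apt-get autoremove --purge -y wget bzip2
```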
Do you have a rough idea of how much larger the image is now?
My rough idea is that it is minimal (<1MB) and not worth worrying about. I did not confirm this thoroughly in an e2e build comparison though.
```console
$ apt-cache --no-all-versions show wget | grep '^Size: '
Size: 348824
$ apt-cache --no-all-versions show bzip2 | grep '^Size: '
Size: 34064
```
Ah, so it seems like 992 kB + 195 kB.

```console
$ apt-cache --no-all-versions show wget | grep Installed-Size
Installed-Size: 992
$ apt-cache --no-all-versions show bzip2 | grep Installed-Size
Installed-Size: 195
```
```diff
-ARG DASK_VERSION=2021.7.1
-ARG DISTRIBUTED_VERSION=2021.7.1
+ARG DASK_VERSION=2021.8.1
+ARG DISTRIBUTED_VERSION=2021.8.1
```
I just bumped these while I was at it, so that all the conda/pip packages installed were updated across the board now that I had to update at least aiohttp.
The blank lines are for readability, as separation between the build steps.
jacobtomlinson left a comment
This generally looks good to me!
We recently added Arm support to the Dask docker images in dask/dask-docker#166.
I wonder if there are any learnings we can take from that PR, or anything we can reuse to ensure a bit more consistency between projects.
Ah, one key thing to keep consistent is the transition to mamba, and mambaforge to install it. I'd be happy to help with that as well, but it felt out of scope for this PR. Oh, @holdenk and I worked on these kinds of things in jupyter/docker-stacks too, by the way! :) 👋 Happy to see you contribute here as well, @holdenk! Want to help by contributing a review of this PR?
I can address the
TomAugspurger left a comment
Thanks for this; one question about image sizes inline.
I'd love it if we could come up with a policy about what library versions to use in the Dockerfiles, to avoid having repeated conversations about versions. Is there any reason not to use the latest version of all libraries (dask, distributed, aiohttp, etc) at the time of release?
One maybe special consideration is Python itself. Right now we're on 3.8.6 (bumping to 3.8.11). Is there any reason to keep that on 3.8? Or should we move to 3.9?
```dockerfile
    && find /opt/conda/ -follow -type f -name '*.a' -delete \
    && /opt/conda/bin/conda clean -afy \
    && chown -R dask:dask /opt/conda \
    && apt-get autoremove --purge -y wget bzip2 \
```
Do you have a rough idea of how much larger the image is now?
UPDATE: @TomAugspurger I created #437 regarding this question. Let's move this discussion to there. I think a good policy is to use a requirements.in file that is automatically re-frozen to requirements.txt by dependabot, and then install that requirements.txt in the Dockerfile. That lets users see via git history which versions were used in each release, avoids a maintenance burden, and gets us security warnings via automation. We have a mix of conda/pip in these files though, which makes this idea a bit troublesome. I'd consider going 100% pip in these images, or 100% conda and hoping there is a tool dependabot can use for an environment.yml file like there is for pip.
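A rough sketch of that workflow with pip-tools (the file contents and package names are illustrative; dependabot would handle the re-freezing outside the image build):

```dockerfile
# requirements.in holds loose constraints, e.g.:
#     dask
#     distributed
#     aiohttp
# `pip-compile requirements.in` (from pip-tools) freezes it into a fully
# pinned requirements.txt, which is the only file the image build consumes:
FROM python:3.8.11-slim-buster
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```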
Go for merge on this?
Thanks @jacobtomlinson @TomAugspurger @jcrist for your review efforts!
This is a step along the way for #421 about compatibility on arm64 machines like Raspberry Pi.