docker: prevent manual pin of dependencies and improve build speed #95

Merged
merged 70 commits into from Jan 27, 2022
Changes from 52 commits
d21f8e2
docker: try mamba
tlvu Dec 17, 2021
e7a9640
docker: enable DEBUG logging for conda
tlvu Dec 17, 2021
e1e9fba
docker: remove direct dependencies from xclim and ravenpy
tlvu Dec 17, 2021
2ef396c
docker: remove 'jupyter lab build' step, probably not needed with jup…
tlvu Dec 17, 2021
f71e61e
docker: try set channel_priority strict to hopefully increase perform…
tlvu Dec 18, 2021
3a5472f
docker: try to pin xclim and ravenpy to hopefully increase performance
tlvu Dec 18, 2021
6f8f0ea
docker: use --strict-channel-priority with conda env create to be sure
tlvu Dec 18, 2021
53e81d7
docker: remove defaults channel to decrease number of packages for th…
tlvu Dec 18, 2021
7bb511c
docker: replace conda env create with conda create
tlvu Dec 18, 2021
335b4b2
docker: replace conda create needs --name
tlvu Dec 18, 2021
f8df4cf
docker: use --file instead -f for conda create
tlvu Dec 18, 2021
23b8912
docker: conda create do not work, revert to conda env create
tlvu Dec 18, 2021
f992cef
docker: install mamba the proper way and use mamba env create
tlvu Dec 18, 2021
965860b
docker: unpin xclim and ravenpy to try with mamba
tlvu Dec 18, 2021
eafacd9
docker: use mamba to create conda env and perform 2-stage install to …
tlvu Dec 18, 2021
0452d4c
docker: need to specify channels for manual mamba create
tlvu Dec 18, 2021
26fa54f
docker: channel conda-forge before defaults, gdal needed by ravenpy n…
tlvu Dec 18, 2021
ff696db
docker: typo: mamba env install and not just mamba install
tlvu Dec 18, 2021
bf768d9
docker: typo: mamba env update and not mamba env install
tlvu Dec 18, 2021
498bc8c
docker: add vtk-cdat channel for vtk-cdat, needed by vcs
tlvu Dec 18, 2021
6d13a42
Revert "docker: add vtk-cdat channel for vtk-cdat, needed by vcs"
tlvu Dec 18, 2021
3e8cfc5
docker: remove vcs for weird dependency solve error, will have to inv…
tlvu Dec 18, 2021
9860b0d
docker: temp change to leverage build caching
tlvu Dec 18, 2021
e0f5325
docker: optimize for cache re-use during dev rebuild cycle
tlvu Dec 18, 2021
f105f91
docker: try hack to ensure latest xclim and ravenpy
tlvu Dec 18, 2021
36a4b0f
docker: mamba install do nothing if already installed, have to use ma…
tlvu Dec 18, 2021
f5fa5d7
docker: mamba update did not update either, force install exact version
tlvu Dec 18, 2021
66953c0
docker: try pin xclim and ravenpy immediately at the beginning to see…
tlvu Dec 18, 2021
87350e8
docker: even with pin in initial install, still getting downgrade so …
tlvu Dec 18, 2021
cc2d7ef
docker: pin in environment.yml stopped the downgrading, remove duplic…
tlvu Dec 18, 2021
d3a06f5
docker: conda config --set channel_priority strict probably not neede…
tlvu Dec 18, 2021
a62f3c2
docker: wonder if conda config --set channel_priority was the cause o…
tlvu Dec 18, 2021
22c214a
docker: conda config --set channel_priority strict not responsible fo…
tlvu Dec 18, 2021
72dc0c9
docker: no need to jupyter labextension install @jupyter-widgets/jupy…
tlvu Dec 18, 2021
b6eaa61
docker: install ipyleaflet using conda for new jupyter lab v3
tlvu Dec 18, 2021
1ca6f68
docker: drop jupyterlab-topbar-text and jupyterlab-theme-toggle since…
tlvu Dec 18, 2021
e4263a4
docker: pin python=3.8 for vcs package
tlvu Dec 18, 2021
e3b2c23
docker: pin python=3.7 for libgdal
tlvu Dec 18, 2021
30a1a00
docker: restore defaults channel that was remove to optimize conda pe…
tlvu Dec 19, 2021
7ab6d90
docker: reduce number of layers during builds to hopefully increase a…
tlvu Dec 19, 2021
b13769a
docker: missing manual mkdir /etc/jupyter in previous commit
tlvu Dec 19, 2021
87fb219
docker: document reasoning of new changes
tlvu Dec 19, 2021
5d3c259
docker: update startup CMD, old one refuses to work
tlvu Dec 20, 2021
50a85d1
docker: remove NOTEBOOK_ARGS from CMD
tlvu Dec 20, 2021
4343030
docker: force newer nodejs for 'jupyter lab build' failure
tlvu Dec 21, 2021
ca17d14
docker: add comment reminder to remove nodejs and 'jupyter lab build'…
tlvu Dec 21, 2021
68af06d
docker: unable to pin nodejs >= 17.1.0, have to pin lower to >= 16.0
tlvu Dec 21, 2021
3d2707a
docker: ravenpy 0.7.6 released
tlvu Dec 21, 2021
eac4141
docker: rollback start script and restore previous startup CMD
tlvu Dec 21, 2021
c6fd0f4
docker: document performance problem with one single 'conda env creat…
tlvu Dec 21, 2021
890c228
docker: ravenpy 0.7.7 released
tlvu Dec 21, 2021
62bc689
release: update to use image pavics/workflow-tests:211221
tlvu Dec 21, 2021
eaa2b01
docker: clean up comment from previous pin
tlvu Dec 21, 2021
d67a2d2
Dockerfile.testing: pin shapely for notebook failure, bokeh for perfo…
tlvu Jan 11, 2022
035560e
docker: remove vcs package to avoid python downgrade, xarray will dro…
tlvu Jan 13, 2022
ced6249
docker: create the conda env as the regular user to ensure no permiss…
tlvu Jan 13, 2022
6d31473
docker: attempt to fix mamba creates temp file at root where user jen…
tlvu Jan 13, 2022
faa0341
docker: 'python -m ipykernel install' requires root
tlvu Jan 13, 2022
b714564
docker: prevent serious downgrade of openssl during second install ph…
tlvu Jan 13, 2022
8aba3a5
docker: revert change to build as regular user since it downgrades op…
tlvu Jan 13, 2022
c9d2ef6
docker: xesmf-0.6.2 requires clisops>=0.8.0, new ravenpy available
tlvu Jan 14, 2022
107c402
docker: remove mamba for mamba_gator plugin to work properly
tlvu Jan 14, 2022
8bd0094
docker: pin shapely due to notebook failure
tlvu Jan 14, 2022
c90413c
release: update to use image pavics/workflow-tests:220116
tlvu Jan 17, 2022
9370388
docker: prevent accidental cartopy downgrade from mamba
tlvu Jan 17, 2022
58cd714
docker: pin python=3.9
tlvu Jan 17, 2022
d58f090
release: update to use image pavics/workflow-tests:220116.1
tlvu Jan 17, 2022
b4acbdc
conftest: avoid failing on holoviews display json code change
tlvu Jan 18, 2022
007ae84
docker: pin python=3.8 for xESMF
tlvu Jan 21, 2022
34d38cc
release: update to use image pavics/workflow-tests:220121
tlvu Jan 21, 2022
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -10,7 +10,7 @@ pipeline {
// https://jenkins.io/doc/book/pipeline/syntax/
agent {
docker {
image "pavics/workflow-tests:211123-update211216"
image "pavics/workflow-tests:211221"
label 'linux && docker'
}
}
2 changes: 1 addition & 1 deletion binder/Dockerfile
@@ -1,4 +1,4 @@
FROM pavics/workflow-tests:211123-update211216
FROM pavics/workflow-tests:211221

USER root

80 changes: 56 additions & 24 deletions docker/Dockerfile
@@ -1,6 +1,10 @@
FROM continuumio/miniconda3

RUN conda update conda
# Use mamba for much improved performance over conda.
# The 'channel_priority strict' did help conda but it was not enough.
RUN conda update conda -n base && \
conda install mamba -n base -c defaults -c conda-forge && \
conda config --set channel_priority strict

# to checkout other notebooks and to run pip install
RUN apt-get update && \
@@ -9,18 +13,40 @@ RUN apt-get update && \
firefox-esr x11-utils && \
apt-get clean

COPY environment.yml /environment.yml

# needed for our specific jenkins
# Create user jenkins for our Jenkins e2e notebooks test suite.
# Change /opt/conda folder permissions for jupyter-conda extension.
RUN groupadd --gid 1000 jenkins \
&& useradd --uid 1000 --gid jenkins --create-home jenkins
&& useradd --uid 1000 --gid jenkins --create-home jenkins && \
chmod -R a+rwx /opt/conda

# Change these folders' permissions for jupyter-conda extension
RUN chmod -R a+rwx /opt/conda
COPY environment.yml /environment.yml

# create env "birdy"
# use umask 0000 so that the files for the new environment are usable by user 'jenkins' for the jupyter-conda-extension
RUN umask 0000 && conda env create -f /environment.yml
#
# Perform a 2-stage install because one single 'conda env create -f
# /environment.yml' was taking forever to complete, same with mamba.
# The 2-stage install was also taking forever with conda, so we had to
# switch to mamba.
#
# One single 'conda env create -f /environment.yml' takes forever because we
# removed all direct dependencies of xclim and ravenpy from /environment.yml so
# that the dependency pins set by xclim and ravenpy take effect. This results in
# conda having a lot more packages to "solve" and it seems the solver
# performance drops exponentially with the number of packages to solve.
#
# Conda was stuck at this step:
# DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 2500273
#
# Pin python=3.8 for vtk-cdat needed by vcs. Otherwise, 3.9 would have been selected.
# package vcs-8.1-py_0 requires vtk-cdat >8.1, but none of the providers can be installed
# package vtk-cdat-8.2.0.8.2.1-py38hbc81915_0 requires python >=3.8,<3.9.0a0 *_cpython
# Pin python=3.7 for libgdal (note libgdal was working with python 3.9 but blew up when we pinned 3.8 for vcs)
# package gdal-3.1.4-py38h25844d8_1 requires libgdal 3.1.4 h50e41a3_1, but none of the providers can be installed
# TODO remove python pin once a vtk-cdat for py39 exists, libgdal already worked for 3.9.
RUN umask 0000 && \
mamba create --name birdy --channel conda-forge --channel defaults xclim ravenpy python=3.7 && \
mamba env update --name birdy --file /environment.yml

# alternate way to 'source activate birdy'
ENV PATH="/opt/conda/envs/birdy/bin:$PATH"
@@ -33,35 +59,41 @@ RUN python -m ipykernel install --name birdy
# anything accidentally
# this is for debug only, all dependencies should be specified in
# environment.yml above
# RUN conda install -c conda-forge -c cdat -c bokeh -c plotly -c defaults -n birdy nbdime
# RUN mamba install -c conda-forge -c cdat -c bokeh -c plotly -c defaults -n birdy nbdime

# build jupyterlab extensions installed by conda, see `jupyter labextension list`
# Supposedly not needed with jupyterlab v3 anymore but see
# https://github.com/jupyterlab/jupyterlab/issues/11726#issuecomment-998901247
# TODO: remove 'jupyter lab build' step once all extensions move to prebuilt extensions,
# see comment https://github.com/jupyterlab/jupyterlab/issues/11726#issuecomment-998917305
# Currently jupyter-dash is holding back this step, see
# https://github.com/plotly/jupyter-dash/issues/49
RUN jupyter lab build

# for ipywidgets to work with jupyter lab (notebooks works out of the box)
RUN jupyter labextension install @jupyter-widgets/jupyterlab-manager \
&& jupyter serverextension enable voila --sys-prefix \
&& jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-leaflet \
&& jupyter labextension install jupyterlab-topbar-text \
jupyterlab-theme-toggle
RUN jupyter serverextension enable voila --sys-prefix
# && jupyter labextension install jupyterlab-clipboard

ADD https://raw.githubusercontent.com/jupyter/docker-stacks/master/base-notebook/start.sh /usr/local/bin/
ADD https://raw.githubusercontent.com/jupyter/docker-stacks/master/base-notebook/start-singleuser.sh /usr/local/bin/
ADD https://raw.githubusercontent.com/jupyter/docker-stacks/master/base-notebook/start-notebook.sh /usr/local/bin/
ADD https://raw.githubusercontent.com/jupyter/docker-stacks/master/base-notebook/fix-permissions /usr/local/bin/
ADD https://raw.githubusercontent.com/jupyter/docker-stacks/master/base-notebook/jupyter_notebook_config.py /etc/jupyter/
RUN chmod a+rx /usr/local/bin/start.sh /usr/local/bin/start-singleuser.sh /usr/local/bin/start-notebook.sh /usr/local/bin/fix-permissions; \
chmod a+r /etc/jupyter/jupyter_notebook_config.py
# This should be "master" but commit
# https://github.com/jupyter/docker-stacks/commit/c772e98ac794173d6ed83a08ec249038b27ca3be
# is breaking for us since we do not have the user jovyan.
ENV DOCKER_STACKS_COMMIT=709206ac8788475728cc9c992c25fb5f1501bc29

# For Pavics-landing notebooks to re-create Jupyter env layout:
# /notebook_dir for Pavics-landing notebooks to re-create Jupyter env layout:
# /notebook_dir/writable-workspace, /notebook_dir/pavics-homepage.
#
# Paths to the /notebook_dir/pavics-homepage/tutorial_data/*.geojson files are
# hardcoded so users can copy the notebooks to the writable-workspace/ dir and still
# be able to run them seamlessly from the Jupyter env (without having to also copy
# those *.geojson files with the notebooks).
RUN mkdir /notebook_dir && chown jenkins /notebook_dir
RUN wget https://raw.githubusercontent.com/jupyter/docker-stacks/$DOCKER_STACKS_COMMIT/base-notebook/start.sh --output-document /usr/local/bin/start.sh && \
wget https://raw.githubusercontent.com/jupyter/docker-stacks/$DOCKER_STACKS_COMMIT/base-notebook/start-singleuser.sh --output-document /usr/local/bin/start-singleuser.sh && \
wget https://raw.githubusercontent.com/jupyter/docker-stacks/$DOCKER_STACKS_COMMIT/base-notebook/start-notebook.sh --output-document /usr/local/bin/start-notebook.sh && \
wget https://raw.githubusercontent.com/jupyter/docker-stacks/$DOCKER_STACKS_COMMIT/base-notebook/fix-permissions --output-document /usr/local/bin/fix-permissions && \
mkdir /etc/jupyter && \
wget https://raw.githubusercontent.com/jupyter/docker-stacks/$DOCKER_STACKS_COMMIT/base-notebook/jupyter_notebook_config.py --output-document /etc/jupyter/jupyter_notebook_config.py && \
chmod a+rx /usr/local/bin/start.sh /usr/local/bin/start-singleuser.sh /usr/local/bin/start-notebook.sh /usr/local/bin/fix-permissions && \
chmod a+r /etc/jupyter/jupyter_notebook_config.py && \
mkdir /notebook_dir && chown jenkins /notebook_dir

# problem running start-notebook.sh when running as root
# the jupyter/base-notebook image also does not default to the root user, so we do the same here
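Reviewer note: the pinned-commit download in this hunk repeats the same URL pattern five times. A minimal sketch of that pattern as a loop, assuming a POSIX shell; `DOCKER_STACKS_COMMIT` and the file names are taken from the diff, while the helper name `raw_url` is hypothetical and not part of this PR. The loop only prints the `wget` commands (a dry run) rather than downloading anything.

```shell
#!/bin/sh
# Sketch only: same commit pin as the ENV line in docker/Dockerfile.
DOCKER_STACKS_COMMIT=709206ac8788475728cc9c992c25fb5f1501bc29

# raw_url FILE: print the raw.githubusercontent.com URL of FILE under
# base-notebook/ at the pinned commit. Hypothetical helper name.
raw_url() {
    printf 'https://raw.githubusercontent.com/jupyter/docker-stacks/%s/base-notebook/%s\n' \
        "$DOCKER_STACKS_COMMIT" "$1"
}

# Dry run: print the download commands for the four start-up scripts.
for f in start.sh start-singleuser.sh start-notebook.sh fix-permissions; do
    echo "wget $(raw_url "$f") --output-document /usr/local/bin/$f"
done
```

Changing the pin then means editing one value, which is the same benefit the `ENV DOCKER_STACKS_COMMIT` line gives the Dockerfile.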
51 changes: 36 additions & 15 deletions docker/environment.yml
@@ -6,13 +6,27 @@ channels:
- bokeh
- plotly # for jupyter-dash
- defaults

dependencies:

# Do not put direct dependencies of xclim and ravenpy here, to let xclim and ravenpy
# manage their own dependency pinning.
#
# xclim direct dependencies: https://github.com/conda-forge/xclim-feedstock/blob/master/recipe/meta.yaml
# ravenpy direct dependencies: https://github.com/conda-forge/ravenpy-feedstock/blob/master/recipe/meta.yaml

# Pin latest xclim and ravenpy to avoid downgrading during the second install
# phase. Mamba is quicker to solve dependencies than conda but it is less
# precise, so accidental downgrades happen.
- xclim >= 0.32.1
- ravenpy >= 0.7.7
Zeitsperre marked this conversation as resolved.

- matplotlib
- xarray
- numpy
# - xarray # from xclim and ravenpy
# - numpy # from xclim and ravenpy
- birdy
- owslib>=0.23.0
- netcdf4
# - owslib>=0.23.0 # from ravenpy
# - netcdf4 # from ravenpy
# https://github.com/ecmwf/cfgrib
# Python interface to map GRIB files to the Unidata's Common Data Model v4
# following the CF Conventions.
@@ -22,11 +36,11 @@ dependencies:
- descartes
Collaborator

What needs descartes? This library is ancient.

Contributor Author

> What needs descartes? This library is ancient.

90ccbec

I just updated the PR description with the list of package changes. You might want to review the PR description again.

# Pin rasterio for ravenpy, remove on next build.
# See https://github.com/CSHS-CWRA/RavenPy/commit/eae66e9afc30e2381e9119644a0695d1d248c739
- rasterio <= 1.2.6
- gdal # for osgeo
- geopandas
- pandas
- rioxarray
# - rasterio <= 1.2.6 # from ravenpy
tlvu marked this conversation as resolved.
# - gdal # for osgeo, from ravenpy
Collaborator

Suggested change
# - gdal # for osgeo, from ravenpy

Contributor Author

Do you actually mean gdal is not a dependency of ravenpy? But it still is: https://github.com/conda-forge/ravenpy-feedstock/blob/ae682f7bb3fc586737733d923111d3d2107f5ae0/recipe/meta.yaml#L34

Collaborator

No, no, GDAL is all-knowing and ever-present. I figured there's no major reason for specifying it to be installed since it comes bundled with so many other packages.

Contributor Author

Oh I see what you mean. But according to this commit d2948dc, some notebooks were actually importing it directly and it was not yet in the Jupyter env.

I'd keep it here for historical reasons.
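The thread above turns on whether a package is requested explicitly in environment.yml or only arrives transitively. One way to check this could look like the sketch below, assuming a POSIX shell with `grep -E`; the helper name `is_direct_dep` is hypothetical and not part of this PR. It treats a commented-out entry such as `# - gdal` as not direct.

```shell
#!/bin/sh
# is_direct_dep FILE PACKAGE: succeed if PACKAGE has an active (uncommented)
# '- package' entry in a conda environment file. Hypothetical helper.
# PACKAGE is used as a regular expression, so plain names work best.
is_direct_dep() {
    grep -Eq "^[[:space:]]*-[[:space:]]+$2([[:space:]<>=!]|\$)" "$1"
}

# Example usage (the path is an assumption about the checkout layout):
#   is_direct_dep docker/environment.yml gdal && echo "gdal is requested directly"
```

A package that notebooks import directly is arguably worth an active entry even when another package already pulls it in, since the transitive dependency can disappear in a later release.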

# - geopandas # from ravenpy
# - pandas # from xclim and ravenpy
# - rioxarray # from ravenpy
- scikit-image
- ipyleaflet
- threddsclient
@@ -48,12 +62,12 @@ dependencies:
# pinning hvplot did not solve the problem with violin plot.
- hvplot
- nc-time-axis
- cftime
- statsmodels # for ravenpy
# - cftime # from xclim and ravenpy
# - statsmodels # for ravenpy
# for error 'ImportError: HTTPFileSystem requires "requests" and "aiohttp" to
# be installed' with call 'fsspec.filesystem('https')'
- aiohttp
- pydantic
# - pydantic # from ravenpy
# Intake is a lightweight set of tools for loading and sharing data in data science projects
- intake
# https://intake.readthedocs.io/en/latest/plugin-directory.html
@@ -74,11 +88,9 @@ dependencies:
- zarr
# https://github.com/dask/s3fs/
- s3fs
- xclim
# Pinning shapely for ravenpy. Remove on next rebuild.
# https://github.com/CSHS-CWRA/RavenPy/blob/f63e1e5b967c0d7c17e679c8f9d6d309a94096e6/environment.yml#L35
- shapely <=1.7.1
- ravenpy
# - shapely <=1.7.1 # from ravenpy
tlvu marked this conversation as resolved.
# https://github.com/roocs/clisops
- clisops
# Universal Regridder for Geospatial Data
@@ -105,6 +117,10 @@ dependencies:
- notebook
- jupyterlab
- jupyterhub
# https://ipywidgets.readthedocs.io/en/latest/user_install.html
- ipywidgets
# https://github.com/jupyter-widgets/ipyleaflet
- ipyleaflet
# https://github.com/mamba-org/gator (was jupyter_conda)
- mamba_gator
# to diff .ipynb files
@@ -125,6 +141,11 @@ dependencies:
# xeus-python: back-end kernel implementing the Jupyter Debug Protocol
- xeus-python
- jupyter-dash
# Force newer nodejs for 'jupyter lab build' issue
# https://github.com/jupyterlab/jupyterlab/issues/11726#issuecomment-998901247
# TODO: remove nodejs once all extensions move to prebuilt extensions, see comment
# https://github.com/jupyterlab/jupyterlab/issues/11726#issuecomment-998917305
- nodejs >= 16.0
# utilities
- curl
- wget
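Reviewer note: the comments in this file mention accidental downgrades during the second install phase. A rough way to spot them is to snapshot the package versions after each phase and compare. The sketch below assumes a POSIX shell plus `sort -V` (version sort, available in GNU coreutils) and `awk`; the function names are hypothetical and not part of this PR. Version sort is only an approximation of conda's own version ordering.

```shell
#!/bin/sh
# version_lt A B: succeed if version A sorts strictly before version B.
# Relies on 'sort -V', which is an assumption about the host system.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# report_downgrades BEFORE AFTER: both files hold "name version" lines.
# Prints one line per package whose version in AFTER is lower than in BEFORE.
report_downgrades() {
    while read -r name old; do
        # Look up the same package in the AFTER snapshot.
        new=$(awk -v n="$name" '$1 == n { print $2 }' "$2")
        if [ -n "$new" ] && version_lt "$new" "$old"; then
            echo "$name downgraded: $old -> $new"
        fi
    done < "$1"
}
```

The snapshots could be produced with something like `conda list --name birdy | awk '!/^#/ { print $1, $2 }'` after each phase; that exact pipeline is an assumption and is not used anywhere in this PR.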
2 changes: 1 addition & 1 deletion launchcontainer
@@ -1,7 +1,7 @@
#!/bin/sh -x

if [ -z "$DOCKER_IMAGE" ]; then
DOCKER_IMAGE="pavics/workflow-tests:211123-update211216"
DOCKER_IMAGE="pavics/workflow-tests:211221"
fi

if [ -z "$CONTAINER_NAME" ]; then
2 changes: 1 addition & 1 deletion launchnotebook
@@ -7,7 +7,7 @@ if [ -z "$PORT" ]; then
fi

if [ -z "$DOCKER_IMAGE" ]; then
DOCKER_IMAGE="pavics/workflow-tests:211123-update211216"
DOCKER_IMAGE="pavics/workflow-tests:211221"
fi

if [ -z "$CONTAINER_NAME" ]; then
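Reviewer note: both launch scripts use the same `if [ -z "$DOCKER_IMAGE" ]` block to fall back to the default image tag. For reference, the same behaviour can be sketched with POSIX parameter expansion; this is an equivalent form shown for illustration, not what the scripts in this PR do.

```shell
#!/bin/sh
# Use the caller's DOCKER_IMAGE when it is set and non-empty, otherwise
# fall back to the image tag introduced by this PR.
DOCKER_IMAGE="${DOCKER_IMAGE:-pavics/workflow-tests:211221}"
echo "using image: $DOCKER_IMAGE"
```

Either form lets a caller override the image, for example `DOCKER_IMAGE=pavics/workflow-tests:220121 ./launchnotebook`.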