Add kaggle notebook Dockerfile (kubeflow#1109)
Pete MacKinnon authored and k8s-ci-robot committed Jul 15, 2018
1 parent 4320841 commit eb795a0
Showing 2 changed files with 95 additions and 0 deletions.
78 changes: 78 additions & 0 deletions components/contrib/kaggle-notebook-image/Dockerfile
@@ -0,0 +1,78 @@
# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.

# use basic syntax for now
FROM gcr.io/kaggle-images/python:latest

USER root

ENV DEBIAN_FRONTEND noninteractive

ENV NB_USER jovyan
ENV NB_UID 1000
ENV HOME /home/$NB_USER
# We prefer to have a global conda install
# to minimize the amount of content in $HOME
ENV CONDA_DIR=/opt/conda
ENV PATH $CONDA_DIR/bin:$PATH

# Use bash instead of sh
SHELL ["/bin/bash", "-c"]

# add https support
RUN apt-get update && apt-get install -yq --no-install-recommends apt-transport-https

RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && \
locale-gen

ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8

# Create jovyan user with UID=1000 and in the 'users' group
# but allow for non-initial launches of the notebook to have
# $HOME provided by the contents of a PV
RUN useradd -M -s /bin/bash -N -u $NB_UID $NB_USER && \
chown -R ${NB_USER}:users /usr/local/bin && \
mkdir -p $HOME

RUN export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" > /etc/apt/sources.list.d/google-cloud-sdk.list && \
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
apt-get update && \
apt-get install -y google-cloud-sdk kubectl && \
# pin to 0.8.1 due to conflicts with tornado>=5.x and pyzmq>=17.x
# we just really need jupyterhub-singleuser for our KF scripts
pip install jupyterhub==0.8.1

# Install Tini - used as entrypoint for container
RUN cd /tmp && \
wget --quiet https://github.com/krallin/tini/releases/download/v0.10.0/tini && \
echo "1361527f39190a7338a0b434bd8c88ff7233ce7b9a4876f3315c22fce7eca1b0 *tini" | sha256sum -c - && \
mv tini /usr/local/bin/tini && \
chmod +x /usr/local/bin/tini

RUN chown -R ${NB_USER}:users $HOME

ENV GITHUB_REF https://raw.githubusercontent.com/kubeflow/kubeflow/master/components/tensorflow-notebook-image

ADD --chown=jovyan:users $GITHUB_REF/jupyter_notebook_config.py /tmp

# Wipe $HOME for PVC detection later
WORKDIR $HOME
RUN rm -fr $(ls -A $HOME)

# Get init scripts from kubeflow
ADD --chown=jovyan:users \
$GITHUB_REF/start-singleuser.sh \
$GITHUB_REF/start-notebook.sh \
$GITHUB_REF/start.sh \
$GITHUB_REF/pvc-check.sh \
/usr/local/bin/

RUN chmod a+rx /usr/local/bin/*

# Configure container startup
EXPOSE 8888
ENTRYPOINT ["tini", "--"]
CMD ["start-notebook.sh"]
17 changes: 17 additions & 0 deletions components/contrib/kaggle-notebook-image/README.md
@@ -0,0 +1,17 @@
# Kaggle notebook

This Dockerfile builds an image derived from the latest [Kaggle python image](gcr.io/kaggle-images/python:latest) that can be launched from the Kubeflow JupyterHub. [Kaggle](https://www.kaggle.com/) is the home of data science collaboration and competition.

Important notes:
* this notebook is not curated by the Kubeflow project and is not regularly tested
* the versions of TensorFlow, PyTorch, and the other libraries included may change at any time
* this is a very large image, over 21 GB in size. Because it builds on the latest Kaggle image, docker pulls (and notebook launches) can take a long time.
* the default base size for the docker devicemapper storage driver is 10 GB, which is too small to run this image. Configure your docker daemon for at least 30 GB (`--storage-opt dm.basesize=30G`) or use a storage driver such as overlay2.
* the Kaggle image includes TensorFlow 1.9 or greater built with AVX2 support, so the image may not run on older CPUs
* the other Kubeflow curated notebooks let the jovyan user install new packages with pip or conda. Unfortunately this is not the case with the Kaggle image, because adding a new ownership layer would increase the image size.
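
The storage driver and AVX2 notes can be checked on a node before pulling the image. The following is a minimal sketch, not part of this commit: the function names `has_avx2` and `docker_storage_driver` are hypothetical, and the AVX2 check assumes a Linux host that exposes CPU flags in `/proc/cpuinfo`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; these names are illustrative only.

# Succeeds if the given cpuinfo file (default: /proc/cpuinfo) lists the avx2 flag.
has_avx2() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  grep -qw avx2 "$cpuinfo"
}

# Prints the storage driver reported by the local docker daemon,
# for example devicemapper or overlay2.
docker_storage_driver() {
  docker info --format '{{.Driver}}'
}
```

Running `has_avx2 || echo "no AVX2 support detected"` on a node gives an early warning, and `docker_storage_driver` shows whether the devicemapper base size limit is likely to apply.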

To build the image run:
```
docker build --pull -t kubeflow-kaggle-notebook:latest .
```
Substitute whatever repository and image tag you need.
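
As a rough sketch of building under a different repository and tag, the helper below composes the image reference and then builds and pushes it. The registry path `gcr.io/my-project`, the tag `v1`, and the function names `image_ref` and `build_and_push` are assumptions for illustration; only the `docker build --pull -t` invocation comes from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the repository path and tag are placeholders.

# Prints <repo>/kubeflow-kaggle-notebook:<tag>; the tag defaults to "latest".
image_ref() {
  local repo="$1" tag="${2:-latest}"
  printf '%s/kubeflow-kaggle-notebook:%s\n' "$repo" "$tag"
}

# Builds the image from the current directory and pushes it to the registry.
build_and_push() {
  local ref
  ref="$(image_ref "$@")"
  docker build --pull -t "$ref" . && docker push "$ref"
}
```

For example, `build_and_push gcr.io/my-project v1` would build and push `gcr.io/my-project/kubeflow-kaggle-notebook:v1`, assuming the local docker client is already authenticated against that registry.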
