Merge pull request #79 from betatim/new-hub-docs
[MRG] Add more documentation
betatim committed Aug 5, 2018
2 parents 74bdeb2 + 2dc6e34 commit a5a28ea
Showing 4 changed files with 245 additions and 2 deletions.
208 changes: 208 additions & 0 deletions docs/source/additional-hub-configuration.rst
@@ -0,0 +1,208 @@
Additional ideas and snippets to customise your hub
===================================================

This is a collection of snippets and pointers to further customise your hub.


Custom user image
-----------------

Each hub can provide a different environment, with its own set of libraries and
tools, to students. Hubs have a default user image, but it does not contain
many tools useful for doing science. However, you can use it to test the rest
of your hub's setup.

The `Jupyter docker stacks <https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html>`_
provide a good collection of user images to start from. They are maintained by
the Jupyter team and updated reasonably often. They already work with JupyterHub
so you can quickly get going. Take a look at `the relationship between images <https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#image-relationships>`_
to get an idea of how the images relate to each other and what is installed
in each.

To configure your hub to use the `datascience-notebook <https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-datascience-notebook>`_
image, edit your :code:`<hubname>/values.yaml` and add the following snippet:

.. code-block:: yaml

    jupyterhub:
      singleuser:
        image:
          name: jupyter/datascience-notebook
          tag: 135a595d2a93
        startTimeout: 600

You have to specify both the name and an explicit tag. You cannot use :code:`latest`
as the tag.
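
The rule above could be checked before merging a change with a small helper.
The following is a minimal sketch and not part of hub-ops: the function name
:code:`check_user_image` is hypothetical, and it assumes the contents of
:code:`values.yaml` have already been loaded into a dictionary (for example
with a YAML library).

```python
def check_user_image(values):
    """Return a list of problems found in the singleuser image settings."""
    image = values.get("jupyterhub", {}).get("singleuser", {}).get("image", {})
    problems = []
    if not image.get("name"):
        problems.append("image name is missing")
    tag = image.get("tag")
    if not tag:
        problems.append("image tag is missing")
    elif tag == "latest":
        problems.append("image tag must be explicit, not 'latest'")
    return problems


# The same structure as the values.yaml snippet, written as a Python dict.
values = {
    "jupyterhub": {
        "singleuser": {
            "image": {"name": "jupyter/datascience-notebook", "tag": "135a595d2a93"},
            "startTimeout": 600,
        }
    }
}
print(check_user_image(values))
```

An empty list means that both the name and an explicit tag are present.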

Pulling images from docker hub can be a bit slow at times. This means it is a
good idea to increase the :code:`startTimeout` to 600 seconds as shown above.


Self-made user image
--------------------

Sometimes none of the images available as part of the docker-stacks is enough
and you want to build a custom image. A good way to get started with this is
to base your work on a docker-stacks image that does most of what you need
and then customise it further. If you need the libraries from the
:code:`earth-analytics-python-env` you can start from that Docker image as
well.

To create a self-made user image, create a new directory in the
:code:`user-images/` directory that has the same name as your hub.

In this directory place a :code:`Dockerfile`. This will be automatically
built by travis. To allow travis to find images and determine which need
rebuilding, you need to follow this naming convention.

Below is an example of a minimally modified earth-analytics-python-env
docker image. It picks a specific tag of the earth-analytics-python-env and
then installs JupyterHub version 0.9.0. It also installs the `nbzip <https://github.com/data-8/nbzip>`_
notebook extension that lets students download the contents of their JupyterHub
home directories as a ZIP file to their local machine. The three commands that
install and enable the extension are typical for notebook extensions.

.. code-block:: shell

    FROM earthlab/earth-analytics-python-env:41ae80f
    RUN pip install --no-cache-dir --upgrade --upgrade-strategy only-if-needed \
        jupyterhub==0.9.0 nbzip==0.0.4
    RUN jupyter serverextension enable --py nbzip --sys-prefix
    RUN jupyter nbextension install --py nbzip --sys-prefix
    RUN jupyter nbextension enable --py nbzip --sys-prefix

This image will be automatically built by travis. You will need to adjust your
hub's :code:`values.yaml` to use this image:

.. code-block:: yaml

    jupyterhub:
      singleuser:
        image:
          # tag will be set by travis on deployment
          name: earthlabhubops/ea-k8s-user-<hubname>
          tag: set-on-deployment
        startTimeout: 600

By following the convention that the custom user image for your hub is placed in
:code:`user-images/<hubname>`, your docker image will be called :code:`earthlabhubops/ea-k8s-user-<hubname>`.
You do not have to set the tag by hand; travis will take care of that for you.

Pulling images from docker hub can be a bit slow at times. This means it is a
good idea to increase the :code:`startTimeout` to 600 seconds as shown above.


Prefetching data
----------------

It can be worth prefetching data for your students and including it directly
in the docker image. This means they will not have to wait for downloads when
the course starts. The downside is that your docker image gets bigger.
Unfortunately we cannot add data directly to students' home directories. We can
only bake this data into the docker image used for each user. This example also
sets up the steps needed for the data to be copied over to each student's home
directory when their server starts.

To include data in your docker image, create a custom user image for your hub
by following `Self-made user image`_.

An example of using :code:`earthpy` to download the :code:`spatial-vector-lidar`
dataset is given below:

.. code-block:: shell

    # Have to explicitly change the matplotlib backend in order to use
    # earthpy on the command line.
    RUN python -c "import matplotlib; matplotlib.use('Agg'); import earthpy; data = earthpy.io.EarthlabData('/data'); data.get_data('spatial-vector-lidar')"

The general idea is to execute a Python command to trigger the download and
store the results in :code:`/data`. You could use any kind of command to do this.
For example, you could use :code:`wget` to fetch a dataset from FigShare or
any other website. Try out your command locally to make sure it does exactly
what you think it should do.
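
As an alternative to the :code:`earthpy` one-liner or :code:`wget`, the download
step can also be written as a short Python script. The sketch below is only an
illustration under stated assumptions: the function name :code:`fetch_dataset`
and the example URL are hypothetical, it uses only the Python standard library,
and it skips the download when the file already exists.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def fetch_dataset(url, dest_dir="/data"):
    """Download url into dest_dir unless the file is already there."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last component of the URL path.
    target = dest_dir / Path(urlparse(url).path).name
    if not target.exists():
        urlretrieve(url, str(target))
    return target


# Hypothetical usage; replace the URL with the real location of your dataset:
# fetch_dataset("https://example.org/datasets/my-dataset.zip")
```

Saved as a script and copied into the image, this could be run in a
:code:`RUN` step of the :code:`Dockerfile`, in the same way as the
:code:`earthpy` command above.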

You can place the data in almost any location inside the container. By convention
we use :code:`/data` though.

If all you need is for the data to be available in the container, then you are
done now. If you would also like to copy the data over to the students' home
directories, use the snippet below:

.. code-block:: yaml

    jupyterhub:
      singleuser:
        lifecycleHooks:
          postStart:
            exec:
              command:
                - "sh"
                - "-c"
                - >
                  mkdir -p /home/jovyan/earth-analytics/data;
                  rsync --ignore-existing -razv --progress /data/ /home/jovyan/earth-analytics/data;

The :code:`lifecycleHooks` entry in the :code:`values.yaml` of your hub gives
you the option to run commands when a user's pod starts. You can place any
command here. Keep in mind that the user can start interacting with their pod
before these commands complete. This means you want commands in this
section to run reasonably quickly. Otherwise users might be confused or interfere
with the commands here.

The above snippet does two things. First, it makes sure that the
:code:`earth-analytics/data` directory exists in the user's home directory.
Then it uses :code:`rsync` to copy the data from :code:`/data` to this
directory. Because of the :code:`--ignore-existing` flag, :code:`rsync` will
not overwrite files that already exist in the user's home directory. The
assumption is that a user might have edited these files and does not want them
to be overwritten. If users want to refresh their datasets because they broke
something, they can delete that file or dataset, stop their server, and then
restart it. They should then have the latest version of the data again.
Alternatively, they can run the above :code:`rsync` command manually.
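
To make the "copy only what is missing" behaviour concrete, here is a small
Python sketch that mimics it. This is not what the hub runs (the hub uses
:code:`rsync` as shown above), and the function name :code:`copy_missing` is
hypothetical. It only illustrates why edited files survive a restart while
deleted files are restored.

```python
import shutil
from pathlib import Path


def copy_missing(src, dst):
    """Copy files from src to dst, skipping any file that already exists in dst."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)  # same role as `mkdir -p`
    copied = []
    for path in sorted(src.rglob("*")):
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif not target.exists():
            # Only files missing from dst are copied; existing ones are kept.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(target)
    return copied
```

Calling :code:`copy_missing` a second time after a user has edited a file
leaves the edit in place, while a file the user deleted is copied again from
the source directory.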


.. _self-made-hub-image:

Self-made hub image
-------------------

You can customise the image and environment in which the JupyterHub itself runs.
This is useful when you want to use custom authenticators. To create a custom
hub image, create a directory called :code:`hub-images/<hubname>`.

An example of installing the Hash authenticator is given here:

.. code-block:: shell

    # the tag given here has to be compatible with the version of the
    # helm chart you are using for this hub.
    FROM jupyterhub/k8s-hub:f8dec3f
    USER root
    RUN pip3 install --no-cache-dir \
        jupyterhub-hashauthenticator==0.4.0
    USER ${NB_USER}

This image will be automatically built by travis. You will need to adjust your
hub's :code:`values.yaml` to use this image:

.. code-block:: yaml

    jupyterhub:
      hub:
        image:
          # tag will be set by travis on deployment
          name: earthlabhubops/ea-k8s-hub-<hubname>
          tag: set-on-deployment

By following the convention that the custom hub image for your hub is placed in
:code:`hub-images/<hubname>`, your hub's docker image will be called :code:`earthlabhubops/ea-k8s-hub-<hubname>`.
You do not have to set the tag by hand; travis will take care of that for you.


Custom authentication
---------------------

To configure the authentication mechanism read :ref:`authentication`.
8 changes: 6 additions & 2 deletions docs/source/authentication.rst
@@ -1,3 +1,5 @@
.. _authentication:

Authentication for your hub
===========================

@@ -20,7 +22,9 @@ GitHub account already.
will not need to create a new account.

`Hash authentication` is good for workshops with participants who might not
-have a UC Boulder account.
+have a UC Boulder account. The Hash authenticator is not part of the default
+JupyterHub setup we use, so you will have to create a :ref:`self-made-hub-image`.


User whitelist and admin accounts
---------------------------------
@@ -163,7 +167,7 @@ with participants who do not have a UC Boulder account.

To be able to use the hash authenticator you will need to have a custom image
for your hub as the Hash authenticator package is not installed by default.
-See the :code:`hub-images/` subdirectory for how to create a custom image.
+You will have to create a :ref:`self-made-hub-image`.

The public part of the configuration has to be done in :code:`<NAMEOFYOURHUB>/values.yaml`:

30 changes: 30 additions & 0 deletions docs/source/day-to-day.rst
@@ -52,6 +52,36 @@ written using YAML that describe the state we want the hub to be in. After you
create a new chart describing a hub configuration and merge it, travis will
take care of making the real world correspond to your wishes.

All the hub deployments are based on the `Zero to JupyterHub guide
<http://zero-to-jupyterhub.readthedocs.io/>`_
(`GitHub repository <https://github.com/jupyterhub/zero-to-jupyterhub-k8s>`_).
The guide provides excellent advice on configuring your hub as well as a helm
chart that we use. Each of the hubs here can use a different version of the
Z2JH helm chart. This raises two questions: which version should I use and how
do I find out what versions are available?

All versions of the JupyterHub helm charts are available from `<https://jupyterhub.github.io/helm-chart/>`_.
We are currently using a `development release <https://jupyterhub.github.io/helm-chart/#development-releases-jupyterhub>`_
of the chart for most hubs. The reason for this is that a lot of new features
have been added but no new release has been made (should happen in August 2018).
If in doubt, picking the latest development release is a good choice.

To change the version of the hub that you are using, edit :code:`<hubname>/requirements.yaml`.
The snippet below shows how to use :code:`v0.7-578b3a2`:

.. code-block:: yaml

    dependencies:
    - name: jupyterhub
      version: "v0.7-578b3a2"
      repository: "https://jupyterhub.github.io/helm-chart"

You can also inspect what version :code:`staginghub/requirements.yaml` is
using. Unless there are security related fixes or bugs that hinder your use of
a specific version of a chart, the recommendation is to not update your chart
version during a workshop. Over the course of a semester it might be worth
upgrading to the latest version, but otherwise upgrades should mostly be avoided.

Take a look at :code:`staginghub/` as an example chart to base yours on. A chart can
describe anything from a simple to a very complex setup. We typically use them
for low complexity things. The most important file is :code:`values.yaml` which is
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -7,5 +7,6 @@ Welcome to the hub-ops's documentation!

day-to-day
authentication
additional-hub-configuration
tooling
initial-setup
