Merge pull request #79 from betatim/new-hub-docs

[MRG] Add more documentation
earthlab · Aug 5, 2018 · a5a28ea · a5a28ea
2 parents 74bdeb2 + 2dc6e34
commit a5a28ea
Showing 4 changed files with 245 additions and 2 deletions.
diff --git a/docs/source/additional-hub-configuration.rst b/docs/source/additional-hub-configuration.rst
@@ -0,0 +1,208 @@
+Additional ideas and snippets to customise your hub
+===================================================
+
+This is a collection of snippets and pointers to further customise your hub.
+
+
+Custom user image
+-----------------
+
+Each hub can have a different environment, set of libraries and tools that is
+provided to students. Hubs have a default user image, but it does not contain
+many tools useful for doing science. However, you can use it to test the rest
+of your hub's setup.
+
+The `Jupyter docker stacks <https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html>`_
+provide a good collection of user images to start from. They are maintained by
+the Jupyter team and updated reasonably often. They already work with JupyterHub
+so you can quickly get going. Take a look at `the relationship between images <https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#image-relationships>`_
+to get an idea of how the images relate to each other and what is installed
+in each.
+
+To configure your hub to use the `datascience-notebook <https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-datascience-notebook>`_
+image, edit your :code:`<hubname>/values.yaml` and add the following snippet:
+
+.. code-block:: yaml
+
+    jupyterhub:
+      singleuser:
+        image:
+          name: jupyter/datascience-notebook
+          tag: 135a595d2a93
+        startTimeout: 600
+
+You have to specify both the name and an explicit tag. You cannot use :code:`latest`
+as the tag.
+
+Pulling images from Docker Hub can be a bit slow at times. This means it is a
+good idea to increase the :code:`startTimeout` to 600 seconds as shown above.
+
+
+Self-made user image
+--------------------
+
+Sometimes none of the images available as part of the docker-stacks is enough
+and you want to build a custom image. A good way to get started with this is
+to base your work on a docker-stacks image that does most of what you need
+and then customise it further. If you need the libraries from the
+:code:`earth-analytics-python-env` you can start from that Docker image as
+well.
+
+To create a self-made user image, create a new directory in the
+:code:`user-images/` directory that has the same name as your hub.
+
+In this directory place a :code:`Dockerfile`. It will be built automatically
+by travis. To allow travis to find images and determine which ones need
+rebuilding, you need to follow this naming convention.
+
+Below is an example of a minimally modified earth-analytics-python-env
+docker image. It picks a specific tag of the earth-analytics-python-env and
+then installs JupyterHub version 0.9.0. It also installs the `nbzip <https://github.com/data-8/nbzip>`_
+notebook extension that lets students download the contents of their JupyterHub
+home directories as a ZIP file to their local machine. The three commands that
+install and enable the extension are typical for notebook extensions.
+
+.. code-block:: shell
+
+    FROM earthlab/earth-analytics-python-env:41ae80f
+
+    RUN pip install --no-cache --upgrade --upgrade-strategy only-if-needed \
+      jupyterhub==0.9.0 nbzip==0.0.4
+
+    RUN jupyter serverextension enable --py nbzip --sys-prefix
+    RUN jupyter nbextension install --py nbzip --sys-prefix
+    RUN jupyter nbextension enable --py nbzip --sys-prefix
+
+This image will be automatically built by travis. You will need to adjust your
+hub's :code:`values.yaml` to use this image:
+
+.. code-block:: yaml
+
+    jupyterhub:
+      singleuser:
+        image:
+          # tag will be set by travis on deployment
+          name: earthlabhubops/ea-k8s-user-<hubname>
+          tag: set-on-deployment
+        startTimeout: 600
+
+By following the convention that the custom user image for your hub is placed in
+:code:`user-images/<hubname>`, your docker image will be called :code:`earthlabhubops/ea-k8s-user-<hubname>`.
+You do not have to set the tag by hand; travis will take care of that for you.
+
+Pulling images from Docker Hub can be a bit slow at times. This means it is a
+good idea to increase the :code:`startTimeout` to 600 seconds as shown above.
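+
+Before opening a pull request it can help to check that the image builds on
+your own machine. The commands below are a minimal sketch, not part of the
+travis setup: they assume Docker is installed locally, that you run them from
+the root of the repository, and that your hub is called :code:`myhub` (a
+made-up name). The :code:`local-test` tag is likewise arbitrary:
+
+.. code-block:: shell
+
+    # Build the image from the hub's directory (hypothetical hub name).
+    docker build -t ea-k8s-user-myhub:local-test user-images/myhub
+    # Check which JupyterHub version ended up in the image.
+    docker run --rm ea-k8s-user-myhub:local-test pip show jupyterhub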
+
+
+Prefetching data
+----------------
+
+It can be worth prefetching data for your students and including it directly
+in the docker image. This means they will not have to wait when the course
+starts. The downside is that your docker image gets bigger. Unfortunately we
+cannot add data directly to students' home directories. We can only bake this
+data into the docker image used for each user. In this example we also set up
+the necessary steps for the data to be copied over to each student's home
+directory when they log into the hub.
+
+To include data in your docker image, create a custom user image for your hub
+by following `Self-made user image`_.
+
+An example of using :code:`earthpy` to download the :code:`spatial-vector-lidar`
+dataset is given below:
+
+.. code-block:: shell
+
+    # Have to explicitly change the matplotlib backend in order to use
+    # earthpy on the command line.
+    RUN python -c "import matplotlib; matplotlib.use('Agg'); import earthpy; data = earthpy.io.EarthlabData('/data'); data.get_data('spatial-vector-lidar')"
+
+The general idea is to execute a Python command to trigger the download and
+store the results in :code:`/data`. You could use any kind of command to do this.
+For example, you could use :code:`wget` to fetch a dataset from FigShare or
+any other website. Try out your command locally to make sure it does exactly
+what you think it should do.
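+
+As a rough sketch of the :code:`wget` route, the commands below try a download
+locally before it goes into a :code:`Dockerfile`. The URL and the
+:code:`/tmp/data-test` directory are placeholders made up for this example, not
+a real dataset location, and the sketch assumes :code:`wget` is installed:
+
+.. code-block:: shell
+
+    # Hypothetical URL and scratch directory; substitute your own.
+    mkdir -p /tmp/data-test
+    wget --directory-prefix=/tmp/data-test https://example.org/my-dataset.zip
+    # Check that the file arrived and has a plausible size.
+    ls -lh /tmp/data-test
+
+Once the download behaves as expected, the same :code:`wget` line can go into
+a :code:`RUN` instruction with :code:`/data` as the target directory, provided
+:code:`wget` is available in your base image.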
+
+You can place the data in almost any location inside the container. By convention,
+though, we use :code:`/data`.
+
+If all you need is for the data to be available in the container, you are done
+now. If you would also like to copy the data over to each student's home
+directory, use the snippet below:
+
+.. code-block:: yaml
+
+    jupyterhub:
+      singleuser:
+        lifecycleHooks:
+          postStart:
+            exec:
+              command:
+                - "sh"
+                - "-c"
+                - >
+                  mkdir -p /home/jovyan/earth-analytics/data;
+                  rsync --ignore-existing -razv --progress /data/ /home/jovyan/earth-analytics/data;
+
+The :code:`lifecycleHooks` entry in the :code:`values.yaml` of your hub gives
+you the option to run commands when a user's pod starts. You can place any
+command here. Keep in mind that the user can start interacting with their pod
+before these commands complete. This means you want commands in this
+section to run reasonably quickly. Otherwise users might be confused or interfere
+with the commands here.
+
+The above snippet does two things. First, it makes sure that the :code:`earth-analytics/data`
+directory exists in the user's home directory. After that it uses :code:`rsync`
+to copy the data from :code:`/data` to this directory. The way :code:`rsync` is
+configured means that it will not overwrite files that already exist in the user's
+home directory. The assumption is that a user might have edited these files and
+does not want them to be overwritten. If users want to refresh their datasets
+because they broke something, they can delete that file or dataset, stop their
+server, and then restart it. They should now have the latest version of the
+data again. Alternatively, they can run the above :code:`rsync` command manually.
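+
+A manual refresh could look like the sketch below, run from a terminal on the
+user's server. The dataset directory name is an assumption based on the
+:code:`earthpy` example above; list :code:`/data` first to see what is
+actually there:
+
+.. code-block:: shell
+
+    # See which datasets were baked into the image.
+    ls /data
+    # Remove the broken copy (assumed directory name; adjust to match /data).
+    rm -r /home/jovyan/earth-analytics/data/spatial-vector-lidar
+    # Same command as in the postStart hook: copies only files that are missing.
+    rsync --ignore-existing -razv --progress /data/ /home/jovyan/earth-analytics/data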
+
+
+.. _self-made-hub-image:
+
+Self-made hub image
+-------------------
+
+You can customise the image and environment in which the JupyterHub itself runs.
+This is useful when you want to use custom authenticators. To create a custom
+hub image, create a directory called :code:`hub-images/<hubname>`.
+
+An example of installing the Hash authenticator is given here:
+
+.. code-block:: shell
+
+    # the tag given here has to be compatible with the version of the
+    # helm chart you are using for this hub.
+    FROM jupyterhub/k8s-hub:f8dec3f
+
+    USER root
+    RUN pip3 install --no-cache-dir \
+             jupyterhub-hashauthenticator==0.4.0
+
+    USER ${NB_USER}
+
+This image will be automatically built by travis. You will need to adjust your
+hub's :code:`values.yaml` to use this image:
+
+.. code-block:: yaml
+
+    jupyterhub:
+      hub:
+        image:
+          # tag will be set by travis on deployment
+          name: earthlabhubops/ea-k8s-hub-<hubname>
+          tag: set-on-deployment
+
+By following the convention that the custom hub image for your hub is placed in
+:code:`hub-images/<hubname>`, your hub's docker image will be called :code:`earthlabhubops/ea-k8s-hub-<hubname>`.
+You do not have to set the tag by hand; travis will take care of that for you.
+
+
+Custom authentication
+---------------------
+
+To configure the authentication mechanism, read :ref:`authentication`.
diff --git a/docs/source/authentication.rst b/docs/source/authentication.rst
@@ -1,3 +1,5 @@
+.. _authentication:
+
 Authentication for your hub
 ===========================
 
@@ -20,7 +22,9 @@ GitHub account already.
 will not need to create a new account.
 
 `Hash authentication` is good for workshops with participants who might not
-have a UC Boulder account.
+have a UC Boulder account. The Hash authenticator is not part of the default
+JupyterHub setup we use, so you will have to create a :ref:`self-made-hub-image`.
+
 
 User whitelist and admin accounts
 ---------------------------------
@@ -163,7 +167,7 @@ with participants who do not have a UC Boulder account.
 
 To be able to use the hash authenticator you will need to have a custom image
 for your hub as the Hash authenticator package is not installed by default.
-See the :code:`hub-images/` subdirectory for how to create a custom image.
+You will have to create a :ref:`self-made-hub-image`.
 
 The public part of the configuration has to be done in :code:`<NAMEOFYOURHUB>/values.yaml`:
 

diff --git a/docs/source/day-to-day.rst b/docs/source/day-to-day.rst
@@ -52,6 +52,36 @@ written using YAML that describe the state we want the hub to be in. After you
 create a new chart describing a hub configuration and merge it, travis will
 take care of making the real world correspond to your wishes.
 
+All the hub deployments are based on the `Zero to JupyterHub guide
+<http://zero-to-jupyterhub.readthedocs.io/>`_
+(`GitHub repository <https://github.com/jupyterhub/zero-to-jupyterhub-k8s>`_).
+The guide provides excellent advice on configuring your hub as well as a helm
+chart that we use. Each of the hubs here can use a different version of the
+Z2JH helm chart. This raises two questions: which version should I use and how
+do I find out what versions are available?
+
+All versions of the JupyterHub helm charts are available from `<https://jupyterhub.github.io/helm-chart/>`_.
+We are currently using a `development release <https://jupyterhub.github.io/helm-chart/#development-releases-jupyterhub>`_
+of the chart for most hubs. The reason for this is that a lot of new features
+have been added but no new release has been made (one is expected in August 2018).
+If you are unsure, picking the latest development release is a good choice.
+
+To change the version of the chart that your hub uses, edit :code:`<hubname>/requirements.yaml`.
+The snippet below shows how to use :code:`v0.7-578b3a2`:
+
+.. code-block:: yaml
+
+    dependencies:
+    - name: jupyterhub
+      version: "v0.7-578b3a2"
+      repository: "https://jupyterhub.github.io/helm-chart"
+
+You can also inspect what version :code:`staginghub/requirements.yaml` is
+using. Unless there are security-related fixes or bugs that hinder your use of
+a specific version of a chart, the recommendation is to not update your chart
+version during a workshop. Over the course of a semester it might be worth
+upgrading to the latest version, but this should mostly be avoided.
+
 Take a look at :code:`staginghub/` as an example chart to base yours on. A chart can
 describe anything from a simple to a very complex setup. We typically use them
 for low complexity things. The most important file is :code:`values.yaml` which is

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -7,5 +7,6 @@ Welcome to the hub-ops's documentation!
 
    day-to-day
    authentication
+   additional-hub-configuration
    tooling
    initial-setup