Merge pull request #16 from bird-house/add-jupyterhub-user-data-dir
Add JupyterHub user data directories and integrate with Magpie for authentication.

* **JupyterHub authentication is integrated with Magpie**
  * Adding new users and managing passwords is done via Magpie
  * The user `public`/`public` needs to exist in Magpie for the demo account to work
  * **Since the user "public" exists in Magpie, any "secure" folder on Thredds that requires authentication will be visible to that user!** Probably okay for now since everything on Thredds is public, but we have to keep that in mind. We will need fine-grained permission capabilities soon.

* **Each user has their own private workspace and Jupyter server**
  * Can only see and modify/delete their own files
  * Can only restart their own Jupyter server, so they cannot kill other people's running jobs
  * Of course this **does not apply to the demo public/public user account**

* **Login banner now shows `SUPPORT_EMAIL` from `env.local` so users can request a private account/workspace**
  * Please review the wording
  * Is this a good email to use? For Ouranos it is pavics@ouranos.ca

* **New persistence on disk: `/data/jupyterhub_user_data`**
    * To include in backup

* **Other features**
  * Users' preferences (theme, ...) are also persisted
  * Update JupyterHub version

* **Regressions**
  * Cannot live-update JupyterHub anymore without killing all the user Jupyter containers and consequently their running jobs; this might be related to the JupyterHub version update
tlvu committed Feb 3, 2020
2 parents ccf8c22 + 9edeb73 commit 53576cc
Showing 16 changed files with 182 additions and 75 deletions.
23 changes: 20 additions & 3 deletions birdhouse/README.md
@@ -44,12 +44,12 @@ postgres instance. See [`scripts/create-wps-pgsql-databases.sh`](scripts/create-

## Manual steps post deployment

Change geoserver default admin password:
### Change geoserver default admin password

* Go to
https://<PAVICS_HOST>/geoserver/web/wicket/bookmarkable/org.geoserver.security.web.UserGroupRoleServicesPage (Security -> Users, Groups, and Roles)
https://<PAVICS_FQDN>/geoserver/web/wicket/bookmarkable/org.geoserver.security.web.UserGroupRoleServicesPage (Security -> Users, Groups, and Roles)

* Login using the default username `admin` and password `geoserver`.
* Login using the default username `admin` and default password `geoserver`.

* Click on tab "Users/Groups".

@@ -60,6 +60,23 @@ Change geoserver default admin password:
* Click "Save".


### Create `public` user in Magpie for JupyterHub login

* Go to
https://<PAVICS_FQDN>/magpie/ui/login and log in with the `admin` user;
the password should be in `env.local`.

* Then go to https://<PAVICS_FQDN>/magpie/ui/users/add

* Fill in:
* User name: `public`
* Email: anything is fine
* Password: `public`
* User group: `anonymous`

* Click "Add User".


## Mostly automated unattended continuous deployment

Automated unattended continuous deployment means if code change in the checkout
4 changes: 1 addition & 3 deletions birdhouse/common.env
@@ -1,6 +1,4 @@
# All env vars in this common.env can be overridden by env.local.

# dupe with backup-juputerhub-notebooks.sh
export DOCKER_JUPYTERHUB_USER_PERSISTENCE_VOLUME=jupyterhub_user_persistence

export DOCKER_NOTEBOOK_IMAGE="pavics/workflow-tests:200120"
export JUPYTERHUB_USER_DATA_DIR="/data/jupyterhub_user_data"
1 change: 1 addition & 0 deletions birdhouse/config/jupyterhub/custom_templates/.gitignore
@@ -0,0 +1 @@
login.html
26 changes: 26 additions & 0 deletions birdhouse/config/jupyterhub/custom_templates/login.html.template
@@ -0,0 +1,26 @@
{% extends "templates/login.html" %} {% set announcement_login = '

<p><strong>Public login:</strong> public/public</p>
<p>
Given this public nature, anyone can tamper with your notebooks so please
<strong>export your valuable notebooks elsewhere</strong> if you want to
preserve them.
</p>
<p>
Contact <strong>${SUPPORT_EMAIL}</strong> for information on how to
<strong>get an account and a private workspace</strong>.
</p>
<p>
The only writable folder is <strong>writable-workspace</strong>
(/notebook_dir/writable-workspace in the terminal) and it is persisted
between sessions.
</p>
<p>
Please <strong>be considerate</strong> with the amount of
<strong>disk space usage</strong> on this Jupyter instance.
</p>
<p>
This Jupyter instance can restart every day.
<strong>Long running processes will be killed without notice.</strong>
</p>
' %}
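
For reference, the rendered `login.html` (gitignored above) is presumably produced by substituting `SUPPORT_EMAIL` into this template the same way other `.template` files in the stack are processed. A minimal sketch using `envsubst`, assuming the paths shown in this commit and the example email from `env.local.example`:

```sh
# Hypothetical rendering step; the actual substitution is done by the stack's
# own template processing, not necessarily by this exact command.
cd birdhouse/config/jupyterhub/custom_templates
SUPPORT_EMAIL=helpdesk@example.com \
    envsubst '${SUPPORT_EMAIL}' < login.html.template > login.html
```
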
10 changes: 0 additions & 10 deletions birdhouse/config/jupyterhub/entrypoint

This file was deleted.

46 changes: 44 additions & 2 deletions birdhouse/config/jupyterhub/jupyterhub_config.py.template
@@ -1,21 +1,63 @@
import os
from os.path import join
import logging
import subprocess

c.JupyterHub.bind_url = 'http://:8000/jupyter'
c.JupyterHub.hub_ip = 'jupyterhub'

c.JupyterHub.authenticator_class = 'jupyterhub_magpie_authenticator.MagpieAuthenticator'
c.MagpieAuthenticator.magpie_url = "http://magpie:2001"

c.JupyterHub.cookie_secret_file = '/persist/jupyterhub_cookie_secret'
c.JupyterHub.db_url = '/persist/jupyterhub.sqlite'

c.JupyterHub.template_paths = ['/template_paths']
c.JupyterHub.template_paths = ['/custom_templates']

c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'

c.DockerSpawner.image = os.environ['DOCKER_NOTEBOOK_IMAGE']
c.DockerSpawner.use_internal_ip = True
c.DockerSpawner.network_name = os.environ['DOCKER_NETWORK_NAME']

notebook_dir = '/notebook_dir'
jupyterhub_data_dir = os.environ['JUPYTERHUB_USER_DATA_DIR']
host_user_data_dir = join(jupyterhub_data_dir, "{username}")
container_workspace_dir = join(notebook_dir, "writable-workspace")
container_home_dir = join(container_workspace_dir, ".home")

c.DockerSpawner.notebook_dir = notebook_dir
c.DockerSpawner.volumes = { os.environ['DOCKER_JUPYTERHUB_USER_PERSISTENCE_VOLUME']: notebook_dir }
c.DockerSpawner.environment = {
    "HOME": container_home_dir
}

c.DockerSpawner.volumes = {host_user_data_dir: container_workspace_dir}

host_tutorial_notebooks_dir = join(jupyterhub_data_dir, "tutorial-notebooks")
if os.path.exists(host_tutorial_notebooks_dir):
    c.DockerSpawner.volumes[host_tutorial_notebooks_dir] = {
        "bind": join(notebook_dir, "tutorial-notebooks"),
        "mode": "ro"
    }

readme = join(jupyterhub_data_dir, "README.ipynb")
if os.path.exists(readme):
    c.DockerSpawner.volumes[readme] = {
        "bind": join(notebook_dir, "README.ipynb"),
        "mode": "ro"
    }

def create_dir_hook(spawner):
    username = spawner.user.name
    user_dir = join(jupyterhub_data_dir, username)

    if not os.path.exists(user_dir):
        os.mkdir(user_dir, 0o755)

    subprocess.call(["chown", "-R", "1000:1000", user_dir])

c.Spawner.pre_spawn_hook = create_dir_hook

c.DockerSpawner.default_url = '/lab'
c.DockerSpawner.remove = True # delete containers when servers are stopped
${ENABLE_JUPYTERHUB_MULTI_NOTEBOOKS}
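
To make the volume mapping concrete: for a hypothetical user `alice` logging in with the defaults from `common.env`, DockerSpawner expands `{username}` so the spawned single-user container is roughly equivalent to the sketch below (illustration only; the real container is created by DockerSpawner through the Docker API):

```sh
# Approximate mounts for the hypothetical user "alice" with default settings.
docker run \
    -e HOME=/notebook_dir/writable-workspace/.home \
    -v /data/jupyterhub_user_data/alice:/notebook_dir/writable-workspace \
    -v /data/jupyterhub_user_data/tutorial-notebooks:/notebook_dir/tutorial-notebooks:ro \
    -v /data/jupyterhub_user_data/README.ipynb:/notebook_dir/README.ipynb:ro \
    --network jupyterhub_network \
    pavics/workflow-tests:200120
```
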
8 changes: 0 additions & 8 deletions birdhouse/config/jupyterhub/template_paths/login.html

This file was deleted.

21 changes: 17 additions & 4 deletions birdhouse/deployment/deploy.sh
@@ -129,13 +129,20 @@ cd $COMPOSE_DIR

. ./common.env

set +x # hide passwd in env.local in logs
# reload again after common.env since env.local can override common.env
. $ENV_LOCAL_FILE
set -x

# stop all to force reload any changed config that are volume-mount into the containers
./pavics-compose.sh stop

# this container is not managed by docker-compose, have to handle it manually
# user containers are not managed by docker-compose, have to handle them manually
# rm and not just stop to force spawning newer image
docker stop jupyter-public
docker rm jupyter-public
for jupyter_cont in `docker ps --format '{{.Names}}' | grep jupyter-`; do
    docker stop $jupyter_cont
    docker rm $jupyter_cont
done

# override git ssh command because this repo is private and need proper credentials
#
@@ -158,6 +165,12 @@ cd $COMPOSE_DIR
# reload again after git pull because this file could be changed by the pull
. ./common.env

set +x # hide passwd in env.local in logs
# reload again after common.env since env.local can override common.env
# (ex: JUPYTERHUB_USER_DATA_DIR)
. $ENV_LOCAL_FILE
set -x

# restart everything, only changed containers will be destroyed and recreated
./pavics-compose.sh up -d

@@ -166,7 +179,7 @@ docker pull $DOCKER_NOTEBOOK_IMAGE

# deploy new README.ipynb to Jupyter instance
docker run --rm --name deploy_README_ipynb \
-v $DOCKER_JUPYTERHUB_USER_PERSISTENCE_VOLUME:/notebook_dir \
-v "$JUPYTERHUB_USER_DATA_DIR":/notebook_dir \
-v $REPO_ROOT/docs/source/notebooks:/nb:ro \
-u root \
bash \
8 changes: 7 additions & 1 deletion birdhouse/deployment/install-deploy-notebook
@@ -29,9 +29,15 @@ if [ ! -e "$REPO_ROOT/birdhouse/deployment/trigger-deploy-notebook" ]; then
exit 2
fi

. "$REPO_ROOT/birdhouse/common.env"

if [ -f "$REPO_ROOT/birdhouse/env.local" ]; then
# allow override of JUPYTERHUB_USER_DATA_DIR
. "$REPO_ROOT/birdhouse/env.local"
fi

set -x

sudo cp -v $REPO_ROOT/birdhouse/deployment/trigger-deploy-notebook $CRON_FILE
cat $REPO_ROOT/birdhouse/deployment/trigger-deploy-notebook | envsubst '${JUPYTERHUB_USER_DATA_DIR}' | sudo tee $CRON_FILE
sudo chown root:root $CRON_FILE
sudo chmod 755 $CRON_FILE
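
The reason `envsubst` is given an explicit variable list is that only the listed variable is expanded, while every other `${...}` occurrence in the cron script is left untouched for the shell to resolve at run time. A quick illustration (the input line is hypothetical, not from the repo):

```sh
# Only ${JUPYTERHUB_USER_DATA_DIR} is substituted; ${TMPDIR} survives as-is.
echo 'NOTEBOOK_DIR="${JUPYTERHUB_USER_DATA_DIR}" TMP="${TMPDIR}"' \
    | JUPYTERHUB_USER_DATA_DIR=/data/jupyterhub_user_data envsubst '${JUPYTERHUB_USER_DATA_DIR}'
# -> NOTEBOOK_DIR="/data/jupyterhub_user_data" TMP="${TMPDIR}"
```
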
15 changes: 8 additions & 7 deletions birdhouse/deployment/trigger-deploy-notebook
@@ -12,10 +12,11 @@
#
# Logs to /var/log/PAVICS/notebookdeploy.log, re-use existing logrotate.


if [ -z "$DOCKER_JUPYTERHUB_USER_PERSISTENCE_VOLUME" ]; then
# dupe with pavics-compose.sh
DOCKER_JUPYTERHUB_USER_PERSISTENCE_VOLUME=jupyterhub_user_persistence
if [ -z "$JUPYTERHUB_USER_DATA_DIR" ]; then
# running script manually (not with cron) source env.local file.
COMPOSE_DIR="$(dirname -- "$(dirname -- "$(realpath -- "$0")")")"
. "$COMPOSE_DIR/common.env" # default JUPYTERHUB_USER_DATA_DIR
. "$COMPOSE_DIR/env.local" # optional override JUPYTERHUB_USER_DATA_DIR
fi

LOG_FILE="/var/log/PAVICS/notebookdeploy.log"
@@ -70,8 +71,8 @@ cat << __EOF__ > $TMP_SCRIPT
#!/bin/sh -x
cd $NOTEBOOK_DIR_MNT
rm -rf $TUTORIAL_NOTEBOOKS_DIR
cp -rv /$TUTORIAL_NOTEBOOKS_DIR $TUTORIAL_NOTEBOOKS_DIR
rm -rf $TUTORIAL_NOTEBOOKS_DIR/*
cp -rv /$TUTORIAL_NOTEBOOKS_DIR/* $TUTORIAL_NOTEBOOKS_DIR
# make read-only
chown -R root:root $TUTORIAL_NOTEBOOKS_DIR
__EOF__
@@ -83,7 +84,7 @@ docker run --rm \
-u root \
-v $TMP_SCRIPT:/deploy-notebook:ro \
-v $TMPDIR/$TUTORIAL_NOTEBOOKS_DIR:/$TUTORIAL_NOTEBOOKS_DIR:ro \
-v $DOCKER_JUPYTERHUB_USER_PERSISTENCE_VOLUME:$NOTEBOOK_DIR_MNT:rw \
-v "$JUPYTERHUB_USER_DATA_DIR":$NOTEBOOK_DIR_MNT:rw \
--entrypoint /deploy-notebook \
bash

22 changes: 14 additions & 8 deletions birdhouse/docker-compose.yml
@@ -248,7 +248,13 @@ services:
entrypoint: /entrypointwrapper
restart: always
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080${TWITCHER_PROTECTED_PATH}/thredds/catalog.html"]
test:
  [
    "CMD",
    "curl",
    "--fail",
    "http://localhost:8080${TWITCHER_PROTECTED_PATH}/thredds/catalog.html",
  ]

mongodb:
image: mongo:3.4.0
@@ -330,7 +336,7 @@ services:
restart: always

jupyterhub:
image: pavics/jupyterhub:1.0-20190501
image: pavics/jupyterhub:1.0.0-20200130
container_name: jupyterhub
hostname: jupyterhub
ports:
@@ -339,21 +345,21 @@
#DOCKER_NOTEBOOK_IMAGE: jupyter/scipy-notebook:latest
DOCKER_NOTEBOOK_IMAGE: ${DOCKER_NOTEBOOK_IMAGE}
DOCKER_NETWORK_NAME: jupyterhub_network
DOCKER_JUPYTERHUB_USER_PERSISTENCE_VOLUME: ${DOCKER_JUPYTERHUB_USER_PERSISTENCE_VOLUME}
JUPYTERHUB_USER_DATA_DIR: ${JUPYTERHUB_USER_DATA_DIR}
JUPYTERHUB_ADMIN_USERS: ${JUPYTERHUB_ADMIN_USERS}
JUPYTERHUB_USERS_PASS: ${JUPYTERHUB_USERS_PASS}
volumes:
- ./config/jupyterhub/jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py:ro
- ./config/jupyterhub/entrypoint:/entrypoint:ro
- ./config/jupyterhub/template_paths:/template_paths:ro
- ./config/jupyterhub/custom_templates:/custom_templates:ro
- ${JUPYTERHUB_USER_DATA_DIR}:${JUPYTERHUB_USER_DATA_DIR}
- jupyterhub_data_persistence:/persist:rw
- /var/run/docker.sock:/var/run/docker.sock:rw
links:
- magpie
networks:
# ensure Hub and Notebook servers are on the same network 'jupyterhub_network'
# the Hub and the rest of the stack are on network 'default'
networks:
- default
- jupyterhub_network
entrypoint: /entrypoint
restart: always

# need external network so the folder name is not prefixed to network name
6 changes: 5 additions & 1 deletion birdhouse/env.local.example
@@ -16,7 +16,6 @@ export TOMCAT_NCWMS_PASSWORD=ncwmspass
export SUPPORT_EMAIL=helpdesk@example.com
export CMIP5_THREDDS_ROOT=birdhouse/CMIP5/CCCMA
export JUPYTERHUB_ADMIN_USERS="{'admin'}" # python set syntax
export JUPYTERHUB_USERS_PASS="admin:admin public:public" # space separated 'username:passwd' format
export CATALOG_USERNAME=admin-catalog
export CATALOG_PASSWORD=qwerty
export CATALOG_THREDDS_SERVICE=thredds
@@ -104,6 +103,11 @@ export POSTGRES_MAGPIE_PASSWORD=postgres-qwerty
# }
#"

# The parent folder where all the user notebooks will be stored.
# For example, a user named "bob" will have his data in $JUPYTERHUB_USER_DATA_DIR/bob
# and this folder will be mounted when he logs into JupyterHub.
#export JUPYTERHUB_USER_DATA_DIR=/data/jupyterhub_user_data

# Extra PyWPS config for **all** WPS services (currently only Flyingpigeon, Finch and Raven supported).
# export EXTRA_PYWPS_CONFIG="
# [logging]
12 changes: 0 additions & 12 deletions birdhouse/pavics-compose.sh
@@ -18,7 +18,6 @@ VARS='
$SUPPORT_EMAIL
$CMIP5_THREDDS_ROOT
$JUPYTERHUB_ADMIN_USERS
$JUPYTERHUB_USERS_PASS
$CATALOG_USERNAME
$CATALOG_PASSWORD
$CATALOG_THREDDS_SERVICE
@@ -125,19 +124,8 @@ if [[ $1 == "up" ]]; then

# no error if already exist
# create externally so nothing will delete these data volume automatically
docker volume create ${DOCKER_JUPYTERHUB_USER_PERSISTENCE_VOLUME} # users notebooks
docker volume create jupyterhub_data_persistence # jupyterhub db and cookie secret
docker volume create thredds_persistence # logs, cache

# set proper write permissions for image pavics/workflow-tests
# this is shared for all users, no multi users isolation
# see open issue https://github.com/moby/moby/issues/32582 [feature] Allow
# mounting sub-directories of named volumes
# for multi user isolation, we should leverage sub-directories of named
# volumes
docker run --rm --name set_permissions \
-v ${DOCKER_JUPYTERHUB_USER_PERSISTENCE_VOLUME}:/notebook_dir bash \
bash -c "chown 1000:1000 /notebook_dir"
fi

COMPOSE_CONF_LIST="-f docker-compose.yml"
16 changes: 0 additions & 16 deletions birdhouse/scripts/backup-juputerhub-notebooks.sh

This file was deleted.

20 changes: 20 additions & 0 deletions birdhouse/scripts/backup-jupyterhub-notebooks.sh
@@ -0,0 +1,20 @@
#!/bin/sh -x
# Backup to /tmp/jupyterhub_user_data.tgz with default values.

if [ -z "$BACKUP_OUT_DIR" ]; then
BACKUP_OUT_DIR=/tmp
fi

if [ -z "$JUPYTERHUB_USER_DATA_DIR" ]; then
JUPYTERHUB_USER_DATA_DIR=/data/jupyterhub_user_data
fi

docker run --rm \
--name backup_jupyterhub_data \
-u root \
-v "$BACKUP_OUT_DIR":/backups \
-v "$JUPYTERHUB_USER_DATA_DIR":/data_vol_to_backup:ro \
bash \
tar czvf /backups/jupyterhub_user_data.tgz -C /data_vol_to_backup .

# vi: tabstop=8 expandtab shiftwidth=4 softtabstop=4
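
Example invocation with both defaults overridden (the backup destination path below is illustrative), plus a rough restore sketch since no restore script is included in this commit:

```sh
# Back up the user data dir to a custom location; both variables are optional
# and default to /tmp and /data/jupyterhub_user_data respectively.
BACKUP_OUT_DIR=/path/to/backups \
JUPYTERHUB_USER_DATA_DIR=/data/jupyterhub_user_data \
    ./birdhouse/scripts/backup-jupyterhub-notebooks.sh

# A matching restore would be roughly (assumption, verify before use):
# tar xzvf /path/to/backups/jupyterhub_user_data.tgz -C /data/jupyterhub_user_data
```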
