<style type="text/css" media="screen">
a:link { color:#551199; text-decoration: none; }
a:visited { color:#FFFFFF; text-decoration: none; }
a:hover { color:#FFFFFF; text-decoration: none; }
a:active { color:#FFFFFF; text-decoration: underline; }
</style>
# <a id="sc" style="text-decoration:none;">Lessons Learned from Deploying Our Own JupyterHub Server</a>
![](Pictures/juphub.png)                                                                                      ![](Pictures/hpedevlogo-NB.JPG)       


<br>


The {{ BRANDING }} Community Team has been using Jupyter Notebooks for more than a year now. We started using them during TSS 2020. While looking at the different deployment options, we needed a way to run multiple notebook instances at once, so we focused our effort on JupyterHub.

**What is JupyterHub?**

JupyterHub brings the power of notebooks to groups of users. It gives users access to computational environments and resources without burdening the users with installation and maintenance tasks. Users - including students, researchers, and data scientists - can get their work done in their own workspaces on shared resources which can be managed efficiently by system administrators.

JupyterHub runs in the cloud or on your own hardware, and makes it possible to serve a pre-configured data science environment to any user in the world. It is customizable and scalable, and is suitable for small and large teams, academic courses, and large-scale infrastructure.

**Key features of JupyterHub**

**Customizable** - JupyterHub can be used to serve a variety of environments. It supports dozens of kernels with the Jupyter server, and can be used to serve a variety of user interfaces including the Jupyter Notebook, JupyterLab, RStudio, nteract, and more.

**Flexible** - JupyterHub can be configured with authentication in order to provide access to a subset of users. Authentication is pluggable, supporting a number of authentication protocols (such as OAuth and GitHub).

**Scalable** - JupyterHub is container-friendly, and can be deployed with modern-day container technology. It also runs on Kubernetes, and can run with up to tens of thousands of users.

**Portable** - JupyterHub is entirely open-source and designed to be run on a variety of infrastructure. This includes commercial cloud providers, virtual machines, or even your own laptop hardware.

The foundational JupyterHub code and technology can be found in the JupyterHub repository. This repository and the JupyterHub documentation contain more information about the internals of JupyterHub, its customization, and its configuration.

**Deploy a JupyterHub**

The Jupyter Community curates two JupyterHub “distributions” for deploying in the cloud. Follow the links below for more information.

[Zero to JupyterHub for Kubernetes](https://z2jh.jupyter.org/) deploys JupyterHub on Kubernetes using Docker, allowing it to be scaled and maintained efficiently for large numbers of users. Zero to JupyterHub is a Helm Chart for deploying JupyterHub quickly, as well as a guide to deploying and configuring your JupyterHub on Kubernetes.

[The Littlest JupyterHub](https://tljh.jupyter.org/), a recent and evolving distribution designed for smaller deployments, is a lightweight method to install JupyterHub on a single virtual machine. The Littlest JupyterHub (also known as TLJH) provides a guide with information on creating a VM on several cloud providers, as well as installing and customizing JupyterHub so that users may access it at a public URL.


**When to use TLJH vs. Z2JH**

The choice between TLJH and Z2JH ultimately comes down to only a few questions:

Do you want your hub and all users to live on a single, larger machine, or to spread users across a cluster of smaller machines that are scaled up or down?

* If you can use a single machine, we recommend The Littlest JupyterHub.
* If you wish to use multiple machines, we recommend Zero to JupyterHub for Kubernetes.

Do you need to use container technology?

* If no, we recommend The Littlest JupyterHub.
* If yes, we recommend Zero to JupyterHub for Kubernetes.
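The two questions above can be condensed into a small decision helper. This is a minimal Python sketch: the `Distribution` enum and `choose_distribution` function are hypothetical names, and resolving the mixed case (a single machine that still needs containers) in favor of Z2JH is our own assumption, since the guidance above answers each question separately.

```python
from enum import Enum


class Distribution(Enum):
    """The two JupyterHub distributions curated by the Jupyter community."""

    TLJH = "The Littlest JupyterHub"
    Z2JH = "Zero to JupyterHub for Kubernetes"


def choose_distribution(single_machine: bool, needs_containers: bool) -> Distribution:
    """Answer the two questions above and return the recommended distribution.

    Assumption: if either answer points to Z2JH (several machines, or a
    container requirement), Z2JH wins.
    """
    if single_machine and not needs_containers:
        return Distribution.TLJH
    return Distribution.Z2JH


if __name__ == "__main__":
    # Example: one server, no container requirement.
    print(choose_distribution(single_machine=True, needs_containers=False).value)
```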



Finally, you could also leverage an existing HPE Ezmeral Container Platform to deploy a JupyterHub server using the Zero to JupyterHub for Kubernetes approach.

</br>
<p><i class="fas fa-2x fa-walking" style="color:#551199;"></i>&nbsp;&nbsp;<b>1-Single Server Deployment </b></p>

In our case, we chose a single-server approach to keep things simple. Because we wanted to size the server properly to support multiple workshops with many students, we went with:
* 1 HPE ProLiant DL360 Gen10 server
* 192 GB of RAM
* 400 GB of HDD storage
* Ubuntu 18.04

You can also deploy on a cloud provider; TLJH provides installation guides for:
* [Digital Ocean](https://tljh.jupyter.org/en/latest/install/digitalocean.html)
* [OVH (Strasbourg, France)](https://tljh.jupyter.org/en/latest/install/ovh.html)
* [Jetstream](https://tljh.jupyter.org/en/latest/install/jetstream.html)
* [Google Cloud](https://tljh.jupyter.org/en/latest/install/google.html)
* [AWS](https://tljh.jupyter.org/en/latest/install/amazon.html)
* [Azure](https://tljh.jupyter.org/en/latest/install/azure.html)


For an on-premises or custom server, see also:
* [Manual Installation Guide](https://tljh.jupyter.org/en/latest/install/custom-server.html)
* [Sizing Considerations](https://tljh.jupyter.org/en/latest/howto/admin/resource-estimation.html#howto-admin-resource-estimation)
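As a rough illustration of the sizing guide, here is a Python sketch that assumes the guide's formulas as we understand them: memory = concurrent users × memory per user + 128 MB, vCPUs = concurrent users × CPU per user + 20%, and disk = total users × disk per user + 2 GB. The function names and the use of millicores for CPU are our own choices; please double-check the constants against the current TLJH documentation.

```python
from dataclasses import dataclass

# Overheads as we read them in the TLJH resource estimation guide (assumption: verify).
SYSTEM_MEMORY_MB = 128      # JupyterHub itself plus system services
CPU_OVERHEAD_PERCENT = 20   # extra vCPUs kept for the system
SYSTEM_DISK_GB = 2          # base operating system and JupyterHub install


@dataclass
class Sizing:
    memory_mb: int
    vcpus: int
    disk_gb: int


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding surprises."""
    return -(-a // b)


def estimate(max_concurrent: int, mem_per_user_mb: int, cpu_per_user_millicores: int,
             total_users: int, disk_per_user_gb: int) -> Sizing:
    """Apply the memory, CPU and disk formulas to an expected workload."""
    memory = max_concurrent * mem_per_user_mb + SYSTEM_MEMORY_MB
    # Active users times millicores, plus 20%, converted to whole vCPUs (rounded up).
    vcpus = ceil_div(max_concurrent * cpu_per_user_millicores * (100 + CPU_OVERHEAD_PERCENT),
                     100 * 1000)
    # Disk is driven by every user with a home directory, not only active ones.
    disk = total_users * disk_per_user_gb + SYSTEM_DISK_GB
    return Sizing(memory, vcpus, disk)


def max_concurrent_users(server_memory_mb: int, mem_per_user_mb: int) -> int:
    """Invert the memory formula: how many active users fit in a given amount of RAM."""
    return (server_memory_mb - SYSTEM_MEMORY_MB) // mem_per_user_mb


if __name__ == "__main__":
    # Example workload: 100 active users, 1 GB RAM and half a CPU each, 1 GB of disk each.
    print(estimate(100, 1024, 500, 100, 1))
    print(max_concurrent_users(192 * 1024, 1024))
```

With these assumptions, `max_concurrent_users(192 * 1024, 1024)` returns 191: a 192 GB server could host about 191 simultaneously active users capped at 1 GB each.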

Originally, we deployed everything **by hand**, keeping track of every package installation in a single README file. However, as we developed new notebooks for future workshops, we realized that we would need several JupyterHub servers:
* A Sandbox environment to test new ideas
* A Staging environment to develop new workshops' notebooks
* A Production environment to host workshops at events

Our first automation effort targeted notebook deployment:

* Each workshop is associated with a range of local (and, in some cases, LDAP) users, taken from student0 to student2000.
  * This workshop, for instance, uses the range from student1300 to student1400.
* All reference workshops are associated with student0.
 
A year ago, we deployed a workshop with a single Ansible playbook that copied the content from student0 to the selected users (x to z).
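The copy step itself is simple enough to sketch outside Ansible. The Python below is a hypothetical illustration, not our playbook: it assumes every student's home directory lives under a common root (such as /home) and that each reference workshop is a folder in student0's home; `deploy_workshop` and the folder names are made up for the example. A real deployment would also need to give each student ownership of the copied files, which this sketch skips.

```python
from __future__ import annotations

import shutil
from pathlib import Path

REFERENCE_USER = "student0"  # holds the reference workshops
MAX_STUDENT_ID = 2000        # student accounts range up to student2000


def student_names(first: int, last: int) -> list[str]:
    """Return student<first> .. student<last>, both ends included (student0 excluded)."""
    if not 1 <= first <= last <= MAX_STUDENT_ID:
        raise ValueError(f"expected 1 <= first <= last <= {MAX_STUDENT_ID}")
    return [f"student{i}" for i in range(first, last + 1)]


def deploy_workshop(home_root: Path, workshop_dir: str, first: int, last: int) -> list[Path]:
    """Copy one reference workshop folder from student0 to a range of students."""
    source = home_root / REFERENCE_USER / workshop_dir
    if not source.is_dir():
        raise FileNotFoundError(source)
    targets = []
    for name in student_names(first, last):
        target = home_root / name / workshop_dir
        # dirs_exist_ok (Python 3.8+) lets a re-run refresh an existing copy.
        shutil.copytree(source, target, dirs_exist_ok=True)
        targets.append(target)
    return targets
```

For example, `deploy_workshop(Path("/home"), "WKSHP-Example", 1300, 1400)` would copy a hypothetical WKSHP-Example folder to the 101 accounts from student1300 to student1400.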

Later on, with the help of **Bruno Cornec**, we started building several scripts to:
* Perform Server Preparation & JupyterHub Installation
* Improve Notebooks Deployment (Leveraging Ansible Variables)
* Check Platform Sanity on a daily basis

To manage the different environments, we defined a set of YAML files that specify the parameters tied to each location or environment (IP addresses, hostnames, etc.). These YAML files are then read by the different playbooks. Here is an example:


```yaml
# Example environment file (placeholder values)
PBKDIR: staging
JPHOST: jupyter3.example.com
JPIP: xx.xx.xx.xx
JPHOSTEXT: nb3.example.com
JPHUBAPISRV: http://{{ JPHOST }}:8000
JPHUBTOKEN: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
BASESTDID: 0
APIENDPOINT: https://example.com/api
APIUSER: user
APIPWD: password
LDAPDMN: dc=example,dc=com
# Kibana
KIBANAPWD: "zzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
KIBANAPORT: "xxxxxx"
# StackStorm
STACKSTORMSRVNAME: "sta-stackstorm"
STACKSTORMSRVIP: "xx.xx.xx.xx"
STACKSTORMWEBUIPORT: "yyyy"
# vCenter
VCENTERAPP: "vcenter.example.com"
VCENTERADMIN: "user"
VCENTERPWD: "xxxxxxx"
```
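Values such as `JPHUBAPISRV` refer to other keys with Jinja2 syntax (`{{ JPHOST }}`), which Ansible expands when the playbooks use the variables. To show the idea, here is a simplified Python stand-in that resolves plain `{{ NAME }}` references within one environment dictionary. It ignores Jinja2 filters and expressions, and `resolve` is a hypothetical helper, not part of Ansible.

```python
from __future__ import annotations

import re

# Matches a plain Jinja2-style reference such as "{{ JPHOST }}".
VAR_REF = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve(env: dict[str, object]) -> dict[str, str]:
    """Expand {{ NAME }} references between keys of a single environment file."""
    resolved: dict[str, str] = {}

    def lookup(name: str, chain: tuple[str, ...]) -> str:
        if name in resolved:
            return resolved[name]
        if name in chain:
            raise ValueError("circular reference: " + " -> ".join(chain + (name,)))
        if name not in env:
            raise KeyError(f"undefined variable: {name}")
        # Recursively expand every reference found in this value.
        value = VAR_REF.sub(lambda m: lookup(m.group(1), chain + (name,)), str(env[name]))
        resolved[name] = value
        return value

    for key in env:
        lookup(key, ())
    return resolved


if __name__ == "__main__":
    staging = {"JPHOST": "jupyter3.example.com", "JPHUBAPISRV": "http://{{ JPHOST }}:8000"}
    print(resolve(staging)["JPHUBAPISRV"])
```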

## We have three main Ansible playbooks

**The first playbook**, driven by a location parameter, performs the JupyterHub installation:
* System Update
* Repository Update
* Apps Installation
* System Performance Tuning
* Security Setup
* JupyterHub application installation and configuration
* Kernel setup and configuration
* Linux users creation
* JupyterHub users creation

Check [ansible_install_jupyterhub.yml](./ansible_install_jupyterhub.yml)
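For the JupyterHub users creation step, the JupyterHub REST API accepts a `POST /hub/api/users` request whose JSON body lists several `usernames`, authenticated with an `Authorization: token ...` header. The sketch below only builds that request with Python's standard library; it assumes `hub_api_srv` and `token` take the values of `JPHUBAPISRV` and `JPHUBTOKEN` from the environment file, and it is an illustration rather than what our playbook actually runs.

```python
from __future__ import annotations

import json
import urllib.request


def create_users_request(hub_api_srv: str, token: str,
                         usernames: list[str]) -> urllib.request.Request:
    """Build a JupyterHub REST API call that creates several non-admin users at once."""
    body = json.dumps({"usernames": usernames, "admin": False}).encode("utf-8")
    return urllib.request.Request(
        f"{hub_api_srv.rstrip('/')}/hub/api/users",
        data=body,
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = create_users_request("http://jupyter3.example.com:8000", "xxxx",
                               [f"student{i}" for i in range(1300, 1401)])
    # Sending it requires a reachable hub and a valid API token:
    #   with urllib.request.urlopen(req) as resp: print(resp.status)
    print(req.get_method(), req.full_url)
```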

**The second playbook** takes care of deploying the reference notebooks on the newly created JupyterHub.

Check [ansible_copy_folder.yml](./ansible_copy_folder.yml)

**A third playbook** runs on demand and nightly to ensure that the configuration is consistent and up to date.

Check [ansible_check_jupyterhub.yml](./ansible_check_jupyterhub.yml)
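One check such a nightly job could perform is making sure every expected student account exists on the hub. Here is a hypothetical Python sketch: it builds a `GET /hub/api/users` request and compares the returned user names with an expected `studentN` range. Depending on the JupyterHub version, the users endpoint may paginate its results, so a complete check over 2000 accounts may need several calls; see the JupyterHub REST API documentation.

```python
from __future__ import annotations

import json
import urllib.request


def list_users_request(hub_api_srv: str, token: str) -> urllib.request.Request:
    """GET /hub/api/users, authenticated with the hub API token."""
    return urllib.request.Request(
        f"{hub_api_srv.rstrip('/')}/hub/api/users",
        headers={"Authorization": f"token {token}"},
    )


def missing_students(users_json: str, first: int, last: int) -> list[str]:
    """Return the expected studentN accounts absent from the hub's user list."""
    present = {user["name"] for user in json.loads(users_json)}
    return [f"student{i}" for i in range(first, last + 1) if f"student{i}" not in present]


def check_hub(hub_api_srv: str, token: str, first: int, last: int) -> list[str]:
    """Query a live hub (network access required) and report missing accounts."""
    with urllib.request.urlopen(list_users_request(hub_api_srv, token), timeout=30) as resp:
        return missing_students(resp.read().decode("utf-8"), first, last)
```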

We created two Git repositories: one for infrastructure management, including all the automation we developed for our deployments, and a second one for the reference notebook content.

Our Git repositories are private for now. If you would like to replicate our work, please contact us and we will give you access to them.

We have since tested this work in two new locations:

1. We now have a JupyterHub server running on HPE GreenLake. Over time, this server should become an additional production site alongside the existing one.
2. We helped our colleagues from the Geneva Innovation Center build a new JupyterHub server there.

With this information, you should be able to build your own JupyterHub server to host:
* **Customers Workshops**
* **Team Trainings**
* **Demos**

<br><br>

## <i class="fas fa-2x fa-map-marker-alt" style="color:#551199;"></i>&nbsp;&nbsp;Next Steps


<h2>Conclusion&nbsp;&nbsp;&nbsp;&nbsp;<a href="4-WKSHP-Conclusion.ipynb" target="New" title="Next LAB: How to collaborate with {{ BRANDING }} Community Team to develop Notebook"><i class="fas fa-chevron-circle-right" style="color:#551199;"></i></a></h2>