Deploy a BinderHub from scratch on Microsoft Azure

Automatically deploy a BinderHub to Microsoft Azure


BinderHub is a cloud-based, multi-server technology used for hosting reproducible computing environments and interactive Jupyter Notebooks built from code repositories.

This repo contains a set of scripts to automatically deploy a BinderHub onto Microsoft Azure, and connect either a Docker Hub account/organisation or an Azure Container Registry, so that you can host your own Binder service.

This repo is based on the following set of deployment scripts for Google Cloud: nicain/binder-deploy

You will require a Microsoft Azure account and subscription. A Free Trial subscription can be obtained here. You will be asked to provide a credit card for verification purposes. You will not be charged. Your resources will be frozen once your subscription expires, then deleted if you do not reactivate your account within a given time period. If you are building a BinderHub as a service for an organisation, your institution may already have an Azure account. You should contact your IT Services for further information regarding permissions and access (see the Service Principal Creation section below).


Usage

This repo can be run locally, as "Platform as a Service" through the "Deploy to Azure" button (see the "Deploy to Azure" Button section), or by running its Docker image directly (see the Running the Container Locally section).

To use these scripts locally, clone this repo and change into the directory.

git clone https://github.com/alan-turing-institute/binderhub-deploy.git
cd binderhub-deploy

To make the scripts executable and then run them, do the following:

chmod 700 <script-name>.sh
./<script-name>.sh

[NOTE: The above commands are UNIX-specific. If you are running Windows 10, this blog post discusses using a bash shell in Windows.]
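If you would rather make all of the scripts executable in one go instead of one at a time, a loop over the .sh files does the same job as the chmod command above. This is a minimal sketch; the function name make_scripts_executable is hypothetical and is not part of the repo.

```shell
# Hypothetical helper (not part of this repo): apply "chmod 700" to every
# *.sh file in a directory, defaulting to the current directory.
make_scripts_executable() {
  dir=${1:-.}
  for script in "$dir"/*.sh; do
    # If nothing matches, the glob is left unexpanded; skip that literal pattern.
    [ -e "$script" ] || continue
    chmod 700 "$script"
  done
}
```

Define the function in your shell, then run make_scripts_executable from the root of the cloned binderhub-deploy directory.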

To build the BinderHub, you should run setup.sh first (to install the required command line tools), then deploy.sh (which will build the BinderHub). Once the BinderHub is deployed, you can run logs.sh and info.sh to get the JupyterHub logs and IP addresses respectively. teardown.sh should only be used to delete your BinderHub deployment.

You need to create a file called config.json which has the format described in the code block below. Fill the quotation marks with your desired namespaces, etc. config.json is git-ignored so that sensitive information, such as passwords and Service Principals, cannot be pushed to GitHub.

  • For a list of available data centre regions, see here. This should be a region and not a location, for example "West Europe" or "Central US". These can be equivalently written as westeurope and centralus, respectively.
  • For a list of available Linux Virtual Machines, see here. This should be something like Standard_D2s_v3.
  • The versions of the BinderHub Helm Chart can be found here and are of the form 0.2.0-<commit-hash>. It is advised to select the most recent version unless you specifically require an older one.
  • If you are deploying an Azure Container Registry, find out more about the SKU tiers here.
{
  "container_registry": "",        // Choose Docker Hub or ACR with 'dockerhub' or 'azurecr' values, respectively.
  "azure": {
    "subscription": "",            // Azure subscription name or ID (a hex-string)
    "res_grp_name": "",            // Azure Resource Group name
    "location": "",                // Azure Data Centre region
    "node_count": 1,               // Number of nodes to deploy. 3 is preferable for a stable cluster, but may exceed your subscription's core quota.
    "vm_size": "Standard_D2s_v3",  // Azure virtual machine type to deploy
    "sp_app_id": null,             // Azure service principal ID (optional)
    "sp_app_key": null,            // Azure service principal password (optional)
    "sp_tenant_id": null           // Azure tenant ID (optional)
  },
  "binderhub": {
    "name": "",                    // Name of your BinderHub
    "version": "",                 // Helm chart version to deploy, should be 0.2.0-<commit-hash>
    "image_prefix": ""             // The prefix to prepend to Docker images (e.g. "binder-prod")
  },
  "docker": {
    "username": null,              // Docker username (can be supplied at runtime)
    "password": null,              // Docker password (can be supplied at runtime)
    "org": null                    // A Docker Hub organisation to push images to (optional)
  },
  "acr": {
    "registry_name": null,         // Name to give the ACR. This must be alphanumeric and unique across Azure.
    "sku": "Basic"                 // The SKU capacity and pricing tier for the ACR
  }
}

You can copy template-config.json to config.json as a starting point if you wish.

Please note that all values you fill in must be surrounded by double quotation marks ("), with the exception of node_count, which is a number. Optional entries you are not using can be left as null.
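JSON has no comment syntax, so the // annotations in the block above are there for explanation only, and a strict JSON parser will reject the file if you paste the block verbatim. A sed one-liner can strip those trailing comments. This is a rough sketch: the function name strip_json_comments and the input file name annotated-config.json are hypothetical, and the pattern would also truncate any value that itself contains //, such as a URL.

```shell
# Hypothetical helper: delete everything from "//" to the end of each line,
# together with any whitespace just before it, and print the result.
# Caveat: it does not understand JSON strings, so a value containing "//"
# (for example a URL) would be cut short too.
strip_json_comments() {
  sed -e 's#[[:space:]]*//.*$##' "$1"
}
```

For example: strip_json_comments annotated-config.json > config.json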

Important for Free Trial subscriptions

If you have signed up to an Azure Free Trial subscription, you are not allowed to deploy more than 4 cores. How many cores you deploy depends on your choice of node_count and vm_size.

For example, a Standard_D2s_v3 machine has 2 cores. Therefore, setting node_count to 2 will deploy 4 cores and you will have reached your quota for cores on your Free Trial subscription.
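The same arithmetic can be checked before you deploy. This is a minimal sketch, assuming you look up the number of cores for your chosen vm_size yourself (the example above gives 2 for Standard_D2s_v3); both function names are hypothetical, and the limit of 4 is the Free Trial cap described above.

```shell
# Hypothetical helper: total cores = node_count x cores per VM.
total_cores() {
  node_count=$1
  cores_per_vm=$2
  echo $(( node_count * cores_per_vm ))
}

# Hypothetical helper: succeeds (exit status 0) only if the deployment
# fits within the 4-core Free Trial cap.
fits_free_trial() {
  [ "$(total_cores "$1" "$2")" -le 4 ]
}
```

For example, fits_free_trial 2 2 succeeds (4 cores), while fits_free_trial 3 2 fails because 3 nodes of 2 cores each would need 6 cores.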

Choosing between Docker Hub and Azure Container Registry

To select either a Docker Hub account/organisation or an Azure Container Registry (ACR), you must set the top-level container_registry key in config.json to either dockerhub or azurecr respectively. This will tell deploy.sh which variables and YAML templates to use. Then fill in the values under either the docker or acr key as required.
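The mapping from container_registry to the section of config.json you need to fill in can be sketched as a case statement. This only illustrates the logic described above and is not deploy.sh's actual code; the function name registry_section is hypothetical.

```shell
# Hypothetical helper: print which config.json section goes with a
# container_registry value, or fail on anything else.
registry_section() {
  case "$1" in
    dockerhub) echo docker ;;
    azurecr)   echo acr ;;
    *)
      echo "container_registry must be 'dockerhub' or 'azurecr', got '$1'" >&2
      return 1
      ;;
  esac
}
```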

Using a Docker Hub account/organisation has the benefit of being relatively simple to set up. However, all the BinderHub images pushed there will be publicly available. For a few extra steps, deploying an ACR will allow the BinderHub images to be pushed to a private repository.

Important Caveats when deploying an ACR

Service Principal:

In the Service Principal Creation section, we cover how to create a Service Principal in order to deploy a BinderHub. When following these steps, the --role argument of Contributor should be replaced with Owner. This is because the Service Principal will need the AcrPush role in order to push images to the ACR, and creating that role assignment requires permissions that the Contributor role does not have.

setup.sh

This script checks whether the required command line tools are already installed. If any are missing, the script uses the system package manager or curl to install the command line interfaces (CLIs). The CLIs to be installed are the Azure CLI (az), the Kubernetes CLI (kubectl) and Helm.

Any dependencies that are not automatically installed by these packages will also be installed.
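The "is it already installed?" check can be sketched with the POSIX command -v builtin. This is not the contents of setup.sh, only a minimal illustration; the function name missing_tools is hypothetical, and it reports missing tools rather than installing them.

```shell
# Hypothetical helper: print the names of any commands that are not on PATH.
# Exit status is 0 only when every tool was found.
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Drop the leading space before printing.
  printf '%s\n' "${missing# }"
  [ -z "$missing" ]
}
```

For example, missing_tools az kubectl helm prints whichever of the three CLIs still need installing.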

deploy.sh

This script reads in values from config.json and deploys a Kubernetes cluster. It then uses the templates in the templates folder to create config.yaml and secret.yaml files, which are used to install the BinderHub.

If you have chosen a Docker Hub account/organisation, the script will ask for your Docker ID and password if you haven't supplied them in the config file. The ID is your Docker username, NOT the associated email address. If you have provided a Docker organisation in config.json, then your Docker ID MUST be a member of this organisation.
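Prompting only when a value is missing can be sketched in POSIX shell as follows. This is a minimal illustration, not deploy.sh's own prompt; the function name prompt_if_empty is hypothetical. It treats an empty string or the literal null as missing (a JSON null read with a tool such as jq -r arrives as the string null), and it turns off terminal echo while a secret is typed.

```shell
# Hypothetical helper: print $1 if it is set, otherwise prompt for it.
#   $1  current value (empty or "null" means missing)
#   $2  prompt text, written to stderr so it is not captured with the value
#   $3  pass "secret" to hide what is typed (only when stdin is a terminal)
prompt_if_empty() {
  value=$1
  if [ -z "$value" ] || [ "$value" = null ]; then
    printf '%s: ' "$2" >&2
    if [ "$3" = secret ] && [ -t 0 ]; then
      stty -echo
      IFS= read -r value
      stty echo
      printf '\n' >&2
    else
      IFS= read -r value
    fi
  fi
  printf '%s\n' "$value"
}
```

For example: DOCKER_USERNAME=$(prompt_if_empty "$DOCKER_USERNAME" "Docker ID") and DOCKER_PASSWORD=$(prompt_if_empty "$DOCKER_PASSWORD" "Docker password" secret).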

If you have chosen an ACR, the script will create one and assign the AcrPush role to your Service Principal. The registry server and Service Principal credentials will then be written into config.yaml and secret.yaml so that the BinderHub can connect to the ACR.

Both a JupyterHub and BinderHub are installed via a Helm Chart onto the deployed Kubernetes cluster and the config.yaml file is updated with the JupyterHub IP address.

config.yaml and secret.yaml are both git-ignored so that secrets cannot be pushed back to GitHub.

The script also outputs log files (<file-name>.log) for each stage of the deployment. These files are also git-ignored.
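One way to keep a separate log per stage is to wrap each step in a function that redirects its output to <stage>.log. This is a minimal sketch of the idea, not necessarily how deploy.sh is written; run_stage and the LOG_DIR variable are hypothetical.

```shell
# Hypothetical helper: run a command as a named stage, sending stdout and
# stderr to "$LOG_DIR/<stage>.log" (LOG_DIR defaults to the current directory).
# On failure, show the tail of the log and return the command's exit status.
run_stage() {
  stage=$1
  shift
  log="${LOG_DIR:-.}/$stage.log"
  if "$@" >"$log" 2>&1; then
    status=0
    echo "Stage '$stage' succeeded (log: $log)"
  else
    status=$?
    echo "Stage '$stage' failed with status $status; last lines of $log:" >&2
    tail -n 20 "$log" >&2
  fi
  return "$status"
}
```

For example, run_stage hello echo hi writes the single line hi to hello.log and reports success.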

logs.sh

This script will print the JupyterHub logs to the terminal to assist with debugging issues with the BinderHub. It reads from config.json in order to get the BinderHub name.

info.sh

This script will print the pod status of the Kubernetes cluster and the IP addresses of both the JupyterHub and BinderHub to the terminal. It reads the BinderHub name from config.json.

upgrade.sh

This script will automatically upgrade the Helm Chart deployment configuring the BinderHub and then print the Kubernetes pods. It reads the BinderHub name and Helm Chart version from config.json.

teardown.sh

This script will purge the Helm Chart release, delete the Kubernetes namespace, delete the Azure Resource Group containing the computational resources, and purge the cluster information from your kubectl configuration file. It reads the namespaces from config.json. You should check the Azure Portal afterwards to verify that the resources have been deleted.
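The teardown order can be sketched as a function that prints the commands instead of running them, so you can review them first. This is a hedged illustration rather than the contents of teardown.sh: it assumes the Helm release and the Kubernetes namespace are both named after your BinderHub, the function name and its arguments are hypothetical, and helm delete --purge is Helm 2 syntax (on Helm 3 the equivalent is helm uninstall <release> --namespace <namespace>).

```shell
# Hypothetical helper: print (do not run) the teardown commands in order.
#   $1  BinderHub name (assumed to be the Helm release and namespace name)
#   $2  Azure Resource Group name
#   $3  cluster entry name in your kubectl configuration
teardown_plan() {
  echo "helm delete --purge $1"
  echo "kubectl delete namespace $1"
  echo "az group delete --name $2 --yes"
  echo "kubectl config delete-cluster $3"
}
```

Bear in mind that az group delete removes everything in the Resource Group, including any resources that existed before the BinderHub was deployed.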

"Deploy to Azure" Button

To deploy BinderHub to Azure in a single click (and some form-filling), use the deploy button below.

Deploy to Azure

Service Principal Creation

You will be asked to provide a Service Principal in the form launched when you click the "Deploy to Azure" button above.

[NOTE: The following instructions can also be run in a local terminal session. They will require the Azure command line to be installed, so make sure to run setup.sh first.]

To create a Service Principal, go to the Azure Portal (and login!) and open the Cloud Shell:

[Screenshot: Open Shell in Azure]

You may be asked to create storage when you open the shell. This is expected; click "Create".

Make sure the shell is set to Bash, not PowerShell.

[Screenshot: Bash Shell]

Set the subscription you'd like to deploy your BinderHub on.

az account set -s <subscription>

This image shows the command being executed for an "Azure Pass - Sponsorship" subscription.

[Screenshot: Set Subscription]

You will need the subscription ID, which you can retrieve by running:

az account list --refresh --output table

[Screenshot: List Subscriptions]

Next, create the Service Principal with the following command. Make sure to give it a sensible name!

az ad sp create-for-rbac --name binderhub-sp --role Contributor --scopes /subscriptions/<subscription ID from above>

NOTE: If you are deploying an ACR rather than connecting to Docker Hub, then this command should be:

az ad sp create-for-rbac --name binderhub-sp --role Owner --scopes /subscriptions/<subscription ID from above>
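Because the only difference between the two commands is the --role value, the choice can be derived from the container_registry setting in config.json. This minimal sketch prints the command for you to copy rather than running it; the function names are hypothetical, and binderhub-sp is the example name used above.

```shell
# Hypothetical helper: Owner is needed for an ACR (to create the AcrPush
# role assignment); Contributor is enough for Docker Hub.
sp_role() {
  case "$1" in
    azurecr) echo Owner ;;
    *)       echo Contributor ;;
  esac
}

# Hypothetical helper: print the az command for a container_registry value
# ($1) and a subscription ID ($2).
sp_create_command() {
  echo "az ad sp create-for-rbac --name binderhub-sp --role $(sp_role "$1") --scopes /subscriptions/$2"
}
```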

[Screenshot: Create Service Principal]

The fields appId, password and tenant are the required pieces of information. These should be copied into the "Service Principal App ID", "Service Principal App Key" and "Service Principal Tenant ID" fields in the form, respectively.

Keep this information safe as the password cannot be recovered after this step!

Monitoring Deployment Progress

To monitor the progress of the blue-button deployment, go to the Azure portal and select "Resource Groups" from the left hand pane. Then in the central pane select the resource group you chose to deploy into.

[Screenshot: Select Resource Group]

This will give you a right hand pane containing the resources within the group. You may need to "refresh" until you see a new container instance.

[Screenshot: Select Container Instance]

When it appears, select it and then in the new pane go to "Settings->Containers". You should see your new container listed.

[Screenshot: Container Events]

Select it, then in the lower right-hand pane select "Logs". Until the container starts up, you may need to "refresh" this pane to display the logs. The logs do not update automatically, so keep refreshing them to see progress.

[Screenshot: Container Logs]

Retrieving Deployment Output from Azure

When BinderHub is deployed using the "Deploy to Azure" button (or with a local container), output logs, YAML files, and ssh keys are pushed to an Azure storage account to preserve them once the container exits. The storage account is created in the same resource group as the Kubernetes cluster, and files are pushed into a storage blob within the account.

Both the storage blob name and the storage account name are derived from the name you gave to your BinderHub instance, but may be modified and/or have a random string appended. To find the storage account name, navigate to your resource group by selecting "Resource Groups" in the left-most panel of the Azure Portal, then clicking on the resource group containing your BinderHub instance. Along with any pre-existing resources (for example, if you re-used an existing resource group), you should see three new resources: a container instance, a Kubernetes service, and a storage account. Make a note of the name of the storage account (referred to in the following commands as ACCOUNT_NAME) then select this storage account.

[Screenshot: Storage Account]

In the new pane that opens, select "Blobs" from the "Services" section. You should see a single blob listed. Make a note of the name of this blob, which will be BLOB_NAME in the following commands.

[Screenshot: Blob Storage]

[Screenshot: Select Blob Storage]

The Azure CLI can be used to fetch files from the blob, either in the cloud shell on the Azure Portal or in a local terminal session (run setup.sh first to install the Azure CLI). Files are fetched into a local directory, which must already exist, referred to as OUTPUT_DIRECTORY in the following commands.

To fetch all files:

  az storage blob download-batch --account-name <ACCOUNT_NAME> --source <BLOB_NAME> --pattern "*" -d "<OUTPUT_DIRECTORY>"

The --pattern argument can be used to fetch particular files, for example all log files:

  az storage blob download-batch --account-name <ACCOUNT_NAME> --source <BLOB_NAME> --pattern "*.log" -d "<OUTPUT_DIRECTORY>"

To fetch a single file, specify REMOTE_FILENAME for the name of the file in blob storage, and LOCAL_FILENAME for the filename it will be fetched into:

  az storage blob download --account-name <ACCOUNT_NAME> --container-name <BLOB_NAME> --name <REMOTE_FILENAME> --file <LOCAL_FILENAME>

For full documentation, see the az storage blob documentation.

Accessing your BinderHub after Deployment

Once the deployment has succeeded and you've downloaded the log files, visit the IP address of your Binder page to check that it is working.

The Binder IP address can be found by running the following:

cat <OUTPUT_DIRECTORY>/binder-ip.log
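To turn the contents of that log into a URL you can open in a browser, a small helper is enough. This sketch assumes binder-ip.log holds just the address (possibly with a trailing newline) and that the service is reachable over plain HTTP; the function name binder_url is hypothetical.

```shell
# Hypothetical helper: read the address from a log file and print a URL.
binder_url() {
  # Remove all whitespace, including the trailing newline.
  ip=$(tr -d '[:space:]' < "$1")
  [ -n "$ip" ] || return 1
  case "$ip" in
    http://*|https://*) echo "$ip" ;;   # already a URL: print unchanged
    *) echo "http://$ip" ;;
  esac
}
```

For example: binder_url "<OUTPUT_DIRECTORY>/binder-ip.log"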

A good repository to test your BinderHub with is binder-examples/requirements.

Running the Container Locally

The third way to deploy BinderHub to Azure is to pull the Docker image and run it directly, passing the values you would have entered in config.json as environment variables.

You will need the Docker CLI installed. Installation instructions can be found here.

First, pull the binderhub-setup image.

docker pull sgibson91/binderhub-setup:<TAG>

where <TAG> is your chosen image tag.

A list of available tags can be found here. It is recommended to use the most recent version number. The latest tag is the most recent build from the master branch and may be subject to fluctuations.

Then, run the container with the following arguments, replacing the <> fields as necessary:

docker run \
-e "BINDERHUB_CONTAINER_MODE=true" \
-e "SP_APP_ID=<Service Principal ID>" \
-e "SP_APP_KEY=<Service Principal Key>" \
-e "SP_TENANT_ID=<Service Principal Tenant ID>" \
-e "RESOURCE_GROUP_NAME=<Chosen Resource Group name>" \
-e "RESOURCE_GROUP_LOCATION=westeurope" \
-e "AZURE_SUBSCRIPTION=<Azure Subscription ID>" \
-e "BINDERHUB_NAME=<Chosen BinderHub name>" \
-e "BINDERHUB_VERSION=<Chosen BinderHub version>" \
-e "AKS_NODE_COUNT=1" \
-e "AKS_NODE_VM_SIZE=Standard_D2s_v3" \
-e "DOCKER_IMAGE_PREFIX=binder-dev" \
-e "DOCKER_USERNAME=<Docker ID>" \
-e "DOCKER_PASSWORD=<Docker password>" \
-it sgibson91/binderhub-setup:<TAG>

The output will be printed to your terminal and the files will be pushed to blob storage, as in the button deployment. See the Retrieving Deployment Output from Azure section for how to retrieve these files.
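Passing secrets with -e leaves them in your shell history. Docker's --env-file option reads the same KEY=VALUE pairs from a file instead. This minimal sketch writes such a file using the variable names from the docker run command above; the file name binderhub.env is an assumption, and the <> placeholders still need replacing with your own values.

```shell
# Write the environment variables from the docker run example to a file.
# Docker takes each VALUE literally, so do not wrap values in quotes.
ENV_FILE=${ENV_FILE:-binderhub.env}
(
  umask 077   # a newly created file is readable by you only: it holds secrets
  cat > "$ENV_FILE" <<'EOF'
BINDERHUB_CONTAINER_MODE=true
SP_APP_ID=<Service Principal ID>
SP_APP_KEY=<Service Principal Key>
SP_TENANT_ID=<Service Principal Tenant ID>
RESOURCE_GROUP_NAME=<Chosen Resource Group name>
RESOURCE_GROUP_LOCATION=westeurope
AZURE_SUBSCRIPTION=<Azure Subscription ID>
BINDERHUB_NAME=<Chosen BinderHub name>
BINDERHUB_VERSION=<Chosen BinderHub version>
AKS_NODE_COUNT=1
AKS_NODE_VM_SIZE=Standard_D2s_v3
DOCKER_IMAGE_PREFIX=binder-dev
DOCKER_USERNAME=<Docker ID>
DOCKER_PASSWORD=<Docker password>
EOF
)
```

You could then start the container with docker run --env-file binderhub.env -it sgibson91/binderhub-setup:<TAG> and delete the file once the deployment has finished.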

Customising your BinderHub Deployment

Customising your BinderHub deployment is as simple as editing config.yaml and/or secret.yaml and then upgrading the BinderHub Helm Chart. The Helm Chart can be upgraded by running upgrade.sh (make sure you have the CLIs installed by running setup.sh first).

The Jupyter guide to customising the underlying JupyterHub can be found here.

The BinderHub guide for changing the landing page logo can be found here.

Contributors

We would like to acknowledge and thank the following people for their contributions to this project:
