Deployment

Bart Middag edited this page May 17, 2016 · 15 revisions

Deployment, either locally or to the Google Cloud platform, can be done using the auto-deployment-server. This page describes how to set up the deployment server for easy deployment and how to deploy the application via the command line. Before following these steps, execute the commands listed on the Deploying demo page, up to "Build sources" (this is needed only once).

Using the deployment server

To set up the auto-deployment-server, it suffices to enter its directory and use the following command to run the server:

node bin/www

The deployment server now runs locally at http://localhost:3000.

Deployment using this server is fairly straightforward, but there are some caveats:

  • Due to the stateful nature of the gcloud commands provided by Google, only one user at a time can be authenticated for use of these commands. Running the deployment server on the Google Cloud platform allows multiple users to be authenticated, as a separate environment is created for each user.
  • For remote deployment, the Docker image that is deployed by default is pulled from the image repository gcr.io/propane-bearing-124123/datalab:amma-bda (unless the build option is selected). This should be changed to Datatonic's repository, but we advise keeping the default repository fixed: if the image were pulled from each client's own repository, it would have to be uploaded every time a client wants to use the deployer. See the Remote deployment section under Deployment via command-line for more information on pushing images to the Container Registry.
  • For remote deployment, some behavior was changed to encourage sharing of notebooks between clients. What was changed and how to restore the original behavior is described in the Modified Datalab behavior in remote deployment section.

Deployment via command-line

Note that for these steps, you need to be signed in to gcloud as well as have your project ID selected. How to do this is described on the Deploying demo page.

Local deployment

Deploying a datalab instance locally can be done using the following commands starting from the root of the repository:

cd datalab
./deploy-local.sh

The application will be deployed to http://localhost:8081/. This script accepts the following parameters:

  • --build will build the sources before deploying. This is not normally done since not many modifications to the sources are expected.
  • --environment will use all following arguments as environment variables in the deployment, e.g. --environment "VARIABLE1" "VALUE1" "VARIABLE2" "VALUE2". Note that if used, the --environment option should always be the last option as all following options will be read as environment variables (e.g. ./deploy-local.sh --environment "VAR1" "VAL1" --build is not a valid use of the command).
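
The ordering rule for --environment can be illustrated with a small parser. The sketch below is hypothetical and is not the actual source of deploy-local.sh; the names parse_args, BUILD and ENV_PAIRS are assumptions made for illustration. It only shows why an option placed after --environment would be read as a variable name rather than as an option.

```shell
# Hypothetical sketch, NOT the real deploy-local.sh: it mimics the documented
# rule that everything after --environment is read as NAME VALUE pairs.
parse_args() {
  BUILD=0
  ENV_PAIRS=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --build)
        BUILD=1
        shift
        ;;
      --environment)
        shift
        # Consume all remaining arguments two at a time.
        while [ "$#" -gt 0 ]; do
          ENV_PAIRS="${ENV_PAIRS}${1}=${2-} "
          shift
          if [ "$#" -gt 0 ]; then shift; fi
        done
        ;;
      *)
        echo "Unknown option: $1" >&2
        return 1
        ;;
    esac
  done
  return 0
}

# Valid order: --build is seen before --environment.
parse_args --build --environment "VAR1" "VAL1"
echo "build=$BUILD env=$ENV_PAIRS"

# Invalid order: --build is swallowed as a variable name with an empty value.
parse_args --environment "VAR1" "VAL1" --build
echo "build=$BUILD env=$ENV_PAIRS"
```

In this sketch the second call leaves BUILD at 0, which is the reason the option should come last.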

Remote deployment

Remote deployment to a Google Cloud project consists of two steps, each with its own script: building the image and pushing it to the image repository on Google Cloud, and deploying the image to App Engine. It suffices to execute the following commands:

cd datalab
./push-to-gcloud.sh
./deploy-gcloud.sh

The first script, push-to-gcloud.sh, builds the sources and pushes the image to the Google Cloud Container Registry. It will be hosted on the repository gcr.io/<PROJECT_ID>/datalab:amma-bda. By default, it will also grant pull access for all users to this repository. Note that this behavior can be modified using the following parameters:

  • --repository <OTHER_REPO_ID> will change the destination to gcr.io/<OTHER_REPO_ID>/datalab:amma-bda. Note that the process will keep retrying if you do not have access to this image repository.
  • --tag <OTHER_TAG> will change the destination to gcr.io/<PROJECT_ID>/datalab:<OTHER_TAG>. This is especially useful if multiple versions of the datalab image need to be hosted on the same repository.
  • --no-access will ensure that no access is granted to users of the repository.
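
The naming scheme described above can be summarized in a few lines of shell. This is a minimal sketch and not part of push-to-gcloud.sh; the function name image_ref and the values my-project, shared-repo and v2 are placeholders chosen for illustration.

```shell
# Hypothetical helper: compose the image reference from a repository
# (the project ID, or the value given to --repository) and an optional tag
# (the value given to --tag, defaulting to amma-bda).
image_ref() {
  repo="$1"
  tag="${2:-amma-bda}"
  printf 'gcr.io/%s/datalab:%s\n' "$repo" "$tag"
}

image_ref "my-project"          # default destination
image_ref "shared-repo"         # as with --repository shared-repo
image_ref "my-project" "v2"     # as with --tag v2
```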

The second script, deploy-gcloud.sh, gets this image from the image repository gcr.io/<PROJECT_ID>/datalab:amma-bda and deploys it to App Engine. It accepts the following parameters:

  • --repository <OTHER_REPO_ID> will change the source of the image to gcr.io/<OTHER_REPO_ID>/datalab:amma-bda. Of course, you need to have access to this repository. This parameter should be used when multiple clients can work with the same build of datalab, as this means it does not have to be pushed to Google Cloud for every deployment.
  • --tag <OTHER_TAG> will change the source of the image to gcr.io/<PROJECT_ID>/datalab:<OTHER_TAG>. This is especially useful if multiple versions of the datalab image are hosted on the same repository and the client should use a specific version.
  • --build will also execute the push-to-gcloud.sh command. This means that both commands can be condensed into one: ./deploy-gcloud.sh --build. However, it can still be useful to execute them separately, e.g. if an image needs to be pushed to multiple locations or if the image does not need to be pushed at all.
  • --environment will use all following arguments as environment variables in the deployment, e.g. --environment "VARIABLE1" "VALUE1" "VARIABLE2" "VALUE2". Note that if used, the --environment option should always be the last option as all following options will be read as environment variables.
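
A few typical invocations of the two scripts are sketched below. The run wrapper and the DRY_RUN variable are assumptions added for illustration so that the commands are only printed by default; the tag v2 and the variable names are placeholders. Combining --tag with --no-access in one call is assumed to work, but is not confirmed above.

```shell
# Hypothetical wrapper: print each command unless DRY_RUN=0 is set,
# in which case the command is executed (run from the datalab directory).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Push under a custom tag without granting pull access to all users
# (assumption: the two options can be combined).
run ./push-to-gcloud.sh --tag "v2" --no-access

# Deploy that tag; --environment comes last because it consumes the rest.
run ./deploy-gcloud.sh --tag "v2" --environment "VARIABLE1" "VALUE1"

# Build, push and deploy in one step.
run ./deploy-gcloud.sh --build
```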

More deployment parameters are specified in datalab/deploy/app.yaml. These are the default deployment parameters for Datalab, but if e.g. more memory or disk space should be allocated to the datalab instance, they can be modified.

Modified Datalab behavior in remote deployment

For remote deployment, we ensured that clients only need to be authenticated using their Google account to access the Google Cloud project, and that one client's notebooks can be seen by all other clients. Some parameters have been adjusted to allow this. How to restore the original behavior of Datalab is described below.

Restoring the original behavior: admins only

Who is allowed to access the Datalab instance is specified in the deployment parameters. For url: /.* it is specified: login: required, meaning a user may access the datalab instance as long as they are logged in to Google. This is the only option that is not Datalab's default behavior, as Google Cloud Datalab normally only allows users with access to the Cloud project. This behavior can be restored by changing this line to login: admin. It can also be changed to login: optional, which specifies that datalab will not use login at all. Note that this last option will cause users to be logged in as the default service account, rendering the authorization and workspaces features (both described below) useless.
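
Switching between the three login values can be scripted. The sketch below is a hypothetical helper, not part of the repository: the function name set_login_mode is an assumption, and it rewrites every login: line in the file it is given, which is only safe if the file contains a single such line. The demo runs on a temporary stand-in file rather than on datalab/deploy/app.yaml.

```shell
# Hypothetical helper: set the value of the "login:" line(s) in a YAML file.
# Assumes the file holds only the one login entry described above.
set_login_mode() {
  file="$1"
  mode="$2"
  case "$mode" in
    required|admin|optional) ;;
    *) echo "unsupported login mode: $mode" >&2; return 1 ;;
  esac
  tmp="$(mktemp)" || return 1
  # Replace the value while keeping the line's indentation.
  sed "s/^\([[:space:]]*login:\).*\$/\1 $mode/" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back in place so the original file permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a stand-in file (the content is illustrative only).
demo="$(mktemp)"
printf '%s\n' '- url: /.*' '  login: required' > "$demo"
set_login_mode "$demo" admin
cat "$demo"
rm -f "$demo"
```

To apply it to the real deployment parameters, the call would be set_login_mode datalab/deploy/app.yaml admin, followed by a re-deployment.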

Restoring the original behavior: authorization of select clients

When using login: required or login: admin, it is possible to authorize only certain clients to use Datalab. To do this, remove the last line in datalab/deploy/app.yaml, which says DATALAB_NO_AUTH: true. After re-deploying, users will need to be authorized. Authorizing a user is as simple as running the following commands starting from the root of this repository:

cd datalab/deploy
./authorize <email>

The <email> parameter is the email the client uses to sign into Google and thus the datalab instance.

This mechanism was present in the original version of Datalab, but the authorization token was granted by the deployment server instead.

Restoring the original behavior: separate workspaces

To encourage sharing of notebooks between users of datalab, we have merged the workspaces of the users. This behavior can be changed by editing the line "useMergedWorkspace": true, in datalab/sources/web/datalab/config/settings.cloud.json. Replacing true with false here will restore the original behavior where workspaces are separated from each other.

Remote Deployment: manually starting/stopping VMs

It is neither necessary nor possible to manually start or stop VMs managed by Google Cloud App Engine. App Engine automatically starts, pauses and stops the VMs without any possibility for user interaction (for instance, deleting a VM results in App Engine restarting it). The belief that this can be done manually is a common misconception about Google Cloud Datalab, as the Datalab documentation previously advised manually starting and stopping VMs. However, that documentation was out of date: Datalab used to be deployed to user-managed Compute Engine VMs and only recently switched to Google-managed App Engine deployment.

Deleting your deployment

Local deployment

Use the command below to stop all running Docker containers. Note that this stops every container on the machine, not only the Datalab one:

docker stop $(docker ps -a -q) > /dev/null 2>&1

Remote deployment

  1. Go to the App Engine Versions page in the Google Cloud console.
  2. In the Service dropdown, make sure datalab is selected.
  3. In the version list, click the checkbox next to the deployment you want to delete.
  4. Click Delete.