Deployment
Deployment, either locally or to the Google Cloud platform, can easily be done using the auto-deployment-server. This page describes the steps to either set up the deployment server for easy deployment or deploy the application via the command line.
Before following these steps, you must execute the commands listed on the Deploying demo page, up to "Build sources" (one-time only).
To set up the auto-deployment-server, it suffices to enter its directory and run the server with the following command:

```
node bin/www
```

The deployment server now runs locally at http://localhost:3000.
Deployment using this server is fairly straightforward, but there are some caveats:
- Due to the stateful nature of the `gcloud` commands provided by Google, only one user at a time can be authenticated to use these commands. Running the deployment server on the Google Cloud platform allows multiple users to be authenticated, as a separate environment is created for each user.
- For remote deployment, the Docker image that is deployed by default is pulled from the image repository `gcr.io/propane-bearing-124123/datalab:amma-bda` (unless the build option is selected). This should eventually be changed to Datatonic's repository, but we advise keeping this default repository static: if the image were pulled from the client's repository, it would have to be uploaded every time a client wants to use the deployer. See the section on remote command-line deployment for more information on pushing images to the Image Registry.
- For remote deployment, some behavior was changed to encourage sharing of notebooks between clients. What was changed and how to restore the original behavior is described in the Modified behavior in remote deployment section.
Note that for these steps, you need to be signed in to `gcloud` and have your project ID selected. How to do this is described on the Deploying demo page.
Deploying a Datalab instance locally can be done with the following commands, starting from the root of the repository:

```
cd datalab
./deploy-local.sh
```

The application will be deployed to http://localhost:8081/.
This script accepts the following parameters:
- `--build` will build the sources before deploying. This is not done by default, since few modifications to the sources are expected.
- `--environment` will use all following arguments as environment variables in the deployment, e.g. `--environment "VARIABLE1" "VALUE1" "VARIABLE2" "VALUE2"`. Note that, if used, the `--environment` option must always be the last option, as all following arguments are read as environment variables (e.g. `./deploy-local.sh --environment "VAR1" "VAL1" --build` is not a valid use of the command).
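The trailing-arguments rule for `--environment` can be sketched as follows. This is a minimal illustration of the parsing idea, not the actual contents of `deploy-local.sh`; the function name and the `--env NAME=VALUE` output format are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: collect everything after --environment as NAME/VALUE
# pairs. Any option that follows --environment is swallowed as an
# environment variable, which is why --environment must come last.
collect_env_flags() {
  env_flags=""
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "--environment" ]; then
      shift
      # Consume all remaining arguments pairwise as NAME VALUE.
      while [ "$#" -ge 2 ]; do
        env_flags="${env_flags:+$env_flags }--env $1=$2"
        shift 2
      done
    else
      shift  # other options (e.g. --build) handled elsewhere in this sketch
    fi
  done
  printf '%s\n' "$env_flags"
}

collect_env_flags --environment "VARIABLE1" "VALUE1" "VARIABLE2" "VALUE2"
```

A `--build` placed after `--environment` would simply be absorbed into the pairing loop, which is the failure mode the note above warns about.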
To facilitate deployment to a Google Cloud project, two scripts were created for the two steps this process consists of: building the image and pushing it to the image repository on Google Cloud, and deploying the image to the AppEngine. It suffices to execute the following commands:

```
cd datalab
./push-to-gcloud.sh
./deploy-gcloud.sh
```
The first script, `push-to-gcloud.sh`, builds the sources and pushes the image to the Google Cloud Container Registry. It will be hosted at `gcr.io/<PROJECT_ID>/datalab:amma-bda`. By default, it will also grant all users pull access to this repository. This behavior can be modified using the following parameters:
- `--repository <OTHER_REPO_ID>` will change the destination to `gcr.io/<OTHER_REPO_ID>/datalab:amma-bda`. Note that the process will keep retrying if you do not have access to this image repository.
- `--tag <OTHER_TAG>` will change the destination to `gcr.io/<PROJECT_ID>/datalab:<OTHER_TAG>`. This is especially useful if multiple versions of the Datalab image need to be hosted in the same repository.
- `--no-access` will ensure that no access is granted to users of the repository.
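How the destination image name follows from these options can be sketched as follows. The function and the way defaults are combined are assumptions for illustration, not the actual implementation of `push-to-gcloud.sh`.

```shell
#!/bin/sh
# Hypothetical sketch: derive the destination image name from the options.
# The repository defaults to the current project ID and the tag to amma-bda,
# matching the defaults described above.
image_destination() {
  project_id="$1"; shift
  repository="$project_id"
  tag="amma-bda"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --repository) repository="$2"; shift 2 ;;
      --tag)        tag="$2"; shift 2 ;;
      *)            shift ;;  # e.g. --no-access: no effect on the name
    esac
  done
  printf 'gcr.io/%s/datalab:%s\n' "$repository" "$tag"
}

image_destination my-project
image_destination my-project --repository other-repo --tag v2
```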
The second script, `deploy-gcloud.sh`, gets this image from the image repository `gcr.io/<PROJECT_ID>/datalab:amma-bda` and deploys it to the AppEngine. It accepts the following parameters:
- `--repository <OTHER_REPO_ID>` will change the source of the image to `gcr.io/<OTHER_REPO_ID>/datalab:amma-bda`. Of course, you need to have access to this repository. This parameter should be used when multiple clients can work with the same build of Datalab, as the image then does not have to be pushed to Google Cloud for every deployment.
- `--tag <OTHER_TAG>` will change the source of the image to `gcr.io/<PROJECT_ID>/datalab:<OTHER_TAG>`. This is especially useful if multiple versions of the Datalab image are hosted in the same repository and the client should use a specific version.
- `--build` will also execute the `push-to-gcloud.sh` script, so both commands can be condensed into one: `./deploy-gcloud.sh --build`. However, it can still be useful to execute them separately, e.g. if an image needs to be pushed to multiple locations or does not need to be pushed at all.
- `--environment` will use all following arguments as environment variables in the deployment, e.g. `--environment "VARIABLE1" "VALUE1" "VARIABLE2" "VALUE2"`. Note that, if used, the `--environment` option must always be the last option, as all following arguments are read as environment variables.
More deployment parameters are specified in `datalab/deploy/app.yaml`. These are the default deployment parameters for Datalab; they can be modified if, for example, more memory or disk space should be allocated to the Datalab instance.
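For instance, allocating more memory or disk to the instance could look like the fragment below. The values are illustrative examples, not the project's actual defaults, and whether `app.yaml` already contains such a section is an assumption; consult Google's App Engine reference for the supported keys.

```yaml
# Illustrative resource settings for an App Engine managed-VM / flexible
# deployment; values are examples only.
resources:
  cpu: 2
  memory_gb: 8
  disk_size_gb: 20
```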
For remote deployment, we ensured that clients only need to authenticate with their Google account to access the Google Cloud project, and that one client's notebooks can be seen by all other clients. Some parameters were adjusted to allow this. How to restore the original behavior of Datalab is described below.
Who is allowed to access the Datalab instance is specified in the deployment parameters. For `url: /.*`, `login: required` is specified, meaning a user may access the Datalab instance as long as they are logged in to Google. This is the only option that differs from Datalab's default behavior, as Google Cloud Datalab normally only allows users with access to the Cloud project. The default behavior can be restored by changing this line to `login: admin`. It can also be changed to `login: optional`, which specifies that Datalab will not use login at all. Note that this last option will cause users to be logged in as the default service account, rendering the authorization and workspaces features (both described below) useless.
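A sketch of the relevant handler entry in `datalab/deploy/app.yaml`; the surrounding keys are omitted and the exact layout of the file is an assumption.

```yaml
handlers:
- url: /.*
  login: required    # current setting: any signed-in Google account
  # login: admin     # restores the default: project members only
  # login: optional  # no login; users act as the default service account
```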
When using `login: required` or `login: admin`, it is possible to authorize only certain clients to use Datalab. To do this, remove the last line in `datalab/deploy/app.yaml`, which says `DATALAB_NO_AUTH: true`. After re-deploying, users will need to be authorized. Authorizing a user is as simple as running the following commands, starting from the root of this repository:
```
cd datalab/deploy
./authorize <email>
```

The `<email>` parameter is the email address the client uses to sign in to Google, and thus to the Datalab instance.
This mechanism was present in the original version of Datalab, but the authorization token was granted by the deployment server instead.
To encourage sharing of notebooks between users of Datalab, we have merged the users' workspaces. This behavior can be changed via the line `"useMergedWorkspace": true,` in `datalab/sources/web/datalab/config/settings.cloud.json`. Replacing `true` with `false` here will restore the original behavior, where workspaces are separated from each other.
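To restore separate workspaces, the relevant setting in `datalab/sources/web/datalab/config/settings.cloud.json` would then read as follows (the file's other keys are omitted here):

```json
{
  "useMergedWorkspace": false
}
```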
It is neither necessary nor possible to manually start or stop VMs managed by the Google Cloud AppEngine. The AppEngine automatically starts, pauses, and stops the VMs without any possibility for user interaction (for instance, deleting a VM will result in the AppEngine restarting it). This is a common misconception about Google Cloud Datalab, as manually starting and stopping VMs was previously advised in the Datalab documentation. That documentation was out of date, however, as Datalab already ran on the AppEngine (Datalab used to be deployed to user-managed Compute Engine VMs and only recently switched to Google-managed AppEngine deployment).
Use the command below to stop all running Docker containers:

```
docker stop $(docker ps -a -q) > /dev/null 2>&1
```
- Go to the App Engine Versions page in the Google Cloud console.
- In the Service dropdown, make sure datalab is selected.
- In the version list, click the checkbox next to the deployment you want to delete.
- Click Delete.
Project Advanced Multimedia Applications: Datatonic and Google Cloud Datalab