Connecting to a Leo Notebook

Robert Title edited this page Mar 22, 2018 · 11 revisions

Once you have created a cluster using Leo, you then need to open the Jupyter notebook via the Leo proxy. This is a guide for integrators on how to do that.

Swagger

The Swagger documentation for the Leo proxy is here: https://notebooks.firecloud.org/#/notebooks

A brief description of the APIs:

  1. GET /notebooks/{googleProject}/{clusterName}
    • Opening this URL in a browser will redirect to .../tree and render the Jupyter tree viewer.
  2. GET /notebooks/{googleProject}/{clusterName}/api/contents/{path}
    • Opening this URL in a browser will open the notebook at {path}.
  3. POST /notebooks/{googleProject}/{clusterName}/api/localize
    • This is a custom Jupyter extension that is installed by Leo. It supports localizing/delocalizing files between the notebook server and GCS.
  4. GET /notebooks/{googleProject}/{clusterName}/setCookie
    • Sets a browser cookie needed to authorize connections to the Jupyter notebook (more on this below). Note: this API is terminated in Leo; it does not proxy through to Jupyter.
  5. GET /notebooks/invalidateToken
    • The reverse of (4): this invalidates the browser cookie. Note: this API is terminated in Leo; it does not proxy through to Jupyter.

This list is not exhaustive! Jupyter Notebook has its own REST API and Swagger definition. All of those APIs work through the Leo proxy; they are just not duplicated in Leo's Swagger page.
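As a quick illustration of the URL shapes listed above, here is a minimal TypeScript sketch that builds them for one cluster. The base URL is taken from the Swagger link; the function names (clusterUrl, contentsUrl, leoUrls) are hypothetical helpers, not part of Leo.

```typescript
// Base URL of the Leo deployment, taken from the Swagger link above.
const LEO_BASE = "https://notebooks.firecloud.org";

// Proxy URL for one cluster (API 1). Path segments are percent-encoded
// so unusual characters cannot break the path.
function clusterUrl(googleProject: string, clusterName: string): string {
  return `${LEO_BASE}/notebooks/${encodeURIComponent(googleProject)}/${encodeURIComponent(clusterName)}`;
}

// Contents URL for a path inside the notebook server (API 2).
// The path may contain slashes, so each segment is encoded separately.
function contentsUrl(googleProject: string, clusterName: string, path: string): string {
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  return `${clusterUrl(googleProject, clusterName)}/api/contents/${encodedPath}`;
}

// The remaining endpoints from the list, grouped for convenience.
function leoUrls(googleProject: string, clusterName: string) {
  const base = clusterUrl(googleProject, clusterName);
  return {
    tree: base,                                            // API 1
    localize: `${base}/api/localize`,                      // API 3 (POST)
    setCookie: `${base}/setCookie`,                        // API 4
    invalidateToken: `${LEO_BASE}/notebooks/invalidateToken`, // API 5
  };
}
```

For example, leoUrls("my-project", "my-cluster").setCookie yields the cluster-scoped setCookie URL, while invalidateToken is not scoped to any cluster.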

Proxy authorization

SSL

The Leo proxy connects to the Jupyter server over port 443 using the GCE instance's public IP. A firewall rule is created in the user's project to allow traffic on port 443. The connection uses two-way SSL authentication to ensure that only Leo can access the Jupyter server (because only Leo has the client-side key). For more information on the SSL certs and setup see https://github.com/DataBiosphere/leonardo/blob/develop/CERTIFICATES.md.

LeoToken cookie

The Leo proxy also requires that a LeoToken cookie be set in Leo's subdomain (e.g. notebooks.firecloud.org). The cookie's value should be a valid Google OAuth 2.0 access token. Leo resolves the token to an email address and asks an authorization provider (for example, Sam) whether the user is allowed to access the notebook.

We use a cookie for this (instead of, say, a bearer token in an Authorization header) because we don't control the Jupyter UI and can't easily intercept the calls it makes to the server. Setting the access token in a cookie is a convenient way to provide the user's identity to the Leo proxy for authorization.
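To make the authorization flow concrete, here is a rough TypeScript sketch of the decision described above: read LeoToken from the Cookie header, resolve it to an email, then ask an authorization provider. This is only an illustration and not Leo's actual implementation. The function names, the AccessDecision labels, and the injected resolveEmail and isAllowed callbacks (standing in for Google token resolution and a provider such as Sam) are all assumptions.

```typescript
// Hypothetical labels for the outcome of the check.
type AccessDecision = "allowed" | "unauthenticated" | "forbidden";

// Returns the LeoToken value from a raw Cookie request header,
// or null when the cookie is absent.
function extractLeoToken(cookieHeader: string): string | null {
  for (const part of cookieHeader.split(";")) {
    const eq = part.indexOf("=");
    if (eq < 0) continue; // not a name=value pair
    if (part.slice(0, eq).trim() === "LeoToken") {
      return part.slice(eq + 1).trim();
    }
  }
  return null;
}

// resolveEmail: stand-in for resolving a Google access token to an email
//               (returns null for an invalid or expired token).
// isAllowed:    stand-in for the authorization provider's answer.
function authorizeRequest(
  cookieHeader: string,
  resolveEmail: (token: string) => string | null,
  isAllowed: (email: string) => boolean,
): AccessDecision {
  const token = extractLeoToken(cookieHeader);
  if (token === null) return "unauthenticated";
  const email = resolveEmail(token);
  if (email === null) return "unauthenticated";
  return isAllowed(email) ? "allowed" : "forbidden";
}
```

Because the token travels in a cookie, every request the Jupyter UI makes to Leo's domain carries it automatically, which is why no changes to the Jupyter UI are needed.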

Token refresh

Once the cookie is set, the notebook can be loaded in a browser. However, Google access tokens expire after 1 hour. For the notebook to continue working after that, we need to refresh the cookie value with an updated token from Google. Leo installs a notebook extension on all clusters which handles this cookie refresh. Every 2 minutes on a timer, the extension calls gapi authorize() to get the current token for the signed-in user, and updates the LeoToken cookie. This way, the notebook can be used indefinitely.

There are a couple of caveats to this.

  1. If your laptop goes to sleep while you have a notebook open, the JavaScript timer stops running and the token or cookie can expire. Leo does not currently handle this gracefully: the proxy returns 401s until you reset the cookie and reload the page. There are a few ideas in Proposed changes below on how to handle this better.
  2. If a user signs out of Google or has their access revoked, the notebook continues to work until either the cookie or Leo's token cache expires. For this reason we ask that integrating UIs call Leo's /notebooks/invalidateToken endpoint when a user signs out. This ensures that all cache entries are invalidated and the notebook stops working.
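The refresh loop described in this section can be sketched roughly as follows in TypeScript. This is not the actual extension code. The token source and cookie writer are injected so the logic stays independent of gapi and the browser, the function names are hypothetical, and the cookie attributes (path, secure) are assumptions.

```typescript
// The extension refreshes every 2 minutes, well inside the 1 hour token lifetime.
const REFRESH_INTERVAL_MS = 2 * 60 * 1000;

// Callback-style token source, e.g. a wrapper around the gapi authorize call.
// It passes null when no token is available (for example, the user signed out).
type TokenSource = (onToken: (token: string | null) => void) => void;

// Formats the cookie string. The attributes are assumptions for illustration.
function leoTokenCookie(token: string): string {
  return `LeoToken=${token}; path=/; secure`;
}

// Performs one refresh: fetch the current token and, if there is one,
// overwrite the LeoToken cookie with it.
function refreshOnce(getToken: TokenSource, writeCookie: (cookie: string) => void): void {
  getToken((token) => {
    if (token !== null) writeCookie(leoTokenCookie(token));
  });
}

// Refreshes immediately, then on a timer. Returns a function that stops the timer.
function startTokenRefresh(
  getToken: TokenSource,
  writeCookie: (cookie: string) => void,
  intervalMs: number = REFRESH_INTERVAL_MS,
): () => void {
  refreshOnce(getToken, writeCookie);
  const timer = setInterval(() => refreshOnce(getToken, writeCookie), intervalMs);
  return () => clearInterval(timer);
}
```

In a browser, writeCookie would typically be something like (c) => { document.cookie = c; }. Note that this sketch shares caveat 1: a timer like this stops firing while the machine is asleep, so the cookie can still go stale.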

How to set the LeoToken cookie

Because the cookie domain is most likely different from the domain of the integrating app, the cookie can't be set directly in the UI. It needs to be set by Leo via a Set-Cookie HTTP response header. Here is our current approach for setting this cookie:

  1. UI has a link to open a notebook:
    <a id="notebook" href="#">My Notebook</a>
    
  2. At click time, the anchor tag is hijacked, and an AJAX call is made to the Leo /setCookie endpoint, passing in the user's bearer token via the Authorization header.
  3. Leo responds with 200 and a Set-Cookie: LeoToken=<token> header.
  4. UI loads the Leo proxy notebook URL in a new tab using something like window.open()
  5. UI sends the application's Google client ID to the other tab via postMessage. The client ID is needed for the notebook extension to successfully refresh the token (see above).
  6. The notebook extension receives the client ID in the postMessage request, and kicks off the token refresh timer.

Steps 1-5 are exemplified here. The notebook extension code in step 6 can be found here.
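For orientation, steps 2 through 5 might look roughly like the TypeScript sketch below. It is not the referenced implementation. The BrowserApi interface is a hypothetical seam standing in for fetch and window.open, the function names are made up, and the shape of the postMessage payload ({ clientId }) is an assumption; the real extension may expect a different message format.

```typescript
// Minimal view of the browser APIs this flow needs (hypothetical seam).
interface NotebookTab {
  postMessage(message: unknown, targetOrigin: string): void;
}
interface SetCookieInit {
  method: "GET";
  headers: { Authorization: string };
  // Cross-origin responses only store Set-Cookie when credentials are included.
  credentials: "include";
}
interface BrowserApi {
  fetch(url: string, init: SetCookieInit): Promise<{ ok: boolean }>;
  open(url: string): NotebookTab | null;
}

// Step 2: describe the AJAX call to the cluster's /setCookie endpoint.
function setCookieRequest(
  leoOrigin: string, googleProject: string, clusterName: string, accessToken: string,
): { url: string; init: SetCookieInit } {
  return {
    url: `${leoOrigin}/notebooks/${googleProject}/${clusterName}/setCookie`,
    init: {
      method: "GET",
      headers: { Authorization: `Bearer ${accessToken}` },
      credentials: "include",
    },
  };
}

// Steps 2-5. Resolves to true when the tab was opened and messaged.
async function openNotebook(
  browser: BrowserApi, leoOrigin: string, googleProject: string,
  clusterName: string, accessToken: string, googleClientId: string,
): Promise<boolean> {
  const { url, init } = setCookieRequest(leoOrigin, googleProject, clusterName, accessToken);
  const response = await browser.fetch(url, init); // steps 2-3: Leo sets LeoToken
  if (!response.ok) return false;

  // Step 4: open the proxied notebook. Returns null if a popup blocker intervenes.
  const tab = browser.open(`${leoOrigin}/notebooks/${googleProject}/${clusterName}`);
  if (tab === null) return false;

  // Step 5: hand the client ID to the notebook tab. A real page may need to
  // wait until the new tab is ready to listen; this sketch glosses over timing.
  tab.postMessage({ clientId: googleClientId }, leoOrigin);
  return true;
}
```

In a page, the browser argument could be { fetch: (u, i) => fetch(u, i), open: (u) => window.open(u) }. Passing leoOrigin as the postMessage target origin restricts delivery to a tab that is actually showing Leo's origin.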

Proposed changes

There are a few problems with the above approach.

  1. postMessage handshake is brittle

    • While it works, it locks us into an "open in new window/tab" UX and requires parent and child windows to talk to each other. For example, if you open a Leo notebook URL directly, the refresh timer might not get scheduled.
    • Proposed solution: add the ability to set a default client ID in the "create cluster" request. Then the notebook would not need any information from the parent window in order to kick off the token refresh timer. This gets more complicated if a user wants to use the same notebook with multiple clients (FireCloud and Saturn for example). We could get around this by still allowing setting clientId via postMessage, and having the one specified in "create cluster" act as a default.
  2. /setCookie handshake is problematic because:

    • (a) hijacking <a> to make an AJAX call and then open a new tab is a recipe for popup blockers
    • (b) /setCookie is currently scoped to the cluster, but it only needs to be done once per user. This is redundant if the user has multiple clusters.
    • Proposed solution: make all non-proxy Leo APIs (e.g. GET /api/cluster) set a cookie in the response. Then there would be no need for the AJAX call -- the UI could just follow the <a> tag directly. The UI would need to be careful to call a Leo API on a timer at least once per hour, to ensure that the cookie doesn't expire.
  3. Authentication errors are not handled gracefully

    • See the Token refresh section for ways the token or cookie could expire.
    • Proposed solution: instead of just returning 401, Leo can return a default page containing JavaScript to handle re-authentication. In particular, it would do the following:
      1. Call the gapi signIn() method, which will load the Google sign-in pop-up if needed.
      2. Once the user is signed in, grab the token and set the LeoToken cookie.
      3. Reload the page.
    • This has the added benefit that it makes Leo more independently usable: no need to set the cookie through some external UI; the notebook URL will do it for you.