# Using Cloud Datalab - Accessing Cloud Data

This notebook describes how Google Cloud Datalab integrates with your Google Cloud Platform project. It also covers how you can use Datalab to work with data, manage your notebooks, and invoke Google Cloud Platform APIs.

## An Under-the-Covers Look

Cloud Datalab functionality is packaged into a Docker container. The container provides a ready-to-use environment that includes:

* the Python runtime;
* a set of libraries chosen for data analysis and visualization;
* Google Cloud Platform integration functionality;
* the front-end server that serves this notebook.

Each deployed or running Cloud Datalab environment represents a Cloud Datalab _workspace_. A workspace consists of two parts:

* A named version (by default `main`) of an App Engine module called `datalab`. App Engine launches a separate Google Compute Engine instance (a [managed VM](https://cloud.google.com/appengine/docs/managed-vms/)) for each version.

* A [source repository](https://cloud.google.com/tools/cloud-repositories/docs/) branch within the git repository associated with the Google Cloud Platform project.

You can deploy one or more Cloud Datalab workspaces within your Google Cloud Platform project. These workspaces are accessible to all owners and editors of the project, and inaccessible to anyone else. You can use and share a single workspace, or create dedicated workspaces (for example, one per user, or one per feature or workstream).
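If you choose one workspace per user, each workspace needs its own version name for the `datalab` module. The minimal sketch below models a workspace as its two parts and derives a version name from a user's email address. `Workspace`, `version_name_for_user`, and `workspace_for_user` are hypothetical helpers, not part of Datalab. The naming rules enforced here are an assumption loosely modeled on App Engine version IDs: lowercase letters, digits, and hyphens, starting with a letter, at most 63 characters. The branch name is left to the caller because this notebook does not specify a convention for it.

```python
import re
from collections import namedtuple

# Hypothetical model of a workspace: a version of the `datalab` App Engine
# module plus a branch in the project's source repository.
Workspace = namedtuple('Workspace', ['module', 'version', 'branch'])

MAX_VERSION_LEN = 63  # assumed limit on version name length


def version_name_for_user(email):
    """Derive a hypothetical per-user version name from an email address.

    Keeps the part before '@', lowercases it, and replaces every run of
    characters other than a-z and 0-9 with a single hyphen.
    """
    local_part = email.split('@', 1)[0].lower()
    name = re.sub(r'[^a-z0-9]+', '-', local_part).strip('-')
    if not name or not name[0].isalpha():
        # Assumed rule: version names must start with a letter.
        name = 'u-' + name
    # Truncate, then drop any hyphen left dangling at the end.
    return name[:MAX_VERSION_LEN].rstrip('-')


def workspace_for_user(email, branch):
    """Build a Workspace for one user; the branch name is supplied by the caller."""
    return Workspace(module='datalab',
                     version=version_name_for_user(email),
                     branch=branch)
```

For example, `version_name_for_user('Alice.Smith@example.com')` yields `'alice-smith'`. A shared workspace would instead keep the default version name, `main`.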

Within each workspace, the Cloud Datalab front end manages notebooks, notebook sessions, and the corresponding IPython and Python runtime instances.

## Google Cloud Integration

```python
import gcp

# Obtain the default context, which identifies the current project and its credentials.
context = gcp.Context.default()
print('The current project is %s' % context.project_id)
```

Within Cloud Datalab, the `gcp` Python library provides access to Google Cloud Platform services. It handles initialization automatically, detecting both the current project and the OAuth token used to invoke APIs. In particular, it uses an OAuth token representing the project's service account, rather than an individual user's credentials.

## Service Accounts

The fact that the `gcp` library acts as the project's service account, not as you, is an important detail.

The code you write, along with the data it retrieves, is stored in notebooks that are shared across the project. The authorization used to run that code and retrieve that data is therefore tied to the project rather than to an individual user.

Likewise, any applications or data pipelines you build in Cloud Datalab are deployed under the project's service account rather than an individual account. Using the project's service account in this way is generally considered good practice.

Consequently, to access resources contained within another project, you will need to authorize the service account of your Cloud Datalab project within that other project, rather than authorize a particular user.
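In current Google Cloud terms, authorizing the service account in another project means adding it as a member of a role in that project's IAM policy. The sketch below works on a policy in the JSON shape used by the Cloud Resource Manager API. In that shape, each `bindings` entry holds a `role` and a list of `members`, and service accounts are written as `serviceAccount:EMAIL`.

`grant_role`, the example policy, and the example email are hypothetical. Fetching the other project's policy and saving it back is not shown. You can do that through the console or, for example, with `gcloud projects add-iam-policy-binding`.

```python
import copy


def grant_role(policy, role, service_account_email):
    """Return a copy of an IAM-style policy dict with the service account added to `role`.

    Members are written as 'serviceAccount:<email>'. The input policy is not
    modified, and adding a member that is already present has no effect.
    """
    member = 'serviceAccount:' + service_account_email
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault('bindings', [])
    for binding in bindings:
        if binding.get('role') == role:
            members = binding.setdefault('members', [])
            if member not in members:
                members.append(member)
            return updated
    # No binding exists for this role yet, so create one.
    bindings.append({'role': role, 'members': [member]})
    return updated


if __name__ == '__main__':
    # Hypothetical policy of the *other* project and a hypothetical service account.
    other_policy = {'bindings': [{'role': 'roles/viewer',
                                  'members': ['user:alice@example.com']}]}
    print(grant_role(other_policy, 'roles/viewer',
                     'datalab-project@appspot.gserviceaccount.com'))
```

Returning a modified copy keeps the original policy intact, so you can compare the two before saving anything.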

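```python
# Return the OAuth access token for the project's service account;
# the notebook displays it as the cell's output.
context.credentials.get_access_token()
```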

The code above displays the access token representing the service account. However, it is more useful to know which service account that is, so that it can be authorized to access resources and data in another project.
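An API request is authorized with this token by sending it in an `Authorization: Bearer` header. Anyone holding the token can act as the service account until the token expires.

The sketch below only builds that header and sends no request. It assumes `get_access_token()` returns an object shaped like oauth2client's `AccessTokenInfo` (fields `access_token` and `expires_in`). `bearer_headers` and the local `AccessTokenInfo` stand-in are hypothetical.

```python
from collections import namedtuple

# Stand-in for the value returned by get_access_token(); assumed to mirror
# oauth2client's AccessTokenInfo(access_token, expires_in).
AccessTokenInfo = namedtuple('AccessTokenInfo', ['access_token', 'expires_in'])


def bearer_headers(token_info):
    """Build HTTP headers that authenticate a request as the token's owner.

    Accepts either an AccessTokenInfo-like object or a bare token string.
    """
    token = getattr(token_info, 'access_token', token_info)
    return {'Authorization': 'Bearer ' + token}
```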

The service account is listed in the Google Cloud Platform console for your project. In the console, navigate to the Permissions > Service accounts section and look for an account whose email address ends in `gserviceaccount.com`.
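When several accounts are listed, the email address usually hints at which is which. The sketch below encodes common Google Cloud naming patterns. These patterns come from general Google Cloud conventions rather than from this notebook, so treat them as assumptions. `service_account_kind` is a hypothetical helper, and it does not tell you which account Datalab itself uses.

```python
def service_account_kind(email):
    """Roughly classify a service account by its email address.

    Assumed naming conventions:
      PROJECT_NUMBER-compute@developer.gserviceaccount.com -> 'compute-default'
      PROJECT_ID@appspot.gserviceaccount.com               -> 'appengine-default'
      NAME@PROJECT_ID.iam.gserviceaccount.com              -> 'user-managed'
        (Google-managed service agents can also use an *.iam domain)
    Returns None for addresses outside gserviceaccount.com.
    """
    local, _, domain = email.lower().partition('@')
    if not (domain == 'gserviceaccount.com' or domain.endswith('.gserviceaccount.com')):
        return None  # not a service account email
    if domain == 'developer.gserviceaccount.com' and local.endswith('-compute'):
        return 'compute-default'
    if domain == 'appspot.gserviceaccount.com':
        return 'appengine-default'
    if domain.endswith('.iam.gserviceaccount.com'):
        return 'user-managed'
    return 'other'
```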

There are often multiple service accounts. The one used within the Cloud Datalab environment can be retrieved with the following `curl` command, which queries the [Compute Engine metadata service](https://cloud.google.com/compute/docs/metadata?hl=en):

```
%%bash
curl --silent -H "Metadata-Flavor: Google" \
  http://metadata/computeMetadata/v1/instance/service-accounts/default/email
```
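If you would rather stay in Python than shell out to `curl`, you can make the same request with the standard library. This is a sketch: `metadata_request` and `get_metadata` are hypothetical helpers. The request only succeeds from inside a Compute Engine instance or managed VM, where the `metadata` hostname resolves. The `fetch` parameter is there so the function can be exercised without a metadata server.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

METADATA_ROOT = 'http://metadata/computeMetadata/v1/'


def metadata_request(path):
    """Build a request for a metadata path, with the header the server requires."""
    return Request(METADATA_ROOT + path, headers={'Metadata-Flavor': 'Google'})


def get_metadata(path, fetch=urlopen, timeout=5):
    """Fetch a metadata value as text; `fetch` can be replaced for testing."""
    response = fetch(metadata_request(path), timeout=timeout)
    try:
        return response.read().decode('utf-8').strip()
    finally:
        response.close()
```

On the Datalab VM, `get_metadata('instance/service-accounts/default/email')` should return the same address as the `curl` command. `get_metadata('project/project-id')` returns the project ID.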