# Workspace setup

<table align="left">

  <td>
    <a href="https://github.com/DataBiosphere/terra-axon-examples/blob/main/workspace_setup.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://github.com/DataBiosphere/terra-axon-examples/main/workspace_setup.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in a Verily Workbench notebook instance
    </a>
  </td>
</table>

## Overview

This notebook creates the workspace resources that the Verily Workbench tutorials expect to exist. Extend it with any additional setup that your own work requires.

### Objective

Perform common workspace setup tasks including:

1. Creating the Cloud Storage buckets used in Verily Workbench tutorials.
1. Creating the BigQuery dataset used in Verily Workbench tutorials.

#### How to run this notebook

Run this notebook cell by cell to set up your workspace. All setup steps are optional but highly recommended so that your workspace is compatible with the Verily Workbench tutorials.

#### Costs

This notebook takes less than a minute to run, which typically costs less than $0.01 of compute time in your cloud environment.

## Setup and Configuration

In [None]:
import os

In [None]:
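# GOOGLE_CLOUD_PROJECT supplies the project ID used to name the buckets created below.
if not os.getenv('GOOGLE_CLOUD_PROJECT'):
    raise Exception('Expected environment variable GOOGLE_CLOUD_PROJECT is not set. Please let terra-support@verily.com know.')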

## Use the Terra CLI to create some default workspace resources

Verily Workbench training materials assume that these default workspace resources exist. In particular, the "self-cleaning" Google Cloud Storage bucket and BigQuery dataset are useful for tutorials: any GCS files and BigQuery tables that a tutorial creates there are deleted automatically, so you do not need to remember to clean up after completing the tutorial.

First, run the following cell to confirm that you are using the workspace you intend to work in. (You can also run `terra workspace list` to see your list of workspaces.)

In [None]:
!terra status

Take a look at your workspace resources prior to creating these default resources.

In [None]:
!terra resource list

### Create Cloud Storage buckets

Create two Cloud Storage buckets in your workspace with the following workspace reference names:

- `ws_files`: The Verily Workbench utility to share notebooks with other Workbench users will write files to this durable default bucket.
- `ws_files_autodelete_after_two_weeks`: The code in Verily Workbench tutorials will write output files to this "autodelete" bucket by default. Any file in this bucket is automatically deleted two weeks after it is written, so you do not need to remember to clean up temporary and example files manually.

In [None]:
!terra resource resolve --name ws_files || terra resource create gcs-bucket \
    --name=ws_files \
    --bucket-name=${GOOGLE_CLOUD_PROJECT}-ws-files \
    --cloning=COPY_NOTHING \
    --description="Bucket for reports and provenance records."
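
The cell above relies on the shell `||` operator: `terra resource create` runs only when `terra resource resolve` exits with a non-zero status, which makes the cell safe to re-run. The same pattern could be sketched in Python roughly as follows. The helper `ensure_resource` and the stand-in runner `fake_run` are hypothetical names introduced for illustration, and the fallback project ID is a placeholder; only the `terra` subcommands and flags shown in the cell above are used.

```python
import os
import subprocess
from types import SimpleNamespace


def ensure_resource(resolve_cmd, create_cmd, run=subprocess.run):
    """Mimic `resolve || create`: run create_cmd only if resolve_cmd fails."""
    if run(resolve_cmd).returncode == 0:
        return "already exists"
    # check=True raises CalledProcessError if the creation itself fails.
    run(create_cmd, check=True)
    return "created"


# Stand-in runner for illustration only: it pretends the resource is missing,
# records each command, and never calls the real CLI.
calls = []


def fake_run(cmd, check=False):
    calls.append(cmd)
    is_resolve = cmd[:3] == ["terra", "resource", "resolve"]
    return SimpleNamespace(returncode=1 if is_resolve else 0)


# Placeholder fallback so the sketch also runs outside a workspace.
project = os.environ.get("GOOGLE_CLOUD_PROJECT", "example-project")

status = ensure_resource(
    ["terra", "resource", "resolve", "--name", "ws_files"],
    [
        "terra", "resource", "create", "gcs-bucket",
        "--name=ws_files",
        f"--bucket-name={project}-ws-files",
        "--cloning=COPY_NOTHING",
        "--description=Bucket for reports and provenance records.",
    ],
    run=fake_run,
)
print(status, len(calls))
```

Dropping the `run=fake_run` argument would make the helper call the real CLI through `subprocess.run`. Passing commands as lists avoids shell quoting issues in the description text.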

In [None]:
!terra resource resolve --name ws_files_autodelete_after_two_weeks || terra resource create gcs-bucket \
    --name=ws_files_autodelete_after_two_weeks \
    --bucket-name=${GOOGLE_CLOUD_PROJECT}-autodelete-after-two-weeks \
    --cloning=COPY_NOTHING \
    --auto-delete=14 \
    --description="Bucket for temporary storage of file data. Send test outputs here for automatic cleanup after two weeks."
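
Both cells above derive the bucket name by appending a fixed suffix to the value of `GOOGLE_CLOUD_PROJECT`. As a rough illustration, the sketch below reproduces that expansion and applies a partial check of the Cloud Storage naming rules for dot-free names (3 to 63 characters; lowercase letters, digits, dashes, and underscores; starting and ending with a letter or digit). The helper `bucket_names`, the pattern `BUCKET_NAME_RE`, and the fallback project ID are assumptions made for this example, not part of the Terra CLI.

```python
import os
import re

# Suffixes copied from the two `--bucket-name` flags above,
# keyed by workspace reference name.
SUFFIXES = {
    "ws_files": "-ws-files",
    "ws_files_autodelete_after_two_weeks": "-autodelete-after-two-weeks",
}

# Partial check only: length and character set for names without dots.
BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def bucket_names(project_id):
    """Return {reference name: bucket name} for the given project ID."""
    return {ref: project_id + suffix for ref, suffix in SUFFIXES.items()}


# Placeholder fallback so the sketch also runs outside a workspace.
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT", "example-project-id")

for ref, name in bucket_names(project_id).items():
    valid = BUCKET_NAME_RE.fullmatch(name) is not None
    print(f"{ref}: gs://{name} (passes partial name check: {valid})")
```

Because Google Cloud project IDs are at most 30 characters long, even the longer 27-character suffix keeps the resulting name within the 63-character limit.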

### Create a BigQuery dataset

Create a BigQuery dataset in your workspace with the reference name `tabular_data_autodelete_after_two_weeks`.
The code in Verily Workbench tutorials will write BigQuery tables to this "autodelete" dataset by default.
Any table in this dataset is automatically deleted two weeks after it is written,
so you do not need to remember to clean up temporary and example tables manually.

In [None]:
!terra resource resolve --name tabular_data_autodelete_after_two_weeks || terra resource create bq-dataset \
    --name=tabular_data_autodelete_after_two_weeks \
    --dataset-id=tabular_data_autodelete_after_two_weeks \
    --cloning=COPY_NOTHING \
    --default-table-lifetime=1209600 \
    --description="BigQuery dataset for temporary storage of tabular data. Send test outputs here for automatic cleanup after two weeks."
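
The value passed to `--default-table-lifetime` above appears to be expressed in seconds, since 1209600 equals two weeks' worth of seconds (14 × 24 × 60 × 60). A minimal sketch of that arithmetic, in case you want a different retention period; the variable name `lifetime_seconds` is illustrative only:

```python
from datetime import timedelta

# Two weeks expressed in seconds, matching the flag value used above.
lifetime_seconds = int(timedelta(weeks=2).total_seconds())
print(lifetime_seconds)  # 1209600
```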

Take a look at your workspace resources after creating these default resources.

In [None]:
!terra resource list

## Provenance

Generate information about this notebook environment and the packages installed.

In [None]:
!date

Conda and pip installed packages:

In [None]:
!conda env export

JupyterLab extensions:

In [None]:
!jupyter labextension list

Number of cores:

In [None]:
!grep ^processor /proc/cpuinfo | wc -l

Memory:

In [None]:
!grep "^MemTotal:" /proc/meminfo
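
If you would rather capture some of this provenance as structured data, the sketch below gathers a similar subset from Python: the timestamp, the logical CPU count, and the total memory. It assumes a Linux environment where `/proc/meminfo` is readable; the function names `mem_total_kib` and `collect_provenance` are hypothetical, and package listings are left to the cells above.

```python
import os
import platform
from datetime import datetime, timezone


def mem_total_kib(meminfo_text):
    """Return the MemTotal value from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:  <number> kB"
            return int(line.split()[1])
    return None


def collect_provenance(meminfo_path="/proc/meminfo"):
    """Collect a small provenance record for the current environment."""
    try:
        with open(meminfo_path) as f:
            mem = mem_total_kib(f.read())
    except OSError:
        mem = None  # for example, when not running on Linux
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "cpu_count": os.cpu_count(),  # logical processors
        "mem_total_kib": mem,
    }


print(collect_provenance())
```

A dictionary like this can be written alongside notebook outputs with `json.dump`, which makes it easier to compare environments between runs.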

---
Copyright 2022 Verily Life Sciences LLC

Use of this source code is governed by a BSD-style   
license that can be found in the LICENSE file or at   
https://developers.google.com/open-source/licenses/bsd