# Workspace setup

<table align="left">

  <td>
    <a href="https://github.com/DataBiosphere/terra-axon-examples/blob/main/workspace_setup.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://github.com/DataBiosphere/terra-axon-examples/main/workspace_setup.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in a Terra notebook instance
    </a>
  </td>
</table>

## Overview

This notebook sets up reasonable defaults for your workspace and creates the resources that the TVC tutorials expect to exist. Add more setup steps to this notebook to meet your needs.

### Objective

Perform common workspace setup tasks including:

1. Configuring the user name and email address to use for your git commits.
1. Creating the Cloud Storage buckets used in TVC tutorials.
1. Creating the BigQuery dataset used in TVC tutorials.
1. Creating a directory on this machine for Python virtual environments used in TVC tutorials.

#### How to run this notebook

Run this notebook cell by cell to set up your workspace. All setup steps are optional, but highly recommended so that your workspace is compatible with the TVC tutorials.

#### Costs

This notebook takes less than a minute to run, which will typically cost less than $0.01 of compute time on your cloud environment.

## Setup and Configuration

In [None]:
import os

In [None]:
if not os.getenv('GOOGLE_CLOUD_PROJECT'):
    raise Exception('Expected environment variables are not available. Please let terra-support@verily.com know.')
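
If later cells depend on more than one variable, the same check can be generalized. The following is a minimal sketch, not part of the original notebook: the helper name `missing_env_vars` is hypothetical, and the variable names are the ones this notebook reads (`GOOGLE_CLOUD_PROJECT` here, `TERRA_USER_EMAIL` in the git configuration cell).

```python
import os

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]

# Report every missing variable at once instead of failing on the first one.
missing = missing_env_vars(['GOOGLE_CLOUD_PROJECT', 'TERRA_USER_EMAIL'])
if missing:
    print('Missing environment variables: ' + ', '.join(missing))
```

Passing a plain dictionary as `environ` makes the helper easy to try out without changing the real environment.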

### Set up source control

Edit [git configuration](https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration) to tell git the name and email address to use for **all** your commits. This is optional. If these values are not set globally, JupyterLab will prompt for the name and email address to use upon the first commit to a newly cloned repository.

<div class="alert alert-block alert-info">
<b>Tip:</b> Also <a href="https://terra-solutions-tvcdocs-stage.uc.r.appspot.com/docs/getting_started/web_ui/#creating-an-ssh-key">set up the Terra-provided GitHub SSH key</a> for convenient interaction with source control.
</div>


In [None]:
# [Optional] EDIT THIS CELL: to set your name and email address for all git repositories, change these
# values to be correct for you. All other cells in this notebook work fine unchanged.

# Uncomment the following line if you want to use your Terra email address as your Git email address.
#GIT_EMAIL = os.environ['TERRA_USER_EMAIL']
GIT_EMAIL = None

GIT_NAME = None

In [None]:
!git config --global --list

In [None]:
if GIT_NAME is not None:
    !git config --global user.name "{GIT_NAME}"

if GIT_EMAIL is not None:
    !git config --global user.email "{GIT_EMAIL}"

!git config --global --list | grep user

Specify any other useful [git configuration](https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration).

In [None]:
# [Optional] EDIT THIS CELL: uncomment the line below to set the text editor that git uses
# when you work in the terminal instead of via the JupyterLab git extension.

# !git config --global core.editor emacs

### Use the Terra CLI to create some default workspace resources

These default workspace resources are used by TVC training material and are assumed to exist. Specifically, the "self-cleaning" Google Cloud Storage bucket and BigQuery dataset are useful for tutorials. Tutorials can create GCS files and BigQuery tables in these "self-cleaning" storage resources, so you don't need to remember to clean up after a tutorial has been completed.

First, run the following cell to confirm that you are using the workspace that you intend to work in. (You can also run `terra workspace list` to see your list of workspaces.)

In [None]:
!terra status

Take a look at your workspace resources prior to creating these default resources.

In [None]:
!terra resource list

#### Create Cloud Storage buckets

Create two Cloud Storage buckets in your workspace with the following workspace reference names:

- `ws_files`: The TVC utility to share notebooks with other TVC users will write files to this durable default bucket.
- `ws_files_autodelete_after_two_weeks`: The code in TVC tutorials will write output files to the "autodelete" bucket by default. Any file in this bucket will be automatically deleted two weeks after it is written. This alleviates the need for you to remember to clean up temporary and example files manually.

In [None]:
!terra resource resolve --name ws_files || terra resource create gcs-bucket \
    --name=ws_files \
    --bucket-name=${GOOGLE_CLOUD_PROJECT}-ws-files \
    --cloning=COPY_NOTHING \
    --description="Bucket for reports and provenance records."

In [None]:
!terra resource resolve --name ws_files_autodelete_after_two_weeks || terra resource create gcs-bucket \
    --name=ws_files_autodelete_after_two_weeks \
    --bucket-name=${GOOGLE_CLOUD_PROJECT}-autodelete-after-two-weeks \
    --cloning=COPY_NOTHING \
    --auto-delete=14 \
    --description="Bucket for temporary storage of file data. Send test outputs here for automatic cleanup after two weeks."
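
The two cells above use the shell idiom `terra resource resolve ... || terra resource create ...`, so the create command runs only when the reference name does not already resolve. That makes the cells safe to re-run. The same resolve-or-create logic could be sketched in Python as follows. This is an illustration under stated assumptions, not a Terra API: `ensure_resource` is a hypothetical helper, it assumes that `terra resource resolve` exits with a non-zero status when the resource is missing (which is what the `||` in the cells above relies on), and the `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess

def ensure_resource(name, create_args, run=subprocess.run):
    """Create a workspace resource only if its reference name does not resolve.

    Mirrors `terra resource resolve --name NAME || terra resource create ...`.
    Returns True if a create command was issued, False if the resource existed.
    """
    resolved = run(['terra', 'resource', 'resolve', '--name', name])
    if resolved.returncode == 0:
        return False  # Already present; nothing to do.
    # create_args starts with the resource type (for example 'gcs-bucket')
    # followed by any type-specific flags.
    run(['terra', 'resource', 'create', *create_args, f'--name={name}'], check=True)
    return True

# Example call, using the same flags as the first bucket cell above:
# import os
# ensure_resource('ws_files', [
#     'gcs-bucket',
#     f"--bucket-name={os.environ['GOOGLE_CLOUD_PROJECT']}-ws-files",
#     '--cloning=COPY_NOTHING',
# ])
```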

#### Create a BigQuery dataset

Create a BigQuery dataset in your workspace with reference name `tabular_data_autodelete_after_two_weeks`.
The code in TVC tutorials will write BigQuery tables to the "autodelete" dataset by default.
Any table in this dataset will be automatically deleted two weeks after it is written.
This alleviates the need for you to remember to clean up temporary and example tables manually.

In [None]:
!terra resource resolve --name tabular_data_autodelete_after_two_weeks || terra resource create bq-dataset \
    --name=tabular_data_autodelete_after_two_weeks \
    --dataset-id=tabular_data_autodelete_after_two_weeks \
    --cloning=COPY_NOTHING \
    --default-table-lifetime=1209600 \
    --description="BigQuery dataset for temporary storage of tabular data. Send test outputs here for automatic cleanup after two weeks."
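
The value `1209600` passed to `--default-table-lifetime` corresponds to two weeks if the flag is expressed in seconds, and `--auto-delete=14` on the bucket corresponds to the same period if that flag is expressed in days. The units are an assumption inferred from the values and from the "two weeks" wording in this notebook. The arithmetic can be checked with a short sketch:

```python
from datetime import timedelta

TWO_WEEKS = timedelta(weeks=2)

# Two weeks in seconds, matching the value given to --default-table-lifetime.
TABLE_LIFETIME_SECONDS = int(TWO_WEEKS.total_seconds())

# Two weeks in days, matching the value given to --auto-delete on the bucket.
BUCKET_AUTO_DELETE_DAYS = TWO_WEEKS.days

print(TABLE_LIFETIME_SECONDS)   # 1209600
print(BUCKET_AUTO_DELETE_DAYS)  # 14
```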

Take a look at your workspace resources after creating these default resources.

In [None]:
!terra resource list

### Create a local directory for Python virtual environments

Several of the TVC tutorials create [Python virtual environments](https://docs.python.org/3/tutorial/venv.html). They will all be placed in the `~/venvs` directory created below.

> A virtual environment is a Python tool for dependency management and project isolation. They allow Python site packages (third party libraries) to be installed locally in an isolated directory for a particular project, as opposed to being installed globally (i.e. as part of a system-wide Python) [[1]](https://towardsdatascience.com/virtual-environments-104c62d48c54#:~:text=A%20virtual%20environment%20is%20a,a%20system%2Dwide%20Python).

In [None]:
!mkdir -p ~/venvs
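
To show how a tutorial might use this directory, here is a minimal sketch that creates a virtual environment under `~/venvs` with the standard library `venv` module. The helper name `create_venv` and the environment name in the example are hypothetical; the tutorials themselves may create their environments differently.

```python
import venv
from pathlib import Path

def create_venv(name, base_dir=Path.home() / 'venvs', with_pip=True):
    """Create a virtual environment at base_dir/name and return its path."""
    env_dir = Path(base_dir) / name
    # venv.create builds the directory tree and writes pyvenv.cfg;
    # with_pip=True also bootstraps pip into the new environment.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir

# Example call (the environment name is illustrative only):
# create_venv('my_tutorial_env')
```

Each environment then lives in its own subdirectory, so removing one is a matter of deleting that subdirectory.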

## Provenance

Generate information about this notebook environment and the packages installed.

In [None]:
!date

Conda and pip installed packages:

In [None]:
!conda env export

JupyterLab extensions:

In [None]:
!jupyter labextension list

Number of cores:

In [None]:
!grep ^processor /proc/cpuinfo | wc -l

Memory:

In [None]:
!grep "^MemTotal:" /proc/meminfo
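
Some of the same facts can also be gathered from Python, which is convenient if you want to store them alongside results. The following is a rough sketch rather than a replacement for the cells above: the dictionary keys and the helper `read_mem_total` are hypothetical names, and the memory lookup assumes a Linux machine that exposes `/proc/meminfo`.

```python
import os
import platform
import sys
from datetime import datetime, timezone

def read_mem_total(meminfo_path='/proc/meminfo'):
    """Return the MemTotal line from a Linux meminfo file, or None if unavailable."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith('MemTotal:'):
                    return line.strip()
    except OSError:
        pass  # File missing or unreadable, for example on a non-Linux system.
    return None

provenance = {
    'timestamp_utc': datetime.now(timezone.utc).isoformat(),
    'python_version': sys.version.split()[0],
    'platform': platform.platform(),
    'cpu_count': os.cpu_count(),  # Logical CPUs; may be None if undetermined.
    'mem_total': read_mem_total(),
}

for key, value in provenance.items():
    print(f'{key}: {value}')
```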

---
Copyright 2022 Verily Life Sciences LLC

Use of this source code is governed by a BSD-style   
license that can be found in the LICENSE file or at   
https://developers.google.com/open-source/licenses/bsd