# Working with workspace resources

<table align="left">

  <td>
    <a href="https://github.com/DataBiosphere/terra-axon-examples/blob/main/first_hour_on_vwb/working_with_resources.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://github.com/DataBiosphere/terra-axon-examples/main/first_hour_on_vwb/working_with_resources.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in a Verily Workbench cloud environment
    </a>
  </td>                                                                                               
</table>


## Overview

This notebook provides examples of working with workspace resources in Verily Workbench. Build upon the best practices demonstrated in this notebook to include and share resources in your own workspaces.


### Objective

Use this notebook to perform common workspace resource operations including:
- [Add a referenced resource](#add-ref-resources)
    - [Add a GitHub repo](#add-git-repo)
    - [Add a BigQuery dataset](#add-bq-dataset)
    - [Add a BigQuery table](#add-bq-table)
    - [Add a Google Cloud Storage bucket](#add-gcs-bucket)
    - [Add a Google Cloud Storage object](#add-gcs-object)
- [Create a controlled resource](#add-controlled-resources)
    - [Create a BigQuery dataset](#create-bq-dataset)
    - [Create a Google Cloud Storage bucket](#create-gcs-bucket)
    - [Create a cloud environment](#create-cloud-env)

#### How to run this notebook

1. Run the [Notebook setup](#notebook-setup) section before running the cells of the other sections.
1. Each subsection provides information about a particular type of workspace resource and creates a widget. Run the cell in a particular subsection to create the widget, then input your resource's information and click the widget's button to add or create a resource.

#### Costs

This notebook takes less than a minute to run, which will typically cost less than $0.01 of compute time on your cloud environment.


### Notebook setup <a id="notebook-setup"></a>

Run the cell below to import dependencies and utilities.


In [None]:
from IPython.display import display, HTML
import ipywidgets as widgets
import subprocess
import widget_utils as wu

def get_bucket_url_from_reference(bucket_reference):
    """Resolve the bucket URL from a bucket reference in the workspace."""
    # IPython shell capture: the command's output is returned as a list of lines.
    bucket_cmd_output = !terra resolve --name={bucket_reference}
    return bucket_cmd_output[0]


def get_current_workspace_id():
    """Resolve the current workspace ID from the workspace description."""
    workspace_cmd_output = !terra workspace describe --format=json | jq --raw-output ".id"
    return workspace_cmd_output[0]

CURRENT_WORKSPACE_ID = get_current_workspace_id()
print(f'Workspace ID: {CURRENT_WORKSPACE_ID}')

### Workspace setup

<div class="alert alert-block alert-info">
<b>Note:</b> This notebook assumes that <a href="../../terra-axon-examples/workspace_setup.ipynb"><code>workspace_setup.ipynb</code></a> has been run.
</div>
    
`workspace_setup.ipynb` creates two Cloud Storage buckets for your workspace files with workspace reference names:

- ws_files
- ws_files_autodelete_after_two_weeks

The code in this notebook will write output files to the "autodelete" bucket by default.  
 Any file in this bucket will be automatically deleted <b>two weeks</b> after it is written.  
 This alleviates the need for you to remember to clean up temporary and example files manually.  
 If you want to write outputs to a durable location, simply change the assignment of the `BUCKET_REFERENCE` variable in the cell below and re-run the notebook.


In [None]:
# Change this to "ws_files" to use the durable workspace bucket instead of the autodelete bucket.
BUCKET_REFERENCE = "ws_files_autodelete_after_two_weeks"

In [None]:
MY_BUCKET = get_bucket_url_from_reference(BUCKET_REFERENCE)
print(f'Bucket URL: {MY_BUCKET}')
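
Outside a notebook, where the `!` shell syntax is unavailable, the same lookup can be sketched with `subprocess`. This is a minimal sketch rather than the notebook's own helper: the function name `resolve_reference`, its `run` parameter, and the `fake_run` stand-in are hypothetical and exist only so the call can be tried without the Workbench CLI installed. The `terra resolve --name=...` command is the one used in the setup cell.

```python
import subprocess


def resolve_reference(reference_name, run=subprocess.run):
    """Return the first line printed by `terra resolve --name=<reference_name>`.

    `run` is a hypothetical injection point (defaults to subprocess.run) so the
    function can be exercised with a stand-in when the CLI is not installed.
    """
    result = run(
        ["terra", "resolve", f"--name={reference_name}"],
        capture_output=True,
        text=True,
    )
    lines = result.stdout.splitlines()
    if not lines:
        raise RuntimeError(
            f"Could not resolve '{reference_name}': {result.stderr.strip()}")
    return lines[0].strip()


# Stand-in runner for demonstration only; it fakes the CLI output.
def fake_run(args, capture_output, text):
    return subprocess.CompletedProcess(
        args, 0, stdout="gs://example-bucket\n", stderr="")


print(resolve_reference("ws_files_autodelete_after_two_weeks", run=fake_run))
```

Raising an error when the command prints nothing makes a missing or misspelled reference name visible immediately instead of failing later with an `IndexError`.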

### Add referenced resources
<a id='add-ref-resources'></a>

A [referenced resource](https://terra-docs.api.verily.com/docs/getting_started/web_ui/#referenced-vs-workspace-controlled-resources) points to a source outside of the current workspace in order to represent data or other elements in Verily Workbench.

For each type of referenced resource supported by Verily Workbench, this notebook provides a **widget** that runs a Workbench CLI command to add a referenced resource.<br>Simply fill in the inputs with your desired values and click the button.


#### Choosing the right cloning instruction

<a id='cloning-instruction'></a>

Each resource in a workspace has a **cloning instruction** that dictates the presence or absence of that resource and its contents in clones of the original workspace. It's important to understand the options available so you can choose the appropriate cloning behavior for each resource you add to your workspace. If you do not specify a cloning instruction when adding or creating a workspace resource, `COPY_REFERENCE` is the default value used.

The table below describes the expected behavior, based on the cloning instruction, of a resource in a child workspace which is the clone of a parent workspace.

| Cloning Instruction | Details                                                                                                                                                                                        |
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `COPY_NOTHING`      | Resource does not exist in clone workspace.                                                                                                                                                    |
| `COPY_DEFINITION`   | **Only for controlled resources** Create a new controlled resource with same metadata (location, lifecycle rules, etc.) in clone workspace. Files/data are not copied to new resource.         |
| `COPY_REFERENCE`    | Create new referenced resource that points to same cloud resource as source.                                                                                                                   |
| `COPY_RESOURCE`     | **Only for controlled resources** Create a new controlled resource with same metadata (location, lifecycle rules, etc.) in clone workspace. Files/data are copied from source to new resource. |


#### 1. Add a GitHub repository
<a id='add-git-repo'></a>

Both public and private GitHub repositories can be added to data collections. No additional setup is required to add a public GitHub repository. In order to add a private GitHub repository, you must first [set up your Verily Workbench SSH key](https://terra-docs.api.verily.com/docs/how_to_guides/terra_ssh_key_guide) and then provide the repo URL in SSH format (beginning with `git@github.com`) rather than the `https://` format.

Run the cell below to create a widget. Next, populate the following input fields:

- `Cloning`: Read about these options [in the Cloning Instruction section](#choosing-the-right-cloning-instruction).
- `Description`: Provide a short description of the repository.
- `Name`: It's strongly recommended that you use the name of the GitHub repo here. For example, `terra-axon-examples` would be the perfect name for the referenced resource of [this repo](https://github.com/DataBiosphere/terra-axon-examples).
- `Repo URL`: The URL of the GitHub repository you wish to add.

Click the button to add the repo to your workspace. The output should resemble:

```
Successfully added referenced git repo.
Name:         <REPO_NAME>
Description:  <DESCRIPTION>
Type:         GIT_REPO
Stewardship:  REFERENCED
Cloning:      COPY_REFERENCE
Properties:   class Properties {
    []
}
Git repo Url: https://github.com/<REPO_NAME>
```
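
For private repositories, an `https://` URL copied from the browser needs to be rewritten into the SSH form. The helper below is a minimal sketch of that conversion using the standard GitHub SSH layout `git@github.com:<owner>/<repo>.git`; the function name `to_github_ssh_url` is hypothetical, and whether Workbench requires the trailing `.git` is not stated in this notebook.

```python
from urllib.parse import urlparse


def to_github_ssh_url(https_url):
    """Convert https://github.com/<owner>/<repo> to git@github.com:<owner>/<repo>.git."""
    parsed = urlparse(https_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"Not a github.com URL: {https_url}")
    path = parsed.path.strip("/")
    # Drop an existing .git suffix so it is not duplicated below.
    if path.endswith(".git"):
        path = path[: -len(".git")]
    if path.count("/") != 1:
        raise ValueError(f"Expected <owner>/<repo> in URL: {https_url}")
    return f"git@github.com:{path}.git"


print(to_github_ssh_url("https://github.com/DataBiosphere/terra-axon-examples"))
```

The resulting string can be pasted into the `Repo URL` field of the widget.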


In [None]:
class AddRepoWidget(object):
    def __init__(self):
        self.label = widgets.Label(
            value='Please provide appropriate values in the input boxes.')
        self.input_name = wu.TextInputWidget("<REPO_NAME>", "Name:").get()
        self.input_description = wu.TextInputWidget(
            "<REPO_DESCRIPTION>", "Description:").get()
        self.input_repo_url = wu.TextInputWidget("<REPO_URL>", "URL:").get()
        self.cloning_drop_down = wu.DropdownInputWidget(
            ['COPY_NOTHING', 'COPY_DEFINITION', 'COPY_RESOURCE', 'COPY_REFERENCE'], 'COPY_REFERENCE', "Cloning:").get()
        self.output = widgets.Output()
        self.add_repo_button = wu.StyledButton(
            'Add repository', 'Click to add a git repository', 'plus').get()
        self.add_repo_button.on_click(self.add_repo)
        self.vb = widgets.VBox(
            [
                self.label,
                self.cloning_drop_down,
                self.input_description,
                self.input_name,
                self.input_repo_url,
                self.add_repo_button,
                self.output
            ],
            layout=wu.vbox_layout
        )

    def add_repo(self, b):
        with self.output:
            print("Running command to add repo...")
            # Arguments are passed as a list, so no shell quoting is needed.
            # Wrapping the description in literal quotes would store the quotes too.
            commandList = ["terra", "resource", "add-ref", "git-repo",
                           f"--cloning={self.cloning_drop_down.value}",
                           f"--description={self.input_description.value}",
                           f"--name={self.input_name.value.strip()}",
                           f"--repo-url={self.input_repo_url.value.strip()}",
                           f"--workspace={CURRENT_WORKSPACE_ID}"]
            print('\n'.join(commandList))
            print('')
            result = subprocess.run(commandList, capture_output=True, text=True)
            # Show stdout on success; fall back to stderr when stdout is empty.
            print(result.stdout or result.stderr)


# Instantiate widget
add_repo_widget = AddRepoWidget()
display(add_repo_widget.vb)

#### 2. Add a BigQuery dataset
<a id='add-bq-dataset'></a>

Run the cell below to create a widget that adds a BigQuery dataset as a referenced resource to your workspace.

Then, populate the input fields:

- `Cloning`: Read about these options [in the Cloning Instruction section](#choosing-the-right-cloning-instruction).
- `Dataset ID`: The BigQuery dataset ID of your dataset.
- `Description`: A description of this resource's contents and purpose.
- `Name`: The name by which you will reference this resource in your workspace. A short and descriptive name is suggested, e.g., `human-genomic-data`.
- `Path`: Consists of the GCP project ID and dataset ID with format `<PROJECT_ID>.<DATASET_ID>`.
- `GCP Project ID`: The Google Cloud Project ID of your dataset.

**NOTE:** You must populate either

- the path field (which will have format `<PROJECT_ID>.<DATASET_ID>`) or
- both [dataset ID and the GCP project ID](https://cloud.google.com/bigquery/docs/datasets-intro#datasets) fields.

Click the button to add the BigQuery dataset reference to your workspace. The output should resemble:

```
Successfully added referenced BigQuery dataset.
Name:         <NAME>
Description:  <DESCRIPTION>
Type:         BQ_DATASET
Stewardship:  REFERENCED
Cloning:      COPY_REFERENCE
Properties:   class Properties {
    []
}
GCP project id: <PROJECT_ID>
BigQuery dataset id: <DATASET_ID>
Location: US
# Tables: <NUMBER_OF_TABLES>
```
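
The either/or rule for the path and ID fields can be sketched as a function that builds the CLI argument list and rejects incomplete input before anything is run. This is an illustration, not the widget's implementation: the function name `build_add_bq_dataset_args` and the values `my-project` and `my_dataset` are hypothetical, while the flags are the ones the widget below passes to `terra resource add-ref bq-dataset`.

```python
import shlex


def build_add_bq_dataset_args(name, cloning="COPY_REFERENCE",
                              path="", dataset_id="", project_id=""):
    """Build the argument list for `terra resource add-ref bq-dataset`."""
    args = ["terra", "resource", "add-ref", "bq-dataset",
            f"--cloning={cloning}", f"--name={name.strip()}"]
    path, dataset_id, project_id = path.strip(), dataset_id.strip(), project_id.strip()
    if path:
        # The path already encodes both the project and the dataset.
        args.append(f"--path={path}")
    elif dataset_id and project_id:
        args += [f"--dataset-id={dataset_id}", f"--project-id={project_id}"]
    else:
        raise ValueError("Provide either path, or both dataset_id and project_id.")
    return args


args = build_add_bq_dataset_args("human-genomic-data", path="my-project.my_dataset")
# shlex.join renders the list as a copy-pasteable shell command for display.
print(shlex.join(args))
```

Passing the returned list to `subprocess.run` avoids any need for shell quoting; `shlex.join` is used only to show the command in a readable form.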


In [None]:
class AddBQDatasetWidget(object):
    def __init__(self):
        self.label = widgets.Label(
            value="Please provide appropriate values in the input boxes.")
        self.warning = widgets.Label(
            value="Provide either the path or BOTH the Dataset ID and GCP Project ID. All other fields required.")
        self.cloning_drop_down = wu.DropdownInputWidget(
            ['COPY_NOTHING', 'COPY_DEFINITION', 'COPY_RESOURCE', 'COPY_REFERENCE'], 'COPY_REFERENCE', "Cloning:").get()
        self.input_dataset_id = wu.TextInputWidget(
            "<DATASET_ID>", "Dataset ID:").get()
        self.input_description = wu.TextInputWidget(
            "<DESCRIPTION>", "Description:").get()
        self.input_name = wu.TextInputWidget("<NAME>", "Name:").get()
        self.input_path = wu.TextInputWidget("<PATH>", "Path:").get()
        self.input_project_id = wu.TextInputWidget(
            "<GCP_PROJECT_ID>", "GCP Project ID:").get()
        self.output = widgets.Output()
        self.button = wu.StyledButton(
            "Add BigQuery dataset", "Click to add a BigQuery dataset as a referenced resource", "plus",).get()
        self.button.on_click(self.add_dataset)
        self.vb = widgets.VBox(
            children = [
                self.label,
                self.warning,
                self.cloning_drop_down,
                self.input_dataset_id,
                self.input_description,
                self.input_name,
                self.input_path,
                self.input_project_id,
                self.button,
                self.output
            ],
            layout = wu.vbox_layout
        )

    def add_dataset(self, b):
        with self.output:
            description_content = f"{self.input_description.value}"
            commandList = [
                "terra","resource","add-ref","bq-dataset",
                f"--cloning={self.cloning_drop_down.value}",
                f"--description={description_content}",
                f"--name={self.input_name.value.strip()}",
                f"--workspace={CURRENT_WORKSPACE_ID}",
            ]
            if self.input_path.value.strip() != "":
                commandList.append(
                    f"--path={self.input_path.value.strip()}")
            else:
                commandList.append(
                    f"--dataset-id={self.input_dataset_id.value.strip()}")
                commandList.append(
                    f"--project-id={self.input_project_id.value.strip()}")
            
            print('Running command:')
            print('\n'.join(commandList))
            print('')
            
            result = subprocess.run(commandList, capture_output = True, text = True)
            print(result.stderr) if not result.stdout else print(result.stdout)


# Instantiate widget
add_bq_dataset_widget = AddBQDatasetWidget()
display(add_bq_dataset_widget.vb)

#### 3. Add a BigQuery table
<a id='add-bq-table'></a>

Run the cell below to create a widget that adds a BigQuery table as a referenced resource to your workspace.

Then, populate the input fields:

- `Cloning`: Read about these options [in the Cloning Instruction section](#choosing-the-right-cloning-instruction).
- `Dataset ID`: The [BigQuery dataset ID](https://cloud.google.com/bigquery/docs/datasets-intro#datasets) of your table.
- `Description`: A description of this resource's contents and purpose.
- `Name`: The name by which you will reference this resource in your workspace. A short and descriptive name is suggested, e.g., `human-genomic-data-table`.
- `Path`: Consists of your table's GCP project ID, dataset ID, & table ID with format `<PROJECT_ID>.<DATASET_ID>.<TABLE_ID>`.
- `GCP Project ID`: The [GCP project ID](https://cloud.google.com/docs/overview#projects) of your table.
- `Table ID`: The [BigQuery table ID](https://cloud.google.com/bigquery/docs/datasets-intro#datasets) of your table.

**NOTE:** You must populate either:

- the path field (which will have format `<PROJECT_ID>.<DATASET_ID>.<TABLE_ID>`), or
- the dataset ID, GCP project ID and table ID fields.

Click the button to add the BigQuery table reference to your workspace. The output should resemble:

```
Successfully added referenced BigQuery data table.
Name:         <NAME>
Description:  <DESCRIPTION>
Type:         BQ_TABLE
Stewardship:  REFERENCED
Cloning:      COPY_REFERENCE
Properties:   class Properties {
    []
}
GCP project id: <PROJECT_ID>
BigQuery dataset id: <DATASET_ID>
BigQuery table id: <TABLE_ID>
# Rows: (unknown)
```
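
Because the path is simply the three IDs joined by dots, either form of input can be derived from the other. The sketch below splits a path into its parts; the function name `split_table_path` and the example IDs are hypothetical, and it assumes none of the IDs themselves contain a dot.

```python
def split_table_path(path):
    """Split '<PROJECT_ID>.<DATASET_ID>.<TABLE_ID>' into its three parts.

    Assumes that none of the individual IDs contains a dot.
    """
    parts = path.strip().split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Expected <PROJECT_ID>.<DATASET_ID>.<TABLE_ID>, got: {path!r}")
    project_id, dataset_id, table_id = parts
    return {"project_id": project_id, "dataset_id": dataset_id, "table_id": table_id}


print(split_table_path("my-project.my_dataset.my_table"))
```

A check like this can catch a dataset-style path (only two parts) being entered where a table path is expected.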


In [None]:
class AddBQTableWidget(object):
    def __init__(self):
        self.label = widgets.Label(
            value='Please provide appropriate values in the input boxes.')
        self.warning = widgets.Label(
            value="Provide either the path OR the GCP Project ID, Dataset ID and Table ID. All other fields required.")
        self.cloning_drop_down = wu.DropdownInputWidget(
            ['COPY_NOTHING', 'COPY_DEFINITION', 'COPY_RESOURCE', 'COPY_REFERENCE'], 'COPY_REFERENCE', "Cloning:").get()
        self.input_dataset_id = wu.TextInputWidget(
            "<DATASET_ID>", "Dataset ID:").get()
        self.input_description = wu.TextInputWidget(
            "<DESCRIPTION>", "Description:").get()
        self.input_name = wu.TextInputWidget("<NAME>", "Name:").get()
        self.input_path = wu.TextInputWidget("<PATH>", "Path:").get()
        self.input_project_id = wu.TextInputWidget(
            "<GCP_PROJECT_ID>", "GCP Project ID:").get()
        self.input_table_id = wu.TextInputWidget(
            "<TABLE_ID>", "Table ID:").get()
        self.output = widgets.Output()
        self.button = wu.StyledButton(
            "Add BigQuery table", "Click to add a BigQuery table as a referenced resource.", "plus").get()
        self.button.on_click(self.add_bq_table)
        self.vb = widgets.VBox(
            [
                self.label,
                self.warning,
                self.cloning_drop_down,
                self.input_dataset_id,
                self.input_description,
                self.input_name,
                self.input_path,
                self.input_project_id,
                self.input_table_id,
                self.button,
                self.output
            ],
            layout = wu.vbox_layout
        )

    def add_bq_table(self, b):
        with self.output:
            description_content = f"{self.input_description.value}"

            commandList = [
                "terra","resource","add-ref","bq-table",
                f"--cloning={self.cloning_drop_down.value}",
                f"--description={description_content}",
                f"--name={self.input_name.value.strip()}",
                f"--workspace={CURRENT_WORKSPACE_ID}",
            ]
            
            # The path already encodes project, dataset and table; otherwise pass each ID separately.
            if self.input_path.value.strip() != "":
                commandList.append(
                    f"--path={self.input_path.value.strip()}")
            else:
                commandList.append(
                    f"--dataset-id={self.input_dataset_id.value.strip()}")
                commandList.append(
                    f"--project-id={self.input_project_id.value.strip()}")
                commandList.append(
                    f"--table-id={self.input_table_id.value.strip()}")
  
            print('Running command:')
            print('\n'.join(commandList))
            print('')

            result = subprocess.run(commandList, capture_output = True, text = True)
            print(result.stderr) if not result.stdout else print(result.stdout)


# Instantiate widget
add_bq_table_widget = AddBQTableWidget()
display(add_bq_table_widget.vb)

#### 4. Add a Google Cloud Storage bucket
<a id='add-gcs-bucket'></a>

Run the cell below to create a widget, then populate the widget's input fields and click the button to add a Google Cloud Storage bucket as a referenced resource in your workspace.

Widget input parameters include:

- `Bucket Name`: Must be a string that [meets requirements](https://cloud.google.com/storage/docs/buckets#naming), and must be globally unique. A recommended strategy is to prepend the `<PROJECT_ID>` of the GCP project associated with your workspace to a descriptive string (e.g., `<PROJECT_ID>_output_data`).
- `Cloning`: Read about these options [in the Cloning Instruction section](#choosing-the-right-cloning-instruction).
- `Description`: This description is shown in the Resources tab of your workspace's page in Verily Workbench and when the `terra resource list` command is run. The description you provide should add helpful context about the purpose and/or contents of your bucket.
- `Name`: This is the name which can be referenced in Workbench CLI commands. The value should be brief and memorable while communicating the purpose and/or contents of the bucket (e.g. `<STUDY_NAME>-data-bucket`).

<br>Once you've run the cell below, populated the input fields and clicked the button, the output should resemble:

```
Successfully added referenced GCS bucket.
Name:         <NAME>
Description:  <DESCRIPTION>
Type:         GCS_BUCKET
Stewardship:  REFERENCED
Cloning:      COPY_REFERENCE
Properties:   class Properties {
    []
}
GCS bucket name: <BUCKET_NAME>
Location: US-CENTRAL1
# Objects: <NUMBER_OF_OBJECTS>
```
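
A quick local check of the bucket name can catch obvious mistakes before the CLI is called. The sketch below covers only a subset of the Cloud Storage naming requirements (allowed characters, start and end characters, and the 3 to 63 character length that applies to names without dots); it does not check global uniqueness or the remaining rules in the linked documentation. The function name `looks_like_valid_bucket_name` is hypothetical.

```python
import re

# Lowercase letters, digits, dashes, underscores and dots only; the name must
# start and end with a letter or digit.
_BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]*[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Check a subset of the Cloud Storage bucket naming rules."""
    # Simplification: the 63-character limit is applied to all names here.
    if not 3 <= len(name) <= 63:
        return False
    return _BUCKET_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my-project_output_data", "My-Bucket", "ab", "-starts-with-dash"]:
    print(candidate, looks_like_valid_bucket_name(candidate))
```

Only the first candidate passes: the others contain uppercase letters, are too short, or start with a dash.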


In [None]:
class AddGcsBucketWidget(object):
    def __init__(self):
        self.label = widgets.Label(
            value='Please provide appropriate values in the input boxes.')
        self.cloning_drop_down = wu.DropdownInputWidget(
            ['COPY_NOTHING', 'COPY_DEFINITION', 'COPY_RESOURCE', 'COPY_REFERENCE'], 'COPY_REFERENCE', "Cloning:").get()
        self.input_bucket_name = wu.TextInputWidget(
            "<BUCKET_NAME>", "Bucket Name:").get()
        self.input_description = wu.TextInputWidget(
            "<DESCRIPTION>", "Description:").get()
        self.input_name = wu.TextInputWidget("<NAME>", "Name:").get()
        self.output = widgets.Output()
        self.button = wu.StyledButton(
            'Add GCS bucket', 'Click to add a GCS bucket as a referenced resource.', 'plus').get()
        self.vb = widgets.VBox(
            [
                self.label,
                self.input_bucket_name,
                self.cloning_drop_down,
                self.input_description,
                self.input_name,
                self.button,
                self.output
            ],
            layout = wu.vbox_layout
        )
        self.button.on_click(self.add_gcs_bucket)

    def add_gcs_bucket(self, b):
        with self.output:
            # Pass the description as-is: list-form subprocess arguments need no shell quoting.
            description_content = self.input_description.value

            commandList = ["terra", "resource", "add-ref", "gcs-bucket",
                                     f"--bucket-name={self.input_bucket_name.value.strip()}",
                                     f"--cloning={self.cloning_drop_down.value}",
                                     f"--description={description_content}",
                                     f"--name={self.input_name.value}",
                                     f"--workspace={CURRENT_WORKSPACE_ID}"]
            
            print('Running command:')
            print('\n'.join(commandList))
            print('')
            
            result = subprocess.run(commandList, capture_output=True, text=True)
            print(result.stderr) if not result.stdout else print(result.stdout)


# Instantiate widget
add_gcs_bucket_widget = AddGcsBucketWidget()
display(add_gcs_bucket_widget.vb)

#### 5. Add a Google Cloud Storage object
<a id='add-gcs-object'></a>

Run the cell below to create a widget, then populate the widget's input fields and click the button to add a Google Cloud Storage bucket object as a referenced resource in your workspace.

Widget input parameters include:

- `Bucket Name`: Must be a string that [meets requirements](https://cloud.google.com/storage/docs/buckets#naming) and is globally unique. A recommended strategy is to prepend the `<PROJECT_ID>` of the GCP project associated with your workspace to a descriptive string (e.g., `<PROJECT_ID>_output_data`).
- `Cloning`: Read about these options [in the Cloning Instruction section](#choosing-the-right-cloning-instruction).
- `Description`: This value is shown in the Resources tab of your workspace's page in Verily Workbench and when the `terra resource list` command is run. The description you provide should add helpful context about the purpose and/or contents of your bucket.
- `Name`: This name can be referenced in Workbench CLI commands. The value should be brief and memorable while communicating the purpose and/or contents of the bucket (e.g. `<STUDY_NAME>-data-bucket`).
- `Object Name`: The name of the object in the Google Cloud Storage bucket (e.g. `my_data.csv`).
- `Path`: Must be the full path of the object (e.g. `folder/my_data.csv`).

**NOTE:** You must populate either

- the bucket name and path fields, or
- the bucket name and object name fields, wherein the object name is the full path to the object (e.g. `/folder1/folder2/my_data.csv`).

Once you've run the cell below, populated the input fields and clicked the button, the output should resemble:

```
Successfully added referenced GCS bucket object.
Name:         <NAME>
Description:  <DESCRIPTION>
Type:         GCS_OBJECT
Stewardship:  REFERENCED
Cloning:      <CLONING_INSTRUCTION>
Properties:   class Properties {
    []
}
GCS bucket name: <BUCKET_NAME>
Full path to the object: <PATH>
Is directory: (unknown)
Size: (unknown)
The time that the object's storage class was last changed or the time of the object creation: (unknown)
```
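
If you start from an object's `gs://` URL, the bucket name and object name expected by the widget can be separated with a few lines of Python. This is a small sketch under that assumption; the function name `split_gcs_url` and the example URL are hypothetical.

```python
def split_gcs_url(url):
    """Split 'gs://<bucket>/<object name>' into (bucket_name, object_name)."""
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"Expected a gs:// URL, got: {url!r}")
    # Everything up to the first slash is the bucket; the rest is the object name.
    bucket_name, _, object_name = url[len(prefix):].partition("/")
    if not bucket_name or not object_name:
        raise ValueError(f"URL must include a bucket and an object name: {url!r}")
    return bucket_name, object_name


print(split_gcs_url("gs://my-bucket/folder/my_data.csv"))
```

The object name keeps any folder-like prefix, since Cloud Storage treats the whole string after the bucket as the object's name.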


In [None]:
class AddGcsObjectWidget(object):
    def __init__(self):
        self.label = widgets.Label(
            value='Please provide appropriate values in the input boxes.')
        self.warning = widgets.Label(
            value='Provide either the path OR the object name. All other fields required.')
        self.cloning_drop_down = wu.DropdownInputWidget(
            ['COPY_NOTHING', 'COPY_DEFINITION', 'COPY_RESOURCE', 'COPY_REFERENCE'], 'COPY_REFERENCE', "Cloning:").get()
        self.input_bucket_name = wu.TextInputWidget(
            "<BUCKET_NAME>", "Bucket Name:").get()
        self.input_description = wu.TextInputWidget(
            "<DESCRIPTION>", "Description:").get()
        self.input_name = wu.TextInputWidget("<NAME>", "Name:").get()
        self.input_object_name = wu.TextInputWidget(
            "<OBJECT_NAME>", "Object Name:").get()
        self.input_path = wu.TextInputWidget("<PATH>", "Path to Object:").get()
        self.output = widgets.Output()
        self.button = wu.StyledButton(
            'Add GCS object', 'Click to add a GCS object as a referenced resource.', 'plus').get()
        self.button.on_click(self.add_gcs_object)
        self.vb = widgets.VBox([
            self.label,
            self.warning,
            self.input_bucket_name,
            self.cloning_drop_down,
            self.input_description,
            self.input_name,
            self.input_object_name,
            self.input_path,
            self.button,
            self.output], layout = wu.vbox_layout)

    def add_gcs_object(self, b):
        with self.output:
            # Pass the description as-is: list-form subprocess arguments need no shell quoting.
            description_content = self.input_description.value

            commandList = ["terra", "resource", "add-ref", "gcs-object",
                                         f"--bucket-name={self.input_bucket_name.value.strip()}",
                                         f"--cloning={self.cloning_drop_down.value}",
                                         f"--description={description_content}",
                                         f"--name={self.input_name.value}",
                                         f"--workspace={CURRENT_WORKSPACE_ID}"]
            
            if self.input_path.value.strip() != "":
                commandList.append(f"--path={self.input_path.value.strip()}")

            else:
                commandList.append(f"--object-name={self.input_object_name.value.strip()}")
            
            print('Running command:')
            print('\n'.join(commandList))
            print('')
            
            result = subprocess.run(commandList, capture_output=True, text=True)
            print(result.stderr) if not result.stdout else print(result.stdout)


add_gcs_obj_widget = AddGcsObjectWidget()
display(add_gcs_obj_widget.vb)

### Create controlled resources
<a id='add-controlled-resources'></a>

[Controlled resources](https://terra-docs.api.verily.com/docs/getting_started/web_ui/#referenced-vs-workspace-controlled-resources) are cloud resources that are managed or created by Verily Workbench within the current workspace.
For each type of controlled resource supported in Verily Workbench, this notebook provides a widget which runs a Workbench CLI command to create a controlled resource. Simply run the cell to create the widget, fill in the inputs with your desired values and click the button.


#### 1. Create a BigQuery dataset
<a id='create-bq-dataset'></a>

Run the cell below to create a widget that creates a BigQuery dataset as a controlled resource of your workspace.

Then, populate the input fields:

- `Access`: Defaults to `SHARED_ACCESS`, which grants dataset access to all users with access to the workspace, based on the user's workspace access (Reader or Writer). `PRIVATE_ACCESS` ensures the dataset and its contents cannot be shared.
- `Cloning`: Read about these options [in the Cloning Instruction section](#choosing-the-right-cloning-instruction).
- `Dataset ID`: (Optional) The BigQuery dataset ID of your dataset. If a dataset ID is not provided, the `Name` value will be used to create the dataset ID. Select the widget's checkbox for optional fields to show this input.
- `Description`: A description of this resource's contents and purpose.
- `Name`: The name by which you will reference this resource in your workspace. A short and descriptive name is suggested, e.g., `human-genomic-data`.

**NOTE:** A controlled dataset is created within the current workspace, so unlike the widget for referenced datasets, this widget has no `Path` or `GCP Project ID` inputs.

Click the button to create a workspace-controlled BigQuery dataset.

The output should resemble:

```
Successfully added controlled BigQuery dataset.
Name:         <NAME>
Description:  <DESCRIPTION>
Type:         BQ_DATASET
Stewardship:  CONTROLLED
Cloning:      COPY_REFERENCE
Access scope: SHARED_ACCESS
Managed by:   USER
Properties:   class Properties {
    []
}
GCP project id: <GOOGLE_PROJECT_ID>
BigQuery dataset id: <DATASET_ID>
Location: us-central1
# Tables: 0
```
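
The same command can be assembled without the widget, which makes the optional dataset ID and the two access values explicit. This is a sketch under stated assumptions: the function name `build_create_bq_dataset_args` and the example values are hypothetical, while the flags are the ones the widget below passes to `terra resource create bq-dataset`.

```python
ACCESS_OPTIONS = ("SHARED_ACCESS", "PRIVATE_ACCESS")


def build_create_bq_dataset_args(name, description, access="SHARED_ACCESS",
                                 cloning="COPY_REFERENCE", dataset_id=""):
    """Build the argument list for `terra resource create bq-dataset`."""
    if access not in ACCESS_OPTIONS:
        raise ValueError(f"access must be one of {ACCESS_OPTIONS}, got {access!r}")
    args = ["terra", "resource", "create", "bq-dataset",
            f"--access={access}",
            f"--cloning={cloning}",
            # No extra quotes: each list item reaches the CLI as one argument.
            f"--description={description}",
            f"--name={name.strip()}"]
    # The dataset ID is optional, so the flag is added only when a value is given.
    if dataset_id.strip():
        args.append(f"--dataset-id={dataset_id.strip()}")
    return args


for arg in build_create_bq_dataset_args(
        "human-genomic-data", "Example dataset for variant calls"):
    print(arg)
```

Because each list item is passed to the CLI as a single argument, a description containing spaces does not need to be wrapped in quotes.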


In [None]:
class CreateBQDataset(object):
    def __init__(self):
        self.label = widgets.Label(
            value='Please provide appropriate values in the input boxes.')
        self.warning = widgets.Label(
            value="Dataset ID is optional; if not provided, one will be automatically generated.")
        self.cloning_drop_down = wu.DropdownInputWidget(
            ['COPY_NOTHING', 'COPY_DEFINITION', 'COPY_RESOURCE', 'COPY_REFERENCE'], 'COPY_REFERENCE', "Cloning:").get()
        self.access_drop_down = wu.DropdownInputWidget(
            ['SHARED_ACCESS', 'PRIVATE_ACCESS'], 'SHARED_ACCESS', "Access:").get()
        self.input_dataset_id = wu.TextInputWidget(
            "<DATASET_ID>", "Dataset ID:").get()
        self.input_description = wu.TextInputWidget(
            "<DESCRIPTION>", "Description:").get()
        self.input_name = wu.TextInputWidget("<NAME>", "Name:").get()
        self.output = widgets.Output()
        self.button = wu.StyledButton(
            'Create dataset', 'Click to create a BigQuery dataset as a controlled resource.', 'plus').get()
        self.toggle_optional = wu.ShowOptionalCheckbox().get()
        self.toggle_optional.observe(self.toggle)
        self.required_fields = [
            self.toggle_optional, self.label, self.warning,
            self.access_drop_down, self.cloning_drop_down, self.input_description,
            self.input_name, self.button, self.output
        ]
        self.vb = widgets.VBox(children=self.required_fields,
                               layout=wu.vbox_layout
                               )
        self.button.on_click(self.create_bq_dataset)

    def toggle(self, event):
        self.output.clear_output()
        with self.output:
            if self.toggle_optional.value:
                # Start from a copy so required_fields itself is not modified.
                parameterList = list(self.required_fields)
                # Insert optional field, preserving alphabetical order.
                parameterList.insert(5, self.input_dataset_id)
                self.vb.children = parameterList
            else:
                if self.vb.children != self.required_fields:
                    self.vb.children = self.required_fields

    def create_bq_dataset(self, b):
        with self.output:
            description_content = f"\"{self.input_description.value}\""
            commandList = ["terra", "resource", "create", "bq-dataset",
                           f"--access={self.access_drop_down.value}",
                           f"--cloning={self.cloning_drop_down.value}",
                           f"--description={description_content}",
                           f"--name={self.input_name.value}",
                           f"--workspace={CURRENT_WORKSPACE_ID}"]
            if self.input_dataset_id.value != "":
                commandList.append(
                    f"--dataset-id={self.input_dataset_id.value}")

            print('Running command:')
            print('\n'.join(commandList))
            print('')

            result = subprocess.run(
                commandList, capture_output=True, text=True)
            # Show stdout if there is any; otherwise show the error output.
            print(result.stdout or result.stderr)


create_bq_dataset_widget = CreateBQDataset()
display(create_bq_dataset_widget.vb)
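
The widget above assembles a `terra resource create bq-dataset` command from its input fields. The same command can be built without a widget, for example in a script. The following is a minimal sketch that uses only the flags the widget passes; the function name `build_create_bq_dataset_command` and the example values are hypothetical.

```python
from typing import List, Optional


def build_create_bq_dataset_command(
    name: str,
    description: str,
    workspace_id: str,
    access: str = "SHARED_ACCESS",
    cloning: str = "COPY_REFERENCE",
    dataset_id: Optional[str] = None,
) -> List[str]:
    """Return the argument list for `terra resource create bq-dataset`."""
    command = [
        "terra", "resource", "create", "bq-dataset",
        f"--access={access}",
        f"--cloning={cloning}",
        f"--description={description}",
        f"--name={name}",
        f"--workspace={workspace_id}",
    ]
    # The dataset ID is optional; the flag is only added when a value is given.
    if dataset_id:
        command.append(f"--dataset-id={dataset_id}")
    return command


# Hypothetical values, for illustration only.
example_command = build_create_bq_dataset_command(
    name="human-genomic-data",
    description="Example dataset",
    workspace_id="my-workspace-id",
)
print(" ".join(example_command))
```

The returned list can be passed to `subprocess.run(example_command, capture_output=True, text=True)`, as the widget does.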

#### 2. Create a Google Cloud Storage bucket
<a id='create-gcs-bucket'></a>

Run the cell below to create a widget, then populate the widget's input fields and click the button to add a Google Cloud Storage bucket as a resource controlled by your workspace.

Widget input parameters include:

- `Access`: Defaults to `SHARED_ACCESS`, which grants bucket access to all users with access to the workspace, based on the user's workspace access (Reader or Writer). `PRIVATE_ACCESS` ensures the bucket and its contents cannot be shared.
- `Auto-delete`: (_Optional_) Number of days after which to auto-delete the objects in the bucket. This option is a shortcut for specifying a [lifecycle rule](https://github.com/DataBiosphere/terra-cli#gcs-bucket-lifecycle-rules).
- `Bucket Name`: Must be a string that [meets requirements](https://cloud.google.com/storage/docs/buckets#naming), and must be globally unique. If no bucket name is provided, one will be supplied by combining the resource name and the GCP project ID.
- `Cloning`: Read about these options [in the Cloning Instruction section](#choosing-the-right-cloning-instruction).
- `Description`: The description field's value is displayed in the Resources tab of your workspace's page and when the `terra resource list` command is run. The description you provide should add helpful context about the purpose and/or contents of your bucket (e.g. "<TERRA_USER_EMAIL> genomics workflow outputs").
- `Lifecycle`: (_Optional_) [Lifecycle rules](https://cloud.google.com/storage/docs/lifecycle) specified in a JSON-formatted file. See [examples](https://github.com/DataBiosphere/terra-cli#gcs-bucket-lifecycle-rules).
- `Location`: (_Optional_) The [Google Cloud location](https://cloud.google.com/storage/docs/locations) of the bucket. Default value is `us-central1`.
- `Name`: This name can be referenced in Workbench CLI commands. The value should be brief and memorable while communicating the purpose and/or contents of the bucket (e.g. `<STUDY_NAME>-data-bucket`).
- `Storage`: (_Optional_) The [Google Cloud Storage class](https://cloud.google.com/storage/docs/storage-classes). Default value is `STANDARD`.
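
Because an invalid bucket name only surfaces as an error after the command runs, it can help to check the name first. The sketch below covers a subset of the naming requirements linked above (length, allowed characters, first and last character, and the reserved `goog` prefix and `google` substring). It is deliberately stricter than the full rules, which allow longer names when they contain dots, and it cannot check global uniqueness. The function name `looks_like_valid_bucket_name` is hypothetical.

```python
import re

# Lowercase letters, digits, dashes, underscores and dots,
# starting and ending with a letter or digit.
_BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9_.-]*[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes a partial check of the naming rules."""
    if not 3 <= len(name) <= 63:
        return False
    if name.startswith("goog") or "google" in name:
        return False
    return _BUCKET_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my-study-data-bucket", "My_Bucket", "ab", "-bucket"]:
    print(candidate, looks_like_valid_bucket_name(candidate))
```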

<br>Once you've run the cell below, populated the input fields and clicked the button, the output should resemble:

```
Successfully added controlled GCS bucket.
Name:         <NAME>
Description:  <DESCRIPTION>
Type:         GCS_BUCKET
Stewardship:  CONTROLLED
Cloning:      <CLONING>
Access scope: <ACCESS>
Managed by:   USER
Properties:   class Properties {
    []
}
GCS bucket name: <BUCKET_NAME>
Location: US-CENTRAL1
# Objects: 0
```
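
The `Lifecycle` field expects a path to a JSON file. As a rough sketch, such a file could be generated from Python as shown below. This assumes the CLI accepts the usual Cloud Storage lifecycle layout (a top-level `rule` list with `action` and `condition` entries); check the linked examples for the exact format. The function name and file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_delete_after_days_rule(path, days):
    """Write a lifecycle file that deletes objects older than `days` days."""
    rules = {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {"age": days},
            }
        ]
    }
    Path(path).write_text(json.dumps(rules, indent=2))
    return rules


# Write the example file to a temporary directory (hypothetical file name).
lifecycle_path = Path(tempfile.mkdtemp()) / "delete_after_30_days.json"
write_delete_after_days_rule(lifecycle_path, 30)
print(lifecycle_path.read_text())
```

The resulting path is what you would enter in the widget's `Lifecycle` field. For a plain "delete after N days" rule, the `Auto-delete` field is the simpler option.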


In [None]:
class CreateGcsBucketWidget(object):
    def __init__(self):
        self.label = widgets.Label(
            value='Please provide appropriate values in the input boxes.')
        self.access_drop_down = wu.DropdownInputWidget(
            ['SHARED_ACCESS', 'PRIVATE_ACCESS'], 'SHARED_ACCESS', "Access:").get()
        self.input_auto_delete = wu.TextInputWidget(
            "<AUTODELETE (DAYS)>", "Auto Delete (Days) - optional:")
        self.input_bucket_name = wu.TextInputWidget(
            "<BUCKET_NAME>", "Bucket Name:")
        self.cloning_drop_down = wu.DropdownInputWidget(
            ['COPY_NOTHING', 'COPY_DEFINITION', 'COPY_RESOURCE', 'COPY_REFERENCE'], 'COPY_REFERENCE', "Cloning:").get()
        self.input_description = wu.TextInputWidget(
            "<DESCRIPTION>", "Description:")
        self.input_lifecycle = wu.TextInputWidget(
            "<PATH_TO_JSON>", "Lifecycle (JSON) - optional:")
        self.input_location = wu.TextInputWidget(
            "<LOCATION>", "Location - optional:")
        self.input_name = wu.TextInputWidget("<NAME>", "Name:")
        self.storage_drop_down = wu.DropdownInputWidget(
            ['STANDARD', 'NEARLINE', 'COLDLINE', 'ARCHIVE'], 'STANDARD', "Storage Class:").get()
        self.output = widgets.Output()
        self.button = wu.StyledButton(
            'Create resource', 'Click to create a Cloud Storage bucket as a controlled resource.', 'plus').get()
        self.button.on_click(self.create_gcs_bucket)
        self.toggle_optional = wu.ShowOptionalCheckbox().get()
        self.toggle_optional.observe(self.toggle)
        self.required_fields = [
            self.toggle_optional, self.label, self.access_drop_down,
            self.input_bucket_name.get(), self.cloning_drop_down,
            self.input_description.get(), self.input_name.get(),
            self.storage_drop_down, self.button, self.output]
        self.vb = widgets.VBox(
            children=self.required_fields,
            layout=wu.vbox_layout
        )

    def toggle(self, event):
        self.output.clear_output()
        with self.output:
            if self.toggle_optional.value:
                # Start from a copy so required_fields itself is not modified.
                parameterList = list(self.required_fields)
                # Insert optional fields, preserving alphabetical order.
                parameterList.insert(3, self.input_auto_delete.get())
                parameterList.insert(7, self.input_lifecycle.get())
                parameterList.insert(8, self.input_location.get())
                self.vb.children = parameterList
            else:
                if self.vb.children != self.required_fields:
                    self.vb.children = self.required_fields

    def create_gcs_bucket(self, b):
        with self.output:
            description_content = f"\"{self.input_description.get().value}\""
            commandList = [
                "terra", "resource", "create", "gcs-bucket",
                f"--access={self.access_drop_down.value}",
                f"--cloning={self.cloning_drop_down.value}",
                f"--description={description_content}",
                f"--name={self.input_name.get().value}",
                f"--storage={self.storage_drop_down.value}",
                f"--workspace={CURRENT_WORKSPACE_ID}"
            ]

            if self.input_auto_delete.get().value != "":
                commandList.append(
                    f"--auto-delete={self.input_auto_delete.get().value}")
            if self.input_bucket_name.get().value != "":
                commandList.append(
                    f"--bucket-name={self.input_bucket_name.get().value}")
            if self.input_lifecycle.get().value != "":
                commandList.append(
                    f"--lifecycle={self.input_lifecycle.get().value}")
            if self.input_location.get().value != "":
                commandList.append(
                    f"--location={self.input_location.get().value}")

            print('Running command:')
            print('\n'.join(commandList))
            print('')

            result = subprocess.run(
                commandList, capture_output=True, text=True)
            # Show stdout if there is any; otherwise show the error output.
            print(result.stdout or result.stderr)


create_gcs_bucket_widget = CreateGcsBucketWidget()
display(create_gcs_bucket_widget.vb)

#### 3. Create a GCP notebook
<a id='create-cloud-env'></a>

Run the cell below to create a widget, then populate the widget's input fields and click the button to create a GCP notebook (cloud environment) as a resource controlled by your workspace.

For complex use cases, the Workbench CLI allows for the specification of some [additional parameters](https://cloud.google.com/vertex-ai/docs/workbench/reference/rest/v1/projects.locations.instances#Instance) which the widget doesn't support.

Widget parameters include:

- `Access`: Defaults to `SHARED_ACCESS`, which grants access to the resource to all users with access to the workspace, based on the user's workspace access (Reader or Writer). `PRIVATE_ACCESS` ensures the resource cannot be shared.
- `Cloning`: Read about these options [in the Cloning Instruction section](#choosing-the-right-cloning-instruction). Default is `COPY_REFERENCE`.
- `Description`: (_Optional_) Should describe the purpose of the cloud environment.
- `Instance ID`: (_Optional_) A unique name given to the cloud environment which cannot be changed later. If none is provided, the instance ID will be set to the name. This ID is visible in the Google Cloud console for debugging, not in the VWB UI.
- `Name`: This name is displayed in the VWB UI Environments tab. The value should be brief and memorable while communicating the purpose of the cloud environment (e.g. `analysis-notebooks-cloud-env`).
- `Post-startup Script`: (_Optional_) Path to a Bash script that runs automatically after your cloud environment has fully booted. The path must be a URL or Cloud Storage path, e.g. `gs://path-to-file/file-name`. If no post-startup script is provided, the [default script](https://github.com/DataBiosphere/terra-workspace-manager/blob/main/service/src/main/java/bio/terra/workspace/service/resource/controlled/cloud/gcp/ainotebook/post-startup.sh) is used, which is recommended in most cases.

Once you've run the cell below, populated the input fields and clicked the button, the output should resemble:

```
Successfully added controlled GCP Notebook instance.
Name:         <NAME>
Description:  <DESCRIPTION>
Type:         AI_NOTEBOOK
Stewardship:  CONTROLLED
Cloning:      COPY_NOTHING
Access scope: PRIVATE_ACCESS
Managed by:   USER
Region:       us-central1
Private user: <TERRA_USER_EMAIL>
Properties:   class Properties {
    []
}
GCP project id: <GOOGLE_PROJECT_ID>
Instance id:   <INSTANCE_ID>
Location: us-central1-a
Instance name: projects/<GOOGLE_PROJECT_ID>/locations/us-central1-a/instances/<INSTANCE_ID>
State:         PROVISIONING
Metadata:
   notebooks-api-version: v1
   post-startup-script: https://raw.githubusercontent.com/DataBiosphere/terra-workspace-manager/main/service/src/main/java/bio/terra/workspace/service/resource/controlled/cloud/gcp/ainotebook/post-startup.sh
   disable-swap-binaries: true
   serial-port-logging-enable: true
   terra-cli-server: verily
   terra-workspace-id: <WORKSPACE_ID>
   proxy-mode: service_account
   enable-guest-attributes: TRUE
   warmup-libraries: matplotlib.pyplot
   shutdown-script: /opt/deeplearning/bin/shutdown_script.sh
   notebooks-api: PROD
Proxy URL:     (undefined)
Create time:   <YYYY-MM-DDTHH:MM:SS>
```
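
The output above is a series of `Key: value` lines, so it can be turned into a dictionary if you want to inspect a single field such as `State` programmatically. The following is a minimal sketch that assumes the output keeps this layout; the function name `parse_resource_output` is hypothetical, and the sample text is a shortened copy of the output shown above.

```python
def parse_resource_output(text):
    """Collect `Key: value` lines into a dict, skipping lines without a value."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, separator, value = line.partition(":")
        if not separator or not key.strip() or not value.strip():
            continue
        fields[key.strip()] = value.strip()
    return fields


sample_output = """Successfully added controlled GCP Notebook instance.
Name:         <NAME>
Type:         AI_NOTEBOOK
Location: us-central1-a
State:         PROVISIONING
Metadata:
   notebooks-api-version: v1
Proxy URL:     (undefined)
"""

parsed = parse_resource_output(sample_output)
print(parsed["State"])
```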


In [None]:
class CreateGcpNotebookWidget(object):
    def __init__(self):
        self.label = widgets.Label(
            value='Please provide appropriate values in the input boxes.')
        self.access_drop_down = wu.DropdownInputWidget(
            ['SHARED_ACCESS', 'PRIVATE_ACCESS'], 'SHARED_ACCESS', "Access:").get()
        self.cloning_drop_down = wu.DropdownInputWidget(
            ['COPY_NOTHING', 'COPY_DEFINITION', 'COPY_RESOURCE', 'COPY_REFERENCE'], 'COPY_REFERENCE', "Cloning:").get()
        self.input_name = wu.TextInputWidget("<NAME>", "Name:")
        self.input_description = wu.TextInputWidget(
            "<DESCRIPTION>", "Description:")
        self.input_instance_id = wu.TextInputWidget(
            "<INSTANCE_ID>", "Instance ID (Cannot Be Changed) - optional:")
        self.input_post_startup_script = wu.TextInputWidget(
            "<PATH TO SCRIPT (gs://)>", "Post-Startup Script - optional:")
        self.output = widgets.Output()
        self.button = wu.StyledButton(
            'Create cloud environment', 'Click to create a cloud environment as a controlled resource.', 'plus').get()
        self.button.on_click(self.create_gcp_notebook)
        self.toggle_optional = wu.ShowOptionalCheckbox().get()
        self.toggle_optional.observe(self.toggle)
        self.required_fields = [
            self.toggle_optional, self.label, self.access_drop_down,
            self.cloning_drop_down, self.input_description.get(),
            self.input_name.get(), self.button, self.output]
        self.vb = widgets.VBox(
            children=self.required_fields,
            layout=wu.vbox_layout
        )

    def toggle(self, event):
        self.output.clear_output()
        with self.output:
            if self.toggle_optional.value:
                # Start from a copy so required_fields itself is not modified.
                parameterList = list(self.required_fields)
                # Insert optional fields, preserving alphabetical order.
                parameterList.insert(5, self.input_instance_id.get())
                parameterList.insert(7, self.input_post_startup_script.get())
                self.vb.children = parameterList
            else:
                if self.vb.children != self.required_fields:
                    self.vb.children = self.required_fields

    def create_gcp_notebook(self, b):
        with self.output:
            description_content = f"\"{self.input_description.get().value}\""
            commandList = [
                "terra", "resource", "create", "gcp-notebook",
                f"--access={self.access_drop_down.value}",
                f"--cloning={self.cloning_drop_down.value}",
                f"--name={self.input_name.get().value}",
                f"--description={description_content}",
                f"--workspace={CURRENT_WORKSPACE_ID}"
            ]
            if self.input_instance_id.get().value != "":
                commandList.append(
                    f"--instance-id={self.input_instance_id.get().value}")
            if self.input_post_startup_script.get().value != "":
                commandList.append(
                    f"--post-startup-script={self.input_post_startup_script.get().value}")

            print('Running command:')
            print('\n'.join(commandList))
            print('')

            result = subprocess.run(
                commandList, capture_output=True, text=True)
            # Show stdout if there is any; otherwise show the error output.
            print(result.stdout or result.stderr)


create_gcp_notebook_widget = CreateGcpNotebookWidget()
display(create_gcp_notebook_widget.vb)

## Provenance

Generate information about this notebook environment and the packages installed.


In [None]:
!date

Conda and pip installed packages:


In [None]:
!conda env export

JupyterLab extensions:


In [None]:
!jupyter labextension list

Number of cores:


In [None]:
!grep ^processor /proc/cpuinfo | wc -l

Memory:


In [None]:
!grep "^MemTotal:" /proc/meminfo
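
If you would rather collect the core count and memory from Python than from shell commands, a sketch along these lines may work. It assumes a Linux environment where `/proc/meminfo` exists, and it uses `os.cpu_count()`, which reports logical CPUs. The function name `parse_mem_total_kb` is hypothetical.

```python
import os


def parse_mem_total_kb(meminfo_text):
    """Return the MemTotal value in kB from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:       16384256 kB".
            return int(line.split()[1])
    return None


print("Number of cores:", os.cpu_count())

# /proc/meminfo is only available on Linux, so guard the read.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as meminfo_file:
        print("MemTotal (kB):", parse_mem_total_kb(meminfo_file.read()))
```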

---

Copyright 2022 Verily Life Sciences LLC

Use of this source code is governed by a BSD-style  
license that can be found in the LICENSE file or at  
https://developers.google.com/open-source/licenses/bsd
