# Working with data collections

<table align="left">

  <td>
    <a href="https://github.com/DataBiosphere/terra-axon-examples/blob/main/first_hour_on_terra/working_with_data_collections.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://github.com/DataBiosphere/terra-axon-examples/main/first_hour_on_terra/working_with_data_collections.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in a Terra notebook instance
    </a>
  </td>                                                                                               
</table>

## Overview

This notebook provides examples of working with data collections in Terra. Build upon the best practices described in this notebook to create and share your own data collections. 

### Objective

Perform common workspace resource operations including:

1. Create a new data collection from cloud data.
1. Share the data collection with collaborators.
1. Add the data collection as a resource to a workspace.

#### How to run this notebook

Please run the setup section before running any other section in this notebook.

#### Costs

This notebook takes less than a minute to run, which will typically cost less than $0.01 of compute time on your cloud environment.

### Setup

Run the cell below to capture the ID of the current workspace. You'll use this value to return to the current workspace after you've created a new workspace as part of the process of creating a data collection.

In [None]:
import json
import os

import ipywidgets as widgets
from ipywidgets import Button, Layout
from IPython.display import display

CURRENT_WORKSPACE_ID = !terra workspace describe --format=json | jq --raw-output ".id"
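The `jq` filter above reads the top-level `id` field from the JSON that `terra workspace describe --format=json` prints. If `jq` isn't available in your cloud environment, the following minimal pure-Python sketch performs the same extraction. It assumes only what the `jq` filter already assumes: the output is a JSON object with an `id` key. The helper names `extract_workspace_id` and `current_workspace_id` and the sample JSON are hypothetical.

```python
import json
import subprocess


def extract_workspace_id(describe_json: str) -> str:
    """Return the top-level "id" field from `terra workspace describe --format=json` output.

    Mirrors `jq --raw-output ".id"`, except that a missing key raises KeyError
    instead of printing "null".
    """
    workspace = json.loads(describe_json)
    return workspace["id"]


def current_workspace_id() -> str:
    """Run the Terra CLI (must be on PATH) and return the current workspace ID."""
    result = subprocess.run(
        ["terra", "workspace", "describe", "--format=json"],
        capture_output=True, text=True, check=True,
    )
    return extract_workspace_id(result.stdout)


# Hypothetical sample output, trimmed to the fields this helper needs.
sample = '{"id": "my-analysis-ws", "name": "My analysis"}'
print(extract_workspace_id(sample))
```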

## Create a data collection

The process of creating a data collection requires you to specify the following:
1. What data do you want to share? What types of resources (Cloud Storage buckets or objects, BigQuery tables, GitHub repositories) will this data collection make available?
1. With whom do you wish to share this data? Will you be sharing the data collection with all members of an existing Terra group (e.g. for your organization or team), or will you need to create a new Terra group in order to restrict access to the data collection?
1. Will you update the data collection by releasing future versions?

<div class="alert alert-block alert-success">
<b>Note:</b> 
    If you'd like to restrict access to your data collection to members of a specific group, you'll need to provide the <a href="https://et-docs-tests.googleplex.com/docs/reference/glossary/#policy">group policy constraint</a> at the time of workspace creation. See <a href="../creating_a_group.ipynb">../creating_a_group.ipynb</a> for details on how to create a Terra group that can be used for group policy constraints on workspaces and data collections. </div>

### Create a new workspace

To create a data collection, you must first create a new workspace. Run the cell below to create one. The output should resemble:

```
Workspace successfully created.
ID:                <GOOGLE_PROJECT_ID>-dc-ws
Name:              <TERRA_USER_EMAIL>-Data
Description:       A new workspace which I will transform into a data collection.
Cloud Platform:    GCP
Google project:    <GOOGLE_PROJECT_ID>
Cloud console:     https://console.cloud.google.com/home/dashboard?project=<GOOGLE_PROJECT_ID>
Properties:
  terra-type: workspace
Created:           YYYY-MM-DD
Last updated:      YYYY-MM-DD
# Resources:       0
```

In [None]:
import shlex
import subprocess

import ipywidgets as widgets
from ipywidgets import Layout
from IPython.display import display

input_style = {'description_width': 'initial'}
buttonOutput = 'Please provide appropriate values in the input boxes.'

input_workspace_id = widgets.Text(
    placeholder="<WORKSPACE_ID>",
    description="Workspace ID:",
    style=input_style
)
# Holds a copy of the workspace ID for use in later cells.
output_workspace_id = widgets.Text()
input_description = widgets.Text(
    placeholder="<DESCRIPTION>",
    description="Description:",
    style=input_style
)
input_name = widgets.Text(
    placeholder="<WORKSPACE_NAME>",
    description="Workspace name:",
    style=input_style
)
display(input_workspace_id)
display(input_description)
display(input_name)

# Copy the workspace ID into output_workspace_id whenever it changes.
def bind_input_to_output(change):
    output_workspace_id.value = change['new']

# Output area for the command and its result.
output = widgets.Output()

# Run `terra workspace create` with the values from the input boxes.
def button_click_event(b):
    with output:
        terra_command = [
            "terra", "workspace", "create",
            f"--id={input_workspace_id.value}",
            f"--description={input_description.value}",
            f"--name={input_name.value}",
        ]
        print(shlex.join(terra_command))
        # Capture the CLI output so it appears in this notebook instead of the server log.
        result = subprocess.run(terra_command, capture_output=True, text=True)
        print(result.stdout)
        if result.stderr:
            print(result.stderr)

input_workspace_id.observe(bind_input_to_output, names='value')
button = widgets.Button(
    description='Create workspace',
    disabled=False,
    button_style='',
    tooltip='Click to create a new workspace',
    icon='check',
    layout=Layout(width='50%', height='40px')
)

# Bind button_click_event to the button's click event.
button.on_click(button_click_event)

# Show the instructions.
print(buttonOutput)

# Display the button and its output area.
display(button, output)
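The widget cell above wraps a single CLI call. If you prefer to script workspace creation directly, the following minimal sketch builds the same `terra workspace create` argument list from plain strings. Because the list goes to `subprocess.run` without a shell, descriptions that contain spaces or quotes need no escaping. The function names and example values are hypothetical. The flags `--id`, `--description`, and `--name` are the ones used in the cell above.

```python
import shlex
import subprocess
from typing import List


def build_create_workspace_args(workspace_id: str, description: str, name: str) -> List[str]:
    """Assemble the argument list for `terra workspace create`."""
    return [
        "terra", "workspace", "create",
        f"--id={workspace_id}",
        f"--description={description}",
        f"--name={name}",
    ]


def create_workspace(workspace_id: str, description: str, name: str) -> str:
    """Run `terra workspace create` and return its stdout; raises on a non-zero exit."""
    args = build_create_workspace_args(workspace_id, description, name)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical values; print the shell-equivalent command without running it.
args = build_create_workspace_args(
    "my-dc-ws", "A new workspace which I will transform into a data collection.", "My data"
)
print(shlex.join(args))
```

Passing `check=True` turns a failed CLI call into a `CalledProcessError`, so a script stops instead of continuing against a workspace that was never created.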

### Add resources to data collection

Add new controlled and/or referenced resources to the workspace you created in the previous step, using either the Terra CLI or the Resources tab in the Terra UI. To add resources quickly, use the widgets provided in [../workspace_resource_examples.ipynb](../workspace_resource_examples.ipynb).
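For example, you could reference an existing Cloud Storage bucket from the CLI. The following sketch assumes that the new workspace is the CLI's current workspace. It also **assumes** the Terra CLI subcommand `terra resource add-ref gcs-bucket` with `--name` and `--bucket-name` flags. Confirm both with `terra resource add-ref gcs-bucket --help` before relying on them. The helper names, the resource name `shared_data`, and the bucket URL are hypothetical. The sketch only builds and prints the argument list.

```python
from typing import List


def bucket_name_from_url(gcs_url: str) -> str:
    """Turn 'gs://bucket' or 'gs://bucket/some/path' into 'bucket'."""
    prefix = "gs://"
    if not gcs_url.startswith(prefix):
        raise ValueError(f"expected a gs:// URL, got {gcs_url!r}")
    bucket = gcs_url[len(prefix):].split("/", 1)[0]
    if not bucket:
        raise ValueError(f"no bucket name in {gcs_url!r}")
    return bucket


def build_add_bucket_ref_args(resource_name: str, gcs_url: str) -> List[str]:
    # ASSUMPTION: subcommand and flag names for referencing a bucket in the Terra CLI;
    # verify with `terra resource add-ref gcs-bucket --help`.
    return [
        "terra", "resource", "add-ref", "gcs-bucket",
        f"--name={resource_name}",
        f"--bucket-name={bucket_name_from_url(gcs_url)}",
    ]


args = build_add_bucket_ref_args("shared_data", "gs://my-example-bucket/")
print(args)
# To actually add the reference:
#   import subprocess; subprocess.run(args, check=True)
```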

### Convert workspace to data collection

Run the cell below to set the `terra-type` property of the workspace you created to `data-collection`, along with a short description and a version number. The output should resemble:
```
Workspace properties successfully updated.
ID:                emmarogge-getting-started-data-collection-ws
Name:              emmarogge@google.com-data
Description:       A new workspace which will become a data collection.
Cloud Platform:    GCP
Google project:    terra-vpp-amiable-haricot-324
Cloud console:     https://console.cloud.google.com/home/dashboard?project=terra-vpp-amiable-haricot-324
Properties:
  terra-workspace-short-description: descriptive content
  terra-workspace-version: 1.0
  terra-type: data-collection
Created:           2023-05-03
Last updated:      2023-05-03
# Resources:       0
```

In [None]:
import shlex
import subprocess

import ipywidgets as widgets
from ipywidgets import Layout
from IPython.display import display

input_dc_workspace_id = widgets.Text(
    placeholder="<WORKSPACE_ID>",
    description="Workspace ID:",
    style=input_style,
    value=output_workspace_id.value
)
# Keep this box in sync (kernel-side) with the Workspace ID box from the previous cell.
widgetLink = widgets.link((input_dc_workspace_id, 'value'), (input_workspace_id, 'value'))
input_short_description = widgets.Text(
    placeholder="<SHORT_DESCRIPTION>",
    description="Short description:",
    style=input_style
)
input_version = widgets.Text(
    placeholder="<VERSION>",
    description="Version:",
    style=input_style
)
display(input_dc_workspace_id)
display(input_short_description)
display(input_version)

buttonOutput = 'Please provide appropriate values in the input boxes.'

# Output area for this cell, kept separate from the previous cell's output.
convert_output = widgets.Output()

# Set the properties that mark the workspace as a data collection.
# --properties is a comma-separated list, so avoid commas in the short description.
def convert_button_click_event(b):
    with convert_output:
        terra_command = [
            "terra", "workspace", "set-property",
            f"--workspace={input_dc_workspace_id.value}",
            "--properties=terra-type=data-collection,"
            f"terra-workspace-short-description={input_short_description.value},"
            f"terra-workspace-version={input_version.value}",
        ]
        print(shlex.join(terra_command))
        result = subprocess.run(terra_command, capture_output=True, text=True)
        print(result.stdout)
        if result.stderr:
            print(result.stderr)

button = widgets.Button(
    description='Convert to data collection',
    disabled=False,
    button_style='',
    tooltip='Click to convert to data collection',
    icon='check',
    layout=Layout(width='50%', height='40px')
)

# Bind convert_button_click_event to the button's click event.
button.on_click(convert_button_click_event)

# Show the instructions.
print(buttonOutput)

# Display the button and its output area.
display(button, convert_output)

### Add data collection to workspace

Follow the steps below to add your new data collection to an Enterprise Terra workspace.<br>The video below provides a visual walkthrough of these steps.

1. In the [Enterprise Terra workspace UI](https://terra.verily.com/workspaces), select a workspace.
1. Navigate to the Resources tab.
1. Click the "+ Data catalog" button.
1. Select your newly created data collection from those listed in the modal.
1. Navigate through the steps in the modal to complete the addition of the data collection to your workspace.
1. On a cloud environment terminal in your workspace, run `terra resource list` to confirm the data collection has been added.

<video controls src="screencasts/add_data_collection_to_workspace.mp4" width=600>Add data collection to workspace</video>
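To check the last step from a notebook instead of a terminal, you could look for one of the resource names you added to the data collection. The following sketch **assumes** that `terra resource list` accepts `--format=json`, as `terra workspace describe` does in the Setup cell. It also **assumes** that the JSON is an array of objects, each with a `name` field. Verify both against your CLI version. The helper names and sample JSON are hypothetical.

```python
import json
import subprocess
from typing import List


def resource_names(list_json: str) -> List[str]:
    # ASSUMPTION: a JSON array of objects that each carry a "name" field.
    return [resource["name"] for resource in json.loads(list_json)]


def has_resource(expected_name: str) -> bool:
    # ASSUMPTION: `terra resource list` supports --format=json.
    result = subprocess.run(
        ["terra", "resource", "list", "--format=json"],
        capture_output=True, text=True, check=True,
    )
    return expected_name in resource_names(result.stdout)


# Hypothetical sample output.
sample = '[{"name": "my_data_collection_bucket"}, {"name": "analysis_outputs"}]'
print("my_data_collection_bucket" in resource_names(sample))
```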

## Provenance

Generate information about this notebook environment and the packages installed.

In [None]:
!date

Conda and pip installed packages:

In [None]:
!conda env export

JupyterLab extensions:

In [None]:
!jupyter labextension list

Number of cores:

In [None]:
!grep ^processor /proc/cpuinfo | wc -l

Memory:

In [None]:
!grep "^MemTotal:" /proc/meminfo
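The same core count and memory total can also be captured in Python, for example to store them alongside other results. This sketch uses `os.cpu_count()`, which reports logical CPUs, the same quantity the `grep ^processor` cell counts. It parses the `MemTotal:` line of `/proc/meminfo`, which exists only on Linux. The helper name `mem_total_kb` is hypothetical.

```python
import os
from typing import Optional


def mem_total_kb(meminfo_text: str) -> Optional[int]:
    """Return the MemTotal value (in kB) from /proc/meminfo text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       16384000 kB"
            return int(line.split()[1])
    return None


print("Cores:", os.cpu_count())
try:
    with open("/proc/meminfo") as f:
        print("MemTotal (kB):", mem_total_kb(f.read()))
except FileNotFoundError:
    print("/proc/meminfo is not available on this platform")
```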

---
Copyright 2022 Verily Life Sciences LLC

Use of this source code is governed by a BSD-style   
license that can be found in the LICENSE file or at   
https://developers.google.com/open-source/licenses/bsd