# Copy Datasets Using Globus


Each attendee has access to their own Guest Collection, which is a shared endpoint that can be used to host and share data.


To start, execute the cell below to make sure you're in the correct working directory.

In [None]:
import os
current_folder = %pwd
# %pwd returns the full path, so compare only its last component
if os.path.basename(current_folder) != "cheapandfair-template":
    %cd cheapandfair-template

## Log In to Globus

Go to <a href="https://www.globus.org" target="_blank">globus.org</a> and click the "Log In" button at the top right of the page.

<div>
<img alt="Screenshot of the Globus landing page" src="./img/globus-login.png" style="width: 50%; height: 50%" />
</div>

If you aren't currently logged in, select the "Sign in with GitHub" option. If you have an existing Globus account, read this guide on how to <a href="https://docs.globus.org/guides/tutorials/manage-identities/link-to-existing/" target="_blank">link your GitHub identity to your current Globus account</a>.

<div>
<img alt="Screenshot of selecting the identity provider" src="./img/select-github.png" style="width: 50%; height: 50%" />
</div>

## Join the Cheap and FAIR Tutorial Users Globus Group

Click this link: <a href="https://app.globus.org/groups/fad784e0-67dd-11ef-87ff-09715fb135c2/join" target="_blank">Cheap and FAIR Tutorial Users</a>.

If you have multiple identities in your account, select your GitHub identity. If needed, enter your name and organization, and acknowledge the example Terms and Conditions.

<div>
<img alt="Screenshot of joining the Group" src="./img/join-group.png" style="width: 50%; height: 50%" />
</div>

After you request to join the Group, the instructors will approve your request.

## Create a Group for Controlling Data Access

While we're looking at Globus Groups, we'll create one to be used later to give a limited set of users access to some datasets.

Go to the <a href="https://app.globus.org/groups" target="_blank">Globus Groups page</a> and click on the "Create new group" link on the top right of the page.

<div>
<img alt="Screenshot of the Globus Groups page" src="./img/create-group.png" style="width: 50%; height: 50%" />
</div>

Define a name for your Group. Short and clear names are best. Check the box that allows users to request to join. We also recommend that membership lists are restricted.

<div>
<img alt="Screenshot of creating a Globus Group" src="./img/group-info.png" style="width: 50%; height: 50%" />
</div>

Once the Group is created you'll see an overview page for it. In the body of the page there's a UUID. Click the blue copy icon and paste it into the cell below to replace the `xxxx...` text beside `GROUP=` in the configuration file.

<div>
<img alt="Screenshot of a Globus Group overview page" src="./img/group-page.png" style="width: 50%; height: 50%" />
</div>
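
Before pasting the value into the configuration cell, it can help to confirm that what you copied is a well-formed UUID. The following is a minimal sketch using only the Python standard library; `is_canonical_uuid` is a hypothetical helper written for this illustration and is not part of the tutorial code.

```python
import uuid


def is_canonical_uuid(value: str) -> bool:
    """Return True if value is a UUID in the hyphenated 36-character form."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and un-hyphenated hex, so compare
    # against the canonical string form to reject those variants.
    return str(parsed) == value.lower()


# The placeholder text from the configuration file is rejected.
print(is_canonical_uuid("x" * 36))
# The Group UUID that appears in the join link above is accepted.
print(is_canonical_uuid("fad784e0-67dd-11ef-87ff-09715fb135c2"))
```

A check like this only catches copy-and-paste mistakes such as a stray space or a truncated value; it cannot tell whether the UUID belongs to your Group.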

In [None]:
%%file config.toml
# This is the UUID of the Globus Group you created at the beginning of this notebook
GROUP='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

## Finding Your Guest Collection

You each have been granted the access manager role on a Globus Guest Collection. This allows you to read and write anywhere in the Collection, and to set permissions on folders in the Collection. The [Globus documentation](https://docs.globus.org/api/transfer/permissions/) has more on how the various roles and permissions work.

To find your Collection, go to the Collections tab on the left side of the Globus webapp, or [click this link](https://app.globus.org/collections).

<div>
<img alt="Screenshot of the Globus Collections button" src="./img/collection-icon.png"/>
</div>

Select the "Shareable By You" filter and look for the Collection named "Cheap and FAIR Tutorial Collection <animal name>". In this example the animal name is "Caribou". Each Collection has a unique animal name.

<div>
<img alt="Screenshot of filtering Globus Collections" src="./img/shareable.png" style="width: 50%; height: 50%" />
</div>

For this tutorial, we assume you're putting all of the datasets under a single folder. This isn't required, but it makes things easier.

Modify the following cell and execute it to update the configuration file `config.toml`:

`UUID` is the UUID of the Guest Collection. You can find this and the `DOMAIN` by navigating to the Guest Collection in the Globus web app and looking at the settings page.

<div>
<img alt="Screenshot of a Globus Guest Collection overview" src="./img/my-collection.png" style="width: 50%; height: 50%" />
</div>


The same variables are also needed for the Source Collection, which is the endpoint where the data is currently stored. These do not need to be modified for the purposes of this tutorial.

Define your Collection's base folder using the `FOLDER` variable. Reasonable names include "datasets", "data", or "repository". These folder names are just for hierarchical organization.
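
To illustrate how `DOMAIN` and `FOLDER` fit together, here is a sketch that builds the HTTPS address of a file. It assumes that the Collection has HTTPS access enabled and that files are served at `https://DOMAIN/FOLDER/dataset/file`. The helper `https_url` and the file name `example.fits` are hypothetical and used only for illustration.

```python
from urllib.parse import quote


def https_url(domain: str, folder: str, dataset: str, filename: str) -> str:
    """Join the collection domain, base folder, dataset, and file name."""
    # Strip stray slashes so the pieces join with exactly one "/" between them.
    parts = [part.strip("/") for part in (folder, dataset, filename)]
    path = "/".join(part for part in parts if part)
    # quote() keeps "/" intact and percent-encodes characters such as spaces.
    return f"https://{domain}/{quote(path)}"


# "example.fits" is a made-up file name, not one of the tutorial files.
print(https_url("g-b0852d.c2d0f8.bd7c.data.globus.org", "/datasets/", "cmb", "example.fits"))
```

Because the helper strips slashes itself, it does not matter whether `FOLDER` is written as `/datasets/`, `/datasets`, or `datasets`.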

In [None]:
import json
import copy_dataset

In [None]:
%%file -a config.toml
# The following refer to the destination collection, where the data will be copied to and that will serve as backend for the data portal
UUID='02809c9e-1325-4fb2-8b99-fa9b072b2b64'
FOLDER='/datasets/'
DOMAIN='g-b0852d.c2d0f8.bd7c.data.globus.org'
# collection of the source data, during the tutorial this is the "Cheap and FAIR Tutorial Datasets" collection
SOURCE_UUID='7352d991-b0a0-49a2-830c-e8fe8c968ca2'
SOURCE_FOLDER='/public/datasets/'

## Authenticate with Globus

Log in to Globus using the following command. This will open a browser window where you can log in to Globus and receive a token to paste back into the notebook.

In [None]:
copy_dataset.login();

## Copy Datasets and Create Manifests

We copy a dataset by providing its name. The UUID of the destination Guest Collection and the folder we want it copied to are taken from `config.toml`.

The function returns the manifest of the files that were copied and also writes a copy of the manifest to a file named `{dataset}-manifest.json` in the local directory.
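
Once a copy has finished, the saved manifest can be read back from disk instead of re-running the copy. The sketch below assumes only the `{dataset}-manifest.json` naming described above; `load_manifest` is a hypothetical helper and not part of `copy_dataset`.

```python
import json
from pathlib import Path


def load_manifest(dataset: str, directory: str = "."):
    """Read the manifest file that was written for the given dataset."""
    path = Path(directory) / f"{dataset}-manifest.json"
    with path.open() as handle:
        return json.load(handle)
```

For example, `load_manifest("cmb")` would return the contents of `cmb-manifest.json` in the current directory, and raise `FileNotFoundError` if that dataset has not been copied yet.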

In [None]:
cmb_manifest = copy_dataset.copydataset('cmb')

Let's look at a couple of the entries in the file manifest.

In [None]:
print(json.dumps(cmb_manifest[:2], indent=2))

Now we can copy the other four datasets. You won't need to log in again because the tokens have been cached in `~/.cheapandfair.json`.

In [None]:
dust_manifest = copy_dataset.copydataset('dust')
synch_manifest = copy_dataset.copydataset('synch')
# Use a separate variable per dataset so no manifest is overwritten
cmb_spectra_manifest = copy_dataset.copydataset('cmb_spectra')
dust_synch_spectra_manifest = copy_dataset.copydataset('dust_synch_spectra')

We can see that the manifests were saved locally.

In [None]:
!ls *.json

## Viewing your Data

Let's look at your Collection and see how the datasets are arranged and then set permissions on the folders. Evaluate the next cell and click the link to see your Collection in the File Manager view.

In [None]:
import toml
config = toml.load("config.toml")
url = f'https://app.globus.org/file-manager?origin_id={config["UUID"]}&two_pane=false'
print(url)

You should see a listing of the files and folders in your Collection. You can double-click the folder you specified earlier (e.g., `/datasets/`) to see each dataset folder.

<div>
<img alt="Screenshot of a Globus Collection file listing" src="./img/caribou.png" style="width: 50%; height: 50%" />
</div>
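
The link can also open the File Manager directly inside a dataset folder. This is a sketch, assuming the web app honors an `origin_path` query parameter alongside `origin_id`; `file_manager_url` is a hypothetical helper, and the UUID shown is the example value from the configuration cell.

```python
from urllib.parse import urlencode


def file_manager_url(collection_uuid: str, path: str = "/") -> str:
    """Build a File Manager link that opens the collection at the given path."""
    # urlencode percent-encodes the slashes in the path (as %2F).
    query = urlencode(
        {"origin_id": collection_uuid, "origin_path": path, "two_pane": "false"}
    )
    return f"https://app.globus.org/file-manager?{query}"


print(file_manager_url("02809c9e-1325-4fb2-8b99-fa9b072b2b64", "/datasets/cmb/"))
```

Building the query string with `urlencode` rather than an f-string avoids broken links when a folder name contains spaces or other special characters.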

If you go into a particular folder you can see the files in it. Dataset folders can have subfolders, although these example datasets have files only in one folder.

<div>
<img alt="Screenshot of a Globus Collection file listing" src="./img/caribou-cmb.png" style="width: 50%; height: 50%" />
</div>



## Setting Permissions

Globus Guest Collections allow you to set permissions at the folder level. All subfolders inherit the permissions of the higher-level folder, so you can be less restrictive on subfolders, but you cannot reduce access to a subfolder. This is why the SRDR model suggests that all datasets have a "top" folder, even if the dataset has subfolders. This way you can assign permissions on a per-dataset basis.
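
The inheritance rule can be pictured with a toy model: the access on a path is the union of the permissions granted on that path and on every folder above it. The sketch below is only an illustration of that idea and does not call Globus. The rule table, the principal labels, and the nested path `maps/file.fits` are all made up.

```python
from pathlib import PurePosixPath


def principals_with_read(path: str, rules: dict[str, set[str]]) -> set[str]:
    """Union of principals granted on the path itself or on any parent folder."""
    target = PurePosixPath(path)
    granted: set[str] = set()
    for rule_path, principals in rules.items():
        folder = PurePosixPath(rule_path)
        # A rule applies to its own folder and to everything beneath it.
        if target == folder or folder in target.parents:
            granted |= principals
    return granted


# Hypothetical rules: one Group on cmb, public access on dust.
rules = {
    "/datasets/cmb/": {"my-group"},
    "/datasets/dust/": {"public"},
}
# A nested file inherits the rule set on its dataset folder.
print(principals_with_read("/datasets/cmb/maps/file.fits", rules))
# The parent folder gains nothing from rules set on its children.
print(principals_with_read("/datasets/", rules))
```

In this model there is no way to subtract access further down the tree, which mirrors why each dataset gets its own top folder.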

To see and manage the permissions on the Guest Collection, click the "Permissions" link in the File Manager.

<div>
<img alt="Screenshot of a Globus Collection file listing" src="./img/permissions-button.png" style="width: 50%; height: 50%" />
</div>

The permissions page for each Collection has a consistent URL. You can also evaluate the cell below and click the link.

In [None]:
perm_url = f'https://app.globus.org/file-manager/collections/{config["UUID"]}/sharing'
print(perm_url)

In the permissions tab you can manage the permissions by clicking the "Add Permissions -- Share With" button on the right. You may need to provide another consent.

<div>
<img alt="Screenshot of a Globus Collection permission list" src="./img/share-with.png" style="width: 50%; height: 50%" />
</div>

Again, the URL is consistent so you can use the cell below to create a link directly to the add permissions window.

In [None]:
add_perm_url = f'https://app.globus.org/file-manager/collections/{config["UUID"]}/sharing/create'
print(add_perm_url)

There are five datasets and you'll set permissions on each of them as follows:

- `cmb`: Read-only by the Group you created
- `cmb_spectra`: Read-only by the Group you created
- `dust`: Read-only by the public (anonymous access)
- `synch`: Read-only by the public (anonymous access)
- `dust_synch_spectra`: Read-only by the public (anonymous access)

Permissions can be managed using the Globus API via the Python and JavaScript SDKs, or using the Globus CLI. We're using the webapp user interface so that you have a way to quickly check the state of the permissions at all times.
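
For reference, the plan above could also be written as access rule documents in the shape used by the Globus Transfer API. The sketch below only builds the dictionaries and does not contact Globus. Submitting them would require an authenticated `globus_sdk.TransferClient` and its `add_endpoint_acl_rule` method, which is not shown here. The helpers `access_rule` and `build_rules` are hypothetical, and the Group UUID is a placeholder.

```python
def access_rule(folder: str, dataset: str, principal_type: str, principal: str = "") -> dict:
    """Build one read-only access rule for a dataset folder."""
    base = folder.strip("/")
    # Rule paths are written with a trailing slash because they name folders.
    path = f"/{base}/{dataset}/" if base else f"/{dataset}/"
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,
        "principal": principal,
        "path": path,
        "permissions": "r",
    }


def build_rules(folder: str, group_uuid: str) -> list[dict]:
    """Translate the permission plan for the five datasets into rule documents."""
    group_only = ["cmb", "cmb_spectra"]
    public = ["dust", "synch", "dust_synch_spectra"]
    rules = [access_rule(folder, name, "group", group_uuid) for name in group_only]
    # Anonymous rules carry an empty principal.
    rules += [access_rule(folder, name, "anonymous") for name in public]
    return rules


# Placeholder Group UUID; replace it with the value from config.toml.
for rule in build_rules("/datasets/", "00000000-0000-0000-0000-000000000000"):
    print(rule["path"], rule["principal_type"])
```

Keeping the plan as data like this makes it easy to compare against what the Permissions tab shows, one rule per dataset.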

We'll start with the `cmb` dataset to show how to add read access by a Group.

- Use the "Browse" button to select that folder.
- Set the type of entity to share with to "group".
- Use the "Select a Group" button to choose your Group.
- Click the "Add Permission" button at the bottom.
- At the prompt, agree to add another permission.

<div>
<img alt="Screenshot of adding a permission to a Globus Collection" src="./img/cmb-perms.png" style="width: 50%; height: 50%" />
</div>

Next, we'll make the `dust` dataset public.

- Use the "Browse" button to select that folder.
- Set the type of entity to share with to "public (anonymous)".
- Click the "Add Permission" button at the bottom.
- At the prompt, agree to add another permission.

<div>
<img alt="Screenshot of adding a permission to a Globus Collection" src="./img/dust-perms.png" style="width: 50%; height: 50%" />
</div>

Go through the other datasets and add permissions according to the list above. When you're done, you can review the permissions on your datasets by checking the Permissions tab.

<div>
<img alt="Screenshot of a Globus Collection permission list" src="./img/all-perms.png" style="width: 50%; height: 50%" />
</div>

You should have one permission line per dataset.

If you need to, you can remove permissions by deleting them. You can't edit a permission, and if you move the data, the permissions don't follow it.