# Copy Datasets Using Globus


Each attendee has access to their own Guest Collection, which is a shared endpoint that can be used to host and share data.


* Explain how to log in
* Explain how to find the Guest Collection
* Explain how to configure permissions on each folder

Specify the UUID of your Guest Collection in `my_collection_uuid`.

For this tutorial, we assume you're putting all of the datasets under a single folder. This isn't required; it just makes things easier.

Define your collection's base folder using the `my_collection_folder` variable. Reasonable names include "datasets", "data", or "repository". These folder names are only for hierarchical organization.

In [None]:
import json
import copydataset

In [None]:
%cd cheapandfair-template/

In [None]:
%%file ENDPOINT.sh
UUID='18ed636e-0389-44c3-b533-cb3901dfc60f'
FOLDER='/myfolder/'
DOMAIN='g-1926f5.c2d0f8.bd7c.data.globus.org'

In [None]:
import toml
endpoint = toml.load("ENDPOINT.sh")
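The file written above is meant to be readable both by bash (via `source`) and by a TOML parser, which only works when every line has the form `KEY='value'` with no spaces around `=`. Below is a minimal sketch of a check for that format using only the standard library. The helper name `check_endpoint_lines` and the regular expression are illustrative assumptions, not part of `copydataset`.

```python
import re

# KEY='value' with no spaces around "=": valid as a bash assignment and as TOML.
_LINE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*='[^']*'$")

def check_endpoint_lines(text):
    """Return the lines of `text` that are not of the form KEY='value'."""
    bad = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are accepted by both formats
        if not _LINE.match(stripped):
            bad.append(line)
    return bad

# The second line has spaces around "=", which bash would not treat as an assignment.
example = "UUID='18ed636e-0389-44c3-b533-cb3901dfc60f'\nFOLDER = '/myfolder/'\n"
print(check_endpoint_lines(example))
```

An empty list means every line can be used from both bash and Python.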

We copy the dataset by providing the name of the dataset, the UUID of the destination Guest Collection, and the folder we want it copied to.

The method will return the base URL of the Guest Collection and the manifest of the files that were copied. The URL returned will be the same for each of the datasets we copy because they're all going to the same Guest Collection. The method will also write a copy of the manifest to the local directory named `<dataset>-manifest.json`.

## Authenticate with Globus

We use the Globus SDK to copy the example datasets. The first time you run the `copydataset` method, you'll be prompted to log in to Globus to obtain the tokens needed to copy the data and upload the file manifest for each dataset.

In [None]:
url, cmb_manifest = copydataset.copydataset('cmb', endpoint["UUID"], endpoint["FOLDER"])

The URL returned is the base URL of your Guest Collection. This will be used later.

In [None]:
print(url)
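For orientation, Globus HTTPS URLs for a collection consist of the collection's domain followed by a path. The sketch below joins the `DOMAIN` and `FOLDER` values from `ENDPOINT.sh` into such a URL. The helper `https_url` is hypothetical, and the `FOLDER/<dataset>` layout used in the example call is an assumption; check it against the `url` printed above.

```python
from urllib.parse import quote

def https_url(domain, *parts):
    """Join a collection domain and path segments into an HTTPS URL."""
    segments = []
    for part in parts:
        # Drop empty pieces so leading/trailing slashes do not produce "//".
        segments.extend(seg for seg in part.split("/") if seg)
    path = "/".join(quote(seg) for seg in segments)
    return f"https://{domain}/{path}"

# Assumed layout: the dataset lives in a subfolder named after it.
print(https_url("g-1926f5.c2d0f8.bd7c.data.globus.org", "/myfolder/", "cmb"))
```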

Let's look at a couple of the entries in the file manifest.

In [None]:
print(json.dumps(cmb_manifest[:2], indent=2))

Now we can copy the other two datasets. You won't need to log in again because the tokens have been cached in `~/.cheapandfair.json`.

In [None]:
url, dust_manifest = copydataset.copydataset('dust', endpoint["UUID"], endpoint["FOLDER"])
url, synch_manifest = copydataset.copydataset('synch', endpoint["UUID"], endpoint["FOLDER"])

We can see that the manifests were saved locally.

In [None]:
!ls *.json
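Since each manifest is saved as `<dataset>-manifest.json`, it can be reloaded in a later session without copying the data again. Here is a small sketch of that, where `load_manifest` is a hypothetical helper rather than a function provided by `copydataset`:

```python
import json
from pathlib import Path

def load_manifest(dataset, directory="."):
    """Read <dataset>-manifest.json from `directory` and return the parsed JSON."""
    path = Path(directory) / f"{dataset}-manifest.json"
    with path.open() as f:
        return json.load(f)
```

For example, `load_manifest("cmb")` should give back the same content as `cmb_manifest`, provided the file is in the current directory.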

Finally, let's save details about the endpoint in a file so that we can use them in the following steps. This includes the Guest Collection UUID, the domain for HTTPS access, and the root folder of the datasets relative to the root of the Guest Collection.
The file will be used both in a bash script and in Python, so make sure there are no spaces around the `=` sign: