# Upload a local datafile to add or replace a Dataset in a Collection

The script in this notebook uploads a local datafile to a given Collection (identified by its Collection uuid), where the datafile becomes a Dataset accessible via the Data Portal UI.

In order to use this script, you must...
- have a Curation API key (obtained from the dropdown in the upper-right corner of the Data Portal UI after logging in)
- know the id of the Collection to which you wish to upload the datafile (taken from `/collections/<collection_id>` in the url path in the Data Portal UI)

**For new Dataset uploads**:
- You must decide upon a string tag (the `curator_tag`) to use to uniquely identify the resultant Dataset within its Collection. This tag must *NOT* be the tag of an existing Dataset within this Collection (read on below), and it must _NOT_ conform to the uuid format.

**For replacing/updating existing Datasets**:
- Uploads to a curator tag for which there already exists a Dataset in the given Collection will result in the existing Dataset being replaced by the new Dataset created from the datafile that you are uploading.
- Alternatively, while not necessarily recommended, an existing dataset _may_ be targeted for replacement by using the Dataset's Cellxgene uuid as the tag when writing to S3.
- You can only add/replace Datasets in private Collections or revision Collections.

For all uploads, the `.h5ad` suffix must be appended to the tag in the S3 write key. See example below.
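
The tag rules above can be checked locally before anything is sent to S3. The following is a minimal sketch and not part of the original script: `is_uuid` and `check_curator_tag` are hypothetical helper names, and the uuid pattern assumes the canonical 8-4-4-4-12 hexadecimal layout.

```r
# Hypothetical helpers (not part of the Curation API) that check a tag
# against the rules described above.

# TRUE if `x` looks like a canonical 8-4-4-4-12 hexadecimal uuid (assumed layout).
is_uuid <- function(x) {
  pattern <- "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
  grepl(pattern, tolower(x))
}

# Stops if the ".h5ad" suffix is missing. Otherwise returns "uuid_replace"
# when the tag (minus the suffix) is a uuid, and "tag_upload" for any other tag.
check_curator_tag <- function(tag) {
  if (!grepl("\\.h5ad$", tag)) {
    stop("curator_tag must end with '.h5ad': ", tag)
  }
  stem <- sub("\\.h5ad$", "", tag)
  if (is_uuid(stem)) "uuid_replace" else "tag_upload"
}

check_curator_tag("arbitrary/tag/chosen-by-you.h5ad")
check_curator_tag("01234567-89ab-cdef-0123-456789abcdef.h5ad")
```

A result of `"uuid_replace"` is a cue to confirm that replacing an existing Dataset by its uuid is really what you intend, since a tag for a new Dataset must not be in uuid format.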

#### <font color='#bc00b0'>Please fill in the required values:</font>

<font color='#bc00b0'>(Required) Provide the path to your api key file</font>

In [None]:
api_key_file <- "path/to/api-key.txt"

<font color='#bc00b0'>(Required) Provide the absolute path to the h5ad datafile to upload</font>

In [None]:
filename <- "/absolute/path/to-datafile.h5ad"

<font color='#bc00b0'>(Required) Enter your chosen `curator_tag`, which will serve as a unique identifier _within this Collection_ for the resultant Dataset. **Must possess the '.h5ad' suffix**.</font>
    
_We recommend using a tagging scheme that 1) makes sense to you, and 2) will help organize and facilitate your 
automation of future uploads for adding new Datasets and replacing existing Datasets._

In [None]:
curator_tag <- "arbitrary/tag/chosen-by-you.h5ad"

<font color='#bc00b0'>(Required) Enter the uuid of the Collection to which you wish to add this datafile as a Dataset</font>

_The Collection uuid can be found by looking at the url path in the address bar 
when viewing your Collection in the UI of the Data Portal website:_ `collections/{collection_id}`_. You can only add/replace Datasets in private Collections or revision Collections (and not public ones)._

In [None]:
collection_id <- "01234567-89ab-cdef-0123-456789abcdef"

### Import dependencies

In [None]:
library("readr")
library("aws.s3")
library("httr")
library("stringr")

### Use API key to obtain a temporary access token

In [None]:
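# Read the API key, trimming any trailing newline or whitespace in the key file
api_key <- trimws(read_file(api_key_file))
url <- "https://api.cellxgene.dev.single-cell.czi.technology/curation/v1/auth/token"
res <- POST(url=url, add_headers(`x-api-key`=api_key))
# Abort with an informative error on a non-2xx response
stop_for_status(res)
access_token <- content(res)$access_token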
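
If the token request succeeds but the parsed body has no usable `access_token` field, the failure would otherwise only surface in the next request. A minimal sketch of a guard follows; `extract_access_token` is a hypothetical helper, and it assumes the parsed response body is a named list, as `content(res)` returns for a JSON object.

```r
# Hypothetical helper: pull the token out of a parsed response body
# (a named list) and stop early if it is missing or empty.
extract_access_token <- function(body) {
  token <- body$access_token
  if (is.null(token) || !is.character(token) || length(token) != 1 || !nzchar(token)) {
    stop("Response did not contain a usable 'access_token' field")
  }
  token
}

# Example with a stand-in body; in the notebook this would be content(res).
extract_access_token(list(access_token = "example-token"))
```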

##### (optional, debug) verify status code of response

In [None]:
print(res$status_code)

### Retrieve temporary S3 write credentials. These credentials will only work for _this_ Collection.

In [None]:
s3_credentials_url <- str_interp("https://api.cellxgene.dev.single-cell.czi.technology/curation/v1/collections/${collection_id}/datasets/s3-upload-credentials")
bearer_token <- str_interp("Bearer ${access_token}")
res <- POST(url=s3_credentials_url, add_headers(`Authorization`=bearer_token))
stop_for_status(res)
res_content <- content(res)
access_key_id <- res_content$Credentials$AccessKeyId
secret_access_key <- res_content$Credentials$SecretAccessKey
session_token <- res_content$Credentials$SessionToken
upload_path <- res_content$UploadPath

### Extract formatted upload path from credentials endpoint response

In [None]:
bucket <- res_content$Bucket
key_prefix <- res_content$UploadKeyPrefix
upload_key <- paste(key_prefix, curator_tag, sep="")
print(str_interp("Full S3 write path is s3://${bucket}/${upload_key}"))

### Upload file using temporary S3 credentials

In [None]:
Sys.setenv(
    "AWS_ACCESS_KEY_ID" = access_key_id,
    "AWS_SECRET_ACCESS_KEY" = secret_access_key,
    "AWS_SESSION_TOKEN" = session_token,
    "AWS_DEFAULT_REGION" = "us-west-2"
)
put_object(file=filename, object=upload_key, bucket=bucket)
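
Datafiles in `.h5ad` format can be large, and a single S3 PUT request is limited to 5 GB, so larger files need a multipart upload. The sketch below shows one way to switch on the `multipart` argument of `aws.s3::put_object` based on file size. `needs_multipart` and `upload_datafile` are hypothetical helper names, and the 100 MB threshold is an assumption chosen for illustration, not a value prescribed by the Data Portal.

```r
# Hypothetical helper: TRUE when the file is larger than `threshold_bytes`.
# The 100 MB default is an illustrative assumption.
needs_multipart <- function(path, threshold_bytes = 100 * 1024^2) {
  size <- file.size(path)
  if (is.na(size)) stop("File not found: ", path)
  size > threshold_bytes
}

# Hypothetical wrapper around aws.s3::put_object. It relies on the AWS_*
# environment variables having been set from the temporary credentials.
upload_datafile <- function(path, key, bucket) {
  aws.s3::put_object(
    file = path,
    object = key,
    bucket = bucket,
    multipart = needs_multipart(path)
  )
}
```

With the variables defined earlier in this notebook, the call would be `upload_datafile(filename, upload_key, bucket)`.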