# Upload a local datafile to add or replace a Dataset in a Collection

The script in this notebook uploads a local datafile to a given Collection (identified by its Collection id), where the datafile becomes a Dataset accessible via the CZ CELLxGENE Discover data portal.

To use this script, you must have a Curation API key (obtained from the upper-right dropdown in the CZ CELLxGENE Discover data portal after logging in).

_For **new** Datasets_: You must first create a Dataset separately (see the `create_dataset.ipynb` notebook). Then use the returned Dataset `id` as the suffix of the S3 upload key, appending it to the `UploadKeyPrefix` returned from the `/s3-upload-credentials` endpoint. See the code below, or read more detailed instructions about how to submit Datasets via S3 upload in [the description for the credentials endpoint](https://api.cellxgene.cziscience.com/curation/ui/#/collection/backend.corpora.lambdas.api.v1.curation.collections.collection_id.datasets.upload_s3.get).

_For **replacing/updating** existing Datasets_: Uploading to a Dataset id that is already populated with data replaces the existing Dataset with a new Dataset created from the datafile that you upload.


You can only add/replace Datasets in _private_ Collections or _private Revisions_ of published Collections.
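
For the new-Dataset case described above, the S3 upload key is the `UploadKeyPrefix` with the Dataset id appended. The following is a minimal sketch of that string construction only. The function name `build_upload_key` and the sample prefix are hypothetical, and the sketch assumes the prefix returned by the endpoint already ends with whatever separator the key needs.

```python
def build_upload_key(upload_key_prefix: str, dataset_id: str) -> str:
    """Append a Dataset id to the UploadKeyPrefix to form the S3 upload key."""
    if not upload_key_prefix or not dataset_id:
        raise ValueError("both the key prefix and the Dataset id are required")
    # Assumption: the prefix already carries any trailing separator it needs,
    # so the Dataset id is appended verbatim as the suffix.
    return upload_key_prefix + dataset_id


# Hypothetical prefix; the real value comes from the /s3-upload-credentials response.
example_key = build_upload_key("example-prefix/", "abcdef01-2345-6789-abcd-ef0123456789")
```

In this notebook, `upload_local_datafile` is called with the Collection and Dataset ids directly, so this step is presumably handled inside that helper rather than by the caller.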

### Import dependencies

In [None]:
from src.dataset import upload_datafiles_from_manifest, upload_local_datafile
from src.utils.config import set_api_access_config

#### <font color='#bc00b0'>Please fill in the required values:</font>

<font color='#bc00b0'>(Required) Provide the path to your api key file</font>

In [None]:
api_key_file_path = "path/to/api-key-file"

<font color='#bc00b0'>(Required) Provide the absolute path to the h5ad datafile to upload and, if you are also uploading an ATAC fragment file (optional), its absolute path</font>

In [None]:
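anndata_file_path = "/absolute/path/to-datafile.h5ad"
# Only needed if you run the optional ATAC fragment upload step
atac_fragment_file_path = "/absolute/path/to-datafile.tsv.bgz"
# Filled in by the upload steps, then submitted in the final step
manifest = {}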
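
Because the paths above must be absolute and point at real files, it can help to check them before starting an upload. This is a minimal sketch, not part of the notebook's own helpers: `check_datafile_path` is a hypothetical name, and the suffix check is an assumption based on the example filenames.

```python
from pathlib import Path


def check_datafile_path(path_str: str, expected_suffix: str) -> Path:
    """Return the path if it is absolute, has the expected suffix, and exists."""
    path = Path(path_str)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got: {path_str}")
    # Compare against the full file name so multi-part suffixes like ".tsv.bgz" work.
    if not path.name.endswith(expected_suffix):
        raise ValueError(f"expected a file ending in {expected_suffix}, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"no file found at: {path_str}")
    return path
```

For example, `check_datafile_path(anndata_file_path, ".h5ad")` would raise before any network call if the path were relative or mistyped.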

<font color='#bc00b0'>(Required) Enter the id of the Collection to which you wish to add this datafile as a Dataset</font>

_The Collection id can be found by looking at the URL path in the address bar when viewing your Collection in the CZ CELLxGENE Discover data portal: `/collections/{collection_id}`. You can only add/replace Datasets in private Collections or private Revisions of published Collections. In order to edit a published Collection, you must first create a Revision of that Collection._

In [None]:
collection_id = "01234567-89ab-cdef-0123-456789abcdef"

<font color='#bc00b0'>(Required) Enter the id of the Dataset to which you wish to upload your datafile</font>

_The Dataset id can be found by using the `GET /collections/{collection_id}` endpoint and filtering for the Dataset of interest OR by looking at the URL path in the address bar when viewing your Dataset using the CZ CELLxGENE Explorer browser tool: `/e/{dataset_id}.cxg/`. See the top of this notebook for rules about adding vs updating Datasets._

In [None]:
dataset_id = "abcdef01-2345-6789-abcd-ef0123456789"
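
Both ids can be read off the URL patterns described above, `/collections/{collection_id}` and `/e/{dataset_id}.cxg/`. The sketch below pulls them out with a regular expression. The function names are hypothetical, and it assumes the ids are UUID-shaped, as the placeholder values in this notebook suggest.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: ids look like UUIDs (8-4-4-4-12 hex digits).
_UUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"


def collection_id_from_url(url: str) -> Optional[str]:
    """Extract the id from a /collections/{collection_id} URL, or return None."""
    match = re.search(rf"/collections/({_UUID})", urlparse(url).path)
    return match.group(1) if match else None


def dataset_id_from_explorer_url(url: str) -> Optional[str]:
    """Extract the id from an Explorer /e/{dataset_id}.cxg/ URL, or return None."""
    match = re.search(rf"/e/({_UUID})\.cxg", urlparse(url).path)
    return match.group(1) if match else None
```

Each function returns `None` when the URL does not match the expected pattern, so a mistyped or unrelated URL is not silently turned into an id.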

### Set URL and access token environment variables

In [None]:
set_api_access_config(api_key_file_path)

### Upload AnnData file using temporary S3 credentials

In [None]:
manifest["anndata"] = upload_local_datafile(anndata_file_path, collection_id, dataset_id)

### Upload ATAC Fragment file using temporary S3 credentials (optional)

In [None]:
manifest["atac_fragment"] = upload_local_datafile(atac_fragment_file_path, collection_id, dataset_id)
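
Since the ATAC fragment upload is optional, the manifest holds either one entry or two. The sketch below shows one way to build it conditionally. `build_manifest` is a hypothetical helper, the uploader is passed in as a parameter so the example runs on its own, and nothing is assumed about what the real upload function returns beyond it being stored under the manifest key.

```python
from typing import Callable, Optional


def build_manifest(
    upload: Callable[[str], object],
    anndata_path: str,
    atac_fragment_path: Optional[str] = None,
) -> dict:
    """Upload the AnnData file, and the ATAC fragment file only if a path is given."""
    manifest = {"anndata": upload(anndata_path)}
    if atac_fragment_path:
        # Optional step: skipped entirely when no fragment file is supplied.
        manifest["atac_fragment"] = upload(atac_fragment_path)
    return manifest
```

In this notebook, `upload` could be `lambda path: upload_local_datafile(path, collection_id, dataset_id)`, which mirrors the calls in the cells above.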

### Submit the manifest to the Dataset

In [None]:
upload_datafiles_from_manifest(manifest, collection_id, dataset_id)