# Fetch full metadata for a Dataset

The script in this notebook retrieves full metadata for a given Dataset.

Fetching a Dataset requires only the Collection id and the Dataset id; it does not require an API key/access token.

### Import dependencies

In [None]:
library("readr")
library("httr")
library("stringr")
library("rjson")

#### <font color='#bc00b0'>Please fill in the required values:</font>

<font color='#bc00b0'>(Required) Enter the id of the Collection that contains the Dataset for which you want to fetch full metadata</font>

_The Collection id can be found in the URL path in the address bar when viewing your Collection in the CZ CELLxGENE Discover data portal: `/collections/{collection_id}`._

In [None]:
collection_id <- "01234567-89ab-cdef-0123-456789abcdef"

<font color='#bc00b0'>(Required) Enter the id of the Dataset</font>

_The Dataset id can be found either by calling the `/collections/{collection_id}` endpoint and filtering for the Dataset of interest, or in the URL path in the address bar when viewing your Dataset in the CZ CELLxGENE Explorer browser tool: `/e/{dataset_id}.cxg/`._

In [None]:
dataset_id <- "abcdef01-2345-6789-abcd-ef0123456789"
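If you are copying the ids out of the browser, the two URL patterns described above can be matched programmatically. The following is a minimal sketch, assuming the ids are UUID-formatted like the placeholders in this notebook. The helpers `extract_collection_id` and `extract_dataset_id` and the `example_*` variables are hypothetical names used for illustration; they are not part of any CELLxGENE API.

```r
library("stringr")

# Assumption: ids look like UUIDs (8-4-4-4-12 hexadecimal characters)
uuid_pattern <- "[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"

# Hypothetical helper: capture the id in `/collections/{collection_id}`
extract_collection_id <- function(url) {
    str_match(url, str_c("/collections/(", uuid_pattern, ")"))[, 2]
}

# Hypothetical helper: capture the id in `/e/{dataset_id}.cxg`
extract_dataset_id <- function(url) {
    str_match(url, str_c("/e/(", uuid_pattern, ")\\.cxg"))[, 2]
}

example_collection_id <- extract_collection_id(
    "https://cellxgene.cziscience.com/collections/01234567-89ab-cdef-0123-456789abcdef"
)
example_dataset_id <- extract_dataset_id(
    "https://cellxgene.cziscience.com/e/abcdef01-2345-6789-abcd-ef0123456789.cxg/"
)
print(example_collection_id)
print(example_dataset_id)
```

Both helpers return `NA` when the URL does not contain the expected pattern, so it is worth checking the result before assigning it to `collection_id` or `dataset_id`.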

### Specify domain (and API URL)

In [None]:
domain_name <- "cellxgene.cziscience.com"
site_url <- str_interp("https://${domain_name}")
api_url_base <- str_interp("https://api.${domain_name}")

### Formulate request and fetch a Dataset's metadata

In [None]:
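# Build the Dataset endpoint URL from the ids entered above
dataset_path <- str_interp("/curation/v1/collections/${collection_id}/datasets/${dataset_id}")
url <- str_interp("${api_url_base}${dataset_path}")
# No authentication header is needed for this request
res <- GET(url=url, add_headers(`Content-Type`="application/json"))
# Raise an R error if the API returned an HTTP error status
stop_for_status(res)
# Parse the JSON response body into a nested list
res_content <- content(res)
print(res_content)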
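Printing the whole response can produce a lot of output. As a rough sketch, the parsed response (a nested R list) can be summarized one level deep before drilling into it. The helper `describe_fields` is a hypothetical name, and `mock_dataset` is a made-up stand-in for `res_content` that contains only the `dataset_id` and `assets` fields used later in this notebook; the real response has more fields.

```r
# Hypothetical helper: one row per top-level field of a parsed response
describe_fields <- function(parsed) {
    data.frame(
        field = names(parsed),
        class = vapply(parsed, function(x) class(x)[1], character(1)),
        length = vapply(parsed, length, integer(1)),
        stringsAsFactors = FALSE,
        row.names = NULL
    )
}

# Made-up stand-in for `res_content` (illustrative values only)
mock_dataset <- list(
    dataset_id = "abcdef01-2345-6789-abcd-ef0123456789",
    assets = list(
        list(filetype = "H5AD", url = "https://example.org/dataset.h5ad"),
        list(filetype = "RDS", url = "https://example.org/dataset.rds")
    )
)

fields <- describe_fields(mock_dataset)
print(fields)
```

In the notebook itself you would call `describe_fields(res_content)`, or use base R's `str(res_content, max.level = 1)` for a similar overview.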

### Download Dataset assets

The Dataset metadata provides a download URL for every asset associated with the current Dataset version. For public Collections, that is the most recently published version of the Dataset. For private Collections, it is the most recently successfully processed Dataset version.

These download URLs are permalinks to the assets for this particular version of the Dataset. If the Dataset is later revised, fetch the Dataset metadata again to get the asset download links for the latest version.

In [None]:
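# Parse the metadata response once and reuse it
dataset <- content(res)
assets <- dataset$assets
dataset_id <- dataset$dataset_id
for (asset in assets) {
    # Name each file after the Dataset id and the asset's file type
    download_filename <- str_interp("${dataset_id}.${asset$filetype}")
    print(str_interp("Downloading ${download_filename}... "))
    # write_disk() raises an error if the file already exists, unless overwrite=TRUE is passed
    asset_res <- GET(asset$url, write_disk(download_filename), progress())
    stop_for_status(asset_res)
}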
print("Done downloading assets")
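The loop above downloads every asset. If only some file types are needed, or if a previous run already saved some files, it may help to work out what to download first. The following is a minimal sketch under stated assumptions: `plan_downloads` is a hypothetical helper, `mock_dataset` is a made-up stand-in for the parsed response, and the file type value `"H5AD"` is assumed for illustration. Only the `dataset_id`, `assets`, `filetype` and `url` fields used in the loop above are relied on.

```r
# Hypothetical helper: list the assets still to be downloaded as url/path pairs
plan_downloads <- function(dataset, filetypes = NULL, dest_dir = ".") {
    plan <- data.frame(url = character(0), path = character(0), stringsAsFactors = FALSE)
    for (asset in dataset$assets) {
        # Skip file types that were not requested (NULL means keep all)
        if (!is.null(filetypes) && !(asset$filetype %in% filetypes)) next
        path <- file.path(dest_dir, paste0(dataset$dataset_id, ".", asset$filetype))
        # Skip files that a previous run already saved
        if (file.exists(path)) next
        plan <- rbind(plan, data.frame(url = asset$url, path = path, stringsAsFactors = FALSE))
    }
    plan
}

# Made-up stand-in for the parsed Dataset response (illustrative values only)
mock_dataset <- list(
    dataset_id = "abcdef01-2345-6789-abcd-ef0123456789",
    assets = list(
        list(filetype = "H5AD", url = "https://example.org/dataset.h5ad"),
        list(filetype = "RDS", url = "https://example.org/dataset.rds")
    )
)

plan <- plan_downloads(mock_dataset, filetypes = "H5AD", dest_dir = tempdir())
print(plan)

# Each planned row could then be fetched the same way as in the loop above:
# for (i in seq_len(nrow(plan))) {
#     asset_res <- httr::GET(plan$url[i], httr::write_disk(plan$path[i]), httr::progress())
#     httr::stop_for_status(asset_res)
# }
```

Separating the planning step from the download keeps the selection logic easy to check without making any network requests.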