# Download a Dataset's available assets

The script in this notebook retrieves download links for a Dataset and then uses those links to download all available assets.

Fetching a Dataset requires only the Collection id and the Dataset id; it does not require an API key/access token.

### Import dependencies

In [133]:
library("readr")
library("httr")
library("stringr")
library("rjson")

#### <font color='#bc00b0'>Please fill in the required values:</font>

<font color='#bc00b0'>(Required) Enter the id of the Collection that contains the Dataset for which you want to fetch download links</font>

_The Collection id can be found by looking at the url path in the address bar 
when viewing your Collection in the CZ CELLxGENE Discover data portal: `/collections/{collection_id}`._

In [134]:
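# Placeholder showing the expected id format
collection_id <- "01234567-89ab-cdef-0123-456789abcdef"
# Id used for the run shown in this notebook; replace it with your own Collection id
collection_id <- "2ec4249f-056e-468b-a1d4-f4767e92e202"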

<font color='#bc00b0'>(Required) Enter the id of the Dataset for which you want to fetch download links</font>

_The Dataset id can be found by calling the `/collections/{collection_id}` endpoint and filtering for the Dataset of interest, or by looking at the url path in the address bar when viewing your Dataset in the CZ CELLxGENE Explorer browser tool: `/e/{dataset_id}.cxg/`._

In [135]:
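# Placeholder showing the expected id format
dataset_id <- "abcdef01-2345-6789-abcd-ef0123456789"
# Id used for the run shown in this notebook; replace it with your own Dataset id
dataset_id <- "f9388592-6cf9-478f-9921-d4cab9237f06"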
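
As a hedged sketch of the first option (filtering the Collection response for the Dataset of interest), the helper below lists the Datasets in a parsed Collection response so you can pick the id you need. It assumes the response carries a `datasets` array whose entries expose `dataset_id` and `title` fields. Those field names, the helper name `list_dataset_ids`, and the endpoint path in the commented-out call are assumptions, not something this notebook confirms.

```r
# Hypothetical helper: given a parsed Collection response (a nested list),
# return a data frame of Dataset ids and titles.
# Assumes each entry of `collection$datasets` has `dataset_id` and, usually, `title`.
list_dataset_ids <- function(collection) {
    datasets <- collection$datasets
    data.frame(
        dataset_id = vapply(datasets, function(d) d$dataset_id, character(1)),
        # A missing title becomes NA rather than an error
        title = vapply(
            datasets,
            function(d) if (is.null(d$title)) NA_character_ else d$title,
            character(1)
        ),
        stringsAsFactors = FALSE
    )
}

# Example against the live API (requires network access and the httr package;
# the path below is an assumption):
# res <- httr::GET(paste0("https://api.cellxgene.cziscience.com/curation/v1/collections/", collection_id))
# httr::stop_for_status(res)
# print(list_dataset_ids(httr::content(res)))
```

Scanning the `title` column of the returned data frame is usually the quickest way to spot the Dataset you want before copying its id into `dataset_id`.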

### Specify domain (and API url)

In [136]:
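domain_name <- "cellxgene.cziscience.com"
# The next line overrides the domain above and points the run shown here at a dev deployment;
# remove it to use cellxgene.cziscience.com
domain_name <- "cellxgene.dev.single-cell.czi.technology"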
site_url <- str_interp("https://${domain_name}")
api_url_base <- str_interp("https://api.${domain_name}")

### Formulate request and fetch a Dataset's assets

In [137]:
assets_path <- str_interp("/curation/v1/collections/${collection_id}/datasets/${dataset_id}/assets")
url <- str_interp("${api_url_base}${assets_path}")
res <- GET(url=url, add_headers(`Content-Type`="application/json"))
stop_for_status(res)
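
The parsed response is used in the next step as a list of assets, each with `filename`, `filetype`, and `presigned_url` fields. If you only want one file type rather than every asset, a small filter like the one below may help. The helper name `filter_assets` and the mock data are illustrative only. `"H5AD"` is the file type that appears in this notebook's output, while `"RDS"` is just a stand-in for a second type.

```r
# Hypothetical helper: keep only the assets whose `filetype` matches one of
# the requested types (case-insensitive).
filter_assets <- function(assets, filetypes) {
    wanted <- toupper(filetypes)
    Filter(function(asset) toupper(asset$filetype) %in% wanted, assets)
}

# Mock data shaped like the parsed response; the urls are placeholders.
mock_assets <- list(
    list(filename = "local.h5ad", filetype = "H5AD", presigned_url = "https://example.com/a"),
    list(filename = "local.rds", filetype = "RDS", presigned_url = "https://example.com/b")
)

h5ad_only <- filter_assets(mock_assets, "h5ad")
print(vapply(h5ad_only, function(asset) asset$filename, character(1)))
```

With the real response, `assets <- filter_assets(content(res), "H5AD")` would restrict the download loop to H5AD files.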

### Use download links to download assets

In [138]:
assets <- content(res)
for (asset in assets) {
    download_filename <- str_interp("${collection_id}_${dataset_id}_${asset$filename}")
    print(str_interp("Downloading ${asset$filetype} file to ${download_filename}... "))
    # AWS presigned urls do not provide "Content-Length" in response to HEAD, so we request
    # only the first byte and read the total size from the "Content-Range" header
    # (expected form: "bytes 0-0/<total>")
    size_res <- GET(asset$presigned_url, add_headers(Range = "bytes=0-0"))
    stop_for_status(size_res)
    filesize <- as.numeric(str_extract(size_res$headers[["content-range"]], "[0-9]+$"))
    total_bytes_received <- 0
    df <- file(download_filename, "wb")
    res <- GET(
        asset$presigned_url,
        write_stream(function(bytes) {
            writeBin(bytes, df)
            # `<<-` updates the counter defined outside the callback; plain `<-` would
            # create a new local variable on every call and the total would never grow
            total_bytes_received <<- total_bytes_received + length(bytes)
            percent_of_total_download <- trunc(total_bytes_received / filesize * 100 * 10) / 10
            # "\r" returns to the start of the line so the progress message updates in place
            cat(str_interp("\r${percent_of_total_download}% downloaded"))
            length(bytes)
        })
    )
    cat("\n")
    close(df)
    stop_for_status(res)
}
print("Done downloading assets")

[1] "Downloading H5AD file to 2ec4249f-056e-468b-a1d4-f4767e92e202_f9388592-6cf9-478f-9921-d4cab9237f06_local.h5ad... "
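
The download cell above looks up the file size with a ranged GET before streaming. When a server honors a request such as `Range: bytes=0-0`, it typically replies with status 206 and a `Content-Range` header of the form `bytes 0-0/<total>`, where `<total>` is the full object size. A minimal sketch of extracting that total follows. The helper name `total_size_from_content_range` is hypothetical, and whether the presigned urls used here always honor range requests is an assumption.

```r
# Hypothetical helper: extract the total object size from a Content-Range
# header value such as "bytes 0-0/1048576". Returns NA when the header is
# missing, malformed, or reports an unknown total ("*").
total_size_from_content_range <- function(content_range) {
    if (is.null(content_range) || length(content_range) != 1) {
        return(NA_real_)
    }
    match <- regmatches(
        content_range,
        regexec("/([0-9]+)[[:space:]]*$", content_range)
    )[[1]]
    if (length(match) < 2) {
        return(NA_real_)
    }
    # as.numeric avoids integer overflow for files larger than 2 GiB
    as.numeric(match[2])
}

print(total_size_from_content_range("bytes 0-0/1048576"))
print(total_size_from_content_range("bytes 0-0/*"))
```

Returning `NA` instead of failing lets a caller skip the percentage display and still download the file when the size cannot be determined.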