# Download a Dataset's available assets

The script in this notebook retrieves download links for a Dataset and then uses those links to download all available assets.

Fetching a Dataset requires only the Collection id and the Dataset id; it does not require an API key/access token.

Note: This notebook covers downloading assets for the current version of a Dataset. For public Collections, that is the most recently published version. For private Collections, it is the most recently successfully processed version. This notebook does not cover downloading previously published versions of a revised Dataset.

Note: The 'assets' field is present in the response of every endpoint that returns full dataset or dataset version metadata. You may adapt this guide to download the 'assets' returned by any of those endpoints.

### Import dependencies

In [None]:
import requests

#### <font color='#bc00b0'>Please fill in the required values:</font>

<font color='#bc00b0'>(Required) Enter the id of the Collection that contains the Dataset for which you want to download assets</font>

_The Collection id can be found by looking at the url path in the address bar 
when viewing your Collection in the CZ CELLxGENE Discover data portal: `/collections/{collection_id}`._

In [None]:
collection_id = "01234567-89ab-cdef-0123-456789abcdef"

<font color='#bc00b0'>(Required) Enter the id of the Dataset for which you want to download assets</font>

_The Dataset id can be found by using the `/collections/{collection_id}` endpoint and filtering for the Dataset of interest, or by looking at the url path in the address bar when viewing your Dataset in the CZ CELLxGENE Explorer browser tool: `/e/{dataset_id}.cxg/`._

In [None]:
dataset_id = "abcdef01-2345-6789-abcd-ef0123456789"
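
The first option above, filtering the `/collections/{collection_id}` response for the Dataset of interest, could look like the minimal sketch below. It assumes the parsed response has a `datasets` list whose entries carry `dataset_id` and `title` keys. Those field names and the helper `find_dataset_ids` are assumptions made for illustration, so check them against an actual response before relying on them.

```python
def find_dataset_ids(collection, title_substring):
    """Return (dataset_id, title) pairs whose title contains title_substring.

    Hypothetical helper: assumes `collection` is the parsed JSON of the
    collection endpoint and has a "datasets" list whose entries have
    "dataset_id" and "title" keys.
    """
    needle = title_substring.lower()
    matches = []
    for dataset in collection.get("datasets", []):
        title = dataset.get("title") or ""
        if needle in title.lower():  # case-insensitive substring match
            matches.append((dataset.get("dataset_id"), title))
    return matches


# Illustrative stand-in for a parsed collection response (made-up values)
example_collection = {
    "datasets": [
        {"dataset_id": "abcdef01-2345-6789-abcd-ef0123456789", "title": "Lung atlas"},
        {"dataset_id": "11111111-2222-3333-4444-555555555555", "title": "Kidney cortex"},
    ]
}
print(find_dataset_ids(example_collection, "lung"))
```

With a real Collection, `example_collection` would be replaced by the parsed JSON returned by the collection endpoint.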

### Specify domain (and API url)

In [None]:
domain_name = "cellxgene.cziscience.com"
site_url = f"https://{domain_name}"
api_url_base = f"https://api.{domain_name}"

### Formulate request and fetch dataset metadata

In [None]:
dataset_path = f"/curation/v1/collections/{collection_id}/datasets/{dataset_id}"
url = f"{api_url_base}{dataset_path}"
res = requests.get(url=url)
res.raise_for_status()
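
If only some file types are needed, the 'assets' list in the fetched metadata can be narrowed before any download starts. The sketch below is one possible approach, not part of the original notebook. The helper `select_assets` is hypothetical, and the `filetype` values shown (`H5AD`, `RDS`) are assumptions, so inspect the values in your own response.

```python
def select_assets(assets, filetypes):
    """Keep only assets whose 'filetype' matches one of `filetypes`.

    Hypothetical helper; the comparison is case-insensitive.
    """
    wanted = {ft.upper() for ft in filetypes}
    return [a for a in assets if str(a.get("filetype", "")).upper() in wanted]


# Made-up example entries using the keys this notebook reads from each asset
example_assets = [
    {"filename": "local.h5ad", "filetype": "H5AD", "url": "https://example.org/local.h5ad"},
    {"filename": "local.rds", "filetype": "RDS", "url": "https://example.org/local.rds"},
]
h5ad_only = select_assets(example_assets, ["h5ad"])
print([a["filename"] for a in h5ad_only])
```

The filtered list has the same shape as the original, so it can be iterated over in place of the full 'assets' list.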

### Use download links to download assets

In [None]:
assets = res.json()["assets"]
# Alternatively, you may parse the response of any endpoint that returns full dataset or dataset version metadata for the 'assets' field,
# and pass that into the loop below to download all its returned dataset assets
for asset in assets:
    download_filename = f"{collection_id}_{dataset_id}_{asset['filename']}"
    print(f"\nDownloading {asset['filetype']} file to {download_filename}... ")
    with requests.get(asset["url"], stream=True) as res:
        res.raise_for_status()
        filesize = int(res.headers["Content-Length"])
        with open(download_filename, "wb") as df:
            total_bytes_received = 0
            for chunk in res.iter_content(chunk_size=1024 * 1024):
                df.write(chunk)
                total_bytes_received += len(chunk)
                # Percentage of the file downloaded so far, rounded to one decimal place
                percent_of_total_download = round(total_bytes_received / filesize * 100, 1)
                color = "\033[38;5;10m" if percent_of_total_download == 100 else ""
                print(f"\033[1m{color}{percent_of_total_download}% downloaded\033[0m\r", end="")
print("\n\nDone downloading assets")
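
The loop above reads `Content-Length` directly from the response headers, which raises a `KeyError` if a server omits that header. A more defensive variant might fall back to a running byte count when the total size is unknown. The helpers below, `parse_content_length` and `format_progress`, are hypothetical names introduced for this sketch and are not part of the original notebook.

```python
def parse_content_length(headers):
    """Return Content-Length as an int, or None when absent or malformed."""
    value = headers.get("Content-Length")
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def format_progress(bytes_received, total_bytes=None):
    """Return a progress string; use a byte count when the total is unknown."""
    if total_bytes:  # None or 0 means the total size is unknown
        percent = min(bytes_received / total_bytes * 100, 100.0)
        return f"{percent:.1f}% downloaded"
    return f"{bytes_received / (1024 * 1024):.1f} MiB downloaded"


print(format_progress(512, parse_content_length({"Content-Length": "1024"})))
print(format_progress(3 * 1024 * 1024, parse_content_length({})))
```

Inside the download loop, `filesize` would then come from `parse_content_length(res.headers)` and the printed message from `format_progress(total_bytes_received, filesize)`.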