# Download a Dataset's available assets

This notebook offers two approaches to downloading a Dataset's assets:
1) Direct download of _all_ available assets to local files named `{collection_id}_{dataset_id}.{filetype_ext}`

2) Manual retrieval of `presigned_url` download links, so that you can download individual assets yourself

Fetching a Dataset requires only the Collection id and the Dataset id; it does not require an API key/access token.

### Import dependencies

In [None]:
from src.utils.config import set_api_access_config  # still required for setting api url env vars
from src.utils.logger import set_log_level  # can set_log_level("ERROR") for less logging; default is "INFO"
from src.dataset import download_assets, get_download_links_for_assets
import requests

#### <font color='#bc00b0'>Please fill in the required values:</font>

<font color='#bc00b0'>(Required) Enter the id of the Collection that contains the Dataset for which you want to download assets</font>

_The Collection id can be found by looking at the url path in the address bar 
when viewing your Collection in the CZ CELLxGENE Discover data portal: `/collections/{collection_id}`._
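
If you already have the portal url copied, the id can also be pulled out of the path programmatically. The snippet below is only a minimal sketch: `extract_collection_id` is a hypothetical helper (it is not part of the `src` package used in this notebook), and it assumes Collection ids are hyphenated UUIDs like the placeholder value used below.

```python
import re
from urllib.parse import urlparse

# Hypothetical helper, not part of this repo's `src` package.
# Assumption: Collection ids are hyphenated UUIDs.
_COLLECTION_PATH = re.compile(
    r"/collections/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(?:/|$)",
    re.IGNORECASE,
)


def extract_collection_id(url: str) -> str:
    """Return the Collection id found in a `/collections/{collection_id}` url path."""
    path = urlparse(url).path  # drops the scheme, host, query string and fragment
    match = _COLLECTION_PATH.search(path)
    if match is None:
        raise ValueError(f"No Collection id found in url path: {path!r}")
    return match.group(1).lower()
```

For example, `extract_collection_id("https://cellxgene.cziscience.com/collections/01234567-89ab-cdef-0123-456789abcdef")` returns the id portion of the path, and a url without a `/collections/{collection_id}` segment raises a `ValueError` rather than silently returning a wrong value.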

In [None]:
collection_id = "01234567-89ab-cdef-0123-456789abcdef"

<font color='#bc00b0'>(Required) Enter the id of the Dataset for which you want to download assets</font>

_The Dataset id can be found by using the `/collections/{collection_id}` endpoint and filtering for the Dataset of interest OR by looking at the url path in the address bar when viewing your Dataset using the CZ CELLxGENE Explorer browser tool: `/e/{dataset_id}.cxg/`._
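
Both lookup routes described above can be sketched in a few lines. Everything here is illustrative: the two helpers are hypothetical (not part of `src`), the Dataset id is assumed to be a hyphenated UUID, and the shape of the Collection response (a `datasets` list whose entries carry `title` and `dataset_id` keys) is an assumption that should be checked against the actual response of the `/collections/{collection_id}` endpoint.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: Dataset ids are hyphenated UUIDs.
_UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
_EXPLORER_PATH = re.compile(rf"/e/({_UUID})\.cxg(?:/|$)", re.IGNORECASE)


def dataset_id_from_explorer_url(url: str) -> str:
    """Hypothetical helper: read the Dataset id from an `/e/{dataset_id}.cxg/` url path."""
    path = urlparse(url).path
    match = _EXPLORER_PATH.search(path)
    if match is None:
        raise ValueError(f"No Dataset id found in url path: {path!r}")
    return match.group(1).lower()


def dataset_id_by_title(collection: dict, title: str) -> Optional[str]:
    """Hypothetical helper: find a Dataset id in an already-fetched Collection response.

    Assumed response shape: {"datasets": [{"title": ..., "dataset_id": ...}, ...]}.
    Returns None when no Dataset has exactly this title.
    """
    for dataset in collection.get("datasets", []):
        if dataset.get("title") == title:
            return dataset.get("dataset_id")
    return None
```

Matching on the exact title is the simplest possible filter; if several Datasets in a Collection share a title, this sketch returns the first one, so a more specific filter may be needed.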

In [None]:
dataset_id = "abcdef01-2345-6789-abcd-ef0123456789"

### Set url env vars

In [None]:
set_api_access_config()

### 1) Download all assets directly to local files

In [13]:
# Uncomment code below to download all assets

# download_assets(collection_id, dataset_id)

### OR

### 2) Fetch list of assets with download links

In [None]:
# Uncomment code below to fetch download links and manually iterate through the download process

# assets = get_download_links_for_assets(collection_id, dataset_id)
# for asset in assets:
#     download_filename = f"{collection_id}_{dataset_id}_{asset['filename']}"
#     print(f"Downloading {asset['filetype']} file to {download_filename}... ")
#     with requests.get(asset["presigned_url"], stream=True) as res:
#         res.raise_for_status()
#         with open(download_filename, "wb") as df:
#             for chunk in res.iter_content(chunk_size=1024 * 1024):
#                 df.write(chunk)
# print("Done downloading assets")
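
One possible refinement of the manual loop above: if a download is interrupted, writing directly to `download_filename` can leave a truncated file that looks complete. The sketch below writes the chunks to a temporary file in the same directory and only moves it into place once every chunk has been written. `write_chunks_atomically` is a hypothetical helper, not part of `src`, and is offered as one option rather than the recommended approach.

```python
import os
import tempfile
from typing import Iterable


def write_chunks_atomically(chunks: Iterable[bytes], destination: str) -> int:
    """Write byte chunks to a temp file, then move it to `destination`.

    Returns the number of bytes written. If anything fails part-way,
    the temp file is removed and `destination` is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(destination))
    # Create the temp file next to the destination so the final move
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    tmp_file.write(chunk)
                    written += len(chunk)
        os.replace(tmp_path, destination)  # overwrites any existing file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written
```

Inside the loop, the inner `with open(...)` block could then be replaced by `write_chunks_atomically(res.iter_content(chunk_size=1024 * 1024), download_filename)`.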