# Download CellxGene AnnData Objects

This notebook downloads the relevant CellxGene AnnData objects and writes them to a user-defined directory. By default, the objects are written to the /anndata_objects directory inside a /results/ directory at the root of the git repo. Some of the other analysis notebooks read AnnData objects from this directory, so run this notebook before running those. Each of those notebooks specifies whether it can be run with this data or whether it serves only as an example of the analyses that were performed.

The CellxGene collection ID is **398e34a9-8736-4b27-a9a7-31a47a67f446**. All data are hosted on CellxGene at: https://cellxgene.cziscience.com/collections/398e34a9-8736-4b27-a9a7-31a47a67f446

This notebook was adapted from the Chan Zuckerberg single-cell-curation docs: https://github.com/chanzuckerberg/single-cell-curation/blob/main/notebooks/curation_api/python_raw/get_dataset.ipynb

In [16]:
import requests
import os

In [17]:
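## User-defined AnnData object directory
## Keep the default if you want to use the output directly with the other notebooks
ANNDATA_OBJECT_DIR = '../results/anndata_objects'
## Create the directory (and any missing parents) so the downloads below can be written
os.makedirs(ANNDATA_OBJECT_DIR, exist_ok=True)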

In [18]:
##Define domain names for cellxgene
domain_name = "cellxgene.cziscience.com"
site_url = f"https://{domain_name}"
api_url_base = f"https://api.{domain_name}"

##Define specific collection ID for this study
collection_id = "398e34a9-8736-4b27-a9a7-31a47a67f446"

##Fetch collection
collection_path = f"/curation/v1/collections/{collection_id}"
collection_url = f"{api_url_base}{collection_path}"
res = requests.get(url=collection_url)
res.raise_for_status()
res_content = res.json()
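
The download cell that follows reads the `datasets`, `assets`, `filetype`, `url`, and `title` fields of the collection response. As a minimal sketch of how those fields fit together, the hypothetical helper below (`list_h5ad_assets` is not part of the CellxGene API or of this notebook) pulls out the H5AD download URLs from a dictionary of that shape. The example payload uses made-up titles and URLs so the cell runs without network access.

```python
from typing import Dict, List, Tuple


def list_h5ad_assets(collection: Dict) -> List[Tuple[str, str]]:
    """Return (dataset title, asset URL) pairs for every H5AD asset."""
    pairs = []
    for dataset in collection.get("datasets", []):
        for asset in dataset.get("assets", []):
            if asset.get("filetype") == "H5AD":
                pairs.append((dataset["title"], asset["url"]))
    return pairs


# Hypothetical payload: titles and URLs are invented, only the field names
# mirror the ones used in this notebook.
example_collection = {
    "datasets": [
        {
            "dataset_id": "dataset-a",
            "title": "kit-A",
            "assets": [
                {"filetype": "H5AD", "url": "https://example.org/a.h5ad"},
                {"filetype": "RDS", "url": "https://example.org/a.rds"},
            ],
        },
        {"dataset_id": "dataset-b", "title": "kit-B", "assets": []},
    ]
}

for title, url in list_h5ad_assets(example_collection):
    print(title, url)
```

Calling `list_h5ad_assets(res_content)` on the real response should give a quick preview of what will be downloaded before any large file transfer starts.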

In [19]:
kits_downloaded = []
for dataset in res_content['datasets']:
    assets = dataset["assets"]
    dataset_id = dataset["dataset_id"]
    kit_name = dataset['title']
    kits_downloaded.append(kit_name)
    for asset in assets:
        ## Only the H5AD (AnnData) asset of each dataset is downloaded
        if asset['filetype'] == 'H5AD':
            download_filename = os.path.join(ANNDATA_OBJECT_DIR, f'{kit_name}_annotated.h5ad')
            print(f"\nDownloading {kit_name} to {download_filename} ... ")
            ## Stream the file in 1 MiB chunks so it is never held in memory all at once
            with requests.get(asset["url"], stream=True) as download_res:
                download_res.raise_for_status()
                filesize = int(download_res.headers["Content-Length"])
                with open(download_filename, "wb") as df:
                    total_bytes_received = 0
                    for chunk in download_res.iter_content(chunk_size=1024 * 1024):
                        df.write(chunk)
                        total_bytes_received += len(chunk)
                        percent_of_total_download = float("{:.1f}".format(total_bytes_received / filesize * 100))
                        ## Print the progress line in green once the download is complete
                        color = "\033[38;5;10m" if percent_of_total_download == 100 else ""
                        print(f"\033[1m{color}{percent_of_total_download}% downloaded\033[0m\r", end="")
    ## Printed once per dataset, after all of its assets have been handled
    print("\n\nDone downloading assets")


Downloading Honeycomb-rep2 to ../results/anndata_objects/Honeycomb-rep2_annotated.h5ad ... 
100.0% downloaded

Done downloading assets

Downloading 10X_FRP-rep1 to ../results/anndata_objects/10X_FRP-rep1_annotated.h5ad ... 
100.0% downloaded

Done downloading assets

Downloading 10X_3-rep1 to ../results/anndata_objects/10X_3-rep1_annotated.h5ad ... 
100.0% downloaded

Done downloading assets

Downloading BD-rep1 to ../results/anndata_objects/BD-rep1_annotated.h5ad ... 
100.0% downloaded

Done downloading assets

Downloading harmony_integrated_data to ../results/anndata_objects/harmony_integrated_data_annotated.h5ad ... 
100.0% downloaded

Done downloading assets

Downloading 10X_5-rep2 to ../results/anndata_objects/10X_5-rep2_annotated.h5ad ... 
100.0% downloaded

Done downloading assets

Downloading 10X_3-rep2 to ../results/anndata_objects/10X_3-rep2_annotated.h5ad ... 
...

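The download loop in this notebook writes each chunk straight to the final file name, so an interrupted run can leave a partial `.h5ad` file that later notebooks might try to open. One possible safeguard, sketched here under assumptions, is to write to a temporary `.part` file and rename it only after every chunk has arrived. The helper name `save_stream` and the `.part` suffix are hypothetical and are not used elsewhere in this repo. The demo feeds in-memory byte chunks instead of a real HTTP response so that it runs offline.

```python
import os
import tempfile
from typing import Iterable, Optional


def save_stream(chunks: Iterable[bytes], dest_path: str,
                expected_size: Optional[int] = None) -> int:
    """Write chunks to dest_path via a temporary '.part' file; return bytes written."""
    part_path = dest_path + ".part"
    received = 0
    try:
        with open(part_path, "wb") as fh:
            for chunk in chunks:
                fh.write(chunk)
                received += len(chunk)
        # Only compare sizes when the caller actually knows the expected size.
        if expected_size is not None and received != expected_size:
            raise IOError(f"expected {expected_size} bytes, received {received}")
        # Move the finished file into place (same directory, so same filesystem).
        os.replace(part_path, dest_path)
    finally:
        # Clean up the partial file if anything above failed.
        if os.path.exists(part_path):
            os.remove(part_path)
    return received


# Offline demo with in-memory chunks standing in for res.iter_content(...).
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "example_annotated.h5ad")
    written = save_stream([b"abc", b"de"], target, expected_size=5)
    print(written, os.path.getsize(target))
```

With `requests`, the chunks would come from `iter_content(chunk_size=1024 * 1024)` and the expected size from `headers.get("Content-Length")`, passing `None` when the header is absent. If the server applies a content encoding such as gzip, the header counts compressed bytes while `iter_content` yields decoded bytes, so the size check would need to be skipped in that case.
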
In [20]:
kits_downloaded = sorted(kits_downloaded)
print(f'Downloaded Kit Data: {kits_downloaded}')

Downloaded Kit Data: ['10X_3-rep1', '10X_3-rep2', '10X_5-rep1', '10X_5-rep2', '10X_FRP-rep1', '10X_FRP-rep2', 'BD-rep1', 'BD-rep2', 'Fluent-rep1', 'Fluent-rep2', 'Fluent-rep3', 'Honeycomb-rep1', 'Honeycomb-rep2', 'Parse-rep1', 'Scale-rep1', 'Scipio-rep1', 'Scipio-rep2', 'harmony_integrated_data']
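
Because the other analysis notebooks read from this directory, it can be useful to confirm that every expected kit has a file on disk before moving on. The sketch below is one way to do that, assuming the `{kit_name}_annotated.h5ad` naming used by the download cell. The helper names `find_local_objects` and `missing_kits` are hypothetical, and the demo uses a temporary directory with an empty placeholder file rather than real data.

```python
import os
import tempfile
from typing import Dict, Iterable, List

SUFFIX = "_annotated.h5ad"  # naming convention used by the download cell


def find_local_objects(anndata_dir: str) -> Dict[str, str]:
    """Map kit name -> file path for every '*_annotated.h5ad' file in anndata_dir."""
    found = {}
    for name in sorted(os.listdir(anndata_dir)):
        if name.endswith(SUFFIX):
            found[name[: -len(SUFFIX)]] = os.path.join(anndata_dir, name)
    return found


def missing_kits(expected: Iterable[str], anndata_dir: str) -> List[str]:
    """Return the expected kit names that have no file in anndata_dir."""
    found = find_local_objects(anndata_dir)
    return sorted(kit for kit in expected if kit not in found)


# Demo with a placeholder file; real use would pass ANNDATA_OBJECT_DIR.
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, "BD-rep1" + SUFFIX), "wb").close()
    print(missing_kits(["BD-rep1", "BD-rep2"], tmp_dir))
```

In this notebook, `missing_kits(kits_downloaded, ANNDATA_OBJECT_DIR)` should return an empty list when every dataset in the collection was written successfully. The check only tests for file presence, so it does not detect truncated files.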
