# Getting started

In the current release (Summer 2023), the Allen Brain Cell Atlas includes:
* 1.7 million single cell transcriptomes spanning the whole adult mouse brain using the 10Xv2 method (**WMB-10Xv2**)
* 2.3 million single cell transcriptomes spanning the whole adult mouse brain using the 10Xv3 method (**WMB-10Xv3**)
* Clustering analysis of 4.0 million single cell transcriptomes spanning the whole adult mouse brain combining the 10Xv2 and 10Xv3 datasets (**WMB-10X**)
* A five-level whole adult mouse brain taxonomy of cell types (**WMB-taxonomy**)
* A spatial transcriptomics dataset of 4.0 million cells spanning a single adult mouse brain, measured with a 500-gene panel and mapped to the whole mouse brain taxonomy (**MERFISH-C57BL6J-638850**)


Data associated with the Allen Brain Cell Atlas is hosted on Amazon Web Services (AWS) in an S3 bucket as an AWS Public Dataset.
No account or login is required. The S3 bucket is located here [arn:aws:s3:::allen-brain-cell-atlas](https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/index.html).

Each release has an associated **manifest.json** which lists the specific versions of all directories and files that are part of the release. We recommend using the manifest as the starting point for downloading and using the data.

Expression matrices are stored in the [anndata h5ad format](https://anndata.readthedocs.io/en/latest/) and need to be downloaded to a local file system before use.

In [2]:
import requests
import json
import os

Let's open the manifest file associated with the current release.

In [3]:
url = 'https://allen-brain-cell-atlas.s3-us-west-2.amazonaws.com/releases/20230630/manifest.json'
manifest = json.loads(requests.get(url).text)
print("version: ", manifest['version'])

version:  20230630


At the top level, the manifest consists of the release *version* tag, the S3 *resource_uri*, and two dictionaries, *directory_listing* and *file_listing*. A simple option to download data is to use the **AWS Command Line Interface ([AWS CLI](https://aws.amazon.com/cli/))** to download specific directories. All the example notebooks in this repository assume that data has been downloaded locally in the same file organization as specified by the "relative_path" field in the manifest.

In [4]:
manifest.keys()
print("version:",manifest['version'])
print("resource_uri:",manifest['resource_uri'])

version: 20230630
resource_uri: s3://allen-brain-cell-atlas/
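
Because the rest of the notebook relies on these top-level keys, it can help to check for them right after loading. The following is a minimal sketch, not part of the original notebook: `parse_manifest` and `EXPECTED_KEYS` are hypothetical names, the key list is taken from the description above rather than from a formal schema, and the sample text is a tiny stand-in for the real manifest.

```python
import json

# Top-level keys named in the text above. This list is an assumption about
# the manifest layout, not a published schema.
EXPECTED_KEYS = ("version", "resource_uri", "directory_listing", "file_listing")


def parse_manifest(text):
    """Parse manifest JSON text and check that the expected top-level keys exist."""
    manifest = json.loads(text)
    missing = [key for key in EXPECTED_KEYS if key not in manifest]
    if missing:
        raise KeyError("manifest is missing top-level keys: %s" % ", ".join(missing))
    return manifest


# Tiny stand-in for the real manifest, for illustration only.
sample_text = json.dumps({
    "version": "20230630",
    "resource_uri": "s3://allen-brain-cell-atlas/",
    "directory_listing": {},
    "file_listing": {},
})
sample = parse_manifest(sample_text)
print("version:", sample["version"])
```

With the real manifest, the same function could be called as `parse_manifest(requests.get(url).text)`.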


In [5]:
manifest['directory_listing'].keys()

dict_keys(['MERFISH-C57BL6J-638850', 'WMB-10Xv2', 'WMB-10Xv3', 'WMB-10X', 'WMB-taxonomy'])

Let's look at the information associated with the spatial transcriptomics dataset **MERFISH-C57BL6J-638850**. This dataset has two related directories: *expression_matrices*, containing a set of h5ad files, and *metadata*, containing a set of csv files. Use the *view_link* URL to browse the directories in a web browser.

In [6]:
directories = manifest['directory_listing']['MERFISH-C57BL6J-638850']['directories']
directories

{'expression_matrices': {'version': '20230630',
  'relative_path': 'expression_matrices/MERFISH-C57BL6J-638850/20230630',
  'url': 'https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/expression_matrices/MERFISH-C57BL6J-638850/20230630/',
  'view_link': 'https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/index.html#expression_matrices/MERFISH-C57BL6J-638850/20230630/'},
 'metadata': {'version': '20230630',
  'relative_path': 'metadata/MERFISH-C57BL6J-638850/20230630',
  'url': 'https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/metadata/MERFISH-C57BL6J-638850/20230630/',
  'view_link': 'https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/index.html#metadata/MERFISH-C57BL6J-638850/20230630/'}}

In [7]:
print(directories['expression_matrices']['view_link'])
print(directories['metadata']['view_link'])

https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/index.html#expression_matrices/MERFISH-C57BL6J-638850/20230630/
https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/index.html#metadata/MERFISH-C57BL6J-638850/20230630/
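
The nested layout shown above (dataset, then *directories*, then per-directory fields) can be flattened into a simple list for a quick overview. This is a sketch under stated assumptions: `list_directories` is a hypothetical helper, and `sample_manifest` is a trimmed excerpt that only reproduces relative paths appearing elsewhere in this notebook, not the full manifest.

```python
def list_directories(manifest):
    """Return (dataset, directory, relative_path) tuples from a manifest dict."""
    rows = []
    for dataset, entry in manifest["directory_listing"].items():
        for name, info in entry["directories"].items():
            rows.append((dataset, name, info["relative_path"]))
    return rows


# Trimmed excerpt of the manifest structure, for illustration only.
sample_manifest = {
    "directory_listing": {
        "MERFISH-C57BL6J-638850": {
            "directories": {
                "expression_matrices": {
                    "relative_path": "expression_matrices/MERFISH-C57BL6J-638850/20230630"
                },
                "metadata": {
                    "relative_path": "metadata/MERFISH-C57BL6J-638850/20230630"
                },
            }
        },
        "WMB-10Xv2": {
            "directories": {
                "expression_matrices": {
                    "relative_path": "expression_matrices/WMB-10Xv2/20230630"
                },
            }
        },
    }
}

rows = list_directories(sample_manifest)
for dataset, name, relative_path in rows:
    print(dataset, name, relative_path)
```

Passing the full `manifest` loaded earlier instead of `sample_manifest` would list every directory in the release.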


Suppose you would like to download data to *./my_download_root*. You can construct **[AWS CLI](https://aws.amazon.com/cli/)** commands using the following pattern:

In [8]:
download_root = './my_download_root'

local_path = os.path.join( download_root, directories['metadata']['relative_path'])
remote_path = manifest['resource_uri'] + directories['metadata']['relative_path']

command = "mkdir -p %s" % (local_path)
print(command)
command = "aws s3 sync --no-sign-request %s %s" % (remote_path, local_path)
print(command)

mkdir -p ./my_download_root/metadata/MERFISH-C57BL6J-638850/20230630
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/MERFISH-C57BL6J-638850/20230630 ./my_download_root/metadata/MERFISH-C57BL6J-638850/20230630
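
If you would rather run the command from Python than copy it into a shell, the same pattern can be wrapped in a small helper. This is a hedged sketch, not part of the original notebook: `build_sync_command` and `sync_directory` are hypothetical names, the helper assumes the AWS CLI is installed and on your `PATH`, and it defaults to a dry run that only prints the command.

```python
import os
import shlex
import subprocess


def build_sync_command(resource_uri, relative_path, download_root):
    """Return (local_path, argv) for syncing one manifest directory."""
    local_path = os.path.join(download_root, relative_path)
    # Join with exactly one slash, whether or not resource_uri ends with one.
    remote_path = resource_uri.rstrip("/") + "/" + relative_path
    argv = ["aws", "s3", "sync", "--no-sign-request", remote_path, local_path]
    return local_path, argv


def sync_directory(resource_uri, relative_path, download_root, dry_run=True):
    """Print the sync command, or run it when dry_run is False."""
    local_path, argv = build_sync_command(resource_uri, relative_path, download_root)
    if dry_run:
        print(shlex.join(argv))
        return argv
    # Equivalent of "mkdir -p": create the target directory if needed.
    os.makedirs(local_path, exist_ok=True)
    # check=True raises CalledProcessError if the AWS CLI exits with an error.
    subprocess.run(argv, check=True)
    return argv


# Dry run using the values shown above; nothing is downloaded.
argv = sync_directory(
    "s3://allen-brain-cell-atlas/",
    "metadata/MERFISH-C57BL6J-638850/20230630",
    "./my_download_root",
)
```

Passing the arguments as a list avoids shell quoting problems if the download root contains spaces. Call `sync_directory(..., dry_run=False)` to perform the download.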


Here is a simple loop that generates the commands to download all the directories for the release to a local file system while maintaining the structure defined in the manifest.

In [9]:
# Walk every dataset and each of its directories, printing the commands
# needed to mirror the release under download_root.
for dataset, dataset_dict in manifest['directory_listing'].items():

    for directory, directory_dict in dataset_dict['directories'].items():

        local_path = os.path.join(download_root, directory_dict['relative_path'])
        remote_path = manifest['resource_uri'] + directory_dict['relative_path']

        command = "mkdir -p %s" % (local_path)
        print(command)
        command = "aws s3 sync --no-sign-request %s %s" % (remote_path, local_path)
        print(command)

mkdir -p ./my_download_root/expression_matrices/MERFISH-C57BL6J-638850/20230630
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/expression_matrices/MERFISH-C57BL6J-638850/20230630 ./my_download_root/expression_matrices/MERFISH-C57BL6J-638850/20230630
mkdir -p ./my_download_root/metadata/MERFISH-C57BL6J-638850/20230630
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/MERFISH-C57BL6J-638850/20230630 ./my_download_root/metadata/MERFISH-C57BL6J-638850/20230630
mkdir -p ./my_download_root/expression_matrices/WMB-10Xv2/20230630
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/expression_matrices/WMB-10Xv2/20230630 ./my_download_root/expression_matrices/WMB-10Xv2/20230630
mkdir -p ./my_download_root/expression_matrices/WMB-10Xv3/20230630
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/expression_matrices/WMB-10Xv3/20230630 ./my_download_root/expression_matrices/WMB-10Xv3/20230630
mkdir -p ./my_download_root/metadata/WMB-10X/20230630
aws s3 sync -