# Get to Know a Dataset: EMBER

This notebook serves as a guided tour of the [EMBER](https://emberarchive.org) Open Data bucket. EMBER is the Ecosystem for Multi-modal Brain-behavior Experimentation and Research.

More usage examples, tutorials, and documentation for this dataset and others can be found at the [Registry of Open Data on AWS](https://registry.opendata.aws/).

### Q: How is the EMBER Open Data Bucket organized?

To understand the organization of the EMBER Open Data bucket, it's important to understand the organization of an EMBER project.

An EMBER project is the top-level organizational unit for EMBER. Within an EMBER project, public data and additional metadata are most commonly stored as dandisets in [EMBER-DANDI](https://dandi.emberarchive.org/). In special cases, EMBER also supports storing datasets in other forms.

The EMBER Open Data bucket is organized into three sections, as follows:

1. [EMBER-DANDI](https://dandi.emberarchive.org/) dandisets
    - EMBER-DANDI dandisets are stored using the prefixes blobs/ and dandisets/
2. Other EMBER Data
    - Other forms of datasets are stored using the prefix other/
3. Tools
    - EMBER tools are stored using the prefix tools/

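The three-way split above amounts to a lookup on each key's top-level prefix. A minimal illustrative sketch follows; `EMBER_SECTIONS` and `ember_section` are hypothetical names, and the mapping simply restates the prefixes listed above.

```python
# Hypothetical helper: map an S3 key in the EMBER Open Data bucket to one of the
# three sections described above, based only on its top-level prefix.
EMBER_SECTIONS = {
    "blobs/": "EMBER-DANDI dandisets",
    "dandisets/": "EMBER-DANDI dandisets",
    "other/": "Other EMBER data",
    "tools/": "Tools",
}


def ember_section(key: str) -> str:
    """Return the bucket section for an S3 key, or 'Unknown' if no prefix matches."""
    for prefix, section in EMBER_SECTIONS.items():
        if key.startswith(prefix):
            return section
    return "Unknown"


# Example keys taken from URLs that appear later in this notebook.
for example_key in [
    "other/kumar2025/index.html",
    "blobs/a67/f98/a67f98c3-ffb8-4c40-8b2f-45706e6bf8a9",
]:
    print(f"{example_key} -> {ember_section(example_key)}")
```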

For this tutorial, we will use two EMBER projects as examples:
- [Kumar2025](https://emberarchive.org/project/kumar2025)
- [Shepherd2025 - Dandiset 000463](https://dandi.emberarchive.org/dandiset/000463)


First we will import the Python libraries required throughout this notebook.

In [None]:
# This notebook requires the following additional libraries
# (please install using the preferred method for your environment, e.g. uv, pip, conda):
#
# "boto3>=1.42.29",
# "pynwb>=3.1.3",
# "requests>=2.32.5",

import boto3
import requests

from botocore import UNSIGNED
from botocore.config import Config
from pynwb import NWBHDF5IO, NWBFile

Next, we will define the location of our EMBER Open Data Bucket and create our boto3 S3 client.

In [None]:
# EMBER S3 bucket
bucket = "ember-open-data"

# List the top level of the bucket using boto3. Because this is a public bucket, requests do not need to be signed.
# Setting the signature version to UNSIGNED sends anonymous requests, so no AWS credentials are required.
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# Print the top-level prefixes to see the different sections of the EMBER Open Data bucket
for item in s3.list_objects_v2(Bucket=bucket, Delimiter='/')['CommonPrefixes']:
    print(item['Prefix'])

At the top level of the EMBER Open Data bucket, we see that prefixes correspond to EMBER-DANDI dandisets (blobs/ and dandisets/), other datasets (other/), and tools (tools/) as described in the previous section.

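Because the bucket is public, any object can also be fetched over plain HTTPS using a virtual-hosted-style S3 URL, the same form used by the file browser and blob links in this notebook. A minimal sketch follows; `public_url` is a hypothetical helper, and it assumes keys only need standard percent-encoding with slashes kept as path separators.

```python
from urllib.parse import quote


def public_url(bucket: str, key: str) -> str:
    """Build an HTTPS URL for a public S3 object (hypothetical helper).

    Slashes in the key are kept as path separators; other reserved
    characters are percent-encoded.
    """
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


print(public_url("ember-open-data", "other/kumar2025/index.html"))
# → https://ember-open-data.s3.amazonaws.com/other/kumar2025/index.html
```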
In the code blocks below, we will dive into each EMBER project.

**Kumar 2025**
- EMBER Open Data bucket S3 Prefix: `other/kumar2025/`
- Data can also be accessed through our [EMBER Project File Browser: Kumar2025](https://ember-open-data.s3.us-east-1.amazonaws.com/other/kumar2025/index.html)


We will see that the data is organized into classifiers, pose_files, and videos.


In [None]:
# Kumar2025
project = "kumar2025"

# List the key prefixes within the top level of the Kumar2025 dataset
for item in s3.list_objects_v2(Bucket=bucket, Prefix=f'other/{project}/', Delimiter='/', MaxKeys=10)['CommonPrefixes']:
    print(item['Prefix'])

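A single `list_objects_v2` call returns at most 1,000 entries (and at most 10 in the Kumar2025 cell, because of `MaxKeys=10`). To walk a larger prefix, boto3's built-in paginator follows continuation tokens automatically. The sketch below takes the client as an argument so it works with the unsigned `s3` client created earlier; `list_prefixes` is a hypothetical helper name.

```python
def list_prefixes(client, bucket: str, prefix: str = "", delimiter: str = "/"):
    """Yield every common prefix ("folder") directly under `prefix`.

    Uses boto3's paginator for list_objects_v2, which keeps requesting
    pages until the listing is exhausted.
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter=delimiter):
        # Pages with no sub-prefixes omit the "CommonPrefixes" key entirely.
        for item in page.get("CommonPrefixes", []):
            yield item["Prefix"]


# Usage with the client and variables defined earlier in this notebook:
# for p in list_prefixes(s3, bucket, prefix=f"other/{project}/"):
#     print(p)
```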
**Shepherd 2025 - Dandiset 000463**
- Data can also be accessed through [EMBER-DANDI File Browser: Dandiset 000463](https://dandi.emberarchive.org/dandiset/000463/draft/files)

We will see that data within the dandiset is organized by subject.

Please note that the organization of data within a dandiset does not directly correspond to the organization of data within the S3 bucket. In the steps below, we will see how to find the full S3 path for a given file.

In [None]:
# Dandiset 000463
dandiset = "000463"
dandiset_version = "draft"
dandi_api_base = "https://api-dandi.emberarchive.org/api"

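# List the top-level paths (e.g., subject folders) in the dandiset.
# Note: this reads only the first page of results returned by the API.
response = requests.get(f"{dandi_api_base}/dandisets/{dandiset}/versions/{dandiset_version}/assets/paths")
response.raise_for_status()
resp_json = response.json()

paths = sorted({asset_file["path"] for asset_file in resp_json["results"]})
for path in paths:
    print(path)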

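The cell above reads only the first page of results. Paginated DANDI API responses carry a `results` list and, as an assumption based on the public DANDI API, a `next` URL that is null on the last page; the sketch below follows those links until they run out. It also shows a `path_prefix` query parameter, which we assume limits the listing to one folder such as a subject directory. `paths_url`, `iter_api_results`, and `_get_json` are hypothetical helpers, and `get_json` can be swapped out for testing.

```python
from urllib.parse import urlencode


def paths_url(api_base, dandiset, version, path_prefix=None):
    """Build the URL for a dandiset's asset-paths listing (hypothetical helper)."""
    url = f"{api_base}/dandisets/{dandiset}/versions/{version}/assets/paths/"
    if path_prefix:
        # Assumption: the endpoint accepts a `path_prefix` filter, e.g. a subject folder.
        url += "?" + urlencode({"path_prefix": path_prefix})
    return url


def _get_json(url):
    """Fetch a URL and decode its JSON body, raising on HTTP errors."""
    import requests  # imported lazily so the helpers above can be used without it

    resp = requests.get(url)
    resp.raise_for_status()
    return resp.json()


def iter_api_results(url, get_json=_get_json):
    """Yield items from every page of a paginated API response.

    Assumes each page is a JSON object with a "results" list and a "next"
    URL that is null on the last page.
    """
    while url:
        page = get_json(url)
        yield from page.get("results", [])
        url = page.get("next")


# Usage with the variables defined above:
# for item in iter_api_results(paths_url(dandi_api_base, dandiset, dandiset_version)):
#     print(item["path"])
```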
### Q: What data formats are present in your dataset? What kinds of data are stored using these formats? Can you give any advice for how you work with these data formats?

EMBER is the data archive for multimodal neurophysiological and behavioral datasets.


# TODO !


- NWB (Neurodata Without Borders)
- BIDS (Brain Imaging Data Structure)

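Files in DANDI-style dandisets are typically named with BIDS-like `key-value` entities joined by underscores and followed by a suffix describing the data, for example `sub-<label>_ses-<label>_<suffix>.nwb`. When working with many files, it can help to parse these names into fields. The sketch below assumes that convention; `parse_bids_name` is a hypothetical helper and the example filename is made up for illustration.

```python
from pathlib import PurePosixPath


def parse_bids_name(path: str):
    """Split a BIDS-style file name into (entities, suffix, extension).

    Hypothetical helper. Assumes names look like
    'key1-value1_key2-value2_suffix.ext', where the underscore-separated
    part without a '-' is the suffix.
    """
    name = PurePosixPath(path).name
    stem, dot, ext = name.partition(".")  # keeps multi-part extensions like .nii.gz
    entities = {}
    suffix = None
    for part in stem.split("_"):
        key, sep, value = part.partition("-")
        if sep:
            entities[key] = value
        else:
            suffix = part
    return entities, suffix, dot + ext


print(parse_bids_name("sub-mouse01/sub-mouse01_ses-01_behavior+ecephys.nwb"))
# → ({'sub': 'mouse01', 'ses': '01'}, 'behavior+ecephys', '.nwb')
```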

### Q: Can you show us an example of downloading and loading data from your dataset?

As an example, we will load one data file from each project.


**Shepherd 2025 - Dandiset 000463**
- Data can also be accessed through [EMBER-DANDI File Browser: Dandiset 000463](https://dandi.emberarchive.org/dandiset/000463/draft/files?location=sub-ADPTM01).

We will see that data files can be accessed using the EMBER-DANDI API or directly via S3:
- https://api-dandi.emberarchive.org/api/assets/2d0c4695-a091-48c3-b70c-61b5567ef515/download/
- https://ember-open-data.s3.amazonaws.com/blobs/a67/f98/a67f98c3-ffb8-4c40-8b2f-45706e6bf8a9

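The S3 URL above encodes both the bucket and the object key, so the same file can be fetched with the boto3 client from earlier instead of `requests`. A minimal sketch, assuming virtual-hosted-style URLs of the form `https://<bucket>.s3.amazonaws.com/<key>` as shown above; `s3_url_to_bucket_key` is a hypothetical helper.

```python
from urllib.parse import unquote, urlparse


def s3_url_to_bucket_key(url: str):
    """Split a virtual-hosted-style S3 URL into (bucket, key).

    Assumes the host looks like '<bucket>.s3.amazonaws.com' or
    '<bucket>.s3.<region>.amazonaws.com'.
    """
    parsed = urlparse(url)
    host = parsed.netloc
    if ".s3." not in host or not host.endswith(".amazonaws.com"):
        raise ValueError(f"Not a virtual-hosted-style S3 URL: {url}")
    bucket_name = host.split(".s3.", 1)[0]
    key = unquote(parsed.path.lstrip("/"))
    return bucket_name, key


blob_bucket, blob_key = s3_url_to_bucket_key(
    "https://ember-open-data.s3.amazonaws.com/blobs/a67/f98/a67f98c3-ffb8-4c40-8b2f-45706e6bf8a9"
)
print(blob_bucket, blob_key)
# With the unsigned client defined earlier, the object could then be downloaded:
# s3.download_file(blob_bucket, blob_key, "local_copy")
```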
In [None]:
# Kumar 2025

# TODO !

In [None]:
# Shepherd 2025 - Dandiset 000463

# Use the first file path from above
file_path = paths[0]

# Query the dandiset's assets for files matching the above file path
response = requests.get(
    f"{dandi_api_base}/dandisets/{dandiset}/versions/{dandiset_version}/assets/",
    params={"path": file_path, "metadata": "false", "zarr": "false"},
)
response.raise_for_status()

# Keep only NWB files (for this demo)
assets = response.json()["results"]
nwb_assets = [asset for asset in assets if asset["path"].endswith(".nwb")]

# Select 1 NWB file to explore further
file = nwb_assets[0]
asset_id = file["asset_id"]
print(f"Asset Path:\n\t{file['path']}")

# Get asset metadata
asset_response = requests.get(f"{dandi_api_base}/dandisets/{dandiset}/versions/{dandiset_version}/assets/{asset_id}")
asset_response.raise_for_status()
asset_access_urls = asset_response.json()["contentUrl"]

print("Asset Access Methods:")
for access_method in asset_access_urls:
    print(f"\t{access_method}")

# Get the S3 access URL (the contentUrl entry hosted on amazonaws.com)
s3_access_url = next(url for url in asset_access_urls if "amazonaws.com" in url)
local_file = file["path"].split("/")[-1]

# Download the file contents
resp = requests.get(s3_access_url)
resp.raise_for_status()
with open(local_file, "wb") as f:
    f.write(resp.content)

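# Open the NWB file and print a summary of its contents.
# Data are read lazily, so access datasets inside this block while the file is open.
with NWBHDF5IO(local_file, "r") as io:
    nwbfile = io.read()
    print(nwbfile)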

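The download cell above reads the whole response into memory before writing it, which can be slow or fail for multi-gigabyte NWB files. A streaming variant writes the file in chunks instead. This is a sketch: `stream_download` and its 8 MiB default chunk size are illustrative choices, not part of the EMBER or DANDI tooling, and `get` defaults to `requests.get`.

```python
def stream_download(url, dest, chunk_size=8 * 1024 * 1024, get=None):
    """Download `url` to `dest` in chunks instead of loading it all into memory.

    `get` defaults to requests.get; it is a parameter so the function can be
    exercised with a stand-in during testing.
    """
    if get is None:
        import requests  # imported lazily so the function can be defined without it

        get = requests.get
    with get(url, stream=True) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty keep-alive chunks
                    f.write(chunk)
    return dest


# Usage with the variables defined above:
# stream_download(s3_access_url, local_file)
```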