# AlphaGenome Metadata Export

Exports all available track metadata for human and mouse genomes.
Useful for identifying which cell types / tissues are available for a given output type.

**Output files:**
- `human_tracks_metadata.csv` — all 5,563 human tracks (already exists)
- `mouse_tracks_metadata.csv` — all mouse tracks
- `human_contact_maps_metadata.csv` — human contact map cell types only
- `mouse_contact_maps_metadata.csv` — mouse contact map cell types only
- `ear_related_metadata.csv` — any tracks matching ear/cochlea/auditory terms

**External resources for finding ontology terms:**
- [EBI Ontology Lookup Service (OLS4)](https://www.ebi.ac.uk/ols4) — recommended by AlphaGenome docs
- [UBERON anatomy ontology browser](https://www.ebi.ac.uk/ols4/ontologies/uberon) — tissues and organs (inner ear = UBERON:0001846)
- [Cell Ontology (CL) browser](https://www.ebi.ac.uk/ols4/ontologies/cl) — specific cell types (hair cells, neurons)
- [EFO browser](https://www.ebi.ac.uk/ols4/ontologies/efo) — cell lines and experimental factors
- [ENCODE biosample search](https://www.encodeproject.org/search/?type=Biosample) — what ENCODE has profiled
- [AlphaGenome tissue_ontology_mapping colab](https://github.com/google-deepmind/alphagenome) — official examples

In [1]:
import os
import pandas as pd
from dotenv import load_dotenv
from alphagenome.models import dna_client

load_dotenv()
API_KEY = os.getenv('ALPHA_GENOME_API_KEY')

dna_model = dna_client.create(API_KEY)
print('Model initialized.')

Model initialized.
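
`os.getenv` returns `None` when a variable is missing, so an absent key is passed on to the client unchanged. A minimal sketch of failing early instead; the helper name `require_env` and the `DEMO_KEY` variable are hypothetical and used only for illustration:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f'{name} is not set; add it to your environment or .env file.')
    return value


# Demo with a dummy variable so the sketch runs without a real API key.
os.environ['DEMO_KEY'] = 'abc'
print(require_env('DEMO_KEY'))
```

In this notebook the call would be `API_KEY = require_env('ALPHA_GENOME_API_KEY')`, placed after `load_dotenv()`.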


## 1. Fetch all metadata

In [2]:
human_meta = dna_model.output_metadata(dna_client.Organism.HOMO_SAPIENS).concatenate()
mouse_meta = dna_model.output_metadata(dna_client.Organism.MUS_MUSCULUS).concatenate()

print(f'Human tracks: {len(human_meta):,}')
print(f'Mouse tracks: {len(mouse_meta):,}')

Human tracks: 5,563
Mouse tracks: 1,038


## 2. Track counts by output type

In [3]:
human_counts = human_meta.groupby('output_type').size().rename('# Human tracks')
mouse_counts = mouse_meta.groupby('output_type').size().rename('# Mouse tracks')
pd.concat([human_counts, mouse_counts], axis=1).astype(pd.Int64Dtype())

| output_type | # Human tracks | # Mouse tracks |
|---|---|---|
| OutputType.ATAC | 167 | 18 |
| OutputType.CAGE | 546 | 188 |
| OutputType.DNASE | 305 | 67 |
| OutputType.RNA_SEQ | 667 | 173 |
| OutputType.CHIP_HISTONE | 1116 | 183 |
| OutputType.CHIP_TF | 1617 | 127 |
| OutputType.SPLICE_SITES | 4 | 4 |
| OutputType.SPLICE_SITE_USAGE | 734 | 180 |
| OutputType.SPLICE_JUNCTIONS | 367 | 90 |
| OutputType.CONTACT_MAPS | 28 | 8 |


## 3. Contact maps only (the cell types relevant for Hi-C / TAD analysis)

In [4]:
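# Section 2 counts 28 human and 8 mouse CONTACT_MAPS tracks, so output_type
# appears to hold enum members rather than strings. A direct comparison with
# the string 'OutputType.CONTACT_MAPS' then matches no rows, as in the
# recorded output below. Compare on the string form instead.
human_contact = human_meta[human_meta['output_type'].astype(str) == 'OutputType.CONTACT_MAPS'].copy()
mouse_contact = mouse_meta[mouse_meta['output_type'].astype(str) == 'OutputType.CONTACT_MAPS'].copy()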

print(f'Human contact map cell types: {len(human_contact)}')
print(f'Mouse contact map cell types: {len(mouse_contact)}')

display(human_contact[['ontology_curie', 'biosample_name', 'biosample_type', 'biosample_life_stage']])
display(mouse_contact[['ontology_curie', 'biosample_name', 'biosample_type', 'biosample_life_stage']])

Human contact map cell types: 0
Mouse contact map cell types: 0


Unnamed: 0,ontology_curie,biosample_name,biosample_type,biosample_life_stage


Unnamed: 0,ontology_curie,biosample_name,biosample_type,biosample_life_stage
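
The zero counts recorded above conflict with section 2, which lists 28 human and 8 mouse `CONTACT_MAPS` tracks. One plausible cause is that `output_type` holds enum members, and a plain `enum.Enum` member never compares equal to a string. The sketch below reproduces this with a stand-in `OutputType` enum (hypothetical, defined here only for illustration; it is not the AlphaGenome class) and a few made-up rows:

```python
import enum


class OutputType(enum.Enum):
    """Stand-in enum for illustration; not the AlphaGenome class."""
    ATAC = 1
    CONTACT_MAPS = 2


# Made-up rows that mimic the shape of the metadata table.
rows = [
    {'biosample_name': 'sample A', 'output_type': OutputType.ATAC},
    {'biosample_name': 'sample B', 'output_type': OutputType.CONTACT_MAPS},
    {'biosample_name': 'sample C', 'output_type': OutputType.CONTACT_MAPS},
]

# An enum member is never equal to a string, so this filter matches nothing.
by_string = [r for r in rows if r['output_type'] == 'OutputType.CONTACT_MAPS']

# Comparing to the member itself, or to its string form, matches as expected.
by_member = [r for r in rows if r['output_type'] is OutputType.CONTACT_MAPS]
by_str_form = [r for r in rows if str(r['output_type']) == 'OutputType.CONTACT_MAPS']

print(len(by_string), len(by_member), len(by_str_form))
```

In pandas, the corresponding filters would compare the column against the enum member, or use `.astype(str)` before comparing against the string.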


## 4. Search for inner ear / cochlea / auditory related tracks

These are the terms most relevant to your deletion analysis.
Inner ear-specific ENCODE data is rare — see what's closest.

In [5]:
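EAR_KEYWORDS = [
    r'ear\b', 'cochlea', 'cochlear', 'auditory', 'vestibul',
    'hair cell', 'spiral ganglion', 'organ of corti',
    'sensory', 'otic', 'stria vascularis'
]

# Anchor every keyword at the start of a word, and 'ear' at its end as well.
# Unanchored, 'ear' also matches 'heart' and 'otic' matches 'amniotic', which
# accounts for the heart and amniotic rows in the recorded output below.
# The non-capturing group keeps str.contains from warning about match groups.
pattern = r'\b(?:' + '|'.join(EAR_KEYWORDS) + r')'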

all_meta = pd.concat(
    [human_meta.assign(organism='human'), mouse_meta.assign(organism='mouse')],
    ignore_index=True
)

ear_tracks = all_meta[
    all_meta['biosample_name'].str.contains(pattern, case=False, na=False)
    | all_meta['name'].str.contains(pattern, case=False, na=False)
]

print(f'Ear-related tracks found: {len(ear_tracks)}')
if len(ear_tracks) > 0:
    display(ear_tracks[['organism', 'ontology_curie', 'biosample_name', 'biosample_type', 'output_type']])
else:
    print('No direct matches. See neural/sensory alternatives below.')

Ear-related tracks found: 129


| | organism | ontology_curie | biosample_name | biosample_type | output_type |
|---|---|---|---|---|---|
| 141 | human | UBERON:0002080 | heart right ventricle | tissue | OutputType.ATAC |
| 142 | human | UBERON:0002084 | heart left ventricle | tissue | OutputType.ATAC |
| 238 | human | CL:0002536 | amniotic epithelial cell | primary_cell | OutputType.CAGE |
| 345 | human | UBERON:0000948 | heart | tissue | OutputType.CAGE |
| 398 | human | UBERON:0002084 | heart left ventricle | tissue | OutputType.CAGE |
| ... | ... | ... | ... | ... | ... |
| 6371 | mouse | UBERON:0000948 | heart | tissue | OutputType.SPLICE_SITE_USAGE |
| 6460 | mouse | UBERON:0000948 | heart | tissue | OutputType.SPLICE_SITE_USAGE |
| 6461 | mouse | UBERON:0000948 | heart | tissue | OutputType.SPLICE_SITE_USAGE |
| 6550 | mouse | UBERON:0000948 | heart | tissue | OutputType.SPLICE_JUNCTIONS |
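
Most of the rows recorded above are heart tissue, and one is an amniotic cell. That is what unanchored substring matching produces: `ear` occurs inside `heart`, and `otic` inside `amniotic`. A minimal sketch of the difference, using Python's `re` module on a few biosample names that appear in this notebook's outputs (the shortened keyword list is for illustration only):

```python
import re

names = ['heart left ventricle', 'amniotic epithelial cell', 'internal ear', 'heart']
keywords = ['ear', 'otic', 'cochlea']

# Plain alternation: matches anywhere, including inside longer words.
loose = re.compile('|'.join(keywords), re.IGNORECASE)

# Word-boundary version: each keyword must stand alone as a word.
strict = re.compile(r'\b(?:' + '|'.join(keywords) + r')\b', re.IGNORECASE)

loose_hits = [n for n in names if loose.search(n)]
strict_hits = [n for n in names if strict.search(n)]

print(loose_hits)   # all four names
print(strict_hits)  # ['internal ear']
```

pandas `str.contains` treats its pattern as a regular expression by default, so an anchored pattern like this can be passed to it unchanged. A trailing `\b` would also exclude plurals such as `hair cells`, so stems and plural forms need their own handling.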


## 5. Closest alternatives if inner ear is not available

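Inner ear hair cells are mechanosensory epithelial cells derived from the otic placode; they are not neurons, although spiral ganglion neurons innervate them.
Neural and epithelial samples are therefore the closest proxy cell types likely to be available.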

In [6]:
PROXY_KEYWORDS = [
    'neuron', 'neural', 'nerve', 'brain', 'sensory',
    'epithelial', 'epithelium', 'stem cell'
]

proxy_pattern = '|'.join(PROXY_KEYWORDS)

# Only look at contact map tracks for proxies (most relevant for TAD analysis)
contact_all = pd.concat(
    [human_contact.assign(organism='human'), mouse_contact.assign(organism='mouse')],
    ignore_index=True
)

proxy_tracks = contact_all[
    contact_all['biosample_name'].str.contains(proxy_pattern, case=False, na=False)
]

print(f'Neural/sensory/epithelial contact map tracks: {len(proxy_tracks)}')
display(proxy_tracks[['organism', 'ontology_curie', 'biosample_name', 'biosample_type']])

Neural/sensory/epithelial contact map tracks: 0


Unnamed: 0,organism,ontology_curie,biosample_name,biosample_type


## 6. Export all CSVs

In [7]:
human_meta.to_csv('human_tracks_metadata.csv')
mouse_meta.to_csv('mouse_tracks_metadata.csv')
human_contact.to_csv('human_contact_maps_metadata.csv')
mouse_contact.to_csv('mouse_contact_maps_metadata.csv')

if len(ear_tracks) > 0:
    ear_tracks.to_csv('ear_related_metadata.csv')

print('Exported:')
print('  human_tracks_metadata.csv         — all human tracks')
print('  mouse_tracks_metadata.csv          — all mouse tracks')
print('  human_contact_maps_metadata.csv    — human contact map cell types')
print('  mouse_contact_maps_metadata.csv    — mouse contact map cell types')
if len(ear_tracks) > 0:
    print('  ear_related_metadata.csv           — ear/cochlea/auditory tracks')

Exported:
  human_tracks_metadata.csv         — all human tracks
  mouse_tracks_metadata.csv          — all mouse tracks
  human_contact_maps_metadata.csv    — human contact map cell types
  mouse_contact_maps_metadata.csv    — mouse contact map cell types
  ear_related_metadata.csv           — ear/cochlea/auditory tracks


## 7. Look up a specific ontology term

Once your PI identifies a cell type of interest (e.g. from EBI OLS),
check whether AlphaGenome has it available.

In [8]:
# Change this to any CURIE you want to look up
CURIE_TO_CHECK = 'UBERON:0001846'  # inner ear

result = all_meta[all_meta['ontology_curie'] == CURIE_TO_CHECK]
if len(result) > 0:
    print(f'Found {len(result)} track(s) for {CURIE_TO_CHECK}:')
    display(result[['organism', 'ontology_curie', 'biosample_name', 'output_type']])
else:
    print(f'{CURIE_TO_CHECK} is not available in AlphaGenome.')
    print('Try a related term from https://www.ebi.ac.uk/ols4')

Found 4 track(s) for UBERON:0001846:


| | organism | ontology_curie | biosample_name | output_type |
|---|---|---|---|---|
| 5587 | mouse | UBERON:0001846 | internal ear | OutputType.CAGE |
| 5649 | mouse | UBERON:0001846 | internal ear | OutputType.CAGE |
| 5681 | mouse | UBERON:0001846 | internal ear | OutputType.CAGE |
| 5743 | mouse | UBERON:0001846 | internal ear | OutputType.CAGE |
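
To check several candidate terms in one pass, the single-CURIE lookup can be generalized to a per-term summary. This is a sketch under stated assumptions: the rows are toy data (only the `UBERON:0001846` mouse CAGE entries mirror the output above), `CL:9999999` is a placeholder CURIE, and the helper name `summarize_curies` is hypothetical.

```python
from collections import defaultdict

# Toy rows; only the UBERON:0001846 entries mirror the output above.
rows = [
    {'organism': 'mouse', 'ontology_curie': 'UBERON:0001846', 'output_type': 'CAGE'},
    {'organism': 'mouse', 'ontology_curie': 'UBERON:0001846', 'output_type': 'CAGE'},
    {'organism': 'human', 'ontology_curie': 'UBERON:0000948', 'output_type': 'ATAC'},
]


def summarize_curies(rows, curies):
    """Map each requested CURIE to its sorted (organism, output_type) pairs."""
    found = defaultdict(set)
    for row in rows:
        if row['ontology_curie'] in curies:
            found[row['ontology_curie']].add((row['organism'], row['output_type']))
    # Keep requested CURIEs that have no tracks so gaps stay visible.
    return {curie: sorted(found.get(curie, ())) for curie in curies}


summary = summarize_curies(rows, ['UBERON:0001846', 'CL:9999999'])
print(summary)
```

With the notebook's DataFrame, filtering with `all_meta['ontology_curie'].isin(curies)` and then grouping by `ontology_curie` would give a comparable view.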
