# Babel

This Jupyter Notebook shows you what it looks like when you [run Babel](https://github.com/NCATSTranslator/Babel/blob/master/docs/Downloads.md).

First, let's make sure that we're in the right directory:

In [31]:
import os
import subprocess
import gzip
import yaml
from pathlib import Path

# Make sure we're in the root directory of the Babel Git repository.
cwd = Path.cwd()
possible_directories = [
    # Try CWD
    cwd,
    # Try parent
    cwd.parent
]
found = False
for directory in possible_directories:
    if (directory / "Snakefile").exists() and (directory / ".git").is_dir():
        found = True

        # Found it!
        if directory == cwd:
            print("We are in the Babel Git repository!")
        else:
            print(f"Found the Babel Git repository at {directory}, changing to that directory.")
            os.chdir(directory)
        # Stop at the first match so we never change directory twice.
        break

if not found:
    raise RuntimeError(f"Could not find Babel Git repository in one of these locations: {possible_directories}")

We are in the Babel Git repository!
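
The cell above only checks the current directory and its parent. If the notebook might live deeper in the repository, the same test can be applied to every ancestor directory. The following is a minimal sketch under that assumption; `find_repo_root` is a hypothetical helper name, not part of Babel.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that looks like the Babel checkout."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        # Same test as the cell above: a Snakefile plus a .git directory.
        if (candidate / "Snakefile").exists() and (candidate / ".git").is_dir():
            return candidate
    return None


root = find_repo_root(Path.cwd())
print(root if root is not None else "No Babel checkout found above the current directory.")
```

Returning `None` rather than raising leaves the caller free to decide whether a missing checkout is an error.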


In [32]:
# The target to use. CellLine was chosen as a small one, but you can choose any listed at https://github.com/NCATSTranslator/Babel#readme.
TARGET = "CellLine"

# The number of CPU cores to use. You can run `nproc` to find out how many cores you have.
CORES = 5

# Run the Babel pipeline for the target chosen above.
subprocess.run(["uv", "run", "snakemake", "-c", str(CORES), f"babel_outputs/compendia/{TARGET}.txt"])

Assuming unrestricted shared filesystem usage.
INFO snakemake.logging [2026-01-27T07:30:59-0500]: Assuming unrestricted shared filesystem usage.
INFO snakemake.logging [2026-01-27T07:30:59-0500]: None
host: Vespasian.local
INFO snakemake.logging [2026-01-27T07:30:59-0500]: host: Vespasian.local
Building DAG of jobs...
INFO snakemake.logging [2026-01-27T07:30:59-0500]: Building DAG of jobs...
Using shell: /bin/bash
INFO snakemake.logging [2026-01-27T07:30:59-0500]: Using shell: /bin/bash
Provided cores: 5
INFO snakemake.logging [2026-01-27T07:30:59-0500]: Provided cores: 5
Rules claiming more threads will be scaled down.
INFO snakemake.logging [2026-01-27T07:30:59-0500]: Rules claiming more threads will be scaled down.
Job stats:
job                     count
--------------------  -------
cell_line_compendia         1
get_CLO_labels              1
get_clo                     1
get_clo_ids                 1
get_icrdf                   1
get_obo_descriptions        1
get_obo_labels       

loading CLO
loading complete
took 0:00:00.864178
loading CLO
loading complete
took 0:00:00.864300


[Tue Jan 27 02:31:05 2026]
Finished jobid: 1 (Rule: get_clo_ids)
INFO snakemake.logging [2026-01-27T07:31:05-0500]: Finished jobid: 1 (Rule: get_clo_ids)
2 of 8 steps (25%) done
INFO snakemake.logging [2026-01-27T07:31:05-0500]: None
[Tue Jan 27 02:31:06 2026]
Finished jobid: 3 (Rule: get_CLO_labels)
INFO snakemake.logging [2026-01-27T07:31:06-0500]: Finished jobid: 3 (Rule: get_CLO_labels)
3 of 8 steps (38%) done
INFO snakemake.logging [2026-01-27T07:31:06-0500]: None
UberGraph(https://ubergraph.apps.renci.org/sparql).get_all_descriptions(): 100%|██████████| 20/20 [03:11<00:00,  9.59s/batch]
[Tue Jan 27 02:34:22 2026]
Finished jobid: 7 (Rule: get_obo_descriptions)
INFO snakemake.logging [2026-01-27T07:34:22-0500]: Finished jobid: 7 (Rule: get_obo_descriptions)
4 of 8 steps (50%) done
INFO snakemake.logging [2026-01-27T07:34:22-0500]: None
UberGraph(https://ubergraph.apps.renci.org/sparql).get_all_synonyms():  59%|█████▉    | 13/22 [04:29<03:06, 20.70s/it]]
RuleException:
HTTPError in 

loading babel_outputs/intermediate/cell_line/ids/CLO


[Tue Jan 27 02:48:32 2026]
Finished jobid: 0 (Rule: cell_line_compendia)
INFO snakemake.logging [2026-01-27T07:48:32-0500]: Finished jobid: 0 (Rule: cell_line_compendia)
8 of 8 steps (100%) done
INFO snakemake.logging [2026-01-27T07:48:32-0500]: None
Complete log(s): /Users/gaurav/Developer/translator/babel/.snakemake/log/2026-01-27T023058.434586.snakemake.log
INFO snakemake.logging [2026-01-27T07:48:32-0500]: Complete log(s): /Users/gaurav/Developer/translator/babel/.snakemake/log/2026-01-27T023058.434586.snakemake.log


CompletedProcess(args=['uv', 'run', 'snakemake', '-c', '5', 'babel_outputs/compendia/CellLine.txt'], returncode=0)
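
By default, `subprocess.run` returns a `CompletedProcess` even when the command fails, so a non-zero `returncode` is easy to miss in a long log. One way to make failures loud is to pass `check=True`, which raises `subprocess.CalledProcessError` on a non-zero exit status. The sketch below builds the same argument list as the cell above; `snakemake_command` and `run_pipeline` are hypothetical helper names introduced for illustration.

```python
import subprocess
from typing import List


def snakemake_command(target: str, cores: int) -> List[str]:
    """Build the same argument list that the cell above passes to subprocess.run."""
    return ["uv", "run", "snakemake", "-c", str(cores), f"babel_outputs/compendia/{target}.txt"]


def run_pipeline(target: str, cores: int) -> None:
    """Run the pipeline; raises CalledProcessError if the command exits with a non-zero status."""
    subprocess.run(snakemake_command(target, cores), check=True)


# Only print the command here; call run_pipeline("CellLine", 5) to actually start a build.
print(snakemake_command("CellLine", 5))
```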

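A full run is expected to produce three kinds of output for the target (compendia, synonyms, and metadata), each loaded in one of the sections below.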
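
Before loading anything, it can help to check which of these files actually exist. The sketch below uses the three paths that the later cells in this notebook open; `expected_outputs` is a hypothetical helper name, and the `babel_outputs` base directory is assumed to be relative to the repository root.

```python
from pathlib import Path
from typing import Dict


def expected_outputs(target: str, base: Path = Path("babel_outputs")) -> Dict[str, Path]:
    """Map each output kind to the path this notebook reads it from."""
    return {
        "compendia": base / "compendia" / f"{target}.txt",
        "synonyms": base / "synonyms" / f"{target}.txt.gz",
        "metadata": base / "metadata" / f"{target}.txt.yaml",
    }


for kind, path in expected_outputs("CellLine").items():
    status = "found" if path.exists() else "MISSING"
    print(f"{kind:10s} {path} [{status}]")
```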

## Compendia

In [38]:
with open(f"babel_outputs/compendia/{TARGET}.txt") as fin:
    for line in fin:
        # Each line already ends with a newline, so don't add another.
        print(line, end="")

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)
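
The rate-limit message above appears because the cell prints the entire compendium file. Printing only the first few lines avoids it. The following is a minimal sketch; `preview_lines` and its `limit` and `max_width` parameters are hypothetical names chosen for illustration, and the file is assumed to be UTF-8 text.

```python
from itertools import islice
from pathlib import Path
from typing import List


def preview_lines(path: Path, limit: int = 5, max_width: int = 200) -> List[str]:
    """Return at most `limit` lines from `path`, each cut to `max_width` characters."""
    with open(path, encoding="utf-8") as fin:
        # islice stops reading after `limit` lines, so large files are never fully loaded.
        return [line.rstrip("\n")[:max_width] for line in islice(fin, limit)]


compendium = Path("babel_outputs/compendia/CellLine.txt")
if compendium.exists():
    for line in preview_lines(compendium):
        print(line)
```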



## Synonyms

In [39]:
with gzip.open(f"babel_outputs/synonyms/{TARGET}.txt.gz", "rt") as fin:
    text = fin.read()

print(text)

FileNotFoundError: [Errno 2] No such file or directory: 'babel_outputs/synonyms/CellLine.txt.gz'
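
The Snakemake command earlier in this notebook requested only the compendium path, which may be why the synonyms file is absent here; that is an inference from the log, not something the output confirms. Either way, checking for the file first and reading only a few lines keeps the cell from failing or flooding the notebook. This is a sketch under those assumptions, with `preview_gzip` as a hypothetical helper name.

```python
import gzip
from itertools import islice
from pathlib import Path
from typing import List


def preview_gzip(path: Path, limit: int = 5) -> List[str]:
    """Return the first `limit` lines of a gzipped text file, or an empty list if it is missing."""
    if not path.exists():
        print(f"{path} does not exist; it may not have been built yet.")
        return []
    with gzip.open(path, "rt", encoding="utf-8") as fin:
        return [line.rstrip("\n") for line in islice(fin, limit)]


for line in preview_gzip(Path("babel_outputs/synonyms/CellLine.txt.gz")):
    print(line)
```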

## Metadata

In [30]:
with open(f"babel_outputs/metadata/{TARGET}.txt.yaml", "r") as fin:
    metadata = yaml.safe_load(fin)
metadata

{'combined_from': {'ComplexPortal': {'combined_from': [],
   'counts': [],
   'created_at': '2026-01-27T01:56:36.636743',
   'description': 'Labels and synonyms extracted from ComplexPortal download of 559292 (Saccharomyces cerevisiae)',
   'name': 'ComplexPortal',
   'sources': [{'name': 'ComplexPortal for organism 559292 (Saccharomyces cerevisiae)',
     'type': 'download',
     'url': 'http://ftp.ebi.ac.uk/pub/databases/intact/complex/current/complextab/559292.tsv'}],
   'type': 'transform',
   'url': ''}},
 'counts': {'cliques': 634,
  'eq_ids': 634,
  'property_sources': {},
  'synonyms': 0},
 'created_at': '2026-01-27T02:27:59.078323',
 'description': '',
 'name': 'MacromolecularComplex.txt',
 'sources': [],
 'type': 'compendium',
 'url': ''}
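
The output above is named `MacromolecularComplex.txt` and comes from cell `In [30]`, which suggests it was last run with a different `TARGET` than the `CellLine` build shown earlier. To compare metadata across targets, a short summary is easier to read than the full nested dictionary. The sketch below assumes only the keys visible in the output above (`name`, `type`, `counts`, `combined_from`); `summarize_metadata` is a hypothetical helper, and the `example` dictionary is a trimmed copy of that output.

```python
from typing import Any, Dict, List


def summarize_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Return a few human-readable lines describing a Babel metadata dictionary."""
    counts = metadata.get("counts", {})
    lines = [f"{metadata.get('name', '?')} ({metadata.get('type', '?')})"]
    for key in ("cliques", "eq_ids", "synonyms"):
        if key in counts:
            lines.append(f"  {key}: {counts[key]}")
    # At the top level, combined_from is a dictionary keyed by source name.
    for source_name in metadata.get("combined_from", {}):
        lines.append(f"  combined from: {source_name}")
    return lines


example = {
    "name": "MacromolecularComplex.txt",
    "type": "compendium",
    "counts": {"cliques": 634, "eq_ids": 634, "property_sources": {}, "synonyms": 0},
    "combined_from": {"ComplexPortal": {}},
}
print("\n".join(summarize_metadata(example)))
```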