Alethio Therapeutics Python Toolkit - A growing collection of open-source computational tools used by Alethio Therapeutics.
alethiotx is a modular Python package providing specialized tools for therapeutic research and drug discovery. Currently, the package features the Artemis module for drug target prioritization using public knowledge graphs. Additional modules and capabilities will be added in future releases.
The Artemis module enables accessible and scalable drug target prioritization by integrating drug molecule and target data from ChEMBL (including clinical trial phases and approvals), MeSH disease hierarchies, HGNC gene families, pathway information from GeneShot, and machine learning pipelines. It leverages public knowledge graphs to prioritize therapeutic targets across multiple disease areas.
- ChEMBL Integration: Query and process the ChEMBL bioactive molecule database, including clinical trial information and automatic parent molecule normalization
- MeSH Hierarchy: Retrieve MeSH disease trees and descendants for comprehensive disease coverage
- HGNC Gene Families: Download and analyze gene family data to identify and filter over-represented families
- Clinical Scoring: Calculate clinical validation scores for drug targets based on trial phases, approvals, and family representation
- Pathway Genes: Retrieve and analyze disease-associated genes using Ma'ayan Lab's GeneShot API
- Machine Learning Pipeline: Built-in cross-validation with configurable classifiers for target prediction
- UpSet Plots: Visualize gene set intersections across multiple diseases
- Multi-Disease Support: Pre-configured for breast, lung, prostate, melanoma, bowel cancer, diabetes, and cardiovascular disease
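To make the clinical scoring idea more concrete, here is a minimal sketch of how trial phases could be turned into a per-target score. It is not the Artemis implementation: the phase weights, the "highest phase wins" rule, the helper name `score_targets`, and the toy records are all assumptions made for illustration. The only outside fact relied on is that ChEMBL encodes approved drugs as `max_phase` 4.

```python
from collections import defaultdict

# Hypothetical weights: one point per clinical phase, with approval
# (ChEMBL max_phase 4) scoring highest. The weights Artemis actually
# uses are not described in this README.
PHASE_WEIGHTS = {1: 1, 2: 2, 3: 3, 4: 4}


def score_targets(records):
    """Return {gene: score}, where score is the highest weighted phase seen.

    `records` is an iterable of (gene_symbol, max_phase) pairs, one per
    drug-target link for a given disease.
    """
    scores = defaultdict(int)
    for gene, phase in records:
        # Unknown or preclinical phases contribute a weight of 0.
        scores[gene] = max(scores[gene], PHASE_WEIGHTS.get(phase, 0))
    return dict(scores)


# Toy input with placeholder gene names (not real ChEMBL output).
records = [
    ("GENE_A", 4),
    ("GENE_A", 2),
    ("GENE_B", 3),
    ("GENE_C", 2),
    ("GENE_D", 0),
]
scores = score_targets(records)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A real scoring scheme would also need to account for approvals per indication and for gene family representation, as the feature list describes; this sketch only covers the phase component.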
Additional modules for various aspects of drug discovery and therapeutic research are planned for future releases. Stay tuned!
pip install alethiotx
Note: The examples below demonstrate the Artemis module functionality. As new modules are added to the package, they will have their own usage examples.
from alethiotx.artemis.chembl import molecules
from alethiotx.artemis.clinical import compute
# Query ChEMBL for parent molecules with clinical trial data
chembl_data = molecules(version='36', top_n_activities=1)
# Compute clinical validation scores for specific diseases
results = compute(
    mesh_headings=['Breast Neoplasms', 'Lung Neoplasms'],
    chembl_version='36',
    trials_only_last_n_years=6,
    filter_families=True
)
# Access results for each disease
breast_targets = results['Breast Neoplasms']
print(breast_targets.head())
from alethiotx.artemis.clinical import load
# Load pre-computed clinical scores for multiple diseases from S3
breast, lung, prostate, melanoma, bowel, diabetes, cardio = load(date='2025-12-08')
from alethiotx.artemis.pathway import get, load
# Query GeneShot API for disease-associated genes
aml_genes = get("acute myeloid leukemia", rif='generif')
print(aml_genes.loc["FLT3", ["gene_count", "rank"]])
# Load pre-computed pathway genes for multiple diseases
breast_pg, lung_pg, prostate_pg, melanoma_pg, bowel_pg, diabetes_pg, cardio_pg = load(date='2025-11-11', n=100)
from alethiotx.artemis.cv import prepare, run
import pandas as pd
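# Prepare your knowledge graph features (X) and clinical scores (y).
# X, y, pathway_genes, and known_targets are not defined in this snippet;
# supply your own before calling prepare().
result = prepare(
    X,
    y,
    pathway_genes=pathway_genes,
    known_targets=known_targets,
    bins=3,
    rand_seed=12345
)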
# Run cross-validation pipeline
scores = run(
    result['X'],
    result['y_binary'],
    n_splits=5,
    n_iterations=10,
    classifier='rf',
    scoring='roc_auc'
)
print(f"Mean AUC: {sum(scores)/len(scores):.3f}")
from alethiotx.artemis.upset import prepare, create
from alethiotx.artemis.clinical import load
from alethiotx.artemis.pathway import load as load_pathway
# Load clinical scores for multiple diseases
breast, lung, prostate, melanoma, bowel, diabetes, cardio = load(date='2025-12-08')
# Prepare data for UpSet plot (mode='ct' for clinical targets)
upset_data = prepare(breast, lung, prostate, melanoma, bowel, diabetes, cardio, mode='ct')
# Create and display the UpSet plot
plot = create(upset_data, min_subset_size=5)
plot.plot()
# For pathway genes, use mode='pg'
breast_pg, lung_pg, prostate_pg, melanoma_pg, bowel_pg, diabetes_pg, cardio_pg = load_pathway(date='2025-11-11', n=100)
upset_data_pg = prepare(breast_pg, lung_pg, prostate_pg, melanoma_pg, bowel_pg, diabetes_pg, cardio_pg, mode='pg')
plot_pg = create(upset_data_pg, min_subset_size=10)
plot_pg.plot()
The Artemis module includes built-in pre-computed data for:
- Breast Cancer (Breast Neoplasms)
- Lung Cancer (Lung Neoplasms)
- Prostate Cancer (Prostatic Neoplasms)
- Melanoma (Skin Neoplasms)
- Bowel Cancer (Intestinal Neoplasms)
- Diabetes Mellitus Type 2
- Cardiovascular Disease
The module supports querying any disease that has a MeSH heading via the compute() function.
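Because MeSH tree numbers are dotted hierarchical codes, expanding a heading to its descendants can be pictured as a prefix match on those codes. The sketch below illustrates that idea only; it does not reproduce the package's own descendants() function, and the helper name `descendants_by_prefix`, the headings, and the tree numbers are made-up placeholders rather than real MeSH identifiers.

```python
def descendants_by_prefix(heading, tree):
    """Return headings located under any tree number of `heading`.

    `tree` maps heading -> list of tree numbers, since one heading can
    appear in several branches of the hierarchy.
    """
    roots = tree[heading]
    found = set()
    for name, numbers in tree.items():
        if name == heading:
            continue
        # A descendant's tree number extends the root's number by at
        # least one more dotted segment.
        if any(num.startswith(root + ".") for num in numbers for root in roots):
            found.add(name)
    return sorted(found)


# Toy hierarchy with placeholder tree numbers (not real MeSH data).
toy_tree = {
    "Disease A": ["X01.100"],
    "Disease A, Subtype 1": ["X01.100.200"],
    "Disease A, Subtype 2": ["X01.100.300", "X02.500.100"],
    "Disease B": ["X01.1000"],
    "Disease C": ["X02.500"],
}
result = descendants_by_prefix("Disease A", toy_tree)
print(result)
```

Matching on `root + "."` rather than on the bare prefix keeps a sibling such as `X01.1000` from being mistaken for a child of `X01.100`.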
- molecules(version, top_n_activities) - Query ChEMBL for parent molecules with clinical trial data
- infer_nct_year(nct_id) - Infer registration year from ClinicalTrials.gov NCT identifier

- compute(mesh_headings, chembl_version, trials_only_last_n_years, filter_families) - Compute clinical validation scores for drug targets
- load(date) - Load pre-computed clinical scores from S3
- lookup_drug_family_representation(chembl) - Create drug-disease-family representation lookup table
- filter_overrepresented_families(targets_df, drug_chembl_id, mesh_heading, lookup_table) - Filter over-represented gene families
- unique(scores, overlap, common_genes) - Remove overlapping genes from clinical scores
- approved(scores) - Filter to include only approved targets
- all_targets(scores) - Extract all unique target genes from score lists

- get(search, rif) - Query Ma'ayan Lab's GeneShot API for disease-associated genes
- load(date, n) - Load pre-computed pathway genes from S3
- unique(genes, overlap, common_genes) - Remove overlapping genes from pathway lists

- tree(s3_base, url_base, file_base) - Retrieve MeSH tree structure
- descendants(heading, s3_base, file_base, url_base) - Get all descendant MeSH headings

- download(gene_has_family_url, family_url, hgnc_complete_url) - Download HGNC gene family data
- process(gene_has_family, family, hgnc_data) - Process HGNC data and create gene-family mappings

- prepare(X, y, pathway_genes, known_targets, term_num, bins, rand_seed) - Prepare datasets for ML model training
- run(X, y, n_splits, n_iterations, classifier, scoring) - Cross-validation pipeline with configurable classifiers

- prepare(breast, lung, prostate, melanoma, bowel, diabetes, cardiovascular, mode) - Prepare data for UpSet plot
- create(indications, min_subset_size) - Create UpSet plots for visualizing gene set intersections

- find_overlapping_genes(genes, overlap, common_genes) - Find genes that overlap across multiple gene lists
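Several functions in this reference deal with genes shared across disease lists. As a rough illustration of that idea, the sketch below counts how many lists each gene appears in and then strips the shared genes from every list. The helper names `overlapping_genes` and `drop_genes`, the `min_lists` parameter, and the toy gene names are assumptions; the exact semantics of the package's `overlap` and `common_genes` arguments are not described here and may differ.

```python
from collections import Counter


def overlapping_genes(gene_lists, min_lists=2):
    """Return genes that occur in at least `min_lists` of the given lists."""
    # set() ensures a gene repeated within one list is counted once.
    counts = Counter(gene for genes in gene_lists for gene in set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_lists)


def drop_genes(gene_lists, to_drop):
    """Return copies of the lists with every gene in `to_drop` removed."""
    to_drop = set(to_drop)
    return [[g for g in genes if g not in to_drop] for genes in gene_lists]


# Toy gene lists with placeholder names, one list per disease.
lists = [
    ["G1", "G2", "G3"],
    ["G2", "G3", "G4"],
    ["G3", "G5"],
]
shared = overlapping_genes(lists, min_lists=2)
unique_lists = drop_genes(lists, shared)
print(shared)
print(unique_lists)
```

Raising `min_lists` to the number of lists would keep only genes common to every disease, which is one way to read a "common genes" option.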
The Artemis module uses AWS S3 for storing pre-computed data:
s3://alethiotx-artemis/data/
├── clinical_scores/{date}/{disease}.csv
├── pathway_genes/{date}/{disease}.csv
├── chembl/{version}/molecules.csv
└── mesh/d{year}.pkl
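The layout above can be turned into concrete object URIs with plain string formatting. This is a small sketch that follows the path templates shown; the helper names and the disease file name `breast` are hypothetical, since the exact file names in the bucket are not listed here.

```python
from datetime import date

# Base prefix taken from the layout shown above.
BASE = "s3://alethiotx-artemis/data"


def _check_date(value):
    """Raise ValueError unless `value` is an ISO date such as 2025-12-08."""
    date.fromisoformat(value)
    return value


def clinical_scores_uri(snapshot, disease):
    """Build the URI for clinical_scores/{date}/{disease}.csv."""
    return f"{BASE}/clinical_scores/{_check_date(snapshot)}/{disease}.csv"


def pathway_genes_uri(snapshot, disease):
    """Build the URI for pathway_genes/{date}/{disease}.csv."""
    return f"{BASE}/pathway_genes/{_check_date(snapshot)}/{disease}.csv"


def chembl_molecules_uri(version):
    """Build the URI for chembl/{version}/molecules.csv."""
    return f"{BASE}/chembl/{version}/molecules.csv"


# 'breast' is an assumed file name used only for illustration.
uri = clinical_scores_uri("2025-12-08", "breast")
print(uri)
```

With s3fs installed, pandas can read such a URI directly via `pandas.read_csv(uri)`, provided the object exists and is accessible with your credentials.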
- Python >= 3.9
- requests
- scikit-learn
- pandas
- numpy
- setuptools
- fsspec
- s3fs
- upsetplot
- chembl-downloader
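If an import fails after installation, it can help to check which of the requirements listed above are actually present in the current environment. The following sketch uses only the standard library; the helper name `missing_packages` is hypothetical, and the package names are copied from the list above.

```python
import sys
from importlib import metadata

# Distribution names as listed in the requirements above.
REQUIRED = [
    "requests",
    "scikit-learn",
    "pandas",
    "numpy",
    "setuptools",
    "fsspec",
    "s3fs",
    "upsetplot",
    "chembl-downloader",
]


def missing_packages(names):
    """Return the subset of `names` with no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


python_ok = sys.version_info >= (3, 9)
missing = missing_packages(REQUIRED)
print(f"Python >= 3.9: {python_ok}")
print(f"Missing packages: {missing}")
```

Installing with pip normally resolves these dependencies automatically, so a non-empty result usually points to a different interpreter or virtual environment than the one used for installation.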
If you use the Artemis module in your research, please cite:
Artemis: public knowledge graphs enable accessible and scalable drug target discovery
Vladimir Kiselev, Alethio Therapeutics
For other modules, citation information will be provided as they are released.
This project is licensed under the MIT License - see the LICENSE file for details.
Vladimir Kiselev
Email: vlad.kiselev@alethiomics.com
- Homepage: https://github.com/alethiotx/pypi
- Issues: https://github.com/alethiotx/pypi/issues
Contributions are welcome! Please feel free to submit a Pull Request.
Current Focus: Artemis - Enabling accessible and scalable drug target discovery through public knowledge graphs.
Coming Soon: Additional modules for expanded drug discovery capabilities.