OpenMS/agentomics

Agentomics

A growing collection of 120 standalone CLI tools built with pyopenms for proteomics and metabolomics workflows. Every tool in this repository fills a gap not covered by existing OpenMS TOPP tools: small, focused utilities that researchers need daily but typically write as throwaway scripts.

Why This Exists

Mass spectrometry researchers constantly need small utilities: extract an XIC from an mzML file, compute adduct m/z values for a metabolite, check peptide uniqueness in a FASTA database, validate crosslink distances against a PDB structure. These tasks are too simple for a full pipeline but too tedious to re-implement from scratch every time.

Agentomics collects these utilities into a single, organized repository where each tool is:

  • Self-contained — no cross-tool dependencies, install and run independently
  • CLI-first — every tool has a click interface, usable from the command line or imported as a Python library
  • Tested — every tool ships with unit tests using synthetic pyopenms data
  • pyopenms-native — built on the official Python bindings for OpenMS, not reimplementing what already exists

AI-Generated Disclaimer

All code in this repository is written entirely by AI agents (Claude Code, GitHub Copilot, Cursor, Gemini, etc.). This is an agentic-only development project — tool ideas were researched from GitHub repositories, community forums (BioStars, Reddit), published papers, and pyopenms documentation, then implemented by AI. Human review is applied for quality control and direction, but the code itself is machine-generated. Use at your own discretion and always validate results against established tools for critical analyses.

Contributing (Agentic Workflow)

This repo is designed for AI agent contributions. The full contributor guide is in AGENTS.md, but the key idea is:

  1. Pick a gap — find a utility task that researchers need but no TOPP tool covers
  2. Follow the structure — every tool lives in its own directory with a standard layout (see below)
  3. Validate in isolation — each tool must pass ruff check and pytest in a fresh venv that holds only the dependencies from the tool's requirements.txt, plus pytest and ruff
  4. Do not duplicate TOPP tools — if FileConverter, PeakPickerHiRes, FalseDiscoveryRate, or any other TOPP command already does it, don't rebuild it here

Two Claude Code skills are available for contributors:

  • contribute-script — guided workflow for adding a new tool
  • validate-script — validate any tool in an isolated venv (ruff + pytest)

Tool Structure

Every tool follows the same directory layout:

tools/<domain>/<topic>/<tool_name>/
├── <tool_name>.py        # The tool (importable functions + click CLI)
├── requirements.txt      # pyopenms + tool-specific deps (no version pins)
├── README.md             # Brief description + CLI usage examples
└── tests/
    ├── conftest.py       # requires_pyopenms marker + sys.path setup
    └── test_<tool_name>.py

Every .py file contains:

  1. A module docstring describing the tool, its features, and usage
  2. A pyopenms import guard with a user-friendly error message
  3. Importable functions with type hints and numpy-style docstrings — so the tool works both as a library and as a CLI
  4. A main() function wiring up click for command-line usage
  5. An if __name__ == "__main__": main() guard
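The five elements above can be sketched as follows. This is an illustrative skeleton only: the tool name `peptide_length_reporter`, its function, and its `--sequence` option are hypothetical and not taken from the repository. The skeleton is held in a string so that a small `ast`-based check (also hypothetical, not part of the repository's tooling) can confirm each element is present without pyopenms or click being installed.

```python
import ast

# Hypothetical tool source showing the five required elements.
SKELETON = '''"""peptide_length_reporter: report the residue count of a peptide.

Usage: python peptide_length_reporter.py --sequence PEPTIDEK
"""
import sys

try:
    import pyopenms as oms
except ImportError:
    sys.exit("pyopenms is required. Install it with: pip install pyopenms")

import click


def peptide_length(sequence: str) -> int:
    """Return the number of residues in a peptide.

    Parameters
    ----------
    sequence : str
        Peptide sequence in one-letter code.

    Returns
    -------
    int
        Residue count.
    """
    return oms.AASequence.fromString(sequence).size()


@click.command()
@click.option("--sequence", required=True, help="Peptide sequence.")
def main(sequence: str) -> None:
    """Print the residue count for the given sequence."""
    click.echo(peptide_length(sequence))


if __name__ == "__main__":
    main()
'''


def check_tool_source(source: str) -> dict[str, bool]:
    """Report which of the five required elements a tool source contains."""
    tree = ast.parse(source)
    funcs = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    public = [f for f in funcs if f.name != "main" and not f.name.startswith("_")]

    def guards_pyopenms(node: ast.stmt) -> bool:
        # A try block that imports pyopenms and catches ImportError.
        if not isinstance(node, ast.Try):
            return False
        imports = any(
            isinstance(s, ast.Import) and any(a.name == "pyopenms" for a in s.names)
            for s in node.body
        )
        catches = any(
            isinstance(h.type, ast.Name) and h.type.id == "ImportError"
            for h in node.handlers
        )
        return imports and catches

    def typed_and_documented(func: ast.FunctionDef) -> bool:
        args_ok = all(a.annotation is not None for a in func.args.args)
        return args_ok and func.returns is not None and ast.get_docstring(func) is not None

    def is_main_guard(node: ast.stmt) -> bool:
        return isinstance(node, ast.If) and ast.unparse(node.test) == "__name__ == '__main__'"

    return {
        "module_docstring": ast.get_docstring(tree) is not None,
        "pyopenms_import_guard": any(guards_pyopenms(n) for n in tree.body),
        "typed_documented_functions": bool(public) and all(map(typed_and_documented, public)),
        "main_function": any(f.name == "main" for f in funcs),
        "main_guard": any(is_main_guard(n) for n in tree.body),
    }


report = check_tool_source(SKELETON)
for element, present in report.items():
    print(f"{element}: {'ok' if present else 'missing'}")
```

The check only inspects structure; it does not verify that docstrings follow numpy style or that the click options match the function signature.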

Domains: proteomics/, metabolomics/

Proteomics topics: spectrum_analysis/, peptide_analysis/, protein_analysis/, fasta_utils/, file_conversion/, quality_control/, targeted_proteomics/, identification/, ptm_analysis/, structural_proteomics/, specialized/, rna/

Metabolomics topics: formula_tools/, feature_processing/, spectral_analysis/, compound_annotation/, drug_metabolism/, isotope_labeling/, lipidomics/, export/

Requirements

pip install pyopenms

Some tools require additional dependencies (numpy, scipy). Check each tool's requirements.txt.

Running a Tool

# Install dependencies
pip install -r tools/proteomics/spectrum_analysis/theoretical_spectrum_generator/requirements.txt

# Run via CLI
python tools/proteomics/spectrum_analysis/theoretical_spectrum_generator/theoretical_spectrum_generator.py --help

# Run tests
PYTHONPATH=tools/proteomics/spectrum_analysis/theoretical_spectrum_generator \
  python -m pytest tools/proteomics/spectrum_analysis/theoretical_spectrum_generator/tests/ -v

Validation

Each tool is validated in an isolated venv:

TOOL_DIR=tools/<domain>/<topic>/<tool_name>
VENV_DIR=$(mktemp -d)
python -m venv "$VENV_DIR"
"$VENV_DIR/bin/python" -m pip install -r "$TOOL_DIR/requirements.txt"
"$VENV_DIR/bin/python" -m pip install pytest ruff
"$VENV_DIR/bin/python" -m ruff check "$TOOL_DIR/"
PYTHONPATH="$TOOL_DIR" "$VENV_DIR/bin/python" -m pytest "$TOOL_DIR/tests/" -v
rm -rf "$VENV_DIR"

Both ruff and pytest must pass with zero errors.


Tool Catalog

Proteomics (88 tools)

Spectrum Analysis (7 tools)

| Tool | Description |
|------|-------------|
| theoretical_spectrum_generator | Generate theoretical b/y/a/c/x/z fragment ion spectra for peptide sequences |
| spectrum_similarity_scorer | Compute cosine similarity between MS2 spectra from MGF files |
| spectrum_annotator | Annotate observed MS2 peaks with theoretical fragment ion matches |
| spectrum_scoring_hyperscore | Score experimental spectra against theoretical using HyperScore |
| spectrum_entropy_calculator | Calculate normalized Shannon entropy for MS2 spectra |
| spectral_library_builder | Build consensus spectral libraries from mzML + peptide identifications |
| spectral_library_format_converter | Convert between spectral library formats (MSP, TraML) |
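To make two of the quantities in this table concrete, here is a minimal pure-Python sketch of normalized Shannon entropy and cosine similarity for peak lists. It is not the repository's implementation, which builds on pyopenms. The function names, the division by ln(n) as the normalization, and the fixed-width m/z binning used to align peaks are all assumptions made for illustration.

```python
import math


def normalized_entropy(intensities: list[float]) -> float:
    """Shannon entropy of the intensity distribution, divided by ln(n_peaks)."""
    total = sum(intensities)
    if total <= 0:
        return 0.0
    probs = [i / total for i in intensities if i > 0]
    if len(probs) < 2:
        return 0.0  # a single peak carries no entropy
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(probs))


def binned_cosine(
    spec_a: list[tuple[float, float]],
    spec_b: list[tuple[float, float]],
    bin_width: float = 0.01,
) -> float:
    """Cosine similarity of two (m/z, intensity) peak lists after m/z binning."""

    def binned(spec: list[tuple[float, float]]) -> dict[int, float]:
        out: dict[int, float] = {}
        for mz, intensity in spec:
            key = round(mz / bin_width)
            out[key] = out.get(key, 0.0) + intensity
        return out

    a, b = binned(spec_a), binned(spec_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


spectrum = [(100.0, 10.0), (200.5, 5.0)]
print(normalized_entropy([i for _, i in spectrum]))
print(binned_cosine(spectrum, spectrum))
```

Fixed-width binning is the simplest alignment strategy; a tolerance-based peak matcher would handle peaks that fall near a bin edge more gracefully.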

Peptide Analysis (12 tools)

| Tool | Description |
|------|-------------|
| peptide_property_calculator | Calculate pI, hydrophobicity, charge at pH, amino acid composition |
| peptide_mass_calculator | Monoisotopic/average masses and b/y fragment ions |
| peptide_uniqueness_checker | Check if peptides are proteotypic within a FASTA database |
| modification_mass_calculator | Query Unimod by name or mass shift, compute modified peptide masses |
| modified_peptide_generator | Enumerate all modified peptide variants for given variable/fixed mods |
| peptide_modification_analyzer | Residue-by-residue mass breakdown of modified peptides |
| peptide_detectability_predictor | Predict peptide detectability from physicochemical heuristics |
| isoelectric_point_calculator | Calculate pI using Henderson-Hasselbalch with configurable pK sets |
| charge_state_predictor | Predict charge state distribution based on basic residues |
| amino_acid_composition_analyzer | Amino acid frequency and composition statistics |
| rt_prediction_additive | Predict peptide RT using additive hydrophobicity models |
| peptide_mass_fingerprint | Generate/match peptide mass fingerprints for MALDI-TOF identification |

Protein Analysis (5 tools)

| Tool | Description |
|------|-------------|
| protein_digest | In-silico enzymatic protein digestion |
| protein_coverage_calculator | Map peptides to proteins and calculate sequence coverage |
| protein_group_reporter | Generate clean protein-level reports with group membership |
| spectral_counting_quantifier | Calculate protein abundances using emPAI or NSAF methods |
| peptide_to_protein_mapper | Map peptide sequences to parent proteins in a FASTA database |
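The two spectral counting measures named in this table follow standard published definitions: emPAI = 10^(N_observed / N_observable) - 1, and NSAF is each protein's spectral count divided by its length, normalized so the values sum to 1 across proteins. The sketch below is a minimal pure-Python illustration of those formulas, not the code of spectral_counting_quantifier; the function names, protein identifiers, and input numbers are hypothetical.

```python
def empai(n_observed: int, n_observable: int) -> float:
    """Exponentially modified protein abundance index."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def nsaf(spectral_counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Normalized spectral abundance factor per protein."""
    # SAF = spectral count / protein length
    saf = {protein: spectral_counts[protein] / lengths[protein] for protein in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("at least one protein needs a non-zero spectral count")
    return {protein: value / total for protein, value in saf.items()}


# Made-up example values for two proteins.
counts = {"P1": 10, "P2": 30}
lengths = {"P1": 100, "P2": 600}
print(empai(5, 10))
print(nsaf(counts, lengths))
```

In this example P1 has fewer spectra than P2 but a higher NSAF, because the length normalization accounts for P2 being six times longer.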

FASTA Utilities (8 tools)

| Tool | Description |
|------|-------------|
| fasta_subset_extractor | Extract proteins by accession list, keyword, or length range |
| fasta_statistics_reporter | Report protein count, lengths, amino acid frequency, tryptic peptide counts |
| contaminant_database_merger | Append cRAP contaminant sequences with configurable prefix |
| fasta_cleaner | Remove duplicates, fix headers, filter by length |
| fasta_merger | Merge multiple FASTA files with duplicate removal |
| fasta_decoy_validator | Check if a FASTA already contains decoys, validate prefix consistency |
| fasta_in_silico_digest_stats | Digest a FASTA and report peptide-level statistics |
| fasta_taxonomy_splitter | Split multi-organism FASTA by taxonomy from headers |

File Conversion (7 tools)

| Tool | Description |
|------|-------------|
| mgf_mzml_converter | Bidirectional MGF ↔ mzML converter with spectrum filtering (merged from mgf_to_mzml_converter + mzml_to_mgf_converter) |
| consensus_map_to_matrix | Convert consensusXML to flat quantification matrix |
| idxml_to_tsv_exporter | Export idXML identification results to flat TSV |
| ms_data_to_csv_exporter | Export mzML/featureXML data to CSV with column selection |
| mztab_summarizer | Parse mzTab files and extract summary statistics |
| featurexml_merger | Merge multiple featureXML files |
| ms_data_ml_exporter | Export MS features as ML-ready matrices |

Quality Control (15 tools)

| Tool | Description |
|------|-------------|
| lc_ms_qc_reporter | Comprehensive QC report from mzML (TIC, MS1/MS2 counts, charge distribution) |
| mzqc_generator | Generate mzQC-format (HUPO-PSI standard) quality control files |
| identification_qc_reporter | Report identification-level QC metrics from search results |
| run_comparison_reporter | Compare mzML files side-by-side (TIC correlation, shared precursors) |
| mass_error_distribution_analyzer | Compute precursor and fragment mass error distributions |
| acquisition_rate_analyzer | Analyze MS1/MS2 acquisition rates, cycle time, duty cycle |
| precursor_isolation_purity | Estimate precursor isolation purity and co-isolation interference |
| injection_time_analyzer | Extract and analyze injection time values from mzML |
| collision_energy_analyzer | Extract and analyze collision energy values across MS2 spectra |
| precursor_charge_distribution | Analyze charge state distribution across MS2 spectra |
| precursor_recurrence_analyzer | Analyze precursor resampling frequency in DDA runs |
| missed_cleavage_analyzer | Analyze missed cleavage distribution as a digestion QC metric |
| sample_complexity_estimator | Estimate sample complexity from MS1 peak density |
| spectrum_file_info | Summary statistics for mzML files |
| ms1_feature_intensity_tracker | Track feature intensities across a batch of mzML runs |

Targeted Proteomics (7 tools)

| Tool | Description |
|------|-------------|
| xic_extractor | Extract ion chromatograms for target m/z values from mzML |
| tic_bpc_calculator | Compute TIC and base peak chromatograms from mzML |
| transition_list_generator | Generate SRM/MRM/PRM transition lists from peptide sequences |
| irt_calculator | Convert observed RT to indexed retention time (iRT) values |
| inclusion_list_generator | Generate instrument inclusion lists from identification results |
| dia_window_analyzer | Report DIA isolation window scheme from mzML metadata |
| library_coverage_estimator | Estimate proteome coverage of a spectral library |

Identification (7 tools)

| Tool | Description |
|------|-------------|
| feature_detection_proteomics | Peptide feature detection from LC-MS/MS data |
| psm_feature_extractor | Extract rescoring features from PSMs (mass error, coverage, intensity) |
| peptide_spectral_match_validator | Validate individual PSMs by recomputing fragment ion coverage |
| semi_tryptic_peptide_finder | Classify peptides as fully/semi/non-tryptic |
| sequence_tag_generator | Generate de novo sequence tags from MS2 fragment ion ladders |
| mzml_spectrum_subsetter | Extract specific spectra from mzML by scan number list |
| mzml_metadata_extractor | Extract instrument metadata from mzML files |

PTM Analysis (5 tools)

| Tool | Description |
|------|-------------|
| ptm_site_localization_scorer | Score PTM site localization confidence using fragment ion coverage |
| phosphosite_class_filter | Classify phosphosites into Class I/II/III by localization probability |
| phospho_motif_analyzer | Extract sequence windows around phosphosites and analyze kinase motifs |
| phospho_enrichment_qc | Compute phospho-enrichment efficiency and pSer/pThr/pTyr ratios |
| glycopeptide_mass_calculator | Calculate glycopeptide masses with glycan compositions |

Structural Proteomics (5 tools)

| Tool | Description |
|------|-------------|
| hdx_deuterium_uptake | Calculate deuterium uptake from HDX-MS time course data |
| hdx_back_exchange_estimator | Estimate per-peptide back-exchange rates from fully deuterated controls |
| crosslink_mass_calculator | Calculate masses for crosslinked peptide pairs (DSS, BS3, DSSO) |
| xl_distance_validator | Validate crosslink distances against PDB structures |
| xl_link_classifier | Classify crosslinks as intra-protein, inter-protein, or monolink |

Specialized (7 tools)

| Tool | Description |
|------|-------------|
| immunopeptide_filter | Filter peptides for MHC-I/II by length range and motif |
| immunopeptidome_qc | QC for immunopeptidomics (length distribution, anchor residues) |
| metapeptide_lca_assigner | Assign lowest common ancestor taxonomy from peptide-protein mappings |
| cleavage_site_profiler | Profile protease cleavage site specificity from N-terminomics data |
| nterm_modification_annotator | Classify N-terminal peptides (protein N-term, signal peptide, neo-N-term) |
| proteoform_delta_annotator | Annotate mass differences between proteoforms with known PTMs |
| topdown_coverage_calculator | Compute per-residue bond cleavage coverage for intact proteins |

RNA (3 tools)

| Tool | Description |
|------|-------------|
| rna_mass_calculator | Calculate mass, formula, and isotopes for RNA/oligonucleotide sequences |
| rna_digest | In silico RNA digestion with RNases (T1, U2, etc.) |
| rna_fragment_spectrum_generator | Generate theoretical RNA fragment spectra (c/y/w/a-B ions) |

Metabolomics (32 tools)

Formula Tools (8 tools)

| Tool | Description |
|------|-------------|
| adduct_calculator | Compute m/z for all common ESI adducts given a formula or mass |
| molecular_formula_finder | Enumerate valid molecular formulas for an accurate mass with element constraints |
| mass_decomposition_tool | Find molecular formula compositions for a given mass within tolerance |
| formula_mass_calculator | Calculate exact masses for molecular formulas with adduct support |
| formula_validator_golden_rules | Apply Kind & Fiehn's Seven Golden Rules to filter formula candidates |
| rdbe_calculator | Calculate Ring/Double Bond Equivalence for molecular formulas |
| metabolite_formula_annotator | Annotate features with candidate formulas using mass + isotope fit |
| mass_accuracy_calculator | Compute m/z mass accuracy (ppm error) for sequences or formulas |
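Three of the calculations in this table reduce to short formulas, sketched here in pure Python for orientation. This is not the repository's code, which relies on pyopenms for formula handling. The function names are hypothetical, the formula parser accepts only flat formulas such as C6H12O6 (no brackets or isotope labels), and the RDBE expression assumes standard valences (C tetravalent, N and P trivalent, H and halogens monovalent).

```python
import re

PROTON_MASS = 1.007276  # Da, rounded


def parse_formula(formula: str) -> dict[str, int]:
    """Count elements in a flat molecular formula such as 'C6H12O6'."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def rdbe(formula: str) -> float:
    """Ring/double bond equivalence: C - (H + X)/2 + (N + P)/2 + 1."""
    c = parse_formula(formula)
    halogens = sum(c.get(x, 0) for x in ("F", "Cl", "Br", "I"))
    trivalent = c.get("N", 0) + c.get("P", 0)
    return c.get("C", 0) - (c.get("H", 0) + halogens) / 2 + trivalent / 2 + 1


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def adduct_mz(neutral_mass: float, adduct_mass: float, charge: int) -> float:
    """m/z of an ion formed by adding adduct_mass to the neutral molecule."""
    return (neutral_mass + adduct_mass) / abs(charge)


glucose = 180.06339  # monoisotopic mass of C6H12O6, Da
print(rdbe("C6H6"), rdbe("C6H12O6"))
print(ppm_error(500.001, 500.0))
print(adduct_mz(glucose, PROTON_MASS, 1))        # [M+H]+
print(adduct_mz(glucose, -PROTON_MASS, -1))      # [M-H]-
print(adduct_mz(glucose, 2 * PROTON_MASS, 2))    # [M+2H]2+
```

Passing the adduct as a signed mass keeps positive and negative modes in one function; a fuller calculator would look the adduct mass up from a table of named adducts.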

Feature Processing (7 tools)

| Tool | Description |
|------|-------------|
| blank_subtraction_tool | Subtract blank/control features from sample features by m/z + RT matching |
| duplicate_feature_detector | Detect and flag duplicate features by m/z and RT proximity |
| adduct_group_analyzer | Group features by adduct relationships into ion identity groups |
| isf_detector | Detect in-source fragmentation artifacts by coelution and neutral loss |
| targeted_feature_extractor | Extract features for known compounds from MS1 data |
| mass_defect_filter | Filter features by mass defect and Kendrick mass defect |
| metabolite_feature_detection | Metabolite feature detection from LC-MS data |

Spectral Analysis (4 tools)

| Tool | Description |
|------|-------------|
| spectral_entropy_scorer | Compute spectral entropy similarity (Li & Fiehn 2021) |
| neutral_loss_scanner | Scan MS2 spectra for characteristic neutral losses |
| isotope_pattern_analyzer | Generate theoretical isotope distributions, cosine similarity scoring, Da/ppm tolerance, Cl/Br halogen detection (merged from isotope_pattern_matcher + isotope_pattern_scorer + isotope_pattern_fit_scorer) |
| massql_query_tool | Query mzML data using MassQL-like syntax |

Compound Annotation (4 tools)

| Tool | Description |
|------|-------------|
| van_krevelen_data_generator | Compute H:C and O:C ratios, classify into biochemical compound classes |
| kendrick_mass_defect_analyzer | Compute Kendrick mass defect for homologous series detection (CH2, CF2, etc.) |
| suspect_screener | Match detected masses against suspect screening lists (CompTox, NORMAN) |
| metabolite_class_predictor | Predict compound class from mass defect, element ratios, and RDBE |
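As an illustration of why Kendrick mass defect helps detect homologous series, the following pure-Python sketch rescales masses so that the repeat unit (CH2 by default) weighs exactly its nominal mass; members of the same series then share one defect. It is not the code of kendrick_mass_defect_analyzer. The function names are hypothetical, and the sign convention (rounded Kendrick mass minus Kendrick mass) is one of several in use.

```python
CH2_EXACT = 14.01565  # monoisotopic mass of CH2, Da
CH2_NOMINAL = 14


def kendrick_mass(mass: float, base_exact: float = CH2_EXACT, base_nominal: int = CH2_NOMINAL) -> float:
    """Rescale a mass so that the base unit weighs exactly its nominal mass."""
    return mass * base_nominal / base_exact


def kendrick_mass_defect(mass: float, base_exact: float = CH2_EXACT, base_nominal: int = CH2_NOMINAL) -> float:
    """Kendrick mass defect as rounded Kendrick mass minus Kendrick mass."""
    km = kendrick_mass(mass, base_exact, base_nominal)
    return round(km) - km


# Palmitic acid (C16H32O2) and stearic acid (C18H36O2) differ by two CH2 units.
palmitic, stearic = 256.24023, 284.27153
print(kendrick_mass_defect(palmitic))
print(kendrick_mass_defect(stearic))
```

The two fatty acids print nearly identical defects; for another series, pass that unit's masses, for example a CF2 base with an exact mass of about 49.99681 Da and a nominal mass of 50.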

Drug Metabolism (2 tools)

| Tool | Description |
|------|-------------|
| drug_metabolite_screener | Predict Phase I/II drug metabolites and screen mzML for matches |
| mass_difference_network_builder | Connect features by known biotransformation mass differences |

Isotope Labeling (2 tools)

| Tool | Description |
|------|-------------|
| isotope_label_detector | Detect 13C/15N-labeled metabolites by paired feature analysis |
| mid_natural_abundance_corrector | Correct mass isotopomer distributions for natural 13C abundance |

Lipidomics (2 tools)

| Tool | Description |
|------|-------------|
| lipid_species_resolver | Enumerate acyl chain combinations from sum-composition lipid annotations |
| lipid_ecn_rt_predictor | Predict lipid RT from Equivalent Carbon Number |

Export (3 tools)

| Tool | Description |
|------|-------------|
| gnps_fbmn_exporter | Export MS2 + quantification in GNPS Feature-Based Molecular Networking format |
| sirius_exporter | Export features + MS2 data to SIRIUS .ms format |
| kovats_ri_calculator | Calculate Kovats Retention Index from alkane standards for GC-MS |

License

BSD 3-Clause — see LICENSE.
