A growing collection of 118 standalone CLI tools built with pyopenms for proteomics and metabolomics workflows. Every tool in this repository fills a gap not covered by existing OpenMS TOPP tools — small, focused utilities that researchers need daily but typically write as throwaway scripts.
Mass spectrometry researchers constantly need small utilities: extract an XIC from an mzML file, compute adduct m/z values for a metabolite, check peptide uniqueness in a FASTA database, validate crosslink distances against a PDB structure. These tasks are too simple for a full pipeline but too tedious to re-implement from scratch every time.
Agentomics collects these utilities into a single, organized repository where each tool is:
- Self-contained — no cross-tool dependencies, install and run independently
- CLI-first — every tool has an
clickinterface, usable from the command line or imported as a Python library - Tested — every tool ships with unit tests using synthetic pyopenms data
- pyopenms-native — built on the official Python bindings for OpenMS, not reimplementing what already exists
All code in this repository is written entirely by AI agents (Claude Code, GitHub Copilot, Cursor, Gemini, etc.). This is an agentic-only development project — tool ideas were researched from GitHub repositories, community forums (BioStars, Reddit), published papers, and pyopenms documentation, then implemented by AI. Human review is applied for quality control and direction, but the code itself is machine-generated. Use at your own discretion and always validate results against established tools for critical analyses.
This repo is designed for AI agent contributions. The full contributor guide is in AGENTS.md, but the key idea is:
- Pick a gap — find a utility task that researchers need but no TOPP tool covers
- Follow the structure — every tool lives in its own directory with a standard layout (see below)
- Validate in isolation — each tool must pass
ruff checkandpytestin a fresh venv with onlypyopenmsinstalled - Do not duplicate TOPP tools — if
FileConverter,PeakPickerHiRes,FalseDiscoveryRate, or any other TOPP command already does it, don't rebuild it here
Two Claude Code skills are available for contributors:
contribute-script— guided workflow for adding a new toolvalidate-script— validate any tool in an isolated venv (ruff + pytest)
Every tool follows the same directory layout:
tools/<domain>/<topic>/<tool_name>/
├── <tool_name>.py # The tool (importable functions + click CLI)
├── requirements.txt # pyopenms + tool-specific deps (no version pins)
├── README.md # Brief description + CLI usage examples
└── tests/
├── conftest.py # requires_pyopenms marker + sys.path setup
└── test_<tool_name>.py
Every .py file contains:
- A module docstring describing the tool, its features, and usage
- A pyopenms import guard with a user-friendly error message
- Importable functions with type hints and numpy-style docstrings — so the tool works both as a library and as a CLI
- A
main()function wiring upclickfor command-line usage - An
if __name__ == "__main__": main()guard
Domains: proteomics/, metabolomics/
Proteomics topics: spectrum_analysis/, peptide_analysis/, protein_analysis/, fasta_utils/, file_conversion/, quality_control/, targeted_proteomics/, identification/, ptm_analysis/, structural_proteomics/, specialized/, rna/
Metabolomics topics: formula_tools/, feature_processing/, spectral_analysis/, compound_annotation/, drug_metabolism/, isotope_labeling/, lipidomics/, export/
pip install pyopenmsSome tools require additional dependencies (numpy, scipy). Check each tool's requirements.txt.
# Install dependencies
pip install -r tools/proteomics/spectrum_analysis/theoretical_spectrum_generator/requirements.txt
# Run via CLI
python tools/proteomics/spectrum_analysis/theoretical_spectrum_generator/theoretical_spectrum_generator.py --help
# Run tests
PYTHONPATH=tools/proteomics/spectrum_analysis/theoretical_spectrum_generator \
python -m pytest tools/proteomics/spectrum_analysis/theoretical_spectrum_generator/tests/ -vEach tool is validated in an isolated venv:
TOOL_DIR=tools/<domain>/<topic>/<tool_name>
VENV_DIR=$(mktemp -d)
python -m venv "$VENV_DIR"
"$VENV_DIR/bin/python" -m pip install -r "$TOOL_DIR/requirements.txt"
"$VENV_DIR/bin/python" -m pip install pytest ruff
"$VENV_DIR/bin/python" -m ruff check "$TOOL_DIR/"
PYTHONPATH="$TOOL_DIR" "$VENV_DIR/bin/python" -m pytest "$TOOL_DIR/tests/" -v
rm -rf "$VENV_DIR"Both ruff and pytest must pass with zero errors.
| Tool | Description |
|---|---|
theoretical_spectrum_generator |
Generate theoretical b/y/a/c/x/z fragment ion spectra for peptide sequences |
spectrum_similarity_scorer |
Compute cosine similarity between MS2 spectra from MGF files |
spectrum_annotator |
Annotate observed MS2 peaks with theoretical fragment ion matches |
spectrum_scoring_hyperscore |
Score experimental spectra against theoretical using HyperScore |
spectrum_entropy_calculator |
Calculate normalized Shannon entropy for MS2 spectra |
spectral_library_builder |
Build consensus spectral libraries from mzML + peptide identifications |
spectral_library_format_converter |
Convert between spectral library formats (MSP, TraML) |
| Tool | Description |
|---|---|
peptide_property_calculator |
Calculate pI, hydrophobicity, charge at pH, amino acid composition |
peptide_mass_calculator |
Monoisotopic/average masses and b/y fragment ions |
peptide_uniqueness_checker |
Check if peptides are proteotypic within a FASTA database |
modification_mass_calculator |
Query Unimod by name or mass shift, compute modified peptide masses |
modified_peptide_generator |
Enumerate all modified peptide variants for given variable/fixed mods |
peptide_modification_analyzer |
Residue-by-residue mass breakdown of modified peptides |
peptide_detectability_predictor |
Predict peptide detectability from physicochemical heuristics |
isoelectric_point_calculator |
Calculate pI using Henderson-Hasselbalch with configurable pK sets |
charge_state_predictor |
Predict charge state distribution based on basic residues |
amino_acid_composition_analyzer |
Amino acid frequency and composition statistics |
rt_prediction_additive |
Predict peptide RT using additive hydrophobicity models |
peptide_mass_fingerprint |
Generate/match peptide mass fingerprints for MALDI-TOF identification |
| Tool | Description |
|---|---|
protein_digest |
In-silico enzymatic protein digestion |
protein_coverage_calculator |
Map peptides to proteins and calculate sequence coverage |
protein_group_reporter |
Generate clean protein-level reports with group membership |
spectral_counting_quantifier |
Calculate protein abundances using emPAI or NSAF methods |
peptide_to_protein_mapper |
Map peptide sequences to parent proteins in a FASTA database |
| Tool | Description |
|---|---|
fasta_subset_extractor |
Extract proteins by accession list, keyword, or length range |
fasta_statistics_reporter |
Report protein count, lengths, amino acid frequency, tryptic peptide counts |
contaminant_database_merger |
Append cRAP contaminant sequences with configurable prefix |
fasta_cleaner |
Remove duplicates, fix headers, filter by length |
fasta_merger |
Merge multiple FASTA files with duplicate removal |
fasta_decoy_validator |
Check if a FASTA already contains decoys, validate prefix consistency |
fasta_in_silico_digest_stats |
Digest a FASTA and report peptide-level statistics |
fasta_taxonomy_splitter |
Split multi-organism FASTA by taxonomy from headers |
| Tool | Description |
|---|---|
mgf_mzml_converter |
Bidirectional MGF ↔ mzML converter with spectrum filtering (merged from mgf_to_mzml_converter + mzml_to_mgf_converter) |
consensus_map_to_matrix |
Convert consensusXML to flat quantification matrix |
idxml_to_tsv_exporter |
Export idXML identification results to flat TSV |
ms_data_to_csv_exporter |
Export mzML/featureXML data to CSV with column selection |
mztab_summarizer |
Parse mzTab files and extract summary statistics |
featurexml_merger |
Merge multiple featureXML files |
ms_data_ml_exporter |
Export MS features as ML-ready matrices |
| Tool | Description |
|---|---|
lc_ms_qc_reporter |
Comprehensive QC report from mzML (TIC, MS1/MS2 counts, charge distribution) |
mzqc_generator |
Generate mzQC-format (HUPO-PSI standard) quality control files |
identification_qc_reporter |
Report identification-level QC metrics from search results |
run_comparison_reporter |
Compare mzML files side-by-side (TIC correlation, shared precursors) |
mass_error_distribution_analyzer |
Compute precursor and fragment mass error distributions |
acquisition_rate_analyzer |
Analyze MS1/MS2 acquisition rates, cycle time, duty cycle |
precursor_isolation_purity |
Estimate precursor isolation purity and co-isolation interference |
injection_time_analyzer |
Extract and analyze injection time values from mzML |
collision_energy_analyzer |
Extract and analyze collision energy values across MS2 spectra |
precursor_charge_distribution |
Analyze charge state distribution across MS2 spectra |
precursor_recurrence_analyzer |
Analyze precursor resampling frequency in DDA runs |
missed_cleavage_analyzer |
Analyze missed cleavage distribution as a digestion QC metric |
sample_complexity_estimator |
Estimate sample complexity from MS1 peak density |
spectrum_file_info |
Summary statistics for mzML files |
ms1_feature_intensity_tracker |
Track feature intensities across a batch of mzML runs |
| Tool | Description |
|---|---|
xic_extractor |
Extract ion chromatograms for target m/z values from mzML |
tic_bpc_calculator |
Compute TIC and base peak chromatograms from mzML |
transition_list_generator |
Generate SRM/MRM/PRM transition lists from peptide sequences |
irt_calculator |
Convert observed RT to indexed retention time (iRT) values |
inclusion_list_generator |
Generate instrument inclusion lists from identification results |
dia_window_analyzer |
Report DIA isolation window scheme from mzML metadata |
library_coverage_estimator |
Estimate proteome coverage of a spectral library |
| Tool | Description |
|---|---|
feature_detection_proteomics |
Peptide feature detection from LC-MS/MS data |
psm_feature_extractor |
Extract rescoring features from PSMs (mass error, coverage, intensity) |
peptide_spectral_match_validator |
Validate individual PSMs by recomputing fragment ion coverage |
semi_tryptic_peptide_finder |
Classify peptides as fully/semi/non-tryptic |
sequence_tag_generator |
Generate de novo sequence tags from MS2 fragment ion ladders |
mzml_spectrum_subsetter |
Extract specific spectra from mzML by scan number list |
mzml_metadata_extractor |
Extract instrument metadata from mzML files |
| Tool | Description |
|---|---|
ptm_site_localization_scorer |
Score PTM site localization confidence using fragment ion coverage |
phosphosite_class_filter |
Classify phosphosites into Class I/II/III by localization probability |
phospho_motif_analyzer |
Extract sequence windows around phosphosites and analyze kinase motifs |
phospho_enrichment_qc |
Compute phospho-enrichment efficiency and pSer/pThr/pTyr ratios |
glycopeptide_mass_calculator |
Calculate glycopeptide masses with glycan compositions |
| Tool | Description |
|---|---|
hdx_deuterium_uptake |
Calculate deuterium uptake from HDX-MS time course data |
hdx_back_exchange_estimator |
Estimate per-peptide back-exchange rates from fully deuterated controls |
crosslink_mass_calculator |
Calculate masses for crosslinked peptide pairs (DSS, BS3, DSSO) |
xl_distance_validator |
Validate crosslink distances against PDB structures |
xl_link_classifier |
Classify crosslinks as intra-protein, inter-protein, or monolink |
| Tool | Description |
|---|---|
immunopeptide_filter |
Filter peptides for MHC-I/II by length range and motif |
immunopeptidome_qc |
QC for immunopeptidomics (length distribution, anchor residues) |
metapeptide_lca_assigner |
Assign lowest common ancestor taxonomy from peptide-protein mappings |
cleavage_site_profiler |
Profile protease cleavage site specificity from N-terminomics data |
nterm_modification_annotator |
Classify N-terminal peptides (protein N-term, signal peptide, neo-N-term) |
proteoform_delta_annotator |
Annotate mass differences between proteoforms with known PTMs |
topdown_coverage_calculator |
Compute per-residue bond cleavage coverage for intact proteins |
| Tool | Description |
|---|---|
rna_mass_calculator |
Calculate mass, formula, and isotopes for RNA/oligonucleotide sequences |
rna_digest |
In silico RNA digestion with RNases (T1, U2, etc.) |
rna_fragment_spectrum_generator |
Generate theoretical RNA fragment spectra (c/y/w/a-B ions) |
| Tool | Description |
|---|---|
adduct_calculator |
Compute m/z for all common ESI adducts given a formula or mass |
molecular_formula_finder |
Enumerate valid molecular formulas for an accurate mass with element constraints |
mass_decomposition_tool |
Find molecular formula compositions for a given mass within tolerance |
formula_mass_calculator |
Calculate exact masses for molecular formulas with adduct support |
formula_validator_golden_rules |
Apply Kind & Fiehn's Seven Golden Rules to filter formula candidates |
rdbe_calculator |
Calculate Ring/Double Bond Equivalence for molecular formulas |
metabolite_formula_annotator |
Annotate features with candidate formulas using mass + isotope fit |
mass_accuracy_calculator |
Compute m/z mass accuracy (ppm error) for sequences or formulas |
| Tool | Description |
|---|---|
blank_subtraction_tool |
Subtract blank/control features from sample features by m/z + RT matching |
duplicate_feature_detector |
Detect and flag duplicate features by m/z and RT proximity |
adduct_group_analyzer |
Group features by adduct relationships into ion identity groups |
isf_detector |
Detect in-source fragmentation artifacts by coelution and neutral loss |
targeted_feature_extractor |
Extract features for known compounds from MS1 data |
mass_defect_filter |
Filter features by mass defect and Kendrick mass defect |
metabolite_feature_detection |
Metabolite feature detection from LC-MS data |
| Tool | Description |
|---|---|
spectral_entropy_scorer |
Compute spectral entropy similarity (Li & Fiehn 2021) |
neutral_loss_scanner |
Scan MS2 spectra for characteristic neutral losses |
isotope_pattern_analyzer |
Generate theoretical isotope distributions, cosine similarity scoring, Da/ppm tolerance, Cl/Br halogen detection (merged from isotope_pattern_matcher + isotope_pattern_scorer + isotope_pattern_fit_scorer) |
massql_query_tool |
Query mzML data using MassQL-like syntax |
| Tool | Description |
|---|---|
van_krevelen_data_generator |
Compute H:C and O:C ratios, classify into biochemical compound classes |
kendrick_mass_defect_analyzer |
Compute Kendrick mass defect for homologous series detection (CH2, CF2, etc.) |
suspect_screener |
Match detected masses against suspect screening lists (CompTox, NORMAN) |
metabolite_class_predictor |
Predict compound class from mass defect, element ratios, and RDBE |
| Tool | Description |
|---|---|
drug_metabolite_screener |
Predict Phase I/II drug metabolites and screen mzML for matches |
mass_difference_network_builder |
Connect features by known biotransformation mass differences |
| Tool | Description |
|---|---|
isotope_label_detector |
Detect 13C/15N-labeled metabolites by paired feature analysis |
mid_natural_abundance_corrector |
Correct mass isotopomer distributions for natural 13C abundance |
| Tool | Description |
|---|---|
lipid_species_resolver |
Enumerate acyl chain combinations from sum-composition lipid annotations |
lipid_ecn_rt_predictor |
Predict lipid RT from Equivalent Carbon Number |
| Tool | Description |
|---|---|
gnps_fbmn_exporter |
Export MS2 + quantification in GNPS Feature-Based Molecular Networking format |
sirius_exporter |
Export features + MS2 data to SIRIUS .ms format |
kovats_ri_calculator |
Calculate Kovats Retention Index from alkane standards for GC-MS |
BSD 3-Clause — see LICENSE.