
Add AI contributor skills, per-script structure, and CI pipeline#2

Merged
ypriverol merged 16 commits into main from add/ai-contributor-skills
Mar 24, 2026

Conversation

@timosachsenberg (Contributor) commented Mar 24, 2026

Summary

  • Migrate 7 pyopenms scripts from feature branch into per-script directories, each with its own requirements.txt, README.md, tests/conftest.py, and test file (28 tests total, all passing)
  • Add ruff.toml (E/F/W/I rules, line-length 120) and ensure all scripts pass lint
  • Add two Claude Code skills: contribute-script (guided new tool creation) and validate-script (isolated venv validation)
  • Add AGENTS.md — platform-agnostic contributor guide for any AI agent (Copilot, Cursor, Gemini, etc.)
  • Add GitHub Actions CI (.github/workflows/validate.yml) that detects changed scripts on PR and validates each in a parallel isolated venv
  • Update CLAUDE.md and README.md to reflect the new structure
  • Fix deprecated pyopenms 3.5.0 APIs in feature_detection_proteomics and metabolite_feature_detection

Test plan

  • All 7 scripts pass ruff check with zero errors
  • All 28 tests pass via PYTHONPATH=<script> python -m pytest <script>/tests/ -v
  • CI workflow triggers correctly on a PR that touches scripts/
  • Verify contribute-script skill works end-to-end by adding a new tool
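
The per-script invocation above can be looped over the whole tree. A minimal sketch under the stated layout assumption (`scripts/<domain>/<name>/tests/`); the `run_all_tests` helper is illustrative, not part of the repo:

```python
import os
import pathlib
import subprocess
import sys

def run_all_tests(root="scripts"):
    """Run each script's test suite with PYTHONPATH set to its own directory."""
    failures = []
    for tests_dir in sorted(pathlib.Path(root).glob("*/*/tests")):
        script_dir = tests_dir.parent
        # Mirrors: PYTHONPATH=<script> python -m pytest <script>/tests/ -v
        env = dict(os.environ, PYTHONPATH=str(script_dir))
        result = subprocess.run(
            [sys.executable, "-m", "pytest", str(tests_dir), "-v"], env=env
        )
        if result.returncode != 0:
            failures.append(str(script_dir))
    return failures  # empty list means every suite passed
```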

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added 7 new pyopenms CLI tools for metabolomics and proteomics: isotope pattern matching, mass accuracy calculation, metabolite feature detection, proteomics feature detection, peptide mass calculation, protein digestion, and spectrum file analysis.
  • Documentation

    • Added contributor guidelines, Claude Code integration instructions, and repository setup documentation for AI-assisted tool development.
  • Chores

    • Added linting configuration, automated validation workflow, and skill templates for contributing and validating new tools.

timosachsenberg and others added 16 commits March 24, 2026 20:07
Defines per-script directory structure, Claude Code skills (contribute-script,
validate-script), AGENTS.md contributor guide, and GitHub Actions CI for
validating pyopenms script contributions in isolated venvs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mplate

- Resolve PYTHONPATH vs sys.path contradiction: PYTHONPATH is primary,
  conftest.py provides fallback for direct pytest invocation
- Add concrete conftest.py template to reduce implementation ambiguity
- Clarify migration: copy files from feature branch, decompose monolithic
  test files and shared requirements.txt into per-script structure
- Use consistent language for branch migration (copy + restructure)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
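
The conftest.py template described above might look like this minimal sketch (the exact file in the repo may differ):

```python
# tests/conftest.py - fallback sys.path setup plus a pyopenms availability flag.
import os
import sys

# PYTHONPATH=<script-dir> is the primary mechanism; this insert is the fallback
# so that invoking pytest directly on <script>/tests/ still resolves the module.
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

try:
    import pyopenms  # noqa: F401
    HAS_PYOPENMS = True
except ImportError:
    HAS_PYOPENMS = False
```

Tests can then guard themselves with `pytest.mark.skipif(not HAS_PYOPENMS, reason="pyopenms not installed")`.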
…line

14 tasks covering: ruff.toml, migration of 7 scripts to per-script
directories, Claude Code skills, AGENTS.md, GitHub Actions CI, and
documentation updates. Reviewed and approved with minor fixes applied.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…thetic test data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…thetic test data

Fix FeatureFindingMetabo.run() API change in pyopenms >= 3.x: third argument
is now output_chromatograms list, not a chrom_fwhm float.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- validate-script: isolated venv validation (ruff + pytest) for any script
- contribute-script: guided workflow for creating new pyopenms tools

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detects changed script directories on PR, validates each in parallel
with its own venv: pip install from script's requirements.txt, ruff lint,
pytest. Uses Python 3.11 for pyopenms wheel availability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 'run all tests' one-liner to CLAUDE.md
- Fix feature_detection_proteomics README to reference
  FeatureFinderAlgorithmPicked (pyopenms 3.5.0+ API)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

This pull request establishes a comprehensive AI-assisted contribution and validation framework for the agentomics repository. It introduces standardized directory structure, validation procedures, and CI/CD automation for stand-alone pyopenms-based CLI tools, along with seven new functional scripts for proteomics and metabolomics analysis.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation & Guidance**<br>`AGENTS.md`, `CLAUDE.md`, `README.md` | New contributor guidance defining per-script directory layout, required files, code patterns, and validation workflow. Updated README with project overview, requirements, scripts table, and validation summary. |
| **Configuration & Infrastructure**<br>`ruff.toml`, `.github/workflows/validate.yml` | Added linting rules (E, F, W, I lint categories with 120-char line length) and a GitHub Actions workflow that detects changed script directories on pull requests, then validates each via isolated venv with ruff and pytest checks. |
| **Claude Code Skills**<br>`.claude/skills/contribute-script.md`, `.claude/skills/validate-script.md` | Defined two rigid, step-by-step skills: contribute-script scaffolds new tool directories with boilerplate, and validate-script runs linting and tests in a fresh temporary virtual environment. |
| **Planning & Design Documentation**<br>`docs/superpowers/plans/2026-03-24-ai-contributor-skills.md`, `docs/superpowers/specs/2026-03-24-ai-contributor-skills-design.md` | Comprehensive implementation plan and design specification documenting the full contribution/validation infrastructure, migration workflow, and delivery requirements. |
| **Metabolomics Scripts**<br>`scripts/metabolomics/isotope_pattern_matcher/*`, `scripts/metabolomics/mass_accuracy_calculator/*`, `scripts/metabolomics/metabolite_feature_detection/*` | Three new metabolomics tools: isotope pattern generation/matching, ppm mass accuracy calculation, and metabolite feature detection from LC-MS data. Each includes script, requirements, README, conftest with conditional skip marker, and test suite. |
| **Proteomics Scripts**<br>`scripts/proteomics/feature_detection_proteomics/*`, `scripts/proteomics/peptide_mass_calculator/*`, `scripts/proteomics/protein_digest/*`, `scripts/proteomics/spectrum_file_info/*` | Four new proteomics tools: LC-MS feature detection, peptide mass and fragment ion calculation, in-silico protein digestion, and mzML file summary statistics. Each follows the same structure: script with pyopenms import guard, isolated dependencies, README documentation, and pyopenms-conditional tests. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant PR as Pull Request
    participant GHA as GitHub Actions
    participant Detect as detect-changes Job
    participant Validate as validate Job
    participant Env as Isolated venv
    participant Ruff as Ruff Linter
    participant Pytest as Pytest Runner

    PR->>GHA: Trigger on scripts/** changes
    GHA->>Detect: Run detect-changes
    Detect->>Detect: Diff against base branch
    Detect->>Detect: Extract script directories
    Detect->>GHA: Output matrix with script_dir list

    GHA->>Validate: Run validate (if has_changes)
    loop For each script_dir in matrix
        Validate->>Env: Create isolated venv
        Validate->>Env: Install script's requirements.txt
        Validate->>Env: Install pytest and ruff
        Validate->>Ruff: Run ruff check on script_dir
        Ruff->>Validate: Return lint results
        Validate->>Pytest: Run pytest on script_dir/tests/
        Pytest->>Validate: Return test results
        Validate->>Env: Cleanup venv
    end
    Validate->>PR: Report pass/fail status
```
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Claude as Claude (contribute-script)
    participant Repo as Repository
    participant Venv as Isolated venv
    participant Validate as validate-script Skill

    Dev->>Claude: Run contribute-script skill
    Claude->>Dev: Prompt for tool domain & name
    Dev->>Claude: Provide inputs
    Claude->>Repo: Create script directory structure
    Claude->>Repo: Write scaffolded script.py (with pyopenms guard)
    Claude->>Repo: Write requirements.txt, README.md
    Claude->>Repo: Create tests/conftest.py (with skip marker)
    Claude->>Repo: Write test_*.py files

    Claude->>Validate: Invoke validate-script
    Validate->>Venv: Create temporary venv
    Validate->>Venv: Install script dependencies + pytest + ruff
    Validate->>Venv: Run ruff check
    Validate->>Venv: Run pytest
    Venv->>Validate: Return lint and test results
    Validate->>Venv: Cleanup
    Validate->>Claude: Report validation outcome
    Claude->>Dev: Confirm ready for commit
```

Poem

🐰 Hops excitedly
Scripts now scaffold with a hop and a bound,
Validation venvs spin and dance all around,
Seven new tools in their neat little holes,
With tests that skip smartly and lint that controls!
Agentomics blooms—oh what joy to behold! 🌱

🚥 Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.69%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'Add AI contributor skills, per-script structure, and CI pipeline' directly and clearly summarizes the main changes: introduction of AI skills, restructured per-script organization, and CI/CD automation. |


@coderabbitai bot left a comment

Actionable comments posted: 15

🧹 Nitpick comments (7)
scripts/proteomics/feature_detection_proteomics/requirements.txt (1)

1-1: Constrain pyopenms to a validated range.

Line 1 is unpinned; future upstream changes can break this script unexpectedly in CI and local setups.

Suggested change

```diff
-pyopenms
+pyopenms>=3.5,<4
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/proteomics/feature_detection_proteomics/requirements.txt` at line 1,
The requirements file currently lists an unpinned dependency "pyopenms"; update
that entry in requirements.txt to a validated version or range (for example pin
to the exact tested version or a bounded range like >=X.Y.Z,<A.B.C) so CI and
local environments remain reproducible; after changing the "pyopenms" line, run
the proteomics feature_detection_proteomics tests or the project's dependency
verification job to confirm the chosen version works.
scripts/proteomics/peptide_mass_calculator/requirements.txt (1)

1-1: Pin pyopenms to a tested range for reproducibility.

Line 1 is currently unbounded, so CI behavior can drift as new releases land. Please constrain this to a validated range.

Suggested change

```diff
-pyopenms
+pyopenms>=3.5,<4
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/proteomics/peptide_mass_calculator/requirements.txt` at line 1,
Replace the unbounded pyopenms entry in requirements.txt with a pinned, tested
version range (or exact version) to ensure reproducible CI; update the single
line "pyopenms" to a constrained specification such as a specific version or a
range that your tests validate (for example, pyopenms>=2.7.0,<2.8.0) and ensure
this matches the version used in your CI/test matrix.
scripts/proteomics/feature_detection_proteomics/tests/test_feature_detection_proteomics.py (1)

32-34: Assert persisted output content, not just path existence.

This test would be stronger if it verifies the written featureXML can be loaded and matches the returned map size, instead of only checking that the file exists.

🔧 Suggested test hardening

```diff
             fm = detect_features(input_path, output_path)
             assert isinstance(fm, oms.FeatureMap)
             assert os.path.exists(output_path)
+            reloaded = oms.FeatureMap()
+            oms.FeatureXMLFile().load(output_path, reloaded)
+            assert reloaded.size() == fm.size()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@scripts/proteomics/feature_detection_proteomics/tests/test_feature_detection_proteomics.py`
around lines 32 - 34, The test currently only checks that the output_path file
exists; instead, load the written featureXML into a new oms.FeatureMap (use
oms.FeatureXMLFile().load or the appropriate loader) and assert the loaded map
is an oms.FeatureMap and that its feature count matches the returned fm (compare
fm.size() or len(fm) to loaded_map.size() or len(loaded_map)); replace the
assert os.path.exists(output_path) with these checks so the persisted content is
validated against the returned fm from detect_features.
scripts/metabolomics/mass_accuracy_calculator/tests/test_mass_accuracy_calculator.py (1)

9-40: Please add invalid-charge tests.

Given the public API takes charge: int, add coverage for charge <= 0 to protect against division-by-zero/regression paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@scripts/metabolomics/mass_accuracy_calculator/tests/test_mass_accuracy_calculator.py`
around lines 9 - 40, Add tests that pass invalid charges (0 and a negative int)
to the public APIs theoretical_mz_from_sequence and theoretical_mz_from_formula
to guard against division-by-zero/regression; for each function call with charge
0 and charge -1 assert that the call raises a clear exception (e.g. ValueError)
using pytest.raises, and include both sequence- and formula-based paths
(theoretical_mz_from_sequence("PEPTIDEK", 0),
theoretical_mz_from_sequence("PEPTIDEK", -1),
theoretical_mz_from_formula("C6H12O6", 0),
theoretical_mz_from_formula("C6H12O6", -1)) so the test suite ensures
invalid-charge handling is enforced.
scripts/proteomics/spectrum_file_info/tests/test_spectrum_file_info.py (1)

25-57: Add a regression test for spectra that have no peaks.

Current tests don’t cover the edge case where spectra exist but get_peaks() is empty. That case is important for validating tic_per_spectrum length and mz_range fallback behavior.

Suggested test addition

```diff
+    def test_summarise_spectra_with_no_peaks(self):
+        import pyopenms as oms
+        from spectrum_file_info import summarise_experiment
+
+        exp = oms.MSExperiment()
+        for i in range(3):
+            spec = oms.MSSpectrum()
+            spec.setMSLevel(1)
+            spec.setRT(float(i))
+            exp.addSpectrum(spec)
+
+        summary = summarise_experiment(exp)
+        assert summary["n_spectra"] == 3
+        assert len(summary["tic_per_spectrum"]) == 3
+        assert all(v == 0.0 for v in summary["tic_per_spectrum"])
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/proteomics/spectrum_file_info/tests/test_spectrum_file_info.py`
around lines 25 - 57, Add a regression test that builds an MSExperiment
containing spectra whose get_peaks() returns empty (i.e., spectra with no
peaks), call summarise_experiment on it, then assert the experiment-level
counters still report the correct number of spectra (summary["n_spectra"] == n)
and that len(summary["tic_per_spectrum"]) == n to ensure TIC is recorded per
spectrum, and verify the mz_range fallback is handled by asserting
summary["mz_range"] is falsy (e.g., None or an empty/None tuple) when no peaks
exist; reference summarise_experiment, tic_per_spectrum, get_peaks(), and
mz_range to locate the behavior to test.
docs/superpowers/plans/2026-03-24-ai-contributor-skills.md (1)

64-68: Consider adding language specifiers to fenced code blocks.

Several code blocks (e.g., lines 66-68 for requirements.txt content) lack language specifiers. While this doesn't affect execution, adding txt or text improves markdown rendering and satisfies linters.

📝 Example fix

````diff
- ```
+ ```txt
  pyopenms
````
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @docs/superpowers/plans/2026-03-24-ai-contributor-skills.md around lines 64 -
68, Update the fenced code block under "Step 3: Create requirements.txt" that
currently contains "pyopenms" so it includes a language specifier (e.g., change
the opening ``` to ```txt or ```text) to improve markdown rendering and satisfy
linters; locate the fenced block associated with the "Step 3: Create
requirements.txt" heading and add the specifier to the opening backticks.


scripts/metabolomics/isotope_pattern_matcher/isotope_pattern_matcher.py (1)

82-89: Consider using closest match instead of first match within tolerance.

The current implementation takes the first observed peak within the m/z tolerance window (line 86 `break`). If multiple observed peaks fall within the tolerance, using the closest match would be more accurate. However, for typical isotope patterns where peaks are well-separated (>1 Da), this is unlikely to cause issues.


♻️ Optional enhancement for closest-match selection

```diff
     for tmz, tint in theoretical:
         matched = 0.0
+        best_delta = float('inf')
         for omz, oint in observed:
-            if abs(omz - tmz) <= mz_tolerance:
+            delta = abs(omz - tmz)
+            if delta <= mz_tolerance and delta < best_delta:
+                best_delta = delta
                 matched = oint
-                break
         theo_vec.append(tint)
         obs_vec.append(matched)
```
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scripts/metabolomics/isotope_pattern_matcher/isotope_pattern_matcher.py`
around lines 82 - 89, The loop that matches theoretical peaks to observed picks
the first observed peak within mz_tolerance (variables tmz/tint and omz/oint
with the early break), which can select a non-closest match when multiple
observed peaks fall inside the window; change the logic in the matching block so
you search all observed peaks for the minimum abs(omz - tmz) and only accept it
if that minimum <= mz_tolerance (leave matched = 0.0 if none found), then append
tint and the chosen matched intensity to theo_vec and obs_vec — update the
matching code that currently sets matched and uses break to instead compute and
use the closest match.


🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/contribute-script.md:

  • Around line 75-80: The snippet uses sys.exit in the ImportError handler but
    never imports sys, which will raise NameError on failure; add an import sys at
    the top of the block so the try/except around importing pyopenms (import
    pyopenms as oms) can call sys.exit(...) safely when ImportError is caught.

In @.claude/skills/validate-script.md:

  • Around line 21-30: Add a trap-based cleanup to guarantee removal of the
    temporary virtualenv: immediately after VENV_DIR=$(mktemp -d) register trap 'rm
    -rf "$VENV_DIR"' EXIT (so it runs on any exit/failure) and delete the final
    explicit rm -rf "$VENV_DIR" line at the end; ensure the trap references the same
    VENV_DIR variable and uses the EXIT signal so the venv is removed on both
    success and failure.

In @.github/workflows/validate.yml:

  • Around line 24-27: The CHANGED assignment pipeline can fail when grep finds no
    matches; modify the command that pipes to grep (the part using grep -oP
    'scripts/[^/]+/[^/]+/') to add a fallback (append || true) so grep's non-zero
    exit doesn't break the step, ensuring CHANGED becomes an empty list instead of
    causing the job to fail; update the pipeline around the CHANGED variable
    assignment and keep the rest of the processing (sort -u | jq -R -s -c ...)
    unchanged.

In @docs/superpowers/specs/2026-03-24-ai-contributor-skills-design.md:

  • Around line 28-37: The conftest.py template currently combines imports on one
    line; update the template so imports follow PEP8/ruff isort style by splitting
    "import sys, os, pytest" into separate, ordered imports (import os; import sys;
    import pytest) with appropriate blank lines, keeping the existing
    sys.path.insert(0, os.path.join(os.path.dirname(file), "..")) and preserving
    the try/except that sets HAS_PYOPENMS; modify the template's import block to
    match the actual conftest.py pattern used in
    scripts/metabolomics/metabolite_feature_detection/tests/conftest.py and ensure
    symbols like HAS_PYOPENMS and sys.path.insert remain unchanged.

In @scripts/metabolomics/isotope_pattern_matcher/isotope_pattern_matcher.py:

  • Around line 53-56: The code computes max_ab = max(ab for _, ab in peaks) and
    divides by it, which can raise ZeroDivisionError if all intensities are zero;
    fix by checking if max_ab == 0 and in that case return the peaks with normalized
    intensity 0.0 (e.g. return [(mz, 0.0) for mz, _ in peaks]) instead of performing
    the division; update the branch that currently builds the normalized list (the
    list comprehension using mz, ab in peaks) to handle this zero-max_ab case.

In @scripts/metabolomics/mass_accuracy_calculator/mass_accuracy_calculator.py:

  • Around line 37-55: Validate and reject non-positive charge values to avoid
    divide-by-zero: in the importable helper theoretical_mz_from_sequence(sequence:
    str, charge: int) add an explicit guard that raises a clear exception (e.g.,
    ValueError) when charge <= 0, and apply the same validation to any other helper
    functions that compute (mass + charge * PROTON) / charge; additionally enforce
    and validate the CLI argument for --charge so it cannot accept 0 or negative
    values (use argparse type/check or a custom validator) and return a helpful
    message for users when an invalid charge is provided.
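
A minimal sketch of the charge guard described above (the `mz_from_mass` name and the approximate proton constant are illustrative; the repo's helpers wrap pyopenms):

```python
PROTON = 1.007276  # approximate proton mass in Da

def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """Compute m/z for a charge state, rejecting charge <= 0 before dividing."""
    if not isinstance(charge, int) or charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge
```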

In
@scripts/metabolomics/metabolite_feature_detection/metabolite_feature_detection.py:

  • Around line 118-122: The current output default logic (declared in
    parser.add_argument("--output", ...)) incorrectly documents ".featureXML"
    and uses a brittle string replace that fails for inputs without a .mzML
    extension or with different casing; change the code that computes the default
    output (where args.output is set/constructed around line 139) to use
    pathlib.Path: derive output by taking Path(args.input), if path.suffix.lower()
    == ".mzml" replace the suffix with "_metabolites.featureXML", otherwise append
    "_metabolites.featureXML" to the stem; also update the parser help text to
    reflect the actual default naming (e.g., "(default:
    _metabolites.featureXML)").
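
The pathlib-based default suggested above could look like this sketch (`default_output_path` is a hypothetical helper name):

```python
from pathlib import Path

def default_output_path(input_path: str) -> Path:
    """Derive '<stem>_metabolites.featureXML' from the input, case-insensitively."""
    p = Path(input_path)
    if p.suffix.lower() == ".mzml":
        return p.with_name(p.stem + "_metabolites.featureXML")
    # No recognised extension: append rather than replace.
    return p.with_name(p.name + "_metabolites.featureXML")
```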

In
@scripts/proteomics/feature_detection_proteomics/feature_detection_proteomics.py:

  • Around line 17-22: The module currently calls sys.exit() in the module-level
    try/except that imports pyopenms (the try: import pyopenms as oms except
    ImportError: sys.exit(...)), which will terminate the interpreter on import;
    instead either move the pyopenms import and dependency check into the main()
    function so missing dependency only impacts CLI invocation, or replace the
    sys.exit() with raising ImportError (e.g., re-raise or raise ImportError with
    the same message) so importers/tests can catch it; update the import usage to
    reference oms after the deferred import or ensure tests that import this module
    can handle the ImportError from the module-level raise.
  • Line 93: The default output path calculation for output_path (using
    args.output or args.input.replace(".mzML", ".featureXML")) is case-sensitive and
    can produce the same path as args.input (e.g., "sample.mzml" or no matching
    suffix), risking overwrite; update the logic that sets output_path in
    feature_detection_proteomics.py to perform a case-insensitive extension swap
    (use os.path.splitext on args.input and append ".featureXML") and ensure the
    resulting output_path is different from args.input (if identical, add a suffix
    like ".featureXML" or ".featureXML.1" to avoid clobbering); reference the
    output_path assignment where args.output and args.input are used.

In @scripts/proteomics/peptide_mass_calculator/peptide_mass_calculator.py:

  • Around line 32-59: In peptide_masses, validate the charge parameter before
    performing m/z calculations: in the function peptide_masses(sequence: str,
    charge: int = 1) check that charge is a positive integer (e.g. if not
    isinstance(charge, int) or charge <= 0) and raise a clear ValueError such as
    "charge must be a positive integer" to prevent ZeroDivisionError when computing
    mz_monoisotopic and mz_average using PROTON.
  • Around line 22-27: Current module-level sys.exit on ImportError terminates the
    whole test run; replace the immediate exit with deferred error handling by
    capturing the optional import (set oms = None) and adding a helper function
    _require_pyopenms() that attempts to import/validate pyopenms or raises
    ImportError with the same install message; then call _require_pyopenms() at the
    start of the public functions peptide_masses() and fragment_ions() so the
    ImportError is raised at runtime and the @requires_pyopenms test decorator can
    skip tests instead of killing the process.

In @scripts/proteomics/protein_digest/protein_digest.py:

  • Around line 38-44: The digest_protein function currently allows invalid
    digestion params; add explicit validation at the top of digest_protein to ensure
    missed_cleavages is a non-negative integer and min_length and max_length are
    positive integers with min_length <= max_length (raise ValueError with clear
    message on invalid input), and clamp or reject values outside sensible bounds;
    apply the same validation logic to the other digestion functions that accept the
    same parameters (any other functions that take missed_cleavages, min_length,
    max_length in this module) so invalid inputs are caught before digestion logic
    runs.
  • Around line 62-64: The docstring describing the returned list-of-dicts is out
    of sync: it mentions keys start, end, and missed_cleavages while the
    actual payload contains sequence, length, and monoisotopic_mass. Update
    the function's return docstring (the docstring block that currently lists
    sequence, start, end, monoisotopic_mass, missed_cleavages) to
    accurately describe the actual returned keys (sequence, length,
    monoisotopic_mass) or alternatively modify the function that builds the dicts
    so they include start, end, and missed_cleavages; ensure the docstring and
    the returned dict structure (produced by the code that constructs each peptide
    dict) match exactly.
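
The parameter validation asked for in the first comment above could be factored out as a sketch like this (`validate_digest_params` is an illustrative helper name):

```python
def validate_digest_params(missed_cleavages: int, min_length: int, max_length: int) -> None:
    """Reject parameter combinations that would make digestion meaningless."""
    if not isinstance(missed_cleavages, int) or missed_cleavages < 0:
        raise ValueError("missed_cleavages must be a non-negative integer")
    if not isinstance(min_length, int) or min_length <= 0:
        raise ValueError("min_length must be a positive integer")
    if not isinstance(max_length, int) or max_length < min_length:
        raise ValueError("max_length must be an integer >= min_length")
```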

In @scripts/proteomics/spectrum_file_info/spectrum_file_info.py:

  • Around line 4-5: The docstring/help claims support for "mzML (or mzXML)" but
    the loader call MzMLFile().load(...) only handles mzML; update the code to make
    the behavior match the docs by either changing the top-level help text to "mzML
    only" or adding explicit mzXML handling: detect the input file extension inside
    spectrum_file_info.py and call the appropriate loader (e.g., use
    MzXMLFile().load(...) for .mzXML files) so that the loader selection (MzMLFile
    vs MzXMLFile) aligns with the advertised formats.
  • Around line 56-70: summarise_experiment currently skips spectra with zero
    peaks which misaligns tic_per_spectrum and can leave mz_min/mz_max as inf/-inf;
    change the loop that calls spec.get_peaks() so you always append a TIC value
    (append 0.0 when intensities is empty) and only update mz_min/mz_max when peaks
    exist, then after the loop replace any leftover inf/-inf mz bounds with a safe
    default (e.g., (0.0, 0.0) or None) before returning; apply the same fix in the
    duplicate block around the second occurrence (lines noted as 124-126) so
    functions/methods summarise_experiment, spec.get_peaks(), mz_min, mz_max, and
    tic_per_spectrum are updated accordingly.
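
The empty-spectrum fix above can be sketched over plain peak lists, stand-ins for what `spec.get_peaks()` returns (`summarise_peaks` and its input shape are illustrative):

```python
def summarise_peaks(spectra_peaks):
    """spectra_peaks: list of (mz_list, intensity_list) pairs, one per spectrum."""
    tic_per_spectrum = []
    mz_min, mz_max = float("inf"), float("-inf")
    for mzs, intensities in spectra_peaks:
        tic_per_spectrum.append(float(sum(intensities)))  # 0.0 for empty spectra
        if mzs:  # only update m/z bounds when peaks exist
            mz_min = min(mz_min, min(mzs))
            mz_max = max(mz_max, max(mzs))
    # Replace the inf/-inf sentinels with a safe default when no peaks exist.
    mz_range = (mz_min, mz_max) if mz_min <= mz_max else None
    return {
        "n_spectra": len(spectra_peaks),
        "tic_per_spectrum": tic_per_spectrum,
        "mz_range": mz_range,
    }
```

This keeps `tic_per_spectrum` aligned with the spectrum count and avoids returning infinite m/z bounds.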

Nitpick comments:
In @docs/superpowers/plans/2026-03-24-ai-contributor-skills.md:

  • Around line 64-68: Update the fenced code block under "Step 3: Create
    requirements.txt" that currently contains "pyopenms" so it includes a language
    specifier (e.g., change the opening ``` to ```txt or ```text) to improve
    markdown rendering and satisfy linters; locate the fenced block associated
    with the "Step 3: Create requirements.txt" heading and add the specifier to
    the opening backticks.

In @scripts/metabolomics/isotope_pattern_matcher/isotope_pattern_matcher.py:

  • Around line 82-89: The loop that matches theoretical peaks to observed picks
    the first observed peak within mz_tolerance (variables tmz/tint and omz/oint
    with the early break), which can select a non-closest match when multiple
    observed peaks fall inside the window; change the logic in the matching block so
    you search all observed peaks for the minimum abs(omz - tmz) and only accept it
    if that minimum <= mz_tolerance (leave matched = 0.0 if none found), then append
    tint and the chosen matched intensity to theo_vec and obs_vec — update the
    matching code that currently sets matched and uses break to instead compute and
    use the closest match.

In
@scripts/metabolomics/mass_accuracy_calculator/tests/test_mass_accuracy_calculator.py:

  • Around line 9-40: Add tests that pass invalid charges (0 and a negative int)
    to the public APIs theoretical_mz_from_sequence and theoretical_mz_from_formula
    to guard against division-by-zero/regression; for each function call with charge
    0 and charge -1 assert that the call raises a clear exception (e.g. ValueError)
    using pytest.raises, and include both sequence- and formula-based paths
    (theoretical_mz_from_sequence("PEPTIDEK", 0),
    theoretical_mz_from_sequence("PEPTIDEK", -1),
    theoretical_mz_from_formula("C6H12O6", 0),
    theoretical_mz_from_formula("C6H12O6", -1)) so the test suite ensures
    invalid-charge handling is enforced.

In @scripts/proteomics/feature_detection_proteomics/requirements.txt:

  • Line 1: The requirements file currently lists an unpinned dependency
    "pyopenms"; update that entry in requirements.txt to a validated version or
    range (for example pin to the exact tested version or a bounded range like
    >=X.Y.Z,<A.B.C) so CI and local environments remain reproducible; after
    changing the "pyopenms" line, run the proteomics feature_detection_proteomics
    tests or the project's dependency verification job to confirm the chosen
    version works.

In
@scripts/proteomics/feature_detection_proteomics/tests/test_feature_detection_proteomics.py:

  • Around line 32-34: The test currently only checks that the output_path file
    exists; instead, load the written featureXML into a new oms.FeatureMap (use
    oms.FeatureXMLFile().load or the appropriate loader) and assert the loaded map
    is an oms.FeatureMap and that its feature count matches the returned fm (compare
    fm.size() or len(fm) to loaded_map.size() or len(loaded_map)); replace the
    assert os.path.exists(output_path) with these checks so the persisted content is
    validated against the returned fm from detect_features.

In @scripts/proteomics/peptide_mass_calculator/requirements.txt:

  • Line 1: Replace the unbounded pyopenms entry in requirements.txt with a
    pinned, tested version range (or exact version) to ensure reproducible CI;
    update the single line "pyopenms" to a constrained specification such as a
    specific version or a range that your tests validate (for example,
    pyopenms>=2.7.0,<2.8.0) and ensure this matches the version used in your CI/test
    matrix.

In @scripts/proteomics/spectrum_file_info/tests/test_spectrum_file_info.py:

  • Around line 25-57: Add a regression test that builds an MSExperiment
    containing spectra whose get_peaks() returns empty (i.e., spectra with no
    peaks), call summarise_experiment on it, then assert the experiment-level
    counters still report the correct number of spectra (summary["n_spectra"] == n)
    and that len(summary["tic_per_spectrum"]) == n to ensure TIC is recorded per
    spectrum, and verify the mz_range fallback is handled by asserting
    summary["mz_range"] is falsy (e.g., None or an empty/None tuple) when no peaks
    exist; reference summarise_experiment, tic_per_spectrum, get_peaks(), and
    mz_range to locate the behavior to test.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Organization UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `99ad1eec-e6f7-4632-9c4b-a6b7038cfb99`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between a7ade618be33016aa93f8fed05ea900091fbf767 and 6ca1237b84940522ea34f992d9d1f1c108cb0b5e.

</details>

<details>
<summary>📒 Files selected for processing (44)</summary>

* `.claude/skills/contribute-script.md`
* `.claude/skills/validate-script.md`
* `.github/workflows/validate.yml`
* `AGENTS.md`
* `CLAUDE.md`
* `README.md`
* `docs/superpowers/plans/2026-03-24-ai-contributor-skills.md`
* `docs/superpowers/specs/2026-03-24-ai-contributor-skills-design.md`
* `ruff.toml`
* `scripts/metabolomics/isotope_pattern_matcher/README.md`
* `scripts/metabolomics/isotope_pattern_matcher/isotope_pattern_matcher.py`
* `scripts/metabolomics/isotope_pattern_matcher/requirements.txt`
* `scripts/metabolomics/isotope_pattern_matcher/tests/conftest.py`
* `scripts/metabolomics/isotope_pattern_matcher/tests/test_isotope_pattern_matcher.py`
* `scripts/metabolomics/mass_accuracy_calculator/README.md`
* `scripts/metabolomics/mass_accuracy_calculator/mass_accuracy_calculator.py`
* `scripts/metabolomics/mass_accuracy_calculator/requirements.txt`
* `scripts/metabolomics/mass_accuracy_calculator/tests/conftest.py`
* `scripts/metabolomics/mass_accuracy_calculator/tests/test_mass_accuracy_calculator.py`
* `scripts/metabolomics/metabolite_feature_detection/README.md`
* `scripts/metabolomics/metabolite_feature_detection/metabolite_feature_detection.py`
* `scripts/metabolomics/metabolite_feature_detection/requirements.txt`
* `scripts/metabolomics/metabolite_feature_detection/tests/conftest.py`
* `scripts/metabolomics/metabolite_feature_detection/tests/test_metabolite_feature_detection.py`
* `scripts/proteomics/feature_detection_proteomics/README.md`
* `scripts/proteomics/feature_detection_proteomics/feature_detection_proteomics.py`
* `scripts/proteomics/feature_detection_proteomics/requirements.txt`
* `scripts/proteomics/feature_detection_proteomics/tests/conftest.py`
* `scripts/proteomics/feature_detection_proteomics/tests/test_feature_detection_proteomics.py`
* `scripts/proteomics/peptide_mass_calculator/README.md`
* `scripts/proteomics/peptide_mass_calculator/peptide_mass_calculator.py`
* `scripts/proteomics/peptide_mass_calculator/requirements.txt`
* `scripts/proteomics/peptide_mass_calculator/tests/conftest.py`
* `scripts/proteomics/peptide_mass_calculator/tests/test_peptide_mass_calculator.py`
* `scripts/proteomics/protein_digest/README.md`
* `scripts/proteomics/protein_digest/protein_digest.py`
* `scripts/proteomics/protein_digest/requirements.txt`
* `scripts/proteomics/protein_digest/tests/conftest.py`
* `scripts/proteomics/protein_digest/tests/test_protein_digest.py`
* `scripts/proteomics/spectrum_file_info/README.md`
* `scripts/proteomics/spectrum_file_info/requirements.txt`
* `scripts/proteomics/spectrum_file_info/spectrum_file_info.py`
* `scripts/proteomics/spectrum_file_info/tests/conftest.py`
* `scripts/proteomics/spectrum_file_info/tests/test_spectrum_file_info.py`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment on lines +75 to +80
```python
try:
    import pyopenms as oms
except ImportError:
    sys.exit("pyopenms is required. Install it with: pip install pyopenms")
```

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Verify whether script modules using sys.exit also import sys.
# Expected: no output (or review any printed files).

fd -e py . scripts | while read -r f; do
  if rg -n '\bsys\.exit\s*\(' "$f" >/dev/null; then
    if ! rg -n '^\s*import\s+sys\b|^\s*from\s+sys\s+import\b' "$f" >/dev/null; then
      echo "Missing import sys: $f"
    fi
  fi
done
```

Repository: OpenMS/agentomics

Length of output: 43


🏁 Script executed:

cat -n .claude/skills/contribute-script.md | sed -n '70,85p'

Repository: OpenMS/agentomics

Length of output: 773


🏁 Script executed:

cat -n .claude/skills/contribute-script.md | sed -n '60,90p'

Repository: OpenMS/agentomics

Length of output: 1231


Add import sys to the pyopenms import-guard pattern.

The documented pattern at lines 75–80 uses sys.exit() without showing import sys. Contributors copying this exact snippet will encounter a NameError on the import failure path.

Suggested doc fix

````diff
 - pyopenms import guard:
   ```python
+  import sys
   try:
       import pyopenms as oms
   except ImportError:
       sys.exit("pyopenms is required. Install it with:  pip install pyopenms")
````
</details>

<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/contribute-script.md around lines 75 - 80, The snippet uses
sys.exit in the ImportError handler but never imports sys, which will raise
NameError on failure; add an import sys at the top of the block so the
try/except around importing pyopenms (import pyopenms as oms) can call
sys.exit(...) safely when ImportError is caught.

Comment on lines +21 to +30
```bash
SCRIPT_DIR=<path-to-script-directory>
VENV_DIR=$(mktemp -d)
python -m venv "$VENV_DIR"
"$VENV_DIR/bin/python" -m pip install -r "$SCRIPT_DIR/requirements.txt"
"$VENV_DIR/bin/python" -m pip install pytest ruff
"$VENV_DIR/bin/python" -m ruff check "$SCRIPT_DIR/"
PYTHONPATH="$SCRIPT_DIR" "$VENV_DIR/bin/python" -m pytest "$SCRIPT_DIR/tests/" -v
rm -rf "$VENV_DIR"
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Demonstrate cleanup behavior with and without trap.
# Expected: without trap, temp dir remains after failure; with trap, it is removed.

set -euo pipefail

echo "Without trap:"
if bash -c 'set -e; V=$(mktemp -d); false; rm -rf "$V"' 2>/dev/null; then :; fi
# Can't read V from subshell directly; show equivalent reproducible pattern:
bash -c 'set -e; V=$(mktemp -d); echo "$V" > /tmp/venv_path_no_trap; false; rm -rf "$V"' || true
V1=$(cat /tmp/venv_path_no_trap)
if [ -d "$V1" ]; then echo "Leaked: $V1"; rm -rf "$V1"; fi

echo "With trap:"
bash -c 'set -e; V=$(mktemp -d); echo "$V" > /tmp/venv_path_trap; trap "rm -rf \"$V\"" EXIT; false' || true
V2=$(cat /tmp/venv_path_trap)
if [ -d "$V2" ]; then echo "Unexpected leak: $V2"; else echo "Cleaned as expected"; fi
```

Repository: OpenMS/agentomics

Length of output: 133


🏁 Script executed:

```bash
# Check if the file exists and read the relevant lines
if [ -f ".claude/skills/validate-script.md" ]; then
  echo "=== File exists, reading lines 15-40 for context ==="
  sed -n '15,40p' ".claude/skills/validate-script.md" | cat -n
else
  echo "File not found at .claude/skills/validate-script.md"
  # Search for similar files
  find . -name "*validate*script*" -o -name "*.md" -path "*claude*" 2>/dev/null | head -20
fi
```

Repository: OpenMS/agentomics

Length of output: 1064


🏁 Script executed:

```bash
# Get actual line numbers and check for line 34 reference
wc -l ".claude/skills/validate-script.md"
echo "=== Lines 20-35 (to see full context and line 34) ==="
sed -n '20,35p' ".claude/skills/validate-script.md" | cat -n
```

Repository: OpenMS/agentomics

Length of output: 886


Add trap-based cleanup to ensure venv removal on failure.

The current code does not guarantee cleanup if any command fails before the final rm -rf. Step 5 (line 34) explicitly requires cleanup even when validation fails. Add trap 'rm -rf "$VENV_DIR"' EXIT after VENV_DIR=$(mktemp -d) and remove the final rm -rf "$VENV_DIR" line.

Suggested command block update

```diff
 SCRIPT_DIR=<path-to-script-directory>
 VENV_DIR=$(mktemp -d)
+trap 'rm -rf "$VENV_DIR"' EXIT
 python -m venv "$VENV_DIR"
 "$VENV_DIR/bin/python" -m pip install -r "$SCRIPT_DIR/requirements.txt"
 "$VENV_DIR/bin/python" -m pip install pytest ruff
 "$VENV_DIR/bin/python" -m ruff check "$SCRIPT_DIR/"
 PYTHONPATH="$SCRIPT_DIR" "$VENV_DIR/bin/python" -m pytest "$SCRIPT_DIR/tests/" -v
-rm -rf "$VENV_DIR"
```
</details>

Also applies to: 34-34

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @.claude/skills/validate-script.md around lines 21 - 30, Add a trap-based
cleanup to guarantee removal of the temporary virtualenv: immediately after
VENV_DIR=$(mktemp -d) register trap 'rm -rf "$VENV_DIR"' EXIT (so it runs on any
exit/failure) and delete the final explicit rm -rf "$VENV_DIR" line at the end;
ensure the trap references the same VENV_DIR variable and uses the EXIT signal
so the venv is removed on both success and failure.


</details>
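The difference the trap makes is easy to reproduce outside the repo. The sketch below simulates a failing validation step inside `bash -e` and checks that the registered EXIT trap still removed the temporary directory (paths and the `venv` subdirectory are illustrative stand-ins, not the skill's actual layout):

```shell
#!/bin/bash
# Simulate: create a 'venv' dir, register an EXIT trap, then fail under set -e.
workdir=$(mktemp -d)
bash -c "
  set -e
  trap 'rm -rf \"$workdir/venv\"' EXIT
  mkdir -p \"$workdir/venv\"
  false  # simulated failing validation command
" || true

# The trap fires on the failure exit, so the directory should be gone.
if [ -d "$workdir/venv" ]; then result=leaked; else result=cleaned; fi
echo "$result"
rm -rf "$workdir"
```

Without the trap, the `false` would abort the inner script before any trailing `rm -rf` line, leaving the directory behind.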

<!-- fingerprinting:phantom:poseidon:hawk -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment on lines +24 to +27
CHANGED=$(git diff --name-only origin/${{ github.base_ref }}...HEAD -- 'scripts/' \
| grep -oP 'scripts/[^/]+/[^/]+/' \
| sort -u \
| jq -R -s -c 'split("\n") | map(select(length > 0))')

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Reproduce grep no-match behavior under bash -e in a command-substitution pipeline.
# Expected: first block fails; second block with `|| true` succeeds.

set -euo pipefail

echo "a/b/c.txt" > /tmp/changed_paths.txt

echo "Case 1 (without || true) - should fail:"
if bash -e -c 'X=$(cat /tmp/changed_paths.txt | grep -oP "scripts/[^/]+/[^/]+/" | sort -u); echo "$X"'; then
  echo "Unexpected success"
else
  echo "Failed as expected"
fi

echo "Case 2 (with || true) - should succeed:"
bash -e -c 'X=$(cat /tmp/changed_paths.txt | grep -oP "scripts/[^/]+/[^/]+/" || true); echo "X=${X:-<empty>}"'
```

Repository: OpenMS/agentomics

Length of output: 170


🏁 Script executed:

cat -n .github/workflows/validate.yml | head -50

Repository: OpenMS/agentomics

Length of output: 1897


🏁 Script executed:

git ls-files '.github/workflows/'

Repository: OpenMS/agentomics

Length of output: 91


🌐 Web query:

GitHub Actions bash run shell set -e default behavior

💡 Result:

In GitHub Actions, for run steps on Linux/macOS runners (non-Windows), the default shell behavior when no shell is explicitly specified is to use bash -e {0} if bash is available, or sh -e {0} otherwise. The -e flag enables set -e semantics, causing the script to exit immediately if any command returns a non-zero exit code (fail-fast behavior). When shell: bash is explicitly specified, it uses bash --noprofile --norc -eo pipefail {0}, which includes set -e, set -o pipefail (fail on pipeline failure), and suppresses profile/rc loading. This is the default to enforce fail-fast behavior in CI scripts, ensuring steps fail early on errors. To disable, use shell: bash {0} or set +e in the script. On Windows, the default is PowerShell, not bash.

Citations:


🏁 Script executed:

fd -t f '.github/workflows/' -name '*.yml' -o -name '*.yaml' | xargs grep -l 'git diff' | head -5

Repository: OpenMS/agentomics

Length of output: 291


🏁 Script executed:

```bash
#!/bin/bash
# Test grep behavior with pipefail (which GitHub Actions uses by default)

echo "a/b/c.txt" > /tmp/changed_paths.txt

echo "Case 1 (bash -eo pipefail, without || true) - should fail:"
if bash -eo pipefail -c 'X=$(cat /tmp/changed_paths.txt | grep -oP "scripts/[^/]+/[^/]+/" | sort -u); echo "$X"'; then
  echo "Unexpected success"
else
  echo "Failed as expected (exit code: $?)"
fi

echo ""
echo "Case 2 (bash -eo pipefail, with || true) - should succeed:"
if bash -eo pipefail -c 'X=$(cat /tmp/changed_paths.txt | grep -oP "scripts/[^/]+/[^/]+/" || true | sort -u); echo "X=${X:-<empty>}"'; then
  echo "Succeeded as expected"
else
  echo "Unexpected failure"
fi
```

Repository: OpenMS/agentomics

Length of output: 245


🏁 Script executed:

fd '.github/workflows' -type f

Repository: OpenMS/agentomics

Length of output: 231


🏁 Script executed:

find .github/workflows -type f -name '*.yml' -o -name '*.yaml'

Repository: OpenMS/agentomics

Length of output: 91


🏁 Script executed:

grep -r "grep -oP" .github/workflows/

Repository: OpenMS/agentomics

Length of output: 139


Add fallback to grep to prevent pipeline failure on zero matches.

In GitHub Actions, the default shell (bash -eo pipefail) will fail the entire step if grep finds no matches and returns exit code 1. When no script directories change, this pipeline fails before reaching the error-handling check at line 29. Add || true after grep to ensure the pipeline continues.

Suggested fix (note the command group: a bare `|| true` spliced mid-pipeline would change how bash groups the commands)

```diff
           CHANGED=$(git diff --name-only origin/${{ github.base_ref }}...HEAD -- 'scripts/' \
-            | grep -oP 'scripts/[^/]+/[^/]+/' \
+            | { grep -oP 'scripts/[^/]+/[^/]+/' || true; } \
             | sort -u \
             | jq -R -s -c 'split("\n") | map(select(length > 0))')
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/validate.yml around lines 24 - 27, The CHANGED assignment
pipeline can fail when grep finds no matches; modify the command that pipes to
grep (the part using grep -oP 'scripts/[^/]+/[^/]+/') to add a fallback (append
|| true) so grep's non-zero exit doesn't break the step, ensuring CHANGED
becomes an empty list instead of causing the job to fail; update the pipeline
around the CHANGED variable assignment and keep the rest of the processing (sort
-u | jq -R -s -c ...) unchanged.
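The grouping subtlety is worth demonstrating: under `bash -eo pipefail`, a `grep` with zero matches exits 1, and the safest fallback scopes `|| true` to grep alone with a command group so the rest of the pipeline still runs. A minimal, repo-independent sketch (using `grep -oE`, which handles this pattern just as well as `-oP`):

```shell
#!/bin/bash
set -euo pipefail

# Input with no scripts/ paths, so grep matches nothing and exits 1.
paths="a/b/c.txt"

# The command group scopes '|| true' to grep alone; sort still runs
# and CHANGED ends up empty instead of the step failing.
CHANGED=$(printf '%s\n' "$paths" \
  | { grep -oE 'scripts/[^/]+/[^/]+/' || true; } \
  | sort -u)

echo "CHANGED=<${CHANGED}>"  # prints CHANGED=<>
```

With a matching path such as `scripts/proteomics/protein_digest/protein_digest.py`, the same pipeline yields `scripts/proteomics/protein_digest/`.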

Comment on lines +28 to +37
```python
import sys, os, pytest
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
try:
    import pyopenms
    HAS_PYOPENMS = True
except ImportError:
    HAS_PYOPENMS = False
requires_pyopenms = pytest.mark.skipif(not HAS_PYOPENMS, reason="pyopenms not installed")
```

⚠️ Potential issue | 🟡 Minor

Inconsistency: conftest.py template differs from actual implementation.

The template shows import sys, os, pytest on a single line (line 29), which violates PEP 8 and the ruff isort rules configured in ruff.toml. The actual implementation in scripts/metabolomics/metabolite_feature_detection/tests/conftest.py correctly separates imports.

Update the template to match the actual implementation pattern:

📝 Proposed fix

````diff
- ```python
- import sys, os, pytest
- sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+ ```python
+ import os
+ import sys
+
+ import pytest
+
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
````
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```python
import os
import sys

import pytest

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
try:
    import pyopenms
    HAS_PYOPENMS = True
except ImportError:
    HAS_PYOPENMS = False
requires_pyopenms = pytest.mark.skipif(not HAS_PYOPENMS, reason="pyopenms not installed")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-24-ai-contributor-skills-design.md` around
lines 28 - 37, The conftest.py template currently combines imports on one line;
update the template so imports follow PEP8/ruff isort style by splitting "import
sys, os, pytest" into separate, ordered imports (import os; import sys; import
pytest) with appropriate blank lines, keeping the existing sys.path.insert(0,
os.path.join(os.path.dirname(__file__), "..")) and preserving the try/except
that sets HAS_PYOPENMS; modify the template's import block to match the actual
conftest.py pattern used in
scripts/metabolomics/metabolite_feature_detection/tests/conftest.py and ensure
symbols like HAS_PYOPENMS and sys.path.insert remain unchanged.

Comment on lines +53 to +56
```python
    if not peaks:
        return []
    max_ab = max(ab for _, ab in peaks)
    return [(mz, ab / max_ab * 100) for mz, ab in peaks]
```

⚠️ Potential issue | 🟡 Minor

Potential division by zero when all peak intensities are zero.

If max_ab is 0 (all peaks have zero intensity), line 56 will raise a ZeroDivisionError. While rare, this could occur with unusual formula inputs.

🛡️ Proposed fix

```diff
     if not peaks:
         return []
     max_ab = max(ab for _, ab in peaks)
+    if max_ab == 0:
+        return [(mz, 0.0) for mz, _ in peaks]
     return [(mz, ab / max_ab * 100) for mz, ab in peaks]
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```python
    if not peaks:
        return []
    max_ab = max(ab for _, ab in peaks)
    if max_ab == 0:
        return [(mz, 0.0) for mz, _ in peaks]
    return [(mz, ab / max_ab * 100) for mz, ab in peaks]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/metabolomics/isotope_pattern_matcher/isotope_pattern_matcher.py`
around lines 53 - 56, The code computes max_ab = max(ab for _, ab in peaks) and
divides by it, which can raise ZeroDivisionError if all intensities are zero;
fix by checking if max_ab == 0 and in that case return the peaks with normalized
intensity 0.0 (e.g. return [(mz, 0.0) for mz, _ in peaks]) instead of performing
the division; update the branch that currently builds the normalized list (the
list comprehension using mz, ab in peaks) to handle this zero-max_ab case.
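Pulled out as a self-contained helper, the guarded normalisation might look like the sketch below (`normalise` is a hypothetical name; in the script the logic lives inline):

```python
def normalise(peaks):
    """Scale (mz, abundance) tuples so the most intense peak is 100."""
    if not peaks:
        return []
    max_ab = max(ab for _, ab in peaks)
    if max_ab == 0:
        # All-zero intensities: avoid ZeroDivisionError, report flat zeros.
        return [(mz, 0.0) for mz, _ in peaks]
    return [(mz, ab / max_ab * 100) for mz, ab in peaks]
```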

Comment on lines +32 to +59
```python
def peptide_masses(sequence: str, charge: int = 1) -> dict:
    """Return monoisotopic and average masses for the given peptide sequence.

    Parameters
    ----------
    sequence:
        Amino acid sequence, optionally with bracket-enclosed modifications,
        e.g. ``"PEPTM[147]IDEK"``.
    charge:
        Desired charge state for m/z calculation (default 1).

    Returns
    -------
    dict
        Dictionary with keys ``monoisotopic_mass``, ``average_mass``,
        ``mz_monoisotopic``, and ``mz_average``.
    """
    aa_seq = oms.AASequence.fromString(sequence)
    mono = aa_seq.getMonoWeight()
    avg = aa_seq.getAverageWeight()
    return {
        "sequence": sequence,
        "charge": charge,
        "monoisotopic_mass": mono,
        "average_mass": avg,
        "mz_monoisotopic": (mono + charge * PROTON) / charge,
        "mz_average": (avg + charge * PROTON) / charge,
    }
```

⚠️ Potential issue | 🟠 Major

Validate charge before m/z calculation.

Line 57 and Line 58 divide by charge without bounds checks; --charge 0 will crash with ZeroDivisionError.

Suggested fix

```diff
 def peptide_masses(sequence: str, charge: int = 1) -> dict:
@@
-    aa_seq = oms.AASequence.fromString(sequence)
+    if charge <= 0:
+        raise ValueError("charge must be >= 1")
+    aa_seq = oms.AASequence.fromString(sequence)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/proteomics/peptide_mass_calculator/peptide_mass_calculator.py` around
lines 32 - 59, In peptide_masses, validate the charge parameter before
performing m/z calculations: in the function peptide_masses(sequence: str,
charge: int = 1) check that charge is a positive integer (e.g. if not
isinstance(charge, int) or charge <= 0) and raise a clear ValueError such as
"charge must be a positive integer" to prevent ZeroDivisionError when computing
mz_monoisotopic and mz_average using PROTON.
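Beyond the in-function guard, the same invariant can be enforced at the CLI boundary with an argparse `type` callable, so an invalid charge is rejected before `peptide_masses` ever runs. A minimal sketch (the `--charge` flag name is an assumption about the script's interface):

```python
import argparse

def positive_int(value: str) -> int:
    """argparse type= callable that rejects charges below 1 at parse time."""
    ivalue = int(value)
    if ivalue <= 0:
        raise argparse.ArgumentTypeError(f"charge must be >= 1, got {ivalue}")
    return ivalue

parser = argparse.ArgumentParser()
parser.add_argument("--charge", type=positive_int, default=1)
```

With this, `--charge 0` produces a usage error and non-zero exit instead of a ZeroDivisionError traceback.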

Comment on lines +38 to +44
```python
def digest_protein(
    sequence: str,
    enzyme: str = "Trypsin",
    missed_cleavages: int = 0,
    min_length: int = 6,
    max_length: int = 40,
) -> list:
```

⚠️ Potential issue | 🟠 Major

Missing boundary validation for digestion parameters.

missed_cleavages, min_length, and max_length accept invalid values (e.g., negatives or min_length > max_length) and are passed through directly. Add explicit validation before digestion.

Suggested fix

```diff
@@
 def digest_protein(
@@
 ) -> list:
@@
+    if missed_cleavages < 0:
+        raise ValueError("missed_cleavages must be >= 0")
+    if min_length < 1 or max_length < 1:
+        raise ValueError("min_length and max_length must be >= 1")
+    if min_length > max_length:
+        raise ValueError("min_length must be <= max_length")
+
     protein_seq = oms.AASequence.fromString(sequence)
@@
     args = parser.parse_args()
@@
     if not args.sequence:
         parser.error("--sequence is required unless --list-enzymes is used.")
+    if args.missed_cleavages < 0:
+        parser.error("--missed-cleavages must be >= 0.")
+    if args.min_length < 1 or args.max_length < 1:
+        parser.error("--min-length and --max-length must be >= 1.")
+    if args.min_length > args.max_length:
+        parser.error("--min-length must be <= --max-length.")
```

Also applies to: 66-73, 101-121, 137-146

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/proteomics/protein_digest/protein_digest.py` around lines 38 - 44,
The digest_protein function currently allows invalid digestion params; add
explicit validation at the top of digest_protein to ensure missed_cleavages is a
non-negative integer and min_length and max_length are positive integers with
min_length <= max_length (raise ValueError with clear message on invalid input),
and clamp or reject values outside sensible bounds; apply the same validation
logic to the other digestion functions that accept the same parameters (any
other functions that take missed_cleavages, min_length, max_length in this
module) so invalid inputs are caught before digestion logic runs.
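The suggested bounds can be collected into one helper called at the top of `digest_protein` and any sibling digestion function (the helper name is hypothetical):

```python
def validate_digestion_params(missed_cleavages: int, min_length: int, max_length: int) -> None:
    """Raise ValueError for out-of-range digestion parameters."""
    if missed_cleavages < 0:
        raise ValueError("missed_cleavages must be >= 0")
    if min_length < 1 or max_length < 1:
        raise ValueError("min_length and max_length must be >= 1")
    if min_length > max_length:
        raise ValueError("min_length must be <= max_length")
```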

Comment on lines +62 to +64
list of dict
Each entry contains ``sequence``, ``start``, ``end``,
``monoisotopic_mass``, and ``missed_cleavages``.

⚠️ Potential issue | 🟡 Minor

Return docstring is out of sync with actual payload.

Line [63]-[64] documents start, end, and missed_cleavages, but returned dicts only contain sequence, length, and monoisotopic_mass.

Also applies to: 79-83

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/proteomics/protein_digest/protein_digest.py` around lines 62 - 64,
The docstring describing the returned list-of-dicts is out of sync: it mentions
keys `start`, `end`, and `missed_cleavages` while the actual payload contains
`sequence`, `length`, and `monoisotopic_mass`. Update the function's return
docstring (the docstring block that currently lists `sequence`, `start`, `end`,
`monoisotopic_mass`, `missed_cleavages`) to accurately describe the actual
returned keys (`sequence`, `length`, `monoisotopic_mass`) or alternatively
modify the function that builds the dicts so they include `start`, `end`, and
`missed_cleavages`; ensure the docstring and the returned dict structure
(produced by the code that constructs each peptide dict) match exactly.

Comment on lines +4 to +5
Read an mzML (or mzXML) file and print a summary of its contents:
number of spectra, MS levels, retention time range, m/z range,

⚠️ Potential issue | 🟡 Minor

Input-format claim is broader than the implemented loader.

Line [4] advertises mzML/mzXML, but Line [86] only uses MzMLFile().load(...). Please either narrow the docs/help text to mzML-only or add explicit mzXML handling.

Also applies to: 73-87

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/proteomics/spectrum_file_info/spectrum_file_info.py` around lines 4 -
5, The docstring/help claims support for "mzML (or mzXML)" but the loader call
MzMLFile().load(...) only handles mzML; update the code to make the behavior
match the docs by either changing the top-level help text to "mzML only" or
adding explicit mzXML handling: detect the input file extension inside
spectrum_file_info.py and call the appropriate loader (e.g., use
MzXMLFile().load(...) for .mzXML files) so that the loader selection (MzMLFile
vs MzXMLFile) aligns with the advertised formats.
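The extension dispatch could be as small as a suffix lookup. In this sketch the strings stand in for the pyopenms loader classes (`oms.MzMLFile`, `oms.MzXMLFile`) so the example stays dependency-free:

```python
from pathlib import Path

# Maps lowercase suffixes to loader names; in the script the values would be
# the loader classes themselves, e.g. oms.MzMLFile and oms.MzXMLFile.
LOADERS = {".mzml": "MzMLFile", ".mzxml": "MzXMLFile"}

def pick_loader(path: str) -> str:
    """Return the loader for a file, or raise on an unsupported extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADERS:
        raise ValueError(f"unsupported input format: {suffix or path}")
    return LOADERS[suffix]
```

Failing fast on unknown extensions keeps the help text and the loader behaviour aligned.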

Comment on lines +56 to +70
```python
        mzs, intensities = spec.get_peaks()
        if len(mzs) > 0:
            mz_min = min(mz_min, float(mzs.min()))
            mz_max = max(mz_max, float(mzs.max()))
            tic_values.append(float(intensities.sum()))

    return {
        "n_spectra": len(spectra),
        "ms_levels": ms_levels,
        "rt_range_sec": (rt_min, rt_max),
        "mz_range": (mz_min, mz_max),
        "tic_total": sum(tic_values),
        "tic_max": max(tic_values) if tic_values else 0.0,
        "tic_per_spectrum": tic_values,
    }
```

⚠️ Potential issue | 🟠 Major

summarise_experiment drops zero-peak spectra and can emit invalid m/z ranges.

At Line [60], TIC is appended only when peaks exist, so tic_per_spectrum may not align with spectrum count/indexing. Also, if all spectra have no peaks, Line [66] returns inf/-inf for mz_range, which is misleading in CLI output.

Suggested fix

```diff
@@
-    mz_min = float("inf")
-    mz_max = float("-inf")
+    mz_min = float("inf")
+    mz_max = float("-inf")
+    has_peak_data = False
@@
-        mzs, intensities = spec.get_peaks()
-        if len(mzs) > 0:
+        mzs, intensities = spec.get_peaks()
+        tic_values.append(float(intensities.sum()))
+        if len(mzs) > 0:
+            has_peak_data = True
             mz_min = min(mz_min, float(mzs.min()))
             mz_max = max(mz_max, float(mzs.max()))
-            tic_values.append(float(intensities.sum()))
@@
+    mz_range = (mz_min, mz_max) if has_peak_data else (None, None)
     return {
@@
-        "mz_range": (mz_min, mz_max),
+        "mz_range": mz_range,
@@
-    mz_lo, mz_hi = summary["mz_range"]
-    print(f"{'m/z range':<22}: {mz_lo:.4f} – {mz_hi:.4f}")
+    mz_lo, mz_hi = summary["mz_range"]
+    if mz_lo is None:
+        print(f"{'m/z range':<22}: n/a (no centroid/profile peaks)")
+    else:
+        print(f"{'m/z range':<22}: {mz_lo:.4f} – {mz_hi:.4f}")
```

Also applies to: 124-126

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/proteomics/spectrum_file_info/spectrum_file_info.py` around lines 56
- 70, summarise_experiment currently skips spectra with zero peaks which
misaligns tic_per_spectrum and can leave mz_min/mz_max as inf/-inf; change the
loop that calls spec.get_peaks() so you always append a TIC value (append 0.0
when intensities is empty) and only update mz_min/mz_max when peaks exist, then
after the loop replace any leftover inf/-inf mz bounds with a safe default
(e.g., (0.0, 0.0) or None) before returning; apply the same fix in the duplicate
block around the second occurrence (lines noted as 124-126) so functions/methods
summarise_experiment, spec.get_peaks(), mz_min, mz_max, and tic_per_spectrum are
updated accordingly.
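The per-spectrum TIC alignment and the m/z-range fallback are easiest to see on plain lists. Below, each spectrum is just an `(mzs, intensities)` pair standing in for a pyopenms spectrum object, and `summarise` is a trimmed stand-in for `summarise_experiment`:

```python
def summarise(spectra):
    """Summarise a list of (mzs, intensities) pairs; tolerates empty spectra."""
    tic, mz_min, mz_max = [], float("inf"), float("-inf")
    has_peaks = False
    for mzs, intensities in spectra:
        tic.append(float(sum(intensities)))  # always one TIC entry per spectrum
        if mzs:
            has_peaks = True
            mz_min = min(mz_min, min(mzs))
            mz_max = max(mz_max, max(mzs))
    return {
        "n_spectra": len(spectra),
        "mz_range": (mz_min, mz_max) if has_peaks else (None, None),
        "tic_per_spectrum": tic,
    }
```

Appending the TIC unconditionally keeps `tic_per_spectrum` index-aligned with the spectrum list, and the `(None, None)` fallback avoids the misleading `inf/-inf` range.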

@ypriverol ypriverol merged commit e4c5f86 into main Mar 24, 2026
9 checks passed