
ypriverol commented Aug 15, 2025

PR Type

Documentation


Description

  • Renamed MS2Rescore documentation file to rescoring

  • Completely rewrote rescoring documentation with comprehensive content

  • Updated table of contents references

  • Added detailed technical specifications and usage examples


Diagram Walkthrough

flowchart LR
  A["ms2rescore.rst"] -- "renamed to" --> B["rescoring.rst"]
  B -- "expanded content" --> C["Technical Details"]
  B -- "added" --> D["Usage Examples"]
  B -- "included" --> E["Configuration Parameters"]
  F["identification.rst"] -- "updated references" --> B

File Walkthrough

Relevant files
Documentation
identification.rst
Updated documentation references to rescoring                       

docs/identification.rst

  • Updated table of contents reference from ms2rescore to rescoring
  • Changed documentation link reference in rescoring section
+2/-2     
ms2rescore.rst
Removed original MS2Rescore documentation                               

docs/ms2rescore.rst

  • Completely removed the original MS2Rescore documentation file
  • File contained 73 lines of basic MS2Rescore information
+0/-73   
rescoring.rst
Created comprehensive rescoring documentation                       

docs/rescoring.rst

  • Created comprehensive new rescoring documentation (238 lines)
  • Added detailed technical specifications for quantms-rescoring
    framework
  • Included configuration parameters, usage examples, and troubleshooting
  • Added supported MS2PIP models table and feature descriptions
+238/-0 

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive Rescoring guide (replacing legacy MS2Rescore), new met_excision subsection, Getting Started/Quickstart/Tutorials/Troubleshooting/Glossary/Capabilities/Comparison/Inputs‑Outputs/quantms_output/protein_database and many other doc pages; refactored landing/navigation and cross‑references; added parameter anchors and guides.
    • Added a public Nextflow parameter schema and automated parameter docs generation.
  • Chores
    • Docs release bumped to 1.6.0, Slack badges secured (HTTPS), updated .gitignore, and new site styling (custom CSS).

@ypriverol ypriverol requested review from Copilot and enryH August 15, 2025 15:28

coderabbitai bot commented Aug 15, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.


📥 Commits

Reviewing files that changed from the base of the PR and between 1a4047a and 4c9c2a3.

📒 Files selected for processing (3)
  • docs/protein_database.rst (1 hunks)
  • docs/setup_docs.py (2 hunks)
  • docs/usage.rst (3 hunks)

Walkthrough

Reorganized and expanded project documentation: removed legacy MS2Rescore page, added a new Rescoring doc and many user-facing guides (getting_started, quickstart, tutorials, outputs, protein_database, troubleshooting, glossary, comparison, capabilities), introduced a Nextflow JSON schema and schema-driven docs generation, bumped docs to 1.6.0, and added styling and housekeeping changes.

Changes

Cohort / File(s) Summary
Rescoring & identification
docs/ms2rescore.rst, docs/rescoring.rst, docs/identification.rst, docs/percolator.rst
Removed legacy ms2rescore.rst; added rescoring.rst; updated references in identification.rst (ms2rescore → rescoring); added Percolator "Relevant parameters" cross-references and a met_excision subsection.
Getting started & onboarding
docs/getting_started.rst, docs/quickstart.rst, docs/tutorials.rst
Added comprehensive Getting Started guide, 5-minute Quickstart, and multi-track Tutorials (Beginner / Intermediate / Advanced).
Outputs & databases
docs/quantms_output.rst, docs/protein_database.rst, docs/inputs_outputs.rst
Added detailed outputs layout (quantms_output.rst), comprehensive protein FASTA/database guidance, and inputs/outputs overview with toctree and cross-links.
Index, navigation & metadata
docs/index.rst, docs/conf.py, docs/contact.rst, docs/dev.rst, docs/usage.rst
Reworked landing page with HTML cards and updated toctree; bumped release to 1.6.0; added html_css_files=['custom.css']; switched Slack badge to HTTPS; assorted wording/branding tweaks.
DIA / formats / workflow docs
docs/dia.rst, docs/formats.rst, docs/msstats.rst, docs/fdr.rst, docs/idpep.rst, docs/inference.rst, docs/introduction.rst
Grammar/structure edits, added anchors and "Relevant parameters" cross-references, reorganized sections, clarified FDR wording, and removed some image directives.
LFQ / ISO / identification cross-links
docs/lfq.rst, docs/iso.rst, docs/msstats.rst, docs/percolator.rst
Inserted parameter cross-reference blocks and anchors to improve navigation and link parameter pages.
Capabilities & comparison
docs/capabilities.rst, docs/comparison.rst
Added Capabilities timeline and a high-level comparison table vs. other tools.
New troubleshooting & glossary
docs/troubleshooting.rst, docs/glossary.rst
Added Troubleshooting guide and a Glossary with common terms and definitions.
Schema & docs tooling
docs/nextflow_schema.json, docs/setup_docs.py
Added comprehensive Nextflow JSON schema; updated setup_docs.py with SCHEMA_URL, Markdown→reST converter, schema-to-reST renderer, generate_parameters() and main flow to run both download and schema generation.
Styling & static assets
docs/_static/custom.css
Added custom.css with parameter-table, nav/button, section header, admonition, responsive and search-highlight styles.
Housekeeping
.gitignore
Added common ignore patterns (.venv, .DS_Store, .idea, .vscode, .pytest_cache, coverage files).

Sequence Diagram(s)

sequenceDiagram
  participant User as User
  participant PSM as PSM inputs
  participant FG as Feature Generators (MS2PIP / DeepLC / AlphaPeptDeep)
  participant Per as Percolator (rescoring)
  participant Output as Rescored PSMs / Reports

  User->>PSM: submit multi-engine PSM inputs
  PSM->>FG: request predicted/derived features
  FG-->>PSM: attach generated features
  PSM->>Per: send feature-annotated PSMs
  Per-->>Output: compute rescored rankings and export tables
  Output->>User: deliver rescored artifacts and reports

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes




PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Broken Links

Several internal references use :doc: roles (e.g., iso, lfq, msgf, comet, sage, percolator) that may not match actual document names/paths after the rename. Validate that all :doc: targets resolve and Sphinx builds without warnings.

quantms DDA workflows DDAplex (:doc:`iso`) and DDA-LFQ (:doc:`lfq`) use multiple search engines (:doc:`msgf`, :doc:`comet`, :doc:`sage`) to identify peptides from mass spectra. The output of these search engines is a list of peptide-spectrum matches (PSMs) along with their corresponding scores, which vary depending on the search engine. 

quantms uses :doc:`percolator`, an SVM-based classifier, to boost the number of identifications by using additional properties of the PSMs to better separate true positives from false positives. Among the properties Percolator uses are: retention time, number of missed cleavages, peptide length, and the original search engine scores. The majority of these properties are computed by Percolator or calculated by the OpenMS PercolatorAdapter. 

.. important:: From our experience with Percolator and running datasets with quantms at quantms.org, this is the most stable/production-ready way to boost the number of identifications. 

Recent developments in deep-learning algorithms have enabled the proteomics community to compute additional properties for PSMs that are now available for Percolator to enable more accurate rescoring. quantms has developed a component called `quantms-rescoring <https://github.com/bigbio/quantms-rescoring>`_ which enables the computation of additional properties/features for each PSM using these algorithms.
Formatting

Tables under "MS2PIP Features", "DeepLC Features", and "Spectrum Quality Features" mix pipe tables within a bullet/paragraph context and include an emoji; this may not render correctly in Sphinx/reST. Convert to proper reST tables and remove emoji for consistent rendering.

MS2PIP Features (70+ features)
| Feature Category | Examples |
|-----------------|----------|
| Spectral Correlations | SpecPearson, SpecCosineNorm, DotProd |
| Ion-specific Metrics | IonBPearsonNorm, IonYPearsonNorm |
| Statistical Measures | SpecMseNorm, MinAbsDiffNorm, MeanAbsDiffNorm |
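To make the feature names above concrete, here is a minimal Python sketch of how a few of these spectral similarity scores can be computed from aligned observed and predicted fragment intensities. It is not the MS2PIP or quantms-rescoring implementation: the unit-sum normalization and the exact formulas are assumptions, and `spectral_similarity` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def _unit_sum(values: Sequence[float]) -> List[float]:
    """Scale intensities so they sum to 1 (assumed normalization, not necessarily MS2PIP's)."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else [0.0] * len(values)


def spectral_similarity(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Compute MS2PIP-style similarity features for one PSM.

    `observed` and `predicted` hold fragment-ion intensities for the same ions,
    in the same order. Feature names mirror the table; formulas are illustrative.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and aligned")
    obs, pred = _unit_sum(observed), _unit_sum(predicted)
    n = len(obs)

    # Pearson correlation between the normalized intensity vectors
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    pearson = cov / math.sqrt(var_o * var_p) if var_o > 0 and var_p > 0 else 0.0

    # Dot product and cosine similarity of the normalized vectors
    dot = sum(o * p for o, p in zip(obs, pred))
    norms = math.sqrt(sum(o * o for o in obs)) * math.sqrt(sum(p * p for p in pred))

    # Error-style statistics on the normalized intensities
    abs_diff = [abs(o - p) for o, p in zip(obs, pred)]
    return {
        "SpecPearson": pearson,
        "SpecCosineNorm": dot / norms if norms > 0 else 0.0,
        "DotProd": dot,
        "SpecMseNorm": sum(d * d for d in abs_diff) / n,
        "MinAbsDiffNorm": min(abs_diff),
        "MeanAbsDiffNorm": sum(abs_diff) / n,
    }
```

Because both vectors are normalized first, a prediction that differs from the observation only by a constant scale factor scores as a perfect match.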

AlphaPeptDeep Integration
- **Generic Deep Learning Model**: Leverages advanced neural networks for fragment intensity prediction
- **Transfer Learning**: Competes with MS2PIP to select the best-performing model
- **Feature-Rich Output**: Generates comprehensive spectral similarity features

DeepLC Optimization
- **Model Benchmarking**: Automatically compares pretrained vs. retrained models
- **Per-Run Calibration**: Calibrates models for each run to account for chromatographic variations
- **MAE-Based Selection**: Selects model with lowest Mean Absolute Error for retention time prediction
- **Best Peptide Tracking**: Maintains best retention time prediction for each peptide across multiple PSMs
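The MAE-based selection described in this list can be sketched in Python as follows, assuming each candidate DeepLC model has already produced retention-time predictions for the same calibration PSMs. The model labels `pretrained` and `retrained` and the helper `select_rt_model` are hypothetical, not part of quantms-rescoring.

```python
from typing import Dict, Sequence, Tuple


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute retention-time error over the calibration PSMs."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and aligned")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def select_rt_model(observed_rt: Sequence[float],
                    predictions: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the (model label, MAE) pair with the lowest MAE.

    `predictions` maps a model label (e.g. hypothetical "pretrained" or
    "retrained") to its predicted retention times for the same PSMs.
    """
    if not predictions:
        raise ValueError("at least one candidate model is required")
    scores = {name: mean_absolute_error(observed_rt, pred) for name, pred in predictions.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

With per-run calibration, a function like this would be called once per run, so different runs could end up with different models.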

DeepLC Features (6 features)
| Feature | Description |
|---------|-------------|
| ObservedRetentionTime | Experimental retention time |
| PredictedRetentionTime | Model-predicted retention time |
| RtDiff | Difference between observed and predicted |
| *Best variants | Best predictions across multiple PSMs |
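A minimal sketch of how the six DeepLC features in this table could be derived per PSM. Two points are assumptions rather than documented behavior: `RtDiff` is taken as the absolute difference, and the "best" PSM of a peptide is the one with the smallest absolute RT error. The `*Best` feature names and the `RtPsm` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RtPsm:
    peptide: str
    observed_rt: float
    predicted_rt: float

    @property
    def rt_error(self) -> float:
        # Absolute observed-vs-predicted difference (assumed definition of RtDiff)
        return abs(self.observed_rt - self.predicted_rt)


def deeplc_features(psms: List[RtPsm]) -> List[Dict[str, float]]:
    # Pass 1: keep the PSM with the smallest absolute RT error per peptide (assumed "best")
    best: Dict[str, RtPsm] = {}
    for psm in psms:
        current = best.get(psm.peptide)
        if current is None or psm.rt_error < current.rt_error:
            best[psm.peptide] = psm

    # Pass 2: emit the per-PSM features plus the peptide-level "best" variants
    features = []
    for psm in psms:
        top = best[psm.peptide]
        features.append({
            "ObservedRetentionTime": psm.observed_rt,
            "PredictedRetentionTime": psm.predicted_rt,
            "RtDiff": psm.rt_error,
            "ObservedRetentionTimeBest": top.observed_rt,
            "PredictedRetentionTimeBest": top.predicted_rt,
            "RtDiffBest": top.rt_error,
        })
    return features
```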

Spectrum-Based Features 📊

Signal Quality Metrics
- **Signal-to-Noise Ratio (SNR)**: Calculates ratio of maximum intensity to background noise
- **Spectral Entropy**: Quantifies uniformity of peak distribution
- **TIC Distribution Analysis**: Analyzes distribution of Total Ion Current across peaks
- **Weighted m/z Standard Deviation**: Estimates spectral complexity through intensity-weighted calculations

Spectrum Quality Features (4 features)
| Feature | Description |
|---------|-------------|
| Snr | Signal-to-noise ratio |
| SpectralEntropy | Peak distribution uniformity |
| FracTICinTop10Peaks | Fraction of total ion current in top peaks |
| WeightedStdMz | Intensity-weighted m/z standard deviation |
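For reference, the four spectrum quality features could be computed roughly as below. This is a sketch under stated assumptions: the median intensity stands in as the noise estimate for `Snr`, entropy uses the natural logarithm, and `spectrum_quality_features` is a hypothetical helper, not the quantms-rescoring code.

```python
import math
from statistics import median
from typing import Dict, Sequence


def spectrum_quality_features(mz: Sequence[float], intensity: Sequence[float],
                              top_n: int = 10) -> Dict[str, float]:
    if not intensity or len(mz) != len(intensity):
        raise ValueError("mz and intensity must be non-empty and aligned")
    tic = sum(intensity)
    if tic <= 0:
        raise ValueError("total ion current must be positive")

    # SNR: base peak over median intensity, median as a crude noise estimate (assumption)
    noise = median(intensity)
    snr = max(intensity) / noise if noise > 0 else math.inf

    # Shannon entropy (natural log) of the normalized intensity distribution
    probs = [i / tic for i in intensity if i > 0]
    entropy = -sum(p * math.log(p) for p in probs)

    # Fraction of the TIC carried by the `top_n` most intense peaks
    frac_top = sum(sorted(intensity, reverse=True)[:top_n]) / tic

    # Intensity-weighted standard deviation of m/z
    mean_mz = sum(m * i for m, i in zip(mz, intensity)) / tic
    var_mz = sum(i * (m - mean_mz) ** 2 for m, i in zip(mz, intensity)) / tic

    return {"Snr": snr, "SpectralEntropy": entropy,
            "FracTICinTop10Peaks": frac_top, "WeightedStdMz": math.sqrt(var_mz)}
```

A spectrum with evenly spread peaks gets maximal entropy and a low top-10 TIC fraction, while a spectrum dominated by a few peaks shows the opposite pattern.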
Consistency

Terminology alternates between quantms-rescoring and MS2Rescore, and the spelling of MS2PIP is inconsistent (MS2PIP vs. MS²PIP). Unify naming (e.g., quantms-rescoring) and make sure the troubleshooting section uses the new name.

The package implements multiprocessing capabilities for efficient handling of large datasets and maintains compatibility with OpenMS formats (idXML and mzML). It supports various fragmentation methods and MS analyzers through multiple pre-trained models.

.. note:: The quantms-rescoring package currently supports only MS2 level spectra and requires that input files contain consistent MS levels and dissociation methods.

Feature Generators in quantms-rescoring
--------------------------------------

quantms-rescoring uses multiple open-source components to compute additional features for each PSM based on deep-learning algorithms:

- **DeepLC**: Predicts peptide retention times using deep learning, including modified peptides 
- **MS2PIP**: Predicts MS/MS peak intensities for peptide fragmentation 
- **AlphaPeptDeep**: Predicts peptide fragmentation intensity and retention time
- **Spectrum features**: Computes signal-to-noise ratio, spectral entropy, and other spectrum-derived metrics

Using the parameter ``--feature_generators``, you can specify which feature generators to use. Possible values are ``deeplc``, ``ms2pip``, ``alphapeptdeep``. For ms2pip and alphapeptdeep, the model is specified using the ``--ms2_model`` parameter. To see the supported models, see the :ref:`supported-ms2pip-models` section. If the feature generator is alphapeptdeep, the ``--ms2_model`` MUST be ``generic``. For deeplc, the model is selected automatically. 

.. note:: quantms-rescoring uses the model specified by the generator to compute features. The ms2pip and alphapeptdeep models are mutually exclusive; they cannot be used together.
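The constraints in the paragraph and note above (allowed values, the ``generic`` model for alphapeptdeep, and the mutual exclusion of ms2pip and alphapeptdeep) can be expressed as a small validation step. This Python sketch is illustrative only. quantms may enforce these rules elsewhere or differently, and `validate_feature_generators` is a hypothetical name.

```python
from typing import List

# Values listed for --feature_generators in the text above
KNOWN_GENERATORS = {"deeplc", "ms2pip", "alphapeptdeep"}


def validate_feature_generators(feature_generators: str, ms2_model: str) -> List[str]:
    """Parse a comma-separated --feature_generators value and check the documented rules."""
    generators = [g.strip().lower() for g in feature_generators.split(",") if g.strip()]
    unknown = set(generators) - KNOWN_GENERATORS
    if unknown:
        raise ValueError(f"unknown feature generator(s): {sorted(unknown)}")
    if "ms2pip" in generators and "alphapeptdeep" in generators:
        raise ValueError("ms2pip and alphapeptdeep are mutually exclusive")
    if "alphapeptdeep" in generators and ms2_model != "generic":
        raise ValueError("alphapeptdeep requires --ms2_model generic")
    return generators
```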

Configuration Parameters
------------------------

quantms provides comprehensive control over the rescoring process through the following parameters:

**Core Rescoring Settings**

- ``--ms2rescore`` (default: ``false``): Enable/disable peptide identification rescoring with LC-MS predictors such as MS²PIP and DeepLC
- ``--rescore_range`` (default: ``independent_run``): Defines the scope of rescoring:  
  - ``independent_run``: Rescoring performed independently for each run
  - ``by_sample``: Rescoring performed at the sample level
  - ``by_project``: Rescoring performed across the entire project
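One way to picture ``--rescore_range`` is as the key used to group identification files before each rescoring job. The Python sketch below assumes each file carries a run and a sample label. `IdFile` and `group_for_rescoring` are hypothetical and only illustrate the three scopes.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class IdFile(NamedTuple):
    path: str    # hypothetical per-run identification file (e.g. idXML)
    run: str
    sample: str


# How each --rescore_range value maps a file to its rescoring group (assumed semantics)
GROUPING: Dict[str, Callable[[IdFile], str]] = {
    "independent_run": lambda f: f.run,
    "by_sample": lambda f: f.sample,
    "by_project": lambda f: "project",
}


def group_for_rescoring(files: List[IdFile], rescore_range: str) -> Dict[str, List[str]]:
    try:
        key = GROUPING[rescore_range]
    except KeyError:
        raise ValueError(f"unsupported rescore_range: {rescore_range}") from None
    groups: Dict[str, List[str]] = defaultdict(list)
    for f in files:
        groups[key(f)].append(f.path)
    return dict(groups)
```

Larger groups give the rescoring model more PSMs to learn from but must be held in memory together, which is why ``independent_run`` is the lighter option.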

**Model Configuration**

- ``--ms2_model`` (default: ``HCD2021``): Specifies which deep learning model to use for feature generation
- ``--ms2_model_dir`` (default: ``null``): Path to local MS2 prediction model files to avoid repeated downloads
- ``--find_best_model`` (default: ``true``): Automatically find the best MS2 model for the dataset
- ``--force_model`` (default: ``false``): Force usage of the specified MS2PIP model without optimization

**Feature Generation**

- ``--feature_generators`` (default: ``deeplc,ms2pip``): Specifies which feature generators to use
- ``--add_snr_feature_percolator`` (default: ``false``): Add signal-to-noise ratio features for identification rescoring
- ``--consider_modloss`` (default: ``false``): Include modification loss ions in MS2 model predictions (useful for phospho-peptide analysis)

**Technical Parameters**

- ``--calibration_set_size`` (default: ``0.15``): Percentage of data used as calibration set for DeepLC (15%)
- ``--ms2rescore_fragment_tolerance`` (default: ``0.05``): Fragment mass tolerance used for MS2PIP (in Da)
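As an illustration of ``--calibration_set_size``, the sketch below takes the top fraction of target PSMs ranked by search-engine score. Whether quantms-rescoring ranks by this score, excludes decoys, or rounds the same way is an assumption, and `ScoredPsm` and `calibration_set` are hypothetical. Other passages of the draft mention 20%, which the review flags as inconsistent with this 0.15 default.

```python
from typing import List, NamedTuple


class ScoredPsm(NamedTuple):
    psm_id: str
    score: float      # search-engine score, higher is better (assumption)
    is_decoy: bool


def calibration_set(psms: List[ScoredPsm], calibration_set_size: float = 0.15) -> List[ScoredPsm]:
    """Return the top `calibration_set_size` fraction of target PSMs by score."""
    if not 0 < calibration_set_size <= 1:
        raise ValueError("calibration_set_size must be in (0, 1]")
    targets = sorted((p for p in psms if not p.is_decoy), key=lambda p: p.score, reverse=True)
    if not targets:
        return []
    n = max(1, round(len(targets) * calibration_set_size))
    return targets[:n]
```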

.. important:: For optimal results, ensure that your experimental data matches the properties of the selected MS2PIP model in terms of fragmentation method, mass analyzer, and peptide modifications. We recommend to use the parameter ``--find_best_model true`` and ``--force_model false`` which will automatically select the best model for your dataset. 

.. _supported-ms2pip-models:
Supported MS2PIP Models
-----------------------

quantms supports multiple pre-trained MS2PIP models optimized for different experimental conditions. For different experiments, the model can be specified using the ``--ms2_model`` parameter, and the ``--ms2_model_dir`` parameter can point to a local directory to avoid duplicate model downloads:

+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| Model        | Fragmentation method | MS2 mass analyzer                      | Peptide properties                                 |
+==============+======================+========================================+====================================================+
| HCD2019      | HCD                  | Orbitrap                               | Tryptic digest                                     |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| HCD2021      | HCD                  | Orbitrap                               | Tryptic / Chymotrypsin digest                      |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| CID          | CID                  | Linear ion trap                        | Tryptic digest                                     |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| iTRAQ        | HCD                  | Orbitrap                               | Tryptic digest, iTRAQ-labeled                      |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| iTRAQphospho | HCD                  | Orbitrap                               | Tryptic digest, iTRAQ-labeled, enriched for        |
|              |                      |                                        | phosphorylation                                    |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| TMT          | HCD                  | Orbitrap                               | Tryptic digest, TMT-labeled                        |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| TTOF5600     | CID                  | Quadrupole time-of-flight              | Tryptic digest                                     |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| HCDch2       | HCD                  | Orbitrap                               | Tryptic digest                                     |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| CIDch2       | CID                  | Linear ion trap                        | Tryptic digest                                     |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| Immuno-HCD   | HCD                  | Orbitrap                               | Immunopeptides                                     |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| CID-TMT      | CID                  | Linear ion trap                        | Tryptic digest, TMT-labeled                        |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| timsTOF2023  | CID                  | Ion mobility quadrupole time-of-flight | Tryptic-, elastase digest, immuno class 1          |
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
| timsTOF2024  | CID                  | Ion mobility quadrupole time-of-flight | Tryptic-, elastase digest, immuno class 1 & class 2|
+--------------+----------------------+----------------------------------------+----------------------------------------------------+
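The table can also be read as a simple lookup. The following Python sketch transcribes its rows and filters them by fragmentation method, mass analyzer, and an optional keyword from the peptide properties column. `candidate_models` is a hypothetical helper meant only to narrow down a ``--ms2_model`` choice, not a replacement for ``--find_best_model``.

```python
from typing import List, NamedTuple, Optional


class Ms2pipModel(NamedTuple):
    name: str
    fragmentation: str
    analyzer: str
    peptides: str


# Rows transcribed from the "Supported MS2PIP Models" table
MS2PIP_MODELS = [
    Ms2pipModel("HCD2019", "HCD", "Orbitrap", "Tryptic digest"),
    Ms2pipModel("HCD2021", "HCD", "Orbitrap", "Tryptic / Chymotrypsin digest"),
    Ms2pipModel("CID", "CID", "Linear ion trap", "Tryptic digest"),
    Ms2pipModel("iTRAQ", "HCD", "Orbitrap", "Tryptic digest, iTRAQ-labeled"),
    Ms2pipModel("iTRAQphospho", "HCD", "Orbitrap",
                "Tryptic digest, iTRAQ-labeled, enriched for phosphorylation"),
    Ms2pipModel("TMT", "HCD", "Orbitrap", "Tryptic digest, TMT-labeled"),
    Ms2pipModel("TTOF5600", "CID", "Quadrupole time-of-flight", "Tryptic digest"),
    Ms2pipModel("HCDch2", "HCD", "Orbitrap", "Tryptic digest"),
    Ms2pipModel("CIDch2", "CID", "Linear ion trap", "Tryptic digest"),
    Ms2pipModel("Immuno-HCD", "HCD", "Orbitrap", "Immunopeptides"),
    Ms2pipModel("CID-TMT", "CID", "Linear ion trap", "Tryptic digest, TMT-labeled"),
    Ms2pipModel("timsTOF2023", "CID", "Ion mobility quadrupole time-of-flight",
                "Tryptic-, elastase digest, immuno class 1"),
    Ms2pipModel("timsTOF2024", "CID", "Ion mobility quadrupole time-of-flight",
                "Tryptic-, elastase digest, immuno class 1 & class 2"),
]


def candidate_models(fragmentation: str, analyzer: str, keyword: Optional[str] = None) -> List[str]:
    """Filter the table by fragmentation method, analyzer and an optional peptide keyword."""
    hits = []
    for m in MS2PIP_MODELS:
        if m.fragmentation.lower() != fragmentation.lower() or m.analyzer.lower() != analyzer.lower():
            continue
        if keyword and keyword.lower() not in m.peptides.lower():
            continue
        hits.append(m.name)
    return hits
```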

.. note:: The complete feature mapping between quantms-rescoring feature names is available in the `quantms-rescoring documentation <https://github.com/bigbio/quantms-rescoring>`_.

Usage Examples
--------------

**Basic rescoring with default settings:**

.. code-block:: bash

   nextflow run nf-core/quantms --ms2rescore true --input samplesheet.csv --fasta database.fasta

**Advanced rescoring with custom model:**

.. code-block:: bash

   nextflow run nf-core/quantms \
     --ms2rescore true \
     --ms2_model TMT \
     --add_snr_feature_percolator true \
     --input samplesheet.csv \
     --fasta database.fasta

**Rescoring with local model directory:**

.. code-block:: bash

   nextflow run nf-core/quantms \
     --ms2rescore true \
     --ms2_model_dir /path/to/models \
     --force_model true \
     --input samplesheet.csv \
     --fasta database.fasta

Troubleshooting
---------------

**Feature generator failures**: This might result from incorrect model parameters or unsupported experimental types leading to poor model predictions. In such cases, MS2Rescore can be disabled using ``--ms2rescore false`` (default setting).

**Memory and performance**: For large datasets, consider:

- Using ``--rescore_range independent_run`` for better memory management
- Setting ``--ms2_model_dir`` to a local directory to avoid repeated downloads

Copilot

This comment was marked as outdated.


qodo-merge-pro bot commented Aug 15, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Validate claims and defaults

The new docs assert specific thresholds, defaults, and automatic model selection
behaviors (e.g., correlation cutoffs, calibration set sizes, feature counts,
--find_best_model true) that may not precisely match the current implementation.
Cross-verify these technical claims and parameter defaults with the actual
quantms/quantms-rescoring code and release versions, and explicitly note version
applicability to avoid misleading users.

Examples:

docs/rescoring.rst [26-28]
for MS2PIP, the tool automatically evaluates the user-specified model against a comprehensive library of fragmentation-specific models (HCD, CID variants) by computing Pearson correlations between predicted and observed spectra on a calibration set (typically 20% of high-scoring target PSMs), selecting the model that achieves the highest correlation above a 0.4 threshold while ensuring at least 70% of the calibration set exceeds a 0.6 correlation threshold. The system further implements **automatic tolerance optimization** by analyzing both user-provided and dataset-inferred tolerances, dynamically adjusting parameters when initial settings prove suboptimal for the data characteristics. 

- **For DeepLC optimization**, the tool employs a sophisticated benchmarking approach that creates separate instances using pretrained and retrained models, evaluates their performance on a 60% calibration subset by computing Mean Absolute Error (MAE) for retention time predictions, and selects the superior performer while maintaining per-run calibration to account for chromatographic drift. **The optimization process extends to feature validation** through correlation-based quality control mechanisms that ensure only high-performing models contribute to the final feature set, with configurable thresholds for correlation quality (default 70% of calibration PSMs above 0.6 correlation) and calibration set composition (default 20% of top-scoring targets). This multi-layered approach ensures that quantms-rescoring adapts automatically to diverse experimental conditions, fragmentation methods, and instrument configurations while maintaining rigorous quality standards for feature generation, ultimately providing optimized, validated features that maximize the discriminative power of downstream rescoring algorithms.
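The MS2PIP model selection rule described in the quoted draft (correlation above 0.4, with at least 70% of calibration PSMs above 0.6) can be sketched as follows. The reviewer flags these thresholds as unverified, so treat them as placeholders. Summarizing each model by its median per-PSM correlation, and returning `None` to mean "keep the user-specified model", are further assumptions, and `select_ms2_model` is hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional


def select_ms2_model(correlations: Dict[str, List[float]],
                     min_corr: float = 0.4,
                     high_corr: float = 0.6,
                     min_fraction_high: float = 0.7) -> Optional[str]:
    """Pick the model with the best median correlation that passes both checks.

    `correlations` maps a model name to per-PSM Pearson correlations between
    predicted and observed spectra on the calibration set.
    """
    best_model: Optional[str] = None
    best_score = float("-inf")
    for model, values in correlations.items():
        if not values:
            continue
        score = median(values)
        fraction_high = sum(v > high_corr for v in values) / len(values)
        if score > min_corr and fraction_high >= min_fraction_high and score > best_score:
            best_model, best_score = model, score
    return best_model  # None: no candidate passed, keep the user-specified model
```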
docs/rescoring.rst [116-139]
- ``--ms2rescore`` (default: ``false``): Enable/disable peptide identification rescoring with LC-MS predictors such as MS²PIP and DeepLC
- ``--rescore_range`` (default: ``independent_run``): Defines the scope of rescoring:  
  - ``independent_run``: Rescoring performed independently for each run
  - ``by_sample``: Rescoring performed at the sample level
  - ``by_project``: Rescoring performed across the entire project

**Model Configuration**

- ``--ms2_model`` (default: ``HCD2021``): Specifies which deep learning model to use for feature generation
- ``--ms2_model_dir`` (default: ``null``): Path to local MS2 prediction model files to avoid repeated downloads

 ... (clipped 13 lines)

Solution Walkthrough:

Before:

# In rescoring.rst, there are specific, unverified claims:

# Claim about calibration set size in description
... on a calibration set (typically 20% of high-scoring target PSMs) ...

# Different claim in parameter list
--calibration_set_size (default: 0.15): Percentage of data used ... (15%)

# Claim about feature counts
MS2PIP Features (70+ features)

# Claim about parameter defaults
--find_best_model (default: true)

After:

# All technical claims should be verified against the codebase.

# Consistent and verified calibration set size
... on a calibration set (typically <verified_%> of high-scoring target PSMs) ...
--calibration_set_size (default: <verified_value>): ... (<verified_%>).

# Verified feature counts
MS2PIP Features (<verified_number>+ features)

# Verified parameter defaults
--find_best_model (default: <verified_default>)
Suggestion importance[1-10]: 9


Why: This is a critical suggestion for a documentation PR, as it addresses the factual accuracy of numerous technical specifications, defaults, and algorithmic behaviors, which is fundamental to providing reliable user guidance.

High
General
Add newline after reference label
Suggestion Impact: A newline was added after the reference label before the "Supported MS2PIP Models" heading. Additionally, the anchor name was later changed to supported-ms2pip-models-table elsewhere, but the immediate addition at lines 142-146 reflects the suggested newline.

code diff:

-**Feature Generation**
+.. _supported-ms2pip-models:
+
+Supported MS2PIP Models
 

Keep the reference label isolated from the section title by adding a blank line
after the target. Without a separating newline, some Sphinx versions may not
correctly associate the anchor.

docs/rescoring.rst [142-144]

 .. _supported-ms2pip-models:
+
 Supported MS2PIP Models
 -----------------------

[Suggestion processed]

Suggestion importance[1-10]: 7


Why: The suggestion correctly applies a reStructuredText best practice by adding a blank line after a reference label to ensure it is correctly associated with the following section, preventing potential broken links.

Medium
Verify correct :doc: target

Ensure the :doc: reference matches the new filename slug. If the document is
named 'rescoring.rst', reference it as :doc:rescoring without backticks or
change to :doc:/rescoring for absolute path to prevent broken links.

docs/identification.rst [83]

+quantms also introduces LC-MS predictors such as MS²PIP and DeepLC to boost identification rate by :doc:`rescoring`.
 
-

[Suggestion processed]

Suggestion importance[1-10]: 2


Why: The suggestion asks to verify a change that is already correctly implemented in the PR, and the improved code is identical to the existing code, offering low value.

Low
Possible issue
Remove incompatible emoji
Suggestion Impact: The emoji was removed from the "Spectrum-Based Features" heading, matching the suggestion to use plain text.

code diff:

-Spectrum-Based Features 📊
+Spectrum-Based Features
 

Remove the emoji to avoid Sphinx/reStructuredText build or PDF/LaTeX rendering
issues. Use plain text headings to ensure compatibility across builders and
encodings.

docs/rescoring.rst [70-72]

-Spectrum-Based Features 📊
+Spectrum-Based Features
 
 Signal Quality Metrics

[Suggestion processed]

Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that using an emoji in a reStructuredText heading can cause build or rendering issues, and removing it improves documentation compatibility.

Low


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (5)
docs/rescoring.rst (4)

70-77: Remove emoji to keep docs consistent and avoid potential build font issues.

Minor, but emoji can cause rendering inconsistencies depending on fonts.

-Spectrum-Based Features 📊
+Spectrum-Based Features

140-141: Grammar/conciseness tweak in the important note.

Small improvement.

-.. important:: For optimal results, ensure that your experimental data matches the properties of the selected MS2PIP model in terms of fragmentation method, mass analyzer, and peptide modifications. We recommend to use the parameter ``--find_best_model true`` and ``--force_model false`` which will automatically select the best model for your dataset. 
+.. important:: For optimal results, ensure that your experimental data matches the properties of the selected MS2PIP model in terms of fragmentation method, mass analyzer, and peptide modifications. We recommend using ``--find_best_model true`` and ``--force_model false``, which automatically selects the best model for your dataset.

215-216: Update legacy naming: “MS2Rescore” → “rescoring”.

The page was renamed; keep terminology consistent.

-**Feature generator failures**: This might result from incorrect model parameters or unsupported experimental types leading to poor model predictions. In such cases, MS2Rescore can be disabled using ``--ms2rescore false`` (default setting).
+**Feature generator failures**: This might result from incorrect model parameters or unsupported experimental types leading to poor model predictions. In such cases, rescoring can be disabled using ``--ms2rescore false`` (default).

179-180: Consider linking directly to the exact feature mapping page.

If a stable URL exists in quantms-rescoring docs for the feature mapping, deep-linking would be more helpful than the repo root.

I can update the link once you provide the precise target path.

docs/identification.rst (1)

83-83: Minor grammar improvement.

“boost identification rate by” → “boost the identification rate via”.

-quantms also introduces LC-MS predictors such as MS²PIP and DeepLC to boost identification rate by :doc:`rescoring`.
+quantms also introduces LC-MS predictors such as MS²PIP and DeepLC to boost the identification rate via :doc:`rescoring`.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4c5f0a5 and 3cdd4ec.

⛔ Files ignored due to path filters (1)
  • docs/images/quantms-rescoring.png is excluded by !**/*.png
📒 Files selected for processing (3)
  • docs/identification.rst (2 hunks)
  • docs/ms2rescore.rst (0 hunks)
  • docs/rescoring.rst (1 hunks)
💤 Files with no reviewable changes (1)
  • docs/ms2rescore.rst
🔇 Additional comments (5)
docs/rescoring.rst (5)

105-108: Clarify whether “spectrum” is also a selectable feature generator.

You mention spectrum quality modules conceptually above, but the allowed values for --feature_generators only list deeplc, ms2pip, alphapeptdeep. If “spectrum” (or similar) is configurable, document it; if it’s always on, say so.

I can propose a precise wording update once you confirm the intended usage and defaults.


1-20: Strong content and context.

Clear overview of multi-engine PSM inputs, Percolator, and the motivation for the rescoring framework. Good introduction and placement within the quantms ecosystem.


91-94: Good call-out of MS level constraint.

The explicit MS2-only support note will help users debug mismatches early.


31-34: Image path verified — no action required.

Verified: docs/images/quantms-rescoring.png (found at /home/jailuser/git/docs/images/quantms-rescoring.png)


148-178: Sanity-check results — table syntax OK; please build docs & verify timsTOF models

Quick summary: I checked docs/rescoring.rst — the grid table (lines 148–177) is syntactically valid (all '|' rows have the expected 5 pipes) and the iTRAQphospho multi-line cell is present. However, timsTOF2023/timsTOF2024 only appear in the docs; I could not confirm they are present/registered in the quantms-rescoring code or model manifest from the repo search.

Please verify the following:

  • docs/rescoring.rst
    • Supported Models summary: lines ~30–40
    • ms2_model defaults/options: lines ~124–126
    • Grid table: lines 148–177 (syntactically OK)
  • Action items for you:
    • Build the docs (Sphinx) locally and check rendered table alignment and any build warnings.
    • Confirm timsTOF2023 and timsTOF2024 are actually supported by quantms-rescoring (model manifest / source). If they are not present in the upstream model list, update/remove/correct the entries in docs/rescoring.rst.

ypriverol and others added 4 commits August 15, 2025 18:08
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
identifications for :doc:`comet` and :doc:`msgf` search engines. quantms supports multiple strategies (individual run/by samples/by projects) to re-scoring.
The former was shown to have less time consuming and comparable performance.
-quantms also introduces LC-MS predictors such as MS²PIP and DeepLC to boost identification rate by :doc:`ms2rescore`.
+quantms also introduces LC-MS predictors such as MS²PIP and DeepLC to boost identification rate by :doc:`rescoring`.
Contributor

Suggestion: Verify correct :doc: target

Suggested change
quantms also introduces LC-MS predictors such as MS²PIP and DeepLC to boost identification rate by :doc:`rescoring`.
quantms also introduces LC-MS predictors such as MS²PIP and DeepLC to boost identification rate by :doc:`rescoring`.

ypriverol and others added 6 commits August 15, 2025 18:13
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
docs/identification.rst (1)

15-15: Toctree: rename to “rescoring” looks correct.

The navigation will now point to the new page. Make sure all remaining references to ms2rescore are removed across the repo.
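One way to do that check is a small scan of the docs sources. The sketch below uses hypothetical helper names and matches only document references to the removed page (`:doc:` roles, bare toctree entries, `.rst`/`.html` links), because `--ms2rescore`, mentioned elsewhere in this review, is a pipeline parameter rather than a page reference.

```python
import re
from pathlib import Path

# Document references to the removed page; the --ms2rescore flag is deliberately not matched.
_STALE_REF = re.compile(
    r":doc:`(?:[^`<]*<)?ms2rescore>?`"  # :doc:`ms2rescore` or :doc:`Title <ms2rescore>`
    r"|^\s+ms2rescore\s*$"              # bare toctree entry on its own line
    r"|ms2rescore\.(?:rst|html)"        # explicit links to the old file
)


def stale_lines(rst_text):
    """Return (line_number, stripped_line) pairs that still reference the old page."""
    return [
        (number, line.strip())
        for number, line in enumerate(rst_text.splitlines(), start=1)
        if _STALE_REF.search(line)
    ]


def find_stale_refs(docs_dir):
    """Yield (path, line_number, line) for every stale reference under docs_dir."""
    for path in sorted(Path(docs_dir).rglob("*.rst")):
        for number, line in stale_lines(path.read_text(encoding="utf-8")):
            yield path, number, line
```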

🧹 Nitpick comments (26)
docs/fdr.rst (2)

37-37: Be explicit about the “standard formula” or link to authoritative docs.

“a standard formula for FDR calculation” is vague. Either state the exact definition used here (e.g., target-decoy q-value estimation) or link to the OpenMS FalseDiscoveryRate/IDPosteriorErrorProbability docs.

Apply this wording to improve clarity:

-When using OpenMS' distribution-fitting approach, a standard formula for FDR calculation is used at the PSM level.
+When using OpenMS' distribution-fitting approach, target–decoy-based q-values are computed at the PSM level using OpenMS’ established procedures (see `FalseDiscoveryRate <https://www.openms.org/documentation/html/TOPP_FalseDiscoveryRate.html>`_ and `IDPosteriorErrorProbability <https://www.openms.org/documentation/html/TOPP_IDPosteriorErrorProbability.html>`_).

To verify link validity within the docs build context, you can also include local cross-refs if these tools are already described elsewhere in your docs.
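For reference, one common definition of target-decoy q-values is sketched below. The FDR at a score threshold is estimated as the number of decoys divided by the number of targets at or above it, and each PSM's q-value is the minimum FDR over all thresholds that still accept it. This is an illustrative sketch, not OpenMS' implementation; variants (for example adding 1 to the decoy count, or special handling of tied scores) exist and are not covered.

```python
def target_decoy_qvalues(psms):
    """Compute target-decoy q-values.

    psms: list of (score, is_decoy) tuples, higher score = better.
    Returns q-values in the original input order. Ties are not treated specially.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when the threshold is set at this PSM's score.
        fdrs.append(decoys / max(targets, 1))
    # Walk from the worst score upwards, keeping a running minimum (monotone q-values).
    qvalues = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues
```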


39-42: Clarify scope of per-file PSM-level FDR and cross-reference protein-level section.

Small improvement: call out the implications of per-file PSM-level filtering and cross-link to the protein-level section for full-traceability.

-The FDR filtering at peptide spectrum match (PSM) level is currently always applied at the single file level.
-We argue that experiment-wide FDR control at the end of the workflow on the protein level is sufficient to limit error
-rates for the overall analysis. Nonetheless, an option for experiment-wide re-scoring and FDR control on PSM/peptide-level
-is under consideration.
+The FDR filtering at peptide spectrum match (PSM) level is currently applied per input file.
+In quantms, experiment-wide FDR control at the end of the workflow on the protein (group) level is considered sufficient to limit
+error rates for the overall analysis (see :ref:`Protein level <Protein level>`). An option for experiment-wide re-scoring and
+FDR control on PSM/peptide level is under consideration.
docs/quickstart.rst (1)

8-14: Make Nextflow install path-safe and ensure it’s on PATH

  • File: docs/quickstart.rst, Lines: 8–14
-   # 1) Install Nextflow (once)
-   curl -s https://get.nextflow.io | bash
-   mv nextflow ~/.local/bin/
+   # 1) Install Nextflow (once)
+   curl -s https://get.nextflow.io | bash
+   mkdir -p ~/.local/bin
+   mv nextflow ~/.local/bin/
+   # Ensure ~/.local/bin is on your PATH (add this to your shell profile if needed)
+   export PATH="$HOME/.local/bin:$PATH"

[optional_refactors_recommended]

docs/identification.rst (2)

83-83: Prefer consistent naming for MS2PIP across docs.

Elsewhere the project often uses “MS2PIP” (without superscript). Align naming to avoid searchability/consistency issues.

-quantms also introduces LC-MS predictors such as MS²PIP and DeepLC to boost identification rate by :doc:`rescoring`.
+quantms also introduces LC-MS predictors such as MS2PIP and DeepLC to boost identification rate by :doc:`rescoring`.

96-114: Verify --met_excision parameter, clarify SAGE support, and tighten wording

Please confirm the following before merging:

  • The CLI flag is indeed --met_excision (found in nextflow_schema.json and documented in docs/parameters.rst).
  • SAGE behavior: does it silently ignore this flag or return an error? Update the note accordingly.
  • The Database search <parameters.html#database-search> anchor resolves correctly.
 N-terminal methionine excision (met_excision)
 ---------------------------------------------

 - Parameter: ``--met_excision`` (default: ``true``)
 - Purpose: Accounts for co-translational removal of the initial N-terminal methionine from proteins. If enabled, the search allows peptides that start at the second amino acid position after Met removal, improving identification sensitivity when this biology is present.
 - Notes:
-  - Supported by Comet and MS-GF+ in the workflow.
-  - Currently not supported by SAGE; the setting may be ignored for SAGE searches.
+  - Supported by Comet and MS-GF+.
+  - Not supported by SAGE (flag will be ignored); please verify actual behavior.
 - When to adjust: Disable only if your samples do not exhibit N-terminal methionine excision and you want to minimally reduce the search space. In most proteomics data this excision is common, and keeping it enabled is recommended.

- See `Database search <parameters.html#database-search>`_ for enzyme specificity, termini, tolerances, and modification options.

Locations to review:

  • docs/identification.rst (lines 96–114)
  • docs/parameters.rst (entry for met_excision)
  • nextflow_schema.json (property "met_excision")
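To illustrate what the flag changes in the candidate space, the sketch below digests a protein with a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) and, when excision is enabled, also emits the N-terminal peptide without its initial Met. It is a conceptual sketch with a hypothetical helper name; it does not reflect how Comet or MS-GF+ implement the option internally.

```python
import re


def tryptic_peptides(protein, met_excision=True, min_len=6):
    """Fully tryptic peptides of a protein, optionally adding Met-excised N-termini."""

    def digest(sequence):
        # Zero-width split after K/R when not followed by P (needs Python 3.7+).
        pieces = re.split(r"(?<=[KR])(?!P)", sequence)
        return [p for p in pieces if len(p) >= min_len]

    peptides = digest(protein)
    if met_excision and protein.startswith("M"):
        # Only the N-terminal peptide differs once the initial Met is removed.
        peptides += [p for p in digest(protein[1:]) if p not in peptides]
    return peptides
```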
docs/capabilities.rst (1)

1-33: Confirmed: AlphaPeptDeep integration shipped in v1.6.0
Verified that the 1.6.0 release notes include “support AlphaPeptDeep model” (PR #571).

• Release notes check:
– 1.6.0 body contains “support AalphaPeptDeep model” entry (see bigbio/quantms#571)
• Optional refactor:
– In docs/capabilities.rst, replace each generic “Release X.Y.Z” link with a deep link to the corresponding section of the release notes to improve navigability.

[optional_refactors_recommended]

docs/troubleshooting.rst (2)

14-16: Clarify DIA decoy guidance

Consider adding “If your FASTA already contains decoys, they will be ignored for DIA runs” to make behavior explicit.


21-22: Minor developer UX nit: show exact commands

Include the commands to recreate the venv and install requirements to reduce ambiguity for new contributors.

Apply this diff:

- - If docs fail to build locally, recreate the venv and install requirements: see ``requirements.txt``.
+ - If docs fail to build locally, recreate the venv and install requirements:
+ 
+   .. code-block:: bash
+ 
+      python3 -m venv .venv
+      source .venv/bin/activate
+      pip install -U pip
+      pip install -r requirements.txt
docs/usage.rst (1)

48-53: Good: direct pull example is simplified

Removing “-r dev” is appropriate for stable usage. Consider pinning the documented release for reproducibility (e.g., “-r 1.6.0”) in at least one example.

Apply this diff:

-   nextflow run bigbio/quantms -profile test_lfq,<docker/singularity/podman/institute>
+   nextflow run bigbio/quantms -r 1.6.0 -profile test_lfq,<docker/singularity/podman/institute>
docs/nextflow_schema.json (9)

3-3: Schema $id should align with the published path

docs/setup_docs.py fetches from HEAD; this file advertises master. Align to avoid confusing tooling and links.

Apply this diff:

-    "$id": "https://raw.githubusercontent.com/bigbio/quantms/master/nextflow_schema.json",
+    "$id": "https://raw.githubusercontent.com/bigbio/quantms/HEAD/nextflow_schema.json",

296-301: Description vs default mismatch for fragment_mass_tolerance_unit

Description says default is 'ppm' but default is 'Da'. Fix the description to match the actual default.

Apply this diff:

-                    "description": "Fragment mass tolerance unit used for database search. Possible values are 'ppm' (default) and 'Da'.",
-                    "default": "Da",
+                    "description": "Fragment mass tolerance unit used for database search. Possible values are 'Da' (default) and 'ppm'.",
+                    "default": "Da",

135-156: Help text contradicts enum default for decoy method

Help text says “Default: pseudo-reverse peptides” but enum is ["reverse","shuffle"] and default is "reverse". Align help text.

Apply this diff:

-                    "help_text": "If decoys were not yet included in the input database, they have to be appended by OpenMS DecoyGenerator by adding this flag (TODO allow specifying generator type).\nDefault: pseudo-reverse peptides"
+                    "help_text": "If decoys were not yet included in the input database, they have to be appended by OpenMS DecoyGenerator by adding this flag (TODO allow specifying generator type).\nDefault: reverse (pseudo-reverse variant of reverse is used by some tools)."

333-336: Instrument values are not constrained; add enum and default

The description implies two choices. Add enum + default to improve validation and docs.

Apply this diff:

                 "instrument": {
                     "type": "string",
                     "description": "Type of instrument that generated the data. 'low_res' or 'high_res' (default; refers to LCQ and LTQ instruments)",
-                    "fa_icon": "fas fa-list-ol"
+                    "fa_icon": "fas fa-list-ol",
+                    "enum": ["low_res", "high_res"],
+                    "default": "high_res"
                 },

1104-1109: Typos in DIA-NN skip_preliminary_analysis description

Fix “insted” and “concensus”.

Apply this diff:

-                    "description": "Skip the preliminary analysis step, thus use the passed spectral library as-is insted of generating a local concensus library.",
+                    "description": "Skip the preliminary analysis step; use the passed spectral library as-is instead of generating a local consensus library.",

1137-1142: fa_icon uses boolean false — likely breaks icon rendering

Use a valid icon class or omit the key for random_preanalysis_seed.

Apply this diff:

-                    "fa_icon": false
+                    "fa_icon": "fas fa-dice"

1434-1438: Invalid Font Awesome class

Use “far fa-calendar” (or similar), not “far calendar”.

Apply this diff:

-                    "fa_icon": "far calendar",
+                    "fa_icon": "far fa-calendar",
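Both icon problems above could be caught automatically. The hypothetical sketch below walks the schema and reports `fa_icon` values that are not strings of the form `<style prefix> fa-<name>`. The accepted prefixes (`fa`, `fas`, `far`, `fab`, `fal`, `fad`) are an assumption, and the check does not verify that the icon name exists in Font Awesome.

```python
import re

# Assumed Font Awesome class shape: optional style prefix plus an "fa-" icon name.
_FA_CLASS = re.compile(r"^fa[srbld]? fa-[a-z0-9-]+$")


def bad_fa_icons(node, path="#"):
    """Recursively yield (json_pointer, value) for invalid 'fa_icon' entries."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "fa_icon" and not (isinstance(value, str) and _FA_CLASS.match(value)):
                yield child, value
            else:
                yield from bad_fa_icons(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from bad_fa_icons(item, f"{path}/{index}")
```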

31-37: Email regex is too restrictive for modern TLDs

The pattern limits TLDs to 2–5 chars; many valid TLDs are longer. Consider relaxing or documenting that validation is best-effort.

Would you like a PR to update to a more permissive pattern (e.g., allow 2–63 chars) consistent with nf-core schemas?
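The difference is easy to demonstrate. The sketch assumes the current pattern has the shape described above (local part, domain, and a TLD of 2 to 5 letters); the exact string in the schema may differ, and inside JSON the backslashes would be doubled. The relaxed variant allows TLDs of 2 to 63 letters, 63 being the maximum length of a DNS label.

```python
import re

# Assumed shape of the current pattern: TLD limited to 2-5 letters.
STRICT = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")
# Relaxed variant: TLD of 2-63 letters.
RELAXED = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,63})$")

for address in ["user@example.org", "user@lab.photography", "user@example.c"]:
    print(address, bool(STRICT.match(address)), bool(RELAXED.match(address)))
```

JSON Schema patterns follow ECMA-262 regex syntax; for a simple pattern like this one, Python's `re` behaves the same way.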


1038-1056: DIA-NN “scan_window” help text spacing

Tiny nit: leading space before “Ideally” in help_text.

Apply this diff:

-                    "help_text": " Ideally, should be approximately equal to the average number of data points per peak",
+                    "help_text": "Ideally, should be approximately equal to the average number of data points per peak",
docs/formats.rst (1)

83-84: Minor wording polish for local input hint

Tweak phrasing for clarity.

Apply this diff:

-.. important:: If you want to load local input spectra files instead of from SDRF or Experimental Design file, please set `--root_folder` and `--local_input_type` (default: mzML).
+.. important:: If you want to load local input spectra files instead of using paths from the SDRF or experimental design file, set ``--root_folder`` and ``--local_input_type`` (default: ``mzML``).
docs/protein_database.rst (3)

16-18: Prefix consistency with contaminants across docs

Here you recommend CONTAM_. In formats.rst the advice mentions CONTAMINANT_. Please align across docs to avoid confusion in downstream filtering logic.

Do you want me to raise a follow-up PR to standardize on CONTAM_ across the repository?


98-109: Typo: “entrainment only” → “entrapment only”

Fix wording in the DIA example.

Apply this diff:

- - DIA database generation (entrainment only; decoys handled by DIA-NN):
+ - DIA database generation (entrapment only; decoys handled by DIA-NN):

122-129: Nice: recommended FASTA naming

This helps reproducibility and provenance. Consider suggesting that users store a checksum (e.g., MD5) alongside the FASTA.
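A minimal sketch of storing such a checksum writes a sidecar file in the two-space `md5sum` format so it can later be verified with `md5sum -c` from the same directory. The helper name and the `.md5` suffix convention are assumptions.

```python
import hashlib
from pathlib import Path


def write_md5_sidecar(fasta_path, chunk_size=1 << 20):
    """Write '<name>.md5' next to the FASTA in md5sum format and return the hex digest."""
    fasta = Path(fasta_path)
    digest = hashlib.md5()
    with fasta.open("rb") as handle:
        # Read in chunks so large FASTA files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    hex_digest = digest.hexdigest()
    sidecar = fasta.with_name(fasta.name + ".md5")
    sidecar.write_text(f"{hex_digest}  {fasta.name}\n", encoding="utf-8")
    return hex_digest
```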

docs/dia.rst (1)

42-42: Minor wording inconsistency.

The sentence mixes "developers" with "bioinformaticians" but could be streamlined for clarity.

Apply this diff to improve readability:

-The first step of the workflow translates the SDRF parameters into DIA-NN configuration parameters, including post-translation modifications, enzyme, etc. For developers and bioinformaticians, details can be found in `diann to parameters <https://github.com/bigbio/quantms/blob/dev/bin/prepare_diann_parameters.py>`_.
+The first step of the workflow translates the SDRF parameters into DIA-NN configuration parameters, including post-translation modifications, enzyme, etc. Implementation details can be found in `diann to parameters <https://github.com/bigbio/quantms/blob/dev/bin/prepare_diann_parameters.py>`_.
docs/setup_docs.py (1)

55-55: Fix ambiguous variable name.

The variable name l is flagged by the linter as ambiguous.

Apply this diff to use a more descriptive variable name:

-            rst_lines.extend(["    " + l for l in code_buffer])
+            rst_lines.extend(["    " + line for line in code_buffer])
docs/getting_started.rst (2)

26-26: Version reference should be updated.

The hardcoded version "1.6.0" in the test command may become outdated as new releases are made.

Consider using a placeholder or note about checking for the latest version:

-   nextflow run bigbio/quantms -r 1.6.0 -profile test_lfq,docker
+   nextflow run bigbio/quantms -r 1.6.0 -profile test_lfq,docker
+   # Note: Check https://github.com/bigbio/quantms/releases for the latest version

Or use a more generic approach:

-   nextflow run bigbio/quantms -r 1.6.0 -profile test_lfq,docker
+   nextflow run bigbio/quantms -r <latest_version> -profile test_lfq,docker

51-54: Version consistency in pilot example.

Same version reference consideration applies here as well.

Consider the same approach as suggested for the test command to handle version references consistently.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 04dbcbd and bd3117c.

📒 Files selected for processing (30)
  • .gitignore (1 hunks)
  • docs/capabilities.rst (1 hunks)
  • docs/comparison.rst (1 hunks)
  • docs/conf.py (1 hunks)
  • docs/contact.rst (1 hunks)
  • docs/dev.rst (1 hunks)
  • docs/dia.rst (2 hunks)
  • docs/fdr.rst (1 hunks)
  • docs/formats.rst (5 hunks)
  • docs/getting_started.rst (1 hunks)
  • docs/glossary.rst (1 hunks)
  • docs/identification.rst (3 hunks)
  • docs/idpep.rst (1 hunks)
  • docs/index.rst (3 hunks)
  • docs/inference.rst (4 hunks)
  • docs/inputs_outputs.rst (1 hunks)
  • docs/introduction.rst (0 hunks)
  • docs/iso.rst (1 hunks)
  • docs/lfq.rst (1 hunks)
  • docs/msstats.rst (2 hunks)
  • docs/nextflow_schema.json (1 hunks)
  • docs/percolator.rst (1 hunks)
  • docs/protein_database.rst (1 hunks)
  • docs/quantms_output.rst (1 hunks)
  • docs/quickstart.rst (1 hunks)
  • docs/rescoring.rst (1 hunks)
  • docs/setup_docs.py (2 hunks)
  • docs/troubleshooting.rst (1 hunks)
  • docs/tutorials.rst (1 hunks)
  • docs/usage.rst (3 hunks)
💤 Files with no reviewable changes (1)
  • docs/introduction.rst
✅ Files skipped from review due to trivial changes (13)
  • .gitignore
  • docs/inputs_outputs.rst
  • docs/tutorials.rst
  • docs/percolator.rst
  • docs/comparison.rst
  • docs/glossary.rst
  • docs/dev.rst
  • docs/conf.py
  • docs/quantms_output.rst
  • docs/contact.rst
  • docs/msstats.rst
  • docs/idpep.rst
  • docs/iso.rst
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/rescoring.rst
🧰 Additional context used
🪛 Ruff (0.12.2)
docs/setup_docs.py

55-55: Ambiguous variable name: l

(E741)

🔇 Additional comments (35)
docs/lfq.rst (1)

66-70: Confirm LFQ anchor in parameters docs

I couldn’t locate a “Protein Quantification (LFQ)” section or a .. _protein-quantification-lfq: label in docs/parameters.rst, and no matching HTML anchor was found. Please manually verify that:

  • docs/parameters.rst contains a heading that Sphinx will render as
    <hX id="protein-quantification-lfq">Protein Quantification (LFQ)</hX>
  • The built parameters.html exposes an element with id="protein-quantification-lfq" so the cross-reference resolves correctly.
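Docutils derives section ids from heading text, so the expected id can be checked without a full build. The sketch below approximates the main steps of `docutils.nodes.make_id` (call that function directly if docutils is installed). The helper name is hypothetical, digraph handling is omitted, and ids can differ when two sections share the same title.

```python
import re
import unicodedata


def approx_section_id(title):
    """Approximate the HTML id docutils derives from a section title."""
    # Lowercase, then drop non-ASCII characters via NFKD normalisation.
    text = unicodedata.normalize("NFKD", title.lower()).encode("ascii", "ignore").decode("ascii")
    # Collapse whitespace, then turn every run of non-alphanumerics into one hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", " ".join(text.split()))
    # Strip leading digits/hyphens and trailing hyphens.
    return re.sub(r"^[-0-9]+|-+$", "", text)
```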
docs/troubleshooting.rst (1)

7-9: Solid, actionable checks — LGTM

Nice concise guidance on PATH, container runtime sanity checks, and Java version requirements.

docs/usage.rst (2)

60-63: Nice wording improvements

Clearer sentence structure and corrected grammar — looks good.


81-85: Conda deprecation warning placement — LGTM

Moving the warning to the Usage section reads better than top-of-doc.

docs/formats.rst (4)

8-11: Good: explicit anchor and double-underscore link for mzTab

Improves cross-referencing consistency.


137-143: Good: consensusXML anchor and link target

This will help other pages reference the section reliably.


147-153: Helpful addition: tab-based formats section

Nice addition to guide users to TSV outputs from OpenMS tools.


156-156: Security nit: HTTPS badge — good fix

Switching to HTTPS for the Slack badge is appreciated.

docs/protein_database.rst (2)

33-38: Good guidance + curated links

Clear recommendation to use UniProt-backed contaminants and why that helps DIA-NN gene grouping.


43-46: DDA vs DIA decoy handling — LGTM

Clear and accurate: decoys inside workflow for DDA; skip for DIA as DIA-NN handles decoys internally.

docs/dia.rst (5)

91-92: Excellent addition for DIA-NN users.

The note about using contaminants with UniProt descriptions containing gene names is valuable practical guidance that will help users avoid gene grouping issues in DIA-NN. The cross-reference to the protein database documentation and test datasets is helpful.


96-96: Good standardization of tool naming.

Consistently capitalizing "DIA converter" improves documentation consistency.


102-102: Mathematical notation improvement.

The formatting change from "1-Q.value" to "1 - Q.value" improves mathematical readability.


104-108: Well-structured parameter documentation.

The new "Relevant parameters" section with comprehensive DIA-NN options improves the documentation's usability by providing clear parameter guidance in one place.


109-115: Important technical constraint clearly documented.

The "Important technical notes" section properly warns users about the Conda limitation, which is crucial for deployment planning.

docs/inference.rst (9)

14-14: Grammar correction.

Fixed the subject-verb agreement issue.


18-19: Improved sentence structure.

The corrections improve readability and flow while maintaining the original meaning.


21-21: Clearer method description.

The restructured sentence better explains the two methods and their relationship.


31-31: More concise parameter instruction.

The simplified wording is clearer and more direct.


41-42: Enhanced parameter documentation.

The expanded description of aggregation score methods and clearer parameter guidance improves usability.


47-53: Better parameter explanations.

The corrections to grammar and the clarified examples (like "top_PSMs 3") make the parameter documentation more practical and usable.


61-63: Mathematical formula clarity.

Adding the explicit FDR formula enhances the technical documentation's precision.


66-79: Comprehensive technical explanation.

The expanded note about "picked" TDS provides valuable technical context that helps users understand why this method is recommended and how it addresses common issues in protein FDR estimation.
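The core idea of the picked strategy can be sketched in a few lines: each target protein competes with its own decoy, only the better-scoring member of the pair is kept, and FDR is then estimated on the picked list. This is a conceptual sketch (the `DECOY_` prefix, tie handling in favour of the target, and the helper name are assumptions), not the OpenMS implementation used by quantms.

```python
def picked_protein_fdr(scores, decoy_prefix="DECOY_"):
    """Return {accession: q_value} for the picked member of each target/decoy pair.

    scores: dict of accession -> score (higher is better); decoys carry decoy_prefix.
    """
    best = {}
    for accession, score in scores.items():
        is_decoy = accession.startswith(decoy_prefix)
        base = accession[len(decoy_prefix):] if is_decoy else accession
        current = best.get(base)
        # Keep the higher-scoring member of the pair; on a tie prefer the target.
        if current is None or score > current[1] or (score == current[1] and not is_decoy):
            best[base] = (accession, score, is_decoy)

    picked = sorted(best.values(), key=lambda item: item[1], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, _, is_decoy in picked:
        decoys += is_decoy
        targets += not is_decoy
        fdrs.append(decoys / max(targets, 1))

    # Running minimum from the worst score upwards gives monotone q-values.
    qvalues, running_min = {}, float("inf")
    for (accession, _, _), fdr in zip(reversed(picked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        qvalues[accession] = running_min
    return qvalues
```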


93-93: Citation update.

The reference update to [SAV2015a] aligns with the citation used in the text.

docs/setup_docs.py (7)

6-8: LGTM!

The new SCHEMA_URL constant is properly defined and follows the same pattern as OUTPUT_URL.


37-100: Well-implemented Markdown to RST converter.

The _md_to_rst function handles the essential conversion needs for the documentation workflow. The implementation correctly processes headings, code blocks, links, and anchors with appropriate RST formatting.
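As an illustration of the kind of conversion described (not the actual `_md_to_rst` code, which is not shown in this review), a hypothetical minimal converter for ATX headings, fenced code blocks, and inline links might look like this. Anchors and nested structures are deliberately left out.

```python
import re

# Underline characters per heading level (a common, not mandatory, RST convention).
_UNDERLINES = {1: "=", 2: "-", 3: "~", 4: "^"}
_MD_HEADING = re.compile(r"^(#{1,4})\s+(.*)$")
_MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def md_to_rst_sketch(markdown):
    """Convert a small Markdown subset (ATX headings, ``` fences, inline links) to RST."""
    rst_lines, code_buffer, in_code = [], [], False
    for line in markdown.splitlines():
        if line.startswith("```"):
            if in_code:
                # Closing fence: indent the buffered code under the directive.
                rst_lines.extend("    " + code_line for code_line in code_buffer)
                rst_lines.append("")
                code_buffer, in_code = [], False
            else:
                # Opening fence: start a code-block directive (language defaults to text).
                lang = line[3:].strip() or "text"
                rst_lines.extend([f".. code-block:: {lang}", ""])
                in_code = True
            continue
        if in_code:
            code_buffer.append(line)
            continue
        heading = _MD_HEADING.match(line)
        if heading:
            title = heading.group(2).strip()
            rst_lines.extend([title, _UNDERLINES[len(heading.group(1))] * len(title), ""])
        else:
            rst_lines.append(_MD_LINK.sub(r"`\1 <\2>`_", line))
    if in_code:  # tolerate a missing closing fence
        rst_lines.extend("    " + code_line for code_line in code_buffer)
    return "\n".join(rst_lines)
```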


102-133: Enhanced download_output function with conversion.

The updated function now properly handles Markdown-to-RST conversion and includes appropriate cleanup of temporary files. The logic for handling different file extensions is sound.


134-171: Robust parameter collection logic.

The _collect_params function effectively traverses JSON schema structures to extract parameter metadata. The recursive approach and heuristic for identifying leaf parameters is well-designed.
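For readers unfamiliar with the schema layout, here is a hypothetical sketch of such a traversal. It assumes the nf-core convention of parameter groups under `definitions` (or `$defs`), each with a `properties` map, and treats any entry with a `type` but no nested `properties` as a leaf parameter.

```python
def collect_params(schema):
    """Collect (group_title, name, type, default, description) for leaf parameters."""
    params = []

    def walk(node, group):
        for name, spec in node.get("properties", {}).items():
            if not isinstance(spec, dict):
                continue
            if "properties" in spec:
                # Nested object: recurse, using its title as the group label.
                walk(spec, spec.get("title", name))
            elif "type" in spec:
                params.append((group, name, spec["type"], spec.get("default"),
                               spec.get("description", "")))

    for key in ("definitions", "$defs"):
        for group_id, group in schema.get(key, {}).items():
            walk(group, group.get("title", group_id))
    # Parameters declared directly at the top level, if any.
    walk(schema, "Other parameters")
    return params
```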


173-257: Comprehensive schema-to-RST renderer.

The _schema_to_rst function generates well-formatted RST documentation from Nextflow schema. The grouping logic, sanitization of descriptions, and handling of different data types is thorough.


259-278: Good error handling in parameter generation.

The function properly handles download and parsing failures with appropriate warnings, ensuring the documentation build doesn't fail completely if the schema is unavailable.


280-282: Main execution flow updated appropriately.

The addition of generate_parameters() call ensures both output and parameter documentation are generated when the script runs.

docs/index.rst (3)

16-39: Excellent documentation restructuring.

The reorganized toctree creates a logical flow from quickstart/getting started through tutorials, technical documentation, and support resources. The new entries (quickstart, getting_started, tutorials, parameters, inputs_outputs, protein_database, comparison, capabilities, glossary, troubleshooting) significantly enhance the documentation's usability.


47-47: Security improvement with HTTPS.

Updated the Slack badge URL from HTTP to HTTPS for better security.


58-63: Valuable addition for new users.

The "What's new and capabilities" section provides useful entry points for users to understand quantms capabilities and compare it with other tools. This helps with user onboarding and tool evaluation.

docs/getting_started.rst (1)

1-64: Excellent getting started guide.

This comprehensive guide provides a logical progression from installation through first runs. Key strengths include:

  • Clear prerequisites and installation instructions
  • Built-in test validation step
  • Analysis path guidance for different data types
  • Practical pilot run example
  • Well-organized next steps with relevant cross-references

The structure and content will significantly help new users get started with quantms effectively.

Comment on lines +919 to +933
"description": "Add the log2 ratios of the abundance values to the output.",
"default": "false",
"fa_icon": "fas fa-list-ol"
},
"normalize": {
"type": "boolean",
"description": "Scale peptide abundances so that medians of all samples are equal.(Default false)",
"default": "false",
"fa_icon": "far fa-check-square"
},
"fix_peptides": {
"type": "boolean",
"description": "Use the same peptides for protein quantification across all samples.(Default false)",
"default": "false",
"fa_icon": "fas fa-bug"
Contributor

⚠️ Potential issue

Boolean defaults set as strings — will break validation

These three boolean properties use string defaults ("false") instead of boolean false. This can cause schema validation and UI rendering issues.

Apply this diff:

                 "ratios": {
                     "type": "boolean",
                     "description": "Add the log2 ratios of the abundance values to the output.",
-                    "default": "false",
+                    "default": false,
                     "fa_icon": "fas fa-list-ol"
                 },
                 "normalize": {
                     "type": "boolean",
                     "description": "Scale peptide abundances so that medians of all samples are equal.(Default false)",
-                    "default": "false",
+                    "default": false,
                     "fa_icon": "far fa-check-square"
                 },
                 "fix_peptides": {
                     "type": "boolean",
                     "description": "Use the same peptides for protein quantification across all samples.(Default false)",
-                    "default": "false",
+                    "default": false,
                     "fa_icon": "fas fa-bug"
                 },
🤖 Prompt for AI Agents
In docs/nextflow_schema.json around lines 919 to 933, the boolean properties "ratios", "normalize", and "fix_peptides"
use string defaults ("false") instead of boolean false. Change each "default" value from the string "false" to the
boolean false (no quotes) so the JSON schema uses real boolean values, fixing validation and UI rendering
issues.
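A small lint step could keep this class of bug from recurring. The hypothetical sketch below walks every `properties` map in the schema and reports defaults whose JSON type does not match the declared `type`. It treats `null` defaults as mismatches, which may be stricter than desired.

```python
import json

# JSON Schema type name -> accepted Python type(s) after json.load.
_JSON_TYPES = {"boolean": bool, "integer": int, "number": (int, float), "string": str}


def default_type_mismatches(node):
    """Yield (name, declared_type, default) for properties whose default has the wrong type."""
    if isinstance(node, dict):
        for name, spec in node.get("properties", {}).items():
            if not isinstance(spec, dict) or "default" not in spec:
                continue
            declared = spec.get("type")
            expected = _JSON_TYPES.get(declared) if isinstance(declared, str) else None
            if expected is None:
                continue
            value = spec["default"]
            # bool is a subclass of int in Python, so reject True/False for numeric types.
            if not isinstance(value, expected) or (declared != "boolean" and isinstance(value, bool)):
                yield name, declared, value
        for child in node.values():
            yield from default_type_mismatches(child)
    elif isinstance(node, list):
        for item in node:
            yield from default_type_mismatches(item)


def check_schema_file(path):
    """Return all mismatches found in a schema file on disk."""
    with open(path, encoding="utf-8") as handle:
        return list(default_type_mismatches(json.load(handle)))
```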

@ypriverol ypriverol requested a review from Copilot August 16, 2025 16:48
Copy link
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR reorganizes and expands the quantms documentation. Its main purpose is to replace the basic MS2Rescore page with a comprehensive guide to the rescoring framework, alongside several new user-facing pages that make the documentation easier to navigate.

Key changes include:

  • Comprehensive rewrite of rescoring documentation with technical specifications and usage examples
  • Addition of multiple new user-facing documentation pages (quickstart, tutorials, troubleshooting, etc.)
  • Implementation of parameter generation from Nextflow schema for better maintainability

Reviewed Changes

Copilot reviewed 31 out of 34 changed files in this pull request and generated 3 comments.

Summary per file:

  • rescoring.rst: New comprehensive rescoring documentation replacing ms2rescore.rst
  • quickstart.rst: New 5-minute quickstart guide for new users
  • tutorials.rst: New step-by-step tutorials for different user experience levels
  • troubleshooting.rst: New troubleshooting guide for common issues
  • setup_docs.py: Enhanced documentation build system with parameter generation
  • protein_database.rst: New comprehensive protein database preparation guide
  • quantms_output.rst: New detailed output structure documentation
  • getting_started.rst: New comprehensive getting started guide
  • parameters.rst: New parameter reference auto-generated from the schema
  • nextflow_schema.json: Added Nextflow parameter schema for documentation generation


f.write(rst_text)

# Remove temporary md file to avoid duplicate docnames
try:

Copilot AI Aug 16, 2025


The error handling logic could be improved by catching more specific exceptions and providing clearer error messages. Consider catching requests.exceptions.RequestException and json.JSONDecodeError separately for better debugging.
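One way to act on this suggestion is to split the download step from the parse step so each failure gets its own message. The sketch uses the standard library's urllib instead of requests so it has no third-party dependency; with requests the structure is the same, catching requests.exceptions.RequestException around the download. load_schema is a hypothetical name, and the actual setup_docs.py code is not shown in this thread.

```python
import json
import logging
import urllib.error
import urllib.request
from typing import Optional

logger = logging.getLogger(__name__)


def load_schema(url: str, timeout: float = 10.0) -> Optional[dict]:
    """Download and parse a JSON schema, reporting network and parse failures separately."""
    # Step 1: download. Only transport problems are handled here.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
    except urllib.error.URLError as exc:
        # DNS failures, refused connections, HTTP errors (HTTPError subclasses
        # URLError) and unsupported URL schemes.
        logger.error("Could not download schema from %s: %s", url, exc)
        return None
    except TimeoutError:
        logger.error("Timed out after %.0f s while downloading %s", timeout, url)
        return None
    except ValueError as exc:
        # Malformed URLs, e.g. one without a scheme.
        logger.error("Invalid schema URL %r: %s", url, exc)
        return None

    # Step 2: parse. The download worked, so any failure here is about the payload.
    try:
        return json.loads(raw.decode("utf-8"))
    except UnicodeDecodeError as exc:
        logger.error("Schema at %s is not UTF-8 text: %s", url, exc)
    except json.JSONDecodeError as exc:
        logger.error("Schema at %s is not valid JSON (line %d, column %d): %s",
                     url, exc.lineno, exc.colno, exc.msg)
    return None
```

Returning None on failure lets the caller decide whether a missing schema should abort the docs build or fall back to a cached copy.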



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (18)
docs/tutorials.rst (2)

206-207: Anchor fragment may be brittle; add an explicit section label for stable intra-page links

The link to #beginner-dda-label-free-quantification relies on Sphinx’s auto-generated ID. Emojis/punctuation in the heading can alter the slug unexpectedly. Add a reST label above the heading to ensure a stable target.

Apply:

+.. _beginner-dda-label-free-quantification:
+
 🌱 Beginner: DDA Label-Free Quantification
 -------------------------------------------

11-35: Heavy use of raw HTML reduces portability (PDF/epub builders) and bypasses Sphinx features

Raw HTML is fine for HTML builds, but it won’t render for other builders and bypasses linkcheck and theming. Consider replacing these grids/cards with sphinx-design (Grid, Card, and Button components) to keep the layout while preserving Sphinx link semantics.

I can propose a sphinx-design rewrite of one block as a template if you want to trial it.

docs/_static/custom.css (2)

77-92: Respect reduced motion preferences

There are multiple transitions/animations (hover lift, shimmer on section headers). Add a prefers-reduced-motion guard for accessibility.

Apply:

 @media (max-width: 768px) {
   .nav-grid {
     grid-template-columns: 1fr;
     gap: 8px;
   }
   ...
 }
+
+/* Accessibility: reduce motion for users preferring less animation */
+@media (prefers-reduced-motion: reduce) {
+  .nav-button,
+  .parameter-table tr:hover,
+  .section-header::before {
+    transition: none !important;
+    transform: none !important;
+  }
+}

30-33: Hover background color may hurt readability of code text within rows

On .parameter-table tr:hover you darken the row background; ensure sufficient contrast for inline code and text. Consider a subtler hover or skip hover on small tables.

If you see readability issues, reduce the hover shade:

-.parameter-table tr:hover {
-    background-color: #e2e8f0;
-    transition: background-color 0.2s ease;
-}
+.parameter-table tr:hover {
+    background-color: #eef2f7;
+    transition: background-color 0.2s ease;
+}
docs/conf.py (1)

25-26: Reduce version string duplication across docs using a substitution

1.6.0 appears hard-coded in multiple pages. Define a reST substitution from release so pages can use |quantms_version| instead of manual updates.

Apply:

 # The full version, including alpha/beta/rc tags
 release = '1.6.0'
+
+# Make the release usable as a substitution in .rst files: |quantms_version|
+rst_prolog = f"""
+.. |quantms_version| replace:: {release}
+"""

Then, replace occurrences of 1.6.0 in commands with |quantms_version|.

docs/getting_started.rst (2)

28-31: Harden Nextflow install snippet (permissions and PATH)

The snippet assumes ~/.local/bin exists and is on PATH. Create the directory and export PATH explicitly; the downloaded launcher is normally already executable, but an explicit chmod makes that step visible.

-            curl -s https://get.nextflow.io | bash<br>
-            mv nextflow ~/.local/bin/
+            curl -s https://get.nextflow.io | bash<br>
+            chmod +x nextflow<br>
+            mkdir -p ~/.local/bin<br>
+            mv nextflow ~/.local/bin/<br>
+            export PATH="$HOME/.local/bin:$PATH"

127-133: Avoid hard-coding the release; use a substitution

The Nextflow command pins -r 1.6.0. If you adopt |quantms_version| as suggested in conf.py, you can reduce duplication and update risk.

-   nextflow run bigbio/quantms -r 1.6.0 \
+   nextflow run bigbio/quantms -r |quantms_version| \
docs/quickstart.rst (4)

19-21: Harden Nextflow install snippet (permissions and PATH)

Same note as in Getting started: ensure executable bit, destination exists, and PATH is updated.

-         curl -s https://get.nextflow.io | bash<br>
-         mv nextflow ~/.local/bin/
+         curl -s https://get.nextflow.io | bash<br>
+         chmod +x nextflow<br>
+         mkdir -p ~/.local/bin<br>
+         mv nextflow ~/.local/bin/<br>
+         export PATH="$HOME/.local/bin:$PATH"

32-33: Avoid hard-coding the release; use a substitution

To keep versions in sync, prefer |quantms_version|.

-         nextflow run bigbio/quantms -r 1.6.0 -profile test_lfq,docker
+         nextflow run bigbio/quantms -r |quantms_version| -profile test_lfq,docker

98-103: Avoid hard-coding the release in user-facing commands

Use the |quantms_version| substitution once defined in conf.py.

-         nextflow run bigbio/quantms -r 1.6.0 \<br>
+         nextflow run bigbio/quantms -r |quantms_version| \<br>

68-88: Use :doc: roles instead of raw .html where possible

The raw HTML link to quantms_output.html is brittle. Prefer :doc:`quantms_output` (outside the raw block) or ensure the target exists and matches the generator’s output filename.

If keeping raw HTML, run the link verification script (see tutorials.rst comment) to ensure quantms_output exists.

docs/index.rst (7)

11-14: Add alt text to the image for accessibility and better SEO

The image lacks an alt attribute. Adding alt text improves accessibility (screen readers) and SEO.

Apply this diff:

 .. image:: images/quantms.png
    :width: 600
+   :alt: quantms logo
    :align: center

4-9: Heavy inline HTML/CSS: prefer Sphinx-native constructs (or sphinx-design) and add non-HTML fallbacks

The hero banner is HTML-only with inline styles. This won’t render on non-HTML builders (e.g., LaTeX/PDF) and is harder to theme/translate. Consider using Sphinx-native markup or the sphinx-design extension (cards, grids), and provide an RST fallback via only:: not html.

Example fallback (outside the raw block):

.. only:: not html

   Welcome to quantms
   -------------------
   A comprehensive, reproducible workflow for mass spectrometry-based proteomics quantification.

If sphinx-design is enabled, you can replace raw HTML with sd-* components for cleaner theming.


15-36: Internal links inside raw HTML are brittle (.html); prefer :doc: links or ensure equivalent RST navigation exists

The “Quick Start / Learn the Basics / Step-by-Step” cards use hard-coded .html links inside raw HTML. These are fragile across builders and URL schemes. Prefer :doc: roles in reST or sphinx-design components; at minimum, provide an RST-based duplicate navigation outside the raw block so non-HTML builds aren’t empty.

If keeping raw HTML, ensure each target page exists and is included in a toctree to guarantee it’s built.


66-128: Prefer Sphinx-native links over hardcoded .html and provide non-HTML fallbacks

The Documentation/Analysis/Advanced/Tools/Support/Development grids use raw HTML and .html links. Consider Sphinx-native roles (:doc:, :ref:) or sphinx-design cards for portability and theming; add RST fallbacks under only:: not html.

If keeping raw HTML, double-check each target has a corresponding built page (present in a toctree).


130-158: Include iso and lfq in the toctree to ensure pages are built

Given the homepage links to iso.html and lfq.html, ensure these pages are built by Sphinx by adding them to the toctree.

Apply this diff:

    dia
+
+   iso
+   lfq
    statistics

159-159: Remove stray '|' line

This lone pipe likely renders undesirably (line blocks) and appears to be a leftover artifact.

Apply this diff:

-|
+

176-181: Use :doc: for internal links instead of hardcoded .html

Internal links should avoid .html suffixes for portability across builders and URL changes.

Apply this diff:

-- See a concise overview of what quantms can do and when features were introduced in `Capabilities <capabilities.html>`_.
-- For a quick orientation on how quantms compares to other tools, check `Comparison <comparison.html>`_.
+- See a concise overview of what quantms can do and when features were introduced in :doc:`Capabilities <capabilities>`.
+- For a quick orientation on how quantms compares to other tools, check :doc:`Comparison <comparison>`.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bd3117c and 1996f4e.

📒 Files selected for processing (7)
  • docs/_static/custom.css (1 hunks)
  • docs/conf.py (2 hunks)
  • docs/getting_started.rst (1 hunks)
  • docs/index.rst (3 hunks)
  • docs/quantms_output.rst (1 hunks)
  • docs/quickstart.rst (1 hunks)
  • docs/tutorials.rst (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/quantms_output.rst
🔇 Additional comments (4)
docs/conf.py (2)

26-26: LGTM: Release bump to 1.6.0

Release bump aligns with the docs content and commands referencing 1.6.0.


63-66: LGTM: Including custom CSS via html_css_files

custom.css under _static will be correctly picked up by Sphinx. No issues.

docs/quickstart.rst (1)

32-35: Make success message consistent with actual pipeline output

The message says "Pipeline completed successfully!". Verify the exact wording the pipeline prints so users can match it precisely.

If it differs (e.g., missing exclamation or different casing), update the text accordingly.

docs/index.rst (1)

165-167: Switch to HTTPS for Slack badge looks good

Thanks for updating to HTTPS; safer and consistent with modern best practices.


<div style="background: #f8fafc; padding: 20px; border-radius: 8px; margin: 20px 0;">
<h3 style="margin: 0 0 15px 0; color: #475569;">🔬 What type of mass spectrometry data do you have?</h3>
<div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(280px, 1fr)); gap: 15px;">

💡 Verification agent

🧩 Analysis chain

Verify target pages exist for all doc references

This page references multiple documents via :doc: (getting_started, protein_database, quantms_output, msstats, troubleshooting, glossary, faq, contact, usage, formats, parameters, pmultiqc, tutorials, lfq, dda, dia, iso). Ensure all exist to avoid linkcheck failures.

Run:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(docs/getting_started.rst)

fd -e rst | sed -E 's|^\./||; s|\.rst$||' > /tmp/all_rst_docs.txt

rg -nP --no-heading -o ':(?:doc):`([a-zA-Z0-9_/\-]+)`' "${files[@]}" \
  | awk -F'`' '{print $2}' | sort -u > /tmp/doc_roles.txt || true

echo "Missing doc targets from getting_started.rst:"
comm -23 /tmp/doc_roles.txt /tmp/all_rst_docs.txt || true



Verify missing doc targets in docs/getting_started.rst

The following :doc: references in docs/getting_started.rst have no corresponding .rst files and will break link checks:

  • contact
  • faq
  • formats
  • glossary
  • protein_database
  • troubleshooting
  • usage

Please add these pages (e.g. docs/contact.rst) or remove/update the references to avoid broken links.

🤖 Prompt for AI Agents
In docs/getting_started.rst around line 61, the :doc: references listed
(contact, faq, formats, glossary, protein_database, troubleshooting, usage)
point to missing .rst files and will break link checks; either create the
corresponding files (e.g., docs/contact.rst, docs/faq.rst, etc.) with
appropriate content and add them to the table of contents if needed, or
remove/update the :doc: references in getting_started.rst to point to existing
pages or external URLs so the links resolve cleanly.

Comment on lines +38 to +65
.. raw:: html

<div style="background: #e8f4fd; padding: 20px; border-radius: 8px; margin: 20px 0;">
<h3 style="margin: 0 0 15px 0; color: #0c5aa6;">🔬 What type of data do you have?</h3>
<div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 15px;">
<div style="background: white; padding: 15px; border-radius: 6px; text-align: center;">
<strong>DDA</strong><br>
<small>Data-Dependent Acquisition</small><br>
<a href="dda.html" style="color: #0c5aa6;">→ DDA Analysis</a>
</div>
<div style="background: white; padding: 15px; border-radius: 6px; text-align: center;">
<strong>DIA</strong><br>
<small>Data-Independent Acquisition</small><br>
<a href="dia.html" style="color: #0c5aa6;">→ DIA Analysis</a>
</div>
<div style="background: white; padding: 15px; border-radius: 6px; text-align: center;">
<strong>TMT/iTRAQ</strong><br>
<small>Isobaric Labeling</small><br>
<a href="iso.html" style="color: #0c5aa6;">→ Isobaric Analysis</a>
</div>
<div style="background: white; padding: 15px; border-radius: 6px; text-align: center;">
<strong>Label-Free</strong><br>
<small>LFQ Quantification</small><br>
<a href="lfq.html" style="color: #0c5aa6;">→ LFQ Analysis</a>
</div>
</div>
</div>


💡 Verification agent

🧩 Analysis chain

Potential broken links: iso.html and lfq.html pages aren’t in the toctree

This grid links to iso.html and lfq.html, but those documents are not listed in the toctree below. If they’re not referenced elsewhere, Sphinx may not build them, resulting in broken links.

  • Add iso and lfq to the toctree (see diff in the toctree comment below).
  • Alternatively, confirm there are docs/iso/index.rst and docs/lfq/index.rst included by another toctree.

Run this script to verify the presence of the linked docs:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# List of doc basenames to verify (toctree + links)
docs_to_check=(
  quickstart getting_started tutorials usage introduction preprocessing identification
  protein_database dda dia statistics parameters inputs_outputs pmultiqc benchmarks
  comparison capabilities glossary troubleshooting debug faq presentations dev contact
  iso lfq
)

missing=0
for d in "${docs_to_check[@]}"; do
  if fd -a -t f "docs/${d}.rst" >/dev/null 2>&1 || fd -a -t f "docs/${d}/index.rst" >/dev/null 2>&1; then
    printf "OK   %s\n" "$d"
  else
    printf "MISS %s (expected docs/%s.rst or docs/%s/index.rst)\n" "$d" "$d" "$d"
    missing=1
  fi
done

exit $missing



🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- toctree in docs/index.rst ----"
rg -nP '^\s*\.\. toctree::' -n docs/index.rst -C5



🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Entries with iso or lfq at indentation (possible toctree entries):"
rg -nP '^\s{3,}(iso|lfq)$' -n docs

echo
echo "Contexts around toctree directives in all docs:"
rg -nP '^\s*\.\. toctree::' -n docs -C5 | sed 's/\x1b\[[0-9;]*m//g'



Fix missing Sphinx toctree entries for iso.html and lfq.html

The homepage grid links to iso.html and lfq.html. docs/iso.rst and docs/lfq.rst exist (and are referenced in the hidden toctree of docs/dda.rst at lines 29–34), but docs/dda.rst itself isn’t included in the master toctree in docs/index.rst (lines 130–135). As a result, Sphinx will not build the DDA page or its nested iso and lfq pages, leading to broken links on the homepage.

• Confirmed files:

  • docs/iso.rst → exists
  • docs/lfq.rst → exists

• Current master toctree in docs/index.rst (lines 130–135):

.. toctree::
   :maxdepth: 1
   :hidden:

   quickstart
   getting_started

• To resolve, update docs/index.rst:
– Option A: Pull in the entire DDA section (and its nested iso/lfq) by adding:
```diff
 .. toctree::
    :maxdepth: 1
    :hidden:

    quickstart
    getting_started
+   dda
```

– Option B: Explicitly list the pages you link to:
```diff
 .. toctree::
    :maxdepth: 1
    :hidden:

    quickstart
    getting_started
+   iso
+   lfq
```

This will ensure Sphinx builds iso.html and lfq.html (and dda.html) so the homepage links resolve.

🤖 Prompt for AI Agents
In docs/index.rst around lines 38 to 65 and update the master toctree at ~lines
130–135: the homepage links to iso.html and lfq.html but Sphinx won't build them
because dda.rst (which contains a hidden toctree referencing iso and lfq) is not
included in the master toctree; fix by editing docs/index.rst to either add
"dda" to the master toctree so its nested iso and lfq pages are built, or
explicitly add "iso" and "lfq" (and optionally "dda") as entries in the master
toctree so Sphinx generates iso.html and lfq.html and the homepage links
resolve.
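In a default Sphinx build, a document outside every toctree is still written to HTML, but Sphinx warns "document isn't included in any toctree" and the page is missing from the sidebar navigation, so the more likely symptom is an orphaned page rather than a 404. A rough way to list such pages is to follow toctree entries from index, as Sphinx does for navigation. The sketch below is a hypothetical helper, not part of Sphinx; it uses a simplified parser that only understands plain and "Title <target>" entries and ignores :glob: patterns.

```python
import posixpath
import re
from pathlib import Path

TOCTREE = re.compile(r"^(\s*)\.\. toctree::")
EXPLICIT_TARGET = re.compile(r"<([^<>]+)>\s*$")


def toctree_entries(text: str) -> list[str]:
    """Collect entry targets from every '.. toctree::' directive in a reST source."""
    lines = text.splitlines()
    entries: list[str] = []
    i = 0
    while i < len(lines):
        match = TOCTREE.match(lines[i])
        i += 1
        if not match:
            continue
        indent = len(match.group(1))
        # The directive body is every following line that is blank or indented deeper.
        while i < len(lines):
            line = lines[i]
            stripped = line.strip()
            if stripped and len(line) - len(line.lstrip()) <= indent:
                break
            if stripped and not stripped.startswith(":"):  # skip :maxdepth:, :hidden:, ...
                explicit = EXPLICIT_TARGET.search(stripped)
                entries.append(explicit.group(1).strip() if explicit else stripped)
            i += 1
    return entries


def unreachable_docs(docs_dir: Path, master: str = "index") -> list[str]:
    """Return docnames that no toctree chain starting at `master` includes."""
    all_docs = {p.relative_to(docs_dir).with_suffix("").as_posix()
                for p in docs_dir.rglob("*.rst")}
    seen: set[str] = set()
    queue = [master]
    while queue:
        doc = queue.pop()
        if doc in seen or doc not in all_docs:
            continue
        seen.add(doc)
        text = (docs_dir / f"{doc}.rst").read_text(encoding="utf-8")
        for entry in toctree_entries(text):
            if entry == "self" or "://" in entry:
                continue  # not a local document
            if entry.startswith("/"):
                target = entry.lstrip("/")
            else:
                target = posixpath.normpath(posixpath.join(posixpath.dirname(doc), entry))
            queue.append(target)
    return sorted(all_docs - seen)
```

unreachable_docs(Path("docs")) lists docnames that need a toctree entry, or an :orphan: marker if they are meant to stay out of the navigation.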

Comment on lines +171 to +188
<div style="background: #f0f9ff; padding: 15px; border-radius: 6px; border: 1px solid #e2e8f0;">
<strong style="color: #3b82f6;">🧠 Percolator</strong><br>
<small>Semi-supervised machine learning</small><br>
<a href="percolator.html" style="color: #3b82f6;">→ Setup guide</a>
</div>

<div style="background: #f0fdf4; padding: 15px; border-radius: 6px; border: 1px solid #e2e8f0;">
<strong style="color: #22c55e;">🔬 MS²PIP</strong><br>
<small>Deep learning models</small><br>
<a href="rescoring.html" style="color: #22c55e;">→ Model selection</a>
</div>

<div style="background: #fef3c7; padding: 15px; border-radius: 6px; border: 1px solid #e2e8f0;">
<strong style="color: #f59e0b;">⚡ IDPEP</strong><br>
<small>Retention time prediction</small><br>
<a href="idpep.html" style="color: #f59e0b;">→ Configuration</a>
</div>
</div>

💡 Verification agent

🧩 Analysis chain

Avoid hard-coding .html links in raw HTML; prefer Sphinx-aware links to prevent broken references

Raw HTML anchors to percolator.html, rescoring.html, and idpep.html bypass Sphinx’s link resolution and linkcheck, and won’t be handled by extensions like sphinx_new_tab_link. If any page is renamed or moved, these will silently break (and non-HTML builders won’t resolve them).

Suggested approaches (any of these):

  • Replace raw HTML blocks with reST links using :doc: roles (best for portability).
  • If you must keep raw HTML layout, consider switching to sphinx-design for cards, which supports reST links.
  • At minimum, verify those targets exist in this build and keep names consistent.

Run this to verify targets exist:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Collect Sphinx docnames present in the repo (strip .rst)
fd -e rst | sed -E 's|^\./||; s|\.rst$||' > /tmp/all_rst_docs.txt

# Extract :doc:`name` references
rg -nP --no-heading -o ':(?:doc):`([a-zA-Z0-9_/\-]+)`' docs/tutorials.rst | awk -F'`' '{print $2}' > /tmp/doc_roles.txt || true

# Extract raw HTML hrefs to .html, drop fragments and extension
rg -nP --no-heading -o 'href="([a-zA-Z0-9_\-/]+)\.html(?:#[^"]*)?"' docs/tutorials.rst \
  | sed -E 's/.*href="([^"]+)".*/\1/' > /tmp/html_refs.txt || true

# Combine and dedupe references
cat /tmp/doc_roles.txt /tmp/html_refs.txt 2>/dev/null | sort -u > /tmp/all_refs.txt || true

echo "Checking missing docs referenced from tutorials.rst..."
comm -23 /tmp/all_refs.txt /tmp/all_rst_docs.txt || true



Broken links in docs/tutorials.rst: missing targets for percolator.html, rescoring.html, idpep.html

The raw HTML anchors on lines 171–188 point to percolator.html, rescoring.html, and idpep.html, but there are no matching .rst source files in the repo—these links will 404 and bypass Sphinx’s linkchecker and extensions.

To fix:

  • docs/tutorials.rst (171–188): these <a href="…"> targets don’t exist. Either add/rename the corresponding docs/percolator.rst, docs/rescoring.rst, and docs/idpep.rst files, or…
  • Replace the raw HTML anchors with Sphinx-aware links, for example:
    .. container:: card

       | **🧠 Percolator**
       | Semi-supervised machine learning
       | :doc:`→ Setup guide <percolator>`
  • If you prefer the current card layout, consider using the sphinx-design card directive, which supports ReST link roles.
🤖 Prompt for AI Agents
In docs/tutorials.rst around lines 171–188 the raw HTML anchors link to
percolator.html, rescoring.html, and idpep.html which do not exist and will 404;
replace these raw <a href="..."> anchors with Sphinx-aware links or create the
missing .rst pages. Fix option A: create docs/percolator.rst,
docs/rescoring.rst, and docs/idpep.rst (with proper toctree inclusion) so the
.html targets exist. Fix option B (preferred): convert the HTML card blocks to
ReST/Sphinx syntax (or use sphinx-design card directive) and use :doc:`→ Setup
guide <percolator>`, :doc:`→ Model selection <rescoring>`, and :doc:`→
Configuration <idpep>` link roles so Sphinx resolves them; ensure the referenced
doc names match the filenames without .rst and update toctree if needed.
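The shell checks in this thread compare fd -e rst output with bare docnames. Run from the repository root, fd prints paths such as docs/rescoring.rst, while the extracted references are bare names such as rescoring, so comm may report pages that do exist; that could explain why rescoring, which this PR adds as docs/rescoring.rst, is reported missing. The :doc: regex also skips the :doc:`Title <target>` form. Here is a minimal Python sketch that resolves names relative to the docs directory and handles both role forms plus relative .html hrefs. find_missing_targets is a hypothetical helper, not part of Sphinx, and it ignores intersphinx, :ref: labels and absolute URLs.

```python
import posixpath
import re
from pathlib import Path

# :doc:`target` or :doc:`Title <target>`
DOC_ROLE = re.compile(r":doc:`([^`]+)`")
# href="page.html" or href="page.html#anchor" (relative links only)
HTML_HREF = re.compile(r'href="([A-Za-z0-9_\-/]+)\.html(?:#[^"]*)?"')
EXPLICIT_TARGET = re.compile(r"<([^<>]+)>\s*$")


def _role_target(content: str) -> str:
    """Return the target of a :doc: role, dropping an explicit title if present."""
    match = EXPLICIT_TARGET.search(content)
    return (match.group(1) if match else content).strip()


def find_missing_targets(docs_dir: Path, page: Path) -> list[str]:
    """List docnames referenced from `page` that have no matching .rst under `docs_dir`."""
    existing = {p.relative_to(docs_dir).with_suffix("").as_posix()
                for p in docs_dir.rglob("*.rst")}
    text = page.read_text(encoding="utf-8")
    refs = {_role_target(content) for content in DOC_ROLE.findall(text)}
    refs |= set(HTML_HREF.findall(text))

    # Relative targets resolve against the referencing page's directory;
    # a leading "/" means "relative to the docs root", as in Sphinx.
    base = page.parent.relative_to(docs_dir).as_posix()
    missing = []
    for ref in sorted(refs):
        if ref.startswith("/"):
            name = ref.lstrip("/")
        else:
            name = posixpath.normpath(posixpath.join(base, ref))
        if name not in existing:
            missing.append(name)
    return missing
```

For instance, find_missing_targets(Path("docs"), Path("docs/tutorials.rst")) lists unresolved names for one page; looping over Path("docs").rglob("*.rst") covers getting_started.rst and the rest of the tree the same way.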

ypriverol and others added 4 commits August 16, 2025 17:54
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@ypriverol ypriverol merged commit 82455e2 into main Aug 16, 2025
2 checks passed