perf(flashtnt): load only the selected proteoform's scan-scoped data by t0mdavid-m · Pull Request #77 · OpenMS/FLASHApp

t0mdavid-m · 2026-05-28T13:22:01Z

Selecting a proteoform now resolves its scan and filters the spectra, mass table, and tag table to that scan instead of shipping every scan's data to the browser. sequence_data is stored one row per proteoform (sequence_data.pq) and the Sequence View pushdown-loads only the selected proteoform's row, replacing the ~40s monolithic load.

Summary by CodeRabbit

New Features

- Persistent, efficient storage for sequence data enabling faster access and selective reads.
- Proteoform→scan mapping to power per-proteoform selections and per-scan views.

Refactor

- Unified scan-scoped data loading so proteoform mapping and sequence data attach consistently.
- Selection/filtering now drives per-scan and tag views from mapped proteoform entries for more accurate displays.

Selecting a proteoform now resolves its scan and filters the spectra, mass table, and tag table to that scan instead of shipping every scan's data to the browser. sequence_data is stored one row per proteoform (sequence_data.pq) and the Sequence View pushdown-loads only the selected proteoform's row, replacing the ~40s monolithic load.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai · 2026-05-28T13:29:50Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 869a53c9-f711-4e50-8c58-4bc7bcd324e7

📥 Commits

Reviewing files that changed from the base of the PR and between 0faee0a and b0bda5a.

📒 Files selected for processing (3)

src/parse/tag_resolution.py
src/parse/tnt.py
src/render/update.py

📝 Walkthrough

Walkthrough

This PR replaces generic sequence_data storage with a PyArrow Parquet-backed store, adds tag-space→proteoform and proteoform→scan mappings, persists sequence_data at parse time, and refactors initialization and runtime filtering to load per-proteoform entries for flashtnt.

Changes

PyArrow Sequence Data Persistence and Scan Filtering

| Layer / File(s) | Summary |
|---|---|
| Sequence Data Schema and Read/Write Utilities `src/render/sequence_data_store.py` | New module defines an explicit PyArrow schema for one-row-per-proteoform records with nested sequence, coverage, and fragment mass lists, plus modification structs. Provides normalization helpers for numpy scalars, table builders from in-memory mappings, dataset coercion utilities, and read functions for both single-entry filtering and full reconstruction. |
| Tag-space → Proteoform Mapping `src/parse/tag_resolution.py` | Adds `_split_ints` and `build_tagspace_to_proteoform_map(...)` implementing a greedy, strictly-increasing assignment from tag-space ProteoformIndex values to protein-space proteoform indices, using intersection (with union fallback) across tag indices. |
| Proteoform-to-Scan Mapping `src/render/scan_resolution.py` | Adds `build_proteoform_scan_map(...)` to construct a lookup mapping each proteoform index to scan ID and deconvolution row index by deduplicating/indexing the scan table and joining against protein data, omitting unmapped proteoforms. |
| Parse-Time Sequence Data Persistence `src/parse/tnt.py` | Imports sequence-data utilities and switches sequence_data persistence from generic store to Parquet: builds an Arrow table from sequence_data and writes it via `file_manager.parquet_sink(...)` with configurable row-group sizing. Also derives `tagspace_to_proteoform` and groups tag ranges by mapped proteoform indices. |
| Initialization Scan-Scoped Loading and Map Attachment `src/render/initialize.py` | Introduces `_attach_proteoform_scan_map` and `_load_scan_scoped` to fetch per-scan cached datasets and eagerly attach the proteoform→scan map for flashtnt. Refactors branches (`deconv_spectrum`, `combined_spectrum`, `anno_spectrum`, `mass_table`, `sequence_view`, `tag_table`) to use the new loader and stores the sequence_data dataset path in `additional_data['sequence_data_ds']`. |
| Runtime Per-Scan Filtering and Sequence Data Loading `src/render/update.py` | Extends `filter_data` with flashtnt-specific branches that use the proteoform scan map to filter `per_scan_data` and `tag_table` by deconvolution index and scan, and loads per-proteoform sequence entries via `load_entry(...)` instead of slicing a pre-loaded dataset. |
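The proteoform→scan lookup summarised in the table can be sketched in plain Python. This is an illustration under assumptions, not the code in `src/render/scan_resolution.py`: the function name, the input shapes, and the `deconv_index` key are hypothetical, and only the `scan` key is taken from the diff quoted later in the review.

```python
def build_proteoform_scan_map_sketch(scan_ids, proteoform_scans):
    """Map proteoform index -> {'scan': ..., 'deconv_index': ...}.

    scan_ids: scan ID per deconvolution row (may contain repeats).
    proteoform_scans: dict of proteoform index -> scan ID (or None).
    """
    # Deduplicate: keep the first row index seen for each scan ID.
    first_row = {}
    for row_index, scan in enumerate(scan_ids):
        first_row.setdefault(scan, row_index)

    scan_map = {}
    for proteoform_index, scan in proteoform_scans.items():
        row_index = first_row.get(scan)
        if row_index is None:
            continue  # unmapped proteoforms are omitted from the lookup
        scan_map[proteoform_index] = {"scan": scan, "deconv_index": row_index}
    return scan_map


scan_map = build_proteoform_scan_map_sketch(
    scan_ids=[101, 105, 105, 110],
    proteoform_scans={0: 105, 1: 110, 2: 999, 3: None},
)
```

Proteoforms 2 and 3 have no matching scan row, so they are absent from the result and a later `scan_map.get(...)` returns `None` for them.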

Possibly related PRs

OpenMS/FLASHApp#77: Implements the same flashtnt scan-scoped workflow—Parquet persistence of sequence_data, proteoform scan mapping, scan-scoped initialization loading, and per-proteoform lazy loading during filtering.

"I’m a rabbit in code, nibbling bytes with care,
Parquet tables snug, saved in tidy rows,
Maps that hop between proteoform and scan,
On selection I fetch just the entry that shows,
Data delivered quick — a carrot for pros!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: optimizing flashtnt by loading only the selected proteoform's scan-scoped data instead of all data. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/render/update.py`:
- Around line 175-183: The Tag Table branch is applying a proteoform_scan_map
filter for every tool even though proteoform_scan_map is only set for flashtnt;
change the branch so it only applies the scan-based filtering when running
flashtnt (e.g., check additional_data.get('tool') == 'flashtnt' or that
'proteoform_scan_map' exists and is non-empty) before using proteoform_scan_map,
selection_store.get('proteinIndex') and modifying data['tag_table'] so other
tools do not clear the table.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dfdfa9c4-34db-40bc-85c9-ce22e4bd3284

📥 Commits

Reviewing files that changed from the base of the PR and between b18b6d7 and 0faee0a.

📒 Files selected for processing (5)

src/parse/tnt.py
src/render/initialize.py
src/render/scan_resolution.py
src/render/sequence_data_store.py
src/render/update.py

coderabbitai · 2026-05-28T13:29:53Z

+    elif component == 'Tag Table':
+        # flashtnt-only panel: ship only the selected proteoform's scan's tags.
+        scan_map = additional_data.get('proteoform_scan_map', {})
+        entry = scan_map.get(selection_store.get('proteinIndex'))
+        if entry is None:
+            data['tag_table'] = data['tag_table'].iloc[0:0, :]
+        else:
+            tags = data['tag_table']
+            data['tag_table'] = tags[tags['Scan'] == entry['scan']]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restrict this tag-table filter to flashtnt.

This branch currently runs for every Tag Table render, but proteoform_scan_map is only populated in src/render/initialize.py when tool == 'flashtnt'. For other tools, entry is always None, so the table gets cleared on every update.

Suggested fix

-    elif component == 'Tag Table':
+    elif (component == 'Tag Table') and (tool == 'flashtnt'):
         # flashtnt-only panel: ship only the selected proteoform's scan's tags.
         scan_map = additional_data.get('proteoform_scan_map', {})
         entry = scan_map.get(selection_store.get('proteinIndex'))
         if entry is None:
             data['tag_table'] = data['tag_table'].iloc[0:0, :]
         else:
             tags = data['tag_table']
             data['tag_table'] = tags[tags['Scan'] == entry['scan']]
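The effect of the guard can be illustrated with a self-contained sketch. Plain lists of dicts stand in for the DataFrame here, and the function name and the `other_tool` value are hypothetical; the point is only that a non-flashtnt tool with no scan map should get its table back unchanged rather than emptied.

```python
def filter_tag_table_sketch(tag_rows, tool, scan_map, protein_index):
    """Return the tag rows to ship for the current selection."""
    if tool != "flashtnt":
        # No proteoform->scan map exists for other tools: leave the table as is.
        return tag_rows
    entry = scan_map.get(protein_index)
    if entry is None:
        return []  # flashtnt, but the selected proteoform has no mapped scan
    return [row for row in tag_rows if row["Scan"] == entry["scan"]]


tags = [{"Scan": 5, "Tag": "PEP"}, {"Scan": 7, "Tag": "TIDE"}]
scan_map = {0: {"scan": 7}}

selected = filter_tag_table_sketch(tags, "flashtnt", scan_map, 0)
unmapped = filter_tag_table_sketch(tags, "flashtnt", scan_map, 3)
untouched = filter_tag_table_sketch(tags, "other_tool", {}, 0)
```

Without the tool check, the third call would take the `entry is None` path and return an empty table, which is the behaviour the review comment flags.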


The Tag Table and on-spectrum tag overlay came up empty on large datasets. Tags are scan (spectrum) data, so scope the feed to the selected proteoform's scan and stamp ProteinIndex so the frontend's tag.ProteinIndex===selectedProteinIndex filter passes the scan's tags through to the table and the overlay.

Also correct per-proteoform coverage: tag_dfs.ProteinIndex is FLASHTagger's tag-space index, which diverges from protein_dfs.index on large runs, so the coverage loop associated the wrong tags. Map tag-space -> protein-space via protein.tsv TagIndices and group coverage by protein-space. The stored tag_dfs is unchanged, so the golden regression is unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
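One way to picture the tag-space -> protein-space mapping is the sketch below. It is a guess at the shape of the algorithm based only on the walkthrough's wording (greedy, strictly increasing, intersection with union fallback); the function names, input shapes, `;` separator, and tie-breaking are all assumptions and may differ from `build_tagspace_to_proteoform_map(...)`.

```python
def split_ints(text):
    """Parse a ';'-separated string such as '3;7;9' into a list of ints."""
    return [int(part) for part in str(text).split(";") if part.strip()]


def build_tagspace_map_sketch(protein_tag_indices, tag_proteoforms):
    """Map tag-space proteoform index -> protein-space index.

    protein_tag_indices: per protein-space row, its TagIndices string.
    tag_proteoforms: tag index -> set of tag-space proteoform indices.
    """
    mapping = {}
    last_assigned = -1
    for protein_index, tag_text in enumerate(protein_tag_indices):
        sets = [
            tag_proteoforms[t] for t in split_ints(tag_text) if t in tag_proteoforms
        ]
        if not sets:
            continue
        # Prefer indices shared by every tag; fall back to any index seen.
        candidates = set.intersection(*sets) or set.union(*sets)
        # Greedy step: take the smallest candidate beyond the last assignment.
        larger = sorted(c for c in candidates if c > last_assigned)
        if larger:
            mapping[larger[0]] = protein_index
            last_assigned = larger[0]
    return mapping


mapping = build_tagspace_map_sketch(
    protein_tag_indices=["0;1", "2", "3;4"],
    tag_proteoforms={0: {0, 2}, 1: {0}, 2: {2}, 3: {5}, 4: {6}},
)
```

In this toy input the third protein's tags share no tag-space index, so the union fallback supplies the candidates and the smallest unused one is assigned.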

coderabbitai Bot reviewed May 28, 2026


t0mdavid-m merged commit b4b9d48 into develop May 28, 2026
4 of 5 checks passed

coderabbitai Bot mentioned this pull request Jun 2, 2026

Migrate FLASHApp visualizations to OpenMS-Insight components #92

Open

t0mdavid-m merged 2 commits into develop from speedup/flashtnt-viewer-scoped-loading
