perf(flashtnt): load only the selected proteoform's scan-scoped data #77
Selecting a proteoform now resolves its scan and filters the spectra, mass table, and tag table to that scan instead of shipping every scan's data to the browser. sequence_data is stored one row per proteoform (sequence_data.pq) and the Sequence View pushdown-loads only the selected proteoform's row, replacing the ~40s monolithic load.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Caution: Review failed. The pull request is closed.
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
📝 Walkthrough
This PR replaces generic sequence_data storage with a PyArrow Parquet-backed store, adds tag-space→proteoform and proteoform→scan mappings, persists sequence_data at parse time, and refactors initialization and runtime filtering to load per-proteoform entries for flashtnt.
Changes
PyArrow Sequence Data Persistence and Scan Filtering
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/render/update.py`:
- Around line 175-183: The Tag Table branch is applying a proteoform_scan_map
filter for every tool even though proteoform_scan_map is only set for flashtnt;
change the branch so it only applies the scan-based filtering when running
flashtnt (e.g., check additional_data.get('tool') == 'flashtnt' or that
'proteoform_scan_map' exists and is non-empty) before using proteoform_scan_map,
selection_store.get('proteinIndex') and modifying data['tag_table'] so other
tools do not clear the table.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dfdfa9c4-34db-40bc-85c9-ce22e4bd3284
📒 Files selected for processing (5)
- src/parse/tnt.py
- src/render/initialize.py
- src/render/scan_resolution.py
- src/render/sequence_data_store.py
- src/render/update.py
    elif component == 'Tag Table':
        # flashtnt-only panel: ship only the selected proteoform's scan's tags.
        scan_map = additional_data.get('proteoform_scan_map', {})
        entry = scan_map.get(selection_store.get('proteinIndex'))
        if entry is None:
            data['tag_table'] = data['tag_table'].iloc[0:0, :]
        else:
            tags = data['tag_table']
            data['tag_table'] = tags[tags['Scan'] == entry['scan']]
Restrict this tag-table filter to flashtnt.
This branch currently runs for every Tag Table render, but proteoform_scan_map is only populated in src/render/initialize.py when tool == 'flashtnt'. For other tools, entry is always None, so the table gets cleared on every update.
Suggested fix
-    elif component == 'Tag Table':
+    elif (component == 'Tag Table') and (tool == 'flashtnt'):
         # flashtnt-only panel: ship only the selected proteoform's scan's tags.
         scan_map = additional_data.get('proteoform_scan_map', {})
         entry = scan_map.get(selection_store.get('proteinIndex'))
         if entry is None:
             data['tag_table'] = data['tag_table'].iloc[0:0, :]
         else:
             tags = data['tag_table']
             data['tag_table'] = tags[tags['Scan'] == entry['scan']]
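The behavior the review asks for can be checked in isolation. The following is a minimal sketch, not the code in src/render/update.py: it uses plain lists of dicts as a stand-in for the pandas DataFrame, and the function name `filter_tag_table`, its parameters, and the tool name `other_tool` are hypothetical. It shows the three cases: a non-flashtnt tool leaves the table untouched, flashtnt with no resolved scan clears it, and flashtnt with a resolved scan keeps only that scan's tags.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def filter_tag_table(
    tag_rows: List[Row],
    scan_map: Dict[int, Row],
    protein_index: Optional[int],
    tool: str,
) -> List[Row]:
    """Scope the tag table to the selected proteoform's scan (flashtnt only)."""
    # Guard: the scan map is only populated for flashtnt, so any other tool
    # must get its table back unchanged instead of being cleared.
    if tool != "flashtnt":
        return tag_rows
    entry = scan_map.get(protein_index)
    if entry is None:
        # No scan resolved for this selection: ship an empty table.
        return []
    return [row for row in tag_rows if row["Scan"] == entry["scan"]]


# Hypothetical sample data for illustration only.
tags = [
    {"Scan": 101, "Tag": "PEPT"},
    {"Scan": 205, "Tag": "EPTI"},
    {"Scan": 205, "Tag": "TIDE"},
]
scan_map = {1: {"scan": 205}}

flashtnt_hit = filter_tag_table(tags, scan_map, 1, "flashtnt")
flashtnt_miss = filter_tag_table(tags, scan_map, 5, "flashtnt")
other_tool = filter_tag_table(tags, {}, 1, "other_tool")
```

Putting the tool check first keeps the empty-map case for other tools from ever reaching the `entry is None` branch, which is the failure mode described in the comment above.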
The Tag Table and on-spectrum tag overlay came up empty on large datasets. Tags are scan (spectrum) data, so scope the feed to the selected proteoform's scan and stamp ProteinIndex so the frontend's tag.ProteinIndex===selectedProteinIndex filter passes the scan's tags through to the table and the overlay.

Also correct per-proteoform coverage: tag_dfs.ProteinIndex is FLASHTagger's tag-space index, which diverges from protein_dfs.index on large runs, so the coverage loop associated the wrong tags. Map tag-space -> protein-space via protein.tsv TagIndices and group coverage by protein-space. The stored tag_dfs is unchanged, so the golden regression is unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
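The coverage fix can be illustrated with a small sketch of grouping tags by protein-space index. It makes several assumptions that the commit message does not confirm: that TagIndices is a semicolon-separated string of tag row positions, that a protein row's position is its protein-space index, and that rows are plain dicts rather than DataFrames. The helper names `parse_tag_indices` and `group_tags_by_protein` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def parse_tag_indices(cell: str) -> List[int]:
    """Parse a TagIndices cell (assumed format: '0;2;5') into tag row positions."""
    return [int(token) for token in cell.split(";") if token.strip()]


def group_tags_by_protein(protein_rows: List[Row], tag_rows: List[Row]) -> Dict[int, List[Row]]:
    """Group tags by protein-space index using each protein's TagIndices.

    The tag's own ProteinIndex (tag-space) is deliberately ignored here,
    since it may not line up with the protein table's index.
    """
    grouped: Dict[int, List[Row]] = defaultdict(list)
    for protein_index, protein in enumerate(protein_rows):
        for tag_index in parse_tag_indices(protein["TagIndices"]):
            grouped[protein_index].append(tag_rows[tag_index])
    return dict(grouped)


# Hypothetical sample data: tag-space ProteinIndex values (7, 9) do not
# match the protein-space positions (0, 1).
protein_rows = [{"TagIndices": "0;2"}, {"TagIndices": "1"}, {"TagIndices": ""}]
tag_rows = [
    {"ProteinIndex": 7, "Tag": "PEPT"},
    {"ProteinIndex": 9, "Tag": "EPTI"},
    {"ProteinIndex": 7, "Tag": "TIDE"},
]

coverage_groups = group_tags_by_protein(protein_rows, tag_rows)
```

Because the grouping builds a new structure and never rewrites `tag_rows`, the input tag data stays as it was, in line with the note that the stored tag_dfs is unchanged.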