Update and Major fixes to the Model and Pathway Loss by BenjaminIsaac0111 · Pull Request #5 · BenjaminIsaac0111/SpatialTranscriptFormer

BenjaminIsaac0111 · 2026-03-04T01:40:17Z

Double-checking that the model and pathway loss ingest the data and produce predictions correctly. I realised some things were not calculated correctly, such as the auxiliary loss for the pathway.

I noted that every member gene in a pathway (even lowly expressed transcription factors) should contribute equally to the spatial activation signature, preventing high-count housekeeping genes from dominating pathway patterns.

I also removed the data-driven pathway learning module. I might get back to this at some point, as it could be an interesting feature to implement, but for now I am not 100% sure about it, at least not without more contextual reading on how pathways might interact. It would be good to get an expert's input on this.

Updated the documentation to reflect these changes and made the preset model a little bit easier to run from CLI.

…tion

- Refactor train.py by extracting logic into training submodules:
  - arguments.py: CLI parameter definitions
  - builder.py: Model and criterion setup
  - checkpoint.py: Robust saving and loading logic
- Fix learning rate plateau by replacing disjoint schedulers with SequentialLR to properly chain linear warmup and cosine decay phases.
- Simplify SpatialTranscriptFormer architecture:
  - Remove redundant log_temperature parameter to reduce gradient variance.
  - Implement L1-normalization for MSigDB pathway weight initialization to prevent exponential prediction explosion at startup.
- Enhance load_checkpoint with robust error handling for EOFError (corrupted files) and ValueError (architecture/optimizer mismatches) to ensure graceful fallbacks.
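As a rough illustration of the scheduler fix described in this commit, the sketch below chains a linear warmup into a cosine decay with PyTorch's `SequentialLR`. The step counts, learning rate, optimizer, and the toy model are assumptions made for the example; the PR does not state the actual configuration used in the training code.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

# Hypothetical values for illustration only; not taken from the PR.
warmup_steps, total_steps, base_lr = 5, 20, 1e-3

model = torch.nn.Linear(4, 2)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr)

# Phase 1: ramp the learning rate linearly from 10% of base_lr up to base_lr.
warmup = LinearLR(optimizer, start_factor=0.1, end_factor=1.0, total_iters=warmup_steps)
# Phase 2: cosine decay over the remaining steps.
cosine = CosineAnnealingLR(optimizer, T_max=total_steps - warmup_steps, eta_min=0.0)
# SequentialLR hands over from warmup to cosine at the milestone, so only one
# scheduler is active at a time and a single scheduler.step() call drives both phases.
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[warmup_steps])

lrs = []
for _ in range(total_steps):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # no gradients here; a real loop would call backward() first
    scheduler.step()
```

Plotting or printing `lrs` should show a ramp over the first `warmup_steps` entries followed by a smooth decay, with no flat segment between the two phases.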

- Fix test_checkpoint.py by adding required schedulers argument to save_checkpoint and load_checkpoint calls.
- Fix test_checkpoints.py by updating pathway initialization assembly assertion to expect L1-normalized weights instead of raw binary values.
- Fix test_spatial_interaction.py by removing obsolete test_temperature_scaling (log_temperature removed from model).
- Fix test_pathways.py by providing mock args to _compute_pathway_truth to satisfy new visualization parameters.
- Add test_pathway_stability.py to verify numerical stability and gradient flow of the final stabilized pathway-informed architecture.
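To make the difference between raw binary and L1-normalized pathway weights concrete, here is a minimal NumPy sketch. It assumes the membership matrix is laid out as pathways by genes and that normalization is applied per pathway row; the function name `l1_normalize_rows` and the toy matrix are hypothetical and are not the repository's code.

```python
import numpy as np

def l1_normalize_rows(membership, eps=1e-8):
    """Scale each pathway row so that its absolute weights sum to 1."""
    membership = np.asarray(membership, dtype=np.float64)
    row_sums = np.abs(membership).sum(axis=1, keepdims=True)
    # eps guards against division by zero for a pathway with no member genes.
    return membership / np.maximum(row_sums, eps)

# Toy binary membership: 2 pathways x 4 genes (hypothetical data).
membership = np.array([[1, 1, 0, 0],
                       [1, 1, 1, 1]])
weights = l1_normalize_rows(membership)
```

With raw binary values, the magnitude of a pathway's initial output grows with the number of member genes; after this normalization every non-empty pathway starts from weights that sum to 1, regardless of its size.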

Implemented spatial Z-score normalization and mean-aggregation for biological pathway ground-truth calculation. This ensures that every member gene in a pathway (even lowly-expressed transcription factors) contributes equally to the spatial activation signature, preventing high-count housekeeping genes from dominating the pathway patterns.

Changes:
- Updated AuxiliaryPathwayLoss to spatially standardize genes before projecting onto the pathway matrix.
- Handled normalization across batch (patch-level) and spatial (whole-slide) dimensions with proper masking.
- Switched from raw summation to mean-aggregation (averaging by pathway member counts).
- Synchronized visualization.py ground-truth logic with the new objective.
- Fixed mock tests in test_losses.py to match the normalized targets.

Variance analysis on HEST data indicated raw gene variance ratios exceeding 300,000x, necessitating this standardization for biologically relevant pathway supervision.
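The idea behind this commit can be sketched in a few lines of NumPy: standardize each gene across valid spots, then average the Z-scores of each pathway's member genes. This is only an illustration under assumed shapes (spots by genes for expression, pathways by genes for membership, a boolean mask of valid spots); the function name `pathway_targets` and the toy data are hypothetical, and the actual `AuxiliaryPathwayLoss` may differ in detail.

```python
import numpy as np

def pathway_targets(expr, membership, mask=None, eps=1e-6):
    """Z-score each gene over valid spots, then mean-aggregate per pathway.

    expr:       (n_spots, n_genes) expression values
    membership: (n_pathways, n_genes) binary pathway membership
    mask:       (n_spots,) bool, True for real spots, False for padding
    """
    expr = np.asarray(expr, dtype=np.float64)
    membership = np.asarray(membership, dtype=np.float64)
    if mask is None:
        mask = np.ones(expr.shape[0], dtype=bool)
    m = np.asarray(mask, dtype=np.float64)[:, None]
    n_valid = max(m.sum(), 1.0)
    # Per-gene mean and variance computed over valid spots only.
    mean = (expr * m).sum(axis=0) / n_valid
    var = (((expr - mean) ** 2) * m).sum(axis=0) / n_valid
    z = (expr - mean) / np.sqrt(var + eps) * m  # padded spots are zeroed
    # Mean-aggregation: divide the summed Z-scores by the member-gene count.
    counts = np.maximum(membership.sum(axis=1), 1.0)
    return (z @ membership.T) / counts

# Toy data: one lowly expressed gene and one high-count gene with the same
# spatial pattern, both in a single pathway; the last spot is padding.
expr = np.array([[0.0, 0.0], [1.0, 1000.0], [2.0, 2000.0], [9.0, 9.0]])
membership = np.array([[1, 1]])
mask = np.array([True, True, True, False])
targets = pathway_targets(expr, membership, mask)
raw_sum = expr[:3] @ membership.T  # the raw-summation target, for comparison
```

In `raw_sum` the high-count gene accounts for almost all of the signal, whereas in `targets` both genes contribute the same standardized pattern and the padded spot stays at zero.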

…ation

Deprecates the experimental data-driven pathway discovery in favor of strictly biologically-prior-driven interpretability. Updates the auxiliary pathway loss to use spatial Z-score normalization, ensuring lowly-expressed transcription factors contribute equally to the spatial objective.

- Remove `--sparsity-lambda` and associated L1 regularization logic.
- Implement spatial Z-score normalization in `AuxiliaryPathwayLoss`.
- Synchronize visualization ground truth calculation with the new math.
- Add `--plot-pathways-list` for dynamic user control over heatmaps.
- Update plot labels to reflect Z-scored spatial patterns.
- Cleanup: Delete LATENT_DISCOVERY.md and scrub legacy doc references.

Ref: docs/PATHWAY_MAPPING.md, src/spatial_transcript_former/training/losses.py

BenjaminIsaac0111 added 10 commits February 27, 2026 12:56

feat: add prototype visualization and organ filtering

c24d59b

fix: correct spatial-pe argument handling

cd0f00c

fix: sync num_genes with global_genes.json

3db1987

fix: global sync of num_genes across model and data

dd03c61

- Updated the monitoring dashboard with a more modern look and refactored for better maintainability.

63d0912

- Updated visualisation test as this was failing with the new pathway selection logic.

c44b954

BenjaminIsaac0111 changed the title ~~UpdateUpdate and Major fixes to the Model and Pathway Loss~~ Update and Major fixes to the Model and Pathway Loss Mar 4, 2026

BenjaminIsaac0111 self-assigned this Mar 4, 2026

BenjaminIsaac0111 added the bug, documentation, and enhancement labels Mar 4, 2026

BenjaminIsaac0111 merged commit ae68c5f into main Mar 4, 2026
2 checks passed

BenjaminIsaac0111 deleted the feature/prototype-results branch March 4, 2026 01:47

BenjaminIsaac0111 merged 10 commits into main from feature/prototype-results
