Reformatting: Annotate subworkflow, OLGA module, Patient subworkflow #75
- Parallelize concatenation of files to avoid loading all of them in memory; might remove RESOLVE_SAMPLESHEET
- Improve calculation of log10_pgen, joining of sample repertoires + OLGA pgen values
- Simplify calculation of min/max OLGA log10_pgen values
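As an illustration of the first and last points above, here is a minimal sketch of streaming several tables row by row while tracking the running min/max of log10(pgen), so that no file is fully loaded into memory. The function names (`concat_tsv`, `log10_pgen_range`) and the column names `cdr3` and `pgen` are hypothetical and are not taken from this PR; the pipeline itself does the concatenation in Nextflow.

```python
import csv
import io
import math
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def concat_tsv(streams: Iterable[TextIO]) -> Iterator[Dict[str, str]]:
    """Yield rows from several TSV streams one at a time (never a full table)."""
    for stream in streams:
        yield from csv.DictReader(stream, delimiter="\t")


def log10_pgen_range(rows: Iterable[Dict[str, str]],
                     column: str = "pgen") -> Tuple[float, float]:
    """Return (min, max) of log10(pgen) in a single pass.

    Rows with pgen <= 0 are skipped because log10 is undefined there.
    If no valid row is seen, the result is (inf, -inf).
    """
    lo, hi = math.inf, -math.inf
    for row in rows:
        pgen = float(row[column])
        if pgen <= 0:
            continue
        value = math.log10(pgen)
        lo = min(lo, value)
        hi = max(hi, value)
    return lo, hi


if __name__ == "__main__":
    # Hypothetical in-memory stand-ins for two per-sample files.
    first = io.StringIO("cdr3\tpgen\nCASSL\t1e-6\nCASSP\t0\n")
    second = io.StringIO("cdr3\tpgen\nCASSQ\t1e-9\n")
    print(log10_pgen_range(concat_tsv([first, second])))
```

Because both functions work on iterators, peak memory stays at roughly one row regardless of how many samples are concatenated.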
- Add patient workflow to run on samples grouped by patient
- Move modules such as compare_calc, giana, gliph to patient to reduce scope of algorithms for large datasets
- Refactored compare/patient calc to vectorize operations

Further cleaning up of code and updating documentation will be required.
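To make the patient scoping concrete, the sketch below groups records by patient and builds one overlap matrix per patient, so pairwise comparisons only happen among samples of the same patient. It is a plain standard-library illustration and is not vectorized, unlike the refactored script described above. The record layout `(patient, sample, cdr3)`, the function names, and the choice of "number of shared unique CDR3 sequences" as the overlap metric are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def group_by_patient(records):
    """Group (patient, sample, cdr3) tuples into {patient: {sample: set(cdr3)}}."""
    grouped = defaultdict(lambda: defaultdict(set))
    for patient, sample, cdr3 in records:
        grouped[patient][sample].add(cdr3)
    return grouped


def overlap_matrix(samples):
    """Return a symmetric {sample: {sample: shared_count}} matrix.

    The diagonal holds the number of unique CDR3 sequences per sample.
    """
    names = sorted(samples)
    matrix = {a: {b: 0 for b in names} for a in names}
    for name in names:
        matrix[name][name] = len(samples[name])
    for a, b in combinations(names, 2):
        shared = len(samples[a] & samples[b])
        matrix[a][b] = matrix[b][a] = shared
    return matrix


if __name__ == "__main__":
    # Hypothetical toy input: two patients, three samples.
    records = [
        ("P1", "S1", "CASSA"), ("P1", "S1", "CASSB"),
        ("P1", "S2", "CASSB"), ("P1", "S2", "CASSC"),
        ("P2", "S3", "CASSA"),
    ]
    for patient, samples in group_by_patient(records).items():
        print(patient, overlap_matrix(samples))
```

With n samples split evenly across p patients, the number of pairwise comparisons drops from roughly n²/2 to roughly n²/(2p), which is the scope reduction the description refers to.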
Unit Test Results: 10 tests, 10 ✅, 2m 49s ⏱️. Results for commit 85cd394. ♻️ This comment has been updated with latest results.
Pull request overview
This PR adds a new patient-level analysis stage to the Nextflow pipeline and refactors the annotate → OLGA plumbing so downstream stages (sample/compare/patient) can reuse shared intermediate artifacts.
Changes:
- Introduce a `patient` workflow level (schema + Cirro form + new `PATIENT` subworkflow and patient modules/scripts).
- Refactor `ANNOTATE` to produce per-sample processed CDR3 files and a concatenated/sorted CDR3 table, and to emit OLGA-derived stats for reuse downstream.
- Rework OLGA sample merge + histogram inputs to use shared OLGA stats and simplify sample-level OLGA steps.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| workflows/tcrtoolkit.nf | Adds patient stage wiring and guards; updates call graph (ANNOTATE/SAMPLE/PATIENT/COMPARE). |
| subworkflows/local/annotate.nf | Refactors annotate subworkflow to use ANNOTATE_PROCESS, concatenation via collectFile, and emits olga_stats. |
| subworkflows/local/sample.nf | Updates SAMPLE inputs and OLGA calls to use shared olga_stats; removes deprecated OLGA max-writing. |
| subworkflows/local/patient.nf | New patient-level subworkflow: group samples by meta.patient, concatenate, run patient metrics + clustering tools. |
| subworkflows/local/compare.nf | Simplifies compare stage to TCR sharing + OLGA merge only (removes GIANA/GLIPH2 here). |
| modules/local/annotate/main.nf | Adds ANNOTATE_PROCESS module to standardize per-sample CDR3 extraction. |
| modules/local/olga/main.nf | Adds log10 computation earlier, emits OLGA stats, refactors sample merge to a streaming join, and updates histogram calc inputs. |
| modules/local/patient/main.nf | New processes for patient concatenation and patient overlap matrix calculation. |
| bin/patient_calc.py | New vectorized patient overlap calculator writing per-patient matrices. |
| modules/local/compare/gliph2.nf | Makes GLIPH2 outputs patient-scoped (input tuple includes patient). |
| modules/local/compare/giana.nf | Makes GIANA outputs patient-scoped and changes logging/output set. |
| nextflow_schema.json | Extends workflow_level validation to allow patient. |
| nextflow.config | Changes default olga_chunk_length. |
| modules/local/sample/sample_calc.nf | Removes the stub: block for SAMPLE_CALC. |
| .cirro/process-input.json | Wires new patient_lvl form value into params. |
| .cirro/process-form.json | Adds patient toggle; updates olga_chunk_length type/default. |
| .cirro/preprocess.py | Adds patient flag into workflow_level construction. |
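
The "streaming join" mentioned for `modules/local/olga/main.nf` can be pictured as a merge join over two inputs sorted on the same key. The following is only a sketch under that assumption: the function name `merge_join`, the tuple layouts, and the requirement that the OLGA side has unique keys are hypothetical and do not describe the module's actual implementation.

```python
from typing import Iterable, Iterator, Tuple


def merge_join(
    sample_rows: Iterable[Tuple[str, int]],
    pgen_rows: Iterable[Tuple[str, float]],
) -> Iterator[Tuple[str, int, float]]:
    """Inner-join two streams that are both sorted by CDR3 sequence.

    sample_rows: (cdr3, count), duplicates of a cdr3 allowed.
    pgen_rows:   (cdr3, log10_pgen), each cdr3 at most once.
    Only one row from each side is held in memory at a time.
    """
    pgen_iter = iter(pgen_rows)
    current = next(pgen_iter, None)
    for cdr3, count in sample_rows:
        # Advance the pgen side until it catches up with the sample key.
        while current is not None and current[0] < cdr3:
            current = next(pgen_iter, None)
        if current is not None and current[0] == cdr3:
            yield cdr3, count, current[1]


if __name__ == "__main__":
    # Hypothetical sorted inputs.
    sample = [("CASSA", 3), ("CASSC", 1), ("CASSC", 2), ("CASSD", 5)]
    pgen = [("CASSA", -7.0), ("CASSB", -8.0), ("CASSC", -9.5)]
    for row in merge_join(sample, pgen):
        print(row)
```

This only works when both inputs share the same sort order, which is one reason a concatenated and sorted CDR3 table is a convenient shared artifact.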
    // Running patient analysis
    if (levels.contains('patient')) {
        PATIENT( ANNOTATE.out.processed_samples )
    }
New patient workflow level is introduced (new subworkflow + new modules and significant OLGA plumbing changes), but the nf-test suite currently only asserts a minimal sample/compare run succeeds. Consider adding at least one nf-test case that runs with workflow_level=patient (and optionally sample,patient) to catch the ANNOTATE dependency and verify expected patient outputs are produced.
Will add nf-test later
    @@ -17,12 +17,4 @@ process SAMPLE_CALC {
        """
        sample_calc.py -s '${sample_meta.sample}' -c ${count_table}
        """
This repo uses stub: blocks in several modules (e.g. modules/local/sample/sample_plot.nf, modules/local/compare/compare_plot.nf). Removing the stub: section from SAMPLE_CALC makes -stub-run / lightweight test runs inconsistent and may break workflows that rely on stubbing heavy steps. Consider restoring a minimal stub that creates the expected output files for this process.
        """

        stub:
        """
        touch sample_stats_${sample_meta.sample}.csv
        touch v_family_${sample_meta.sample}.csv
        touch d_family_${sample_meta.sample}.csv
        touch j_family_${sample_meta.sample}.csv
        """
Will introduce stubs later