Experimental tool that aligns sequencing reads to set of haplotypes / templates #8305

ilyasoifer · 2023-05-02T14:04:11Z

This tool is useful for aligning reads to a large set of templates that are very similar in sequence. In this setting, alignment is close to the problem that HaplotypeCaller solves: calculating the likelihood of each read against a set of closely related haplotypes.
The tool outputs a read x haplotype matrix for a set of reads and haplotypes, computed with either FlowPairHMM or FlowBasedAlignmentEngine.
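A minimal sketch of the read x haplotype matrix described above, in plain Java. The scorer argument stands in for FlowPairHMM or FlowBasedAlignmentEngine. The class name `ReadHaplotypeMatrix`, its methods, and the TSV layout (a header row of haplotype names, then one row per read) are hypothetical and are not taken from the GATK source.

```java
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch: score every read against every haplotype and print the matrix as TSV. */
final class ReadHaplotypeMatrix {
    private ReadHaplotypeMatrix() {}

    /** Returns scores[r][h] = scorer(read r, haplotype h); the scorer stands in for the aligner. */
    static double[][] score(List<String> reads, List<String> haplotypes,
                            ToDoubleBiFunction<String, String> scorer) {
        double[][] scores = new double[reads.size()][haplotypes.size()];
        for (int r = 0; r < reads.size(); r++) {
            for (int h = 0; h < haplotypes.size(); h++) {
                scores[r][h] = scorer.applyAsDouble(reads.get(r), haplotypes.get(h));
            }
        }
        return scores;
    }

    /** Header row of haplotype names, then one tab-separated row of scores per read. */
    static String toTsv(List<String> readNames, List<String> haplotypeNames, double[][] scores) {
        StringBuilder out = new StringBuilder();
        StringJoiner header = new StringJoiner("\t");
        header.add("read");
        haplotypeNames.forEach(header::add);
        out.append(header).append('\n');
        for (int r = 0; r < readNames.size(); r++) {
            StringJoiner row = new StringJoiner("\t");
            row.add(readNames.get(r));
            for (double s : scores[r]) {
                // Locale.ROOT keeps '.' as the decimal separator regardless of system locale
                row.add(String.format(Locale.ROOT, "%.4f", s));
            }
            out.append(row).append('\n');
        }
        return out.toString();
    }
}
```

Passing a different `ToDoubleBiFunction` changes the scoring without touching the output code, which is one way to keep the aligner choice separate from formatting.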

meganshand · 2023-05-03T14:38:39Z

...roadinstitute/hellbender/tools/walkers/featuremapping/FlowPairHMMAlignReadsToHaplotypes.java

+ *    FlowPairHMM and FlowBasedAlignment (FBA), but can be easily extended.
+ *    At present, there are two output formats that can be specified using parameter --output-format: extended and concise.
+ *    The extended format contains a readxhaplotype matrix that shows alignment score of each read versus each haplotype.
+ *    Condensed format will following columns (not nessarily in this order) for each processed read:


Suggested change

- * Condensed format will following columns (not nessarily in this order) for each processed read:
+ * Condensed format contains the following columns (not necessarily in this order) for each processed read:

Do you mean the columns are in any order? It looks like they're in a fixed order from the writer.
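For the concise format, the `--output-format` doc string describes the per-read values as the best haplotype plus its score differences from the next-best and the reference haplotype. A hedged sketch of that computation, assuming higher scores are better (e.g. log-likelihoods) and that the caller knows which column holds the reference haplotype. `ConciseRow` and `ConciseSummary` are hypothetical names; the tool's actual columns and their order come from its writer.

```java
/** Hypothetical per-read summary for one concise output row. */
record ConciseRow(int bestHaplotype, double diffFromNextBest, double diffFromReference) {}

final class ConciseSummary {
    private ConciseSummary() {}

    /**
     * Assumes higher scores are better and referenceIndex is the reference haplotype column.
     * diffFromNextBest is NaN when there is only one haplotype.
     */
    static ConciseRow summarize(double[] readScores, int referenceIndex) {
        if (readScores.length == 0) {
            throw new IllegalArgumentException("no haplotypes");
        }
        // Best haplotype: first index holding the maximum score
        int best = 0;
        for (int h = 1; h < readScores.length; h++) {
            if (readScores[h] > readScores[best]) best = h;
        }
        // Next best: highest score among all other haplotypes
        double nextBest = Double.NEGATIVE_INFINITY;
        for (int h = 0; h < readScores.length; h++) {
            if (h != best && readScores[h] > nextBest) nextBest = readScores[h];
        }
        double diffNext = readScores.length > 1 ? readScores[best] - nextBest : Double.NaN;
        double diffRef = readScores[best] - readScores[referenceIndex];
        return new ConciseRow(best, diffNext, diffRef);
    }
}
```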

meganshand · 2023-05-03T14:39:31Z

...roadinstitute/hellbender/tools/walkers/featuremapping/FlowPairHMMAlignReadsToHaplotypes.java

+ *
+ * <h3>Usage examples</h3>
+ * <pre>
+*             java -Xms3000m -jar ~{gitc_path}/GATK_ultima.jar FlowPairHMMAlignReadsToHaplotypes \


Suggested change

- * java -Xms3000m -jar ~{gitc_path}/GATK_ultima.jar FlowPairHMMAlignReadsToHaplotypes \
+ * gatk FlowPairHMMAlignReadsToHaplotypes \

meganshand · 2023-05-03T14:40:24Z

...roadinstitute/hellbender/tools/walkers/featuremapping/FlowPairHMMAlignReadsToHaplotypes.java

+ * <pre>
+*             java -Xms3000m -jar ~{gitc_path}/GATK_ultima.jar FlowPairHMMAlignReadsToHaplotypes \
+ *            -H ~{haplotype_list} -O ~{base_file_name}.matches.tsv \
+ *            -I ~{input_bam} "--flow-use-t0-tag -E FBA \


Suggested change

- * -I ~{input_bam} "--flow-use-t0-tag -E FBA \
+ * -I ~{input_bam} --flow-use-t0-tag -E FBA \

meganshand · 2023-05-03T14:40:33Z

...roadinstitute/hellbender/tools/walkers/featuremapping/FlowPairHMMAlignReadsToHaplotypes.java

+ *            -H ~{haplotype_list} -O ~{base_file_name}.matches.tsv \
+ *            -I ~{input_bam} "--flow-use-t0-tag -E FBA \
+ *            --flow-fill-empty-bins-value 0.00001 --flow-probability-threshold 0.00001 \
+ *            --flow-likelihood-optimized-comp"


Suggested change

- * --flow-likelihood-optimized-comp"
+ * --flow-likelihood-optimized-comp

meganshand · 2023-05-03T14:42:28Z

...roadinstitute/hellbender/tools/walkers/featuremapping/FlowPairHMMAlignReadsToHaplotypes.java

+    private static final Logger logger = LogManager.getLogger(FlowPairHMMAlignReadsToHaplotypes.class);
+
+    @Argument(fullName = "haplotypes", shortName = "H", doc="Fasta file with haplotypes")
+    public GATKPath haplotypesFa;


These can all be public static final

Java does not like that: it reports that the variable may not have been initialized.
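The reason these fields cannot be `static final`: a `final` field must be definitely assigned in its initializer (or a static initializer, for `static`), but command-line values are only known after the tool object exists, and the parser writes them into each instance afterwards. A toy illustration of that pattern using reflection follows. `CliArg`, `ToyTool`, and `ToyArgParser` are hypothetical and are a simplified stand-in, not GATK's actual argument-parsing code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

/** Hypothetical stand-in for a command-line argument annotation. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface CliArg {
    String fullName();
}

/** Toy tool: argument fields are mutable instance fields so the parser can fill them in later. */
class ToyTool {
    @CliArg(fullName = "haplotypes")
    public String haplotypesFa;              // assigned after construction, so it cannot be final

    @CliArg(fullName = "output-format")
    public String outputFormat = "expanded"; // default survives if the flag is absent
}

final class ToyArgParser {
    private ToyArgParser() {}

    /** Copies parsed flag values into the matching annotated fields via reflection. */
    static void inject(Object tool, Map<String, String> flags) {
        for (Field field : tool.getClass().getDeclaredFields()) {
            CliArg arg = field.getAnnotation(CliArg.class);
            if (arg == null || !flags.containsKey(arg.fullName())) {
                continue;
            }
            try {
                field.set(tool, flags.get(arg.fullName()));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot set " + field.getName(), e);
            }
        }
    }
}
```

Making such a field `static` would also share one value across every instance of the tool, which is another reason argument fields are usually per-instance.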

meganshand · 2023-05-03T14:45:11Z

...roadinstitute/hellbender/tools/walkers/featuremapping/FlowPairHMMAlignReadsToHaplotypes.java

+    @Argument(fullName = "output-format", doc="concise or expanded output format: " +
+            "expanded - output full read x haplotype, concise - output for each read best haplotype and " +
+            "score differences from the next best and the reference haplotype", optional=true)
+    public String outputFormat = "expanded";


This should probably be an enum. Or, since there are only two options, you could make it a boolean with "expanded" as the default (e.g. Boolean conciseOutputFormat = false).

Same comment below for the aligner argument.

Good point, fixed!
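A minimal sketch of the enum approach suggested above: with an enum, an unrecognized value fails at parse time instead of depending on string comparisons elsewhere in the tool. `OutputFormat`, `AlignerType`, and the constant names are assumptions (the `FBA` name is borrowed from the `-E FBA` usage example), and the case-insensitive `parse` helper is plain Java rather than part of any argument-parsing library.

```java
import java.util.Locale;

/** Hypothetical replacement for the free-form --output-format string. */
enum OutputFormat {
    EXPANDED, // full read x haplotype matrix
    CONCISE;  // per read: best haplotype and score differences

    /** Case-insensitive lookup; unknown values throw IllegalArgumentException (from valueOf). */
    static OutputFormat parse(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}

/** Hypothetical enum for the aligner choice; constant names are assumptions. */
enum AlignerType {
    FLOW_PAIR_HMM,
    FBA
}
```

A switch expression over either enum (Java 14+) must cover every constant or supply a default, so adding a third format later would surface at compile time.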

meganshand · 2023-05-03T14:47:07Z

...roadinstitute/hellbender/tools/walkers/featuremapping/FlowPairHMMAlignReadsToHaplotypes.java

+    public FlowBasedAlignmentArgumentCollection fbargs = new FlowBasedAlignmentArgumentCollection();
+
+
+


Suggested change

@ArgumentCollection

meganshand · 2023-05-03T14:48:44Z

...roadinstitute/hellbender/tools/walkers/featuremapping/FlowPairHMMAlignReadsToHaplotypes.java

+
+
+    LikelihoodEngineArgumentCollection likelihoodArgs = new LikelihoodEngineArgumentCollection();
+    SAMFileHeader sequenceHeader;


I think you can make these private static final

...adinstitute/hellbender/tools/walkers/haplotypecaller/FlowBasedAlignmentLikelihoodEngine.java

meganshand · 2023-05-03T15:04:15Z

...ain/java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/FlowBasedHMMEngine.java

-            //in general reads are already trimmed to the haplotype starts and ends so diff_left <= 0 and diff_right <= 0
-            fbr.applyBaseClipping(Math.max(0, diffLeft), Math.max(diffRight, 0), false);
-        }
+//        final int haplotypeStart = processedHaplotypes.get(0).getStart();


Should this be uncommented?

This actually is not necessary in the context of the GATK
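For context on the removed lines: `Math.max(0, ...)` clamps each clip amount at zero, so when reads already lie within the haplotype span (the `diff_left <= 0 and diff_right <= 0` case in the comment) nothing is clipped. A hedged sketch of that arithmetic, assuming `diffLeft = haplotypeStart - readStart` and `diffRight = readEnd - haplotypeEnd`; those definitions and the `ClipAmounts` class are hypothetical, not copied from `FlowBasedHMMEngine`.

```java
/**
 * Hypothetical sketch of the clamping in the removed FlowBasedHMMEngine lines.
 * The diffLeft/diffRight definitions below are assumptions.
 */
final class ClipAmounts {
    private ClipAmounts() {}

    /** Returns {leftClip, rightClip}; a read that lies inside the haplotype clips nothing. */
    static int[] compute(int readStart, int readEnd, int haplotypeStart, int haplotypeEnd) {
        int diffLeft = haplotypeStart - readStart; // > 0 when the read starts before the haplotype
        int diffRight = readEnd - haplotypeEnd;    // > 0 when the read ends after the haplotype
        return new int[] {Math.max(0, diffLeft), Math.max(diffRight, 0)};
    }
}
```

With a read at 110-190 inside a haplotype spanning 100-200, both clip amounts come out as 0, which is consistent with the reply that the clipping is not needed when reads are already trimmed.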

ilyasoifer · 2023-05-20T15:35:23Z

@meganshand - all fixed, PTAL
Thanks!

ilyasoifer added 11 commits March 20, 2023 20:57

Initial commit

5c818a4

Working version

8d5f044

Alternative implementation that runs simple flow-based alignment

825a17d

Added onTraverseSuccess

8667661

Version that supports both aligners

e920765

Alternative implementation that runs Nicer output

aeae767

Added more concise output format

4d7f071

Added test!

276a1a0

Added test files + changed precision

82f4ede

false -> true

96c188f

nicer code

26ef6ad

ilyasoifer requested review from jamesemery and meganshand May 2, 2023 14:04

Updated test

5753406

meganshand reviewed May 3, 2023

PR comments

06c0dec

ilyasoifer requested a review from meganshand May 20, 2023 15:32

PR comments

cb65d24

meganshand approved these changes May 25, 2023

meganshand merged commit 0ff9b91 into broadinstitute:master May 25, 2023
20 checks passed
