Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getting POS of 0 #110

Open
anoronh4 opened this issue May 16, 2022 · 7 comments
Open

getting POS of 0 #110

anoronh4 opened this issue May 16, 2022 · 7 comments

Comments

@anoronh4
Copy link

I ran brass on several samples and in one sample i got the following record:

11	0	4897_2	0	0]11:109403548]	4	.	SVTYPE=BND;MATEID=4897_1;TSRDS=A00227:491:HGLM2DSX3:4:1619:32814:17581,A00333:460:HFNH3DSX3:3:1246:9498:24643,A00333:460:HFNH3DSX3:3:1357:18231:27962,A00333:460:HFNH3DSX3:4:1557:7997:8672;BKDIST=109403549;SVCLASS=inversion	RC:PS	0:0	0:4

position 0 looks pretty unusual to me and i was not able to validate the call with any of 3 other callers (delly, manta, svaba). this is causing issues for me because certain downstream tools will not handle a position of 0 (pysam for example).

@anoronh4
Copy link
Author

anoronh4 commented Jun 6, 2022

Hope all is well. Just wondering if you need any further information/files from me to answer this question. By the way I'm using the quay.io/wtsicgp/brass:v6.3.4 container. As per the specification, VCF files should contain 1-based coordinates: https://samtools.github.io/hts-specs/VCFv4.3.pdf

@anoronh4
Copy link
Author

just wondering if there was any update on this.

@anoronh4
Copy link
Author

anoronh4 commented Sep 26, 2022

i have run into this issue again with a different sample:

$ zgrep -v "^#" /path/to/brass/sample.annot.vcf.gz | awk -F"\t" '$2 == 0'
3	0	5631_2	0	[3:80865736[0	5	.	SVTYPE=BND;MATEID=5631_1;TSRDS=A00227:541:HM73TDSX3:2:1321:30101:27132,A00227:541:HM73TDSX3:3:1137:14832:13025,A00227:541:HM73TDSX3:3:1163:23972:14669,A00227:541:HM73TDSX3:4:2455:30120:12540,A00333:514:HLYMTDSX3:2:1409:8630:9455;BKDIST=80865737;SVCLASS=inversion	RC:PS	0:0:5
3	0	5695_2	0	[3:80870254[0	8	.	SVTYPE=BND;MATEID=5695_1;TSRDS=A00227:541:HM73TDSX3:1:1328:22155:15562,A00227:541:HM73TDSX3:1:2469:27588:14857,A00227:541:HM73TDSX3:2:1608:9751:25708,A00227:541:HM73TDSX3:2:1647:23728:25739,A00227:541:HM73TDSX3:2:2305:20654:23077,A00227:541:HM73TDSX3:3:2203:7726:14904,A00333:514:HLYMTDSX3:2:1433:9932:16313,A00333:514:HLYMTDSX3:2:2636:32958:14575;BKDIST=80870255;SVCLASS=inversion	RC:PS	0:0	0:8
3	0	6174_2	0	0]3:80911681]	4	.	SVTYPE=BND;MATEID=6174_1;TSRDS=A00227:541:HM73TDSX3:1:1477:31159:22138,A00227:541:HM73TDSX3:1:2540:17987:9596,A00227:541:HM73TDSX3:4:1518:21640:27101,A00227:541:HM73TDSX3:4:2214:22019:20870;BKDIST=80911682;SVCLASS=inversion	RC:PS	0:0	0:4

This time, the issue appears 3 times in the same output vcf. Does your team plan to look into this issue at all?

interestingly, when i searched the brm.bam file for the first read mentioned in those vcf lines, it does not reconcile with the vcf:

$ samtools view /path/to/brass/tmpBrass/sample.brm.bam | grep A00227:541:HM73TDSX3:2:1321:30101:27132
A00227:541:HM73TDSX3:2:1321:30101:27132	145	3	80865728	60	39S112M	=	80866798	960	AGAGCGATCAGGCAAGAGAAGAAATGAAGGGCATTCAAGCAGGAATACCTCAACCCCTCCAACACAGCAGGCACAGCAGCTTCCCAAATTCGAGGGGACGGAGAACAAAGCTGGGGGCCTGTTACCAGTCCCCAAGAGTTAGAGTCCACAG	1<:=5<+<=;<==;;=;=;;=8;77=;;==:=:=<=;;==;==;:(:=><=98=88>9=8,;>;>1:>1>>>;><>>=?@?>???==<?:7?=??8?<7>?=;<;,<<<>?=>>98;>?==:<;>=;<<====;2=<<<::<::::<9;98	SA:Z:3,80866832,-,45M106S,60,0;	MC:Z:79M72S	MD:Z:112	PG:Z:MarkDuplicateRG:Z:TA00227@541@HM73TDSX3@2	NM:i:0	AS:i:112	XS:i:33
A00227:541:HM73TDSX3:2:1321:30101:27132	97	3	80866798	60	79M72S	=	80865728	-960	ACTCCTATTCAACACAGTACTAGAAGTCCTGGCCAGAGCGATCAGGCAAGAGAAGAAATGAAGGGCATTCAAGCAGGAATACCTCAACCCCTCCAACACAGCAGGCACAGCAGCTTCCCAAATTCGAGGGGACGGAGAACAAAGCTGGGGG	;<<??688:=;=<<<<?;<=<<?=>?;>><>>>>=?=?>6=;>=?>?>:@>@>?@>:?<?>?@???><=?>?@?>@?>?<==?=?=?>???=??=?=>=>@?>@??>>>@?>@?:=???>?8;;>6>?>>>==6>=?=8==*??><??>=;	SA:Z:3,80865728,+,73S78M,60,0;	MC:Z:39S112M	MD:Z:79	PG:Z:MarkDuplicates	RG:Z:TA00227@541@HM73TDSX3@2	NM:i:0	AS:i:79	XS:i:31

@tischfis
Copy link

I am having similar issue, any update BRASS team? thanks!

@rulixxx
Copy link

rulixxx commented Sep 27, 2022

Would need some small test input to try to reproduce.

@anoronh4
Copy link
Author

anoronh4 commented Oct 4, 2022

thanks for your response. i will need some time to come up with a small test input and make sure it can reproduce the issue. is there a way to run brass on a bam that has been filtered by region?

in the meantime, i have done a little more work to isolate the problem, finding the source of the invalid coordinate in the .r4 intermediate file:

$ cat brass/tumor_vs_normal.r4 | awk -F"\t" '$12 == 0' 
3	80865824	80865850	3	80867060	80867071	5631	5	-	+	80865736	0	80865736 (0)	80866775 (0)	0 (0)	0 (0)	A00227:541:HM73TDSX3:2:2333:25129:13479,A00227:541:HM73TDSX3:3:2636:13286:8218,A00227:541:HM73TDSX3:4:2252:20066:33207,A00333:514:HLYMTDSX3:2:1106:31195:31657,A00333:514:HLYMTDSX3:2:1116:29740:36933,A00333:514:HLYMTDSX3:2:1409:8630:9455,A00333:514:HLYMTDSX3:2:1657:3875:9392,A00333:514:HLYMTDSX3:2:2232:2311:15452	8	1		0	0
3	80869882	80870017	3	80870790	80870983	5695	8	-	+	80870254	0	80870254 (2)	80870111 (2)	0 (0)	0 (0)	A00227:541:HM73TDSX3:2:2464:15980:14700,A00227:541:HM73TDSX3:3:1411:26874:24455,A00227:541:HM73TDSX3:3:2366:11333:22811,A00227:541:HM73TDSX3:3:2640:26503:3552,A00227:541:HM73TDSX3:4:1336:12292:19366,A00227:541:HM73TDSX3:4:1650:16188:34068,A00227:541:HM73TDSX3:4:1659:11234:11303,A00227:541:HM73TDSX3:4:2158:5367:6715,A00227:541:HM73TDSX3:4:2515:3821:20572,A00227:541:HM73TDSX3:4:2515:6451:20556,A00227:541:HM73TDSX3:4:2643:20383:3443,A00333:514:HLYMTDSX3:2:1506:30427:16047,A00333:514:HLYMTDSX3:2:1616:18059:12790,A00333:514:HLYMTDSX3:2:1646:27154:33364	14	2		0	0
3	80911970	80911978	3	80913732	80913783	6174	4	+	-	80911681	0	80911624 (2)	80911681 (2)	0 (0)	0 (0)	A00227:541:HM73TDSX3:1:2404:15302:31093,A00227:541:HM73TDSX3:3:1428:22417:22091	2	0

The 12th column of this file represents the "high_end_bkpt" and is calculated using the following lines:

if ($F[9] eq '+') {
$high_end_bkpt = ($high_end_high_clip_count >= $CLIPPED_READS_NEEDED ? $high_end_high_clip_pos : highest_pos_of_reads($high_end_reads_of_rg{$F[6]}));
}
else {
$high_end_bkpt = ($high_end_low_clip_count >= $CLIPPED_READS_NEEDED ? $high_end_low_clip_pos : lowest_pos_of_reads($high_end_reads_of_rg{$F[6]}));
}

seems to me that this should not evaluate to 0, or should get lead to the variant being filtered out?

This column is passed on to .r5 and .r6, and then gets added to .groups.clean.bedpe with the following lines:

$F[4] = $ids{$id}->[1]-1;
$F[5] = $ids{$id}->[1];

Does this help identify the source of the problem?

@rulixxx
Copy link

rulixxx commented Oct 5, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants