Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precise variants with homology longer than the variant are incorrectly filtered #243

Closed
YuanwenGuo opened this issue Sep 9, 2019 · 13 comments
Assignees
Labels

Comments

@YuanwenGuo
Copy link

Hello,
I am trying to use GRIDSS detecting possible breakends in multiple samples. It successfully identifies most of breakends, but looks like some breaks can not be picked up with default configuration settings. Take one breakend as example:
example

I tried to adjust configuration with the following parameters, but it didn't help much:
minMapq = 10
assembly.minReads = 1
variantcalling.minReads = 1
variantcalling.minScore = 10
variantcalling.minSize = 10

I also tried to use cohort scripts to process my samples, which did not detect the breakends too.

Could you please give me any suggestions about how to change configurations to make GRDISS be able to detect these kinds of breakends?

Thank you!

Best,
Yuanwen

@d-cameron d-cameron added the bug label Sep 9, 2019
@d-cameron d-cameron changed the title Some breakends can not be detected by GRIDSS Precise variants with homology longer than the variant are incorrectly filtered Sep 9, 2019
@d-cameron
Copy link
Member

I'll put in a fix for this in the next GRIDSS release

@d-cameron
Copy link
Member

d-cameron commented Sep 9, 2019

The work-around for the current version is to supply a CONFIGURATION_FILE parameter with:

variantcalling.writeFiltered = true

Unfortuantely, that output file will be very large. The variants in question are incorrectly filtered as REFERENCE_ALLELE since (due the large homology) the start and end bounds overlap so they trigger the reference allele filter.

@d-cameron d-cameron self-assigned this Sep 9, 2019
@YuanwenGuo
Copy link
Author

Thank you very much for the response! I will try the configurations you suggested, and see how it works.

Look forward the new release!

Best,
Yuanwen

@YuanwenGuo
Copy link
Author

Hi Daniel,
I rerun GRIDSS with the configuration settings as you suggested. It helped a lot and some break-ends can be detected now. But unfortunately, this bread-end (POS:837244) in the picture could not be detected. I wonder if there is any other parameters I can adjust to make it work?

Thank you for the help!

Best,
Yuanwen

@d-cameron
Copy link
Member

d-cameron commented Sep 11, 2019 via email

@YuanwenGuo
Copy link
Author

Hi Daniel,

Thank you very much for the detailed and informative explanation!

I've been checking bam files as you suggested, and it seems like assembly.bam.sv.bam contig alignments don't overlap with the
assembly.bam contig alignments as showing in the following figures (top is assembly.bam.sv.bam, and bottom is assembly.bam)
Screen Shot 2019-09-11 at 10 13 26 PM
Screen Shot 2019-09-11 at 10 15 19 PM

Therefore, it should be because assembly contigs align to a different position.

I appreciate the help!

Best,
Yuanwen

@YuanwenGuo
Copy link
Author

Hi Daniel,
After reading #185 issue, I am still concerned about my process. If I understand it correctly, that means assembly.bam is before realigning, and assembly.sv.bam is after realigning. So if I can see split contigs in assembly.sv.bam (which is showing as top one in the previous comment):
64751372-8b38db00-d4e1-11e9-84fa-3a39090b9acb
Then GRIDSS should be able to detect it with appropriate filtering.

I have tried configurations as following, but they didn't work.
minMapq=10
assembly.minReads = 1

####################

Variant calling

####################
variantcalling.minReads = 1
variantcalling.minScore = 10
variantcalling.minSize = 3
variantcalling.callUnassembledBreakends = true
variantcalling.breakendMargin = 3
variantcalling.writeFiltered = true
variantcalling.maxBreakendHomologyLength = 2000

I wonder if there is any other filtering I can play with? Thank you!

Best,
Yuanwen

@d-cameron
Copy link
Member

If I understand it correctly, that means assembly.bam is before realigning, and assembly.sv.bam is after realigning.

That is correct.

So if I can see split contigs in assembly.sv.bam then GRIDSS should be able to detect it with appropriate filtering.

GRIDSS will report variants even when there is not assembly support (FILTER=NO_ASSEMBLY). That said, GRIDSS heavily favours variants that do have assembly support.

variantcalling.maxBreakendHomologyLength = 2000

This parameter changes the window size for S/W imprecise homology detection. It's not going to change sensitivity but pushing it out to 2kbp will cause false negatives in the homology alignment due to the lack of flanking anchors. I'd leave this one at the default value.

I wonder if there is any other filtering I can play with?

Have you tried my minMapq suggestion?

Can you post the SAM records for assembly.sv.bam record in question, as well as few of the supporting reads for the call that isn't detected? I'm getting quite curious as to why that particular site is so refractory.

What's your upstream pipeline look like? Which aligner and additional steps are you performing (e.g markdup BQSR)?

d-cameron pushed a commit that referenced this issue Sep 18, 2019
@d-cameron
Copy link
Member

The hom > variant length should be resolved in 2.6.2.

@YuanwenGuo
Copy link
Author

Hi Daniel,
Sorry for the late reply. Please see attachment for SAM records I can find around (~200bp before and 200bp after the breakend POS). The breakend position should be on scaffold_22 at position 837243/837244).
surrounding_SAM_records.txt

I tried to attach the original bam file, but seems like bam type is not supported here. Please let me know if the bam file will be helpful for this issue.

Yes, I did try minMapq=10, unfortunately this breakend still can not be detected.

Thank you very much for the help!

Best,
Yuanwen

@YuanwenGuo
Copy link
Author

The upstream pipeline is performed by my colleague. We used bwa and then picard for removing duplicates.

Thank you,
Yuanwen

@d-cameron
Copy link
Member

It looks like you have a lot of duplicated sequences in your scaffolds. GRIDSS takes the conservative approach of only calling variants from uniquely mappable sequence. Single breakends are called when one side is multi-mapping but no calls are made when neither side has good mapping quality support.

Looking at your reads, they're all either mapq=60, or mapq=0. Furthermore, the read alignments of the mates are split across at least 4 different locations (scaffold_15:2510710, scaffold_15:2739465, scaffold_49:387751, scaffold_51:530826). GRIDSS doesn't make a call since whichever position it guesses is likely to be wrong.

If you really want the 4 calls made (e.g. you're looking for misassemblies in your scaffolds), then you can set the minMapq=0, and deal with the much higher false discovery rate. You'll also need to add scoring.model = ReadCount to your gridss.properties file since mapq=0 corresponds to Pr(correct alignment) = 0 for that read so it will be given 0 weight in the (default) likelihood-based scoring model used by GRIDSS.

@YuanwenGuo
Copy link
Author

Thank you for the detailed explanation! Now I can see why this SV is not detected.

GRIDSS is a really powerful breakend detector from our experience, much better than some other popular programs (thanks a lot for developing such a great tool). At this point, we probably don't want to increase false positives, and will only take what GRIDSS reports which are reliable calls.

Best,
Yuanwen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants