Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes LiftoverVCF for indels (again) #1469

Merged
merged 10 commits into from
Apr 6, 2020

Conversation

yfarjoun
Copy link
Contributor

@yfarjoun yfarjoun commented Feb 15, 2020

fixes #1258
fixes #1280

Thanks to the several users who reported this is still a problem, and sorry for ignoring you for a year. @nh13, @kirannarta, and @duartemolha!

Description

Give your PR a concise yet descriptive title
Please explain the changes you made here.
Explain the motivation for making this change. What existing problem does the pull request solve?
Mention any issues fixed, addressed or otherwise related to this pull request, including issue numbers or hard links for issues in other repos.
You can delete these instructions once you have written your PR description.


Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

  • Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on Travis

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests

- fixes (again) #1258 which was still broken due to the call to "make()" in the presence of a mismatching "END" attribute and "getEnd()" result.
pshapiro4broad
pshapiro4broad previously approved these changes Feb 18, 2020
Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java code looks ok; can't say I fully understand the fix or the test though.

public void testLiftOverIndels(final LiftOver liftOver, final ReferenceSequence reference, final VariantContext source, final VariantContext result) {

final Interval target = liftOver.liftOver(new Interval(source.getContig(), source.getStart(), source.getEnd()), 1);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bingo

@nh13
Copy link
Collaborator

nh13 commented Feb 18, 2020

@yfarjoun I am going to see if I can make a simple test case for #1280 and see if this fixes it.

@nh13
Copy link
Collaborator

nh13 commented Feb 18, 2020

@yfarjoun this fix does not solve my issue: #1280. I have a single-variant test case in that PR.

@yfarjoun
Copy link
Contributor Author

@nh13 while this PR was not intended to fix your other PR, I found the cause and I think I fixed it...care to test? (and review the PR while you're at it...) 😄

@yfarjoun
Copy link
Contributor Author

@pshapiro4broad the problem was that the old code called "build()" on a builder that was in an improper state, the end of the variant was not at the same place where the END attribute was (for a non symbolic variant), and this didn't validate.

The changes to the tests simply adds the END attribute so that this problem is triggered, indeed the modified tests failed before I made the change.

@nh13 nh13 mentioned this pull request Feb 18, 2020
Yossi Farjoun added 4 commits February 19, 2020 14:46
@yfarjoun
Copy link
Contributor Author

@nh13 you inspired me to make this work on symbolic alleles as well...and then I also figured out that I can simplify the code around the liftover that involves reverse-complementing....anyhow, no good deed will go unpunished....want to review this?

@nh13
Copy link
Collaborator

nh13 commented Feb 19, 2020

@yfarjoun I can review if it helps. I won't be able to look at it until this weekend given my current workload. Is that allright?

@yfarjoun
Copy link
Contributor Author

yfarjoun commented Feb 20, 2020 via email

@chengdai
Copy link

chengdai commented Feb 28, 2020

Any chance there has been an update to this? I'm encountering a similar error with the following genotype (test.b37.vcf.gz). For reference, I am trying to lift over genotypes from build 37 to build 38. I'm using the b37ToHg38 chain and GRCh38_full_analysis_set_plus_decoy_hla.fa found on the 1000 Genomes FTP

1       144609982       rs201830440     AATATATATATATAT A,<NON_REF>     788.73  .       DB;DP=28;ExcessHet=3.0103;MLEAC=1,1;MLEAF=0.5,0.5;RAW_MQ=68809;AC=1,1;AN=2      GT:AD:DP:GQ:PL:SB 1/2:0,19,0,0,0,0,0:19:2:824,41,194,783,0,780:0,0,9,10

The error message is:

[Fri Feb 28 19:05:05 UTC 2020] Executing as root@9823faf0b713 on Linux 4.14.154-128.181.amzn2.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.22.0-SNAPSHOT
INFO    2020-02-28 19:05:05     LiftoverVcf     Loading up the target reference genome.
INFO    2020-02-28 19:05:15     LiftoverVcf     Lifting variants over and sorting (not yet writing the output file.)
[Fri Feb 28 19:05:15 UTC 2020] picard.vcf.LiftoverVcf done. Elapsed time: 0.17 minutes.
Runtime.totalMemory()=6722945024
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
        at picard.util.LiftoverUtils.lambda$leftAlignVariant$4(LiftoverUtils.java:379)
        at java.util.stream.Collectors.lambda$groupingBy$45(Collectors.java:907)
        at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
        at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1620)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at picard.util.LiftoverUtils.leftAlignVariant(LiftoverUtils.java:379)
        at picard.util.LiftoverUtils.reverseComplementVariantContext(LiftoverUtils.java:178)
        at picard.util.LiftoverUtils.liftVariant(LiftoverUtils.java:76)
        at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:426)
        at picard.cmdline.CommandLineProgram.instanc
[test.b37.vcf.gz](https://github.com/broadinstitute/picard/files/4268648/test.b37.vcf.gz)
eMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Thanks!

@nh13
Copy link
Collaborator

nh13 commented Feb 28, 2020

Sorry, I'm extremely busy at work and haven't had time outside of work to look at this. Please do not wait on my review to hold this up, as I cannot commit to doing it in the next two weeks (work + vacation)

@yfarjoun yfarjoun requested a review from nh13 March 25, 2020 19:30
Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me, only some very minor formatting changes.

return true;
}

return vc.getAlleles().stream().filter(a -> !a.isSymbolic()).filter(a -> !a.equals(Allele.SPAN_DEL)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if the sun/oracle java code guidelines cover streaming operations yet, but I think it's more readable if each stream segment starts a new line:

        return vc.getAlleles().stream()
                .filter(a -> !a.isSymbolic())
                .filter(a -> !a.equals(Allele.SPAN_DEL))
                .anyMatch(a -> a.length() != 1);

@@ -365,7 +359,9 @@ protected static void leftAlignVariant(final VariantContextBuilder builder, fina

// Put each allele into the alleleBasesMap unless it is a spanning deletion.
// Spanning deletions are dealt with as a special case later in fixedAlleleMap.
alleles.stream().filter(a->!a.equals(Allele.SPAN_DEL)).forEach(a -> alleleBasesMap.put(a, a.getBases()));
alleles.stream()
.filter(a->!a.equals(Allele.SPAN_DEL)&&!a.isSymbolic())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

Suggested change
.filter(a->!a.equals(Allele.SPAN_DEL)&&!a.isSymbolic())
.filter(a -> !a.equals(Allele.SPAN_DEL) && !a.isSymbolic())

also, above you write filter(a -> !a.isSymbolic()).filter(a -> !a.equals(Allele.SPAN_DEL). The version with one call to filter() is probably clearer, either way I would use the same form in both places to be consistent.


for (final Allele oldAllele : originalAlleles) {
alleles.add(LiftoverUtils.reverseComplement(oldAllele, target, refSeq, isIndel, addToStart));
private static boolean isIndelForLiftover(final VariantContext vc){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

Suggested change
private static boolean isIndelForLiftover(final VariantContext vc){
private static boolean isIndelForLiftover(final VariantContext vc) {

@yfarjoun yfarjoun merged commit f20d6ec into master Apr 6, 2020
@yfarjoun yfarjoun deleted the yf_liftover_still_has_issues_with_END_attribute branch April 6, 2020 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Liftover gVCFs with indels LiftoverVCF + Badly formed variant context at location chr1:899903; getEnd()
4 participants