Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep information for <NON_REF> alleles #2116

Open
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@karenfeng
Copy link
Contributor

commented Jan 16, 2019

INFO and FORMAT fields with Number=<A/R/G> are not properly stored in ADAM for the <NON_REF> allele. This is a problem with both default and non-default fields. During conversion to ADAM, most fields lose any information related to the <NON_REF> allele. This is with the exception of PL, for which the majority of values is stored between genotypeLikelihoods and nonRefLikelihoods. However, this still drops the value for the ALT/<NON_REF> genotype.

As a result of this bug, the output list for most fields with Number=<A/R/G> does not contain enough values in the resulting gVCF (those with Number=A should have 2 but have 1, those with Number=R should have 3 but have 2, and those with Number=G should have 10 but have 6). For PL, the missing value for the ALT/<NON_REF> genotype contains 3223.

To fix this problem, this PR does the following:

  • The <NON_REF> allele index is passed into the FORMAT functions.
  • For non-default fields with Number=<A/R>, the value for the <NON_REF> allele is added to the output list for addition to the attribute map (VariantAnnotation.attributes for INFO fields, and VariantCallingAnnotations.attributes for FORMAT fields).
  • For default fields with Number=<A/R>, the value for the <NON_REF> allele is added to the attribute map keyed as "NON_REF_<FIELD_NAME>" and restored to the output list during extraction if present.
  • For all fields with Number=G, the indices include those for REF, ALT, and <NON_REF>.
@AmplabJenkins

This comment has been minimized.

Copy link

commented Jan 16, 2019

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2935/

Build result: FAILURE

GitHub pull request #2116 of commit efbbe95 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prbWiping out workspace first.Cloning the remote Git repositoryCloning repository https://github.com/bigdatagenomics/adam.git > git init /home/jenkins/workspace/ADAM-prb # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15 > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10 > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > git rev-parse origin/pr/2116/merge^{commit} # timeout=10 > git branch -a -v --no-abbrev --contains 6ebbd52 # timeout=10Checking out Revision 6ebbd52 (origin/pr/2116/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 6ebbd52b75b4382b4f5995d6a288e1fa6f6d3f36First time build. Skipping changelog.Triggering ADAM-prb ? 2.7.5,2.11,2.2.2,ubuntuADAM-prb ? 2.7.5,2.11,2.2.2,ubuntu completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins

This comment has been minimized.

Copy link

commented Jan 16, 2019

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2936/

Build result: FAILURE

GitHub pull request #2116 of commit 36199e6 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prbWiping out workspace first.Cloning the remote Git repositoryCloning repository https://github.com/bigdatagenomics/adam.git > git init /home/jenkins/workspace/ADAM-prb # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15 > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10 > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > git rev-parse origin/pr/2116/merge^{commit} # timeout=10 > git branch -a -v --no-abbrev --contains c37240f # timeout=10Checking out Revision c37240f (origin/pr/2116/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f c37240fc448ce498a0bb7a44aa75247b5edd5045First time build. Skipping changelog.Triggering ADAM-prb ? 2.7.5,2.11,2.2.2,ubuntuADAM-prb ? 2.7.5,2.11,2.2.2,ubuntu completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@heuermh

This comment has been minimized.

Copy link
Member

commented Jan 21, 2019

Hello @karenfeng! As an alternative to this approach, have you considered splitting <NON_REF> alleles out to separate records? Adding all of these nonref allele fields and attributes to every Variant record makes me feel like we have the model wrong.

@heuermh

This comment has been minimized.

Copy link
Member

commented Feb 4, 2019

@karenfeng @fnothaft Would it be possible to start a discussion around this issue? I've done major refactoring of genotypes in #2117 but have yet to commit changes to non ref fields in that pull request, waiting for this to be resolved.

@heuermh

This comment has been minimized.

Copy link
Member

commented Mar 4, 2019

Ping @karenfeng @fnothaft for discussion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.