loadVcf does not dedupe sample ID #1874

Closed
fnothaft opened this issue Jan 15, 2018 · 0 comments

@fnothaft fnothaft commented Jan 15, 2018

This happens when loading multiple VCFs that share the same sample IDs, e.g., the per-chromosome VCFs from 1kg:

val vcs = sc.loadVcf("1kg/release/20130502/ALL*.vcf.gz")
vcs: org.bdgenomics.adam.rdd.variant.VariantContextRDD = VariantContextRDD with 86 reference sequences and 58825 samples

That might be the right number of samples for ExAC (it isn't), but definitely too many for 1kg...

@fnothaft fnothaft added the bug label Jan 15, 2018
@fnothaft fnothaft added this to the 0.24.0 milestone Jan 15, 2018
@fnothaft fnothaft self-assigned this Jan 15, 2018
fnothaft pushed a commit to fnothaft/adam that referenced this issue Jan 16, 2018
Resolves bigdatagenomics#1874. While samples should be unique in a single VCF, we may load data
from multiple VCFs that contain the same samples (e.g., VCFs from a single
sequencing project where the VCFs are split by chromosome). This change dedupes
sample IDs on load.
heuermh added a commit that referenced this issue Jan 22, 2018
Resolves #1874. While samples should be unique in a single VCF, we may load data
from multiple VCFs that contain the same samples (e.g., VCFs from a single
sequencing project where the VCFs are split by chromosome). This change dedupes
sample IDs on load.
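
For illustration, here is a minimal sketch of the kind of deduplication the commit message describes, assuming the sample IDs from each file are available as plain strings. `SampleDedupe` and `dedupeSampleIds` are hypothetical names and not part of ADAM's API; the actual change in ADAM may be implemented differently.

```scala
// Hypothetical helper, not part of ADAM: merges per-file sample ID lists.
object SampleDedupe {

  // Flatten the per-file lists and keep only the first occurrence of each ID,
  // so samples repeated across per-chromosome VCFs are counted once.
  def dedupeSampleIds(perFileSampleIds: Seq[Seq[String]]): Seq[String] =
    perFileSampleIds.flatten.distinct
}
```

Given two files that each list the same two IDs, this returns those two IDs once, in first-seen order, rather than four entries.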
@heuermh heuermh added this to Completed in Release 0.24.0 Feb 10, 2018