error in vcf2adam #899

Closed
dnaase opened this Issue Dec 22, 2015 · 3 comments

dnaase commented Dec 22, 2015

I am trying to convert a VCF file to ADAM format. The result is the same whether or not the output is gzipped.

adam-submit vcf2adam test.vcf test_vcf.adam -parquet_compression_codec GZIP

It always gives the following warnings:

Dec 22, 2015 2:28:27 PM WARNING: org.apache.parquet.hadoop.ParquetOutputCommitter: could not write summary file for test_vcf.adam
org.apache.parquet.io.ParquetEncodingException: test_vcf.adam/part-r-00000.gz.parquet invalid: all the files must be contained in the root test_vcf.adam

Also, when I load test_vcf.adam in adam-shell, it always gives this error:

java.lang.IllegalArgumentException: unknown VCF format, cannot create RecordReader: file:test_vcf.adam/part-r-00000.gz.parquet

I am using the newest release, adam-0.18.2, with spark-1.5.1-hadoop-2.4 in a Java 8 environment.

Member

heuermh commented Dec 22, 2015

Hello! I haven't been able to reproduce those exact warnings. Is the file test_vcf.adam/part-r-00000.gz.parquet empty?

Guessing at your input data, note that VCF files representing dbSNP may not include any genotypes, only variants, so you may wish to use the -only_variants flag in your vcf2adam step. E.g.

$ head dbsnp_138.hg19.vcf
##fileformat=VCFv4.1
...
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chrM    64      rs3883917       C       T       .       .       ASP;CAF=[0.9373,0.06268];COMMON=1;OTHERKG;R5;RS=3883917;RSPOS=64;SAO=0;SSR=0;VC=SNV;VP=0x050000020005000002000100;WGT=1;dbSNPBuildID=108

$ adam-submit -- vcf2adam -only_variants dbsnp_138.hg19.vcf dbsnp_138.hg19.adam
$ adam-submit -- print dbsnp_138.hg19.adam -o dbsnp_138.hg19.out
$ head dbsnp_138.hg19.out
{"variantErrorProbability": null, "contig": {"contigName": "chrY", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 1412328, "end": 1412329, "referenceAllele": "G", "alternateAllele": "A", "svAllele": null, "isSomatic": false}
{"variantErrorProbability": null, "contig": {"contigName": "chrY", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 1412353, "end": 1412356, "referenceAllele": "TCA", "alternateAllele": "TT", "svAllele": null, "isSomatic": false}
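One quick way to tell whether a VCF file is sites-only (and so a candidate for -only_variants) is to count the columns on its #CHROM header line. A VCF header has 8 fixed tab-separated columns, CHROM through INFO; genotype data adds a FORMAT column plus at least one sample column. The sketch below assumes an uncompressed, tab-delimited VCF, and the helper name has_genotypes is made up for illustration; it is not part of ADAM.

```shell
# Hypothetical helper (not an ADAM command): succeeds if the VCF header
# declares genotype columns, i.e. FORMAT plus at least one sample.
has_genotypes() {
  # Count the tab-separated fields on the first #CHROM header line.
  ncols=$(awk -F'\t' '/^#CHROM/ { print NF; exit }' "$1")
  # 8 fixed columns + FORMAT + one or more samples means 10 or more fields.
  # A file with no #CHROM line yields an empty count and is treated as 0.
  [ "${ncols:-0}" -ge 10 ]
}

# Example use:
#   if has_genotypes dbsnp_138.hg19.vcf; then
#     echo "genotype columns present"
#   else
#     echo "sites-only VCF, consider -only_variants"
#   fi
```

For the dbSNP header shown above, which stops at INFO, the helper returns a non-zero status, matching the suggestion to pass -only_variants.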
dnaase commented Dec 23, 2015

Thanks a lot! I figured it out when I tried anno2adam!

Member

heuermh commented Jan 7, 2016

Great! May I close this issue?
