Issue with transformVariants // ADAM to VCF #1782

Closed
Rokshan2016 opened this Issue Oct 24, 2017 · 15 comments

@Rokshan2016

Rokshan2016 commented Oct 24, 2017

Hi,
I am trying to convert ADAM to VCF, but I am getting this error. Is there any other way I can convert the .adam file to a .vcf file?

Command:

./adam-submit transformVariants hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/1000G_omni.adam/ hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/100G_omni1.vcf -coalesce 1

or

./adam-submit transformVariants hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/1000G_omni.adam/part-r-00000.gz.parquet hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/100G_omni1.vcf -coalesce 1

Error:

java.io.FileNotFoundException: Couldn't find any files matching hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/1000G_omni.adam/part-r-00000.gz.parquet
17/10/24 19:06:48 INFO cli.TransformVariants: Overall Duration: 10.08 secs
Exception in thread "main" java.io.FileNotFoundException: Couldn't find any files matching hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/1000G_omni.adam/part-r-00000.gz.parquet
at org.bdgenomics.adam.rdd.ADAMContext.getFsAndFilesWithFilter(ADAMContext.scala:1354)
at org.bdgenomics.adam.rdd.ADAMContext.loadAvroSequenceDictionary(ADAMContext.scala:1164)
at org.bdgenomics.adam.rdd.ADAMContext.loadParquetVariants(ADAMContext.scala:2124)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVariants$1.apply(ADAMContext.scala:2779)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVariants$1.apply(ADAMContext.scala:2774)
at scala.Option.fold(Option.scala:157)
at org.apache.spark.rdd.Timer.time(Timer.scala:48)
at org.bdgenomics.adam.rdd.ADAMContext.loadVariants(ADAMContext.scala:2772)
at org.bdgenomics.adam.cli.TransformVariants.run(TransformVariants.scala:120)
at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
at org.bdgenomics.adam.cli.TransformVariants.run(TransformVariants.scala:74)
at org.bdgenomics.adam.cli.ADAMMain.apply(ADAMMain.scala:126)
at org.bdgenomics.adam.cli.ADAMMain$.main(ADAMMain.scala:65)
at org.bdgenomics.adam.cli.ADAMMain.main(ADAMMain.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/10/24 19:06:48 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/10/24 19:06:48 INFO ui.SparkUI: Stopped Spark web UI at http://10.48.3.64:4040
17/10/24 19:06:48 INFO cluster.Ya
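A quick way to check whether the Parquet payload under the .adam directory is readable at all, independent of ADAM's metadata handling (which is where the FileNotFoundException above is raised), is to read it with plain Spark SQL. This is only a sketch; the HDFS path is copied from the commands above:

import org.apache.spark.sql.SparkSession

// Sketch (Spark 2.x; on Spark 1.x use sqlContext.read.parquet instead):
// read the Parquet part files under the .adam directory directly, bypassing
// ADAM's Avro metadata, to confirm the payload itself is intact.
val spark = SparkSession.builder().appName("check-adam-parquet").getOrCreate()
val df = spark.read.parquet(
  "hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/1000G_omni.adam")
df.printSchema()
println(s"variant rows: ${df.count()}")

If this also fails or reports zero rows, the problem is likely in what transformVariants wrote rather than in how it is read back.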

@heuermh

Member

heuermh commented Oct 25, 2017

What does hadoop fs -ls hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/1000G_omni.adam show?
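For reference, the same listing can be produced from a Spark or Scala shell with the Hadoop FileSystem API. This is only a sketch using the path from the commands above; the exact metadata file names ADAM writes alongside the Parquet parts (for example _seqdict.avro) vary by version:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// List what is actually under the .adam output directory. An intact
// transformVariants output should contain part-r-*.gz.parquet files plus
// Avro metadata sidecars; the FileNotFoundException above literally means
// ADAM found no files matching its filter here.
val dir = new Path("hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/1000G_omni.adam")
val fs = FileSystem.get(dir.toUri, new Configuration())
fs.listStatus(dir).foreach(s => println(f"${s.getLen}%12d  ${s.getPath}"))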

@Rokshan2016

Rokshan2016 commented Oct 25, 2017

First I converted 100G_omni1.vcf to 1000G_omni.adam, and that works fine. But when I try to convert 1000G_omni.adam back to VCF, it gives this error.

1000G_omni.adam contains

##fileformat=VCFv4.2
##FILTER=<ID=NOT_POLY_IN_1000G,Description="Alternate allele count = 0">
##FILTER=<ID=badAssayMapping,Description="The mapping information for the SNP assay is internally inconsistent in the chip metadata">
##FILTER=<ID=dup,Description="Duplicate assay at same position with worse Gentrain Score">
##FILTER=<ID=id10,Description="Within 10 bp of an known indel">
##FILTER=<ID=id20,Description="Within 20 bp of an known indel">
##FILTER=<ID=id5,Description="Within 5 bp of an known indel">
##FILTER=<ID=id50,Description="Within 50 bp of an known indel">
##FILTER=<ID=refN,Description="Reference base is N. Assay is designed for 2 alt alleles">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##FORMAT=<ID=FT,Number=.,Type=String,Description="Genotype-level filter">
##FORMAT=<ID=GC,Number=.,Type=Float,Description="Gencall Score">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the gVCF block">
##FORMAT=<ID=MQ,Number=1,Type=Float,Description="Root mean square (RMS) mapping quality">
##FORMAT=<ID=MQ0,Number=1,Type=Float,Description="Total number of reads with mapping quality=0">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PQ,Number=1,Type=Float,Description="Read-backed phasing quality">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phase set ID">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##FilterLiftedVariants="analysis_type=FilterLiftedVariants input_file=[] read_buffer_size=null phone_home=STANDARD read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL reference_sequence=/humgen/1kg/reference/human_g1k_v37.fasta rodBind=[] nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false disable_experimental_low_memory_sharding=false logging_level=INFO log_to_file=null help=false variant=(RodBinding name=variant source=./0.451323408008651.sorted.vcf) out=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub NO_HEADER=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub filter_mismatching_base_and_quals=false"
##INFO=<ID=1000G,Number=0,Type=Flag,Description="Membership in 1000 Genomes">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral allele">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">
##INFO=<ID=AD,Number=R,Type=Integer,Description="Total read depths for each allele">
##INFO=<ID=ADF,Number=R,Type=Integer,Description="Read depths for each allele on the forward strand">
##INFO=<ID=ADR,Number=R,Type=Integer,Description="Read depths for each allele on the reverse strand">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency for each allele">
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO'">
##INFO=<ID=CIGAR,Number=A,Type=String,Description="Cigar string describing how to align alternate alleles to the reference allele">
##INFO=<ID=CR,Number=.,Type=Float,Description="SNP Callrate">
##INFO=<ID=DB,Number=0,Type=Flag,Description="Membership in dbSNP">
##INFO=<ID=GentrainScore,Number=.,Type=Float,Description="Gentrain Score">
##INFO=<ID=H2,Number=0,Type=Flag,Description="Membership in HapMap2">
##INFO=<ID=H3,Number=0,Type=Flag,Description="Membership in HapMap3">
##INFO=<ID=HW,Number=.,Type=Float,Description="Hardy-Weinberg Equilibrium">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic event">
##INFO=<ID=VALIDATED,Number=0,Type=Flag,Description="Validated by follow-up experiment">
##contig=<ID=1,length=249250621,assembly=b37>
##contig=<ID=10,length=135534747,assembly=b37>
##contig=<ID=11,length=135006516,assembly=b37>
##contig=<ID=12,length=133851895,assembly=b37>
##contig=<ID=13,length=115169878,assembly=b37>
##contig=<ID=14,length=107349540,assembly=b37>
##contig=<ID=15,length=102531392,assembly=b37>
##contig=<ID=16,length=90354753,assembly=b37>
##contig=<ID=17,length=81195210,assembly=b37>
##contig=<ID=18,length=78077248,assembly=b37>
##contig=<ID=19,length=59128983,assembly=b37>
##contig=<ID=2,length=243199373,assembly=b37>
##contig=<ID=20,length=63025520,assembly=b37>

@fnothaft

Member

fnothaft commented Oct 25, 2017

Hi @Rokshan2016

What happens if you run the command:

./adam-submit transformVariants hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/1000G_omni.adam hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/100G_omni1.vcf -coalesce 1

Specifically, this changes the input file name from hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/1000G_omni.adam/part-r-00000.gz.parquet to hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/1000G_omni.adam.
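The same suggestion expressed through the API, as a minimal sketch from adam-shell (where sc is already a SparkContext with the ADAM classes on the classpath): give loadVariants the .adam directory so that both the Parquet parts and the metadata written next to them are picked up.

import org.bdgenomics.adam.rdd.ADAMContext._  // adds loadVariants and friends to SparkContext

// Sketch: load from the .adam directory, not from an individual
// part-r-*.gz.parquet file, and count the records as a basic sanity check.
val variants = sc.loadVariants("hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/1000G_omni.adam")
println(s"loaded ${variants.rdd.count()} variants")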

@Rokshan2016

Rokshan2016 commented Oct 25, 2017

Yes, I tried that as well. Same error.

@Rokshan2016

Rokshan2016 commented Oct 25, 2017

I tried with the latest ADAM version.
Command:
./adam-submit --driver-memory 3g --executor-memory 3g -- adam2vcf hdfs://ipsawdvpvfhnn03.ips.local:8020/user/rokshan.jahan/data/SRR1517974A.adam hdfs://ipsawdvpvfhnn03.ips.local:8020/user/rokshan.jahan/data/SRR1517974P.vcf

Error:
: 22365 length: 22365 hosts: []}
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
17/10/25 12:43:11 INFO ZlibFactory: Successfully loaded & initialized native-zlib library
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO Executor: Finished task 13.0 in stage 0.0 (TID 13). 1264 bytes result sent to driver
Oct 25, 2017 12:43:10 PM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 200
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 1 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 3 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 3 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 1 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 3 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 36 ms. row count = 2
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 34 ms. row count = 3
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in17/10/25 12:43:11 INFO TaskSetManager: Starting task 16.0 in stage 0.0 (TID 16, localhost, executor driver, partition 16, ANY, 6157 bytes)
17/10/25 12:43:11 INFO Executor: Running task 16.0 in stage 0.0 (TID 16)
17/10/25 12:43:11 INFO NewHadoopRDD: Input split: ParquetInputSplit{part: hdfs://ipsawdvpvfhnn03.ips.local:8020/user/rokshan.jahan/data/SRR1517974A.adam/part-r-00016.gz.parquet start: 0 end: 22365 length: 22365 hosts: []}
17/10/25 12:43:11 INFO Executor: Finished task 16.0 in stage 0.0 (TID 16). 1191 bytes result sent to driver
17/10/25 12:43:11 INFO TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17, localhost, executor driver, partition 17, ANY, 6158 bytes)
17/10/25 12:43:11 INFO Executor: Running task 17.0 in stage 0.0 (TID 17)
17/10/25 12:43:11 INFO NewHadoopRDD: Input split: ParquetInputSplit{part: hdfs://ipsawdvpvfhnn03.ips.local:8020/user/rokshan.jahan/data/SRR1517974A.adam/part-r-00017.gz.parquet start: 0 end: 32533 length: 32533 hosts: []}
17/10/25 12:43:11 INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 671 ms on localhost (executor driver) (1/200)
17/10/25 12:43:11 INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 52 ms on localhost (executor driver) (2/200)
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:12 ERROR Executor: Exception in task 10.0 in stage 0.0 (TID 10)
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Float
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:80)
at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:138)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:52)
at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:41)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:207)
at org.apache.spark.serializer.SerializationStream.writeValue(Serializer.scala:135)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:239)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:152)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/10/25 12:43:12 ERROR Executor: Exception in task 8.0 in stage 0.0 (TID 8)
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Float
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:80)
at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:138)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:52)
at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:41)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:207)
at org.apache.spark.serializer.SerializationStream.writeValue(Serializer.scala:135)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:239)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:152)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/10/25 12:43:12 INFO TaskSetManager: Starting task 18.0 in stage 0.0 (TID 18, localhost, executor driver, partition 18, ANY, 6158 bytes)
17/10/25 12:43:12 INFO Executor: Running task 18.0 in stage 0.0 (TID 18)
17/10/25 12:43:12 INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19, localhost, executor driver, partition 19, ANY, 6156 bytes)
17/10/25 12:43:12 INFO Executor: Running task 19.0 in stage 0.0 (TID 19)
17/10/25 12:43:12 WARN TaskSetManager: Lost task 8.0 in stage 0.0 (TID 8, localhost, executor driver): java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Float
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:80)
at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:138)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:52)
at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:41)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:207)
at org.apache.spark.serializer.SerializationStream.writeValue(Serializer.scala:135)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:239)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:152)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
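The ClassCastException is raised inside Avro's GenericDatumWriter: the trace passes through writeArray, so an element of a float-typed array field is arriving as a boxed java.lang.Double. A small standalone sketch that reproduces the same failure mode (the record and field names here are made up, not ADAM's actual schema):

import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

// Hypothetical record with a single float field.
val schema = new Schema.Parser().parse(
  """{"type":"record","name":"Example","fields":[{"name":"score","type":"float"}]}""")
val rec: GenericRecord = new GenericData.Record(schema)
rec.put("score", java.lang.Double.valueOf(0.5))  // a Double where the schema says float

val writer = new GenericDatumWriter[GenericRecord](schema)
val encoder = EncoderFactory.get().binaryEncoder(new ByteArrayOutputStream(), null)
// Throws: java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Float
writer.write(rec, encoder)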

@Rokshan2016

Rokshan2016 commented Oct 25, 2017

And this time it just printed the header, like this:
SRR1517974P.vcf_head

##fileformat=VCFv4.2
##FILTER=<ID=HETINDELMAXAF,Description="Allelic fraction was above 0.666000 for a het INDEL.">
##FILTER=<ID=HETINDELMINAF,Description="Allelic fraction was below 0.333000 for a het INDEL.">
##FILTER=<ID=HETINDELQD,Description="Quality by depth was below 2.000000 for a heterozygous INDEL.">
##FILTER=<ID=HETSNPMAXAF,Description="Allelic fraction was above 0.666000 for a het SNP.">
##FILTER=<ID=HETSNPMINAF,Description="Allelic fraction was below 0.333000 for a het SNP.">
##FILTER=<ID=HETSNPQD,Description="Quality by depth was below 2.000000 for a heterozygous SNP.">
##FILTER=<ID=HOMINDELMINAF,Description="Allelic fraction was below 0.666000 for a hom INDEL.">
##FILTER=<ID=HOMINDELQD,Description="Quality by depth was below 1.000000 for a homozygous INDEL.">
##FILTER=<ID=HOMSNPMINAF,Description="Allelic fraction was below 0.666000 for a hom SNP.">
##FILTER=<ID=HOMSNPQD,Description="Quality by depth was below 1.000000 for a homozygous SNP.">
##FILTER=<ID=INDELMAXDP,Description="Read depth was above 200 for a INDEL.">
##FILTER=<ID=INDELMINDP,Description="Read depth was below 10 for a INDEL.">
##FILTER=<ID=SNPMAXDP,Description="Read depth was above 200 for a SNP.">
##FILTER=<ID=SNPMINDP,Description="Read depth was below 10 for a SNP.">
##FILTER=<ID=SNPMQ,Description="RMS mapping quality was below 30.000000 for a SNP.">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##FORMAT=<ID=FT,Number=.,Type=String,Description="Genotype-level filter">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the gVCF block">
##FORMAT=<ID=MQ,Number=1,Type=Float,Description="Root mean square (RMS) mapping quality">
##FORMAT=<ID=MQ0,Number=1,Type=Float,Description="Total number of reads with mapping quality=0">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PQ,Number=1,Type=Float,Description="Read-backed phasing quality">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phase set ID">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##INFO=<ID=1000G,Number=0,Type=Flag,Description="Membership in 1000 Genomes">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral allele">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">
##INFO=<ID=AD,Number=R,Type=Integer,Description="Total read depths for each allele">
##INFO=<ID=ADF,Number=R,Type=Integer,Description="Read depths for each allele on the forward strand">
##INFO=<ID=ADR,Number=R,Type=Integer,Description="Read depths for each allele on the reverse strand">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency for each allele">
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO'">
##INFO=<ID=CIGAR,Number=A,Type=String,Description="Cigar string describing how to align alternate alleles to the reference allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="Membership in dbSNP">
##INFO=<ID=H2,Number=0,Type=Flag,Description="Membership in HapMap2">
##INFO=<ID=H3,Number=0,Type=Flag,Description="Membership in HapMap3">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic event">
##INFO=<ID=VALIDATED,Number=0,Type=Flag,Description="Validated by follow-up experiment">
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
##contig=<ID=7,length=159138663>
##contig=<ID=8,length=146364022>
##contig=<ID=9,length=141213431>
##contig=<ID=10,length=135534747>
##contig=<ID=11,length=135006516>
##contig=<ID=12,length=133851895>
##contig=<ID=13,length=115169878>
##contig=<ID=14,length=107349540>
##contig=<ID=15,length=102531392>
##contig=<ID=16,length=90354753>
##contig=<ID=17,length=81195210>
##contig=<ID=18,length=78077248>
##contig=<ID=19,length=59128983>
##contig=<ID=20,length=63025520>
##contig=<ID=21,length=48129895>
##contig=<ID=22,length=51304566>
##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT $SRR1517974
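A quick way to confirm that the written output really contains only the header, independent of how it was produced, is to count the non-header lines. This is a sketch from spark-shell, using the output path from the adam2vcf command above:

// Count data lines (everything not starting with '#') in the VCF that adam2vcf wrote;
// a count of 0 would confirm that only the header made it out.
val vcfLines = sc.textFile("hdfs://ipsawdvpvfhnn03.ips.local:8020/user/rokshan.jahan/data/SRR1517974P.vcf")
val dataLines = vcfLines.filter(line => !line.startsWith("#"))
println(s"non-header VCF lines: ${dataLines.count()}")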

##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT $SRR1517974

Rokshan2016 commented Oct 25, 2017

SRR1517974.fastq -> bwa alignment -> SRR1517974.sam -> sort + mark duplicates + base recalibration -> SRR1517974.adam -> variant calling with avocado -> SRR1517974A.adam

Member

heuermh commented Oct 25, 2017

Note that -coalesce 1 is not the same as -single. When transforming Variants in Parquet+Avro format to VCF, if you want the VCF in a single file, you need the -single argument.

$ adam-submit transformVariants in.vcf in.variants.adam

$ ls -ls in.variants.adam/
total 152
 0 -rw-r--r--  1       0 Oct 25 11:49 _SUCCESS
32 -rw-r--r--  1   13652 Oct 25 11:49 _common_metadata
24 -rw-r--r--  1    9419 Oct 25 11:49 _header
40 -rw-r--r--  1   18408 Oct 25 11:49 _metadata
 8 -rw-r--r--  1    1398 Oct 25 11:49 _seqdict.avro
48 -rw-r--r--  1   20904 Oct 25 11:49 part-r-00000.gz.parquet

With -single, all of the output partitions are merged into a single VCF file:

$ adam-submit transformVariants -single in.variants.adam out.vcf 

$ ls -ls out.vcf 
24 -rw-r--r--  1   10552 Oct 25 11:51 out.vcf

If you leave out -single, then you get a directory of partitions of the VCF file:

$ adam-submit transformVariants in.variants.adam out.vcf

$ ls -ls out.vcf 
total 24
 0 -rw-r--r--  1 heuermh  staff      0 Oct 25 11:52 _SUCCESS
24 -rw-r--r--  1 heuermh  staff  10552 Oct 25 11:52 part-r-00000

(In this small example, there is only enough data to fill one partition. A larger data set would show multiple part-r-* files.)
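
(To make the contrast concrete, here is a rough sketch of -coalesce 1 without -single; the file names below are hypothetical, not from an actual run. Coalescing only controls how many part files get written, so the output is still a directory rather than a single VCF file.)

$ adam-submit transformVariants -coalesce 1 in.variants.adam out.vcf

$ ls out.vcf/
_SUCCESS    part-r-00000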

Rokshan2016 commented Oct 25, 2017

OK, got that. I tried with the -single option, but it is giving the same error. Any suggestions for the latest ADAM version? Because I am using Avocado on Spark 2, I am using the latest ADAM version, and am trying to use the adam2vcf command.

Rokshan2016 commented Oct 25, 2017

Is there any option in Avocado that I can use to get the output in one single file? Because I am getting 199 Parquet files.

Rokshan2016 commented Oct 25, 2017

Hi @

I tried this command:
./adam-submit transformVariants -single hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/SRR1518011.adam hdfs://ip-10-48-3-5.ips.local:8020/user/rokshan.jahan/data/SRR1518011.vcf

Now I am getting this issue:

17/10/25 13:21:41 INFO scheduler.TaskSetManager: Starting task 14.0 in stage 0.0 (TID 14, ip-10-48-3-65.ips.local, executor 4, partition 14, NODE_LOCAL, 2310 bytes)
17/10/25 13:21:41 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-10-48-3-65.ips.local, executor 4): org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch. Avro field 'variant' not found.
at org.apache.parquet.avro.AvroIndexedRecordConverter.getAvroField(AvroIndexedRecordConverter.java:128)
at org.apache.parquet.avro.AvroIndexedRecordConverter.(AvroIndexedRecordConverter.java:89)
at org.apache.parquet.avro.AvroIndexedRecordConverter.(AvroIndexedRecordConverter.java:64)
at org.apache.parquet.avro.AvroCompatRecordMaterializer.(AvroCompatRecordMaterializer.java:34)
at org.apache.parquet.avro.AvroReadSupport.newCompatMaterializer(AvroReadSupport.java:138)
at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:130)
at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:179)
at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:201)
at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:145)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.(NewHadoopRDD.scala:168)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

17/10/25 13:21:41 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 0.0 (TID 15, ip-10-48-3-65.ips.local, executor 4, partition 0, NODE_LOCAL, 2310 bytes)
17/10/25 13:21:41 INFO scheduler.TaskSetManager: Lost task 14.0 in stage 0.0 (TID 14) on ip-10-48-3-65.ips.local, executor 4: org.apache.parquet.io.InvalidRecordException (Parquet/Avro schema mismatch. Avro field 'variant' not found.) [duplicate 1]
17/10/25 13:21:41 INFO scheduler.TaskSetManager: Starting task 14.1 in stage 0.0 (TID 16, ip-10-48-3-65.ips.local, executor 4, partition 14, NODE_LOCAL, 2310 bytes)
17/10/25 13:21:41 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 0.0 (TID 15) on ip-10-48-3-65.ips.local, executor 4: org.apache.parquet.io.InvalidRecordException (Parquet/Avro schema mismatch. Avro field 'variant' not found.) [duplicate 2]
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 0.0 (TID 17, ip-10-48-3-65.ips.local, executor 8, partition 0, NODE_LOCAL, 2310 bytes)
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on ip-10-48-3-65.ips.local, executor 8: org.apache.parquet.io.InvalidRecordException (Parquet/Avro schema mismatch. Avro field 'variant' not found.) [duplicate 3]
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 0.0 (TID 18, ip-10-48-3-65.ips.local, executor 4, partition 1, NODE_LOCAL, 2310 bytes)
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Lost task 14.1 in stage 0.0 (TID 16) on ip-10-48-3-65.ips.local, executor 4: org.apache.parquet.io.InvalidRecordException (Parquet/Avro schema mismatch. Avro field 'variant' not found.) [duplicate 4]
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Starting task 14.2 in stage 0.0 (TID 19, ip-10-48-3-65.ips.local, executor 4, partition 14, NODE_LOCAL, 2310 bytes)
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Lost task 1.1 in stage 0.0 (TID 18) on ip-10-48-3-65.ips.local, executor 4: org.apache.parquet.io.InvalidRecordException (Parquet/Avro schema mismatch. Avro field 'variant' not found.) [duplicate 5]
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Starting task 1.2 in stage 0.0 (TID 20, ip-10-48-3-65.ips.local, executor 4, partition 1, NODE_LOCAL, 2310 bytes)
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Lost task 14.2 in stage 0.0 (TID 19) on ip-10-48-3-65.ips.local, executor 4: org.apache.parquet.io.InvalidRecordException (Parquet/Avro schema mismatch. Avro field 'variant' not found.) [duplicate 6]
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Starting task 14.3 in stage 0.0 (TID 21, ip-10-48-3-65.ips.local, executor 8, partition 14, NODE_LOCAL, 2310 bytes)
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Lost task 0.2 in stage 0.0 (TID 17) on ip-10-48-3-65.ips.local, executor 8: org.apache.parquet.io.InvalidRecordException (Parquet/Avro schema mismatch. Avro field 'variant' not found.) [duplicate 7]
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 0.0 (TID 22, ip-10-48-3-65.ips.local, executor 4, partition 0, NODE_LOCAL, 2310 bytes)
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Lost task 1.2 in stage 0.0 (TID 20) on ip-10-48-3-65.ips.local, executor 4: org.apache.parquet.io.InvalidRecordException (Parquet/Avro schema mismatch. Avro field 'variant' not found.) [duplicate 8]
17/10/25 13:21:42 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-10-48-3-64.ips.local:38844 (size: 28.8 KB, free: 530.0 MB)
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Starting task 1.3 in stage 0.0 (TID 23, ip-10-48-3-65.ips.local, executor 8, partition 1, NODE_LOCAL, 2310 bytes)
17/10/25 13:21:42 INFO scheduler.TaskSetManager: Lost task 14.3 in stage 0.0 (TID 21) on ip-10-48-3-65.ips.local, executor 8: org.apache.parquet.io.InvalidRecordException (Parquet/Avro schema mismatch. Avro field 'variant' not found.) [duplicate 9]
17/10/25 13:21:42 ERROR scheduler.TaskSetManager: Task 14 in stage 0.0 failed 4 times; aborting job
17/10/25 13:21:42 INFO cluster.YarnScheduler: Cancelling stage 0
17/10/25 13:21:42 INFO cluster.YarnScheduler: Stage 0 was cancelled
17/10/

Member

fnothaft commented Oct 25, 2017

Hi @Rokshan2016! Do you know what versions of ADAM and Avocado you are running? It looks like you are running incompatible versions of the two tools. If you call either tool with --version, it will print the Git commit hashes your JAR was built from.

BTW, you can't use transformVariants with the output of Avocado; you need to use transformGenotypes. Additionally, bigdatagenomics/avocado#266 added code that directly exports Avocado's genotype output to VCF.
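
As an illustration only, a minimal sketch of checking the versions and converting the avocado genotype output from the pipeline above (assuming transformGenotypes accepts the same -single flag as transformVariants, with HDFS paths abbreviated; not an actual run):

$ ./adam-submit --version

$ ./adam-submit transformGenotypes -single hdfs://.../SRR1517974A.adam hdfs://.../SRR1517974.vcf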

Rokshan2016 commented Oct 25, 2017

Hi @fnothaft
I am using the following versions --

ADAM version: 0.22.0
Built for: Apache Spark 2.1.0, Scala 2.11.8, and Hadoop 2.7.3

Avocado :

Avocado version: 0.0.3-SNAPSHOT
Commit: ${git.commit.id} Build: ${timestamp}
Built for: Scala 2.11.8 and Hadoop 2.6.0

** In the new ADAM version, I do not find a transformGenotypes option, but I found the following options:

ADAM ACTIONS
countKmers : Counts the k-mers/q-mers from a read dataset.
countContigKmers : Counts the k-mers/q-mers from a read dataset.
transform : Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations
transformFeatures : Convert a file with sequence features into corresponding ADAM format and vice versa
mergeShards : Merges the shards of a file
reads2coverage : Calculate the coverage from a given ADAM file

CONVERSION OPERATIONS
vcf2adam : Convert a VCF file to the corresponding ADAM format
adam2vcf : Convert an ADAM variant to the VCF ADAM format
fasta2adam : Converts a text FASTA sequence file into an ADAMNucleotideContig Parquet file which represents assembled sequences.
adam2fasta : Convert ADAM nucleotide contig fragments to FASTA files
adam2fastq : Convert BAM to FASTQ files
fragments2reads : Convert alignment records into fragment records.
reads2fragments : Convert alignment records into fragment records.

PRINT
print : Print an ADAM formatted file
flagstat : Print statistics on reads in an ADAM file (similar to samtools flagstat)
view : View certain reads from an alignment-record file.

I tried adam2vcf with the latest ADAM version, but it is giving this error.

Command:
./adam-submit --driver-memory 3g --executor-memory 3g -- adam2vcf hdfs://ipsawdvpvfhnn03.ips.local:8020/user/rokshan.jahan/data/SRR1517974A.adam hdfs://ipsawdvpvfhnn03.ips.local:8020/user/rokshan.jahan/data/SRR1517974P.vcf

Error:
: 22365 length: 22365 hosts: []}
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
17/10/25 12:43:11 INFO ZlibFactory: Successfully loaded & initialized native-zlib library
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:11 INFO Executor: Finished task 13.0 in stage 0.0 (TID 13). 1264 bytes result sent to driver
Oct 25, 2017 12:43:10 PM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 200
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 1 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 3 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 3 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 1 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 2 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 3 records.
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 36 ms. row count = 2
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 34 ms. row count = 3
Oct 25, 2017 12:43:11 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in17/10/25 12:43:11 INFO TaskSetManager: Starting task 16.0 in stage 0.0 (TID 16, localhost, executor driver, partition 16, ANY, 6157 bytes)
17/10/25 12:43:11 INFO Executor: Running task 16.0 in stage 0.0 (TID 16)
17/10/25 12:43:11 INFO NewHadoopRDD: Input split: ParquetInputSplit{part: hdfs://ipsawdvpvfhnn03.ips.local:8020/user/rokshan.jahan/data/SRR1517974A.adam/part-r-00016.gz.parquet start: 0 end: 22365 length: 22365 hosts: []}
17/10/25 12:43:11 INFO Executor: Finished task 16.0 in stage 0.0 (TID 16). 1191 bytes result sent to driver
17/10/25 12:43:11 INFO TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17, localhost, executor driver, partition 17, ANY, 6158 bytes)
17/10/25 12:43:11 INFO Executor: Running task 17.0 in stage 0.0 (TID 17)
17/10/25 12:43:11 INFO NewHadoopRDD: Input split: ParquetInputSplit{part: hdfs://ipsawdvpvfhnn03.ips.local:8020/user/rokshan.jahan/data/SRR1517974A.adam/part-r-00017.gz.parquet start: 0 end: 32533 length: 32533 hosts: []}
17/10/25 12:43:11 INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 671 ms on localhost (executor driver) (1/200)
17/10/25 12:43:11 INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 52 ms on localhost (executor driver) (2/200)
17/10/25 12:43:11 INFO CodecPool: Got brand-new decompressor [.gz]
17/10/25 12:43:12 ERROR Executor: Exception in task 10.0 in stage 0.0 (TID 10)
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Float
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:80)
at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:138)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:52)
at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:41)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:207)
at org.apache.spark.serializer.SerializationStream.writeValue(Serializer.scala:135)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:239)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:152)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo

** I am hoping we can have some tool that converts the Avocado output (.adam) to a .vcf file.

Rokshan2016 commented Oct 26, 2017

Hi, transformGenotypes works fine.

Member

heuermh commented Nov 7, 2017

Thank you, @Rokshan2016

heuermh closed this Nov 7, 2017

heuermh added this to the 0.23.0 milestone Dec 7, 2017

heuermh added this to Completed in Release 0.23.0 Jan 4, 2018
