Avro.GenericData error with ADAM 0.12.0 on reading from ADAM file #290

Closed
pkondratyuk opened this Issue Jul 4, 2014 · 20 comments

pkondratyuk commented Jul 4, 2014

We've built a simple program based on the adam-core API that reads from an ADAM file and tries to write a SAM file. It compiles fine, but for some reason it fails at runtime with a strange-looking exception:

java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.bdgenomics.adam.avro.ADAMRecord

What could be going on here? The code is below; it's pretty basic:

def main(args: Array[String]) {
  val adamFolder: String = "hdfs://test/user/root/testAdamFiles"
  val samFile: String = "/usr/local/adam/testSAM/file.sam"

  val sc = new SparkContext("local", "Simple App", "/usr/lib/spark",
    List("/root/NetBeansProjects/mavenscala1/target/mavenscala1-1.0.jar"))

  val adamContext: ADAMContext = new ADAMContext(sc)

  val rddAdamRec: RDD[ADAMRecord] = adamContext.adamLoad(adamFolder)

  val adamFunctions: ADAMRecordRDDFunctions = new ADAMRecordRDDFunctions(rddAdamRec)

  adamFunctions.adamSAMSave(samFile, true) // save as SAM
}


AmplabJenkins commented Jul 4, 2014

This error occurs when Spark isn't able to find the ADAMRecord class on
your classpath. It typically occurs when you forget to specify the -spark_jar
option on the command line (which will copy the jar file to all the worker
nodes).

Let us know if that fixes your problem.

Matt http://www.linkedin.com/in/mattmassie/ Massie
http://www.twitter.com/matt_massie
UC, Berkeley AMPLab https://twitter.com/amplab


AmplabJenkins commented Jul 4, 2014

Looking at your code more closely: (1) add the ADAM jar to the SparkContext
(ignore my command-line advice), and (2) you need to put your SAM file onto
HDFS. You almost never want to mix local and distributed processing.


fnothaft commented Jul 4, 2014

@massie @AmplabJenkins Good catch! Perhaps @pkondratyuk can reuse the Spark context creation code inside of SparkCommand?

pkondratyuk commented Jul 4, 2014

Hm... That does not seem to help. We tried adding the adam-(version).jar with sc.addJar() and the error is still there. Earlier we also tried passing our "fat jar" to the SparkContext constructor, to the same effect.

I see that SparkCommand creates the spark context in a different manner... Should we try that method?

fnothaft commented Jul 5, 2014

@pkondratyuk I just got an email from a coworker who had a similar issue; I think it's something more subtle. Over the last fortnight, we've moved the schemas out of the core ADAM project (this repo) and into the bdg-formats repository. We did this to make it easier to package the Avro schemas for different languages (e.g., the JVM, C, C++, Python, etc.). It's possible that you converted read files to ADAM using a new build of ADAM, so the schemas were stored using the org.bdgenomics.formats.avro.ADAMRecord schema. Now you're trying to read them with the org.bdgenomics.adam.avro.ADAMRecord schema, and Avro is getting confused because the package paths for the schemas don't match up.*

Can you send the output of the following command? You should run this on whichever ADAM jar you used to convert files to ADAM.

java -jar adam-cli/target/adam-0.12.1-SNAPSHOT.jar buildinfo

If it returns a commit later than #280, then that's our culprit!

I think the best way to fix this is to point at the bdg-formats repository for your schemas. Confusingly enough, a 0.12.1-SNAPSHOT release exists for the adam-formats project... This confusion should be fixed when we cut a new 0.13.0 or 0.12.1 release of ADAM.

* Just to expand on this, normally Avro gracefully handles schema changes by using the default value for new fields and by dropping old fields. However, since we changed the package path of the schemas, Avro is understandably confused.
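To illustrate that footnote with a toy example: Avro's schema resolution keys records by their full name (namespace plus name), so a package move looks like an entirely different record even when the fields are identical. This is a minimal plain-Python sketch of that matching rule, not the real Avro implementation:

```python
# Hedged sketch (plain Python, not the Avro library): a writer's record
# schema resolves against a reader's record schema only when the record
# full names agree, so moving ADAMRecord between packages breaks reads
# even though the fields are unchanged.

def fullname(schema: dict) -> str:
    """Full name of a record schema: namespace dot name, per the Avro spec."""
    return f"{schema['namespace']}.{schema['name']}"

def records_match(writer: dict, reader: dict) -> bool:
    """Simplified record-resolution check: full names must agree.
    (Real Avro also consults reader-side aliases.)"""
    return fullname(writer) == fullname(reader)

old = {"name": "ADAMRecord", "namespace": "org.bdgenomics.adam.avro"}
new = {"name": "ADAMRecord", "namespace": "org.bdgenomics.formats.avro"}

print(records_match(old, old))  # True: same full name, resolution succeeds
print(records_match(old, new))  # False: namespace changed, resolution fails
```

Real Avro offers schema aliases for exactly this kind of rename, but they have to be declared on the reader's schema.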

pkondratyuk commented Jul 5, 2014

@fnothaft: I have tried switching to ADAM 0.11.0 and, surprisingly, the problem is still there... The current ADAM files we are attempting to read were created with that version too.

The buildinfo query you mentioned gives a NullPointerException with adam-0.11.0.jar; I don't know how to diagnose it any further...

What should I add to the pom.xml file in order to point to the bdg-formats repository? Is there a way to attach the pom file to my comment, so that you could maybe spot a problem with it?

Currently, our Scala program pulls the adam-0.11.0 jars from the Apache repository, but trying to add a dependency on bdg-formats fails because of a missing pom file.

fnothaft commented Jul 5, 2014

@pkondratyuk Could you send the pom.xml you're using to build? I think we can resolve this issue more quickly if I can reproduce it on my end. You can either post it here, or email it to me at fnothaft@berkeley.edu.

fnothaft commented Jul 5, 2014

@pkondratyuk I've just sent you an email back; I believe it's an issue with transitive dependencies. Let me know if it resolves your issues. I've opened #289 to add further docs to the wiki about building apps downstream of ADAM.

tdanford commented Jul 24, 2014

@fnothaft @pkondratyuk Can I close this issue? Did the email (and #289) resolve the problem?

@tdanford tdanford added bug and removed discussion labels Jul 24, 2014

pkondratyuk commented Jul 24, 2014

@tdanford:

We have not found a way to solve the problem; trying to exclude dependencies did not work on our system.

We are still searching for a solution.

heuermh commented Nov 2, 2015

@pkondratyuk recent build changes (currently well past 0.12.0, we're now at version 0.18.1) should improve the situation for extending ADAM. Please let me know if you are still looking at this.

pkondratyuk commented Nov 2, 2015

Hi,
We are no longer using that code.
Thanks,
Peter


fnothaft commented Nov 2, 2015

Thanks @pkondratyuk!

@fnothaft fnothaft closed this Nov 2, 2015

xchen97 commented Nov 16, 2015

I am using the recent version of adam-cli_2.10-0.18.1-SNAPSHOT.jar and hit a similar issue running actions such as count_kmers and adam2vcf.
Here is part of the error output:
Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.bdgenomics.formats.avro.Genotype
at org.bdgenomics.adam.rdd.variation.GenotypeRDDFunctions$$anonfun$10.apply(VariationRDDFunctions.scala:151)
at org.apache.spark.rdd.RDD$$anonfun$keyBy$1$$anonfun$apply$52.apply(RDD.scala:1461)
at org.apache.spark.rdd.RDD$$anonfun$keyBy$1$$anonfun$apply$52.apply(RDD.scala:1461)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:119)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Nov 16, 2015 4:35:41 PM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 79

massie commented Nov 16, 2015

It appears you are trying to load an ADAM file with AlignmentRecords and convert it to the VCF format. That is not a supported operation. You need to load an ADAM file with Genotype records in order to generate VCF.

Matt


xchen97 commented Nov 16, 2015

Thanks Matt.

Is there a variant calling function I can use to generate a VCF file from an ADAM file with AlignmentRecords?

xchen97 commented Nov 16, 2015

I was able to run a similar job on GATK to generate a VCF file:

java -jar ./GenomeAnalysisTK.jar -T HaplotypeCaller -R reference-hg19-1m.fa -I ~/chr1_platinum-genomes-bam-NA12877_S1.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o chr1_platinum-genomes-bam-NA12877_S1_variants.vcf

xchen97 commented Nov 17, 2015

I re-tested with the included test file "small.sam": transform works, but I got the same error with flagstat. Is there any doc or debugging tool that could help here?

xchen97 commented Nov 24, 2015

Hi Matt, the issue seems to be caused by Spark 1.5.2. Can you please take a look into it? Thanks.

fnothaft commented Nov 24, 2015

Thanks for the ping @xchen97. I've been looking into this the last few days. It appears that somewhere between Spark 1.4 and Spark 1.5, Spark added a dependency on avro-ipc:1.7.7. This seems to be causing issues with the format schemas. I've opened bigdatagenomics/bdg-formats#69 to move the formats to 1.7.7 but haven't been able to run tests yet to determine whether this fixes the issue. We should have a resolution soon.
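For anyone hitting this from a downstream Maven build in the meantime, one possible workaround is to force a single Avro version via dependencyManagement. This is a hedged sketch, not the project's confirmed fix; the coordinates are the standard Avro artifacts, but the exact version to pin should be checked against your own `mvn dependency:tree` output:

```xml
<!-- pom.xml fragment: pin one Avro version across the build so that
     Spark's transitive avro-ipc:1.7.7 and the bdg-formats schemas agree.
     Illustrative only; verify versions with `mvn dependency:tree`. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.7.7</version>
    </dependency>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-ipc</artifactId>
      <version>1.7.7</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```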
