ADAM works on Cloudera but does NOT work on MAPR #1475
The Cloudera QuickStart is the latest version, with the latest Spark and Scala versions; it has only one node and uses JDK 1.8. I am using the latest ADAM. These are the steps I have taken on the QuickStart:
1- Downloaded and built ADAM - successful
The MAPR cluster is a much bigger system with many nodes; many Spark projects have been developed and run successfully on it. It has Spark version 1.6.1 and Scala version 2.10.5. I have tried both older and newer versions of ADAM, and with all of them I get the same cast error as soon as the code calls sc.loadAlignments. These are the steps I have taken on the MAPR cluster:
1- Downloaded and built ADAM - successful
Here is the error:
Thank you fnothaft for such a quick response :) I am using the small.sam file that comes with the code. I run transform to create small.adam, and immediately afterwards I run adam-submit flagstat on the newly created small.adam file. I am not sure I understand the question properly, but I am using everything within the same package, taking the same steps on both systems, and I am not using any older ADAM schema. Whether there is a correlation between the ADAM schema and MAPR is not obvious to me. I take the same steps on Cloudera and on the MAPR cluster; one works and the other doesn't. Would you please elaborate? Thanks
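For reference, the workflow described above amounts to roughly these two commands (a sketch; the exact subcommand names depend on the ADAM version in use, and `transform` was renamed `transformAlignments` in later releases):

```
# Convert the bundled SAM file to ADAM/Parquet format
adam-submit transform small.sam small.adam

# Run flagstat on the freshly written ADAM file
adam-submit flagstat small.adam
```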
Thank you heuermh. I have checked spark-defaults.conf and other places, and I do not see any Avro-related jar file being added to the classpath at runtime. Is there any other place I specifically need to check?
I also noticed something else that might be causing the issue: on the Cloudera QuickStart the Apache Avro library is installed and integrated with Spark, but on MAPR it is not. The question is, do I need to add the Avro jar file to spark-defaults.conf? For example, by adding this to spark-defaults.conf?
Do I need to add any other libraries, jars, or anything else to Spark 1.6.1 on MAPR so that it works? Thanks again
I have resolved the issue, and I am going to write up the solution in detail because no one should ever be subjected to that kind of pain again. Before doing that, I want to thank fnothaft, heuermh, and everybody else for trying to help.
If you look at the pom.xml in the ADAM code and search for avro, depending on the ADAM version you are using, you will find something like this:
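The original snippet is not shown; a sketch of the kind of entry you would find follows (the exact property name and structure depend on your ADAM release, so treat this as illustrative):

```xml
<properties>
  <avro.version>1.8.0</avro.version>
</properties>
...
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>${avro.version}</version>
</dependency>
```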
That means that my version of ADAM uses Avro 1.8.0, so you have to tell Spark to use that same version. It turns out that as of Spark 1.5.2 (the Spark version on my MAPR cluster is 1.6.1) you have to take the following steps to integrate Spark SQL with Avro: download the matching version of the Avro jar (in my case avro-1.8.0.jar) and add it to both spark.executor.extraClassPath and spark.driver.extraClassPath.
To do this on MAPR you have to go to /opt/mapr/spark/spark-version/conf and open the spark-defaults.conf file.
Inside that file you have to add the following lines:
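The original lines are not shown; based on the properties named earlier in this thread, they would look roughly like this (the jar path is a placeholder, substitute wherever you placed avro-1.8.0.jar):

```
# spark-defaults.conf: put the matching Avro jar on both classpaths
spark.executor.extraClassPath  /path/to/avro-1.8.0.jar
spark.driver.extraClassPath    /path/to/avro-1.8.0.jar
```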
If you don’t have admin privileges on your MAPR cluster and cannot modify /opt/mapr/spark/spark-version/conf/spark-defaults.conf, it is not a big deal. Copy the /opt/mapr/spark/spark-version/conf/ directory to /your desirable path/conf, modify spark-defaults.conf there, and then on MAPR do an export:
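The export command is not shown above; Spark reads the SPARK_CONF_DIR environment variable to locate spark-defaults.conf, so it would look like this (the path is a placeholder for wherever you copied the conf directory):

```shell
# Point Spark at the writable copy of the conf directory
export SPARK_CONF_DIR=/your/desirable/path/conf
```

With this set, spark-submit (and hence adam-submit) picks up the modified spark-defaults.conf without touching the system-wide installation.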
Great to hear!
For your information, things might get better/worse/more complicated in the near future. The dependency version for avro in ADAM was bumped to 1.8.1 in commit 9505d47, and an attempt to move to Parquet version 1.8.2 causes some kind of runtime conflict in Apache Spark related to avro (see e.g. https://issues.apache.org/jira/browse/SPARK-19697).
Meanwhile, is it ok to close this issue?