Work With ADAM fasta2adam in a distributed mode #881

guerai · 2015-11-12T09:53:41Z

Hello,
I'm a beginner in genomics, but I'm studying k-mer sequences and I'm using Spark on Hadoop. I want to test ADAM's performance on my cluster. As a first step, I want to convert my FASTA test files to ADAM files using fasta2adam.
I'm using the adam-submit command:
adam-submit --conf spark.yarn.jar ${SPARK_HOME}/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar --master yarn-cluster --driver-memory 4g --num-executors 15 --executor-cores 2 --executor-memory 4g fasta2adam INPUT_FILE OUTPUT_FILE

but the command does not run on the cluster; it runs on a single node instead.
Could someone help me?
Thanks for your time.
Best regards

heuermh · 2015-11-12T16:56:19Z

I'm not sure why ${SPARK_HOME}/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar is there. Might it be mistakenly interpreted as the application jar?

https://spark.apache.org/docs/1.5.1/submitting-applications.html#launching-applications-with-spark-submit

guerai · 2015-11-12T19:18:20Z

Sorry, it was a misprint. I want to run the fasta2adam conversion on my test cluster, and I'm using this command:
adam-submit --master yarn-cluster --driver-memory 4g --num-executors 15 --executor-cores 2 --executor-memory 4g fasta2adam INPUT_FILE OUTPUT_FILE
The cluster works well with Hadoop and Spark, but not when using adam-submit.
Thanks for your time

heuermh · 2015-11-12T19:49:57Z

You might try

adam-submit \
  --master yarn-cluster \
  --driver-memory 4g \
  --num-executors 15 \
  --executor-cores 2 \
  --executor-memory 4g \
  -- \
  fasta2adam INPUT_FILE OUTPUT_FILE

Note the -- separating the Spark and ADAM options. This feature was added recently and may not be very obvious from the documentation (e.g. I don't see it mentioned anywhere in the current version of README.md).

It is shown in the usage doc

adam-submit \
  --master yarn-cluster \
  --driver-memory 4g \
  --num-executors 15 \
  --executor-cores 2 \
  --executor-memory 4g \
  fasta2adam adam-core/src/test/resources/artificial.fa artificial.adam

Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/usr/local/bin/spark-submit


     e            888~-_              e                 e    e
    d8b           888   \            d8b               d8b  d8b
   /Y88b          888    |          /Y88b             d888bdY88b
  /  Y88b         888    |         /  Y88b           / Y88Y Y888b
 /____Y88b        888   /         /____Y88b         /   YY   Y888b
/      Y88b       888_-~         /      Y88b       /          Y888b

Usage: adam-submit [<spark-args> --] <adam-args>
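
To make the usage line above concrete, here is a minimal bash sketch of how a wrapper script could split its arguments at the first --. This is not the actual adam-submit source; the function name split_args and the array names SPARK_ARGS and APP_ARGS are hypothetical, chosen only for illustration. It assumes that, when no -- is present, everything is treated as application arguments, which is what the optional [<spark-args> --] in the usage line suggests.

```shell
#!/usr/bin/env bash
# Sketch only: split an argument list at the first "--".
# NOT the real adam-submit implementation; all names here are hypothetical.

split_args() {
  SPARK_ARGS=()
  APP_ARGS=()
  local seen_separator=false
  local arg
  for arg in "$@"; do
    if [ "$seen_separator" = false ] && [ "$arg" = "--" ]; then
      # First "--": everything after it belongs to the application.
      seen_separator=true
    elif [ "$seen_separator" = true ]; then
      APP_ARGS+=("$arg")
    else
      SPARK_ARGS+=("$arg")
    fi
  done
  # Assumption: with no separator, nothing is passed on as a Spark option.
  if [ "$seen_separator" = false ]; then
    APP_ARGS=("${SPARK_ARGS[@]}")
    SPARK_ARGS=()
  fi
}
```

Under this assumption, calling split_args --master yarn-cluster -- fasta2adam INPUT_FILE OUTPUT_FILE leaves two entries in SPARK_ARGS and three in APP_ARGS, while the same call without -- leaves SPARK_ARGS empty, so options such as --master would never reach spark-submit.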

guerai · 2015-11-13T19:41:16Z

Thanks for your answer. My job now starts on my cluster, and I get all the executors that I set on the command line. I'm trying to convert my file now. Thanks for your help; I'll post an update on my progress.
Best regards

guerai · 2015-11-14T10:51:04Z

Thanks, with your help I was able to convert a FASTA file to an ADAM file, but when I try to count k-mers:

adam-submit --master yarn-cluster --driver-memory 4g --num-executors 15 --executor-cores 2 --executor-memory 4g -- count_kmers hdfs://INPUTFILE hdfs://OUTPUTFILE 20

I always receive the same error:
User class threw exception: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://OUTPUTFILE already exists

even if I change the output path on every execution.
It happens both when I use as input the entire directory of the converted file, which contains all the part-** sub-files, and when I use as input the single merged file created with hadoop fs -getmerge.
I'm looking for another way to solve it.
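
The exception above is thrown by Hadoop when the output directory is already present at the moment the job tries to write. A pre-submit check like the sketch below can at least rule out a leftover directory. It relies on hadoop fs -test -e, which exits with status 0 when the path exists; HADOOP_BIN and output_exists are illustrative names, not part of ADAM or Hadoop. One possibility that the thread does not confirm: in yarn-cluster mode a failed first application attempt may create the directory, so that a retry reports this exception instead of the original error. In that case the check passes and the YARN logs of the first attempt are the place to look.

```shell
#!/usr/bin/env bash
# Sketch: check the output path before submitting a job.
# HADOOP_BIN and output_exists are hypothetical names used for illustration.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# Succeeds (exit status 0) when the given path already exists.
output_exists() {
  "$HADOOP_BIN" fs -test -e "$1"
}

# Example use (OUTPUT is a placeholder for the real output path):
#   if output_exists "$OUTPUT"; then
#     echo "refusing to overwrite $OUTPUT" >&2
#     exit 1
#   fi
```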

fnothaft · 2016-07-06T15:54:57Z

Closing as resolved/not an ADAM bug.

fnothaft closed this as completed Jul 6, 2016
