Update usage docs running for EC2 and CDH #493

Closed
arahuja opened this Issue Nov 19, 2014 · 13 comments

@arahuja
Contributor

arahuja commented Nov 19, 2014

  • Update to use adam-submit
  • Update CDH docs to have YARN and standalone information
@jpdna

jpdna Oct 7, 2016

Member

Regarding EC2, the existing (out of date) documentation is here:
https://github.com/bigdatagenomics/adam/blob/master/docs/source/40_running_on_EC2.md

To confirm, do we want the updated ADAM EC2 quickstart to deploy Spark via the spark-ec2 scripts, as described at:
http://spark.apache.org/docs/1.6.1/ec2-scripts.html

Or do we want to provide cgcloud/Toil or other instructions?

@jpdna

jpdna Oct 7, 2016

Member

Spark's spark-ec2 script as of Spark 1.6.1 doesn't work for us unless we modify it, because it uses Java 7 and we need Java 8.

I still feel like cgcloud/toil is a bit heavyweight for some users, and perhaps we should modify the spark-ec2 script, or make an equivalent, that can be used for simple ADAM deployment on EC2.

But if you think we should only document a cgcloud/toil path for ADAM on EC2, that is fine too.
What do you think, @fnothaft and @heuermh?

@fnothaft

fnothaft Oct 7, 2016

Member

> So Spark's spark-ec2 script as of Spark 1.6.1 doesn't work for us, unless we modify it, due to it using java 7 and we need java 8.

Officially, the spark-ec2 script is beta/unsupported, so we shouldn't use it anyways.

> I still feel like cgcloud/toil is a bit heavy weight for some users, and perhaps we should modify spark-ec2 script, or make an equivalent, that can be used for simple ADAM deployment on EC2.

I would do cgcloud-spark, which is fairly battle tested, well supported by Hannes + co, etc. I would definitely not modify the spark-ec2 script or roll our own. There's a non-trivial amount of work needed to get all of this working, and you run into a shocking number of bugs and weird edge cases.

If time allowed, I would also document ADAM on Elastic Spark, but I would not plan on that as a primary documented route, since the cost model for Elastic Spark is different. If we are interested in that, I can connect you to some folks at AWS.

@jpdna

jpdna Oct 7, 2016

Member

It looks like cgcloud-spark is on Spark 1.5.2 and Java 7.

We should have Java 8 and Spark 1.6.1 to match the next release and current master.

Should we talk with Hannes about updating spark_box?

Also, what is the interaction here with the joda time stuff?

@fnothaft

fnothaft Oct 7, 2016

Member

I'm OK with staying on Spark 1.5.2 or moving to 1.6.1. It's a one line change. I'll open a PR to move to Java 8.

> Also - what is the interaction here with the joda time stuff?

Conductor is impacted by the joda time/Java 8 mismatch, but ADAM isn't. There's a workaround at BD2KGenomics/cgl-docker-lib#187.

@fnothaft fnothaft referenced this issue in BD2KGenomics/cgcloud Oct 7, 2016

Closed

Move cgcloud-spark to Spark 1.6.1, Java 8 #231

@fnothaft

fnothaft Oct 13, 2016

Member

@jpdna yes, that'd be great!

@jpdna

jpdna Oct 16, 2016

Member

BD2KGenomics/cgcloud#234 has me stuck. It must be something dumb I'm missing, as this worked for me several times in the past, including last week on a different machine. My problem is the same on the current cgcloud from pip install cgcloud-core, so it is not due to using your branch, Frank. Let me know any thoughts...

@jpdna

jpdna Oct 16, 2016

Member

> has me stuck,

I unstuck it; I needed a new IAM key.

@fnothaft fnothaft referenced this issue Nov 8, 2016

Closed

HBase backend for Genotypes #1246

0 of 3 tasks complete
@jpdna

jpdna Nov 16, 2016

Member

Blocked on this cgcloud error:
BD2KGenomics/cgcloud#245

@fnothaft

fnothaft Nov 18, 2016

Member

EC2 docs resolved by #1279. CDH docs are forthcoming. @jpdna do you plan to tackle the CDH docs? If not, I'll knock them out tomorrow.

fnothaft added a commit to fnothaft/adam that referenced this issue Dec 4, 2016

@heuermh heuermh closed this in #1301 Dec 6, 2016

heuermh added a commit that referenced this issue Dec 6, 2016

@jpdna

jpdna Dec 6, 2016

Member

> CDH docs

I'm not sure what the CDH docs should include.
Basically, if you have CDH installed and you get an ADAM distribution, it will just work, AFAIK.
I guess I'd show the example of pointing to yarn-client:

adam-submit --master yarn-client --num-executors 10 --executor-cores 2 --executor-memory 20g -- transform hdfs://hdfsmaster/data/input/file1.sam hdfs://hdfsmaster/data/output/file1.adam

If you have an idea of what it should include and it's faster for you to do, then go ahead; otherwise point me in the right direction and I'll look at it today.
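
Since the issue asks for both YARN and standalone information, here is a rough sketch of how the two invocations might sit side by side. It only prints the command lines and does not run them. The YARN line is the one from the comment above. The standalone line is an assumption: `spark://sparkmaster:7077` is a hypothetical master URL, and `--total-executor-cores 20` is chosen to match 10 executors with 2 cores each. `--total-executor-cores` is the spark-submit option for capping cores on a standalone cluster, and this assumes adam-submit forwards everything before `--` to spark-submit, as in the example above.

```shell
#!/bin/sh
# Sketch only: prints adam-submit command lines, does not execute them.
# The HDFS paths come from the example in the thread; the standalone
# master URL below is a hypothetical placeholder.
INPUT="hdfs://hdfsmaster/data/input/file1.sam"
OUTPUT="hdfs://hdfsmaster/data/output/file1.adam"

adam_transform_cmd() {
  case "$1" in
    yarn)
      # Same command as in the comment above (Spark 1.x style master).
      echo "adam-submit --master yarn-client --num-executors 10 --executor-cores 2 --executor-memory 20g -- transform $INPUT $OUTPUT"
      ;;
    standalone)
      # Assumed standalone equivalent: 10 executors x 2 cores = 20 cores total.
      echo "adam-submit --master spark://sparkmaster:7077 --total-executor-cores 20 --executor-memory 20g -- transform $INPUT $OUTPUT"
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

adam_transform_cmd yarn
adam_transform_cmd standalone
```

Note that `--master yarn-client` is the Spark 1.x spelling; from Spark 2.0 on it is deprecated in favor of `--master yarn --deploy-mode client`.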
