CGCloud deploy docs #1279

Merged
merged 1 commit into from Nov 18, 2016

Member

jpdna commented Nov 17, 2016

No description provided.

jpdna changed the title from "CGCloud deply docs" to "CGCloud deploy docs" on Nov 17, 2016
Member

fnothaft left a comment

Couple of small nits, otherwise looks great! Thanks @jpdna!


#### Launch a cluster

Spin up a Spark cluster with one master and two slave nodes with the command:

fnothaft Nov 17, 2016

Member

Prefer leader/worker to master/slave.

Also, I would note in the documents that you're setting up a cluster where the workers are m3.large. Somewhat obvious, I concede, but it's useful to note that you can set a different leader node type. Also, doesn't this command need you to provide a cluster name?

export MY_KEYFILE="?????.pem"
export MY_CLUSTER_NAME="adam_cluster"
export MY_CLUSTER_SIZE=10
[CGCloud](https://github.com/BD2KGenomics/cgcloud) lets you automate the creation, management and provisioning of VMs and clusters of VMs in Amazon EC2.

fnothaft Nov 17, 2016

Member

Can you wrap lines at 80 characters throughout?
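One way to find the lines that still exceed the limit is a small `awk` check. This is a minimal sketch and not part of the PR: the function name `check_line_length` is hypothetical, and some `awk` implementations count bytes rather than characters, so non-ASCII text may be over-reported.

```shell
# Report every line longer than 80 characters in the given files.
# Prints "file:line: length" per offending line and returns non-zero
# if at least one such line was found.
check_line_length() {
  awk -v max=80 '
    length($0) > max {
      printf "%s:%d: %d\n", FILENAME, FNR, length($0)
      bad = 1
    }
    END { exit bad }
  ' "$@"
}
```

Called as `check_line_length path/to/doc.md`, it lists only the lines that need rewrapping. Long URLs and commands inside code fences will also be reported and may have to stay on one line.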

```
cgcloud ssh spark-master
```

fnothaft Nov 17, 2016

Member

Nit: extra whitespace.

Export the path to your `spark-ec2` script,
To use the ADAM application on top of Spark, we need to download and install ADAM on `spark-master`
From the command line on `spark-master` download a release from:
https://github.com/bigdatagenomics/adam/releases

fnothaft Nov 17, 2016

Member

Nit: missing period at EOL.

alias spark_ec2_login="$SPARK_EC2_SCRIPT -k $MY_KEYPAIR -i $MY_KEYFILE login $MY_CLUSTER_NAME"
The typical flow of data to and from your ADAM application on EC2 will be:
- Upload data to AWS S3
- Use Conductor (described below) or otherwise transfer from S3 to the HDFS on your cluster

fnothaft Nov 17, 2016

Member

Can you add an anchor link `{#conductor}` in the section where Conductor is described, and link to it from here, i.e. change `(described below)` to `[(described below)](#conductor)`? This'll make navigation a bit easier.
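The two data-flow steps quoted in the hunk above (upload to S3, then move from S3 into HDFS) can be sketched without Conductor. This is only an illustration of the "or otherwise transfer" path: the bucket, file, and HDFS directory are placeholders, `run` is a hypothetical helper that prints each command and executes it only when `RUN=1`, and the `s3a://` scheme assumes the cluster's Hadoop has the S3A connector and AWS credentials configured.

```shell
# Placeholder values, not taken from the ADAM docs.
BUCKET="my-bucket"        # hypothetical S3 bucket
LOCAL_FILE="sample.bam"   # hypothetical local input file
HDFS_DIR="hdfs:///data"   # hypothetical HDFS target directory

# Print each command; execute it only when RUN=1 is set.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Step 1 (from a workstation): upload the data to S3 with the AWS CLI.
upload_to_s3() {
  run aws s3 cp "$LOCAL_FILE" "s3://$BUCKET/$LOCAL_FILE"
}

# Step 2 (on the cluster): copy from S3 into HDFS with Hadoop DistCp.
s3_to_hdfs() {
  run hadoop distcp "s3a://$BUCKET/$LOCAL_FILE" "$HDFS_DIR/"
}

upload_to_s3
s3_to_hdfs
```

Without `RUN=1` the script only prints the two commands, which makes it easy to check the paths before any data is moved.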

To transfer large amounts of data back and forth from S3, we suggest using [Conductor](https://github.com/BD2KGenomics/conductor).

It's also possible to directly use AWS S3 as a distributed file system, but with some loss of performance.
( example to be added )

fnothaft Nov 17, 2016

Member

Nit: I might drop the "( example to be added )" bit and remove the paragraph break between this paragraph and the Conductor paragraph.

AmplabJenkins commented Nov 17, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1622/

@jpdna jpdna force-pushed the jpdna:cgcloud_doc branch 4 times, most recently from e8bed96 to 4d4ab71 Nov 17, 2016
Member Author

jpdna commented Nov 17, 2016

ready for further review or merge

AmplabJenkins commented Nov 17, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1623/

@jpdna jpdna force-pushed the jpdna:cgcloud_doc branch from 4d4ab71 to f887597 Nov 17, 2016
AmplabJenkins commented Nov 17, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1624/

AmplabJenkins commented Nov 17, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1625/

AmplabJenkins commented Nov 17, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1626/

Member

fnothaft left a comment

Few small nits, otherwise LGTM!

alias spark_ec2_destroy="$SPARK_EC2_SCRIPT destroy $MY_CLUSTER_NAME"
alias spark_ec2_login="$SPARK_EC2_SCRIPT -k $MY_KEYPAIR -i $MY_KEYFILE login $MY_CLUSTER_NAME"
Spin up a Spark cluster named `cluster1` with one leader and two worker nodes
of instance type `m3.large`with the command:

fnothaft Nov 17, 2016

Member

Missing space between words in "`m3.large`with".
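For reference, the launch step described in the hunk above (a cluster named `cluster1` with one leader and two `m3.large` workers) might look like the sketch below. The flags `-c` (cluster name), `-s` (number of workers), and `-t` (instance type) are an assumption about the `cgcloud create-cluster` interface and should be checked against `cgcloud create-cluster --help`. The helper `launch_cluster` and the `RUN` switch are hypothetical, added so the command can be inspected before it creates any EC2 instances.

```shell
CLUSTER_NAME="cluster1"    # cluster name from the sentence under review
NUM_WORKERS=2              # two worker nodes (the leader is created as well)
INSTANCE_TYPE="m3.large"   # worker instance type

# Print the cluster-creation command; run it only when RUN=1 is set.
launch_cluster() {
  # Assumed cgcloud flags: -c cluster name, -s worker count, -t instance type.
  set -- cgcloud create-cluster spark \
    -c "$CLUSTER_NAME" -s "$NUM_WORKERS" -t "$INSTANCE_TYPE"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

launch_cluster
```

Keeping the name, size, and instance type in variables also addresses the earlier review question about whether the command needs an explicit cluster name.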

#### Install ADAM

To use the ADAM application on top of Spark, we need to download and install
ADAM on `spark-master`

fnothaft Nov 17, 2016

Member

period at EOL

To use the ADAM application on top of Spark, we need to download and install
ADAM on `spark-master`
From the command line on `spark-master` download a release from:
https://github.com/bigdatagenomics/adam/releases

fnothaft Nov 17, 2016

Member

Punctuation at EOL? Maybe remove paragraph break.

As of this writing, CGCloud supports Spark 1.6.2, not Spark 2.x, so download
the Spark 1.x Scala2.10 release:
```
wget https://repo1.maven.org/maven2/org/bdgenomics/adam/\
```

fnothaft Nov 17, 2016

Member

I would remove the backslash-escaped line break here.

AmplabJenkins commented Nov 17, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1627/

cgcloud doc edits

edits to cgcloud docs

more cgcloud edits

more cgcloud docs edits

more cgcloud docs edits

edit cgcloud docs

more cgcloud doc edits
@jpdna jpdna force-pushed the jpdna:cgcloud_doc branch from f887597 to fdfee7c Nov 17, 2016
Member Author

jpdna commented Nov 17, 2016

ready again for more review or merge

AmplabJenkins commented Nov 17, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1628/

@fnothaft fnothaft merged commit 20a0eb2 into bigdatagenomics:master Nov 18, 2016
1 check passed
default Merged build finished.
Member

fnothaft commented Nov 18, 2016

Merged! Thanks @jpdna!
