
Slurm deployment readme #1571

Merged: 5 commits merged from jpdna:slurm_readme into bigdatagenomics:master on Jul 5, 2017

Conversation

@jpdna (Member) commented Jun 19, 2017

No description provided.

@coveralls commented Jun 19, 2017

Coverage Status

Coverage remained the same at 83.344% when pulling d9b0768 on jpdna:slurm_readme into 152a8ad on bigdatagenomics:master.

@AmplabJenkins commented Jun 19, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2107/

@fnothaft (Member) left a comment

Looks great! Thanks @jpdna! As an aside, would you mind folding this into 40_deploying_ADAM.md?


For those groups with access to a HPC cluster managed by [Slurm](https://en.wikipedia.org/wiki/Slurm_Workload_Manager), a number of compute nodes often with a few hundred GB of local disk on each node - all attached to large shared network disk storage, it is possible to spin up a temporary Spark cluster for use by ADAM.

While the full IO bandwidth benefits of Spark processing are likely best realized through a set of co-located compute/storage nodes, depending of on your network setup you may find Spark deployed on HPC to be a workable solution for testing or even production at scale, especially for those applications which perform multiple in-memory transformations and thus benefit from Spark's in-memory processing model.

@fnothaft (Member) commented Jun 19, 2017:

of on -> on

@@ -0,0 +1,66 @@
# Running ADAM on Slurm

For those groups with access to a HPC cluster managed by [Slurm](https://en.wikipedia.org/wiki/Slurm_Workload_Manager), a number of compute nodes often with a few hundred GB of local disk on each node - all attached to large shared network disk storage, it is possible to spin up a temporary Spark cluster for use by ADAM.

@fnothaft (Member) commented Jun 19, 2017:

a number -> and a number? Also, should - all be and/or?


While the full IO bandwidth benefits of Spark processing are likely best realized through a set of co-located compute/storage nodes, depending of on your network setup you may find Spark deployed on HPC to be a workable solution for testing or even production at scale, especially for those applications which perform multiple in-memory transformations and thus benefit from Spark's in-memory processing model.

Follow the primary instructions in the ADAM README.md for installing ADAM into `$ADAM_HOME`

@fnothaft (Member) commented Jun 19, 2017:

Can you link them to the install docs in docs/source/02_installation.md?
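
For readers following along, the install step being discussed usually amounts to building ADAM from source with Maven and pointing `$ADAM_HOME` at the checkout. A minimal sketch is below; the authoritative steps live in docs/source/02_installation.md, and the Maven flags here are illustrative, not taken from this PR.

```bash
# Sketch only: build ADAM from source and set $ADAM_HOME (see docs/source/02_installation.md).
git clone https://github.com/bigdatagenomics/adam.git
cd adam
mvn -DskipTests package      # standard Maven build; drop -DskipTests to run the test suite
export ADAM_HOME=$(pwd)      # the Slurm examples below assume this points at the ADAM checkout
```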

indicate the address of the Spark master to which your application or ADAM-shell should connect such as `spark://somehostname:7077`

## Start ADAM Shell
Your sys admin will probably prefer that you aunch your ADAM-shell or start an application from a cluster node rather than the head node you log in to so you may want to do so with:

@fnothaft (Member) commented Jun 19, 2017:

aunch -> launch

$ADAM_HOME/bin/adam-shell --master spark://hostnamefromslurmdotout:7077
```

## Or run ADAM submit

@fnothaft (Member) commented Jun 19, 2017:

Might prefer "Run a batch job with adam-submit"
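
To make the batch-job variant concrete, a hedged sketch of submitting ADAM work to the temporary cluster is below. The subcommand name and file paths are placeholders, and the `--` separator follows the adam-submit convention of passing earlier options through to spark-submit; check `adam-submit --help` for your ADAM version.

```bash
# Sketch only: run an ADAM batch job against the Slurm-hosted Spark cluster.
# Subcommand and paths are placeholders; consult adam-submit --help for real options.
$ADAM_HOME/bin/adam-submit \
  --master spark://hostnamefromslurmdotout:7077 \
  -- \
  transformAlignments sample.sam sample.alignments.adam
```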

@fnothaft (Member) commented Jun 19, 2017

Oh also, can you break all lines at 80 characters?

@fnothaft added this to the 0.23.0 milestone Jun 22, 2017
@coveralls commented Jun 26, 2017

Coverage Status

Coverage remained the same at 83.046% when pulling 2235b65 on jpdna:slurm_readme into 0306717 on bigdatagenomics:master.

@AmplabJenkins commented Jun 26, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2153/

@fnothaft (Member) left a comment

Two small nits. Looks really good! Thanks @jpdna!

### Start Spark cluster

A Spark cluster can be started as a muti-node job in Slurm by creating a job file `run.cmd` such as below:

@fnothaft (Member) commented Jun 26, 2017:

muti -> multi


A Spark cluster can be started as a muti-node job in Slurm by creating a job file `run.cmd` such as below:
```

@fnothaft (Member) commented Jun 26, 2017:

If you add bash at the end of this line, it'll enable bash-specific syntax highlighting.
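
For context while reading the review, a `run.cmd` along the lines the diff describes might look like the sketch below. The two-node/five-hour numbers, `slurm.out`, the `spark` module, `$SPARK_HOME/bin/spark-start`, and `echo $MASTER` all come from the text under review; the specific `#SBATCH` flags and the trailing `sleep` are illustrative guesses, not the file from this PR.

```bash
#!/bin/bash
#SBATCH --nodes=2            # two compute nodes for the temporary Spark cluster
#SBATCH --time=05:00:00      # cluster persists for 5 hours unless cancelled sooner
#SBATCH --output=slurm.out   # file the README says will carry the master address

module load spark            # or use the absolute path to $SPARK_HOME/bin/spark-start
$SPARK_HOME/bin/spark-start  # site-provided helper that launches the master and workers
echo $MASTER                 # e.g. spark://<hostname>:7077, read back later from slurm.out
sleep infinity               # keep the allocation alive for the lifetime of the cluster
```

Submitting it would then just be `sbatch run.cmd`.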

@fnothaft (Member) commented Jun 26, 2017

Resolves #1229.


Follow the primary [instructions](https://github.com/bigdatagenomics/adam/blob/master/docs/source/02_installation.md)
for installing ADAM into `$ADAM_HOME`

@heuermh (Member) commented Jun 26, 2017:

Does $ADAM_HOME need to be on shared disk?

module load spark
# If spark is not installed as a module, you will need to specifiy absolute path to $SPARK_HOME/bin/spark-start

@heuermh (Member) commented Jun 26, 2017:

Does $SPARK_HOME need to be on shared disk?

This will start a Spark cluster containing 2 nodes that persists for 5 hours, unless you kill it sooner.
The `slurm.out` file created in the current directory will contain a line produced by `echo $MASTER`
above which willindicate the address of the Spark master to which your application or ADAM-shell

@heuermh (Member) commented Jun 26, 2017:

willindicate → will indicate


This will start a Spark cluster containing 2 nodes that persists for 5 hours, unless you kill it sooner.
The `slurm.out` file created in the current directory will contain a line produced by `echo $MASTER`

@heuermh (Member) commented Jun 26, 2017:

file[  ]created → file[ ]created
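
As a concrete illustration of the hand-off being described, once the job is running you could recover the master address from `slurm.out` roughly like this (the grep pattern simply assumes the `echo $MASTER` line is the only `spark://` URL in the file):

```bash
# Sketch only: pull the Spark master URL that run.cmd echoed into slurm.out.
MASTER=$(grep -o 'spark://[^[:space:]]*' slurm.out | head -n 1)
echo "Connecting to $MASTER"
$ADAM_HOME/bin/adam-shell --master "$MASTER"
```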

```

You should be able to connect to the Spark Web UI at `spark://hostnamefromslurmdotout:4040`, however

@heuermh (Member) commented Jun 26, 2017:

spark://http:// ?
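
For reference on the point above: in a stock Spark standalone deployment the web UIs are plain HTTP endpoints, and the `spark://` scheme only applies to the master RPC address. The usual defaults are summarized below; site configuration may change the ports.

```bash
# Typical Spark standalone defaults; site configuration may differ.
#   master RPC address:  spark://<master-host>:7077   (what --master expects)
#   master web UI:       http://<master-host>:8080
#   application web UI:  http://<driver-host>:4040
```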

module load spark
# If spark is not installed as a module, you will need to specifiy absolute path to $SPARK_HOME/bin/spark-start

@heuermh (Member) commented Jun 26, 2017:

If spark → If Spark


### Start ADAM Shell
Your sys admin will probably prefer that you launch your ADAM-shell or start an application from a

@heuermh (Member) commented Jun 26, 2017:

use adam-shell consistently, similarly adam-submit
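
On the point about launching from a compute node rather than the login node, one common pattern (assuming your site permits interactive allocations; flags vary between clusters) is to wrap adam-shell in `srun`:

```bash
# Sketch only: start adam-shell from a compute node via an interactive Slurm step.
srun --pty $ADAM_HOME/bin/adam-shell --master spark://hostnamefromslurmdotout:7077
```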

@coveralls commented Jun 26, 2017

Coverage Status

Coverage remained the same at 83.046% when pulling 5a0e8fb on jpdna:slurm_readme into 0306717 on bigdatagenomics:master.

@AmplabJenkins commented Jun 26, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2158/

@heuermh approved these changes Jul 5, 2017
@heuermh merged commit 8572fb7 into bigdatagenomics:master Jul 5, 2017

3 checks passed:
codacy/pr: Good work! A positive pull request.
coverage/coveralls: Coverage remained the same at 83.046%
default: Merged build finished.
@heuermh (Member) commented Jul 5, 2017

Thank you, @jpdna

@heuermh mentioned this pull request Aug 29, 2017
@heuermh added this to Completed in Release 0.23.0 Jan 4, 2018