Commit

Merge pull request #56 from ResearchComputing/master
Updating dev
mtrahan41 committed Jan 17, 2019
2 parents abc1f03 + d614b4b commit 98cc48b
Showing 5 changed files with 46 additions and 2 deletions.
Binary file added Interactive-Jobs/putty-1-small.png
Binary file added Interactive-Jobs/putty-2-small.png
@@ -22,8 +22,9 @@ This convention is to separate words in a variable name without the
use of white space. White space within variables is usually difficult
for programming languages to interpret. Because of this variables
must be delimited in some way. Here are several delimiting conventions
commonly used in code: __Snakecase:__ Words are delimited by an
underscore.
commonly used in code:

__Snakecase:__ Words are delimited by an underscore.

```
variable_one
1 change: 1 addition & 0 deletions docs/index.rst
@@ -54,6 +54,7 @@ If you have any questions, please contact rc-help@colorado.edu.
running-jobs/slurm-commands
running-jobs/squeue-status-codes
running-jobs/interactive-jobs
running-jobs/roce-enabled

.. toctree::
:maxdepth: 2
42 changes: 42 additions & 0 deletions docs/running-jobs/roce-enabled.md
@@ -0,0 +1,42 @@
## Running jobs on RoCE enabled Nodes

Some nodes in Blanca are equipped with Mellanox 10G cards and RoCE v2-enabled switches
so that users can run MPI jobs over the 10G interfaces. The 10G network has higher
latency than InfiniBand or Omni-Path, but you can still get line-rate bandwidth.

To take advantage of RoCE on these nodes, you need to compile your code with an
MPI compiler that was built with support for Unified Communication X (UCX). Without UCX, a job
submitted to these nodes will fail.
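
One way to check whether the MPI currently on your `PATH` was built with UCX is to look for the UCX component in the output of `ompi_info`. The helper below is a minimal sketch, not part of the original instructions: the function name `check_ucx` is hypothetical, and it assumes that `ompi_info` lists a `pml: ucx` component when UCX support is compiled in.

```bash
# Sketch: report whether the Open MPI on PATH lists the UCX PML component.
# check_ucx is a hypothetical helper name, not an Open MPI command.
check_ucx() {
    if ! command -v ompi_info >/dev/null 2>&1; then
        echo "ompi_info not found; load or install an MPI first"
        return 2
    fi
    # Assumption: a UCX-enabled build prints a line such as "MCA pml: ucx (...)".
    if ompi_info | grep -Eq 'MCA[[:space:]]+pml:[[:space:]]+ucx'; then
        echo "UCX support found"
    else
        echo "no UCX support found"
        return 1
    fi
}
```

Run `check_ucx` after loading your MPI module or adding your own build to `PATH`.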

### Using pre-built modules

You can build or rebuild your binaries with support for the 10G RoCE network by loading
the module keys gcc/8.2.0 and openmpi_ucx/4.0.0. Once you have loaded those
keys, build your code as you normally would.
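
As a rough illustration, the load-and-build step might look like the sketch below. The function name `build_roce_binary` and the file names in the usage line are hypothetical; `mpicc` is the Open MPI C wrapper compiler, and the `-O2` flag is only an example.

```bash
# Sketch: load the toolchain named above, then build with the MPI wrapper compiler.
# build_roce_binary is a hypothetical helper; pass your own source file and output name.
build_roce_binary() {
    src="$1"
    out="$2"
    # Load the GNU compiler and the UCX-enabled Open MPI module keys.
    module load gcc/8.2.0 openmpi_ucx/4.0.0 || return 1
    # Compile exactly as you would with any other MPI module.
    mpicc -O2 -o "$out" "$src"
}
```

For example, `build_roce_binary hello_mpi.c hello_mpi` would produce a `hello_mpi` binary linked against the UCX-enabled MPI.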

### Build an MPI compiler with support for UCX

First, ensure that UCX is installed on the node where you intend to build the MPI:

```bash
yum info ucx ucx-devel
```

Then you can move on to building the MPI. This example uses the default GNU compiler
and the most recent version of OpenMPI.
```bash
./configure --prefix=/home/jobl6604/soft/openmpi-4.0.0 --with-ucx
```

After successfully building the MPI, you can compile your code against it and start running jobs.
You do not need to pass any extra flags or arguments to the MPI command in your job script.
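
To compile against a self-built MPI, its `bin` and `lib` directories typically need to come first in your environment. The following is a minimal sketch under that assumption; `use_custom_mpi` is a hypothetical helper name, and the prefix you pass should be whatever you gave to `--prefix` when configuring.

```bash
# Sketch: put a self-built MPI installation first on PATH and LD_LIBRARY_PATH.
# use_custom_mpi is a hypothetical helper; the argument is your install prefix.
use_custom_mpi() {
    prefix="$1"
    export PATH="$prefix/bin:$PATH"
    # Append the previous LD_LIBRARY_PATH only if it was already set.
    export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

With the prefix from the configure example, you would call `use_custom_mpi /home/jobl6604/soft/openmpi-4.0.0` before running `mpicc`, and again in your job script before `mpirun`.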

### Tips

If you still have issues running your code, you can try passing these flags to MPI:

```bash
mpirun --mca pml ob1 --mca btl openib,self,vader --mca btl_openib_cpc_include rdmacm --mca btl_openib_rroce_enable 1 <command>
```
