Refactor Parallel Programming with Jupyter
Jonathon Anderson committed Oct 12, 2018
docs/additional-resources/parallel-programming-jupyter.md
## Parallel programming with Jupyter

## Table of Contents
- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Log in](#log-in)
- [Prepare an IPython cluster](#prepare-an-ipython-cluster)
- [Creating a parallel-processing notebook](#creating-a-parallel-processing-notebook)
- [Shutting down](#shutting-down)

### Overview

This tutorial demonstrates simple parallel-processing examples using
the CURC JupyterHub web service, with both ipyparallel and MPI for
Python.

### Prerequisites

Before you begin, you need:

* an RC account
* Duo credentials
* access to a CU campus network or VPN
* a web browser

### Log in

First, log in to the RC JupyterHub service by navigating to
https://jupyter.rc.colorado.edu. Enter your RC username and password
(where your password may be a combination of a password, OTP, and/or
credential prefix).

After authenticating, you should be redirected to your Jupyter
notebook server. The server is started as a queued job, so you may
have to wait if the JupyterHub resources are otherwise fully utilized.
### Prepare an IPython cluster

Navigate to the _IPython Clusters_ tab to access a list of available
parallel profiles. Each profile represents an IPython cluster you can
initialize with a predefined configuration; the _# of engines_ is the
number of processes that will be spawned for the cluster.

Any of the RC-provided cluster profiles (though not the default
profile) can be used for these examples. Specify 2 engines for the
`example-shas` profile, and use the _Start_ button to start the
compute cluster.

### Creating a parallel-processing notebook

Return to the _Files_ tab and use the _New_ button to create a Python
3 notebook. A new notebook should include an initial Python code cell;
if necessary, use the _Insert_ menu to insert a new cell, and use the
_Cell > Cell Type_ menu to configure the new cell as a _Code_ cell.

This first cell initializes access to the running cluster using
`ipyparallel`. You can simply paste this code into the cell; all of it
executes within the Jupyter notebook itself.

```python
import ipyparallel

# Connect to the running cluster by its profile name
cluster = ipyparallel.Client(profile='example-shas')

print('profile:', cluster.profile)
print("IDs:", cluster.ids)  # Print engine id numbers
```

Execute the cell using Shift+Return, which produces output identifying
the engine IDs available in the cluster.

```
profile: example-shas
IDs: [0, 1]
```

**Note:** IPython engines on RC cluster resources are provisioned as
batch jobs using Slurm, but Jupyter does not yet report queue
progress. If no IDs are listed, or a "NoEnginesRegistered: Can't build
targets without any engines" exception is raised, the cluster job is
still in the queue and is not yet ready to accept work.
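If you would rather have the notebook wait than fail, you can poll the client until the engines register. A minimal sketch (the helper function and its timeout values are illustrative, not part of the ipyparallel API):

```python
import time

def wait_for_engines(get_ids, n, timeout=300, interval=5):
    """Poll get_ids() until at least n engine IDs are registered.

    get_ids is a callable returning the current list of engine IDs,
    e.g. lambda: cluster.ids; raises TimeoutError if the engines do
    not appear before the timeout (in seconds) expires.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        ids = get_ids()
        if len(ids) >= n:
            return ids
        time.sleep(interval)
    raise TimeoutError("engines did not register within %ds" % timeout)

# In the notebook this might be called as:
# wait_for_engines(lambda: cluster.ids, n=2)
```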

#### IPython Parallel

Once the above code has successfully reported the engine IDs for the
cluster, insert a new code cell below the existing code block.

```python
# The %px magic executes a single Python command on
# each engine in the cluster
%px import socket
%px print(socket.gethostname())

# map_sync distributes the function across the engines and
# blocks until all results have been returned
squares = cluster[:].map_sync(lambda x: x**2, range(32))
print("squares:", squares)
```

Execute this cell using Shift+Return, which outputs the hostname of
the host for each engine, as well as the calculated square numbers. In
this cell, code prefixed with `%px` is executed on each engine, while
the `squares` are calculated using the `cluster` reference obtained in
the previous code block.
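Conceptually, a direct view's `map_sync` behaves like the built-in `map`, with the input split into contiguous chunks, one per engine, and the results concatenated in order. A pure-Python model of that partitioning (illustrative only, not ipyparallel's actual implementation):

```python
def chunk(seq, n):
    """Split seq into n nearly equal contiguous chunks."""
    seq = list(seq)
    k, r = divmod(len(seq), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks

# With 2 engines, range(8) is split into two contiguous chunks;
# each engine maps the function over its chunk, and the results
# are concatenated in the original order:
parts = [[x**2 for x in c] for c in chunk(range(8), 2)]
result = [y for part in parts for y in part]
print(result)  # [0, 1, 4, 9, 16, 25, 36, 49]
```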

#### MPI for Python

Insert a new code cell below the existing code block to demonstrate
message passing using MPI.

```python
%%px
from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Rank 0 generates a random array and sends it to rank 1
    data = numpy.random.rand(10)
    comm.Send(data, dest=1)
    print("{0}: sent {1}".format(rank, data))
elif rank == 1:
    # Rank 1 receives the array from rank 0
    data = numpy.empty(10, dtype=numpy.float64)
    comm.Recv(data, source=0)
    print("{0}: received {1}".format(rank, data))
else:
    print("{0}: idle".format(rank))
```

Execute this cell using Shift+Return. This cell begins with `%%px`,
which causes the entire cell to execute in parallel on all engines.
Rank 0 generates a random numpy array, which is then sent to rank 1.
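The rank-based branching above can also be exercised without an MPI runtime, which is handy for checking the send/receive logic before running on the cluster. A sketch using only the standard library, where a `multiprocessing.Pipe` stands in for MPI's point-to-point messaging (the helper below is illustrative, not part of mpi4py):

```python
from multiprocessing import Process, Pipe

def worker(rank, conn):
    # Mirrors the %%px cell's branching: rank 0 sends, rank 1 receives
    if rank == 0:
        data = [x * 0.5 for x in range(4)]
        conn.send(data)
        print("{0}: sent {1}".format(rank, data))
    else:
        data = conn.recv()
        print("{0}: received {1}".format(rank, data))
    return data

if __name__ == "__main__":
    left, right = Pipe()
    procs = [Process(target=worker, args=(0, left)),
             Process(target=worker, args=(1, right))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```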

### Shutting down

Both the IPython cluster and the notebook server will persist until
manually stopped or until each reaches its time limit. IPython
clusters have a 4-hour time limit by default (though this can be
changed by editing the profile in `$HOME/.ipython/`). The notebook
server itself has a time limit of 2 hours. When you're done using a
resource, please shut it down so that it can be used for other work.

Return to the _IPython Clusters_ tab and press the _Stop_ button for
`example-shas` (or whichever profile was used during the example).

Finally, if you are done using the Jupyter notebook for now, access
the _Control Panel_ and press the _Stop My Server_ button to stop the
Jupyter notebook server. After that, you may press _Logout_, or simply
close the browser window.
