## Commloop
An MPI-based communication loop framework for inter-program, cross-language communication.
### Table of Contents
- Team Members
- System Requirements
- Background
- Files
- Makefile
- Commloop Benchmarks
#### Team Members
- Madison Stemm (author) (madison.stemm@gmail.com)
- Patricio Cubillos (pcubillos@fulbrightmail.org)
- Andrew Foster (andrew.scott.foster@gmail.com)
- Joe Harrington (jh@physics.ucf.edu)
#### System Requirements
Important: MPICH must be installed before mpi4py, because mpi4py builds against whichever MPI implementation is present at install time. Commloop was written for MPICH (with mpi4py built against it); nested spawning is not functional with OpenMPI as of this release.
#### Background
MPI (Message Passing Interface) is a communication protocol used to add parallel processing to programs. In this implementation, Commloop consists of a central hub (a 'Master') that acts as a mediator, interacting with a sequence of spawned programs ('Workers') in a loop. Python and C Workers are included here, but any language supported by MPI can be added easily. This allows communication between programs written in different languages.
We designed Commloop to be modular and expandable. As stated above, the core of Commloop consists of a central hub (`master.py`) and C and Python Workers (`worker_c.c` and `worker.py`, respectively). `mutils.py` contains a series of function wrappers for the MPI calls, written so that MPI could easily be replaced with another parallel-processing interface at a later date.
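For a rough idea of the shape of these wrappers, the mpi4py-based sketch below shows what a general-purpose spawn/send/receive layer could look like. The function names and signatures here are illustrative assumptions, not the actual mutils.py interface.

```python
# Illustrative sketch only; the real mutils.py API may differ.
import numpy as np
from mpi4py import MPI

def spawn(command, nprocs, args=None):
    """Spawn a worker program and return the inter-communicator to it."""
    return MPI.COMM_SELF.Spawn(command, args=args or [], maxprocs=nprocs)

def comm_send(comm, array, dest=0, tag=0):
    """Send a float64 NumPy array over the given communicator."""
    buf = np.ascontiguousarray(array, dtype=np.float64)
    comm.Send([buf, MPI.DOUBLE], dest=dest, tag=tag)

def comm_recv(comm, size, source=0, tag=0):
    """Receive a float64 NumPy array of a known size."""
    array = np.empty(size, dtype=np.float64)
    comm.Recv([array, MPI.DOUBLE], source=source, tag=tag)
    return array

def comm_disconnect(comm):
    """Tear down an inter-communicator once the loop has finished."""
    comm.Disconnect()
```

Keeping every MPI call behind wrappers like these is what would allow MPI to be swapped out without touching the Master or Worker logic.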
To execute the code as-is, run `mpiexec master.py`.
Initially, the Master sends data to the first Python Worker (pyWorker1) and awaits its output. The output from pyWorker1 is sent back to the Master, which then forwards it to the C Worker (cWorker). The cWorker output is returned to the Master and sent on to the second Python Worker (pyWorker2). Once that data is returned to the Master, the loop repeats.
During each loop iteration, each Worker receives an array of floats, divides every element in half, and sends the resulting array back to the Master. Halving makes the values rapidly approach zero, so there is a traceable difference after each Worker operation, without risk of the values blowing up and causing double-precision overflow.
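A minimal sketch of the Worker side of this loop is shown below, assuming a Python Worker spawned by the Master that communicates over the parent inter-communicator; the array size and iteration count are placeholder assumptions rather than values taken from worker.py.

```python
# Hypothetical sketch of a Python Worker's half of the loop;
# the real bin/worker.py may differ in detail.
import numpy as np
from mpi4py import MPI

comm  = MPI.Comm.Get_parent()   # inter-communicator back to the Master
size  = 1000                    # floats per transfer (assumed)
niter = 1000                    # loop iterations (assumed)

array = np.empty(size, dtype=np.float64)
for _ in range(niter):
    comm.Recv([array, MPI.DOUBLE], source=0, tag=0)   # array from the Master
    array /= 2.0                                      # divide every element in half
    comm.Send([array, MPI.DOUBLE], dest=0, tag=0)     # send the result back

comm.Disconnect()
```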
The code currently passes dummy arrays in the following structure:
| Sender    | Data   | Receiver  |
|-----------|--------|-----------|
| Master    | Array1 | pyWorker1 |
| pyWorker1 | Array2 | Master    |
| Master    | Array2 | cWorker   |
| cWorker   | Array3 | Master    |
| Master    | Array3 | pyWorker2 |
| pyWorker2 | Array4 | Master    |

Array4 then becomes Array1 for the next iteration, and the loop repeats.
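To make the relay concrete, the hedged mpi4py sketch below follows the Master's side of this table for one run; the spawn commands, array size, and iteration count are assumptions for illustration, not the actual master.py code.

```python
# Assumed sketch of the Master relay; bin/master.py differs in detail.
import sys
import numpy as np
from mpi4py import MPI

# Spawn the two Python Workers and the C Worker as child programs.
py1 = MPI.COMM_SELF.Spawn(sys.executable, args=['worker.py'], maxprocs=1)
cw  = MPI.COMM_SELF.Spawn('./worker_c',   args=[],            maxprocs=1)
py2 = MPI.COMM_SELF.Spawn(sys.executable, args=['worker.py'], maxprocs=1)

array = np.ones(1000, dtype=np.float64)        # Array1: dummy starting data

for _ in range(1000):
    for worker in (py1, cw, py2):
        worker.Send([array, MPI.DOUBLE], dest=0, tag=0)    # hand the data off
        worker.Recv([array, MPI.DOUBLE], source=0, tag=0)  # receive the halved copy
    # After pyWorker2 replies, Array4 is in `array` and becomes Array1
    # for the next pass through the loop.

for worker in (py1, cw, py2):
    worker.Disconnect()
```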
#### Files
`bin/mutils.py`
- Holds Python wrappers, in general form, for all of the MPI functions used
- (used by both master.py and worker.py)

`bin/master.py`
- Holds all of the Master's MPI calls

`bin/worker.py`
- Holds the Worker MPI calls for both Python portions of Commloop

`bin/worker_c`
- Compiled binary holding the Worker MPI calls for the C portion of Commloop

`src/Makefile`
- Compiles the C worker

`src/worker_c.c`
- Holds the Worker MPI calls for the C portion of Commloop (source for bin/worker_c)
#### Makefile
To compile the C worker, simply call `make` in src/. The compiled binary will be moved to bin/ automatically, overwriting any existing binary.

The Makefile generates the MPI-executable C worker with the following command:

    mpicc -fPIC -o worker_c worker_c.c
#### Commloop Benchmarks
The above plot is a benchmark of MPI itself rather than Commloop specifically. For this setup we used only one Master and one Worker, with 10 processes per spawned Python worker, looping over 1000 iterations and recording the minimum, maximum, median, and mean times for each transfer size. Performance remains roughly constant up to about 10 KB, after which runtimes begin to grow with transfer size (roughly linearly, per the table below).

The final benchmark (below) was run with the default source-code setup (arrays of sizes 10 B, 1 KB, 1 MB, and 10 B, respectively), with the 1 MB array being passed to the C worker. It shows the startup time and a breakdown of the loop speed.
| Part of code   | Time (seconds) |
|----------------|----------------|
| Start MPI Comm | 0.291091918945 |
| Avg Iteration  | 0.194536820277 |
| Total Code     | 82.5224819183  |
| Size of Array | Median Time (seconds) | Minimum Time (seconds) |
|---------------|-----------------------|------------------------|
| 1 B           | 8.10623e-06           | 3.81469e-06            |
| 10 B          | 8.10623e-06           | 6.91413e-06            |
| 100 B         | 8.10623e-06           | 2.86102e-06            |
| 1 KB          | 6.19888e-06           | 4.76837e-06            |
| 10 KB         | 3.49283e-05           | 1.69277e-05            |
| 100 KB        | 0.00130105            | 0.00126290             |
| 1 MB          | 0.0130050             | 0.0125229              |
| 10 MB         | 0.155957              | 0.150859               |
| 100 MB        | 1.60676               | 0.852834               |
| 1 GB          | 20.5827               | 5.63851                |
The transfer times for all 1000 iterations of each benchmark were recorded to show the variation in transfer times. The variance is assumed to be caused by background processes on the computer; spikes occur periodically, which may indicate an MPI buffer being flushed.
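For anyone who wants to reproduce numbers along these lines, a round-trip timing loop like the sketch below would do it; the worker command, transfer size, and use of wall-clock timing are assumptions rather than the exact benchmark script.

```python
# Assumed reconstruction of the transfer-size benchmark: time 1000 round
# trips of a fixed-size array between the Master and one spawned Worker.
import sys
import time
import numpy as np
from mpi4py import MPI

worker = MPI.COMM_SELF.Spawn(sys.executable, args=['worker.py'], maxprocs=1)

nbytes = 1024                                   # transfer size to test (1 KB)
array  = np.ones(nbytes // 8, dtype=np.float64) # 8-byte floats
times  = []

for _ in range(1000):
    tic = time.time()
    worker.Send([array, MPI.DOUBLE], dest=0, tag=0)
    worker.Recv([array, MPI.DOUBLE], source=0, tag=0)
    times.append(time.time() - tic)

print("min / median / max / mean (s):",
      np.min(times), np.median(times), np.max(times), np.mean(times))
worker.Disconnect()
```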