Testing and Benchmarking

Unit tests

Basic unit tests can be executed via the following command:

$ make check

which runs the unit test suite in the same environment where make was invoked and prints a summary when it finishes:

PASS: deque
PASS: freelist
PASS: msgbuff
PASS: scheduler
PASS: idpool
============================================================================
Testsuite summary for aws-ofi-nccl GitHub-dev
============================================================================
# TOTAL: 5
# PASS:  5
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================
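In Automake-based builds like this one, make check already exits with a non-zero status when a test fails. If you keep the output as a log (for example, in CI) and want to re-check it later, a small helper can read the summary block shown above. The sketch below assumes the "# FAIL:", "# ERROR:" and "# XPASS:" line format from that summary; check_summary is a hypothetical name, not part of the plugin. It counts XPASS as a failure, matching how Automake treats unexpected passes.

```shell
#!/bin/sh
# Hypothetical helper (not part of aws-ofi-nccl): reads an Automake
# test-suite summary on stdin and exits non-zero if any test failed,
# errored, or unexpectedly passed (XPASS). Assumes lines such as
# "# FAIL:  0", where the count is the third whitespace-separated field.
check_summary() {
    awk '
        /^# FAIL:/  { bad += $3 }
        /^# ERROR:/ { bad += $3 }
        /^# XPASS:/ { bad += $3 }
        END { exit (bad > 0) }
    '
}
```

Usage: make check 2>&1 | tee check.log, then later check_summary < check.log && echo "all unit tests passed".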

Functional tests

Running the plugin's functional tests requires a working MPI installation and an MPI setup that connects the communicating hosts. To install MPI, you can use the standard packages provided by your Linux distribution. Once MPI is set up, you can run any test of your choice with a command like the one below.

mpirun -n 2 --host <host-1>,<host-2> $INSTALL_PREFIX/bin/nccl_message_transfer

Note: All tests except ring.c require exactly 2 MPI ranks to run.
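The rank rule above can be encoded in a small dry-run wrapper that prints the mpirun command for a chosen test and rejects host lists of the wrong size. This is a sketch under stated assumptions: functional_test_cmd is a hypothetical name, it assumes one rank per listed host, and it assumes the test built from ring.c is installed as $INSTALL_PREFIX/bin/ring (check the actual binary name in your install).

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints the mpirun command for one
# functional test instead of running it. Assumes one MPI rank per host
# and that the ring test binary is named "ring" (an assumption).
functional_test_cmd() {
    test_name=$1
    hosts=$2            # comma-separated host list, e.g. "host-1,host-2"
    # One rank per host: the rank count is the number of comma-separated entries.
    nranks=$(printf '%s\n' "$hosts" | awk -F, '{ print NF }')
    # Every test except ring needs exactly 2 ranks.
    if [ "$test_name" != "ring" ] && [ "$nranks" -ne 2 ]; then
        echo "error: $test_name needs exactly 2 ranks, got $nranks" >&2
        return 1
    fi
    echo "mpirun -n $nranks --host $hosts $INSTALL_PREFIX/bin/$test_name"
}
```

Review the printed command and run it, or drop the final echo to launch the test directly.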

Benchmarking with nccl-tests

To run collective benchmark tests with the aws-ofi-nccl plugin, you can follow the instructions below.

Clone the repository

git clone https://github.com/NVIDIA/nccl-tests.git

Build the tests

cd nccl-tests/
make MPI=1 MPI_HOME=/path/to/mpi CUDA_HOME=/path/to/cuda NCCL_HOME=/path/to/nccl

Run perf tests

NCCL_DEBUG=INFO mpirun -np 2 --bind-to none build/all_reduce_perf -b 8 -f 2 -e 32M -c 1 -g 1

If you installed the aws-ofi-nccl plugin under a custom prefix, make sure LD_LIBRARY_PATH includes that prefix's library directory so that NCCL, loaded by the perf test binaries, can find the plugin.
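A minimal sketch of that setup, assuming the plugin was configured with a custom --prefix and installed its library into the default lib subdirectory. PLUGIN_PREFIX and its default value /opt/aws-ofi-nccl are hypothetical placeholders; substitute your own prefix. The commented mpirun line uses Open MPI's -x option to forward the variable to all ranks; other MPI implementations use different flags.

```shell
#!/bin/sh
# PLUGIN_PREFIX is a hypothetical placeholder for the --prefix used when
# installing aws-ofi-nccl; the plugin library is assumed to be in $PLUGIN_PREFIX/lib.
PLUGIN_PREFIX=${PLUGIN_PREFIX:-/opt/aws-ofi-nccl}

# Prepend the plugin's lib directory. ${VAR:+...} adds ":<old value>" only
# when LD_LIBRARY_PATH was already non-empty, so no trailing ":" is left behind.
export LD_LIBRARY_PATH="$PLUGIN_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# With Open MPI, -x exports the variable to every rank, including remote hosts:
# NCCL_DEBUG=INFO mpirun -np 2 -x LD_LIBRARY_PATH --bind-to none build/all_reduce_perf -b 8 -f 2 -e 32M -c 1 -g 1
echo "$LD_LIBRARY_PATH"
```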
