This project focuses on parallelising search in a Red-Black Tree, a self-balancing binary search tree, using three computational approaches:
- Sequential implementation (baseline).
- MPI + OpenMP implementation (distributed parallelism).
- CUDA + OpenMP implementation (GPU acceleration).
Objectives:
- Implement and analyse the Red-Black AVL Tree Search algorithm in parallel computing environments.
- Evaluate performance differences between the MPI, OpenMP, and CUDA implementations.
- Conduct extensive benchmarking across different input sizes and hardware configurations.
- The Red-Black AVL Tree combines Red-Black Tree and AVL Tree balancing properties, keeping the tree balanced so that search runs in O(log n) time.
- Parallel Computing Approaches:
- OpenMP: Multi-threading on shared-memory architectures.
- MPI: Distributed computing with message-passing.
- CUDA: GPU acceleration using thousands of cores.
The project implements three versions of the search algorithm:
- Sequential version (baseline).
- MPI + OpenMP version:
- Distributes work across multiple processes using MPI.
- Utilises OpenMP threads for intra-process parallelism.
- CUDA + OpenMP version:
- CUDA Kernels handle subtree search in parallel.
- Optimised memory management with shared memory and atomic operations.
Test environment:
- CPU: AMD Ryzen 5 7640HS (6 cores, 12 threads)
- GPU: NVIDIA RTX 4060 (8GB VRAM, 24 SMs, 3072 CUDA cores)
- RAM: 16GB DDR5
- OS: Ubuntu 22.04 (via WSL)
- Compilers: GCC, NVCC (CUDA 12.3), MPI Compiler
- Tools: Python 3.11 for analysis, CMake for builds
Performance was evaluated by measuring:
- Execution time under different parallel configurations.
- Speed-up compared to the sequential baseline.
- Scalability across varying input sizes.
Key Findings:
- MPI+OpenMP is effective for multi-node CPU clusters, but performance is limited by communication overhead.
- CUDA+OpenMP achieves better speed-up on large datasets, benefiting from GPU parallelism.
- Sequential execution is efficient for small inputs but does not scale well.
Before running the Makefile commands, ensure you have the following installed:
- GCC compiler for C code compilation.
- MPICC compiler (MPI wrapper) for MPI parallel execution.
- NVCC compiler from the CUDA toolkit for GPU-based computation.
- OpenMP for multi-threading support.
- Python 3 with:
  - matplotlib (for plots)
  - pandas (for data processing)
  - numpy (for numerical computations)
To install the Python dependencies, run:

```shell
make install_dependencies
```
The Makefile automates compilation, testing, cleaning, and performance analysis.
| Command | Description |
|---|---|
| `make clean` | Removes compiled binaries and output data from previous builds. |
| `make all` | Compiles all versions (with different optimisation levels). |
| `make compile0` | Compiles with `-O0` (no optimisation). |
| `make compile1` | Compiles with `-O1` (basic optimisations). |
| `make compile2` | Compiles with `-O2` (further optimisations). |
| `make compile3` | Compiles with `-O3` (full optimisation). |
| Command | Description |
|---|---|
| `make test0` / `make test1` / `make test2` / `make test3` | Runs tests for the different compiled versions. |
| `make test00` / `make test01` / `make test02` / `make test03` | Runs tests with predefined values, thread counts, and MPI process counts. |
| `make all_test1` / `make all_test2` | Combines multiple test targets for convenience. |
| Command | Description |
|---|---|
| `make tables` | Runs a Python script to create tables from the CSV results. |
| `make performance` | Runs a Python script to analyse performance and generate plots and tables. |
To compile and test the project at all optimisation levels and generate the performance tables and plots, run:

```shell
make clean
make all
make all_test1
make tables
make performance
```
For detailed performance results and analysis, see the full Project Report.
- Agostino Cardamone (Student & Creator)
- Supervisor: Francesco Moscato