This project implements a parallel sparse matrix solver using CUDA, featuring different optimization strategies including Level Analysis (LA) and Cache-Aware (CA) approaches.
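As a rough illustration of what level analysis generally means for sparse solves, the sketch below groups the rows of a lower-triangular matrix in CSR form into levels, so that all rows in one level depend only on rows from earlier levels and can be processed in parallel. This is a generic textbook formulation, not this project's CUDA implementation; the triangular structure, the CSR layout, and the names `compute_levels` and `group_by_level` are assumptions made for the example.

```python
def compute_levels(row_ptr, col_idx):
    """Return the level of each row of a lower-triangular CSR matrix.

    Row i depends on every row j < i for which entry (i, j) is nonzero.
    Its level is one more than the deepest level among those rows.
    """
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # off-diagonal entry: row i needs x[j] first
                level[i] = max(level[i], level[j] + 1)
    return level


def group_by_level(level):
    """Bucket row indices by level; rows in one bucket are independent."""
    buckets = [[] for _ in range(max(level, default=-1) + 1)]
    for row, lvl in enumerate(level):
        buckets[lvl].append(row)
    return buckets


# Hypothetical 4x4 lower-triangular sparsity pattern (diagonal included):
# row 0: [0]   row 1: [1]   row 2: [0, 2]   row 3: [1, 2, 3]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
levels = compute_levels(row_ptr, col_idx)
print(group_by_level(levels))  # [[0, 1], [2], [3]]
```

Rows 0 and 1 have no dependencies and form the first level; row 2 waits on row 0, and row 3 waits on row 2, so they land in later levels.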
- CUDA Toolkit (version 12.8 or compatible)
- NVIDIA GPU with CUDA support
- GCC/G++ compiler
- Make
- NVIDIA Nsight Systems (nsys) for performance profiling (optional)
.
├── include/ # Header files
├── src/ # Source files
├── data/ # Test matrices
├── build/ # Build artifacts
├── Makefile # Build configuration
└── run.sh # Execution script
- Make sure you have the CUDA Toolkit installed. The default path is set to /opt/nvidia/hpc_sdk/Linux_x86_64/2023/cuda/12.8. If your CUDA installation is in a different location, modify the CUDA_HOME variable in the Makefile.
- Build the project:
make clean # Clean previous build artifacts
make # Build the project
This will create an executable named run_sparse in the root directory.
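As an alternative to editing the Makefile, GNU Make lets a variable given on the command line take precedence over an ordinary assignment inside the Makefile. The sketch below builds such a command and checks that `nvcc` exists under the chosen toolkit directory first. It assumes the Makefile assigns CUDA_HOME without the `override` directive; the helper names `nvcc_path`, `make_command`, and `build` are hypothetical.

```python
import os
import subprocess


def nvcc_path(cuda_home):
    """Path where the CUDA compiler driver sits inside a toolkit install."""
    return os.path.join(cuda_home, "bin", "nvcc")


def make_command(cuda_home=None):
    """Build the make invocation, overriding CUDA_HOME when one is given."""
    cmd = ["make"]
    if cuda_home:
        # A command-line variable overrides a plain Makefile assignment.
        cmd.append(f"CUDA_HOME={cuda_home}")
    return cmd


def build(cuda_home=None):
    """Check for nvcc under cuda_home (if given), then run make."""
    if cuda_home and not os.path.isfile(nvcc_path(cuda_home)):
        raise FileNotFoundError(f"nvcc not found under {cuda_home}")
    subprocess.run(make_command(cuda_home), check=True)
```

For example, `build("/usr/local/cuda")` would run `make CUDA_HOME=/usr/local/cuda`, where the path is only a placeholder for the local installation.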
The project comes with a convenient run.sh script that runs the solver on multiple test matrices with different optimization strategies.
- onetone1
- onetone2
- bcircuit
- G2_circuit
- hcircuit
- parabolic_fem
- Run with default settings:
./run.sh
- Run with a specific size parameter:
./run.sh <size>
The script will:
- Run each matrix 20 times
- Use the LEVEL optimization strategy
- Generate performance metrics for analysis
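The behaviour described above can be approximated with a short driver. This is a sketch of the described loop, not the contents of run.sh: it assumes every matrix lives at data/<name>/<name>.mtx (the layout shown for onetone1) and that the strategy is passed as the literal string LEVEL. The function names are hypothetical.

```python
import subprocess

MATRICES = [
    "onetone1", "onetone2", "bcircuit",
    "G2_circuit", "hcircuit", "parabolic_fem",
]


def build_commands(matrices=MATRICES, strategy="LEVEL", size=None, repeats=20):
    """Return one argv list per run: each matrix repeated `repeats` times."""
    commands = []
    for name in matrices:
        # Assumed layout: data/<name>/<name>.mtx
        cmd = ["./run_sparse", f"data/{name}/{name}.mtx", strategy]
        if size is not None:
            cmd.append(str(size))  # optional task granularity
        commands.extend(list(cmd) for _ in range(repeats))
    return commands


def run_all(commands):
    """Run each command in order; stop on the first non-zero exit code."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands(size=2048))` from the project root would launch 120 runs (6 matrices, 20 repetitions each).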
You can also run the solver manually:
./run_sparse <matrix_file> [optimization_strategy] [size]
Where:
- <matrix_file>: Path to the matrix file (e.g., "data/onetone1/onetone1.mtx")
- [optimization_strategy]: Optional optimization strategy (e.g., "LA_OPT")
- [size]: Task granularity size
To get detailed performance metrics using NVIDIA Nsight Systems:
nsys profile --stats=true ./run_sparse <matrix_file> [optimization_strategy] [size]
For example:
nsys profile --stats=true ./run_sparse ./data/onetone1/onetone1 LA_OPT
The solver compares its results with serial execution for validation. The console output shows the running time and verification status:
Input: ./data/onetone1/onetone1.mtx
Algorithm: LA_OPT
Granularity: 2048
[CUDA LA_OPT Solve Time] 114.96 ms
[PASS] GPU == CPU
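When collecting timings over many runs, it can help to pull the solve time and verification status out of this output automatically. The sketch below parses the sample shown above; it assumes the line formats are exactly as in that sample, and since the failure message is not documented here, anything without a `[PASS]` marker is simply reported as not passed. The name `parse_output` is hypothetical.

```python
import re

# Matches lines such as "[CUDA LA_OPT Solve Time] 114.96 ms".
TIME_RE = re.compile(r"^\[CUDA (\S+) Solve Time\]\s+([0-9.]+) ms", re.MULTILINE)


def parse_output(text):
    """Extract the algorithm name, solve time in ms, and pass status."""
    match = TIME_RE.search(text)
    return {
        "algorithm": match.group(1) if match else None,
        "solve_ms": float(match.group(2)) if match else None,
        "passed": "[PASS]" in text,
    }


sample = """Input: ./data/onetone1/onetone1.mtx
Algorithm: LA_OPT
Granularity: 2048
[CUDA LA_OPT Solve Time] 114.96 ms
[PASS] GPU == CPU
"""
print(parse_output(sample))
```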
When the solver is run under nsys, the profiler writes performance metrics and analysis files:
- .nsys-rep files: NVIDIA Nsight Systems performance reports
- .sqlite files: Performance data in SQLite format
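The .sqlite export can be inspected with any SQLite client. As a minimal sketch, the helper below lists the tables in such a file using only Python's standard library; it makes no assumption about which tables Nsight Systems writes, because the schema depends on the nsys version. The function name `list_tables` and the file name in the usage note are hypothetical.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in a SQLite file, sorted by name."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `list_tables("report1.sqlite")` would return the table names in a report of that name, which is a reasonable starting point before writing queries against specific tables.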
To clean build artifacts:
make clean
Helal, Ahmed E., et al. "Adaptive task aggregation for high-performance sparse solvers on GPUs." 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT). IEEE, 2019.