LU Decomposition is the factorization of a square matrix into two triangular matrices (one lower and one upper) where multiplying the resulting matrices gives the original matrix. This project is a desktop App that aims to accelerate LU Decomposition with OpenMP.
- Programming Language: C++
- Technologies:
- OpenMP, API for multiprocessing programming
The original algorithm is based on the Crout's algorithm for LU Decomposition Reference.
3 methods are used in an attempt to speed up this algorithm: loop parallelism, scheduling, and SIMD parallelism.
- Instruction-level, loop parallelism (OMP-ILP)
- "for-loop" parallelism
- OMP-ILP + OpenMP Scheduling (OMP-DS)
- Add dynamic scheduling (by default, OpenMP performs static scheduling)
- OMP-DS + data-level, SIMD parallelism (OMP-SIMD)
- Add in Single Instruction, Multiple Data (SIMD), or better known as data-leveling parallelism (SIMD), to indicate that multiple iterations of the loop can be executed concurrently by using SIMD instructions.
- For testing purposes, the thread size, and scheduling algorithm that nets the best scheduling times from solution OMP-DS will be taken to test SIMD instructions.
- For matrix size at 256x256 and above, as the number of threads increases, the performance generally increases. Maximum speedup is observed at the highest matrix size, 2048x2048.
- At size 128x128 and below for the matrix size, slowdown is observed.
- For OMP-SIMD, the speedup ranges from 2.1 times at 128x128 to 5.2 times at 2048x2048. This makes OMP-SIMD by far the fastest implementation among all the OpenMP implementations.
- Bosio, image on static vs dynamic scheduling
- Reinders & Jeffers, image on SIMD