Lab exercise of the Parallel Processing course at NTUA on CUDA programming

Parallel Programming for GPUs - Matrix Multiplication

Dense Matrix Multiplication (DMM) is a core component of many scientific computations. In this repository, we implement DMM for GPUs in CUDA using four algorithms, each one improving on the performance of the previous.

Algorithms

  • Naive: Simple implementation where each thread computes one element of the output matrix, reading all operands from global memory.
  • Coalesced memory accesses of A: Load tiles of the input matrix A into shared memory.
  • Reduced memory accesses: Load tiles of both input matrices A and B into shared memory.
  • cuBLAS: Use the matrix multiplication routine of the cuBLAS library.
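The first three variants can be sketched as CUDA kernels. The following is a minimal illustration, not the repository's actual code: it assumes square N×N row-major matrices, N a multiple of the tile size, and hypothetical names (dmm_naive, dmm_tiled, TILE):

```cuda
#define TILE 16

// Naive variant: each thread computes one C element,
// reading every operand directly from global memory.
__global__ void dmm_naive(const float *A, const float *B, float *C, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}

// Tiled variant: tiles of A and B are staged in shared memory,
// so each global-memory element is loaded once per tile instead of
// once per thread. Assumes N is a multiple of TILE.
__global__ void dmm_tiled(const float *A, const float *B, float *C, int N)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < N / TILE; ++t) {
        // Cooperative, coalesced loads of one tile of A and one tile of B.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();

        // Partial dot product over the staged tiles.
        for (int k = 0; k < TILE; ++k)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = sum;
}
```

The intermediate "coalesced accesses of A" variant sits between these two: it stages only A in shared memory while still streaming B from global memory.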

Brief results

All experiments were performed on an NVIDIA Tesla K40c (Kepler architecture, compute capability 3.5).

  • Total performance on 2048×2048 matrices

  • Choosing the optimal thread block size

  • Performance across different problem sizes
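The thread block size examined above is a host-side launch parameter. A hypothetical launch sketch (the kernel name, device pointers d_A, d_B, d_C, and the 16×16 choice are illustrative, not taken from the repository):

```cuda
// Block size is the tunable parameter: 16x16 = 256 threads per block.
dim3 block(16, 16);
// Round the grid up so every matrix element is covered.
dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);

dmm_kernel<<<grid, block>>>(d_A, d_B, d_C, N);  // hypothetical kernel name
cudaDeviceSynchronize();                        // wait before timing/checking
```

Sweeping the block dimensions (e.g. 8×8, 16×16, 32×32) and timing each configuration is how an optimal block size is typically chosen for a given device.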

Project Structure

  • cuda: Source code for DMM.
  • common: Helper source code.
  • make: Scripts for compiling the source code.
  • plots: Plots used to analyze the results.
  • results: Performance of different scenarios.
  • report: Final report in Greek.

