
XinyaoYI/cuda_matrix_multiplication

Matrix Multiplication using GPU (CUDA)

CUDA matrix multiplication implemented in two versions: one that uses only global memory and one that uses shared memory.
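The global-memory version can be pictured as the classic "one thread per output element" kernel. The sketch below is a minimal illustration under stated assumptions, not the repository's code: matrices are assumed to be row-major `float` arrays, the 16x16 block size is an arbitrary choice, and the names `matMulGlobal` and `multiplyGlobal` are hypothetical.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread computes one element C[row][col] of C = A * B.
// All matrices are assumed to be stored row-major in global memory.
__global__ void matMulGlobal(const float* A, const float* B, float* C,
                             int rowsA, int colsA, int colsB) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rowsA && col < colsB) {
        float sum = 0.0f;
        for (int k = 0; k < colsA; ++k) {
            // Every operand is read directly from global memory.
            sum += A[row * colsA + k] * B[k * colsB + col];
        }
        C[row * colsB + col] = sum;
    }
}

// Hypothetical host helper: copies A and B to the device, launches the kernel
// and copies C back. Error checking is omitted for brevity.
std::vector<float> multiplyGlobal(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  int rowsA, int colsA, int colsB) {
    assert(A.size() == static_cast<size_t>(rowsA) * colsA);
    assert(B.size() == static_cast<size_t>(colsA) * colsB);
    std::vector<float> C(static_cast<size_t>(rowsA) * colsB);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    // Enough 16x16 blocks to cover every element of C.
    dim3 block(16, 16);
    dim3 grid((colsB + block.x - 1) / block.x, (rowsA + block.y - 1) / block.y);
    matMulGlobal<<<grid, block>>>(dA, dB, dC, rowsA, colsA, colsB);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Each thread performs `colsA` global-memory reads from A and from B, and neighbouring threads re-read the same values; that redundancy is what a shared-memory version tries to remove.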

The input follows this pattern:

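  1. The number of rows of Matrix A
  2. The number of columns of Matrix A
  3. The number of rows of Matrix B
  4. The number of columns of Matrix B
  5. The values of Matrix A
  6. The values of Matrix B

The number of columns of Matrix A must equal the number of rows of Matrix B for the product to be defined.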
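A host-side reader for this layout might look like the sketch below. It assumes whitespace-separated values and that each matrix is listed row by row (row-major); the function name `readMatrices` is hypothetical, and the repository's own reader may differ.

```cuda
#include <cassert>
#include <sstream>  // also brings in <istream>
#include <vector>

// Hypothetical reader for the input layout above:
// rowsA colsA rowsB colsB, then the values of A, then the values of B.
// Returns false on malformed input or incompatible dimensions.
bool readMatrices(std::istream& in, int& rowsA, int& colsA,
                  int& rowsB, int& colsB,
                  std::vector<float>& A, std::vector<float>& B) {
    if (!(in >> rowsA >> colsA >> rowsB >> colsB)) return false;
    // A (rowsA x colsA) * B (rowsB x colsB) is only defined when colsA == rowsB.
    if (rowsA <= 0 || colsA <= 0 || rowsB <= 0 || colsB <= 0 || colsA != rowsB)
        return false;
    A.resize(static_cast<size_t>(rowsA) * colsA);
    B.resize(static_cast<size_t>(rowsB) * colsB);
    for (float& v : A)
        if (!(in >> v)) return false;  // too few values for A
    for (float& v : B)
        if (!(in >> v)) return false;  // too few values for B
    return true;
}
```

Because it takes any `std::istream`, the same function works with `std::cin`, an `std::ifstream` opened on a file produced by generate_input.py, or an `std::istringstream` in tests.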

The shared-memory version divides the matrices into blocks (tiles) that are loaded into shared memory before being multiplied.
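Block-wise (tiled) multiplication in shared memory is commonly written as below. This is a sketch of the standard tiling technique, not the repository's kernel: the tile width of 16, row-major `float` storage, and the names `matMulShared` and `multiplyShared` are all assumptions.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // assumed tile width, not taken from the repository

// Each block computes one TILE x TILE tile of C. The outer loop walks over the
// shared dimension one tile at a time: the block cooperatively loads a tile of
// A and a tile of B into shared memory, synchronizes, and then every thread
// reuses those values TILE times instead of re-reading global memory.
__global__ void matMulShared(const float* A, const float* B, float* C,
                             int rowsA, int colsA, int colsB) {
    __shared__ float tileA[TILE][TILE];
    __shared__ float tileB[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < (colsA + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;  // column of A loaded by this thread
        int bRow = t * TILE + threadIdx.y;  // row of B loaded by this thread
        // Out-of-range elements are padded with zeros so partial tiles work.
        tileA[threadIdx.y][threadIdx.x] =
            (row < rowsA && aCol < colsA) ? A[row * colsA + aCol] : 0.0f;
        tileB[threadIdx.y][threadIdx.x] =
            (bRow < colsA && col < colsB) ? B[bRow * colsB + col] : 0.0f;
        __syncthreads();  // both tiles are fully loaded

        for (int k = 0; k < TILE; ++k)
            sum += tileA[threadIdx.y][k] * tileB[k][threadIdx.x];
        __syncthreads();  // finish reading before the next tiles overwrite them
    }
    if (row < rowsA && col < colsB)
        C[row * colsB + col] = sum;
}

// Hypothetical host helper: copies inputs, launches one block per tile of C,
// copies the result back. Error checking is omitted for brevity.
std::vector<float> multiplyShared(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  int rowsA, int colsA, int colsB) {
    assert(A.size() == static_cast<size_t>(rowsA) * colsA);
    assert(B.size() == static_cast<size_t>(colsA) * colsB);
    std::vector<float> C(static_cast<size_t>(rowsA) * colsB);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((colsB + TILE - 1) / TILE, (rowsA + TILE - 1) / TILE);
    matMulShared<<<grid, block>>>(dA, dB, dC, rowsA, colsA, colsB);

    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

The loads use a conditional expression rather than an `if`, so every thread in the block reaches both `__syncthreads()` calls, which is required for correctness.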

Tests

Here are a few tests run on a GeForce GTX 960M. The times are in milliseconds.

I ran each test 7 times for each number of rows and averaged the results.
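One way to measure GPU time averaged over 7 runs is with CUDA events, as sketched below. The README does not say how the original timings were taken, so the use of events and the helper name `averageTimeMs` are assumptions.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: times `launch` (any callable that enqueues GPU work on
// the default stream) with CUDA events and returns the mean over `runs`
// repetitions, in milliseconds.
template <typename Launch>
float averageTimeMs(Launch launch, int runs = 7) {
    assert(runs > 0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float total = 0.0f;
    for (int i = 0; i < runs; ++i) {
        cudaEventRecord(start);
        launch();
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);  // wait until the recorded work has finished
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        total += ms;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return total / runs;
}
```

Wrapping only a kernel launch, for example `averageTimeMs([&] { myKernel<<<grid, block>>>(args); })` where `myKernel` is a placeholder, measures kernel time without the host-to-device copies; wrapping the copies as well would give end-to-end times instead.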

plot1

The following plot shows the difference in time between the global-memory and shared-memory versions.

plot2

Usage

  1. (optional) Use the script generate_input.py to generate a random matrix input.
  2. Compile and run the program.

About

Matrix Multiplication using CUDA


Languages

  • Cuda 76.5%
  • Shell 9.2%
  • R 9.2%
  • Python 5.1%