Matrix Multiplication using GPU (CUDA)

A CUDA implementation of matrix multiplication in two versions: one using global memory and one using shared memory.

The input follows this pattern:

The number of rows of Matrix A
The number of columns of Matrix A
The number of rows of Matrix B
The number of columns of Matrix B
The values of Matrix A
The values of Matrix B
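As an illustration of the input pattern above, here is a minimal host-side sketch of a reader. It is not taken from the repository: the `MatrixInput` struct and `readInput` function are hypothetical names, and the sketch assumes whitespace-separated values, `float` elements, and row-major order, none of which the README states.

```cuda
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical container for one problem instance (not from the repository).
struct MatrixInput {
    int rowsA = 0, colsA = 0, rowsB = 0, colsB = 0;
    std::vector<float> a;  // assumed row-major, rowsA * colsA values
    std::vector<float> b;  // assumed row-major, rowsB * colsB values
};

// Reads the six fields in the order listed above.
// Returns false on malformed input or when A * B is not defined.
bool readInput(FILE* in, MatrixInput& m) {
    if (fscanf(in, "%d %d %d %d", &m.rowsA, &m.colsA, &m.rowsB, &m.colsB) != 4)
        return false;
    if (m.rowsA <= 0 || m.colsA <= 0 || m.rowsB <= 0 || m.colsB <= 0)
        return false;
    // The product A * B requires columns(A) == rows(B).
    if (m.colsA != m.rowsB)
        return false;

    m.a.resize(static_cast<size_t>(m.rowsA) * m.colsA);
    m.b.resize(static_cast<size_t>(m.rowsB) * m.colsB);
    for (float& v : m.a)
        if (fscanf(in, "%f", &v) != 1) return false;
    for (float& v : m.b)
        if (fscanf(in, "%f", &v) != 1) return false;
    return true;
}
```

Calling `readInput(stdin, m)` would read a generated input piped to the program, assuming the program reads from standard input.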

The shared memory version is implemented by dividing the matrices into blocks.
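The block-based approach can be sketched with the usual tiling pattern: each thread block copies one tile of A and one tile of B into shared memory, accumulates partial products, and moves on to the next pair of tiles. This is a sketch of the general technique, not the code in cuda_matrix_shared.cu. The tile width `TILE`, the names `matMulTiled` and `multiplyTiled`, and the row-major `float` layout are assumptions, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

#define TILE 16  // assumed tile width; the repository's block size is not stated

// C (m x p) = A (m x n) * B (n x p), all matrices row-major.
__global__ void matMulTiled(const float* A, const float* B, float* C,
                            int m, int n, int p) {
    __shared__ float tileA[TILE][TILE];
    __shared__ float tileB[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk along the shared dimension one tile at a time.
    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Out-of-range elements are padded with zero so edge tiles stay correct.
        tileA[threadIdx.y][threadIdx.x] =
            (row < m && aCol < n) ? A[row * n + aCol] : 0.0f;
        tileB[threadIdx.y][threadIdx.x] =
            (bRow < n && col < p) ? B[bRow * p + col] : 0.0f;
        __syncthreads();  // both tiles fully loaded before use

        for (int k = 0; k < TILE; ++k)
            acc += tileA[threadIdx.y][k] * tileB[k][threadIdx.x];
        __syncthreads();  // finish reading before the next load overwrites
    }

    if (row < m && col < p)
        C[row * p + col] = acc;
}

// Host wrapper: copies inputs to the device, launches the kernel, copies C back.
std::vector<float> multiplyTiled(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 int m, int n, int p) {
    std::vector<float> C(static_cast<size_t>(m) * p);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((p + TILE - 1) / TILE, (m + TILE - 1) / TILE);
    matMulTiled<<<grid, block>>>(dA, dB, dC, m, n, p);

    // This copy runs on the default stream, so it waits for the kernel to finish.
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Each element of A and B is read from global memory once per tile rather than once per output element, which is the usual reason a shared memory version can outperform a global memory one.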

Tests

Here are a few tests run on a GeForce GTX 960M. All times are in milliseconds.

I ran each test 7 times for each matrix size (number of rows) and averaged the times.
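One way to take such averaged measurements is with CUDA events. The sketch below is an assumption about method, not the author's measurement code: the README does not say which timer was used or whether memory transfers were included. The helper `averageMs` and the placeholder kernel `scaleKernel` are hypothetical names, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Placeholder workload (hypothetical); substitute the kernel being measured.
__global__ void scaleKernel(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

// Returns the mean GPU time in milliseconds over `runs` calls of `launch`.
// `launch` is any callable that enqueues work on the default stream.
template <typename Launch>
float averageMs(Launch launch, int runs = 7) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float total = 0.0f;
    for (int i = 0; i < runs; ++i) {
        cudaEventRecord(start);
        launch();
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);  // wait until the timed work has finished
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        total += ms;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return total / runs;
}
```

A call might look like `averageMs([&] { scaleKernel<<<blocks, threads>>>(d, n); })`, where `d`, `n`, `blocks`, and `threads` are set up by the caller. Timing only the launch, as here, excludes host-to-device copies; including them in the callable would measure end-to-end time instead.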

I also compared the difference in time between the global memory and shared memory versions.

Usage

(Optional) Use the script generate_input.py to generate random input matrices.
Compile cuda_matrix_global.cu or cuda_matrix_shared.cu with nvcc, then run the resulting program.

XinyaoYI/cuda_matrix_multiplication
