Matrix Multiplication — CPU vs GPU

A comparison of matrix multiplication (C = A × B) implementations: sequential, distributed-parallel (MPI), and GPU-accelerated (CUDA, cuBLAS).

Parallel Programming course, Fasilkom UI.

Implementations

File	Description	Platform
`matrix_mul_seq.c`	Sequential algorithm with ikj loop order; three nested loops	CPU (1 core)
`matrix_mul_mpi.c`	MPI parallelization; rows distributed via `MPI_Scatter`/`MPI_Gather`; matrix B broadcast to all ranks	CPU cluster
`matrix_mul_cuda.cu`	Three CUDA kernel modes: coalesced (2D grid), row-wise (1D grid), uncoalesced (x/y swapped)	GPU (CUDA)
`matrix_mul_cuda_shared.cu`	Tiled kernel using shared memory (`__shared__`)	GPU (CUDA)
`matrix_mul_cublas.cu`	cuBLAS library (`cublasDgemm`)	GPU (cuBLAS)
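
As an illustration of the ikj loop order named in the table, here is a minimal sketch in C. The function name `matmul_ikj`, its signature, and the row-major layout are assumptions for this example and are not taken from `matrix_mul_seq.c`.

```c
#include <stddef.h>

/* C = A x B for row-major n x n matrices, using the i-k-j loop order.
 * Hypothetical helper: the name and signature are assumptions. */
void matmul_ikj(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;                       /* C accumulates, so start at zero */

    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];    /* reused across the whole j loop */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];  /* unit stride in B and C */
        }
    }
}
```

Compared with the textbook ijk order, the innermost loop here walks both B and C along a row, which tends to be friendlier to the CPU cache for row-major data.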

Prerequisites

Target	Compiler / Runtime
seq	`gcc` with OpenMP
mpi	`mpicc` (OpenMPI / MPICH)
cuda	`nvcc` (CUDA Toolkit) + GPU NVIDIA
cuda_shared	`nvcc` + GPU NVIDIA
cublas	`nvcc` + cuBLAS + GPU NVIDIA

Build

All targets (CPU + GPU):

make

CPU only (seq + mpi):

make -f Makefile.cpu

GPU only (cuda + cuda_shared + cublas):

make -f Makefile.gpu

Individual targets:

make seq                   # matrix_mul_seq
make cuda                  # matrix_mul_cuda
make cuda_shared           # matrix_mul_cuda_shared
make cublas                # matrix_mul_cublas
make mpi                   # matrix_mul_mpi
make clean                 # remove all binaries

How to Run

General argument format: <N> [options]. Every program accepts N (the size of the N×N matrices, default: 512).

matrix_mul_seq

./matrix_mul_seq <N>

matrix_mul_cuda

./matrix_mul_cuda <N> <blockSize> <mode> <verify>

blockSize: 8, 16, or 32 (default: 16)
mode: 0 = coalesced (2D), 1 = row-wise, 2 = uncoalesced (default)
verify: 0 or 1 (0 disables verification)

Examples:

./matrix_mul_cuda 1024 16 0 0   # coalesced, N=1024, no verification
./matrix_mul_cuda 1024          # uncoalesced (default), N=1024
./matrix_mul_cuda 512 32        # uncoalesced (default), N=512, block=32
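
To make the difference between the coalesced and uncoalesced modes concrete, the sketch below computes, in plain C, which element of a row-major C a given thread would write. It assumes the coalesced mode maps the thread's x index to the column and the uncoalesced mode swaps x and y; the actual index arithmetic in `matrix_mul_cuda.cu` may differ. Both function names are hypothetical.

```c
#include <stddef.h>

/* Offset into row-major C written by thread (tx, ty) of block (bx, by).
 * swap_xy = 0: x -> column, y -> row (assumed "coalesced" mapping).
 * swap_xy = 1: x -> row, y -> column (assumed "uncoalesced" mapping). */
size_t element_offset(size_t bx, size_t by, size_t tx, size_t ty,
                      size_t block, size_t n, int swap_xy)
{
    size_t gx = bx * block + tx;   /* global index along x */
    size_t gy = by * block + ty;   /* global index along y */
    size_t row = swap_xy ? gx : gy;
    size_t col = swap_xy ? gy : gx;
    return row * n + col;
}

/* Distance, in elements, between the addresses touched by two threads
 * that are adjacent in x (and therefore adjacent within a warp). */
size_t stride_between_neighbors(size_t block, size_t n, int swap_xy)
{
    return element_offset(0, 0, 1, 0, block, n, swap_xy)
         - element_offset(0, 0, 0, 0, block, n, swap_xy);
}
```

Under these assumptions, neighboring threads are 1 element apart in the coalesced mapping and N elements apart in the swapped mapping, so a warp's accesses land in one contiguous segment in the first case and are spread over N-element strides in the second.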

matrix_mul_cuda_shared

./matrix_mul_cuda_shared <N> <blockSize> <verify>

blockSize: 8, 16, or 32
verify: 0 or 1
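
The tiling idea behind this program can be sketched on the CPU. The code below is an analogue in plain C, not the CUDA kernel from `matrix_mul_cuda_shared.cu`: the local arrays `ta` and `tb` play the role that `__shared__` buffers would play on the GPU, and `TILE` stands in for blockSize. All names here are hypothetical.

```c
#include <stddef.h>

#define TILE 16  /* stands in for blockSize; hypothetical constant */

/* CPU analogue of a tiled multiply: C = A x B, row-major n x n.
 * ta/tb mimic the per-block __shared__ buffers of a tiled kernel. */
void matmul_tiled(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t i0 = 0; i0 < n; i0 += TILE)
    for (size_t j0 = 0; j0 < n; j0 += TILE)
    for (size_t k0 = 0; k0 < n; k0 += TILE) {
        double ta[TILE][TILE], tb[TILE][TILE];

        /* Stage one tile of A and one of B, zero-padding past the edge. */
        for (size_t y = 0; y < TILE; y++) {
            for (size_t x = 0; x < TILE; x++) {
                ta[y][x] = (i0 + y < n && k0 + x < n)
                         ? a[(i0 + y) * n + k0 + x] : 0.0;
                tb[y][x] = (k0 + y < n && j0 + x < n)
                         ? b[(k0 + y) * n + j0 + x] : 0.0;
            }
        }

        /* Accumulate this tile pair's partial products into C. */
        for (size_t y = 0; y < TILE && i0 + y < n; y++) {
            for (size_t x = 0; x < TILE && j0 + x < n; x++) {
                double sum = 0.0;
                for (size_t k = 0; k < TILE; k++)
                    sum += ta[y][k] * tb[k][x];
                c[(i0 + y) * n + j0 + x] += sum;
            }
        }
    }
}
```

Each staged tile is read TILE times from the small buffer instead of from the full matrices, which is the same reuse a shared-memory kernel exploits to cut global memory traffic.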

matrix_mul_cublas

./matrix_mul_cublas <N> <verify>

matrix_mul_mpi

mpirun -np <ranks> ./matrix_mul_mpi <N>

N must be evenly divisible by the number of ranks.

Example:

mpirun -np 4 ./matrix_mul_mpi 1024
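
The divisibility requirement follows from how `MPI_Scatter` works: it sends the same number of elements to every rank, so the rows of A have to split evenly. The helpers below sketch that arithmetic in plain C without calling MPI; their names are hypothetical and not taken from `matrix_mul_mpi.c`.

```c
#include <stddef.h>

/* Rows of A handled by each rank, or 0 when n cannot be split evenly
 * (including ranks == 0). */
size_t rows_per_rank(size_t n, size_t ranks)
{
    if (ranks == 0 || n % ranks != 0)
        return 0;
    return n / ranks;
}

/* Number of doubles each rank would receive from A in an even scatter:
 * its block of rows, each n elements wide. */
size_t scatter_count(size_t n, size_t ranks)
{
    return rows_per_rank(n, ranks) * n;
}
```

For the example above (N=1024 on 4 ranks) this gives 256 rows and 262144 elements per rank. A size such as N=1000 on 3 ranks has no even split, so the helper returns 0.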

`make run` Targets

make run-seq
make run-cuda
make run-cuda-shared
make run-cublas
make run-mpi      # default: 8 ranks, N=512

Output Metrics

Every program prints:

Computation time: execution time of the core computation (kernel or matrix loop)
Communication time: data transfer time (host ↔ device, or MPI)
Checksum: the sum of all elements of C, used as a numerical check (expected: 2 × N³)
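
The expected checksum of 2 × N³ is what results when every one of the N² elements of C equals 2N, for instance if A is filled with 1.0 and B with 2.0. That fill pattern is an assumption; this README does not state how the inputs are initialized. A minimal sketch of the check in C, with hypothetical function names:

```c
#include <stddef.h>

/* Expected checksum 2 * N^3, computed in double to avoid integer overflow. */
double expected_checksum(size_t n)
{
    double d = (double)n;
    return 2.0 * d * d * d;
}

/* Sum of all n x n elements of a row-major matrix. */
double checksum(const double *c, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n * n; i++)
        sum += c[i];
    return sum;
}
```

For N=512 the expected value is 268435456. With real results, comparing against the expected value with a small tolerance is safer than exact equality, because floating-point summation order can vary between implementations.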

Example output:

CUDA Matrix Multiplication (N=512, blockSize=16, mode=Coalesced)
============================================================
Grid: 32x32, Block: 16x16
Computation time: 0.002341 seconds
Communication time: 0.009876 seconds
Checksum: 268435456.000000 (expected: 268435456.0)
Verification PASSED
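
The grid size in the sample output is consistent with one thread per element of C: 512 / 16 = 32 blocks along each dimension. A sketch of that calculation in C follows, using ceiling division so that sizes not divisible by blockSize would still be covered. Whether the programs here round up or support such sizes is not stated in this README, and the function name is hypothetical.

```c
#include <stddef.h>

/* Blocks needed along one dimension to cover n elements with `block`
 * threads per block (ceiling division). Assumes block > 0. */
size_t blocks_per_dim(size_t n, size_t block)
{
    return (n + block - 1) / block;
}
```

With N=512 and blockSize=16 this returns 32, matching the `Grid: 32x32` line; with N=1000 it would return 63, leaving the last block partly idle.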

Packaging

Create a .tar.gz archive containing only the relevant files:

./package_cpu.sh      # → pr2_cpu_matrix_mul_YYYYMMDD.tar.gz
./package_gpu.sh      # → pr2_gpu_matrix_mul_YYYYMMDD.tar.gz

Kubernetes / GPU Cluster

The YAML files deploy an NVHPC container on a GPU cluster with an NFS volume. Change the NFS server and path to match your environment.

pods-working-user03-gpu02.yaml — node gputype: gpu-02
pods-working-user03-gpu03.yaml — node gputype: gpu-03

kubectl apply -f pods-working-user03-gpu02.yaml
kubectl apply -f pods-working-user03-gpu03.yaml
