- Implemented the forward pass of a neural-network convolution layer on both the CPU and the GPU.
- Accelerated the GPU implementation by 40% through several optimizations (atomic operations, matrix unrolling, etc.).
- Used NVIDIA Visual Profiler to analyze the performance impact of each optimization.
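For reference, a CPU forward pass of a convolution layer can be written as a set of nested loops. This is a sketch, not the repository's code. It assumes a hypothetical `ConvShape` struct, flat row-major NCHW `std::vector<float>` storage, stride 1, no padding, and weights laid out as `k[m][c][p][q]`. Like most CNN frameworks, it computes a cross-correlation (the kernel is not flipped).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layer shape: B images, M output feature maps, C input channels,
// H x W input, K x K kernel. Assumes stride 1 and no padding ("valid" output).
struct ConvShape {
    int B, M, C, H, W, K;
    int H_out() const { return H - K + 1; }
    int W_out() const { return W - K + 1; }
};

// Direct forward pass:
// y[b][m][h][w] = sum over c, p, q of x[b][c][h+p][w+q] * k[m][c][p][q]
std::vector<float> conv_forward_cpu(const std::vector<float>& x,
                                    const std::vector<float>& k,
                                    const ConvShape& s) {
    assert(x.size() == static_cast<std::size_t>(s.B) * s.C * s.H * s.W);
    assert(k.size() == static_cast<std::size_t>(s.M) * s.C * s.K * s.K);
    const int Ho = s.H_out(), Wo = s.W_out();
    std::vector<float> y(static_cast<std::size_t>(s.B) * s.M * Ho * Wo, 0.0f);
    for (int b = 0; b < s.B; ++b)
        for (int m = 0; m < s.M; ++m)
            for (int h = 0; h < Ho; ++h)
                for (int w = 0; w < Wo; ++w) {
                    float acc = 0.0f;  // one output pixel
                    for (int c = 0; c < s.C; ++c)
                        for (int p = 0; p < s.K; ++p)
                            for (int q = 0; q < s.K; ++q)
                                acc += x[((static_cast<std::size_t>(b) * s.C + c) * s.H + h + p) * s.W + w + q]
                                     * k[((static_cast<std::size_t>(m) * s.C + c) * s.K + p) * s.K + q];
                    y[((static_cast<std::size_t>(b) * s.M + m) * Ho + h) * Wo + w] = acc;
                }
    return y;
}
```

A CPU version like this is mostly useful as a correctness baseline: the GPU kernel's output can be compared against it element by element.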
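Matrix unrolling (often called "im2col") turns the convolution into a single matrix multiply, which maps well onto a tiled GPU matmul kernel. The host-side sketch below only illustrates the data layout for one image; the helper names `unroll_input` and `conv_forward_unrolled` are hypothetical, and it assumes stride 1, no padding, NCHW input, and weights stored as `k[m][c][p][q]`. The repository's actual unrolling scheme may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unroll one C x H x W image into a (C*K*K) x (Ho*Wo) matrix.
// Row index: (c*K + p)*K + q; column index: h*Wo + w (one column per output pixel).
std::vector<float> unroll_input(const std::vector<float>& x, int C, int H, int W, int K) {
    const int Ho = H - K + 1, Wo = W - K + 1;
    const std::size_t cols = static_cast<std::size_t>(Ho) * Wo;
    assert(x.size() == static_cast<std::size_t>(C) * H * W);
    std::vector<float> xu(static_cast<std::size_t>(C) * K * K * cols);
    for (int c = 0; c < C; ++c)
        for (int p = 0; p < K; ++p)
            for (int q = 0; q < K; ++q) {
                const std::size_t row = (static_cast<std::size_t>(c) * K + p) * K + q;
                for (int h = 0; h < Ho; ++h)
                    for (int w = 0; w < Wo; ++w)
                        xu[row * cols + static_cast<std::size_t>(h) * Wo + w] =
                            x[(static_cast<std::size_t>(c) * H + h + p) * W + w + q];
            }
    return xu;
}

// Y (M x Ho*Wo) = Kmat (M x C*K*K) * Xu (C*K*K x Ho*Wo).
// With weights stored as k[m][c][p][q], the flat array already is Kmat in row-major order.
std::vector<float> conv_forward_unrolled(const std::vector<float>& x, const std::vector<float>& k,
                                         int M, int C, int H, int W, int K) {
    const int Ho = H - K + 1, Wo = W - K + 1;
    const std::size_t inner = static_cast<std::size_t>(C) * K * K;
    const std::size_t cols = static_cast<std::size_t>(Ho) * Wo;
    assert(k.size() == static_cast<std::size_t>(M) * inner);
    const std::vector<float> xu = unroll_input(x, C, H, W, K);
    std::vector<float> y(static_cast<std::size_t>(M) * cols, 0.0f);
    for (int m = 0; m < M; ++m)
        for (std::size_t i = 0; i < inner; ++i) {
            const float kv = k[static_cast<std::size_t>(m) * inner + i];
            for (std::size_t j = 0; j < cols; ++j)
                y[static_cast<std::size_t>(m) * cols + j] += kv * xu[i * cols + j];
        }
    return y;
}
```

The trade-off is memory: each input pixel is copied up to K*K times into the unrolled matrix, in exchange for a regular, highly parallel matrix multiply.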
DDjackson272/CUDA-Parallel-Programming-CNN
CUDA GPU optimization on CNN