Implementation of LeNet-1 Forward Propagation algorithm in CUDA C and profiling of the possible optimisation solutions

Luca-Dalmasso/LeNetCUDA

LeNetCUDA

The purpose of this project is to accelerate the forward propagation of a simple Convolutional Neural Network on an NVIDIA GPU and to show which architectural choices can be used to speed up code running on a GPU.

Project overview

The following folders contain the source code:
header files
source files
main application
The profiling script is used to profile the application. It lets you select the metrics and events to collect with nvprof, the command-line profiler shipped with the NVIDIA CUDA Toolkit. The script generates three files (_exhautive, _medium, _light) containing profiling information at different levels of detail. Examples can be found here and here.
More details about the project can be found in the report.
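As a rough illustration of the three detail levels described above, the sketch below builds an nvprof command line for each level. It is not the project's profiling script: the function name `nvprof_cmd`, the metric lists, the log file names, and the binary name `./lenet` are all assumptions made for this example. Only the `--metrics` and `--log-file` options and the metric names are standard nvprof features.

```shell
#!/bin/sh
# Hypothetical helper: print an nvprof command line for a given detail level.
# The metric lists are illustrative, not the ones used by the project's script.
nvprof_cmd() {
    level=$1   # light | medium | exhaustive
    app=$2     # path to the compiled binary (name assumed)
    case "$level" in
        light)      metrics="achieved_occupancy" ;;
        medium)     metrics="achieved_occupancy,gld_efficiency,gst_efficiency" ;;
        exhaustive) metrics="all" ;;
        *) echo "unknown level: $level" >&2; return 1 ;;
    esac
    # Print the command instead of running it, so it can be inspected first.
    echo "nvprof --metrics $metrics --log-file profile_$level.log $app"
}

nvprof_cmd medium ./lenet
```

Printing the command rather than executing it keeps the sketch usable on a machine without a GPU; on the target board the printed line could be pasted into the terminal or passed to `eval`.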

How to compile, run & profile

To use the application you need an NVIDIA GPU and the NVIDIA CUDA Toolkit installed on your machine.
Instructions on how to install the Toolkit can be found here.

Compile

You can compile the source files using make

make

and clean the compilation files using

make clean

I suggest changing the flag GPU_ARCHITECTURE=sm_53 to match your GPU. The value above suits my NVIDIA Jetson Nano board, whose Tegra X1 GPU is based on the Maxwell architecture.
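The value of GPU_ARCHITECTURE follows from the GPU's compute capability: 5.3 on the Jetson Nano becomes sm_53. The small sketch below does that conversion. The helper name `cc_to_sm` is made up for this example, and the commented `make` line assumes the Makefile lets the variable be overridden from the command line, which may not be the case.

```shell
#!/bin/sh
# Hypothetical helper: turn a CUDA compute capability such as "5.3"
# into the matching nvcc architecture name ("sm_53").
cc_to_sm() {
    echo "sm_$(echo "$1" | tr -d '.')"
}

cc_to_sm 5.3   # Jetson Nano (Maxwell)
cc_to_sm 7.5   # a Turing GPU

# Assumed usage, if the Makefile allows overriding the variable:
#   make GPU_ARCHITECTURE="$(cc_to_sm 7.5)"
```

If the override does not take effect, editing the flag directly in the Makefile, as suggested above, is the safe route.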
I highly suggest taking a look at the references I used for the CNN documentation; you can find them in the report.
