This is an implementation of a simple linear model that classifies handwritten digits from the MNIST dataset using CUDA. It doesn't use any libraries such as TensorFlow, and it was not designed to be extensible. This was really just practice for me to learn how to use CUDA and to learn about backpropagation & gradient descent at a low level. This implementation is also not memory optimized. It uses pitched memory allocations for the CUDA kernels, since the row padding a pitched allocation adds keeps each row aligned and can make device memory accesses faster than with plain unpadded arrays, but other than that there are no speed optimizations.
To compile, simply run `make install`. Note that to compile and run this program, you need the CUDA toolkit, NVIDIA drivers, and an NVIDIA GPU.