# **Fully Connected Neural Network: A `CUDA` and `C++` Implementation**

## **Prepare workspace**

In [33]:
from google.colab import drive
drive.mount("/content/drive")
%cd /content/drive/MyDrive/PP/prj

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/PP/prj


## **Extract `.gz` data (if needed)**

In [None]:
# Extract the MNIST archives from their `.gz` files into `mnist/`.
# This only needs to run once; skip it if the files are already extracted.
!pip install patool
import patoolib
patoolib.extract_archive("mnist/t10k-images-idx3-ubyte.gz", outdir="mnist")
patoolib.extract_archive("mnist/t10k-labels-idx1-ubyte.gz", outdir="mnist")
patoolib.extract_archive("mnist/train-images-idx3-ubyte.gz", outdir="mnist")
patoolib.extract_archive("mnist/train-labels-idx1-ubyte.gz", outdir="mnist")

Collecting patool
  Downloading patool-3.1.0-py2.py3-none-any.whl.metadata (4.3 kB)
Downloading patool-3.1.0-py2.py3-none-any.whl (98 kB)
Installing collected packages: patool
Successfully installed patool-3.1.0


INFO patool: Extracting mnist/t10k-images-idx3-ubyte.gz ...
INFO patool: running /usr/bin/7z e -omnist -- mnist/t10k-images-idx3-ubyte.gz
INFO patool: ... mnist/t10k-images-idx3-ubyte.gz extracted to `mnist'.
INFO patool: Extracting mnist/t10k-labels-idx1-ubyte.gz ...
INFO patool: running /usr/bin/7z e -omnist -- mnist/t10k-labels-idx1-ubyte.gz
INFO patool: ... mnist/t10k-labels-idx1-ubyte.gz extracted to `mnist'.
INFO patool: Extracting mnist/train-images-idx3-ubyte.gz ...
INFO patool: running /usr/bin/7z e -omnist -- mni

'mnist'
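
If installing `patool` is not desirable, the same extraction can probably be done with Python's standard library alone. The cell below is a minimal sketch, not the notebook's original method: `extract_gz` and `MNIST_ARCHIVES` are hypothetical names introduced here, and it assumes the program expects each file under `mnist/` with the archive's name minus the trailing `.gz`.

```python
import gzip
import shutil
from pathlib import Path

# Archive names copied from the extraction cell above.
MNIST_ARCHIVES = [
    "mnist/t10k-images-idx3-ubyte.gz",
    "mnist/t10k-labels-idx1-ubyte.gz",
    "mnist/train-images-idx3-ubyte.gz",
    "mnist/train-labels-idx1-ubyte.gz",
]


def extract_gz(archive, outdir):
    """Decompress one .gz file into outdir and return the output path."""
    archive = Path(archive)
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    # Path.stem drops only the last suffix, i.e. the trailing ".gz".
    target = outdir / archive.stem
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams the data in chunks
    return target


# Extract every archive that is present; missing ones are skipped silently.
for name in MNIST_ARCHIVES:
    if Path(name).exists():
        print("extracted", extract_gz(name, "mnist"))
```

Unlike the `7z`-based extraction, this sketch ignores any file name stored inside the gzip header, so the output name always follows the archive name.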

## **Edit `Makefile`**

In [34]:
%%writefile Makefile

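# Compilers
# NOTE: CXX and CXX_FLAGS are defined but not used by the build rule below;
# nvcc compiles both the .cu and the .cpp sources.
CXX := g++
CXX_FLAGS := -std=c++17 -ggdb
NVCC := nvcc

# Folders
BIN := bin
SRC := src
INCLUDE := include

EXECUTABLE := nn_main

# These targets do not produce files of the same name
.PHONY: all run clean

all: $(BIN)/$(EXECUTABLE)

run: clean all
	clear
	./$(BIN)/$(EXECUTABLE)

# Build the executable from every CUDA and C++ source in the source folder
$(BIN)/$(EXECUTABLE): $(SRC)/*.cu $(SRC)/*.cpp
	@mkdir -p $(BIN)
	$(NVCC) -I $(INCLUDE) $^ -o $@

clean:
	rm -f $(BIN)/*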

Overwriting Makefile


## **Compile and run**

In [74]:
# Compile
!make

nvcc -I include src/main.cu src/nn.cu src/utils_device.cu src/data.cpp src/utils_host.cpp -o bin/nn_main


In [75]:
# Run the program
# Usage: ./bin/nn_main <#-neurons> <#-epochs> <learning-rate>
!./bin/nn_main 20 3 0.5

-- # neurons: 20
-- # epochs: 3
-- learning rate: 0.5
Train Images: 60000 with size 784
Train Labels: 60000 labels loaded
Test Images: 10000 with size 784
Test Labels: 10000 labels loaded

FORWARD TIME CPU: 3748.043701 ms

FORWARD TIME GPU: 339.532196 ms

FORWARD TIME GPU (OPTIMIZED): 335.836914 ms

-- Mean error CPU - GPU: 7.03699e-06
-- Mean error CPU - GPU (optimized): 5.70177e-06

CPU Train start...
-- number of epochs: 3
>>> Epoch 1 CEE loss: 13.0489
>>> Epoch 2 CEE loss: 13.4911
>>> Epoch 3 CEE loss: 16.124
TRAIN TIME: 26333.560547 ms


GPU Train start...
-- number of epochs: 3
>>> Epoch 1 CEE loss: 2.30301
>>> Epoch 2 CEE loss: 2.303
>>> Epoch 3 CEE loss: 2.30303
TRAIN TIME: 15600.505859 ms


GPU Optimized Train start...
-- number of epochs: 3
>>> Epoch 1 CEE loss: 2.30299
>>> Epoch 2 CEE loss: 2.303
>>> Epoch 3 CEE loss: 2.30304
TRAIN TIME: 15391.178711 ms
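
To read the numbers above more easily, the following sketch computes speedups relative to the CPU run and a reference loss value. The timings are copied from the output; `speedups` and `uniform_cross_entropy` are hypothetical helper names introduced here. The reference value assumes that the reported CEE is a per-sample mean using the natural logarithm, which the output itself does not confirm.

```python
import math

# Timings in milliseconds, copied from the program output above.
forward_ms = {
    "CPU": 3748.043701,
    "GPU": 339.532196,
    "GPU (optimized)": 335.836914,
}
train_ms = {
    "CPU": 26333.560547,
    "GPU": 15600.505859,
    "GPU (optimized)": 15391.178711,
}


def speedups(times_ms, baseline="CPU"):
    """Return how many times faster each entry is than the baseline."""
    base = times_ms[baseline]
    return {name: base / t for name, t in times_ms.items()}


def uniform_cross_entropy(num_classes):
    """Cross-entropy (natural log) of predicting 1/num_classes for every class."""
    return -math.log(1.0 / num_classes)


for label, times in (("forward", forward_ms), ("train", train_ms)):
    for name, factor in speedups(times).items():
        print(f"{label:8s} {name:16s} {factor:6.2f}x")

# MNIST has 10 digit classes.
print(f"uniform-guess CEE for 10 classes: {uniform_cross_entropy(10):.5f}")
```

With these timings the GPU forward pass comes out roughly 11x faster than the CPU one, while full training is only about 1.7x faster. Under the stated assumption, the GPU losses of about 2.303 sit at ln(10), the value a network scores when it assigns equal probability to all 10 digits, and the CPU loss rises from epoch to epoch. Both may suggest that training is not converging with a learning rate of 0.5, so the timings are probably better read as throughput measurements than as evidence of a trained model.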

