# **Fully Connected Neural Network: A `CUDA` and `C++` Implementation**

## **Prepare workspace**

In [1]:
from google.colab import drive
drive.mount("/content/drive")
%cd /content/drive/MyDrive/PP_Project

Mounted at /content/drive
/content/drive/MyDrive/PP_Project


## **Extract `.gz` data (if needed)**

In [None]:
# Extract data from `.gz`
# Only need to run once!
!pip install patool
import patoolib
patoolib.extract_archive("mnist/t10k-images-idx3-ubyte.gz", outdir="mnist")
patoolib.extract_archive("mnist/t10k-labels-idx1-ubyte.gz", outdir="mnist")
patoolib.extract_archive("mnist/train-images-idx3-ubyte.gz", outdir="mnist")
patoolib.extract_archive("mnist/train-labels-idx1-ubyte.gz", outdir="mnist")

Collecting patool
  Downloading patool-3.1.0-py2.py3-none-any.whl.metadata (4.3 kB)
Downloading patool-3.1.0-py2.py3-none-any.whl (98 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.4/98.4 kB 2.0 MB/s eta 0:00:00
Installing collected packages: patool
Successfully installed patool-3.1.0


INFO patool: Extracting mnist/t10k-images-idx3-ubyte.gz ...
INFO patool: running /usr/bin/7z e -omnist -- mnist/t10k-images-idx3-ubyte.gz
INFO patool: ... mnist/t10k-images-idx3-ubyte.gz extracted to `mnist'.
INFO patool: Extracting mnist/t10k-labels-idx1-ubyte.gz ...
INFO patool: running /usr/bin/7z e -omnist -- mnist/t10k-labels-idx1-ubyte.gz
INFO patool: ... mnist/t10k-labels-idx1-ubyte.gz extracted to `mnist'.
INFO patool: Extracting mnist/train-images-idx3-ubyte.gz ...
INFO patool: running /usr/bin/7z e -omnist -- mni

'mnist'
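
A quick way to confirm that extraction produced valid files is to decode the IDX header at the start of each one. The sketch below is not part of the original project: `parse_idx_header` and `read_idx_header` are hypothetical helpers, and the demo runs on synthetic header bytes rather than the real files. It relies only on the published IDX layout (big-endian 32-bit fields, magic number 2051 for image files and 2049 for label files).

```python
import struct

# Magic numbers defined by the IDX format used for MNIST.
IMAGE_MAGIC = 2051  # 0x00000803: unsigned-byte data, 3 dimensions
LABEL_MAGIC = 2049  # 0x00000801: unsigned-byte data, 1 dimension


def parse_idx_header(header):
    """Decode an IDX header from raw bytes and return (magic, dims)."""
    (magic,) = struct.unpack(">I", header[:4])
    ndim = magic & 0xFF  # the low byte stores the number of dimensions
    dims = struct.unpack(">" + "I" * ndim, header[4:4 + 4 * ndim])
    return magic, dims


def read_idx_header(path):
    """Hypothetical helper: read and decode the header of an IDX file on disk."""
    with open(path, "rb") as f:
        return parse_idx_header(f.read(16))


# Demo on synthetic headers shaped like the MNIST test set.
image_header = struct.pack(">IIII", IMAGE_MAGIC, 10000, 28, 28)
label_header = struct.pack(">II", LABEL_MAGIC, 10000)

magic, (count, rows, cols) = parse_idx_header(image_header)
print(magic == IMAGE_MAGIC, count, rows * cols)
print(parse_idx_header(label_header))
```

On the real data, a call such as `read_idx_header("mnist/t10k-images-idx3-ubyte")` should report 10000 images of 28 x 28 pixels, which matches the 784-value image size the program prints when it loads the data. The file name here is an assumption (the archive name without `.gz`).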

## **Edit `Makefile`**

In [None]:
%%writefile Makefile

# Compilers
# Note: the rules below build everything with NVCC; CXX and CXX_FLAGS are not referenced by them
CXX := g++
CXX_FLAGS := -std=c++17 -ggdb
NVCC := nvcc

# Folders
BIN := bin
SRC := src
INCLUDE := include

EXECUTABLE := nn_main

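# These targets do not produce files of the same name
.PHONY: all run clean

all: $(BIN)/$(EXECUTABLE)

run: clean all
	clear
	./$(BIN)/$(EXECUTABLE)

# $(wildcard ...) drops a pattern that matches nothing instead of passing it on literally
$(BIN)/$(EXECUTABLE): $(wildcard $(SRC)/*.cu) $(wildcard $(SRC)/*.cpp)
	@mkdir -p $(BIN)
	$(NVCC) -I $(INCLUDE) $^ -o $@

clean:
	rm -f $(BIN)/*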

Overwriting Makefile


## **Compile and run**

In [26]:
# Compile
!make

make: Nothing to be done for 'all'.


### **Run with different configurations**

In [27]:
# Run the program
# ./bin/nn_main <#-neurons> <#-epochs> <learning-rate> <mode>
# mode (as used in the cells below): 1 = CPU, 2 = GPU, 3 = GPU (optimized)

!echo "Train CPU..."
!./bin/nn_main 20 7 0.5 1

Train CPU...
-- # neurons: 20
-- # epochs: 7
-- learning rate: 0.5
Train Images: 60000 with size 784
Train Labels: 60000 labels loaded
Test Images: 10000 with size 784
Test Labels: 10000 labels loaded


Train start...
-- number of epochs: 7
- layer 0 forward time: 2984.364746 ms
- layer 1 forward time: 82.988930 ms
- layer 2 forward time: 51.431458 ms
>>> Epoch 1 CEE loss: 13.0526
- layer 0 forward time: 3724.683838 ms
- layer 1 forward time: 132.526917 ms
- layer 2 forward time: 80.441086 ms
>>> Epoch 2 CEE loss: 13.4881
- layer 0 forward time: 2999.821777 ms
- layer 1 forward time: 88.487267 ms
- layer 2 forward time: 53.977119 ms
>>> Epoch 3 CEE loss: 6.09104
- layer 0 forward time: 3853.672607 ms
- layer 1 forward time: 90.599869 ms
- layer 2 forward time: 50.719807 ms
>>> Epoch 4 CEE loss: 3.49842
- layer 0 forward time: 2977.855225 ms
- layer 1 forward time: 81.764320 ms
- layer 2 forward time: 51.855263 ms
>>> Epoch 5 CEE loss: 2.59185
- layer 0 forward time: 2997.315674 ms
- la
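
The runs in this section differ only in their four command-line arguments, so they can also be driven from Python. The sketch below is an illustration, not part of the project: `build_command` and `run_config` are hypothetical helpers, and the mode labels are inferred from the `echo` lines in the notebook cells. Only the command lines are printed; nothing is executed unless `run_config` is called.

```python
import subprocess

# Mode values inferred from the echo labels in the notebook cells (an assumption).
MODES = {1: "CPU", 2: "GPU", 3: "GPU (optimized)"}


def build_command(neurons, epochs, learning_rate, mode, binary="./bin/nn_main"):
    """Return the argv list for one run: <#-neurons> <#-epochs> <learning-rate> <mode>."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return [binary, str(neurons), str(epochs), str(learning_rate), str(mode)]


def run_config(neurons, epochs, learning_rate, mode):
    """Run one configuration and return its captured stdout (requires the built binary)."""
    cmd = build_command(neurons, epochs, learning_rate, mode)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Print the command line for each mode without running anything.
for mode, label in MODES.items():
    print(f"{label}: {' '.join(build_command(20, 10, 0.5, mode))}")
```

Capturing stdout through `run_config` makes it possible to keep each run's log in a variable or write it to a file, rather than leaving it only in the cell output.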

In [30]:
!echo "Train GPU..."
!./bin/nn_main 20 10 0.5 2

Train GPU...
-- # neurons: 20
-- # epochs: 10
-- learning rate: 0.5
Train Images: 60000 with size 784
Train Labels: 60000 labels loaded
Test Images: 10000 with size 784
Test Labels: 10000 labels loaded


Train start...
-- number of epochs: 10
- layer 0 forward time: 53.676033 ms
- layer 1 forward time: 5.775936 ms
- layer 2 forward time: 2.583968 ms
>>> Epoch 1 CEE loss: 12.1888
- layer 0 forward time: 51.175999 ms
- layer 1 forward time: 5.233056 ms
- layer 2 forward time: 2.411168 ms
>>> Epoch 2 CEE loss: 9.27244
- layer 0 forward time: 51.753407 ms
- layer 1 forward time: 5.215168 ms
- layer 2 forward time: 2.360448 ms
>>> Epoch 3 CEE loss: 13.1128
- layer 0 forward time: 49.527649 ms
- layer 1 forward time: 5.177248 ms
- layer 2 forward time: 2.312704 ms
>>> Epoch 4 CEE loss: 13.0015
- layer 0 forward time: 50.446465 ms
- layer 1 forward time: 5.144960 ms
- layer 2 forward time: 2.305088 ms
>>> Epoch 5 CEE loss: 7.87911
- layer 0 forward time: 49.128639 ms
- layer 1 forward time: 5

In [32]:
!echo "Train GPU (optimized)..."
!./bin/nn_main 20 10 0.5 3

Train GPU (optimized)...
-- # neurons: 20
-- # epochs: 10
-- learning rate: 0.5
Train Images: 60000 with size 784
Train Labels: 60000 labels loaded
Test Images: 10000 with size 784
Test Labels: 10000 labels loaded


Train start...
-- number of epochs: 10
- layer 0 forward time: 55.600258 ms
- layer 1 forward time: 6.029440 ms
- layer 2 forward time: 3.283200 ms
>>> Epoch 1 CEE loss: 14.1153
- layer 0 forward time: 51.442112 ms
- layer 1 forward time: 5.864928 ms
- layer 2 forward time: 3.102656 ms
>>> Epoch 2 CEE loss: 11.4146
- layer 0 forward time: 52.979424 ms
- layer 1 forward time: 6.298272 ms
- layer 2 forward time: 3.209792 ms
>>> Epoch 3 CEE loss: 12.5083
- layer 0 forward time: 52.436928 ms
- layer 1 forward time: 6.254080 ms
- layer 2 forward time: 3.311136 ms
>>> Epoch 4 CEE loss: 9.22414
- layer 0 forward time: 55.557758 ms
- layer 1 forward time: 6.262592 ms
- layer 2 forward time: 2.646624 ms
>>> Epoch 5 CEE loss: 10.4003
- layer 0 forward time: 53.122784 ms
- layer 1 for
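
The per-layer forward times printed by the runs above can be compared directly. The sketch below is an illustration rather than part of the project: `parse_forward_times` and `mean_forward_times` are hypothetical helpers, and the sample logs are the first two epochs copied from the CPU and GPU runs. Lines that do not match the full timing format, including truncated ones, are skipped.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches complete lines such as "- layer 0 forward time: 2984.364746 ms".
TIMING_RE = re.compile(r"^- layer (\d+) forward time: ([\d.]+) ms$")


def parse_forward_times(log_text):
    """Return {layer_index: [time_ms, ...]} for every complete timing line."""
    times = defaultdict(list)
    for line in log_text.splitlines():
        match = TIMING_RE.match(line.strip())
        if match:
            times[int(match.group(1))].append(float(match.group(2)))
    return dict(times)


def mean_forward_times(log_text):
    """Return {layer_index: mean forward time in ms}, ordered by layer."""
    parsed = parse_forward_times(log_text)
    return {layer: mean(values) for layer, values in sorted(parsed.items())}


# First two epochs of the CPU run, copied from the output above.
cpu_log = """
- layer 0 forward time: 2984.364746 ms
- layer 1 forward time: 82.988930 ms
- layer 2 forward time: 51.431458 ms
- layer 0 forward time: 3724.683838 ms
- layer 1 forward time: 132.526917 ms
- layer 2 forward time: 80.441086 ms
"""

# First two epochs of the GPU run, copied from the output above.
gpu_log = """
- layer 0 forward time: 53.676033 ms
- layer 1 forward time: 5.775936 ms
- layer 2 forward time: 2.583968 ms
- layer 0 forward time: 51.175999 ms
- layer 1 forward time: 5.233056 ms
- layer 2 forward time: 2.411168 ms
"""

cpu_mean = mean_forward_times(cpu_log)
gpu_mean = mean_forward_times(gpu_log)
speedup = {layer: cpu_mean[layer] / gpu_mean[layer] for layer in cpu_mean}

for layer in cpu_mean:
    print(f"layer {layer}: CPU {cpu_mean[layer]:.1f} ms, "
          f"GPU {gpu_mean[layer]:.1f} ms, speedup {speedup[layer]:.1f}x")
```

For these two epochs the layer 0 forward pass comes out roughly 64x faster on the GPU. The figures cover the forward pass only, since that is all the logs report.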