DeepCore is a C++ neural network library that leverages CUDA for accelerated tensor operations. Its user-friendly and intuitive API draws inspiration from TensorFlow's Keras API. DeepCore is built from the ground up, starting with the fundamental tensor operations that power artificial neural networks. The source code aims to aid in understanding the mathematics behind neural networks, implementing concepts such as forward propagation, backpropagation, gradient descent, Jacobian computation, and the chain rule.
Clone the repository into a local directory. Ensure you have the CUDA Toolkit 12.5, the Microsoft Visual C++ compiler (`cl.exe`), and other relevant dependencies installed. Ensure your `PATH` is configured correctly.
To use the DeepCore library, include the `./src/deepcore.cu` file in your project source code. You will then be able to access all of the functions provided by the library. Compile your project with the `nvcc` compiler. Refer to the `./examples/` directory for practical examples of using DeepCore.
An abstract base class for neural network layers.
Usage
- Use the derived classes `Dense` and `Flatten` as DeepCore neural network layers.
A dense (fully connected) neural network layer. Trainable parameters consist of a weights matrix and a biases matrix. Non-trainable parameters consist of the size of the dense layer (number of nodes) and its activation function.
Creates a dense layer consisting of `num_nodes` nodes and the `activation_func` activation function.
Arguments
- `int num_nodes`: The size (number of nodes) of the Dense layer.
- `Activation activation_func`: The activation function of the Dense layer. Currently implemented activation functions include `RELU` and `SOFTMAX`.
Usage
model.add(make_unique<DeepCore::Dense>(300, RELU));
model.add(make_unique<DeepCore::Dense>(10, SOFTMAX));
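To make the role of the weights, biases, and activation function concrete, here is a minimal CPU sketch of what a dense layer computes for a single sample: a weighted sum plus bias for each node, followed by ReLU. The helper name `dense_relu_forward` and the row-major weight layout are assumptions made for illustration; they are not DeepCore's internals, which run on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of DeepCore): forward pass of one dense layer
// for a single sample. ASSUMPTION: `weights` is row-major with shape
// num_nodes x num_inputs, where num_nodes == biases.size().
std::vector<float> dense_relu_forward(const std::vector<float>& weights,
                                      const std::vector<float>& biases,
                                      const std::vector<float>& input) {
    const std::size_t num_nodes = biases.size();
    const std::size_t num_inputs = input.size();
    assert(weights.size() == num_nodes * num_inputs);

    std::vector<float> output(num_nodes);
    for (std::size_t i = 0; i < num_nodes; ++i) {
        float z = biases[i];  // start from the bias of node i
        for (std::size_t j = 0; j < num_inputs; ++j) {
            z += weights[i * num_inputs + j] * input[j];
        }
        output[i] = std::max(0.0f, z);  // ReLU: negative sums become zero
    }
    return output;
}
```

A layer with two nodes and two inputs, for example, maps the input `{1, 1}` through weights `{1, 2, -1, -1}` and biases `{0.5, 0}` to `{3.5, 0}`: the second node's sum is negative, so ReLU clips it.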
A flattening neural network layer. Serves as an input layer into a dense neural network. Does not contain trainable parameters, with its only non-trainable parameter being its size (number of nodes).
Creates a flattening layer consisting of `num_nodes` nodes.
Arguments
- `int num_nodes`: The size (number of nodes) of the Flatten layer.
Usage
model.add(make_unique<DeepCore::Flatten>(num_features));
model.add(make_unique<DeepCore::Flatten>(784));
A model grouping several `Layer` objects into an object with training/prediction functionalities.
A DeepCore model is instantiated by simply declaring an object of its type:
DeepCore model;
Once you've declared the model, you can add layers using `add()`, configure it with `compile()`, and then train the model using `fit()`. Alternatively, if you have a model saved in a file, you can load it using `read()`. Once the model is trained or loaded, you can use it for predictions with `predict()`, evaluate its performance on test data with `evaluate()`, or save it back to a file with `save()`. When you're done using the model, remember to free up program resources by calling `destroy()`. A snippet using DeepCore is below; refer to `./examples/mnist` for the entire example. The saved model and outputs of the program can also be found in `./examples/mnist/models` and `./examples/mnist/outputs` respectively.
// Training, evaluating, and saving a model to file
DeepCore model;
model.add(make_unique<DeepCore::Flatten>(784));
model.add(make_unique<DeepCore::Dense>(300, RELU));
model.add(make_unique<DeepCore::Dense>(100, RELU));
model.add(make_unique<DeepCore::Dense>(10, SOFTMAX));
model.compile(CROSS_ENTROPY);
model.fit(X, num_features, NUM_TRAIN_IMAGES, Y, num_classes, batch_size, num_epochs, learning_rate, test_X, NUM_TEST_IMAGES, test_Y);
model.evaluate(test_X, num_features, NUM_TEST_IMAGES, test_Y, num_classes, batch_size);
model.save(R"(.\models\784-300-100-10.bin)");
model.destroy();
// Reading a model from file, evaluating it, and using it to make predictions
DeepCore model;
model.read(R"(.\models\784-300-100-10.bin)");
model.evaluate(test_X, num_features, NUM_TEST_IMAGES, test_Y, num_classes, batch_size);
model.predict(predict_X, num_features, NUM_PREDICT_IMAGES, predict_Y, num_classes);
print_batch_and_predictions(predict_X, actual_Y, predict_Y, NUM_PREDICT_IMAGES);
model.destroy();
Adds a layer to the model.
Arguments
- `std::unique_ptr<DeepCore::Layer> layer`: A unique pointer managing the Layer to be added to the model.
Usage
model.add(make_unique<DeepCore::Flatten>(784));
model.add(make_unique<DeepCore::Dense>(300, RELU));
Output
None
Configures and initializes the model with the provided loss function.
Arguments
- `Loss loss_func`: The specific loss function the model should use. Currently implemented loss functions include `CROSS_ENTROPY` and `MSE`.
Usage
model.compile(MSE);
model.compile(CROSS_ENTROPY);
Output
None
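For reference, the two loss functions can be sketched for a single sample as follows. The sketch uses the standard textbook definitions of cross-entropy and mean squared error; the function names and the small epsilon that guards against `log(0)` are assumptions, and DeepCore's own implementation may differ in scaling or in how it reduces over a batch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cross-entropy between a one-hot target and predicted class probabilities.
float cross_entropy(const std::vector<float>& target,
                    const std::vector<float>& predicted) {
    assert(target.size() == predicted.size());
    const float eps = 1e-7f;  // ASSUMPTION: guard so log(0) is never evaluated
    float loss = 0.0f;
    for (std::size_t k = 0; k < target.size(); ++k) {
        loss -= target[k] * std::log(predicted[k] + eps);
    }
    return loss;
}

// Mean squared error between target and predicted values.
float mean_squared_error(const std::vector<float>& target,
                         const std::vector<float>& predicted) {
    assert(target.size() == predicted.size() && !target.empty());
    float sum = 0.0f;
    for (std::size_t k = 0; k < target.size(); ++k) {
        const float diff = predicted[k] - target[k];
        sum += diff * diff;
    }
    return sum / static_cast<float>(target.size());
}
```

Cross-entropy only looks at the probability assigned to the correct class, which is why it pairs naturally with a `SOFTMAX` output layer; mean squared error penalizes every output component.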
Fits the model to the input data (features) `X` and target data (labels) `Y` using stochastic gradient descent. Optionally evaluates performance on a validation set after each epoch.
Note: The model must have already been compiled with `compile()`.
Arguments
- `float *X`: The input data matrix to train the model. Should be of dimension `num_features` × `num_samples`.
- `int num_features`: Number of features (input dimensions) of the dataset.
- `int num_samples`: Number of samples (data points) in the dataset.
- `float *Y`: The target data matrix to train the model. Should be of dimension `num_classes` × `num_samples`.
- `int num_classes`: Number of classes (output dimensions) of the dataset.
- `int batch_size = 50`: Number of samples per gradient update. Defaults to `50`.
- `int epochs = 10`: Number of epochs (iterations over the dataset) to train the model. Defaults to `10`.
- `float learning_rate = 0.1`: Learning rate of the model for gradient descent. Defaults to `0.1`.
- `float *validation_X = nullptr`: The validation input data matrix (optional).
- `int num_validation = -1`: Number of samples in the validation set (optional).
- `float *validation_Y = nullptr`: The validation target data (optional).
Usage
model.fit(X, num_features, NUM_TRAIN_IMAGES, Y, num_classes, batch_size, num_epochs, learning_rate, test_X, NUM_TEST_IMAGES, test_Y);
model.fit(X, 784, 60000, Y, 10, 50, 5, 0.1, test_X, 10000, test_Y);
Output
COMPILED MODEL:
______________________________________________________________________
Layer (type)                  Output Shape             Param #
======================================================================
Flatten                       (50, 784, 1)             0
Dense                         (50, 300, 1)             235500
Dense                         (50, 100, 1)             30100
Dense                         (50, 10, 1)              1010
======================================================================
Total trainable params: 266610
______________________________________________________________________
EPOCH 1/20
BATCH 1200/1200 [================================] - BATCH ACCURACY: 0.940 - TOTAL ACCURACY: 0.922
TRAIN ACCURACY: 55313/60000 (92.19%) - VALIDATION ACCURACY: 9588/10000 (95.88%) - TIME ELAPSED: 28.15s - ETA: 00:01:52
EPOCH 2/20
BATCH 1200/1200 [================================] - BATCH ACCURACY: 1.000 - TOTAL ACCURACY: 0.968
TRAIN ACCURACY: 58102/60000 (96.84%) - VALIDATION ACCURACY: 9674/10000 (96.74%) - TIME ELAPSED: 34.58s - ETA: 00:01:42
EPOCH 3/20
BATCH 1200/1200 [================================] - BATCH ACCURACY: 0.980 - TOTAL ACCURACY: 0.978
TRAIN ACCURACY: 58700/60000 (97.83%) - VALIDATION ACCURACY: 9715/10000 (97.15%) - TIME ELAPSED: 29.57s - ETA: 00:00:58
EPOCH 4/20
BATCH 1200/1200 [================================] - BATCH ACCURACY: 1.000 - TOTAL ACCURACY: 0.985
TRAIN ACCURACY: 59073/60000 (98.45%) - VALIDATION ACCURACY: 9715/10000 (97.15%) - TIME ELAPSED: 30.85s - ETA: 00:00:30
EPOCH 5/20
BATCH 1200/1200 [================================] - BATCH ACCURACY: 0.980 - TOTAL ACCURACY: 0.989
TRAIN ACCURACY: 59353/60000 (98.92%) - VALIDATION ACCURACY: 9756/10000 (97.56%) - TIME ELAPSED: 28.64s - ETA: 00:00:00
>>> TRAINING COMPLETE.
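For a classification task, the target matrix `Y` passed to `fit()` holds one column of `num_classes` values per sample. One way to build it from integer class labels is one-hot encoding, sketched below. The helper name `one_hot_encode` is hypothetical, and the memory layout is an assumption: the sketch stores the matrix column-major (the cuBLAS convention), so each sample's values are contiguous. Check `./examples/mnist` for the layout DeepCore actually expects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of DeepCore): builds a one-hot matrix of
// dimension num_classes x num_samples from integer class labels.
// ASSUMPTION: column-major storage, i.e. element (c, s) lives at
// index s * num_classes + c.
std::vector<float> one_hot_encode(const std::vector<int>& labels, int num_classes) {
    assert(num_classes > 0);
    const std::size_t classes = static_cast<std::size_t>(num_classes);
    std::vector<float> Y(labels.size() * classes, 0.0f);
    for (std::size_t s = 0; s < labels.size(); ++s) {
        assert(labels[s] >= 0 && labels[s] < num_classes);
        Y[s * classes + static_cast<std::size_t>(labels[s])] = 1.0f;
    }
    return Y;
}
```

The resulting buffer can then be handed over as a raw pointer with `Y.data()`, with `labels.size()` as the number of samples.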
Evaluates a trained model on the input data (features) `test_X` and target data (labels) `test_Y`.
Note: The model must have been trained with `fit()` or read from file with `read()`.
Arguments
- `float *test_X`: The input data matrix to test the model. Should be of dimension `num_features` × `num_test`.
- `int num_features`: Number of features (input dimensions) of the dataset.
- `int num_test`: Number of samples (data points) in the test dataset.
- `float *test_Y`: The target data matrix to test the model. Should be of dimension `num_classes` × `num_test`.
- `int num_classes`: Number of classes (output dimensions) of the dataset.
- `int batch_size = 50`: Number of samples to process at a time. Defaults to `50`.
Usage
model.evaluate(test_X, num_features, NUM_TEST_IMAGES, test_Y, num_classes, batch_size);
model.evaluate(test_X, 784, 10000, test_Y, 10, 50);
Output
BATCH 200/200 [================================] - TEST ACCURACY: 9756/10000 (97.56%)
>>> TESTING COMPLETE.
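An accuracy figure such as `9756/10000` counts the samples whose predicted class matches the target class. A minimal sketch of that count is shown below, taking the predicted class to be the one with the highest output value. The helper name `count_correct` and the memory layout (each sample's `num_classes` values stored contiguously) are assumptions for illustration, not DeepCore's actual evaluation code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of DeepCore): counts the samples whose
// highest-scoring predicted class equals the highest-scoring target class.
// ASSUMPTION: both buffers hold num_classes values per sample, contiguously.
int count_correct(const std::vector<float>& predicted,
                  const std::vector<float>& target, int num_classes) {
    assert(num_classes > 0);
    assert(predicted.size() == target.size());
    assert(predicted.size() % static_cast<std::size_t>(num_classes) == 0);

    const std::ptrdiff_t n = num_classes;
    const std::ptrdiff_t num_samples =
        static_cast<std::ptrdiff_t>(predicted.size()) / n;
    int correct = 0;
    for (std::ptrdiff_t s = 0; s < num_samples; ++s) {
        const auto p_first = predicted.begin() + s * n;
        const auto t_first = target.begin() + s * n;
        // Index of the largest value within this sample's slice (argmax).
        const auto p_class = std::max_element(p_first, p_first + n) - p_first;
        const auto t_class = std::max_element(t_first, t_first + n) - t_first;
        if (p_class == t_class) ++correct;
    }
    return correct;
}
```

Dividing the returned count by the number of samples gives the accuracy as a fraction.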
Predicts target data (labels) `predict_Y` of the input data (features) `predict_X` using a trained model.
Note: The memory for `predict_Y` must be allocated and managed by the caller. The model must have been trained with `fit()` or read from file with `read()`.
Arguments
- `float *predict_X`: The input data matrix to predict from. Should be of dimension `num_features` × `num_samples`.
- `int num_features`: Number of features (input dimensions) of the dataset.
- `int num_samples`: Number of samples (data points) in the dataset.
- `float *predict_Y`: The output matrix that receives the predicted target data. Should be of dimension `num_classes` × `num_samples`.
- `int num_classes`: Number of classes (output dimensions) of the dataset.
Usage
model.predict(predict_X, num_features, NUM_PREDICT_IMAGES, predict_Y, num_classes);
model.predict(predict_X, 784, 50, predict_Y, 10);
Output
>>> PREDICTION COMPLETE.
Saves all the information (layer information, weights, biases) about the model to the file `path`. The model can be recovered with `read()`.
Arguments
- `string path`: Path of the file for the model to be saved into.
Usage
model.save(R"(.\models\784-300-100-10.bin)");
Output
SAVING MODEL TO .\models\784-300-100-10.bin
>>> SAVING COMPLETE.
Reads and loads all the information (layer information, weights, biases) about the model from the file `path`.
Note: After `read()` is called, the model does not need to be compiled with `compile()`.
Arguments
- `string path`: Path of the file for the model to be loaded from.
Usage
model.read(R"(.\models\784-300-100-10.bin)");
Output
READING MODEL FROM .\models\784-300-100-10.bin
MODEL SPECIFICATIONS:
______________________________________________________________________
Layer (type)                  Output Shape             Param #
======================================================================
Flatten                       (n, 784, 1)              0
Dense                         (n, 300, 1)              235500
Dense                         (n, 100, 1)              30100
Dense                         (n, 10, 1)               1010
======================================================================
Total trainable params: 266610
______________________________________________________________________
>>> READING COMPLETE.
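The `Param #` column in the model summary can be reproduced by hand: a dense layer has one weight per (input, node) pair plus one bias per node. The short sketch below checks the numbers for the 784-300-100-10 model; the helper names are hypothetical and only illustrate the arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: trainable parameters of a dense layer, counted as one
// weight per (input, node) pair plus one bias per node.
long dense_param_count(long num_inputs, long num_nodes) {
    assert(num_inputs > 0 && num_nodes > 0);
    return num_inputs * num_nodes + num_nodes;
}

// Sums dense_param_count over consecutive layer sizes, e.g. {784, 300, 100, 10}.
// The first entry is the Flatten layer, which has no trainable parameters.
long total_param_count(const std::vector<long>& layer_sizes) {
    long total = 0;
    for (std::size_t i = 1; i < layer_sizes.size(); ++i) {
        total += dense_param_count(layer_sizes[i - 1], layer_sizes[i]);
    }
    return total;
}
```

For example, the first dense layer has 784 × 300 + 300 = 235500 parameters, and the three dense layers together have 235500 + 30100 + 1010 = 266610.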
Frees the remaining memory associated with the model that was allocated by `compile()` or `read()`.
Arguments
None
Usage
model.destroy();
Output
None
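Because `destroy()` has to be called manually, it is easy to miss on an early return. One possible safeguard is a small scope guard that calls `destroy()` when it goes out of scope. The `DestroyGuard` class below is a hypothetical helper, not part of DeepCore, and `FakeModel` is a stand-in type used only so the sketch compiles without the library.

```cpp
#include <cassert>

// Hypothetical scope guard (not part of DeepCore): calls destroy() on the
// wrapped model when the guard goes out of scope.
template <typename Model>
class DestroyGuard {
public:
    explicit DestroyGuard(Model& model) : model_(model) {}
    ~DestroyGuard() { model_.destroy(); }
    DestroyGuard(const DestroyGuard&) = delete;
    DestroyGuard& operator=(const DestroyGuard&) = delete;

private:
    Model& model_;
};

// Stand-in type used only to demonstrate the guard without the real library.
struct FakeModel {
    int destroy_calls = 0;
    void destroy() { ++destroy_calls; }
};

// Returns how many times destroy() ran after the guard left its scope.
int guard_demo() {
    FakeModel model;
    {
        DestroyGuard<FakeModel> guard(model);
        assert(model.destroy_calls == 0);  // nothing is freed while in scope
    }
    return model.destroy_calls;
}
```

If you use a guard like this, do not also call `destroy()` by hand, since the sketch makes no attempt to prevent a second call.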
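The goal is the following. Given a set of input data $X$ and target data $Y$, we want to find the parameters $\theta$ (the weights and biases of every layer) that minimize a cost function $J(\theta)$ measuring how far the model's predictions are from $Y$.
To do this, we must find the gradient of the cost function with respect to the parameters, $\nabla_\theta J(\theta)$, the vector of partial derivatives $\partial J / \partial \theta_i$.
In simple words, it tells us the factor by which the cost $J$ changes when a parameter $\theta_i$ is nudged by a small amount. Moving every parameter a small step against its gradient, $\theta \leftarrow \theta - \alpha \nabla_\theta J(\theta)$ with learning rate $\alpha$, therefore lowers the cost; this is gradient descent.
All that's left is to compute the gradient vector $\nabla_\theta J(\theta)$, which is what backpropagation does.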
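Consider a simplified example. Suppose we had a single data sample, represented by a vector $x$ with one-hot label $y$, and a model made up of a `Flatten` layer, a hidden `Dense (ReLU)` layer, and an output `Dense (Softmax)` layer we will call layers 0, 1, and 2. The output of the `Flatten` layer will simply be our input data, $a^{(0)} = x$. The outputs of the `Dense` layers are defined as
$$z^{(1)} = W^{(1)} a^{(0)} + b^{(1)}, \qquad a^{(1)} = \mathrm{ReLU}(z^{(1)})$$
$$z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}, \qquad a^{(2)} = \mathrm{softmax}(z^{(2)})$$
Since the output layer undergoes `Softmax`, our model outputs probabilities of a given output class, so we measure the cost with the `Cross-Entropy Loss`, $J = -\sum_k y_k \log a^{(2)}_k$. Then, the derivative of the cost function with respect to the output weights follows from the chain rule:
$$\frac{\partial J}{\partial W^{(2)}} = \delta^{(2)} \left(a^{(1)}\right)^T, \qquad \delta^{(2)} = \frac{\partial J}{\partial z^{(2)}} = a^{(2)} - y$$
where the simple form of the local error $\delta^{(2)}$ comes from combining the derivative of the cross-entropy loss with the Jacobian of the softmax function. Similarly, $\partial J / \partial b^{(2)} = \delta^{(2)}$.
Now, how do we compute $\partial J / \partial W^{(1)}$ and $\partial J / \partial b^{(1)}$? We apply the chain rule once more, propagating $\delta^{(2)}$ back through $W^{(2)}$ and through the derivative of the ReLU function. Thus, we can define the local error of layer 1 as
$$\delta^{(1)} = \left( \left(W^{(2)}\right)^T \delta^{(2)} \right) \odot \mathrm{ReLU}'(z^{(1)})$$
so that $\partial J / \partial W^{(1)} = \delta^{(1)} \left(a^{(0)}\right)^T$ and $\partial J / \partial b^{(1)} = \delta^{(1)}$.
And... doing this for every layer and parameter in our model, we're finished! In actuality, to compute the true gradient of the cost function, we must average these per-sample gradients over every sample in the training dataset.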
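The error terms for a Flatten, Dense (ReLU), Dense (Softmax) network with cross-entropy loss can be sketched on the CPU as follows. The sketch relies on the standard results for this setup: the output-layer error is the softmax output minus the one-hot target, and the hidden-layer error is that error pulled back through the output weights and masked by the ReLU derivative. The names `softmax`, `output_error`, and `hidden_error`, and the row-major weight layout, are assumptions for illustration; DeepCore performs the equivalent computation on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Numerically stable softmax: subtracting the maximum leaves the result unchanged.
Vec softmax(const Vec& z) {
    assert(!z.empty());
    const float z_max = *std::max_element(z.begin(), z.end());
    Vec a(z.size());
    float sum = 0.0f;
    for (std::size_t k = 0; k < z.size(); ++k) {
        a[k] = std::exp(z[k] - z_max);
        sum += a[k];
    }
    for (float& v : a) v /= sum;
    return a;
}

// Output-layer error for softmax followed by cross-entropy: delta2 = a2 - y.
Vec output_error(const Vec& a2, const Vec& y) {
    assert(a2.size() == y.size());
    Vec delta2(a2.size());
    for (std::size_t k = 0; k < a2.size(); ++k) delta2[k] = a2[k] - y[k];
    return delta2;
}

// Hidden-layer error: delta1 = (W2^T * delta2) elementwise-times ReLU'(z1).
// ASSUMPTION: W2 is row-major with shape delta2.size() x z1.size().
Vec hidden_error(const Vec& W2, const Vec& delta2, const Vec& z1) {
    assert(W2.size() == delta2.size() * z1.size());
    Vec delta1(z1.size(), 0.0f);
    for (std::size_t j = 0; j < z1.size(); ++j) {
        if (z1[j] <= 0.0f) continue;  // ReLU'(z) is 0 for z <= 0
        for (std::size_t k = 0; k < delta2.size(); ++k) {
            delta1[j] += W2[k * z1.size() + j] * delta2[k];
        }
    }
    return delta1;
}
```

The weight gradient of each layer is then the outer product of that layer's error with the activations feeding into it, and the bias gradient is the error itself.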
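The true gradient descent algorithm works, but it has some flaws. The main flaw is that it is computationally expensive and slow to compute for large training datasets. In order to make but a single update to our parameters, we must first compute the gradient over every sample in the training dataset.
Stochastic gradient descent is the algorithm DeepCore implements to fit a model, attempting to find a set of parameters that minimizes the cost function. Instead of using the full dataset, each update uses a gradient estimated from a small batch of `batch_size` samples, so the parameters are updated many times per epoch.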
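The structure of a mini-batch training loop can be sketched on a toy problem: fitting the slope `w` of `y = w * x` under a mean squared error cost. The function `sgd_fit_slope` is hypothetical and is not DeepCore's training code; it only shows one parameter update per batch, repeated over several epochs. As a simplification, batches are taken in order rather than shuffled, so the run is deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy example (not DeepCore code): fits y = w * x with mini-batch
// gradient descent on the mean squared error.
// SIMPLIFICATION: batches are taken in order, without shuffling.
float sgd_fit_slope(const std::vector<float>& xs, const std::vector<float>& ys,
                    std::size_t batch_size, int epochs, float learning_rate) {
    assert(xs.size() == ys.size() && batch_size > 0);
    float w = 0.0f;
    for (int epoch = 0; epoch < epochs; ++epoch) {
        for (std::size_t start = 0; start < xs.size(); start += batch_size) {
            const std::size_t end = std::min(start + batch_size, xs.size());
            float grad = 0.0f;
            for (std::size_t i = start; i < end; ++i) {
                grad += 2.0f * (w * xs[i] - ys[i]) * xs[i];  // d/dw of (w*x - y)^2
            }
            grad /= static_cast<float>(end - start);  // average over the batch
            w -= learning_rate * grad;                // one update per batch
        }
    }
    return w;
}
```

With 60000 samples and a batch size of 50, a loop of this shape performs 1200 updates per epoch, which matches the `BATCH 1200/1200` counter in the `fit()` output.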
DeepCore uses NVIDIA's CUDA parallel computing platform, allowing for parallelization of tasks such as computing Jacobians or applying activation functions to matrices.
DeepCore uses NVIDIA's cuBLAS, a lightweight library built on top of NVIDIA's CUDA runtime, dedicated to performing basic linear algebra operations. DeepCore extends cuBLAS's functionality by implementing a tensor multiplication function with cuBLAS's batched matrix multiplication function.
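To clarify what a batched matrix multiplication computes, here is a plain CPU reference: one independent matrix product per batch index. cuBLAS exposes GPU routines for this, such as `cublasSgemmStridedBatched`; which routine DeepCore calls is not stated here. The function name `batched_matmul` and the row-major layout are assumptions made for readability (cuBLAS itself uses column-major storage).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for a batched matrix product: for each batch index b,
// C[b] = A[b] * B[b], where A[b] is m x k, B[b] is k x n, and C[b] is m x n.
// ASSUMPTION: row-major matrices stored back to back in each buffer.
std::vector<float> batched_matmul(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  std::size_t batch, std::size_t m,
                                  std::size_t k, std::size_t n) {
    assert(A.size() == batch * m * k);
    assert(B.size() == batch * k * n);
    std::vector<float> C(batch * m * n, 0.0f);
    for (std::size_t b = 0; b < batch; ++b) {
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                float sum = 0.0f;
                for (std::size_t p = 0; p < k; ++p) {
                    sum += A[b * m * k + i * k + p] * B[b * k * n + p * n + j];
                }
                C[b * m * n + i * n + j] = sum;
            }
        }
    }
    return C;
}
```

Every batch index is independent of the others, which is what makes the operation a good fit for GPU parallelization.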
DeepCore was evaluated using the MNIST dataset, a popular benchmark dataset for digit recognition tasks. The MNIST dataset consists of 60,000 training images and 10,000 testing images of handwritten digits. Each image is a grayscale image of size 28x28 pixels.
As a rising junior undergrad pursuing Computer Science and Applied Mathematics, I created this project out of personal interest to gain a deeper understanding of the underlying mathematics behind artificial neural networks, and to achieve my long-awaited goal of learning GPU/CUDA programming. Throughout the journey, I learned C++ OOP, CUDA programming, cuBLAS, and solidified my understanding of backpropagation. Other than using cuBLAS for basic matrix operations, everything was implemented from scratch - starting literally from a scratch sheet of paper containing mathematical constructs to a realized program. Huge thanks to the Stanford CS224N NLP with Deep Learning course and 3Blue1Brown's Deep Learning Series for being incredible free and online resources. If you have any suggestions/comments, feel free to reach out!