A neural network library for convolutional, fully connected nets and RNNs in C++.
This library implements some of the assignments from Stanford's CS231n 2016 course by Andrej Karpathy, Fei-Fei Li, Justin Johnson and CS224d by Richard Socher as a C++ framework.
The current v2 version of the project has the following objectives:
- implement full support for graphs (not only sequential)
- cleanup & documentation
- This will be a work in progress for a considerable time. The previous version is archived in branch v1.
- CUDA support and other external graphics card libs have been removed (since they need to rely on black-box libraries for good performance)
Current state: beta
- Fully connected networks
- Convolutional layers
- Recurrent nets (RNNs)
- Long short-term memory nets (LSTMs)
- ReLU, Sigmoid, Tanh, SELU [1], resilu [2] nonlinearities
- BatchNorm, SpatialBatchNorm, Dropout layers
- Softmax, SVM loss
- TemporalAffine and TemporalSoftmax layers for RNNs
[1]: "scaled exponential linear units" (SELUs), https://arxiv.org/abs/1706.02515
[2]: "resilu residual & relu nonlinearity + linearity" (linear skip connection combined with non-linearity) (see below)
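As a quick illustration of footnote [1], the SELU nonlinearity can be sketched in a few lines. This is a minimal standalone sketch using the constants published in the SELU paper; the names selu, kSeluAlpha and kSeluLambda are hypothetical and not taken from the library source.

```cpp
#include <cmath>

// Constants from the SELU paper (arXiv:1706.02515).
// Names are illustrative, not syncognite identifiers.
constexpr double kSeluAlpha = 1.6732632423543772;
constexpr double kSeluLambda = 1.0507009873554805;

// selu(x) = lambda * x                   for x > 0
//         = lambda * alpha * (e^x - 1)   for x <= 0
double selu(double x) {
    return x > 0.0 ? kSeluLambda * x
                   : kSeluLambda * kSeluAlpha * std::expm1(x);
}
```

Positive inputs are scaled linearly by lambda, while negative inputs saturate towards -lambda * alpha.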
Example: C++ definition of a deep convolutional net with batch-norm, dropout and fully connected layers:
LayerBlock lb(R"({"name":"DomsNet","bench":false,"init":"orthonormal"})"_json);
lb.addLayer("Convolution", "cv1", R"({"inputShape":[1,28,28],"kernel":[48,5,5],"stride":1,"pad":2})",{"input"});
lb.addLayer("BatchNorm","sb1","{}",{"cv1"});
lb.addLayer("Relu","rl1","{}",{"sb1"});
lb.addLayer("Dropout","doc1",R"({"drop":0.8})",{"rl1"});
lb.addLayer("Convolution", "cv2", R"({"kernel":[48,3,3],"stride":1,"pad":1})",{"doc1"});
lb.addLayer("Relu","rl2","{}",{"cv2"});
lb.addLayer("Convolution", "cv3", R"({"kernel":[64,3,3],"stride":2,"pad":1})",{"rl2"});
lb.addLayer("BatchNorm","sb2","{}",{"cv3"});
lb.addLayer("Relu","rl3","{}",{"sb2"});
lb.addLayer("Dropout","doc2",R"({"drop":0.8})",{"rl3"});
lb.addLayer("Convolution", "cv4", R"({"kernel":[64,3,3],"stride":1,"pad":1})",{"doc2"});
lb.addLayer("Relu","rl4","{}",{"cv4"});
lb.addLayer("Convolution", "cv5", R"({"kernel":[128,3,3],"stride":2,"pad":1})",{"rl4"});
lb.addLayer("BatchNorm","sb3","{}",{"cv5"});
lb.addLayer("Relu","rl5","{}",{"sb3"});
lb.addLayer("Dropout","doc3",R"({"drop":0.8})",{"rl5"});
lb.addLayer("Convolution", "cv6", R"({"kernel":[128,3,3],"stride":1,"pad":1})",{"doc3"});
lb.addLayer("Relu","rl6","{}",{"cv6"});
lb.addLayer("Affine","af1",R"({"hidden":1024})",{"rl6"});
lb.addLayer("BatchNorm","bn1","{}",{"af1"});
lb.addLayer("Relu","rla1","{}",{"bn1"});
lb.addLayer("Dropout","do1",R"({"drop":0.7})",{"rla1"});
lb.addLayer("Affine","af2",R"({"hidden":512})",{"do1"});
lb.addLayer("BatchNorm","bn2","{}",{"af2"});
lb.addLayer("Relu","rla2","{}",{"bn2"});
lb.addLayer("Dropout","do2",R"({"drop":0.7})",{"rla2"});
lb.addLayer("Affine","af3",R"({"hidden":10})",{"do2"});
lb.addLayer("Softmax","sm1","{}",{"af3"});
json jo(R"({"verbose":true,"shuffle":true,"lr_decay":0.95,"epsilon":1e-8})"_json);
jo["epochs"]=(floatN)40.0;
jo["batch_size"]=50;
jo["learning_rate"]=(floatN)5e-4;
jo["regularization"]=(floatN)1e-8;
lb.train(X, y, Xv, yv, "Adam", jo);
floatN train_err, val_err, test_err;
train_err=lb.test(X, y, jo.value("batch_size", 50));
val_err=lb.test(Xv, yv, jo.value("batch_size", 50));
test_err=lb.test(Xt, yt, jo.value("batch_size", 50));
cerr << "Final results on MNIST after " << jo.value("epochs",(floatN)0.0) << " epochs:" << endl;
cerr << " Train-error: " << train_err << " train-acc: " << 1.0-train_err << endl;
cerr << " Validation-error: " << val_err << " val-acc: " << 1.0-val_err << endl;
cerr << " Test-error: " << test_err << " test-acc: " << 1.0-test_err << endl;
See mnisttest or cifar10test for complete examples.
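The spatial sizes flowing through the convolutional stack above can be checked by hand. The sketch below assumes the standard output-size formula out = (in - kernel + 2*pad) / stride + 1 with integer division; whether syncognite computes shapes exactly this way is an assumption, and convOutSize and exampleNetSpatialSize are hypothetical helpers, not part of the library.

```cpp
// Hypothetical helper (not part of syncognite): output size of one spatial
// dimension of a convolution, standard formula with integer division.
int convOutSize(int in, int kernel, int stride, int pad) {
    return (in - kernel + 2 * pad) / stride + 1;
}

// Walks the six convolution layers of the example net for a square input
// and returns the spatial size that reaches the first Affine layer.
int exampleNetSpatialSize(int in) {
    int s = convOutSize(in, 5, 1, 2); // cv1: 5x5 kernel, stride 1, pad 2
    s = convOutSize(s, 3, 1, 1);      // cv2: 3x3 kernel, stride 1, pad 1
    s = convOutSize(s, 3, 2, 1);      // cv3: stride 2 halves the size
    s = convOutSize(s, 3, 1, 1);      // cv4
    s = convOutSize(s, 3, 2, 1);      // cv5: stride 2 halves the size again
    s = convOutSize(s, 3, 1, 1);      // cv6
    return s;
}
```

Under this formula the 28x28 MNIST input shrinks to 14x14 after cv3 and to 7x7 after cv5, so af1 would receive 128*7*7 = 6272 values per sample.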
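Example: C++ definition of a stacked recurrent net (RNN or LSTM, selected by rnntype) with one-hot input:
json j0;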
string oName{"OH0"};
j0["inputShape"]=vector<int>{T};
j0["V"]=VS;
lb.addLayer("OneHot",oName,j0,{"input"});
int layer_depth=4;
string nName;
json j1;
j1["inputShape"]=vector<int>{VS,T};
j1["N"]=BS;
j1["H"]=H;
j1["forgetgateinitones"]=true;
j1["forgetbias"]=1.0;
j1["clip"]=clip;
for (auto l=0; l<layer_depth; l++) {
nName="lstm"+std::to_string(l);
lb.addLayer(rnntype,nName,j1,{oName});
oName=nName;
}
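// Note: "af1" is not defined in this excerpt; it is presumably a TemporalAffine
// layer that maps the output of the last recurrent layer (oName) to vocabulary size VS.
json j11;
j11["inputShape"]=vector<int>{VS,T};
lb.addLayer("TemporalSoftmax","sm1",j11,{"af1"});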
See rnnreader for a complete example.
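The first layer of the recurrent example turns integer token indices into one-hot vectors. As a rough illustration of that encoding only, here is a plain std::vector sketch; it is not the library's OneHot layer, and the V x T layout is an assumption that mirrors the inputShape {VS,T} used above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Illustrative sketch: encode T token indices (each in [0, V)) as a
// V x T one-hot matrix, stored as V rows of T columns.
std::vector<std::vector<float>> oneHot(const std::vector<int>& tokens, int V) {
    std::vector<std::vector<float>> out(V, std::vector<float>(tokens.size(), 0.0f));
    for (std::size_t t = 0; t < tokens.size(); ++t) {
        if (tokens[t] < 0 || tokens[t] >= V)
            throw std::out_of_range("token index outside vocabulary");
        out[tokens[t]][t] = 1.0f; // exactly one active row per time step
    }
    return out;
}
```

Each column then contains a single 1 at the row of the token's vocabulary index, which is the form a recurrent layer with input dimension VS expects.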
- C++11 compiler: Linux (tested: clang, gcc, Intel icpc), macOS (clang on x86-64 and Apple silicon, clang 12 and 13), Raspberry Pi (ARM, gcc)
- CMake build system.
- HDF5 C++ API for model saving and sample data (package hdf5 or libhdf5-dev).
- Apple silicon: use ccmake to set USE_SYSTEM_BLAS to ON, which instructs eigen to use M1's hardware accelerators. rnnreader sees a dramatic 3x-6x speedup, and single-thread benchmarks in bench see 200%-400% improvements. [Tested on macOS 12 beta 3, 2021-07-19]
- Memory: macOS does not give processes all available memory. Expect swapping (and a significant slowdown) when allocating more than 4-5GB, even on 16GB M1 machines.
- The hdf5 libraries are available for ARM64 (brew install hdf5).
- Eigen v3.4 (eigen3), already included in the source tree as a submodule in the default configuration.
- nlohmann_json, already included in the source tree (cpneural/nlohmann_json).
syncognite uses the CMake build system.
Clone the repository:
git clone https://github.com/domschl/syncognite
cd syncognite
git submodule init
git submodule update # This gets the in-tree Eigen3
Create a build directory within the syncognite directory and configure the build:
# in syncognite/build, default is the make build system, but Ninja can also be used:
cmake [-G Ninja] ..
# optionally use ccmake to configure options and paths:
ccmake ..
To configure your editor or IDE for include paths, use (in build):
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=YES ..
or simply execute the helper create_compile_commands.sh.
macOS users might want to configure for building with Xcode:
cmake -G Xcode ..
Build the project:
make
# or
ninja
# or (macOS) start Xcode and load the generated project file, or:
xcodebuild -configuration Release
- 2022-04-01: nlohmann_json updated to latest
- 2022-03-24: Serious bug fixed in stateful optimizers (incl. Adam): state was lost on each call, causing slow convergence.
- 2022-03-22: Started v2 branch. Removed CUDA and other external graphics libs.
- 2021-10-10: Moved CI from travis (defunct) to github workflows. Valgrind currently disabled.
- 2021-08-21: eigen update to 3.4 release
- 2021-07-19: eigen update to 3.4rc1
- 2021-07-19: Dramatic speed improvements when configuring eigen to use system blas (using ccmake) with Apple M1, seems to use M1's magic hardware accelerators.
- 2020-11-12: Switched eigen3 submodule to gitlab, tracks 3.3 branch
- 2020-07-31: Apple ARM tested ok.
- 2020-07-05: Tests with resilu (non-)linearity
- 2018-03-02: Removed faulty RAN layer, switched to the official eigen3 GitHub mirror, fixes for eigen-dev's stricter type checking.
Things that should work:
- testneural (cptest subproject, consistency tests for all layers using testdata and numerical differentials)
- bench (benchmark subproject, benchmarks for all layers)
- mnisttest (cpmnist subproject, MNIST handwritten digit recognition with a convolutional network, requires dataset download.)
- cifar10test (cpcifar10 subproject, cifar10 image recognition with a convolutional network, requires dataset download.)
- rnnreader (rnnreader subproject, text generation via RNN/LSTMs, similar to char-rnn.)
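testneural checks layers against numerical differentials. The general idea can be sketched as a central-difference gradient check. The code below is a generic illustration with hypothetical names (numericalGradient, maxRelError) and does not reproduce the cptest implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Central-difference estimate of df/dx_i for every component of x.
// x is taken by value so the caller's vector is never modified.
std::vector<double> numericalGradient(
    const std::function<double(const std::vector<double>&)>& f,
    std::vector<double> x, double h = 1e-5) {
    std::vector<double> grad(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double orig = x[i];
        x[i] = orig + h;
        const double fPlus = f(x);
        x[i] = orig - h;
        const double fMinus = f(x);
        x[i] = orig; // restore before moving to the next component
        grad[i] = (fPlus - fMinus) / (2.0 * h);
    }
    return grad;
}

// Largest relative error between two gradients of equal length,
// e.g. an analytic backward pass and its numerical estimate.
double maxRelError(const std::vector<double>& a, const std::vector<double>& b) {
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const double denom = std::max(1e-12, std::fabs(a[i]) + std::fabs(b[i]));
        worst = std::max(worst, std::fabs(a[i] - b[i]) / denom);
    }
    return worst;
}
```

A layer test in this style would compare the gradient returned by the layer's backward pass with numericalGradient applied to its forward pass, and fail if maxRelError exceeds a small threshold.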
See the Jupyter notebook for a visualization and further discussion of the resilu function.
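(Equations (1) and (2), which define resilu and rewrite it as a sum, are not reproduced in this text.)
Form (2) can thus be interpreted as a residual combination of linearity and non-linearity via addition.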
Since
Both
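The formulas for (1) and (2) are not shown in this text. As a sketch only, the code below assumes resilu(x) = x / (1 - e^{-x}), which is algebraically equal to x + x / (e^x - 1), that is, a linear term plus a non-linear term as described above. This definition, the function names, and the series cutoff are assumptions and are not confirmed by the library source.

```cpp
#include <cmath>

// ASSUMPTION: resilu(x) = x / (1 - exp(-x)); not confirmed by the library source.
// The quotient is 0/0 at x = 0, so a short Taylor series is used near zero:
//   x / (1 - e^{-x}) = 1 + x/2 + x^2/12 - x^4/720 + O(x^6)
double resiluSketch(double x) {
    if (std::fabs(x) < 1e-3) {
        const double x2 = x * x;
        return 1.0 + x / 2.0 + x2 / 12.0 - x2 * x2 / 720.0;
    }
    return x / (-std::expm1(-x)); // 1 - e^{-x} == -expm1(-x)
}

// The same function written as "linearity + non-linearity": x + x / (e^x - 1).
double resiluResidualForm(double x) {
    if (std::fabs(x) < 1e-3) return resiluSketch(x);
    return x + x / std::expm1(x);
}
```

Under this assumption the function behaves like the identity for large positive x and decays to zero for large negative x, which gives it a ReLU-like shape that is smooth at the origin.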