An open-source scientific computing engine designed for matrix computation and machine learning tasks, with CUDA acceleration support.
Edge-Computing-Engine provides a flexible computational graph implementation that supports:
- Forward propagation for neural network inference
- Backward propagation for training via automatic differentiation
- CUDA-accelerated operations for improved performance on NVIDIA GPUs
The computational graph is built using nodes representing operations such as matrix multiplication, element-wise operations, and activation functions.
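Concretely, for the fully connected layer with ReLU activation built in the usage example below, with $Z = XW$ and $Y = \mathrm{ReLU}(Z)$, the backward pass applies the chain rule:

$$
\frac{\partial L}{\partial Z} = \frac{\partial L}{\partial Y} \odot \mathbf{1}[Z > 0], \qquad
\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Z}\,W^{\top}, \qquad
\frac{\partial L}{\partial W} = X^{\top}\,\frac{\partial L}{\partial Z}.
$$

These are the identities that the matmul and ReLU backward kernels listed below correspond to.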
The framework supports the following operations:
- OP_INPUT: Input nodes for data and parameters
- OP_ADD: Element-wise addition
- OP_SUB: Element-wise subtraction
- OP_MUL: Element-wise multiplication
- OP_DIV: Element-wise division
- OP_MATMUL: Matrix multiplication
- OP_RELU: ReLU activation function
- OP_SIGMOID: Sigmoid activation function
- OP_TANH: Tanh activation function (declared but not implemented)
- OP_SOFTMAX: Softmax function (declared but not implemented)
The core data structures are:
- Matrix_CU: A custom matrix class with CUDA support
- ComputeNode: Represents a node in the computational graph
- ComputeGraph: Manages the graph, including node creation and forward/backward propagation
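For orientation, the core types might be declared roughly as follows. This is a sketch inferred from the operations and the usage example in this README; the actual fields and signatures in the repository may differ.

```cpp
// Sketch only: layout and member names are assumptions, not the repo's exact API.
#include <vector>

enum OpType { OP_INPUT, OP_ADD, OP_SUB, OP_MUL, OP_DIV,
              OP_MATMUL, OP_RELU, OP_SIGMOID, OP_TANH, OP_SOFTMAX };

struct ComputeNode {
    OpType op;                          // operation this node performs
    std::vector<ComputeNode*> inputs;   // upstream nodes
    Matrix_CU value;                    // result of the forward pass
    Matrix_CU grad;                     // gradient filled in by the backward pass
};

class ComputeGraph {
public:
    ComputeNode* addInput(int rows, int cols, int batch_size = 1);
    ComputeNode* addMatMul(ComputeNode* a, ComputeNode* b);
    ComputeNode* addReLU(ComputeNode* x);
    void forward();                     // evaluate nodes in topological order
    void backward(ComputeNode* root);   // reverse-mode automatic differentiation
private:
    std::vector<ComputeNode*> nodes;    // owned nodes, in creation order
};
```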
The code includes several CUDA kernel functions:
- reluForwardKernel/reluBackwardKernel: Forward and backward propagation for ReLU
- sigmoidForwardKernel/sigmoidBackwardKernel: Forward and backward propagation for Sigmoid
- matmulForwardKernel/matmulBackwardAKernel/matmulBackwardBKernel: Forward and backward propagation for matrix multiplication
- mulBackwardKernel: Backward propagation for element-wise multiplication
- divBackwardKernel: Backward propagation for element-wise division
- negGradKernel: Helper kernel for computing negative gradients
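As an illustration of the kernel style, an element-wise forward/backward pair for ReLU typically looks like the sketch below. This is generic CUDA, not the verbatim code from `cuda_mat/mat_grad.cu`; the actual kernels may differ in signature and launch configuration.

```cpp
// Generic sketch of an element-wise ReLU kernel pair; the names follow the
// list above, but parameters and launch setup are assumptions.
__global__ void reluForwardKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fmaxf(in[i], 0.0f);                     // max(x, 0)
}

__global__ void reluBackwardKernel(const float* in, const float* grad_out,
                                   float* grad_in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) grad_in[i] = in[i] > 0.0f ? grad_out[i] : 0.0f;  // mask by sign
}
```

A typical launch for a tensor of `n` elements would be `reluForwardKernel<<<(n + 255) / 256, 256>>>(in, out, n)`.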
Example usage:

```cpp
// Create a computational graph
ComputeGraph graph;

// Create input and weight nodes
ComputeNode* input = graph.addInput(input_rows, input_cols, batch_size);
ComputeNode* weight = graph.addInput(input_cols, output_cols);

// Initialize with random values
input->value.randomInit();
weight->value.randomInit();

// Build the forward computation graph
ComputeNode* fc1 = graph.addMatMul(input, weight);  // fully connected layer
ComputeNode* output_node = graph.addReLU(fc1);      // apply ReLU activation

// Perform forward propagation
graph.forward();

// Compute gradients via backward propagation
graph.backward(output_node);
```
Building and running the example:

```
root@5353:/backup# nvcc -arch=sm_80 -I/usr/local/cuda/include -L/usr/local/cuda/lib64 cuda_mat/mat_grad.cu -o main && ./main
root@5353:/backup# ./main
====== Configuration ======
Input matrix: 10000 x 1000
Weight matrix: 1000 x 500
Batch size: 32
Activation: ReLU
====== CUDA Results ======
Input first 5x5:
0.840188 0.394383 0.783099 0.79844 0.911647
0.45724 0.640368 0.717092 0.460067 0.54114
0.0860517 0.180886 0.426423 0.0470658 0.306902
0.0587587 0.0138811 0.622212 0.0391351 0.221551
0.456151 0.24875 0.0699601 0.742097 0.216935
Weight first 5x5:
0.817311 0.0247108 0.0146763 0.939293 0.502699
0.604392 0.921644 0.675689 0.948712 0.58803
0.824073 0.747934 0.0448163 0.757354 0.858343
0.308744 0.659798 0.941692 0.255731 0.539655
0.89383 0.609149 0.799556 0.726306 0.640965
Output first 5x5:
256.076 253.23 258.393 259.965 255.971
250.843 246.827 252.131 253.493 244.151
247.064 244.543 251.723 247.424 250.466
249.944 250.359 256.148 252.21 249.615
246.675 238.295 252.572 242.94 243.512
Input gradient first 5x5:
244.628 251.341 255.388 250.4 249.891
244.628 251.341 255.388 250.4 249.891
244.628 251.341 255.388 250.4 249.891
244.628 251.341 255.388 250.4 249.891
244.628 251.341 255.388 250.4 249.891
Weight gradient first 5x5:
159791 159791 159791 159791 159791
160010 160010 160010 160010 160010
160266 160266 160266 160266 160266
159899 159899 159899 159899 159899
159964 159964 159964 159964 159964
====== Performance ======
CUDA computation time: 2201.45 ms
```
Requirements:
- CUDA-capable NVIDIA GPU (for GPU features)
- CUDA Toolkit
- C++11 or above
- nvcc; the examples here were compiled with:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
```
Additional interfaces include:
- `read_csv(string &file_path)`: Reads a formatted (CSV) file and returns a matrix with automatically calculated dimensions
- Formatted file writing interface (similar to `pandas.to_csv`)
- Broadcasting mechanism for matrices, with a padding interface
- Fully connected layer forward and backward propagation interfaces with automatic differentiation support
- Matrix differentiation and automatic differentiation interfaces
- Various matrix operations (see the sketch after this list):
  - `create(row, cols)`: Creates a matrix with the specified dimensions
  - `add`, `subtract`, `mul`: Basic matrix operations
  - `get_T`: Matrix transposition
  - `flatten`: Returns a flattened array
  - `matrix_rs`: Matrix structure compression
  - `matrix_sum`, `matrix_mean`: Statistical operations
  - `iloc`: Matrix slicing
  - And many more
- Activation functions (ReLU)
- Loss functions (MSE)
- Neural network components (convolutional layers, pooling layers)
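A hypothetical usage sketch of these interfaces follows. The header path, the `Matrix` type name, and the exact signatures are assumptions inferred from the names above; check the repository headers for the real API.

```cpp
// Hypothetical sketch: header path, Matrix type, and signatures are assumptions.
#include "matrix/matrix_def.h"  // assumed header location
#include <string>

int main() {
    std::string path = "data.csv";
    Matrix data = read_csv(path);   // dimensions calculated automatically

    Matrix a = create(3, 3);        // 3x3 matrix
    Matrix b = create(3, 3);
    Matrix c = add(a, b);           // element-wise addition
    Matrix t = get_T(c);            // transposition
    double s = matrix_sum(c);       // sum of all elements (return type assumed)
    return 0;
}
```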
To install the automatic differentiation library linked in the example below:

```bash
git clone git@github.com:AllenZYJ/Edge-Computing-Engine.git
cd Edge-Computing-Engine/install_diff && make && make install
```

Note: if you are not the root user, you may need to prefix `make install` with `sudo`.
Compile against the installed library and run:

```
root@5353:/backup# g++ main.cpp -o main -lautodiff
root@5353:/backup# ./main
0.000000+-0.000378*1.000000
-0.000378+0.000000*1.000000
....
-0.000378+0.000000*1.000000
0.000000+-0.000378*1.000000
-0.000378+0.000000*1.000000
-0.000378+0.000000*1.000000
weight_1_grad:
[
-0.000378 -0.000378 -0.000378
-0.000378 -0.000378 -0.000378
-0.000378 -0.000378 -0.000378
]
neraul end;
```
Contributions are welcome. To contribute:
- Fork the repository
- Create a new branch: `git checkout -b my-new-feature`
- Make your changes and commit them: `git commit -am 'Add some feature'`
- Push your changes: `git push origin my-new-feature`
- Submit a pull request
Please ensure your code adheres to the existing style and passes tests before submitting.
- Email: zk@likedge.top or edge@ibooker.org.cn
- Website: Likedge
Edge-Computing-Engine is released under the Apache 2.0 License. See the LICENSE file for details.