Update RK3562/RK3566/RK3568/RK3576/RK3588/RV1103/RV1106 NPU SDK to V2.0.0-beta0

Signed-off-by: Randall Zhuo <randall.zhuo@rock-chips.com>
Randall Zhuo committed Mar 25, 2024
1 parent b25dada commit 77b7109
Showing 323 changed files with 6,693 additions and 2,649 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.gitreview
build/
install/
.vscode
64 changes: 53 additions & 11 deletions CHANGELOG.md
@@ -1,28 +1,48 @@
# CHANGELOG

## 1.6.0
## v2.0.0-beta0

- Support RK3576 (Beta)
- Support RK2118 (Beta)
- Support SDPA (Scaled Dot Product Attention) to improve transformer performance (see the sketch below)
- Improve custom operator support
- Improve MatMul API
- Improve support for Reshape, Transpose, BatchLayernorm, Softmax, Deconv, MatMul, ScatterND, etc.
- Support PyTorch 2.1
- Improve support for QAT models of PyTorch and ONNX
- Optimize automatic generation of C++ code
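
For reference, SDPA computes softmax(QKᵀ/√d)·V as one fused operation. Below is a minimal PyTorch 2.x sketch of the operation; tensor shapes are illustrative assumptions and are not tied to any RKNN API:

```python
# Scaled dot-product attention via PyTorch's fused SDPA kernel.
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 64, 32)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 64, 32)
v = torch.randn(1, 8, 64, 32)

# Equivalent to softmax(q @ k.transpose(-2, -1) / sqrt(head_dim)) @ v
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 64, 32])
```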



## v1.6.0

- Support ONNX models with opset 12~19
- Support custom operators (including CPU and GPU)
- Improve support for dynamic weight convolution, Layernorm, RoiAlign, Softmax, ReduceL2, Gelu, GLU, etc.
- Add support for Python 3.7/3.9/3.11
- Add rknn_convert function
- Improve transformer support
- Improve the MatMul API, such as increasing the K limit length, RK3588 adding int4 * int4 -> int16 support, etc.
- Improve MatMul API, such as increasing the K limit length, RK3588 adding int4 * int4 -> int16 support, etc.
- Reduce RV1106 rknn_init initialization time, memory consumption, etc.
- RV1106 adds int16 support for some operators
- Fixed a problem where the convolution operator on the RV1106 platform could produce random errors in some cases.
- Improve user manual
- Reconstruct the rknn model zoo and add support for multiple models such as detection, segmentation, OCR, and license plate recognition.

## 1.5.2


## v1.5.2

- Improve dynamic shape support
- Improve MatMul API support
- Add GPU back-end implementations for some operators such as matmul
- Improve transformer support
- Reduce rknn_init memory usage
- Optimize rknn_init time consumption

## 1.5.0


## v1.5.0

- Support RK3562
- Support more NPU operator fusion, such as Conv-Silu/Conv-Swish/Conv-HardSwish/Conv-Sigmoid/Conv-Gelu, etc.
@@ -38,19 +58,29 @@



## 1.4.0
## v1.4.0

- Support more NPU operators, such as Reshape, Transpose, MatMul, Max, Min, exGelu, exSoftmax13, Resize, etc.

- Add **Weight Share** function to reduce memory usage.

- Add **Weight Compression** function to reduce memory and bandwidth usage (RK3588/RV1103/RV1106).

- RK3588 supports storing weights or feature maps on SRAM, reducing system bandwidth consumption.

- RK3588 adds support for running a single model on multiple cores at the same time.

- Add new output layout NHWC (C has alignment restrictions).

- Improve support for non-4D input.

- Add more examples such as rknn_yolov5_android_apk_demo and rknn_internal_mem_reuse_demo.

- Bug fix.

## 1.3.0


## v1.3.0

- Support RV1103/RV1106(Beta SDK)
- rknn_tensor_attr supports w_stride (renamed from stride) and h_stride
@@ -60,15 +90,19 @@
- When RKNN_LOG_LEVEL=4, the MACs utilization and bandwidth occupation of each layer can be displayed.
- Bug fix

## 1.2.0


## v1.2.0

- Support RK3588
- Support more operators, such as GRU, Swish, LayerNorm, etc.
- Reduce memory usage
- Improve zero-copy interface implementation
- Bug fix

## 1.1.0


## v1.1.0

- Support INT8+FP16 mixed quantization to improve model accuracy
- Support specifying input and output dtype, which can be solidified into the model
@@ -82,7 +116,9 @@
- Bug fix


## 1.0

## v1.0

- Optimize the performance of rknn_inputs_set()
- Add more functions for zero-copy
- Add new OP support, see OP support list document for details.
@@ -91,11 +127,17 @@
- Bug fix


## 0.7

## v0.7

- Optimize the performance of rknn_inputs_set(), especially for models whose input width is 8-byte aligned.

- Add new OP support, see OP support list document for details.

- Bug fix

## 0.6


## v0.6
- Initial version

37 changes: 23 additions & 14 deletions README.md
@@ -18,8 +18,10 @@
# Support Platform
- RK3566/RK3568 Series
- RK3588 Series
- RK3576 Series
- RK3562 Series
- RV1103/RV1106
- RK2118


Note:
@@ -43,23 +45,30 @@ Note:
- Ubuntu 18.04 python 3.6/3.7
- Ubuntu 20.04 python 3.8/3.9
- Ubuntu 22.04 python 3.10/3.11
- Latest version: 1.6.0 (Release version)
- Latest version: v2.0.0-beta0



# RKNN LLM

If you want to deploy an LLM (Large Language Model), we have introduced a new SDK called RKNN-LLM. For details, please refer to:

https://github.com/airockchip/rknn-llm



# CHANGELOG

## 1.6.0
- Support ONNX models with opset 12~19
- Support custom operators (including CPU and GPU)
- Optimize operator support, such as dynamic weight convolution, Layernorm, RoiAlign, Softmax, ReduceL2, Gelu, GLU, etc.
- Add support for Python 3.7/3.9/3.11
- Add rknn_convert function
- Optimize transformer support
- Optimize the MatMul API, such as increasing the K limit length, RK3588 adding int4 * int4 -> int16 support, etc.
- Optimize RV1106 rknn_init initialization time, memory consumption, etc.
- RV1106 adds int16 support for some operators
- Fixed a problem where the convolution operator on the RV1106 platform could produce random errors in some cases.
- Optimize the user manual
- Reconstruct the RKNN Model Zoo and add support for multiple models such as detection, segmentation, OCR, and license plate recognition.
## v2.0.0-beta0
- Support RK3576 (Beta)
- Support RK2118 (Beta)
- Support SDPA (Scaled Dot Product Attention) to improve transformer performance
- Improve custom operator support
- Improve MatMul API
- Improve support for Reshape, Transpose, BatchLayernorm, Softmax, Deconv, MatMul, ScatterND, etc.
- Support PyTorch 2.1
- Improve support for QAT models of PyTorch and ONNX
- Optimize automatic generation of C++ code

For older versions, please refer to [CHANGELOG](CHANGELOG.md)

58 changes: 58 additions & 0 deletions autosparsity/README.md
@@ -0,0 +1,58 @@
## AutoSparsity

Enables sparse training and inference for PyTorch models.


## Usage

### Step 1

Install autosparsity package

```bash
pip install packages/autosparsity-1.0-cp38-cp38m-linux_x86_64.whl
```

### Step 2

Taking ResNet50 from torchvision as an example, generate the sparse model:

```bash
python examples/autosparsity.py
```
To sparsify a custom model, just call the sparsity_model function during model training, as follows:

```python
# insert model autosparsity code before training
import torch
import torchvision.models as models
from autosparsity.sparsity import sparsity_model

...

model = models.resnet34(pretrained=True).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # training optimizer (illustrative)
mode = 0
sparsity_model(model, optimizer, mode)

# normal training
x, y = DataLoader(args)
for epoch in range(epochs):
    y_pred = model(x)
    loss = loss_func(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ...
```

- Note: Make sure CUDA is available
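
After training, export the sparsified model to ONNX so that RKNN-Toolkit2 can convert it; the snippet below mirrors examples/autosparsity.py from this commit:

```python
# Export the sparsified model to ONNX (mirrors examples/autosparsity.py)
model.eval()
x = torch.randn((1, 3, 224, 224)).cuda()
torch.onnx.export(
    model, x, 'resnet50.onnx', input_names=['inputs'], output_names=['outputs']
)
```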


### Step 3

Use RKNN-Toolkit2 to perform sparse inference (a minimal flow is sketched after the note below):

```bash
python examples/test.py
```
- Note: Only supports RK3576 target platform
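
A rough sketch of what such a flow looks like with the RKNN-Toolkit2 Python API is shown below; examples/test.py is the authoritative script, and any sparse-specific options plus input preprocessing are omitted here (file names follow the files shipped in this commit):

```python
# Sketch of an RKNN-Toolkit2 convert-and-infer flow (illustrative only;
# see examples/test.py for the actual sparse-inference script).
import cv2
from rknn.api import RKNN

rknn = RKNN()
rknn.config(target_platform='rk3576')  # sparse inference targets RK3576
rknn.load_onnx(model='resnet50.onnx')  # the model exported in Step 2
rknn.build(do_quantization=True, dataset='./datasets.txt')
rknn.export_rknn('resnet50.rknn')

rknn.init_runtime()  # pass target='rk3576' to run on a connected board
img = cv2.imread('./dog_224x224.jpg')
outputs = rknn.inference(inputs=[img])
rknn.release()
```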

68 changes: 68 additions & 0 deletions autosparsity/examples/README.md
@@ -0,0 +1,68 @@
# Sparse Infer

This tool sparsifies the weights of a Torch model during training, which can save model storage and reduce model inference time with RKNN sparse inference.

## Usage

### Step 1

Install autosparsity package

```bash
pip install ../packages/autosparsity-1.0-cp38-cp38m-linux_x86_64.whl
```

### Step 2

Taking ResNet50 from torchvision as an example, generate the sparse model:

```bash
python autosparsity.py
```
To sparsify a custom model, just call the sparsity_model function during model training, as follows (a conceptual sketch of the pruning idea follows the note below):

```python
# insert model autosparsity code before training
import torch
import torchvision.models as models
from autosparsity.sparsity import sparsity_model

...

model = models.resnet34(pretrained=True).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # training optimizer (illustrative)
mode = 0
sparsity_model(model, optimizer, mode)

# normal training
x, y = DataLoader(args)
for epoch in range(epochs):
    y_pred = model(x)
    loss = loss_func(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ...
```

- Note: Make sure CUDA is available
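
This commit does not document how sparsity_model rewrites the weights; the usual technique behind NPU sparse inference is structured N:M magnitude pruning, sketched below purely as an illustration (an assumption about the general idea, not autosparsity's actual algorithm):

```python
# Illustration only: generic 2:4 structured pruning by magnitude.
# Assumption: this shows the general idea, not autosparsity's real code.
import torch

def prune_2_of_4(w: torch.Tensor) -> torch.Tensor:
    groups = w.reshape(-1, 4)                      # numel must be a multiple of 4
    smallest = groups.abs().argsort(dim=1)[:, :2]  # 2 smallest magnitudes per group
    mask = torch.ones_like(groups)
    mask.scatter_(1, smallest, 0.0)                # zero them out
    return (groups * mask).reshape(w.shape)

print(prune_2_of_4(torch.randn(4, 8)))
```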


### Step 3

Perform sparse inference

```bash
python test.py
```
- Note: Only supports RK3576 target platform

### Expected Results

This will print the TOP-5 classification results, as follows:
```
-----TOP 5-----
[155] score:0.877372 class:"Shih-Tzu"
[283] score:0.042477 class:"Persian cat"
[ 82] score:0.006625 class:"ruffed grouse, partridge, Bonasa umbellus"
[154] score:0.006625 class:"Pekinese, Pekingese, Peke"
[204] score:0.004696 class:"Lhasa, Lhasa apso"
```
17 changes: 17 additions & 0 deletions autosparsity/examples/autosparsity.py
@@ -0,0 +1,17 @@
import torch
import torchvision.models as models
from autosparsity.sparsity import sparsity_model


if __name__ == "__main__":
    # sparsify the pretrained ResNet50 (this example passes no optimizer)
    model = models.resnet50(pretrained=True).cuda()
    optimizer = None
    mode = 0
    sparsity_model(model, optimizer, mode)

    # export the sparsified model to ONNX for RKNN-Toolkit2 conversion
    model.eval()
    x = torch.randn((1, 3, 224, 224)).cuda()
    torch.onnx.export(
        model, x, 'resnet50.onnx', input_names=['inputs'], output_names=['outputs']
    )
1 change: 1 addition & 0 deletions autosparsity/examples/datasets.txt
@@ -0,0 +1 @@
./dog_224x224.jpg
File renamed without changes
File renamed without changes.
Binary file added autosparsity/examples/resnet50.onnx
