Update RK3562/RK3566/RK3568/RK3576/RK3588/RV1103/RV1106 NPU SDK to V2.0.0-beta0

Signed-off-by: Randall Zhuo <randall.zhuo@rock-chips.com>
Randall Zhuo committed Mar 25, 2024
1 parent b25dada commit 77b7109
Showing 323 changed files with 6,693 additions and 2,649 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.gitreview
build/
install/
.vscode
64 changes: 53 additions & 11 deletions CHANGELOG.md
@@ -1,28 +1,48 @@
# CHANGELOG

## 1.6.0
## v2.0.0-beta0

- Support RK3576 (Beta)
- Support RK2118 (Beta)
- Support SDPA (Scaled Dot Product Attention) to improve transformer performance (see the sketch below)
- Improve custom operator support
- Improve MatMul API
- Improve support for Reshape, Transpose, BatchLayernorm, Softmax, Deconv, MatMul, ScatterND, etc.
- Support PyTorch 2.1
- Improve support for QAT models of PyTorch and ONNX
- Optimize automatic generation of C++ code
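
For reference, SDPA computes softmax(QKᵀ/√d)·V as one fused operation. Below is a minimal PyTorch 2.x sketch of the operation; tensor shapes are illustrative assumptions and are not tied to any RKNN API:

```python
# Scaled dot-product attention via PyTorch's fused SDPA kernel.
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 64, 32)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 64, 32)
v = torch.randn(1, 8, 64, 32)

# Equivalent to softmax(q @ k.transpose(-2, -1) / sqrt(head_dim)) @ v
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 64, 32])
```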



## v1.6.0

- Support ONNX models with opset 12~19
- Support custom operators (including CPU and GPU)
- Improve support for dynamic weight convolution, Layernorm, RoiAlign, Softmax, ReduceL2, Gelu, GLU, etc.
- Add support for Python 3.7/3.9/3.11
- Add rknn_convert function
- Improve transformer support
- Improve the MatMul API, such as increasing the K limit length, RK3588 adding int4 * int4 -> int16 support, etc.
- Improve MatMul API, such as increasing the K limit length, RK3588 adding int4 * int4 -> int16 support, etc.
- Reduce RV1106 rknn_init initialization time, memory consumption, etc.
- RV1106 adds int16 support for some operators
- Fixed a problem where the convolution operator on the RV1106 platform could produce random errors in some cases.
- Improve user manual
- Reconstruct the rknn model zoo and add support for multiple models such as detection, segmentation, OCR, and license plate recognition.

## 1.5.2


## v1.5.2

- Improve dynamic shape support
- Improve MatMul API support
- Add GPU back-end implementations for some operators such as matmul
- Improve transformer support
- Reduce rknn_init memory usage
- Optimize rknn_init time consumption

## 1.5.0


## v1.5.0

- Support RK3562
- Support more NPU operator fusion, such as Conv-Silu/Conv-Swish/Conv-HardSwish/Conv-Sigmoid/Conv-Gelu, etc.
@@ -38,19 +58,29 @@



## 1.4.0
## v1.4.0

- Support more NPU operators, such as Reshape, Transpose, MatMul, Max, Min, exGelu, exSoftmax13, Resize, etc.

- Add **Weight Share** function to reduce memory usage.

- Add **Weight Compression** function to reduce memory and bandwidth usage (RK3588/RV1103/RV1106).

- RK3588 supports storing weights or feature maps on SRAM, reducing system bandwidth consumption.

- RK3588 adds support for running a single model on multiple cores at the same time.

- Add new output layout NHWC (C has alignment restrictions).

- Improve support for non-4D input.

- Add more examples such as rknn_yolov5_android_apk_demo and rknn_internal_mem_reuse_demo.

- Bug fix.

## 1.3.0


## v1.3.0

- Support RV1103/RV1106(Beta SDK)
- rknn_tensor_attr supports w_stride (renamed from stride) and h_stride
@@ -60,15 +90,19 @@
- When RKNN_LOG_LEVEL=4, the MACs utilization and bandwidth occupation of each layer can be displayed.
- Bug fix

## 1.2.0


## v1.2.0

- Support RK3588
- Support more operators, such as GRU, Swish, LayerNorm, etc.
- Reduce memory usage
- Improve zero-copy interface implementation
- Bug fix

## 1.1.0


## v1.1.0

- Support INT8+FP16 mixed quantization to improve model accuracy
- Support specifying input and output dtype, which can be solidified into the model
@@ -82,7 +116,9 @@
- Bug fix


## 1.0

## v1.0

- Optimize the performance of rknn_inputs_set()
- Add more functions for zero-copy
- Add new OP support, see OP support list document for details.
@@ -91,11 +127,17 @@
- Bug fix


## 0.7

## v0.7

- Optimize the performance of rknn_inputs_set(), especially for models whose input width is 8-byte aligned.

- Add new OP support, see OP support list document for details.

- Bug fix

## 0.6


## v0.6
- Initial version

37 changes: 23 additions & 14 deletions README.md
@@ -18,8 +18,10 @@
# Support Platform
- RK3566/RK3568 Series
- RK3588 Series
- RK3576 Series
- RK3562 Series
- RV1103/RV1106
- RK2118


Note:
@@ -43,23 +45,30 @@ Note:
- Ubuntu 18.04 python 3.6/3.7
- Ubuntu 20.04 python 3.8/3.9
- Ubuntu 22.04 python 3.10/3.11
- Latest version: 1.6.0 (Release version)
- Latest version: v2.0.0-beta0



# RKNN LLM

If you want to deploy an LLM (Large Language Model), we have introduced a new SDK called RKNN-LLM. For details, please refer to:

https://github.com/airockchip/rknn-llm



# CHANGELOG

## 1.6.0
- Support ONNX models with opset 12~19
- Support custom operators (including CPU and GPU)
- Optimize operator support, such as dynamic weight convolution, Layernorm, RoiAlign, Softmax, ReduceL2, Gelu, GLU, etc.
- Add support for Python 3.7/3.9/3.11
- Add rknn_convert function
- Optimize transformer support
- Optimize the MatMul API, such as increasing the K limit length, RK3588 adding int4 * int4 -> int16 support, etc.
- Optimize RV1106 rknn_init initialization time, memory consumption, etc.
- RV1106 adds int16 support for some operators
- Fixed a problem where the convolution operator on the RV1106 platform could produce random errors in some cases.
- Optimize the user manual
- Reconstruct the RKNN Model Zoo and add support for multiple models such as detection, segmentation, OCR, and license plate recognition.
## v2.0.0-beta0
- Support RK3576 (Beta)
- Support RK2118 (Beta)
- Support SDPA (Scaled Dot Product Attention) to improve transformer performance
- Improve custom operator support
- Improve MatMul API
- Improve support for Reshape, Transpose, BatchLayernorm, Softmax, Deconv, MatMul, ScatterND, etc.
- Support PyTorch 2.1
- Improve support for QAT models of PyTorch and ONNX
- Optimize automatic generation of C++ code

For older versions, please refer to [CHANGELOG](CHANGELOG.md)

58 changes: 58 additions & 0 deletions autosparsity/README.md
@@ -0,0 +1,58 @@
## AutoSparsity

Enables sparse training and inference for PyTorch models.


## Usage

### Step 1

Install autosparsity package

```bash
pip install packages/autosparsity-1.0-cp38-cp38m-linux_x86_64.whl
```

### Step 2

Taking ResNet50 from torchvision as an example, generate the sparse model:

```bash
python examples/autosparsity.py
```
To sparsify a custom model, just call the sparsity_model function during model training, as follows:

```python
# insert model autosparsity code before training
import torch
import torchvision.models as models
from autosparsity.sparsity import sparsity_model

...

model = models.resnet34(pretrained=True).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # training optimizer (illustrative)
mode = 0
sparsity_model(model, optimizer, mode)

# normal training
x, y = DataLoader(args)
for epoch in range(epochs):
    y_pred = model(x)
    loss = loss_func(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ...
```

- Note: Make sure CUDA is available
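
After training, export the sparsified model to ONNX so that RKNN-Toolkit2 can convert it; the snippet below mirrors examples/autosparsity.py from this commit:

```python
# Export the sparsified model to ONNX (mirrors examples/autosparsity.py)
model.eval()
x = torch.randn((1, 3, 224, 224)).cuda()
torch.onnx.export(
    model, x, 'resnet50.onnx', input_names=['inputs'], output_names=['outputs']
)
```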


### Step 3

Use RKNN-Toolkit2 to perform sparse inference (a minimal flow is sketched after the note below):

```bash
python examples/test.py
```
- Note: Only supports RK3576 target platform
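
A rough sketch of what such a flow looks like with the RKNN-Toolkit2 Python API is shown below; examples/test.py is the authoritative script, and any sparse-specific options plus input preprocessing are omitted here (file names follow the files shipped in this commit):

```python
# Sketch of an RKNN-Toolkit2 convert-and-infer flow (illustrative only;
# see examples/test.py for the actual sparse-inference script).
import cv2
from rknn.api import RKNN

rknn = RKNN()
rknn.config(target_platform='rk3576')  # sparse inference targets RK3576
rknn.load_onnx(model='resnet50.onnx')  # the model exported in Step 2
rknn.build(do_quantization=True, dataset='./datasets.txt')
rknn.export_rknn('resnet50.rknn')

rknn.init_runtime()  # pass target='rk3576' to run on a connected board
img = cv2.imread('./dog_224x224.jpg')
outputs = rknn.inference(inputs=[img])
rknn.release()
```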

68 changes: 68 additions & 0 deletions autosparsity/examples/README.md
@@ -0,0 +1,68 @@
# Sparse Infer

This tool sparsifies the weights of a Torch model during training, which can save model storage and reduce model inference time with RKNN sparse inference.

## Usage

### Step 1

Install autosparsity package

```bash
pip install ../packages/autosparsity-1.0-cp38-cp38m-linux_x86_64.whl
```

### Step 2

Taking ResNet50 from torchvision as an example, generate the sparse model:

```bash
python autosparsity.py
```
To sparsify a custom model, just call the sparsity_model function during model training, as follows (a conceptual sketch of the pruning idea follows the note below):

```python
# insert model autosparsity code before training
import torch
import torchvision.models as models
from autosparsity.sparsity import sparsity_model

...

model = models.resnet34(pretrained=True).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # training optimizer (illustrative)
mode = 0
sparsity_model(model, optimizer, mode)

# normal training
x, y = DataLoader(args)
for epoch in range(epochs):
    y_pred = model(x)
    loss = loss_func(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ...
```

- Note: Make sure CUDA is available
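
This commit does not document how sparsity_model rewrites the weights; the usual technique behind NPU sparse inference is structured N:M magnitude pruning, sketched below purely as an illustration (an assumption about the general idea, not autosparsity's actual algorithm):

```python
# Illustration only: generic 2:4 structured pruning by magnitude.
# Assumption: this shows the general idea, not autosparsity's real code.
import torch

def prune_2_of_4(w: torch.Tensor) -> torch.Tensor:
    groups = w.reshape(-1, 4)                      # numel must be a multiple of 4
    smallest = groups.abs().argsort(dim=1)[:, :2]  # 2 smallest magnitudes per group
    mask = torch.ones_like(groups)
    mask.scatter_(1, smallest, 0.0)                # zero them out
    return (groups * mask).reshape(w.shape)

print(prune_2_of_4(torch.randn(4, 8)))
```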


### Step 3

Perform sparse inference

```bash
python test.py
```
- Note: Only supports RK3576 target platform

### Expected Results

This will print the TOP-5 classification results, as follows:
```
-----TOP 5-----
[155] score:0.877372 class:"Shih-Tzu"
[283] score:0.042477 class:"Persian cat"
[ 82] score:0.006625 class:"ruffed grouse, partridge, Bonasa umbellus"
[154] score:0.006625 class:"Pekinese, Pekingese, Peke"
[204] score:0.004696 class:"Lhasa, Lhasa apso"
```
17 changes: 17 additions & 0 deletions autosparsity/examples/autosparsity.py
@@ -0,0 +1,17 @@
import torch
import torchvision.models as models
from autosparsity.sparsity import sparsity_model


if __name__ == "__main__":
    # sparsify the pretrained ResNet50 (this example passes no optimizer)
    model = models.resnet50(pretrained=True).cuda()
    optimizer = None
    mode = 0
    sparsity_model(model, optimizer, mode)

    # export the sparsified model to ONNX for RKNN-Toolkit2 conversion
    model.eval()
    x = torch.randn((1, 3, 224, 224)).cuda()
    torch.onnx.export(
        model, x, 'resnet50.onnx', input_names=['inputs'], output_names=['outputs']
    )
1 change: 1 addition & 0 deletions autosparsity/examples/datasets.txt
@@ -0,0 +1 @@
./dog_224x224.jpg
File renamed without changes
File renamed without changes.
Binary file added autosparsity/examples/resnet50.onnx
