# **Lung-Nodule-Detection: Part-3 Model Deployment**

## Goal

In Part 3, we **deploy** the segmentation model and the classification model to the ***RK3588S*** platform and **verify** that their results are correct.


## Contents

This section consists of two parts:

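- Export the trained models to **ONNX** format, then convert them to the ***RKNN*** format supported by the ***NPU*** on the ***RK3588S***
- Deploy the segmentation model and the classification model to the ***RK3588S*** platform and compare the results with those obtained on the ***PC***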
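Matching aggregate metrics does not guarantee that individual outputs agree, so an element-wise comparison of the two platforms' outputs is a useful complement when verifying the NPU against the PC. The sketch below is an illustration, not the project's code: `compare_outputs` is a hypothetical helper and the random arrays stand in for real model outputs. Since the RKNN model in this notebook is built without quantization (`do_quantization = False`) and the runtime reports FP16 tensors, the FP16 round trip is used as a crude proxy for the NPU; it models only the rounding of the final output, not the error accumulated inside the network, and assumes the PyTorch run on the PC is FP32.

```python
import numpy as np

def compare_outputs(ref, test, threshold=0.5):
    """Compare two probability maps element-wise.

    Returns the maximum absolute difference and the fraction of pixels
    whose thresholded (foreground/background) labels agree.
    """
    ref = np.asarray(ref, dtype=np.float32)
    test = np.asarray(test, dtype=np.float32)
    max_abs_diff = float(np.max(np.abs(ref - test)))
    agreement = float(np.mean((ref > threshold) == (test > threshold)))
    return max_abs_diff, agreement

# Stand-in data: a random "FP32 PC output" and its FP16 round trip as a proxy
# for an FP16 NPU output (only the final rounding is modelled).
rng = np.random.default_rng(0)
pc_out = rng.random((1, 1, 64, 64), dtype=np.float32)
npu_out = pc_out.astype(np.float16).astype(np.float32)

max_abs_diff, agreement = compare_outputs(pc_out, npu_out)
print(f"max |diff| = {max_abs_diff:.2e}, mask agreement = {agreement:.4%}")
```

On real data, `ref` and `test` would be the `[1, 1, 512, 512]` outputs of the two runs for the same input slice; the 0.5 threshold is an assumption.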

## Notes

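The classification model uses the ***Conv3d*** operator, but ***RKNN*** does not support **3D convolution**, so converting the classification model **fails with an error**.  
There are two ways to work around this:

- Modify the model architecture and implement an operation equivalent to ***Conv3d*** using ***Conv2d***
- Do not run the classification model on the ***RKNPU***; use ***ONNX Runtime*** + ***GPU (OpenCL)*** instead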
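The first option rests on a standard identity: a 3D convolution with kernel depth kD equals the sum of kD 2D convolutions, one per depth slice of the kernel, each applied to the correspondingly shifted input slice. The NumPy sketch below checks this identity for the simplest case (stride 1, no padding, no bias, PyTorch-style cross-correlation). It is not the project's model code, and whether RKNN accepts a graph rewritten this way is not verified here.

```python
import numpy as np

def conv2d(x, w):
    """Valid 2D cross-correlation. x: (C_in, H, W), w: (C_out, C_in, kH, kW)."""
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
    for i in range(h - kh + 1):
        for j in range(wd - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]  # (C_in, kH, kW)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def conv3d(x, w):
    """Reference valid 3D cross-correlation. x: (C_in, D, H, W), w: (C_out, C_in, kD, kH, kW)."""
    c_out, _, kd, kh, kw = w.shape
    _, d, h, wd = x.shape
    out = np.zeros((c_out, d - kd + 1, h - kh + 1, wd - kw + 1))
    for z in range(d - kd + 1):
        for i in range(h - kh + 1):
            for j in range(wd - kw + 1):
                patch = x[:, z:z + kd, i:i + kh, j:j + kw]
                out[:, z, i, j] = np.tensordot(w, patch, axes=([1, 2, 3, 4], [0, 1, 2, 3]))
    return out

def conv3d_via_conv2d(x, w):
    """Same result as conv3d, built only from 2D convolutions summed over kernel depth."""
    kd = w.shape[2]
    depth = x.shape[1]
    slices = []
    for z in range(depth - kd + 1):
        # Output slice z = sum over k of conv2d(input slice z+k, kernel slice k)
        acc = sum(conv2d(x[:, z + k], w[:, :, k]) for k in range(kd))
        slices.append(acc)
    return np.stack(slices, axis=1)  # (C_out, D', H', W')

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 6, 6))     # (C_in, D, H, W)
w = rng.standard_normal((3, 2, 3, 3, 3))  # (C_out, C_in, kD, kH, kW)
print(np.allclose(conv3d(x, w), conv3d_via_conv2d(x, w)))  # True
```

In an actual network the loop over depth would typically be expressed by folding the depth axis into the batch or channel axis of `Conv2d` layers; the loops here only demonstrate the arithmetic.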

**TODO**: Since this project focuses on combining **embedded systems + AI**, with the emphasis on **deployment**, we plan to adopt the ***ONNX Runtime*** + ***GPU (OpenCL)*** approach.

## Setup
The setup cell **must** be run once before any other cell.

In [1]:
import os
import sys
import torch
import random
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

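# Get the current working directory
current_path = os.getcwd()

# Get the last path component (i.e., the name of the innermost folder)
last_directory = os.path.basename(current_path)

# If the notebook is running from the notebooks/ directory
if last_directory == 'notebooks':
    # move up to the project root
    os.chdir('..')

#print(f"Switched to directory: {os.getcwd()}")

sys.path.append('src')  # Add the directory containing the project modules to sys.path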

from util.logconf import logging
from app.infer.eval_seg import SegmentationTestingApp
from deployment.export_onnx import *
from deployment.convert_rknn import *

log = logging.getLogger('nb')



In [2]:
# Set random seeds so that experimental results are reproducible
def set_random_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    #torch.set_num_threads(1)

set_random_seed(42)

In [3]:
# Helper for launching an app: instantiate it with an argv list, then call main()
def run(app, *argv):
    argv = list(argv)
    log.info("Running: {}({!r}).main()".format(app.__name__, argv))
    app(argv).main()
    log.info("Finished: {}({!r}).main()".format(app.__name__, argv))
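`run` relies on a simple convention: the app class takes a list of command-line style arguments in its constructor and does its work in `main()`. The sketch below shows a hypothetical app following that convention with `argparse`; the class name is made up, and the flags only mirror those visible in the `Namespace(...)` log lines further down, not the project's actual parser.

```python
import argparse
import sys

class DemoTestingApp:
    """Hypothetical app compatible with run(): App(argv).main()."""

    def __init__(self, sys_argv=None):
        if sys_argv is None:
            sys_argv = sys.argv[1:]  # fall back to the real command line
        parser = argparse.ArgumentParser()
        parser.add_argument("--platform", default="pytorch", choices=["pytorch", "rknn"])
        parser.add_argument("--model-path", required=True)
        parser.add_argument("--target", default=None)
        parser.add_argument("--num-workers", type=int, default=0)
        parser.add_argument("--verbose", action="store_true")
        self.cli_args = parser.parse_args(sys_argv)

    def main(self):
        # A real app would load the model and run validation here;
        # this sketch just returns the parsed options.
        return vars(self.cli_args)

app = DemoTestingApp(["--num-workers=0", "--platform=rknn",
                      "--model-path=deployment/models/seg_model.rknn",
                      "--target=rk3588", "--verbose"])
print(app.main())
```

With the `run` helper defined, `run(DemoTestingApp, "--model-path=...")` would go through the same path as the real apps in this notebook.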

## Export and Convert to RKNN Models

In [4]:
# Export the segmentation model to ONNX format
export_onnx(["--import-path=data/models/seg/seg_2025-04-30_18.55.29_seg.3500000.state",
            "--input-shape", "1", "7", "512", "512", "--model-type=seg"])

export to deployment/models/seg_model.onnx
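The export shape `1 7 512 512` means the segmentation model takes seven 512×512 channels per sample. Assuming (this notebook does not say so) that these channels are the slice being segmented plus three neighbouring CT slices on each side, a model input could be assembled as sketched below. `stack_context_slices` is a hypothetical helper, and clamping at the ends of the volume is an illustrative choice.

```python
import numpy as np

def stack_context_slices(ct_volume, center, context=3):
    """Stack slices center-context .. center+context of a (D, H, W) volume as channels.

    Indices outside the volume are clamped to the first/last slice, so the
    result always has 2*context + 1 channels.
    """
    depth = ct_volume.shape[0]
    indices = [min(max(i, 0), depth - 1) for i in range(center - context, center + context + 1)]
    return ct_volume[indices]

# Hypothetical volume: 10 slices of 512x512, slice i filled with the value i
ct = np.ones((10, 512, 512), dtype=np.float32) * np.arange(10, dtype=np.float32)[:, None, None]

model_input = stack_context_slices(ct, center=1)[None]  # add batch dim -> (1, 7, 512, 512)
print(model_input.shape, model_input[0, :, 0, 0])
```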


In [5]:
# Convert the segmentation ONNX model to RKNN format
convert_main(["--import-path=deployment/models/seg_model.onnx", "--model-type=seg", "--platform=rk3588"])

I rknn-toolkit2 version: 2.3.2


--> Config model
done
--> Loading model


I Loading : 100%|████████████████████████████████████████████████| 70/70 [00:00<00:00, 85523.24it/s]
[1;33mW[0m [1;33mload_onnx: The config.mean_values is None, zeros will be set for input 0![0m
[1;33mW[0m [1;33mload_onnx: The config.std_values is None, ones will be set for input 0![0m
[1;33mW[0m [1;33mbuild: The dataset='' is ignored because do_quantization = False![0m


done
--> Building model


I OpFusing 2 : 100%|████████████████████████████████████████████| 100/100 [00:00<00:00, 2159.23it/s]





I OpFusing 2 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 232.48it/s]
I rknn building ...
I rknn building done.


done
--> Export rknn model
done
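The two `load_onnx` warnings mean that no input normalization is folded into the RKNN model: RKNN's `config()` accepts `mean_values` and `std_values` and applies `(input - mean) / std` per channel, and with both unset it uses mean 0 and std 1, so the input reaches the network unchanged. The NumPy sketch below reproduces that per-channel formula; `rknn_style_normalize` is a hypothetical helper with a simplified signature (RKNN itself expects one list of values per model input), and the data is random.

```python
import numpy as np

def rknn_style_normalize(x_nhwc, mean_values=None, std_values=None):
    """Apply per-channel (x - mean) / std to an NHWC array.

    None mirrors the warnings in the log: mean defaults to 0 and std to 1,
    which leaves the input unchanged.
    """
    channels = x_nhwc.shape[-1]
    mean = np.zeros(channels) if mean_values is None else np.asarray(mean_values, dtype=np.float64)
    std = np.ones(channels) if std_values is None else np.asarray(std_values, dtype=np.float64)
    return (x_nhwc - mean) / std  # broadcasts over the last (channel) axis

x = np.random.default_rng(0).random((1, 4, 4, 7)).astype(np.float32)
print(np.array_equal(rknn_style_normalize(x), x))  # True: defaults are the identity
```

The practical consequence is that whatever preprocessing the PC pipeline applies must also be applied on the host before data is sent to the NPU, or be moved into `config()` at conversion time, but not both.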


In [6]:
# Export the classification model to ONNX format
export_onnx(["--import-path=data/models/nodule-cls/cls_2025-05-01_10.27.09_nodule_cls.best.state",
            "--input-shape", "1", "1", "32", "48", "48", "--model-type=cls"])

export to deployment/models/cls_model.onnx
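The classification model is exported with input shape `1 1 32 48 48`, i.e. a single-channel 3D patch of 32 slices × 48 × 48 pixels. A hypothetical helper for cutting such a patch around a candidate centre given in voxel indices is sketched below; filling out-of-volume voxels with a constant is one possible choice and not necessarily what the project's data pipeline does.

```python
import numpy as np

def crop_candidate(ct_volume, center_zyx, size_zyx=(32, 48, 48), pad_value=0.0):
    """Crop a size_zyx box centred on center_zyx from a (D, H, W) volume.

    Voxels falling outside the volume are filled with pad_value.
    Returns an array of shape (1, 1, *size_zyx).
    """
    out = np.full(size_zyx, pad_value, dtype=ct_volume.dtype)
    src, dst = [], []
    for c, s, dim in zip(center_zyx, size_zyx, ct_volume.shape):
        start = c - s // 2
        lo, hi = max(start, 0), min(start + s, dim)  # part of the box inside the volume
        src.append(slice(lo, hi))
        dst.append(slice(lo - start, hi - start))    # same region in patch coordinates
    out[tuple(dst)] = ct_volume[tuple(src)]
    return out[None, None]  # add batch and channel dims

ct = np.zeros((64, 128, 128), dtype=np.float32)  # hypothetical CT volume (D, H, W)
patch = crop_candidate(ct, center_zyx=(32, 64, 64))
print(patch.shape)  # (1, 1, 32, 48, 48)
```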


In [7]:
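# Convert the classification ONNX model to RKNN format
# ATTENTION!!! Uncommenting the line below crashes the ipykernel; see the Notes at the start of this section for the reason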
# convert_main(["--import-path=deployment/models/cls_model.onnx", "--model-type=cls", "--platform=rk3588"])

## Deploy the Segmentation Model to the RK-NPU

In [8]:
# Results on the PC
run(SegmentationTestingApp, "--num-workers=0", 
    "--platform=pytorch",
    "--model-path=data/models/seg/seg_2025-04-30_18.55.29_seg.3500000.state",
    "--verbose"
)

2025-05-26 21:43:19,622 INFO     pid:1453421 nb:004:run Running: SegmentationTestingApp(['--num-workers=0', '--platform=pytorch', '--model-path=data/models/seg/seg_2025-04-30_18.55.29_seg.3500000.state', '--verbose']).main()
2025-05-26 21:43:19,692 INFO     pid:1453421 app.infer.eval_seg:158:initPytorchModel Using CUDA; 1 devices.
2025-05-26 21:43:19,698 INFO     pid:1453421 app.infer.eval_seg:202:main Starting SegmentationTestingApp, Namespace(platform='pytorch', model_path='data/models/seg/seg_2025-04-30_18.55.29_seg.3500000.state', verbose=True, target=None, device_id=None, batch_size=16, num_workers=0)
2025-05-26 21:43:26,583 INFO     pid:1453421 app.infer.eval_seg:206:main 71 batches of size 16*1
2025-05-26 21:43:36,367 INFO     pid:1453421 util.util:238:enumerateWithEstimate E0 Validation     4/71, done at 2025-05-26 21:45:45, 0:02:18
2025-05-26 21:43:46,613 INFO     pid:1453421 util.util:238:enumerateWithEstimate E0 Validation     8/71, done at 2025-05-26 21:46:04, 0:02:37
2025-

In [5]:
# Results on the RK-NPU
run(SegmentationTestingApp, "--num-workers=0", 
    "--platform=rknn",
    "--model-path=deployment/models/seg_model.rknn",
    "--target=rk3588",
    "--verbose"
)

2025-05-20 23:56:11,903 INFO     pid:3399200 nb:004:run Running: SegmentationTestingApp(['--num-workers=1', '--platform=rknn', '--model-path=deployment/models/seg_model.rknn', '--target=rk3588', '--verbose']).main()
I rknn-toolkit2 version: 2.3.2
adb: unable to connect for root: closed
I target set by user is: rk3588


--> Init runtime environment


I Get hardware info: target_platform = rk3588, os = Linux, aarch = aarch64
I Check RK3588 board npu runtime version
I Starting ntp or adb, target is RK3588
I Start adb...
I Connect to Device success!


I NPUTransfer(3399200): Starting NPU Transfer Client, Transfer version 2.2.2 (12abf2a@2024-09-02T03:22:41)
I NPUTransfer(3399200): TransferBuffer: min aligned size: 1024
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 2.3.2 (1842325 build@2025-03-30T09:55:23)
D RKNNAPI:   DRV: rknn_server: 2.3.2 (1842325 build@2025-03-30T09:54:34)
D RKNNAPI:   DRV: rknnrt: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)


2025-05-20 23:56:14,446 INFO     pid:3399200 app.infer.testing_seg:172:main Starting SegmentationTestingApp, Namespace(platform='rknn', model_path='deployment/models/seg_model.rknn', verbose=True, target='rk3588', device_id=None, batch_size=1, num_workers=1)
2025-05-20 23:56:14,612 INFO     pid:3399200 core.dsets_seg:273:__init__ <core.dsets_seg.Luna2dSegmentationDataset object at 0x7ff8e022da00>: 89 validation series, 1122 slices, 154 nodules
2025-05-20 23:56:14,613 INFO     pid:3399200 app.infer.testing_seg:178:main 1122 batches of size 1*1


D RKNNAPI: Input tensors:
D RKNNAPI:   index=0, name=input, n_dims=4, dims=[1, 512, 512, 7], n_elems=1835008, size=3670016, w_stride = 0, size_with_stride = 0, fmt=NHWC, type=FP16, qnt_type=NONE, zp=0, scale=1.000000
D RKNNAPI: Output tensors:
done
Model-deployment/models/seg_model.rknn is rknn model, starting val
D RKNNAPI:   index=0, name=output, n_dims=4, dims=[1, 1, 512, 512], n_elems=262144, size=524288, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=FP16, qnt_type=NONE, zp=0, scale=1.000000


2025-05-20 23:56:34,090 INFO     pid:3399200 util.util:238:enumerateWithEstimate E0 Validation     4/1122, done at 2025-05-21 00:25:48, 0:29:20
2025-05-20 23:56:51,553 INFO     pid:3399200 util.util:238:enumerateWithEstimate E0 Validation    16/1122, done at 2025-05-21 00:24:11, 0:27:43
2025-05-20 23:58:40,680 INFO     pid:3399200 util.util:238:enumerateWithEstimate E0 Validation    64/1122, done at 2025-05-21 00:35:15, 0:38:47
2025-05-21 00:02:48,933 INFO     pid:3399200 util.util:238:enumerateWithEstimate E0 Validation   256/1122, done at 2025-05-21 00:24:16, 0:27:48
2025-05-21 00:22:28,910 INFO     pid:3399200 util.util:238:enumerateWithEstimate E0 Validation  1024/1122, done at 2025-05-21 00:24:56, 0:28:28
2025-05-21 00:24:54,130 INFO     pid:3399200 app.infer.testing_seg:258:logMetrics E0 SegmentationTestingApp
2025-05-21 00:24:54,131 INFO     pid:3399200 app.infer.testing_seg:288:logMetrics E0 val      0.8932 loss, 0.0640 precision, 0.7666 recall, 0.1182 f1 score
2025-05-21 00:24
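The runtime log above reports the input tensor as `dims=[1, 512, 512, 7]` in `NHWC` layout and FP16, while the ONNX model was exported with the NCHW shape `1 7 512 512`. The logged `n_elems` and `size` values follow directly from the dims and the 2-byte FP16 element size, and moving between the two layouts is a transpose. A NumPy sketch (the `tensor_size` helper is hypothetical):

```python
import numpy as np

FP16_BYTES = np.dtype(np.float16).itemsize  # 2 bytes per element

def tensor_size(dims, itemsize=FP16_BYTES):
    """Return (number of elements, size in bytes) for a dense tensor."""
    n_elems = int(np.prod(dims))
    return n_elems, n_elems * itemsize

print(tensor_size([1, 512, 512, 7]))  # (1835008, 3670016), the logged input tensor
print(tensor_size([1, 1, 512, 512]))  # (262144, 524288), the logged output tensor

# NCHW (as exported to ONNX) -> NHWC (as reported for the RKNN input) and back
x_nchw = np.random.default_rng(0).random((1, 7, 512, 512)).astype(np.float16)
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))
x_back = np.transpose(x_nhwc, (0, 3, 1, 2))
print(x_nhwc.shape, np.array_equal(x_back, x_nchw))
```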

### PC vs. NPU Results

|Platform|Loss|TP (% of labeled nodule pixels)|FN (% of labeled nodule pixels)|FP (% of labeled nodule pixels)|
|-|-|-|-|-|
|PC|0.8396|76.7%|23.3%|1122.6%|
|RK3588S-NPU|0.8932|76.7%|23.3%|1120.5%|
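The TP, FN and FP percentages appear to be relative to the number of labeled nodule pixels: TP + FN = 100% in both rows, which is also why FP can exceed 100%. Under that reading (inferred from the numbers, not stated in the logs), precision = TP / (TP + FP) and recall = TP / (TP + FN), and the NPU row reproduces the logged precision 0.0640, recall 0.7666 and F1 0.1182 up to the rounding of the table values. A small sketch with a hypothetical helper:

```python
def metrics_from_percentages(tp_pct, fn_pct, fp_pct):
    """Precision, recall and F1 from TP/FN/FP given as % of labeled nodule pixels."""
    precision = tp_pct / (tp_pct + fp_pct)
    recall = tp_pct / (tp_pct + fn_pct)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

rows = {
    "PC": (76.7, 23.3, 1122.6),
    "RK3588S-NPU": (76.7, 23.3, 1120.5),
}
for platform, (tp, fn, fp) in rows.items():
    precision, recall, f1 = metrics_from_percentages(tp, fn, fp)
    print(f"{platform:12s} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

The two platforms differ only in FP (1122.6% vs. 1120.5%), which shifts precision and F1 only in the fourth decimal place, so by these metrics the FP16 NPU model behaves essentially like the PC model.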