[Hackathon 182 Model] Update PPOCRV3 For RKNPU2 #1403

Merged
merged 12 commits into from
Feb 27, 2023

Conversation

Zheng-Bicheng
Collaborator

PR types

Model

Description

Update PPOCRV3 For RKNPU2

@Zheng-Bicheng
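Collaborator Author

PPOCRV3 update details

FastDeploy cmake

  • Updated the RKNPU2 driver version to 1.4.2b0.
  • Updated the RKNPU2 cmake: the RKNPU2 driver is now downloaded separately for RK3588 and RK356X rather than as one mixed package.
  • FastDeploy no longer depends on rknnlib_api.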

Det

  • RKNPU2 does not support Normalize and Permute, so DisableNormalize and DisablePermute were added.
  • RKNPU2 does not support dynamic shapes, so fixed-shape inference was added, following the Rec part.
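As a rough illustration of what fixed-shape inference asks of the detection preprocessing, the sketch below fits an arbitrary image into a fixed target size by scaling it with its aspect ratio preserved and then padding the bottom and right edges. The names `FixedShapePlan` and `PlanFixedShape` are hypothetical and are not FastDeploy APIs, and the resize-then-pad strategy is an assumption; the strategy actually used in this PR may differ.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper, not part of FastDeploy: describes how to fit an
// image into a fixed (target_h x target_w) model input.
struct FixedShapePlan {
  int resized_h;   // height after the aspect-preserving resize
  int resized_w;   // width after the aspect-preserving resize
  int pad_bottom;  // rows of padding needed to reach target_h
  int pad_right;   // columns of padding needed to reach target_w
};

// Assumption: scale the image so it fits inside the target, then pad the
// bottom and right edges up to the fixed size.
inline FixedShapePlan PlanFixedShape(int src_h, int src_w, int target_h,
                                     int target_w) {
  const double scale = std::min(static_cast<double>(target_h) / src_h,
                                static_cast<double>(target_w) / src_w);
  // Clamp to [1, target] so rounding can never overshoot the fixed shape.
  const int resized_h = std::min(
      target_h, std::max(1, static_cast<int>(std::lround(src_h * scale))));
  const int resized_w = std::min(
      target_w, std::max(1, static_cast<int>(std::lround(src_w * scale))));
  return {resized_h, resized_w, target_h - resized_h, target_w - resized_w};
}
```

Whatever the input size, the resized image plus its padding always adds up to the same fixed shape, which is the property hardware without dynamic-shape support needs.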

Cls

  • RKNPU2 does not support Normalize and Permute, so DisableNormalize and DisablePermute were added.

Rec

  • RKNPU2 does not support Normalize and Permute, so DisableNormalize and DisablePermute were added.
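The effect of the two switches can be sketched with a small stand-alone class. `PreprocessorSketch` is a hypothetical stand-in, not FastDeploy's preprocessor; only the method names `DisableNormalize` and `DisablePermute` come from this PR, and the mean/std values are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an OCR preprocessor; not FastDeploy's class.
class PreprocessorSketch {
 public:
  // Skip the (x / 255 - mean) / std step.
  void DisableNormalize() { disable_normalize_ = true; }
  // Skip the HWC -> CHW layout change.
  void DisablePermute() { disable_permute_ = true; }

  // Input: float pixels in HWC order with values in [0, 255].
  std::vector<float> Run(const std::vector<float>& hwc, int height, int width,
                         int channels) const {
    assert(hwc.size() ==
           static_cast<std::size_t>(height) * width * channels);
    std::vector<float> out(hwc);
    if (!disable_normalize_) {
      for (float& v : out) v = (v / 255.0f - mean_) / std_;
    }
    if (!disable_permute_) {
      std::vector<float> chw(out.size());
      for (int h = 0; h < height; ++h)
        for (int w = 0; w < width; ++w)
          for (int c = 0; c < channels; ++c)
            chw[(static_cast<std::size_t>(c) * height + h) * width + w] =
                out[(static_cast<std::size_t>(h) * width + w) * channels + c];
      out.swap(chw);
    }
    return out;
  }

 private:
  bool disable_normalize_ = false;
  bool disable_permute_ = false;
  float mean_ = 0.5f;  // assumed value, for illustration only
  float std_ = 0.5f;   // assumed value, for illustration only
};
```

With both switches set, `Run` returns the input unchanged, so these two steps are left to whatever consumes the tensor next.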

OCR Result

  • When displaying results, a box could still be drawn for an entry whose rec_score is 0, so a score_threshold parameter was added to the VisOcr function.
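A minimal sketch of the filtering idea behind score_threshold follows. `OcrEntrySketch` and `IndicesToDraw` are hypothetical names, not the VisOcr implementation, and whether the real function compares with `>=` or `>` is an assumption here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one OCR result; not FastDeploy's OCRResult.
struct OcrEntrySketch {
  std::string text;
  float rec_score;
};

// Returns the indices of the entries whose box should be drawn.
// Assumption: an entry is kept when rec_score >= score_threshold.
inline std::vector<std::size_t> IndicesToDraw(
    const std::vector<OcrEntrySketch>& entries, float score_threshold) {
  std::vector<std::size_t> keep;
  for (std::size_t i = 0; i < entries.size(); ++i) {
    if (entries[i].rec_score >= score_threshold) keep.push_back(i);
  }
  return keep;
}
```

With any positive threshold, an entry with an empty text and a rec score of 0 is skipped, while entries with high scores are still drawn.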

Model export

  • Added configuration files for the three models.
  • Optimized the model export code.

@Zheng-Bicheng
Collaborator Author

Results

vis_result

det boxes: [[276,174],[285,173],[285,178],[276,179]]rec text:  rec score:0.000000 cls label: 1 cls score: 0.766602
det boxes: [[43,408],[483,390],[483,431],[44,449]]rec text: 上海斯格威铂尔曼大酒店 rec score:0.888450 cls label: 0 cls score: 1.000000
det boxes: [[186,456],[399,448],[399,480],[186,488]]rec text: 打浦路15号 rec score:0.988769 cls label: 0 cls score: 1.000000
det boxes: [[18,501],[513,485],[514,537],[18,554]]rec text: 绿洲仕格维花园公寓 rec score:0.992730 cls label: 0 cls score: 1.000000
det boxes: [[78,553],[404,541],[404,573],[78,585]]rec text: 打浦路252935号 rec score:0.983545 cls label: 0 cls score: 1.000000
Visualized result saved in ./vis_result.jpg

@Zheng-Bicheng
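Collaborator Author

Requirements not yet met

In our tests, PPOCRV3 accuracy drops substantially when the model is quantized with RK's static quantization tool. This may need to be discussed with RK's engineers. @leiqing1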

/// Set static_shape_infer is true or not. When deploying PP-OCR
/// on hardware which cannot support dynamic input shape very well,
/// like Huawei Ascend, static_shape_infer needs to be true.
void SetStaticShapeInfer(bool static_shape_infer) {
Collaborator

Does this interface overlap with SetDetImageShape()? For example, calling SetDetImageShape() already implies SetStaticShapeInfer().

Collaborator Author

This follows the API structure of rec_preprocessor, shown below:

  /// Set static_shape_infer is true or not. When deploying PP-OCR
  /// on hardware which cannot support dynamic input shape very well,
  /// like Huawei Ascend, static_shape_infer needs to be true.
  void SetStaticShapeInfer(bool static_shape_infer) {
    static_shape_infer_ = static_shape_infer;
  }
  /// Get static_shape_infer of the recognition preprocess
  bool GetStaticShapeInfer() const { return static_shape_infer_; }

  /// Set rec_image_shape for the recognition preprocess
  void SetRecImageShape(const std::vector<int>& rec_image_shape) {
    rec_image_shape_ = rec_image_shape;
  }
  /// Get rec_image_shape for the recognition preprocess
  std::vector<int> GetRecImageShape() { return rec_image_shape_; }

Collaborator Author

Wouldn't a unified API be easier for users to understand? Calling SetDetImageShape() is indeed only meaningful for fixed-shape inference.
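The relationship between the two setters discussed above can be sketched as follows. `DetPreprocessorSketch`, `TargetShape`, and the default shape are hypothetical and are not FastDeploy's implementation; only the setter and getter names come from this thread. In this sketch the shape set through `SetDetImageShape` takes effect only once `SetStaticShapeInfer(true)` has been called, which is one possible way to keep both interfaces without making them redundant.

```cpp
#include <vector>

// Hypothetical sketch mirroring the setter names quoted in this thread;
// not the FastDeploy detection preprocessor.
class DetPreprocessorSketch {
 public:
  void SetStaticShapeInfer(bool static_shape_infer) {
    static_shape_infer_ = static_shape_infer;
  }
  bool GetStaticShapeInfer() const { return static_shape_infer_; }

  void SetDetImageShape(const std::vector<int>& det_image_shape) {
    det_image_shape_ = det_image_shape;
  }
  std::vector<int> GetDetImageShape() const { return det_image_shape_; }

  // Shape the preprocessing would produce: the fixed det_image_shape_ when
  // static-shape inference is on, otherwise the input's own shape.
  std::vector<int> TargetShape(int channels, int src_h, int src_w) const {
    if (static_shape_infer_) return det_image_shape_;
    return {channels, src_h, src_w};
  }

 private:
  bool static_shape_infer_ = false;
  // Assumed default, for illustration only.
  std::vector<int> det_image_shape_ = {3, 960, 960};
};
```

A caller who only wants fixed-shape inference calls `SetStaticShapeInfer(true)` and keeps the default shape; a caller with a retrained model additionally calls `SetDetImageShape`.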

void DisablePermute() { disable_permute_ = true; }

/// Set cls_image_shape for the classification preprocess
void SetDetImageShape(const std::vector<int>& det_image_shape) {
Collaborator

Does SetDetImageShape need to be exposed to users at all? As I read it, once it is exposed, users decide the resize shape themselves, but they may not know what size is appropriate.

Collaborator

Also, the English comment here is wrong: this is not cls_image_shape.

Collaborator Author

SetDetImageShape is there for users who train their own Det model. I'll fix the comment.

Collaborator

SetDetImageShape is there for users who train their own Det model. I'll fix the comment.

Are there many RK users who train their own OCR models? We have rarely heard of a custom model outperforming the official PP-OCR.
If that is the motivation, the intended use of interfaces like SetDetImageShape needs to be spelled out: add one line, commented out, to the documentation or the demo code, explaining when to use it.

Collaborator Author

I have heard of users retraining the Det model, but not the other models. Use cases differ: some users want to detect text in a specific domain, such as invoices, and so want to retrain the model themselves.

Collaborator Author

I changed the comment as follows:

  /// Set det_image_shape for the detection preprocess.
  /// This api is usually used when you retrain the model.
  /// Generally, you do not need to use it.

Collaborator

ok

Hello, I found that although the test example runs, the model is not quantized, and processing on the NPU ends up much slower than on the CPU: about 1 s on the CPU versus 1.4 s on the NPU.

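Collaborator Author

Hello, I found that although the test example runs, the model is not quantized, and processing on the NPU ends up much slower than on the CPU: about 1 s on the CPU versus 1.4 s on the NPU.

  • We discussed quantization with Du, an engineer at RK. Their support for the Rec model is not very good, so quantization has not been done for now. This is already under discussion, and hybrid quantization is being considered as a solution.
  • As for the NPU taking longer than the CPU, the problem lies in the FP32-to-FP16 conversion. It has been reported to RK, and we are waiting for them to optimize it.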

Collaborator Author

Hello, I found that although the test example runs, the model is not quantized, and processing on the NPU ends up much slower than on the CPU: about 1 s on the CPU versus 1.4 s on the NPU.

Hello, may I ask which board model you are using? My current test results on RK3588 are as follows:

# ONNX
det: 0.52 s
cls: 0.02 s
rec: 0.32 s

# RKNN
det: 0.15 s
cls: 0.007 s
rec: 0.18 s

This differs from what you describe.
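To make the comparison concrete, the per-stage and end-to-end ratios can be worked out from the numbers above. This is only an arithmetic sketch: it assumes every figure is in seconds (only the first one carries a unit in the original comment) and that the three stages run once each, and the names `StageTiming`, `Speedup`, and `TotalSpeedup` are hypothetical.

```cpp
#include <cstddef>

// One pipeline stage with its reported latency under each backend.
// Assumption: all values are in seconds.
struct StageTiming {
  const char* name;
  double onnx_s;
  double rknn_s;
};

// Figures copied from the RK3588 measurements quoted in this thread.
const StageTiming kTimings[] = {
    {"det", 0.52, 0.15},
    {"cls", 0.02, 0.007},
    {"rec", 0.32, 0.18},
};
const std::size_t kNumStages = sizeof(kTimings) / sizeof(kTimings[0]);

// Ratio of ONNX latency to RKNN latency for a single stage.
inline double Speedup(const StageTiming& t) { return t.onnx_s / t.rknn_s; }

// Ratio of the summed ONNX latency to the summed RKNN latency.
inline double TotalSpeedup() {
  double onnx_total = 0.0;
  double rknn_total = 0.0;
  for (std::size_t i = 0; i < kNumStages; ++i) {
    onnx_total += kTimings[i].onnx_s;
    rknn_total += kTimings[i].rknn_s;
  }
  return onnx_total / rknn_total;
}
```

Under these assumptions the summed latency falls from 0.86 s to 0.337 s, roughly a 2.5x improvement, with detection gaining the most and recognition the least.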

@yunyaoXYY
Collaborator

One more small question: I don't know why the code formatting changed so much. Bicheng, did you use pre-commit? The code previously committed here should have gone through pre-commit as well. 😿

@Zheng-Bicheng
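Collaborator Author

One more small question: I don't know why the code formatting changed so much. Bicheng, did you use pre-commit? The code previously committed here should have gone through pre-commit as well. 😿

I did use pre-commit. The code ended up like this after the cpp-lint hook ran.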

@yunyaoXYY
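Collaborator

One more small question: I don't know why the code formatting changed so much. Bicheng, did you use pre-commit? The code previously committed here should have gone through pre-commit as well. 😿

I did use pre-commit. The code ended up like this after the cpp-lint hook ran.

Ok, got it.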

@Zheng-Bicheng
Collaborator Author

@jiangjiajun @yunyaoXYY Is there anything else you would like me to change?

@jiangjiajun
Collaborator

@Zheng-Bicheng Yesterday a colleague modified the DetModel preprocessing, which caused a conflict. It needs to be merged.

@Zheng-Bicheng
Collaborator Author

I have merged the latest changes. Please take another look, @jiangjiajun.

@jiangjiajun jiangjiajun merged commit 8c3ccc2 into PaddlePaddle:develop Feb 27, 2023
@3togo

3togo commented Feb 27, 2023

Could you provide an example on CI-FastDeploy-Android C++ for rknpu2?

@Zheng-Bicheng
Collaborator Author

Zheng-Bicheng commented Feb 27, 2023

Could you provide an example on CI-FastDeploy-Android C++ for rknpu2?

We have only tested rknpu2 on the Linux platform for the time being.

@KongXCai

KongXCai commented Mar 6, 2023 via email


5 participants