How to make every character appear with similar frequency in the training images and labels, especially rare characters #9830

Closed
nissansz opened this issue Apr 26, 2023 · 14 comments
Assignees
Labels
help wanted this issue needs help status/close training this is a training related issue triaged this issue has been looked, and triaged.

Comments

@nissansz

Please provide the following information to quickly locate the problem

  • System Environment: Windows 10
  • Version: Paddle: PaddleOCR: 2.5 Related components:
  • Command Code:
  • Complete Error Message:

In the training images and labels, how can I make every character appear with similar frequency, especially rare characters?

@ToddBear ToddBear added the good first issue Good for newcomers label Jun 30, 2023
@livingbody
Contributor

I see, you are talking about the recognition model. You can balance the data.

@nissansz
Author

How do I implement data balancing?

@shiyutang
Collaborator

You can resample the data, for example by augmenting the images that contain rare characters with methods such as copy-paste.
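The resampling part of this suggestion could be sketched roughly as follows. This is only an illustration, not code from PaddleOCR: the function names, the `target` and `max_repeat` parameters, and the rule "repeat a sample according to its rarest character" are all assumptions, and the label lines are assumed to use the tab-separated `image_path\tlabel` layout. It duplicates existing samples only; image-level copy-paste augmentation would be a separate step.

```python
import math
import random
from collections import Counter


def char_frequencies(labels):
    """Count how often each character occurs across all label strings."""
    counts = Counter()
    for text in labels:
        counts.update(text)
    return counts


def repeat_factors(labels, target=None, max_repeat=20):
    """Return, for each sample, how many copies to keep after resampling.

    A sample is repeated according to its rarest character: if that
    character occurs c times overall, the sample is kept ceil(target / c)
    times, capped at max_repeat. target defaults to the mean character count.
    """
    counts = char_frequencies(labels)
    if target is None:
        target = sum(counts.values()) / max(1, len(counts))
    factors = []
    for text in labels:
        if not text:
            factors.append(1)  # empty label: keep once, nothing to balance
            continue
        rarest = min(counts[ch] for ch in text)
        factors.append(max(1, min(max_repeat, math.ceil(target / rarest))))
    return factors


def resample(lines, target=None, max_repeat=20, seed=0):
    """Oversample label-file lines of the (assumed) form 'image_path\\tlabel'."""
    labels = [line.split("\t", 1)[1] for line in lines]
    out = []
    for line, k in zip(lines, repeat_factors(labels, target, max_repeat)):
        out.extend([line] * k)
    random.Random(seed).shuffle(out)  # avoid long runs of identical samples
    return out
```

Repeating identical images many times can encourage overfitting on those images, so in practice the duplicated samples would usually be combined with some augmentation or with newly synthesized images.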

@nissansz
Author

nissansz commented Jul 5, 2023

image

How can I simulate this effect? Is there any Python code or method for it?

@shiyutang
Collaborator

@nissansz
Author

nissansz commented Jul 5, 2023

StyleText is not very good: it supports only some languages, and its results are not close to this effect either.

@shiyutang
Collaborator

shiyutang commented Jul 7, 2023

You can also try TextRender, which should give better results than StyleText. Summary of data synthesis tools: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/data_synthesis.md
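Whichever synthesis tool is used, the text fed into it largely decides how often each character appears. One possible way to get near-equal frequencies is to generate the corpus from the character dictionary itself, as in the sketch below. The function name and its parameters are hypothetical and not part of any of the tools mentioned above.

```python
import random
from collections import Counter


def balanced_corpus(charset, line_len=10, repeats=5, seed=0):
    """Build text lines in which every character occurs exactly `repeats` times.

    The pool holds `repeats` copies of each distinct character, is shuffled,
    and is then cut into lines of `line_len` characters (the last line may
    be shorter).
    """
    chars = list(dict.fromkeys(charset))  # drop duplicates, keep order
    pool = [ch for ch in chars for _ in range(repeats)]
    random.Random(seed).shuffle(pool)
    return ["".join(pool[i:i + line_len]) for i in range(0, len(pool), line_len)]


if __name__ == "__main__":
    lines = balanced_corpus("的是龘鑫", line_len=3, repeats=3)
    print(lines)
    print(Counter("".join(lines)))  # every character has the same count
```

Lines built this way are random strings with no language context, so it may be worth mixing them with real sentences rather than training on them alone.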

@nissansz
Author

nissansz commented Jul 7, 2023

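The default learning rate for resnet34 is learning_rate: 0.0005.
Once training reaches a certain accuracy it stops improving. Can I change the lr to keep improving the accuracy? What value should I change it to?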

@shiyutang
Collaborator

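That depends on your settings. If you have increased the batch size (bs), you can use a larger learning rate. You can also try a ladder of learning rates, for example 0.0005, 0.0001, 0.001, 0.002, and 0.00005, find one that works, and then fine-tune around it.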
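The two ideas in this reply can be written down as a small helper. The linear scaling rule used here (learning rate proportional to batch size) is a common heuristic and an assumption on my part; the reply only says that a larger batch size allows a larger learning rate. The batch sizes and the `factors` tuple are hypothetical values chosen for illustration.

```python
def scaled_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: scale the learning rate by the batch-size ratio."""
    return base_lr * new_batch_size / base_batch_size


def candidates_around(lr, factors=(0.5, 0.75, 1.0, 1.5, 2.0)):
    """Learning rates near `lr` to try once a coarse search has picked it."""
    return [lr * f for f in factors]


# Coarse candidates listed in the reply, sorted from smallest to largest.
coarse = sorted([0.0005, 0.0001, 0.001, 0.002, 0.00005])

if __name__ == "__main__":
    # Hypothetical example: default lr 0.0005 at batch size 256, now using 512.
    print(scaled_lr(0.0005, 256, 512))
    print(coarse)
    print(candidates_around(0.001))
```

Each candidate would be tried in a short training run, and the value with the best validation accuracy becomes the centre of the finer search.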

@nissansz
Author

nissansz commented Jul 7, 2023

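[StyleText](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6/StyleText)
How can it be made to support fonts for Traditional Chinese, Japanese, and so on?

The default learning rate for resnet34 is fixed, learning_rate: 0.0005. Can it be changed during training?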

@nissansz
Author

nissansz commented Jul 7, 2023

How do I change the yml to use resnet18 or another backbone,

@nissansz
Author

nissansz commented Jul 7, 2023

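How do I change the yml to use resnet18 or another backbone? After switching to one of these backbones, can CRNN still be trained? Does the source code need to be changed?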

@shiyutang
Collaborator

@jzhang533 jzhang533 added triaged this issue has been looked, and triaged. needs investigation this issue needs investigation to either narrow down, or clarify training this is a training related issue help wanted this issue needs help and removed good first issue Good for newcomers needs investigation this issue needs investigation to either narrow down, or clarify labels Apr 10, 2024
@UserWangZz
Collaborator

This issue has not been updated for a long time, so we are closing it for now. It can be reopened if needed.
