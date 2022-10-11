
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)

PaddlePaddle/PaddleOCR

Introduction

PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and put them into practice.

📣 Recent updates

  • 🔥2022.8.24 Release PaddleOCR release/2.6
    • Release PP-Structurev2, with functions and performance fully upgraded, adapted to Chinese scenes, and with new support for Layout Recovery and a one-line command to convert PDF to Word;
    • Layout Analysis optimization: model storage is reduced by 95%, speed is increased by 11 times, and the average CPU time cost is only 41 ms;
    • Table Recognition optimization: three optimization strategies are designed, and model accuracy is improved by 6% at comparable time cost;
    • Key Information Extraction optimization: a visual-independent model structure is designed, the accuracy of semantic entity recognition is increased by 2.8%, and the accuracy of relation extraction is increased by 9.1%.
  • 🔥2022.8 Release OCR scene application collection
    • Release 9 vertical models, such as digital tube, LCD screen, license plate, handwriting recognition, and high-precision SVTR models, covering the main OCR vertical applications in the general, manufacturing, finance, and transportation industries.
  • 2022.8 Add implementations of 8 cutting-edge algorithms
  • 2022.5.9 Release PaddleOCR release/2.5
    • Release PP-OCRv3: at comparable speed, performance on Chinese scenes is improved by 5% over PP-OCRv2, performance on English scenes is improved by 11%, and the average recognition accuracy of the 80 multilingual models is improved by more than 5%.
    • Release PPOCRLabelv2: add annotation functions for the table recognition task, the key information extraction task, and irregular text images.
    • Release the interactive e-book "Dive into OCR", which covers the cutting-edge theory and code practice of OCR full-stack technology.
  • more

🌟 Features

PaddleOCR supports a variety of cutting-edge OCR algorithms. On this basis, it has developed the industrial featured models/solutions PP-OCR and PP-Structure, and it covers the whole process of data production, model training, compression, inference, and deployment.

We recommend starting with the "quick experience" section of the documentation tutorial.

Quick Experience
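
As a rough illustration of what a first experiment might look like, here is a minimal sketch that assumes the `paddleocr` Python package is installed and exposes the `PaddleOCR` class with an `ocr()` method. The helper names `extract_text` and `run_ocr`, the image path, and the confidence threshold are hypothetical and only for illustration. The sketch also assumes each recognized line has the form `[box, (text, score)]`; depending on the installed version, `ocr()` may return a flat list of such entries or one such list per image, so check the result shape before passing it to the helper.

```python
from typing import Any, List, Tuple


def extract_text(lines: List[Any], min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Keep (text, score) pairs whose confidence is at least min_score.

    Assumption: every entry in `lines` looks like [box, (text, score)].
    """
    kept = []
    for _box, (text, score) in lines:
        if score >= min_score:
            kept.append((text, float(score)))
    return kept


def run_ocr(image_path: str, lang: str = "en") -> List[Any]:
    """Run detection, direction classification, and recognition on one image."""
    # Imported lazily so that extract_text works without the package installed.
    from paddleocr import PaddleOCR

    # Models are downloaded on first use.
    ocr = PaddleOCR(use_angle_cls=True, lang=lang)
    return ocr.ocr(image_path, cls=True)
```

A call such as `extract_text(run_ocr("sample.jpg"))` would then return only the lines recognized with reasonable confidence, provided the result has the flat shape described above.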

📚 E-book: Dive Into OCR

👫 Community

  • For international developers, we regard PaddleOCR Discussions as our international community platform. All ideas and questions can be discussed here in English.

  • For Chinese developers, scan the QR code below with WeChat to join the official technical discussion group. For richer community content, please refer to the Chinese README (中文README). We look forward to your participation.

🛠️ PP-OCR Series Model List (Updated on September 8th)

| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| --- | --- | --- | --- | --- | --- |
| Chinese and English ultra-lightweight PP-OCRv3 model (16.2M) | ch_PP-OCRv3_xx | Mobile & Server | inference model / trained model | inference model / trained model | inference model / trained model |
| English ultra-lightweight PP-OCRv3 model (13.4M) | en_PP-OCRv3_xx | Mobile & Server | inference model / trained model | inference model / trained model | inference model / trained model |
| Chinese and English ultra-lightweight PP-OCRv2 model (11.6M) | ch_PP-OCRv2_xx | Mobile & Server | inference model / trained model | inference model / trained model | inference model / trained model |
| Chinese and English ultra-lightweight PP-OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & Server | inference model / trained model | inference model / trained model | inference model / trained model |
| Chinese and English general PP-OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server | inference model / trained model | inference model / trained model | inference model / trained model |

📖 Tutorials

👀 Visualization

PP-OCRv3 Chinese model
PP-OCRv3 English model
PP-OCRv3 Multilingual model
PP-Structurev2
  • layout analysis + table recognition
  • SER (Semantic entity recognition)
  • RE (Relation Extraction)

🇺🇳 Guideline for New Language Requests

To request support for a new language, submit a PR containing the following file:

  1. In the folder ppocr/utils/dict, submit the dictionary text file, named {language}_dict.txt, that contains a list of all characters. See the other files in that folder for format examples.

If your language has unique elements, please let us know in advance by any means, such as useful links, Wikipedia pages, and so on.

For more details, please refer to the Multilingual OCR Development Plan.
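
One way to produce such a character list from sample text is sketched below. It assumes a one-character-per-line layout, which should be checked against the existing files in ppocr/utils/dict before submitting. The function names `collect_characters` and `write_dict`, and the choice to skip whitespace characters, are illustrative assumptions and not part of PaddleOCR.

```python
from pathlib import Path
from typing import Iterable, List


def collect_characters(lines: Iterable[str]) -> List[str]:
    """Return the unique non-whitespace characters in first-seen order."""
    seen = {}  # dicts preserve insertion order, so this acts as an ordered set
    for line in lines:
        for ch in line:
            if not ch.isspace() and ch not in seen:
                seen[ch] = None
    return list(seen)


def write_dict(chars: Iterable[str], language: str, out_dir: str = ".") -> Path:
    """Write one character per line to {language}_dict.txt (assumed layout)."""
    path = Path(out_dir) / f"{language}_dict.txt"
    path.write_text("\n".join(chars) + "\n", encoding="utf-8")
    return path
```

For example, `write_dict(collect_characters(open("corpus.txt", encoding="utf-8")), "xx")` would create `xx_dict.txt` in the current directory, where `corpus.txt` and the language code `xx` are placeholders.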

📄 License

This project is released under the Apache 2.0 license.
