Image recognition: a simple way to recognize Chinese text in an image (Tesseract) #71

Qingquan-Li · 2017-10-28T09:05:38Z

Goal: recognize the Chinese text in an image
Implementation: Python, with the help of the open-source library Tesseract

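Tesseract is an open-source optical character recognition (OCR) engine, available under the Apache 2.0 license. It can be used directly from the command line or, for programmers, through an API to extract typed, handwritten, or printed text from images. It supports a wide range of languages.
References: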
https://github.com/tesseract-ocr/tesseract/wiki
https://en.wikipedia.org/wiki/Tesseract_(software)

Development environment:

macOS
Python 3.6
brew

1. Install tesseract

brew install tesseract
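
Before moving on, it can help to confirm that the `tesseract` binary is actually reachable from Python. The following is a minimal sketch using only the standard library; the helper names `find_tesseract` and `tesseract_version` are made up for illustration, and it assumes the installed binary accepts `--version`.

```python
import shutil
import subprocess


def find_tesseract(binary="tesseract"):
    """Return the full path of the binary if it is on PATH, else None."""
    return shutil.which(binary)


def tesseract_version(binary="tesseract"):
    """Return the first line of `<binary> --version`, or None if not installed."""
    path = find_tesseract(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some releases print the version to stderr
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract not found on PATH")
```

If this prints "tesseract not found on PATH", the Homebrew install probably did not complete or the shell's PATH does not include Homebrew's bin directory.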

2. Install the corresponding Python package

pip3 install pytesseract

3. Download the Chinese trained data

tesseract supports many languages: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages

Download the Simplified Chinese data file chi_sim.traineddata from https://github.com/tesseract-ocr/tessdata into the /usr/local/Cellar/tesseract/3.05.01/share/tessdata directory.
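
To check that the data file landed in the right place, one option is to ask tesseract which languages it can see. The sketch below runs `tesseract --list-langs` and looks for `chi_sim` in the result; the helper names `parse_langs` and `installed_langs` are hypothetical, and the parsing assumes the usual output shape of a "List of available languages" header followed by one language code per line.

```python
import subprocess


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header
        if not line or line.lower().startswith("list of"):
            continue
        langs.append(line)
    return langs


def installed_langs(binary="tesseract"):
    """Run `<binary> --list-langs`; return [] if the binary is missing."""
    try:
        result = subprocess.run(
            [binary, "--list-langs"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # the list may be written to stderr
            universal_newlines=True,
        )
    except FileNotFoundError:
        return []
    return parse_langs(result.stdout)


if __name__ == "__main__":
    print("chi_sim installed:", "chi_sim" in installed_langs())
```

If `chi_sim` is missing from the list, the .traineddata file is most likely in a different tessdata directory than the one the installed tesseract reads.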

4. Show the code

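#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Pillow provides the PIL namespace; the bare `import Image`
# fallback only matters for the legacy PIL package.
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

# Open the image and run OCR with the Simplified Chinese model (chi_sim)
image = Image.open('/Users/fatli/Desktop/dufu.png')
text = pytesseract.image_to_string(image, lang='chi_sim')
print(text)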

Appendix: recognizing English text
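
For English, the same `pytesseract.image_to_string` call can be used with `lang='eng'`. The sketch below is one possible way to write it, not the original author's code: the helpers `build_lang` and `recognize` are hypothetical names. Tesseract also accepts several language codes joined with `+` (for example `chi_sim+eng`) for images that mix languages, which is what `build_lang` produces.

```python
def build_lang(*codes):
    """Join Tesseract language codes with '+', e.g. 'chi_sim+eng'."""
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def recognize(path, *codes):
    """Run OCR on the image at `path` using the given language codes."""
    # Imported here so build_lang can be used without the OCR dependencies
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=build_lang(*codes))
```

Usage would look like `print(recognize('/path/to/english.png', 'eng'))`, where the path is a placeholder for your own image; pass `'chi_sim', 'eng'` instead for mixed Chinese and English text.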
