## Source code

* [GitHub](https://github.com/tesseract-ocr/tesseract)
* [Compiling](https://github.com/tesseract-ocr/tesseract/wiki/Compiling)
* Training
    * [Training (Tesseract 3.0x)](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract)
    * [Training (Tesseract 4.00)](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00)
* [Command-line usage](https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage)



## Using the C++ API

```C++
#include <cstdio>   // fprintf, printf
#include <cstdlib>  // exit
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
    char *outText;

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    // Open input image with leptonica library
    Pix *image = pixRead("/usr/src/tesseract/testing/phototest.tif");
    if (image == NULL) {
        fprintf(stderr, "Could not read input image.\n");
        exit(1);
    }
    api->SetImage(image);
    // Get OCR result
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used object and release memory
    api->End();
    delete api;
    delete [] outText;
    pixDestroy(&image);

    return 0;
}
```

## Python bindings for Tesseract

* [tesserocr](https://github.com/sirfz/tesserocr)
* [pytesseract](https://github.com/madmaze/pytesseract)

### tesserocr

* Make sure libtesseract (>=3.04) and libleptonica (>=1.71) are installed.
* Install the required system packages (Debian/Ubuntu): `apt-get install tesseract-ocr libtesseract-dev libleptonica-dev pkg-config`

* Install the package (Linux)
    * `pip install tesserocr`

* Usage

```python
# Initialize one tesseract API instance and reuse it to score multiple images:
from tesserocr import PyTessBaseAPI

images = ['sample1.jpg', 'sample2.jpg', 'sample3.jpg']

with PyTessBaseAPI() as api:
    for img in images:
        api.SetImageFile(img)
        print(api.GetUTF8Text())
        print(api.AllWordConfidences())
# api is automatically finalized when used in a with-statement (context manager).
# otherwise api.End() should be explicitly called when it's no longer needed.
```

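Output: for all three images the recognized text came out as unreadable strings of symbols (presumably the images contain text the installed `eng` model cannot read; the `get_languages()` call further down lists only `eng`, `osd` and `equ`). The word confidences printed by `AllWordConfidences()` were:

* `sample1.jpg`: `[47, 65, 56, 69, 73, 35, 53, 95, 93, 47, 95, 95, 95, 95, 95, 37, 90, 65, 76]`
* `sample2.jpg`: `[44, 49, 53, 46, 95, 95]`
* `sample3.jpg`: `[56, 64, 53, 49, 50, 66, 49, 76, 95]`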
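`AllWordConfidences()` returns one integer (0 to 100) per recognized word, so summarizing that list is a quick way to judge whether a result is usable at all. The sketch below is plain Python and does not call tesserocr itself; the helper names `summarize_confidences` and `confident_words`, and the cut-off of 60, are illustrative assumptions. `confident_words` assumes input shaped like the `(word, confidence)` tuples returned by tesserocr's `MapWordConfidences()`. Fed the second confidence list above (`sample2.jpg`), it reports a mean of about 63.7 with 4 of 6 words below 60.

```python
from statistics import mean


def summarize_confidences(confidences, threshold=60):
    """Summarize per-word confidences (e.g. from api.AllWordConfidences()).

    Returns (mean confidence, number of words below `threshold`, word count).
    The default threshold of 60 is an arbitrary, hypothetical cut-off.
    """
    if not confidences:
        # No words were recognized at all.
        return 0.0, 0, 0
    low = sum(1 for c in confidences if c < threshold)
    return mean(confidences), low, len(confidences)


def confident_words(word_confidences, threshold=60):
    """Keep only the words whose confidence is at or above `threshold`.

    `word_confidences` is assumed to be a list of (word, confidence) tuples,
    like the one returned by api.MapWordConfidences().
    """
    return [word for word, conf in word_confidences if conf >= threshold]


if __name__ == "__main__":
    # Confidences printed for sample2.jpg in the output above.
    avg, low, total = summarize_confidences([44, 49, 53, 46, 95, 95])
    print(f"mean={avg:.1f}, {low}/{total} words below 60")
```

A mostly-low list like this one is a hint to check the language data or image quality before trusting the text.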


```python
import tesserocr
from PIL import Image

print(tesserocr.tesseract_version())  # print tesseract-ocr version
print(tesserocr.get_languages())  # prints tessdata path and list of available languages

image = Image.open('sample3.jpg')
print(tesserocr.image_to_text(image))  # print ocr text from image
# or
print(tesserocr.file_to_text('sample3.jpg'))
```

Output:

```
tesseract 3.04.01
 leptonica-1.73
  libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.4.4 : libopenjp2 2.1.2

('/usr/share/tesseract-ocr/tessdata/', ['eng', 'osd', 'equ'])
```
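The install requirement (libtesseract >= 3.04, libleptonica >= 1.71) can be checked against the string returned by `tesserocr.tesseract_version()`. The parser below is a sketch that assumes the format shown in the output above (`tesseract X.Y.Z` on the first line, `leptonica-X.Y` on the second); the helper names `parse_versions` and `meets_requirements` are hypothetical, and the string format may differ between builds.

```python
import re

# Minimum versions stated in the tesserocr install notes.
MIN_TESSERACT = (3, 4)
MIN_LEPTONICA = (1, 71)


def parse_versions(version_text):
    """Extract (tesseract, leptonica) version tuples from tesseract_version() output.

    Assumes lines like 'tesseract 3.04.01' and 'leptonica-1.73'; a component
    that cannot be found is returned as None.
    """
    def find(pattern):
        match = re.search(pattern, version_text)
        if match is None:
            return None
        # '3.04.01' -> (3, 4, 1), so tuples compare numerically.
        return tuple(int(part) for part in match.group(1).split("."))

    return find(r"tesseract (\d+(?:\.\d+)*)"), find(r"leptonica-(\d+(?:\.\d+)*)")


def meets_requirements(version_text):
    """True if both parsed versions are at least the stated minimums."""
    tess, lept = parse_versions(version_text)
    return (tess is not None and lept is not None
            and tess >= MIN_TESSERACT and lept >= MIN_LEPTONICA)


if __name__ == "__main__":
    sample = "tesseract 3.04.01\n leptonica-1.73\n  libgif 5.1.2 : libjpeg 8d"
    print(parse_versions(sample))      # ((3, 4, 1), (1, 73))
    print(meets_requirements(sample))  # True
```

In practice the argument would be `tesserocr.tesseract_version()` itself; the hard-coded sample just mirrors the output above.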
Both recognition calls (`image_to_text` on the PIL image, `file_to_text` on the file path) again returned unreadable text for `sample3.jpg`, and the two results differed slightly from each other.
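The `get_languages()` output above likely explains the garbled text: only the `eng`, `osd` and `equ` models are installed, so text in any other language is forced through the English model. One hedged guard is to check the requested languages against that list before creating the API object. The sketch below assumes tesseract's `+`-joined language string convention (for example `chi_sim+eng`); the helper name `build_lang_arg` is hypothetical, and `chi_sim` is only an example code, not something these notes install.

```python
import warnings


def build_lang_arg(available, wanted):
    """Join the wanted language codes that are installed into a tesseract
    language string such as 'chi_sim+eng'.

    `available` is the language list from tesserocr.get_languages()[1];
    `wanted` is an ordered list of preferred codes. Raises ValueError if none
    of them is installed, instead of silently falling back to English.
    """
    chosen = [lang for lang in wanted if lang in available]
    missing = [lang for lang in wanted if lang not in available]
    if not chosen:
        raise ValueError(f"none of {wanted} is installed; missing: {missing}")
    if missing:
        # Warn so that garbled output is not mistaken for a recognition bug.
        warnings.warn(f"language data not installed: {missing}")
    return "+".join(chosen)


if __name__ == "__main__":
    # Language list reported by get_languages() in the output above.
    installed = ['eng', 'osd', 'equ']
    print(build_lang_arg(installed, ['chi_sim', 'eng']))  # warns about chi_sim, prints eng
    # With tesserocr installed, the result would be passed on, e.g.:
    # with PyTessBaseAPI(lang=build_lang_arg(tesserocr.get_languages()[1], ['chi_sim', 'eng'])) as api: ...
```

Raising instead of falling back is a design choice: it surfaces a missing traineddata file early rather than producing output like the garbled results above.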

 




### pytesseract


## Baidu OCR API
* [Text recognition: Python SDK](https://cloud.baidu.com/doc/OCR/OCR-Python-SDK.html)
* [Text recognition: Python SDK API description](https://cloud.baidu.com/doc/OCR/OCR-Python-SDK/24.5C.E6.8E.A5.E5.8F.A3.E8.AF.B4.E6.98.8E.html)