Skip to content
Dockerized Tesseract OCR as micro-webservice
Dockerfile PHP HTML
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.gitignore
Dockerfile
LICENSE
README.md
index.html
ocr.php
policy.xml

README.md

Tesseract OCR microservice

This small Docker container allows to run Tesseract OCR on PDF documents via HTTP POST. To do so, a PDF document is submitted as "userfile" form data. The container returns a PDF document with 300 DPI resolution and all recognized text included.

Example

docker build -t ocr .
docker run -p8080:80 ocr
curl -F "userfile=@/tmp/test.pdf" -H "Expect:" -o output.pdf localhost:8080/ocr.php

Limitations

Currently, Tesseract settings like language, etc. are hard-coded. These should be made configurable via the REST interface.

You can’t perform that action at this time.