SceneTextOCR

Demo

Description

This is a modified implementation of the CRNN architecture. It transcribes cropped images of a single word into text. It is built to handle all kinds of text (cursive, some handwriting, printed etc.) It uses an 18 layer ResNet instead of the VGG stack in the original model, and gets about 97% transcription accuracy. It is trained on the mjsynth dataset. It is trained to map uppercase and lowercase letters to their uppercase version. It does not support punctuation or numbers yet. The maximum word length supported is 16 characters, although I plan to make this size arbitrary soon.

Building

Manually (preferred if doing development)

On a machine with an nvidia GPU:

sudo docker run --rm --gpus all -i -t docker.pkg.github.com/benjamintrapani/scenetextocr/cntk_nvidia_gpu_with_mods:1.0 bash
apt-get install git-lfs
git lfs install
git clone https://github.com/BenjaminTrapani/SceneTextOCR.git
cd SceneTextOCR/TextTranscriptionCode/
mkdir build
cd build/
cmake -DCNTK_DIR=/home/CNTK/ -DBOOST_ROOT=/usr/local/boost-1.60.0/ -DBoost_NO_SYSTEM_PATHS=TRUE -B. -S../
./SceneTextOCREvaluator ../TextTranscriptionWindows/Content/ResNetCRNNNewCNTK32SeqLenOut

The evaluator waits for a path to an image, and evaluates the decoded image in the trained model. For example, pasting /home/SceneTextOCR/TextTranscriptionCode/WebTester/water.jpg will print out the decoded text WATER with the separator character '_' included.

Pre-build (easier to test out the model)

sudo docker run --rm --gpus all -i -t docker.pkg.github.com/benjamintrapani/scenetextocr/scene_text_ocr_built:1.0 bash
cd /home/SceneTextOCR/TextTranscriptionCode/build
Step 9 and below match the manual steps

Training your own model

Download the mjsynth data set: wget https://thor.robots.ox.ac.uk/~vgg/data/text/mjsynth.tar.gz tar -xf mjsynth.tar.gz
Generate mapping files to be decoded by CNTK during training: ./MapFileMaker ./mnt/ image_map_train.txt text_map_train.txt image_map_test.txt text_map_test.txt 32 6000000 9000000

Replace ./mnt/ with the path to the root of the extracted mjsynth data. 32 is the maximum sequence length. 6000000 and 9000000 represent the number of training images and the total number of images respectively.

Run the trainer: ./SceneTextOCRTrainer MapFileMaker/image_map_train.txt MapFileMaker/text_map_train.txt MapFileMaker/image_map_test.txt MapFileMaker/text_map_test.txt /home/SceneTextOCR/TextTranscriptionCode/build/my_new_model

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
ExternalLibs		ExternalLibs
TextTranscriptionCode		TextTranscriptionCode
vue-app/scene-text-ocr-app		vue-app/scene-text-ocr-app
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitmodules		.gitmodules
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SceneTextOCR

Demo

Description

Building

Manually (preferred if doing development)

Pre-build (easier to test out the model)

Training your own model

About

Releases

Packages

Languages

BenjaminTrapani/SceneTextOCR

Folders and files

Latest commit

History

Repository files navigation

SceneTextOCR

Demo

Description

Building

Manually (preferred if doing development)

Pre-build (easier to test out the model)

Training your own model

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages