This repository now includes an end-to-end handwriting processing pipeline that:
- Takes an input image of handwritten text.
- Runs OCR to extract letter/word sequences.
- Applies word prediction/correction for noisy OCR output.
- Exports the predicted text to both `.txt` and `.docx`.
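The word prediction/correction step can be sketched with a stdlib fuzzy match against a vocabulary. This is an illustration only: the vocabulary, helper name, and similarity cutoff below are assumptions, and the repository's actual corrector may work differently.

```python
import difflib

# Hypothetical vocabulary; a real pipeline would load a much larger word list.
VOCAB = ["handwriting", "pipeline", "predicted", "words", "image"]

def correct_word(token: str, cutoff: float = 0.6) -> str:
    """Replace a noisy OCR token with its closest vocabulary word, if any."""
    matches = difflib.get_close_matches(token.lower(), VOCAB, n=1, cutoff=cutoff)
    return matches[0] if matches else token

# OCR misreads such as "pipelne" and "w0rds" snap back to vocabulary words.
print([correct_word(t) for t in ["pipelne", "w0rds"]])  # ['pipeline', 'words']
```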
Create a virtual environment and install the dependencies (or invoke the `.venv/bin/...` commands directly without activating it):

```bash
sudo apt-get update
sudo apt-get install -y python3.12-venv tesseract-ocr
python3 -m venv .venv
.venv/bin/python -m pip install --upgrade pip
.venv/bin/pip install -r requirements.txt
```

Run the pipeline:
```bash
.venv/bin/python handwriting_pipeline.py \
  --image path/to/handwritten_image.png \
  --output-stem outputs/predicted_words
```

Outputs:
- `outputs/predicted_words.txt`
- `outputs/predicted_words.docx`
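The `.txt` side of the export step can be sketched as follows. The helper name is hypothetical; the real pipeline also writes `<output_stem>.docx` (typically via the python-docx package), which is omitted here to keep the sketch stdlib-only.

```python
import tempfile
from pathlib import Path

def write_txt_output(text: str, output_stem: str) -> Path:
    """Write predicted text to <output_stem>.txt, creating parent dirs.

    Hypothetical helper; the pipeline additionally emits <output_stem>.docx.
    """
    path = Path(output_stem).with_suffix(".txt")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path

# Demo against a temporary directory rather than the real outputs/ folder.
stem = Path(tempfile.mkdtemp()) / "predicted_words"
print(write_txt_output("predicted words", str(stem)).name)  # predicted_words.txt
```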
Train the CNN and save a checkpoint:

```bash
EPOCHS=1 MAX_TRAIN_SAMPLES=512 MAX_VAL_SAMPLES=256 \
  .venv/bin/python Letter_Detection.py
```

Default checkpoint path: `models/character_cnn.pt`
Run the pipeline with CNN-assisted refinement:

```bash
.venv/bin/python handwriting_pipeline.py \
  --image path/to/handwritten_image.png \
  --output-stem outputs/predicted_words \
  --cnn-checkpoint models/character_cnn.pt
```

Disable the CNN and use OCR-only mode:

```bash
.venv/bin/python handwriting_pipeline.py \
  --image path/to/handwritten_image.png \
  --output-stem outputs/predicted_words \
  --disable-cnn
```

You can generate a sample handwritten-style image and run the full pipeline:
```bash
.venv/bin/python demo_pipeline.py
.venv/bin/python handwriting_pipeline.py \
  --image sample_handwriting.png \
  --output-stem demo_output/predicted_words
```

This creates:
- `sample_handwriting.png`
- `demo_output/predicted_words.txt`
- `demo_output/predicted_words.docx`
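The CLI flags used in the commands above can be modeled with `argparse`. The flag names come from the commands shown; the defaults and help strings are assumptions, not necessarily the repository's exact interface:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI sketch mirroring the flags used above; defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Handwriting pipeline")
    parser.add_argument("--image", required=True, help="Input handwriting image")
    parser.add_argument("--output-stem", required=True,
                        help="Output path stem for the .txt/.docx files")
    parser.add_argument("--cnn-checkpoint", default=None,
                        help="Optional CNN checkpoint for refinement")
    parser.add_argument("--disable-cnn", action="store_true",
                        help="Skip CNN refinement and use OCR only")
    return parser

args = build_parser().parse_args(
    ["--image", "a.png", "--output-stem", "out/words", "--disable-cnn"]
)
print(args.disable_cnn)  # True
```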
Run a quick evaluation against the built-in demo reference text:

```bash
.venv/bin/python evaluate_pipeline.py
```

This prints the expected text, the predicted text, and the word-level accuracy.
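One simple way to compute word-level accuracy is a position-wise comparison, sketched below. The evaluation script's exact metric may differ (an alignment-based metric, e.g. via `difflib`, is more forgiving of insertions and deletions):

```python
def word_accuracy(reference: str, predicted: str) -> float:
    """Fraction of reference words matched at the same position."""
    ref_words = reference.split()
    pred_words = predicted.split()
    if not ref_words:
        return 1.0 if not pred_words else 0.0
    hits = sum(r == p for r, p in zip(ref_words, pred_words))
    return hits / len(ref_words)

print(word_accuracy("the quick brown fox", "the quick brown fax"))  # 0.75
```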