# OCR Training Pipeline - Quick Reference

This notebook consolidates the complete OCR training workflow from dataset preparation to model testing.

## Pipeline Overview
1. **Dataset Generation** - Create synthetic training data with TRDG
2. **Format Conversion** - Convert dataset to LMDB format for deep-text-recognition-benchmark
3. **Model Training** - Train OCR model using deep-text-recognition-benchmark
4. **Model Testing** - Run inference with demo.py

**Note:** For detailed step-by-step guides, refer to `workspace_step1.ipynb`, `workspace_step2.ipynb`, `workspace_step3.ipynb`, and `workspace_step4.ipynb`.

## Prerequisites: Install deep-text-recognition-benchmark Dependencies

Install the Python packages that the benchmark repository needs for training and inference. `torch` is not listed explicitly because pip pulls it in as a dependency of `torchvision`.

In [1]:
%pip install lmdb pillow torchvision nltk natsort fire opencv-python

Collecting lmdb
  Using cached lmdb-1.7.5-cp312-cp312-macosx_11_0_arm64.whl.metadata (1.4 kB)
Collecting pillow
  Downloading pillow-12.0.0-cp312-cp312-macosx_11_0_arm64.whl.metadata (8.8 kB)
Collecting torchvision
  Downloading torchvision-0.24.0-cp312-cp312-macosx_11_0_arm64.whl.metadata (5.9 kB)
Collecting nltk
  Using cached nltk-3.9.2-py3-none-any.whl.metadata (3.2 kB)
Collecting natsort
  Using cached natsort-8.4.0-py3-none-any.whl.metadata (21 kB)
Collecting fire
  Using cached fire-0.7.1-py3-none-any.whl.metadata (5.8 kB)
Collecting opencv-python
  Using cached opencv_python-4.12.0.88-cp37-abi3-macosx_13_0_arm64.whl.metadata (19 kB)
Collecting numpy (from torchvision)
  Downloading numpy-2.3.4-cp312-cp312-macosx_14_0_arm64.whl.metadata (62 kB)
Collecting torch==2.9.0 (from torchvision)
  Downloading torch-2.9.0-cp312-none-macosx_11_0_arm64.whl.metadata (30 kB)
Collecting filelock (from torch==2.9.0->torchvision)
  Using cached filelock-3.20.0-py3-none-any.whl.metadata (2.1 kB)
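
After the install finishes, a quick import check can confirm that every dependency is visible to the notebook kernel. The sketch below is only an illustration: the helper name `missing_packages` and the package-to-module mapping are assumptions made for this example, not part of the benchmark repository. Note that `pillow` and `opencv-python` are imported as `PIL` and `cv2`.

```python
import importlib.util

# Maps each pip package name to the module name used at import time.
REQUIRED_MODULES = {
    "lmdb": "lmdb",
    "pillow": "PIL",
    "torchvision": "torchvision",
    "nltk": "nltk",
    "natsort": "natsort",
    "fire": "fire",
    "opencv-python": "cv2",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose import module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If the list is not empty, re-run the install cell in the same kernel before continuing.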


---

## Step 1 & 2: Dataset Preparation

*Refer to `workspace_step1.ipynb` for TRDG dataset generation and `workspace_step2.ipynb` for detailed dataset organization.*

---

## Step 3: Convert Dataset to LMDB Format

Convert your dataset (generated by TRDG or manually labeled) to LMDB format for training with deep-text-recognition-benchmark.

**Input Structure:**
```
data/
  ├── image_001.jpg
  ├── image_002.jpg
  └── gt.txt  (one line per image: image_name.jpg<TAB>label_text)
```

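**Output:** An LMDB database in the `result/` directory. Both `data/` and `result/` are relative to `deep-text-recognition-benchmark/`, because the cell changes into that directory first.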
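
Before converting, it can help to confirm that the ground-truth file matches the input structure described above. The following is a minimal sketch, assuming each line holds exactly one tab between the image name and the label, and that image names are relative to the input directory. The function name `check_gt_file` is hypothetical and used only for this example.

```python
from pathlib import Path


def check_gt_file(input_dir, gt_file):
    """Return a list of human-readable problems found in a tab-separated gt file."""
    input_dir = Path(input_dir)
    problems = []
    with open(gt_file, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line:
                problems.append(f"line {line_no}: empty line")
                continue
            parts = line.split("\t")
            if len(parts) != 2:
                problems.append(
                    f"line {line_no}: expected 2 tab-separated fields, got {len(parts)}"
                )
                continue
            image_name, label = parts
            if not (input_dir / image_name).is_file():
                problems.append(f"line {line_no}: image not found: {image_name}")
            if not label:
                problems.append(f"line {line_no}: empty label for {image_name}")
    return problems
```

For the layout above, `check_gt_file("deep-text-recognition-benchmark/data", "deep-text-recognition-benchmark/data/gt.txt")` returns an empty list when every line is well formed and every image exists.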

In [None]:
%%bash

cd deep-text-recognition-benchmark

python create_lmdb_dataset.py --inputPath data --gtFile data/gt.txt --outputPath result/

Written 1000 / 1010
Created dataset with 1010 samples
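
To double-check the sample count reported above, the database can be read back directly. This is a sketch under stated assumptions: it assumes the converter stores the total under a `num-samples` key and each label under 1-based keys of the form `label-000000001`, which is how the repository's conversion script is understood to lay out the data. Verify this against your copy of `create_lmdb_dataset.py`. The helper names `summarize_lmdb` and `open_lmdb` are hypothetical.

```python
def summarize_lmdb(env, preview=3):
    """Return (num_samples, first few labels) from an opened LMDB environment."""
    with env.begin(write=False) as txn:
        raw = txn.get(b"num-samples")
        if raw is None:
            raise KeyError("'num-samples' key not found; is this the right database?")
        num_samples = int(raw.decode("utf-8"))
        labels = []
        # Assumption: sample keys are 1-based and zero-padded to nine digits.
        for index in range(1, min(preview, num_samples) + 1):
            value = txn.get(b"label-%09d" % index)
            labels.append(None if value is None else value.decode("utf-8"))
    return num_samples, labels


def open_lmdb(path):
    """Open an LMDB directory read-only."""
    import lmdb  # imported here so summarize_lmdb itself has no hard dependency

    return lmdb.open(path, readonly=True, lock=False, readahead=False, meminit=False)
```

A call such as `summarize_lmdb(open_lmdb("deep-text-recognition-benchmark/result"))` should then report the same count as the conversion log, along with the first few labels.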



---

## Step 4: Run Inference with Trained Model

Test your trained model or a pre-trained model using `demo.py` from deep-text-recognition-benchmark.

**Model Configuration:**
- **Transformation:** TPS (Thin Plate Spline)
- **FeatureExtraction:** ResNet
- **SequenceModeling:** BiLSTM
- **Prediction:** Attn (Attention mechanism)

**Note:** Set `--saved_model` to the path of your trained model checkpoint. The four architecture flags must match the configuration the checkpoint was trained with. Pass the `--sensitive` flag for case-sensitive recognition.

In [None]:
%%bash

cd deep-text-recognition-benchmark

# Set the variable on the same line as the command so it reaches the Python process.
CUDA_VISIBLE_DEVICES=0 python3 demo.py \
--Transformation TPS \
--FeatureExtraction ResNet \
--SequenceModeling BiLSTM \
--Prediction Attn \
--image_folder demo_image/ \
--saved_model models/TPS-ResNet-BiLSTM-Attn-case-sensitive.pth \
--sensitive

model input parameters 32 100 20 1 512 256 96 25 TPS ResNet BiLSTM Attn
loading pretrained model from models/TPS-ResNet-BiLSTM-Attn-case-sensitive.pth




--------------------------------------------------------------------------------
image_path               	predicted_labels         	confidence score
--------------------------------------------------------------------------------
demo_image/demo_1.png    	Available                	0.9996
demo_image/demo_2.jpg    	SHARESHACK               	0.6425
demo_image/demo_3.png    	Londen                   	0.6874
demo_image/demo_4.png    	Greenstead               	0.9997
demo_image/demo_5.png    	TOAST                    	0.9879
demo_image/demo_6.png    	MERRY                    	0.9982
demo_image/demo_7.png    	underground              	0.9998
demo_image/demo_8.jpg    	RONALDO                  	0.9208
demo_image/demo_9.jpg    	BALLY                    	0.4128
demo_image/demo_10.jpg   	UNIVERSITY               	0.9356
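
The confidence column makes it easy to pick out predictions that deserve a manual look. The sketch below parses tab-separated rows like those above and keeps the ones under a cutoff. The function names and the 0.7 threshold are assumptions chosen for illustration, not values recommended by the benchmark.

```python
def parse_demo_output(text):
    """Parse tab-separated demo output into (image_path, label, confidence) tuples."""
    rows = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("\t")]
        if len(parts) != 3:
            continue  # separator lines, log lines, blank lines
        path, label, score = parts
        try:
            rows.append((path, label, float(score)))
        except ValueError:
            continue  # header row: the last column is not a number
    return rows


def low_confidence(rows, threshold=0.7):
    """Return the rows whose confidence is below the (arbitrary) threshold."""
    return [row for row in rows if row[2] < threshold]


if __name__ == "__main__":
    sample = (
        "image_path               \tpredicted_labels         \tconfidence score\n"
        "demo_image/demo_1.png    \tAvailable                \t0.9996\n"
        "demo_image/demo_3.png    \tLonden                   \t0.6874\n"
        "demo_image/demo_9.jpg    \tBALLY                    \t0.4128\n"
    )
    for path, label, score in low_confidence(parse_demo_output(sample)):
        print(f"{path}: {label} ({score:.4f})")
```

With the three sample rows, only `demo_3.png` and `demo_9.jpg` fall below the cutoff.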
