# Fine-Tuning EasyOCR for Custom Text Recognition

### Overview
This notebook demonstrates how to fine-tune EasyOCR's text recognition model on custom datasets. EasyOCR is a powerful OCR library, but sometimes the pre-trained models don't perform well on domain-specific text (technical jargon, special formatting, unique fonts, etc.). This guide shows you how to train a custom model to improve accuracy on your specific use case.

### What You'll Learn
- Preparing custom training data in LMDB format
- Setting up the Deep Text Recognition Benchmark environment
- Training a custom recognition model
- Converting the trained model for use with EasyOCR
- Testing your fine-tuned model

### Prerequisites
- Basic understanding of Python and machine learning
- Training images with corresponding text labels
- A `gt.txt` file containing image-label pairs (format: `image_filename.jpg<TAB>label_text`)

### Training Data Format
Your training data should be organized as:
```
train_data/
├── image1.jpg
├── image2.jpg
├── ...
└── gt.txt
```

The `gt.txt` file should contain one tab-separated filename and label per line:
```
image1.jpg	sample text one
image2.jpg	another sample
```
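
Before converting the data, it can help to check that every entry follows this format. The following is a minimal sketch, not part of EasyOCR or the benchmark; `check_gt_lines` and its problem messages are hypothetical names chosen for illustration.

```python
import os


def check_gt_lines(lines, image_dir=None):
    """Return a list of (line_number, problem) for malformed gt.txt entries."""
    problems = []
    for number, line in enumerate(lines, start=1):
        entry = line.rstrip("\n")
        if not entry.strip():
            continue  # blank lines are ignored
        if "\t" not in entry:
            problems.append((number, "missing tab separator"))
            continue
        filename, label = entry.split("\t", 1)
        if not label.strip():
            problems.append((number, "empty label"))
        elif image_dir is not None and not os.path.isfile(
            os.path.join(image_dir, filename)
        ):
            # Only checked when a directory is given
            problems.append((number, "image not found: " + filename))
    return problems


sample = [
    "image1.jpg\tsample text one\n",
    "image2.jpg another sample\n",  # space instead of tab
    "image3.jpg\t\n",               # tab present but no label
]
print(check_gt_lines(sample))
```

Passing `image_dir='train_data'` additionally reports entries whose image file is missing from that folder.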

## 1. Environment Setup
First, we'll clean up any existing installations and prepare our workspace.

In [None]:
# Remove the old folder if it exists
%cd /content/
!rm -rf deep-text-recognition-benchmark

### Clone the Deep Text Recognition Benchmark Repository
The Deep Text Recognition Benchmark is the underlying framework that powers EasyOCR's recognition models.

In [None]:
# Clone the benchmark repository
!git clone https://github.com/clovaai/deep-text-recognition-benchmark.git
%cd deep-text-recognition-benchmark

### Install Required Dependencies

In [None]:
# Install specific dependencies for LMDB
!pip install lmdb pillow nltk natsort

In [None]:
!pip install fire

## 2. Data Preprocessing
### Fix Ground Truth File Format
This step ensures your ground truth file is properly formatted with tab separators between filenames and labels.

In [None]:
# Script to fix the text file
input_file = '../train_data/gt.txt'
output_file = '../train_data/gt_fixed.txt'

with open(input_file, 'r', encoding='utf-8') as f:
    lines = f.readlines()

with open(output_file, 'w', encoding='utf-8') as f:
    for line in lines:
        # Strip whitespace from ends and skip empty lines
        clean_line = line.strip()
        if not clean_line:
            continue

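        # If the line has no tab, split on the first run of whitespace and
        # rejoin filename and label with a single tab.
        # Note: this assumes filenames themselves contain no spaces.
        if '\t' not in clean_line:
            parts = clean_line.split(None, 1)
            if len(parts) == 2:
                f.write(f"{parts[0]}\t{parts[1]}\n")
            else:
                # No label found; report it instead of dropping the line silently
                print(f"Skipping line without a label: {clean_line}")
        else:
            f.write(f"{clean_line}\n")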

print(f"Fixed file created at: {output_file}")

### Convert Data to LMDB Format
LMDB (Lightning Memory-Mapped Database) is an efficient storage format that speeds up data loading during training.

In [None]:
import os

# Create the output directory
os.makedirs('/train_lmdb', exist_ok=True)

!python3 create_lmdb_dataset.py \
    --inputPath ../train_data/ \
    --gtFile ../train_data/gt_fixed.txt \
    --outputPath /train_lmdb/
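
To confirm the conversion worked, you can read a few entries back. This sketch assumes the key layout that `create_lmdb_dataset.py` writes (a `num-samples` key plus 1-based `label-%09d` keys); `read_samples` and `inspect_lmdb` are hypothetical helper names, not part of the benchmark.

```python
def read_samples(txn, limit=3):
    """Return (num_samples, first `limit` labels) from an LMDB transaction.

    Assumed key layout: b'num-samples' and b'label-000000001', ...
    """
    raw = txn.get(b"num-samples")
    if raw is None:
        raise ValueError("no 'num-samples' key; dataset looks empty or malformed")
    num_samples = int(raw.decode("utf-8"))
    labels = []
    for index in range(1, min(num_samples, limit) + 1):
        value = txn.get(b"label-%09d" % index)
        labels.append(value.decode("utf-8") if value is not None else None)
    return num_samples, labels


def inspect_lmdb(path, limit=3):
    """Open the LMDB directory read-only and summarize its contents."""
    import lmdb  # imported lazily so read_samples works without lmdb installed

    env = lmdb.open(path, readonly=True, lock=False, readahead=False, meminit=False)
    try:
        with env.begin(write=False) as txn:
            return read_samples(txn, limit)
    finally:
        env.close()
```

In the notebook, `inspect_lmdb('/train_lmdb/')` should return the number of stored samples and the first few labels, which you can compare against `gt_fixed.txt`.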

## 3. Framework Compatibility Fixes
### Fix PyTorch Compatibility Issue
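Recent PyTorch versions no longer expose the private helper `torch._utils._accumulate` that `dataset.py` imports, which raises an `ImportError`. This patch swaps it for the standard library's `itertools.accumulate`, which produces the same running totals.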

In [None]:
# Fix the ImportError by replacing the internal torch call with standard python itertools
dataset_path = '/content/deep-text-recognition-benchmark/dataset.py'

with open(dataset_path, 'r') as f:
    content = f.read()

# Replace the problematic import
content = content.replace('from torch._utils import _accumulate', 'from itertools import accumulate as _accumulate')

with open(dataset_path, 'w') as f:
    f.write(content)

print("Successfully patched dataset.py!")
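
The replacement is safe as long as the helper is only used to compute running totals, which is what `itertools.accumulate` does by default. A small illustration, with made-up subset lengths (the `dataset_split` values here are hypothetical):

```python
from itertools import accumulate

dataset_split = [3, 2, 5]  # hypothetical subset lengths

# Running totals: each value is the end index of one subset
totals = list(accumulate(dataset_split))

# Pair each running total with its length to recover (start, end) ranges
ranges = [(end - length, end) for end, length in zip(totals, dataset_split)]

print(totals)  # [3, 5, 10]
print(ranges)  # [(0, 3), (3, 5), (5, 10)]
```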

### Configure for CPU Training
If you're training on CPU (not recommended for large datasets), this step removes the hard-coded CUDA calls from `train.py`.

> **Note:** For faster training, use a GPU runtime in Colab: Runtime → Change runtime type → GPU

In [None]:
# This command uses 'sed' to replace all occurrences of '.cuda()' with nothing
# essentially forcing the script to stay on the CPU.
!sed -i 's/\.cuda()//g' train.py
!sed -i 's/device = torch.device('\''cuda'\'')/device = torch.device('\''cpu'\'')/g' train.py

## 4. Model Training
### Training Configuration
The following parameters control the training process:
- **Transformation**: None (no spatial transformation)
- **FeatureExtraction**: VGG (convolutional feature extractor)
- **SequenceModeling**: BiLSTM (bidirectional LSTM for sequence modeling)
- **Prediction**: CTC (Connectionist Temporal Classification for alignment)
- **batch_size**: 8 (adjust based on your GPU memory)
- **num_iter**: 1000 (total training iterations)
- **valInterval**: 100 (validation every 100 iterations)

In [None]:
!python3 train.py \
    --exp_name my_easyocr_finetune \
    --train_data /train_lmdb/ \
    --valid_data /train_lmdb/ \
    --select_data / \
    --batch_ratio 1 \
    --Transformation None \
    --FeatureExtraction VGG \
    --SequenceModeling BiLSTM \
    --Prediction CTC \
    --batch_size 8 \
    --num_iter 1000 \
    --valInterval 100 \
    --workers 0
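
A rough way to judge whether `num_iter` is large enough is to convert it into passes over the dataset. This sketch assumes every iteration draws `batch_size` images from a single dataset (as with `--batch_ratio 1`); `epochs_seen` is a hypothetical helper and the 500-image dataset size is a made-up example.

```python
def epochs_seen(num_iter, batch_size, num_samples):
    """Approximate number of passes over the training set."""
    return num_iter * batch_size / num_samples


# Settings from the command above, with a hypothetical 500-image dataset
print(epochs_seen(num_iter=1000, batch_size=8, num_samples=500))  # 16.0
```

If the result is well below 1, the model has not even seen every training image once, so more iterations are likely needed.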

### Training Output Interpretation
- **Train loss:** Should decrease over time
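- **Valid loss:** Should also decrease; if it rises while the train loss keeps falling, the model is overfitting (this only shows up when `--valid_data` points to held-out data rather than the training set)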
- **Current_accuracy:** Recognition accuracy on validation set
- **Current_norm_ED:** Normalized edit-distance score on the validation set; the benchmark reports it as 1 minus the normalized distance, so higher is better
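
To make the edit-distance metric concrete, here is a minimal pure-Python sketch. It is not the benchmark's own code: `edit_distance`, `normalized_distance`, and `normalized_score` are illustrative names, and normalizing by the longer string is an assumption. Implementations differ on whether they report the distance (0 is a perfect match) or 1 minus the distance (1 is a perfect match), so check which direction your training log uses.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(pred, gt):
    """Edit distance divided by the longer length (0.0 means identical)."""
    longest = max(len(pred), len(gt))
    if longest == 0:
        return 0.0
    return edit_distance(pred, gt) / longest


def normalized_score(pred, gt):
    """Similarity form: 1.0 means identical."""
    return 1.0 - normalized_distance(pred, gt)


print(edit_distance("kitten", "sitting"))  # 3
print(normalized_score("abc", "abc"))      # 1.0
```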

> The model checkpoints will be saved in `saved_models/my_easyocr_finetune/`

## 5. Model Conversion for EasyOCR
### Return to Main Directory

In [None]:
%cd ..

### Install EasyOCR

In [None]:
!pip install easyocr

### Import Required Libraries

In [None]:
import easyocr
import shutil

### Convert Model Weights to EasyOCR Format
The trained model uses a different key naming convention than EasyOCR expects. This conversion script fixes that.

In [None]:
import torch

# Path to your fine-tuned model
input_model_path = 'user_network/custom_model.pth'  # Change this to your checkpoint, e.g. one saved under saved_models/my_easyocr_finetune/
output_model_path = 'converted_model.pth'

# Load the state dict
print("Loading model...")
checkpoint = torch.load(input_model_path, map_location='cpu')

# Handle different checkpoint formats
if isinstance(checkpoint, dict) and 'state_dict' in checkpoint:
    state_dict = checkpoint['state_dict']
else:
    state_dict = checkpoint

# Create new state dict with corrected keys
new_state_dict = {}

print("\nConverting keys...")
for old_key, value in state_dict.items():
    # Replace FeatureExtraction.ConvNet. with FeatureExtraction.
    if 'FeatureExtraction.ConvNet.' in old_key:
        new_key = old_key.replace('FeatureExtraction.ConvNet.', 'FeatureExtraction.')
        new_state_dict[new_key] = value
        print(f"  {old_key} -> {new_key}")
    else:
        new_state_dict[old_key] = value

# Save the converted model
print(f"\nSaving converted model to {output_model_path}...")
torch.save(new_state_dict, output_model_path)

print("\n✓ Conversion complete!")
print(f"\nConverted {len(new_state_dict)} parameters")
print("\nFirst 10 keys in converted model:")
for key in list(new_state_dict.keys())[:10]:
    print(f"  {key}")

### Copy Converted Model to User Network Directory

In [None]:
# Copy the converted weights (produced by the conversion cell above)
# into the directory EasyOCR loads custom models from
shutil.copy('converted_model.pth', '/content/user_network/custom_model.pth')

## 6. Testing Your Custom Model
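Now let's test the fine-tuned model on some sample images. EasyOCR loads a custom recognizer by name, so besides `custom_model.pth` in `model_storage_directory` it expects matching `custom_model.py` and `custom_model.yaml` files in `user_network_directory`.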

In [None]:
reader = easyocr.Reader(
    ['en'],
    model_storage_directory='/content/user_network',
    user_network_directory='/content/user_network',
    recog_network='custom_model',
    gpu=False
)

results = reader.readtext('input1.png')

for res in results:
    print(f"Detected: {res[1]} (Confidence: {res[2]:.4f})")
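
Each item returned by `readtext` is a `(bounding_box, text, confidence)` tuple, so low-confidence detections can be filtered out before further use. A minimal sketch; `filter_by_confidence`, the 0.5 threshold, and the sample results are hypothetical and only illustrate the tuple layout.

```python
def filter_by_confidence(results, threshold=0.5):
    """Keep (text, confidence) pairs at or above the threshold."""
    return [(text, conf) for _bbox, text, conf in results if conf >= threshold]


# Made-up results in the same shape readtext returns
sample_results = [
    ([[0, 0], [10, 0], [10, 5], [0, 5]], "INVOICE", 0.97),
    ([[0, 6], [10, 6], [10, 11], [0, 11]], "1O0", 0.31),
]
print(filter_by_confidence(sample_results))  # [('INVOICE', 0.97)]
```

A suitable threshold depends on your data; inspecting the confidence of known-correct and known-wrong detections is a reasonable way to choose one.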

## Tips for Better Results
### 1. Data Quality
- Use high-quality, clear images
- Ensure consistent formatting across training samples
- Include diverse examples of all characters/words you want to recognize

### 2. Training Parameters
- Increase `num_iter` for better convergence (3000-5000 iterations recommended)
- Adjust `batch_size` based on available memory
- Monitor validation metrics to prevent overfitting
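
Validation metrics can only reveal overfitting when the validation images are not part of the training set. One way to get a held-out set is to split the ground-truth entries before building the LMDB datasets. This is a sketch under assumptions: `split_gt`, the 10% fraction, and the fixed seed are illustrative choices, not something the benchmark prescribes.

```python
import random


def split_gt(lines, val_fraction=0.1, seed=0):
    """Shuffle gt.txt entries and split them into (train, validation) lists."""
    entries = [line for line in lines if line.strip()]
    shuffled = entries[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    if len(shuffled) > 1:
        n_val = max(1, int(len(shuffled) * val_fraction))
    else:
        n_val = 0
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical entries standing in for the contents of gt_fixed.txt
gt_lines = [f"image{i}.jpg\tlabel {i}\n" for i in range(20)]
train_lines, val_lines = split_gt(gt_lines)
print(len(train_lines), len(val_lines))  # 18 2
```

Writing the two lists to separate ground-truth files and running `create_lmdb_dataset.py` once per file gives two LMDB folders that can be passed to `--train_data` and `--valid_data`.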

### 3. Model Architecture
- Try different feature extractors: `ResNet`, `VGG`, `RCNN`
- Experiment with different sequence models: `BiLSTM`, `None`
- Test different prediction methods: `CTC`, `Attn`

### 4. Data Augmentation
- Add rotated, scaled, or slightly distorted versions of images
- Include variations in lighting and contrast
- Augment rare characters/words to balance the dataset
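
To decide which characters need more examples, you can count how often each one appears in the labels. A minimal sketch; `rare_characters` and the `min_count` cutoff are hypothetical and should be adapted to your dataset size.

```python
from collections import Counter


def rare_characters(labels, min_count=5):
    """Return sorted (character, count) pairs seen fewer than min_count times."""
    counts = Counter(ch for label in labels for ch in label if not ch.isspace())
    return sorted((ch, n) for ch, n in counts.items() if n < min_count)


# Made-up labels; in practice, use the label column of gt.txt
labels = ["aab", "abc"]
print(rare_characters(labels, min_count=2))  # [('c', 1)]
```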

## Next Steps
1. Export Your Model: Save the trained model for deployment
2. Evaluate on Test Set: Test on completely unseen data
3. Fine-tune Further: Iterate on hyperparameters for better performance
4. Deploy: Integrate your custom model into production applications

## References
- [EasyOCR GitHub Repository](https://github.com/JaidedAI/EasyOCR)
- [Deep Text Recognition Benchmark](https://github.com/clovaai/deep-text-recognition-benchmark)
- [LMDB Documentation](https://lmdb.readthedocs.io/)

### License
This notebook is provided under the MIT License. Feel free to use and modify for your projects.

### Contributing
Found an issue or have suggestions? Please open an issue or submit a pull request on GitHub.

Happy Training! 🚀