# EasyOCR Model Training Notebook

This notebook provides an end-to-end training pipeline for EasyOCR models. It covers:
- Configuration loading from YAML files
- Model training with customizable parameters
- Progress monitoring and validation
- Model checkpointing

## Getting Started
Make sure you have prepared your dataset and configuration files before running this notebook.
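
As a pre-flight check, the sketch below lists configuration keys that the later cells of this notebook read and reports any that are absent from a loaded config dictionary. The helper `missing_keys` and the `REQUIRED_KEYS` tuple are illustrative names, not part of EasyOCR; the key names themselves are taken from the code in this notebook.

```python
# Keys that later cells of this notebook read from the config.
REQUIRED_KEYS = (
    "experiment_name", "train_data", "valid_data", "select_data",
    "batch_ratio", "lang_char", "num_iter", "batch_size", "lr",
    "valInterval", "workers", "imgH", "imgW",
)

def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from a config mapping."""
    return [key for key in required if key not in config]

# Hypothetical partial config, e.g. the result of yaml.safe_load(...)
example = {"experiment_name": "demo", "train_data": "all_data", "num_iter": 5000}
print(missing_keys(example))
```

An empty list means every key in `REQUIRED_KEYS` is present; it does not check that the values are sensible.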

In [17]:
# Import required libraries and modules
import os
import sys
import time
import torch
import torch.backends.cudnn as cudnn
import yaml
import pandas as pd
import numpy as np
from datetime import datetime

# Import custom modules
from train import train
from utils import AttrDict

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device count: {torch.cuda.device_count()}")
    print(f"Current device: {torch.cuda.current_device()}")
    print(f"Device name: {torch.cuda.get_device_name()}")

PyTorch version: 2.7.1
CUDA available: False


In [18]:
# Configure CUDNN backend for performance optimization
cudnn.benchmark = True  # Enable auto-tuner to find the best algorithm
cudnn.deterministic = False  # Allow non-deterministic algorithms for better performance

print("CUDNN Configuration:")
print(f"  - Benchmark: {cudnn.benchmark}")
print(f"  - Deterministic: {cudnn.deterministic}")
print("  - This configuration optimizes training speed but may affect reproducibility")

CUDNN Configuration:
  - Benchmark: True
  - Deterministic: False
  - This configuration optimizes training speed but may affect reproducibility


In [19]:
def get_config(file_path):
    """
    Load and process training configuration from YAML file
    
    Args:
        file_path (str): Path to the YAML configuration file
        
    Returns:
        AttrDict: Configuration object with all training parameters
    """
    print(f"Loading configuration from: {file_path}")
    
    # Load YAML configuration
    with open(file_path, 'r', encoding="utf8") as stream:
        opt = yaml.safe_load(stream)
    
    # Convert to AttrDict for easier access
    opt = AttrDict(opt)
    
    # Process character set based on configuration
    if opt.lang_char == 'None':
        print("Extracting character set from training data...")
        characters = ''
        
        # Extract characters from all selected datasets
        for data in opt['select_data'].split('-'):
            csv_path = os.path.join(opt['train_data'], data, 'labels.csv')
            print(f"  - Processing dataset: {data}")
            
            # Read labels and extract unique characters
            df = pd.read_csv(csv_path, sep='^([^,]+),', engine='python', 
                           usecols=['filename', 'words'], keep_default_na=False)
            all_char = ''.join(df['words'])
            characters += ''.join(set(all_char))
        
        # Create sorted unique character set
        characters = sorted(set(characters))
        opt.character = ''.join(characters)
        print(f"  - Extracted {len(characters)} unique characters")
    else:
        # Use predefined character set
        opt.character = opt.number + opt.symbol + opt.lang_char
        print(f"Using predefined character set: {len(opt.character)} characters")
    
    # Create output directory
    output_dir = f'./saved_models/{opt.experiment_name}'
    os.makedirs(output_dir, exist_ok=True)
    print(f"Models will be saved to: {output_dir}")
    
    # Print key configuration parameters
    print("\nKey Configuration Parameters:")
    print(f"  - Experiment name: {opt.experiment_name}")
    print(f"  - Number of iterations: {opt.num_iter}")
    print(f"  - Batch size: {opt.batch_size}")
    print(f"  - Learning rate: {opt.lr}")
    print(f"  - Image size: {opt.imgH}x{opt.imgW}")
    print(f"  - Character set length: {len(opt.character)}")
    
    return opt
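
The separator `^([^,]+),` passed to `pd.read_csv` is written to split each row at the first comma only, so a label that itself contains commas stays in a single `words` field. A standard-library sketch of the same idea, using a hypothetical `parse_label_line` helper and made-up rows:

```python
def parse_label_line(line):
    """Split 'filename,words' at the first comma; the label may contain commas."""
    filename, _, words = line.rstrip("\n").partition(",")
    return filename, words

rows = ["img_001.jpg,hello", "img_002.jpg,1,000 baht"]
for row in rows:
    print(parse_label_line(row))
```

This is only an illustration of the splitting rule; the notebook itself keeps using pandas to read the labels.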

In [20]:
# Load training configuration
print("="*50)
print("AVAILABLE CONFIGURATIONS")
print("="*50)

# Check available configurations
config_dir = "config_files"
available_configs = {}

if os.path.exists(config_dir):
    for file in os.listdir(config_dir):
        if file.endswith((".yaml", ".yml")):
            config_name = os.path.splitext(file)[0]
            available_configs[config_name] = f"{config_dir}/{file}"
            print(f"📄 {config_name}: {file}")

print("\n" + "="*50)
print("LOADING TRAINING CONFIGURATION")
print("="*50)

# Select configuration - Change this to use your preferred config
CONFIG_OPTIONS = {
    'thai': 'thai_auto_config.yaml',     # Thai language dataset
    'english': 'en_filtered_config.yaml', # English language dataset  
    'custom': 'custom_config.yaml'       # Create your own config
}

# 🔧 CONFIGURATION SELECTION - Modify this line to change dataset
selected_config = 'thai'  # Options: 'thai', 'english', or 'custom'

config_file = f"{config_dir}/{CONFIG_OPTIONS[selected_config]}"

print(f"🎯 Selected Configuration: {selected_config}")
print(f"📁 Config file: {config_file}")

try:
    opt = get_config(config_file)
    print(f"\n✅ Configuration loaded successfully!")
    print(f"🚀 Training will start with experiment: '{opt.experiment_name}'")
    
    # Display detected datasets
    print(f"\n📊 Dataset Information:")
    print(f"   - Training data directory: {opt.train_data}")
    print(f"   - Validation data directory: {opt.valid_data}")
    print(f"   - Selected datasets: {opt.select_data}")
    print(f"   - Batch ratios: {opt.batch_ratio}")
    
except FileNotFoundError:
    print(f"❌ Error: Configuration file '{config_file}' not found!")
    print("Please make sure the config file exists and the path is correct.")
    print(f"\nAvailable config files in {config_dir}:")
    for name, path in available_configs.items():
        print(f"  - {name}: {path}")
    raise
except Exception as e:
    print(f"❌ Error loading configuration: {e}")
    print("Please check the configuration file format and content.")
    raise

AVAILABLE CONFIGURATIONS
📄 en_filtered_config: en_filtered_config.yaml
📄 thai_auto_config: thai_auto_config.yaml

LOADING TRAINING CONFIGURATION
🎯 Selected Configuration: thai
📁 Config file: config_files/thai_auto_config.yaml
Loading configuration from: config_files/thai_auto_config.yaml
Using predefined character set: 92 characters
Models will be saved to: ./saved_models/thai_auto

Key Configuration Parameters:
  - Experiment name: thai_auto
  - Number of iterations: 5000
  - Batch size: 8
  - Learning rate: 0.001
  - Image size: 64x400
  - Character set length: 92

✅ Configuration loaded successfully!
🚀 Training will start with experiment: 'thai_auto'

📊 Dataset Information:
   - Training data directory: all_data
   - Validation data directory: all_data/thai_val
   - Selected datasets: thai_train
   - Batch ratios: 1


In [21]:
# Validate dataset structure
print("\n" + "="*50)
print("DATASET VALIDATION")
print("="*50)

def validate_dataset_structure(opt):
    """Validate that required datasets and files exist"""
    issues = []
    
    # Check train data directory
    if not os.path.exists(opt.train_data):
        issues.append(f"❌ Training data directory not found: {opt.train_data}")
    else:
        print(f"✅ Training data directory found: {opt.train_data}")
        
        # Check selected datasets
        selected_datasets = opt.select_data.split('-')
        for dataset in selected_datasets:
            dataset_path = os.path.join(opt.train_data, dataset)
            if not os.path.exists(dataset_path):
                issues.append(f"❌ Dataset not found: {dataset_path}")
            else:
                # Check for labels.csv
                labels_file = os.path.join(dataset_path, 'labels.csv')
                if not os.path.exists(labels_file):
                    issues.append(f"❌ Labels file not found: {labels_file}")
                else:
                    # Count samples
                    try:
                        df = pd.read_csv(labels_file, sep='^([^,]+),', engine='python', 
                                       usecols=['filename', 'words'], keep_default_na=False)
                        sample_count = len(df)
                        print(f"✅ Dataset '{dataset}': {sample_count} samples")
                    except Exception as e:
                        issues.append(f"❌ Error reading {labels_file}: {e}")
    
    # Check validation data
    if hasattr(opt, 'valid_data') and opt.valid_data:
        # Handle different validation data configurations
        if opt.valid_data == opt.train_data:
            # If valid_data same as train_data, it will use hierarchical structure
            print(f"✅ Validation uses hierarchical structure from: {opt.valid_data}")
        else:
            # Check specific validation directory
            if not os.path.exists(opt.valid_data):
                issues.append(f"❌ Validation data directory not found: {opt.valid_data}")
            else:
                print(f"✅ Validation data directory found: {opt.valid_data}")
                # Check if it has labels.csv (for direct validation dataset)
                val_labels_file = os.path.join(opt.valid_data, 'labels.csv')
                if os.path.exists(val_labels_file):
                    try:
                        df = pd.read_csv(val_labels_file, sep='^([^,]+),', engine='python', 
                                       usecols=['filename', 'words'], keep_default_na=False)
                        val_sample_count = len(df)
                        print(f"✅ Validation dataset: {val_sample_count} samples")
                    except Exception as e:
                        issues.append(f"❌ Error reading {val_labels_file}: {e}")
                else:
                    print(f"⚠️  No direct labels.csv in validation directory")
                    print(f"   Validation will use hierarchical structure")
    
    return issues

def fix_common_issues(opt):
    """Automatically fix common configuration issues"""
    fixes_applied = []
    
    # Check if validation data points to non-existent path and try to fix
    if hasattr(opt, 'valid_data') and opt.valid_data:
        if not os.path.exists(opt.valid_data):
            # Try common validation directory patterns
            potential_dirs = [
                f"{opt.train_data}/val",
                f"{opt.train_data}/validation", 
                f"{opt.train_data}/thai_val",
                f"{opt.train_data}/en_val",
                opt.train_data  # Use same as training data for hierarchical
            ]
            
            for potential_dir in potential_dirs:
                if os.path.exists(potential_dir):
                    original_path = opt.valid_data
                    opt.valid_data = potential_dir
                    fixes_applied.append(f"🔧 Changed valid_data: {original_path} → {potential_dir}")
                    break
    
    return fixes_applied

# Apply automatic fixes first
print("🔧 Checking for common issues...")
fixes = fix_common_issues(opt)
if fixes:
    print("Applied automatic fixes:")
    for fix in fixes:
        print(f"   {fix}")
else:
    print("   No automatic fixes needed")

# Run validation
validation_issues = validate_dataset_structure(opt)

if validation_issues:
    print(f"\n⚠️  Found {len(validation_issues)} issue(s):")
    for issue in validation_issues:
        print(f"   {issue}")
    
    # Suggest solutions
    print(f"\n💡 Suggested solutions:")
    print(f"   1. Check that your dataset folders contain 'labels.csv' files")
    print(f"   2. Verify dataset paths in the configuration file")
    print(f"   3. Make sure validation data path is correct")
    print(f"   4. Consider using train_data path for validation if no separate validation set")
    
    # Don't stop here, let user decide
    user_continue = input("\nDo you want to continue training anyway? (y/n): ")
    if user_continue.lower() != 'y':
        print("Training stopped. Please fix the issues above.")
        raise RuntimeError("Dataset validation failed")
else:
    print(f"\n🎉 All dataset validations passed!")
    print("Ready to proceed with training.")


DATASET VALIDATION
🔧 Checking for common issues...
   No automatic fixes needed
✅ Training data directory found: all_data
✅ Dataset 'thai_train': 80 samples
✅ Validation data directory found: all_data/thai_val
✅ Validation dataset: 20 samples

🎉 All dataset validations passed!
Ready to proceed with training.
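
The checks above confirm that `labels.csv` exists and can be read, but not that each listed image is present on disk. A possible extra check is sketched below. It assumes the images sit in the same folder as `labels.csv` and that the first row is a `filename,words` header; both are assumptions about the dataset layout, and `find_missing_images` is a hypothetical helper.

```python
import os

def find_missing_images(dataset_dir):
    """Return filenames listed in labels.csv that are not found in dataset_dir."""
    labels_path = os.path.join(dataset_dir, "labels.csv")
    missing = []
    with open(labels_path, encoding="utf8") as f:
        next(f, None)  # skip the 'filename,words' header row (assumed present)
        for line in f:
            # Split at the first comma only; the label may contain commas.
            filename = line.split(",", 1)[0].strip()
            if filename and not os.path.isfile(os.path.join(dataset_dir, filename)):
                missing.append(filename)
    return missing
```

With the configuration used here, it could be called as `find_missing_images(os.path.join(opt.train_data, "thai_train"))`.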


In [22]:
# 📊 Dataset Structure Overview
print("\n" + "="*50)
print("DATASET STRUCTURE OVERVIEW")
print("="*50)

def show_dataset_structure(base_path="all_data", max_depth=3):
    """Show the structure of datasets"""
    if not os.path.exists(base_path):
        print(f"❌ Base path not found: {base_path}")
        return
    
    print(f"📁 Dataset structure under '{base_path}':")
    
    def print_tree(path, prefix="", depth=0):
        if depth >= max_depth:
            return
            
        try:
            items = sorted(os.listdir(path))
            dirs = [item for item in items if os.path.isdir(os.path.join(path, item))]
            files = [item for item in items if os.path.isfile(os.path.join(path, item))]
            
            # Print directories first
            for i, dir_name in enumerate(dirs):
                is_last_dir = (i == len(dirs) - 1) and len(files) == 0
                current_prefix = "└── " if is_last_dir else "├── "
                print(f"{prefix}{current_prefix}📁 {dir_name}/")
                
                # Check if it has labels.csv
                labels_path = os.path.join(path, dir_name, "labels.csv")
                if os.path.exists(labels_path):
                    try:
                        df = pd.read_csv(labels_path, sep='^([^,]+),', engine='python', 
                                       usecols=['filename', 'words'], keep_default_na=False)
                        sample_count = len(df)
                        next_prefix = prefix + ("    " if is_last_dir else "│   ")
                        print(f"{next_prefix}└── 📄 labels.csv ({sample_count} samples)")
                    except Exception:
                        next_prefix = prefix + ("    " if is_last_dir else "│   ")
                        print(f"{next_prefix}└── 📄 labels.csv (error reading)")
                
                # Recurse into subdirectory
                next_prefix = prefix + ("    " if is_last_dir else "│   ")
                print_tree(os.path.join(path, dir_name), next_prefix, depth + 1)
            
            # Print key files
            key_files = [f for f in files if f.endswith(('.csv', '.txt', '.yaml', '.yml'))]
            for i, file_name in enumerate(key_files):
                is_last = i == len(key_files) - 1
                current_prefix = "└── " if is_last else "├── "
                file_path = os.path.join(path, file_name)
                file_size = os.path.getsize(file_path)
                print(f"{prefix}{current_prefix}📄 {file_name} ({file_size} bytes)")
                
        except PermissionError:
            print(f"{prefix}└── ❌ Permission denied")
        except Exception as e:
            print(f"{prefix}└── ❌ Error: {e}")
    
    print_tree(base_path)

# Show current dataset structure
show_dataset_structure()

print(f"\n🎯 Current Configuration:")
print(f"   - Training data: {opt.train_data}")
print(f"   - Validation data: {opt.valid_data}")
print(f"   - Selected training datasets: {opt.select_data}")
print("="*50)


DATASET STRUCTURE OVERVIEW
📁 Dataset structure under 'all_data':
├── 📁 easy_ocr/
│   └── 📄 custom_example.yaml (227 bytes)
├── 📁 results_JS-Kobori/
│   ├── 📁 glyph_masks/
│   │   └── 📁 0/
│   ├── 📁 images/
│   │   ├── 📁 0/
│   ├── 📁 masks/
│   │   └── 📁 0/
│   ├── 📄 coords.txt (14609 bytes)
│   ├── 📄 glyph_coords.txt (14609 bytes)
│   └── 📄 gt.txt (5588 bytes)
├── 📁 thai_train/
│   └── 📄 labels.csv (80 samples)
│   └── 📄 labels.csv (3719 bytes)
└── 📁 thai_val/
    └── 📄 labels.csv (20 samples)
    └── 📄 labels.csv (1001 bytes)

🎯 Current Configuration:
   - Training data: all_data
   - Validation data: all_data/thai_val
   - Selected training datasets: thai_train


In [23]:
# 🔧 Optional: Customize training parameters
print("\n" + "="*50)
print("TRAINING PARAMETER CUSTOMIZATION")
print("="*50)

# You can modify these parameters without changing the config file
print("📝 Current parameters (you can modify these):")

# Training parameters you might want to adjust
CUSTOM_PARAMS = {
    'batch_size': None,        # Set to override config (e.g., 16, 32, 64)
    'num_iter': None,          # Set to override config (e.g., 10000, 50000, 100000)
    'lr': None,                # Set to override config (e.g., 0.001, 0.0001)
    'valInterval': None,       # Set to override config (e.g., 1000, 5000)
    'workers': None,           # Set to override config (e.g., 0, 2, 4)
    'imgH': None,              # Set to override config (e.g., 32, 64)
    'imgW': None,              # Set to override config (e.g., 200, 400, 600)
}

# Apply custom parameters
original_params = {}
for param, value in CUSTOM_PARAMS.items():
    if value is not None and hasattr(opt, param):
        original_params[param] = getattr(opt, param)
        setattr(opt, param, value)
        print(f"   🔄 {param}: {original_params[param]} → {value}")

if not original_params:
    print("   ✅ Using original config parameters")
else:
    print(f"   📊 Modified {len(original_params)} parameters")

# Memory optimization for different GPU sizes
print(f"\n🖥️  Memory Optimization Suggestions:")
print(f"   Current batch_size: {opt.batch_size}")

gpu_memory_gb = None  # Set this if you know your GPU memory
if gpu_memory_gb:
    if gpu_memory_gb >= 16:
        suggested_batch = 64
        print(f"   💪 High-end GPU ({gpu_memory_gb}GB): Consider batch_size up to {suggested_batch}")
    elif gpu_memory_gb >= 8:
        suggested_batch = 32
        print(f"   🎯 Mid-range GPU ({gpu_memory_gb}GB): Consider batch_size up to {suggested_batch}")
    else:
        suggested_batch = 16
        print(f"   ⚡ Lower-end GPU ({gpu_memory_gb}GB): Consider batch_size up to {suggested_batch}")
else:
    print(f"   💡 If you get CUDA out of memory errors, try reducing batch_size to {max(1, opt.batch_size // 2)}")

print("="*50)


TRAINING PARAMETER CUSTOMIZATION
📝 Current parameters (you can modify these):
   ✅ Using original config parameters

🖥️  Memory Optimization Suggestions:
   Current batch_size: 8
   💡 If you get CUDA out of memory errors, try reducing batch_size to 4


In [24]:
# Display training options and parameters
print("="*50)
print("TRAINING CONFIGURATION SUMMARY")
print("="*50)

print(f"📁 Data Configuration:")
print(f"   - Training data: {opt.train_data}")
print(f"   - Validation data: {opt.valid_data}")
print(f"   - Selected datasets: {opt.select_data}")
print(f"   - Batch ratios: {opt.batch_ratio}")

print(f"\n🖼️  Image Configuration:")
print(f"   - Image height: {opt.imgH}")
print(f"   - Image width: {opt.imgW}")
print(f"   - RGB mode: {opt.rgb}")
print(f"   - Padding: {opt.PAD}")

print(f"\n🧠 Model Configuration:")
print(f"   - Transformation: {opt.Transformation}")
print(f"   - Feature extraction: {opt.FeatureExtraction}")
print(f"   - Sequence modeling: {opt.SequenceModeling}")
print(f"   - Prediction: {opt.Prediction}")

print(f"\n⚙️  Training Configuration:")
print(f"   - Batch size: {opt.batch_size}")
print(f"   - Number of iterations: {opt.num_iter}")
print(f"   - Learning rate: {opt.lr}")
print(f"   - Optimizer: {opt.optim}")
print(f"   - Validation interval: {opt.valInterval}")
print(f"   - Number of workers: {opt.workers}")

print(f"\n💾 Output Configuration:")
print(f"   - Experiment name: {opt.experiment_name}")
print(f"   - Save directory: ./saved_models/{opt.experiment_name}")

# Training options
print(f"\n🎯 Training Options:")
print(f"   - Mixed precision (AMP): {'Enabled' if hasattr(opt, 'amp') and opt.amp else 'Disabled'}")
print(f"   - Data filtering: {'Disabled' if opt.data_filtering_off else 'Enabled'}")
print(f"   - Fine-tuning: {'Enabled' if opt.FT else 'Disabled'}")

print("="*50)

TRAINING CONFIGURATION SUMMARY
📁 Data Configuration:
   - Training data: all_data
   - Validation data: all_data/thai_val
   - Selected datasets: thai_train
   - Batch ratios: 1

🖼️  Image Configuration:
   - Image height: 64
   - Image width: 400
   - RGB mode: False
   - Padding: True

🧠 Model Configuration:
   - Transformation: None
   - Feature extraction: VGG
   - Sequence modeling: BiLSTM
   - Prediction: CTC

⚙️  Training Configuration:
   - Batch size: 8
   - Number of iterations: 5000
   - Learning rate: 0.001
   - Optimizer: adam
   - Validation interval: 250
   - Number of workers: 0

💾 Output Configuration:
   - Experiment name: thai_auto
   - Save directory: ./saved_models/thai_auto

🎯 Training Options:
   - Mixed precision (AMP): Disabled
   - Data filtering: Enabled
   - Fine-tuning: Disabled


In [25]:
# Start model training
print("="*50)
print("STARTING MODEL TRAINING")
print("="*50)

# Training parameters
use_amp = False  # Set to True to enable Automatic Mixed Precision for faster training
show_samples = 3  # Number of prediction samples to show during validation

print(f"🚀 Starting training at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"📊 Training samples to show: {show_samples}")
print(f"⚡ Mixed precision (AMP): {'Enabled' if use_amp else 'Disabled'}")

# Check if resuming from checkpoint
if opt.saved_model:
    print(f"🔄 Resuming training from: {opt.saved_model}")
else:
    print("🆕 Starting training from scratch")

print("\n" + "="*50)
print("TRAINING LOG")
print("="*50)

try:
    # Start training
    train(opt, show_number=show_samples, amp=use_amp)
except KeyboardInterrupt:
    print("\n⚠️  Training interrupted by user (Ctrl+C)")
    print(f"Model checkpoints are saved in: ./saved_models/{opt.experiment_name}/")
except Exception as e:
    print(f"\n❌ Training failed with error: {e}")
    print("Please check the error details above and ensure:")
    print("1. Data paths are correct")
    print("2. Required dependencies are installed")
    print("3. GPU memory is sufficient")
    raise
finally:
    print(f"\n🏁 Training session ended at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

STARTING MODEL TRAINING
🚀 Starting training at: 2025-06-28 01:42:35
📊 Training samples to show: 3
⚡ Mixed precision (AMP): Disabled
🆕 Starting training from scratch

TRAINING LOG
Filtering the images containing characters which are not in opt.character
Filtering the images whose label is longer than opt.batch_max_length
--------------------------------------------------------------------------------
dataset_root: all_data
opt.select_data: ['thai_train']
opt.batch_ratio: ['1']
--------------------------------------------------------------------------------
dataset_root:    all_data	 dataset: thai_train
all_data/thai_train
sub-directory:	/thai_train	 num samples: 80
num total samples of thai_train: 80 x 1.0 (total_data_usage_ratio) = 80
num samples of thai_train per batch: 8 x 1.0 (batch_ratio) = 8
--------------------------------------------------------------------------------
Total_batch_size: 8 = 8
--------------------------------------------------------------------------------
datase

  scaler = GradScaler()


training time:  68.41202092170715
[250/5000] Train loss: 3.72254, Valid loss: 4.15305, Elapsed_time: 68.41333
Current_accuracy : 0.000, Current_norm_ED  : 0.0658
Best_accuracy    : 0.000, Best_norm_ED     : 0.0658
--------------------------------------------------------------------------------
Ground Truth              | Prediction                | Confidence Score & T/F
--------------------------------------------------------------------------------
สำรวจสิ่งใหม่             | ฝันย                      | 0.0000	False
ฝันให้ไกล                 | ฝัาย                      | 0.0006	False
ไปให้ถึง                  | ฝัล                       | 0.0027	False
--------------------------------------------------------------------------------
validation time:  0.30673885345458984
training time:  69.56587886810303
[500/5000] Train loss: 2.45274, Valid loss: 5.04326, Elapsed_time: 138.28604
Current_accuracy : 0.000, Current_norm_ED  : 0.1094
Best_accuracy    : 0.000, Best_norm_ED     : 0.1094
----

SystemExit: 

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)


In [None]:
# Training monitoring utilities
print("="*50)
print("TRAINING MONITORING UTILITIES")
print("="*50)

def check_training_logs(experiment_name):
    """Check and display training logs"""
    log_dir = f"./saved_models/{experiment_name}"
    
    if not os.path.exists(log_dir):
        print(f"❌ Log directory not found: {log_dir}")
        return
    
    print(f"📁 Log directory: {log_dir}")
    
    # Check available log files
    log_files = []
    for file in os.listdir(log_dir):
        if file.endswith('.txt') or file.endswith('.log'):
            log_files.append(file)
    
    if log_files:
        print("📄 Available log files:")
        for log_file in log_files:
            file_path = os.path.join(log_dir, log_file)
            file_size = os.path.getsize(file_path)
            print(f"   - {log_file} ({file_size} bytes)")
    else:
        print("❌ No log files found")
    
    # Check available model checkpoints
    model_files = []
    for file in os.listdir(log_dir):
        if file.endswith('.pth'):
            model_files.append(file)
    
    if model_files:
        print("💾 Available model checkpoints:")
        for model_file in model_files:
            file_path = os.path.join(log_dir, model_file)
            file_size = os.path.getsize(file_path) / (1024 * 1024)  # MB
            print(f"   - {model_file} ({file_size:.1f} MB)")
    else:
        print("❌ No model checkpoints found")

def read_training_log(experiment_name, lines=20):
    """Read the last N lines of training log"""
    log_path = f"./saved_models/{experiment_name}/log_train.txt"
    
    if not os.path.exists(log_path):
        print(f"❌ Training log not found: {log_path}")
        return
    
    print(f"📄 Last {lines} lines of training log:")
    print("-" * 50)
    
    with open(log_path, 'r', encoding='utf8') as f:
        log_lines = f.readlines()
        for line in log_lines[-lines:]:
            print(line.strip())

# Example usage (uncomment after training starts):
# check_training_logs(opt.experiment_name)
# read_training_log(opt.experiment_name, lines=10)

## 📝 Training Notes and Tips

### Important Notes:
1. **Memory Management**: Training requires significant GPU memory. Reduce batch size if you encounter out-of-memory errors.
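2. **Checkpointing**: A checkpoint is saved every 10,000 iterations and whenever validation reaches a new best accuracy or norm_ED. With `num_iter` set to 5000, as in this run, only the best-model checkpoints are written.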
3. **Monitoring**: Use the utility functions above to monitor training progress and check logs.
4. **Interruption**: You can interrupt training with Ctrl+C. Checkpoints saved so far are kept, but progress since the last saved checkpoint is lost.

### Configuration Tips:
- **Mixed Precision (AMP)**: Enable for faster training on modern GPUs; the reduced precision may affect model quality
- **Learning Rate**: Start with the default value and adjust based on loss convergence
- **Validation Interval**: More frequent validation gives finer-grained monitoring but slows training
- **Batch Size**: Larger batches generally improve training stability but need more GPU memory

### After Training:
1. Check the `saved_models/{experiment_name}/` directory for:
   - `best_accuracy.pth` - Model with highest accuracy
   - `best_norm_ED.pth` - Model with best normalized edit distance
   - `iter_*.pth` - Regular checkpoints
   - `log_train.txt` - Training progress log
   - `opt.txt` - Configuration used for training

2. Use the monitoring utilities to analyze training progress
3. Evaluate the trained model on your test dataset
4. Consider fine-tuning with different learning rates or datasets
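
One way to analyze training progress is to pull the loss values out of the progress lines. The sketch below assumes `log_train.txt` contains lines in the same format as the training output shown in this notebook, which has not been verified here; `LOG_LINE` and `parse_progress` are hypothetical names.

```python
import re

# Matches lines such as:
# [250/5000] Train loss: 3.72254, Valid loss: 4.15305, Elapsed_time: 68.41333
LOG_LINE = re.compile(
    r"\[(\d+)/(\d+)\] Train loss: ([\d.]+), Valid loss: ([\d.]+), Elapsed_time: ([\d.]+)"
)

def parse_progress(lines):
    """Extract (iteration, train_loss, valid_loss) tuples from progress lines."""
    records = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            records.append(
                (int(match.group(1)), float(match.group(3)), float(match.group(4)))
            )
    return records

# The two progress lines from the training output in this notebook
sample = [
    "[250/5000] Train loss: 3.72254, Valid loss: 4.15305, Elapsed_time: 68.41333",
    "[500/5000] Train loss: 2.45274, Valid loss: 5.04326, Elapsed_time: 138.28604",
]
for iteration, train_loss, valid_loss in parse_progress(sample):
    print(iteration, train_loss, valid_loss)
```

Lines that do not match the pattern are skipped, so the same function can be fed the full contents of a log file.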

### Troubleshooting:
- **CUDA out of memory**: Reduce batch size or image dimensions
- **Slow training**: Enable AMP, increase `workers`, or validate less often (use a larger `valInterval`)
- **Poor convergence**: Check learning rate, ensure data quality, verify character set
- **File not found errors**: Verify data paths and configuration file locations
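
To verify the character set, it can help to list the characters that appear in the labels but not in `opt.character`, since the training log reports that images containing such characters are filtered out. A minimal sketch with a hypothetical helper `chars_outside_set` and made-up labels:

```python
def chars_outside_set(labels, character):
    """Return the sorted characters used in labels but absent from character."""
    return sorted(set("".join(labels)) - set(character))

labels = ["cat", "dog!", "c-3"]
character = "0123456789abcdefghijklmnopqrstuvwxyz"
print(chars_outside_set(labels, character))  # ['!', '-']
```

In the notebook, `labels` would come from the `words` column of `labels.csv` and `character` from `opt.character`.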