# Cell 1 🌍 Universal Translator v1.3
NOTES HERE

## Cell 2 🔧 Setup & Installation {#setup}
Run these cells once to set up your environment.

In [1]:
# Cell 3 Install required packages
%pip install ruff deep-translator pytesseract pillow

# Verify installations
import sys
print(f"✅ Python version: {sys.version}")
print("✅ All packages installed successfully!")
print("📦 Installed: ruff, deep-translator, pytesseract, pillow")


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.
✅ Python version: 3.12.1 (main, Jul 10 2025, 11:57:50) [GCC 13.3.0]
✅ All packages installed successfully!
📦 Installed: ruff, deep-translator, pytesseract, pillow
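
The install cell above prints its success message unconditionally, so a failed `%pip install` would still report success. A minimal sketch of an actual check, using the standard library's `importlib.metadata`; the helper name `check_distributions` is hypothetical and not part of the notebook:

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip in the install cell
REQUIRED = ["ruff", "deep-translator", "pytesseract", "pillow"]


def check_distributions(names):
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in check_distributions(REQUIRED).items():
    status = f"installed ({ver})" if ver else "MISSING"
    print(f"{name}: {status}")
```

This only covers the Python packages. The Tesseract binary and its language packs are system-level installs and need a separate check.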


## Cell 4 🔧 Code Quality Check
### Ruff Linting & PEP 8 Validation
Run this cell after installation to check and auto-fix code style issues.

In [2]:
# Cell 5 - Ruff Code Quality Check & Fix

# Imports at the TOP (fixes the E402 error)
import os
import subprocess

# Clean up any old config files
for file in ['ruff_settings.txt', '../ruff_settings.txt']:
    if os.path.exists(file):
        os.remove(file)
        print(f"🗑️ Cleaned up {file}")

print("🔍 RUFF CODE QUALITY CHECK FOR V1.3")
print("=" * 50)

# First, check what we have
print("📊 Initial check:")
!ruff check translator_v1.3.ipynb --statistics

print("\n" + "=" * 50)
print("🔧 Auto-fixing safe issues...")
!ruff check translator_v1.3.ipynb --fix

print("\n" + "=" * 50)
print("📋 Final status:")
!ruff check translator_v1.3.ipynb --statistics

# Show success or what's left (subprocess already imported at top)
result = subprocess.run(['ruff', 'check', 'translator_v1.3.ipynb'],
                       capture_output=True, text=True)
if result.returncode == 0:
    print("\n🎉 SUCCESS! All checks passed!")
else:
    print("\n💡 Some style issues remain (usually line length)")
    print("These don't affect functionality")

🔍 RUFF CODE QUALITY CHECK FOR V1.3
📊 Initial check:

🔧 Auto-fixing safe issues...
All checks passed!

📋 Final status:

🎉 SUCCESS! All checks passed!


## Cell 6 💻 Imports and Setup

**v1.3 Updates:**
- Added `Enum` for language selection
- All imports follow PEP 8 order
- Version 1.3 - November 2, 2025

In [3]:
# Standard library imports
import re
from enum import Enum
from typing import Dict

# Third-party imports
import pytesseract
from deep_translator import GoogleTranslator
from PIL import Image, ImageEnhance, ImageFilter

"""
Universal Translator Module v1.3
PEP 8 compliant implementation for image text extraction and translation
Now with Enum support for better type safety
"""

# Module information
__version__ = "1.3"
__author__ = "Victor"
__date__ = "November 2, 2025"

print(f"📚 Universal Translator Module v{__version__} loaded")
print(f"👤 Author: {__author__}")

📚 Universal Translator Module v1.3 loaded
👤 Author: Victor


## Configuration and Constants

**New in v1.3:** All settings are now in one place using the `Config` class.

**How it works:**
- Settings are grouped by type (Image, OCR, Files, Debug)
- Access using: `Config.Image.SCALE_FACTOR`
- Change any setting without touching main code

**Active Settings:**
- Image: scale, contrast, brightness
- Files: naming, cleanup
- Debug: verbose output on/off

**Future Features (placeholders ready):**
- Batch processing
- Caching
- Error retry
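
To illustrate the access pattern, here is a minimal sketch using a cut-down stand-in for `Config`. It is named `DemoConfig` so it does not collide with the real class defined in the next cell. Only `SCALE_FACTOR`, `CONTRAST`, `VERBOSE`, and the two range checks are taken from the notebook; the rest is illustrative:

```python
class DemoConfig:
    """Cut-down stand-in for the notebook's Config class (illustration only)."""

    class Image:
        SCALE_FACTOR = 3
        CONTRAST = 2.5

    class Debug:
        VERBOSE = True

    @classmethod
    def validate(cls):
        """Raise ValueError if a setting is out of range."""
        if cls.Image.SCALE_FACTOR < 1:
            raise ValueError("SCALE_FACTOR must be >= 1")
        if cls.Image.CONTRAST < 0:
            raise ValueError("CONTRAST must be >= 0")
        return True


# Read a setting through its group
print(DemoConfig.Image.SCALE_FACTOR)

# Change a setting at runtime, then re-validate before using it
DemoConfig.Image.SCALE_FACTOR = 2
DemoConfig.validate()

# An invalid value is caught by validate(), not at assignment time
DemoConfig.Image.SCALE_FACTOR = 0
try:
    DemoConfig.validate()
except ValueError as exc:
    print(f"Rejected: {exc}")
finally:
    DemoConfig.Image.SCALE_FACTOR = 3  # restore the default
```

Because the settings are class attributes, a change is global: every later call that reads `DemoConfig.Image.SCALE_FACTOR` sees the new value until it is reset.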

In [4]:
# Cell: Configuration and Constants
"""
Configuration and Constants for Universal Translator v1.3
Centralized settings for easy adjustment and maintenance
"""

class Config:
    """
    Centralized configuration using nested classes for organization.
    Access patterns: Config.Image.SCALE_FACTOR, Config.Debug.VERBOSE, etc.
    """
    
    # ============= IMAGE PROCESSING SETTINGS =============
    class Image:
        """Settings for image enhancement and processing"""
        # Quality vs Speed trade-off (2=fast, 3=balanced, 4+=quality)
        SCALE_FACTOR = 3
        
        # Enhancement settings (1.0 = no change)
        CONTRAST = 2.5      # Increase contrast (higher = more contrast)
        BRIGHTNESS = 1.2    # Increase brightness (higher = brighter)
        
        # Sharpening iterations (more = sharper but slower)
        SHARPEN_ITERATIONS = 2
        
        # Image format for saving enhanced images
        OUTPUT_FORMAT = 'JPEG'  # or 'PNG' for better quality
        OUTPUT_QUALITY = 85     # JPEG quality (1-100, higher = better)
    
    # ============= OCR CONFIGURATION =============
    class OCR:
        """Tesseract OCR settings and configurations"""
        # OCR modes based on image type
        CONFIGS = {
            'document': r'--oem 3 --psm 6',    # Uniform text block
            'sign': r'--oem 3 --psm 11',       # Sparse text
            'screenshot': r'--oem 3 --psm 3',   # Fully automatic
            'default': r'--oem 3 --psm 3'       # Fallback option
        }
        
        # Timeout for OCR operations (seconds)
        TIMEOUT = 30
        
        # Confidence threshold (0-100) - future use
        MIN_CONFIDENCE = 60
    
    # ============= FILE HANDLING =============
    class Files:
        """File naming and management settings"""
        # Prefix for enhanced images
        ENHANCED_PREFIX = "enhanced_"
        
        # Auto-cleanup temporary files after processing
        AUTO_CLEANUP = False  # Set True to delete enhanced images after use
        
        # Directory for temporary files (None = same as source)
        TEMP_DIR = None
        
        # Maximum file size in MB (for safety)
        MAX_FILE_SIZE_MB = 50
    
    # ============= DEBUG AND LOGGING =============
    class Debug:
        """Debug and output control settings"""
        # Show detailed processing steps
        VERBOSE = True
        
        # Show timing information
        SHOW_TIMING = True
        
        # Keep enhanced images on disk (when True, AUTO_CLEANUP is skipped)
        SAVE_ENHANCED = True
        
        # Print configuration on startup
        SHOW_CONFIG = True
        
        # Detailed error messages
        DETAILED_ERRORS = True
    
    # ============= BATCH PROCESSING (Future Feature) =============
    class Batch:
        """Settings for batch processing multiple images"""
        # Maximum images to process in one batch
        SIZE_LIMIT = 10
        
        # Process in parallel (False = sequential)
        PARALLEL = False
        
        # Number of worker threads (if PARALLEL=True)
        WORKERS = 4
        
        # Continue on error or stop batch
        CONTINUE_ON_ERROR = True
    
    # ============= CACHING (Future Feature) =============
    class Cache:
        """Settings for caching processed results"""
        # Enable/disable caching
        ENABLED = False
        
        # Maximum cache size in MB
        MAX_SIZE_MB = 100
        
        # Cache expiration in seconds (3600 = 1 hour)
        EXPIRY_SECONDS = 3600
        
        # Cache location (None = memory, string = disk path)
        LOCATION = None
    
    # ============= ERROR HANDLING (Future Feature) =============
    class ErrorHandling:
        """Settings for error recovery and retries"""
        # Number of retry attempts
        RETRY_COUNT = 3
        
        # Delay between retries (seconds)
        RETRY_DELAY = 1
        
        # Fallback to basic processing on error
        USE_FALLBACK = True
        
        # Log errors to file
        LOG_TO_FILE = False
        LOG_FILE = "translator_errors.log"
    
    # ============= PERFORMANCE (Future Feature) =============
    class Performance:
        """Performance monitoring and optimization settings"""
        # Track processing times
        TRACK_TIMING = True
        
        # Memory usage warnings (MB)
        MEMORY_WARNING_MB = 500
        
        # Automatic optimization based on image size
        AUTO_OPTIMIZE = True
    
    @classmethod
    def validate(cls):
        """
        Validate configuration settings.
        Raises ValueError if any settings are invalid.
        """
        # Image validation
        if cls.Image.SCALE_FACTOR < 1:
            raise ValueError("SCALE_FACTOR must be >= 1")
        if cls.Image.CONTRAST < 0:
            raise ValueError("CONTRAST must be >= 0")
        if cls.Image.BRIGHTNESS < 0:
            raise ValueError("BRIGHTNESS must be >= 0")
        
        # File validation
        if cls.Files.MAX_FILE_SIZE_MB <= 0:
            raise ValueError("MAX_FILE_SIZE_MB must be > 0")
        
        # Batch validation
        if cls.Batch.SIZE_LIMIT <= 0:
            raise ValueError("Batch.SIZE_LIMIT must be > 0")
        
        print("✅ Configuration validated successfully!")
        return True
    
    @classmethod
    def display(cls):
        """Display current configuration settings"""
        if not cls.Debug.SHOW_CONFIG:
            return
            
        print("\n" + "="*50)
        print("📋 CURRENT CONFIGURATION")
        print("="*50)
        
        print("\n🖼️ Image Processing:")
        print(f"  • Scale Factor: {cls.Image.SCALE_FACTOR}x")
        print(f"  • Contrast: {cls.Image.CONTRAST}")
        print(f"  • Brightness: {cls.Image.BRIGHTNESS}")
        
        print("\n📁 File Handling:")
        print(f"  • Enhanced Prefix: '{cls.Files.ENHANCED_PREFIX}'")
        print(f"  • Auto Cleanup: {cls.Files.AUTO_CLEANUP}")
        
        print("\n🔍 Debug Settings:")
        print(f"  • Verbose Output: {cls.Debug.VERBOSE}")
        print(f"  • Save Enhanced Images: {cls.Debug.SAVE_ENHANCED}")
        
        print("\n🚀 Future Features Status:")
        print(f"  • Batch Processing: {'Ready' if cls.Batch.SIZE_LIMIT > 0 else 'Disabled'}")
        print(f"  • Caching: {'Enabled' if cls.Cache.ENABLED else 'Disabled'}")
        print(f"  • Error Retry: {cls.ErrorHandling.RETRY_COUNT} attempts")
        print("="*50 + "\n")


# Validate and display configuration on load
try:
    Config.validate()
    Config.display()
except ValueError as e:
    print(f"❌ Configuration Error: {e}")
    print("Please fix the configuration values above.")

# Language enum: maps each supported language to its Tesseract code
class Language(Enum):
    """
    Supported languages with their Tesseract language codes.
    """
    ENGLISH = 'eng'
    CHINESE = 'chi_sim'  # Simplified Chinese
    JAPANESE = 'jpn'
    KOREAN = 'kor'
    HINDI = 'hin'

# Display available languages
print("🌍 Supported Languages:")
print("-" * 30)
for lang in Language:
    print(f"  • {lang.name.title()}: {lang.value}")
print("-" * 30)

✅ Configuration validated successfully!

📋 CURRENT CONFIGURATION

🖼️ Image Processing:
  • Scale Factor: 3x
  • Contrast: 2.5
  • Brightness: 1.2

📁 File Handling:
  • Enhanced Prefix: 'enhanced_'
  • Auto Cleanup: False

🔍 Debug Settings:
  • Verbose Output: True
  • Save Enhanced Images: True

🚀 Future Features Status:
  • Batch Processing: Ready
  • Caching: Disabled
  • Error Retry: 3 attempts

🌍 Supported Languages:
------------------------------
  • English: eng
  • Chinese: chi_sim
  • Japanese: jpn
  • Korean: kor
  • Hindi: hin
------------------------------


## Universal Translator

**What's New:**
- Use `Language.ENGLISH` instead of 'english'
- All settings now use `Config` class
- Better error messages
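
As a sketch of why the enum helps: a misspelled string would only fail once it reaches Tesseract, while an enum lookup fails immediately with a clear message. The `Language` members below mirror the notebook's enum and are redefined here only so the snippet runs on its own. The `parse_language` helper is hypothetical, meant for callers that still receive plain strings (for example from a command line):

```python
from enum import Enum


class Language(Enum):
    ENGLISH = 'eng'
    CHINESE = 'chi_sim'
    JAPANESE = 'jpn'
    KOREAN = 'kor'
    HINDI = 'hin'


def parse_language(name: str) -> Language:
    """Convert a user-supplied name such as 'english' to a Language member."""
    try:
        return Language[name.strip().upper()]
    except KeyError:
        valid = ', '.join(lang.name.lower() for lang in Language)
        raise ValueError(f"Unknown language {name!r}. Choose from: {valid}") from None


print(parse_language("english"))         # Language.ENGLISH
print(parse_language(" Korean ").value)  # kor
```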

**How to Use:**
```python
result = translator.process("image.jpg", Language.ENGLISH)
```

In [5]:
# Cell 10: Universal Translator Main Implementation with Smart Checking
import subprocess
import os

class UniversalTranslator:
    """
    A universal translator for extracting and translating text from images.
    
    This class supports text extraction from images in multiple languages
    using enum-based language selection and centralized configuration.
    Now includes smart language support checking.
    
    Attributes:
        supported_languages (list): List of Language enum members.
        available_languages (dict): Languages actually installed on system.
        missing_languages (dict): Languages defined but not installed.
    """
    
    def __init__(self) -> None:
        """
        Initialize the UniversalTranslator.
        
        Sets up supported languages using the Language enum and checks
        which language packs are actually installed.
        """
        self.supported_languages = list(Language)
        self.available_languages = {}
        self.missing_languages = {}
        
        # Check language support
        self._check_language_support()
        
        # Complete setup
        self._setup_complete()
    
    def _check_language_support(self) -> None:
        """
        Check which Tesseract language packs are installed.
        
        Reports available and missing languages without disabling any.
        Provides installation instructions for missing languages.
        """
        print("\n" + "="*50)
        print("🔍 CHECKING LANGUAGE SUPPORT")
        print("="*50)
        
        # Get list of installed Tesseract languages
        installed_langs = set()
        try:
            result = subprocess.run(
                ['tesseract', '--list-langs'],
                capture_output=True,
                text=True,
                check=False
            )
            
            if result.returncode == 0:
                # Parse the output (skip header line)
                lines = result.stdout.strip().split('\n')[1:]
                installed_langs = set(lines)
                print(f"✅ Tesseract found with {len(installed_langs)} language packs")
            else:
                print("⚠️ Could not get Tesseract language list")
                
        except FileNotFoundError:
            print("❌ Tesseract is not installed or not in PATH")
            print("   Install with: brew install tesseract (Mac)")
            print("   Or: sudo apt-get install tesseract-ocr (Linux)")
            return
        
        # Check each language we support
        print("\n📋 Language Pack Status:")
        
        for lang in self.supported_languages:
            # A value may combine several packs (e.g. chi_sim+chi_tra)
            lang_codes = lang.value.split('+')
            
            # Every listed pack must be installed for the language to work
            is_available = all(code in installed_langs for code in lang_codes)
            
            if is_available:
                self.available_languages[lang] = True
                print(f"   ✅ {lang.name:10} ({lang.value:10}) - Installed")
            else:
                self.missing_languages[lang] = True
                print(f"   ❌ {lang.name:10} ({lang.value:10}) - Not installed")
        
        # Show installation instructions if any missing
        if self.missing_languages:
            print("\n📦 Installation Instructions for Missing Languages:")
            print("   For Mac (using Homebrew):")
            # tesseract-lang bundles the extra language packs, so print it once
            print("      brew install tesseract-lang")
                    
            print("\n   For Linux/Codespaces:")
            for lang in self.missing_languages:
                for code in lang.value.split('+'):
                    # apt package names use hyphens (chi_sim -> chi-sim)
                    package = code.replace('_', '-')
                    print(f"      sudo apt-get install tesseract-ocr-{package}")
        else:
            print("\n✅ All language packs are installed!")
        
        print("="*50)
    
    def _setup_complete(self) -> None:
        """Print initialization confirmation with language status."""
        if Config.Debug.VERBOSE:
            print("\n✅ Universal Translator v1.3 initialized!")
            print(f"📚 Defined languages: {', '.join([lang.name.lower() for lang in self.supported_languages])}")
            
            # Report language availability summary
            if self.available_languages:
                available_names = [lang.name.lower() for lang in self.available_languages]
                print(f"✅ Ready to use: {', '.join(available_names)}")
            
            if self.missing_languages:
                missing_names = [lang.name.lower() for lang in self.missing_languages]
                print(f"⚠️ Missing (will error if used): {', '.join(missing_names)}")
    
    def enhance_image(self, image_path: str) -> str:
        """
        Enhance image quality for better OCR results.
        
        Args:
            image_path (str): Path to the input image file.
            
        Returns:
            str: Path to the enhanced image file.
            
        Raises:
            FileNotFoundError: If the image file doesn't exist.
            IOError: If the image cannot be processed.
        """
        try:
            img = Image.open(image_path)
            img = img.convert('L')
            
            width, height = img.size
            new_size = (
                width * Config.Image.SCALE_FACTOR,
                height * Config.Image.SCALE_FACTOR
            )
            img = img.resize(new_size, Image.Resampling.LANCZOS)
            
            contrast_enhancer = ImageEnhance.Contrast(img)
            img = contrast_enhancer.enhance(Config.Image.CONTRAST)
            
            brightness_enhancer = ImageEnhance.Brightness(img)
            img = brightness_enhancer.enhance(Config.Image.BRIGHTNESS)
            
            for _ in range(Config.Image.SHARPEN_ITERATIONS):
                img = img.filter(ImageFilter.SHARPEN)
            
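            # Prefix the file name only, so paths that include a directory stay valid
            directory, filename = os.path.split(image_path)
            enhanced_path = os.path.join(
                directory, f"{Config.Files.ENHANCED_PREFIX}{filename}"
            )
            img.save(enhanced_path, quality=Config.Image.OUTPUT_QUALITY)
            
            if Config.Debug.VERBOSE:
                print(f"✅ Image enhanced: {enhanced_path}")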
            
            return enhanced_path
            
        except FileNotFoundError as e:
            error_msg = f"❌ Image file not found: {image_path}"
            if Config.Debug.DETAILED_ERRORS:
                print(error_msg)
            raise FileNotFoundError(error_msg) from e
        except Exception as e:
            error_msg = f"❌ Error processing image: {str(e)}"
            if Config.Debug.DETAILED_ERRORS:
                print(error_msg)
            raise IOError(error_msg) from e
    
    def _fix_english_text(self, text: str) -> str:
        """
        Apply English-specific text corrections.
        
        Args:
            text (str): Raw text to be corrected.
            
        Returns:
            str: Corrected text.
        """
        if not text:
            return ""
        
        direct_fixes = {
            'Helloworld': 'Hello World',
            'HelloWorld': 'Hello World',
            'Thisisa': 'This is a',
            'This isa': 'This is a',
            'toour': 'to our',
            'aboutour': 'about our',
            'GRANDOPENING': 'GRAND OPENING',
            'SO OFF': '50% OFF',
            'SOOFF': '50% OFF',
            'Pythonm': 'Python',
        }
        
        for incorrect, correct in direct_fixes.items():
            text = text.replace(incorrect, correct)
        
        patterns = [
            (r'\bisa\b', 'is a'),
            (r'([a-z])([A-Z])', r'\1 \2'),
            (r'([a-zA-Z])(\d)', r'\1 \2'),
            (r'(\d)([a-zA-Z])', r'\1 \2'),
        ]
        
        for pattern, replacement in patterns:
            text = re.sub(pattern, replacement, text)
        
        common_errors = {
            ' tbe ': ' the ',
            ' amd ': ' and ',
            ' isa ': ' is a '
        }
        
        for error, correction in common_errors.items():
            text = text.replace(error, correction)
        
        text = ' '.join(text.split())
        
        return text
    
    def fix_text(self, text: str, language: Language) -> str:
        """
        Apply language-specific text corrections.
        
        Args:
            text (str): Raw text extracted from OCR.
            language (Language): Language enum member.
            
        Returns:
            str: Corrected text.
        """
        if not text:
            return ""
        
        if language == Language.ENGLISH:
            return self._fix_english_text(text)
        
        language_fixers = {
            Language.CHINESE: lambda t: t,
            Language.JAPANESE: lambda t: t,
            Language.KOREAN: lambda t: t,
            Language.HINDI: lambda t: t
        }
        
        fixer = language_fixers.get(language, lambda t: t)
        return fixer(text)
    
    def _get_ocr_config(self, image_path: str) -> str:
        """
        Determine optimal OCR configuration based on image type.
        
        Args:
            image_path (str): Path to the image file.
            
        Returns:
            str: Tesseract configuration string.
        """
        image_lower = image_path.lower()
        
        for key, config in Config.OCR.CONFIGS.items():
            if key in image_lower:
                return config
        
        return Config.OCR.CONFIGS['default']
    
    def process(self, image_path: str, language: Language = Language.ENGLISH) -> Dict[str, str]:
        """
        Process an image to extract and optionally translate text.
        
        Note: Will still attempt to process even if language pack is missing
        (will error at runtime). Use available_languages to check first.
        
        Args:
            image_path (str): Path to the image file.
            language (Language): Source language enum member. Defaults to Language.ENGLISH.
            
        Returns:
            Dict[str, str]: Dictionary containing original, fixed, translated text and language.
                
        Raises:
            TypeError: If language is not a Language enum member.
            FileNotFoundError: If image file doesn't exist.
            RuntimeError: If language pack is not installed (when processing).
        """
        if not isinstance(language, Language):
            raise TypeError(
                "❌ Language must be a Language enum member. "
                "Use: Language.ENGLISH, Language.CHINESE, etc."
            )
        
        # Warn if using a missing language (but don't stop)
        if language in self.missing_languages:
            print(f"⚠️ Warning: {language.name} language pack may not be installed!")
            print("   This operation might fail.")
        
        if Config.Debug.VERBOSE:
            print(f"🔍 Processing image: {image_path}")
            print(f"🌐 Language: {language.name.lower()}")
        
        try:
            enhanced_path = self.enhance_image(image_path)
            
            lang_code = language.value
            config = self._get_ocr_config(image_path)
            
            if Config.Debug.VERBOSE:
                print(f"🔧 Using Tesseract config: {config}")
            
            raw_text = pytesseract.image_to_string(
                enhanced_path,
                lang=lang_code,
                config=config
            )
            
            fixed_text = self.fix_text(raw_text, language)
            
            if language != Language.ENGLISH and fixed_text:
                if Config.Debug.VERBOSE:
                    print("🌍 Translating to English...")
                translator = GoogleTranslator(source='auto', target='en')
                translated_text = translator.translate(fixed_text)
            else:
                translated_text = fixed_text
            
            if Config.Files.AUTO_CLEANUP and not Config.Debug.SAVE_ENHANCED:
                try:
                    os.remove(enhanced_path)
                    if Config.Debug.VERBOSE:
                        print(f"🗑️ Cleaned up: {enhanced_path}")
                except OSError:
                    pass  # Cleanup is best-effort; a leftover file is harmless
            
            result = {
                'original': raw_text,
                'fixed': fixed_text,
                'translated': translated_text,
                'language': language.name.lower()
            }
            
            if Config.Debug.VERBOSE:
                print("✅ Processing complete!")
            
            return result
            
        except Exception as e:
            if Config.Debug.DETAILED_ERRORS:
                print(f"❌ Error processing image: {str(e)}")
            raise


# Initialize the translator
print("\n" + "="*50)
print("🚀 Initializing Universal Translator v1.3...")
print("="*50)
translator = UniversalTranslator()



🚀 Initializing Universal Translator v1.3...

🔍 CHECKING LANGUAGE SUPPORT
✅ Tesseract found with 6 language packs

📋 Language Pack Status:
   ✅ ENGLISH    (eng       ) - Installed
   ✅ CHINESE    (chi_sim   ) - Installed
   ✅ JAPANESE   (jpn       ) - Installed
   ✅ KOREAN     (kor       ) - Installed
   ✅ HINDI      (hin       ) - Installed

✅ All language packs are installed!

✅ Universal Translator v1.3 initialized!
📚 Defined languages: english, chinese, japanese, korean, hindi
✅ Ready to use: english, chinese, japanese, korean, hindi
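
`_get_ocr_config` picks a Tesseract page segmentation mode by looking for a keyword in the image path. A standalone sketch of that lookup, with the same `CONFIGS` values. The function name `select_ocr_config` is hypothetical, and matching on the file name only (rather than the whole path, as the notebook does) is a deliberate change so that a folder called `documents/` does not affect the result:

```python
import os

# Same values as Config.OCR.CONFIGS in the notebook
CONFIGS = {
    'document': '--oem 3 --psm 6',
    'sign': '--oem 3 --psm 11',
    'screenshot': '--oem 3 --psm 3',
    'default': '--oem 3 --psm 3',
}


def select_ocr_config(image_path: str) -> str:
    """Return the config whose keyword appears in the file name."""
    name = os.path.basename(image_path).lower()
    for key, config in CONFIGS.items():
        if key != 'default' and key in name:
            return config
    return CONFIGS['default']


print(select_ocr_config("street_sign_01.jpg"))  # --oem 3 --psm 11
print(select_ocr_config("scans/invoice.png"))   # --oem 3 --psm 3
print(select_ocr_config("design_mockup.png"))   # --oem 3 --psm 11
```

The last call shows the cost of substring matching: "design" contains "sign", so the sparse-text mode is chosen by accident. Passing the image type as an explicit argument would be more predictable.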


## 🧪 Testing & Examples {#testing}
Test the translator with sample images.

In [6]:
# Test cell - Examples and demonstrations
print("🧪 TESTING ENUM FUNCTIONALITY")
print("=" * 50)

# Test 1: Show available languages
print("📚 Available languages:")
for lang in Language:
    print(f"  - Language.{lang.name}: Tesseract code = '{lang.value}'")

print("\n" + "=" * 50)

# Test 2: Create a simple test image (if you don't have one)
from PIL import Image, ImageDraw

def create_test_image(text: str, filename: str):
    """Create a simple test image with text."""
    img = Image.new('RGB', (400, 100), color='white')
    draw = ImageDraw.Draw(img)
    # Use default font
    draw.text((10, 30), text, fill='black')
    img.save(filename)
    print(f"✅ Created test image: {filename}")
    return filename

# Create a test image
test_file = create_test_image("Hello World! This is a test.", "test_english.jpg")

print("\n" + "=" * 50)

# Test 3: Process with enum
print("🔍 Testing translation with Language.ENGLISH:")
try:
    result = translator.process(test_file, Language.ENGLISH)
    print(f"✅ Success! Extracted: '{result['fixed'][:50]}...'")
except Exception as e:
    print(f"❌ Error: {e}")

print("\n" + "=" * 50)

# Test 4: Test error handling (wrong type)
print("🔍 Testing error handling (passing string instead of enum):")
try:
    result = translator.process(test_file, "english")  # type: ignore  # This should fail!
    print("❌ Should have raised TypeError!")
except TypeError as e:
    print(f"✅ Correctly caught error: {e}")

print("\n" + "=" * 50)
print("✅ All enum tests complete!")

🧪 TESTING ENUM FUNCTIONALITY
📚 Available languages:
  - Language.ENGLISH: Tesseract code = 'eng'
  - Language.CHINESE: Tesseract code = 'chi_sim'
  - Language.JAPANESE: Tesseract code = 'jpn'
  - Language.KOREAN: Tesseract code = 'kor'
  - Language.HINDI: Tesseract code = 'hin'

✅ Created test image: test_english.jpg

🔍 Testing translation with Language.ENGLISH:
🔍 Processing image: test_english.jpg
🌐 Language: english
✅ Image enhanced: enhanced_test_english.jpg
🔧 Using Tesseract config: --oem 3 --psm 3
✅ Processing complete!
✅ Success! Extracted: 'Hello World! This is a test...'

🔍 Testing error handling (passing string instead of enum):
✅ Correctly caught error: ❌ Language must be a Language enum member. Use: Language.ENGLISH, Language.CHINESE, etc.

✅ All enum tests complete!
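
The English cleanup exercised in Test 3 relies on a few regular expressions that insert spaces at case and letter/digit boundaries. A standalone sketch of just those patterns, copied from `_fix_english_text`; the function name `split_run_together` is hypothetical:

```python
import re

PATTERNS = [
    (r'\bisa\b', 'is a'),           # "isa" as a whole word
    (r'([a-z])([A-Z])', r'\1 \2'),  # lowercase followed by uppercase
    (r'([a-zA-Z])(\d)', r'\1 \2'),  # letter followed by digit
    (r'(\d)([a-zA-Z])', r'\1 \2'),  # digit followed by letter
]


def split_run_together(text: str) -> str:
    """Apply the patterns in order, then collapse repeated whitespace."""
    for pattern, replacement in PATTERNS:
        text = re.sub(pattern, replacement, text)
    return ' '.join(text.split())


print(split_run_together("helloWorld2024edition"))  # hello World 2024 edition
print(split_run_together("This isa test"))          # This is a test
print(split_run_together("iPhone 3D"))              # i Phone 3 D
```

The last line is a reminder that these rules are heuristics: they also split tokens such as product names and "3D" that were correct to begin with.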


In [7]:
# Quick check what's available
import subprocess
try:
    result = subprocess.run(['tesseract', '--list-langs'], capture_output=True, text=True)
    installed = result.stdout.lower()
    
    for lang in Language:
        if lang.value.split('+')[0] in installed:
            print(f"✅ {lang.name} is ready to use")
        else:
            print(f"❌ {lang.name} needs Tesseract language pack: {lang.value}")
except OSError:  # e.g. FileNotFoundError when tesseract is not on PATH
    print("⚠️ Could not check Tesseract installation")

✅ ENGLISH is ready to use
✅ CHINESE is ready to use
✅ JAPANESE is ready to use
✅ KOREAN is ready to use
✅ HINDI is ready to use
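
The quick check above tests whether each code appears anywhere in the raw output, so `chi_sim` would also match a line such as `chi_sim_vert`. A sketch of an exact-match alternative follows. The helper names are hypothetical, and the sample header line is an assumption about what `tesseract --list-langs` prints; the first line is skipped either way, as in `_check_language_support`:

```python
def parse_installed_langs(stdout: str) -> set:
    """Turn `tesseract --list-langs` output into a set of language codes."""
    lines = stdout.strip().splitlines()
    # The first line is a header, not a language code
    return {line.strip() for line in lines[1:] if line.strip()}


def is_available(lang_value: str, installed: set) -> bool:
    """True only if every '+'-joined code (e.g. 'chi_sim+chi_tra') is installed."""
    return all(code in installed for code in lang_value.split('+'))


# Assumed sample output; the real header text may differ
sample = """List of available languages (3):
chi_sim_vert
eng
osd
"""
installed = parse_installed_langs(sample)
print(is_available('eng', installed))              # True
print(is_available('chi_sim', installed))          # False
print(is_available('chi_sim+chi_tra', installed))  # False
```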


In [8]:
# Enhanced Test Cell - Internet Images + Language Testing
import urllib.request
import os

print("🧪 ENHANCED TESTING WITH INTERNET IMAGES")
print("=" * 50)

# Test 1: Show available languages
print("📚 Available languages in your code:")
for lang in Language:
    print(f"  - Language.{lang.name}: Tesseract code = '{lang.value}'")

print("\n" + "=" * 50)

# Test 2: Download image from internet
def download_test_image(url: str, filename: str):
    """Download an image from URL for testing."""
    try:
        urllib.request.urlretrieve(url, filename)
        print(f"✅ Downloaded: {filename}")
        return filename
    except Exception as e:
        print(f"❌ Download failed: {e}")
        return None

print("📥 Downloading test images from internet...")

# Example test images (these are public domain/free to use)
test_images = {
    'english': {
        'url': 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf.jpg',
        'filename': 'test_english_web.jpg'
    },
    # Alternative: Simple text image
    'simple': {
        'url': 'https://via.placeholder.com/400x100/ffffff/000000?text=Hello+World+Test',
        'filename': 'test_simple.jpg'
    }
}

# Download a simple test image
test_url = 'https://via.placeholder.com/400x100/ffffff/000000?text=Hello+World+Test+123'
web_image = download_test_image(test_url, 'test_from_web.jpg')

print("\n" + "=" * 50)

# Test 3: Check what Tesseract languages are ACTUALLY installed
print("🔍 Checking installed Tesseract languages...")
try:
    import subprocess
    result = subprocess.run(['tesseract', '--list-langs'],
                          capture_output=True, text=True)
    if result.returncode == 0:
        langs = result.stdout.strip().split('\n')[1:]  # Skip header
        print("✅ Tesseract languages installed on your system:")
        for lang in langs:
            print(f"   - {lang}")
    else:
        print("⚠️ Could not check Tesseract languages")
except OSError:  # e.g. FileNotFoundError when tesseract is not on PATH
    print("⚠️ Tesseract command not accessible")

print("\n" + "=" * 50)

# Test 4: Process the downloaded image
if web_image:
    print(f"🔍 Processing downloaded image: {web_image}")
    try:
        result = translator.process(web_image, Language.ENGLISH)
        print(f"✅ Extracted text: '{result['fixed']}'")
    except Exception as e:
        print(f"❌ Error: {e}")

print("\n" + "=" * 50)

# Test 5: Create test images for different languages
def create_language_test_image(lang_name: str, sample_text: str, filename: str):
    """Create a test image with text in different languages."""
    from PIL import Image, ImageDraw
    img = Image.new('RGB', (500, 100), color='white')
    draw = ImageDraw.Draw(img)
    # Note: Default font may not support all characters
    try:
        draw.text((10, 30), sample_text, fill='black')
        img.save(filename)
        print(f"✅ Created {lang_name} test: {filename}")
        return filename
    except Exception as e:
        print(f"⚠️ Could not create {lang_name} image: {e}")
        return None

# Sample texts for testing
test_texts = {
    Language.ENGLISH: ("English", "Hello World! Testing 123", "test_en.jpg"),
    Language.CHINESE: ("Chinese", "你好世界！测试 123", "test_zh.jpg"),
    Language.JAPANESE: ("Japanese", "こんにちは世界！テスト", "test_ja.jpg"),
    Language.KOREAN: ("Korean", "안녕하세요 세계! 테스트", "test_ko.jpg"),
    Language.HINDI: ("Hindi", "नमस्ते दुनिया! परीक्षण", "test_hi.jpg")
}

print("📝 Creating test images for each language...")
for lang, (name, text, filename) in test_texts.items():
    img_file = create_language_test_image(name, text, filename)
    if img_file and os.path.exists(img_file):
        try:
            result = translator.process(img_file, lang)
            print(f"   Processed {name}: '{result['fixed'][:30]}...'")
        except Exception as e:
            print(f"   ⚠️ {name} processing failed: {e}")

print("\n" + "=" * 50)
print("✅ Testing complete!")

🧪 ENHANCED TESTING WITH INTERNET IMAGES
📚 Available languages in your code:
  - Language.ENGLISH: Tesseract code = 'eng'
  - Language.CHINESE: Tesseract code = 'chi_sim'
  - Language.JAPANESE: Tesseract code = 'jpn'
  - Language.KOREAN: Tesseract code = 'kor'
  - Language.HINDI: Tesseract code = 'hin'

📥 Downloading test images from internet...
❌ Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>

🔍 Checking installed Tesseract languages...
✅ Tesseract languages installed on your system:
   - chi_sim
   - eng
   - hin
   - jpn
   - kor
   - osd


📝 Creating test images for each language...
✅ Created English test: test_en.jpg
🔍 Processing image: test_en.jpg
🌐 Language: english
✅ Image enhanced: enhanced_test_en.jpg
🔧 Using Tesseract config: --oem 3 --psm 3
✅ Processing complete!
   Processed English: 'Hello World! Testing 123...'
✅ Created Chinese test: test_zh.jpg
🔍 Processing image: test_zh.jpg
🌐 Language: chinese

## 📚 Development Notes {#notes}

### ✅ Completed Features:
- Add notes Here

### 🔄 Future Improvements:
- Add notes Here

### 📖 Change Log:
- Add notes Here

### 🐛 Known Issues:
- Add notes Here

### 📚 References:
- Add notes Here