# 🌍 Universal Translator v1.2
## Clean Implementation with PEP 8 Standards

**Author:** Victor  
**Version:** 1.2  
**Date:** October 25, 2025  
**Status:** In progress 

### 📋 What's New in v1.2:
- ✅ PEP 8 compliant code
- ✅ Ruff linting for code quality  
- ✅ Proper documentation
- ✅ Type hints added
- ✅ Better organization

### Previous Versions:
- v1.1: See V1.1/ folder  
- v1.0: See V1.0/ folder

## 🔧 Setup & Installation {#setup}
Run these cells once to set up your environment.
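
Note that `pip install pytesseract` installs only the Python wrapper. The Tesseract engine itself, plus language data for each code used in this notebook (`eng`, `chi_sim`, `jpn`, `kor`, `hin`), has to be installed separately. The sketch below is one way to check for both; the helper names are hypothetical, and it assumes `tesseract --list-langs` prints one header line starting with "List of" followed by one language code per line.

```python
import shutil
import subprocess

# Tesseract codes used by this notebook's language mapping
REQUIRED_LANGS = {"eng", "chi_sim", "jpn", "kor", "hin"}


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = (line.strip() for line in output.splitlines())
    # Skip blank lines and the "List of available languages ..." header
    return {line for line in lines if line and not line.startswith("List of")}


def missing_tesseract_langs(required=REQUIRED_LANGS):
    """Return required codes that are not installed.

    Returns None when no `tesseract` executable is found on PATH.
    """
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    proc = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Older Tesseract releases print the list on stderr instead of stdout
    return set(required) - parse_list_langs(proc.stdout or proc.stderr)


missing = missing_tesseract_langs()
if missing is None:
    print("Tesseract executable not found on PATH")
else:
    print(f"Missing language data: {sorted(missing) or 'none'}")
```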

In [None]:
# Install required packages
!pip install ruff deep-translator pytesseract pillow

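# Verify installations by checking that each module can be found
import importlib.util
import sys

print(f"✅ Python version: {sys.version}")
# Import names can differ from pip names (pillow -> PIL)
for module in ("pytesseract", "deep_translator", "PIL"):
    status = "✅" if importlib.util.find_spec(module) else "❌"
    print(f"{status} {module}")
print("📦 Requested: ruff, deep-translator, pytesseract, pillow")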

In [None]:
# Create ruff configuration
ruff_config = """
# Ruff configuration for v1.2
[tool.ruff]
line-length = 79
fix = true

[tool.ruff.lint]
select = ["E", "F", "W", "I", "N", "UP"]
ignore = []
"""

# Save config for reference only: Ruff does not read this file, so the
# same rule selection is passed on the command line below
with open('ruff_settings.txt', 'w') as f:
    f.write(ruff_config)

# Run Ruff check (run this AFTER adding main code)
!ruff check --line-length 79 --select E,F,W,I,N,UP
print("✅ Code quality check complete!")
print("💡 Run this cell again after writing main code to check for issues")

## 💻 Main Implementation {#implementation}
### UniversalTranslator Class - Test Ready
PEP 8 compliant implementation with comprehensive documentation.

In [None]:
"""
Universal Translator Module v1.2
PEP 8 compliant implementation for image text extraction and translation
"""

# Standard library imports
import re
from typing import Dict

# Third-party imports
import pytesseract
from deep_translator import GoogleTranslator
from PIL import Image, ImageEnhance, ImageFilter

# Module information
__version__ = "1.2"
__author__ = "Victor"
__date__ = "October 25, 2025"

print(f"📚 Universal Translator Module v{__version__} loaded")
print(f"👤 Author: {__author__}")

In [None]:
class UniversalTranslator:
    """
    A universal translator for extracting and translating text from images.
    
    This class supports text extraction from images in multiple languages
    including English, Chinese, Japanese, Korean, and Hindi. It includes
    image enhancement capabilities and text correction algorithms.
    
    Attributes:
        language_codes (Dict[str, str]): Mapping of language names to 
            Tesseract language codes.
        supported_languages (List[str]): List of supported languages.
    """
    
    # Class constants
    SUPPORTED_LANGUAGES = [
        'english', 'chinese', 'japanese', 'korean', 'hindi'
    ]
    IMAGE_SCALE_FACTOR = 3
    CONTRAST_ENHANCEMENT = 2.5
    BRIGHTNESS_ENHANCEMENT = 1.2
    
    def __init__(self) -> None:
        """
        Initialize the UniversalTranslator.
        
        Sets up language code mappings for Tesseract OCR.
        """
        self.language_codes: Dict[str, str] = {
            'english': 'eng',
            'chinese': 'chi_sim',
            'japanese': 'jpn',
            'korean': 'kor',
            'hindi': 'hin'
        }
        self.supported_languages = list(self.language_codes.keys())
        self._setup_complete()
    
    def _setup_complete(self) -> None:
        """Print initialization confirmation."""
        print("✅ Universal Translator initialized!")
        print(f"📚 Supported languages: {', '.join(self.supported_languages)}")
    
    def enhance_image(self, image_path: str) -> str:
        """
        Enhance image quality for better OCR results.
        
        Args:
            image_path (str): Path to the input image file.
            
        Returns:
            str: Path to the enhanced image file.
            
        Raises:
            FileNotFoundError: If the image file doesn't exist.
            IOError: If the image cannot be processed.
        """
        try:
            # Open and convert to grayscale
            img = Image.open(image_path)
            img = img.convert('L')
            
            # Upscale image for better OCR accuracy
            width, height = img.size
            new_size = (
                width * self.IMAGE_SCALE_FACTOR,
                height * self.IMAGE_SCALE_FACTOR
            )
            img = img.resize(new_size, Image.Resampling.LANCZOS)
            
            # Apply contrast enhancement
            contrast_enhancer = ImageEnhance.Contrast(img)
            img = contrast_enhancer.enhance(self.CONTRAST_ENHANCEMENT)
            
            # Apply brightness enhancement
            brightness_enhancer = ImageEnhance.Brightness(img)
            img = brightness_enhancer.enhance(self.BRIGHTNESS_ENHANCEMENT)
            
            # Apply sharpening filters
            for _ in range(2):
                img = img.filter(ImageFilter.SHARPEN)
            
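            # Save the enhanced image in the working directory. Only the
            # file name is prefixed, so inputs such as "images/a.png" work.
            from pathlib import Path  # local import: cell stays standalone

            enhanced_path = f"enhanced_{Path(image_path).name}"
            img.save(enhanced_path)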
            
            print(f"✅ Image enhanced: {enhanced_path}")
            return enhanced_path
            
        except FileNotFoundError as e:
            error_msg = f"❌ Image file not found: {image_path}"
            print(error_msg)
            raise FileNotFoundError(error_msg) from e
        except Exception as e:
            error_msg = f"❌ Error processing image: {str(e)}"
            print(error_msg)
            raise IOError(error_msg) from e
    
    def _fix_english_text(self, text: str) -> str:
        """
        Apply English-specific text corrections.
        
        Args:
            text (str): Raw text to be corrected.
            
        Returns:
            str: Corrected text.
        """
        if not text:
            return ""
        
        # Dictionary of known OCR errors and corrections
        direct_fixes = {
            'Helloworld': 'Hello World',
            'HelloWorld': 'Hello World',
            'Thisisa': 'This is a',
            'This isa': 'This is a',
            'toour': 'to our',
            'aboutour': 'about our',
            'GRANDOPENING': 'GRAND OPENING',
            'SO OFF': '50% OFF',
            'SOOFF': '50% OFF',
            'Pythonm': 'Python',
        }
        
        # Apply direct replacements
        for incorrect, correct in direct_fixes.items():
            text = text.replace(incorrect, correct)
        
        # Pattern-based corrections
        patterns = [
            (r'\bisa\b', 'is a'),
            (r'([a-z])([A-Z])', r'\1 \2'),
            (r'([a-zA-Z])(\d)', r'\1 \2'),
            (r'(\d)([a-zA-Z])', r'\1 \2'),
        ]
        
        for pattern, replacement in patterns:
            text = re.sub(pattern, replacement, text)
        
        # Fix common OCR errors
        common_errors = {
            ' tbe ': ' the ',
            ' amd ': ' and ',
            ' isa ': ' is a '
        }
        
        for error, correction in common_errors.items():
            text = text.replace(error, correction)
        
        # Clean up extra whitespace
        text = ' '.join(text.split())
        
        return text
    
    def fix_text(self, text: str, language: str) -> str:
        """
        Apply language-specific text corrections.
        
        Args:
            text (str): Raw text extracted from OCR.
            language (str): Language of the text.
            
        Returns:
            str: Corrected text.
        """
        if not text:
            return ""
        
        if language == 'english':
            return self._fix_english_text(text)
        
        # TODO: Implement fixes for other languages
        # Placeholder for future implementations
        language_fixers = {
            'chinese': lambda t: t,   # Future: _fix_chinese_text
            'japanese': lambda t: t,  # Future: _fix_japanese_text
            'korean': lambda t: t,    # Future: _fix_korean_text
            'hindi': lambda t: t      # Future: _fix_hindi_text
        }
        
        fixer = language_fixers.get(language, lambda t: t)
        return fixer(text)
    
    def _get_ocr_config(self, image_path: str) -> str:
        """
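        Choose an OCR configuration from keywords in the image path.

        The path is searched for 'document', 'sign', or 'screenshot';
        anything else falls back to automatic page segmentation.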
        
        Args:
            image_path (str): Path to the image file.
            
        Returns:
            str: Tesseract configuration string.
        """
        image_lower = image_path.lower()
        
        # Configuration based on image type
        configs = {
            'document': r'--oem 3 --psm 6',   # Uniform text block
            'sign': r'--oem 3 --psm 11',      # Sparse text
            'screenshot': r'--oem 3 --psm 3',  # Automatic
            'default': r'--oem 3 --psm 3'      # Automatic
        }
        
        for key, config in configs.items():
            if key in image_lower:
                return config
        
        return configs['default']
    
    def process(
        self,
        image_path: str,
        language: str = 'english'
    ) -> Dict[str, str]:
        """
        Process an image to extract and optionally translate text.
        
        Args:
            image_path (str): Path to the image file.
            language (str, optional): Source language. Defaults to 'english'.
            
        Returns:
            Dict[str, str]: Dictionary containing:
                - 'original': Raw extracted text
                - 'fixed': Corrected text
                - 'translated': English translation; identical to
                  'fixed' when the source language is English
                - 'language': Source language
                
        Raises:
            ValueError: If unsupported language is specified.
            FileNotFoundError: If image file doesn't exist.
        """
        # Validate language
        if language not in self.language_codes:
            raise ValueError(
                f"❌ Unsupported language: {language}. "
                f"Supported: {', '.join(self.language_codes.keys())}"
            )
        
        print(f"🔍 Processing image: {image_path}")
        print(f"🌐 Language: {language}")
        
        try:
            # Step 1: Enhance image
            enhanced_path = self.enhance_image(image_path)
            
            # Step 2: Extract text with OCR
            lang_code = self.language_codes[language]
            config = self._get_ocr_config(image_path)
            
            print(f"🔧 Using Tesseract config: {config}")
            raw_text = pytesseract.image_to_string(
                enhanced_path,
                lang=lang_code,
                config=config
            )
            
            # Step 3: Apply text corrections
            fixed_text = self.fix_text(raw_text, language)
            
            # Step 4: Translate if necessary
            if language != 'english' and fixed_text:
                print("🌍 Translating to English...")
                translator = GoogleTranslator(source='auto', target='en')
                translated_text = translator.translate(fixed_text)
            else:
                translated_text = fixed_text
            
            result = {
                'original': raw_text,
                'fixed': fixed_text,
                'translated': translated_text,
                'language': language
            }
            
            print("✅ Processing complete!")
            return result
            
        except Exception as e:
            print(f"❌ Error processing image: {str(e)}")
            raise

# Initialize the translator
print("\n" + "="*50)
print("🚀 Initializing Universal Translator v1.2...")
print("="*50)
translator = UniversalTranslator()

## 🧪 Testing & Examples {#testing}
Test the translator with sample images.

In [None]:
# Test cell - Examples and demonstrations

def test_translator():
    """Test the translator with a sample image."""
    
    # Example usage (uncomment and modify as needed)
    """
    # Test with English text
    result = translator.process('english_test.png', 'english')
    
    print("📄 Test Results:")
    print("-" * 40)
    print(f"Original text: {result['original'][:100]}...")
    print(f"Fixed text: {result['fixed'][:100]}...")
    print(f"Language: {result['language']}")
    """
    
    print("📝 Test function ready!")
    print("Uncomment the code above and add your test image path")

# Call test function
test_translator()

# Quick test of core functions
print("\n📋 Core Functions Check:")
print(f"✅ Supported languages: {translator.supported_languages}")
print(f"✅ Language codes: {translator.language_codes}")

## 📚 Development Notes {#notes}

### ✅ Completed Features:
- [x] PEP 8 compliant code structure
- [x] Ruff integration for code quality
- [x] Type hints throughout
- [x] Comprehensive documentation
- [x] Error handling
- [x] English text correction algorithms
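
The pattern-based part of the English correction can be tried without any image. The sketch below copies the three splitting rules from `_fix_english_text` into a standalone function (the name `split_joined_words` is hypothetical) and shows what they do to a few strings, including one false positive.

```python
import re

# The three splitting rules from _fix_english_text, copied for illustration
SPLIT_PATTERNS = [
    (r'([a-z])([A-Z])', r'\1 \2'),   # lowercase followed by uppercase
    (r'([a-zA-Z])(\d)', r'\1 \2'),   # letter followed by digit
    (r'(\d)([a-zA-Z])', r'\1 \2'),   # digit followed by letter
]


def split_joined_words(text):
    """Apply the splitting rules in order, then collapse whitespace."""
    for pattern, replacement in SPLIT_PATTERNS:
        text = re.sub(pattern, replacement, text)
    return ' '.join(text.split())


print(split_joined_words("openingSoon"))  # opening Soon
print(split_joined_words("Room42b"))      # Room 42 b
print(split_joined_words("iPhone"))       # i Phone (a false positive)
```

The last case shows that the lowercase-to-uppercase rule also splits legitimate mixed-case words, so brand names may need an exception list.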

### 🔄 Future Improvements:
- [ ] Add text correction for other languages
- [ ] Implement batch processing
- [ ] Add progress bars for long operations
- [ ] Create a graphical user interface
- [ ] Add unit tests
- [ ] Implement logging instead of print statements

### 📖 Change Log:
- **v1.2** (Oct 25, 2025): Complete rewrite with PEP 8 standards
- **v1.1**: Improved text corrections (see V1.1/ folder)
- **v1.0**: Initial implementation (see V1.0/ folder)

### 🐛 Known Issues:
- Enhanced images are saved in the working directory (consider temp files)
- Some complex text layouts may need PSM tuning
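
For the first issue, one possible approach is to write the enhanced copy to a temporary file that is removed once OCR has finished. The sketch below uses only the standard library; `temporary_enhanced_path` is a hypothetical helper, and the placeholder write stands in for `img.save(...)`.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_enhanced_path(image_path):
    """Yield a temp-file path for the enhanced copy; delete it afterwards."""
    # Keep the original extension so the image library picks the format
    suffix = Path(image_path).suffix or ".png"
    fd, temp_path = tempfile.mkstemp(prefix="enhanced_", suffix=suffix)
    os.close(fd)  # only the path is needed; the caller writes the file
    try:
        yield temp_path
    finally:
        if os.path.exists(temp_path):
            os.remove(temp_path)


with temporary_enhanced_path("images/sign.png") as path:
    # In the notebook this would be img.save(path), then OCR on path
    Path(path).write_bytes(b"placeholder")
    print(f"Temporary file exists: {os.path.exists(path)}")
print(f"Cleaned up: {not os.path.exists(path)}")
```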

### 📚 References:
- [Tesseract Documentation](https://github.com/tesseract-ocr/tesseract)
- [PEP 8 Style Guide](https://pep8.org/)
- [Ruff Documentation](https://docs.astral.sh/ruff/)