🌍 Python Encoding Fixer

Automatically detect and repair character encoding corruption (mojibake) in text files. Reads UTF-8, Windows-1252 (also called CP1252), and ISO-8859-1 input and recognizes common corruption patterns in several languages.

🚀 Quick Start

# Preview changes (safe)
python fix_encoding.py --dry-run

# Fix all files (creates backups)
python fix_encoding.py

# Preview fixes for a specific directory
python fix_encoding.py --path "/path/to/project" --dry-run

⚡ What It Fixes

Transforms corrupted characters back to proper UTF-8 across languages:

❌ cafÃ©        → ✅ café        (French)
❌ espaÃ±ol     → ✅ español     (Spanish)  
❌ rÃ©sumÃ©     → ✅ résumé      (French)
❌ fÃ¼r         → ✅ für         (German)
❌ JosÃ©        → ✅ José        (Spanish)
❌ naÃ¯ve       → ✅ naïve       (English/French)
❌ donâ€™t      → ✅ don’t       (Typography)
❌ â‚¬50        → ✅ €50         (Euro symbol)
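
The replacements listed above can be pictured as a lookup table applied to each file's text. The following is a minimal sketch under stated assumptions, not the code in fix_encoding.py: the table is a small illustrative one (not the contents of the patterns/ files), and the name apply_patterns is hypothetical.

```python
# Assumption: a small illustrative set of intended characters,
# not the tool's real pattern files.
INTENDED = "éñüï’€"

# Each key is the intended character's UTF-8 bytes misread as Windows-1252,
# e.g. "é" becomes the two-character sequence "Ã©".
PATTERNS = {ch.encode("utf-8").decode("cp1252"): ch for ch in INTENDED}


def apply_patterns(text, patterns=PATTERNS):
    """Replace known corrupted sequences; return the new text and a count."""
    total = 0
    for bad, good in patterns.items():
        hits = text.count(bad)
        if hits:
            text = text.replace(bad, good)
            total += hits
    return text, total


# Build a corrupted sample the same way the corruption happens in practice.
corrupted = "café für José".encode("utf-8").decode("cp1252")
fixed, corrections = apply_patterns(corrupted)
print(fixed, corrections)
```

Counting the hits per pattern is what makes a report such as "3 times" possible in the tool's output.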

🎯 The Universal Problem

Encoding corruption appears when text written as UTF-8 is decoded as a legacy encoding (Windows-1252, ISO-8859-1), or the reverse, somewhere between two systems. This tool automatically detects and fixes the most common resulting patterns across multiple languages.

Common scenarios:

🔄 Legacy system migrations
🗄️ Database export/import with wrong charset
📁 File uploads with encoding misdetection
🌐 Mixed hosting environment transfers
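
The mismatch behind these scenarios can be reproduced in a few lines. The sketch below is illustrative only and is not taken from fix_encoding.py; the helper names corrupt and repair are hypothetical. Note that it reverses the fault with a whole-string round trip, which is a different technique from the pattern tables this tool describes.

```python
def corrupt(text):
    """Simulate the fault: UTF-8 bytes decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text):
    """Reverse the fault; return the input unchanged if it does not round-trip."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text that was never corrupted usually fails here, so leave it alone.
        return text


for word in ("café", "español", "für", "€50"):
    broken = corrupt(word)
    print(broken, "->", repair(broken))
```

The try/except matters: already-correct text such as "café" is not valid UTF-8 once re-encoded as Windows-1252, so the sketch returns it untouched instead of damaging it.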

✨ Features

🛡️ Safe: Creates automatic backups before any changes
🧪 Preview Mode: --dry-run shows changes without modifying files
🔍 Multi-Encoding Detection: Handles UTF-8, Windows-1252 (CP1252), and ISO-8859-1 input
🌐 Multi-Language: Built-in patterns for German, French, Spanish, and more
🎯 Smart Filtering: Only processes text files (.html, .htm, .php, .css, .js, .xml, .json)
🎨 Visual Feedback: Colored output shows exactly what gets fixed
📦 Zero Dependencies: Uses only Python standard library
🖥️ Cross-Platform: Works on Windows, macOS, and Linux
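
To illustrate how the backup and preview features might fit together, here is a minimal sketch using only the standard library. It is an assumption about one reasonable design, not the script's actual implementation; the function name write_fixed is hypothetical, and the .backup suffix is the one this README mentions.

```python
import shutil
import tempfile
from pathlib import Path


def write_fixed(path, new_text, dry_run=True):
    """Back up `path`, then overwrite it with `new_text` as UTF-8 without BOM.

    Returns the backup path, or None in dry-run mode (nothing is written).
    """
    path = Path(path)
    if dry_run:
        return None
    backup = path.with_name(path.name + ".backup")
    shutil.copy2(path, backup)  # copies contents and file metadata
    # Writing bytes avoids newline translation on Windows.
    path.write_bytes(new_text.encode("utf-8"))
    return backup


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "page.html"
    sample.write_bytes("cafÃ©".encode("utf-8"))
    preview = write_fixed(sample, "café", dry_run=True)
    backup = write_fixed(sample, "café", dry_run=False)
    result = sample.read_bytes().decode("utf-8")
    backup_name = backup.name
    backup_text = backup.read_bytes().decode("utf-8")
```

Defaulting dry_run to True in the sketch is a deliberate choice: a caller has to opt in before anything on disk changes.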

📋 Requirements

Python 3.6 or higher
Required libraries (all part of Python standard library):
- os - Operating system functions
- sys - System-specific parameters
- argparse - Command-line argument parsing
- shutil - File operations
- pathlib - Path operations

🔍 Auto-Check Feature: The script automatically verifies all dependencies and Python version compatibility on startup. If anything is missing, you'll get a clear error message with instructions.
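
A startup check of this kind could look roughly like the sketch below. It is an assumption about how such a check might be written, not the script's own code; check_requirements and the two constants are hypothetical names, while the module list and minimum version come from the requirements above.

```python
import importlib.util
import sys

REQUIRED_MODULES = ("os", "sys", "argparse", "shutil", "pathlib")
MIN_PYTHON = (3, 6)


def check_requirements(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or higher is required" % min_python)
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append("Missing module: " + name)
    return problems


problems = check_requirements()
if problems:
    print("\n".join(problems))
else:
    print("All required modules found.")
```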

No pip installs, no external dependencies, no hassle!

📁 Repository Structure

python-encoding-fixer/
├── fix_encoding.py           # Main script
├── README.md                 # This file
├── examples/
│   ├── corrupted/           # Sample files with encoding issues
│   └── fixed/               # Expected results after processing
├── patterns/
│   ├── languages.json      # Language-specific corruption patterns
│   └── common.json          # Universal patterns
└── docs/
    └── encoding-guide.md    # Technical background

🔧 Usage Examples

Basic Usage

# Check what would be fixed
python fix_encoding.py --dry-run

# Fix files in current directory
python fix_encoding.py

Advanced Usage

# Fix specific project directory
python fix_encoding.py --path "/var/www/multilingual-site" --dry-run
python fix_encoding.py --path "/var/www/multilingual-site"

# Process only specific file types
python fix_encoding.py --extensions .html,.php,.css
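
For readers curious how these options could be parsed, here is a minimal argparse sketch. The three flags are the ones shown in this README; the defaults (current directory, the file types listed under Technical Details) and the function name parse_args are assumptions, and the real script may differ.

```python
import argparse

DEFAULT_EXTENSIONS = ".html,.htm,.php,.css,.js,.xml,.json"


def parse_args(argv=None):
    """Parse the command line; `argv=None` means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Repair encoding corruption in text files."
    )
    parser.add_argument("--path", default=".", help="directory to scan")
    parser.add_argument("--dry-run", action="store_true",
                        help="show changes without modifying files")
    parser.add_argument("--extensions", default=DEFAULT_EXTENSIONS,
                        help="comma-separated list, e.g. .html,.php,.css")
    args = parser.parse_args(argv)
    # Normalize the comma-separated string into a set of lowercase suffixes.
    args.extensions = {
        ext.strip().lower() for ext in args.extensions.split(",") if ext.strip()
    }
    return args


args = parse_args(["--path", "/var/www/multilingual-site", "--dry-run",
                   "--extensions", ".html,.php,.css"])
print(args.path, args.dry_run, sorted(args.extensions))
```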

Sample Output

==================================================
     Python Encoding Fixer v2.0
==================================================

Checking system requirements...
✓ All required modules found.
✓ Python 3.9.7 is compatible.

Multi-platform encoding repair started...
Directory: ./website
Dry-Run Mode: False

Checking: contact.php
  → cafÃ© → café (3 times, French)
  → JosÃ© → José (1 time, Spanish)  
  → fÃ¼r → für (2 times, German)
  ✓ File repaired (6 corrections)

Checking: product_descriptions.html
  → â‚¬ → € (12 times, Euro symbol)
  → donâ€™t → don’t (4 times, Typography)
  ✓ File repaired (16 corrections)

=== SUMMARY ===
Files checked: 47
Files changed: 18
Total corrections: 127
Languages detected: German, French, Spanish

Backups created as .backup files.
All encoding issues resolved! ✓

🛡️ Safety & Recovery

Always create backups first! The script automatically creates .backup files, but you should also back up your entire project.

Restore from backups:

# Single file
cp file.php.backup file.php

# All files (Linux/Mac)
for backup in *.backup; do cp "$backup" "${backup%.backup}"; done

# All files (Windows Command Prompt; write %%f instead of %f inside a batch file)
for %f in (*.backup) do copy "%f" "%~nf"
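
Because the shell commands differ per platform, a restore step can also be written once in Python. The sketch below is an illustration, not part of the repository; restore_backups is a hypothetical name. Unlike the commands above, it also descends into subdirectories.

```python
import shutil
import tempfile
from pathlib import Path


def restore_backups(root, suffix=".backup"):
    """Copy every `*.backup` file under `root` over its original file."""
    restored = []
    for backup in sorted(Path(root).rglob("*" + suffix)):
        if backup.name == suffix:
            continue  # a file named exactly ".backup" has no original name
        target = backup.with_name(backup.name[: -len(suffix)])
        shutil.copy2(backup, target)
        restored.append(target.name)
    return restored


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "file.php"
    page.write_text("broken", encoding="utf-8")
    (Path(tmp) / "file.php.backup").write_text("original", encoding="utf-8")
    restored = restore_backups(tmp)
    content = page.read_text(encoding="utf-8")
```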

🎯 Common Use Cases

Legacy Website Migration: Fix encoding issues from old CMS systems
Database Export Cleanup: Repair corrupted text in SQL dumps
Multilingual Sites: Clean up encoding problems from mixed hosting environments
Content Management: Fix encoding issues in WordPress, Drupal, etc.
API Data Processing: Clean up text data from various sources

🔍 Technical Details

Supported Input Encodings

UTF-8 (with/without BOM)
Windows-1252, also called CP1252 (Windows Western European)
ISO-8859-1 (Latin-1)
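
One common way to handle these inputs is to try the strictest decoder first and fall back from there. The sketch below shows that idea under stated assumptions: it is not the script's detection logic, and decode_bytes is a hypothetical name. It relies on three facts about Python's codecs: utf-8-sig accepts UTF-8 with or without a BOM, cp1252 is the codec name for Windows-1252 and rejects the few bytes that encoding leaves undefined, and latin-1 (ISO-8859-1) accepts every byte.

```python
def decode_bytes(raw):
    """Decode `raw`, returning (text, name of the codec that worked)."""
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps all 256 byte values, so this last step cannot fail.
    return raw.decode("latin-1"), "latin-1"


print(decode_bytes(b"\xef\xbb\xbfcaf\xc3\xa9"))  # UTF-8 with BOM
print(decode_bytes(b"caf\xe9"))                  # legacy single-byte input
```

The order matters: almost any byte sequence decodes as Windows-1252 or Latin-1, so UTF-8 has to be tried first.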

File Types Processed

.php - PHP files
.html, .htm - HTML files
.css - Stylesheets
.js - JavaScript files
.xml - XML files
.json - JSON files
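
Selecting files by these extensions can be sketched with pathlib as follows. This is an illustration only; find_text_files and TEXT_EXTENSIONS are hypothetical names, and the case-insensitive match is an assumption rather than documented behavior.

```python
import tempfile
from pathlib import Path

TEXT_EXTENSIONS = {".php", ".html", ".htm", ".css", ".js", ".xml", ".json"}


def find_text_files(root, extensions=TEXT_EXTENSIONS):
    """Return all files under `root` whose final suffix is in `extensions`."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("index.html", "style.CSS", "photo.png", "index.html.backup"):
        (Path(tmp) / name).write_bytes(b"")
    found = [p.name for p in find_text_files(tmp)]
    print(found)
```

Only the final suffix is compared, so a file such as index.html.backup is skipped and backups are never processed a second time.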

Output

Always UTF-8 without BOM
Preserves file structure and permissions
Creates .backup files for safety

⚠️ Important Warnings & Disclaimers

🚨 USE AT YOUR OWN RISK - NO WARRANTY PROVIDED

⚠️ ALWAYS CREATE BACKUPS BEFORE RUNNING THE SCRIPT

⚠️ TEST WITH --dry-run FIRST TO PREVIEW CHANGES

⚠️ VERIFY RESULTS THOROUGHLY BEFORE DELETING BACKUP FILES

This tool performs automated text manipulation, which can have unexpected results. While extensively tested, encoding corruption patterns can be complex and context-dependent. The script creates automatic backups, but you should maintain your own backup strategy.

📋 Legal Disclaimer - No Warranty:

By using this software, you acknowledge that:

You use it entirely at your own risk
No warranty or guarantee is provided
You are responsible for data backup and verification
The developers are not liable for any data loss or corruption
This software is provided "AS IS" without any express or implied warranties

Always follow the safety workflow:

✅ Back up your entire project manually
✅ Run with --dry-run first to preview changes
✅ Test on a small subset of files
✅ Verify results before proceeding with full dataset
✅ Keep backup files until you're certain results are correct

🤝 Contributing

Found an encoding pattern that's not covered? Please open an issue with:

The corrupted text example
The expected correct text
Context (file type, source system, language)

Pull requests welcome for:

Additional language patterns
Performance improvements
Cross-platform compatibility enhancements

📄 License

MIT License - see LICENSE file for details.

🏷️ Repository Name Suggestion

Repo Name: python-encoding-fixer

Alternative Names:

multi-encoding-fixer
utf8-corruption-repair
text-encoding-cleaner

GitHub Description: "Python tool for automatic character encoding repair. Fixes corrupted UTF-8 text (cafÃ© → café, fÃ¼r → für) across multiple languages. Zero dependencies, cross-platform. Use at your own risk - always backup first!"

This tool addresses the universal challenge of character encoding corruption in multilingual text processing. Built with Python best practices for reliability and cross-platform compatibility.

Rigel-Computer/python-encoding-fixer
