A comprehensive Python toolkit designed to automate common localization tasks for language professionals. This collection of scripts streamlines word counting, cost estimation, and quality assurance across multiple file formats.
This toolkit was created by a localization project manager to address real-world workflow challenges in the gaming and software localization industry. Whether you're managing translation projects, performing QA, or analyzing content volume, these tools help automate repetitive tasks and improve accuracy.
- Multi-Format Counter (multi_format_counter.py) - Count words across various file formats (.txt, .json, .xml/.xlf, .docx, .pdf) in a single operation
- GUI Counter (gui_counter.py) - Desktop application with file uploader, visual word count analysis, and Excel report export
- Streamlit Counter (streamlit_counter.py) - Web-based interface for word count analysis with CSV export
- Advanced Streamlit Counter (streamlit_counter_advanced_counter.py) - Enhanced web interface with target language-specific cost calculations and Excel export
- QA Auditor (qa_tools/qa_auditor.py) - Automated quality checks comparing source and target files:
  - Missing placeholder detection
  - String length validation for UI constraints
  - HTML tag corruption checks
  - Excel report generation with flagged issues
- Sample File Generator (excel_counter/create_sample_excel_files.py) - Creates sample Excel files for testing (game strings: skill descriptions, dialogues, UI strings)
- Excel Column Counter (excel_counter/excel_column_counter_with_tag_stripping.py) - Counts words from source columns while ignoring markup tags, with cost calculation and Excel export
Python 3.7+
Clone this repository:
git clone https://github.com/InYoungee/localization-workflow-toolkit.git
cd localization-workflow-toolkit
Install the required dependencies:
pip install -r requirements.txt
Multi-Format Counter
python word_counter/multi_format_counter.py
Batch-process and count words across multiple file formats (.txt, .json, .xml/.xlf, .docx, .pdf) in a single operation. Ideal for quickly analyzing diverse localization file types without format-specific tools.
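A minimal sketch of how a multi-format count could work, using only the standard library. It covers .txt, .json, and .xml/.xlf; .docx and .pdf need third-party parsers and are left out. The function names and the whitespace-based word rule are assumptions for illustration, not the script's actual code.

```python
import json
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Iterable, Iterator

# Assumption: a "word" is any run of non-whitespace characters.
WORD_RE = re.compile(r"\S+")


def iter_json_strings(node) -> Iterator[str]:
    """Yield every string value (not key) in a nested JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_json_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_json_strings(item)


def extract_text(raw: str, suffix: str) -> str:
    """Pull the countable text out of a file's contents based on its extension."""
    suffix = suffix.lower()
    if suffix == ".txt":
        return raw
    if suffix == ".json":
        return " ".join(iter_json_strings(json.loads(raw)))
    if suffix in {".xml", ".xlf"}:
        # Join text nodes with spaces so adjacent elements don't fuse into one word.
        return " ".join(ET.fromstring(raw).itertext())
    raise ValueError(f"unsupported format: {suffix}")


def count_text(raw: str, suffix: str) -> int:
    """Count words in raw file contents of the given format."""
    return len(WORD_RE.findall(extract_text(raw, suffix)))


def count_files(paths: Iterable) -> Dict[str, int]:
    """Map each file name to its word count."""
    counts = {}
    for path in map(Path, paths):
        counts[path.name] = count_text(path.read_text(encoding="utf-8"), path.suffix)
    return counts
```

For .xlf files this counts every text node, including `<target>` segments, so a production counter would likely restrict itself to `<source>` elements. Whitespace splitting also treats unspaced Japanese text as a single word, so character-based counting is usually preferred for such languages.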
GUI Counter (Desktop Application)
python word_counter/gui_counter.py
Upload files through the interface, view the word count analysis, and export Excel reports.
Streamlit Web Counter
streamlit run word_counter/streamlit_counter.py
Access the web interface at http://localhost:8501 to upload files and export CSV reports.
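A CSV report like the one the Streamlit counter exports could be built with the standard `csv` module. This is a sketch under assumptions: the `counts_to_csv` helper, the column names, and the TOTAL row are hypothetical.

```python
import csv
import io
from typing import Mapping


def counts_to_csv(counts: Mapping[str, int]) -> str:
    """Render a {file name: word count} mapping as CSV text with a total row."""
    buffer = io.StringIO()
    # Use "\n" rather than csv's default "\r\n" line endings.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "words"])
    for name, words in sorted(counts.items()):
        writer.writerow([name, words])
    writer.writerow(["TOTAL", sum(counts.values())])
    return buffer.getvalue()
```

In a Streamlit app, the returned string could be passed to `st.download_button` as its `data` argument.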
Advanced Counter with Cost Calculation
streamlit run word_counter/streamlit_counter_advanced_counter.py
Includes target-language-specific pricing and Excel export.
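Target-language pricing comes down to word count multiplied by a per-word rate for each language. The rates and the `estimate_costs` helper below are hypothetical placeholders, not the app's actual values; `Decimal` is used instead of `float` so amounts round predictably to cents.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict, Mapping

# Hypothetical per-word rates; real rates vary by vendor, language pair, and content type.
RATES_PER_WORD = {
    "en-US": Decimal("0.10"),
    "ja-JP": Decimal("0.12"),
}


def estimate_costs(word_count: int, rates: Mapping[str, Decimal]) -> Dict[str, Decimal]:
    """Return the estimated cost for each target language, rounded half-up to cents."""
    return {
        lang: (rate * word_count).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        for lang, rate in rates.items()
    }
```

For 1,250 source words, this gives 125.00 for en-US and 150.00 for ja-JP.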
Run QA Auditor
python qa_tools/qa_auditor.py
Checks target files against source files and generates detailed Excel reports with flagged issues.
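The placeholder and length checks listed for the QA Auditor could look roughly like the sketch below. It assumes source and target are flat {string ID: text} mappings (for example, loaded with `json.load`); the placeholder regex, the `audit` function, and the single character limit are illustrative assumptions rather than the auditor's real logic.

```python
import re
from collections import Counter
from typing import Dict, List

# Assumed placeholder syntaxes: ${variable}, {0} / {name}, and printf-style %s / %d.
PLACEHOLDER_RE = re.compile(r"\$\{\w+\}|\{\w*\}|%[sd]")


def audit(source: Dict[str, str], target: Dict[str, str], max_len: int) -> List[dict]:
    """Flag missing translations, missing placeholders, and over-length targets."""
    issues = []
    for string_id, src in source.items():
        tgt = target.get(string_id)
        if tgt is None:
            issues.append({"id": string_id, "issue": "missing translation"})
            continue
        # Counter subtraction keeps only placeholders the target has fewer of than the source.
        missing = Counter(PLACEHOLDER_RE.findall(src)) - Counter(PLACEHOLDER_RE.findall(tgt))
        if missing:
            issues.append({"id": string_id, "issue": "missing placeholder", "detail": sorted(missing)})
        if len(tgt) > max_len:
            issues.append({"id": string_id, "issue": "too long", "detail": len(tgt)})
    return issues
```

Length here is counted in characters; real UI constraints may be measured in pixels or bytes, which would need a different measure.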
Generate Sample Files
python excel_counter/create_sample_excel_files.py
Creates sample game localization files (KO→EN & JP) with string IDs and info comments.
Count Excel Column Words
python excel_counter/excel_column_counter_with_tag_stripping.py
Analyzes source-column word counts while stripping markup tags, and includes cost estimation.
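The tag-stripping count could be sketched as below, reusing the `<[^>]+>` tag-removal pattern listed in this README. Reading the sheet is left out; with pandas, a source column could be loaded via `pandas.read_excel(path)["Source"]`, where the column name "Source" and both helper functions are hypothetical.

```python
import re
from typing import Iterable

# The HTML tag removal pattern listed in this README.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Remove markup tags but keep the text between them."""
    return TAG_RE.sub("", text)


def count_column_words(cells: Iterable[object]) -> int:
    """Sum whitespace-delimited words across a column, ignoring markup tags."""
    total = 0
    for cell in cells:
        # Skip empty cells (None, or NaN from pandas) and any non-text values.
        if not isinstance(cell, str):
            continue
        total += len(strip_tags(cell).split())
    return total
```

Tags are removed without inserting a space, so `<b>Fire</b>ball` counts as one word; the trade-off is that a tag separating two words, such as `Deal<br>damage`, merges them, so line-break tags may need special handling.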
localization-workflow-toolkit/
├── word_counter/
│ ├── test_files # Sample files
│ ├── gui_counter.py
│ ├── multi_format_counter.py
│ ├── streamlit_advanced_counter.py
│ └── streamlit_counter.py
├── qa_tools/
│ ├── qa_auditor.py
│ ├── qa_en-US.json # Sample file
│ └── qa_ko-KR.json # Sample file
├── excel_counter/
│ ├── sample_excel_files # Sample files
│ ├── create_sample_excel_files.py
│ └── excel_column_counter_with_tag_stripping.py
├── README.md
├── .gitignore
└── requirements.txt
- Project Managers: Quickly estimate translation costs and volume across multiple formats
- QA Engineers: Automate quality checks for common localization issues
- Freelance Translators: Calculate word counts and generate client reports
- Game Localization Teams: Process multi-column Excel files with tagged content
- Python 3.7+
- Streamlit (Web interfaces)
- Tkinter (GUI applications)
- pandas (Data processing)
- openpyxl (Excel operations)
This toolkit leverages regex extensively for:
- Tag Stripping: Removes HTML/XML tags while preserving text content for accurate word counts
- Placeholder Detection: Identifies patterns like {0}, %s, and ${variable} in QA checks
- Format Recognition: Automatically detects file formats and content structures
- String Validation: Checks for malformed tags and syntax errors
Example regex patterns used:
- HTML tag removal: <[^>]+>
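Checking for malformed tags needs more than the removal pattern, because opening and closing tags have to pair up. A minimal stack-based sketch, assuming simple HTML-like tags (including game-style tags such as `<color=#FF0000>`) and treating br, img, and hr as void tags; the function name and regex are illustrative, not the toolkit's own.

```python
import re
from typing import List

# Captures an optional leading slash, the tag name, and an optional self-closing slash.
TAG_TOKEN_RE = re.compile(r"<(/?)([A-Za-z][\w-]*)[^>]*?(/?)>")
VOID_TAGS = {"br", "img", "hr"}


def find_tag_errors(text: str) -> List[str]:
    """Report closing tags that don't match the innermost open tag, and unclosed tags."""
    errors = []
    stack = []
    for match in TAG_TOKEN_RE.finditer(text):
        closing, name, self_closing = match.groups()
        name = name.lower()
        if self_closing or name in VOID_TAGS:
            continue  # <br>, <br/>, <img .../> never need a partner
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            errors.append(f"unexpected </{name}>")
    # Anything still open at the end was never closed (innermost first).
    errors.extend(f"unclosed <{name}>" for name in reversed(stack))
    return errors
```

It does not catch a tag whose closing `>` is missing altogether (e.g. `<b text`), since the regex never matches it.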
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
Inyoung Kim
- LinkedIn: https://www.linkedin.com/in/inyoungee/
- GitHub: @InYoungee
Built from real-world experience in gaming localization to help the broader localization community work more efficiently.
If you find this toolkit helpful, please consider giving it a ⭐ on GitHub!
