A Python application to retrieve and analyze editor contributions across Wikipedia's Medicine projects in multiple languages.
- 🌍 Multi-language support for Wikipedia projects
- 📊 Editor statistics aggregation and analysis
- 📝 WikiText report generation
- 🔄 Batch processing for large datasets
- 🔐 Secure database connections via Toolforge
- 📈 Comprehensive logging and error handling

Requirements:

- Python 3.9 or higher
- Access to Wikimedia Toolforge
- `~/replica.my.cnf` credential file configured
```bash
git clone https://github.com/MrIbrahem/med-status.git
cd med-status
pip install -r requirements.txt
```

Ensure your `~/replica.my.cnf` file exists with the following format:
```ini
[client]
user=your_username
password=your_password
```

Run the complete analysis:
```bash
python start.py
```

Command-line options:

```bash
# Process specific languages only
python start.py --languages es,fr,de
# Set custom year
python start.py --year 2024
# Skip title retrieval (use existing data)
python start.py --skip-titles
# Generate reports only
python start.py --reports-only
# Enable debug logging
python start.py --log-level DEBUG
```

Project layout:

```
med-status/
├── start.py                  # Entry point
├── src/
│   ├── __init__.py
│   ├── services/
│   │   ├── __init__.py
│   │   ├── database.py       # Database connection management
│   │   ├── processor.py      # Data processing logic
│   │   ├── queries.py        # SQL query templates
│   │   └── reports.py        # Report generation
│   ├── workflow/
│   │   ├── __init__.py
│   │   ├── step1_retrieve_titles.py
│   │   ├── step2_process_languages.py
│   │   └── step3_generate_reports.py
│   ├── config.py             # Configuration
│   └── utils.py              # Helper functions
├── tests/
│   ├── unit/
│   │   ├── test_database.py
│   │   ├── test_processor.py
│   │   └── test_utils.py
│   └── integration/
│       ├── test_queries.py
│       └── test_workflow.py
├── languages/                # Article titles per language
├── editors/                  # Editor statistics per language
├── reports/                  # Generated WikiText reports
├── .github/
│   └── workflows/
│       ├── pytest.yml
│       └── lint.yml
├── pytest.ini
├── requirements.txt
├── requirements-dev.txt
├── README.md
├── LICENSE
└── .gitignore
```
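`start.py` is the entry point for the command-line options listed above. The following is a hypothetical sketch, not the project's actual code, of how those flags could be parsed with `argparse` and mapped onto the three `workflow/` steps. The names `language_list`, `parse_args` and `plan_steps`, the step labels, and the `"2024"` default (borrowed from the `LAST_YEAR` example in `config.py`) are all assumptions.

```python
import argparse
from typing import List, Optional

LOG_LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def language_list(value: str) -> List[str]:
    """Split a comma-separated flag value such as 'es,fr,de' into language codes."""
    return [code.strip() for code in value.split(",") if code.strip()]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the documented flags; help texts and defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Wikipedia Medicine editor statistics")
    parser.add_argument("--languages", type=language_list, default=None,
                        help="comma-separated language codes (default: all)")
    # Assumed default, mirroring LAST_YEAR in the example config.py.
    parser.add_argument("--year", default="2024", help="year to analyze")
    parser.add_argument("--skip-titles", action="store_true",
                        help="reuse existing title files instead of querying")
    parser.add_argument("--reports-only", action="store_true",
                        help="only regenerate reports from existing data")
    parser.add_argument("--log-level", default="INFO", choices=LOG_LEVELS)
    return parser.parse_args(argv)


def plan_steps(args: argparse.Namespace) -> List[str]:
    """Hypothetical mapping of flags onto the step1/step2/step3 workflow modules."""
    if args.reports_only:
        return ["generate_reports"]
    steps = [] if args.skip_titles else ["retrieve_titles"]
    return steps + ["process_languages", "generate_reports"]
```

With this shape, `start.py` would call `parse_args()`, pass `args.log_level` to `logging.basicConfig(level=...)` (which accepts level names such as `"DEBUG"`), and run the returned steps in order.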
Article titles per language (`languages/`):

```json
{
  "Article_Title_1": "Article Title 1",
  "Article_Title_2": "Article Title 2"
}
```

Editor statistics per language (`editors/`):

```json
{
  "Username1": 1234,
  "Username2": 856,
  "Username3": 421
}
```

Per-language WikiText report (`reports/`), shown for Spanish Wikipedia:

```
{| class="sortable wikitable"
!#
!User
!Count
|-
!1
|[[:w:es:user:Username1|Username1]]
|1,234
|-
!2
|[[:w:es:user:Username2|Username2]]
|856
|}
```

Global summary report, with an extra Wiki column:

```
{| class="sortable wikitable"
!#
!User
!Count
!Wiki
|-
!1
|[[:w:es:user:Username1|Username1]]
|1,234
|es
|}
```

Run the tests:

```bash
# Run all tests
pytest
# Run unit tests only
pytest tests/unit -m unit
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test file
pytest tests/unit/test_database.py -v
```

Check code quality:

```bash
# Format code
black src tests
# Sort imports
isort src tests
# Lint code
flake8 src tests
pylint src
# Type checking
mypy src
```

Install pre-commit hooks:
```bash
pip install pre-commit
pre-commit install
```

Edit `src/config.py` to customize:
- Target years for analysis
- Batch size for processing
- Output directories
- Database connection parameters
- Logging settings
```python
# Example config.py
LAST_YEAR = "2024"
BATCH_SIZE = 100
MAX_CONNECTIONS = 5
```

The workflow runs these steps in order:

- Retrieve Medicine titles from English Wikipedia
- Get language-to-database name mappings from `meta_p`
- Query editor statistics for each language
- Generate per-language reports in WikiText format
- Create global summary report across all languages
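The last two steps produce tables in the WikiText layout shown in the output examples. A minimal sketch of that rendering, assuming editor counts are held as `{username: count}` dicts per language (as in `editors/`). The names `render_table`, `language_report` and `global_report` are hypothetical, and keeping one global row per (user, wiki) pair, rather than summing a user's edits across wikis, is an assumption suggested by the Wiki column; ties are broken alphabetically, also an assumption.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int, str]  # (username, edit count, language code)


def render_table(rows: List[Row], with_wiki: bool = False) -> str:
    """Render ranked rows as a sortable wikitable; optionally add a Wiki column."""
    lines = ['{| class="sortable wikitable"', "!#", "!User", "!Count"]
    if with_wiki:
        lines.append("!Wiki")
    for rank, (user, count, lang) in enumerate(rows, start=1):
        lines += [
            "|-",
            f"!{rank}",
            f"|[[:w:{lang}:user:{user}|{user}]]",  # interwiki link to the user page
            f"|{count:,}",  # thousands separator, e.g. 1,234
        ]
        if with_wiki:
            lines.append(f"|{lang}")
    lines.append("|}")
    return "\n".join(lines)


def language_report(lang: str, counts: Dict[str, int]) -> str:
    """Per-language table, highest count first."""
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return render_table([(user, count, lang) for user, count in rows])


def global_report(per_lang: Dict[str, Dict[str, int]], limit: Optional[int] = None) -> str:
    """Global table across all languages, one row per (user, wiki) pair."""
    rows = [(user, count, lang) for lang, counts in per_lang.items()
            for user, count in counts.items()]
    rows.sort(key=lambda r: (-r[1], r[0], r[2]))
    return render_table(rows[:limit], with_wiki=True)
```

`language_report("es", {"Username1": 1234, "Username2": 856})` reproduces the per-language example table line for line.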
Error: max_user_connections exceeded
Solution: Use context managers (with statements) to ensure connections are properly closed.
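A minimal sketch of that pattern, assuming PyMySQL as the driver (its `connect` accepts `read_default_file` and reads `user` and `password` from the `[client]` section) and the Toolforge replica naming `<dbname>.analytics.db.svc.wikimedia.cloud` with database `<dbname>_p`; verify both against your own setup. `replica_params` and `open_cursor` are hypothetical helpers, and the connect function is passed in so the helper works with any DB-API driver.

```python
import os
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


def replica_params(dbname: str) -> Dict[str, Any]:
    """Keyword arguments for a wiki replica connection (assumed naming scheme)."""
    return {
        # Assumption: Toolforge's "<dbname>.analytics.db.svc.wikimedia.cloud" hosts.
        "host": f"{dbname}.analytics.db.svc.wikimedia.cloud",
        "database": f"{dbname}_p",
        # PyMySQL reads user/password from the [client] section of this file.
        "read_default_file": os.path.expanduser("~/replica.my.cnf"),
    }


@contextmanager
def open_cursor(connect: Callable[..., Any], **params: Any) -> Iterator[Any]:
    """Yield a cursor; close the cursor and the connection even if the body raises."""
    conn = connect(**params)
    try:
        cur = conn.cursor()
        try:
            yield cur
        finally:
            cur.close()
    finally:
        conn.close()
```

For example, `with open_cursor(pymysql.connect, **replica_params("eswiki")) as cur:` followed by `cur.execute(...)` releases the connection as soon as the block exits, so repeated queries do not accumulate open connections.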
Error: Query execution timeout
Solution: Reduce `BATCH_SIZE` in `src/config.py`, or increase the query timeout.
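Smaller batches mean shorter individual queries. A sketch of batching title lookups, assuming the standard MediaWiki replica schema (`revision`, `page`, `actor`) and PyMySQL-style `%s` placeholders; the SQL is illustrative and is not taken from `src/services/queries.py`, and `batched`, `build_queries` and `merge_counts` are hypothetical names. Because each batch groups by editor independently, per-batch counts must be summed afterwards.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Illustrative SQL against the standard MediaWiki replica schema.
# page_title values use underscores, as in the languages/ JSON keys.
EDITOR_COUNT_SQL = """
SELECT actor_name, COUNT(*) AS edits
FROM revision
JOIN page ON page_id = rev_page
JOIN actor ON actor_id = rev_actor
WHERE page_namespace = 0
  AND page_title IN ({placeholders})
  AND rev_timestamp LIKE %s
GROUP BY actor_name
"""


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_queries(titles: Sequence[str], year: str,
                  batch_size: int) -> List[Tuple[str, List[str]]]:
    """Return one (sql, params) pair per batch, with a placeholder for every value."""
    queries = []
    for chunk in batched(titles, batch_size):
        placeholders = ", ".join(["%s"] * len(chunk))
        # rev_timestamp is stored as YYYYMMDDHHMMSS, so "2024%" selects one year.
        sql = EDITOR_COUNT_SQL.format(placeholders=placeholders)
        queries.append((sql, chunk + [f"{year}%"]))
    return queries


def merge_counts(batches: Iterable[Iterable[Tuple[str, int]]]) -> Dict[str, int]:
    """Sum per-batch (editor, edits) rows; one editor can appear in several batches."""
    total: Counter = Counter()
    for rows in batches:
        for name, edits in rows:
            total[name] += edits
    return dict(total)
```

Each pair would be run as `cur.execute(sql, params)`, and the fetched rows from all batches passed to `merge_counts`.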
Error: Access denied for user
Solution: Verify ~/replica.my.cnf credentials and permissions.
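The file-side part of this check can be scripted. This hypothetical helper (`check_replica_cnf` is not part of the project) confirms that the option file exists, parses, contains `user` and `password` under `[client]`, and is not accessible to group or other users; database-side grants still have to be verified on the server. It assumes the INI-style layout shown under installation and uses `interpolation=None` because passwords may contain `%`, which `configparser` would otherwise treat as interpolation syntax.

```python
import configparser
import os
import stat
from typing import List


def check_replica_cnf(path: str = "~/replica.my.cnf") -> List[str]:
    """Return a list of problems found in a MySQL option file (empty if none)."""
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    # allow_no_value tolerates bare options; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    try:
        parser.read(path)
    except configparser.Error as exc:
        return [f"cannot parse {path}: {exc}"]
    problems = []
    if not parser.has_section("client"):
        problems.append("missing [client] section")
    else:
        for key in ("user", "password"):
            if not parser.get("client", key, fallback=None):
                problems.append(f"missing '{key}' in [client]")
    # Flag any read/write/execute bit for group or others.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"permissions {oct(mode)} allow group/other access")
    return problems
```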
To contribute:

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request

Development guidelines:

- Write tests for new features
- Follow PEP 8 style guidelines
- Update documentation as needed
- Ensure all tests pass before submitting a PR
This project is licensed under the MIT License - see the LICENSE file for details.
- Wikimedia Foundation for providing database access
- Wikipedia Medicine project contributors
- Toolforge infrastructure team
For issues and questions:
- Open an issue on GitHub
- Contact the maintainers

Planned features:

- Add command-line progress bars
- Export to CSV and HTML formats
- Generate visualization graphs
- Add editor activity timeline analysis
- Compare year-over-year trends
- Email notification on completion
- Web dashboard for results
If you use this tool in your research, please cite:
```bibtex
@software{wikipedia_medicine_2025,
  author = {Your Name},
  title = {Wikipedia Medicine Editor Analysis},
  year = {2025},
  url = {https://github.com/MrIbrahem/med-status}
}
```