A Python-based automation framework for Windows desktop applications that combines UI automation, OCR, and file operations. It executes YAML-based recipes with retry logic, structured logging, and error handling.
- UI Automation: Native Windows UI automation using pywinauto with UIA backend
- OCR Support: Optical character recognition with pytesseract for text extraction
- File Operations: Secure file system operations with path validation
- Process Management: Application lifecycle management with idempotency
- YAML Recipes: Human-readable automation scripts with variable substitution
- Robust Error Handling: Comprehensive retry logic with exponential backoff
- Rich Logging: Structured JSON logging with screenshot capture on failure
- CLI Interface: Easy-to-use command line interface with progress tracking
- Windows 11 (recommended) or Windows 10
- Python 3.11+
- PowerShell 7+
- Required Python packages (see requirements.txt)
- Clone the repository:
  git clone https://github.com/SimplyAISolution/windows_desktop_automator.git
  cd windows_desktop_automator
- Install Python dependencies:
  pip install -r requirements.txt
- Optional: Install Tesseract OCR (for enhanced text recognition):
  - Download from: https://github.com/tesseract-ocr/tesseract
  - Add to system PATH or specify path in configuration
python validate_recipe.py recipes\notepad_excel.yaml
# With dependencies installed:
python automator_cli.py run recipes\notepad_excel.yaml
# Dry run (validation only):
python automator_cli.py run recipes\notepad_excel.yaml --dry-run
python automator_cli.py list-providers
Recipes are YAML files that describe automation workflows:
name: "my_automation"
description: "Example automation recipe"
version: "1.0"
variables:
  app_name: "notepad.exe"
  demo_text: "Hello World!"
steps:
  - name: "Launch Application"
    action: "launch"
    target:
      app: "${app_name}"
    timeout: 15
    retry_attempts: 3
  - name: "Wait for Window"
    action: "wait_for"
    target:
      window:
        name: "Untitled - Notepad"
    timeout: 10
  - name: "Type Text"
    action: "type"
    target:
      element:
        control_type: "Edit"
        class_name: "Edit"
      text: "${demo_text}"
    verify_after: true
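A recipe with this shape can be sanity-checked before it is run. The sketch below is not the project's actual schema (that lives in automator/core/dsl.py and is not shown here). It assumes the recipe has already been parsed into a dict, and that `name` and `steps` are required at the top level and `name` and `action` in every step. The function name `check_recipe` is hypothetical.

```python
from typing import Any

# Assumed required keys; the real schema in automator/core/dsl.py may differ.
REQUIRED_TOP_LEVEL = ("name", "steps")
REQUIRED_STEP_KEYS = ("name", "action")


def check_recipe(recipe: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in recipe:
            problems.append(f"missing top-level key: {key}")

    steps = recipe.get("steps", [])
    if not isinstance(steps, list):
        return problems + ["'steps' must be a list"]

    for i, step in enumerate(steps, start=1):
        if not isinstance(step, dict):
            problems.append(f"step {i}: must be a mapping")
            continue
        for key in REQUIRED_STEP_KEYS:
            if key not in step:
                problems.append(f"step {i}: missing key: {key}")
    return problems
```

Collecting every problem instead of stopping at the first one lets a recipe author fix several mistakes in a single pass.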
- launch: Start applications
- wait_for: Wait for windows/elements to appear
- verify: Verify element states
- click: Click UI elements (left, right, double)
- type: Type text into elements
- hotkey: Send keyboard shortcuts
- read_text: Extract text from elements
- file_read: Read text files
- file_write: Write text files
- file_copy: Copy files
- screenshot: Capture screen/window screenshots
- ocr_text: Extract text using OCR
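One common way to connect action names like these to code is a registry that maps each name to a handler function. The sketch below illustrates that pattern only; it is not taken from the project, and `register`, `run_step`, and the placeholder handlers are hypothetical names that return strings instead of driving a real application.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], Any]

# Hypothetical registry: action name -> handler function.
_HANDLERS: dict[str, Handler] = {}


def register(action: str) -> Callable[[Handler], Handler]:
    """Decorator that stores a handler under the given action name."""
    def decorator(func: Handler) -> Handler:
        _HANDLERS[action] = func
        return func
    return decorator


@register("launch")
def launch(step: dict[str, Any]) -> str:
    # Placeholder: a real handler would start the process.
    return f"would launch {step['target']['app']}"


@register("type")
def type_text(step: dict[str, Any]) -> str:
    # Placeholder: a real handler would send keystrokes to the element.
    return f"would type {step['target']['text']!r}"


def run_step(step: dict[str, Any]) -> Any:
    """Look up the handler for a step's action and call it."""
    action = step.get("action")
    handler = _HANDLERS.get(action)
    if handler is None:
        raise ValueError(f"unknown action: {action!r}")
    return handler(step)
```

With this layout, supporting a new action means adding one decorated function, and an unknown action name fails loudly instead of being skipped.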
Target UI elements using multiple strategies:
element:
  automation_id: "btn_submit"  # Most reliable
  control_type: "Button"       # Additional specificity

element:
  control_type: "Edit"
  name: "Username"
  class_name: "TextBox"

element:
  name: "Submit"  # Text-based matching
  index: 0        # Position-based selection
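Since the UI provider is built on pywinauto, selector keys like these presumably end up as pywinauto search criteria. The sketch below shows one plausible translation, using the criterion names from pywinauto 0.6.x (`auto_id`, `control_type`, `class_name`, `title`, `found_index`). The mapping itself is an assumption; the framework's real mapping is not shown in this document, and `to_criteria` is a hypothetical helper.

```python
# Recipe selector key -> pywinauto search criterion (pywinauto 0.6.x names).
# Assumption: the framework's own mapping may differ.
SELECTOR_TO_CRITERION = {
    "automation_id": "auto_id",
    "control_type": "control_type",
    "class_name": "class_name",
    "name": "title",
    "index": "found_index",
}


def to_criteria(selector: dict) -> dict:
    """Translate a recipe `element` selector into pywinauto keyword arguments."""
    unknown = set(selector) - set(SELECTOR_TO_CRITERION)
    if unknown:
        raise KeyError(f"unsupported selector keys: {sorted(unknown)}")
    return {SELECTOR_TO_CRITERION[key]: value for key, value in selector.items()}
```

With pywinauto installed, the result could be passed as `window.child_window(**to_criteria(selector))`, where `window` is a window specification obtained from `Application(backend="uia")`.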
- automator/core/main.py: CLI orchestrator with retry logic
- automator/core/dsl.py: Recipe schema and validation
- automator/core/logger.py: Structured logging with screenshots
- automator/providers/ui.py: UI automation (pywinauto + UIA)
- automator/providers/process.py: Application lifecycle
- automator/providers/fs.py: File system operations
- automator/providers/ocr.py: Optical character recognition
- JSON format logs in artifacts/logs/
- Screenshots on failure in artifacts/screens/
- Step-by-step execution tracking
{
  "step_id": "20241227_143022_001",
  "action": "click_element",
  "target": "Submit Button",
  "phase": "SUCCESS",
  "timestamp": "2024-12-27T14:30:22.123456",
  "result": "Element clicked successfully"
}
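An entry with these fields can be assembled with the standard library alone. The sketch below is an illustration, not the code in automator/core/logger.py. It infers from the sample that `step_id` is a date, a time, and a zero-padded sequence number; that inference and the name `make_log_entry` are assumptions.

```python
import json
from datetime import datetime


def make_log_entry(sequence, action, target, phase, result, now=None):
    """Build one log record as a JSON-serializable dict."""
    now = now or datetime.now()
    return {
        # Assumed format: YYYYMMDD_HHMMSS_NNN, as suggested by the sample entry.
        "step_id": f"{now:%Y%m%d_%H%M%S}_{sequence:03d}",
        "action": action,
        "target": target,
        "phase": phase,
        "timestamp": now.isoformat(),
        "result": result,
    }


if __name__ == "__main__":
    entry = make_log_entry(1, "click_element", "Submit Button",
                           "SUCCESS", "Element clicked successfully")
    print(json.dumps(entry, indent=2))
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to test.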
pytest tests/
python validate_recipe.py recipes/your_recipe.yaml
# Test with dry run
python automator_cli.py run recipes/notepad_excel.yaml --dry-run
- Path Validation: Restricts file operations to allowed directories
- Process Isolation: Safe application lifecycle management
- Secret Handling: No credentials in logs (use .env files)
- Error Sanitization: Removes sensitive data from error messages
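Path validation of the kind listed above is usually done by resolving a path and checking that it still lies inside an allowed directory. The sketch below shows that technique with `pathlib`; it is not the implementation in automator/providers/fs.py, and `is_path_allowed` is a hypothetical name.

```python
from pathlib import Path


def is_path_allowed(candidate, allowed_dirs):
    """Return True if `candidate` resolves to a location inside one of `allowed_dirs`."""
    # resolve() collapses ".." segments and follows symlinks, so a path such as
    # "artifacts/../secrets.txt" is compared by where it really points.
    resolved = Path(candidate).resolve()
    for base in allowed_dirs:
        base_resolved = Path(base).resolve()
        if resolved == base_resolved or base_resolved in resolved.parents:
            return True
    return False
```

For example, `is_path_allowed("artifacts/test_data.txt", ["artifacts"])` returns `True`, while `is_path_allowed("artifacts/../secrets.txt", ["artifacts"])` returns `False`, because the second path resolves outside the allowed directory.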
Use variables for dynamic content:
variables:
  username: "testuser"
  data_file: "artifacts/test_data.txt"
  timestamp: "${current_datetime}"
steps:
  - name: "Login"
    action: "type"
    target:
      text: "${username}"
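The `${name}` placeholders can be expanded with a small recursive function. The sketch below is one way to do it and is not the project's implementation. It assumes that an undefined variable should raise an error, and that built-in values such as `current_datetime` are added to the `variables` mapping by the caller. `substitute` is a hypothetical name.

```python
import re

# Matches ${name}, where name is made of letters, digits, and underscores.
_VAR_PATTERN = re.compile(r"\$\{(\w+)\}")


def substitute(value, variables):
    """Recursively replace ${name} placeholders in strings, lists, and dicts."""
    if isinstance(value, str):
        def replace(match):
            name = match.group(1)
            if name not in variables:
                raise KeyError(f"undefined variable: {name}")
            return str(variables[name])
        return _VAR_PATTERN.sub(replace, value)
    if isinstance(value, list):
        return [substitute(item, variables) for item in value]
    if isinstance(value, dict):
        return {key: substitute(item, variables) for key, item in value.items()}
    # Numbers, booleans, and None pass through unchanged.
    return value
```

Applied to the "Login" step with `{"username": "testuser"}`, the `text` field becomes `"testuser"`, and non-string values such as timeouts are left untouched.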
- Use AutomationId when available (highest reliability)
- Combine multiple selectors for specificity
- Test selectors thoroughly across different UI states
- Include verification steps after critical actions
- Use appropriate timeouts (5-15 seconds typical)
- Enable continue_on_failure for non-critical steps
- Add meaningful step names for debugging
- Set realistic retry counts (3-5 attempts)
- Use progressive delays between retries
- Include cleanup steps in recipes
- Monitor log outputs for patterns
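The advice on retry counts and progressive delays can be sketched as a retry helper with exponential backoff. This is a generic illustration, not the framework's retry code; the function name, parameter names, and default values are assumptions.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=0.5, factor=2.0, sleep=time.sleep):
    """Call `func` until it succeeds or `attempts` is exhausted.

    The wait before each retry grows geometrically: base_delay, then
    base_delay * factor, and so on. The last failure is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the original error.
            sleep(delay)
            delay *= factor
```

The `sleep` parameter exists so tests can pass a recording function instead of actually waiting.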
Element Not Found
Solution: Verify element selectors using Windows Spy tools
- Use Inspect.exe (Windows SDK)
- Check AutomationId, ControlType, ClassName
- Consider timing issues (add wait_for steps)
Application Launch Timeout
Solution: Increase timeout or verify application path
- Check application is installed and accessible
- Verify no permission issues
- Consider antivirus interference
OCR Not Working
Solution: Install and configure Tesseract OCR
- Download from: https://github.com/tesseract-ocr/tesseract
- Add to system PATH
- Verify with: tesseract --version
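The same check can be done from Python before a recipe that uses OCR is run. The sketch below uses only the standard library, and the helper names are hypothetical. It reports whether a `tesseract` executable is on PATH and, if so, the first line of its version output.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_version(executable):
    """Return the first line printed by `tesseract --version`."""
    completed = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some Tesseract builds print the version to stderr instead of stdout.
    output = completed.stdout or completed.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH")
    else:
        print(tesseract_version(path))
```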
Enable verbose logging:
python automator_cli.py run recipes/your_recipe.yaml --verbose
Extend the framework by creating custom providers:
from automator.core.logger import automator_logger


class CustomProvider:
    def custom_action(self, target, **kwargs):
        step_id = automator_logger.log_step_start("custom_action", target)
        try:
            # Your implementation here
            result = self.do_something()
            automator_logger.log_step_success(step_id, "custom_action", target, result)
            return True
        except Exception as e:
            automator_logger.log_step_failure(step_id, "custom_action", target, e)
            return False
Use the automator in automated testing pipelines:
# PowerShell script for CI
$result = python automator_cli.py run recipes/smoke_test.yaml
if ($LASTEXITCODE -eq 0) {
    Write-Host "✅ Automation tests passed"
} else {
    Write-Host "❌ Automation tests failed"
    exit 1
}
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check the troubleshooting section above
- Review logs in artifacts/logs/
- Create an issue with recipe and log files
- Include system information (Windows version, Python version)
Built with ❤️ for Windows automation