A comprehensive, proactive monitoring and auto-healing system for DevOps environments, developed in Python with a real-time web interface.
- System Metrics: CPU, memory, disk, network
- Services: Systemd service status (nginx, mysql, ssh, etc.)
- Configurable Interval: From 10 seconds to several minutes
- Customizable Thresholds: Flexible alert threshold configuration
- Email Notifications: HTML formatted alerts with built-in anti-spam
- Severity Levels: Warning and Critical
- Service Restart: Automatic restart of stopped services
- Intelligent Cleanup: Temporary files and cache management
- Memory Optimization: Termination of memory-hungry processes
- Web Interface: Dashboard accessible at http://localhost:8050
- Real-time Charts: Interactive visualizations with Plotly
- Auto Refresh: Updates every 5 seconds
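As an illustration of how the threshold and severity features above might fit together, here is a minimal sketch that compares a metrics snapshot against configured thresholds. The function name `evaluate_metrics`, the `critical_margin` parameter, and the rule that a value 10 or more points above its threshold counts as Critical are assumptions made for this example, not the project's actual logic.

```python
from typing import Dict, List, Tuple


def evaluate_metrics(
    metrics: Dict[str, float],
    thresholds: Dict[str, float],
    critical_margin: float = 10.0,  # assumption: not taken from the project
) -> List[Tuple[str, str, float]]:
    """Return (metric, severity, value) for every metric at or above its threshold."""
    alerts = []
    for name, value in metrics.items():
        limit = thresholds.get(name)
        if limit is None or value < limit:
            continue  # no threshold configured, or value is within bounds
        severity = "CRITICAL" if value >= limit + critical_margin else "WARNING"
        alerts.append((name, severity, value))
    return alerts


# Example snapshot; in practice the values could come from psutil, e.g.
# psutil.cpu_percent() and psutil.virtual_memory().percent.
snapshot = {"cpu": 45.2, "memory": 97.0, "disk": 91.5}
limits = {"cpu": 80.0, "memory": 85.0, "disk": 90.0}
alerts = evaluate_metrics(snapshot, limits)
```

Keeping the comparison in a pure function like this makes the alert rules easy to unit-test without touching the real system.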
- Python 3.8 or higher
- Administrator access (for service monitoring)
- Port 8050 available
# Linux
chmod +x start.sh
./start.sh
# Windows (PowerShell)
.\start.ps1
# Create virtual environment
python -m venv venv
# Activation (Linux/macOS)
source venv/bin/activate
# Activation (Windows)
venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Create directories
mkdir -p logs
Create a .env file:
# Monitoring
MONITORING_INTERVAL=10
CPU_THRESHOLD=80.0
MEMORY_THRESHOLD=85.0
DISK_THRESHOLD=90.0
# Services to monitor
MONITORED_SERVICES=cron,dbus,apache2,nginx,mysql
# Auto-healing
AUTO_HEALING_ENABLED=True
# Email (optional)
EMAIL_ALERTS_ENABLED=True
EMAIL_SMTP_SERVER=smtp.gmail.com
EMAIL_SENDER=your@email.com
EMAIL_SENDER_PASSWORD=your_app_password
EMAIL_RECIPIENTS=alert@company.com
python main.py
The system starts and the dashboard is accessible at: http://localhost:8050
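The values in the .env file are plain strings, so they have to be converted before use. The sketch below shows one way this could be done; the `Settings` dataclass, the `load_settings` function, and the fallback defaults (copied from the sample .env) are assumptions for illustration and may differ from what config/settings.py actually does.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class Settings:
    monitoring_interval: int
    cpu_threshold: float
    memory_threshold: float
    disk_threshold: float
    monitored_services: List[str]
    auto_healing_enabled: bool


def _as_bool(raw: str) -> bool:
    """Interpret common truthy spellings such as 'True', 'yes' or '1'."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build typed settings from environment-style key/value pairs.

    With python-dotenv installed, calling load_dotenv() beforehand would
    copy the .env entries into os.environ.
    """
    services = env.get("MONITORED_SERVICES", "")
    return Settings(
        monitoring_interval=int(env.get("MONITORING_INTERVAL", "10")),
        cpu_threshold=float(env.get("CPU_THRESHOLD", "80.0")),
        memory_threshold=float(env.get("MEMORY_THRESHOLD", "85.0")),
        disk_threshold=float(env.get("DISK_THRESHOLD", "90.0")),
        # Split the comma-separated list and drop empty entries.
        monitored_services=[s.strip() for s in services.split(",") if s.strip()],
        auto_healing_enabled=_as_bool(env.get("AUTO_HEALING_ENABLED", "False")),
    )
```

Passing a mapping instead of reading `os.environ` directly makes the parser easy to exercise with test values.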
# Run only the monitoring orchestrator
python monitoring/monitor.py
# Run only the web dashboard
python visualization/dashboard.py
monitoring-system/
├── main.py                      # Main entry point
├── requirements.txt             # Python dependencies
├── .env                         # Configuration
├── start.ps1                    # Windows installation
├── start.sh                     # Linux installation
│
├── monitoring/                  # Monitoring modules
│   ├── system_monitor.py        # System metrics
│   ├── service_monitor.py       # Service monitoring
│   ├── alert_manager.py         # Alert management
│   └── monitor.py               # Main orchestrator
│
├── visualization/               # Interface
│   └── dashboard.py             # Web dashboard
│
├── config/                      # Configuration
│   └── settings.py              # Application parameters
│
├── autohealing/                 # Auto-healing
│   ├── service_healer.py        # Service repair
│   ├── system_healer.py         # System repair
│   ├── action_logger.py         # Action logging
│   └── triggers.py              # Triggers
│
├── utils/                       # Utilities
│   ├── json_array_logger.py     # JSON logger
│   └── email_sender.py          # Email sending
│
└── logs/                        # Data
    └── monitoring.json          # Structured JSON logs
- Python 3.8+ - Main language
- Virtual Environment - Dependency isolation
- psutil - System metrics (CPU, memory, disk, network)
- subprocess - Systemd service management
- platform - OS detection and hardware information
- Dash - Web framework for analytical applications
- Plotly - Interactive real-time charts
- Pandas - Data manipulation and analysis
- Dash Bootstrap Components - Responsive UI
- JSON - Structured logging format
- datetime - Timestamp management
- threading - Parallel execution
- smtplib - Email sending via SMTP/TLS
- email.mime - HTML message formatting
- python-dotenv - Environment variables
- os - Filesystem interactions
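To make the role of subprocess in systemd service management more concrete, here is a minimal sketch of a status check followed by a restart. The function names and the injectable `run` parameter are assumptions for this example; only the `systemctl is-active` and `systemctl restart` commands are standard systemd tooling, and the project's own service_monitor.py and service_healer.py may work differently.

```python
import subprocess
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]


def is_service_active(name: str, run: Runner = subprocess.run) -> bool:
    """Return True when `systemctl is-active <name>` prints 'active'."""
    result = run(
        ["systemctl", "is-active", name],
        capture_output=True,
        text=True,
        check=False,  # a stopped service yields a non-zero exit code
    )
    return result.stdout.strip() == "active"


def restart_service(name: str, run: Runner = subprocess.run) -> bool:
    """Try to restart the service; usually requires root privileges."""
    result = run(
        ["systemctl", "restart", name],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0


def heal(name: str, run: Runner = subprocess.run) -> str:
    """Restart the service only if it is not active; report what happened."""
    if is_service_active(name, run):
        return "active"
    return "restarted" if restart_service(name, run) else "failed"
```

The `run` parameter exists only so the logic can be tested with a fake runner on machines without systemd; in normal use the default `subprocess.run` applies.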
Monitoring cycle #1
[2024-01-15 10:30:00] System metrics:
  CPU: 45.2% | Memory: 67.8% | Disk: 82.1%
Service status:
  cron: Active
  dbus: Active
  nginx: Stopped
ALERTS:
  Service nginx is stopped
AUTO-HEALING ACTIONS:
  service_restart: Service nginx restarted successfully
{
"timestamp": "2024-01-15T10:30:00.000000",
"event_type": "action",
"action_type": "service_restart",
"status": "SUCCESS",
"service": "nginx",
"message": "Service nginx restarted successfully"
}
Solution: On Windows, where systemd is not available, adapt service_monitor.py to query services through PowerShell.
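One way to follow this suggestion is to call the PowerShell `Get-Service` cmdlet from Python. The sketch below is only an illustration: the function name and the injectable `run` parameter are hypothetical, and Windows service names generally differ from the Linux names used in MONITORED_SERVICES.

```python
import subprocess
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]


def is_windows_service_running(name: str, run: Runner = subprocess.run) -> bool:
    """Return True when Get-Service reports the status 'Running'.

    The name is interpolated into the command, so it should come from
    trusted configuration rather than from user input.
    """
    command = f"(Get-Service -Name '{name}').Status"
    result = run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True,
        text=True,
        check=False,  # an unknown service makes PowerShell exit non-zero
    )
    return result.returncode == 0 and result.stdout.strip() == "Running"
```

A platform check such as `platform.system() == "Windows"` could then decide between this function and the systemd-based check.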
Check:
- SMTP settings in .env
- Gmail app password
- Firewall/antivirus
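To test the SMTP settings in isolation, a short standalone script can help narrow down which of these items is at fault. The sketch below uses only the standard library; the function names are hypothetical, and port 587 with STARTTLS is an assumption, since the sample .env does not specify a port.

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import List


def build_alert_email(
    sender: str, recipients: List[str], subject: str, html_body: str
) -> MIMEMultipart:
    """Assemble an HTML alert message without sending it."""
    message = MIMEMultipart("alternative")
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = subject
    message.attach(MIMEText(html_body, "html"))
    return message


def send_alert(
    message: MIMEMultipart,
    server: str,
    password: str,
    port: int = 587,  # assumption: STARTTLS submission port
    timeout: float = 10.0,
) -> None:
    """Send the message over SMTP with STARTTLS; raises on any failure."""
    with smtplib.SMTP(server, port, timeout=timeout) as smtp:
        smtp.starttls()  # upgrade the connection to TLS before logging in
        smtp.login(message["From"], password)
        smtp.send_message(message)
```

If `send_alert` raises `smtplib.SMTPAuthenticationError`, the credentials are the likely cause; a timeout or connection error points toward the firewall or network instead.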
Check:
- Port 8050 available
- logs/monitoring.json file exists
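Both checks can be scripted. The following sketch is one possible way to do it with the standard library; the function names are made up for this example, and the port test simply tries to bind to the port on the local interface.

```python
import socket
from pathlib import Path
from typing import Dict, Union

DASHBOARD_PORT = 8050
LOG_FILE = Path("logs/monitoring.json")


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing else is bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True


def run_checks(
    port: int = DASHBOARD_PORT, log_file: Union[str, Path] = LOG_FILE
) -> Dict[str, bool]:
    """Collect the dashboard preconditions in one dictionary."""
    return {
        "port_free": is_port_free(port),
        "log_file_exists": Path(log_file).is_file(),
    }


if __name__ == "__main__":
    for check, ok in run_checks().items():
        print(f"{check}: {'OK' if ok else 'FAILED'}")
```

Note that `port_free` is expected to be False while the dashboard itself is running, so this check is most useful before starting it.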
# Check logs
tail -f logs/monitoring.json
# Test metrics
python -c "import psutil; print(f'CPU: {psutil.cpu_percent()}%')"
# Check services
systemctl status nginx mysql
Contributions are welcome!
- Fork the project
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
# Development installation
pip install -r requirements.txt
pip install -e .
# Tests (to be implemented)
python -m pytest tests/
# Code validation
flake8 monitoring/ visualization/ autohealing/
- Docker container monitoring
- Database metrics
- Slack/Teams notifications
- Complete REST API
- Dashboard authentication
- Automatic PDF reports
- Native Windows support
- Real-time database
- High availability cluster
- Machine learning for adaptive thresholds
This project is licensed under the MIT License. See the LICENSE file for details.
- Development Team - Sen-Se1
- psutil - For the excellent system metrics library
- Plotly/Dash - For interactive visualizations
- Python - For the language and ecosystem
- Documentation: README.md
- Issues: GitHub Issues
- Email: mbarkihoussem99@gmail.com
Don't forget to star the project if you find it useful!