An educational Python application demonstrating a local observability platform: metrics collection, alerting, log aggregation, a performance dashboard, SLA monitoring, and automated incident response, all persisted with SQLite3.
Metrics Collection:
- System Metrics - CPU, memory, disk usage
- Application Metrics - Custom metrics with labels
- Metric Storage - SQLite persistence
- Metric Aggregation - Sum, avg, min, max, percentiles
- Historical Data - Query past metrics

Alerting:
- Alert Rules - Threshold-based alerting
- Alert Evaluation - Automatic rule checking
- Alert Firing - Trigger alerts on violations
- Alert History - Store in SQLite
- Cooldown Period - Prevent alert spam (see the sketch after this list)
- Multiple Severities - Info, warning, critical

Log Aggregation:
- Log Collection - Collect application logs
- Log Levels - INFO, WARNING, ERROR, CRITICAL
- Log Storage - SQLite persistence
- Log Querying - Search and filter logs
- Log Statistics - Count by level
- Log Retention - Auto-delete old logs

Dashboard:
- Health Overview - Overall system health
- System Metrics - Real-time system stats
- Active Alerts - Current alerts
- Recent Errors - Latest error logs
- SLA Status - SLA compliance
- Text Rendering - ASCII dashboard

SLA Monitoring:
- SLA Definitions - Define service level targets
- Uptime Tracking - Track service availability
- Error Rate Tracking - Monitor error percentage
- SLA Compliance - Check against targets
- SLA History - Historical SLA data (SQLite)

Incident Response:
- Incident Detection - Auto-detect issues
- Response Playbooks - Define response actions
- Automated Actions - Log, alert, restart, scale
- Incident History - Store in SQLite
- Response Tracking - Track all responses
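The alerting behaviour described above (threshold rules, firing, cooldown) boils down to a small amount of logic. The following is a standalone sketch of the idea, not the project's AlertManager implementation:

```python
import time

class ThresholdRule:
    """Minimal threshold rule with a cooldown (illustration only)."""

    def __init__(self, metric, threshold, cooldown_seconds=300):
        self.metric = metric
        self.threshold = threshold
        self.cooldown_seconds = cooldown_seconds
        self._last_fired = 0.0

    def evaluate(self, value):
        violated = value > self.threshold
        in_cooldown = (time.time() - self._last_fired) < self.cooldown_seconds
        if violated and not in_cooldown:
            self._last_fired = time.time()
            return True   # alert fires
        return False      # no violation, or suppressed by the cooldown

rule = ThresholdRule('cpu_usage', threshold=80, cooldown_seconds=300)
print(rule.evaluate(92))  # True  -> fires
print(rule.evaluate(95))  # False -> still violating, but inside the cooldown window
```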
Quick Start:
```bash
git clone https://github.com/Amruth22/Python-Local-Observability-Platform.git
cd Python-Local-Observability-Platform

# Create and activate a virtual environment
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the demonstration
python main.py

# Start the Flask API
python api/app.py

# Run the tests
python tests.py
```

Project Structure:
```
Python-Local-Observability-Platform/
│
├── metrics/
│   ├── metrics_collector.py     # Metrics collection
│   ├── metrics_store.py         # SQLite storage
│   └── metrics_aggregator.py    # Aggregation
│
├── alerting/
│   ├── alert_manager.py         # Alert management
│   ├── alert_rules.py           # Rule definitions
│   └── alert_store.py           # SQLite storage
│
├── logging/
│   ├── log_aggregator.py        # Log collection
│   ├── log_parser.py            # Log parsing
│   └── log_store.py             # SQLite storage
│
├── dashboard/
│   ├── dashboard_data.py        # Data provider
│   └── dashboard_renderer.py    # Text rendering
│
├── sla/
│   ├── sla_monitor.py           # SLA monitoring
│   └── sla_store.py             # SQLite storage
│
├── incident/
│   ├── incident_detector.py     # Detection
│   ├── incident_responder.py    # Auto-response
│   └── incident_store.py        # SQLite storage
│
├── api/
│   └── app.py                   # Flask API
│
├── main.py                      # Demonstration
├── tests.py                     # 10 unit tests
└── README.md                    # This file
```
Metrics Collection:
```python
from metrics.metrics_collector import MetricsCollector

collector = MetricsCollector()

# Collect system metrics
system_metrics = collector.collect_system_metrics()
print(f"CPU: {system_metrics['cpu_usage_percent']}%")

# Collect custom metric
collector.collect_metric('http_requests', 150, labels={'method': 'GET'})

# Get metric
values = collector.get_metric('http_requests')
```
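The system metrics come from psutil, which is listed in the dependencies below. Reading the same raw figures directly looks like this (a standalone sketch, independent of MetricsCollector):

```python
import psutil

cpu_percent = psutil.cpu_percent(interval=1)          # sample CPU usage over 1 second
memory_percent = psutil.virtual_memory().percent      # RAM in use
disk_percent = psutil.disk_usage('/').percent         # disk usage of the root partition

print(f"CPU: {cpu_percent}% | Memory: {memory_percent}% | Disk: {disk_percent}%")
```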
Alerting:
```python
from alerting.alert_manager import AlertManager

alert_manager = AlertManager()
# Add alert rule
alert_manager.add_rule(
    name='high_cpu',
    metric='cpu_usage',
    condition='greater_than',
    threshold=80,
    severity='warning'
)
# Evaluate rules
fired_alerts = alert_manager.evaluate_all_rules(metrics_collector)
```

Log Aggregation:
```python
from logging.log_aggregator import LogAggregator

log_agg = LogAggregator('observability.db')
# Log messages
log_agg.log('INFO', 'Application started')
log_agg.log('ERROR', 'Database connection failed')
# Query logs
errors = log_agg.get_logs(level='ERROR', hours=24)
```
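Under the hood the aggregator persists log records in SQLite. The sketch below shows the general idea with a hypothetical schema; the table and column names are assumptions, and the real layout lives in logging/log_store.py:

```python
import sqlite3
import time

# Hypothetical schema -- the real log_store.py may differ
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (timestamp REAL, level TEXT, message TEXT)")
conn.execute("INSERT INTO logs VALUES (?, ?, ?)",
             (time.time(), "ERROR", "Database connection failed"))

# Query: errors from the last 24 hours
cutoff = time.time() - 24 * 3600
rows = conn.execute(
    "SELECT level, message FROM logs WHERE level = ? AND timestamp >= ?",
    ("ERROR", cutoff),
).fetchall()
print(rows)

# Retention: delete anything older than 7 days
conn.execute("DELETE FROM logs WHERE timestamp < ?", (time.time() - 7 * 24 * 3600,))
```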
SLA Monitoring:
```python
from sla.sla_monitor import SLAMonitor

sla_monitor = SLAMonitor('observability.db')
# Define SLA
sla_monitor.define_sla('api_uptime', target=99.9)
# Record metric
sla_monitor.record_sla_metric('api_uptime', 99.95)
# Get status
status = sla_monitor.get_sla_status('api_uptime')
print(f"Compliant: {status['compliant']}")from incident.incident_responder import IncidentResponder
responder = IncidentResponder()
# Add playbook
responder.add_playbook('high_error_rate', [
    'log_incident',
    'send_alert',
    'restart_service'
])
# Respond to incident
actions = responder.respond('high_error_rate', context={'error_rate': 15})
```

API Endpoints:
- `POST /api/metrics/collect` - Collect metric
- `GET /api/metrics/<name>` - Get metric values
- `GET /api/metrics/system` - Get system metrics
- `GET /api/alerts` - Get active alerts
- `GET /api/logs` - Get logs (filter by level)
- `POST /api/logs/add` - Add log entry
- `GET /api/dashboard` - Get dashboard data
- `GET /api/dashboard/render` - Get rendered dashboard
- `GET /api/sla` - Get SLA status
- `GET /api/incidents` - Get recent incidents
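As an example of calling the API, assuming the Flask app is running locally on its default port 5000; the JSON field names below are illustrative, so check api/app.py for the exact request schema:

```python
import json
import urllib.request

BASE = "http://localhost:5000"

# Push a custom metric (field names are an assumption, not the confirmed schema)
payload = json.dumps({"name": "http_requests", "value": 150,
                      "labels": {"method": "GET"}}).encode()
request = urllib.request.Request(
    f"{BASE}/api/metrics/collect",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(urllib.request.urlopen(request).read().decode())

# Read the current system metrics
print(urllib.request.urlopen(f"{BASE}/api/metrics/system").read().decode())
```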
Run the comprehensive test suite:
```bash
python tests.py
```

The 10 tests cover:

- ✅ Metrics Collection - Test metric collection
- ✅ Metrics Aggregation - Test aggregation functions
- ✅ Alert Rules - Test rule evaluation
- ✅ Alert Firing - Test alert triggering
- ✅ Log Aggregation - Test log collection
- ✅ Log Querying - Test log search
- ✅ SLA Monitoring - Test SLA tracking
- ✅ Incident Detection - Test incident detection
- ✅ Incident Response - Test automated response
- ✅ Aggregators - Test count, sum, avg
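Each test exercises one module in isolation. For a rough idea of the style, a hypothetical alert-firing test might look like the sketch below; it assumes evaluate_all_rules returns the alerts that fired, and the real assertions live in tests.py:

```python
import unittest

from metrics.metrics_collector import MetricsCollector
from alerting.alert_manager import AlertManager


class TestAlertFiring(unittest.TestCase):
    """Hypothetical sketch -- tests.py contains the project's real checks."""

    def test_high_cpu_rule_fires(self):
        collector = MetricsCollector()
        collector.collect_metric('cpu_usage', 95, labels={'source': 'test'})

        manager = AlertManager()
        manager.add_rule(name='high_cpu', metric='cpu_usage',
                         condition='greater_than', threshold=80, severity='warning')

        fired = manager.evaluate_all_rules(collector)
        self.assertTrue(fired)  # at least one alert should have fired


if __name__ == '__main__':
    unittest.main()
```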
Metrics:
- Numerical measurements
- Time-series data
- Aggregatable
Logs:
- Event records
- Detailed context
- Searchable
Traces:
- Request flow
- Distributed tracing
- (Not implemented - advanced)
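Metrics are the aggregatable pillar: the sum/avg/min/max/percentile aggregations listed in the features can be illustrated with nothing but the standard library (a standalone sketch, not the metrics_aggregator API):

```python
import statistics

# A metric as a time series of (timestamp, value) samples
samples = [(1700000000 + i, v) for i, v in enumerate([120, 95, 130, 88, 142, 101])]
values = [value for _, value in samples]

aggregates = {
    "count": len(values),
    "sum": sum(values),
    "avg": statistics.mean(values),
    "min": min(values),
    "max": max(values),
    "p95": statistics.quantiles(values, n=100)[94],  # 95th percentile
}
print(aggregates)
```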
Benefits:
- Early problem detection
- Proactive response
- Reduce downtime
- Improve reliability
SLA Components:
- Target: What you promise (99.9% uptime)
- Actual: What you deliver
- Compliance: Meeting targets
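Compliance is simply a comparison of actual against target, and the target implies an error budget. A back-of-the-envelope sketch:

```python
target = 99.9    # promised uptime, percent
actual = 99.95   # measured uptime, percent

compliant = actual >= target

# Error budget: the downtime a 99.9% target allows over a 30-day month
allowed_downtime_minutes = (100 - target) / 100 * 30 * 24 * 60   # ~43.2 minutes
print(f"Compliant: {compliant}, monthly downtime budget: {allowed_downtime_minutes:.1f} min")
```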
For production use:

Metrics:
- Use Prometheus
- Implement exporters
- Add Grafana dashboards

Alerting:
- Use Alertmanager
- Integrate PagerDuty/Slack
- Implement escalation

Logging:
- Use the ELK stack
- Implement log shipping
- Add log analysis

Monitoring:
- Distributed tracing
- APM tools
- Real-time dashboards
- Flask 3.0.0 - Web framework
- psutil 5.9.6 - System metrics
- python-dotenv 1.0.0 - Environment variables
- pytest 7.4.3 - Testing framework
- sqlite3 - Database (built-in)
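These pins correspond to what one would expect in requirements.txt (sqlite3 ships with Python and needs no entry):

```
Flask==3.0.0
psutil==5.9.6
python-dotenv==1.0.0
pytest==7.4.3
```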
This project is for educational purposes. Feel free to use and modify as needed.
Happy Monitoring!