AutoWatch: Automated System Monitoring & Remediation

AutoWatch is a lightweight, self-healing infrastructure monitoring tool designed for SREs and System Administrators. It monitors system resources (CPU, Memory, Disk) and critical services, automatically attempting remediation and logging alerts when thresholds are breached.

Project Structure

autowatch/
├── bin/
│   ├── monitor.sh       # Main logic: checks metrics vs thresholds
│   └── remediate.sh     # Action scripts: cleans disk, restarts services
├── config/
│   ├── thresholds.conf  # Define limits for CPU, RAM, Disk
│   └── services.conf    # List of services to keep alive (nginx, ssh, etc.)
├── alerts/
│   └── notifier.py      # Python script to handle logging and notifications
├── cron/
│   └── autowatch.cron   # Cron job definition for continuous monitoring
├── logs/
│   ├── metrics.log      # Time-series data of system health
│   └── alerts.log       # History of incidents and remediation actions
├── runbooks/            # Documentation for manual incident resolution
└── setup.sh             # One-click installation script

Installation & Usage

Initialize the Environment: Run the setup script to create the necessary directories and set permissions.
```
./setup.sh
```
Configure Thresholds: Edit config/thresholds.conf to set your desired limits, expressed as usage percentages.
```
CPU_LIMIT=80
MEM_LIMIT=75
DISK_LIMIT=85
```
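If a Python helper such as alerts/notifier.py needs these limits, the file can be read as plain KEY=VALUE pairs. The sketch below assumes the format is exactly what the three lines above show; the function names parse_thresholds and load_thresholds are hypothetical and not part of AutoWatch, and support for `#` comments is an assumption.

```python
from pathlib import Path


def parse_thresholds(text: str) -> dict[str, float]:
    """Parse KEY=VALUE lines into a dict of numeric limits.

    Assumption: blank lines and lines starting with '#' are ignored.
    """
    limits: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got: {raw!r}")
        limits[key.strip()] = float(value.strip())
    return limits


def load_thresholds(path: str = "config/thresholds.conf") -> dict[str, float]:
    """Read and parse the thresholds file at the path used in this README."""
    return parse_thresholds(Path(path).read_text())


if __name__ == "__main__":
    sample = "CPU_LIMIT=80\nMEM_LIMIT=75\nDISK_LIMIT=85\n"
    print(parse_thresholds(sample))
```

Parsing is kept separate from file access so the format handling can be checked without touching the real config.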
Define Critical Services: Add service names (as recognized by systemctl), one per line, to config/services.conf.
```
nginx
docker
cron
```
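As an illustration of what "recognized by systemctl" means in practice, the sketch below checks each listed name with `systemctl is-active --quiet`, which exits with status 0 when the unit is active. This is not the code in bin/monitor.sh; the helper names are hypothetical, and ignoring blank lines and `#` comments is an assumption about the file format.

```python
import subprocess
from typing import Callable, Iterable


def read_services(text: str) -> list[str]:
    """Return service names, one per line (assumed: blanks and # comments skipped)."""
    names = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def is_active(name: str) -> bool:
    """True if systemd reports the unit as active (exit status 0)."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", name], check=False
    )
    return result.returncode == 0


def find_down(names: Iterable[str],
              check: Callable[[str], bool] = is_active) -> list[str]:
    """Return the services that the check reports as not active."""
    return [name for name in names if not check(name)]
```

The `check` parameter lets the logic be exercised without systemd, for example `find_down(names, check=lambda n: n != "docker")`.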
Run Manually: Run the monitoring script once to test it.
```
./bin/monitor.sh
```
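Automate with Cron: Install the cron job, which runs the monitor every 2 minutes. Note that running crontab with a file argument replaces the current user's existing crontab.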
```
crontab cron/autowatch.cron
```

On Windows, use WSL (Windows Subsystem for Linux).

How It Works

Monitor: monitor.sh gathers current system stats.
Evaluate: monitor.sh uses bc to compare the floating-point readings against the limits in config/thresholds.conf.
Alert: If a threshold is breached, notifier.py logs the incident to logs/alerts.log.
Remediate:
- Disk Full: Triggers remediate.sh disk to clean /tmp and vacuum logs.
- Service Down: Triggers remediate.sh service <name> to restart the failed service.
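
The evaluate and remediate steps above can be sketched in Python. This is not the logic in monitor.sh, which uses bc; Decimal is used here as a comparable exact-decimal comparison. The metric keys (cpu, mem, disk), the strict "greater than" rule, and the function names are assumptions for illustration. Only the two remediate.sh invocations are taken from the description above.

```python
from decimal import Decimal


def breached(metrics: dict[str, str], limits: dict[str, str]) -> list[str]:
    """Return the metric names whose reading is above its limit.

    Values are compared as Decimal, so "85.0" and "85" are equal and
    no binary floating-point rounding is involved.
    Assumption: a breach means strictly greater than the limit.
    """
    return [name for name, value in metrics.items()
            if Decimal(value) > Decimal(limits[name])]


def plan_remediation(over: list[str], down: list[str]) -> list[list[str]]:
    """Map findings to the remediate.sh invocations described above.

    Only a full disk and a stopped service have a remediation here;
    other breaches (CPU, memory) are left to alerting.
    """
    commands = []
    if "disk" in over:
        commands.append(["./bin/remediate.sh", "disk"])
    for service in down:
        commands.append(["./bin/remediate.sh", "service", service])
    return commands
```

Each returned command is an argument list that could be handed to `subprocess.run`; building the plan separately from executing it makes a dry run straightforward.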

📝 Logs

metrics.log: 2025-12-23 20:00:00 cpu=12.5% mem=45.2% disk=60%
alerts.log: 2025-12-23 20:05:00 [ALERT] CPU usage critical: 92%
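
For ad hoc analysis, a metrics.log entry in the format shown above can be parsed into a timestamp and numeric values. The sketch assumes every line follows the sample exactly (a `YYYY-MM-DD HH:MM:SS` timestamp followed by `name=value%` fields) and that the "metrics.log:" prefix above is a label rather than part of the line; parse_metrics_line is a hypothetical helper.

```python
import re
from datetime import datetime

# Timestamp, then the rest of the line holding name=value% fields.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?P<fields>.*)$"
)
FIELD_RE = re.compile(r"(\w+)=([\d.]+)%")


def parse_metrics_line(line: str) -> tuple[datetime, dict[str, float]]:
    """Split one metrics.log line into (timestamp, {metric: percent})."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized metrics line: {line!r}")
    timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
    values = {key: float(num)
              for key, num in FIELD_RE.findall(match.group("fields"))}
    return timestamp, values


if __name__ == "__main__":
    print(parse_metrics_line("2025-12-23 20:00:00 cpu=12.5% mem=45.2% disk=60%"))
```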

Built for reliability.
