asadalpha/autowatch

AutoWatch: Automated System Monitoring & Remediation

AutoWatch is a lightweight, self-healing infrastructure monitoring tool designed for SREs and system administrators. It monitors system resources (CPU, memory, disk) and critical services. When a threshold is breached, it logs an alert and automatically attempts remediation.

Project Structure

autowatch/
├── bin/
│   ├── monitor.sh       # Main logic: checks metrics vs thresholds
│   └── remediate.sh     # Action scripts: cleans disk, restarts services
├── config/
│   ├── thresholds.conf  # Define limits for CPU, RAM, Disk
│   └── services.conf    # List of services to keep alive (nginx, ssh, etc.)
├── alerts/
│   └── notifier.py      # Python script to handle logging and notifications
├── cron/
│   └── autowatch.cron   # Cron job definition for continuous monitoring
├── logs/
│   ├── metrics.log      # Time-series data of system health
│   └── alerts.log       # History of incidents and remediation actions
├── runbooks/            # Documentation for manual incident resolution
└── setup.sh             # One-click installation script

Installation & Usage

  1. Initialize the Environment: Run the setup script to create the necessary directories and set permissions.

    ./setup.sh
  2. Configure Thresholds: Edit config/thresholds.conf to set your desired limits (as percentages).

    CPU_LIMIT=80
    MEM_LIMIT=75
    DISK_LIMIT=85
  3. Define Critical Services: Add service names (as recognized by systemctl) to config/services.conf, one per line.

    nginx
    docker
    cron
    
  4. Run Manually: Test the monitoring script.

    ./bin/monitor.sh
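  5. Automate with Cron: Install the cron job, which runs the monitor every 2 minutes. Note that crontab FILE replaces the current user's entire crontab, so merge any existing entries (shown by crontab -l) into cron/autowatch.cron first.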

    crontab cron/autowatch.cron

On Windows, run AutoWatch inside WSL (Windows Subsystem for Linux).
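Both config files are plain text. The Python below is a minimal sketch of how the two formats from steps 2 and 3 could be parsed, assuming that blank lines and #-comments are allowed (the README does not say). load_thresholds and load_services are hypothetical helper names, not part of AutoWatch. Because CPU_LIMIT=80 is also valid shell assignment syntax, a Bash script could simply source thresholds.conf instead.

```python
"""Hypothetical parser for AutoWatch's config files (not part of the repo)."""
from typing import Dict, List


def load_thresholds(text: str) -> Dict[str, float]:
    """Parse KEY=VALUE lines such as CPU_LIMIT=80 into {key: float}.

    Assumption: blank lines and lines starting with '#' may appear and are skipped.
    """
    limits: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {raw!r}")
        limits[key.strip()] = float(value.strip())
    return limits


def load_services(text: str) -> List[str]:
    """Return one systemd unit name per non-empty, non-comment line."""
    names: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


if __name__ == "__main__":
    # Example values taken from the installation steps.
    print(load_thresholds("CPU_LIMIT=80\nMEM_LIMIT=75\nDISK_LIMIT=85\n"))
    print(load_services("nginx\ndocker\ncron\n"))
```

A malformed line raises ValueError rather than being silently ignored, so a typo in thresholds.conf surfaces immediately instead of disabling a check.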

How It Works

  1. Monitor: monitor.sh gathers current system stats.
  2. Evaluate: Compares the stats against the limits in config/thresholds.conf, using bc for the floating-point comparison because Bash arithmetic handles integers only.
  3. Alert: If a threshold is breached, notifier.py logs the incident to logs/alerts.log.
  4. Remediate:
    • Disk Full: Triggers remediate.sh disk to clean /tmp and vacuum logs.
    • Service Down: Triggers remediate.sh service <name> to restart the failed service.
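The Evaluate and Remediate steps amount to a mapping from breaches to actions. The Python below is a hedged sketch of that mapping, not the repository's shell code. It assumes a breach means value > limit, that only disk and service breaches trigger remediate.sh (no CPU or memory fix is listed), and that remediate.sh is invoked from the repository root as bin/remediate.sh. The helper names are hypothetical; shutil.disk_usage and systemctl is-active --quiet are real interfaces.

```python
"""Hypothetical sketch of AutoWatch's Evaluate/Remediate decision (not the repo's code)."""
import shutil
import subprocess
from typing import Dict, List, Optional, Tuple

# (metric name, config key) pairs; keys match config/thresholds.conf.
CHECKS = (("cpu", "CPU_LIMIT"), ("mem", "MEM_LIMIT"), ("disk", "DISK_LIMIT"))


def disk_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def service_is_active(name: str) -> bool:
    """`systemctl is-active --quiet NAME` exits with status 0 only if the unit is active."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", name], check=False)
    return result.returncode == 0


def evaluate(
    metrics: Dict[str, float],
    limits: Dict[str, float],
    down_services: List[str],
) -> List[Tuple[str, Optional[List[str]]]]:
    """Return (alert message, remediation argv or None) for every breach.

    Assumptions: a breach means value > limit; only disk and service
    breaches map to remediate.sh; paths are relative to the repo root.
    """
    actions: List[Tuple[str, Optional[List[str]]]] = []
    for metric, key in CHECKS:
        value = metrics[metric]
        if value > limits[key]:
            command = ["bin/remediate.sh", "disk"] if metric == "disk" else None
            actions.append((f"{metric.upper()} usage critical: {value:g}%", command))
    for name in down_services:
        actions.append((f"Service down: {name}", ["bin/remediate.sh", "service", name]))
    return actions


if __name__ == "__main__":
    limits = {"CPU_LIMIT": 80.0, "MEM_LIMIT": 75.0, "DISK_LIMIT": 85.0}
    metrics = {"cpu": 92.0, "mem": 45.2, "disk": disk_percent("/")}
    for message, command in evaluate(metrics, limits, down_services=[]):
        # A real run would pass `command` to subprocess.run; here we only print it.
        print(message, command)
```

Keeping evaluate free of side effects makes the threshold logic easy to test; the caller decides whether to log each message and whether to run the returned command.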

πŸ“ Logs

  • metrics.log: 2025-12-23 20:00:00 cpu=12.5% mem=45.2% disk=60%
  • alerts.log: 2025-12-23 20:05:00 [ALERT] CPU usage critical: 92%
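The source of notifier.py is not shown here, so the sketch below only reproduces the two line formats in the examples above: a timestamp followed by key=value% pairs for metrics.log, or a bracketed level and a message for alerts.log. The function names (metrics_line, alert_line, append_line, parse_metrics_line) are hypothetical. A fixed, regular format like this keeps the logs easy to grep and to parse back into time-series data.

```python
"""Hypothetical helpers mirroring the example log line formats (not notifier.py itself)."""
import re
from datetime import datetime
from pathlib import Path
from typing import Tuple

TIMESTAMP_FMT = "%Y-%m-%d %H:%M:%S"
METRICS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) cpu=([\d.]+)% mem=([\d.]+)% disk=([\d.]+)%$"
)


def metrics_line(when: datetime, cpu: float, mem: float, disk: float) -> str:
    """Format one metrics.log entry, e.g. '... cpu=12.5% mem=45.2% disk=60%'."""
    return f"{when.strftime(TIMESTAMP_FMT)} cpu={cpu:g}% mem={mem:g}% disk={disk:g}%"


def alert_line(when: datetime, message: str, level: str = "ALERT") -> str:
    """Format one alerts.log entry, e.g. '... [ALERT] CPU usage critical: 92%'."""
    return f"{when.strftime(TIMESTAMP_FMT)} [{level}] {message}"


def append_line(path: Path, line: str) -> None:
    """Append one line to a log file, creating its parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


def parse_metrics_line(line: str) -> Tuple[datetime, float, float, float]:
    """Parse a metrics.log entry back into (timestamp, cpu, mem, disk)."""
    match = METRICS_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized metrics line: {line!r}")
    when = datetime.strptime(match.group(1), TIMESTAMP_FMT)
    cpu, mem, disk = (float(match.group(i)) for i in (2, 3, 4))
    return when, cpu, mem, disk


if __name__ == "__main__":
    print(alert_line(datetime(2025, 12, 23, 20, 5, 0), "CPU usage critical: 92%"))
    # → 2025-12-23 20:05:00 [ALERT] CPU usage critical: 92%
```

Opening the file in append mode for each entry keeps earlier history intact, which matches the role of alerts.log as an incident history.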

Built for reliability.
