This is an on-prem monitoring tool written entirely in clear, Python-only code (so you can modify it). It is designed to run on a LAN for availability monitoring of resources that aren't necessarily connected to the Internet, and/or where the on-prem monitoring itself is also required to have availability guarantees.
It supports multi-threaded availability checking of monitored resources for high-speed, near-realtime performance, if that is what you need (see the -t command line option). The default mode is single-threaded, which keeps logs clear and runs well on small systems like a Raspberry Pi.
It also supports pacing of monitoring alarms using a decaying curve that delivers alert notifications quickly at the start, then slows down notifications over time.
APMonitor.py (APMonitor) is primarily designed to work in tandem with Site24x7 and integrates very well with their "Heartbeat Monitoring".
To achieve guaranteed always-on monitoring service levels, simply set up local availability monitors in your config, sign up for a Pro Plan at Site24x7, then use the heartbeat_url and heartbeat_every_n_secs configuration options in APMonitor.py to ping a Heartbeat Monitoring URL endpoint at Site24x7 when the monitored resource is up. This ensures that when a heartbeat doesn't arrive from APMonitor, monitoring alerts fall back to Site24x7, and when both are working you have second-opinion availability monitoring reporting.
The service level guarantee works as follows: If the resource is down, APMonitor.py won't hit the Heartbeat Monitoring endpoint URL, and Site24x7 will then send an alert about the missed heartbeat without the need for any additional dependencies on-prem/on-site. So the entire machine APMonitor.py is running on can fall over, and you still get availability monitoring alerts sent, with all the benefits of having on-prem monitoring on your local network behind your firewall.
You can quickly sign up for a Site24x7.com Lite or Pro Plan for $10-$50 USD per month, then set up a bunch of Heartbeat Monitoring URL endpoints that work with APMonitor.py rather easily.
Note: Heartbeat Monitoring is not available on their Website Monitoring plans. You need an 'Infrastructure Monitoring' or 'All-In-One' plan for it to work correctly.
APMonitor also integrates well with Slack and Pushover via webhook URL endpoints, and supports email notifications via SMTP.
APMonitor is a neat way to guarantee your on-prem availability monitoring will always let you know about an outage and to avoid putting resources onto the net that don't need to be.
To put APMonitor into near-realtime mode so that it checks resources multiple times per second, use these global settings:
- Dial up threads with -t 15 on the command line or max_threads: 15 in the site config,
- set max_retries to 1, and
- dial down max_try_secs to 10 or 15 seconds
for real-time environments.
You do need to configure Site24x7's Heartbeat Monitoring to achieve high-availability second-opinion availability monitoring.
As an exemplar, for the following monitored resource:
monitors:
  - type: http
    name: home-nas
    address: https://192.168.1.12/api/bump
    expect: "(C) COPYRIGHT 2005, Super NAS Storage Inc."
    ssl_fingerprint: a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890
    heartbeat_url: https://plus.site24x7.com/hb/your-unique-heartbeat-id/homenas
    heartbeat_every_n_secs: 300
Set up Site24x7 as follows:
This will send a heartbeat to Site24x7 every 5 minutes, and Site24x7 will drop an alarm whenever a heartbeat doesn't arrive or arrives out of sequence +/- 1 minute. This ensures availability monitoring will always function, even when one of APMonitor or Site24x7 is down.
This also means you don't need to expose internal LAN resources to the Internet.
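The heartbeat behaviour described above (ping only while the resource is up, and no more often than heartbeat_every_n_secs) can be pictured with a small sketch. The function name heartbeat_due and its parameters are hypothetical and are not taken from APMonitor.py; it only assumes the ISO 8601 timestamp that the state file stores in last_successful_heartbeat.

```python
from datetime import datetime, timezone
from typing import Optional


def heartbeat_due(is_up: bool,
                  last_successful_heartbeat: Optional[str],
                  heartbeat_every_n_secs: Optional[int],
                  now: datetime) -> bool:
    """Return True when a heartbeat ping should be sent (illustrative only)."""
    if not is_up:
        # A down resource never pings, so the external service notices the gap.
        return False
    if last_successful_heartbeat is None or heartbeat_every_n_secs is None:
        # No previous heartbeat, or no interval set: ping on every successful check.
        return True
    last = datetime.fromisoformat(last_successful_heartbeat)
    return (now - last).total_seconds() >= heartbeat_every_n_secs


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 0, 5, 0, tzinfo=timezone.utc)
    print(heartbeat_due(True, "2024-01-01T00:00:00+00:00", 300, now))
```

Because a failed check simply skips the ping, the external service needs no knowledge of the LAN; it only has to notice that the expected heartbeat did not arrive.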
See Site24x7 docs for more info:
You might also want to consider alarm notification pacing, so that recently down resources generate more frequent messages, whilst long outages are notified less frequently. To enable:
- Set notify_every_n_secs to 3600 seconds (i.e., 1 hour), and
- Set after_every_n_notifications to 8,
which will slow alarms down to one per hour after 8 notifications.
An alternate config for monitored resources that have long outages is as follows:
- Set notify_every_n_secs to 43200 (i.e., 12 hours), and
- Set after_every_n_notifications to 6,
which will slow alarms down to one every 12 hours after 6 notifications, which means after a few days you will only get at most one alarm whilst asleep.
To see how the alarm pacing will accelerate then subsequently delay notifications, use the example calculations spreadsheet in 20151122 Reminder Timing with Quadratic Bezier Curve.xlsx to experiment with various configuration scenarios:
Note that alarm pacing can be set at a global level in the site: config, and is overridden when set at a per monitored resource level in the monitors: section of the config.
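To get a feel for the shape of the pacing curve without opening the spreadsheet, here is a minimal sketch of a quadratic Bezier interval calculation. It is not APMonitor's actual formula: the starting interval (start_secs), the control point (control_secs) and the function name paced_interval are all assumptions made for illustration, and the spreadsheet and source code remain the reference for the real numbers.

```python
from typing import Optional


def paced_interval(notified_count: int,
                   notify_every_n_secs: float,
                   after_every_n_notifications: int,
                   start_secs: float = 60.0,
                   control_secs: Optional[float] = None) -> float:
    """Interval to wait before the next notification (illustrative sketch).

    Evaluates B(t) = (1-t)^2*P0 + 2(1-t)t*P1 + t^2*P2 with
    P0 = start_secs (assumed), P1 = control_secs (assumed), P2 = notify_every_n_secs.
    """
    if after_every_n_notifications <= 1:
        # A value of 1 means constant notification intervals.
        return float(notify_every_n_secs)
    if control_secs is None:
        # Assumption: pull the curve towards the short interval early on.
        control_secs = start_secs
    t = min(notified_count / after_every_n_notifications, 1.0)
    return ((1 - t) ** 2 * start_secs
            + 2 * (1 - t) * t * control_secs
            + t ** 2 * notify_every_n_secs)


if __name__ == "__main__":
    # One-hour ceiling reached after 8 notifications.
    for n in range(10):
        print(n, round(paced_interval(n, 3600, 8)))
```

With these assumed control points the interval starts at start_secs, grows slowly for the first few notifications, and is pinned at notify_every_n_secs once notified_count reaches after_every_n_notifications.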
APMonitor uses a YAML or JSON configuration file to define the site being monitored and the resources to check. The configuration consists of two main sections: site-level settings that apply globally, and per-monitor settings that define individual resources to check.
Here's a complete example showing all available configuration options:
site:
  name: "HomeLab"
  email_server:
    smtp_host: "smtp.gmail.com"
    smtp_port: 587
    smtp_username: "alerts@example.com"
    smtp_password: "app_password_here"
    from_address: "alerts@example.com"
    use_tls: true
  outage_emails:
    - email: "admin@example.com"
      email_outages: true
      email_recoveries: true
      email_reminders: true
    - email: "manager@example.com"
      email_outages: yes
      email_recoveries: yes
      email_reminders: no
  outage_webhooks:
    - endpoint_url: "https://api.pushover.net/1/messages.json"
      request_method: POST
      request_encoding: JSON
      request_prefix: "token=your_app_token&user=your_user_key&message="
      request_suffix: ""
  max_threads: 1
  max_retries: 3
  max_try_secs: 20
  check_every_n_secs: 60
  notify_every_n_secs: 600
  after_every_n_notifications: 1
monitors:
  - type: ping
    name: home-fw
    address: "192.168.1.1"
    check_every_n_secs: 60
    email: true
    heartbeat_url: "https://hc-ping.com/uuid-here"
    heartbeat_every_n_secs: 300
  - type: http
    name: in3245622
    address: "http://192.168.1.21/Login?oldUrl=Index"
    expect: "System Name: <b>HomeLab</b>"
    check_every_n_secs: 120
    notify_every_n_secs: 3600
    after_every_n_notifications: 5
    email: yes
  - type: http
    name: nvr0
    address: "https://192.168.1.12/api/system"
    expect: "nvr0"
    ssl_fingerprint: "a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890"
    ignore_ssl_expiry: true
    email: false
    heartbeat_url: "https://plus.site24x7.com/hb/uuid/nvr0"
    heartbeat_every_n_secs: 60
  - type: quic
    name: fast-api
    address: "https://192.168.1.50/api/health"
    expect: "ok"
    check_every_n_secs: 30
The site section defines global settings for the monitoring site.
name (string): The name of the site being monitored. Used in notification messages to identify which site is reporting issues.
site:
  name: "HomeLab"
email_server (object, optional): SMTP server configuration for sending email notifications. Required if outage_emails is configured.
email_server:
  smtp_host: "smtp.gmail.com"
  smtp_port: 587
  smtp_username: "alerts@example.com"
  smtp_password: "app_password_here"
  from_address: "alerts@example.com"
  use_tls: true
- smtp_host (string, required): SMTP server hostname or IP address
- smtp_port (integer, required): SMTP server port (typically 587 for TLS, 465 for SSL, 25 for unencrypted). Must be between 1 and 65535
- smtp_username (string, optional): SMTP authentication username. Not required for servers without authentication
- smtp_password (string, optional): SMTP authentication password. Not required for servers without authentication. Use app-specific passwords for Gmail/Google Workspace
- from_address (string, required): Email address to use in the "From" field. Must be a valid email address
- use_tls (boolean, optional): Whether to use TLS/STARTTLS encryption. Default: true
Note: For Gmail/Google Workspace, you must use an app-specific password rather than your account password. Port 587 with use_tls: true is the recommended configuration for most SMTP servers. For servers without authentication (like local SMTP relays), omit smtp_username and smtp_password.
outage_emails (list of objects, optional): Email addresses to notify when resources go down or recover. Requires email_server to be configured. Each entry is an object with an email field and optional notification control flags.
outage_emails:
  - email: "admin@example.com"
    email_outages: true
    email_recoveries: true
    email_reminders: true
  - email: "oncall@example.com"
    email_outages: yes
    email_recoveries: no
- email (string, required): Valid email address matching standard email format
- email_outages (boolean/integer/string, optional): Send email when resource goes down. Accepts: true/yes/on/1 (case-insensitive) for enabled, false/no/off/0 for disabled. Default: true
- email_recoveries (boolean/integer/string, optional): Send email when resource recovers. Accepts same values as email_outages. Default: true
- email_reminders (boolean/integer/string, optional): Send email for ongoing outage reminders (respecting notify_every_n_secs throttling). Accepts same values as email_outages. Default: true
Note: These email control flags allow fine-grained control over which notifications each recipient receives. For example, operations staff might want all notifications (email_outages: true, email_recoveries: true, email_reminders: true), while management might only want initial outage alerts (email_outages: true, email_recoveries: false, email_reminders: false).
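As an illustration of how these flags might be interpreted, the sketch below normalises the accepted values and picks the recipients for one kind of event. The helper names parse_flag and recipients_for are hypothetical and are not APMonitor's own functions; only the accepted values and the default of true come from the descriptions above.

```python
TRUE_VALUES = {"true", "yes", "on", "1"}
FALSE_VALUES = {"false", "no", "off", "0"}


def parse_flag(value, default=True):
    """Normalise a boolean/integer/string flag to a bool (illustrative only)."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognised flag value: {value!r}")


def recipients_for(event, outage_emails):
    """Addresses that want 'outages', 'recoveries' or 'reminders' emails."""
    key = f"email_{event}"
    return [entry["email"] for entry in outage_emails
            if parse_flag(entry.get(key))]


if __name__ == "__main__":
    outage_emails = [
        {"email": "admin@example.com"},
        {"email": "manager@example.com",
         "email_recoveries": "no", "email_reminders": 0},
    ]
    for event in ("outages", "recoveries", "reminders"):
        print(event, recipients_for(event, outage_emails))
```

Note that a YAML 1.1 parser such as PyYAML typically turns unquoted yes/no/on/off into real booleans before any code like this sees them, so the string branch mainly matters for quoted values.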
outage_webhooks(list of objects, optional): Webhook endpoints to call when resources go down or recover. Each webhook requires several configuration fields.
outage_webhooks:
  - endpoint_url: "https://api.example.com/alerts"
    request_method: POST
    request_encoding: JSON
    request_prefix: ""
    request_suffix: ""
- endpoint_url (string, required): Valid URL with scheme and host for the webhook
- request_method (string, required): HTTP method, must be GET or POST
- request_encoding (string, required): Message encoding format:
  - URL: URL-encode the message (for query parameters or form data)
  - HTML: HTML-escape the message
  - JSON: Send as JSON object with message field (POST only)
  - CSVQUOTED: CSV-quote the message for comma-separated values
- request_prefix (string, optional): String to prepend to encoded message (e.g., API tokens, field names)
- request_suffix (string, optional): String to append to encoded message
max_threads (integer, optional): Number of concurrent threads for checking resources in parallel. Must be ≥ 1. Default: 1 (single-threaded). Can be overridden by the command line -t option.
max_threads: 1
Note: For near-realtime monitoring environments, set max_threads to 5-15 to enable parallel checking of multiple resources. Single-threaded mode (1) is recommended for small systems like Raspberry Pi or when log clarity is important. This setting is overridden by the -t command line argument if specified.
max_retries (integer, optional): Number of times to retry failed checks before marking resource as down. Must be ≥ 1. Default: 3
max_retries: 3
Note: For near-realtime monitoring, set max_retries: 1 to reduce detection latency. Higher values (3-5) are better for unstable networks where transient failures are common.
max_try_secs (integer, optional): Timeout in seconds for each individual check attempt. Must be ≥ 1. Default: 20
max_try_secs: 20
check_every_n_secs (integer, optional): Default seconds between checks for all monitors. Individual monitors can override this with their own check_every_n_secs setting. Must be ≥ 1. Default: 60
check_every_n_secs: 300
Note: This sets the baseline check interval for all monitors. Can be overridden per-monitor for resources requiring different check frequencies. When a monitor's configuration changes (detected via SHA-256 checksum), it is checked immediately regardless of this interval.
notify_every_n_secs (integer, optional): Default minimum seconds between outage notifications for all monitors. Individual monitors can override this with their own notify_every_n_secs setting. Must be ≥ 1. Default: 600
notify_every_n_secs: 1800
Note: This sets the baseline notification throttling interval. Combined with after_every_n_notifications, controls the notification escalation curve for all monitors unless overridden per-monitor.
after_every_n_notifications (integer, optional): Default number of notifications after which the notification interval reaches notify_every_n_secs for all monitors. Individual monitors can override this with their own after_every_n_notifications setting. Must be ≥ 1. Default: 1 (constant notification intervals)
after_every_n_notifications: 1
Note: When set to a value > 1, notification intervals start shorter and gradually increase following a quadratic Bezier curve until reaching notify_every_n_secs after the specified number of notifications. This provides more frequent alerts at the start of an outage when immediate attention is needed, then reduces notification frequency as the outage continues. A value of 1 maintains constant notification intervals (original behavior).
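Returning to the outage_webhooks options in this section, the four request_encoding values can be pictured with standard-library calls. This is a sketch only: encode_message and build_payload are hypothetical names, and simply concatenating request_prefix, the encoded message and request_suffix is an assumption about how the pieces are joined (particularly for JSON), not a description of APMonitor's internals.

```python
import csv
import html
import io
import json
from urllib.parse import quote_plus


def encode_message(message: str, request_encoding: str) -> str:
    """Encode an alert message in one of the four documented formats."""
    encoding = request_encoding.upper()
    if encoding == "URL":
        return quote_plus(message)          # for query strings or form data
    if encoding == "HTML":
        return html.escape(message)         # escapes <, >, & and quotes
    if encoding == "JSON":
        return json.dumps({"message": message})
    if encoding == "CSVQUOTED":
        buf = io.StringIO()
        csv.writer(buf, quoting=csv.QUOTE_ALL).writerow([message])
        return buf.getvalue().rstrip("\r\n")  # drop the row terminator
    raise ValueError(f"unsupported request_encoding: {request_encoding!r}")


def build_payload(message: str, webhook: dict) -> str:
    """Assumed layout: prefix + encoded message + suffix."""
    return (webhook.get("request_prefix", "")
            + encode_message(message, webhook["request_encoding"])
            + webhook.get("request_suffix", ""))


if __name__ == "__main__":
    hook = {"request_encoding": "URL",
            "request_prefix": "token=your_app_token&message="}
    print(build_payload("home-nas is down", hook))
```

To check what a real endpoint receives, the --test-webhooks option described later sends a dummy alert through each configured webhook.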
The monitors section is a list of resources to monitor. Each monitor defines what to check and how often.
- type (string): Type of check to perform. Must be one of:
  - ping: ICMP ping check
  - http: HTTP/HTTPS endpoint check (supports both HTTP and HTTPS schemes, follows and checks redirect chain for errors)
  - quic: HTTP/3 over QUIC endpoint check (UDP-based, faster than HTTP/HTTPS for high-latency networks)
- name (string): Unique identifier for this monitor. Must be unique across all monitors in the configuration. Used in notifications and state tracking.
- address (string): Resource to check. Format depends on monitor type:
  - For ping: Valid hostname, IPv4, or IPv6 address
  - For http/quic: Full URL with scheme and host
check_every_n_secs (integer, optional): Seconds between checks for this resource. Overrides site-level check_every_n_secs. Must be ≥ 1. Default: 60 (or site-level setting if configured)
check_every_n_secs: 300
Note: When a monitor's configuration changes (any field modification), the monitor is checked immediately on the next run regardless of this interval. Configuration changes are detected via SHA-256 checksum stored in the state file.
notify_every_n_secs (integer, optional): Minimum seconds between outage notifications while resource remains down. Must be ≥ 1 and ≥ check_every_n_secs. Default: 600
notify_every_n_secs: 1800
after_every_n_notifications (integer, optional): Number of notifications after which the notification interval reaches notify_every_n_secs for this specific monitor. Overrides site-level after_every_n_notifications. Can only be specified if notify_every_n_secs is present. Must be ≥ 1.
notify_every_n_secs: 3600
after_every_n_notifications: 5
Behavior: Notification timing follows a quadratic Bezier curve: intervals start shorter and gradually increase over the first N notifications until reaching the full notify_every_n_secs interval. After N notifications, the interval remains constant at notify_every_n_secs. This provides aggressive early alerting that tapers off as outages persist.
email (boolean/integer/string, optional): Master switch to enable/disable email notifications for this specific monitor. Accepts: true/yes/on/1 (case-insensitive) for enabled, false/no/off/0 for disabled. Default: true (enabled if email_server configured)
email: true
Note: When set to false, this monitor will not send any email notifications regardless of site-level outage_emails configuration. Useful for non-critical resources or during maintenance windows. This is a monitor-level override that takes precedence over all other email settings.
heartbeat_url (string, optional): URL to ping (HTTP GET) when resource check succeeds. Useful for external monitoring services like Site24x7 or Healthchecks.io. Must be valid URL with scheme and host.
heartbeat_url: "https://hc-ping.com/your-uuid-here"
heartbeat_every_n_secs (integer, optional): Seconds between heartbeat pings. Must be ≥ 1. Can only be specified if heartbeat_url is present. If not specified, heartbeat is sent on every successful check.
heartbeat_every_n_secs: 300
These fields are only valid for monitors with type: http or type: quic:
expect (string, optional): Substring that must appear in the HTTP response body for the check to succeed. If not present, any 200 OK response is considered successful. The check performs a simple string search: if the expected content appears anywhere in the response body, the check passes.
expect: "System Name: <b>HomeLab</b>"
Note: The expect field is string-only for simplicity. It performs exact substring matching (case-sensitive). For complex validation scenarios requiring status code checks, header validation, or regex matching, consider using external monitoring tools or extending APMonitor.
ssl_fingerprint (string, optional): SHA-256 fingerprint of the expected SSL/TLS certificate (with or without colons). Enables certificate pinning for self-signed certificates. When specified, the certificate is verified before making the HTTP request.
ssl_fingerprint: "e85260e8f8e85629cfa4d023ea0ae8dd3ce8ccc0040b054a4753c2a5ab269296"
ignore_ssl_expiry (boolean/integer/string, optional): Skip SSL/TLS certificate expiration checking. Accepts: true/1/"yes"/"ok" (case-insensitive) for true, or false/0/"no" for false. Useful for development environments or when certificate renewal is managed separately.
ignore_ssl_expiry: true
Ping Monitor:
- type: ping
  name: home-gateway
  address: "192.168.1.1"
  check_every_n_secs: 60
  heartbeat_url: "https://hc-ping.com/uuid-here"
HTTP Monitor with Content Check:
- type: http
  name: web-server
  address: "http://192.168.1.100/health"
  expect: "status: ok"
  check_every_n_secs: 120
  notify_every_n_secs: 3600
HTTPS Monitor with Certificate Pinning:
- type: http
  name: nvr0
  address: "https://192.168.1.12/api/system"
  expect: "nvr0"
  ssl_fingerprint: "e85260e8f8e85629cfa4d023ea0ae8dd3ce8ccc0040b054a4753c2a5ab269296"
  ignore_ssl_expiry: true
  heartbeat_url: "https://plus.site24x7.com/hb/uuid/nvr0"
  heartbeat_every_n_secs: 60
QUIC Monitor (HTTP/3):
- type: quic
  name: fast-api
  address: "https://api.example.com/health"
  expect: "healthy"
  check_every_n_secs: 30
  ssl_fingerprint: "a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890"
Note: QUIC monitoring uses HTTP/3 over UDP (port 443 by default) and is particularly effective for high-latency networks or when monitoring resources over unreliable connections. QUIC provides built-in connection migration and improved performance compared to TCP-based HTTP/2.
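If you need a value for the ssl_fingerprint field used in the pinning examples, one way to obtain it is to hash the server's DER-encoded certificate with SHA-256. The sketch below uses only the Python standard library; the function names are hypothetical, and it is not necessarily how APMonitor itself performs the comparison (the dependency list mentions pyOpenSSL for that).

```python
import hashlib
import ssl


def normalize_fingerprint(fingerprint: str) -> str:
    """Accept fingerprints with or without colons, in any letter case."""
    return fingerprint.replace(":", "").strip().lower()


def sha256_fingerprint(der_cert: bytes) -> str:
    """SHA-256 of the DER-encoded certificate as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def fetch_fingerprint(host: str, port: int = 443) -> str:
    """Fetch the server certificate (no chain validation) and fingerprint it."""
    pem = ssl.get_server_certificate((host, port))
    return sha256_fingerprint(ssl.PEM_cert_to_DER_cert(pem))


def fingerprint_matches(der_cert: bytes, expected: str) -> bool:
    return sha256_fingerprint(der_cert) == normalize_fingerprint(expected)


if __name__ == "__main__":
    # Placeholder bytes stand in for a real DER certificate here.
    print(sha256_fingerprint(b"abc"))
```

Calling fetch_fingerprint("192.168.1.12") against your own device would print a 64-character hex string suitable for pasting into the config. Only do this over a network path you trust, since pinning is only as good as the first fetch.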
The configuration validator enforces these rules:
- Monitor names must be unique across all monitors
- notify_every_n_secs must be ≥ check_every_n_secs if both specified
- heartbeat_every_n_secs can only be specified if heartbeat_url exists
- expect, ssl_fingerprint, and ignore_ssl_expiry are only valid for HTTP/QUIC monitors
- expect must be a non-empty string if specified
- All URLs must include both scheme (http/https) and hostname
- Email addresses must match standard email format (RFC 5322 simplified)
- SSL fingerprints must be valid hexadecimal strings with length that's a power of two
- after_every_n_notifications can only be specified if notify_every_n_secs is present
- outage_emails can only be specified if email_server is configured
- If email_server is present, smtp_host, smtp_port, and from_address are required
- smtp_username and smtp_password are optional (for servers without authentication)
- Email control flags (email_outages, email_recoveries, email_reminders) accept boolean or string values
- Monitor-level email flag accepts boolean or string values
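A few of the per-monitor rules above can be expressed as a small checker. This is a hedged sketch, not APMonitor's validator: validate_monitor and its error strings are hypothetical, and it covers only a subset of the rules (uniqueness of names and the email rules are left out).

```python
import string
from urllib.parse import urlparse

HTTP_ONLY_FIELDS = ("expect", "ssl_fingerprint", "ignore_ssl_expiry")


def has_scheme_and_host(url) -> bool:
    parsed = urlparse(str(url))
    return bool(parsed.scheme) and bool(parsed.hostname)


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def validate_monitor(monitor: dict) -> list:
    """Return a list of rule violations for one monitor (empty if none)."""
    errors = []
    check = monitor.get("check_every_n_secs")
    notify = monitor.get("notify_every_n_secs")
    if check is not None and notify is not None and notify < check:
        errors.append("notify_every_n_secs must be >= check_every_n_secs")
    if "heartbeat_every_n_secs" in monitor and "heartbeat_url" not in monitor:
        errors.append("heartbeat_every_n_secs requires heartbeat_url")
    if "after_every_n_notifications" in monitor and notify is None:
        errors.append("after_every_n_notifications requires notify_every_n_secs")
    if monitor.get("type") in ("http", "quic"):
        if not has_scheme_and_host(monitor.get("address", "")):
            errors.append("address must include scheme and hostname")
    else:
        for field in HTTP_ONLY_FIELDS:
            if field in monitor:
                errors.append(f"{field} is only valid for http/quic monitors")
    if "heartbeat_url" in monitor and not has_scheme_and_host(monitor["heartbeat_url"]):
        errors.append("heartbeat_url must include scheme and hostname")
    if "expect" in monitor:
        expect = monitor["expect"]
        if not (isinstance(expect, str) and expect):
            errors.append("expect must be a non-empty string")
    if "ssl_fingerprint" in monitor:
        digits = str(monitor["ssl_fingerprint"]).replace(":", "")
        if (not is_power_of_two(len(digits))
                or any(c not in string.hexdigits for c in digits)):
            errors.append("ssl_fingerprint must be hex with power-of-two length")
    return errors


if __name__ == "__main__":
    print(validate_monitor({"type": "ping", "name": "gw",
                            "address": "192.168.1.1", "expect": "x"}))
```

Collecting every violation in a list, rather than stopping at the first, makes it easier to fix a config file in one pass.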
Install system-wide for production use:
sudo pip3 install PyYAML requests pyOpenSSL urllib3 aioquic
Or on Debian 12+ systems:
sudo pip3 install --break-system-packages PyYAML requests pyOpenSSL urllib3 aioquic
Note: The aioquic package is required for QUIC/HTTP3 monitoring support. If you don't plan to use type: quic monitors, you can omit this dependency.
./APMonitor.py -s /tmp/statefile.json homelab-monitorhosts.yaml
./APMonitor.py --test-webhooks -v homelab-monitorhosts.yaml
./APMonitor.py --test-emails -v homelab-monitorhosts.yaml
APMonitor is invoked from the command line with various options to control verbosity, threading, state file location, and testing modes.
./APMonitor.py [OPTIONS] <config_file>
- config_file (required): Path to YAML or JSON configuration file
- -v, --verbose: Increase verbosity level (can be repeated: -v, -vv, -vvv). Shows check progress, skip reasons, and diagnostic information. Useful for troubleshooting configuration or understanding monitoring behavior.
- -t, --threads <N>: Number of concurrent threads for checking resources (default: 1). Higher values enable parallel checking of multiple resources but increase lock contention. Use values > 1 for systems with many independent monitors. Will override the configuration file settings if max_threads is specified in the site config.
- -s, --statefile <path>: Path to state file for persistence (default: platform-dependent). Recommended: use /tmp/statefile.json to store state in tmpfs for better performance and reduced disk wear.
- --test-webhooks: Test webhook notifications by sending a dummy alert to all configured webhooks, then exit. Does not check resources or modify state file. Useful for verifying webhook configuration and credentials.
- --test-emails: Test email notifications by sending a dummy alert to all configured email addresses, then exit. Does not check resources or modify state file.
Run with default settings, state stored in tmpfs:
./APMonitor.py -s /tmp/statefile.json monitoring-config.yaml
Show detailed progress and decision-making:
./APMonitor.py -v -s /tmp/statefile.json monitoring-config.yaml
Check many resources concurrently for near-realtime behavior:
./APMonitor.py -t 10 -s /tmp/statefile.json monitoring-config.yaml
Use higher thread counts (-t 5 to -t 20) when:
- Monitoring many independent resources (50+)
- Resources have long check timeouts
- Near-realtime alerting is required
- System has sufficient CPU cores
Warning: High thread counts increase lock contention. Test with -v to ensure checks aren't blocking each other.
Verify webhooks are configured correctly before production use:
./APMonitor.py --test-webhooks -v monitoring-config.yaml
This sends test messages to all configured webhooks with verbose output showing request/response details.
Verify email settings work correctly:
./APMonitor.py --test-emails -v monitoring-config.yaml
APMonitor is designed to be run repeatedly rather than as a long-running daemon. There are two common approaches:
Run every minute via cron for standard monitoring:
* * * * * /path/to/APMonitor.py -s /tmp/statefile.json /path/to/monitoring-config.yaml 2>&1 | logger -t apmonitor
NB: PID file locking should keep this under control, in case you get a long-running process.
Advantages:
- Automatic restart if process crashes
- Built-in scheduling
- System handles process lifecycle
- Easy to enable/disable (comment out cron entry)
Best for: Production systems, servers with standard monitoring requirements (check intervals ≥ 60 seconds)
Run continuously with short sleep intervals for near-realtime monitoring:
#!/bin/bash
while true; do
./APMonitor.py -t 5 -s /tmp/statefile.json monitoring-config.yaml
sleep 10
done
Or as a one-liner:
while true; do ./APMonitor.py -s /tmp/statefile.json monitoring-config.yaml; sleep 30; done
Advantages:
- Sub-minute check intervals
- Near-realtime alerting
- Fine control over execution frequency
Best for: Development, testing, systems requiring rapid failure detection (check intervals < 60 seconds)
Note: Use short sleep intervals (5-30 seconds) combined with per-resource check_every_n_secs settings to balance responsiveness and system load. APMonitor's internal scheduling prevents redundant checks even with frequent invocations.
For production deployments requiring process supervision:
[Unit]
Description=APMonitor Network Resource Monitor
After=network.target
[Service]
Type=simple
ExecStart=/bin/bash -c 'while true; do /usr/local/bin/APMonitor.py -vv -s /var/tmp/apmonitor-statefile.json /usr/local/etc/apmonitor-config.yaml; sleep 15; done'
Restart=always
RestartSec=10
User=monitoring
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
APMonitor automatically selects a platform-appropriate default location for the state file if the -s/--statefile option is not specified:
Default: /var/tmp/apmonitor-statefile.json
- Located in /var/tmp which persists across system reboots
- Preserves monitoring history and outage timestamps through restarts
- Enables accurate outage duration reporting even after system reboot
- No special permissions required (unlike /var/run)
Default: %TEMP%\apmonitor-statefile.json
- Uses the system temporary directory defined by TEMP or TMP environment variables
- Typically resolves to C:\Users\<username>\AppData\Local\Temp\apmonitor-statefile.json
- Falls back to C:\Temp\apmonitor-statefile.json if environment variables are not set
Default: ./apmonitor-statefile.json
- Creates state file in current working directory
- Safe fallback for uncommon or embedded systems
- Avoids permission issues on unfamiliar filesystem layouts
APMonitor's state file locking & PID locking is designed for single-process concurrency only—multiple threads within one process safely share state through internal locks. However, no file-level locking is implemented to coordinate between multiple APMonitor processes.
Having said that, APMonitor is very much re-entrant and thread safe for the most part, thus, if you specify different config files, it will happily allow a single process per config file to co-exist in parallel.
The config filename is used as the hash when forming a PID lockfile in tmpfs (/tmp/apmonitor-##########.lock), so that multiple lockfiles can coexist.
Thus, running multiple concurrent instances requires separate state files:
# Instance 1: Production monitoring
./APMonitor.py -s /var/tmp/apmonitor-prod.json prod-apmonitor-config.yaml
# Instance 2: Development monitoring
./APMonitor.py -s /var/tmp/apmonitor-dev.json dev-apmonitor-config.yaml
# Instance 3: Critical services (high-frequency)
./APMonitor.py -t 5 -s /tmp/apmonitor-critical.json critical-apmonitor-config.yaml
This should enforce sensible cardinality rules: one config per site, and one (and only one) process per config, which is well suited to running out of crontab.
Why separate state files are required:
- No inter-process file locking mechanism exists
- Concurrent writes from multiple processes will corrupt state files
- Each process maintains independent monitoring schedules and notification state
- Atomic file rotation (.new → .old) only protects single-process integrity
Use cases for multiple instances:
- Different monitoring priorities (high-frequency critical vs. low-frequency non-critical)
- Separate environments (production, staging, development)
- Independent notification channels (ops team vs. dev team)
- Isolated failure domains (prevent one misconfigured monitor from blocking others)
Always specify -s/--statefile when:
- Running from cron (working directory may vary)
- Requiring tmpfs storage for performance (-s /tmp/apmonitor-statefile.json)
- Managing multiple independent monitoring instances
- Deploying in containers or restricted environments
Example: Force tmpfs storage (cleared on reboot, faster I/O):
./APMonitor.py -s /tmp/apmonitor-statefile.json apmonitor-config.yaml
Note: The apmonitor- prefix prevents naming collisions with other applications using generic statefile.json names.
APMonitor uses a JSON state file to persist monitoring data across runs:
- Location: Recommended path is /tmp/statefile.json for tmpfs storage (faster, no disk wear)
- Format: JSON with per-resource nested objects containing timestamps, status, and counters
- Atomic Updates: Uses .new and .old rotation to prevent corruption on crashes
- Thread Safety: Protected by internal lock during concurrent access
The state file tracks:
- is_up: Current resource status
- last_checked: When resource was last checked (ISO 8601 timestamp)
- last_response_time_ms: Response time in milliseconds for successful checks
- last_notified: When last notification was sent (ISO 8601 timestamp)
- last_alarm_started: When current/last outage began (ISO 8601 timestamp)
- last_successful_heartbeat: When heartbeat URL last succeeded (ISO 8601 timestamp)
- down_count: Consecutive failed checks
- notified_count: Number of notifications sent for current outage
- error_reason: Last error message
- last_config_checksum: SHA-256 hash of monitor configuration (detects config changes)
Note: If using /tmp/statefile.json, the state file is cleared on system reboot. This resets all monitoring history but doesn't affect functionality—monitoring resumes normally on first run.
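The .new/.old rotation mentioned under Atomic Updates follows a common pattern: write the complete new state to a side file, keep the previous file as a backup, then swap the new file into place. The sketch below shows one way to do that; save_state and load_state are hypothetical names, and the exact order of renames in APMonitor.py may differ.

```python
import json
import os


def save_state(path: str, state: dict) -> None:
    """Write state via <path>.new, keeping the previous file as <path>.old."""
    new_path = path + ".new"
    old_path = path + ".old"
    with open(new_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
        fh.flush()
        os.fsync(fh.fileno())          # make sure the bytes reach the disk
    if os.path.exists(path):
        os.replace(path, old_path)     # previous state becomes the backup
    os.replace(new_path, path)         # rename is atomic on one filesystem


def load_state(path: str) -> dict:
    """Load state, returning an empty dict when no state file exists yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


if __name__ == "__main__":
    import tempfile
    target = os.path.join(tempfile.mkdtemp(), "apmonitor-statefile.json")
    save_state(target, {"home-nas": {"is_up": True, "down_count": 0}})
    print(load_state(target))
```

A crash during the write leaves only a partial .new file behind, so the last complete state file is never half-written. As noted earlier, this protects a single process only; two processes sharing one path can still overwrite each other.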
Configuration Change Detection: The last_config_checksum field stores a SHA-256 hash of the entire monitor configuration (all fields including type, name, address, expect, etc.). When APMonitor detects a configuration change (checksum mismatch), it immediately checks that monitor regardless of check_every_n_secs timing. This ensures configuration changes take effect on the next run without waiting for the scheduled check interval.
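A minimal sketch of this kind of change detection is shown below. The canonical form used here (JSON with sorted keys) and the names config_checksum and config_changed are assumptions for illustration; APMonitor may serialise the monitor differently before hashing.

```python
import hashlib
import json


def config_checksum(monitor: dict) -> str:
    """SHA-256 over a canonical JSON rendering of the monitor's fields."""
    canonical = json.dumps(monitor, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def config_changed(monitor: dict, resource_state: dict) -> bool:
    """True when the stored checksum is missing or no longer matches."""
    return resource_state.get("last_config_checksum") != config_checksum(monitor)


if __name__ == "__main__":
    monitor = {"type": "ping", "name": "home-fw", "address": "192.168.1.1"}
    state = {"last_config_checksum": config_checksum(monitor)}
    print(config_changed(monitor, state))   # unchanged config
    monitor["address"] = "192.168.1.254"
    print(config_changed(monitor, state))   # edited config
```

Sorting the keys means that reordering fields in the config file does not count as a change in this sketch, while editing any value does.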
Here are some basic devnotes on how APMonitor is built, in case you want to modify it.
Each invocation of APMonitor:
- Acquires a PID lockfile via tmpfs, using the config path as the hash to support multiple site configs in parallel.
- Loads and validates configuration file
- Loads previous state from state file (if exists)
- For each monitor:
  - Calculates SHA-256 checksum of monitor configuration
  - Checks if configuration changed (checksum mismatch) or check_every_n_secs elapsed since last_checked
  - If config changed: checks immediately (bypasses timing)
  - If due: performs resource check
  - If down and notify_every_n_secs elapsed: sends notifications
  - If up and heartbeat configured: pings heartbeat URL if due
  - Updates state atomically with new checksum
- Saves state file with execution timing
- Cleans up the PID file if possible.
- Exits
This stateless design allows APMonitor to be killed/restarted safely at any time without losing monitoring history or creating duplicate notifications.
APMonitor was designed with an engineering based approach to Vibe Coding in mind, should you wish to change it.
Steps:
- Paste in READAI.md (containing an Entrance Prompt) into your favourite AI coding tool (e.g., Grok 4.1 or Claude Sonnet)
- Paste in APMonitor.py (tell your AI this is the source code)
- Paste in README.md (tell your AI this is the documentation)
- Vibe your changes as you see fit.
Enjoy!
This guide covers installing APMonitor as a systemd service on Debian-based systems (Debian 10+, Ubuntu 20.04+).
Fresh Debian/Ubuntu system with sudo access.
If you want to do an automated install, just follow these instructions, otherwise start with Step 1 below:
# Install (requires root)
sudo make install
# Edit configuration
sudo nano /usr/local/etc/apmonitor-config.yaml
# Test configuration
make test-config
# Enable and start service
sudo make enable
# Check status
make status
# View logs
make logs
# Restart after config changes
sudo make restart
# Uninstall completely
sudo make uninstall
sudo apt update
sudo apt install python3 python3-pip -y
Install dependencies globally (required for systemd service):
sudo pip3 install --break-system-packages PyYAML requests pyOpenSSL urllib3 aioquic
Note: On Debian 12+, the --break-system-packages flag is required. On older systems, omit this flag:
sudo pip3 install PyYAML requests pyOpenSSL urllib3 aioquic
Dependencies installed:
- PyYAML - YAML configuration file parsing
- requests - HTTP/HTTPS resource checking and webhook notifications
- pyOpenSSL - SSL certificate verification and fingerprint checking
- urllib3 - HTTP connection pooling (dependency of requests)
- aioquic - QUIC/HTTP3 protocol support (required for type: quic monitors)
Create a dedicated system user for running APMonitor:
sudo useradd -r -s /bin/bash -d /var/lib/apmonitor -m monitoring
Copy the APMonitor script and example configuration to system locations:
# Install APMonitor script
sudo cp APMonitor.py /usr/local/bin/
sudo chmod +x /usr/local/bin/APMonitor.py
# Install example configuration
sudo cp example-apmonitor-config.yaml /usr/local/etc/apmonitor-config.yaml
sudo chown monitoring:monitoring /usr/local/etc/apmonitor-config.yaml
sudo chmod 640 /usr/local/etc/apmonitor-config.yaml
Important: Edit /usr/local/etc/apmonitor-config.yaml to configure your monitoring targets, notification endpoints, and site name before proceeding.
Create the systemd service definition:
sudo nano /etc/systemd/system/apmonitor.service
Paste the following content:
[Unit]
Description=APMonitor Network Resource Monitor
After=network.target
[Service]
Type=simple
ExecStart=/bin/bash -c 'while true; do /usr/local/bin/APMonitor.py -vv -s /var/tmp/apmonitor-statefile.json /usr/local/etc/apmonitor-config.yaml; sleep 15; done'
Restart=always
RestartSec=10
User=monitoring
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Save and exit (Ctrl+X, then Y, then Enter in nano).
Reload systemd, enable the service to start on boot, and start it:
sudo systemctl daemon-reload
sudo systemctl enable apmonitor.service
sudo systemctl start apmonitor.service
Check service status:
sudo systemctl status apmonitor.service
View live logs:
sudo journalctl -u apmonitor.service -f
View recent logs:
sudo journalctl -u apmonitor.service -n 100
Run APMonitor manually as the monitoring user to verify configuration:
sudo -u monitoring /usr/local/bin/APMonitor.py -vv -s /var/tmp/apmonitor-statefile.json /usr/local/etc/apmonitor-config.yaml
Test webhook configuration without checking resources:
sudo -u monitoring /usr/local/bin/APMonitor.py --test-webhooks -v /usr/local/etc/apmonitor-config.yaml
Test email configuration without checking resources:
sudo -u monitoring /usr/local/bin/APMonitor.py --test-emails -v /usr/local/etc/apmonitor-config.yaml
Verify the monitoring user can write to the state file location:
sudo ls -la /var/tmp/apmonitor-statefile.json
The /var/tmp directory should have permissions 1777 (drwxrwxrwt) allowing any user to create files.
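The permission check above can also be done programmatically. The sketch below is illustrative only (the helper name `describe_dir_mode` is hypothetical): it reads the directory mode and reports whether it is world-writable with the sticky bit set, which is what 1777 means.

```python
import os
import stat


def describe_dir_mode(path):
    """Return (octal_mode, is_world_writable, has_sticky_bit) for a directory."""
    perms = stat.S_IMODE(os.stat(path).st_mode)
    return (
        oct(perms),
        bool(perms & stat.S_IWOTH),
        bool(perms & stat.S_ISVTX),
    )


if __name__ == "__main__":
    target = "/var/tmp"
    if os.path.isdir(target):
        octal, world_writable, sticky = describe_dir_mode(target)
        print(f"{target} mode={octal} world_writable={world_writable} sticky={sticky}")
        # os.access answers for the user running this script,
        # so run it as the monitoring user to test that account.
        print("writable by current user:", os.access(target, os.W_OK | os.X_OK))
```

Since `os.access` reports on the calling user, the result is only meaningful for the service if the script is run via `sudo -u monitoring`.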
Display the active configuration:
sudo cat /usr/local/etc/apmonitor-config.yaml
# Stop service
sudo systemctl stop apmonitor.service
# Restart service (after config changes)
sudo systemctl restart apmonitor.service
# Disable service from starting on boot
sudo systemctl disable apmonitor.service
# Check if service is enabled
sudo systemctl is-enabled apmonitor.service
After modifying /usr/local/etc/apmonitor-config.yaml, the changes take effect automatically on the next monitoring cycle (typically within 30 seconds). APMonitor detects configuration changes via SHA-256 checksums and checks any modified monitors as soon as that cycle runs, so you only need to restart the service if you want the changes applied right away.
To force immediate checking of all monitors after config changes:
sudo systemctl restart apmonitor.service
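The checksum-based change detection mentioned above can be sketched roughly as follows. This is an assumption about the general technique, not APMonitor's actual implementation: the function names are hypothetical, each monitor is assumed to be a dict with a unique `name` key, and the other field names in the usage example are illustrative.

```python
import hashlib
import json


def monitor_checksum(monitor):
    """SHA-256 of a monitor's config, stable across key ordering."""
    canonical = json.dumps(monitor, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_monitors(monitors, previous_checksums):
    """Return names of monitors that are new or whose config changed."""
    changed = []
    for monitor in monitors:
        name = monitor["name"]
        if previous_checksums.get(name) != monitor_checksum(monitor):
            changed.append(name)
    return changed


if __name__ == "__main__":
    monitors = [
        {"name": "router", "type": "ping", "address": "192.168.1.1"},
        {"name": "intranet", "type": "http", "address": "http://intranet.local/"},
    ]
    saved = {m["name"]: monitor_checksum(m) for m in monitors}
    monitors[1]["address"] = "http://intranet.local/health"
    print(changed_monitors(monitors, saved))  # ['intranet']
```

Serialising with sorted keys means that reordering fields in the YAML does not count as a change; only a changed value, or a new monitor, triggers a recheck.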
To completely remove APMonitor:
# Stop and disable service
sudo systemctl stop apmonitor.service
sudo systemctl disable apmonitor.service
# Remove service file
sudo rm /etc/systemd/system/apmonitor.service
sudo systemctl daemon-reload
# Remove files
sudo rm /usr/local/bin/APMonitor.py
sudo rm /usr/local/etc/apmonitor-config.yaml
sudo rm /var/tmp/apmonitor-statefile.json*
# Remove monitoring user
sudo userdel -r monitoring
# Optionally remove Python dependencies
sudo pip3 uninstall -y PyYAML requests pyOpenSSL urllib3 aioquic
- Add additional monitors:
  - TCP & UDP port monitoring
  - SNMP w/defaults for managed switches and system performance tuning
  - Update docs to provide webhook examples for Pushover, Slack & Discord
- Add additional outputs:
  - MRTG compatible logfiles
  - MRTG compatible graph generation w/index.html
- Aggregated root cause alerting:
  - Specify parent dependencies using config option parent_name so we have a network topology graph
  - Add loop detection to ensure the topology graph is a DAG
  - Use the topology to only notify outages for the root cause and list the affected services in the same alert
  - When a monitored resource has multiple parent dependencies, specify if it's down when all are down (AND relation) or down when one is down (OR relation)
  - Consider correct use of pre/in/post-order traversal when deciding which alerts to drop
- Convert finished version to pure C APMonitor.c
  - Strictly only with libc/SVR4 C Systems Programming dependencies for really tiny cross-platform embedded systems application environments
  - Add a Mercator + APTree.c#InfoRec inspired/styled priority queue for handling large numbers of monitored resources with proper realtime programming guarantees
  - Test if we are root when doing a ping syscall and fallback to direct SOCK_RAW if we are for high performance
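As a starting point for the loop detection item above, here is one possible sketch using a depth-first search with three-colour marking. It is not part of APMonitor: the function name `find_dependency_cycle` is hypothetical, and it assumes the planned parent_name option has already been parsed into a dict mapping each monitor name to a list of its parent names.

```python
def find_dependency_cycle(parents):
    """Return a list of names forming a cycle, or None if the graph is a DAG.

    parents maps each monitor name to a list of its parent names.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for parent in parents.get(node, []):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # Parent is already on the current path, so we found a loop.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for name in parents:
        if colour.get(name, WHITE) == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    ok = {"web": ["switch"], "switch": ["router"], "router": []}
    bad = {"a": ["b"], "b": ["c"], "c": ["a"]}
    print(find_dependency_cycle(ok))   # None
    print(find_dependency_cycle(bad))  # ['a', 'b', 'c', 'a']
```

The recursive form keeps the sketch short; a very deep topology would need an iterative version to stay within Python's recursion limit.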
APMonitor.py is licensed under the GNU General Public License version 3.
Software: APMonitor 1.1.0
License: GNU General Public License version 3
Licensor: Andrew (AP) Prendergast, ap@andrewprendergast.com -- FSF Member
We use SemVer for version numbering.



