🚀 API Monitoring

CI codecov License: MIT Python 3.13+ Docker Code style: black

Enterprise-grade API monitoring solution for AWS-compatible cloud services with intelligent alerting and comprehensive network diagnostics.

Overview • Features • Quick Start • Installation • Configuration • Usage • Contributing


📋 Overview

API Monitoring is a production-ready monitoring solution for AWS-compatible cloud APIs. Built on asynchronous Python (asyncio and aiohttp), it provides real-time monitoring, Telegram alerting, and MTR-based network diagnostics to help keep your cloud infrastructure highly available.

🎯 Problem Statement

Cloud API downtime can cost businesses thousands of dollars per minute. Traditional monitoring solutions often lack:

  • Real-time detection of API failures
  • Intelligent maintenance mode handling
  • Comprehensive network diagnostics when issues occur
  • Immediate notification to the right teams
  • Automated resolution tracking

💡 Solution

API Monitoring addresses these challenges by providing:

  • Continuous monitoring of AWS-compatible APIs with configurable intervals
  • Smart maintenance detection that prevents false alerts during planned downtime
  • Instant Telegram alerts with detailed MTR network traces for rapid troubleshooting
  • Automatic resolution notifications when services recover
  • Production-ready deployment with Docker and systemd support

🏒 Use Cases

  • DevOps Teams: Monitor critical cloud infrastructure APIs
  • SRE Teams: Ensure SLA compliance with real-time alerting
  • Cloud Providers: Monitor service availability across regions
  • Enterprise IT: Track AWS-compatible private cloud APIs
  • MSPs: Monitor multiple client cloud environments

✨ Features

🔧 Core Monitoring Capabilities

  • 🚀 High-Performance Asynchronous Architecture: Built with asyncio and aiohttp for efficient, non-blocking operations that can handle multiple concurrent checks
  • ☁️ AWS-Compatible API Monitoring: Comprehensive monitoring of AWS EC2-compatible APIs using boto3/aiobotocore
  • 🔧 Intelligent Maintenance Mode Detection: Automatically detects and skips monitoring during planned maintenance windows
  • ⏱️ Configurable Check Intervals: Customizable monitoring frequency from seconds to hours based on your requirements
  • 🎯 Precise Timeout Management: Granular timeout controls for different operations (API calls, maintenance checks, network traces)
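The maintenance detection described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual logic: the `ProbeResponse` type, the `MAINTENANCE_PATTERNS` list, and the assumption that a maintenance page answers with HTTP 503 are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns; the project's real patterns are configurable and may differ.
MAINTENANCE_PATTERNS = (
    re.compile(r"maintenance", re.IGNORECASE),
    re.compile(r"scheduled\s+downtime", re.IGNORECASE),
)


@dataclass(frozen=True)
class ProbeResponse:
    """Hypothetical container for the status code and body of a probe request."""
    status: int
    body: str


def is_maintenance(response: ProbeResponse) -> bool:
    """Return True when the response looks like a planned-maintenance page."""
    # Assumption: maintenance pages answer 503 and mention maintenance in the body.
    if response.status != 503:
        return False
    return any(pattern.search(response.body) for pattern in MAINTENANCE_PATTERNS)
```

A monitoring cycle would call `is_maintenance` first and skip the API check (and any alert) when it returns True.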

🚨 Advanced Alerting System

  • 🤖 Intelligent Telegram Alerts: Rich HTML-formatted notifications with comprehensive diagnostic information
  • 🌐 Network Diagnostics Integration: Automatic MTR (My Traceroute) network path analysis included in alerts
  • 💬 Custom Alert Comments: Add contextual information to alerts for better incident management
  • ✅ Automatic Resolution Tracking: Smart detection and notification when issues are resolved
  • 🔄 Alert Deduplication: Prevents spam by tracking alert states and sending updates only when status changes
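Deduplication and resolution tracking both come down to notifying only on a state change. The sketch below shows one way to do that; the `AlertTracker` class and its return values are illustrative names, not the project's API.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    UP = "up"
    DOWN = "down"


class AlertTracker:
    """Remember the last known status and report only transitions."""

    def __init__(self) -> None:
        # Assumption: the endpoint is treated as healthy until a check fails.
        self._last = Status.UP

    def update(self, ok: bool) -> Optional[str]:
        """Return "alert", "resolved", or None when nothing changed."""
        current = Status.UP if ok else Status.DOWN
        if current is self._last:
            return None  # same state as before: stay quiet
        self._last = current
        return "alert" if current is Status.DOWN else "resolved"
```

With this shape, a run of failed checks produces a single "alert", and the first successful check afterwards produces a single "resolved".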

🛠️ Enterprise-Ready Operations

  • 📊 Structured JSON Logging: Machine-readable logs for integration with log aggregation systems (ELK, Splunk, etc.)
  • 🐳 Production-Ready Containerization: Docker and Docker Compose support with optimized images
  • ⚙️ Robust Configuration Management: Pydantic-based validation with environment variable support
  • 🔒 Security Best Practices: Secure credential handling and input validation
  • 🏗️ Modular Architecture: Clean, maintainable codebase with comprehensive test coverage

🏗️ Architecture

API Monitoring follows a modern, modular architecture designed for scalability, maintainability, and reliability:

api_monitoring/
├── config/          # Configuration management with Pydantic validation
├── clients/         # AWS API client with connection pooling
├── monitoring/      # Core monitoring logic and maintenance detection
├── alerting/        # Multi-channel alerting system (Telegram, extensible)
├── utils/           # Shared utilities (logging, networking, diagnostics)
└── main.py         # Application entry point and orchestration

🔧 Component Details

  • ⚙️ Configuration Layer: Pydantic-based settings with environment variable support, validation, and type safety
  • 📝 Structured Logging: JSON-formatted logs with correlation IDs, structured fields, and configurable output destinations
  • ☁️ AWS Client: Asynchronous boto3/aiobotocore client with automatic retry logic, connection pooling, and error handling
  • 🔧 Maintenance Detection: Smart maintenance mode detection using configurable patterns and response analysis
  • 🌐 Network Diagnostics: MTR integration for comprehensive network path analysis and troubleshooting
  • 📢 Alerting Engine: Extensible notification system with rich formatting, deduplication, and delivery confirmation
  • 👁️ Monitoring Core: Event-driven monitoring loop with configurable intervals, timeout management, and state tracking
  • 🚀 Application Bootstrap: Signal handling, graceful shutdown, dependency validation, and service lifecycle management
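To make the monitoring core and graceful shutdown more concrete, here is a stdlib-only sketch of a check loop with a per-check timeout and a stop event. The `monitor` function, its parameters, and the `results` list are hypothetical; the real loop in `monitoring/` may be organized differently.

```python
import asyncio
from typing import Awaitable, Callable, List


async def monitor(
    check: Callable[[], Awaitable[None]],
    interval: float,
    timeout: float,
    stop: asyncio.Event,
    results: List[str],
) -> None:
    """Run `check` every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            # Bound each check so a hung endpoint cannot stall the loop.
            await asyncio.wait_for(check(), timeout=timeout)
            results.append("ok")
        except asyncio.TimeoutError:
            results.append("timeout")
        except Exception as exc:  # any other failure counts as a failed check
            results.append(f"error: {exc}")
        # Sleep until the next cycle, but wake immediately on shutdown.
        try:
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
```

On Unix, a signal handler registered with `loop.add_signal_handler` could call `stop.set()` so that SIGTERM ends the loop after the current cycle instead of interrupting a check halfway through.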

📊 Technical Specifications

🔧 System Requirements

  • Python: 3.11+ (optimized for 3.13)
  • Memory: Minimum 128MB RAM (recommended 256MB+)
  • CPU: Single core sufficient (benefits from multi-core for concurrent operations)
  • Network: Outbound HTTPS (443) for API calls and Telegram notifications
  • Storage: 50MB for application + logs (configurable log retention)

⚡ Performance Characteristics

  • Monitoring Latency: Sub-second API response detection
  • Alert Delivery: < 5 seconds from detection to Telegram notification
  • Resource Usage: < 50MB RAM, minimal CPU usage during normal operations
  • Concurrent Operations: Supports multiple simultaneous API checks and network traces
  • Scalability: Designed for monitoring 100+ endpoints (with appropriate resource allocation)

🔒 Security Features

  • Credential Management: Environment variable-based secrets with validation
  • Input Sanitization: All user inputs sanitized and validated
  • Network Security: HTTPS-only communications with certificate validation
  • Error Handling: Secure error messages without sensitive data exposure
  • Audit Trail: Comprehensive logging of all monitoring activities
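One simple way to keep secrets out of error messages and logs is to mask known secret values before the text leaves the process. The `redact` helper below is a hypothetical illustration of that idea, not code taken from the project.

```python
from typing import Iterable


def redact(message: str, secrets: Iterable[str], mask: str = "***") -> str:
    """Replace every occurrence of a known secret value with a mask."""
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            message = message.replace(secret, mask)
    return message
```

For example, passing the bot token and the AWS secret key as `secrets` before logging an exception message keeps both out of the log output.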

🚀 Quick Start

Get up and running in minutes with Docker:

# Clone the repository
git clone https://github.com/dkamenkov/api-monitoring.git
cd api-monitoring

# Copy and configure environment variables
cp .env.example .env
# Edit .env with your settings

# Run with Docker Compose
docker-compose up -d

That's it! Your API monitoring is now running. Check the logs with:

docker-compose logs -f

📦 Installation

📋 Prerequisites

  • 🐍 Python 3.13+
  • 🌐 MTR (My Traceroute) tool

Option 1: 🐍 Standard Installation

  1. Clone the repository:
git clone https://github.com/dkamenkov/api-monitoring.git
cd api-monitoring
  2. Install the required Python libraries:
pip install -r requirements.txt
  3. Install the MTR tool:
    • On Ubuntu/Debian:
      sudo apt-get install mtr
    • On CentOS:
      sudo yum install mtr

Option 2: 🐳 Docker Installation

  1. Clone the repository:
git clone https://github.com/dkamenkov/api-monitoring.git
cd api-monitoring
  2. Build and run with Docker Compose:
docker-compose up -d

⚙️ Configuration

📝 Environment Setup

  1. Copy the example configuration file:
cp .env.example .env
  2. Edit the .env file with your specific settings:

🔧 Required Configuration

# API Endpoint Configuration
ENDPOINT_URL=https://api.your-cloud-provider.com
# The AWS-compatible API endpoint to monitor (without trailing slash)

# AWS Credentials
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
# Your AWS access key ID with EC2 permissions

AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Your AWS secret access key

AWS_DEFAULT_REGION=us-east-1
# AWS region for API calls (must match your endpoint region)

# Telegram Configuration
TELEGRAM_BOT_TOKEN=123456789:ABCdefGHIjklMNOpqrsTUVwxyz
# Bot token from @BotFather on Telegram

TELEGRAM_CHAT_ID=123456789
# Your Telegram chat ID (get from @userinfobot)

⚙️ Optional Configuration

# Monitoring Intervals (in seconds)
CHECK_INTERVAL=60
# How often to check the API (recommended: 60-300 for production)

API_TIMEOUT=15
# Timeout for API requests (recommended: 10-30 seconds)

MAINTENANCE_CHECK_TIMEOUT=10
# Timeout for maintenance mode detection (recommended: 5-15 seconds)

# Logging Configuration
LOG_LEVEL=INFO
# Logging verbosity: DEBUG, INFO, WARNING, ERROR, CRITICAL

LOG_FILE=logs.log
# Log file path (set to empty string to disable file logging)
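The project validates these settings with Pydantic. As a rough, stdlib-only stand-in that shows the kind of checks involved, the sketch below loads a few of the variables from a mapping such as `os.environ`. The `Settings` dataclass, the `load_settings` function, and the exact validation rules (HTTPS-only URL, positive integers) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping

_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


@dataclass(frozen=True)
class Settings:
    endpoint_url: str
    check_interval: int = 60
    api_timeout: int = 15
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from environment-style key/value pairs."""
    # The endpoint is documented as "without trailing slash", so normalize it.
    url = env.get("ENDPOINT_URL", "").rstrip("/")
    if not url.startswith("https://"):  # assumption: HTTPS endpoints only
        raise ValueError("ENDPOINT_URL must be an https:// URL")
    interval = int(env.get("CHECK_INTERVAL", "60"))
    timeout = int(env.get("API_TIMEOUT", "15"))
    if interval <= 0 or timeout <= 0:
        raise ValueError("CHECK_INTERVAL and API_TIMEOUT must be positive")
    level = env.get("LOG_LEVEL", "INFO").upper()
    if level not in _LOG_LEVELS:
        raise ValueError(f"unsupported LOG_LEVEL: {level}")
    return Settings(url, interval, timeout, level)
```

Calling `load_settings(os.environ)` at startup fails fast on a bad value instead of surfacing the problem during the first check.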

🔒 Security Best Practices

  • Never commit .env files to version control
  • Use IAM roles when running on AWS EC2 instead of access keys
  • Rotate credentials regularly and use least-privilege access
  • Restrict Telegram bot to specific chats only
  • Use environment-specific configurations for different deployments

🏒 Enterprise Configuration

For enterprise deployments, consider:

# Advanced Settings
CHECK_INTERVAL=30          # More frequent checks for critical systems
API_TIMEOUT=10            # Shorter timeout for faster detection
LOG_LEVEL=WARNING         # Reduce log verbosity in production
LOG_FILE=""               # Disable file logging, use centralized logging

🐳 Docker Configuration

When using Docker, you can override settings via environment variables:

# docker-compose.override.yml
version: '3.8'
services:
  api-monitoring:
    environment:
      - CHECK_INTERVAL=30
      - LOG_LEVEL=WARNING
      # Empty value disables file logging (in list syntax, quotes would be passed through literally)
      - LOG_FILE=

🚀 Usage

🐍 Standard Method

python -m api_monitoring.main

🐳 Docker Method (Recommended)

docker-compose up -d

🔧 Production Deployment (Systemd)

For production environments, deploy as a systemd service:

  1. Create a systemd service file:
sudo nano /etc/systemd/system/api-monitoring.service
  2. Add the following content:
[Unit]
Description=API Monitoring Service
# Order after network-online.target so the Wants= dependency below takes effect
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=api-monitor
Group=api-monitor
WorkingDirectory=/opt/api-monitoring
ExecStart=/usr/bin/python3 -m api_monitoring.main
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=api-monitoring

# Security settings
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/api-monitoring/logs

[Install]
WantedBy=multi-user.target
  3. Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable api-monitoring
sudo systemctl start api-monitoring
sudo systemctl status api-monitoring

📱 Monitoring Examples

🚨 Alert Notifications

When an API issue is detected, you'll receive a comprehensive Telegram alert:

🚨 Issue detected with API api.example.com 🚨
Timestamp: 2024-01-15 14:30:25
Source IP: 203.0.113.42
Error: Cannot connect to the endpoint: Connection timeout

Trace to api.example.com:
Start: Mon Jan 15 14:30:26 2024
HOST: monitoring-server          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- gateway.local             0.0%    10    1.2   1.1   0.9   1.5   0.2
  2.|-- isp-router.net            0.0%    10   12.3  11.8  10.2  15.1   1.4
  3.|-- core-01.provider.com     10.0%    10   45.2  44.8  42.1  48.9   2.1
  4.|-- ???                     100.0%    10    0.0   0.0   0.0   0.0   0.0

✅ Resolution Notifications

When the issue is resolved:

🟢 Issue with API api.example.com resolved!

📊 Log Output Examples

Structured JSON Logs:

{
  "timestamp": "2024-01-15T14:30:25.123456",
  "level": "INFO",
  "message": "API check succeeded",
  "module": "monitor",
  "function": "run_once",
  "line": 125,
  "endpoint": "api.example.com",
  "response_time_ms": 245,
  "check_id": "550e8400-e29b-41d4-a716-446655440000"
}

Error Logs:

{
  "timestamp": "2024-01-15T14:30:25.123456",
  "level": "ERROR",
  "message": "API check failed: Connection timeout",
  "module": "aws_client",
  "function": "check_api_availability",
  "line": 78,
  "endpoint": "api.example.com",
  "error_type": "EndpointConnectionError",
  "check_id": "550e8400-e29b-41d4-a716-446655440001"
}
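Records of this shape can be produced with the standard `logging` module and a custom formatter. The `JsonFormatter` class below is a minimal sketch, not the project's implementation; the list of extra fields is an assumption taken from the examples above, and the timestamp here includes a UTC offset, which the samples do not show.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    # Assumed extra fields, taken from the example records above.
    EXTRA_FIELDS = ("endpoint", "response_time_ms", "error_type", "check_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
        }
        # Fields passed via `extra={...}` become attributes on the record.
        for name in self.EXTRA_FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)
```

A caller would attach it with `handler.setFormatter(JsonFormatter())` and log with `logger.info("API check succeeded", extra={"endpoint": "api.example.com"})`.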

🔧 Troubleshooting

Common Issues

🚫 "MTR is not installed" Error

# Ubuntu/Debian
sudo apt-get update && sudo apt-get install mtr-tiny

# CentOS/RHEL
sudo yum install mtr

# Alpine Linux (Docker)
apk add --no-cache mtr
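To confirm from Python that the binary is visible to the process, a check along these lines may help. The helper names are hypothetical; `--report` and `--report-cycles` are standard mtr options, but the flags the application itself passes are not shown here and may differ.

```python
import shutil
from typing import List, Optional


def find_mtr() -> Optional[str]:
    """Return the full path to the mtr binary, or None if it is not on PATH."""
    return shutil.which("mtr")


def build_mtr_command(host: str, cycles: int = 10) -> List[str]:
    """Build an argument list for a non-interactive mtr report."""
    # Reject empty hosts and anything that could be parsed as an option.
    if not host or host.startswith("-"):
        raise ValueError(f"invalid host: {host!r}")
    return ["mtr", "--report", "--report-cycles", str(cycles), host]
```

If `find_mtr()` returns None inside the container or under the service user, the package is missing or not on that user's PATH.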

🔑 AWS Credentials Issues

  • Ensure AWS credentials have EC2 permissions
  • Verify the endpoint URL is correct and accessible
  • Check if the region matches your AWS setup

📱 Telegram Notifications Not Working

  • Verify bot token is correct and bot is active
  • Ensure chat ID is correct (use @userinfobot to get your chat ID)
  • Check if bot has permission to send messages to the chat

🐳 Docker Container Issues

# Check container logs
docker-compose logs -f api-monitoring

# Restart container
docker-compose restart api-monitoring

# Rebuild with latest changes
docker-compose down && docker-compose up -d --build

🔧 High Memory Usage

  • Increase CHECK_INTERVAL if it is set too low (very frequent checks raise resource usage)
  • Check for memory leaks in logs
  • Consider resource limits in Docker deployment

Debug Mode

Enable debug logging for troubleshooting:

# Set in .env file
LOG_LEVEL=DEBUG

# Or as environment variable
export LOG_LEVEL=DEBUG
python -m api_monitoring.main

❓ FAQ

General Questions

Q: What APIs are supported? A: Any AWS EC2-compatible API that supports the describe_availability_zones operation. This includes AWS, OpenStack, and many private cloud solutions.

Q: How often should I set the check interval? A: For production systems, 60-300 seconds is recommended. For critical systems, 30-60 seconds. Avoid intervals below 30 seconds to prevent API rate limiting.

Q: Can I monitor multiple APIs? A: Currently, each instance monitors one API endpoint. Deploy multiple instances with different configurations for multiple endpoints.

Technical Questions

Q: What network ports are required? A: Outbound HTTPS (443) for API calls and Telegram notifications. MTR requires ICMP and UDP for network tracing.

Q: How much bandwidth does it use? A: Minimal - typically less than 1MB per day for standard monitoring intervals.

Q: Is it suitable for production use? A: Yes, it's designed for production with proper error handling, logging, and deployment options.

Q: Can I extend the alerting to other channels? A: Yes, the alerting system is modular. See the Contributing Guide for extending to Slack, Discord, or email.

Configuration Questions

Q: How do I get a Telegram bot token? A: Message @BotFather on Telegram, create a new bot, and copy the provided token.

Q: How do I find my Telegram chat ID? A: Message @userinfobot on Telegram, or add your bot to a group and check the logs for the chat ID.

Q: Can I use IAM roles instead of access keys? A: Yes, when running on AWS EC2, the application will automatically use IAM instance profiles if no explicit credentials are provided.

🤝 Contributing

We love contributions! Please read our Contributing Guide to get started.

🐛 Found a Bug?

Please open an issue with details about the bug.

💡 Have an Idea?

We'd love to hear about it! Open a feature request.

🔒 Security

Please review our Security Policy for reporting security vulnerabilities.

📈 Roadmap

  • Web dashboard for monitoring status
  • Support for multiple notification channels (Slack, Discord, Email)
  • Prometheus metrics export
  • Custom health check endpoints
  • Multi-region monitoring
  • Advanced alerting rules and conditions

🙏 Acknowledgments

  • Built with aiohttp for async HTTP operations
  • Uses aiobotocore for AWS API interactions
  • Configuration management with Pydantic
  • Network tracing powered by MTR

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


⭐ Star this repository if you find it useful! ⭐

Made with ❤️ by the API Monitoring team
