🚀 API Monitoring

Enterprise-grade API monitoring solution for AWS-compatible cloud services with intelligent alerting and comprehensive network diagnostics.

Overview • Features • Quick Start • Installation • Configuration • Usage • Contributing

📋 Overview

API Monitoring is a production-ready, enterprise-grade monitoring solution designed specifically for AWS-compatible cloud APIs. Built with modern asynchronous Python architecture, it provides real-time monitoring, intelligent alerting, and comprehensive network diagnostics to ensure your cloud infrastructure remains highly available.

🎯 Problem Statement

Cloud API downtime can cost businesses thousands of dollars per minute. Traditional monitoring solutions often lack:

Real-time detection of API failures
Intelligent maintenance mode handling
Comprehensive network diagnostics when issues occur
Immediate notification to the right teams
Automated resolution tracking

💡 Solution

API Monitoring addresses these challenges by providing:

Continuous monitoring of AWS-compatible APIs with configurable intervals
Smart maintenance detection that prevents false alerts during planned downtime
Instant Telegram alerts with detailed MTR network traces for rapid troubleshooting
Automatic resolution notifications when services recover
Production-ready deployment with Docker and systemd support

🏢 Use Cases

DevOps Teams: Monitor critical cloud infrastructure APIs
SRE Teams: Ensure SLA compliance with real-time alerting
Cloud Providers: Monitor service availability across regions
Enterprise IT: Track AWS-compatible private cloud APIs
MSPs: Monitor multiple client cloud environments

✨ Features

🔧 Core Monitoring Capabilities

🚀 High-Performance Asynchronous Architecture: Built with asyncio and aiohttp for efficient, non-blocking operations that can handle multiple concurrent checks
☁️ AWS-Compatible API Monitoring: Comprehensive monitoring of AWS EC2-compatible APIs using boto3/aiobotocore
🔧 Intelligent Maintenance Mode Detection: Automatically detects and skips monitoring during planned maintenance windows
⏱️ Configurable Check Intervals: Customizable monitoring frequency from seconds to hours based on your requirements
🎯 Precise Timeout Management: Granular timeout controls for different operations (API calls, maintenance checks, network traces)

🚨 Advanced Alerting System

🤖 Intelligent Telegram Alerts: Rich HTML-formatted notifications with comprehensive diagnostic information
🌐 Network Diagnostics Integration: Automatic MTR (My Traceroute) network path analysis included in alerts
💬 Custom Alert Comments: Add contextual information to alerts for better incident management
✅ Automatic Resolution Tracking: Smart detection and notification when issues are resolved
🔄 Alert Deduplication: Prevents spam by tracking alert states and sending updates only when status changes

🛠️ Enterprise-Ready Operations

📊 Structured JSON Logging: Machine-readable logs for integration with log aggregation systems (ELK, Splunk, etc.)
🐳 Production-Ready Containerization: Docker and Docker Compose support with optimized images
⚙️ Robust Configuration Management: Pydantic-based validation with environment variable support
🔒 Security Best Practices: Secure credential handling and input validation
🏗️ Modular Architecture: Clean, maintainable codebase with comprehensive test coverage

🏗️ Architecture

API Monitoring follows a modern, modular architecture designed for scalability, maintainability, and reliability:

api_monitoring/
├── config/          # Configuration management with Pydantic validation
├── clients/         # AWS API client with connection pooling
├── monitoring/      # Core monitoring logic and maintenance detection
├── alerting/        # Multi-channel alerting system (Telegram, extensible)
├── utils/           # Shared utilities (logging, networking, diagnostics)
└── main.py         # Application entry point and orchestration

🔧 Component Details

⚙️ Configuration Layer: Pydantic-based settings with environment variable support, validation, and type safety
📝 Structured Logging: JSON-formatted logs with correlation IDs, structured fields, and configurable output destinations
☁️ AWS Client: Asynchronous boto3/aiobotocore client with automatic retry logic, connection pooling, and error handling
🔧 Maintenance Detection: Smart maintenance mode detection using configurable patterns and response analysis
🌐 Network Diagnostics: MTR integration for comprehensive network path analysis and troubleshooting
📢 Alerting Engine: Extensible notification system with rich formatting, deduplication, and delivery confirmation
👁️ Monitoring Core: Event-driven monitoring loop with configurable intervals, timeout management, and state tracking
🚀 Application Bootstrap: Signal handling, graceful shutdown, dependency validation, and service lifecycle management

📊 Technical Specifications

🔧 System Requirements

Python: 3.11+ (optimized for 3.13)
Memory: Minimum 128MB RAM (recommended 256MB+)
CPU: Single core sufficient (benefits from multi-core for concurrent operations)
Network: Outbound HTTPS (443) for API calls and Telegram notifications
Storage: 50MB for application + logs (configurable log retention)

⚡ Performance Characteristics

Monitoring Latency: Sub-second API response detection
Alert Delivery: < 5 seconds from detection to Telegram notification
Resource Usage: < 50MB RAM, minimal CPU usage during normal operations
Concurrent Operations: Supports multiple simultaneous API checks and network traces
Scalability: Designed for monitoring 100+ endpoints (with appropriate resource allocation)

🔒 Security Features

Credential Management: Environment variable-based secrets with validation
Input Sanitization: All user inputs sanitized and validated
Network Security: HTTPS-only communications with certificate validation
Error Handling: Secure error messages without sensitive data exposure
Audit Trail: Comprehensive logging of all monitoring activities

🚀 Quick Start

Get up and running in minutes with Docker:

# Clone the repository
git clone https://github.com/dkamenkov/api-monitoring.git
cd api-monitoring

# Copy and configure environment variables
cp .env.example .env
# Edit .env with your settings

# Run with Docker Compose
docker-compose up -d

That's it! Your API monitoring is now running. Check the logs with:

docker-compose logs -f

📦 Installation

📋 Prerequisites

🐍 Python 3.13+
🌐 MTR (My Traceroute) tool

Option 1: 🐍 Standard Installation

Clone the repository:

git clone https://github.com/dkamenkov/api-monitoring.git
cd api-monitoring

Install the required Python libraries:

pip install -r requirements.txt

Install the MTR tool:
- On Ubuntu/Debian:
```
sudo apt-get install mtr
```
- On CentOS:
```
sudo yum install mtr
```

Option 2: 🐳 Docker Installation

Clone the repository:

git clone https://github.com/dkamenkov/api-monitoring.git
cd api-monitoring

Build and run with Docker Compose:

docker-compose up -d

⚙️ Configuration

📝 Environment Setup

Copy the example configuration file:

cp .env.example .env

Edit the .env file with your specific settings:

🔧 Required Configuration

# API Endpoint Configuration
ENDPOINT_URL=https://api.your-cloud-provider.com
# The AWS-compatible API endpoint to monitor (without trailing slash)

# AWS Credentials
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
# Your AWS access key ID with EC2 permissions

AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Your AWS secret access key

AWS_DEFAULT_REGION=us-east-1
# AWS region for API calls (must match your endpoint region)

# Telegram Configuration
TELEGRAM_BOT_TOKEN=123456789:ABCdefGHIjklMNOpqrsTUVwxyz
# Bot token from @BotFather on Telegram

TELEGRAM_CHAT_ID=123456789
# Your Telegram chat ID (get from @userinfobot)

⚙️ Optional Configuration

# Monitoring Intervals (in seconds)
CHECK_INTERVAL=60
# How often to check the API (recommended: 60-300 for production)

API_TIMEOUT=15
# Timeout for API requests (recommended: 10-30 seconds)

MAINTENANCE_CHECK_TIMEOUT=10
# Timeout for maintenance mode detection (recommended: 5-15 seconds)

# Logging Configuration
LOG_LEVEL=INFO
# Logging verbosity: DEBUG, INFO, WARNING, ERROR, CRITICAL

LOG_FILE=logs.log
# Log file path (set to empty string to disable file logging)

🔒 Security Best Practices

Never commit .env files to version control
Use IAM roles when running on AWS EC2 instead of access keys
Rotate credentials regularly and use least-privilege access
Restrict Telegram bot to specific chats only
Use environment-specific configurations for different deployments

🏢 Enterprise Configuration

For enterprise deployments, consider:

# Advanced Settings
CHECK_INTERVAL=30          # More frequent checks for critical systems
API_TIMEOUT=10            # Shorter timeout for faster detection
LOG_LEVEL=WARNING         # Reduce log verbosity in production
LOG_FILE=""               # Disable file logging, use centralized logging

🐳 Docker Configuration

When using Docker, you can override settings via environment variables:

# docker-compose.override.yml
version: '3.8'
services:
  api-monitoring:
    environment:
      - CHECK_INTERVAL=30
      - LOG_LEVEL=WARNING
      - LOG_FILE=""

🚀 Usage

🐍 Standard Method

python -m api_monitoring.main

🐳 Docker Method (Recommended)

docker-compose up -d

🔧 Production Deployment (Systemd)

For production environments, deploy as a systemd service:

Create a systemd service file:

sudo nano /etc/systemd/system/api-monitoring.service

Add the following content:

[Unit]
Description=API Monitoring Service
After=network.target
Wants=network-online.target

[Service]
Type=simple
User=api-monitor
Group=api-monitor
WorkingDirectory=/opt/api-monitoring
ExecStart=/usr/bin/python3 -m api_monitoring.main
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=api-monitoring

# Security settings
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/api-monitoring/logs

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable api-monitoring
sudo systemctl start api-monitoring
sudo systemctl status api-monitoring

📱 Monitoring Examples

🚨 Alert Notifications

When an API issue is detected, you'll receive a comprehensive Telegram alert:

🚨 Issue detected with API api.example.com 🚨
Timestamp: 2024-01-15 14:30:25
Source IP: 203.0.113.42
Error: Cannot connect to the endpoint: Connection timeout

Trace to api.example.com:
Start: Mon Jan 15 14:30:26 2024
HOST: monitoring-server          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- gateway.local             0.0%    10    1.2   1.1   0.9   1.5   0.2
  2.|-- isp-router.net            0.0%    10   12.3  11.8  10.2  15.1   1.4
  3.|-- core-01.provider.com     10.0%    10   45.2  44.8  42.1  48.9   2.1
  4.|-- ???                     100.0%    10    0.0   0.0   0.0   0.0   0.0

✅ Resolution Notifications

When the issue is resolved:

🟢 Issue with API api.example.com resolved!

📊 Log Output Examples

Structured JSON Logs:

{
  "timestamp": "2024-01-15T14:30:25.123456",
  "level": "INFO",
  "message": "API check succeeded",
  "module": "monitor",
  "function": "run_once",
  "line": 125,
  "endpoint": "api.example.com",
  "response_time_ms": 245,
  "check_id": "550e8400-e29b-41d4-a716-446655440000"
}

Error Logs:

{
  "timestamp": "2024-01-15T14:30:25.123456",
  "level": "ERROR",
  "message": "API check failed: Connection timeout",
  "module": "aws_client",
  "function": "check_api_availability",
  "line": 78,
  "endpoint": "api.example.com",
  "error_type": "EndpointConnectionError",
  "check_id": "550e8400-e29b-41d4-a716-446655440001"
}

🔧 Troubleshooting

Common Issues

🚫 "MTR is not installed" Error

# Ubuntu/Debian
sudo apt-get update && sudo apt-get install mtr-tiny

# CentOS/RHEL
sudo yum install mtr

# Alpine Linux (Docker)
apk add --no-cache mtr

🔑 AWS Credentials Issues

Ensure AWS credentials have EC2 permissions
Verify the endpoint URL is correct and accessible
Check if the region matches your AWS setup

📱 Telegram Notifications Not Working

Verify bot token is correct and bot is active
Ensure chat ID is correct (use @userinfobot to get your chat ID)
Check if bot has permission to send messages to the chat

🐳 Docker Container Issues

# Check container logs
docker-compose logs -f api-monitoring

# Restart container
docker-compose restart api-monitoring

# Rebuild with latest changes
docker-compose down && docker-compose up -d --build

🔧 High Memory Usage

Reduce CHECK_INTERVAL if set too low
Check for memory leaks in logs
Consider resource limits in Docker deployment

Debug Mode

Enable debug logging for troubleshooting:

# Set in .env file
LOG_LEVEL=DEBUG

# Or as environment variable
export LOG_LEVEL=DEBUG
python -m api_monitoring.main

❓ FAQ

General Questions

Q: What APIs are supported? A: Any AWS EC2-compatible API that supports the describe_availability_zones operation. This includes AWS, OpenStack, and many private cloud solutions.

Q: How often should I set the check interval? A: For production systems, 60-300 seconds is recommended. For critical systems, 30-60 seconds. Avoid intervals below 30 seconds to prevent API rate limiting.

Q: Can I monitor multiple APIs? A: Currently, each instance monitors one API endpoint. Deploy multiple instances with different configurations for multiple endpoints.

Technical Questions

Q: What network ports are required? A: Outbound HTTPS (443) for API calls and Telegram notifications. MTR requires ICMP and UDP for network tracing.

Q: How much bandwidth does it use? A: Minimal - typically less than 1MB per day for standard monitoring intervals.

Q: Is it suitable for production use? A: Yes, it's designed for production with proper error handling, logging, and deployment options.

Q: Can I extend the alerting to other channels? A: Yes, the alerting system is modular. See the Contributing Guide for extending to Slack, Discord, or email.

Configuration Questions

Q: How do I get a Telegram bot token? A: Message @BotFather on Telegram, create a new bot, and copy the provided token.

Q: How do I find my Telegram chat ID? A: Message @userinfobot on Telegram, or add your bot to a group and check the logs for the chat ID.

Q: Can I use IAM roles instead of access keys? A: Yes, when running on AWS EC2, the application will automatically use IAM instance profiles if no explicit credentials are provided.

🤝 Contributing

We love contributions! Please read our Contributing Guide to get started.

🐛 Found a Bug?

Please open an issue with details about the bug.

💡 Have an Idea?

We'd love to hear about it! Open a feature request.

🔒 Security

Please review our Security Policy for reporting security vulnerabilities.

📈 Roadmap

Web dashboard for monitoring status
Support for multiple notification channels (Slack, Discord, Email)
Prometheus metrics export
Custom health check endpoints
Multi-region monitoring
Advanced alerting rules and conditions

🙏 Acknowledgments

Built with aiohttp for async HTTP operations
Uses aiobotocore for AWS API interactions
Configuration management with Pydantic
Network tracing powered by MTR

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

⭐ Star this repository if you find it useful! ⭐

Made with ❤️ by the API Monitoring team

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
.github		.github
api_monitoring		api_monitoring
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
docker-compose.yml		docker-compose.yml
main.py		main.py
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

License

dkamenkov/api-monitoring

Folders and files

Latest commit

History

Repository files navigation