basit-devBE/Deploy-Watch

# Blue/Green Deployment with Automated Monitoring

Production-ready blue/green deployment system with Docker Compose, featuring automated monitoring, error detection, and Slack notifications.

## Features

- **Zero-Downtime Deployments**: Seamless switching between blue and green pools
- **Automatic Failover**: Nginx automatically routes to backup pool when primary fails
- **Real-Time Monitoring**: Continuous monitoring of error rates and failover events
- **Slack Notifications**: Instant alerts for high error rates and failover events
- **Structured Logging**: JSON-formatted nginx logs with pool, release, and latency tracking
- **Health Checks**: Automated container health monitoring
- **Comprehensive Testing**: Automated test suite for all deployment scenarios

## Architecture

```
                    ┌─────────────┐
                    │   Clients   │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │    Nginx    │ (Port 8080)
                    │   Reverse   │
                    │    Proxy    │
                    └──┬───────┬──┘
                       │       │
        ┌──────────────┘       └──────────────┐
        │                                     │
    ┌───▼────┐                           ┌───▼────┐
    │  Blue  │ (Primary)                 │ Green  │ (Backup)
    │  Pool  │                           │  Pool  │
    └────────┘                           └────────┘
        │                                     │
        └─────────────┬───────────────────────┘
                      │
              ┌───────▼────────┐
              │ Alert Watcher  │
              │   (Monitors    │
              │  Nginx Logs)   │
              └───────┬────────┘
                      │
              ┌───────▼────────┐
              │     Slack      │
              │ Notifications  │
              └────────────────┘
```

## Quick Start

1. **Setup environment**
```bash
# Make entrypoint executable
chmod +x ./nginx/entrypoint.sh

# Configure your environment (update with your values)
cat > .env << EOF
BLUE_IMAGE=your-image:tag
GREEN_IMAGE=your-image:tag
ACTIVE_POOL=blue
RELEASE_ID_BLUE=v1.0.0-blue
RELEASE_ID_GREEN=v1.0.0-green
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
ERROR_RATE_THRESHOLD=2
WINDOW_SIZE=200
ALERT_COOLDOWN_SEC=300
MAINTENANCE_MODE=false
PORT=3000
EOF
```

2. **Start the stack**
```bash
docker compose up -d
```

3. **Verify deployment**
```bash
# Check all services are healthy
docker compose ps

# Test the application
curl http://localhost:8080

# Check which pool is active
curl -sI http://localhost:8080 | grep -i x-app-pool

# Monitor logs
docker compose logs -f alert_watcher
```

4. **Run automated tests**
```bash
./test_deployment.sh
```

## System Components

### 1. Application Pools
- **Blue Pool** (`app_blue`): Primary deployment pool
- **Green Pool** (`app_green`): Secondary deployment pool for zero-downtime updates

### 2. Nginx Reverse Proxy
- Routes traffic to active pool
- Automatic failover to the backup pool when requests to the primary pool fail
- Structured JSON logging with pool tracking

### 3. Alert Watcher
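- Monitors nginx access logs in real time
- Detects error rates above `ERROR_RATE_THRESHOLD` (default: 2%) over the last `WINDOW_SIZE` requests (default: 200)
- Tracks pool failover events
- Sends alerts to Slack, with a cooldown of `ALERT_COOLDOWN_SEC` (default: 300 seconds, i.e. 5 minutes) between alerts of the same type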
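A minimal sketch of how the cooldown and `MAINTENANCE_MODE` could gate alerts, written in Python like `watcher/watcher.py`. It is not taken from that file: the class `AlertGate` and its parameters are hypothetical, and it assumes the cooldown is tracked separately for each alert type, as the `ALERT_COOLDOWN_SEC` description suggests.

```python
import time
from typing import Callable, Dict


class AlertGate:
    """Hypothetical helper deciding whether an alert may be sent right now."""

    def __init__(self, cooldown_sec: float = 300.0, maintenance_mode: bool = False,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_sec = cooldown_sec          # ALERT_COOLDOWN_SEC
        self.maintenance_mode = maintenance_mode  # MAINTENANCE_MODE
        self.clock = clock
        self._last_sent: Dict[str, float] = {}    # alert type -> time of last alert

    def should_send(self, alert_type: str) -> bool:
        if self.maintenance_mode:
            return False  # MAINTENANCE_MODE=true suppresses every alert
        now = self.clock()
        last = self._last_sent.get(alert_type)
        if last is not None and now - last < self.cooldown_sec:
            return False  # same alert type already sent within the cooldown
        self._last_sent[alert_type] = now
        return True
```

With `AlertGate(cooldown_sec=300)`, the first `"error_rate"` alert goes through, repeats are suppressed for 300 seconds, and `"failover"` alerts are tracked independently. Injecting `clock` keeps the logic testable without waiting in real time.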

## Monitoring & Alerts

### Alert Types

#### 1. High Error Rate Alert
Triggered when the error rate exceeds `ERROR_RATE_THRESHOLD` (default: 2%) over the last `WINDOW_SIZE` requests (default: 200).

**Example:**
```
 Error Rate Alert

High error rate detected: 25.00% over last 200 requests (threshold: 2.0%)

• Error Rate: 25.00%
• Threshold: 2.0%
• Window Size: 200
• Errors: 50
• Current Pool: green
• Timestamp: 2025-10-30 15:23:35 UTC
```
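The error-rate check behind this alert can be sketched with a fixed-size sliding window. This is a hypothetical illustration, not the watcher's actual code: it assumes that only 5xx statuses count as errors and that no alert fires until the window holds `WINDOW_SIZE` requests. With 50 errors in a 200-request window it reproduces the 25.00% shown above.

```python
from collections import deque
from typing import Deque


class ErrorRateWindow:
    """Hypothetical sliding window over the last `window_size` responses."""

    def __init__(self, window_size: int = 200, threshold_pct: float = 2.0) -> None:
        self.threshold_pct = threshold_pct                     # ERROR_RATE_THRESHOLD
        self._window: Deque[bool] = deque(maxlen=window_size)  # WINDOW_SIZE

    def record(self, status: int) -> None:
        # Assumption: only 5xx responses count as errors.
        # Once the deque is full, appending drops the oldest entry.
        self._window.append(status >= 500)

    def error_rate_pct(self) -> float:
        if not self._window:
            return 0.0
        return 100.0 * sum(self._window) / len(self._window)

    def breached(self) -> bool:
        # Assumption: wait for a full window before alerting.
        full = len(self._window) == self._window.maxlen
        return full and self.error_rate_pct() > self.threshold_pct


window = ErrorRateWindow(window_size=200, threshold_pct=2.0)
for status in [200] * 150 + [502] * 50:
    window.record(status)
print(f"{window.error_rate_pct():.2f}%", window.breached())  # 25.00% True
```

`deque(maxlen=...)` evicts the oldest request automatically, so recording a log line is O(1); summing 200 booleans on each check is negligible.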

#### 2. Failover Alert
Triggered when traffic fails over from primary to backup pool.

**Example:**
```
 Failover Alert

Failover detected: Traffic switched from blue to green

• Previous Pool: blue
• Current Pool: green
• Upstream Address: 172.20.0.3:3000
• Release ID: v1.0.0-green-2024
• Timestamp: 2025-10-30 15:26:37 UTC
```
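Failover detection can be approximated by watching the `pool` field of consecutive structured log entries (see the log example under Operations). A minimal, hypothetical sketch; the function name `detect_failovers` is illustrative only:

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def detect_failovers(lines: Iterable[str]) -> Iterator[Tuple[str, str, dict]]:
    """Yield (previous_pool, current_pool, entry) whenever `pool` changes.

    Assumes one JSON object per log line with a `pool` key.
    """
    previous: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        pool = entry.get("pool") if isinstance(entry, dict) else None
        if not pool:
            continue
        if previous is not None and pool != previous:
            yield previous, pool, entry  # entry carries release, upstream_addr, ...
        previous = pool
```

A manual change of `ACTIVE_POOL` produces the same pool change in the logs; comparing the new pool against the configured `ACTIVE_POOL` is one way to tell a deliberate switch from a failover.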

## Environment Variables (.env)

```bash
# Application Images
BLUE_IMAGE=your-image:tag
GREEN_IMAGE=your-image:tag

# Active Pool Configuration
ACTIVE_POOL=blue                # Current active pool (blue or green)

# Release Identifiers
RELEASE_ID_BLUE=v1.0.0-blue
RELEASE_ID_GREEN=v1.0.0-green

# Slack Webhook for Alerts
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL

# Monitoring Configuration
ERROR_RATE_THRESHOLD=2          # Error rate threshold, in percent
WINDOW_SIZE=200                 # Number of most recent requests used to compute the error rate
ALERT_COOLDOWN_SEC=300          # Minimum seconds between two alerts of the same type
MAINTENANCE_MODE=false          # Set to true to suppress alerts

# Application Configuration
PORT=3000                       # Application port
```

**Important:** No spaces around `=` in the .env file.
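One way to catch this mistake before starting the stack is to check each line against a strict `KEY=VALUE` pattern. The helper below is a hypothetical sketch: it skips blank lines and `#` comments and flags any line with whitespace before `=` or immediately after it. Trailing inline comments, as in the example above, are left alone.

```python
import re
from typing import List, Tuple

# A variable name, then '=' with no whitespace before it or right after it.
_ENV_LINE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=(?!\s)")


def check_env_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line that breaks the KEY=VALUE rule."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and full-line comments are fine
        if not _ENV_LINE.match(stripped):
            problems.append((number, line))
    return problems
```

For example, `check_env_lines(open(".env").read())` returns an empty list when every line is well formed.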

## Operations

### Viewing Logs

```bash
# All containers
docker compose logs -f

# Nginx access logs (structured JSON)
docker compose exec nginx tail -f /var/log/nginx/access.log | jq .

# Alert watcher
docker compose logs -f alert_watcher

# Specific app pool
docker compose logs -f app_blue
docker compose logs -f app_green
```

### Switching Active Pool

```bash
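# Manual pool switch from blue to green (swap the two values to switch back).
# nginx is recreated with the new ACTIVE_POOL, so expect a brief interruption.
sed -i 's/ACTIVE_POOL=blue/ACTIVE_POOL=green/' .env
docker compose up -d nginx

# Verify switch
curl -sI http://localhost:8080 | grep -i x-app-pool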
```
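The `sed` command above only switches from blue to green. A hypothetical Python helper that flips `ACTIVE_POOL` in either direction while keeping any inline comment on that line might look like this (run `docker compose up -d nginx` afterwards, as above):

```python
import re
from pathlib import Path

_ACTIVE = re.compile(r"^ACTIVE_POOL=(blue|green)\b", re.MULTILINE)


def toggle_active_pool(env_text: str) -> str:
    """Return env_text with ACTIVE_POOL flipped between blue and green."""
    match = _ACTIVE.search(env_text)
    if match is None:
        raise ValueError("ACTIVE_POOL=blue or ACTIVE_POOL=green not found")
    new_pool = "green" if match.group(1) == "blue" else "blue"
    # Replace only the value, so anything after it (e.g. a comment) is kept.
    return env_text[: match.start(1)] + new_pool + env_text[match.end(1):]


def toggle_env_file(path: str = ".env") -> str:
    """Flip ACTIVE_POOL in the file at `path` and return the new pool name."""
    env_file = Path(path)
    new_text = toggle_active_pool(env_file.read_text())
    env_file.write_text(new_text)
    return _ACTIVE.search(new_text).group(1)
```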

### Zero-Downtime Deployment

```bash
# Use the deployment script
./deploy.sh your-image:new-version

# The script will:
# 1. Pull new image
# 2. Update inactive pool
# 3. Wait for health checks
# 4. Switch traffic
# 5. Verify deployment
```
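`deploy.sh` itself is not shown here, so the following is only a sketch of the decisions such a script has to make, assuming each pool's image comes from `BLUE_IMAGE`/`GREEN_IMAGE` in `.env`. The function `plan_deployment` and its return keys are hypothetical. The core blue/green rule is that the new image always goes to the pool that is not currently serving traffic:

```python
from typing import Dict, List


def plan_deployment(env: Dict[str, str], new_image: str) -> Dict[str, object]:
    """Hypothetical plan for deploying `new_image` to the idle pool."""
    active = env.get("ACTIVE_POOL", "blue")
    if active not in ("blue", "green"):
        raise ValueError(f"unexpected ACTIVE_POOL: {active!r}")
    target = "green" if active == "blue" else "blue"  # the pool not serving traffic
    service = f"app_{target}"
    commands: List[List[str]] = [
        ["docker", "compose", "pull", service],      # step 1: pull the new image
        ["docker", "compose", "up", "-d", service],  # step 2: recreate the idle pool
        # step 3: wait for the service's health check here (not shown)
        ["docker", "compose", "up", "-d", "nginx"],  # step 4: apply the new ACTIVE_POOL
    ]
    return {
        "target_pool": target,
        "write_before_pull": {f"{target.upper()}_IMAGE": new_image},
        "write_after_healthy": {"ACTIVE_POOL": target},
        "commands": commands,
    }
```

The new image name has to be written to `.env` before `docker compose pull`, because Compose resolves the image from `.env` at that point; `ACTIVE_POOL` changes only after the updated pool passes its health check.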

### Testing

```bash
# Run comprehensive test suite
./test_deployment.sh

# Tests include:
# - Health checks
# - Baseline traffic (150 requests)
# - Failover simulation
# - Error rate detection
# - Slack notification verification
```

### Viewing Structured Logs

```bash
# Single formatted log entry
docker compose exec nginx tail -1 /var/log/nginx/access.log | jq .

# Output:
{
  "time_local": "30/Oct/2025:15:27:28 +0000",
  "remote_addr": "172.20.0.1",
  "request": "GET / HTTP/1.1",
  "status": 200,
  "body_bytes_sent": 1247,
  "request_time": 0.008,
  "upstream_addr": "172.20.0.3:3000",
  "upstream_status": "200",
  "upstream_response_time": "0.008",
  "pool": "green",
  "release": "v1.0.0-green-2024"
}

# Filter errors only
docker compose exec nginx tail -100 /var/log/nginx/access.log | jq 'select(.status >= 500)'
```
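To compare pools over a larger slice of the log, the same fields can be aggregated per pool. A hypothetical sketch that counts requests and 5xx responses and averages `request_time` for each `pool` value:

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def summarize_by_pool(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Per-pool request count, 5xx count and mean request_time (seconds)."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"requests": 0, "errors_5xx": 0, "time_sum": 0.0})
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        if not isinstance(entry, dict):
            continue
        stats = totals[entry.get("pool", "unknown")]
        stats["requests"] += 1
        if int(entry.get("status", 0)) >= 500:
            stats["errors_5xx"] += 1
        # request_time is a number in the example log; float() also accepts strings.
        stats["time_sum"] += float(entry.get("request_time", 0.0))
    return {
        pool: {
            "requests": s["requests"],
            "errors_5xx": s["errors_5xx"],
            "avg_request_time": s["time_sum"] / s["requests"],
        }
        for pool, s in totals.items()
    }
```

One way to use it is a small script that calls `summarize_by_pool(sys.stdin)` and prints the result, fed with something like `docker compose exec -T nginx tail -n 1000 /var/log/nginx/access.log | python3 summarize.py` (`summarize.py` is a hypothetical file name).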

## File Structure

```
.
├── docker-compose.yml        # Main orchestration file
├── .env                      # Environment configuration
├── nginx/
│   └── entrypoint.sh        # Nginx dynamic configuration
├── watcher/
│   ├── Dockerfile           # Alert watcher container
│   ├── requirements.txt     # Python dependencies
│   └── watcher.py          # Monitoring script
├── test_deployment.sh       # Automated test suite
├── runbook.md              # Operational runbook
└── README.md               # This file
```

## Operational Runbook

For detailed operational procedures, troubleshooting, and alert response guidelines, see [runbook.md](runbook.md).

The runbook includes:
- Alert types and response procedures
- Deployment procedures
- Troubleshooting guides
- Emergency procedures
- System health checks
- Configuration reference

## Troubleshooting

### Not Receiving Slack Alerts

```bash
# Test webhook manually
curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"Test alert"}' \
  "$SLACK_WEBHOOK_URL"

# Check watcher has correct URL
docker compose exec alert_watcher env | grep SLACK_WEBHOOK_URL

# Recreate the watcher so it picks up changes to .env
# (`docker compose restart` keeps the old environment)
docker compose up -d alert_watcher
```
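If the `curl` test succeeds but alerts still do not arrive, it can help to send a message the way a Python watcher might. The sketch below uses only the standard library; the function names are hypothetical and the message layout simply mirrors the alert examples above. Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
import os
import urllib.request


def build_slack_payload(title: str, fields: dict) -> bytes:
    """Encode a plain-text message for a Slack incoming webhook ({"text": ...})."""
    lines = [title, ""] + [f"• {name}: {value}" for name, value in fields.items()]
    return json.dumps({"text": "\n".join(lines)}).encode("utf-8")


def post_to_slack(payload: bytes, webhook_url: str = "") -> int:
    """POST the payload and return the HTTP status code."""
    url = webhook_url or os.environ["SLACK_WEBHOOK_URL"]
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST")
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, `post_to_slack(build_slack_payload("Test alert", {"Current Pool": "blue"}))` should produce a message in the channel if the webhook URL is valid.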

### Both Pools Unhealthy

```bash
# Check container status
docker compose ps

# Restart both application pools
docker compose restart app_blue app_green

# Wait for health checks
sleep 15

# Verify health
docker compose ps
```
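Instead of a fixed `sleep 15`, you can poll Docker's health status until the container reports `healthy`. A hypothetical sketch, assuming one container per Compose service and that each service defines a `healthcheck` (see Customizing Health Checks); `docker inspect` fails for containers without one:

```python
import subprocess
import time
from typing import Callable


def compose_health(service: str) -> str:
    """Return Docker's health status ('starting', 'healthy', 'unhealthy') for a service."""
    container_id = subprocess.run(
        ["docker", "compose", "ps", "-q", service],
        check=True, capture_output=True, text=True).stdout.strip()
    return subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container_id],
        check=True, capture_output=True, text=True).stdout.strip()


def wait_until_healthy(get_status: Callable[[], str], timeout: float = 60.0,
                       interval: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll get_status() until it returns 'healthy' or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if get_status() == "healthy":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

`wait_until_healthy(lambda: compose_health("app_blue"), timeout=60)` returns `True` as soon as `app_blue` is healthy, or `False` once 60 seconds have passed.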

### High Error Rate

```bash
# Check nginx logs for errors
docker compose exec nginx grep '"status":5' /var/log/nginx/access.log | tail -20 | jq .

# Switch to the other pool if needed (this example assumes blue is currently active)
sed -i 's/ACTIVE_POOL=blue/ACTIVE_POOL=green/' .env
docker compose up -d nginx
```

## Advanced Configuration

### Adjusting Error Thresholds

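Edit `.env`, then recreate the watcher so the new value takes effect (`docker compose restart` does not re-read `.env`):
```bash
# In .env
ERROR_RATE_THRESHOLD=5  # Increase to 5%

# Then, from the shell
docker compose up -d alert_watcher
```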

### Customizing Health Checks

Edit `docker-compose.yml`:
```yaml
healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:3000/healthz || exit 1"]
  interval: 10s    # Check every 10 seconds
  timeout: 5s      # Timeout after 5 seconds
  retries: 3       # Retry 3 times before marking unhealthy
```
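As a rough worked example of what these settings mean: Docker starts each check `interval` after the previous one completes, counts a check that runs longer than `timeout` as failed, and marks the container unhealthy after `retries` consecutive failures. If the app hangs so that every check times out, the time from the start of the outage until the container is reported unhealthy is therefore roughly:

$$
t_{\text{unhealthy}} \lesssim \text{retries} \times (\text{interval} + \text{timeout}) = 3 \times (10\,\text{s} + 5\,\text{s}) = 45\,\text{s}
$$

If the check fails immediately instead (for example, connection refused), each cycle is close to `interval` alone, so the estimate drops to about 3 × 10 s = 30 s. Both figures are approximations that ignore where in the check cycle the outage begins.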

### Custom Alert Messages

Modify `watcher/watcher.py` to customize Slack message format and content.

## Production Checklist

- [ ] Slack webhook configured and tested
- [ ] Both pools using correct images
- [ ] Health check endpoints verified
- [ ] Error thresholds configured appropriately
- [ ] Test deployment script executed successfully
- [ ] Runbook reviewed by operations team
- [ ] Monitoring dashboard set up (optional)
- [ ] Backup and rollback procedures documented

## Support

For operational issues and alert responses, refer to:
- [runbook.md](runbook.md) - Complete operational guide
- `docker compose logs` - Container logs
- Slack alerts channel - Real-time notifications

## License

MIT
