GitHub - basit-devBE/Deploy-Watch

# Blue/Green Deployment with Automated Monitoring

Production-ready blue/green deployment system with Docker Compose, featuring automated monitoring, error detection, and Slack notifications.

## Features

- **Zero-Downtime Deployments**: Seamless switching between blue and green pools
- **Automatic Failover**: Nginx automatically routes to backup pool when primary fails
- **Real-Time Monitoring**: Continuous monitoring of error rates and failover events
- **Slack Notifications**: Instant alerts for high error rates and failover events
- **Structured Logging**: JSON-formatted nginx logs with pool, release, and latency tracking
- **Health Checks**: Automated container health monitoring
- **Automated Testing**: Test script covering health checks, baseline traffic, failover, error-rate detection and Slack notifications

## Architecture

```
                    ┌─────────────┐
                    │   Clients   │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │    Nginx    │ (Port 8080)
                    │   Reverse   │
                    │    Proxy    │
                    └──┬───────┬──┘
                       │       │
        ┌──────────────┘       └──────────────┐
        │                                     │
    ┌───▼────┐                           ┌───▼────┐
    │  Blue  │ (Primary)                 │ Green  │ (Backup)
    │  Pool  │                           │  Pool  │
    └────────┘                           └────────┘
        │                                     │
        └─────────────┬───────────────────────┘
                      │
              ┌───────▼────────┐
              │ Alert Watcher  │
              │   (Monitors    │
              │  Nginx Logs)   │
              └───────┬────────┘
                      │
              ┌───────▼────────┐
              │     Slack      │
              │ Notifications  │
              └────────────────┘
```

## Quick Start

1. **Setup environment**
```bash
# Make entrypoint executable
chmod +x ./nginx/entrypoint.sh

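# Configure your environment (replace the placeholder values).
# Alternatively, start from the template: cp .env.example .env
# The quoted 'EOF' stops the shell from expanding any $ in the values.
cat > .env << 'EOF'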
BLUE_IMAGE=your-image:tag
GREEN_IMAGE=your-image:tag
ACTIVE_POOL=blue
RELEASE_ID_BLUE=v1.0.0-blue
RELEASE_ID_GREEN=v1.0.0-green
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
ERROR_RATE_THRESHOLD=2
WINDOW_SIZE=200
ALERT_COOLDOWN_SEC=300
MAINTENANCE_MODE=false
PORT=3000
EOF
```

2. **Start the stack**
```bash
docker compose up -d
```

3. **Verify deployment**
```bash
# Check all services are healthy
docker compose ps

# Test the application
curl http://localhost:8080

# Check which pool is active
curl -sI http://localhost:8080 | grep -i x-app-pool

# Monitor logs
docker compose logs -f alert_watcher
```

4. **Run automated tests**
```bash
./test_deployment.sh
```
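During a pool switch or failover test it can help to hit the proxy repeatedly and count which pool answered. The following is a minimal sketch, not part of the repository; it assumes Nginx on `http://localhost:8080` and the `X-App-Pool` response header used in the verification step above, and the function name `probe` is hypothetical.

```python
# Hypothetical helper: tally which pool served each request through the proxy.
# Assumes Nginx on http://localhost:8080 and an X-App-Pool response header.
import urllib.request
from collections import Counter


def probe(url: str = "http://localhost:8080", requests: int = 20,
          timeout: float = 2.0) -> Counter:
    """Send `requests` GETs; count X-App-Pool values, failed requests as 'error'."""
    tally: Counter = Counter()
    for _ in range(requests):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                tally[resp.headers.get("X-App-Pool", "unknown")] += 1
        except OSError:
            # HTTPError (4xx/5xx), URLError (e.g. connection refused) and
            # timeouts are all OSError subclasses
            tally["error"] += 1
    return tally
```

For example, call `probe(requests=200)` while switching pools; a non-zero `"error"` count means some requests failed during the switch.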

## System Components

### 1. Application Pools
- **Blue Pool** (`app_blue`): Primary pool while `ACTIVE_POOL=blue` (the default)
- **Green Pool** (`app_green`): Backup pool, used for failover and zero-downtime updates
- Setting `ACTIVE_POOL=green` swaps the two roles

### 2. Nginx Reverse Proxy
- Routes traffic to the active pool (set by `ACTIVE_POOL`)
- Fails over automatically to the backup pool when the active pool stops responding
- Writes structured JSON access logs that record the serving pool and release
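This kind of failover is what Nginx's `backup` server parameter provides: a backup server only receives requests when the primary servers are unavailable. The sketch below renders such an upstream block from `ACTIVE_POOL`. It is a hypothetical illustration, not the contents of `nginx.conf.template` or `entrypoint.sh`; it assumes the services resolve as `app_blue` and `app_green` on port 3000, and the `max_fails`/`fail_timeout` values are made up for the example. Whether a failed request is retried on the backup within the same request is governed separately by `proxy_next_upstream`.

```python
# Hypothetical sketch: render an Nginx upstream block for the active pool.
# Service names app_blue/app_green and port 3000 are assumptions.
POOLS = ("blue", "green")


def render_upstream(active_pool: str, port: int = 3000) -> str:
    if active_pool not in POOLS:
        raise ValueError(f"ACTIVE_POOL must be one of {POOLS}, got {active_pool!r}")
    backup_pool = "green" if active_pool == "blue" else "blue"
    return "\n".join([
        "upstream app_pool {",
        # Primary: considered unavailable for 5s after 1 failed attempt
        f"    server app_{active_pool}:{port} max_fails=1 fail_timeout=5s;",
        # `backup` servers only get traffic while the primary is unavailable
        f"    server app_{backup_pool}:{port} backup;",
        "}",
    ])
```

`render_upstream("green")` produces the mirror image, with `app_blue` as the backup.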

### 3. Alert Watcher
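- Tails the Nginx access log in real time
- Alerts when the error rate over the last `WINDOW_SIZE` requests (default 200) exceeds `ERROR_RATE_THRESHOLD` (default 2%)
- Detects pool failover events
- Sends alerts to Slack, suppressing repeats of the same alert type for `ALERT_COOLDOWN_SEC` (default 300 s, i.e. 5 minutes)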

## Monitoring & Alerts

### Alert Types

#### 1. High Error Rate Alert
Triggered when the error rate over the last 200 requests exceeds 2% (defaults; see `WINDOW_SIZE` and `ERROR_RATE_THRESHOLD`).

**Example:**
```
Error Rate Alert

High error rate detected: 25.00% over last 200 requests (threshold: 2.0%)

• Error Rate: 25.00%
• Threshold: 2.0%
• Window Size: 200
• Errors: 50
• Current Pool: green
• Timestamp: 2025-10-30 15:23:35 UTC
```
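The high-error-rate rule can be pictured as a fixed-size sliding window. The class below is a minimal sketch under stated assumptions, not the code in `watcher/watcher.py`: it counts any status >= 500 as an error, only evaluates once the window is full, and applies the cooldown and `MAINTENANCE_MODE` suppression described in the configuration. `ErrorRateMonitor` is a hypothetical name.

```python
# Hypothetical sketch of a sliding-window error-rate alert with cooldown.
# Assumption: status >= 500 counts as an error.
import time
from collections import deque
from typing import Callable, Optional


class ErrorRateMonitor:
    def __init__(self, threshold_pct: float = 2.0, window_size: int = 200,
                 cooldown_sec: float = 300.0, maintenance: bool = False,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold_pct = threshold_pct            # ERROR_RATE_THRESHOLD
        self.window = deque(maxlen=window_size)       # WINDOW_SIZE; oldest drops off
        self.cooldown_sec = cooldown_sec              # ALERT_COOLDOWN_SEC
        self.maintenance = maintenance                # MAINTENANCE_MODE
        self.clock = clock
        self.last_alert: Optional[float] = None

    def record(self, status: int) -> Optional[float]:
        """Add one response; return the error rate (%) if an alert should fire."""
        self.window.append(status >= 500)
        if len(self.window) < self.window.maxlen:
            return None  # wait until the window is full
        rate = 100.0 * sum(self.window) / len(self.window)
        if rate <= self.threshold_pct or self.maintenance:
            return None
        now = self.clock()
        if self.last_alert is not None and now - self.last_alert < self.cooldown_sec:
            return None  # still inside the cooldown period
        self.last_alert = now
        return rate
```

With the defaults, 150 successful responses followed by 50 5xx responses fill the window and return `25.0`, matching the example alert above; further errors within the next 300 seconds return `None` because of the cooldown.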

#### 2. Failover Alert
Triggered when traffic fails over from the primary pool to the backup pool.

**Example:**
```
Failover Alert

Failover detected: Traffic switched from blue to green

• Previous Pool: blue
• Current Pool: green
• Upstream Address: 172.20.0.3:3000
• Release ID: v1.0.0-green-2024
• Timestamp: 2025-10-30 15:26:37 UTC
```
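A failover alert boils down to noticing that the `pool` field of consecutive log entries has changed. The generator below is a hypothetical sketch of that idea; it assumes one JSON object per log line with the `pool`, `release` and `upstream_addr` fields shown under Viewing Structured Logs. A planned `ACTIVE_POOL` switch produces the same pool change as a real failover, so a watcher built this way would also fire during manual switches unless `MAINTENANCE_MODE` suppresses it.

```python
# Hypothetical sketch: yield an event whenever the serving pool changes.
import json
from typing import Iterable, Iterator


def failover_events(lines: Iterable[str]) -> Iterator[dict]:
    previous = None
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if not isinstance(entry, dict) or not entry.get("pool"):
            continue
        pool = entry["pool"]
        if previous is not None and pool != previous:
            yield {
                "previous_pool": previous,
                "current_pool": pool,
                "upstream_addr": entry.get("upstream_addr"),
                "release": entry.get("release"),
            }
        previous = pool
```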

## Environment Variables (.env)

```bash
# Application Images
BLUE_IMAGE=your-image:tag
GREEN_IMAGE=your-image:tag

# Active Pool Configuration
ACTIVE_POOL=blue                # Current active pool (blue or green)

# Release Identifiers
RELEASE_ID_BLUE=v1.0.0-blue
RELEASE_ID_GREEN=v1.0.0-green

# Slack Webhook for Alerts
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL

# Monitoring Configuration
ERROR_RATE_THRESHOLD=2          # Error rate percentage threshold
WINDOW_SIZE=200                  # Number of requests to analyze
ALERT_COOLDOWN_SEC=300          # Seconds between same alert type
MAINTENANCE_MODE=false          # Set to true to suppress alerts

# Application Configuration
PORT=3000                       # Application port
```

**Important:** No spaces around `=` in the .env file.
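Because a stray space around `=` is easy to miss, a quick pre-flight check of `.env` can save a failed start. This is a hypothetical helper, not shipped with the repository: it checks the rule above, flags keys missing from the variable list above (the `REQUIRED` tuple is an assumption; adjust it to your setup) and rejects any `ACTIVE_POOL` other than `blue` or `green`.

```python
# Hypothetical .env pre-flight check; REQUIRED is an assumption.
from pathlib import Path
from typing import Dict, List

REQUIRED = ("BLUE_IMAGE", "GREEN_IMAGE", "ACTIVE_POOL",
            "RELEASE_ID_BLUE", "RELEASE_ID_GREEN", "SLACK_WEBHOOK_URL")


def check_env(path: str = ".env") -> List[str]:
    problems: List[str] = []
    values: Dict[str, str] = {}
    for n, raw in enumerate(Path(path).read_text().splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            problems.append(f"line {n}: missing '='")
            continue
        if key != key.rstrip() or value != value.lstrip():
            problems.append(f"line {n}: spaces around '='")
        # Drop an inline " # comment" before storing the value
        values[key.strip()] = value.split(" #", 1)[0].strip()
    for key in REQUIRED:
        if key not in values:
            problems.append(f"missing {key}")
    if values.get("ACTIVE_POOL") not in (None, "blue", "green"):
        problems.append("ACTIVE_POOL must be 'blue' or 'green'")
    return problems
```

`check_env()` returns an empty list when the file looks fine.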

## Operations

### Viewing Logs

```bash
# All containers
docker compose logs -f

# Nginx access logs (structured JSON)
docker compose exec -T nginx tail -f /var/log/nginx/access.log | jq .

# Alert watcher
docker compose logs -f alert_watcher

# Specific app pool
docker compose logs -f app_blue
docker compose logs -f app_green
```

### Switching Active Pool

```bash
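# Manual pool switch: recreates the nginx container with the new ACTIVE_POOL.
# The sed expression assumes blue is active (GNU sed; on macOS use `sed -i ''`).
sed -i 's/^ACTIVE_POOL=blue/ACTIVE_POOL=green/' .env
docker compose up -d nginx

# Verify switch
curl -sI http://localhost:8080 | grep -i x-app-pool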
```
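The `sed` command above only works when blue is active, and `sed -i` without a suffix argument is GNU syntax (BSD/macOS `sed` expects `sed -i ''`). A hypothetical, portable alternative that flips whichever pool is active and leaves the rest of `.env` untouched:

```python
# Hypothetical helper: flip ACTIVE_POOL between blue and green in .env.
import re
from pathlib import Path

# Matches "ACTIVE_POOL=blue" or "ACTIVE_POOL=green" at the start of a line
_ACTIVE = re.compile(r"^ACTIVE_POOL=(blue|green)\b", re.MULTILINE)


def toggle_active_pool(path: str = ".env") -> str:
    env = Path(path)
    text = env.read_text()
    match = _ACTIVE.search(text)
    if match is None:
        raise ValueError("ACTIVE_POOL=blue|green not found in " + path)
    new_pool = "green" if match.group(1) == "blue" else "blue"
    # Replace only the pool value; inline comments and other lines are kept
    env.write_text(text[:match.start(1)] + new_pool + text[match.end(1):])
    return new_pool
```

Run `docker compose up -d nginx` afterwards, as above, so Nginx picks up the new value.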

### Zero-Downtime Deployment

```bash
# Use the deployment script
./deploy.sh your-image:new-version

# The script will:
# 1. Pull new image
# 2. Update inactive pool
# 3. Wait for health checks
# 4. Switch traffic
# 5. Verify deployment
```
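The steps above can be sketched as follows. This is a hypothetical outline of the control flow, not the contents of `deploy.sh`: pulling and recreating the target pool, the health check and the Nginx switch are passed in as callables (for example wrappers around `docker compose` commands), and the 60-second timeout is an arbitrary assumption. The key property is that the live pool is never modified and traffic only moves once the target reports healthy; step 5 (verification) is left to the caller.

```python
# Hypothetical sketch of the blue/green deploy order; all callables are
# supplied by the caller (e.g. wrappers around docker compose commands).
import time
from typing import Callable


def other_pool(pool: str) -> str:
    return {"blue": "green", "green": "blue"}[pool]


def wait_until(check: Callable[[], bool], timeout: float = 60.0,
               interval: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` until it returns True or roughly `timeout` seconds pass."""
    waited = 0.0
    while waited <= timeout:
        if check():
            return True
        sleep(interval)
        waited += interval
    return False


def deploy(active: str, update_pool: Callable[[str], None],
           is_healthy: Callable[[str], bool], switch_to: Callable[[str], None],
           **wait_kwargs) -> str:
    target = other_pool(active)      # never touch the live pool
    update_pool(target)              # steps 1-2: pull image, recreate target pool
    if not wait_until(lambda: is_healthy(target), **wait_kwargs):  # step 3
        raise RuntimeError(f"{target} never became healthy; traffic stays on {active}")
    switch_to(target)                # step 4: point Nginx at the target pool
    return target
```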

### Testing

```bash
# Run comprehensive test suite
./test_deployment.sh

# Tests include:
# - Health checks
# - Baseline traffic (150 requests)
# - Failover simulation
# - Error rate detection
# - Slack notification verification
```

### Viewing Structured Logs

```bash
# Single formatted log entry (-T disables TTY allocation so the output pipes cleanly)
docker compose exec -T nginx tail -n 1 /var/log/nginx/access.log | jq .

# Example output:
# {
#   "time_local": "30/Oct/2025:15:27:28 +0000",
#   "remote_addr": "172.20.0.1",
#   "request": "GET / HTTP/1.1",
#   "status": 200,
#   "body_bytes_sent": 1247,
#   "request_time": 0.008,
#   "upstream_addr": "172.20.0.3:3000",
#   "upstream_status": "200",
#   "upstream_response_time": "0.008",
#   "pool": "green",
#   "release": "v1.0.0-green-2024"
# }

# Filter errors only
docker compose exec -T nginx tail -n 100 /var/log/nginx/access.log | jq 'select(.status >= 500)'
```
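For a quick per-pool view of the access log, a small summariser can help. This is a hypothetical helper; field names follow the sample entry above and, like the `jq` filter, it treats `status >= 500` as an error.

```python
# Hypothetical helper: per-pool request count, 5xx share and mean request_time.
import json
from collections import defaultdict
from typing import Dict, Iterable


def summarize(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    stats = defaultdict(lambda: {"requests": 0, "errors": 0, "total_time": 0.0})
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON lines
        s = stats[entry.get("pool", "unknown")]
        s["requests"] += 1
        s["errors"] += int(entry.get("status", 0)) >= 500
        s["total_time"] += float(entry.get("request_time", 0.0))
    return {
        pool: {
            "requests": s["requests"],
            "error_pct": 100.0 * s["errors"] / s["requests"],
            "avg_request_time": s["total_time"] / s["requests"],
        }
        for pool, s in stats.items()
    }
```

For example, pipe `docker compose exec -T nginx cat /var/log/nginx/access.log` into a script that calls `summarize(sys.stdin)`.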

## File Structure

```
.
├── docker-compose.yml        # Main orchestration file
├── .env.example              # Template for .env
├── .env                      # Environment configuration (created locally)
├── nginx.conf.template       # Nginx configuration template
├── nginx/
│   └── entrypoint.sh         # Nginx dynamic configuration
├── watcher/
│   ├── Dockerfile            # Alert watcher container
│   ├── requirements.txt      # Python dependencies
│   └── watcher.py            # Monitoring script
├── test_deployment.sh        # Automated test suite
├── runbook.md                # Operational runbook
└── README.md                 # This file
```

## Operational Runbook

For detailed operational procedures, troubleshooting, and alert response guidelines, see [runbook.md](runbook.md).

The runbook includes:
- Alert types and response procedures
- Deployment procedures
- Troubleshooting guides
- Emergency procedures
- System health checks
- Configuration reference

## Troubleshooting

### Not Receiving Slack Alerts

```bash
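# Test the webhook manually (export SLACK_WEBHOOK_URL in your shell first;
# .env is not loaded into your shell automatically)
curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"Test alert"}' \
  "$SLACK_WEBHOOK_URL"

# Check that the watcher sees the correct URL
docker compose exec alert_watcher env | grep SLACK_WEBHOOK_URL

# After fixing .env, recreate the watcher (`docker compose restart` keeps the old environment)
docker compose up -d alert_watcher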
```

### Both Pools Unhealthy

```bash
# Check container status
docker compose ps

# Restart both application pools
docker compose restart app_blue app_green

# Wait for health checks
sleep 15

# Verify health
docker compose ps
```

### High Error Rate

```bash
# Show the last 20 server errors (5xx) from the nginx access log
docker compose exec -T nginx cat /var/log/nginx/access.log | jq -c 'select(.status >= 500)' | tail -n 20

# Switch to the backup pool if needed (assumes blue is currently active)
sed -i 's/^ACTIVE_POOL=blue/ACTIVE_POOL=green/' .env
docker compose up -d nginx
```

## Advanced Configuration

### Adjusting Error Thresholds

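Edit `.env`, then recreate the watcher so it picks up the new value (`docker compose restart` reuses the container's existing environment):
```bash
# Raise the threshold to 5%
sed -i 's/^ERROR_RATE_THRESHOLD=.*/ERROR_RATE_THRESHOLD=5/' .env
docker compose up -d alert_watcher
```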

### Customizing Health Checks

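Edit the `healthcheck` of each app service in `docker-compose.yml` (this check assumes `curl` is installed in the app image and the app serves `/healthz` on port 3000):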
```yaml
healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:3000/healthz || exit 1"]
  interval: 10s    # Check every 10 seconds
  timeout: 5s      # Timeout after 5 seconds
  retries: 3       # Retry 3 times before marking unhealthy
```

### Custom Alert Messages

Modify `watcher/watcher.py` to customize Slack message format and content.
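As a starting point for customisation, the sketch below builds a message in the layout shown under Alert Types and posts it to an incoming webhook. It is hypothetical and may not match how `watcher.py` formats messages; it relies only on Slack incoming webhooks accepting a JSON body with a `text` field (the same payload as the `curl` test under Troubleshooting).

```python
# Hypothetical sketch: format an alert like the examples above and post it.
import json
import urllib.request
from typing import Dict


def build_payload(title: str, summary: str, fields: Dict[str, str]) -> dict:
    bullets = "\n".join(f"• {name}: {value}" for name, value in fields.items())
    # Incoming webhooks accept a plain {"text": ...} body; *bold* is Slack mrkdwn
    return {"text": f"*{title}*\n\n{summary}\n\n{bullets}"}


def send(webhook_url: str, payload: dict, timeout: float = 5.0) -> int:
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status  # HTTP status code returned by Slack
```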

## Production Checklist

- [ ] Slack webhook configured and tested
- [ ] Both pools using correct images
- [ ] Health check endpoints verified
- [ ] Error thresholds configured appropriately
- [ ] Test deployment script executed successfully
- [ ] Runbook reviewed by operations team
- [ ] Monitoring dashboard set up (optional)
- [ ] Backup and rollback procedures documented

## Support

For operational issues and alert responses, refer to:
- [runbook.md](runbook.md) - Complete operational guide
- `docker compose logs` - Container logs
- Slack alerts channel - Real-time notifications

## License

MIT
