codegik · rodrigodlima · Oct 19, 2025 · Oct 20, 2025 · Oct 20, 2025 · Oct 20, 2025
diff --git a/engineering/bitonic-scala3/benchmark/README.md b/engineering/bitonic-scala3/benchmark/README.md
@@ -1,48 +1,174 @@
-# Bitonic Benchmark Runner
+# Bitonic Benchmark Suite
 
-This script benchmarks the two endpoints:
+This directory contains benchmarking tools for testing three endpoints:
 
-- `POST /bitonic?n=...&l=...&r=...`
-- `POST /bitonic-memcached?n=...&l=...&r=...`
-
-It measures latency statistics across different `n` sizes, with configurable concurrency and request counts, and writes a CSV report.
+- `POST /bitonic?n=...&l=...&r=...` — Direct calculation (no cache)
+- `POST /bitonic-redis?n=...&l=...&r=...` — Redis cache
+- `POST /bitonic-memcached?n=...&l=...&r=...` — Memcached cache
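
To check that all three endpoints respond before a full run, a minimal Python sketch (standard library only) can send one request to each endpoint and time it. It makes three assumptions. The app is reachable at `http://localhost:8080`, which is the base URL the previous Python runner used. The endpoints accept an empty POST body, with `n`, `l` and `r` passed as query parameters. Finally, `build_url` and `time_request` are hypothetical helper names, not part of the repo.

```python
import time
import urllib.parse
import urllib.request

# Assumption: the app listens on localhost:8080, as in the old Python runner.
BASE_URL = "http://localhost:8080"
ENDPOINTS = ["bitonic", "bitonic-redis", "bitonic-memcached"]


def build_url(base: str, endpoint: str, n: int, l: int, r: int) -> str:
    """Build the endpoint URL with n, l and r passed as query parameters."""
    query = urllib.parse.urlencode({"n": n, "l": l, "r": r})
    return f"{base}/{endpoint}?{query}"


def time_request(url: str, timeout: float = 10.0) -> tuple:
    """Send an empty POST and return (HTTP status, elapsed seconds)."""
    request = urllib.request.Request(url, data=b"", method="POST")
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # include the body transfer in the measured time
        status = response.status
    return status, time.perf_counter() - start
```

With the stack running, `for ep in ENDPOINTS: print(ep, time_request(build_url(BASE_URL, ep, 10, 1, 100)))` performs one call per endpoint. `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, so a failing endpoint shows up as an exception rather than as a status code.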
 
 ## Requirements
 
-- Python 3.9+
-- `aiohttp`
+- Docker and Docker Compose
+- (Optional) Python 3.9+ for standalone scripts
+
+## Running Benchmarks
 
-Install dependencies:
+### Quick Start
 
-1. **Install Python 3.11.13 using pyenv:**
+1. **Start all services (app, caches, monitoring, and benchmark) from `engineering/bitonic-scala3`, where `docker-compose.yml` lives:**
    ```bash
-   pyenv install 3.11.13
-   pyenv local 3.11.13  # Set for current project
+   docker compose up
    ```
 
-2. **Install uv package manager:**
+2. **View results in Grafana:**
+   - URL: http://localhost:3000
+   - Username: `admin`
+   - Password: `admin`
+   - Dashboard: "Bitonic Service Performance Comparison"
+
+### What Gets Measured
+
+The Grafana dashboard shows:
+
+**Performance Metrics:**
+- Response Time Comparison (Standard vs Redis vs Memcached)
+- Throughput (Requests/Second)
+- 95th Percentile Response Time (P95)
+- 99th Percentile Response Time (P99)
+- Error Rate Comparison
+
+**Cache Metrics:**
+- Redis Cache Hit/Miss Rate
+- Memcached Cache Hit/Miss Rate
+- Cache performance over time
+
+**Visual Indicators:**
+- Color-coded thresholds for response times:
+  - 🟢 Green (0-50ms): Excellent
+  - 🟡 Yellow (50-100ms): Good
+  - 🟠 Orange (100-150ms): Attention needed
+  - 🔴 Red (>150ms): Problematic
+
+## Benchmark Payloads
+
+The benchmark uses `payloads.json` containing 50,000 test cases with:
+- **Array size (`n`)**: 0 to 1,000, randomly distributed
+- **Range bounds**: random `l` and `r` values
+- **Purpose**: Comprehensive stress testing of all three endpoints
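
The generator for `payloads.json` is not described here. As an illustration only, the Python sketch below produces a file of similar shape. Several details are assumptions. The field names `n`, `l` and `r` mirror the query parameters. The file is a single top-level JSON array. The distributions are uniform. The value bound of 100,000 is borrowed from the `--r 100000` that the old Python runner was invoked with.

```python
import json
import random
from typing import Optional

# Assumed schema: one object per test case holding the endpoint's query
# parameters n, l and r. The bounds are illustrative, not taken from the repo.
NUM_CASES = 50_000
MAX_N = 1_000
MAX_VALUE = 100_000


def make_payloads(count: int = NUM_CASES, seed: Optional[int] = None) -> list:
    """Generate `count` random test cases with 0 <= n <= MAX_N and l <= r."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    payloads = []
    for _ in range(count):
        n = rng.randint(0, MAX_N)  # randint includes both end points
        l, r = sorted(rng.randint(0, MAX_VALUE) for _ in range(2))
        payloads.append({"n": n, "l": l, "r": r})
    return payloads


def write_payloads(path: str = "payloads.json", **kwargs) -> None:
    """Write the generated cases to `path` as a single JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(make_payloads(**kwargs), f)
```

Calling `write_payloads(seed=42)` writes 50,000 cases. The real file's schema may differ, so check it against the existing `payloads.json` before swapping it in.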
+
+## Understanding Percentiles (P95, P99)
+
+**What are percentiles?**
+- **P95 (95th Percentile)**: 95% of requests completed in this time or less (only 5% were slower)
+- **P99 (99th Percentile)**: 99% of requests completed in this time or less (only 1% were slower)
+
+**Why they matter:**
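+- The **mean** (average) can hide performance problems, because a handful of very slow requests barely moves it
+- High percentiles show the **tail latency**, i.e. what the slowest requests experience (the absolute worst case is the maximum)
+- A high P99 means roughly 1 request in 100 is slow, even when the mean looks healthy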
+
+**Example:**
+```
+If Redis P99 = 50ms, it means:
+- 99% of requests completed in ≤ 50ms (excellent!)
+- Only 1% took longer (outliers)
+```
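
A minimal Python sketch of the nearest-rank percentile shows concretely how the mean can hide slow requests. The latency list is made up for illustration. k6 and InfluxDB may interpolate between samples, so their P95/P99 can differ slightly from this method.

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Made-up latencies in ms: 98 fast requests and two slow outliers.
latencies_ms = [10] * 98 + [500, 900]
mean_ms = sum(latencies_ms) / len(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)
```

Here the mean is 23.8 ms and P95 is 10 ms, but P99 is 500 ms. The two slow requests barely move the average, while the tail percentile exposes them.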
+
+## Architecture
+
+The benchmark infrastructure consists of:
+
+```
+┌─────────────┐  HTTP requests   ┌───────────────┐  uses cache  ┌─────────────────┐
+│   K6 Load   │ ───────────────▶ │  Bitonic App  │ ───────────▶ │ Redis/Memcached │
+│  Generator  │                  │ (3 endpoints) │              └────────┬────────┘
+└──────┬──────┘                  └───────────────┘                       │ stats
+       │                                                                 ▼
+       │ sends k6 metrics                                  ┌───────────────────────────┐
+       │                                                   │ Redis Exporter (:9121)    │
+       │                                                   │ Memcached Exporter (:9150)│
+       │                                                   └─────────────┬─────────────┘
+       │                                                                 │ scraped every 10s
+       ▼                                                                 ▼
+┌─────────────┐      writes metrics (measurement: prometheus)      ┌───────────┐
+│  InfluxDB   │ ◀───────────────────────────────────────────────── │ Telegraf  │
+│ time-series │                                                    │ (scraper) │
+│  (db: k6)   │                                                    └───────────┘
+└──────┬──────┘
+       │ queries
+       ▼
+┌─────────────┐
+│   Grafana   │ ──── visualizes ───▶ Dashboard
+└─────────────┘
+```
+
+### Cache Metrics Collection
+
+**Redis Metrics:**
+- Collected via [Redis Exporter](https://github.com/oliver006/redis_exporter) (port 9121)
+- Key metrics: `redis_keyspace_hits_total`, `redis_keyspace_misses_total`
+
+**Memcached Metrics:**
+- Collected via [Memcached Exporter](https://github.com/prometheus/memcached_exporter) (port 9150)
+- Key metrics: `memcached_commands_total{command="get",status="hit"}` and `memcached_commands_total{command="get",status="miss"}`
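
To check these counters without going through Telegraf, the exporters' `/metrics` output can be parsed directly. The Python sketch below handles the common `name{labels} value [timestamp]` sample lines of the Prometheus text format. It is not a full implementation of that format. The sample text and its numbers are made up.

```python
import re

# Matches key="value" pairs, allowing backslash-escaped characters in values.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Return (name, labels, value) tuples from Prometheus text exposition."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            label_text, rest = rest.rsplit("}", 1)
            labels = dict(LABEL_RE.findall(label_text))
        else:
            name, rest = line.split(None, 1)
            labels = {}
        value = float(rest.split()[0])  # a trailing timestamp, if any, is dropped
        samples.append((name.strip(), labels, value))
    return samples


def total(samples: list, name: str, **match: str) -> float:
    """Sum every sample with this name whose labels include all of `match`."""
    return sum(
        value
        for sample_name, labels, value in samples
        if sample_name == name and all(labels.get(k) == v for k, v in match.items())
    )


def hit_ratio(hits: float, misses: float) -> float:
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


# Made-up exporter output for illustration.
SAMPLE = """
# TYPE redis_keyspace_hits_total counter
redis_keyspace_hits_total 1900
redis_keyspace_misses_total 100
memcached_commands_total{command="get",status="hit"} 750
memcached_commands_total{command="get",status="miss"} 250
"""
```

For this sample, the Redis ratio is 0.95 and the Memcached `get` ratio is 0.75. With the stack running, the text could come from `urllib.request.urlopen("http://localhost:9121/metrics").read().decode()`, or from port 9150 for Memcached. These counters accumulate from the moment the cache started, so the result is a lifetime hit ratio, not a rate over time.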
+
+**Collection Flow:**
+1. Exporters scrape cache statistics from Redis/Memcached
+2. Telegraf collects Prometheus-formatted metrics from exporters (every 10s)
+3. Telegraf sends metrics to InfluxDB (database: `k6`, measurement: `prometheus`)
+4. Grafana queries InfluxDB and visualizes cache hit/miss rates
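
The exporter values are cumulative counters, so a hit/miss *rate* has to come from the difference between two scrapes. The minimal Python sketch below assumes the samples are taken 10 s apart, matching the Telegraf interval above. If a counter goes down, the sketch treats it as a cache restart that reset the counter to zero. `Scrape` and `interval_rates` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Scrape:
    """Cumulative counter values read at one point in time (seconds)."""
    timestamp: float
    hits: float
    misses: float


def counter_delta(previous: float, current: float) -> float:
    # A drop means the counter restarted from zero, so the new value is the increase.
    return current - previous if current >= previous else current


def interval_rates(prev: Scrape, curr: Scrape) -> dict:
    """Hits/s, misses/s and hit ratio over the interval between two scrapes."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("scrapes must be in chronological order")
    hits = counter_delta(prev.hits, curr.hits)
    misses = counter_delta(prev.misses, curr.misses)
    lookups = hits + misses
    return {
        "hits_per_s": hits / elapsed,
        "misses_per_s": misses / elapsed,
        "hit_ratio": hits / lookups if lookups else 0.0,
    }
```

For example, `interval_rates(Scrape(0, 1000, 200), Scrape(10, 1900, 300))` gives 90 hits/s, 10 misses/s and a hit ratio of 0.9. The Grafana panels compute their rates with InfluxQL queries that this README does not show, so treat this as an illustration of the idea, not as their exact logic.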
+
+## Resource Limits
+
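+All containers have CPU and memory limits configured (`deploy.resources.limits` in `docker-compose.yml`):
+- **App**: 1 CPU, 1GB RAM
+- **Redis**: 0.25 CPU, 256MB RAM
+- **Memcached**: 0.25 CPU, 128MB RAM
+- **Redis Exporter**: 0.1 CPU, 64MB RAM
+- **Memcached Exporter**: 0.1 CPU, 64MB RAM
+- **InfluxDB**: 0.5 CPU, 512MB RAM
+- **Telegraf**: 0.25 CPU, 256MB RAM
+- **Grafana**: 0.5 CPU, 512MB RAM
+- **K6**: 0.5 CPU, 256MB RAM
+
+These limits keep any single container from starving the others, which makes runs more repeatable and the comparison between endpoints fairer.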
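
The limits add up, and they only isolate containers if the host (or the Docker Desktop VM) actually has that capacity. The small Python sketch below totals the `deploy.resources.limits` values from `docker-compose.yml`, including the exporters and Telegraf. The dictionary is transcribed by hand, not parsed from the file.

```python
from decimal import Decimal

# (cpus, memory in M) per service, copied by hand from deploy.resources.limits
# in docker-compose.yml.
LIMITS = {
    "app": ("1.0", 1024),
    "redis": ("0.25", 256),
    "memcached": ("0.25", 128),
    "redis-exporter": ("0.1", 64),
    "memcached-exporter": ("0.1", 64),
    "influxdb": ("0.5", 512),
    "telegraf": ("0.25", 256),
    "grafana": ("0.5", 512),
    "k6": ("0.5", 256),
}

# Decimal avoids float rounding noise when summing fractional CPU counts.
total_cpus = sum(Decimal(cpus) for cpus, _ in LIMITS.values())
total_memory_m = sum(memory for _, memory in LIMITS.values())
```

The totals are 3.45 CPUs and 3072 M (3 GB). On a host with fewer free cores, the containers compete for CPU, and the per-service limits no longer guarantee an even comparison.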
+
+## Troubleshooting
+
+### Dashboard shows no data
+- Wait for K6 to start sending requests (starts automatically with `docker compose up`)
+- Check if InfluxDB is running: `docker ps | grep influx`
+- Verify K6 is running: `docker logs bitonic-scala3-k6-1` (or `docker compose logs k6`, which does not depend on the project name)
+
+### Cache metrics not showing
+1. **Verify exporters are running:**
    ```bash
-   pip install uv==0.7.11
+   docker ps | grep exporter
    ```
 
-3. **Create and activate virtual environment:**
+2. **Test exporters directly:**
    ```bash
-   uv venv
-   source .venv/bin/activate
-    ```
+   # Redis Exporter
+   curl http://localhost:9121/metrics | grep keyspace
 
-    ```bash
-    pip install aiohttp
-    ```
+   # Memcached Exporter
+   curl http://localhost:9150/metrics | grep commands
+   ```
 
-## Usage
+3. **Check Telegraf logs:**
+   ```bash
+   docker logs telegraf
+   ```
 
+4. **Query InfluxDB directly:**
+   ```bash
+   docker exec bitonic-scala3-influxdb-1 influx -database 'k6' -execute "SELECT * FROM prometheus WHERE cache_type = 'redis' LIMIT 5"
+   ```
+
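The same check can be scripted against the InfluxDB 1.x HTTP API, where `GET /query` takes `db` and `q` parameters and returns JSON. The Python sketch below assumes two things. First, port 8086 is published to the host (the compose excerpt here does not show the `ports` mapping). Second, Telegraf tags samples with `cache_type`, as the query above does. `query_influx` and `rows` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: InfluxDB's port 8086 is reachable from the host.
INFLUX_URL = "http://localhost:8086"


def query_influx(q: str, db: str = "k6", base: str = INFLUX_URL) -> dict:
    """Run an InfluxQL query through the 1.x /query endpoint and decode the JSON."""
    url = f"{base}/query?{urllib.parse.urlencode({'db': db, 'q': q})}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def rows(result: dict) -> list:
    """Flatten {"results": [{"series": [...]}]} into one dict per returned point."""
    out = []
    for statement in result.get("results", []):
        for series in statement.get("series", []):
            columns = series["columns"]
            for values in series.get("values", []):
                out.append(dict(zip(columns, values)))
    return out
```

For example, `rows(query_influx("SELECT * FROM prometheus WHERE cache_type = 'redis' LIMIT 5"))` returns up to five points as dictionaries. An empty list means Telegraf has not written any matching points yet.
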
+### K6 finished but want to run again
 ```bash
-python benchmark_bitonic.py --base-url http://localhost:8080   --endpoints bitonic bitonic-memcached   --sizes 16 32 64 128 256 512 1024   --l 1 --r 100000   --requests 200 --concurrency 50   --warmup 20   --out results.csv
+docker compose restart k6
 ```
 
-## Output
-
-- A `results.csv` with columns:
-  - `endpoint, n, l, r, requests, concurrency, success_rate, latency_mean_s, latency_stdev_s, latency_p50_s, latency_p90_s, latency_p95_s, latency_p99_s, resp_size_mean_bytes`
+### Clear all metrics and start fresh
+```bash
+docker compose down -v  # Removes volumes (InfluxDB data)
+docker compose up
+```
 
diff --git a/engineering/bitonic-scala3/docker-compose.yml b/engineering/bitonic-scala3/docker-compose.yml
@@ -11,6 +11,11 @@ services:
       interval: 5s
       timeout: 3s
       retries: 5
+    deploy:
+      resources:
+        limits:
+          cpus: '0.25'
+          memory: 256M
   memcached:
     image: memcached:alpine3.22
     container_name: memcached
@@ -28,6 +33,45 @@ services:
       - --conn-limit=1024
       - --memory-limit=64
       - --threads=4
+    deploy:
+      resources:
+        limits:
+          cpus: '0.25'
+          memory: 128M
+
+  redis-exporter:
+    image: oliver006/redis_exporter:latest
+    container_name: redis-exporter
+    ports:
+      - "9121:9121"
+    environment:
+      REDIS_ADDR: redis:6379
+    networks:
+      - bitonic
+    depends_on:
+      - redis
+    deploy:
+      resources:
+        limits:
+          cpus: '0.1'
+          memory: 64M
+
+  memcached-exporter:
+    image: prom/memcached-exporter:latest
+    container_name: memcached-exporter
+    ports:
+      - "9150:9150"
+    command:
+      - --memcached.address=memcached:11211
+    networks:
+      - bitonic
+    depends_on:
+      - memcached
+    deploy:
+      resources:
+        limits:
+          cpus: '0.1'
+          memory: 64M
 
   app:
     container_name: bitonic-app
@@ -67,9 +111,14 @@ services:
     # Add stdin/tty for interactive logging
     stdin_open: true
     tty: true
+    deploy:
+      resources:
+        limits:
+          cpus: '1.0'
+          memory: 1024M
 
   influxdb:
-    image: influxdb:1.8
+    image: influxdb:1.12.2
     environment:
       INFLUXDB_DB: k6
       INFLUXDB_ADMIN_USER: admin
@@ -81,6 +130,28 @@ services:
       - influxdb-data:/var/lib/influxdb
     networks:
       - bitonic
+    deploy:
+      resources:
+        limits:
+          cpus: '0.5'
+          memory: 512M
+
+  telegraf:
+    image: telegraf:latest
+    container_name: telegraf
+    volumes:
+      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
+    networks:
+      - bitonic
+    depends_on:
+      - influxdb
+      - redis-exporter
+      - memcached-exporter
+    deploy:
+      resources:
+        limits:
+          cpus: '0.25'
+          memory: 256M
 
   grafana:
     image: grafana/grafana
@@ -91,13 +162,20 @@ services:
     environment:
       GF_SECURITY_ADMIN_USER: admin
       GF_SECURITY_ADMIN_PASSWORD: admin
+      GF_SECURITY_FORCE_DEFAULT_ADMIN_PASSWORD_CHANGE: "false"
       SSL_CERT_FILE: /etc/ssl/certs/custom-cert.pem
     volumes:
       - ./grafana/provisioning:/etc/grafana/provisioning
       - ./grafana/dashboards:/var/lib/grafana/dashboards
+      - ./grafana/config/custom.ini:/etc/grafana/custom.ini
     restart: always
     networks:
       - bitonic
+    deploy:
+      resources:
+        limits:
+          cpus: '0.5'
+          memory: 512M
 
   k6:
     image: grafana/k6:latest
@@ -119,6 +197,11 @@ services:
         "influxdb=http://influxdb:8086/k6?consistency=one",
         "/benchmark/benchmark.js",
       ]
+    deploy:
+      resources:
+        limits:
+          cpus: '0.5'
+          memory: 256M
 
 networks:
   bitonic: