title CacheForge Environment
emoji
colorFrom indigo
colorTo blue
sdk docker
pinned false
app_port 8000
base_path /web
tags
openenv

CacheForge

A production-grade multi-tier cache optimisation environment built on the OpenEnv specification. An RL agent observes live cache health metrics and tunes TTL, capacity, eviction policy, and tier placement to maximise hit rate while minimising latency and memory waste. Includes task-based evaluation with deterministic graders (0.0–1.0 scoring).

Designed to evaluate both reinforcement learning policies and LLM-based decision agents.

This environment is fully compliant with OpenEnv and supports both local Docker execution and remote evaluation via Hugging Face Spaces.


Real-World Motivation

Large-scale web services such as Google, Netflix, Amazon, and Cloudflare rely on multi-tier caching to serve very high request volumes at low latency. Manual cache tuning is often suboptimal and brittle:

  • Static TTLs can't adapt to shifting traffic patterns.
  • Over-provisioned caches waste memory; under-provisioned ones spike latency.
  • Eviction policy choice (LRU vs LFU vs FIFO) depends on workload skew.

CacheForge models this problem as an RL environment: the agent receives real-time cache telemetry and must learn a policy that generalises across traffic patterns of increasing difficulty.


RL Loop

┌─────────┐  observation, reward  ┌───────────┐
│  Agent  │ ◄──────────────────── │ Cache Env │
│  (LLM)  │ ────────────────────► │           │
└─────────┘        action         └───────────┘
  1. Observe: hit rate, latency, memory usage, request distribution
  2. Act: adjust TTL, resize capacity, set eviction policy, shift tiers
  3. Reward: composite signal balancing hit rate, latency, and memory
  4. Repeat for up to 200 steps per episode
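
The four steps above can be sketched as a plain Python loop. This is a minimal illustration, not the repository's implementation: `StepResult`, `run_episode`, `ToyEnv`, and `noop_policy` are hypothetical names, and `ToyEnv` is a stand-in with made-up dynamics so the loop can run without a server.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    observation: dict  # e.g. {"hit_rate": 0.5, "memory_usage": 0.4}
    reward: float
    done: bool


def run_episode(env, policy: Callable[[dict], dict], max_steps: int = 200) -> float:
    """Observe, act, collect reward, and repeat until done or max_steps."""
    result = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(result.observation)  # 2. Act on the latest observation
        result = env.step(action)            # 1. Observe and 3. Reward
        total_reward += result.reward
        if result.done:                      # 4. Stop when the episode ends
            break
    return total_reward


class ToyEnv:
    """Hypothetical stand-in: constant observation, reward 1.0 per step."""

    def __init__(self, horizon: int = 5):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> StepResult:
        self.t = 0
        return StepResult({"hit_rate": 0.5}, 0.0, False)

    def step(self, action: dict) -> StepResult:
        self.t += 1
        return StepResult({"hit_rate": 0.5}, 1.0, self.t >= self.horizon)


def noop_policy(observation: dict) -> dict:
    """Leave the cache configuration unchanged."""
    return {
        "adjust_ttl": 0,
        "resize_cache": 0.0,
        "eviction_policy": "LRU",
        "tier_shift": "none",
    }
```

With the toy environment, `run_episode(ToyEnv(5), noop_policy)` stops after five steps, while an environment that never sets `done` is cut off at the 200-step limit.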

Observation Space

| Field | Type | Range | Description |
|---|---|---|---|
| hit_rate | float | 0.0 – 1.0 | Cache hit rate across all tiers |
| miss_rate | float | 0.0 – 1.0 | Cache miss rate (1 - hit_rate) |
| avg_latency | float | ≥ 0.0 | Average request latency (ms) |
| memory_usage | float | ≥ 0.0 | Total memory as fraction of capacity |
| request_rate | int | ≥ 0 | Requests processed this step |
| hot_keys_ratio | float | 0.0 – 1.0 | Fraction of requests hitting hot keys |
| cache_distribution | dict | - | Per-tier utilisation {L1, L2, L3} |
| done | bool | - | Episode termination flag |
| reward | float | - | Step reward value |

Action Space

| Field | Type | Range | Description |
|---|---|---|---|
| adjust_ttl | int | -10 to +10 | Global TTL delta (seconds) |
| resize_cache | float | -0.2 to +0.2 | Relative capacity resize |
| eviction_policy | str | "LRU" / "LFU" / "FIFO" | Eviction strategy |
| tier_shift | str | "none" / "L1→L2" / "L2→L3" | Tier data migration |
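
The ranges in the action table can be enforced before an action is sent. The sketch below uses a standard-library dataclass rather than the Pydantic model in models.py; the class name `Action`, the defaults, and the choice to clamp out-of-range numbers (rather than reject them) are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass

EVICTION_POLICIES = ("LRU", "LFU", "FIFO")
TIER_SHIFTS = ("none", "L1→L2", "L2→L3")


def _clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class Action:
    """Hypothetical action container mirroring the four documented fields."""

    adjust_ttl: int = 0
    resize_cache: float = 0.0
    eviction_policy: str = "LRU"
    tier_shift: str = "none"

    def __post_init__(self):
        if self.eviction_policy not in EVICTION_POLICIES:
            raise ValueError(f"unknown eviction_policy: {self.eviction_policy!r}")
        if self.tier_shift not in TIER_SHIFTS:
            raise ValueError(f"unknown tier_shift: {self.tier_shift!r}")
        # Frozen dataclass, so assign the clamped values via object.__setattr__.
        object.__setattr__(self, "adjust_ttl", int(_clamp(self.adjust_ttl, -10, 10)))
        object.__setattr__(
            self, "resize_cache", float(_clamp(self.resize_cache, -0.2, 0.2))
        )
```

`asdict(Action(adjust_ttl=25))` then yields a dictionary with `adjust_ttl` clamped to 10, ready to be placed under the `"action"` key of a /step request.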

Tasks

CacheForge defines three tasks of increasing difficulty, each backed by a different workload generator. Each task is evaluated by a deterministic grader that returns a score strictly between 0.0 and 1.0. Select a task with reset(mode=...).

Task 1 — Easy (Static Workload)

| Property | Value |
|---|---|
| Mode | easy |
| Workload | Static Zipf (α = 1.2), fixed key-space |
| Goal | Maximise cache hit rate |
| Grader | score = clamp(hit_rate / 0.75, 0.001, 0.998) |
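
As a worked example of the easy grader, the formula can be written directly in Python. The function names `clamp` and `grade_easy` are illustrative and are not taken from tasks.py.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def grade_easy(hit_rate: float) -> float:
    """score = clamp(hit_rate / 0.75, 0.001, 0.998)"""
    return clamp(hit_rate / 0.75, 0.001, 0.998)
```

A hit rate of 0.6 scores about 0.8, and any hit rate of roughly 0.75 or more saturates at the 0.998 cap, so the easy task rewards reaching the target rather than exceeding it.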

Task 2 — Medium (Mixed Workload)

| Property | Value |
|---|---|
| Mode | medium |
| Workload | Alternating between two Zipf distributions every 50 steps |
| Goal | Balance hit rate and latency |
| Grader | score = 0.5 × hit_rate + 0.5 × (1 - min(latency / 50, 1)) |
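
The medium grader weights hit rate and latency equally. The sketch below implements the formula exactly as written, with latency in milliseconds; the function name `grade_medium` is illustrative, and any final clamping applied by the real grader is left out.

```python
def grade_medium(hit_rate: float, latency_ms: float) -> float:
    """score = 0.5 * hit_rate + 0.5 * (1 - min(latency / 50, 1))"""
    latency_term = 1.0 - min(latency_ms / 50.0, 1.0)
    return 0.5 * hit_rate + 0.5 * latency_term
```

For instance, a hit rate of 0.8 at 10 ms gives 0.5 × 0.8 + 0.5 × 0.8 = 0.8, while any latency of 50 ms or more zeroes the latency half of the score.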

Task 3 — Hard (Dynamic Workload)

| Property | Value |
|---|---|
| Mode | hard |
| Workload | Sine-wave α modulation + Gaussian noise, continuous drift |
| Goal | Maintain high, stable performance under unpredictable load |
| Grader | 4-component score including stability bonus (hit-rate variance penalty) |
score = 0.4 × hit_rate
      + 0.2 × (1 - normalised_latency)
      + 0.2 × (1 - memory_penalty)
      + 0.2 × stability_bonus

The stability component discourages agents that spike then crash — consistent performance is rewarded.
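
A rough sketch of the hard grader follows. The weights come from the formula above, but the README does not define how the stability bonus is derived from hit-rate variance, so `stability_bonus` here is an assumption: it scales the population variance by 0.25, the largest variance possible for values in [0, 1]. The other components are taken as already normalised inputs in [0, 1].

```python
from statistics import pvariance
from typing import Sequence


def stability_bonus(hit_rates: Sequence[float]) -> float:
    """Assumed definition: 1.0 for a flat hit-rate history, 0.0 at maximum variance."""
    if len(hit_rates) < 2:
        return 1.0
    return 1.0 - min(pvariance(hit_rates) / 0.25, 1.0)


def grade_hard(
    hit_rate: float,
    normalised_latency: float,
    memory_penalty: float,
    stability: float,
) -> float:
    """Weighted sum of the four components, each expected in [0, 1]."""
    return (
        0.4 * hit_rate
        + 0.2 * (1.0 - normalised_latency)
        + 0.2 * (1.0 - memory_penalty)
        + 0.2 * stability
    )
```

Under this assumed definition, an agent whose hit rate swings between 0.0 and 1.0 earns no stability bonus, while one that holds a steady 0.7 earns the full 0.2.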

All graders keep scores strictly inside the open interval (0, 1) to comply with evaluation constraints. The maximum is capped at 0.998 rather than 1.0 so that floating-point rounding cannot produce an invalid boundary value (e.g., "1.000") during evaluation.
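
To make the three workloads concrete, here is a hypothetical sketch of how the Zipf exponent α could vary per mode and how keys could be drawn from a fixed key-space. Only α = 1.2 for the easy task and the 50-step alternation come from the task descriptions; the two medium exponents (0.8 and 1.4), the sine amplitude and period, the noise level, the floor on α, and applying the noise to α itself are all placeholder assumptions.

```python
import math
import random


def zipf_alpha(mode: str, step: int, rng: random.Random) -> float:
    """Return the Zipf exponent for a given task mode and step."""
    if mode == "easy":
        return 1.2  # static workload
    if mode == "medium":
        # Alternate between two distributions every 50 steps (values assumed).
        return 0.8 if (step // 50) % 2 == 0 else 1.4
    if mode == "hard":
        # Sine-wave modulation plus Gaussian noise (all constants assumed).
        drift = 0.4 * math.sin(2.0 * math.pi * step / 100.0)
        return max(0.1, 1.2 + drift + rng.gauss(0.0, 0.05))
    raise ValueError(f"unknown mode: {mode!r}")


def sample_keys(alpha: float, n_keys: int, n_requests: int, rng: random.Random) -> list:
    """Draw request keys with probability proportional to 1 / rank**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_keys + 1)]
    return rng.choices(range(n_keys), weights=weights, k=n_requests)
```

A higher α concentrates traffic on a few hot keys, which favours small caches and frequency-based eviction; a lower α spreads requests out and makes capacity matter more.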


Reward Function

Per-step reward (continuous, non-sparse):

reward = +2.0 × hit_rate
         -1.5 × normalised_latency   (latency / 50ms)
         -1.0 × memory_overuse       (max(0, usage - 0.85))

This provides a meaningful partial-progress signal every step.
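
The per-step reward can be checked by hand with a few lines of Python. The function name `step_reward` and its parameter names are illustrative; the coefficients, the 50 ms latency scale, and the 0.85 memory threshold are the ones given in the formula.

```python
def step_reward(hit_rate: float, avg_latency_ms: float, memory_usage: float) -> float:
    """reward = 2.0 * hit_rate - 1.5 * (latency / 50 ms) - 1.0 * max(0, usage - 0.85)"""
    normalised_latency = avg_latency_ms / 50.0
    memory_overuse = max(0.0, memory_usage - 0.85)
    return 2.0 * hit_rate - 1.5 * normalised_latency - 1.0 * memory_overuse
```

For a hit rate of 0.8, 10 ms average latency, and 90% memory usage, the reward is 1.6 - 0.3 - 0.05 = 1.25. Memory usage at or below 0.85 carries no penalty.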


Setup Instructions

Local Development

# Install dependencies
uv sync

# Start the environment server
uv run python -m server.app

# Server runs at http://localhost:8000
# API docs at http://localhost:8000/docs

Docker

# Build (Dockerfile is at project root)
docker build -t cacheforge-env:latest .

# Run
docker run -p 8000:8000 cacheforge-env:latest

Deploy to Hugging Face Spaces

openenv push --repo-id <your-username>/cacheforge

🌐 Live Deployment

CacheForge is deployed and publicly accessible on Hugging Face Spaces:

👉 https://tuhindev2029-cacheforge.hf.space

Health Check

curl https://tuhindev2029-cacheforge.hf.space/health

Reset Example

curl -X POST https://tuhindev2029-cacheforge.hf.space/reset \
  -H "Content-Type: application/json" \
  -d '{}'

Step Example

curl -X POST https://tuhindev2029-cacheforge.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{
    "action": {
      "adjust_ttl": 1,
      "resize_cache": 0.05,
      "eviction_policy": "LRU",
      "tier_shift": "none"
    }
  }'

The API keeps no state across episodes, so a /reset call is required before each /step sequence.
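
The reset-before-step ordering can also be expressed with the Python standard library. This is a sketch under assumptions: the helper names are hypothetical, and the response bodies are assumed to be JSON, which the README does not state explicitly. The payloads match the cURL examples in this document.

```python
import json
import urllib.request


def build_post(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url: str, path: str, payload: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the response body (assumed to be JSON)."""
    request = build_post(base_url, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def reset_then_step(base_url: str, action: dict, mode: str = "easy", seed: int = 42) -> dict:
    """Start a fresh episode, then apply a single action."""
    post_json(base_url, "/reset", {"seed": seed, "mode": mode})
    return post_json(base_url, "/step", {"action": action})
```

Calling `reset_then_step("http://localhost:8000", {...})` with the four action fields requires a running server; `build_post` alone can be used to inspect the request that would be sent.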


Example Usage

Python Client

from client import CacheforgeEnv
from models import CacheforgeAction

with CacheforgeEnv(base_url="http://localhost:8000") as env:
    result = env.reset(mode="easy", seed=42)
    print(f"Initial hit rate: {result.observation.hit_rate}")

    action = CacheforgeAction(
        adjust_ttl=3,
        resize_cache=0.1,
        eviction_policy="LFU",
        tier_shift="none",
    )
    result = env.step(action)
    print(f"Reward: {result.reward:.3f}")

cURL

# Reset
curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"seed": 42, "mode": "easy"}'

# Step
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{
    "action": {
      "adjust_ttl": 3,
      "resize_cache": 0.1,
      "eviction_policy": "LFU",
      "tier_shift": "none"
    }
  }'

Running Inference

The inference.py script runs an LLM agent against the environment using an OpenAI-compatible client pointed at the Hugging Face router. It supports two execution modes:

Local Mode (server already running)

export HF_TOKEN="your_hf_token"

# Start the server first (in a separate terminal, since it keeps running)
uv run python -m server.app

# Then run inference
python inference.py

Remote Mode (Hugging Face Spaces — no Docker required)

export HF_TOKEN="your_hf_token"
export API_BASE_URL="https://tuhindev2029-cacheforge.hf.space"

python inference.py

Default model: Qwen/Qwen2.5-72B-Instruct (configurable via MODEL_NAME).

The script connects to the environment at API_BASE_URL when that variable is set; otherwise it defaults to local Docker execution.

Because the agent talks to the environment only through HTTP API calls, the same script works against both local and remote deployments.
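
The environment-variable handling described above might look like the following. This is not the code in inference.py: `resolve_config` is a hypothetical helper, reading MODEL_NAME from the environment is an assumption, and falling back to `http://localhost:8000` stands in for whatever the script actually does when API_BASE_URL is unset.

```python
import os
from typing import Mapping

DEFAULT_ENV_URL = "http://localhost:8000"    # assumed local fallback
DEFAULT_MODEL = "Qwen/Qwen2.5-72B-Instruct"  # default model named in this README


def resolve_config(environ: Mapping[str, str] = os.environ) -> dict:
    """Collect the settings the inference script needs from environment variables."""
    token = environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return {
        "env_url": environ.get("API_BASE_URL") or DEFAULT_ENV_URL,
        "model": environ.get("MODEL_NAME") or DEFAULT_MODEL,
        "hf_token": token,
    }
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.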


Baseline Results

| Task | Workload | Score |
|---|---|---|
| Easy | Static Zipf | 0.998 |
| Medium | Mixed Zipf | 0.856 |
| Hard | Dynamic + noise | 0.912 |

The baseline is an LLM agent (Qwen/Qwen2.5-72B-Instruct via the Hugging Face router) run with a fixed seed for deterministic, reproducible runs. It scores above the success threshold of 0.6 on all three tasks.

All scores are computed by the deterministic graders defined in tasks.py, which keeps them reproducible across runs.


Project Structure

cacheforge/
├── Dockerfile                    # Container build
├── .dockerignore                 # Docker build exclusions
├── .gitignore                    # Git exclusions
├── openenv.yaml                  # OpenEnv manifest
├── pyproject.toml                # Dependencies & metadata
├── uv.lock                       # Locked dependency versions
├── LICENSE                       # BSD 3-Clause
├── models.py                     # Action & Observation Pydantic models
├── client.py                     # CacheforgeEnv client (WebSocket)
├── tasks.py                      # Task definitions & graders
├── inference.py                  # Baseline inference script
├── README.md                     # This file
└── server/
    ├── __init__.py
    ├── app.py                    # FastAPI server
    ├── cacheforge_environment.py # Core environment simulation
    └── requirements.txt          # Server dependencies

License

BSD 3-Clause license. See LICENSE for details.
