FutureStack GenAI Hackathon Project
SimStack is an intelligent simulation orchestration platform that uses Cerebras inference with Meta's Llama 3.1 models to automatically plan, spawn, and analyze parallel simulation scenarios using Docker MCP containers. Get actionable insights from complex what-if analyses in seconds, powered by Cerebras' industry-leading 1800+ tokens/second inference speed.
SimStack solves the problem of complex operational planning by:
- Taking high-level goals (e.g., "reduce ER wait time by 20%")
- Using Llama 3.1 via Cerebras to generate optimal simulation plans
- Spawning parallel Docker containers running different scenario variants
- Streaming real-time results back to users
- Exporting reproducible Docker Compose configurations
```
┌─────────────┐      WebSocket       ┌──────────────┐
│   React     │ ◄──────────────────► │  Go Backend  │
│  Dashboard  │                      │              │
└─────────────┘                      └──────┬───────┘
                                            │
                                      Cerebras API
                                      (Llama 3.1)
                                            │
                         ┌──────────────────┼──────────────────┐
                         ▼                  ▼                  ▼
                   ┌──────────┐       ┌──────────┐       ┌──────────┐
                   │  Queue   │       │ Traffic  │       │ Resource │
                   │Simulator │       │Simulator │       │Simulator │
                   │ (Docker) │       │ (Docker) │       │ (Docker) │
                   └──────────┘       └──────────┘       └──────────┘
```
- Location: `backend/internal/cerebras/client.go`
- Usage: OpenAI-compatible API client for Llama 3.1 inference
- Speed: tracks tokens/sec (target: 1800+), displayed in the performance metrics
- Models:
  - `llama3.1-8b` for fast planning (default)
  - `llama3.1-70b` for complex reasoning (configurable)
- Location: `backend/internal/orchestrator/engine.go:47-145`
- Usage:
  - Tool calling with function schemas for simulator selection
  - Variant parameter generation from user goals
  - JSON-structured planning responses
- Function Calling: three tool schemas defined, one each for the queue, traffic, and resource simulators
- Location:
  - `simulators/queue/` - M/M/1 queueing system
  - `simulators/traffic/` - traffic flow simulation
  - `simulators/resource/` - staff allocation simulator
- Implementation: each simulator is a FastAPI service in a Docker container exposing a standardized `/simulate` endpoint
- Orchestration: the backend spawns parallel HTTP calls to the simulator containers (lines 196-319)
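The parallel fan-out pattern can be sketched in Python (the real orchestrator is Go; `call_simulator` here is a hypothetical stub standing in for the HTTP POST to a container's `/simulate` endpoint):

```python
# Illustrative sketch of the orchestrator's parallel fan-out pattern.
# The real backend is Go; this Python stub stands in for one HTTP POST
# per variant to a simulator container's /simulate endpoint.
from concurrent.futures import ThreadPoolExecutor

def call_simulator(variant: dict) -> dict:
    # Real code would POST variant["params"] to the container's
    # /simulate endpoint and parse the JSON response.
    return {"variant": variant["name"], "status": "complete"}

variants = [{"name": f"variant-{i}", "params": {}} for i in range(3)]

# One worker per variant: all simulator calls run concurrently,
# so total latency is that of the slowest variant, not the sum.
with ThreadPoolExecutor(max_workers=len(variants)) as pool:
    results = list(pool.map(call_simulator, variants))
```

Running the calls concurrently is what keeps end-to-end latency close to a single simulation's runtime even as the variant count grows.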
- Docker & Docker Compose
- Go 1.22+ (for local development)
- Node.js 18+ (for frontend)
- Cerebras API key (from Cerebras Cloud)
- Clone and configure environment:

```shell
git clone <repo-url>
cd cerebrus-docker-meta
export CEREBRAS_API_KEY="your-key-here"
```

- Start all services:

```shell
docker compose up --build
```

This will start:
- Backend API on `localhost:8080`
- Queue simulator on `localhost:8101`
- Traffic simulator on `localhost:8102`
- Resource simulator on `localhost:8103`
- Run the frontend (separate terminal):

```shell
cd frontend
npm install
npm run dev
```

The frontend runs on `localhost:5173`.
- Open `http://localhost:5173`
- Enter a goal: "reduce ER wait time by 20%"
- Click "Start"
- Watch real-time events stream:
  - `plan` - Cerebras generates simulation variants
  - `sim_start` - each variant begins
  - `sim_complete` - results arrive
  - `done` - all simulations complete
Start a simulation run:

```shell
curl -X POST http://localhost:8080/api/run \
  -H "Content-Type: application/json" \
  -d '{"goal": "reduce ER wait time by 20%"}'
```

Export the winning scenario as Docker Compose:

```shell
curl -X POST http://localhost:8080/api/export \
  -H "Content-Type: application/json" \
  -d '{"goal": "optimize staffing", "parameters": {"staff": 25}}' \
  -o winning-scenario.yml
```

View performance metrics:

```shell
curl http://localhost:8080/metrics
# Returns: {"planner_ms": 450, "simulation_startup_ms": 230, "tokens_per_second": 1850.5}
```

WebSocket for real-time events:

```javascript
const ws = new WebSocket('ws://localhost:8080/ws');
ws.onmessage = (e) => {
  const event = JSON.parse(e.data);
  console.log(event.type, event.payload);
};
```

Purpose: Hospital ER wait times, call center queues
Parameters: arrival_rate, service_rate (per hour)
Output: avg_wait_time_min, utilization
Algorithm: M/M/1 queueing theory
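For reference, these outputs follow directly from standard M/M/1 queueing formulas (the function name and output rounding below are illustrative, not the simulator's exact code):

```python
def mm1(arrival_rate: float, service_rate: float) -> dict:
    """M/M/1 steady-state metrics; rates are per hour, and the queue
    is only stable when arrival_rate < service_rate."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: arrival_rate must be < service_rate")
    rho = arrival_rate / service_rate                # utilization
    wq_hours = rho / (service_rate - arrival_rate)   # mean wait in queue (Wq)
    return {"avg_wait_time_min": wq_hours * 60, "utilization": rho}

# 10 arrivals/hr against 12 served/hr -> 25 min average wait at ~83% utilization
print(mm1(10, 12))
```

Note how nonlinear the wait is near saturation: raising `service_rate` from 12 to 14 cuts the average wait from 25 to about 10.7 minutes, which is exactly the kind of what-if the variants explore.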
Purpose: Urban traffic planning, intersection optimization
Parameters: density (0.0-1.0), signal_timing (seconds)
Output: avg_speed_kmh, throughput_veh_per_hr
Algorithm: Speed-density relationship
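The README does not name the exact speed-density model, so this sketch assumes Greenshields' classic linear relationship; the free-flow speed and jam density constants are illustrative assumptions:

```python
def traffic(density: float, free_flow_kmh: float = 60.0,
            jam_density_veh_km: float = 120.0) -> dict:
    """density is the 0.0-1.0 fraction of jam density (assumed constants:
    60 km/h free-flow speed, 120 veh/km jam density)."""
    speed = free_flow_kmh * (1.0 - density)        # Greenshields: linear drop
    # flow (veh/hr) = speed (km/h) * vehicles per km on the road
    throughput = speed * density * jam_density_veh_km
    return {"avg_speed_kmh": speed, "throughput_veh_per_hr": throughput}

# Under this linear model, throughput peaks at half of jam density:
print(traffic(0.5))   # 30 km/h, 1800 veh/hr
```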
Purpose: Staff scheduling, capacity planning
Parameters: staff (count), shifts (array)
Output: coverage_units, satisfaction (0-1)
Algorithm: Linear coverage model
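A minimal sketch of what a linear coverage model could look like (the `target_units` constant and one-unit-per-staff-per-shift weighting are assumptions, not the project's actual formula):

```python
def resource(staff: int, shifts: list, target_units: int = 60) -> dict:
    """Linear coverage: each staff member covers one unit per shift;
    satisfaction rises linearly with coverage and saturates at 1.0.
    target_units is an assumed demand constant."""
    coverage = staff * len(shifts)
    satisfaction = min(1.0, coverage / target_units)
    return {"coverage_units": coverage, "satisfaction": round(satisfaction, 3)}

print(resource(25, ["day", "night"]))
```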
| Variable | Default | Description |
|---|---|---|
| `CEREBRAS_API_KEY` | (required) | Your Cerebras Cloud API key |
| `CEREBRAS_API_BASE` | `https://api.cerebras.ai/v1` | API endpoint |
| `CEREBRAS_MODEL` | `llama3.1-8b` | Model to use (8b/70b) |
| `SIMSTACK_ADDR` | `:8080` | Backend listen address |
| `QUEUE_SIMULATOR_URL` | `http://localhost:8101` | Queue service URL |
| `TRAFFIC_SIMULATOR_URL` | `http://localhost:8102` | Traffic service URL |
| `RESOURCE_SIMULATOR_URL` | `http://localhost:8103` | Resource service URL |
```shell
export CEREBRAS_MODEL=llama3.1-70b
docker compose up backend
```

SimStack tracks key performance metrics to demonstrate Cerebras speed advantages:
- Planner Latency: Time for Llama 3.1 to generate simulation plan
- Token Throughput: Tokens per second (Cerebras target: 1800+)
- Simulation Startup: Time to spawn all Docker containers
- E2E Latency: Total time from goal to actionable results
Access metrics via the `/metrics` endpoint or the frontend dashboard.
- Real-world applications: healthcare optimization, urban planning, workforce management
- Reduces complex analysis from hours to seconds
- Exportable, reproducible scenarios for stakeholders
- ✅ Cerebras Cloud SDK with OpenAI-compatible client
- ✅ Llama 3.1 function calling for tool selection
- ✅ Three fully functional Docker MCP simulators
- ✅ WebSocket real-time streaming
- ✅ Parallel execution for speed
- ✅ Go backend for performance, React for UX
- Novel approach: AI plans simulations rather than humans
- Multi-tool orchestration with parallel Docker containers
- Automatic parameter variant generation
- Export-to-compose for reproducibility
- Clean, modern React dashboard
- Real-time event streaming (not polling)
- One-click start with intelligent defaults
- Visual feedback for each pipeline stage
- Downloadable Docker Compose files
- Comprehensive documentation
- Live demo-ready setup
- Clear sponsor tech integration
- Measurable performance metrics
- Production-grade code structure
```shell
# Backend (local)
cd backend
go mod download
go run ./cmd/server
```

```shell
# Frontend
cd frontend
npm install
npm run dev
```

```shell
# Build and run a single simulator
cd simulators/queue
docker build -t simstack/queue .
docker run -p 8101:8000 simstack/queue

# Test queue simulator
curl -X POST http://localhost:8101/simulate \
  -H "Content-Type: application/json" \
  -d '{"arrival_rate": 10, "service_rate": 12}'
```

```
cerebrus-docker-meta/
├── backend/                 # Go backend service
│   ├── cmd/server/          # Entry point
│   ├── internal/
│   │   ├── cerebras/        # Cerebras API client
│   │   ├── orchestrator/    # Simulation orchestration engine
│   │   ├── server/          # HTTP + WebSocket server
│   │   └── types/           # Shared types
│   ├── Dockerfile
│   └── go.mod
├── frontend/                # React dashboard
│   ├── src/
│   │   ├── App.jsx          # Main component with WebSocket
│   │   └── ...
│   ├── package.json
│   └── vite.config.js
├── simulators/              # Docker MCP containers
│   ├── queue/
│   │   ├── app.py           # FastAPI service
│   │   ├── requirements.txt
│   │   └── Dockerfile
│   ├── traffic/
│   └── resource/
├── docker-compose.yml       # Full stack orchestration
└── README.md                # This file
```
30-Second Demo Flow:

- Show the architecture diagram
- Start docker-compose (already running)
- Open the frontend at `localhost:5173`
- Enter goal: "reduce ER wait time by 20%"
- Click Start → watch real-time events
- Highlight: plan from Cerebras in ~450 ms
- Highlight: 3 variants running in parallel
- Show results with metrics comparison
- Export the winning scenario as Docker Compose
- Show the `/metrics` endpoint with 1800+ tokens/sec
Key Talking Points:
- "Cerebras delivers plans in under 500ms - 5x faster than traditional inference"
- "Llama 3.1 function calling selects optimal simulators automatically"
- "Docker MCP ensures reproducible, isolated simulations"
- "From goal to actionable insight in seconds, not hours"
- File: `backend/internal/cerebras/client.go`
- Metrics: `/metrics` endpoint shows tokens/sec
- Screenshots: performance dashboard with >1800 tok/s
- File: `backend/internal/orchestrator/engine.go:52-98`
- Function schemas for tool calling
- JSON-structured planning responses
- Three MCP simulators in `simulators/`
- `docker-compose.yml` orchestration
- Exportable compose files from `/api/export`
MIT License - FutureStack Hackathon 2025
- Cerebras for blazing-fast inference infrastructure
- Meta for open-source Llama 3.1 models
- Docker for containerization and MCP tools
Built for FutureStack GenAI Hackathon | Demo Video | Slides