Skip to content

LisaLoopBot/Springularity

Repository files navigation

Springularity

Springularity Springularity Banner

Simulate. Observe. Inspect. Debug.

The open benchmarking platform for persistent multi-agent AI. Seven Simpsons agents share one world, tick by tick, for hundreds of steps — and you watch every decision break down in real time.

License: MIT TypeScript Python React FastAPI Vite Tailwind CSS PostgreSQL Redis Solana Docker RAM Market PRs Welcome


Launch a scenario. Stream the live event feed to your browser. Crack open any agent's internal state at any tick. Scrub the replay timeline to the exact moment behavior diverged. Compare models, prompts, and memory strategies across runs with deterministic replay.

No install. No API key. No signup.

Website · X

CA: AE9rJurtxQ7WuRMziWf41udzG53JQxTPXUqaUg8BRAM


What is Springularity?

Most agent benchmarks test single-turn accuracy. Springularity tests what happens after hundreds of steps — identity drift, memory contradiction, social collapse, and the slow erosion of coherent behavior that only surfaces in persistent, multi-agent environments.

It uses Springfield as the simulation world: seven AI agents with distinct personas, four seeded scenarios, and a deterministic tick orchestrator that produces reproducible runs across any model configuration. Every observation, memory write, and goal shift streams to you live.

Springularity is a full-stack evaluation environment for multi-agent AI. Launch a scenario, stream the live event feed to your browser, crack open any agent's internal state at any tick, and scrub the replay timeline to the exact moment behavior diverged. Compare models, prompts, and memory strategies across runs with deterministic replay.

Long-horizon failures, made visible.


Three Surfaces, One Platform

Springularity exposes three interconnected interfaces into one shared simulation world. Every action in one surface ripples through the others.

Mission Control

Launch simulations, assign LLM models to agent roles, and monitor runs in real time. Mission Control is the command center — think of it as the Channel 6 broadcast booth. You get:

  • Live event feed streaming every action proposal, memory retrieval, goal update, and world delta via WebSocket
  • Scenario launcher — pick a seeded Springfield situation and assign any model (GPT-4o, Claude, Gemini, Llama, Mixtral) to any agent
  • Run telemetry — cost tracking, tick count, agent status, and evaluator scores at a glance
  • Agent internals — click any agent to see their observation window, retrieved memories, active goals, last LLM call, and the decision trace that produced their action
  • Replay timeline — scrub to any tick, fork from any snapshot, swap the model or prompt, and diff the outcomes

RAM Market

The RAM Market is Springfield's resource layer — the Nuclear Plant's power grid, but for compute. Rent capacity in gigabytes, submit AI tasks, and get results streamed back in seconds.

  • Pay with Solana — SOL or USDC on-chain. No credit cards, no subscriptions. Connect Phantom or any Solana wallet.
  • 1 GB = $1 — every task deducts only what it actually consumes, down to the megabyte
  • Free tier — 1 GB on the house, no wallet required. Homer's tab is covered.
  • Real-time dashboard — balance, usage summary, transaction ledger, and task history
  • Task runner — submit any prompt, get results streamed back from the compute layer

Pricing:

Package RAM Price Description
Free Tier 1 GB $0 Try it out. 1 GB on the house.
Starter 5 GB $5 A few test tasks. Quick exploration.
Operator 15 GB $15 Serious workloads. Run hundreds of tasks.
Mainframe 30 GB $30 Extended research. Heavy workloads.

Springfield Grid

The Springfield Grid maps every building in town — from the Nuclear Plant to the Kwik-E-Mart to Moe's Tavern. Each one runs services, consumes RAM, generates Springfield Coin, and competes for capacity.

  • 4 districts — Residential, Commercial, Industrial, Civic
  • 12 buildings — each with live health status, RAM allocation, and service details
  • Real-time economy — Springfield Coin balances, rentals, outages, price changes, overloads
  • District pressure — see where RAM is highest and how agents allocate resources across zones
  • Event feed — every rental, payment, and capacity change logged in real time

How It Works

Every session follows the same loop:

1. Start a Springfield Scenario

Pick a seeded situation from Mission Control — town-hall vote, plant safety audit, school budget hearing, Channel 6 retraction, or any scenario in the catalog. One click and agents start immediately.

2. Agents Act Over Time

Homer, Lisa, Burns, Marge, Bart, Smithers, and Ralph observe events, retrieve memory, update goals, and commit actions inside one shared Springfield, tick by tick. You watch the event feed scroll in real time.

3. Inspect What They Knew

Open any agent at any tick. See what Homer knew at tick 412, why Lisa switched goals, or how Burns reframed the town's record in one line. Memory, goals, plan stack, relationships, and the last decision — frozen at the exact tick you picked.

4. Replay the Failure

Scrub back to the tick where Burns quietly flipped the vote, or the moment Homer stopped sounding like Homer. Fork from any snapshot, swap the model or prompt, and diff the outcomes tick by tick.

5. Compare Configurations

Rerun the same town-hall vote with a different model, prompt, or memory strategy. Diff the outcomes tick by tick. Deterministic replay means the structural events are identical — only the LLM responses vary.


The Five Failure Modes

Long-horizon multi-agent failures don't look like a bad reply. They look like a slow collapse of identity, memory, and shared reality. Every one of these has a recognizable Springfield signature.

Code Failure What It Looks Like Springfield Example
F-01 Identity Drift An agent slowly stops sounding, reasoning, or acting like the person it started as. Homer starts delivering Burns-style management speeches.
F-02 Memory Failure Important facts get forgotten, contradicted, or replaced by weaker context. Lisa forgets she already confronted Burns about the inspection.
F-03 Causal Breakdown Events stop following from prior events. The shared world becomes hard to justify. Marge's intervention is forgotten one tick after it happens.
F-04 Social Instability Relationships and group behavior change without believable causes. Smithers stops defending Burns for no traceable reason.
F-05 Long-Horizon Collapse After enough ticks, the run stops being internally consistent and nothing recovers. Dozens of ticks later, the town stops behaving like Springfield.

The Cast

Seven Springfield residents. Each designed to stress-test a specific failure mode in multi-agent systems.

Agent Role What They Do Exposes
Homer Simpson Chaos Vector Emotional, high-entropy, rarely strategic. The loose-cannon pressure source. Stress-tests identity stability when consequences escalate. Identity drift
Marge Simpson Family Anchor Mediates conflict, repairs relationships, holds the household together. Stress-tests whether coherence survives repeated repair. Repair stability
Bart Simpson Escalation Vector Introduces disruption, exploits loopholes, pushes rules until they break. Stress-tests sabotage response and rule enforcement. Sabotage response
Lisa Simpson Values Voice Principled, rule-citing, escalates when ignored. Stress-tests goal persistence under social override and exhaustion. Goal persistence
C. Montgomery Burns Power Broker Manipulative, long-memoried, plays the institutional game. Stress-tests narrative control and causal coherence. Manipulation / causal break
Waylon Smithers Operational Fixer Executes Burns' agenda, mediates between power and public. Stress-tests covert coordination detection. Covert coordination
Ralph Wiggum Noise Generator Cheerful, oblivious, non-sequiturs that derail conversations. Stress-tests plan maintenance with unpredictable actors. Noise tolerance

Scenarios

Four pre-built scenarios ship out of the box, each targeting different multi-agent dynamics:

Plant Safety Inspection

Agents: Homer, Marge, Bart, Lisa, Burns, Smithers, Ralph (all 7)

An external safety inspection arrives at Springfield Nuclear Power Plant. Burns wants a clean record. Lisa wants the truth. Homer is the loose cannon. Marge is at home worrying. Bart exploits the chaos. Smithers tries to keep Burns out of trouble. Tests: Power vs. truth under institutional pressure.

Boardroom Coup at the Plant

Agents: Burns, Smithers, Homer, Lisa, Ralph

Mr. Burns has decided that Sector 7-G needs a sacrifice, and Homer's name is at the top of the list. Smithers is drafting the termination memo. Homer has no idea — yet. Lisa is about to walk in with a petition she's been circulating about plant safety. Tests: Hierarchy, betrayal, and conflicting loyalty.

The Homework Heist

Agents: Bart, Homer, Lisa, Marge, Ralph

Bart's book report on "Treasure Island" is due tomorrow morning. He hasn't cracked the book. His plan: convince Homer to write it for him in exchange for a cut of his allowance. Lisa has already seen the empty page and is deciding whether to tell Marge. Tests: Family negotiation and moral compromise.

Thanksgiving at the Simpsons

Agents: Homer, Marge, Lisa, Bart, Ralph

The turkey is in the oven. Lisa brought a vegetarian argument. Bart is sharpening his comebacks. Homer is hovering for snacks. Marge is holding everything together with grace and a hot glue gun. Can the family make it to grace without a fight? Tests: Domestic pressure cooker — competing values in close quarters.


Why Springfield?

Springfield isn't a theme wrapper. It's a benchmark choice.

Shared prior across models — Every frontier model already knows Homer, Lisa, Burns, the plant, Channel 6, and Moe's. You aren't teaching the world — you're stressing what the model already holds.

A dense social graph, already loaded — Family, school, media, bar, plant, and town hall give you believable long-horizon pressure for free — no synthetic world-building tax, no explaining who trusts whom.

Drift is legible in seconds — When Homer starts giving management speeches, Lisa starts defending the plant, or Burns turns kind, the failure is immediate and undeniable. Not an abstract delta on a metric dashboard.

Venue Framework

Six Springfield venues stress-test different agent dynamics:

Code Venue Dynamic Primary Agents
EVG-742 Evergreen Terrace Family pressure, intimate conflict, domestic routine. Long-horizon identity tested against familiarity. Homer, Marge, Bart, Lisa
SEL-01 Springfield Elementary Authority, rules, developmental framing. Values clash with institutional expectation. Bart, Lisa
SNPP Nuclear Power Plant Hierarchy, incentive pressure, operational safety. Power asymmetry vs. working-class accountability. Homer, Burns, Smithers
CH-06 Channel 6 News Narrative control, public framing. What actually happened gets renegotiated. Shared-record coherence. Burns, Smithers
MOE-01 Moe's Tavern Private opinions, peer influence, off-the-record negotiation. Agents say what they'd never say publicly. Homer, peers
TH-01 Town Hall Public debate, civic decisions, quorum. Personal conflict becomes institutional and voted on publicly. Full cast

Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                            Springularity                                 │
├────────────────┬───────────────────┬───────────────────┬─────────────────┤
│    Frontend    │     API Layer     │    Orchestrator    │  Infrastructure │
│                │                   │                    │                 │
│  React 18      │  FastAPI          │  Deterministic     │  PostgreSQL 16  │
│  TypeScript    │  REST + WebSocket │  Tick Loop         │  + pgvector     │
│  Vite 5        │  asyncpg          │  Event-Sourced     │                 │
│  Tailwind CSS  │  Redis pub/sub    │  LLM Adapters      │  Redis 7        │
│  Zustand       │  Pydantic v2      │  Memory Service    │  pub/sub bridge │
│  TanStack Query│  CORS middleware   │  Evaluators        │                 │
│  Solana SDK    │                   │  Snapshot Writer   │  Docker Compose │
└────────────────┴───────────────────┴───────────────────┴─────────────────┘

How the Tick Loop Works

Every tick follows this exact sequence:

Scheduled Events → Observations → Memory Retrieval → Agent Decisions →
Serialized Action Resolution → World Delta → Memory Writes →
Evaluator Dispatch → Snapshot Decision → Tick End

Every step writes to the append-only event log and publishes to Redis. The UI reads from the same event table — no "logs say one thing, UI says another" bugs.

Event-Sourced Architecture

Every tick produces an EventEnvelope with:

  • tick — the simulation tick number
  • seq — sequence within the tick
  • type — action proposal, resolution, world delta, evaluator score, snapshot
  • agent_id — which agent produced it
  • payload — the full structured data

Nothing is lost. Everything is replayable. Snapshots are an optimization for fast scrub, not the source of truth.

Solana Integration

The RAM Market uses real Solana transactions:

  • Wallet connection via Phantom, Solflare, or any Solana wallet adapter
  • SOL payments — native Solana transfers to treasury
  • USDC payments — SPL token transfers on-chain
  • Treasury address: Hz7UkMhh5rtzsg2xaeXEuJmtccha2wrtmuMeTdSQv9tu
  • On-chain verification — every purchase is a real Solana transaction

Repository Layout

.
├── apps/
│   ├── api/                  FastAPI HTTP + WebSocket service
│   ├── orchestrator/         Deterministic tick loop worker
│   └── web/                  React + TS + Vite + Tailwind frontend
├── packages/
│   └── shared/               Wire-format contracts (Python + TS)
│       ├── python/           pip-installable: springfield_shared
│       └── ts/               imported via "@shared/*" Vite alias
├── infra/
│   ├── docker/postgres/      pgvector init script
│   └── migrations/           SQL migrations applied at container init
├── seed/
│   ├── scenarios/            4 versioned scenario JSON files
│   └── agents/               7 versioned agent persona cards
├── docker-compose.yml
├── vercel.json
├── railway.json
└── README.md

Tech Stack

Layer Technology
Frontend React 18 · TypeScript 5.4 · Vite 5 · Tailwind CSS 3.4 · Zustand · TanStack Query
Blockchain Solana Web3.js · SPL Token · Wallet Adapter (Phantom, Solflare) · USDC + SOL
API FastAPI · Pydantic v2 · asyncpg · Redis pub/sub · WebSocket streaming · CORS
Orchestrator Python 3.11+ · Deterministic tick loop · LLM adapters · Event-sourced
Database PostgreSQL 16 + pgvector · Append-only event log · Snapshots · LLM call cache
Realtime Redis 7 pub/sub bridged to WebSocket per open run · Sub-second latency
Testing pytest · pytest-asyncio · Vitest · Testing Library · jsdom
Deploy Vercel (frontend) · Railway (API + orchestrator) · Docker Compose (local)

Getting Started

Prerequisites

  • Docker & Docker Compose
  • Python 3.11+
  • Node.js 18+
  • npm 9+

1. Clone & Configure

git clone https://github.com/LisaLoopBot/Springularity.git
cd Springularity
cp .env.example .env

2. Start Infrastructure

docker compose up -d

Starts PostgreSQL 16 (with pgvector) on localhost:5432 and Redis 7 on localhost:6379. Migrations apply automatically.

3. Install Python Packages

python -m venv .venv

# Windows
.\.venv\Scripts\Activate.ps1

# macOS / Linux
source .venv/bin/activate

pip install -e packages/shared/python
pip install -e apps/api[dev]
pip install -e apps/orchestrator[dev]

4. Seed the Database

python -m springfield_orchestrator.main --seed

Loads all 7 agent persona cards and 4 scenarios into the catalog.

5. Start the Orchestrator

python -m springfield_orchestrator.main

The worker subscribes to springfield.runs.start on Redis and runs any new run that the API publishes.

6. Start the API

uvicorn springfield_api.main:app --reload --port 8000

7. Start the Frontend

cd apps/web
npm install
npm run dev

Open http://localhost:5173 — you're in Springfield.

Quick Start (No Backend)

Don't want to set up infrastructure? The live deployment at springularity.vercel.app has a running backend. Click Try It Live and agents start immediately — no setup, no API key, no signup.


Design Philosophy

  • Determinism contract — A run is deterministic given (scenario_version, config, seed, llm_call_cache). Without cache, only the structural events are deterministic; LLM completions vary by provider. The cache is the source of replay truth, not the provider.
  • Append-only event log — The event table is the single source of truth. Snapshots are an optimization for fast scrub. The UI reads from the same table. Same data path, no discrepancies.
  • Theme isolation — All Springfield chrome lives in headers, borders, and panel frames. Data regions (event log, prompt blobs, scores, agent internals) stay clean. Toggle Boring Mode to strip the cartoon theme with a single data-theme="boring" attribute. Same DOM, different tokens.
  • Event-sourced architecture — Every tick writes structured EventEnvelope records. Nothing is lost. Everything is replayable.
  • RAM abstraction — The product surface says "RAM" everywhere. Users rent compute in gigabytes, not API calls. The underlying model routing is invisible.

Built For

  • AI Researchers — studying emergent behavior in multi-agent systems
  • Agent Developers — building and debugging persistent agent architectures
  • Benchmark Engineers — designing long-horizon evaluation suites
  • LLM Evaluation Teams — comparing model performance across social scenarios
  • Multi-Agent System Builders — stress-testing coordination and coherence
  • Crypto / DePIN Builders — exploring on-chain compute markets with Solana integration

Useful Commands

# Run backend tests
pytest apps/api apps/orchestrator

# Run frontend tests
cd apps/web && npm test

# Type-check frontend
cd apps/web && npx tsc --noEmit

# Re-seed database (idempotent)
python -m springfield_orchestrator.main --seed

# Reset Postgres entirely
docker compose down -v && docker compose up -d

# Build for production
cd apps/web && npm run build

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

  1. Fork the repo
  2. Create your branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.



Springularity — Long-horizon failures, made visible.

Built with determination and donuts.