Digital Tissue is a local-first Python toolkit for building and running simplified tissue simulations and benchmarking LLM agents against them. Everything runs locally (no database required).
A tissue is a grid of cells where each cell contains different molecules (genes, RNA, proteins) governed by rules that determine how they change over time. The entire simulation state lives in a single JSON file (gridstate.json).
This repo includes:
- A web-based editor to create and modify tissues — make them healthy, give them diseases, evolve them toward desired states
- LLM benchmark challenges where an LLM agent interrogates the tissue through API endpoints (running experiments, requesting omics data, proposing interventions) to solve a problem
- Evolutionary optimization to search over initial conditions and find tissue states that maximize an objective
The main goal is to generate environments that are biology-like: many interacting variables, limited observability, noisy data, and open-ended problems that require reasoning to solve.
- Python 3.10+
- macOS or Linux (Windows via WSL is likely fine)
```shell
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
```

Start the backend and UIs:
```shell
python3 -m backend.tools.run_ui
```

Open:

- http://127.0.0.1:8000/ (portal)
- http://127.0.0.1:8000/editor (editor)
To change ports:
```shell
python3 -m backend.tools.run_ui --runtime-port 8000 --benchmarks-port 8001
```

Backend only (no Streamlit UI):
```shell
python3 -m backend.runtime_server 8000
```

To run LLM-driven benchmark episodes, you need API keys.
Create keys.txt at the repo root (it is gitignored):
```
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
XAI_API_KEY=
GEMINI_API_KEY=
```
Or point to a different file path with DT_KEYS_FILE.
Once the servers are running, open the Benchmarks UI:
http://127.0.0.1:8001/
You can also run benchmark scripts directly:
```shell
python3 trials/run_llm_benchmark.py --help
python3 trials/run_llm_suite.py --help
```

- Benchmark run artifacts: `var/runs/llm_bench/` (e.g. `events.jsonl`, `report.json`, logs)
- Streamlit controller state: `var/runs/benchmarks/`
- Runtime logs: `var/log/`
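The run artifacts are plain JSON/JSON-lines, so they are easy to post-process. A small loader sketch (assuming `report.json` is a single JSON object and `events.jsonl` is one JSON object per line, as described above; `load_run` is an illustrative helper, not part of the repo):

```python
import json
from pathlib import Path

def load_run(run_dir):
    """Load report.json plus the events.jsonl transcript from one run directory."""
    run_dir = Path(run_dir)
    report = json.loads((run_dir / "report.json").read_text())
    events = [
        json.loads(line)
        for line in (run_dir / "events.jsonl").read_text().splitlines()
        if line.strip()  # tolerate trailing blank lines
    ]
    return report, events
```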
- `backend/`: runtime server + simulation engine
- `apps/editor/`: static editor UI served by the backend
- `apps/benchmarks/`: Streamlit benchmarks UI
- `benchmarks/challenges/`: fixture puzzles backing `/api/tests/*`
- `assets/`: models, examples, prompts
- `trials/`: CLI scripts for running/analyzing benchmarks
- `docs/`: deeper references (`LAYER_OPS.md`, `MEASUREMENTS.md`)
I highly recommend learning how to use this repo by talking to an AI assistant that has read it. However, below is an attempt to summarize the key concepts.
A tissue simulation is defined by a single JSON file (gridstate.json) that encodes:
- a 2D grid of cells (dimensions `H × W`)
- a set of layers — named variables stored per grid cell (e.g. `gene_x`, `rna_y`, `protein_z`, `molecule_a`). Each layer is a flat `float32` array of length `H*W`, base64-encoded inside the JSON.
- layer ops — update rules that compute the next value of each layer from the current state every tick (time unit)
- measurements — expressions computed from layers to track aggregate metrics (e.g. total live cells, mean protein level)
The backend loads this JSON into memory, advances it step-by-step, and exposes the state through HTTP endpoints. The editor UI visualizes layers as heatmaps and plots measurements over time.
The point is not to perfectly model real biology — it is to have a small, self-contained scenario where you can test strategies, run optimizations, or benchmark LLM agents against a measurable goal (e.g. reduce cancer-like behavior, extend simulated lifespan).
In benchmark mode, an LLM acts as an automated agent that interacts with the simulator through the backend API:
- The runner picks a challenge (e.g. "cancer").
- The LLM receives instructions (scenario, scoring, allowed actions).
- The LLM calls backend API endpoints to run experiments and propose interventions.
- The run produces artifacts: `events.jsonl` (step-by-step transcript), `report.json` (score + summary), `story.md`, `issues.json`, `stdout.log` / `stderr.log`.
This lets you compare different LLM providers/models, prompts, or toolsets on identical puzzles.
- Create `keys.txt` (see LLM benchmarks) with the relevant API key(s).
- Start servers: `python3 -m backend.tools.run_ui`
- Open `http://127.0.0.1:8001/` (or click Benchmarks from the portal).
- Settings tab → choose a Challenge (cancer / aging / hereditary disease).
- Run tab → choose Provider and Model → click Start new.
- When finished, inspect the transcript, report, and score in the UI or on disk under `var/runs/llm_bench/`.
- Copy an example `gridstate.json` from `assets/examples/`.
- Edit it (or look at built-in models in `assets/models/`).
- Load it in the editor and run.
To make it available in the Benchmarks UI, place the file under `benchmarks/challenges/<challenge_name>/` and restart.
- Open `http://127.0.0.1:8000/editor`.
- Click Demo to load a built-in example, or Import for your own file.
- Runtime tab → Reset → Step once or Start to run.
- Switch variables (layers) to view heatmaps; use Inspect for summary stats.
- Evolution tab → choose algorithm (`cem_delta` or `affine`) → Start to search over initial conditions.
- Backend (`backend/runtime_server.py`): single-process `ThreadingHTTPServer`. Holds simulation state in memory as a JSON payload, updates it on `/api/runtime/step`. Serves the editor UI as static files (no frontend build step).
- Editor UI (`apps/editor/`): `index.html`, `app.js`, `style.css`, `portal.html`.
- Benchmarks UI (`apps/benchmarks/`): Streamlit app for running and inspecting benchmark episodes.
- Simulation engine (`backend/digital_tissue/apply_layer_ops.py`): core tick function `apply_layer_ops_inplace(payload, seed_offset=...)`.
The server sets `Cache-Control: no-store`, so edits to HTML/JS/CSS are picked up on refresh.
Logs go to `var/log/` (override with `DT_LOG_DIR`).
Supported LLM API key environment variables: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `XAI_API_KEY`. Optional base URL overrides: `XAI_BASE_URL`, `GEMINI_BASE_URL`.
API smoke test:

```shell
curl -s http://127.0.0.1:8000/api/health
```

- step / tick: one simulation update
- grid: 2D layout of the tissue
- layer: one variable per grid location (RNA / protein / molecule / state)
- gridstate / payload: the JSON model file containing the full simulation state
- Evolution: built-in optimization that searches over starting conditions
Simulation state is passed around as a single JSON object:
- `version`: must be `1`
- `H`, `W`: grid dimensions
- `layers`: list of `{ "name": "gene_x", "kind": "continuous" }` metadata objects
- `data`: dict mapping layer names to `{ "dtype": "float32", "b64": "..." }` buffers
Layer buffers are base64-encoded float32 arrays of length H*W. This keeps the JSON interface simple while staying compact and fast to decode with NumPy.
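The round trip can be sketched with NumPy (the helper names `encode_layer` / `decode_layer` are illustrative, not functions shipped in the repo):

```python
import base64
import numpy as np

def encode_layer(arr):
    """Base64-encode a float32 layer buffer for embedding in gridstate.json."""
    return base64.b64encode(np.asarray(arr, dtype=np.float32).tobytes()).decode("ascii")

def decode_layer(b64, H, W):
    """Decode a base64 float32 buffer back into an (H, W) NumPy array."""
    buf = base64.b64decode(b64)
    return np.frombuffer(buf, dtype=np.float32).reshape(H, W)
```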
The optional payload["event_counters"]["totals"] dict tracks simulation events (divisions, deaths) used by Evolution fitness calculations. Evolution strips existing counters from the base payload before evaluations.
All endpoints are in backend/runtime_server.py.
| Endpoint | Body | Returns |
|---|---|---|
| `POST /api/runtime/reset` | `{ payload }` | `{ ok, tick, H, W, layers }` |
| `POST /api/runtime/step` | `{ layers? }` | `{ tick, data, scalars, measurements, events }` |
| `POST /api/runtime/frame` | `{ layers? }` | same as `/step` without advancing |
`/step` calls `apply_layer_ops_inplace(payload, seed_offset=tick)` and increments `tick`.
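A minimal stdlib client for these endpoints might look like the sketch below (`post_json` and `encode_body` are hypothetical helpers, not part of the repo):

```python
import json
import urllib.request

def encode_body(body):
    """Serialize a request body as UTF-8 JSON, as the backend expects."""
    return json.dumps(body).encode("utf-8")

def post_json(url, body):
    """POST a JSON body to a backend endpoint and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=encode_body(body),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server running on the default port:
#   payload = json.load(open("gridstate.json"))
#   post_json("http://127.0.0.1:8000/api/runtime/reset", {"payload": payload})
#   frame = post_json("http://127.0.0.1:8000/api/runtime/step", {})
```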
Evolution runs in a background thread so the UI stays responsive.
| Endpoint | Body | Returns |
|---|---|---|
| `POST /api/evolution/start` | `{ payload, config }` | `{ ok, job_id }` |
| `POST /api/evolution/stop` | — | `{ ok }` |
| `POST /api/evolution/status` | — | `{ running, progress, baseline, series, history, top, perf }` |
| `POST /api/evolution/candidate` | `{ id }` | `{ ok, id, fitness, metrics, genome, payload }` |
The payload in /candidate response is a fully reconstructed gridstate.json loadable into Runtime.
Evolution searches over the initial conditions of gene_*, rna_*, protein_* layers. The cell/cell_type layer is auto-detected for masking.
Each layer gets a scale and bias: `arr2 = arr * scale + bias`, clamped to `[0, ∞)`. Genomes mutate via log-normal noise on scale and additive noise on bias. Standard elitist GA structure.
Parameters: `mutation_rate`, `sigma_scale`, `sigma_bias`.
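The affine genome described above can be sketched in NumPy (function names and default noise levels are illustrative, not the repo's exact code):

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_affine(arr, scale, bias):
    """Apply a per-layer affine genome: arr * scale + bias, clamped at zero."""
    return np.maximum(arr * scale + bias, 0.0)

def mutate(scale, bias, sigma_scale=0.2, sigma_bias=0.05):
    """Log-normal noise on scale (keeps it positive), additive Gaussian noise on bias."""
    return scale * np.exp(rng.normal(0.0, sigma_scale)), bias + rng.normal(0.0, sigma_bias)
```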
Each layer gets a per-cell delta: arr2 = arr + delta. Deltas are sampled from a per-layer Gaussian (mu, sigma) and updated via Cross-Entropy Method toward the best candidates.
Parameters: `cem_sigma_init`, `cem_alpha`, `cem_sigma_floor`, `cem_mask` (`cell` or `all`).
This works better when the optimal solution requires heterogeneous per-cell specialization.
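The update toward the best candidates can be sketched as a standard Cross-Entropy Method step under the parameter names above (a sketch, not the repo's exact implementation):

```python
import numpy as np

def cem_update(mu, sigma, elite_deltas, alpha=0.7, sigma_floor=0.01):
    """One CEM step: blend (mu, sigma) toward the mean/std of the elite deltas."""
    elite = np.asarray(elite_deltas)  # shape: (n_elites, n_cells)
    new_mu = (1 - alpha) * mu + alpha * elite.mean(axis=0)
    new_sigma = (1 - alpha) * sigma + alpha * elite.std(axis=0)
    # sigma_floor keeps exploration alive once the elites converge
    return new_mu, np.maximum(new_sigma, sigma_floor)
```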
Weighted sum of alive cells + event totals (divisions, starvation_deaths, damage_deaths). Weights set in the UI. A baseline from the unmodified payload is computed for comparison.
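A weighted sum like this is straightforward to sketch (weight keys and the function name are illustrative; the actual weights come from the UI):

```python
def fitness(alive_cells, totals, weights):
    """Weighted sum of alive cells and event totals (divisions, deaths, ...)."""
    score = weights.get("alive", 0.0) * alive_cells
    for event, count in totals.items():
        score += weights.get(event, 0.0) * count
    return score
```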
Each evolution evaluation copies the payload, applies the genome, runs N ticks, and computes fitness. The /api/evolution/status response includes perf telemetry (apply_s, ticks_s, decode_cell_s, total_s) to diagnose bottlenecks.
The backend uses ThreadPoolExecutor for parallel evaluations. If ticks_s dominates and CPU doesn't scale with threads, consider process-based parallelism.
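Switching to processes is mostly mechanical; a sketch (here `evaluate` is a trivial stand-in for the real per-candidate evaluation, and note that genomes/payloads must be picklable to cross process boundaries):

```python
from concurrent.futures import ProcessPoolExecutor

def evaluate(genome):
    """Stand-in for: copy payload, apply genome, run N ticks, compute fitness."""
    return sum(genome)

def evaluate_population(population, workers=4):
    """Evaluate candidates in separate processes to sidestep the GIL on CPU-bound ticks."""
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(evaluate, population))
```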
`python3 -m venv` fails on Ubuntu/Debian:

```shell
sudo apt install -y python3-venv  # or python3.12-venv for 3.12
```

Then recreate `.venv` and reinstall deps.
`externally-managed-environment` error:
Use a virtual environment (`python3 -m venv .venv`). Avoid `--break-system-packages`.
GitHub push auth:
Use a Personal Access Token, SSH remote, or gh auth login.