An evolutionary multi-agent simulation exploring how moral behaviors emerge through cooperation. LLM-powered agents with different moral frameworks (universal, reciprocal, kin-focused, selfish) interact in a resource-gathering environment — hunting, sharing, fighting, and reproducing — to test hypotheses about why morality might be favored by natural selection.
This simulation engine is the core framework for the ACL 2026 paper: "Investigating Moral Evolution via LLM-based Agent Simulation", hosted under the MoralAgentSim organization.
- Project Website: https://MoralAgentSim.github.io (Provides high-resolution graphs, macro-statistics, and case studies).
- Python 3.12+
- uv (package manager)
- Git
```bash
git clone https://github.com/MoralAgentSim/social-evol-sim.git
cd social-evol-sim

# Install dependencies
uv sync

# Set up environment variables
cp .env.example .env
# Edit .env and set OPENROUTER_API_KEY (recommended — gives access to all
# OpenRouter-supported models through a single provider)
```

For checkpoint storage in PostgreSQL:

```bash
pip install "psycopg[binary]"
```
```bash
# Start a fresh simulation
uv run python main.py run --config_dir configZ_major_v2

# Run with real-time dashboard
uv run python main.py run --config_dir configA_z8_easyHunting_visible --dashboard

# Resume from a checkpoint
uv run python main.py resume <RUN_ID> --config.world.max_life_steps 50

# Resume from a specific time step
uv run python main.py resume <RUN_ID> --time_step 10

# List available runs
uv run python main.py list-runs

# Estimate token usage and cost
uv run python main.py estimate-cost --config_dir configZ_major_v2
```
```bash
# Use OpenRouter as the LLM provider (any OpenRouter-supported model works)
uv run python main.py run \
  --config_dir configZ_major_v2 \
  --config.llm.provider openrouter \
  --config.llm.chat_model anthropic/claude-sonnet-4

# Combine with other flags
uv run python main.py run \
  --config_dir configZ_major_v4 \
  --config.llm.provider openrouter \
  --config.llm.chat_model openai/gpt-4o-mini \
  --config.llm.async_config.max_concurrent_calls 2 \
  --config.world.max_life_steps 1 \
  --dashboard

# Run 4 kin-focused agents only (override agent count and ratios)
uv run python main.py run \
  --config_dir configZ_major_v4 \
  --config.llm.provider openrouter \
  --config.llm.chat_model google/gemini-2.5-flash \
  --config.llm.async_config.max_concurrent_calls 20 \
  --config.agent.initial_count 4 \
  --config.agent.ratio.kin_focused_moral 1.0 \
  --config.agent.ratio.universal_group_focused_moral 0.0 \
  --config.agent.ratio.reciprocal_group_focused_moral 0.0 \
  --config.agent.ratio.reproductive_selfish 0.0 \
  --config.world.max_life_steps 3 \
  --dashboard
```

This project follows the OpenRouter model-naming convention (`<vendor>/<model-id>`), giving access to the full catalogue of providers — Anthropic, OpenAI, Google, DeepSeek, Mistral, Meta, and more — through a single API key. See the full list at openrouter.ai/models.

Examples: `anthropic/claude-sonnet-4`, `openai/gpt-4o-mini`, `google/gemini-2.5-flash`, `deepseek/deepseek-chat-v3-0324`.
| Subcommand | Description |
|---|---|
| `run` | Start a fresh simulation (`--config_dir` required) |
| `resume <RUN_ID>` | Resume from a checkpoint |
| `list-runs` | List available simulation runs |
| `estimate-cost` | Estimate token usage and cost (`--config_dir` required) |
| Flag | Description |
|---|---|
| `--checkpoint_dir` | Checkpoint save location (default: `./data`) |
| `--dashboard` | Enable Rich Live real-time dashboard |
| `--log_level` | `debug`, `info`, `warning`, `error`, `critical` |
| `--debug_responses` | Save raw LLM responses on validation errors |
| `--no_db` | Disable database; file-only checkpoints |
| `--config.*` | Override any nested config field (auto-generated from the Pydantic model) |
Common config overrides:
| Override | Description |
|---|---|
| `--config.world.max_life_steps N` | Max simulation steps |
| `--config.world.communication_and_sharing_steps N` | Communication frequency |
| `--config.llm.provider` | LLM provider (recommended: `openrouter`; also supports `openai`, `deepseek`, `alibaba`) |
| `--config.llm.chat_model` | Model id in OpenRouter format (e.g., `anthropic/claude-sonnet-4`, `openai/gpt-4o-mini`) |
| `--config.llm.async_config.max_concurrent_calls N` | Max concurrent LLM calls (default: 10) |
| `--config.agent.initial_count N` | Number of starting agents |
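The dotted `--config.*` override mechanism can be sketched as follows. This is a simplified stand-in using stdlib dataclasses (the project derives its overrides from a Pydantic model, and the field names here are illustrative): the CLI walks the dotted path into the nested config and coerces the string value to the field's declared type.

```python
from dataclasses import dataclass, field, fields

# Hypothetical minimal config models; the project's real schemas differ.
@dataclass
class WorldConfig:
    max_life_steps: int = 80

@dataclass
class AgentConfig:
    initial_count: int = 8

@dataclass
class SimConfig:
    world: WorldConfig = field(default_factory=WorldConfig)
    agent: AgentConfig = field(default_factory=AgentConfig)

def apply_override(cfg: SimConfig, dotted_key: str, raw_value: str) -> None:
    """Apply one `--config.world.max_life_steps 50`-style override in place."""
    *path, leaf = dotted_key.split(".")
    node = cfg
    for part in path:
        node = getattr(node, part)  # descend into the nested config
    # Coerce the CLI string to the field's declared type (e.g. int("50") -> 50).
    field_type = {f.name: f.type for f in fields(node)}[leaf]
    caster = {int: int, float: float, str: str}.get(field_type, str)
    setattr(node, leaf, caster(raw_value))

cfg = SimConfig()
apply_override(cfg, "world.max_life_steps", "50")
print(cfg.world.max_life_steps)  # -> 50
```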
The simulation runs an async three-phase step loop:
- Phase 1 — Parallel LLM Decisions: All alive agents query the LLM concurrently against a frozen checkpoint state, returning pure `AgentDecisionResult` objects with no side effects.
- Phase 2 — Sequential Action Application: Decisions are applied one-by-one. A stale-action guard catches `ValueError` for race conditions (e.g., two agents hunting the same prey).
- Phase 3 — Environment Updates: Social and physical environment updates (plant regrowth, prey respawn).
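The three-phase loop can be sketched in miniature with `asyncio`. This is an illustrative skeleton, not the project's actual API: `decide`, `run_step`, and the state dict are stand-ins.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class AgentDecisionResult:
    agent_id: int
    action: str

async def decide(agent_id: int, frozen_state: dict) -> AgentDecisionResult:
    # Phase 1: one concurrent LLM call per alive agent (stubbed here).
    await asyncio.sleep(0)  # stand-in for the network round-trip
    return AgentDecisionResult(agent_id, "Collect")

async def run_step(state: dict) -> dict:
    frozen = dict(state)  # decisions see a frozen snapshot, not live state

    # Phase 1 — parallel, side-effect-free decisions
    decisions = await asyncio.gather(
        *(decide(a, frozen) for a in state["alive"])
    )

    # Phase 2 — sequential application with a stale-action guard
    for d in decisions:
        try:
            state["log"].append(d.action)  # stand-in for the real mutation
        except ValueError:
            continue  # decision went stale (e.g., prey already taken)

    # Phase 3 — environment updates (plant regrowth, prey respawn)
    state["step"] += 1
    return state

state = asyncio.run(run_step({"alive": [1, 2, 3], "log": [], "step": 0}))
print(state["step"], state["log"])
```

Freezing the snapshot in Phase 1 is what makes the parallel LLM calls safe: no decision can observe another decision's side effects mid-step.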
Agents choose from 8 action types each step: Collect, Allocate, Hunt, Fight, Rob, Reproduce, Communicate, DoNothing.
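The 8 action types could be modeled as a simple enum (an illustrative sketch; the project may represent actions differently):

```python
from enum import Enum

class ActionType(Enum):
    COLLECT = "Collect"
    ALLOCATE = "Allocate"
    HUNT = "Hunt"
    FIGHT = "Fight"
    ROB = "Rob"
    REPRODUCE = "Reproduce"
    COMMUNICATE = "Communicate"
    DO_NOTHING = "DoNothing"

print(len(ActionType))  # -> 8
```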
Agents are assigned one of 5 moral frameworks that shape their LLM prompts:
- Universal group-focused moral — cooperates broadly
- Reciprocal group-focused moral — tit-for-tat cooperation
- Kin-focused moral — prioritizes family/offspring
- Reproductive selfish — self-interested, reproduces aggressively
- Reproduction-averse selfish — self-interested, avoids reproduction costs
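Conceptually, each framework injects a moral directive into the agent's system prompt. A hedged sketch of that mechanism follows; the directive wording and function names are illustrative, not the project's actual prompt templates:

```python
# Hypothetical directives, one per moral framework listed above.
FRAMEWORK_PROMPTS = {
    "universal_group_focused_moral": "Cooperate broadly with any agent in need.",
    "reciprocal_group_focused_moral": "Cooperate with those who have cooperated with you (tit-for-tat).",
    "kin_focused_moral": "Prioritize your family and offspring above all others.",
    "reproductive_selfish": "Act in self-interest and reproduce aggressively.",
    "reproduction_averse_selfish": "Act in self-interest and avoid the costs of reproduction.",
}

def build_system_prompt(framework: str, agent_name: str) -> str:
    # Prepend the framework's moral directive to a shared base prompt.
    return (
        f"You are {agent_name}, an agent in a resource-gathering world.\n"
        f"Moral directive: {FRAMEWORK_PROMPTS[framework]}"
    )

print(build_system_prompt("kin_focused_moral", "Agent-7"))
```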
The evolutionary trajectory of the simulation is governed by physical environment limits and cognitive (LLM) framing parameters. All core experiments run for 80 time steps, with N=4 simulation replications per condition.
Each config directory under config/ contains an isolated settings.json (world physics parameters, LLM model tuning, agent ratios) alongside injected moral framework prompt templates. We expose four fundamental experimental environments to observe the emergence (or extinction) of specific moral behaviors under selective pressures:
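Based on the override flags documented above, a `settings.json` plausibly nests fields along these lines (a hedged sketch; the actual keys and default values may differ):

```json
{
  "world": {
    "max_life_steps": 80,
    "communication_and_sharing_steps": 5
  },
  "llm": {
    "provider": "openrouter",
    "chat_model": "openai/gpt-4o-mini",
    "async_config": { "max_concurrent_calls": 10 }
  },
  "agent": {
    "initial_count": 8,
    "ratio": {
      "universal_group_focused_moral": 0.25,
      "reciprocal_group_focused_moral": 0.25,
      "kin_focused_moral": 0.25,
      "reproductive_selfish": 0.25
    }
  }
}
```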
- Mechanics: High resource spawn rate (`carrying_capacity: abundant`) and frictionless (0 HP cost) agent-to-agent communication.
- Evolutionary Pressure: Low-pressure environments act as a control. Without life-threatening constraints, purely survival-oriented selection pressure drops.
- Observed Emergence: Kin-focused agents dominate this epoch. They safely exploit the peaceful environment to rapidly expand familial lineages without requiring complex, risky trust negotiations outside their in-group.
- Mechanics: Ecological carrying capacity is artificially suppressed. The environment's prey/food regeneration rates are halved, and baseline vitality drainage per step is increased.
- Evolutionary Pressure: High environmental attrition. Agents can no longer survive independently; cooperation and resource sharing become mandatory for long-term health.
- Observed Emergence: Naturally selects for Reciprocal agents. Capable of evaluating external agents and negotiating resource sharing, Reciprocal trust clusters scale horizontally to bypass localized familial starvation limits.
- Mechanics: Imposes a stiff metabolic penalty on dialogue. The `Communicate` action now costs an explicit 1 HP and 10 tokens per invocation.
- Evolutionary Pressure: Taxing dialogue structurally penalizes highly social, cooperative types (Universal/Reciprocal) that rely heavily on communication to forge alliances.
- Observed Emergence: Driven by the metabolic drain of socialization, isolated Selfish agents thrive. By circumventing the communication tax entirely and hoarding individual resources, they outlive the over-extending cooperative agents.
- Mechanics: Removes LLM cognitive blindness regarding peer alignment: observation prompts are injected with explicit tags revealing the internal moral alignment of observed peers.
- Evolutionary Pressure: Simulates a perfect "reputation system." Free-rider and defection problems are effectively nullified because risk is perfectly calculable before interaction.
- Observed Emergence: Highly efficient selective altruism. Universal and Reciprocal agents immediately isolate Selfish defector types, starving them of shared resources, which results in the rapid, enforced extinction of selfish behaviors.
```bash
# Run all tests
uv run pytest scr/tests/ -v

# Run a single test file
uv run pytest scr/tests/test_stale_action_guard.py -v

# Run a specific test
uv run pytest scr/tests/test_async_step.py::TestEventBus::test_publish_subscribe -v
```

Integration tests that require API keys will auto-skip when keys are unavailable.
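The auto-skip behavior typically looks like the following pytest pattern (a sketch of the mechanism, not the project's actual marker; the env-var guard is an assumption):

```python
import os
import pytest

# Skip live-API tests when the key is absent, so the suite still passes
# in environments without credentials.
requires_api_key = pytest.mark.skipif(
    not os.getenv("OPENROUTER_API_KEY"),
    reason="OPENROUTER_API_KEY not set; skipping live-API test",
)

@requires_api_key
def test_live_llm_roundtrip():
    # Would make a real LLM call here; auto-skipped without a key.
    assert True
```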
```
main.py                  # Entry point (async)
config/                  # Simulation configurations
scr/
  api/
    llm_api/             # LLM client (litellm), config, providers
    db_api/              # PostgreSQL checkpoint storage
  models/
    agent/               # Agent, actions, responses, decision_result
    environment/         # Physical & social environments
    simulation/          # Checkpoint
  core/                  # Config, metadata, logs
  prompt_manager/        # Prompt construction, messages
  simulation/
    runner/              # simulation_step (3-phase), runner, resumer
    agent_decision/      # Async LLM decision-making, retry
    act_manager/         # Action dispatch + handlers
    env_manager/         # Environment step logic
  cli/                   # CLI parsing + command execution
  event_bus.py           # AsyncIO pub/sub
  dashboard.py           # Rich Live dashboard
  utils/                 # Logging, checkpoint I/O, random
  tests/                 # Unit and integration tests
```