# **[Generative Agents: Interactive Simulacra of Human Behavior](https://arxiv.org/pdf/2304.03442)**

The paper **introduces generative agents**, virtual characters that **simulate believable human behavior**. They are not simple chatbots: they live in a simulated world, wake up, talk, work, remember experiences, and plan for the future.

**The architecture is built on an LLM** extended with three components:

1. **Memory** (recording experiences),
2. **Reflection** (abstracting memories),
3. **Planning** (using memories to act).

In a Sims-style experiment with 25 agents, starting from a single seed ("I want to organize a Valentine's Day party"), the agents autonomously spread invitations, made new acquaintances, asked each other out on dates to the party, and showed up together at the right time.

### **Result**
The agents exhibit realistic and emergent behaviors, useful for games, immersive environments, and social research.

### Objective

The authors situate their work within the literature on human-AI interaction and the long-standing, difficult effort to build agents that credibly simulate human behavior.

### Key Points

* **Human-AI Interaction**: For decades, efforts have been made to enable users to influence and train models.
* **Historical examples**: Crayons (interactive ML), SHRDLU, ELIZA (first natural language interactions).
* **Evolution**: systems based on examples/demonstrations, up to prompt authoring and deep learning.
* **Agents and Natural Language**: Today, the technology is mature enough for autonomous agents that communicate in natural language in complex online social environments.
* **Opportunities**: Natural language becomes a means to enhance human skills in various domains (photo editing, code editing).

This line of research reopens classic **HCI questions** (cognitive models, prototyping tools, ubiquitous computing). It also suggests that LLMs, integrated with the right architecture, make it possible to build agents that act as credible proxies of human behavior.


### Believability as a Goal

* “**Believability**” has been a central goal in the development of artificial agents for decades.
* The idea is that agents appear to have a life of their own, like Disney characters or NPCs in games, creating the illusion of autonomous decisions and emergent social interactions.

### Historical Approaches

1. **Rule-based** (finite-state machines, behavior trees):

* Dominant in video games (*Mass Effect*, *The Sims*).
* They work, but require manual scripting and don't cover all possible interactions.
* They don't generate new behaviors beyond the encoded ones.

2. **Learning-based (Reinforcement Learning)**:

* Huge successes in competitive games (*AlphaStar*, *OpenAI Five*).
* However, they only work with clear rewards → not suitable for open worlds with complex social dynamics.

3. **Cognitive architectures (SOAR, ACT-R, ICARUS)**:

* They integrate short-term/long-term memory and perception-planning-action cycles.
* Examples: Quakebot-SOAR, TacAir-SOAR.
* Robust for their time, but still based on handwritten procedural knowledge → no real ability to invent new behaviors.

### Current status

* The problem of creating **truly believable agents** remains open.
* Many researchers are satisfied with "good enough" solutions for traditional gameplay.
* The authors argue, however, that **Large Language Models** now offer the possibility of reopening the challenge, if integrated with an architecture capable of managing **memories and reflections**.


### LLMs as a basis for behavior

* Large Language Models (LLMs) capture a wide range of human behaviors from their training data.
* With a defined context, they can generate believable responses and simulate realistic behaviors.

### Previous Applications

* Social simulacra: fictitious agents used to test social dynamics in new systems.
* Replication of social and political studies: questionnaires, surveys, synthetic data generation.
* Video games: interactive fiction, text adventures.
* Robotics: decomposition of complex tasks (e.g., “pick up the bottle” → move to the table, pick up the bottle).

### Current Limitations

* Most prior work relies on first-order prompt templates (few-shot, chain-of-thought).
* These work well for behavior grounded in the **immediate context**, but they do not handle:
  * **past experiences**
  * **long-term consistency**
* Problem: the models' **limited context window** makes it impossible to include all past experiences in the prompt.

### The paper's new approach

* The authors go further, proposing an architecture that:
  * **dynamically integrates past experiences** with the current context,
  * updates them at every step,
  * uses them to generate behaviors that new situations can reinforce or contradict.

### Smallville: The Sandbox World

The authors have created a small world called **Smallville**, similar to *The Sims*, populated by **25 unique agents** represented by sprites.
Each agent has an **initial profile in natural language** (occupation, family, relationships) that becomes the basis of their memory.

### Agent Behavior

* Agents **act and speak in natural language**.
* At each step, they produce a sentence describing what they are doing ("Isabella writes in her diary," "checks email"), which is then translated into emojis above the avatar.
* They can decide whether to walk past other nearby agents or start a conversation with them.
  * Example: political discussions between Isabella Rodriguez and Tom Moreno about the upcoming election.

### User Interaction

* The user communicates with the agents in natural language, assuming a **role or persona** (e.g., a reporter asking questions).
* It's also possible to act as an agent's **"inner voice"** to directly influence their decisions.
  * Example: after being told "You're running against Sam," John decides to run for office and tells his family about his candidacy.

In **Smallville**, agents live in a village with houses, shops, a school, and interactive objects (beds, stoves, refrigerators, etc.). They move through the space, enter buildings, and react to changes: if a user sets the stove to be on fire, the agent puts it out. The user can also embody an agent and be treated like a real resident.

Agents plan their days and interact naturally. **Example**: John Lin wakes up, eats breakfast, and talks with his son Eddy about his studies; when he later talks with his wife Mei, he passes on what Eddy told him. Finally, he opens the pharmacy, while Mei goes to teach. This shows that generative agents **remember, share information, and follow believable daily routines**, giving rise to realistic social dynamics.

### 1. Dissemination of Information

Agents share news through dialogue.

* Example: Sam announces his candidacy for mayor to Tom. The word gradually spreads, until it becomes a topic of discussion throughout the city.

### 2. Relationships and Memory

Agents form new relationships and recall past interactions.

* Example: Sam meets Latoya for the first time in the park. In a subsequent meeting, he recalls her photography project and asks her for updates.

### 3. Coordination

Agents independently organize joint events.

* Example: Isabella starts out planning a Valentine's Day party. From there, she invites friends, sets up the venue, and involves Maria (who in turn invites Klaus, her crush). Eventually, several agents show up and celebrate together.

---

These cases show that, without manual scripting of these behaviors, generative agents can:

* disseminate ideas,
* maintain coherent relationships,
* coordinate collective activities.

---


### Generative Agent Architecture

1. **Perception**

* Agents observe the environment (actions, events, interactions).
* Each perception is stored in a **memory stream**, a continuous record of experiences in natural language.

2. **Memory Retrieval**

* When they need to act, they retrieve the most **relevant, recent, and important** memories from the memory stream.

3. **Action**

* They use those memories to decide what to do in the moment (e.g., talk to someone, react to an event).

4. **Reflection**

* Over time, they synthesize memories into higher-level **reflections** (e.g., “Sam is interested in politics”).

5. **Planning**

* From reflections and experiences, they formulate **long-term plans** (e.g., preparing a party, pursuing a relationship).

6. **Continuous Loop**

* Actions, reflections, and plans feed back into the memory stream, influencing future behaviors.

---

## Goal and Problem

The architecture must produce time-consistent behavior in an open world, integrating:

* Perception of the environment (current state),
* Past experience (memory),
* Reasoning (reflections and plans).

An LLM alone can generate plausible "instantaneous" actions, but fails to recall relevant past experiences, make important inferences, and maintain long-term consistency (context window limit, fragile planning).

## Cornerstone: the *Memory Stream*

It is a chronological, natural-language database that stores everything: observations, reflections, and plans. Each record holds its text, a creation timestamp, and a last-access timestamp. The LLM is not conditioned on the entire memory (which is too large) but on a subset selected via retrieval.
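To make the record layout concrete, here is a minimal sketch of a memory stream as an append-only list. The class names (`MemoryRecord`, `MemoryStream`), the `kind` labels, and the `evidence` field are hypothetical choices for illustration, not the authors' code. Time is measured in sandbox game hours, and the importance value stands in for the LLM rating described in the retrieval section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemoryRecord:
    """One entry of the memory stream (field names are illustrative)."""
    text: str                 # natural-language description of the experience
    kind: str                 # "observation", "reflection", or "plan"
    created: float            # creation time, in sandbox game hours
    last_accessed: float      # last retrieval time, in sandbox game hours
    importance: int = 1       # 1-10 poignancy rating (assigned by the LLM in the paper)
    evidence: List[int] = field(default_factory=list)  # ids of supporting records


class MemoryStream:
    """Append-only, chronological log of everything the agent experiences."""

    def __init__(self) -> None:
        self.records: List[MemoryRecord] = []

    def add(self, text: str, kind: str, now: float, importance: int = 1,
            evidence: Optional[List[int]] = None) -> int:
        """Append a record and return its id (its position in the stream)."""
        record = MemoryRecord(text, kind, created=now, last_accessed=now,
                              importance=importance, evidence=list(evidence or []))
        self.records.append(record)
        return len(self.records) - 1

    def recent(self, n: int) -> List[MemoryRecord]:
        """Return the n most recent records, oldest first."""
        return self.records[-n:] if n > 0 else []
```

Because records are only ever appended, an id is simply a position in the list. This lets reflections point back at the records they were derived from.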

### Retrieval: How to Select the Right Memories

Each memory receives a retrieval score that sums three signals, each min-max normalized to [0, 1] and weighted equally (α = 1 for all three):

* **Recency**: recently accessed memories score higher. Recency decays exponentially with the number of sandbox game hours since the memory was last retrieved (decay factor **0.995**).
* **Importance**: separates mundane events from salient ones. When a memory is created, the LLM rates its poignancy on a **1–10** scale (e.g., "asking a crush out" scores about 8).
* **Relevance**: Relevance to the current **query context** calculated as **cosine similarity** between the memory embedding and the query embedding.
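Written out as a formula, the score looks as follows. The symbols are our own notation: $m$ is a memory, $q$ is the query, $\mathbf{e}$ is an embedding, $\Delta h(m)$ is the number of game hours since $m$ was last accessed, and a hat marks min-max scaling over the candidate memories.

$$
\mathrm{score}(m) = \alpha_{\mathrm{rec}}\,\widehat{\mathrm{rec}}(m) + \alpha_{\mathrm{imp}}\,\widehat{\mathrm{imp}}(m) + \alpha_{\mathrm{rel}}\,\widehat{\mathrm{rel}}(m), \qquad \alpha_{\mathrm{rec}} = \alpha_{\mathrm{imp}} = \alpha_{\mathrm{rel}} = 1
$$

$$
\mathrm{rec}(m) = 0.995^{\,\Delta h(m)}, \qquad \mathrm{rel}(m) = \frac{\mathbf{e}_m \cdot \mathbf{e}_q}{\lVert \mathbf{e}_m \rVert \, \lVert \mathbf{e}_q \rVert}, \qquad \hat{x} = \frac{x - \min x}{\max x - \min x}
$$

Here $\mathrm{imp}(m)$ is the 1–10 rating. For scale, a memory left untouched for a full game day keeps $0.995^{24} \approx 0.887$ of its raw recency.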

The **top-ranked** memories that fit within the model's context window are included in the prompt. The resulting action is informed by the most useful experiences without drowning in noise.
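The retrieval step can be sketched as follows, assuming embeddings are already computed by some model (not shown). The `Memory` class, the top-`k` cutoff, and the word-count budget are hypothetical simplifications. A real system would count tokens with the model's tokenizer. Retrieved memories get their access time refreshed, which is what makes recency depend on the last retrieval.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

DECAY = 0.995  # recency decay per sandbox game hour (value from the paper)


@dataclass
class Memory:
    text: str
    last_accessed: float      # sandbox game hours
    importance: int           # 1-10 rating
    embedding: List[float]    # produced by some embedding model (not shown)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(values: List[float]) -> List[float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # the signal cannot rank anything here
    return [(v - lo) / (hi - lo) for v in values]


def retrieve(memories: List[Memory], query_embedding: List[float], now: float,
             k: int = 3, word_budget: int = 200) -> List[Memory]:
    """Rank memories by recency + importance + relevance and pack the best into a budget."""
    if not memories:
        return []
    recency = min_max([DECAY ** (now - m.last_accessed) for m in memories])
    importance = min_max([float(m.importance) for m in memories])
    relevance = min_max([cosine(m.embedding, query_embedding) for m in memories])
    scores = [r + i + v for r, i, v in zip(recency, importance, relevance)]  # all alphas = 1
    ranked = sorted(range(len(memories)), key=lambda j: scores[j], reverse=True)

    selected: List[Memory] = []
    used = 0
    for j in ranked:
        if len(selected) == k:
            break
        cost = len(memories[j].text.split())  # crude word count standing in for tokens
        if used + cost > word_budget:
            continue  # this memory would overflow the context budget
        selected.append(memories[j])
        used += cost
    for m in selected:
        m.last_accessed = now  # being retrieved refreshes a memory's recency
    return selected
```

With toy 2-D embeddings and the query pointing at "the party," a highly rated, relevant memory outranks a more recent but trivial one such as eating breakfast.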

## Reflection: From Data to Inference

Relying on "raw" memory alone produces shortsighted choices (e.g., picking the person the agent interacts with most often rather than the one it shares the deepest interests with). The authors therefore introduce **reflections**: abstract, enduring thoughts generated **periodically**, whenever the sum of the importance scores of recent events exceeds a **threshold (150)**.
Procedure:

1. From the 100 most recent memory records, the LLM generates **high-level questions** ("What is Klaus passionate about?").
2. For each question, it performs targeted retrieval and **extracts insights**, citing the memories that serve as evidence.
3. The reflections **accumulate in a tree** (leaves = observations; internal nodes = increasingly abstract concepts), reentering the memory stream and becoming recallable.

Effect: the agent **generalizes** and maintains **identity themes** (interests, relationships), improving choices and dialogues.
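The trigger-and-synthesize procedure can be sketched as follows. The four LLM-backed steps (rating, asking questions, retrieving, inferring insights) are passed in as plain callables and stubbed in the usage. Their names and signatures are hypothetical. Each reflection stores the ids of the records it cites, so reflections built on earlier reflections form the tree described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

REFLECTION_THRESHOLD = 150  # sum of recent importance scores that triggers reflection (paper's value)
WINDOW = 100                # number of most recent records used to generate questions


@dataclass
class Node:
    text: str
    importance: int
    kind: str = "observation"                          # or "reflection"
    evidence: List[int] = field(default_factory=list)  # child nodes in the reflection tree


# Hypothetical LLM-backed callables (stubbed in the usage example).
AskQuestions = Callable[[List[str]], List[str]]
Retrieve = Callable[[List[Node], str], List[int]]
InferInsights = Callable[[str, List[str]], List[Tuple[str, List[int]]]]
Rate = Callable[[str], int]


class ReflectiveMemory:
    def __init__(self, ask: AskQuestions, retrieve: Retrieve,
                 infer: InferInsights, rate: Rate) -> None:
        self.nodes: List[Node] = []
        self.ask, self.retrieve, self.infer, self.rate = ask, retrieve, infer, rate
        self.pending_importance = 0  # importance accumulated since the last reflection

    def observe(self, text: str) -> None:
        importance = self.rate(text)
        self.nodes.append(Node(text, importance))
        self.pending_importance += importance
        if self.pending_importance > REFLECTION_THRESHOLD:
            self.reflect()

    def reflect(self) -> None:
        self.pending_importance = 0
        recent = [n.text for n in self.nodes[-WINDOW:]]
        for question in self.ask(recent):
            ids = self.retrieve(self.nodes, question)           # targeted retrieval
            statements = [self.nodes[i].text for i in ids]
            for insight, cited in self.infer(question, statements):
                evidence = [ids[j] for j in cited]              # map citations back to node ids
                self.nodes.append(Node(insight, self.rate(insight), "reflection", evidence))
```

With every event stubbed at importance 50, the fourth observation pushes the running sum past 150 and produces one reflection node whose children are the cited observations.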

## Planning & Reacting: Consistency over Time

To avoid "temporal myopia" (actions that are plausible in the moment but incoherent over time, such as eating lunch several times in a row), agents:

* **Plan top-down**: first a broad **sketch** of the day (5–8 chunks), then a breakdown into **hour-long chunks**, then into **5–15 minute chunks** (e.g., gathering materials, short breaks, cleanup).
* **Adapt the plan** (*reacting*) at each *tick*: they perceive new observations and decide whether to **continue** or **react** (e.g., seeing his son walking in the garden, an agent starts a relevant conversation). If they react, they **regenerate** the plan from the moment of the deviation.
* **Dialogue**: each utterance is conditioned on a **summary of the agent's memories** about the interlocutor plus the **dialogue history**, which keeps conversations **relationally coherent** (roles, feelings, ongoing tasks).
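Top-down decomposition and re-planning can be sketched as two small functions. `describe` stands in for the LLM call that names each sub-chunk and is a hypothetical signature. The fixed 60- and 15-minute granularities are illustrative, whereas the paper lets the model choose 5–15 minute chunks. Re-planning keeps what already happened, truncated at the current time, and regenerates everything after it.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PlanItem:
    start: int        # minutes after midnight
    duration: int     # minutes
    description: str


Describe = Callable[[PlanItem, int, int], str]  # hypothetical LLM call: (parent, index, count) -> text


def refine(items: List[PlanItem], chunk_minutes: int, describe: Describe) -> List[PlanItem]:
    """Split every item into consecutive chunks of at most chunk_minutes."""
    refined: List[PlanItem] = []
    for item in items:
        count = max(1, math.ceil(item.duration / chunk_minutes))
        end_of_item = item.start + item.duration
        for i in range(count):
            start = item.start + i * chunk_minutes
            end = min(end_of_item, start + chunk_minutes)
            refined.append(PlanItem(start, end - start, describe(item, i, count)))
    return refined


def replan(plan: List[PlanItem], now: int,
           regenerate: Callable[[int], List[PlanItem]]) -> List[PlanItem]:
    """Keep what already happened (truncated at `now`) and regenerate the rest."""
    kept: List[PlanItem] = []
    for item in plan:  # assumes the plan is sorted by start time
        if item.start >= now:
            break
        end = min(item.start + item.duration, now)
        kept.append(PlanItem(item.start, end - item.start, item.description))
    return kept + regenerate(now)
```

For example, a two-block morning sketch (8:00–9:00, 9:00–11:00) refines into three hour-long chunks and then twelve 15-minute chunks. A reaction at 8:20 keeps 8:00–8:15 and 8:15–8:20, then appends whatever the regenerated plan contains.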

**Operational summary**: the cycle is perceive → retrieve (recency/importance/relevance) → act/speak → reflect → plan/re-plan, with everything recorded in the memory stream. This loop produces **narrative stability** and credible **social emergence**.
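The cycle can be sketched as a single "tick" function that wires the stages together. Every stage is a hypothetical callable: in a real system, `act`, `reflect`, and `replan` would be LLM calls and `execute` would go to the game engine. Here the memory stream is simplified to a list of strings, so that the order of operations and what gets recorded are easy to see.

```python
from typing import Callable, List

Stream = List[str]  # simplified memory stream: natural-language records only


def agent_tick(stream: Stream,
               perceive: Callable[[], List[str]],
               retrieve: Callable[[Stream, str], List[str]],
               act: Callable[[List[str], List[str]], str],
               execute: Callable[[str], None],
               reflect: Callable[[Stream], List[str]],
               replan: Callable[[Stream, str], List[str]]) -> str:
    """One pass of perceive -> retrieve -> act -> reflect -> re-plan, recording everything."""
    observations = perceive()                 # what the agent sees this tick
    stream.extend(observations)
    context = retrieve(stream, " ".join(observations))  # recency/importance/relevance in the paper
    action = act(context, observations)       # LLM proposes an action in natural language
    execute(action)                           # game engine applies its effects
    stream.append(action)
    stream.extend(reflect(stream))            # may be empty if the threshold is not reached
    stream.extend(replan(stream, action))     # updated plan entries
    return action
```

Keeping each stage as an injected function makes it possible to test the loop with stubs, as with a burning stove that triggers a "turn off the stove" action.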

---

### **LOGIC RECAP**

1. **“Pure” LLM**

* It doesn't do anything on its own: it generates text given a prompt.
* To "do things," you need an **orchestrator** that reads the output and calls tools/environment (code, API, game engine).

2. **Memory stream ≠ prompt context**

* **Memory stream** (in the paper): a **persistent** archive of experiences (observations, reflections, plans).
* **Context**: the **subset** of memories retrieved from the stream and inserted into the prompt **at that moment** (via retrieval: *recency*, *importance*, *relevance*).
* So: the stream is "everything that happened"; the context is "what I pass to the LLM now."

3. **ReAct** (Reason + Act)

* It's a **pattern** that alternates **reasoning** (textual trace) and **actions** (tool use) in a loop.
* The paper doesn't explicitly call it “ReAct,” but the cycle is **ReAct-like**:
**Perceive → Retrieve memories → Reflect/Plan → Propose action (NL) → Engine executes → New observations → Updated memory**.

4. **What the paper adds beyond ReAct**

* **Reflections** (tree abstractions) to stabilize **identities/interests**.
* **Hierarchical planning** (day → hours → 5–15 min) + reactive **re-planning**.
* **Grounding** NL ↔ world (tree of areas/objects) and **server** that applies the effects (pathfinding, states).
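The grounding idea can be sketched with the world as a tree of areas and objects, where each edge means "contains." The tree is flattened into sentences for the prompt and resolved back to a path the server can act on. The snippet of Smallville, the sentence format, and the function names are hypothetical illustrations, not the paper's exact representation.

```python
from typing import Dict, List, Optional, Tuple

World = Dict[str, dict]  # each key is an area/object; its value holds what it contains

# Hypothetical fragment of the sandbox world.
WORLD: World = {
    "Lin family's house": {
        "kitchen": {"stove": {}, "refrigerator": {}},
        "Eddy's bedroom": {"bed": {}, "desk": {}},
    },
    "Hobbs Cafe": {"counter": {}, "cafe customer seating": {}},
}


def describe_world(tree: World, parent: Optional[str] = None) -> List[str]:
    """Flatten containment edges into natural-language statements (depth-first)."""
    lines: List[str] = []
    if parent is not None and tree:
        lines.append(f"{parent} contains {', '.join(tree)}")
    for name, children in tree.items():
        lines.extend(describe_world(children, name))
    return lines


def find_path(tree: World, target: str, prefix: Tuple[str, ...] = ()) -> Optional[List[str]]:
    """Resolve an object name from the LLM's output to its path in the tree."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            return list(path)
        found = find_path(children, target, path)
        if found is not None:
            return found
    return None
```

The language side sees statements like "kitchen contains stove, refrigerator". The engine side gets a path such as `["Lin family's house", "kitchen", "stove"]` to use for pathfinding and state changes.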

In short:

**Agent = LLM + Memory Stream (persistent) + Retrieval → [Context] + Reflection + Planning + Tools/Env + Orchestrator (loop).**

An LLM **with** a memory stream that feeds a retrieved context, plus a **ReAct**-style loop, can “reason” (in the sense of planning/justifying) **and** “act” (via tools/environment). Without these pieces, all that remains is text.