# 📘 Agentic Architectures 10: Simulator / Mental-Model-in-the-Loop

Welcome to the tenth notebook in our series. Today, we explore a sophisticated architecture designed for safety and robust decision-making in high-stakes environments: the **Simulator**, also known as a **Mental-Model-in-the-Loop**.

The core idea is to give an agent the ability to "think before it acts" in a very concrete way. Instead of committing to an action in the real world immediately, the agent first tests its proposed action in an internal, simulated version of the environment. By observing the likely consequences in this safe sandbox, it can evaluate risks, refine its strategy, and only then execute a more considered action in reality.

We will build a simple **stock trading agent** to demonstrate this. The "real world" will be a market simulator that advances one step at a time. Before making a trade, our agent will:
1. Propose a general strategy (e.g., "buy aggressively").
2. Run that strategy through a *forked* version of the market simulator for multiple future steps to see potential outcomes.
3. Analyze the simulation results to assess risk and reward.
4. Make a final, refined decision (e.g., "The simulation shows high volatility; let's buy a smaller amount.").
5. Execute that refined trade in the real market.

This pattern is crucial for moving agents from informational tasks to performing actions in the real world, where mistakes can have real consequences.

### Definition
A **Simulator** or **Mental-Model-in-the-Loop** architecture involves an agent that uses an internal model of its environment to simulate the outcomes of potential actions before executing any of them. This allows the agent to perform what-if analysis, anticipate consequences, and refine its plan for safety and effectiveness.

### High-level Workflow

1.  **Observe:** The agent observes the current state of the real environment.
2.  **Propose Action:** Based on its goals and the current state, the agent's planning module generates a high-level proposed action or strategy.
3.  **Simulate:** The agent forks the current state of the environment into a sandboxed simulation. It applies the proposed action and runs the simulation forward to observe a range of possible outcomes.
4.  **Assess & Refine:** The agent analyzes the results from the simulation. Did the action lead to the desired outcome? Were there unforeseen negative consequences? Based on this assessment, it refines its initial proposal into a final, concrete action.
5.  **Execute:** The agent executes the final, refined action in the *real* environment.
6.  **Repeat:** The loop begins again from the new state of the real environment.
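The workflow above can be sketched in a few lines of plain Python. This is a minimal, framework-free illustration under stated assumptions, not the notebook's implementation. `ToyEnv`, `propose`, `simulate` and `refine` are hypothetical stand-ins. The planner and the refinement rule are hard-coded rather than LLM calls, and each fork is made with `copy.deepcopy`.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical environment: one asset, a share count and a cash balance."""
    price: float = 100.0
    shares: int = 0
    cash: float = 10_000.0

    def step(self, buy: int, rng: random.Random) -> None:
        # Execute the order only if it is affordable, then move the price randomly.
        if buy * self.price <= self.cash:
            self.shares += buy
            self.cash -= buy * self.price
        self.price *= 1 + rng.gauss(0.0, 0.05)

    def value(self) -> float:
        return self.cash + self.shares * self.price


def propose(env: ToyEnv) -> int:
    # Hypothetical planner: always proposes buying 5 shares (an LLM in the notebook).
    return 5


def simulate(env: ToyEnv, buy: int, rng: random.Random,
             runs: int = 20, horizon: int = 10) -> list[float]:
    """Fork the environment `runs` times; return each rollout's change in value."""
    outcomes = []
    for _ in range(runs):
        fork = copy.deepcopy(env)  # independent sandbox; the real env is never touched
        start = fork.value()
        fork.step(buy, rng)        # apply the proposed action...
        for _ in range(horizon - 1):
            fork.step(0, rng)      # ...then just hold for the rest of the horizon
        outcomes.append(fork.value() - start)
    return outcomes


def refine(buy: int, outcomes: list[float]) -> int:
    # Hypothetical rule: halve the order if more than a third of the rollouts lost money.
    losses = sum(1 for o in outcomes if o < 0)
    return buy // 2 if losses > len(outcomes) / 3 else buy


rng = random.Random(42)
real_env = ToyEnv()
for day in range(3):  # Observe -> Propose -> Simulate -> Refine -> Execute, then repeat
    proposal = propose(real_env)
    outcomes = simulate(real_env, proposal, rng)
    order = refine(proposal, outcomes)
    real_env.step(order, rng)  # only the refined order reaches the real environment
    print(f"day {day}: proposed {proposal}, ordered {order}, value {real_env.value():.2f}")
```

Only the output of `refine` ever reaches `real_env.step`. Every rollout runs on its own deep copy, so simulating never changes the real environment.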

### When to Use / Applications
*   **Robotics:** Simulating a grasp or a path before moving a physical arm to avoid collisions or damage.
*   **High-Stakes Decision-Making:** In finance, simulating a trade's impact on a portfolio under different market conditions. In healthcare, simulating a treatment plan's potential effects.
*   **Complex Game AI:** An AI in a strategy game simulating several moves ahead to choose the optimal one.

### Strengths & Weaknesses
*   **Strengths:**
    *   **Safety & Risk Reduction:** Reduces the chance of harmful or costly mistakes by vetting actions in a sandbox before they touch the real environment, to the extent that the simulator captures the risks that matter.
    *   **Improved Performance:** Leads to more robust and well-considered decisions by allowing for lookahead and planning.
*   **Weaknesses:**
    *   **Simulation-Reality Gap:** The effectiveness is entirely dependent on the fidelity of the simulator. If the model of the world is inaccurate, the agent's plans may be based on false assumptions.
    *   **Computational Cost:** Running simulations, especially multiple scenarios, is computationally expensive and slower than acting directly.
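A small Monte Carlo sketch makes the simulation-reality gap concrete. It uses only the standard library. The parameters come from the `MarketSimulator` defaults defined later (10% daily volatility, +1% normal drift, -5% drift after bad news), and the scenario itself is made up. The agent's mental model assumes the normal drift, while the real market is in a bad-news regime. The function name `mean_return_pct` is hypothetical.

```python
import random
import statistics


def mean_return_pct(drift: float, vol: float, horizon: int, runs: int,
                    rng: random.Random) -> float:
    """Monte Carlo mean of the `horizon`-day return (in %) under price *= 1 + N(drift, vol)."""
    returns = []
    for _ in range(runs):
        price = 100.0
        for _ in range(horizon):
            price *= 1 + rng.gauss(drift, vol)
        returns.append((price / 100.0 - 1) * 100)
    return statistics.mean(returns)


rng = random.Random(0)
# The agent's mental model: normal drift of +1% per day.
believed = mean_return_pct(drift=0.01, vol=0.1, horizon=10, runs=2000, rng=rng)
# The environment it is actually in: a bad-news regime with -5% drift per day.
actual = mean_return_pct(drift=-0.05, vol=0.1, horizon=10, runs=2000, rng=rng)
print(f"simulator expects {believed:+.1f}%, reality averages {actual:+.1f}%")
```

The simulator's 10-day estimate comes out positive, near `1.01**10 - 1` (about +10%). The real expectation is `0.95**10 - 1` (about -40%). Adding more rollouts does not help, because it only makes the wrong estimate more precise.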

## Phase 0: Foundation & Setup

We'll install the required libraries, load the API keys (`NEBIUS_API_KEY`, `LANGCHAIN_API_KEY`) from a `.env` file, and enable LangSmith tracing.

```python
# !pip install -q -U langchain-nebius langchain langgraph rich python-dotenv numpy
```

```python
import os
import random
import numpy as np
from typing import List, Dict, Any, Optional
from dotenv import load_dotenv

# Pydantic for data modeling
from pydantic import BaseModel, Field

# LangChain components
from langchain_nebius import ChatNebius
from langchain_core.prompts import ChatPromptTemplate

# LangGraph components
from langgraph.graph import StateGraph, END
from typing_extensions import TypedDict

# For pretty printing
from rich.console import Console
from rich.markdown import Markdown
from rich.table import Table

# --- API Key and Tracing Setup ---
load_dotenv()

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Agentic Architecture - Simulator (Nebius)"

required_vars = ["NEBIUS_API_KEY", "LANGCHAIN_API_KEY"]
for var in required_vars:
    if var not in os.environ:
        print(f"Warning: Environment variable {var} not set.")

print("Environment variables loaded and tracing is set up.")
```

```text
Environment variables loaded and tracing is set up.
```


## Phase 1: Building the Simulator Environment

First, we need to create the "world" our agent will interact with. We'll build a `MarketSimulator` class with two parts. The first holds the state of a single stock (price, drift, volatility, news) together with a portfolio. The second is a `step` method that executes an optional trade and advances time by one day. The same class serves as both our "real world" and the sandbox for the agent's simulations.
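Because one class plays both roles, a forked sandbox must not share mutable sub-objects with the real market. The sketch below shows the pitfall using only the standard library. Plain dataclasses stand in for the notebook's Pydantic models, and `Book` and `Market` are hypothetical names. A shallow copy shares the nested portfolio, so a simulated trade leaks into reality. A deep copy isolates it. Later in the notebook, `run_simulation_node` calls Pydantic's `model_copy(deep=True)` for the same reason.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Book:
    """Hypothetical stand-in for Portfolio."""
    cash: float = 10_000.0
    shares: int = 0


@dataclass
class Market:
    """Hypothetical stand-in for MarketSimulator: holds a nested, mutable Book."""
    price: float = 100.0
    book: Book = field(default_factory=Book)

    def buy(self, n: int) -> None:
        self.book.shares += n
        self.book.cash -= n * self.price


real = Market()
shallow = copy.copy(real)   # new Market object, but the very same Book instance
shallow.buy(10)
print(real.book.shares)     # 10: the "simulated" trade leaked into reality

real = Market()
deep = copy.deepcopy(real)  # new Market and a new, independent Book
deep.buy(10)
print(real.book.shares)     # 0: the real portfolio is untouched
```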

```python
console = Console()

class Portfolio(BaseModel):
    cash: float = 10000.0
    shares: int = 0

    def value(self, current_price: float) -> float:
        return self.cash + self.shares * current_price

class MarketSimulator(BaseModel):
    """A simple simulation of a stock market for one asset."""
    day: int = 0
    price: float = 100.0
    volatility: float = 0.1 # Standard deviation of the daily return
    drift: float = 0.01 # Mean daily return (general trend)
    market_news: str = "Market is stable."
    portfolio: Portfolio = Field(default_factory=Portfolio)

    def step(self, action: str, amount: float = 0.0):
        """Advance the simulation by one day, executing a trade first."""
        # 1. Execute trade. `amount` is a number of shares, truncated to a non-negative int;
        #    orders that exceed the available cash or shares are silently skipped.
        if action == "buy":
            shares_to_buy = max(0, int(amount))
            cost = shares_to_buy * self.price
            if self.portfolio.cash >= cost:
                self.portfolio.shares += shares_to_buy
                self.portfolio.cash -= cost
        elif action == "sell":
            shares_to_sell = max(0, int(amount))
            if self.portfolio.shares >= shares_to_sell:
                self.portfolio.shares -= shares_to_sell
                self.portfolio.cash += shares_to_sell * self.price

        # 2. Update market price: a multiplicative random walk with normally distributed
        #    daily returns (a discrete-time approximation of Geometric Brownian Motion)
        daily_return = np.random.normal(self.drift, self.volatility)
        self.price *= (1 + daily_return)

        # 3. Advance time
        self.day += 1

        # 4. Potentially update news
        if random.random() < 0.1: # 10% chance of new news
            self.market_news = random.choice(["Positive earnings report expected.", "New competitor enters the market.", "Macroeconomic outlook is strong.", "Regulatory concerns are growing."])
            # News affects drift
            if "Positive" in self.market_news or "strong" in self.market_news:
                self.drift = 0.05
            else:
                self.drift = -0.05
        else:
            self.drift = 0.01 # Revert to normal drift (the news text itself is left unchanged)

    def get_state_string(self) -> str:
        return f"Day {self.day}: Price=${self.price:.2f}, News: {self.market_news}\nPortfolio: ${self.portfolio.value(self.price):.2f} ({self.portfolio.shares} shares, ${self.portfolio.cash:.2f} cash)"

print("Market simulator environment defined successfully.")
```

```text
Market simulator environment defined successfully.
```
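The price update in `step`, `price *= 1 + N(drift, volatility)`, is a simple-return random walk. It is a discrete-time approximation of Geometric Brownian Motion, not its exact solution. For comparison, here is a sketch of the exact one-step GBM update, `S_next = S * exp((mu - sigma**2 / 2) * dt + sigma * sqrt(dt) * Z)` with `Z ~ N(0, 1)` and `dt = 1` day. The helper names `simple_return_step` and `gbm_step` are ours, not the notebook's.

```python
import math
import random


def simple_return_step(price: float, drift: float, vol: float, rng: random.Random) -> float:
    """The update used in MarketSimulator.step: price *= 1 + N(drift, vol)."""
    return price * (1 + rng.gauss(drift, vol))


def gbm_step(price: float, drift: float, vol: float, rng: random.Random,
             dt: float = 1.0) -> float:
    """Exact GBM step for dS = drift*S*dt + vol*S*dW; the exp() keeps the price positive."""
    z = rng.gauss(0.0, 1.0)
    return price * math.exp((drift - 0.5 * vol**2) * dt + vol * math.sqrt(dt) * z)


rng = random.Random(7)
n = 100_000
simple_mean = sum(simple_return_step(100.0, 0.01, 0.1, rng) for _ in range(n)) / n
gbm_mean = sum(gbm_step(100.0, 0.01, 0.1, rng) for _ in range(n)) / n
print(f"mean next-day price: simple={simple_mean:.2f}, GBM={gbm_mean:.2f}")
```

Both means land close to 101. The expected one-day growth factor is exactly 1.01 for the simple update and e^0.01 ≈ 1.01005 for GBM. Only the GBM form keeps the price positive by construction. With 10% daily volatility, the simple update would need a return below -100% (roughly a 10-sigma draw) to go negative, so the difference rarely matters in this notebook.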


## Phase 2: Building the Simulator Agent

Now we'll use LangGraph to orchestrate the `Observe -> Propose -> Simulate -> Refine -> Execute` workflow. We'll define Pydantic models for the LLM's outputs to ensure structured communication between the steps.
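The structured-output models give the graph a contract, but their string fields still accept whatever the LLM writes, for example `'Buy'` or `'buy more'`. `MarketSimulator.step` only acts on the exact strings `'buy'` and `'sell'`. A defensive normalization step between the LLM and the environment can be sketched as follows. `normalize_decision` is a hypothetical helper, not part of LangChain or the notebook. Declaring `action` as `Literal['buy', 'sell', 'hold']` in the Pydantic model is another option, so that invalid values fail validation.

```python
VALID_ACTIONS = {"buy", "sell", "hold"}


def normalize_decision(action: str, amount: float) -> tuple[str, int]:
    """Map a raw LLM decision onto an order MarketSimulator.step understands.

    Unknown actions fall back to 'hold', share counts are truncated to a
    non-negative int, and 'hold' always carries 0 shares.
    """
    act = action.strip().lower()
    if act not in VALID_ACTIONS or act == "hold":
        return "hold", 0
    return act, max(0, int(amount))


print(normalize_decision("Buy", 20.7))     # ('buy', 20)
print(normalize_decision("SELL ", 5))      # ('sell', 5)
print(normalize_decision("buy more", 10))  # ('hold', 0)
print(normalize_decision("hold", 3))       # ('hold', 0)
```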

```python
llm = ChatNebius(model="mistralai/Mixtral-8x22B-Instruct-v0.1", temperature=0.4)

# Pydantic models for structured LLM outputs
class ProposedAction(BaseModel):
    """The high-level strategy proposed by the analyst."""
    strategy: str = Field(description="A high-level trading strategy, e.g., 'buy aggressively', 'sell cautiously', 'hold'.")
    reasoning: str = Field(description="Brief reasoning for the proposed strategy.")

class FinalDecision(BaseModel):
    """The final, concrete action to be executed."""
    action: str = Field(description="The final action to take: 'buy', 'sell', or 'hold'.")
    amount: float = Field(description="The number of shares to buy or sell. Should be 0 if holding.")
    reasoning: str = Field(description="Final reasoning, referencing the simulation results.")

# LangGraph State
class AgentState(TypedDict):
    real_market: MarketSimulator
    proposed_action: Optional[ProposedAction]
    simulation_results: Optional[List[Dict]]
    final_decision: Optional[FinalDecision]

# Graph Nodes
def propose_action_node(state: AgentState) -> Dict[str, Any]:
    """Observes the market and proposes a high-level strategy."""
    console.print("--- 🧐 Analyst Proposing Action ---")
    prompt = ChatPromptTemplate.from_template(
        "You are a sharp financial analyst. Based on the current market state, propose a trading strategy.\n\nMarket State:\n{market_state}"
    )
    proposer_llm = llm.with_structured_output(ProposedAction)
    chain = prompt | proposer_llm
    proposal = chain.invoke({"market_state": state['real_market'].get_state_string()})
    console.print(f"[yellow]Proposal:[/yellow] {proposal.strategy}. [italic]Reason: {proposal.reasoning}[/italic]")
    return {"proposed_action": proposal}

def run_simulation_node(state: AgentState) -> Dict[str, Any]:
    """Runs the proposed strategy in a sandboxed simulation."""
    console.print("--- 🤖 Running Simulations ---")
    # Lowercase the strategy so that e.g. "Buy aggressively" still matches the keyword rules below
    strategy = state['proposed_action'].strategy.lower()
    num_simulations = 5
    simulation_horizon = 10 # days
    results = []

    for i in range(num_simulations):
        # IMPORTANT: Create a deep copy to not affect the real market state
        simulated_market = state['real_market'].model_copy(deep=True)
        initial_value = simulated_market.portfolio.value(simulated_market.price)

        # Translate strategy to a concrete action for the simulation
        if "buy" in strategy:
            action = "buy"
            # 'aggressively' = 25% of cash, anything else = 10%
            amount = (simulated_market.portfolio.cash * (0.25 if "aggressively" in strategy else 0.1)) / simulated_market.price
        elif "sell" in strategy:
            action = "sell"
            # 'aggressively' = 25% of shares, anything else = 10%
            amount = simulated_market.portfolio.shares * (0.25 if "aggressively" in strategy else 0.1)
        else:
            action = "hold"
            amount = 0

        # Run the simulation forward
        simulated_market.step(action, amount)
        for _ in range(simulation_horizon - 1):
            simulated_market.step("hold") # Just hold after the initial action

        final_value = simulated_market.portfolio.value(simulated_market.price)
        results.append({"sim_num": i+1, "initial_value": initial_value, "final_value": final_value, "return_pct": (final_value - initial_value) / initial_value * 100})

    console.print("[cyan]Simulation complete. Results will be passed to the risk manager.[/cyan]")
    return {"simulation_results": results}

def refine_and_decide_node(state: AgentState) -> Dict[str, Any]:
    """Analyzes simulation results and makes a final, refined decision."""
    console.print("--- 🧠 Risk Manager Refining Decision ---")
    results_summary = "\n".join([f"Sim {r['sim_num']}: Initial=${r['initial_value']:.2f}, Final=${r['final_value']:.2f}, Return={r['return_pct']:.2f}%" for r in state['simulation_results']])

    prompt = ChatPromptTemplate.from_template(
        "You are a cautious risk manager. Your analyst proposed a strategy. You have run simulations to test it. Based on the potential outcomes, make a final, concrete decision. If results are highly variable or negative, reduce risk (e.g., buy/sell fewer shares, or hold).\n\nInitial Proposal: {proposal}\n\nSimulation Results:\n{results}\n\nReal Market State:\n{market_state}"
    )
    decider_llm = llm.with_structured_output(FinalDecision)
    chain = prompt | decider_llm
    final_decision = chain.invoke({
        "proposal": state['proposed_action'].strategy,
        "results": results_summary,
        "market_state": state['real_market'].get_state_string()
    })
    # Print the share count the way step() will execute it (truncated to an int)
    console.print(f"[green]Final Decision:[/green] {final_decision.action} {int(final_decision.amount)} shares. [italic]Reason: {final_decision.reasoning}[/italic]")
    return {"final_decision": final_decision}

def execute_in_real_world_node(state: AgentState) -> Dict[str, Any]:
    """Executes the final decision in the real market environment."""
    console.print("--- 🚀 Executing in Real World ---")
    decision = state['final_decision']
    real_market = state['real_market']
    # Normalize case: MarketSimulator.step only acts on lowercase 'buy' / 'sell'
    real_market.step(decision.action.lower(), decision.amount)
    console.print(f"[bold]Execution complete. New market state:[/bold]\n{real_market.get_state_string()}")
    return {"real_market": real_market}

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("propose", propose_action_node)
workflow.add_node("simulate", run_simulation_node)
workflow.add_node("refine", refine_and_decide_node)
workflow.add_node("execute", execute_in_real_world_node)

workflow.set_entry_point("propose")
workflow.add_edge("propose", "simulate")
workflow.add_edge("simulate", "refine")
workflow.add_edge("refine", "execute")
workflow.add_edge("execute", END)

simulator_agent = workflow.compile()
print("Simulator-in-the-loop agent graph compiled successfully.")
```

```text
Simulator-in-the-loop agent graph compiled successfully.
```
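`run_simulation_node` turns the analyst's free-text strategy into a concrete order with a keyword rule. "aggressively" means 25% of cash for a buy, or 25% of held shares for a sell. Any other buy or sell wording means 10%. Everything else is a hold. Pulled out into a standalone helper, the rule becomes easy to unit-test. `strategy_to_order` is our hypothetical name, not the notebook's. The sketch lowercases the strategy first, so "Sell cautiously" also matches.

```python
def strategy_to_order(strategy: str, cash: float, shares: int, price: float) -> tuple[str, float]:
    """Keyword rule mirroring run_simulation_node; returns (action, number of shares)."""
    s = strategy.lower()
    fraction = 0.25 if "aggressively" in s else 0.1
    if "buy" in s:
        return "buy", cash * fraction / price  # spend a fraction of the cash
    if "sell" in s:
        return "sell", shares * fraction       # sell a fraction of the position
    return "hold", 0.0


print(strategy_to_order("buy aggressively", cash=10_000, shares=0, price=100.0))  # ('buy', 25.0)
print(strategy_to_order("Sell cautiously", cash=8_000, shares=20, price=99.16))   # ('sell', 2.0)
print(strategy_to_order("hold", cash=10_000, shares=0, price=100.0))              # ('hold', 0.0)
```

Keyword matching like this is brittle. Both the notebook and the sketch check for "buy" before "sell", so a strategy phrased as "don't buy, sell" would be read as a buy. Constraining the analyst's `strategy` field to a fixed vocabulary would avoid that.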


## Phase 3: Demonstration

Let's run our agent for two days in the market. Before each run we inject news by hand, overwriting `market_news` and `drift` on the real market: good news on Day 1, then bad news on Day 2.

```python
real_market = MarketSimulator()
console.print("--- Initial Market State ---")
console.print(real_market.get_state_string())

# --- Day 1 Run ---
console.print("\n--- Day 1: Good News Hits! ---")
real_market.market_news = "Positive earnings report expected."
real_market.drift = 0.05
initial_state = {"real_market": real_market}
final_state = simulator_agent.invoke(initial_state)
real_market = final_state['real_market']

# --- Day 2 Run ---
console.print("\n--- Day 2: Bad News Hits! ---")
real_market.market_news = "New competitor enters the market."
real_market.drift = -0.05
initial_state = {"real_market": real_market}
final_state = simulator_agent.invoke(initial_state)
real_market = final_state['real_market']
```

```text
--- Initial Market State ---
Day 0: Price=$100.00, News: Market is stable.
Portfolio: $10000.00 (0 shares, $10000.00 cash)

--- Day 1: Good News Hits! ---
--- 🧐 Analyst Proposing Action ---
Proposal: buy aggressively. Reason: The positive earnings report is a strong bullish signal, and the market is already stable. This is a good opportunity to enter a position before the price potentially rises further.
--- 🤖 Running Simulations ---
Simulation complete. Results will be passed to the risk manager.
--- 🧠 Risk Manager Refining Decision ---
Final Decision: buy 20 shares. Reason: The simulations confirm a strong upward trend, with all scenarios resulting in a positive return. The analyst's proposal to buy aggressively is validated. I will execute a significant but not excessive purchase of 20 shares to capitalize on the expected price increase while maintaining a cash reserve.
--- 🚀 Executing in Real World ---
Execution complete. New market state:
Day 1: Price=$99.16, News: Market is stable.
Portfolio: $9983.18 (20 shares, $8000.00 cash)

--- Day 2: Bad News Hits! ---
--- 🧐 Analyst Proposing Action ---
Proposal: sell cautiously. Reason: The entry of a new competitor introduces significant uncertainty and potential downside risk. While the price hasn't dropped dramatically yet, it's prudent to reduce exposure.
--- 🤖 Running Simulations ---
Simulation complete. Results will be passed to the risk manager.
--- 🧠 Risk Manager Refining Decision ---
Final Decision: sell 5 shares. Reason: The simulations show a high degree of variance and a negative average return, confirming the analyst's concerns. The initial proposal to sell cautiously is sound. I will de-risk the portfolio by selling 5 shares (25% of the position) to lock in some cash and reduce exposure to the potential downside from the new competitor.
--- 🚀 Executing in Real World ---
Execution complete. New market state:
Day 2: Price=$93.81, News: Market is stable.
Portfolio: $9802.90 (15 shares, $8395.73 cash)
```


### Analysis of the Results

The agent's behavior demonstrates the value of the simulator loop:

- **On Day 1 (Good News):**
    - The *Analyst* proposed an aggressive buy, seeing the opportunity.
    - The five 10-day *Simulator* rollouts pointed to a positive outcome. According to the Risk Manager's summary, every scenario ended with a positive return.
    - The *Risk Manager* translated the aggressive strategy into a concrete, substantial purchase (20 shares), but didn't risk the entire cash balance.

- **On Day 2 (Bad News):**
    - The *Analyst* correctly identified the new risk and proposed a cautious sell.
    - The *Simulator* likely showed a mix of outcomes, with some scenarios showing a sharp drop and others a recovery, confirming the uncertainty.
    - The *Risk Manager*, seeing the variance and negative average return in the simulations, made a prudent decision to reduce the position (selling 5 shares) rather than panic-selling the entire holding. This is a much more nuanced action than a simple rule-based agent might take.

A naive agent without the simulation loop might have bought too much on Day 1 and then sold everything on Day 2. In a real market, that would also mean higher transaction costs (the toy simulator does not model any) and the risk of missing a recovery. Our simulator agent behaved more like a real-world trader: it made a probabilistic bet, then trimmed the position when new information changed the risk profile.
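The Risk Manager only sees the rollouts as raw text lines, so judgments like "high variance" and "negative average return" are left to the LLM. Those figures could instead be computed deterministically and added to the prompt. The sketch below uses only the standard library and assumes result dicts shaped like the ones `run_simulation_node` builds. `summarize_simulations` is a hypothetical helper, and the five sample returns are made up for illustration.

```python
import statistics


def summarize_simulations(results: list[dict]) -> dict[str, float]:
    """Aggregate per-rollout returns (in %) into the risk figures a reviewer would look at."""
    returns = [r["return_pct"] for r in results]
    return {
        "mean_pct": statistics.mean(returns),
        "stdev_pct": statistics.stdev(returns) if len(returns) > 1 else 0.0,
        "worst_pct": min(returns),
        "share_negative": sum(r < 0 for r in returns) / len(returns),
    }


# Made-up returns for five rollouts, shaped like run_simulation_node's output.
sample = [{"sim_num": i + 1, "return_pct": p} for i, p in enumerate([-6.0, 2.0, -3.0, 4.0, -7.0])]
summary = summarize_simulations(sample)
print(summary)
```

A summary like this can be formatted into the Risk Manager's prompt alongside the raw lines. That way the LLM reasons over numbers computed the same way every time, rather than estimating them from text.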

## Conclusion

In this notebook, we built an agent architecture that uses an internal **simulator** to test and refine its actions before committing them. The loop of `Propose -> Simulate -> Refine -> Execute` lets the agent run a simple Monte Carlo risk check, using several forked rollouts of the market. With that check in place, the agent made more nuanced, safer decisions in a dynamic environment.

This pattern is a cornerstone of building agents that can operate safely and effectively in the real world. The ability to perform what-if analysis on an internal "mental model" allows the agent to anticipate consequences, avoid costly errors, and develop more robust strategies. While the fidelity of the simulator is a critical dependency (the "simulation-reality gap"), this architecture provides a clear and extensible framework for building responsible, action-taking AI.