# T-Maze Active Inference - Planning as Message Passing

This tutorial demonstrates how to implement Active Inference using message passing on a factor graph with RxInfer.jl. We'll build an agent that solves a T-maze navigation task by minimizing Expected Free Energy (EFE) through variational message passing.

### What is Active Inference?

Active Inference is a framework that unifies perception, planning, and action under a single principle: minimizing variational free energy. Instead of separating "what do I believe?" (perception) from "what should I do?" (planning), Active Inference treats both as inference problems solved simultaneously through message passing.

The key insight is that **planning is inference**: we infer the best actions by treating them as latent variables in a probabilistic model, just like we infer hidden states from observations.

## The T-Maze Problem

Our agent starts in the middle of the trunk of a T-shaped maze and must navigate to a reward that is in either the left or the right arm. The task poses three challenges:

- **Partial observability**: The agent doesn't know where the reward is initially
- **Exploration vs. exploitation**: Should it explore to reduce uncertainty or exploit what it knows?
- **Planning horizon**: The agent must plan several steps ahead

The agent receives:
- **Position observations**: Where it currently is
- **Reward cues**: A cue at the bottom of the T trunk tells the agent which arm contains the reward

## Setup

First, we activate the Julia environment and load the required packages:

In [3]:
using RxInfer
using Distributions
using Plots
using LinearAlgebra
using Random
using ProgressMeter
using StableRNGs
using ColorSchemes

In [4]:
### EXAMPLE_HIDDEN_BLOCK_START(Core Types and Configuration) ###

"""
    TMazeConfig

Configuration for TMaze agent experiments.

# Fields
- `time_horizon::Int`: Planning horizon for the agent
- `n_episodes::Int`: Number of episodes to run
- `n_iterations::Int`: Number of inference iterations per step
- `wait_time::Float64`: Time to wait between steps (for visualization)
- `seed::Int`: Random seed
- `experiment_name::String`: Name of the experiment (for saving results)
"""
Base.@kwdef struct TMazeConfig
    time_horizon::Int
    n_episodes::Int
    n_iterations::Int
    wait_time::Float64
    seed::Int
    experiment_name::String
end

"""
    TMazeBeliefs

Container for agent's beliefs about the TMaze environment.

# Fields
- `location::Categorical{Float64}`: Belief about current location (5 possible states)
- `reward_location::Categorical{Float64}`: Belief about reward location (left or right)
- `action_posterior::Categorical{Float64}`: Posterior distribution over actions
"""
Base.@kwdef mutable struct TMazeBeliefs
    location::Categorical{Float64}
    reward_location::Categorical{Float64}
    action_posterior::Categorical{Float64}
end

"""
    initialize_beliefs_tmaze()

Initialize agent beliefs for the TMaze environment.
"""
function initialize_beliefs_tmaze()
    # Initialize with uniform beliefs over states
    return TMazeBeliefs(
        location=Categorical(fill(1.0 / 5, 5)),
        reward_location=Categorical([0.5, 0.5]),
        action_posterior=Categorical(fill(0.25, 4))  # Uniform over 4 actions (North, East, South, West)
    )
end

# Direction types for agent actions
struct North end
struct East end
struct South end
struct West end

const DIRECTIONS = (North(), East(), South(), West())

"""
    MazeAgentAction

Represents a directional action in the maze.

# Fields
- `direction`: One of North(), East(), South(), West()
"""
struct MazeAgentAction
    direction::Union{North,East,South,West}
end

"""
Convert action index to string representation.
"""
function action_to_string(idx::Int)
    lookup = Dict(1=>"North", 2=>"East", 3=>"South", 4=>"West", 5=>"Stay")
    return get(lookup, idx, "Unknown")
end

"""
    tmaze_convert_action(next_action::Int)

Convert model action index to environment action.
Action mapping: 1=North, 2=East, 3=South, 4=West
"""
function tmaze_convert_action(next_action::Int)
    action_map = Dict(
        1 => MazeAgentAction(North()),  # North
        2 => MazeAgentAction(East()),   # East
        3 => MazeAgentAction(South()),  # South
        4 => MazeAgentAction(West())    # West
    )
    return get(action_map, next_action) do
        error("Invalid action: $next_action")
    end
end

tmaze_convert_action(next_action::AbstractVector) = tmaze_convert_action(argmax(next_action))

nothing # to suppress the output in the notebook
### EXAMPLE_HIDDEN_BLOCK_END ###

In [5]:
### EXAMPLE_HIDDEN_BLOCK_START(Environment Dynamics and Tensors) ###
"""
    TMaze

A T-shaped maze environment with 5 cells. The T has a stem cell at the bottom,
a middle junction cell, and three cells at the top (left, middle, right).
The reward can be either at the top-left or top-right cell.

# Fields
- `agent_position::Tuple{Int,Int}`: Current agent position as (x,y) tuple
- `reward_position::Symbol`: Either :left or :right
- `maze_structure::Matrix{UInt8}`: Binary encoding of walls for each cell
- `reward_values::Dict{Tuple{Int,Int},Float64}`: Reward received at each rewarded cell (+1.0 or -1.0)
"""
mutable struct TMaze
    agent_position::Tuple{Int,Int}
    reward_position::Symbol
    maze_structure::Matrix{UInt8}
    reward_values::Dict{Tuple{Int,Int},Float64}

    function TMaze(reward_position::Symbol=:left)
        # Default start: the bottom of the T at (2, 1).
        # Delegates to the two-argument constructor, which validates the arguments
        # and builds the maze structure (a 3×3 grid with a T shape, binary wall
        # encoding: North=1, West=2, South=4, East=8) and the reward values.
        return TMaze(reward_position, (2, 1))
    end

    function TMaze(reward_position::Symbol, start_position::Tuple{Int,Int})
        reward_position in [:left, :right] || throw(ArgumentError("reward_position must be :left or :right"))

        # Validate start position
        valid_positions = [(2, 1), (2, 2), (1, 3), (2, 3), (3, 3)]
        start_position in valid_positions || throw(ArgumentError("Invalid start position. Must be one of: $valid_positions"))

        # Create the T-maze structure with 5 cells in T shape
        structure = [
            0x0B 0x09 0x0A; # Top row of T: left (0x0B), middle (0x09), right (0x0A)
            0x0F 0x05 0x0F; # Middle row: only middle cell accessible (0x05)
            0x0F 0x01 0x0F  # Bottom row: only middle cell accessible (0x01) - entry point
        ]

        # Define reward values
        reward_values = Dict{Tuple{Int,Int},Float64}()
        if reward_position == :left
            reward_values[(1, 3)] = 1.0  # Top left with positive reward
            reward_values[(3, 3)] = -1.0 # Top right with negative reward
        else
            reward_values[(1, 3)] = -1.0 # Top left with negative reward
            reward_values[(3, 3)] = 1.0  # Top right with positive reward
        end

        return new(start_position, reward_position, structure, reward_values)
    end
end

"""
    create_tmaze(reward_position::Symbol=rand([:left, :right]))

Create a T-maze environment with a reward at the specified position.
"""
function create_tmaze(reward_position::Symbol=rand([:left, :right]))
    return TMaze(reward_position)
end

"""
    create_tmaze(reward_position::Symbol, start_position::Tuple{Int,Int})

Create a T-maze environment with a reward at the specified position and the agent
starting at the specified position.
"""
function create_tmaze(reward_position::Symbol, start_position::Tuple{Int,Int})
    return TMaze(reward_position, start_position)
end

"""
    boundaries(env::TMaze, pos::Tuple{Int,Int})

Get the boundary encoding for a position in the maze.
"""
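function boundaries(env::TMaze, pos::Tuple{Int,Int})
    # Row 1 of `maze_structure` is the top of the T, whereas y = 1 is the bottom,
    # so the y coordinate is flipped when indexing the matrix.
    row = size(env.maze_structure, 1) - pos[2] + 1
    return env.maze_structure[row, pos[1]]
end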

"""
    step!(env::TMaze, action::MazeAgentAction)

Take a step in the T-maze environment with the given action.
Returns a tuple of (position_obs, reward_cue, reward).
"""
function step!(env::TMaze, action::MazeAgentAction)
    # Update agent position based on action
    env.agent_position = next_position(env, env.agent_position, action)

    # Create observations
    position_obs = create_position_observation(env)
    reward_cue = create_reward_cue(env)
    reward = get_reward(env)

    return position_obs, reward_cue, reward
end

"""
    create_position_observation(env::TMaze)

Create a one-hot encoded vector representing the agent's position.
"""
function create_position_observation(env::TMaze)
    # Create vector for 5 positions
    position_obs = zeros(Float64, 5)
    position_idx = position_to_index(env.agent_position)
    position_obs[position_idx] = 1.0
    return position_obs
end

"""
    create_reward_cue(env::TMaze)

Create a vector encoding information about the reward location.
At the bottom position, it reveals the true reward location.
At other positions, it provides uniform uncertainty.
"""
function create_reward_cue(env::TMaze)
    reward_cue = zeros(Float64, 2)

    # Only provide informative cue at the bottom position
    if env.agent_position == (2, 1)
        if env.reward_position == :left
            reward_cue = [1.0, 0.0]  # Left reward
        else
            reward_cue = [0.0, 1.0]  # Right reward
        end
    else
        # At other positions, provide uniform uncertainty
        reward_cue = [0.5, 0.5]
    end

    return reward_cue
end

"""
    get_reward(env::TMaze)

Get the reward at the current position, or 0 if no reward.
"""
function get_reward(env::TMaze)
    # Return the reward at the current position, or 0 if there's no reward here
    return get(env.reward_values, env.agent_position, 0.0)
end

"""
    next_position(env::TMaze, pos::Tuple{Int,Int}, action::MazeAgentAction)

Calculate the next position based on current position and action.
"""
function next_position(env::TMaze, pos::Tuple{Int,Int}, action::MazeAgentAction)
    # Handle each of the 5 valid positions and their possible movements

    # Bottom of T (2,1)
    if pos == (2, 1)
        if action.direction isa North
            return (2, 2)  # Move to middle junction
        else
            return pos     # Stay in place for all other directions (hitting walls)
        end

        # Middle junction (2,2)
    elseif pos == (2, 2)
        if action.direction isa North
            return (2, 3)  # Move to top middle
        elseif action.direction isa East
            return (2, 2)  # Stay in place 
        elseif action.direction isa South
            return (2, 1)  # Move to bottom
        elseif action.direction isa West
            return (2, 2)  # Stay in place
        end

        # Top left (1,3)
    elseif pos == (1, 3)
        if action.direction isa East
            return (2, 3)  # Move to top middle
        elseif action.direction isa South
            return (2, 2)  # Move to middle junction
        else
            return pos     # Stay in place for other directions (hitting walls)
        end

        # Top middle (2,3)
    elseif pos == (2, 3)
        if action.direction isa East
            return (3, 3)  # Move to top right
        elseif action.direction isa South
            return (2, 2)  # Move to middle junction
        elseif action.direction isa West
            return (1, 3)  # Move to top left
        else
            return pos     # Stay in place for North (hitting wall)
        end

        # Top right (3,3)
    elseif pos == (3, 3)
        if action.direction isa South
            return (2, 2)  # Move to middle junction
        elseif action.direction isa West
            return (2, 3)  # Move to top middle
        else
            return pos     # Stay in place for other directions (hitting walls)
        end

        # Should not reach here with valid positions
    else
        error("Invalid position in T-maze: $pos")
    end
end

"""
    position_to_index(pos::Tuple{Int,Int})

Convert position coordinates to state index (1-5).
"""
function position_to_index(pos::Tuple{Int,Int})
    # Valid positions in the T-maze
    position_mapping = Dict(
        (2, 1) => 1,  # Bottom of T
        (2, 2) => 2,  # Middle junction
        (1, 3) => 3,  # Top left
        (2, 3) => 4,  # Top middle
        (3, 3) => 5   # Top right
    )

    if haskey(position_mapping, pos)
        return position_mapping[pos]
    else
        error("Invalid T-maze position: $pos")
    end
end

"""
    create_reward_observation_tensor()

Create the reward observation tensor for the T-maze environment.
This tensor has dimensions (2×5×2) representing:
- First dimension (2): Observation values [left_prob, right_prob]
- Second dimension (5): Agent location states (1-5)
- Third dimension (2): Reward location states (1=left, 2=right)

Returns a 2×5×2 Float64 tensor.
"""
function create_reward_observation_tensor()
    # Create reward observation tensor (2×5×2)
    # Dimensions: (observation_values, agent_location, reward_location)
    reward_obs_tensor = zeros(Float64, 2, 5, 2)

    # Fill with default uncertainty [0.5, 0.5] at all non-bottom positions
    for loc in 2:5, reward_loc in 1:2
        reward_obs_tensor[:, loc, reward_loc] .= 0.5
    end

    # At bottom position (state 1), reveal true reward location
    reward_obs_tensor[:, 1, 1] = [1.0, 0.0]  # Left reward
    reward_obs_tensor[:, 1, 2] = [0.0, 1.0]  # Right reward

    return reward_obs_tensor
end

"""
    create_location_transition_tensor()

Create the location transition tensor for the T-maze environment.
This tensor has dimensions (5×5×4) representing:
- First dimension (5): Next location state
- Second dimension (5): Current location state
- Third dimension (4): Action (1=North, 2=East, 3=South, 4=West)

Returns a 5×5×4 Float64 tensor.
"""
function create_location_transition_tensor()
    # Create location transition tensor (5×5×4)
    # Dimensions: (next_location, current_location, action)
    transition_tensor = zeros(Float64, 5, 5, 4)

    # Bottom of T (state 1)
    transition_tensor[2, 1, 1] = 1.0  # North -> Middle junction
    transition_tensor[1, 1, 2] = 1.0  # East -> Stay (wall)
    transition_tensor[1, 1, 3] = 1.0  # South -> Stay (wall)
    transition_tensor[1, 1, 4] = 1.0  # West -> Stay (wall)

    # Middle junction (state 2)
    transition_tensor[4, 2, 1] = 1.0  # North -> Top middle
    transition_tensor[2, 2, 2] = 1.0  # East -> Stay (wall)
    transition_tensor[1, 2, 3] = 1.0  # South -> Bottom
    transition_tensor[2, 2, 4] = 1.0  # West -> Stay (wall)

    # Top left (state 3)
    transition_tensor[3, 3, 1] = 1.0  # North -> Stay (wall)
    transition_tensor[4, 3, 2] = 1.0  # East -> Top middle
    transition_tensor[2, 3, 3] = 1.0  # South -> Middle junction
    transition_tensor[3, 3, 4] = 1.0  # West -> Stay (wall)

    # Top middle (state 4)
    transition_tensor[4, 4, 1] = 1.0  # North -> Stay (wall)
    transition_tensor[5, 4, 2] = 1.0  # East -> Top right
    transition_tensor[2, 4, 3] = 1.0  # South -> Middle junction
    transition_tensor[3, 4, 4] = 1.0  # West -> Top left

    # Top right (state 5)
    transition_tensor[5, 5, 1] = 1.0  # North -> Stay (wall)
    transition_tensor[5, 5, 2] = 1.0  # East -> Stay (wall)
    transition_tensor[2, 5, 3] = 1.0  # South -> Middle junction
    transition_tensor[4, 5, 4] = 1.0  # West -> Top middle

    return transition_tensor
end

"""
    create_reward_to_location_mapping()

Create the reward-to-location mapping tensor for the T-maze environment.
This tensor has dimensions (5×2) representing:
- First dimension (5): Location states
- Second dimension (2): Reward location states (1=left, 2=right)

Returns a 5×2 Float64 tensor.
"""
function create_reward_to_location_mapping()
    # Create reward-to-location mapping tensor (5×2)
    # Dimensions: (location, reward_location)
    reward_mapping = zeros(Float64, 5, 2)

    # Left reward (reward_location=1) is at top-left position (state 3)
    reward_mapping[3, 1] = 1.0

    # Right reward (reward_location=2) is at top-right position (state 5)
    reward_mapping[5, 2] = 1.0

    return reward_mapping
end

nothing # to suppress the output in the notebook
### EXAMPLE_HIDDEN_BLOCK_END ###
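
As a quick sanity check on the transition tensor defined above, the following self-contained sketch rebuilds it from a compact next-state table and verifies that every column is a valid probability distribution. The names `NEXT_STATE`, `build_transition_tensor`, and `rollout` are illustrative and not part of the tutorial's code; the table is transcribed from `create_location_transition_tensor`.

```julia
# Next-state table transcribed from the transition tensor above.
# Rows: current state (1=bottom, 2=junction, 3=top left, 4=top middle, 5=top right)
# Columns: action (1=North, 2=East, 3=South, 4=West)
const NEXT_STATE = [
    2 1 1 1;
    4 2 1 2;
    3 4 2 3;
    4 5 2 3;
    5 5 2 4
]

# Build a tensor with B[next, current, action] = 1 for each deterministic move.
function build_transition_tensor(next_state::AbstractMatrix{Int})
    n_states, n_actions = size(next_state)
    B = zeros(Float64, n_states, n_states, n_actions)
    for s in 1:n_states, a in 1:n_actions
        B[next_state[s, a], s, a] = 1.0
    end
    return B
end

# Push a belief over locations through a sequence of action indices.
function rollout(B::AbstractArray{<:Real,3}, belief::AbstractVector{<:Real}, actions)
    for a in actions
        belief = B[:, :, a] * belief
    end
    return belief
end

B = build_transition_tensor(NEXT_STATE)

# Every (current state, action) column must sum to one.
@assert all(sum(B, dims=1) .≈ 1.0)

# Start at the junction (state 2): South moves to the cue, then North, North, West
# ends in the top-left cell (state 3).
start = [0.0, 1.0, 0.0, 0.0, 0.0]
final = rollout(B, start, [3, 1, 1, 4])
@assert argmax(final) == 3
```

Multiplying a slice `B[:, :, a]` by a belief vector is the same operation that a transition factor performs on a categorical message when the action is known.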

In [6]:
### EXAMPLE_HIDDEN_BLOCK_START(Visualization and Plotting) ###
#### Visualization ####
scheme = colorschemes[:Paired_9]

"""
    MAZE_THEME

A consistent color theme for maze environments.

# Fields
- `agent`: Color for the agent
- `cue`: Color for cue indicators
- `reward_positive`: Color for positive rewards
- `reward_negative`: Color for negative rewards
- `corridor`: Color for corridors/walkable areas
- `wall`: Color for walls
- `background`: Background color
"""
const MAZE_THEME = (
    agent=scheme[2],
    cue=scheme[7],
    reward_positive=scheme[4],
    reward_negative=scheme[6],
    corridor=:white,
    wall=:black,
    background=:white
)

"""
    plot_tmaze(env::TMaze)

Create a visualization of the TMaze environment.
Returns a Plots object showing the T-maze structure with corridors, walls,
rewards, and agent position.
"""
function plot_tmaze(env::TMaze)
    # Create a new plot with a clean appearance
    p = Plots.plot(
        aspect_ratio=:equal,
        legend=false,
        axis=false,
        grid=false,
        ticks=false,
        background_color=MAZE_THEME.background,
        size=(600, 600),
        frame=:none,
        margin=0Plots.mm,
    )
    scale = 20

    # Vertical corridor
    Plots.plot!(p, [1, 2], [1, 4], color=MAZE_THEME.corridor, linewidth=0, fill=true, fillcolor=MAZE_THEME.corridor)

    # Horizontal corridor at top
    Plots.plot!(p, [0, 3], [3, 4], color=MAZE_THEME.corridor, linewidth=0, fill=true, fillcolor=MAZE_THEME.corridor)

    # Draw the outer walls of the maze (black)
    # Vertical corridor walls
    Plots.plot!(p, [1, 1], [1, 3], color=MAZE_THEME.wall, linewidth=2)  # Left vertical wall
    Plots.plot!(p, [2, 2], [1, 3], color=MAZE_THEME.wall, linewidth=2)  # Right vertical wall

    # Horizontal corridor top walls
    Plots.plot!(p, [0, 3], [4, 4], color=MAZE_THEME.wall, linewidth=2)  # Top horizontal wall
    Plots.plot!(p, [0, 1], [3, 3], color=MAZE_THEME.wall, linewidth=2)  # Bottom left horizontal wall
    Plots.plot!(p, [2, 3], [3, 3], color=MAZE_THEME.wall, linewidth=2)  # Bottom right horizontal wall

    # Bottom wall
    Plots.plot!(p, [1, 2], [1, 1], color=MAZE_THEME.wall, linewidth=2)

    # Left and right top walls
    Plots.plot!(p, [0, 0], [3, 4], color=MAZE_THEME.wall, linewidth=2)  # Left wall
    Plots.plot!(p, [3, 3], [3, 4], color=MAZE_THEME.wall, linewidth=2)  # Right wall

    # Draw grid lines between cells
    # Horizontal lines
    Plots.plot!(p, [1, 2], [2, 2], color=MAZE_THEME.wall, linewidth=0.5, alpha=0.7)  # Bottom to middle
    Plots.plot!(p, [1, 2], [3, 3], color=MAZE_THEME.wall, linewidth=0.5, alpha=0.7)  # Middle to top

    # Vertical lines at top
    Plots.plot!(p, [1, 1], [3, 4], color=MAZE_THEME.wall, linewidth=0.5, alpha=0.7)  # Top to top-left
    Plots.plot!(p, [2, 2], [3, 4], color=MAZE_THEME.wall, linewidth=0.5, alpha=0.7)  # Top to top-right

    # Draw reward locations with clear indicators
    reward_position = env.reward_position

    # The left reward location (left arm of the T)
    left_color = reward_position == :left ? MAZE_THEME.reward_positive : MAZE_THEME.reward_negative
    Plots.scatter!(p, [0.5], [3.5], markersize=ceil(Int, scale), color=left_color, alpha=0.7, markerstrokewidth=ceil(Int, scale / 15))

    # The right reward location (right arm of the T)
    right_color = reward_position == :right ? MAZE_THEME.reward_positive : MAZE_THEME.reward_negative
    Plots.scatter!(p, [2.5], [3.5], markersize=ceil(Int, scale), color=right_color, alpha=0.7, markerstrokewidth=ceil(Int, scale / 15))

    # Draw cue location (bottom middle)
    Plots.scatter!(p, [1.5], [1.5], markersize=ceil(Int, scale), color=MAZE_THEME.cue, alpha=0.7, markerstrokewidth=ceil(Int, scale / 15))

    # Convert agent position to plot coordinates
    x, y = 0, 0
    agent_pos = env.agent_position

    if agent_pos != (2, 1)
        Plots.annotate!(p, 1.5, 1.5, Plots.text("Cue", :black, ceil(Int, scale / 2)))
    end

    if agent_pos == (2, 1)      # Bottom
        x, y = 1.5, 1.5
    elseif agent_pos == (2, 2)  # Middle
        x, y = 1.5, 2.5
    elseif agent_pos == (1, 3)  # Top left
        x, y = 0.5, 3.5
    elseif agent_pos == (2, 3)  # Top middle
        x, y = 1.5, 3.5
    elseif agent_pos == (3, 3)  # Top right
        x, y = 2.5, 3.5
    end

    # Draw agent as a circle with a black border
    Plots.scatter!(p, [x], [y], markersize=ceil(Int, (2 / 3) * scale), color=MAZE_THEME.agent, markerstrokewidth=ceil(Int, scale / 15), markerstrokecolor=MAZE_THEME.wall)
    return p
end

"""
    plot_reward_location_belief(beliefs::TMazeBeliefs)

Plot the belief distribution over reward location (left/right).
"""
function plot_reward_location_belief(beliefs::TMazeBeliefs)
    probs = probvec(beliefs.reward_location)
    p = bar(["Left", "Right"], probs, 
            title="Reward Location Belief",
            titlefontsize=10, # Decreased font size
            ylabel="Probability",
            ylims=(0, 1),
            color=:blue,
            alpha=0.7,
            legend=false)
    return p
end

"""
    plot_action_posterior(beliefs::TMazeBeliefs, step_number::Int)

Plot the posterior distribution over actions.
"""
function plot_action_posterior(beliefs::TMazeBeliefs, step_number::Int)
    probs = probvec(beliefs.action_posterior)
    action_names = ["North", "East", "South", "West"]
    p = bar(action_names, probs,
            title="Action Posterior (Step: $step_number)", # Included step number
            titlefontsize=10, # Decreased font size
            ylabel="Probability",
            ylims=(0, 1),
            color=:green,
            alpha=0.7,
            legend=false,
            xrotation=45)
    return p
end

nothing # to suppress the output in the notebook
### EXAMPLE_HIDDEN_BLOCK_END ###

### Active Inference as Message Passing

In traditional Active Inference, the agent selects actions by minimizing the Expected Free Energy (EFE), which balances two fundamental drives:

$$G = \underbrace{D_{KL}[Q(s|u) || P(s)]}_{\text{Pragmatic value}} + \underbrace{\mathbb{E}_{Q(s|u)}[H[P(o|s)]]}_{\text{Epistemic value}}$$

* **Pragmatic value**: Drives the agent to visit states that align with its preferences (exploiting known rewards).
* **Epistemic value**: Drives the agent to seek out informative observations (exploring to reduce uncertainty).
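
To make the two terms concrete, here is a minimal sketch that evaluates them for a single predicted state distribution. The values of `q_states` and `preferences` are made up for illustration, and the helper names are hypothetical. The observation model mirrors the T-maze cue likelihood for one fixed reward location, which is informative only at the bottom cell.

```julia
# Shannon entropy of a probability vector (0 * log 0 is treated as 0).
shannon_entropy(p) = -sum(x -> x > 0 ? x * log(x) : 0.0, p)

# KL divergence D_KL[q || p]; assumes p > 0 wherever q > 0.
kl_divergence(q, p) = sum(qi > 0 ? qi * log(qi / p_i) : 0.0 for (qi, p_i) in zip(q, p))

# Hypothetical predicted state distribution Q(s | u) over the 5 locations
q_states = [0.1, 0.1, 0.6, 0.1, 0.1]

# Hypothetical preference distribution P(s), favoring the top-left cell
preferences = [0.05, 0.05, 0.8, 0.05, 0.05]

# Observation model P(o | s): deterministic at location 1, uniform elsewhere
A = [1.0 0.5 0.5 0.5 0.5;
     0.0 0.5 0.5 0.5 0.5]

# First term of G: divergence between predicted and preferred states
pragmatic_term = kl_divergence(q_states, preferences)

# Second term of G: expected observation entropy under Q(s | u)
epistemic_term = sum(q_states[s] * shannon_entropy(A[:, s]) for s in 1:5)

G = pragmatic_term + epistemic_term
```

Only location 1 has zero observation entropy, so the second term here equals `0.9 * log(2)`; shifting probability mass toward the cue location would lower it.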

Computing the EFE directly often involves evaluating all possible action sequences, which quickly becomes an intractable combinatorial search problem for extended planning horizons.
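
A short back-of-the-envelope calculation illustrates the growth. With the four actions of this T-maze, the number of distinct action sequences is `4^T` for a horizon `T`; the helper `n_policies` is a name introduced here for illustration.

```julia
# Number of distinct action sequences for a given horizon
n_policies(n_actions::Int, horizon::Int) = n_actions^horizon

for horizon in (1, 2, 4, 8, 16)
    println("T = ", horizon, ": ", n_policies(4, horizon), " action sequences")
end
```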

However, following the approach in ["A Message Passing Realization of Expected Free Energy Minimization"](https://arxiv.org/abs/2508.02197), we can elegantly bypass this limitation. By reformulating EFE minimization directly as Variational Free Energy (VFE) minimization, we transform the combinatorial search into a tractable inference problem that can be solved using standard variational message passing techniques.

We implement this in RxInfer.jl by introducing **two epistemic prior nodes** directly into our generative model's factor graph:

1. **Epistemic Action Prior (Exploration Node)**: Encourages actions that maximize information gain and reduce uncertainty about the environment.
2. **Epistemic State Prior (Ambiguity Node)**: Penalizes states with high observation noise, encouraging the agent to visit states where observations are reliable and informative.

Crucially, in this tutorial we focus on agents that **already know the dynamics of the environment** (the observation matrix $A$ and the transition matrix $B$), and on the case where the injected epistemic priors are invariant to the agent's current beliefs about the state (see below). Consequently, these priors reduce analytically to functions of the entropies of the $A$ and $B$ matrices, which lets us implement them without complex, belief-dependent iterative updates during inference.

Together, these two priors encode the epistemic drive for exploration, allowing our agent to balance reading the cue against seeking the reward, all through reactive message passing.

## The Factor Graph Model

Our model represents the agent's beliefs about:
- **Current location**: Where the agent is now
- **Reward location**: Whether the reward is left or right (unknown)
- **Future locations**: Where the agent will be after taking planned actions
- **Actions**: What actions to take at each future timestep

The factor graph connects these variables through:
- **Transition factors**: How actions change location
- **Observation factors**: How locations generate observations
- **Epistemic priors**: How uncertainty guides action selection

Let's define the model:

In [7]:
@model function efe_tmaze_agent(reward_observation_tensor, location_transition_tensor, prior_location, prior_reward_location, reward_to_location_mapping, u_prev, T, reward_cue_observation, location_observation)
    old_location ~ prior_location # =beliefs.location
    reward_location ~ prior_reward_location # =beliefs.reward_location

    current_location ~ DiscreteTransition(old_location, location_transition_tensor, u_prev)
    location_observation ~ DiscreteTransition(current_location, diageye(5))
    reward_cue_observation ~ DiscreteTransition(current_location, reward_observation_tensor, reward_location) # Reward observation tensor is 2x5x2 (reward location observation x agent location x reward location state)

    previous_location = current_location
    for t in 1:T
        # Epistemic Action Prior (Exploration Node) - theoretically u[t] ~ Categorical([0.25, 0.25, 0.25, 0.25])
        u[t] ~ Categorical(calc_epis_action_prior_vec(reward_cue_observation, location_transition_tensor)) # the function call is needed because the constant version, u[t] ~ Categorical([0.25, 0.25, 0.25, 0.25]), doesn't get triggered in each VMP iteration
        location[t] ~ DiscreteTransition(previous_location, location_transition_tensor, u[t])
        # Epistemic State Prior (Ambiguity Node) - theoretically location[t] ~ Categorical([1/3, 1/6, 1/6, 1/6, 1/6])
        location[t] ~ Categorical(calc_epis_loc_prior_vec(reward_observation_tensor))
        previous_location = location[t]
    end
    location[end] ~ DiscreteTransition(reward_location, reward_to_location_mapping) # Reward tensor is 5x2 mapping current belief about reward location (left/right) to T-maze locations (5)
end

@constraints function efe_tmaze_agent_constraints()
end

@initialization function efe_tmaze_agent_initialization(prior_location, prior_reward_location, prior_future_locations)
    μ(old_location) = prior_location
    μ(reward_location) = prior_reward_location
    μ(location) = prior_future_locations
end

efe_tmaze_agent_initialization (generic function with 1 method)
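
One way to read the model above is as the following factorization. The symbols are chosen here for illustration: $x_{\text{old}}$ is `old_location`, $x_0$ is `current_location`, $x_t$ is `location[t]`, $r$ is `reward_location`, and $y^{loc}$, $y^{cue}$ are the two observations. The epistemic priors are written with a tilde, and the last factor is the goal constraint imposed through `reward_to_location_mapping`.

$$
\begin{aligned}
p(\cdot) \;\propto\;& p(x_{\text{old}})\, p(r)\, p(x_0 \mid x_{\text{old}}, u_{\text{prev}})\, p(y^{loc} \mid x_0)\, p(y^{cue} \mid x_0, r) \\
& \times \left[ \prod_{t=1}^{T} \tilde{p}(u_t)\, p(x_t \mid x_{t-1}, u_t)\, \tilde{p}(x_t) \right] p(x_T \mid r)
\end{aligned}
$$

Each factor corresponds to one `~` statement in the `@model` body, so the planning loop contributes three factors per future time step.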

### Understanding the Epistemic Priors

In Active Inference, epistemic priors compel the agent to resolve uncertainty by targeting informative transitions and observations. In this RxInfer.jl implementation, we inject these priors directly into the categorical distributions governing actions (`u[t]`) and states (`location[t]`).

Before diving into the mechanics, let's establish a clear notation. We will use capital letters (e.g., $X$) to denote random variables/distributions and lowercase letters (e.g., $x$) for specific samples or states. Furthermore, we use the subscripted entropy operator $H_q(\cdot)$ to specify that the expectation is taken with respect to the variational distribution $q$.

**1. Epistemic Action Prior (Exploration Node)**
The action prior formalizes curiosity by favoring actions that maximize the entropy of the predicted successor states, that is, actions that lead to areas where the internal model is most uncertain. Assuming the Markov property, the local time-step prior is:

$$\tilde{p}(u_t) \propto \exp(H_q(\mathbf{X}_t \mid \mathbf{X}_{t-1}, u_t))$$

For a specific action $u_t$, this conditional entropy is an average over all possible previous states:

$$H_q(\mathbf{X}_t \mid \mathbf{X}_{t-1}, u_t) = \sum_{\mathbf{x}_{t-1} \in \mathcal{X}} q(\mathbf{x}_{t-1}) H_q(\mathbf{X}_t \mid \mathbf{x}_{t-1}, u_t)$$

Because the agent knows the transition matrix ($B$ matrix), the Shannon entropy term $H_q(\mathbf{X}_t \mid \mathbf{x}_{t-1}, u_t)$ on the right-hand side is fixed. If this entropy is **invariant** to the previous state $\mathbf{x}_{t-1}$ (meaning environmental noise is uniform regardless of location), it becomes a constant $C_{u_t}$ that we can factor out of the sum. Since the variational distribution sums to 1 ($\sum q(\mathbf{x}_{t-1}) = 1$), the equation simplifies to $H_q(\mathbf{X}_t \mid \mathbf{X}_{t-1}, u_t) = C_{u_t}$.

This invariance allows `calc_epis_action_prior_vec` to skip dynamic dot products with the belief vector $q(\mathbf{x}_{t-1})$ and efficiently compute the prior entirely from the $B$ matrix.
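
As a worked instance for the T-maze in this tutorial (an illustration rather than an additional result): every column of the transition tensor is one-hot, so each transition entropy vanishes and the action prior is uniform over the four actions.

$$
H_q(\mathbf{X}_t \mid \mathbf{x}_{t-1}, u_t) = -1 \cdot \log 1 = 0 \quad \text{for every } \mathbf{x}_{t-1} \text{ and } u_t
\qquad \Longrightarrow \qquad
\tilde{p}(u_t) = \frac{\exp(0)}{\sum_{u'} \exp(0)} = \frac{1}{4}
$$

This agrees with the `Categorical([0.25, 0.25, 0.25, 0.25])` value noted in the model's comments.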

**2. Epistemic State Prior (Ambiguity Node)**
The state prior evaluates the information value of specific locations. In the T-maze, the hidden state factorizes into agent location and reward location: $x_t = \{ x_t^{loc}, x_t^{rew} \}$. The epistemic prior for the agent's location penalizes the observation entropy at that location, averaged over the possible reward locations:

$$\tilde{p}(x_t^{loc}) \propto \exp(-H_q(Y_t \mid x_t^{loc}, X_t^{rew}))$$

As with the epistemic action prior, the conditional entropy expands to:

$$H_q(Y_t \mid x_t^{loc}, X_t^{rew}) = \sum_{x_t^{rew} \in \mathcal{X}^{rew}} q(x_t^{rew}) \underbrace{H_q(Y_t \mid x_t^{loc}, x_t^{rew})}_{\text{Fixed value given } A}$$

Since the agent fully knows the observation model (the $A$ matrix), the entropy term $H_q(Y_t \mid x_t^{loc}, x_t^{rew})$ is a fixed value.
In our code, `calc_epis_loc_prior_vec` checks whether this entropy is invariant to the reward location, and therefore to the agent's beliefs about it. If it is invariant, we can safely compute the ambiguity resolution prior upfront, reducing it directly to the entropies of the $A$ matrix, and pass it as a fixed, pre-computed parameter to the Categorical distribution in our `@model` macro.
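
The following self-contained sketch works through this computation for the T-maze cue likelihood, using a prior proportional to the exponential of the negative observation entropy. It is a simplified stand-in for the tutorial's helper, with the tensor rebuilt inline and `shannon_entropy` introduced here as an illustrative name.

```julia
# Cue likelihood P(cue | location, reward_location), shaped like the tutorial's 2×5×2 tensor:
# informative at the bottom cell (location 1), uniform elsewhere.
A = fill(0.5, 2, 5, 2)
A[:, 1, 1] = [1.0, 0.0]
A[:, 1, 2] = [0.0, 1.0]

# Shannon entropy of a probability vector (0 * log 0 is treated as 0).
shannon_entropy(p) = -sum(x -> x > 0 ? x * log(x) : 0.0, p)

# Entropy of the cue for every (location, reward_location) pair
H = [shannon_entropy(A[:, loc, rew]) for loc in 1:5, rew in 1:2]

# The entropy does not depend on the reward location, so any belief
# q(reward_location) yields the same average.
@assert all(H[:, 1] .≈ H[:, 2])

# Prior proportional to exp(-entropy), normalized to sum to one
weights = exp.(-H[:, 1])
state_prior = weights ./ sum(weights)

@assert state_prior ≈ [1/3, 1/6, 1/6, 1/6, 1/6]
```

The result matches the `Categorical([1/3, 1/6, 1/6, 1/6, 1/6])` value noted in the model's comments: the cue location is favored because it is the only location with zero observation entropy.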


In [8]:
function compute_invariant_entropy_prior(
    A::AbstractArray{T, 3}, 
    dim_keep::Int, 
    dim_check::Int; 
    atol=1e-8, 
    err_msg="Entropy depends on hidden state"
) where T <: Real

    # 1. Calculate the negative-entropy map (collapse dim 1)
    # The resulting H stores Σ p * log(p) and has dimensions (size(A, 2), size(A, 3))
    dim_2, dim_3 = size(A, 2), size(A, 3)
    H = zeros(T, dim_2, dim_3)

    for k in 1:dim_3
        for j in 1:dim_2
            # Calculate negative entropy: Σ p * log(p)
            # We use a view to avoid allocation
            dist = @view A[:, j, k]
            val = sum(p * log(max(p, 1e-12)) for p in dist)
            H[j, k] = val
        end
    end

    # Determine sizes for the output vector
    # Note: H is 2D, so tensor dimensions (2, 3) correspond to H dimensions (1, 2).
    # If dim_keep is 2 (location prior), it maps to dim 1 of H.
    # If dim_keep is 3 (action prior), it maps to dim 2 of H.
    
    # Map input tensor dims (2,3) to H dims (1,2)
    h_keep_dim = (dim_keep == 2) ? 1 : 2
    h_check_dim = (dim_check == 2) ? 1 : 2
    
    n_items = size(H, h_keep_dim)
    entropy_vec = zeros(T, n_items)

    # 2. Check Uniformity and Reduce
    for i in 1:n_items
        # slice = H[i, :] if keeping rows, or H[:, i] if keeping cols
        # selectdim(A, d, i) fixes dimension d to index i
        slice = selectdim(H, h_keep_dim, i)
        first_h = slice[1]
        
        # Verify that entropy is invariant along the check dimension
        if !all(x -> isapprox(x, first_h; atol=atol), slice)
            throw(ArgumentError(err_msg))
        end
        entropy_vec[i] = first_h
    end

    # 3. Softmax over the negative entropies (subtract the maximum for numerical stability)
    exp_vals = exp.(entropy_vec .- maximum(entropy_vec))
    return exp_vals ./ sum(exp_vals)
end

function calc_epis_action_prior_vec(message_trigger, B_matrix::AbstractArray{T, 3}; atol=1e-8) where T <: Real
    return compute_invariant_entropy_prior(
        B_matrix, 
        3, # Keep Dimension (Action/k)
        2; # Check Dimension (Prev State/j)
        atol=atol,
        err_msg="Transition entropy depends on q(x_{t-1}), which RxInfer cannot currently handle."
    )
end

function calc_epis_loc_prior_vec(A_matrix::AbstractArray{T, 3}; atol=1e-8) where T <: Real
    return compute_invariant_entropy_prior(
        A_matrix,
        2, # Keep Dimension (Agent Location)
        3; # Check/Marginalize Dimension (Reward Location)
        atol=atol,
        err_msg="Observation entropy depends on q(reward_location), which RxInfer cannot currently handle."
    )
end

calc_epis_loc_prior_vec (generic function with 1 method)
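
As a sanity check of the reduction above, here is a minimal, self-contained sketch on a hypothetical toy tensor (`A_toy`, `neg_entropy`, `H_toy`, and `toy_prior` are illustrative names, not part of the notebook). It assumes two locations and two reward states: location 1 yields an uninformative observation, while location 2 reveals the reward state exactly.

```julia
# Hypothetical toy observation tensor, indexed as A_toy[observation, location, reward]
A_toy = zeros(2, 2, 2)
A_toy[:, 1, 1] = [0.5, 0.5]   # location 1: uninformative, whatever the reward state
A_toy[:, 1, 2] = [0.5, 0.5]
A_toy[:, 2, 1] = [1.0, 0.0]   # location 2: observation identifies the reward state
A_toy[:, 2, 2] = [0.0, 1.0]

# Negative entropy Σ p log(p) of one observation column; clamping avoids log(0)
neg_entropy(p) = sum(x * log(max(x, 1e-12)) for x in p)

# One value per (location, reward) pair
H_toy = [neg_entropy(view(A_toy, :, j, k)) for j in 1:2, k in 1:2]

# Invariance check: the value must not change along the reward dimension
all(isapprox.(H_toy[:, 1], H_toy[:, 2]; atol=1e-8)) ||
    throw(ArgumentError("Entropy depends on the reward state"))

# Softmax over the negative entropies (maximum subtracted for stability)
logits = H_toy[:, 1]
w = exp.(logits .- maximum(logits))
toy_prior = w ./ sum(w)   # ≈ [0.333, 0.667]
```

In this toy setting the prior puts twice as much mass on location 2, the location whose observations are unambiguous.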

## Inference and Action Selection

During inference, variational message passing updates beliefs about:
- Current and future locations
- Reward location
- Optimal actions

The epistemic priors guide the inference toward actions and states that reduce uncertainty. After inference, we select the action with the highest posterior probability.

Let's define the inference function:

In [9]:
"""
Initialize beliefs for inference, either from scratch or from previous results.
This helps warm-start the inference procedure for faster convergence.
"""
function get_initialization_tmaze(initialization_fn, beliefs, previous_result::Nothing)
    future_location_beliefs = vague(Categorical, 5)
    return initialization_fn(beliefs.location, beliefs.reward_location, future_location_beliefs)
end

function get_initialization_tmaze(initialization_fn, beliefs, previous_result)
    # Use posteriors from last VMP iteration to initialize next inference
    current_location_belief = last(previous_result.posteriors[:location])[1]
    future_location_beliefs = last(previous_result.posteriors[:location])[2:end]
    reward_location_belief = last(previous_result.posteriors[:reward_location])
    return initialization_fn(current_location_belief, reward_location_belief, future_location_beliefs)
end

get_initialization_tmaze (generic function with 2 methods)

In [10]:
"""
Execute a single inference step to determine the next action.
Message passing updates beliefs about states and actions simultaneously.
"""
function execute_step_tmaze(
    env, position_obs, reward_cue, beliefs, model, tensors, config, callbacks, 
    time_remaining, previous_result, previous_action;
    constraints_fn, initialization_fn, inference_kwargs...
)
    # Convert previous action to one-hot encoding
    previous_action_vec = Float64.(typeof(previous_action.direction) .== [North, East, South, West])

    # Initialize inference from previous results (warm-start)
    initialization = get_initialization_tmaze(initialization_fn, beliefs, previous_result)

    # Run variational message passing inference
    result = infer(
        model=model(
            reward_observation_tensor=tensors.reward_observation,
            location_transition_tensor=tensors.location_transition,
            prior_location=beliefs.location,
            prior_reward_location=beliefs.reward_location,
            reward_to_location_mapping=tensors.reward_to_location,
            u_prev=previous_action_vec,
            T=time_remaining
        ),
        data=(
            location_observation=position_obs,
            reward_cue_observation=reward_cue
        ),
        constraints=constraints_fn(),
        callbacks=callbacks,
        iterations=config.n_iterations,
        initialization=initialization;
        inference_kwargs...
    )

    # Select action with highest posterior probability
    next_action_idx = Int(mode(first(last(result.posteriors[:u]))))
    next_action = tmaze_convert_action(next_action_idx)

    # Update beliefs based on inference results
    beliefs.location = last(result.posteriors[:current_location])
    beliefs.reward_location = last(result.posteriors[:reward_location])
    beliefs.action_posterior = first(last(result.posteriors[:u]))

    return next_action_idx, next_action, result
end

execute_step_tmaze
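
The two small conversions inside `execute_step_tmaze` (one-hot encoding the previous action, and taking the mode of the action posterior) can be illustrated in isolation. This is a sketch under stated assumptions: actions are represented here by symbols rather than the notebook's direction types, and `q_u` is a made-up posterior, not an inference result. For a categorical distribution, the mode is the index of the largest probability, so `argmax` on the probability vector plays the role of `mode`.

```julia
# Hypothetical stand-in for the action ordering used in the one-hot encoding: North, East, South, West
action_names = [:North, :East, :South, :West]

# One-hot encode the previous action (here :East, matching the initial placeholder action)
u_prev = Float64.(:East .== action_names)   # [0.0, 1.0, 0.0, 0.0]

# Made-up action posterior over the four actions (assumption for illustration only)
q_u = [0.05, 0.10, 0.80, 0.05]

# Mode of a categorical distribution = index of the largest probability
next_idx = argmax(q_u)               # 3
next_name = action_names[next_idx]   # :South
```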

## Running the Agent

Now let's set up the environment and run the agent. We'll create the required tensors (transition tensor B, observation tensor A) and configure the agent:

In [11]:
# Create transition and observation tensors
tensors = (
    reward_observation = create_reward_observation_tensor(), # A matrix
    location_transition = create_location_transition_tensor(), # B matrix
    reward_to_location = create_reward_to_location_mapping()
)

# Configure the agent: plan 4 steps ahead
config = TMazeConfig(
    time_horizon = 4,                  # Planning horizon
    n_episodes = 3,                    # Number of episodes
    n_iterations = 1,                  # VMP iterations per step
    wait_time = 0.2,                   # Visualization delay for GIF generation
    seed = 42,                         # Random seed
    experiment_name = "efe_tmaze_demo"
)

TMazeConfig(4, 3, 1, 0.2, 42, "efe_tmaze_demo")

Now let's run the agent and visualize its behavior. The agent will:
1. Observe its current position and reward cues
2. Perform inference to plan actions
3. Execute the planned action
4. Update beliefs based on new observations

The visualization shows:
- **Maze state**: Current agent position and reward location
- **Reward belief**: Probability that reward is left vs. right
- **Action posterior**: Probability distribution over actions

We'll use a helper function to run the agent and generate an animated visualization:

In [12]:
"""
Run the agent and record its behavior as an animated GIF.
Shows the agent's beliefs and planned actions at each step.
"""
function run_and_record_tmaze_gif(model, tensors, config, seed; filename="tmaze_agent.gif", kwargs...)
    # Setup environment and agent
    rng = StableRNG(seed)
    reward_position = rand(rng, [:left, :right])
    env = create_tmaze(reward_position, (2, 2)) 
    beliefs = initialize_beliefs_tmaze()
    
    previous_result = nothing
    next_action = MazeAgentAction(East()) # Initial placeholder
    next_action_idx = 2

    # Initial observations
    position_obs = convert.(Float64, create_position_observation(env))
    reward_cue = convert.(Float64, create_reward_cue(env))
    reward = 0.0

    println("Recording T-Maze episode (Reward: $reward_position) to $filename...")

    # Animation loop
    reached_goal = false
    stop_next = false
    current_step = 0
    
    anim = @animate for t in config.time_horizon:-1:-1
        if stop_next
            break
        end

        if reached_goal
            # Final frame: goal reached
            p_maze = plot_tmaze(env)
            title!(p_maze, "Goal Reached!\nReward Loc: $reward_position | Reward: $reward", titlefontsize=10)
            
            p_reward = plot_reward_location_belief(beliefs)
            p_action = plot_action_posterior(beliefs, current_step)
            
            p = plot(p_maze, p_reward, p_action, 
                    layout=@layout([a{0.6w} [b; c]]),
                    size=(500, 300))
            
            stop_next = true
            
        else
            # Normal steps: perform inference and visualize
            current_step = config.time_horizon - t + 1
            
            next_action_idx, next_action, result = execute_step_tmaze(
                env, position_obs, reward_cue, beliefs, model, tensors, config,
                nothing, t + 1, previous_result, next_action; 
                kwargs...
            )
            previous_result = result
            action_str = action_to_string(next_action_idx)
            
            # Create visualization before executing action
            p_maze = plot_tmaze(env)
            title!(p_maze, "Step: $current_step | Planned Action: $action_str\nReward Loc: $reward_position | Prev Reward: $reward", titlefontsize=10)
            
            p_reward = plot_reward_location_belief(beliefs)
            p_action = plot_action_posterior(beliefs, current_step)
            
            p = plot(p_maze, p_reward, p_action,
                    layout=@layout([a{0.6w} [b; c]]),
                    size=(500, 300))
            
            # Execute action
            position_obs, reward_cue, reward = step!(env, next_action)
            position_obs = convert.(Float64, position_obs)
            reward_cue = convert.(Float64, reward_cue)
            
            if reward == 1
                reached_goal = true
            end
        end
        
        sleep(config.wait_time)
        p 
    end

    return gif(anim, filename, fps = 1)
end

run_and_record_tmaze_gif
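
The loop bookkeeping in `run_and_record_tmaze_gif` is easy to misread, so the following sketch tabulates it for the configured horizon of 4. The names `time_horizon` and `schedule` are illustrative only. It lists, for each loop value `t`, the displayed step number and the `t + 1` value that the loop passes on as the remaining planning time; the actual loop may stop earlier once the reward is reached.

```julia
time_horizon = 4   # same value as config.time_horizon in this tutorial

# For each loop index t: the step shown in the plot title and the horizon handed to the model
schedule = [(step = time_horizon - t + 1, time_remaining = t + 1)
            for t in time_horizon:-1:-1]

for entry in schedule
    println("step ", entry.step, " -> time_remaining = ", entry.time_remaining)
end
```

The planning horizon therefore shrinks by one at every step instead of staying fixed, which reflects the finite length of an episode.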

In [13]:
# Run the agent and generate visualization
run_and_record_tmaze_gif(
    efe_tmaze_agent, 
    tensors, 
    config, 
    config.seed; 
    constraints_fn = efe_tmaze_agent_constraints, 
    initialization_fn = efe_tmaze_agent_initialization
)
nothing # suppress double visual output in .ipynb file

Recording T-Maze episode (Reward: right) to tmaze_agent.gif...

[ Info: Saved animation to c:\Users\BMW\Documents\activeInference\rxinfer\RxInferExamples.jl\examples\Basic Examples\T-Maze Active Inference\tmaze_agent.gif


![](tmaze_agent.gif)

## Summary

This tutorial demonstrated how to implement Active Inference using message passing in RxInfer.jl:

1. **Model Structure**: We built a factor graph that represents beliefs about states, actions, and observations
2. **Epistemic Priors**: Two prior nodes encode the epistemic value:
   - Action prior encourages exploration
   - State prior encourages visiting informative locations
3. **Message Passing**: Variational message passing simultaneously updates beliefs about states and actions
4. **Action Selection**: Actions are selected by taking the mode of the action posterior

The key insight is that **planning is inference**: by treating actions as latent variables in a probabilistic model, we can use standard inference algorithms to solve planning problems. The epistemic priors naturally encode the exploration-exploitation tradeoff through information-theoretic measures.

## Further Reading

- ["A Message Passing Realization of Expected Free Energy Minimization"](https://arxiv.org/abs/2508.02197) - The paper describing this approach for known parameters
- ["Expected Free Energy-based Planning as Variational Inference"](https://arxiv.org/abs/2504.14898) - The foundational (theoretical) paper that casts EFE-based planning as variational free energy (VFE) minimization
- ["Active Inference is a Subtype of Variational Inference"](https://arxiv.org/abs/2511.18955) - Further work, including parameter learning