### Multiagent Active Blockference

This notebook is an experimental exploration of multi-agent active inference. cadCAD is not used at this point.

We are considering an environment with two agents, Karl and Thomas, who are trying to move to a preferred state without colliding.

In [58]:
import itertools
import numpy as np
import sys

# adding tools to the system path
sys.path.insert(0, '../../')

from blockference.envs.grid_env import GridAgent
from blockference.gridference import ActiveGridference

In [41]:
# start with 2x2 grid
grid = list(itertools.product(range(2), repeat=2))
border = np.sqrt(len(grid)) - 1
pos_dict = {}
for i in range(0, len(grid)):
    pos_dict[i] = grid[i]
print(pos_dict)
num_agents = 2 # start with 2 agents
init_pos = [0, 3] # agents will start at positions 0 and 3

{0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}


In [42]:
# getting the grid positions and indexes for the two agents K & T
init_K = init_pos[0]
init_T = init_pos[1]
init_K_pos = pos_dict[init_K]
init_T_pos = pos_dict[init_T]

In [43]:
# getting the preferred grid positions and indexes for the two agents K & T
# their preferred position will be the one where the other agent starts
pref_K = 3
pref_T = 0
pref_K_pos = pos_dict[pref_K]
pref_T_pos = pos_dict[pref_T]

#### Observations and States
In a single-agent environment, the number of states and the number of observations both equal the number of grid positions: the agent can be at 4 different positions (4 states) and receive 4 different observations.

Adding an extra agent adds extra complexity. We let our agents be strictly non-interacting, i.e. they cannot occupy the same position on the grid at the same time.

Each agent starts with 4 possible states, giving 4*4=16 possible joint states, but the non-interaction restriction removes the 4 joint states in which both agents occupy the same position, hence the **number of possible states is 12**.
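The count can be checked by enumerating the joint positions directly. This is a minimal sketch; the names `n_positions`, `all_pairs` and `joint_states` are illustrative and are not used elsewhere in the notebook.

```python
import itertools

n_positions = 4  # cells of the 2x2 grid

# All ordered pairs (position of Karl, position of Thomas)
all_pairs = list(itertools.product(range(n_positions), repeat=2))

# Drop the pairs in which both agents occupy the same cell
joint_states = [(k, t) for k, t in all_pairs if k != t]

print(len(all_pairs))     # 16
print(len(joint_states))  # 12
```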

In [44]:
# observations and states
n_states = 12
n_observations = 12 # have to do more work on this, might be reduced assuming completely symmetric states

#### Generative Model Tensors
Now we define the tensors describing the generative model. For detailed explanations of what each tensor means/does, see:
https://pymdp-rtd.readthedocs.io/en/latest/notebooks/active_inference_from_scratch.html

In [45]:
# E vector (affordances)
E = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]

#### Alternative way of thinking about states & state modalities (current)
The two modalities of the multiagent POMDP:
- location: "where am I in the world (on the grid)"
- agent awareness: "where is the other agent with respect to me in the world"

These modalities will be reflected in the **A** and **B** matrices.
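As a rough illustration of the second modality, the relative location of the other agent can be derived from two grid positions. The helper `relative_location` below is hypothetical and not part of blockference. It assumes positions are `(y, x)` tuples as in `grid`, that `y` increases upward and `x` increases to the right, and it uses the labels that this notebook assigns to `second_agent_locations`.

```python
def relative_location(own_pos, other_pos):
    """Return where the other agent is relative to own_pos (1-step depth)."""
    dy = other_pos[0] - own_pos[0]
    dx = other_pos[1] - own_pos[1]
    if (dy, dx) == (1, 0):
        return "ABOVE"
    if (dy, dx) == (-1, 0):
        return "BELOW"
    if (dy, dx) == (0, -1):
        return "NEXT_LEFT"
    if (dy, dx) == (0, 1):
        return "NEXT_RIGHT"
    return "NONE"  # diagonal, same cell, or more than one step away


# Starting positions 0 -> (0, 0) and 3 -> (1, 1) are diagonal neighbours
print(relative_location((0, 0), (1, 1)))  # NONE
print(relative_location((0, 0), (0, 1)))  # NEXT_RIGHT
```

With the starting positions used here, each agent initially observes `"NONE"`, because diagonal neighbours are not tracked at 1-step depth.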

In [59]:
# location
n_states = len(grid)
n_observations = len(grid)

In [60]:
# A matrix
# Note: maybe multi-agent actinf does not change B but rather A & D
A = np.eye(n_observations, n_states)
print(A)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [61]:
# other agent relative location (currently 1-step depth)
second_agent_locations = ["NONE", "NEXT_LEFT", "NEXT_RIGHT", "ABOVE", "BELOW"]

n_states_second = len(second_agent_locations)
n_observations_second = len(second_agent_locations)

A_second = np.eye(n_observations_second, n_states_second)
print(A_second)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


In [55]:
# B matrix
# Note: B should only encode prior beliefs about !controllable! transitions between hidden states
# why/how can we assume the actions of the other agents are within controllable transitions?
B = np.zeros((len(grid), len(grid), len(E)))

for action_id, action_label in enumerate(E):

    for curr_state, grid_location in enumerate(grid):

        y, x = grid_location

        if action_label == "DOWN":
            next_y = y - 1 if y > 0 else y
            next_x = x
        elif action_label == "UP":
            next_y = y + 1 if y < border else y
            next_x = x
        elif action_label == "LEFT":
            next_x = x - 1 if x > 0 else x
            next_y = y
        elif action_label == "RIGHT":
            next_x = x + 1 if x < border else x
            next_y = y
        elif action_label == "STAY":
            next_x = x
            next_y = y
        new_location = (next_y, next_x)
        next_state = grid.index(new_location)
        B[next_state, curr_state, action_id] = 1.0
print(B)

[[[0. 1. 1. 0. 1.]
  [0. 0. 1. 0. 0.]
  [0. 1. 0. 0. 0.]
  [0. 0. 0. 0. 0.]]

 [[0. 0. 0. 1. 0.]
  [0. 1. 0. 1. 1.]
  [0. 0. 0. 0. 0.]
  [0. 1. 0. 0. 0.]]

 [[1. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [1. 0. 1. 0. 1.]
  [0. 0. 1. 0. 0.]]

 [[0. 0. 0. 0. 0.]
  [1. 0. 0. 0. 0.]
  [0. 0. 0. 1. 0.]
  [1. 0. 0. 1. 1.]]]
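In the printed array, each outer block is one `next_state`, each row within it is a `curr_state`, and each column is an action in the order of `E`. Since `B[:, curr_state, action_id]` is a distribution over next states, it should sum to 1. A small sanity check, sketched with a hypothetical helper `columns_are_normalized` and a toy tensor rather than the `B` built above:

```python
import numpy as np


def columns_are_normalized(B):
    """True if the next-state probabilities sum to 1 for every (curr_state, action) pair."""
    # Axis 0 indexes the next state, so summing over it gives one total per (curr_state, action)
    return bool(np.allclose(B.sum(axis=0), 1.0))


# Toy 2-state, 1-action example: the single action always leads to state 1
B_demo = np.zeros((2, 2, 1))
B_demo[1, 0, 0] = 1.0
B_demo[1, 1, 0] = 1.0

print(columns_are_normalized(B_demo))  # True
```

The same call can be applied to `B` after the cell above has run.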


The second modality of the **B** matrix holds the transition probabilities given an observation of a second agent.
The observation can be one of:
- "there is an agent *above* me"
- "there is an agent *below* me"
- "there is an agent *to the right* of me"
- "there is an agent *to the left* of me"
- "there is no agent next to me"

This modality should track the *relative* position of the agent with respect to the second agent.
This can then be scaled to arbitrarily many agents by using this matrix for tracking the position of different agents relative to each other.

In the following, K_agent is the agent whose generative model we are building, and T_agent is the agent that K_agent perceives.

In [56]:
second_agent_locations = ["NONE", "NEXT_LEFT", "NEXT_RIGHT", "ABOVE", "BELOW"]

B_second = np.zeros((len(second_agent_locations), len(second_agent_locations), len(E)))
pos_idx = {"NONE": 0, "NEXT_LEFT": 1, "NEXT_RIGHT": 2, "ABOVE": 3, "BELOW": 4}

for action_id, action_label in enumerate(E):

    for curr_state, T_location in enumerate(second_agent_locations):

        if action_label == "UP":
            next_T_location = "NONE" if T_location != "ABOVE" else "ABOVE"
        elif action_label == "DOWN":
            next_T_location = "NONE" if T_location != "BELOW" else "BELOW"
        elif action_label == "LEFT":
            next_T_location = "NONE" if T_location != "NEXT_LEFT" else "NEXT_LEFT"
        elif action_label == "RIGHT":
            next_T_location = "NONE" if T_location != "NEXT_RIGHT" else "NEXT_RIGHT"
        elif action_label == "STAY":
            next_T_location = "NONE"
        new_T_location = next_T_location
        next_state = pos_idx[new_T_location]
        B_second[next_state, curr_state, action_id] = 1.0

print(B_second)

[[[1. 1. 1. 1. 1.]
  [1. 1. 0. 1. 1.]
  [1. 1. 1. 0. 1.]
  [0. 1. 1. 1. 1.]
  [1. 0. 1. 1. 1.]]

 [[0. 0. 0. 0. 0.]
  [0. 0. 1. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 1. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [1. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 1. 0. 0. 0.]]]


In [None]:
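# lookup table: act_pos_next[action][current T location] -> next T location
act_pos_next = {"UP": {
                    "ABOVE": "ABOVE",
                    "BELOW": "NONE",
                    "NEXT_LEFT": "NONE",
                    "NEXT_RIGHT": "NONE",
                    "NONE": "NONE"
                },
                "DOWN": {
                    "ABOVE": "NONE",
                    "BELOW": "BELOW",
                    "NEXT_LEFT": "NONE",
                    "NEXT_RIGHT": "NONE",
                    "NONE": "NONE"
                },
                "LEFT": {
                    "ABOVE": "NONE",
                    "BELOW": "NONE",
                    "NEXT_LEFT": "NEXT_LEFT",
                    "NEXT_RIGHT": "NONE",
                    "NONE": "NONE"
                },
                # RIGHT and STAY were left unfilled; the values below follow the B_second loop above
                "RIGHT": {
                    "ABOVE": "NONE",
                    "BELOW": "NONE",
                    "NEXT_LEFT": "NONE",
                    "NEXT_RIGHT": "NEXT_RIGHT",
                    "NONE": "NONE"
                },
                "STAY": {
                    "ABOVE": "NONE",
                    "BELOW": "NONE",
                    "NEXT_LEFT": "NONE",
                    "NEXT_RIGHT": "NONE",
                    "NONE": "NONE"
                }
               }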
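A lookup table of this shape can replace the `if`/`elif` chain when filling the second-modality transition tensor. The sketch below is self-contained and builds its own table; the names `kept`, `table` and `B_from_table` are illustrative, and the rule that `STAY` always leads to `"NONE"` is taken from the `B_second` loop earlier in the notebook.

```python
import numpy as np

actions = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]
locations = ["NONE", "NEXT_LEFT", "NEXT_RIGHT", "ABOVE", "BELOW"]

# The one relative location that persists under each action; STAY keeps none
kept = {"UP": "ABOVE", "DOWN": "BELOW", "LEFT": "NEXT_LEFT",
        "RIGHT": "NEXT_RIGHT", "STAY": None}

# Nested lookup: table[action][current_location] -> next_location
table = {a: {loc: (loc if loc == kept[a] else "NONE") for loc in locations}
         for a in actions}

# Fill the tensor with the same [next_state, curr_state, action] layout as B
B_from_table = np.zeros((len(locations), len(locations), len(actions)))
for action_id, action in enumerate(actions):
    for curr_state, loc in enumerate(locations):
        next_state = locations.index(table[action][loc])
        B_from_table[next_state, curr_state, action_id] = 1.0
```

Keeping the transitions in a table makes it easier to inspect or change a single entry without touching the loop.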

In [100]:
import tools.utils as utils

# C -> preferred state

# C for agent K (Karl)
C_K = utils.onehot(grid.index(pref_K_pos), len(grid)) # originally len(grid) was n_observations but that doesn't seem correct now

# C for agent T (Thomas)
C_T = utils.onehot(grid.index(pref_T_pos), len(grid))

print(C_K)
print(C_T)

[0. 0. 0. 1.]
[1. 0. 0. 0.]


In [102]:
# D -> initial prior

D_K = utils.onehot(grid.index(init_K_pos), len(grid)) # REVISIT: originally n_states but again did not seem correct
D_T = utils.onehot(grid.index(init_T_pos), len(grid))

print(D_K)
print(D_T)

[1. 0. 0. 0.]
[0. 0. 0. 1.]
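`utils.onehot` comes from the project's `tools` package and is not shown in this notebook. Judging only from the printed outputs, it appears to return a vector of zeros with a single 1 at the given index. A stand-in under that assumption (the name `onehot_sketch` is hypothetical):

```python
import numpy as np


def onehot_sketch(index, size):
    """Return a length-`size` float vector with a 1 at `index` and 0 elsewhere."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


print(onehot_sketch(3, 4))  # [0. 0. 0. 1.]
print(onehot_sketch(0, 4))  # [1. 0. 0. 0.]
```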


In [15]:
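# WIP: environment update for a chosen action (Y, X and agent are not defined yet)
# NOTE: UP/DOWN here move in the opposite y-direction to the B matrix above;
# the two conventions need to be aligned before they are used together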
chosen_action = None
if chosen_action == 0:  # UP

    Y_new = Y - 1 if Y > 0 else Y
    X_new = X

elif chosen_action == 1:  # DOWN

    Y_new = Y + 1 if Y < agent.border else Y
    X_new = X

elif chosen_action == 2:  # LEFT
    Y_new = Y
    X_new = X - 1 if X > 0 else X

elif chosen_action == 3:  # RIGHT
    Y_new = Y
    X_new = X + 1 if X < agent.border else X

elif chosen_action == 4:  # STAY/WATCH (i.e. watch what the other agent will do)
    Y_new, X_new = Y, X
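One way the update could be completed for two agents is sketched below. This is not the blockference implementation. The function `step`, the `MOVES` table and the collision rule (a move into the other agent's cell is cancelled) are all assumptions. The directions follow the **B** matrix convention (UP increases `y`), so swap the signs in `MOVES` to match the opposite convention.

```python
# (dy, dx) per action, following the B matrix convention (assumption)
MOVES = {"UP": (1, 0), "DOWN": (-1, 0), "LEFT": (0, -1), "RIGHT": (0, 1), "STAY": (0, 0)}


def step(pos, action, other_pos, border=1):
    """Return the next (y, x) position, refusing moves into the other agent's cell."""
    dy, dx = MOVES[action]
    # Clip to the grid so that a move off the edge leaves the coordinate unchanged
    y = min(max(pos[0] + dy, 0), border)
    x = min(max(pos[1] + dx, 0), border)
    new_pos = (y, x)
    # Assumed collision rule: stay put if the target cell is occupied
    return pos if new_pos == other_pos else new_pos


print(step((0, 0), "UP", other_pos=(1, 1)))     # (1, 0)
print(step((1, 0), "RIGHT", other_pos=(1, 1)))  # (1, 0), move cancelled
```

The sketch updates one agent at a time; simultaneous moves would need an additional rule for the case where both agents target the same cell.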