# SSP/VSA embeddings in MiniGrid
This package includes wrappers and feature extractors written specifically for the MiniGrid environments.


In the grid world environments, each cell can contain, at most, one object, which is specified by its type, colour, and state. Possible object types include wall, door, key, ball, box, goal, and lava,
with each object having attributes like colour (from a predefined set) and states (open, closed, locked) that are specific to certain object types.

The agent has a limited $7\times7$ field of view and cannot see through walls. The default observations are represented as a $7\times7\times3$ integer matrix, where each vector $(i,j,:)$ denotes the type, colour, and state of the object at position $(i,j)$ within the agent's field of view.  The agent can perform seven actions: turn left, turn right, move forward, pick up an object, drop an object, open a door or box, and complete a task (which is not applicable in the tasks considered here).
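
To make the default observation format concrete, the sketch below decodes a $7\times7\times3$ view into readable (type, colour, state) triples. The three lookup lists are an assumption: they are meant to mirror the index tables in `minigrid.core.constants`, and `describe_view` is a hypothetical helper, not part of this package.

```python
# Index tables assumed to mirror minigrid.core.constants
# (OBJECT_TO_IDX, COLOR_TO_IDX, STATE_TO_IDX); verify against your MiniGrid version.
OBJECT_NAMES = ["unseen", "empty", "wall", "floor", "door", "key",
                "ball", "box", "goal", "lava", "agent"]
COLOUR_NAMES = ["red", "green", "blue", "purple", "yellow", "grey"]
STATE_NAMES = ["open", "closed", "locked"]


def describe_view(image):
    """Return (i, j, type, colour, state) for every non-empty, visible cell.

    `image` is any 7x7x3 nested sequence (a NumPy array or nested lists).
    This is a hypothetical helper written for illustration.
    """
    found = []
    for i, column in enumerate(image):
        for j, cell in enumerate(column):
            obj, colour, state = (int(v) for v in cell)
            name = OBJECT_NAMES[obj]
            if name in ("unseen", "empty"):
                continue
            found.append((i, j, name, COLOUR_NAMES[colour], STATE_NAMES[state]))
    return found


# Hand-made view: every cell empty except a locked blue door and a yellow key.
view = [[[1, 0, 0] for _ in range(7)] for _ in range(7)]
view[3][5] = [4, 2, 2]  # door, blue, locked
view[1][6] = [5, 4, 0]  # key, yellow, open (fixed-state objects use index 0)
decoded = describe_view(view)
print(decoded)
```

With a real environment, the same helper would presumably be called on the `image` entry of the observation dictionary, e.g. `describe_view(observation["image"])`, before any of the wrappers below are applied.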

## Wrappers
- **SSPMiniGridPoseWrapper:** Represents the agent's pose within the environment as an SSP,
\begin{align}
   \phi_{\text{pose}} = \phi \left ( \left [x,y,\theta \right ] \right ) 
\end{align}
where $x,y$ is the agent's global position in the grid and $\theta \in \{0,1,2,3\}$ is an integer indicating the direction the agent is facing. Although the state variables are discrete due to the finite number of possible agent positions and orientations, they are treated as continuous variables in this embedding.
- **SSPMiniGridViewWrapper:** Uses the algebra of HRRs to encode both the agent's field of view and its pose. The encoding combines three parts: a representation of the agent's pose (global position and orientation), $\phi([x,y,\theta])$; a representation of the object the agent is carrying, if any, bound with a semantic pointer, $\mathtt{HAS}$ (in these environments the agent can carry only a single object, so the sum over carried objects in the equations below has at most one term); and a bundled representation of the objects in the agent's field of view and their locations relative to the agent. There are two versions of this:
    - **obj_encoding='allbound':** The complete state encoding is constructed via binding and bundling operations:
\begin{align}
   \Phi_{\text{view}} = \phi([x,y,\theta]) + \mathtt{HAS} \, \circledast &\sum_{\text{objects carried}}  \mathtt{ITEM}_i \circledast \mathtt{COLOUR}_i \circledast \mathtt{STATE}_i  \\
     + &\sum_{\text{objects in view}} \Delta\phi_i \circledast \mathtt{ITEM}_i \circledast \mathtt{COLOUR}_i \circledast \mathtt{STATE}_i. 
\end{align}
The vector, $\mathtt{ITEM}$, indicates the 'type' of an object in view, and can take on values $\mathtt{DOOR}$, $\mathtt{KEY}$, $\mathtt{BALL}$, $\mathtt{BOX}$, $\mathtt{GOAL}$, or $\mathtt{LAVA}$. The vector, $\mathtt{COLOUR}$, indicates the colour of the associated object. The vector, $\mathtt{STATE}$, indicates the 'state' of an object, and can take on values $\mathtt{OPEN}$, $\mathtt{CLOSED}$, or $\mathtt{LOCKED}$ (objects with fixed states, such as lava or balls, are encoded as being in the 'open' state). Finally, $\Delta\phi_i$ encodes an object-in-view's location relative to the agent.
    - **obj_encoding='slotfiller':** The complete state encoding is constructed via binding and bundling operations in a slot-filler style:
\begin{align}
    \Phi_{\text{slot-filler}} = \phi([x,y,\theta]) + \mathtt{HAS} \, \circledast &\sum_{\text{objects carried}} \left ( \mathtt{ITEM} \circledast \mathtt{I}_i + \mathtt{COLOUR} \circledast \mathtt{C}_i + \mathtt{STATE} \circledast \mathtt{S}_i \right )\\
     + &\sum_{\text{objects in view}} \Delta\phi_i \circledast \left ( \mathtt{ITEM} \circledast \mathtt{I}_{i} + \mathtt{COLOUR} \circledast \mathtt{C}_i + \mathtt{STATE} \circledast \mathtt{S}_i \right ),   
\end{align}
where $\mathtt{ITEM}$, $\mathtt{COLOUR}$, and $\mathtt{STATE}$ are random vectors that represent **slots** -- they indicate the type of the vector they are bound with -- while $\mathtt{I}_i$, $\mathtt{C}_i$, and $\mathtt{S}_i$ denote the actual **values** of item type, colour, and state. The main difference between $\Phi_{\text{slot-filler}}$ and the prior HRR embedding, $\Phi_{\text{view}}$, is representational overlap.
In $\Phi_{\text{view}}$, objects differing in any attribute (e.g., an open blue door versus a closed blue door) are dissimilar, whereas in $\Phi_{\text{slot-filler}}$, objects sharing properties have greater similarity (e.g., the representation of an open blue door is more similar to a closed blue door or a blue key than to a red box).
    - **Local vs global:** (view_type='local' or 'global') In local mode, we use $\Delta\phi_i$, an object-in-view's location relative to the agent. In global mode, we use $\phi_i$ instead, an object-in-view's global location in the environment.
- **SSPMiniGridMissionWrapper:** Adds a representation of the mission string to the above encoding. The mission string is part of the observation space in some MiniGrid tasks and all BabyAI tasks.
    - Examples of mission statements: “go to the {color} door”, “pick up the {color} {type}”, “go to a/the {color} {type}” + “and go to a/the {color} {type}” + “, then go to a/the {color} {type}” + “and go to a/the {color} {type}”
    - This class is a work-in-progress. Currently, regex is used to decompose the string, looking for particular command patterns (e.g., "go to _", "fetch a _", "pick up a _", "open the _", "put the _ near the _") as well as object and colour names. The idea is to break the mission statement into simple subcommands that each involve a single object, and to bind a command-type representation (e.g., $\mathtt{GO\_TO}$, $\mathtt{PICK\_UP}$, $\mathtt{OPEN}$) to the object colour and type representations (those used in the view encoding). This class will likely change in future versions of this package.
- **SSPMiniGridWrapper:** An interface to select one of the above. Takes the inputs encode_pose (True/False), encode_view (True/False), and encode_mission (True/False). Currently, encode_mission=True with encode_view=False is not supported.
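
The difference in representational overlap between the two object encodings can be checked numerically. The following is a minimal sketch, not the package's implementation: it omits the pose, $\mathtt{HAS}$, and $\Delta\phi_i$ terms and encodes single objects only. As an assumption made for simplicity, vectors are random unitary vectors stored as their Fourier coefficients, so that binding (circular convolution) becomes an element-wise product. All names (`random_unitary`, `bind`, `bundle`, `vocab`) are hypothetical.

```python
import cmath
import math
import random

DIM = 1025  # odd, so every non-zero frequency has a distinct conjugate partner


def random_unitary(rng, dim=DIM):
    """Fourier coefficients of a random real-valued unitary vector."""
    half = (dim - 1) // 2
    positive = [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(half)]
    negative = [z.conjugate() for z in reversed(positive)]  # conjugate symmetry
    return [1 + 0j] + positive + negative


def bind(*vectors):
    """Circular convolution, computed as a product of Fourier coefficients."""
    out = [1 + 0j] * len(vectors[0])
    for v in vectors:
        out = [a * b for a, b in zip(out, v)]
    return out


def bundle(*vectors):
    """Superposition (element-wise sum)."""
    return [sum(zs) for zs in zip(*vectors)]


def dot(a, b):
    """Real-space dot product, obtained through Parseval's theorem."""
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


rng = random.Random(0)
names = ["ITEM", "COLOUR", "STATE", "DOOR", "KEY", "BOX",
         "BLUE", "RED", "OPEN", "CLOSED"]
vocab = {name: random_unitary(rng) for name in names}


def allbound(item, colour, state):
    return bind(vocab[item], vocab[colour], vocab[state])


def slotfiller(item, colour, state):
    return bundle(bind(vocab["ITEM"], vocab[item]),
                  bind(vocab["COLOUR"], vocab[colour]),
                  bind(vocab["STATE"], vocab[state]))


reference = ("DOOR", "BLUE", "OPEN")
others = {
    "closed blue door": ("DOOR", "BLUE", "CLOSED"),
    "open blue key": ("KEY", "BLUE", "OPEN"),  # fixed-state objects use OPEN
    "closed red box": ("BOX", "RED", "CLOSED"),
}

similarity = {}
for encode in (allbound, slotfiller):
    for label, attrs in others.items():
        similarity[(encode.__name__, label)] = cosine(encode(*reference), encode(*attrs))

for (scheme, label), value in similarity.items():
    print(f"{scheme:>10} | open blue door vs {label:<16} | {value:+.2f}")
```

Under these assumptions, the all-bound similarities should all sit near zero, while the slot-filler similarity should be roughly the fraction of shared attributes (about two thirds for the closed blue door and the blue key, about zero for the red box).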

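The regex-based decomposition described for the mission wrapper can be sketched as follows. This is an illustration under stated assumptions, not the package's actual parser: the pattern, the `COMMANDS` table, and `parse_mission` are hypothetical, and only three command types are covered. Each returned triple names the symbols that would then be bound together (command type, colour, object type).

```python
import re

# Hypothetical mapping from command phrases to semantic-pointer names.
COMMANDS = {"go to": "GO_TO", "pick up": "PICK_UP", "open": "OPEN"}
COLOURS = "red|green|blue|purple|yellow|grey"
TYPES = "door|key|ball|box"

# One subcommand: a command phrase, an article, an optional colour, an object type.
SUBCOMMAND = re.compile(
    rf"\b(?P<command>go to|pick up|open)\s+(?:a|the)\s+"
    rf"(?:(?P<colour>{COLOURS})\s+)?(?P<type>{TYPES})\b"
)


def parse_mission(mission):
    """Split a mission string into (command, colour, type) triples.

    The colour is None when the mission does not name one.
    """
    return [
        (COMMANDS[m["command"]], m["colour"], m["type"])
        for m in SUBCOMMAND.finditer(mission.lower())
    ]


parsed = parse_mission("go to the red door and pick up a blue key")
print(parsed)
```

Compound missions joined by "and" or ", then" yield one triple per subcommand, in the order they appear in the string. Patterns with two objects, such as "put the _ near the _", would need a dedicated rule and are not handled here.
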
In [2]:
import gymnasium as gym
import sys, os
sys.path.insert(1, os.path.dirname(os.getcwd()))
os.chdir("..")
from vsagym.wrappers import minigrid_wrappers


# View-only embedding (no pose, no mission) via the general interface
env = gym.make('MiniGrid-Dynamic-Obstacles-5x5-v0')
env = minigrid_wrappers.SSPMiniGridWrapper(env,shape_out=251,
                encode_pose=False,encode_view=True,encode_mission=False)
observation, _ = env.reset()
for t in range(5):
    action = env.action_space.sample()
    observation, _, terminated, truncated, _ = env.step(action)
    if terminated or truncated or t == 4:
        observation, _ = env.reset()
env.close()

In [3]:
# Pose-only embedding
env = gym.make('MiniGrid-Empty-5x5-v0')
env = minigrid_wrappers.SSPMiniGridPoseWrapper(env,
                             shape_out=251,
                             decoder_method='from-set')
observation, _ = env.reset()
for t in range(5):
    action = env.action_space.sample()
    observation, _, terminated, truncated, _ = env.step(action)
    if terminated or truncated or t == 4:
        observation, _ = env.reset()
env.close()

# View embedding: all-bound object encoding, object locations relative to the agent
env = gym.make('MiniGrid-KeyCorridorS3R1-v0')
env = minigrid_wrappers.SSPMiniGridViewWrapper(env,
                                               obj_encoding='allbound',
                                               view_type='local',
                                               shape_out=251,
                                               decoder_method='from-set')
observation, _ = env.reset()
for t in range(5):
    action = env.action_space.sample()
    observation, _, terminated, truncated, _ = env.step(action)
    if terminated or truncated or t == 4:
        observation, _ = env.reset()
env.close()

# View embedding: all-bound object encoding, global object locations
env = gym.make('MiniGrid-KeyCorridorS3R1-v0', render_mode='rgb_array')
env = minigrid_wrappers.SSPMiniGridViewWrapper(env,
                                               obj_encoding='allbound',
                                               view_type='global',
                                               shape_out=251,
                                               decoder_method='from-set')
observation, _ = env.reset()
for t in range(5):
    action = env.action_space.sample()
    observation, _, terminated, truncated, _ = env.step(action)
    if terminated or truncated or t == 4:
        observation, _ = env.reset()
env.close()

# View embedding: slot-filler object encoding, object locations relative to the agent
env = gym.make('MiniGrid-KeyCorridorS3R1-v0')
env = minigrid_wrappers.SSPMiniGridViewWrapper(env,
                                               obj_encoding='slotfiller',
                                               view_type='local',
                                               shape_out=251,
                                               decoder_method='from-set')
observation, _ = env.reset()
for t in range(5):
    action = env.action_space.sample()
    observation, _, terminated, truncated, _ = env.step(action)
    if terminated or truncated or t == 4:
        observation, _ = env.reset()
env.close()

