## Overview

This Python module defines a set of abstract actions for hierarchical reinforcement learning (HRL) in a 2D grid world. It includes classes and protocols to model abstract actions, grid movements, and one-hot encoded grid movements. The main goal is to provide a flexible framework for experimenting with hierarchical decision-making agents in grid-based environments.


## `AbstractActions` Protocol

This protocol defines the structure that abstract action classes should follow. It includes methods to mask observations, evaluate termination (`beta`), and score the compatibility of actions with observed transitions.


#### Methods:
- **`mask(self, obs: ObsType) -> ObsType`**: Masks the observation, potentially modifying its representation.
- **`beta(self, action: ActType, start_obs: ObsType, next_obs: ObsType) -> tuple[bool, bool]`**: Evaluates termination for an action given the start and next observations. As the signature shows, it returns a pair of boolean flags rather than numeric probabilities.
- **`compatibility(self, action: ActType, start_obs: ObsType, next_obs: ObsType) -> float`**: Evaluates the compatibility of an action with the observed transition.
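
The three signatures above can be written down as a `typing.Protocol`. The following is a minimal sketch, not the module's actual source: `ObsType` and `ActType` are aliased to `Any` as placeholders, and `IdentityActions` is a hypothetical class that exists only to show structural conformance.

```python
from typing import Any, Protocol, runtime_checkable

# Placeholder aliases (assumption): the real module may define these differently.
ObsType = Any
ActType = Any


@runtime_checkable
class AbstractActions(Protocol):
    def mask(self, obs: ObsType) -> ObsType: ...

    def beta(
        self, action: ActType, start_obs: ObsType, next_obs: ObsType
    ) -> tuple[bool, bool]: ...

    def compatibility(
        self, action: ActType, start_obs: ObsType, next_obs: ObsType
    ) -> float: ...


class IdentityActions:
    """Hypothetical implementation used only to illustrate the protocol."""

    def mask(self, obs: ObsType) -> ObsType:
        # Leave the observation unchanged.
        return obs

    def beta(self, action, start_obs, next_obs) -> tuple[bool, bool]:
        # Illustrative rule: first flag is set when the observation changed.
        return (start_obs != next_obs, False)

    def compatibility(self, action, start_obs, next_obs) -> float:
        # Every transition is treated as fully compatible.
        return 1.0


actions = IdentityActions()
conforms = isinstance(actions, AbstractActions)
```

Because the protocol is structural, `IdentityActions` does not need to inherit from `AbstractActions`; having the three methods is enough for the `isinstance` check to pass.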


## Classes

### 1. `Grid2dActions` Enum

An enumeration representing 2D grid movement actions: LEFT, DOWN, RIGHT, and UP. It also provides a method to convert actions to coordinate deltas.

#### Methods:
- **`to_delta(self) -> tuple[int, int]`**: Converts the grid movement action to a coordinate delta.
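
As an illustration of the enum and its `to_delta` method, here is a minimal sketch. The integer values and the `(row, col)` delta convention (row 0 at the top, so DOWN increases the row) are assumptions made for this example and may differ from the module.

```python
from enum import IntEnum


class Grid2dActions(IntEnum):
    # Integer values are assumed; only the member names come from the text.
    LEFT = 0
    DOWN = 1
    RIGHT = 2
    UP = 3

    def to_delta(self) -> tuple[int, int]:
        # Assumed convention: deltas are (row, col) with row 0 at the top.
        deltas = {
            Grid2dActions.LEFT: (0, -1),
            Grid2dActions.DOWN: (1, 0),
            Grid2dActions.RIGHT: (0, 1),
            Grid2dActions.UP: (-1, 0),
        }
        return deltas[self]


# Apply a move to a position by adding the delta component-wise.
row, col = 2, 2
d_row, d_col = Grid2dActions.RIGHT.to_delta()
new_position = (row + d_row, col + d_col)
```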


### 2. `Grid2dMovement` Class

A concrete implementation of the `AbstractActions` protocol for 2D grid movements. It models basic grid movements, such as LEFT, DOWN, RIGHT, and UP, along with termination probabilities and compatibility metrics.

#### Attributes:
- **`action_space: ClassVar[spaces.Discrete]`**: Class variable representing the discrete action space (LEFT, DOWN, RIGHT, UP).
- **`cell_shape: tuple[int, int]`**: The shape of each grid cell.
- **`grid_shape: tuple[int, int]`**: The overall shape of the 2D grid.
- **`p_termination: float`**: The probability of termination for an action.
- **`reward: float`**: The reward associated with a compatible action.
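
The attribute names suggest that the grid is partitioned into cells of size `cell_shape`, with an abstract movement corresponding to a step between adjacent cells. That reading is an assumption, not something the text confirms. Under it, the relationship between positions, cells, and movement deltas might look like the sketch below; `cell_of` and `moved_by` are hypothetical helpers, not part of the module.

```python
def cell_of(position: tuple[int, int], cell_shape: tuple[int, int]) -> tuple[int, int]:
    """Map a fine-grained (row, col) position to the index of its cell."""
    row, col = position
    return (row // cell_shape[0], col // cell_shape[1])


def moved_by(
    delta: tuple[int, int],
    start_pos: tuple[int, int],
    next_pos: tuple[int, int],
    cell_shape: tuple[int, int],
) -> bool:
    """Check whether the transition crossed cells by exactly `delta`."""
    start_cell = cell_of(start_pos, cell_shape)
    next_cell = cell_of(next_pos, cell_shape)
    observed = (next_cell[0] - start_cell[0], next_cell[1] - start_cell[1])
    return observed == delta


cell_shape = (5, 5)
right = (0, 1)  # assumed (row, col) delta for a move to the right

# Stepping from column 4 to column 5 crosses into the next cell.
crossed = moved_by(right, (2, 4), (2, 5), cell_shape)
# Stepping from column 3 to column 4 stays inside the same cell.
stayed = moved_by(right, (2, 3), (2, 4), cell_shape)
```

A compatibility score could then, for instance, return `reward` when the check succeeds, but how the module actually combines these quantities is not stated here.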

### 3. `Grid2dMovementOnehot` Class

A variation of `Grid2dMovement` that uses one-hot encoding for agent positions in the grid. It allows for additional customization by adding a validity channel.

#### Attributes:
- **`agent_channel: int`**: The channel index representing the agent's position in the one-hot encoding.
- **`add_valid_channel: bool`**: Whether to add a validity channel to the observation.

#### Methods:
- **`obs2coord(self, obs: ObsType) -> tuple[int, int]`**: Converts a one-hot encoded observation to agent coordinates.
- **`mask(self, obs: ObsType) -> ObsType`**: Masks the one-hot encoded observation, potentially adding a validity channel.
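
To make the coordinate conversion concrete, here is a minimal, dependency-free sketch of what an `obs2coord`-style lookup could do. It assumes a channel-first layout (`obs[channel][row][col]`) and uses nested lists in place of arrays; both choices are assumptions for illustration, and the function below is a standalone stand-in rather than the class method itself.

```python
def obs2coord(obs: list[list[list[int]]], agent_channel: int = 0) -> tuple[int, int]:
    """Return the (row, col) of the active cell in the agent channel."""
    for row_idx, row in enumerate(obs[agent_channel]):
        for col_idx, value in enumerate(row):
            if value == 1:
                return (row_idx, col_idx)
    raise ValueError("agent channel contains no active cell")


# Two channels over a 2x3 grid: channel 0 marks the agent, channel 1 is filler.
onehot_observation = [
    [[0, 0, 0], [0, 0, 1]],
    [[1, 1, 1], [1, 1, 1]],
]
agent_coords = obs2coord(onehot_observation, agent_channel=0)
```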

## Examples of Implementation

### 1. Creating and Using `Grid2dMovement`

In [None]:
import sys
sys.path.append("..")

In [1]:

from mango.actions.grid2D_abstract_actions import Grid2dMovement

# Define grid parameters
cell_shape = (5, 5)
grid_shape = (10, 10)

# Create Grid2dMovement instance
grid_movement = Grid2dMovement(cell_shape=cell_shape, grid_shape=grid_shape)

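# Use methods defined in AbstractActions protocol.
# `observation`, `action`, `start_obs`, and `next_obs` are placeholders
# for values obtained from the environment; they are not defined here.
masked_observation = grid_movement.mask(observation)
termination_flags = grid_movement.beta(action, start_obs, next_obs)  # tuple[bool, bool]
compatibility_score = grid_movement.compatibility(action, start_obs, next_obs)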



### 2. Creating and Using `Grid2dMovementOnehot`


In [None]:
from mango.actions.grid2D_abstract_actions import Grid2dMovementOnehot

# Define one-hot encoding parameters
agent_channel = 0
add_valid_channel = True

# Create Grid2dMovementOnehot instance
# (reuses cell_shape and grid_shape from the Grid2dMovement example)
grid_movement_onehot = Grid2dMovementOnehot(
    cell_shape=cell_shape,
    grid_shape=grid_shape,
    agent_channel=agent_channel,
    add_valid_channel=add_valid_channel,
)

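# Use additional methods for one-hot encoding.
# `onehot_observation` is a placeholder for a one-hot encoded observation
# obtained from the environment.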
agent_coords = grid_movement_onehot.obs2coord(onehot_observation)
masked_onehot_observation = grid_movement_onehot.mask(onehot_observation)