## DynamicPolicy Protocol


### `DynamicPolicy` Protocol

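The `DynamicPolicy` protocol describes the interface of a command-conditioned ("dynamic") policy in a reinforcement learning environment. Implementations expose a discrete command space and a discrete action space. They choose an action for a given command and state with `get_action`, and they update themselves with `train`. `train` takes pairs of transitions, a reward generator, and an optional emphasis function, which by default weights every input as 1.0.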
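
`ActionCompatibility` is not defined in this section. The example at the end of this section calls `reward_generator(comand, start_state, next_state)` and gets a float back, so the sketch below models that shape as a callable protocol. The signature is an assumption inferred from that example, not the project's definition. The sketch also shows that the default `emphasis` returns 1.0 for every input.

```python
from typing import Callable, Protocol, Sequence


class ActionCompatibility(Protocol):
    """Hypothetical shape: scores how well a (start, next) state pair fits a command."""

    def __call__(
        self, comand: int, start_state: Sequence[float], next_state: Sequence[float]
    ) -> float: ...


def reward_generator(
    comand: int, start_state: Sequence[float], next_state: Sequence[float]
) -> float:
    # Toy rule copied from the example: reward 1.0 when the command is 1, else 0.0.
    return float(comand == 1)


# A plain function with matching parameters satisfies the callable protocol structurally.
reward: ActionCompatibility = reward_generator

# The default emphasis used by `train` gives every input the same weight.
default_emphasis: Callable[[int], float] = lambda _: 1.0

weighted = [default_emphasis(c) * reward(c, [0.0], [1.0]) for c in range(3)]
print(weighted)  # [0.0, 1.0, 0.0]
```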


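```python
from typing import Callable, Protocol, Sequence

import gymnasium as gym
import numpy.typing as npt

# `Transition` and `ActionCompatibility` are assumed to be defined elsewhere
# in the project; they must be in scope before this cell runs.


class DynamicPolicy(Protocol):
    """Interface for a policy whose behavior is selected by a discrete command."""

    comand_space: gym.spaces.Discrete  # the commands the policy can follow
    action_space: gym.spaces.Discrete  # the actions it can emit

    def get_action(self, comand: int, state: npt.NDArray) -> int:
        """Return an action from `action_space` for `comand` in `state`."""
        ...

    def train(
        self,
        transitions: Sequence[tuple[Transition, Transition]],
        reward_generator: ActionCompatibility,
        emphasis: Callable[[int], float] = lambda _: 1.0,
    ) -> None:
        """Update the policy from transition pairs, a reward generator, and emphasis weights."""
        ...
```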


**Usage:**

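Concrete dynamic policy classes implement this protocol by providing the two spaces and the two methods. Because `DynamicPolicy` is a `typing.Protocol`, static type checkers accept any class whose attributes and method signatures match, even if it does not inherit from `DynamicPolicy`. The `DQnetPolicyMapper` class below inherits from it explicitly.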
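
The class below illustrates that structural match. To keep the snippet self-contained, it checks against `CheckableDynamicPolicy`, a simplified copy of the protocol decorated with `@runtime_checkable`. That decorator enables `isinstance`, but it only checks that the members exist, not their signatures. `Discrete` is a hypothetical stand-in for `gym.spaces.Discrete` that exposes only `n`. `RandomDynamicPolicy` is a hypothetical policy that ignores the command and has nothing to train. None of these names come from the original code.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Protocol, Sequence, runtime_checkable


@dataclass(frozen=True)
class Discrete:
    """Hypothetical stand-in for gym.spaces.Discrete: the values 0..n-1."""

    n: int


@runtime_checkable
class CheckableDynamicPolicy(Protocol):
    # Simplified mirror of DynamicPolicy; Transition/ActionCompatibility replaced by Any.
    comand_space: Discrete
    action_space: Discrete

    def get_action(self, comand: int, state: Sequence[float]) -> int: ...

    def train(
        self,
        transitions: Sequence[Any],
        reward_generator: Callable[..., float],
        emphasis: Callable[[int], float] = lambda _: 1.0,
    ) -> None: ...


class RandomDynamicPolicy:
    """Ignores command and state and picks a uniformly random action."""

    def __init__(self, comand_space: Discrete, action_space: Discrete, seed: int = 0) -> None:
        self.comand_space = comand_space
        self.action_space = action_space
        self._rng = random.Random(seed)

    def get_action(self, comand: int, state: Sequence[float]) -> int:
        return self._rng.randrange(self.action_space.n)

    def train(self, transitions, reward_generator, emphasis=lambda _: 1.0) -> None:
        # A random policy has nothing to learn.
        return None


policy = RandomDynamicPolicy(Discrete(3), Discrete(2))
print(isinstance(policy, CheckableDynamicPolicy))  # True: matched by structure, not inheritance
print(policy.get_action(1, [0.1, 0.2, 0.3]) in range(2))  # True
```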


## Concrete Dynamic Policy Class

### `DQnetPolicyMapper`

The `DQnetPolicyMapper` class is a concrete implementation of the `DynamicPolicy` protocol. It manages a collection of policies, one for each command, and delegates action selection and training to these policies.


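```python
from dataclasses import dataclass, field
from typing import Callable, Sequence

import gymnasium as gym
import numpy.typing as npt

# `Policy`, `DQnetPolicy`, `Transition`, and `ActionCompatibility` are assumed
# to be defined elsewhere in the project.


@dataclass(eq=False, slots=True)
class DQnetPolicyMapper(DynamicPolicy):
    comand_space: gym.spaces.Discrete
    action_space: gym.spaces.Discrete

    # Not constructor arguments: set from the default or in __post_init__.
    exploration_rate: float = field(init=False, default=1.0, repr=False)
    policies: dict[int, Policy] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Builds one DQnetPolicy per command (implementation omitted).
        ...

    def get_action(self, comand: int, state: npt.NDArray) -> int:
        # Delegates to self.policies[comand] (implementation omitted).
        ...

    def train(
        self,
        transitions: Sequence[tuple[Transition, Transition]],
        reward_generator: ActionCompatibility,
        emphasis: Callable[[int], float] = lambda _: 1.0,
    ) -> None:
        # Trains the policy associated with each command (implementation omitted).
        ...
```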



**Usage:**

Use `DQnetPolicyMapper` when each command should be served by its own policy. The mapper routes `get_action` and `train` calls to the policy registered for each command, so callers work with a single `DynamicPolicy` object instead of managing the per-command policies themselves.


### `__post_init__` Method

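The dataclass-generated `__init__` calls `__post_init__` automatically after it assigns the constructor fields. Here `__post_init__` populates `policies` with one `DQnetPolicy` per command in `comand_space`, which is why `policies` is declared with `field(init=False)`.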

### `get_action` Method

The `get_action` method returns an action for the given command and current state by delegating to the policy that `policies` stores for that command.
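
A minimal sketch of that delegation, assuming each per-command policy offers its own `get_action(state)` method (the actual `Policy` interface is not shown in this section). `ConstantPolicy` and `DelegatingMapper` are hypothetical, and reporting an unknown command as `ValueError` instead of a bare `KeyError` is a choice made for this sketch.

```python
from typing import Sequence


class ConstantPolicy:
    """Hypothetical per-command policy that always returns the same action."""

    def __init__(self, action: int) -> None:
        self.action = action

    def get_action(self, state: Sequence[float]) -> int:
        return self.action


class DelegatingMapper:
    def __init__(self, policies: dict[int, ConstantPolicy]) -> None:
        self.policies = policies

    def get_action(self, comand: int, state: Sequence[float]) -> int:
        # Look up the policy responsible for this command and let it choose.
        try:
            policy = self.policies[comand]
        except KeyError:
            raise ValueError(f"unknown command: {comand}") from None
        return policy.get_action(state)


mapper = DelegatingMapper({0: ConstantPolicy(0), 1: ConstantPolicy(1), 2: ConstantPolicy(1)})
print([mapper.get_action(c, [0.1, 0.2, 0.3]) for c in range(3)])  # [0, 1, 1]
```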

### `train` Method

The `train` method takes a sequence of transition pairs, a reward generator, and an emphasis function, and trains each command's policy on those transitions with respect to the command that policy serves.
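
The section does not specify how rewards and emphasis reach each policy. The sketch below assumes one plausible scheme: every transition pair is relabeled once per command. `reward_generator(comand, start_state, next_state)` (the call shape used in the example at the end of this section) supplies the reward, and `emphasis(comand)` is attached as a per-sample weight. Transition pairs are simplified to `(start_state, next_state)` tuples, and `RecordingPolicy` and `train_all` are hypothetical names.

```python
from typing import Callable, Sequence

State = tuple[float, ...]
Sample = tuple[State, State, float, float]  # (start, next, reward, weight)


class RecordingPolicy:
    """Hypothetical per-command policy that only records what it is trained on."""

    def __init__(self) -> None:
        self.samples: list[Sample] = []

    def update(self, samples: list[Sample]) -> None:
        # A real policy would take a learning step here; this stub just stores the batch.
        self.samples.extend(samples)


def train_all(
    policies: dict[int, RecordingPolicy],
    transitions: Sequence[tuple[State, State]],
    reward_generator: Callable[[int, State, State], float],
    emphasis: Callable[[int], float] = lambda _: 1.0,
) -> None:
    for comand, policy in policies.items():
        weight = emphasis(comand)
        # Relabel every (start, next) pair with the reward it earns under this command.
        batch = [
            (start, nxt, reward_generator(comand, start, nxt), weight)
            for start, nxt in transitions
        ]
        policy.update(batch)


policies = {c: RecordingPolicy() for c in range(3)}
transitions = [((0.0,), (1.0,)), ((1.0,), (2.0,))]
train_all(
    policies,
    transitions,
    reward_generator=lambda c, s, n: float(c == 1),
    emphasis=lambda c: 2.0 if c == 1 else 1.0,
)
print([(r, w) for _, _, r, w in policies[1].samples])  # [(1.0, 2.0), (1.0, 2.0)]
print([(r, w) for _, _, r, w in policies[0].samples])  # [(0.0, 1.0), (0.0, 1.0)]
```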


## Example Usage

Here's an example of how to use the `DQnetPolicyMapper` class:

```python
import gymnasium as gym
import numpy as np

# Create a mapper for 3 commands and 2 actions
comand_space = gym.spaces.Discrete(3)
action_space = gym.spaces.Discrete(2)
policy_mapper = DQnetPolicyMapper(comand_space=comand_space, action_space=action_space)

# Placeholder transition pairs (for illustration only): fill in real `Transition` fields
transitions = [(Transition(...), Transition(...)) for _ in range(100)]


# Toy reward generator with the (comand, start_state, next_state) signature
def reward_generator(comand, start_state, next_state):
    return float(comand == 1)  # reward 1.0 when the command is 1, else 0.0


# Train the per-command policies
policy_mapper.train(transitions, reward_generator)

# Select an action for command 1 in a sample state
comand = 1
state = np.array([0.1, 0.2, 0.3])
action = policy_mapper.get_action(comand, state)
```

In this example, we create a `DQnetPolicyMapper` for three commands and two actions, build placeholder transition pairs, and train the per-command policies with a toy reward generator. We then call `get_action` to select an action for command 1 in a sample state.