## Policy Protocol

### `Policy` Protocol

The `Policy` protocol defines the two methods every reinforcement learning policy must provide: `get_action`, which maps a state to an action index, and `train`, which updates the policy from a sequence of transitions.

In [3]:
import numpy as np
import sys
sys.path.append('../')  # make the `mango` package in the parent directory importable
from mango.utils import Transition


In [4]:
from dataclasses import dataclass, field
from typing import Any, Protocol, Sequence

import gymnasium as gym
import numpy.typing as npt


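class Policy(Protocol):
    """Interface that every policy must satisfy."""

    def get_action(self, state: npt.NDArray) -> int:
        """Return the index of the action to take in `state`."""
        ...

    def train(self, transitions: Sequence[Transition]) -> None:
        """Update the policy from a sequence of transitions."""
        ...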

**Usage:**

Any class that provides `get_action` and `train` with these signatures satisfies the protocol. Inheriting from `Policy` explicitly, as the concrete classes in this notebook do, is optional but makes the intent visible.
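
To illustrate what adhering to a protocol means in practice, here is a minimal, dependency-free sketch. `PolicyLike`, `ConstantPolicy`, and `act` are hypothetical names introduced only for this example; `PolicyLike` mirrors the two methods of `Policy` with simplified types and is not part of the notebook's code.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class PolicyLike(Protocol):
    """Stand-in mirroring the two methods of the `Policy` protocol."""

    def get_action(self, state: Sequence[float]) -> int: ...

    def train(self, transitions: Sequence[object]) -> None: ...


class ConstantPolicy:
    """Hypothetical policy that always returns the same action.

    It does not inherit from PolicyLike: providing matching methods
    is enough for a static type checker to accept it.
    """

    def __init__(self, action: int) -> None:
        self.action = action

    def get_action(self, state: Sequence[float]) -> int:
        return self.action

    def train(self, transitions: Sequence[object]) -> None:
        pass  # nothing to learn


def act(policy: PolicyLike, state: Sequence[float]) -> int:
    """Accept any object that structurally satisfies the protocol."""
    return policy.get_action(state)


policy = ConstantPolicy(action=2)
chosen = act(policy, [0.1, 0.2, 0.3])

# runtime_checkable only verifies that the methods exist, not their signatures.
conforms = isinstance(policy, PolicyLike)
```

The `isinstance` check is available only because of the `@runtime_checkable` decorator, which the `Policy` protocol in this notebook does not use; without it, conformance is checked by static type checkers alone.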

## Concrete Policy Classes



### `RandomPolicy`

The `RandomPolicy` class is a concrete implementation of the `Policy` protocol that selects random actions from a given action space.


In [5]:
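@dataclass(eq=False, slots=True)
class RandomPolicy(Policy):
    action_space: gym.spaces.Discrete

    def get_action(self, state: Any) -> int:
        # The state is ignored: sample an action from the action space
        return int(self.action_space.sample())

    def train(self, transitions: Sequence[Transition]) -> None:
        # A random policy has nothing to learn
        pass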

**Usage:**

In [6]:
# Create a random policy for a Discrete action space
action_space = gym.spaces.Discrete(4)
random_policy = RandomPolicy(action_space=action_space)

# Get a random action
state = np.array([0.1, 0.2, 0.3])
action = random_policy.get_action(state)
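
The usage above calls `get_action` once. As a rough sketch of how `get_action` and `train` might be combined, the loop below collects transitions and hands them to the policy. Everything in it is a hypothetical stand-in chosen to keep the example dependency-free: `ToyTransition` (the fields of mango's `Transition` are not shown here and may differ), `ToyRandomPolicy` (which uses `random.Random` instead of a gymnasium space), and the toy chain dynamics inside `rollout`.

```python
import random
from typing import NamedTuple, Sequence


class ToyTransition(NamedTuple):
    """Hypothetical record; the real `Transition` fields may differ."""
    state: int
    action: int
    reward: float
    next_state: int


class ToyRandomPolicy:
    """Uniform-random policy over `n_actions` that counts what it was trained on."""

    def __init__(self, n_actions: int, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        self.seen = 0

    def get_action(self, state: int) -> int:
        return self.rng.randrange(self.n_actions)

    def train(self, transitions: Sequence[ToyTransition]) -> None:
        self.seen += len(transitions)  # a learning policy would update here


def rollout(policy: ToyRandomPolicy, n_steps: int) -> list[ToyTransition]:
    """Walk a toy chain: action 1 moves one step right, any other action stays."""
    state = 0
    batch = []
    for _ in range(n_steps):
        action = policy.get_action(state)
        next_state = state + 1 if action == 1 else state
        reward = 1.0 if next_state != state else 0.0
        batch.append(ToyTransition(state, action, reward, next_state))
        state = next_state
    return batch


policy = ToyRandomPolicy(n_actions=4)
batch = rollout(policy, n_steps=20)
policy.train(batch)
```

Because `rollout` only calls `get_action`, and the caller only calls `train`, any other policy exposing those two methods could be dropped in without changing the loop.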

### `DQnetPolicy`

The `DQnetPolicy` class is a concrete implementation of the `Policy` protocol that uses a deep Q-network to select actions and perform Q-learning updates.


In [8]:
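@dataclass(eq=False, slots=True)
class DQnetPolicy(Policy):
    action_space: gym.spaces.Discrete

    def get_action(self, state: npt.NDArray) -> int:
        # Action selection from the Q-network (body omitted here)
        ...

    def train(self, transitions: Sequence[Transition]) -> None:
        # Q-learning update from the transitions (body omitted here)
        ...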


**Usage:**


In [None]:
# Create a DQnetPolicy for a Discrete action space
action_space = gym.spaces.Discrete(4)
dqnet_policy = DQnetPolicy(action_space=action_space)

# Get an action using the policy
state = np.array([0.1, 0.2, 0.3])
action = dqnet_policy.get_action(state)

# Train the policy with a sequence of transitions
# (`...` is a placeholder for the Transition fields)
transitions = [Transition(...)]
dqnet_policy.train(transitions)
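
The bodies of `DQnetPolicy` are not shown in this notebook, so the following is only a generic sketch of the two ideas the description names: selecting actions from Q-value estimates and forming a Q-learning update target. It assumes epsilon-greedy exploration and the standard one-step target `r + gamma * max_a' Q(s', a')`; the function names `epsilon_greedy` and `td_target` are hypothetical, and plain lists of floats stand in for a network's output.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniform random action, otherwise the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward: float, next_q_values: Sequence[float], gamma: float, done: bool) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_values)


rng = random.Random(0)

# epsilon=0.0 disables exploration, so the highest-valued action is chosen
greedy_action = epsilon_greedy([0.1, 0.7, 0.3, 0.2], epsilon=0.0, rng=rng)

# Target for a non-terminal transition with reward 1.0
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9, done=False)
```

In a deep Q-network, `q_values` would come from a forward pass of the network on the state, and `train` would move the predicted `Q(s, a)` toward `target` by gradient descent; how `DQnetPolicy` actually does this is not specified here.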

## Summary

The `Policy` protocol specifies the interface, `get_action` and `train`, that every policy must provide. `RandomPolicy` selects random actions from its action space, while `DQnetPolicy` uses a deep Q-network to select actions and learn from transitions. Because both follow the same protocol, code written against `Policy` can use either one interchangeably.