## `Transition` NamedTuple

The `Transition` named tuple represents a single step in a reinforcement learning environment: the state before the action, the action itself, and its outcome. It has the following fields:

- `start_state`: The starting state of the transition (as a NumPy array).
- `action`: The action taken during the transition (an integer).
- `next_state`: The resulting state after taking the action (as a NumPy array).
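- `reward`: The reward received for taking the action (any value convertible to `float`, as the `SupportsFloat` annotation indicates).
- `terminated`: A boolean flag indicating whether the episode reached a terminal state with this transition.
- `truncated`: A boolean flag indicating whether the episode was cut short before reaching a terminal state, for example by a step limit.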
- `info`: Additional information as a dictionary of string keys and arbitrary values.

In [2]:
from typing import Any, NamedTuple, SupportsFloat, TypeVar

import numpy as np
from numpy.typing import NDArray

# Type variable for the items stored by ReplayMemory (defined further down)
T = TypeVar("T")

class Transition(NamedTuple):
    start_state: NDArray
    action: int
    next_state: NDArray
    reward: SupportsFloat
    terminated: bool
    truncated: bool
    info: dict[str, Any]

**Usage:**

Construct a `Transition` with keyword arguments, one per field:

In [3]:
# Create a transition
transition = Transition(
    start_state=np.array([0.1, 0.2, 0.3]),
    action=2,
    next_state=np.array([0.2, 0.3, 0.4]),
    reward=0.5,
    terminated=False,
    truncated=False,
    info={"step": 1}
)
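Because `Transition` is a `NamedTuple`, it supports attribute access, positional unpacking, and `_replace`. The sketch below repeats the definition so that it runs on its own; the variable names `done` and `final` are illustrative and not part of the original code.

```python
from typing import Any, NamedTuple, SupportsFloat

import numpy as np
from numpy.typing import NDArray


class Transition(NamedTuple):
    start_state: NDArray
    action: int
    next_state: NDArray
    reward: SupportsFloat
    terminated: bool
    truncated: bool
    info: dict[str, Any]


transition = Transition(
    start_state=np.array([0.1, 0.2, 0.3]),
    action=2,
    next_state=np.array([0.2, 0.3, 0.4]),
    reward=0.5,
    terminated=False,
    truncated=False,
    info={"step": 1},
)

# Fields are readable by name...
reward = float(transition.reward)

# ...and the tuple unpacks positionally, in declaration order.
state, action, next_state, *_ = transition

# An episode is over once either flag is set (illustrative name).
done = transition.terminated or transition.truncated

# Named tuples are immutable; _replace returns a modified copy.
final = transition._replace(terminated=True)
```

Immutability is a reasonable fit here: a stored transition is a record of what happened and should not change after the fact.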

## `ReplayMemory` Generic Class

The generic `ReplayMemory` class is a buffer for storing past experience and sampling batches from it during training. It holds a list of items of type `T`, which can be `Transition` instances or any other data. The listing below shows the interface only; the method bodies are elided.


In [4]:
from dataclasses import dataclass, field
from typing import Generic, Optional, TypeVar

@dataclass(eq=False)
class ReplayMemory(Generic[T]):
    batch_size: int = 256
    capacity: int = 2**15
    last: int = field(default=0, init=False)
    memory: list[T] = field(default_factory=list, init=False)

    @property
    def size(self) -> int:
        ...

    def push(self, item: T) -> None:
        ...

    def sample(self, quantity: Optional[int] = None) -> list[T]:
        ...
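The method bodies above are elided, so the following is only one plausible way to fill them in, not the original implementation. It assumes that `last` is the index of the next slot to write, that the buffer overwrites its oldest entry once `capacity` is reached, and that `sample` draws uniformly without replacement, falling back to `batch_size` when `quantity` is omitted.

```python
import random
from dataclasses import dataclass, field
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(eq=False)
class ReplayMemory(Generic[T]):
    batch_size: int = 256
    capacity: int = 2**15
    last: int = field(default=0, init=False)
    memory: list[T] = field(default_factory=list, init=False)

    @property
    def size(self) -> int:
        # Number of items currently stored (never more than `capacity`).
        return len(self.memory)

    def push(self, item: T) -> None:
        # Assumption: `last` is the index of the next slot to write.
        if len(self.memory) < self.capacity:
            self.memory.append(item)
        else:
            # Buffer is full: overwrite the oldest entry.
            self.memory[self.last] = item
        self.last = (self.last + 1) % self.capacity

    def sample(self, quantity: Optional[int] = None) -> list[T]:
        # Assumption: default to `batch_size`, capped at the stored count.
        k = self.batch_size if quantity is None else quantity
        return random.sample(self.memory, min(k, self.size))


memory = ReplayMemory[int](batch_size=2, capacity=3)
for item in range(5):
    memory.push(item)

# Only the three most recent items remain; 0 and 1 were overwritten.
batch = memory.sample()
```

Capping the sample at `size` avoids the `ValueError` that `random.sample` raises when asked for more items than are available, which matters early in training when the buffer is still nearly empty.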

**Usage:**

Parameterize `ReplayMemory` with the item type, push items as they are collected, and sample batches for training:

In [None]:
# Create a replay memory
memory = ReplayMemory[Transition]()

# Push the transition created earlier into the memory
memory.push(transition)

# Sample a batch of transitions from the memory
batch = memory.sample(quantity=32)

## Utility Functions

The code also provides four utility functions:

- `add_indent`: Adds indentation to all lines in a string, useful for creating well-formatted representations.
- `torch_style_repr`: Creates a string representation in the style of PyTorch classes with named parameters.
- `plot_grid`: Plots a grid with specified cell sizes using Matplotlib.
- `plot_trajectory`: Plots a trajectory on a grid given a sequence of observations.
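The source does not include the bodies of these helpers. As a rough sketch of what the two string helpers might look like, the versions below assume the call signatures used in the usage cell (`add_indent(text, indent=...)` and `torch_style_repr(class_name, params)`) and an output layout with one `name=value` entry per line; both the layout and the handling of an empty `params` dictionary are assumptions.

```python
def add_indent(text: str, indent: int = 2) -> str:
    # Prefix every line, including empty ones, with `indent` spaces.
    prefix = " " * indent
    return "\n".join(prefix + line for line in text.split("\n"))


def torch_style_repr(class_name: str, params: dict[str, str]) -> str:
    # Assumed layout: one "name=value" entry per line inside "ClassName(...)".
    if not params:
        return f"{class_name}()"
    body = ",\n".join(f"{name}={value}" for name, value in params.items())
    return f"{class_name}(\n{add_indent(body)}\n)"


print(torch_style_repr("MyModel", {"batch_size": "32", "learning_rate": "0.001"}))
# MyModel(
#   batch_size=32,
#   learning_rate=0.001
# )
```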

**Usage:**

The calls below show each function with representative arguments:

In [None]:
# Create an indented string representation
indented_str = add_indent("This is\nan indented\nstring.", indent=2)

# Create a string representation in the style of PyTorch classes
params = {"batch_size": "32", "learning_rate": "0.001"}
repr_str = torch_style_repr("MyModel", params)

# Plot a grid
plot_grid(grid_shape=(6, 6), cell_shape=(2, 2))

# Plot a trajectory on a grid
trajectory = [1, 3, 7, 15]
plot_trajectory(start=0, trajectory=trajectory, grid_shape=(4, 4))
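The source does not say how `plot_trajectory` interprets its observations. If they are flat cell indices numbered in row-major order (an assumption), the mapping to grid coordinates could be sketched as follows. `to_cell` is a hypothetical helper, not part of the original code.

```python
def to_cell(index: int, grid_shape: tuple[int, int]) -> tuple[int, int]:
    # Assumption: row-major numbering, so index = row * n_cols + col.
    n_rows, n_cols = grid_shape
    if not 0 <= index < n_rows * n_cols:
        raise ValueError(f"index {index} is outside a {n_rows}x{n_cols} grid")
    return divmod(index, n_cols)


start, trajectory, grid_shape = 0, [1, 3, 7, 15], (4, 4)
cells = [to_cell(i, grid_shape) for i in [start, *trajectory]]
print(cells)  # [(0, 0), (0, 1), (0, 3), (1, 3), (3, 3)]
```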

The string helpers produce readable representations for debugging, and the plotting helpers visualize grids and the trajectories taken across them.