<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->


# Reinforcement learning

The agents based on reinforcement learning implement a value-based algorithm called Q-learning. More precisely, the agent implemented in this framework is based on [deep double Q-learning](https://ojs.aaai.org/index.php/AAAI/article/view/10295).
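The difference between double Q-learning and standard Q-learning is in how the bootstrapped target is built. The online network picks the greedy next action, and a separate target network evaluates it. The sketch below computes that target for a single transition. The function name and arguments are hypothetical and are not part of BOUNCE's API. `gamma=0.85` only mirrors the `DQNAgent` default.

```python
def double_q_target(reward, next_q_online, next_q_target, gamma=0.85, done=False):
    """Double Q-learning target for one transition (hypothetical helper, sketch only).

    next_q_online: Q-values of the next state according to the online network.
    next_q_target: Q-values of the next state according to the target network.
    """
    if done:
        # Terminal transition: nothing to bootstrap from
        return reward
    # The online network selects the greedy next action...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...and the target network evaluates it, which reduces overestimation bias
    return reward + gamma * next_q_target[best_action]


# Standard DQN would bootstrap from max(next_q_target) = 3.0 instead of next_q_target[1]
target = double_q_target(1.0, next_q_online=[0.2, 0.9, 0.5], next_q_target=[3.0, 1.0, 2.0])
print(target)  # approximately 1.85
```

Decoupling action selection from action evaluation is what prevents the systematic overestimation that comes from taking a maximum over noisy Q-value estimates.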


---

[source](https://github.com/BorjaRequena/BOUNCE/blob/master/BOUNCE/agents.py#L20){target="_blank" style="float:right; font-size:smaller"}

### DQNAgent

>      DQNAgent (model, learning_rate=0.001, criterion=None, optimizer=None,
>                batch_size=128, target_update=5, gamma=0.85, eps_0=1,
>                eps_decay=0.999, eps_min=0.1)

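Agent based on a deep Q-Network (DQN).

Inputs:

- `model`: `torch.nn.Module` with the DQN model. Its input and output dimensions must be consistent with the state size and the number of actions of the environment.
- `learning_rate`: learning rate of the optimizer.
- `criterion`: loss criterion (e.g., `torch.nn.SmoothL1Loss`).
- `optimizer`: optimization algorithm (e.g., `torch.optim.Adam`).
- `batch_size`: batch size used to train the model.
- `target_update`: update period of the target network.
- `gamma`: discount factor for future rewards in the Q-value estimation.
- `eps_0`: initial value of epsilon in the epsilon-greedy policy.
- `eps_decay`: exponential decay factor of epsilon in the epsilon-greedy policy.
- `eps_min`: minimum (saturation) value of epsilon.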
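The sketch below shows the epsilon-greedy policy that `eps_0`, `eps_decay` and `eps_min` parametrize. It assumes epsilon is multiplied by `eps_decay` once per step and clipped from below at `eps_min`. Whether `DQNAgent` decays epsilon per step or per episode is not documented here. Both helper names are hypothetical.

```python
import random


def epsilon_schedule(step, eps_0=1.0, eps_decay=0.999, eps_min=0.1):
    """Exponentially decayed epsilon, saturating at eps_min (assumed schedule)."""
    return max(eps_min, eps_0 * eps_decay ** step)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


rng = random.Random(0)
for step in (0, 1000, 5000):
    # Epsilon shrinks from eps_0 towards eps_min as training progresses
    print(step, round(epsilon_schedule(step), 3))
```

With the default values, epsilon reaches the `eps_min` floor after roughly 2300 decay steps, since 0.999^2300 ≈ 0.1.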

We provide a default architecture for the neural network that encodes the Q-values, usually referred to as a deep Q-Network (DQN).


---

[source](https://github.com/BorjaRequena/BOUNCE/blob/master/BOUNCE/agents.py#L125){target="_blank" style="float:right; font-size:smaller"}

### DQN

>      DQN (state_size, action_size)

Default deep Q-Network architecture. It is a `torch.nn.Module` that maps a state of dimension `state_size` to one Q-value estimate per action, i.e., `action_size` outputs.
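A Q-network must satisfy a simple dimension contract: a state vector of length `state_size` goes in, and one Q-value per action comes out. The hypothetical two-layer perceptron below illustrates that contract using NumPy. It is not the architecture of `DQN`, whose layers are not documented here. A model passed to `DQNAgent` must be a `torch.nn.Module`.

```python
import numpy as np


class TinyQNetwork:
    """Hypothetical two-layer MLP: state vector -> one Q-value per action."""

    def __init__(self, state_size, action_size, hidden_size=64, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights; a real DQN learns these by gradient descent
        self.w1 = rng.normal(0.0, 0.1, size=(state_size, hidden_size))
        self.b1 = np.zeros(hidden_size)
        self.w2 = rng.normal(0.0, 0.1, size=(hidden_size, action_size))
        self.b2 = np.zeros(action_size)

    def __call__(self, state):
        # Works for a single state (1D array) or a batch of states (2D array)
        hidden = np.maximum(0.0, state @ self.w1 + self.b1)  # ReLU activation
        return hidden @ self.w2 + self.b2


net = TinyQNetwork(state_size=6, action_size=6)
q_values = net(np.array([1, 1, 1, 0, 0, 0], dtype=float))
greedy_action = int(np.argmax(q_values))  # index of the highest Q-value
```

Producing all Q-values in a single forward pass lets the agent take the greedy action with one `argmax`, instead of evaluating the network once per action.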

# Blind-search

The agents based on tree search currently implement only blind-search techniques, such as breadth-first search.
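Breadth-first search keeps the frontier in a first-in, first-out queue, so it expands states in order of their distance from the initial state. As a result, the first goal it reaches is one that needs the fewest moves. The generic sketch below illustrates this. It is not `BrFSAgent`'s implementation, and the toy graph and goal test are hypothetical.

```python
from collections import deque


def breadth_first_search(start, neighbors, is_goal):
    """Return a shortest path (list of states) from start to a goal, or None."""
    frontier = deque([start])  # FIFO queue: states are expanded in order of depth
    parent = {start: None}     # doubles as the visited set and as back-pointers
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            # Walk the back-pointers to rebuild the path
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None  # goal unreachable


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"], "F": []}
path = breadth_first_search("A", lambda s: graph[s], lambda s: s == "F")
```

BFS is "blind" because it uses no heuristic about where the goal lies. Its only guide is the order in which states are discovered.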


---

[source](https://github.com/BorjaRequena/BOUNCE/blob/master/BOUNCE/agents.py#L141){target="_blank" style="float:right; font-size:smaller"}

### BrFSAgent

>      BrFSAgent (initial_state)

Agent based on Breadth-First Search (BrFS).

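For example, starting from the state `[1, 1, 1, 0, 0, 0]`, `expand` returns the six states that differ from it in exactly one entry:

```python
import numpy as np
from BOUNCE.agents import BrFSAgent

agent = BrFSAgent(np.array([1, 1, 1, 0, 0, 0]))
agent.expand()
```

```
[array([0, 1, 1, 0, 0, 0]),
 array([1, 0, 1, 0, 0, 0]),
 array([1, 1, 0, 0, 0, 0]),
 array([1, 1, 1, 1, 0, 0]),
 array([1, 1, 1, 0, 1, 0]),
 array([1, 1, 1, 0, 0, 1])]
```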
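The neighborhood in the output above can be reproduced by flipping one entry at a time. This sketch is inferred only from that output, and the real `BrFSAgent.expand` may differ in details not visible here. `flip_neighbors` is a hypothetical helper.

```python
import numpy as np


def flip_neighbors(state):
    """States that differ from `state` in exactly one binary entry."""
    neighbors = []
    for i in range(len(state)):
        new_state = state.copy()       # leave the original state untouched
        new_state[i] = 1 - new_state[i]  # flip entry i: 0 <-> 1
        neighbors.append(new_state)
    return neighbors


neighbors = flip_neighbors(np.array([1, 1, 1, 0, 0, 0]))
```

A state of length n therefore has n neighbors. The BFS tree thus grows by a factor of up to n per level, which is why blind search is practical only for small problems.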

# Monte-Carlo

The agents based on Monte-Carlo sampling follow the Metropolis-Hastings algorithm to move between states: a random action (i.e., a new state) is proposed, and the move is accepted or rejected with a probability given by the Metropolis-Hastings acceptance rule.
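The sketch below implements one Metropolis step with a symmetric proposal. For a symmetric proposal, the Metropolis-Hastings acceptance ratio reduces to the Metropolis rule. Several parts are assumptions: that the agent minimizes a cost, that `beta` in `MCAgent` acts as an inverse temperature, the single-flip proposal, and the helper names.

```python
import math
import random


def acceptance_probability(delta_cost, beta):
    """Metropolis rule: always accept improvements, else accept with exp(-beta * delta)."""
    if delta_cost <= 0:
        return 1.0
    return math.exp(-beta * delta_cost)


def metropolis_step(state, propose, cost, beta, rng):
    """One Metropolis move with a symmetric proposal (hypothetical helper)."""
    candidate = propose(state, rng)
    delta = cost(candidate) - cost(state)
    if rng.random() < acceptance_probability(delta, beta):
        return candidate  # move accepted
    return state          # move rejected: stay in the current state


def propose_flip(state, rng):
    # Hypothetical symmetric proposal: flip one randomly chosen binary entry
    i = rng.randrange(len(state))
    return state[:i] + (1 - state[i],) + state[i + 1:]


rng = random.Random(42)
state = (1, 1, 1, 0, 0, 0)
for _ in range(200):
    state = metropolis_step(state, propose_flip, cost=sum, beta=0.1, rng=rng)
```

A small `beta` (high temperature) accepts most uphill moves and explores broadly. A large `beta` makes the walk nearly greedy.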


---

[source](https://github.com/BorjaRequena/BOUNCE/blob/master/BOUNCE/agents.py#L174){target="_blank" style="float:right; font-size:smaller"}

### MCAgent

>      MCAgent (beta=0.1)

Agent based on Monte-Carlo sampling with the Metropolis-Hastings algorithm.

