Credit: 3 - A2C implementation from Deep-Reinforcement-Learning-Hands-On-Second-Edition (pages 315-317)
From a training point of view, we complete these steps (the standard A2C loop): play N steps in the environment with the current policy, storing states, actions, and rewards; bootstrap the return from the critic's value estimate of the last state (or 0 if the episode ended); accumulate the discounted returns backwards through the stored rewards; and update the actor from the policy gradient weighted by the advantage and the critic from the value error.

The preceding algorithm is an outline, similar to those that are usually printed in research papers. In practice, some considerations are as follows: an entropy bonus is added to the loss to encourage exploration, the policy loss, value loss, and entropy bonus are combined into a single loss expression so one backward pass accumulates all gradients, and several environments are typically run in parallel to decorrelate the samples.

The implementation is split into separate parts (a minimal sketch of how they fit together follows this list):
- Data module (wraps the data set for the trainer)
- Neural Nets (the actor and critic networks)
- PL module (the LightningModule that holds the training logic)
- Callbacks (hooks for logging and checkpointing)
- Data set (an iterable data set that streams the agent's experience)
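As a rough illustration of how these parts plug together in PyTorch Lightning, here is a minimal skeleton. Everything except the Lightning hooks (`training_step`, `configure_optimizers`, `train_dataloader`) is a hypothetical name of mine, and the loss assumes a net that maps a batch of state tensors to `(values, action probabilities)`; the `ALGNet` below operates on single numpy states, so it would need a small adaptation.

```python
from typing import Callable, Iterator

import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, IterableDataset


class RolloutDataset(IterableDataset):
    """Hypothetical data set: streams (state, action, return) tuples from rollouts."""

    def __init__(self, generate_fn: Callable[[], Iterator]):
        self.generate_fn = generate_fn  # plays the env with the current policy

    def __iter__(self) -> Iterator:
        return self.generate_fn()


class A2CLightning(pl.LightningModule):
    """Hypothetical PL module: owns the actor-critic net and the training logic."""

    def __init__(self, net: torch.nn.Module, generate_fn: Callable[[], Iterator]):
        super().__init__()
        self.net = net
        self.generate_fn = generate_fn

    def training_step(self, batch, batch_idx):
        states, actions, returns = batch
        values, policy = self.net(states)                     # assumed net interface
        advantage = returns - values.squeeze(-1)
        log_prob = torch.log(policy.gather(1, actions.unsqueeze(-1))).squeeze(-1)
        actor_loss = -(log_prob * advantage.detach()).mean()  # policy gradient
        critic_loss = advantage.pow(2).mean()                 # value regression
        # an entropy bonus is typically subtracted here as well
        return actor_loss + critic_loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.net.parameters(), lr=3e-5)  # = LR constant below

    def train_dataloader(self):
        # a data module and callbacks would wrap and monitor this in a full setup
        return DataLoader(RolloutDataset(self.generate_fn), batch_size=32)
```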
Hyperparameters:
```python
MAX_EPOCHS = 1000             # maximum number of epochs to run
MAX_LENGTH_OF_A_GAME = 10000  # step limit for a single episode
LR = 3e-5                     # learning rate
GAMMA = 0.99                  # discount factor
HIDDEN_SIZE = 256             # width of the hidden layers
```
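`GAMMA` is the discount factor used when folding rewards into returns. A minimal sketch of that accumulation, bootstrapping from the critic's value of the last state (the function name is mine, not from the credited sources):

```python
import numpy as np

def discounted_returns(rewards, last_value=0.0, gamma=GAMMA):
    """Accumulate discounted returns backwards, bootstrapping from the
    critic's estimate of the final state (0.0 if the episode ended)."""
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = last_value
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns

# e.g. three steps of reward 1.0 with gamma = 0.99:
# discounted_returns([1.0, 1.0, 1.0]) -> [2.9701, 1.99, 1.0]
```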
A2C net:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ALGNet(nn.Module):
    """Actor-critic network with separate actor and critic heads.

    obs_size: observation/state size of the environment
    n_actions: number of discrete actions available in the environment
    (both heads use hidden layers of width HIDDEN_SIZE)
    """

    def __init__(self, obs_size: int, n_actions: int):
        super().__init__()
        # critic head: state -> scalar state-value estimate V(s)
        self.critic_linear1 = nn.Linear(obs_size, HIDDEN_SIZE)
        self.critic_linear_hidden = nn.Linear(HIDDEN_SIZE, HIDDEN_SIZE)
        self.critic_linear2 = nn.Linear(HIDDEN_SIZE, 1)
        # actor head: state -> action probabilities pi(a|s)
        self.actor_linear1 = nn.Linear(obs_size, HIDDEN_SIZE)
        self.actor_linear_hidden = nn.Linear(HIDDEN_SIZE, HIDDEN_SIZE)
        self.actor_linear2 = nn.Linear(HIDDEN_SIZE, n_actions)
        self.n_actions = n_actions
        self.obs_size = obs_size
        # running sum of per-step policy entropy, used as an exploration
        # bonus when the loss is assembled
        self.entropy_term = 0

    def forward(self, state):
        # the environment hands us a numpy array; turn it into a (1, obs_size)
        # float tensor (the deprecated Variable wrapper is no longer needed)
        state = torch.from_numpy(state).float().unsqueeze(0)
        value = F.relu(self.critic_linear1(state))
        value = F.relu(self.critic_linear_hidden(value))
        value = self.critic_linear2(value)
        policy_dist = F.relu(self.actor_linear1(state))
        policy_dist = F.relu(self.actor_linear_hidden(policy_dist))
        # the logits are 2-D (batch, n_actions), so softmax over dim=1
        # (dim=2 would raise an IndexError here)
        policy_dist = F.softmax(self.actor_linear2(policy_dist), dim=1)
        return value, policy_dist
```
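A quick smoke-test sketch of the net. The sizes and the entropy accumulation are illustrative assumptions, not taken verbatim from the credited sources:

```python
import numpy as np
import torch

obs_size, n_actions = 4, 2                         # placeholder sizes
net = ALGNet(obs_size, n_actions)
state = np.zeros(obs_size, dtype=np.float32)       # stand-in for env.reset()

value, policy_dist = net(state)                    # V(s) and pi(.|s)
action = torch.multinomial(policy_dist, num_samples=1).item()

# per-step policy entropy, accumulated for the entropy bonus in the loss
entropy = -(policy_dist * torch.log(policy_dist + 1e-8)).sum().item()
net.entropy_term += entropy
print(value.item(), action, entropy)
```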
Credits:
- 1 - Deriving Policy Gradients and Implementing REINFORCE
- 2 - A2C higgsfield implementation
- 3 - A2C implementation from Deep-Reinforcement-Learning-Hands-On-Second-Edition (pages 315-317)
- 4 - REINFORCE+A2C (google colab)
- 5 - Chris Yoon
- Optimization in PyTorch-Lightning
- AdaGrad - page 36 (Training NNs from Stanford's course)
- Entropy (information theory)