A2C

Advantage Actor-Critic (A2C) is a synchronous, deterministic version of Asynchronous Advantage Actor-Critic (A3C). It combines value-based and policy-based optimization: a critic estimates state values, and an actor updates the policy using the resulting advantage estimates. For simplicity, this implementation of A2C is built on the PPO algorithm, and it supports the following extensions:

Target network: ✔️
Gradient clipping: ✔️
Reward clipping: ❌
Generalized Advantage Estimation (GAE): ✔️
Discrete version: ✔️
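As an illustration of the Generalized Advantage Estimation extension listed above, the following is a minimal sketch in plain Python. It is not ElegantRL's implementation: the function name `compute_gae`, its arguments, and the default values of `gamma` and `lam` are assumptions chosen for this example.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Hypothetical helper: GAE advantages and value targets for one rollout.

    rewards[t], values[t], dones[t] describe step t; last_value is the critic's
    estimate for the state that follows the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk the rollout backwards, accumulating discounted TD errors.
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0  # do not bootstrap across episode ends
        delta = rewards[t] + gamma * next_value * mask - values[t]
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic are advantage plus baseline.
    returns = [adv + val for adv, val in zip(advantages, values)]
    return advantages, returns


advantages, returns = compute_gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    last_value=0.0,
    dones=[False, False, True],
    gamma=1.0,
    lam=1.0,
)
print(advantages, returns)
```

With `lam=0` the estimate reduces to the one-step TD error, and with `lam=1` it becomes the discounted return minus the value baseline; intermediate values trade bias against variance.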

Warning

This A2C implementation serves a pedagogical purpose. For practitioners, we recommend training agents with the PPO algorithm instead. Because A2C lacks PPO's trust-region constraint and clipped probability ratio, its hyper-parameters, e.g., repeat_times, need to be fine-tuned to avoid performance collapse.
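To illustrate why the missing clipped ratio matters, the sketch below contrasts a per-sample A2C policy objective with PPO's clipped surrogate. It is a minimal sketch in plain Python, not ElegantRL's code: the function names and the clip range of 0.2 are assumptions made for illustration.

```python
import math


def a2c_objective(new_logprob, advantage):
    """Hypothetical per-sample A2C policy objective: log-probability times advantage."""
    return new_logprob * advantage


def ppo_clip_objective(new_logprob, old_logprob, advantage, clip=0.2):
    """Hypothetical per-sample PPO clipped surrogate objective."""
    ratio = math.exp(new_logprob - old_logprob)
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    # Taking the minimum removes any incentive to move the ratio
    # outside [1 - clip, 1 + clip] in the direction the advantage favors.
    return min(ratio * advantage, clipped_ratio * advantage)


old_logprob = math.log(0.2)
advantage = 1.0
# Simulate the policy drifting further from the data-collecting policy,
# as can happen when the same batch is reused for many update passes.
for prob in (0.2, 0.3, 0.5, 0.9):
    new_logprob = math.log(prob)
    print(
        f"p={prob:.1f}  "
        f"a2c={a2c_objective(new_logprob, advantage):+.3f}  "
        f"ppo={ppo_clip_objective(new_logprob, old_logprob, advantage):+.3f}"
    )
```

In this sketch the PPO objective stops improving once the ratio leaves the clip range, whereas the A2C objective keeps rewarding a larger log-probability. This is one way to see why the number of update passes over a batch has to be chosen conservatively for A2C.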

Code Snippet

import torch
from elegantrl.run import train_and_evaluate
from elegantrl.config import Arguments
from elegantrl.train.config import build_env
from elegantrl.agents.AgentA2C import AgentA2C

# Train the agent and save it to args.cwd
args = Arguments(env=build_env('Pendulum-v0'), agent=AgentA2C())
args.cwd = 'demo_Pendulum_A2C'  # working directory for this run
args.env.target_return = -200
args.reward_scale = 2 ** -2  # rewards are scaled by 0.25
train_and_evaluate(args)

# Test: load the saved agent and roll out one episode
agent = AgentA2C()
agent.init(args.net_dim, args.state_dim, args.action_dim)
agent.save_or_load_agent(cwd=args.cwd, if_save=False)  # if_save=False loads instead of saving

env = build_env('Pendulum-v0')
state = env.reset()
episode_reward = 0
for i in range(2 ** 10):  # cap the rollout at 1024 steps
    action = agent.select_action(state)
    next_state, reward, done, _ = env.step(action)

    episode_reward += reward
    if done:
        print(f'Step {i:>6}, Episode return {episode_reward:8.3f}')
        break
    state = next_state
    env.render()

Parameters

elegantrl.agents.AgentA2C.AgentA2C

elegantrl.agents.AgentA2C.AgentDiscreteA2C

Networks

elegantrl.agents.net.ActorPPO

elegantrl.agents.net.ActorDiscretePPO

elegantrl.agents.net.CriticPPO
