## A2C (Advantage Actor Critic)

Code to train and test an A2C agent (using the stable-baselines3 library) in the Gymnasium CartPole-v1 environment. The example comes from the stable-baselines3 [documentation](https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html).

In [None]:
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv
import gymnasium as gym

In this example, eight vectorized environments are used to train the A2C agent. The parallel environments are created by the `make_vec_env` function, and `SubprocVecEnv` runs each of them in its own process.

In [None]:
# Create eight parallel environments, each running in its own subprocess
vec_env = make_vec_env("CartPole-v1", n_envs=8, vec_env_cls=SubprocVecEnv)

# Train an A2C agent with an MLP policy on the CPU
model = A2C("MlpPolicy", vec_env, verbose=1, device="cpu")
model.learn(total_timesteps=50000)

model.save("a2c_cartpole")
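
As a rough sanity check on the training budget above, the sketch below works out how many environment transitions feed each gradient update and how many updates the run performs. It assumes the A2C default of `n_steps=5` (the cell above does not override it); `rollout_budget` is a hypothetical helper written for this note, not part of stable-baselines3.

```python
import math


def rollout_budget(total_timesteps: int, n_envs: int, n_steps: int = 5):
    """Return (transitions per update, approximate number of updates).

    Assumes each update consumes one rollout of n_steps steps
    from every one of the n_envs parallel environments.
    """
    batch = n_envs * n_steps
    updates = math.ceil(total_timesteps / batch)
    return batch, updates


# Values taken from the training cell: 8 environments, 50000 timesteps
batch, updates = rollout_budget(50_000, n_envs=8)
print(f"{batch} transitions per update, about {updates} updates")
```

With eight environments each update sees 40 transitions; halving `n_envs` to four would halve the batch and double the number of updates for the same `total_timesteps`.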

In [None]:
# Drop the in-memory model (if any) to demonstrate loading from disk
try:
    del model
except NameError:
    pass

model = A2C.load("a2c_cartpole")
obs = vec_env.reset()
dones_arr = []

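# Run the loaded agent for 1000 steps; the vectorized env resets finished episodes automatically
for _ in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, infos = vec_env.step(action)
    dones_arr.append(dones)  # per-environment episode-termination flags
    vec_env.render("human")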
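
The loop above collects `dones_arr` but never uses it. One possible use, sketched here, is to recover the length of every completed episode per environment. The helper `episode_lengths` is hypothetical (not a stable-baselines3 function), and the toy input below stands in for the real `dones_arr`; the sketch relies on the vectorized environment resetting an environment automatically once its `done` flag is set.

```python
def episode_lengths(dones_per_step):
    """Return, for each environment, the lengths of its completed episodes.

    dones_per_step: one sequence of done flags per step, with one flag per
    environment (for example the dones_arr list built in the loop above).
    """
    if len(dones_per_step) == 0:
        return []
    n_envs = len(dones_per_step[0])
    current = [0] * n_envs              # steps taken in the running episode
    lengths = [[] for _ in range(n_envs)]
    for dones in dones_per_step:
        for env_idx, done in enumerate(dones):
            current[env_idx] += 1
            if done:                    # episode ended; the env restarts
                lengths[env_idx].append(current[env_idx])
                current[env_idx] = 0
    return lengths


# Toy data: 4 steps, 2 environments (illustrative only, not real results)
toy_dones = [[False, False], [True, False], [False, True], [True, False]]
print(episode_lengths(toy_dones))
```

Calling `episode_lengths(dones_arr)` after the evaluation loop should give one list per environment. In CartPole-v1 the reward is 1 per step, so under that assumption longer episodes correspond to a better policy.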
