# CartPole Test

In [None]:
!pip install ./pyansys_rl ./pyansys_gym

In [None]:
import os

import gym
import numpy as np

import pyansys_cartpole
from pyansys_dqn import dqn, dqn_runner, qn_keras
from pyansys_dqn.test_agents import RandomAgent, TrainedAgent

In [None]:
np.set_printoptions(precision=4, suppress=True)

In [None]:
env_name = 'pyansys-CartPole-v0'
env = gym.make(env_name)

## Random Agent
Here we create a simple test agent that acts randomly and is therefore unlikely to succeed at the balancing task.

In [None]:
agent = RandomAgent(env.action_space.n)

In [None]:
s = env.reset()
print(s)

Below, notice how we inform the agent about each state transition with `agent.start_state(s)` or `agent.next_reading(s, r, done, False)` and then ask it to recommend an action with `agent.next_action()`. We pass this recommendation to the environment by calling `env.step(a)`. We do not expect the recommendations to be good, because this agent selects 'left' or 'right' at random with equal probability. A controller that flips a coin to decide how to act is rarely effective, so the pole should not stay balanced for long.
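
To make the three-method interface concrete, here is a minimal sketch of what a coin-flipping agent could look like. The class name `CoinFlipAgent`, its `seed` parameter, and its internals are assumptions for illustration only; the actual `RandomAgent` in `pyansys_dqn` may be implemented differently.

```python
import random


class CoinFlipAgent:
    """Hypothetical stand-in for a random agent (not the pyansys_dqn code)."""

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        self.state = None

    def start_state(self, s):
        # Remember the first observation of an episode.
        self.state = s

    def next_action(self):
        # Ignore the stored state and pick uniformly among the actions.
        return self.rng.randrange(self.n_actions)

    def next_reading(self, s, r, done, *extra):
        # A learning agent would use r and done here; this one only stores s.
        self.state = s


agent_sketch = CoinFlipAgent(n_actions=2, seed=0)
agent_sketch.start_state([0.0, 0.0, 0.0, 0.0])
actions = [agent_sketch.next_action() for _ in range(1000)]
# Fraction of 'right' moves (action 1); it should be close to 0.5.
print(sum(actions) / len(actions))
```

Because `next_action()` never looks at the stored state, the pole angle has no influence on the chosen push, which is why such an agent cannot balance the pole for long.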

In [None]:
agent.start_state(s)
done, r_tot = False, 0
while not done:
    a = agent.next_action()
    s, r, done, _ = env.step(a)
    print('--->' if a else '<---', s)
    agent.next_reading(s, r, done, False)
    r_tot += r
print('total timesteps:', r_tot)

## Trained Agent
Now we create an agent that has been trained, i.e., one that consults successfully trained neural networks to decide how best to act. It is therefore much more likely to perform well and to balance the pole for noticeably more steps, even though the system starts from a random state.
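
The package name suggests a DQN-style agent. Assuming that is the case, the network maps a state to one estimated value per action, and the agent recommends the action with the largest value. The sketch below shows only that selection step with made-up numbers; `greedy_action` and `q_example` are hypothetical names, and the internals of `TrainedAgent` are not shown in this notebook.

```python
import numpy as np


def greedy_action(q_values):
    """Return the index of the largest estimated action value."""
    return int(np.argmax(q_values))


# Made-up value estimates for the two actions (0 = left, 1 = right).
q_example = np.array([0.8, 1.3])
chosen = greedy_action(q_example)
print('--->' if chosen else '<---')
```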

In [None]:
output_path = 'successful_runs/pyansys_cartpole'
output_name = 'pyansys_cartpole_00'
n_actions = env.action_space.n  # CartPole has two actions: left and right
agent = TrainedAgent(output_path, output_name, n_actions, env.observation_space.shape)

In [None]:
s = env.reset()
print(s)

Below, notice how we inform the agent about each state transition with `agent.start_state(s)` or `agent.next_reading(s, r, done, False)` and then ask it to recommend an action with `agent.next_action()`. We follow its recommendation by passing it to the environment in `env.step(a)`. The recommendations should be good because they come from neural networks that store the result of successful training, so the pole should stay up longer, ideally for the entire episode (200 steps).
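
Since the interaction loop is identical for both agents, it could be factored into a helper. The sketch below assumes the same four-value `env.step` return used in this notebook. `run_episode`, `FixedLengthEnv`, and `StubAgent` are hypothetical names; the two stub classes exist only so the example runs without the real environment or agents.

```python
import random


def run_episode(env, agent, verbose=False):
    """Run one episode and return the accumulated reward."""
    s = env.reset()
    agent.start_state(s)
    done, r_tot = False, 0
    while not done:
        a = agent.next_action()
        s, r, done, _ = env.step(a)
        if verbose:
            print('--->' if a else '<---', s)
        agent.next_reading(s, r, done, False)
        r_tot += r
    return r_tot


class FixedLengthEnv:
    """Hypothetical stub: each episode lasts n_steps, with reward 1 per step."""

    def __init__(self, n_steps):
        self.n_steps = n_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, a):
        self.t += 1
        return [float(self.t)], 1, self.t >= self.n_steps, {}


class StubAgent:
    """Hypothetical stub exposing the three methods used in this notebook."""

    def start_state(self, s):
        pass

    def next_action(self):
        return random.randrange(2)

    def next_reading(self, s, r, done, flag):
        pass


totals = [run_episode(FixedLengthEnv(5), StubAgent()) for _ in range(3)]
print(totals)  # each entry equals the stub's episode length, here 5
```

With the real objects, `run_episode(env, agent)` would replace the loop in the next cell, and averaging the result over several episodes would give a fairer comparison between the random and the trained agent than a single run.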

In [None]:
agent.start_state(s)
done, r_tot = False, 0
while not done:
    a = agent.next_action()
    s, r, done, _ = env.step(a)
    print('--->' if a else '<---', s)
    agent.next_reading(s, r, done, False)
    r_tot += r
print('total timesteps:', r_tot)

## Epilogue
Try resuming a trained neural network of your own!