# CartPole Test

In [1]:
# !pip install ./pyansys_rl ./pyansys_gym

Processing ./pyansys_rl
Processing ./pyansys_gym
Building wheels for collected packages: pyansys-rl, pyansys-cartpole
  Building wheel for pyansys-rl (setup.py) ... done
  Created wheel for pyansys-rl: filename=pyansys_rl-0.0.1-py3-none-any.whl size=7071 sha256=a9bd0f5c7a73eef1119ce50245e43dd6ac2fcd5784001965d402f51291953f00
  Stored in directory: /tmp/pip-ephem-wheel-cache-eneeb3gv/wheels/ab/f3/ff/bebff563baa460ac7ab74ef83360389d4ae6f5a05651e162f3
  Building wheel for pyansys-cartpole (setup.py) ... done
  Created wheel for pyansys-cartpole: filename=pyansys_cartpole-0.0.1-py3-none-any.whl size=10366 sha256=c4c078ac3120cf716c7ca521eb5e3d02e54f928a3f5b520de338cca7fd589076
  Stored in directory: /tmp/pip-ephem-wheel-cache-eneeb3gv/wheels/80/64/72/e71acde093f23c47c6e87f82f5841c0c889c0fcfdb32351491
Successfully built pyansys-rl pyansys-cartpole
Installing collected packages: pyansys-rl, pyansys-cartpole
  Attempting uninstall: pyansys-rl
    Found existing installa

In [1]:
import os

import gym
import numpy as np

import pyansys_cartpole
from pyansys_dqn import dqn, dqn_runner, qn_keras
from pyansys_dqn.test_agents import RandomAgent, TrainedAgent

In [2]:
np.set_printoptions(precision=4, suppress=True)

In [3]:
env_name = 'pyansys-CartPole-v0'
env = gym.make(env_name)



## Random Agent
Here we create a simple test agent that acts at random and is therefore unlikely to succeed at the balancing task.

In [4]:
agent = RandomAgent(env.action_space.n)

In [5]:
s = env.reset()
print(s)

[-0.0328 -0.0433 -2.7321  0.    ]


Below, notice how we inform the agent about each state transition with `agent.start_state(s)` or `agent.next_reading(s, r, done, False)` and then ask it to recommend an action with `agent.next_action()`. We pass this recommendation to the environment by calling `env.step(a)`. We do not expect these recommendations to be good, because this agent selects 'left' or 'right' at random with equal probability. A controller that flips a coin to decide how to act is rarely effective, so the pole should not stay balanced for long.

In [6]:
agent.start_state(s)
done, r_tot = False, 0
while not done:
    a = agent.next_action()
    s, r, done, _ = env.step(a)
    print('--->' if a else '<---', s)
    agent.next_reading(s, r, done, False)
    r_tot += r
print('total timesteps:', r_tot)

<--- [-0.0338 -0.1072 -2.6191  0.0232]
<--- [-0.0379 -0.3012 -2.2815  0.0957]
total timesteps: 2
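
The agent protocol used in this loop can be sketched without the real packages. `CoinFlipAgent` below is a hypothetical stand-in, not the `RandomAgent` implementation from `pyansys_dqn`; it only mirrors the three calls seen in the cell (`start_state`, `next_action`, `next_reading`). The meaning of the fourth `next_reading` argument is not documented in this notebook, so the sketch accepts it and ignores it.

```python
import random


class CoinFlipAgent:
    """Hypothetical stand-in for RandomAgent: every action is equally likely."""

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        self.rng = random.Random(seed)  # private generator, so runs are repeatable
        self.state = None

    def start_state(self, s):
        # Remember the first observation of the episode.
        self.state = s

    def next_action(self):
        # The state is ignored entirely: the choice is uniform over the actions.
        return self.rng.randrange(self.n_actions)

    def next_reading(self, s, r, done, flag=False):
        # A learning agent would store (s, r, done) here; this one only tracks
        # the latest state. `flag` mirrors the fourth argument passed in the
        # notebook loop; its meaning is an unknown and it is ignored.
        self.state = s


agent = CoinFlipAgent(n_actions=2, seed=0)
agent.start_state([0.0, 0.0, 0.0, 0.0])
actions = [agent.next_action() for _ in range(1000)]
left_share = actions.count(0) / len(actions)
print('share of "left" actions:', left_share)
```

Over many draws the share of each action should sit close to one half, which is why such an agent cannot react to the pole tipping in either direction.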


## Trained Agent
Now we create an agent that has been trained, i.e., one that consults successfully trained neural networks to decide how best to act. It is thus much more likely to perform well and balance the pole for a noticeably greater number of steps, despite the random starting point of the system.

In [7]:
output_path = os.path.join(os.getcwd(), 'successful_runs', 'pyansys_cartpole')
output_name = 'pyansys_cartpole_00'
n_actions = 2
agent = TrainedAgent(output_path, output_name, env.action_space.n, env.observation_space.shape)

In [8]:
s = env.reset()
print(s)

[-0.0242 -0.0313 -0.8666  0.    ]


Below, notice how we inform the agent about each state transition with `agent.start_state(s)` or `agent.next_reading(s, r, done, False)` and then ask it to recommend an action with `agent.next_action()`. We follow its recommendation by passing it to the environment in `env.step(a)`. The recommendations should be pretty good because they come from neural networks that store what was learned during successful training. The pole should therefore stay up longer, hopefully for the entirety of the episode (200 steps).

In [9]:
agent.start_state(s)
done, r_tot = False, 0
while not done:
    a = agent.next_action()
    s, r, done, _ = env.step(a)
    print('--->' if a else '<---', s)
    agent.next_reading(s, r, done, False)
    r_tot += r
print('total timesteps:', r_tot)

<--- [-0.0253 -0.1075 -0.7492  0.0272]
<--- [-0.0294 -0.3019 -0.3988  0.1091]
total timesteps: 2
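
How a trained agent might turn network outputs into an action can be sketched as a greedy choice over estimated action values. This reflects the usual DQN evaluation rule and is an assumption, since the `TrainedAgent` source is not shown in this notebook. `GreedyQAgent` and `toy_q_values` are hypothetical names, and the hand-written scoring rule merely stands in for a trained network.

```python
class GreedyQAgent:
    """Hypothetical sketch: pick the action with the highest estimated value."""

    def __init__(self, q_fn):
        self.q_fn = q_fn  # callable: state -> list of one value per action
        self.state = None

    def start_state(self, s):
        self.state = s

    def next_action(self):
        q = self.q_fn(self.state)
        # Greedy rule: index of the largest value (first index wins on ties).
        return max(range(len(q)), key=lambda a: q[a])

    def next_reading(self, s, r, done, flag=False):
        # Only the new state matters for acting; no learning happens here.
        # `flag` mirrors the fourth argument in the notebook loop and is ignored.
        self.state = s


def toy_q_values(state):
    # Made-up scoring rule, NOT a trained network: prefer action 1 when the
    # last state entry is positive, action 0 when it is negative.
    signal = state[-1]
    return [-signal, signal]


agent = GreedyQAgent(toy_q_values)
agent.start_state([0.0, 0.0, 0.0, 0.2])
first = agent.next_action()
agent.next_reading([0.0, 0.0, 0.0, -0.2], 1.0, False, False)
second = agent.next_action()
print(first, second)
```

Unlike random selection, the action here depends on the current state, which is what lets a well-trained value function correct the pole before it falls.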


## Epilogue
Try resuming a trained neural network of your own!
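
When trying an agent of your own, the evaluation loop used twice in this notebook can be wrapped in a helper. The sketch below assumes the classic gym step API that returns a 4-tuple, as in the cells above. `run_episode`, `FixedLengthEnv`, and `AlwaysLeftAgent` are hypothetical names introduced only so the example runs without the `pyansys` packages.

```python
def run_episode(env, agent):
    """Run one episode with the notebook's agent protocol; return total reward."""
    s = env.reset()
    agent.start_state(s)
    done, r_tot = False, 0
    while not done:
        a = agent.next_action()
        s, r, done, _ = env.step(a)  # classic 4-tuple: state, reward, done, info
        agent.next_reading(s, r, done, False)
        r_tot += r
    return r_tot


class FixedLengthEnv:
    """Hypothetical stub: each episode lasts n_steps with reward 1 per step."""

    def __init__(self, n_steps):
        self.n_steps = n_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.t += 1
        done = self.t >= self.n_steps
        return [0.0, 0.0, 0.0, 0.0], 1.0, done, {}


class AlwaysLeftAgent:
    """Hypothetical stub agent that always recommends action 0."""

    def start_state(self, s):
        pass

    def next_action(self):
        return 0

    def next_reading(self, s, r, done, flag):
        pass


total = run_episode(FixedLengthEnv(5), AlwaysLeftAgent())
print('total reward:', total)
```

With the real environment, the same helper could be called as `run_episode(env, agent)` for any object that offers the three agent methods.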