<a id='Top'></a>
# Test Runner Harness
This repo contains a framework/harness for running tests of various agents against tasks or environments. This file explains:

1. [How to create an agent](#Agent)
2. [How to create a task](#Task)
3. [How to use the test runner framework](#Runner)

It also includes a list of [Tricks](#Tricks) and possible [TODOs](#TODO).

***
<a id='Agent'></a>
## 1) How to create an agent
[Top](#Top)

The test runner accepts an agent in one of two ways: you can pass the agent object directly, or you can specify the agent by name. To use the by-name option, your agent class MUST be a subclass of the BaseAgent class ([/agents/base_agent.py](./agents/base_agent.py)), and you must follow some extra steps (see below). If you pass in an agent object, subclassing is not required, but it can still be helpful because BaseAgent gives helpful errors when you have forgotten to implement something.

Whether or not it subclasses BaseAgent, an agent MUST have the following methods:

1. `__init__(self, env)`
2. `act(self, state, testing)` 
3. `learn(self, state, action, reward, next_state, done, testing)`
4. `reset(self)`

### \_\_init\_\_
The init function should take a task or environment object as an argument. It is expected that all the other methods will work once initialization has finished. However, reset() is called before every "run" (not every episode) of a test, so you can do some of the network building there too if you really want to.

### act
The act function takes the following arguments:    
state - The state being passed from the environment    
testing - A boolean indicating if this episode is a testing episode    

The act function should return a choice of action which is in the environment's action space.
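
As a hedged illustration of how the `testing` flag might be used, the sketch below explores with probability `epsilon` during training and acts greedily during testing. `EpsilonGreedyActor`, `epsilon`, `n_actions`, and `q_values` are hypothetical names that are not part of the harness, and the sketch assumes a discrete action space with hashable states.

```python
import random


class EpsilonGreedyActor:
    """Hypothetical helper: picks actions from a table of action values."""

    def __init__(self, n_actions, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # state -> list of estimated action values (assumes hashable states)
        self.q_values = {}

    def act(self, state, testing):
        values = self.q_values.get(state, [0.0] * self.n_actions)
        # Explore only while training; testing episodes are fully greedy.
        if not testing and self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: values[a])
```

Keeping exploration out of testing episodes means the recorded test rewards reflect what the agent has learned rather than its exploration noise.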

### learn
The learn function is called after every step in an episode with:    
state - the previous state acted on    
action - the action which was taken    
reward - the reward that was given    
next_state - the state which resulted from the action    
done - boolean indicating if the episode has finished    
testing - boolean indicating if it is currently a testing round    
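
Taken together, the act and learn contracts suggest a per-step loop like the one below. This is only a sketch of what one episode might look like, not the runner's actual code: `CountdownEnv`, `RecordingAgent`, and `run_episode` are hypothetical stand-ins written so the example runs without gym.

```python
class CountdownEnv:
    """Toy stand-in for a task: the episode ends after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        # Same tuple order as gym: next_state, reward, done, info
        return self.t, 1.0, done, {}


class RecordingAgent:
    """Toy agent that records every learn() call it receives."""

    def __init__(self, env):
        self.env = env
        self.transitions = []

    def act(self, state, testing):
        return 0

    def learn(self, state, action, reward, next_state, done, testing):
        self.transitions.append((state, action, reward, next_state, done, testing))

    def reset(self):
        self.transitions = []


def run_episode(env, agent, testing=False):
    """Assumed shape of one episode; returns the total reward."""
    state = env.reset()
    done = False
    total = 0.0
    while not done:
        action = agent.act(state, testing)
        next_state, reward, done, info = env.step(action)
        # learn() is called after every step, with the testing flag passed through
        agent.learn(state, action, reward, next_state, done, testing)
        total += reward
        state = next_state
    return total
```

Because `testing` is passed to learn as well as act, an agent can choose to skip its updates during testing rounds.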

### reset
The purpose of reset is to support re-initializing the network so that the test runner can run one agent against one environment multiple times (runs) in sequence in order to accumulate average learning data. Reset should make the agent "like new". This can be tricky to do, but there are options, which I'll list in the [Tricks](#Tricks) section.
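
One possible way to keep reset simple, offered as a sketch rather than the approach the harness prescribes, is to create everything that learning modifies in a single helper and call that helper from both `__init__` and `reset`. `TabularAgent`, `_build`, `q_values`, and `steps_seen` are hypothetical names, and skipping updates during testing rounds is an assumption.

```python
class TabularAgent:
    """Hypothetical agent whose learned state lives in one table."""

    def __init__(self, env, n_actions=2):
        self.env = env
        self.n_actions = n_actions
        self._build()

    def _build(self):
        # Everything that learning modifies is created here, and only here.
        self.q_values = {}
        self.steps_seen = 0

    def act(self, state, testing):
        values = self.q_values.get(state, [0.0] * self.n_actions)
        return max(range(self.n_actions), key=lambda a: values[a])

    def learn(self, state, action, reward, next_state, done, testing):
        if testing:
            return  # assumption: do not update during testing rounds
        values = self.q_values.setdefault(state, [0.0] * self.n_actions)
        values[action] += 0.1 * (reward - values[action])
        self.steps_seen += 1

    def reset(self):
        # Discard everything learned so far so the next run starts fresh.
        self._build()
```

The same pattern carries over to a neural-network agent if the network and optimizer are constructed inside the helper.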


### Example Agent
Here I have created a sample agent which follows a random policy and never learns:

```python
from agents.base_agent import BaseAgent
class RandomAgent(BaseAgent):
    def __init__(self, env):
        self.env = env

    def act(self, state, testing):
        return self.env.action_space.sample()
    
    def learn(self, s, a, r, sp, d, t):
        pass
    
    def reset(self):
        pass
```

Technically, in this case reset() doesn't need to be explicitly defined, because the BaseAgent class has a basic implementation which simply passes.

### Making your agent accessible by name
The test runner supports referencing an agent by name and fetching it for you. For this to work, your class **MUST** be a subclass of BaseAgent, and it must be loaded in the kernel at the time the test runner is run. The simplest way to do this, once you are done fiddling with it, is to save it in a file in /agents/ and to import your agent class in [/agents/\_\_init__.py](./agents/__init__.py).

You can also reference an agent by name if you have constructed the class in a notebook and it inherits from BaseAgent.
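
The requirements above (subclass of BaseAgent, loaded in the kernel) are consistent with a lookup over the loaded subclasses of the base class, although the runner's actual mechanism may differ. The sketch below shows that idea; `find_subclass` is a hypothetical helper, and the `BaseAgent` defined here is only a stand-in so the example is self-contained.

```python
class BaseAgent:
    """Stand-in for the real BaseAgent, so the sketch is self-contained."""


def find_subclass(base, name):
    """Return the loaded subclass of `base` whose class name is `name`."""
    pending = list(base.__subclasses__())
    while pending:
        cls = pending.pop()
        if cls.__name__ == name:
            return cls
        # Also search indirect subclasses.
        pending.extend(cls.__subclasses__())
    raise ValueError(f"No loaded subclass of {base.__name__} named {name!r}")


class RandomAgent(BaseAgent):
    pass


class GreedyAgent(RandomAgent):
    pass
```

A lookup of this kind only sees classes that have already been defined or imported, which would explain why the class must be loaded before the test runner is started.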


***
<a id='Task'></a>
## 2) How to create a task/environment
[Top](#Top)

The situation here is similar to that of the agents above. The test runner accepts either a task/environment object or a name. When using the name option the same restrictions apply: you must subclass BaseTask ([/tasks/base_task.py](./tasks/base_task.py)).

Tasks are designed to mimic OpenAI Gym environments so that all OpenAI Gym environments are automatically compatible with the test runner. Note, though, that reference by name is not yet supported for Gym environments because they are not subclasses of BaseTask. If you are using a Gym environment you MUST pass the object.

In order to meet this compatibility requirement the following methods are required for any task:

1. `__init__(self)`
2. `step(self, action)`
3. `reset(self)`
4. `render(self, close=False)`

### \_\_init\_\_
There are actually no special requirements for initialization, though it is recommended to include observation and action spaces so that agents can infer what input and output shapes they need to satisfy.

### step
Step takes an action and is expected to return the following as a tuple, in this order (the same order gym uses):

next_state - the new state of the environment    
reward - numerical reward value    
done - Boolean indicating if the episode has ended    
info - a dictionary of auxiliary diagnostic information; it may be empty, but gym returns it, so you must too    

### reset
Reset takes no arguments. It resets the task to the start of a new episode and returns the initial state.

### render
This one is the most complicated. It is only used in the test runner's display functionality, which allows you to see a video of the agent working in the environment. It must support the keyword argument close, which is False by default and is True when the window is to be closed. This is solely for compatibility with the gym environments, and is kind of optional to implement.

### Example
Here is an example where I have wrapped an existing gym environment to reduce the information supplied to the agent in the state (removing the velocity data):

```python
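import gym
from gym import spaces
# Assumes the same package layout as the agent example (tasks/base_task.py)
from tasks.base_task import BaseTask

class limitedCartPole(BaseTask):
    def __init__(self):
        self.env = gym.make('CartPole-v0')
        
        # Keep only cart position (index 0) and pole angle (index 2),
        # dropping the two velocity components
        self.mask = [0, 2]
        olow  = self.env.observation_space.low[self.mask]
        ohigh = self.env.observation_space.high[self.mask]
        
        self.observation_space = spaces.Box(olow, ohigh)
        self.action_space = spaces.Discrete(2)
        
    def step(self, *args, **kwargs):
        # Forward to the wrapped env, then mask the returned state
        ns, r, d, i = self.env.step(*args, **kwargs)
        ns = ns[self.mask]
        return ns, r, d, i
    
    def reset(self):
        # Mask the initial state in the same way
        s = self.env.reset()
        s = s[self.mask]
        return s
    
    def render(self, *args, **kwargs):
        # Pass through so that close=True still reaches the wrapped env
        return self.env.render(*args, **kwargs)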
```
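
For comparison, a task does not have to wrap gym at all. The sketch below is a hypothetical from-scratch task, `CorridorTask`, that implements the four required methods. To stay self-contained it does not subclass BaseTask or define observation and action spaces, both of which a real task in this repo would want.

```python
class CorridorTask:
    """Hypothetical 1-D corridor: start at cell 0, reach the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start a new episode and return the initial state.
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right; anything else moves left (clamped at 0).
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        done = self.position >= self.length - 1
        reward = 1.0 if done else -0.1
        # Same tuple order as gym: next_state, reward, done, info
        return self.position, reward, done, {}

    def render(self, close=False):
        # Nothing to close for a text rendering.
        if close:
            return None
        cells = ["A" if i == self.position else "." for i in range(self.length)]
        line = "".join(cells)
        print(line)
        return line
```

Returning the rendered line as well as printing it is a choice made here to keep the sketch easy to check; it is not something the harness requires.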


### Making your task accessible by name
The test runner supports referencing a task by name and fetching it for you. For this to work, your class **MUST** be a subclass of BaseTask, and it must be loaded in the kernel at the time the test runner is run. The simplest way to do this, once you are done fiddling with it, is to save it in a file in /tasks/ and to import your task class in [/tasks/\_\_init__.py](./tasks/__init__.py).

You can also reference a task by name if you have constructed the class in a notebook and it inherits from BaseTask.

***
<a id='Runner'></a>
## 3) Using the test runner
[Top](#Top)

The test runner is a harness for taking an agent and a task, running them through episodes, and recording stats. It is still in its early stages, but the test can be configured in a number of ways, as outlined here. In particular, the runner is responsible for determining whether a given round is a testing or a training round. It also has helper functions for inspecting the results.

### \_\_init\_\_
The parameters to init are listed below
`(self, env, agent, runs=1, num_episodes=1000, report_interval=10, test_samples=5, print_test=False)`

env - The environment object or string name of task to be used    
agent - The agent object or the string name of agent to be used    
runs - How many times the agent should be reset and trained to accumulate statistics on learning ability    
num_episodes - How many episodes are in a single run    
report_interval - How frequently a testing phase should occur and stats be recorded    
test_samples - How many episodes should be used as a test sample when testing    
print_test - Boolean indicating if only the results of testing phases should be printed out    

**NOTE** When using the string name option, use the name of the class itself, even if you imported it with "import ... as ...". The lookup uses the class's own name, not whatever alias you gave it.

### start
Start takes no arguments. It resets the test runner and starts a testing sequence.

### display
Display takes no arguments. It runs the agent through 1 episode while calling env.render() to display the agent at work

### plot
Plot takes one keyword argument, mode, which if set to 'ci' will plot the mean of all runs along with the 95% confidence interval. Otherwise it will plot all runs on one graph.
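
To make the 'ci' mode concrete, the sketch below computes a per-report mean and a 95% confidence interval from a 2-D table of rewards shaped `[num_reports][num_runs]`, which is the shape of `runner.tracker` described under [Tricks](#Tricks). It uses a normal approximation (mean ± 1.96 standard errors); that choice is an assumption, since how the runner itself computes the interval is not documented here, and `mean_and_ci` is a hypothetical helper.

```python
import math
import statistics


def mean_and_ci(tracker, z=1.96):
    """Return (means, lower, upper), one entry per report row.

    `tracker` is a 2-D sequence shaped [num_reports][num_runs].
    """
    means, lower, upper = [], [], []
    for row in tracker:
        row = [float(v) for v in row]
        m = statistics.mean(row)
        # Standard error of the mean; zero when there is only one run.
        sem = statistics.stdev(row) / math.sqrt(len(row)) if len(row) > 1 else 0.0
        means.append(m)
        lower.append(m - z * sem)
        upper.append(m + z * sem)
    return means, lower, upper
```

Because the function only iterates over rows and converts each value with `float`, it should accept a numpy array as well as nested lists. With a small number of runs, a t-distribution multiplier would give a wider and more honest interval than 1.96.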

### Example
Simple examples of using the TestRunner are as follows:

```python
from runner import TestRunner

# num_episodes is passed by keyword: the third positional parameter is runs
runner = TestRunner('limitedCartPole', 'RandomAgent', num_episodes=100, report_interval=5, print_test=True, runs=10)

# Run the test
runner.start()

# display the agent at work
runner.display()

# show the performance plots
runner.plot(mode='ci')
```

Alternatively, both env and agent can be actual objects. This is particularly useful for supplying OpenAI Gym environments.

```python
from runner import TestRunner
import gym

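env = gym.make('CartPole-v0')
# RandomAgent is the example agent class defined earlier in this file
agent = RandomAgent(env)

# num_episodes is passed by keyword: the third positional parameter is runs
runner = TestRunner(env, agent, num_episodes=100, report_interval=5, print_test=True, runs=10)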

...
```



***
<a id='Tricks'></a>
## Tricks
[Top](#Top)

Here are a few "tricks" that might come in handy

### Saving a rendering to file
OpenAI Gym has a method for doing this, though in my experience it is very finicky. It involves wrapping the environment.

```python
import gym
from gym import wrappers

env = gym.make('CartPole-v0')
Menv = wrappers.Monitor(env, './videos/Cartpole-agent')
```

You can then either pass this wrapped environment to the test runner directly, or swap it in after the fact (if, say, you forgot but you got a really good agent and want to record it) by replacing `runner.env` with the wrapped version. You can then call display() and it will play a single episode and save the video in the folder provided.

```python
# Replace the runner's env temporarily
# Menv.reset()
runner.env = Menv
runner.display()
#This is a dirty hack to make it save
Menv.reset()
Menv.render(close=True)

# Switch back
runner.env = env
```

Note that there is some weirdness about when it chooses to save. If you are getting videos that are empty, try calling `Menv.reset()` before or after calling display, or both. I'm not really sure why that happens.

### Accessing the rewards data
Sometimes you might want to get the reward data after having run the test runner. All of the reward information is stored in `runner.tracker`, which is a numpy array of shape `[num_reports, num_runs]`. If you also want the episode numbers which correspond to these data points, access `runner.X`, which is of shape `[num_reports]`. num_reports is the number of times a testing phase was run. This is approximately `num_episodes//report_interval` but might be 1 more, because testing is always run on the last episode of a run.
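
The relationship between `num_episodes`, `report_interval`, and num_reports can be sketched as follows. The exact episode indexing the runner uses is an assumption here (1-based, with a test after every `report_interval`-th episode), and `report_episodes` is a hypothetical helper rather than part of the harness.

```python
def report_episodes(num_episodes, report_interval):
    """1-based episode numbers after which a testing phase is assumed to run."""
    episodes = list(range(report_interval, num_episodes + 1, report_interval))
    # Testing always runs on the last episode of a run.
    if num_episodes > 0 and (not episodes or episodes[-1] != num_episodes):
        episodes.append(num_episodes)
    return episodes
```

Under this assumption, 100 episodes with an interval of 5 give exactly `100//5` reports, while 102 episodes give one more because the final episode does not fall on the interval.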

### Accessing the Agent
If you want to retrieve your agent after the fact, it is stored at `runner.agent`.

***
<a id='TODO'></a>
## TODO
[Top](#Top)

This is a work in progress harness which will probably get adapted as it is used. Some things foreseen to include:

1. Agent checkpointing options, either best or every test etc. This will require exposing a .save() function on the agent.
2. The ability to input hyperparameters, and possibly even sets of hyperparameter options, and have the runner run all of them and store the test results. This is similar to what scikit-learn offers for grid searching.
3. Maybe more built-in analysis options than just plot, such as other stats.
4. The ability to retrieve processed stats like the mean and confidence intervals so that multiple tests can be compared on one plot more easily.
5. MAYBE allow multiple agents to be specified, all of which will be run and have stats accumulated. This could be added in conjunction with 2.