## Description

This notebook aims to be a minimalist template for building reinforcement learning environments. Highly reusable code is provided, but gaps are left wherever you need to write code specific to your problem.

The main sections are:-
- system parameters: this is where any constants or simple functions required for simulation are coded.
- notebook parameters: general control of this notebook, whether to save results, display outputs etc.
- system constructor: the main body of code defining the environment and handlers for updating it and similar.
- agent handler: a constructor used for managing the agents in the simulated environment, data mining etc.

## Notes

I will create these:-

A general system object capable of representing arbitrary reinforcement learning environments, with the following attributes:-
- constructor
- copySystem
- readInputs
- generateRandomAction
- generateRandomState
- interpretAction
- setAction
- updateSystem

I will also create a generalized agent which works as follows:-
- takes the system in a particular state
- uses readInputs to identify the current state
- passes readInputs to some decision algorithm
- outputs the decision in the format required by interpretAction
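
As a rough illustration of the agent steps listed above, here is a minimal sketch. `StubSystem`, the `Agent` class and its `decisionAlgorithm` argument are hypothetical names invented for this example and are not part of the template; the method name `predict` and the list-shaped return value are assumptions chosen to match how the agent handler later in this notebook calls the agent.

```python
class StubSystem:
    """Hypothetical stand-in for System, with just enough to run an agent."""

    def __init__(self, level=0.5):
        self.level = level

    def readInputs(self):
        # What the agent is allowed to observe
        return [self.level]

    def interpretAction(self, action):
        # Clamp the requested action into the valid range [0, 1]
        return min(max(action, 0.0), 1.0)


class Agent:
    """Wraps an arbitrary decision algorithm behind a predict() method."""

    def __init__(self, decisionAlgorithm):
        self.decisionAlgorithm = decisionAlgorithm

    def predict(self, system):
        inputs = system.readInputs()                # identify the current state
        decision = self.decisionAlgorithm(inputs)   # pass it to the decision algorithm
        return [decision]                           # a list, so it can be unpacked with *


# A trivial decision rule: request an action equal to the observed level
agent = Agent(lambda inputs: inputs[0])
system = StubSystem(level=0.8)
action = agent.predict(system)
print(system.interpretAction(*action))
```

Keeping the decision algorithm as a plain callable means a random policy, a lookup table or a neural network can be swapped in without changing the agent wrapper.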

## To-Do

none

## Imports

In [1]:
import random as r
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm

## System Parameters

These are problem-specific parameters and functions. Anything which is particular to this RL problem but not part of the environment definition goes here.
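
For illustration only, this is the kind of content that might go in this section for a toy tank-filling problem. Every constant and function below is a hypothetical placeholder, not something defined by the template.

```python
# Hypothetical constants for a toy tank-filling problem (illustrative only)
TANK_CAPACITY = 100.0   # litres
INFLOW_RATE = 2.0       # litres per timestep through an open valve
LEAK_RATE = 0.5         # litres lost per timestep
TIMESTEP = 1.0          # seconds per turn


def clamp(value, low, high):
    """Restrict value to the interval [low, high]."""
    return max(low, min(value, high))


def fillFraction(volume):
    """Convert a volume in litres to a fraction of capacity in [0, 1]."""
    return clamp(volume / TANK_CAPACITY, 0.0, 1.0)
```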

## Notebook parameters

In [2]:
#Display
verbosity = False #Whether neural networks should display their predictions and training

#Training parameters

#Testing parameters

#Whether to seed the notebook's randomness
seed = True
if seed:
    r.seed(1)
    np.random.seed(1) #NumPy keeps its own random state, separate from the random module

## System Constructor

All RL problems will need some version of these functions.

In [4]:
class System:
    def __init__(
        self, 
    ):
        pass #Placeholder so the class parses: store the state variables for your problem here

    #Given the state of the environment, what do the agents actually see? 
    def readInputs(self):
        output = [
        ]
        return(output)

    #Creates a dummy copy of the system - useful for constructing other functions
    def copySystem(self):
        dummySystem = System(
        )
        return(dummySystem)

    #Generates any random action, ignoring validity constraints
    def generateRandomAction(self):
        output = [
        ]
        return(output)

    #Creates a randomly chosen state with uniform distribution
    def generateRandomState(self):
        dummySystem = System(
        )
        return(dummySystem)

    #Creates a randomly chosen state, excluding extremes. Not needed for all problems, but can be useful for managing edge cases
    def generateRandomMiddleState(self):
        dummySystem = System(
        )
        return(dummySystem)

    #Creates a default state: useful if there's a particularly common state such as initial configurations
    def generateDefaultState(self):
        dummySystem = System()
        return(dummySystem)

    #Checks agent decisions for validity and interprets invalid actions
    def interpretAction(
        self,
    ):
        return(output)

    #Some problems have actions as parts of the environment (e.g. opening or closing a valve). If so, setAction handles this.
    def setAction(
        self    
    ):
        return(dummySystem)

    #One "turn" might be literally the agent's turn in a discrete time game, or some small unit of time (e.g. 1 second) in continuous time
    def updateSystemOneTurn(self):

        outputSystem = System(
        )
        return(outputSystem)

    #For updating the system for multiple turns/timesteps, or if a different agent gets a turn after our agent (e.g. chess, go)
    def updateSystem(self):
        outputSystem = System(
        )
        return(outputSystem)

    '''
    The next functions are useful for interpreting data but not strictly needed to run the agents
    '''
    def readData(self):
        output = [
        ]
        return(output)
        
    #How good each state is, independently of actions
    def utilityFunction(
        self
    ):
        return(output)


    #The reward for transitioning between states, including rewards or costs for actions
    def reward(
        self,
    ):
       
        return()
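
To show how the gaps in a template like this might be filled, here is a minimal sketch for a toy problem: a leaking tank with an inflow valve. `TankSystem`, its parameters and its dynamics are all assumptions made up for this example. Only the method names follow the template, and the `turns` and `target` arguments are additions of this sketch.

```python
import random as r


class TankSystem:
    """Toy environment (hypothetical): a leaking tank with an inflow valve."""

    def __init__(self, volume=50.0, valveOpen=False, capacity=100.0, inflow=2.0, leak=0.5):
        self.volume = volume
        self.valveOpen = valveOpen
        self.capacity = capacity
        self.inflow = inflow
        self.leak = leak

    def readInputs(self):
        # The agent sees only the fill fraction, not the raw volume
        return [self.volume / self.capacity]

    def copySystem(self):
        return TankSystem(self.volume, self.valveOpen, self.capacity, self.inflow, self.leak)

    def generateRandomAction(self):
        return [r.choice([True, False])]

    def generateRandomState(self):
        dummySystem = self.copySystem()
        dummySystem.volume = r.uniform(0.0, self.capacity)
        dummySystem.valveOpen = r.choice([True, False])
        return dummySystem

    def interpretAction(self, action):
        # Any truthy value opens the valve, anything else closes it
        return bool(action)

    def setAction(self, action):
        # The valve is part of the environment, so the action is stored in the state
        dummySystem = self.copySystem()
        dummySystem.valveOpen = self.interpretAction(action)
        return dummySystem

    def updateSystemOneTurn(self):
        outputSystem = self.copySystem()
        flow = (self.inflow if self.valveOpen else 0.0) - self.leak
        outputSystem.volume = min(max(self.volume + flow, 0.0), self.capacity)
        return outputSystem

    def updateSystem(self, turns=1):
        outputSystem = self
        for _ in range(turns):
            outputSystem = outputSystem.updateSystemOneTurn()
        return outputSystem

    def utilityFunction(self, target=0.5):
        # Zero at the target fill fraction, increasingly negative further away
        return -abs(self.volume / self.capacity - target)


state = TankSystem(volume=50.0)
state = state.setAction(True).updateSystem(turns=3)
print(state.volume)  # 54.5
```

Every method in this sketch returns a fresh object rather than mutating `self`, so callers must keep the returned system.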


## Agent Handler

In [None]:
class AgentHandler:
    def __init__(self, agent):
        self.agent = agent

    #Briefly test the agents to check performance is as expected. Should require minimal tuning; timesteps is the main parameter to adjust
    def evaluateAgent(self, timesteps = 60):
        inputs = []
        outputs = []
        data = []
        systemState = System().generateRandomState() #Consider replacing with generateRandomMiddleState() if appropriate
        for i in tqdm(range(timesteps)):
            inputs.append(systemState.readInputs())
            out = self.agent.predict(systemState)
            outputs.append(out)
            data.append(systemState.readData())
            systemState = systemState.setAction(*out)
            systemState = systemState.updateSystem() #updateSystem returns a new System, so keep the result
        return([inputs, outputs, data])

    #Show a graph for the performances in evaluateAgent
    def displayEvaluations(self):
        data = self.evaluateAgent()
        for i in range(len(data)):
            row = data[i]
            plt.xlim(0, len(row))
            plt.grid(True)
            plt.plot(row)
            plt.legend()
            plt.show()

    
    #Summarises each variable in each row of data by its minimum, quartiles and maximum
    def processData(self, data):
        outputs = []
        for row in data:
            subRow = []
            transRow = np.array(row).T #One entry per variable, each holding that variable's values over time
            for element in transRow:
                subRow.append(np.percentile(element, 0))
                subRow.append(np.percentile(element, 25))
                subRow.append(np.median(element))
                subRow.append(np.percentile(element, 75))
                subRow.append(np.percentile(element, 100))
            outputs.append(subRow)
        outputs = np.array(outputs)
        return(outputs.T)

    #fullTest is a longer test of the agents. Not needed for all problems, and requires specific tuning
    def fullTest(self):