# Agent One Notes
These are notes for the `agent_one.py` script.

```python
import random
import time

from tqdm import tqdm, tnrange
```

# First Agent

### Environment Class

```python
# Environment Class
class Environment:
    def __init__(self):
        # The episode ends after 100 steps
        self.steps_left = 100

    def get_observations(self):
        """
        Return the environment's current observation to the agent.
        """
        return [0.0, 0.0, 0.0]  # Dummy observation: always zeros, carries no information

    def get_actions(self):
        """
        Return the set of actions the agent can execute.
        """
        return [0, 1]  # Two possible actions

    def is_done(self):
        """
        Signal the end of the episode to the agent.
        """
        return self.steps_left == 0

    def action(self, action):
        """
        Handle the agent's action and return the reward for it.
        This is the central piece of the environment's functionality.
        """
        if self.is_done():
            raise Exception('Game is over')
        self.steps_left -= 1

        # The action is ignored: the reward is a random float in [0.0, 1.0)
        return random.random()
```
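Because `action()` draws its reward from Python's module-level `random` generator, seeding that generator before an episode makes the reward sequence reproducible. The sketch below uses a hypothetical helper, `draw_rewards`, that mimics only the environment's reward draws; it is an illustration under that assumption, not part of `agent_one.py`.

```python
import random


def draw_rewards(seed, n_steps=100):
    """Hypothetical helper mimicking Environment.action(): one random.random() per step."""
    random.seed(seed)  # seed the module-level generator that Environment.action() uses
    return [random.random() for _ in range(n_steps)]


run_a = draw_rewards(seed=42)
run_b = draw_rewards(seed=42)
run_c = draw_rewards(seed=7)

print(run_a == run_b)  # True: the same seed gives the same reward sequence
print(run_a == run_c)  # a different seed gives a different sequence
```

In the full script, `Agent.step()` also calls `random.choice()` on the same generator, so the draws are interleaved. Seeding once before the loop still reproduces the whole episode, but the individual reward values will differ from the ones this helper returns.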

### Agent Class

```python
# Agent Class
class Agent:
    def __init__(self):
        """
        Initialize the counter that keeps the total reward accumulated by the agent during the episode.
        """
        self.total_reward = 0.0

    def step(self, env):
        """
        Accept an environment instance, let the agent take one action, and add the resulting reward to the total.
        Args:
            env: Environment instance
        """
        # Make an observation (this random agent does not use it)
        current_obs = env.get_observations()

        # Query the actions the environment allows
        actions = env.get_actions()

        # Pick an action at random and submit it, receiving a reward
        reward = env.action(random.choice(actions))

        # Update the total reward
        self.total_reward += reward


if __name__ == "__main__":
    env = Environment()
    agent = Agent()

    # tnrange draws a Jupyter progress-bar widget (it is a deprecated alias of
    # tqdm.notebook.trange); in a plain terminal script, tqdm.trange is the equivalent.
    # The loop runs exactly env.steps_left times, after which env.is_done() is True.
    for i in tnrange(env.steps_left, desc="Training Agent"):
        agent.step(env)
        time.sleep(0.1)  # Slow the loop down so the progress bar is visible
        tqdm.write(f'Iteration: {i+1}, Total Reward So Far: {agent.total_reward}')
```
        



Iteration: 100, Total Reward So Far: 55.41886417276639
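The final total is easy to sanity-check. Each reward is `random.random()`, which is uniform on [0, 1), so it has mean 1/2 and variance 1/12. Over 100 independent steps the total therefore has mean 50 and standard deviation sqrt(100/12) ≈ 2.89. The 55.42 above sits about 1.9 standard deviations above the mean, which is unremarkable for a single run. The sketch below checks those numbers with a small Monte Carlo simulation. `N_STEPS` and `simulate_totals` are hypothetical names, and the simulation draws rewards directly instead of instantiating `Environment`.

```python
import math
import random
import statistics

N_STEPS = 100  # matches Environment.steps_left

# Analytic mean and standard deviation of a sum of N_STEPS Uniform[0, 1) rewards
expected_total = N_STEPS * 0.5
std_total = math.sqrt(N_STEPS / 12)


def simulate_totals(n_episodes, seed=0):
    """Total reward of n_episodes independent episodes of N_STEPS uniform rewards each."""
    rng = random.Random(seed)  # private generator so the check is reproducible
    return [sum(rng.random() for _ in range(N_STEPS)) for _ in range(n_episodes)]


totals = simulate_totals(2000)
print(f"analytic mean {expected_total:.1f}, sd {std_total:.2f}")  # analytic mean 50.0, sd 2.89
print(f"simulated mean {statistics.mean(totals):.2f}, sd {statistics.stdev(totals):.2f}")
```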


### Agent 1 Notes
The code above illustrates the basic concepts of this RL model. *The environment could be an extremely complicated physics model, and the agent could be a large neural network implementing the latest RL algorithm, but **the basic pattern stays the same**:*
* On every step, the agent takes an observation from the environment
* The agent does its calculations
* Based on those calculations, it selects the action to issue
* The result of that action is a **reward** and a new observation
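In `agent_one.py`, `action()` returns only the reward, and the agent asks for the next observation separately. Many RL libraries bundle both into a single call. The sketch below is a hypothetical variant, `StepEnvironment`, whose `step()` returns `(observation, reward, done)`. This is a simplified version of the convention popularized by OpenAI Gym; Gymnasium's actual `step()` returns more fields. The class, `run_episode`, and the seeding are assumptions for illustration, and each line of the loop is labelled with the step of the pattern it corresponds to.

```python
import random
from typing import List, Tuple

Observation = List[float]


class StepEnvironment:
    """Hypothetical variant of Environment: step() returns (observation, reward, done)."""

    def __init__(self, steps: int = 100, seed: int = 0):
        self.steps_left = steps
        self.rng = random.Random(seed)  # private generator so episodes are reproducible

    def observe(self) -> Observation:
        return [0.0, 0.0, 0.0]  # dummy observation, as in the original Environment

    def actions(self) -> List[int]:
        return [0, 1]

    def step(self, action: int) -> Tuple[Observation, float, bool]:
        if self.steps_left == 0:
            raise RuntimeError("Game is over")
        self.steps_left -= 1
        reward = self.rng.random()  # the reward ignores the action, as before
        return self.observe(), reward, self.steps_left == 0


def run_episode(env: StepEnvironment, seed: int = 1) -> float:
    """Random agent following the observe -> decide -> act -> reward loop."""
    policy_rng = random.Random(seed)
    obs = env.observe()  # 1. take an observation (a real agent would feed it to its policy)
    total, done = 0.0, False
    while not done:
        action = policy_rng.choice(env.actions())  # 2-3. "calculate" and pick an action (here: at random)
        obs, reward, done = env.step(action)       # 4. receive a reward and a new observation
        total += reward
    return total


total = run_episode(StepEnvironment(steps=100, seed=0))
print(f"Total reward: {total:.2f}")
```

Returning the new observation together with the reward keeps the loop to one environment call per step, and the `done` flag replaces the separate `is_done()` query.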