# Simple Reinforcement Learning Implementation
![image.png](attachment:image.png)

### Our first RL Code
To keep things simple, let's create a dummy environment that gives the agent a random reward at every step, regardless of the agent's actions.

Though this has no practical use, it allows us to focus on implementing the environment and agent classes.

Our environment class should be capable of handling actions received from the agent. This is done by the `action` method, which checks whether any steps are left, decrements the step counter, and returns a random reward, ignoring the agent's action.

The `__init__` constructor sets the number of steps in an episode (20 here). The `get_observation()` method is supposed to return the current environment's observation to the agent, but in this case it always returns a zero vector.

The other methods are mostly self-explanatory: `get_actions()` returns the list `[0, 1]`, corresponding to the two available actions, and `is_done()` checks whether the episode has ended.

In [1]:
import random
from typing import List

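class SampleEnvironment:
    def __init__(self):
        # Number of steps the agent may take before the episode ends
        self.steps_left = 20

    def get_observation(self) -> List[float]:
        # A real environment would return its state; here it is always zeros
        return [0.0, 0.0, 0.0]

    def get_actions(self) -> List[int]:
        # The two actions available to the agent
        return [0, 1]

    def is_done(self) -> bool:
        return self.steps_left == 0

    def action(self, action: int) -> float:
        # The action is ignored: every step just yields a random reward
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        return random.random()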
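
To see that the reward really does not depend on the action, the following minimal sketch replays one episode twice with the same random seed, once always sending action 0 and once always sending action 1. The `run_episode` helper and the seed value 42 are assumptions made for illustration and are not part of the original code; the class is repeated in compact form only so the snippet runs on its own.

```python
import random
from typing import List


class SampleEnvironment:
    """Compact copy of the environment above so the snippet runs on its own."""

    def __init__(self):
        self.steps_left = 20

    def is_done(self) -> bool:
        return self.steps_left == 0

    def action(self, action: int) -> float:
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        return random.random()


def run_episode(fixed_action: int, seed: int) -> List[float]:
    """Play one episode, always sending the same action (hypothetical helper)."""
    random.seed(seed)  # same seed -> same sequence of random rewards
    env = SampleEnvironment()
    rewards = []
    while not env.is_done():
        rewards.append(env.action(fixed_action))
    return rewards


rewards_a = run_episode(fixed_action=0, seed=42)
rewards_b = run_episode(fixed_action=1, seed=42)

print(len(rewards_a))          # one reward per step of the episode
print(rewards_a == rewards_b)  # identical sequences: the action is ignored

# Stepping a finished environment raises the "Game is over" exception.
env = SampleEnvironment()
for _ in range(20):
    env.action(0)
try:
    env.action(0)
except Exception as exc:
    print(exc)
```

Because the two runs differ only in the action sent, matching reward lists confirm that this environment gives the agent nothing to learn from.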

The agent class is simple and includes only two methods: the constructor and a method that performs one step in the environment.

Initially, the constructor sets the total reward collected to zero.

The `step` method accepts an environment instance as an argument and lets the agent do the following:

1. Observe the environment
2. Make a decision about the action to take based on the observations (in this example the choice is random and ignores the observation)
3. Submit the action to the environment
4. Get the reward for the current step

In [2]:
class Agent:
    def __init__(self):
        self.total_reward = 0.0

    def step(self, env: SampleEnvironment):
        # 1. Observe the environment
        current_obs = env.get_observation()
        print("Observation {}".format(current_obs))
        # 2. Pick an action (randomly, ignoring the observation)
        actions = env.get_actions()
        print(actions)
        # 3. Submit the action and 4. collect the reward
        reward = env.action(random.choice(actions))
        self.total_reward += reward
        print("Total Reward {}".format(self.total_reward))
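
Since each step's reward is drawn uniformly from [0, 1), its mean is 0.5, so a 20-step episode should yield a total reward of about 10 on average. The sketch below checks this by averaging over many episodes. `QuietAgent`, `episode_total`, the seed, and the episode count of 2000 are assumptions introduced for illustration; `QuietAgent` keeps the same random action choice as the agent above but skips the observation and the printing.

```python
import random
import statistics


class SampleEnvironment:
    """Compact copy of the environment so the snippet runs on its own."""

    def __init__(self):
        self.steps_left = 20

    def get_actions(self):
        return [0, 1]

    def is_done(self) -> bool:
        return self.steps_left == 0

    def action(self, action: int) -> float:
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        return random.random()


class QuietAgent:
    """Hypothetical variant of Agent: same random choice, no printing."""

    def __init__(self):
        self.total_reward = 0.0

    def step(self, env: SampleEnvironment) -> None:
        reward = env.action(random.choice(env.get_actions()))
        self.total_reward += reward


def episode_total() -> float:
    """Run one full episode and return the reward collected."""
    env, agent = SampleEnvironment(), QuietAgent()
    while not env.is_done():
        agent.step(env)
    return agent.total_reward


random.seed(0)  # fixed seed only to make the run repeatable
totals = [episode_total() for _ in range(2000)]
mean_total = statistics.mean(totals)
print("Mean total reward: %.2f" % mean_total)  # should land close to 10
```

A single episode can deviate noticeably from 10, which is why the totals are averaged over many runs.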

In [3]:
if __name__ == "__main__":
    env = SampleEnvironment()
    agent = Agent()
    i = 0

    while not env.is_done():
        i += 1
        print("Steps {}".format(i))
        agent.step(env)

    print("Total reward got: %.4f" % agent.total_reward)

Steps 1
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 0.577610140966117
Steps 2
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 0.9145787534023856
Steps 3
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 1.5339384137172374
Steps 4
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 2.1105444460098894
Steps 5
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 2.961794010771453
Steps 6
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 3.3685717746129784
Steps 7
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 4.3328321189756265
Steps 8
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 5.244586063492445
Steps 9
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 5.973016415530635
Steps 10
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 6.311209139052627
Steps 11
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 6.314341379158698
Steps 12
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 7.153471517484753
Steps 13
Observation [0.0, 0.0, 0.0]
[0, 1]
Total Reward 8.069288019973232
Steps 14
Observation [0.0, 0.