# Implementing the Reinforcement Learning Loop with Gym
In this exercise we implement a basic RL loop with episodes and timesteps using the Cart-Pole environment. You can swap in other environments without changing the loop, because the main goal of gym is to unify the interfaces of all environments so that agents can be built to be as environment-agnostic as possible. This is a distinctive feature of RL: the algorithms are usually not tailored to a single task but are task-agnostic, so they can be applied to a variety of environments.
We need to create the environment as before. After that, we loop for a defined number of episodes; within each episode we loop for a defined number of steps or until the episode terminates (checked through the done value). At each timestep we call the env.step() function with an action (a random one for now) and collect the information it returns.

In [1]:
import gym

import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
env = gym.make("CartPole-v1")

# each episode is composed of at most 100 timesteps
# define 10 episodes
n_episodes = 10
n_timesteps = 100

# loop for the episodes
for episode_number in range(n_episodes):
    # here we are inside an episode

    # the reset function resets the environment and returns
    # the first environment observation
    # (classic gym API; from gym 0.26 on, reset returns (observation, info))
    observation = env.reset()

    # loop for the given number of timesteps or
    # until the episode is terminated
    for timestep_number in range(n_timesteps):
        
        # render the environment as an RGB array
        # (the mode name is "rgb_array", with an underscore)
        env.render(mode="rgb_array")

        # select the action
        action = env.action_space.sample()

        # apply the selected action by calling env.step
        observation, reward, done, info = env.step(action)

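        # if done, the episode has terminated; the environment is
        # reset at the start of the next episode
        if done:
            # timestep_number is zero-based, so the episode lasted
            # timestep_number + 1 steps
            print(f"Episode Number: {episode_number}, Timesteps: {timestep_number}")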
            # break from the timestep loop
            break

# close the environment
env.close()


Episode Number: 0, Timesteps: 34
Episode Number: 1, Timesteps: 10
Episode Number: 2, Timesteps: 12
Episode Number: 3, Timesteps: 21
Episode Number: 4, Timesteps: 16
Episode Number: 5, Timesteps: 17
Episode Number: 6, Timesteps: 12
Episode Number: 7, Timesteps: 15
Episode Number: 8, Timesteps: 16
Episode Number: 9, Timesteps: 16
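
The claim that the loop is environment-agnostic can be illustrated with a minimal sketch that does not depend on gym at all. `CoinGuessEnv`, `DiscreteSpace` and `run_random_episodes` are hypothetical names introduced only for this example. The toy environment is an assumption that mimics the classic interface used above (reset() returns an observation, step() returns a 4-tuple). The loop also accumulates the reward, as one example of the information we might want to collect.

```python
import random


class DiscreteSpace:
    """Hypothetical stand-in for a gym discrete action space."""

    def __init__(self, n, rng):
        self.n = n
        self._rng = rng

    def sample(self):
        # draw a uniformly random action in {0, ..., n - 1}
        return self._rng.randrange(self.n)


class CoinGuessEnv:
    """Hypothetical toy environment with a gym-like interface.

    The agent guesses a coin flip at every step. A correct guess gives
    reward 1.0; a wrong guess gives reward 0.0 and ends the episode.
    The observation is the number of correct guesses so far.
    """

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.action_space = DiscreteSpace(2, self._rng)
        self._streak = 0

    def reset(self):
        self._streak = 0
        return self._streak

    def step(self, action):
        coin = self._rng.randrange(2)
        correct = action == coin
        self._streak += int(correct)
        reward = 1.0 if correct else 0.0
        done = not correct
        return self._streak, reward, done, {}


def run_random_episodes(env, n_episodes, n_timesteps):
    """Run random-action episodes; return (steps, total_reward) per episode."""
    results = []
    for _ in range(n_episodes):
        env.reset()
        total_reward = 0.0
        steps_taken = 0
        for _ in range(n_timesteps):
            action = env.action_space.sample()
            _, reward, done, _ = env.step(action)
            total_reward += reward
            steps_taken += 1
            if done:
                break
        results.append((steps_taken, total_reward))
    return results


results = run_random_episodes(CoinGuessEnv(seed=0), n_episodes=10, n_timesteps=100)
for episode_number, (steps_taken, total_reward) in enumerate(results):
    print(f"Episode Number: {episode_number}, Timesteps: {steps_taken}, Return: {total_reward}")
```

Because `run_random_episodes` only relies on reset(), step() and action_space.sample(), passing the Cart-Pole environment created above in place of `CoinGuessEnv` should work unchanged, assuming the same classic 4-tuple step API that this notebook uses.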
