## Introduction
This notebook is based on the video [Play Any OpenAI Gym Environment with a Single Agent](https://www.youtube.com/watch?v=nvhWfk7R0RM&list=PLIfPjWrv526bMF8_vx9BqWjec-F-g-lQO&index=2) by TheComputerScientist.

It explores two other environments: **MountainCarContinuous-v0** and **MountainCar-v0** (*the discrete version*).

Note: here, calling an environment discrete or continuous refers to its **action space**.

***

## Imports

First, set up the working environment by importing the required libraries:

In [3]:
import gym
import random
import numpy as np

***

## Creating the **MountainCar-v0** environment and a naive agent

In [4]:
env_name = "MountainCar-v0"
env = gym.make(env_name)



Next, print the **observation space** and the **action space**:

In [5]:
print("Observation space of", env_name, "environment:", env.observation_space)
print("Action space of", env_name, "environment:", env.action_space)

Observation space of MountainCar-v0 environment: Box([-1.2  -0.07], [0.6  0.07], (2,), float32)
Action space of MountainCar-v0 environment: Discrete(3)
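
The printed spaces can be read as follows: the `Box` observation space has two components (the car's position and its velocity), each bounded by the matching entries of the two arrays, while `Discrete(3)` means the valid actions are the integers 0, 1 and 2. As a minimal sketch, the bounds check could be written with numpy alone. The helper `in_bounds` and the sample observations are hypothetical and serve only as illustration:

```python
import numpy as np

# Bounds copied from the printed observation space: [position, velocity]
obs_low = np.array([-1.2, -0.07], dtype=np.float32)
obs_high = np.array([0.6, 0.07], dtype=np.float32)

def in_bounds(obs, low, high):
    """Return True if every component of obs lies within [low, high]."""
    obs = np.asarray(obs, dtype=np.float32)
    return bool(np.all((obs >= low) & (obs <= high)))

# Illustrative observations (values chosen for this example only)
inside = in_bounds([-0.5, 0.0], obs_low, obs_high)
outside = in_bounds([0.7, 0.0], obs_low, obs_high)
print(inside, outside)
```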


Then define the `Agent` class, which will interact with the environment:

In [6]:
class Agent:
    def __init__(self, env):
        self.action_size = env.action_space.n
        print("Action size:", self.action_size)
        
    def get_action(self, state):
        action = random.choice(range(self.action_size))
        return action

Next, instantiate an agent and have it take random actions to test it on the environment:

In [7]:
agent = Agent(env)
state = env.reset()

for _ in range(200):
    action = agent.get_action(state)
    state, reward, done, info = env.step(action)
    env.render()
    
env.close()
env.reset()

Action size: 3


array([-0.50772446,  0.        ], dtype=float32)
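
The loop above always runs for 200 steps and ignores the `done` flag returned by `env.step()`. A possible variant stops as soon as the episode ends and accumulates the reward. This is only a sketch, assuming the 4-tuple `step` API used in this notebook. `StubEnv` and `run_episode` are hypothetical names, and the stub replaces a real gym environment so that the snippet runs without gym installed:

```python
import random

class StubEnv:
    """Hypothetical stand-in for a gym environment (4-tuple step API)."""
    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return (0.0, 0.0)  # dummy state

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        # Dummy state, a reward of -1 per step, done flag, empty info dict
        return (0.0, 0.0), -1.0, done, {}

def run_episode(env, choose_action, max_steps=200):
    """Step until the episode ends or max_steps is reached.

    Returns the number of steps taken and the total reward.
    """
    state = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = choose_action(state)
        state, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return steps, total_reward

# Random policy over three discrete actions, as in the Agent class
steps, total = run_episode(StubEnv(episode_length=5),
                           lambda state: random.choice(range(3)))
print(steps, total)
```

With a real environment, `StubEnv(...)` would be replaced by the `env` created earlier and the lambda by `agent.get_action`.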

***

## Creating the **MountainCarContinuous-v0** environment and a naive agent

Now switch to the continuous environment **MountainCarContinuous-v0**:

In [8]:
env_name_continuous = "MountainCarContinuous-v0"
env_continuous = gym.make(env_name_continuous)

In [9]:
print("Observation space of", env_name_continuous, "environment:", env_continuous.observation_space)
print("Action space of", env_name_continuous, "environment:", env_continuous.action_space)

Observation space of MountainCarContinuous-v0 environment: Box([-1.2  -0.07], [0.6  0.07], (2,), float32)
Action space of MountainCarContinuous-v0 environment: Box(-1.0, 1.0, (1,), float32)


As the output shows, the `action_space` is continuous in this case: a `Box` of type `float32`.

The **Agent** class defined earlier does not work for continuous environments, because a `Box` `action_space` has no size `n`, so a random action cannot be drawn with `random.choice()` (which returns a randomly selected element from the specified sequence). Here the **numpy** library is used to define a new agent class that works for continuous environments:

In [10]:
class AgentCont:
    def __init__(self, env):
        self.action_low = env.action_space.low
        self.action_high = env.action_space.high
        self.action_shape = env.action_space.shape
        print(self.action_shape)
        
    def get_action(self, state):
        action = np.random.uniform(self.action_low,
                                  self.action_high,
                                  self.action_shape)
        return action
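
To see what `np.random.uniform` returns when given array bounds and a shape, the sketch below draws many samples with the bounds printed for this environment and checks their range and shape. The variable names are illustrative and not part of the notebook:

```python
import numpy as np

# Bounds and shape as printed for the MountainCarContinuous-v0 action space
low = np.array([-1.0], dtype=np.float32)
high = np.array([1.0], dtype=np.float32)
shape = (1,)

samples = [np.random.uniform(low, high, shape) for _ in range(1000)]

# Every sample is an array of shape (1,) with its value inside the bounds
all_in_range = all(low[0] <= s[0] <= high[0] for s in samples)
shapes_ok = all(s.shape == shape for s in samples)
print(all_in_range, shapes_ok)
```

Note that `np.random.uniform` returns `float64` values. If matching the space's `float32` dtype matters, the action could be cast with `.astype(np.float32)`.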

The next step is to have this new kind of agent interact with the **MountainCarContinuous-v0** environment:

In [11]:
agent_cont = AgentCont(env_continuous)
state = env_continuous.reset()

for _ in range(200):
    action = agent_cont.get_action(state)
    state, reward, done, info = env_continuous.step(action)
    env_continuous.render()
    
env_continuous.close()
env_continuous.reset()

(1,)


array([-0.42414397,  0.        ], dtype=float32)

Out of curiosity, try the original **Agent** class here to verify the claim made above:

In [12]:
agent = Agent(env_continuous)
state = env_continuous.reset()

for _ in range(200):
    action = agent.get_action(state)
    state, reward, done, info = env_continuous.step(action)
    env_continuous.render()
    
env_continuous.close()
env_continuous.reset()

AttributeError: 'Box' object has no attribute 'n'
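
The error is raised in `Agent.__init__`, at the access to `env.action_space.n`, before the loop even starts. One way to detect the problem without triggering the exception is to look the attribute up with a default. The sketch below uses two hypothetical stand-in classes (`FakeDiscrete`, `FakeBox`) instead of the real gym spaces so that it runs without gym, and `action_count` is likewise an illustrative helper:

```python
class FakeDiscrete:
    """Hypothetical stand-in for a discrete space: exposes n."""
    def __init__(self, n):
        self.n = n

class FakeBox:
    """Hypothetical stand-in for a continuous space: low/high/shape, no n."""
    def __init__(self, low, high, shape):
        self.low = low
        self.high = high
        self.shape = shape

def action_count(space):
    """Return the number of discrete actions, or None if the space has no n."""
    return getattr(space, "n", None)

print(action_count(FakeDiscrete(3)))
print(action_count(FakeBox(-1.0, 1.0, (1,))))
```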

***

## Creating a new agent

Now define an agent class that can interact with both discrete and continuous environments:

In [13]:
class AgentV2:
    def __init__(self, env):
        self.is_discrete = isinstance(env.action_space, gym.spaces.Discrete)
        
        if self.is_discrete:
            self.action_size = env.action_space.n
            print("Action size:", self.action_size)
        else:
            self.action_low = env.action_space.low
            self.action_high = env.action_space.high
            self.action_shape = env.action_space.shape
            print("Action range:[", self.action_low,",", self.action_high, "]")
            
    def get_action(self, state):
        if self.is_discrete:
            action = random.choice(range(self.action_size))
        else:
            action = np.random.uniform(self.action_low,
                                      self.action_high,
                                      self.action_shape)

        return action
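
As an alternative to branching on the space type, gym spaces also provide a `sample()` method that draws a random valid action, so an agent can simply delegate to it. The sketch below illustrates the idea with hypothetical stand-in spaces (`FakeDiscrete`, `FakeBox`) that mimic only `sample()`, so that it runs without gym. `SamplingAgent` is an illustrative name, not part of the notebook:

```python
import random

class FakeDiscrete:
    """Hypothetical stand-in for a discrete space with n actions."""
    def __init__(self, n):
        self.n = n

    def sample(self):
        return random.randrange(self.n)

class FakeBox:
    """Hypothetical stand-in for a one-dimensional continuous space."""
    def __init__(self, low, high):
        self.low = low
        self.high = high

    def sample(self):
        return [random.uniform(self.low, self.high)]

class SamplingAgent:
    """Random agent that delegates action selection to the space itself."""
    def __init__(self, action_space):
        self.action_space = action_space

    def get_action(self, state):
        return self.action_space.sample()

discrete_action = SamplingAgent(FakeDiscrete(3)).get_action(state=None)
continuous_action = SamplingAgent(FakeBox(-2.0, 2.0)).get_action(state=None)
print(discrete_action, continuous_action)
```

With a real environment, the agent would be built as `SamplingAgent(env.action_space)`.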

Finally, check that `AgentV2` works on two new environments, one discrete (**Acrobot-v1**) and one continuous (**Pendulum-v1**):

In [14]:
env1_name = "Acrobot-v1"
env1 = gym.make(env1_name)

agent = AgentV2(env1)
state = env1.reset()

for _ in range(200):
    action = agent.get_action(state)
    state, reward, done, info = env1.step(action)
    env1.render()
    
env1.close()
env1.reset()

Action size: 3


array([ 0.9951293 , -0.09857865,  0.99986255,  0.01658019,  0.08299524,
       -0.06223753], dtype=float32)

In [15]:
env2_name = "Pendulum-v1"
env2 = gym.make(env2_name)

agent = AgentV2(env2)
state = env2.reset()

for _ in range(200):
    action = agent.get_action(state)
    state, reward, done, info = env2.step(action)
    env2.render()
    
env2.close()
env2.reset()

Action range:[ [-2.] , [2.] ]


array([ 0.1871025 , -0.9823404 ,  0.16972768], dtype=float32)