<a href="https://colab.research.google.com/github/NicolasOrtiz05/Analitica/blob/main/Practica_1_snake_reactive_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 1: Snake Environment and Reactive Agent
### Data Analytics
### Universidad de La Sabana

---

**Submission deadline**: Wednesday, August 14, before midnight (see the instructions in Teams).


Make sure you restart the notebook and run it in full before submitting it. Check that all outputs are displayed correctly.

Group members (maximum 3):

* Name_1 username_1
* Name_2 username_2
* Name_3 username_3

---

The following commands download from GitHub the `snakeai` package and the files this notebook needs. If you run the notebook in a local environment, check that the paths are correct.

In [None]:
!rm -rf snake-ai-reinforcement snakeai # Remove (rm: remove) any previous copy of the repository or of the snakeai package in the notebook's folder, so the cell can be re-run
!git clone https://github.com/YuriyGuts/snake-ai-reinforcement.git # Clone the GitHub repository (git clone)
!mv snake-ai-reinforcement/snakeai . # Move (mv: move) the snakeai folder to the notebook's directory
!ls # List (ls: list) the contents of the notebook's directory

rm: cannot remove 'snake-ai-reinforcement': No such file or directory
Cloning into 'snake-ai-reinforcement'...
remote: Enumerating objects: 197, done.
remote: Total 197 (delta 0), reused 0 (delta 0), pack-reused 197 (from 1)
Receiving objects: 100% (197/197), 42.98 KiB | 8.59 MiB/s, done.
Resolving deltas: 100% (97/97), done.
sample_data  snakeai  snake-ai-reinforcement


We are going to build an agent that can play the game of *Snake*:

<img src="https://cloud.githubusercontent.com/assets/2750531/24808769/cc825424-1bc5-11e7-816f-7320f7bda2cf.gif" alt="Snake snapshot" width="320"/>

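For this, we use as a starting point this [project](https://github.com/YuriyGuts/snake-ai-reinforcement) developed by [Yuriy Guts](https://github.com/YuriyGuts).

First, we define a class to simulate the game, in which the snake only observes the three cells adjacent to its head (left, front and right, relative to its direction of motion):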


In [None]:
from snakeai.gameplay.environment import Environment
from snakeai.gameplay.entities import Point

class EnvironmentPO(Environment):
    """
    Partial-observation environment. Same as the base Environment class, but overrides
    `get_observation` so that only the three cells around the snake's head
    (left, front, right, relative to its direction) are returned.
    (From the Environment docstring): Represents the RL environment for the Snake game that
    implements the game logic, provides rewards for the agent and keeps track of game statistics.
    """
    def __init__(self, config, verbose=0):
        super().__init__(config, verbose)

    @property
    def observation_shape(self):
        """ Get the shape of the state observed at each timestep. """
        return 3

    def get_observation(self):
        """ Observe the (left, front, right) cells relative to the snake's heading. """
        if self.is_game_over:
            return (0, 0, 0)
        head, direction = self.snake.head, self.snake.direction
        center = head + direction
        # The y axis grows downward, so "left" and "right" depend on the heading.
        if direction == Point(0, 1):       # moving down
            left, right = head + Point(1, 0), head + Point(-1, 0)
        elif direction == Point(0, -1):    # moving up
            left, right = head + Point(-1, 0), head + Point(1, 0)
        elif direction == Point(1, 0):     # moving right
            left, right = head + Point(0, -1), head + Point(0, 1)
        else:                              # moving left
            left, right = head + Point(0, 1), head + Point(0, -1)
        return (self.field[left], self.field[center], self.field[right])

    def show_field(self):
        """ Return the board as a printable string. """
        return str(self.field)

This class extends the `Environment` class of the `snakeai` project, whose interface is summarized below (method bodies omitted):

```Python
class Environment(object):
    """
    Represents the RL environment for the Snake game that implements the game logic,
    provides rewards for the agent and keeps track of game statistics.
    """

    def __init__(self, config, verbose=1):
        """
        Create a new Snake RL environment.
        
        Args:
            config (dict): level configuration, typically found in JSON configs.  
            verbose (int): verbosity level:
                0 = do not write any debug information;
                1 = write a CSV file containing the statistics for every episode;
                2 = same as 1, but also write a full log file containing the state of each timestep.
        """
        self.field = Field(level_map=config['field'])
        self.snake = None
        self.fruit = None
        self.initial_snake_length = config['initial_snake_length']
        self.rewards = config['rewards']
        self.max_step_limit = config.get('max_step_limit', 1000)
        self.is_game_over = False

        self.timestep_index = 0
        self.current_action = None
        self.stats = EpisodeStatistics()
        self.verbose = verbose
        self.debug_file = None
        self.stats_file = None

    def seed(self, value):
        ...  # body omitted in this summary

    @property
    def observation_shape(self):
        """ Get the shape of the state observed at each timestep. """

    @property
    def num_actions(self):
        """ Get the number of actions the agent can take. """

    def new_episode(self):
        """ Reset the environment and begin a new episode. """
        
    def record_timestep_stats(self, result):
        """ Record environment statistics according to the verbosity level. """

    def get_observation(self):
        """ Observe the state of the environment. """

    def choose_action(self, action):
        """ Choose the action that will be taken at the next timestep. """

    def timestep(self):
        """ Execute the timestep and return the new observable state. """

    def generate_fruit(self, position=None):
        """ Generate a new fruit at a random unoccupied cell. """

    def has_hit_wall(self):
        """ True if the snake has hit a wall, False otherwise. """

    def has_hit_own_body(self):
        """ True if the snake has hit its own body, False otherwise. """

    def is_alive(self):
        """ True if the snake is still alive, False otherwise. """
```
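The `if/elif` chain in `EnvironmentPO.get_observation` encodes a 90-degree rotation. As a sketch, assuming (as the printed boards suggest) that `x` grows to the right and `y` grows downward, for a heading `(dx, dy)` the cell to the snake's left is at offset `(dy, -dx)` and the cell to its right at `(-dy, dx)`. The helper `relative_cells` is hypothetical and uses plain tuples instead of `snakeai`'s `Point`:

```python
def relative_cells(head, direction):
    """
    Hypothetical helper: return the (left, front, right) coordinates next to `head`
    for a heading `direction`, assuming screen coordinates (y grows downward).
    """
    hx, hy = head
    dx, dy = direction
    left = (hx + dy, hy - dx)    # heading rotated 90 degrees counter-clockwise on screen
    front = (hx + dx, hy + dy)
    right = (hx - dy, hy + dx)   # heading rotated 90 degrees clockwise on screen
    return left, front, right


# Heading up (0, -1) from (3, 3): prints ((2, 3), (3, 2), (4, 3))
print(relative_cells((3, 3), (0, -1)))
```

The same four cases as in `get_observation` (up, down, right, left) all follow from these two formulas, which is an easy way to double-check the hand-written table.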

We build a Snake-playing agent by extending the `AgentBase` class. The following agent chooses its actions at random:

In [None]:
import random

from snakeai.agent import AgentBase
from snakeai.gameplay.entities import ALL_SNAKE_ACTIONS

class RandomActionAgent(AgentBase):
    """ Represents a Snake agent that takes a random action at every step. """

    def __init__(self):
        pass

    def begin_episode(self):
        pass

    def act(self, observation, reward):
        return random.choice(ALL_SNAKE_ACTIONS)

    def end_episode(self):
        pass

Finally, we define a `play` function that simulates the game:

In [None]:
from snakeai.gameplay.entities import ALL_SNAKE_ACTIONS, Point
import numpy as np
import random


def play(env, agent, num_episodes=1, verbose=1):
    """
    Play a set of episodes using the specified Snake agent.
    Use the non-interactive command-line interface and print the summary statistics afterwards.

    Args:
        env: an instance of Snake environment.
        agent: an instance of Snake agent.
        num_episodes (int): the number of episodes to run.
        verbose (int): if greater than 0, print the board, the observation, the head
            position and the direction at every step.
    """

    fruit_stats = []

    print()
    print('Playing:')

    for episode in range(num_episodes):
        timestep = env.new_episode()
        agent.begin_episode()
        game_over = False
        step = 0
        while not game_over:
            if verbose > 0:
                print("------ Step ", step, " ------")
                print(env.show_field())
                print("Observation:", env.get_observation())
                print("Head:", env.snake.head)
                print("Direction:", env.snake.direction)
            step += 1
            action = agent.act(timestep.observation, timestep.reward)
            env.choose_action(action)
            timestep = env.timestep()
            game_over = timestep.is_episode_end
        agent.end_episode()

        fruit_stats.append(env.stats.fruits_eaten)

        summary = '******* Episode {:3d} / {:3d} | Timesteps {:4d} | Fruits {:2d}'
        print(summary.format(episode + 1, num_episodes, env.stats.timesteps_survived, env.stats.fruits_eaten))

    print()
    print('Fruits eaten {:.1f} +/- stddev {:.1f}'.format(np.mean(fruit_stats), np.std(fruit_stats)))

We now have all the elements needed to simulate the game. We start from an initial board in which the snake is in the center.
We specify it with a configuration dictionary; the fields we are interested in are `field` (the map, where `#` is a wall, `.` an empty cell and `S` the snake's head),
`initial_snake_length` and `max_step_limit` (the maximum number of timesteps per episode). We can ignore the other fields for now:

In [None]:
inicial = {
  "field": [
    "#######",
    "#.....#",
    "#.....#",
    "#..S..#",
    "#.....#",
    "#.....#",
    "#######"
  ],

  "initial_snake_length": 2,
  "max_step_limit": 100,

  "rewards": {
    "timestep": -0.01,
    "ate_fruit": 1,
    "died": -1
  }
}
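The `field` entry is a list of strings, one per row; in the boards printed later, `s` marks the snake's body and `O` a fruit. As a sketch, the hypothetical helper `describe_field` (not part of `snakeai`) locates the head and counts the cells the snake can occupy, using `(x, y)` coordinates with `x` as the column and `y` as the row, which matches the `Head: Point(x=..., y=...)` values printed by `play`:

```python
def describe_field(rows):
    """
    Hypothetical helper: return the (x, y) position of the head 'S' and the number
    of non-wall cells in a level map given as a list of strings.
    """
    head = None
    open_cells = 0
    for y, row in enumerate(rows):
        for x, symbol in enumerate(row):
            if symbol == 'S':
                head = (x, y)
            if symbol != '#':
                open_cells += 1
    return head, open_cells


field = [
    "#######",
    "#.....#",
    "#.....#",
    "#..S..#",
    "#.....#",
    "#.....#",
    "#######",
]
print(describe_field(field))  # prints ((3, 3), 25)
```

On the boards used in section 2, where the third row is `#.###.#`, the same count gives only 22 open cells, so a long snake can fill them quickly.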

Veamos como se comporta el agente aleatorio con esta configuración:

In [None]:
env = EnvironmentPO(config=inicial, verbose=0)
agent = RandomActionAgent()
play(env, agent, num_episodes=100, verbose=1)

## 1. Agent with a predefined plan

We will build an agent that, starting from the following initial state:


```
#######
#.....#
#.###.#
#..S..#
##.s..#
#.....#
#######
```

reaches the following state:

```
#######
#.....#
#s###.#
#S....#
##....#
#.....#
#######
```
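Before writing the agent, a plan can be checked with a toy model of the head's motion. This sketch assumes, consistently with the outputs printed in this notebook, that the snake starts heading up (`y` grows downward) and that a turn rotates the heading by 90 degrees before the move; it ignores fruit and body growth. `simulate_head` and the action strings `'left'`, `'right'`, `'keep'` are hypothetical stand-ins for `SnakeAction`:

```python
# Headings in screen coordinates (y grows downward), listed clockwise: N, E, S, W.
HEADINGS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def simulate_head(rows, head, plan):
    """
    Hypothetical toy model: move the head according to `plan` (a list of
    'left', 'right' or 'keep') and return the visited cells.
    Raises ValueError if the head enters a wall '#'.
    """
    heading = 0  # start heading north, as the snake does on these boards
    path = [head]
    for action in plan:
        if action == 'left':
            heading = (heading - 1) % 4
        elif action == 'right':
            heading = (heading + 1) % 4
        dx, dy = HEADINGS[heading]
        head = (head[0] + dx, head[1] + dy)
        if rows[head[1]][head[0]] == '#':
            raise ValueError(f'hit a wall at {head}')
        path.append(head)
    return path


board = [
    "#######",
    "#.....#",
    "#.###.#",
    "#..S..#",
    "##.s..#",
    "#.....#",
    "#######",
]
plan = ['right', 'keep', 'left', 'keep', 'left', 'keep', 'keep', 'keep', 'left', 'keep']
path = simulate_head(board, (3, 3), plan)
print(path[-2:])  # prints [(1, 2), (1, 3)]: body above the head, as in the target state
```

The ten actions in `plan` mirror the list passed to `PredefinedActionAgent` below: the head goes around the `###` block clockwise and ends at `(1, 3)` heading down.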

In [None]:
from snakeai.agent import AgentBase
from snakeai.gameplay.entities import SnakeAction

class PredefinedActionAgent(AgentBase):
    """ Represents a Snake agent that follows a predefined list of actions. """

    def __init__(self, actions):
        self.actions = actions
        self.step = 0

    def begin_episode(self):
        # Restart the plan at the beginning of every episode.
        self.step = 0

    def act(self, observation, reward):
        """
        The agent takes the next action in the list of actions and increases
        `step` by 1. Once the plan is exhausted, it keeps its current direction.
        """
        if self.step < len(self.actions):
            action = self.actions[self.step]
            self.step += 1
        else:
            action = SnakeAction.MAINTAIN_DIRECTION

        return action

    def end_episode(self):
        pass

In [None]:
from snakeai.gameplay.entities import SnakeAction
inicial1 = {
  "field": [
    "#######",
    "#.....#",
    "#.###.#",
    "#..S..#",
    "##.s..#",
    "#.....#",
    "#######"
  ],

  "initial_snake_length": 2,
  "max_step_limit": 1000,

  "rewards": {
    "timestep": -0.01,
    "ate_fruit": 1,
    "died": -1
  }
}

env = EnvironmentPO(config=inicial1, verbose=1)
agent = PredefinedActionAgent([
    SnakeAction.TURN_RIGHT,
    SnakeAction.MAINTAIN_DIRECTION,
    SnakeAction.TURN_LEFT,
    SnakeAction.MAINTAIN_DIRECTION,
    SnakeAction.TURN_LEFT,
    SnakeAction.MAINTAIN_DIRECTION,
    SnakeAction.MAINTAIN_DIRECTION,
    SnakeAction.MAINTAIN_DIRECTION,
    SnakeAction.TURN_LEFT,
    SnakeAction.MAINTAIN_DIRECTION
])
play(env, agent, num_episodes=1, verbose=1)


## 2. Reactive agent

The goal is to build a reactive agent, that is, an agent whose actions depend only on the current observation and which
has no memory. The agent should try not to crash and to eat as many fruits as it can. Compare this agent's behavior with that of the
random agent. Simulate each agent for 100 episodes. Present the results and discuss them.



In [None]:
import random

from snakeai.agent import AgentBase
from snakeai.gameplay.entities import ALL_SNAKE_ACTIONS, CellType, SnakeAction

class ReactiveAgent(AgentBase):
    """
    Represents a reactive Snake agent that decides an action exclusively based on
    the current observation.
    """

    def __init__(self):
        pass

    def begin_episode(self):
        pass

    def act(self, observation, reward):
        """
        The agent analyzes the current observation and takes a consequent action:
        it goes for an adjacent fruit if there is one; otherwise it picks a random
        action that does not lead into a wall or into its own body.
        """
        left, front, right = observation

        # 1. If a fruit is adjacent, move towards it.
        if front == CellType.FRUIT:
            return SnakeAction.MAINTAIN_DIRECTION
        elif left == CellType.FRUIT:
            return SnakeAction.TURN_LEFT
        elif right == CellType.FRUIT:
            return SnakeAction.TURN_RIGHT

        # 2. Otherwise, choose at random among the actions that avoid walls and the body.
        valid_actions = []
        if front not in (CellType.WALL, CellType.SNAKE_BODY):
            valid_actions.append(SnakeAction.MAINTAIN_DIRECTION)
        if left not in (CellType.WALL, CellType.SNAKE_BODY):
            valid_actions.append(SnakeAction.TURN_LEFT)
        if right not in (CellType.WALL, CellType.SNAKE_BODY):
            valid_actions.append(SnakeAction.TURN_RIGHT)

        # 3. If every move is fatal, any action will do.
        if not valid_actions:
            valid_actions = ALL_SNAKE_ACTIONS

        return random.choice(valid_actions)

    def end_episode(self):
        pass
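The decision rule of `ReactiveAgent` can be tested without running the game. The sketch below restates it as a pure function. It assumes the integer codes seen in the printed observations (`0` empty, `1` fruit, `4` wall) plus `3` for the snake's body, which is an assumption about `CellType`, and uses strings instead of `SnakeAction`; `safe_actions` and `reactive_policy` are hypothetical names:

```python
import random

# Assumed cell codes: 0, 1 and 4 match the observations printed in this notebook;
# 3 for the snake's body is an assumption about snakeai's CellType.
EMPTY, FRUIT, SNAKE_BODY, WALL = 0, 1, 3, 4
ACTIONS = ('keep', 'left', 'right')


def safe_actions(observation):
    """ Return the actions whose target cell is neither a wall nor the snake's body. """
    left, front, right = observation
    cells = {'keep': front, 'left': left, 'right': right}
    return [a for a in ACTIONS if cells[a] not in (WALL, SNAKE_BODY)]


def reactive_policy(observation, rng=random):
    """ Go for an adjacent fruit; otherwise pick a random safe action (any action if none). """
    left, front, right = observation
    if front == FRUIT:
        return 'keep'
    if left == FRUIT:
        return 'left'
    if right == FRUIT:
        return 'right'
    return rng.choice(safe_actions(observation) or list(ACTIONS))


# Step 0 of the demo output further down: wall in front, so only the turns are safe.
print(safe_actions((0, 4, 0)))     # prints ['left', 'right']
print(reactive_policy((0, 1, 0)))  # prints keep (the fruit is straight ahead)
```

Because the policy is a function of the observation alone, the same observation always yields the same set of candidate actions; this is exactly the "no memory" property the exercise asks for.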

In [None]:
# Reactive agent demo
from snakeai.gameplay.entities import SnakeAction
inicial2 = {
  "field": [
    "#######",
    "#.....#",
    "#.###.#",
    "#..S..#",
    "#..s..#",
    "#.....#",
    "#######"
  ],

  "initial_snake_length": 2,
  "max_step_limit": 100,

  "rewards": {
    "timestep": -0.01,
    "ate_fruit": 1,
    "died": -1
  }
}

env = EnvironmentPO(config=inicial2, verbose=1)
agent = ReactiveAgent()
play(env, agent, num_episodes=2, verbose=1)


Playing:
------ Step  0  ------
#######
#.....#
#.###.#
#..S..#
#.Os..#
#.....#
#######
Observation: (0, 4, 0)
Head: Point(x=3, y=3)
Direction: Point(x=0, y=-1)
------ Step  1  ------
#######
#.....#
#.###.#
#..sS.#
#.O...#
#.....#
#######
Observation: (4, 0, 0)
Head: Point(x=4, y=3)
Direction: Point(x=1, y=0)
------ Step  2  ------
#######
#.....#
#.###.#
#...s.#
#.O.S.#
#.....#
#######
Observation: (0, 0, 0)
Head: Point(x=4, y=4)
Direction: Point(x=0, y=1)
------ Step  3  ------
#######
#.....#
#.###.#
#.....#
#.OSs.#
#.....#
#######
Observation: (0, 1, 0)
Head: Point(x=3, y=4)
Direction: Point(x=-1, y=0)
------ Step  4  ------
#######
#.....#
#.###.#
#O....#
#.Sss.#
#.....#
#######
Observation: (0, 0, 0)
Head: Point(x=2, y=4)
Direction: Point(x=-1, y=0)
------ Step  5  ------
#######
#.....#
#.###.#
#OS...#
#.ss..#
#.....#
#######
Observation: (1, 4, 0)
Head: Point(x=2, y=3)
Direction: Point(x=0, y=-1)
------ Step  6  ------
#######
#.....#
#.###.#
#Ss.O.#
#.ss..#
#.....#
#######
O

Now, with a new function, we can compare how many fruits each agent eats:


In [None]:
import numpy as np


def run_episode(env, agent):
    """ Play one episode with `agent` and return the number of fruits eaten. """
    timestep = env.new_episode()
    agent.begin_episode()
    game_over = False
    while not game_over:
        action = agent.act(timestep.observation, timestep.reward)
        env.choose_action(action)
        try:
            timestep = env.timestep()
        except IndexError:
            # Assumption: this is the "list index out of range" error in the output of the
            # next cell. It is presumably raised when the snake fills the board and there is
            # no empty cell left for a new fruit, so we treat it as the end of the episode.
            break
        game_over = timestep.is_episode_end
    agent.end_episode()
    return env.stats.fruits_eaten


def play_com(env, agent_random, agent_reactivo, num_episodes=100, verbose=1):
    """
    Play `num_episodes` episodes with each agent on the same environment and
    compare the number of fruits eaten.
    """
    fruit_stats_random = []
    fruit_stats_reactivo = []

    print()
    print('Playing:')

    for episode in range(num_episodes):
        fruit_stats_random.append(run_episode(env, agent_random))
        fruit_stats_reactivo.append(run_episode(env, agent_reactivo))

        if verbose:
            # "Timesteps" refers to the reactive agent's episode, which is played last.
            summary = '******* Episode {:3d} / {:3d} | Timesteps {:4d} | Fruits Random {:2d} | Fruits Reactivo {:2d}'
            print(summary.format(episode + 1, num_episodes, env.stats.timesteps_survived,
                                 fruit_stats_random[-1], fruit_stats_reactivo[-1]))

    if verbose:
        print()
        print('Average fruits eaten by Random Agent: {:.1f} +/- stddev {:.1f}'.format(
            np.mean(fruit_stats_random), np.std(fruit_stats_random)))
        print('Average fruits eaten by Reactive Agent: {:.1f} +/- stddev {:.1f}'.format(
            np.mean(fruit_stats_reactivo), np.std(fruit_stats_reactivo)))


In [None]:
from snakeai.gameplay.entities import SnakeAction
inicial3 = {
  "field": [
    "#######",
    "#.....#",
    "#.###.#",
    "#..S..#",
    "#..s..#",
    "#.....#",
    "#######"
  ],

  "initial_snake_length": 2,
  "max_step_limit": 100,

  "rewards": {
    "timestep": -0.01,
    "ate_fruit": 1,
    "died": -1
  }
}

env = EnvironmentPO(config=inicial3, verbose=1)
agent_reactivo = ReactiveAgent()
agent_random = RandomActionAgent()
play_com(env, agent_random, agent_reactivo, num_episodes=100, verbose=1)


Playing:
******* Episode   1 / 100 | Timesteps   41 | Fruits Random  0 | Fruits Reactivo  4
******* Episode   2 / 100 | Timesteps   73 | Fruits Random  0 | Fruits Reactivo 15
******* Episode   3 / 100 | Timesteps   41 | Fruits Random  1 | Fruits Reactivo  5
******* Episode   4 / 100 | Timesteps   38 | Fruits Random  0 | Fruits Reactivo  7
******* Episode   5 / 100 | Timesteps   66 | Fruits Random  0 | Fruits Reactivo  9
******* Episode   6 / 100 | Timesteps   56 | Fruits Random  0 | Fruits Reactivo 10
******* Episode   7 / 100 | Timesteps   81 | Fruits Random  0 | Fruits Reactivo  6
******* Episode   8 / 100 | Timesteps   74 | Fruits Random  0 | Fruits Reactivo  9
******* Episode   9 / 100 | Timesteps  100 | Fruits Random  1 | Fruits Reactivo  6
******* Episode  10 / 100 | Timesteps   39 | Fruits Random  0 | Fruits Reactivo  5
******* Episode  11 / 100 | Timesteps  100 | Fruits Random  0 | Fruits Reactivo  0
******* Episode  12 / 100 | Timesteps   88 | Fruits Random  0 | Fruits Reacti

IndexError: list index out of range
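As a worked example of the summary line that `play_com` prints at the end, the sketch below uses the fruit counts of the 11 complete episodes shown in the output above (the output is cut off at episode 12, so these are not the 100-episode results the exercise asks for). `np.std` uses the population formula (`ddof=0`) by default, which is what `statistics.pstdev` computes, so the standard library is enough here:

```python
import statistics

# Fruits per episode for episodes 1-11, copied from the output above.
fruits_random = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
fruits_reactive = [4, 15, 5, 7, 9, 10, 6, 9, 6, 5, 0]

for name, fruits in (('Random', fruits_random), ('Reactive', fruits_reactive)):
    mean = statistics.fmean(fruits)
    std = statistics.pstdev(fruits)  # population std, same as np.std(fruits)
    print('{}: {:.1f} +/- stddev {:.1f}'.format(name, mean, std))

# Random: 0.2 +/- stddev 0.4
# Reactive: 6.9 +/- stddev 3.7
```

Even on this partial sample, the reactive agent's mean is more than an order of magnitude above the random agent's, although its spread is also much larger.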