# Endless Runner Q-table training

Heavily inspired by the Q-table Colab notebook from Hugging Face's Deep RL course: https://huggingface.co/learn/deep-rl-course/unit2/hands-on

(Leaving this here so I don't forget. To run the environment, the GameEnvs package must be installed first. To do that, I go to /GameEnvs and run `pip install -e .`)

## Import the packages

In [3]:
pip install -e GameEnvs

Obtaining file:///D:/Repository/GameDevsCSF/RL-environments/GameEnvs
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Installing collected packages: GameEnvs
  Attempting uninstall: GameEnvs
    Found existing installation: GameEnvs 0.0.1
    Uninstalling GameEnvs-0.0.1:
      Successfully uninstalled GameEnvs-0.0.1
  Running setup.py develop for GameEnvs
Successfully installed GameEnvs-0.0.1
Note: you may need to restart the kernel to use updated packages.


By running the next cell, we'll kill the kernel to make sure the installed package is imported correctly.

In [6]:
# Restart the kernel after running this cell to make sure that the package is imported correctly.
import os
os._exit(00)


The cell above stops the kernel. Restart the kernel, then run the next cell.

In [1]:
import gymnasium as gym
import GameEnvs
import numpy as np

## Create the environment

The range of distances in this particular environment goes from 50 (player_x - obstacle_width) to 800 (the configured screen width).

Using 76 bins over that range gives bin edges spaced 10 apart:
```
[ 50.  60.  70.  80.  90. 100. 110. 120. 130. 140. 150. 160. 170. 180.
 190. 200. 210. 220. 230. 240. 250. 260. 270. 280. 290. 300. 310. 320.
 330. 340. 350. 360. 370. 380. 390. 400. 410. 420. 430. 440. 450. 460.
 470. 480. 490. 500. 510. 520. 530. 540. 550. 560. 570. 580. 590. 600.
 610. 620. 630. 640. 650. 660. 670. 680. 690. 700. 710. 720. 730. 740.
 750. 760. 770. 780. 790. 800.]
```
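
As a rough sketch of where this array comes from, the edges can be reproduced with `np.linspace`, and a raw x position can be mapped to a bin index with `np.digitize`. How `EndlessRunner-v0` bins positions internally is not shown here, so the helper `to_bin` and the constants below are assumptions for illustration only.

```python
import numpy as np

# Values taken from the text above: distances span 50..800, split with 76 bins.
MIN_X, MAX_X, N_BINS = 50, 800, 76

# 76 evenly spaced edges: 50, 60, ..., 800.
bin_edges = np.linspace(MIN_X, MAX_X, N_BINS)


def to_bin(obstacle_x: float) -> int:
    """Hypothetical helper: map a raw x position to a bin index in [0, N_BINS - 1]."""
    # np.digitize returns i with edges[i-1] <= x < edges[i]; subtract 1 for a 0-based index.
    idx = int(np.digitize(obstacle_x, bin_edges)) - 1
    # Clamp positions that fall outside the expected range.
    return min(max(idx, 0), N_BINS - 1)


print(bin_edges)
print(to_bin(57), to_bin(800))
```

With this scheme, a smaller bin count would make neighbouring positions share a state, which is one way to read the "how fast the agent reacts" trade-off.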

In [2]:
bins = 76
env = gym.make("EndlessRunner-v0", obstacle_x_bins=bins)



## Understanding our environment

In [4]:
print("Action space:", env.action_space)
print("Sample action:", env.action_space.sample())

Action space: Discrete(3)
Sample action: 0


The actions are: do nothing, jump, or duck.

## Initialize the Q-table
The observations need to be discretized to build the Q-table. Binning them looks like the solution; the open question is how.

We have the obstacle's x position, the obstacle's y position and the player's y position.

- Obstacle x: goes from x = 800 to x = 50 (behind the player). The bin size could translate into "how fast" the agent reacts to having an obstacle in front of it.
- Obstacle y: is either on the floor (0) or above the floor (1).
- Player y: is either standing (0), ducking (1) or jumping (2).

In [38]:
print("Observation space:", env.observation_space)
print("Sample observation:", env.observation_space.sample())

Observation space: Tuple(Discrete(2), Discrete(76))
Sample observation: (1, 25)


In [54]:
action_space = env.action_space.n
print("There are ", action_space, " actions in the action space.")

observation_space = env.observation_space.spaces[0].n * env.observation_space.spaces[1].n 
print("There are ", observation_space, " observations in the observation space.")   

There are  3  actions in the action space.
There are  152  observations in the observation space.
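
Since the observation is a tuple, it needs to be turned into a single row index before it can address a 2D Q-table with 152 rows. A minimal sketch, assuming the two components are the ones printed above (`Discrete(2)` and `Discrete(76)`, which appear to be the obstacle's y flag and the obstacle-x bin); `flatten_observation` and `unflatten_index` are hypothetical helpers, not part of GameEnvs.

```python
import numpy as np

# Sizes taken from the printed observation space: Tuple(Discrete(2), Discrete(76)).
DIMS = (2, 76)


def flatten_observation(observation, dims=DIMS) -> int:
    """Hypothetical helper: map a tuple observation to one row index in [0, 151]."""
    return int(np.ravel_multi_index(observation, dims))


def unflatten_index(index, dims=DIMS):
    """Inverse mapping, handy when inspecting a trained table."""
    return tuple(int(v) for v in np.unravel_index(index, dims))


state = flatten_observation((1, 25))  # the sample observation printed above
print(state, unflatten_index(state))
```

For these sizes the index is simply `first * 76 + second`, so every one of the 152 states gets its own row.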


In [55]:
def initialize_q_table(state_space, action_space):
    q_table = np.zeros([state_space, action_space])
    return q_table

In [58]:
Qtable_runner = initialize_q_table(bins, action_space)
print("Qtable shape:", Qtable_runner.shape)
print("Qtable:", Qtable_runner)

Qtable shape: (76, 3)
Qtable: [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
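
Note that a table with `bins` rows has 76 states, while the observation space counted earlier has 152. As a sketch only, the block below allocates one row per state and applies a single tabular Q-learning update. The function name and the `learning_rate` and `gamma` values are assumptions for illustration, not settings used in this notebook.

```python
import numpy as np

# 2 * 76 states and 3 actions, as printed earlier in the notebook.
N_STATES, N_ACTIONS = 152, 3


def q_learning_update(q_table, state, action, reward, next_state, terminated,
                      learning_rate=0.7, gamma=0.95):
    """One tabular Q-learning step (hyperparameter defaults are illustrative)."""
    # No bootstrapped future value once the episode has terminated.
    best_next = 0.0 if terminated else np.max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state, action] += learning_rate * (td_target - q_table[state, action])
    return q_table


q_table = np.zeros((N_STATES, N_ACTIONS))
# Made-up transition, only to show the call shape: flat state 101, action 1, reward 1.0.
q_learning_update(q_table, state=101, action=1, reward=1.0,
                  next_state=100, terminated=False)
print(q_table[101])
```

The `state` and `next_state` arguments are expected to be flat integer indices, so tuple observations would have to be converted before calling it.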


In [58]:
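# Random-agent baseline: play 10 episodes with uniformly sampled actions.
for i in range(10):
    env.reset()
    for t in range(10000):
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        # Stop on either end condition; truncated covers time-limit cutoffs.
        if terminated or truncated:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()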

Game Over
Episode finished after 87 timesteps
Game Over
Episode finished after 87 timesteps
Game Over
Episode finished after 188 timesteps
Game Over
Episode finished after 87 timesteps
Game Over
Episode finished after 92 timesteps
Game Over
Episode finished after 96 timesteps
Game Over
Episode finished after 87 timesteps
Game Over
Episode finished after 87 timesteps
Game Over
Episode finished after 87 timesteps
Game Over
Episode finished after 87 timesteps
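
To have a number to compare a trained agent against, the episode lengths printed above can be summarized. This is just a sketch over the ten values from this particular run; a different run of the random agent would give different numbers.

```python
from statistics import mean, median

# Episode lengths copied from the random-agent output above (10 episodes).
episode_lengths = [87, 87, 188, 87, 92, 96, 87, 87, 87, 87]

baseline_mean = mean(episode_lengths)
baseline_median = median(episode_lengths)
shortest = min(episode_lengths)

print("Mean length:", baseline_mean)
print("Median length:", baseline_median)
print("Episodes ending at the shortest length:", episode_lengths.count(shortest))
```

Seven of the ten episodes end at exactly 87 timesteps, which suggests that is when the first obstacle reaches a player that has not dodged it.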
