### Documentation

Interesting problems for reinforcement learning
 * Gymnasium: https://gymnasium.farama.org/environments/box2d/

## Installation

%pip install gymnasium  
%pip install gymnasium[box2d] 

## Additional steps

These may be needed *before* installing gymnasium[box2d].

### On macOS

pip uninstall swig  
xcode-select --install (installs the developer tools if they are not already present)  
pip install swig (or, with MacPorts: sudo port install swig-python)  
pip install 'gymnasium[box2d]' # the quotes are required in zsh  

### On Windows

If the installation fails, the likely cause is a missing or mismatched version of Microsoft C++ Build Tools, which Box2D depends on. To fix it:
 * Download Microsoft C++ Build Tools from https://visualstudio.microsoft.com/visual-cpp-build-tools/.
 * In the installer, select the "Desktop development with C++" option.
 * Restart your Jupyter Notebook or Visual Studio session.
 * Run !pip install gymnasium[box2d] again in a notebook cell.

### On Linux (Colab)
  * pip install swig

In [40]:
import gymnasium as gym

from real_numbers import evolve_himmelblau, fitness, get_architecture, set_architecture
from MLP import MLP

import json

In [80]:
set_architecture([8, 16, 16, 4])

In [None]:
architecture = get_architecture()
print(architecture)
population_size = 750

pop = [MLP(architecture).to_chromosome() for _ in range(population_size)]

pop = evolve_himmelblau(pop, fitness, 0.2, pcross=0.7, ngen=175, T=5, trace=1)

[8, 16, 16, 4]


KeyboardInterrupt: 

In [70]:
pop[0]

[0.5159363696354056,
 0.4132473186247462,
 -0.2616896588842931,
 0.13790786698108448,
 0.38123819042283275,
 -0.6388958008436424,
 0.5533756568331181,
 -0.16709772053096011,
 0.9746494452754604,
 0.23062996990987672,
 -0.08940535232690246,
 -0.0943475506671419,
 0.5563546799093786,
 -0.2586533993617277,
 -0.6379420844874941,
 -0.49327647342767783,
 -0.8953966604004777,
 -0.3857631030195706,
 0.40519554772552563,
 0.15149619485929616,
 -0.17684457979804913,
 0.4651163310293566,
 0.2587135354191445,
 0.3925296590826495,
 -0.914856636463736,
 -0.003637486076268903,
 -0.6110954385383756,
 -0.3673630681740969,
 -0.29082155761019357,
 0.7058775991053448,
 -0.3233140846680991,
 -0.5605976144543241,
 0.6156167592286157,
 -0.7860444784312477,
 -0.19066692315836842,
 -0.7994356280616582,
 -0.39827282063637415,
 0.18349982481327126,
 0.06425159771254665,
 -0.06935630048231614,
 0.1603875781937781,
 -0.3425024622268896,
 -0.83879623237889,
 0.9720364925382305,
 0.7160847371559204,
 0.5196249451672

In [71]:
with open("/home/corti/RLGAN/weight/weights_16_16_16_1000_200_10_30.txt", "w") as file:
    json.dump(pop[0], file)

In [76]:
with open("/home/corti/RLGAN/weight/weights_16_16_750_175_5_20.txt", "r") as file:
    test = json.load(file)

In [None]:
from real_numbers import policy

env = gym.make("LunarLander-v3", render_mode="human")

# Build the network once and load the saved weights into it
model = MLP(get_architecture())
model.from_chromosome(test)

observation, _ = env.reset()

episodes = 0  # number of finished episodes
while episodes < 10:
    action = policy(model, observation)
    observation, _, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        # Episode over: start a new one and count it
        observation, _ = env.reset()
        episodes += 1

env.close()

[ 0.00969954  1.4055266   0.49054208 -0.13270721 -0.0111109  -0.10997228
  0.          0.        ]
[ 0.00969954  1.4055266   0.49054208 -0.13270721 -0.0111109  -0.10997228
  0.          0.        ]
[ 0.01454954  1.4019418   0.49055958 -0.15937799 -0.01660574 -0.10990677
  0.          0.        ]
[ 0.01454954  1.4019418   0.49055958 -0.15937799 -0.01660574 -0.10990677
  0.          0.        ]
[ 0.01939983  1.3977573   0.4905761  -0.18605147 -0.02209986 -0.1098925
  0.          0.        ]
[ 0.01939983  1.3977573   0.4905761  -0.18605147 -0.02209986 -0.1098925
  0.          0.        ]
[ 0.02425013  1.3929732   0.49059218 -0.21272221 -0.0275931  -0.10987512
  0.          0.        ]
[ 0.02425013  1.3929732   0.49059218 -0.21272221 -0.0275931  -0.10987512
  0.          0.        ]
[ 0.0291007   1.3875895   0.4906084  -0.23939256 -0.03308548 -0.10985757
  0.          0.        ]
[ 0.0291007   1.3875895   0.4906084  -0.23939256 -0.03308548 -0.10985757
  0.          0.        ]
[ 0.03395147

### How do you build the fitness function for a genetic algorithm?

 * The MLP module already implements the multilayer perceptron. Build one with MLP(architecture).
 * architecture lists the layer sizes: (inputs, layer1, layer2, ...).
 * The fitness function takes the individual's chromosome and converts it into MLP weights with model.from_chromosome(ch).
 * It then uses run for N episodes (averaging over several episodes gives a more stable estimate) and computes the mean reward.
 * That mean reward is the individual's fitness.
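
The steps above can be sketched as follows. This is a minimal, standard-library-only illustration, not the notebook's actual code: `TinyMLP`, `ToyEnv`, and `run_episode` are hypothetical stand-ins for the `MLP` class, the Gymnasium environment, and `run`, and the gene order (weights first, then biases, layer by layer) is an assumption. `ToyEnv` follows the Gymnasium `reset`/`step` return shapes, so an environment created with `gym.make` could be passed as `make_env` instead, given a matching architecture.

```python
import math
import random


class TinyMLP:
    """Hypothetical stand-in for the notebook's MLP class (stdlib only)."""

    def __init__(self, architecture):
        self.architecture = list(architecture)
        self.layers = []  # filled in by from_chromosome

    def from_chromosome(self, chromosome):
        """Read weights, then biases, layer by layer (assumed gene order)."""
        genes = iter(chromosome)
        sizes = zip(self.architecture, self.architecture[1:])
        self.layers = [
            ([[next(genes) for _ in range(n_in)] for _ in range(n_out)],
             [next(genes) for _ in range(n_out)])
            for n_in, n_out in sizes
        ]

    def forward(self, inputs):
        activations = list(inputs)
        for weights, biases in self.layers:
            activations = [
                math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
                for row, b in zip(weights, biases)
            ]
        return activations


class ToyEnv:
    """Toy 1-D task with Gymnasium-style reset/step: keep x close to 0."""

    def __init__(self, max_steps=20):
        self.max_steps = max_steps
        self.x = 0.0
        self.steps = 0

    def reset(self, seed=None):
        self.x = random.Random(seed).uniform(-1.0, 1.0)
        self.steps = 0
        return [self.x], {}

    def step(self, action):
        self.x += 0.1 if action == 1 else -0.1  # 1 moves right, 0 moves left
        self.steps += 1
        truncated = self.steps >= self.max_steps
        return [self.x], -abs(self.x), False, truncated, {}


def run_episode(env, model, seed=None):
    """Play one episode greedily and return the accumulated reward."""
    observation, _ = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        outputs = model.forward(observation)
        action = outputs.index(max(outputs))  # largest output wins
        observation, reward, terminated, truncated, _ = env.step(action)
        total += reward
        done = terminated or truncated
    return total


def fitness(chromosome, make_env=ToyEnv, architecture=(1, 2), n_episodes=5):
    """Mean episode reward of the network encoded by the chromosome."""
    model = TinyMLP(architecture)
    model.from_chromosome(chromosome)
    env = make_env()
    returns = [run_episode(env, model, seed=i) for i in range(n_episodes)]
    return sum(returns) / n_episodes


good = [1.0, -1.0, 0.0, 0.0]  # steers x toward 0
bad = [-1.0, 1.0, 0.0, 0.0]   # steers x away from 0
print(fitness(good), fitness(bad))
```

Seeding each episode with a fixed value makes the fitness of a given chromosome repeatable, so two individuals are compared on the same starting conditions. Here the first chromosome should score higher than the second.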

#### Still want more?

Try controlling Flappy Bird: https://github.com/markub3327/flappy-bird-gymnasium

pip install flappy-bird-gymnasium

import flappy_bird_gymnasium  
env = gym.make("FlappyBird-v0")

State (12 variables):
  * the last pipe's horizontal position
  * the last top pipe's vertical position
  * the last bottom pipe's vertical position
  * the next pipe's horizontal position
  * the next top pipe's vertical position
  * the next bottom pipe's vertical position
  * the next next pipe's horizontal position
  * the next next top pipe's vertical position
  * the next next bottom pipe's vertical position
  * player's vertical position
  * player's vertical velocity
  * player's rotation

Actions:
  * 0 -> do nothing
  * 1 -> flap
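
With 12 state variables and 2 actions, a network for this task needs 12 inputs and 2 outputs. The sketch below shows one way to size the chromosome and turn network outputs into an action. Both helpers are hypothetical: `chromosome_length` assumes each layer stores its weights plus one bias per neuron, which may not match how the `MLP` module lays out its genes, and the hidden layer size of 16 is an arbitrary choice.

```python
def chromosome_length(architecture):
    """Genes needed if each layer stores n_in * n_out weights plus n_out biases."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(architecture, architecture[1:]))


def greedy_action(outputs):
    """Index of the largest network output; ties go to the lowest index."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


# 12 state variables in, 2 actions out; the hidden size is an assumption
flappy_architecture = [12, 16, 2]
print(chromosome_length(flappy_architecture))
print(greedy_action([0.3, 0.7]))  # picks action 1 (flap)
```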