# Convolutional Neural Networks
### Javier Guzmán Muñoz

This notebook configures several neural networks to be trained on games from the Atari category of the Gym environment. These games are image-based, so a convolutional neural network is a sensible choice for them.

Every environment in this category produces observations of shape (210, 160, 3), that is, 210x160-pixel images with 3 color channels (RGB).

The images are transformed by the `GenericPixelPreprocessor` preprocessor, available in `ray.rllib.models.preprocessors.py`. Given the target dimension for the final image (84 by default), it resizes the image to that size using `cv2.resize()`. Setting this dimension in the agent configuration therefore yields images of the desired pixel size.

In addition, the `get_filter_config` function defined in `ray.rllib.models.utils` provides default convolution filters for images of size 84x84 or 42x42. For any other size the filters must be configured manually, with the restriction that their parameters must produce an output of shape \[B, 1, 1, X\].
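The lookup behaviour described above can be sketched in plain Python. This is an illustrative stand-in, not RLlib's own code: the name `default_filters_for` is hypothetical, and the filter values are the ones printed in the model configs of the 84x84 and 42x42 cells of this notebook.

```python
# Default filter stacks, copied from the model configs printed in this notebook.
# This lookup is an illustrative stand-in, not RLlib's implementation.
DEFAULT_FILTERS = {
    84: [[16, [8, 8], 4], [32, [4, 4], 2], [256, [11, 11], 1]],
    42: [[16, [4, 4], 2], [32, [4, 4], 2], [256, [11, 11], 1]],
}


def default_filters_for(dim):
    """Return the default conv filters for a square input of side `dim`."""
    if dim not in DEFAULT_FILTERS:
        raise ValueError(
            f"No default filters for {dim}x{dim}; set conv_filters manually."
        )
    return DEFAULT_FILTERS[dim]


print(default_filters_for(84))
```

Any size other than 84 or 42 raises an error here, which mirrors the need to set `conv_filters` by hand for those sizes.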

To specify a convolution filter we must give three values:
- `out_size`: third dimension of the layer (the number of convolution filters applied)
- `kernel`: dimensions of the convolution filter
- `stride`: step by which the convolution filter is shifted.
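As a small illustration, each entry of `conv_filters` is a three-element list that unpacks into these values. The helper `describe_filter` below is hypothetical and only labels the fields; the filter list is the one printed for the 84x84 model in this notebook.

```python
def describe_filter(spec):
    """Unpack one [out_size, kernel, stride] entry into named fields."""
    out_size, kernel, stride = spec
    return {"out_size": out_size, "kernel": tuple(kernel), "stride": stride}


conv_filters = [[16, [8, 8], 4], [32, [4, 4], 2], [256, [11, 11], 1]]
for spec in conv_filters:
    print(describe_filter(spec))
```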

One layer is created for each specified convolution filter, all with `padding='same'` except the last one, which uses `padding='valid'`. With `'same'` padding the input is zero-padded so that the output size depends only on the stride (with stride 1 it equals the input size). The output height and width of every layer except the last are therefore given by:
- Output height: ceil(input_height / stride)
- Output width: ceil(input_width / stride)
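A minimal sketch of the rule above, tracing the spatial size of an 84x84 input through the two `'same'`-padded layers of the default filter stack (strides 4 and 2). The function name is an assumption made for illustration.

```python
import math


def same_padding_output(size, stride):
    """Output height or width of a conv layer with padding='same'."""
    return math.ceil(size / stride)


size = 84
for stride in (4, 2):
    size = same_padding_output(size, stride)
    print(size)  # prints 21, then 11
```

The first value agrees with the `(None, 21, 21, 16)` shape reported for `conv_value_1` in the model summary of the 84x84 cell.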

The dimensions of the last layer follow the formula (input_dim - kernel_size) / stride + 1. For the result to be 1, the kernel of the last convolution filter must equal the output width (or height) of the preceding convolution layer; this last layer then effectively acts as a fully connected layer.
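Both formulas can be combined to check whether a filter stack collapses the image to 1x1. The sketch below assumes square inputs and square kernels; `final_spatial_size` is a hypothetical helper, not part of RLlib.

```python
import math


def final_spatial_size(dim, conv_filters):
    """Spatial size after the whole stack, for a square input of side `dim`.

    All layers but the last use padding='same'; the last uses padding='valid'.
    """
    *hidden, last = conv_filters
    size = dim
    for _out_size, _kernel, stride in hidden:
        size = math.ceil(size / stride)  # 'same' padding
    _out_size, kernel, stride = last
    return (size - kernel[0]) // stride + 1  # 'valid' padding


default_84 = [[16, [8, 8], 4], [32, [4, 4], 2], [256, [11, 11], 1]]
print(final_spatial_size(84, default_84))   # prints 1
print(final_spatial_size(168, default_84))  # prints 11, so this stack does not fit 168x168
```

A result other than 1 means the last kernel has to be adjusted to the size reached by the preceding layer.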


In [1]:
# Required imports (copied from the notebook Prueba.inicial.ypnb)
import ray
import ray.rllib.agents.ppo as ppo
import json, os, shutil, sys
import gym
import pprint
import time
import shelve
from tensorflow import keras
from ray import tune

Instructions for updating:
non-resource variables are not supported in the long term


### Model with 84x84 input images and predefined convolution filters

In [2]:
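# Restart Ray so that each experiment starts from a clean cluster
ray.shutdown()
ray.init()
# The default config uses dim=84, for which default filters exist
config = ppo.DEFAULT_CONFIG.copy()
agent = ppo.PPOTrainer(config, env='Pong-v0')
policy = agent.get_policy()
# Inspect the model configuration and the resulting Keras architecture
print(policy.model.model_config)
print(policy.model.base_model.summary())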

2020-12-03 23:58:14,784	INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8265
2020-12-03 23:58:17,781	INFO trainer.py:592 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
2020-12-03 23:58:17,783	INFO trainer.py:1064 -- `_use_trajectory_view_api` only supported for PyTorch so far! Will run w/o.
2020-12-03 23:58:17,783	INFO trainer.py:617 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=2151) Instructions for updating:
(pid=2151) non-resource variables are not supported in the long term
(pid=2154) Instructions for updating:
(pid=2154) non-resource variables are not supported in the long term


{'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': [[16, [8, 8], 4], [32, [4, 4], 2], [256, [11, 11], 1]], 'conv_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action_reward': False, '_time_major': False, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None}
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
observations (InputLayer)       [(None, 84, 84, 4)]  0                                            
__________________________________________________________________________________________________
conv_value_1 (Conv2D)           (None, 21, 21, 16)   

### Model with 42x42 input images and predefined convolution filters

In [5]:
ray.shutdown()
ray.init()
config = ppo.DEFAULT_CONFIG.copy()
config['model']['dim'] = 42
agent = ppo.PPOTrainer(config, env='Pong-v0')
policy=agent.get_policy()
print(policy.model.model_config)
print(policy.model.base_model.summary())

2020-12-04 00:17:01,006	INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8265
(pid=2853) Instructions for updating:
(pid=2853) non-resource variables are not supported in the long term
(pid=2854) Instructions for updating:
(pid=2854) non-resource variables are not supported in the long term


{'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': [[16, [4, 4], 2], [32, [4, 4], 2], [256, [11, 11], 1]], 'conv_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action_reward': False, '_time_major': False, 'framestack': True, 'dim': 42, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None}
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
observations (InputLayer)       [(None, 42, 42, 4)]  0                                            
__________________________________________________________________________________________________
conv_value_1 (Conv2D)           (None, 21, 21, 16)   

In [6]:
# Remove any previous checkpoint directory before training
shutil.rmtree('/tmp/ppo/pong_42', ignore_errors=True, onerror=None)
# Run a single training iteration and save a checkpoint
result = agent.train()
print(agent.save('/tmp/ppo/pong_42'))

(pid=2853) Instructions for updating:
(pid=2853) Prefer Variable.assign which has equivalent behavior in 2.X.
(pid=2854) Instructions for updating:
(pid=2854) Prefer Variable.assign which has equivalent behavior in 2.X.


/tmp/ppo/pong_42/checkpoint_1/checkpoint-1


In [27]:
ray.shutdown()
!python3 rollout.py /tmp/ppo/pong_42/checkpoint_1/checkpoint-1 --env='Pong-v0' --run PPO --episodes 10

Instructions for updating:
non-resource variables are not supported in the long term
2020-12-04 01:06:42,341	INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8265
2020-12-04 01:06:44,829	INFO trainer.py:592 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
2020-12-04 01:06:44,829	INFO trainer.py:1064 -- `_use_trajectory_view_api` only supported for PyTorch so far! Will run w/o.
2020-12-04 01:06:44,829	INFO trainer.py:617 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=6791) Instructions for updating:
(pid=6791) non-resource variables are not supported in the long term
(pid=6799) Instructions for updating:
(pid=6799) non-resource variables are not supported in the long term
2020-12-04 01:06:53,985	INFO trainable.py:481 -- Restored on 10.10.1.128 from checkpoint: /tmp/ppo/pong_42/checkpoint_1/che

### Model with 168x168 input images

No predefined convolution filters exist for this input size. Following the pattern of the two default examples, we try several options.
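Before building each model, the kernel needed by the last filter can be worked out from the strides of the preceding layers. The following is a sketch under the assumption of square inputs and `'same'` padding on every layer but the last; `required_last_kernel` is a hypothetical helper.

```python
import math


def required_last_kernel(dim, hidden_strides):
    """Side of the last ('valid') kernel needed to reach a 1x1 output."""
    size = dim
    for stride in hidden_strides:
        size = math.ceil(size / stride)  # 'same' padding shrinks by the stride
    return size


# Stride patterns of the three options tried below for 168x168 inputs
for strides in ([4, 2], [8, 2], [4, 2, 2]):
    print(strides, required_last_kernel(168, strides))
# prints: [4, 2] 21, then [8, 2] 11, then [4, 2, 2] 11
```

The same helper applies to any other input size, for example 252.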

In [10]:
ray.shutdown()
ray.init()
env = 'Pong-v0'
config = ppo.DEFAULT_CONFIG.copy()
config['model']['dim'] = 168
# Spatial size: 168 -> 42 (stride 4) -> 21 (stride 2), so the last kernel is 21x21
config['model']['conv_filters'] = [[16, [8, 8], 4], [32, [4, 4], 2], [256, [21, 21], 1]]
agent = ppo.PPOTrainer(config, env=env)
policy=agent.get_policy()
print(policy.model.model_config)
print(policy.model.base_model.summary())

2020-12-04 00:28:30,443	INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8265
(pid=3780) Instructions for updating:
(pid=3780) non-resource variables are not supported in the long term
(pid=3781) Instructions for updating:
(pid=3781) non-resource variables are not supported in the long term


{'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': [[16, [8, 8], 4], [32, [4, 4], 2], [256, [21, 21], 1]], 'conv_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action_reward': False, '_time_major': False, 'framestack': True, 'dim': 168, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None}
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
observations (InputLayer)       [(None, 168, 168, 4) 0                                            
__________________________________________________________________________________________________
conv_value_1 (Conv2D)           (None, 42, 42, 16)  

In [11]:
ray.shutdown()
ray.init()
env = 'Pong-v0'
config = ppo.DEFAULT_CONFIG.copy()
config['model']['dim'] = 168
# Spatial size: 168 -> 21 (stride 8) -> 11 (stride 2), so the last kernel is 11x11
config['model']['conv_filters'] = [[16, [16, 16], 8], [32, [4, 4], 2], [256, [11, 11], 1]]
agent = ppo.PPOTrainer(config, env=env)
policy=agent.get_policy()
print(policy.model.model_config)
print(policy.model.base_model.summary())

2020-12-04 00:29:04,238	INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8265
(pid=3986) Instructions for updating:
(pid=3986) non-resource variables are not supported in the long term
(pid=3984) Instructions for updating:
(pid=3984) non-resource variables are not supported in the long term


{'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': [[16, [16, 16], 8], [32, [4, 4], 2], [256, [11, 11], 1]], 'conv_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action_reward': False, '_time_major': False, 'framestack': True, 'dim': 168, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None}
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
observations (InputLayer)       [(None, 168, 168, 4) 0                                            
__________________________________________________________________________________________________
conv_value_1 (Conv2D)           (None, 21, 21, 16)

In [14]:
ray.shutdown()
ray.init()
env = 'Pong-v0'
config = ppo.DEFAULT_CONFIG.copy()
config['model']['dim'] = 168
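# Spatial size: 168 -> 42 (stride 4) -> 21 (stride 2) -> 11 (stride 2), so the last kernel is 11x11
config['model']['conv_filters'] = [[16, [8, 8], 4], [32, [4, 4], 2], [32, [4, 4], 2], [256, [11, 11], 1]]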
agent = ppo.PPOTrainer(config, env=env)
policy=agent.get_policy()
print(policy.model.model_config)
print(policy.model.base_model.summary())

2020-12-04 00:37:51,123	INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8265
(pid=4605) Instructions for updating:
(pid=4605) non-resource variables are not supported in the long term
(pid=4607) Instructions for updating:
(pid=4607) non-resource variables are not supported in the long term


{'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': [[16, [8, 8], 4], [32, [4, 4], 2], [32, [4, 4], 2], [256, [11, 11], 1]], 'conv_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action_reward': False, '_time_major': False, 'framestack': True, 'dim': 168, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None}
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
observations (InputLayer)       [(None, 168, 168, 4) 0                                            
__________________________________________________________________________________________________
conv_value_1 (Conv2D)           (No

In [None]:
shutil.rmtree('/tmp/ppo/pong_168', ignore_errors=True, onerror=None)
result = agent.train()
print(agent.save('/tmp/ppo/pong_168'))

In [None]:
!rllib rollout /tmp/ppo/pong_168/checkpoint_1/checkpoint-1 --env='Pong-v0' --run PPO --episodes 10

### Model with 252x252 input images

In [12]:
ray.shutdown()
ray.init()
env = 'Pong-v0'
config = ppo.DEFAULT_CONFIG.copy()
config['model']['dim'] = 252
# Spatial size: 252 -> 63 (stride 4) -> 32 (stride 2), so the last kernel is 32x32
config['model']['conv_filters'] = [[16, [8, 8], 4], [32, [4, 4], 2], [256, [32, 32], 1]]
agent = ppo.PPOTrainer(config, env=env)
policy=agent.get_policy()
print(policy.model.model_config)
print(policy.model.base_model.summary())

2020-12-04 00:29:57,587	INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8265
(pid=4187) Instructions for updating:
(pid=4187) non-resource variables are not supported in the long term
(pid=4189) Instructions for updating:
(pid=4189) non-resource variables are not supported in the long term
2020-12-04 00:30:21,199	INFO trainable.py:252 -- Trainable.setup took 20.843 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.


{'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': [[16, [8, 8], 4], [32, [4, 4], 2], [256, [32, 32], 1]], 'conv_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action_reward': False, '_time_major': False, 'framestack': True, 'dim': 252, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None}
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
observations (InputLayer)       [(None, 252, 252, 4) 0                                            
__________________________________________________________________________________________________
conv_value_1 (Conv2D)           (None, 63, 63, 16)  

In [13]:
ray.shutdown()
ray.init()
env = 'Pong-v0'
config = ppo.DEFAULT_CONFIG.copy()
config['model']['dim'] = 252
# Spatial size: 252 -> 32 (stride 8) -> 16 (stride 2), so the last kernel is 16x16
config['model']['conv_filters'] = [[16, [16, 16], 8], [32, [4, 4], 2], [256, [16, 16], 1]]
agent = ppo.PPOTrainer(config, env=env)
policy=agent.get_policy()
print(policy.model.model_config)
print(policy.model.base_model.summary())

2020-12-04 00:31:48,168	INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8265
(pid=4400) Instructions for updating:
(pid=4400) non-resource variables are not supported in the long term
(pid=4401) Instructions for updating:
(pid=4401) non-resource variables are not supported in the long term
2020-12-04 00:32:02,574	INFO trainable.py:252 -- Trainable.setup took 11.219 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.


{'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': [[16, [16, 16], 8], [32, [4, 4], 2], [256, [16, 16], 1]], 'conv_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action_reward': False, '_time_major': False, 'framestack': True, 'dim': 252, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None}
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
observations (InputLayer)       [(None, 252, 252, 4) 0                                            
__________________________________________________________________________________________________
conv_value_1 (Conv2D)           (None, 32, 32, 16)

In [15]:
ray.shutdown()
ray.init()
env = 'Pong-v0'
config = ppo.DEFAULT_CONFIG.copy()
config['model']['dim'] = 252
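# Spatial size: 252 -> 63 (stride 4) -> 32 (stride 2) -> 16 (stride 2), so the last kernel is 16x16
config['model']['conv_filters'] = [[16, [8, 8], 4], [32, [4, 4], 2], [32, [4, 4], 2], [256, [16, 16], 1]]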
agent = ppo.PPOTrainer(config, env=env)
policy=agent.get_policy()
print(policy.model.model_config)
print(policy.model.base_model.summary())

2020-12-04 00:40:21,352	INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8265
(pid=4811) Instructions for updating:
(pid=4811) non-resource variables are not supported in the long term
(pid=4813) Instructions for updating:
(pid=4813) non-resource variables are not supported in the long term


{'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': [[16, [8, 8], 4], [32, [4, 4], 2], [32, [4, 4], 2], [256, [16, 16], 1]], 'conv_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action_reward': False, '_time_major': False, 'framestack': True, 'dim': 252, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None}
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
observations (InputLayer)       [(None, 252, 252, 4) 0                                            
__________________________________________________________________________________________________
conv_value_1 (Conv2D)           (No

In [16]:
ray.shutdown()
ray.init()
env = 'Pong-v0'
config = ppo.DEFAULT_CONFIG.copy()
config['model']['dim'] = 252
# Spatial size: 252 -> 63 (stride 4) -> 16 (stride 4) -> 8 (stride 2), so the last kernel is 8x8
config['model']['conv_filters'] = [[16, [8, 8], 4], [16, [8, 8], 4], [32, [4, 4], 2], [256, [8, 8], 1]]
agent = ppo.PPOTrainer(config, env=env)
policy=agent.get_policy()
print(policy.model.model_config)
print(policy.model.base_model.summary())

2020-12-04 00:41:45,072	INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8265
(pid=5012) Instructions for updating:
(pid=5012) non-resource variables are not supported in the long term
(pid=5017) Instructions for updating:
(pid=5017) non-resource variables are not supported in the long term


{'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': [[16, [8, 8], 4], [16, [8, 8], 4], [32, [4, 4], 2], [256, [8, 8], 1]], 'conv_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action_reward': False, '_time_major': False, 'framestack': True, 'dim': 252, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None}
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
observations (InputLayer)       [(None, 252, 252, 4) 0                                            
__________________________________________________________________________________________________
conv_value_1 (Conv2D)           (None