# ToDo List!
* What assumption does MAE make? What is it minimizing? Why is it a good choice of loss function in this case?
* Try other optimizers

# Sources

### Link: https://heartbeat.fritz.ai/5-regression-loss-functions-all-machine-learners-should-know-4fb140e9d4b0
This source gives a brief explanation of MAE and MSE, compares how they behave when training on datasets with and without outliers, and then compares their training behavior in terms of their gradients. Because of the shape of its gradient, MAE converges more slowly and needs a **dynamic learning rate**. The source explains that if we want outliers to have a direct impact on the model we should use MSE, whereas if we do not want them to affect it much we can use MAE.
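
As a rough illustration of that comparison, the sketch below contrasts the per-sample gradients of both losses and their sensitivity to a single outlier. The function names and the toy values are made up for this example; it is not code taken from the source.

```python
import numpy as np

# Per-sample gradients of each loss with respect to the prediction,
# written in terms of the error e = prediction - target.
def mse_gradient(error):
    return 2.0 * error          # proportional to the error

def mae_gradient(error):
    return np.sign(error)       # always -1, 0 or +1

# Illustrative errors, from far away from the target to very close to it.
errors = np.array([100.0, 1.0, 0.01])
mse_grads = mse_gradient(errors)   # shrinks as the error shrinks
mae_grads = mae_gradient(errors)   # same magnitude for every error

# For a constant prediction, MSE is minimized by the mean and MAE by the
# median, so one outlier drags the MSE fit much further than the MAE fit.
targets = np.array([10.0, 11.0, 12.0, 13.0, 100.0])
mse_fit = targets.mean()
mae_fit = np.median(targets)
```

Since the MAE gradient does not shrink near the minimum, a fixed step size keeps overshooting there, which is one way to read the need for a learning rate that decays over time.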

### Link: https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1
This source explains the three dynamic learning rate methods used here: **time-based decay**, **step decay** and **exponential decay**. Some of them are implemented with the Keras `LearningRateScheduler` callback, which lets the user change the learning rate as desired throughout training.
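
A minimal sketch of the three schedules, using common textbook formulations, is shown below. The function and parameter names are assumptions chosen to resemble the arguments passed to `run_model` later in this notebook; the actual implementation inside `rl_helper` is not shown here and may differ.

```python
import math

def time_based_decay(epoch, initial_lr, decay_rate):
    # The learning rate falls off hyperbolically with the epoch number.
    return initial_lr / (1.0 + decay_rate * epoch)

def step_decay(epoch, initial_lr, drop_rate, epochs_drop):
    # The learning rate is multiplied by drop_rate every epochs_drop epochs.
    return initial_lr * drop_rate ** math.floor(epoch / epochs_drop)

def exponential_decay(epoch, initial_lr, decay_rate):
    # The learning rate decays exponentially with the epoch number.
    return initial_lr * math.exp(-decay_rate * epoch)

# Learning rate at a few epochs for each schedule (illustrative values).
for epoch in (0, 10, 100):
    print(epoch,
          time_based_decay(epoch, 1000.0, 0.01),
          step_decay(epoch, 1000.0, 0.5, 10),
          exponential_decay(epoch, 1000.0, 0.07))
```

Any of these can be wrapped in a function of the epoch index and handed to the Keras `LearningRateScheduler` callback, which calls it at the start of each epoch.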

### Link: https://stackoverflow.com/questions/46308374/what-is-validation-data-used-for-in-a-keras-sequential-model
This StackOverflow discussion covers how datasets are split into training, validation and test sets. I used it to check, among other things, how Keras uses the validation data.

### Link: https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
An explanation of **early stopping**: training is halted before all the predefined epochs have run once no further improvement is detected in the results. The improvement is measured with a metric evaluated on the validation set.
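
The stopping rule can be sketched in plain Python as follows. The function name, the `patience` and `min_delta` parameters and the list of validation losses are illustrative assumptions; in Keras the same idea is provided by the `EarlyStopping` callback.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    Training stops once the validation loss has failed to improve on the
    best value seen so far by more than min_delta for `patience`
    consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss     # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1       # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses, one value per epoch.
history = [10.0, 8.0, 7.0, 7.5, 7.2, 7.1, 6.0]
stop_short = early_stopping_epoch(history, patience=3)
stop_long = early_stopping_epoch(history, patience=5)
```

With a short patience the run ends before the late improvement in the last epoch is reached, while a longer patience lets training continue. This is the trade-off to keep in mind when choosing the value.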

### Link: https://machinelearningmastery.com/polynomial-features-transforms-for-machine-learning/
An explanation of **polynomial features**, which consists of adding new input variables to the model built from powers and products of the original input variables. The resulting feature space has a higher dimension, so the solution is more flexible, although care is needed to keep the model from fitting the training data too closely and causing **overfitting**.
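
To make the growth in dimension concrete, here is a small sketch that builds every monomial up to a given total degree. The helper `polynomial_features` is written for this illustration only; scikit-learn offers a `PolynomialFeatures` transformer for the same purpose, and whether `rl_helper` uses it is not shown in this notebook.

```python
from itertools import combinations_with_replacement
from math import comb

import numpy as np

def polynomial_features(x, degree):
    """Return all monomials of total degree 1..degree (no constant column)."""
    n_features = x.shape[1]
    columns = []
    for d in range(1, degree + 1):
        # Each combination with replacement picks the factors of one monomial.
        for idx in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(x[:, list(idx)], axis=1))
    return np.column_stack(columns)

# Two variables at degree 2 give the columns a, b, a^2, a*b, b^2.
small = polynomial_features(np.array([[2.0, 3.0]]), degree=2)

# With n variables the expansion has C(n + degree, degree) - 1 columns.
x = np.ones((4, 9))
n_columns = {d: polynomial_features(x, d).shape[1] for d in (2, 3, 4)}
```

For 9 input variables this gives 54, 219 and 714 columns for degrees 2, 3 and 4. Adding one bias weight yields 55, 220 and 715, which agrees with the parameter counts reported in the model summaries further down.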

# 1. Loading the dataset

In [1]:
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'

In [2]:
import numpy as np

In [3]:
import importlib

In [4]:
import sys

In [5]:
sys.path.insert(0, '../..')

In [6]:
# Read the database from the .csv file into a pandas dataframe
df = pd.read_csv('../../databases/insurance.csv')

# 2. Data preprocessing

In [7]:
from sklearn import preprocessing

## 2.1. Encoding non-numeric variables

In [8]:
# Fit a label encoder on the 'sex' column and store the integer-encoded
# values in a new 'sex-encoded' column
sex_encoder = preprocessing.LabelEncoder()
sex_encoder.fit(df['sex'])
df['sex-encoded'] = sex_encoder.transform(df['sex'])

In [9]:
# Fit a label encoder on the 'smoker' column and store the integer-encoded
# values in a new 'smoker-encoded' column
smoker_encoder = preprocessing.LabelEncoder()
smoker_encoder.fit(df['smoker'])
df['smoker-encoded'] = smoker_encoder.transform(df['smoker'])

In [10]:
# Create a one hot encoder and fit the available types of regions in the dataset
region_encoder = preprocessing.OneHotEncoder()
region_encoder.fit(df['region'].to_numpy().reshape(-1, 1))

# Transform all entries into the one hot encoded representation
encoded_regions = region_encoder.transform(df['region'].to_numpy().reshape(-1, 1)).toarray()

# Add each new encoded variable or feature to the dataset
for i, category in enumerate(region_encoder.categories_[0]):
    df[f'{category}-encoded'] = encoded_regions.transpose()[i]

## 2.2. Variable filtering

In [11]:
# Keep only the input features used by the model and separate the target variable
df_x = df[['age', 'bmi', 'smoker-encoded', 'children', 'sex-encoded', 'northwest-encoded', 'northeast-encoded', 'southwest-encoded', 'southeast-encoded']]
df_y = df['charges']

# 3. Splitting into training and evaluation sets

In [12]:
from sklearn import model_selection

In [13]:
from sklearn import preprocessing

## 3.1. Splitting the sets
Note that the original dataset is split into **train**, **valid** and **test** outside the Keras framework, to guarantee that each set is handled according to the methodology used. In other words, this ensures that any preprocessing or normalization applied to the validation (valid) and evaluation (test) sets is based only on information obtained from the training set.
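
A minimal sketch of that idea for z-score normalization is shown below. The helper names and the synthetic arrays are assumptions made for this example and do not reproduce the preprocessing done inside `rl_helper`.

```python
import numpy as np

def fit_zscore(x_fit):
    """Compute per-column mean and standard deviation from the training set."""
    mean = x_fit.mean(axis=0)
    std = x_fit.std(axis=0)
    std[std == 0] = 1.0     # avoid dividing by zero on constant columns
    return mean, std

def apply_zscore(x, mean, std):
    """Normalize any set with statistics computed elsewhere."""
    return (x - mean) / std

# Synthetic stand-ins for the training and validation sets.
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
valid = rng.normal(loc=50.0, scale=10.0, size=(50, 3))

# The statistics come from train only and are reused for valid (and test).
mean, std = fit_zscore(train)
train_norm = apply_zscore(train, mean, std)
valid_norm = apply_zscore(valid, mean, std)
```

The normalized training set has zero mean and unit variance by construction. The validation set only approximately does, which is expected, since none of its values were used to compute the statistics.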

In [14]:
# Hold out 20% of the dataset as the test set and keep the rest as train_valid
x_train_valid, x_test, y_train_valid, y_test = model_selection.train_test_split(df_x, df_y, test_size=0.2, random_state=15, shuffle=True)

In [15]:
# Split train_valid into train (70%) and valid (30%), i.e. 56% and 24% of the full dataset
x_train, x_valid, y_train, y_valid = model_selection.train_test_split(x_train_valid, y_train_valid, test_size=0.3, random_state=23, shuffle=True)

# 4. Linear Regression


#### Comments
1. At first, MAE converged very slowly, which makes sense given the kind of loss function it is. In particular, it is much slower than MSE. I started by tuning the **learning rate** manually with fixed values.
2. Then, with an increasingly large learning rate, training was faster, but two things happened. First, the loss oscillated around a value that I assume is the minimum training is approaching, so the learning rate would need to be reduced near that point. Second, this minimum was not the same one I obtained with MSE; my guess was that it is a plateau, or a local minimum rather than the global one. I decided to use a **dynamic learning rate** and to **start from different initial points**.
3. When I tried MSE, the loss quickly diverged and training broke unless I normalized all the variables with z-score. On the other hand, that same normalization strongly affected MAE training. *Why?* I could partly fix it by scaling up the learning rate, which makes sense considering that the normalized variables now have a smaller magnitude, so the steps may be smaller than before, which slowed training down.
4. Interestingly, I arrived at this discussion https://datascience.stackexchange.com/questions/9020/do-i-have-to-standardize-my-new-polynomial-features starting from a fairly simple question: **why does the metric not improve with higher-order polynomial features?** It turns out that if I normalize the variables and then apply polynomial features, I get new variables that still lie in the interval [0,1] but whose order of magnitude is much smaller. *Conclusion: always normalize the variables that go into the model, so if you apply polynomial features you have to normalize after creating the new variables.*
5. With the normalization fix mentioned above, the results for high polynomial orders improved.
6. It strikes me that validation results are generally better than training results and, moreover, that this gap shrinks as the polynomial order increases. This makes me think that for some reason I am underfitting, or estimating the metrics incorrectly (for example because of the dataset size). This article mentions something that may be useful https://keras.io/getting_started/faq/#why-is-my-training-loss-much-higher-than-my-testing-loss, one possibility being that the validation value for an epoch always tends to be better than the training value averaged over the batches, because it is computed after the model has trained further. Still, I am not convinced this explains it after many epochs. **Should I be using k-fold cross-validation?**
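
The effect described in comment 4 can be reproduced with a small sketch. The single feature below is synthetic and assumed to be already scaled to [0, 1]; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1000)    # a feature already scaled to [0, 1]

# Polynomial expansion applied AFTER scaling: higher powers of values in
# [0, 1] stay in [0, 1], but their typical magnitude keeps shrinking.
powers = np.column_stack([x ** d for d in range(1, 7)])
mean_per_degree = powers.mean(axis=0)

# Standardizing after the expansion puts every new column back on a
# comparable scale (zero mean, unit variance).
standardized = (powers - powers.mean(axis=0)) / powers.std(axis=0)
```

Here `mean_per_degree` decreases with the degree, so the weights attached to high-order terms receive much smaller updates unless the expanded columns are normalized again.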

In [37]:
from src import rl_helper
importlib.reload(rl_helper);

In [17]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          scheduler='time-decay',
                          decay_rate=0.01,
                          epochs=500,
                          batch_size=64
                         )

Model logs at tb-logs/rl/20210525-144650
Model checkpoints at checkpoints/rl/20210525-144650
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 1)                 10        
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 3644.799072265625 Valid: 2969.898193359375 Test: 3130.251953125


In [18]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1.0,
                          scheduler='time-decay',
                          decay_rate=0.01,
                          epochs=500,
                          batch_size=64
                         )

Model logs at tb-logs/rl/20210525-144909
Model checkpoints at checkpoints/rl/20210525-144909
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 1)                 10        
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 11415.05859375 Valid: 10254.7333984375 Test: 9868.3759765625


In [19]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          scheduler='time-decay',
                          decay_rate=0.01,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs=500,
                          batch_size=64
                         )

Model logs at tb-logs/rl/20210525-145644
Model checkpoints at checkpoints/rl/20210525-145644
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 1)                 10        
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 3654.8017578125 Valid: 2994.168212890625 Test: 3098.084228515625


In [20]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          scheduler='step-decay',
                          drop_rate=0.5,
                          epochs_drop=10,
                          epochs=500,
                          batch_size=32,
                         )

Model logs at tb-logs/rl/20210525-145948
Model checkpoints at checkpoints/rl/20210525-145948
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 1)                 10        
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 3632.73046875 Valid: 2955.25537109375 Test: 3081.125732421875


In [21]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          scheduler='step-decay',
                          drop_rate=0.5,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs_drop=10,
                          epochs=500,
                          batch_size=32,
                         )

Model logs at tb-logs/rl/20210525-150229
Model checkpoints at checkpoints/rl/20210525-150229
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 1)                 10        
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 3635.908203125 Valid: 2956.652099609375 Test: 3075.880615234375


In [22]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          scheduler='exponential-decay',
                          decay_rate=0.07,
                          epochs=500,
                          batch_size=32,
                         )

Model logs at tb-logs/rl/20210525-150501
Model checkpoints at checkpoints/rl/20210525-150501
Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 1)                 10        
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 3642.355712890625 Valid: 2952.64306640625 Test: 3103.237060546875


In [23]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          scheduler='exponential-decay',
                          decay_rate=0.01,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs=500,
                          batch_size=32,
                         )

Model logs at tb-logs/rl/20210525-150759
Model checkpoints at checkpoints/rl/20210525-150759
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_6 (Dense)              (None, 1)                 10        
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 3730.654541015625 Valid: 3021.2919921875 Test: 3236.439453125


In [24]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          degree=2,
                          epochs=500,
                          batch_size=32,
                          scheduler='exponential-decay',
                          decay_rate=0.09
                         )

Model logs at tb-logs/rl/20210525-151058
Model checkpoints at checkpoints/rl/20210525-151058
Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 1)                 55        
Total params: 55
Trainable params: 55
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 2326.44189453125 Valid: 1785.8372802734375 Test: 1824.603515625


In [25]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          degree=3,
                          scheduler='exponential-decay',
                          decay_rate=0.1,
                          epochs=500,
                          batch_size=32
                         )

Model logs at tb-logs/rl/20210525-151354
Model checkpoints at checkpoints/rl/20210525-151354
Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_8 (Dense)              (None, 1)                 220       
Total params: 220
Trainable params: 220
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 2193.197998046875 Valid: 1703.93359375 Test: 1724.305419921875


In [26]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          degree=3,
                          scheduler='exponential-decay',
                          decay_rate=0.1,
                          epochs=500,
                          batch_size=32
                         )

Model logs at tb-logs/rl/20210525-151709
Model checkpoints at checkpoints/rl/20210525-151709
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_9 (Dense)              (None, 1)                 220       
Total params: 220
Trainable params: 220
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 2204.8984375 Valid: 1693.052001953125 Test: 1734.9503173828125


In [27]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          degree=3,
                          scheduler='exponential-decay',
                          decay_rate=0.1,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs=500,
                          batch_size=32
                         )

Model logs at tb-logs/rl/20210525-152033
Model checkpoints at checkpoints/rl/20210525-152033
Model: "sequential_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_10 (Dense)             (None, 1)                 220       
Total params: 220
Trainable params: 220
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 2198.080810546875 Valid: 1717.03271484375 Test: 1758.10498046875


In [28]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          degree=4,
                          scheduler='exponential-decay',
                          decay_rate=0.1,
                          epochs=500,
                          batch_size=32
                         )

Model logs at tb-logs/rl/20210525-152339
Model checkpoints at checkpoints/rl/20210525-152339
Model: "sequential_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_11 (Dense)             (None, 1)                 715       
Total params: 715
Trainable params: 715
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 2206.0283203125 Valid: 1877.9254150390625 Test: 1834.62109375


In [29]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=1000,
                          degree=4,
                          scheduler='exponential-decay',
                          decay_rate=0.1,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs=500,
                          batch_size=32
                         )

Model logs at tb-logs/rl/20210525-152642
Model checkpoints at checkpoints/rl/20210525-152642
Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_12 (Dense)             (None, 1)                 715       
Total params: 715
Trainable params: 715
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 2194.46337890625 Valid: 1981.76806640625 Test: 2041.5289306640625


In [30]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=2.0,
                          degree=5,
                          scheduler='exponential-decay',
                          decay_rate=0.001,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs=1000,
                          batch_size=32
                         )

Model logs at tb-logs/rl/20210525-152951
Model checkpoints at checkpoints/rl/20210525-152951
Model: "sequential_13"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_13 (Dense)             (None, 1)                 2002      
Total params: 2,002
Trainable params: 2,002
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 2193.317138671875 Valid: 1799.4388427734375 Test: 1841.3829345703125


In [31]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=5.0,
                          degree=6,
                          scheduler='exponential-decay',
                          decay_rate=0.01,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs=1000,
                          batch_size=32
                         )

Model logs at tb-logs/rl/20210525-153530
Model checkpoints at checkpoints/rl/20210525-153530
Model: "sequential_14"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_14 (Dense)             (None, 1)                 5005      
Total params: 5,005
Trainable params: 5,005
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 2105.380615234375 Valid: 1966.4605712890625 Test: 1889.2669677734375


In [32]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=5.0,
                          degree=7,
                          scheduler='exponential-decay',
                          decay_rate=0.01,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs=1000,
                          batch_size=32
                         )

Model logs at tb-logs/rl/20210525-154102
Model checkpoints at checkpoints/rl/20210525-154102
Model: "sequential_15"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_15 (Dense)             (None, 1)                 11440     
Total params: 11,440
Trainable params: 11,440
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 1980.1461181640625 Valid: 1836.9769287109375 Test: 1943.13427734375


In [33]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=5.0,
                          degree=8,
                          scheduler='exponential-decay',
                          decay_rate=0.01,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs=1000,
                          batch_size=32
                         )

Model logs at tb-logs/rl/20210525-154723
Model checkpoints at checkpoints/rl/20210525-154723
Model: "sequential_16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_16 (Dense)             (None, 1)                 24310     
Total params: 24,310
Trainable params: 24,310
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 1895.37890625 Valid: 1869.8349609375 Test: 2096.034912109375


In [38]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=5.0,
                          degree=9,
                          scheduler='exponential-decay',
                          decay_rate=0.01,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs=1000,
                          batch_size=32
                         )

Model logs at tb-logs/rl/20210525-163054
Model checkpoints at checkpoints/rl/20210525-163054
Model: "sequential_20"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_20 (Dense)             (None, 1)                 48620     
Total params: 48,620
Trainable params: 48,620
Non-trainable params: 0
_________________________________________________________________
[MAE] Train: 1994.5565185546875 Valid: 2052.66357421875 Test: 2201.368408203125


In [None]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=5.0,
                          degree=9,
                          scheduler='exponential-decay',
                          decay_rate=0.01,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs=1000,
                          batch_size=32,
                          regularizer='l1',
                          regularizer_lambda=1e-3
                         )

Model logs at tb-logs/rl/20210525-163547
Model checkpoints at checkpoints/rl/20210525-163547
Model: "sequential_21"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_21 (Dense)             (None, 1)                 48620     
Total params: 48,620
Trainable params: 48,620
Non-trainable params: 0
_________________________________________________________________


In [None]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=5.0,
                          degree=9,
                          scheduler='exponential-decay',
                          decay_rate=0.01,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          epochs=1000,
                          batch_size=32,
                          regularizer='l2',
                          regularizer_lambda=1e-3
                         )

In [None]:
mae = rl_helper.run_model(x_train, y_train, x_valid, y_valid, x_test, y_test,
                          learning_rate=5.0,
                          degree=15,
                          scheduler='exponential-decay',
                          decay_rate=0.001,
                          optimizer='adam',
                          beta_1=0.9,
                          beta_2=0.99,
                          patience=200,
                          delta=1,
                          epochs=10000,
                          batch_size=64
                         )