# Optimizing neural networks with genetic algorithms


In this notebook, we optimize the hyperparameters of a neural network using genetic algorithms.

The task of our neural network is to predict the pH of water from a set of other water-quality parameters.

1. Load the required libraries

In [33]:
!pip install deap

Collecting deap
  Downloading deap-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (135 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m135.4/135.4 kB[0m [31m2.0 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: deap
Successfully installed deap-1.4.1


In [96]:
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from deap import base, creator, tools, algorithms

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import random

Load the dataset

In [35]:
path = 'https://raw.githubusercontent.com/Yahred/evolutionary-computation/main/data/PARAMETROS_FINALES_CRUDOS.csv'

df = pd.read_csv(path)
ph = 'pH_CAMPO'

In [37]:
df.head()

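| | CLOROF_A | COLI_FEC | COLI_TOT | E_COLI | COT | COT_SOL | DBO_SOL | DBO_TOT | DQO_SOL | DQO_TOT | ... | TURBIEDAD | TEMP_AMB | PROFUNDIDAD | CAUDAL | DUR_TOT | TEMP_AGUA | CONDUC_CAMPO | pH_CAMPO | OD_% | OD_mg/L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | 24196.0 | 24196.0 | NaN | 2.356 | 2.35 | 3.33 | 6.63 | 12.6 | 18.0872 | ... | 46.0 | 35.3 | NaN | 430.0 | 303.34 | 24.6 | 1200.0 | 8.2 | 83.7 | 5.26 |
| 1 | NaN | 24196.0 | 24196.0 | 24196.0 | 8.3441 | 6.4727 | 2.73 | 4.11 | 15.5 | 27.8784 | ... | 60.0 | 26.7 | NaN | 420000.0 | 222.9984 | 24.3 | 677.0 | 7.97 | 85.8 | 7.21 |
| 2 | NaN | 24196.0 | 24196.0 | 3654.0 | 8.1953 | 6.1425 | 4.97 | 6.65 | 10.0 | 16.16 | ... | 30.0 | 34.6 | NaN | 180.0 | 224.4432 | 25.8 | 479.0 | 8.02 | 89.8 | 7.31 |
| 3 | NaN | 24196.0 | 24196.0 | 776.0 | 7.6502 | 4.0415 | 2.0 | 2.34 | 10.0 | 10.0 | ... | 40.0 | NaN | NaN | 5.0 | 414.96 | 29.9 | 930.0 | 8.05 | 94.3 | 7.07 |
| 4 | NaN | 663.0 | 12997.0 | 109.0 | 9.4452 | 3.0909 | 2.0 | 2.33 | 10.0 | 25.47 | ... | 5.5 | 37.4 | NaN | 5.0 | 298.99 | 33.1 | 1170.0 | 8.27 | 127.6 | 9.06 |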


# 1. Data cleaning


Remove the rows where pH_CAMPO is null, since it is the value we want to predict.

In [38]:
df.shape

(6162, 34)

In [39]:
df = df.dropna(subset=[ph])

In [40]:
df.shape

(6067, 34)

Drop the columns that have fewer than 95% non-null values.

In [41]:
rows = len(df)
umbral = int(np.ceil(0.95 * rows))  # minimum number of non-null values a column needs to be kept

df = df.dropna(thresh=umbral, axis=1)

In [42]:
df.shape

(6067, 22)
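The `thresh` argument of `DataFrame.dropna` is the minimum number of non-null values a label must have to be kept, so with `axis=1` a column survives only if it is at least 95% populated. A small sketch on a toy DataFrame; the column names `completa`, `casi` and `escasa` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy data: 20 rows; "casi" misses 1 value (95% present), "escasa" misses 2 (90% present)
toy = pd.DataFrame({
    "completa": np.arange(20, dtype=float),
    "casi": [np.nan] + list(np.arange(19, dtype=float)),
    "escasa": [np.nan, np.nan] + list(np.arange(18, dtype=float)),
})

# Minimum number of non-null values a column needs in order to be kept
umbral_toy = int(np.ceil(0.95 * len(toy)))  # 19

filtrado = toy.dropna(thresh=umbral_toy, axis=1)
print(list(filtrado.columns))  # ['completa', 'casi']
```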

That leaves 22 columns; the remaining null values are imputed with the median of each column.

In [43]:
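columnas = list(df.columns)

df_imputado = df.copy()
for col in columnas:
  mediana = df[col].median()
  print(col, mediana)
  # Assign the filled column back: fillna(..., inplace=True) on df_imputado[col] is chained
  # assignment and may silently leave df_imputado unchanged under pandas copy-on-write
  df_imputado[col] = df_imputado[col].fillna(mediana)

df = df_imputado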

COLI_FEC 1723.0
COT 3.303
COT_SOL 2.7924499999999997
N_NH3 0.100493
N_NO2 0.005804999999999999
N_NO3 0.079
N_ORG 0.6272249999999999
N_TOT 1.1867450000000002
N_TOTK 0.795853
P_TOT 0.12009500000000001
ORTO_PO4 0.042891
COLOR_VER 18.0
ABS_UV 0.0874
SDT 481.92
SST 23.0
TURBIEDAD 7.5
TEMP_AMB 31.0
TEMP_AGUA 29.1
CONDUC_CAMPO 775.0
pH_CAMPO 8.1
OD_% 95.9
OD_mg/L 7.3
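The same median imputation can also be written without an explicit loop: `DataFrame.fillna` accepts a Series of values and fills each column with the entry whose index matches the column name. A toy sketch (it assumes all columns are numeric, as they appear to be here):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 2.0, 4.0]})

medianas = toy.median()          # x -> 2.0, y -> 3.0 (NaN values are skipped)
imputado = toy.fillna(medianas)  # each column is filled with its own median

print(imputado)
```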


In [44]:
df.head()

| | COLI_FEC | COT | COT_SOL | N_NH3 | N_NO2 | N_NO3 | N_ORG | N_TOT | N_TOTK | P_TOT | ... | ABS_UV | SDT | SST | TURBIEDAD | TEMP_AMB | TEMP_AGUA | CONDUC_CAMPO | pH_CAMPO | OD_% | OD_mg/L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 24196.0 | 2.356 | 2.35 | 0.003 | 0.0341 | 19.6195 | 0.0 | 19.6566 | 0.003 | 0.208 | ... | 0.009 | 768.0 | 76.6667 | 46.0 | 35.3 | 24.6 | 1200.0 | 8.2 | 83.7 | 5.26 |
| 1 | 24196.0 | 8.3441 | 6.4727 | 0.2566 | 0.0603 | 33.4269 | 0.0 | 33.7438 | 0.2566 | 0.2475 | ... | 0.0288 | 433.28 | 43.0 | 60.0 | 26.7 | 24.3 | 677.0 | 7.97 | 85.8 | 7.21 |
| 2 | 24196.0 | 8.1953 | 6.1425 | 0.1063 | 0.0279 | 8.0885 | 0.0781 | 8.3008 | 0.1844 | 0.1814 | ... | 0.0086 | 306.56 | 45.625 | 30.0 | 34.6 | 25.8 | 479.0 | 8.02 | 89.8 | 7.31 |
| 3 | 24196.0 | 7.6502 | 4.0415 | 0.2231 | 0.0354 | 16.7725 | 0.0 | 17.031 | 0.2231 | 0.1451 | ... | 0.009 | 595.2 | 54.0 | 40.0 | 31.0 | 29.9 | 930.0 | 8.05 | 94.3 | 7.07 |
| 4 | 663.0 | 9.4452 | 3.0909 | 0.0589 | 0.0601 | 14.5449 | 0.0 | 14.6639 | 0.0589 | 0.1387 | ... | 0.0935 | 748.8 | 30.0 | 5.5 | 37.4 | 33.1 | 1170.0 | 8.27 | 127.6 | 9.06 |


The dataset is now ready, so we can define the training and test (validation) sets.

In [71]:
X = df.drop(ph, axis=1)

scaler = StandardScaler()

X = scaler.fit_transform(X)
y = df[ph]


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
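In the cell above the scaler is fit on all rows before splitting, so the test rows contribute to the means and standard deviations used for scaling (the earlier median imputation has the same property). A common alternative is to split first and learn the preprocessing from the training portion only. The sketch below uses synthetic data rather than the notebook's `df`, and swaps in scikit-learn's `SimpleImputer` inside a `Pipeline` so that imputation and scaling are both fit on the training split:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic stand-in for the feature columns
X_raw[rng.random(X_raw.shape) < 0.05] = np.nan         # a few missing values
y_raw = rng.normal(loc=8.0, scale=0.3, size=100)       # synthetic stand-in for pH_CAMPO

X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, test_size=0.2, random_state=42)

# Medians and scaling statistics are learned from the training split only
prepro = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X_tr_s = prepro.fit_transform(X_tr)
X_te_s = prepro.transform(X_te)  # the test split reuses the training statistics

print(X_tr_s.shape, X_te_s.shape)
```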

# 2. Definition of the evolutionary algorithm

The hyperparameters to be evolved, listed in the order of the gene positions that encode them in the genotype, are:


*  hidden_layer_sizes
*  activation
*  solver
*  alpha
*  learning_rate


In [81]:
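opciones_genes = {
  # List-valued genes are categorical: the gene stores an index into the list
  'hidden_layer_sizes': [(128, 64, 32), (64, 32), (32,)],  # (32,) is a tuple; (32) would be the int 32
  'activation': ['identity', 'logistic', 'tanh', 'relu'],
  'solver': ['lbfgs', 'sgd', 'adam'],
  # A (min, max) tuple is a continuous gene sampled uniformly from that range
  'alpha': (0.0001, 0.1),
  'learning_rate': ['constant', 'invscaling', 'adaptive'],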
}
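With this encoding, the discrete part of the search space is small: the four list-valued genes combine into 3 × 4 × 3 × 3 = 108 configurations, each paired with a continuous `alpha`. A quick check on a copy of the same dictionary:

```python
import math

opciones_genes = {
    'hidden_layer_sizes': [(128, 64, 32), (64, 32), (32,)],
    'activation': ['identity', 'logistic', 'tanh', 'relu'],
    'solver': ['lbfgs', 'sgd', 'adam'],
    'alpha': (0.0001, 0.1),
    'learning_rate': ['constant', 'invscaling', 'adaptive'],
}

# Only list-valued genes are categorical; the tuple marks alpha's continuous range
n_discretas = math.prod(len(v) for v in opciones_genes.values() if isinstance(v, list))
print(n_discretas)  # 108
```

For comparison, the run further down (population of 10, 40 generations) performs at most 10 + 40 × 10 = 410 fitness evaluations.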

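Definition of the individual: each list-valued gene stores the index of the chosen option, while the alpha gene stores a float sampled uniformly from its range.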


In [99]:
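def crear_gen(hiper_parametro, valores):
  # A tuple (start, end) is a continuous gene: sample a float uniformly in that range
  if isinstance(valores, tuple):
    inicio, fin = valores
    return random.uniform(inicio, fin)
  # A list is a categorical gene: store the index of one of its options
  if isinstance(valores, list):
    return random.randint(0, len(valores) - 1)
  raise TypeError(f"Unsupported gene specification for {hiper_parametro}: {valores!r}")

def crear_individuo():
  # One gene per hyperparameter, in the insertion order of opciones_genes
  return [crear_gen(hiper_parametro, valores) for hiper_parametro, valores in opciones_genes.items()]

# Single objective (the MSE) to minimize, hence the negative weight
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)

toolbox = base.Toolbox()

# An individual is built by calling crear_individuo; a population repeats that n times
toolbox.register("individual", tools.initIterate, creator.Individual, crear_individuo)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)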

ind = toolbox.individual()



In [86]:
ind

[1, 0, 0, 0.08809823518707052, 2]

We define a function that decodes the genotype into the hyperparameter values we actually need.

In [107]:
def from_ind_to_hiper_param(ind):
    hidden_layer_sizes, activation, solver, alpha, learning_rate = ind

    hidden_layer_sizes = opciones_genes['hidden_layer_sizes'][hidden_layer_sizes]
    activation = opciones_genes['activation'][activation]
    solver = opciones_genes['solver'][solver]
    learning_rate = opciones_genes['learning_rate'][learning_rate]

    return hidden_layer_sizes, activation, solver, alpha, learning_rate

# Definition of the fitness evaluation

To evaluate an individual, an MLP is trained with the hyperparameters it encodes, and the mean squared error (MSE) of its predictions on the test split is returned as the fitness to minimize.

In [106]:
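def fitness(ind):
  hidden_layer_sizes, activation, solver, alpha, learning_rate = from_ind_to_hiper_param(ind)

  # max_iter=10 keeps each evaluation cheap but causes the lbfgs convergence warnings seen in the log;
  # learning_rate only has an effect when solver='sgd'
  modelo = MLPRegressor(hidden_layer_sizes=hidden_layer_sizes, activation=activation, solver=solver,
                        alpha=alpha, learning_rate=learning_rate, max_iter=10, random_state=42, verbose=False)
  modelo.fit(X_train, y_train)
  y_pred = modelo.predict(X_test)
  mse = mean_squared_error(y_test, y_pred)
  # DEAP expects the fitness as a tuple, even with a single objective (hence the trailing comma)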
  return mse,
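Because the fitness is measured on the test split, the winner's MSE is no longer an unbiased estimate of performance: the search has already been tuned against those rows. One option is to carve a validation split out of the training data for the fitness and keep the test split for a final report. A sketch on synthetic data; `fitness_validacion` is a hypothetical helper that receives already-decoded hyperparameters:

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the notebook's scaled training data
rng = np.random.default_rng(0)
X_ent = rng.normal(size=(200, 5))
y_ent = 8.0 + 0.1 * X_ent[:, 0] + rng.normal(scale=0.05, size=200)

# Validation split carved out of the training data; the test split is never touched here
X_fit, X_val, y_fit, y_val = train_test_split(X_ent, y_ent, test_size=0.25, random_state=0)

def fitness_validacion(hidden_layer_sizes, activation, solver, alpha, learning_rate):
    # Hypothetical variant of fitness() that takes decoded hyperparameters
    modelo = MLPRegressor(hidden_layer_sizes=hidden_layer_sizes, activation=activation,
                          solver=solver, alpha=alpha, learning_rate=learning_rate,
                          max_iter=10, random_state=42)
    with warnings.catch_warnings():
        # Non-convergence is expected with only 10 iterations
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        modelo.fit(X_fit, y_fit)
    return mean_squared_error(y_val, modelo.predict(X_val)),

print(fitness_validacion((64, 32), "relu", "adam", 0.001, "constant"))
```

Once the search finishes, the winning configuration can be retrained and scored once on the untouched test split.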

# Definition of the evolution

In [103]:
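generaciones = 40      # number of generations (the ngen value used when running eaSimple)
tamaño_poblacion = 10  # individuals per generation
tamaño_torneo = 3      # number of aspirants per selection tournament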


toolbox.register("evaluate", fitness)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=tamaño_torneo)

stats = tools.Statistics(key=lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
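`tools.mutFlipBit` is intended for binary genomes: for each selected gene it stores `type(gene)(not gene)`. On this genotype that turns any non-zero index into 0 and 0 into 1, so indices 2 and 3 can only come from the initial population, and it replaces `alpha` with 0.0 (or with 1.0 if it was already 0.0), which scikit-learn accepts but which lies outside the (0.0001, 0.1) range the gene is meant to explore. A gene-aware mutation that resamples a gene from its own options fits better. A minimal sketch; `mutar_individuo` is a hypothetical name, and the dictionary is copied so the block runs on its own:

```python
import random

opciones_genes = {
    'hidden_layer_sizes': [(128, 64, 32), (64, 32), (32,)],
    'activation': ['identity', 'logistic', 'tanh', 'relu'],
    'solver': ['lbfgs', 'sgd', 'adam'],
    'alpha': (0.0001, 0.1),
    'learning_rate': ['constant', 'invscaling', 'adaptive'],
}

def mutar_individuo(ind, indpb):
    # Each gene is resampled with probability indpb, using the same rules as crear_gen
    for i, valores in enumerate(opciones_genes.values()):
        if random.random() < indpb:
            if isinstance(valores, tuple):
                ind[i] = random.uniform(*valores)             # new alpha inside its range
            else:
                ind[i] = random.randint(0, len(valores) - 1)  # any valid index, not just 0 or 1
    return ind,  # DEAP mutation operators return a tuple

random.seed(0)
mutado, = mutar_individuo([2, 3, 2, 0.05, 2], indpb=1.0)
print(mutado)
```

In the notebook it could be registered with `toolbox.register("mutate", mutar_individuo, indpb=0.2)`; the `indpb` value there is an arbitrary choice.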


# Running the evolution

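For each generation, `algorithms.eaSimple` selects a new population of the same size with the registered `select` operator, clones it, applies `mate` to consecutive pairs with probability `cxpb` and `mutate` to each individual with probability `mutpb`, and then evaluates only the individuals whose fitness those operators invalidated. That is why the `nevals` column of the log varies between generations, and why generation 0, where every individual is new, evaluates all 10. The sketch below mimics that loop without DEAP, on a toy objective (sum of squares) and with a Gaussian mutation instead of the notebook's operators; it omits the hall-of-fame and statistics updates, and all names are hypothetical:

```python
import copy
import random

def evaluar(genes):
    # Toy objective to minimise, standing in for the MLP's MSE
    return sum(g * g for g in genes)

def torneo(pop, k, tournsize):
    # Tournament selection: draw tournsize aspirants (with replacement) and keep the best
    return [min(random.choices(pop, k=tournsize), key=lambda ind: ind["fit"]) for _ in range(k)]

def una_generacion(pop, cxpb, mutpb, tournsize=3):
    hijos = [copy.deepcopy(ind) for ind in torneo(pop, len(pop), tournsize)]
    # Two-point crossover on consecutive pairs, each pair with probability cxpb
    for i in range(1, len(hijos), 2):
        if random.random() < cxpb:
            a, b = hijos[i - 1]["genes"], hijos[i]["genes"]
            p1, p2 = sorted(random.sample(range(1, len(a)), 2))
            a[p1:p2], b[p1:p2] = b[p1:p2], a[p1:p2]
            hijos[i - 1]["fit"] = hijos[i]["fit"] = None  # fitness invalidated
    # Mutation of each individual with probability mutpb
    for ind in hijos:
        if random.random() < mutpb:
            j = random.randrange(len(ind["genes"]))
            ind["genes"][j] += random.gauss(0.0, 0.5)
            ind["fit"] = None
    # Only invalidated individuals are evaluated again: this count is "nevals"
    invalidos = [ind for ind in hijos if ind["fit"] is None]
    for ind in invalidos:
        ind["fit"] = evaluar(ind["genes"])
    return hijos, len(invalidos)

random.seed(64)
poblacion_toy = [{"genes": [random.uniform(-1, 1) for _ in range(5)], "fit": None} for _ in range(10)]
for individuo in poblacion_toy:
    individuo["fit"] = evaluar(individuo["genes"])

for gen in range(1, 6):
    poblacion_toy, nevals = una_generacion(poblacion_toy, cxpb=0.5, mutpb=0.2)
    print(gen, nevals, sum(i["fit"] for i in poblacion_toy) / len(poblacion_toy))
```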
In [104]:
random.seed(64)

pop = toolbox.population(n=tamaño_poblacion)
hof = tools.HallOfFame(1)

pop, log = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=40, halloffame=hof, verbose=True, stats=stats)

ganador = tools.selBest(pop, k=1)[0]

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


gen  nevals  avg
0    10      1.86659
1    5       0.150134
2    10      0.150134
3    6       0.15581
4    6       0.15036
5    5       0.150134
6    4       0.15581
7    5       0.150134
8    6       0.150134
9    7       0.150891
10   6       0.150134
11   4       0.150134
12   5       0.150134
13   5       0.15036
14   3       0.150134
15   6       0.150134
16   4       0.530811
17   7       0.15036
18   5       0.15581
19   4       0.150134
20   5       0.150134
21   9       0.150134
22   4       0.150134
23   10      0.150891
24   3       0.150134
25   3       0.15036
26   10      0.16442
27   4       0.15036
28   6       0.150134
29   6       0.26918
30   7       0.15036
31   4       0.150134
32   6       0.150134
33   2       0.150134
34   8       0.269198
35   10      0.151649
36   9       0.15581
37   8       0.150891
38   8       0.151117
39   6       0.150134
40   4       0.150134

In [105]:
ganador

[0, 0, 1, 0.06797853143580418, 0]

In [109]:
hiper_parametros = from_ind_to_hiper_param(ganador)

In [110]:
hiper_parametros

((128, 64, 32), 'identity', 'sgd', 0.06797853143580418, 'constant')