<a href="https://colab.research.google.com/github/Yahred/evolutionary-computation/blob/main/PHPrediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# pH Prediction

We load the required libraries.

In [90]:
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

import matplotlib.pyplot as plt
import pandas as pd

We load the dataset.

In [26]:
path = 'https://raw.githubusercontent.com/Yahred/evolutionary-computation/main/data/PARAMETROS_FINALES_CRUDOS.csv'

df = pd.read_csv(path)

In [27]:
df.head()

Unnamed: 0,CLOROF_A,COLI_FEC,COLI_TOT,E_COLI,COT,COT_SOL,DBO_SOL,DBO_TOT,DQO_SOL,DQO_TOT,...,TURBIEDAD,TEMP_AMB,PROFUNDIDAD,CAUDAL,DUR_TOT,TEMP_AGUA,CONDUC_CAMPO,pH_CAMPO,OD_%,OD_mg/L
0,,24196.0,24196.0,,2.356,2.35,3.33,6.63,12.6,18.0872,...,46.0,35.3,,430.0,303.34,24.6,1200.0,8.2,83.7,5.26
1,,24196.0,24196.0,24196.0,8.3441,6.4727,2.73,4.11,15.5,27.8784,...,60.0,26.7,,420000.0,222.9984,24.3,677.0,7.97,85.8,7.21
2,,24196.0,24196.0,3654.0,8.1953,6.1425,4.97,6.65,10.0,16.16,...,30.0,34.6,,180.0,224.4432,25.8,479.0,8.02,89.8,7.31
3,,24196.0,24196.0,776.0,7.6502,4.0415,2.0,2.34,10.0,10.0,...,40.0,,,5.0,414.96,29.9,930.0,8.05,94.3,7.07
4,,663.0,12997.0,109.0,9.4452,3.0909,2.0,2.33,10.0,25.47,...,5.5,37.4,,5.0,298.99,33.1,1170.0,8.27,127.6,9.06


# Data cleaning

1. Drop rows that contain null fields

In [28]:
df.shape

(6162, 34)

In [29]:
df.dropna().shape

(0, 34)

As the result shows, dropping rows with null fields is not feasible: every row has at least one null value, so no rows would remain.

2. Drop columns that contain null values

In [30]:
df.shape

(6162, 34)

In [31]:
df.dropna(axis=1).shape

(6162, 0)

In [32]:
df.dropna(axis=1).head()

0
1
2
3
4


This strategy is not advisable either, because every column contains null values and none would remain.
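
To see why both `dropna` variants empty the table, it can help to count the missing values per column and check how many rows and columns are fully populated. The following is a minimal sketch on a small hypothetical DataFrame; the column names are borrowed from the dataset, but the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every row and every column has at least one NaN,
# which mirrors the situation observed in the real dataset.
toy = pd.DataFrame({
    "COT": [2.4, np.nan, 8.2],
    "TEMP_AGUA": [np.nan, 24.3, 25.8],
    "pH_CAMPO": [8.2, 7.97, np.nan],
})

# Fraction of missing values per column, largest first.
missing_per_column = toy.isna().mean().sort_values(ascending=False)

# Number of rows / columns that are completely free of NaN.
complete_rows = int(toy.notna().all(axis=1).sum())
complete_cols = int(toy.notna().all(axis=0).sum())

print(missing_per_column)
print(f"Complete rows: {complete_rows}, complete columns: {complete_cols}")
```

Running the same calls on `df` would show which columns are mostly empty; those might be candidates for removal before imputing the rest.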

3. Impute missing values with the median

In [33]:
df.columns

Index(['CLOROF_A', 'COLI_FEC', 'COLI_TOT', 'E_COLI', 'COT', 'COT_SOL',
       'DBO_SOL', 'DBO_TOT', 'DQO_SOL', 'DQO_TOT', 'N_NH3', 'N_NO2', 'N_NO3',
       'N_ORG', 'N_TOT', 'N_TOTK', 'P_TOT', 'ORTO_PO4', 'COLOR_VER',
       'TRANSPARENCIA', 'ABS_UV', 'SDT', 'SAAM', 'SST', 'TURBIEDAD',
       'TEMP_AMB', 'PROFUNDIDAD', 'CAUDAL', 'DUR_TOT', 'TEMP_AGUA',
       'CONDUC_CAMPO', 'pH_CAMPO', 'OD_%', 'OD_mg/L'],
      dtype='object')

We iterate over each column of the DataFrame and replace its null values with the column median.

In [46]:
df_imputado = df.copy()
for col in df.columns:
  mediana = df[col].median()  # column median, ignoring NaN
  print(col, mediana)
  # Assign the result back instead of calling fillna(inplace=True) on a column selection
  df_imputado[col] = df[col].fillna(mediana)

CLOROF_A 14.501709614405241
COLI_FEC 1785.0
COLI_TOT 155747.34681325685
E_COLI 1083.5
COT 3.3865
COT_SOL 2.9240000000000004
DBO_SOL 3.05
DBO_TOT 5.38
DQO_SOL 16.75
DQO_TOT 41.179
N_NH3 0.1008995
N_NO2 0.0058
N_NO3 0.079
N_ORG 0.6343
N_TOT 1.1918395
N_TOTK 0.8037165
P_TOT 0.1205
ORTO_PO4 0.042842
COLOR_VER 18.0
TRANSPARENCIA 0.8568620676253402
ABS_UV 0.09
SDT 488.32
SAAM 0.47265058910161994
SST 23.0
TURBIEDAD 7.5
TEMP_AMB 31.0
PROFUNDIDAD 7.003205882352941
CAUDAL 12806.168171326399
DUR_TOT 183.88386625702813
TEMP_AGUA 28.9
CONDUC_CAMPO 763.0
pH_CAMPO 8.1
OD_% 94.9
OD_mg/L 7.2
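
The same per-column median imputation can also be expressed with scikit-learn's `SimpleImputer`, which stores the learned medians so they can be reused on new data. This is an alternative sketch, not the method used in this notebook, and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with missing values.
toy = pd.DataFrame({
    "COT": [2.0, np.nan, 8.0, 4.0],
    "TEMP_AGUA": [24.0, 26.0, np.nan, 30.0],
})

# strategy="median" replaces each NaN with the median of its column.
imputer = SimpleImputer(strategy="median")
toy_imputed = pd.DataFrame(
    imputer.fit_transform(toy), columns=toy.columns, index=toy.index
)

print(imputer.statistics_)  # medians learned per column
print(toy_imputed)
```

Note that the loop above also fills missing `pH_CAMPO` values, which is the prediction target. Dropping rows with a missing target, for example with `df.dropna(subset=['pH_CAMPO'])`, is a common alternative worth considering.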


In [47]:
df_imputado.head()

Unnamed: 0,CLOROF_A,COLI_FEC,COLI_TOT,E_COLI,COT,COT_SOL,DBO_SOL,DBO_TOT,DQO_SOL,DQO_TOT,...,TURBIEDAD,TEMP_AMB,PROFUNDIDAD,CAUDAL,DUR_TOT,TEMP_AGUA,CONDUC_CAMPO,pH_CAMPO,OD_%,OD_mg/L
0,14.50171,24196.0,24196.0,83963.8002,2.356,2.35,3.33,6.63,12.6,18.0872,...,46.0,35.3,7.003206,430.0,303.34,24.6,1200.0,8.2,83.7,5.26
1,14.50171,24196.0,24196.0,24196.0,8.3441,6.4727,2.73,4.11,15.5,27.8784,...,60.0,26.7,7.003206,420000.0,222.9984,24.3,677.0,7.97,85.8,7.21
2,14.50171,24196.0,24196.0,3654.0,8.1953,6.1425,4.97,6.65,10.0,16.16,...,30.0,34.6,7.003206,180.0,224.4432,25.8,479.0,8.02,89.8,7.31
3,14.50171,24196.0,24196.0,776.0,7.6502,4.0415,2.0,2.34,10.0,10.0,...,40.0,30.975155,7.003206,5.0,414.96,29.9,930.0,8.05,94.3,7.07
4,14.50171,663.0,12997.0,109.0,9.4452,3.0909,2.0,2.33,10.0,25.47,...,5.5,37.4,7.003206,5.0,298.99,33.1,1170.0,8.27,127.6,9.06


In [86]:
df_imputado.tail()

Unnamed: 0,CLOROF_A,COLI_FEC,COLI_TOT,E_COLI,COT,COT_SOL,DBO_SOL,DBO_TOT,DQO_SOL,DQO_TOT,...,TURBIEDAD,TEMP_AMB,PROFUNDIDAD,CAUDAL,DUR_TOT,TEMP_AGUA,CONDUC_CAMPO,pH_CAMPO,OD_%,OD_mg/L
6157,0.474,1576.0,155747.346813,10.0,7.627,7.627,2.0,5.21,10.0,21.2,...,75.0,35.0,1.2,12806.168171,111.26,32.7,302.0,8.5,104.4,7.3
6158,3.6624,2909.0,155747.346813,364.0,7.03,6.865,2.0,4.5,13.14,101.5,...,55.0,35.0,1.4,12806.168171,141.69,30.6,298.0,8.2,84.5,6.2
6159,0.1,201.0,155747.346813,10.0,6.397,5.392,2.0,2.0,10.0,26.62,...,4.1,19.6,12.0,12806.168171,82.91,23.9,189.0,8.1,58.0,4.9
6160,2.844,292.0,155747.346813,10.0,8.691,7.803,2.0,2.0,10.0,24.01,...,2.5,16.5,8.0,12806.168171,88.26,21.2,177.8,8.1,88.8,7.6
6161,21.1244,1.0,155747.346813,1.0,8.529,5.998,2.0,2.0,10.97,34.83,...,2.1,36.0,7.5,12806.168171,89.06,32.4,235.0,9.2,118.3,8.4


In [48]:
df = df_imputado

1. Prepare the features and the target

In [98]:
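# Features: every column except the target
X = df.drop('pH_CAMPO', axis=1)

# Standardize the features to zero mean and unit variance
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Target: the pH_CAMPO column
y = df['pH_CAMPO']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)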
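
In the cell above the scaler is fitted on all rows before the split, so the test rows influence the scaling statistics. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that ordering on synthetic data; it is not what the notebook ran, and the results reported later come from the original cell.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and target (hypothetical values).
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
y_demo = rng.normal(loc=8.0, scale=0.4, size=100)

# 1) Split the raw data first.
X_train_raw, X_test_raw, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# 2) Fit the scaler on the training rows only, then apply it to both sets.
scaler_demo = StandardScaler()
X_train_scaled = scaler_demo.fit_transform(X_train_raw)
X_test_scaled = scaler_demo.transform(X_test_raw)

print(X_train_scaled.mean(axis=0).round(6))  # approximately 0 per column
print(X_train_scaled.std(axis=0).round(6))   # approximately 1 per column
```

With this ordering the test set stays unseen until evaluation, which tends to give a more honest estimate of the error on new samples.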

2. Define the model

In [93]:
modelo = MLPRegressor(hidden_layer_sizes=(200, 50), max_iter=1000, random_state=42, verbose=True, learning_rate='adaptive')
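
Here `hidden_layer_sizes=(200, 50)` gives two hidden layers with 200 and 50 units. According to the scikit-learn documentation, `learning_rate` is only used when `solver='sgd'`, and the default solver is `'adam'`, so `learning_rate='adaptive'` likely has no effect in the cell above. The sketch below, on synthetic data, shows a variant where the adaptive schedule would apply. The smaller layer sizes and the `early_stopping` setting are assumptions chosen to keep the example fast, not the notebook's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic regression data (hypothetical): y is a noisy linear function of X.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
weights = np.array([0.5, -0.2, 0.1, 0.0, 0.3])
y_demo = X_demo @ weights + rng.normal(scale=0.05, size=200)

# learning_rate='adaptive' takes effect only together with solver='sgd'.
modelo_sgd = MLPRegressor(
    hidden_layer_sizes=(20, 10),
    solver="sgd",
    learning_rate="adaptive",
    early_stopping=True,  # hold out part of the training data for validation
    max_iter=500,
    random_state=42,
)
modelo_sgd.fit(X_demo, y_demo)

print(modelo_sgd.n_layers_)         # counts input, hidden and output layers
print(len(modelo_sgd.loss_curve_))  # one loss value per completed iteration
```

After fitting, `loss_curve_` holds the same per-iteration losses that `verbose=True` prints, so it could be plotted with `matplotlib` to inspect convergence.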

3. Train the model

In [94]:
modelo.fit(X_train, y_train)

Iteration 1, loss = 20.31009860
Iteration 2, loss = 3.34180371
Iteration 3, loss = 1.50931974
Iteration 4, loss = 1.15710044
Iteration 5, loss = 0.95538107
Iteration 6, loss = 0.79957063
Iteration 7, loss = 0.66983358
Iteration 8, loss = 0.56907791
Iteration 9, loss = 0.48284843
Iteration 10, loss = 0.41045528
Iteration 11, loss = 0.35659362
Iteration 12, loss = 0.31328444
Iteration 13, loss = 0.27229287
Iteration 14, loss = 0.24269613
Iteration 15, loss = 0.21751743
Iteration 16, loss = 0.19891414
Iteration 17, loss = 0.18465302
Iteration 18, loss = 0.16591866
Iteration 19, loss = 0.15474295
Iteration 20, loss = 0.14525724
Iteration 21, loss = 0.13481527
Iteration 22, loss = 0.12639962
Iteration 23, loss = 0.12164579
Iteration 24, loss = 0.11334928
Iteration 25, loss = 0.10616108
Iteration 26, loss = 0.10243579
Iteration 27, loss = 0.09878123
Iteration 28, loss = 0.09493956
Iteration 29, loss = 0.09089092
Iteration 30, loss = 0.08746909
Iteration 31, loss = 0.08497637
Iteration 32, lo

4. Evaluate the model

In [95]:
y_pred = modelo.predict(X_test)

In [96]:
mse = mean_squared_error(y_test, y_pred)
print(f"Mean squared error (MSE): {mse:.4f}")

Mean squared error (MSE): 0.1908
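
Because MSE is expressed in squared pH units, its square root is easier to read: an MSE of 0.1908 corresponds to an RMSE of about 0.437 pH units. It can also help to compare the model against a baseline that always predicts the mean. The following is a minimal sketch with hypothetical values, not the notebook's data.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted pH values, for illustration only.
y_true_demo = np.array([8.6, 7.6, 7.8, 7.9, 7.4])
y_pred_demo = np.array([8.4, 7.5, 8.1, 7.8, 7.7])

mse_demo = mean_squared_error(y_true_demo, y_pred_demo)
rmse_demo = np.sqrt(mse_demo)  # back in pH units
mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)
r2_demo = r2_score(y_true_demo, y_pred_demo)

# Baseline: always predict the mean of the targets it was fitted on.
# For brevity it is fitted and scored on the same toy values; with real
# data it would be fitted on y_train and scored on y_test.
dummy_X = np.zeros((len(y_true_demo), 1))
baseline = DummyRegressor(strategy="mean").fit(dummy_X, y_true_demo)
baseline_mse = mean_squared_error(y_true_demo, baseline.predict(dummy_X))

print(f"MSE={mse_demo:.4f} RMSE={rmse_demo:.4f} MAE={mae_demo:.4f} R2={r2_demo:.4f}")
print(f"Baseline MSE={baseline_mse:.4f}")
```

If the model's MSE is not clearly below the baseline's, the network is adding little beyond predicting a typical pH value.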


In [104]:
for pred, real in zip(y_pred, y_test):
  print(f'Real: {real:.4f} Prediction: {pred:.4f}')

Real: 8.6000 Prediction: 8.3654
Real: 7.6000 Prediction: 7.5031
Real: 7.8000 Prediction: 8.0991
Real: 7.9000 Prediction: 7.8272
Real: 7.4000 Prediction: 7.6813
Real: 7.3000 Prediction: 7.8604
Real: 8.0000 Prediction: 7.7444
Real: 7.8000 Prediction: 7.5243
Real: 8.3000 Prediction: 8.0891
Real: 8.1000 Prediction: 8.2280
Real: 7.7000 Prediction: 8.1945
Real: 8.5000 Prediction: 8.0871
Real: 7.8400 Prediction: 7.8588
Real: 7.9000 Prediction: 8.1719
Real: 8.8600 Prediction: 8.3434
Real: 8.5000 Prediction: 7.9255
Real: 7.4500 Prediction: 7.2665
Real: 7.4000 Prediction: 7.0705
Real: 8.5000 Prediction: 8.2810
Real: 8.5000 Prediction: 8.5544
Real: 8.3000 Prediction: 7.9125
Real: 7.8300 Prediction: 7.8573
Real: 8.0700 Prediction: 8.2648
Real: 8.5000 Prediction: 8.3930
Real: 8.6400 Prediction: 8.8344
Real: 8.1800 Prediction: 8.3553
Real: 8.5000 Prediction: 6.9450
Real: 8.1097 Prediction: 8.0445
Real: 7.9000 Prediction: 8.5193
Real: 8.3000 Prediction: 8.1771
Real: 8.2000 Prediction: 8.4087
Real: 8.