# Neural networks: regression

We will build a neural network model to predict the compressive strength of a concrete mix from its composition. The data dictionary is as follows:

- cement (component 1): cement, kg per cubic meter of mix
- slag (component 2): furnace slag, kg per cubic meter of mix
- ash (component 3): ash, kg per cubic meter of mix
- water (component 4): water, kg per cubic meter of mix
- superplastic (component 5): superplasticizer, kg per cubic meter of mix
- coarseagg (component 6): coarse aggregate, kg per cubic meter of mix
- fineagg (component 7): fine aggregate, kg per cubic meter of mix
- age: days since the mix was made
- strength: compressive strength of the concrete, in MPa (**target variable**)

In [4]:
import warnings
warnings.filterwarnings('ignore')

In [5]:
import numpy as np
import pandas as pd  # data handling
import matplotlib.pyplot as plt  # plotting
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split  # splits the dataset for evaluation
from sklearn.model_selection import GridSearchCV  # searches for the best parameter configuration with cross-validation
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from math import sqrt
import seaborn as sns

## Data understanding

In [6]:
data = pd.read_csv('Concrete.csv', sep=',', na_values=".")
print(data.shape)
data.head(5)

(1030, 9)


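| | cement | slag | ash | water | superplastic | coarseagg | fineagg | age | strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 141.3 | 212.0 | 0.0 | 203.5 | 0.0 | 971.8 | 748.5 | 28 | 29.89 |
| 1 | 168.9 | 42.2 | 124.3 | 158.3 | 10.8 | 1080.8 | 796.2 | 14 | 23.51 |
| 2 | 250.0 | 0.0 | 95.7 | 187.4 | 5.5 | 956.9 | 861.2 | 28 | 29.22 |
| 3 | 266.0 | 114.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 28 | 45.85 |
| 4 | 154.8 | 183.4 | 0.0 | 193.3 | 9.1 | 1047.4 | 696.7 | 28 | 18.29 |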


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   cement        1030 non-null   float64
 1   slag          1030 non-null   float64
 2   ash           1030 non-null   float64
 3   water         1030 non-null   float64
 4   superplastic  1030 non-null   float64
 5   coarseagg     1030 non-null   float64
 6   fineagg       1030 non-null   float64
 7   age           1030 non-null   int64  
 8   strength      1030 non-null   float64
dtypes: float64(8), int64(1)
memory usage: 72.6 KB


In [8]:
data.describe(include="all")

| | cement | slag | ash | water | superplastic | coarseagg | fineagg | age | strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [7]:
X=data.iloc[:, :8]
y=data.iloc[:, 8]

In [8]:
np.random.seed(1234)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
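The split above depends on the global NumPy seed set just before it. As a minimal sketch of an alternative, `train_test_split` also accepts a `random_state` argument that pins the shuffle for that call alone. The arrays `X_demo` and `y_demo` below are toy stand-ins, not the concrete data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the concrete features and target (hypothetical values)
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)

# random_state pins the shuffle for this call only, independent of np.random.seed
split_a = train_test_split(X_demo, y_demo, test_size=0.3, random_state=1234)
split_b = train_test_split(X_demo, y_demo, test_size=0.3, random_state=1234)

# Both calls produce the same partition
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("Identical splits:", same_split)
print("Train/test sizes:", len(split_a[0]), len(split_a[1]))
```

Passing the seed explicitly keeps the split stable even if other code consumes random numbers between the seed call and the split.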

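<font color = "red">We establish the regression baseline (always predicting the mean of the target) and compute its MAE and MSE. Because this baseline predicts the mean of the same data it is scored on, its R² is exactly 0.</font>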

In [24]:
# Compute the baseline prediction (mean of actual values)
baseline = y.mean()

# Create a baseline prediction column with the mean value
data['baseline'] = baseline

# Compute MAE for baseline model
mae_baseline = mean_absolute_error(y, data['baseline'])

# Compute MSE for baseline model
mse_baseline = mean_squared_error(y, data['baseline'])

print("Baseline prediction (mean):", baseline)
print("MAE for Baseline Model:", mae_baseline)
print("MSE for Baseline Model:", mse_baseline)

Baseline prediction (mean): 35.817961165048544
MAE for Baseline Model: 13.460658308982937
MSE for Baseline Model: 278.81086128004523
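The baseline above is computed on the full dataset. A possible variant, sketched here under the assumption that a baseline comparable to the test-set metrics is wanted, fits scikit-learn's `DummyRegressor` on the training split and scores it on the held-out split. The data is synthetic and the names `X_demo`, `y_demo`, `X_tr`, and `X_te` are illustrative only.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the concrete data (hypothetical values, not the real dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
y_demo = 35 + 16 * rng.normal(size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

# The dummy model ignores the features and always predicts the training mean
dummy = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
pred_te = dummy.predict(X_te)

r2_train = r2_score(y_tr, dummy.predict(X_tr))
r2_test = r2_score(y_te, pred_te)

print("Baseline MAE (test):", mean_absolute_error(y_te, pred_te))
print("Baseline MSE (test):", mean_squared_error(y_te, pred_te))
# R² is 0 on the training data and at most 0 on held-out data
print("Baseline R2 (train):", r2_train)
print("Baseline R2 (test):", r2_test)
```

Any model worth keeping should beat these numbers on the same test split.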


## Modelamiento

<font color = "red">Find the best neural network with **GridSearchCV**, searching for the best combination of the following parameters:</font>
* <font color = "red">**activation**: activation function, chosen from 'logistic', 'tanh', and 'relu' (the default)</font>
* <font color = "red">**max_iter**: maximum number of training epochs (200 by default). Training may stop earlier if it converges.</font>
* <font color = "red">**hidden_layer_sizes**: network topology, a tuple giving the number of neurons in each hidden layer. The default is a single hidden layer with 100 neurons: (100,).</font>
* <font color = "red">**learning_rate_init**: initial learning rate (0.001 by default). It stays constant by default, although it can be set to change as training progresses.</font>
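As a rough illustration of how these four parameters map onto `MLPRegressor`, the sketch below trains one network on synthetic data and then reads back `n_iter_` (epochs actually run) and `n_layers_`. The data and the chosen values are assumptions for the example, not settings taken from this notebook.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Synthetic regression data (hypothetical stand-in for the concrete dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 8))
y_demo = X_demo @ rng.normal(size=8) + 0.1 * rng.normal(size=300)

model = MLPRegressor(
    activation="relu",           # activation of the hidden layers
    max_iter=500,                # upper bound on training epochs
    hidden_layer_sizes=(20, 10), # two hidden layers with 20 and 10 neurons
    learning_rate_init=0.005,    # initial step size for the optimizer
    random_state=0,
)
with warnings.catch_warnings():
    # Silence the warning emitted when max_iter is reached before convergence
    warnings.simplefilter("ignore", ConvergenceWarning)
    model.fit(X_demo, y_demo)

print("Epochs actually run:", model.n_iter_, "of at most", model.max_iter)
print("Layers (input + hidden + output):", model.n_layers_)
print("Final training loss:", model.loss_)
```

If `n_iter_` equals `max_iter`, training stopped at the cap without meeting the convergence criterion, which is one sign that a larger `max_iter` may be worth including in the grid.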

In [25]:
activation_vec = ['logistic', 'relu', 'tanh']
max_iter_vec = [10, 20, 50, 75, 100, 200, 300, 400, 500, 1000, 2000]
hidden_layer_sizes_vec = [(10,), (20,), (30,), (10, 10), (20, 20), (30, 30), (20, 10), (30, 20, 10)]
learning_rate_init_vec = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.01, 0.02, 0.03, 0.04, 0.05]

In [26]:
mlp = MLPRegressor(hidden_layer_sizes=(30,30,30))

### activation

In [27]:
import time

# Start timer with clear annotation
start_time = time.time()
np.random.seed(1234)

# Parameters and scoring
parametros = {'activation': activation_vec}
scoring = {
    'mae': 'neg_mean_absolute_error',
    'mse': 'neg_mean_squared_error',  # Added MSE
    'r2': 'r2'
}

# Configure grid search
grid = GridSearchCV(
    mlp, 
    param_grid=parametros, 
    cv=5, 
    scoring=scoring, 
    refit='mae', 
    n_jobs=-1
)

# Execute grid search
grid.fit(X_train, y_train)

# Calculate execution time (read the clock once so minutes and seconds agree)
elapsed = time.time() - start_time
execution_min, execution_sec = divmod(elapsed, 60)

# Get best model metrics
best_mae = -grid.best_score_  # Convert back from negative MAE
best_mse = -grid.cv_results_['mean_test_mse'][grid.best_index_]  # Get MSE
best_r2 = grid.cv_results_['mean_test_r2'][grid.best_index_]

# Print comprehensive results
print("\n=== Grid Search Results ===")
print(f"Best parameters: {grid.best_params_}")
print(f"Best MAE: {best_mae:.4f}")
print(f"Best MSE: {best_mse:.4f}")
print(f"Best R²: {best_r2:.2%}")
print(f"\nTraining time: {int(execution_min)}m {execution_sec:.2f}s")




=== Grid Search Results ===
Best parameters: {'activation': 'relu'}
Best MAE: 5.8614
Best MSE: 58.6672
Best R²: 79.32%

Training time: 0m 8.59s
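To put these numbers next to the baseline, the sketch below converts both MSE values to RMSE (same units as the target, MPa) and computes the fraction of baseline squared error that the model removes. The two inputs are copied from the outputs above. The comparison is only approximate, since the baseline was scored on the full dataset and the model on cross-validation folds of the training set.

```python
from math import sqrt

# Values copied from the outputs above
baseline_mse = 278.81086128004523  # mean predictor, full dataset
cv_mse = 58.6672                   # best activation ('relu'), 5-fold CV on the training set

# RMSE is expressed in the same units as the target (MPa)
baseline_rmse = sqrt(baseline_mse)
cv_rmse = sqrt(cv_mse)

# Fraction of the baseline squared error removed by the model
mse_reduction = 1 - cv_mse / baseline_mse

print(f"Baseline RMSE: {baseline_rmse:.2f} MPa")
print(f"Model RMSE (CV): {cv_rmse:.2f} MPa")
print(f"MSE reduction vs. baseline: {mse_reduction:.1%}")
```

The reduction is close to the cross-validated R² reported above, as expected, because R² measures the same ratio within each fold.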


In [28]:
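# scikit-learn reports MAE as a negated score (higher is better), so the MAE column is negative.
# Both columns are multiplied by 100; dividing MAE by -100 recovers the error in MPa.
df = pd.DataFrame([(activation, mae*100, r2*100) for (activation, mae, r2) in 
                   zip(grid.cv_results_['param_activation'],  # same order as the scores
                       grid.cv_results_['mean_test_mae'], 
                       grid.cv_results_['mean_test_r2'],
                      )
                   ], columns = ('activation', 'MAE', 'R2'))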

In [29]:
df

| | activation | MAE | R2 |
|---|---|---|---|
| 0 | logistic | -2232.698251 | -166.807767 |
| 1 | relu | -586.140187 | 79.320528 |
| 2 | tanh | -1659.64301 | -59.865957 |
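The 'logistic' and 'tanh' networks score far below 'relu' here. One plausible contributor, not verified in this notebook, is that the features are fed in on their raw scales, which can saturate sigmoid-type activations. The sketch below shows one way to test that idea: wrap a `StandardScaler` and the network in a `Pipeline` and search over `mlp__activation`. The data is synthetic and the step names `scaler` and `mlp` are arbitrary choices for the example.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (hypothetical stand-in for the concrete data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3)) * np.array([100.0, 10.0, 1.0]) + np.array([300.0, 50.0, 5.0])
y_demo = 0.05 * X_demo[:, 0] + 0.3 * X_demo[:, 1] + rng.normal(size=200)

# The scaler is fit inside each CV fold, so no validation-fold statistics leak into training
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPRegressor(hidden_layer_sizes=(10,), max_iter=300, random_state=0)),
])

# Pipeline parameters are addressed as <step name>__<parameter>
param_grid = {"mlp__activation": ["logistic", "relu", "tanh"]}
search = GridSearchCV(pipe, param_grid=param_grid, cv=3, scoring="neg_mean_absolute_error")

with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    search.fit(X_demo, y_demo)

cv_mae = dict(zip(param_grid["mlp__activation"], -search.cv_results_["mean_test_score"]))
print("Best activation:", search.best_params_["mlp__activation"])
print("CV MAE per activation:", cv_mae)
```

Running the same pipeline on `X_train` and `y_train` would show whether scaling narrows the gap between activations on the real data.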


In [30]:
y_pred = grid.best_estimator_.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred), ", R2:", r2_score(y_test, y_pred), "\n")

MAE: 5.480693527478118 , R2: 0.8093936855562822 



### Best combination

In [13]:
#activation_vec = ['logistic', 'relu']
#max_iter_vec = [50, 100, 150, 200, 300, 350, 1000, 2000]
#hidden_layer_sizes_vec = [(10,), (20,), (30,), (10, 10), (20, 20), (30, 30), (20, 10), 
#                          (10, 10, 10), (20, 20, 20), (30, 30, 30), (30, 20, 10)]
#learning_rate_init_vec = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.01, 0.02]

In [31]:
activation_vec = ['logistic', 'relu']
max_iter_vec = [50, 100, 150, 200, 250, 300, 350]
hidden_layer_sizes_vec = [(10,), (20,), (30,), (20, 10), (20, 20), (20, 30),
                           (20, 10, 10), (20, 20, 20), (20, 30, 10), (20, 10, 30)]
learning_rate_init_vec = [0.001, 0.005, 0.01, 0.05, 0.1, 1.0]
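Before launching the search it can help to know how many models will be trained. The sketch below counts the combinations in this grid with scikit-learn's `ParameterGrid`; the dictionary repeats the candidate values from the cell above, and `n_folds` mirrors the `cv=5` setting used in the search.

```python
from sklearn.model_selection import ParameterGrid

# Same candidate values as the cell above
param_grid = {
    "activation": ["logistic", "relu"],
    "max_iter": [50, 100, 150, 200, 250, 300, 350],
    "hidden_layer_sizes": [(10,), (20,), (30,), (20, 10), (20, 20), (20, 30),
                           (20, 10, 10), (20, 20, 20), (20, 30, 10), (20, 10, 30)],
    "learning_rate_init": [0.001, 0.005, 0.01, 0.05, 0.1, 1.0],
}

n_candidates = len(ParameterGrid(param_grid))  # 2 * 7 * 10 * 6 = 840
n_folds = 5
print("Candidate configurations:", n_candidates)
print("Model fits during the search:", n_candidates * n_folds)
```

With 840 candidates and 5 folds the search trains 4200 networks, plus one final refit of the best configuration on the whole training set.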

In [33]:
import time
# Start timer with clear annotation
start_time = time.time()
np.random.seed(1234)

# Parameters and scoring
parametros = {'activation': activation_vec,
              'max_iter':max_iter_vec,
              'hidden_layer_sizes': hidden_layer_sizes_vec,
              'learning_rate_init': learning_rate_init_vec
              }
scoring = {
    'mae': 'neg_mean_absolute_error',
    'mse': 'neg_mean_squared_error',  # Added MSE
    'r2': 'r2'
}

# Configure grid search
grid = GridSearchCV(
    mlp, 
    param_grid=parametros, 
    cv=5, 
    scoring=scoring, 
    refit='mae', 
    n_jobs=-1
)

# Execute grid search
grid.fit(X_train, y_train)

# Calculate execution time (read the clock once so minutes and seconds agree)
elapsed = time.time() - start_time
execution_min, execution_sec = divmod(elapsed, 60)

# Get best model metrics
best_mae = -grid.best_score_  # Convert back from negative MAE
best_mse = -grid.cv_results_['mean_test_mse'][grid.best_index_]  # Get MSE
best_r2 = grid.cv_results_['mean_test_r2'][grid.best_index_]

# Print comprehensive results
print("\n=== Grid Search Results ===")
print(f"Best parameters: {grid.best_params_}")
print(f"Best MAE: {best_mae:.4f}")
print(f"Best MSE: {best_mse:.4f}")
print(f"Best R²: {best_r2:.2%}")
print(f"\nTraining time: {int(execution_min)}m {execution_sec:.2f}s")

=== Grid Search Results ===
Best parameters: {'activation': 'relu', 'hidden_layer_sizes': (20, 10, 30), 'learning_rate_init': 0.005, 'max_iter': 250}
Best MAE: 5.1365
Best MSE: 45.3471
Best R²: 83.83%

Training time: 3m 23.36s


In [35]:
# MAE is the negated scorer value (higher is better) multiplied by 100; R2 is a percentage.
df = pd.DataFrame([(mae*100, r2*100) for (mae, r2) in 
                   zip( 
                       grid.cv_results_['mean_test_mae'], 
                       grid.cv_results_['mean_test_r2'],
                      )
                   ], columns = ('MAE', 'R2'))

In [36]:
df.iloc[np.argsort(-df.MAE),]

| | MAE | R2 |
|---|---|---|
| 809 | -513.654291 | 83.833826 |
| 804 | -527.526284 | 83.302311 |
| 685 | -529.731924 | 82.658517 |
| 720 | -530.783351 | 83.074972 |
| 726 | -533.956190 | 82.484411 |
| ... | ... | ... |
| 582 | -3653.247992 | -476.499639 |
| 752 | -3696.889471 | -631.563294 |
| 709 | -4171.178782 | -687.065814 |
| 834 | -4193.501178 | -883.412979 |


In [37]:
grid.cv_results_.keys()

dict_keys(['mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time', 'param_activation', 'param_hidden_layer_sizes', 'param_learning_rate_init', 'param_max_iter', 'params', 'split0_test_mae', 'split1_test_mae', 'split2_test_mae', 'split3_test_mae', 'split4_test_mae', 'mean_test_mae', 'std_test_mae', 'rank_test_mae', 'split0_test_mse', 'split1_test_mse', 'split2_test_mse', 'split3_test_mse', 'split4_test_mse', 'mean_test_mse', 'std_test_mse', 'rank_test_mse', 'split0_test_r2', 'split1_test_r2', 'split2_test_r2', 'split3_test_r2', 'split4_test_r2', 'mean_test_r2', 'std_test_r2', 'rank_test_r2'])
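Since `cv_results_` is a dictionary of equal-length arrays, it can be turned into a DataFrame in one step and sorted by the `rank_test_mae` key listed above. The sketch below does this on a small synthetic problem; the grid, the data, and the column names `MAE` and `MAE_std` are assumptions made for the example.

```python
import warnings
import numpy as np
import pandas as pd
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Small synthetic problem and grid (hypothetical, chosen only to run quickly)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=150)

search = GridSearchCV(
    MLPRegressor(max_iter=200, random_state=0),
    param_grid={"hidden_layer_sizes": [(5,), (10,)], "learning_rate_init": [0.01, 0.05]},
    cv=3,
    scoring={"mae": "neg_mean_absolute_error", "r2": "r2"},
    refit="mae",
)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    search.fit(X_demo, y_demo)

# cv_results_ is a dict of equal-length arrays, so it converts directly to a DataFrame
results = pd.DataFrame(search.cv_results_)
results["MAE"] = -results["mean_test_mae"]    # undo the sign flip applied by the scorer
results["MAE_std"] = results["std_test_mae"]  # spread of the score across folds
summary = (
    results[["params", "MAE", "MAE_std", "mean_test_r2", "rank_test_mae"]]
    .sort_values("rank_test_mae")
    .reset_index(drop=True)
)
print(summary)
```

Looking at `MAE_std` alongside `MAE` helps judge whether the gap between the top few configurations is larger than the fold-to-fold variation.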

In [38]:
y_pred = grid.best_estimator_.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred), ", R2:", r2_score(y_test, y_pred), "\n")

MAE: 5.772293849463601 , R2: 0.7681199799833709 



In [39]:
df = pd.DataFrame([(act, hidden_layers, lr, max_iter, mae*100, r2*100) for (act, hidden_layers, lr, max_iter, mae, r2) in 
                   zip(
                       grid.cv_results_['param_activation'], 
                       grid.cv_results_['param_hidden_layer_sizes'], 
                       grid.cv_results_['param_learning_rate_init'], 
                       grid.cv_results_['param_max_iter'], 
                       grid.cv_results_['mean_test_mae'], 
                       grid.cv_results_['mean_test_r2'],
                      )
                   ], columns = ('Activation', 'HiddenLayers', 'LearningRate', 'MaxIter', 'MAE', 'R2'))

In [101]:
df.iloc[np.argsort(-df.MAE),].head(20)

Unnamed: 0,Activation,HiddenLayers,LearningRate,MaxIter,MAE,R2
2547,relu,"(30, 20, 10)",0.001,300,-550.910383,79.267352
2595,relu,"(30, 20, 10)",0.005,2000,-562.852486,79.331937
2515,relu,"(30, 30, 30)",0.009,400,-563.55611,79.750992
2481,relu,"(30, 30, 30)",0.006,300,-566.691616,79.064271
2627,relu,"(30, 20, 10)",0.008,1000,-566.980789,79.421641
2474,relu,"(30, 30, 30)",0.005,2000,-569.073427,79.724852
2546,relu,"(30, 20, 10)",0.001,200,-571.88547,78.976864
2557,relu,"(30, 20, 10)",0.002,200,-572.79493,78.855431
2551,relu,"(30, 20, 10)",0.001,2000,-575.481511,78.311116
2592,relu,"(30, 20, 10)",0.005,400,-581.176688,78.961343
