Prediction of the evolution of diabetes in patients using MLP
===

The goal is to build a nonlinear regression model (an artificial neural network) that predicts the progression of diabetes over a twelve-month horizon from physical variables and laboratory tests. See https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

The dataset contains ten baseline variables (age, sex, body mass index, blood pressure, and six blood measurements) for 442 patients, together with an index that measures the progression of diabetes one year after the tests. Column Y (named `target` in the CSV file loaded below) is the explained variable.
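
To make "nonlinear regression model" concrete, the following is a minimal sketch of the function that a network with one hidden layer, logistic activation, and a linear output computes. The weights here are random placeholders and the names `mlp_forward`, `W1`, `b1`, `w2`, and `b2` are illustrative assumptions, not values or identifiers from the exercise.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def mlp_forward(X, W1, b1, w2, b2):
    """One hidden layer with logistic units followed by a linear output."""
    hidden = sigmoid(X @ W1 + b1)  # shape: (n_samples, n_hidden)
    return hidden @ w2 + b2        # shape: (n_samples,)


# Placeholder inputs and weights (assumptions for illustration only).
rng = np.random.default_rng(0)
n_features, n_hidden = 10, 3
X = rng.normal(size=(4, n_features))
W1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(size=n_hidden)
b2 = 0.0

y_hat = mlp_forward(X, W1, b1, w2, b2)
print(y_hat.shape)
```

Training consists of choosing `W1`, `b1`, `w2`, and `b2` so that the predictions are close to the observed index; the number of columns of `W1` is the number of hidden neurons.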

In [2]:
#
# The sample is split into three parts:
#
#   * X_fit, y_true_fit: the sample used to estimate the optimal parameters
#
#   * X_test, y_true_test: the sample used to select the best configuration
#
#   * X_val, y_true_val: the sample used to check the model as if in production
#
import warnings

import pandas as pd
import pytest
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

warnings.filterwarnings("ignore")

df = pd.read_csv("https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/diabetes.csv")
print(df.columns)
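y_true = df.pop('target')
# NOTE: the three target slices below are taken from df (the feature columns
# left after pop), not from y_true. The recorded output was produced with these
# definitions; slicing y_true instead would make 'target' the explained variable.
y_true_fit = df[:350]
y_true_test = df[350:400]
y_true_val = df[400:]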

X_fit = df[:350]
X_test = df[350:400]
X_val = df[400:]

Index(['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6',
       'target'],
      dtype='object')
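
The split above is positional: the first 350 rows, the next 50, and the remainder. A minimal sketch of the same idea on a plain list follows; the helper name `split_by_position` is hypothetical, and the sketch assumes the rows are not stored in a meaningful order, since no shuffling is done.

```python
def split_by_position(rows, n_fit=350, n_test=50):
    """Split a sequence into fit / test / val parts by row position."""
    fit = rows[:n_fit]
    test = rows[n_fit:n_fit + n_test]
    val = rows[n_fit + n_test:]
    return fit, test, val


# 442 placeholder rows, matching the number of patients in the dataset.
rows = list(range(442))
fit, test, val = split_by_position(rows)
print(len(fit), len(test), len(val))  # 350 50 42
```

With 442 patients this leaves 42 rows for the final check, and every row belongs to exactly one of the three parts.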


In [46]:
#
# Use the sample (X_fit, y_true_fit) to estimate
# the optimal weights of the neural network.
#
# Select the optimal model as the one that minimizes the
# root mean squared error on the sample (X_test, y_true_test).
#
# Consider only models with one (1) to five (5)
# neurons in the hidden layer. Consider only the
# following seeds to initialize the neural network:
# 1000, 1001, 1002, 1003, 1004, 1005.
#
# Compute the mean squared error for the sample
# (X_val, y_true_val). This sample represents the operation
# of the model in production.
# 
# answer/
# True
#

# >>> Insert your code here >>>
from sklearn.model_selection import train_test_split

# df_x = df.drop(labels=['target'], axis=1)
# df_x.head()
# df_y = df['target']

# The sizes of the train/test/val sets are not specified in the exercise.
# Insert them here, and everything else should work properly.
# X_train, X_test, y_true_train, y_true_test = train_test_split(df_x, df_y, test_size=0.25)
# X_test, X_val, y_true_test, y_true_val = train_test_split(X_test, y_true_test, test_size=0.5)

# Candidate numbers of neurons in the hidden layer.
hidden_sizes = [1, 2, 3, 4, 5]
mse_opt = float("inf")
hls_opt = 0
for hls in hidden_sizes:
    # Only the seed 1005 is tried here, out of 1000..1005.
    # learning_rate and momentum only take effect with solver="sgd";
    # the default solver ("adam") ignores them.
    reg = MLPRegressor(
        hidden_layer_sizes=(hls,),
        activation="logistic",
        learning_rate="adaptive",
        momentum=0.9,
        learning_rate_init=0.01,
        max_iter=1000,
        random_state=1005,
    ).fit(X_fit, y_true_fit)
    # Keep the size with the lowest MSE on the test sample.
    mse = mean_squared_error(y_true_test, reg.predict(X_test))
    if mse < mse_opt:
        mse_opt = mse
        hls_opt = hls


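# Refit with the selected size. The remaining hyperparameters are left
# at their defaults here, unlike in the search loop.
reg_opt = MLPRegressor(hidden_layer_sizes=hls_opt, random_state=1005).fit(X_fit, y_true_fit)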
mse_val = mean_squared_error(y_true_val, reg_opt.predict(X_val))
print('MSE validation set:', mse_val)
print(hls_opt)
# <<<

# ---->>> Evaluation ---->>>
pytest.approx(mse_val, 0.0001) == 0.009535

MSE validation set: 0.009535839389139395
5


True
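
The exercise statement lists hidden-layer sizes from 1 to 5 and seeds from 1000 to 1005. One way to search over both is sketched below. It runs on synthetic data (an assumption, so its numbers are not comparable to the result above), the helper `make_model` is hypothetical, and it reuses the best fitted model for the final check instead of refitting it.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (an assumption; not the diabetes dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Positional split: estimation / model selection / final check.
X_fit, y_fit = X[:140], y[:140]
X_test, y_test = X[140:170], y[140:170]
X_val, y_val = X[170:], y[170:]


def make_model(size, seed):
    """Hypothetical helper: one hidden layer of `size` logistic neurons."""
    return MLPRegressor(
        hidden_layer_sizes=(size,),
        activation="logistic",
        learning_rate_init=0.01,
        max_iter=1000,
        random_state=seed,
    )


best = None  # (test MSE, size, seed, fitted model)
for size in range(1, 6):
    for seed in range(1000, 1006):
        model = make_model(size, seed).fit(X_fit, y_fit)
        mse = mean_squared_error(y_test, model.predict(X_test))
        if best is None or mse < best[0]:
            best = (mse, size, seed, model)

best_mse, best_size, best_seed, best_model = best

# The val sample is used once, only after the configuration is chosen.
mse_val = mean_squared_error(y_val, best_model.predict(X_val))
print(best_size, best_seed, mse_val)
```

Ranking candidates by MSE gives the same winner as ranking by root mean squared error, because the square root is a monotonic transformation.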