# **Regression MLP using Sequential API**

# **Setting up the environment**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

In [2]:
from sklearn.datasets import load_diabetes
data = load_diabetes()

In [3]:
data.keys()

dict_keys(['data', 'target', 'frame', 'DESCR', 'feature_names', 'data_filename', 'target_filename', 'data_module'])

In [4]:
print(data.DESCR)

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - age     age in years
      - sex
      - bmi     body mass index
      - bp      average blood pressure
      - s1      tc, total serum cholesterol
      - s2      ldl, low-density lipoproteins
      - s3      hdl, high-density lipoproteins
      - s4      tch, total cholesterol / HDL
      - s5      ltg, possibly log of serum triglycerides level
      - s6      glu, blood sugar level

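Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times the square root of `n_samples` (i.e. the sum of squares of each column totals 1).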
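scikit-learn's documentation for this dataset states that each feature column has already been mean-centered and scaled by its standard deviation times √n_samples, so that every column's sum of squares is 1. The sketch below reproduces that transformation with NumPy on a small hypothetical matrix (`raw` is made-up data, not the diabetes features). It shows why the diabetes feature values are such small numbers.

```python
import numpy as np

# Hypothetical raw feature matrix (5 samples, 3 features); NOT the diabetes data
rng = np.random.default_rng(0)
raw = rng.normal(loc=50.0, scale=10.0, size=(5, 3))

n_samples = raw.shape[0]
# Center each column, then divide by (population std) * sqrt(n_samples)
centered = raw - raw.mean(axis=0)
scaled = centered / (raw.std(axis=0) * np.sqrt(n_samples))

print(scaled.mean(axis=0))        # approximately 0 for every column
print((scaled ** 2).sum(axis=0))  # 1 for every column
```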

In [5]:
X = data.data
y = data.target

In [6]:
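from sklearn.model_selection import train_test_split

# Hold out 25% of the data (the default test_size) as the test set,
# then split the remaining 75% again into training and validation sets
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, random_state=42)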
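By default `train_test_split` holds out 25% of the samples (`test_size=0.25`), rounding the test count up and giving the remainder to training. A small sketch of that arithmetic (the helper `split_sizes` is hypothetical, not a scikit-learn function) shows how the 442 instances end up distributed across the two splits above:

```python
import math

def split_sizes(n_samples, test_size=0.25):
    """Hypothetical helper mimicking train_test_split's default sizing:
    the test count is rounded up, the training set gets the rest."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

n_train_full, n_test = split_sizes(442)       # first split: train_full / test
n_train, n_valid = split_sizes(n_train_full)  # second split: train / valid
print(n_train, n_valid, n_test)  # 248 83 111
```

So the MLP is fitted on only 248 instances, while 83 are used for validation and 111 for testing.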

In [7]:
data.feature_names

['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

# **MLP**

In [8]:
import tensorflow as tf
tf.random.set_seed(42)

In [9]:
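# Normalization standardizes each feature using statistics learned later by adapt()
norm_layer = tf.keras.layers.Normalization()

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=X_train.shape[1:]),  # 10 input features
    norm_layer,
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(25, activation="relu"),
    tf.keras.layers.Dense(1),  # single linear output unit for regression
])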
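Each `Dense` layer with `n_in` inputs and `n_out` units holds `n_in * n_out` weights plus `n_out` biases. The hypothetical helper below counts the trainable parameters of the stack defined above; the `Normalization` layer's mean and variance are non-trainable and are not included. `model.summary()` should report the same trainable total.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters of a stack of fully connected layers.

    layer_sizes lists the input width followed by each layer's unit count.
    Each layer contributes n_in * n_out weights plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# 10 input features -> Dense(50) -> Dense(25) -> Dense(1)
print(dense_param_count([10, 50, 25, 1]))  # 1851 (550 + 1275 + 26)
```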

In [10]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss="mse", optimizer=optimizer, metrics=["RootMeanSquaredError"])
# Learn the per-feature mean and variance from the training set only
norm_layer.adapt(X_train)
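`adapt()` computes a per-feature mean and variance from `X_train`, and the layer then maps each input to `(x - mean) / sqrt(variance)`. The NumPy sketch below mimics that on hypothetical stand-in data (`X_train_demo`); the epsilon guard against zero variance is an assumption about Keras's internals, not a documented constant. Because the statistics come only from the training set, the validation and test sets are transformed with the same values, which avoids leaking information from them.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-in for X_train: 248 samples, 10 small-scale features
X_train_demo = rng.normal(loc=0.0, scale=0.05, size=(248, 10))

# What adapt() learns: per-feature statistics of the training data
mean = X_train_demo.mean(axis=0)
variance = X_train_demo.var(axis=0)

def normalize(x, eps=1e-7):
    # What the layer applies at call time (eps value is an assumption)
    return (x - mean) / np.maximum(np.sqrt(variance), eps)

Z = normalize(X_train_demo)
print(Z.mean(axis=0).round(6))  # about 0 per feature
print(Z.std(axis=0).round(6))   # about 1 per feature
```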

In [11]:
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_valid, y_valid))
mse_test, rmse_test = model.evaluate(X_test, y_test)

In [12]:
mse_test, rmse_test

(3020.87353515625, 54.96247482299805)
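The two numbers returned by `model.evaluate()` are the test MSE (the loss) and the test RMSE (the metric). RMSE is simply the square root of MSE, which puts it back in the same units as the target, so it is easier to interpret. A minimal check with a hypothetical `rmse` helper:

```python
import math
import numpy as np

def rmse(y_true, y_pred):
    """Hypothetical helper: root of the mean squared difference."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Errors 1, 0, -2 -> squared 1, 0, 4 -> mean 5/3 -> sqrt is about 1.291
print(rmse([3, 5, 7], [2, 5, 9]))

# The pair reported by model.evaluate() above is consistent with this relation
mse_test, rmse_test = 3020.87353515625, 54.96247482299805
print(math.sqrt(mse_test))  # about 54.9625, matching rmse_test
```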

# **Linear Regression**

In [13]:
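from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Reuse the earlier split: train_test_split(X, y, random_state=42) returns exactly
# X_train_full / X_test, so both models are scored on the same test set.
# The linear model is fitted on all 331 training instances (train + valid parts).
reg = LinearRegression()
reg.fit(X_train_full, y_train_full)
y_pred = reg.predict(X_test)

# mean_squared_error expects (y_true, y_pred)
print(f"Root mean squared error of the Model: {np.sqrt(mean_squared_error(y_test, y_pred))}")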

Root mean squared error of the Model: 53.369566710321955
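`LinearRegression` fits ordinary least squares with an intercept: it picks the coefficients that minimize the sum of squared residuals. The sketch below solves the same problem directly with `np.linalg.lstsq` on synthetic data (`X_demo` and `y_demo` are made up, with known true coefficients); scikit-learn's own solver path differs in detail. On the diabetes data this model has only 11 parameters (10 coefficients plus an intercept).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical synthetic data: y = 5 + 3*x0 - 2*x1 + small noise
X_demo = rng.normal(size=(100, 2))
y_demo = 5 + 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the first weight acts as the intercept,
# then solve min ||A w - y||^2
A = np.column_stack([np.ones(len(X_demo)), X_demo])
w, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
intercept, coef = w[0], w[1:]
print(intercept, coef)  # close to 5 and [3, -2]
```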


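**Compared with the traditional linear approach, the MLP performs slightly worse on the test set (RMSE ≈ 55.0 vs ≈ 53.4). A likely reason is the small dataset: there are only 442 instances, and the MLP is trained on just 248 of them (83 are held out for validation), whereas the linear model uses all 331 training instances. With about 1,850 trainable parameters, the network has little data to learn from, while linear regression fits only 11. A gap this small, measured once on 111 test instances, may also partly reflect run-to-run variation. Multi-layer perceptrons tend to pay off on larger datasets, where they scale better than many other ML models.**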