# Regression Analysis

## Objectives

- Apply neural networks to solve regression problems involving continuous target prediction.
- Explore the effectiveness of simple linear regression and neural network models on the advertising dataset.
- Evaluate model performance using metrics suitable for regression tasks.

## Background

This notebook focuses on using both traditional statistical methods and neural networks for regression analysis. It aims to predict sales based on advertising spend across different media.

## Datasets Used

Advertising Dataset: It consists of data on advertising spending in various media like TV, radio, and newspapers, and the corresponding sales figures.

## Advertising dataset

In this notebook, we will solve a regression problem with neural networks. Remember that regression consists of predicting a continuous target.

In [1]:
import numpy as np
import pandas as pd

import plotly.express as px
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook_connected"

In [2]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import root_mean_squared_error, r2_score

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

In [3]:
df = pd.read_csv('advertising.csv')
print(df.shape)
df.head()

(200, 4)


| | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 12.0 |
| 3 | 151.5 | 41.3 | 58.5 | 16.5 |
| 4 | 180.8 | 10.8 | 58.4 | 17.9 |


## Simple Linear Regression

Let's compute a simple linear regression model between `TV` (X) and `Sales` (y).

In [4]:
# Scatterplot
px.scatter(df, x='TV', y='Sales', 
           title="TV vs Sales", width=800, height=500)  

In [5]:
X_train, X_test, y_train, y_test = train_test_split(df.TV, df.Sales, 
                                        shuffle=True, test_size=0.3, random_state=20) 
print('Train Set: %i cases, \tTest Set: %i cases' %(X_train.shape[0], X_test.shape[0])) 

Train Set: 140 cases, 	Test Set: 60 cases
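
To make the 140/60 split above concrete, here is a minimal sketch of a shuffled hold-out split in plain Python. The helper `shuffle_split` is hypothetical and is not how scikit-learn implements `train_test_split`; in particular, its seed will not reproduce the rows selected by `random_state=20`. It only illustrates the idea of shuffling the row indices and reserving 30% of them for testing.

```python
import math
import random

def shuffle_split(n_rows, test_size=0.3, seed=20):
    """Return (train_idx, test_idx) for a shuffled hold-out split (illustrative helper)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # reproducible shuffle of the row indices
    n_test = math.ceil(test_size * n_rows)    # 30% of 200 rows -> 60 test rows
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = shuffle_split(200)
print(len(train_idx), len(test_idx))  # prints: 140 60
```

Because the two index lists are disjoint, no row can end up in both the training and the testing set.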


In [6]:
# Creating a DataFrame for plotting
train = pd.concat([X_train, y_train], axis=1)
train['Set'] = 'train'
test = pd.concat([X_test, y_test], axis=1)
test['Set'] = 'test'
t = pd.concat([train, test], axis=0)
t.sample(5)

| | TV | Sales | Set |
|---|---|---|---|
| 47 | 239.9 | 23.2 | test |
| 123 | 123.1 | 15.2 | train |
| 22 | 13.2 | 5.6 | train |
| 16 | 67.8 | 12.5 | train |
| 174 | 222.4 | 16.5 | test |


In [7]:
# Scatterplot
px.scatter(t, x='TV', y='Sales', color='Set',
           color_discrete_map={'train':'#636EFA', 'test':'darkblue'},
           labels={'TV':'TV Advertising', 'Sales':'Sales'},
           title="TV Advertising vs Sales: Training and Testing Sets", 
           width=800, height=500)   

In [8]:
# Saving training and testing data
data_train = pd.DataFrame({'X':X_train, 'y':y_train})
data_test  = pd.DataFrame({'X':X_test,  'y':y_test})
data_train.head()

| | X | y |
|---|---|---|
| 134 | 36.9 | 10.8 |
| 62 | 239.3 | 20.7 |
| 20 | 218.4 | 18.0 |
| 21 | 237.4 | 17.5 |
| 45 | 175.1 | 16.1 |


### Traditional Simple Linear Regression Model

In [9]:
# Simple linear model
model_tm = LinearRegression(fit_intercept=True)
model_tm.fit(data_train.X.values.reshape(-1,1), data_train.y)
print('Simple Linear Model: y = %.4f x + %.4f' %(model_tm.coef_[0], model_tm.intercept_))

Simple Linear Model: y = 0.0540 x + 7.1647
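
For a single predictor, the least-squares line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below applies those formulas in plain Python. The function name `fit_simple_ols` is ours, and the example uses only the first five rows of the dataset shown earlier, so its coefficients will differ from the ones fitted on the 140 training cases.

```python
def fit_simple_ols(x, y):
    """Closed-form least-squares fit of y = slope * x + intercept (illustrative helper)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean   # the line passes through (x_mean, y_mean)
    return slope, intercept

# First five rows of the advertising data (TV, Sales) displayed by df.head()
tv = [230.1, 44.5, 17.2, 151.5, 180.8]
sales = [22.1, 10.4, 12.0, 16.5, 17.9]

slope, intercept = fit_simple_ols(tv, sales)
print('y = %.4f x + %.4f' % (slope, intercept))
```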


In [10]:
# Getting predictions
data_train['y_tm'] = model_tm.predict(data_train.X.values.reshape(-1,1))
data_test['y_tm']  = model_tm.predict(data_test.X.values.reshape(-1,1))
data_train.head()

| | X | y | y_tm |
|---|---|---|---|
| 134 | 36.9 | 10.8 | 9.156057 |
| 62 | 239.3 | 20.7 | 20.078974 |
| 20 | 218.4 | 18.0 | 18.951064 |
| 21 | 237.4 | 17.5 | 19.976437 |
| 45 | 175.1 | 16.1 | 16.614294 |


Evaluation metrics

- $R^2$ (coefficient of determination): the proportion of variance in the response variable that is explained by the model. A value of 1 indicates a perfect fit.

- `RMSE` (root mean squared error): the square root of the mean of the squared residuals (prediction errors), expressed in the same units as the target. It measures how far the observations fall from the fitted line on average; the lower the RMSE, the better the model fits the data.
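
Both metrics can be computed by hand from the residuals. The sketch below is a plain-Python illustration using the usual definitions, $R^2 = 1 - SS_{res}/SS_{tot}$ and $RMSE = \sqrt{SS_{res}/n}$. The function name `r2_and_rmse` is ours; the sample values are the five training rows displayed above, so the results will not match the scores computed on the full sets.

```python
import math

def r2_and_rmse(y_true, y_pred):
    """Compute R² and RMSE from their definitions (illustrative helper)."""
    n = len(y_true)
    y_mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)             # total sum of squares
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse

# y and y_tm for the five training rows shown in the table above
y_true = [10.8, 20.7, 18.0, 17.5, 16.1]
y_pred = [9.156057, 20.078974, 18.951064, 19.976437, 16.614294]

r2, rmse = r2_and_rmse(y_true, y_pred)
print('R² = %.4f, RMSE = %.2f' % (r2, rmse))
```

A model that always predicts the mean of `y_true` gets $R^2 = 0$, which is why $R^2$ is often read as the improvement over that baseline.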

In [11]:
# Computing R² (We save the test score for later)
print('Training Set: R\u00b2 = %.4f' %(r2_score(data_train.y, data_train.y_tm)))
r2_tm = r2_score(data_test.y, data_test.y_tm)
print('Testing Set:  R\u00b2 = %.4f' %(r2_tm))

Training Set: R² = 0.8099
Testing Set:  R² = 0.8140


In [12]:
# Computing the RMSE
print('Training Set: RMSE = %.2f' %(root_mean_squared_error(data_train.y, data_train.y_tm)))
print('Testing Set:  RMSE = %.2f' %(root_mean_squared_error(data_test.y,  data_test.y_tm)))

Training Set: RMSE = 2.19
Testing Set:  RMSE = 2.51


In [13]:
# Scatterplot with traditional model
sorted_data_train = data_train.sort_values(by='X')
# Train and test sets
fig = px.scatter(t, x='TV', y='Sales', color='Set',
           color_discrete_map={'train':'#636EFA', 'test':'darkblue'},
           labels={'TV':'TV Advertising', 'Sales':'Sales', 'Set':''},
           title="TV Advertising vs Sales: Traditional Simple Regression Model", 
           width=800, height=500)  
# Traditional Simple Regression Model
fig.add_scatter(x=sorted_data_train.X, y=sorted_data_train.y_tm, mode='lines',
                name=f'Traditional Model (R\u00b2={r2_tm:.2f})')  
# Update legend position
fig.update_layout(legend=dict(
        orientation='h', x=0, y=1.1,
        xanchor='left',  yanchor='top'))
fig.show()

### Simple Linear Regression using Neural Networks

Let's start by standardizing the variables.

In [14]:
# Standardize the input variables
scaler = StandardScaler()
# Fit the scaler on the training data only, then reuse its mean and std for the test data
data_train['Xs'] = scaler.fit_transform(data_train.X.values.reshape(-1,1))
data_test['Xs']  = scaler.transform(data_test.X.values.reshape(-1,1))
data_train.head()

| | X | y | y_tm | Xs |
|---|---|---|---|---|
| 134 | 36.9 | 10.8 | 9.156057 | -1.345216 |
| 62 | 239.3 | 20.7 | 20.078974 | 1.07335 |
| 20 | 218.4 | 18.0 | 18.951064 | 0.823606 |
| 21 | 237.4 | 17.5 | 19.976437 | 1.050646 |
| 45 | 175.1 | 16.1 | 16.614294 | 0.306196 |
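
Standardization subtracts the mean and divides by the standard deviation, so the scaled variable has mean 0 and standard deviation 1. A common convention is to estimate both statistics on the training data and reuse them for the test data, so that both sets are on the same scale. The sketch below shows this in plain Python. The helpers `fit_standardizer` and `standardize` are hypothetical names, and the tiny samples are rows displayed earlier in the notebook, so the scaled values will not match the `Xs` column computed from all 140 training cases.

```python
import math

def fit_standardizer(values):
    """Return the mean and population standard deviation (ddof=0) of a sample."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, std

def standardize(values, mean, std):
    """Scale values using statistics that were computed elsewhere."""
    return [(v - mean) / std for v in values]

train_tv = [36.9, 239.3, 218.4, 237.4, 175.1]   # five training rows shown above
test_tv = [239.9, 222.4]                        # two test rows from the earlier sample of t

mean, std = fit_standardizer(train_tv)          # statistics come from the training data only
train_scaled = standardize(train_tv, mean, std)
test_scaled = standardize(test_tv, mean, std)   # same mean and std applied to the test data
print(train_scaled)
print(test_scaled)
```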


In [15]:
# Defining the model
model_nn = Sequential([
    Input(shape=[1]),               # Input layer: explicitly define the input shape
    Dense(64, activation='relu'),   # Hidden layer with 64 neurons and ReLU activation
    Dense(64, activation='relu'),   # Hidden layer with 64 neurons and ReLU activation
    Dense(1)                        # Output layer
])

# Model summary
model_nn.summary()


The network ends with a single unit and no activation function, so the output layer is linear and can produce any real value. This is the typical setup for regression problems where we want to predict a single continuous value.
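
The size of a fully connected network follows directly from its layer widths: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below applies that arithmetic to the architecture defined above (1 input, two hidden layers of 64 units, 1 output). The helper `dense_param_count` is a hypothetical name used only for this illustration.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters per Dense layer, given [n_inputs, units_1, ..., units_out]."""
    counts = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        counts.append(n_in * n_out + n_out)   # weights + biases
    return counts

counts = dense_param_count([1, 64, 64, 1])
print(counts, sum(counts))  # prints: [128, 4160, 65] 4353
```

With more than four thousand parameters and only 140 training cases, the network is far more flexible than the two-parameter linear model it is compared against.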

In [16]:
# Compile the model
model_nn.compile(optimizer='rmsprop',
                 loss='mse',
                 metrics=['mae'])

We compile the network with the `mse` loss function (mean squared error), a widely used loss function for regression problems. It computes the mean of the squared differences between the predictions and the targets.

We also monitor the `mae` metric (mean absolute error) during training. It is the mean of the absolute differences between the predictions and the targets, so it is expressed in the same units as the target.
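
As a small worked example of the two quantities, the sketch below computes them in plain Python on three made-up values (they are not taken from the advertising data). Note how the single large error of 4 dominates the MSE much more than the MAE, because the errors are squared before averaging.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions, for illustration only
y_true = [10.0, 12.0, 20.0]
y_pred = [11.0, 12.0, 16.0]

print(mse(y_true, y_pred))  # (1 + 0 + 16) / 3
print(mae(y_true, y_pred))  # (1 + 0 + 4) / 3
```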

In [17]:
# Fit the model
history_nn = model_nn.fit(data_train.Xs, data_train.y, 
                          batch_size=10,
                          epochs=20, 
                          validation_data=(data_test.Xs, data_test.y),  
                          verbose=0)

In [18]:
history_nn.history.keys()

dict_keys(['loss', 'mae', 'val_loss', 'val_mae'])

In [19]:
def plot_history_loss(history):
    '''
    Plotting losses results of the neural network training process
    '''
    hist = history.history
    d = pd.DataFrame({'epochs': [epoch + 1 for epoch in history.epoch],
                      'loss': hist['loss'],
                      'val_loss': hist['val_loss']})
    
    fig = px.line(d, x='epochs', y=['loss', 'val_loss'],
                  color_discrete_sequence=['orange', 'peru'],
                  labels={'epochs': 'Epochs', 'value': 'Loss', 'variable': 'Legend'},
                  title='Neural Network Training Loss History', width=800, height=500)
    
    fig.update_traces(mode='lines+markers')
    
    return fig.show()

In [20]:
plot_history_loss(history_nn)

In [21]:
def plot_history_mae(history):
    '''
    Plotting mae results of the neural network training process
    '''
    hist = history.history
    d = pd.DataFrame({'epochs': [epoch + 1 for epoch in history.epoch],
                      'mae': hist['mae'],
                      'val_mae': hist['val_mae']})
    
    fig = px.line(d, x='epochs', y=['mae', 'val_mae'],
                  color_discrete_sequence=['violet', 'deeppink'],
                  labels={'epochs': 'Epochs', 'value': 'MAE', 'variable': 'Legend'},
                  title='Neural Network Training MAE History', width=800, height=500)
    
    fig.update_traces(mode='lines+markers')
    
    return fig.show()

In [22]:
plot_history_mae(history_nn)

In [23]:
# Getting predictions
data_train['y_nn'] = model_nn.predict(data_train.Xs)
data_test['y_nn']  = model_nn.predict(data_test.Xs)
data_train.head()

5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step


| | X | y | y_tm | Xs | y_nn |
|---|---|---|---|---|---|
| 134 | 36.9 | 10.8 | 9.156057 | -1.345216 | 9.891116 |
| 62 | 239.3 | 20.7 | 20.078974 | 1.07335 | 19.84333 |
| 20 | 218.4 | 18.0 | 18.951064 | 0.823606 | 18.410471 |
| 21 | 237.4 | 17.5 | 19.976437 | 1.050646 | 19.71307 |
| 45 | 175.1 | 16.1 | 16.614294 | 0.306196 | 15.7244 |


In [24]:
# Computing R² (We save the test score for later)
print('Training Set: R\u00b2 = %.4f' %(r2_score(data_train.y, data_train.y_nn)))
r2_nn = r2_score(data_test.y, data_test.y_nn)
print('Testing Set:  R\u00b2 = %.4f' %(r2_nn))

Training Set: R² = 0.7304
Testing Set:  R² = 0.7332


In [25]:
# Computing the RMSE
print('Training Set: RMSE = %.2f' %(root_mean_squared_error(data_train.y, data_train.y_nn)))
print('Testing Set:  RMSE = %.2f' %(root_mean_squared_error(data_test.y,  data_test.y_nn)))

Training Set: RMSE = 2.61
Testing Set:  RMSE = 3.00


In [26]:
# Scatterplot with models
sorted_data_train = data_train.sort_values(by='X')
# Train and test sets
fig = px.scatter(t, x='TV', y='Sales', color='Set',
           color_discrete_map={'train':'#636EFA', 'test':'darkblue'},
           labels={'TV':'TV Advertising', 'Sales':'Sales', 'Set':''},
           title="TV Advertising vs Sales: Traditional vs Neural Network Regression Models", 
           width=800, height=500)  
# Traditional Simple Regression Model
fig.add_scatter(x=sorted_data_train.X, y=sorted_data_train.y_tm, mode='lines',
                name=f'Traditional Model (R\u00b2={r2_tm:.2f})')
# Neural Network Regression Model
fig.add_scatter(x=sorted_data_train.X, y=sorted_data_train.y_nn, mode='lines',
                name=f'Neural Network (R\u00b2={r2_nn:.2f})')
# Update the legend position
fig.update_layout(legend=dict(
        orientation='h', x=0, y=1.1,
        xanchor='left',  yanchor='top'))
fig.show()

There could be several reasons why the neural network model performed poorly compared to the traditional linear regression model. Here are a few possible explanations:

- `Insufficient Data`: Neural networks typically require a large amount of data to generalize well. Our dataset is small, and the neural network might struggle to learn complex patterns effectively. 

- `Model Architecture`: The choice of neural network architecture, including the number of layers, neurons per layer, and the activation functions, can impact performance. Our current architecture might not be suitable for the given problem. 

- `Hyperparameter Tuning`: Neural networks have several hyperparameters that must be properly tuned for optimal performance. They include learning rate, batch size, number of epochs, regularization techniques, and dropout rate. Poorly chosen hyperparameters can lead to suboptimal results. 

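- `Overfitting`: Neural networks are prone to overfitting, which occurs when the model becomes too complex and memorizes the training data instead of generalizing well to unseen data. An overfitted network performs noticeably worse on the test set than on the training set. Here the training and testing $R^2$ are nearly equal (0.7304 and 0.7332), which points toward underfitting, for example from training for only 20 epochs, rather than overfitting. 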

Experimenting with different approaches, analyzing the results, and iterating on your models and data preprocessing techniques are essential. Neural networks can be powerful tools but require careful consideration and experimentation to achieve optimal performance.
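
One way to tune the number of epochs is to watch the validation loss and stop once it no longer improves. Keras provides this as the `EarlyStopping` callback (with parameters such as `monitor`, `patience`, and `restore_best_weights`). The sketch below only illustrates the patience rule in plain Python; the function `best_epoch_with_patience` is a hypothetical helper, the loss values are made up, and the logic is a simplification rather than Keras's implementation.

```python
def best_epoch_with_patience(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once the validation loss has gone `patience`
    consecutive epochs without improving on its best value.
    """
    best_loss = float('inf')
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # new best: remember this epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch              # patience exhausted: stop here
    return len(val_losses), best_epoch            # never triggered: ran all epochs

# Made-up validation losses, for illustration only
val_losses = [9.0, 7.5, 7.0, 7.2, 7.1, 7.3, 7.4]
stop_epoch, best_epoch = best_epoch_with_patience(val_losses, patience=3)
print(stop_epoch, best_epoch)  # prints: 6 3
```

In this example training would stop at epoch 6, and the weights from epoch 3 would be the ones worth keeping.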

## Multiple Linear Regression

In [27]:
df.head()

| | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 12.0 |
| 3 | 151.5 | 41.3 | 58.5 | 16.5 |
| 4 | 180.8 | 10.8 | 58.4 | 17.9 |


Let's compute a multiple linear regression model.

In [28]:
X2 = df[['TV', 'Radio', 'Newspaper']]       # Features matrix
y = df['Sales']                             # Target variable

In [29]:
X_train2, X_test2, y_train2, y_test2 = train_test_split(X2, y, test_size=0.3, random_state=20) 
print('Train Set: %i cases, \tTest Set: %i cases' %(X_train2.shape[0], X_test2.shape[0])) 

Train Set: 140 cases, 	Test Set: 60 cases


In [30]:
# Saving training and testing data
data_train2 = pd.concat([X_train2, y_train2], axis=1)
data_test2  = pd.concat([X_test2, y_test2], axis=1)
data_train2.head()

| | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 134 | 36.9 | 38.6 | 65.6 | 10.8 |
| 62 | 239.3 | 15.5 | 27.3 | 20.7 |
| 20 | 218.4 | 27.7 | 53.4 | 18.0 |
| 21 | 237.4 | 5.1 | 23.5 | 17.5 |
| 45 | 175.1 | 22.5 | 31.5 | 16.1 |


### Traditional Multiple Linear Regression Model

In [31]:
model_tm2 = LinearRegression(fit_intercept=True)
model_tm2.fit(X_train2, y_train2)
print('Coefficients =', np.round(model_tm2.coef_, 4))
print('Intercept = %.4f' %(model_tm2.intercept_))

Coefficients = [ 0.0524  0.1127 -0.0005]
Intercept = 4.9046


In [32]:
# Getting predictions
data_train2['y_tm'] = model_tm2.predict(X_train2)
data_test2['y_tm']  = model_tm2.predict(X_test2)
data_train2.head()

| | TV | Radio | Newspaper | Sales | y_tm |
|---|---|---|---|---|---|
| 134 | 36.9 | 38.6 | 65.6 | 10.8 | 11.157619 |
| 62 | 239.3 | 15.5 | 27.3 | 20.7 | 19.18269 |
| 20 | 218.4 | 27.7 | 53.4 | 18.0 | 19.449405 |
| 21 | 237.4 | 5.1 | 23.5 | 17.5 | 17.912654 |
| 45 | 175.1 | 22.5 | 31.5 | 16.1 | 16.604237 |
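
A multiple linear regression prediction is just the intercept plus a weighted sum of the features. As a check, the sketch below recomputes the prediction for row 134 by hand from the coefficients printed above. Because those coefficients are rounded to four decimals, the result agrees with the `y_tm` column only approximately. The function `predict_sales` is a hypothetical helper for this illustration.

```python
# Rounded coefficients printed by the fitted model: TV, Radio, Newspaper
coefficients = [0.0524, 0.1127, -0.0005]
intercept = 4.9046

def predict_sales(features):
    """Intercept plus the weighted sum of [TV, Radio, Newspaper]."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

row_134 = [36.9, 38.6, 65.6]         # TV, Radio, Newspaper for row 134
pred = predict_sales(row_134)
print(round(pred, 2))  # prints: 11.16, close to the y_tm value of 11.157619
```

The near-zero Newspaper coefficient means that feature barely changes the prediction, while each extra unit of Radio spend adds roughly twice as much as a unit of TV spend.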


In [33]:
# Computing R² (We save the test score for later)
print('Training Set: R\u00b2 = %.4f' %(r2_score(data_train2.Sales, data_train2.y_tm)))
r2_tm2 = r2_score(data_test2.Sales, data_test2.y_tm)
print('Testing Set:  R\u00b2 = %.4f' %(r2_tm2))

Training Set: R² = 0.9186
Testing Set:  R² = 0.8702


In [34]:
# Computing the RMSE
print('Training Set: RMSE = %.2f' %(root_mean_squared_error(data_train2.Sales, data_train2.y_tm)))
print('Testing Set:  RMSE = %.2f' %(root_mean_squared_error(data_test2.Sales,  data_test2.y_tm)))

Training Set: RMSE = 1.43
Testing Set:  RMSE = 2.09


### Multiple Linear Regression using Neural Networks

In [35]:
data_train2.head()

| | TV | Radio | Newspaper | Sales | y_tm |
|---|---|---|---|---|---|
| 134 | 36.9 | 38.6 | 65.6 | 10.8 | 11.157619 |
| 62 | 239.3 | 15.5 | 27.3 | 20.7 | 19.18269 |
| 20 | 218.4 | 27.7 | 53.4 | 18.0 | 19.449405 |
| 21 | 237.4 | 5.1 | 23.5 | 17.5 | 17.912654 |
| 45 | 175.1 | 22.5 | 31.5 | 16.1 | 16.604237 |


In [36]:
# Standardize the input variables
# Fit the scaler on the training features only, then reuse its statistics for the test features
data_train2[['TVs','RadioS','NewspaperS']] = scaler.fit_transform(data_train2[['TV','Radio','Newspaper']])
data_test2[['TVs','RadioS','NewspaperS']] = scaler.transform(data_test2[['TV','Radio','Newspaper']])
data_train2.head()

| | TV | Radio | Newspaper | Sales | y_tm | TVs | RadioS | NewspaperS |
|---|---|---|---|---|---|---|---|---|
| 134 | 36.9 | 38.6 | 65.6 | 10.8 | 11.157619 | -1.345216 | 1.10854 | 1.515489 |
| 62 | 239.3 | 15.5 | 27.3 | 20.7 | 19.18269 | 1.07335 | -0.456492 | -0.187158 |
| 20 | 218.4 | 27.7 | 53.4 | 18.0 | 19.449405 | 0.823606 | 0.370062 | 0.973131 |
| 21 | 237.4 | 5.1 | 23.5 | 17.5 | 17.912654 | 1.050646 | -1.161094 | -0.356089 |
| 45 | 175.1 | 22.5 | 31.5 | 16.1 | 16.604237 | 0.306196 | 0.01776 | -0.000445 |


In [37]:
# Defining the model
model_nn2 = Sequential([
    Input(shape=[3]),               # Input layer: explicitly define the input shape
    Dense(256, activation='relu'),  # Hidden layer with 256 neurons and ReLU activation
    Dense(64, activation='relu'),   # Hidden layer with 64 neurons and ReLU activation
    Dense(16, activation='relu'),   # Hidden layer with 16 neurons and ReLU activation
    Dense(1)                        # Output layer
])

# Model summary
model_nn2.summary()

In [38]:
# Compile the model
model_nn2.compile(optimizer='rmsprop',
                  loss='mse',
                  metrics=['mae'])

In [39]:
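# Fit the model on the standardized features created above
features_s = ['TVs', 'RadioS', 'NewspaperS']
history_nn2 = model_nn2.fit(data_train2[features_s], y_train2, 
                            batch_size=10,
                            epochs=30, 
                            validation_data=(data_test2[features_s], y_test2),  
                            verbose=0)

In [40]:
plot_history_loss(history_nn2)

In [41]:
plot_history_mae(history_nn2)

In [42]:
# Getting predictions (from the same standardized features used for training)
data_train2['y_nn'] = model_nn2.predict(data_train2[['TVs', 'RadioS', 'NewspaperS']])
data_test2['y_nn']  = model_nn2.predict(data_test2[['TVs', 'RadioS', 'NewspaperS']])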
data_train2.head()

5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step


| | TV | Radio | Newspaper | Sales | y_tm | TVs | RadioS | NewspaperS | y_nn |
|---|---|---|---|---|---|---|---|---|---|
| 134 | 36.9 | 38.6 | 65.6 | 10.8 | 11.157619 | -1.345216 | 1.10854 | 1.515489 | 10.469641 |
| 62 | 239.3 | 15.5 | 27.3 | 20.7 | 19.18269 | 1.07335 | -0.456492 | -0.187158 | 17.220997 |
| 20 | 218.4 | 27.7 | 53.4 | 18.0 | 19.449405 | 0.823606 | 0.370062 | 0.973131 | 17.370277 |
| 21 | 237.4 | 5.1 | 23.5 | 17.5 | 17.912654 | 1.050646 | -1.161094 | -0.356089 | 15.688094 |
| 45 | 175.1 | 22.5 | 31.5 | 16.1 | 16.604237 | 0.306196 | 0.01776 | -0.000445 | 14.13816 |


In [43]:
# Computing R² (We save the test score for later)
print('Training Set: R\u00b2 = %.4f' %(r2_score(data_train2.Sales, data_train2.y_nn)))
r2_nn2 = r2_score(data_test2.Sales, data_test2.y_nn)
print('Testing Set:  R\u00b2 = %.4f' %(r2_nn2))

Training Set: R² = 0.7509
Testing Set:  R² = 0.7874


In [44]:
# Computing the RMSE
print('Training Set: RMSE = %.2f' %(root_mean_squared_error(data_train2.Sales, data_train2.y_nn)))
print('Testing Set:  RMSE = %.2f' %(root_mean_squared_error(data_test2.Sales,  data_test2.y_nn)))

Training Set: RMSE = 2.50
Testing Set:  RMSE = 2.68


Key points:

- Regression is done using different loss functions than we used for classification. Mean squared error (`mse`) is a widely used loss function for regression.

- Similarly, evaluation metrics used for regression differ from those used for classification. A standard regression metric is the mean absolute error (`mae`).

- When features in the input data have values in different ranges, they should be scaled as a preprocessing step.

- When little training data is available, a small network with few hidden layers is recommended to avoid severe overfitting.

## Conclusions

Key Takeaways:
- Both traditional regression and neural network models can predict continuous outcomes; on this small dataset, the linear models achieved the higher test $R^2$ in both the simple and the multiple regression settings.
- Neural networks require careful tuning of architecture and parameters to perform well on regression tasks.
- Standardization of input features is crucial for neural network performance.
- Performance metrics such as $R^2$ and RMSE are valuable for evaluating the accuracy and reliability of regression models.

## References

- Chollet, F. (2021) *Deep Learning with Python*, Second Edition, Manning Publications Co, chap 3