### Multiple Linear Regressions

In the previous subject, we learned about simple linear regressions which use the equation of a line to model the relationship between a single feature and the target variable. We will now see how to generalize this idea to multiple features.

At the end of this unit, you should be able to fit linear regressions to datasets with multiple features using the lstsq() function from the SciPy library.

Note that in this subject we will be working only with training data. Therefore, all the subsequent predictions and model evaluations will be in-sample, i.e. on the training data. The only exception is the bike sharing exercise and solution at the end of this subject where we use both train and test data.

#### Multiple linear regressions

First, let’s start by reviewing some of the mathematical notations from the previous units. We will use the marketing campaign dataset for illustration. You can find the dataset file in a zipped folder under the resources tab.



In [2]:
import pandas as pd

# Load data
data_df = pd.read_csv("c3_marketing-campaign.csv")
print("data_df shape:", data_df.shape)
data_df.head()

data_df shape: (50, 4)


|   | tv | web | radio | sales |
|---|---|---|---|---|
| 0 | 0.916 | 1.689 | 0.208 | 1.204 |
| 1 | 9.359 | 1.706 | 1.071 | 4.8 |
| 2 | 5.261 | 2.538 | 2.438 | 3.97 |
| 3 | 8.682 | 2.092 | 1.283 | 5.212 |
| 4 | 11.736 | 1.66 | 1.8 | 5.993 |


In our example, the X matrix corresponds to the data_df DataFrame without the sales column. Hence, its shape is (50,3).

So far, we fitted models using a single feature, e.g., the tv budget. Hence, it wasn’t necessary to write the arrow above each data point xi in the equation of the simple linear regression.

However, we now have p values x(i,1), ..., x(i,p) for each data point, and our goal is to model a linear relationship between these p variables and the target variable y. To achieve this, instead of using the equation of a line, we can generalize to multiple dimensions by using the equation of a hyperplane.

In the hyperplane equation, we multiply each feature by a coefficient and add a parameter w0 that corresponds to the intercept term. We can also write the equation using the inner product between the data point xi and the vector of coefficients w. For all data points at once, this gives the matrix form:

y_pred = Xw + w0
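For a single data point, the hyperplane equation can be written out as follows. The symbols are assumptions based on the usual notation rather than definitions from this unit: x_{i,j} is the value of feature j for data point i, and \hat{y}_i is the corresponding prediction.

```latex
\hat{y}_i = w_0 + w_1 x_{i,1} + w_2 x_{i,2} + \dots + w_p x_{i,p}
          = w_0 + \vec{x}_i^{\top} \vec{w}
```

Stacking the n data points as the rows of X gives the matrix form y_pred = Xw + w0.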

#### Implementation with SciPy

The first step is to create the input matrix X and the output vector y.

In [3]:
# Extract input matrix X
X = data_df.drop("sales", axis=1).values
print("X:", X.shape)
# Extract target vector y
y = data_df.sales.values
print("y:", y.shape)

X: (50, 3)
y: (50,)


We now want to find the vector of coefficients w that minimizes an objective function. In this unit, we will use the lstsq() function from SciPy, which computes the least squares solution of the equation Ax=b. In our case, A, b and x correspond respectively to X, y and w, and the least squares solution is the vector w that minimizes the sum of squared differences between the two sides of the equation.

Xw=y
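As a sketch, the least squares objective can be written as follows. The notation is assumed rather than taken from this unit: n is the number of data points and \vec{x}_i is the i-th row of X.

```latex
\hat{w} = \arg\min_{\vec{w}} \lVert X\vec{w} - \vec{y} \rVert_2^2
        = \arg\min_{\vec{w}} \sum_{i=1}^{n} \left( \vec{x}_i^{\top} \vec{w} - y_i \right)^2
```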

In other words, the lstsq() function will return the parameter values that minimize the sum of the squared differences between the predictions of our linear regression model (without the intercept term) and the observed target values.

In [4]:
from scipy.linalg import lstsq

# Fit a multiple linear regression
w, rss, _, _ = lstsq(X, y)
print("w:", w)
print("RSS:", rss)

w: [0.3958359  0.47521518 0.31040001]
RSS: 1.6884039033000027


The function returns four values. For now, we will only look at the first two. The first one, w, is the vector of coefficients (one for each feature) and the second one, rss, is the residual sum of squares.

For reference, the RSS score of our simple linear regression model was around 15.7. Hence, we gained a lot in accuracy by bringing the two other marketing budgets into the equation.
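To make the four return values more concrete, here is a minimal sketch on synthetic data. The arrays X_demo, y_demo and w_true are made up for illustration and are not the marketing data. According to the SciPy documentation, the last two return values are the effective rank of the input matrix and its singular values.

```python
import numpy as np
from scipy.linalg import lstsq

# Synthetic data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y_demo = X_demo @ w_true + rng.normal(scale=0.1, size=20)

# lstsq() returns the solution, the residues, the rank and the singular values
w_hat, residues, rank, sing_vals = lstsq(X_demo, y_demo)

# The residues value should match the RSS computed by hand
rss_manual = np.sum((y_demo - X_demo @ w_hat) ** 2)

print("w_hat:", w_hat)
print("residues:", residues)
print("manual RSS:", rss_manual)
print("rank:", rank)
print("singular values:", sing_vals.shape)
```

When the matrix has full column rank and more rows than columns, as it does here, the residues value is the RSS of the fitted model.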

#### Adding the intercept term
The code above computes the optimal solution without the intercept term w0. However, it’s possible to make the lstsq() function compute it using a little trick. The idea is to add a column of ones to the matrix X. This column corresponds to the w0 element in w.

In [5]:
import numpy as np

# Add a column of ones
X1 = np.c_[np.ones(X.shape[0]),  # Vector of ones of shape (n,)
           X]                    # X matrix of shape (n,p)

X1[:5, :]

array([[ 1.   ,  0.916,  1.689,  0.208],
       [ 1.   ,  9.359,  1.706,  1.071],
       [ 1.   ,  5.261,  2.538,  2.438],
       [ 1.   ,  8.682,  2.092,  1.283],
       [ 1.   , 11.736,  1.66 ,  1.8  ]])

In [6]:
w, rss, _, _ = lstsq(X1, y)

print("w:", w)
print("RSS:", rss)

w: [0.02487092 0.39465146 0.47037002 0.30669954]
RSS: 1.685450868082472


The array w now has four elements. The first one, w[0], corresponds to the intercept term and the other three, w[1:], to the coefficients. Note that the intercept is close to zero. Hence, the RSS score didn’t change significantly.
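The intercept matters more when the target doesn't pass near the origin. The sketch below uses synthetic, noise-free data with a known intercept of 4.0 (all names and values are made up for illustration) and compares the fits with and without the column of ones.

```python
import numpy as np
from scipy.linalg import lstsq

# Synthetic data with a known intercept (hypothetical values)
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 10, size=(30, 2))
y_demo = 4.0 + X_demo @ np.array([1.5, -0.7])  # No noise

# Fit without the intercept term
w_no, rss_no, _, _ = lstsq(X_demo, y_demo)

# Fit with the intercept term using the column of ones
X1_demo = np.c_[np.ones(X_demo.shape[0]), X_demo]
w_with, _, _, _ = lstsq(X1_demo, y_demo)
rss_with = np.sum((y_demo - X1_demo @ w_with) ** 2)

print("without intercept:", w_no, "RSS:", rss_no)
print("with intercept:", w_with, "RSS:", rss_with)
```

In this example, the fit with the column of ones should recover the intercept and both coefficients, while the fit without it leaves a clearly larger RSS.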

We can now use this vector w to compute predictions.

In [7]:
# Compute predictions
y_pred = np.matmul(X1, w)
print("y_pred:", y_pred.shape)

y_pred: (50,)


This code computes the predictions for the data points in X, and we can verify that we get the same RSS score as lstsq().

In [8]:
# Verify RSS score
def RSS(y, y_pred):
    return np.sum(np.square(np.subtract(y, y_pred)))


rss = RSS(y, y_pred)
print("RSS:", rss)

RSS: 1.6854508680824711
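The steps from this unit can be gathered into a few small functions. This is only a sketch: fit_linear_regression(), predict() and residual_sum_of_squares() are hypothetical helper names, and the data is a tiny noise-free example rather than the marketing dataset.

```python
import numpy as np
from scipy.linalg import lstsq


def fit_linear_regression(X, y):
    """Return [w0, w1, ..., wp] fitted by least squares (hypothetical helper)."""
    X1 = np.c_[np.ones(X.shape[0]), X]  # Add a column of ones for w0
    w, _, _, _ = lstsq(X1, y)
    return w


def predict(X, w):
    """Compute w0 + Xw, the matrix form of the hyperplane equation."""
    return w[0] + X @ w[1:]


def residual_sum_of_squares(y, y_pred):
    return np.sum(np.square(y - y_pred))


# Tiny noise-free example: y = 2 + 3*x1 - 1*x2
X_demo = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [4.0, 3.0], [5.0, 2.5]])
y_demo = 2.0 + 3.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]

w_demo = fit_linear_regression(X_demo, y_demo)
y_hat = predict(X_demo, w_demo)

print("w:", w_demo)
print("RSS:", residual_sum_of_squares(y_demo, y_hat))
```

Keeping w0 as the first element of w means that predict() can take the original X matrix, without the column of ones.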
