**Recap of the Dataset:**
* **Dataset:** Advertising and Sales
* **Download Link (if needed again):** [https://www.statlearning.com/s/Advertising.csv](https://www.statlearning.com/s/Advertising.csv)
* **Features:** TV, Radio, Newspaper advertising budgets (in thousands of dollars).
* **Target Variable:** Sales (in thousands of units).

---
**Part 2: Multiple Linear Regression (MLR)**

As you outlined in your initial notes, **Multiple Linear Regression (MLR)** extends Simple Linear Regression by allowing us to use *two or more* independent variables (features) to predict a single dependent (target) variable.

* **Goal:** To model the linear relationship between a set of features ($X_1, X_2, ..., X_p$) and a continuous target variable ($y$).
* **The Model Equation:**
    $$\hat{y} = b_0 + b_1 X_1 + b_2 X_2 + ... + b_p X_p$$
    Where:
    * $\hat{y}$ is the predicted value.
    * $X_1, X_2, ..., X_p$ are the values of the $p$ independent features.
    * $b_0$ is the **intercept**: the predicted value of $y$ when all features ($X_1, ..., X_p$) are zero.
    * $b_1, b_2, ..., b_p$ are the **coefficients** for each feature.
* **Crucial Interpretation of Coefficients ($b_j$):** The coefficient $b_j$ represents the estimated change in the dependent variable $y$ for a **one-unit increase** in the feature $X_j$, **assuming all other features ($X_k$ where $k \neq j$) are held constant.** This "holding all else constant" (ceteris paribus) condition is fundamental to understanding MLR.
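Because the model is linear in each feature, the "holding all else constant" reading can be checked numerically: raise one feature by one unit, leave the others untouched, and the prediction moves by exactly that feature's coefficient. A minimal sketch on synthetic data (the generating coefficients 3.0, 0.05, 0.2 and 0.0 are made up for illustration, not taken from the Advertising data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with known coefficients (illustrative values only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))          # three hypothetical features
true_b0, true_b = 3.0, np.array([0.05, 0.2, 0.0])
y = true_b0 + X @ true_b + rng.normal(0, 1.0, size=200)

model = LinearRegression().fit(X, y)

# Take one observation and raise only feature j by one unit
j = 1
x0 = X[:1].copy()
x1 = x0.copy()
x1[0, j] += 1.0

# Change in prediction when X_j rises by 1 and the other features stay fixed
delta = model.predict(x1)[0] - model.predict(x0)[0]
print(f"Change in prediction: {delta:.6f}")
print(f"Fitted coefficient b_{j}: {model.coef_[j]:.6f}")
```

The two printed numbers agree, whichever starting observation is used, because the other terms of the equation cancel in the difference.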

**Why Visualization is Harder:**
In SLR, we could easily plot the data and the regression line in 2D. With two features ($X_1, X_2$) and one target ($y$), we'd be fitting a 2D plane in a 3D space. With more than two features, we're fitting a hyperplane in a higher-dimensional space, which we can't directly visualize. So, we rely more on numerical summaries and coefficient interpretation.
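Even when the fitted hyperplane itself cannot be drawn, two 2D plots work for any number of features: predicted vs. actual values, and residuals vs. fitted values. A sketch on synthetic three-feature data (the data and coefficients are illustrative assumptions):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with three features (illustrative only)
rng = np.random.default_rng(1)
X = rng.uniform(0, 50, size=(150, 3))
y = 2.0 + X @ np.array([0.1, 0.3, 0.0]) + rng.normal(0, 1.0, size=150)

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: predicted vs. actual; points near the dashed 45-degree line indicate a good fit
ax1.scatter(y, y_hat, alpha=0.6)
lims = [min(y.min(), y_hat.min()), max(y.max(), y_hat.max())]
ax1.plot(lims, lims, "r--")
ax1.set_xlabel("Actual y")
ax1.set_ylabel("Predicted y")

# Right: residuals vs. fitted; look for curvature or a funnel shape
ax2.scatter(y_hat, y - y_hat, alpha=0.6)
ax2.axhline(0, color="r", linestyle="--")
ax2.set_xlabel("Fitted y")
ax2.set_ylabel("Residual (actual - predicted)")

fig.tight_layout()
```

In a notebook the figure is displayed at the end of the cell; in a script, call `plt.show()`.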

---
**MLR Example: Using TV, Radio, and Newspaper Budgets to Predict Sales**

Let's use all three advertising channels to predict sales.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt # Still useful for some plots like residuals
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

In [3]:
# 1. Load the Data (assumes 'Advertising.csv' is in the working directory)
try:
    df_adv = pd.read_csv('Advertising.csv', index_col=0)
except FileNotFoundError:
    print("Error: 'Advertising.csv' not found. Please download it and place it in the working directory.")
    # Re-raise so execution stops here; exit() would try to shut down the notebook kernel
    raise

print("Advertising Dataset Sample:")
print(df_adv.head())

Advertising Dataset Sample:
      TV  radio  newspaper  sales
1  230.1   37.8       69.2   22.1
2   44.5   39.3       45.1   10.4
3   17.2   45.9       69.3    9.3
4  151.5   41.3       58.5   18.5
5  180.8   10.8       58.4   12.9
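If the CSV is not in the working directory, `pd.read_csv` can also read it straight from the download link listed in the recap, since it accepts http(s) URLs. A small helper sketch; the name `load_advertising` is hypothetical, and the fallback assumes the link still serves the same file:

```python
import pandas as pd

# Download link from the dataset recap
DATA_URL = "https://www.statlearning.com/s/Advertising.csv"


def load_advertising(path="Advertising.csv", url=DATA_URL):
    """Hypothetical helper: read the local copy if present, otherwise the URL."""
    try:
        return pd.read_csv(path, index_col=0)
    except FileNotFoundError:
        # pandas can read a CSV directly from an http(s) URL (requires network access)
        return pd.read_csv(url, index_col=0)
```

Usage: `df_adv = load_advertising()`.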


In [5]:
# 2. Prepare the Data for Multiple Linear Regression
# Features: 'TV', 'radio', 'newspaper' (column names in this CSV are case-sensitive)
X_mlr = df_adv[['TV', 'radio', 'newspaper']]
# Target: 'sales'
y_mlr = df_adv['sales']

print("\nFeatures (X):")
print(X_mlr.head())
print("\nTarget (y):")
print(y_mlr.head())


Features (X):
      TV  radio  newspaper
1  230.1   37.8       69.2
2   44.5   39.3       45.1
3   17.2   45.9       69.3
4  151.5   41.3       58.5
5  180.8   10.8       58.4

Target (y):
1    22.1
2    10.4
3     9.3
4    18.5
5    12.9
Name: sales, dtype: float64
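scikit-learn expects the features as a 2D table of shape (n_samples, n_features) and the target as a 1D array. With pandas, a *list* of column names (double brackets) returns a DataFrame, while a single name returns a Series. Column names are also case-sensitive, which matters here because the CSV uses `radio` and `newspaper`. A sketch using the first three rows shown above as a stand-in for `df_adv`:

```python
import pandas as pd

# Tiny stand-in for df_adv, built from the first three rows printed above
df_demo = pd.DataFrame(
    {"TV": [230.1, 44.5, 17.2], "radio": [37.8, 39.3, 45.9],
     "newspaper": [69.2, 45.1, 69.3], "sales": [22.1, 10.4, 9.3]},
    index=[1, 2, 3],
)

X_demo = df_demo[["TV", "radio", "newspaper"]]  # list of names -> 2D DataFrame
y_demo = df_demo["sales"]                       # single name  -> 1D Series
X_one = df_demo[["TV"]]                         # still 2D, even with one column

print(type(X_demo).__name__, X_demo.shape)  # DataFrame (3, 3)
print(type(y_demo).__name__, y_demo.shape)  # Series (3,)
print(type(X_one).__name__, X_one.shape)    # DataFrame (3, 1)

# Case matters: this CSV uses 'radio', so df_demo['Radio'] would raise KeyError
print("Radio" in df_demo.columns)  # False
```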


In [6]:
# 3. Split the Data into Training and Test Sets
X_train_mlr, X_test_mlr, y_train_mlr, y_test_mlr = train_test_split(X_mlr, y_mlr, test_size=0.2, random_state=42)

print(f"\nTraining set size: {X_train_mlr.shape[0]} samples, {X_train_mlr.shape[1]} features")
print(f"Test set size: {X_test_mlr.shape[0]} samples, {X_test_mlr.shape[1]} features")


Training set size: 160 samples, 3 features
Test set size: 40 samples, 3 features
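`test_size=0.2` holds out 20% of the rows (40 of 200 here), and fixing `random_state` makes the shuffle reproducible, so rerunning the notebook gives the same split and therefore the same scores. A quick check on dummy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)  # 100 dummy rows, 2 columns
y = np.arange(100)

# Same random_state -> identical split; test_size=0.2 -> 20% of rows held out
a = train_test_split(X, y, test_size=0.2, random_state=42)
b = train_test_split(X, y, test_size=0.2, random_state=42)

X_tr, X_te, y_tr, y_te = a
print(X_tr.shape, X_te.shape)      # (80, 2) (20, 2)
print(np.array_equal(a[3], b[3]))  # True
```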


In [7]:
# 4. Create and Train the Multiple Linear Regression Model
mlr_model = LinearRegression()
mlr_model.fit(X_train_mlr, y_train_mlr)

In [8]:
# 5. Get the Intercept (b0) and Coefficients (b1, b2, b3)
b0_mlr = mlr_model.intercept_
coefficients_mlr = mlr_model.coef_

print(f"\nModel Intercept (b0): {b0_mlr:.4f}")
print("Model Coefficients (b1 for TV, b2 for Radio, b3 for Newspaper):")
for feature, coef in zip(X_mlr.columns, coefficients_mlr):
    print(f"  Coefficient for {feature}: {coef:.4f}")


Model Intercept (b0): 2.9791
Model Coefficients (b1 for TV, b2 for Radio, b3 for Newspaper):
  Coefficient for TV: 0.0447
  Coefficient for radio: 0.1892
  Coefficient for newspaper: 0.0028
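`LinearRegression` fits ordinary least squares: it chooses $b_0, ..., b_p$ to minimize the sum of squared residuals $\sum_i (y_i - \hat{y}_i)^2$ on the training set. As a sanity check, the same estimates can be obtained by solving the least-squares problem directly with NumPy. A sketch on synthetic data (the generating coefficients are made up to resemble the output above, not the real training set):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for X_train_mlr / y_train_mlr
rng = np.random.default_rng(42)
X = rng.uniform(0, 100, size=(160, 3))
y = 3.0 + X @ np.array([0.045, 0.19, 0.003]) + rng.normal(0, 1.5, size=160)

# scikit-learn's fit
sk = LinearRegression().fit(X, y)

# Same least-squares problem: prepend a column of ones so beta[0] is the intercept b0
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print("sklearn:", sk.intercept_, sk.coef_)
print("lstsq  :", beta[0], beta[1:])
```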


In [11]:
# 6. Interpretation of Coefficients:
print("\nInterpreting the coefficients:")
print(f"- Intercept ({b0_mlr:.2f}): If $0 were spent on TV, Radio, and Newspaper advertising,")
print(f"  the model predicts sales of approximately {b0_mlr:.2f} thousand units.")
print("  (Often, the intercept isn't very interpretable if X=0 is far outside the range of data).\n")

for feature, coef in zip(X_mlr.columns, coefficients_mlr):
    print(f"- {feature} ({coef:.4f}): For each additional $1000 spent on {feature} advertising,")
    print(f"  sales are predicted to {'increase' if coef > 0 else 'decrease'} by approximately {abs(coef):.4f} thousand units,")
    print("  assuming spending on the other two advertising channels is held constant.\n")


Interpreting the coefficients:
- Intercept (2.98): If $0 were spent on TV, Radio, and Newspaper advertising,
  the model predicts sales of approximately 2.98 thousand units.
  (Often, the intercept isn't very interpretable if X=0 is far outside the range of data).

- TV (0.0447): For each additional $1000 spent on TV advertising,
  sales are predicted to increase by approximately 0.0447 thousand units,
  assuming spending on the other two advertising channels is held constant.

- radio (0.1892): For each additional $1000 spent on radio advertising,
  sales are predicted to increase by approximately 0.1892 thousand units,
  assuming spending on the other two advertising channels is held constant.

- newspaper (0.0028): For each additional $1000 spent on newspaper advertising,
  sales are predicted to increase by approximately 0.0028 thousand units,
  assuming spending on the other two advertising channels is held constant.
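Because sales are in thousands of units and budgets in thousands of dollars, a coefficient of 0.0447 means about 45 extra units sold per additional thousand dollars of TV spend. The same coefficients support simple what-if arithmetic. A worked example with the rounded coefficients printed above and a hypothetical budget (TV = 100, radio = 20, newspaper = 10, in thousands of dollars):

```python
# Coefficients as printed above (rounded to 4 decimals)
b0 = 2.9791
b = {"TV": 0.0447, "radio": 0.1892, "newspaper": 0.0028}

# Hypothetical budget, in thousands of dollars
budget = {"TV": 100.0, "radio": 20.0, "newspaper": 10.0}

pred = b0 + sum(b[k] * budget[k] for k in b)
print(f"Predicted sales: {pred:.4f} thousand units")  # 11.2611

# Add 10 (i.e. ten thousand dollars) to radio only; TV and newspaper stay fixed
bumped = dict(budget, radio=budget["radio"] + 10.0)
pred_bumped = b0 + sum(b[k] * bumped[k] for k in b)
print(f"Change: {pred_bumped - pred:.4f} thousand units")  # 1.8920 = 10 * 0.1892
```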



In [12]:
# 7. Make Predictions on the Test Set
y_pred_mlr = mlr_model.predict(X_test_mlr)

# Compare some actual vs. predicted values
df_predictions = pd.DataFrame({'Actual Sales': y_test_mlr, 'Predicted Sales (MLR)': y_pred_mlr})
print("\nSample of Actual vs. Predicted Sales (MLR):")
print(df_predictions.head())


Sample of Actual vs. Predicted Sales (MLR):
     Actual Sales  Predicted Sales (MLR)
96           16.9              16.408024
16           22.4              20.889882
31           21.4              21.553843
159           7.3              10.608503
129          24.7              22.112373
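The gap between actual and predicted values is the residual (actual minus predicted). Computing it for the five rows shown above makes the largest miss easy to spot: row 159, where the model over-predicts by about 3.3 thousand units. The values below are copied from the printed output:

```python
import pandas as pd

# The five rows displayed above (index = original row label)
df_demo = pd.DataFrame(
    {"Actual Sales": [16.9, 22.4, 21.4, 7.3, 24.7],
     "Predicted Sales (MLR)": [16.408024, 20.889882, 21.553843, 10.608503, 22.112373]},
    index=[96, 16, 31, 159, 129],
)

# Residual = actual - predicted; a negative value means the model over-predicted
df_demo["Residual"] = df_demo["Actual Sales"] - df_demo["Predicted Sales (MLR)"]
print(df_demo.round(3))
```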


In [13]:
# 8. Evaluate the Model
mse_mlr = mean_squared_error(y_test_mlr, y_pred_mlr)
r2_mlr = r2_score(y_test_mlr, y_pred_mlr)

print(f"\nModel Performance on Test Data (MLR):")
print(f"Mean Squared Error (MSE): {mse_mlr:.4f}")
print(f"R-squared (R²): {r2_mlr:.4f}")


Model Performance on Test Data (MLR):
Mean Squared Error (MSE): 3.1741
R-squared (R²): 0.8994
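Both metrics are simple functions of the residuals: $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ and $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$. Because MSE is in squared units, its square root (RMSE) is often easier to read: $\sqrt{3.1741} \approx 1.78$, so test predictions are off by roughly 1.8 thousand units in a root-mean-square sense. The sketch below recomputes both metrics by hand on the five rows shown earlier and checks them against scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Five actual/predicted pairs copied from the sample output above
y_true = np.array([16.9, 22.4, 21.4, 7.3, 24.7])
y_pred = np.array([16.408024, 20.889882, 21.553843, 10.608503, 22.112373])

resid = y_true - y_pred
mse = np.mean(resid ** 2)                                           # mean squared residual
r2 = 1 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # 1 - SS_res / SS_tot

print(mse, mean_squared_error(y_true, y_pred))
print(r2, r2_score(y_true, y_pred))

# RMSE for the full test set, from the MSE reported above
print(f"RMSE: {np.sqrt(3.1741):.4f} thousand units")  # 1.7816
```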


In [14]:
# Compare with the test R-squared of the earlier SLR model (TV only), if `r2_tv` exists in this session.
# The comparison is only fair if the SLR model was evaluated on the same train/test split.
print(f"The R-squared for MLR ({r2_mlr:.4f}) suggests that this model, using TV, Radio, and Newspaper,")
print(f"explains approximately {r2_mlr*100:.2f}% of the variance in Sales.")
if 'r2_tv' in globals():  # At notebook top level, locals() and globals() are the same namespace
    print(f"Comparing to the R-squared from SLR with only TV ({r2_tv:.4f}):")
    if r2_mlr > r2_tv:
        print("The MLR model with all three features explains more variance than the SLR model with only TV.")
    else:
        print("The MLR model does not explain more test-set variance than the SLR model with only TV.")
else:
    print("To compare R-squared with the previous SLR model, ensure the SLR example was run in the same session.")

The R-squared for MLR (0.8994) suggests that this model, using TV, Radio, and Newspaper,
explains approximately 89.94% of the variance in Sales.
To compare R-squared with the previous SLR model, ensure the SLR example was run in the same session.
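A fair comparison needs both models evaluated on the same test rows. Rather than relying on `r2_tv` from an earlier run, one option is to fit the TV-only model on the same split in the same step. A sketch; the function name `compare_slr_mlr` is hypothetical, and it assumes `df_adv` has the column names shown above:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_slr_mlr(df, features=("TV", "radio", "newspaper"), target="sales",
                    slr_feature="TV", test_size=0.2, random_state=42):
    """Hypothetical helper: fit a one-feature and an all-feature model on the
    same train/test split and return the test R-squared of each."""
    X = df[list(features)]
    y = df[target]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state)

    slr = LinearRegression().fit(X_tr[[slr_feature]], y_tr)  # TV only
    mlr = LinearRegression().fit(X_tr, y_tr)                 # all features

    return {
        "slr_r2": r2_score(y_te, slr.predict(X_te[[slr_feature]])),
        "mlr_r2": r2_score(y_te, mlr.predict(X_te)),
    }

# Usage (assumes df_adv from the cells above):
# print(compare_slr_mlr(df_adv))
```

With the same `test_size` and `random_state` as cell 6, the held-out rows match those used for `r2_mlr`.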


---
**Key Points from the MLR Example:**

1.  **Feature Selection:** We now include `['TV', 'radio', 'newspaper']` in our `X_mlr`.
2.  **Coefficients:**
    * The `mlr_model.coef_` will now be an array with three values, one for each feature.
    * Notice how the coefficient for 'TV' in the MLR model might be different from its coefficient in the SLR model. This is because the MLR coefficient for TV is calculated *while accounting for the effects of Radio and Newspaper*.
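    * In this run the 'newspaper' coefficient is only 0.0028, close to zero, and with this dataset it can even come out slightly negative. One reason is that newspaper spending is correlated with radio spending, so once Radio is in the model, Newspaper adds little information of its own (multicollinearity); it is also possible that newspaper ads genuinely have a negligible effect once TV and Radio are accounted for. This is a common finding with this dataset.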
3.  **Interpretation is Key:**
    * A positive coefficient for 'radio' (0.1892 in the output above) means that for every $1000 increase in radio ad spend, sales are predicted to increase by about 0.189 thousand units (about 189 units), *assuming TV and newspaper spending remain unchanged*.
4.  **Model Performance ($R^2$):**
    * On the training data, adding a feature can never lower $R^2$ for a least-squares fit, since the model could always set the new coefficient to zero. On a held-out test set there is no such guarantee, but features that carry real information (here, mainly Radio) usually raise it too. With this dataset you should see a substantial jump in test $R^2$ (to around 0.90; this run gives 0.8994) compared to using TV alone.
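A small simulation makes points 2 and 4 concrete. The data are synthetic, and the setup (newspaper spend partly tracking radio spend, with no newspaper effect of its own) is an assumption chosen to mimic the pattern described above, not a claim about the real dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
radio = rng.uniform(0, 50, n)
# Assumption: newspaper spend is correlated with radio spend but has no effect on sales
newspaper = 0.8 * radio + rng.uniform(0, 40, n)
sales = 5.0 + 0.2 * radio + rng.normal(0, 1.0, n)

# Simple regression on newspaper alone picks up radio's effect indirectly
slr = LinearRegression().fit(newspaper.reshape(-1, 1), sales)

# Multiple regression: with radio held constant, newspaper's coefficient is near zero
X_both = np.column_stack([radio, newspaper])
mlr = LinearRegression().fit(X_both, sales)

print(f"Newspaper coefficient, SLR: {slr.coef_[0]:.4f}")
print(f"Newspaper coefficient, MLR: {mlr.coef_[1]:.4f}")

# Training R-squared cannot decrease when a column is added to a least-squares fit
X_radio = radio.reshape(-1, 1)
r2_radio = LinearRegression().fit(X_radio, sales).score(X_radio, sales)
r2_both = mlr.score(X_both, sales)
print(f"Training R², radio only: {r2_radio:.4f}; radio + newspaper: {r2_both:.4f}")
```

The SLR slope for newspaper is clearly positive only because newspaper moves with radio; the MLR coefficient, which holds radio fixed, is close to zero.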

This example shows how we extend the linear regression framework to multiple predictors and the critical importance of how we interpret the resulting coefficients.