# Multiple Linear Regression — Lab Practice (Automobile Battery Sales)


**Goal:** Build and interpret a multiple linear regression (MLR) model using a CSV file.  
**Data file:** `battery_sales_practice.csv` (each row = one region).

**Target (Y):** `BatterySales_week`  
**Predictors (X):** `Vehicles_thousands`, `ColdIndex_days`, `Advertising_k`, `Shops`, `MedianIncome_k`
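Written out, the model fitted in section 4 is the standard MLR equation below. Here $X_1, \dots, X_5$ stand for the five predictors in the order listed above, and $\varepsilon$ is the error term:

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon
$$

Each slope $\beta_j$ is the expected change in `BatterySales_week` for a one-unit increase in $X_j$ when the other four predictors are held fixed.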


## 0) Setup

In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson


## 1) Load the CSV

In [None]:

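# Colab only: mount Google Drive at the standard mount point.
# Outside Colab, skip these two lines and pass a local path to pd.read_csv.
from google.colab import drive
drive.mount('/content/drive')

df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/battery_sales_practice.csv')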
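Before modeling, it can help to confirm that the CSV really contains the six columns the lab uses. A minimal sketch, assuming the column names from the goal statement; `check_columns` and `REQUIRED` are hypothetical helpers, and the demo frame holds made-up values, not the lab data:

```python
import pandas as pd

# Columns this lab expects (names taken from the goal statement at the top)
REQUIRED = ["BatterySales_week", "Vehicles_thousands", "ColdIndex_days",
            "Advertising_k", "Shops", "MedianIncome_k"]

def check_columns(frame: pd.DataFrame, required=REQUIRED) -> list:
    """Hypothetical helper: return the required columns missing from `frame`."""
    return [c for c in required if c not in frame.columns]

# Usage on a tiny hand-made frame (hypothetical values, not the lab data)
demo = pd.DataFrame({"BatterySales_week": [120], "Vehicles_thousands": [40]})
missing = check_columns(demo)
print("Missing columns:", missing)
```

In the notebook, `check_columns(df)` should return an empty list before you move on.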


## 2) Quick data check

In [None]:

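df.info()  # prints dtypes and non-null counts itself; it returns None, so don't wrap it in display()
# df.describe(include="all")  # optional: summary statistics for every column
# Preview the first rows
df.head()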
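`df.info()` shows non-null counts, but an explicit count of missing values is easier to read. This matters because, by default, `sm.OLS` does not drop incomplete rows (`missing='none'`), so missing values should be handled before fitting or `missing='drop'` passed explicitly. A small sketch on a hypothetical four-row frame with one deliberate gap:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-dataset with one missing value, used only to illustrate the check
demo = pd.DataFrame({
    "BatterySales_week": [120.0, 95.0, np.nan, 140.0],
    "Shops": [30, 22, 25, 41],
})

# Number of missing values per column
na_counts = demo.isna().sum()
print(na_counts)

# Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
print(demo.describe())
```

On the lab data the same calls are `df.isna().sum()` and `df.describe()`.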


## 3) Visuals: Y vs each X

In [None]:

target = "BatterySales_week"
predictors = ["Vehicles_thousands", "ColdIndex_days", "Advertising_k", "Shops", "MedianIncome_k"]

for col in predictors:
    plt.figure()
    plt.scatter(df[col], df[target], s=35)
    plt.xlabel(col); plt.ylabel(target)
    plt.title(f"{target} vs {col}")
    plt.show()
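A correlation matrix is a compact numeric companion to these scatter plots. The row for the target summarizes each predictor's linear association with Y, and large values between two predictors are an early warning of the multicollinearity checked with VIF in section 8. A sketch on synthetic data (hypothetical values, with `Advertising_k` deliberately built to track `Vehicles_thousands`):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical): Advertising is constructed to follow Vehicles
rng = np.random.default_rng(0)
n = 50
vehicles = rng.uniform(20, 80, n)
advertising = 0.5 * vehicles + rng.normal(0, 2, n)
sales = 3.0 * vehicles + rng.normal(0, 10, n)
demo = pd.DataFrame({"Vehicles_thousands": vehicles,
                     "Advertising_k": advertising,
                     "BatterySales_week": sales})

# Pearson correlations capture only pairwise *linear* association
corr = demo.corr().round(2)
print(corr)
```

On the lab data: `df[[target] + predictors].corr().round(2)`.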


## 4) Fit the MLR model: coefficients, model fit, and residual tests

In [None]:

X = sm.add_constant(df[predictors])
y = df[target]
mlr = sm.OLS(y, X).fit()
print(mlr.summary())


# Equivalent fit with the formula API (same coefficients; 'Intercept' instead of 'const'):
# formula_str = "BatterySales_week ~ Vehicles_thousands + ColdIndex_days + Advertising_k + Shops + MedianIncome_k"
# result_multi = smf.ols(formula=formula_str, data=df).fit()
# print(result_multi.summary())



# b = mlr.params
# print("Fitted MLR equation:")
# print(f"{target} = "
#       f"{b['const']:.3f} + "
#       f"{b['Vehicles_thousands']:.3f}*Vehicles_thousands + "
#       f"{b['ColdIndex_days']:.3f}*ColdIndex_days + "
#       f"{b['Advertising_k']:.3f}*Advertising_k + "
#       f"{b['Shops']:.3f}*Shops + "
#       f"{b['MedianIncome_k']:.3f}*MedianIncome_k")

# coef_table = pd.DataFrame({
#     "term": mlr.params.index,
#     "estimate": mlr.params.values,
#     "std_error": mlr.bse.values,
#     "t_value": mlr.tvalues.values,
#     "p_value": mlr.pvalues.values
# })
# coef_table


## 6) Compare models

In [None]:

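X_base = sm.add_constant(df[["Vehicles_thousands", "ColdIndex_days"]])
base = sm.OLS(y, X_base).fit()
print(base.summary())

# Compare the 2-predictor base model with the full MLR (lower AIC/BIC is better)
for name, m in [("base", base), ("full", mlr)]:
    print(f"{name}: adj_R2 = {m.rsquared_adj:.3f}, AIC = {m.aic:.1f}, BIC = {m.bic:.1f}")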



## 7) Residual diagnostics

In [None]:

fitted = mlr.fittedvalues
resid = mlr.resid

plt.figure()
plt.scatter(fitted, resid, s=30)
plt.axhline(0, linewidth=1)
plt.xlabel("Fitted values"); plt.ylabel("Residuals")
plt.title("Residuals vs Fitted (MLR)"); plt.show()


## 8) Multicollinearity (VIF)

In [None]:

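# Compute VIFs on the design matrix WITH the constant (column 0) and skip that column:
# without a constant, the auxiliary regressions use an uncentered R² and VIFs can be badly inflated
X_vif = sm.add_constant(df[predictors]).values
vif_vals = [variance_inflation_factor(X_vif, i) for i in range(1, X_vif.shape[1])]
pd.DataFrame({"predictor": predictors, "VIF": np.round(vif_vals, 2)})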


## 9) Predict a new market

In [None]:

new_market = pd.DataFrame([{
    "Vehicles_thousands": 55,
    "ColdIndex_days": 18,
    "Advertising_k": 30,
    "Shops": 35,
    "MedianIncome_k": 52
}])


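# Force a constant column: with a single row every column looks constant,
# so the default has_constant='skip' would not add one
X_new = sm.add_constant(new_market, has_constant='add')

# Reorder/align to the columns the model expects
X_new = X_new[mlr.model.exog_names]

pred = mlr.predict(X_new)
float(pred.iloc[0])  # predict returns a one-element Series; take its value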


## 10) Exercises


1. Remove the predictor with the highest VIF and re-fit. How do AIC, BIC, and R² (plain and adjusted) change?  
2. Add the interaction `Vehicles_thousands * ColdIndex_days`. Does the model improve (adjusted R², AIC/BIC, p-value of the interaction term)?  
3. Write the final model equation and briefly interpret each coefficient in plain English.
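For exercise 2, the formula interface (`smf`, imported in the setup cell) is the most direct way to add an interaction: in a formula, `a * b` expands to `a + b + a:b`, that is, both main effects plus their product. A sketch on synthetic data with hypothetical columns `a` and `b` and a built-in interaction; in the lab you would substitute `Vehicles_thousands * ColdIndex_days` into the formula string from section 4:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical columns a, b) with a genuine a*b interaction
rng = np.random.default_rng(6)
n = 120
d = pd.DataFrame({"a": rng.uniform(0, 10, n), "b": rng.uniform(0, 10, n)})
d["y"] = 1 + 0.5 * d["a"] + 0.3 * d["b"] + 0.2 * d["a"] * d["b"] + rng.normal(size=n)

# "a * b" in a formula means a + b + a:b
main = smf.ols("y ~ a + b", data=d).fit()
inter = smf.ols("y ~ a * b", data=d).fit()

print(inter.params.round(3))
print(f"AIC: main effects only {main.aic:.1f} -> with interaction {inter.aic:.1f}")
```

With an interaction in the model, the slope of `a` depends on the value of `b` (slope $= \beta_a + \beta_{a:b}\, b$), which affects how the coefficients are interpreted in exercise 3.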
