# Advertising Spending and Sales

The Advertising dataset is widely used for introductory regression analysis. It records advertising spending on three media channels (TV, radio, and newspaper) together with the corresponding sales figures. The goal is to predict sales from the ad spend in these channels.

# Setup Workspace

Import packages.

In [1]:
import pandas as pd
import seaborn as sns
from sklearn.metrics import root_mean_squared_error
from sklearn.model_selection import train_test_split
from statsmodels.formula.api import ols

# Load Data

Load data from CSV file.

In [2]:
path = "https://raw.githubusercontent.com/kbrennig/MODS_WS24_25/refs/heads/main/Week_4_lecture/Advertising.csv"
data = pd.read_csv(path)

# Modeling

Make random training/test split (80% training, 20% test) with a fixed seed for reproducibility.

In [3]:
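data_training, data_test = train_test_split(data, test_size=0.2, random_state=42)
# Work on an explicit copy so prediction columns can be added later
# without pandas raising a SettingWithCopyWarning.
data_test = data_test.copy()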
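
To illustrate what `test_size=0.2` and `random_state=42` do, here is a minimal sketch on a hypothetical 10-row toy frame (`toy`, `x`, and `y` are made up for this example and are not part of the Advertising data). It checks the split sizes, that the same seed reproduces the same split, and that no row lands in both sets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, not the Advertising dataset.
toy = pd.DataFrame({"x": range(10), "y": [2 * v for v in range(10)]})

# Split twice with the same seed.
train_a, test_a = train_test_split(toy, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(toy, test_size=0.2, random_state=42)

print(len(train_a), len(test_a))               # sizes of the two parts
print(test_a.index.equals(test_b.index))       # same seed gives the same test rows
print(set(train_a.index) & set(test_a.index))  # rows shared by both parts
```

Changing `random_state` would generally give a different split, so test-set RMSE values are only comparable across models when all of them are evaluated on the same split.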

## TV as Predictor

Fit simple linear regression model to training data with Sales as the response variable and TV as the only predictor.

In [None]:
model_tv = ols(formula="Sales ~ TV", data=data_training).fit()
print(model_tv.summary(slim=True))
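
As a rough illustration of what a formula like `Sales ~ TV` estimates: ordinary least squares with a single predictor has a closed-form intercept and slope. The sketch below computes them with NumPy on made-up toy data (the arrays `x` and `y` are assumptions for this example, not the Advertising data). It is not statsmodels' internal code, only the textbook formula.

```python
import numpy as np

# Hypothetical toy data lying exactly on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Slope: covariation of x and y divided by the variation of x.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: forces the fitted line through the point of means.
intercept = y.mean() - slope * x.mean()

print(intercept, slope)
```

In the summary printed above, the `Intercept` and `TV` rows of the coefficient table play the roles of `intercept` and `slope` here.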

Use fitted model to make predictions on test data.

In [5]:
data_test["predictions_tv"] = model_tv.predict(data_test)

Calculate RMSE on test set.

In [None]:
rmse = root_mean_squared_error(data_test["Sales"], data_test["predictions_tv"])
print(rmse)
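
RMSE is the square root of the mean squared difference between observed and predicted values, so it is expressed in the same units as the response. A minimal sketch of that computation on made-up numbers (`y_true` and `y_pred` are hypothetical, not model output):

```python
import numpy as np

# Hypothetical observed and predicted values.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = y_true - y_pred                     # 1, 0, -2
rmse_manual = np.sqrt(np.mean(errors ** 2))  # sqrt((1 + 0 + 4) / 3)

print(rmse_manual)
```

Because the errors are squared before averaging, a single large miss raises RMSE more than several small ones.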

## Radio as Predictor

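Repeat the same steps with Radio as the only predictor: fit on training data, then predict and calculate RMSE on test set.

In [None]: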
model_radio = ols(formula="Sales ~ Radio", data=data_training).fit()
print(model_radio.summary(slim=True))

In [None]:
data_test["predictions_radio"] = model_radio.predict(data_test)
rmse = root_mean_squared_error(data_test["Sales"], data_test["predictions_radio"])
print(rmse)

## Newspaper as Predictor

Repeat the same steps with Newspaper as the only predictor.

In [None]:
model_newspaper = ols(formula="Sales ~ Newspaper", data=data_training).fit()
print(model_newspaper.summary(slim=True))

In [None]:
data_test["predictions_newspaper"] = model_newspaper.predict(data_test)
rmse = root_mean_squared_error(data_test["Sales"], data_test["predictions_newspaper"])
print(rmse)

## All Three Predictors

Fit multiple linear regression model to training data with TV, Radio, and Newspaper as predictors.

In [None]:
model_all = ols(formula="Sales ~ TV + Radio + Newspaper", data=data_training).fit()
print(model_all.summary(slim=True))

Plot pairwise Pearson correlations between the variables in the training data.

In [None]:
correlations = data_training.corr(method="pearson")
sns.heatmap(correlations, cmap="vlag", vmin=-1, vmax=1, annot=True)

In [None]:
data_test["predictions_all"] = model_all.predict(data_test)
rmse = root_mean_squared_error(data_test["Sales"], data_test["predictions_all"])
print(rmse)