<a href="https://colab.research.google.com/github/WayneGretzky1/CSCI-4521-Applied-Machine-Learning/blob/main/3_3_regression_with_advertising_dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Libraries and Dataset loading

In [None]:
import torch
torch.set_printoptions(sci_mode=False)  #Removes printing in scientific notation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
data_pd = pd.read_csv("https://raw.githubusercontent.com/be-prado/csci4521/refs/heads/main/Advertising.csv")

In [None]:
print(data_pd.head())

   Unnamed: 0     TV  radio  newspaper  sales
0           1  230.1   37.8       69.2   22.1
1           2   44.5   39.3       45.1   10.4
2           3   17.2   45.9       69.3    9.3
3           4  151.5   41.3       58.5   18.5
4           5  180.8   10.8       58.4   12.9


## Fitting data (no normalization)

In [None]:
# TODO: create tensors for the TV budget data and the sales data


In [None]:
# TODO: create a scatter plot of them


Let's implement gradient descent!

In [None]:
# Define learning rate
lr = 0.001 #Larger lr goes faster, but what about too large an lr?

# Initialize parameters (initializing them small usually helps)
# making sure to tell pytorch these are the variables we will be taking the gradient over
m = torch.Tensor([0.1]).float()
m.requires_grad = True
b = torch.Tensor([0.1]).float()
b.requires_grad = True

# Gradient descent update loop
for epoch in range(2000):
  # forward pass
  y_pred = m*x_pt+b
  # compute loss
  loss_tensor = (y_pred - y_pt)**2
  loss = loss_tensor.mean()
  # compute the gradient of our loss function
  loss.backward()
  with torch.no_grad(): # don't need to include this computation in our gradient
    # update our parameters by using the learning rate to define step size
    m -= m.grad*lr
    b -= b.grad*lr
    # reset our gradients
    m.grad.zero_()
    b.grad.zero_()

  if epoch%200==0:
    print("m=",m.item(),"b=",b.item(),"loss=",loss.item())

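Oops, this breaks, even with a small learning rate! (Try it.) The issue is that we are not normalizing our data. With TV budgets in the hundreds, the gradient with respect to `m` is very large, so each update overshoots and the loss grows instead of shrinking. Gradient descent behaves much better on normalized inputs.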
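
The blow-up can be reproduced without the CSV. The sketch below is a minimal illustration on synthetic data; the generated budgets and sales (`x_raw`, `y_raw`) are assumptions standing in for the real columns, chosen only to have similar ranges.

```python
import math
import torch

torch.manual_seed(0)

# Synthetic stand-in for the advertising data (an assumption, not the real CSV):
# budgets between 0 and 300, sales roughly 0.05 * budget + 7 plus noise.
x_raw = 300 * torch.rand(200)
y_raw = 0.05 * x_raw + 7 + torch.randn(200)

lr = 0.001
m = torch.tensor([0.1], requires_grad=True)
b = torch.tensor([0.1], requires_grad=True)

for epoch in range(50):
    loss = ((m * x_raw + b - y_raw) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        m -= lr * m.grad
        b -= lr * b.grad
        m.grad.zero_()
        b.grad.zero_()

final_loss = ((m * x_raw + b - y_raw) ** 2).mean().item()
print("final loss:", final_loss)  # no longer a finite number
```

Because the inputs are on the order of hundreds, each step multiplies the error instead of shrinking it, and the parameters overflow within a few dozen epochs.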

## Line fitting (with normalization)

Looking at the scatterplot above, we can see that TV advertising spending goes up to about 300, and the sales go up to about 30.

So, to get our results to have a magnitude of around 1, we can do a very simple "order of magnitude" normalization where we divide each term by a nice round number near the maximum value.
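
As a rough sketch of that idea, assuming synthetic tensors in place of the real TV and sales columns, the scaling is a single division per tensor. The divisors 300 and 30 come from the ranges described above; the names `x_scaled` and `y_scaled` are illustrative.

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in for the TV budget and sales columns (assumed, not the real data).
x_raw = 300 * torch.rand(200)
y_raw = 0.05 * x_raw + 7 + torch.randn(200)

# Divide by a round number near each maximum.
x_scaled = x_raw / 300
y_scaled = y_raw / 30

print("max x:", x_scaled.max().item(), "max y:", y_scaled.max().item())
```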

In [None]:
# TODO: scale your TV budget tensor and your sales tensor so the values have a magnitude of around 1


In [None]:
# TODO: copy and paste the same gradient descent code we used above but make sure to use our normalized tensors this time
# make sure to use a small learning rate!



Hopefully you see that's much better! We can now see that we have an estimate for the slope $m$ and the y-intercept $b$ that has a relatively low loss.

We can recompute the loss, just to double-check:

In [None]:
# TODO: compute total mean squared error with the parameters our training loop ended with
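
One possible way to do this check is sketched below. The parameter values and the three data points are hypothetical placeholders, not results from the real training run; substitute your own `m`, `b`, and normalized tensors.

```python
import torch

# Hypothetical stand-ins for the trained parameters and the normalized tensors.
m = torch.tensor([0.5], requires_grad=True)
b = torch.tensor([0.2], requires_grad=True)
x_scaled = torch.tensor([0.0, 0.5, 1.0])
y_scaled = torch.tensor([0.2, 0.5, 0.6])

with torch.no_grad():  # evaluation only, so no gradient tracking is needed
    mse = ((m * x_scaled + b - y_scaled) ** 2).mean()

print("MSE:", mse.item())
```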


We can also plot the line we learned to see if it fits the data well.

In [None]:
sns.scatterplot(x=x_pt,y=y_pt) # make sure these variables match your normalized tensors
y_intercept = b.item()  # plain Python floats are simpler to plot than 1-element arrays
slope = m.item()
# draw the fitted line between x=0 and x=1
plt.plot([0,1],[y_intercept,slope+y_intercept], color="red")

## Structured Code

The above code works well, but it isn't very structured. Let's fix that.

First, we can put the linear model in its own function and make sure the parameters are stored in an array or tensor to structure the code better:

In [None]:
def linearModel(params, inputs):
  y_pred = params[0]*inputs + params[1]
  return y_pred

We can also encapsulate the gradient descent into a function.

In [None]:
def gradDec(model, n_params, x, y, lr=0.01, n_epochs=2000, print_rate=200):

  params = 0.1*torch.rand(n_params).float()  # random initial parameters
  params.requires_grad = True

  for epoch in range(n_epochs):
    y_pred = model(params, x)
    loss_tensor = (y_pred - y)**2
    loss = loss_tensor.mean()
    loss.backward()
    with torch.no_grad():
      params -= lr*params.grad
      params.grad.zero_()
    if epoch%print_rate==0:
      print("epoch:",epoch,"loss=",loss.item())

  return params

Now we can train everything in one line of code:

In [None]:
# TODO: use our new helper functions to redo the same analysis as above
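
For reference, here is a minimal sketch of how the two helpers fit together. It repeats the function definitions so it runs on its own, and it trains on synthetic, already-scaled data (an assumption standing in for the TV and sales tensors). The learning rate of 0.1 is a choice made for this toy data, not a recommendation for the real dataset.

```python
import torch

torch.manual_seed(0)

def linearModel(params, inputs):
    return params[0] * inputs + params[1]

def gradDec(model, n_params, x, y, lr=0.01, n_epochs=2000, print_rate=200):
    params = 0.1 * torch.rand(n_params)  # random initial parameters
    params.requires_grad = True
    for epoch in range(n_epochs):
        loss = ((model(params, x) - y) ** 2).mean()
        loss.backward()
        with torch.no_grad():
            params -= lr * params.grad
            params.grad.zero_()
        if epoch % print_rate == 0:
            print("epoch:", epoch, "loss=", loss.item())
    return params

# Synthetic data with a known slope (0.5) and intercept (0.25) plus a little noise.
x_scaled = torch.rand(200)
y_scaled = 0.5 * x_scaled + 0.25 + 0.03 * torch.randn(200)

# The whole training run is now a single call.
params = gradDec(linearModel, 2, x_scaled, y_scaled, lr=0.1)
print("slope:", params[0].item(), "intercept:", params[1].item())
```

Since the data was generated with a known slope and intercept, the recovered parameters give a quick sanity check that the helper works.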


---


# Multiple Linear Regression

Let's go through the same exercise again, but this time with multiple linear regression where we try to predict the model output as a linear combination of several input features.

In [None]:
data_df = pd.read_csv("https://raw.githubusercontent.com/be-prado/csci4521/refs/heads/main/Advertising.csv")

## Correlation

Before building the model, let's first visualize which variables are correlated with each other, and which variables are correlated with the output (sales).

In [None]:
corr = data_df[["TV","radio","newspaper","sales"]].corr()
sns.heatmap(corr, cmap = "coolwarm") #Try: RdBu_r vs. coolwarm

We can already see that TV and radio are likely more useful predictors than newspaper advertising: both show stronger positive correlations with sales.
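
The heatmap colors can be hard to compare by eye, so it may help to print the numbers as well. The sketch below uses a toy DataFrame with made-up values (not the real CSV) and ranks each feature by its correlation with `sales`; `toy_df` and `sales_corr` are illustrative names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy stand-in for the advertising table (made-up numbers, not the real CSV).
toy_df = pd.DataFrame({
    "TV": rng.uniform(0, 300, 100),
    "radio": rng.uniform(0, 50, 100),
    "newspaper": rng.uniform(0, 100, 100),
})
# In this toy table, sales depend on TV and radio only.
toy_df["sales"] = 0.05 * toy_df["TV"] + 0.2 * toy_df["radio"] + rng.normal(0, 1, 100)

# Correlation of each feature with the target, strongest first.
sales_corr = toy_df.corr()["sales"].drop("sales").sort_values(ascending=False)
print(sales_corr)
```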

### Loading Data into Tensors

We need to move the dataset we care about from Pandas dataframes into PyTorch tensors.
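
One common way to make this move is to select the columns, convert them to a NumPy array, and wrap the result in `torch.tensor`. The sketch below shows the pattern on a two-row table built from the first two rows printed earlier; `toy_df`, `x_toy`, and `y_toy` are illustrative names.

```python
import pandas as pd
import torch

# Two-row table using the first two rows of the dataset shown earlier.
toy_df = pd.DataFrame({
    "TV": [230.1, 44.5],
    "radio": [37.8, 39.3],
    "newspaper": [69.2, 45.1],
    "sales": [22.1, 10.4],
})

features = ["TV", "radio", "newspaper"]
# float32 matches the default dtype PyTorch uses for parameters
x_toy = torch.tensor(toy_df[features].to_numpy(), dtype=torch.float32)
y_toy = torch.tensor(toy_df["sales"].to_numpy(), dtype=torch.float32)

print(x_toy.shape, y_toy.shape)  # one row per sample, one column per feature
```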

In [None]:
features=["TV","radio","newspaper"]
# TODO: load the values from "TV","radio","newspaper" into x and values from "sales" into y


A single row in the input feature tensor *should* now be a vector with 3 values (TV, radio, and newspaper spending):

In [None]:
# TODO: print out a single row of x


By using the colon symbol `:` when accessing an array, we can pull out an entire column. Print out the TV advertising spending (column 0).
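
As a small illustration of row versus column indexing, here is the same idea on a toy 2x3 tensor (values taken from the first two rows of the dataset shown earlier):

```python
import torch

x_toy = torch.tensor([[230.1, 37.8, 69.2],
                      [44.5, 39.3, 45.1]])

first_row = x_toy[0]      # all three features for one row of the dataset
tv_column = x_toy[:, 0]   # TV spending (column 0) for every row

print(first_row)
print(tv_column)
```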

In [None]:
# TODO: print out a single column of x


## Normalizing the tensors

As we saw above, we need to normalize the tensors to get good results. Above, we did a simple order of magnitude normalization. Here, I'll normalize based on the mean of the data. (There are lots of ways to normalize data, and there is no best method. Different ideas can make sense for different datasets. Just make sure the numbers all end up with magnitudes around 1.)

PyTorch tensors have a `.mean()` method to get the mean of the tensor. We want the mean along the columns (i.e., a separate mean for each feature), so we need to pass in `dim=0` to the mean function so it knows which direction to take the mean over.
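
To see what `dim` does, it can help to try it on a tiny tensor where the answers are easy to verify by hand. This is only an illustration; `t` is a made-up example.

```python
import torch

t = torch.tensor([[1.0, 10.0, 100.0],
                  [3.0, 30.0, 300.0]])

col_means = t.mean(dim=0)  # collapses the rows: one mean per column (2, 20, 200)
row_means = t.mean(dim=1)  # collapses the columns: one mean per row (37, 111)
overall = t.mean()         # no dim: mean over every element (74)

print(col_means, row_means, overall)
```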

In [None]:
x_mean = x.mean(dim=0)
x_mean

We can shift and divide by the mean for both the inputs x and the outputs y. This resembles the z-score normalization we've used before, except that a z-score divides by the standard deviation rather than the mean. This time we apply it to tensors.

In [None]:
x_norm = (x-x_mean)/x_mean
y_mean = y.mean(dim=0)
y_norm = (y-y_mean)/y_mean
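
For comparison, the sketch below puts mean-based scaling next to a textbook z-score (which divides by the standard deviation) on a toy tensor built from the first three rows of the dataset shown earlier. Both center each column at zero; only the z-score also gives each column unit spread.

```python
import torch

x_toy = torch.tensor([[230.1, 37.8, 69.2],
                      [44.5, 39.3, 45.1],
                      [17.2, 45.9, 69.3]])

col_mean = x_toy.mean(dim=0)
col_std = x_toy.std(dim=0)  # sample standard deviation per column

mean_scaled = (x_toy - col_mean) / col_mean  # shift and divide by the mean
z_scored = (x_toy - col_mean) / col_std      # textbook z-score

print("column means after mean scaling:", mean_scaled.mean(dim=0))
print("column stds after z-scoring:", z_scored.std(dim=0))
```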

Our `multiLinearModel` predicts the output as a scaled sum of all three input features (plus a constant shift):

In [None]:
def multiLinearModel(params, inputs):
  y_pred = params[0]*inputs[:,0] + params[1]*inputs[:,1] + params[2]*inputs[:,2] + params[3]
  return y_pred
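
The same weighted sum can also be written as a matrix-vector product, which avoids spelling out each column. The sketch below checks that the two forms agree on random inputs; `multiLinearModel_matmul` is a hypothetical name introduced here, and it assumes the last entry of `params` is the constant shift.

```python
import torch

torch.manual_seed(0)

def multiLinearModel(params, inputs):
    return params[0]*inputs[:, 0] + params[1]*inputs[:, 1] + params[2]*inputs[:, 2] + params[3]

def multiLinearModel_matmul(params, inputs):  # hypothetical alternative form
    # (n_samples, n_features) @ (n_features,) -> (n_samples,), then add the shift
    return inputs @ params[:-1] + params[-1]

params = torch.rand(4)
inputs = torch.rand(5, 3)

explicit = multiLinearModel(params, inputs)
compact = multiLinearModel_matmul(params, inputs)
print(torch.allclose(explicit, compact, atol=1e-6))
```

The compact form works for any number of features without changes, which is convenient if you later add or drop a column.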

With our new model and structured code from above, we can train everything in one line.

In [None]:
# TODO: use our new multilinear model helper function to train on our data again


Oops, the `newspaper` term is almost 0. We may want to discard it from our model and train again to reduce the risk of overfitting!

## Multi-linear regression with interaction terms

Lastly, we may want to try a model where we consider interaction effects between the terms.

In the model below we remove the term relating to newspaper advertising (see above), but we add a new term, `TV * Radio`, which captures the interaction between the two types of advertisement. You can explore different interaction terms; the correlation heatmap can help you choose.

In [None]:
def multiLinearModel_withInteraction(params, inputs):
  y_pred = params[0]*inputs[:,0] + params[1]*inputs[:,1] + params[2]*(inputs[:,0]*inputs[:,1]) + params[3]
  return y_pred

In [None]:
# TODO: use our new multilinear with interaction model helper function to train on our data again


Notice that the loss of the model with this interaction term is noticeably lower than that of the previous model.

For a gut-check, we can apply the model to the first five items:

In [None]:
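# NOTE: these coefficients were copied from one training run; substitute the values your own run produced
for item in range(5):
  print(0.3581*x_norm[item][0] + 0.2699*x_norm[item][1] + 0.1643*x_norm[item][0]*x_norm[item][1] - 0.0027)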

And comparing our predictions against the actual (normalized) sales values:

In [None]:
for item in range(5):
  print(y_norm[item])

Not a bad match! Though for a real gut-check we would need to undo the normalization so we can report these values in the original units.
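
Undoing the normalization means inverting `(y - y_mean) / y_mean`, which gives `y = y_norm * y_mean + y_mean`. A minimal sketch follows, using the first five sales values shown earlier as a toy tensor. Note that the mean here is the mean of those five values, not of the full dataset, and `denormalize` is a hypothetical helper name.

```python
import torch

y_toy = torch.tensor([22.1, 10.4, 9.3, 18.5, 12.9])  # first five sales values
y_mean = y_toy.mean()
y_norm = (y_toy - y_mean) / y_mean

def denormalize(values, mean):  # hypothetical helper
    # inverse of (y - mean) / mean
    return values * mean + mean

y_back = denormalize(y_norm, y_mean)
print(y_back)  # matches y_toy up to floating-point rounding
```

The same function can be applied to model predictions made in normalized units to report them in the original sales units.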

