## Creating a Multivariable Linear Regression Model

## Setup

Let's install the packages we'll need to implement our multivariable linear regression model.

In [1]:
%pip install numpy
%pip install matplotlib

Note: you may need to restart the kernel to use updated packages.


## Importing Packages

Let's start by importing the packages we will need.

In [4]:
import copy, math
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')
np.set_printoptions(precision=2)  # reduced display precision on numpy arrays

## Notation

Here is a summary of some of the notation you will encounter, updated for multiple features.

### General Notation

| Symbol | Description | Python Equivalent |
| ------ | ----------- | ----------------- |
| $a$ | Scalar (non-bold) | N/A |
| $\mathbf{a}$ | Vector (bold) | N/A |
| $\mathbf{A}$ | Matrix (bold, capital) | N/A |

### Regression Specific Notation

| Symbol | Description | Python Equivalent |
| ------ | ----------- | ----------------- |
| $\mathbf{X}$ | Training example matrix | `X_train` |
| $\mathbf{y}$ | Training example targets | `y_train` |
| $\mathbf{x}^{(i)}$, $y^{(i)}$ | $i^{th}$ training example | `X[i]`, `y[i]` |
| $m$ | Number of training examples | `m` |
| $n$ | Number of features in each example | `n` |
| $\mathbf{w}$ | Parameter: weight | `w` |
| $b$ | Parameter: bias | `b` |
| $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ | Model evaluation at $\mathbf{x}^{(i)}$ parameterized by $\mathbf{w}, b$: $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b$ | `f_wb` |


## What Are We Building?

We are building a multivariable linear regression model for predicting housing prices. Our model will use several features of a house to try to predict its price: size (sq ft), number of bedrooms, number of floors, and age of the home. This is a very simple example, but it could be extended for real-world use. Here is a table of example training data:

| Size (sq ft) | Number of Bedrooms | Number of Floors | Age of Home | Price (1000s of dollars) |
| ------------ | ------------------ | ---------------- | ----------- | ------------------------ |
| 2104         | 5                  | 1                | 45          | 460                      |
| 1416         | 3                  | 2                | 40          | 232                      |
| 852          | 2                  | 1                | 35          | 178                      |


Let's get started by creating our training data sets `X_train` and `y_train`.

In [5]:
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

## Understanding X_train

In the code above, `X_train` is a NumPy matrix. Each row in the matrix consists of 4 values, one per feature of our model: Size (sq ft), Number of Bedrooms, Number of Floors, and Age of Home, in that order.

Each row of the matrix is one training example for our model.

When you have $m$ training examples (3 in our case) and each training example has $n$ features (4 in our case), $\mathbf{X}$ is a matrix with dimensions ($m$, $n$): $m$ rows and $n$ columns.

$$\mathbf{X} = 
\begin{pmatrix}
 x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n-1} \\ 
 x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n-1} \\
 \vdots & \vdots & \ddots & \vdots \\
 x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n-1} 
\end{pmatrix}
$$
Notation:
- $\mathbf{x}^{(i)}$ is a vector containing example $i$: $\mathbf{x}^{(i)} = (x^{(i)}_0, x^{(i)}_1, \cdots, x^{(i)}_{n-1})$
- $x^{(i)}_j$ is element $j$ in example $i$. The superscript in parentheses indicates the example number, while the subscript indicates the element (feature) within that example.
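
The notation above maps directly onto NumPy indexing. Here is a minimal sketch, repeating the `X_train` definition so that it runs on its own; the variable names `x_1` and `x_1_0` are only illustrative.

```python
import numpy as np

# Same training matrix as above: m = 3 examples, n = 4 features
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])

m, n = X_train.shape      # number of examples, number of features
x_1 = X_train[1]          # x^(1): the second training example, shape (n,)
x_1_0 = X_train[1, 0]     # x^(1)_0: size (sq ft) of the second example

print(f"m = {m}, n = {n}")
print(f"x^(1) = {x_1}")
print(f"x^(1)_0 = {x_1_0}")
```

Both the example index $i$ and the feature index $j$ start at 0, which matches Python's zero-based indexing.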

## Understanding y_train

`y_train` is a straightforward NumPy array. It contains the actual price of each training example in units of 1,000 dollars.

Let's print out the shape and type of both `X_train` and `y_train` to get a better understanding of our dataset.

In [7]:
# data is stored in numpy array/matrix
print(f"X Shape: {X_train.shape}, X Type: {type(X_train)}")
print(X_train)
print(f"y Shape: {y_train.shape}, y Type: {type(y_train)}")
print(y_train)

X Shape: (3, 4), X Type: <class 'numpy.ndarray'>
[[2104    5    1   45]
 [1416    3    2   40]
 [ 852    2    1   35]]
y Shape: (3,), y Type: <class 'numpy.ndarray'>
[460 232 178]


## Parameters of Our Multivariable Linear Regression Model

In single variable linear regression we used 2 parameters, $w$ and $b$, to adjust our model to fit our data. With our multivariable linear regression model we will do something similar. The major difference is that instead of a single value for $w$, we now have a vector of values $\mathbf{w}$. Each element of the vector is associated with one of the features in our model.

Our vector $\mathbf{w}$ will contain $n$ elements, where $n$ is the number of features our model uses.

* $\mathbf{w}$ is a vector with $n$ elements.
  - Each element contains the parameter associated with one feature.
  - In our dataset, $n$ is 4.
  - Notionally, we draw this as a column vector.

$$\mathbf{w} = \begin{pmatrix}
w_0 \\ 
w_1 \\
\vdots \\
w_{n-1}
\end{pmatrix}
$$

* $b$ is a scalar parameter.  

Our parameter $b$ remains a scalar value, just as it was in the single variable linear regression model. No changes here!

Let's initialize $\mathbf{w}$ and $b$ with some values. For demonstration, they will be close to the optimal values.



In [8]:
b_init = 785.1811367994083
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])
print(f"w_init shape: {w_init.shape}, b_init type: {type(b_init)}")

w_init shape: (4,), b_init type: <class 'float'>


## Prediction with Multiple Variables

The model's prediction is given by the following linear formula:

$$ f_{\mathbf{w},b}(\mathbf{x}) =  w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \tag{1}$$
or in vector notation:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  \tag{2} $$ 
where $\cdot$ is a vector `dot product`
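
As a quick worked check of equation (1), we can plug the first training example, $\mathbf{x}^{(0)} = (2104, 5, 1, 45)$, and the initial parameters defined above into the sum. The intermediate products are rounded to two decimals here, so this is an approximation rather than the exact floating point result:

$$
\begin{aligned}
f_{\mathbf{w},b}(\mathbf{x}^{(0)}) &= (0.39133535)(2104) + (18.75376741)(5) + (-53.36032453)(1) + (-26.42131618)(45) + 785.1811367994083 \\
&\approx 823.37 + 93.77 - 53.36 - 1188.96 + 785.18 \\
&\approx 460.00
\end{aligned}
$$

This agrees with the target price of 460 (thousand dollars) for that house.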

Let's implement the prediction twice: first by looping over the terms of the linear formula, and then with vectorization.

## Single Prediction With Looping (no vectorization)

The most direct approach is to iterate over the features and multiply each one by its parameter, keeping a running sum of these products as we loop. Once the loop is finished, we add the bias parameter $b$ to the sum.

In [9]:
def predict_single_loop(x, w, b): 
    """
    single predict using linear regression
    
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters    
      b (scalar):  model parameter     
      
    Returns:
      prediction (scalar):  prediction
    """
    n = x.shape[0]
    prediction = 0

    for i in range(n):
        prediction_i = x[i] * w[i]
        prediction = prediction + prediction_i
    
    prediction = prediction + b

    return prediction

In [10]:
# get a row from our training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")

# make a prediction
f_wb = predict_single_loop(x_vec, w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")

x_vec shape (4,), x_vec value: [2104    5    1   45]
f_wb shape (), prediction: 459.9999976194083


## Single Prediction Vectorization

Our previous prediction function works, but with many features it may become very slow. To speed up our predictions we can take advantage of vector operations. If you refer back to equation (2) above, you'll see that in vector form all we need to do is compute the dot product of $\mathbf{w}$ and $\mathbf{x}$ and add the bias parameter to the result. NumPy provides a function for computing the dot product of 2 vectors called `np.dot()`.

[`np.dot()` documentation](https://numpy.org/doc/stable/reference/generated/numpy.dot.html)

In [11]:
def predict(x, w, b): 
    """
    single predict using linear regression
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters   
      b (scalar):             model parameter 
      
    Returns:
      prediction (scalar):  prediction
    """
    prediction = np.dot(x, w) + b 
    return prediction

In [12]:
# get a row from our training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")

# make a prediction
f_wb = predict(x_vec,w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")

x_vec shape (4,), x_vec value: [2104    5    1   45]
f_wb shape (), prediction: 459.9999976194083


The outputs of our two predict functions are exactly the same, but the vectorized version is more concise and, for vectors with many elements, typically much faster than a Python loop. It's a win-win! The code is simple enough that, instead of calling a separate predict function, we can use `np.dot()` directly to make our predictions in other routines.
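
To get a feel for the speed difference, here is a rough timing sketch. It is not part of the model: the vector length of 100,000 and the random data are assumptions chosen only to make the gap visible, and `predict_loop` and `predict_vectorized` are illustrative stand-ins for the two functions above. Actual timings depend on your machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n = 100_000                      # hypothetical feature count, far larger than our 4
x = rng.random(n)
w = rng.random(n)
b = 1.0

def predict_loop(x, w, b):
    """Element-by-element prediction, same logic as predict_single_loop."""
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * w[i]
    return total + b

def predict_vectorized(x, w, b):
    """Vectorized prediction, same logic as predict."""
    return np.dot(x, w) + b

# Both versions should agree up to floating point rounding
assert np.isclose(predict_loop(x, w, b), predict_vectorized(x, w, b))

# Time 10 calls of each version
loop_time = timeit.timeit(lambda: predict_loop(x, w, b), number=10)
vec_time = timeit.timeit(lambda: predict_vectorized(x, w, b), number=10)
print(f"loop: {loop_time:.4f} s, np.dot: {vec_time:.4f} s")
```

With only 4 features, as in our housing data, the difference is negligible; the benefit of vectorization grows with the size of the vectors.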

## Cost Function with Multiple Variables

Now that we understand prediction with multiple variables, we can implement the cost function for it as well.

The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}$$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b  \tag{4} $$ 


$\mathbf{w}$ and $\mathbf{x}^{(i)}$ are vectors rather than scalars in order to support multiple features.

Let's implement this below, using `np.dot()` instead of a loop over the features to compute each prediction. For now, we will still loop over all $m$ training examples when computing the cost.

In [15]:
def compute_cost(X, y, w, b): 
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    cost = 0.0

    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b
        cost = cost + (f_wb_i - y[i])**2

    cost = cost / (2 * m) 

    return cost 

## Compute Cost Vectorized

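Just as we vectorized our prediction, we can also vectorize the cost function and get rid of the for loop over the $m$ training examples. This again makes the code simpler and faster. Note that the definition below reuses the name `compute_cost`, so it replaces the loop-based version above.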

In [19]:
def compute_cost(X, y, w, b):
    """
    Compute cost using vectorized operations.
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    predictions = X.dot(w) + b
    errors = predictions - y
    cost = np.sum(errors ** 2) / (2 * m)
    return cost

In [20]:
# Compute and display cost using our pre-chosen optimal parameters. 
cost = compute_cost(X_train, y_train, w_init, b_init)
print(f'Cost at optimal w : {cost}')

Cost at optimal w : 1.5578904045996674e-12
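
As a sanity check, the loop-based and vectorized cost functions should return the same value for any parameters. The sketch below compares them side by side; because both versions above share the name `compute_cost`, they are repeated here under the hypothetical names `compute_cost_loop` and `compute_cost_vectorized`. The all-zero parameters are an arbitrary choice that keeps the expected cost easy to verify by hand: $(460^2 + 232^2 + 178^2) / (2 \cdot 3) = 49518$.

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

def compute_cost_loop(X, y, w, b):
    """Cost computed with an explicit loop over the m examples."""
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b
        cost += (f_wb_i - y[i]) ** 2
    return cost / (2 * m)

def compute_cost_vectorized(X, y, w, b):
    """Cost computed with a single matrix-vector product."""
    m = X.shape[0]
    errors = X.dot(w) + b - y
    return np.sum(errors ** 2) / (2 * m)

# Arbitrary test point: all parameters set to zero
w_zero = np.zeros(X_train.shape[1])
b_zero = 0.0

cost_loop = compute_cost_loop(X_train, y_train, w_zero, b_zero)
cost_vec = compute_cost_vectorized(X_train, y_train, w_zero, b_zero)
print(f"loop: {cost_loop}, vectorized: {cost_vec}")  # both should equal 49518.0
assert np.isclose(cost_loop, cost_vec)
```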


## Gradient Descent With Multiple Variables

