## Multiple Feature Linear Regression

These are problems in which more than one independent feature (variable) can be used to predict the output.

Referring to the house price prediction problem:

|Size (sqft)    |Number of bedrooms |Number of floors |Age of home (years)    |Price (1000s of dollars) |
|:-------------:|:-----------------:|:---------------:|:---------------------:|:-----------------------:|
|$x_0$          |$x_1$              |$x_2$            |$x_3$                  |                         |
|2104           |5                  |1                |45                     |460                      |
|1416           |3                  |2                |40                     |232                      |
|852            |2                  |1                |35                     |178                      |

- $x_j$: the $j^{th}$ feature
- $n$: the number of features
- $\mathbf{x}^{(i)}$: the features of the $i^{th}$ training example, i.e. $(x^{(i)}_0, x^{(i)}_1, \cdots, x^{(i)}_{n-1})$
- $x_{j}^{(i)}$: the value of feature $j$ in the $i^{th}$ training example
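
As a small illustration of this notation, the sketch below maps it onto NumPy indexing. It assumes zero-based indices for both examples and features, and the array name `X` and the variables `x_1` and `x_1_3` are introduced here only for illustration. The values are the first two rows of the table.

```python
import numpy as np

# First two training examples from the table (one row per example)
X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40]])

m, n = X.shape     # m training examples, n features
x_1 = X[1]         # x^(1): all features of the example at index 1
x_1_3 = X[1, 3]    # x_3^(1): feature 3 (age of home) of example 1

print(f"m = {m}, n = {n}")
print(f"x^(1) = {x_1}")
print(f"x_3^(1) = {x_1_3}")
```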

### Updated model prediction function with multiple variables

The model's prediction with multiple variables is given by the linear model:

$$ f_{\mathbf{w},b}(\mathbf{x}) =  w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \tag{1}$$
or in vector notation:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  \tag{2} $$ 
where $\cdot$ denotes the vector dot product.

To demonstrate the dot product, we will implement prediction using (1) and (2).

In [1]:
import copy
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Setting the input training data
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

### Implementation

In [3]:
def multi_feature_model_function(x, w, b):
    """
    Model prediction function using the dot product of x and w

    Args:
      x (ndarray): Shape (n,) single example, or shape (m,n) for m examples
      w (ndarray): Shape (n,) model parameters
      b (scalar):  model parameter

    Returns:
      p (scalar or ndarray (m,)): prediction(s), one per example
    """

    # Slower approach, equation (1), for x of shape (m,n):
    # iterates over the m examples and the n features of each example,
    # adding the products x[i,j] * w[j] to the prediction one by one.
    #
    # m = x.shape[0]
    # p = np.zeros(m)
    #
    # for i in range(m):
    #     n = x[i].shape[0]
    #     p_i = 0
    #     for j in range(n):
    #         p_ij = x[i,j] * w[j]
    #         p_i = p_i + p_ij
    #
    #     p[i] = p_i + b

    # Faster approach, equation (2), using the built-in np.dot() function.
    # NumPy computes the products and their sum in optimized native code
    # rather than in a Python loop.
    p = np.dot(x, w) + b

    return p

In [4]:
# Initialize w and b
b_init = 785.1811367994083
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])

In [5]:
# Make a prediction from the training data
f_wb = multi_feature_model_function(X_train, w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")

f_wb shape (3,), prediction: [459.99999762 231.99999837 177.99999899]
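
To make the link between equations (1) and (2) concrete, here is a minimal sketch that computes the prediction for a single example both ways and checks that they agree. The helper names `predict_loop` and `predict_dot` are hypothetical and not part of the notebook. The parameter values are the same `w_init` and `b_init` used above.

```python
import numpy as np

def predict_loop(x, w, b):
    """Equation (1): add the products w_j * x_j one feature at a time."""
    p = 0.0
    for j in range(x.shape[0]):
        p = p + w[j] * x[j]
    return p + b

def predict_dot(x, w, b):
    """Equation (2): the same prediction written as a dot product."""
    return np.dot(w, x) + b

# First training example and the initial parameters from this section
x_vec = np.array([2104, 5, 1, 45])
w_vec = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_val = 785.1811367994083

p_loop = predict_loop(x_vec, w_vec, b_val)
p_dot = predict_dot(x_vec, w_vec, b_val)
print(f"loop: {p_loop}, dot: {p_dot}")
print(f"agree: {np.isclose(p_loop, p_dot)}")
```

Both versions compute the same quantity. The dot-product form is shorter and avoids the explicit Python loop.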


### Compute Cost With Multiple Variables

The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}$$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b  \tag{4} $$ 


In contrast to the single-feature case, $\mathbf{w}$ and $\mathbf{x}^{(i)}$ are now vectors rather than scalars, which is what allows the model to support multiple features.

In [6]:
def compute_cost(X, y, w, b):
    """
    Computes the squared error cost J(w,b) of equation (3)
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      cost (scalar): cost
    """

    cost = 0
    m = X.shape[0]
    y_predictions = multi_feature_model_function(X, w, b)

    for i in range(m):
        diff = y_predictions[i] - y[i]
        cost = cost + diff ** 2
    cost = cost / (2 * m)
    
    return cost

In [7]:
# Computing the cost for the initialized w and b

cost = compute_cost(X_train, y_train, w_init, b_init)
print(f"Cost of model function when \nw = {w_init} and \nb = {b_init} is \n{cost}")

Cost of model function when 
w = [  0.39133535  18.75376741 -53.36032453 -26.42131618] and 
b = 785.1811367994083 is 
1.5578904045996674e-12
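
The loop in `compute_cost` can also be written without an explicit loop. The following is a sketch of one possible vectorized form of equation (3). The function name `compute_cost_vectorized` is an assumption made for this example, and the data is repeated so that the block runs on its own.

```python
import numpy as np

def compute_cost_vectorized(X, y, w, b):
    """Hypothetical vectorized variant of the cost in equation (3)."""
    m = X.shape[0]
    errors = X @ w + b - y                    # shape (m,): prediction minus target
    return np.dot(errors, errors) / (2 * m)   # sum of squared errors over 2m

X = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y = np.array([460, 232, 178])
w = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b = 785.1811367994083

cost_vec = compute_cost_vectorized(X, y, w, b)
cost_zero = compute_cost_vectorized(X, y, np.zeros(4), 0.0)
print(f"cost at (w_init, b_init): {cost_vec}")
print(f"cost at w = 0, b = 0:     {cost_zero}")
```

With the near-optimal parameters the cost is close to zero, while the all-zero parameters give a much larger cost because every prediction is 0.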


### Gradient Descent With Multiple Variables
Gradient descent for multiple variables is given by:
$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{5}  \; & \text{for j = 0..n-1}\newline
&b\ \ = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}  \newline \rbrace
\end{align*}$$

where $n$ is the number of features, the parameters $w_j$ and $b$ are updated simultaneously, and

$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6}  \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7}
\end{align}
$$
* $m$ is the number of training examples in the data set
* $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value
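
Equations (6) and (7) follow from applying the chain rule to the cost in equation (3). As a brief sketch of that step, using only the symbols already defined above:

$$
\begin{align*}
\frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{2m} \sum\limits_{i = 0}^{m-1} 2\,(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \, \frac{\partial f_{\mathbf{w},b}(\mathbf{x}^{(i)})}{\partial w_j} \\
&= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \, x_{j}^{(i)}
\end{align*}
$$

The second line uses $\frac{\partial f_{\mathbf{w},b}(\mathbf{x}^{(i)})}{\partial w_j} = x_{j}^{(i)}$, which follows from equation (4). The same steps with $\frac{\partial f_{\mathbf{w},b}(\mathbf{x}^{(i)})}{\partial b} = 1$ give equation (7).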

In [8]:
def compute_gradient(X, y, w, b):
    """
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
    """
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0

    y_predictions = multi_feature_model_function(X, w, b)
    for i in range(m):
        for j in range(n):
            dj_dw[j] = dj_dw[j] + ((y_predictions[i] - y[i]) * X[i][j])
        dj_db = dj_db + (y_predictions[i] - y[i])

    dj_dw = dj_dw / m
    dj_db = dj_db / m

    return dj_dw, dj_db

In [9]:
dw_init, db_init = compute_gradient(X_train, y_train, w_init, b_init)
print(f"Gradient of w = {w_init} \nb = {b_init} is \ndj_dw = {dw_init} \ndj_db = {db_init}")

Gradient of w = [  0.39133535  18.75376741 -53.36032453 -26.42131618] 
b = 785.1811367994083 is 
dj_dw = [-2.72623574e-03 -6.27197255e-06 -2.21745574e-06 -6.92403377e-05] 
dj_db = -1.6739251122999121e-06
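
The double loop in `compute_gradient` evaluates equations (6) and (7) term by term. As a sketch of an alternative, the same sums can be expressed with matrix operations. The function name `compute_gradient_vectorized` is hypothetical, and the gradient is evaluated at all-zero parameters purely as an example.

```python
import numpy as np

def compute_gradient_vectorized(X, y, w, b):
    """Hypothetical vectorized variant of equations (6) and (7)."""
    m = X.shape[0]
    errors = X @ w + b - y        # shape (m,): prediction minus target
    dj_dw = X.T @ errors / m      # shape (n,): equation (6) for every j at once
    dj_db = np.sum(errors) / m    # scalar: equation (7)
    return dj_dw, dj_db

X = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y = np.array([460, 232, 178])

dj_dw, dj_db = compute_gradient_vectorized(X, y, np.zeros(4), 0.0)
print(f"dj_dw = {dj_dw}")
print(f"dj_db = {dj_db}")
```

Each entry of `X.T @ errors` is the sum over examples of the error times one feature, which is the sum in equation (6).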


In [10]:
def gradient_descent(X, y, w_in, b_in, alpha=0.001, num_iters=100000):
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
    """

    m, n = X.shape
    w = copy.deepcopy(w_in)
    b = b_in

    for i in range(num_iters):
        dj_dw, dj_db = compute_gradient(X, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

    return w, b

In [35]:
temp_w = np.zeros(w_init.shape[0])
temp_b = 0

# Learning rate
alpha = 5.0e-7

# Number of iterations
num_iters = 10 ** 3

# Finding final w, b using the gradient descent algorithm
w_final, b_final = gradient_descent(X_train, y_train, temp_w, temp_b, alpha=alpha, num_iters=num_iters)

print(f"Final w, b for learning rate {alpha:0.10f} and {num_iters} iterations is \nw = {w_final}\nb = {b_final:0.2f}\n")

print(f"Predictions after setting final w and b\n{multi_feature_model_function(X_train, w_final, b_final)}")

Final w, b for learning rate 0.0000005000 and 1000 iterations is 
w = [ 0.20396569  0.00374919 -0.0112487  -0.0658614 ]
b = -0.00

Predictions after setting final w and b
[426.18530497 286.16747201 171.46763087]
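
The predictions above still differ noticeably from the targets (460, 232, 178). One way to inspect how gradient descent is progressing is to record the cost after every update. The sketch below does this with a hypothetical helper, `gradient_descent_with_history`, which is not part of the notebook. It uses vectorized forms of equations (3), (6) and (7), and the same learning rate and iteration count as the run above.

```python
import numpy as np

def cost(X, y, w, b):
    """Squared error cost of equation (3), vectorized."""
    errors = X @ w + b - y
    return np.dot(errors, errors) / (2 * X.shape[0])

def gradient_descent_with_history(X, y, w_in, b_in, alpha, num_iters):
    """Batch gradient descent that also records the cost after each step."""
    m = X.shape[0]
    w = w_in.copy()   # avoid modifying the caller's array
    b = b_in
    history = []
    for _ in range(num_iters):
        errors = X @ w + b - y                # computed once, with the old w and b
        w = w - alpha * (X.T @ errors) / m    # equation (6)
        b = b - alpha * np.sum(errors) / m    # equation (7)
        history.append(cost(X, y, w, b))
    return w, b, history

X = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y = np.array([460, 232, 178])

w_hist, b_hist, J_history = gradient_descent_with_history(
    X, y, np.zeros(4), 0.0, alpha=5.0e-7, num_iters=1000)

print(f"cost after first step: {J_history[0]:.2f}")
print(f"cost after last step:  {J_history[-1]:.2f}")
```

If the recorded cost is still decreasing at the last step, running more iterations may improve the fit further.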
