# Linear Regression Derivation and Python Implementation

Linear regression models each prediction as $\hat{y}_k = w_0\times 1 + w_1x_{k,1} + \dots + w_{m-1}x_{k,m-1}$, <br>
where the training dataset $(X, Y)$ has $n$ examples and $m$ features (the first feature is the constant 1).
The equivalent matrix form is $\hat{Y} = XW$, where $X$ is an $n \times m$ matrix, $W$ is an $m \times 1$ vector of parameters, and $\hat{Y}$ is the $n \times 1$ vector of predicted values. Note that the first element of $W$, $w_0$, is the intercept.

The cost function is $$J(W) = \frac{1}{2n} \sum\limits_{i=1}^n(\hat{y}_i - y_i)^2 \tag{1}$$
In matrix form, the cost function is $$J(W)=\frac{1}{2n}||\hat{Y} - Y||_2^2=\frac{1}{2n}||XW-Y||_2^2 = \frac{1}{2n}(XW-Y)^T(XW-Y)$$
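
As a quick sanity check, the sum form in equation (1) and the matrix form should produce the same number. The sketch below assumes a small made-up dataset (the values of `X`, `Y`, and `W` are illustrative and do not come from the exercises) and evaluates the cost three ways.

```python
import numpy as np

# Illustrative data: 4 examples, 2 features (first column is the constant 1).
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0]])
Y = np.array([[3.0], [4.0], [6.0], [9.0]])
W = np.array([[0.5], [1.0]])  # arbitrary parameter vector

n = X.shape[0]
Y_hat = X @ W  # predictions, shape (n, 1)

# Sum form: (1 / 2n) * sum_i (y_hat_i - y_i)^2
J_sum = sum((Y_hat[i, 0] - Y[i, 0]) ** 2 for i in range(n)) / (2 * n)

# Matrix form: (1 / 2n) * (XW - Y)^T (XW - Y)
residual = Y_hat - Y
J_matrix = (residual.T @ residual).item() / (2 * n)

# Norm form: (1 / 2n) * ||XW - Y||_2^2
J_norm = np.linalg.norm(residual) ** 2 / (2 * n)

print(J_sum, J_matrix, J_norm)  # all three agree (0.375 for this data)
```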

To find the $W$ that minimizes $J$, we calculate the gradient of $J$ with respect to $W$, $$\frac{\partial{J(W)}}{\partial{W}}=\nabla_w(J(W))$$ and set it equal to zero.



$$J(W) = \frac{1}{2n}(XW-Y)^T(XW-Y)$$
$$\frac{\partial{J(W)}}{\partial{W}} = \frac{1}{2n}\frac{\partial{\left((XW-Y)^T(XW-Y)\right)}}{\partial{W}}$$
$$ = \frac{1}{2n}\frac{\partial{\left(W^TX^TXW - Y^TXW - W^TX^TY + Y^TY\right)}}{\partial{W}}$$

Since $W^TX^TY$ is a scalar, it equals its own transpose, $$ W^TX^TY=(W^TX^TY)^T=Y^TXW $$ so the two cross terms combine:

$$ \frac{\partial{J(W)}}{\partial{W}} = \frac{1}{2n}\frac{\partial{\left(W^TX^TXW - 2W^TX^TY + Y^TY\right)}}{\partial{W}}$$

The three terms differentiate as follows:

1. $$ \frac{\partial{(W^TX^TXW)}}{\partial{W}}=2X^TXW $$  
2. $$ \frac{\partial{(2W^TX^TY)}}{\partial{W}}=2X^TY $$ 
3. $$ \frac{\partial{(Y^TY)}}{\partial{W}}=0 $$ 

Using identities 1, 2, and 3, we get

$$ \frac{\partial{J(W)}}{\partial{W}} = \frac{1}{2n}( 2X^TXW - 2X^TY ) = \frac{1}{n}X^T(XW - Y) $$
Setting the gradient equal to 0, we get
$$ \frac{1}{2n}( 2X^TXW - 2X^TY ) = 0 $$

$$ X^TXW = X^TY $$

Multiplying both sides on the left by $(X^TX)^{-1}$ (assuming $X^TX$ is invertible), we get

$$ (X^TX)^{-1}X^TXW = (X^TX)^{-1}X^TY $$
$$ W = (X^TX)^{-1}X^TY $$
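
The result can be checked numerically: at the closed-form $W$, the gradient $\frac{1}{n}X^T(XW-Y)$ should vanish. The sketch below uses randomly generated data (an assumption made purely for illustration) and solves the normal equations $X^TXW = X^TY$ with `np.linalg.solve` instead of forming the inverse explicitly, which is generally the more numerically stable choice.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

n, m = 50, 3
# Illustrative design matrix: intercept column plus random features.
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m - 1))])
Y = rng.normal(size=(n, 1))

# Solve X^T X W = X^T Y for W.
W = np.linalg.solve(X.T @ X, X.T @ Y)

# Gradient of J at W: (1/n) X^T (XW - Y).
grad = X.T @ (X @ W - Y) / n
print(np.abs(grad).max())  # expected to be close to zero (floating-point noise)
```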



## Closed-Form Linear Regression in Code


In [1]:
import numpy as np

Y = np.array([13, 8, 11, 2, 6])
X = np.array([[1, 3, 5],[1, 6, 7],[1, 7, 8],[1, 8, 9],[1, 11, 12]])
W = np.array([0, 0, 0])
J = 0.0

"""
Use the Y, X and W given above to calculate the final W and the cost J
with the closed-form solution derived above.
"""
# ====================== CODE HERE ======================  
n = len(Y)  # number of examples (5 in this case)
X_T = np.transpose(X)  # X transpose
W = np.linalg.inv(X_T @ X) @ X_T @ Y  # normal equation W = (X^T X)^{-1} X^T Y; @ is matrix multiplication
J = (1/(2*n)) * np.linalg.norm(X @ W - Y) ** 2  # cost J(W) = (1/2n) * ||XW - Y||^2
#============================================================

print('Please copy the following result to Question 1 "(sumW = )"')
print(np.round(np.sum(W),2))
print('Please copy the following result to Question 1 "(J = )"')
print(np.round(J,2))


Please copy the following result to Question 1 "(sumW = )"
8.21
Please copy the following result to Question 1 "(J = )"
3.7
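
Explicitly inverting $X^TX$ works for this small example but can lose accuracy when the columns of $X$ are nearly collinear. As a hedged alternative, the sketch below solves the same least-squares problem with `np.linalg.lstsq` on the data from the cell above and compares the result with the inverse-based formula.

```python
import numpy as np

Y = np.array([13, 8, 11, 2, 6])
X = np.array([[1, 3, 5], [1, 6, 7], [1, 7, 8], [1, 8, 9], [1, 11, 12]])

# Inverse-based formula from the derivation.
W_inv = np.linalg.inv(X.T @ X) @ X.T @ Y

# Least-squares solver; rcond=None selects NumPy's default cutoff.
W_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, Y, rcond=None)

n = len(Y)
J = np.linalg.norm(X @ W_lstsq - Y) ** 2 / (2 * n)

print(np.allclose(W_inv, W_lstsq))  # both routes should agree
print(np.round(np.sum(W_lstsq), 2), np.round(J, 2))
```

For well-conditioned problems the two approaches agree to floating-point precision, so the choice mainly matters for larger or nearly rank-deficient design matrices.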


## Linear Regression with Gradient Descent

In [2]:
import numpy as np
import pandas as pd

dataset = pd.read_csv("diabetes.csv")
# selecting features
X_train = np.array(dataset[["age", "sex", "bmi", "bp"]])

# min-max scaling of each feature to [0, 1]; note that (max - x) / (max - min) maps the maximum to 0 and the minimum to 1
X_train = (np.max(X_train,axis=0)-X_train)/(np.max(X_train,axis=0)-np.min(X_train,axis=0))

# adding '1' column for the intercept
X_train = np.concatenate((np.ones(X_train.shape[0]).reshape(X_train.shape[0],1), X_train), axis=1)

# forming target
Y_train = np.array([dataset["target"]])
Y_train = Y_train.reshape(X_train.shape[0],1)
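
To make the preprocessing concrete, the sketch below applies the same scaling expression and intercept column to a tiny made-up array (the values are illustrative and not taken from `diabetes.csv`). Note that $(\max - x)/(\max - \min)$ maps each column's maximum to 0 and its minimum to 1, the reverse of the more common $(x - \min)/(\max - \min)$.

```python
import numpy as np

# Illustrative raw features: 3 examples, 2 features.
X_raw = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

col_max = np.max(X_raw, axis=0)
col_min = np.min(X_raw, axis=0)

# Same expression as in the cell above: values land in [0, 1], order reversed.
X_scaled = (col_max - X_raw) / (col_max - col_min)

# Prepend a column of ones for the intercept term w_0.
X_design = np.hstack([np.ones((X_scaled.shape[0], 1)), X_scaled])

print(X_design)
```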

In [3]:
def linreg_compute_cost(X, Y, W): 
    """
    Args:
      X (ndarray (n,m)): Data, n examples with m features
      Y (ndarray (n,1)): target values
      W (ndarray (m,1)): model parameters
      
    Returns:
      J (scalar): cost
    """
    J = 0
# ====================== CODE HERE ======================  
    n = len(Y) # or 442 in this case
    J = (1/(2*n)) * np.linalg.norm(X @ W - Y) ** 2 

# ============================================================

    return J

In [5]:
def linreg_compute_gradient(X, Y, W): 
    """
    Computes the gradient for linear regression 
    Args:
      X : Data, n examples with m features
      Y : n target values
      W : m model parameters 
      
    Returns:
      dJ_dW : The gradient of the cost w.r.t. the parameters W, m values. 
    """
    dJ_dW = 0

# ====================== CODE HERE ======================  

    n = len(Y) 
    X_T = np.transpose(X) 
    dJ_dW = (1/n) * X_T  @ ((X @ W) - Y)

# ============================================================

    return dJ_dW
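
One way to gain confidence in an analytic gradient is to compare it with a finite-difference approximation of the cost. The sketch below is self-contained: the helper names `cost`, `gradient`, and `numerical_gradient` and the random data are assumptions made for illustration, mirroring the formulas used above.

```python
import numpy as np

def cost(X, Y, W):
    """J(W) = (1/2n) * ||XW - Y||^2."""
    n = len(Y)
    return np.linalg.norm(X @ W - Y) ** 2 / (2 * n)

def gradient(X, Y, W):
    """Analytic gradient (1/n) * X^T (XW - Y)."""
    n = len(Y)
    return X.T @ (X @ W - Y) / n

def numerical_gradient(X, Y, W, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    grad = np.zeros_like(W, dtype=float)
    for j in range(W.shape[0]):
        step = np.zeros_like(W, dtype=float)
        step[j, 0] = eps
        grad[j, 0] = (cost(X, Y, W + step) - cost(X, Y, W - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
Y = rng.normal(size=(20, 1))
W = rng.normal(size=(3, 1))

diff = np.abs(gradient(X, Y, W) - numerical_gradient(X, Y, W)).max()
print(diff)  # should be tiny if the analytic gradient is right
```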

In [6]:

def linreg_gradient_descent(X, Y, W_in, cost_function, gradient_function, alpha, num_iters): 
    """
    Performs batch gradient descent to learn W. Updates W by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      X                   : Data, n examples with m features
      Y                   : n target values
      W_in                : m initial model parameters  
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      W                   : final m values of parameters 
      J (scalar)          : final cost
      """
    W = 0
    J = 0
    
# ====================== YOUR CODE HERE ======================  
# DO NOT use any other import statements for this question
    W = W_in
    for i in range(num_iters):
        dJ_dW = gradient_function(X, Y, W)  # gradient at the current parameters
        W = W - alpha * dJ_dW  # step against the gradient
    J = cost_function(X, Y, W)  # cost at the final parameters

# ===========================================================

    return W, J
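
Since the cost is convex, gradient descent with a suitable learning rate should approach the closed-form solution. The sketch below checks this on synthetic data; the function name `gradient_descent`, the data, and the settings `alpha=0.5` and `num_iters=5000` are assumptions chosen for illustration, not values from the exercise.

```python
import numpy as np

def gradient_descent(X, Y, W_in, alpha, num_iters):
    """Batch gradient descent on J(W) = (1/2n) * ||XW - Y||^2."""
    n = len(Y)
    W = W_in.copy()
    for _ in range(num_iters):
        W = W - alpha * (X.T @ (X @ W - Y) / n)
    return W

rng = np.random.default_rng(2)
n = 100
# Illustrative data: intercept plus two features already scaled to [0, 1].
X = np.hstack([np.ones((n, 1)), rng.uniform(size=(n, 2))])
true_W = np.array([[1.0], [2.0], [-3.0]])
Y = X @ true_W + 0.1 * rng.normal(size=(n, 1))

W_gd = gradient_descent(X, Y, np.zeros((3, 1)), alpha=0.5, num_iters=5000)
W_closed = np.linalg.solve(X.T @ X, X.T @ Y)

print(np.abs(W_gd - W_closed).max())  # gap should shrink as num_iters grows
```

With too few iterations or too small a learning rate the two solutions can differ noticeably, which is one reason a gradient-descent result need not match the closed-form optimum exactly.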

In [8]:
# initialize parameters
initial_W = np.ones((X_train.shape[1], 1))

# gradient descent settings
iterations = 1000
alpha = 0.05

"""
Apply functions coded above to calculate final W and cost J
Use given datasets and parameters
"""
W = 0
# ====================== CODE HERE ======================  
W, J = linreg_gradient_descent(X_train, Y_train, initial_W, linreg_compute_cost, linreg_compute_gradient, alpha, iterations)
# ============================================================

print('Please copy the following result line to Question 2 "(sumW = )"')
print(np.round(np.sum(W),2))
print('Please copy the following result line to Question 3 "(J = )"')
print(np.round(J,2))



Please copy the following result line to Question 2 "(sumW = )"
72.02
Please copy the following result line to Question 3 "(J = )"
1923.46
