## Day 2
- Gradient Descent
- Parameter Estimation for Linear Regression
- Parameter Estimation for Logistic Regression

One commonly used technique for estimating model parameters is gradient descent. First, let's understand how gradient descent works using a simple example: minimizing a polynomial.

Assume I wanted to solve the following problem:
**$\arg\min_x f(x), \quad f(x) = x^2+4x+18$**

The gradient descent algorithm repeatedly applies the update below, where $\eta$ is the learning rate (step size):
$x_{new} := x_{old}-\eta\frac{df(x)}{dx}$

In [1]:
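import numpy as np

def grad(x):
    # Derivative of f: f'(x) = 2x + 4
    return 2*x+4

def f(x):
    # Objective to minimize
    return x**2+4*x+18

X=0        # starting point
eta=0.01   # learning rate
for i in range(5):
    gradient=grad(X)     # slope at the current point
    X=X-eta*gradient     # step against the gradient
    func=f(X)
    print(f"Value of function is {func}, value of gradient is {gradient}, value of X is {X}")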

Value of function is 17.8416, value of gradient is 4, value of X is -0.04
Value of function is 17.68947264, value of gradient is 3.92, value of X is -0.07919999999999999
Value of function is 17.543369523456, value of gradient is 3.8416, value of X is -0.117616
Value of function is 17.403052090327144, value of gradient is 3.764768, value of X is -0.15526368000000002
Value of function is 17.268291227550186, value of gradient is 3.68947264, value of X is -0.1921584064


In [2]:
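class GradientDescent():
    """Minimize a function f given a function grad that returns its gradient."""
    def __init__(self,grad,f):
        self.grad=grad
        self.f=f
    def estimate(self,eta,n_iter,x_0):
        # Run n_iter gradient steps of size eta, starting from x_0
        x=x_0
        for i in range(n_iter):
            gradient=self.grad(x)
            x=x-eta*gradient
        # Report the function value, parameter and gradient at the final point
        result={'function_value':self.f(x),'parameter_value':x,'gradient_value':self.grad(x)}
        return result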

In [3]:
gradient_descent=GradientDescent(grad,f)

In [4]:
gradient_descent.estimate(eta=0.01,n_iter=1000,x_0=0)

{'function_value': 14.0,
 'parameter_value': -1.9999999966340651,
 'gradient_value': 6.731869728326956e-09}
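
The choice of $\eta$ matters. For this particular $f$, each update multiplies the distance to the minimizer $x=-2$ by $(1-2\eta)$, so the iteration converges only when $0<\eta<1$. The sketch below illustrates this for a few step sizes; the helper names `grad_f` and `run_gd` and the chosen values of `eta` are illustrative assumptions, not part of the notebook above.

```python
def grad_f(x):
    # Derivative of f(x) = x**2 + 4*x + 18 (same as grad above, renamed for this sketch)
    return 2 * x + 4

def run_gd(eta, n_iter=50, x_0=0.0):
    """Run plain gradient descent for n_iter steps and return the final x."""
    x = x_0
    for _ in range(n_iter):
        x = x - eta * grad_f(x)
    return x

for eta in (0.01, 0.1, 0.5, 1.1):
    x_final = run_gd(eta)
    # Distance from the analytic minimizer x = -2
    print(f"eta={eta}: final x = {x_final:.6f}, |x + 2| = {abs(x_final + 2):.3e}")
```

A small step size moves slowly, a moderate one converges quickly, and a step size above 1 makes the iterates move away from the minimum.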

Let's apply gradient descent to the parameter estimation problem for linear regression. The CSC321 lecture notes on linear regression (https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/readings/L02%20Linear%20Regression.pdf) illustrate how this can be done:

For $RSS=(y-XW)^T(y-XW)$, the gradient with respect to $W$ is $2X^T(XW-y)$.
So, absorbing the constant factor into $\eta$, the update rule will be:
- $W_{new}=W_{old}-\eta X^T(XW_{old}-y)$
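
A quick way to sanity-check the sign and scale of an analytic gradient is to compare it with central finite differences. The sketch below does this for the RSS gradient on small synthetic data; the names `X_demo`, `y_demo`, `W_demo`, `rss` and `rss_grad` are hypothetical and the data is random, not the regression dataset used below.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic design matrix: an intercept column plus one random feature
X_demo = np.column_stack([np.ones(20), rng.normal(size=20)])
y_demo = rng.normal(size=20)
W_demo = rng.normal(size=2)

def rss(W):
    # Residual sum of squares for weights W
    resid = y_demo - X_demo @ W
    return resid @ resid

def rss_grad(W):
    # Analytic gradient of the RSS with respect to W
    return 2 * X_demo.T @ (X_demo @ W - y_demo)

eps = 1e-6
# Central differences along each coordinate direction
numeric = np.array([
    (rss(W_demo + eps * e) - rss(W_demo - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
print("numeric :", numeric)
print("analytic:", rss_grad(W_demo))
```

If the two printed vectors disagree in sign, the gradient (and therefore the update direction) is wrong.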

We also need to initialize the weight vector $W$; the code below starts from a vector of ones.

In [5]:
import pandas as pd
data=pd.read_csv("data/regression.csv")

In [6]:
X=data[['cylinders']].copy()
X['intercept']=1
X=X[['intercept','cylinders']]
X=X.values
y=data['mpg']
y=y.values

In [7]:
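class LinearRegression:
    def __init__(self,X,y,lr=0.01):
        self.X=X      # design matrix, including the intercept column
        self.y=y      # target vector
        self.lr = lr  # learning rate

    def fit(self):
        # Start from a weight vector of ones
        W=np.ones(self.X.shape[1])
        for i in range(1000):
            y_pred=np.matmul(self.X,W)
            resid=self.y-y_pred
            # Gradient of the mean squared error: -2 X^T (y - XW) / n
            grad=np.matmul(self.X.T,resid)
            grad=(-2*grad)/len(self.y)
            W=W-self.lr*grad
        return W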

In [8]:
mod = LinearRegression(X,y)
mod.fit()

array([35.52255148, -2.31847888])
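
A fixed number of iterations does not guarantee convergence, so it can help to compare the gradient descent estimate with a direct least-squares solution. The sketch below does this on synthetic data that loosely mimics a cylinders-style predictor; the function `fit_gd`, the `_syn` variables and the true coefficients (40 and -3) are assumptions made for illustration and are not derived from `data/regression.csv`.

```python
import numpy as np

def fit_gd(X, y, lr=0.01, n_iter=1000):
    """Gradient descent on mean squared error, starting from a vector of ones."""
    W = np.ones(X.shape[1])
    for _ in range(n_iter):
        grad = -2 * X.T @ (y - X @ W) / len(y)
        W = W - lr * grad
    return W

rng = np.random.default_rng(0)
x_syn = np.tile([4.0, 6.0, 8.0], 30)                    # 90 synthetic predictor values
X_syn = np.column_stack([np.ones_like(x_syn), x_syn])   # intercept + predictor
y_syn = 40.0 - 3.0 * x_syn + rng.normal(scale=1.0, size=x_syn.size)

# Reference solution from NumPy's least-squares solver
W_exact, *_ = np.linalg.lstsq(X_syn, y_syn, rcond=None)
W_short = fit_gd(X_syn, y_syn, n_iter=1000)
W_long = fit_gd(X_syn, y_syn, n_iter=20000)

print("least squares :", W_exact)
print("GD, 1000 steps:", W_short)
print("GD, 20000 steps:", W_long)
```

On this synthetic data the 1000-step run is still noticeably away from the least-squares solution, while the longer run matches it closely. The intercept and slope columns are strongly correlated, which slows gradient descent down; centering or scaling the predictor is a common remedy.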

## Beta Estimation for Logistic Regression

The loss function for logistic regression can be written as:

$Loss = -[y\log(p)+(1-y)\log(1-p)]$

The gradient of the loss function with respect to the weights $W$, summed over all observations, can be written as:

$X^T(p-y)$
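
One way to see where this expression comes from is the chain rule, assuming $p=\sigma(z)$ with $z=XW$ and using $\sigma'(z)=p(1-p)$. The symbols $z$, $\sigma$, $Loss_i$ and $x_i$ (row $i$ of $X$) are introduced here only for this sketch:

$$
\begin{aligned}
\frac{\partial Loss_i}{\partial z_i} &= -\left[\frac{y_i}{p_i}-\frac{1-y_i}{1-p_i}\right]p_i(1-p_i) = -\left[y_i(1-p_i)-(1-y_i)p_i\right] = p_i-y_i \\
\nabla_W \sum_i Loss_i &= \sum_i (p_i-y_i)\,x_i^T = X^T(p-y)
\end{aligned}
$$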

In [9]:
cls=pd.read_csv("./data/classification.csv")

In [10]:
cls.head(2)

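| Unnamed: 0 | No_pregnant | Plasma_glucose | Blood_pres | Skin_thick | Serum_insu | BMI | Diabetes_func | Age | Class |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |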


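We model the log-odds of the positive class as a linear function of the number of pregnancies:

$\log\left(\frac{p}{1-p}\right)=\beta_0+\beta_1\,\text{NoPreg}$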
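
Solving the log-odds relation $\log\left(\frac{p}{1-p}\right)=z$ for $p$ gives the sigmoid $p=\frac{1}{1+e^{-z}}$, which is how the linear predictor is turned into a probability. The short check below confirms numerically that the two functions are inverses; `sigmoid`, `logit`, `z_demo` and `p_demo` are illustrative names and values.

```python
import numpy as np

def sigmoid(z):
    # Maps a linear predictor z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    # Log-odds: the inverse of the sigmoid
    return np.log(p / (1.0 - p))

z_demo = np.array([-2.0, 0.0, 0.5, 3.0])
p_demo = sigmoid(z_demo)
print("probabilities:", p_demo)
print("recovered z  :", logit(p_demo))
```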

In [11]:
W=np.ones(2)
X=cls[['No_pregnant']].copy()
X['intercept']=1
X=X[['intercept','No_pregnant']].values
y=cls['Class'].values

In [12]:
z=np.matmul(X,W) ## b0+b1X1

In [13]:
p=1/(1+np.exp(-z))

In [14]:
def grad(X,resid):
    return np.matmul(X.T,resid)

In [15]:
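lr=0.01          # learning rate
W=np.ones(2)     # start from a weight vector of ones
for i in range(1000):
    z=np.matmul(X,W)        # linear predictor b0+b1X1
    p=1/(1+np.exp(-z))      # predicted probabilities
    resid=p-y
    g=grad(X,resid)         # X^T(p-y)
    g=g/len(y)              # average over observations
    W=W-lr*g
W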

array([-0.43042259,  0.02710281])
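
To judge whether 1000 steps are enough, it can help to record the loss at every iteration. The sketch below wraps the same update in a function and tracks the average log loss on synthetic data; `fit_logistic`, `log_loss`, the `_syn` variables and the true coefficients (-1 and 0.2) are assumptions for illustration and do not come from `classification.csv`.

```python
import numpy as np

def log_loss(X, y, W):
    """Average negative log-likelihood for weights W."""
    p = 1.0 / (1.0 + np.exp(-(X @ W)))
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic(X, y, lr=0.01, n_iter=1000):
    """Gradient descent for logistic regression, recording the loss after each step."""
    W = np.ones(X.shape[1])
    losses = []
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ W)))
        W = W - lr * X.T @ (p - y) / len(y)
        losses.append(log_loss(X, y, W))
    return W, losses

rng = np.random.default_rng(0)
x_syn = rng.integers(0, 10, size=200).astype(float)     # count-like predictor
X_syn = np.column_stack([np.ones_like(x_syn), x_syn])   # intercept + predictor
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 0.2 * x_syn)))
y_syn = (rng.random(200) < p_true).astype(float)        # simulated 0/1 labels

W_fit, losses = fit_logistic(X_syn, y_syn)
print("weights:", W_fit)
print("loss after first step:", losses[0], "loss after last step:", losses[-1])
```

If the loss is still falling noticeably at the last step, more iterations or a larger learning rate may be worth trying.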