## Logistic Regression from scratch (Binary Classification)
Implementation of Logistic Regression using Gradient Descent.

#### Sources:

[Cost Function and Gradient Descent](https://www.coursera.org/learn/machine-learning/supplement/0hpMl/simplified-cost-function-and-gradient-descent)


#### Optimization algorithms:
[[advanced-optimization]](https://www.coursera.org/learn/machine-learning/lecture/licwf/advanced-optimization)
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS


#### Useful Links:

[Logistic regression (binary) - computing the gradient](https://www.youtube.com/watch?v=hWLdFMccpTY)

In [1]:
import pandas as pd
import numpy as np

### Hypothesis function

The Logistic Regression hypothesis function is defined as:

$$h_\theta(x) = g(\theta^Tx),$$

where $g$ is the Sigmoid function, defined as:

$$g(z)= \frac{1}{1+e^{-z}}$$


We implement the Sigmoid function first.

In [2]:
def sigmoid_function(z):
    # g(z) = 1 / (1 + e^(-z)); applied element-wise when z is a NumPy array
    return 1/(1+(np.exp(-z)))

### Properties of Sigmoid function:
[[decision-boundary]](https://www.coursera.org/learn/machine-learning/supplement/N8qsm/decision-boundary)

$$z = 0 \implies e^{-z} = 1 \implies g(z) = 0.5$$

$$z \rightarrow \infty \implies e^{-z} \rightarrow 0 \implies g(z) \rightarrow 1$$

$$z \rightarrow -\infty \implies e^{-z} \rightarrow \infty \implies g(z) \rightarrow 0$$

### Test Sigmoid Function

In [62]:
print('z=0, g(z) = {}'.format(sigmoid_function(0)))
print('z=20, g(z) = {}'.format(sigmoid_function(20).round(8)))
print('z=-20, g(z) = {}'.format(sigmoid_function(-20).round(8)))

z=0, g(z) = 0.5
z=20, g(z) = 1.0
z=-20, g(z) = 0.0
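
For inputs of very large magnitude (for example `z = -1000`), `np.exp(-z)` overflows and NumPy emits a runtime warning, even though the result still ends up at 0. The following is a minimal sketch of a numerically stable variant; the name `stable_sigmoid` is an assumption introduced here and is not part of the original notebook.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow in np.exp (hypothetical helper).

    Scalars are promoted to length-1 arrays by np.atleast_1d.
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so the standard form cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)); here exp(z) < 1.
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))
```

Both branches compute the same function as `sigmoid_function`; they differ only in which exponential is evaluated, so the argument passed to `np.exp` is never positive.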


### Vectorised Cost Function:
[[cost-function]](https://www.coursera.org/learn/machine-learning/supplement/bgEt4/cost-function)

We will try to minimise the cost function.

$$h = g(X\theta)$$

$$J(\theta) = \frac{1}{m} \cdot \left(-y^{T}\log(h) - (1-y)^{T}\log(1-h)\right)$$
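
The vectorised expression is the per-example log loss averaged over the $m$ training examples. As a sketch, writing $h^{(i)} = g(\theta^T x^{(i)})$ for the prediction on example $i$ (this index notation is an assumption introduced here for illustration):

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h^{(i)} + (1-y^{(i)})\log(1-h^{(i)})\right]$$

Each inner product such as $y^{T}\log(h)$ equals the sum $\sum_{i} y^{(i)}\log h^{(i)}$, which is why the two forms agree.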

In [63]:
def cost_function(x, y, theta):
    # m: number of training examples
    m = y.shape[0]
    # h: predicted probabilities, shape (m, 1)
    z = x @ theta
    h = sigmoid_function(z)
    # vectorised cost; the result is a (1, 1) array
    J = (1/m) * ( (-y.T @ np.log(h)) - ((1-y).T @ np.log(1-h)) )
    return J

### Test Cost Function

In [64]:
# NOTE: binary classification expects labels in {0, 1}. The labels below only
# serve as a smoke test: with theta = 0 every prediction is 0.5, so the cost
# equals log(2) regardless of y.
df = pd.DataFrame({'x1': [2,3,5], 'x2': [3,4,8], 'y': [1,2,3]})

y = df[['y']].to_numpy()
x = df[['x1', 'x2']].to_numpy()

# add column of ones for theta0 (intercept) term
x = np.append(np.ones((3, 1)), x, axis=1)

# Construct initial theta vector after adding a column of ones to x
theta = np.zeros((x.shape[1], 1))

df.head()

|   | x1 | x2 | y |
|---|----|----|---|
| 0 | 2  | 3  | 1 |
| 1 | 3  | 4  | 2 |
| 2 | 5  | 8  | 3 |


In [65]:
theta

array([[0.],
       [0.],
       [0.]])

In [66]:
cost_function(x, y, theta)

array([[0.69314718]])

### Gradient Descent:
[[gradient-descent]](https://www.coursera.org/learn/machine-learning/supplement/0hpMl/simplified-cost-function-and-gradient-descent)

We will use Gradient Descent to minimise the cost function.

General form of gradient descent:
\begin{align}
&\text{Repeat } \{ \\
&\quad \theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta) \\
&\}
\end{align}
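
A sketch of how the partial derivative in the update rule works out, using the sigmoid identity $g'(z) = g(z)(1-g(z))$. The per-example notation $h^{(i)} = g(\theta^T x^{(i)})$ and $x_j^{(i)}$ (feature $j$ of example $i$) is an assumption introduced here for illustration:

\begin{align}
\frac{\partial}{\partial\theta_j}J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{h^{(i)}} - \frac{1-y^{(i)}}{1-h^{(i)}}\right] h^{(i)}(1-h^{(i)})\,x_j^{(i)} \\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}(1-h^{(i)}) - (1-y^{(i)})\,h^{(i)}\right] x_j^{(i)} \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h^{(i)} - y^{(i)}\right)x_j^{(i)}
\end{align}

Stacking these components over all $j$ gives the vector $\frac{1}{m}X^T(g(X\theta) - y)$.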


Vectorised implementation after working out the partial derivatives:

\begin{align}
\theta := \theta - \frac{\alpha}{m} X^T(g(X\theta) - y)
\end{align}
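
The vectorised update can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names `sigmoid`, `cost` and `gradient_descent`, the learning rate `alpha=0.1`, the iteration count, and the toy data set are all hypothetical choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, theta):
    # Vectorised cost J(theta), returned as a Python float
    m = y.shape[0]
    h = sigmoid(X @ theta)
    return ((-y.T @ np.log(h)) - ((1 - y).T @ np.log(1 - h))).item() / m

def gradient_descent(X, y, theta, alpha=0.1, num_iters=1000):
    """Repeat theta := theta - (alpha / m) * X^T (g(X theta) - y)."""
    m = y.shape[0]
    history = []  # cost after each update, useful for checking convergence
    for _ in range(num_iters):
        theta = theta - (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
        history.append(cost(X, y, theta))
    return theta, history

# Hypothetical toy data: one feature, binary labels, overlapping classes
X_raw = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]])
y = np.array([[0], [0], [1], [0], [1], [1]])

# Add a column of ones for the intercept term, then start from theta = 0
X = np.append(np.ones((X_raw.shape[0], 1)), X_raw, axis=1)
theta0 = np.zeros((X.shape[1], 1))

theta, history = gradient_descent(X, y, theta0, alpha=0.1, num_iters=1000)
print('initial cost = {:.4f}, final cost = {:.4f}'.format(
    cost(X, y, theta0), history[-1]))

# Predict class 1 when g(X theta) >= 0.5, which is equivalent to X theta >= 0
predictions = (sigmoid(X @ theta) >= 0.5).astype(int)
```

The toy classes overlap on purpose: on perfectly separable data the parameters keep growing and `np.log` can receive values that round to 0, so a real implementation would likely need clipping or regularisation.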
