## Logistic Regression from scratch (Binary Classification)
Implementation of Logistic Regression using Gradient Descent.

#### Sources:

[Cost Function and Gradient Descent](https://www.coursera.org/learn/machine-learning/supplement/0hpMl/simplified-cost-function-and-gradient-descent)

#### Optimization algorithms:
[[advanced-optimization]](https://www.coursera.org/learn/machine-learning/lecture/licwf/advanced-optimization)
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS
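This notebook implements gradient descent by hand. The other three methods are normally called through a library. The sketch below assumes SciPy is installed and uses `scipy.optimize.minimize` with `method="CG"`, `"BFGS"` and `"L-BFGS-B"` (SciPy's limited-memory BFGS, a variant that also supports bounds). The cost and gradient are the formulas given later in this notebook, redefined here so the cell runs on its own. The toy data is made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: a column of ones (bias) plus one feature.
# The labels overlap, so the cost has a finite minimiser.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 1.0, 0.0, 1.0])

def cost(theta, X, y):
    # J(theta) = (1/m) * (-y^T log(h) - (1 - y)^T log(1 - h)), h = g(X theta)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / len(y)

def grad(theta, X, y):
    # Gradient of J: (1/m) * X^T (g(X theta) - y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (h - y) / len(y)

theta0 = np.zeros(X.shape[1])
results = {}
for method in ("CG", "BFGS", "L-BFGS-B"):
    # Each method minimises the same cost from the same starting point.
    results[method] = minimize(cost, theta0, args=(X, y), jac=grad, method=method)
    print(method, results[method].x, results[method].fun)
```

Passing `jac=grad` supplies the analytic gradient. Without it, SciPy estimates the gradient with finite differences, which costs extra cost-function evaluations.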



```python
from math import exp
import numpy as np
```

### Hypothesis function

The logistic regression hypothesis is defined as:

$$h_\theta(x) = g(\theta^Tx),$$

where $g$ is the sigmoid (logistic) function, defined as:

$$g(z)= \frac{1}{1+e^{-z}}$$
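A minimal sketch of the hypothesis in NumPy. It assumes the common convention (not stated above) that every input vector carries a constant feature $x_0 = 1$, so that $\theta_0$ acts as an intercept. The values of `theta` and `x` are hypothetical. Because `X @ theta` works for both a single example and a matrix with one example per row, the same function also evaluates $g(X\theta)$ for a whole batch.

```python
import numpy as np

def hypothesis(theta, X):
    # h_theta(x) = g(theta^T x). X @ theta works for a single example
    # (1-D x) and for a matrix with one example per row.
    z = X @ theta
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input; x_0 = 1 is the assumed bias feature.
theta = np.array([-1.0, 0.5])
x = np.array([1.0, 2.0])
print(hypothesis(theta, x))  # theta^T x = -1 + 1 = 0, so this prints 0.5

# Three examples at once: theta^T x = 0, 1, -1 for the three rows.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0]])
print(hypothesis(theta, X))
```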


We now implement the sigmoid function.

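```python
def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)). np.exp works element-wise, so z can be a
    # scalar or a NumPy array such as X @ theta (math.exp only accepts scalars).
    return 1 / (1 + np.exp(-z))
```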

### Properties of the Sigmoid function:
[[decision-boundary]](https://www.coursera.org/learn/machine-learning/supplement/N8qsm/decision-boundary)

$$z = 0 \implies e^{-z} = 1 \implies g(z) = 0.5$$

$$z \rightarrow \infty \implies e^{-z} \rightarrow 0 \implies g(z) \rightarrow 1$$

$$z \rightarrow -\infty \implies e^{-z} \rightarrow \infty \implies g(z) \rightarrow 0$$
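Because $g$ is increasing and $g(0) = 0.5$, $g(z) \geq 0.5$ exactly when $z \geq 0$. Under the usual rule of predicting $y = 1$ when $h_\theta(x) \geq 0.5$ (the threshold used on the linked decision-boundary page), the prediction therefore depends only on the sign of $\theta^Tx$. A small sketch, with hypothetical parameters whose boundary is the line $x_1 = 2$ and a hypothetical `predict` helper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X, threshold=0.5):
    # Predict 1 where h_theta(x) >= threshold, else 0.
    return (sigmoid(X @ theta) >= threshold).astype(int)

# Hypothetical parameters: theta^T x = -2 + x_1, so the boundary is x_1 = 2.
theta = np.array([-2.0, 1.0])
X = np.array([[1.0, 0.0], [1.0, 1.9], [1.0, 2.0], [1.0, 5.0]])
print(predict(theta, X))             # [0 0 1 1]
print((X @ theta >= 0).astype(int))  # same labels, no sigmoid needed
```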

### Testing Sigmoid Function

```python
print('z=0, g(z) = {}'.format(sigmoid(0)))
print('z=20, g(z) = {}'.format(sigmoid(20)))
print('z=-20, g(z) = {}'.format(sigmoid(-20)))
```

```
z=0, g(z) = 0.5
z=20, g(z) = 0.9999999979388463
z=-20, g(z) = 2.0611536181902037e-09
```


### Vectorised Cost Function:
[[cost-function]](https://www.coursera.org/learn/machine-learning/supplement/bgEt4/cost-function)

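We want to find the parameters $\theta$ that minimise the cost function. Here $X$ is the design matrix (one training example per row), $y$ is the vector of $m$ labels in $\{0, 1\}$, and $\log$ is applied element-wise:

$$h = g(X\theta)$$

$$J(\theta) = \frac{1}{m}\left(-y^{T}\log(h) - (1-y)^{T}\log(1-h)\right)$$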

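```python
def cost_function(x, y, theta):
    # Logistic regression cost J(theta) for design matrix x and labels y in {0, 1}.
    m = y.shape[0]
    # h = g(X theta); np.exp is applied element-wise to the vector X theta.
    h = 1 / (1 + np.exp(-(x @ theta)))
    return (1 / m) * (-y.T @ np.log(h) - (1 - y).T @ np.log(1 - h))
```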
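A quick sanity check for the cost, assuming a design matrix whose first column is all ones. With $\theta = 0$ every prediction is $h = 0.5$, so each term of the sum is $\log(0.5)$ regardless of the label, and $J(0) = \log 2 \approx 0.693$. A parameter vector that separates the classes reasonably well should give a lower cost. The function is repeated so the cell is self-contained. The data and the second `theta` are made up.

```python
import numpy as np

def cost_function(x, y, theta):
    # J(theta) = (1/m) * (-y^T log(h) - (1 - y)^T log(1 - h)), h = g(X theta)
    m = y.shape[0]
    h = 1.0 / (1.0 + np.exp(-(x @ theta)))
    return (-y.T @ np.log(h) - (1 - y).T @ np.log(1 - h)) / m

# Hypothetical toy data with a bias column of ones.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0])

# theta = 0 gives h = 0.5 everywhere, so J should equal log(2).
J0 = cost_function(X, y, np.zeros(2))
# Hypothetical better parameters: boundary at x_1 = 2.5.
J_better = cost_function(X, y, np.array([-5.0, 2.0]))
print(J0, np.log(2), J_better)
```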
    

### Gradient Descent:
[[gradient-descent]](https://www.coursera.org/learn/machine-learning/supplement/0hpMl/simplified-cost-function-and-gradient-descent)

We will use Gradient Descent to minimise the cost function.

General form of gradient descent:
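\begin{align}
&\text{Repeat } \{ \\
&\qquad \theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta) \quad \text{(simultaneously for all } j\text{)} \\
&\}
\end{align}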
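A sketch of the general update loop, applied to a made-up quadratic cost whose minimum is known, $\theta = (3, -1)$, so the result is easy to check. Subtracting the whole gradient vector in one NumPy operation makes the update simultaneous: every $\theta_j$ is computed from the old $\theta$. `gradient_descent` and `grad_J` are hypothetical names.

```python
import numpy as np

def gradient_descent(grad, theta0, alpha=0.1, num_iters=100):
    # Repeat theta := theta - alpha * grad(theta). Updating the whole vector
    # at once is the simultaneous update of every theta_j.
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(num_iters):
        theta = theta - alpha * grad(theta)
    return theta

# Hypothetical cost J(theta) = (theta_0 - 3)^2 + (theta_1 + 1)^2,
# minimised at theta = (3, -1).
def grad_J(theta):
    return np.array([2 * (theta[0] - 3), 2 * (theta[1] + 1)])

theta_min = gradient_descent(grad_J, [0.0, 0.0])
print(theta_min)
```

For this cost, each step multiplies the distance to the minimum by $1 - 2\alpha = 0.8$, so 100 iterations bring it very close to $(3, -1)$.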


Working out the partial derivatives gives the vectorised update rule:

\begin{align}
\theta := \theta - \frac{\alpha}{m} X^T\left(g(X\theta) - y\right)
\end{align}
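Putting the pieces together: a minimal training loop that applies the vectorised update and records $J(\theta)$ after each step. It also includes a central finite-difference check that $\frac{1}{m}X^T(g(X\theta) - y)$ matches the numerical gradient of the cost. The toy data, the learning rate `alpha` and the iteration count are illustrative choices, not values from the source.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(X, y, theta):
    # J(theta) = (1/m) * (-y^T log(h) - (1 - y)^T log(1 - h))
    m = y.shape[0]
    h = sigmoid(X @ theta)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m

def gradient(X, y, theta):
    # Vectorised gradient: (1/m) * X^T (g(X theta) - y)
    m = y.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m

def gradient_descent(X, y, theta, alpha=0.1, num_iters=1000):
    # theta := theta - (alpha/m) * X^T (g(X theta) - y), repeated num_iters times.
    history = []
    for _ in range(num_iters):
        theta = theta - alpha * gradient(X, y, theta)
        history.append(cost_function(X, y, theta))
    return theta, history

# Hypothetical toy data: bias column of ones plus one feature. The labels
# overlap (x = 2.0 is 1, x = 2.5 is 0), so the minimum is finite.
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0],
              [1.0, 2.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])

theta, history = gradient_descent(X, y, np.zeros(2), alpha=0.1, num_iters=5000)
print(theta, history[0], history[-1])

# Derivative check: compare the analytic gradient with central differences.
eps = 1e-6
theta_test = np.array([0.3, -0.2])
numeric = np.array([
    (cost_function(X, y, theta_test + eps * e)
     - cost_function(X, y, theta_test - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
max_err = np.max(np.abs(numeric - gradient(X, y, theta_test)))
print(max_err)
```

With a small enough `alpha`, the recorded cost should decrease at every iteration. If it grows or oscillates, reducing `alpha` is the usual first step.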
