# Logistic Regression
Logistic regression is a statistical technique that models the probability of an event given one or more independent variables. It is mainly used for binary classification tasks.

Unlike linear regression, where the hypothesis function `y = wx + b` produces unbounded values, here the output must be either 0 or 1 (False or True).

So the logistic regression model is as follows,

$$
\hat{y} = \text{sigmoid}(wx + b)
$$

sigmoid is an activation function that is defined as,

$$
\text{sigmoid}(z) = \frac{1}{1 + e^{-z}}
$$

Where `e` is Euler's number, approximately 2.71828, and `z` is the linear combination `wx + b` of the inputs.
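
As a quick illustration, the sigmoid can be evaluated for a few values of `z` using only the standard library. This is a minimal sketch: the sample inputs are arbitrary and are not taken from the dataset used later.

```python
import math

def sigmoid(z):
    """Map any real number z to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative z approaches 0, z = 0 gives exactly 0.5,
# and large positive z approaches 1.
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))
```

Because the output always lies between 0 and 1, it can be read as the probability of the positive class.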

## Cost Function (loss)

The cost function used for logistic regression should be convex. The mean squared error used in linear regression is convex there, but it becomes non-convex once the prediction is passed through the sigmoid (refer to the Note at the end for more details). A common cost function for logistic regression is Binary Cross Entropy (BCE), which stays convex in the weights and bias:

$$
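BCE = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
$$

where $\hat{y}_i$ is the predicted value for sample $i$ from the logistic regression equation, $y_i$ is its true label and $m$ is the number of samples.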
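
To get a feel for the formula, the sketch below computes BCE for two sets of predictions. The labels and probabilities are made up for illustration, and the helper name `binary_cross_entropy` and the `epsilon` guard are assumptions of this sketch.

```python
import math

def binary_cross_entropy(y_true, y_prob, epsilon=1e-9):
    """Average BCE over all samples; epsilon keeps log() away from log(0)."""
    m = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += y * math.log(p + epsilon) + (1 - y) * math.log(1 - p + epsilon)
    return -total / m

y_true = [1, 0, 1, 0]             # made-up labels
mostly_right = [0.9, 0.1, 0.8, 0.2]
mostly_wrong = [0.1, 0.9, 0.2, 0.8]

print(binary_cross_entropy(y_true, mostly_right))  # small loss
print(binary_cross_entropy(y_true, mostly_wrong))  # much larger loss
```

Predictions that are confidently wrong are penalised far more heavily than predictions that are confidently right are rewarded, which is what pushes the model towards well-calibrated probabilities.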

## Calculating Gradient partials 

For weights ($x_{ij}$ is the value of feature $j$ in sample $i$),

$$
\frac{\partial BCE}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i) x_{ij}
$$

For bias,
$$
\frac{\partial BCE}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)
$$


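The partials can then be used in the standard gradient descent updates, $w_j \leftarrow w_j - \alpha \frac{\partial BCE}{\partial w_j}$ and $b \leftarrow b - \alpha \frac{\partial BCE}{\partial b}$, where $\alpha$ is the learning rate.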
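
The partials and the update step can be sketched on a tiny dataset in plain Python. Everything here is an assumption made for illustration: the toy data, the helper names `predict_proba` and `compute_partials`, the zero initialisation, the learning rate and the number of iterations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(X, w, b):
    """sigmoid(w . x + b) for every row x of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]

def compute_partials(X, y, w, b):
    """Partial derivatives of BCE with respect to each weight and the bias."""
    m = len(y)
    errors = [p - t for p, t in zip(predict_proba(X, w, b), y)]
    dw = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(len(w))]
    db = sum(errors) / m
    return dw, db

# Made-up, linearly separable toy data: two features per sample.
X = [[0.5, 1.0], [1.0, 1.5], [3.0, 3.5], [4.0, 3.0]]
y = [0, 0, 1, 1]

w, b = [0.0, 0.0], 0.0   # assumption: start from zeros
learning_rate = 0.1      # arbitrary choice for this sketch
for _ in range(5000):
    dw, db = compute_partials(X, y, w, b)
    w = [wj - learning_rate * dwj for wj, dwj in zip(w, dw)]
    b -= learning_rate * db

# Probabilities for the four samples after training.
print([round(p, 3) for p in predict_proba(X, w, b)])
```

After training, the first two samples should receive probabilities below 0.5 and the last two above 0.5, matching their labels.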

### Implementation
The implementation involves a `cost_function()` for BCE, `compute_partials()` for the partials, `train()` for fitting with gradient descent, and a `pred()` for predictions.

In [14]:
import numpy as np
from sklearn import datasets as ds
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

In [16]:
np.random.seed(123)
# Breast cancer dataset: 569 samples, 30 features, 2 target classes.
# Docs: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer
bc = ds.load_breast_cancer()
X, y = bc.data, bc.target
print(len(bc.feature_names))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

30


In [None]:
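class logisticRegression:
    def __init__(self, learning_rate=0.01, epochs=100):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights = None
        self.bias = None

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def cost_function(self, y, y_pred):
        epsilon = 1e-9  # avoids log(0), which evaluates to -inf
        y1 = y * np.log(y_pred + epsilon)
        y2 = (1 - y) * np.log(1 - y_pred + epsilon)
        return -np.mean(y1 + y2)  # average over the m samples

    def compute_partials(self, X, y, y_pred):
        # partials of BCE with respect to the weights and the bias
        dw = X.T @ (y_pred - y) / len(y)
        db = np.mean(y_pred - y)
        return dw, db

    def train(self, X, y):  # basically gradient descent
        self.weights = np.zeros(X.shape[1])  # assumption: start from zeros
        self.bias = 0.0
        for _ in range(self.epochs):
            y_pred = self.sigmoid(X @ self.weights + self.bias)
            dw, db = self.compute_partials(X, y, y_pred)
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def pred(self, X):  # assumption: 0.5 threshold on the probability
        return (self.sigmoid(X @ self.weights + self.bias) >= 0.5).astype(int)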

### Note
**Why use a different loss function from linear regression?**

If you try to use linear regression's cost function in a logistic regression problem, you end up with a non-convex function: a weirdly shaped graph with no easy-to-find global minimum.

This happens because logistic regression passes the prediction through the sigmoid function, which is non-linear (i.e. not a line), so the gradient descent algorithm might get stuck in a local minimum. That's why we still need a neat convex function, as we had for linear regression: a bowl-shaped function that makes it easier for gradient descent to converge to the global minimum.

*Source can be found [here](https://www.internalpointers.com/post/cost-function-logistic-regression)*
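
The difference in shape can be checked numerically. The sketch below is an illustration rather than a proof: it assumes a single made-up sample with `x = 1`, `y = 1` and no bias, and estimates the second derivative of each loss with respect to the weight `w` by finite differences. A negative value means the curve bends downwards at that point, which a convex function never does.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(w):
    """Squared error for one made-up sample with x = 1, y = 1 and no bias."""
    return (sigmoid(w) - 1.0) ** 2

def cross_entropy(w):
    """Binary cross entropy for the same sample."""
    return -math.log(sigmoid(w))

def second_difference(f, w, h=1e-3):
    """Finite-difference estimate of the second derivative of f at w."""
    return (f(w + h) - 2.0 * f(w) + f(w - h)) / (h * h)

# Columns: w, curvature of squared error, curvature of cross entropy.
for w in (-3.0, 0.0, 3.0):
    print(w, second_difference(squared_error, w), second_difference(cross_entropy, w))
```

For this sample the squared error has negative curvature at `w = -3`, while the cross entropy has positive curvature at every point tested.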