# Logistic Regression from Scratch!

## Data

The dataset used in this notebook is [synthetically](https://towardsdatascience.com/synthetic-data-generation-a-must-have-skill-for-new-data-scientists-915896c0c1ae "Click to learn more!") generated. It consists of a single feature column and a single label column. The only feature is the **CellGrowthRate** of a tumour, and we will classify each tumour as *Benign* or *Malignant*.
<br>
Let's first check out the dataset.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
df1 = pd.read_csv('Features.csv')
df2 = pd.read_csv('Labels.csv')

In [None]:
df1.head()

In [None]:
df2.head()

In [None]:
df = pd.concat([df1['CellGrowthRate'], df2['Labels']], axis = 1)
df.head(10)

In [None]:
# Encode the string labels as integers: Malignant -> 1, Benign -> 0
df2['Labels'] = df2['Labels'].map({'Malignant': 1, 'Benign': 0})
df2.head()

In [None]:
X = df1['CellGrowthRate'].values
Y = df2['Labels'].values
X.shape

In [None]:
Y.shape


In [None]:
Y = Y.flatten()
Y.shape

## What's the deal? 

Compared with Linear Regression, we only need to introduce one new ingredient: an **Activation Function**.
<br>
* What does it *do*?
   + Squashes the continuous values predicted by the Linear Regressor into the range (0, 1), so they can be read as probabilities and thresholded into two classes.
* What is *Sigmoid*?
   + It's the Activation Function we'll use for our Classifier (for Logistic Regression).
* What happens with the *Cost Function*?
   + We need a different one: the log loss defined further down.

<img src = "comparis.png">

### Let's dig in!

**SIGMOID**
<br>
<img src = "sigmoid.png">

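For any real input, the sigmoid function returns a value strictly between 0 and 1, using the following formula. In our case the input is the linear output $h = mx + b$, and the result $y_p$ is the predicted probability.

##  <center>$\sigma (t) = \frac{1}{1 + e^{-t}}$</center>
##  <center>$y_p = \sigma(h) = \frac{1}{1 + e^{-h}}$</center>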

In [None]:
def sigmoid(z):
    # Squash any real-valued input into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))
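
To get a feel for the shape of the curve, here is a small sketch that evaluates the sigmoid at a few hand-picked inputs and then thresholds the result at 0.5. The input values and the 0.5 cut-off are illustrative choices, not taken from the dataset.

```python
import numpy as np

def sigmoid(z):
    # Same formula as above: squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

# Hand-picked inputs, purely for illustration
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
probs = sigmoid(z)

# Thresholding at 0.5 turns the probabilities into class labels
labels = (probs > 0.5).astype(int)

print(np.round(probs, 3))
print(labels)
```

Note that `sigmoid(0)` is exactly 0.5, so with a strict `>` comparison an input of 0 lands in class 0.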

**Cost Function**
<br>
## <center>$cost(y_p, y) = -y\log(y_p) - (1-y)\log(1-y_p)$</center>
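
As a quick sanity check on this formula, take a malignant sample ($y = 1$), so that only the first term survives. Assuming the natural logarithm (which is what `np.log` computes), a confident correct prediction and a confident wrong prediction give:

## <center>$cost(0.9, 1) = -\log(0.9) \approx 0.105$</center>
## <center>$cost(0.1, 1) = -\log(0.1) \approx 2.303$</center>

The wrong prediction is penalised far more heavily, and the penalty grows without bound as $y_p$ approaches 0.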

**Cost over all samples**

## <center>$J(\theta) = -\frac{1}{n}\sum_{i=1}^{n} [y^i\log(y^i_p) + (1-y^i)\log(1-y^i_p)] $</center>

**Gradient**

## <center> $\frac{\partial J(\theta)}{\partial \theta} = \frac{1}{n}\sum_{i=1}^{n} (y^i_p - y^i)x^i $</center>
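
One way to convince yourself that this gradient matches the cost is a finite-difference check. The sketch below assumes a one-feature model with slope `m` and intercept `b`, and uses a tiny made-up dataset; the function names and all numbers are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(m, b, x, y):
    # Mean log loss for a one-feature model y_p = sigmoid(m*x + b)
    y_p = sigmoid(m * x + b)
    return -np.mean(y * np.log(y_p) + (1 - y) * np.log(1 - y_p))

def analytic_grad(m, b, x, y):
    # Gradient formula from above: mean of (y_p - y) * x for m,
    # and mean of (y_p - y) for b
    y_p = sigmoid(m * x + b)
    return np.mean((y_p - y) * x), np.mean(y_p - y)

def numeric_grad(m, b, x, y, eps=1e-6):
    # Central finite differences in each parameter
    dm = (cost(m + eps, b, x, y) - cost(m - eps, b, x, y)) / (2 * eps)
    db = (cost(m, b + eps, x, y) - cost(m, b - eps, x, y)) / (2 * eps)
    return dm, db

# Tiny made-up dataset, not the notebook's data
x = np.array([0.5, 1.5, 2.0, 3.0])
y = np.array([0, 0, 1, 1])

print(analytic_grad(0.3, -0.2, x, y))
print(numeric_grad(0.3, -0.2, x, y))
```

The two printed pairs should agree to several decimal places; a large mismatch would point to a sign or indexing error in the gradient.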

In [None]:
def loss(y_p, y):
    # Mean log loss (binary cross-entropy) over all samples
    return -np.mean(y * np.log(y_p) + (1 - y) * np.log(1 - y_p))
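
`np.log` returns `-inf` at 0, so the loss above breaks down if a prediction ever saturates to exactly 0 or 1. A common workaround, sketched here as an optional variant rather than part of the original notebook, is to clip the predictions to a small margin first. The name `safe_loss` and the margin `eps = 1e-12` are assumptions chosen for illustration.

```python
import numpy as np

def safe_loss(y_p, y, eps=1e-12):
    # Keep predictions strictly inside (0, 1) so np.log never sees 0
    y_p = np.clip(y_p, eps, 1 - eps)
    return -np.mean(y * np.log(y_p) + (1 - y) * np.log(1 - y_p))

# Made-up predictions, including saturated values of exactly 0 and 1
y_true = np.array([1, 0, 1, 0])
y_pred = np.array([1.0, 0.0, 0.9, 0.2])

# Finite here, whereas the unclipped loss would evaluate 0 * log(0) and give nan
print(safe_loss(y_pred, y_true))
```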

In [None]:
plt.scatter(X, Y, c = Y)

In [None]:
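def fit(x, y, lr=0.02, n_iter=200):
    # Fit y ~ sigmoid(m*x + b) with batch gradient descent
    m = 0.0  # slope
    b = 0.0  # intercept
    costs = []  # cost at every iteration, for plotting later
    for _ in range(n_iter):
        # Forward pass: linear output, then sigmoid
        yp = sigmoid(m * x + b)
        costs.append(loss(yp, y))
        # Gradients of the cost with respect to m and b
        dm = np.mean(x * (yp - y))
        db = np.mean(yp - y)
        # Gradient descent update
        m = m - lr * dm
        b = b - lr * db
    return costs, m, b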
        

In [None]:
costs, m, b = fit(X, Y)

In [None]:
pred = sigmoid(X*m+b)
# Accuracy: fraction of samples whose thresholded prediction matches the label
np.mean((pred > 0.5) == Y)
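
With a single feature, the 0.5 threshold corresponds to one cut-off on the growth rate: `sigmoid(m*x + b)` equals 0.5 exactly when `m*x + b = 0`, that is, at `x = -b/m`. The sketch below wraps this in two small helpers. `predict` and `decision_boundary` are hypothetical names, and the values of `m_demo` and `b_demo` are placeholders, not the parameters fitted in this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(x, m, b, threshold=0.5):
    # Class 1 (Malignant) if the predicted probability exceeds the threshold
    return (sigmoid(m * x + b) > threshold).astype(int)

def decision_boundary(m, b):
    # Value of x where m*x + b = 0, i.e. where the probability is exactly 0.5
    return -b / m

# Placeholder parameters and inputs, for illustration only
m_demo, b_demo = 2.0, -3.0
x_demo = np.array([0.5, 1.4, 1.6, 3.0])

print(decision_boundary(m_demo, b_demo))  # 1.5
print(predict(x_demo, m_demo, b_demo))    # [0 0 1 1]
```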

In [None]:
plt.plot(costs)
plt.xlabel('Iteration')
plt.ylabel('Cost')