<a href="https://colab.research.google.com/github/cm-int/classification_models/blob/main/module_1/Democode/Mod_1_Lesson_3_Classification_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using Gradient Descent to Classify Data

This demonstration applies the principles of gradient descent to a simple classification scenario.

The sample data comprises two sets of linearly separable coordinates, plotted as blue and red dots on a graph. The algorithm attempts to find a straight line that separates the two sets.

![Data used to demonstrate classification](classification_data.png)

The linear regression algorithm is the same as that used in the previous demonstration. The main difference is that this example minimizes the cost of only those dots that appear on the wrong side of an estimated linear boundary. Two helper functions identify these dots:

* The *points_above* function returns the coordinates of all points that lie above the line determined by the parameters m (slope) and b (y-intercept).

* The *points_below* function returns the coordinates of all points that lie below the line determined by the parameters m (slope) and b (y-intercept).

The classification algorithm iterates over both sets of points to find the red points that lie below the boundary and the blue points that lie above it. These points are on the wrong side of the line. The algorithm uses the cost function to calculate the mean squared error (MSE) of these points relative to the line, and performs gradient descent to reduce this cost on each iteration.
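
As a rough illustration of the wrong-side test described above, the following sketch uses NumPy boolean masks rather than explicit loops. The helper name `misclassified` and the tiny sample arrays are hypothetical, introduced only for this example. It assumes, as in this demonstration, that red points belong above the boundary and blue points below it.

```python
import numpy as np

def misclassified(xr, yr, xb, yb, m, b):
    """Return (x, y) arrays of the points on the wrong side of y = m * x + b.

    Hypothetical helper: assumes red points belong above the line
    and blue points belong below it.
    """
    red_wrong = yr < m * xr + b    # red points that fall below the line
    blue_wrong = yb > m * xb + b   # blue points that rise above the line
    x = np.concatenate([xr[red_wrong], xb[blue_wrong]])
    y = np.concatenate([yr[red_wrong], yb[blue_wrong]])
    return x, y

# Tiny made-up sample: two red points and two blue points
xr = np.array([10.0, 80.0])
yr = np.array([30.0, 25.0])
xb = np.array([5.0, 60.0])
yb = np.array([15.0, 3.0])

# With the line y = 0.5 * x, the red point (80, 25) is below the line
# and the blue point (5, 15) is above it
x_wrong, y_wrong = misclassified(xr, yr, xb, yb, m=0.5, b=0.0)
print(x_wrong, y_wrong)
```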

For a description of the gradient descent algorithm, see the topic **How machine learning uses differential calculus for gradient descent**.

Mean squared error (cost function):

$$C=\frac{1}{n}\sum_{i=1}^{n}\left({\hat{y}}_i-y_i\right)^2$$

In [None]:
def mse(y_hat, y):
    n = len(y_hat)
    diff = y_hat - y
    diff_squared = sum(diff * diff)
    return diff_squared / n
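
To make the formula concrete, here is a small worked example with made-up values (the arrays are illustrative only). The function is repeated so that the snippet runs on its own.

```python
import numpy as np

def mse(y_hat, y):
    n = len(y_hat)
    diff = y_hat - y
    diff_squared = sum(diff * diff)
    return diff_squared / n

y_hat = np.array([2.0, 4.0, 6.0])   # predicted values
y = np.array([1.0, 4.0, 8.0])       # actual values

# The errors are 1, 0, and -2, so the squared errors are 1, 0, and 4
cost = mse(y_hat, y)
print(cost)   # (1 + 0 + 4) / 3
```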

Regression equation:

$$\hat{y}_i=m\times x_i+b$$

In [None]:
def regress(m, b, x):
    return m * x + b
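
The two partial derivatives that follow come from applying the chain rule to the cost. As a brief sketch of the reasoning, each squared term contributes a factor of 2 multiplied by the derivative of the prediction, which is $x_i$ with respect to $m$ and $1$ with respect to $b$:

$$\frac{\partial C}{\partial m}=\frac{1}{n}\sum_{i=1}^{n}2\left(\hat{y}_i-y_i\right)\frac{\partial \hat{y}_i}{\partial m}=\frac{2}{n}\sum_{i=1}^{n}x_i\left(\hat{y}_i-y_i\right)$$

$$\frac{\partial C}{\partial b}=\frac{1}{n}\sum_{i=1}^{n}2\left(\hat{y}_i-y_i\right)\frac{\partial \hat{y}_i}{\partial b}=\frac{2}{n}\sum_{i=1}^{n}\left(\hat{y}_i-y_i\right)$$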

Partial derivative of cost (C) with respect to m:

$$\frac{\partial C}{\partial m}=\frac{2}{n}\sum_{i=1}^{n}x_i\ \times\ \left({\hat{y}}_i-y_i\right)$$

In [None]:
def delC_delm(x, y_hat, y):
    s = sum(x * (y_hat - y))
    return (2 / len(x)) * s

Partial derivative of cost (C) with respect to b:

$$\frac{\partial C}{\partial b}=\frac{2}{n}\sum_{i=1}^{n}\left({\hat{y}}_i-y_i\right)$$

In [None]:
def delC_delb(y_hat, y):
    s = sum(y_hat - y)
    return (2 / len(y)) * s
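
One way to gain confidence in the two derivative functions is to compare them with a numerical estimate. The sketch below nudges each parameter by a small amount and measures the change in cost (a central finite difference). The sample points, starting parameters, and step size `eps` are assumptions chosen for the illustration; the functions are repeated so that the snippet runs on its own.

```python
import numpy as np

def mse(y_hat, y):
    return sum((y_hat - y) * (y_hat - y)) / len(y_hat)

def delC_delm(x, y_hat, y):
    return (2 / len(x)) * sum(x * (y_hat - y))

def delC_delb(y_hat, y):
    return (2 / len(y)) * sum(y_hat - y)

# Made-up sample points and parameters, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0])
m, b, eps = 0.5, 1.0, 1e-6

y_hat = m * x + b
analytic_m = delC_delm(x, y_hat, y)
analytic_b = delC_delb(y_hat, y)

# Central differences: change one parameter at a time and measure the change in cost
numeric_m = (mse((m + eps) * x + b, y) - mse((m - eps) * x + b, y)) / (2 * eps)
numeric_b = (mse(m * x + (b + eps), y) - mse(m * x + (b - eps), y)) / (2 * eps)

print(analytic_m, numeric_m)
print(analytic_b, numeric_b)
```

If the analytic and numeric values in each pair agree to several decimal places, the derivative functions are consistent with the cost function.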

In [None]:
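def points_above(x, y, m, b):
    # Collect the points that lie above the line y = m * x + b
    points = []
    for i in range(len(x)):  # Check every point, including the first and last
        y_hat = m * x[i] + b
        if y[i] > y_hat:  # The point is higher than the line at this x
            points.append((x[i], y[i]))
    return points

In [None]:
def points_below(x, y, m, b):
    # Collect the points that lie below the line y = m * x + b
    points = []
    for i in range(len(x)):  # Check every point, including the first and last
        y_hat = m * x[i] + b
        if y[i] < y_hat:  # The point is lower than the line at this x
            points.append((x[i], y[i]))
    return points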

In [None]:
import numpy as np
import pandas as pd
from numpy import random

xr = random.randint(low=0, high=100, size=100)
yr = np.arange(21, 41, 0.2)

xb = random.randint(low=0, high=100, size=100)
yb = np.arange(0, 20, 0.2)
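
The x coordinates above are drawn at random, so every run produces a different data set. If repeatable results are useful, one option (an assumption of this sketch, not part of the original demonstration) is NumPy's seeded `default_rng` generator. The seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed; any fixed integer gives repeatable data

# Red points: y starts at 21 and rises in steps of 0.2
xr = rng.integers(low=0, high=100, size=100)
yr = np.arange(21, 41, 0.2)

# Blue points: y starts at 0 and rises in steps of 0.2
xb = rng.integers(low=0, high=100, size=100)
yb = np.arange(0, 20, 0.2)

# The gap between the highest blue y and the lowest red y keeps the sets separable
print(yb.max(), yr.min())
```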

Initial guesses for the slope, **m**, and the y-intercept, **b**.

Learning rate: **lr**.

Minimum cost threshold: **tol**. The algorithm stops when the cost falls to or below this level.

Maximum number of iterations to perform: **max_iters**. The algorithm halts after this number of iterations if it hasn't converged.


In [None]:
import numpy as np

# Gradient descent parameters. Experiment with these
m = 1
b = 0.1
lr = 0.0002 # Don't make this value too big because the algorithm might not converge
tol = 1e-4
max_iters = 30000
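
To illustrate why the learning rate matters, the following minimal sketch runs gradient descent on a simplified one-parameter problem (fitting only the slope of `y = m * x`) with x values on the same 0 to 100 scale as this demonstration. The data, the `fit_slope` helper, and the larger rate of 0.002 are assumptions made for the illustration.

```python
import numpy as np

def fit_slope(x, y, lr, steps=50, m=1.0):
    """Gradient descent on the MSE of y_hat = m * x, returning the final m."""
    for _ in range(steps):
        y_hat = m * x
        grad = (2 / len(x)) * np.sum(x * (y_hat - y))
        m = m - lr * grad
    return m

x = np.arange(0, 100, dtype=float)
y = 0.2 * x                      # made-up data lying exactly on a line of slope 0.2

m_small = fit_slope(x, y, lr=0.0002)
m_large = fit_slope(x, y, lr=0.002)

print(m_small)   # settles close to 0.2
print(m_large)   # a huge value: each step overshoots by more than the last
```

With x values this large, the gradient is large too, so even a modest increase in the learning rate makes every update overshoot the minimum.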

## Classification Using Gradient Descent

```
Loop while number of iterations performed is less than max_iters:
    
    Find all red dots that are below the line Y = m * X + b, and all blue dots that are above this line

    If there are no red dots below the line and no blue dots above the line we have finished, so stop

    Calculate new estimates for Y based on m, b, and X using the regression function Y = m * X + b
    
    Find the cost of these new estimates by finding the MSE for red dots and blue dots that are on the wrong side of the line
    
    If the cost is less than tol we have finished, so stop

    Create a new estimate for m using the partial derivative of cost with respect to m multiplied by the learning rate, lr

    Create a new estimate for b using the partial derivative of cost with respect to b multiplied by the learning rate, lr
  
```



In [None]:
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 10))
plt.scatter(xr, yr, s=15, c='red')
plt.scatter(xb, yb, s=15, c='blue')

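costs = []
x_line = np.array([0, 100])  # x range of the data, used to draw the boundary

for i in range(max_iters):

    # Collect the points on the wrong side of the current line:
    # blue points above it and red points below it
    points = points_above(xb, yb, m, b) + points_below(xr, yr, m, b)
    if not points:
        break  # Every point is on the correct side, so stop

    x = np.array([p[0] for p in points])
    y = np.array([p[1] for p in points])

    y_hat = regress(m, b, x)

    cost = mse(y_hat, y)
    costs.append(cost) # Save the costs in a list. These will be graphed later
    if cost <= tol:
        break

    # Step m and b against their gradients, scaled by the learning rate
    new_m = m - delC_delm(x, y_hat, y) * lr
    new_b = b - delC_delb(y_hat, y) * lr

    m = new_m
    b = new_b

    if i % 2000 == 0: # Show progress every 2000 iterations
        plt.plot(x_line, m * x_line + b, c='lightgrey')

plt.plot(x_line, m * x_line + b, c='black')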
plt.xlabel('X', fontdict={'family': 'serif','color':  'darkred','weight': 'normal','size': 20})
plt.ylabel('Y', fontdict={'family': 'serif','color':  'darkred','weight': 'normal','size': 20})

plt.show()

print(f'Estimated line is: y = {m} * x + {b}')
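
One way to sanity-check the result is to count how many points fall on the expected side of the estimated line. The helper below is a hypothetical addition, not part of the original demonstration. It assumes red points should lie above the boundary and blue points below it, and the sample values are made up.

```python
import numpy as np

def boundary_accuracy(xr, yr, xb, yb, m, b):
    """Fraction of points on the expected side of y = m * x + b."""
    red_ok = np.sum(yr > m * xr + b)    # red points above the line
    blue_ok = np.sum(yb < m * xb + b)   # blue points below the line
    return (red_ok + blue_ok) / (len(yr) + len(yb))

# Made-up sample: a flat boundary at y = 20.5 separates these points perfectly
xr = np.array([10, 50, 90])
yr = np.array([21.0, 30.0, 40.0])
xb = np.array([20, 60, 80])
yb = np.array([0.0, 10.0, 19.8])

good = boundary_accuracy(xr, yr, xb, yb, m=0.0, b=20.5)
poor = boundary_accuracy(xr, yr, xb, yb, m=1.0, b=0.1)
print(good)   # 1.0
print(poor)   # two of the three red points are below this line
```

In the notebook, calling the helper with the notebook's own `xr`, `yr`, `xb`, `yb`, `m`, and `b` after the loop finishes would report the fraction of correctly classified points.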

## Gradient of Cost

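The following graph plots the cost against the number of iterations. Because the set of misclassified points can change between iterations, the cost isn't guaranteed to fall on every step, but it should trend downward as the number of iterations increases.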

In [None]:
import matplotlib.pyplot as plt

costs_index = range(0, len(costs))

fig = plt.figure(figsize=(10, 10))
plt.plot(costs_index[::1000], costs[::1000])  # Plot every 1000th point (stops the graph from being a mass of dots)
plt.scatter(costs_index[::1000], costs[::1000])

plt.xlabel('Iterations', fontdict={'family': 'serif','color':  'darkred','weight': 'normal','size': 20})
plt.ylabel('Cost', fontdict={'family': 'serif','color':  'darkred','weight': 'normal','size': 20})
plt.title('Gradient of Cost', fontdict={'family': 'serif','color':  'darkred','weight': 'normal','size': 20})

plt.show()