# Categorical Classification
Classification is a supervised machine learning method where the model tries to predict the correct label of a given input data.
There are two types of classification
    
    1.Binary
    2.Multi
    
    1. KNN ( k nearest neighbours)

In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statistics as sp
from math import sqrt
from sklearn import datasets

In [3]:
x1 = np.random.randint(1, 10, 1000)
x2 = np.random.normal(0, 10, 1000)
y = np.random.randint(0, 2, 1000)

In [4]:
# Euclidean Function
def euclidean(x1, x2, y1, y2):
    res = sqrt((x2 - x1)**2 + (y2 - y1)**2)
    return res

In [5]:
# Initialization
k = 5
loss = []

# KNN Algorithm
for i in range(0,1000):
    dist = []  
    for j in range(0,1000):
        if i != j:  # Exclude the same point
            dist.append(euclidean(x1[i], x1[j],x2[i], x2[j]))

    sort_indices = np.argsort(dist)  
    top_k = y[sort_indices[0:k]] 
    y_pred = sp.mode(top_k)

    # Append the total loss for each data point
    error = abs(y_pred - y[i])
    loss.append(error)
    # print(loss)

print("k-nearest actual values", top_k)
print("the predicted output", y_pred)
total_loss = np.mean(loss)
print("the total loss value", total_loss)

k-nearest actual values [0 1 0 1 1]
the predicted output 1
the total loss value 0.498


In [7]:
iris = datasets.load_iris()

In [9]:
x1 = iris.data[::,1]
x2 = iris.data[::,3]
y  = iris.target

In [10]:
np.unique(y)

array([0, 1, 2])

In [12]:
len(x1)

150

In [14]:
# Initialization
k = 5
loss = []

# KNN Algorithm
for i in range(0,150):
    dist = []  
    for j in range(0,150):
        if i != j:  # Exclude the same point
            dist.append(euclidean(x1[i], x1[j],x2[i], x2[j]))

    sort_indices = np.argsort(dist)  
    top_k = y[sort_indices[0:k]] 
    y_pred = sp.mode(top_k)

    # Append the total loss for each data point
    error = abs(y_pred - y[i])
    loss.append(error)
    # print(loss)

print("k-nearest actual values", top_k)
print("the predicted output", y_pred)
total_loss = np.mean(loss)
print("the total loss value", total_loss)

k-nearest actual values [2 2 2 2 2]
the predicted output 2
the total loss value 0.05333333333333334


In [21]:
def predict(a,b):
    k = 5
    dist=[]
    # KNN Algorithm
    for i in range(0,150):
        dist.append(euclidean(a, x1[j],b, x2[j]))
    
    sort_indices = np.argsort(dist)  
    top_k = y[sort_indices[0:k]] 
    y_pred = sp.mode(top_k)

    print("k-nearest actual values", top_k)
    print("the predicted output", y_pred)
    return y_pred

In [24]:
predict(3,1)

k-nearest actual values [0 1 1 1 1]
the predicted output 1


1

# Logistic Regression
Binary Classification

$$
h_0(x) = \frac{1}{1 + e^{-z}}
$$

$$
h_\theta(x) = p( y=1 ) > 0.5 
$$

Loss function

$$
\text{Cross Entropy} = \frac{1}{N} \sum_{j=1}^{N} \left[ y_1 \log_2(y_j) + (1 - y_j) \log_2(1 - y_j) \right]
$$

Here's a step-by-step explanation of the formula:

    1. For each data point $j$, compute the cross-entropy contribution for both the positive class ($y_1 \log_2(y_j)$) and the negative class ($(1 - y_j) \log_2(1 - y_j)$).
    2. Sum up these contributions across all data points ($\sum$).
    3. Finally, divide the total by the number of data points $N$ to get the average cross-entropy loss.

1. $y_1 \log_2(y_j)$ 
    - This term computes the cross-entropy contribution for the positive class when the true label is 1.
    
2. $(1 - y_j) \log_2(1 - y_j)$
    - This term computes the cross-entropy contribution for the negative class when the true label is 0.
