# Classification

Classification is a **supervised learning** technique used to predict a **categorical target variable (class labels)**.

Examples:
- Spam vs. Not Spam
- Disease vs. No Disease
- Fraud vs. Genuine Transaction

---

## 1. Logistic Regression

- Despite its name, logistic regression is used for **classification**, not regression.
- Uses the **sigmoid function** to map a linear combination of the features to a probability between 0 and 1.

Equation:

$$
P(y=1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x)}}
$$

where:  
- $ P(y=1|X) $ → probability of class = 1 given input $X$  
- $ \beta_0 $ → intercept  
- $ \beta_1 $ → coefficient of feature $x$  
- $ e $ → Euler's number (≈ 2.718), the base of the natural logarithm  
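The equation above can be sketched in a few lines of Python. This is a minimal illustration for the single-feature case only; the coefficient values and the 0.5 decision threshold are assumptions chosen for the example, not fitted parameters.

```python
import math


def predict_proba(x, beta0, beta1):
    """Return P(y=1 | x) for one feature x under the logistic model."""
    z = beta0 + beta1 * x              # linear part: beta_0 + beta_1 * x
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)


def predict_class(x, beta0, beta1, threshold=0.5):
    """Turn the probability into a class label (threshold is an assumption)."""
    return 1 if predict_proba(x, beta0, beta1) >= threshold else 0


# Hypothetical coefficients, picked only so the numbers are easy to follow.
beta0, beta1 = -4.0, 2.0
p = predict_proba(2.0, beta0, beta1)  # z = 0, so p = 0.5
```

In practice the coefficients are learned from training data (typically by maximizing the likelihood) rather than set by hand.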

---

## 2. K-Nearest Neighbors (KNN)

- A **non-parametric** method.  
- Classifies based on the **majority class** among the k nearest neighbors.  

Equation (distance measure):

$$
d(x, x_i) = \sqrt{\sum_{j=1}^n (x_j - x_{ij})^2}
$$

where:  
- $ d(x, x_i) $ → distance between test point and training point  
- $ n $ → number of features  
- $ x_j, x_{ij} $ → value of the $j^{th}$ feature for the test point $x$ and the training point $x_i$  
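As a rough sketch, the distance formula and the majority vote can be combined as follows. The toy training points, the labels, and the helper names `euclidean` and `knn_predict` are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((aj - bj) ** 2 for aj, bj in zip(a, b)))


def knn_predict(x, train_X, train_y, k=3):
    """Label x with the majority class among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(x, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_y = ["A", "A", "B", "B"]

label = knn_predict((1.2, 1.4), train_X, train_y, k=3)  # two "A" votes, one "B"
```

Every prediction scans the full training set, and features on very different scales can dominate the distance, so scaling the features first is usually advisable.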

- KNN is a **lazy learner**: it builds no generalized model during training and defers all computation to prediction time.

---

## 3. Decision Tree

- Splits data into subsets using **decision rules** (IF-THEN conditions).  
- Uses **Information Gain** or **Gini Index** to find the best split.

Gini Index:

$$
Gini = 1 - \sum_{i=1}^n p_i^2
$$

where:  
- $ p_i $ → proportion of samples in the node that belong to class $i$  
- $ n $ → number of classes  
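To make the formula concrete, here is a minimal sketch that computes the Gini index of a node from its class labels. The `weighted_gini` helper, which scores a candidate split by the size-weighted impurity of its two children, reflects the usual CART-style criterion and is an assumption about how the split is evaluated; both function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention for an empty node
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Size-weighted Gini of the two children produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


mixed = gini(["a", "a", "b", "b"])              # 0.5: evenly mixed node
pure_split = weighted_gini(["a", "a"], ["b", "b"])  # 0.0: both children pure
```

A lower weighted Gini means purer children, so the tree prefers the candidate split with the smallest value.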

---

## 4. Random Forest

- An **ensemble** of multiple decision trees.  
- Each tree votes for a class → majority vote decides the output.  

Equation (majority voting):

$$
y = \text{mode}(h_1(x), h_2(x), \dots, h_m(x))
$$

where:  
- $ h_i(x) $ → prediction from $i^{th}$ tree  
- $ m $ → total number of trees  
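The voting step can be sketched as below. The three "trees" are hypothetical threshold rules standing in for trained decision trees; a real random forest would also train each tree on a bootstrap sample with a random subset of features, which this sketch does not show.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common prediction (the mode)."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, x):
    """Collect h_i(x) from every tree and take the majority class."""
    return majority_vote([h(x) for h in trees])


# Hypothetical stand-ins for trained trees: simple threshold rules.
trees = [
    lambda x: "spam" if x[0] > 0.5 else "ham",
    lambda x: "spam" if x[1] > 0.3 else "ham",
    lambda x: "spam" if x[0] + x[1] > 1.0 else "ham",
]

result = forest_predict(trees, (0.6, 0.2))  # votes: spam, ham, ham
```

Using an odd number of trees avoids ties in binary problems; with ties, `Counter.most_common` returns whichever tied class was counted first.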

---

## 5. Support Vector Machine (SVM)

- Tries to find the **optimal hyperplane** that separates classes with maximum margin.  

Equation (decision function):

$$
f(x) = w \cdot x + b
$$

where:  
- $ w $ → weight vector  
- $ b $ → bias (intercept)  
- $ f(x) $ → decision function (sign determines class)  
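A minimal sketch of the decision function for a linear SVM follows. The weight vector and bias are hypothetical values rather than the result of training, and the classes are assumed to be encoded as +1 and -1.

```python
def decision_function(w, x, b):
    """f(x) = w . x + b for equal-length sequences w and x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svm_predict(w, x, b):
    """Classify by the sign of f(x); points on the hyperplane go to +1 here."""
    return 1 if decision_function(w, x, b) >= 0 else -1


# Hypothetical parameters; training would choose w and b to maximize the margin.
w, b = (1.0, -1.0), -0.5

score = decision_function(w, (2.0, 0.0), b)  # 2.0 - 0.0 - 0.5 = 1.5
```

The magnitude of $f(x)$ is proportional to the distance from the hyperplane, so larger absolute values indicate points further from the decision boundary.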

---

## 6. Naive Bayes

- Based on **Bayes’ Theorem**, with the "naive" assumption that features are conditionally independent given the class.  

Bayes’ Rule:

$$
P(C_k|X) = \frac{P(X|C_k) \cdot P(C_k)}{P(X)}
$$

where:  
- $ P(C_k|X) $ → posterior probability of class $C_k$ given input $X$  
- $ P(X|C_k) $ → likelihood  
- $ P(C_k) $ → prior probability of class  
- $ P(X) $ → evidence  
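Bayes’ rule can be sketched as follows. Under the independence assumption the likelihood factorizes as $P(X|C_k) = \prod_j P(x_j|C_k)$, which is what the product in the code computes. The priors and per-feature likelihoods are made-up numbers for illustration, not estimates from data, and `posteriors` is a hypothetical helper name.

```python
import math


def posteriors(priors, likelihoods):
    """Return P(C_k | X) for each class.

    priors:      {class: P(C_k)}
    likelihoods: {class: [P(x_j | C_k) for each observed feature x_j]}
    """
    # Numerator of Bayes' rule: P(X | C_k) * P(C_k), with the naive factorization.
    joint = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
    evidence = sum(joint.values())  # P(X), summed over all classes
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers for a two-feature message.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": [0.8, 0.5], "ham": [0.1, 0.5]}

post = posteriors(priors, likelihoods)
predicted = max(post, key=post.get)  # class with the highest posterior
```

Because $P(X)$ is the same for every class, it can be skipped when only the predicted label is needed. With many features, implementations usually sum log-probabilities instead of multiplying to avoid numerical underflow.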

---

## Summary of Classification Methods

- **Logistic Regression** → Probabilistic linear classifier  
- **KNN** → Instance-based, distance-driven  
- **Decision Trees** → Rule-based, interpretable  
- **Random Forest** → Ensemble of trees, reduces overfitting  
- **SVM** → Optimal margin classifier  
- **Naive Bayes** → Probabilistic, based on Bayes’ theorem  
- **Neural Networks** (not covered above) → Layered nonlinear models, powerful but complex
