# 📘 Supervised Machine Learning Algorithms with Theory & Math

Supervised learning is a type of machine learning where the model is trained on **labeled data**, meaning each input has a corresponding output. The model learns to map inputs to outputs and can then make predictions on unseen data.

---

## 📌 Table of Contents
1. [Linear Regression](#1-linear-regression)
2. [Logistic Regression](#2-logistic-regression)
3. [K-Nearest Neighbors (KNN)](#3-k-nearest-neighbors-knn)
4. [Support Vector Machines (SVM)](#4-support-vector-machines-svm)
5. [Decision Trees](#5-decision-trees)
6. [Random Forest](#6-random-forest)
7. [Gradient Boosting (XGBoost, LightGBM)](#7-gradient-boosting)
8. [Naive Bayes](#8-naive-bayes)

---

## 1. Linear Regression

### 📌 Definition:
Linear Regression is a statistical method used to **predict a continuous output** based on one or more input features by fitting a linear relationship.

### 📖 Theoretical Intuition:
- Assumes a **linear relationship** between the independent variables and the dependent variable.
- Estimates coefficients (weights) to minimize prediction error.

### 📐 Equation:
\[
\hat{y} = w_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n
\]

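### 🎯 Cost Function (MSE):
\[
J(w) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2
\]

Here \( n \) is the number of training examples (not the number of features, as in the equation above), and \( \hat{y}_i \) is the prediction for example \( i \).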

### ⚙ Optimization:
- Minimizes the cost iteratively with **Gradient Descent**, or solves for the weights in closed form with the **Normal Equation**.
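
As a rough illustration of the optimization step, the sketch below runs batch gradient descent on the MSE cost for a single-feature toy dataset. The function names, learning rate, epoch count, and data are all assumptions chosen for this example.

```python
def predict(w, x):
    """Linear model: w[0] is the intercept w_0, w[1:] pair with the features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def mse(w, X, y):
    """Mean squared error J(w) over all training examples."""
    return sum((predict(w, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)


def fit_gradient_descent(X, y, lr=0.05, epochs=2000):
    """Batch gradient descent on the MSE cost (lr and epochs are illustrative)."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        errors = [predict(w, xi) - yi for xi, yi in zip(X, y)]
        # dJ/dw_0 = (2/n) * sum(error); dJ/dw_j = (2/n) * sum(error * x_j)
        grad = [2.0 * sum(errors) / n_samples]
        for j in range(n_features):
            grad.append(2.0 * sum(e * xi[j] for e, xi in zip(errors, X)) / n_samples)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy data generated from y = 1 + 2x, so the fit should approach w = [1, 2].
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
w = fit_gradient_descent(X, y)
```

For small problems the Normal Equation gives the same weights without iteration. Gradient descent is usually preferred when the number of features is large.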

### ✅ Use Cases:
- Predicting prices (housing, stock)
- Sales forecasting

---

## 2. Logistic Regression

### 📌 Definition:
Logistic Regression is used for **binary classification** problems. It estimates the probability that a given input belongs to a certain class.

### 📖 Theoretical Intuition:
- Applies the **sigmoid function** to output a probability between 0 and 1.
- Uses **cross-entropy** as a loss function.

### 📐 Hypothesis:
\[
\hat{y} = \sigma(w^T x) = \frac{1}{1 + e^{-w^T x}}
\]

### 🎯 Loss Function:
\[
J(w) = - \frac{1}{n} \sum_{i=1}^{n} \left[y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)\right]
\]

### ⚙ Optimization:
- Gradient Descent, L-BFGS
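
The hypothesis and loss above can be sketched in a few lines of plain Python. This is a minimal illustration trained with batch gradient descent. The helper names, the clipping constant `eps`, the hyperparameters, and the toy data are assumptions, and the bias is handled by a constant `1.0` column in each input.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, x):
    """sigma(w^T x): estimated probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def cross_entropy(w, X, y, eps=1e-12):
    """Average binary cross-entropy; eps keeps log() away from 0."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(w, xi), eps), 1.0 - eps)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -total / len(y)


def fit(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent; the gradient is (1/n) * sum((y_hat - y) * x)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        errors = [predict_proba(w, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / len(y)
                for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# First column is a constant 1.0 (bias term); second is the single feature.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]
w = fit(X, y)
```

A prediction is turned into a class label by thresholding the probability, commonly at 0.5.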

### ✅ Use Cases:
- Email spam detection
- Disease diagnosis (yes/no)

---

## 3. K-Nearest Neighbors (KNN)

### 📌 Definition:
KNN is a **non-parametric**, instance-based algorithm that classifies a new instance by the majority class of its **k closest neighbors**. For regression, it averages the neighbors' target values instead.

### 📖 Theoretical Intuition:
- Has no explicit training phase: it simply stores the entire training dataset.
- At prediction time, calculates the **distance** from the query point to all training points and keeps the k nearest.

### 📐 Distance Metric (Euclidean):
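\[
d(x, x_i) = \sqrt{\sum_{j=1}^n (x_j - x_{ij})^2}
\]

Where \( x \) is the query point, \( x_i \) is the \( i \)-th training point, and \( n \) is the number of features.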
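
A minimal sketch of KNN classification using the Euclidean distance above. The function names, the choice `k=3`, and the toy points are assumptions for illustration. Ties in the vote are broken by whichever label was encountered first, which is a simplification.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((aj - bj) ** 2 for aj, bj in zip(a, b)))


def knn_predict(X_train, y_train, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    # Sort all stored training points by distance to the query point.
    neighbors = sorted(zip(X_train, y_train),
                       key=lambda pair: euclidean(x, pair[0]))
    top_labels = [label for _, label in neighbors[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Two small, well-separated clusters labelled "a" and "b".
X_train = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
           [6.0, 6.0], [7.0, 7.5], [6.5, 8.0]]
y_train = ["a", "a", "a", "b", "b", "b"]

label_near_a = knn_predict(X_train, y_train, [1.2, 1.4])
label_near_b = knn_predict(X_train, y_train, [6.8, 7.0])
```

Because every prediction scans the full training set, the cost per query grows with the dataset size. Features are usually scaled first so that no single feature dominates the distance.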

### ✅ Use Cases:
- Recommender systems
- Handwriting recognition

---

## 4. Support Vector Machines (SVM)

### 📌 Definition:
SVM is a **maximum margin classifier** that finds the optimal hyperplane to separate different classes in the feature space.

### 📖 Theoretical Intuition:
- Maximizes the distance (margin) between the decision boundary and the nearest data points of each class, which are called the **support vectors**.
- Can handle **non-linear data** using the **kernel trick**.

### 📐 Objective Function:
\[
\text{maximize } \frac{2}{\|w\|} \quad \text{subject to } y_i(w^T x_i + b) \geq 1
\]

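### 🎯 Soft-Margin Objective (Hinge Loss):
\[
J(w) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^n \max(0, 1 - y_i(w^T x_i + b))
\]

Here \( y_i \in \{-1, +1\} \), the \( \max(0, \cdot) \) term is the hinge loss of example \( i \), and \( C \) controls the trade-off between a wide margin and margin violations.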
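
To make the objective above concrete, the sketch below evaluates it and minimizes it with plain subgradient descent for a linear SVM. This is only one simple way to fit the model, and practical solvers typically use more specialized optimizers. The function names, step size, epoch count, and toy data are assumptions, and labels are taken to be -1 or +1.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def svm_objective(w, b, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum of hinge losses."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * hinge


def fit_subgradient(X, y, C=1.0, lr=0.001, epochs=5000):
    """Full-batch subgradient descent on the objective (illustrative settings)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of 0.5 * ||w||^2 is w itself
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # inside the margin or misclassified
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Linearly separable toy data with labels in {-1, +1}.
X = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]]
y = [-1, -1, -1, 1, 1, 1]
w, b = fit_subgradient(X, y)
```

A new point `x` is classified by the sign of `dot(w, x) + b`. Handling non-linear data with kernels requires the dual formulation, which this sketch does not cover.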

### ✅ Use Cases:
- Image classification
- Bioinformatics

---

## 5. Decision Trees

### 📌 Definition:
A Decision Tree is a **flowchart-like** structure where internal nodes represent feature tests, branches represent outcomes, and leaf nodes represent decisions.

### 📖 Theoretical Intuition:
- Splits data on features that provide the **highest information gain** (or lowest impurity).
- Easy to interpret and visualize.

### 📐 Gini Index:
\[
G = 1 - \sum_{i=1}^C p_i^2
\]

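### 📐 Entropy:
\[
H = - \sum_{i=1}^C p_i \log_2(p_i)
\]

In both formulas, \( p_i \) is the fraction of samples in the node that belong to class \( i \), and \( C \) is the number of classes. A pure node has \( G = 0 \) and \( H = 0 \).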
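
The two impurity measures, and the information gain of a candidate split, can be computed as in the sketch below. The function names and example labels are assumptions for illustration. Information gain here is the parent entropy minus the size-weighted entropy of the two children.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Fraction of samples in each class that is present in the node."""
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]


def gini(labels):
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    # Absent classes never appear in the Counter, so log2(0) cannot occur.
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def information_gain(parent, left, right):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = ["a", "a", "b", "b"]
perfect_split_gain = information_gain(parent, ["a", "a"], ["b", "b"])
useless_split_gain = information_gain(parent, ["a", "b"], ["a", "b"])
```

A tree builder would evaluate this gain (or the equivalent drop in Gini impurity) for every candidate feature and threshold, then split on the best one.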

### ✅ Use Cases:
- Customer segmentation
- Fraud detection

---

## 6. Random Forest

### 📌 Definition:
Random Forest is an **ensemble learning method** that builds multiple decision trees and merges their results for better accuracy.

### 📖 Theoretical Intuition:
- Uses **Bagging** (bootstrap aggregating) to reduce variance.
- At each split, only a **random subset of features** is considered.

### 🧠 Prediction:
- Classification: **Majority vote**
- Regression: **Mean of predictions**
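
The following sketch shows the three ingredients for classification: bootstrap sampling, a random feature subset, and a majority vote. To stay short it uses one-split trees (decision stumps) as the base learners. That is a simplification made here, since a real random forest grows deeper trees and resamples the features at every split. All names, the seed, and the toy data are assumptions.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """One-split tree: pick the (feature, threshold) with the fewest errors."""
    best = None
    for j in feature_ids:
        for threshold in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(yi != l_lab for yi in left)
                      + sum(yi != r_lab for yi in right))
            if best is None or errors < best[0]:
                best = (errors, j, threshold, l_lab, r_lab)
    if best is None:  # no valid split, e.g. all sampled rows are identical
        label = majority(y)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    j, threshold, l_lab, r_lab = stump
    return l_lab if x[j] <= threshold else r_lab


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: draw len(X) rows with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # Random feature subset considered for this tree's split.
        feats = rng.sample(range(len(X[0])), max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, x):
    """Classification: majority vote over all trees."""
    return majority([predict_stump(stump, x) for stump in forest])


X = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 1.5],
     [6.0, 6.0], [7.0, 7.5], [6.5, 8.0], [7.5, 6.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
```

For regression, the vote in `predict_forest` would be replaced by the mean of the individual tree predictions.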

### ✅ Use Cases:
- Loan approval prediction
- Credit scoring

---

## 7. Gradient Boosting

### 📌 Definition:
Gradient Boosting is an ensemble method that builds models **sequentially**, each correcting the errors of its predecessor.

### 📖 Theoretical Intuition:
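- For squared-error loss, each new learner is fit to the **residual errors** of the current ensemble.
- More generally, each learner is fit to the **negative gradient** of the loss function with respect to the current predictions, so adding it moves the ensemble in a direction that lowers the loss.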

### 📐 Model Update:
\[
F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x)
\]

Where:
- \( F_m \): model after round \( m \) (\( F_{m-1} \) is the model from the previous round)
- \( \eta \): learning rate
- \( h_m(x) \): new weak learner
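
The update rule can be sketched for regression with squared-error loss, where the negative gradient is simply the residual \( y - F_{m-1}(x) \). The weak learner here is a one-split regression stump on a single feature. The function names, the number of rounds, the value of `eta`, and the toy data are assumptions for illustration.

```python
def fit_regression_stump(x, residuals):
    """Best single threshold on a 1-D feature, chosen by squared error.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so the right side is non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - l_mean) ** 2 for r in left)
               + sum((r - r_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, l_mean, r_mean = stump
    return l_mean if xi <= threshold else r_mean


def fit_boosting(x, y, n_rounds=50, eta=0.1):
    f0 = sum(y) / len(y)  # F_0: constant prediction (the mean target)
    preds = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 0.5 * (y - F)^2 w.r.t. F.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_regression_stump(x, residuals)
        stumps.append(stump)
        # F_m(x) = F_{m-1}(x) + eta * h_m(x)
        preds = [pi + eta * predict_stump(stump, xi) for pi, xi in zip(preds, x)]
    return f0, eta, stumps


def predict_boosting(model, xi):
    f0, eta, stumps = model
    return f0 + eta * sum(predict_stump(s, xi) for s in stumps)


def mean_squared_error(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1]
model = fit_boosting(x, y)

baseline_mse = mean_squared_error(y, [sum(y) / len(y)] * len(y))
boosted_mse = mean_squared_error(y, [predict_boosting(model, xi) for xi in x])
```

A smaller `eta` makes each round contribute less, which typically needs more rounds but tends to generalize better.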

### ✅ Use Cases:
- Ranking (search engines)
- Customer churn prediction

---

## 8. Naive Bayes

### 📌 Definition:
Naive Bayes is a **probabilistic classifier** based on **Bayes' Theorem**, assuming **independence between features**.

### 📖 Theoretical Intuition:
- Despite the independence assumption, which rarely holds exactly, it performs well in many applications.
- Fast to train and predict, and often works with relatively small amounts of training data.

### 📐 Bayes’ Theorem:
\[
P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}
\]

### 📐 Classification Rule:
\[
\hat{y} = \arg\max_C P(C) \prod_{i=1}^n P(x_i | C)
\]
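
As an illustration of the classification rule, here is a minimal word-count Naive Bayes for short text. Two details are assumptions added for this sketch and are not part of the formulas above: Laplace smoothing (`alpha`) to avoid zero probabilities, and summing log-probabilities instead of multiplying raw probabilities to avoid underflow. The function names and the tiny corpus are also made up for the example.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(docs, labels, alpha=1.0):
    """Count classes and per-class word occurrences."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab, alpha


def predict_naive_bayes(model, words):
    """argmax over classes of log P(C) + sum_i log P(x_i | C)."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior, log P(C)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocab:
                continue  # ignore words never seen during training
            # Laplace-smoothed estimate of P(word | class)
            p = (word_counts[label][word] + alpha) / (total_words + alpha * len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [["win", "money", "now"], ["free", "money", "offer"],
        ["meeting", "at", "noon"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = fit_naive_bayes(docs, labels)

spam_guess = predict_naive_bayes(model, ["free", "money"])
ham_guess = predict_naive_bayes(model, ["meeting", "notes"])
```

The denominator \( P(X) \) from Bayes' Theorem is omitted because it is the same for every class and does not change the argmax.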

### ✅ Use Cases:
- Spam filtering
- Sentiment analysis

---

## 📌 Summary Table

| Algorithm           | Type           | Key Concept            | Math Element                  |
|---------------------|----------------|------------------------|-------------------------------|
| Linear Regression   | Regression     | Line of best fit       | MSE + Gradient Descent        |
| Logistic Regression | Classification | Probability estimation | Sigmoid + Cross-Entropy       |
| KNN                 | Both           | Nearest neighbors      | Distance (Euclidean, etc.)    |
| SVM                 | Classification | Maximum margin         | Hinge Loss + Kernels          |
| Decision Tree       | Both           | Feature splits         | Gini/Entropy                  |
| Random Forest       | Both           | Ensemble of trees      | Bagging                       |
| Gradient Boosting   | Both           | Residual learning      | Additive Models + Gradient    |
| Naive Bayes         | Classification | Probabilistic model    | Bayes’ Theorem + Product Rule |

---

