
### 🌟 **Non-Linear Supervised Learning Algorithms – Short Notes**

**Definition**:
Supervised learning algorithms learn from labeled data to predict outcomes. **Non-linear algorithms** can model complex relationships where the data cannot be separated or fitted using a straight line.

---

### 📌 1. **Decision Trees**

* **Type**: Classification & Regression
* **Working**: Recursively splits the data on feature thresholds into branches, producing piecewise, axis-aligned (non-linear) decision boundaries.
* **Pros**: Easy to interpret; handles both numerical & categorical data.
* **Cons**: Prone to overfitting.

---

### 📌 2. **Random Forest**

* **Type**: Ensemble (Classification & Regression)
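* **Working**: Builds many decision trees, each on a bootstrap sample of the data with a random subset of features considered at each split, then combines their outputs (vote or averaged probabilities for classification, mean for regression).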
* **Pros**: Reduces overfitting; good accuracy.
* **Cons**: Less interpretable; slower than single trees.
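
A minimal sketch of the idea, assuming scikit-learn and its bundled Iris dataset as stand-in data (not part of these notes). It compares one fully grown tree with a 100-tree forest using 5-fold cross-validation. Note that scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities rather than by a hard majority vote.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset bundled with scikit-learn (assumption, not from the notes)
X, y = load_iris(return_X_y=True)

# One fully grown tree versus a forest of 100 randomized trees
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

tree_acc = cross_val_score(tree, X, y, cv=5).mean()
forest_acc = cross_val_score(forest, X, y, cv=5).mean()
print(f"single tree: {tree_acc:.3f}  random forest: {forest_acc:.3f}")

# After fitting, the individual trees are stored in estimators_
forest.fit(X, y)
print(len(forest.estimators_), "trees in the forest")
```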

---

### 📌 3. **Support Vector Machines (SVM) with Non-Linear Kernels**

* **Type**: Classification & Regression
* **Working**: Uses kernels (like RBF, polynomial) to implicitly map data into a higher-dimensional space where a separating hyperplane can be found.
* **Pros**: Effective in high dimensions.
* **Cons**: Computationally expensive with large datasets.

---

### 📌 4. **K-Nearest Neighbors (KNN)**

* **Type**: Classification & Regression
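* **Working**: Predicts from the K nearest training points: majority vote of their labels (classification) or average of their values (regression), which gives non-linear decision boundaries.
* **Pros**: Simple; no explicit training phase (it just stores the data).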
* **Cons**: Slow with large data; sensitive to feature scaling.

---


### Summary Table

| Algorithm         | Task  | Strength                             | Weakness                   |
| ----------------- | ----- | ------------------------------------ | -------------------------- |
| Decision Tree     | C & R | Interpretability                     | Overfitting                |
| Random Forest     | C & R | Accuracy, robustness                 | Less interpretable, slower |
| SVM (Non-linear)  | C & R | Effective in high-dimensional spaces | High computational cost    |
| KNN               | C & R | Simple, no training needed           | Slow prediction            |

C = classification, R = regression.

---

## 📚 **K-Nearest Neighbors (KNN) – Complete Notes**

---

### 🔷 **What is KNN?**

K-Nearest Neighbors (KNN) is a **supervised learning algorithm** used for both **classification** and **regression**. It is called a **lazy learner** because it does not fit a model during training: it simply stores the training data and defers all computation to prediction time.

---

### 📌 **How KNN Works**

1. **Choose K** (number of neighbors).
2. **Calculate distance** between the new point and all training points.
3. **Sort the distances** and find the **K nearest neighbors**.
4. For **classification**:

   * Use **majority vote** of K neighbors’ classes.
5. For **regression**:

   * Take the **average (or weighted average)** of K neighbors’ values.
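
The steps above can be sketched from scratch in plain Python. This is a brute-force illustration, not how scikit-learn implements KNN internally; the helper names (`euclidean`, `k_nearest`, `knn_classify`, `knn_regress`) are hypothetical. Ties in distance are broken by training-set order, and tied votes go to the label seen first among the neighbors.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_nearest(X_train, query, k):
    """Steps 2-3: distance to every training point, sorted, first k indices kept."""
    distances = sorted((euclidean(x, query), i) for i, x in enumerate(X_train))
    return [i for _, i in distances[:k]]

def knn_classify(X_train, y_train, query, k=3):
    """Step 4: majority vote among the labels of the k nearest points."""
    votes = Counter(y_train[i] for i in k_nearest(X_train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(X_train, y_train, query, k=3):
    """Step 5: unweighted average of the targets of the k nearest points."""
    neighbors = k_nearest(X_train, query, k)
    return sum(y_train[i] for i in neighbors) / len(neighbors)

# Fruit data from the classification example: [weight (g), color score]
X = [[150, 0.8], [170, 0.75], [130, 0.4], [120, 0.35], [110, 0.9]]
y = ['Apple', 'Apple', 'Orange', 'Orange', 'Banana']
print(knn_classify(X, y, [140, 0.6], k=3))  # Orange

# House data from the regression example: size (sqft) -> price (in $1000)
sizes = [[1000], [1200], [1500], [1700], [2000]]
prices = [300, 330, 380, 410, 450]
print(knn_regress(sizes, prices, [1650], k=3))  # mean of 410, 380 and 450
```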

---

### 🧠 **KNN for Classification – Example**

#### 🧾 Dataset (Fruit Classification)

| Weight (g) | Color Score | Label  |
| ---------- | ----------- | ------ |
| 150        | 0.8         | Apple  |
| 170        | 0.75        | Apple  |
| 130        | 0.4         | Orange |
| 120        | 0.35        | Orange |
| 110        | 0.9         | Banana |

#### 🆕 New Fruit to Classify

* Weight: **140 g**
* Color Score: **0.6**

#### ✅ Python Code (Classification)

```python
from sklearn.neighbors import KNeighborsClassifier

# Features and labels
X = [[150, 0.8], [170, 0.75], [130, 0.4], [120, 0.35], [110, 0.9]]
y = ['Apple', 'Apple', 'Orange', 'Orange', 'Banana']

# New data
new_data = [[140, 0.6]]

# KNN Classifier
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Predict
prediction = model.predict(new_data)
print("Predicted Class:", prediction[0])
```

#### 🎯 Output

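```
Predicted Class: Orange
```

> With K = 3, the nearest neighbors are the 150 g Apple, the 130 g Orange and the 120 g Orange, so the majority vote is Orange. The features are not scaled, so weight (differences of 10 to 30 g) dominates the distance while the color score (differences of at most 0.3) barely matters.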

---

### 📈 **KNN for Regression – Example**

#### 🧾 Dataset (House Price Prediction)

| House Size (sqft) | Price (in \$1000) |
| ----------------- | ----------------- |
| 1000              | 300               |
| 1200              | 330               |
| 1500              | 380               |
| 1700              | 410               |
| 2000              | 450               |

#### 🆕 Predict Price of House

* Size: **1650 sqft**

#### ✅ Python Code (Regression)

```python
from sklearn.neighbors import KNeighborsRegressor

# Features and target
X = [[1000], [1200], [1500], [1700], [2000]]
y = [300, 330, 380, 410, 450]

# New data
new_house = [[1650]]

# KNN Regressor
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X, y)

# Predict (prices are in $1000, so multiply by 1000)
prediction = model.predict(new_house)
print(f"Predicted Price: ${prediction[0] * 1000:,.0f}")
```

#### 🎯 Output

```
Predicted Price: $413,333
```

> Here, the 3 closest house sizes to 1650 sqft are 1700, 1500 and 2000 (distances 50, 150 and 350 sqft), so the prediction is the average of their prices: (410 + 380 + 450) / 3 ≈ 413.33, i.e. about $413,333.

---

### 🔧 **Choosing the Right K**

* **Too small (e.g., K=1)** → Overfitting (model too sensitive to noise).
* **Too large** → Underfitting (model too smooth).
* Use **cross-validation** to find the optimal K.
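
A minimal sketch of choosing K by cross-validation, assuming scikit-learn and its bundled Iris dataset as stand-in data; the candidate range (odd K from 1 to 29) is an arbitrary choice. Scaling sits inside the pipeline so it is re-fit on each training fold and never sees the validation fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption, not from the notes)
X, y = load_iris(return_X_y=True)

k_values = list(range(1, 31, 2))  # odd K avoids tied votes in two-class problems
mean_scores = {}
for k in k_values:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    # Mean accuracy over 5 folds for this K
    mean_scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(mean_scores, key=mean_scores.get)
print(f"best K = {best_k}, mean CV accuracy = {mean_scores[best_k]:.3f}")
```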

---

### 📐 **Distance Metrics**

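* **Euclidean Distance** (scikit-learn's default: `metric='minkowski'` with `p=2`):

  $$
  d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{j=1}^{n} (a_j - b_j)^2}
  $$

  where $\mathbf{a}$ and $\mathbf{b}$ are two points with $n$ features each.
* **Other options**: Manhattan ($\sum_{j} |a_j - b_j|$), Minkowski ($(\sum_{j} |a_j - b_j|^p)^{1/p}$), cosine distance ($1 - \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|}$).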
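
The distance formulas, written out with NumPy as a sketch (the function names are hypothetical). In scikit-learn the metric is chosen with the `metric` argument, for example `KNeighborsClassifier(metric='manhattan')`, and `p` sets the Minkowski order.

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def minkowski(a, b, p):
    # p = 1 reproduces Manhattan, p = 2 reproduces Euclidean
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

def cosine_distance(a, b):
    # 1 - cosine similarity: compares direction only, ignores vector length
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
print(euclidean(a, b))        # 5.0
print(manhattan(a, b))        # 7.0
print(minkowski(a, b, 3))     # (27 + 64) ** (1/3), about 4.50
print(cosine_distance(a, b))  # close to 0: the vectors point in similar directions
```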

---

### ✅ **Advantages**

* Simple to understand and implement.
* Works well with small, clean datasets.
* No training time – just store data.

---

### ❌ **Disadvantages**

* Slow prediction on large datasets.
* Memory-intensive (stores all data).
* Performance drops in **high dimensions** (curse of dimensionality).
* Sensitive to feature scaling (important to normalize or standardize features).
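
A sketch of why scaling matters, reusing the fruit data from the classification example and assuming scikit-learn's `StandardScaler` as the remedy. It first measures how much of the squared distance comes from the weight feature, then standardizes both features inside a pipeline. On this data the standardized pipeline predicts Orange.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Fruit data from the classification example: [weight (g), color score]
X = np.array([[150, 0.8], [170, 0.75], [130, 0.4], [120, 0.35], [110, 0.9]])
y = np.array(['Apple', 'Apple', 'Orange', 'Orange', 'Banana'])
new_fruit = np.array([[140, 0.6]])

# Per-feature squared differences to the first apple: weight term 100.0,
# color term about 0.04, so weight decides the distance almost alone
sq_diff = (X[0] - new_fruit[0]) ** 2
weight_share = sq_diff[0] / sq_diff.sum()
print(f"weight contributes {weight_share:.2%} of the squared distance")

# The scaler learns each feature's mean and std on the training data
# and applies the same transform to new points before the distance step
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(X, y)
print(scaled_knn.predict(new_fruit))
```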

---

### 🧪 **Applications of KNN**

* **Image recognition**
* **Handwriting classification**
* **Medical diagnosis**
* **Recommendation systems**
* **Customer segmentation**

---

# 🌳 Decision Tree – Detailed Explanation

---

## 1. **What is a Decision Tree?**

* A **Decision Tree** is a **tree-shaped model** used for supervised learning.
* It predicts **target values** by learning simple decision rules inferred from data features.
* Applicable to **classification** (discrete labels) and **regression** (continuous values).

---

## 2. **Structure of a Decision Tree**

* **Root Node:** The top node representing the entire dataset.
* **Internal Nodes (Decision Nodes):** Nodes where data is split based on a feature condition.
* **Branches:** Paths from one node to another depending on split decisions.
* **Leaf Nodes (Terminal Nodes):** Final nodes that give the output (class label or regression value).

---

## 3. **How Does a Decision Tree Work?**

* **Step 1:** Start at the root with the full dataset.
* **Step 2:** Evaluate all possible splits across all features.
* **Step 3:** Choose the split that best separates the data according to a **splitting criterion**.
* **Step 4:** Split the dataset into subsets based on the selected feature and threshold.
* **Step 5:** Recursively repeat Steps 2–4 on each subset.
* **Step 6:** Stop splitting when a stopping condition is met:

  * All samples in a node belong to the same class (classification).
  * Minimum samples per node reached.
  * Maximum tree depth reached.
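
The recursive procedure above can be sketched in plain Python as a simplified, CART-style classifier using Gini impurity (defined in the next section). Assumptions: numeric features only, candidate thresholds at midpoints between sorted values, and hypothetical function names. It is unoptimized and only meant to mirror Steps 1 to 6.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Steps 2-3: try every feature and midpoint threshold; keep the lowest weighted Gini."""
    best = None  # (weighted_gini, feature_index, threshold)
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Steps 4-6: split recursively until a stopping condition holds."""
    majority = Counter(y).most_common(1)[0][0]
    if gini(y) == 0.0 or depth >= max_depth or len(y) < min_samples:
        return majority  # leaf: pure node, depth limit, or too few samples
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):
        return majority  # leaf: no split lowers the impurity
    _, f, t = split

    def child(keep):
        idx = [i for i in range(len(y)) if keep(X[i][f])]
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_samples)

    return {"feature": f, "threshold": t,
            "left": child(lambda v: v <= t), "right": child(lambda v: v > t)}

def predict(node, x):
    """Follow the decision rules from the root down to a leaf."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

tree = build_tree([[25], [35], [45], [20], [50]], ['No', 'Yes', 'Yes', 'No', 'Yes'])
print(tree)                 # {'feature': 0, 'threshold': 30.0, 'left': 'No', 'right': 'Yes'}
print(predict(tree, [28]))  # No
```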

---

## 4. **Splitting Criteria (How to Choose Best Split)**

### For Classification:

* **Gini Impurity:**

$$
Gini = 1 - \sum_{i=1}^{C} p_i^2
$$

Where $p_i$ is the proportion of class $i$ in the node and $C$ is the number of classes. A Gini of 0 means the node is pure; lower values mean purer nodes.

* **Entropy (Information Gain):**

$$
Entropy = - \sum_{i=1}^C p_i \log_2(p_i)
$$

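$$
IG = Entropy(\text{parent}) - \sum_{k} \frac{n_k}{n} \, Entropy(\text{child}_k)
$$

Where $n_k$ is the number of samples in child $k$ and $n$ is the number of samples in the parent. Higher information gain means a better split.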
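
A sketch that checks these formulas by hand on five samples labelled No, No, Yes, Yes, Yes (the labels used in the Python example in section 11). `split_b` is an invented, worse split added only for comparison, and the helper names are hypothetical.

```python
import math
from collections import Counter

def gini(labels):
    """Gini = 1 - sum(p_i^2) over the classes present in the node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)); absent classes never appear in the Counter."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ['No', 'No', 'Yes', 'Yes', 'Yes']
split_a = [['No', 'No'], ['Yes', 'Yes', 'Yes']]  # pure children (Age <= 30 vs > 30)
split_b = [['No', 'Yes'], ['No', 'Yes', 'Yes']]  # invented mixed split, for comparison

print(round(gini(parent), 3))                       # 0.48
print(round(entropy(parent), 3))                    # 0.971
print(round(information_gain(parent, split_a), 3))  # 0.971
print(round(information_gain(parent, split_b), 3))  # 0.02
```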

### For Regression:

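* **Mean Squared Error (MSE):**

$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2
$$

Where $\bar{y}$ is the mean target value in the node (which is also the node's prediction). The split is chosen to minimize the size-weighted MSE of the child nodes, i.e. the variance of target values within each child.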
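
A sketch of the regression criterion, reusing the house-price table from the KNN regression example: it scores every midpoint threshold by the size-weighted MSE of the two children and keeps the lowest. The helper name `mse` is hypothetical.

```python
def mse(values):
    """Mean squared deviation from the node mean (the node's prediction)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

sizes = [1000, 1200, 1500, 1700, 2000]   # sqft, already sorted
prices = [300, 330, 380, 410, 450]       # in $1000

results = []
for lo, hi in zip(sizes, sizes[1:]):
    t = (lo + hi) / 2                    # candidate threshold between neighbors
    left = [p for s, p in zip(sizes, prices) if s <= t]
    right = [p for s, p in zip(sizes, prices) if s > t]
    # Weight each child's MSE by its share of the samples
    weighted = (len(left) * mse(left) + len(right) * mse(right)) / len(prices)
    results.append((weighted, t))
    print(f"threshold {t:.0f}: weighted MSE = {weighted:.1f}")

best_mse, best_t = min(results)
print("best threshold:", best_t)  # 1350.0
```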

---

## 5. **Stopping Conditions**

* **Pure node:** All examples belong to the same class.
* **Max depth:** Limit the depth of the tree to avoid overfitting.
* **Minimum samples per split/leaf:** Prevent splits with too few samples.
* **No improvement:** No further decrease in impurity.

---

## 6. **How to Use Decision Trees?**

### Classification Example:

| Feature: Age | Buys Laptop (Label) |
| ------------ | ------------------- |
| < 30         | No                  |
| 30–40        | Yes                 |
| > 40         | Yes                 |

* The root node asks: is Age < 30?
* If yes, predict **No**.
* Else, predict **Yes**.

### Regression Example:

| House Size (sqft) | Price (in \$1000) |
| ----------------- | ----------------- |
| 1000              | 200               |
| 1500              | 300               |
| 2000              | 400               |

* Splits data based on size ranges minimizing variance in prices within each leaf.
* Predicts the average price in the leaf node.
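
A minimal scikit-learn sketch on the table above. With only three houses and no depth limit, each house ends up in its own leaf, so the "average price in the leaf" is simply that house's price; a depth limit such as `max_depth=1` would make leaves average several houses.

```python
from sklearn.tree import DecisionTreeRegressor

X = [[1000], [1500], [2000]]   # house size (sqft)
y = [200, 300, 400]            # price (in $1000)

# Fully grown tree: every training house gets its own leaf
model = DecisionTreeRegressor(random_state=0)
model.fit(X, y)

# Learned thresholds sit at the midpoints 1250 and 1750, so 1600 sqft
# falls into the 1500-sqft leaf and receives that leaf's mean price
print(model.predict([[1600]]))  # [300.]
```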

---

## 7. **Advantages**

* **Easy to understand and interpret** (decision paths are human-readable).
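* Handles **both categorical and numerical data** (in principle; scikit-learn's trees expect numeric input, so categorical features are usually encoded first).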
* **No need for feature scaling or normalization**.
* Can model **non-linear relationships**.
* Fast predictions once trained.

---

## 8. **Disadvantages**

* Can easily **overfit**, especially when the tree is very deep.
* Small changes in data can lead to **very different trees** (unstable).
* Decision boundaries are **axis-aligned**, which may limit model flexibility.
* Not great with **high-dimensional data** without feature selection.

---

## 9. **Avoiding Overfitting**

* **Pruning:** Cut back parts of the tree that do not improve performance on validation data.
* **Set max depth:** Limit how deep the tree can grow.
* **Minimum samples leaf:** Set a minimum number of samples needed in a leaf node.
* Use **ensemble methods** like Random Forest or Gradient Boosted Trees for better generalization.
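
A sketch of cost-complexity pruning, which scikit-learn exposes through the `ccp_alpha` parameter and the `cost_complexity_pruning_path` method. Assumptions: the bundled breast-cancer dataset as stand-in data, and a single held-out validation split for picking alpha (cross-validation would be more reliable).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption, not from the notes)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Unpruned tree: grows until its leaves are pure
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Effective alphas at which subtrees get pruned away (computed on training data)
path = full.cost_complexity_pruning_path(X_train, y_train)

best = None
for alpha in path.ccp_alphas[:-1]:  # the largest alpha prunes down to the root
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    val_acc = pruned.score(X_val, y_val)
    if best is None or val_acc > best[0]:
        best = (val_acc, alpha, pruned)

best_acc, best_alpha, best_tree = best
print(f"unpruned: {full.get_n_leaves()} leaves, "
      f"validation accuracy {full.score(X_val, y_val):.3f}")
print(f"pruned (ccp_alpha={best_alpha:.4f}): {best_tree.get_n_leaves()} leaves, "
      f"validation accuracy {best_acc:.3f}")
```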

---

## 10. **Hyperparameters to Tune**

| Parameter           | Purpose                                                                                  |
| ------------------- | ---------------------------------------------------------------------------------------- |
| `max_depth`         | Maximum depth of the tree (`None` = no limit)                                            |
| `min_samples_split` | Minimum samples required to split a node                                                 |
| `min_samples_leaf`  | Minimum samples required in a leaf node                                                  |
| `criterion`         | Split quality measure: `gini`, `entropy` (classification); `squared_error` (regression, formerly `mse`) |
| `max_features`      | Number of features considered when looking for the best split                           |
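
A sketch of tuning these hyperparameters with `GridSearchCV`, assuming the bundled Iris dataset as stand-in data; the grid values are arbitrary examples, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption, not from the notes)
X, y = load_iris(return_X_y=True)

# Every combination below is evaluated with 5-fold cross-validation
param_grid = {
    "max_depth": [2, 3, 4, None],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
print(f"mean CV accuracy: {search.best_score_:.3f}")
```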

---

## 11. **Python Code Example (Classification)**

```python
from sklearn.tree import DecisionTreeClassifier

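# Ages and whether the person bought a laptop
X = [[25], [35], [45], [20], [50]]
y = ['No', 'Yes', 'Yes', 'No', 'Yes']

model = DecisionTreeClassifier(max_depth=3, criterion='gini', random_state=0)
model.fit(X, y)

# Every training age <= 30 is 'No', so the learned rule is Age <= 30.0 -> 'No'
print(model.predict([[28]]))  # ['No']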
```

---

## 12. **Visualizing Decision Trees**

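* Trees can be visualized with `sklearn.tree.plot_tree` (drawn with matplotlib), printed as text rules with `sklearn.tree.export_text`, or exported to Graphviz DOT format with `sklearn.tree.export_graphviz`.
* Visualization helps you understand how the tree makes decisions.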
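
A sketch using the laptop example from section 11: `export_text` prints the learned rules without any plotting library, and the commented lines show the `plot_tree` equivalent, which needs matplotlib.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25], [35], [45], [20], [50]]
y = ['No', 'Yes', 'Yes', 'No', 'Yes']
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Text rendering of the learned rules
report = export_text(model, feature_names=["Age"])
print(report)

# Graphical version (requires matplotlib):
# import matplotlib.pyplot as plt
# from sklearn.tree import plot_tree
# plot_tree(model, feature_names=["Age"], class_names=list(model.classes_), filled=True)
# plt.show()
```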

---

## Summary Table

| Aspect           | Details                                                       |
| ---------------- | ------------------------------------------------------------- |
| Type             | Supervised, non-linear                                        |
| Suitable for     | Classification & regression                                   |
| Interpretability | High (easy to interpret)                                      |
| Pros             | Simple, no scaling needed, handles categorical/numerical data |
| Cons             | Prone to overfitting, unstable, axis-aligned splits           |

---

# Support Vector Machine (SVM) — Clear & Simple Guide

---

## 1. **What is SVM?**

* SVM is a **supervised learning algorithm** used for:

  * **Classification** (e.g., spam or not spam)
  * **Regression** (predicting continuous values, less common)

* Goal: Find the **best boundary** (called a **hyperplane**) that **separates different classes** in the data.

---

## 2. **What is a Hyperplane?**

* In 2D, a hyperplane is a **line** that divides the plane.
* In 3D, it’s a **plane** dividing the space.
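* In $n$ dimensions, it is an $(n-1)$-dimensional flat boundary, written as $\mathbf{w}^\top \mathbf{x} + b = 0$, where $\mathbf{w}$ is the normal vector and $b$ the offset.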

---

## 3. **How does SVM find the best hyperplane?**

* SVM looks for the hyperplane that **maximizes the margin**:

  * The margin = distance between the hyperplane and the **closest points** of each class.
  * These closest points are called **support vectors**.
* A bigger margin generally leads to better generalization on unseen data.
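
A sketch that makes the margin concrete, assuming scikit-learn's `SVC` with a linear kernel on four made-up points. A very large `C` approximates a hard margin. For a linear kernel the fitted hyperplane $\mathbf{w}^\top \mathbf{x} + b = 0$ is available as `coef_` and `intercept_`, and the full margin width is $2 / \|\mathbf{w}\|$.

```python
import numpy as np
from sklearn.svm import SVC

# Two tiny, linearly separable classes: class 0 at x1 = 0, class 1 at x1 = 2 (made-up data)
X = np.array([[0, 0], [0, 1], [2, 0], [2, 1]])
y = np.array([0, 0, 1, 1])

# Very large C: margin violations are effectively forbidden
model = SVC(kernel="linear", C=1e6).fit(X, y)

w = model.coef_[0]               # normal vector of the hyperplane
b = model.intercept_[0]          # offset
margin = 2 / np.linalg.norm(w)   # distance between the two margin lines

print("support vectors:\n", model.support_vectors_)
print("w =", w, "b =", b)
print("margin width:", round(margin, 3))  # close to 2.0 (boundary at x1 = 1)
```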

---

## 4. **Why does the margin matter?**

* A **wide margin** means the decision boundary is far from any data point, making the model more robust to noise.
* A **narrow margin** means the boundary is close to some data points and more likely to misclassify new points.

---

## 5. **What if data is not linearly separable?**

* Many real-world problems have data that **can’t be separated by a straight line**.
* SVM uses something called the **Kernel Trick**:

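  * It implicitly maps the data into a **higher-dimensional** space where a linear boundary can separate it; the kernel computes inner products in that space directly, so the mapping is never built explicitly.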
  * Example kernels:

    * **Linear** (no transform)
    * **Polynomial** (curved boundaries)
    * **RBF (Radial Basis Function)** (very flexible, nonlinear boundaries)
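
A sketch of why the trick works, for the degree-2 polynomial kernel $K(\mathbf{a}, \mathbf{b}) = (\mathbf{a} \cdot \mathbf{b})^2$ on 2-D inputs: it equals an ordinary dot product after the explicit map $\phi(x_1, x_2) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, yet never builds $\phi$. This kernel corresponds to scikit-learn's `SVC(kernel='poly', degree=2, gamma=1, coef0=0)`; the RBF kernel plays the same role for an infinite-dimensional feature space. The function names are hypothetical.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel, computed in the original 2-D space."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = np.dot(phi(a), phi(b))  # inner product in the 3-D feature space
implicit = poly_kernel(a, b)       # same value, without ever building phi
print(explicit, implicit)          # both about 121
```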

---

## 6. **Soft Margin SVM**

* Real data often has noise and overlapping classes.
* SVM allows some **misclassifications** by introducing a **soft margin**.
* A parameter **C** controls the trade-off:

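  * Large C = less tolerance for errors (narrower margin; tries to classify every training point correctly and may overfit)
  * Small C = more tolerance for errors (wider margin; smoother boundary that often generalizes better, but may underfit if C is too small)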
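
A sketch of the trade-off, assuming scikit-learn and two synthetic clusters from `make_blobs` with a large spread (`cluster_std=2.5`), so they are likely to overlap. As `C` shrinks, the margin widens and more points fall on or inside it, so the number of support vectors typically grows.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data (assumption, not from the notes)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

n_support = {}
for C in [0.01, 1, 100]:
    model = SVC(kernel="linear", C=C).fit(X, y)
    n_support[C] = len(model.support_vectors_)
    print(f"C={C}: {n_support[C]} support vectors, "
          f"training accuracy {model.score(X, y):.3f}")
```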

---

## 7. **SVM for Regression (SVR)**

* Instead of classification, SVR predicts continuous values.
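* It fits a function so that most training points lie within a distance epsilon (ε) of it: errors smaller than epsilon are ignored, and points outside this tube are penalized (weighted by C).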
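
A minimal SVR sketch, reusing the house-price table from the KNN regression example. `epsilon=5` (errors up to 5, in thousands of dollars, are ignored) and `C=100` are illustrative choices, and the inputs are standardized first because SVR is sensitive to feature scale.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# House sizes (sqft) and prices (in $1000)
X = [[1000], [1200], [1500], [1700], [2000]]
y = [300, 330, 380, 410, 450]

# Linear SVR: the flattest line that keeps the points within the epsilon tube
model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=100, epsilon=5))
model.fit(X, y)

prediction = model.predict([[1650]])
print(prediction)
```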

---

## 8. **Summary of Steps in SVM**

| Step                             | Description                                                       |
| -------------------------------- | ----------------------------------------------------------------- |
| 1. Choose kernel                 | Linear for simple, RBF or Polynomial for complex data             |
| 2. Find hyperplane               | Maximize margin between classes                                   |
| 3. Use soft margin (parameter C) | Allow some misclassification for robustness                       |
| 4. Make predictions              | New points are classified by which side of the hyperplane they fall on |

---

## 9. **Example Intuition**

Imagine you want to separate apples and oranges on a table:

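* **Linear SVM:** Draw the straight line that separates the apples (left) from the oranges (right) while leaving the largest possible gap between the closest apple and the closest orange.
* **Non-linear SVM:** If the oranges sit in the middle of the table with the apples arranged in a ring around them, no straight line works. Transform the space (for example, add each fruit's distance from the center as a new feature) so that a flat boundary separates them in the transformed space.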
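
A sketch of the ring scenario, assuming scikit-learn's `make_circles` (one class as an inner disc, the other as a surrounding ring). It compares a linear kernel, an RBF kernel, and a hand-made version of the transformation: adding the squared distance from the center, $x_1^2 + x_2^2$, as a third feature so that a linear SVM can separate the classes with a flat plane. The helper `add_radius` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Inner disc = one class, surrounding ring = the other (synthetic data)
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def add_radius(X):
    """Hand-made transform: append x1^2 + x2^2 (squared distance from the center)."""
    return np.column_stack([X, (X ** 2).sum(axis=1)])

linear = SVC(kernel="linear").fit(X_train, y_train)
rbf = SVC(kernel="rbf").fit(X_train, y_train)
lifted = SVC(kernel="linear").fit(add_radius(X_train), y_train)

print(f"linear kernel, original features: {linear.score(X_test, y_test):.2f}")
print(f"RBF kernel, original features:    {rbf.score(X_test, y_test):.2f}")
print(f"linear kernel, + squared radius:  {lifted.score(add_radius(X_test), y_test):.2f}")
```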

---

## 10. **Python Code Example (Classification)**

```python
from sklearn.svm import SVC

# Sample data points
X = [[1,2], [2,3], [3,3], [5,5], [6,5], [7,7]]
y = [0, 0, 0, 1, 1, 1]  # Classes 0 and 1

# Create SVM with RBF kernel (non-linear)
model = SVC(kernel='rbf', C=1.0)
model.fit(X, y)

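# Predict a new point; (4, 4) lies between the two groups
print(model.predict([[4, 4]]))  # [0] or [1], depending on the fitted boundary
# Signed score: > 0 means class 1, < 0 means class 0
print(model.decision_function([[4, 4]]))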
```

---

## 11. **When to Use SVM?**

* When your data has a **clear margin** between classes.
* Works well in **high dimensional spaces**.
* When you want a **robust classifier** with good generalization.
* When you have **small to medium-sized datasets** (SVM can be slow on very large datasets).

---

## 12. **Pros and Cons**

| Pros                                         | Cons                                   |
| -------------------------------------------- | -------------------------------------- |
| Effective in high-dimensional spaces         | Computationally expensive for big data |
| Works well with a clear margin of separation | Choosing the kernel and tuning `C`/`gamma` is tricky |
| Can model non-linear boundaries with kernels | No native probability outputs (`probability=True` in scikit-learn adds costly Platt scaling) |

---
