# <h1 style='text-align:center'>**Introduction to Machine Learning (ML)**</h1>
---
---

#### **What is Machine Learning?**

Machine Learning (ML) is a branch of artificial intelligence (AI) that focuses on developing algorithms and models that enable computers to **learn from data and make predictions or decisions without being explicitly programmed**. Unlike traditional programming, where rules are manually coded, ML systems improve automatically as they are exposed to more data.

In simpler terms:

> Machine Learning allows computers to **learn patterns from data** and **take actions** based on those patterns.

---

### **Key Concepts in Machine Learning**

1. **Data**

   * ML systems rely heavily on data: images, text, numbers, or sensor readings.
   * The quality and quantity of data greatly affect the performance of the ML model.

2. **Features**

   * Features are the **individual measurable properties or characteristics** of the data.
   * Example: In predicting house prices, features could include the number of bedrooms, location, and square footage.

3. **Model**

   * A model is the mathematical representation of patterns learned from data.
   * It maps input data (features) to an output (prediction or classification).

4. **Training**

   * Training is the process of feeding data to the ML algorithm so it can **learn patterns**.
   * The algorithm adjusts its internal parameters to minimize errors.

5. **Prediction/Inference**

   * Once trained, the model can make predictions on **new, unseen data**.

6. **Evaluation**

   * ML models are evaluated using metrics like **accuracy, precision, recall, or mean squared error**, depending on the type of problem.
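
The six concepts above can be tied together in a small sketch. It is illustrative only: the toy data, the single-parameter model `y = w * x`, the learning rate, and the function names `predict`, `mse`, and `train` are assumptions made for this example, not part of any library.

```python
# Toy data (made up for illustration): one feature x, target y = 3 * x.
features = [1.0, 2.0, 3.0, 4.0]
targets = [3.0, 6.0, 9.0, 12.0]

def predict(w, x):
    """Model: maps an input feature to an output using one parameter w."""
    return w * x

def mse(w, xs, ys):
    """Evaluation: mean squared error of the model on a dataset."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr=0.01, steps=500):
    """Training: adjust w by gradient descent to reduce the squared error."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (predict(w, x) - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

w = train(features, targets)
new_prediction = predict(w, 5.0)          # inference on an unseen input
final_error = mse(w, features, targets)   # evaluation
```

With this data the learned `w` settles very close to 3, so the prediction for the unseen input 5.0 is close to 15.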

---

### **Applications of Machine Learning**

* **Healthcare:** Disease prediction, medical image analysis.
* **Finance:** Fraud detection, stock market predictions.
* **E-commerce:** Recommendation systems (e.g., Netflix, Amazon).
* **Autonomous Systems:** Self-driving cars, drones.
* **Natural Language Processing:** Chatbots, translation, sentiment analysis.

---

### **Key Takeaways**

* Machine Learning enables computers to **learn from data** instead of being explicitly programmed.
* The **type of ML** depends on whether the data is labeled and the type of problem being solved.
* ML is widely used in real-world applications and is transforming industries globally.



## 1. Understanding AI, ML, DL, GenAI, and LLMs

This README provides a concise overview of key AI-related technologies and how they relate to each other.

### Table: Differences

| Term | Definition | Subset of | How it Works | Examples | Key Point |
|------|------------|-----------|--------------|---------|-----------|
| **Artificial Intelligence (AI)** | Broad science of making machines perform tasks that require human intelligence | – | Rule-based systems, logic, or data-driven methods | Chess-playing AI, Siri, Fraud detection | Umbrella term for all intelligent machines |
| **Machine Learning (ML)** | AI that **learns from data** and improves automatically | AI | Algorithms learn patterns from data to make predictions or decisions | Spam filters, Recommendation systems, Price prediction | AI that **learns from data** |
| **Deep Learning (DL)** | ML that uses **deep neural networks** to learn complex patterns from large data | ML | Multi-layer neural networks extract features automatically from data | Image recognition, Speech recognition, Self-driving cars | ML for **big and complex data** |
| **Generative AI (GenAI)** | AI that **creates new content** (text, images, audio, code) | AI / ML / DL | Models generate new outputs resembling training data | ChatGPT, DALL·E, GitHub Copilot | AI focused on **creation**, not just prediction |
| **Large Language Models (LLMs)** | DL models trained on massive text to **understand & generate human-like language** | DL / GenAI | Learn patterns of language from huge text corpora | GPT-4/5, LLaMA, Claude | GenAI specialized for **language tasks** |

### Summary

- **AI**: The broad field of machine intelligence.  
- **ML**: AI that learns from data.  
- **DL**: ML using deep neural networks.  
- **GenAI**: AI that generates new content.  
- **LLMs**: GenAI specialized for language understanding and generation.

---


## 2. Types of Machine Learning
---


### **2.1 Supervised Machine Learning**

#### **Definition**

Supervised Machine Learning (SML) is a type of ML where the model **learns from labeled data**. Each input (feature) in the dataset has a **corresponding output (label)**. The goal is to learn a **mapping from input to output** so that the model can predict outputs for **new, unseen data**.

* **Input → Output**
* Data must be **labeled** (e.g., "spam" or "not spam").

---

#### **How it Works**

1. **Collect labeled data** – Data with known inputs and outputs.
2. **Split data** – Usually into training and testing sets.
3. **Choose an algorithm** – E.g., Linear Regression, Decision Tree, etc.
4. **Train the model** – Algorithm learns patterns in the training data.
5. **Test/Evaluate** – Check how well the model predicts on unseen test data using metrics like accuracy, RMSE, or F1-score.
6. **Prediction** – Use the trained model to predict new data.
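
The six steps can be sketched in plain Python. This is a minimal illustration, not a recommended implementation: the dataset, the 6/2 split, and the hand-written 1-nearest-neighbor `predict` function are all assumptions chosen to keep the example self-contained.

```python
import random

# Step 1: hypothetical labeled data, (hours_studied, hours_slept) -> "pass" / "fail".
data = [
    ((1.0, 4.0), "fail"), ((1.5, 5.0), "fail"), ((2.0, 4.5), "fail"),
    ((2.5, 5.5), "fail"), ((7.0, 7.0), "pass"), ((7.5, 8.0), "pass"),
    ((8.0, 6.5), "pass"), ((8.5, 7.5), "pass"),
]

# Step 2: split into training and testing sets (fixed seed for repeatability).
rng = random.Random(0)
shuffled = data[:]
rng.shuffle(shuffled)
train_set, test_set = shuffled[:6], shuffled[6:]

# Steps 3-4: a 1-nearest-neighbor "model" simply stores the training data.
def predict(x, training):
    """Return the label of the closest training point (squared Euclidean distance)."""
    nearest = min(training, key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])))
    return nearest[1]

# Step 5: evaluate accuracy on the held-out test set.
correct = sum(predict(x, train_set) == label for x, label in test_set)
accuracy = correct / len(test_set)

# Step 6: predict for a new, unseen input.
new_label = predict((9.0, 8.0), train_set)
```

Because the two classes in this toy data are far apart, any split leaves the classifier with enough examples to label the test points correctly; real data is rarely this clean.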

---

#### **Types of Supervised Learning**

1. **Regression** – Predict **continuous values**.

   * Example: Predicting house prices, temperature, stock prices.
2. **Classification** – Predict **discrete categories/classes**.

   * Example: Email spam detection, disease diagnosis (yes/no).

---

#### **Common Algorithms**

| Algorithm                        | Type                      | Description                                              | Example Use Case                              |
| -------------------------------- | ------------------------- | -------------------------------------------------------- | --------------------------------------------- |
| **Linear Regression**            | Regression                | Predicts continuous output by fitting a line to data     | Predict house prices based on size, location  |
| **Logistic Regression**          | Classification            | Predicts probability of a categorical outcome            | Spam detection, disease prediction            |
| **Decision Tree**                | Regression/Classification | Splits data into branches based on feature decisions     | Customer churn prediction                     |
| **Random Forest**                | Regression/Classification | Ensemble of decision trees for better accuracy           | Fraud detection, stock prediction             |
| **Support Vector Machine (SVM)** | Regression/Classification | Finds the best boundary (hyperplane) to separate classes | Handwriting recognition, credit scoring                       |
| **K-Nearest Neighbors (KNN)**    | Regression/Classification | Predicts output based on nearest neighbors               | Recommender systems, credit scoring           |
| **Naive Bayes**                  | Classification            | Uses probability based on Bayes' theorem                 | Email spam classification, sentiment analysis |

---

#### **Examples of Supervised Learning**

| Domain      | Problem                                  | Type           | Algorithm Example                           |
| ----------- | ---------------------------------------- | -------------- | ------------------------------------------- |
| E-commerce  | Predict if a customer will buy a product | Classification | Logistic Regression, Random Forest          |
| Healthcare  | Predict patient's blood pressure         | Regression     | Linear Regression, Random Forest Regression |
| Finance     | Detect fraudulent transactions           | Classification | Decision Tree, SVM, Random Forest           |
| Marketing   | Predict customer churn                   | Classification | KNN, Logistic Regression                    |
| Real Estate | Predict house prices                     | Regression     | Linear Regression, Random Forest            |

---

#### **Key Points**

* Requires **labeled data**.
* Divided into **regression (continuous)** and **classification (categorical)** problems.
* Performance depends on **data quality, feature selection, and algorithm choice**.
* Often evaluated using metrics like **accuracy, precision, recall, F1-score (classification)** or **MSE/RMSE (regression)**.


---

### **2.2 Unsupervised Machine Learning (UML)**

#### **Definition**

Unsupervised Machine Learning is a type of ML where the model learns patterns from **unlabeled data**. Unlike supervised learning, there are **no predefined outputs or labels**. The goal is to find **hidden structures, patterns, or relationships** in the data.

* **Input → No explicit Output**
* Focuses on **clustering, grouping, and reducing dimensionality**.

---

#### **How it Works**

1. **Collect unlabeled data** – Data has features but no target labels.
2. **Choose an algorithm** – E.g., K-Means, Hierarchical Clustering, PCA.
3. **Train the model** – Algorithm finds patterns, clusters, or reduces dimensionality.
4. **Interpret results** – Use patterns for insights, visualization, or preprocessing for other ML tasks.
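
As one concrete instance of these steps, here is a minimal K-Means sketch in plain Python. The data points, the hand-picked initial centroids, and the helper names `squared_distance`, `mean`, and `k_means` are assumptions for illustration only.

```python
# Hypothetical unlabeled data: (annual_spend, visits_per_month) per customer.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def k_means(data, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in data:
            nearest = min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Initial centroids are picked by hand here; practical implementations
# usually choose them randomly or with a seeding scheme such as k-means++.
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
```

No labels are involved at any point: the two groups emerge purely from distances between the points.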

---

#### **Types of Unsupervised Learning**

| Type                          | Purpose                              | Description                                        | Example Use Case                               |
| ----------------------------- | ------------------------------------ | -------------------------------------------------- | ---------------------------------------------- |
| **Clustering**                | Group similar data points            | Divides data into clusters based on similarity     | Customer segmentation, market analysis         |
| **Dimensionality Reduction**  | Reduce number of features            | Compresses data while retaining important patterns | PCA for visualization, noise reduction         |
| **Anomaly Detection**         | Detect unusual data points           | Identifies outliers or rare events                 | Fraud detection, fault detection in machines   |
| **Association Rule Learning** | Find relationships between variables | Identifies frequent patterns or co-occurrences     | Market basket analysis, recommendation systems |

---

#### **Common Algorithms**

| Algorithm                              | Type                     | Description                                           | Example Use Case                                  |
| -------------------------------------- | ------------------------ | ----------------------------------------------------- | ------------------------------------------------- |
| **K-Means Clustering**                 | Clustering               | Divides data into k clusters based on distance        | Customer segmentation, image compression          |
| **Hierarchical Clustering**            | Clustering               | Builds a tree of clusters (dendrogram)                | Gene expression analysis, social network grouping |
| **DBSCAN**                             | Clustering               | Density-based clustering for irregular shapes         | Detecting fraud, geospatial analysis              |
| **Principal Component Analysis (PCA)** | Dimensionality Reduction | Reduces features while preserving variance            | Visualizing high-dimensional data                 |
| **t-SNE / UMAP**                       | Dimensionality Reduction | Maps high-dimensional data to 2D/3D for visualization | Pattern discovery in datasets                     |
| **Apriori Algorithm**                  | Association Rule         | Finds frequent itemsets and association rules         | Market basket analysis, recommendation systems    |
| **Isolation Forest**                   | Anomaly Detection        | Detects anomalies by isolating data points            | Fraud detection, network intrusion detection      |

---

#### **Examples of Unsupervised Learning**

| Domain       | Problem                                       | Type                      | Algorithm Example        |
| ------------ | --------------------------------------------- | ------------------------- | ------------------------ |
| Retail       | Segment customers based on buying habits      | Clustering                | K-Means, Hierarchical    |
| Finance      | Detect unusual transactions                   | Anomaly Detection         | Isolation Forest, DBSCAN |
| Social Media | Group similar users or posts                  | Clustering                | K-Means, DBSCAN          |
| Healthcare   | Reduce dimensionality of gene expression data | Dimensionality Reduction  | PCA, t-SNE               |
| Marketing    | Discover frequently bought product sets       | Association Rule Learning | Apriori Algorithm        |

---

#### **Key Points**

* Works with **unlabeled data**.
* Focuses on **finding hidden patterns or structures**.
* Often used for **exploratory data analysis (EDA)**.
* Clustering, dimensionality reduction, and anomaly detection are common applications.
* Helps **discover insights** that are not explicitly labeled in the dataset.

---


### **2.3 Semi-Supervised Learning (SSL)**

#### **Definition**

Semi-Supervised Learning is a type of ML that uses **a small amount of labeled data and a large amount of unlabeled data** for training. It combines aspects of **supervised** and **unsupervised learning** to improve learning efficiency when labeling data is expensive or time-consuming.

* **Input → Partially labeled Output**
* Goal: **Leverage unlabeled data** to improve model accuracy while using minimal labeled data.

---

#### **How it Works**

1. **Collect data** – Most data is unlabeled; only a small portion is labeled.
2. **Train model with labeled data** – Start with supervised learning on labeled data.
3. **Use unlabeled data** – The model predicts labels for unlabeled data and iteratively improves learning.
4. **Refine model** – Repeat the process to improve accuracy and generalization.
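
The loop described above is essentially self-training. The sketch below is a simplified version under stated assumptions: the 1-D data is made up, the base model is a hand-written nearest-centroid classifier, and one pseudo-label (the most confident one) is accepted per round. Practical self-training typically accepts every prediction above a confidence threshold instead.

```python
# Hypothetical 1-D data: two labeled points and several unlabeled ones.
labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]

def fit_centroids(examples):
    """'Train' a nearest-centroid classifier: one mean value per class."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_with_margin(centroids, x):
    """Return (label, margin); a larger margin means a more confident prediction."""
    ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
    best, runner_up = ranked[0], ranked[1]
    return best, abs(x - centroids[runner_up]) - abs(x - centroids[best])

remaining = list(unlabeled)
while remaining:
    centroids = fit_centroids(labeled)                         # train on current labels
    scored = [(predict_with_margin(centroids, x), x) for x in remaining]
    (label, _), x = max(scored, key=lambda item: item[0][1])   # most confident prediction
    labeled.append((x, label))                                 # accept it as a pseudo-label
    remaining.remove(x)                                        # repeat with more labeled data

final_centroids = fit_centroids(labeled)
```

Starting from only two labeled points, the loop ends with the points 1 to 4 labeled `"low"` and 6 to 9 labeled `"high"`.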

---

#### **Types of Semi-Supervised Learning Methods**

| Method                  | Description                                                                                     | Example Use Case                                |
| ----------------------- | ----------------------------------------------------------------------------------------------- | ----------------------------------------------- |
| **Self-training**       | Model is trained on labeled data, then predicts labels for unlabeled data iteratively           | Text classification, email filtering            |
| **Co-training**         | Two models are trained on different feature sets and label unlabeled data for each other        | Web page classification, multi-view learning    |
| **Graph-based methods** | Uses a graph to represent data points and propagates labels through connected nodes             | Social network analysis, recommendation systems |
| **Generative models**   | Uses unsupervised models (like autoencoders) to learn data structure and improve classification | Image recognition with limited labeled data     |

---

#### **Common Algorithms**

| Algorithm                         | Type           | Description                                                      | Example Use Case                          |
| --------------------------------- | -------------- | ---------------------------------------------------------------- | ----------------------------------------- |
| **Self-training with SVM**        | Classification | Start with labeled data, predict unlabeled data, retrain         | Spam detection, sentiment analysis        |
| **Label Propagation**             | Graph-based    | Labels spread from labeled nodes to connected unlabeled nodes    | Social network node classification        |
| **Semi-supervised K-Means**       | Clustering     | Uses labeled points to guide clustering of unlabeled data        | Customer segmentation with partial labels |
| **Generative Models (VAE, GANs)** | Generative     | Learn underlying data distribution and improve label predictions | Image classification with few labels      |

---

#### **Examples of Semi-Supervised Learning**

| Domain              | Problem                                             | Method/Algorithm             |
| ------------------- | --------------------------------------------------- | ---------------------------- |
| Healthcare          | Classify medical images with few labeled scans      | Self-training with CNN       |
| Finance             | Fraud detection with limited labeled transactions   | Label Propagation            |
| E-commerce          | Product categorization with partially labeled items | Semi-supervised K-Means      |
| NLP                 | Sentiment analysis on social media posts            | Self-training with SVM       |
| Autonomous Vehicles | Object detection with limited annotated images      | Generative models (VAE/GANs) |

---

#### **Key Points**

* **Uses both labeled and unlabeled data**, making it cost-effective.
* Bridges **supervised and unsupervised learning**.
* Ideal when **labeling is expensive or time-consuming**.
* Improves performance compared to using only small labeled datasets.

---

### **2.4 Reinforcement Learning (RL)**

#### **Definition**
Reinforcement Learning is a type of machine learning where an **agent learns to make decisions by interacting with an environment**. The agent takes **actions** and receives **rewards or penalties** based on the outcome, learning over time to **maximize cumulative rewards**.  

- **Input → Actions → Feedback (Reward/Penalty) → Learning**
- RL is **trial-and-error based learning**.  

---

#### **How it Works**

1. **Agent** – The learner or decision-maker.  
2. **Environment** – The system or world the agent interacts with.  
3. **State (s)** – The current situation of the agent in the environment.  
4. **Action (a)** – The choice the agent makes in the current state.  
5. **Reward (r)** – Feedback from the environment for the action.  
6. **Policy (π)** – Strategy the agent uses to decide actions.  
7. **Goal** – Learn a policy that **maximizes cumulative rewards** over time.  

**Workflow:**  
```
State → Action → Environment → Reward + New State → Agent updates Policy → Repeat
```
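
This workflow can be sketched with tabular Q-learning on a tiny environment. Everything here is an assumption for illustration: the five-state corridor, the reward of 1 at the right end, the hyperparameter values, and the helper names `step` and `choose_action`.

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = [0, 1]      # 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

def step(state, action):
    """Environment: move left or right; reward 1 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def choose_action(q, state, rng):
    """Policy: epsilon-greedy, breaking ties between equal Q-values at random."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)                 # exploration
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])   # exploitation

rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]          # Q-table: q[state][action]

for _ in range(500):                               # episodes
    state = 0
    while state != N_STATES - 1:
        action = choose_action(q, state, rng)
        next_state, reward = step(state, action)
        # Q-learning update: move Q(s, a) toward reward + discounted best future value.
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state

policy = ["left" if q[s][0] > q[s][1] else "right" for s in range(N_STATES - 1)]
```

After training, the greedy policy is `"right"` in every non-goal state, and the Q-value for moving right shrinks by the discount factor with each extra step from the goal (1.0, 0.9, 0.81, 0.729).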

---

#### **Types of Reinforcement Learning**

| Type | Description | Example |
|------|-------------|---------|
| **Positive Reinforcement** | Agent is rewarded for good actions | Game AI gets points for winning |
| **Negative Reinforcement** | Agent avoids actions that give penalties | Robot learns to avoid obstacles |
| **Model-Based RL** | Agent builds a model of the environment and plans ahead | Chess AI predicting future moves |
| **Model-Free RL** | Agent learns from trial-and-error without a model | Q-Learning, Policy Gradient |

---

#### **Common RL Algorithms**

| Algorithm | Type | Description | Example Use Case |
|-----------|------|-------------|----------------|
| **Q-Learning** | Model-free | Learns value of action in each state (Q-table) | Grid-world navigation, simple games |
| **Deep Q-Network (DQN)** | Model-free, Deep RL | Uses neural networks to approximate Q-values | Atari games, robotics control |
| **SARSA** | Model-free | Updates action-values based on the next action actually taken | Robot navigation |
| **Policy Gradient** | Model-free | Directly learns the policy function | Continuous action tasks, robotics |
| **Actor-Critic** | Hybrid | Combines policy-based and value-based learning | Complex simulations, OpenAI Gym environments |
| **Monte Carlo Methods** | Model-free | Learns value functions by averaging returns over episodes | Board games, stochastic environments |

---

#### **Examples of Reinforcement Learning**

| Domain | Problem | Algorithm Example |
|--------|---------|-----------------|
| Gaming | Train AI to play Atari games | DQN, Q-Learning |
| Robotics | Teach a robot to walk | Policy Gradient, Actor-Critic |
| Finance | Optimize stock trading strategy | Q-Learning, Monte Carlo RL |
| Autonomous Vehicles | Learn to drive safely | Actor-Critic, DDPG |
| Resource Management | Optimize data center energy usage | Reinforcement-based optimization |

---

#### **Key Points**
- RL is **trial-and-error based learning**.  
- Focuses on **learning policies** to maximize cumulative reward.  
- Works with **dynamic environments**, unlike supervised learning.  
- Requires balancing **exploration (trying new actions)** and **exploitation (using known best actions)**.  
- Often used in **games, robotics, autonomous systems, and real-time decision-making**.  

---
---

### **2.1.1 Regression in Machine Learning**

#### **Definition**

Regression is a type of **Supervised Learning** used to predict **continuous numerical values** based on one or more input features.

* **Input → Continuous Output**
* Goal: Find a **relationship between independent variables (features) and a dependent variable (target)**.

---

#### **How Regression Works**

1. **Collect labeled data** – Features (inputs) and target (output).
2. **Choose a regression algorithm** – E.g., Linear Regression, Decision Tree Regression.
3. **Train the model** – Learn the relationship between inputs and outputs.
4. **Predict on new data** – Estimate continuous target values.
5. **Evaluate performance** – Using metrics like **Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R² score**.
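
A minimal sketch of these steps for a single feature, using the closed-form least-squares solution. The numbers are made up, `fit_line` is a hypothetical helper, and for brevity the error is measured on the training data itself rather than on a held-out test set.

```python
# Step 1: made-up data, house size (in units of 100 square meters) -> price (arbitrary units).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [3.1, 4.9, 7.2, 8.8, 11.0]

# Step 2: choose simple linear regression (one feature, slope + intercept).
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(sizes, prices)     # step 3: train (slope 1.97, intercept 1.09)
predicted_price = slope * 6.0 + intercept      # step 4: predict on new data (about 12.91)
mae = sum(abs(slope * x + intercept - y)       # step 5: evaluate (MAE of about 0.12)
          for x, y in zip(sizes, prices)) / len(sizes)
```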

---

#### **Regression Algorithms**

| Type                                | Description                                                                      | Example Use Case                                        |
| ----------------------------------- | -------------------------------------------------------------------------------- | ------------------------------------------------------- |
| **Linear Regression**               | Predicts output as a **linear combination** of input features                    | Predict house prices based on size, bedrooms            |
| **Polynomial Regression**           | Fits a **polynomial curve** to model non-linear relationships                    | Predicting growth trends or temperature patterns        |
| **Ridge Regression**                | Linear regression with **L2 regularization** to reduce overfitting               | Predicting sales with many correlated features          |
| **Lasso Regression**                | Linear regression with **L1 regularization** to shrink some coefficients to zero | Feature selection + prediction in high-dimensional data |
| **Logistic Regression**             | Actually a **classification** algorithm, used for binary outcomes                | Predicting spam/not-spam emails                         |
| **Decision Tree Regression**        | Uses tree-based splits to predict continuous values                              | Predicting customer spending                            |
| **Random Forest Regression**        | Ensemble of decision trees for robust regression                                 | Stock price prediction, energy consumption              |
| **Support Vector Regression (SVR)** | Finds a hyperplane within a threshold to predict values                          | Predicting temperature, housing prices                  |

---

#### **Examples of Regression**

| Domain      | Problem                                     | Algorithm Example                               |
| ----------- | ------------------------------------------- | ----------------------------------------------- |
| Real Estate | Predict house prices                        | Linear Regression, Random Forest Regression     |
| Finance     | Forecast stock prices                       | SVR, Decision Tree Regression                   |
| Healthcare  | Predict blood pressure or cholesterol level | Linear Regression, Ridge/Lasso Regression       |
| Marketing   | Predict sales based on ad spend             | Polynomial Regression, Random Forest Regression |
| Weather     | Predict temperature or rainfall             | SVR, Linear/Polynomial Regression               |

---

### **Regression Evaluation Metrics**

Regression metrics measure the **difference between predicted values and actual values**. The choice of metric depends on the problem and sensitivity to errors.

---

#### **1. Mean Absolute Error (MAE)**

* **Definition:** Average of the **absolute differences** between predicted and actual values.
* **Formula:**
  $$
  \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
  $$
* **Characteristics:**

  * Easy to interpret.
  * Treats all errors equally.
* **Use Case:** When you want a simple, robust measure of average error.

---

#### **2. Mean Squared Error (MSE)**

* **Definition:** Average of the **squared differences** between predicted and actual values.
* **Formula:**
  $$
  \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  $$
* **Characteristics:**

  * Penalizes **larger errors more heavily**.
  * Sensitive to outliers.
* **Use Case:** When large errors should be penalized more (e.g., finance, engineering).

---

#### **3. Root Mean Squared Error (RMSE)**

* **Definition:** Square root of MSE, bringing it back to the **same units as the target variable**.
* **Formula:**
  $$
  \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
  $$
* **Characteristics:**

  * Like MSE, penalizes large errors.
  * Easier to interpret than MSE because units match the target variable.
* **Use Case:** General-purpose regression evaluation.
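
To see how MAE, MSE, and RMSE react differently to one large error, here is a small worked example. The values are invented, with the last prediction deliberately far off.

```python
import math

# Made-up values; the last prediction is a deliberate large miss (an outlier).
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 26.0]

errors = [a - p for a, p in zip(actual, predicted)]   # -1, 1, -1, -10

mae = sum(abs(e) for e in errors) / len(errors)       # (1 + 1 + 1 + 10) / 4 = 3.25
mse = sum(e ** 2 for e in errors) / len(errors)       # (1 + 1 + 1 + 100) / 4 = 25.75
rmse = math.sqrt(mse)                                 # about 5.07, same units as the target
```

The single miss of 10 contributes 100 of the 103 total squared error, which is why RMSE (about 5.07) ends up well above MAE (3.25) on the same predictions.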

---

#### **4. R-squared (R² Score / Coefficient of Determination)**

* **Definition:** Measures the **proportion of variance in the dependent variable** explained by the model.
* **Formula:**
  $$
  R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
  $$
* **Characteristics:**

  * At most **1**; usually between **0 and 1**, and negative when the model fits worse than always predicting the mean.
  * Higher R² → better fit.
* **Use Case:** To check **overall goodness of fit** of the regression model.
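
A short sketch of the formula, including the baseline and negative cases. The function name `r_squared` and the data are assumptions for illustration.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [2.0, 4.0, 6.0, 8.0]   # mean is 5, total sum of squares is 20

good_fit = r_squared(actual, [2.5, 3.5, 6.5, 7.5])    # 1 - 1/20 = 0.95
mean_only = r_squared(actual, [5.0, 5.0, 5.0, 5.0])   # predicting the mean gives 0.0
poor_fit = r_squared(actual, [8.0, 6.0, 4.0, 2.0])    # worse than the mean: 1 - 80/20 = -3.0
```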

---

#### **5. Adjusted R-squared**

* **Definition:** Modified R² that **adjusts for the number of features** in the model.
* **Formula:**
  $$
  \text{Adjusted } R^2 = 1 - \left( \frac{(1-R^2)(n-1)}{n-p-1} \right)
  $$
* **Characteristics:**

  * Penalizes adding irrelevant features.
  * Better than R² for multiple regression.
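
In the formula, `n` is the number of samples and `p` is the number of features. The sketch below uses a hypothetical scenario in which adding eight features barely raises R², to show how the adjustment penalizes them.

```python
def adjusted_r_squared(r2, n, p):
    """n = number of samples, p = number of features (predictors)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical scenario: adding 8 features raised R^2 only from 0.80 to 0.81.
few_features = adjusted_r_squared(0.80, n=30, p=2)     # about 0.785
many_features = adjusted_r_squared(0.81, n=30, p=10)   # about 0.710
```

Plain R² prefers the larger model (0.81 vs. 0.80), while adjusted R² prefers the smaller one.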

---

#### **6. Mean Absolute Percentage Error (MAPE)**

* **Definition:** Measures the **average percentage error** between predicted and actual values.
* **Formula:**
  $$
  \text{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{|y_i|}
  $$
* **Characteristics:**

  * Expresses error as a **percentage**, making it easier to interpret.
  * Sensitive if actual values are close to zero.
* **Use Case:** Forecasting sales, demand prediction, financial predictions.
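
The sensitivity to small actual values is easy to demonstrate. In this sketch (invented numbers, hypothetical helper `mape`), every prediction is off by exactly 10 units in both cases; only the first actual value changes.

```python
def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    return 100 / len(actual) * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))

# Every prediction is off by exactly 10 units.
typical = mape([100.0, 200.0, 400.0], [110.0, 190.0, 410.0])   # (10% + 5% + 2.5%) / 3, about 5.83
near_zero = mape([0.5, 200.0, 400.0], [10.5, 190.0, 410.0])    # first term alone is 2000%
```

The second result is roughly 669%, even though the absolute errors are identical.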

---

#### **Key Takeaways**

* Regression predicts **continuous numerical values**.
* Used when **output is quantitative**, not categorical.
* Linear regression assumes a **linear relationship**, while polynomial or tree-based models handle non-linear patterns.
* **MAE** → Simple average error; less sensitive to outliers.
* **MSE/RMSE** → Penalizes larger errors; RMSE in same units as target.
* **R² / Adjusted R²** → Measures overall fit; adjusted R² for multiple predictors.
* **MAPE** → Percentage-based, intuitive for business applications.
* **Choice of metric** depends on the **problem context and sensitivity to large errors**.

---

### **2.1.2 Classification in Machine Learning**

### **Definition**

Classification is a type of **Supervised Learning** where the model predicts **categorical (discrete) outcomes** based on input features.

* **Input → Discrete Output (Class Labels)**
* Goal: Assign each input to one of the **predefined classes**.

---

### **How Classification Works**

1. **Collect labeled data** – Inputs with corresponding class labels.
2. **Split data** – Training and testing sets.
3. **Choose a classification algorithm** – E.g., Logistic Regression, Decision Tree, Random Forest.
4. **Train the model** – Learn patterns in the features for each class.
5. **Predict on new data** – Assign class labels to unseen instances.
6. **Evaluate performance** – Using metrics like accuracy, precision, recall, F1-score, ROC-AUC.

---

### **Types of Classification**

| Type                           | Description                                  | Example                                            |
| ------------------------------ | -------------------------------------------- | -------------------------------------------------- |
| **Binary Classification**      | Two possible classes                         | Spam vs. Not Spam, Disease vs. Healthy             |
| **Multi-class Classification** | More than two classes                        | Predicting animal type: Cat, Dog, Rabbit           |
| **Multi-label Classification** | Each instance can belong to multiple classes | Tagging a photo with "Beach", "Sunset", "Vacation" |

---

### **Common Classification Algorithms**

| Algorithm                        | Type              | Description                                       | Example Use Case                         |
| -------------------------------- | ----------------- | ------------------------------------------------- | ---------------------------------------- |
| **Logistic Regression**          | Binary/Multiclass | Predicts probability of class membership          | Spam detection, medical diagnosis        |
| **Decision Tree**                | Binary/Multiclass | Tree-based splits to classify instances           | Customer churn prediction                |
| **Random Forest**                | Binary/Multiclass | Ensemble of decision trees for robust predictions | Fraud detection, loan approval           |
| **Support Vector Machine (SVM)** | Binary/Multiclass | Finds best separating hyperplane between classes  | Handwriting recognition                  |
| **K-Nearest Neighbors (KNN)**    | Binary/Multiclass | Classifies based on closest neighbors             | Recommender systems, credit scoring      |
| **Naive Bayes**                  | Binary/Multiclass | Probability-based using Bayes theorem             | Email spam filtering, sentiment analysis |
| **Gradient Boosting / XGBoost**  | Binary/Multiclass | Boosted tree ensembles for higher accuracy        | Risk prediction, competition datasets    |
| **Neural Networks (MLP, CNN)**   | Binary/Multiclass | Learns complex patterns for classification        | Image recognition, speech recognition    |

---

### **Examples of Classification**

| Domain            | Problem                            | Algorithm Example                   |
| ----------------- | ---------------------------------- | ----------------------------------- |
| Healthcare        | Predict if a patient has a disease | Logistic Regression, Random Forest  |
| Finance           | Predict loan approval (yes/no)     | Decision Tree, Gradient Boosting    |
| Marketing         | Predict if a customer will churn   | SVM, Random Forest                  |
| Retail            | Predict product category           | KNN, Naive Bayes                    |
| Image Recognition | Classify handwritten digits        | CNN (Convolutional Neural Networks) |

---

### **Evaluation Metrics for Classification**

Classification metrics are based on the **confusion matrix**, which summarizes predictions vs. actual labels.

#### **Confusion Matrix**

|                     | Predicted Positive  | Predicted Negative  |
| ------------------- | ------------------- | ------------------- |
| **Actual Positive** | True Positive (TP)  | False Negative (FN) |
| **Actual Negative** | False Positive (FP) | True Negative (TN)  |
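
The four cells can be counted directly from paired labels. The label lists below are made up for illustration, with 1 as the positive class.

```python
# Made-up labels: 1 = positive class, 0 = negative class.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)   # 3
fn = sum(1 for a, p in pairs if a == 1 and p == 0)   # 1
fp = sum(1 for a, p in pairs if a == 0 and p == 1)   # 1
tn = sum(1 for a, p in pairs if a == 0 and p == 0)   # 5
```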

---

#### **1. Accuracy**

* **Definition:** Fraction of total correct predictions.
* **Formula:**
  $$
  \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
  $$
* **Key Point:** Easy to interpret, but can be misleading for **imbalanced datasets**.
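
To see why accuracy can mislead on imbalanced data, here is a minimal sketch. The dataset (95 negatives, 5 positives) and the always-negative "model" are made up for illustration.

```python
# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


acc = accuracy(y_true, y_pred)
# Count how many actual positives the model managed to find.
positives_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"accuracy = {acc:.2f}")
print(f"positives found = {positives_found}")
```

The accuracy is high even though the model never detects a single positive case.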

---

#### **2. Precision**

* **Definition:** Fraction of correctly predicted positive instances among all predicted positives.
* **Formula:**
  $$
  \text{Precision} = \frac{TP}{TP + FP}
  $$
* **Key Point:** Focuses on **minimizing false positives**.
* **Use Case:** Spam detection, fraud detection.

---

#### **3. Recall (Sensitivity / True Positive Rate)**

* **Definition:** Fraction of correctly predicted positive instances among all actual positives.
* **Formula:**
  $$
  \text{Recall} = \frac{TP}{TP + FN}
  $$
* **Key Point:** Focuses on **minimizing false negatives**.
* **Use Case:** Medical diagnosis, cancer detection.

---

#### **4. F1-Score**

* **Definition:** Harmonic mean of precision and recall.
* **Formula:**
  $$
  F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
  $$
* **Key Point:** Balances precision and recall; useful for **imbalanced datasets**.
* **Use Case:** Fraud detection, rare event prediction.

---

#### **5. Specificity (True Negative Rate)**

* **Definition:** Fraction of correctly predicted negative instances among all actual negatives.
* **Formula:**
  $$
  \text{Specificity} = \frac{TN}{TN + FP}
  $$
* **Key Point:** Measures ability to detect negatives.
* **Use Case:** When avoiding false alarms is critical.
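
The five count-based metrics above can be computed directly from the confusion matrix cells. The sketch below uses hypothetical helper names (`confusion_counts`, `classification_metrics`) and assumes, as a convention, that a metric with a zero denominator is reported as 0.0.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Assumed convention: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp, fp, fn, tn):
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "specificity": safe_div(tn, tn + fp),
    }


# Made-up labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts)
print(metrics)
```

Here one positive is missed (FN) and one negative is flagged (FP), so precision and recall come out equal.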

---

#### **6. ROC-AUC (Receiver Operating Characteristic – Area Under Curve)**

**Definition**

* **ROC (Receiver Operating Characteristic) Curve:** A plot that shows the **trade-off between True Positive Rate (TPR / Recall) and False Positive Rate (FPR)** at different classification thresholds.
* **AUC (Area Under the Curve):** A single scalar value summarizing the **overall ability of the classifier to distinguish between classes**.

#### **Key Concepts**

1. **True Positive Rate (TPR / Recall / Sensitivity):**
   $$
   TPR = \frac{TP}{TP + FN}
   $$

* Measures proportion of actual positives correctly identified.

2. **False Positive Rate (FPR):**
   $$
   FPR = \frac{FP}{FP + TN}
   $$

* Measures proportion of actual negatives incorrectly classified as positive.

3. **ROC Curve:**

* X-axis: FPR (0 → 1)
* Y-axis: TPR (0 → 1)
* Each point corresponds to a **different threshold** used to classify probabilities into classes.

4. **AUC:**

* Value between **0 and 1**:

  * **0.5 → random guessing**
  * **1.0 → perfect classifier**
  * **<0.5 → worse than random** (rare, indicates reversed predictions)
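
A minimal sketch of how ROC points and AUC could be computed by sweeping the threshold. The function names and the four-sample dataset are illustrative assumptions. The code assumes both classes are present and that a sample is predicted positive when its score is at or above the threshold.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, from strict to lenient."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
auc = auc_trapezoid(points)
print(points)
print(auc)
```

For this toy data the curve rises to TPR 0.5 with no false positives, then admits one false positive before finding the second positive, giving an area of 0.75.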

---

**Interpretation**

| AUC Value | Interpretation       |
| --------- | -------------------- |
| 0.9 – 1.0 | Excellent classifier |
| 0.8 – 0.9 | Good classifier      |
| 0.7 – 0.8 | Fair classifier      |
| 0.6 – 0.7 | Poor classifier      |
| 0.5       | Random guessing      |


**Connection to Classification Metrics**

* **Type 1 Error (FP) vs Type 2 Error (FN):**

  * ROC curve visualizes the **trade-off between TPR (minimizing FN) and FPR (minimizing FP)**.
* **Threshold-independent evaluation:**

  * Unlike accuracy, precision, or recall which depend on a specific threshold, **AUC evaluates classifier performance across all thresholds**.
* **Useful for imbalanced datasets:**

  * Helps assess **discriminative ability** even when one class is much smaller.


**Advantages of AUC-ROC**

1. **Threshold-independent** – evaluates the model across all possible thresholds.
2. **Insensitive to class proportions** – depends only on how positives are ranked relative to negatives. Under heavy imbalance it can still look optimistic, so a precision-recall curve is a useful complement.
3. **Interpretability** – equals the probability that a randomly chosen positive is ranked higher than a randomly chosen negative.
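
The ranking interpretation can be checked by brute force: compare every positive score with every negative score and count how often the positive wins. This is a sketch with a hypothetical function name, counting ties as half a win (a common convention, assumed here).

```python
def auc_pairwise(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumed convention: a tie counts as half
    return wins / (len(pos) * len(neg))


# Made-up labels and scores: 3 of the 4 positive/negative pairs are ordered correctly.
print(auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

This pairwise loop is quadratic in the number of samples, so it is meant for understanding rather than for large datasets.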


**Visualization Example**

* **ROC Curve:**

  * X-axis = FPR
  * Y-axis = TPR
  * Curve above diagonal = better than random.
  * **AUC** = area under the curve.

```
TPR |
    |       *
    |      *
    |     *
    |    *
    |   *
    |__*_____________ FPR
       0   0.5    1
```

---

#### **7. Log Loss (Cross-Entropy Loss)**

* **Definition:** Measures how close the predicted probabilities are to the true class labels.
* **Formula:**
  $$
  \text{Log Loss} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} y_{i,j} \log(p_{i,j})
  $$
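* **Where:** $n$ is the number of samples, $k$ the number of classes, $y_{i,j}$ is 1 if sample $i$ belongs to class $j$ (0 otherwise), and $p_{i,j}$ is the predicted probability that sample $i$ belongs to class $j$.
* **Key Point:** Heavily penalizes confident but wrong predictions; lower is better.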
* **Use Case:** Probabilistic classification, logistic regression.
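
A small sketch of the log loss formula. Because the true labels are one-hot, the inner sum reduces to the log of the probability assigned to the true class. The function name, the clipping constant `eps`, and the example probabilities are assumptions for illustration.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """y_true: class indices; probs: one list of class probabilities per sample."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0); eps is an assumed safeguard, not part of the formula.
        p = min(max(row[label], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)


labels = [0, 1]
confident_right = [[0.9, 0.1], [0.1, 0.9]]
confident_wrong = [[0.1, 0.9], [0.9, 0.1]]

loss_right = log_loss(labels, confident_right)
loss_wrong = log_loss(labels, confident_wrong)
print(loss_right, loss_wrong)
```

The confidently wrong predictions produce a much larger loss than the confidently right ones.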

---

### **Comparison Table of Classification Metrics**

| Metric                   | Formula                                   | Focus                             | Best Use Case                             |
| ------------------------ | ----------------------------------------- | --------------------------------- | ----------------------------------------- |
| **Accuracy**             | (TP + TN) / (TP + TN + FP + FN)           | Overall correctness               | Balanced datasets                         |
| **Precision**            | TP / (TP + FP)                            | Avoid false positives             | Spam detection, fraud detection           |
| **Recall (Sensitivity)** | TP / (TP + FN)                            | Avoid false negatives             | Medical diagnosis, disease detection      |
| **F1-Score**             | 2 × Precision × Recall / (Precision + Recall) | Balance Precision & Recall    | Imbalanced datasets                       |
| **Specificity**          | TN / (TN + FP)                            | True negative detection           | False alarm sensitive tasks               |
| **ROC-AUC**              | Area under ROC curve                      | Class separation ability          | Model comparison for imbalance            |
| **Log Loss**             | Cross-entropy loss                        | Probabilistic prediction accuracy | Logistic regression, probabilistic models |

---

#### **Type 1 & Type 2 Errors in Classification**

In classification, we have a **confusion matrix**:

|                  | Predicted Positive | Predicted Negative |
|------------------|-----------------|-----------------|
| **Actual Positive** | True Positive (TP) | False Negative (FN) |
| **Actual Negative** | False Positive (FP) | True Negative (TN) |

---

#### **1. Type 1 Error (False Positive)**

- **Definition:** Model predicts **Positive**, but actual is **Negative**.  
- **ML Equivalent:** **False Positive (FP)**  
- **Example:**  
  - Medical Diagnosis: Model predicts “disease” → patient is actually healthy.  
  - Spam Filter: Marks a legitimate email as spam.  

- **Relevant Classification Metrics:**  
  | Metric | Connection to Type 1 Error | Interpretation |
  |--------|---------------------------|----------------|
  | **Precision** | TP / (TP + FP) → FP appears in denominator | High FP → Low precision |
  | **Specificity (True Negative Rate)** | TN / (TN + FP) → FP appears in denominator | High FP → Low specificity |
  | **FPR (False Positive Rate)** | FP / (FP + TN) | Measures proportion of negative instances misclassified as positive |

---

#### **2. Type 2 Error (False Negative)**

- **Definition:** Model predicts **Negative**, but actual is **Positive**.  
- **ML Equivalent:** **False Negative (FN)**  
- **Example:**  
  - Medical Diagnosis: Model predicts “healthy” → patient actually has disease.  
  - Spam Filter: Marks a spam email as not spam.  

- **Relevant Classification Metrics:**  
  | Metric | Connection to Type 2 Error | Interpretation |
  |--------|---------------------------|----------------|
  | **Recall (Sensitivity / True Positive Rate)** | TP / (TP + FN) → FN in denominator | High FN → Low recall |
  | **FNR (False Negative Rate)** | FN / (TP + FN) | Measures proportion of positives missed by the model |
  | **F1-Score** | Combines precision and recall | High FN reduces F1-score |

**Quick Mapping Table**

| Statistical Term | ML Term | Confusion Matrix Cell | Metric Impact |
|------------------|---------|-----------------------|---------------|
| **Type 1 Error** | False Positive | Predicted Positive, Actual Negative | Precision ↓, Specificity ↓, FPR ↑ |
| **Type 2 Error** | False Negative | Predicted Negative, Actual Positive | Recall ↓, FNR ↑, F1-Score ↓ |
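
The two error types usually pull against each other as the decision threshold moves. The sketch below counts both at three thresholds. The scores, labels, and function name are made up, and a sample is assumed to be predicted positive when its score is at or above the threshold.

```python
def errors_at_threshold(y_true, scores, threshold):
    """Return (false positives, false negatives) at the given threshold."""
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, scores) if s < threshold and t == 1)
    return fp, fn


# Hypothetical model scores for 5 negatives and 5 positives.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.05, 0.2, 0.3, 0.45, 0.4, 0.6, 0.55, 0.7, 0.85, 0.95]

for thr in (0.25, 0.5, 0.75):
    fp, fn = errors_at_threshold(y_true, scores, thr)
    print(f"threshold={thr}: Type 1 (FP)={fp}, Type 2 (FN)={fn}")
```

On this toy data, raising the threshold from 0.25 to 0.75 moves the error counts from (3 FP, 0 FN) through (1 FP, 1 FN) to (0 FP, 3 FN).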


---

### **Key Points**

1. **Accuracy** is simple but not reliable for **imbalanced datasets**.
2. **Precision** = focus on correctness of positive predictions.
3. **Recall** = focus on capturing all positives.
4. **F1-score** balances **precision and recall**; essential for imbalanced data.
5. **ROC-AUC** measures overall discrimination capability of the model.
6. **Log Loss** evaluates probabilistic predictions, penalizing confident wrong guesses.
7. **Type 1 Error (FP):** Predict positive incorrectly → affects **precision and specificity**.  
8. **Type 2 Error (FN):** Predict negative incorrectly → affects **recall and F1-score**.  
9. **Trade-off:** Reducing Type 1 (FP) usually increases Type 2 (FN), and vice versa.  
10. **Choice depends on context:**  
   - Medical tests → minimize **Type 2 errors** (don’t miss sick patients).  
   - Spam detection → minimize **Type 1 errors** (don’t mark good emails as spam).
11. Classification predicts **categorical outcomes**.
12. Can be **binary, multi-class, or multi-label**.
13. **Algorithm choice** depends on data size, feature types, and problem complexity.
14. Evaluation must consider **class imbalance**; metrics like F1-score or ROC-AUC may be more meaningful than accuracy.
15. Common applications include **spam detection, fraud detection, medical diagnosis, image recognition, and customer churn prediction**.

---


### **2.2.1 Clustering in Machine Learning**

### **Definition**

Clustering is an **unsupervised learning** technique that groups data points into **clusters** such that:

* Data points within the **same cluster are similar**
* Data points in **different clusters are dissimilar**

👉 There are **no labels**; the model discovers structure automatically.

---

### **How Clustering Works**

1. Collect **unlabeled data**
2. Choose a **similarity/distance measure** (e.g., Euclidean distance)
3. Apply a **clustering algorithm**
4. Algorithm groups data into clusters
5. Analyze clusters for insights or downstream tasks
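
The steps above can be sketched with a bare-bones K-Means loop using Euclidean distance. This is an illustrative toy, not a production implementation: the function name is hypothetical, the initial centroids are passed in by hand, and an empty cluster simply keeps its previous centroid (an assumed convention).

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Made-up unlabeled 2D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The first three points end up in one cluster and the last three in the other, with each centroid at the mean of its group.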

---

### **Types of Clustering**

| Type                | Description                                        | Example                      |
| ------------------- | -------------------------------------------------- | ---------------------------- |
| **Partition-based** | Divides data into K non-overlapping clusters       | K-Means                      |
| **Hierarchical**    | Creates a tree of clusters (dendrogram)            | Agglomerative Clustering     |
| **Density-based**   | Groups dense regions, handles noise                | DBSCAN                       |
| **Model-based**     | Assumes data comes from a probability distribution | Gaussian Mixture Model (GMM) |
| **Grid-based**      | Divides data into grid structures                  | STING                        |

---

### **Common Clustering Algorithms**

| Algorithm                        | Type            | Description                             | Use Case                         |
| -------------------------------- | --------------- | --------------------------------------- | -------------------------------- |
| **K-Means**                      | Partition-based | Minimizes distance to cluster centroids | Customer segmentation            |
| **Hierarchical Clustering**      | Hierarchical    | Builds cluster tree (dendrogram)        | Gene analysis                    |
| **DBSCAN**                       | Density-based   | Finds dense clusters & noise            | Fraud detection, geospatial data |
| **Gaussian Mixture Model (GMM)** | Model-based     | Soft clustering using probabilities     | Image segmentation               |
| **Mean Shift**                   | Density-based   | Shifts points toward density peaks      | Object tracking                  |

---

### **Distance Measures Used in Clustering**

| Distance Metric        | Description                    |
| ---------------------- | ------------------------------ |
| **Euclidean Distance** | Straight-line distance         |
| **Manhattan Distance** | Sum of absolute differences    |
| **Cosine Similarity**  | Measures angle between vectors |
| **Jaccard Distance**   | Dissimilarity between sets (1 − Jaccard similarity) |
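
For reference, each measure in the table can be written in a few lines. The function names are illustrative, and the empty-set case for the Jaccard distance is handled by an assumed convention (distance 0).

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_distance(s, t):
    """1 minus the ratio of shared items to all items in either set."""
    s, t = set(s), set(t)
    union = s | t
    if not union:
        return 0.0  # assumed convention for two empty sets
    return 1 - len(s & t) / len(union)


print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))
print(cosine_similarity((1, 0), (0, 1)))
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))
```

Note that cosine similarity is a similarity, so a distance is usually obtained as 1 minus this value.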

---

### **Examples of Clustering Applications**

| Domain           | Problem                   | Algorithm               |
| ---------------- | ------------------------- | ----------------------- |
| Retail           | Customer segmentation     | K-Means                 |
| Finance          | Fraud detection           | DBSCAN                  |
| Healthcare       | Disease pattern discovery | Hierarchical Clustering |
| Marketing        | Market segmentation       | K-Means, GMM            |
| Image Processing | Image compression         | K-Means                 |
| Social Media     | User grouping             | DBSCAN                  |

---

### **Clustering Evaluation Metrics**

Since clustering is **unsupervised**, we usually **don’t have true labels**. Therefore, clustering evaluation is divided into:

1. **Internal Metrics** – use only the data & clusters
2. **External Metrics** – use true labels (if available)

---

### **1. Internal Clustering Evaluation Metrics (Most Common)**

#### **1. Silhouette Score**

* **Definition:** Measures how similar a point is to its own cluster compared to other clusters.

* **Formula:**
  $$
  s = \frac{b - a}{\max(a, b)}
  $$
  Where:

* `a` = average distance to points in same cluster

* `b` = average distance to points in nearest cluster

* **Range:** `-1 to +1`

| Value       | Meaning              |
| ----------- | -------------------- |
| Close to +1 | Well-clustered       |
| Around 0    | Overlapping clusters |
| Negative    | Likely assigned to the wrong cluster |

✔ **Best Metric for:** Overall clustering quality
✔ **Used with:** K-Means, Hierarchical clustering
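
A sketch of the per-point silhouette computation, using Euclidean distance. It assumes at least two clusters and distinct points, and scores a single-member cluster as 0 (a common convention, assumed here). The function name and data are illustrative.

```python
import math


def silhouette_scores(points, labels):
    """Return s = (b - a) / max(a, b) for every point."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is excluded by dividing by n - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return scores


# Made-up 1D data with two well-separated groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1])
print(sum(good) / len(good))
print(bad[0])
```

The sensible grouping gives a mean score close to +1, while the deliberately mixed-up grouping gives negative scores.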

---

#### **2. Davies–Bouldin Index (DBI)**

* **Definition:** Measures **average similarity between each cluster and its most similar cluster**.
* **Key Idea:** Lower is better.
* **Range:** `0 → ∞`

| Value | Meaning                           |
| ----- | --------------------------------- |
| Low   | Compact & well-separated clusters |
| High  | Poor clustering                   |

✔ **Best Metric for:** Comparing multiple clustering models
✔ **Sensitive to:** Cluster shape

---

#### **3. Calinski–Harabasz Index (CHI)**

* **Definition:** Ratio of **between-cluster dispersion to within-cluster dispersion**.
* **Key Idea:** Higher is better.

✔ **Best Metric for:** Finding optimal number of clusters
✔ **Works well with:** K-Means

---

#### **4. Elbow Method (Inertia / WCSS)**

* **Definition:** Measures **within-cluster sum of squares**.
* **Approach:** Plot WCSS vs number of clusters (K).

✔ **Optimal K:** Point where curve bends (“elbow”)

✔ **Limitation:** Subjective interpretation
✔ **Used only for:** K-Means

---

### **2. External Clustering Evaluation Metrics (If Labels Exist)**

#### **5. Adjusted Rand Index (ARI)**

* **Definition:** Measures similarity between true labels and predicted clusters.
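* **Range:** at most `1`; close to `0` for random labelings, and negative (down to about `-0.5`) when agreement is worse than chance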

| Value | Meaning           |
| ----- | ----------------- |
| 1     | Perfect match     |
| 0     | Random clustering |
| <0    | Worse than random |

✔ **Best Metric for:** Ground-truth comparison
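
A sketch of ARI using the standard pair-counting formulation: count the pairs of points that fall together in both labelings, then correct for the number expected by chance. The function name is illustrative, and the degenerate case where the denominator is zero is returned as 1.0 by assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    n = len(labels_true)
    # Pairs grouped together in both labelings, in the true labels, and in the clusters.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / comb(n, 2)
    max_index = (together_true + together_pred) / 2
    if max_index == expected:
        return 1.0  # assumed convention for degenerate inputs
    return (together_both - expected) / (max_index - expected)


# Same grouping with swapped cluster names, then a grouping that splits every class.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Renaming clusters does not change the score, while a clustering that separates every same-class pair scores below zero.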

---

#### **6. Normalized Mutual Information (NMI)**

* **Definition:** Measures information shared between true labels and clusters.
* **Range:** `0 to 1`

✔ **Insensitive to:** Permutations of cluster label values (it is not adjusted for chance, so it tends to rise as the number of clusters grows)
✔ **Used in:** NLP & image clustering

---

#### **7. Homogeneity, Completeness & V-Measure**

| Metric           | Meaning                              |
| ---------------- | ------------------------------------ |
| **Homogeneity**  | Each cluster contains only one class |
| **Completeness** | All class samples in same cluster    |
| **V-Measure**    | Harmonic mean of both                |

✔ **Best for:** Evaluating label consistency
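
These three scores are commonly defined through conditional entropy: homogeneity is 1 − H(class | cluster) / H(class), completeness is 1 − H(cluster | class) / H(cluster), and V-measure is their harmonic mean. The sketch below follows that definition; function names are illustrative, and a zero-entropy labeling is scored as 1.0 by assumption.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(a | b): remaining uncertainty about a once b is known."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((c / n) * math.log(c / b_counts[bv]) for (_, bv), c in joint.items())


def homogeneity_completeness_v(classes, clusters):
    h_classes = entropy(classes)
    h_clusters = entropy(clusters)
    hom = 1.0 if h_classes == 0 else 1 - conditional_entropy(classes, clusters) / h_classes
    com = 1.0 if h_clusters == 0 else 1 - conditional_entropy(clusters, classes) / h_clusters
    v = 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)
    return hom, com, v


# Class 1 is split across clusters 1 and 2: pure clusters, but not complete.
hom, com, v = homogeneity_completeness_v([0, 0, 1, 1], [0, 0, 1, 2])
print(hom, com, v)
```

Every cluster contains a single class (homogeneity 1), but one class is split across two clusters, which lowers completeness and therefore the V-measure.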

---

### **Comparison Table**

| Metric            | Type     | Best Value  | When to Use                |
| ----------------- | -------- | ----------- | -------------------------- |
| Silhouette Score  | Internal | ↑ High      | General clustering quality |
| Davies–Bouldin    | Internal | ↓ Low       | Model comparison           |
| Calinski–Harabasz | Internal | ↑ High      | Optimal K selection        |
| Elbow Method      | Internal | Elbow point | K-Means only               |
| ARI               | External | ↑ High      | True labels available      |
| NMI               | External | ↑ High      | Label comparison           |
| V-Measure         | External | ↑ High      | Class consistency          |

---

### **Key Takeaways**

1. **No labels → use internal metrics** (Silhouette, DBI, CHI)
2. **Labels available → use external metrics** (ARI, NMI)
3. **Silhouette Score** is one of the most widely used internal metrics
4. **DBI ↓ lower is better**, **CHI ↑ higher is better**
5. **Elbow method helps choose K**, but is subjective

---

### **Advantages of Clustering**

* Works without labeled data
* Helps discover hidden patterns
* Useful for exploratory data analysis (EDA)
* Some algorithms (e.g., K-Means) scale well to large datasets

---

### **Limitations**

* Choosing the right number of clusters can be difficult
* Sensitive to noise and outliers (e.g., K-Means)
* Results depend on distance metrics
* Interpretation may be subjective

---

### **Key Takeaways**

1. Clustering is an **unsupervised learning technique**.
2. Used to **group similar data points**.
3. Common algorithms: **K-Means, Hierarchical, DBSCAN, GMM**.
4. Evaluation uses **internal metrics** like silhouette score.
5. Widely used in **customer segmentation, fraud detection, image processing**, and more.

---


### **2.2.2 Dimensionality Reduction**

#### **Definition**

Dimensionality Reduction is a technique, usually **unsupervised**, used to **reduce the number of input features (dimensions)** in a dataset while **preserving as much important information as possible**. Some methods, such as LDA, are supervised and use class labels.

👉 Goal: **Simpler data, faster models, less noise, better visualization**

---

#### **Why Dimensionality Reduction is Needed**

* High-dimensional data → **curse of dimensionality**
* Reduces **overfitting**
* Improves **training speed**
* Removes **redundant and noisy features**
* Enables **2D/3D visualization**

---

#### **Types of Dimensionality Reduction**

| Type                   | Description                                            |
| ---------------------- | ------------------------------------------------------ |
| **Feature Selection**  | Selects a subset of original features                  |
| **Feature Extraction** | Transforms features into a new lower-dimensional space |

---

#### **Common Dimensionality Reduction Algorithms**

#### **1. Principal Component Analysis (PCA)**

* **Type:** Linear, Feature Extraction
* **Idea:** Projects data onto directions of **maximum variance**
* **Output:** Principal Components (orthogonal axes)

✔ Fast & widely used
❌ Assumes linearity

---

#### **2. Linear Discriminant Analysis (LDA)**

* **Type:** Supervised dimensionality reduction
* **Idea:** Maximizes **class separability**
* **Used when:** Labels are available

---

#### **3. t-SNE (t-Distributed Stochastic Neighbor Embedding)**

* **Type:** Non-linear
* **Idea:** Preserves **local structure** for visualization
* **Used for:** 2D/3D visualization

✔ Excellent visualization
❌ Not scalable, not for modeling

---

#### **4. UMAP (Uniform Manifold Approximation and Projection)**

* **Type:** Non-linear
* **Idea:** Preserves both **local and global structure**

✔ Faster and more scalable than t-SNE
✔ Good for large datasets

---

#### **5. Autoencoders**

* **Type:** Deep Learning-based
* **Idea:** Neural network compresses and reconstructs data

✔ Handles complex non-linear data
❌ Requires more data & compute

---

### **Algorithm Comparison Table**

| Algorithm   | Linear | Supervised | Best For                    |
| ----------- | ------ | ---------- | --------------------------- |
| PCA         | Yes    | No         | Noise reduction, speed      |
| LDA         | Yes    | Yes        | Class separability          |
| t-SNE       | No     | No         | Visualization               |
| UMAP        | No     | No         | Visualization + structure   |
| Autoencoder | No     | No         | Complex data (images, text) |

---

### **Applications**

| Domain          | Use Case                                       |
| --------------- | ---------------------------------------------- |
| Computer Vision | Image compression                              |
| NLP             | Word embeddings                                |
| Healthcare      | Gene expression analysis                       |
| Finance         | Feature reduction                              |
| ML Pipelines    | Preprocessing before clustering/classification |

---

### **Evaluation Metrics**

Dimensionality reduction does **not directly predict labels**, so evaluation focuses on:

1. **Information preservation**
2. **Reconstruction quality**
3. **Structure preservation**
4. **Downstream task performance**

---

#### **1. Information Preservation Metrics**

**Explained Variance Ratio (PCA)**

* **Definition:** Measures how much variance (information) is retained after reduction.

* **Formula:**
  $$
  \text{Explained Variance Ratio} = \frac{\text{Variance of selected components}}{\text{Total variance}}
  $$

* **Interpretation:**

  * 95% variance → very good
  * 80–90% → acceptable

✔ **Used with:** PCA
✔ **Goal:** Retain maximum information
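
For 2D data, the explained variance ratio can be computed by hand, because the eigenvalues of a symmetric 2×2 covariance matrix have a closed form. The sketch below assumes that special case; the function names and the small dataset are illustrative only.

```python
import math


def covariance_2d(data):
    """Sample variances and covariance of a list of (x, y) pairs."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    return sxx, sxy, syy


def explained_variance_ratio_2d(data):
    sxx, sxy, syy = covariance_2d(data)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] are the variances along the principal axes.
    mean = (sxx + syy) / 2
    half_gap = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [mean + half_gap, mean - half_gap]
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Illustrative, strongly correlated 2D data.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
ratios = explained_variance_ratio_2d(data)
print(ratios)
```

Because the two features are strongly correlated, the first principal component alone retains most of the variance.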

---

#### **2. Reconstruction-Based Metrics**

**Reconstruction Error**

* **Definition:** Measures difference between original data and reconstructed data.
* **Common measures:**

  * Mean Squared Error (MSE)
  * Mean Absolute Error (MAE)

$$
\text{Reconstruction Error} = ||X - \hat{X}||^2
$$

✔ **Lower = better**
✔ **Used with:** PCA, Autoencoders
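
To make reconstruction error concrete, the sketch below compresses 2D points to one number each by projecting onto a direction, maps them back, and reports the squared error averaged over samples (MSE). The direction is supplied by hand rather than learned, and all names are hypothetical.

```python
import math


def reconstruct_1d(data, direction):
    """Project centered 2D data onto a direction, then map back to 2D."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    norm = math.hypot(*direction)
    ux, uy = direction[0] / norm, direction[1] / norm  # unit vector
    reconstructed = []
    for x, y in data:
        z = (x - mean_x) * ux + (y - mean_y) * uy  # the 1D code for this point
        reconstructed.append((mean_x + z * ux, mean_y + z * uy))
    return reconstructed


def reconstruction_mse(data, reconstructed):
    """Squared reconstruction error, averaged over samples."""
    total = sum((x - xr) ** 2 + (y - yr) ** 2
                for (x, y), (xr, yr) in zip(data, reconstructed))
    return total / len(data)


# Made-up points lying exactly on the line y = x.
data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
mse_along_line = reconstruction_mse(data, reconstruct_1d(data, (1.0, 1.0)))
mse_x_axis = reconstruction_mse(data, reconstruct_1d(data, (1.0, 0.0)))
print(mse_along_line, mse_x_axis)
```

Projecting along the line the data lies on loses essentially nothing, while projecting onto the x-axis discards all variation in y.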

---

#### **3. Structure Preservation Metrics**

**Trustworthiness**

* **Definition:** Measures whether **nearest neighbors in reduced space** were neighbors in original space.
* **Range:** 0 to 1

✔ **High value → good local structure**

---

**Continuity**

* **Definition:** Measures whether **original neighbors remain neighbors** after reduction.
* **Range:** 0 to 1

✔ **High value → minimal structure loss**

---

**Mean Relative Rank Error (MRRE)**

* Measures ranking distortion between high-dimensional and low-dimensional spaces.

✔ **Lower = better**

---

#### **4. Visualization-Oriented Metrics**

**Silhouette Score (After Reduction)**

* Apply clustering on reduced data
* Measures **cluster separation**

✔ Useful when DR is followed by clustering
✔ Higher score → better separation

---

**KNN Preservation Score**

* Measures how many nearest neighbors are preserved after reduction.

✔ Especially useful for **t-SNE & UMAP**
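
One simple way to express this idea, offered as an assumed definition since exact formulations vary: for each point, take its k nearest neighbors in the original space and in the reduced space, and report the average fraction that overlap. Function names and data are illustrative.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of point i (Euclidean distance)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def knn_preservation(high, low, k):
    """Average fraction of each point's k neighbors that survive the reduction."""
    shared = 0
    for i in range(len(high)):
        shared += len(knn_indices(high, i, k) & knn_indices(low, i, k))
    return shared / (k * len(high))


# Made-up 2D points and two hypothetical 1D embeddings of them.
high = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0), (11.5, 0.0)]
faithful = [(0.0,), (1.0,), (2.5,), (10.0,), (11.5,)]
scrambled = [(0.0,), (10.0,), (1.0,), (11.5,), (2.5,)]

print(knn_preservation(high, faithful, k=1))
print(knn_preservation(high, scrambled, k=1))
```

The faithful embedding keeps every nearest neighbor, while the scrambled one keeps none.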

---

#### **5. Downstream Task Performance**

**Classification / Clustering Performance**

* Train a model on reduced data
* Evaluate using:

  * Accuracy
  * F1-score
  * AUC
  * Silhouette Score

✔ Best real-world evaluation method

---

### **Metric vs Algorithm Mapping**

| Metric                | PCA | t-SNE | UMAP | LDA | Autoencoder |
| --------------------- | --- | ----- | ---- | --- | ----------- |
| Explained Variance    | ✅   | ❌     | ❌    | ❌   | ❌           |
| Reconstruction Error  | ✅   | ❌     | ❌    | ❌   | ✅           |
| Trustworthiness       | ⚠️  | ✅     | ✅    | ⚠️  | ✅           |
| Continuity            | ⚠️  | ✅     | ✅    | ⚠️  | ✅           |
| Downstream Accuracy   | ✅   | ⚠️    | ⚠️   | ✅   | ✅           |
| Visualization Quality | ❌   | ✅     | ✅    | ⚠️  | ⚠️          |

---

### **Comparison Table**

| Metric               | What It Measures          | Best For          |
| -------------------- | ------------------------- | ----------------- |
| Explained Variance   | Information retained      | PCA               |
| Reconstruction Error | Data loss                 | PCA, Autoencoders |
| Trustworthiness      | Local structure           | t-SNE, UMAP       |
| Continuity           | Neighborhood preservation | t-SNE, UMAP       |
| Silhouette Score     | Cluster separation        | DR + Clustering   |
| Downstream Accuracy  | Real usefulness           | All DR methods    |

---

### **Advantages**

* Reduces complexity
* Improves model performance
* Helps visualization
* Reduces storage & computation

---

### **Limitations**

* Possible loss of information
* Harder interpretation (feature extraction)
* t-SNE & UMAP not ideal for modeling

---

### **Key Takeaways**

1. Dimensionality reduction simplifies high-dimensional data
2. **PCA** is the most common linear method
3. **t-SNE & UMAP** are best for visualization
4. **LDA** uses labels to maximize class separation
5. **Autoencoders** handle complex, non-linear data
6. **No single metric fits all DR methods**
7. **PCA → Explained variance & reconstruction error**
8. **t-SNE / UMAP → Trustworthiness & continuity**
9. **Autoencoders → Reconstruction loss**
10. **Best evaluation → downstream task performance**

---

# <center> *End of Topic* </center>