# **Introduction to Supervised Learning**


**Supervised Learning** is a branch of Machine Learning where models are trained using **labeled data**, meaning each input has a known output.  
The goal is to learn a mapping from **inputs (features)** to **outputs (targets)** so that the model can make accurate predictions on new, unseen data.  

---

### Key Characteristics
- Works with **labeled datasets** (input + correct output).  
- The model learns patterns between **X (features)** and **y (target)**.  
- Used for both **classification** (predicting categories) and **regression** (predicting continuous values).  
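As a minimal sketch of "learning a mapping from X to y", the snippet below fits a one-feature linear regression by ordinary least squares in plain Python. The house-size/price numbers are made up for illustration, and `fit_simple_linear_regression` and `predict` are hypothetical helpers, not a library API. In practice a library such as scikit-learn would normally do this fitting.

```python
# Made-up labeled dataset: each X value is a feature (house size, m²),
# each y value is the known target (price, in thousands).
X = [50.0, 60.0, 80.0, 100.0, 120.0]
y = [150.0, 180.0, 240.0, 300.0, 360.0]


def fit_simple_linear_regression(xs, ys):
    """Learn slope and intercept with ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    var_x = sum((xi - mean_x) ** 2 for xi in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_simple_linear_regression(X, y)


def predict(x):
    """Apply the learned mapping to a new, unseen input."""
    return slope * x + intercept


print(predict(90.0))  # 270.0
```

The learned `slope` and `intercept` are the "mapping"; `predict` applies it to a size that never appeared in the training data.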

---

### Common Algorithms
- **Linear Regression**: Predict continuous values.  
- **Logistic Regression**: Binary classification (e.g., spam vs. not spam).  
- **Decision Trees & Random Forests**: Handle complex, non-linear relationships.  
- **Support Vector Machines (SVM)**: Find the maximum-margin boundary between classes.  
- **K-Nearest Neighbors (KNN)**: Predict from the labels of the most similar (nearest) training examples.  
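To make the KNN idea concrete, here is a minimal pure-Python sketch that classifies a point by majority vote among its `k` nearest training points (Euclidean distance). The 2-feature data and the `knn_predict` helper are hypothetical; scikit-learn's `KNeighborsClassifier` is the usual production choice.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every labeled training point
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by distance only, so labels are never compared
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Most frequent label among the k nearest points wins
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up training data: two features, two well-separated classes
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 5.0), (7.0, 7.0), (6.5, 6.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (1.2, 1.5)))  # A
print(knn_predict(train_X, train_y, (6.8, 6.1)))  # B
```

KNN has no real training step: "fitting" is just storing the labeled data, and all the work happens at prediction time.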

---

### Examples
- Predicting **house prices** based on features (regression).  
- Classifying **emails** as spam or not (classification).  
- Predicting **customer churn** or **loan default risk**.

---

### Regression vs Classification

**Regression** and **Classification** are the two main types of problems in supervised learning.  
Both use labeled data, but they differ in what they predict and how results are evaluated.

---

### Regression
Regression is used when the target variable is **continuous**.

#### Examples:
- Predicting **house prices**
- Estimating **ride distance**
- Forecasting **sales** or **temperature**

#### Output:
- A **numeric value** (e.g., 42.5 km, ₹1500)

#### Common Metrics:
- MAE (Mean Absolute Error)  
- MSE (Mean Squared Error) / RMSE (Root Mean Squared Error)  
- R² Score (coefficient of determination)  
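The regression metrics above can be computed by hand in a few lines. This is a plain-Python sketch with made-up true/predicted values. `regression_metrics` is a hypothetical helper. scikit-learn provides equivalents (`mean_absolute_error`, `mean_squared_error`, `r2_score` in `sklearn.metrics`).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² for two equal-length lists of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # average absolute error
    mse = sum(e ** 2 for e in errors) / n   # average squared error
    rmse = math.sqrt(mse)                   # back in the target's units
    # R² = 1 - (residual sum of squares / total sum of squares around the mean)
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Made-up targets and predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# MAE: 0.500, MSE: 0.500, RMSE: 0.707, R2: 0.900
```

MAE and RMSE are in the same units as the target, while R² is unitless (1.0 is a perfect fit, 0.0 is no better than always predicting the mean).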

---

### Classification
Classification is used when the target variable is **categorical**.

#### Examples:
- Predicting whether a ride is **Completed / Cancelled**  
- Classifying emails as **Spam / Not Spam**  
- Identifying customer sentiment as **Positive / Negative / Neutral**

#### Output:
- A **class label** (0/1 or categories)

#### Common Metrics:
- Accuracy  
- Precision, Recall, F1-score  
- Confusion Matrix  
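For a binary problem, all of these metrics come from four counts: true/false positives and negatives. The sketch below computes them in plain Python from made-up labels. `binary_classification_metrics` is a hypothetical helper. scikit-learn offers `accuracy_score`, `precision_score`, `recall_score`, `f1_score` and `confusion_matrix` in `sklearn.metrics`.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Return (scores dict, 2x2 confusion matrix) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Rows = actual (negative, positive); columns = predicted (negative, positive)
    confusion = [[tn, fp], [fn, tp]]
    scores = {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
    return scores, confusion


# Made-up labels (1 = positive class, e.g. "Spam")
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

scores, confusion = binary_classification_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
print(confusion)
# accuracy: 0.625, precision: 0.600, recall: 0.750, f1: 0.667
# [[2, 2], [1, 3]]
```

Here accuracy alone hides the fact that two of the five positive predictions were wrong, which precision exposes.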

---

### Key Differences

| Aspect         | Regression                                   | Classification                            |
|----------------|----------------------------------------------|-------------------------------------------|
| Target Type    | Continuous numbers                           | Discrete categories                       |
| Output Example | 120.5 (price)                                | “Completed” or “Cancelled”                |
| Algorithms     | Linear Regression, SVR (Support Vector Regression) | Logistic Regression, SVM, Decision Trees |
| Evaluation     | Error-based metrics (MAE, MSE, RMSE) and R²  | Accuracy, Precision, Recall, F1-score     |
| Goal           | Minimize the gap between predicted and true values | Predict the correct class           |

---


### Importance of Splitting the Dataset into Train & Test Sets

Splitting a dataset into **training** and **testing** sets is a crucial step in supervised learning.  
It ensures that your model learns from one portion of the data and is evaluated on **unseen data**, just like in real-world scenarios.
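A minimal sketch of a reproducible train/test split in plain Python: shuffle the row indices with a fixed seed, then hold out a fraction as the test set. `split_train_test` is a hypothetical helper and the data is made up. In practice, `sklearn.model_selection.train_test_split` (with its `test_size` and `random_state` parameters) does this job and returns the four pieces in the same order used here.

```python
import random


def split_train_test(X, y, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed, then hold out `test_size` of rows for testing."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed -> same split on every run
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Index X and y with the same indices so features and labels stay aligned
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Made-up data: one feature per row, target = 2 * feature
X = [[float(i)] for i in range(10)]
y = [2 * i for i in range(10)]

X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.2)
print(len(X_train), len(X_test))  # 8 2
```

Shuffling before splitting matters when rows are stored in some order (e.g. by date or by class); the fixed seed keeps experiments repeatable.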

---

### 🧠 Why It's Important

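#### Exposes Overfitting  
If you train and test on the same data, a model that has simply memorized the training examples (including their noise) will look excellent.  
Testing on unseen data shows how well the model actually generalizes; a large gap between training and test performance is a sign of overfitting.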
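The memorization problem can be shown with a deliberately bad "model" that stores every training row and, for anything unseen, falls back to the most common training label. `MemorizingClassifier` and the data are hypothetical, chosen only to illustrate the train/test gap.

```python
from collections import Counter


class MemorizingClassifier:
    """Hypothetical 'model' that memorizes training rows instead of learning a rule."""

    def fit(self, X, y):
        # Store every training example verbatim
        self.lookup = {tuple(x): label for x, label in zip(X, y)}
        # For unseen inputs, fall back to the most frequent training label
        self.fallback = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.lookup.get(tuple(x), self.fallback) for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up data: small values are class 0, large values are class 1
X_train = [[1], [2], [3], [4], [10], [11], [12]]
y_train = [0, 0, 0, 0, 1, 1, 1]
X_test = [[2.5], [11.5], [13]]
y_test = [0, 1, 1]

model = MemorizingClassifier().fit(X_train, y_train)
train_acc = accuracy(y_train, model.predict(X_train))
test_acc = accuracy(y_test, model.predict(X_test))
print(train_acc)  # 1.0
print(test_acc)   # 0.333... (only 1 of 3 unseen points correct)
```

Scoring on the training data alone would report a perfect model; only the held-out rows reveal that it never learned the simple "large value means class 1" pattern.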

#### Gives a Realistic Performance Check  
The test set simulates new data.  
A good model performs well **both on training and test sets**.

#### Helps Compare Algorithms Fairly  
By using the same split, you can compare different models on identical test data.

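#### Guards Against Data Leakage  
As long as the test set is kept fully separate, including from preprocessing steps such as scaling, no information from it (its targets or its feature statistics) can leak into training, so the test score remains a trustworthy estimate of real-world performance.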
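A common way leakage creeps in is preprocessing on the full dataset before splitting. The sketch below standardizes one made-up feature using statistics learned from the training split only, and contrasts that with a "leaky" fit over train + test. `fit_standardizer` and `standardize` are hypothetical helpers. The scikit-learn equivalent is calling `StandardScaler.fit` on the training data and `transform` on both splits.

```python
import statistics


def fit_standardizer(values):
    """Learn mean and population standard deviation from the given values."""
    return statistics.mean(values), statistics.pstdev(values)


def standardize(values, mean, std):
    """Scale values to zero mean / unit variance using previously learned statistics."""
    return [(v - mean) / std for v in values]


# Made-up single feature, already split
train_feature = [10.0, 12.0, 14.0, 16.0]
test_feature = [30.0, 11.0]

# Correct: learn statistics from the training split only, then reuse them on test
mean, std = fit_standardizer(train_feature)
train_scaled = standardize(train_feature, mean, std)
test_scaled = standardize(test_feature, mean, std)

# Leaky: fitting on train + test lets test rows (e.g. the 30.0 outlier) shape preprocessing
leaky_mean, leaky_std = fit_standardizer(train_feature + test_feature)

print(mean, leaky_mean)  # 13.0 15.5
```

In the leaky version the training data has already "seen" the test outlier through the shifted mean, so the test score would no longer measure performance on truly unseen data.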

---


