# Machine Learning

## 1. Supervised Learning
- The input data is labelled: each example pairs input features with a known output. The model is trained on this labelled dataset, where it learns about each type of data.

- Once the training process is completed, the model is evaluated on test data (a labelled portion of the dataset held out from training): it predicts the output for each test example, and the predictions are compared with the known outputs.
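
As a rough illustration of holding out test data, the sketch below shuffles a small labelled dataset and sets part of it aside. The function name `train_test_split`, the 20% test fraction, and the toy data are assumptions made for this example, not something prescribed by these notes.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]                     # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled data: (input feature, output)
data = [(x, 2 * x + 1) for x in range(10)]
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```

The model only sees `train` while learning; `test` is kept aside so the evaluation reflects data the model has not seen.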


<img src="https://cdn.prod.website-files.com/5ef788f07804fb7d78a4127a/61cc3ae3acdd638008a38147_supervised%20learning.png">

Supervised learning is divided into 2 types:
- **Regression**
- **Classification**

### 1.1 Regression
It is used when the **dependent variable** is a continuous numerical value.

E.g. Predicting tomorrow's temperature.

Regression Algorithms are:
- Linear Regression
- Decision Tree
- Random Forest

#### 1.1.1 Linear Regression
y = m * x + c

Input features
- x

Output feature
- y

The machine learning objective here is to find the values of m (slope) and c (y-intercept) so that we can take an input (x) and predict the output (y).
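
One common way to find m and c is ordinary least squares, which has a closed-form solution for a single input feature. The sketch below assumes that approach; the helper name `fit_line` and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept c for y = m * x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    c = y_mean - m * x_mean
    return m, c

# Hypothetical data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)       # 2.0 1.0
print(m * 5 + c)  # prediction for x = 5: 11.0
```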

------------------------------
Multiple linear regression, with several input features, looks something like this:
$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon
$$
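
To make the equation concrete, here is a small sketch that evaluates it for one example. The coefficient values are hypothetical, and the noise term $\epsilon$ is left out because it is not known at prediction time.

```python
def predict(betas, features):
    """Evaluate beta_0 + beta_1 * x_1 + ... + beta_p * x_p.

    betas    = [beta_0, beta_1, ..., beta_p]
    features = [x_1, ..., x_p]
    """
    intercept, weights = betas[0], betas[1:]
    return intercept + sum(w * x for w, x in zip(weights, features))

betas = [1.0, 2.0, 0.5]             # hypothetical learned coefficients
print(predict(betas, [3.0, 4.0]))   # 1 + 2*3 + 0.5*4 = 9.0
```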

#### **Residual Error**
In the machine learning context, the residual error of a prediction is (Y-actual) minus (Y-predicted), i.e. $y_i - \hat{y}_i$. Training may produce several candidate lines or models; you choose among them based on error or accuracy metrics.

**Error Metrics** measure how far off your predictions are from the actual values. Lower values mean better performance.

**_Mean Absolute Error (MAE)_**
$$
MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
$$

**_Mean Squared Error (MSE)_**
$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

**_Root Mean Squared Error (RMSE)_**
$$
RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }
$$

**_Sum of Squared Errors (SSE)_**
$$
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$
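
The four error metrics above can be computed directly from the residuals. This is a minimal sketch using only the standard library; the function name `error_metrics` and the sample values are assumptions for illustration.

```python
import math

def error_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and SSE for paired actual and predicted values."""
    residuals = [actual - predicted for actual, predicted in zip(y_true, y_pred)]
    n = len(residuals)
    sse = sum(r ** 2 for r in residuals)      # sum of squared errors
    mae = sum(abs(r) for r in residuals) / n  # mean absolute error
    mse = sse / n                             # mean squared error
    rmse = math.sqrt(mse)                     # same units as the target
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "SSE": sse}

# Hypothetical actual and predicted values; residuals are 1, 0, -1, -2
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
metrics = error_metrics(y_true, y_pred)
print(metrics["MAE"], metrics["MSE"], metrics["SSE"])  # 1.0 1.5 6.0
print(round(metrics["RMSE"], 3))                       # 1.225
```

Because MSE and SSE square the residuals, the single residual of -2 contributes more to them than the two residuals of size 1 combined, whereas MAE weights all errors linearly.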


**Accuracy Metrics** tell you how well your model is performing overall: how close the predictions are to the true values, expressed either as a percentage or as explained variance. Here are a few:

**_R² - Coefficient of Determination_**
$$
R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}
$$

**_Regression Accuracy (from Mean Absolute Percentage Error, MAPE)_**
$$
MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$$

$$
\text{Accuracy} = 100\% - \text{MAPE}
$$
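
Both accuracy metrics follow directly from the formulas above. The sketch below uses hypothetical values and helper names (`r_squared`, `mape`). Note that MAPE divides by each actual value, so it is undefined when any $y_i$ is 0, and the derived accuracy can go negative when the percentage errors are large.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SSE / SST."""
    y_mean = sum(y_true) / len(y_true)
    sse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))  # unexplained variation
    sst = sum((a - y_mean) ** 2 for a in y_true)             # total variation
    return 1 - sse / sst

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Requires non-zero actual values."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted values
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

r2 = r_squared(y_true, y_pred)
error_pct = mape(y_true, y_pred)
accuracy = 100 - error_pct
print(round(r2, 4))         # 0.9893
print(round(error_pct, 2))  # 6.67
print(round(accuracy, 2))   # 93.33
```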


#### **📊 R² (R-squared) Interpretation**

| R² Value | Interpretation                                      |
|----------|-----------------------------------------------------|
| < 0      | Model fits **worse** than always predicting the mean |
| 0        | Model explains **none** of the variance             |
| 1        | Model explains **all** the variance (perfect fit)   |
| ~0.7+    | Often considered **good**, though the threshold depends on the domain |
| < 0.5    | **Weak** model that may need improvement or tuning  |

#### **📊 RMSE Interpretation Table (Rule of Thumb)**

| Scenario                                             | RMSE Interpretation                                              |
|------------------------------------------------------|------------------------------------------------------------------|
| RMSE ≈ 0                                             | ✅ Perfect predictions                                            |
| RMSE ≪ mean(target) or std(target)                   | ✅ Very good model                                                |
| RMSE ≈ std(target)                                   | ⚖️ Baseline-level model: no better than always predicting the mean |
| RMSE ≫ std(target)                                   | ❌ Poor model                                                     |


### 1.2 Classification

It is used when the **dependent variable** is categorical, i.e. it takes one of a finite set of values.

E.g. Predicting whether tomorrow's weather will be hot / cloudy / rainy.

Classification Algorithms are:
- Logistic Regression
- Decision Tree
- Random Forest
- K Nearest Neighbor
- SVM - Support Vector Machine
- Ensemble Learning
- Naive Bayes
- etc.

#### 1.2.1 Logistic Regression
Logistic Regression is a supervised machine learning algorithm used for **classification problems** — when your target variable is categorical (e.g., Yes/No, 0/1, Spam/Not Spam).

Despite the name, it is not used for regression, but for predicting probabilities and making binary or multi-class decisions.

Logistic Regression predicts the probability that a given input belongs to a particular class using the sigmoid (logistic) function.

##### 📊 Logistic Regression Probability Function

The logistic regression model predicts the **probability** that the output `y` is 1 (e.g., "Yes", "Approved", "Positive") given the input features `x`.

$$
P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}}
$$

##### 🧠 Explanation:

- $P(y = 1 \mid x)$: Probability that the target `y` equals 1 given input features `x`
- $\beta_0$: Intercept (bias term)
- $\beta_1, \beta_2, \ldots, \beta_p$: Coefficients (weights) for each feature
- $x_1, x_2, \ldots, x_p$: Feature values
- $e$: Euler’s number (approx. 2.718)
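
The probability function above can be sketched in a few lines. The coefficients below are hypothetical (a trained model would learn them from data), and the 0.5 decision threshold is a common default rather than something fixed by the formula.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)          # safe: z is negative, so exp(z) < 1
    return exp_z / (1.0 + exp_z)

def predict_proba(betas, features):
    """P(y = 1 | x) for betas = [beta_0, ..., beta_p] and features = [x_1, ..., x_p]."""
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], features))
    return sigmoid(z)

betas = [-4.0, 1.0, 2.0]              # hypothetical coefficients
p = predict_proba(betas, [2.0, 1.0])  # z = -4 + 1*2 + 2*1 = 0
label = 1 if p >= 0.5 else 0          # assumed threshold of 0.5
print(p, label)  # 0.5 1
```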

##### 📊 Multinomial Logistic Regression

**Multinomial Logistic Regression** is an extension of binary logistic regression used when the **target variable has more than two classes** (e.g., `Accepted`, `Rejected`, `Withdrawn`).

Unlike One-vs-Rest, it models **all classes simultaneously** using the **softmax function**.

---

##### 🧠 Softmax Function

Given a feature vector $x$, the probability that it belongs to class $k$ is:

$$
P(y = k \mid x) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}
$$

Where:
- $z_k = \beta_{k0} + \beta_{k1}x_1 + \cdots + \beta_{kp}x_p$
- $K$ is the total number of classes
- $\beta_{ki}$ are the learned coefficients for class $k$

---

##### 🔍 How It Works

1. Compute a **score** $z_k$ for each class using a linear combination of inputs.
2. Apply the **softmax** to convert these scores into **probabilities** that sum to 1.
3. Choose the class with the **highest probability** as the predicted class.
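
The three steps above can be sketched as follows. The per-class coefficients and the feature values are hypothetical; the class names reuse the example labels from earlier in this section. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the resulting probabilities unchanged.

```python
import math

def class_scores(betas_per_class, features):
    """Step 1: linear score z_k for each class; each row is [beta_k0, beta_k1, ..., beta_kp]."""
    return [
        betas[0] + sum(b * x for b, x in zip(betas[1:], features))
        for betas in betas_per_class
    ]

def softmax(scores):
    """Step 2: turn scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["Accepted", "Rejected", "Withdrawn"]
betas_per_class = [          # hypothetical learned coefficients, one row per class
    [0.5, 1.0, -1.0],
    [0.0, -1.0, 1.0],
    [-0.5, 0.0, 0.0],
]
features = [2.0, 1.0]

scores = class_scores(betas_per_class, features)  # [1.5, -1.0, -0.5]
probs = softmax(scores)
predicted = classes[probs.index(max(probs))]      # Step 3: highest probability wins
print(scores)
print(predicted)  # Accepted
```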



It's called **logistic regression** because it models the probability of a class using a regression-like linear equation (shown below) that is then passed through the logistic (sigmoid) function — not because it performs regression in the traditional sense of predicting continuous values.

$z_k = \beta_{k0} + \beta_{k1}x_1 + \cdots + \beta_{kp}x_p$

**Note**:

Tree-based algorithms, including Decision Tree, are covered in their own ipynb file.



## 2. Unsupervised Learning
- We take unlabeled data, which means it is not categorized and the corresponding outputs are not given.

- Now, this unlabeled input data is fed to the machine learning model in order to train it.

- First, the model interprets the raw data to find hidden patterns in it, and then applies a suitable algorithm such as k-means clustering.
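
As a rough sketch of how k-means clustering finds groups without labels, the code below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The 2-D points, the starting centroids, and the fixed iteration count are all assumptions chosen to keep the example small; practical implementations usually pick starting centroids randomly and stop when assignments no longer change.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _step in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters[0])  # [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
print(clusters[1])  # [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
```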

<img src="https://databasetown.com/wp-content/uploads/2023/05/Unsupervised-Learning-1536x1090.jpg">

### Types of Unsupervised Learning

- **Clustering or Cluster Analysis**

- **Association Rule Learning**

- **Dimensionality Reduction**

- **Anomaly Detection**

Check the unsupervised ipynb file for a detailed explanation.
