<div align="center">

# Machine Learning

</div>


## Machine Learning Introduction 

Machine learning is a subset of artificial intelligence (AI) focused on designing algorithms that enable computers to learn patterns and make decisions from data, without being directly programmed for every possible scenario. Unlike traditional programming (where explicit rules are coded), machine learning algorithms develop their own logic based on examples and feedback, improving performance as they are exposed to more data.

**Types of Machine Learning**

| Type                  | Description                                                                                   | Examples                        |
|-----------------------|-----------------------------------------------------------------------------------------------|----------------------------------|
| Supervised Learning   | Learns from labeled data to predict outcomes.                                                 | Email spam detection, regression |
| Unsupervised Learning | Finds patterns and groupings in unlabeled data.                                               | Customer segmentation, clustering|
| Reinforcement Learning| Learns by trial and error, receiving rewards/penalties from its environment.                  | Game playing, robotics           |

**Key Concepts**

- **Data:** The examples the model learns from; their quality and quantity limit how well the model can perform.
- **Model:** The algorithm or mathematical structure that learns from data (e.g., neural network, decision tree).
- **Training:** The process in which the model learns patterns by adjusting its internal parameters to improve its predictions.

**Common Interview Questions**

| Question                                                          | Key Point/Short Answer                                                                    |
|-------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| What is machine learning?                                          | The field where computer systems learn from data without explicit programming.             |
| Types of machine learning?                                         | Supervised, Unsupervised, Reinforcement learning (see above table).                       |
| Difference from traditional programming?                           | ML creates logic from data; traditional uses explicit rules coded by programmers.          |
| Real-world applications?                                           | Spam filtering, recommendations, facial/speech recognition, fraud detection, self-driving.|
| What are features and labels?                                      | Features are input variables; labels are the target outcomes for prediction tasks.         |


## Supervised Machine Learning

Supervised machine learning is a type of machine learning where algorithms are trained using labeled data, meaning each input has a corresponding correct output. The model learns the relationship between features (inputs) and labels (outputs) by analyzing these pairs. Its objective is to predict accurate outcomes for new, unseen data by generalizing from the training data patterns.

**Types of Supervised Learning Tasks**
- **Classification:** Predicting categorical labels (e.g., spam vs. non-spam emails).
- **Regression:** Predicting continuous numerical values (e.g., house price prediction).

**Key Process Steps**
- Train the model on labeled data (features and labels).
- Predict outputs and compare them with actual labels.
- Adjust model parameters to minimize errors.
- Evaluate performance on test data.

| **Aspect**         | **Description**                                         | **Example Tasks**                |
|--------------------|--------------------------------------------------------|----------------------------------|
| Input Data         | Labeled data with features and corresponding labels     | Emails + spam/not spam labels    |
| Goal               | Learn mapping from inputs to outputs                   | Classification, regression       |
| Common Tasks       | Classification and regression                          | Spam detection, price prediction |
| Evaluation         | Measure accuracy or error on test data                 | Accuracy, RMSE                   |
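
The process steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production workflow: the dataset is made up, the model is a nearest-centroid classifier chosen only for brevity, and the names `fit_centroids` and `predict` are hypothetical helpers.

```python
# Toy labeled dataset of (feature, label) pairs; values are made up for illustration.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]

# Hold out every fourth example as test data and train on the rest.
test = data[3::4]
train = [pair for i, pair in enumerate(data) if i % 4 != 3]

def fit_centroids(pairs):
    """Training: learn one parameter per class, the mean feature value."""
    sums, counts = {}, {}
    for x, label in pairs:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Prediction: return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

centroids = fit_centroids(train)
predictions = [predict(centroids, x) for x, _ in test]

# Evaluation: fraction of held-out examples predicted correctly.
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(centroids, accuracy)
```

The key design point is that `accuracy` is computed only on examples the model never saw during training.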

**Common Interview Questions**

| **Question**                             | **Key Point/Short Answer**                                                                |
|------------------------------------------|------------------------------------------------------------------------------------------|
| What is supervised learning?             | Learning from labeled data to predict future outcomes.                                   |
| Difference from unsupervised learning?   | Supervised uses labeled data; unsupervised finds patterns in unlabeled data.             |
| Classification vs. regression?           | Classification predicts categories; regression predicts continuous values.               |
| What is overfitting and how to prevent it?| Overfitting occurs when the model learns noise; prevented by cross-validation, regularization, pruning. |
| Bias-variance tradeoff?                  | The balance between underfitting (high bias) and overfitting (high variance).            |
| Evaluation metrics?                      | Accuracy, Precision, Recall, F1 Score for classification; RMSE, MAE for regression.      |
| Examples of supervised algorithms?       | Linear regression, logistic regression, decision trees, random forest, SVM, k-NN, neural networks. |
| What is cross-validation?                | A technique to assess generalization by splitting data into multiple train/test sets.    |
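
As a rough sketch of how k-fold cross-validation splits data, the hypothetical helper `k_fold_indices` below yields train/test index lists for each fold. It assumes contiguous, unshuffled folds for readability; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_samples=6, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample appears in exactly one test fold, so averaging the per-fold scores uses every example for evaluation once.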

---

## 📘 **Theory: Simple Linear Regression**

**Definition:**
Simple Linear Regression is a supervised learning algorithm used to predict a **continuous** target variable $y$ based on a **single** independent variable $x$. It assumes a **linear** relationship between $x$ and $y$ and fits a straight line to the data.

---

### 🔹 **Model Equation**

$$
y = \beta_0 + \beta_1 x + \epsilon
$$

| Symbol     | Meaning                                        |
| ---------- | ---------------------------------------------- |
| $y$        | Dependent (target) variable                    |
| $x$        | Independent (feature) variable                 |
| $\beta_0$  | Intercept (value of $y$ when $x=0$)            |
| $\beta_1$  | Slope (change in $y$ for a unit change in $x$) |
| $\epsilon$ | Error term                                     |

---

### 🔹 **How the Model Learns**

| Step | Description                                                        |
| ---- | ------------------------------------------------------------------ |
| 1    | Estimates $\beta_0$ and $\beta_1$ to minimize prediction errors    |
| 2    | Uses **Mean Squared Error (MSE)** as the cost function             |
| 3    | Applies **Ordinary Least Squares (OLS)** to find the best-fit line |
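
For a single feature, the OLS estimates have a well-known closed form: the slope is the ratio of the covariance term to the variance term, and the intercept follows from the means. The sketch below assumes plain Python lists and a made-up dataset; `fit_simple_ols` is a hypothetical helper name.

```python
def fit_simple_ols(xs, ys):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    beta1 = s_xy / s_xx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1

# Made-up data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
beta0, beta1 = fit_simple_ols(xs, ys)
print(beta0, beta1)
```

Because the data is noise-free here, the fit recovers the intercept and slope that generated it; with real data the residuals would be non-zero.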

---

### 🔹 **Assumptions of Linear Regression**

| Assumption       | Description                                |
| ---------------- | ------------------------------------------ |
| Linearity        | Relationship between $x$ and $y$ is linear |
| Independence     | Observations are independent               |
| Homoscedasticity | Residuals have constant variance           |
| Normality        | Residuals are normally distributed         |

---

### 🔹 **Advantages and Disadvantages**

| Advantages                           | Disadvantages                                |
| ------------------------------------ | -------------------------------------------- |
| Easy to implement and interpret      | Assumes linearity; fails on non-linear data  |
| Computationally efficient            | Sensitive to outliers                        |
| Provides a baseline for other models | Poor performance if assumptions are violated |

---

## 🎯 **Interview Insights**

### ✅ **Basic Level**

| Question                          | Answer                                                                                                           |
| --------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| What is simple linear regression? | It’s an algorithm that models a linear relationship between one independent variable and one dependent variable. |
| Give a real-world example of SLR. | Predicting house price based on its size.                                                                        |

---

### ✅ **Intermediate Level**

| Question                                              | Answer                                                                                              |
| ----------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| What is the cost function used in linear regression?  | Mean Squared Error (MSE).                                                                           |
| How are parameters $\beta_0$ and $\beta_1$ estimated? | Using the Ordinary Least Squares (OLS) method, which minimizes the sum of squared residuals.        |
| What is $R^2$ in linear regression?                   | The proportion of variance in the dependent variable that the model explains.                       |

---

### ✅ **Advanced Level**

| Question                                                                            | Answer                                                                                                             |
| ----------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| Explain the assumptions of linear regression and what happens if they are violated. | Linearity, independence, homoscedasticity, and normality of residuals. Violations may lead to biased or inefficient estimates; e.g., heteroscedasticity makes the standard errors unreliable. |
| How would you detect and handle outliers in linear regression?                      | Use residual plots, leverage scores, Cook’s distance; handle by removing or applying robust regression techniques. |
| Why is gradient descent not commonly used for simple linear regression?             | Because OLS has an analytical solution, making it computationally simpler.                                         |


## Cost Functions in Machine Learning

Cost functions (also known as loss or objective functions) quantify how well a machine learning model is performing by measuring the difference between predicted values and actual values. The ultimate goal of training a model is to minimize the cost function, leading to better accuracy.

### Common Cost Functions and Their Use Cases

| **Cost Function**            | **Type**          | **Formula (per sample unless summed over $m$ samples)**                     | **Use Case / Notes**                                                                                            |
|-----------------------------|-------------------|-----------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|
| Mean Squared Error (MSE)     | Regression        | $(y_\text{pred} - y_\text{true})^2$, averaged over samples                  | Most popular for regression; penalizes large errors heavily; differentiable and convex. Some texts add a factor of $\frac{1}{2}$ to simplify the gradient. |
| Mean Absolute Error (MAE)    | Regression        | $\lvert y_\text{pred} - y_\text{true} \rvert$, averaged over samples        | More robust to outliers than MSE; not differentiable at zero error                                              |
| Root Mean Squared Error (RMSE) | Regression      | $\sqrt{\frac{1}{m} \sum (y_\text{pred} - y_\text{true})^2}$                 | Square root of MSE, measuring error in the original units of the target                                        |
| Mean Absolute Percentage Error (MAPE) | Regression | $\frac{1}{m} \sum \left\lvert \frac{y_\text{true} - y_\text{pred}}{y_\text{true}} \right\rvert$ | Relative error; multiply by 100 to express as a percentage; undefined when $y_\text{true} = 0$ |
| Huber Loss                   | Regression        | $\frac{1}{2}e^2$ if $\lvert e \rvert \le \delta$, else $\delta(\lvert e \rvert - \frac{1}{2}\delta)$, with $e = y_\text{pred} - y_\text{true}$ | Combines advantages of MSE and MAE; less sensitive to outliers than MSE |
| Binary Cross-Entropy         | Binary Classification | $-[y \log(p) + (1 - y) \log(1 - p)]$                                    | Measures error between the predicted probability $p$ and the actual class $y \in \{0, 1\}$                     |
| Categorical Cross-Entropy    | Multi-class Classification | $-\sum_k y_k \log(p_k)$                                            | Extends binary cross-entropy to multi-class problems ($y_k$ is the one-hot label)                              |
| Hinge Loss                   | Classification    | $\max(0, 1 - y \cdot f(x))$                                                 | Used by support vector machines with labels $y \in \{-1, +1\}$ and raw score $f(x)$; encourages a large margin |
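
Several of the per-sample losses in the table can be written directly as small Python functions. This is a sketch for study purposes; the function names, the default `delta=1.0`, and the clipping constant `eps` are assumptions rather than part of any particular library.

```python
import math

def squared_error(y_true, y_pred):
    """Per-sample squared error; averaging it over samples gives MSE."""
    return (y_pred - y_true) ** 2

def absolute_error(y_true, y_pred):
    """Per-sample absolute error; averaging it over samples gives MAE."""
    return abs(y_pred - y_true)

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for small errors, linear for large ones."""
    err = abs(y_pred - y_true)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)

def binary_cross_entropy(y_true, p, eps=1e-12):
    """y_true is 0 or 1; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def hinge(y_true, score):
    """y_true is -1 or +1; score is the raw classifier output f(x)."""
    return max(0.0, 1 - y_true * score)

print(squared_error(3.0, 5.0), huber(0.0, 3.0), hinge(-1, 0.5))
```

Note how `huber` grows linearly once the error exceeds `delta`, which is what limits the influence of outliers compared with `squared_error`.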

### Summary

- **Regression Tasks:** MSE, MAE, RMSE, MAPE, and Huber Loss are common. MSE is the most widely used because it penalizes large errors, but MAE and Huber Loss are more robust to outliers.
- **Classification Tasks:** Cross-Entropy Loss (binary or categorical) is prevalent because it works well with probabilistic outputs from classifiers. Hinge Loss is used with support vector machines.
- The choice depends on the problem type, data distribution, robustness needs, and model framework.

### Role in Training

- The cost function outputs a scalar error value representing model performance.
- Optimization algorithms minimize this cost by adjusting model parameters.
- Well-chosen cost functions lead to faster convergence and better generalization.


## 📘 **Theory: Multiple Linear Regression**

**Definition:**
Multiple Linear Regression (MLR) is an extension of simple linear regression where the target variable $y$ depends on **two or more independent variables** $x_1, x_2, ..., x_n$.

* It models the relationship between multiple predictors and a continuous outcome.
* The model fits a hyperplane (instead of a line) in an n-dimensional feature space.

---

### 🔹 **Model Equation**

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n + \epsilon
$$

| Symbol               | Meaning                                                     |
| -------------------- | ----------------------------------------------------------- |
| $y$                  | Dependent (target) variable                                 |
| $x_1, x_2, ..., x_n$ | Independent (feature) variables                             |
| $\beta_0$            | Intercept (value of $y$ when all $x_i=0$)                   |
| $\beta_i$            | Coefficient representing the effect of feature $x_i$ on $y$ |
| $\epsilon$           | Error term capturing noise in the data                      |

---

### 🔹 **How the Model Learns**

| Step | Description                                                                                    |
| ---- | ---------------------------------------------------------------------------------------------- |
| 1    | Estimate coefficients $\beta_0, \beta_1, ..., \beta_n$ using **Ordinary Least Squares (OLS)**. |
| 2    | Minimize the **Mean Squared Error (MSE)** cost function.                                       |
| 3    | The fitted model predicts $y$ by summing contributions from all features.                      |
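
In matrix form, the OLS step above has a standard closed-form solution. The notation here is an assumption introduced for illustration: $X$ is the design matrix with a leading column of ones for the intercept, $\mathbf{y}$ is the vector of observed targets, and $X^\top X$ is assumed to be invertible (which fails under perfect multicollinearity).

$$
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert^2 = (X^\top X)^{-1} X^\top \mathbf{y}
$$

Minimizing the sum of squared residuals and minimizing the MSE give the same $\hat{\boldsymbol{\beta}}$, since the two differ only by the constant factor $\frac{1}{m}$ for $m$ samples.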

---

### 🔹 **Assumptions of Multiple Linear Regression**

| Assumption           | Description                                           |
| -------------------- | ----------------------------------------------------- |
| Linearity            | Relationship between predictors and target is linear. |
| Independence         | Observations are independent.                         |
| Homoscedasticity     | Residuals have constant variance.                     |
| Normality            | Residuals are normally distributed.                   |
| No Multicollinearity | Predictors are not highly correlated with each other. |

---

### 🔹 **Advantages and Disadvantages**

| Advantages                                           | Disadvantages                                          |
| ---------------------------------------------------- | ------------------------------------------------------ |
| Models relationships with multiple factors           | Sensitive to multicollinearity                         |
| Easy to interpret (coefficients show feature impact) | Assumes linearity, may not capture non-linear patterns |
| Efficient and widely used                            | Outliers can distort the model                         |

---

## 🎯 **Interview Insights**

### ✅ **Basic Level**

| Question                            | Answer                                                                                                           |
| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| What is multiple linear regression? | It’s a regression technique that predicts a continuous target variable using more than one independent variable. |
| Give an example of MLR.             | Predicting house prices using size, location, and number of rooms.                                               |

---

### ✅ **Intermediate Level**

| Question                                                  | Answer                                                                                   |
| --------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| What cost function is used in multiple linear regression? | Mean Squared Error (MSE).                                                                |
| What is multicollinearity?                                | When independent variables are highly correlated, making coefficient estimates unstable. |
| How can you detect multicollinearity?                     | Using Variance Inflation Factor (VIF) or correlation matrices.                           |
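
As a small illustration of the VIF idea, the sketch below covers only the special case of two predictors, where regressing one predictor on the other gives $R^2 = r^2$ and therefore $\text{VIF} = 1 / (1 - r^2)$. The helper names and the data are made up; with more predictors each VIF requires a full auxiliary regression.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_mean, b_mean = sum(a) / n, sum(b) / n
    s_ab = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    s_aa = sum((x - a_mean) ** 2 for x in a)
    s_bb = sum((y - b_mean) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [1.0, 2.0, 3.0, 4.0]
x2_correlated = [1.0, 2.0, 3.0, 5.0]      # nearly a copy of x1
x2_uncorrelated = [1.0, -1.0, -1.0, 1.0]  # zero correlation with x1

print(vif_two_predictors(x1, x2_correlated))
print(vif_two_predictors(x1, x2_uncorrelated))
```

A VIF of 1 means no inflation; a common rule of thumb treats values above roughly 5 to 10 as a sign of problematic multicollinearity.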

---

### ✅ **Advanced Level**

| Question                                                 | Answer                                                                                                                 |
| -------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| How do you handle multicollinearity?                     | Remove correlated variables, apply dimensionality reduction (e.g., PCA), or use regularization (Ridge/Lasso).          |
| What metrics evaluate the model’s performance?           | $R^2$, Adjusted $R^2$, RMSE, MAE.                                                                                      |
| What is the difference between $R^2$ and Adjusted $R^2$? | Adjusted $R^2$ penalizes the addition of irrelevant variables, giving a more reliable measure for multiple predictors. |


## 📘 **Theory: Performance Metrics in Machine Learning**

**Definition:**
Performance metrics are quantitative measures used to evaluate how well a machine learning model performs on unseen data.

* The choice of metric depends on the type of problem: **Regression** or **Classification**.
* Proper evaluation ensures the model generalizes well and is not overfitting.

---

### 🔹 **Performance Metrics for Regression**

| Metric                             | Formula                                           | Range          | Interpretation                                                  |
| ---------------------------------- | ------------------------------------------------- | -------------- | --------------------------------------------------------------- |
| **Mean Squared Error (MSE)**       | $\frac{1}{n} \sum (\hat{y} - y)^2$                | $[0, \infty)$  | Lower is better; penalizes large errors heavily.                |
| **Root Mean Squared Error (RMSE)** | $\sqrt{\frac{1}{n} \sum (\hat{y} - y)^2}$         | $[0, \infty)$  | Same as MSE but in original units of $y$.                       |
| **Mean Absolute Error (MAE)**      | $\frac{1}{n} \sum \lvert \hat{y} - y \rvert$      | $[0, \infty)$  | Less sensitive to outliers than MSE.                            |
| **R-Squared ($R^2$)**              | $1 - \frac{SS_{res}}{SS_{tot}}$                   | $(-\infty, 1]$ | Proportion of variance explained; closer to 1 is better.        |
| **Adjusted $R^2$**                 | $1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$          | $(-\infty, 1]$ | Adjusts $R^2$ for the number of predictors $p$ (with $n$ samples) to avoid overestimation. |
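
The regression metrics above can be computed together from a list of true values and predictions. The following is a minimal sketch with made-up numbers; `regression_metrics` and its `n_predictors` argument are hypothetical names.

```python
import math

def regression_metrics(y_true, y_pred, n_predictors):
    """Return MSE, RMSE, MAE, R^2 and adjusted R^2 as a dict."""
    n = len(y_true)
    residuals = [yp - yt for yt, yp in zip(y_true, y_pred)]
    ss_res = sum(r ** 2 for r in residuals)
    y_mean = sum(y_true) / n
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1),
    }

metrics = regression_metrics(
    y_true=[3.0, 5.0, 7.0, 9.0],
    y_pred=[2.0, 5.0, 8.0, 9.0],
    n_predictors=1,
)
print(metrics)
```

Adjusted $R^2$ comes out lower than $R^2$ here because the formula charges a penalty for each predictor relative to the sample size.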

---

### 🔹 **Performance Metrics for Classification**

| Metric                   | Formula                                                             | Range         | Interpretation                                                 |
| ------------------------ | ------------------------------------------------------------------- | ------------- | -------------------------------------------------------------- |
| **Accuracy**             | $\frac{TP + TN}{TP + TN + FP + FN}$                                 | $[0, 1]$      | Proportion of correctly classified instances.                  |
| **Precision**            | $\frac{TP}{TP + FP}$                                                | $[0, 1]$      | Of predicted positives, how many are correct.                  |
| **Recall (Sensitivity)** | $\frac{TP}{TP + FN}$                                                | $[0, 1]$      | Of actual positives, how many are correctly identified.        |
| **F1-Score**             | $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$         | $[0, 1]$      | Harmonic mean of precision and recall.                         |
| **Specificity**          | $\frac{TN}{TN + FP}$                                                | $[0, 1]$      | Ability to correctly identify negatives.                       |
| **ROC-AUC**              | Area under ROC curve                                                | $[0, 1]$      | Higher is better; 0.5 corresponds to random ranking.           |
| **Log Loss**             | $-\frac{1}{n} \sum [ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) ]$ | $[0, \infty)$ | Measures accuracy of probability predictions; lower is better. |

---

### 🔹 **Confusion Matrix (for Classification)**

|                     | Predicted Positive  | Predicted Negative  |
| ------------------- | ------------------- | ------------------- |
| **Actual Positive** | True Positive (TP)  | False Negative (FN) |
| **Actual Negative** | False Positive (FP) | True Negative (TN)  |
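
To make the four cells concrete, the sketch below counts them from two label lists and derives the classification metrics defined earlier. The labels are made up, and `confusion_counts` and `classification_metrics` are hypothetical helper names; returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics; 0.0 is returned when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
scores = classification_metrics(*counts)
print(counts, scores)
```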

---

### 🔹 **Special Metrics (for Imbalanced Data)**

* **Precision-Recall Curve:** Focuses on performance with imbalanced classes.
* **Fβ-Score:** Weighted F-score giving more importance to either precision or recall.
* **Matthews Correlation Coefficient (MCC):** Balanced metric even for skewed datasets.
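
Two of these metrics have compact formulas that are easy to sketch: $F_\beta = (1 + \beta^2) \cdot \frac{P \cdot R}{\beta^2 P + R}$ and the standard MCC expression over the confusion-matrix counts. The function names below are hypothetical, and returning 0.0 for a zero denominator is an assumed convention.

```python
import math

def f_beta(precision, recall, beta=1.0):
    """beta > 1 weights recall more heavily; beta < 1 favors precision."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

print(f_beta(0.5, 1.0, beta=2.0))
print(mcc(tp=3, fp=1, fn=1, tn=3))
```

With `beta=1.0`, `f_beta` reduces to the ordinary F1-score.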

---

## 🎯 **Interview Insights**

### ✅ **Basic Level**

| Question                                          | Answer                                                                |
| ------------------------------------------------- | --------------------------------------------------------------------- |
| What metric do you use for regression?            | Common metrics: MSE, RMSE, MAE, $R^2$.                                |
| What metric do you use for binary classification? | Accuracy, Precision, Recall, F1-Score, ROC-AUC.                       |
| What is a confusion matrix?                       | A table showing correct and incorrect predictions for classification. |

---

### ✅ **Intermediate Level**

| Question                                      | Answer                                                                                    |
| --------------------------------------------- | ----------------------------------------------------------------------------------------- |
| Why is accuracy not always a good metric?     | In imbalanced datasets, accuracy can be misleading because it ignores class distribution. |
| When would you prefer F1-score over accuracy? | When false positives and false negatives are equally important and data is imbalanced.    |
| What does ROC-AUC measure?                    | The ability of a model to distinguish between classes at different thresholds.            |

---

### ✅ **Advanced Level**

| Question                                            | Answer                                                                                                                             |
| --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| What’s the difference between precision and recall? | Precision focuses on correctness of positive predictions, while recall measures coverage of actual positives.                      |
| Why use adjusted $R^2$ instead of $R^2$?            | Adjusted $R^2$ accounts for number of predictors, preventing artificial inflation.                                                 |
| How do you choose a metric for a business problem?  | Based on the cost of misclassification errors and project goals (e.g., recall in medical diagnosis, precision in spam filtering). |


## 📘 **Theory: Overfitting and Underfitting in Machine Learning**

**Definition:**
Overfitting and underfitting are two common problems in model training that affect a model’s ability to generalize to unseen data.

---

### 🔹 **Overfitting**

| Aspect         | Description                                                                                                            |
| -------------- | ---------------------------------------------------------------------------------------------------------------------- |
| **Definition** | When a model learns the training data too well, including noise and outliers, leading to poor performance on new data. |
| **Cause**      | Model is too complex (too many parameters or features).                                                                |
| **Symptoms**   | High training accuracy, low test accuracy.                                                                             |
| **Example**    | A decision tree grown without pruning that memorizes training data.                                                    |

---

### 🔹 **Underfitting**

| Aspect         | Description                                                                                                                          |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| **Definition** | When a model is too simple to capture underlying patterns in the data, resulting in poor performance on both training and test data. |
| **Cause**      | Model lacks complexity or is improperly trained.                                                                                     |
| **Symptoms**   | Low training accuracy and low test accuracy.                                                                                         |
| **Example**    | Using a linear model to fit non-linear data.                                                                                         |

---

### 🔹 **Bias-Variance Trade-off**

| Term         | Description                                                                          |
| ------------ | ------------------------------------------------------------------------------------ |
| **Bias**     | Error due to overly simplistic assumptions (underfitting).                           |
| **Variance** | Error due to model sensitivity to small fluctuations in training data (overfitting). |
| **Goal**     | Choose the model complexity that minimizes total error; reducing bias usually raises variance, and vice versa. |
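
For squared-error loss, the trade-off is often written as a decomposition of the expected prediction error at a point $x$. The notation is an assumption introduced here: the data follow $y = f(x) + \epsilon$ with noise variance $\sigma^2$, and $\hat{f}$ is the model fitted on a random training set.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

The noise term cannot be reduced by any model, so tuning complexity only shifts error between the bias and variance terms.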

---

### 🔹 **Techniques to Handle Overfitting**

| Technique                 | Description                                             |
| ------------------------- | ------------------------------------------------------- |
| Cross-Validation          | Evaluate on held-out folds to detect overfitting and tune hyperparameters. |
| Regularization (L1/L2)    | Penalize large coefficients to simplify the model.      |
| Pruning (for trees)       | Limit depth or remove unnecessary branches.             |
| Early Stopping            | Stop training before the model starts memorizing noise. |
| Dropout (for neural nets) | Randomly drop neurons during training.                  |
| Reduce Features           | Remove irrelevant or highly correlated features.        |
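
To illustrate how an L2 penalty shrinks coefficients, the sketch below uses the single-feature case, where minimizing the sum of squared residuals plus $\lambda \beta_1^2$ (intercept left unpenalized) gives $\beta_1 = S_{xy} / (S_{xx} + \lambda)$. The helper name `ridge_slope` and the data are made up for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Slope of a one-feature ridge fit with penalty lam * beta1**2."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    # lam = 0 recovers the ordinary least squares slope.
    return s_xy / (s_xx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
for lam in (0.0, 5.0, 50.0):
    print(lam, ridge_slope(xs, ys, lam))
```

As `lam` grows the slope moves toward zero, which is the sense in which regularization simplifies the model.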

---

### 🔹 **Techniques to Handle Underfitting**

| Technique             | Description                                                  |
| --------------------- | ------------------------------------------------------------ |
| Add Features          | Include more informative predictors.                         |
| Use Complex Models    | Choose models with higher capacity (e.g., ensemble methods). |
| Reduce Regularization | Loosen constraints on parameters.                            |
| Train Longer          | Allow model to learn more patterns from data.                |

---

## 🎯 **Interview Insights**

### ✅ **Basic Level**

| Question                       | Answer                                                                         |
| ------------------------------ | ------------------------------------------------------------------------------ |
| What is overfitting?           | It’s when a model memorizes training data and fails to generalize to new data. |
| What is underfitting?          | It’s when a model is too simple and fails to capture data patterns.            |
| How do you detect overfitting? | Compare training and validation accuracy; large gap indicates overfitting.     |

---

### ✅ **Intermediate Level**

| Question                                    | Answer                                                                                                           |
| ------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| Explain the bias-variance trade-off.        | High bias leads to underfitting; high variance leads to overfitting. The trade-off seeks optimal generalization. |
| What techniques prevent overfitting?        | Regularization, pruning, dropout, and early stopping; cross-validation helps detect it and tune these choices.   |
| How does regularization reduce overfitting? | It adds a penalty term to the cost function that discourages large coefficients, which limits model complexity.  |
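
To make the penalty term concrete, here is a minimal sketch of ridge (L2) regression using its closed-form solution. It assumes no intercept and synthetic data, and `ridge_fit` is a hypothetical helper written for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 1.0])  # assumed for the demo
y = X @ true_beta + rng.normal(scale=0.5, size=50)

def ridge_fit(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||b||^2 via the closed-form solution."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# A larger penalty weight pulls the coefficient vector toward zero.
for lam in (0.0, 10.0, 100.0):
    beta = ridge_fit(X, y, lam)
    print(f"lambda={lam:6.1f}  ||beta|| = {np.linalg.norm(beta):.3f}")
```

With `lam = 0` this reduces to ordinary least squares; increasing `lam` trades a little training error for smaller coefficients.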

---

### ✅ **Advanced Level**

| Question                                                                        | Answer                                                                                                                      |
| ------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| Why does adding more features sometimes lead to overfitting?                    | Because the model becomes more complex, capturing noise along with patterns.                                                |
| How would you handle overfitting in a neural network?                           | Use dropout, early stopping, and data augmentation.                                                                         |
| Can you explain a scenario where both overfitting and underfitting might occur? | During training, a model can underfit in early epochs and then overfit if training continues without regularization.        |


## 📘 **Theory: Polynomial Linear Regression**

**Definition:**
Polynomial Linear Regression is an extension of simple and multiple linear regression where the relationship between the independent variable(s) and the dependent variable is modeled as an **nth-degree polynomial**.

* It is still a **linear model** because the prediction is linear in the coefficients $\beta_i$; only the features are transformed into polynomial terms.

---

### 🔹 **Model Equation**

For a single variable $x$ and polynomial degree $n$:

$$
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \dots + \beta_n x^n + \epsilon
$$

| Term                   | Meaning                             |
| ---------------------- | ----------------------------------- |
| $y$                    | Dependent (target) variable         |
| $x$                    | Independent variable                |
| $x^2, x^3, \dots, x^n$ | Polynomial features (powers of $x$) |
| $\beta_i$              | Coefficients for each term          |
| $\epsilon$             | Error term                          |
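
The equation can be read as ordinary least squares on the columns $1, x, x^2, \dots, x^n$. The sketch below builds those columns with NumPy and recovers a set of coefficients; the values in `true_beta` are hypothetical, and the data is noise-free so that $\epsilon = 0$.

```python
import numpy as np

# Hypothetical coefficients beta_0..beta_3 chosen for the demo.
true_beta = np.array([1.0, 2.0, -3.0, 0.5])

x = np.linspace(-2, 2, 30)
# Columns are [1, x, x^2, x^3], one per term in the model equation.
X_poly = np.vander(x, N=true_beta.size, increasing=True)
y = X_poly @ true_beta  # noise-free, so epsilon = 0 here

# Ordinary least squares on the polynomial columns.
beta_hat, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(np.round(beta_hat, 6))
```

Because the model is linear in $\beta$, a standard linear solver is enough even though the fitted curve is a cubic in $x$.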

---

### 🔹 **How It Works**

| Step | Description                                                                   |
| ---- | ----------------------------------------------------------------------------- |
| 1    | Transform original feature(s) into polynomial features (e.g., $x^2, x^3$).    |
| 2    | Apply linear regression on transformed features.                              |
| 3    | Fit a curve (instead of a straight line) to capture non-linear relationships. |
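
The three steps map onto a short scikit-learn pipeline. This is a sketch under assumed toy data (a noisy quadratic in one feature); the degree and noise level are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data (assumed): a noisy quadratic in a single feature.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.2, size=100)

# Step 1: expand x into [x, x^2]. Step 2: fit linear regression on those columns.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(X, y)

# Step 3: evaluated over a grid of x values, the fitted model traces a curve.
x_grid = np.linspace(-3, 3, 7).reshape(-1, 1)
print(np.round(model.predict(x_grid), 2))
print("R^2 on training data:", round(model.score(X, y), 3))
```

`include_bias=False` is used because `LinearRegression` already fits an intercept, so a separate constant column is not needed.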

---

### 🔹 **Advantages and Disadvantages**

| Advantages                                       | Disadvantages                                   |
| ------------------------------------------------ | ----------------------------------------------- |
| Captures non-linear patterns easily              | High-degree polynomials may lead to overfitting |
| Simple to implement with linear regression tools | Sensitive to outliers                           |
| Provides flexibility in curve fitting            | Term count and cost grow quickly with degree    |

---

### 🔹 **Overfitting Risk**

* Higher polynomial degrees can produce a curve that follows the training data, including its noise, very closely (overfitting).
* Regularization (Ridge, Lasso) can help control complexity.

---

### 🔹 **Use Cases**

* Modeling growth curves (population, sales trends)
* Capturing non-linear trends in time series
* Engineering applications where relationships are polynomial

---

## 🎯 **Interview Insights**

### ✅ **Basic Level**

| Question                                       | Answer                                                                                                      |
| ---------------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| What is polynomial regression?                 | It’s a regression technique where the model fits a polynomial equation to capture non-linear relationships. |
| Is polynomial regression linear or non-linear? | It is linear in the coefficients $\beta$ (so a linear model), but non-linear in the input $x$.              |

---

### ✅ **Intermediate Level**

| Question                                                | Answer                                                                                                                       |
| ------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| How do you implement polynomial regression in practice? | Transform features using polynomial terms (e.g., via `PolynomialFeatures` in scikit-learn) and then apply linear regression. |
| What happens when you increase polynomial degree?       | Model flexibility increases, but risk of overfitting also rises.                                                             |
| How can you choose the right degree of the polynomial?  | Use cross-validation to determine the optimal complexity.                                                                    |
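
Degree selection by cross-validation can be sketched with plain NumPy. The cubic ground truth, the 5-fold split, and the candidate range of degrees are all assumptions for this example; `cv_mse` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(-2, 2, 60)
y = x**3 - x + rng.normal(scale=0.1, size=x.size)  # assumed cubic ground truth

def cv_mse(degree, k=5):
    """Mean validation MSE of a polynomial of the given degree over k folds."""
    folds = np.array_split(np.arange(x.size), k)  # x is already in random order
    errors = []
    for val_idx in folds:
        train_mask = np.ones(x.size, dtype=bool)
        train_mask[val_idx] = False
        coeffs = np.polyfit(x[train_mask], y[train_mask], degree)
        pred = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((y[val_idx] - pred) ** 2))
    return float(np.mean(errors))

cv_errors = {d: cv_mse(d) for d in range(1, 9)}
best_degree = min(cv_errors, key=cv_errors.get)

for d, err in cv_errors.items():
    print(f"degree {d}: CV MSE = {err:.4f}")
print("selected degree:", best_degree)
```

Degrees below the true one should show a much higher validation error, while degrees above it tend to bring little or no improvement, which is why the lowest-error (or simplest near-lowest) degree is usually preferred.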

---

### ✅ **Advanced Level**

| Question                                                                                     | Answer                                                                                                                                  |
| -------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| Why might polynomial regression perform poorly on extrapolation?                             | Because high-degree polynomials can produce extreme values outside the training range.                                                  |
| How do regularization methods (Ridge/Lasso) help polynomial regression?                      | They penalize large coefficients, reducing overfitting while still modeling non-linearity.                                              |
| How is polynomial regression different from using non-linear algorithms like decision trees? | Polynomial regression assumes a parametric polynomial relationship, while decision trees capture non-linearity in a non-parametric way. |
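
The extrapolation point in the first row can be checked numerically. This sketch assumes a degree-7 polynomial fitted to noise-free samples of $\sin(x)$ on $[-3, 3]$, then compares its error inside and outside that range.

```python
import numpy as np

# Assumed setup: learn sin(x) from noise-free samples on [-3, 3] only.
x_train = np.linspace(-3, 3, 50)
coeffs = np.polyfit(x_train, np.sin(x_train), 7)

def max_abs_error(x):
    """Largest gap between the fitted polynomial and sin(x) on the given points."""
    return float(np.max(np.abs(np.polyval(coeffs, x) - np.sin(x))))

x_inside = np.linspace(-3, 3, 200)     # interpolation: within the training range
x_outside = np.linspace(3.5, 10, 200)  # extrapolation: beyond the training range

print(f"max error inside  [-3, 3]:  {max_abs_error(x_inside):.4f}")
print(f"max error outside [3.5, 10]: {max_abs_error(x_outside):.1f}")
```

The target stays between -1 and 1 everywhere, but the polynomial's highest-power term dominates away from the training data, so the error outside the range is far larger than inside it.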
