# QUIZ: REGRESSION
---

## Q1. What is the primary goal of linear regression?
1. Minimize the residual sum of squares between observed and predicted values 
2. Maximize the number of predictors 
3. Find the correlation between variables 
4. Reduce the number of features

The correct answer is: **1. Minimize the residual sum of squares between observed and predicted values** ✅

Linear regression’s main goal is to find the best-fit line that **minimizes the sum of squared differences** (residuals) between the actual values and the values predicted by the model. This is called the **Ordinary Least Squares (OLS)** method.
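
As a rough illustration of what "minimizing the residual sum of squares" means, the sketch below fits a one-feature line with the closed-form OLS solution. The data values and the helper names `fit_simple_ols` and `rss` are made up for this example.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) that minimize the residual sum of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rss(xs, ys, intercept, slope):
    """Residual sum of squares for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(xs, ys)
best_rss = rss(xs, ys, b0, b1)
```

Nudging either `b0` or `b1` away from the fitted values can only increase `rss`, which is the sense in which the OLS line is the "best fit".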


## Q2. What does the Elastic Net parameter alpha control? 
1. The weight of L1 and L2 regularization 
2. The learning rate 
3. The number of iterations 
4. The size of the dataset

The correct answer is: **1. The weight of L1 and L2 regularization** ✅

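In **Elastic Net regression**, the parameter **alpha** controls how much **L1 regularization (Lasso)** and **L2 regularization (Ridge)** are mixed. This naming follows the glmnet convention; in scikit-learn the mixing parameter is called `l1_ratio`, and `alpha` is the overall penalty strength.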

* **Alpha = 1** → Pure Lasso (L1)
* **Alpha = 0** → Pure Ridge (L2)
* **Between 0 and 1** → A combination of both.
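
The mixing behaviour can be sketched with a simplified penalty function. The names `strength` and `l1_share` are hypothetical stand-ins for the overall penalty weight and the mixing parameter, and the exact scaling of each term differs between libraries.

```python
def elastic_net_penalty(coefs, strength, l1_share):
    """Simplified Elastic Net penalty.

    penalty = strength * (l1_share * sum|b| + (1 - l1_share) * sum b^2)
    Real libraries scale the two terms slightly differently.
    """
    l1_term = sum(abs(b) for b in coefs)
    l2_term = sum(b * b for b in coefs)
    return strength * (l1_share * l1_term + (1.0 - l1_share) * l2_term)


coefs = [3.0, -4.0]  # illustrative coefficients

pure_lasso = elastic_net_penalty(coefs, strength=1.0, l1_share=1.0)  # only the L1 term
pure_ridge = elastic_net_penalty(coefs, strength=1.0, l1_share=0.0)  # only the L2 term
blended = elastic_net_penalty(coefs, strength=1.0, l1_share=0.5)     # half of each
```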


## Q3. What is the Elastic Net method? 
1. A combination of L1 and L2 regularization 
2. A type of Ridge regression 
3. A method to maximize the cost function 
4. A method for unsupervised learning

The correct answer is: **1. A combination of L1 and L2 regularization** ✅

**Elastic Net** combines:

* **L1 regularization (Lasso)** → helps with feature selection (drives some coefficients to zero)
* **L2 regularization (Ridge)** → helps with multicollinearity and stabilizing coefficients

This hybrid approach is especially useful when there are **many correlated features**.


## Q4. What is a key feature of Lasso Regression that distinguishes it from Ridge Regression? 
1. It increases the coefficients 
2. It can shrink some coefficients to zero, effectively performing feature selection 
3. It penalizes the square of the coefficients 
4. It uses both L1 and L2 regularization

The correct answer is: **2. It can shrink some coefficients to zero, effectively performing feature selection** ✅

**Lasso Regression** uses **L1 regularization**, which can force some coefficients to become exactly zero — meaning those features are effectively removed from the model. This is the main distinction from **Ridge Regression**, which uses **L2 regularization** and only shrinks coefficients toward zero but never exactly to zero.
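
The difference is easiest to see in a simplified special case. Assuming an orthonormal design matrix and the objectives ½‖y − Xβ‖² + λ‖β‖₁ (Lasso) and ½‖y − Xβ‖² + (λ/2)‖β‖² (Ridge), each penalized coefficient is a simple function of the corresponding OLS coefficient. The sketch below applies those two shrinkage rules to made-up OLS coefficients.

```python
def lasso_shrink(b, lam):
    """Soft-thresholding: move b toward zero by lam, stopping at exactly zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_shrink(b, lam):
    """Proportional shrinkage: scales b down but never reaches zero for b != 0."""
    return b / (1.0 + lam)


ols_coefs = [4.0, -0.5, 0.2]  # illustrative OLS estimates
lam = 1.0

lasso_coefs = [lasso_shrink(b, lam) for b in ols_coefs]
ridge_coefs = [ridge_shrink(b, lam) for b in ols_coefs]
```

With these inputs the two small coefficients become exactly zero under the Lasso rule, while the Ridge rule keeps all three nonzero.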


## Q5. How is the Ridge parameter typically chosen? 
1. By trial and error 
2. Using cross-validation 
3. By setting it to zero 
4. Based on the number of features

The correct answer is: **2. Using cross-validation** ✅

In **Ridge Regression**, the regularization strength parameter (**λ** or **alpha**) is usually chosen through **cross-validation**, where multiple values are tested, and the one giving the best model performance on validation data is selected.
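
A minimal sketch of that selection loop is shown below, assuming a one-feature ridge model with no intercept (whose slope has the closed form Σxy / (Σx² + λ)) and a simple interleaved 3-fold split. The data, the candidate grid, and the helper names are all illustrative.

```python
def fit_ridge_1d(xs, ys, lam):
    """Slope of a no-intercept, one-feature ridge fit."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k=3):
    """Average validation MSE over k folds (fold i holds out indices j with j % k == i)."""
    fold_errors = []
    for i in range(k):
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j % k != i]
        valid = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j % k == i]
        slope = fit_ridge_1d([x for x, _ in train], [y for _, y in train], lam)
        fold_errors.append(sum((y - slope * x) ** 2 for x, y in valid) / len(valid))
    return sum(fold_errors) / k


# Illustrative data and candidate penalty strengths
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]

scores = {lam: cv_mse(xs, ys, lam) for lam in candidates}
best_lam = min(scores, key=scores.get)  # candidate with the lowest validation error
```

In practice the folds are usually shuffled and the grid is often spaced logarithmically; this sketch keeps both fixed so the result is reproducible.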


## Q6. What is the main difference between Ridge Regression and OLS (Ordinary Least Squares)?
1. Ridge Regression minimizes the sum of squared residuals 
2. Ridge Regression adds a penalty for large coefficients 
3. Ridge Regression maximizes the cost function 
4. Ridge Regression removes irrelevant features

The correct answer is: **2. Ridge Regression adds a penalty for large coefficients** ✅

* **OLS**: Minimizes only the **sum of squared residuals**.
* **Ridge Regression**: Minimizes the **sum of squared residuals + L2 penalty term** (which is the sum of squared coefficients).
  This penalty discourages very large coefficient values, helping to reduce overfitting.
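
Written side by side, the two objectives look as follows. This assumes $n$ observations, $p$ predictors, fitted values $\hat{y}_i$, and a penalty strength $\lambda \ge 0$; the intercept is conventionally left out of the penalty.

$$
\text{OLS:}\quad \min_{\beta} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2
\qquad\qquad
\text{Ridge:}\quad \min_{\beta} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
$$

Setting $\lambda = 0$ recovers OLS, and larger values of $\lambda$ shrink the coefficients more strongly.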


## Q7. What is the main purpose of regularization in linear models? 
1. Increase the complexity of the model 
2. Prevent overfitting by penalizing large coefficients 
3. Improve the model's speed 
4. Ensure perfect accuracy on the training data

The correct answer is: **2. Prevent overfitting by penalizing large coefficients** ✅

Regularization (like **L1**, **L2**, or **Elastic Net**) discourages overly complex models by adding a penalty term to the loss function, which shrinks large coefficients. This helps the model generalize better to **unseen data** instead of just memorizing the training set.


## Q8. Which metric is least sensitive to outliers? 
1. RMSE 
2. MSE 
3. MAE
4. R Square

The correct answer is: **3. MAE (Mean Absolute Error)** ✅

* **MSE** and **RMSE** square the errors, so large errors (outliers) have a much bigger impact.
* **MAE** takes the absolute value of errors, so each error contributes in proportion to its size rather than its square, making it **less sensitive to outliers**.
* **R²** isn’t a direct error metric and can still be affected by outliers through variance changes.
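
A small numeric comparison makes the difference concrete. The values below are made up: both prediction sets are off by 1 everywhere, except that the second one has a single error of 20.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


actual = [10.0, 12.0, 14.0, 16.0, 18.0]
clean_pred = [11.0, 11.0, 15.0, 15.0, 19.0]    # every error has magnitude 1
outlier_pred = [11.0, 11.0, 15.0, 15.0, 38.0]  # last error has magnitude 20

clean = (mae(actual, clean_pred), mse(actual, clean_pred), rmse(actual, clean_pred))
with_outlier = (mae(actual, outlier_pred), mse(actual, outlier_pred), rmse(actual, outlier_pred))
```

All three metrics equal 1.0 on the clean predictions. With the single outlier, MAE rises to 4.8, RMSE to roughly 9, and MSE to 80.8.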


## Q9. Which of the following is an assumption of linear regression? 
1. Homoscedasticity 
2. Heteroscedasticity 
3. Multicollinearity 
4. Non-linearity

The correct answer is: **1. Homoscedasticity** ✅

**Homoscedasticity** means that the variance of the residuals (errors) is constant across all levels of the independent variable(s).
It’s a core assumption of linear regression, along with:

* Linearity
* Independence of errors
* Normally distributed errors
* No perfect multicollinearity


## Q10. What does the slope coefficient in simple linear regression represent? 
1. The intercept of the regression line 
2. The change in the dependent variable for a unit change in the independent variable 
3. The error term 
4. The mean of the dependent variable

The correct answer is: **2. The change in the dependent variable for a unit change in the independent variable** ✅

In **simple linear regression**:

$$
Y = \beta_0 + \beta_1 X + \epsilon
$$

* **$\beta_1$** (slope) tells us how much **Y** changes, on average, when **X** increases by 1 unit.
* **$\beta_0$** is the intercept, **$\epsilon$** is the error term.


## Q11. What is the objective of gradient descent? 
1. Maximize the cost function 
2. Find the local minima of the cost function 
3. Increase the learning rate 
4. Perform feature scaling

The correct answer is: **2. Find the local minima of the cost function** ✅

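**Gradient Descent** is an optimization algorithm used to update model parameters (like weights in regression or neural networks) in order to **minimize the cost/loss function**.
It works by moving step by step in the opposite direction of the gradient until it converges to a minimum. For non-convex cost functions this is only guaranteed to be a local minimum; for a convex cost such as the squared error in linear regression, any local minimum is also the global one.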
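
The update rule can be sketched on a toy one-parameter cost, $(w - 3)^2$, whose minimum is known to be at $w = 3$. The learning rate and step count are arbitrary illustrative choices.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


def cost(w):
    """Toy convex cost with its minimum at w = 3."""
    return (w - 3.0) ** 2


def cost_grad(w):
    """Derivative of the toy cost."""
    return 2.0 * (w - 3.0)


w_min = gradient_descent(cost_grad, start=0.0)
```

Each step here shrinks the distance to the minimum by a constant factor; a learning rate that is too large would instead make the iterates overshoot and diverge.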


## Q12. Which type of gradient descent uses the entire dataset to compute the gradient?
1. Stochastic Gradient Descent 
2. Mini-batch Gradient Descent 
3. Batch Gradient Descent 
4. Adam Optimizer

The correct answer is: **3. Batch Gradient Descent** ✅

* **Batch Gradient Descent** → Uses the **entire dataset** to compute the gradient before each update.
* **Stochastic Gradient Descent (SGD)** → Uses **one sample at a time**.
* **Mini-batch Gradient Descent** → Uses a **small subset of the dataset** for each update.
* **Adam Optimizer** → An advanced optimization algorithm that adapts learning rates.
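
The first three variants differ only in how many samples feed each update, which the sketch below exposes as a `batch_size` argument. It fits $y = b_0 + b_1 x$ by gradient descent on the mean squared error; the noiseless data, learning rate, epoch count, and function name are assumptions made for the example.

```python
import random


def minibatch_gd(xs, ys, batch_size, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = b0 + b1 * x by gradient descent on mean squared error.

    batch_size == len(xs) -> batch gradient descent
    batch_size == 1       -> stochastic gradient descent
    anything in between   -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    b0, b1 = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the batch MSE with respect to b0 and b1
            errors = [(b0 + b1 * xs[i]) - ys[i] for i in batch]
            g0 = 2.0 * sum(errors) / len(batch)
            g1 = 2.0 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            b0 -= learning_rate * g0
            b1 -= learning_rate * g1
    return b0, b1


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x, so every variant can recover it

batch_fit = minibatch_gd(xs, ys, batch_size=len(xs))
mini_fit = minibatch_gd(xs, ys, batch_size=2)
sgd_fit = minibatch_gd(xs, ys, batch_size=1)
```

Because the data are noiseless, all three settings end up close to intercept 1 and slope 2. On noisy data the smaller batch sizes would keep fluctuating around the minimum unless the learning rate is decayed.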


## Q13. What is the primary advantage of Stochastic Gradient Descent (SGD) over Batch Gradient Descent? 
1. It converges faster 
2. It is more accurate 
3. It uses the entire dataset for updates 
4. It always finds the global minimum

The correct answer is: **1. It converges faster** ✅

**Stochastic Gradient Descent (SGD)** updates parameters **after each training example**, which:

* Makes it faster for large datasets (more frequent updates).
* Allows it to start improving the model without waiting to process the whole dataset.

However, it introduces more noise in updates, which can sometimes help escape local minima but also causes more fluctuations.


## Q14. What does multiple linear regression involve?
1. One dependent variable and one independent variable
2. Multiple dependent variables
3. One dependent variable and multiple independent variables
4. Multiple independent variables and dependent variables

The correct answer is: **3. One dependent variable and multiple independent variables** ✅

* **Simple Linear Regression** → 1 dependent variable, 1 independent variable
* **Multiple Linear Regression** → 1 dependent variable, **2 or more** independent variables
* **Multivariate Regression** → **Multiple dependent variables**


## Q15. What does the p-value of a coefficient in multiple linear regression indicate? 
1. The strength of correlation between variables 
2. The probability that the coefficient is zero 
3. The effect size of the coefficient 
4. The residual error

The correct answer is: **2. The probability that the coefficient is zero** ✅

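In multiple linear regression, the **p-value** for a coefficient tests the **null hypothesis** that the coefficient = 0 (no effect). Strictly speaking, it is the probability of seeing an estimate at least as extreme as the one obtained *if the true coefficient were zero*, not the probability that the coefficient itself is zero; option 2 is the closest of the choices given.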

* **Low p-value (< 0.05)** → Reject the null → The predictor is statistically significant.
* **High p-value** → Fail to reject the null → The predictor may not be contributing significantly.


## Q16. Which technique is commonly used for feature selection in multiple linear regression?
1. Ridge Regression 
2. Recursive Feature Elimination (RFE) 
3. K-Means Clustering 
4. Random Forest

The correct answer is: **2. Recursive Feature Elimination (RFE)** ✅

**Recursive Feature Elimination** works by:

1. Fitting the model
2. Ranking features by importance
3. Removing the least important features
4. Repeating the process until the desired number of features remains

It’s a common method for **feature selection** in regression and classification tasks.


## Q17. What is the primary difference between linear and polynomial regression? 
1. The type of data used 
2. The degree of the polynomial used to fit the data 
3. The cost function 
4. The optimization technique

The correct answer is: **2. The degree of the polynomial used to fit the data** ✅

* **Linear Regression** → Fits a straight line ($y = \beta_0 + \beta_1 x$)
* **Polynomial Regression** → Fits a curve by including higher-degree terms ($x^2, x^3, \dots$) in the model, while still being **linear in the coefficients**.
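
"Linear in the coefficients" means the prediction is still a weighted sum, just over powers of $x$ instead of $x$ alone. A minimal sketch, with hypothetical coefficient values and helper names:

```python
def polynomial_features(x, degree):
    """Expand a scalar x into [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]


def predict(coefs, x):
    """Weighted sum of polynomial features; coefs[d] multiplies x^d."""
    features = polynomial_features(x, len(coefs) - 1)
    return sum(c * f for c, f in zip(coefs, features))


coefs = [1.0, 0.0, 2.0]  # hypothetical model: y = 1 + 0*x + 2*x^2

y_hat = predict(coefs, 3.0)
features_at_2 = polynomial_features(2.0, 3)
```

Once the features are expanded this way, the coefficients can be estimated with the same least-squares machinery used for ordinary linear regression.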


## Q18. When choosing the degree of polynomial in regression, increasing the degree too much can lead to:
1. Underfitting 
2. Overfitting 
3. Reducing bias 
4. Homoscedasticity

The correct answer is: **2. Overfitting** ✅

When the **degree of the polynomial** is too high, the model can fit the training data **too closely**, capturing noise and fluctuations instead of the true underlying pattern.
This results in:

* Very low training error
* Poor generalization to new/unseen data


## Q19. Which method can help identify the optimal degree of a polynomial model? 
1. Cross-validation 
2. Grid Search 
3. Both 1 and 2
4. Feature Engineering

The correct answer is: **3. Both 1 and 2** ✅

* **Cross-validation** → Tests model performance on multiple splits of the data to find the best degree.
* **Grid Search** → Systematically tries different degree values (and possibly other hyperparameters) to find the optimal one, often combined with cross-validation.


## Q20. What does R Squared represent in regression analysis?
1. The total sum of squares 
2. The proportion of variance in the dependent variable explained by the independent variables 
3. The sum of squared residuals 
4. The model's error

The correct answer is: **2. The proportion of variance in the dependent variable explained by the independent variables** ✅

**R²** tells us how well the model fits the data:

* **R² = 1** → Perfect fit (100% variance explained)
* **R² = 0** → Model explains none of the variance
* Higher R² means better fit, but a very high value can sometimes indicate overfitting if not validated.
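
As a concrete reference, R² can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up values and a hypothetical helper name.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [2.0, 4.0, 6.0, 8.0]

perfect = r_squared(actual, actual)                   # predictions match exactly
baseline = r_squared(actual, [5.0, 5.0, 5.0, 5.0])    # always predicting the mean
reversed_fit = r_squared(actual, [8.0, 6.0, 4.0, 2.0])  # worse than the mean
```

The third case shows that R² computed this way can be negative when predictions are worse than simply predicting the mean, which can happen on held-out data.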


## Q21. A major limitation of R Squared is:
1. It decreases with the addition of predictors 
2. It increases regardless of the predictors' significance 
3. It is affected by multicollinearity 
4. It is always equal to 1

The correct answer is: **2. It increases regardless of the predictors' significance** ✅

A key limitation of **R²** is that it **never decreases** when you add more predictors, even if those predictors have no real relationship with the dependent variable.
That’s why **Adjusted R²** is often preferred — it penalizes adding irrelevant variables.


## Q22. How does Adjusted R Squared improve upon R Squared?
1. By penalizing the addition of irrelevant predictors 
2. By increasing the model complexity 
3. By ensuring the model is not underfitting 
4. By increasing with the number of predictors

The correct answer is: **1. By penalizing the addition of irrelevant predictors** ✅

**Adjusted R²** adjusts the R² value based on the number of predictors and the sample size.

* If a new predictor doesn’t improve the model enough, **Adjusted R² will decrease**.
* This makes it a better metric for comparing models with different numbers of predictors.
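
The usual formula is $1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$, where $n$ is the sample size and $p$ the number of predictors. The sketch below applies it to two hypothetical models to show the penalty at work.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical scenario: a sixth predictor nudges R^2 from 0.800 to 0.801
five_predictors = adjusted_r_squared(0.800, n_samples=50, n_predictors=5)
six_predictors = adjusted_r_squared(0.801, n_samples=50, n_predictors=6)
```

Plain R² went up slightly in this scenario, but the adjusted value goes down, signalling that the extra predictor did not earn its place.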


## Q23. Which metric is most sensitive to outliers? 
1. RMSE 
2. MSE 
3. MAE 
4. All are equally sensitive

The correct answer is: **2. MSE (Mean Squared Error)** ✅

Because **MSE** squares the errors, **large errors from outliers become disproportionately larger**, so a single outlier can dominate the reported value.

* **RMSE** is the square root of MSE, so it ranks models identically; the root brings the value back to the target's units, which makes the reported number grow less dramatically than MSE.
* **MAE** is least sensitive since it uses absolute values.


## Q24. What does RMSE measure in regression models? 
1. The square root of the average squared differences between actual and predicted values 
2. The average squared differences between actual and predicted values 
3. The average absolute differences between actual and predicted values 
4. The average difference between actual and predicted values

The correct answer is: **1. The square root of the average squared differences between actual and predicted values** ✅

**RMSE (Root Mean Squared Error)** summarizes the typical size of prediction errors; when the errors average to zero, it equals their standard deviation:

$$
RMSE = \sqrt{\frac{\sum (y_{pred} - y_{actual})^2}{n}}
$$

It’s in the **same units** as the dependent variable, making it easier to interpret compared to MSE.


## Q25. When is MAE preferred over RMSE? 
1. When the data contains significant outliers 
2. When penalizing larger errors is important 
3. When interpreting squared errors is difficult 
4. Both 1 and 3

The correct answer is: **4. Both 1 and 3** ✅

**MAE (Mean Absolute Error)** is preferred when:

* **Option 1** → The dataset has significant outliers (MAE is less sensitive to them).
* **Option 3** → You want a more intuitive interpretation without squaring errors; it directly represents the average absolute error in the same units as the target variable.


## Q26. Which of the following is a benefit of regularization? 
1. It reduces bias 
2. It prevents overfitting 
3. It increases the number of features 
4. It improves underfitting

The correct answer is: **2. It prevents overfitting** ✅

Regularization techniques (**L1**, **L2**, **Elastic Net**) add a penalty to large coefficients, which:

* Reduces model complexity
* Prevents overfitting
* Improves generalization to unseen data


## Q27. L2 regularization in Ridge Regression penalizes the:
1. Absolute value of the coefficients 
2. Square of the coefficients 
3. Sum of the coefficients 
4. Logarithm of the coefficients

The correct answer is: **2. Square of the coefficients** ✅

In **Ridge Regression (L2 regularization)**, the penalty term added to the loss function is:

$$
\lambda \sum_{j=1}^p \beta_j^2
$$

This discourages large coefficient values but doesn’t make them exactly zero.


## Q28. What does Lasso Regression introduce into the cost function? 
1. L1 regularization 
2. L2 regularization 
3. Elastic Net regularization 
4. No regularization

The correct answer is: **1. L1 regularization** ✅

**Lasso Regression** adds an **L1 penalty** to the cost function:

$$
\lambda \sum_{j=1}^p |\beta_j|
$$

This penalty can shrink some coefficients to **exactly zero**, effectively performing **feature selection**.


## Q29. When would you prefer Lasso Regression over Ridge Regression? 
1. When you want to eliminate irrelevant features 
2. When all features are equally important 
3. When there is no multicollinearity 
4. When you want to increase model complexity

The correct answer is: **1. When you want to eliminate irrelevant features** ✅

**Lasso Regression** (L1 regularization) can shrink some coefficients to **exactly zero**, effectively removing irrelevant features from the model — making it useful for **feature selection**.

In contrast, **Ridge Regression** (L2) only shrinks coefficients but keeps all features.


## Q30. Elastic Net is beneficial when:
1. You have a large number of correlated features 
2. You want to avoid regularization 
3. You only have a single feature 
4. You have a small dataset

The correct answer is: **1. You have a large number of correlated features** ✅

**Elastic Net** combines **L1** (feature selection) and **L2** (shrinkage, handles multicollinearity) penalties, making it especially useful when:

* There are **many correlated predictors**
* You want the benefits of both Lasso and Ridge methods in one model
