## Question 1

R-squared (also known as the coefficient of determination) is a statistical measure used in linear regression models to indicate how well the independent variables (predictors) explain the variability in the dependent variable (response). It represents the proportion of the variance in the dependent variable that is predictable from the independent variables.

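Interpretation:

- Range: 0 to 1 for a least-squares fit with an intercept, evaluated on the training data. On held-out data it can be negative when the model predicts worse than the mean of the response.
- 0: The model explains none of the variability.
- 1: The model explains all of the variability.
- Example: If R-squared = 0.8, the model explains 80% of the variation in the response.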

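Formula:

$$R^2 = 1 - \frac{SS_{res}}{SS_{total}}$$

where $SS_{res}$ is the sum of squared residuals and $SS_{total}$ is the total sum of squares of the response around its mean.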
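As a minimal sketch of the formula above, the function below computes R-squared from actual and predicted values using only the standard library. The function name `r_squared` and the toy data are assumptions made for illustration.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_total for two equal-length sequences."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared gaps between actual and predicted values.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: squared gaps between actual values and their mean.
    ss_total = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_total


# Hypothetical toy data, chosen only for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

r2 = r_squared(y_true, y_pred)
print(round(r2, 3))  # SS_res = 0.5, SS_total = 20, so R^2 = 0.975
```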

## Question 2

Adjusted R-squared is a modified version of R-squared that accounts for the number of independent variables (predictors) in a regression model. It adjusts the R-squared value by penalizing models with more predictors, making it a better measure when comparing models with different numbers of variables.

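##### Difference
R-squared measures the proportion of variance explained by the model and never decreases when a predictor is added, even an irrelevant one. Adjusted R-squared penalizes additional predictors and increases only if a new predictor improves the fit by more than would be expected by chance.

$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$

where $n$ is the number of observations and $p$ is the number of predictors.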
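To make the penalty concrete, here is a small sketch of the adjusted R-squared formula. The function name `adjusted_r_squared` and the values of R-squared, `n`, and `p` are hypothetical; the point is only that the same R-squared yields a lower adjusted value when more predictors are used.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R^2 and sample size, different numbers of predictors (illustrative values).
adj_small = adjusted_r_squared(0.80, n=50, p=3)
adj_large = adjusted_r_squared(0.80, n=50, p=10)

print(round(adj_small, 4), round(adj_large, 4))
```

With these assumed inputs, the 10-predictor model is penalized more heavily than the 3-predictor model, and both adjusted values fall below the raw R-squared of 0.80.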

## Question 3

Adjusted R-squared is more appropriate when evaluating the goodness of fit of a regression model with multiple predictive features. Unlike R-squared, it adjusts for the number of features in the model, penalizing models with more features so that adding features alone does not raise the score. Therefore, it provides a more reliable indication of whether a new feature actually improves the model.

## Question 4

1. Root Mean Squared Error (RMSE): It's the square root of the mean of the squared differences between predicted and actual values. RMSE penalizes large errors, providing a measure of the dispersion of errors. Lower values indicate better model performance.

2. Mean Squared Error (MSE): It's the average of the squared differences between predicted and actual values. MSE is sensitive to large errors, making it a good measure for models where large errors are particularly undesirable.

3. Mean Absolute Error (MAE): It's the average of the absolute differences between predicted and actual values. MAE is less sensitive to outliers than MSE or RMSE and directly measures the typical size of an error.

These metrics are calculated as follows:

- RMSE: sqrt(sum((y_pred - y_true)^2) / n)
- MSE: sum((y_pred - y_true)^2) / n
- MAE: sum(abs(y_pred - y_true)) / n

Where y_pred is the predicted value, y_true is the true value, and n is the number of observations.
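The three formulas above can be sketched in plain Python as follows. The function names and the sample values are assumptions for illustration; in practice a library such as scikit-learn provides equivalent metrics.

```python
import math


def mse(y_true, y_pred):
    """Mean of squared differences between predictions and actual values."""
    return sum((yp - yt) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean of absolute differences between predictions and actual values."""
    return sum(abs(yp - yt) for yt, yp in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values; the errors are 1, 0, -2, and 4.
y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [11.0, 12.0, 13.0, 24.0]

print(mse(y_true, y_pred))             # 21 / 4 = 5.25
print(round(rmse(y_true, y_pred), 4))  # sqrt(5.25)
print(mae(y_true, y_pred))             # 7 / 4 = 1.75
```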

## Question 5

Root Mean Square Error (RMSE), Mean Squared Error (MSE), and Mean Absolute Error (MAE) are common metrics for regression analysis.

##### Advantages:

1. RMSE and MSE are sensitive to large errors, making them useful when large errors are particularly undesirable.
2. MAE is easy to interpret: it is the average absolute error, expressed in the same units as the target variable (RMSE shares these units, MSE does not).

##### Disadvantages:

1. RMSE and MSE are sensitive to outliers, which can skew results.
2. MSE is not easily interpretable as it's in squared units.
3. MAE weights all errors in proportion to their size, so it does not penalize large errors more heavily than small ones.
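The difference in outlier sensitivity can be illustrated with a small sketch that works directly on residuals. The two error lists are hypothetical: they are identical except that one residual is replaced by a large outlier.

```python
import math


def mae_from_errors(errors):
    """Mean absolute error computed from a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse_from_errors(errors):
    """Root mean squared error computed from a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical residuals: the second list swaps one error of 1 for an outlier of 11.
baseline = [1.0, 1.0, 1.0, 1.0, 1.0]
with_outlier = [1.0, 1.0, 1.0, 1.0, 11.0]

print(mae_from_errors(baseline), rmse_from_errors(baseline))          # 1.0 1.0
print(mae_from_errors(with_outlier), rmse_from_errors(with_outlier))  # 3.0 5.0
```

In this example a single outlier triples the MAE but multiplies the RMSE by five, which is the sensitivity described in the lists above.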

## Question 6

Lasso (Least Absolute Shrinkage and Selection Operator, L1 regularization) and Ridge (L2 regularization) are techniques used in linear regression to prevent overfitting.

Lasso regularization adds the sum of the absolute values of the coefficients, scaled by a regularization strength λ, as a penalty term to the loss function. It can shrink some coefficients to exactly zero, effectively eliminating the corresponding features from the model. This is useful when you want to select a subset of features that have significant effects on the target variable.

Ridge regularization, on the other hand, adds the sum of the squared coefficients, scaled by λ, as the penalty term. It usually results in smaller but non-zero coefficients. It's useful when all features have some importance and should be included in the model.

Lasso is more suitable when you have many features and want to select only the most significant ones, while Ridge is more suitable when all features are important.
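The contrast between the two penalties can be sketched in a simplified setting. Assuming orthonormal features, and assuming the objectives are written as (1/2)·RSS + λ·Σ|β| for Lasso and (1/2)·RSS + (λ/2)·Σβ² for Ridge, each penalized coefficient has a closed form in terms of the unpenalized (OLS) coefficient: Lasso applies soft-thresholding, Ridge applies proportional shrinkage. The coefficient values and λ below are hypothetical.

```python
def lasso_coefficient(b_ols, lam):
    """Soft-thresholding: move b_ols toward zero by lam, stopping at exactly zero."""
    if b_ols > lam:
        return b_ols - lam
    if b_ols < -lam:
        return b_ols + lam
    return 0.0


def ridge_coefficient(b_ols, lam):
    """Proportional shrinkage: scale b_ols by 1 / (1 + lam); never exactly zero."""
    return b_ols / (1 + lam)


# Hypothetical OLS coefficients for four orthonormal features.
ols = [3.0, -0.4, 0.2, -1.5]
lam = 0.5

lasso = [lasso_coefficient(b, lam) for b in ols]
ridge = [ridge_coefficient(b, lam) for b in ols]

print(lasso)  # [2.5, 0.0, 0.0, -1.0]
print([round(b, 3) for b in ridge])
```

Under these assumptions, Lasso removes the two weak features entirely, while Ridge keeps all four with reduced magnitudes. Real datasets rarely have orthonormal features, so this only illustrates the mechanism, not a general solver.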

## Question 7

1. Regularized linear models add a penalty term to the loss function, encouraging smaller coefficients.
2. This helps prevent overfitting by creating a simpler, more generalized model.
3. Ridge regularization encourages smaller, non-zero coefficients, while Lasso can set some coefficients to zero (feature selection).
4. Regularized models can be visualized through learning curves, where training and validation errors converge.
5. Regularization helps balance model complexity and generalization, improving performance on new data.

Example:

Suppose you have a dataset with 10 features and you want to build a linear regression model to predict a target variable. Without any regularization, the model might overfit the training data and perform poorly on new, unseen data. This is because the model could assign large, unrealistic coefficients to the features, leading to a complex model that fits the training data very well but fails to generalize.

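With regularization, the penalty term shrinks the coefficients, which reduces overfitting. With Lasso in particular, the coefficients of unimportant features can be driven to zero, so that only the important features remain in the model.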
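How the penalty shrinks a coefficient can be seen in a one-feature sketch. For a single feature without an intercept, minimizing Σ(y − b·x)² + λ·b² gives the closed form b = Σxy / (Σx² + λ). The function name `ridge_slope`, the data, and the λ values are assumptions for illustration; the example shows shrinkage only and does not measure generalization.

```python
def ridge_slope(x, y, lam):
    """Closed-form ridge slope for one feature and no intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


# Hypothetical data that roughly follows y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

# lam = 0 reproduces ordinary least squares; larger lam pulls the slope toward zero.
slopes = {lam: ridge_slope(x, y, lam) for lam in (0.0, 10.0, 100.0)}
for lam, b in slopes.items():
    print(f"lambda={lam:>5}: slope={b:.3f}")
```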

## Question 8

Limitations of Regularized Linear Models (Brief)

1. Bias-Variance Tradeoff: Regularization increases bias to reduce variance, potentially leading to underfitting when the true relationship is complex or nonlinear.
2. Feature Selection Challenges: Lasso may exclude important correlated features, while Ridge retains all variables, reducing interpretability in feature selection.
3. Non-linearity in Relationships: Regularized models assume linearity, which can fail to capture nonlinear patterns unless combined with other techniques.
4. Selection of Regularization Parameter: Choosing the right regularization strength (λ) can be difficult. Too large a value leads to underfitting and too small a value leads to overfitting, so λ is commonly tuned with cross-validation.
5. Interpretability: Shrinking coefficients can make the model harder to interpret, especially when many variables are involved.
6. Data Requirements: Regularization is beneficial with many predictors or collinearity, but adds bias when only a few predictors are present.

## Question 9

For the situation given in the question, I would go with `Model B`, based on the reported performance metrics of the two models:

- Model A - 10 RMSE
- Model B - 8 MAE

RMSE squares each error before averaging, which makes it more sensitive to outliers, whereas MAE averages the absolute errors directly. For this reason I would go with `Model B`.

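Limitations of the choice of metric:
1. RMSE vs. MAE: RMSE is more sensitive to large errors (outliers), while MAE penalizes all errors in proportion to their size. The two values are not directly comparable: for the same set of errors, RMSE is always at least as large as MAE, so an RMSE of 10 and an MAE of 8 do not by themselves show that Model B is more accurate. If the dataset has outliers, RMSE may exaggerate the performance gap; if large errors matter for the application, MAE may understate their importance.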
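A quick numeric check shows why the two reported numbers need care. The residuals below are hypothetical and were chosen so that a single set of errors produces both an MAE of 8 and an RMSE of 10, the same values reported for the two models.

```python
import math

# Hypothetical residuals from one model: one small error and one large error.
errors = [2.0, 14.0]

mae = sum(abs(e) for e in errors) / len(errors)             # (2 + 14) / 2 = 8.0
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # sqrt((4 + 196) / 2) = 10.0

print(mae, rmse)
```

Since one set of errors can yield both numbers, a fairer comparison would evaluate both models with the same metric on the same data.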


## Question 10

I would choose Model B (Lasso with λ = 0.5).

Reason:
Lasso performs both regularization and feature selection by setting some coefficients to zero, which is beneficial if only a subset of features is important. With λ = 0.5, it likely provides stronger regularization and feature reduction, improving interpretability.

Limitation:
Lasso can exclude important correlated features, potentially oversimplifying the model if variables are highly related.