# Q-1

### R-squared, also known as the coefficient of determination, is a statistical measure used to evaluate the goodness of fit of a linear regression model. It indicates the proportion of the variance in the dependent variable that can be explained by the independent variables in the model. In other words, R-squared quantifies how well the linear regression model captures the variability in the data.
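### For an ordinary least squares model with an intercept, R-squared equals the ratio of the explained sum of squares (ESS) to the total sum of squares (TSS), which is the same as one minus the ratio of the residual sum of squares (RSS) to TSS. On the training data of such a model it ranges from 0 to 1, with a higher value indicating a better fit. On other data, or for a model fitted without an intercept, it can be negative.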
### Mathematically, R-squared is calculated as:

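R-squared = 1 - (RSS / TSS)

where RSS = Σ(yᵢ - ŷᵢ)² is the residual sum of squares, TSS = Σ(yᵢ - ȳ)² is the total sum of squares, yᵢ is the observed value, ŷᵢ is the predicted value, and ȳ is the mean of the observed values.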

### Interpreting R-squared:
- R-squared of 1: It indicates that the linear regression model explains all the variability in the dependent variable. In this case, all data points fall perfectly on the regression line.
- R-squared close to 1: It indicates that the model captures a large proportion of the variability in the dependent variable. The predictions by the model align closely with the actual values.
- R-squared close to 0: It indicates that the model does not explain much of the variability in the dependent variable. The predictions by the model do not align well with the actual values.
- Negative R-squared: It indicates that the model performs worse than simply using the mean value of the dependent variable as the prediction. This can happen when the model is evaluated on data it was not fitted to, or when it is fitted without an intercept.
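The cases above can be checked with a minimal sketch of the formula R-squared = 1 - (RSS / TSS). The function name `r_squared` and all numbers are made up for illustration; they do not come from any dataset.

```python
def r_squared(y_true, y_pred):
    """Return 1 - RSS/TSS for two equal-length sequences of numbers."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((yt - mean_y) ** 2 for yt in y_true)               # total sum of squares
    return 1 - rss / tss


# Hypothetical observations (mean is 6) and three hypothetical sets of predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
close_fit = [2.5, 5.5, 6.5, 9.5]     # small residuals
mean_only = [6.0, 6.0, 6.0, 6.0]     # always predicts the mean
reversed_fit = [9.0, 7.0, 5.0, 3.0]  # worse than predicting the mean

r2_close = r_squared(y_true, close_fit)
r2_mean = r_squared(y_true, mean_only)
r2_bad = r_squared(y_true, reversed_fit)
print(r2_close, r2_mean, r2_bad)
```

Here the close fit gives a value near 1, predicting the mean gives 0, and the reversed predictions give a negative value.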

# Q-2

### Adjusted R-squared is a modified version of the R-squared statistic that adjusts for the number of predictors in a regression model. While R-squared measures the proportion of the variance in the dependent variable explained by the independent variables, adjusted R-squared takes into account the number of predictors and penalizes the addition of irrelevant or redundant variables.
### Adjusted R-squared is calculated using the following formula:

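Adjusted R-squared = 1 - [(1 - R-squared) * (n - 1) / (n - k - 1)]

where n is the number of observations and k is the number of predictors (not counting the intercept).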


### The key difference between R-squared and adjusted R-squared lies in how they handle the number of predictors. For a least squares fit, R-squared on the training data never decreases when a predictor is added, even if the additional predictor does not contribute meaningfully to explaining the dependent variable. Selecting models by R-squared alone can therefore reward overfitting and give an overly optimistic assessment of the model's performance.

### In contrast, adjusted R-squared accounts for the number of predictors and adjusts for the degrees of freedom in the model. It penalizes the addition of unnecessary predictors and provides a more conservative estimate of the model's explanatory power. Adjusted R-squared will only increase if the additional predictor improves the model's fit more than would be expected by chance.
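As a rough illustration of the penalty, the sketch below applies the adjusted R-squared formula to two hypothetical models. The function name `adjusted_r2` and the R-squared values, sample size, and predictor counts are invented for the example.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical case: five extra predictors raise R-squared only from 0.900 to 0.901.
small_model = adjusted_r2(0.900, n=50, k=5)
large_model = adjusted_r2(0.901, n=50, k=10)
print(small_model, large_model)
```

In this made-up case the larger model has the higher R-squared but the lower adjusted R-squared, because the tiny gain in fit does not offset the five extra predictors.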

# Q-3

### Adjusted R-squared is more appropriate to use when comparing regression models with different numbers of predictors. It addresses the issue of overfitting by penalizing the inclusion of unnecessary predictors, providing a more conservative estimate of the model's explanatory power.

# Q-4

### RMSE (Root Mean Squared Error), MSE (Mean Squared Error), and MAE (Mean Absolute Error) are commonly used metrics in regression analysis to evaluate the performance of a regression model.
### MSE represents the average of the squared differences between the predicted values and the actual values. It is calculated by taking the average of the squared residuals (the differences between predicted and actual values) and provides a measure of the overall model fit. The formula for MSE is:

MSE = (1/n) * Σ(yᵢ - ŷᵢ)²

where yᵢ is the observed value, ŷᵢ is the predicted value, and n is the number of observations.
### RMSE is the square root of MSE and is a popular choice as it has the same unit of measurement as the dependent variable. It can be read as roughly the standard deviation of the residuals (exactly so when the residuals average to zero), indicating the typical magnitude of the prediction errors. The formula for RMSE is:

RMSE = √(MSE)

### MAE represents the average of the absolute differences between the predicted values and the actual values. It is calculated by taking the average of the absolute residuals, ignoring the direction of the errors. MAE is useful when you want to understand the average magnitude of the prediction errors without considering their direction. The formula for MAE is:

MAE = (1/n) * Σ|yᵢ - ŷᵢ|

where yᵢ is the observed value, ŷᵢ is the predicted value, and n is the number of observations.
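The three formulas can be sketched in a few lines of standard-library Python. The function names and the observed and predicted values below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of MSE, in the units of the dependent variable."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean of the absolute residuals."""
    return sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted values; the residuals are 1, 0, -1, -4.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 13.0]

print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

With these residuals MSE is 4.5, RMSE is about 2.12, and MAE is 1.5. The single residual of size 4 pulls RMSE above MAE.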


# Q-5

### Advantages of RMSE, MSE, and MAE as evaluation metrics in regression analysis:
- Easy interpretation: RMSE and MAE are expressed in the same units as the dependent variable, which makes them easy to interpret. MSE is in squared units. All three reduce prediction accuracy to a single number, which makes it easy to compare models evaluated on the same data.
- Choice of outlier sensitivity: MSE and RMSE square the errors, so they weight large errors more heavily, while MAE weights every error in proportion to its size. This lets you pick a metric that penalizes outliers strongly (MSE, RMSE) or one that is less affected by them (MAE).
- No normality requirement: RMSE, MSE, and MAE can be computed and compared without assuming that the errors are normally distributed, so they remain usable when the error distribution deviates from normality.
### Disadvantages of RMSE, MSE, and MAE as evaluation metrics in regression analysis:
- Lack of context: RMSE, MSE, and MAE do not provide information about the practical significance of the prediction errors. They do not indicate whether the errors are large or small in relation to the problem's context or whether they have any real-world implications.
- Focus on overall error: RMSE, MSE, and MAE consider the overall error across all observations but do not provide insights into the pattern of errors. They do not capture potential heteroscedasticity or systematic biases in the model.
- Sensitivity to scale: All three metrics depend on the scale of the dependent variable. A model of a target with larger values will tend to have a higher RMSE, MSE, or MAE than a model of a target with smaller values, even if their relative accuracy is similar, and squaring makes the effect strongest for MSE. The metrics are therefore not comparable across datasets with different target scales.
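The effect of outliers and of scale can be seen in a small sketch that works directly on residuals. Both residual lists are invented so that they share the same MAE; the helper names are assumptions for this example.

```python
import math


def mae_of(errors):
    """Mean absolute error computed from a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse_of(errors):
    """Root mean squared error computed from a list of residuals."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


even_errors = [2.0, 2.0, 2.0, 2.0]  # hypothetical: every prediction is off by 2
one_outlier = [0.0, 0.0, 0.0, 8.0]  # hypothetical: one prediction is off by 8

print(mae_of(even_errors), rmse_of(even_errors))
print(mae_of(one_outlier), rmse_of(one_outlier))

# Expressing the same residuals in units 1000 times smaller rescales both metrics.
rescaled = [e * 1000 for e in even_errors]
print(mae_of(rescaled), rmse_of(rescaled))
```

Both lists have an MAE of 2, but the list with the single large error has twice the RMSE. Changing the units of the target changes both metrics by the same factor, which is why neither can be compared across targets on different scales.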

# Q-6

### Lasso regularization, also known as L1 regularization, is a technique used in regression analysis to introduce a penalty on the absolute value of the coefficients. It adds a term to the objective function of the regression model, which consists of the sum of the absolute values of the coefficients multiplied by a regularization parameter (lambda) that controls the strength of the penalty.
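### Lasso regularization differs from Ridge regularization (L2 regularization) in the type of penalty applied to the coefficients. While Lasso regularization penalizes the absolute values of the coefficients, Ridge regularization penalizes the squared values of the coefficients. As a consequence, Lasso can shrink some coefficients to exactly zero, whereas Ridge shrinks coefficients toward zero without generally making them exactly zero.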
### Lasso regularization is more appropriate to use when there is a need for feature selection or when the data contains a large number of features, some of which may be irrelevant or redundant. By setting some coefficients to zero, Lasso can simplify the model and improve interpretability by focusing on the most relevant features. It is particularly useful when dealing with high-dimensional datasets, where there are more features than observations.
### However, if all the features in the dataset are considered important and there is no need for explicit feature selection, Ridge regularization may be more appropriate. Ridge regularization can be beneficial when dealing with multicollinearity among the features, as it can help reduce the impact of collinear variables by shrinking their coefficients.
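The difference between the two penalties is easiest to see with a single centered feature and no intercept, where both problems have a closed-form solution. The sketch below assumes the objectives Σ(yᵢ - βxᵢ)² + λ|β| for Lasso and Σ(yᵢ - βxᵢ)² + λβ² for Ridge. Other texts and libraries scale the loss or the penalty differently, so the thresholds here are specific to that assumption. All function names and data are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0.0 when |z| <= t."""
    if abs(z) <= t:
        return 0.0
    return z - t if z > 0 else z + t


def lasso_coef(x, y, lam):
    """Minimizer of sum((y - b*x)^2) + lam*|b| for a single feature."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam / 2) / sxx


def ridge_coef(x, y, lam):
    """Minimizer of sum((y - b*x)^2) + lam*b^2 for a single feature."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]           # hypothetical centered feature
y_strong = [-4.1, -1.9, 0.2, 2.0, 3.8]    # target that clearly depends on x
y_weak = [0.3, -0.2, 0.1, -0.1, 0.2]      # target that barely depends on x

for name, y in (("strong", y_strong), ("weak", y_weak)):
    print(name, lasso_coef(x, y, 1.0), ridge_coef(x, y, 1.0))
```

For the weakly related target, the Lasso coefficient is exactly zero while the Ridge coefficient is small but nonzero. For the strongly related target, both methods keep a coefficient close to the unpenalized one.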

# Q-7

### Regularized linear models help prevent overfitting in machine learning by introducing a penalty term to the loss function, which discourages the model from fitting the training data too closely. This penalty term controls the complexity of the model by limiting the magnitude of the coefficients.
### By adding a regularization term, the model is encouraged to find a balance between minimizing the training error and keeping the coefficients small. This regularization constraint helps prevent the model from becoming overly complex and overly sensitive to the training data, thus reducing the risk of overfitting.
### Let's take an example of regularized linear regression, specifically Ridge regression, to illustrate how it helps prevent overfitting. Suppose we have a dataset with one input feature, "x", and the corresponding target variable, "y". We want to fit a linear regression model to predict "y" based on "x".
### Without regularization, the linear regression model may try to fit the training data too closely, capturing noise and outliers in the process. This can lead to overfitting, where the model performs well on the training data but fails to generalize to unseen data.
### By applying Ridge regularization, we add a penalty term to the loss function, which is proportional to the sum of the squared coefficients. This penalty term discourages the model from having large coefficients and helps control the complexity of the model.
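The one-feature example can be sketched as follows, assuming centered data, no intercept, and the objective Σ(yᵢ - βxᵢ)² + λβ², whose minimizer is β = Σxᵢyᵢ / (Σxᵢ² + λ). The data are hand-constructed so that noise in a tiny training set inflates the slope. It is a toy illustration of the trade-off, not evidence that Ridge always helps.

```python
def ridge_slope(x, y, lam):
    """Closed-form Ridge slope for one centered feature without intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def mse(x, y, beta):
    """Mean squared error of the predictions beta * x."""
    return sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y)) / len(x)


# Hypothetical data: the underlying relationship is y = x, but the three
# training points are noisy enough to suggest a slope near 2.
x_train = [-1.0, 0.0, 1.0]
y_train = [-2.0, 0.1, 1.9]
x_test = [-2.0, -1.0, 1.0, 2.0]
y_test = [-2.0, -1.0, 1.0, 2.0]

for lam in (0.0, 1.0, 10.0):
    beta = ridge_slope(x_train, y_train, lam)
    print(lam, beta, mse(x_train, y_train, beta), mse(x_test, y_test, beta))
```

With λ = 0 (ordinary least squares) the training error is lowest but the test error is highest. A moderate λ raises the training error slightly and lowers the test error, and a very large λ shrinks the slope so far that the model starts to underfit.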

# Q-8

### Regularized linear models have limitations that mean they are not always the best choice for regression analysis. These include:
- Linear Assumption: Regularized linear models assume a linear relationship between the input features and the target variable. However, in real-world scenarios, the relationship may be non-linear. Using a linear model in such cases may result in underfitting and poor predictive performance, and regularization does not fix this.
- Feature Interpretability: Regularized linear models shrink coefficients towards zero, which is useful for feature selection and for reducing model complexity. However, the shrunken coefficients are biased, and among correlated features the penalty may keep one and suppress the others somewhat arbitrarily, so coefficient sizes can be hard to interpret as the contribution of individual features to the target variable.
- Sensitivity to Outliers: Regularized linear models can still be sensitive to outliers in the data. The sensitivity comes from the squared-error loss that Ridge and Lasso both minimize, not from the penalty, so neither L1 nor L2 regularization protects against it. Outliers can disproportionately influence the model's coefficient estimates, potentially leading to biased predictions.
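The linear-assumption limitation can be illustrated with a minimal sketch that fits a straight line to a purely quadratic relationship. The data and the helper `fit_line` are made up for the example. No penalty is applied, since shrinking the slope could only move it toward zero and would not add the missing curvature.

```python
def fit_line(x, y):
    """Ordinary least squares slope and intercept for one feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]  # hypothetical target that depends on x only through x squared

slope, intercept = fit_line(x, y)
y_pred = [intercept + slope * xi for xi in x]

mean_y = sum(y) / len(y)
rss = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
tss = sum((yi - mean_y) ** 2 for yi in y)
r2 = 1 - rss / tss
print(slope, intercept, r2)
```

The fitted slope is zero and R-squared is zero, even though the target is a deterministic function of the feature. The line simply cannot represent the relationship.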

# Q-9

### If the goal is to penalize larger errors more heavily, then RMSE would be a suitable choice. Since RMSE squares the errors before averaging, it places more emphasis on larger errors and is more sensitive to outliers. However, Model A's RMSE of 10 cannot be compared directly with Model B's MAE of 8: for any set of predictions, RMSE is at least as large as MAE, so the two numbers alone do not show which model has the larger errors.
### On the other hand, if the emphasis is on the average magnitude of the errors without extra weight on large ones, then MAE would be a suitable choice. MAE represents the average absolute difference between predicted and actual values. In either case, a fair comparison requires computing the same metric for both models on the same data, and only then choosing the model with the lower value.
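To see why an RMSE of 10 and an MAE of 8 are not comparable, the sketch below uses a single hypothetical set of residuals that produces both numbers at once. The residuals are invented for the illustration and say nothing about the actual errors of Model A or Model B.

```python
import math

# Hypothetical residuals of one model on two observations.
errors = [2.0, 14.0]

mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))
print(mae, rmse)
```

One and the same model here has an MAE of 8 and an RMSE of 10, so seeing those two values for two different models does not establish which one predicts better.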

# Q-10
