# **Q1. R-squared in Linear Regression:**

    R-squared (also known as the coefficient of determination) is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variables in a linear regression model. It quantifies how well the model fits the data compared to a simple mean-based model.

    **Calculation:** R-squared is calculated using the formula:
    \[ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \]
    Where:
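    - \( SS_{res} = \sum_i (y_i - \hat{y}_i)^2 \) is the residual sum of squares: the sum of squared differences between the actual values \( y_i \) and the predicted values \( \hat{y}_i \).
    - \( SS_{tot} = \sum_i (y_i - \bar{y})^2 \) is the total sum of squares: the sum of squared differences between the actual values and their mean \( \bar{y} \). It equals the sample variance of the dependent variable multiplied by \( n - 1 \).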

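    **Interpretation:** For a least squares model with an intercept, evaluated on the data it was fitted to, R-squared ranges from 0 to 1. A higher value indicates that a larger proportion of the variance in the dependent variable is explained by the model's independent variables. A value of 1 means that the model fits the data perfectly, while a value of 0 means that the model does no better than always predicting the mean of the dependent variable. On held-out data, or for a model fitted without an intercept, R-squared can be negative, which means the model predicts worse than the mean.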
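
    The formula above can be sketched in a few lines of plain Python. The function name `r_squared` and the data are made up for illustration; the three prediction lists are chosen to show a good fit, a mean-only prediction, and a prediction that is worse than the mean.

```python
def r_squared(y_true, y_pred):
    """Return R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - yhat) ** 2 for y, yhat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]        # made-up observations, mean = 6
y_pred_good = [2.5, 5.5, 6.5, 9.5]   # close to the observations
y_pred_mean = [6.0, 6.0, 6.0, 6.0]   # always predicts the mean
y_pred_bad = [9.0, 7.0, 5.0, 3.0]    # worse than predicting the mean

r2_good = r_squared(y_true, y_pred_good)   # 1 - 1/20  = 0.95
r2_mean = r_squared(y_true, y_pred_mean)   # 1 - 20/20 = 0.0
r2_bad = r_squared(y_true, y_pred_bad)     # 1 - 80/20 = -3.0

print(r2_good, r2_mean, r2_bad)
```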

# **Q2. Adjusted R-squared:**
    Adjusted R-squared is an extension of R-squared that takes into account the number of independent variables in the model. Plain R-squared never decreases when a variable is added to a least squares model, even if the variable has no real predictive power. Adjusted R-squared charges a penalty for every added variable, so it rises only when a new variable improves the fit by more than would be expected by chance.

    **Calculation:** Adjusted R-squared is calculated using the formula:
    \[ Adjusted \ R^2 = 1 - \frac{SS_{res} / (n - p - 1)}{SS_{tot} / (n - 1)} \]
    Where:
    - \( n \) is the number of observations.
    - \( p \) is the number of independent variables (not counting the intercept).

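    **Difference:** Adjusted R-squared takes into account both the goodness of fit and the complexity of the model. It is lower than R-squared for any model with at least one independent variable and an imperfect fit, and the gap widens as \( p \) grows relative to \( n \). Unlike R-squared on the fitted data, it can be negative.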
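
    As a minimal sketch of the adjusted formula, the function below takes the two sums of squares directly. The name `adjusted_r_squared` and the numbers are hypothetical; the same fit is scored with 2 and with 10 independent variables to show the penalty growing.

```python
def adjusted_r_squared(ss_res, ss_tot, n, p):
    """Adjusted R^2 from the sums of squares, n observations, p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more than p + 1 observations")
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


# Made-up sums of squares: plain R^2 = 1 - 20/100 = 0.80
ss_res, ss_tot, n = 20.0, 100.0, 30
r2 = 1 - ss_res / ss_tot

adj_2 = adjusted_r_squared(ss_res, ss_tot, n, p=2)    # about 0.785
adj_10 = adjusted_r_squared(ss_res, ss_tot, n, p=10)  # about 0.695

print(f"R^2 = {r2:.3f}, adjusted (p=2) = {adj_2:.3f}, adjusted (p=10) = {adj_10:.3f}")
```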

# **Q3. When to Use Adjusted R-squared:**
    Adjusted R-squared is more appropriate when comparing models with different numbers of independent variables. Plain R-squared always favors the larger of two nested models, whereas adjusted R-squared penalizes variables that add little, which helps guard against overfitting. If model simplicity matters, adjusted R-squared is usually the better of the two metrics.
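
    A small worked comparison, using the equivalent form \( 1 - (1 - R^2)(n - 1)/(n - p - 1) \). The two models, their R-squared values, and the sample size are invented for illustration: the larger model wins on R-squared but loses on adjusted R-squared.

```python
def adjusted_from_r2(r2, n, p):
    """Equivalent form: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n = 50  # hypothetical sample size
models = {
    "A (3 predictors)": {"r2": 0.800, "p": 3},
    "B (10 predictors)": {"r2": 0.805, "p": 10},
}

adjusted = {name: adjusted_from_r2(m["r2"], n, m["p"]) for name, m in models.items()}
best_by_r2 = max(models, key=lambda name: models[name]["r2"])
best_by_adjusted = max(adjusted, key=adjusted.get)

for name in models:
    print(f"{name}: R^2 = {models[name]['r2']:.3f}, adjusted R^2 = {adjusted[name]:.3f}")
print("best by R^2:", best_by_r2)
print("best by adjusted R^2:", best_by_adjusted)
```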

# **Q4. RMSE, MSE, and MAE in Regression:**
    These are evaluation metrics used in regression analysis to measure the accuracy of a model's predictions:

    - **Root Mean Squared Error (RMSE):** It is the square root of the average of the squared differences between the predicted and actual values.
    - **Mean Squared Error (MSE):** It is the average of the squared differences between the predicted and actual values.
    - **Mean Absolute Error (MAE):** It is the average of the absolute differences between the predicted and actual values.

    **Calculation:** With errors \( e_i = y_i - \hat{y}_i \) over \( n \) observations, \( MSE = \frac{1}{n} \sum_i e_i^2 \), \( RMSE = \sqrt{MSE} \), and \( MAE = \frac{1}{n} \sum_i |e_i| \).

    **Interpretation:** All three metrics quantify the magnitude of the prediction errors. Smaller values indicate better model performance.
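
    The three metrics can be sketched as plain Python functions. The function names and the small data set are made up for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared errors."""
    return sum((y - yhat) ** 2 for y, yhat in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the MSE, in the units of y."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean of the absolute errors, in the units of y."""
    return sum(abs(y - yhat) for y, yhat in zip(y_true, y_pred)) / len(y_true)


y_true = [10.0, 12.0, 15.0, 20.0]   # made-up observations
y_pred = [11.0, 10.0, 15.0, 24.0]   # errors: -1, 2, 0, -4

mse_value = mse(y_true, y_pred)     # (1 + 4 + 0 + 16) / 4 = 5.25
rmse_value = rmse(y_true, y_pred)   # sqrt(5.25), about 2.29
mae_value = mae(y_true, y_pred)     # (1 + 2 + 0 + 4) / 4 = 1.75

print(mse_value, rmse_value, mae_value)
```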

# **Q5. Advantages and Disadvantages of RMSE, MSE, and MAE:**
    Advantages:
    - **RMSE:** Penalizes larger errors more heavily, which can be appropriate if large errors are more problematic.
    - **MSE:** Convenient for optimization because it is smooth and differentiable everywhere; for linear regression it is convex and has a closed-form minimizer.
    - **MAE:** More robust to outliers than squared error metrics.

    Disadvantages:
    - **RMSE:** Sensitive to outliers due to squaring of errors.
    - **MSE:** Expressed in squared units of the dependent variable, so it is not directly interpretable on the scale of the original data. Like RMSE, it is sensitive to outliers.
    - **MAE:** Less sensitive to larger errors, potentially not reflecting the true impact of outliers.
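
    The difference in outlier sensitivity can be seen on a toy example. The two error lists below are invented: they are identical except that the last error is replaced by one large miss.

```python
import math


def mae_of(errors):
    """Mean absolute error from a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse_of(errors):
    """Root mean squared error from a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


clean = [1.0, -1.0, 1.0, -1.0, 1.0]          # every error has size 1
with_outlier = [1.0, -1.0, 1.0, -1.0, 10.0]  # one large error

mae_clean, rmse_clean = mae_of(clean), rmse_of(clean)               # 1.0 and 1.0
mae_out, rmse_out = mae_of(with_outlier), rmse_of(with_outlier)     # 2.8 and about 4.56

print(f"clean:        MAE = {mae_clean:.2f}, RMSE = {rmse_clean:.2f}")
print(f"with outlier: MAE = {mae_out:.2f}, RMSE = {rmse_out:.2f}")
```

    In this sketch the single large error multiplies MAE by 2.8 but RMSE by roughly 4.6, which is the sense in which squared-error metrics weight large errors more heavily.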

# **Q6. Lasso Regularization:**
    Lasso (Least Absolute Shrinkage and Selection Operator) is a regularization technique used in linear regression to prevent overfitting and improve model generalization. Lasso adds a penalty term to the loss function that is proportional to the sum of the absolute values of the regression coefficients (their L1 norm).

    **Difference from Ridge Regularization:**
    Lasso can lead to sparsity, meaning it tends to drive some coefficients to exactly zero. This makes it a useful feature selection technique. In contrast, Ridge regularization penalizes the sum of the squared coefficients (the L2 norm), which shrinks coefficients toward zero but almost never makes them exactly zero.
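
    The contrast is easiest to see in a special case. The sketch below assumes orthonormal features, a Lasso objective of \( \frac{1}{2}\|y - Xb\|^2 + \lambda \|b\|_1 \), and a Ridge objective of \( \frac{1}{2}\|y - Xb\|^2 + \frac{\lambda}{2}\|b\|_2^2 \); under those assumptions each penalized coefficient is a simple function of the least squares coefficient. The function names and starting coefficients are made up, and real data rarely has orthonormal features.

```python
def lasso_shrink(b_ols, lam):
    """Soft-thresholding: move toward zero by lam, stopping at zero."""
    if abs(b_ols) <= lam:
        return 0.0
    return b_ols - lam if b_ols > 0 else b_ols + lam


def ridge_shrink(b_ols, lam):
    """Proportional shrinkage: divide by (1 + lam)."""
    return b_ols / (1.0 + lam)


ols_coefs = [3.0, -1.5, 0.4, -0.2]   # hypothetical least squares coefficients
lam = 0.5

lasso_coefs = [lasso_shrink(b, lam) for b in ols_coefs]
ridge_coefs = [ridge_shrink(b, lam) for b in ols_coefs]

print("lasso:", lasso_coefs)   # small coefficients become exactly 0.0
print("ridge:", ridge_coefs)   # every coefficient shrinks but stays nonzero
```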

    **When to Use Lasso Regularization:**
    Lasso is more appropriate when you suspect that many features are irrelevant or redundant, and you want to perform feature selection by automatically setting some coefficients to zero.

# **Q7. Preventing Overfitting with Regularized Linear Models:**
    Regularized linear models, such as Ridge and Lasso regression, introduce a penalty on the size of the coefficients. This helps prevent overfitting by discouraging the model from fitting noise in the data and promoting simpler models with smaller coefficients. Regularization controls the trade-off between fitting the training data well and keeping the model parameters small.

    **Example:** In Ridge regression, the L2 regularization term penalizes the sum of squared coefficients, so large coefficients are costly. This prevents the model from becoming too sensitive to fluctuations in the training data, reducing overfitting.
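
    A minimal sketch of the shrinkage effect, assuming a single feature and no intercept. Minimizing \( \sum_i (y_i - b x_i)^2 + \lambda b^2 \) gives \( b = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda) \). The function name `ridge_slope` and the data are made up for illustration.

```python
def ridge_slope(x, y, lam):
    """Slope minimizing sum((y - b*x)^2) + lam * b^2 (one feature, no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


x = [1.0, 2.0, 3.0, 4.0]   # made-up feature values
y = [2.1, 3.9, 6.2, 7.8]   # made-up targets, roughly y = 2x

lambdas = [0.0, 1.0, 10.0, 100.0]
slopes = {lam: ridge_slope(x, y, lam) for lam in lambdas}

for lam in lambdas:
    print(f"lambda = {lam:>5}: slope = {slopes[lam]:.4f}")
```

    With `lam = 0.0` the result is the ordinary least squares slope; larger values pull the slope steadily toward zero.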

# **Q8. Limitations of Regularized Linear Models:**
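    - **Loss of Interpretability:** The penalty biases coefficients toward zero, so they no longer estimate the effect of each feature as directly as ordinary least squares coefficients do. With correlated features, Lasso may keep one and set the others to exactly zero somewhat arbitrarily, so a zero coefficient does not prove that a feature is irrelevant.
    - **Hyperparameter Tuning:** Regularization introduces a strength hyperparameter that must be tuned, usually by cross-validation, and a poor choice leads to underfitting or overfitting.
    - **Data Scaling:** The penalty acts on coefficient magnitudes, which depend on the units of each feature, so features should be put on a common scale (for example, standardized) before fitting.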
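
    The scale sensitivity can be illustrated with a one-feature Ridge fit without an intercept, where the slope is \( \sum_i x_i y_i / (\sum_i x_i^2 + \lambda) \). The data are invented: the same distances are expressed once in meters and once in kilometers. Without a penalty the predictions agree; with the same penalty strength they do not.

```python
def ridge_slope(x, y, lam):
    """Slope minimizing sum((y - b*x)^2) + lam * b^2 (one feature, no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


x_m = [1000.0, 2000.0, 3000.0, 4000.0]   # hypothetical feature in meters
x_km = [xi / 1000.0 for xi in x_m]       # the same feature in kilometers
y = [2.1, 3.9, 6.2, 7.8]                 # made-up targets

# Prediction for the last point (4000 m, i.e. 4 km) under each unit choice
pred_ols_m = ridge_slope(x_m, y, 0.0) * x_m[-1]
pred_ols_km = ridge_slope(x_km, y, 0.0) * x_km[-1]
pred_ridge_m = ridge_slope(x_m, y, 1.0) * x_m[-1]
pred_ridge_km = ridge_slope(x_km, y, 1.0) * x_km[-1]

print(f"no penalty: {pred_ols_m:.3f} (m) vs {pred_ols_km:.3f} (km)")
print(f"lambda = 1: {pred_ridge_m:.3f} (m) vs {pred_ridge_km:.3f} (km)")
```

    In meters the coefficient is tiny, so the penalty barely affects it; in kilometers the same penalty strength shrinks the fit noticeably.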

# **Q9. Comparing Models using RMSE and MAE:**
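    With only an RMSE of 10 for Model A and an MAE of 8 for Model B, the two models cannot be ranked, because the numbers come from different metrics. For any set of errors, RMSE is at least as large as MAE, so Model A's MAE is at most 10 and Model B's RMSE is at least 8; either model could be better on a common metric. RMSE places more weight on larger errors, while MAE weights each error in proportion to its size. To compare the models, compute the same metric for both on the same data, choosing the metric according to how costly large errors are in the problem context.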
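
    To see why an MAE of 8 says little about RMSE, consider two invented error patterns that both have an MAE of 8. One has an RMSE below Model A's 10, the other well above it.

```python
import math


def mae_of(errors):
    """Mean absolute error from a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse_of(errors):
    """Root mean squared error from a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


even_errors = [8.0, 8.0, 8.0, 8.0]     # every prediction is off by 8
spiky_errors = [0.0, 0.0, 0.0, 32.0]   # three exact predictions, one large miss

mae_even, rmse_even = mae_of(even_errors), rmse_of(even_errors)       # 8.0 and 8.0
mae_spiky, rmse_spiky = mae_of(spiky_errors), rmse_of(spiky_errors)   # 8.0 and 16.0

print(f"even errors:  MAE = {mae_even}, RMSE = {rmse_even}")
print(f"spiky errors: MAE = {mae_spiky}, RMSE = {rmse_spiky}")
```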

# **Q10. Comparing Regularized Linear Models:**
    Ridge and Lasso models with different regularization parameters cannot be compared by the parameter values alone, because the parameter scales a different penalty in each method (squared coefficients in Ridge, absolute coefficients in Lasso). Generally, you would select the model with the best performance on validation data or under cross-validation. Ridge tends to be more appropriate when most features are potentially relevant, while Lasso is suitable when feature selection is desired.

    **Trade-offs:** Ridge keeps all features in the model with shrunken coefficients, while Lasso may exclude some features entirely. Because Ridge coefficients are rarely exactly zero, every feature retains some estimated contribution, and Ridge handles groups of correlated features more stably. Lasso is useful when a sparse model is preferred, but it can exclude relevant features if the regularization is too strong, and among correlated features it tends to keep only one.
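
    Selecting by validation performance can be sketched as follows. This is a toy setup under stated assumptions, not a general implementation: one feature, no intercept, closed-form fits for objectives \( \frac{1}{2}\sum_i (y_i - b x_i)^2 + \frac{\lambda}{2} b^2 \) (Ridge) and \( \frac{1}{2}\sum_i (y_i - b x_i)^2 + \lambda |b| \) (Lasso), and invented training and validation data. All function and variable names are hypothetical.

```python
import math


def fit_ridge(x, y, lam):
    """One-feature Ridge slope: sxy / (sxx + lam)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def fit_lasso(x, y, lam):
    """One-feature Lasso slope: soft-threshold sxy by lam, then divide by sxx."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    if abs(sxy) <= lam:
        return 0.0
    return (sxy - lam if sxy > 0 else sxy + lam) / sxx


def rmse(x, y, slope):
    """Validation RMSE of the predictions slope * x."""
    return math.sqrt(sum((b - slope * a) ** 2 for a, b in zip(x, y)) / len(x))


# Made-up training and validation sets
x_train, y_train = [1.0, 2.0, 3.0, 4.0], [2.3, 4.4, 6.1, 8.6]
x_val, y_val = [1.5, 2.5, 3.5], [3.0, 5.1, 6.9]

fitters = {"ridge": fit_ridge, "lasso": fit_lasso}
lambdas = [0.0, 0.5, 2.0, 10.0]

# Fit every (method, lambda) pair on the training set, score on validation
scores = {
    (name, lam): rmse(x_val, y_val, fit(x_train, y_train, lam))
    for name, fit in fitters.items()
    for lam in lambdas
}
best = min(scores, key=scores.get)

for (name, lam), score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:5s} lambda = {lam:>4}: validation RMSE = {score:.4f}")
print("selected:", best)
```

    The selection step only compares validation scores, so it works the same way regardless of which penalty produced each candidate.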