# Ans : 1

In [None]:
'''
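R-squared (the coefficient of determination) is a statistical measure of the proportion of the variance in the dependent
variable that is explained by the independent variables in a linear regression model. For a least squares model with an intercept,
evaluated on its training data, it ranges from 0 to 1, with higher values indicating a better fit of the model to the data.
On held-out data, or for a model fitted without an intercept, it can be negative.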

Mathematically, R-squared is calculated as:

\[ R^2 = 1 - \frac{\text{Sum of Squared Residuals}}{\text{Total Sum of Squares}} \]

- Sum of Squared Residuals: The sum of the squared differences between the observed and predicted values.
- Total Sum of Squares: The sum of the squared differences between the observed values and the mean of the dependent variable.

R-squared values close to 1 indicate that a high percentage of the variability in the dependent variable is explained by the
independent variables, suggesting a good fit. Conversely, values close to 0 indicate that the model does not explain much of 
the variability, suggesting a poor fit. It's important to note that R-squared should be interpreted alongside other metrics
and considerations, as a high R-squared does not guarantee a causal relationship or the absence of overfitting.

'''
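
As a minimal sketch of the formula above, the function below computes R-squared from the sum of squared residuals and the total sum of squares. The function name `r_squared` and the sample values are illustrative assumptions, not part of the original answer. The second call shows that predictions worse than simply predicting the mean give a negative value.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SSR / SST for two equal-length sequences of numbers."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared residuals: observed minus predicted.
    ssr = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: observed minus the mean of the observed values.
    sst = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ssr / sst


# Hypothetical observations and two sets of predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
good_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
bad_pred = [10.0] * 5  # a constant far from every observation

r2_good = r_squared(y_true, good_pred)  # close to 1
r2_bad = r_squared(y_true, bad_pred)    # below 0
print(r2_good, r2_bad)
```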

# Ans : 2

In [None]:
'''
Adjusted R-squared is a modified version of the regular R-squared that takes into account the number of predictors in a linear
regression model. While R-squared measures the proportion of variance explained by all the predictors, adjusted R-squared 
adjusts this value based on the number of predictors, penalizing the inclusion of irrelevant variables.

Mathematically, adjusted R-squared is calculated using the formula:

\[ \text{Adjusted } R^2 = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - k - 1} \right) \]

where:
- \( R^2 \) is the regular R-squared.
- \( n \) is the number of observations.
- \( k \) is the number of predictors.

Adjusted R-squared increases only if adding a new predictor improves the model more than would be expected by chance. 
It provides a more conservative evaluation of model fit, helping to prevent overfitting by penalizing the inclusion of
unnecessary variables. Researchers often prefer adjusted R-squared when comparing models with different numbers of predictors.
'''
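
The formula above translates directly into a small helper. This is only a sketch: the name `adjusted_r_squared` and the example inputs (R-squared of 0.80, 50 observations, 5 predictors) are assumptions chosen for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical model: R-squared 0.80, 50 observations, 5 predictors.
adj = adjusted_r_squared(0.80, n=50, k=5)
print(round(adj, 4))  # slightly below the unadjusted 0.80
```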

# Ans : 3

In [None]:
'''
Adjusted R-squared is more appropriate when comparing and evaluating models with different numbers of predictors, making it
useful in situations where the complexity of the model needs to be considered. In a least squares fit, regular R-squared never
decreases when a predictor is added, even if that predictor does not contribute to explaining the variance in the dependent
variable. Adjusted R-squared addresses this issue by penalizing models for including irrelevant variables.

Researchers and analysts often prefer adjusted R-squared in scenarios where model simplicity and parsimony are important. 
It helps in selecting models that strike a balance between explaining variance and avoiding overfitting. Adjusted R-squared 
is particularly valuable when working with regression models with a varying number of predictors or when comparing nested
models. Choosing adjusted R-squared over regular R-squared provides a more conservative and realistic measure of a model's
goodness of fit.

'''
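
To illustrate how the two measures can rank models differently, the sketch below compares two hypothetical models fitted to the same 30 observations. All numbers are made up for the example. The larger model has the higher R-squared, but its adjusted R-squared is lower because the small gain does not justify five extra predictors.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 30  # hypothetical sample size shared by both models

# Hypothetical fits: (R-squared, number of predictors).
r2_small, k_small = 0.800, 3
r2_large, k_large = 0.805, 8

adj_small = adjusted_r_squared(r2_small, n, k_small)
adj_large = adjusted_r_squared(r2_large, n, k_large)

print("small model:", r2_small, round(adj_small, 4))
print("large model:", r2_large, round(adj_large, 4))
```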

# Ans : 4

In [None]:
'''
In regression analysis, RMSE (Root Mean Squared Error), MSE (Mean Squared Error), and MAE (Mean Absolute Error) are metrics used to assess the accuracy of predictive models.

1. RMSE (Root Mean Squared Error): It is calculated by taking the square root of the average of the squared differences
        between predicted and actual values. Mathematically, RMSE = \( \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n}} \).
        RMSE is expressed in the same units as the dependent variable, with higher values indicating larger prediction errors.

2. MSE (Mean Squared Error): It is the average of the squared differences between predicted and actual values.
        MSE = \( \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n} \). MSE is the square of RMSE, so it is expressed in squared units
        and is sensitive to larger errors.

3. MAE (Mean Absolute Error): It is the average of the absolute differences between predicted and actual values.
        MAE = \( \frac{\sum_{i=1}^{n}|y_i - \hat{y}_i|}{n} \). MAE is less sensitive to outliers than RMSE and MSE, providing a
        measure of the average absolute prediction error.

Lower values for RMSE, MSE, and MAE indicate smaller prediction errors and therefore better model performance.

'''
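
The three definitions above can be written out in a few lines of plain Python. This is a sketch for illustration; the function names and the sample values are assumptions, and in practice a library implementation would normally be used.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values (errors: 1, 0, -2, 0).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 9.0, 9.0]

print("MSE :", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("MAE :", mae(y_true, y_pred))
```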

# Ans : 5

In [None]:
'''
Advantages:

1. RMSE (Root Mean Squared Error):
   - Penalizes large errors more heavily, providing a stronger emphasis on significant deviations.
   - More sensitive to outliers, making it suitable when large errors are particularly important.

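2. MSE (Mean Squared Error):
   - Slightly cheaper to compute than RMSE since it doesn't involve the square root operation, and differentiable everywhere,
     which makes it a convenient loss function for optimization.
   - Emphasizes larger errors, helping to identify substantial deviations from predictions.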

3. MAE (Mean Absolute Error):
   - Robust to outliers as it does not square the errors, providing a more balanced view of overall accuracy.
   - Intuitively interpretable, representing the average magnitude of prediction errors.

Disadvantages:

1. RMSE:
   - Sensitive to outliers, which can skew the evaluation if the dataset contains extreme values.
   - May not be suitable when the emphasis is on smaller errors or when outliers need to be downplayed.

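2. MSE:
   - Similar to RMSE, it can be heavily influenced by outliers, impacting the interpretation of overall model performance.
   - Expressed in squared units of the dependent variable, which makes its magnitude harder to interpret than RMSE or MAE.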

3. MAE:
   - Ignores the relative importance of large errors, potentially downplaying their significance.
   - Less sensitive to extreme values, which might be crucial in certain applications.

Choosing the appropriate metric depends on the specific goals and characteristics of the dataset, with RMSE and MSE often
favored when larger errors require more attention, and MAE preferred for a more robust evaluation against outliers.

'''
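
The difference in outlier sensitivity can be seen with a small numerical sketch. The error values below are hypothetical: four errors of size 1, and then the same set with one error replaced by 10. Both metrics grow, but RMSE grows considerably more than MAE.

```python
import math


def rmse_of_errors(errors):
    """RMSE computed directly from a list of prediction errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


def mae_of_errors(errors):
    """MAE computed directly from a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)


clean = [1.0, 1.0, 1.0, 1.0]          # hypothetical errors, no outlier
with_outlier = [1.0, 1.0, 1.0, 10.0]  # one large error

rmse_clean, mae_clean = rmse_of_errors(clean), mae_of_errors(clean)
rmse_out, mae_out = rmse_of_errors(with_outlier), mae_of_errors(with_outlier)

print("clean       : RMSE", rmse_clean, "MAE", mae_clean)
print("with outlier: RMSE", round(rmse_out, 3), "MAE", mae_out)
```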

# Ans : 6

In [None]:
'''
Lasso regularization, or L1 regularization, is a technique used in linear regression to prevent overfitting and to perform
feature selection. It adds a penalty term to the cost function proportional to the absolute values of the regression coefficients.
The Lasso objective function is the sum of the squared residuals plus the sum of the absolute values of the coefficients
multiplied by a regularization parameter (λ).

\[ \text{Lasso Objective Function} = \text{Sum of Squared Residuals} + \lambda \sum_{j=1}^{p} |b_j| \]

Here, \( b_j \) represents the regression coefficients, and \( p \) is the number of predictors.

The key difference between Lasso and Ridge regularization (L2 regularization) is the penalty term. While Ridge uses the squared
values of the coefficients, Lasso uses their absolute values. This leads Lasso to enforce sparsity, meaning it tends to drive
some coefficients exactly to zero, effectively performing feature selection.

Lasso is more appropriate when dealing with datasets where many features may be irrelevant or redundant, and a simpler,
more interpretable model is desired.

'''
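
The sparsity effect is easiest to see in the simplest possible case: a single predictor and no intercept, where both penalized problems have closed-form solutions. The sketch below assumes that setting and uses the objective exactly as written above (sum of squared residuals plus λ times the penalty). The function names and data are illustrative. Lasso subtracts a fixed amount from the unpenalized solution and clips at zero, while Ridge only rescales it, so a weak predictor is set exactly to zero by Lasso but merely shrunk by Ridge.

```python
def lasso_1d(x, y, lam):
    """Minimize sum((y - b*x)^2) + lam*|b| over b (soft-thresholding)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    if sxy > lam / 2:
        return (sxy - lam / 2) / sxx
    if sxy < -lam / 2:
        return (sxy + lam / 2) / sxx
    return 0.0  # the penalty outweighs the fit improvement


def ridge_1d(x, y, lam):
    """Minimize sum((y - b*x)^2) + lam*b^2 over b."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


# Hypothetical data: one target strongly related to x, one barely related.
x = [1.0, -1.0, 1.0, -1.0]
y_strong = [2.1, -1.9, 2.0, -2.2]
y_weak = [0.1, 0.2, -0.1, 0.1]
lam = 1.0

print("strong:", lasso_1d(x, y_strong, lam), ridge_1d(x, y_strong, lam))
print("weak  :", lasso_1d(x, y_weak, lam), ridge_1d(x, y_weak, lam))
```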

# Ans : 7

In [None]:
'''
Regularized linear models, such as Lasso and Ridge regression, help prevent overfitting in machine learning by adding a 
penalty term to the cost function, which discourages excessively complex models with overly large coefficients.
This regularization term controls the trade-off between fitting the training data well and keeping the model parameters
within reasonable bounds.

For example, consider Lasso regression, which adds a penalty term proportional to the absolute values of the regression 
coefficients. If the model has many features and some are irrelevant, Lasso tends to drive the coefficients of irrelevant
features to zero, effectively excluding them from the model. This feature selection property prevents the model from fitting 
noise in the training data and results in a more generalized model.

'''
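
The feature selection behaviour described above can be sketched with a small coordinate descent solver for the Lasso objective (sum of squared residuals plus λ times the sum of absolute coefficients). This is a teaching sketch in plain Python, not a production solver, and the data are made up: the target depends on the first feature only, and the second feature is unrelated noise.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t, returning exactly 0.0 inside [-t, t]."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize sum of squared residuals + lam * sum(|b_j|), no intercept."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = y[i] - sum(b[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * partial
            z = sum(X[i][j] ** 2 for i in range(n))
            b[j] = soft_threshold(rho, lam / 2) / z
    return b


# Hypothetical data: y is roughly 2 * (first feature); second feature is noise.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -2.2, -1.8]

unpenalized = lasso_coordinate_descent(X, y, lam=0.0)
penalized = lasso_coordinate_descent(X, y, lam=1.0)
print("lam=0:", unpenalized)  # small nonzero weight on the noise feature
print("lam=1:", penalized)    # noise feature dropped
```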


# Ans : 8

In [None]:
'''
Regularized linear models, such as Lasso and Ridge regression, have some limitations that may make them less suitable in certain situations:

1. Loss of Interpretability: The penalty shrinks coefficients toward zero, so they are biased estimates of each feature's effect. This can make interpretation of individual coefficients challenging, particularly when features are heavily penalized or excluded.
  
2. Sensitivity to Outliers: Regularized models still minimize a squared-error loss, so they remain sensitive to outliers. The penalty constrains the coefficients but does not make the fit robust to extreme values.
  
3. Hyperparameter Tuning: The performance of regularized models depends on choosing an appropriate regularization strength (lambda/alpha), which requires careful tuning and validation.

4. Data Scaling Sensitivity: Regularized models are sensitive to the scale of the features, and it is often necessary to scale or normalize the data.

5. Not Ideal for Every Dataset: In cases where the relationship between features and the target variable is truly linear and
the number of features is small relative to the number of observations, ordinary least squares is unlikely to overfit, so the
bias introduced by the penalty may outweigh any benefit.

'''
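
The scaling sensitivity in point 4 can be shown with a one-predictor sketch (no intercept, closed-form solutions). The data and function names are assumptions for illustration. Expressing the same feature in different units leaves the least squares predictions unchanged, but changes the Ridge predictions, because the penalty acts on the size of the coefficient and that size depends on the units.

```python
def ols_slope(x, y):
    """Least squares slope for a single predictor without intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ridge_slope(x, y, lam):
    """Ridge slope minimizing sum((y - b*x)^2) + lam * b^2."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


# Hypothetical data, and the same feature expressed in units 100 times larger.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
x_scaled = [v / 100 for v in x]
lam = 1.0

# Prediction for the last observation under each fit.
ols_pred = ols_slope(x, y) * x[-1]
ols_pred_scaled = ols_slope(x_scaled, y) * x_scaled[-1]
ridge_pred = ridge_slope(x, y, lam) * x[-1]
ridge_pred_scaled = ridge_slope(x_scaled, y, lam) * x_scaled[-1]

print("OLS  :", ols_pred, ols_pred_scaled)      # unchanged by rescaling
print("Ridge:", ridge_pred, ridge_pred_scaled)  # changes with rescaling
```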

# Ans : 9

In [None]:
'''
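The two models cannot be ranked from these numbers alone, because RMSE and MAE are different metrics and are not directly
comparable. For any set of errors, RMSE is at least as large as MAE, so Model A with an RMSE of 10 could still have an MAE
below 8, and Model B with an MAE of 8 must have an RMSE of at least 8. A fair comparison evaluates both models with the same
metric on the same data. If the primary concern is avoiding large errors, compare the models on RMSE, since RMSE penalizes
larger errors more heavily. If the focus is on the average magnitude of errors without giving more weight to outliers,
compare them on MAE.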

Limitations to consider:
1. Sensitivity to Outliers: RMSE is more sensitive to outliers than MAE. If the dataset contains significant outliers, RMSE may be disproportionately influenced.
   
2. Interpretability: MAE is more interpretable since it represents the average absolute error. RMSE is in the same units, but because errors are squared before averaging, it does not correspond directly to a typical error size.

3. Problem-Specific Goals: The choice should align with the specific goals of the analysis. For example, in financial applications, large errors might be more critical than in other domains.

'''
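
A small sketch shows why an RMSE of 10 and an MAE of 8 do not settle the comparison. The error vectors below are hypothetical, constructed so that Model A has an RMSE of 10 and Model B has an MAE of 8. On these made-up errors Model A is better on MAE while Model B is better on RMSE, so the ranking depends entirely on which metric is applied to both.

```python
import math


def rmse_of_errors(errors):
    """RMSE computed directly from a list of prediction errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


def mae_of_errors(errors):
    """MAE computed directly from a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical errors consistent with the reported figures.
errors_a = [0.0, 0.0, 0.0, 20.0]  # mostly exact, one large miss
errors_b = [8.0, 8.0, 8.0, 8.0]   # consistently off by the same amount

rmse_a, mae_a = rmse_of_errors(errors_a), mae_of_errors(errors_a)
rmse_b, mae_b = rmse_of_errors(errors_b), mae_of_errors(errors_b)

print("Model A: RMSE", rmse_a, "MAE", mae_a)
print("Model B: RMSE", rmse_b, "MAE", mae_b)
```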

# Ans : 10

In [None]:
'''
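The choice between Ridge and Lasso regularization depends on the specific characteristics of the dataset and the goals of the analysis. The regularization parameters alone do not identify the better performer: the two penalties are on different scales, so λ=0.1 for Ridge and λ=0.5 for Lasso are not directly comparable, and the models should be compared on a common validation metric.

1. **Model A (Ridge with λ=0.1):** Ridge regularization adds a penalty term proportional to the squared values of the coefficients. It is effective in handling multicollinearity and preventing overly large coefficients. A lower regularization parameter (λ=0.1) indicates a milder penalty, allowing for a balance between fitting the data and regularization.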

2. **Model B (Lasso with λ=0.5):** Lasso regularization includes a penalty term proportional to the absolute values of the coefficients, encouraging sparsity and potentially driving some coefficients to exactly zero. A higher regularization parameter (λ=0.5) suggests a stronger penalty, favoring a more parsimonious model with feature selection.

Trade-offs and limitations:
- **Ridge:** Suitable when multicollinearity is a concern, and all features are expected to contribute. However, it might not perform well in situations where some features are truly irrelevant.
  
- **Lasso:** Useful for feature selection, but it tends to choose only one variable among a group of correlated variables. It might not be suitable when all features are expected to contribute.

