1. Concept of R-squared in Linear Regression Models
R-squared ($R^2$):

Definition: R-squared is a statistical measure that represents the proportion of the variance for the dependent variable that's explained by the independent variables in a regression model.
Calculation: $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$
$SS_{res}$: Sum of squares of residuals, $\sum_i (y_i - \hat{y}_i)^2$.
$SS_{tot}$: Total sum of squares, $\sum_i (y_i - \bar{y})^2$ (proportional to the variance of the dependent variable).
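Representation: For a least squares fit with an intercept, evaluated on the data it was fitted to, R-squared ranges from 0 to 1, and a value closer to 1 means the model explains a larger proportion of the variance. On held-out data, or for a model without an intercept, it can be negative, which means the model predicts worse than always predicting the mean.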

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Example data
X = [[1], [2], [3], [4], [5]]
y = [1, 2, 3, 4, 5]

# Linear regression model
model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)

# Calculate R-squared
r_squared = r2_score(y, predictions)
print(f"R-squared: {r_squared}")
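
As a cross-check on the formula, the same quantity can be computed by hand from the two sums of squares. This is a minimal sketch in plain Python; the helper name `r2_from_sums` and the sample values are made up for illustration and are not part of scikit-learn.

```python
def r2_from_sums(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical values chosen so the sums are easy to check by hand
y_true = [3.0, 5.0, 7.0, 9.0]
good_pred = [2.5, 5.5, 7.0, 9.5]   # close to the true values
bad_pred = [9.0, 7.0, 5.0, 3.0]    # true values in reverse order

r2_good = r2_from_sums(y_true, good_pred)
r2_bad = r2_from_sums(y_true, bad_pred)
print(f"Close predictions:    {r2_good:.4f}")
print(f"Reversed predictions: {r2_bad:.4f}")
```

The second value is negative because those predictions fit worse than always predicting the mean of y.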


2. Definition of Adjusted R-squared and its Difference from Regular R-squared
Adjusted R-squared:

Definition: Adjusted R-squared corrects the R-squared value for the number of predictors in the model. Adding a variable to a least squares model never lowers R-squared, even when the variable is pure noise, so the adjustment penalizes each extra predictor.
Calculation: $\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$
$n$: Number of observations.
$p$: Number of predictors.
Difference: R-squared never decreases when a predictor is added to a least squares model, whereas adjusted R-squared decreases when the added predictor does not improve the fit enough to offset the penalty for the extra parameter.

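# Example adjusted R-squared calculation
def adjusted_r2(r_squared, n, p):
    # The formula is only defined for n > p + 1 (positive residual degrees of freedom)
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)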

# Parameters
n = 10  # number of observations
p = 2   # number of predictors
r_squared = 0.9  # example R-squared

# Calculate adjusted R-squared
adj_r_squared = adjusted_r2(r_squared, n, p)
print(f"Adjusted R-squared: {adj_r_squared}")


3. When to Use Adjusted R-squared
When to Use: Adjusted R-squared is more appropriate than R-squared when comparing models with different numbers of predictors fitted to the same data. It penalizes predictors that do not improve the model enough to justify the extra parameter, which discourages overfitting.
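
A short sketch of such a comparison, using the adjusted R-squared formula from the previous section. The two R-squared values are hypothetical: they stand for one model with two predictors and the same model after adding a third, weak predictor.

```python
def adjusted_r2(r_squared, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

n_obs = 10
r2_small, p_small = 0.900, 2   # hypothetical: model with 2 predictors
r2_large, p_large = 0.905, 3   # hypothetical: one weak predictor added

adj_small = adjusted_r2(r2_small, n_obs, p_small)
adj_large = adjusted_r2(r2_large, n_obs, p_large)

print(f"2 predictors: R2 = {r2_small:.3f}, adjusted R2 = {adj_small:.4f}")
print(f"3 predictors: R2 = {r2_large:.3f}, adjusted R2 = {adj_large:.4f}")
```

R-squared rises slightly with the third predictor, but adjusted R-squared falls, so on this criterion the smaller model would be preferred.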

4. RMSE, MSE, and MAE in Regression Analysis
Definitions:

Mean Squared Error (MSE): The average of the squared differences between the actual and predicted values. $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Root Mean Squared Error (RMSE): The square root of the MSE. It has the same units as the dependent variable. $\text{RMSE} = \sqrt{\text{MSE}}$
Mean Absolute Error (MAE): The average of the absolute differences between the actual and predicted values. $\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
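
The three definitions translate directly into code. The following is a minimal sketch in plain Python with made-up values; the function names are illustrative and are not taken from any library.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the MSE, in the units of y."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 13.0]

mse_value = mse(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
mae_value = mae(y_true, y_pred)
print(f"MSE: {mse_value}, RMSE: {rmse_value:.4f}, MAE: {mae_value}")
```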

5. Advantages and Disadvantages of RMSE, MSE, and MAE
Advantages:

MSE: Penalizes larger errors more than smaller ones due to squaring; useful for emphasizing large errors.
RMSE: Has the same units as the dependent variable; easier to interpret compared to MSE.
MAE: More robust to outliers; represents the average error in absolute terms.
Disadvantages:

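MSE: Sensitive to outliers; harder to interpret because it is expressed in squared units of the dependent variable.
RMSE: Sensitive to outliers; its value only means something relative to the scale of the dependent variable, so it cannot be compared across targets measured on different scales.
MAE: Does not penalize larger errors as much as MSE/RMSE; because it weights all errors linearly, it says little about how spread out the errors are.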
Usage: Choose the metric based on the context of the problem and the importance of penalizing larger errors versus overall error robustness.
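
The difference in outlier sensitivity can be seen by adding one large error to an otherwise uniform set of errors. This is an illustrative sketch with made-up error values; the helper names are hypothetical.

```python
import math

def mae_of(errors):
    """Mean absolute error from a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse_of(errors):
    """Root mean squared error from a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

clean_errors = [1.0, -1.0, 1.0, -1.0]
outlier_errors = clean_errors + [10.0]   # one large miss added

mae_clean, rmse_clean = mae_of(clean_errors), rmse_of(clean_errors)
mae_out, rmse_out = mae_of(outlier_errors), rmse_of(outlier_errors)

print(f"Without outlier: MAE = {mae_clean:.2f}, RMSE = {rmse_clean:.2f}")
print(f"With outlier:    MAE = {mae_out:.2f}, RMSE = {rmse_out:.2f}")
```

Both metrics start at 1.0; after the single large error is added, RMSE grows considerably more than MAE.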

6. Lasso Regularization
Lasso Regularization (Least Absolute Shrinkage and Selection Operator):

Definition: A type of linear regression whose loss includes a penalty proportional to the sum of the absolute values of the coefficients.
Equation: $\text{Loss} = \text{RSS} + \lambda \sum_{j=1}^{p} |\beta_j|$
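Difference from Ridge Regularization: Lasso penalizes the $L_1$ norm of the coefficients (sum of absolute values), whereas Ridge penalizes the squared $L_2$ norm (sum of squared values). Lasso can shrink some coefficients exactly to zero, effectively performing feature selection; Ridge shrinks coefficients toward zero but generally leaves them nonzero.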
When to Use: When you have many features and expect that only a subset of them are useful for the model.

from sklearn.linear_model import Lasso

# Example data
X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [1, 2, 3, 4]

# Lasso regression
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print(f"Coefficients: {lasso.coef_}")
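
To see why the L1 penalty produces exact zeros, it helps to look at the special case of an orthonormal design matrix (an assumption made here only for illustration). In that case both penalized problems have closed-form solutions in terms of the least squares coefficients: with the loss RSS + λ Σ|β_j|, Lasso soft-thresholds each coefficient by λ/2, and with the loss RSS + λ Σ β_j², Ridge divides each coefficient by 1 + λ. The function names and coefficient values below are hypothetical.

```python
def lasso_orthonormal(beta_ols, lam):
    """Soft-threshold each least squares coefficient by lam / 2."""
    threshold = lam / 2
    shrunk = []
    for b in beta_ols:
        if abs(b) <= threshold:
            shrunk.append(0.0)            # small coefficients become exactly zero
        elif b > 0:
            shrunk.append(b - threshold)
        else:
            shrunk.append(b + threshold)
    return shrunk

def ridge_orthonormal(beta_ols, lam):
    """Scale every least squares coefficient by 1 / (1 + lam)."""
    return [b / (1 + lam) for b in beta_ols]

beta_ols = [3.0, -0.4, 0.2, -2.0]   # hypothetical least squares coefficients
lam = 1.0

lasso_coefs = lasso_orthonormal(beta_ols, lam)
ridge_coefs = ridge_orthonormal(beta_ols, lam)
print(f"Lasso: {lasso_coefs}")
print(f"Ridge: {ridge_coefs}")
```

Lasso sets the two small coefficients to zero, while Ridge shrinks all four and keeps them nonzero. Note that scikit-learn's `Lasso` divides the RSS term by twice the number of samples, so its `alpha` is not numerically the same as λ in the loss written above.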


7. Regularized Linear Models and Overfitting
Regularized Linear Models: Models whose loss includes a penalty term on the magnitude of the coefficients. Shrinking the coefficients lowers the model's variance, which reduces overfitting.

Ridge: Adds a penalty proportional to the sum of the squared coefficients (squared $L_2$ norm).
Lasso: Adds a penalty proportional to the sum of the absolute coefficients ($L_1$ norm).

from sklearn.linear_model import Ridge

# Ridge regression (reuses X and y from the Lasso example above)
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
print(f"Coefficients: {ridge.coef_}")


8. Limitations of Regularized Linear Models
Limitations:

Bias-Variance Tradeoff: Regularization introduces bias, which can lead to underfitting if not balanced correctly.
Feature Scaling: The penalty acts on coefficient size, which depends on each feature's units, so features should be standardized before fitting.
Not Always Optimal: May not capture complex, non-linear relationships in the data.
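
The feature-scaling point can be made concrete with single-feature ridge regression without an intercept. There the penalized coefficient is Σxy / (Σx² + λ), so the fitted values equal the least squares fit multiplied by Σx² / (Σx² + λ). The sketch below evaluates that factor for the same hypothetical feature expressed in two different units; the helper name and values are made up for illustration.

```python
def ridge_shrinkage_factor(x, lam):
    """Fraction of the least squares fit that survives the ridge penalty
    (single feature, no intercept)."""
    sum_sq = sum(v * v for v in x)
    return sum_sq / (sum_sq + lam)

lam = 14.0
x_metres = [1.0, 2.0, 3.0]
x_millimetres = [v * 1000 for v in x_metres]   # same feature, different units

factor_m = ridge_shrinkage_factor(x_metres, lam)
factor_mm = ridge_shrinkage_factor(x_millimetres, lam)
print(f"Feature in metres:      fit scaled by {factor_m:.6f}")
print(f"Feature in millimetres: fit scaled by {factor_mm:.6f}")
```

With the same λ, the fit is halved when the feature is in metres and almost untouched when it is in millimetres, which is why a common scale is needed for the penalty to treat features evenly.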

9. Comparing Regression Models with Different Evaluation Metrics
Comparing RMSE and MAE:

Model A: RMSE of 10.
Model B: MAE of 8.
Choosing the Better Model:

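The two numbers cannot be compared directly, because they are different metrics. For any single model, RMSE is at least as large as MAE, so Model B's RMSE is at least 8 and may exceed 10, while Model A's MAE is at most 10 and may be below 8. A fair comparison computes the same metric for both models on the same data.
If minimizing large errors is more important, compare the two models on RMSE.
If overall error robustness is more important, compare the two models on MAE.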
Limitations: Different metrics capture different aspects of model performance.
It's crucial to consider the context and objectives of the analysis when choosing the evaluation metric.
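
A model's MAE never exceeds its RMSE, so an MAE of 8 on its own does not say whether that model's RMSE is below or above 10. The sketch below uses two made-up error sets with the same MAE to show how far apart the corresponding RMSE values can be; the helper names are hypothetical.

```python
import math

def mae_of(errors):
    """Mean absolute error from a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse_of(errors):
    """Root mean squared error from a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

even_errors = [8.0, 8.0, 8.0, 8.0]     # every prediction off by the same amount
spiky_errors = [0.0, 0.0, 0.0, 32.0]   # three exact predictions, one large miss

mae_even, rmse_even = mae_of(even_errors), rmse_of(even_errors)
mae_spiky, rmse_spiky = mae_of(spiky_errors), rmse_of(spiky_errors)

print(f"Even errors:  MAE = {mae_even}, RMSE = {rmse_even}")
print(f"Spiky errors: MAE = {mae_spiky}, RMSE = {rmse_spiky}")
```

Both error sets have an MAE of 8, yet one has an RMSE of 8 and the other an RMSE of 16, so a model reported only by MAE could be better or worse than one with an RMSE of 10.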

10. Comparing Regularized Linear Models with Different Regularization Methods
Comparing Ridge and Lasso:

Model A (Ridge): Regularization parameter of 0.1.
Model B (Lasso): Regularization parameter of 0.5.
Choosing the Better Model:

If feature selection is essential, prefer Model B (Lasso) as it can shrink coefficients to zero.
If capturing all features with reduced magnitude is essential, prefer Model A (Ridge).
Trade-offs and Limitations:

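Lasso: Effective for feature selection, but when features are strongly correlated it tends to keep one of them somewhat arbitrarily, so the selected set can be unstable.
Ridge: Better at handling multicollinearity but doesn't perform feature selection.
Regularization parameters: The values 0.1 and 0.5 are not directly comparable, because the two penalties are on different scales. Each parameter should be tuned for its own model, for example by cross-validation, and the tuned models compared on held-out error.
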
from sklearn.linear_model import Lasso, Ridge

# Ridge and Lasso regression (reuses X and y from the Lasso example above)
ridge = Ridge(alpha=0.1)
lasso = Lasso(alpha=0.5)

ridge.fit(X, y)
lasso.fit(X, y)

print(f"Ridge coefficients: {ridge.coef_}")
print(f"Lasso coefficients: {lasso.coef_}")
