# Adjusted R-squared Score

**Definition:**  
Adjusted R-squared is a modified version of R-squared (R²) that adjusts for the number of predictors in a regression model. In a least-squares model with an intercept, R² never decreases when a predictor is added, even an irrelevant one. Adjusted R-squared accounts for the degrees of freedom and therefore gives a more honest measure of goodness-of-fit when multiple predictors are involved.

**Formula:**

$$
\text{Adjusted } R^2 = 1 - \left( \frac{(1 - R^2) \cdot (n - 1)}{n - p - 1} \right)
$$

where:
- \( R^2 \) is the R-squared value of the model.
- \( n \) is the number of observations (data points).
- \( p \) is the number of predictors (independent variables) in the model.
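
The formula translates directly into a small helper. The sketch below is only an illustration: the name `adjusted_r2`, the argument check, and the sample values (R² = 0.90, n = 50, p = 3) are assumptions introduced here, not part of any library.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R² from R², n observations, and p predictors (hypothetical helper)."""
    if n - p - 1 <= 0:
        # The denominator n - p - 1 must be positive for the formula to be defined.
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(f"{adjusted_r2(0.90, 50, 3):.4f}")  # prints 0.8935
```

The guard matters in practice: with as many predictors as observations (minus one), the denominator is zero and the metric is undefined.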

**Importance of Adjusted R²:**
Adjusted R² is an essential metric for evaluating the performance of regression models, especially when comparing models with a different number of predictors. It helps in understanding how well the model fits the data while penalizing the inclusion of unnecessary predictors.

- **Model Comparison:** Adjusted R² allows for a fair comparison between models with different numbers of predictors, making it easier to determine which model is more effective.
- **Feature Selection:** Adjusted R² rises only when a new predictor improves the fit by more than would be expected by chance. If it drops after a feature is added, that feature is probably not contributing useful information.
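
The difference between the two metrics can be seen by fitting nested models. The following is a minimal sketch on synthetic data, assuming ordinary least squares with an intercept solved via `numpy.linalg.lstsq`. The data, the seed, and the helper name `fit_scores` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
n = 60
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)  # synthetic target: one real predictor plus noise
noise = rng.normal(size=(n, 5))   # five predictors unrelated to y


def fit_scores(X, y):
    """Return (R², adjusted R²) for an OLS fit with intercept (hypothetical helper)."""
    n_obs, p = X.shape
    A = np.column_stack([np.ones(n_obs), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    ss_res = np.sum((y - A @ coef) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n_obs - 1) / (n_obs - p - 1)
    return r2, adj


r2_small, adj_small = fit_scores(x.reshape(-1, 1), y)
r2_big, adj_big = fit_scores(np.column_stack([x, noise]), y)

print(f"1 predictor : R²={r2_small:.4f}  adjusted R²={adj_small:.4f}")
print(f"6 predictors: R²={r2_big:.4f}  adjusted R²={adj_big:.4f}")
```

R² for the larger model cannot be lower than for the smaller one, because the smaller model is a special case of it. Adjusted R² carries a penalty that grows with the number of predictors, so it can fall when the extra columns are noise.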

**Interpretation:**
- **Higher Adjusted R²:** Indicates a better fit of the model to the data, taking into account the number of predictors.
- **Lower Adjusted R²:** May suggest that the model fits poorly, or, when the value sits well below R², that some predictors add little explanatory power relative to the degrees of freedom they consume.
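
When reading low values, it helps to know that Adjusted R² is not bounded below by zero. A short rearrangement of the formula, assuming \( n > p + 1 \), shows when it turns negative:

$$
\text{Adjusted } R^2 < 0 \iff (1 - R^2)(n - 1) > n - p - 1 \iff R^2 < \frac{p}{n - 1}
$$

For instance, with \( n = 21 \) and \( p = 4 \) (illustrative numbers), any model with \( R^2 < 0.2 \) has a negative Adjusted R².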

**Example:**
Consider a regression problem where we predict house prices from five features such as size and location. Suppose we have the following R² value and model details:

- \( R^2 = 0.85 \)
- Number of observations (\( n \)) = 100
- Number of predictors (\( p \)) = 5

1. Calculate Adjusted R²:

Using the formula:

$$
\text{Adjusted } R^2 = 1 - \left( \frac{(1 - 0.85) \cdot (100 - 1)}{100 - 5 - 1} \right)
$$

Calculating:

$$
\text{Adjusted } R^2 = 1 - \left( \frac{0.15 \cdot 99}{94} \right) = 1 - \frac{14.85}{94} \approx 1 - 0.1580 \approx 0.8420
$$

This indicates that approximately 84.20% of the variance in the actual house prices is explained by the model after adjusting for the number of predictors.
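
To see how strongly the penalty depends on the number of predictors, the same calculation can be repeated with R² and n held fixed. This is a small sketch: only p = 5 comes from the example, and the other values of p are hypothetical.

```python
r2, n = 0.85, 100  # values from the example


def adjusted(r2, n, p):
    """Adjusted R² for the given R², observations n, and predictors p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# p = 5 is the example's model; 1, 20, and 50 are hypothetical alternatives.
results = {p: adjusted(r2, n, p) for p in (1, 5, 20, 50)}
for p, value in results.items():
    print(f"p = {p:>2}: adjusted R² = {value:.4f}")
```

With the same R², a model that needs more predictors to reach it receives a lower Adjusted R².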

**Conclusion:**
Adjusted R-squared is a valuable metric for assessing the performance of regression models, especially when multiple predictors are involved. Because it penalizes each additional predictor, it discourages adding variables that do not improve the fit, although it does not by itself prevent overfitting. By considering Adjusted R² alongside other metrics, stakeholders can gain a more complete view of model performance.


**Code Example:**

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Sample data
data = {
    'Size': [1500, 1600, 1700, 1800, 1900],
    'Location': [1, 2, 1, 2, 1],  # Categorical variable encoded as numeric
    'Price': [300000, 320000, 340000, 360000, 380000]
}

# Create DataFrame
df = pd.DataFrame(data)

# Independent variables (predictors)
X = df[['Size', 'Location']]
# Dependent variable (target)
y = df['Price']

# Fit linear regression model
model = LinearRegression()
model.fit(X, y)

# Make predictions on the training data
y_pred = model.predict(X)

# Calculate R²
r_squared = r2_score(y, y_pred)

# Calculate Adjusted R² (requires n > p + 1 so the denominator is positive)
n = len(y)  # Number of observations
p = X.shape[1]  # Number of predictors
adjusted_r_squared = 1 - ((1 - r_squared) * (n - 1)) / (n - p - 1)

print(f"R²: {r_squared:.4f}")
print(f"Adjusted R²: {adjusted_r_squared:.4f}")
```

Output:

```
R²: 1.0000
Adjusted R²: 1.0000
```

Both values are 1.0000 because the sample prices are an exact linear function of size (Price = 200 × Size). The model fits all five points perfectly, so there is no unexplained variance for the adjustment to penalize.
