## R2 / Adjusted R2 with Linear Regression

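R2 (the coefficient of determination) measures the proportion of the variance in the dependent variable that the model explains.

An R2 close to 1 means the model fits the data well, while an R2 close to 0 means it explains little more than always predicting the mean would. On held-out data R2 can even be negative.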
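To make the definition concrete, here is a minimal sketch that computes R2 by hand as 1 - SS_res / SS_tot in plain Python. The helper name `r2_by_hand` and the four-point data set are made up for illustration; they are not part of the notebook's sample data.

```python
def r2_by_hand(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_y = sum(y_true) / len(y_true)
    # Variation the predictions fail to explain.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total variation of the observations around their mean.
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observations and three hypothetical sets of predictions.
y_true = [3, 5, 7, 9]
close_preds = [2.5, 5.5, 7.5, 8.5]   # near the observations
mean_preds = [6, 6, 6, 6]            # always predict the mean
reversed_preds = [9, 7, 5, 3]        # worse than predicting the mean

r2_close = r2_by_hand(y_true, close_preds)
r2_mean = r2_by_hand(y_true, mean_preds)
r2_reversed = r2_by_hand(y_true, reversed_preds)

print(f'close: {r2_close:.2f}, mean: {r2_mean:.2f}, reversed: {r2_reversed:.2f}')
```

Predictions near the observations give an R2 near 1, predicting the mean gives exactly 0, and predictions that are worse than the mean give a negative value.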

In [1]:
#Import the required packages.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression

In [2]:
#Sample Data
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) #Study Hours
y = np.array([50, 55, 65, 70, 75, 80, 85, 90, 95, 100]) #Test Scores
#Convert from 1D array to 2D array.
X_reshaped = X.reshape(-1,1)
y_reshaped = y.reshape(-1,1)


In [3]:
#Split data for Training and Testing.
X_train, X_test, y_train, y_test = train_test_split(X_reshaped, y_reshaped, test_size=0.25, random_state=42)

In [4]:
#Fit the model.
model = LinearRegression()
model.fit(X_train,y_train)

In [5]:
#Predict on the test set.
y_pred = model.predict(X_test)

In [6]:
#Calculate R2.
r2Score = r2_score(y_test, y_pred)

In [7]:
print(f'r2Score: {r2Score:.2f}')

r2Score: 0.99


## Adjusted R2
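R2 measures how well the regression model explains the variance in the dependent variable, but on the training data it never decreases when more independent variables are added, even if those variables are irrelevant.

To counteract this, we use Adjusted R2, which penalizes each additional independent variable.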
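The effect can be checked with a small sketch that fits ordinary least squares twice on the study-hours data, once as-is and once with an extra column of random noise, and compares R2 on the training data. The helper `training_r2`, the noise column, and the seed are assumptions made for this illustration; the fit uses `np.linalg.lstsq` rather than scikit-learn only to keep the example short.

```python
import numpy as np

def training_r2(X, y):
    """Fit least squares with an intercept and return R2 on the same data."""
    A = np.column_stack([np.ones(len(X)), X])       # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares coefficients
    residuals = y - A @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

hours = np.arange(1, 11, dtype=float)                                    # Study Hours
scores = np.array([50, 55, 65, 70, 75, 80, 85, 90, 95, 100], dtype=float)  # Test Scores

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
noise = rng.normal(size=10)      # hypothetical feature unrelated to the scores

r2_one = training_r2(hours.reshape(-1, 1), scores)
r2_two = training_r2(np.column_stack([hours, noise]), scores)

print(f'R2 with hours only:     {r2_one:.6f}')
print(f'R2 with hours + noise:  {r2_two:.6f}')
```

The second value is never lower than the first, even though the added column carries no information about the scores.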

Formula:
$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$

where n = number of observations, R2 = regular R-squared value, p = number of independent variables.
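As a quick worked example of the formula, the sketch below wraps it in a small function. The name `adjusted_r2`, the guard against a non-positive denominator, and the input values are assumptions for illustration, not part of the notebook.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R2 for n observations and p independent variables."""
    if n - p - 1 <= 0:
        # The formula is undefined unless there are more than p + 1 observations.
        raise ValueError('adjusted R2 needs n > p + 1')
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical inputs: the same R2 with an increasing number of variables.
adj_two_vars = adjusted_r2(0.95, n=10, p=2)
adj_five_vars = adjusted_r2(0.95, n=10, p=5)

print(f'p=2: {adj_two_vars:.4f}')
print(f'p=5: {adj_five_vars:.4f}')
```

For a fixed R2 and n, the adjusted value drops as p grows, which is the penalty for adding variables.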

In [8]:
#Import required packages.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

In [9]:
#Sample Data.
X = np.array([[1, 7],[2, 6.5],[3, 6],[4, 5.5],[5, 5],[6, 6],[7, 7],[8, 8],[9, 9],[10, 10]]) #Study and Sleep Hours.
y = np.array([50, 55, 65, 70, 75, 80, 85, 90, 95, 100]) #Test Scores

In [10]:
#Split data for Training and Testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

In [11]:
#Fit model.
model = LinearRegression()
model.fit(X_train, y_train)

In [12]:
#Predict on the test set.
y_pred = model.predict(X_test)

In [13]:
#Calculate R2.
r2Score = r2_score(y_test, y_pred)

In [17]:
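#n = number of observations, p = number of independent variables.
#Note: n comes from the full dataset (10 rows), while r2Score was computed on the 3-row test set;
#with n = 3 and p = 2 the denominator n - p - 1 would be zero.
n, p = X.shape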

In [19]:
#Calculate Adjusted R2.
r2AdjScore = 1 - ((1 - r2Score)*(n - 1)/(n - p - 1))

In [20]:
print(f'R2 Adjusted Score: {r2AdjScore}')

R2 Adjusted Score: 0.9921946472001821
