
In [23]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

In [24]:
df = pd.read_csv('audi.csv')

In [25]:
x = df.drop('price', axis=1)
y = df['price']

In [26]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

In [27]:
c = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown='ignore', sparse_output=False, drop='first'),
         ['model', 'transmission', 'fuelType']),
        ("scaling", StandardScaler(), ['mileage', 'tax', 'mpg', 'engineSize']),
    ],
    remainder='passthrough',
)

In [28]:
x_train = c.fit_transform(x_train)
x_test = c.transform(x_test)


# Implementing the regression metrics using a Python class

In [29]:
import numpy as np
import matplotlib.pyplot as plt

class LinearRegressionManual:
    def __init__(self, x_train, y_train, x_test, y_test):
        self.x_train = np.array(x_train)
        self.y_train = np.array(y_train)
        self.x_test = np.array(x_test)
        self.y_test = np.array(y_test)
        self.m = None
        self.c = None
        self.y_pred = None

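    def train(self):
        """Train the model using the OLS (least-squares) method"""
        # Prepend a column of ones so the first coefficient is the intercept
        X = np.c_[np.ones(self.x_train.shape[0]), self.x_train]
        # lstsq solves the same least-squares problem as the normal equation
        # but avoids explicitly inverting X.T @ X, which can be ill-conditioned
        theta, *_ = np.linalg.lstsq(X, self.y_train, rcond=None)
        self.c = theta[0]
        self.m = theta[1:]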

    def predict(self):
        """Make predictions using the trained model"""
        X_test = np.c_[np.ones(self.x_test.shape[0]), self.x_test]
        self.y_pred = X_test @ np.r_[self.c, self.m]
        return self.y_pred

    def mse(self):
        """Mean Squared Error"""
        return np.mean((self.y_test - self.y_pred) ** 2)

    def mae(self):
        """Mean Absolute Error"""
        return np.mean(np.abs(self.y_test - self.y_pred))

    def rmse(self):
        """Root Mean Squared Error"""
        return np.sqrt(self.mse())

    def r2_score(self):
        """R² Score"""
        ss_total = np.sum((self.y_test - np.mean(self.y_test)) ** 2)
        ss_residual = np.sum((self.y_test - self.y_pred) ** 2)
        return 1 - (ss_residual / ss_total)

    def adjusted_r2(self):
        """Adjusted R² Score"""
        n = len(self.y_test)
        p = self.x_test.shape[1]
        r2 = self.r2_score()
        return 1 - ((1 - r2) * (n - 1) / (n - p - 1))

    def evaluate(self):
        """Print evaluation metrics"""
        print(f"MSE: {self.mse():.4f}")
        print(f"MAE: {self.mae():.4f}")
        print(f"RMSE: {self.rmse():.4f}")
        print(f"R² Score: {self.r2_score():.4f}")
        print(f"Adjusted R² Score: {self.adjusted_r2():.4f}")

    def plot_regression(self):
        """Plot the regression line (only for single feature regression)"""
        if self.x_test.shape[1] == 1:
            plt.scatter(self.x_test, self.y_test, color="blue", label="Actual Data")
            plt.plot(self.x_test, self.y_pred, color="red", linewidth=2, label="Regression Line")
            plt.xlabel("X Test")
            plt.ylabel("Y Test")
            plt.legend()
            plt.title("Linear Regression Model")
            plt.show()
        else:
            print("Plotting is only supported for single feature regression.")





In [30]:
model = LinearRegressionManual(x_train, y_train, x_test, y_test)
model.train()
model.predict()
model.evaluate()
model.plot_regression()

MSE: 15807273.4085
MAE: 2605.5566
RMSE: 3975.8362
R² Score: 0.8813
Adjusted R² Score: 0.8793
Plotting is only supported for single feature regression.


# 📌 Linear Regression (Manual Implementation) – Explanation
Linear Regression is a fundamental algorithm used in machine learning to predict a continuous value based on input features. It establishes a linear relationship between the dependent variable (target) and independent variable(s) (features). The objective is to find the best-fit line that minimizes the difference between the actual and predicted values.
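
In matrix form, the idea above can be written compactly. The symbols are introduced here for illustration and are not defined elsewhere in this notebook: $X$ is the feature matrix with a leading column of ones, $\theta$ stacks the intercept and the weights, and $y$ is the vector of targets.

$$
\hat{y} = X\theta, \qquad
\theta^{*} = \arg\min_{\theta} \lVert y - X\theta \rVert_2^2, \qquad
\theta^{*} = (X^{\top} X)^{-1} X^{\top} y \quad \text{(when } X^{\top} X \text{ is invertible)}
$$

The last expression is the closed-form (normal equation) solution of the least-squares problem in the middle.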

* 1️⃣ Training the Model (Finding Coefficients)
The model learns the weights (one slope per feature) and the intercept that define the best-fit line, or hyperplane when there are several features. These values are obtained by minimizing the sum of squared residuals, so the predictions are as close as possible to the actual values in the least-squares sense.

* 2️⃣ Making Predictions
Once the model is trained, predictions are made by plugging the input values into the learned equation. This helps estimate outputs for new, unseen data.
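
The two steps above can be sketched on a small synthetic dataset. This is a minimal illustration, not the notebook's class: the helper names `fit_ols` and `predict_ols` and the generated data are assumptions made for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Return (intercept, weights) that minimize the sum of squared residuals."""
    # Prepend a column of ones so the first coefficient acts as the intercept
    X_design = np.c_[np.ones(X.shape[0]), X]
    theta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return theta[0], theta[1:]

def predict_ols(X, intercept, weights):
    """Plug the inputs into the learned linear equation."""
    return intercept + X @ weights

# Hypothetical noise-free data generated from y = 3 + 2*x1 - 1*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

intercept, weights = fit_ols(X, y)

# Predict for one new, unseen input
y_new = predict_ols(np.array([[1.0, 1.0]]), intercept, weights)
print(intercept, weights, y_new)
```

Because the synthetic data contain no noise, the fitted intercept and weights should match the generating values up to floating-point error.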

## 📌 Regression Metrics (Manual Implementation)
To evaluate how well the model performs, we use various metrics to measure the error and accuracy.

* 1️⃣ Mean Squared Error (MSE)
This metric calculates the average squared differences between actual and predicted values. Squaring the errors ensures that large deviations contribute more to the total error, making it useful for detecting large mistakes. Lower values indicate better model performance.

* 2️⃣ Mean Absolute Error (MAE)
Unlike MSE, this metric averages the absolute differences without squaring them. It is expressed in the same units as the target and is less sensitive to outliers than MSE, which makes it easier to interpret.

* 3️⃣ Root Mean Squared Error (RMSE)
This metric is the square root of MSE, which brings the error back to the same units as the target variable. Like MSE, it penalizes large errors more heavily than MAE does.

* 4️⃣ R² Score (Coefficient of Determination)
This metric measures the proportion of the variance in the dependent variable that the model explains. A value close to 1 means the model fits well, while a value near 0 means it does no better than always predicting the mean of the target. On test data the score can even be negative if the model performs worse than that baseline.

* 5️⃣ Adjusted R² Score
This metric improves upon the R² score by adjusting for the number of input features. It ensures that adding unnecessary features does not falsely inflate the model’s accuracy. This is particularly useful in models with multiple input variables.
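
As a worked example, the five metrics above can be computed by hand on a tiny set of values. The function name `regression_metrics` and the numbers below are made up for illustration; the formulas follow the definitions in the list.

```python
import numpy as np

def regression_metrics(y_true, y_pred, n_features):
    """Compute MSE, MAE, RMSE, R² and adjusted R² from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = np.mean(residuals ** 2)
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(mse)

    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    r2 = 1 - ss_res / ss_tot

    n = len(y_true)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2, "adjusted_r2": adj_r2}

# Hypothetical values: residuals are -1, 0, 1, 0, -2
y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [11.0, 12.0, 13.0, 16.0, 20.0]

metrics = regression_metrics(y_true, y_pred, n_features=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Here the squared residuals sum to 6 and the total sum of squares is 40, so MSE is 1.2, MAE is 0.8, R² is 0.85, and adjusted R² with one feature is 0.8.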

### 📌 Model Evaluation
After training the model and making predictions, we use these metrics to assess performance. High error on both the training and test sets suggests underfitting, while low training error combined with high test error suggests overfitting. Either case calls for further adjustments to the model or the features.
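
One simple way to check for underfitting or overfitting is to compute the same error metric on the training data and on held-out data and compare the two. The sketch below uses synthetic data and a plain least-squares fit; the data, the 160/40 split, and the helper `rmse` are assumptions made for the example rather than part of the notebook's pipeline.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical data: a linear signal plus Gaussian noise (std 0.5)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Hold out the last 40 rows as a test set
X_tr, X_te = X[:160], X[160:]
y_tr, y_te = y[:160], y[160:]

# Fit ordinary least squares with an intercept column
A_tr = np.c_[np.ones(len(X_tr)), X_tr]
A_te = np.c_[np.ones(len(X_te)), X_te]
theta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)

rmse_train = rmse(y_tr, A_tr @ theta)
rmse_test = rmse(y_te, A_te @ theta)
gap = rmse_test - rmse_train

print(f"train RMSE: {rmse_train:.3f}, test RMSE: {rmse_test:.3f}, gap: {gap:.3f}")
```

A test RMSE far above the training RMSE would point toward overfitting, while large values on both would point toward underfitting. Since the data here are truly linear, the two values should be close to each other and to the noise level.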



# Implementation using the sklearn library

In [35]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

class SklearnRegressionMetrics:
    def __init__(self, x_train, y_train, x_test, y_test):
        """
        Initialize with preprocessed training and testing data.
        Train the Linear Regression model using Sklearn.
        """
        self.x_train = np.array(x_train)
        self.y_train = np.array(y_train)
        self.x_test = np.array(x_test)
        self.y_test = np.array(y_test)

        self.model = LinearRegression()
        self.model.fit(self.x_train, self.y_train)
        self.y_pred = self.model.predict(self.x_test)

        self.n_test = len(self.x_test)
        self.k = self.x_train.shape[1] if self.x_train.ndim > 1 else 1

    def mse(self):
        """ Mean Squared Error (MSE) """
        return mean_squared_error(self.y_test, self.y_pred)

    def mae(self):
        """ Mean Absolute Error (MAE) """
        return mean_absolute_error(self.y_test, self.y_pred)

    def rmse(self):
        """ Root Mean Squared Error (RMSE) """
        return np.sqrt(self.mse())

    def r2_score(self):
        """ R² Score """
        return r2_score(self.y_test, self.y_pred)

    def adjusted_r2(self):
        """ Adjusted R² Score """
        r2 = self.r2_score()
        return 1 - ((1 - r2) * (self.n_test - 1) / (self.n_test - self.k - 1))

    def print_metrics(self):
        """ Print all regression metrics """
        print(f"Slope (m): {self.model.coef_}")
        print(f"Intercept (c): {self.model.intercept_}")
        print(f"MSE: {self.mse():.4f}")
        print(f"MAE: {self.mae():.4f}")
        print(f"RMSE: {self.rmse():.4f}")
        print(f"R² Score: {self.r2_score():.4f}")
        print(f"Adjusted R² Score: {self.adjusted_r2():.4f}")


In [37]:
regression = SklearnRegressionMetrics(x_train, y_train, x_test, y_test)
regression.print_metrics()


Slope (m): [ 1.92178629e+04  1.29739314e+03  1.54180779e+03  2.96542649e+03
  3.38921603e+03  4.78655410e+03  7.78215999e+03  1.25767636e+03
  2.84651695e+03  6.64017522e+03  1.52803428e+04  2.40532932e+04
  6.29106831e+04  9.61851279e+03  2.07114640e+04  2.07030990e+04
  2.61524962e+04  1.85973608e+04  3.98577106e+03  8.95570881e+03
  2.56006846e+03  9.48877896e+03  1.02110331e+04  1.80500328e+04
  3.52883742e+03 -1.50516734e+03  5.04928184e+01  3.38360445e+04
 -9.22835957e+02 -1.81255447e+03 -1.91903304e+03 -3.81772774e+03
  2.72687711e+03  1.83974092e+03]
Intercept (c): -3690450.7937985533
MSE: 15807273.4084
MAE: 2605.5566
RMSE: 3975.8362
R² Score: 0.8813
Adjusted R² Score: 0.8793
