
In [6]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score


1. **Fetching the Boston Housing dataset from the original source:**
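   - The `load_boston` function was removed from scikit-learn in version 1.2, so this segment reads the dataset directly from the original source instead.
   - `pd.read_csv` reads the file from the URL with runs of whitespace as the delimiter (`sep` set to the regex `\s+`), skips the 22 lines of descriptive text at the top of the file (`skiprows=22`), and treats no row as a header (`header=None`).
   - Each record in the file spans two physical lines. The even-numbered rows and the first two columns of the odd-numbered rows are stacked into the 13 features (`X`), and the third column of the odd-numbered rows becomes the target variable (`y`).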


In [7]:
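# Fetch the Boston Housing dataset
data_url = "http://lib.stat.cmu.edu/datasets/boston"
# A raw string avoids an invalid-escape warning for the regex delimiter
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
# Each record spans two rows: 11 values on the first, 2 features + target on the second
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]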
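
The slicing in the cell above is easier to follow on a miniature example. The sketch below assumes the same two-rows-per-record layout and applies the same indexing to a hypothetical 2-record array (`raw`, `X_demo`, and `y_demo` are illustrative names, and the values are made up, not taken from the dataset).

```python
import numpy as np

nan = np.nan

# Hypothetical miniature of the parsed file: 2 records, each split over two rows.
# The first row of a record holds 11 values; the second holds 3 values
# (2 features + target) and is padded with NaN by the parser.
raw = np.array([
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [11, 12, 100, nan, nan, nan, nan, nan, nan, nan, nan],
    [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
    [31, 32, 200, nan, nan, nan, nan, nan, nan, nan, nan],
], dtype=float)

# Same indexing as in the notebook cell: even rows give 11 features,
# odd rows give 2 more features and the target.
X_demo = np.hstack([raw[::2, :], raw[1::2, :2]])
y_demo = raw[1::2, 2]

print(X_demo.shape)  # (2, 13)
print(y_demo)        # [100. 200.]
```

None of the NaN padding reaches `X_demo`, because only the first two columns of the odd rows are kept as features.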


2. **Splitting the data into train and test sets:**
   - This segment splits the dataset into training and test sets using the `train_test_split` function from scikit-learn's `model_selection` module.
   - Setting `test_size=0.2` holds out 20% of the samples for testing (`X_test` and `y_test`) and leaves the remaining 80% for training (`X_train` and `y_train`).
   - Fixing `random_state=42` seeds the shuffle, so the same split is produced on every run.


In [8]:

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
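
To see what the split produces, here is a small sketch on synthetic data. It assumes an array with the same shape as the Boston features (506 rows, 13 columns); the random values and the `*_demo` names are stand-ins, not the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the Boston features (506 x 13).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(506, 13))
y_demo = rng.normal(size=506)

# Two calls with the same seed, to check reproducibility.
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)  # (404, 13) (102, 13)

# The same seed yields the same partition.
same_split = np.array_equal(split_a[1], split_b[1])
print(same_split)  # True
```

The test set has 102 rows rather than 101 because 20% of 506 is 101.2 and the test size is rounded up.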


3. **Training the linear regression model and making predictions:**
   - This segment creates an instance of the linear regression model using `LinearRegression()` from scikit-learn's `linear_model` module and trains it on the training data using the `fit` method.
   - Once the model is trained, it makes predictions on the test set using the `predict` method and assigns the predicted values to `y_pred`.


In [9]:
# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

In [10]:
# Make predictions on the test set
y_pred = model.predict(X_test)
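
As a rough illustration of what `fit` and `predict` do, the sketch below fits `LinearRegression` to synthetic data generated from a known linear relationship. The coefficients, noise level, and `demo_*` names are assumptions made for the example. It then checks that `predict` matches the linear combination built from the fitted `coef_` and `intercept_` attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data from a known linear relationship plus small noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_coef + 4.0 + rng.normal(scale=0.01, size=200)

demo_model = LinearRegression().fit(X_demo, y_demo)

# predict() is the weighted sum of the features plus the intercept.
manual_pred = X_demo @ demo_model.coef_ + demo_model.intercept_
matches = np.allclose(manual_pred, demo_model.predict(X_demo))
print(matches)  # True

# The fitted parameters should sit close to the values used to generate the data.
print(demo_model.coef_, demo_model.intercept_)
```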


4. **Computing evaluation metrics:**
   - This segment computes three standard regression metrics on the test set, each imported from scikit-learn's `metrics` module.
   - `mean_squared_error` returns the mean squared error (MSE), the average squared difference between the predicted and actual values. It is expressed in squared units of the target and penalizes large errors more heavily than small ones.
   - `mean_absolute_error` returns the mean absolute error (MAE), the average absolute difference between the predicted and actual values, in the same units as the target.
   - `r2_score` returns the R-squared score, the proportion of variance in the dependent variable that the model explains. A score of 1 is a perfect fit, 0 matches a model that always predicts the mean, and negative values are possible on held-out data.


In [11]:
# Compute evaluation metrics
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
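
The three metrics can also be written out directly from their definitions. The following is a minimal NumPy sketch, not scikit-learn's implementation. The helper name `regression_metrics` and the four-point example are hypothetical, and the root mean squared error (RMSE) is added as a convenience because it is in the same units as the target.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and R-squared from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = np.mean(residuals ** 2)      # average squared error
    mae = np.mean(np.abs(residuals))   # average absolute error

    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "rmse": np.sqrt(mse), "mae": mae, "r2": r2}

# Tiny made-up example: residuals are 1, 0, -1, -2.
metrics_demo = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 11.0])
print(metrics_demo["mse"])  # 1.5
print(metrics_demo["mae"])  # 1.0
```

In this example the squared residuals sum to 6 and the total variation around the mean of 6 is 20, so R-squared is 1 - 6/20 = 0.7.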



5. **Printing the evaluation metrics:**
   - This segment prints the three metrics computed above: the mean squared error (MSE), the mean absolute error (MAE), and the R-squared score.

Together, these values summarize how well the model performs on the held-out test set: MSE and MAE measure the size of the prediction errors, and R-squared measures how much of the target's variance the model explains.



In [12]:
# Print the evaluation metrics
print("Mean Squared Error (MSE):", mse)
print("Mean Absolute Error (MAE):", mae)
print("R-squared:", r2)


Mean Squared Error (MSE): 24.291119474973485
Mean Absolute Error (MAE): 3.189091965887834
R-squared: 0.6687594935356325
