# Evaluation Techniques for Regression Models

Evaluation metrics are essential for assessing how well a regression model performs. Mean Absolute Error (MAE) measures the average magnitude of the errors, while Mean Squared Error (MSE) penalizes larger errors more heavily because the differences are squared. Root Mean Squared Error (RMSE) expresses the error in the same unit as the target variable. R-squared (R²) indicates the proportion of variance explained by the model, and Adjusted R-squared additionally accounts for the number of predictors. Mean Absolute Percentage Error (MAPE) expresses the error as a percentage of the actual values, and Median Absolute Error gives the typical error size in a way that is less influenced by outliers. Together, these metrics help in comparing and refining regression models.

### Importing necessary libraries and dataset

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score, median_absolute_error


data = pd.read_csv(r'C:\Users\Huawei\Desktop\Reg_EnergyData.csv')

# Features and target
X = data[['Appliances', 'lights', 'T1', 'RH_1', 'Press_mm_hg', 'RH_out', 'Windspeed']]
y = data['Visibility']

### Split the data into train and testing sets

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Linear Regression 

In [6]:
model_lr = LinearRegression()
model_lr.fit(X_train, y_train)
y_pred_lr = model_lr.predict(X_test)

### MSE (Mean Squared Error)

In [7]:
mse_lr = mean_squared_error(y_test, y_pred_lr)
print("MSE(MeanSquaredError): ", mse_lr)

MSE(MeanSquaredError):  5423.592725549897


Mean Squared Error (MSE) is the average of the squared differences between predicted and actual values. Here the squared error averages about 5423.59, expressed in squared units of the target (`Visibility`). Because the errors are squared, a few large misses can dominate the value. A lower MSE means better model accuracy.
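
To make the effect of squaring concrete, here is a minimal sketch that computes MSE from its definition. The arrays `y_true` and `y_pred` are hypothetical toy values, not rows from the energy dataset.

```python
import numpy as np

# Hypothetical toy values, not taken from the energy dataset
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([11.0, 19.0, 31.0, 60.0])  # the last prediction is off by 20

errors = y_true - y_pred
mse_manual = np.mean(errors ** 2)  # average of the squared errors

# Share of the total squared error contributed by the single largest miss
largest_share = np.max(errors ** 2) / np.sum(errors ** 2)

print("MSE:", mse_manual)
print("Share from largest error:", largest_share)
```

In this toy case three of the four predictions are off by only 1 unit, yet the one large miss accounts for almost all of the MSE.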

### MAE (Mean Absolute Error)

In [8]:
mae_lr = mean_absolute_error(y_test, y_pred_lr)
print("MAE(MeanAbsoluteError): ", mae_lr)

MAE(MeanAbsoluteError):  37.56916426508724


MAE measures the average magnitude of the prediction errors, ignoring their direction. A value of 37.57 means that, on average, the model's predictions are about 37.57 units away from the actual values.
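
Because MAE weights every error linearly, it cannot tell evenly spread errors apart from one large miss. The sketch below illustrates this with two hypothetical prediction sets (`pred_even` and `pred_spiky` are made-up names and values) that share the same MAE.

```python
import numpy as np

# Hypothetical toy values, not taken from the energy dataset
y_true = np.array([50.0, 50.0, 50.0, 50.0])
pred_even = np.array([55.0, 45.0, 55.0, 45.0])   # every prediction is off by 5
pred_spiky = np.array([50.0, 50.0, 50.0, 70.0])  # one prediction is off by 20

def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return np.mean(np.abs(actual - predicted))

mae_even = mae(y_true, pred_even)
mae_spiky = mae(y_true, pred_spiky)

print("MAE (even errors): ", mae_even)
print("MAE (one big miss):", mae_spiky)
```

Both sets give the same MAE, which is why it helps to read MAE alongside a squared-error metric.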

### R_Squared

In [9]:
r2_lr = r2_score(y_test, y_pred_lr)
print("R_Squared: ", r2_lr)

R_Squared:  -23.4099813245565


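R-squared compares the model's squared error with that of a baseline that always predicts the mean of the actual values. A value of 1 is a perfect fit and 0 matches the baseline. The negative value here (about -23.41) means the model's squared error on the test set is roughly 24 times that of the mean baseline, i.e. the model is not capturing the relationship in the data.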
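
A negative R-squared can look surprising, so here is a small sketch that computes it from the definition `1 - SS_res / SS_tot`. The helper `r2_manual` and the toy arrays are assumptions for illustration only.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed directly from its definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # model's squared error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # mean baseline's squared error
    return 1.0 - ss_res / ss_tot

# Hypothetical toy values, not taken from the energy dataset
y_true = np.array([1.0, 2.0, 3.0, 4.0])
baseline = np.full_like(y_true, y_true.mean())  # always predict the mean
bad_pred = np.array([4.0, 3.0, 2.0, 1.0])       # trend is reversed

print("Mean baseline R^2:", r2_manual(y_true, baseline))
print("Reversed-trend R^2:", r2_manual(y_true, bad_pred))
```

The mean baseline scores exactly 0, and any prediction with a larger squared error than the baseline scores below 0.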

### Adjusted R_Squared

In [10]:
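n = len(y_test)        # number of test observations
k = X_test.shape[1]    # number of predictors
# The adjustment is only meaningful when n - k - 1 > 0
adj_r2 = 1 - (1 - r2_lr) * (n - 1) / (n - k - 1)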
print("Adjusted R-squared: ", adj_r2)

Adjusted R-squared:  19.307485993417377


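Adjusted R-squared rescales R-squared to penalize the number of predictors used. It can never exceed R-squared when n - k - 1 is positive, so a result of 19.31 alongside an R-squared of -23.41 is not a valid score, and it is not a percentage of explained variance. With k = 7 predictors, the formula produces 19.31 only when (n - 1) / (n - k - 1) = -0.75, which corresponds to n = 4 test rows. The denominator n - k - 1 is then negative, so the adjustment is undefined here and the metric should be recomputed on a test set with n > k + 1.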
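
One way to avoid reporting a meaningless value is to guard the formula. The sketch below is a hypothetical helper (`adjusted_r2` is not part of scikit-learn). The values n = 4 and k = 7 are inferred from the reported numbers rather than read from the dataset, so treat them as an assumption.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2, or None when n - k - 1 <= 0 and the adjustment is undefined."""
    dof = n - k - 1
    if dof <= 0:
        return None
    return 1.0 - (1.0 - r2) * (n - 1) / dof

r2_reported = -23.4099813245565  # R-squared printed by the notebook

# Unguarded formula with the assumed n = 4, k = 7: the denominator is -4
raw_value = 1 - (1 - r2_reported) * (4 - 1) / (4 - 7 - 1)
print("Unguarded formula:", raw_value)

# Guarded version refuses to return a number for the same inputs
print("Guarded helper:", adjusted_r2(r2_reported, n=4, k=7))
```

Under that assumption the unguarded formula reproduces the positive value shown in the output, while the guarded helper returns `None`.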

### RMSE (Root Mean Squared Error)

In [11]:
rmse = np.sqrt(mse_lr)
print("Root mean squared error : ",rmse)


Root mean squared error :  73.64504549221147


Root Mean Squared Error (RMSE) is the square root of the MSE, which puts the error back in the units of the target. An RMSE of 73.65 is a typical error size that weights large misses more heavily than MAE does, which is why it is well above the MAE of 37.57. A lower RMSE indicates better accuracy.
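
As a quick check, the RMSE can be recomputed from the MSE printed earlier and compared with the MAE. RMSE is always at least as large as MAE, and the two are equal only when every absolute error has the same size, so their ratio is a rough indicator of how uneven the errors are. The variable names below are illustrative.

```python
import math

# Values copied from the notebook output above
mse_reported = 5423.592725549897
mae_reported = 37.56916426508724

rmse_check = math.sqrt(mse_reported)  # back in the units of the target
ratio = rmse_check / mae_reported     # 1.0 would mean all errors are equally large

print("RMSE:", rmse_check)
print("RMSE / MAE:", ratio)
```

A ratio well above 1 suggests that a few large errors contribute most of the squared error.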

### MAPE (Mean Absolute Percentage Error)

In [15]:
# Exclude zero targets: dividing by zero would make the percentage error infinite
nonzero_indices = y_test != 0
mape = np.mean(np.abs((y_test[nonzero_indices] - y_pred_lr[nonzero_indices]) / y_test[nonzero_indices])) * 100
print("Mean Absolute Percentage Error: ", mape)

Mean Absolute Percentage Error:  60.85610437139359


Mean Absolute Percentage Error (MAPE) is the average absolute error expressed as a percentage of the actual values. A MAPE of 60.86% means that, on average, the predictions differ from the actual values by about 60.86%. A lower MAPE indicates better accuracy. Rows with an actual value of zero are excluded from this calculation.
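
The zero-masking step can be wrapped in a small function that also reports how many rows were dropped. This is a sketch with a hypothetical helper name (`mape_nonzero`) and made-up toy values.

```python
import numpy as np

def mape_nonzero(y_true, y_pred):
    """Return (MAPE in percent, number of skipped rows whose actual value is zero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0  # keep only rows where the percentage is defined
    pct_errors = np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])
    return pct_errors.mean() * 100, int((~mask).sum())

# Hypothetical toy values, not taken from the energy dataset
y_true = [100.0, 50.0, 0.0, 2.0]
y_pred = [110.0, 45.0, 5.0, 3.0]

mape_value, skipped = mape_nonzero(y_true, y_pred)
print("MAPE (%):", mape_value)
print("Rows skipped:", skipped)
```

Note that the last row is off by only 1 unit but contributes a 50% error because the actual value is small, so MAPE can be inflated when targets are close to zero.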

### Median Absolute Error

In [14]:
median_abs_error = median_absolute_error(y_test, y_pred_lr)
print("Median Absolute Error: ",median_abs_error)

Median Absolute Error:  1.106500747428072


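Median Absolute Error is the median of the absolute prediction errors, so it is robust to outliers. A value of about 1.11 means that about half of the test predictions are within roughly 1.11 units of the actual values. The large gap between this and the MAE of 37.57 suggests that a small number of very large errors is driving the mean-based metrics.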
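
The difference between a median-based and a mean-based error metric shows up clearly when one error is much larger than the rest. The absolute errors below are hypothetical, chosen only to mimic that pattern.

```python
import numpy as np

# Hypothetical absolute errors: three small misses and one very large one
abs_errors = np.array([1.0, 1.2, 0.9, 147.0])

median_error = np.median(abs_errors)  # barely affected by the large miss
mean_error = np.mean(abs_errors)      # pulled up strongly by the large miss

print("Median absolute error:", median_error)
print("Mean absolute error:  ", mean_error)
```

Reporting both values side by side is a simple way to spot whether a model is usually close but occasionally far off.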