# **Demo: Metrics and Methods for Model Performance Evaluation**

## **Step 1: Load the Dataset**

```python
from sqlalchemy import create_engine
import pandas as pd
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_squared_log_error
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

# Connection string components
server = 'DESKTOP'          # Server name
database = 'Transactions'   # Database name
driver = 'SQL Server'       # ODBC driver name

# SQLAlchemy connection string (Windows authentication via trusted_connection);
# spaces in the driver name are URL-encoded as '+'
connection_string = (
    f"mssql+pyodbc://{server}/{database}"
    f"?driver={driver.replace(' ', '+')}&trusted_connection=yes"
)

# Create the engine
engine = create_engine(connection_string)

# SQL query
query = '''
SELECT transaction_id, customer_id,
       CAST(date AS DATE) AS date,
       CAST(time AS TIME) AS time,
       product_name, category, quantity, price
FROM dbo.Transactions
'''

# Use the engine to connect and execute the query
df = pd.read_sql_query(query, engine)
```
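If you cannot reach the `Transactions` SQL Server database, the sketch below builds a small synthetic DataFrame with the same columns as the query above so the remaining steps can run offline. The helper `make_synthetic_transactions` is hypothetical and every value is randomly generated, so the metrics you get from it will not match the outputs shown in this notebook.

```python
import numpy as np
import pandas as pd

def make_synthetic_transactions(n_rows: int = 100, seed: int = 44) -> pd.DataFrame:
    """Hypothetical offline stand-in for dbo.Transactions (random values, same columns)."""
    rng = np.random.default_rng(seed)
    start = pd.Timestamp("2024-01-01")
    # Random calendar dates within a 90-day window
    dates = start + pd.to_timedelta(rng.integers(0, 90, n_rows), unit="D")
    # Random times of day, stored as datetime.time objects like CAST(time AS TIME)
    times = (start + pd.to_timedelta(rng.integers(0, 86_400, n_rows), unit="s")).time
    return pd.DataFrame({
        "transaction_id": np.arange(1, n_rows + 1),
        "customer_id": rng.integers(1, 50, n_rows),
        "date": dates.date,  # datetime.date objects, like CAST(date AS DATE)
        "time": times,
        "product_name": rng.choice(["Widget", "Gadget", "Gizmo"], n_rows),
        "category": rng.choice(["Electronics", "Clothing", "Grocery"], n_rows),
        "quantity": rng.integers(1, 6, n_rows),
        "price": rng.uniform(5, 1000, n_rows).round(2),
    })

df = make_synthetic_transactions()
print(df.head())
```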

## **Step 2: Feature Engineering**

```python
df['date'] = pd.to_datetime(df['date'])
df['day_of_week'] = df['date'].dt.dayofweek   # Monday=0 ... Sunday=6
df['day_of_month'] = df['date'].dt.day         # 1 to 31
features = ['day_of_week', 'day_of_month']
```
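A quick check of the two calendar features on a few known dates: `dt.dayofweek` numbers days with Monday = 0 through Sunday = 6, and `dt.day` is the day of the month. The dates are illustrative only.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2024-01-01", "2024-01-06", "2024-01-07", "2024-02-29"]))
calendar_features = pd.DataFrame({
    "day_of_week": dates.dt.dayofweek,  # 2024-01-01 was a Monday -> 0
    "day_of_month": dates.dt.day,       # day number within the month
})
print(calendar_features)
```

Because these features encode only calendar position, the model can predict price only to the extent that price actually varies with the weekday or day of the month.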

## **Step 3: Split Data**

```python
# Target variable
y = df['price']

# Splitting the dataset into training and test sets (80% / 20%, shuffled)
X_train, X_test, y_train, y_test = train_test_split(df[features], y, test_size=0.2, random_state=44)
```
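`train_test_split` shuffles rows by default, so the test set mixes dates from across the whole period. For transaction data ordered in time, a common alternative (not what this notebook does) is a chronological split that keeps the most recent rows for testing. A minimal sketch, assuming a DataFrame with a `date` column; `chronological_split` is a hypothetical helper name:

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, features, target: str, test_size: float = 0.2):
    """Hypothetical helper: train on earlier rows, test on the latest ones."""
    ordered = df.sort_values("date").reset_index(drop=True)
    n_test = int(round(len(ordered) * test_size))
    cut = len(ordered) - n_test
    train, test = ordered.iloc[:cut], ordered.iloc[cut:]
    return train[features], test[features], train[target], test[target]

# Toy data with dates deliberately in reverse order
demo = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D")[::-1],
    "day_of_week": range(10),
    "price": range(10),
})
X_tr, X_te, y_tr, y_te = chronological_split(demo, ["day_of_week"], "price")
print(len(X_tr), len(X_te))
```

On a frame already sorted by date, `train_test_split(..., shuffle=False)` likewise keeps the last rows as the test set.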

## **Step 4: Scaling Features**

```python
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from training data only
X_test_scaled = scaler.transform(X_test)        # apply the training min/max to the test set
```
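`MinMaxScaler` maps each feature to `(x - min) / (max - min)`, with `min` and `max` learned from the training set only, so test values outside the training range land outside [0, 1]. Random forests split on thresholds and are generally insensitive to this kind of monotonic rescaling, so the step is harmless here but not required. A toy illustration with made-up values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train_toy = np.array([[0.0], [5.0], [10.0]])
X_test_toy = np.array([[2.5], [20.0]])

toy_scaler = MinMaxScaler()
train_scaled = toy_scaler.fit_transform(X_train_toy)  # learns min=0, max=10
test_scaled = toy_scaler.transform(X_test_toy)        # reuses min=0, max=10

print(train_scaled.ravel())  # values 0.0, 0.5, 1.0
print(test_scaled.ravel())   # values 0.25, 2.0 (20 lies outside the training range)
```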

## **Step 5: Initialize the Model**

```python
model = RandomForestRegressor(n_estimators=100, random_state=44)
```

## **Step 6: Train the Model**

```python
model.fit(X_train_scaled, y_train)
```

## **Step 7: Make Predictions**

```python
predictions = model.predict(X_test_scaled)
```
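The metrics below are easier to judge against a naive baseline. scikit-learn's `DummyRegressor(strategy="mean")` always predicts the training mean. This sketch uses synthetic data in which price is unrelated to the features; with the notebook's data you would pass `X_train_scaled`, `y_train`, `X_test_scaled` and `y_test` instead.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(44)
X = rng.integers(0, 7, size=(200, 2)).astype(float)  # synthetic stand-in features
y = rng.uniform(5, 1000, size=200)                   # synthetic prices, unrelated to X
X_tr, X_te, y_tr, y_te = X[:160], X[160:], y[:160], y[160:]

baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=100, random_state=44).fit(X_tr, y_tr)

baseline_rmse = np.sqrt(mean_squared_error(y_te, baseline.predict(X_te)))
forest_rmse = np.sqrt(mean_squared_error(y_te, forest.predict(X_te)))
print(f"Baseline RMSE: {baseline_rmse:.2f}  Forest RMSE: {forest_rmse:.2f}")
```

If the model cannot beat this baseline, the features are probably not informative about the target.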

## **Root Mean Squared Error (RMSE)**

```python
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"Root Mean Squared Error: {rmse}")
```

Root Mean Squared Error: 289.26133209880277


The model's predictions differ from the actual sales price by about **289.26** units in a root-mean-square sense. Because RMSE squares each error before averaging, a few large misses raise it more than many small ones, so it is somewhat larger than the plain average error. Whether 289.26 is large depends on the typical price in the data; if most prices are well below that figure, the model has a relatively **wide margin of error** in predicting sales prices.
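A small worked example with made-up values shows how one large miss pushes RMSE above the mean absolute error (MAE): errors of 10, 10 and 100 give MAE = 40 but RMSE = √3400 ≈ 58.3.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 400.0])  # errors: +10, -10, +100

mae = mean_absolute_error(y_true, y_pred)              # (10 + 10 + 100) / 3 = 40
rmse = np.sqrt(mean_squared_error(y_true, y_pred))     # sqrt((100 + 100 + 10000) / 3)
print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}")
```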

## **Mean Absolute Percentage Error (MAPE)**

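```python
def mean_absolute_percentage_error(y_true, y_pred):
    # Mean of |actual - predicted| / |actual|, expressed as a percentage.
    # Assumes no actual value is zero; a zero price would cause division by zero.
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

mape = mean_absolute_percentage_error(y_test, predictions)
print(f"Mean Absolute Percentage Error: {mape}%")
```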

Mean Absolute Percentage Error: 269.741061359782%


On average, the model's predictions are off by about **269.74%** of the actual value, a significant discrepancy between predicted and actual sales prices. MAPE divides each error by the actual price, so transactions with small prices can inflate it heavily; even allowing for that, an error rate this high suggests the **model struggles significantly** to capture the dynamics of our data.
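The two-row example below, with made-up values, shows how a single low-priced row dominates MAPE: a miss of 2 on a price of 1 contributes 200%, while a miss of 1 on a price of 100 contributes 1%. scikit-learn also provides `mean_absolute_percentage_error` (version 0.24 and later), which returns a fraction rather than a percentage.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error as sk_mape

y_true = np.array([1.0, 100.0])
y_pred = np.array([3.0, 101.0])

# Per-row absolute percentage error: 2/1 -> 200%, 1/100 -> 1%
per_row = np.abs((y_true - y_pred) / y_true) * 100
manual_mape = per_row.mean()       # about 100.5 (percent)
library_mape = sk_mape(y_true, y_pred)  # about 1.005 (a fraction, not a percent)
print(manual_mape, library_mape)
```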

## **R-squared (Coefficient of Determination)**

```python
r2 = r2_score(y_test, predictions)
print(f"R-squared: {r2}")
```

R-squared: -0.2017020297612504


An R-squared value below **0** means the model performs worse than a horizontal line at the mean of the test values. In other words, predicting the mean sales price for every transaction would be more accurate than using the model. This suggests that the model **is not effectively capturing** the underlying trend or pattern in the data.
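R-squared is `1 - SS_res / SS_tot`, where `SS_res` is the sum of squared prediction errors and `SS_tot` is the sum of squared deviations from the mean. A toy example with made-up values: predicting the mean everywhere gives exactly 0, and predictions worse than the mean give a negative score.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0])  # mean = 2, SS_tot = 1 + 0 + 1 = 2

# Predicting the mean for every row: SS_res = SS_tot, so R^2 = 0
r2_mean = r2_score(y_true, np.full(3, y_true.mean()))

# Reversed predictions: SS_res = 4 + 0 + 4 = 8, so R^2 = 1 - 8/2 = -3
r2_bad = r2_score(y_true, np.array([3.0, 2.0, 1.0]))
print(r2_mean, r2_bad)
```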

## **Adjusted R-squared**

```python
def adjusted_r2(r_square, labels, n_features):
    # n = number of evaluated samples, p = number of predictors
    n = len(labels)
    return 1 - (1 - r_square) * (n - 1) / (n - n_features - 1)

adjusted_r2_value = adjusted_r2(r2, y_test, len(features))
print(f"Adjusted R-squared: {adjusted_r2_value}")
```

Adjusted R-squared: -0.3430787391449268


Adjusted R-squared penalizes **R-squared** for the number of predictors (here 2) relative to the number of evaluated samples, using `1 - (1 - R²)(n - 1) / (n - p - 1)`. Because R-squared is already negative, the penalty pushes the value even lower. It confirms that the two date features add little explanatory power and may be contributing mostly noise, making the **model's predictions less reliable**. On its own, it does not show how performance would change if more variables were added.
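The sketch below plugs the printed R-squared into the adjusted formula for several sample sizes. The notebook never prints `len(y_test)`; a test set of 20 rows is an assumption, chosen because it reproduces the printed adjusted value. The function signature here (taking `n_samples` directly) is a variant of the notebook's helper.

```python
def adjusted_r2_from_counts(r_square: float, n_samples: int, n_features: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - p - 1)."""
    return 1 - (1 - r_square) * (n_samples - 1) / (n_samples - n_features - 1)

r2_printed = -0.2017020297612504  # R-squared printed above
for n in (20, 200, 2000):  # n = 20 is an assumption about the test-set size
    print(n, adjusted_r2_from_counts(r2_printed, n_samples=n, n_features=2))
```

The penalty shrinks as the sample size grows, so with a small test set the adjusted value sits noticeably below R-squared.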

## **Mean Squared Logarithmic Error (MSLE)**

```python
msle = mean_squared_log_error(y_test, predictions)
print(f"Mean Squared Logarithmic Error: {msle}")
```

Mean Squared Logarithmic Error: 2.4131792936203644


The model has difficulty making accurate predictions. MSLE compares `log(1 + actual)` with `log(1 + predicted)`, so it measures relative rather than absolute error, and for the same absolute miss it penalizes underestimates more than overestimates. Taking the square root, √2.41 ≈ 1.55, which corresponds to a typical ratio between (1 + actual) and (1 + predicted) of roughly e^1.55 ≈ 4.7. A **high MSLE** on its own does not say whether the model is mostly underestimating or overestimating sales prices; inspecting the signs of the residuals would show that.
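A toy illustration, with made-up values, of the asymmetry described above: missing an actual price of 100 by 50 units in either direction costs more when the prediction is too low. The last lines convert the printed MSLE back into an approximate multiplicative factor.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

y_true = np.array([100.0])

under = mean_squared_log_error(y_true, np.array([50.0]))   # (log 101 - log 51)^2, about 0.467
over = mean_squared_log_error(y_true, np.array([150.0]))   # (log 101 - log 151)^2, about 0.162
print(under, over)

# Root MSLE as a multiplicative factor on (1 + price)
msle_printed = 2.4131792936203644  # value printed above
factor = np.exp(np.sqrt(msle_printed))  # roughly 4.7
print(factor)
```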