# Model Evaluation Notebook

## Objectives
- Load the trained model
- Evaluate model performance on test data
- Create visualizations of predictions
- Determine whether the model meets business requirements

## Inputs
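- Trained model from outputs/models/v1/price_prediction_model.pkl
- Feature list from outputs/models/v1/features.pkl
- Prepared dataset from outputs/datasets/prepared/v1/prepared_data.csv (the test split is re-created in this notebook)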

## Outputs
- Performance metrics (R², MAE, RMSE)
- Actual vs Predicted plots
- Business case evaluation
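
The three metrics listed above can be written out directly from their definitions. The sketch below is a minimal NumPy version for illustration only: `regression_metrics` is a hypothetical helper that is not part of this notebook, and the sample prices are made up. The `sklearn.metrics` functions imported in the first code cell should give the same values on the same inputs.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R², MAE and RMSE for two equal-length sequences (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))                 # average absolute error
    rmse = np.sqrt(np.mean(residuals ** 2))          # penalises large errors more heavily
    ss_res = np.sum(residuals ** 2)                  # variation left unexplained by the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot                       # undefined when y_true is constant

    return {"r2": float(r2), "mae": float(mae), "rmse": float(rmse)}


# Made-up prices, purely to show the call
example = regression_metrics([100, 200, 300, 400], [110, 190, 320, 380])
print(example)
```

MAE and RMSE are in the same units as the target (here, price), which makes them easier to compare against a business tolerance than R².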

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle

from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

In [4]:
# Load the saved model
print("Loading saved model...")
with open('../outputs/models/v1/price_prediction_model.pkl', 'rb') as file:
    model = pickle.load(file)

# Load the features list
with open('../outputs/models/v1/features.pkl', 'rb') as file:
    features = pickle.load(file)

print("Model loaded successfully")
print(f"Features used: {features}")

Loading saved model...
Model loaded successfully
Features used: ['Property_Type_Encoded', 'County_Encoded', 'Old_New_Encoded', 'Duration_Encoded', 'Type_Age_Interaction', 'County_Price_Tier', 'Type_Rarity']
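
Since the saved feature list is later used to index the prepared data, it can help to confirm that every listed column is actually present before selecting them. A minimal sketch, assuming a hypothetical helper `missing_columns` that is not part of the notebook; the `available` list is a stand-in for `df.columns` with one column deliberately dropped:

```python
def missing_columns(required, available):
    """Return the names in `required` that are absent from `available`, in order."""
    available = set(available)
    return [name for name in required if name not in available]


# Feature names as printed above
features = [
    'Property_Type_Encoded', 'County_Encoded', 'Old_New_Encoded',
    'Duration_Encoded', 'Type_Age_Interaction', 'County_Price_Tier',
    'Type_Rarity',
]

# Stand-in for df.columns: the target plus all features except the last one
available = ['Price'] + features[:-1]

missing = missing_columns(features, available)
print(missing)
```

In the notebook itself, the same check could be run as `missing_columns(features, df.columns)` once the CSV is loaded, stopping early if the result is not empty.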


In [5]:
# Load the prepared data
df = pd.read_csv('../outputs/datasets/prepared/v1/prepared_data.csv')
print(f"Loaded {len(df)} properties")

# Prepare features and target
X = df[features]
y = df['Price']

# Re-create the train/test split; it matches the training split only if
# training used the same data, test_size and random_state
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Test set size: {len(X_test)} properties")

Loaded 17553 properties
Test set size: 3511 properties
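
The reported test set size can be checked by hand. For a fractional `test_size`, scikit-learn rounds the test count up and assigns the remaining rows to the training set. A small sketch of that arithmetic, with illustrative variable names:

```python
import math

n_samples = 17553   # rows reported when the prepared data was loaded
test_size = 0.2     # fraction passed to train_test_split

n_test = math.ceil(test_size * n_samples)   # 0.2 * 17553 = 3510.6, rounded up
n_train = n_samples - n_test                # remaining rows go to training

print(n_test, n_train)
```

This gives 3511 test rows, matching the output above, and 14042 training rows.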
