
# 📈 Canada Rent Classification with AutoGluon

# ✨ Introduction

This notebook classifies rental prices across Canada into three categories, **Low**, **Medium**, and **High**, based on historical rental data.
**AutoGluon** is used to automate the machine learning pipeline (model selection, training, and ensembling), which keeps model development fast and requires little manual tuning.

# 🏘️ Canada Rent Dataset (1987–2024)

**This dataset provides historical rental data for various cities and provinces across Canada from 1987 to 2024.** It includes details about rent prices, unit types, and geographic locations, which can be valuable for housing market analysis, rental trend forecasting, and urban planning studies.

## 🧾 Dataset Schema Overview

| Feature            | Data Type | Description                                                 |
|--------------------|-----------|-------------------------------------------------------------|
| **Province**        | object    | Canadian province where the rental unit is located          |
| **City**            | object    | City within the province                                    |
| **Year**            | int64     | Year of the rent data (from 1987 to 2024)                   |
| **AverageRent**     | int64     | Average monthly rent price in CAD                           |
| **UnitType**        | object    | Size of the rental unit (e.g., Two bedroom units)           |
| **UnitDescription** | object    | Building type (e.g., Apartment structures of six units and over) |



In [1]:
import pandas as pd
from autogluon.tabular import TabularPredictor
from sklearn.model_selection import train_test_split

In [4]:
# Load Dataset
df = pd.read_csv('Canada_Rent_1987-2024_NO ZEROS.csv', encoding='latin1')
df.head()

|   | Province                  | City          | Year | AverageRent | UnitType            | UnitDescription                            |
|---|---------------------------|---------------|------|-------------|---------------------|--------------------------------------------|
| 0 | Newfoundland and Labrador | Corner Brook  | 1987 | 480         | Two bedroom units   | Apartment structures of six units and over |
| 1 | Newfoundland and Labrador | Gander        | 1987 | 370         | One bedroom units   | Apartment structures of six units and over |
| 2 | Newfoundland and Labrador | Gander        | 1987 | 414         | Two bedroom units   | Apartment structures of six units and over |
| 3 | Newfoundland and Labrador | Gander        | 1987 | 414         | Three bedroom units | Apartment structures of six units and over |
| 4 | Newfoundland and Labrador | Labrador City | 1987 | 254         | One bedroom units   | Apartment structures of six units and over |


## 📊 Exploratory Data Analysis (EDA) Report

For a detailed overview of the dataset, distributions, correlations, and other insights, check out the full EDA report here:

🔗 [Canada Rent EDA Report](https://kcracks.github.io/EDA_Reports/ydata/Canada_Report.html)


In [5]:
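# Bin AverageRent into three categories using fixed CAD thresholds
def rent_category(rent):
    """Map a monthly rent in CAD to 'Low' (< 800), 'Medium' (800-1399), or 'High' (>= 1400)."""
    if rent < 800:
        return 'Low'
    elif rent < 1400:
        return 'Medium'
    else:
        return 'High'

df['RentCategory'] = df['AverageRent'].apply(rent_category)

# Drop the original AverageRent so the label cannot leak into the features
df = df.drop(columns=['AverageRent'])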
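
The same binning can also be written in vectorized form with `pd.cut`. The sketch below is only an illustration: the `sample` frame holds made-up rents (not rows from the dataset), chosen to land exactly on both thresholds. It checks that half-open bins reproduce the `rent_category` rules, then counts the rows per class, which on the real data would show how balanced the three categories are.

```python
import pandas as pd

# Hypothetical sample values, chosen to hit both thresholds exactly.
sample = pd.DataFrame({"AverageRent": [254, 799, 800, 1399, 1400, 2100]})

def rent_category(rent):
    if rent < 800:
        return "Low"
    elif rent < 1400:
        return "Medium"
    return "High"

# right=False makes each bin closed on the left and open on the right,
# which matches the strict "<" comparisons in rent_category.
binned = pd.cut(
    sample["AverageRent"],
    bins=[float("-inf"), 800, 1400, float("inf")],
    labels=["Low", "Medium", "High"],
    right=False,
)

# Compare against the row-by-row version.
applied = sample["AverageRent"].apply(rent_category)
same = binned.astype(str).tolist() == applied.tolist()

# Rows per class; large differences here would indicate class imbalance.
counts = binned.value_counts()
print(same)
print(counts)
```

`pd.cut` returns a categorical column with an explicit Low < Medium < High ordering, which can be convenient for plotting; the `apply` version in the notebook yields plain strings.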

In [6]:
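# Define features and label
target = 'RentCategory'
features = df.drop(columns=[target]).columns.tolist()  # for reference only; not used below

# Train/test split: hold out 20% of the rows, with a fixed seed for reproducibility
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)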

In [7]:
# Train AutoGluon classifiers with default settings (no presets specified)
predictor = TabularPredictor(label=target).fit(train_data)

No path specified. Models will be saved in: "AutogluonModels/ag-20250412_030727"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.2
Python Version:     3.10.16
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 24.3.0: Thu Jan  2 20:23:36 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T8112
CPU Count:          8
Memory Avail:       1.68 GB / 8.00 GB (21.0%)
Disk Space Avail:   13.78 GB / 228.27 GB (6.0%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and b

In [8]:
# Display the model leaderboard, scored on the held-out test set
predictor.leaderboard(test_data, silent=True)

  warn("load_learner` uses Python's insecure pickle module, which can execute malicious arbitrary code when loading. Only load files you trust.\nIf you only need to load model weights and optimizer state, use the safe `Learner.load` instead.")


|   | model               | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time   | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---------------------|------------|-----------|-------------|----------------|---------------|------------|-------------------------|------------------------|-------------------|-------------|-----------|-----------|
| 0 | LightGBM            | 0.979291   | 0.9792    | accuracy    | 0.267375       | 0.044695      | 4.326824   | 0.267375                | 0.044695               | 4.326824          | 1           | True      | 5         |
| 1 | WeightedEnsemble_L2 | 0.978716   | 0.9804    | accuracy    | 0.519161       | 0.089839      | 155.256498 | 0.004853                | 0.00038                | 0.069178          | 2           | True      | 14        |
| 2 | LightGBMLarge       | 0.975983   | 0.9772    | accuracy    | 0.205906       | 0.026866      | 5.698687   | 0.205906                | 0.026866               | 5.698687          | 1           | True      | 13        |
| 3 | LightGBMXT          | 0.97059    | 0.9732    | accuracy    | 0.245369       | 0.044745      | 3.136283   | 0.245369                | 0.044745               | 3.136283          | 1           | True      | 4         |
| 4 | NeuralNetFastAI     | 0.970518   | 0.972     | accuracy    | 0.134854       | 0.023197      | 20.193271  | 0.134854                | 0.023197               | 20.193271         | 1           | True      | 3         |
| 5 | CatBoost            | 0.955562   | 0.9636    | accuracy    | 0.052369       | 0.004832      | 18.421025  | 0.052369                | 0.004832               | 18.421025         | 1           | True      | 8         |
| 6 | XGBoost             | 0.951391   | 0.9564    | accuracy    | 0.247164       | 0.044603      | 7.497379   | 0.247164                | 0.044603               | 7.497379          | 1           | True      | 11        |
| 7 | NeuralNetTorch      | 0.948803   | 0.9584    | accuracy    | 0.049198       | 0.017809      | 150.384308 | 0.049198                | 0.017809               | 150.384308        | 1           | True      | 12        |
| 8 | RandomForestEntr    | 0.929676   | 0.9392    | accuracy    | 0.234193       | 0.037749      | 1.035296   | 0.234193                | 0.037749               | 1.035296          | 1           | True      | 7         |
| 9 | RandomForestGini    | 0.927447   | 0.9376    | accuracy    | 0.269486       | 0.040292      | 1.22317    | 0.269486                | 0.040292               | 1.22317           | 1           | True      | 6         |


In [9]:
# Evaluate the predictor's best model on the held-out test set
performance = predictor.evaluate(test_data)
print("\nModel Performance:")
print(performance)


Model Performance:
{'accuracy': 0.9787157546559286, 'balanced_accuracy': 0.9375723170512115, 'mcc': 0.9479255753722184}


In [10]:
# Permutation feature importance, computed on the test set
feature_importance = predictor.feature_importance(test_data)
print(feature_importance)

Computing feature importance via permutation shuffling for 5 features using 5000 rows with 5 shuffle sets...
	8.77s	= Expected runtime (1.75s per shuffle set)
	4.76s	= Actual runtime (Completed 5 of 5 shuffle sets)


                 importance    stddev       p_value  n  p99_high   p99_low
Year                0.28044  0.003568  3.142303e-09  5  0.287786  0.273094
City                0.21020  0.005577  5.939689e-08  5  0.221683  0.198717
UnitType            0.15324  0.004246  7.065507e-08  5  0.161982  0.144498
Province            0.02500  0.002642  1.474663e-05  5  0.030440  0.019560
UnitDescription     0.00964  0.002114  2.604709e-04  5  0.013992  0.005288


In [12]:

# Predict on Test Set
y_pred = predictor.predict(test_data.drop(columns=[target]))
print("\nPredictions:")
print(y_pred)



Predictions:
65499    Medium
51253       Low
26072       Low
56611       Low
53923    Medium
          ...  
19193    Medium
17539       Low
34271       Low
64481    Medium
16778       Low
Name: RentCategory, Length: 13907, dtype: object


## 🧠 Conclusion and Analysis of Classification Results

### ✅ Analysis of Model Performance

The model achieved an **accuracy of 97.87%** on the held-out test set (13,907 rows). Additionally:

- **Balanced Accuracy**: 93.76%. This is the mean of the per-class recalls. Because it is about four points below plain accuracy, at least one class is recognized less reliably than the others.
- **Matthews Correlation Coefficient (MCC)**: 0.9479, indicating strong agreement between predicted and actual classes.

These metrics suggest that the model generalizes well to the held-out data, although the gap between accuracy and balanced accuracy shows that performance is not uniform across the three classes.
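
To make the difference between accuracy and balanced accuracy concrete, here is a minimal pure-Python sketch. The labels are toy values invented for illustration (they are not the notebook's predictions), and the helper names are hypothetical. It shows how a few misses in a small class barely move accuracy but pull balanced accuracy down noticeably.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Return {class: fraction of that class's rows predicted correctly}."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

def accuracy(y_true, y_pred):
    """Fraction of all rows predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

# Toy labels (made up): 8 Low, 8 Medium, 4 High.
y_true = ["Low"] * 8 + ["Medium"] * 8 + ["High"] * 4
# Every Low and Medium row is right; two of the four High rows are missed.
y_pred = ["Low"] * 8 + ["Medium"] * 8 + ["High", "High", "Medium", "Medium"]

acc = accuracy(y_true, y_pred)            # 18 of 20 rows correct
recalls = per_class_recall(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)   # mean of 1.0, 1.0 and 0.5
print(acc, recalls, bal)
```

On the real test set, the same per-class recall breakdown would reveal which of the three categories accounts for the lower balanced accuracy.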

---

### 📊 Analysis of Feature Importance

1. **Year** (Importance: 0.28044): The most influential feature; shuffling it lowers test accuracy by about 0.28. This is likely because the Low/Medium/High thresholds are fixed dollar amounts while the data spans 1987 to 2024, so the year largely determines which category a rent falls into.
2. **City** (Importance: 0.21020): Geographic location plays a significant role in determining rent classes.
3. **UnitType** (Importance: 0.15324): The size of the unit (e.g., one-bedroom, two-bedroom) also affects rent classification.
4. **Province** (Importance: 0.02500) and **UnitDescription** (Importance: 0.00964): These contribute far less, although both importances are statistically distinguishable from zero (p < 0.001).

This analysis highlights the temporal and geographic nature of rent classification, with "Year" and "City" being the main drivers of the model's predictions.
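
The importance scores above come from permutation shuffling: a feature's column is shuffled across rows and the resulting drop in accuracy is recorded. The following is a simplified sketch of that idea in plain Python, not AutoGluon's implementation (which, per the log above, subsamples 5000 rows and uses 5 shuffle sets). The data, the `toy_model` function, and the helper names are all hypothetical.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_features, n_repeats=5, seed=0):
    """Mean drop in accuracy when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data: feature 0 is a year (1987-2024), feature 1 is uninformative.
rows = [(1987 + i % 38, i % 7) for i in range(380)]
labels = ["High" if year >= 2010 else "Low" for year, _ in rows]

# Hypothetical stand-in for a trained model: it only looks at the year.
def toy_model(row):
    return "High" if row[0] >= 2010 else "Low"

importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

In this toy setup the year column receives a large importance while the ignored column receives exactly zero, mirroring how a feature the model relies on (such as Year) stands out from one it barely uses (such as UnitDescription).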

---

**In conclusion, this classification model is highly accurate on the held-out test set. Its most impactful features describe "when" the rent was recorded, "where" the rental unit is located, and "what type" of unit it is. These insights can support further rent trend analysis, policy-making, or rental pricing tools.**