# Rental Listing Price Model

Below are the steps we took to build the regression model that predicts effective prices for prospective rental listings.

## Preparing the Data

First, we clean and standardize the data scraped from the rental listing site so that the model can train on it.

In [8]:
from data_cleaner import get_cleaned_data, flatten_data
from classes import BedroomType, Building
import pandas as pd
import numpy as np
import torch

### Data Cleaning
`get_cleaned_data()` removes invalid and outlier rows, including listings with blank fields and single-room listings. It also encodes the building and unit amenities: each of those columns becomes a dict whose keys are the relevant amenities, with a value of 1 if the listing has that amenity and 0 otherwise.
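The amenity encoding could look roughly like the sketch below. It is not the code in `data_cleaner`, which isn't shown here. It assumes the scraped amenities arrive as one comma-separated string per listing, and the split of amenity names into unit and building lists is our guess based on the column order of the cleaned data; `encode_amenities`, `UNIT_AMENITIES` and `BUILDING_AMENITIES` are hypothetical names.

```python
from typing import Dict, List, Optional

# Hypothetical grouping of the amenity columns seen in the cleaned data
UNIT_AMENITIES: List[str] = [
    "Balcony", "In Unit Laundry", "Air Conditioning",
    "High Ceilings", "Furnished", "Hardwood Floor",
]
BUILDING_AMENITIES: List[str] = [
    "Controlled Access", "Fitness Center", "Swimming Pool", "Roof Deck",
    "Storage", "Residents Lounge", "Outdoor Space",
]


def encode_amenities(raw: Optional[str], vocabulary: List[str]) -> Dict[str, int]:
    """Map a raw amenity string to {amenity: 1 if present else 0}.

    Assumes the scraped value is a comma-separated string such as
    "Balcony, Furnished"; a blank (None or empty) value yields all zeros.
    Matching ignores case and surrounding whitespace.
    """
    present = {item.strip().lower() for item in (raw or "").split(",") if item.strip()}
    return {amenity: int(amenity.lower() in present) for amenity in vocabulary}


unit_flags = encode_amenities("Balcony, furnished", UNIT_AMENITIES)
print(unit_flags["Balcony"], unit_flags["Furnished"], unit_flags["Hardwood Floor"])  # 1 1 0
```

Keying the dict by a fixed vocabulary guarantees every listing gets the same set of amenity keys, which is what lets them become consistent columns later.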

`flatten_data()` then expands the building and unit amenity dicts so that each individual amenity gets its own column in every row.
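Since the result of `flatten_data()` is passed straight to `pd.DataFrame`, one plausible shape for it is a list of flat dicts. The sketch below assumes each cleaned record is a dict that stores its amenity dicts under the hypothetical keys `"Unit Amenities"` and `"Building Amenities"`; the real key names in `data_cleaner` may differ, and `flatten_record` / `flatten_records` are illustrative names.

```python
from typing import Any, Dict, Iterable, List, Sequence

# Hypothetical keys under which a cleaned record stores its amenity dicts
NESTED_KEYS = ("Unit Amenities", "Building Amenities")


def flatten_record(record: Dict[str, Any], nested_keys: Sequence[str] = NESTED_KEYS) -> Dict[str, Any]:
    """Copy the scalar fields, then lift each amenity dict's entries into top-level columns."""
    flat = {key: value for key, value in record.items() if key not in nested_keys}
    for key in nested_keys:
        for amenity, flag in record.get(key, {}).items():
            # Refuse to silently overwrite a column that already exists
            if amenity in flat:
                raise ValueError(f"Amenity {amenity!r} collides with an existing column")
            flat[amenity] = flag
    return flat


def flatten_records(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    return [flatten_record(record) for record in records]


sample = {
    "Building": "20 Samuel Wood Way", "Bed": 0, "Price": 2225,
    "Unit Amenities": {"Balcony": 0, "Furnished": 0},
    "Building Amenities": {"Fitness Center": 1, "Storage": 1},
}
print(flatten_records([sample])[0])
# {'Building': '20 Samuel Wood Way', 'Bed': 0, 'Price': 2225, 'Balcony': 0, 'Furnished': 0, 'Fitness Center': 1, 'Storage': 1}
```

Passing the returned list to `pd.DataFrame` yields one column per scalar field and one per amenity.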

In [9]:
cleaned_data = get_cleaned_data()
flattened_data = flatten_data(cleaned_data)
df = pd.DataFrame(flattened_data)
df.to_excel("cleaned_data.xlsx", index=False)

FileNotFoundError: [Errno 2] No such file or directory: 'rental_listings.xlsx'

In [10]:
print("Printing columns:")
print(df.columns)

Printing columns:
Index(['Building', 'Address', 'City', 'Listing', 'Bed', 'Bath', 'SqFt',
       'Price', 'Pets', 'Latitude', 'Longitude', 'Balcony', 'In Unit Laundry',
       'Air Conditioning', 'High Ceilings', 'Furnished', 'Hardwood Floor',
       'Controlled Access', 'Fitness Center', 'Swimming Pool', 'Roof Deck',
       'Storage', 'Residents Lounge', 'Outdoor Space'],
      dtype='object')


In [11]:
print("Printing first 2 rows:")
print(df.head(2))

Printing first 2 rows:
             Building                                  Address     City  \
0  20 Samuel Wood Way  20 Samuel Wood Way, Toronto, ON M9B 0C8  toronto   
1  20 Samuel Wood Way  20 Samuel Wood Way, Toronto, ON M9B 0C8  toronto   

     Listing  Bed  Bath  SqFt  Price  Pets  Latitude  ...  High Ceilings  \
0     Studio    0   1.0   370   2225     0   43.6959  ...              0   
1  1 Bedroom    1   1.0   540   2625     0   43.6959  ...              0   

   Furnished  Hardwood Floor  Controlled Access  Fitness Center  \
0          0               0                  1               1   
1          0               0                  1               1   

   Swimming Pool  Roof Deck  Storage  Residents Lounge  Outdoor Space  
0              0          0        1                 1              1  
1              0          0        1                 1              1  

[2 rows x 24 columns]


### Standardize the Data
We apply standard scaling to the values before passing them to the model, so that each feature is centered on zero with unit variance.
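Standard scaling replaces each feature value x with z = (x − μ) / σ, where μ and σ are that feature's mean and standard deviation. A minimal NumPy sketch follows; it is not necessarily what this notebook uses (scikit-learn's `StandardScaler` implements the same transform). It fits μ and σ on training rows only, so statistics from held-out listings don't leak into training, and it leaves constant columns unscaled rather than dividing by zero. The function names and toy values are illustrative.

```python
from typing import Tuple

import numpy as np


def fit_standard_scaler(X_train) -> Tuple[np.ndarray, np.ndarray]:
    """Return per-column means and scales computed from the training rows only."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # population std (ddof=0), as in scikit-learn's StandardScaler
    # Leave (near-)constant columns unscaled instead of dividing by ~0
    scale = np.where(np.isclose(std, 0.0), 1.0, std)
    return mean, scale


def standardize(X, mean: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Apply z = (x - mean) / scale column by column."""
    return (np.asarray(X, dtype=float) - mean) / scale


# Toy [SqFt, Bath] rows; the numbers are illustrative only
X_train = np.array([[400.0, 1.0], [550.0, 1.0], [900.0, 2.0]])
mean, scale = fit_standard_scaler(X_train)
Z_train = standardize(X_train, mean, scale)

# New listings reuse the training statistics rather than being refitted
X_new = np.array([[650.0, 1.0]])
Z_new = standardize(X_new, mean, scale)
```

Whether the 0/1 amenity flags should be scaled too is a modelling choice; the sketch scales whatever columns it is given.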

In [12]:
from constants import TableHeaders

In [13]:
city_to_buildings: dict[str, list[Building]] = {}

# Group the data by city to extract city-specific insights
city_groups = df.groupby(TableHeaders.CITY.value)

for city_name, city_df in city_groups:
    # Group each city's data by building name to extract building-specific insights
    building_groups = city_df.groupby(TableHeaders.BUILDING.value)

    # Pair each building with its number of listed units and sort in descending order of units.
    # When summarizing an area, buildings with more units are more informative.
    buildings_tuples = [(building, features, len(features)) for building, features in building_groups]
    buildings_tuples.sort(key=lambda x: x[2], reverse=True)

    buildings: list[Building] = []
    for building_name, building_df, num_units in buildings_tuples:
        current_building: Building = Building(building_name, city_name)

        # Group by bedroom count within this building
        bed_groups = building_df.groupby(TableHeaders.BED.value)
        for bed, bed_df in bed_groups:
            current_building.add_bedroom_type(bed=bed, bed_df=bed_df)

        buildings.append(current_building)

    city_to_buildings[city_name] = buildings
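`Building` and `BedroomType` come from `classes`, which isn't shown here. The printed summaries are consistent with each bedroom type reporting its unit count, average square footage, average price, and a price per square foot equal to average price divided by average square footage (for example, 1894.30 / 585.62 ≈ 3.23 for the one-bedroom units at Citizen on Jasper), and with the building-level averages being taken over all of a building's units ((63 × 585.62 + 15 × 899.93 + 1 × 1127.00) / 79 ≈ 652.15). The sketch below reproduces those statistics under that reading; `BedroomStats` and `BuildingSketch` are hypothetical stand-ins rather than the project's classes, and the literal column names `SqFt` and `Price` are taken from the cleaned data instead of `TableHeaders`.

```python
from dataclasses import dataclass, field
from typing import List

import pandas as pd


@dataclass
class BedroomStats:
    """Hypothetical stand-in for classes.BedroomType: summary of one bedroom count."""
    bed: int
    units: int
    avg_sqft: float
    avg_price: float

    @property
    def price_per_sqft(self) -> float:
        # Ratio of averages, which matches the printed summaries
        return self.avg_price / self.avg_sqft


@dataclass
class BuildingSketch:
    """Hypothetical stand-in for classes.Building."""
    name: str
    city: str
    bedroom_types: List[BedroomStats] = field(default_factory=list)

    def add_bedroom_type(self, bed: int, bed_df: pd.DataFrame) -> None:
        self.bedroom_types.append(
            BedroomStats(
                bed=int(bed),
                units=len(bed_df),
                avg_sqft=float(bed_df["SqFt"].mean()),
                avg_price=float(bed_df["Price"].mean()),
            )
        )

    @property
    def total_units(self) -> int:
        return sum(b.units for b in self.bedroom_types)

    def _unit_weighted(self, attr: str) -> float:
        # Weighting each bedroom type by its unit count gives the mean over all units
        return sum(b.units * getattr(b, attr) for b in self.bedroom_types) / self.total_units

    @property
    def avg_sqft(self) -> float:
        return self._unit_weighted("avg_sqft")

    @property
    def avg_price(self) -> float:
        return self._unit_weighted("avg_price")

    @property
    def price_per_sqft(self) -> float:
        return self.avg_price / self.avg_sqft


# Toy listings for a single building (made-up numbers)
toy_df = pd.DataFrame({"Bed": [1, 1, 2], "SqFt": [500, 600, 900], "Price": [1500, 1900, 2700]})
building = BuildingSketch("Toy Tower", "edmonton")
for bed, bed_df in toy_df.groupby("Bed"):
    building.add_bedroom_type(bed=bed, bed_df=bed_df)
print(building.total_units, round(building.avg_sqft, 2), round(building.price_per_sqft, 2))  # 3 666.67 3.05
```

The toy loop mirrors the inner `groupby` over bedroom counts in the cell above.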

In [14]:
for city, buildings in city_to_buildings.items():
    print(f"City: {city}")
    for building in buildings:
        print(building)

City: edmonton
Building: Citizen on Jasper
Total Units: 79
Overall Average SqFt: 652.15
Overall Average Price: 2042.63
Overall Price Per SqFt: 3.13
-----------------------------------
Bedroom Type: 1 beds
 - Units: 63
 - Average SqFt: 585.62
 - Average Price: 1894.30
 - Price per SqFt: 3.23
-----------------------------------
Bedroom Type: 2 beds
 - Units: 15
 - Average SqFt: 899.93
 - Average Price: 2571.07
 - Price per SqFt: 2.86
-----------------------------------
Bedroom Type: 3 beds
 - Units: 1
 - Average SqFt: 1127.00
 - Average Price: 3461.00
 - Price per SqFt: 3.07
-----------------------------------

Building: Raymond Block
Total Units: 17
Overall Average SqFt: 824.18
Overall Average Price: 2017.94
Overall Price Per SqFt: 2.45
-----------------------------------
Bedroom Type: 1 beds
 - Units: 11
 - Average SqFt: 736.82
 - Average Price: 1913.64
 - Price per SqFt: 2.60
-----------------------------------
Bedroom Type: 2 beds
 - Units: 6
 - Average SqFt: 984.33
 - Average Price: