# Rental Listing Price Model

The steps below build the regression model we use to predict effective prices for prospective rental listings.

## Preparing the Data

First, we clean and standardize the data scraped from the rental listing site so that the model can train on it.

In [2]:
from data_cleaner import get_cleaned_data, flatten_data
import pandas as pd
import numpy as np
import torch

### Data Cleaning
`get_cleaned_data()` removes invalid and outlier data, including blank entries and single-room listings. It also reformats the building and unit amenities columns so that each holds a dict keyed by amenity name, with a value of 1 if the listing has that amenity and 0 otherwise.

`flatten_data()` expands the building and unit amenities dicts so that each individual amenity gets its own column in every row.
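To make the flattening step concrete, here is a minimal sketch of the idea on two made-up rows. It is not the actual `flatten_data()` implementation: the helper `flatten_row`, the column names (`price`, `building_amenities`, `unit_amenities`), and the amenity names are all assumptions chosen for illustration.

```python
# Hypothetical rows shaped like the cleaned data described above.
# Column and amenity names are assumptions, not the real schema.
rows = [
    {"price": 2100, "building_amenities": {"gym": 1, "pool": 0}, "unit_amenities": {"balcony": 1}},
    {"price": 1850, "building_amenities": {"gym": 0, "pool": 1}, "unit_amenities": {"balcony": 0}},
]


def flatten_row(row, dict_columns=("building_amenities", "unit_amenities")):
    """Return a copy of `row` with each amenity dict expanded into top-level keys."""
    # Keep every non-dict column unchanged.
    flat = {key: value for key, value in row.items() if key not in dict_columns}
    # Promote each amenity flag to its own key (and later its own column).
    for column in dict_columns:
        flat.update(row.get(column, {}))
    return flat


flat_rows = [flatten_row(row) for row in rows]
print(flat_rows[0])
```

A list of flat dicts like `flat_rows` can be passed straight to `pd.DataFrame`, which turns each key into a column. Note that this sketch assumes amenity names do not collide between the building and unit dicts; if they could, prefixing the keys would be one way to keep them apart.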

In [3]:
cleaned_data = get_cleaned_data()
flattened_data = flatten_data(cleaned_data)
df = pd.DataFrame(flattened_data)

FileNotFoundError: [Errno 2] No such file or directory: 'rental_listings.xlsx'

In [None]:
print("Printing columns:")
print(df.columns)

In [None]:
print("Printing first 2 rows:")
print(df.head(2))

### Standardize the Data
We use standard scaling (subtract the mean, then divide by the standard deviation) to standardize the values before passing them to the model.
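As a rough illustration of what standard scaling does, the sketch below applies it to a small made-up array of square-footage values. The numbers and the `unscale` helper are assumptions for demonstration and are not part of the project code.

```python
import numpy as np

# Illustrative square-footage values, not taken from the scraped data.
sqft = np.array([500.0, 750.0, 1000.0, 1250.0])

mean = sqft.mean()
std = sqft.std()  # NumPy defaults to the population standard deviation (ddof=0)

# Standard scaling: the result has mean 0 and standard deviation 1.
scaled = (sqft - mean) / std


def unscale(z, mean, std):
    """Map standardized values back to the original units."""
    return z * std + mean


print(scaled)
print(unscale(scaled, mean, std))
```

Keeping the mean and standard deviation around matters because the same two numbers are needed to scale new listings at prediction time and to map scaled values back to real units.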

In [None]:
from constants import TableHeaders

In [None]:
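SQFT = TableHeaders.SQFT.value

# Mean and standard deviation of square footage, used for standard scaling.
np_sqft = df[SQFT].to_numpy()
sqft_mean = np_sqft.mean()
sqft_std = np_sqft.std()  # population standard deviation (NumPy default, ddof=0)

print("Mean SQFT:", sqft_mean)
print("STDEV SQFT:", sqft_std)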