# OFFICIAL SUMMARY NOTEBOOK

## Abstract

In this project, we scraped 11k rental property listings from domain.com.au (~6k remained after preprocessing) and combined them with POI data queried from an API to answer the 3 fundamental questions. We performed statistical tests on the scraped data, together with external data such as crime rate and income, to determine which features to use in our model. We used POI data to gain insight into what makes certain properties valuable, but it provides no predictive value because POI counts do not fluctuate much over time.

We fit linear models to the dataset and used correlation metrics to identify useful features. Unfortunately, only income was found to have any correlation with rent price, so our model was not very accurate. However, the model still showed a general trend for the future, which allowed us to answer the question of predicted growth.

For livability, we used POI data to create a metric based on external reports of what Victorians consider to be signs that a place is livable. For affordability, we used income data and rent prices in each SA2 area to estimate the percentage of salary spent on rent. From external reports, we found that most Australians are only willing to spend up to 30% of their salary on rent, so we treat any SA2 area where the estimated percentage falls below that threshold as affordable.
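The affordability rule described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function names, the assumption that rent is quoted weekly and income annually, and the two example SA2 rows are all hypothetical.

```python
# Minimal sketch of the 30% affordability rule.
# All figures below are made up for illustration; they are not project data.

AFFORDABILITY_THRESHOLD = 0.30  # share of income, from the 30% rule in the text


def rent_share_of_income(weekly_rent, annual_income):
    """Return the fraction of income spent on rent.

    Assumes rent is quoted per week and income per year (an assumption).
    """
    weekly_income = annual_income / 52
    return weekly_rent / weekly_income


def is_affordable(weekly_rent, annual_income, threshold=AFFORDABILITY_THRESHOLD):
    """An area counts as affordable when the rent share is below the threshold."""
    return rent_share_of_income(weekly_rent, annual_income) < threshold


# Hypothetical SA2 areas with median weekly rent and median annual income.
sa2_examples = {
    "SA2 A": {"median_weekly_rent": 450.0, "median_annual_income": 65_000.0},
    "SA2 B": {"median_weekly_rent": 300.0, "median_annual_income": 78_000.0},
}

for name, row in sa2_examples.items():
    share = rent_share_of_income(row["median_weekly_rent"], row["median_annual_income"])
    label = "affordable" if share < AFFORDABILITY_THRESHOLD else "not affordable"
    print(f"{name}: {share:.0%} of income spent on rent ({label})")
```

Keeping the threshold as a named parameter makes it easy to test how sensitive the affordable/not affordable split is to the 30% figure.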

## How to navigate the notebook

Please run the code cells under 'Preliminary Code' in the next section. They run the skeleton notebook we compiled, which defines all the variables required to demonstrate our results and findings. Please hide the cells to avoid overflow of output. It may take a while to run; thank you for your patience.

Once that is done, please continue to the 'Analysis and Presentation of findings' section, where we will walk you through the internal and external feature analysis, the modelling, and our forecasts and key findings.

(Please run code cells where necessary to view specific results and visuals)

## Preliminary Code

In [4]:
# suppress FutureWarning messages, then run the skeleton notebook
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
%run summary_notebook.ipynb

  model = cd_fast.enet_coordinate_descent(

SyntaxError: invalid syntax (<unknown>, line 1)

In [7]:
# import packages
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import scipy.stats
import association_metrics as am

from itertools import chain
from numpy import arange
from statsmodels.formula.api import ols, glm
from statsmodels.api import families
from scipy.stats import chi2_contingency
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import cross_validate

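# set directory: the working directory with a trailing 'notebooks' folder name removed
# (str.removesuffix requires Python 3.9+)
main_directory = os.getcwd().removesuffix('notebooks')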

## Analysis and Presentation of findings

### Property Internal features analysis

In this section, we examine the correlation between a property's internal features (property type and the number of beds, baths and parking spaces) and its rental price. Relevant features will be selected for modelling in the next stage.
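As a rough illustration of what a correlation matrix like `corr` contains, the sketch below computes pairwise Pearson coefficients in plain Python. It assumes `corr` holds Pearson coefficients (the default method of pandas' `DataFrame.corr`), which the skeleton notebook may or may not use. The helper names and the toy listings are hypothetical and are not drawn from the scraped data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict of pairwise Pearson coefficients."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy data for illustration only (not project data).
listings = {
    "cost_text": [400, 520, 610, 700, 450],
    "beds": [1, 2, 3, 4, 2],
    "baths": [1, 1, 2, 2, 1],
}

matrix = correlation_matrix(listings)
for row_name, row in matrix.items():
    print(row_name, {col: round(value, 3) for col, value in row.items()})
```

The matrix is symmetric with ones on the diagonal, so only the `cost_text` row (or column) is needed when screening features against rental price.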

In [8]:
# drop the 'index' column from the correlation matrix
corr.drop(columns='index', inplace=True)

In [11]:
# drop the matching 'index' row
corr.drop('index', inplace=True)

In [12]:
corr

| | cost_text | beds | baths | parking | under 20 (%) | 20 - 39 (%) | 40 - 59 (%) | 60 +(%) |
|---|---|---|---|---|---|---|---|---|
| cost_text | 1.0 | 0.320863 | 0.42415 | 0.16642 | -0.069596 | -0.015882 | 0.047522 | 0.054644 |
| beds | 0.320863 | 1.0 | 0.555679 | 0.498012 | 0.437816 | -0.392116 | 0.167035 | 0.18911 |
| baths | 0.42415 | 0.555679 | 1.0 | 0.348175 | 0.212736 | -0.135217 | 0.073144 | 0.011129 |
| parking | 0.16642 | 0.498012 | 0.348175 | 1.0 | 0.345164 | -0.354583 | 0.194312 | 0.186673 |
| under 20 (%) | -0.069596 | 0.437816 | 0.212736 | 0.345164 | 1.0 | -0.687136 | 0.488667 | 0.051793 |
| 20 - 39 (%) | -0.015882 | -0.392116 | -0.135217 | -0.354583 | -0.687136 | 1.0 | -0.629799 | -0.699556 |
| 40 - 59 (%) | 0.047522 | 0.167035 | 0.073144 | 0.194312 | 0.488667 | -0.629799 | 1.0 | 0.099178 |
| 60 +(%) | 0.054644 | 0.18911 | 0.011129 | 0.186673 | 0.051793 | -0.699556 | 0.099178 | 1.0 |


In [6]:

corr.style.background_gradient(cmap='coolwarm')


| | index | cost_text | beds | baths | parking | under 20 (%) | 20 - 39 (%) | 40 - 59 (%) | 60 +(%) |
|---|---|---|---|---|---|---|---|---|---|
| index | 1.0 | 0.024692 | 0.002815 | -0.001664 | -0.00199 | -0.012109 | 0.005622 | -0.001649 | 0.001671 |
| cost_text | 0.024692 | 1.0 | 0.320863 | 0.42415 | 0.16642 | -0.069596 | -0.015882 | 0.047522 | 0.054644 |
| beds | 0.002815 | 0.320863 | 1.0 | 0.555679 | 0.498012 | 0.437816 | -0.392116 | 0.167035 | 0.18911 |
| baths | -0.001664 | 0.42415 | 0.555679 | 1.0 | 0.348175 | 0.212736 | -0.135217 | 0.073144 | 0.011129 |
| parking | -0.00199 | 0.16642 | 0.498012 | 0.348175 | 1.0 | 0.345164 | -0.354583 | 0.194312 | 0.186673 |
| under 20 (%) | -0.012109 | -0.069596 | 0.437816 | 0.212736 | 0.345164 | 1.0 | -0.687136 | 0.488667 | 0.051793 |
| 20 - 39 (%) | 0.005622 | -0.015882 | -0.392116 | -0.135217 | -0.354583 | -0.687136 | 1.0 | -0.629799 | -0.699556 |
| 40 - 59 (%) | -0.001649 | 0.047522 | 0.167035 | 0.073144 | 0.194312 | 0.488667 | -0.629799 | 1.0 | 0.099178 |
| 60 +(%) | 0.001671 | 0.054644 | 0.18911 | 0.011129 | 0.186673 | 0.051793 | -0.699556 | 0.099178 | 1.0 |
