# Strategic Location for Purchasing a Residential Property for Short-Term Rental (Airbnb)
## Background / Business Problem
Airbnbs are becoming more popular. They offer cheap short-term stays while providing the convenience of a complete property. However, they need to compete with the hotels in their vicinity. Hotels often host a restaurant, which can be an important factor for those who care about having their meals close to their temporary accommodation. Hence, for an Airbnb to be attractive to this kind of tourist, the number of restaurants in the neighbourhood is of high importance.

### Business Problem: 
A client is looking for the best neighbourhood in which to purchase a residential property to rent out as an Airbnb. Where could they purchase a relatively cheap property that would still attract a reasonable number of tenants? Our reasoning is based on the number of hotels in the neighbourhood (which indicates the level of competition), the number of restaurants in the area (a highly influential factor when choosing an Airbnb), and the price of residential properties in the area (a decision factor for the client).

In this analysis, the influence of other Airbnbs in the area is neglected. Furthermore, other factors in choosing an Airbnb, such as the number of points of interest, museums, etc., are not included in this research. We also neglect the effect of costly restaurants and treat all restaurants as equally influential on the choice of Airbnb location.
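
The three factors above can be combined in many ways. The following is a minimal sketch of one possible ranking, not the method used in this notebook: it assumes a hypothetical table with columns `restaurants`, `hotels`, and `median_price`, scales each factor to the range 0 to 1, and weights them equally (the equal weighting is an assumption). The numbers are made up for illustration.

```python
import pandas as pd

def rank_neighbourhoods(df):
    """Rank neighbourhoods: more restaurants is better, fewer hotels and
    lower prices are better. Equal weights are an assumption."""
    scored = df.copy()
    # Min-max scale each factor to [0, 1] so the three are comparable
    for column in ["restaurants", "hotels", "median_price"]:
        span = scored[column].max() - scored[column].min()
        scored[column + "_scaled"] = (
            (scored[column] - scored[column].min()) / span if span else 0.0
        )
    # Restaurants add to the score; competition and price subtract from it
    scored["score"] = (
        scored["restaurants_scaled"]
        - scored["hotels_scaled"]
        - scored["median_price_scaled"]
    )
    return scored.sort_values("score", ascending=False)

# Made-up numbers, for illustration only
example = pd.DataFrame({
    "neighbourhood": ["A", "B", "C"],
    "restaurants": [40, 10, 25],
    "hotels": [12, 1, 2],
    "median_price": [900000, 500000, 550000],
})
ranked = rank_neighbourhoods(example)
print(ranked[["neighbourhood", "score"]])
```

In this toy example, neighbourhood C ranks first: it has a moderate number of restaurants, few hotels, and a low price, whereas A has the most restaurants but also the most competition and the highest price.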

## Data
The Foursquare API will be used to fetch data on venues in the neighbourhoods of Toronto. Neighbourhood names, together with their boroughs and postal codes, will be scraped from Wikipedia (https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=945633050).
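
As a rough illustration of what a venue request might look like, the sketch below builds a URL for the legacy Foursquare v2 `venues/explore` endpoint. The endpoint choice, the helper name `build_explore_url`, the default radius and limit, and the placeholder credentials are all assumptions for illustration; check the current Foursquare documentation before relying on it. The coordinates are those of postal code M3A (Parkwoods).

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build a request URL for the legacy Foursquare v2 venues/explore endpoint.

    Hypothetical helper: parameter defaults are illustrative assumptions.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, format YYYYMMDD
        "ll": f"{lat},{lng}",  # latitude,longitude of the search centre
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues to return
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; no request is sent here
url = build_explore_url(43.75245, -79.32991, "YOUR_ID", "YOUR_SECRET")
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()` to retrieve the venues around one neighbourhood.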

To find the price of residential properties in Toronto, we use the data file at the following address, which gives the price, neighbourhood, and geo-coordinates of many residential properties in Toronto: https://www.kaggle.com/mnabaee/ontarioproperties
Although the data is from 2016, we can still use it for a present-day comparison, since the goal is to compare the relative price levels of different neighbourhoods rather than absolute prices. We assume the listed residential properties are representative and sufficient for our conclusion.

In [1]:
# import the library we use to open URLs
import urllib.request
!conda install -c conda-forge geopy --yes
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

import lxml.html as lh # HTML parser used to scrape the Wikipedia table
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(url)
#Store the contents of the website
data = lh.fromstring(page.content)
#Store table data
tr_elements = data.xpath('//tr')
col=[]
#Store each header cell's text together with an empty list for that column's values
for i in tr_elements[0]:
    name = i.text_content()
    col.append((name,[]))
#Since the first row is header, data is stored from the second row
for i in range(1,len(tr_elements)):
    T=tr_elements[i]
    
    #If the row does not have 3 cells, it is not from our table, so stop
    if len(T)!=3:
        break
    #ind is the index of the current column
    ind=0
    
    #Iterate through each cell of the row
    for t in T.iterchildren():
        temp=t.text_content() 
        #Leave the first column (postal code) as text; for the other
        #columns, convert purely numerical values to integers
        if ind>0:
            try:
                temp=int(temp)
            except ValueError:
                pass
        #Append the data to the list of the ind'th column
        col[ind][1].append(temp)
        #Increment ind for the next column
        ind = ind + 1
Dict={title:column for (title,column) in col}
df=pd.DataFrame(Dict)
df['Postal Code\n']=df['Postal Code\n'].str.replace('\n', '')
df['Borough\n']=df['Borough\n'].str.replace('\n', '')
df['Neighbourhood\n']=df['Neighbourhood\n'].str.replace('\n', '')
df.rename(columns={'Postal Code\n': 'Postal Code', 'Borough\n': 'Borough', 'Neighbourhood\n': 'Neighbourhood'}, inplace = True)
df = df[df.Borough != 'Not assigned']
df = df.reset_index(drop=True)
#!conda install -c conda-forge geocoder -y
import geocoder # geocode postal codes via the ArcGIS provider


address = 'Toronto, Canada'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude_toronto = location.latitude
longitude_toronto = location.longitude


latitude=[]
longitude=[]
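for code in df['Postal Code']:
    query = '{}, Toronto, Ontario'.format(code)
    g = geocoder.arcgis(query)
    # The geocoder occasionally returns no result; retry a bounded number
    # of times instead of looping forever
    attempts = 1
    while g.latlng is None and attempts < 10:
        g = geocoder.arcgis(query)
        attempts += 1
    if g.latlng is None:
        raise RuntimeError('Could not geocode postal code {}'.format(code))
    latlng = g.latlng
    latitude.append(latlng[0])
    longitude.append(latlng[1])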
    
df['Latitude'] = latitude
df['Longitude'] = longitude

df = df[~df.Borough.str.contains("Canadian postal codes")]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.75245,-79.32991
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188


In [2]:
# load housing prices of Ontario (read_csv already returns a DataFrame)
df_prop = pd.read_csv('properties.csv')

# Only keep rows which are in Toronto, ON
df_prop = df_prop[df_prop.Address.str.contains("Toronto, ON")]
df_prop.drop('Unnamed: 0', inplace=True, axis=1)

# Drop listings priced at $50,000 or less
df_prop = df_prop[~(df_prop['Price ($)'] <= 50000)]  
# Let's see what our data looks like
df_prop.head()

Unnamed: 0,Address,AreaName,Price ($),lat,lng
0,"86 Waterford Dr Toronto, ON",Richview,999888,43.679882,-79.544266
4,"#1409 - 230 King St Toronto, ON",Downtown,362000,43.651478,-79.368118
5,"254A Monarch Park Ave Toronto, ON",Old East York,1488000,43.686375,-79.328918
12,"3 Bracebridge Ave Toronto, ON",Old East York,599900,43.697842,-79.317368
15,"#710 - 1080 Bay St Toronto, ON",Downtown,805900,43.666794,-79.388756
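
Since the goal is to compare price levels between neighbourhoods, one natural next step is to summarise the listings per area. The sketch below is illustrative only: it uses a small `sample` frame built from the five rows shown above as a stand-in for `df_prop`, and the choice of the median as the price level is an assumption (it is less sensitive to a few very expensive listings than the mean).

```python
import pandas as pd

# A few rows copied from the output above, standing in for df_prop
sample = pd.DataFrame({
    "AreaName": ["Richview", "Downtown", "Old East York", "Old East York", "Downtown"],
    "Price ($)": [999888, 362000, 1488000, 599900, 805900],
})

# Median price and number of listings per area, cheapest area first
price_by_area = (
    sample.groupby("AreaName")["Price ($)"]
    .agg(median_price="median", listings="count")
    .sort_values("median_price")
)
print(price_by_area)
```

Keeping the listing count next to the median makes it easy to spot areas where the price level rests on only one or two properties.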
