# Introduction: Business Problem

In this project we will predict the monthly rental price of a condominium. The report is aimed at users who want to find the best value when renting a condominium in Singapore.

We will use data science techniques to find the optimum rental price, then recommend the best-value units and similar units to these stakeholders.

# Data Acquisition

Based on the definition of our problem, factors that could influence the rental price of a unit include:

1. Size of the unit
2. Furnishing level of the unit
3. Location of the unit
4. Proximity of the unit to public transportation
5. Remaining lease of the unit / how new the unit is

In this section, we will retrieve nearby places of interest for each unit. This helps us understand whether there is a similar neighbourhood to recommend to our stakeholders.
The information will be retrieved with the **Foursquare API**.

In [160]:
import json
import pickle
import re
from itertools import compress

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [161]:
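def get_venues(dict_, pk, lat, lng, category, radius):
    """Query Foursquare for venues near (lat, lng) and store the items in dict_[pk]."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = dict(
        client_id='here you should input your foursquare id',
        client_secret='here you should input your foursquare password',
        v='20180323',           # API version date
        query='*',
        categoryId=category,    # comma-separated category ids
        ll=f'{lat},{lng}',
        radius=str(radius),     # search radius in metres
        limit=None,             # requests drops None values, so the API default applies
    )

    # Keep only the list of venue items from the first result group
    dict_[pk] = requests.get(url=url, params=params).json()['response']['groups'][0]['items']
    print(f'{pk}: Retrieved Sucessful')
    return None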
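
The chained lookup `['response']['groups'][0]['items']` raises a `KeyError` or `IndexError` whenever a response lacks those keys or comes back with no groups. A minimal sketch of a more defensive extraction follows. `extract_items` is a hypothetical helper that is not part of the original notebook, and the two sample payloads are made up; they only mirror the key layout used in `get_venues`.

```python
def extract_items(payload):
    """Return the venue items from an explore-style payload, or [] if absent.

    Hypothetical helper: it assumes only the key layout used in get_venues,
    i.e. payload['response']['groups'][0]['items'].
    """
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []
    return groups[0].get('items', [])


# Made-up payloads, for illustration only
ok = {'response': {'groups': [{'items': [{'venue': {'name': 'Cafe'}}]}]}}
failed = {'meta': {'code': 400}, 'response': {}}

print(len(extract_items(ok)))   # 1
print(extract_items(failed))    # []
```

Returning an empty list keeps a listing in the dictionary with zero venues, so later steps can tell "no venues found" apart from "request never made".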

In [162]:
df = pd.read_csv('Data/latlng.csv')

In [163]:
all_venues2 = {}
# Main categories queried:
# arts & entertainment
# college & university
# food
# nightlife
# outdoors & recreation
# shops & services

category = "4d4b7104d754a06370d81259,4d4b7105d754a06372d81259,4d4b7105d754a06374d81259,4d4b7105d754a06376d81259,4d4b7105d754a06377d81259,4d4b7105d754a06378d81259"
           
for pk, (lat, lng) in enumerate(zip(df.lat, df.long)):
    try:
        get_venues(all_venues2, pk, lat, lng, category, radius=500)
    except Exception:
        # Report the failed listing and move on to the next one
        print(f'{pk}: Retrieved unsucessful')
        continue


0: Retrieved Sucessful
1: Retrieved Sucessful
2: Retrieved Sucessful
3: Retrieved Sucessful
4: Retrieved Sucessful
5: Retrieved Sucessful
6: Retrieved Sucessful
7: Retrieved Sucessful
8: Retrieved Sucessful
9: Retrieved Sucessful
10: Retrieved Sucessful
11: Retrieved Sucessful
12: Retrieved Sucessful
13: Retrieved Sucessful
14: Retrieved Sucessful
15: Retrieved Sucessful
16: Retrieved Sucessful
17: Retrieved Sucessful
18: Retrieved Sucessful
19: Retrieved Sucessful
20: Retrieved Sucessful
21: Retrieved Sucessful
22: Retrieved Sucessful
23: Retrieved Sucessful
24: Retrieved Sucessful
25: Retrieved Sucessful
26: Retrieved Sucessful
27: Retrieved Sucessful
28: Retrieved Sucessful
29: Retrieved Sucessful
30: Retrieved Sucessful
31: Retrieved Sucessful
32: Retrieved Sucessful
33: Retrieved Sucessful
34: Retrieved Sucessful
35: Retrieved Sucessful
36: Retrieved Sucessful
37: Retrieved Sucessful
38: Retrieved Sucessful
39: Retrieved Sucessful
40: Retrieved Sucessful
41: Retrieved Sucessful
...

333: Retrieved Sucessful
334: Retrieved Sucessful
335: Retrieved Sucessful
336: Retrieved Sucessful
337: Retrieved Sucessful
338: Retrieved Sucessful
339: Retrieved Sucessful
340: Retrieved Sucessful
341: Retrieved Sucessful
342: Retrieved Sucessful
343: Retrieved Sucessful
344: Retrieved Sucessful
345: Retrieved Sucessful
346: Retrieved Sucessful
347: Retrieved Sucessful
348: Retrieved Sucessful
349: Retrieved Sucessful
350: Retrieved Sucessful
351: Retrieved Sucessful
352: Retrieved Sucessful
353: Retrieved Sucessful
354: Retrieved Sucessful
355: Retrieved Sucessful
356: Retrieved Sucessful
357: Retrieved Sucessful
358: Retrieved Sucessful
359: Retrieved Sucessful
360: Retrieved Sucessful
361: Retrieved Sucessful
362: Retrieved Sucessful
363: Retrieved Sucessful
364: Retrieved Sucessful
365: Retrieved Sucessful
366: Retrieved Sucessful
367: Retrieved Sucessful
368: Retrieved Sucessful
369: Retrieved Sucessful
370: Retrieved Sucessful
371: Retrieved Sucessful
372: Retrieved Sucessful


In [170]:
# with open('foursquareVenue.p', 'wb') as fp:
#     pickle.dump(all_venues2, fp, protocol=pickle.HIGHEST_PROTOCOL)

In [171]:
with open('foursquareVenue.p', 'rb') as fp:
    all_venues = pickle.load(fp)

In [172]:
# Unpack the dictionary into rows of [pk, category name, category id]
categories = []
# pk is the key under which each listing's venues were stored
for pk, items in all_venues.items():
    for item in items:
        # Use the first category listed for each venue
        first_cat = item['venue']['categories'][0]
        categories.append([pk, first_cat['name'], first_cat['id']])
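
To make the nested structure being unpacked concrete, here is a small sketch on made-up data. `sample_venues` is hypothetical: it mirrors only the keys accessed in the cell above (`venue`, `categories`, `name`, `id`), and its ids are placeholders rather than real Foursquare ids. It also shows one way to skip a venue whose `categories` list is empty, a case where indexing `[0]` would raise an `IndexError`.

```python
# Hypothetical data mirroring the structure of all_venues
sample_venues = {
    0: [
        {'venue': {'categories': [{'name': 'Coffee Shop', 'id': 'id-coffee'}]}},
        {'venue': {'categories': []}},  # no category listed: skipped below
    ],
    1: [
        {'venue': {'categories': [{'name': 'Pub', 'id': 'id-pub'}]}},
    ],
}

rows = []
for pk, items in sample_venues.items():
    for item in items:
        cats = item['venue']['categories']
        if not cats:
            continue  # nothing to record for this venue
        rows.append([pk, cats[0]['name'], cats[0]['id']])

print(rows)  # [[0, 'Coffee Shop', 'id-coffee'], [1, 'Pub', 'id-pub']]
```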

In [173]:
neighourhood_POI = pd.DataFrame(categories,columns=['pk','category', 'subid'])
neighourhood_POI

Unnamed: 0,pk,category,subid
0,0,Japanese Restaurant,4bf58dd8d48988d111941735
1,0,Coffee Shop,4bf58dd8d48988d1e0931735
2,0,Dessert Shop,4bf58dd8d48988d1d0941735
3,0,Coffee Shop,4bf58dd8d48988d1e0931735
4,0,Hotel,4bf58dd8d48988d1fa931735
...,...,...,...
11255,555,Gas Station,4bf58dd8d48988d113951735
11256,555,Dessert Shop,4bf58dd8d48988d1d0941735
11257,555,Pub,4bf58dd8d48988d11b941735
11258,555,Lighthouse,4bf58dd8d48988d15d941735


We need to convert each subid to its main category as defined by Foursquare. First we must find the main category for each subid. We could look these up manually on the Foursquare website, but that is tedious, so we will scrape the category page instead.

In [176]:
url = 'https://developer.foursquare.com/docs/build-with-foursquare/categories/'
web = requests.get(url)

In [177]:
soup = BeautifulSoup(web.text, 'lxml')

In [200]:
# The main categories we are searching for (matched against each icon's alt text)
main_category = [
    'Arts & Entertainment icon',
    'College & University icon',
    'Food icon',
    'Nightlife Spot icon',
    'Outdoors & Recreation icon',
    'Shop & Service icon',
]
# We will combine all results in a dictionary: main category -> list of ids
all_category = {}

for category in main_category:
    # The ids found under this main category
    ps = []
    # Find the parent element of each main category's icon
    venue_object = soup.find('img', {'alt': category}).parent

    # Find all p tags under that parent; these hold the ids
    for p in venue_object.find_all('p'):
        # Skip empty p tags and p tags that list supported countries
        if not p.text or p.text.startswith('Supported'):
            continue
        ps.append(p.text)
    all_category[category] = ps

In [201]:
# have a look into the result of our parsing
all_category

{'Arts & Entertainment icon': ['4d4b7104d754a06370d81259',
  '56aa371be4b08b9a8d5734db',
  '4fceea171983d5d06c3e9823',
  '4bf58dd8d48988d1e1931735',
  '4bf58dd8d48988d1e2931735',
  '4bf58dd8d48988d1e4931735',
  '4bf58dd8d48988d17c941735',
  '52e81612bcbc57f1066b79e7',
  '4bf58dd8d48988d18e941735',
  '5032792091d4c4b30a586d5c',
  '52e81612bcbc57f1066b79ef',
  '52e81612bcbc57f1066b79e8',
  '56aa371be4b08b9a8d573532',
  '4bf58dd8d48988d1f1931735',
  '52e81612bcbc57f1066b79ea',
  '4deefb944765f83613cdba6e',
  '5744ccdfe4b0c0459246b4bb',
  '52e81612bcbc57f1066b79e6',
  '5642206c498e4bfca532186c',
  '52e81612bcbc57f1066b79eb',
  '4bf58dd8d48988d17f941735',
  '56aa371be4b08b9a8d5734de',
  '4bf58dd8d48988d17e941735',
  '4bf58dd8d48988d180941735',
  '4bf58dd8d48988d181941735',
  '4bf58dd8d48988d18f941735',
  '559acbe0498e472f1a53fa23',
  '4bf58dd8d48988d190941735',
  '4bf58dd8d48988d192941735',
  '4bf58dd8d48988d191941735',
  '4bf58dd8d48988d1e5931735',
  '4bf58dd8d48988d1e7931735',
  '4bf58dd8

Now that we have the main category for each subid, we will update our dataframe.

In [214]:
# Same order as the keys of all_category, without the ' icon' suffix
main_category = ['Arts & Entertainment', 'College & University', 'Food', 'Nightlife Spot', 'Outdoors & Recreation', 'Shop & Service']

all_cat = []

# Loop through each row of our main dataframe
for idx in range(len(neighourhood_POI)):
    subid = neighourhood_POI.loc[idx, 'subid']
    # One boolean per main category, e.g. [False, True, False, False, False, False].
    # At most one entry is expected to be True.
    boolean_list = [subid in all_category[main_cat] for main_cat in all_category]

    # Foursquare also returns categories that were not in the initial query.
    # We label those ids as not a match.
    if not any(boolean_list):
        result = 'Not A Match'
    else:
        # Match the main categories against the boolean list,
        # e.g. [art, college, food] with [True, False, False] returns art
        result = list(compress(main_category, boolean_list))[0]
    all_cat.append(result)

# Add the labels to the neighbourhood dataframe
neighourhood_POI['main_category'] = all_cat

# Save as a CSV
neighourhood_POI.to_csv('Data/neighourhood_POI.csv', index=False)
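
As an alternative to scanning every category list for every row, the scraped dictionary can be inverted once into a subid-to-main-category lookup. The sketch below is an assumption about how that could look, not the notebook's method: `build_reverse_lookup` is a hypothetical helper, and `sample` uses placeholder ids in the same shape as `all_category` (keys ending in `' icon'`).

```python
def build_reverse_lookup(all_category):
    """Map every sub-category id to its main category name.

    Hypothetical helper. Keys are assumed to be the img alt texts,
    so a trailing ' icon' is stripped from each one.
    """
    suffix = ' icon'
    lookup = {}
    for key, subids in all_category.items():
        main = key[:-len(suffix)] if key.endswith(suffix) else key
        for subid in subids:
            # Keep the first main category seen for an id
            lookup.setdefault(subid, main)
    return lookup


# Placeholder ids, for illustration only
sample = {
    'Food icon': ['id-food', 'id-dessert'],
    'Nightlife Spot icon': ['id-pub'],
}
lookup = build_reverse_lookup(sample)

print(lookup.get('id-pub', 'Not A Match'))  # Nightlife Spot
print(lookup.get('id-bus', 'Not A Match'))  # Not A Match
```

With a dictionary like this, the column could be filled in one step with `neighourhood_POI['subid'].map(lookup).fillna('Not A Match')`, which replaces the per-row list scans with a single dictionary lookup per row.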

In [218]:
# have a look into the dataframe
neighourhood_POI.tail(5)

Unnamed: 0,pk,category,subid,main_category
11255,555,Gas Station,4bf58dd8d48988d113951735,Shop & Service
11256,555,Dessert Shop,4bf58dd8d48988d1d0941735,Food
11257,555,Pub,4bf58dd8d48988d11b941735,Nightlife Spot
11258,555,Lighthouse,4bf58dd8d48988d15d941735,Outdoors & Recreation
11259,555,Bus Station,4bf58dd8d48988d1fe931735,Not A Match


The labelling looks good. We have completed data acquisition.