# Preparation
The data on HDB resale property transactions comes in 4 separate datasets (link:
https://data.gov.sg/dataset/resale-flat-prices) differentiated by the date of the transaction:

- 1990 – 1999
- 2000 – Feb 2012
- Mar 2012 – Dec 2014
- Jan 2015 onwards

Prior to attempting the questions below, please merge the datasets together.

In [2]:
import json
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re

In [16]:
df = pd.read_csv('hdb_resale.csv')

In [2]:
# data source: https://data.gov.sg/dataset/resale-flat-prices
query_string=[
    # Jan 2017 onwards
    'https://data.gov.sg/api/action/datastore_search?resource_id=f1765b54-a209-4718-8d38-a39237f502b3&limit=103555',
    # Jan 2015 to Dec 2016
    'https://data.gov.sg/api/action/datastore_search?resource_id=1b702208-44bf-4829-b620-4615ee19b57c&limit=37153',
    # Mar 2012 to Dec 2014
    'https://data.gov.sg/api/action/datastore_search?resource_id=83b2fc37-ce8c-4df4-968b-370fd818138b&limit=52203',
    #2000 to Feb 2012
    'https://data.gov.sg/api/action/datastore_search?resource_id=8c00bf08-9124-479e-aeca-7cc411d884c4&limit=369651',
    #1990 to 1999
    'https://data.gov.sg/api/action/datastore_search?resource_id=adbbddd3-30e2-445f-a123-29bee150a6fe&limit=287196'
]

Using API provided by data.gov, a function is made to compile all the data and to form one merged sheet

In [3]:
def hdb_api(query_string):
    all_df = []
    for url in query_string:
        resp = requests.get(url)
        
        #Convert JSON into Python Object 
        data = json.loads(resp.content)
        
        # find length of data and columns
        hdb_price_dict_records = data['result']['records']
        col = len(hdb_price_dict_records[0])
        
        # Next we need to feed our JSON data into dataframe. 
        # We will access the 'records' key:value pairs of the python dictionary. 
        # We will then map the list into a dataframe.
        town = []
        flat_type = []
        flat_model = []
        floor_area_sqm = []
        street_name = []
        resale_price = []
        month = []
        remaining_lease = []
        lease_commence_date = []
        storey_range = []
        _id = []
        block = []

        for i in range(len(hdb_price_dict_records)):
            town.append(hdb_price_dict_records[i]['town'])
            flat_type.append(hdb_price_dict_records[i]['flat_type'])
            flat_model.append(hdb_price_dict_records[i]['flat_model'])
            floor_area_sqm.append(hdb_price_dict_records[i]['floor_area_sqm'])
            street_name.append(hdb_price_dict_records[i]['street_name'])
            resale_price.append(hdb_price_dict_records[i]['resale_price'])
            month.append(hdb_price_dict_records[i]['month'])
            lease_commence_date.append(hdb_price_dict_records[i]['lease_commence_date'])
            storey_range.append(hdb_price_dict_records[i]['storey_range'])
            _id.append(hdb_price_dict_records[i]['_id'])
            block.append(hdb_price_dict_records[i]['block'])
            if col >11:
                remaining_lease.append(hdb_price_dict_records[i]['remaining_lease'])

        # append all_df to be combined later
        if col >11:
            all_df.append(pd.DataFrame({
                'town': town,
                'flat_type': flat_type,
                'flat_model': flat_model,
                'floor_area_sqm': floor_area_sqm,
                'street_name': street_name,
                'resale_price': resale_price,
                'month': month,
                'remaining_lease': remaining_lease,
                'lease_commence_date': lease_commence_date,
                'storey_range': storey_range,
                '_id': _id,
                'block': block
            }))
        else:
            all_df.append(pd.DataFrame({
                'town': town,
                'flat_type': flat_type,
                'flat_model': flat_model,
                'floor_area_sqm': floor_area_sqm,
                'street_name': street_name,
                'resale_price': resale_price,
                'month': month,
                'lease_commence_date': lease_commence_date,
                'storey_range': storey_range,
                '_id': _id,
                'block': block
            }))
        
    return pd.concat(all_df,ignore_index=True)

In [4]:
df = hdb_api(query_string)

In [5]:
# Verify that there is no missing entries (849758 entries)
len(df)

849758

## Getting to know the dataset
The HDB resale data can be retrieved from data.gov.sg. Below is an example of the columns available from the [`HDB_resale.csv`]('HDB_resale.csv') dataset:

|Feature| Description |
|--|--|
| Month | Given in the format of year-month. We may retrieve the year data from this column, which may be useful when analysing the time trend for HDB resale price. |
| Town | Town location should be one of the key factors affecting HDB resale price — we are generally expecting an HDB flat in Orchard has a much higher resale price than Yishun given the same flat type.|
|Flat Type| There are 7 different kinds of flat types: 1 Room, 2 Room, 3 Room, 4 Room, 5 Room, EC and Multi-generation. Among which the 4 Room HDB flats are the most popular ones in Singapore. We may consider using 4 Room data samples to construct the model. |
|Storey Range| This column is given as a string rather than numbers, we may need to do some data munging accordingly if we want to use it to build the model. |
|Flat Model| Similarly, there are plenty of different flat models out there(35 different types). This factor would play an important role in the overall flat price. E.g., the DBSS (Design, Build and Sell Scheme) flats would have a higher resale price considering it allows buyers to design the HDB based on their own style. |
|Remaining Lease| Singapore HDB has a lease of 99 years. This column data has quite some NULL values, and it is calculated based on different years. We may need to adjust this column data accordingly when building the model.|

## Datatypes

In [8]:
from datetime import datetime
import re

In [9]:
df.dtypes

town                   object
flat_type              object
flat_model             object
floor_area_sqm         object
street_name            object
resale_price           object
month                  object
remaining_lease        object
lease_commence_date    object
storey_range           object
_id                     int64
block                  object
dtype: object

In [10]:
df.head(2)

Unnamed: 0,town,flat_type,flat_model,floor_area_sqm,street_name,resale_price,month,remaining_lease,lease_commence_date,storey_range,_id,block
0,WOODLANDS,5 ROOM,Model A,134,MARSILING RD,350000,2017-03,77 years 01 month,1995,04 TO 06,1,139
1,WOODLANDS,5 ROOM,Standard,120,MARSILING DR,355000,2017-03,57 years 10 months,1976,16 TO 18,2,9


The following column datatypes need to be handled: 
- '_id' should be dropped because there it has no meaning to the database other than identify itself in its .csv file
- 'storey_range': to two columns lower and upper storeys as separate columns
- 'floor_area_sqm','resale_price','storey_lower', 'storey_upper', 'block_num': to numeric
- 'month','lease_commence_date': to date
- 'remaining_lease': has many ms may not be used so not amended

In [11]:
df.drop(columns='_id',inplace=True)

In [12]:
df['storey_lower'] = df['storey_range'].apply(lambda x: re.findall('\d+',x)[0])
df['storey_upper'] = df['storey_range'].apply(lambda x: re.findall('\d+',x)[1])
df['block_num'] = df['block'].apply(lambda x: re.findall('\A\d+',x)[0])

In [13]:
# convert numeric columns
for x in ['floor_area_sqm','resale_price','storey_lower', 'storey_upper', 'block_num']:
    df[x] = pd.to_numeric(df[x])

In [14]:
# change to time format for date columns
for x in ['month','lease_commence_date']:
    df[x] = pd.to_datetime(df[x])

In [15]:
df['storey_ave'] = (df['storey_lower'] + df['storey_upper'])/2

In [4]:
numeric_feat =[col for col in df.columns if df[col].dtypes != 'O'and col not in ['_id']]
discrete_feat = [col for col in numeric_feat if len(df[col].unique())<25 and col not in ['_id']]
continuous_feat = [col for col in numeric_feat if col not in discrete_feat and col not in ['_id']]
categorical_feat = [col for col in df.columns if df[col].dtypes == 'O']

In [17]:
feature_types = [numeric_feat,discrete_feat,continuous_feat,categorical_feat]
feature_label = ['numeric','discrete','continuous','categorical']

for feat,ls in zip(feature_label,feature_types):
    print(f'{feat} features: {len(ls)}, \n{ls}\n')

numeric features: 8, 
['floor_area_sqm', 'resale_price', 'month', 'lease_commence_date', 'storey_lower', 'storey_upper', 'block_num', 'storey_ave']

discrete features: 3, 
['storey_lower', 'storey_upper', 'storey_ave']

continuous features: 5, 
['floor_area_sqm', 'resale_price', 'month', 'lease_commence_date', 'block_num']

categorical features: 7, 
['town', 'flat_type', 'flat_model', 'street_name', 'remaining_lease', 'storey_range', 'block']



## Dealing with duplicates

There is some duplicated data in this set. However, given the unspecific descriptors of the data it can be assumed that they are distinct purchases. So duplicates are not removed

In [27]:
df[df.duplicated(keep=False)].head()

Unnamed: 0,town,flat_type,flat_model,floor_area_sqm,street_name,resale_price,month,remaining_lease,lease_commence_date,storey_range,block,storey_lower,storey_upper,block_num,storey_ave
235,ANG MO KIO,4 ROOM,New Generation,91.0,ANG MO KIO AVE 1,470000.0,2017-04-01,63 years 10 months,1982-01-01,04 TO 06,335,4,6,335,5.0
245,ANG MO KIO,4 ROOM,New Generation,91.0,ANG MO KIO AVE 1,470000.0,2017-04-01,63 years 10 months,1982-01-01,04 TO 06,335,4,6,335,5.0
275,BEDOK,3 ROOM,New Generation,67.0,BEDOK NTH ST 3,288000.0,2017-04-01,61 years 05 months,1979-01-01,01 TO 03,510,1,3,510,2.0
276,BEDOK,3 ROOM,New Generation,67.0,BEDOK NTH ST 3,288000.0,2017-04-01,61 years 05 months,1979-01-01,01 TO 03,510,1,3,510,2.0
335,BEDOK,4 ROOM,Model A,108.0,JLN TENAGA,470000.0,2017-04-01,78 years 07 months,1996-01-01,04 TO 06,655,4,6,655,5.0


## Review Categories

'flat_type', 'flat_model' needs to be conditioned to remove redundancies

In [7]:
for feat in categorical_feat:
    print(f'======={feat}=======')
    print(df[feat].unique())

['WOODLANDS' 'YISHUN' 'ANG MO KIO' 'BEDOK' 'BISHAN' 'BUKIT BATOK'
 'PUNGGOL' 'BUKIT MERAH' 'BUKIT PANJANG' 'BUKIT TIMAH' 'CENTRAL AREA'
 'CHOA CHU KANG' 'CLEMENTI' 'GEYLANG' 'HOUGANG' 'JURONG EAST'
 'JURONG WEST' 'KALLANG/WHAMPOA' 'MARINE PARADE' 'PASIR RIS' 'QUEENSTOWN'
 'SEMBAWANG' 'SENGKANG' 'SERANGOON' 'TAMPINES' 'TOA PAYOH' 'LIM CHU KANG']
['5 ROOM' 'EXECUTIVE' '3 ROOM' '4 ROOM' '2 ROOM' '1 ROOM'
 'MULTI-GENERATION' 'MULTI GENERATION']
['Model A' 'Standard' 'Improved' 'Premium Apartment' 'Apartment'
 'Maisonette' 'New Generation' 'Simplified' 'DBSS' 'Model A2' 'Type S1'
 'Type S2' 'Model A-Maisonette' 'Adjoined flat' 'Improved-Maisonette'
 'Terrace' 'Premium Maisonette' 'Multi Generation'
 'Premium Apartment Loft' '2-room' 'IMPROVED' 'NEW GENERATION' 'MODEL A'
 'STANDARD' 'SIMPLIFIED' 'MODEL A-MAISONETTE' 'APARTMENT' 'MAISONETTE'
 'TERRACE' '2-ROOM' 'IMPROVED-MAISONETTE' 'MULTI GENERATION'
 'PREMIUM APARTMENT']
['MARSILING RD' 'MARSILING DR' 'WOODLANDS DR 60' 'WOODLANDS DR 75'
 'W

['2017-03-01' '2017-04-01' '2018-08-01' '2019-04-01' '2020-01-01'
 '2017-05-01' '2017-06-01' '2017-07-01' '2017-08-01' '2017-09-01'
 '2017-10-01' '2017-11-01' '2017-12-01' '2018-05-01' '2018-01-01'
 '2018-02-01' '2018-03-01' '2018-04-01' '2018-06-01' '2018-07-01'
 '2018-09-01' '2018-10-01' '2018-11-01' '2018-12-01' '2019-01-01'
 '2019-02-01' '2019-03-01' '2019-05-01' '2019-06-01' '2019-07-01'
 '2019-08-01' '2019-09-01' '2019-11-01' '2019-10-01' '2019-12-01'
 '2020-02-01' '2020-03-01' '2020-04-01' '2020-05-01' '2020-06-01'
 '2020-07-01' '2020-08-01' '2020-09-01' '2020-10-01' '2020-11-01'
 '2020-12-01' '2021-01-01' '2021-08-01' '2021-02-01' '2021-03-01'
 '2021-04-01' '2021-05-01' '2021-06-01' '2021-07-01' '2021-09-01'
 '2015-01-01' '2015-06-01' '2015-02-01' '2015-03-01' '2015-04-01'
 '2015-05-01' '2015-07-01' '2015-08-01' '2015-09-01' '2016-01-01'
 '2015-10-01' '2015-11-01' '2015-12-01' '2016-02-01' '2016-03-01'
 '2016-04-01' '2016-05-01' '2016-06-01' '2016-07-01' '2016-08-01'
 '2016-09-

In [14]:
# focus on flat_type, flat_model which have redundant titles
# focus on flat_type, flat_model which have redundant titles
def labeler(feat):
    temp = []
    for value in df[feat].unique():
        if value != ' '.join(re.findall( r'\w+', value.upper())):
                df[feat].replace(value,' '.join(re.findall( r'\w+', value.upper())),inplace=True)

In [18]:
for feat in ['flat_type', 'flat_model']:
    labeler(feat)

### Create a .csv of raw data

In [22]:
df.to_csv('hdb_resale.csv',index=False)

## Getting location data from OneMap

In [20]:
# Now, the next part gets a little tricky. We need to find the nearest MRT station the HDB unit is located in using their geo-location coordinates (Latitude, Longitude). We will use another API to achieve this - the OneMap API.

# First let's create a list of all the MRT stations in Singapore. 
# Since MRT stations change relatively slowly overtime, we can leverage a static list. 
# I obtained the data from Wikipedia, which also provides data on upcoming MRT stations.
# We will only consider existing MRT stations for now.
list_of_mrt = [
    'Jurong East MRT Station',
    'Bukit Batok MRT Station',
    'Bukit Gombak MRT Station',
    'Choa Chu Kang MRT Station',
    'Yew Tee MRT Station',
    'Kranji MRT Station',
    'Marsiling MRT Station',
    'Woodlands MRT Station',
    'Admiralty MRT Station',
    'Sembawang MRT Station',
    'Canberra MRT Station',
    'Yishun MRT Station',
    'Khatib MRT Station',
    'Yio Chu Kang MRT Station',
    'Ang Mo Kio MRT Station',
    'Bishan MRT Station',
    'Braddell MRT Station',
    'Toa Payoh MRT Station',
    'Novena MRT Station',
    'Newton MRT Station',
    'Orchard MRT Station',
    'Somerset MRT Station',
    'Dhoby Ghaut MRT Station',
    'City Hall MRT Station',
    'Raffles Place MRT Station',
    'Marina Bay MRT Station',
    'Marina South Pier MRT Station',
    'Pasir Ris MRT Station',
    'Tampines MRT Station',
    'Simei MRT Station',
    'Tanah Merah MRT Station',
    'Bedok MRT Station',
    'Kembangan MRT Station',
    'Eunos MRT Station',
    'Paya Lebar MRT Station',
    'Aljunied MRT Station',
    'Kallang MRT Station',
    'Lavender MRT Station',
    'Bugis MRT Station',
    'Tanjong Pagar MRT Station',
    'Outram Park MRT Station',
    'Tiong Bahru MRT Station',
    'Redhill MRT Station',
    'Queenstown MRT Station',
    'Commonwealth MRT Station',
    'Buona Vista MRT Station',
    'Dover MRT Station',
    'Clementi MRT Station',
    'Chinese Garden MRT Station',
    'Lakeside MRT Station',
    'Boon Lay MRT Station',
    'Pioneer MRT Station',
    'Joo Koon MRT Station',
    'Gul Circle MRT Station',
    'Tuas Crescent MRT Station',
    'Tuas West Road MRT Station',
    'Tuas Link MRT Station',
    'Expo MRT Station',
    'Changi Airport MRT Station',
    'HarbourFront MRT Station',
    'Chinatown MRT Station',
    'Clarke Quay MRT Station',
    'Little India MRT Station',
    'Farrer Park MRT Station',
    'Boon Keng MRT Station',
    'Potong Pasir MRT Station',
    'Woodleigh MRT Station',
    'Serangoon MRT Station',
    'Kovan MRT Station',
    'Hougang MRT Station',
    'Buangkok MRT Station',
    'Sengkang MRT Station',
    'Punggol MRT Station',
    'Bras Basah MRT Station',
    'Esplanade MRT Station',
    'Promenade MRT Station',
    'Nicoll Highway MRT Station',
    'Stadium MRT Station',
    'Mountbatten MRT Station',
    'Dakota MRT Station',
    'MacPherson MRT Station',
    'Tai Seng MRT Station',
    'Bartley MRT Station',
    'Lorong Chuan MRT Station',
    'Marymount MRT Station',
    'Caldecott MRT Station',
    'Botanic Gardens MRT Station',
    'Farrer Road MRT Station',
    'Holland Village MRT Station',
    'one-north MRT Station',
    'Kent Ridge MRT Station',
    'Haw Par Villa MRT Station',
    'Pasir Panjang MRT Station',
    'Labrador Park MRT Station',
    'Telok Blangah MRT Station',
    'Bayfront MRT Station',
    'Bukit Panjang MRT Station',
    'Cashew MRT Station',
    'Hillview MRT Station',
    'Beauty World MRT Station',
    'King Albert Park MRT Station',
    'Sixth Avenue MRT Station',
    'Tan Kah Kee MRT Station',
    'Stevens MRT Station',
    'Rochor MRT Station',
    'Downtown MRT Station',
    'Telok Ayer MRT Station',
    'Fort Canning MRT Station',
    'Bencoolen MRT Station',
    'Jalan Besar MRT Station',
    'Bendemeer MRT Station',
    'Geylang Bahru MRT Station',
    'Mattar MRT Station',
    'Ubi MRT Station',
    'Kaki Bukit MRT Station',
    'Bedok North MRT Station',
    'Bedok Reservoir MRT Station',
    'Tampines West MRT Station',
    'Tampines East MRT Station',
    'Upper Changi MRT Station'
]


# In[10]:


list_of_shopping_mall = [
    '100 AM',
    '313@Somerset',
    'Aperia',
    'Balestier Hill Shopping Centre',
    'Bugis Cube',
    'Bugis Junction',
    'Bugis+',
    'Capitol Piazza',
    'Cathay Cineleisure Orchard',
    'City Gate',
    'City Square Mall',
    'CityLink Mall',
    'Clarke Quay Central',
    'Duo',
    'Far East Plaza',
    'Funan',
    'Great World City',
    'HDB Hub',
    'Holland Village Shopping Centre',
    'ION Orchard',
    'Junction 8',
    'Knightsbridge',
    'Liang Court',
    'Liat Towers',
    'Lucky Plaza',
    'Marina Bay Financial Centre Tower 3',
    'Marina Bay Link Mall',
    'Marina Bay Sands',
    'Marina One',
    'Marina Square',
    'Midpoint Orchard',
    'Millenia Walk',
    'Mustafa Centre',
    'Ngee Ann City',
    'Orchard Central',
    'Orchard Gateway',
    'Orchard Plaza',
    'Orchard Shopping Centre',
    'Palais Renaissance',
    'Peoples Park Centre',
    'People"s Park Complex',
    'Plaza Singapura',
    'PoMo',
    'Raffles City',
    'Scotts Square',
    'Serangoon Plaza',
    'Shaw House and Centre',
    'Sim Lim Square',
    'Singapore Shopping Centre',
    'Square 2',
    'Suntec City',
    'Tanglin Mall',
    'Tangs',
    'Tanjong Pagar Centre',
    'Tekka Centre',
    'The Centrepoint',
    'The Paragon',
    'The Poiz [2]',
    'The Shoppes at Marina Bay Sands',
    'The South Beach',
    'Thomson Plaza',
    'United Square, The Kids Learning Mall',
    'Velocity',
    'Wheelock Place',
    'Wisma Atria',
    'Zhongshan Mall',
    '112 Katong',
    'Bedok Mall',
    'Bedok Point',
    'Century Square',
    'Changi Airport',
    'Changi City Point',
    'City Plaza',
    'Djitsun Mall Bedok',
    'Downtown East',
    'East Village',
    'Eastpoint Mall',
    'Elias Mall',
    'Kallang Wave Mall',
    'Katong Square',
    'Katong V',
    'KINEX (formerly One KM Mall)',
    'Leisure Park Kallang',
    'Loyang Point',
    'Our Tampines Hub',
    'Parkway Parade',
    'Paya Lebar Square',
    'PLQ Mall',
    'Singapore Post Centre',
    'Tampines 1',
    'Tampines Mall',
    'The Flow',
    'White Sands',
    '888 Plaza',
    'Admiralty Place',
    'AMK Hub',
    'Beauty World Centre',
    'Beauty World Plaza',
    'Broadway Plaza',
    'Buangkok Square',
    'Bukit Panjang Plaza',
    'Bukit Timah Plaza',
    'Causeway Point',
    'Compass One',
    'Djitsun Mall',
    'Fajar Shopping Centre',
    'Greenridge Shopping Centre',
    'Greenwich V',
    'Heartland Mall',
    'Hillion Mall',
    'HillV2',
    'Hougang 1',
    'Hougang Green Shopping Mall',
    'Hougang Mall',
    'Jubilee Square',
    'Junction 10',
    'Junction 9',
    'Keat Hong Shopping Centre',
    'KKH The Retail Mall',
    'Limbang Shopping Centre',
    'Lot One',
    'Marsiling Mall',
    'myVillage @ Serangoon',
    'NEX',
    'North East',
    'North West',
    'Northpoint City',
    'Oasis Terraces',
    'Punggol Plaza',
    'Rail Mall',
    'Rivervale Mall',
    'Rivervale Plaza',
    'Sembawang Shopping Centre',
    'Sun Plaza',
    'Sunshine Place',
    'Teck Whye Shopping Centre',
    'The Midtown',
    'The Seletar Mall',
    'Upper Serangoon Shopping Centre',
    'Waterway Point',
    'West Mall',
    'Wisteria Mall',
    'Woodlands Mart',
    'Yew Tee Point',
    'Yew Tee Shopping Centre',
    'Yew Tee Square',
    'Alexandra Retail Centre',
    'HarbourFront Centre',
    'VivoCity',
    '321 Clementi',
    'Alexandra Central',
    'Anchorpoint',
    'Big Box',
    'Boon Lay Shopping Centre',
    'Fairprice Hub',
    'Gek Poh Shopping Centre',
    'Grantral Mall',
    'IMM',
    'JCube',
    'Jem',
    'Jurong Point',
    'OD Mall',
    'Pioneer Mall',
    'Queensway Shopping Centre',
    'Rochester Mall',
    'Taman Jurong Shopping Centre',
    'The Clementi Mall',
    'The Star Vista',
    'Tiong Bahru Plaza',
    'West Coast Plaza',
    'Westgate Mall',
]



In [39]:
# We will use the OneMap API to obtain the (lat, long) coordinates of each MRT station.
# Obtaining MRT coordinates in Singapore
mrt_building = []
mrt_lat = []
mrt_long = []


for i in range(0, len(list_of_mrt)):
    query_address = list_of_mrt[i]
    query_string = 'https://developers.onemap.sg/commonapi/search?searchVal='+str(query_address)+'&returnGeom=Y&getAddrDetails=Y'
    resp = requests.get(query_string)

    data_mrt=json.loads(resp.content)
    
    if data_mrt['found'] != 0:
        mrt_building.append(data_mrt["results"][0]["BUILDING"])
        mrt_lat.append(data_mrt["results"][0]["LATITUDE"])
        mrt_long.append(data_mrt["results"][0]["LONGITUDE"])

        print (str(query_address)+",Lat: "+data_mrt['results'][0]['LATITUDE'] +" Long: "+data_mrt['results'][0]['LONGITUDE'])

    else:
        mrt_building.append('NotFound')
        mrt_lat.append('NotFound')
        mrt_long.append('NotFound')
        print ("No Results")

# Store this information in a dataframe
mrt_location = pd.DataFrame({
    'MRT': list_of_mrt,
    'Building': mrt_building,
    'Latitude': mrt_lat,
    'Longitude': mrt_long
})

Jurong East MRT Station,Lat: 1.33315281585758 Long: 103.742286332403
Bukit Batok MRT Station,Lat: 1.34903331201636 Long: 103.749566478309
Bukit Gombak MRT Station,Lat: 1.35861159094192 Long: 103.751790910733
Choa Chu Kang MRT Station,Lat: 1.38536316540225 Long: 103.744370779756
Yew Tee MRT Station,Lat: 1.39753506936297 Long: 103.747405150236
Kranji MRT Station,Lat: 1.42508698073648 Long: 103.762137459497
Marsiling MRT Station,Lat: 1.43252114855026 Long: 103.774074641403
Woodlands MRT Station,Lat: 1.43605761708128 Long: 103.787938777173
Admiralty MRT Station,Lat: 1.44058856161847 Long: 103.800990519771
Sembawang MRT Station,Lat: 1.44905082158502 Long: 103.820046140211
Canberra MRT Station,Lat: 1.44307664075699 Long: 103.829702590959
Yishun MRT Station,Lat: 1.42944308477331 Long: 103.835005047246
Khatib MRT Station,Lat: 1.41738337009565 Long: 103.832979908243
Yio Chu Kang MRT Station,Lat: 1.38175587099132 Long: 103.84494727118
Ang Mo Kio MRT Station,Lat: 1.36993284962264 Long: 103.849558

In [40]:
# Obtaining Mall Coordinates in Singapore
mall_name = []
mall_roadname = []
mall_lat = []
mall_long = []

for i in range(0, len(list_of_shopping_mall)):
    query_address = list_of_shopping_mall[i]
    query_string = 'https://developers.onemap.sg/commonapi/search?searchVal='+str(query_address)+'&returnGeom=Y&getAddrDetails=Y'
    resp = requests.get(query_string)
    data_mall=json.loads(resp.content)
    
    if data_mall['found'] != 0:
        mall_name.append(query_address)
        mall_roadname.append(data_mall["results"][0]["ROAD_NAME"])
        mall_lat.append(data_mall["results"][0]["LATITUDE"])
        mall_long.append(data_mall["results"][0]["LONGITUDE"])

        print (str(query_address)+" ,Lat: "+data_mall['results'][0]['LATITUDE'] +" Long: "+data_mall['results'][0]['LONGITUDE'])

    else:
        print ("No Results")

# Store this information in a dataframe
mall_location = pd.DataFrame({
    'Mall': mall_name,
    'RoadName': mall_roadname,
    'Latitude': mall_lat,
    'Longitude': mall_long
})

100 AM ,Lat: 1.27458821795426 Long: 103.84347073661
313@Somerset ,Lat: 1.30138510214714 Long: 103.837684350436
Aperia ,Lat: 1.3104736675734 Long: 103.86431321816
Balestier Hill Shopping Centre ,Lat: 1.32616307866261 Long: 103.843741438467
Bugis Cube ,Lat: 1.2981408343975 Long: 103.855635339249
Bugis Junction ,Lat: 1.30011789343093 Long: 103.856191571652
Bugis+ ,Lat: 1.30095171530648 Long: 103.855172625542
Capitol Piazza ,Lat: 1.29307884763132 Long: 103.851261982149
Cathay Cineleisure Orchard ,Lat: 1.30152101873533 Long: 103.836429655016
City Gate ,Lat: 1.30231590504573 Long: 103.862331661034
City Square Mall ,Lat: 1.31142103107683 Long: 103.856624019991
CityLink Mall ,Lat: 1.2927777312893 Long: 103.854173501417
No Results
Duo ,Lat: 1.29953434891664 Long: 103.85840168774
Far East Plaza ,Lat: 1.30717698071189 Long: 103.833792932243
Funan ,Lat: 1.29134759697794 Long: 103.849989790085
Great World City ,Lat: 1.29342273662815 Long: 103.832022135776
HDB Hub ,Lat: 1.33205485190481 Long: 103.84

Alexandra Retail Centre ,Lat: 1.27414893629254 Long: 103.801399416665
HarbourFront Centre ,Lat: 1.26396991176004 Long: 103.820243325181
VivoCity ,Lat: 1.26429316783973 Long: 103.82230469365
321 Clementi ,Lat: 1.31200212030821 Long: 103.764986676365
Alexandra Central ,Lat: 1.28728320930592 Long: 103.805283367958
Anchorpoint ,Lat: 1.28861469030586 Long: 103.805009964592
Big Box ,Lat: 1.33191839981568 Long: 103.744920406659
Boon Lay Shopping Centre ,Lat: 1.34633552457315 Long: 103.712430373455
Fairprice Hub ,Lat: 1.32561767765205 Long: 103.678409741897
Gek Poh Shopping Centre ,Lat: 1.34874357136408 Long: 103.697732091001
Grantral Mall ,Lat: 1.31427043922294 Long: 103.765147487482
IMM ,Lat: 1.33489844010888 Long: 103.746734487309
JCube ,Lat: 1.33332312550425 Long: 103.740187071348
Jem ,Lat: 1.33305999269581 Long: 103.743503708169
Jurong Point ,Lat: 1.33945271661445 Long: 103.70668501289
No Results
Pioneer Mall ,Lat: 1.34169239214292 Long: 103.697174842118
Queensway Shopping Centre ,Lat: 1.

In [41]:
mrt_location.to_csv('./locations/mrt_location.csv',index=False)
mall_location.to_csv('./locations/mall_location.csv',index=False)

In [43]:
mrt_location

Unnamed: 0,MRT,Building,Latitude,Longitude
0,Jurong East MRT Station,JURONG EAST MRT STATION (EW24 / NS1),1.33315281585758,103.742286332403
1,Bukit Batok MRT Station,BUKIT BATOK MRT STATION (NS2),1.34903331201636,103.749566478309
2,Bukit Gombak MRT Station,BUKIT GOMBAK MRT STATION (NS3),1.35861159094192,103.751790910733
3,Choa Chu Kang MRT Station,CHOA CHU KANG MRT STATION (NS4),1.38536316540225,103.744370779756
4,Yew Tee MRT Station,YEW TEE MRT STATION (NS5),1.39753506936297,103.747405150236
...,...,...,...,...
115,Bedok North MRT Station,BEDOK NORTH MRT STATION (DT29),1.33474211664091,103.91797832995
116,Bedok Reservoir MRT Station,BEDOK RESERVOIR MRT STATION (DT30),1.33660782955099,103.932234623286
117,Tampines West MRT Station,TAMPINES WEST MRT STATION (DT31),1.34551530560119,103.938436971222
118,Tampines East MRT Station,TAMPINES EAST MRT STATION (DT33),1.35619148271544,103.9546344625


In [44]:
mall_location

Unnamed: 0,Mall,RoadName,Latitude,Longitude
0,100 AM,TRAS STREET,1.27458821795426,103.84347073661
1,313@Somerset,ORCHARD ROAD,1.30138510214714,103.837684350436
2,Aperia,KALLANG AVENUE,1.3104736675734,103.86431321816
3,Balestier Hill Shopping Centre,BALESTIER ROAD,1.32616307866261,103.843741438467
4,Bugis Cube,NORTH BRIDGE ROAD,1.2981408343975,103.855635339249
...,...,...,...,...
151,Taman Jurong Shopping Centre,YUNG SHENG ROAD,1.33484487416514,103.720462024278
152,The Clementi Mall,COMMONWEALTH AVENUE WEST,1.31489619639366,103.764423054189
153,The Star Vista,VISTA EXCHANGE GREEN,1.30697044038323,103.788420274115
154,Tiong Bahru Plaza,TIONG BAHRU ROAD,1.28645889472825,103.827015256194


### getting data for all the house holds

In [112]:
from geopy.distance import geodesic, great_circle

In [95]:
from geopy.geocoders import GoogleV3

In [97]:
from geopy.geocoders import Nominatim

In [92]:
#!pip install geopy

Collecting geopy
  Downloading geopy-2.2.0-py3-none-any.whl (118 kB)
[K     |████████████████████████████████| 118 kB 13.8 MB/s eta 0:00:01
[?25hCollecting geographiclib<2,>=1.49
  Downloading geographiclib-1.52-py3-none-any.whl (38 kB)
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.52 geopy-2.2.0


In [98]:
geolocator = Nominatim(user_agent="GoogleV3")

In [109]:
address2, (latitude2, longitude2) = geolocator.geocode('The Paragon Singapore')

In [114]:
print(great_circle((latitude1, longitude1), (latitude2, longitude2)).km)

4.757381542639183


In [50]:
print(f'{len(mrt_location)} mrt out of {len(list_of_mrt)} searched')
print(f'{len(mall_location)} malls out of {len(list_of_shopping_mall)} searched')

120 malls out of 120 searched
156 malls out of 171 searched


In [120]:
# Now - let's find grab the geolocation of each unit that was transacated using the same method. But hang on - that's a large dataset.. we can make it a bit more efficient. We know that there will be multiple units that will be transacted in the same HDB Apartment block. We can de-dup our dataframe and obtion the only unique addresses in our dataframe. 
# Let's combine the block and street name to form the address of our transacted unit.
df['address'] = df['block'] + ' ' + df['street_name']



# Dedup Address List
df_dedup = df.drop_duplicates(subset='address', keep='first')
len(df_dedup)

# Next let's grab the unique addresses and create a list 
address_list = df_dedup['address'].tolist()
len(address_list)

281

In [122]:
# This may take a while...
latitude = []
longitude = []
blk_no = []
road_name = []
postal_code = []
address = []
count = 0

for row in range(len(address_list)):
    #formulate query string 
    address1, (latitude1, longitude1) = geolocator.geocode(address_list[row])     

    #Convert JSON into Python Object 
    data_geo_location=json.loads(resp.content)
    if data_geo_location['found'] != 0:
        latitude.append(data_geo_location['results'][0]['LATITUDE'])
        longitude.append(data_geo_location['results'][0]['LONGITUDE'])
        blk_no.append(data_geo_location['results'][0]['BLK_NO'])
        road_name.append(data_geo_location['results'][0]['ROAD_NAME'])
        postal_code.append(data_geo_location['results'][0]['POSTAL'])
        address.append(query_address)
        print (str(query_address) + " ,Lat: " + data_geo_location['results'][0]['LATITUDE'] + " Long: " + data_geo_location['results'][0]['LONGITUDE'])
    else:
        print ("No Results")

52 LOR 6 TOA PAYOH ,Lat: 1.33824914332833 Long: 103.852481978622
195 KIM KEAT AVE ,Lat: 1.33008102222991 Long: 103.858354604112
194 KIM KEAT AVE ,Lat: 1.33093933675526 Long: 103.858181909317
71 LOR 4 TOA PAYOH ,Lat: 1.33417429846958 Long: 103.852204554719
3 LOR 7 TOA PAYOH ,Lat: 1.33919545395688 Long: 103.854450883864
92 LOR 4 TOA PAYOH ,Lat: 1.33807081646129 Long: 103.849457178811
8 LOR 7 TOA PAYOH ,Lat: 1.3378805011388 Long: 103.856612802183
12 LOR 7 TOA PAYOH ,Lat: 1.33674610724551 Long: 103.857615884177
18 LOR 7 TOA PAYOH ,Lat: 1.33575723665506 Long: 103.856789628423
118 POTONG PASIR AVE 1 ,Lat: 1.33548147418923 Long: 103.862653751675
27 TOA PAYOH EAST ,Lat: 1.33227000712346 Long: 103.856360426029
116 LOR 2 TOA PAYOH ,Lat: 1.34047287241805 Long: 103.846142516335
113 LOR 1 TOA PAYOH ,Lat: 1.34128334016832 Long: 103.844605484696
98 LOR 1 TOA PAYOH ,Lat: 1.33917577957921 Long: 103.848380074395
183 TOA PAYOH CTRL ,Lat: 1.33335596212429 Long: 103.848865687447
205 TOA PAYOH NTH ,Lat: 1.3

138C LOR 1A TOA PAYOH ,Lat: 1.33626413671063 Long: 103.844907413472
227 LOR 8 TOA PAYOH ,Lat: 1.33951514445795 Long: 103.857970837281
202 TOA PAYOH NTH ,Lat: 1.34152911615925 Long: 103.849589127712
241 KIM KEAT LINK ,Lat: 1.33050536301596 Long: 103.855776378989
36 LOR 5 TOA PAYOH ,Lat: 1.33482139850243 Long: 103.855272944611
175 LOR 2 TOA PAYOH ,Lat: 1.33339638797854 Long: 103.847053798767
155 LOR 1 TOA PAYOH ,Lat: 1.33279631135029 Long: 103.845588638625
84A LOR 2 TOA PAYOH ,Lat: 1.33535500400261 Long: 103.847029911151
244 KIM KEAT LINK ,Lat: 1.33069367982742 Long: 103.85635902134
86 LOR 2 TOA PAYOH ,Lat: 1.33549135587104 Long: 103.847691358826
140 POTONG PASIR AVE 3 ,Lat: 1.33381551462422 Long: 103.866824943073
139B LOR 1A TOA PAYOH ,Lat: 1.33671109109394 Long: 103.843878159336
127 POTONG PASIR AVE 1 ,Lat: 1.33503445523376 Long: 103.865505631737
94 LOR 4 TOA PAYOH ,Lat: 1.33889442783825 Long: 103.849538318068
4 UPP ALJUNIED LANE ,Lat: 1.33354743210386 Long: 103.879065838406
3 UPP ALJU

103 POTONG PASIR AVE 1 ,Lat: 1.33411338676706 Long: 103.86952371664
137 POTONG PASIR AVE 3 ,Lat: 1.33385759374197 Long: 103.865845367722
251 KIM KEAT LINK ,Lat: 1.3315468916112 Long: 103.85701736179
46 LOR 5 TOA PAYOH ,Lat: 1.33679590445632 Long: 103.85339881314
126 POTONG PASIR AVE 1 ,Lat: 1.33497246267555 Long: 103.865234309552
128 POTONG PASIR AVE 1 ,Lat: 1.33480397745816 Long: 103.86578784298
111 LOR 1 TOA PAYOH ,Lat: 1.34121955722461 Long: 103.84549278032
144 POTONG PASIR AVE 2 ,Lat: 1.33262019907368 Long: 103.86551350588
246 KIM KEAT LINK ,Lat: 1.33078082665669 Long: 103.855844268285
240 LOR 1 TOA PAYOH ,Lat: 1.34087644567662 Long: 103.850829960636
101B LOR 2 TOA PAYOH ,Lat: 1.33959884500115 Long: 103.847605414427
186 TOA PAYOH CTRL ,Lat: 1.3326539531256 Long: 103.85021164455
124 POTONG PASIR AVE 1 ,Lat: 1.3353470224763 Long: 103.865451584255
7 UPP ALJUNIED LANE ,Lat: 1.33471455041652 Long: 103.878708329058
9 JOO SENG RD ,Lat: 1.33490017856933 Long: 103.879334262884
255 KIM KEAT 

In [123]:
df_coordinates = pd.DataFrame({
    'latitude': latitude,
    'longitude': longitude,
    'blk_no': blk_no,
    'road_name': road_name,
    'postal_code': postal_code,
    'address': address
})

After running through 105 street addresses program error code.  
Initial 105 geodata is saved to `hdb_location.csv`  
so a new method is used with only using TOA PAYOH town for comparisons

In [124]:
df_coordinates.to_csv('./locations/hdb_location2.csv',index=False)

In [85]:
# read all the data needed
mrt_location = pd.read_csv('./locations/mrt_location.csv')
mall_location = pd.read_csv('./locations/mall_location.csv')
df_coordinates1 = pd.read_csv('./locations/hdb_location2.csv')
df_coordinates2 = pd.read_csv('./locations/hdb_location2.csv')
df_coordinates = pd.concat([df_coordinates1,df_coordinates2], ignore_index=True)

In [86]:
for i in [mrt_location, mall_location, df_coordinates1,df_coordinates2]:
    print(i.head())

                         MRT                              Building  Latitude  \
0    Jurong East MRT Station  JURONG EAST MRT STATION (EW24 / NS1)  1.333153   
1    Bukit Batok MRT Station         BUKIT BATOK MRT STATION (NS2)  1.349033   
2   Bukit Gombak MRT Station        BUKIT GOMBAK MRT STATION (NS3)  1.358612   
3  Choa Chu Kang MRT Station       CHOA CHU KANG MRT STATION (NS4)  1.385363   
4        Yew Tee MRT Station             YEW TEE MRT STATION (NS5)  1.397535   

    Longitude  
0  103.742286  
1  103.749566  
2  103.751791  
3  103.744371  
4  103.747405  
                             Mall           RoadName  Latitude   Longitude
0                          100 AM        TRAS STREET  1.274588  103.843471
1                    313@Somerset       ORCHARD ROAD  1.301385  103.837684
2                          Aperia     KALLANG AVENUE  1.310474  103.864313
3  Balestier Hill Shopping Centre     BALESTIER ROAD  1.326163  103.843741
4                      Bugis Cube  NORTH BRIDGE 

In [90]:
df_coordinates

Unnamed: 0,latitude,longitude,blk_no,road_name,postal_code,address
0,1.338249,103.852482,52,LORONG 6 TOA PAYOH,310052,52 LOR 6 TOA PAYOH
1,1.330081,103.858355,195,KIM KEAT AVENUE,310195,195 KIM KEAT AVE
2,1.330939,103.858182,194,KIM KEAT AVENUE,310194,194 KIM KEAT AVE
3,1.334174,103.852205,71,LORONG 4 TOA PAYOH,310071,71 LOR 4 TOA PAYOH
4,1.339195,103.854451,3,LORONG 7 TOA PAYOH,310003,3 LOR 7 TOA PAYOH
...,...,...,...,...,...,...
549,1.334449,103.865564,130,POTONG PASIR AVENUE 1,350130,130 POTONG PASIR AVE 1
550,1.334175,103.865611,132,POTONG PASIR AVENUE 1,350132,132 POTONG PASIR AVE 1
551,1.334762,103.865543,129,POTONG PASIR AVENUE 1,350129,129 POTONG PASIR AVE 1
552,1.334727,103.849823,79E,TOA PAYOH CENTRAL,315079,79 TOA PAYOH CTRL


---
---
---
---
play area

In [3]:
df = pd.read_csv('hdb_resale.csv')

  exec(code_obj, self.user_global_ns, self.user_ns)


In [4]:
mrt_location = pd.read_csv('./locations/mrt_location.csv')
mall_location = pd.read_csv('./locations/mall_location.csv')
df_coordinates1 = pd.read_csv('./locations/hdb_location.csv')
df_coordinates2 = pd.read_csv('./locations/hdb_location2.csv')
df_coordinates3 = pd.read_csv('./locations/hdb_location3.csv')

In [5]:
df_coordinates1

Unnamed: 0,latitude,longitude,blk_no,road_name,postal_code,address
0,1.439112,103.782257,1,MARSILING INDUSTRIAL ESTATE ROAD 4,739229,MARSILING RD
1,1.440671,103.775740,1,MARSILING DRIVE,730001,MARSILING DR
2,1.432705,103.797918,60,WOODLANDS DRIVE 16,737896,WOODLANDS DR 60
3,1.441216,103.806599,687,WOODLANDS DRIVE 75,730687,WOODLANDS DR 75
4,1.440922,103.789086,,WOODLANDS ST 81,NIL,WOODLANDS ST 81
...,...,...,...,...,...,...
100,1.346322,103.754328,,BT BATOK ST 21,NIL,BT BATOK ST 21
101,1.365774,103.748521,2,BUKIT BATOK WEST AVENUE 7,659003,BT BATOK WEST AVE 7
102,1.287288,103.831873,2,JALAN BUKIT HO SWEE,162002,JLN BT HO SWEE
103,1.287973,103.808839,35,JALAN RUMAH TINGGI,150035,JLN RUMAH TINGGI


In [6]:
df_coordinates2

Unnamed: 0,latitude,longitude,blk_no,road_name,postal_code,address
0,1.338249,103.852482,52,LORONG 6 TOA PAYOH,310052,52 LOR 6 TOA PAYOH
1,1.330081,103.858355,195,KIM KEAT AVENUE,310195,195 KIM KEAT AVE
2,1.330939,103.858182,194,KIM KEAT AVENUE,310194,194 KIM KEAT AVE
3,1.334174,103.852205,71,LORONG 4 TOA PAYOH,310071,71 LOR 4 TOA PAYOH
4,1.339195,103.854451,3,LORONG 7 TOA PAYOH,310003,3 LOR 7 TOA PAYOH
...,...,...,...,...,...,...
272,1.334449,103.865564,130,POTONG PASIR AVENUE 1,350130,130 POTONG PASIR AVE 1
273,1.334175,103.865611,132,POTONG PASIR AVENUE 1,350132,132 POTONG PASIR AVE 1
274,1.334762,103.865543,129,POTONG PASIR AVENUE 1,350129,129 POTONG PASIR AVE 1
275,1.334727,103.849823,79E,TOA PAYOH CENTRAL,315079,79 TOA PAYOH CTRL


In [7]:
df_coordinates3

Unnamed: 0,latitude,longitude,full_address,block,street_name,address
0,1.437702,103.777336,"139, Marsiling Road, Woodlands, Northwest, 739...",139,Marsiling Road,139 Marsiling Road
1,1.441276,103.776461,"9, Marsiling Drive, Woodlands, Northwest, 7300...",9,Marsiling Drive,9 Marsiling Drive
2,1.440353,103.777208,"Marsiling Drive, Woodlands, Northwest, 730010,...",Marsiling Drive,Woodlands,Marsiling Drive Woodlands
3,1.446388,103.799086,"Woodlands Drive 60, Woodlands, Northwest, 7327...",Woodlands Drive 60,Woodlands,Woodlands Drive 60 Woodlands
4,1.441802,103.786259,"801, Woodlands St 81, Woodlands, Northwest, 73...",801,Woodlands St 81,801 Woodlands St 81
...,...,...,...,...,...,...
7884,1.287325,103.817322,"71, Redhill Road, Bukit Merah, Singapore, Cent...",71,Redhill Road,71 Redhill Road
7885,1.372632,103.855825,"529, Ang Mo Kio Avenue 10, Ang Mo Kio, Singapo...",529,Ang Mo Kio Avenue 10,529 Ang Mo Kio Avenue 10
7886,1.286151,103.831144,"7, Tiong Bahru Road, Tiong Bahru Estate, Bukit...",7,Tiong Bahru Road,7 Tiong Bahru Road
7887,1.286207,103.831763,"5, Tiong Bahru Road, Tiong Bahru Estate, Bukit...",5,Tiong Bahru Road,5 Tiong Bahru Road


In [32]:
df

Unnamed: 0,town,flat_type,flat_model,floor_area_sqm,street_name,resale_price,month,remaining_lease,lease_commence_date,storey_range,...,storey_lower,storey_upper,block_num,storey_ave,address,latitude,longitude,blk_no,road_name,postal_code
0,ANG MO KIO,2 ROOM,IMPROVED,44.0,ANG MO KIO AVE 10,232000.0,2017-01-01,61 years 04 months,1979-01-01,10 TO 12,...,10,12,406,11.0,406 ANG MO KIO AVE 10,,,,,
1,ANG MO KIO,3 ROOM,NEW GENERATION,67.0,ANG MO KIO AVE 4,250000.0,2017-01-01,60 years 07 months,1978-01-01,01 TO 03,...,1,3,108,2.0,108 ANG MO KIO AVE 4,,,,,
2,ANG MO KIO,3 ROOM,NEW GENERATION,67.0,ANG MO KIO AVE 5,262000.0,2017-01-01,62 years 05 months,1980-01-01,01 TO 03,...,1,3,602,2.0,602 ANG MO KIO AVE 5,,,,,
3,ANG MO KIO,3 ROOM,NEW GENERATION,68.0,ANG MO KIO AVE 10,265000.0,2017-01-01,62 years 01 month,1980-01-01,04 TO 06,...,4,6,465,5.0,465 ANG MO KIO AVE 10,,,,,
4,ANG MO KIO,3 ROOM,NEW GENERATION,67.0,ANG MO KIO AVE 5,265000.0,2017-01-01,62 years 05 months,1980-01-01,01 TO 03,...,1,3,601,2.0,601 ANG MO KIO AVE 5,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
849753,YISHUN,EXECUTIVE,APARTMENT,142.0,YISHUN ST 61,456000.0,1999-12-01,,1987-01-01,10 TO 12,...,10,12,611,11.0,611 YISHUN ST 61,,,,,
849754,YISHUN,EXECUTIVE,APARTMENT,142.0,YISHUN CTRL,408000.0,1999-12-01,,1988-01-01,01 TO 03,...,1,3,324,2.0,324 YISHUN CTRL,,,,,
849755,YISHUN,EXECUTIVE,MAISONETTE,146.0,YISHUN AVE 6,469000.0,1999-12-01,,1988-01-01,07 TO 09,...,7,9,392,8.0,392 YISHUN AVE 6,,,,,
849756,YISHUN,EXECUTIVE,MAISONETTE,146.0,YISHUN RING RD,440000.0,1999-12-01,,1988-01-01,04 TO 06,...,4,6,356,5.0,356 YISHUN RING RD,,,,,


In [21]:
df['address'] = df['block'] + ' ' + df['street_name']

In [13]:
df[df.address.str.contains('52 LOR 6 TOA PAYOH')]

Unnamed: 0,town,flat_type,flat_model,floor_area_sqm,street_name,resale_price,month,remaining_lease,lease_commence_date,storey_range,block,storey_lower,storey_upper,block_num,storey_ave
2076,TOA PAYOH,2 ROOM,STANDARD,43.0,LOR 6 TOA PAYOH,218000.0,2017-02-01,65 years 04 months,1983-01-01,04 TO 06,52,4,6,52,5.0
2077,TOA PAYOH,2 ROOM,STANDARD,43.0,LOR 6 TOA PAYOH,245000.0,2017-02-01,65 years 04 months,1983-01-01,10 TO 12,52,10,12,52,11.0
3864,TOA PAYOH,2 ROOM,STANDARD,43.0,LOR 6 TOA PAYOH,229000.0,2017-03-01,65 years 03 months,1983-01-01,10 TO 12,52,10,12,52,11.0
3865,TOA PAYOH,2 ROOM,STANDARD,43.0,LOR 6 TOA PAYOH,240000.0,2017-03-01,65 years 03 months,1983-01-01,07 TO 09,52,7,9,52,8.0
5720,TOA PAYOH,2 ROOM,STANDARD,40.0,LOR 6 TOA PAYOH,215000.0,2017-04-01,65 years 02 months,1983-01-01,01 TO 03,52,1,3,52,2.0
16638,TOA PAYOH,2 ROOM,STANDARD,43.0,LOR 6 TOA PAYOH,219000.0,2017-10-01,64 years 09 months,1983-01-01,10 TO 12,52,10,12,52,11.0
18571,TOA PAYOH,2 ROOM,STANDARD,40.0,LOR 6 TOA PAYOH,205000.0,2017-11-01,64 years 07 months,1983-01-01,01 TO 03,52,1,3,52,2.0
20275,TOA PAYOH,2 ROOM,STANDARD,52.0,LOR 6 TOA PAYOH,198000.0,2017-12-01,64 years 06 months,1983-01-01,04 TO 06,52,4,6,52,5.0
21391,TOA PAYOH,2 ROOM,STANDARD,43.0,LOR 6 TOA PAYOH,210000.0,2018-01-01,64 years 06 months,1983-01-01,10 TO 12,52,10,12,52,11.0
63879,TOA PAYOH,2 ROOM,STANDARD,43.0,LOR 6 TOA PAYOH,188000.0,2019-12-01,62 years 06 months,1983-01-01,07 TO 09,52,7,9,52,8.0


In [28]:
df = df.merge(df_coordinates2,on='address',how='left')

In [31]:
df[df.address.str.contains('52 LORONG')]

Unnamed: 0,town,flat_type,flat_model,floor_area_sqm,street_name,resale_price,month,remaining_lease,lease_commence_date,storey_range,...,storey_lower,storey_upper,block_num,storey_ave,address,latitude,longitude,blk_no,road_name,postal_code
