# <center>Geocoding Singapore Planning Area information with longitude and latitude</center>

Once the longitude and latitude of each address have been acquired through geocoding with the Google Maps API, we can look up which Singapore zone (and, if needed, subzone) each point falls in using a GeoJSON file of zone boundaries. The zone information can then be mapped to socio-demographic data from the Department of Statistics Singapore census.
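Under the hood, assigning a point to a zone is a point-in-polygon test. The sketch below is a minimal pure-Python even-odd ray-casting test, included only to show the idea; it assumes a single exterior ring with no holes, and `point_in_ring` and the square "zone" are hypothetical. The notebook itself relies on shapely's `Polygon.contains`, which also handles holes and multipolygons. Note the coordinate order: x is longitude and y is latitude, as in GeoJSON.

```python
def point_in_ring(lng, lat, ring):
    """Return True if (lng, lat) falls inside the ring (even-odd rule).

    `ring` is a list of (lng, lat) vertices; a closed GeoJSON-style ring that
    repeats the first vertex at the end also works. Points lying exactly on
    an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for k in range(n):
        x1, y1 = ring[k]
        x2, y2 = ring[(k + 1) % n]
        # Does a horizontal ray from the point towards +x cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


# Hypothetical square "zone": lng 103.8 to 103.9, lat 1.30 to 1.40
square = [(103.8, 1.30), (103.9, 1.30), (103.9, 1.40), (103.8, 1.40)]
print(point_in_ring(103.85, 1.35, square))  # True
print(point_in_ring(104.00, 1.35, square))  # False
```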

For background on Singapore's planning areas, see: https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore


## Importing packages

In [1]:
import pandas as pd
import numpy as np
import shapely
from shapely.geometry import Point, shape
import json

import warnings
warnings.filterwarnings('ignore')

## Reading the input file with longitude and latitude

In [2]:
# Path to the input file containing longitude and latitude columns
FILE_INPUT = 'C:/Users/liuleo/Documents/KT/Ext_Data/Data/hdb_trx.csv'
df = pd.read_csv(FILE_INPUT)


# Column holding longitude
LNG_COL = 'lng'
# Column holding latitude
LAT_COL = 'lat'
# Column holding the record ID (e.g. customer ID or policy ID); month is used here only as an illustration
ID_COL = 'month'

# Output column for the zone name
ZONE_COL = 'zone_name'
# Output column for the region name
REGION_COL = 'zone_region'

# Initialise the output columns; records that match no zone stay None (written as empty fields)
df[ZONE_COL] = None
df[REGION_COL] = None

In [3]:
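# Optionally test on a reproducible sample first (here up to 500 rows); keep the output columns in the subset
df = df[[ID_COL, LNG_COL, LAT_COL, ZONE_COL, REGION_COL]].sample(n=min(500, len(df)), random_state=42).reset_index(drop=True)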

## Reading the zone GeoJSON file

Here we use the zone GeoJSON file. If subzone information is required, read the subzone GeoJSON file instead.
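The code reads `feature['geometry']` and `feature['properties']['Name']` / `['Region']` from each feature, so the zone file is assumed to be a standard GeoJSON `FeatureCollection` laid out roughly as below. This is a hypothetical, heavily simplified two-zone file (the real `zones_singapore.json` is not shown here); it illustrates the structure and the `[longitude, latitude]` order of GeoJSON positions.

```python
import json

# Hypothetical, simplified zone file in the layout the notebook expects.
toy_geojson = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"Name": "ZONE A", "Region": "REGION 1"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[103.80, 1.30], [103.90, 1.30], [103.90, 1.40],
                         [103.80, 1.40], [103.80, 1.30]]]
      }
    },
    {
      "type": "Feature",
      "properties": {"Name": "ZONE B", "Region": "REGION 2"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[103.90, 1.30], [104.00, 1.30], [104.00, 1.40],
                         [103.90, 1.40], [103.90, 1.30]]]
      }
    }
  ]
}
"""

zone_raw = json.loads(toy_geojson)

# Each feature pairs one geometry with a dictionary of attributes.
for feature in zone_raw["features"]:
    props = feature["properties"]
    geom = feature["geometry"]
    # GeoJSON positions are [longitude, latitude], i.e. (x, y).
    first_lng, first_lat = geom["coordinates"][0][0]
    print(props["Name"], props["Region"], geom["type"], first_lng, first_lat)
```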

In [4]:
FILE_ZONES_GEOJSON = 'C:/Users/liuleo/Documents/KT/Ext_Data/GeoJSON/zones_singapore.json'

with open(FILE_ZONES_GEOJSON, "r") as file:
    zone_raw = json.load(file)

# Build one shapely geometry per zone feature (same order as zone_raw['features'])
polygons = [shape(feature['geometry']) for feature in zone_raw['features']]
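`shape()` turns each GeoJSON geometry dictionary into a shapely object, and the mapping step then calls `polygon.contains(point)`. A subtlety worth knowing: `contains` returns False for a point lying exactly on a polygon's boundary, whereas `covers` (available in current Shapely releases) includes the boundary. The sketch below uses a hypothetical square zone to show both predicates, plus what happens if latitude and longitude are accidentally swapped.

```python
from shapely.geometry import Point, shape

# Hypothetical single-zone geometry in GeoJSON form (lng, lat order).
geometry = {
    "type": "Polygon",
    "coordinates": [[[103.80, 1.30], [103.90, 1.30], [103.90, 1.40],
                     [103.80, 1.40], [103.80, 1.30]]],
}
polygon = shape(geometry)  # GeoJSON dict -> shapely Polygon

interior = Point(103.85, 1.35)  # x = longitude, y = latitude
on_edge = Point(103.90, 1.35)   # exactly on the eastern boundary
swapped = Point(1.35, 103.85)   # latitude and longitude swapped by mistake

print(polygon.contains(interior))  # True
print(polygon.contains(on_edge))   # False: boundary points are not "contained"
print(polygon.covers(on_edge))     # True: covers() counts the boundary
print(polygon.contains(swapped))   # False
```

In the notebook's loop, a point sitting exactly on a shared zone border therefore gets no zone with `contains`; switching to `covers` would assign it to whichever polygon is tested first.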

## Mapping each record's longitude and latitude to a zone and region name

In [5]:
print('Mapping Longitude and Latitude for {} records'.format(len(df)))

for i in range(len(df)):
    
    row = df.iloc[i]
    
    # Skip records with a missing latitude or longitude
    if pd.isnull(row[LAT_COL]) or pd.isnull(row[LNG_COL]):
        continue
    
    # Build the point in (x, y) = (longitude, latitude) order, matching GeoJSON
    point = Point(row[LNG_COL], row[LAT_COL])
    
    # Link the location to the first zone polygon that contains it
    for idx, polygon in enumerate(polygons):
        if polygon.contains(point):
            df.loc[i, ZONE_COL] = zone_raw['features'][idx]['properties']['Name']
            df.loc[i, REGION_COL] = zone_raw['features'][idx]['properties']['Region']
            break
    
    # Report progress every 100 records
    if (i+1) % 100 == 0:
        print('Mapping Finish First {}'.format(i+1))

Mapping Longitude and Latitude for 500 records
Mapping Finish First 100
Mapping Finish First 200
Mapping Finish First 300
Mapping Finish First 400
Mapping Finish First 500
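The loop tests every point against every polygon, so its cost grows with records × zones. One possible speed-up, sketched below under the assumption that the zone file has the same `Name`/`Region` properties, is to prepare each geometry once with `shapely.prepared.prep` and skip polygons whose bounding box cannot contain the point. The feature list and the `lookup_zone` helper are hypothetical; for large tables, geopandas' `sjoin` is a vectorised alternative.

```python
import math

from shapely.geometry import Point, shape
from shapely.prepared import prep

# Hypothetical two-zone feature list in the layout the notebook expects.
features = [
    {"properties": {"Name": "ZONE A", "Region": "REGION 1"},
     "geometry": {"type": "Polygon", "coordinates": [[
         [103.80, 1.30], [103.90, 1.30], [103.90, 1.40], [103.80, 1.40], [103.80, 1.30]]]}},
    {"properties": {"Name": "ZONE B", "Region": "REGION 2"},
     "geometry": {"type": "Polygon", "coordinates": [[
         [103.90, 1.30], [104.00, 1.30], [104.00, 1.40], [103.90, 1.40], [103.90, 1.30]]]}},
]

# Build each geometry once, keep its bounding box (minx, miny, maxx, maxy),
# and prepare it so repeated contains() calls are cheaper.
zones = []
for feature in features:
    geom = shape(feature["geometry"])
    zones.append((geom.bounds, prep(geom), feature["properties"]))


def lookup_zone(lng, lat):
    """Hypothetical helper: return (Name, Region) for a point, or (None, None)."""
    if lng is None or lat is None or math.isnan(lng) or math.isnan(lat):
        return None, None
    point = Point(lng, lat)
    for (minx, miny, maxx, maxy), prepared, props in zones:
        # Cheap rectangle test first; run the exact test only when it passes.
        if minx <= lng <= maxx and miny <= lat <= maxy and prepared.contains(point):
            return props["Name"], props["Region"]
    return None, None


print(lookup_zone(103.85, 1.35))        # ('ZONE A', 'REGION 1')
print(lookup_zone(103.95, 1.35))        # ('ZONE B', 'REGION 2')
print(lookup_zone(103.50, 1.35))        # (None, None)
print(lookup_zone(float("nan"), 1.35))  # (None, None)
```

Applied to the notebook, a helper like this could be called once per `(lng, lat)` pair, e.g. via a list comprehension over `zip(df[LNG_COL], df[LAT_COL])`, instead of the nested loop.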


## Save results to a pipe-delimited text file

In [6]:
OUTPUT_FILE = 'C:/Users/liuleo/Documents/KT/TMP/geozones_results.csv'
df.to_csv(OUTPUT_FILE,sep='|',index=False)
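Because the results are written with `sep='|'`, any downstream reader has to use the same separator. The round trip below is a small sketch using a hypothetical two-row result frame and an in-memory buffer instead of `OUTPUT_FILE`; a record that matched no zone is written as empty fields and read back as NaN.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical mini result frame with the notebook's column layout.
results = pd.DataFrame({
    "month": ["2017-01", "2017-02"],
    "lng": [103.85, 103.50],
    "lat": [1.35, 1.35],
    "zone_name": ["ZONE A", np.nan],     # NaN: point fell outside every zone
    "zone_region": ["REGION 1", np.nan],
})

# Write with the same options as the notebook, but to an in-memory buffer.
buffer = io.StringIO()
results.to_csv(buffer, sep="|", index=False)
text = buffer.getvalue()
print(text)

# Reading it back requires the same separator.
restored = pd.read_csv(io.StringIO(text), sep="|")
print(restored)
```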