## Introduction/Business Problem

The number of immigrants settling in Canada increases steadily every year, in part because of Canada's welcoming immigration system. New arrivals need to settle into their new lives as quickly as possible, which means finding employment, accommodation and other essentials. Given the high cost of living in the major cities, finding the ideal place to settle can be very difficult.

Using the city of Toronto as a case study, this project tries to find the best location for a new immigrant to settle in. It is specifically tailored to help the continuously growing number of immigrants moving to Canada find the area best suited to live in.
We are particularly interested in areas with a high immigrant population, which eases the transition, and in areas with established businesses that can provide much-needed employment.

## Data

The data used to solve this problem includes:
1. Foursquare location data for the city of Toronto, used to explore the neighbourhoods.
2. The Toronto postal code data from Wikipedia. This data includes both the borough and the assigned neighbourhood, and is used as input for the Foursquare API.
    https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M    
3. T

In [130]:
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
from pandas.io.html import read_html #library to read html data
!conda install -c conda-forge geocoder --yes
import geocoder # import geocoder
!conda install -c conda-forge folium=0.5.0 --yes
import folium

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



In [193]:
#Extracting table from Wiki page.
page = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
wikitables = read_html(page, attrs={"class":"wikitable"})
NB_df = wikitables[0]

### Cleaning extracted Data

In [197]:
# Only process the cells that have an assigned borough. Ignore cells whose borough is "Not assigned".
# .copy() makes the filtered frame independent, so adding a column below does not trigger SettingWithCopyWarning.
NB_df = NB_df[NB_df.Borough != "Not assigned"].copy()
# If a cell has a borough but a "Not assigned" neighbourhood, use the borough name as the neighbourhood.
NB_df['temp_column'] = np.where(NB_df['Neighbourhood']=='Not assigned',NB_df['Borough'],NB_df['Neighbourhood'])
# More than one neighbourhood can exist in one postal code area. Neighbourhoods in the same postal code are joined into a single row.
NBH_df = NB_df.groupby(['Postcode','Borough'])['temp_column'].apply(', '.join).reset_index()
# Rename the columns to the names used in the rest of the notebook
NBH_df = NBH_df.rename(columns = {"Postcode": "PostalCode","temp_column":"Neighborhood"})
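
The three cleaning steps above can be checked on a small hand-made table before running them on the scraped data. The sketch below is only an illustration: the rows are invented stand-ins for the Wikipedia table, and the names `raw`, `assigned` and `cleaned` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the scraped table (not the real data).
raw = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Harbourfront",
                      "Regent Park", "Not assigned"],
})

# 1. Keep only rows with an assigned borough (copy so later writes are safe).
assigned = raw[raw["Borough"] != "Not assigned"].copy()

# 2. Fall back to the borough name where the neighbourhood is not assigned.
assigned["Neighbourhood"] = np.where(
    assigned["Neighbourhood"] == "Not assigned",
    assigned["Borough"],
    assigned["Neighbourhood"],
)

# 3. Join neighbourhoods that share a postal code into one row.
cleaned = (
    assigned.groupby(["Postcode", "Borough"])["Neighbourhood"]
    .apply(", ".join)
    .reset_index()
)
print(cleaned)
```

With this toy input the `M1A` row is dropped, the two `M5A` rows collapse into one, and `M7A` takes its borough name as its neighbourhood.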

In [198]:
def get_geocoder(postal_code):
    # initialize the coordinates to None
    Co_ordinates = None
    # loop until the geocoder returns coordinates
    while(Co_ordinates is None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code.strip()))
        Co_ordinates = g.latlng
    # unpack only after a successful lookup; indexing inside the loop raised a TypeError whenever latlng was None
    latitude = Co_ordinates[0]
    longitude = Co_ordinates[1]
    return latitude,longitude
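
The loop above retries forever if the service never answers. A bounded variant might look like the sketch below. It is an assumption-laden illustration, not the notebook's method: `geocode_with_retry`, `max_attempts`, `delay_seconds` and the `flaky_lookup` stand-in are all hypothetical, and the lookup is passed in as a function so the example runs without network access. In the notebook, the lookup would wrap `geocoder.arcgis(...).latlng`.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords[0], coords[1]
        time.sleep(delay_seconds)  # pause before the next attempt
    # Give up explicitly instead of looping forever.
    return None, None

# Hypothetical stand-in for a geocoding service that fails twice, then answers.
calls = {"count": 0}

def flaky_lookup(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else [43.811525, -79.195517]

result = geocode_with_retry(flaky_lookup, "M1B, Toronto, Ontario")
print(result)
```

Returning `(None, None)` after the last attempt keeps the two-column unpacking in the next cell working, and the missing rows can then be found with `isna()`.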

In [199]:
# Add latitude and longitude to the Dataframe
NBH_df['Latitude'], NBH_df['Longitude'] = zip(*NBH_df['PostalCode'].apply(get_geocoder))
# Display first 3 rows
NBH_df.head(3)

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.811525 | -79.195517 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.785665 | -79.158725 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.765815 | -79.175193 |


In [200]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_77d20a3484f3474c97f1ee465d5a4c88 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<YOUR_IBM_API_KEY>',  # credential redacted; supply your own key
    ibm_auth_endpoint="https://iam.eu-gb.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3.eu-geo.objectstorage.service.networklayer.com')

body = client_77d20a3484f3474c97f1ee465d5a4c88.get_object(Bucket='capstoneproject-donotdelete-pr-vkqs5qwll26fby',Key='Community Housing Data.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

# If you are reading an Excel file into a pandas DataFrame, replace `read_csv` by `read_excel` in the next statement.
Housing_data = pd.read_csv(body)
Housing_data.head(3)


Unnamed: 0,_id,OBJECTID,BLD_ID,DEV_ID,DEV_NAME,NGHBRHD_NUM,POLICE_DIV,PSTL_CODE,TTL_RES_UNIT,MRKT_UNIT,...,YR_BUILT,BLD_TYPO,SCATTERED,BLD_FORM,FLR_ABV_GR,BLD_DESC,geometry,LATITUDE,LONGITUDE,POSTAL_CODE
0,1,1,4327,1,O'Connor Drive,43,54,M4A 1A4,2,0,...,1966,,,House Semi-Detached Duplex,2,2 - Storey Semi-Detached House - Duplex,"{u'type': u'Point', u'coordinates': (-79.29980...",43.716338,-79.299807,M4A
1,2,2,4328,1,O'Connor Drive,43,54,M4A 1A4,2,0,...,1966,,,House Semi-Detached Duplex,2,2 - Storey Semi-Detached House - Duplex,"{u'type': u'Point', u'coordinates': (-79.30096...",43.71608,-79.300965,M4A
2,3,3,4329,1,O'Connor Drive,43,54,M4A 1A4,2,0,...,1966,,,House Semi-Detached Duplex,2,2 - Storey Semi-Detached House - Duplex,"{u'type': u'Point', u'coordinates': (-79.30030...",43.716226,-79.300309,M4A


In [201]:
Housing_data.LATITUDE = Housing_data.LATITUDE.astype(float)
Housing_data.LONGITUDE = Housing_data.LONGITUDE.astype(float)
Housing_data.POSTAL_CODE = Housing_data.POSTAL_CODE.astype(str)
NBH_df.PostalCode = NBH_df.PostalCode.astype(str)

In [202]:
# Toronto latitude and longitude values
latitude = 43.6532
longitude = -79.3832

Toronto_map = folium.Map(location=[latitude, longitude], zoom_start=11)  # create the map of Toronto
#Toronto_map  # uncomment to display the map

In [203]:
# instantiate a feature group for the neighbourhoods in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the postal code areas and add each one to the feature group
for lat, lng in zip(NBH_df.Latitude, NBH_df.Longitude):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )
    
    
# add the neighbourhood markers to the map
Toronto_map.add_child(incidents)

In [204]:
from folium import plugins

# start again with a clean copy of the map of Toronto
Toronto_map = folium.Map(location = [latitude, longitude], zoom_start = 11)

# instantiate a marker cluster object for the housing units in the dataframe
incidents = plugins.MarkerCluster().add_to(Toronto_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(Housing_data.LATITUDE, Housing_data.LONGITUDE, Housing_data.POSTAL_CODE):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map
Toronto_map

In [205]:
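# download the Toronto neighbourhoods GeoJSON file
import json # library to handle JSON files
# the original command pointed wget at a local Windows path, which wget cannot fetch; download from the URL used below instead
!wget --quiet https://raw.githubusercontent.com/jasonicarter/toronto-geojson/master/toronto_crs84.geojson -O toronto_crs84.geojson

Toronto_geo = r'https://raw.githubusercontent.com/jasonicarter/toronto-geojson/master/toronto_crs84.geojson' # geojson file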

print('GeoJSON file downloaded!')

GeoJSON file downloaded!


In [208]:
Housing_geo_data = Housing_data.groupby('POSTAL_CODE').count()[['OBJECTID']]

Housing_geo_data = Housing_geo_data.reset_index()

# for sake of consistency, let's also make all column labels of type string
Housing_geo_data.columns = list(map(str, Housing_geo_data.columns))

# let's rename the columns so that they make sense
Housing_geo_data.rename(columns={'POSTAL_CODE':'neighbourhoods', 'OBJECTID':'Housing_units'}, inplace=True)
# sorting
Housing_geo_data = Housing_geo_data.sort_values('Housing_units', ascending=False)

s = Housing_geo_data.pop('neighbourhoods').map(NBH_df.set_index('PostalCode')['Neighborhood'])
Housing_geo_data = Housing_geo_data.assign(neighbourhoods = s)

#m = NBH_df.set_index('PostalCode')['Borough'].to_dict()
#v = Housing_geo_data.filter(like='POSTAL_CODE')
#Housing_geo_data[v.columns] = v.replace(m)

Housing_geo_data.head(15)

| | Housing_units | neighbourhoods |
|---|---|---|
| 0 | 228 | Rouge, Malvern |
| 53 | 155 | Lawrence Heights, Lawrence Manor |
| 78 | 137 | Albion Gardens, Beaumond Heights, Humbergate, ... |
| 34 | 105 | The Beaches West, India Bazaar |
| 42 | 103 | Harbourfront, Regent Park |
| 26 | 87 | Downsview Northwest |
| 35 | 77 | Studio District |
| 51 | 72 | Chinatown, Grange Park, Kensington Market |
| 21 | 52 | Flemingdon Park, Don Mills South |
| 32 | 50 | East Toronto |
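
The count-then-map step above silently yields `NaN` for any postal code missing from the lookup table. The sketch below shows one way to spot such rows. It uses invented toy data, and the names `housing`, `lookup`, `counts` and `unmatched` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for Housing_data and NBH_df (not the real data).
housing = pd.DataFrame({
    "POSTAL_CODE": ["M1B", "M1B", "M4A", "M9Z"],
    "OBJECTID": [1, 2, 3, 4],
})
lookup = pd.DataFrame({
    "PostalCode": ["M1B", "M4A"],
    "Neighborhood": ["Rouge, Malvern", "Victoria Village"],
})

# Count housing records per postal code.
counts = (
    housing.groupby("POSTAL_CODE")["OBJECTID"].count()
    .rename("Housing_units")
    .reset_index()
)

# Translate postal codes to neighbourhood names; unknown codes become NaN.
counts["neighbourhoods"] = counts["POSTAL_CODE"].map(
    lookup.set_index("PostalCode")["Neighborhood"]
)

# Rows whose postal code had no match in the lookup table.
unmatched = counts[counts["neighbourhoods"].isna()]
print(unmatched)
```

Here `M9Z` is deliberately absent from the lookup, so it is the only row reported as unmatched.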


In [175]:
# start again with a clean copy of the map of Toronto, this time with terrain tiles
Toronto_map = folium.Map(location = [latitude, longitude], zoom_start = 11, tiles ='Stamen Terrain')
Toronto_map

In [188]:
# reuse the terrain map created above; uncomment the next line to start from a clean map instead
#Toronto_map = folium.Map(location = [latitude, longitude], zoom_start = 12)

# generate a choropleth map of the number of community housing units per neighbourhood
Toronto_map.choropleth(
    geo_data=Toronto_geo,
    data=Housing_geo_data,
    columns=['neighbourhoods', 'Housing_units'],
    key_on='feature.properties.name',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Housing availability in Toronto'
)
folium.LayerControl().add_to(Toronto_map)

# display map
Toronto_map
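
A choropleth joins each data row to a GeoJSON feature by comparing the first entry of `columns` with the property named in `key_on`, and rows without a matching feature are not shaded. A quick check for such mismatches could look like the sketch below. The GeoJSON here is a made-up two-feature example, and `unmatched_keys` is a hypothetical helper. Whether the real Toronto file actually exposes a `name` property is not confirmed by this notebook and is worth verifying the same way.

```python
# Made-up GeoJSON with the overall shape of a FeatureCollection.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Rouge, Malvern"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Studio District"}, "geometry": None},
    ],
}

# Keys as they would appear in the first column passed to the choropleth.
data_keys = ["Rouge, Malvern", "East Toronto"]

def unmatched_keys(geojson, data_keys, prop="name"):
    """Return the data keys that match no feature's `prop` property."""
    feature_names = {f["properties"].get(prop) for f in geojson["features"]}
    return sorted(k for k in data_keys if k not in feature_names)

missing = unmatched_keys(geojson, data_keys)
print(missing)
```

In this toy case only `East Toronto` has no matching feature, so it would be left unshaded on the map.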