# Capstone Project - Carol Sutton
#### 311 provides residents, businesses and visitors with easy access to non-emergency City services, programs and information 24 hours a day, seven days a week. 311 can offer assistance in more than 180 languages. The City of Toronto has been made aware that some of its residential areas (namely those near Downtown Toronto) may have hazardous materials buried beneath them. As a service to residents, 311 is offering to identify areas that are similar to the ones they currently live in (obviously without the buried hazardous material). Subsequent relocation would be free for those residents whose housing is paid for by Toronto. Other residents needing to be relocated will have their expenses subsidised.

#### This analysis is for Etobicoke

### Set up section
#### Import libraries required for the activities

In [70]:
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes

import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 

from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs # public import path (sklearn.datasets.samples_generator was deprecated and later removed)

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
!conda install -c conda-forge folium=0.5.0 --yes 

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

GeoLocator = Nominatim(user_agent='My-IBMNotebook')

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium 
from urllib import request
import urllib.request
import time
from bs4 import BeautifulSoup

import bs4 as bs

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


### Scraping the web
#### This is the code to scrape the various web pages

#### https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
#### I used Python BeautifulSoup with the lxml parser

In [71]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
rawpage = request.urlopen(url)

### This is to parse the table using BeautifulSoup with the lxml parser

In [72]:
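def scrape_table_bs4(cname,cols):
    # Download the page (uses the module-level `url`) and parse it with the lxml parser
    page = urllib.request.urlopen(url).read()
    soup = bs.BeautifulSoup(page,'lxml')
    # Locate the first table with the given CSS class
    table = soup.find('table',class_=cname)
    # Column names come from the <th> cells; each data row holds the first text node of its <td> cells
    header = [head.findAll(text=True)[0].strip() for head in table.find_all("th")]
    data = [[td.findAll(text=True)[0].strip() for td in tr.find_all("td")] for tr in table.find_all("tr")]
    # Keep only complete rows (this also drops the header row, which has no <td> cells)
    data = [row for row in data if len(row) == cols]
    # This is to store the data temporarily
    temp_df = pd.DataFrame(data, columns=header)
    return temp_df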

### This is to test the work in Beautiful Soup

In [73]:
raw_Postcodes = scrape_table_bs4("wikitable",3)

### This is to inspect the scraped data

In [74]:
print ("Postcodes")
print(raw_Postcodes.info(verbose = True))

Postcodes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287 entries, 0 to 286
Data columns (total 3 columns):
Postcode         287 non-null object
Borough          287 non-null object
Neighbourhood    287 non-null object
dtypes: object(3)
memory usage: 6.8+ KB
None


### Assumptions

#### The dataframe will consist of three columns: Postcode, Borough, and Neighbourhood

#### I will only process the cells that have an assigned borough. I will ignore cells with a borough that is Not assigned.

#### More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated by a comma.

#### If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. For example the 9th cell in
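
#### The three rules above can be sketched on a few toy rows. This is only an illustration under stated assumptions: the rows are made up (in the same three-column layout as the scraped table) and the names `raw` and `cleaned` are not used elsewhere in this notebook.

```python
import pandas as pd

# Toy rows, made up for illustration, in the same layout as the scraped table.
raw = pd.DataFrame({
    'Postcode': ['M1A', 'M5A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})

# Rule 1: drop rows that have no borough.
cleaned = raw[raw['Borough'] != 'Not assigned'].copy()

# Rule 2: where the neighbourhood is missing, fall back to the borough name.
missing = cleaned['Neighbourhood'] == 'Not assigned'
cleaned.loc[missing, 'Neighbourhood'] = cleaned.loc[missing, 'Borough']

# Rule 3: one row per postcode, neighbourhoods joined with a comma.
cleaned = (cleaned.groupby(['Postcode', 'Borough'])['Neighbourhood']
                  .apply(', '.join)
                  .reset_index())
print(cleaned)
```

#### Applying the fallback before the groupby matters: otherwise the literal text Not assigned would end up inside a joined neighbourhood string.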

In [75]:
Postcodes = raw_Postcodes[~raw_Postcodes['Borough'].isin(['Not assigned'])]
                          
Postcodes=Postcodes.sort_values(by=['Postcode', 'Borough', 'Neighbourhood'], ascending =[1,1,1]).reset_index(drop=True)

In [76]:
Postcodes.loc[Postcodes['Neighbourhood'] == 'Not assigned', ['Neighbourhood']]=Postcodes['Borough']

check_unassigned_post_state_sample = Postcodes.loc[Postcodes['Borough'] =='Queen\'s Park']

In [77]:
Postcodes = Postcodes.groupby(['Postcode','Borough'])['Neighbourhood'].apply(', '.join).reset_index()

#### List of Postal Codes in Toronto Canada (starting M...)

In [78]:
Postcodes

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Highland Creek, Port Union, Rouge Hill"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [79]:
Postcodes.shape

(103, 3)

## Neighbourhood Coordinates

#### In order to utilize the Foursquare location data, I need the latitude and the longitude coordinates of each neighborhood. The dataframe built above holds the postal code of each neighborhood along with the borough name and neighborhood name, so the coordinates can be looked up by postal code.

#### I chose to use the provided csv file  - http://cocl.us/Geospatial_data

In [80]:
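lat_longcsv = 'http://cocl.us/Geospatial_data'
# -O (capital) names the downloaded file; lowercase -o would write wget's log there instead.
# {lat_longcsv} lets IPython substitute the Python variable into the shell command.
!wget -q -O 'Geospatial_coordinates.csv' {lat_longcsv}
geopostcode_data=pd.read_csv(lat_longcsv).set_index('Postal Code')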
geopostcode_data.head()

Unnamed: 0_level_0,Latitude,Longitude
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476


In [81]:
Postcodes.to_csv('postcode1_df.csv',index=False)

postcode_csv = 'postcode1_df.csv'

postcodes1 = pd.read_csv(postcode_csv).set_index('Postcode')
postcodes1.rename_axis('Postal Code', axis = 'index', inplace = True)
postcodes1.head()

Unnamed: 0_level_0,Borough,Neighbourhood
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,Scarborough,"Malvern, Rouge"
M1C,Scarborough,"Highland Creek, Port Union, Rouge Hill"
M1E,Scarborough,"Guildwood, Morningside, West Hill"
M1G,Scarborough,Woburn
M1H,Scarborough,Cedarbrae


### Combine the two sets of data

In [82]:
Combined_data = postcodes1.join( geopostcode_data)
Combined_data.head()

Unnamed: 0_level_0,Borough,Neighbourhood,Latitude,Longitude
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
M1C,Scarborough,"Highland Creek, Port Union, Rouge Hill",43.784535,-79.160497
M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
M1G,Scarborough,Woburn,43.770992,-79.216917
M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [83]:
Combined_data.shape

(103, 4)
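
#### The join above matches rows on the shared Postal Code index, and it is a left join, so a postcode with no entry in the coordinates file would silently get empty coordinates. A minimal sketch of how one might check for that, using made-up toy frames (`M9Z` is a hypothetical postcode, and `postcodes_demo`, `coords_demo` and `unmatched` are illustrative names only):

```python
import pandas as pd

# Toy stand-ins for postcodes1 and geopostcode_data, both indexed by 'Postal Code'.
postcodes_demo = pd.DataFrame(
    {'Borough': ['Scarborough', 'Etobicoke']},
    index=pd.Index(['M1B', 'M9Z'], name='Postal Code'))
coords_demo = pd.DataFrame(
    {'Latitude': [43.806686], 'Longitude': [-79.194353]},
    index=pd.Index(['M1B'], name='Postal Code'))

# DataFrame.join is a left join on the index by default.
joined = postcodes_demo.join(coords_demo)

# Postcodes that did not find coordinates show up as NaN in Latitude.
unmatched = joined[joined['Latitude'].isna()].index.tolist()
print(unmatched)
```

#### An empty list here would confirm that every postcode was matched.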

## Exploring and clustering the neighbourhoods in Etobicoke

#### To explore the neighbourhoods of the selected area I will use the Foursquare API.



### Use geopy to get the lat/long values of Toronto, Canada (used to centre the maps)

In [84]:
address = 'Toronto, Ontario Canada'

geolocator = Nominatim(user_agent='My-IBMNotebook') # an explicit user_agent avoids geopy's default-user-agent warning
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Canada are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Canada are 43.653963, -79.387207.


### Create a map of Toronto with the neighbourhoods superimposed

In [85]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)


for lat, lng, borough, neighborhood in zip(Combined_data['Latitude'], Combined_data['Longitude'], Combined_data['Borough'], Combined_data['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5,
        parse_html=False).add_to(map_toronto)


In [86]:
map_toronto

## Now I will apply the same analysis to the Etobicoke area (as I did to Downtown Toronto) to start the assessment

### Assumption
#### For the purpose of the exercise I will work with only boroughs that contain the word Etobicoke and then replicate the same analysis that I did with the New York City data.

In [87]:
Etob_data = Combined_data[Combined_data['Borough'].str.contains("Etobicoke")].reset_index(drop=True)
print(Etob_data.shape)
Etob_data.head()

(11, 4)


Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Etobicoke,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321
1,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484
2,Etobicoke,"Montgomery Road, Old Mill North, The Kingsway",43.653654,-79.506944
3,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So...",43.636258,-79.498509
4,Etobicoke,"Kingsway Park South West, Mimico NW, Royal Yor...",43.628841,-79.520999


#### I will now recreate the map with markers on it for the Etobicoke neighbourhoods

In [88]:

map_Et = folium.Map(location=[latitude, longitude], zoom_start=11)


for lat, lng, label in zip(Etob_data['Latitude'], Etob_data['Longitude'], Etob_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Et)  
    
map_Et

### Now using the Foursquare API to explore and segment neighborhoods

In [89]:
CLIENT_ID = 'DWE403I3DYSRFXV4VDIAQOSUD1IMFKWNV4LMVNQWSR5CZMDV' # your Foursquare ID
CLIENT_SECRET = 'UND1K2GR13ZF5ZUYY45MAQINZRAGY4IJ2EXBINYW0FAOPGGI' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: DWE403I3DYSRFXV4VDIAQOSUD1IMFKWNV4LMVNQWSR5CZMDV
CLIENT_SECRET:UND1K2GR13ZF5ZUYY45MAQINZRAGY4IJ2EXBINYW0FAOPGGI
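
#### Hard-coding and printing the credentials as above exposes them to anyone who can read the notebook. A minimal sketch of one alternative, assuming the values are supplied through environment variables: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=None):
    """Read the credentials from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    client_id = env.get('FOURSQUARE_CLIENT_ID', '')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET', '')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret

def mask(value, visible=4):
    """Show only the first few characters so the full secret never reaches the output."""
    return value[:visible] + '*' * max(len(value) - visible, 0)

# Demo with placeholder values instead of the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'EXAMPLE-ID-1234',
            'FOURSQUARE_CLIENT_SECRET': 'EXAMPLE-SECRET-5678'}
cid, secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(cid))
print('CLIENT_SECRET: ' + mask(secret))
```

#### Called without an argument, `load_foursquare_credentials()` would read from `os.environ`, so the notebook itself never contains the secret.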


### To explore the neighbourhoods in Etobicoke
#### I will use the same kind of query as for the NY exercise, sent to the explore endpoint
#### https://api.foursquare.com/v2/venues/explore?client_id=CLIENT_ID&client_secret=CLIENT_SECRET&v=VERSION&ll=LATITUDE,LONGITUDE&radius=RADIUS&limit=LIMIT
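
#### The request URL can also be assembled from a dictionary of parameters rather than one long format string. This is a sketch only: `build_explore_url` is a hypothetical helper, `DEMO_ID` and `DEMO_SECRET` are placeholders, and the parameter names are the ones that appear in the query shown in this notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=30):
    """Assemble the explore request URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes reserved characters (the comma in ll becomes %2C).
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

demo_url = build_explore_url('DEMO_ID', 'DEMO_SECRET', '20180604', 43.605647, -79.501321)
print(demo_url)
```

#### Keeping the parameters in a dictionary makes it harder to misplace a separator and easier to add or drop a parameter later.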

In [90]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
      
    
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
     
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
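
#### The function above indexes straight into `['response']['groups'][0]['items']` and `['categories'][0]`, so an empty result or a venue without a category would raise an error. A minimal, more defensive sketch of the same extraction follows. `extract_venues` is a hypothetical helper, the payload shape is assumed from the keys used above, the second demo venue is made up, and `'Uncategorised'` is an arbitrary fallback label.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing pieces."""
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get('items', []):
        venue = item.get('venue', {})
        location = venue.get('location', {})
        # An empty or missing category list falls back to a placeholder label.
        categories = venue.get('categories') or [{}]
        rows.append((venue.get('name'),
                     location.get('lat'),
                     location.get('lng'),
                     categories[0].get('name', 'Uncategorised')))
    return rows

demo_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'LCBO',
               'location': {'lat': 43.602281, 'lng': -79.499302},
               'categories': [{'name': 'Liquor Store'}]}},
    {'venue': {'name': 'Unnamed venue',
               'location': {'lat': 43.6, 'lng': -79.5},
               'categories': []}},
]}]}}

print(extract_venues(demo_payload))
print(extract_venues({}))
```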

In [91]:
Combined_data = Etob_data
Etob_venues = getNearbyVenues(names=Combined_data['Neighbourhood'],
                                   latitudes=Combined_data['Latitude'],
                                   longitudes=Combined_data['Longitude'])

Humber Bay Shores, Mimico South, New Toronto
Alderwood, Long Branch
Montgomery Road, Old Mill North, The Kingsway
Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, Royal York South East, Sunnylea, The Queensway East
Kingsway Park South West, Mimico NW, Royal York South West, South of Bloor, The Queensway West
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Westmount
Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips
Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown
Northwest


In [92]:
Combined_data.shape

(11, 4)

In [93]:
Etob_venues.head()

Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,LCBO,43.602281,-79.499302,Liquor Store
1,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,Domino's Pizza,43.601676,-79.500908,Pizza Place
2,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,New Toronto Fish & Chips,43.601849,-79.503281,Restaurant
3,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,Delicia Bakery & Pastry,43.601403,-79.503012,Bakery
4,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,Lucky Dice Restaurant,43.601392,-79.503056,Café


In [94]:
Etob_venues.shape

(72, 7)

### Noting that 72 venues have been returned, let's check to see how many venues are in each neighbourhood

In [95]:
Etob_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",9,9,9,9,9,9
"Alderwood, Long Branch",8,8,8,8,8,8
"Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe",8,8,8,8,8,8
"Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park",2,2,2,2,2,2
"Humber Bay Shores, Mimico South, New Toronto",13,13,13,13,13,13
"Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, Royal York South East, Sunnylea, The Queensway East",2,2,2,2,2,2
"Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips",4,4,4,4,4,4
"Kingsway Park South West, Mimico NW, Royal York South West, South of Bloor, The Queensway West",14,14,14,14,14,14
"Montgomery Road, Old Mill North, The Kingsway",2,2,2,2,2,2
Northwest,2,2,2,2,2,2


### Checking on unique categories in each area

In [96]:
print('{} unique venue categories have been found.'.format(len(Etob_venues['Venue Category'].unique())))

40 unique venue categories have been found.


## Analysing each neighbourhood
### Using One Hot encoding
#### One-hot encoding converts a categorical variable into one 0/1 indicator column per category, a form that ML algorithms can work with
#### The Neighbourhood column is then moved to the front to tidy up the presentation of the data
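
#### A small sketch of what one-hot encoding does, on three made-up venues (the frame `demo` and the neighbourhood names `A` and `B` are illustrative only):

```python
import pandas as pd

# Three made-up venues in two made-up neighbourhoods.
demo = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'B'],
    'Venue Category': ['Café', 'Park', 'Park'],
})

# One indicator column per distinct category; each row has exactly one flag set.
one_hot = pd.get_dummies(demo['Venue Category'])

# Put the neighbourhood name in front, as the notebook does by reordering columns.
one_hot.insert(0, 'Neighbourhood', demo['Neighbourhood'])
print(one_hot)
```

#### Depending on the pandas version the indicator columns print as 0/1 or as False/True; either way each venue row flags exactly one category.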

In [97]:
venues_oh = pd.get_dummies(Etob_venues['Venue Category'])


venues_oh['Neighbourhood'] = Etob_venues['Neighbourhood'] 


fixed_columns = [venues_oh.columns[-1]] + list(venues_oh.columns[:-1])
venues_oh =venues_oh[fixed_columns]

venues_oh.head()

Unnamed: 0,Neighbourhood,American Restaurant,Bakery,Baseball Field,Beer Store,Burger Joint,Burrito Place,Bus Line,Café,Chinese Restaurant,Coffee Shop,Convenience Store,Discount Store,Drugstore,Fast Food Restaurant,Filipino Restaurant,Flower Shop,Fried Chicken Joint,Golf Course,Grocery Store,Gym,Hardware Store,Intersection,Japanese Restaurant,Liquor Store,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pharmacy,Pizza Place,Pool,Pub,Rental Car Location,Restaurant,River,Sandwich Place,Shopping Plaza,Skating Rink,Supplement Shop,Tanning Salon,Wings Joint
0,"Humber Bay Shores, Mimico South, New Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Humber Bay Shores, Mimico South, New Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
2,"Humber Bay Shores, Mimico South, New Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
3,"Humber Bay Shores, Mimico South, New Toronto",0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Humber Bay Shores, Mimico South, New Toronto",0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### Just confirm that no venues have been dropped (check figure is 72)

In [98]:
venues_oh.shape

(72, 41)

### Let's work out the mean of each one-hot column per neighbourhood, i.e. the share of each venue type, to see which neighbourhoods suit me the best

In [99]:
neighbourhoodsE_grouped = venues_oh.groupby('Neighbourhood').mean().reset_index()

neighbourhoodsE_grouped

Unnamed: 0,Neighbourhood,American Restaurant,Bakery,Baseball Field,Beer Store,Burger Joint,Burrito Place,Bus Line,Café,Chinese Restaurant,Coffee Shop,Convenience Store,Discount Store,Drugstore,Fast Food Restaurant,Filipino Restaurant,Flower Shop,Fried Chicken Joint,Golf Course,Grocery Store,Gym,Hardware Store,Intersection,Japanese Restaurant,Liquor Store,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pharmacy,Pizza Place,Pool,Pub,Rental Car Location,Restaurant,River,Sandwich Place,Shopping Plaza,Skating Rink,Supplement Shop,Tanning Salon,Wings Joint
0,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.25,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0
2,"Bloordale Gardens, Eringate, Markland Wood, Ol...",0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0
3,"Cloverdale, Islington, Martin Grove, Princess ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Humber Bay Shores, Mimico South, New Toronto",0.076923,0.076923,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.076923,0.076923,0.0,0.0,0.0,0.076923,0.0,0.076923,0.0,0.0,0.0,0.0,0.0
5,"Humber Bay, King's Mill Park, Kingsway Park So...",0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Kingsview Village, Martin Grove Gardens, Richv...",0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Kingsway Park South West, Mimico NW, Royal Yor...",0.0,0.071429,0.0,0.0,0.071429,0.071429,0.0,0.0,0.0,0.0,0.071429,0.071429,0.0,0.071429,0.0,0.071429,0.0,0.0,0.071429,0.071429,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.071429,0.071429
8,"Montgomery Road, Old Mill North, The Kingsway",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0
9,Northwest,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
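
#### Because every venue row contains a single 1, the per-neighbourhood mean of a one-hot column is simply the share of that neighbourhood's venues falling in that category, and each row of the grouped table sums to 1. A sketch on made-up data that cross-checks this against `pd.crosstab` with row normalisation (the names `venues`, `freq` and `shares` are illustrative):

```python
import numpy as np
import pandas as pd

# Made-up venues: neighbourhood A has 4 venues, B has 2.
venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Pizza Place', 'Pizza Place', 'Gym', 'Park', 'Park', 'River'],
})

# Route 1: one-hot encode, then average per neighbourhood (as in this notebook).
one_hot = pd.get_dummies(venues['Venue Category']).astype(float)
one_hot.insert(0, 'Neighbourhood', venues['Neighbourhood'])
freq = one_hot.groupby('Neighbourhood').mean()

# Route 2: count category occurrences and normalise each row to sum to 1.
shares = pd.crosstab(venues['Neighbourhood'], venues['Venue Category'], normalize='index')

print(freq)
print(np.allclose(freq.values, shares.values))
```

#### Here 2 of the 4 venues in A are pizza places, so its Pizza Place share is 0.5.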


### Check for new size in case items get dropped in the future

In [100]:
neighbourhoodsE_grouped.shape

(11, 41)

### Print the five most frequent venue types for each neighbourhood

In [101]:
num_top = 5
for neigh in neighbourhoodsE_grouped['Neighbourhood']:
    print(neigh)
    temp = neighbourhoodsE_grouped[neighbourhoodsE_grouped['Neighbourhood'] == neigh].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top))
    print('\n')

Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown
                 venue  freq
0       Discount Store  0.11
1       Sandwich Place  0.11
2        Grocery Store  0.11
3             Pharmacy  0.11
4  Fried Chicken Joint  0.11


Alderwood, Long Branch
            venue  freq
0     Pizza Place  0.25
1             Gym  0.12
2    Skating Rink  0.12
3  Sandwich Place  0.12
4     Coffee Shop  0.12


Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
            venue  freq
0    Liquor Store  0.12
1      Beer Store  0.12
2  Shopping Plaza  0.12
3            Café  0.12
4        Pharmacy  0.12


Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
                 venue  freq
0          Golf Course   0.5
1  Filipino Restaurant   0.5
2  American Restaurant   0.0
3                 Pool   0.0
4         Liquor Store   0.0


Humber Bay Shores, Mimico South, New Toronto
                 venue  freq
0               

### Interesting but not very easy to understand
### Let's put them in descending order so I can see the top 20

In [102]:
def return_most_common_venues(row, num_top):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top]

In [103]:
num_top = 20

indicators = ['st', 'nd', 'rd']

columns = ['Neighbourhood']
for ind in np.arange(num_top):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))


nvs = pd.DataFrame(columns=columns)
nvs['Neighbourhood'] = neighbourhoodsE_grouped['Neighbourhood']

for ind in np.arange(neighbourhoodsE_grouped.shape[0]):
    nvs.iloc[ind, 1:] = return_most_common_venues(neighbourhoodsE_grouped.iloc[ind, :], num_top)

nvs.shape

(11, 21)
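
#### The column labels above come from a three-item suffix list with a fallback to 'th', which is correct for 1 to 20 but would label position 21 as '21th'. A minimal sketch of a general suffix rule, in case `num_top` is ever raised (`ordinal` is a hypothetical helper, not part of this notebook):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th and 13th (and the rest of the teens) always take 'th'.
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighbourhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 21)]
print(columns[:4])
print(ordinal(21), ordinal(112))
```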

In [104]:
nvs.head(20)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,"Albion Gardens, Beaumond Heights, Humbergate, ...",Pizza Place,Fried Chicken Joint,Fast Food Restaurant,Japanese Restaurant,Discount Store,Pharmacy,Grocery Store,Sandwich Place,Beer Store,Burrito Place,Bus Line,Café,Chinese Restaurant,Burger Joint,Coffee Shop,Convenience Store,Drugstore,Baseball Field,Bakery,Filipino Restaurant
1,"Alderwood, Long Branch",Pizza Place,Sandwich Place,Coffee Shop,Pharmacy,Pub,Gym,Skating Rink,Beer Store,Burger Joint,Burrito Place,Baseball Field,Bus Line,Fried Chicken Joint,Café,Chinese Restaurant,Bakery,Convenience Store,Discount Store,Drugstore,Fast Food Restaurant
2,"Bloordale Gardens, Eringate, Markland Wood, Ol...",Pizza Place,Coffee Shop,Shopping Plaza,Beer Store,Liquor Store,Café,Convenience Store,Pharmacy,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Drugstore,Discount Store,Wings Joint,Golf Course,Chinese Restaurant,Bus Line,Burrito Place,Burger Joint,Baseball Field
3,"Cloverdale, Islington, Martin Grove, Princess ...",Golf Course,Filipino Restaurant,Wings Joint,Tanning Salon,Fried Chicken Joint,Flower Shop,Fast Food Restaurant,Drugstore,Discount Store,Convenience Store,Coffee Shop,Chinese Restaurant,Café,Bus Line,Burrito Place,Burger Joint,Beer Store,Baseball Field,Bakery,Grocery Store
4,"Humber Bay Shores, Mimico South, New Toronto",Café,Gym,Pizza Place,Bakery,Coffee Shop,Fast Food Restaurant,Fried Chicken Joint,Liquor Store,Pharmacy,American Restaurant,Sandwich Place,Restaurant,Pool,Supplement Shop,Baseball Field,Beer Store,Burger Joint,Burrito Place,Bus Line,Skating Rink
5,"Humber Bay, King's Mill Park, Kingsway Park So...",Pool,Baseball Field,Coffee Shop,Fried Chicken Joint,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Drugstore,Discount Store,Convenience Store,Wings Joint,Grocery Store,Chinese Restaurant,Café,Bus Line,Burrito Place,Burger Joint,Beer Store,Bakery,Golf Course
6,"Kingsview Village, Martin Grove Gardens, Richv...",Pizza Place,Bus Line,Mobile Phone Shop,Park,Wings Joint,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Drugstore,Discount Store,Convenience Store,Coffee Shop,Golf Course,Chinese Restaurant,Café,Burrito Place,Burger Joint,Beer Store,Baseball Field,Bakery
7,"Kingsway Park South West, Mimico NW, Royal Yor...",Wings Joint,Hardware Store,Bakery,Burger Joint,Burrito Place,Convenience Store,Discount Store,Fast Food Restaurant,Flower Shop,Grocery Store,Tanning Salon,Gym,Supplement Shop,Sandwich Place,Pharmacy,Park,Baseball Field,Beer Store,Skating Rink,Shopping Plaza
8,"Montgomery Road, Old Mill North, The Kingsway",River,Park,Wings Joint,Coffee Shop,Fried Chicken Joint,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Drugstore,Discount Store,Convenience Store,Chinese Restaurant,Grocery Store,Café,Bus Line,Burrito Place,Burger Joint,Beer Store,Baseball Field,Bakery
9,Northwest,Drugstore,Rental Car Location,Wings Joint,Coffee Shop,Fried Chicken Joint,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Discount Store,Convenience Store,Chinese Restaurant,Grocery Store,Café,Bus Line,Burrito Place,Burger Joint,Beer Store,Baseball Field,Bakery,Golf Course


### Looks like Albion Gardens, Alderwood and Bloordale Gardens are good areas for me.
### Let's see what other neighbourhoods are in that area.
## Clustering the Neighbourhoods
### Using the k-means technique to see which other areas have a similar mix of venues
### Given that there are only 11 neighbourhood groups in Etobicoke, I'll make 5 clusters

In [105]:
kclusters = 5
neighbourhoodclusteringE = neighbourhoodsE_grouped.drop('Neighbourhood', axis=1)
kmeans = KMeans(n_clusters=kclusters, random_state = 1).fit(neighbourhoodclusteringE)
print(kmeans.labels_[0:12])
print(len(kmeans.labels_))

[1 1 1 2 1 0 1 1 4 3 1]
11
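
#### k-means groups rows whose venue-share profiles are close to each other; the label numbers themselves are arbitrary, and only which rows share a label matters. A toy sketch with four made-up profiles (the array `profiles` is illustrative, and `n_init=10` is set explicitly here as an assumption so the result does not depend on the library default):

```python
import numpy as np
from sklearn.cluster import KMeans

# Four made-up venue-share profiles: rows 0-1 are alike, rows 2-3 are alike.
profiles = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.4, 0.6],
])

model = KMeans(n_clusters=2, n_init=10, random_state=1).fit(profiles)
labels = model.labels_

print(labels)          # one label per row, in the same order as the input rows
print(model.inertia_)  # within-cluster sum of squared distances
```

#### The labels come back in the row order of the array that was fitted, which is worth keeping in mind when attaching them to another table.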


#### Confirm data set to be used going forward

In [106]:
Combined_data.shape

(11, 4)

In [107]:
# kmeans.labels_ follows the row order of neighbourhoodsE_grouped (alphabetical),
# not the order of Combined_data, so attach the labels by neighbourhood name.
cluster_labels = pd.Series(kmeans.labels_, index=neighbourhoodsE_grouped['Neighbourhood'], name='Cluster Labels')

finalised_data = Combined_data.join(cluster_labels, on='Neighbourhood')

finalised_data = finalised_data.join(nvs.set_index('Neighbourhood'), on='Neighbourhood')

finalised_data.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Etobicoke,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,1,Café,Gym,Pizza Place,Bakery,Coffee Shop,Fast Food Restaurant,Fried Chicken Joint,Liquor Store,Pharmacy,American Restaurant,Sandwich Place,Restaurant,Pool,Supplement Shop,Baseball Field,Beer Store,Burger Joint,Burrito Place,Bus Line,Skating Rink
1,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,1,Pizza Place,Sandwich Place,Coffee Shop,Pharmacy,Pub,Gym,Skating Rink,Beer Store,Burger Joint,Burrito Place,Baseball Field,Bus Line,Fried Chicken Joint,Café,Chinese Restaurant,Bakery,Convenience Store,Discount Store,Drugstore,Fast Food Restaurant
2,Etobicoke,"Montgomery Road, Old Mill North, The Kingsway",43.653654,-79.506944,4,River,Park,Wings Joint,Coffee Shop,Fried Chicken Joint,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Drugstore,Discount Store,Convenience Store,Chinese Restaurant,Grocery Store,Café,Bus Line,Burrito Place,Burger Joint,Beer Store,Baseball Field,Bakery
3,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So...",43.636258,-79.498509,0,Pool,Baseball Field,Coffee Shop,Fried Chicken Joint,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Drugstore,Discount Store,Convenience Store,Wings Joint,Grocery Store,Chinese Restaurant,Café,Bus Line,Burrito Place,Burger Joint,Beer Store,Bakery,Golf Course
4,Etobicoke,"Kingsway Park South West, Mimico NW, Royal Yor...",43.628841,-79.520999,1,Wings Joint,Hardware Store,Bakery,Burger Joint,Burrito Place,Convenience Store,Discount Store,Fast Food Restaurant,Flower Shop,Grocery Store,Tanning Salon,Gym,Supplement Shop,Sandwich Place,Pharmacy,Park,Baseball Field,Beer Store,Skating Rink,Shopping Plaza


### Let's now see what this looks like on a map so I can see where things are

In [108]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)


x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


markers_colors = []
for lat, lon, poi, cluster in zip(finalised_data['Latitude'], finalised_data['Longitude'], finalised_data['Neighbourhood'], finalised_data ['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters