# The Battle of Neighborhoods

## Introduction

In an era of globalization, people move frequently in search of a better life and better work opportunities. A move can be stressful when it is to a completely unfamiliar country or city, and everyone has preferences that may not match anyone else's. Take my brother, for example. He lives in India and works for a major US-based IT firm. His company has won a large project for a client based in Toronto, Canada, and wants him to relocate there.

This is the first time my brother is moving outside India, and he is unsure which neighborhood to live in. After almost a month of research, he asked me to help him choose, or at least recommend, the neighborhood that would suit him best.

When I asked about his preferences, he gave me the following selection criteria:
 
 - Safe Neighborhood
 - Transportation - Metro Station, Bus Station
 - Breakfast places
 - Coffee Shops
 - Restaurants (Indian, Italian, Thai and American)
 - Shopping Center
 - Selected Bars - Sports Bar, Cocktail Bar, Pub
 - Outdoor Activity Center - Playground, Park
 
Using these criteria, I would like to recommend which Toronto neighborhoods would suit him. The same approach can help other people in his position find a neighborhood that matches their own preferences.



## Data Sources

### Neighborhood Data Source:
In Toronto, there are 140 neighborhoods in total. To get the list of neighborhoods and their boundary information, I will use the data at the URL below:
	http://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-boundary-file-
The data available on this page contains the following information:
 - Name of the neighborhood
 - Geometric information about the neighborhood (coordinates of the boundary)
 - Average crime rate for specific crimes up to 2018

I will use an Open Data Portal API to access this data. The API URL is: https://opendata.arcgis.com/datasets/af500b5abb7240399853b35a2362d0c0_0.geojson  

From this dataset, I will use only the neighborhood name and the boundary coordinates. The crime figures are skipped because they only run to 2018. The boundary coordinates describe a polygon, which is not directly usable for a venue search around a single point, so I calculate the centroid of each polygon and use it as that neighborhood's coordinates.
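
For reference, the centroid of a simple polygon can be written with the standard shoelace-based formula. This is the textbook result rather than anything specific to this dataset, and it treats latitude and longitude as planar coordinates, which is an approximation that should be acceptable at neighborhood scale. For vertices $(x_i, y_i)$, $i = 0, \dots, n-1$, with indices taken modulo $n$:

$$
A = \frac{1}{2}\sum_{i=0}^{n-1}\left(x_i\,y_{i+1} - x_{i+1}\,y_i\right),
\qquad
C_x = \frac{1}{6A}\sum_{i=0}^{n-1}\left(x_i + x_{i+1}\right)\left(x_i\,y_{i+1} - x_{i+1}\,y_i\right),
\qquad
C_y = \frac{1}{6A}\sum_{i=0}^{n-1}\left(y_i + y_{i+1}\right)\left(x_i\,y_{i+1} - x_{i+1}\,y_i\right)
$$

Here $A$ is the signed area of the polygon, so the sum of the cross terms equals $2A$ and dividing by three times that sum is the same as dividing by $6A$.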

### Crime Data:
Toronto is generally considered a safe city, but in recent years crime has risen in some neighborhoods. We will use the crime incident data published by Toronto Police Department. This dataset covers major crime reports from 2014 to 2019 across all 140 neighborhoods. The URL is: http://data.torontopolice.on.ca/datasets/mci-2014-to-2019. 

I use the API exposed by the Open Data Portal to access this data programmatically. The service URL is: https://opendata.arcgis.com/datasets/f4c2e5de021f4836a3caf77f8421f487_0.geojson. The dataset also contains the coordinates of each incident. This coordinate information is not relevant here, so I will ignore it.

In this dataset, each neighborhood name carries some additional information (the Hood ID, a number used in the Toronto police data), which I had to strip out so that the neighborhood dataset and the crime dataset use the same neighborhood names.
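
The clean-up described above can be sketched as follows. The helper name `strip_hood_id` and the sample values are illustrative assumptions; the real dataset may format the suffix differently.

```python
import re

def strip_hood_id(raw_name):
    """Remove a trailing '(number)' suffix and any surrounding whitespace."""
    # Assumption: the Hood ID appears as digits in parentheses at the end of the name
    return re.sub(r"\s*\(\d+\)\s*$", "", raw_name).strip()

# Hypothetical raw values, one with a suffix and one without
print(strip_hood_id("Yonge-St.Clair (97)"))
print(strip_hood_id("Guildwood"))
```

Anchoring the pattern to the end of the string leaves names without a suffix unchanged.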

As safety is one of his priorities, rather than considering all 140 neighborhoods, I will keep only the 120 neighborhoods with the fewest reported incidents.

From this dataset, I will use the following information:
 - Name of the neighborhood (after removing the Hood ID from the name)
 - Crime Incident
Then I will count the number of incidents for each neighborhood.
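
As a rough, dependency-free sketch of this counting step: the records below are made up, and `safest` is a hypothetical helper, not part of the notebook. It counts incidents per neighborhood, ranks them in ascending order, and reports each neighborhood's share of all incidents.

```python
from collections import Counter

# Hypothetical incident records: one neighborhood name per reported incident
incidents = ["A", "B", "A", "C", "A", "B"]

def safest(incident_names, n):
    """Return the n neighborhoods with the fewest incidents as (name, count, share)."""
    counts = Counter(incident_names)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1])  # ascending by count
    return [(name, count, count / total) for name, count in ranked[:n]]

result = safest(incidents, 2)
print(result)
```

One caveat of counting this way is that a neighborhood with no recorded incidents would not appear in the result at all.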

### Foursquare Venue data:
My brother has shared the kinds of venues he wants near his future home. Not every neighborhood will have all of them. To collect only the preferred venues, I will look up the categoryId of each preferred category in the Foursquare category hierarchy and use those IDs to filter out the non-preferred venues. The link for the Foursquare category list is below:
	https://developer.foursquare.com/docs/build-with-foursquare/categories/
	
From this page, I first collect the categoryIds that most closely match the facilities my brother is looking for. I then call the Foursquare venue search API, passing all of these categoryIds. 
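
A minimal sketch of how such a search URL could be assembled is shown here. The endpoint and parameter names mirror the request used in this notebook; the helper `build_search_url`, the category IDs, the credentials and the version date are placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode

def build_search_url(lat, lng, category_ids, client_id, client_secret, version,
                     radius=1500, limit=60):
    """Assemble a venue search URL for one point and a list of category IDs."""
    params = {
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
        "categoryId": ",".join(category_ids),  # several IDs go in one comma-separated value
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
    }
    # safe="," keeps the commas in ll and categoryId instead of percent-encoding them
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params, safe=",")

# Placeholder values only
url = build_search_url(43.65, -79.38, ["id1", "id2"], "MY_ID", "MY_SECRET", "20200101")
print(url)
```

Building the query string with `urlencode` rather than string formatting also escapes any characters in the credentials that are not URL-safe.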

From this Foursquare dataset, we will collect all the preferred venues (name of the venue and venue category name) found near each of the 120 selected neighborhoods.

We will then use a clustering technique to group the neighborhoods by the mix of preferred venues available near each one, and base the recommendations on those groups. 


### Collecting and building required Datasets

Import all required Python packages

In [1]:
import pandas as pd
import numpy as np
import json
import requests
from pandas import json_normalize  # pandas >= 1.0; older versions expose it as pandas.io.json.json_normalize
import folium
from IPython.display import HTML
from geopy.geocoders import Nominatim

In [2]:
# The code was removed by Watson Studio for sharing.

Use the geopy Nominatim geocoder to retrieve the coordinates of Toronto, Canada. These coordinates will be used to keep the city of Toronto at the center of the map.

In [4]:
city_name = "Toronto, Ontario"

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(city_name)
toronto_latitude = location.latitude
toronto_longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(toronto_latitude, toronto_longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [5]:
# Utility to compute the centroid (latitude, longitude) of a neighborhood
# from its boundary coordinates, given as GeoJSON [longitude, latitude] pairs
def getCentroidCoordinates(coordinates):
    det = 0.0
    tempDet = 0.0
    centroidX = 0.0
    centroidY = 0.0
    j = 0
    count = len(coordinates)
    for i in range(count):
        if i + 1 == count:
            j = 0
        else:
            j = i + 1
        tempDet = ((coordinates[i][1]*coordinates[j][0]) - (coordinates[j][1] * coordinates[i][0]))
        det += tempDet
        centroidX += (coordinates[i][1] + coordinates[j][1])*tempDet
        centroidY += (coordinates[i][0] + coordinates[j][0])*tempDet
        
    # det is twice the signed polygon area, so dividing by 3*det (= 6 * area)
    # yields the area-weighted centroid
    centroidX /= (3*det)
    centroidY /= (3*det)
    return centroidX, centroidY
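
As a quick sanity check of the centroid formula, the same calculation can be run on a shape whose centroid is known. The function below is a standalone re-statement written for this check (the name `polygon_centroid` and the rectangle are assumptions); it works on plain (x, y) pairs rather than GeoJSON order.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a polygon given as a list of (x, y) vertices."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # The sum of the cross terms is twice the signed area
    return cx / (3 * twice_area), cy / (3 * twice_area)

# A 4 x 2 rectangle, whose centroid is at its middle
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
center = polygon_centroid(rectangle)
print(center)
```

GeoJSON rings repeat the first vertex at the end; that extra zero-length edge contributes nothing to the sums, so the result is the same either way.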

#### Collecting the neighborhood data

I am using an Open Data Portal based API to access the neighborhood data. 

The API URL is : https://opendata.arcgis.com/datasets/af500b5abb7240399853b35a2362d0c0_0.geojson

From this data source, I fetch the following features:
  - Name of the Neighborhood
  - Boundary coordinates of respective neighborhoods

In [6]:
toronto_neighborhood_url = 'https://opendata.arcgis.com/datasets/af500b5abb7240399853b35a2362d0c0_0.geojson'
neighborhood_results = requests.get(toronto_neighborhood_url).json()
neighborhoods = neighborhood_results["features"]

In [7]:
neighborhood_list=[]
for neighborhood in neighborhoods:
    name = neighborhood["properties"]["Neighbourhood"]
    population = neighborhood["properties"]["Population"]
    neighbor_geometry = neighborhood["geometry"]["coordinates"][0]
    # Calculate the centroid coordinates using the boundary coordinates
    latitude, longitude = getCentroidCoordinates(neighbor_geometry)
    neighborhood_list.append([name, latitude, longitude])
    


neighborhood_df = pd.DataFrame([item for item in neighborhood_list])
neighborhood_df.columns = ["Neighborhood", "Latitude", "Longitude"]
neighborhood_df.head(10)

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Yonge-St.Clair | 43.687859 | -79.397831 |
| 1 | York University Heights | 43.765738 | -79.488842 |
| 2 | Lansing-Westgate | 43.754272 | -79.424706 |
| 3 | Yorkdale-Glen Park | 43.714673 | -79.457068 |
| 4 | Stonegate-Queensway | 43.63552 | -79.501091 |
| 5 | Tam O'Shanter-Sullivan | 43.78013 | -79.302876 |
| 6 | The Beaches | 43.671049 | -79.29956 |
| 7 | Thistletown-Beaumond Heights | 43.737989 | -79.563452 |
| 8 | Thorncliffe Park | 43.707749 | -79.349944 |
| 9 | Danforth East York | 43.689468 | -79.331362 |


In [8]:
toronto_neighborhood_map = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(neighborhood_df['Latitude'], neighborhood_df['Longitude'], neighborhood_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_neighborhood_map)  
    
folium.LayerControl().add_to(toronto_neighborhood_map)

toronto_neighborhood_map

The resulting map is shown below.


![Toronto_Neighborhood_Map.JPG](https://github.com/bitunsen/Coursera_Capstone/raw/master/Toronto_Neighborhood_Map.JPG)



    

#### Collecting crime data from Toronto Police department

I will be using another Open Data Portal based API to access details of the major crime incidents. 

The data can be viewed on the Toronto Police Department website: http://data.torontopolice.on.ca/datasets/mci-2014-to-2019/data

The API URL is : https://opendata.arcgis.com/datasets/f4c2e5de021f4836a3caf77f8421f487_0.geojson

From this data source, I fetch the following features:
  - Name of the Neighborhood
  - Major crime incidents from 2014 to 2019
  


In [10]:
toronto_crime_url = 'https://opendata.arcgis.com/datasets/f4c2e5de021f4836a3caf77f8421f487_0.geojson'
crime_results = requests.get(toronto_crime_url).json()
crime_features = crime_results["features"]

Using this major crime incident dataset, we will create a new dataframe, "toronto_crime_df". I will count the major crime incidents per neighborhood and sort the dataframe by the "Crime_Count" column in ascending order.

I will then compute each neighborhood's share of all incidents, stored as a fraction in the "Percentage_Crime" column.

In [12]:
crime_list = []
for feature in crime_features:
    name = feature["properties"]["Neighbourhood"]
    # Drop the trailing "(Hood ID)" suffix; split() leaves names without "(" unchanged
    clean_name = name.split("(")[0].strip()
    crime_list.append([clean_name, 1])
    

toronto_crime_df = pd.DataFrame([item for item in crime_list])
toronto_crime_df.columns = ["Neighborhood", "Crime_Count"]
toronto_crime_df = toronto_crime_df.groupby(["Neighborhood"]).count()
toronto_crime_df.sort_values(by="Crime_Count", ascending = True, inplace=True)
toronto_crime_df = toronto_crime_df.reset_index()
toronto_crime_df['Percentage_Crime']= (toronto_crime_df['Crime_Count'])/toronto_crime_df['Crime_Count'].sum()
toronto_crime_df.head(10)

| | Neighborhood | Crime_Count | Percentage_Crime |
|---|---|---|---|
| 0 | Lambton Baby Point | 353 | 0.00171 |
| 1 | Woodbine-Lumsden | 377 | 0.001826 |
| 2 | Maple Leaf | 410 | 0.001986 |
| 3 | Guildwood | 411 | 0.001991 |
| 4 | Yonge-St.Clair | 412 | 0.001996 |
| 5 | Markland Wood | 413 | 0.002001 |
| 6 | Old East York | 479 | 0.00232 |
| 7 | Casa Loma | 480 | 0.002325 |
| 8 | Forest Hill South | 494 | 0.002393 |
| 9 | Kingsway South | 496 | 0.002403 |


So, by raw incident count, the safest neighborhood is Lambton Baby Point, followed by Woodbine-Lumsden.

Let's look at the neighborhoods with the most criminal incidents.

In [13]:
toronto_crime_df.tail(10)

| | Neighborhood | Crime_Count | Percentage_Crime |
|---|---|---|---|
| 130 | West Hill | 3497 | 0.01694 |
| 131 | Woburn | 3798 | 0.018398 |
| 132 | Kensington-Chinatown | 3823 | 0.018519 |
| 133 | Downsview-Roding-CFB | 3974 | 0.019251 |
| 134 | York University Heights | 3989 | 0.019323 |
| 135 | Moss Park | 4786 | 0.023184 |
| 136 | West Humber-Clairville | 5702 | 0.027621 |
| 137 | Church-Yonge Corridor | 6232 | 0.030189 |
| 138 | Bay Street Corridor | 6817 | 0.033023 |
| 139 | Waterfront Communities-The Island | 7747 | 0.037528 |


So the highest incident count is in Waterfront Communities-The Island: 7,747 incidents, compared with 353 in Lambton Baby Point. 

So I should make sure not to recommend this neighborhood to my brother.

Let's visualize each neighborhood's share of crime on a map of Toronto's neighborhoods.

In [14]:
url = 'https://github.com/bitunsen/Coursera_Capstone/raw/master'
toronto_geo = f'{url}/toronto_neighborhood.geojson'
toronto_crime_map = folium.Map(location=[43.653522, -79.510540], zoom_start=10)

folium.Choropleth(
    geo_data=toronto_geo,
    name='choropleth',
    data=toronto_crime_df,
    columns=['Neighborhood', 'Percentage_Crime'],
    key_on='feature.properties.AREA_NAME',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Crime Rate (%)'
).add_to(toronto_crime_map)

folium.LayerControl().add_to(toronto_crime_map)

toronto_crime_map

The resulting map is shown below.


![Toronto_Crime_Data_Map.jpg](https://github.com/bitunsen/Coursera_Capstone/raw/master/Toronto_Crime_Data_Map.jpg)



We will join both data sources on the neighborhood name to create a dataframe with the following columns:
 - Neighborhood - Name of the neighborhood
 - Total number of crimes in the neighborhood
 - Share of all crimes that occurred in the neighborhood
 - Centroid coordinates of each neighborhood (calculated using the boundary coordinates)
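
Because the join key is a free-text name, it can be worth checking that no neighborhood silently drops out of the join. The sketch below shows the idea in plain Python; `inner_join` is a hypothetical helper and the second crime row is an invented example of a name that was not cleaned.

```python
def inner_join(crime_rows, centroid_rows):
    """Join two lists of dicts on 'Neighborhood' and report names found on one side only."""
    centroids = {row["Neighborhood"]: row for row in centroid_rows}
    joined = []
    for row in crime_rows:
        match = centroids.get(row["Neighborhood"])
        if match is not None:
            joined.append({**row, **match})
    crime_names = {row["Neighborhood"] for row in crime_rows}
    unmatched = sorted(crime_names ^ set(centroids))  # symmetric difference
    return joined, unmatched

crime_rows = [
    {"Neighborhood": "Lambton Baby Point", "Crime_Count": 353},
    {"Neighborhood": "Yonge-St.Clair (97)", "Crime_Count": 412},  # suffix left in by mistake
]
centroid_rows = [
    {"Neighborhood": "Lambton Baby Point", "Latitude": 43.657421, "Longitude": -79.496008},
    {"Neighborhood": "Yonge-St.Clair", "Latitude": 43.687859, "Longitude": -79.397831},
]
joined, unmatched = inner_join(crime_rows, centroid_rows)
print(len(joined), unmatched)
```

With pandas, a similar check is available by merging with `how="outer"` and `indicator=True` and inspecting the rows that are not marked `both`.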



In [15]:
neighborhood_with_crime_df = pd.merge(toronto_crime_df, neighborhood_df, on="Neighborhood")
neighborhood_with_crime_df.head(10)

| | Neighborhood | Crime_Count | Percentage_Crime | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Lambton Baby Point | 353 | 0.00171 | 43.657421 | -79.496008 |
| 1 | Woodbine-Lumsden | 377 | 0.001826 | 43.694107 | -79.311123 |
| 2 | Maple Leaf | 410 | 0.001986 | 43.715575 | -79.480718 |
| 3 | Guildwood | 411 | 0.001991 | 43.748827 | -79.195014 |
| 4 | Yonge-St.Clair | 412 | 0.001996 | 43.687859 | -79.397831 |
| 5 | Markland Wood | 413 | 0.002001 | 43.633542 | -79.573394 |
| 6 | Old East York | 479 | 0.00232 | 43.696781 | -79.335448 |
| 7 | Casa Loma | 480 | 0.002325 | 43.681852 | -79.407967 |
| 8 | Forest Hill South | 494 | 0.002393 | 43.694526 | -79.414278 |
| 9 | Kingsway South | 496 | 0.002403 | 43.653522 | -79.51054 |


In [16]:
neighborhood_with_crime_df.shape

(140, 5)

Now we will visualize the combined neighborhood and crime data on a map.

In [28]:
url = 'https://github.com/bitunsen/Coursera_Capstone/raw/master'
toronto_geo = f'{url}/toronto_neighborhood.geojson'
neighborhood_crime_map = folium.Map(location=[43.653522, -79.510540], zoom_start=10)

folium.Choropleth(
    geo_data=toronto_geo,
    name='choropleth',
    data=neighborhood_with_crime_df,
    columns=['Neighborhood', 'Percentage_Crime'],
    key_on='feature.properties.AREA_NAME',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Crime Rate (%)'
).add_to(neighborhood_crime_map)

# add markers to map
for lat, lng, neighborhood, crime_count in zip(neighborhood_with_crime_df['Latitude'], neighborhood_with_crime_df['Longitude'], neighborhood_with_crime_df['Neighborhood'], neighborhood_with_crime_df['Crime_Count']):
    label = neighborhood + ", Crime Count : [{}]".format(crime_count)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7,
        parse_html=False).add_to(neighborhood_crime_map)
    
folium.LayerControl().add_to(neighborhood_crime_map)

neighborhood_crime_map

The resulting map is shown below:

![Toronto_Neighborhood_With_Crime_Map.jpg](https://github.com/bitunsen/Coursera_Capstone/raw/master/Toronto_Neighborhood_With_Crime_Map.jpg)



One of my brother's preferences is safety. That is why, rather than considering all the neighborhoods, we will take the 120 safest neighborhoods for further analysis.

In [67]:
safe_neighborhood_with_crime_df = neighborhood_with_crime_df.head(120)
safe_neighborhood_with_crime_df.head(5)

| | Neighborhood | Crime_Count | Percentage_Crime | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Lambton Baby Point | 353 | 0.00171 | 43.657421 | -79.496008 |
| 1 | Woodbine-Lumsden | 377 | 0.001826 | 43.694107 | -79.311123 |
| 2 | Maple Leaf | 410 | 0.001986 | 43.715575 | -79.480718 |
| 3 | Guildwood | 411 | 0.001991 | 43.748827 | -79.195014 |
| 4 | Yonge-St.Clair | 412 | 0.001996 | 43.687859 | -79.397831 |


In [68]:
safe_neighborhood_with_crime_df.shape

(120, 5)

#### Collect venue data for the selected 120 neighborhoods

Here we will use the Foursquare venue search API to find the venues my brother prefers. He is looking for the following venues:
 - Transportation - Metro Station, Bus Station
 - Breakfast places
 - Coffee Shops
 - Restaurants (Indian, Italian, Thai and American)
 - Shopping Center
 - Selected Bars - Sports Bar, Cocktail Bar, Pub
 - Outdoor Activity Center - Playground, Park
 


With the help of the Foursquare documentation, I have collected all the relevant category IDs, which I will use when looking for venues in those categories. The URL for the category list is below:
  - [Foursquare Categories](https://developer.foursquare.com/docs/build-with-foursquare/categories/)
  

In [31]:
CATEGORY_ID_LIST = ["4bf58dd8d48988d1fd931735","4bf58dd8d48988d10f941735","4bf58dd8d48988d110941735","4bf58dd8d48988d149941735",
                    "4bf58dd8d48988d14e941735","4bf58dd8d48988d1fd941735","5744ccdfe4b0c0459246b4dc","4bf58dd8d48988d11d941735",
                    "4bf58dd8d48988d11e941735","4bf58dd8d48988d11b941735", "4bf58dd8d48988d1e7941735", "4bf58dd8d48988d163941735",
                    "4bf58dd8d48988d143941735", "52e81612bcbc57f1066b79f4", "4bf58dd8d48988d1e0931735", "52f2ab2ebcbc57f1066b8b4f"]

In [33]:
## All the categoryIds for the preferred venues, joined into a single comma-separated string.
## This string is used to query the venues near each neighborhood.
## Note: the last ID below is not in CATEGORY_ID_LIST, so venues returned for it are filtered out later.

CATEGORY_ID_QUERY = ("4bf58dd8d48988d1fd931735,4bf58dd8d48988d10f941735,4bf58dd8d48988d110941735,4bf58dd8d48988d149941735"
                     ",4bf58dd8d48988d14e941735,4bf58dd8d48988d1fd941735,5744ccdfe4b0c0459246b4dc,4bf58dd8d48988d11d941735"
                     ",4bf58dd8d48988d11e941735,4bf58dd8d48988d11b941735,4bf58dd8d48988d1e7941735,4bf58dd8d48988d163941735"
                     ",4bf58dd8d48988d143941735,52e81612bcbc57f1066b79f4,4bf58dd8d48988d1e0931735,52f2ab2ebcbc57f1066b8b4f"
                     ",4bf58dd8d48988d1fe931735")

## This utility function uses the latitude and longitude of a neighborhood to fetch nearby venues in the preferred categories

def getVenuesByForNeighborhood(neighborhood_name, category_id_list, latitudes, longitudes, radius=1500):
    venues_list = []
    seen_venue_ids = set()

    # CLIENT_ID, CLIENT_SECRET and VERSION are expected to be defined earlier in the notebook
    url = "https://api.foursquare.com/v2/venues/search?ll={},{}&radius={}&limit=60&categoryId={}&client_id={}&client_secret={}&v={}".format(
                latitudes, longitudes, radius, CATEGORY_ID_QUERY, CLIENT_ID, CLIENT_SECRET, VERSION)

    # Call the API once and reuse the parsed response
    response = requests.get(url).json()["response"]
    venues = response["venues"]
    if len(venues) == 0:
        print("Neighborhood [{}] has no venues listed.".format(neighborhood_name))
    else:
        for venue in venues:
            # Skip venues that have no category information
            if len(venue["categories"]) == 0:
                continue
            venue_category = venue["categories"][0]["name"]
            venue_category_id = venue["categories"][0]["id"]
            # Keep only preferred categories, and add each venue at most once
            if venue_category_id in category_id_list and venue["id"] not in seen_venue_ids:
                seen_venue_ids.add(venue["id"])
                venues_list.append([neighborhood_name, venue["name"], venue_category])

    return venues_list
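
To try the filtering logic without calling the API, it can be run against a mocked payload. The payload below is a hypothetical, heavily trimmed stand-in that only contains the fields the function above reads, and `extract_preferred` is an illustrative helper rather than part of the notebook.

```python
# Hypothetical payload: a duplicate venue, a venue without categories, and a non-preferred venue
mock_response = {
    "response": {
        "venues": [
            {"id": "v1", "name": "Cafe One", "categories": [{"id": "cat_coffee", "name": "Coffee Shop"}]},
            {"id": "v1", "name": "Cafe One", "categories": [{"id": "cat_coffee", "name": "Coffee Shop"}]},
            {"id": "v2", "name": "No Category Place", "categories": []},
            {"id": "v3", "name": "Gym", "categories": [{"id": "cat_gym", "name": "Gym"}]},
        ]
    }
}

def extract_preferred(payload, allowed_category_ids):
    """Return (venue name, category name) pairs for preferred categories, without duplicates."""
    seen = set()
    rows = []
    for venue in payload["response"]["venues"]:
        categories = venue.get("categories", [])
        if not categories:
            continue  # nothing to classify the venue by
        primary = categories[0]
        if primary["id"] in allowed_category_ids and venue["id"] not in seen:
            seen.add(venue["id"])
            rows.append((venue["name"], primary["name"]))
    return rows

rows = extract_preferred(mock_response, {"cat_coffee"})
print(rows)
```

Only the first category of each venue is checked, so a venue whose preferred category is listed second would be skipped.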
    

Using the neighborhood data collected and consolidated from the previous data sources, collect the targeted venues for each neighborhood.

Create a new dataframe with the following features for further venue analysis:
  - Neighborhood Name
  - Name of the Venue
  - Category Name of the Venue

In [34]:

consolidated_venue_list = []
for neighborhood_name, latitude, longitude in zip(safe_neighborhood_with_crime_df["Neighborhood"], safe_neighborhood_with_crime_df["Latitude"], 
                                                  safe_neighborhood_with_crime_df["Longitude"]):
    venues_list = getVenuesByForNeighborhood(neighborhood_name, CATEGORY_ID_LIST, latitude, longitude)
    print("Neighborhood {} has {} venues.".format(neighborhood_name, len(venues_list)))
    consolidated_venue_list.extend(venues_list)
    
neighborhood_venue_df = pd.DataFrame(venue for venue in consolidated_venue_list)
neighborhood_venue_df.columns = ["Neighborhood", "Venue_Name", "Category"]

neighborhood_venue_df.head(10)



Neighborhood Lambton Baby Point has 42 venues.
Neighborhood Woodbine-Lumsden has 41 venues.
Neighborhood Maple Leaf has 29 venues.
Neighborhood Guildwood has 20 venues.
Neighborhood Yonge-St.Clair has 45 venues.
Neighborhood Markland Wood has 36 venues.
Neighborhood Old East York has 43 venues.
Neighborhood Casa Loma has 42 venues.
Neighborhood Forest Hill South has 44 venues.
Neighborhood Kingsway South has 38 venues.
Neighborhood Centennial Scarborough has 23 venues.
Neighborhood Humber Heights-Westmount has 35 venues.
Neighborhood Mount Pleasant East has 44 venues.
Neighborhood Lawrence Park North has 45 venues.
Neighborhood Etobicoke West Mall has 34 venues.
Neighborhood Bayview Woods-Steeles has 21 venues.
Neighborhood Alderwood has 42 venues.
Neighborhood Bridle Path-Sunnybrook-York Mills has 18 venues.
Neighborhood Princess-Rosethorn has 23 venues.
Neighborhood Forest Hill North has 45 venues.
Neighborhood Pleasant View has 42 venues.
Neighborhood Runnymede-Bloor West Village ha

| | Neighborhood | Venue_Name | Category |
|---|---|---|---|
| 0 | Lambton Baby Point | Magwood Park | Park |
| 1 | Lambton Baby Point | Jane Subway Station | Metro Station |
| 2 | Lambton Baby Point | RGLR | Coffee Shop |
| 3 | Lambton Baby Point | Tim Hortons | Coffee Shop |
| 4 | Lambton Baby Point | Strada | Italian Restaurant |
| 5 | Lambton Baby Point | La Veranda Osteria | Italian Restaurant |
| 6 | Lambton Baby Point | Old Mill Subway Station | Metro Station |
| 7 | Lambton Baby Point | Royal York Subway Station | Metro Station |
| 8 | Lambton Baby Point | Bryden's Pub | Pub |
| 9 | Lambton Baby Point | Starbucks | Coffee Shop |


In [36]:
neighborhood_venue_df.shape

(4444, 3)

Let's see the number of venues per category in each neighborhood:

In [37]:
neighborhood_venue_df.groupby(['Neighborhood', 'Category']).count()

| Neighborhood | Category | Venue_Name |
|---|---|---|
| Agincourt North | Breakfast Spot | 1 |
| Agincourt North | Bus Stop | 1 |
| Agincourt North | Coffee Shop | 8 |
| Agincourt North | Indian Restaurant | 8 |
| Agincourt North | Park | 8 |
| Agincourt North | Playground | 2 |
| Agincourt North | Shopping Mall | 8 |
| Agincourt North | Shopping Plaza | 2 |
| Agincourt South-Malvern West | American Restaurant | 3 |
| Agincourt South-Malvern West | Breakfast Spot | 1 |


In [38]:
neighborhood_venue_df.groupby('Neighborhood').count()

| Neighborhood | Venue_Name | Category |
|---|---|---|
| Agincourt North | 38 | 38 |
| Agincourt South-Malvern West | 39 | 39 |
| Alderwood | 42 | 42 |
| Banbury-Don Mills | 34 | 34 |
| Bathurst Manor | 36 | 36 |
| Bayview Village | 43 | 43 |
| Bayview Woods-Steeles | 21 | 21 |
| Bedford Park-Nortown | 41 | 41 |
| Beechborough-Greenbrook | 42 | 42 |
| Birchcliffe-Cliffside | 35 | 35 |


In [39]:
print('There are {} unique categories.'.format(len(neighborhood_venue_df['Category'].unique())))

There are 16 unique categories.


We are planning a clustering analysis, so we need to convert the categorical venue category into numerical features. We will apply a well-known transformation called <b>"one-hot encoding"</b>.
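
To make the transformation concrete, here is a small dependency-free sketch of what one-hot encoding produces. The helper `one_hot` is hypothetical, and the four sample categories are taken from the first rows of the venue table above.

```python
def one_hot(categories):
    """Return (column names, rows) where each row has a single 1 marking its category."""
    columns = sorted(set(categories))
    rows = [[1 if category == column else 0 for column in columns] for category in categories]
    return columns, rows

columns, rows = one_hot(["Park", "Metro Station", "Coffee Shop", "Coffee Shop"])
print(columns)
for row in rows:
    print(row)
```

Each distinct category becomes its own column, so the number of columns equals the number of unique categories.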

In [40]:
# one hot encoding
neighborhood_venue_one_hot_df = pd.get_dummies(neighborhood_venue_df[['Category']], prefix="", prefix_sep="")
neighborhood_venue_one_hot_df.shape

(4444, 16)

In [41]:
neighborhood_venue_one_hot_df.head(5)

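| | American Restaurant | Breakfast Spot | Buffet | Bus Stop | Cocktail Bar | Coffee Shop | Indian Restaurant | Italian Restaurant | Metro Station | Park | Playground | Pub | Shopping Mall | Shopping Plaza | Sports Bar | Thai Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |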


You might have noticed that the above dataframe doesn't have the Neighborhood feature. Let's add it from the <b>neighborhood_venue_df</b> dataframe.



In [42]:
neighborhood_venue_one_hot_df.insert(0, 'Neighborhood', neighborhood_venue_df['Neighborhood'])
neighborhood_venue_one_hot_df.head(5)

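| | Neighborhood | American Restaurant | Breakfast Spot | Buffet | Bus Stop | Cocktail Bar | Coffee Shop | Indian Restaurant | Italian Restaurant | Metro Station | Park | Playground | Pub | Shopping Mall | Shopping Plaza | Sports Bar | Thai Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Lambton Baby Point | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Lambton Baby Point | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Lambton Baby Point | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Lambton Baby Point | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Lambton Baby Point | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |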


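Now let's aggregate the rows by neighborhood, taking the mean of each one-hot column. Each value is then the share of the neighborhood's collected venues that fall in that category.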

In [43]:
neighborhood_venue_grouped_df = neighborhood_venue_one_hot_df.groupby('Neighborhood').mean().reset_index()
neighborhood_venue_grouped_df.shape

(120, 17)

In [44]:
neighborhood_venue_grouped_df.head(5)

| | Neighborhood | American Restaurant | Breakfast Spot | Buffet | Bus Stop | Cocktail Bar | Coffee Shop | Indian Restaurant | Italian Restaurant | Metro Station | Park | Playground | Pub | Shopping Mall | Shopping Plaza | Sports Bar | Thai Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt North | 0.0 | 0.026316 | 0.0 | 0.026316 | 0.0 | 0.210526 | 0.210526 | 0.0 | 0.0 | 0.210526 | 0.052632 | 0.0 | 0.210526 | 0.052632 | 0.0 | 0.0 |
| 1 | Agincourt South-Malvern West | 0.076923 | 0.025641 | 0.0 | 0.025641 | 0.0 | 0.333333 | 0.102564 | 0.076923 | 0.0 | 0.102564 | 0.0 | 0.051282 | 0.179487 | 0.0 | 0.0 | 0.025641 |
| 2 | Alderwood | 0.02381 | 0.071429 | 0.0 | 0.047619 | 0.02381 | 0.357143 | 0.02381 | 0.02381 | 0.0 | 0.166667 | 0.047619 | 0.047619 | 0.095238 | 0.0 | 0.0 | 0.071429 |
| 3 | Banbury-Don Mills | 0.029412 | 0.029412 | 0.0 | 0.029412 | 0.0 | 0.264706 | 0.058824 | 0.147059 | 0.0 | 0.294118 | 0.0 | 0.0 | 0.058824 | 0.0 | 0.0 | 0.088235 |
| 4 | Bathurst Manor | 0.083333 | 0.027778 | 0.0 | 0.083333 | 0.0 | 0.305556 | 0.027778 | 0.083333 | 0.0 | 0.194444 | 0.111111 | 0.0 | 0.055556 | 0.0 | 0.027778 | 0.0 |
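
The mean of a 0/1 column is simply the count of ones divided by the number of rows, so these values can be checked by hand. The short calculation below uses the per-category venue counts reported earlier for Agincourt North; the variable names are illustrative.

```python
# Venue counts per category for Agincourt North, as listed in the grouped output earlier
agincourt_north_counts = {
    "Breakfast Spot": 1, "Bus Stop": 1, "Coffee Shop": 8, "Indian Restaurant": 8,
    "Park": 8, "Playground": 2, "Shopping Mall": 8, "Shopping Plaza": 2,
}

total = sum(agincourt_north_counts.values())
shares = {category: count / total for category, count in agincourt_north_counts.items()}

print(total)
print(round(shares["Coffee Shop"], 6))
print(round(shares["Playground"], 6))
```

The 8 coffee shops out of 38 venues give 8/38, which matches the 0.210526 shown for Agincourt North in the grouped dataframe.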


In [47]:
neighborhood_venue_grouped_df.loc[neighborhood_venue_grouped_df["Buffet"] > 0.0]

| | Neighborhood | American Restaurant | Breakfast Spot | Buffet | Bus Stop | Cocktail Bar | Coffee Shop | Indian Restaurant | Italian Restaurant | Metro Station | Park | Playground | Pub | Shopping Mall | Shopping Plaza | Sports Bar | Thai Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Bayview Village | 0.0 | 0.023256 | 0.023256 | 0.023256 | 0.0 | 0.27907 | 0.023256 | 0.046512 | 0.069767 | 0.255814 | 0.069767 | 0.0 | 0.046512 | 0.046512 | 0.0 | 0.093023 |
| 37 | Forest Hill South | 0.0 | 0.045455 | 0.022727 | 0.022727 | 0.0 | 0.386364 | 0.0 | 0.068182 | 0.068182 | 0.204545 | 0.045455 | 0.090909 | 0.045455 | 0.0 | 0.0 | 0.0 |
| 44 | Hillcrest Village | 0.0 | 0.054054 | 0.027027 | 0.054054 | 0.0 | 0.324324 | 0.027027 | 0.054054 | 0.0 | 0.27027 | 0.0 | 0.027027 | 0.135135 | 0.027027 | 0.0 | 0.0 |
| 59 | Lawrence Park South | 0.02381 | 0.071429 | 0.02381 | 0.0 | 0.0 | 0.309524 | 0.047619 | 0.142857 | 0.071429 | 0.095238 | 0.047619 | 0.119048 | 0.02381 | 0.0 | 0.0 | 0.02381 |
| 68 | Mount Pleasant East | 0.0 | 0.045455 | 0.022727 | 0.0 | 0.0 | 0.272727 | 0.090909 | 0.136364 | 0.068182 | 0.113636 | 0.022727 | 0.113636 | 0.022727 | 0.0 | 0.022727 | 0.068182 |
| 69 | Mount Pleasant West | 0.0 | 0.045455 | 0.022727 | 0.0 | 0.0 | 0.295455 | 0.068182 | 0.136364 | 0.068182 | 0.136364 | 0.022727 | 0.090909 | 0.045455 | 0.0 | 0.0 | 0.068182 |
| 97 | Steeles | 0.026316 | 0.052632 | 0.026316 | 0.026316 | 0.0 | 0.236842 | 0.0 | 0.0 | 0.026316 | 0.210526 | 0.131579 | 0.0 | 0.236842 | 0.026316 | 0.0 | 0.0 |
| 117 | Yonge-Eglinton | 0.0 | 0.02381 | 0.02381 | 0.0 | 0.0 | 0.333333 | 0.047619 | 0.166667 | 0.071429 | 0.095238 | 0.02381 | 0.119048 | 0.047619 | 0.0 | 0.0 | 0.047619 |


We will use the <b>neighborhood_venue_grouped_df</b> dataframe to cluster the neighborhoods, grouping them statistically by the relative frequency of each venue category.
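
The text does not name a clustering algorithm at this point. As one common choice, and purely as an assumption, the sketch below runs a plain k-means on a few made-up two-feature rows to show how neighborhoods with a similar venue mix end up in the same group. In practice a library implementation such as scikit-learn's `KMeans` would normally be used instead.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; returns (labels, centroids). Initial centroids are the first k points."""
    centroids = [list(point) for point in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [point for point, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return labels, centroids

# Hypothetical rows: (share of coffee shops, share of parks) for four neighborhoods
features = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
labels, centroids = kmeans(features, k=2)
print(labels)
print(centroids)
```

Taking the first k points as starting centroids keeps the sketch deterministic; real implementations usually pick starting points more carefully and run several restarts.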

We will create another dataframe which will help us visualize the neighborhoods by their popular venues.


Now let's sort the venue categories by occurrence per neighborhood. This will show which venue categories have the strongest presence in each neighborhood.

To achieve that, let's write a small utility function which will do the sorting for us. If a neighborhood does not have any venue in a particular category, the function appends "NA" to the name of the venue category. For example, "Pub NA" means that no Pub venue was found in that neighborhood.

In [48]:
def getTopVenues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and rank categories by frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    sorted_venues = []
    for category in row_categories_sorted.index:
        if row_categories_sorted[category] > 0.0:
            sorted_venues.append(category)
        else:
            # Mark categories that are absent from this neighborhood
            sorted_venues.append(category + " NA")
    # Return only as many entries as there are "Most Common Venue" columns
    return sorted_venues[:num_top_venues]

In [61]:
num_top_venues = 16

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # Beyond 'st', 'nd', 'rd', fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhood_venues_sorted_df = pd.DataFrame(columns=columns)
neighborhood_venues_sorted_df['Neighborhood'] = neighborhood_venue_grouped_df['Neighborhood']

for ind in np.arange(neighborhood_venue_grouped_df.shape[0]):
    neighborhood_venues_sorted_df.iloc[ind, 1:] = getTopVenues(neighborhood_venue_grouped_df.iloc[ind, :], num_top_venues)

neighborhood_venues_sorted_df.head(5)

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | 11th Most Common Venue | 12th Most Common Venue | 13th Most Common Venue | 14th Most Common Venue | 15th Most Common Venue | 16th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt North | Shopping Mall | Park | Indian Restaurant | Coffee Shop | Shopping Plaza | Playground | Bus Stop | Breakfast Spot | Thai Restaurant NA | Sports Bar NA | Pub NA | Metro Station NA | Italian Restaurant NA | Cocktail Bar NA | Buffet NA | American Restaurant NA |
| 1 | Agincourt South-Malvern West | Coffee Shop | Shopping Mall | Park | Indian Restaurant | Italian Restaurant | American Restaurant | Pub | Thai Restaurant | Bus Stop | Breakfast Spot | Sports Bar NA | Shopping Plaza NA | Playground NA | Metro Station NA | Cocktail Bar NA | Buffet NA |
| 2 | Alderwood | Coffee Shop | Park | Shopping Mall | Thai Restaurant | Breakfast Spot | Pub | Playground | Bus Stop | Italian Restaurant | Indian Restaurant | Cocktail Bar | American Restaurant | Sports Bar NA | Shopping Plaza NA | Metro Station NA | Buffet NA |
| 3 | Banbury-Don Mills | Park | Coffee Shop | Italian Restaurant | Thai Restaurant | Shopping Mall | Indian Restaurant | Bus Stop | Breakfast Spot | American Restaurant | Sports Bar NA | Shopping Plaza NA | Pub NA | Playground NA | Metro Station NA | Cocktail Bar NA | Buffet NA |
| 4 | Bathurst Manor | Coffee Shop | Park | Playground | Italian Restaurant | Bus Stop | American Restaurant | Shopping Mall | Sports Bar | Indian Restaurant | Breakfast Spot | Thai Restaurant NA | Shopping Plaza NA | Pub NA | Metro Station NA | Cocktail Bar NA | Buffet NA |


Looks like Agincourt North does not have any Thai restaurant, sports bar, pub, etc. It only has a shopping mall, park, Indian restaurant, coffee shop, shopping plaza, playground, bus stop and breakfast spot. So someone with a strong preference for Thai cuisine might prefer the Alderwood neighborhood, where Thai Restaurant is the 4th most common venue.
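
This kind of lookup can also be done in code rather than by eye. The sketch below uses a small made-up table shaped like `neighborhood_venues_sorted_df` and reports the rank of one category per neighborhood. The helper name `category_rank` and the toy rows are assumptions for illustration only.

```python
import pandas as pd

# made-up table with the same layout as the sorted-venues dataframe
toy_sorted_df = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "1st Most Common Venue": ["Park", "Coffee Shop"],
    "2nd Most Common Venue": ["Coffee Shop", "Thai Restaurant"],
    "3rd Most Common Venue": ["Thai Restaurant NA", "Park"],
})

def category_rank(row, category):
    """Return the 1-based rank of `category` in a sorted-venues row, or None if absent."""
    venues = list(row.iloc[1:])  # skip the Neighborhood column
    if category in venues:
        return venues.index(category) + 1
    return None  # category missing, or listed only as "<category> NA"

thai_rank = {
    row["Neighborhood"]: category_rank(row, "Thai Restaurant")
    for _, row in toy_sorted_df.iterrows()
}
print(thai_rank)  # {'A': None, 'B': 2}
```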

Let's see what types of venues we can get in the safest neighborhood, <b>Lambton Baby Point</b>.

In [62]:
neighborhood_venues_sorted_df.loc[neighborhood_venues_sorted_df["Neighborhood"] == "Lambton Baby Point"]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue
56,Lambton Baby Point,Coffee Shop,Park,Playground,Italian Restaurant,Metro Station,Breakfast Spot,Pub,Bus Stop,Thai Restaurant,Indian Restaurant,Cocktail Bar,American Restaurant,Sports Bar NA,Shopping Plaza NA,Shopping Mall NA,Buffet NA


Looks like <b>Lambton Baby Point</b> has almost all of the preferred venues; only a sports bar and a shopping mall or plaza are missing.

Now let's go ahead and apply a clustering technique to our dataset.

Let's import the necessary Python package to do the K-means clustering.

In [63]:
from sklearn.cluster import KMeans

In [64]:
kclusters = 3

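# keep only the numeric venue-frequency columns; the positional axis argument is not accepted by recent pandas versions
neighborhood_venue_clustering_df = neighborhood_venue_grouped_df.drop(columns='Neighborhood')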

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(neighborhood_venue_clustering_df)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 1, 0, 0, 0, 2, 1, 2, 2], dtype=int32)
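
The notebook fixes the number of clusters at 3 without showing how that value was chosen. One common sanity check, sketched here on made-up two-dimensional data rather than the real venue frequencies, is to compare silhouette scores for a few candidate values of k. The data, the candidate range and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# made-up 2-D data: three well-separated groups of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])

# fit k-means for each candidate k and record the silhouette score
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# the k with the highest silhouette score is a reasonable candidate
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the real data, `X` would be `neighborhood_venue_clustering_df`. The scores there may be much closer together than in this toy case, so the result is a hint rather than a definitive answer.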

In [65]:
neighborhood_venues_sorted_df.insert(1, 'Cluster Labels', kmeans.labels_)

neighborhood_venues_sorted_df.head(5)

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue
0,Agincourt North,0,Shopping Mall,Park,Indian Restaurant,Coffee Shop,Shopping Plaza,Playground,Bus Stop,Breakfast Spot,Thai Restaurant NA,Sports Bar NA,Pub NA,Metro Station NA,Italian Restaurant NA,Cocktail Bar NA,Buffet NA,American Restaurant NA
1,Agincourt South-Malvern West,1,Coffee Shop,Shopping Mall,Park,Indian Restaurant,Italian Restaurant,American Restaurant,Pub,Thai Restaurant,Bus Stop,Breakfast Spot,Sports Bar NA,Shopping Plaza NA,Playground NA,Metro Station NA,Cocktail Bar NA,Buffet NA
2,Alderwood,1,Coffee Shop,Park,Shopping Mall,Thai Restaurant,Breakfast Spot,Pub,Playground,Bus Stop,Italian Restaurant,Indian Restaurant,Cocktail Bar,American Restaurant,Sports Bar NA,Shopping Plaza NA,Metro Station NA,Buffet NA
3,Banbury-Don Mills,0,Park,Coffee Shop,Italian Restaurant,Thai Restaurant,Shopping Mall,Indian Restaurant,Bus Stop,Breakfast Spot,American Restaurant,Sports Bar NA,Shopping Plaza NA,Pub NA,Playground NA,Metro Station NA,Cocktail Bar NA,Buffet NA
4,Bathurst Manor,0,Coffee Shop,Park,Playground,Italian Restaurant,Bus Stop,American Restaurant,Shopping Mall,Sports Bar,Indian Restaurant,Breakfast Spot,Thai Restaurant NA,Shopping Plaza NA,Pub NA,Metro Station NA,Cocktail Bar NA,Buffet NA


Now let's merge this venue and cluster information with the crime and coordinate information of each neighborhood.

We will join the dataframes below using the "Neighborhood" feature of each dataframe:
    - safe_neighborhood_with_crime_df
    - neighborhood_venues_sorted_df

In [69]:
safe_neighborhood_with_crime_df = safe_neighborhood_with_crime_df.join(neighborhood_venues_sorted_df.set_index('Neighborhood'), on='Neighborhood')
safe_neighborhood_with_crime_df.head(5)

Unnamed: 0,Neighborhood,Crime_Count,Percentage_Crime,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue
0,Lambton Baby Point,353,0.00171,43.657421,-79.496008,1,Coffee Shop,Park,Playground,Italian Restaurant,...,Pub,Bus Stop,Thai Restaurant,Indian Restaurant,Cocktail Bar,American Restaurant,Sports Bar NA,Shopping Plaza NA,Shopping Mall NA,Buffet NA
1,Woodbine-Lumsden,377,0.001826,43.694107,-79.311123,0,Coffee Shop,Bus Stop,Park,Playground,...,Breakfast Spot,Thai Restaurant,Italian Restaurant,Cocktail Bar,Sports Bar NA,Shopping Plaza NA,Shopping Mall NA,Pub NA,Buffet NA,American Restaurant NA
2,Maple Leaf,410,0.001986,43.715575,-79.480718,0,Park,Coffee Shop,Bus Stop,Playground,...,Thai Restaurant,Shopping Mall,Italian Restaurant,Breakfast Spot,Shopping Plaza NA,Pub NA,Metro Station NA,Indian Restaurant NA,Cocktail Bar NA,Buffet NA
3,Guildwood,411,0.001991,43.748827,-79.195014,0,Park,Coffee Shop,Sports Bar,Shopping Mall,...,Playground,Thai Restaurant NA,Shopping Plaza NA,Metro Station NA,Italian Restaurant NA,Indian Restaurant NA,Cocktail Bar NA,Buffet NA,Breakfast Spot NA,American Restaurant NA
4,Yonge-St.Clair,412,0.001996,43.687859,-79.397831,1,Coffee Shop,Park,Pub,Metro Station,...,Sports Bar,Shopping Mall,Playground,Indian Restaurant,Breakfast Spot,Thai Restaurant NA,Shopping Plaza NA,Cocktail Bar NA,Bus Stop NA,Buffet NA
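
The join pattern used above keeps every row of the left dataframe and looks up the matching row on the right by neighborhood name. A minimal sketch on made-up data shows what happens when a neighborhood has no match on the right; the frame names and values here are hypothetical.

```python
import pandas as pd

# made-up stand-ins for the crime dataframe and the venue/cluster dataframe
crime_df = pd.DataFrame({"Neighborhood": ["A", "B", "C"], "Crime_Count": [10, 20, 30]})
venues_df = pd.DataFrame({"Neighborhood": ["B", "A"], "Cluster Labels": [1, 0]})

# left join: the order of crime_df is kept, unmatched rows get NaN
merged = crime_df.join(venues_df.set_index("Neighborhood"), on="Neighborhood")
print(merged)
```

Neighborhood "C" has no venue row, so its cluster label is NaN and the label column becomes floating point. On the real data it is worth checking for such rows before filtering by cluster.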


Let us see which neighborhoods are present in cluster 0.

In [70]:
safe_neighborhood_with_crime_df.loc[safe_neighborhood_with_crime_df["Cluster Labels"] == 0]

Unnamed: 0,Neighborhood,Crime_Count,Percentage_Crime,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue
1,Woodbine-Lumsden,377,0.001826,43.694107,-79.311123,0,Coffee Shop,Bus Stop,Park,Playground,...,Breakfast Spot,Thai Restaurant,Italian Restaurant,Cocktail Bar,Sports Bar NA,Shopping Plaza NA,Shopping Mall NA,Pub NA,Buffet NA,American Restaurant NA
2,Maple Leaf,410,0.001986,43.715575,-79.480718,0,Park,Coffee Shop,Bus Stop,Playground,...,Thai Restaurant,Shopping Mall,Italian Restaurant,Breakfast Spot,Shopping Plaza NA,Pub NA,Metro Station NA,Indian Restaurant NA,Cocktail Bar NA,Buffet NA
3,Guildwood,411,0.001991,43.748827,-79.195014,0,Park,Coffee Shop,Sports Bar,Shopping Mall,...,Playground,Thai Restaurant NA,Shopping Plaza NA,Metro Station NA,Italian Restaurant NA,Indian Restaurant NA,Cocktail Bar NA,Buffet NA,Breakfast Spot NA,American Restaurant NA
5,Markland Wood,413,0.002001,43.633542,-79.573394,0,Park,Coffee Shop,Bus Stop,Thai Restaurant,...,Indian Restaurant,Breakfast Spot,Shopping Plaza,Cocktail Bar,American Restaurant,Sports Bar NA,Pub NA,Playground NA,Metro Station NA,Buffet NA
9,Kingsway South,496,0.002403,43.653522,-79.51054,0,Park,Coffee Shop,Pub,Metro Station,...,Shopping Mall,Playground,Italian Restaurant,Breakfast Spot,American Restaurant,Sports Bar NA,Shopping Plaza NA,Indian Restaurant NA,Cocktail Bar NA,Buffet NA
19,Forest Hill North,565,0.002737,43.704218,-79.428102,0,Coffee Shop,Park,Playground,Italian Restaurant,...,Bus Stop,Shopping Mall,Indian Restaurant,American Restaurant,Sports Bar,Pub,Thai Restaurant NA,Shopping Plaza NA,Cocktail Bar NA,Buffet NA
20,Pleasant View,570,0.002761,43.786982,-79.334907,0,Coffee Shop,Shopping Mall,Park,Bus Stop,...,Playground,Metro Station,Breakfast Spot,Pub,American Restaurant,Thai Restaurant NA,Sports Bar NA,Italian Restaurant NA,Cocktail Bar NA,Buffet NA
24,Elms-Old Rexdale,593,0.002873,43.721521,-79.548944,0,Coffee Shop,Park,Indian Restaurant,Shopping Mall,...,Breakfast Spot,Thai Restaurant,Bus Stop,Sports Bar NA,Shopping Plaza NA,Pub NA,Metro Station NA,Cocktail Bar NA,Buffet NA,American Restaurant NA
26,Rustic,607,0.00294,43.71161,-79.498052,0,Coffee Shop,Park,Bus Stop,Thai Restaurant,...,Breakfast Spot,Sports Bar NA,Shopping Plaza NA,Pub NA,Metro Station NA,Italian Restaurant NA,Indian Restaurant NA,Cocktail Bar NA,Buffet NA,American Restaurant NA
33,Thistletown-Beaumond Heights,692,0.003352,43.737989,-79.563452,0,Indian Restaurant,Park,Coffee Shop,Shopping Mall,...,Italian Restaurant,Bus Stop,American Restaurant,Sports Bar NA,Shopping Plaza NA,Pub NA,Metro Station NA,Cocktail Bar NA,Buffet NA,Breakfast Spot NA


In [71]:
safe_neighborhood_with_crime_df.loc[safe_neighborhood_with_crime_df["Cluster Labels"] == 1]

Unnamed: 0,Neighborhood,Crime_Count,Percentage_Crime,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue
0,Lambton Baby Point,353,0.00171,43.657421,-79.496008,1,Coffee Shop,Park,Playground,Italian Restaurant,...,Pub,Bus Stop,Thai Restaurant,Indian Restaurant,Cocktail Bar,American Restaurant,Sports Bar NA,Shopping Plaza NA,Shopping Mall NA,Buffet NA
4,Yonge-St.Clair,412,0.001996,43.687859,-79.397831,1,Coffee Shop,Park,Pub,Metro Station,...,Sports Bar,Shopping Mall,Playground,Indian Restaurant,Breakfast Spot,Thai Restaurant NA,Shopping Plaza NA,Cocktail Bar NA,Bus Stop NA,Buffet NA
6,Old East York,479,0.00232,43.696781,-79.335448,1,Coffee Shop,Park,Indian Restaurant,Thai Restaurant,...,Pub,Italian Restaurant,Breakfast Spot,Shopping Mall,Metro Station,Sports Bar NA,Shopping Plaza NA,Playground NA,Cocktail Bar NA,Buffet NA
7,Casa Loma,480,0.002325,43.681852,-79.407967,1,Coffee Shop,Metro Station,Park,Pub,...,Indian Restaurant,Sports Bar,Thai Restaurant NA,Shopping Plaza NA,Playground NA,Cocktail Bar NA,Bus Stop NA,Buffet NA,Breakfast Spot NA,American Restaurant NA
8,Forest Hill South,494,0.002393,43.694526,-79.414278,1,Coffee Shop,Park,Pub,Metro Station,...,Playground,Breakfast Spot,Bus Stop,Buffet,Thai Restaurant NA,Sports Bar NA,Shopping Plaza NA,Indian Restaurant NA,Cocktail Bar NA,American Restaurant NA
12,Mount Pleasant East,509,0.002466,43.704853,-79.384884,1,Coffee Shop,Italian Restaurant,Pub,Park,...,Metro Station,Breakfast Spot,Sports Bar,Shopping Mall,Playground,Buffet,Shopping Plaza NA,Cocktail Bar NA,Bus Stop NA,American Restaurant NA
13,Lawrence Park North,513,0.002485,43.730061,-79.403938,1,Coffee Shop,Italian Restaurant,Park,Indian Restaurant,...,American Restaurant,Pub,Playground,Bus Stop,Sports Bar NA,Shopping Plaza NA,Shopping Mall NA,Cocktail Bar NA,Buffet NA,Breakfast Spot NA
16,Alderwood,545,0.00264,43.604938,-79.541573,1,Coffee Shop,Park,Shopping Mall,Thai Restaurant,...,Playground,Bus Stop,Italian Restaurant,Indian Restaurant,Cocktail Bar,American Restaurant,Sports Bar NA,Shopping Plaza NA,Metro Station NA,Buffet NA
21,Runnymede-Bloor West Village,573,0.002776,43.65927,-79.485671,1,Coffee Shop,Park,Metro Station,Italian Restaurant,...,Pub,Playground,Indian Restaurant,Bus Stop,American Restaurant,Sports Bar NA,Shopping Plaza NA,Shopping Mall NA,Cocktail Bar NA,Buffet NA
22,Leaside-Bennington,582,0.002819,43.703798,-79.366032,1,Coffee Shop,Park,Indian Restaurant,Italian Restaurant,...,Playground,Thai Restaurant,Sports Bar,Bus Stop,American Restaurant,Shopping Plaza NA,Pub NA,Metro Station NA,Cocktail Bar NA,Buffet NA


In [72]:
safe_neighborhood_with_crime_df.loc[safe_neighborhood_with_crime_df["Cluster Labels"] == 2]

Unnamed: 0,Neighborhood,Crime_Count,Percentage_Crime,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue
10,Centennial Scarborough,508,0.002461,43.782374,-79.150802,2,Park,Playground,Coffee Shop,Italian Restaurant,...,Sports Bar NA,Shopping Plaza NA,Shopping Mall NA,Pub NA,Metro Station NA,Indian Restaurant NA,Cocktail Bar NA,Bus Stop NA,Buffet NA,American Restaurant NA
11,Humber Heights-Westmount,509,0.002466,43.692235,-79.522378,2,Park,Coffee Shop,Playground,Thai Restaurant,...,Italian Restaurant,Breakfast Spot,Sports Bar NA,Shopping Plaza NA,Shopping Mall NA,Pub NA,Metro Station NA,Indian Restaurant NA,Cocktail Bar NA,Buffet NA
14,Etobicoke West Mall,515,0.002495,43.645063,-79.568901,2,Park,Coffee Shop,Bus Stop,Shopping Mall,...,Italian Restaurant,Cocktail Bar,Breakfast Spot,Sports Bar NA,Pub NA,Playground NA,Metro Station NA,Indian Restaurant NA,Buffet NA,American Restaurant NA
15,Bayview Woods-Steeles,539,0.002611,43.796802,-79.382076,2,Park,Coffee Shop,Shopping Mall,Playground,...,Thai Restaurant NA,Sports Bar NA,Pub NA,Metro Station NA,Indian Restaurant NA,Cocktail Bar NA,Bus Stop NA,Buffet NA,Breakfast Spot NA,American Restaurant NA
17,Bridle Path-Sunnybrook-York Mills,560,0.002713,43.731014,-79.378865,2,Park,Coffee Shop,Bus Stop,Shopping Mall,...,Sports Bar NA,Shopping Plaza NA,Pub NA,Playground NA,Metro Station NA,Indian Restaurant NA,Cocktail Bar NA,Buffet NA,Breakfast Spot NA,American Restaurant NA
18,Princess-Rosethorn,563,0.002727,43.666053,-79.544522,2,Park,Playground,Coffee Shop,Bus Stop,...,Thai Restaurant NA,Sports Bar NA,Shopping Plaza NA,Pub NA,Metro Station NA,Indian Restaurant NA,Cocktail Bar NA,Buffet NA,Breakfast Spot NA,American Restaurant NA
25,Beechborough-Greenbrook,605,0.002931,43.693217,-79.479433,2,Park,Coffee Shop,American Restaurant,Shopping Mall,...,Italian Restaurant,Pub,Playground,Metro Station,Cocktail Bar,Breakfast Spot,Thai Restaurant NA,Shopping Plaza NA,Indian Restaurant NA,Buffet NA
28,Edenbridge-Humber Valley,634,0.003071,43.670889,-79.52242,2,Park,Playground,Shopping Mall,Coffee Shop,...,Thai Restaurant NA,Sports Bar NA,Shopping Plaza NA,Pub NA,Metro Station NA,Italian Restaurant NA,Cocktail Bar NA,Buffet NA,Breakfast Spot NA,American Restaurant NA
38,Keelesdale-Eglinton West,785,0.003803,43.685728,-79.471399,2,Park,Coffee Shop,Bus Stop,Shopping Mall,...,Breakfast Spot,American Restaurant,Thai Restaurant,Pub,Metro Station,Italian Restaurant,Cocktail Bar,Shopping Plaza NA,Indian Restaurant NA,Buffet NA
55,Pelmo Park-Humberlea,928,0.004495,43.717517,-79.528244,2,Park,Coffee Shop,Breakfast Spot,Thai Restaurant,...,Italian Restaurant,Sports Bar NA,Shopping Plaza NA,Pub NA,Metro Station NA,Indian Restaurant NA,Cocktail Bar NA,Bus Stop NA,Buffet NA,American Restaurant NA
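
The three tables are easier to compare with a per-cluster summary. The sketch below counts, for each neighborhood, how many venue categories are actually present (those without the "NA" suffix) and averages that count per cluster. It runs on a small made-up table; the column "Available Categories" and the toy rows are assumptions for illustration.

```python
import pandas as pd

# made-up table with the same layout as the merged dataframe
toy_df = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Cluster Labels": [0, 0, 1],
    "1st Most Common Venue": ["Park", "Coffee Shop", "Park"],
    "2nd Most Common Venue": ["Pub", "Park NA", "Pub NA"],
    "3rd Most Common Venue": ["Coffee Shop NA", "Pub NA", "Coffee Shop NA"],
})

venue_cols = [c for c in toy_df.columns if c.endswith("Most Common Venue")]

# a venue cell counts as available when it does not carry the " NA" suffix
available = ~toy_df[venue_cols].apply(lambda col: col.str.endswith(" NA"))
toy_df["Available Categories"] = available.sum(axis=1)

# average number of available categories in each cluster
per_cluster = toy_df.groupby("Cluster Labels")["Available Categories"].mean()
print(per_cluster)
```

Applied to `safe_neighborhood_with_crime_df`, the same summary would indicate which cluster tends to cover more of the preferred venue categories.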
