<a href="https://colab.research.google.com/github/ZhongyuZhao/Coursera_Capstone/blob/main/Coursera_Capstone.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Capstone Project - The Battle of the Neighborhoods (Week 2)**

**This notebook is mainly used for the Coursera capstone project.**

## Introduction

In this project we try to find an optimal location in North York for Torontonians who plan to move there. Specifically, the report aims to find a safe and convenient neighborhood that is similar to the one they currently live in.

Since there are many neighborhoods in North York, we use clustering to narrow down the candidates. We also use crime rate data and transit data to assess the safety and convenience of each neighborhood. We prefer neighborhoods that are as safe as possible, that offer the venues the user likes, and that are not too far from public transit.


## Data Description

Data Link:
 
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M 

https://en.wikipedia.org/wiki/List_of_Toronto_subway_stations 

https://www.kaggle.com/alincijov/toronto-crime-rate-per-neighbourhood

I will use the data scraped from Wikipedia in Week 3, together with Toronto venue data, neighborhood crime data, and transit (subway station) data, to complete the project.

### Foursquare API Data

We need data about the venues in each neighborhood of the boroughs of interest. To get that information we use Foursquare location data. Foursquare is a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus, and even photos. The Foursquare platform is therefore used as the sole source of venue data, since all the required venue information can be obtained through its API.

After building the list of neighborhoods, we connect to the Foursquare API to gather information about the venues in each neighborhood. For each neighborhood, the search radius is 500 meters, which is the default used by `getNearbyVenues` below.

The data retrieved from Foursquare contains information on venues in North York and Downtown Toronto. The information obtained per venue is as follows:

- Neighborhood
- Neighborhood Latitude
- Neighborhood Longitude
- Venue: name of the venue, e.g. the name of a store or restaurant
- Venue Latitude
- Venue Longitude
- Venue Category

I will use these data to find a correspondence between Downtown neighborhoods and North York neighborhoods.

### Toronto Crime Data & Metro Station Data

To select the best neighborhood, I will use crime rate data and subway station data to judge whether a neighborhood is safe and convenient enough. For someone who wants to move to North York, I first cluster the neighborhoods and keep only those similar to the neighborhood where that person previously lived.
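
As a rough illustration of the "convenient enough" criterion, the distance from a neighborhood centroid to the nearest subway station can be computed with the haversine formula. This is only a sketch: the notebook imports `geodesic` from geopy for distance work, and the standard-library haversine below is a stand-in for it. The helper names and the station coordinates are hypothetical; the Parkwoods coordinates are the ones listed later in this notebook.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_station_km(point, stations):
    """Distance from `point` to the closest of `stations` (name -> (lat, lon))."""
    return min(haversine_km(*point, *coords) for coords in stations.values())

# Hypothetical station coordinates, for illustration only.
stations = {"Station A": (43.7500, -79.3300), "Station B": (43.7000, -79.4000)}
parkwoods = (43.753259, -79.329656)  # Parkwoods centroid from the geospatial data
print(round(nearest_station_km(parkwoods, stations), 2))
```

A neighborhood could then be treated as convenient when this distance falls below a chosen threshold; the threshold itself is a modelling decision that this sketch does not fix.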

## Data Extraction

We start off by importing all the required packages.

In [1]:
import pandas as pd
import numpy as np
import requests
from geopy.geocoders import Nominatim
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.distance import geodesic

Scrape the list of areas from the Toronto postal codes Wikipedia page.

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
wiki_url = requests.get(url)
wiki_data = pd.read_html(wiki_url.text)
wiki_data

[    Postal Code  ...                                      Neighbourhood
 0           M1A  ...                                       Not assigned
 1           M2A  ...                                       Not assigned
 2           M3A  ...                                          Parkwoods
 3           M4A  ...                                   Victoria Village
 4           M5A  ...                          Regent Park, Harbourfront
 ..          ...  ...                                                ...
 175         M5Z  ...                                       Not assigned
 176         M6Z  ...                                       Not assigned
 177         M7Z  ...                                       Not assigned
 178         M8Z  ...  Mimico NW, The Queensway West, South of Bloor,...
 179         M9Z  ...                                       Not assigned
 
 [180 rows x 3 columns],
                                                   0   ...   17
 0                               

In [3]:
wiki_data = wiki_data[0]

Store the data in a pandas DataFrame.

In [4]:
# keep only rows with an assigned borough and renumber them from 0
df = wiki_data[~wiki_data['Borough'].isin(['Not assigned'])]
df = df.reset_index(drop=True)
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


Import latitude and longitude data to locate neighborhoods.

In [5]:
data = pd.read_csv("https://cocl.us/Geospatial_data")
# attach latitude/longitude to each postal code; the inner join keeps only codes present in both tables
df_combine = df.join(data.set_index('Postal Code'), on='Postal Code', how='inner')
df_combine

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


Separate North York neighborhoods and Downtown neighborhoods.

In [6]:
df_ny = df_combine[df_combine['Borough'].isin(['North York'])]
df_ny.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
7,M3B,North York,Don Mills,43.745906,-79.352188
10,M6B,North York,Glencairn,43.709577,-79.445073


In [7]:
df_dt = df_combine[df_combine['Borough'].isin(['Downtown Toronto'])]
df_dt.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
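
pandas aligns a boolean mask with the frame by index label, so it is safest to build the mask from the same frame that is being filtered. The sketch below shows the pattern on a toy frame; `select_borough` is a hypothetical helper and the rows are a small subset used only for illustration.

```python
import pandas as pd

# Toy frame standing in for df_combine (a few rows only).
toy = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A", "M7A"],
    "Borough": ["North York", "North York", "Downtown Toronto", "Downtown Toronto"],
})

def select_borough(frame, borough):
    """Return the rows of `frame` whose Borough equals `borough`."""
    mask = frame["Borough"] == borough  # the mask shares the frame's index
    return frame[mask]

toy_ny = select_borough(toy, "North York")
toy_dt = select_borough(toy, "Downtown Toronto")
print(toy_ny["Postal Code"].tolist())
print(toy_dt["Postal Code"].tolist())
```

The original row labels are preserved by the selection, which is why the filtered tables above keep indices such as 3, 7 and 10.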


### Mapping

In this section we will draw a map to locate those neighborhoods.

In [8]:
address = 'North York, ON'

geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of North York are {}, {}.'.format(latitude, longitude))

The geographical coordinates of North York are 43.7543263, -79.44911696639593.


In [9]:
# create map of North York using latitude and longitude values

map_toronto_ny = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df_ny['Latitude'], df_ny['Longitude'], df_ny['Borough'], df_ny['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_ny)  
    
map_toronto_ny

### Find Nearby Venues

In this section we use the Foursquare API. We search for venues within a given radius of each neighborhood and try to infer the character of the neighborhood from the categories of the venues found there.

In [10]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (placeholder; keep real credentials out of the notebook)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
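
The function above assumes that every response contains at least one group and that every venue has at least one category. A more defensive parsing step could look like the following sketch. It assumes the same response shape that `getNearbyVenues` indexes into; `extract_venues` is a hypothetical helper, and the sample payload is hand-written to mimic that shape (the second venue is made up to exercise the empty-category case).

```python
def extract_venues(payload):
    """Flatten an explore-style response into (name, lat, lng, category) tuples.

    A response without groups yields an empty list; a venue without
    categories gets None as its category.
    """
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-written payload mimicking the shape accessed in getNearbyVenues.
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Brookbanks Park",
                           "location": {"lat": 43.751976, "lng": -79.33214},
                           "categories": [{"name": "Park"}]}},
                {"venue": {"name": "Uncategorized Spot",
                           "location": {"lat": 43.75, "lng": -79.33},
                           "categories": []}},
            ]
        }]
    }
}

print(extract_venues(sample))
print(extract_venues({"response": {}}))
```

Keeping the parsing separate from the HTTP call also makes it possible to test the parsing without network access or API credentials.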

In [12]:
NY_venues= getNearbyVenues(df_ny['Neighbourhood'], df_ny['Latitude'], df_ny['Longitude'])

Parkwoods
Victoria Village
Lawrence Manor, Lawrence Heights
Don Mills
Glencairn
Don Mills
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Fairview, Henry Farm, Oriole
Northwood Park, York University
Bayview Village
Downsview
York Mills, Silver Hills
Downsview
North Park, Maple Leaf Park, Upwood Park
Humber Summit
Willowdale, Newtonbrook
Downsview
Bedford Park, Lawrence Manor East
Humberlea, Emery
Willowdale, Willowdale East
Downsview
York Mills West
Willowdale, Willowdale West


In [13]:
DT_venues= getNearbyVenues(df_dt['Neighbourhood'], df_dt['Latitude'], df_dt['Longitude'])

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


One hot encoding is needed to convert from categorical data to numerical data.

In [14]:
#onehot encoding
DT_onehot = pd.get_dummies(DT_venues[['Venue Category']], prefix="", prefix_sep="")
NY_onehot = pd.get_dummies(NY_venues[['Venue Category']], prefix="", prefix_sep="")

DT_onehot['Neighborhood'] = DT_venues['Neighborhood'] 
NY_onehot['Neighborhood'] = NY_venues['Neighborhood']

# move neighborhood column to the first column
dt_fixed_columns = [DT_onehot.columns[-1]] + list(DT_onehot.columns[:-1])
ny_fixed_columns = [NY_onehot.columns[-1]] + list(NY_onehot.columns[:-1])

DT_onehot = DT_onehot[dt_fixed_columns]
NY_onehot = NY_onehot[ny_fixed_columns]

In [15]:
DT_grouped = DT_onehot.groupby('Neighborhood').mean().reset_index()
DT_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,...,Record Shop,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.035088,0.0,0.0,0.0,0.017544,0.017544,0.0,0.035088,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.0,0.0,0.017544,0.0,0.017544,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0625,0.0625,0.0625,0.125,0.125,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.031746,0.0,...,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.047619,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.015873,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.015873,0.015873,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.015873
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.025316,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.012658,0.0,0.012658,0.0,0.025316,0.012658,...,0.0,0.0,0.037975,0.0,0.012658,0.0,0.012658,0.0,0.0,0.012658,0.0,0.012658,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.012658,0.0,0.063291,0.0,0.0,0.0,0.0,0.0,0.012658,0.012658,0.012658,0.0,0.0,0.0,0.0,0.0,0.0


In [16]:
NY_grouped = NY_onehot.groupby('Neighborhood').mean().reset_index()
NY_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Bike Shop,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Electronics Store,Event Space,...,Korean Restaurant,Lingerie Store,Liquor Store,Locksmith,Lounge,Luggage Store,Massage Studio,Mediterranean Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salon / Barbershop,Sandwich Place,Shoe Store,Shopping Mall,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Vietnamese Restaurant,Women's Store
0,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bedford Park, Lawrence Manor East",0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.08,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0
3,Don Mills,0.0,0.0,0.0,0.038462,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.076923,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.038462,0.0,0.038462,0.076923,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Downsview,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,...,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
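
To see why the grouped values above can be read as category frequencies, consider a toy example: the mean of a 0/1 indicator column within a neighborhood equals the share of that neighborhood's venues that fall in the category. The rows below are made up for illustration and only reuse the column names of `NY_venues`.

```python
import pandas as pd

# Toy venue list (made-up rows, same column names as NY_venues).
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Park", "Coffee Shop", "Coffee Shop", "Park"],
})

# One indicator column per category, cast to float so the mean is numeric.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of an indicator column is the share of venues in that category.
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Neighborhood A has two coffee shops out of three venues, so its Coffee Shop value is 2/3. This also means that neighborhoods with very few venues get coarse profiles, which is worth keeping in mind when reading the clusters.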


Now that encoding is finished, we define a function that finds the most common venue categories for a neighborhood.

In [17]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
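
One property of this function is that it always returns `num_top_venues` labels, even when a neighborhood has fewer distinct categories; the remaining slots are filled with zero-frequency categories in arbitrary order. A possible variant that skips those is sketched below. `top_nonzero_categories` is a hypothetical helper, not part of the notebook, and the row is a toy example.

```python
import pandas as pd

def top_nonzero_categories(row, n):
    """Top-n categories by frequency, ignoring categories with zero frequency."""
    freqs = row.iloc[1:].astype(float)  # skip the Neighborhood name
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind="stable")
    return list(freqs.index[:n])

row = pd.Series({"Neighborhood": "Toy", "Park": 0.25, "Bank": 0.0, "Coffee Shop": 0.75})
print(top_nonzero_categories(row, 3))
```

With this variant the result can be shorter than `n`, so it would need padding before being written into a fixed-width table such as the one built in the next cell.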

In [18]:
# Build a dataframe of each neighborhood along with its top 10 venues
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted_NY = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_DT = pd.DataFrame(columns=columns)

neighborhoods_venues_sorted_NY['Neighborhood'] = NY_grouped['Neighborhood']
neighborhoods_venues_sorted_DT['Neighborhood'] = DT_grouped['Neighborhood']


for ind in np.arange(NY_grouped.shape[0]):
    neighborhoods_venues_sorted_NY.iloc[ind, 1:] = return_most_common_venues(NY_grouped.iloc[ind, :], num_top_venues)
for ind in np.arange(DT_grouped.shape[0]):
    neighborhoods_venues_sorted_DT.iloc[ind, 1:] = return_most_common_venues(DT_grouped.iloc[ind, :], num_top_venues)
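
The column names above are built with a `try`/`except` that falls back to the "th" suffix. An explicit alternative is sketched here; `ordinal` and `venue_columns` are hypothetical helpers introduced only for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def venue_columns(num_top_venues):
    """Build the column names used for the sorted-venues table."""
    return ["Neighborhood"] + [
        f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
    ]

print(venue_columns(4))
```

For `num_top_venues = 10` both approaches give the same names; the explicit version also stays correct past 20 (for example "21st"), where the list-index approach would produce "21th".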

In [19]:
neighborhoods_venues_sorted_NY.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Fried Chicken Joint,Shopping Mall,Middle Eastern Restaurant,Mobile Phone Shop,Pharmacy,Pizza Place,Deli / Bodega,Bridal Shop
1,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Women's Store,Dog Run,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega
2,"Bedford Park, Lawrence Manor East",Sandwich Place,Italian Restaurant,Coffee Shop,Café,Locksmith,Liquor Store,Butcher,Juice Bar,Pharmacy,Pizza Place
3,Don Mills,Gym,Beer Store,Japanese Restaurant,Coffee Shop,Restaurant,Italian Restaurant,Discount Store,Dim Sum Restaurant,Construction & Landscaping,Clothing Store
4,Downsview,Grocery Store,Park,Shopping Mall,Athletics & Sports,Liquor Store,Baseball Field,Discount Store,Bank,Food Truck,Gym / Fitness Center


## Clustering

In [20]:
kclusters = 5

NY_grouped_clustering = NY_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(NY_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 4, 3], dtype=int32)
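
For readers unfamiliar with k-means, the sketch below shows the basic assign-and-update loop (Lloyd's algorithm) in NumPy. It is not scikit-learn's implementation, which adds smarter initialization and other refinements; `lloyd_kmeans` and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its rows; keep it if the cluster is empty.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups of frequency-like vectors.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centroids = lloyd_kmeans(X, k=2)
print(labels)
```

The label numbers themselves carry no meaning; only which rows share a label matters. That is also why the labels can change when the data set changes, even with a fixed `random_state`.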

In [21]:
neighborhoods_venues_sorted_NY.insert(0, 'Cluster Labels', kmeans.labels_)

In [22]:
NY_merged = df_ny

# merge two datasets to add latitude/longitude for each neighborhood
NY_merged = NY_merged.join(neighborhoods_venues_sorted_NY.set_index('Neighborhood'), on='Neighbourhood')

NY_merged.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Fast Food Restaurant,Discount Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Women's Store,Discount Store,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0.0,Clothing Store,Furniture / Home Store,Women's Store,Miscellaneous Shop,Boutique,Coffee Shop,Event Space,Vietnamese Restaurant,Accessories Store,Supplement Shop
7,M3B,North York,Don Mills,43.745906,-79.352188,0.0,Gym,Beer Store,Japanese Restaurant,Coffee Shop,Restaurant,Italian Restaurant,Discount Store,Dim Sum Restaurant,Construction & Landscaping,Clothing Store
10,M6B,North York,Glencairn,43.709577,-79.445073,0.0,Pub,Bakery,Japanese Restaurant,Park,Women's Store,Discount Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store


In [23]:
NY_merged.isnull().any(axis=0)

Postal Code               False
Borough                   False
Neighbourhood             False
Latitude                  False
Longitude                 False
Cluster Labels             True
1st Most Common Venue      True
2nd Most Common Venue      True
3rd Most Common Venue      True
4th Most Common Venue      True
5th Most Common Venue      True
6th Most Common Venue      True
7th Most Common Venue      True
8th Most Common Venue      True
9th Most Common Venue      True
10th Most Common Venue     True
dtype: bool

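As we can see, there are null values in Cluster Labels. Neighborhoods that are absent from the clustered table (most likely because no venues were returned for them) receive no label in the join, so we drop those rows.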

In [24]:
NY_merged = NY_merged.dropna(subset=['Cluster Labels'])
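
The missing labels come from how `join` works: it performs a left join by default, so every row of the left table is kept and unmatched rows are filled with NaN. The toy example below reproduces this and also restores the integer dtype afterwards, since a column containing NaN is stored as float (which is why the labels appear as 0.0 above). All names and values here are made up.

```python
import pandas as pd

# Toy stand-ins: three neighbourhoods, but labels for only two of them.
left = pd.DataFrame({"Neighbourhood": ["A", "B", "C"]})
labels = pd.DataFrame({"Neighborhood": ["A", "C"], "Cluster Labels": [0, 1]})

# join() defaults to a left join, so "B" is kept with a missing label.
merged = left.join(labels.set_index("Neighborhood"), on="Neighbourhood")
print(merged["Cluster Labels"].isnull().tolist())

# Drop unlabeled rows and restore the integer dtype lost to NaN.
cleaned = merged.dropna(subset=["Cluster Labels"]).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)
print(cleaned)
```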

In [25]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(NY_merged['Latitude'], NY_merged['Longitude'], NY_merged['Neighbourhood'], NY_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Find a Satisfying Neighborhood 

We have finished data cleaning, visualization, and clustering. Now it is time to address the problem this project targets.

Imagine a person (I will use my own information as the example) who lives in downtown Toronto and wants to move to North York. He needs a guide to select the best neighborhood and then locate a condo or house in the chosen area. Here is some brief information about him:

1. He currently lives in the Queen's Park area.

2. He is Asian and loves Asian food and coffee.

3. He wants to live in a safe neighborhood.

4. He needs to take the subway every day.

Now we will help him find his future neighborhood.

## Clustering

The first step is clustering. This time we add his current neighborhood, Queen's Park, to the North York data set and run the clustering again. The goal is to find the North York neighborhoods that fall into the same cluster as Queen's Park, i.e. those with similar venue profiles.

In [26]:
# select the Queen's Park rows and append them to the North York venues
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
insert_row = DT_venues[DT_venues['Neighborhood'].str.contains('Provincial')]
NY_venues_plus = pd.concat([NY_venues, insert_row])
NY_venues_plus

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.753259,-79.329656,KFC,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
...,...,...,...,...,...,...,...
70,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Convocation Hall,43.660828,-79.395245,College Auditorium
71,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Tim Hortons,43.658906,-79.388696,Coffee Shop
72,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Tim Hortons,43.659415,-79.391221,Coffee Shop
73,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Understudy Café at Gerstein,43.662308,-79.394098,College Cafeteria


In [27]:
NY_venues_plus = NY_venues_plus.reset_index(drop = True)
NY_venues_plus.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,KFC,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


Apply one-hot encoding to the new set.

In [28]:
NY_onehot_plus = pd.get_dummies(NY_venues_plus[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
NY_onehot_plus['Neighborhood'] = NY_venues_plus['Neighborhood']

# move neighborhood column to the first column
ny_fixed_columns_plus = [NY_onehot_plus.columns[-1]] + list(NY_onehot_plus.columns[:-1])

NY_onehot_plus = NY_onehot_plus[ny_fixed_columns_plus]

In [29]:
NY_grouped_plus = NY_onehot_plus.groupby('Neighborhood').mean().reset_index()
NY_grouped_plus.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Bar,Beer Store,Bike Shop,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,College Auditorium,College Cafeteria,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner,...,Lounge,Luggage Store,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Music Venue,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salon / Barbershop,Sandwich Place,Shoe Store,Shopping Mall,Smoothie Shop,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Vietnamese Restaurant,Women's Store,Yoga Studio
0,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,...,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bedford Park, Lawrence Manor East",0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Don Mills,0.0,0.0,0.0,0.038462,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.038462,0.0,0.038462,0.076923,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Downsview,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [30]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
NY_neighborhoods_venues_plus_sorted = pd.DataFrame(columns=columns)
NY_neighborhoods_venues_plus_sorted['Neighborhood'] = NY_grouped_plus['Neighborhood']

for ind in np.arange(NY_grouped_plus.shape[0]):
    NY_neighborhoods_venues_plus_sorted.iloc[ind, 1:] = return_most_common_venues(NY_grouped_plus.iloc[ind, :], num_top_venues)

NY_neighborhoods_venues_plus_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Ice Cream Shop,Middle Eastern Restaurant,Mobile Phone Shop,Pharmacy,Pizza Place,Bridal Shop,Deli / Bodega,Restaurant
1,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Yoga Studio,Discount Store,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
2,"Bedford Park, Lawrence Manor East",Sandwich Place,Coffee Shop,Italian Restaurant,Indian Restaurant,Locksmith,Liquor Store,Café,Butcher,Pharmacy,Pizza Place
3,Don Mills,Gym,Beer Store,Coffee Shop,Japanese Restaurant,Restaurant,Chinese Restaurant,Clothing Store,Caribbean Restaurant,Café,Dim Sum Restaurant
4,Downsview,Grocery Store,Park,Bank,Discount Store,Liquor Store,Shopping Mall,Baseball Field,Korean Restaurant,Athletics & Sports,Gym / Fitness Center
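The suffix list with a fallback used above is fine for ten venues, but it would produce labels such as "21th" if `num_top_venues` were ever raised past 20. A minimal sketch of a more general suffix helper follows; the function name `ordinal` is hypothetical and not part of the notebook.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (also 111, 112, ...) always take 'th'
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```

The resulting `columns` list has the same labels as the one built in the cell above, so it could be passed to `pd.DataFrame(columns=columns)` unchanged.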


K-means clustering for the new set

In [31]:
kclusters = 5

NY_grouped_clustering_plus = NY_grouped_plus.drop(columns='Neighborhood')

# run k-means clustering
kmeans_plus = KMeans(n_clusters=kclusters, random_state=0).fit(NY_grouped_clustering_plus)

# check cluster labels generated for each row in the dataframe
kmeans_plus.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 3, 1], dtype=int32)
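A quick way to see how unbalanced the clusters are is to count the members of each label. The sketch below uses only the ten labels printed above; in the notebook one would presumably pass `kmeans_plus.labels_` instead of the hard-coded list.

```python
from collections import Counter

# First ten labels printed above (an excerpt, not the full label array).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 3, 1]

sizes = Counter(labels)
for cluster, count in sorted(sizes.items()):
    print(f"cluster {cluster}: {count} neighborhoods")
```

If one cluster holds most of the rows, as in this excerpt, the clustering filters out very little on its own.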

In [32]:
NY_neighborhoods_venues_plus_sorted.insert(0, 'Cluster Labels', kmeans_plus.labels_)

In [33]:
insert_dt = df_dt[df_dt['Neighbourhood'].str.contains('Provincial')]

In [34]:
df_ny_plus = df_ny.copy()
df_ny_plus = pd.concat([df_ny_plus, insert_dt])  # DataFrame.append was removed in pandas 2.0
df_ny_plus = df_ny_plus.reset_index(drop = True)
df_ny_plus

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
3,M3B,North York,Don Mills,43.745906,-79.352188
4,M6B,North York,Glencairn,43.709577,-79.445073
5,M3C,North York,Don Mills,43.7259,-79.340923
6,M2H,North York,Hillcrest Village,43.803762,-79.363452
7,M3H,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259
8,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556
9,M3J,North York,"Northwood Park, York University",43.76798,-79.487262


In [35]:
NY_merged_plus = df_ny_plus

NY_merged_plus = NY_merged_plus.join(NY_neighborhoods_venues_plus_sorted.set_index('Neighborhood'), on='Neighbourhood')

NY_merged_plus.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Fast Food Restaurant,Diner,College Auditorium,College Cafeteria,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Diner,College Auditorium,College Cafeteria,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0.0,Clothing Store,Furniture / Home Store,Boutique,Event Space,Miscellaneous Shop,Coffee Shop,Accessories Store,Women's Store,Vietnamese Restaurant,Distribution Center
3,M3B,North York,Don Mills,43.745906,-79.352188,0.0,Gym,Beer Store,Coffee Shop,Japanese Restaurant,Restaurant,Chinese Restaurant,Clothing Store,Caribbean Restaurant,Café,Dim Sum Restaurant
4,M6B,North York,Glencairn,43.709577,-79.445073,0.0,Pub,Bakery,Japanese Restaurant,Park,Yoga Studio,Diner,College Auditorium,College Cafeteria,Comfort Food Restaurant,Construction & Landscaping


In [36]:
NY_merged_plus = NY_merged_plus.dropna(subset=['Cluster Labels'])

In [37]:
#visualization
map_clusters_plus = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(NY_merged_plus['Latitude'], NY_merged_plus['Longitude'], NY_merged_plus['Neighbourhood'], NY_merged_plus['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters_plus)
       

In [38]:
map_clusters_plus

As we can see above, the Queen's Park neighborhood falls in cluster 0. That cluster is so large that only a few neighborhoods are filtered out. Because the clustering alone does not narrow down the candidates, we need to look more closely at the venue data.

## Venues Analysis

Now we dive into the data and select neighborhoods whose top venues match the customer's keywords.

First, we take a look at the venues in the Queen's Park neighborhood.

In [39]:
NY_merged_plus = NY_merged_plus.reset_index(drop = True)

In [40]:
NY_merged_plus[NY_merged_plus['Neighbourhood'].str.contains('Provincial')]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.0,Coffee Shop,Sushi Restaurant,Sandwich Place,Burrito Place,Café,Park,Fried Chicken Joint,Mexican Restaurant,College Auditorium,Yoga Studio


It's clear that coffee shops and Asian restaurants are common in this neighborhood, and Mexican restaurants appear too. Next, let's search the North York neighborhoods for those that have coffee shops and Asian or Mexican restaurants among their top 5 venues. Note that the pattern used in the following cells, `Restaurant|Coffee Shop`, matches any kind of restaurant, so the results still need to be checked by eye.

In [48]:
NY_merged_plus[NY_merged_plus['1st Most Common Venue'].str.contains('Restaurant|Coffee Shop')]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Diner,College Auditorium,College Cafeteria,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
7,M3H,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,0.0,Coffee Shop,Bank,Ice Cream Shop,Middle Eastern Restaurant,Mobile Phone Shop,Pharmacy,Pizza Place,Bridal Shop,Deli / Bodega,Restaurant
9,M3J,North York,"Northwood Park, York University",43.76798,-79.487262,0.0,Coffee Shop,Furniture / Home Store,Miscellaneous Shop,Caribbean Restaurant,Bar,Massage Studio,Diner,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
10,M2K,North York,Bayview Village,43.786947,-79.385975,0.0,Chinese Restaurant,Café,Bank,Japanese Restaurant,Yoga Studio,Discount Store,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
19,M2N,North York,"Willowdale, Willowdale East",43.77012,-79.408493,0.0,Ramen Restaurant,Café,Sandwich Place,Pizza Place,Coffee Shop,Plaza,Pet Store,Bubble Tea Shop,Movie Theater,Middle Eastern Restaurant
23,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.0,Coffee Shop,Sushi Restaurant,Sandwich Place,Burrito Place,Café,Park,Fried Chicken Joint,Mexican Restaurant,College Auditorium,Yoga Studio


In [49]:
NY_merged_plus[NY_merged_plus['2nd Most Common Venue'].str.contains('Restaurant|Coffee Shop')]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,0.0,Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Japanese Restaurant,Juice Bar,Bank,Mobile Phone Shop,Food Court,Jewelry Store
17,M5M,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,0.0,Sandwich Place,Coffee Shop,Italian Restaurant,Indian Restaurant,Locksmith,Liquor Store,Café,Butcher,Pharmacy,Pizza Place
22,M2R,North York,"Willowdale, Willowdale West",43.782736,-79.442259,0.0,Grocery Store,Coffee Shop,Pharmacy,Pizza Place,Butcher,Discount Store,Food Service,Deli / Bodega,Gas Station,Furniture / Home Store
23,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.0,Coffee Shop,Sushi Restaurant,Sandwich Place,Burrito Place,Café,Park,Fried Chicken Joint,Mexican Restaurant,College Auditorium,Yoga Studio


In [50]:
NY_merged_plus[NY_merged_plus['3rd Most Common Venue'].str.contains('Restaurant|Coffee Shop')]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Fast Food Restaurant,Diner,College Auditorium,College Cafeteria,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
3,M3B,North York,Don Mills,43.745906,-79.352188,0.0,Gym,Beer Store,Coffee Shop,Japanese Restaurant,Restaurant,Chinese Restaurant,Clothing Store,Caribbean Restaurant,Café,Dim Sum Restaurant
4,M6B,North York,Glencairn,43.709577,-79.445073,0.0,Pub,Bakery,Japanese Restaurant,Park,Yoga Studio,Diner,College Auditorium,College Cafeteria,Comfort Food Restaurant,Construction & Landscaping
5,M3C,North York,Don Mills,43.7259,-79.340923,0.0,Gym,Beer Store,Coffee Shop,Japanese Restaurant,Restaurant,Chinese Restaurant,Clothing Store,Caribbean Restaurant,Café,Dim Sum Restaurant
6,M2H,North York,Hillcrest Village,43.803762,-79.363452,0.0,Golf Course,Pool,Mediterranean Restaurant,Fast Food Restaurant,Dog Run,Diner,College Cafeteria,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
8,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,0.0,Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Japanese Restaurant,Juice Bar,Bank,Mobile Phone Shop,Food Court,Jewelry Store
17,M5M,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,0.0,Sandwich Place,Coffee Shop,Italian Restaurant,Indian Restaurant,Locksmith,Liquor Store,Café,Butcher,Pharmacy,Pizza Place


In [51]:
NY_merged_plus[NY_merged_plus['4th Most Common Venue'].str.contains('Restaurant|Coffee Shop')]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Diner,College Auditorium,College Cafeteria,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
3,M3B,North York,Don Mills,43.745906,-79.352188,0.0,Gym,Beer Store,Coffee Shop,Japanese Restaurant,Restaurant,Chinese Restaurant,Clothing Store,Caribbean Restaurant,Café,Dim Sum Restaurant
5,M3C,North York,Don Mills,43.7259,-79.340923,0.0,Gym,Beer Store,Coffee Shop,Japanese Restaurant,Restaurant,Chinese Restaurant,Clothing Store,Caribbean Restaurant,Café,Dim Sum Restaurant
6,M2H,North York,Hillcrest Village,43.803762,-79.363452,0.0,Golf Course,Pool,Mediterranean Restaurant,Fast Food Restaurant,Dog Run,Diner,College Cafeteria,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
7,M3H,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,0.0,Coffee Shop,Bank,Ice Cream Shop,Middle Eastern Restaurant,Mobile Phone Shop,Pharmacy,Pizza Place,Bridal Shop,Deli / Bodega,Restaurant
8,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,0.0,Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Japanese Restaurant,Juice Bar,Bank,Mobile Phone Shop,Food Court,Jewelry Store
9,M3J,North York,"Northwood Park, York University",43.76798,-79.487262,0.0,Coffee Shop,Furniture / Home Store,Miscellaneous Shop,Caribbean Restaurant,Bar,Massage Studio,Diner,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
10,M2K,North York,Bayview Village,43.786947,-79.385975,0.0,Chinese Restaurant,Café,Bank,Japanese Restaurant,Yoga Studio,Discount Store,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
17,M5M,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,0.0,Sandwich Place,Coffee Shop,Italian Restaurant,Indian Restaurant,Locksmith,Liquor Store,Café,Butcher,Pharmacy,Pizza Place


In [52]:
NY_merged_plus[NY_merged_plus['5th Most Common Venue'].str.contains('Restaurant|Coffee Shop')]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,M3B,North York,Don Mills,43.745906,-79.352188,0.0,Gym,Beer Store,Coffee Shop,Japanese Restaurant,Restaurant,Chinese Restaurant,Clothing Store,Caribbean Restaurant,Café,Dim Sum Restaurant
5,M3C,North York,Don Mills,43.7259,-79.340923,0.0,Gym,Beer Store,Coffee Shop,Japanese Restaurant,Restaurant,Chinese Restaurant,Clothing Store,Caribbean Restaurant,Café,Dim Sum Restaurant
8,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,0.0,Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Japanese Restaurant,Juice Bar,Bank,Mobile Phone Shop,Food Court,Jewelry Store
15,M2M,North York,"Willowdale, Newtonbrook",43.789053,-79.408493,4.0,Park,Discount Store,College Auditorium,College Cafeteria,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Deli / Bodega
19,M2N,North York,"Willowdale, Willowdale East",43.77012,-79.408493,0.0,Ramen Restaurant,Café,Sandwich Place,Pizza Place,Coffee Shop,Plaza,Pet Store,Bubble Tea Shop,Movie Theater,Middle Eastern Restaurant


As we can see, seven neighborhoods are chosen from the dataset. They all belong to cluster 0 and have coffee shops and Asian restaurants among their most common venues.
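The five per-column filters above can be read as one question: does any of the first five venue categories match the keyword pattern? Here is a dependency-free sketch of that logic, using four rows copied from the tables above. The `narrow` keyword list is an assumption about which cuisines count as "Asian or Mexican"; it is not taken from the notebook.

```python
import re

# First five venue categories per neighbourhood, copied from the tables above.
top5 = {
    "Parkwoods": ["Park", "Food & Drink Shop", "Fast Food Restaurant",
                  "Diner", "College Auditorium"],
    "Don Mills": ["Gym", "Beer Store", "Coffee Shop",
                  "Japanese Restaurant", "Restaurant"],
    "Willowdale, Willowdale East": ["Ramen Restaurant", "Café", "Sandwich Place",
                                    "Pizza Place", "Coffee Shop"],
    "Willowdale, Newtonbrook": ["Park", "Discount Store", "College Auditorium",
                                "College Cafeteria", "Comfort Food Restaurant"],
}

# Pattern used in the notebook cells: matches every kind of restaurant.
broad = re.compile(r"Restaurant|Coffee Shop")
# Assumed, narrower keyword list for coffee shops and Asian/Mexican food.
narrow = re.compile(r"Coffee Shop|Chinese|Japanese|Sushi|Ramen|Mexican|Burrito")


def matching(pattern):
    """Names whose top-5 venues contain at least one match for pattern."""
    return [name for name, venues in top5.items()
            if any(pattern.search(v) for v in venues)]


print("broad :", matching(broad))
print("narrow:", matching(narrow))
```

With a DataFrame, the same idea would amount to testing the five venue columns together instead of running one cell per column.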

In [53]:
NY_chosen = NY_merged_plus.iloc[[2,3,8,9,10,17,19]]
NY_chosen.reset_index(drop = True,inplace = True)
NY_chosen

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0.0,Clothing Store,Furniture / Home Store,Boutique,Event Space,Miscellaneous Shop,Coffee Shop,Accessories Store,Women's Store,Vietnamese Restaurant,Distribution Center
1,M3B,North York,Don Mills,43.745906,-79.352188,0.0,Gym,Beer Store,Coffee Shop,Japanese Restaurant,Restaurant,Chinese Restaurant,Clothing Store,Caribbean Restaurant,Café,Dim Sum Restaurant
2,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,0.0,Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Japanese Restaurant,Juice Bar,Bank,Mobile Phone Shop,Food Court,Jewelry Store
3,M3J,North York,"Northwood Park, York University",43.76798,-79.487262,0.0,Coffee Shop,Furniture / Home Store,Miscellaneous Shop,Caribbean Restaurant,Bar,Massage Studio,Diner,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
4,M2K,North York,Bayview Village,43.786947,-79.385975,0.0,Chinese Restaurant,Café,Bank,Japanese Restaurant,Yoga Studio,Discount Store,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
5,M5M,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,0.0,Sandwich Place,Coffee Shop,Italian Restaurant,Indian Restaurant,Locksmith,Liquor Store,Café,Butcher,Pharmacy,Pizza Place
6,M2N,North York,"Willowdale, Willowdale East",43.77012,-79.408493,0.0,Ramen Restaurant,Café,Sandwich Place,Pizza Place,Coffee Shop,Plaza,Pet Store,Bubble Tea Shop,Movie Theater,Middle Eastern Restaurant


## Crime Rate Analysis

To rank the candidates further, we need more data. Safety is important for a newcomer, so in this section we analyze crime rate data.

In [60]:
crime_rate = pd.read_csv('/content/Neighbourhood_Crime_Rates.csv')

In [61]:
crime_rate.head()

Unnamed: 0,OBJECTID,Neighbourhood,Hood_ID,Population,Assault_2014,Assault_2015,Assault_2016,Assault_2017,Assault_2018,Assault_2019,Assault_AVG,Assault_CHG,Assault_Rate_2019,AutoTheft_2014,AutoTheft_2015,AutoTheft_2016,AutoTheft_2017,AutoTheft_2018,AutoTheft_2019,AutoTheft_AVG,AutoTheft_CHG,AutoTheft_Rate_2019,BreakandEnter_2014,BreakandEnter_2015,BreakandEnter_2016,BreakandEnter_2017,BreakandEnter_2018,BreakandEnter_2019,BreakandEnter_AVG,BreakandEnter_CHG,BreakandEnter_Rate_2019,Homicide_2014,Homicide_2015,Homicide_2016,Homicide_2017,Homicide_2018,Homicide_2019,Homicide_AVG,Homicide_CHG,Homicide_Rate_2019,Robbery_2014,Robbery_2015,Robbery_2016,Robbery_2017,Robbery_2018,Robbery_2019,Robbery_AVG,Robbery_CHG,Robbery_Rate_2019,TheftOver_2014,TheftOver_2015,TheftOver_2016,TheftOver_2017,TheftOver_2018,TheftOver_2019,TheftOver_AVG,TheftOver_CHG,TheftOver_Rate_2019,Shape__Area,Shape__Length
0,1,Yonge-St.Clair,97,12528,20,29,39,27,34,37,31.0,0.09,295.3,2,3,7,2,6,6,4.3,0.0,47.9,37,20,12,19,24,28,23.3,0.17,223.5,0,0,0,0,0,0,0.0,0.0,0.0,6,5,6,8,5,4,5.7,-0.2,31.9,4,5,8,0,3,6,4.3,1.0,47.9,1161315.0,5873.270582
1,2,York University Heights,27,27593,271,296,361,344,357,370,333.2,0.04,1340.9,105,100,105,92,92,144,106.3,0.57,521.9,107,139,98,105,122,108,113.2,-0.11,391.4,1,0,2,1,1,0,0.8,-1.0,0.0,59,84,70,75,88,79,75.8,-0.1,286.3,30,46,37,39,38,28,36.3,-0.26,101.5,13246660.0,18504.777326
2,3,Lansing-Westgate,38,16164,44,80,68,85,75,72,70.7,-0.04,445.4,19,22,27,26,16,32,23.7,1.0,198.0,34,27,41,42,50,39,38.8,-0.22,241.3,0,0,0,0,10,0,1.7,-1.0,0.0,11,5,9,17,35,11,14.7,-0.69,68.1,4,5,5,11,6,11,7.0,0.83,68.1,5346186.0,11112.109625
3,4,Yorkdale-Glen Park,31,14804,106,136,174,161,175,209,160.2,0.19,1411.8,63,53,41,52,63,61,55.5,-0.03,412.1,51,57,66,58,64,84,63.3,0.31,567.4,1,1,1,1,2,1,1.2,-0.5,6.8,23,21,24,35,44,42,31.5,-0.05,283.7,23,14,26,23,20,29,22.5,0.45,195.9,6038326.0,10079.42692
4,5,Stonegate-Queensway,16,25051,88,71,76,95,87,82,83.2,-0.06,327.3,34,29,12,32,31,34,28.7,0.1,135.7,71,45,49,49,39,64,52.8,0.64,255.5,0,0,0,0,0,0,0.0,0.0,0.0,21,14,16,26,25,22,20.7,-0.12,87.8,7,8,4,6,7,4,6.0,-0.43,16.0,7946202.0,11853.189878


Because neighborhood names differ in this data set, we use the `str.contains` function to look for the neighborhoods selected above.

In [62]:
crime_data = crime_rate[crime_rate['Neighbourhood'].str.contains('Lawrence|Don Mills|Don|York|Northwood|Bayview|Willowdale')]
crime_data

Unnamed: 0,OBJECTID,Neighbourhood,Hood_ID,Population,Assault_2014,Assault_2015,Assault_2016,Assault_2017,Assault_2018,Assault_2019,Assault_AVG,Assault_CHG,Assault_Rate_2019,AutoTheft_2014,AutoTheft_2015,AutoTheft_2016,AutoTheft_2017,AutoTheft_2018,AutoTheft_2019,AutoTheft_AVG,AutoTheft_CHG,AutoTheft_Rate_2019,BreakandEnter_2014,BreakandEnter_2015,BreakandEnter_2016,BreakandEnter_2017,BreakandEnter_2018,BreakandEnter_2019,BreakandEnter_AVG,BreakandEnter_CHG,BreakandEnter_Rate_2019,Homicide_2014,Homicide_2015,Homicide_2016,Homicide_2017,Homicide_2018,Homicide_2019,Homicide_AVG,Homicide_CHG,Homicide_Rate_2019,Robbery_2014,Robbery_2015,Robbery_2016,Robbery_2017,Robbery_2018,Robbery_2019,Robbery_AVG,Robbery_CHG,Robbery_Rate_2019,TheftOver_2014,TheftOver_2015,TheftOver_2016,TheftOver_2017,TheftOver_2018,TheftOver_2019,TheftOver_AVG,TheftOver_CHG,TheftOver_Rate_2019,Shape__Area,Shape__Length
1,2,York University Heights,27,27593,271,296,361,344,357,370,333.2,0.04,1340.9,105,100,105,92,92,144,106.3,0.57,521.9,107,139,98,105,122,108,113.2,-0.11,391.4,1,0,2,1,1,0,0.8,-1.0,0.0,59,84,70,75,88,79,75.8,-0.1,286.3,30,46,37,39,38,28,36.3,-0.26,101.5,13246660.0,18504.777326
3,4,Yorkdale-Glen Park,31,14804,106,136,174,161,175,209,160.2,0.19,1411.8,63,53,41,52,63,61,55.5,-0.03,412.1,51,57,66,58,64,84,63.3,0.31,567.4,1,1,1,1,2,1,1.2,-0.5,6.8,23,21,24,35,44,42,31.5,-0.05,283.7,23,14,26,23,20,29,22.5,0.45,195.9,6038326.0,10079.42692
9,10,Danforth East York,59,17180,64,50,44,74,80,83,65.8,0.04,483.1,6,7,12,10,12,9,9.3,-0.25,52.4,31,22,43,25,18,24,27.2,0.33,139.7,0,0,0,0,0,0,0.0,0.0,0.0,2,6,5,8,8,6,5.8,-0.25,34.9,1,3,4,0,2,7,2.8,2.5,40.7,2188598.0,7623.857816
45,46,Bayview Woods-Steeles,49,13154,38,42,35,33,49,45,40.3,-0.08,342.1,16,7,18,22,7,18,14.7,1.57,136.8,22,34,33,38,28,20,29.2,-0.29,152.0,0,0,0,0,0,1,0.2,1.0,7.6,4,2,0,4,7,5,3.7,-0.29,38.0,3,1,1,3,2,2,2.0,0.0,15.2,4088934.0,8253.154502
49,50,Old East York,58,9233,46,50,37,39,59,46,46.2,-0.22,498.2,3,4,6,0,9,4,4.3,-0.56,43.3,21,11,26,17,28,13,19.3,-0.54,140.8,0,0,1,0,1,0,0.3,-1.0,0.0,17,5,6,4,6,8,7.7,0.33,86.6,4,3,2,2,0,3,2.3,3.0,32.5,2349933.0,7485.024779
81,82,Englemount-Lawrence,32,22372,88,83,129,110,98,116,104.0,0.18,518.5,29,23,23,32,41,23,28.5,-0.44,102.8,45,41,59,47,52,33,46.2,-0.37,147.5,0,0,0,0,1,0,0.2,-1.0,0.0,31,14,44,22,11,16,23.0,0.45,71.5,5,2,7,10,7,7,6.3,0.0,31.3,3478190.0,8311.869028
84,85,Banbury-Don Mills,42,27695,61,78,84,109,77,74,80.5,-0.04,267.2,19,20,7,16,27,42,21.8,0.56,151.7,65,87,57,64,85,81,73.2,-0.05,292.5,0,0,0,0,0,0,0.0,0.0,0.0,13,29,21,10,7,10,15.0,0.43,36.1,10,14,11,5,8,14,10.3,0.75,50.6,10041550.0,18165.12392
95,96,Willowdale East,51,50434,127,114,145,148,172,168,145.7,-0.02,333.1,30,24,46,23,37,57,36.2,0.54,113.0,87,75,78,95,76,88,83.2,0.16,174.5,0,0,0,0,0,0,0.0,0.0,0.0,23,32,29,48,54,40,37.7,-0.26,79.3,9,10,13,18,14,24,14.7,0.71,47.6,5061016.0,9482.383478
96,97,Willowdale West,37,16936,86,116,97,92,107,112,101.7,0.05,661.3,10,15,12,14,15,31,16.2,1.07,183.0,30,16,38,30,35,21,28.3,-0.4,124.0,0,0,0,1,0,0,0.2,0.0,0.0,12,4,18,24,37,28,20.5,-0.24,165.3,4,7,4,4,6,10,5.8,0.67,59.0,2883462.0,7476.939622
107,108,Parkwoods-Donalda,45,34805,155,152,181,170,142,158,159.7,0.11,454.0,33,34,26,44,20,32,31.5,0.6,91.9,70,61,46,68,78,68,65.2,-0.13,195.4,0,0,0,0,1,1,0.3,0.0,2.9,17,23,32,47,22,36,29.5,0.64,103.4,6,6,4,10,6,5,6.2,-0.17,14.4,7464327.0,13452.194759


Rows 3, 9, 49, 107 and 116 are clearly irrelevant, so we focus on the rest. Let's select the rate columns and drop the others.
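The irrelevant rows come from short keywords such as `Don` and `York` matching inside longer names. The sketch below traces which keyword matched each neighbourhood name shown in the table above; the helper name `matched_tokens` is hypothetical.

```python
import re

# Neighbourhood names returned by the str.contains filter above.
names = [
    "York University Heights", "Yorkdale-Glen Park", "Danforth East York",
    "Bayview Woods-Steeles", "Old East York", "Englemount-Lawrence",
    "Banbury-Don Mills", "Willowdale East", "Willowdale West",
    "Parkwoods-Donalda",
]

# The alternatives of the pattern used in the notebook, one token each.
tokens = ["Lawrence", "Don Mills", "Don", "York", "Northwood", "Bayview", "Willowdale"]


def matched_tokens(name):
    """Return every keyword that occurs somewhere in the name."""
    return [t for t in tokens if re.search(t, name)]


hits = {name: matched_tokens(name) for name in names}
for name, found in hits.items():
    print(f"{name:26s} <- {found}")
```

Rows that match only through `Don` or `York` (for example Parkwoods-Donalda or Danforth East York) are the ones to review by hand.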

In [65]:
crime_rate = crime_data.iloc[:,[1,12,21,30,39,48,57]].copy()  # .copy() avoids SettingWithCopyWarning
rate_cols = crime_rate.columns[1:]
# sum of the six rates, each scaled by its column maximum (a sum, not an average, despite the name)
crime_rate['crime_rate_avg'] = (crime_rate[rate_cols] / crime_rate[rate_cols].max()).sum(axis=1)
crime_rate


Unnamed: 0,Neighbourhood,Assault_Rate_2019,AutoTheft_Rate_2019,BreakandEnter_Rate_2019,Homicide_Rate_2019,Robbery_Rate_2019,TheftOver_Rate_2019,crime_rate_avg
1,York University Heights,1340.9,521.9,391.4,0.0,286.3,101.5,4.157715
3,Yorkdale-Glen Park,1411.8,412.1,567.4,6.8,283.7,195.9,5.67527
9,Danforth East York,483.1,52.4,139.7,0.0,34.9,40.7,1.01846
45,Bayview Woods-Steeles,342.1,136.8,152.0,7.6,38.0,15.2,1.982641
49,Old East York,498.2,43.3,140.8,0.0,86.6,32.5,1.152379
81,Englemount-Lawrence,518.5,102.8,147.5,0.0,71.5,31.3,1.233705
84,Banbury-Don Mills,267.2,151.7,292.5,0.0,36.1,50.6,1.379827
95,Willowdale East,333.1,113.0,174.5,0.0,79.3,47.6,1.279963
96,Willowdale West,661.3,183.0,124.0,0.0,165.3,59.0,1.916132
107,Parkwoods-Donalda,454.0,91.9,195.4,2.9,103.4,14.4,1.658286
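To make the `crime_rate_avg` column easier to check, here is the same arithmetic worked by hand for two rows of the table above: each 2019 rate is divided by the largest value in its column and the six ratios are added up. The column maxima are read off the table; the function name `score` is hypothetical.

```python
# Largest 2019 rate in each column of the table above.
rate_max = {
    "Assault": 1411.8, "AutoTheft": 521.9, "BreakandEnter": 567.4,
    "Homicide": 7.6, "Robbery": 286.3, "TheftOver": 195.9,
}

# Two rows copied from the table above.
rows = {
    "York University Heights": {
        "Assault": 1340.9, "AutoTheft": 521.9, "BreakandEnter": 391.4,
        "Homicide": 0.0, "Robbery": 286.3, "TheftOver": 101.5,
    },
    "Willowdale East": {
        "Assault": 333.1, "AutoTheft": 113.0, "BreakandEnter": 174.5,
        "Homicide": 0.0, "Robbery": 79.3, "TheftOver": 47.6,
    },
}


def score(rates):
    """Sum of the six rates, each scaled to [0, 1] by its column maximum."""
    return sum(rates[k] / rate_max[k] for k in rate_max)


scores = {name: score(rates) for name, rates in rows.items()}
for name, value in scores.items():
    print(f"{name}: {value:.6f}")
```

Because every crime type is scaled to the same range, the score ranges from 0 to 6 and weights a homicide rate as heavily as a theft rate. That is a modelling choice worth keeping in mind when reading the ranking.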


From the table above we can draw some important conclusions:

The York University neighborhood is dangerous, so avoid moving to that area.

Among the rest, Fairview, Lawrence and Don Mills look safe. Bayview and Willowdale are not as good, but still acceptable.

Note: the Fairview neighborhood is called Don Valley Village in this data set.

## Traffic Analysis

Since the customer needs to take the subway, let's calculate the distance between each neighborhood and its closest subway station. In this section we scrape subway station data from Wikipedia.

In [66]:
url_metro = 'https://en.wikipedia.org/wiki/List_of_Toronto_subway_stations'

In [67]:
wiki_url_metro = requests.get(url_metro)
wiki_data_metro = pd.read_html(wiki_url_metro.text)
wiki_data_metro

[                         0
 0  Line 1 Yonge–University
 1    Line 2 Bloor–Danforth
 2       Line 3 Scarborough
 3          Line 4 Sheppard,               Station  ...  Accessible[3]
 0               Finch  ...            Yes
 1   North York Centre  ...            Yes
 2      Sheppard–Yonge  ...            Yes
 3          York Mills  ...            Yes
 4            Lawrence  ...             No
 ..                ...  ...            ...
 70            McCowan  ...             No
 71            Bayview  ...            Yes
 72          Bessarion  ...            Yes
 73             Leslie  ...            Yes
 74          Don Mills  ...            Yes
 
 [75 rows x 10 columns],                    0
 0    Line 5 Eglinton
 1  Line 6 Finch West,                    Station  ...                     Proposed transfers[17]
 0             Mount Dennis  ...  TTC buses Kitchener Union Pearson Express
 1               Keelesdale  ...                                  TTC buses
 2                Caledo

In [68]:
metro = wiki_data_metro[1]
df_metro = pd.DataFrame(data = metro['Station'])
df_metro

Unnamed: 0,Station
0,Finch
1,North York Centre
2,Sheppard–Yonge
3,York Mills
4,Lawrence
...,...
70,McCowan
71,Bayview
72,Bessarion
73,Leslie


Use the Nominatim geolocator to locate the subway stations.

In [69]:
Latitude = []
Longitude = []

# create the geocoder once instead of on every iteration
geolocator = Nominatim(user_agent="Toronto explorer")
for station in df_metro['Station']:
    address = '{},Toronto'.format(station)
    location = geolocator.geocode(address)
    Latitude.append(location.latitude)
    Longitude.append(location.longitude)
print(Latitude, Longitude)

[43.7812974, 43.7686787, 43.7614518, 43.7440391, 43.7263483, 43.7061229, 43.697936, 43.6879512, 43.6816776, 43.6783556, 43.6707855, 43.6655242, 43.6606617, 43.6565367, 43.6529083, 43.6489494, 43.6456424, 43.6477917, 43.6508016, 43.6548199, 43.659659, 43.6670924, 43.6684465, 43.6685404, 43.6747046, 43.6849986, 43.7000172, 43.7087117, 43.7152827, 43.7246418, 43.7344762, 43.7492988, 43.7534325, 43.764653, 43.7743325, 43.776858000000004, -34.8899421, 43.786329800000004, 43.6375928, 43.645335, 43.6481827, 43.649826, 43.6499177, 43.6517026, 43.6538668, 43.655569, 43.6568809, 43.6588892, 43.6602019, 43.6623836, 43.6641106, 43.6655189, 43.6701851, 43.6722828, 43.673509, 43.6769999, 43.67819075, 43.67996425, 43.6809856, 43.68253285, 43.68414715, 43.6863861, 43.689040750000004, 43.6948696, 43.7111204, 43.732429, 43.7504987, 43.7668074, 43.7704537, 43.7475018, 43.7749091, 43.7667987, 43.7691308, 43.7710725, 43.775347] [-79.4158993, -79.4126298, -79.4109148, -79.406657, -79.4024743, -79.39853, -79
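The printed latitudes include one value, -34.8899421, that is nowhere near Toronto, so at least one station name was geocoded to a different place. A minimal sanity check is sketched below on an excerpt of the output; the latitude bounds are an assumed rough bounding box for Toronto, not values from the notebook.

```python
# Excerpt of the latitudes printed above, around the suspicious value.
latitudes = [43.7743325, 43.776858, -34.8899421, 43.7863298]

# Assumed rough latitude range for the Toronto area.
LAT_MIN, LAT_MAX = 43.5, 43.9

# Collect (position, latitude) pairs that fall outside the expected range.
suspect = [(i, lat) for i, lat in enumerate(latitudes)
           if not LAT_MIN <= lat <= LAT_MAX]
print(suspect)
```

A mis-geocoded station this far away cannot be the nearest one to any North York neighbourhood, so it does not change the closest-station results, but it would distort the map and any average computed over stations.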

In [70]:
df_metro['Latitude'] = Latitude
df_metro['Longitude'] = Longitude
df_metro.head()

Unnamed: 0,Station,Latitude,Longitude
0,Finch,43.781297,-79.415899
1,North York Centre,43.768679,-79.41263
2,Sheppard–Yonge,43.761452,-79.410915
3,York Mills,43.744039,-79.406657
4,Lawrence,43.726348,-79.402474


Data visualization

In [71]:
map_metro = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, station in zip(df_metro['Latitude'], df_metro['Longitude'],df_metro['Station']):
    label = '{}'.format(station)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_metro)  
    
map_metro

Define a function to calculate the distance from a point to its closest subway station

In [72]:
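def Cal_distance(p,df):
  # p is a [latitude, longitude] pair; df needs 'Station', 'Latitude' and 'Longitude' columns
  d_set = []
  for i in range(len(df)):
    d = geodesic((p[0],p[1]),(df['Latitude'][i],df['Longitude'][i])).miles  # distance in miles
    d_set.append(d)
  min_d = min(d_set)
  metro_index = d_set.index(min_d)
  # return the closest station name and its distance in miles, rounded to 3 decimals
  return df['Station'][metro_index],round(min_d,3)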
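As a dependency-free cross-check of the geodesic distance, the haversine formula on a spherical Earth can be sketched as follows. This is a swapped-in approximation, not the ellipsoidal method that `geodesic` uses, and the function name `haversine_km` is hypothetical. The coordinates are those of Willowdale East and North York Centre station as they appear in this notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088   # mean Earth radius
KM_PER_MILE = 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Willowdale East (M2N) and North York Centre station.
km = haversine_km(43.77012, -79.408493, 43.768679, -79.41263)
miles = km / KM_PER_MILE
print(f"{km:.3f} km = {miles:.3f} miles")
```

The spherical result should land close to the geodesic one for distances this short, which makes it a convenient check on both the value and the unit that `Cal_distance` returns.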




Create a dataframe to collect each neighborhood's name, latitude, longitude, closest subway station and distance (in miles, since `Cal_distance` uses `.miles`).

In [73]:
df_neig =pd.DataFrame(columns=('Neighborhood','Latitude','Longitude','Closest','Distance'))
df_neig['Neighborhood'] = NY_chosen['Neighbourhood']
df_neig['Latitude'] = NY_chosen['Latitude']
df_neig['Longitude'] = NY_chosen['Longitude']
df_neig

Unnamed: 0,Neighborhood,Latitude,Longitude,Closest,Distance
0,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,,
1,Don Mills,43.745906,-79.352188,,
2,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,,
3,"Northwood Park, York University",43.76798,-79.487262,,
4,Bayview Village,43.786947,-79.385975,,
5,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,,
6,"Willowdale, Willowdale East",43.77012,-79.408493,,


In [74]:
Closest_list = []
Dis_list = []
for Lat,Lon in zip(df_neig['Latitude'],df_neig['Longitude']):
  p = [Lat,Lon]
  closest,dis = Cal_distance(p,df_metro)
  Closest_list.append(closest)
  Dis_list.append(dis)


In [75]:
df_neig['Closest'] = Closest_list
df_neig['Distance'] = Dis_list
df_neig

Unnamed: 0,Neighborhood,Latitude,Longitude,Closest,Distance
0,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,Yorkdale,0.962
1,Don Mills,43.745906,-79.352188,Leslie,1.863
2,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,Don Mills,0.221
3,"Northwood Park, York University",43.76798,-79.487262,Finch West,0.305
4,Bayview Village,43.786947,-79.385975,Bessarion,1.317
5,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,Lawrence,0.989
6,"Willowdale, Willowdale East",43.77012,-79.408493,North York Centre,0.23


Willowdale East, Fairview and York University are close to a subway station. Lawrence Manor, Don Mills, Bayview Village and Bedford Park are not good enough: the table reports miles, so even the smallest of these values, 0.962 miles, is about 1.5 km, well over 1 km.
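To make the comparison against the 1 km threshold explicit, the distances in the table can be converted from miles to kilometres. The sketch below uses the values from the table above; the variable names are hypothetical.

```python
KM_PER_MILE = 1.609344

# Distances in miles, copied from the table above.
distance_miles = {
    "Lawrence Manor, Lawrence Heights": 0.962,
    "Don Mills": 1.863,
    "Fairview, Henry Farm, Oriole": 0.221,
    "Northwood Park, York University": 0.305,
    "Bayview Village": 1.317,
    "Bedford Park, Lawrence Manor East": 0.989,
    "Willowdale, Willowdale East": 0.23,
}

distance_km = {name: round(m * KM_PER_MILE, 2) for name, m in distance_miles.items()}
within_1km = [name for name, km in distance_km.items() if km <= 1.0]

for name, km in distance_km.items():
    print(f"{name:36s} {km:5.2f} km")
print("within 1 km:", within_1km)
```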

## Results

According to the analysis, the York University neighborhood is not safe, so we will not consider it. Among the remaining candidates, the Fairview neighborhood is close to a subway station and is safe according to the crime records. Willowdale is the second choice: it is also close to a subway station, but the west part of Willowdale is less safe. Don Mills, Bayview Village and Lawrence are too far from a subway station, so they are weaker choices. Overall, the Fairview neighborhood is the best choice.

## Conclusion

This project helps a person better understand neighborhoods with respect to venues, crime rates and other data. It shows how data analysis can help people live better. Further work could take other factors into account, such as quality of education, supermarkets, highways, cost of living and house prices.