# Capstone Project - The Battle of Neighborhoods

## 1. Introduction

In this section, I present the background to the problem and the main business problem that this project addresses.

### 1.1 Background

Parents always want to choose an appropriate school for their children, so they tend to ask friends, family, and even neighbors to help them make the right decision.
However, this becomes a challenge when a family moves to another city. At that point, parents do not have enough information about the schools in that city, and they may not know anyone there. When moving, the first step is to choose a neighborhood to live in. That choice depends on several factors, such as the type of housing and the distance from the workplace; the most important is the availability of schools in that neighborhood or a nearby one. Parents therefore need to explore schools neighborhood by neighborhood.

### 1.2 Business Problem

The main goal of this project is to help parents choose the right neighborhood when they move, based on the schools available in that neighborhood. To do this, I explore the neighborhoods of Toronto with respect to the schools available in each one. Schools fall into different categories, such as elementary school, middle school, and high school. The number of schools in each category is presented as well.

# The Methodology

## 2. Data

### 2.1 Import all libraries 

In [124]:
import json # library to handle JSON files

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import requests # library to handle requests
import lxml.html as lh
from bs4 import BeautifulSoup
from urllib.request import urlopen

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#!conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library

### 2.2 Data Source

The list of Toronto neighborhoods and their postal codes comes from Wikipedia:

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

I scraped this table using the BeautifulSoup library.

A second dataset provides the latitude and longitude of each postal code:

http://cocl.us/Geospatial_data

Finally, the schools available in each neighborhood and their categories are retrieved from Foursquare.

## 3. Data Preprocessing

### Scrape the data and save it into a pandas DataFrame

In [125]:
source=requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup=BeautifulSoup(source,'lxml')


# find the table

from IPython.display import display_html
Postal_table = str(soup.table)

### Save the scraped data into a pandas DataFrame and handle missing values by deleting all rows whose borough is 'Not assigned'

In [126]:
# load the table into a pandas DataFrame and drop the 'Not assigned' rows
dfs = pd.read_html(Postal_table)
df=dfs[0]
df = df[df.Borough != 'Not assigned']
df.reset_index(inplace=True)
df.head()

Unnamed: 0,index,Postal code,Borough,Neighborhood
0,2,M3A,North York,Parkwoods
1,3,M4A,North York,Victoria Village
2,4,M5A,Downtown Toronto,Regent Park / Harbourfront
3,5,M6A,North York,Lawrence Manor / Lawrence Heights
4,6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government


### Group neighborhood by postal code

In [127]:
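# join only the text columns; the integer 'index' column left by reset_index cannot be joined as a string
df=df.groupby('Postal code')[['Borough', 'Neighborhood']].agg(lambda x: ','.join(x)) 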
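
The grouping step can be illustrated on a tiny table. This is only a sketch: the rows below are hypothetical, and it uses named aggregation to keep a single borough per postal code while joining the neighborhood names, which is a slightly different choice from joining every column.

```python
import pandas as pd

# Hypothetical rows in which one postal code spans two neighborhoods.
toy = pd.DataFrame({
    "Postal code": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighborhood": ["Regent Park", "Harbourfront", "Parkwoods"],
})

# One row per postal code: keep the first borough, join the neighborhood names.
grouped = toy.groupby("Postal code").agg(
    Borough=("Borough", "first"),
    Neighborhood=("Neighborhood", lambda names: ", ".join(names)),
)
print(grouped)
```

After grouping, the postal code becomes the index of the result, which is why a later merge can still refer to it by name.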

### Read the Geospatial data from csv file

In [128]:
geo_data= pd.read_csv("http://cocl.us/Geospatial_data")
geo_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Merge the original data frame with the Geospatial data frame to get a complete dataset

In [129]:
# merge the data
Merged_data = pd.merge(df,geo_data, how='left', left_on = 'Postal code', right_on = 'Postal Code')
Merged_data.head()

Unnamed: 0,Borough,Neighborhood,Postal Code,Latitude,Longitude
0,Scarborough,Malvern / Rouge,M1B,43.806686,-79.194353
1,Scarborough,Rouge Hill / Port Union / Highland Creek,M1C,43.784535,-79.160497
2,Scarborough,Guildwood / Morningside / West Hill,M1E,43.763573,-79.188711
3,Scarborough,Woburn,M1G,43.770992,-79.216917
4,Scarborough,Cedarbrae,M1H,43.773136,-79.239476
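
Because the merge is a left join, a postal code that is missing from the geospatial file would silently end up with empty coordinates. One possible check, sketched here on hypothetical data (the code `M9Z` and the neighborhood "Hypothetical" are made up), uses the `indicator` option of `pd.merge` to list the unmatched rows.

```python
import pandas as pd

# Hypothetical inputs: the last postal code has no entry in the coordinates table.
left = pd.DataFrame({
    "Postal code": ["M1B", "M1C", "M9Z"],
    "Neighborhood": ["Malvern / Rouge", "Rouge Hill", "Hypothetical"],
})
right = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# indicator=True adds a "_merge" column saying where each row was found.
merged = pd.merge(left, right, how="left",
                  left_on="Postal code", right_on="Postal Code",
                  indicator=True)

# Rows marked "left_only" have no coordinates.
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal code"].tolist()
print(unmatched)
```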


## 4. Explore The Neighborhoods

### Get the coordinate information of Toronto 

In [130]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


### Create a map of the Neighborhoods using folium

In [131]:
# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(Merged_data['Latitude'], Merged_data['Longitude'], Merged_data['Borough'], Merged_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#ccc731',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

## 5. Explore Schools in the Neighborhoods

### Assign Foursquare access information

In [132]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (never publish the real value)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (never publish the real value)
VERSION = '20200422' # Foursquare API version

### Get the information from Foursquare

According to the Foursquare website, the category ID for schools is 4bf58dd8d48988d13b941735, so it is included in the request URL.

In [133]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):  
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        
        url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4bf58dd8d48988d13b941735&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
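
The function above assumes that every response contains at least one group and that every venue has at least one category. The sketch below shows a more defensive way to pull the same fields out of a response. The payload is a mock whose shape is inferred only from the keys that `getNearbyVenues` reads, not from the Foursquare documentation, and `extract_venues` is a hypothetical helper.

```python
# Mock payload shaped like the fields getNearbyVenues reads (an assumption).
mock_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example High School",
                   "location": {"lat": 43.78, "lng": -79.30},
                   "categories": [{"name": "High School"}]}},
        {"venue": {"name": "Uncategorised Venue",
                   "location": {"lat": 43.79, "lng": -79.31},
                   "categories": []}},
    ]}]}
}


def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing groups or categories."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # A venue without categories gets None instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


rows = extract_venues(mock_payload)
print(rows)
```

With this kind of helper, a neighborhood that returns no venues yields an empty list rather than an exception.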

### Retrieve the schools near each neighborhood

In [134]:

Toronto_venues = getNearbyVenues(names=Merged_data['Neighborhood'],
                                   latitudes=Merged_data['Latitude'],
                                   longitudes=Merged_data['Longitude']
                                  )

Malvern / Rouge
Rouge Hill / Port Union / Highland Creek
Guildwood / Morningside / West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park / Ionview / East Birchmount Park
Golden Mile / Clairlea / Oakridge
Cliffside / Cliffcrest / Scarborough Village West
Birch Cliff / Cliffside West
Dorset Park / Wexford Heights / Scarborough Town Centre
Wexford / Maryvale
Agincourt
Clarks Corners / Tam O'Shanter / Sullivan
Milliken / Agincourt North / Steeles East / L'Amoreaux East
Steeles West / L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview / Henry Farm / Oriole
Bayview Village
York Mills / Silver Hills
Willowdale / Newtonbrook
Willowdale
York Mills West
Willowdale
Parkwoods
Don Mills
Don Mills
Bathurst Manor / Wilson Heights / Downsview North
Northwood Park / York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill / Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West / Riverdale
India Bazaar / The Beaches 

In [135]:
print(Toronto_venues.shape)
Toronto_venues.head()

(197, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Kennedy Park / Ionview / East Birchmount Park,43.727929,-79.262029,St. Maria Goretti Catholic School,43.730201,-79.266135,School
1,Kennedy Park / Ionview / East Birchmount Park,43.727929,-79.262029,SCAS,43.728427,-79.256372,School
2,Clarks Corners / Tam O'Shanter / Sullivan,43.781638,-79.304302,Scarborough Pauline Johnson YMCA Before and Af...,43.78502,-79.303703,Daycare
3,Clarks Corners / Tam O'Shanter / Sullivan,43.781638,-79.304302,Stephen Leacock Collegiate Institute - Gym,43.785057,-79.300964,High School
4,Steeles West / L'Amoreaux West,43.799525,-79.318389,Scarborough Beverly Glen YMCA Before and After...,43.798805,-79.323136,Daycare


Let's find out how many unique categories can be curated from all the returned venues

In [136]:
print('There are {} unique categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 31 unique categories.


## 6. Analyze Each Neighborhood

In [137]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_onehot['Neighborhood'] = Toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Unnamed: 0,Neighborhood,Adult Education Center,Breakfast Spot,Building,Church,College & University,College Academic Building,College Administrative Building,College Arts Building,College Classroom,College Technology Building,Community College,Daycare,Elementary School,Flight School,General College & University,Gym / Fitness Center,High School,Language School,Medical School,Middle School,Miscellaneous Shop,Movie Theater,Music School,Non-Profit,Office,Paper / Office Supplies Store,Performing Arts Venue,School,Student Center,Trade School,University
0,Kennedy Park / Ionview / East Birchmount Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
1,Kennedy Park / Ionview / East Birchmount Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
2,Clarks Corners / Tam O'Shanter / Sullivan,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Clarks Corners / Tam O'Shanter / Sullivan,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Steeles West / L'Amoreaux West,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [138]:
Toronto_grouped = Toronto_onehot.groupby('Neighborhood').mean().reset_index()
Toronto_grouped

Unnamed: 0,Neighborhood,Adult Education Center,Breakfast Spot,Building,Church,College & University,College Academic Building,College Administrative Building,College Arts Building,College Classroom,College Technology Building,Community College,Daycare,Elementary School,Flight School,General College & University,Gym / Fitness Center,High School,Language School,Medical School,Middle School,Miscellaneous Shop,Movie Theater,Music School,Non-Profit,Office,Paper / Office Supplies Store,Performing Arts Venue,School,Student Center,Trade School,University
0,Alderwood / Long Branch,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
1,Bedford Park / Lawrence Manor East,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
2,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0
3,Brockton / Parkdale Village / Exhibition Place,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.0,0.0,0.0
4,Business reply mail Processing CentrE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
5,CN Tower / King and Spadina / Railway Lands / ...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.125,0.0,0.125
7,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0
9,Clarks Corners / Tam O'Shanter / Sullivan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
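
To see why the grouped values can be read as frequencies, here is a minimal sketch on hypothetical data: neighborhood "A" has three venues tagged School and one tagged Breakfast Spot, so the mean of its one-hot rows is 0.75 and 0.25, and each row of the result sums to 1.

```python
import pandas as pd

# Hypothetical venue list: four venues in "A", one in "B".
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B"],
    "Venue Category": ["School", "School", "School", "Breakfast Spot", "High School"],
})

# One column per category, 1 where the venue belongs to it.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of 0/1 indicators is the share of venues in each category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```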


Print each neighborhood along with the top 5 most common venues

In [139]:
num_top_venues = 5

for hood in Toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp =Toronto_grouped[Toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alderwood / Long Branch----
                    venue  freq
0                  School   1.0
1  Adult Education Center   0.0
2             High School   0.0
3            Trade School   0.0
4          Student Center   0.0


----Bedford Park / Lawrence Manor East----
                    venue  freq
0                  School   1.0
1  Adult Education Center   0.0
2             High School   0.0
3            Trade School   0.0
4          Student Center   0.0


----Berczy Park----
                    venue  freq
0          Student Center   0.5
1                  School   0.5
2  Adult Education Center   0.0
3             High School   0.0
4            Trade School   0.0


----Brockton / Parkdale Village / Exhibition Place----
                    venue  freq
0                  School  0.75
1          Breakfast Spot  0.25
2  Adult Education Center  0.00
3             High School  0.00
4            Trade School  0.00


----Business reply mail Processing CentrE----
                    venue  f

### Extract and display the top 5 venues for each neighborhood

In [140]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [141]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Toronto_grouped['Neighborhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Alderwood / Long Branch,School,University,General College & University,Breakfast Spot,Building
1,Bedford Park / Lawrence Manor East,School,University,General College & University,Breakfast Spot,Building
2,Berczy Park,Student Center,School,University,Flight School,Breakfast Spot
3,Brockton / Parkdale Village / Exhibition Place,School,Breakfast Spot,University,General College & University,Building
4,Business reply mail Processing CentrE,School,University,General College & University,Breakfast Spot,Building
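
In the table above, a neighborhood whose only venue category is School still lists other categories as its 2nd to 5th most common venues, because categories with a frequency of zero are ranked too. A possible variant that keeps only categories that actually occur is sketched below; `top_nonzero_categories` is a hypothetical helper and the row values are illustrative.

```python
import pandas as pd


def top_nonzero_categories(row, num_top_venues):
    """Return up to num_top_venues category names with frequency > 0, most frequent first."""
    freqs = row.iloc[1:].astype(float)  # skip the Neighborhood entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top_venues])


# Illustrative row with two categories present and two absent.
toy_row = pd.Series({
    "Neighborhood": "Example Neighborhood",
    "Breakfast Spot": 0.25,
    "Church": 0.0,
    "School": 0.75,
    "University": 0.0,
})

top = top_nonzero_categories(toy_row, 5)
print(top)
```

The result can be shorter than `num_top_venues`, so a table built from it would need to pad the missing positions, for example with empty strings.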


In [142]:
Toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Alderwood / Long Branch,1,1,1,1,1,1
Bedford Park / Lawrence Manor East,2,2,2,2,2,2
Berczy Park,2,2,2,2,2,2
Brockton / Parkdale Village / Exhibition Place,4,4,4,4,4,4
Business reply mail Processing CentrE,1,1,1,1,1,1
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport,1,1,1,1,1,1
Central Bay Street,8,8,8,8,8,8
Christie,2,2,2,2,2,2
Church and Wellesley,6,6,6,6,6,6
Clarks Corners / Tam O'Shanter / Sullivan,2,2,2,2,2,2


# The Result 

## 7. Cluster Neighborhoods Based on Schools

### Run k-means to cluster the neighborhood into 4 clusters

In [143]:
# set number of clusters
kclusters = 4

Toronto_grouped_clustering = Toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 2, 0, 2, 2, 0, 0, 1, 1, 1], dtype=int32)
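
The number of clusters is fixed at 4 here. One common way to check such a choice, shown only as a sketch on synthetic data rather than on the Toronto frequencies, is to compare silhouette scores for several values of k. The three blobs below are an assumption made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs of 10 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(10, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score gives the most compact, best-separated clusters.
best_k = max(scores, key=scores.get)
print(best_k)
```

Applied to the notebook, `X` would be replaced by `Toronto_grouped_clustering`; whether 4 would come out as the best value there is not something this sketch can show.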

### Create a new dataframe that includes the cluster as well as the top 5 venues for each neighborhood.

In [144]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = Merged_data

# join the cluster labels and top venues onto the neighborhood coordinates; neighborhoods with no returned venues get NaN
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Toronto_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Scarborough,Malvern / Rouge,M1B,43.806686,-79.194353,,,,,,
1,Scarborough,Rouge Hill / Port Union / Highland Creek,M1C,43.784535,-79.160497,,,,,,
2,Scarborough,Guildwood / Morningside / West Hill,M1E,43.763573,-79.188711,,,,,,
3,Scarborough,Woburn,M1G,43.770992,-79.216917,,,,,,
4,Scarborough,Cedarbrae,M1H,43.773136,-79.239476,,,,,,


### Visualize the resulting clusters on a map

In [145]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

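# add markers to the map; neighborhoods without a cluster label (no venues returned) are drawn in gray
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighborhood'], Toronto_merged['Cluster Labels']):
    has_cluster = pd.notna(cluster)
    # color each marker by its cluster instead of using one fixed color
    marker_color = rainbow[int(cluster)] if has_cluster else 'gray'
    cluster_text = str(int(cluster)) if has_cluster else 'none'
    label = folium.Popup(str(poi) + ' Cluster ' + cluster_text, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=marker_color,
        fill=True,
        fill_color=marker_color,
        fill_opacity=0.7).add_to(map_clusters)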
       
map_clusters

In [146]:
# create map of Toronto using latitude and longitude values
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
from folium.plugins import MarkerCluster
grouping = MarkerCluster().add_to(map_clusters)
# add markers to map
for lat, lng, label in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(grouping)  
    
map_clusters

### Examine Clusters

Now, I will examine each cluster and determine the venue categories that distinguish it from the others.

#### Cluster 1

In [147]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
15,Steeles West / L'Amoreaux West,0.0,Daycare,University,General College & University,Breakfast Spot,Building
22,Willowdale,0.0,Student Center,College Administrative Building,College Classroom,High School,University
24,Willowdale,0.0,Student Center,College Administrative Building,College Classroom,High School,University
36,Woodbine Heights,0.0,Church,University,General College & University,Breakfast Spot,Building
40,East Toronto,0.0,Music School,School,High School,College Administrative Building,Daycare
42,India Bazaar / The Beaches West,0.0,College Classroom,University,General College & University,Breakfast Spot,Building
43,Studio District,0.0,School,Trade School,University,Flight School,Breakfast Spot
46,North Toronto West,0.0,College Academic Building,University,General College & University,Breakfast Spot,Building
47,Davisville,0.0,School,High School,Elementary School,University,Flight School
49,Summerhill West / Rathnelly / South Hill / For...,0.0,School,High School,Elementary School,University,Flight School


#### Cluster 2

In [148]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
13,Clarks Corners / Tam O'Shanter / Sullivan,1.0,Daycare,High School,University,General College & University,Breakfast Spot
17,Hillcrest Village,1.0,General College & University,High School,University,Breakfast Spot,Building
18,Fairview / Henry Farm / Oriole,1.0,High School,University,General College & University,Breakfast Spot,Building
20,York Mills / Silver Hills,1.0,High School,University,General College & University,Breakfast Spot,Building
26,Don Mills,1.0,General College & University,High School,University,Breakfast Spot,Building
27,Don Mills,1.0,General College & University,High School,University,Breakfast Spot,Building
44,Lawrence Park,1.0,School,High School,University,Flight School,Breakfast Spot
45,Davisville North,1.0,High School,University,General College & University,Breakfast Spot,Building
52,Church and Wellesley,1.0,High School,School,University,Flight School,Breakfast Spot
75,Christie,1.0,Music School,High School,University,Flight School,Breakfast Spot


#### Cluster 3

In [149]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
6,Kennedy Park / Ionview / East Birchmount Park,2.0,School,University,General College & University,Breakfast Spot,Building
29,Northwood Park / York University,2.0,School,University,General College & University,Breakfast Spot,Building
35,Parkview Hill / Woodbine Gardens,2.0,School,University,General College & University,Breakfast Spot,Building
37,The Beaches,2.0,School,University,General College & University,Breakfast Spot,Building
38,Leaside,2.0,School,University,General College & University,Breakfast Spot,Building
41,The Danforth West / Riverdale,2.0,School,Elementary School,University,General College & University,Breakfast Spot
51,St. James Town / Cabbagetown,2.0,School,University,General College & University,Breakfast Spot,Building
58,Richmond / Adelaide / King,2.0,School,Adult Education Center,Language School,Non-Profit,College Arts Building
59,Harbourfront East / Union Station / Toronto Is...,2.0,School,University,General College & University,Breakfast Spot,Building
62,Bedford Park / Lawrence Manor East,2.0,School,University,General College & University,Breakfast Spot,Building


#### Cluster 4

In [150]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
48,Moore Park / Summerhill East,3.0,Elementary School,University,General College & University,Breakfast Spot,Building
83,Parkdale / Roncesvalles,3.0,Elementary School,University,General College & University,Breakfast Spot,Building


## 8. Results

The result of each phase of this project has been presented above. Parents can now use this data to see the availability of schools in each Toronto neighborhood. They can also see the types of schools and the number of schools in each neighborhood.

The table below shows the top 5 neighborhoods with the highest number of schools. Commerce Court / Victoria Hotel has the highest number, with 15 schools, followed by First Canadian Place / Underground city, Toronto Dominion Centre / Design Exchange, St. James Town, and University of Toronto / Harbord.

In [151]:
Toronto_venues.groupby('Neighborhood').count().sort_values(['Venue'], ascending=False).head()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Commerce Court / Victoria Hotel,15,15,15,15,15,15
First Canadian Place / Underground city,13,13,13,13,13,13
Toronto Dominion Centre / Design Exchange,11,11,11,11,11,11
St. James Town,10,10,10,10,10,10
University of Toronto / Harbord,10,10,10,10,10,10
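
The table repeats the same count in every column because `count()` is applied to each column separately. A more compact alternative, sketched on hypothetical data, uses `size()` to get a single count per neighborhood.

```python
import pandas as pd

# Hypothetical venue list with three, two, and one venue per neighborhood.
venues = pd.DataFrame({
    "Neighborhood": ["A", "B", "A", "C", "A", "B"],
    "Venue": ["v1", "v2", "v3", "v4", "v5", "v6"],
})

# size() counts the rows in each group and returns one number per neighborhood.
counts = (venues.groupby("Neighborhood")
                .size()
                .sort_values(ascending=False)
                .rename("Venue count"))
print(counts.head())
```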


## 9. Discussion

Foursquare proved useful for this kind of project, which depends mainly on location data. However, I noticed that the venues returned under the school category fall into 31 different categories.

I recommend follow-up projects such as the following:
1. A project that focuses on one type of school and compares schools of that type.
2. A project that compares the number of schools with the size of each neighborhood. Is there any relationship between them?
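
A project that focuses on one type of school could start from a simple category filter. The sketch below uses hypothetical venues, and the set of categories treated as primary and secondary schools is an assumption chosen for illustration, not a list taken from Foursquare.

```python
import pandas as pd

# Assumption: the categories that count as primary or secondary schools.
K12_CATEGORIES = {"School", "Elementary School", "Middle School", "High School"}

# Hypothetical venues, including categories that are not schools.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "B"],
    "Venue": ["v1", "v2", "v3", "v4", "v5"],
    "Venue Category": ["High School", "Breakfast Spot", "Elementary School",
                       "Gym / Fitness Center", "School"],
})

# Keep only the venues whose category is in the chosen set.
k12 = venues[venues["Venue Category"].isin(K12_CATEGORIES)]
k12_counts = k12.groupby("Neighborhood").size()
print(k12_counts)
```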


## 10. Conclusion

To conclude, choosing the right school is one of parents' main concerns. The concern grows when parents do not know the area or are moving to a new one, because their choice of neighborhood often depends on the chosen school. This project is therefore designed to help parents explore the schools in Toronto by neighborhood. The results show that Commerce Court / Victoria Hotel has the highest number of schools, with 15. As a recommendation, a similar project could compare the number of schools with the size of each neighborhood.