# CMSteward's Week 3 Class Submission
## Segmenting and Clustering Neighborhoods in Toronto
### Part 1:  Scraping the data from Wikipedia to get postal codes in Toronto
I will load the data into a pandas DataFrame.

In [1]:
#  Importing the necessary libraries
import numpy as np
import pandas as pd

geocoded ='C:/Users/suenc/OneDrive/Training/CourseraDataScience/Capstone/Geospatial_Coordinates.csv'
pcode_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

Reading in the data using pd.read_html, which returns a list of dataframes, and keeping the first one (the postal code table)

In [2]:
#  Reading in the pandas dataframe for the postal code
df_pcode = pd.read_html(pcode_url)[0]
print(df_pcode.shape)
df_pcode.head()

(180, 3)


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Cleaning the data as required by the assignment - removing the rows whose borough is "Not assigned"

In [3]:
#  Just removing the Not assigned boroughs
df_pcode_adj = df_pcode[df_pcode['Borough'] != 'Not assigned'].reset_index(drop = True)
df_pcode_adj.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
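
The filter above can be tried on a small toy frame first. This is a minimal sketch with made-up rows rather than the scraped Wikipedia table; the fallback for an unassigned neighbourhood is an assumption about the cleaning rule, not something the cell above does.

```python
import pandas as pd

# Toy rows that mimic the scraped table; the values are illustrative only.
toy = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# Keep only rows whose borough is assigned, then renumber the index.
assigned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)

# Assumption: an assigned borough with an unassigned neighbourhood
# falls back to the borough name.
mask = assigned["Neighbourhood"] == "Not assigned"
assigned.loc[mask, "Neighbourhood"] = assigned.loc[mask, "Borough"]

print(assigned)
```

In the table shown above this fallback may never trigger; it only matters if a row has a borough but no neighbourhood.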


### Part 2, importing and merging the data with geocoding.
I tried the geocoding process in code first and ran into the bad results described in the instructions.  Instead, this part merges the .csv file provided in the instructions with the adjusted dataframe above.

In [4]:
#  Read the data set into pandas as a new dataframe
df_geo = pd.read_csv(geocoded)
df_geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merging the adjusted dataframe from part one with the new dataframe from part two on the shared Postal Code column, so the postal code appears only once in the result

In [5]:
#  Creating a new dataframe based on df_geo and df_pcode_adj, joined on the shared 'Postal Code' column
df_geo_pcode = pd.merge(df_pcode_adj, df_geo, on='Postal Code')

In [6]:
df_geo_pcode.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [7]:
df_geo_pcode.shape

(103, 5)
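
One way to double check a merge like this is to name the key explicitly and ask pandas to flag rows that found no match. The sketch below uses toy frames; the `M9Z` row and its borough name are hypothetical and exist only to show an unmatched code.

```python
import pandas as pd

# Toy stand-ins for df_pcode_adj and df_geo; "M9Z" is a made-up code.
boroughs = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

merged = pd.merge(
    boroughs,
    coords,
    on="Postal Code",        # join key stated explicitly
    how="left",              # keep every borough row, matched or not
    validate="one_to_one",   # raise if a postal code repeats on either side
    indicator=True,          # adds a "_merge" column describing each row
)

# Rows present on the left but missing coordinates on the right.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["Postal Code"].tolist())
```

With the default inner join, unmatched postal codes are silently dropped, so a check like this is mostly useful when the row count after merging is lower than expected.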

### Part 3, Neighborhood analysis
I am a little uninspired, but here is the analysis. It is kind of a cheat, since it follows the approach used on the New York data.

In [8]:
#  Take a look at the number of boroughs in the data frame
#  A quick check that the data came across right (for comparison, the New York lab data had 5 boroughs and 306 neighborhoods)
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df_geo_pcode['Borough'].unique()),
        df_geo_pcode.shape[0]
    )
)

The dataframe has 10 boroughs and 103 neighborhoods.


First I am going to see which boroughs there are, then work on a smaller subset of them

In [9]:
df_geo_pcode.groupby('Borough').count()

Borough,Postal Code,Neighbourhood,Latitude,Longitude
Central Toronto,9,9,9,9
Downtown Toronto,19,19,19,19
East Toronto,5,5,5,5
East York,5,5,5,5
Etobicoke,12,12,12,12
Mississauga,1,1,1,1
North York,24,24,24,24
Scarborough,17,17,17,17
West Toronto,6,6,6,6
York,5,5,5,5


I kind of like the amount of data in either Toronto or York, so I am going to look at those graphically.
Step 1.  Find the lat and lon of Toronto, then build a map

In [10]:
df_geo_pcode.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [11]:
#  First a few additional libraries, mapping and geocoders
import folium
from geopy.geocoders import Nominatim
import matplotlib.cm as cm  
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# Now set a few variables
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="canada_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

#  Now see what we have ended up with
print(latitude)
print(longitude)

43.6534817
-79.3839347
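
The geocoder's `geocode` call returns `None` when it finds nothing, which would make `location.latitude` fail. Below is a minimal sketch of a fallback wrapper; `coords_or_default` and the stub functions are hypothetical names, and the stubs stand in for the real geocoder so the example runs offline.

```python
from types import SimpleNamespace
from typing import Callable, Optional, Tuple


def coords_or_default(
    geocode: Callable[[str], Optional[object]],
    address: str,
    default: Tuple[float, float],
) -> Tuple[float, float]:
    """Return (latitude, longitude) for address, or default if nothing is found."""
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)


# Offline stubs: one that finds nothing, one that returns a fixed location.
def no_result(_address):
    return None


def fixed_result(_address):
    return SimpleNamespace(latitude=43.7, longitude=-79.4)


# Fallback coordinates taken from the output printed above.
TORONTO = (43.6534817, -79.3839347)

print(coords_or_default(no_result, "Toronto, Canada", TORONTO))
print(coords_or_default(fixed_result, "Toronto, Canada", TORONTO))
```

In the notebook, `geolocator.geocode` could be passed as the first argument in place of the stubs.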


In [12]:
#  Building my map based on the above ideas
# create map of Toronto using latitude and longitude values from the above geolocation updating folium
toronto_map_test = folium.Map(location=[latitude, longitude], zoom_start=11)

toronto_map_test

Now I just want to see how the boroughs group together.

In [13]:
# add markers to map
for lat, lng, borough, neighborhood in zip(df_geo_pcode['Latitude'], df_geo_pcode['Longitude'], df_geo_pcode['Borough'], df_geo_pcode['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map_test)  
    
toronto_map_test

After looking at the map above, it feels like the best-grouped data set would be the boroughs with Toronto in their name.  Geographically they sit closer together.  So the next step is to create a dataset with only the Toronto boroughs and run it again.

In [14]:
#  First create the toronto dataset
df_toronto = df_geo_pcode[df_geo_pcode['Borough'].str.contains('Toronto')].reset_index(drop = True)
df_toronto


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259


In [15]:
# create map of Toronto using latitude and longitude values from the above geolocation updating folium
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=12)

toronto_map

In [16]:
#  Now a quick graph to double check
# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  
    

print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))
toronto_map

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


This makes a nice package of Toronto neighborhoods (sorry, I spell it the United States way).  Next I am importing Foursquare data.  The client ID and secret are hidden from the code.

In [18]:
#  The previous cell, which sets CLIENT_ID and CLIENT_SECRET, is removed
#  These are additional libraries I will be using soon
import json # Library to handle JSON files
import requests

#  Here we have the remaining parts of the Foursquare pull
VERSION = '20180604'
LIMIT = 200


In [19]:
#  Borrowing the getNearbyVenues from the lab
def getNearbyVenues(names, latitudes, longitudes, radius=300):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
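
The function above indexes straight into the response, so a neighborhood with no groups or a venue with no category would raise an error. Here is a hedged sketch of a more tolerant parsing step. `extract_venues` is a hypothetical helper, and the fake payload only mirrors the keys the function above reads; it is not a real API response.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into row tuples, tolerating missing keys."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Some venues may carry no category; record None instead of failing.
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            category,
        ))
    return rows


# Fake payload for illustration only; venue names and values are made up.
fake = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe A", "location": {"lat": 43.65, "lng": -79.36},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Mystery B", "location": {"lat": 43.66, "lng": -79.37},
               "categories": []}},
]}]}}

rows = extract_venues(fake, "Toy Neighborhood", 43.654, -79.360)
empty = extract_venues({}, "Empty Neighborhood", 0.0, 0.0)
print(rows)
print(empty)
```

Rows with a `None` category would still need to be filtered before any string matching on 'Venue Category'.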

In [20]:
#  Now to pull in the data using the above function
toronto_venues = getNearbyVenues(names = df_toronto['Neighbourhood'],
                                 latitudes = df_toronto['Latitude'],
                                 longitudes = df_toronto['Longitude']
                                 )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West, Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
R

In [21]:
#  Checking the new dataframe shape, size, etc...
print(toronto_venues.shape)
print(toronto_venues.dtypes)
toronto_venues.head()

(893, 7)
Neighborhood               object
Neighborhood Latitude     float64
Neighborhood Longitude    float64
Venue                      object
Venue Latitude            float64
Venue Longitude           float64
Venue Category             object
dtype: object


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
3,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


In [22]:
#  Out of curiosity I wanted to see how many neighborhoods there were
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,7,7,7,7,7,7
"Brockton, Parkdale Village, Exhibition Place",13,13,13,13,13,13
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",7,7,7,7,7,7
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",5,5,5,5,5,5
Central Bay Street,35,35,35,35,35,35
Christie,7,7,7,7,7,7
Church and Wellesley,52,52,52,52,52,52
"Commerce Court, Victoria Hotel",73,73,73,73,73,73
Davisville,25,25,25,25,25,25
Davisville North,4,4,4,4,4,4


### My question of the data
I decided I wanted to find the best neighborhood based on restaurant types.  The first step was to review the venue categories in the dataframe and make a smaller dataframe with restaurants only.

In [23]:
#  First is to review the data. Grouping and identifying different venue categories.  To get a better look at what
# I could key in on I exported a CSV.
neighborhood_basic = toronto_venues[['Neighborhood', 'Venue Category']]
neighborhood_basic.groupby('Venue Category').count()
neighborhood_basic.to_csv('C:/Users/suenc/OneDrive/Training/CourseraDataScience/Capstone/types1.csv')

In [24]:
#  So here is my neighborhood restaurant dataframe that I am going to analyze
df_tor_rest = toronto_venues[toronto_venues['Venue Category'].str.contains('Restaurant')].reset_index(drop = True)
df_tor_rest 

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Mercatto,43.660391,-79.387664,Italian Restaurant
1,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Nando's,43.661728,-79.386391,Portuguese Restaurant
2,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Sushi Box,43.662960,-79.386580,Sushi Restaurant
3,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Thai Express,43.661630,-79.387340,Thai Restaurant
4,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Tosto,43.661198,-79.386414,Italian Restaurant
...,...,...,...,...,...,...,...
219,Church and Wellesley,43.665860,-79.383160,Ginger,43.665372,-79.380846,Vietnamese Restaurant
220,Church and Wellesley,43.665860,-79.383160,Darvish,43.663407,-79.383929,Middle Eastern Restaurant
221,Church and Wellesley,43.665860,-79.383160,Loaded Pierogi,43.664665,-79.380641,Polish Restaurant
222,Church and Wellesley,43.665860,-79.383160,Kokoni Izakaya,43.664181,-79.380258,Japanese Restaurant


Now that I have a smaller data set I can do one-hot encoding and see how that works out.

In [25]:
# First I will run one hot encoding - changing each venue type into a column 
df_tor_rest_onehot = pd.get_dummies(df_tor_rest[['Venue Category']], prefix = "", prefix_sep ="")
df_tor_rest_onehot['Neighborhood'] = df_tor_rest['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [df_tor_rest_onehot.columns[-1]] + list(df_tor_rest_onehot.columns[:-1])
df_tor_rest_onehot = df_tor_rest_onehot[fixed_columns]

#  A little deeper dive into data information
print(df_tor_rest_onehot.shape)
print(df_tor_rest_onehot.dtypes)
df_tor_rest_onehot.head()


(224, 39)
Neighborhood                     object
American Restaurant               uint8
Arepa Restaurant                  uint8
Asian Restaurant                  uint8
Belgian Restaurant                uint8
Caribbean Restaurant              uint8
Chinese Restaurant                uint8
Colombian Restaurant              uint8
Comfort Food Restaurant           uint8
Cuban Restaurant                  uint8
Ethiopian Restaurant              uint8
Falafel Restaurant                uint8
Fast Food Restaurant              uint8
French Restaurant                 uint8
Gluten-free Restaurant            uint8
Greek Restaurant                  uint8
Hong Kong Restaurant              uint8
Indian Restaurant                 uint8
Italian Restaurant                uint8
Japanese Restaurant               uint8
Korean Restaurant                 uint8
Latin American Restaurant         uint8
Mediterranean Restaurant          uint8
Mexican Restaurant                uint8
Middle Eastern Restaurant     

Unnamed: 0,Neighborhood,American Restaurant,Arepa Restaurant,Asian Restaurant,Belgian Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Cuban Restaurant,...,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sushi Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,"Queen's Park, Ontario Provincial Government",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Queen's Park, Ontario Provincial Government",0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
2,"Queen's Park, Ontario Provincial Government",0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
3,"Queen's Park, Ontario Provincial Government",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
4,"Queen's Park, Ontario Provincial Government",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [26]:
#  Now we can group by neighborhood; the mean of each one-hot column is the share of that category in the neighborhood
df_tor_rest_grouped = df_tor_rest_onehot.groupby('Neighborhood').mean().reset_index()
print(df_tor_rest_grouped.shape)
df_tor_rest_grouped.head()

(27, 39)


Unnamed: 0,Neighborhood,American Restaurant,Arepa Restaurant,Asian Restaurant,Belgian Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Cuban Restaurant,...,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sushi Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.333333
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Christie,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
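
To see why the grouped values are fractions rather than counts, the one-hot and mean steps can be run on a tiny made-up frame. This is a sketch with toy neighborhoods `A` and `B`, not the Toronto data.

```python
import pandas as pd

# Toy venues: neighborhood A has 2 Thai and 1 Greek, B has 1 Greek.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Thai Restaurant", "Thai Restaurant",
                       "Greek Restaurant", "Greek Restaurant"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of each indicator column = share of that category in the neighborhood.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so a neighborhood with a single restaurant shows 1.0 for that category, which matches rows like Berczy Park and Christie above.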


In [27]:
#  Just a little further dive, top 5 common venues for each neighborhood
num_top_venues = 5

for hood in df_tor_rest_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = df_tor_rest_grouped[df_tor_rest_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                      venue  freq
0                Restaurant   1.0
1       American Restaurant   0.0
2     Portuguese Restaurant   0.0
3  Mediterranean Restaurant   0.0
4        Mexican Restaurant   0.0


----Brockton, Parkdale Village, Exhibition Place----
                   venue  freq
0  Vietnamese Restaurant  0.33
1             Restaurant  0.33
2    Japanese Restaurant  0.33
3     Tibetan Restaurant  0.00
4       Theme Restaurant  0.00


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                       venue  freq
0       Fast Food Restaurant   1.0
1        American Restaurant   0.0
2      Portuguese Restaurant   0.0
3         Mexican Restaurant   0.0
4  Middle Eastern Restaurant   0.0


----Central Bay Street----
                        venue  freq
0          Italian Restaurant  0.38
1                  Restaurant  0.25
2         Japanese Restaurant  0.12
3   Middle Eastern Restaurant  0.12
4  Modern European Re

In [28]:
#  Now build a function to sort the venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [29]:
# Now a new dataframe and display the top 10 venues in each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        #  Past the first three positions, fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = df_tor_rest_grouped['Neighborhood']

for ind in np.arange(df_tor_rest_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_tor_rest_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(40)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Restaurant,Vietnamese Restaurant,Cuban Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
1,"Brockton, Parkdale Village, Exhibition Place",Vietnamese Restaurant,Restaurant,Japanese Restaurant,Theme Restaurant,Cuban Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant
2,"Business reply mail Processing Centre, South C...",Fast Food Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Indian Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Falafel Restaurant,Cuban Restaurant
3,Central Bay Street,Italian Restaurant,Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Japanese Restaurant,Caribbean Restaurant,Chinese Restaurant,Belgian Restaurant,Colombian Restaurant,Comfort Food Restaurant
4,Christie,American Restaurant,Ethiopian Restaurant,Indian Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Cuban Restaurant
5,Church and Wellesley,Japanese Restaurant,Ramen Restaurant,Ethiopian Restaurant,Italian Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Polish Restaurant,Vietnamese Restaurant,Restaurant,Sushi Restaurant
6,"Commerce Court, Victoria Hotel",Restaurant,Japanese Restaurant,Gluten-free Restaurant,Italian Restaurant,American Restaurant,Sushi Restaurant,Asian Restaurant,Fast Food Restaurant,Chinese Restaurant,Seafood Restaurant
7,Davisville,Italian Restaurant,American Restaurant,Sushi Restaurant,Seafood Restaurant,New American Restaurant,Thai Restaurant,Indian Restaurant,Belgian Restaurant,Caribbean Restaurant,Chinese Restaurant
8,"Dufferin, Dovercourt Village",Middle Eastern Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Cuban Restaurant
9,"First Canadian Place, Underground city",Restaurant,American Restaurant,Seafood Restaurant,Sushi Restaurant,Asian Restaurant,Japanese Restaurant,Thai Restaurant,Greek Restaurant,Chinese Restaurant,Fast Food Restaurant
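
The column names above rely on a three-item suffix list plus an exception for everything else, which works for a top 10. As an alternative sketch, a small helper can compute the suffix directly; `ordinal` is a hypothetical name and is not part of the lab code.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take 'th'.
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```

This also stays correct for larger values of `num_top_venues`, where the list-based version would label the 21st column as "21th".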


#### Observation # 1:  
Since I am interested in restaurants, I can see from this dataframe how each neighborhood sorts out.  This allows me to evaluate, neighborhood by neighborhood, which ones have the restaurants that I like, for instance Greek or Thai.

Now I will do k-means clustering to see these values a little better.

After looking at the data and the shape of the information on the map, I felt a k value of 4 was a good start; the run shown here uses 6.

In [30]:
#  First the k-means clustering of the neighborhoods into 6 clusters
kclusters = 6
tor_grouped_cluster = df_tor_rest_grouped.drop(columns='Neighborhood')

# Build the model
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(tor_grouped_cluster)
kmeans.labels_[0:10]

array([5, 1, 3, 1, 4, 1, 1, 1, 2, 1])
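
The value of k here was picked by eye. One common way to sanity check that choice is to compare silhouette scores across several values of k. The sketch below runs on synthetic blobs, not the restaurant frequencies, so the numbers say nothing about the Toronto data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for a range of k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Higher silhouette means tighter, better separated clusters.
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(k, round(score, 3))
print("best k:", best_k)
```

On the notebook's data, `X` would be `tor_grouped_cluster`; with only 27 neighborhoods the scores may be noisy, so they are a guide rather than a verdict.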

In [31]:
#  Adding labels and creating a new dataframe that includes the cluster and top 10
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
#  Copy so that later in-place edits do not also modify df_toronto
df_tor_merge = df_toronto.copy()
df_tor_merge = df_tor_merge.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')
df_tor_merge.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,,,,,,,,,,,
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1.0,Italian Restaurant,Thai Restaurant,Sushi Restaurant,Restaurant,Portuguese Restaurant,Cuban Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1.0,Middle Eastern Restaurant,Vietnamese Restaurant,Ramen Restaurant,Fast Food Restaurant,Mexican Restaurant,Ethiopian Restaurant,Chinese Restaurant,Greek Restaurant,Restaurant,Thai Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1.0,Japanese Restaurant,Middle Eastern Restaurant,Restaurant,Italian Restaurant,Ethiopian Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,,,,,,,,,,,


Now to see those visually

In [32]:
df_tor_merge.dtypes

Postal Code                object
Borough                    object
Neighbourhood              object
Latitude                  float64
Longitude                 float64
Cluster Labels            float64
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object

In [33]:
#  Fix the NaN issue to see the clustering better
df_tor_merge.dropna(inplace = True)
df_tor_merge.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1.0,Italian Restaurant,Thai Restaurant,Sushi Restaurant,Restaurant,Portuguese Restaurant,Cuban Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1.0,Middle Eastern Restaurant,Vietnamese Restaurant,Ramen Restaurant,Fast Food Restaurant,Mexican Restaurant,Ethiopian Restaurant,Chinese Restaurant,Greek Restaurant,Restaurant,Thai Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1.0,Japanese Restaurant,Middle Eastern Restaurant,Restaurant,Italian Restaurant,Ethiopian Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,5.0,Restaurant,Vietnamese Restaurant,Cuban Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,1.0,Italian Restaurant,Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Japanese Restaurant,Caribbean Restaurant,Chinese Restaurant,Belgian Restaurant,Colombian Restaurant,Comfort Food Restaurant
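
Dropping rows in place affects every name bound to the same DataFrame object. A minimal sketch of the difference between plain assignment and `.copy()`, using a toy frame with one missing cluster label:

```python
import pandas as pd


def make_frame():
    # Toy frame: neighborhood B has no cluster label.
    return pd.DataFrame({
        "Neighbourhood": ["A", "B"],
        "Cluster Labels": [1.0, None],
    })


# Plain assignment: both names point at the same object.
base = make_frame()
alias = base
alias.dropna(inplace=True)
print(len(base))  # the original shrank too

# Explicit copy: the original keeps all of its rows.
base2 = make_frame()
independent = base2.copy()
independent.dropna(subset=["Cluster Labels"], inplace=True)
print(len(base2), len(independent))
```

Passing `subset=["Cluster Labels"]` limits the drop to rows missing a label, which avoids losing rows over unrelated missing values.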


In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_tor_merge['Latitude'], df_tor_merge['Longitude'], df_tor_merge['Neighbourhood'], df_tor_merge['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
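
The color lookup only needs one distinct hex color per cluster label. Below is a small sketch of building that palette and indexing it directly by label; `color_for` is a hypothetical helper, and it assumes labels run from 0 to `kclusters - 1`, as k-means labels do.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 6

# One evenly spaced color from the rainbow colormap per cluster.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]


def color_for(label):
    """Map a cluster label (possibly stored as a float) to its hex color."""
    return palette[int(label)]


for label in [0.0, 1.0, 5.0]:
    print(label, color_for(label))
```

Indexing by `int(label)` keeps label 0 on the first color; indexing by `int(label) - 1` also gives distinct colors, but wraps label 0 around to the last entry.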

#### Finally, each cluster is evaluated on its own.  
I raised the number of clusters to 6, but the density still stays in downtown Toronto.  No worries, let's look at each cluster individually.

In [35]:
#  Evaluating Cluster 1 (label 0)
df_tor_merge.loc[df_tor_merge['Cluster Labels'] == 0, df_tor_merge.columns[[1] + list(range(5,df_tor_merge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Central Toronto,0.0,Sushi Restaurant,Vietnamese Restaurant,Cuban Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
28,West Toronto,0.0,Sushi Restaurant,French Restaurant,Falafel Restaurant,Vietnamese Restaurant,Cuban Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,Fast Food Restaurant,Ethiopian Restaurant


In [36]:
#  Evaluating Cluster 2 (label 1)
df_tor_merge.loc[df_tor_merge['Cluster Labels'] == 1, df_tor_merge.columns[[1] + list(range(5,df_tor_merge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,1.0,Italian Restaurant,Thai Restaurant,Sushi Restaurant,Restaurant,Portuguese Restaurant,Cuban Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant
2,Downtown Toronto,1.0,Middle Eastern Restaurant,Vietnamese Restaurant,Ramen Restaurant,Fast Food Restaurant,Mexican Restaurant,Ethiopian Restaurant,Chinese Restaurant,Greek Restaurant,Restaurant,Thai Restaurant
3,Downtown Toronto,1.0,Japanese Restaurant,Middle Eastern Restaurant,Restaurant,Italian Restaurant,Ethiopian Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant
6,Downtown Toronto,1.0,Italian Restaurant,Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Japanese Restaurant,Caribbean Restaurant,Chinese Restaurant,Belgian Restaurant,Colombian Restaurant,Comfort Food Restaurant
8,Downtown Toronto,1.0,Asian Restaurant,Japanese Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant,American Restaurant,Thai Restaurant,Restaurant,Colombian Restaurant,Greek Restaurant
10,Downtown Toronto,1.0,Italian Restaurant,New American Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant,Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Cuban Restaurant,French Restaurant
11,West Toronto,1.0,Vietnamese Restaurant,Asian Restaurant,New American Restaurant,Cuban Restaurant,French Restaurant,Greek Restaurant,Vegetarian / Vegan Restaurant,Korean Restaurant,Japanese Restaurant,Portuguese Restaurant
12,East Toronto,1.0,Greek Restaurant,Japanese Restaurant,Restaurant,Italian Restaurant,Indian Restaurant,Tibetan Restaurant,Belgian Restaurant,Caribbean Restaurant,Asian Restaurant,Chinese Restaurant
13,Downtown Toronto,1.0,Restaurant,Japanese Restaurant,Thai Restaurant,Gluten-free Restaurant,American Restaurant,Sushi Restaurant,Mexican Restaurant,Seafood Restaurant,Italian Restaurant,Fast Food Restaurant
14,West Toronto,1.0,Vietnamese Restaurant,Restaurant,Japanese Restaurant,Theme Restaurant,Cuban Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant


In [37]:
#  Evaluating Cluster 3 (label 2)
df_tor_merge.loc[df_tor_merge['Cluster Labels'] == 2, df_tor_merge.columns[[1] + list(range(5,df_tor_merge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,West Toronto,2.0,Middle Eastern Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Cuban Restaurant
24,Central Toronto,2.0,Indian Restaurant,Middle Eastern Restaurant,Ethiopian Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Vietnamese Restaurant


In [38]:
#  Evaluating Cluster 4 (label 3)
df_tor_merge.loc[df_tor_merge['Cluster Labels'] == 3, df_tor_merge.columns[[1] + list(range(5,df_tor_merge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
38,East Toronto,3.0,Fast Food Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Indian Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Falafel Restaurant,Cuban Restaurant


In [39]:
#  Evaluating Cluster 5 (label 4)
df_tor_merge.loc[df_tor_merge['Cluster Labels'] == 4, df_tor_merge.columns[[1] + list(range(5,df_tor_merge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Downtown Toronto,4.0,American Restaurant,Ethiopian Restaurant,Indian Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Cuban Restaurant


In [40]:
#  Evaluating Cluster 6 (label 5)
df_tor_merge.loc[df_tor_merge['Cluster Labels'] == 5, df_tor_merge.columns[[1] + list(range(5,df_tor_merge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Downtown Toronto,5.0,Restaurant,Vietnamese Restaurant,Cuban Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
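
Rather than reading six tables one at a time, the spread of clusters across boroughs can be summarized in a single cross-tabulation. This is a sketch on toy rows; the borough and label values are illustrative and are not the full notebook results.

```python
import pandas as pd

# Toy rows standing in for df_tor_merge; labels are floats after the join.
toy = pd.DataFrame({
    "Borough": ["Downtown Toronto", "Downtown Toronto",
                "West Toronto", "Central Toronto"],
    "Cluster Labels": [1.0, 1.0, 0.0, 0.0],
})

# Cast labels back to integers so the column headers read 0, 1, ...
toy["Cluster Labels"] = toy["Cluster Labels"].astype(int)

# Rows are boroughs, columns are cluster labels, cells are neighborhood counts.
summary = pd.crosstab(toy["Borough"], toy["Cluster Labels"])
print(summary)
```

On the real frame, the same call with `df_tor_merge['Borough']` and `df_tor_merge['Cluster Labels']` would show at a glance which boroughs dominate each cluster.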


## Conclusion
### 1: The largest cluster is concentrated primarily in Downtown Toronto

### 2: If I want to move to or work in an area with the best restaurant selection, it would be
#### Downtown 1st - Central and West 2nd.
