**In this Notebook we will prepare out dataset by scrapping from Wikipedia the information on Neighborhoods in Toronto**

For scrapping the data from Wikipedia we will use the BeautifulSoup package. <br /> 
First we import the libraries we need:

In [1]:
from bs4 import BeautifulSoup
import requests
print("import complete")

Waiting for a Spark session to start...
Spark Initialization Done! ApplicationId = app-20190329094827-0002
KERNEL_ID = 4ec130cc-d207-4f68-817b-dd440b40e3c5
import complete


we import rest libraries that will come handy for our further data processing:

In [2]:
import numpy as np 
import pandas as pd 

print('Libraries imported.')

Libraries imported.


We use BeautifulSoup to get the data and create our pandas dataframe

In [3]:
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))[0]

type(df)

pandas.core.frame.DataFrame

We assign column names:

In [4]:
df.columns=['Postcode','Borough','Neighborhood']

We drop the first line with values same as our column names as its not part of the data we need:

In [5]:
df.drop(df.index[0],inplace=True)

We replace the "Not assigned" parameters of Neighborhood column with the value of their Borough:

In [6]:
df['Neighborhood'] = np.where(df['Neighborhood'] == 'Not assigned', df['Borough'], df['Neighborhood'])


We create a new dataframe exluding all rows that have "Not assigned" Borough

In [7]:
df_new=df[df.Borough !='Not assigned']

We combine the rows where  Neighborhoods  share the same Postal codes and separate the Neighborhoods with a comma and assign them under the dataframe df_t:

In [8]:
df_t=df_new.groupby(['Postcode','Borough'])['Neighborhood'].apply(','.join)
df_t= df_t.to_frame().reset_index()
df_t

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"


In [9]:
df_t.shape

(103, 3)

we import the csv file with coordinates

In [10]:

import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share your notebook.
client_92f09b9dc024490ba6781e485fc94a04 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='ZxI8pUy5o37zAzje-GQ1Hg7bWxaGBiS9E-kRjZ45VQxb',
    ibm_auth_endpoint="https://iam.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_92f09b9dc024490ba6781e485fc94a04.get_object(Bucket='courseracapstoneproject-donotdelete-pr-hhijrzfy6oinon',Key='Geospatial_Coordinates.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_1 = pd.read_csv(body)
df_data_1.head()



Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


we rename the postal code column to merge it with our dataframe

In [11]:
df_data_1.columns = df_data_1.columns.str.replace('Postal Code','Postcode')

In [12]:
df_data_1.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


we create a new dataframe by merging the df with coordinates to our existing dataset

In [13]:
df_toronto=pd.merge(df_t, df_data_1, on="Postcode")
df_toronto.head()


Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


we download the libraries we need for geocoding and visualization 

In [14]:
import json 

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 
!pip install geocoder

import requests 
from pandas.io.json import json_normalize 


import matplotlib.cm as cm
import matplotlib.colors as colors


!pip install folium
import folium

print('Libraries imported.')

Solving environment: done


  current version: 4.5.11
  latest version: 4.6.8

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /opt/ibm/conda/miniconda3

  added / updated specs: 
    - geopy


The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0             conda-forge
    geopy:           1.19.0-py_0           conda-forge

The following packages will be UPDATED:

    ca-certificates: 2017.08.26-h1d4fec5_0             --> 2019.3.9-hecc5488_0 conda-forge
    certifi:         2018.1.18-py35_0                  --> 2018.8.24-py35_1001 conda-forge
    conda:           4.5.11-py35_0                     --> 4.5.11-py35_0       conda-forge
    libgcc-ng:       7.2.0-h7cc24e2_2                  --> 8.2.0-hdf63c60_1               
    openssl:         1.0.2o-h20670df_0                 --> 1.0.2r-h14c3975_0   conda-forge

Preparing transaction: failed

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<

we find the coordinates for Toronto

In [15]:
address = 'Toronto, TO'

geolocator = Nominatim(user_agent="TO_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6524203, -79.3834045.


we create a smaller subset with only Boroughs that contain "Toronto" in their name:

In [16]:
df_Tsample=df_toronto[df_toronto['Borough'].str.contains("Toronto")].reset_index(drop=True)
df_Tsample

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
7,M4S,Central Toronto,Davisville,43.704324,-79.38879
8,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049


we map the boroughs basis coordinates and label them with Borough name and Postal code

In [17]:
map_toronto_neighborhoods = folium.Map(location=[latitude, longitude], zoom_start=10)


for lat, lng, label in zip(df_Tsample['Latitude'], df_Tsample['Longitude'], df_Tsample['Neighborhood']):
    label = '{}'.format(label)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_neighborhoods)  
    
map_toronto_neighborhoods

Now we will utilize the Foursquare API and run a similar segmentation to explore the Toronto Neigbhorhoods

In [18]:
CLIENT_ID = 'TZV5X5URLJIWHPZR01A3OTBXRZIUUL2UILGAZXSWFKYXCLD0' 
CLIENT_SECRET = 'J1Y5FJCYQHKFVH2GU5T0YYASWESE10XNOBRSNC13P2AWR2K4' 
VERSION = '20190326' 

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: TZV5X5URLJIWHPZR01A3OTBXRZIUUL2UILGAZXSWFKYXCLD0
CLIENT_SECRET:J1Y5FJCYQHKFVH2GU5T0YYASWESE10XNOBRSNC13P2AWR2K4


let's explore the first neighborhood

In [19]:
df_Tsample.loc[0, 'Neighborhood']


'The Beaches'

get the coordinates of the Neighborhood

In [20]:
neighborhood_latitude = df_Tsample.loc[0, 'Latitude'] 
neighborhood_longitude = df_Tsample.loc[0, 'Longitude'] 

neighborhood_name = df_Tsample.loc[0, 'Neighborhood'] 

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


**Now, let's get the top 100 venues that are in The Beaches within a radius of 500 meters**

First, let's create the GET request URL. 

In [21]:
radius=500
LIMIT=100
VERSION='20190329'
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET,VERSION,latitude, longitude, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=TZV5X5URLJIWHPZR01A3OTBXRZIUUL2UILGAZXSWFKYXCLD0&client_secret=J1Y5FJCYQHKFVH2GU5T0YYASWESE10XNOBRSNC13P2AWR2K4&v=20190329&ll=43.6524203,-79.3834045&radius=500&limit=100'

send the GET request and examine the results

In [22]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c9dea4a351e3d4c7c88b9ae'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-5227bb01498e17bf485e6202-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/neighborhood_',
          'suffix': '.png'},
         'id': '4f2a25ac4b909258e854f55f',
         'name': 'Neighborhood',
         'pluralName': 'Neighborhoods',
         'primary': True,
         'shortName': 'Neighborhood'}],
       'id': '5227bb01498e17bf485e6202',
       'location': {'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'distance': 177,
        'formattedAddress': ['Toronto ON', 'Canada'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          'lng': -79.38529600606677}],
        'lat': 43.6532

we define the get_category_type function

In [23]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [24]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) 


filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]


nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)


nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Nathan Phillips Square,Plaza,43.65227,-79.383516
2,Eggspectation Bell Trinity Square,Breakfast Spot,43.653144,-79.38198
3,Old City Hall,Monument / Landmark,43.652009,-79.381744
4,M Square Coffee Co,Coffee Shop,43.651218,-79.383555
5,John & Sons Oyster House,Seafood Restaurant,43.650656,-79.381613
6,Indigo,Bookstore,43.653515,-79.380696
7,Assembly Chef's Hall,Food Court,43.650579,-79.383412
8,The Keg Steakhouse & Bar,Steakhouse,43.649937,-79.384196
9,Apple Eaton Centre,Electronics Store,43.652823,-79.380615


In [25]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


**Explore rest of Neighborhoods in Toronto** <br>
Let's create a function to repeat the same process to all the neighborhoods in Toronto

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
       
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
      
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

we run above function for all Neighborhoods in Toronto and create a new dataframe called Toronto_venues

In [27]:
Toronto_venues = getNearbyVenues(names=df_Tsample['Neighborhood'],
                                   latitudes=df_Tsample['Latitude'],
                                   longitudes=df_Tsample['Longitude']
                                 )


The Beaches
The Danforth West,Riverdale
The Beaches West,India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront,Regent Park
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Roselawn
Forest Hill North,Forest Hill West
The Annex,North Midtown,Yorkville
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Dovercourt Village,Dufferin
Little Portugal,Trinity
Brockton,Exhibition Place,Parkdale Village
High Park,The Junction South
Parkdale,Roncesvall

In [28]:
print(Toronto_venues.shape)
Toronto_venues.head()

(1712, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
1,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
2,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West,Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


lets check how many venues were created for each Neighborhood:

In [29]:
Toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,58,58,58,58,58,58
"Brockton,Exhibition Place,Parkdale Village",22,22,22,22,22,22
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",14,14,14,14,14,14
"Cabbagetown,St. James Town",43,43,43,43,43,43
Central Bay Street,85,85,85,85,85,85
"Chinatown,Grange Park,Kensington Market",100,100,100,100,100,100
Christie,16,16,16,16,16,16
Church and Wellesley,86,86,86,86,86,86


Check how many unique categories were found from the generated data:

In [30]:
print('There are {} uniques categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 240 uniques categories.


**Analyze each Neighborhood**

In [55]:
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']],prefix="")

Toronto_onehot['Neighborhood'] = Toronto_venues['Neighborhood'] 

fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]


Toronto_onehot.head()


Unnamed: 0,Neighborhood,_Accessories Store,_Afghan Restaurant,_Airport,_Airport Food Court,_Airport Gate,_Airport Lounge,_Airport Service,_Airport Terminal,_American Restaurant,...,_Trail,_Train Station,_Vegetarian / Vegan Restaurant,_Video Game Store,_Vietnamese Restaurant,_Wine Bar,_Wine Shop,_Wings Joint,_Women's Store,_Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"The Danforth West,Riverdale",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [56]:
Toronto_onehot.shape

(1712, 241)

Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [61]:
Toronto_grouped = Toronto_onehot.groupby('Neighborhood').mean().reset_index()
Toronto_grouped


Unnamed: 0,Neighborhood,_Accessories Store,_Afghan Restaurant,_Airport,_Airport Food Court,_Airport Gate,_Airport Lounge,_Airport Service,_Airport Terminal,_American Restaurant,...,_Trail,_Train Station,_Vegetarian / Vegan Restaurant,_Video Game Store,_Vietnamese Restaurant,_Wine Bar,_Wine Shop,_Wings Joint,_Women's Store,_Yoga Studio
0,"Adelaide,King,Richmond",0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton,Exhibition Place,Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,...,0.0,0.0,0.011765,0.0,0.0,0.011765,0.0,0.0,0.0,0.011765
7,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.05,0.0,0.04,0.01,0.0,0.0,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,...,0.0,0.0,0.0,0.011628,0.011628,0.0,0.0,0.011628,0.0,0.023256


#### Let's print each neighborhood along with the top 5 most common venues

In [62]:
num_top_venues = 5

for hood in Toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
              venue  freq
0      _Coffee Shop  0.06
1  _Thai Restaurant  0.04
2       _Steakhouse  0.04
3              _Bar  0.04
4             _Café  0.04


----Berczy Park----
             venue  freq
0     _Coffee Shop  0.09
1    _Cocktail Bar  0.05
2      _Restaurant  0.03
3  _Farmers Market  0.03
4     _Cheese Shop  0.03


----Brockton,Exhibition Place,Parkdale Village----
             venue  freq
0  _Breakfast Spot  0.09
1     _Coffee Shop  0.09
2            _Café  0.09
3     _Yoga Studio  0.05
4             _Gym  0.05


----Business Reply Mail Processing Centre 969 Eastern----
                 venue  freq
0    _Recording Studio  0.06
1      _Farmers Market  0.06
2       _Garden Center  0.06
3  _Light Rail Station  0.06
4             _Brewery  0.06


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
               venue  freq
0    _Airport Lounge  0.14
1   _Airport Service  0.14
2  _Airport 

**Let's put that into a pandas dataframe** <br>
First, let's write a function to sort the venues in descending order.

In [63]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [64]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']


columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))


neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Toronto_grouped['Neighborhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",_Coffee Shop,_Bar,_Café,_Steakhouse,_Thai Restaurant,_Hotel,_Asian Restaurant,_Burger Joint,_Gym,_Bakery
1,Berczy Park,_Coffee Shop,_Cocktail Bar,_Seafood Restaurant,_Restaurant,_Pub,_Farmers Market,_Cheese Shop,_Café,_Bakery,_Steakhouse
2,"Brockton,Exhibition Place,Parkdale Village",_Breakfast Spot,_Café,_Coffee Shop,_Climbing Gym,_Falafel Restaurant,_Convenience Store,_Burrito Place,_Stadium,_Caribbean Restaurant,_Bar
3,Business Reply Mail Processing Centre 969 Eastern,_Pizza Place,_Auto Workshop,_Comic Shop,_Moving Target,_Recording Studio,_Restaurant,_Burrito Place,_Brewery,_Skate Park,_Smoke Shop
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",_Airport Lounge,_Airport Terminal,_Airport Service,_Boat or Ferry,_Sculpture Garden,_Harbor / Marina,_Plane,_Airport Gate,_Airport Food Court,_Airport


#### Cluster Neighborhoods

In [69]:
from sklearn.cluster import KMeans

In [70]:
kclusters = 5

Toronto_grouped_clustering = Toronto_grouped.drop('Neighborhood', 1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [71]:

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = df_Tsample

Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Toronto_merged.head() 

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,_Health Food Store,_Coffee Shop,_Pub,_Neighborhood,_Discount Store,_Fast Food Restaurant,_Farmers Market,_Falafel Restaurant,_Event Space,_Ethiopian Restaurant
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188,0,_Greek Restaurant,_Coffee Shop,_Ice Cream Shop,_Italian Restaurant,_Bookstore,_Yoga Studio,_Brewery,_Bubble Tea Shop,_Café,_Restaurant
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572,0,_Sandwich Place,_Gym,_Brewery,_Sushi Restaurant,_Food & Drink Shop,_Steakhouse,_Fish & Chips Shop,_Light Rail Station,_Fast Food Restaurant,_Burger Joint
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,_Café,_Coffee Shop,_Gastropub,_Italian Restaurant,_Bakery,_American Restaurant,_Yoga Studio,_Park,_Brewery,_Seafood Restaurant
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,4,_Gym / Fitness Center,_Park,_Swim School,_Bus Line,_Yoga Studio,_Dog Run,_Fast Food Restaurant,_Farmers Market,_Falafel Restaurant,_Event Space


Lets visualize the resulting clusters:

In [132]:
# create map
Toronto_map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighborhood'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(Toronto_map_clusters)
       
        
Toronto_map_clusters

Lets examine the clusters

### Cluster 1

In [76]:
Cluster1=Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]
Cluster1

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,0,_Health Food Store,_Coffee Shop,_Pub,_Neighborhood,_Discount Store,_Fast Food Restaurant,_Farmers Market,_Falafel Restaurant,_Event Space,_Ethiopian Restaurant
1,East Toronto,0,_Greek Restaurant,_Coffee Shop,_Ice Cream Shop,_Italian Restaurant,_Bookstore,_Yoga Studio,_Brewery,_Bubble Tea Shop,_Café,_Restaurant
2,East Toronto,0,_Sandwich Place,_Gym,_Brewery,_Sushi Restaurant,_Food & Drink Shop,_Steakhouse,_Fish & Chips Shop,_Light Rail Station,_Fast Food Restaurant,_Burger Joint
3,East Toronto,0,_Café,_Coffee Shop,_Gastropub,_Italian Restaurant,_Bakery,_American Restaurant,_Yoga Studio,_Park,_Brewery,_Seafood Restaurant
5,Central Toronto,0,_Food & Drink Shop,_Burger Joint,_Park,_Gym,_Breakfast Spot,_Hotel,_Sandwich Place,_Clothing Store,_Dance Studio,_Donut Shop
6,Central Toronto,0,_Sporting Goods Shop,_Coffee Shop,_Yoga Studio,_Italian Restaurant,_Pet Store,_Clothing Store,_Chinese Restaurant,_Dessert Shop,_Rental Car Location,_Diner
7,Central Toronto,0,_Sandwich Place,_Dessert Shop,_Restaurant,_Café,_Pizza Place,_Coffee Shop,_Italian Restaurant,_Sushi Restaurant,_Pharmacy,_Indian Restaurant
9,Central Toronto,0,_Coffee Shop,_Pub,_Pizza Place,_American Restaurant,_Convenience Store,_Medical Center,_Sports Bar,_Supermarket,_Sushi Restaurant,_Fried Chicken Joint
11,Downtown Toronto,0,_Coffee Shop,_Café,_Restaurant,_Pub,_Italian Restaurant,_Pizza Place,_Bakery,_Caribbean Restaurant,_Snack Place,_Bistro
12,Downtown Toronto,0,_Coffee Shop,_Japanese Restaurant,_Sushi Restaurant,_Burger Joint,_Gay Bar,_Restaurant,_Yoga Studio,_Men's Store,_Café,_Bubble Tea Shop


In [123]:
Cluster1['1st Most Common Venue'].value_counts().argmax()


  if __name__ == '__main__':


'_Coffee Shop'

### Cluster 2

In [126]:
Cluster2=Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]
Cluster2

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Central Toronto,1,_Playground,_Gym,_Diner,_Fast Food Restaurant,_Farmers Market,_Falafel Restaurant,_Event Space,_Ethiopian Restaurant,_Electronics Store,_Eastern European Restaurant


### Cluster 3


In [128]:
Cluster3=Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]
Cluster3

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Central Toronto,2,_Home Service,_Garden,_Yoga Studio,_Discount Store,_Filipino Restaurant,_Fast Food Restaurant,_Farmers Market,_Falafel Restaurant,_Event Space,_Ethiopian Restaurant


### Cluster 4

In [130]:
Cluster4=Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]
Cluster4

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Central Toronto,3,_Mexican Restaurant,_Trail,_Sushi Restaurant,_Jewelry Store,_Yoga Studio,_Dog Run,_Fast Food Restaurant,_Farmers Market,_Falafel Restaurant,_Event Space


### Cluster 5

In [131]:
Cluster5=Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]
Cluster5

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,4,_Gym / Fitness Center,_Park,_Swim School,_Bus Line,_Yoga Studio,_Dog Run,_Fast Food Restaurant,_Farmers Market,_Falafel Restaurant,_Event Space
10,Downtown Toronto,4,_Park,_Playground,_Trail,_Diner,_Fast Food Restaurant,_Farmers Market,_Falafel Restaurant,_Event Space,_Ethiopian Restaurant,_Electronics Store


From a look at the Clusters we see a big concentration of venues in Cluster 1 and far less options at rest of the Clusters. By searching the areas we see that Cluster 1 represents town or region centers whereas Clusters 2-5 are suburbs (Forest Park and Lawrence Hill can be found among the richest suburbs in Toronto). Such segmentation could assist in evaluating options to stay when travelling in Toronto or for real-estate research