# Project_02c

### Introduction 
___
_Objective:_ __To explore, segment and group the neighbourhoods of the city of Toronto.__   
The data is scraped from Wikipedia and transformed into a structured format.   
The project is split into 3 notebooks to make the code easier to understand and implement.

### Table of contents 
#### Notebook 3  
This notebook explores the data, then segments and clusters the neighborhoods to find the most common venue categories in the selected group.  

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#ref1">Import</a></li>
        <li><a href="#ref2">GeoData Exploration</a></li>
        <li><a href="#ref3">Group selection</a></li>
        <li><a href="#ref4">Cluster Neighborhoods</a></li>
        <li><a href="#ref5">Conclusion</a></li>
        <li><a href="#ref6">Resources</a></li>
    </ol>
</div>
<br>

<a id="ref1"></a>
# 1. Import 
This section installs and imports the packages required for the project.
***

In [15]:
!pip install beautifulsoup4 # HTML and XML data extraction library.
!pip install requests # HTTP requests with timeout support
!pip install lxml # HTML parser used by BeautifulSoup below
!pip install folium # map rendering library
!pip install geopy # convert an address into latitude and longitude values
# Importing Packages
import pandas as pd # DataFrame
import numpy as np # Arrays
from bs4 import BeautifulSoup # HTML and XML data extraction library.
import requests # requests , timeout
# Matplotlib and associated plotting modules
import matplotlib.cm as cm 
import matplotlib.colors as colors
from sklearn.cluster import KMeans # import k-means from clustering stage
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import json # library to handle JSON files
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)
# Functions
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# function that fetches the nearby venues for each neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request (the timeout avoids hanging on a stalled connection)
        results = requests.get(url, timeout=10).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
print("Ready")

Ready


<a href="https://github.com/Azhura/Coursera_Capstone/blob/master/Project02a.ipynb">Notebook 1: Data Wrangling</a><br>

In [16]:
# Code - Notebook 1
# Loading the url data.
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page_response = requests.get(url,timeout=5) # requests , timeout
soup = BeautifulSoup(page_response.content,'lxml') # Transforming to BeautifulSoup object
table = soup.find_all('table')[0] # Filtering the html data table
df = pd.read_html(str(table))[0] # Transforming data with pandas
# Value Filtering
na = df['Borough'] != 'Not assigned' 
new_data = df[na].copy() # copy so the assignment below does not write to a slice of df
new_data['Neighborhood'] = new_data['Neighborhood'].str.replace('/',',')
new_data  =  new_data.sort_values (['Postal code','Borough','Neighborhood'])
new_data.index = np.arange(0, len(new_data)) # changing start index
print("Size: ",new_data.shape)
new_data.head() # Head After

Size:  (103, 3)


Unnamed: 0,Postal code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern , Rouge"
1,M1C,Scarborough,"Rouge Hill , Port Union , Highland Creek"
2,M1E,Scarborough,"Guildwood , Morningside , West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


<a href="https://github.com/Azhura/Coursera_Capstone/blob/master/Project02b.ipynb">Notebook 2: Geolocation</a><br>

In [17]:
print("Load Complete: new_data02 = Geospatial_Coordinates.csv")
print("size: ",new_data02.shape)
new_data02.head()

Load Complete: new_data02 = Geospatial_Coordinates.csv
size:  (103, 5)


Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern , Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill , Port Union , Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood , Morningside , West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
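
The cell above only reports the result of the load; the loading code itself lives in Notebook 2. As a rough sketch of what that step might look like, the example below joins coordinates onto the postal codes. The CSV column names (`Postal Code`, `Latitude`, `Longitude`) are an assumption, and an inline string stands in for `Geospatial_Coordinates.csv` so the snippet runs on its own.

```python
import io

import pandas as pd

# Two rows copied from the tables above.
neighborhoods = pd.DataFrame({
    'Postal code': ['M1B', 'M1C'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighborhood': ['Malvern , Rouge',
                     'Rouge Hill , Port Union , Highland Creek'],
})

# Stand-in for Geospatial_Coordinates.csv; the header names are assumed.
csv_text = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
"""
coords = pd.read_csv(io.StringIO(csv_text))

# Align the key column name, then attach coordinates to each postal code.
coords = coords.rename(columns={'Postal Code': 'Postal code'})
merged = neighborhoods.merge(coords, on='Postal code', how='left')
print(merged.shape)
```

A left join keeps every neighborhood row even if a postal code has no coordinates, which makes missing matches visible as `NaN` rather than silently dropping them.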


<a id="ref2"></a>
# 2. GeoData Exploration
Counting the postal codes in each borough.

In [18]:
new_data02.groupby('Borough').count()

Unnamed: 0_level_0,Postal code,Neighborhood,Latitude,Longitude
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Central Toronto,9,9,9,9
Downtown Toronto,19,19,19,19
East Toronto,5,5,5,5
East York,5,5,5,5
Etobicoke,12,12,12,12
Mississauga,1,1,1,1
North York,24,24,24,24
Scarborough,17,17,17,17
West Toronto,6,6,6,6
York,5,5,5,5


In [19]:
print('There are {} unique boroughs.'.format(len(new_data02['Borough'].unique())))

There are 10 unique boroughs.


**Working only with boroughs that contain the word Toronto.**

In [20]:
Central_Toronto = new_data02[(new_data02['Borough']=='Central Toronto')]
Downtown_Toronto = new_data02[(new_data02['Borough']=='Downtown Toronto')]
East_Toronto = new_data02[(new_data02['Borough']=='East Toronto')]
West_Toronto = new_data02[(new_data02['Borough']=='West Toronto')]
frames = [Central_Toronto, Downtown_Toronto, East_Toronto,West_Toronto ]
result = pd.concat(frames)
result.index = np.arange(0, len(result)) # changing index
print("Size: ",result.shape)
result

Size:  (39, 5)


Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197
2,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
3,M4S,Central Toronto,Davisville,43.704324,-79.38879
4,M4T,Central Toronto,"Moore Park , Summerhill East",43.689574,-79.38316
5,M4V,Central Toronto,"Summerhill West , Rathnelly , South Hill , For...",43.686412,-79.400049
6,M5N,Central Toronto,Roselawn,43.711695,-79.416936
7,M5P,Central Toronto,Forest Hill North & West,43.696948,-79.411307
8,M5R,Central Toronto,"The Annex , North Midtown , Yorkville",43.67271,-79.405678
9,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
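
The four boroughs above are selected one at a time and then concatenated. As a sketch of an alternative, the same rows could be picked with a single substring filter. Note one difference: the filter keeps the rows in their original order instead of grouping them borough by borough as `pd.concat` does. The small frame below holds only the ten borough names from the count table.

```python
import pandas as pd

boroughs = pd.DataFrame({'Borough': [
    'Central Toronto', 'Downtown Toronto', 'East Toronto', 'East York',
    'Etobicoke', 'Mississauga', 'North York', 'Scarborough',
    'West Toronto', 'York',
]})

# Keep only the boroughs whose name contains the literal text "Toronto".
mask = boroughs['Borough'].str.contains('Toronto', regex=False)
selected = boroughs[mask].reset_index(drop=True)
print(selected['Borough'].tolist())
```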


In [21]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(result['Borough'].unique()),
        result.shape[0]
    )
)

The dataframe has 4 boroughs and 39 neighborhoods.


## Creating a map 
Converting an address into latitude and longitude.

In [22]:
address = 'Toronto, ON, Canada'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


## Drawing the chosen boroughs

In [23]:
# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, postalcode, borough, neighborhood in zip(result['Latitude'], 
                                                       result['Longitude'],
                                                       result['Postal code'], 
                                                       result['Borough'], 
                                                       result['Neighborhood']):
    label = '{}, {}'.format(neighborhood, postalcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

<a id="ref3"></a>
# 3. Group selection
Selecting only the West Toronto borough.

In [24]:
West_Toronto_data = result[result['Borough'] == 'West Toronto'].reset_index(drop=True)
West_Toronto_data.head()

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M6H,West Toronto,"Dufferin , Dovercourt Village",43.669005,-79.442259
1,M6J,West Toronto,"Little Portugal , Trinity",43.647927,-79.41975
2,M6K,West Toronto,"Brockton , Parkdale Village , Exhibition Place",43.636847,-79.428191
3,M6P,West Toronto,"High Park , The Junction South",43.661608,-79.464763
4,M6R,West Toronto,"Parkdale , Roncesvalles",43.64896,-79.456325


### Drawing group on city map
Drawing only the West Toronto neighborhoods.

In [25]:
neighborhood_latitude = West_Toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = West_Toronto_data.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = West_Toronto_data.loc[0, 'Neighborhood'] # neighborhood name
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(West_Toronto_data['Latitude'], West_Toronto_data['Longitude'], West_Toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

**Let's explore the first neighborhood in our dataframe.**

In [26]:
# West_Toronto_data.loc[0, 'Neighborhood']
neighborhood_latitude = West_Toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = West_Toronto_data.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = West_Toronto_data.loc[0, 'Neighborhood'] # neighborhood name
print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Dufferin , Dovercourt Village are 43.66900510000001, -79.4422593.


**Now, let's get up to 100 venues within a radius of 500 meters of Dufferin, Dovercourt Village.**

In [27]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
results = requests.get(url).json()
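
The request above uses `CLIENT_ID`, `CLIENT_SECRET` and `VERSION`, which are not defined in the cells shown in this notebook. One possible way to supply them, sketched below, is to read the credentials from environment variables so they never appear in the notebook. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, the helper `build_explore_url` and the version date are all assumptions made for illustration.

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; placeholders are used when unset.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-id')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret')
VERSION = '20180605'  # example API version date in YYYYMMDD form (assumed)


def build_explore_url(lat, lng, radius=500, limit=100):
    """Build the explore URL with URL-encoded query parameters."""
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


url = build_explore_url(43.669005, -79.442259)
print(url.split('?')[0])  # print only the endpoint, not the credentials
```

Building the query string with `urlencode` instead of `str.format` also escapes any special characters in the parameter values.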

**Now we are ready to clean the json and structure it into a pandas dataframe.**

In [28]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print("size: ",nearby_venues.shape)
nearby_venues.transpose()

size:  (16, 4)


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
name,The Greater Good Bar,Parallel,Planet Fitness,Blood Brothers Brewing,FreshCo,Happy Bakery & Pastries,Rehearsal Factory,Nova Era Bakery,The Sovereign,TD Canada Trust,Food Basics,Rexall,Shoppers Drug Mart,Wallace Emerson Park,Batl Backyard Axe Throwing League,241 Pizza
categories,Bar,Middle Eastern Restaurant,Gym / Fitness Center,Brewery,Grocery Store,Bakery,Music Venue,Bakery,Café,Bank,Supermarket,Pharmacy,Pharmacy,Park,Athletics & Sports,Pizza Place
lat,43.6694,43.6695,43.6676,43.6699,43.6679,43.6671,43.6689,43.6699,43.6731,43.6679,43.6669,43.6675,43.6667,43.6669,43.667,43.6729
lng,-79.4393,-79.4387,-79.4426,-79.4365,-79.4408,-79.4418,-79.4436,-79.4376,-79.4403,-79.4417,-79.4467,-79.4421,-79.4474,-79.4394,-79.4431,-79.441


In [29]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

16 venues were returned by Foursquare.
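
For reference, the nesting that the cleaning step relies on can be seen in a pared-down example. The payload below is hand-built and contains only the keys that the cells above read; a real response carries many more fields. The first item uses values from the table above, while the second is a hypothetical venue with no category, included to show why an empty `categories` list needs a guard.

```python
# Hand-built stand-in for `results`; only the keys used above are present.
sample_results = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'The Greater Good Bar',
                   'categories': [{'name': 'Bar'}],
                   'location': {'lat': 43.6694, 'lng': -79.4393}}},
        {'venue': {'name': 'Uncategorized place',  # hypothetical venue
                   'categories': [],
                   'location': {'lat': 43.6690, 'lng': -79.4400}}},
    ]}]}
}


def flatten_items(results):
    """Return one flat dict per venue, with None for a missing category."""
    rows = []
    for item in results['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue['categories']
        rows.append({
            'name': venue['name'],
            'categories': categories[0]['name'] if categories else None,
            'lat': venue['location']['lat'],
            'lng': venue['location']['lng'],
        })
    return rows


rows = flatten_items(sample_results)
print(rows[0]['categories'], rows[1]['categories'])
```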


In [30]:
toronto_venues = getNearbyVenues(names=West_Toronto_data['Neighborhood'],
                                   latitudes=West_Toronto_data['Latitude'],
                                   longitudes=West_Toronto_data['Longitude']
                                  )
print(toronto_venues.shape)
toronto_venues.head()

Dufferin , Dovercourt Village
Little Portugal , Trinity
Brockton , Parkdale Village , Exhibition Place
High Park , The Junction South
Parkdale , Roncesvalles
Runnymede , Swansea
(163, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Dufferin , Dovercourt Village",43.669005,-79.442259,The Greater Good Bar,43.669409,-79.439267,Bar
1,"Dufferin , Dovercourt Village",43.669005,-79.442259,Parallel,43.669516,-79.438728,Middle Eastern Restaurant
2,"Dufferin , Dovercourt Village",43.669005,-79.442259,Planet Fitness,43.667588,-79.442574,Gym / Fitness Center
3,"Dufferin , Dovercourt Village",43.669005,-79.442259,Blood Brothers Brewing,43.669944,-79.436533,Brewery
4,"Dufferin , Dovercourt Village",43.669005,-79.442259,FreshCo,43.667918,-79.440754,Grocery Store


In [31]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Brockton , Parkdale Village , Exhibition Place",23,23,23,23,23,23
"Dufferin , Dovercourt Village",16,16,16,16,16,16
"High Park , The Junction South",24,24,24,24,24,24
"Little Portugal , Trinity",43,43,43,43,43,43
"Parkdale , Roncesvalles",14,14,14,14,14,14
"Runnymede , Swansea",43,43,43,43,43,43


In [32]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 85 unique categories.


**Analyze Each Neighborhood**

In [33]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
print(toronto_onehot.shape)
toronto_onehot.tail()

(163, 86)


Unnamed: 0,Neighborhood,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Beer Store,...,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
158,"Runnymede , Swansea",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
159,"Runnymede , Swansea",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
160,"Runnymede , Swansea",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
161,"Runnymede , Swansea",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
162,"Runnymede , Swansea",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [34]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped

(6, 86)


Unnamed: 0,Neighborhood,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Beer Store,...,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,"Brockton , Parkdale Village , Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Dufferin , Dovercourt Village",0.0,0.0,0.0,0.0,0.0625,0.125,0.0625,0.0625,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"High Park , The Junction South",0.041667,0.0,0.041667,0.0,0.0,0.041667,0.0,0.083333,0.0,...,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0
3,"Little Portugal , Trinity",0.0,0.023256,0.0,0.046512,0.0,0.0,0.0,0.116279,0.023256,...,0.0,0.023256,0.0,0.0,0.0,0.023256,0.046512,0.023256,0.023256,0.023256
4,"Parkdale , Roncesvalles",0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.071429,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Runnymede , Swansea",0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.0,...,0.046512,0.0,0.023256,0.023256,0.0,0.0,0.023256,0.0,0.0,0.023256
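
Each value in the grouped table is the share of a neighborhood's venues that fall in one category, because the mean of a 0/1 column equals the count divided by the total. As a sketch, the arithmetic for the Dufferin , Dovercourt Village row can be reproduced in plain Python (rather than pandas) from the 16 categories returned for that neighborhood earlier in the notebook.

```python
from collections import Counter

# The 16 venue categories returned for Dufferin , Dovercourt Village.
dufferin_categories = [
    'Bar', 'Middle Eastern Restaurant', 'Gym / Fitness Center', 'Brewery',
    'Grocery Store', 'Bakery', 'Music Venue', 'Bakery', 'Café', 'Bank',
    'Supermarket', 'Pharmacy', 'Pharmacy', 'Park', 'Athletics & Sports',
    'Pizza Place',
]

counts = Counter(dufferin_categories)
total = len(dufferin_categories)

# Relative frequency = venues in the category / all venues in the neighborhood.
freq = {category: n / total for category, n in counts.items()}
print(total, freq['Bakery'], freq['Bank'])
```

Two bakeries out of 16 venues give 0.125, and a single bank gives 0.0625, matching the Bakery and Bank columns of the grouped table.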


## Top 5 venues
Listing the 5 most common venue categories in each neighborhood, with their relative frequencies.

In [35]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Brockton , Parkdale Village , Exhibition Place----
            venue  freq
0            Café  0.13
1       Nightclub  0.09
2     Coffee Shop  0.09
3  Breakfast Spot  0.09
4             Gym  0.04


----Dufferin , Dovercourt Village----
           venue  freq
0         Bakery  0.12
1       Pharmacy  0.12
2        Brewery  0.06
3  Grocery Store  0.06
4           Park  0.06


----High Park , The Junction South----
                venue  freq
0                Café  0.08
1  Mexican Restaurant  0.08
2                 Bar  0.08
3     Thai Restaurant  0.08
4        Antique Shop  0.04


----Little Portugal , Trinity----
                           venue  freq
0                            Bar  0.12
1                     Restaurant  0.07
2                    Coffee Shop  0.05
3               Asian Restaurant  0.05
4  Vegetarian / Vegan Restaurant  0.05


----Parkdale , Roncesvalles----
              venue  freq
0         Gift Shop  0.14
1     Movie Theater  0.07
2  Cuban Restaurant  0.07
3     

## Top 10 venues
Ranking the venue categories by frequency and collecting the top 10 for each neighborhood in a data frame.

In [36]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']
for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)
print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head(6)

(6, 11)


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Brockton , Parkdale Village , Exhibition Place",Café,Breakfast Spot,Coffee Shop,Nightclub,Gym,Grocery Store,Intersection,Italian Restaurant,Convenience Store,Performing Arts Venue
1,"Dufferin , Dovercourt Village",Pharmacy,Bakery,Park,Gym / Fitness Center,Music Venue,Pizza Place,Café,Brewery,Supermarket,Middle Eastern Restaurant
2,"High Park , The Junction South",Café,Thai Restaurant,Bar,Mexican Restaurant,Antique Shop,Furniture / Home Store,Grocery Store,Fried Chicken Joint,Flea Market,Italian Restaurant
3,"Little Portugal , Trinity",Bar,Restaurant,Café,Vegetarian / Vegan Restaurant,Men's Store,Coffee Shop,Asian Restaurant,Miscellaneous Shop,Cocktail Bar,Korean Restaurant
4,"Parkdale , Roncesvalles",Gift Shop,Coffee Shop,Italian Restaurant,Dessert Shop,Movie Theater,Dog Run,Eastern European Restaurant,Restaurant,Breakfast Spot,Bookstore
5,"Runnymede , Swansea",Café,Coffee Shop,Pizza Place,Pub,Restaurant,Italian Restaurant,Sushi Restaurant,Ice Cream Shop,Latin American Restaurant,Juice Bar
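
The column labels above come from a three-element suffix list with a fallback to "th". That works for 1 through 10, but it would produce "21th" if the ranking were ever extended that far. As a sketch, a small helper (the name `ordinal` is hypothetical) yields the same ten labels and also handles larger numbers.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], '|', columns[4], '|', columns[10])
```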


<a id="ref4"></a>
# 4. Cluster Neighborhoods

In [37]:
# set number of clusters
kclusters = 5
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
# check cluster labels generated for each row in the dataframe
# kmeans.labels_[0:10] 
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_merged = West_Toronto_data
# join the sorted venues onto West_Toronto_data to keep latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
toronto_merged # check the last columns!

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M6H,West Toronto,"Dufferin , Dovercourt Village",43.669005,-79.442259,2,Pharmacy,Bakery,Park,Gym / Fitness Center,Music Venue,Pizza Place,Café,Brewery,Supermarket,Middle Eastern Restaurant
1,M6J,West Toronto,"Little Portugal , Trinity",43.647927,-79.41975,1,Bar,Restaurant,Café,Vegetarian / Vegan Restaurant,Men's Store,Coffee Shop,Asian Restaurant,Miscellaneous Shop,Cocktail Bar,Korean Restaurant
2,M6K,West Toronto,"Brockton , Parkdale Village , Exhibition Place",43.636847,-79.428191,3,Café,Breakfast Spot,Coffee Shop,Nightclub,Gym,Grocery Store,Intersection,Italian Restaurant,Convenience Store,Performing Arts Venue
3,M6P,West Toronto,"High Park , The Junction South",43.661608,-79.464763,4,Café,Thai Restaurant,Bar,Mexican Restaurant,Antique Shop,Furniture / Home Store,Grocery Store,Fried Chicken Joint,Flea Market,Italian Restaurant
4,M6R,West Toronto,"Parkdale , Roncesvalles",43.64896,-79.456325,0,Gift Shop,Coffee Shop,Italian Restaurant,Dessert Shop,Movie Theater,Dog Run,Eastern European Restaurant,Restaurant,Breakfast Spot,Bookstore
5,M6S,West Toronto,"Runnymede , Swansea",43.651571,-79.48445,1,Café,Coffee Shop,Pizza Place,Pub,Restaurant,Italian Restaurant,Sushi Restaurant,Ice Cream Shop,Latin American Restaurant,Juice Bar
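
With six neighborhoods and five clusters, most clusters can hold only a single neighborhood. A quick tally of the labels makes this visible. The sketch below is plain Python and is not part of the original analysis; the labels are copied from the `Cluster Labels` column above.

```python
from collections import Counter

# Cluster labels copied from the merged table above.
labels = {
    'Dufferin , Dovercourt Village': 2,
    'Little Portugal , Trinity': 1,
    'Brockton , Parkdale Village , Exhibition Place': 3,
    'High Park , The Junction South': 4,
    'Parkdale , Roncesvalles': 0,
    'Runnymede , Swansea': 1,
}

sizes = Counter(labels.values())
singletons = sorted(c for c, n in sizes.items() if n == 1)
print(dict(sizes))
print(singletons)
```

Only the cluster labelled 1 groups two neighborhoods, so the comparison between clusters is close to a comparison between individual neighborhoods.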


## Cluster Map

In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

#### Cluster 1

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,West Toronto,0,Gift Shop,Coffee Shop,Italian Restaurant,Dessert Shop,Movie Theater,Dog Run,Eastern European Restaurant,Restaurant,Breakfast Spot,Bookstore


#### Cluster 2

In [40]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,West Toronto,1,Bar,Restaurant,Café,Vegetarian / Vegan Restaurant,Men's Store,Coffee Shop,Asian Restaurant,Miscellaneous Shop,Cocktail Bar,Korean Restaurant
5,West Toronto,1,Café,Coffee Shop,Pizza Place,Pub,Restaurant,Italian Restaurant,Sushi Restaurant,Ice Cream Shop,Latin American Restaurant,Juice Bar


#### Cluster 3

In [41]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,West Toronto,2,Pharmacy,Bakery,Park,Gym / Fitness Center,Music Venue,Pizza Place,Café,Brewery,Supermarket,Middle Eastern Restaurant


#### Cluster 4

In [42]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,West Toronto,3,Café,Breakfast Spot,Coffee Shop,Nightclub,Gym,Grocery Store,Intersection,Italian Restaurant,Convenience Store,Performing Arts Venue


#### Cluster 5

In [43]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,West Toronto,4,Café,Thai Restaurant,Bar,Mexican Restaurant,Antique Shop,Furniture / Home Store,Grocery Store,Fried Chicken Joint,Flea Market,Italian Restaurant


<a id="ref5"></a>
# 5. Conclusion

The group to explore was chosen by first keeping the boroughs whose names contain the word "Toronto" and then narrowing the selection to West Toronto.   
The analysis could be made more precise by defining its objectives more clearly.   

A - Cafés are among the most common venue categories in clusters 2, 4 and 5.   
This does not imply that people prefer to drink coffee, only that cafés, like other places to eat, are heavily represented in these neighborhoods.   

B - In cluster 3, the leading categories (pharmacy, bakery, park and gym) do not lead in any other cluster.   

C - Gyms and antique shops appear with the lowest frequencies among the top 5 venues listed.   


<a id="ref6"></a>
# 6. Resources
<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li><a href="https://www.coursera.org/learn/applied-data-science-capstone">Applied Data Science Capstone</a></li>
    <li><a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/">Beautiful Soup Documentation</a></li>
    <li><a href="https://requests.readthedocs.io/en/master/">Requests Documentation</a></li>
    <li><a href="https://pandas.pydata.org/docs/">Pandas Documentation</a></li>
    <li><a href="https://numpy.org/doc/">NumPy Documentation</a></li>
    <li><a href="https://github.com/Azhura/Coursera_Capstone">Labs</a></li>
</ol>
</div>

<div class="alert alert-block alert-info" style="margin-top: 20px">Links to the notebooks</div><br>
<a href="https://github.com/Azhura/Coursera_Capstone/blob/master/Project02a.ipynb">Github - Notebook 1: Data Wrangling</a><br>
<a href="https://github.com/Azhura/Coursera_Capstone/blob/master/Project02b.ipynb">Github - Notebook 2: Geolocation</a><br>
<a href="https://github.com/Azhura/Coursera_Capstone/blob/master/Project02c.ipynb">Github - Notebook 3: GeoData Exploration</a>

<a href="https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/d017ce96-2b8f-41fa-8773-30b2e775c682/view?access_token=bb0eef0b23c151212374a5663c38a1abe998f35f6bffe18ea841cd80479c272b">Display with map - Notebook 3: GeoData Exploration</a>

This notebook was created by [Carlos Alberto Gómez Prado](https://www.linkedin.com/in/carlospradobigdata/) as an assignment for the IBM Coursera course.   

This notebook is part of a course on **Coursera** called *Applied Data Science Capstone*. If you accessed this notebook outside the course, you can take this course online by clicking [here](https://www.coursera.org/professional-certificates/ibm-data-science).   
Copyright &copy; 2018 [Cognitive Class](https://cognitiveclass.ai/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).

---