# Clustering and Segmenting Neighbourhoods in the City of Toronto, Canada

## ---------------------------------------------------------------------------------------------------------------------------------------------
## Part One

In [60]:
import pandas as pd # library for data analysis
import requests # library for handling HTTP requests
from pandas import json_normalize # converts a JSON object into a pandas dataframe
import json # library for handling JSON files
import numpy as np # library for handling vectorized data
# Matplotlib and associated modules for plotting
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans


!pip3 install beautifulsoup4
from bs4 import BeautifulSoup


print('Libraries imported.')

Libraries imported.


In [61]:
url='https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1008658788'
result = requests.get(url)

# parse the page once, naming the parser explicitly, and take the first table
soup = BeautifulSoup(result.content, 'html.parser')
table_str = str(soup.find('table'))

df = pd.read_html(table_str)[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


-------------------------------------------------------------------------------------------------

Process only the cells that have an assigned borough. Ignore cells whose borough is Not assigned.

In [62]:
df = df[df.Borough != 'Not assigned'].reset_index(drop=True)
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


-------------------------------------------------------------------------------------------------

More than one neighbourhood can exist in a single postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighbourhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighbourhoods separated by a comma.

In [63]:
df_grouped = df.groupby(['Postal Code','Borough'], as_index=False).agg(lambda x:','.join(x))
df_grouped.head(20)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [64]:
df_grouped.shape

(103, 3)
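The revision of the table scraped above already lists each postal code once, so the effect of the grouping step is not visible in the output. The following is a minimal sketch on hypothetical toy rows (the `toy` dataframe is an assumption, not data from the page) showing how duplicate postal codes would be combined into one row.

```python
import pandas as pd

# Toy rows (hypothetical data) in which one postal code is listed twice.
toy = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M1G"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Scarborough"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Woburn"],
})

# Group on the key columns and join the neighbourhood names with ", ".
combined = (
    toy.groupby(["Postal Code", "Borough"], as_index=False)
       .agg({"Neighbourhood": ", ".join})
)
print(combined)
```

Joining with `", "` rather than `","` keeps the separator consistent with names that are already comma-separated in the source table.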

## -----------------------------------------------------------------------------------------------------------------------------------


## Part Two

In [65]:
# download the latitude and longitude data (link provided by Coursera)
!wget http://cocl.us/Geospatial_data
latlon = pd.read_csv('Geospatial_data')
df_grouped = pd.merge(df_grouped, latlon, how= 'inner', on = 'Postal Code')
    
print(df_grouped.shape)
df_grouped.head(10)


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
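The merge above uses an inner join, which silently drops any postal code that has no match in the coordinates file. As a hedged sketch on hypothetical toy data (the `postal` and `coords` frames and the `M9Z` code are assumptions), a left join with `indicator=True` and `validate="one_to_one"` can make unmatched or duplicated keys visible before continuing.

```python
import pandas as pd

# Toy inputs (hypothetical); "M9Z" deliberately has no coordinates.
postal = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# validate raises if a key is duplicated on either side;
# indicator adds a "_merge" column describing where each row came from.
checked = pd.merge(postal, coords, how="left", on="Postal Code",
                   validate="one_to_one", indicator=True)

unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

If `unmatched` is empty, the inner join used in the notebook loses no rows.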


## -----------------------------------------------------------------------------------------------------------------------------------

## Part Three

In [66]:
print('The dataframe has {} boroughs and {} neighbourhoods.'.format(
        len(df_grouped['Borough'].unique()),
        df_grouped.shape[0]
    )
)

The dataframe has 10 boroughs and 103 neighbourhoods.


In [67]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


Map of Toronto with the neighbourhoods superimposed on top.

In [68]:
!pip3 install folium

import folium # library for plotting maps


In [69]:
# create a map of Toronto using the latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to the map
for lat, lng, borough, Neighbourhood in zip(df_grouped['Latitude'], df_grouped['Longitude'], df_grouped['Borough'], df_grouped['Neighbourhood']):
    label = '{}, {}'.format(Neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Exploring the borough of Downtown Toronto

In [70]:
downtown_toronto = df_grouped[df_grouped['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
print(downtown_toronto.shape)
downtown_toronto.head()

(19, 5)


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
1,M4X,Downtown Toronto,"St. James Town, Cabbagetown",43.667967,-79.367675
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
3,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


In [71]:
address = 'Downtown Toronto ,Toronto, Ontario'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create a map of Downtown Toronto using latitude and longitude values
map_downtown = folium.Map(location=[latitude, longitude], zoom_start= 11)

# add markers to map
for lat, lng, borough, neighbourhood in zip(downtown_toronto['Latitude'], downtown_toronto['Longitude'], 
                                           downtown_toronto['Borough'], downtown_toronto['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown)  
    
map_downtown


Let's explore the first neighbourhood in the dataframe.

In [73]:
#define the latitude and longitude using above dataframe
lat = downtown_toronto.loc[0, 'Latitude'] # neighbourhood latitude value
lon = downtown_toronto.loc[0, 'Longitude'] # neighbourhood longitude value

neighbourhood_name = downtown_toronto.loc[0, 'Neighbourhood'] # neighbourhood name
print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, lat, lon))


Latitude and longitude values of Rosedale are 43.6795626, -79.3775294.


Let's get up to 50 venues in Rosedale within a radius of 500 metres.

In [74]:
url = "https://api.foursquare.com/v3/places/search?ll=43.67%2C-79.37&radius=500&limit=50"

headers = {
    "Accept": "application/json",
    "Authorization": "fsq3cOLGmaKNDeZ5oyVzwhYI0QvLXsKNAH+CtgNIGfA1TUk="
}

#results = requests.get(url, headers=headers)
#results=results.json()
results = requests.get(url, headers=headers).json()
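The request URL above hardcodes the coordinates as `ll=43.67%2C-79.37`. A minimal sketch, assuming one wanted to build the same query string from the `lat` and `lon` values read from the dataframe, might look like the following. The helper name `build_search_url` is hypothetical, and querying with the exact coordinates would likely return different venues from those recorded in the outputs below.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v3/places/search"

def build_search_url(lat, lon, radius=500, limit=50):
    """Build the search URL; urlencode escapes the comma in 'll' as %2C."""
    query = urlencode({"ll": f"{lat},{lon}", "radius": radius, "limit": limit})
    return f"{BASE_URL}?{query}"

# Coordinates of Rosedale as printed earlier in the notebook.
search_url = build_search_url(43.6795626, -79.3775294)
print(search_url)
```

Building the query from a dictionary keeps the radius and limit in one place, so they cannot drift from the values described in the text.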

In [75]:
# function to extract the category of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [76]:
venues = results['results']
    
nearby_venues = json_normalize(venues) # flatten the JSON object

# filter columns
filtered_columns = ['name', 'categories','geocodes.main.latitude', 'geocodes.main.longitude']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# extract the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean up the column names
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,latitude,longitude,categories.1
0,Red Cranberries Restaurant,"[{'id': 13026, 'name': 'BBQ Joint', 'icon': {'...",43.667842,-79.369195,BBQ Joint
1,F'Amelia,"[{'id': 13064, 'name': 'Pizzeria', 'icon': {'p...",43.667484,-79.368718,Pizzeria
2,Absolute Bakery & Cafe,"[{'id': 13002, 'name': 'Bakery', 'icon': {'pre...",43.667516,-79.36905,Bakery
3,Wellesley Parliament Square,"[{'id': 16041, 'name': 'Plaza', 'icon': {'pref...",43.668494,-79.370281,Plaza
4,Butter Chicken Factory,"[{'id': 13055, 'name': 'Fried Chicken Joint', ...",43.666889,-79.369184,Fried Chicken Joint


In [77]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

33 venues were returned by Foursquare.


### Exploring Neighbourhoods in Toronto

In [78]:
# @hidden.cell
headers = {"Accept": "application/json", "Authorization": "fsq3cOLGmaKNDeZ5oyVzwhYI0QvLXsKNAH+CtgNIGfA1TUk="}
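The cell above embeds the API key directly in the notebook. As a hedged alternative, the sketch below reads the key from an environment variable instead. The variable name `FOURSQUARE_API_KEY` and the helper `foursquare_headers` are assumptions made for illustration, not part of the Foursquare API.

```python
import os

def foursquare_headers(env_var="FOURSQUARE_API_KEY"):
    """Return request headers, taking the key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {"Accept": "application/json", "Authorization": api_key}

# Usage (after exporting the variable in the shell or notebook runtime):
# headers = foursquare_headers()
```

This keeps the credential out of the saved notebook while leaving the rest of the request code unchanged.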


In [79]:
df_grouped['Latitude'] = df_grouped['Latitude'].round(2)
df_grouped['Longitude'] = df_grouped['Longitude'].round(2)
df_grouped.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.81,-79.19
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.78,-79.16
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.76,-79.19
3,M1G,Scarborough,Woburn,43.77,-79.22
4,M1H,Scarborough,Cedarbrae,43.77,-79.24


In [80]:
# define the function
def getNearbyVenues(names, latitudes, longitudes, radius=500,limit=5):

    URL= "https://api.foursquare.com/v3/places/nearby?ll={},{}&radius={}&limit={}"

    df_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        url = URL.format(lat, lng, radius, limit)
        results = requests.request("GET", url, headers=headers).json()
        
        for each_result in results['results']: # filter the result based on JSON identification
            result={}
            result['Neighbourhood']=name
            result['Neighbourhood Latitude']=lat
            result['Neighbourhood Longitude']=lng
            result['Name']=each_result['name']
            result['Venue Latitude']=each_result['geocodes']['main']['latitude']
            result['Venue Longitude']=each_result['geocodes']['main']['longitude']
            result['Category_Names']=[each_name['name'] for each_name in each_result['categories']]
            df_list.append(result.copy())
    return pd.DataFrame(df_list) # return dataframe

In [81]:
toronto_venues = getNearbyVenues(names=df_grouped['Neighbourhood'],
                                   latitudes=df_grouped['Latitude'],
                                   longitudes=df_grouped['Longitude']
                                )

In [82]:
toronto_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Name,Venue Latitude,Venue Longitude,Category_Names
0,"Malvern, Rouge",43.81,-79.19,Upper Rouge Trail,43.810019,-79.186288,[Hiking Trail]
1,"Malvern, Rouge",43.81,-79.19,Canadian Appliance Source Whitby,43.808353,-79.191331,[Home Service]
2,"Malvern, Rouge",43.81,-79.19,Scarsview Chrysler Dodge Jeep Ram Fiat,43.800567,-79.189605,[Automotive Retail]
3,"Rouge Hill, Port Union, Highland Creek",43.78,-79.16,Colonel Danforth Park,43.777136,-79.16488,[Playground]
4,"Rouge Hill, Port Union, Highland Creek",43.78,-79.16,Royal Canadian Legion,43.782283,-79.162816,[Government Department / Agency]


In [104]:
toronto_venues["Category_Names"]=toronto_venues["Category_Names"].apply(str)
toronto_venues.dtypes

Neighbourhood               object
Neighbourhood Latitude     float64
Neighbourhood Longitude    float64
Name                        object
Venue Latitude             float64
Venue Longitude            float64
Category_Names              object
dtype: object
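Converting each list with `str` yields labels such as `['Hiking Trail']` and `[]`, which then appear verbatim as category names in the later tables. A minimal sketch of an alternative, assuming a readable label is preferred, joins the names and gives empty lists an explicit placeholder. The helper `category_label` and the placeholder text are hypothetical.

```python
def category_label(names, empty="Uncategorized"):
    """Turn a list of category names into one readable label."""
    return " / ".join(names) if names else empty

print(category_label(["Hiking Trail"]))
print(category_label(["Lounge", "Caribbean Restaurant"]))
print(category_label([]))
```

It could be applied in place of `str`, for example `toronto_venues["Category_Names"].apply(category_label)`, at the cost of changing the category strings and therefore the outputs shown in the rest of the notebook.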

In [84]:
toronto_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Name,Venue Latitude,Venue Longitude,Category_Names
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,5,5,5,5,5,5
"Alderwood, Long Branch",5,5,5,5,5,5
"Bathurst Manor, Wilson Heights, Downsview North",5,5,5,5,5,5
Bayview Village,5,5,5,5,5,5
"Bedford Park, Lawrence Manor East",5,5,5,5,5,5
...,...,...,...,...,...,...
"Willowdale, Willowdale West",4,4,4,4,4,4
Woburn,3,3,3,3,3,3
Woodbine Heights,3,3,3,3,3,3
York Mills West,4,4,4,4,4,4


In [85]:
print('There are {} unique categories.'.format(len(toronto_venues['Category_Names'].unique())))


There are 186 unique categories.


## Analysing the Neighbourhoods

In [86]:
# one-hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Category_Names']], prefix="", prefix_sep="")

# add the neighbourhood column back to the dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move the neighbourhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,"[""Doctor's Office""]","[""Farmers' Market""]",['Art Studio'],"['Arts and Entertainment', 'Restaurant']",['Arts and Entertainment'],"['Automotive Repair Shop', 'Car Wash and Detail']","['Automotive Repair Shop', 'Towing Service']",['Automotive Retail'],['Automotive Service'],...,['Transport Hub'],['Travel Agency'],['Turkish Restaurant'],['Tutoring Service'],['Urban Park'],['Veterinarian'],['Warehouse / Wholesale Store'],['Wholesaler'],['Winery'],[]
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Malvern, Rouge",0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
3,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [87]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,"[""Doctor's Office""]","[""Farmers' Market""]",['Art Studio'],"['Arts and Entertainment', 'Restaurant']",['Arts and Entertainment'],"['Automotive Repair Shop', 'Car Wash and Detail']","['Automotive Repair Shop', 'Towing Service']",['Automotive Retail'],['Automotive Service'],...,['Transport Hub'],['Travel Agency'],['Turkish Restaurant'],['Tutoring Service'],['Urban Park'],['Veterinarian'],['Warehouse / Wholesale Store'],['Wholesaler'],['Winery'],[]
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.2,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.2,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.2,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.2,0.0,0.00,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
95,Woburn,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
96,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
97,York Mills West,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
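The two steps above (one-hot encoding, then averaging per neighbourhood) turn venue rows into category frequencies. The following sketch reproduces the idea on hypothetical toy data, so the arithmetic can be checked by hand. The names `toy_venues`, `onehot` and `freq` are assumptions.

```python
import pandas as pd

# Toy venues (hypothetical): neighbourhood A has one Park and one Cafe,
# neighbourhood B has two Parks.
toy_venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "B", "B"],
    "Category": ["Park", "Cafe", "Park", "Park"],
})

# One indicator column per category, stored as floats.
onehot = pd.get_dummies(toy_venues["Category"], dtype=float)
onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood"])

# The mean of the indicators is the share of venues in each category.
freq = onehot.groupby("Neighbourhood").mean().reset_index()
print(freq)
```

Each row of `freq` sums to 1 across the category columns, which is why a neighbourhood with five venues shows values in steps of 0.2 in the table above.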


## The 5 Most Common Venues in Each Neighbourhood

In [88]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0       ['Department Store']   0.4
1  ['Sporting Goods Retail']   0.2
2             ['Wholesaler']   0.2
3     ['Automotive Service']   0.2
4         ['Medical Center']   0.0


----Alderwood, Long Branch----
                                             venue  freq
0                    ['Car Parts and Accessories']   0.2
1                                   ['Art Studio']   0.2
2           ['Business and Professional Services']   0.2
3  ['Restaurant', 'Cafes, Coffee, and Tea Houses']   0.2
4                                  ['Sports Club']   0.2


----Bathurst Manor, Wilson Heights, Downsview North----
                   venue  freq
0   ['Spiritual Center']   0.2
1           ['Bus Stop']   0.2
2  ['Healthcare Clinic']   0.2
3        ['High School']   0.2
4               ['Park']   0.2


----Bayview Village----
                             venue  freq
0  ['Grocery Store / Supermarket']   0.2
1             ['Community Center']   0.2
2   

In [89]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
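Because at most five venues are fetched per neighbourhood, sorting all categories also ranks categories whose frequency is 0.0, as in the Agincourt listing above where the fifth entry has frequency 0.0. A hedged variant of the function that keeps only categories actually observed might look like this. The name `top_nonzero_venues` and the toy row are hypothetical.

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Return up to num_top_venues category names with frequency > 0."""
    # Skip the first entry (the neighbourhood name) and coerce to numbers.
    categories = pd.to_numeric(row.iloc[1:])
    categories = categories[categories > 0].sort_values(ascending=False)
    return categories.index[:num_top_venues].tolist()

# Toy row (hypothetical values) shaped like one row of toronto_grouped.
toy_row = pd.Series({"Neighbourhood": "Agincourt", "Department Store": 0.4,
                     "Wholesaler": 0.2, "Medical Center": 0.0})
print(top_nonzero_venues(toy_row, 5))
```

The returned list can be shorter than `num_top_venues`, so any code that writes it into fixed-width columns would need to pad it first.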

In [90]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create the columns according to the number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,['Department Store'],['Sporting Goods Retail'],['Wholesaler'],['Automotive Service'],['Medical Center'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
1,"Alderwood, Long Branch",['Car Parts and Accessories'],['Art Studio'],['Business and Professional Services'],"['Restaurant', 'Cafes, Coffee, and Tea Houses']",['Sports Club'],"[""Doctor's Office""]",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
2,"Bathurst Manor, Wilson Heights, Downsview North",['Spiritual Center'],['Bus Stop'],['Healthcare Clinic'],['High School'],['Park'],['Office Supply Store'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
3,Bayview Village,['Grocery Store / Supermarket'],['Community Center'],['Shopping Mall'],['Bike Trail'],['Tutoring Service'],['Office Building'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School']
4,"Bedford Park, Lawrence Manor East",['Coffee Shop'],['Boutique'],"['Clothing Store', ""Women's Store""]",['Gourmet Store'],['Dentist'],['Office'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']
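The column names above are built with a three-element suffix list and a fallback to "th". As a small sketch of the same idea, a helper can compute the English ordinal suffix for any positive integer, which also covers cases such as 11th or 22nd if more columns were ever requested. The helper name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```

For `num_top_venues = 10` this produces the same column names as the loop in the notebook.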


## Clustering Neighbourhoods

In [91]:
# set the number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check the cluster labels generated for the first 10 rows of the dataframe
kmeans.labels_[0:10] 


array([2, 2, 4, 2, 3, 2, 2, 4, 2, 2], dtype=int32)

In [92]:

toronto_merged = df_grouped

toronto_merged.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.81,-79.19
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.78,-79.16
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.76,-79.19
3,M1G,Scarborough,Woburn,43.77,-79.22
4,M1H,Scarborough,Cedarbrae,43.77,-79.24


In [95]:
# add the cluster labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_grouped

# join df_grouped with neighbourhoods_venues_sorted to add the cluster labels and top venues for each neighbourhood
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.81,-79.19,2,['Automotive Retail'],['Hiking Trail'],['Home Service'],"[""Doctor's Office""]",['Office'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.78,-79.16,2,['Playground'],['Daycare'],['Arts and Entertainment'],['Government Department / Agency'],['Nail Salon'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],"['Night Club', 'Lounge', 'Restaurant']"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.76,-79.19,0,['Elementary School'],['High School'],['Daycare'],['Car Parts and Accessories'],"['Pet Service', 'Pet Supplies Store']",['Party Supply Store'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
3,M1G,Scarborough,Woburn,43.77,-79.22,3,['Coffee Shop'],['Convenience Store'],['Office'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']",['Office Building']
4,M1H,Scarborough,Cedarbrae,43.77,-79.24,0,[],"['Lounge', 'Caribbean Restaurant']",['Burger Joint'],['Elementary School'],['Hardware Store'],"['Office', 'Office Building']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']


In [96]:
# create the map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set the colour scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
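The cluster labels reach `toronto_merged` through a join on the neighbourhood name, and `toronto_grouped` has 98 rows while `df_grouped` has 103, so some rows may carry a missing label. The sketch below shows one way to map a label to a marker colour that tolerates missing or float-typed labels. The palette, the grey fallback and the helper name `cluster_colour` are assumptions.

```python
import math

# Hypothetical five-colour palette, one entry per cluster.
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00"]

def cluster_colour(label, palette=PALETTE, missing="#808080"):
    """Return a hex colour for a cluster label, or grey if it is missing."""
    if label is None or (isinstance(label, float) and math.isnan(label)):
        return missing
    # int() handles labels stored as floats (e.g. 2.0) after a join.
    return palette[int(label) % len(palette)]

print(cluster_colour(0), cluster_colour(4.0), cluster_colour(float("nan")))
```

Indexing the palette by the label itself (rather than `cluster-1`) keeps cluster 0 on the first colour, which can make the map legend easier to reason about.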

## Examining the Clusters

Cluster 1

In [97]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Scarborough,0,['Elementary School'],['High School'],['Daycare'],['Car Parts and Accessories'],"['Pet Service', 'Pet Supplies Store']",['Party Supply Store'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
4,Scarborough,0,[],"['Lounge', 'Caribbean Restaurant']",['Burger Joint'],['Elementary School'],['Hardware Store'],"['Office', 'Office Building']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
11,Scarborough,0,['Elementary School'],['Furniture and Home Store'],"['Fast Food Restaurant', 'Pizzeria']",['Burger Joint'],"['Pet Service', 'Pet Supplies Store']","['Night Club', 'Lounge', 'Restaurant']",['Medical Center'],['Mediterranean Restaurant'],['Metro Station'],"['Pizzeria', 'Fast Food Restaurant']"
14,Scarborough,0,['Elementary School'],['Drugstore'],['Travel Agency'],"[""Doctor's Office""]",['Office Building'],['Medical Center'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School']
25,North York,0,['Elementary School'],['Public Art'],['Education'],['Office Building'],['Medical Center'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
40,East York,0,[],['Filipino Restaurant'],['Elementary School'],['Harbor / Marina'],['Hardware Store'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
45,Central Toronto,0,['Elementary School'],"['Property Management Office', 'Residential Bu...",['Swimming Pool'],['Hotel'],['Party Supply Store'],"['Pet Service', 'Pet Supplies Store']",['Mediterranean Restaurant'],['Metro Station'],['Pier'],['Middle Eastern Restaurant']
64,Central Toronto,0,['Elementary School'],['Fuel Station'],['Office Supply Store'],['Medical Center'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']
74,York,0,['Elementary School'],['Clothing Store'],['Park'],['Bakery'],"['Office', 'Office Building']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']
91,Etobicoke,0,['Elementary School'],['Dentist'],['Playground'],['Harbor / Marina'],['Hardware Store'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']


Cluster 2

In [98]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
52,Downtown Toronto,1,['Office'],['Convenience Store'],['Residential Building'],"['Office', 'Office Building']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']"
55,Downtown Toronto,1,['Office'],['Hotel'],"['Pub', 'Restaurant']","['Bagel Shop', 'Coffee Shop', 'Restaurant']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']"
58,Downtown Toronto,1,['Office'],['Hotel'],"['Pub', 'Restaurant']","['Bagel Shop', 'Coffee Shop', 'Restaurant']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']"
60,Downtown Toronto,1,['Office'],['Hotel'],"['Pub', 'Restaurant']","['Bagel Shop', 'Coffee Shop', 'Restaurant']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']"
61,Downtown Toronto,1,['Office'],['Hotel'],"['Pub', 'Restaurant']","['Bagel Shop', 'Coffee Shop', 'Restaurant']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']"
70,Downtown Toronto,1,['Office'],['Hotel'],"['Pub', 'Restaurant']","['Bagel Shop', 'Coffee Shop', 'Restaurant']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']"


Cluster 3

In [99]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,2,['Automotive Retail'],['Hiking Trail'],['Home Service'],"[""Doctor's Office""]",['Office'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']
1,Scarborough,2,['Playground'],['Daycare'],['Arts and Entertainment'],['Government Department / Agency'],['Nail Salon'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],"['Night Club', 'Lounge', 'Restaurant']"
5,Scarborough,2,"['Fast Food Restaurant', 'Burger Joint']",['Primary and Secondary School'],"['Café', 'Coffee Shop', 'Donut Shop']",['Convenience Store'],['BBQ Joint'],"[""Doctor's Office""]","['Office', 'Office Building']",['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
6,Scarborough,2,['Government Department / Agency'],['Restaurant'],"['Restaurant', 'Cafes, Coffee, and Tea Houses']","[""Doctor's Office""]",['Office Building'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
7,Scarborough,2,['Playground'],['Metro Station'],['Office Supply Store'],['Medical Center'],['Mediterranean Restaurant'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']"
8,Scarborough,2,['Chiropractor'],['Cemetery'],['Residential Building'],"[""Doctor's Office""]","['Office', 'Office Building']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']
9,Scarborough,2,['Transmissions Shop'],['Arts and Entertainment'],"['Automotive Repair Shop', 'Towing Service']","['Café', 'Restaurant']",['Beach'],"['Office', 'Office Building']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
10,Scarborough,2,['Winery'],"['Brewery', 'Restaurant']",['Caterer'],['Design Studio'],"[""Doctor's Office""]",['Office Supply Store'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
12,Scarborough,2,['Department Store'],['Sporting Goods Retail'],['Wholesaler'],['Automotive Service'],['Medical Center'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
13,Scarborough,2,"['Fast Food Restaurant', 'American Restaurant']",['Healthcare Clinic'],['Chinese Restaurant'],['Thai Restaurant'],"[""Doctor's Office""]","['Office', 'Office Building']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
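
The per-cluster cells all repeat the same selection: filter rows by `Cluster Labels`, then keep column 1 (`Borough`) plus every column from position 5 onward. A minimal sketch of that pattern as a reusable function follows. The helper name `cluster_view` and the toy DataFrame are hypothetical, and the `Latitude`/`Longitude` columns are an assumption about what occupies positions 3 and 4 in `toronto_merged`.

```python
import pandas as pd

def cluster_view(df, label, label_col="Cluster Labels", first_col=5):
    """Return the rows of one cluster, keeping column 1 and columns first_col onward.

    Hypothetical helper: it only wraps the selection used in the notebook cells.
    """
    keep = [1] + list(range(first_col, df.shape[1]))  # positional column indices
    return df.loc[df[label_col] == label, df.columns[keep]]

# Toy stand-in for toronto_merged (placeholder values, assumed column layout)
toy = pd.DataFrame({
    "Postal Code": ["M1B", "M4B", "M4K"],
    "Borough": ["Scarborough", "East York", "East Toronto"],
    "Neighbourhood": ["A", "B", "C"],
    "Latitude": [43.80, 43.70, 43.68],
    "Longitude": [-79.19, -79.31, -79.35],
    "Cluster Labels": [2, 3, 3],
    "1st Most Common Venue": ["Playground", "Coffee Shop", "Coffee Shop"],
})

# Label 3 corresponds to the section titled "Cluster 4" (labels are zero-based)
cluster_4 = cluster_view(toy, 3)
print(cluster_4)
```

Under the same assumptions, `cluster_view(toronto_merged, 2)` would return the same rows and columns as the cell above. The original row index is preserved, which is why the tables show non-consecutive index values.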


Cluster 4

In [100]:
# Rows with cluster label 3 (labels are zero-based): keep Borough (column 1) plus columns 5 onward
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Scarborough,3,['Coffee Shop'],['Convenience Store'],['Office'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']",['Office Building']
38,East York,3,['Coffee Shop'],['Sporting Goods Retail'],['Liquor Store'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']"
41,East Toronto,3,['Coffee Shop'],"['Greek Restaurant', 'Mediterranean Restaurant']",['Bakery'],['Bridge'],['Hardware Store'],"['Health and Medicine', 'Drugstore']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
43,East Toronto,3,['Coffee Shop'],"['Coffee Shop', 'Restaurant']",['Tattoo Parlor'],['Education'],"['Interior Designer', 'Painter']","['Office', 'Office Building']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
57,Downtown Toronto,3,['Coffee Shop'],['Dentist'],['Metro Station'],['Restaurant'],['Hospital'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']"
62,North York,3,['Coffee Shop'],['Boutique'],"['Clothing Store', ""Women's Store""]",['Gourmet Store'],['Dentist'],['Office'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']
81,York,3,['Coffee Shop'],['Discount Store'],['Fuel Station'],"['Office', 'Office Building']",['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']"
85,Downtown Toronto,3,['Coffee Shop'],['Dentist'],['Metro Station'],['Restaurant'],['Hospital'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon'],"['Night Club', 'Lounge', 'Restaurant']"
90,Etobicoke,3,['Coffee Shop'],['Metro Station'],['Nail Salon'],['Gastropub'],"['Office', 'Office Building']",['Mediterranean Restaurant'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],"['Night Club', 'Lounge', 'Restaurant']"


Cluster 5

In [101]:
# Rows with cluster label 4 (labels are zero-based): keep Borough (column 1) plus columns 5 onward
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,North York,4,['Education'],['Primary and Secondary School'],['Hiking Trail'],['Park'],['Office Supply Store'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
20,North York,4,['Park'],['Convenience Store'],['Rehabilitation Center'],"[""Doctor's Office""]",['Office Supply Store'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']
22,North York,4,['Residential Building'],['Park'],"[""Doctor's Office""]",['Office Supply Store'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']
23,North York,4,['Tennis'],['Veterinarian'],['Park'],['Convenience Store'],"[""Doctor's Office""]",['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
28,North York,4,['Spiritual Center'],['Bus Stop'],['Healthcare Clinic'],['High School'],['Park'],['Office Supply Store'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
34,North York,4,['Park'],['Other Great Outdoors'],['Hiking Trail'],"[""Doctor's Office""]",['Office Supply Store'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
48,Central Toronto,4,[],['Park'],['Office Supply Store'],['Medical Center'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']
50,Downtown Toronto,4,['Park'],['Bike Trail'],"[""Doctor's Office""]",['Office Supply Store'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater'],['Nail Salon']
63,Central Toronto,4,[],['Restaurant'],['Park'],['Office Building'],['Playground'],['Hardware Store'],"['Office', 'Office Building']",['Middle Eastern Restaurant'],['Middle School'],['Harbor / Marina']
65,Central Toronto,4,['Park'],['Education'],"['Diner', 'Vegan and Vegetarian Restaurant']","[""Doctor's Office""]",['Office Building'],['Mediterranean Restaurant'],['Metro Station'],['Middle Eastern Restaurant'],['Middle School'],['Movie Theater']
