# Agrupación y Segmentación de Vecindarios en la Ciudad de Toronto, Canadá

Importación de las librerias

In [1]:
print("Inicio de importación")
import pandas as pd
import requests
from bs4 import BeautifulSoup
import numpy as np
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
print("Importación Exitosa")

Inicio de importación
Importación Exitosa


## Primer parte

Creación de la tabla tomando los datos de Tornto desde Wikipedia

In [2]:
from IPython.display import IFrame
url = "https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=958425468"

Uso de Pandas para procesar los datos y crear la tabla

In [3]:
wiki_html = requests.get(url).text
soup = BeautifulSoup(wiki_html, 'html.parser')
data = []
for tr in soup.tbody.find_all('tr'):
    data.append([td.get_text().strip() for td in tr.find_all('td')])
dataFrame = pd.DataFrame(data,columns=['CodigoPostal','Ciudad','Barrio'])
dataFrame.head(12)

Unnamed: 0,CodigoPostal,Ciudad,Barrio
0,,,
1,M1A,Not assigned,
2,M2A,Not assigned,
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,"Regent Park, Harbourfront"
6,M6A,North York,"Lawrence Manor, Lawrence Heights"
7,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M8A,Not assigned,
9,M9A,Etobicoke,Islington Avenue


Borramos la primera linea vacía y limpiamos las ciudades no asignadas

In [4]:
df = dataFrame.dropna()
df['Ciudad'].replace('Not assigned',np.nan,inplace=True)
df.dropna(subset=['Ciudad'],inplace=True)
df.head(12)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Ciudad'].replace('Not assigned',np.nan,inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.dropna(subset=['Ciudad'],inplace=True)


Unnamed: 0,CodigoPostal,Ciudad,Barrio
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,"Regent Park, Harbourfront"
6,M6A,North York,"Lawrence Manor, Lawrence Heights"
7,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
9,M9A,Etobicoke,Islington Avenue
10,M1B,Scarborough,"Malvern, Rouge"
12,M3B,North York,Don Mills
13,M4B,East York,"Parkview Hill, Woodbine Gardens"
14,M5B,Downtown Toronto,"Garden District, Ryerson"


Agrego el nombre de la ciudad al barrio si el barrio no esta asignado pero tiene ciudad

In [5]:
for index, row in df.iterrows():
    if row["Barrio"] == "Not assigned":
        row["Barrio"] = row["Ciudad"]
df.head()

Unnamed: 0,CodigoPostal,Ciudad,Barrio
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,"Regent Park, Harbourfront"
6,M6A,North York,"Lawrence Manor, Lawrence Heights"
7,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Agrupo el conjunto de datos por codigo postal y ciudad

In [6]:
df = df.groupby(['CodigoPostal', 'Ciudad',])['Barrio'].apply(', '.join).reset_index()
df.columns = ['Codigo Postal', 'Ciudad', 'Barrio']
df.head(50)

Unnamed: 0,Codigo Postal,Ciudad,Barrio
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [7]:
df.shape

(103, 3)

## Segunda parte

Para obtener los resultados de las coordenadas utilizaremos el archivo .csv

In [8]:
geospatial_data = pd.read_csv('https://cocl.us/Geospatial_data')

Verificamos que el tamaño de la colección de datos obtenidos es igual al del conjunto procesado anterior

In [9]:
geospatial_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [10]:
print("Conjunto de datos procesado anterior",df.shape)
print("Geopatial data ", geospatial_data.shape)

Conjunto de datos procesado anterior (103, 3)
Geopatial data  (103, 3)


Coinciden ambas por lo tanto se procesó la misma cantidad

Agregaremos la información de las coordenadas al conjunto de datos de codigos postales

In [11]:
#combinaremos las columnas utilizando un join por la clave codigo postal
df = pd.concat(
    [df.set_index('Codigo Postal'), geospatial_data.set_index('Postal Code')],
    axis=1, join='inner')
df.head()

Unnamed: 0,Ciudad,Barrio,Latitude,Longitude
M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
M1G,Scarborough,Woburn,43.770992,-79.216917
M1H,Scarborough,Cedarbrae,43.773136,-79.239476


Se perdió el nombre de la columna Codigo Postal ya que se utilizo como clave, por lo que corregimos agregando nuevamente el nombre de la columna

In [12]:
df.reset_index(inplace=True)
df.head()
df.rename(columns={'index': 'Codigo_Postal'}, inplace=True)
df.head(12)

Unnamed: 0,Codigo_Postal,Ciudad,Barrio,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


## Parte 3 

Se obtiene los datos de longitud y latitud utilizando geopy

In [13]:
ciudad = 'Toronto, Canada'
geolocator = Nominatim(user_agent="toronto_explorer")
coordenadas = geolocator.geocode(ciudad)
print('Coordenadas de Toronto: {}, {}.'.format(
    coordenadas.latitude, coordenadas.longitude))

Coordenadas de Toronto: 43.6534817, -79.3839347.


Mapa de toronto utilizando Folium

In [14]:
mapa_toronto = folium.Map(
    location=[coordenadas.latitude, coordenadas.longitude],
    zoom_start=10)
mapa_toronto


Se agrega la información procesada de barrios y ciudad al mapa

In [15]:
for _, row in df.iterrows():
    label = '{} ({}), {}'.format(
        row.Codigo_Postal, row.Barrio, row.Ciudad)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [row.Latitude, row.Longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(mapa_toronto) 
    
mapa_toronto   

Se va a filtrar solamente por los barrios de la ciudad de Toronto

In [16]:
# Nuevo data frame solamente con la ciudad de Toronto
toronto_data = df[df.Ciudad.str.contains('Toronto')]
toronto_data.reset_index(inplace=True, drop=True)
toronto_data

Unnamed: 0,Codigo_Postal,Ciudad,Barrio,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197
6,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.715383,-79.405678
7,M4S,Central Toronto,Davisville,43.704324,-79.38879
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
9,M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",43.686412,-79.400049


In [17]:
toronto_data.shape

(39, 5)

Visualizamos los datos de toronto en el mapa

In [18]:
mapa_toronto = folium.Map(
    location=[coordenadas.latitude, coordenadas.longitude],
    zoom_start=11)

for _, row in toronto_data.iterrows():
    label = '{} ({}), {}'.format(
        row.Codigo_Postal, row.Barrio, row.Ciudad)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [row.Latitude, row.Longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(mapa_toronto) 
    
mapa_toronto

Se van a explorar los barrios de nuestro data frame, obteniendo primero las coordenadas geográficas de un barrio y a partir de esto obtenemos los venues de esta locación

In [19]:
venues_list = list()

for name, lat, lng in zip(toronto_data['Barrio'], toronto_data['Latitude'], toronto_data['Longitude']):
    print("Procesando los venues de Toronto:", name)
    
    # API request URL
    limit = 50
    radius = 500
    explore_url_prefix = 'https://api.foursquare.com/v3/places/search'
    
    url = '{}?ll={}%2C{}&limit={}'.format(
        explore_url_prefix, 
        lat, lng, limit)
    
    headers = {
    "accept": "application/json",
    "Authorization": "fsq3QU7ZPPlgHjAHxcrxXMqjXTt7QlulQ320poEiVv/qGD0="
    }

    # Request
    neighborhood_venues = requests.get(url,headers=headers).json()["results"]
    no_category = 'No Category'
    # Agregar la informacion relevante a la lista, en caso de no tener categoria se marca como sin categoria
    for v in neighborhood_venues:
        if len(v['categories']):
            venues_list.append((
            name, lat, lng,
            v['name'], 
            v['geocodes']['main']['latitude'], 
            v['geocodes']['main']['longitude'],  
            v['categories'][0]['name']))
        else:
            venues_list.append((
            name, lat, lng,
            v['name'], 
            v['geocodes']['main']['latitude'], 
            v['geocodes']['main']['longitude'],  
            no_category))

print("Completado")   

Procesando los venues de Toronto: The Beaches
Procesando los venues de Toronto: The Danforth West, Riverdale
Procesando los venues de Toronto: India Bazaar, The Beaches West
Procesando los venues de Toronto: Studio District
Procesando los venues de Toronto: Lawrence Park
Procesando los venues de Toronto: Davisville North
Procesando los venues de Toronto: North Toronto West,  Lawrence Park
Procesando los venues de Toronto: Davisville
Procesando los venues de Toronto: Moore Park, Summerhill East
Procesando los venues de Toronto: Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Procesando los venues de Toronto: Rosedale
Procesando los venues de Toronto: St. James Town, Cabbagetown
Procesando los venues de Toronto: Church and Wellesley
Procesando los venues de Toronto: Regent Park, Harbourfront
Procesando los venues de Toronto: Garden District, Ryerson
Procesando los venues de Toronto: St. James Town
Procesando los venues de Toronto: Berczy Park
Procesando los venues de To

In [20]:
# Convertimos la lista a un data frame
toronto_venues = pd.DataFrame(venues_list)
toronto_venues.columns = [
    'Barrio', 'Barrio Latitud', 'Barrio Longitud', 
    'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

In [21]:
print(toronto_venues.shape)
toronto_venues.head()

(1950, 7)


Unnamed: 0,Barrio,Barrio Latitud,Barrio Longitud,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Fox Theatre,43.672895,-79.287358,Indie Movie Theater
1,The Beaches,43.676357,-79.293031,The Fill Station Sports Bar,43.673361,-79.284765,Pub
2,The Beaches,43.676357,-79.293031,Starbucks,43.668443,-79.307936,Coffee Shop
3,The Beaches,43.676357,-79.293031,Woodbine Beach Park,43.66182,-79.307921,Park
4,The Beaches,43.676357,-79.293031,The Wren,43.682552,-79.327981,Bar


Mostramos el número de venues de cada barrio

In [22]:
toronto_venues.groupby('Barrio').count()

Unnamed: 0_level_0,Barrio Latitud,Barrio Longitud,Venue,Venue Latitude,Venue Longitude,Venue Category
Barrio,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,50,50,50,50,50,50
"Brockton, Parkdale Village, Exhibition Place",50,50,50,50,50,50
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",50,50,50,50,50,50
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",50,50,50,50,50,50
Central Bay Street,50,50,50,50,50,50
Christie,50,50,50,50,50,50
Church and Wellesley,50,50,50,50,50,50
"Commerce Court, Victoria Hotel",50,50,50,50,50,50
Davisville,50,50,50,50,50,50
Davisville North,50,50,50,50,50,50


Se van a contar cuantos venues unicos hay en la ciudad de Toronto

In [23]:
print('Existen {} unicos'.format(
    len(toronto_venues['Venue Category'].unique())))

Existen 104 unicos


Ahora realizaremos un analisis por cada barrio

In [24]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot.head()
# Agregamos la columna de Barrios nuevamente al DataFrame como NombreBarrio
toronto_onehot.insert(0, 'NombreBarrio', toronto_venues['Barrio']) 
toronto_onehot.head()
toronto_onehot.shape

(1950, 105)

Vamos a analizar la frecuencia de aparición de cada categoria

In [25]:
toronto_grouped = toronto_onehot.groupby('NombreBarrio').mean().reset_index()
toronto_grouped


Unnamed: 0,NombreBarrio,American Restaurant,Art Gallery,Arts and Entertainment,Asian Restaurant,Bakery,Bank,Bar,Barbershop,Beach,...,Taco Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Thai Restaurant,Theater,Urban Park,Vegan and Vegetarian Restaurant,Vietnamese Restaurant,Wine Bar
0,Berczy Park,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.06,0.0,0.06,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.04
2,"Business reply mail Processing Centre, South C...",0.02,0.0,0.0,0.0,0.0,0.02,0.04,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.02,0.0,0.0,0.0,0.06,0.0,0.04,0.0,0.0,...,0.02,0.0,0.0,0.02,0.02,0.04,0.0,0.0,0.0,0.02
4,Central Bay Street,0.02,0.02,0.02,0.0,0.02,0.0,0.02,0.0,0.0,...,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.02
5,Christie,0.0,0.02,0.02,0.02,0.08,0.0,0.02,0.02,0.0,...,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02
6,Church and Wellesley,0.02,0.02,0.02,0.0,0.04,0.0,0.02,0.0,0.0,...,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0
7,"Commerce Court, Victoria Hotel",0.02,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.02,0.02,0.04,0.0,0.0,0.0,0.0
8,Davisville,0.0,0.0,0.02,0.0,0.02,0.0,0.06,0.04,0.0,...,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02
9,Davisville North,0.0,0.0,0.02,0.0,0.04,0.0,0.06,0.04,0.0,...,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02


Mostramos el listado de barrios con el top 5 de de venues disponibles

In [26]:
num_top_venues = 5
for neighborhood in toronto_grouped['NombreBarrio']:
    print("#####{}#####".format(neighborhood))
    temp = toronto_grouped[toronto_grouped['NombreBarrio'] == neighborhood].T.reset_index()
    temp.columns = ['venue','frecuencia']
    temp = temp.iloc[1:]
    temp['frecuencia'] = temp['frecuencia'].astype(float)
    temp = temp.round({'frecuencia': 2})
    print(temp.sort_values('frecuencia', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

#####Berczy Park#####
         venue  frecuencia
0         Café        0.08
1         Park        0.06
2  Music Venue        0.06
3   Restaurant        0.04
4  Coffee Shop        0.04


#####Brockton, Parkdale Village, Exhibition Place#####
          venue  frecuencia
0  Cocktail Bar        0.08
1        Bakery        0.06
2           Bar        0.06
3          Park        0.06
4      Pizzeria        0.06


#####Business reply mail Processing Centre, South Central Letter Processing Plant Toronto#####
             venue  frecuencia
0             Café        0.10
1      Coffee Shop        0.08
2             Park        0.06
3          Brewery        0.04
4  Farmers' Market        0.04


#####CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport#####
       venue  frecuencia
0       Park        0.10
1   Pizzeria        0.06
2     Bakery        0.06
3    Theater        0.04
4  Bookstore        0.04


#####Central Bay Street#####
        

Vamos a crear un nuevo dataframe con el top 10 de venues mas comunes

Ordenamos por orden de frecuencia

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Ahora creamos el data frame

In [28]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
# creamos la columnas dependiendo del numero de top venues
columns = ['Barrio']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Venue mas comun'.format(ind+1, indicators[ind]))
    except:
        columns.append('{} Venue mas comun'.format(ind+1))

# creacion del nuevo data frame
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Barrio'] = toronto_grouped['NombreBarrio']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(
        toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Barrio,1st Venue mas comun,2nd Venue mas comun,3rd Venue mas comun,4 Venue mas comun,5 Venue mas comun,6 Venue mas comun,7 Venue mas comun,8 Venue mas comun,9 Venue mas comun,10 Venue mas comun
0,Berczy Park,Café,Park,Music Venue,Restaurant,Coffee Shop,Bistro,Clothing Store,Theater,Museum,Pizzeria
1,"Brockton, Parkdale Village, Exhibition Place",Cocktail Bar,Bakery,Bar,Park,Pizzeria,Wine Bar,Restaurant,Korean Restaurant,Café,Italian Restaurant
2,"Business reply mail Processing Centre, South C...",Café,Coffee Shop,Park,Brewery,Farmers' Market,Restaurant,Furniture and Home Store,Bar,Grocery Store / Supermarket,Diner
3,"CN Tower, King and Spadina, Railway Lands, Har...",Park,Pizzeria,Bakery,Theater,Bookstore,Bar,Music Venue,Music Store,Optometrist,Night Club
4,Central Bay Street,Café,Music Venue,Clothing Store,Grocery Store / Supermarket,Hair Salon,Pizzeria,Music Store,Art Gallery,Japanese Restaurant,Jewelry Store
5,Christie,Bakery,Café,Cocktail Bar,Coffee Shop,Grocery Store / Supermarket,Italian Restaurant,Pizzeria,Farmers' Market,Record Store,Ramen Restaurant
6,Church and Wellesley,Café,Clothing Store,Bakery,Juice Bar,Music Venue,Grocery Store / Supermarket,Park,Coffee Shop,Dessert Shop,Retail
7,"Commerce Court, Victoria Hotel",Café,Music Venue,Park,Clothing Store,Theater,Hair Salon,Coffee Shop,Pizzeria,Concert Hall,Dance Studio
8,Davisville,Park,Bar,Café,Clothing Store,Barbershop,Sporting Goods Retail,Japanese Restaurant,Italian Restaurant,Grocery Store / Supermarket,Music Store
9,Davisville North,Park,Bar,Café,Grocery Store / Supermarket,Japanese Restaurant,Bakery,Sporting Goods Retail,Barbershop,Italian Restaurant,Furniture and Home Store


### Cluster de Barrios de Toronto

Vamos a usar K-means y hacer 5 clusters de barrios

In [29]:
# Set number of clusters
k = 5

# Drop the neighborhood name column so that each column contains only the feature set.
toronto_grouped_clustering = toronto_grouped.drop('NombreBarrio', 1)

# Run k-means clustering
kmeans = KMeans(n_clusters=k, random_state=0).fit(toronto_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

  toronto_grouped_clustering = toronto_grouped.drop('NombreBarrio', 1)


array([0, 1, 2, 0, 3, 1, 3, 0, 4, 4], dtype=int32)

Vamos a crear el nuevo dataframe con la informacion de los barrios y el top 10 de los venues

In [30]:
# Add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# Merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Barrio'), on='Barrio')

print(toronto_merged.shape)
toronto_merged.head()

(39, 16)


Unnamed: 0,Codigo_Postal,Ciudad,Barrio,Latitude,Longitude,Cluster Labels,1st Venue mas comun,2nd Venue mas comun,3rd Venue mas comun,4 Venue mas comun,5 Venue mas comun,6 Venue mas comun,7 Venue mas comun,8 Venue mas comun,9 Venue mas comun,10 Venue mas comun
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,2,Café,Coffee Shop,Park,Restaurant,Diner,Japanese Restaurant,Bar,Farmers' Market,Grocery Store / Supermarket,Brewery
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,2,Café,Coffee Shop,Furniture and Home Store,Clothing Store,Farmers' Market,Music Venue,Brewery,Bar,Restaurant,Diner
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,2,Café,Coffee Shop,Park,Furniture and Home Store,Grocery Store / Supermarket,Diner,Farmers' Market,Brewery,Restaurant,Bar
3,M4M,East Toronto,Studio District,43.659526,-79.340923,2,Café,Coffee Shop,Park,Bar,Diner,Restaurant,Furniture and Home Store,Bistro,Clothing Store,Grocery Store / Supermarket
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,4,Park,Bar,Café,Italian Restaurant,Sporting Goods Retail,Bakery,Barbershop,Grocery Store / Supermarket,Event Space,Farmers' Market


Visualizacion de los clusters

In [31]:
# Create a map instance
mapa_toronto = folium.Map(
    location=[coordenadas.latitude, coordenadas.longitude],
    zoom_start=11)

# Set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(
        toronto_merged['Latitude'], toronto_merged['Longitude'],
        toronto_merged['Barrio'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(mapa_toronto)
       
mapa_toronto

### Analisis de los clusters

##### Cluster 0

In [38]:
cluster_0 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 0,
                               toronto_merged.columns[
                                   [2] + list(range(
                                       5, toronto_merged.shape[1]))]]
cluster_0


Unnamed: 0,Barrio,Cluster Labels,1st Venue mas comun,2nd Venue mas comun,3rd Venue mas comun,4 Venue mas comun,5 Venue mas comun,6 Venue mas comun,7 Venue mas comun,8 Venue mas comun,9 Venue mas comun,10 Venue mas comun
14,"Garden District, Ryerson",0,Café,Music Venue,Coffee Shop,Park,Pizzeria,Hair Salon,Clothing Store,Music Store,Farmers' Market,Shoe Store
15,St. James Town,0,Café,Coffee Shop,Restaurant,Park,Clothing Store,Theater,Bar,Music Venue,Bistro,Furniture and Home Store
16,Berczy Park,0,Café,Park,Music Venue,Restaurant,Coffee Shop,Bistro,Clothing Store,Theater,Museum,Pizzeria
18,"Richmond, Adelaide, King",0,Café,American Restaurant,Spa,Hair Salon,Clothing Store,Music Venue,Pizzeria,Park,Bar,Shoe Store
19,"Harbourfront East, Union Station, Toronto Islands",0,Café,Park,Music Venue,American Restaurant,Theater,Spa,Hair Salon,Pizzeria,Museum,Drugstore
20,"Toronto Dominion Centre, Design Exchange",0,Café,Park,Music Venue,Theater,Hair Salon,Clothing Store,Pizzeria,American Restaurant,Museum,Furniture and Home Store
21,"Commerce Court, Victoria Hotel",0,Café,Music Venue,Park,Clothing Store,Theater,Hair Salon,Coffee Shop,Pizzeria,Concert Hall,Dance Studio
27,"CN Tower, King and Spadina, Railway Lands, Har...",0,Park,Pizzeria,Bakery,Theater,Bookstore,Bar,Music Venue,Music Store,Optometrist,Night Club
28,Stn A PO Boxes,0,Café,Park,Music Venue,Coffee Shop,Restaurant,Bistro,Clothing Store,Theater,Museum,Pizzeria
29,"First Canadian Place, Underground city",0,Café,Music Venue,Park,Theater,Clothing Store,Hair Salon,Pizzeria,Museum,Farmers' Market,Furniture and Home Store


##### Cluster 1

In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1,
                   toronto_merged.columns[
                       [2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Barrio,Cluster Labels,1st Venue mas comun,2nd Venue mas comun,3rd Venue mas comun,4 Venue mas comun,5 Venue mas comun,6 Venue mas comun,7 Venue mas comun,8 Venue mas comun,9 Venue mas comun,10 Venue mas comun
30,Christie,1,Bakery,Café,Cocktail Bar,Coffee Shop,Grocery Store / Supermarket,Italian Restaurant,Pizzeria,Farmers' Market,Record Store,Ramen Restaurant
31,"Dufferin, Dovercourt Village",1,Bakery,Bar,Pizzeria,Italian Restaurant,Café,Coffee Shop,Park,Korean Restaurant,Farmers' Market,Movie Theater
32,"Little Portugal, Trinity",1,Cocktail Bar,Pizzeria,Bar,Bakery,Café,Italian Restaurant,Wine Bar,Korean Restaurant,Japanese Restaurant,Fashion Accessories Store
33,"Brockton, Parkdale Village, Exhibition Place",1,Cocktail Bar,Bakery,Bar,Park,Pizzeria,Wine Bar,Restaurant,Korean Restaurant,Café,Italian Restaurant
34,"High Park, The Junction South",1,Bakery,Café,Italian Restaurant,Coffee Shop,Bar,Korean Restaurant,Cocktail Bar,Park,Pizzeria,Farmers' Market
35,"Parkdale, Roncesvalles",1,Bakery,Bar,Wine Bar,Italian Restaurant,Coffee Shop,Restaurant,Cocktail Bar,Café,Korean Restaurant,French Restaurant
36,"Runnymede, Swansea",1,Bakery,Bar,Coffee Shop,Italian Restaurant,Café,Park,Cocktail Bar,Wine Bar,Korean Restaurant,Art Gallery


##### Cluster 2

In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2,
                   toronto_merged.columns[
                       [2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Barrio,Cluster Labels,1st Venue mas comun,2nd Venue mas comun,3rd Venue mas comun,4 Venue mas comun,5 Venue mas comun,6 Venue mas comun,7 Venue mas comun,8 Venue mas comun,9 Venue mas comun,10 Venue mas comun
0,The Beaches,2,Café,Coffee Shop,Park,Restaurant,Diner,Japanese Restaurant,Bar,Farmers' Market,Grocery Store / Supermarket,Brewery
1,"The Danforth West, Riverdale",2,Café,Coffee Shop,Furniture and Home Store,Clothing Store,Farmers' Market,Music Venue,Brewery,Bar,Restaurant,Diner
2,"India Bazaar, The Beaches West",2,Café,Coffee Shop,Park,Furniture and Home Store,Grocery Store / Supermarket,Diner,Farmers' Market,Brewery,Restaurant,Bar
3,Studio District,2,Café,Coffee Shop,Park,Bar,Diner,Restaurant,Furniture and Home Store,Bistro,Clothing Store,Grocery Store / Supermarket
10,Rosedale,2,Café,Clothing Store,Grocery Store / Supermarket,Music Venue,Diner,Furniture and Home Store,Park,Coffee Shop,Juice Bar,Bakery
11,"St. James Town, Cabbagetown",2,Café,Coffee Shop,Park,Restaurant,Juice Bar,Music Venue,Diner,Farmers' Market,Clothing Store,Furniture and Home Store
13,"Regent Park, Harbourfront",2,Café,Park,Coffee Shop,Restaurant,Bistro,Diner,Furniture and Home Store,Clothing Store,Playground,Retail
38,"Business reply mail Processing Centre, South C...",2,Café,Coffee Shop,Park,Brewery,Farmers' Market,Restaurant,Furniture and Home Store,Bar,Grocery Store / Supermarket,Diner


##### Cluster 3

In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3,
                   toronto_merged.columns[
                       [2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Barrio,Cluster Labels,1st Venue mas comun,2nd Venue mas comun,3rd Venue mas comun,4 Venue mas comun,5 Venue mas comun,6 Venue mas comun,7 Venue mas comun,8 Venue mas comun,9 Venue mas comun,10 Venue mas comun
12,Church and Wellesley,3,Café,Clothing Store,Bakery,Juice Bar,Music Venue,Grocery Store / Supermarket,Park,Coffee Shop,Dessert Shop,Retail
17,Central Bay Street,3,Café,Music Venue,Clothing Store,Grocery Store / Supermarket,Hair Salon,Pizzeria,Music Store,Art Gallery,Japanese Restaurant,Jewelry Store
24,"The Annex, North Midtown, Yorkville",3,Cocktail Bar,Café,Bakery,Grocery Store / Supermarket,Italian Restaurant,Pizzeria,Art Gallery,Park,Music Store,Jewelry Store
25,"University of Toronto, Harbord",3,Grocery Store / Supermarket,Pizzeria,Clothing Store,Bakery,Café,Art Gallery,Park,Cocktail Bar,Fast Food Restaurant,Coffee Shop
26,"Kensington Market, Chinatown, Grange Park",3,Bakery,Pizzeria,Wine Bar,Cocktail Bar,Art Gallery,Café,Night Club,Grocery Store / Supermarket,Hair Salon,Home Improvement Service
37,"Queen's Park, Ontario Provincial Government",3,Music Venue,Clothing Store,Café,Grocery Store / Supermarket,Bakery,Pizzeria,Movie Theater,Hair Salon,Art Gallery,Japanese Restaurant


##### Cluster 4

In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4,
                   toronto_merged.columns[
                       [2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Barrio,Cluster Labels,1st Venue mas comun,2nd Venue mas comun,3rd Venue mas comun,4 Venue mas comun,5 Venue mas comun,6 Venue mas comun,7 Venue mas comun,8 Venue mas comun,9 Venue mas comun,10 Venue mas comun
4,Lawrence Park,4,Park,Bar,Café,Italian Restaurant,Sporting Goods Retail,Bakery,Barbershop,Grocery Store / Supermarket,Event Space,Farmers' Market
5,Davisville North,4,Park,Bar,Café,Grocery Store / Supermarket,Japanese Restaurant,Bakery,Sporting Goods Retail,Barbershop,Italian Restaurant,Furniture and Home Store
6,"North Toronto West, Lawrence Park",4,Park,Bakery,Bar,Café,Sporting Goods Retail,Italian Restaurant,Clothing Store,Barbershop,Grocery Store / Supermarket,Retail
7,Davisville,4,Park,Bar,Café,Clothing Store,Barbershop,Sporting Goods Retail,Japanese Restaurant,Italian Restaurant,Grocery Store / Supermarket,Music Store
8,"Moore Park, Summerhill East",4,Café,Park,Grocery Store / Supermarket,Clothing Store,Bakery,Juice Bar,Music Venue,Diner,Dessert Shop,Spa
9,"Summerhill West, Rathnelly, South Hill, Forest...",4,Café,Bakery,Grocery Store / Supermarket,Wine Bar,Farmers' Market,Bar,Barbershop,Italian Restaurant,Park,Music Venue
22,Roselawn,4,Bakery,Park,Café,Italian Restaurant,Music Venue,Bar,Barbershop,Farmers' Market,Coffee Shop,Clothing Store
23,Forest Hill North & West,4,Bakery,Grocery Store / Supermarket,Park,Farmers' Market,Music Venue,Italian Restaurant,Coffee Shop,Café,Barbershop,Ramen Restaurant


### Conclusiones

- El cluster con mas barrios es el cluster 0 con 10 barrios. La categoría que aparece de forma mas común son las cafeterías que se ven en gran cantidad en lo que es Toronto al parecer
- Los otros clusters son mas pequeños aunque no parecen diferir en gran medida unos de otros. En todos podemos ver cafeterías, parques y pastelerías en gran cantidad
- El cluster 3 parece ser el que ofrece diferentes actividades de diferentes categorias principales