# **Capstone Project**
### **Clustering and segmenting the neighborhoods of the city of Toronto, Canada**
##### *Francisco A. Herrera González*

## Part 1. Building the dataframe from the postal code data:

In [1]:
# First, import all the libraries needed for the exercise:

import numpy as np 

import pandas as pd 
from pandas import DataFrame
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json  
from geopy.geocoders import Nominatim 

import requests 

from pandas import json_normalize  # importing it from pandas.io.json is deprecated since pandas 1.0

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
import folium 
import urllib3
print('All libraries imported!')

All libraries imported!


In [2]:
# Read the tables from the Wikipedia page:

WikiTables = pd.read_html('https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1008658788', header=0)
WikiTables

[    Postal Code           Borough  \
 0           M1A      Not assigned   
 1           M2A      Not assigned   
 2           M3A        North York   
 3           M4A        North York   
 4           M5A  Downtown Toronto   
 5           M6A        North York   
 6           M7A  Downtown Toronto   
 7           M8A      Not assigned   
 8           M9A         Etobicoke   
 9           M1B       Scarborough   
 10          M2B      Not assigned   
 11          M3B        North York   
 12          M4B         East York   
 13          M5B  Downtown Toronto   
 14          M6B        North York   
 15          M7B      Not assigned   
 16          M8B      Not assigned   
 17          M9B         Etobicoke   
 18          M1C       Scarborough   
 19          M2C      Not assigned   
 20          M3C        North York   
 21          M4C         East York   
 22          M5C  Downtown Toronto   
 23          M6C              York   
 24          M7C      Not assigned   
 25         

In [3]:
# Check whether the page contains only the table we are looking for or more than one:

print(f'Number of tables: {len(WikiTables)}')
print('As we can see, the HTML page contains 3 tables. We need to select the one we want to work with:')

Number of tables: 3
As we can see, the HTML page contains 3 tables. We need to select the one we want to work with:


In [4]:
# First, look at the first rows of each table so we can identify it:

for table in WikiTables:
    print(table.head(2))

  Postal Code       Borough Neighbourhood
0         M1A  Not assigned  Not assigned
1         M2A  Not assigned  Not assigned
                                          Unnamed: 0  \
0  NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...   
1                                                 NL   

                               Canadian postal codes  \
0  NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...   
1                                                 NS   

                                          Unnamed: 2 Unnamed: 3 Unnamed: 4  \
0  NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...        NaN        NaN   
1                                                 PE         NB         QC   

  Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10  \
0        NaN        NaN        NaN        NaN        NaN         NaN   
1         QC         QC         ON         ON         ON          ON   

  Unnamed: 11 Unnamed: 12 Unnamed: 13 Unnamed: 14 Unnamed: 15 Unnamed: 16  \
0      

In [5]:
# The postal code table we are interested in is the first one, so we select it and display it as a dataframe:

CPT = WikiTables[0]
CPT

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
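
Picking the table by position works for this revision of the page, but the order of tables on a wiki page can change. A minimal sketch of selecting by column names instead, using small stand-in dataframes rather than the real page (the helper name `find_table` is hypothetical):

```python
import pandas as pd

def find_table(tables, required_columns):
    """Return the first dataframe that contains every required column."""
    for table in tables:
        if set(required_columns).issubset(table.columns):
            return table
    raise ValueError(f"No table has the columns {required_columns}")

# Stand-in data that mimics the list of dataframes returned by pd.read_html
tables = [
    pd.DataFrame({"Unnamed: 0": ["NL"], "Canadian postal codes": ["NS"]}),
    pd.DataFrame({"Postal Code": ["M1A", "M3A"],
                  "Borough": ["Not assigned", "North York"],
                  "Neighbourhood": ["Not assigned", "Parkwoods"]}),
]

postal_table = find_table(tables, ["Postal Code", "Borough", "Neighbourhood"])
print(postal_table)
```

With the real data, the same call could be applied to `WikiTables` in place of the stand-in list.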


In [6]:
# Great! Now we format the table following the guidelines of the exercise:

# 1. Remove the cells that have no borough assigned

CPT.replace("Not assigned", np.nan, inplace=True) # To make this easier, replace the value 'Not assigned' with the 'NaN' marker
CPT.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,,
1,M2A,,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [7]:
# Count the missing (True) and present (False) values in each column:
missing_data = CPT.isnull()
for column in missing_data.columns:
    print(column)
    print(missing_data[column].value_counts())
    print("")

Postal Code
False    180
Name: Postal Code, dtype: int64

Borough
False    103
True      77
Name: Borough, dtype: int64

Neighbourhood
False    103
True      77
Name: Neighbourhood, dtype: int64
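
Equal NaN counts in `Borough` and `Neighbourhood` suggest, but do not prove, that the same rows are affected in both columns. A small sketch of an explicit row-by-row check, on a stand-in dataframe rather than the real table:

```python
import numpy as np
import pandas as pd

# Small stand-in for the postal code table after replacing 'Not assigned' with NaN
df = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": [np.nan, np.nan, "North York", "North York"],
    "Neighbourhood": [np.nan, np.nan, "Parkwoods", "Victoria Village"],
})

# True only if every row is missing in both columns or in neither
same_rows = bool((df["Borough"].isna() == df["Neighbourhood"].isna()).all())
print(same_rows)
```

If the check holds on the real table, dropping every row that contains a NaN removes exactly the unassigned postal codes.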



In [8]:
# Borough and Neighbourhood have the same number of NaN values (77 each), so we can drop every row that contains a NaN:

CPT.dropna(axis=0, inplace=True)
CPT

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


In [9]:
# For tidiness, reset the index:

CPT.reset_index(drop=True, inplace=True)
CPT

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [10]:
# Check how many rows and columns remain after cleaning the dataframe:
print(CPT.shape)
print('The resulting dataframe has 103 rows and 3 columns')

(103, 3)
The resulting dataframe has 103 rows and 3 columns


## Part 2. Adding geospatial data

In [11]:
# In my view, a good data scientist should know how to save time and effort
# when an easier path leads to the same result. So, instead of writing a loop that looks up
# the coordinates through the Geocoder, we download the csv file with those values and load it into the notebook:

Coordenadas = pd.read_csv("C:/Users/Fran/Documents/Geospatial_Coordinates.csv")
Coordenadas

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


In [12]:
# This dataframe is sorted alphabetically by postal code.
# Checking its shape also tells us whether its row count matches that of the previous dataframe:

print('Size of the coordinates table:', Coordenadas.shape)
print('')
print('The number of rows is the same, so the two tables are consistent.')

Size of the coordinates table: (103, 3)

The number of rows is the same, so the two tables are consistent.


In [13]:
# To add the geospatial data to the postal code table, we first need to make some adjustments,
# such as sorting the table alphabetically by postal code:

CPT.sort_values(by=['Postal Code'], inplace = True, ignore_index=True) # ignore_index resets the index.
CPT

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [14]:
# With the table sorted and its consistency checked, we drop the 'Postal Code' column from the
# coordinates dataframe and join it with the postal code dataframe:

Coordenadas.drop(['Postal Code'], axis=1, inplace=True) # Drop the column
Coordenadas

Unnamed: 0,Latitude,Longitude
0,43.806686,-79.194353
1,43.784535,-79.160497
2,43.763573,-79.188711
3,43.770992,-79.216917
4,43.773136,-79.239476
5,43.744734,-79.239476
6,43.727929,-79.262029
7,43.711112,-79.284577
8,43.716316,-79.239476
9,43.692657,-79.264848


In [15]:
TorontoGeoData = pd.concat([CPT, Coordenadas], axis=1) # Concatenate the two dataframes
TorontoGeoData

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
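
The concatenation relies on both tables being sorted identically and having the same rows. A sketch of an alternative that joins on the `Postal Code` key instead (it would have to run before that column is dropped from the coordinates table). The rows below are a stand-in taken from the tables shown in this notebook, deliberately left in different orders:

```python
import pandas as pd

# Stand-in rows; the two tables are intentionally not in the same order
postal = pd.DataFrame({
    "Postal Code": ["M1C", "M1B", "M1E"],
    "Borough": ["Scarborough"] * 3,
    "Neighbourhood": ["Rouge Hill, Port Union, Highland Creek",
                      "Malvern, Rouge",
                      "Guildwood, Morningside, West Hill"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Latitude": [43.806686, 43.784535, 43.763573],
    "Longitude": [-79.194353, -79.160497, -79.188711],
})

# Join on the key; validate raises if a postal code is duplicated on either side
merged = postal.merge(coords, on="Postal Code", how="left", validate="one_to_one")
print(merged)
```

A key-based join keeps each postal code with its own coordinates regardless of row order, and a left join leaves NaN where a code has no match, which makes gaps easy to spot.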


## Part 3. Clustering the neighborhoods and visualizing the maps:

In [16]:
# First of all, understand the data we have: how many neighborhoods, how many boroughs, etc.

print('The Toronto dataframe has {} boroughs and {} neighborhoods.'.format(
        len(TorontoGeoData['Borough'].unique()),
        TorontoGeoData.shape[0]))

The Toronto dataframe has 10 boroughs and 103 neighborhoods.


In [17]:
# Now, use the geopy library to get the coordinates of the city of Toronto:

address='Toronto City, TO'

geolocator = Nominatim(user_agent = "tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.65238435, -79.38356765.


In [18]:
# Now generate the map of Toronto using the values above:

mapa_toronto = folium.Map(location=[latitude, longitude], zoom_start = 12)
mapa_toronto

In [19]:
# Now add a marker for each neighborhood to the map:

for lat, lng, borough, neighborhood in zip(TorontoGeoData['Latitude'], TorontoGeoData['Longitude'], TorontoGeoData['Borough'], TorontoGeoData['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#33FF93',
        fill_opacity=0.6,
        parse_html=False).add_to(mapa_toronto)

mapa_toronto
    

In [20]:
# Now we choose one of the boroughs to work with, as the assignment suggests.
# First, briefly review the neighborhood table, this time sorted by borough instead of by postal code:

TorontoGeoData.sort_values(by=['Borough'])

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
48,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
63,M5N,Central Toronto,Roselawn,43.711695,-79.416936
47,M4S,Central Toronto,Davisville,43.704324,-79.38879
64,M5P,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",43.696948,-79.411307
65,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
45,M4P,Central Toronto,Davisville North,43.712751,-79.390197
46,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.715383,-79.405678
49,M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",43.686412,-79.400049
50,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529


In [21]:
TorontoGeoData['Borough'].value_counts() # This gives quick access to the name of each borough and its number of neighborhoods.

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
York                 5
East York            5
East Toronto         5
Mississauga          1
Name: Borough, dtype: int64

In [22]:
# Looking at the table, we will focus the exercise on the neighborhoods in the east of the city.
# At first glance this looks like a less dense area, located between the city and a nature park.
# Scarborough has plenty of neighborhoods to explore; to make the sample more representative of
# the east of Toronto, we also include the neighborhoods of East York and East Toronto.

# .copy() gives an independent dataframe, so sorting it in place does not raise SettingWithCopyWarning
CityEast = TorontoGeoData[(TorontoGeoData.Borough == "Scarborough") | 
                       (TorontoGeoData.Borough == "East York") | 
                       (TorontoGeoData.Borough == "East Toronto")].copy()

CityEast.sort_values('Borough', ascending=False, ignore_index=True, inplace=True)
CityEast

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
2,M1X,Scarborough,Upper Rouge,43.836125,-79.205636
3,M1W,Scarborough,"Steeles West, L'Amoreaux West",43.799525,-79.318389
4,M1V,Scarborough,"Milliken, Agincourt North, Steeles East, L'Amo...",43.815252,-79.284577
5,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
6,M1S,Scarborough,Agincourt,43.7942,-79.262029
7,M1R,Scarborough,"Wexford, Maryvale",43.750071,-79.295849
8,M1P,Scarborough,"Dorset Park, Wexford Heights, Scarborough Town...",43.75741,-79.273304
9,M1T,Scarborough,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302


In [23]:
# Now generate the map with only the neighborhoods in the east of the city
east_toronto = folium.Map(location=[latitude, longitude], zoom_start = 12) # The coordinates were already obtained with geopy above, so there is no need to repeat that step

for lat, lng, label in zip(CityEast['Latitude'], CityEast['Longitude'], CityEast['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='#F5F5B0',
        fill_opacity=0.7,
        parse_html=False).add_to(east_toronto)  
    
east_toronto

In [24]:
# Next, we replicate the FourSquare analysis done for Manhattan in the previous lab,
# but with the neighborhoods in the east of Toronto.

# First, define the FourSquare credentials:

CLIENT_ID = '2FXV4YUCJ0EIYRAJI022BQNSYAHD42GRAOALJYSLMSH5VIMK' 
CLIENT_SECRET = 'SPDKXEYGNGLJGQWNG5OPCC50EYTNB3JXLC0CO1B4DOVKVN4E' 
VERSION = '20180605' 
LIMIT = 100 

print('FourSquare credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

FourSquare credentials:
CLIENT_ID: 2FXV4YUCJ0EIYRAJI022BQNSYAHD42GRAOALJYSLMSH5VIMK
CLIENT_SECRET:SPDKXEYGNGLJGQWNG5OPCC50EYTNB3JXLC0CO1B4DOVKVN4E
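
Credentials written into a notebook travel with every copy of it. A minimal sketch of reading them from environment variables and printing only a masked form; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example, not names required by FourSquare:

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the credentials from a mapping of environment variables.

    The variable names used here are an assumption made for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None

def mask(secret, visible=4):
    """Show only the first few characters so a value can be checked without exposing it."""
    return secret[:visible] + "*" * (len(secret) - visible)

# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id-123",
            "FOURSQUARE_CLIENT_SECRET": "demo-secret-456"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
```

Calling `load_foursquare_credentials()` with no argument would read from the real process environment.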


In [25]:
# Instead of exploring the first neighborhood in the dataframe, we pick a different one of the 27 we have (row 25):

CityEast.loc[25, 'Neighbourhood']

'Studio District'

In [26]:
# Get the geospatial data for this neighborhood:

StudioDistrict_latitude = CityEast.loc[25, 'Latitude']
StudioDistrict_longitude = CityEast.loc[25, 'Longitude']

print('The coordinates of the Studio District neighborhood are: {}, {}.'.format(StudioDistrict_latitude, StudioDistrict_longitude))

The coordinates of the Studio District neighborhood are: 43.6595255, -79.340923.


In [27]:
# With these values we can send a request to FourSquare to find out what is in this neighborhood.
# We ask for up to 50 venues within a radius of 500 meters:

LIMIT = 50
radius = 500

# Build the API URL:

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    StudioDistrict_latitude, 
    StudioDistrict_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=2FXV4YUCJ0EIYRAJI022BQNSYAHD42GRAOALJYSLMSH5VIMK&client_secret=SPDKXEYGNGLJGQWNG5OPCC50EYTNB3JXLC0CO1B4DOVKVN4E&v=20180605&ll=43.6595255,-79.340923&radius=500&limit=50'
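
The URL above is assembled with positional string formatting, which makes it easy to swap two arguments by mistake. A sketch of the same query string built from named parameters with the standard library; the function name `build_explore_url` is hypothetical and the credentials are placeholders:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from named query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter unescaped
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")

# Placeholder credentials; the coordinates are those of Studio District
demo_url = build_explore_url("ID", "SECRET", "20180605", 43.6595255, -79.340923, 500, 50)
print(demo_url)
```

`urlencode` also escapes any characters that are not valid in a query string, which plain `format` does not do.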

In [28]:
# Now send the GET request, parse the JSON and look at what is interesting in the Studio District:

georesults = requests.get(url).json()
georesults

{'meta': {'code': 200, 'requestId': '60643c8a2138947f20b612fa'},
 'response': {'headerLocation': 'Leslieville',
  'headerFullLocation': 'Leslieville, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 36,
  'suggestedBounds': {'ne': {'lat': 43.6640255045, 'lng': -79.33471445573701},
   'sw': {'lat': 43.6550254955, 'lng': -79.347131544263}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ad7e958f964a520001021e3',
       'name': "Ed's Real Scoop",
       'location': {'address': '920 Queen St. E',
        'crossStreet': 'btwn Logan Ave. & Morse St.',
        'lat': 43.660655832455014,
        'lng': -79.3420187548006,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.660655832455014,
          'lng': -79.3420187548006}],
        '
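
The response carries a status code under `meta` and the venues under `response.groups[0].items`. A minimal sketch, assuming that same shape, of checking the status before reading the venue list (the helper name `extract_items` is hypothetical, and the handling of an empty `groups` list is an assumption about possible responses):

```python
def extract_items(payload):
    """Return the venue items from an explore response, or raise if the request failed."""
    code = payload.get("meta", {}).get("code")
    if code != 200:
        raise RuntimeError(f"Request failed with code {code}")
    groups = payload.get("response", {}).get("groups", [])
    # Assumption: a response without venues may have no groups or an empty items list
    return groups[0].get("items", []) if groups else []

# Trimmed stand-in with the same shape as the response printed above
sample = {
    "meta": {"code": 200},
    "response": {"totalResults": 1,
                 "groups": [{"name": "recommended",
                             "items": [{"venue": {"name": "Ed's Real Scoop"}}]}]},
}
items = extract_items(sample)
print(len(items), items[0]["venue"]["name"])
```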

In [29]:
def get_category_type(row): # Extract the category of the venue
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [30]:
venues = georesults['response']['groups'][0]['items']
    
lugares_cercanos = pd.json_normalize(venues) # flatten the JSON object into a table

# keep only the columns we need:

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
lugares_cercanos = lugares_cercanos.loc[:, filtered_columns]

# extract the category for each row:

lugares_cercanos['venue.categories'] = lugares_cercanos.apply(get_category_type, axis=1)

# clean up the column names

lugares_cercanos.columns = [col.split(".")[-1] for col in lugares_cercanos.columns]

lugares_cercanos

Unnamed: 0,name,categories,lat,lng
0,Ed's Real Scoop,Ice Cream Shop,43.660656,-79.342019
1,Mercury Espresso Bar,Coffee Shop,43.660806,-79.341241
2,Te Aro,Coffee Shop,43.661373,-79.338577
3,Queen Books,Bookstore,43.660651,-79.342267
4,Hooked,Fish Market,43.660407,-79.343257
5,The Bone House,Pet Store,43.660894,-79.341097
6,Brick Street Breads,Bakery,43.660685,-79.342501
7,Reliable Halibut and Chips,Seafood Restaurant,43.660874,-79.340938
8,Purple Penguin Cafe,Café,43.660501,-79.342565
9,WAYLABAR,Gay Bar,43.661234,-79.339597


In [31]:
print('FourSquare returned {} venues for Studio District.'.format(lugares_cercanos.shape[0]))

FourSquare returned 36 venues for Studio District.


#### Having reproduced the example with one neighborhood, let's do the same for all the neighborhoods in the east of the city of Toronto:

In [32]:
# First, define a function that runs the whole process for the dataframe we created:

def getNearbyVenues(names, latitudes, longitudes, radius = 500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        #Create the API URL:
        
        url1 = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        
        #Send the GET request
        
        georesults1 = requests.get(url1).json()["response"]['groups'][0]['items']
        
        #Keep only the relevant information:
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in georesults1])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
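
The list comprehension in `getNearbyVenues` reads `categories[0]`, so a venue with an empty category list would raise an `IndexError`. A sketch of a row builder that tolerates that case; the helper name `venue_rows` is hypothetical, and the second sample item is made up to illustrate the empty list:

```python
def venue_rows(name, lat, lng, items):
    """Build one tuple per venue, using None when a venue has no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# Stand-in items: the second one is invented and has an empty categories list
sample_items = [
    {"venue": {"name": "Wendy’s",
               "location": {"lat": 43.807448, "lng": -79.199056},
               "categories": [{"name": "Fast Food Restaurant"}]}},
    {"venue": {"name": "Unnamed spot",
               "location": {"lat": 43.8, "lng": -79.2},
               "categories": []}},
]
rows = venue_rows("Malvern, Rouge", 43.806686, -79.194353, sample_items)
print(rows)
```

Rows with a `None` category can then be dropped or kept explicitly before the one-hot encoding step.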
        

In [33]:
CityEastPlaces = getNearbyVenues(names=CityEast['Neighbourhood'],
                                   latitudes=CityEast['Latitude'],
                                   longitudes=CityEast['Longitude']
                                  )

Malvern, Rouge
Birch Cliff, Cliffside West
Upper Rouge
Steeles West, L'Amoreaux West
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Rouge Hill, Port Union, Highland Creek
Agincourt
Wexford, Maryvale
Dorset Park, Wexford Heights, Scarborough Town Centre
Clarks Corners, Tam O'Shanter, Sullivan
Cliffside, Cliffcrest, Scarborough Village West
Kennedy Park, Ionview, East Birchmount Park
Scarborough Village
Cedarbrae
Woburn
Guildwood, Morningside, West Hill
Golden Mile, Clairlea, Oakridge
Leaside
East Toronto, Broadview North (Old East York)
Thorncliffe Park
Woodbine Heights
Parkview Hill, Woodbine Gardens
The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Business reply mail Processing Centre, South Central Letter Processing Plant Toronto


In [34]:
print(CityEastPlaces.shape)
CityEastPlaces.head()

(293, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Birch Cliff, Cliffside West",43.692657,-79.264848,The Birchcliff,43.691666,-79.264532,Café
2,"Birch Cliff, Cliffside West",43.692657,-79.264848,Birchmount Community Centre,43.695175,-79.262161,General Entertainment
3,"Birch Cliff, Cliffside West",43.692657,-79.264848,torontochristmastree,43.690574,-79.262671,Farm
4,"Birch Cliff, Cliffside West",43.692657,-79.264848,Scarborough Gardens,43.694647,-79.26223,Skating Rink


#### The resulting dataframe contains 293 nearby venues in the east of Toronto

In [35]:
# Now let's see how many venues there are for each neighborhood:

CityEastPlaces.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,5,5,5,5,5,5
"Birch Cliff, Cliffside West",5,5,5,5,5,5
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",18,18,18,18,18,18
Cedarbrae,9,9,9,9,9,9
"Clarks Corners, Tam O'Shanter, Sullivan",12,12,12,12,12,12
"Cliffside, Cliffcrest, Scarborough Village West",3,3,3,3,3,3
"Dorset Park, Wexford Heights, Scarborough Town Centre",6,6,6,6,6,6
"East Toronto, Broadview North (Old East York)",3,3,3,3,3,3
"Golden Mile, Clairlea, Oakridge",10,10,10,10,10,10
"Guildwood, Morningside, West Hill",8,8,8,8,8,8


In [36]:
print('There are {} unique categories.'.format(len(CityEastPlaces['Venue Category'].unique())))

There are 112 unique categories.


#### Now let's analyze each neighborhood

In [37]:
# First, apply one-hot encoding:

eastcity_onehot = pd.get_dummies(CityEastPlaces[['Venue Category']], prefix="", prefix_sep="")
# One of the venue categories is itself called 'Neighborhood'; drop that column (if present)
# so it does not clash with the neighborhood name column added below
eastcity_onehot.drop(['Neighborhood'], inplace=True, axis = 1, errors='ignore')

# Add the Neighborhood name column back to the dataframe:

eastcity_onehot['Neighborhood'] = CityEastPlaces['Neighborhood']
eastcity_onehot.columns

Index(['Accessories Store', 'American Restaurant', 'Athletics & Sports',
       'Auto Garage', 'Auto Workshop', 'Bagel Shop', 'Bakery', 'Bank', 'Bar',
       'Beer Store',
       ...
       'Supermarket', 'Sushi Restaurant', 'Thai Restaurant',
       'Thrift / Vintage Store', 'Tibetan Restaurant', 'Trail',
       'Vietnamese Restaurant', 'Warehouse Store', 'Yoga Studio',
       'Neighborhood'],
      dtype='object', length=112)
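
A tiny illustration of the one-hot step on stand-in data: each venue becomes a row with a 1 in the column of its category and 0 elsewhere. Passing `dtype=int` is an addition made here so the 0/1 display stays the same on pandas versions where `get_dummies` returns booleans by default:

```python
import pandas as pd

# Stand-in venues: two neighborhoods, three categories
venues = pd.DataFrame({
    "Neighborhood": ["Agincourt", "Agincourt", "Cedarbrae"],
    "Venue Category": ["Lounge", "Skating Rink", "Bakery"],
})

# One column per category, with the category name used directly as the column name
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)

# Put the neighborhood name in the first position
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(onehot)
```

Using `insert(0, ...)` places the name column first in one step, as an alternative to reordering the columns afterwards.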

In [38]:
# Move it to the first position in the dataframe:

fixed_columns = [eastcity_onehot.columns[-1]] + list(eastcity_onehot.columns[:-1])
eastcity_onehot = eastcity_onehot[fixed_columns]

# Check:

eastcity_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Bagel Shop,Bakery,Bank,Bar,Beer Store,Bike Shop,Bookstore,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Bus Station,Café,Camera Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Clothing Store,Coffee Shop,College Stadium,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Curling Ice,Dance Studio,Department Store,Dessert Shop,Diner,Discount Store,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Health Food Store,Hobby Shop,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Juice Bar,Korean BBQ Restaurant,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Medical Center,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Motel,Movie Theater,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pub,Recording Studio,Rental Car Location,Restaurant,Sandwich Place,Seafood Restaurant,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Thai Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Trail,Vietnamese Restaurant,Warehouse Store,Yoga Studio
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Birch Cliff, Cliffside West",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Birch Cliff, Cliffside West",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Birch Cliff, Cliffside West",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Birch Cliff, Cliffside West",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [39]:
eastcity_onehot.shape

(293, 112)

In [40]:
# Group the rows by neighborhood, taking the mean of each category column as its frequency:

eastcity_grouped = eastcity_onehot.groupby(["Neighborhood"]).mean().reset_index()


eastcity_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Bagel Shop,Bakery,Bank,Bar,Beer Store,Bike Shop,Bookstore,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Bus Station,Café,Camera Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Clothing Store,Coffee Shop,College Stadium,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Curling Ice,Dance Studio,Department Store,Dessert Shop,Diner,Discount Store,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Health Food Store,Hobby Shop,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Juice Bar,Korean BBQ Restaurant,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Medical Center,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Motel,Movie Theater,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pub,Recording Studio,Rental Car Location,Restaurant,Sandwich Place,Seafood Restaurant,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Thai Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Trail,Vietnamese Restaurant,Warehouse Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556
3,Cedarbrae,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0
4,"Clarks Corners, Tam O'Shanter, Sullivan",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0


In [41]:
print('The size of the new dataframe is:', eastcity_grouped.shape)

The size of the new dataframe is: (26, 112)
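
The values in the table above look like per-neighborhood category frequencies (for example, 0.2 is one venue out of five). The cell that builds `eastcity_grouped` is not part of this section, so the following is only a minimal sketch on hypothetical toy data, assuming the usual approach of one-hot encoding the venue categories and averaging per neighborhood. The column name `Venue Category` and the toy rows are assumptions.

```python
import pandas as pd

# Hypothetical toy data: one row per venue returned for a neighborhood.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Bank', 'Cafe', 'Bank', 'Bank'],
})

# One-hot encode the categories (dtype=float so the means are plain floats).
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Averaging the 0/1 indicators per neighborhood gives category frequencies.
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Each row of the result sums to 1, which is a quick sanity check that can also be run on the real dataframe.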


In [42]:
# Now let's print each neighborhood with its 5 most common venues:

toplugares = 5

for hood in eastcity_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = eastcity_grouped[eastcity_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(toplugares))
    print('\n')

----Agincourt----
                       venue  freq
0                     Lounge   0.2
1             Clothing Store   0.2
2               Skating Rink   0.2
3             Breakfast Spot   0.2
4  Latin American Restaurant   0.2


----Birch Cliff, Cliffside West----
                   venue  freq
0  General Entertainment   0.2
1        College Stadium   0.2
2                   Café   0.2
3           Skating Rink   0.2
4                   Farm   0.2


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                venue  freq
0  Light Rail Station  0.11
1         Yoga Studio  0.06
2       Burrito Place  0.06
3       Garden Center  0.06
4              Garden  0.06


----Cedarbrae----
                  venue  freq
0      Hakka Restaurant  0.11
1                  Bank  0.11
2   Fried Chicken Joint  0.11
3  Caribbean Restaurant  0.11
4       Thai Restaurant  0.11


----Clarks Corners, Tam O'Shanter, Sullivan----
                 venue  freq
0     

In [43]:
# Now we need to transfer those values into a dataframe:
## First we need a function that sorts the venues in descending order of frequency:

def return_most_common_venues(row, toplugares):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:toplugares]
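
To see what this function returns, here is a small self-contained sketch that applies it to a single hypothetical row. The toy neighborhood name and frequencies are made up for illustration; the function body is repeated so the snippet runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, toplugares):
    # Skip the first entry (the neighborhood name) and sort the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:toplugares]

# Hypothetical row shaped like one row of eastcity_grouped.
toy_row = pd.Series({
    'Neighborhood': 'Toy Hood',
    'Park': 0.1,
    'Cafe': 0.5,
    'Bank': 0.3,
    'Gym': 0.0,
})

top3 = list(return_most_common_venues(toy_row, 3))
print(top3)
```

The toy frequencies are all distinct on purpose. When several categories tie (as in Agincourt, where five categories share 0.2), the relative order of the tied entries should not be relied on, since the default sort is not guaranteed to be stable.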

In [44]:
# Let's build the new dataframe with the top 10 venues for each neighborhood:

toplugares = 10

indicators = ['st', 'nd', 'rd']

# Create the columns according to the number of top venues:

columns = ['Neighborhood']
for ind in np.arange(toplugares):
    try:
        columns.append('{}{} Lugar más común'.format(ind+1, indicators[ind]))
    except IndexError:
        # Ranks past 3rd have no suffix in `indicators`
        columns.append('{}º lugar mas comun'.format(ind+1))
        
# Create the new dataframe

top_lugaresEast = pd.DataFrame(columns = columns)
top_lugaresEast['Neighborhood'] = eastcity_grouped['Neighborhood']

for ind in np.arange(eastcity_grouped.shape[0]):
    top_lugaresEast.iloc[ind, 1:] = return_most_common_venues(eastcity_grouped.iloc[ind, :], toplugares)
    
top_lugaresEast

Unnamed: 0,Neighborhood,1st Lugar más común,2nd Lugar más común,3rd Lugar más común,4º lugar mas comun,5º lugar mas comun,6º lugar mas comun,7º lugar mas comun,8º lugar mas comun,9º lugar mas comun,10º lugar mas comun
0,Agincourt,Lounge,Clothing Store,Skating Rink,Breakfast Spot,Latin American Restaurant,Accessories Store,Pet Store,Park,Noodle House,Movie Theater
1,"Birch Cliff, Cliffside West",General Entertainment,College Stadium,Café,Skating Rink,Farm,Pizza Place,Pet Store,Park,Noodle House,Movie Theater
2,"Business reply mail Processing Centre, South C...",Light Rail Station,Yoga Studio,Burrito Place,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Park,Pizza Place,Recording Studio
3,Cedarbrae,Hakka Restaurant,Bank,Fried Chicken Joint,Caribbean Restaurant,Thai Restaurant,Lounge,Gas Station,Bakery,Athletics & Sports,Movie Theater
4,"Clarks Corners, Tam O'Shanter, Sullivan",Pizza Place,Convenience Store,Bank,Fried Chicken Joint,Chinese Restaurant,Italian Restaurant,Thai Restaurant,Fast Food Restaurant,Gas Station,Noodle House
5,"Cliffside, Cliffcrest, Scarborough Village West",American Restaurant,Motel,Intersection,Light Rail Station,Pet Store,Park,Noodle House,Movie Theater,Middle Eastern Restaurant,Mexican Restaurant
6,"Dorset Park, Wexford Heights, Scarborough Town...",Indian Restaurant,Pet Store,Vietnamese Restaurant,Thrift / Vintage Store,Chinese Restaurant,Light Rail Station,Park,Noodle House,Movie Theater,Motel
7,"East Toronto, Broadview North (Old East York)",Park,Convenience Store,Accessories Store,Greek Restaurant,Pet Store,Noodle House,Movie Theater,Motel,Middle Eastern Restaurant,Mexican Restaurant
8,"Golden Mile, Clairlea, Oakridge",Bakery,Bus Line,Soccer Field,Bus Station,Park,Ice Cream Shop,Intersection,Metro Station,Mexican Restaurant,Lounge
9,"Guildwood, Morningside, West Hill",Rental Car Location,Restaurant,Bank,Intersection,Mexican Restaurant,Breakfast Spot,Electronics Store,Medical Center,Accessories Store,Lounge


#### Now let's cluster them using the k-means method

In [45]:
from sklearn.cluster import KMeans

In [46]:
kclusters = 5

eastcity_clustering = eastcity_grouped.drop(columns='Neighborhood')

# Fit the model:

kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(eastcity_clustering)

# Check the cluster labels assigned to the first 10 rows of the dataframe:

kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 4, 0, 0])
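
A quick way to see how the labels are spread is to count how many rows fall into each cluster. The sketch below uses only the ten labels printed above, so it is not the full picture; on the real data one would pass `kmeans.labels_` instead.

```python
import numpy as np

# The first ten labels, copied from the output above.
labels_head = np.array([0, 0, 0, 0, 0, 0, 0, 4, 0, 0])
kclusters = 5

# bincount returns the number of occurrences of each label 0..kclusters-1.
sizes = np.bincount(labels_head, minlength=kclusters)
for cluster_id, size in enumerate(sizes):
    print(f'cluster {cluster_id}: {size} neighborhoods')
```

Already in this sample, nine of the ten rows land in cluster 0.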

In [47]:
# Build a new dataframe that includes the cluster and the top 10 venues for each neighborhood:
## First we add the cluster labels:

top_lugaresEast.insert(0, 'Cluster Labels', kmeans.labels_)
eastcity_merged = CityEast



In [48]:
# Join the top_lugaresEast df to the CityEast df to get a single DataFrame with the coordinates and the venues:

eastcity_merged = CityEast.join(top_lugaresEast.set_index('Neighborhood'), on='Neighbourhood')
eastcity_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Lugar más común,2nd Lugar más común,3rd Lugar más común,4º lugar mas comun,5º lugar mas comun,6º lugar mas comun,7º lugar mas comun,8º lugar mas comun,9º lugar mas comun,10º lugar mas comun
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,2.0,Fast Food Restaurant,Greek Restaurant,Pet Store,Park,Noodle House,Movie Theater,Motel,Middle Eastern Restaurant,Mexican Restaurant,Metro Station
1,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848,0.0,General Entertainment,College Stadium,Café,Skating Rink,Farm,Pizza Place,Pet Store,Park,Noodle House,Movie Theater
2,M1X,Scarborough,Upper Rouge,43.836125,-79.205636,,,,,,,,,,,
3,M1W,Scarborough,"Steeles West, L'Amoreaux West",43.799525,-79.318389,0.0,Fast Food Restaurant,Coffee Shop,Pharmacy,Sandwich Place,Pizza Place,Chinese Restaurant,Breakfast Spot,Supermarket,Electronics Store,Bank
4,M1V,Scarborough,"Milliken, Agincourt North, Steeles East, L'Amo...",43.815252,-79.284577,0.0,Playground,Park,Intersection,Light Rail Station,Pet Store,Noodle House,Movie Theater,Motel,Middle Eastern Restaurant,Mexican Restaurant
5,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,3.0,Bar,Construction & Landscaping,Accessories Store,Liquor Store,Pharmacy,Pet Store,Park,Noodle House,Movie Theater,Motel
6,M1S,Scarborough,Agincourt,43.7942,-79.262029,0.0,Lounge,Clothing Store,Skating Rink,Breakfast Spot,Latin American Restaurant,Accessories Store,Pet Store,Park,Noodle House,Movie Theater
7,M1R,Scarborough,"Wexford, Maryvale",43.750071,-79.295849,0.0,Middle Eastern Restaurant,Accessories Store,Auto Garage,Vietnamese Restaurant,Bakery,Sandwich Place,Shopping Mall,Liquor Store,Park,Noodle House
8,M1P,Scarborough,"Dorset Park, Wexford Heights, Scarborough Town...",43.75741,-79.273304,0.0,Indian Restaurant,Pet Store,Vietnamese Restaurant,Thrift / Vintage Store,Chinese Restaurant,Light Rail Station,Park,Noodle House,Movie Theater,Motel
9,M1T,Scarborough,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,0.0,Pizza Place,Convenience Store,Bank,Fried Chicken Joint,Chinese Restaurant,Italian Restaurant,Thai Restaurant,Fast Food Restaurant,Gas Station,Noodle House


#### In the dataframe above, Upper Rouge has no values assigned (it has no matching row in the top-venues dataframe). A quick search suggests that Upper Rouge is more of a mountain trail of sorts than a neighborhood, so we can drop the row; otherwise it would cause problems later when plotting the map.

In [49]:
# So let's drop the row with the missing values:
eastcity_merged.dropna(axis=0, inplace=True)

eastcity_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Lugar más común,2nd Lugar más común,3rd Lugar más común,4º lugar mas comun,5º lugar mas comun,6º lugar mas comun,7º lugar mas comun,8º lugar mas comun,9º lugar mas comun,10º lugar mas comun
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,2.0,Fast Food Restaurant,Greek Restaurant,Pet Store,Park,Noodle House,Movie Theater,Motel,Middle Eastern Restaurant,Mexican Restaurant,Metro Station
1,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848,0.0,General Entertainment,College Stadium,Café,Skating Rink,Farm,Pizza Place,Pet Store,Park,Noodle House,Movie Theater
3,M1W,Scarborough,"Steeles West, L'Amoreaux West",43.799525,-79.318389,0.0,Fast Food Restaurant,Coffee Shop,Pharmacy,Sandwich Place,Pizza Place,Chinese Restaurant,Breakfast Spot,Supermarket,Electronics Store,Bank
4,M1V,Scarborough,"Milliken, Agincourt North, Steeles East, L'Amo...",43.815252,-79.284577,0.0,Playground,Park,Intersection,Light Rail Station,Pet Store,Noodle House,Movie Theater,Motel,Middle Eastern Restaurant,Mexican Restaurant
5,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,3.0,Bar,Construction & Landscaping,Accessories Store,Liquor Store,Pharmacy,Pet Store,Park,Noodle House,Movie Theater,Motel
6,M1S,Scarborough,Agincourt,43.7942,-79.262029,0.0,Lounge,Clothing Store,Skating Rink,Breakfast Spot,Latin American Restaurant,Accessories Store,Pet Store,Park,Noodle House,Movie Theater
7,M1R,Scarborough,"Wexford, Maryvale",43.750071,-79.295849,0.0,Middle Eastern Restaurant,Accessories Store,Auto Garage,Vietnamese Restaurant,Bakery,Sandwich Place,Shopping Mall,Liquor Store,Park,Noodle House
8,M1P,Scarborough,"Dorset Park, Wexford Heights, Scarborough Town...",43.75741,-79.273304,0.0,Indian Restaurant,Pet Store,Vietnamese Restaurant,Thrift / Vintage Store,Chinese Restaurant,Light Rail Station,Park,Noodle House,Movie Theater,Motel
9,M1T,Scarborough,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,0.0,Pizza Place,Convenience Store,Bank,Fried Chicken Joint,Chinese Restaurant,Italian Restaurant,Thai Restaurant,Fast Food Restaurant,Gas Station,Noodle House
10,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476,0.0,American Restaurant,Motel,Intersection,Light Rail Station,Pet Store,Park,Noodle House,Movie Theater,Middle Eastern Restaurant,Mexican Restaurant
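
The join-and-drop step can be reproduced on toy data to show where the empty row comes from and why `Cluster Labels` appears as `2.0`, `0.0`, and so on. This is a minimal sketch with hypothetical frames (`coords`, `top`) standing in for `CityEast` and `top_lugaresEast`; the final integer cast is an optional extra, not something the notebook does.

```python
import pandas as pd

# Hypothetical stand-in for CityEast: three neighborhoods with coordinates.
coords = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C'],
    'Latitude': [43.80, 43.69, 43.83],
})

# Hypothetical stand-in for top_lugaresEast: 'C' has no venues, so no row.
top = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Cluster Labels': [2, 0],
    '1st venue': ['Park', 'Bank'],
})

# Left join: every row of coords is kept, unmatched rows are filled with NaN.
merged = coords.join(top.set_index('Neighborhood'), on='Neighbourhood')
print(merged['Cluster Labels'].tolist())  # the NaN forces the column to float

# Drop only rows without a cluster, then restore integer labels.
cleaned = merged.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned)
```

Passing `subset=` limits the drop to rows that are missing a cluster label, rather than any row with a missing value in any column.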


In [53]:
# Let's visualize the resulting clusters:

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Color scheme for the clusters:

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add the markers:
markers_colors = []
for lat, lon, poi, cluster in zip(eastcity_merged['Latitude'], eastcity_merged['Longitude'], eastcity_merged['Neighbourhood'], eastcity_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        # Labels start at 0, so index the palette directly
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
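
The k-means labels start at 0, so how the palette is indexed determines which color each cluster gets. The sketch below compares direct indexing with a `label - 1` offset on a hypothetical five-color palette (the hex values are placeholders, not the ones `cm.rainbow` produces).

```python
# Hypothetical palette with one hex color per cluster.
palette = ['#8000ff', '#00b5eb', '#80ffb4', '#ffb360', '#ff0000']
labels = [0, 1, 2, 3, 4]

# Direct indexing: cluster 0 gets the first color.
direct = {c: palette[c] for c in labels}

# Offset indexing: cluster 0 gets palette[-1], i.e. the last color.
offset = {c: palette[c - 1] for c in labels}

for c in labels:
    print(c, direct[c], offset[c])
```

Both variants give every cluster a distinct color; the offset version only rotates the order, because index `-1` wraps around to the end of the list.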

In [54]:
# We can see that we have 5 clusters, although they are not very evenly sized...
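
One possible explanation for the uneven sizes (an assumption, not verified on this data) is that each neighborhood has only a handful of venues, so the frequency vectors are sparse and k-means ends up isolating a few unusual rows while the rest stay together. A hedged way to explore this is to try several values of k and compare cluster sizes and silhouette scores. The sketch below uses synthetic data in place of `eastcity_clustering`; the matrix `X` and the range of k are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in: 26 rows, 20 categories, 8 venues per row.
counts = rng.multinomial(8, np.ones(20) / 20, size=26)
X = counts / counts.sum(axis=1, keepdims=True)  # rows are frequencies summing to 1

scores = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sizes = np.bincount(model.labels_, minlength=k)
    scores[k] = silhouette_score(X, model.labels_)
    print(k, sizes.tolist(), round(scores[k], 3))
```

On the real data, a k with more balanced sizes and a higher silhouette score would be a reasonable candidate, though with only 26 rows any such comparison is rough.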

#### Now let's review each of the clusters:

In [55]:
# Cluster 1:

eastcity_merged.loc[eastcity_merged['Cluster Labels'] == 0, eastcity_merged.columns[[1] + list(range(5, eastcity_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Lugar más común,2nd Lugar más común,3rd Lugar más común,4º lugar mas comun,5º lugar mas comun,6º lugar mas comun,7º lugar mas comun,8º lugar mas comun,9º lugar mas comun,10º lugar mas comun
1,Scarborough,0.0,General Entertainment,College Stadium,Café,Skating Rink,Farm,Pizza Place,Pet Store,Park,Noodle House,Movie Theater
3,Scarborough,0.0,Fast Food Restaurant,Coffee Shop,Pharmacy,Sandwich Place,Pizza Place,Chinese Restaurant,Breakfast Spot,Supermarket,Electronics Store,Bank
4,Scarborough,0.0,Playground,Park,Intersection,Light Rail Station,Pet Store,Noodle House,Movie Theater,Motel,Middle Eastern Restaurant,Mexican Restaurant
6,Scarborough,0.0,Lounge,Clothing Store,Skating Rink,Breakfast Spot,Latin American Restaurant,Accessories Store,Pet Store,Park,Noodle House,Movie Theater
7,Scarborough,0.0,Middle Eastern Restaurant,Accessories Store,Auto Garage,Vietnamese Restaurant,Bakery,Sandwich Place,Shopping Mall,Liquor Store,Park,Noodle House
8,Scarborough,0.0,Indian Restaurant,Pet Store,Vietnamese Restaurant,Thrift / Vintage Store,Chinese Restaurant,Light Rail Station,Park,Noodle House,Movie Theater,Motel
9,Scarborough,0.0,Pizza Place,Convenience Store,Bank,Fried Chicken Joint,Chinese Restaurant,Italian Restaurant,Thai Restaurant,Fast Food Restaurant,Gas Station,Noodle House
10,Scarborough,0.0,American Restaurant,Motel,Intersection,Light Rail Station,Pet Store,Park,Noodle House,Movie Theater,Middle Eastern Restaurant,Mexican Restaurant
11,Scarborough,0.0,Convenience Store,Hobby Shop,Department Store,Coffee Shop,Chinese Restaurant,Discount Store,Bus Station,Mexican Restaurant,Pizza Place,Pharmacy
13,Scarborough,0.0,Hakka Restaurant,Bank,Fried Chicken Joint,Caribbean Restaurant,Thai Restaurant,Lounge,Gas Station,Bakery,Athletics & Sports,Movie Theater


In [56]:
# Cluster 2:

eastcity_merged.loc[eastcity_merged['Cluster Labels'] == 1, eastcity_merged.columns[[1] + list(range(5, eastcity_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Lugar más común,2nd Lugar más común,3rd Lugar más común,4º lugar mas comun,5º lugar mas comun,6º lugar mas comun,7º lugar mas comun,8º lugar mas comun,9º lugar mas comun,10º lugar mas comun
12,Scarborough,1.0,Playground,Light Rail Station,Pet Store,Park,Noodle House,Movie Theater,Motel,Middle Eastern Restaurant,Mexican Restaurant,Metro Station


In [57]:
# Cluster 3:

eastcity_merged.loc[eastcity_merged['Cluster Labels'] == 2, eastcity_merged.columns[[1] + list(range(5, eastcity_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Lugar más común,2nd Lugar más común,3rd Lugar más común,4º lugar mas comun,5º lugar mas comun,6º lugar mas comun,7º lugar mas comun,8º lugar mas comun,9º lugar mas comun,10º lugar mas comun
0,Scarborough,2.0,Fast Food Restaurant,Greek Restaurant,Pet Store,Park,Noodle House,Movie Theater,Motel,Middle Eastern Restaurant,Mexican Restaurant,Metro Station


In [58]:
# Cluster 4: 
eastcity_merged.loc[eastcity_merged['Cluster Labels'] == 3, eastcity_merged.columns[[1] + list(range(5, eastcity_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Lugar más común,2nd Lugar más común,3rd Lugar más común,4º lugar mas comun,5º lugar mas comun,6º lugar mas comun,7º lugar mas comun,8º lugar mas comun,9º lugar mas comun,10º lugar mas comun
5,Scarborough,3.0,Bar,Construction & Landscaping,Accessories Store,Liquor Store,Pharmacy,Pet Store,Park,Noodle House,Movie Theater,Motel


In [59]:
# Cluster 5:
eastcity_merged.loc[eastcity_merged['Cluster Labels'] == 4, eastcity_merged.columns[[1] + list(range(5, eastcity_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Lugar más común,2nd Lugar más común,3rd Lugar más común,4º lugar mas comun,5º lugar mas comun,6º lugar mas comun,7º lugar mas comun,8º lugar mas comun,9º lugar mas comun,10º lugar mas comun
18,East York,4.0,Park,Convenience Store,Accessories Store,Greek Restaurant,Pet Store,Noodle House,Movie Theater,Motel,Middle Eastern Restaurant,Mexican Restaurant


## We can see that the clusters are not very evenly distributed... If you know why this happened, please leave me a comment or a note on the assignment. Thank you very much for reading and reviewing this work!