# La Batalla de los Vecindarios - Proyecto Capstone por Olga Ascue

# Introducción

En este proyecto vamos a tratar de encontrar el mejor apartamento en alquiler debido a una oferta de trabajo en Down Town, Canadá. El proyecto está enfocado para las personas que esten en la búsqueda de apartamentos o casas accesibles en el vecindatio de Down Town en Canadá. Además se busca crear un análisis de las características de las personas que están migrando a Toronto en busca del mejor lugar como vivir como un análisis relativo a los otros vecindarios .

Los aspectos más destacados incluyen el precio medio de la vivienda y una mejor escuela de acuerdo con las calificaciones, las tasas de criminalidad de esa área en particular, la conectividad vial, las condiciones climáticas, la buena gestión para una emergencia, los recursos hídricos tanto frescos como residuales y los excrementos transportados en alcantarillas e instalaciones recreativas.

 

# Data Description

Necesitaremos datos sobre diferentes lugares en diferentes vecindarios de ese distrito específico. Para obtener esa información, utilizaremos la información de ubicación de "Foursquare". Este es un proveedor de datos de ubicación con información sobre todo tipo de lugares y eventos dentro de un área de interés.

Ademas haremos uso de los datos que hemos visto previamente en el curso: https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1011037969

Haremos uso de los siguientes datos para el proyecto: Vecindario, Latitud y longitud del vecindario, Lugar, Nombre y Categoría

## Importación de librerías

In [1]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import urllib.request
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json 
import requests 
from pandas.io.json import json_normalize 
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
import folium


## Exportación de Datos

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")
right_table=soup.find('table', class_='wikitable sortable')
right_table

<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighborhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park, Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor, Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park, Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue, Humber Valley Village
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern, Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3B
</td>
<td>

## Separación de columnas

In [3]:
data = []
columns = []
table = soup.find(class_='wikitable')
for index, tr in enumerate(right_table.find_all('tr')):
    section = []
    for td in tr.find_all(['th','td']):
        section.append(td.text.rstrip())
    
    #First row of data is the header
    if (index == 0):
        columns = section
    else:
        data.append(section)

#convert list into Pandas DataFrame
df1 = pd.DataFrame(data = data,columns = columns)
df1.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Se observa que existen valores sin asignar en la tabala. Para asegurar que exista una mayor precisión se hara la limpieza de los datos

## Limpieza de los datos

In [4]:
#Ignore cells with a borough that is Not assigned.'
df1.drop(df1.index[df1['Borough'] == 'Not assigned'], inplace = True)
df1 = df1.reset_index(drop=True)
df1.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Si la celda tiene una ciudad pero no tiene asignado un vecindario, entonces el vecindario sera igual que la ciudad

In [5]:
df1.loc[df1['Neighborhood'] == 'Not assigned', 'Neighborhood'] = df1['Borough']
df1.shape

(103, 3)

EL dataframe ahora no presentará valores perdidos o duplicados

In [6]:
import geocoder # import geocoder
data=pd.read_csv("/Users/iris/Desktop/Geospatial_Coordinates.csv")
data

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


In [7]:
df1['Latitude']=data['Latitude'].values
df1['Longitude']=data['Longitude'].values
df1

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.806686,-79.194353
1,M4A,North York,Victoria Village,43.784535,-79.160497
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.763573,-79.188711
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.770992,-79.216917
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.773136,-79.239476
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.744734,-79.239476
6,M1B,Scarborough,"Malvern, Rouge",43.727929,-79.262029
7,M3B,North York,Don Mills,43.711112,-79.284577
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.716316,-79.239476
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.692657,-79.264848


Como se mencionó en el problema, nos enfocaremos en la ciudad de DownTown Toronto

## Cluster de Neighborhood

In [8]:
toronto_data = df1[df1['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
toronto_data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.763573,-79.188711
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.773136,-79.239476
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.692657,-79.264848
3,M5C,Downtown Toronto,St. James Town,43.799525,-79.318389
4,M5E,Downtown Toronto,Berczy Park,43.75749,-79.374714


In [9]:
#Getting address of Downtown Toronto
address = 'Downtown Toronto, ON'
geolocator = Nominatim(user_agent="trt_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Downtown Toronto are 43.6563221, -79.3809161.


In [10]:
# create map of Downtown Toronto using latitude and longitude values
map_trt = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df1['Latitude'], df1['Longitude'], df1['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_trt)  
    
map_trt

## Exploración de Vecindarios en Downtown Toronto

In [64]:
CLIENT_ID = 'A3OE4MF5PIMSSGP4WYSJ1IWTZCY54VSETX0XZMF0ZTPEYT3K' # your Foursquare ID
CLIENT_SECRET = 'IO40RBJTWRID4F2JFHWSHQSSK3VTNZJ0X11E4BBOR0BE4A0I' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: A3OE4MF5PIMSSGP4WYSJ1IWTZCY54VSETX0XZMF0ZTPEYT3K
CLIENT_SECRET:IO40RBJTWRID4F2JFHWSHQSSK3VTNZJ0X11E4BBOR0BE4A0I


In [65]:
neighborhood_latitude = df1.loc[0, 'Latitude'] 
neighborhood_longitude = df1.loc[0, 'Longitude'] 

neighborhood_name = df1.loc[0, 'Neighborhood']
LIMIT=100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, neighborhood_latitude, 
    neighborhood_longitude, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=A3OE4MF5PIMSSGP4WYSJ1IWTZCY54VSETX0XZMF0ZTPEYT3K&client_secret=IO40RBJTWRID4F2JFHWSHQSSK3VTNZJ0X11E4BBOR0BE4A0I&ll=43.6694032,-79.3727041&v=20180605&query=43.806686299999996&radius=-79.19435340000001&limit=500'

In [66]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [67]:
trt_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


In [68]:
#check how many venues were returned for each neighborhood
trt_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",18,18,18,18,18,18
Central Bay Street,6,6,6,6,6,6
Christie,3,3,3,3,3,3
Church and Wellesley,8,8,8,8,8,8
"Commerce Court, Victoria Hotel",4,4,4,4,4,4
"First Canadian Place, Underground city",1,1,1,1,1,1
"Garden District, Ryerson",5,5,5,5,5,5
"Harbourfront East, Union Station, Toronto Islands",7,7,7,7,7,7
"Kensington Market, Chinatown, Grange Park",39,39,39,39,39,39
"Queen's Park, Ontario Provincial Government",8,8,8,8,8,8


Revisaremos cuantas categorías unicas existen en los lugares (venues)

In [69]:
print('There are {} uniques categories.'.format(len(trt_venues['Venue Category'].unique())))

There are 89 uniques categories.


In [70]:
# print out the list of categories
trt_venues['Venue Category'].unique()[:50]

array(['Bank', 'Electronics Store', 'Restaurant', 'Mexican Restaurant',
       'Rental Car Location', 'Medical Center', 'Intersection',
       'Breakfast Spot', 'Hakka Restaurant', 'Caribbean Restaurant',
       'Thai Restaurant', 'Athletics & Sports', 'Bakery', 'Gas Station',
       'Fried Chicken Joint', 'Café', 'General Entertainment', 'Farm',
       'Skating Rink', 'College Stadium', 'Chinese Restaurant',
       'Grocery Store', 'Sandwich Place', 'Fast Food Restaurant',
       'Pharmacy', 'Coffee Shop', 'Pizza Place', 'Supermarket', 'Gym',
       'Noodle House', 'Indian Restaurant', 'Discount Store', 'Park',
       'Bus Stop', 'Food & Drink Shop', 'Airport', 'Snack Place',
       'Curling Ice', 'Beer Store', 'Video Store', 'Ice Cream Shop',
       'Fish & Chips Shop', 'Sushi Restaurant', 'Brewery', 'Pub',
       'Italian Restaurant', 'Burrito Place', 'Pet Store', 'Steakhouse',
       'Liquor Store'], dtype=object)

In [28]:
# check if the results contain "Chinese Restaurant"
"Chinese Restaurant" in trt_venues['Venue Category'].unique()

True

Análisis de cada vecindario

In [77]:
# one hot encoding
trt1_onehot = pd.get_dummies(trt_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
trt1_onehot['Neighborhoods'] = trt_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [trt1_onehot.columns[-1]] + list(trt1_onehot.columns[:-1])
trt1_onehot = trt1_onehot[fixed_columns]

print(trt1_onehot.shape)
trt1_onehot.head()

(165, 90)


Unnamed: 0,Neighborhoods,Airport,Athletics & Sports,Auto Workshop,Bakery,Bank,Bar,Baseball Field,Beer Store,Board Shop,Bookstore,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Stop,Butcher,Café,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,College Stadium,Comic Shop,Convenience Store,Curling Ice,Dessert Shop,Diner,Discount Store,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food & Drink Shop,French Restaurant,Fried Chicken Joint,Garden,Garden Center,Gas Station,General Entertainment,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Hardware Store,Health Food Store,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Kids Store,Latin American Restaurant,Lawyer,Light Rail Station,Liquor Store,Medical Center,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Pub,Recording Studio,Rental Car Location,Restaurant,Sandwich Place,Scenic Lookout,Skate Park,Skating Rink,Smoothie Shop,Snack Place,Spa,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Trail,Vegetarian / Vegan Restaurant,Video Store,Wings Joint,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [78]:
trt1_grouped = trt1_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(trt1_grouped.shape)
trt1_grouped

(18, 90)


Unnamed: 0,Neighborhoods,Airport,Athletics & Sports,Auto Workshop,Bakery,Bank,Bar,Baseball Field,Beer Store,Board Shop,Bookstore,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Stop,Butcher,Café,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,College Stadium,Comic Shop,Convenience Store,Curling Ice,Dessert Shop,Diner,Discount Store,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food & Drink Shop,French Restaurant,Fried Chicken Joint,Garden,Garden Center,Gas Station,General Entertainment,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Hardware Store,Health Food Store,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Kids Store,Latin American Restaurant,Lawyer,Light Rail Station,Liquor Store,Medical Center,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Pub,Recording Studio,Rental Car Location,Restaurant,Sandwich Place,Scenic Lookout,Skate Park,Skating Rink,Smoothie Shop,Snack Place,Spa,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Trail,Vegetarian / Vegan Restaurant,Video Store,Wings Joint,Yoga Studio
0,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.055556,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Central Bay Street,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
5,"First Canadian Place, Underground city",0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Harbourfront East, Union Station, Toronto Islands",0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0
8,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.0,0.025641,0.025641,0.0,0.0,0.0,0.051282,0.0,0.0,0.0,0.025641,0.0,0.0,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.025641,0.025641,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.0,0.025641,0.025641,0.0,0.025641,0.0,0.051282,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.051282,0.051282,0.0,0.0,0.0,0.025641,0.025641,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.025641,0.051282,0.0,0.051282,0.0,0.0,0.025641,0.0,0.0,0.025641
9,"Queen's Park, Ontario Provincial Government",0.0,0.125,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0


In [32]:
len(trt1_grouped[trt1_grouped["Chinese Restaurant"] > 0])

2

In [34]:
trt1_mall = trt1_grouped[["Neighborhoods","Chinese Restaurant"]]

trt1_mall.head()

Unnamed: 0,Neighborhoods,Chinese Restaurant
0,"CN Tower, King and Spadina, Railway Lands, Har...",0.0
1,Central Bay Street,0.0
2,Christie,0.0
3,Church and Wellesley,0.125
4,"Commerce Court, Victoria Hotel",0.0


## Cluster Neighborhoods

In [36]:
# set number of clusters
kclusters = 3

trt1_clustering = trt1_mall.drop(["Neighborhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(trt1_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=int32)

In [37]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
trt1_merged = trt1_mall.copy()

# add clustering labels
trt1_merged["Cluster Labels"] = kmeans.labels_

In [38]:
trt1_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
trt1_merged.head()

Unnamed: 0,Neighborhood,Chinese Restaurant,Cluster Labels
0,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0
1,Central Bay Street,0.0,0
2,Christie,0.0,0
3,Church and Wellesley,0.125,1
4,"Commerce Court, Victoria Hotel",0.0,0


In [47]:
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
trt1_merged = trt1_merged.join(toronto_data.set_index("Neighborhood"), on="Neighborhood")

print(trt1_merged.shape)
trt1_merged # check the last columns!

(18, 7)


Unnamed: 0,Neighborhood,Chinese Restaurant,Cluster Labels,Postal Code,Borough,Latitude,Longitude
0,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0,M5V,Downtown Toronto,43.662744,-79.321558
1,Central Bay Street,0.0,0,M5G,Downtown Toronto,43.782736,-79.442259
2,Christie,0.0,0,M6G,Downtown Toronto,43.753259,-79.329656
3,Church and Wellesley,0.125,1,M4Y,Downtown Toronto,43.696319,-79.532242
4,"Commerce Court, Victoria Hotel",0.0,0,M5L,Downtown Toronto,43.689574,-79.38316
5,"First Canadian Place, Underground city",0.0,0,M5X,Downtown Toronto,43.724766,-79.532242
6,"Garden District, Ryerson",0.0,0,M5B,Downtown Toronto,43.692657,-79.264848
7,"Harbourfront East, Union Station, Toronto Islands",0.0,0,M5J,Downtown Toronto,43.695344,-79.318389
8,"Kensington Market, Chinatown, Grange Park",0.0,0,M5T,Downtown Toronto,43.651571,-79.48445
9,"Queen's Park, Ontario Provincial Government",0.0,0,M7A,Downtown Toronto,43.773136,-79.239476


## Visualización de resultados

In [48]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(trt1_merged['Latitude'], trt1_merged['Longitude'], trt1_merged['Neighborhood'], trt1_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Análisis de los Clusters

## Cluster 0

In [76]:
trt1_merged.loc[trt1_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Chinese Restaurant,Cluster Labels,Postal Code,Borough,Latitude,Longitude
0,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0,M5V,Downtown Toronto,43.662744,-79.321558
1,Central Bay Street,0.0,0,M5G,Downtown Toronto,43.782736,-79.442259
2,Christie,0.0,0,M6G,Downtown Toronto,43.753259,-79.329656
4,"Commerce Court, Victoria Hotel",0.0,0,M5L,Downtown Toronto,43.689574,-79.38316
5,"First Canadian Place, Underground city",0.0,0,M5X,Downtown Toronto,43.724766,-79.532242
6,"Garden District, Ryerson",0.0,0,M5B,Downtown Toronto,43.692657,-79.264848
7,"Harbourfront East, Union Station, Toronto Islands",0.0,0,M5J,Downtown Toronto,43.695344,-79.318389
8,"Kensington Market, Chinatown, Grange Park",0.0,0,M5T,Downtown Toronto,43.651571,-79.48445
9,"Queen's Park, Ontario Provincial Government",0.0,0,M7A,Downtown Toronto,43.773136,-79.239476
10,"Regent Park, Harbourfront",0.0,0,M5A,Downtown Toronto,43.763573,-79.188711


## Cluster 1

In [49]:
trt1_merged.loc[trt1_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Chinese Restaurant,Cluster Labels,Postal Code,Borough,Latitude,Longitude
3,Church and Wellesley,0.125,1,M4Y,Downtown Toronto,43.696319,-79.532242


## Cluser 2

In [50]:
trt1_merged.loc[trt1_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Chinese Restaurant,Cluster Labels,Postal Code,Borough,Latitude,Longitude
13,St. James Town,0.133333,2,M5C,Downtown Toronto,43.799525,-79.318389


## Top 10 de lugares más populares in Downtown Toronto

In [80]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = trt_grouped['Neighborhood']

for ind in np.arange(trt_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(trt_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"CN Tower, King and Spadina, Railway Lands, Har...",Light Rail Station,Comic Shop,Recording Studio,Butcher,Skate Park,Burrito Place,Pizza Place,Brewery,Spa,Farmers Market
1,Central Bay Street,Grocery Store,Bank,Coffee Shop,Discount Store,Pharmacy,Pizza Place,Garden Center,Garden,Convenience Store,Curling Ice
2,Christie,Park,Bus Stop,Food & Drink Shop,Yoga Studio,Fast Food Restaurant,Dessert Shop,Diner,Discount Store,Electronics Store,Falafel Restaurant
3,Church and Wellesley,Pizza Place,Coffee Shop,Chinese Restaurant,Sandwich Place,Middle Eastern Restaurant,Intersection,Discount Store,Farm,Dessert Shop,Diner
4,"Commerce Court, Victoria Hotel",Lawyer,Park,Trail,Restaurant,Yoga Studio,Curling Ice,Dessert Shop,Diner,Discount Store,Electronics Store


In [21]:
trt_data = neighborhoods_venues_sorted[neighborhoods_venues_sorted['3rd Most Common Venue'] == 'Chinese Restaurant'].reset_index(drop=True)
trt_data.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Church and Wellesley,Pizza Place,Coffee Shop,Chinese Restaurant,Sandwich Place,Middle Eastern Restaurant,Intersection,Discount Store,Farm,Dessert Shop,Diner
