# IBM Applied Data Science Capstone Project

## Opening a Hotel in Cuauhtemoc, Mexico City

- Build a dataframe of "colonias" in Cuauhtemoc, Mexico City by web scraping the data from Wikipedia page
- Get the geographical coordinates of the "colonias"
- Obtain the hotel data for the "colonias" from Foursquare API
- Explore and cluster the "colonias"
- Select the best cluster to open a new hotel

## Import Libraries

In [2]:
import numpy as np

import pandas as pd 
pd.set_option('display.max_columns', None) 
pd.set_option('display.max_rows', None)

import json

from geopy.geocoders import Nominatim


import requests
from bs4 import BeautifulSoup
from pandas.io.json import json_normalize

import matplotlib.cm as cm 
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium 

print('Libraries imported.')

Libraries imported.


## Scrape Table From Wikipedia Into Dataframe

In [3]:
import urllib.request
from urllib.request import urlopen


url = 'https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Mexico_City'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")

In [4]:
# find all tables in page
all_tables=soup.find_all("table")
#all_tables

In [5]:
# locate the relevant table
right_table=soup.find('table', class_='sortable wikitable')
#right_table

In [6]:
A=[]
B=[]

# append the data into the list
for row in right_table.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==2:
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))

In [7]:
# create a new DataFrame from the list 
df=pd.DataFrame(A,columns=['Colonia'])
df['Population as of 2010']=B
df.head(10)

Unnamed: 0,Colonia,Population as of 2010
0,Ampliación Asturias,5708
1,Asturias,4364
2,Atlampa,14433
3,Buenavista,15605
4,Buenos Aires,5772
5,Centro,61229
6,Condesa,8453
7,Cuauhtémoc,11399
8,Doctores,44703
9,Esperanza,4072


In [8]:
# print the number of rows/columns of the dataframe
df.shape

(32, 2)

In [9]:
df['Municipality']='Cuauhtemoc'
df.head(10)

Unnamed: 0,Colonia,Population as of 2010,Municipality
0,Ampliación Asturias,5708,Cuauhtemoc
1,Asturias,4364,Cuauhtemoc
2,Atlampa,14433,Cuauhtemoc
3,Buenavista,15605,Cuauhtemoc
4,Buenos Aires,5772,Cuauhtemoc
5,Centro,61229,Cuauhtemoc
6,Condesa,8453,Cuauhtemoc
7,Cuauhtémoc,11399,Cuauhtemoc
8,Doctores,44703,Cuauhtemoc
9,Esperanza,4072,Cuauhtemoc


In [10]:
import geocoder
from opencage.geocoder import OpenCageGeocode

In [11]:
import getpass
key = getpass.getpass("Geocoder_Key: ")

Geocoder_Key: ········


In [12]:
geocoder = OpenCageGeocode(key)
# retrive coordinates for each colonia and store in a list

list_lat = []   
list_long = []



for index, row in df.iterrows(): 


    Colonia = row['Colonia']
    Municipality = row['Municipality']       
    query = str(Colonia)+','+str(Municipality) + " ,Mexico City"

    results = geocoder.geocode(query)   
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']

    list_lat.append(lat)
    list_long.append(long)

    

df['lat'] = list_lat   

df['lon'] = list_long

In [13]:
df

Unnamed: 0,Colonia,Population as of 2010,Municipality,lat,lon
0,Ampliación Asturias,5708.0,Cuauhtemoc,19.44506,-99.14612
1,Asturias,4364.0,Cuauhtemoc,19.40762,-99.13322
2,Atlampa,14433.0,Cuauhtemoc,19.456785,-99.156875
3,Buenavista,15605.0,Cuauhtemoc,19.446167,-99.152696
4,Buenos Aires,5772.0,Cuauhtemoc,19.405364,-99.149864
5,Centro,61229.0,Cuauhtemoc,19.406526,-99.155157
6,Condesa,8453.0,Cuauhtemoc,19.414864,-99.176429
7,Cuauhtémoc,11399.0,Cuauhtemoc,19.425662,-99.154645
8,Doctores,44703.0,Cuauhtemoc,19.421442,-99.14322
9,Esperanza,4072.0,Cuauhtemoc,19.409644,-99.135924


In [14]:
df0=df[['Colonia','lat','lon']]
df0

Unnamed: 0,Colonia,lat,lon
0,Ampliación Asturias,19.44506,-99.14612
1,Asturias,19.40762,-99.13322
2,Atlampa,19.456785,-99.156875
3,Buenavista,19.446167,-99.152696
4,Buenos Aires,19.405364,-99.149864
5,Centro,19.406526,-99.155157
6,Condesa,19.414864,-99.176429
7,Cuauhtémoc,19.425662,-99.154645
8,Doctores,19.421442,-99.14322
9,Esperanza,19.409644,-99.135924


In [15]:
df2 = df0[~df0["Colonia"].isin(['Centro','Buenos Aires','Hipódromo']) ]
df2

Unnamed: 0,Colonia,lat,lon
0,Ampliación Asturias,19.44506,-99.14612
1,Asturias,19.40762,-99.13322
2,Atlampa,19.456785,-99.156875
3,Buenavista,19.446167,-99.152696
6,Condesa,19.414864,-99.176429
7,Cuauhtémoc,19.425662,-99.154645
8,Doctores,19.421442,-99.14322
9,Esperanza,19.409644,-99.135924
10,Ex Hipódromo de Peralvillo,19.456775,-99.13501
11,Felipe Pescador,19.454137,-99.125458


In [16]:
# save the dataframe as a csv file
df2.to_csv("df2.csv", index=False)

## Create a map of Cuauhtemoc, Mexico City with colonia markers

In [17]:
# get the coordinates of Cuauhtemoc
query = 'Cuauhtemoc Mexico City'

results = geocoder.geocode(query)
latitude = results[0]['geometry']['lat']

longitude = results[0]['geometry']['lng']



print('The geograpical coordinate of Cuauhtemoc, Mexico City {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Cuauhtemoc, Mexico City 19.4416128, -99.1518637.


In [18]:
locations = df2[['lat','lon']]
locationlist = locations.values.tolist()
len(locationlist)
locationlist[24]

[19.4357757, -99.1539401]

In [19]:
# create a map of Cuauhtemoc
map_df2 = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, neighborhood in zip(df2['lat'], df2['lon'], df2['Colonia']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=9,
        popup=label,
        color='black',
        fill=True,
        fill_color='purple',
        fill_opacity=0.5).add_to(map_df2)   
    
map_df2

## Use Foursquare API to explore the colonias

In [20]:
# define Foursquare Credentials and Version
import getpass
CLIENT_ID = getpass.getpass("YOUR_CLIENT_ID: ")
CLIENT_SECRET = getpass.getpass("YOUR_CLIENT_SECRET: ")
VERSION = '20180605' # Foursquare API version

#print(f'CLIENT_ID:{CLIENT_ID}')
#print(f'CLIENT_SECRET:{CLIENT_SECRET}')

YOUR_CLIENT_ID: ········
YOUR_CLIENT_SECRET: ········


 #### get the top 100 venues that are within a radius of 3000 meters

In [21]:
radius = 3000
LIMIT = 100

venues = []

for lat, lon, Colonia in zip(df2['lat'], df2['lon'], df2['Colonia']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            Colonia,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [22]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Colonia', 'lon', 'lat', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head(20)

(2900, 7)


Unnamed: 0,Colonia,lon,lat,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Ampliación Asturias,19.44506,-99.14612,Carnitas Rigo,19.447722,-99.144617,Mexican Restaurant
1,Ampliación Asturias,19.44506,-99.14612,La Terraza La Birria,19.441306,-99.145838,Food
2,Ampliación Asturias,19.44506,-99.14612,Casa Rivas Mercado,19.440926,-99.146706,Historic Site
3,Ampliación Asturias,19.44506,-99.14612,El Rey del Pastor » Taquería y pozolería,19.445172,-99.14714,Mexican Restaurant
4,Ampliación Asturias,19.44506,-99.14612,Palacio De Bellas Artes,19.440565,-99.143499,Art Gallery
5,Ampliación Asturias,19.44506,-99.14612,Don Chuy: Birria y Pozole,19.441299,-99.145853,Mexican Restaurant
6,Ampliación Asturias,19.44506,-99.14612,Las Brasitas,19.448293,-99.146835,Taco Place
7,Ampliación Asturias,19.44506,-99.14612,Tacos El Paraíso,19.44373,-99.149433,Taco Place
8,Ampliación Asturias,19.44506,-99.14612,Turin,19.447989,-99.151951,Candy Store
9,Ampliación Asturias,19.44506,-99.14612,Micro Teatro Mexico,19.445994,-99.153665,Public Art


#### check how many venues were returned for each colonia

In [23]:
venues_df.groupby(["Colonia"]).count()

Unnamed: 0_level_0,lon,lat,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Colonia,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Ampliación Asturias,100,100,100,100,100,100
Asturias,100,100,100,100,100,100
Atlampa,100,100,100,100,100,100
Buenavista,100,100,100,100,100,100
Condesa,100,100,100,100,100,100
Cuauhtémoc,100,100,100,100,100,100
Doctores,100,100,100,100,100,100
Esperanza,100,100,100,100,100,100
Ex Hipódromo de Peralvillo,100,100,100,100,100,100
Felipe Pescador,100,100,100,100,100,100


#### find out how many unique categories can be made from all the returned venues

In [24]:
print('There are {} uniques categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 114 uniques categories.


In [25]:
# print out the list of categories
venues_df['VenueCategory'].unique()[:99]

array(['Mexican Restaurant', 'Food', 'Historic Site', 'Art Gallery',
       'Taco Place', 'Candy Store', 'Public Art',
       'Comfort Food Restaurant', 'History Museum', 'Brewery',
       'Donut Shop', 'Bakery', 'Salad Place', 'Art Museum',
       'Gym / Fitness Center', 'Hotel', 'Opera House', 'Hostel', 'Museum',
       'Park', 'Restaurant', 'Post Office', 'Sushi Restaurant', 'Plaza',
       'Concert Hall', 'Exhibit', 'Scenic Lookout',
       'General Entertainment', 'Monument / Landmark', 'Coffee Shop',
       'Steakhouse', 'Bed & Breakfast', 'Bistro', 'Seafood Restaurant',
       'Ice Cream Shop', 'Russian Restaurant', 'Theater',
       'Vegetarian / Vegan Restaurant', 'Building', 'Tapas Restaurant',
       'Tea Room', 'North Indian Restaurant', 'Deli / Bodega',
       'Sporting Goods Shop', 'Asian Restaurant', 'Diner', 'Beer Garden',
       'Spanish Restaurant', 'Jazz Club', 'Burger Joint', 'Pie Shop',
       'Nail Salon', 'Food Truck', 'Bar', 'Theme Restaurant',
       'Pizza Pla

In [26]:
# check if the results contain "Hotel"
"Hotel" in venues_df['VenueCategory'].unique()

True

## Analyze Each Colonia

In [27]:
kl_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
kl_onehot['Colonia'] = venues_df['Colonia'] 

# move neighborhood column to the first column
fixed_columns = [kl_onehot.columns[-1]] + list(kl_onehot.columns[:-1])
kl_onehot = kl_onehot[fixed_columns]

print(kl_onehot.shape)
kl_onehot.head(10)

(2900, 115)


Unnamed: 0,Colonia,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bar,Bed & Breakfast,Beer Garden,Big Box Store,Bistro,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Cafeteria,Café,Candy Store,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Coffee Shop,Comfort Food Restaurant,Concert Hall,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Exhibit,Falafel Restaurant,Farmers Market,Flea Market,Food,Food Truck,Fountain,French Restaurant,Garden,General College & University,General Entertainment,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Historic Site,History Museum,Hostel,Hotel,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Liquor Store,Market,Martial Arts School,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Monument / Landmark,Museum,Music Store,Music Venue,Nail Salon,Non-Profit,North Indian Restaurant,Opera House,Optical Shop,Paella Restaurant,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pie Shop,Pizza Place,Plaza,Post Office,Public Art,Restaurant,Roof Deck,Russian Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shopping Mall,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Stadium,Steakhouse,Sushi Restaurant,Taco Place,Tapas Restaurant,Tattoo Parlor,Tea Room,Theater,Theme Restaurant,Vegetarian / Vegan Restaurant,Warehouse Store,Water Park
0,Ampliación Asturias,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Ampliación Asturias,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Ampliación Asturias,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ampliación Asturias,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Ampliación Asturias,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Ampliación Asturias,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Ampliación Asturias,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
7,Ampliación Asturias,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
8,Ampliación Asturias,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Ampliación Asturias,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### group rows by colonia and by taking the mean of the frequency of occurrence of each category

In [28]:
kl_grouped = kl_onehot.groupby(["Colonia"]).mean().reset_index()

print(kl_grouped.shape)
kl_grouped

(29, 115)


Unnamed: 0,Colonia,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bar,Bed & Breakfast,Beer Garden,Big Box Store,Bistro,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Cafeteria,Café,Candy Store,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Coffee Shop,Comfort Food Restaurant,Concert Hall,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Exhibit,Falafel Restaurant,Farmers Market,Flea Market,Food,Food Truck,Fountain,French Restaurant,Garden,General College & University,General Entertainment,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Historic Site,History Museum,Hostel,Hotel,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Liquor Store,Market,Martial Arts School,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Monument / Landmark,Museum,Music Store,Music Venue,Nail Salon,Non-Profit,North Indian Restaurant,Opera House,Optical Shop,Paella Restaurant,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pie Shop,Pizza Place,Plaza,Post Office,Public Art,Restaurant,Roof Deck,Russian Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shopping Mall,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Stadium,Steakhouse,Sushi Restaurant,Taco Place,Tapas Restaurant,Tattoo Parlor,Tea Room,Theater,Theme Restaurant,Vegetarian / Vegan Restaurant,Warehouse Store,Water Park
0,Ampliación Asturias,0.0,0.0,0.02,0.06,0.0,0.01,0.0,0.03,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.02,0.03,0.03,0.01,0.02,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.15,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.01,0.03,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.01,0.06,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0
1,Asturias,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.07,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.03,0.01,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.02,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.05,0.01,0.02,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.1,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.02,0.01,0.02,0.02,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.06,0.0,0.01,0.01,0.0,0.01,0.03,0.0,0.01
2,Atlampa,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.06,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.02,0.01,0.0,0.03,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.02,0.02,0.01,0.01,0.05,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.19,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.1,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0
3,Buenavista,0.0,0.0,0.02,0.06,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.02,0.03,0.03,0.01,0.02,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.18,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.01,0.03,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.06,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0
4,Condesa,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.09,0.02,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.02,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.02,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.0,0.02,0.04,0.01,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.08,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.03,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.01,0.0,0.01,0.03,0.0,0.0
5,Cuauhtémoc,0.01,0.01,0.01,0.06,0.01,0.02,0.0,0.06,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.02,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.02,0.03,0.01,0.04,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.01,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.04,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0
6,Doctores,0.01,0.0,0.0,0.05,0.0,0.02,0.0,0.06,0.02,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.01,0.03,0.0,0.01,0.01,0.01,0.0,0.0,0.04,0.0,0.02,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.04,0.04,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.05,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0
7,Esperanza,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.08,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.02,0.0,0.03,0.01,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.06,0.01,0.02,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.11,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.02,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.04,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.07,0.0,0.01,0.01,0.0,0.01,0.02,0.0,0.0
8,Ex Hipódromo de Peralvillo,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.06,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.02,0.01,0.0,0.03,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.02,0.02,0.01,0.01,0.05,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.19,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.1,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0
9,Felipe Pescador,0.0,0.0,0.01,0.06,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.02,0.02,0.01,0.02,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.2,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.02,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.07,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0


In [29]:
len(kl_grouped[kl_grouped["Hotel"] > 0])

23

#### Create a new DataFrame for hotel data only

In [30]:
kl_hotel = kl_grouped[["Colonia","Hotel"]]
kl_hotel

Unnamed: 0,Colonia,Hotel
0,Ampliación Asturias,0.02
1,Asturias,0.0
2,Atlampa,0.01
3,Buenavista,0.02
4,Condesa,0.02
5,Cuauhtémoc,0.04
6,Doctores,0.04
7,Esperanza,0.0
8,Ex Hipódromo de Peralvillo,0.01
9,Felipe Pescador,0.02


## Cluster Colonias

In [31]:
# set number of clusters
kclusters = 3

kl_clustering = kl_hotel.drop(["Colonia"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:32]

array([2, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 2, 0, 2, 2, 1, 2, 1, 0, 1, 2, 1,
       1, 2, 0, 0, 2, 1, 2], dtype=int32)

In [32]:
# create a new dataframe that includes the cluster for each colonia.
kl_merged = kl_hotel.copy()

# add clustering labels
kl_merged["Cluster Labels"] = kmeans.labels_

In [33]:
kl_merged.rename(columns={"Colonia": "Colonia"}, inplace=True)
kl_merged.head()

Unnamed: 0,Colonia,Hotel,Cluster Labels
0,Ampliación Asturias,0.02,2
1,Asturias,0.0,1
2,Atlampa,0.01,1
3,Buenavista,0.02,2
4,Condesa,0.02,2


In [34]:
# merge to add latitude/longitude for each colonia
kl_merged = kl_merged.join(df2.set_index("Colonia"), on="Colonia")
print(kl_merged.shape)
kl_merged.head() 

(29, 5)


Unnamed: 0,Colonia,Hotel,Cluster Labels,lat,lon
0,Ampliación Asturias,0.02,2,19.44506,-99.14612
1,Asturias,0.0,1,19.40762,-99.13322
2,Atlampa,0.01,1,19.456785,-99.156875
3,Buenavista,0.02,2,19.446167,-99.152696
4,Condesa,0.02,2,19.414864,-99.176429


In [35]:
# sort the results by Cluster Labels
print(kl_merged.shape)
kl_merged.sort_values(["Cluster Labels"], inplace=True)
kl_merged

(29, 5)


Unnamed: 0,Colonia,Hotel,Cluster Labels,lat,lon
24,Tabacalera,0.04,0,19.435776,-99.15394
25,Tránsito,0.04,0,19.41789,-99.131599
5,Cuauhtémoc,0.04,0,19.425662,-99.154645
6,Doctores,0.04,0,19.421442,-99.14322
12,Juárez,0.04,0,19.433105,-99.14771
18,Roma Norte,0.04,0,19.418323,-99.162565
17,Peralvillo,0.0,1,19.461817,-99.13419
15,Obrera,0.01,1,19.4132,-99.144049
27,Valle Gómez,0.01,1,19.458629,-99.125247
21,San Simón Tolnáhuac,0.0,1,19.459546,-99.143257


#### visualize the resulting clusters

In [41]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_merged['lat'], kl_merged['lon'], kl_merged['Colonia'], kl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.2).add_to(map_clusters)
       
map_clusters

In [37]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

## Examine Clusters

In [38]:
#cluster 0
kl_merged.loc[kl_merged['Cluster Labels'] == 0]

Unnamed: 0,Colonia,Hotel,Cluster Labels,lat,lon
24,Tabacalera,0.04,0,19.435776,-99.15394
25,Tránsito,0.04,0,19.41789,-99.131599
5,Cuauhtémoc,0.04,0,19.425662,-99.154645
6,Doctores,0.04,0,19.421442,-99.14322
12,Juárez,0.04,0,19.433105,-99.14771
18,Roma Norte,0.04,0,19.418323,-99.162565


In [39]:
#cluster 1
kl_merged.loc[kl_merged['Cluster Labels'] == 1]

Unnamed: 0,Colonia,Hotel,Cluster Labels,lat,lon
17,Peralvillo,0.0,1,19.461817,-99.13419
15,Obrera,0.01,1,19.4132,-99.144049
27,Valle Gómez,0.01,1,19.458629,-99.125247
21,San Simón Tolnáhuac,0.0,1,19.459546,-99.143257
22,Santa María Insurgentes,0.0,1,19.460899,-99.152712
19,Roma Sur,0.0,1,19.405833,-99.163304
8,Ex Hipódromo de Peralvillo,0.01,1,19.456775,-99.13501
7,Esperanza,0.0,1,19.409644,-99.135924
2,Atlampa,0.01,1,19.456785,-99.156875
1,Asturias,0.0,1,19.40762,-99.13322


In [40]:
#cluster 2
kl_merged.loc[kl_merged['Cluster Labels'] == 2]

Unnamed: 0,Colonia,Hotel,Cluster Labels,lat,lon
26,Unidad Habitacional Nonoalco-Tlatelolco,0.02,2,19.453077,-99.14078
20,San Rafael,0.02,2,19.44506,-99.14612
23,Santa María la Ribera,0.02,2,19.448417,-99.157975
0,Ampliación Asturias,0.02,2,19.44506,-99.14612
14,Morelos,0.02,2,19.446847,-99.12984
13,Maza,0.02,2,19.454964,-99.128207
11,Hipódromo Condesa,0.02,2,19.44506,-99.14612
10,Guerrero,0.02,2,19.44491,-99.145173
9,Felipe Pescador,0.02,2,19.454137,-99.125458
4,Condesa,0.02,2,19.414864,-99.176429


#### Preliminary Observations

The highest concentration of hotels per colonia are in cluster 0 followed by cluster 2. Colonias in cluster 1 do not have as high a preponderance of hotels. Therefore, building a new Hotel somewhere in cluster 1 seems to be the most strategic decision based on the data.