# Final Assignment Week 4 - Electric charging infrastructure in Puebla City, Mexico

### Introduction

##### Electric car sales are expected to increase in the coming years. As a consequence, chargers will be required everywhere a car can reach.
##### Today only a few electric car brands can be seen on the streets. In Mexico the electric car market is just starting, and it is still unusual to see these cars
##### outside wealthy neighborhoods. On the other hand, almost every OEM already has electric car production projects, so it is all but certain that
##### electric cars will be mass produced.

In [133]:
Image(url= "https://canalys-com-public-prod.s3.eu-west-2.amazonaws.com/cosi/campaign/1935/vKlFQ43feFy9cC_VMpHFXVQkJ0rid7A9.png", width=800, height=800)

##### This is a huge challenge for every government. Still, infrastructure for electric cars should not be rolled out at random.
##### Like gas stations, charging stations should be located strategically, and a range of suppliers is ready to take a slice of this market.

In [134]:
Image(url= "https://i.guim.co.uk/img/media/7d5dfcdf7a70afe5273fa65590fdb99fdcda1a97/0_89_3000_1800/master/3000.jpg?width=1200&quality=85&auto=format&fit=max&s=d9a6fc098390afea29a96f4c59da2adc", width=800, height=800)

### Problem

##### Let's say I am a businessman interested in investing some of my money in this emerging business in the city of Puebla, Mexico.
##### The question is...
#### Where are the best places to build charging stations?
##### Putting myself in the customer's shoes: why would I drive to the other side of the country if there is no way to charge my car properly and I risk not making the trip back home?
##### To make the business profitable, a single charging station is not enough; a complete network around the city is needed.

##### Therefore, we need to choose strategic places to locate these charging spots.
##### The problem can be approached from different perspectives:

* By income (how much a neighborhood earns on average)
* By main streets
* By most frequently visited places
* By law
* By the charging spots already installed
* Among others


##### In this analysis we take the "most frequently visited places" perspective and use the Foursquare API to find those places.
##### The main hypothesis is:
#### People are more likely to recharge their cars in more crowded places, which is better both for business and for accessibility.


### Data approach and methodology

##### Some important considerations about the venues:

* We focus on the city of Puebla, taking a radius of roughly 50 km around the city's central coordinates.
* Charging time needed for an average driver to get back home in the surroundings of Puebla City: at least 1 hour.
* It does not make sense to consider "take and go" places, since the customer will not stay long enough to charge their car.
* We assume the selected boroughs already have electrical infrastructure.


#### Methodology: 

1. Find a source listing the Puebla City boroughs:

https://codigospostales.nte.mx/poblados-de-Puebla-estado-de-puebla.html

2. Clean the data
   1. Load the set in the correct format for this information 
   2. Import coordinates
   3. Clean out unavailable data from geocoder
   4. Clean out data from coordinates out of range of 50 km around Puebla
   
3. Venues in Puebla 
   
   1. Load the venues in Puebla
   2. Take a look at the most frequent venue categories.
   3. Filter those venues which are not useful for the analysis.

4. K-means

   1. One-hot encode the data
   2. Get the top venues per area
   3. Run K-means
   4. Plot the venues per cluster
   
5. Analysis and Conclusions
   


In [5]:
Image(url= "https://www.honda.co.uk/engineroom/electric/ev/the-ultimate-electric-car-faq/assets/o2fBPuWzeW/batteriesh.gif", width=800, height=800)

## Development

In [2]:
#Required libraries for development
!pip install bs4
#!pip install requests

import numpy as np # library to handle data in a vectorized manner
import requests  # this module helps us to download a web page
import pandas as pd
import json # library to handle JSON files
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium # map rendering library

!conda install -c conda-forge geocoder --yes  
import geocoder
#from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas >= 1.0; formerly pandas.io.json.json_normalize)

from bs4 import BeautifulSoup # this module helps with web scraping

from sklearn.cluster import KMeans # import k-means from clustering stage

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

from IPython import display
from IPython.display import Image

import matplotlib as mpl
import matplotlib.pyplot as plt

print("Libs ok")

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libs ok


#### Reading the zip codes and neighborhoods of Puebla City

In [3]:
url0 = "https://codigospostales.nte.mx/poblados-de-Puebla-estado-de-puebla.html"
dataframe_list = pd.read_html(url0, flavor='bs4')
len(dataframe_list)

Poblacion_P = dataframe_list[0]
Poblacion_P.columns = ['Asentamiento','Nombre Asentamiento','Municipio','Estado','Codigo Postal']
Poblacion_P.head()

Unnamed: 0,Asentamiento,Nombre Asentamiento,Municipio,Estado,Codigo Postal
0,Colonia,15 de Septiembre,Puebla,Puebla,72227
1,Colonia,16 de Septiembre Norte,Puebla,Puebla,72230
2,Colonia,16 de Septiembre Sur,Puebla,Puebla,72474
3,Colonia,18 de Marzo,Puebla,Puebla,72595
4,Colonia,2 de Marzo,Puebla,Puebla,72227


#### Reordering the data

In [4]:
ordenado = Poblacion_P

ordenado = ordenado.drop(['Asentamiento','Estado'], axis=1)
ordenado = ordenado.rename(columns = {'Nombre Asentamiento': 'Neighborhood', 'Municipio': 'Borough'}, inplace = False)
ordenado.head()

Unnamed: 0,Neighborhood,Borough,Codigo Postal
0,15 de Septiembre,Puebla,72227
1,16 de Septiembre Norte,Puebla,72230
2,16 de Septiembre Sur,Puebla,72474
3,18 de Marzo,Puebla,72595
4,2 de Marzo,Puebla,72227


#### Getting the latitude and longitude of each available neighborhood

In [5]:
rows = []

for name in ordenado['Neighborhood']:
    g = geocoder.osm(name + ", Puebla")
    v = g.latlng

    if v is None:
        v = [0, 0]  # If the geocoder couldn't find a location, store 0 instead of None so the row can still be added

    rows.append({'Latitude': v[0], 'Longitude': v[1]})

# Build the DataFrame once at the end; DataFrame.append was removed in pandas 2.0
puebla_coor = pd.DataFrame(rows, columns=['Latitude', 'Longitude'])

In [7]:
puebla_coor.to_csv("puebla_coor.csv")


ordenado2 = ordenado.join(puebla_coor)
#ordenadoR = ordenado2
#ordenado2.to_csv("ordenado_CSV.csv")
#ordenadoR.head()

#ordenado2 = pd.read_csv('ordenado_CSV.csv')
#ordenado2.drop(columns=['Unnamed: 0'])
ordenado2.head()

Unnamed: 0,Neighborhood,Borough,Codigo Postal,Latitude,Longitude
0,15 de Septiembre,Puebla,72227,19.01911,-98.220416
1,16 de Septiembre Norte,Puebla,72230,24.051496,-104.592982
2,16 de Septiembre Sur,Puebla,72474,18.994701,-98.219232
3,18 de Marzo,Puebla,72595,18.968889,-98.160278
4,2 de Marzo,Puebla,72227,19.051995,-98.254796
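
The geocoding loop above encodes a failed lookup as the coordinate pair (0, 0). A minimal sketch of an alternative, assuming the lookup is passed in as a function so that failures stay as `None` and can be dropped explicitly. The names `geocode_all` and `fake_lookup` are hypothetical; in this notebook the lookup would wrap `geocoder.osm(query).latlng`.

```python
def geocode_all(names, lookup, suffix=", Puebla"):
    """Return one (name, lat, lng) tuple per name; lat and lng are None on failure."""
    rows = []
    for name in names:
        latlng = lookup(name + suffix)
        if latlng is None:
            rows.append((name, None, None))
        else:
            rows.append((name, latlng[0], latlng[1]))
    return rows


def fake_lookup(query):
    """Hypothetical stand-in for `geocoder.osm(query).latlng` (no network access)."""
    known = {"15 de Septiembre, Puebla": [19.01911, -98.220416]}
    return known.get(query)


rows = geocode_all(["15 de Septiembre", "Nowhere"], fake_lookup)

# Keep only the neighborhoods that were actually located
located = [row for row in rows if row[1] is not None]
print(located)
```

Keeping failures as `None` avoids treating (0, 0) as a real coordinate, at the cost of an explicit filtering step afterwards.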


##### The table now contains some entries that will not help our analysis.
##### First, we drop all neighborhoods that the geocoder did not recognize.
##### Second, the geocoder may confuse some locations with places of the same name elsewhere in the world.
##### Therefore, we remove every place more than about half a degree (0.45°) of latitude or longitude away from Puebla (1° of latitude ≈ 111 km).
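
As a rough check of the degree-to-kilometer conversion, the sketch below uses the 111 km per degree figure from the text and a spherical approximation for longitude, where one degree shrinks by the cosine of the latitude. The helper name `km_per_deg_lon` is hypothetical.

```python
import math

KM_PER_DEG_LAT = 111.0  # approximation quoted in the text


def km_per_deg_lon(lat_deg):
    """Length of one degree of longitude at a given latitude (spherical approximation)."""
    return KM_PER_DEG_LAT * math.cos(math.radians(lat_deg))


LA_PUEBLA = 19.0413      # latitude of Puebla used in this notebook
half_width_deg = 0.45    # half-width of the bounding box in degrees

north_south_km = half_width_deg * KM_PER_DEG_LAT
east_west_km = half_width_deg * km_per_deg_lon(LA_PUEBLA)
print(north_south_km, east_west_km)
```

Under these assumptions the box reaches about 50 km north and south but only roughly 47 km east and west, so "50 km around Puebla" is an approximation rather than an exact radius.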

In [8]:
#Puebla coordinates
LA_Puebla =  19.0413
LO_Puebla = -98.2062
Radio_P = 0.45       # Half-width of the bounding box in degrees (roughly 50 km)

# Drop all neighborhoods without a location (geocoder failures were stored as 0)
ordenado3 = ordenado2[(ordenado2 != 0).all(axis=1)]

# Keep only the rows inside the box around Puebla; filtering with a mask and
# copying the result avoids pandas' SettingWithCopyWarning
inside_box = (
    ordenado3['Latitude'].between(LA_Puebla - Radio_P, LA_Puebla + Radio_P)      # ~50 km north / south
    & ordenado3['Longitude'].between(LO_Puebla - Radio_P, LO_Puebla + Radio_P)   # ~50 km east / west
)
ordenado3 = ordenado3[inside_box].copy()

# NOTE: drop_duplicates returns a new DataFrame; the result is not assigned here,
# so repeated neighborhood names remain in ordenado3
ordenado3.drop_duplicates(subset=['Neighborhood'])
ordenado3 = ordenado3.reset_index(drop=True)
ordenado3.head()


Unnamed: 0,Neighborhood,Borough,Codigo Postal,Latitude,Longitude
0,15 de Septiembre,Puebla,72227,19.01911,-98.220416
1,16 de Septiembre Sur,Puebla,72474,18.994701,-98.219232
2,18 de Marzo,Puebla,72595,18.968889,-98.160278
3,2 de Marzo,Puebla,72227,19.051995,-98.254796
4,6 de Junio,Puebla,72227,19.111389,-98.147222
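
The filter above keeps a square box in degrees rather than a true 50 km circle. A minimal sketch of a radial filter based on the haversine great-circle distance, offered as an alternative and not as the method used in this notebook. The two sample rows are taken from the geocoding output shown earlier; `haversine_km` is a hypothetical helper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


LA_Puebla, LO_Puebla = 19.0413, -98.2062

points = [
    ("15 de Septiembre", 19.01911, -98.220416),
    ("16 de Septiembre Norte", 24.051496, -104.592982),  # geocoded far away from Puebla
]

# Keep only the neighborhoods within 50 km of the city center
within_50km = [
    name for name, lat, lon in points
    if haversine_km(LA_Puebla, LO_Puebla, lat, lon) <= 50
]
print(within_50km)
```

A radial filter treats all directions equally, while the degree-based box is simpler to express with column comparisons; for outlier removal either is likely adequate.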


#### Visualizing the neighborhoods with Folium

In [9]:
map_Puebla = folium.Map(location = [LA_Puebla, LO_Puebla], zoom_start=10)  # create map of Puebla using latitude and longitude values

# add markers to map
for lat, lng, borough, neighborhood in zip(ordenado3['Latitude'], ordenado3['Longitude'], ordenado3['Borough'], ordenado3['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=False,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Puebla)  
    
map_Puebla

In [10]:
#Foursquare credentials

import os

# Credentials are read from environment variables instead of being hard-coded in the notebook.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumed convention.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180604' # Foursquare API version


In [11]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 2000 # define radius

#We will use the central coordinates of Puebla defined above
url2 = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    LA_Puebla,    #Latitude
    LO_Puebla,    #Longitude
    radius, 
    LIMIT)

results = requests.get(url2).json()
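
The request URL above is assembled with string formatting. A minimal sketch of the same URL built from a parameter dictionary with the standard library, which takes care of escaping. The function name `build_explore_url` and the placeholder credentials are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore request URL from its query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",   # latitude,longitude of the search center
        "radius": radius,       # search radius in meters
        "limit": limit,         # maximum number of venues returned
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials, for illustration only
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180604", 19.0413, -98.2062, 2000, 100)
print(url)
```

`urlencode` percent-encodes the comma in `ll`, which is equivalent to the literal comma once the server decodes the query string.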

In [13]:
def get_category_type(row):
    """Return the name of the first category of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Miel de Agave,Bar,19.042566,-98.200806
1,BRICO Pizzería Restaurant,Pizza Place,19.04355,-98.203515
2,Super Paletería Mary Barragán,Ice Cream Shop,19.038942,-98.201833
3,Todo Rock,Tattoo Parlor,19.043225,-98.209372
4,El Sueño Spa y Hotel,Hotel,19.041057,-98.199411


In [15]:
print('There are {} unique categories.'.format(len(nearby_venues['categories'].unique())))

There are 42 unique categories.


In [16]:
print (nearby_venues['categories'].value_counts())

Mexican Restaurant               20
Hotel                             8
Bar                               6
Taco Place                        5
Coffee Shop                       5
Candy Store                       4
Café                              4
Bakery                            3
Historic Site                     3
Steakhouse                        3
Ice Cream Shop                    2
Church                            2
Italian Restaurant                2
Plaza                             2
Vegetarian / Vegan Restaurant     2
Department Store                  2
Pizza Place                       2
Cosmetics Shop                    1
Art Gallery                       1
Restaurant                        1
Bed & Breakfast                   1
Event Space                       1
Sporting Goods Shop               1
Gym / Fitness Center              1
Seafood Restaurant                1
Sports Bar                        1
Art Museum                        1
Food Truck                  

In [17]:
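def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for venues within `radius` meters of each neighborhood and return them as one DataFrame."""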
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        # headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
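
`getNearbyVenues` indexes straight into the response, so a venue without categories or a response without groups would raise an exception. A minimal sketch of more defensive parsing, assuming the payload has the same nesting the function above relies on (`response` → `groups` → `items` → `venue`). The name `parse_explore_items` and the sample payload are hypothetical; the first venue reuses values from this notebook's results and the second is made up to show the missing-category case.

```python
def parse_explore_items(payload):
    """Extract (name, lat, lng, category) tuples, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use None when a venue has no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((venue.get("name"), location.get("lat"), location.get("lng"), category))
    return rows


sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "El Gran Taco",
                           "location": {"lat": 19.01745, "lng": -98.219835},
                           "categories": [{"name": "Taco Place"}]}},
                {"venue": {"name": "Uncategorized venue",  # hypothetical entry
                           "location": {"lat": 19.02, "lng": -98.22},
                           "categories": []}},
            ]
        }]
    }
}

venue_rows = parse_explore_items(sample)
print(venue_rows)
```

Rows with a `None` category can then be dropped or labeled before the one-hot encoding step.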

In [18]:
Puebla_venues = getNearbyVenues(names=ordenado3['Neighborhood'],
                                   latitudes=ordenado3['Latitude'],
                                   longitudes=ordenado3['Longitude']
                                  )

15 de Septiembre
16 de Septiembre Sur
18 de Marzo
2 de Marzo
6 de Junio
8 de Diciembre
Acocota
Adolfo López Mateos
Álamos Haras
Alcanfores
Alpha 2
Álvaro Obregón
Ampliación Balcones del sur
Ampliación Reforma
Angelopolis
Antigua Francisco Villa
Anzures
Arboledas de Loma Bella
Arboledas del Sur
Artículo Primero Constitucional
Balcones del Sur
Barranca Honda
Barrio de Santa Anita
Barrio San Juan (San Francisco Totimehuacan)
Barrio San Miguel
Barrios de Santa Catarina
Bellas Artes
Benito Juárez
Benito Juárez
Benito Juárez
Bosques de Amalucan
Bosques de Amalucan 1ra Sección
Bosques de Angelopolis
Bosques de Chapultepec
Bosques de la Cañada
Bosques de los Angeles
Bosques de Manzanilla
Bosques de Santa Anita
Britania
Buenavista Tetela
Cabañas del Lago
Calderón (Crucero el Oásis)
Camino Real
Carmen Huexotitla
Central de Abastos
Centro Comercial Puebla
Centro Cruz del Sur
Chula Vista
Cleotilde Torres
Club Britania
Club de Golf
Club de Golf las Fuentes
Club de Golf Puebla
Colibrí
Colorines
Conc

In [26]:

Puebla_venues.to_csv(r'D:\Users\dm34942\Desktop\Puebla_venues.csv')
Puebla_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,15 de Septiembre,19.01911,-98.220416,El Gran Taco,19.01745,-98.219835,Taco Place
1,15 de Septiembre,19.01911,-98.220416,Sweet & Coffee,19.021024,-98.2176,Coffee Shop
2,15 de Septiembre,19.01911,-98.220416,Los 3 García,19.015389,-98.222399,Taco Place
3,15 de Septiembre,19.01911,-98.220416,KFFTO,19.022261,-98.217795,Café
4,15 de Septiembre,19.01911,-98.220416,Mi Super Tako,19.022705,-98.21925,Taco Place


In [27]:
print('There are {} unique categories.'.format(len(Puebla_venues['Venue Category'].unique())))

There are 257 unique categories.


#### Taking a look at the venues in Puebla City

In [28]:
Top25 = Puebla_venues['Venue Category'].value_counts()
Top25

Mexican Restaurant                 403
Taco Place                         346
Convenience Store                  213
Coffee Shop                        129
Restaurant                         123
Seafood Restaurant                 104
Pharmacy                            92
Hotel                               86
Pizza Place                         82
Café                                79
Bar                                 70
Gym / Fitness Center                60
Bakery                              54
Italian Restaurant                  49
Burger Joint                        49
Ice Cream Shop                      47
Park                                45
Gym                                 44
Steakhouse                          40
Shopping Mall                       39
Fried Chicken Joint                 36
Soccer Field                        36
Sandwich Place                      34
Candy Store                         33
Garden                              31
Sushi Restaurant         

#### After some review: to charge an electric car, we need to remove all "take and go" places and keep only those where someone is likely to stay longer than an hour. We also remove places where people go to drink; safety first. For more precise results, deeper research into how much time people actually spend at these venues would be helpful.
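
The rule described above (stay longer than an hour, no drinking venues) can be sketched as a small predicate. The dwell times below are hypothetical placeholders, not measurements, and the names `ASSUMED_DWELL_MINUTES`, `ALCOHOL_FOCUSED` and `is_candidate` are illustrative only.

```python
# Hypothetical dwell times in minutes; placeholder values, not measured data
ASSUMED_DWELL_MINUTES = {
    "Mexican Restaurant": 75,
    "Shopping Mall": 120,
    "Hotel": 600,
    "Taco Place": 20,
    "Convenience Store": 5,
}

# Categories excluded regardless of dwell time (drinking and driving)
ALCOHOL_FOCUSED = {"Bar", "Pub", "Brewery"}

MIN_CHARGE_MINUTES = 60  # the "at least 1 hour" charging assumption


def is_candidate(category):
    """True if a venue category is a plausible place for a charging spot."""
    if category in ALCOHOL_FOCUSED:
        return False
    # Categories with no assumed dwell time default to 0 and are rejected
    return ASSUMED_DWELL_MINUTES.get(category, 0) >= MIN_CHARGE_MINUTES


categories = ["Mexican Restaurant", "Taco Place", "Bar", "Hotel", "Volcano"]
kept = [c for c in categories if is_candidate(c)]
print(kept)
```

This sketch rejects categories it knows nothing about, whereas the cell that follows keeps every category that is not explicitly excluded; which default is preferable depends on how complete the dwell-time data is.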

In [70]:
PV = Puebla_venues   # alias, not a copy: the drop below also modifies Puebla_venues

# Categories removed from the analysis: "take and go" places, places where people
# drink, and places where a charging spot makes no sense
excluded_categories = [
    'Taco Place', 'Pizza Place', 'Convenience Store', 'Pharmacy', 'Bakery',
    'Ice Cream Shop', 'Candy Store', 'Snack Store', 'Food Truck',
    'Bar',                        # We want to avoid drinking and driving
    'Fried Chicken Joint', 'Juice Bar', 'Brewery', 'Pet Store', 'Liquor Store',
    'Beer Garden', 'Paper / Office Supplies Store', 'Hot Dog Joint',
    'Electronics Store', 'Shoe Store', 'Sports Bar', 'Cosmetics Shop',
    'Big Box Store', 'Cocktail Bar', 'Shipping Store', 'Mobile Phone Shop',
    'Arts & Crafts Store', 'Flower Shop', 'Cupcake Shop', 'Donut Shop',
    'Sporting Goods Shop', 'Irish Pub', 'Pastry Shop', 'Athletics & Sports',
    'Beer Bar', 'Dive Bar', 'Speakeasy', 'Food Stand', 'Bus Station', 'Stables',
    'Frozen Yogurt Shop', 'Bubble Tea Shop', 'Fountain', 'Burrito Place',
    'Video Store', 'Print Shop', 'Pub', 'Video Game Store', 'Toy / Game Store',
    'Butcher', 'Smoke Shop', 'Gift Shop', 'Food & Drink Shop', 'Bagel Shop',
    'ATM', 'Business Service', 'Wine Shop', 'Health Food Store', 'Gourmet Shop',
    'Bus Stop',
    'Volcano',                    # Definitely not there
    'Outdoor Supply Sotre',       # misspelled, matches nothing; the correct name appears further down
    "Men's Store",
    'Mountain',                   # Nor here
    'Fish & Chips Shop', 'Record Shop', 'Creperie', 'Hobby Shop', 'Tailor Shop',
    'Monument / Landmark', 'Cheese Shop', 'Jewelry Store', 'Track',
    'Memorial Site', "Women's Store", 'Whisky Bar', 'Public Art',
    'Heliport',                   # Cars cannot fly yet
    'Herbs $ Spices Store',       # written with '$', so 'Herbs & Spices Store' is NOT removed
    'Street Food Gathering', 'Camera Store', 'Fish Market', 'Moving Target',
    'Baby Store', 'Beer Store', 'Lingerie Store', 'Home Service', 'Drugstore',
    'Organic Grocery', 'Tiki Bar', 'Supplement Shop', 'Bridge', 'Winery',
    'Outdoor Supply Store', 'Boutique', 'Karaoke Bar',
]

# Drop every venue whose category is in the exclusion list (original index labels are kept)
PV.drop(PV.index[PV['Venue Category'].isin(excluded_categories)], inplace = True)


PV['Venue Category'].value_counts()

Mexican Restaurant                 403
Coffee Shop                        129
Restaurant                         123
Seafood Restaurant                 104
Hotel                               86
Café                                79
Gym / Fitness Center                60
Italian Restaurant                  49
Burger Joint                        49
Park                                45
Gym                                 44
Steakhouse                          40
Shopping Mall                       39
Soccer Field                        36
Sandwich Place                      34
Garden                              31
Sushi Restaurant                    30
Department Store                    28
Snack Place                         25
Breakfast Spot                      25
Japanese Restaurant                 22
Diner                               21
Grocery Store                       20
BBQ Joint                           19
Fast Food Restaurant                19
Plaza                    

In [30]:
Puebla_onehot = pd.get_dummies(PV[['Venue Category']], prefix="", prefix_sep="")

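# add neighborhood column back to dataframe
# NOTE: Foursquare has a venue category literally named 'Neighborhood', so this assignment
# overwrites that existing dummy column in place instead of appending a new last column
Puebla_onehot['Neighborhood'] = PV['Neighborhood'] 

# move the last column to the first position (because of the name clash above, this is
# 'Yucatecan Restaurant' rather than 'Neighborhood', as the output below shows)
fixed_columns = [Puebla_onehot.columns[-1]] + list(Puebla_onehot.columns[:-1])
Puebla_onehot = Puebla_onehot[fixed_columns]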

Puebla_onehot.head()

Unnamed: 0,Yucatecan Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,BBQ Joint,Baseball Field,Bath House,Bed & Breakfast,Belgian Restaurant,Bistro,Bookstore,Brazilian Restaurant,Breakfast Spot,Buffet,Burger Joint,Cable Car,Cafeteria,Café,Campground,Caribbean Restaurant,Carpet Store,Casino,Chinese Restaurant,Church,Circus,City,City Hall,Clothing Store,Coffee Shop,College Administrative Building,College Auditorium,Comedy Club,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Cuban Restaurant,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Doctor's Office,Dog Run,Dry Cleaner,Empanada Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food Court,Football Stadium,French Restaurant,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,German Restaurant,Go Kart Track,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Health & Beauty Service,Herbs & Spices Store,Historic Site,History Museum,Hookah Bar,Hot Spring,Hotel,Hotel Bar,Indoor Play Area,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Lounge,Market,Martial Arts School,Mediterranean Restaurant,Mexican Restaurant,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,Neighborhood,New American Restaurant,Nightclub,Outdoor Event Space,Outdoor Sculpture,Paintball Field,Park,Pet Service,Photography Studio,Pie Shop,Planetarium,Plaza,Pool,Pool Hall,Post Office,Racetrack,Ramen Restaurant,Recreation Center,Rental Service,Resort,Restaurant,Rock Climbing Spot,Salad Place,Salon / Barbershop,Salsa Club,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shop & Service,Shopping Mall,Skate 
Park,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Swim School,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Warehouse Store,Water Park,Wine Bar,Wings Joint,Yoga Studio
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15 de Septiembre,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15 de Septiembre,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,15 de Septiembre,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15 de Septiembre,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15 de Septiembre,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
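
Averaging one-hot rows per neighborhood, as the next cell does with `groupby('Neighborhood').mean()`, yields the share of each venue category within that neighborhood. A minimal pure-Python sketch of that computation on made-up data; the neighborhoods "A" and "B" and the function `category_shares` are hypothetical.

```python
from collections import Counter, defaultdict


def category_shares(pairs):
    """Map each neighborhood to {category: fraction of its venues in that category}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares


# Hypothetical (neighborhood, venue category) pairs
venue_pairs = [
    ("A", "Coffee Shop"), ("A", "Coffee Shop"), ("A", "Hotel"), ("A", "Park"),
    ("B", "Mexican Restaurant"),
]

shares = category_shares(venue_pairs)
print(shares["A"])
```

In this example half of the venues in "A" are coffee shops, which corresponds to a value of 0.5 in the "Coffee Shop" column of the grouped table for that neighborhood.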


In [31]:
Puebla_grouped = Puebla_onehot.groupby('Neighborhood').mean().reset_index()
Puebla_grouped.head()

Unnamed: 0,Neighborhood,Yucatecan Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,BBQ Joint,Baseball Field,Bath House,Bed & Breakfast,Belgian Restaurant,Bistro,Bookstore,Brazilian Restaurant,Breakfast Spot,Buffet,Burger Joint,Cable Car,Cafeteria,Café,Campground,Caribbean Restaurant,Carpet Store,Casino,Chinese Restaurant,Church,Circus,City,City Hall,Clothing Store,Coffee Shop,College Administrative Building,College Auditorium,Comedy Club,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Cuban Restaurant,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Doctor's Office,Dog Run,Dry Cleaner,Empanada Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food Court,Football Stadium,French Restaurant,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,German Restaurant,Go Kart Track,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Health & Beauty Service,Herbs & Spices Store,Historic Site,History Museum,Hookah Bar,Hot Spring,Hotel,Hotel Bar,Indoor Play Area,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Lounge,Market,Martial Arts School,Mediterranean Restaurant,Mexican Restaurant,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Outdoor Event Space,Outdoor Sculpture,Paintball Field,Park,Pet Service,Photography Studio,Pie Shop,Planetarium,Plaza,Pool,Pool Hall,Post Office,Racetrack,Ramen Restaurant,Recreation Center,Rental Service,Resort,Restaurant,Rock Climbing Spot,Salad Place,Salon / Barbershop,Salsa Club,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shop & Service,Shopping Mall,Skate Park,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Swim School,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Warehouse Store,Water Park,Wine Bar,Wings Joint,Yoga Studio
0,15 de Septiembre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.176471,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.235294,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,16 de Septiembre Sur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2 de Marzo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,8 de Diciembre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Acocota,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.37037,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
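
Each cell in the grouped table is the share of a neighborhood's venues that fall in a given category, because the mean of a 0/1 column is the fraction of ones. A minimal sketch on toy data illustrates this; the `venues` frame and its two neighborhoods are hypothetical and are not taken from the Puebla data.

```python
import pandas as pd

# Hypothetical toy data: one row per venue found in a neighborhood.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Café", "Park", "Hotel", "Café", "Café"],
})

# One-hot encode the category, then put the neighborhood name back in front.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column is the share of rows equal to 1, so each cell
# becomes the fraction of the neighborhood's venues in that category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

In this toy case, neighborhood A has two parks among four venues, so its `Park` frequency is 0.5, and every row sums to 1.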


In [32]:
# Top 5 most common venues per neighborhood

num_top_venues = 5

for hood in Puebla_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Puebla_grouped[Puebla_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----15 de Septiembre----
                venue  freq
0  Seafood Restaurant  0.24
1  Mexican Restaurant  0.18
2          Restaurant  0.12
3        Burger Joint  0.12
4         Coffee Shop  0.12


----16 de Septiembre Sur----
                  venue  freq
0    Seafood Restaurant  0.50
1                  Pool  0.25
2         Garden Center  0.25
3  Yucatecan Restaurant  0.00
4     Outdoor Sculpture  0.00


----2 de Marzo----
                  venue  freq
0          Soccer Field  0.33
1                Lounge  0.17
2   Japanese Restaurant  0.17
3  Gym / Fitness Center  0.17
4                   Gym  0.17


----8 de Diciembre----
                  venue  freq
0                  Park   1.0
1  Yucatecan Restaurant   0.0
2           Music Venue   0.0
3            Nail Salon   0.0
4       Nature Preserve   0.0


----Acocota----
                  venue  freq
0    Mexican Restaurant  0.37
1                 Hotel  0.07
2    Seafood Restaurant  0.07
3                Garden  0.04
4  Gym / Fitness Center  0.04

In [33]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
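
To see what this helper returns, here is a small sketch that applies it to a single hand-made row. The neighborhood name and the frequencies are hypothetical; the function body is the same logic as above, with an explicit cast to float added so the sort does not depend on the row's object dtype.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and rank the frequencies.
    row_categories = row.iloc[1:].astype(float)
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical frequencies for one neighborhood.
grouped = pd.DataFrame(
    [["Centro", 0.10, 0.50, 0.05, 0.35]],
    columns=["Neighborhood", "Café", "Mexican Restaurant", "Park", "Hotel"],
)

top3 = return_most_common_venues(grouped.iloc[0], 3)
print(list(top3))  # ['Mexican Restaurant', 'Hotel', 'Café']
```

The helper assumes that the neighborhood name is the first column. When a neighborhood has fewer categories than `num_top_venues`, the remaining slots are filled with zero-frequency categories, which is why entries such as `Yucatecan Restaurant 0.00` appear in the listing above.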

In [63]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 'st', 'nd', 'rd' every rank takes the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Puebla_grouped['Neighborhood']

for ind in np.arange(Puebla_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Puebla_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,15 de Septiembre,Seafood Restaurant,Mexican Restaurant,Restaurant,Burger Joint,Coffee Shop
1,16 de Septiembre Sur,Seafood Restaurant,Pool,Garden Center,Yucatecan Restaurant,Outdoor Sculpture
2,2 de Marzo,Soccer Field,Lounge,Japanese Restaurant,Gym / Fitness Center,Gym
3,8 de Diciembre,Park,Yucatecan Restaurant,Music Venue,Nail Salon,Nature Preserve
4,Acocota,Mexican Restaurant,Hotel,Seafood Restaurant,Garden,Gym / Fitness Center


#### Setting a K Value of 5 for this analysis

In [65]:
kclusters = 5

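# the positional axis argument of drop() was removed in pandas 2.0; name the column explicitly
Puebla_grouped_clustering = Puebla_grouped.drop(columns='Neighborhood')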

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Puebla_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kint = kmeans.labels_
kint

array([0, 0, 4, 1, 3, 3, 3, 4, 4, 0, 4, 0, 4, 4, 4, 0, 4, 4, 3, 4, 0, 4,
       3, 3, 0, 0, 4, 4, 4, 0, 0, 0, 3, 0, 4, 4, 4, 3, 3, 4, 3, 4, 4, 4,
       4, 4, 4, 4, 4, 0, 0, 4, 4, 4, 3, 4, 3, 3, 3, 4, 3, 3, 0, 4, 0, 4,
       1, 0, 3, 4, 4, 3, 4, 4, 3, 4, 0, 0, 4, 4, 4, 3, 3, 3, 3, 3, 4, 4,
       4, 0, 3, 0, 4, 0, 4, 2, 3, 4, 0, 4, 3, 0, 4, 4, 4, 4, 0, 3, 4, 4,
       3, 1, 4, 4, 4, 2, 4, 4, 4, 4, 4, 4, 3, 3, 0, 4, 4, 0, 3, 4, 3, 4,
       4, 4, 4, 4, 4, 3, 4, 4, 4, 4, 3, 4, 0, 4, 4, 4, 4, 3, 3, 3, 4, 4,
       4, 3, 0, 3, 4, 2, 3, 2, 2, 3, 4, 4, 4, 3, 2, 3, 4, 1, 3, 0, 4, 4,
       0, 0, 3, 3, 4, 4, 4, 0, 3, 1, 4, 4, 0, 3, 3, 3, 4, 4, 4, 4, 0, 4,
       3, 3, 3, 4, 0, 3, 4, 4, 4, 4, 3, 3, 4, 4, 4, 1, 1, 0, 3, 4, 4, 3,
       4, 4], dtype=int32)
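
The value k = 5 is fixed by hand in this analysis. One common way to sanity-check such a choice is to compare silhouette scores over a range of k. The sketch below is not part of the original analysis: it runs on synthetic data (`X` is a hypothetical stand-in for `Puebla_grouped_clustering`), and the range of k is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for the feature matrix: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((40, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("k with the highest silhouette score:", best_k)
```

On the real data the same loop could be run with `Puebla_grouped_clustering` in place of `X`. A higher score only suggests better-separated clusters; it does not by itself say which k is most useful for placing chargers.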

In [77]:
# add the cluster labels; drop a previous copy first so the cell can be re-run
# without raising "ValueError: cannot insert Cluster Labels, already exists"
neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop(columns='Cluster Labels', errors='ignore')
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kint)
Puebla_merged = ordenado3

# join the cluster labels and top venues onto ordenado3, which holds the latitude/longitude of each neighborhood
Puebla_merged = Puebla_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# neighborhoods without venue data get no label from the join; they are assigned to cluster 0 here
Puebla_merged['Cluster Labels'] = Puebla_merged['Cluster Labels'].fillna(0).astype(np.int64)

Puebla_merged.dropna(inplace=True)

Puebla_merged.head()
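
`DataFrame.insert` refuses to add a column whose name already exists, so a notebook cell that calls it cannot be executed twice in a row. A minimal sketch of one way to make such a step repeatable follows; the helper `set_first_column` and the toy frame are hypothetical names introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})
labels = [0, 2, 1]  # hypothetical cluster labels

def set_first_column(frame, name, values):
    """Return a copy of `frame` with `name` as its first column, replacing any existing one."""
    frame = frame.drop(columns=name, errors="ignore")  # no-op if the column is absent
    frame.insert(0, name, values)
    return frame

df = set_first_column(df, "Cluster Labels", labels)
df = set_first_column(df, "Cluster Labels", labels)  # running it again does not raise
print(df)
```

Dropping with `errors="ignore"` keeps the first run and every later run on the same code path, which suits notebooks where cells are often executed out of order.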

#### Displaying the clustered neighborhoods

In [76]:
map_clusters = folium.Map(location=[LA_Puebla, LO_Puebla], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
    
#add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Puebla_merged['Latitude'], Puebla_merged['Longitude'], Puebla_merged['Neighborhood'], Puebla_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=1).add_to(map_clusters)
       
map_clusters
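
In the color-scheme step, `x` and `ys` are used only through `len(ys)`, which equals `kclusters`. The sketch below shows an equivalent, shorter way to build the same kind of per-cluster color list. It assumes the `cm` and `colors` names in the cell refer to `matplotlib.cm` and `matplotlib.colors`, which this section does not show being imported.

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 5  # same value as in the analysis

# Sample the rainbow colormap at kclusters evenly spaced points and convert
# each RGBA tuple to a hex string, the format folium expects for colors.
rainbow = [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, kclusters))]

for cluster, color in enumerate(rainbow):
    print(cluster, color)
```

Indexing `rainbow[cluster]` then gives each cluster label its own marker color, running from purple for label 0 to red for label 4.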

### Analyzing Clusters

In [71]:
#Cluster 1 - Purple ones
C1 = Puebla_merged.loc[Puebla_merged['Cluster Labels'] == 0, Puebla_merged.columns[[1] + list(range(5, Puebla_merged.shape[1]))]]
C1.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Puebla,0,Seafood Restaurant,Mexican Restaurant,Restaurant,Burger Joint,Coffee Shop
1,Puebla,0,Seafood Restaurant,Pool,Garden Center,Yucatecan Restaurant,Outdoor Sculpture
13,Puebla,0,Event Space,Restaurant,Pet Service,Nail Salon,Nature Preserve
15,Puebla,0,Hot Spring,Lounge,Restaurant,Nail Salon,Nature Preserve
23,Puebla,0,Café,Steakhouse,Burger Joint,Garden,Seafood Restaurant


In [72]:
#Cluster 2 - Blue
C2 = Puebla_merged.loc[Puebla_merged['Cluster Labels'] == 1, Puebla_merged.columns[[1] + list(range(5, Puebla_merged.shape[1]))]]
C2.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,Puebla,1,Park,Yucatecan Restaurant,Music Venue,Nail Salon,Nature Preserve
85,Puebla,1,Park,Restaurant,Yucatecan Restaurant,Music Venue,Nail Salon
86,Puebla,1,Park,Restaurant,Yucatecan Restaurant,Music Venue,Nail Salon
141,Puebla,1,Park,Tennis Court,Paintball Field,Music Venue,Nail Salon
227,Puebla,1,Park,Grocery Store,Music Venue,Nail Salon,Nature Preserve


In [73]:
#Cluster 3 - Cyan
C3 = Puebla_merged.loc[Puebla_merged['Cluster Labels'] == 2, Puebla_merged.columns[[1] + list(range(5, Puebla_merged.shape[1]))]]
C3.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
118,Puebla,2,Soccer Field,Comfort Food Restaurant,Yucatecan Restaurant,Park,Nail Salon
145,Puebla,2,Soccer Field,Yucatecan Restaurant,Paintball Field,Music Venue,Nail Salon
208,Puebla,2,Soccer Field,Yucatecan Restaurant,Paintball Field,Music Venue,Nail Salon
211,Puebla,2,Soccer Field,Garden Center,Yucatecan Restaurant,Museum,Music Venue
214,Puebla,2,Burger Joint,Soccer Field,Yucatecan Restaurant,Museum,Music Venue


In [74]:
#Cluster 4 - Orange
C4 = Puebla_merged.loc[Puebla_merged['Cluster Labels'] == 3, Puebla_merged.columns[[1] + list(range(5, Puebla_merged.shape[1]))]]
C4.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
6,Puebla,3,Mexican Restaurant,Hotel,Seafood Restaurant,Garden,Gym / Fitness Center
7,Puebla,3,Mexican Restaurant,Restaurant,Rock Climbing Spot,Health & Beauty Service,Flea Market
9,Puebla,3,Mexican Restaurant,Yucatecan Restaurant,Park,Nail Salon,Nature Preserve
26,Puebla,3,Pool Hall,Plaza,Mexican Restaurant,Tea Room,Yucatecan Restaurant
33,Puebla,3,Mexican Restaurant,Restaurant,Coffee Shop,Hotel,Vegetarian / Vegan Restaurant


In [75]:
#Cluster 5 - Red
C5 = Puebla_merged.loc[Puebla_merged['Cluster Labels'] == 4, Puebla_merged.columns[[1] + list(range(5, Puebla_merged.shape[1]))]]
C5.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Puebla,4,Soccer Field,Lounge,Japanese Restaurant,Gym / Fitness Center,Gym
10,Puebla,4,Coffee Shop,Shopping Mall,Restaurant,Mexican Restaurant,Seafood Restaurant
12,Puebla,4,Garden Center,Lounge,Yucatecan Restaurant,Museum,Music Venue
14,Puebla,4,Shopping Mall,Hotel,Italian Restaurant,Hookah Bar,Grocery Store
16,Puebla,4,Shopping Mall,Restaurant,Fast Food Restaurant,Spanish Restaurant,French Restaurant
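
Reading five rows per cluster gives only a first impression. A short sketch of how the clusters could be summarized in one table follows, on hypothetical toy data; `merged` mimics two columns of `Puebla_merged`, and its values are made up.

```python
import pandas as pd

# Hypothetical miniature of Puebla_merged with only the columns needed here.
merged = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 1, 1, 2],
    "1st Most Common Venue": [
        "Seafood Restaurant", "Seafood Restaurant", "Café",
        "Park", "Park", "Soccer Field",
    ],
})

# Number of neighborhoods in each cluster.
sizes = merged["Cluster Labels"].value_counts().sort_index()

# Most frequent top venue within each cluster.
dominant = merged.groupby("Cluster Labels")["1st Most Common Venue"].agg(
    lambda s: s.value_counts().idxmax()
)

summary = pd.DataFrame({"size": sizes, "dominant venue": dominant})
print(summary)
```

Applied to the real `Puebla_merged`, a table like this would show at a glance how large each cluster is and which venue type characterizes it, for example the parks that dominate the rows shown for cluster label 1.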
