# Capstone Project - The Battle of the Neighborhoods in Bogota City (Week 2)
### Applied Data Science Capstone by IBM/Coursera

### By: Jose Eduardo Suarez Vargas

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis and Results](#analysis)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

Bogota is the capital city of Colombia, a country in the north of South America continent. This City is the main economic source of this country. Their economic activities represent the 24,5% of the National GDP and is the sixth bigger economic of the region.

In recent years, the city has become the main tourist destination in the country. Due to this, the city receives national and international tourist each day. Consequently, various economic activities such as restaurants, museums, bars and other places of tourist services have been consolidating as an opportunity of economic growth. 

In this sense, **severals companies would like open restaurants as a way to position their brand**. In this case, I will try **to decide and suggest them where is the best place to locate the business because the city is very large and has several neighborhoods or “Localidades” (in Spanish) with different economic dynamics**.

## Data <a name="data"></a>

The solution for this problem will be based on two types of information:
    
   a.	**Geographical information of the 20 towns (neighborhood or localidades) that make up Bogota**: Latitude and Longitude for each neighborhood.

   b.	**Information about the mean venues for each neighborhood or “localidad” of Bogota.** The source of this information is the **Foursquare API**, where can be rescued specific data about the venues of each neighborhood and their classification according the type of service. For example: water park, café, dessert shop, gym, etc.

To solve the problem of this project I try to combine de information of both sources of information. For each neighborhoods latitude and longitude (source 1) will be extract information of all venues from around.
    
After that, It will calculate a mean for each type of venue for each neighborhood and with this information will be applied the K-means algorithm to cluster all the neighborhoods of the city and deciding which cluster is the better to locate the restaurant. 

In [3]:
import numpy as np
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1g             |       h516909a_1         2.1 MB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    certifi-2020.6.20          |   py36h9f0ad1d_0         151 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ca-certificates-2020.6.20  |       hecda079_0         145 KB  conda-forge
    branca-0.4.1               |             py_0          26 KB  conda-forge
    ------------------------------------------------------------
                       

In [4]:
!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.0.0                |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          97 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.50-py_0          conda-forge
    geopy:         2.0.0-pyh9f0ad1d_0 conda-forge


Downloading and Extracting Packages
geographiclib-1.50   | 34 KB     | ##################################### | 100% 
geopy-2.0.0          | 63 KB     | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done


### a. Geographical information of the 20 towns (neighborhoods or localidades) that make up Bogota


In [5]:
address = 'Bogota'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinate of Bogota City are: {}, {}.'.format(latitude, longitude))

The geographical coordinate of Bogota City are: 4.59808, -74.0760439.


In [6]:
map_bogota = folium.Map(location=[latitude, longitude], zoom_start=10)

folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup="Bogota",
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bogota)  
    
map_bogota

In [6]:
link="https://bogota-laburbano.opendatasoft.com/explore/dataset/georeferencia-puntual-por-localidad/download/?format=csv&timezone=America/Bogota&lang=es&use_labels_for_header=true&csv_separator=%3B"

In [7]:
# Table keeps the information that is found in the link web page
table=pd.read_csv(link)

In [8]:
table

Unnamed: 0,LOCALIDAD;LONGITUD;LATITUD;CODIGO;gp
BARRIOS UNIDOS;-74.084;4.6664;12;-74.084,4.6664
ENGATIVÁ;-74.1072;4.7071000000000005;10;-74.1072,4.7071
SUMAPAZ;-74.315224;4.034746;20;-74.315224,4.034746
TEUSAQUILLO;-74.0938;4.6448;13;-74.0938,4.6448
LA CANDELARIA;-74.0739;4.5939;17;-74.0739,4.5939
SANTA FE;-74.0298;4.5963;3;-74.0298,4.5963
SUBA;-74.0824;4.7652;11;-74.0824,4.7652
FONTIBÓN;-74.1479;4.6832;9;-74.1479,4.6832
LOS MÁRTIRES;-74.0913;4.603;14;-74.0913,4.603
SAN CRISTOBAL;-74.0883;4.5463000000000005;4;-74.0883,4.5463


Due to the information about geographical neighboorhoods of Bogota is not the correct format, it is necessary to create a new data frame with the correct format:

In [7]:
neighborhood=['BARRIOS UNIDOS', 'ENGATIVA', 'SUMAPAZ', 'TEUSAQUILLO', 'LA CANDELARIA', 'SANTAFE', 'SUBA', 'FONTIBON', 'LOS MARTIRES', 'SAN CRISTOBAL', 'USME', 'PUENTE ARANDA', 'USAQUEN', 'BOSA', 'CIUDAD BOLIVAR', 'RAFAEL URIBE URIBE', 'KENNEDY', 'CHAPINERO', 'TUNJUELITO', 'ANTONIO NARINO']
longitud=[-74.084,-74.1072,-74.315224,-74.0938,-74.0739,-74.0298,-74.0824,-74.1479,-74.0913,-74.088,-74.1033,-74.1227,-74.0312, -74.1945,-74.1539,-74.1164,-74.1573,-74.0467,-74.1407,-74.1009]
latitud=[4.6664,4.7071000000000005,4.034746,4.6448,4.5939,4.5963,4.7652,4.6832,4.603,4.5463000000000005,4.4766,4.6149000000000004,4.7485,4.6305,4.5066,4.5653,4.6268,4.6569,4.5875,4.5486 ]

In [17]:
neighborhood

['BARRIOS UNIDOS',
 'ENGATIVA',
 'SUMAPAZ',
 'TEUSAQUILLO',
 'LA CANDELARIA',
 'SANTAFE',
 'SUBA',
 'FONTIBON',
 'LOS MARTIRES',
 'SAN CRISTOBAL',
 'USME',
 'PUENTE ARANDA',
 'USAQUEN',
 'BOSA',
 'CIUDAD BOLIVAR',
 'RAFAEL URIBE URIBE',
 'KENNEDY',
 'CHAPINERO',
 'TUNJUELITO',
 'ANTONIO NARINO']

In [18]:
longitud

[-74.084,
 -74.1072,
 -74.315224,
 -74.0938,
 -74.0739,
 -74.0298,
 -74.0824,
 -74.1479,
 -74.0913,
 -74.088,
 -74.1033,
 -74.1227,
 -74.0312,
 -74.1945,
 -74.1539,
 -74.1164,
 -74.1573,
 -74.0467,
 -74.1407,
 -74.1009]

In [19]:
latitud

[4.6664,
 4.7071000000000005,
 4.034746,
 4.6448,
 4.5939,
 4.5963,
 4.7652,
 4.6832,
 4.603,
 4.5463000000000005,
 4.4766,
 4.6149000000000004,
 4.7485,
 4.6305,
 4.5066,
 4.5653,
 4.6268,
 4.6569,
 4.5875,
 4.5486]

In [20]:
data_bog=[neighborhood,latitud,longitud]
data_bogota=pd.DataFrame(data_bog)

In [21]:
data_bogota

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,BARRIOS UNIDOS,ENGATIVA,SUMAPAZ,TEUSAQUILLO,LA CANDELARIA,SANTAFE,SUBA,FONTIBON,LOS MARTIRES,SAN CRISTOBAL,USME,PUENTE ARANDA,USAQUEN,BOSA,CIUDAD BOLIVAR,RAFAEL URIBE URIBE,KENNEDY,CHAPINERO,TUNJUELITO,ANTONIO NARINO
1,4.6664,4.7071,4.03475,4.6448,4.5939,4.5963,4.7652,4.6832,4.603,4.5463,4.4766,4.6149,4.7485,4.6305,4.5066,4.5653,4.6268,4.6569,4.5875,4.5486
2,-74.084,-74.1072,-74.3152,-74.0938,-74.0739,-74.0298,-74.0824,-74.1479,-74.0913,-74.088,-74.1033,-74.1227,-74.0312,-74.1945,-74.1539,-74.1164,-74.1573,-74.0467,-74.1407,-74.1009


In [22]:
bogota_data=data_bogota.transpose()
bogota_data

Unnamed: 0,0,1,2
0,BARRIOS UNIDOS,4.6664,-74.084
1,ENGATIVA,4.7071,-74.1072
2,SUMAPAZ,4.03475,-74.3152
3,TEUSAQUILLO,4.6448,-74.0938
4,LA CANDELARIA,4.5939,-74.0739
5,SANTAFE,4.5963,-74.0298
6,SUBA,4.7652,-74.0824
7,FONTIBON,4.6832,-74.1479
8,LOS MARTIRES,4.603,-74.0913
9,SAN CRISTOBAL,4.5463,-74.088


In [23]:
# Changing the columns names
bogota_data.rename(columns={0:'Neighborhood',1:'Latitude',2:'Longitude'}, inplace=True)
bogota_data

Unnamed: 0,Neighborhood,Latitude,Longitude
0,BARRIOS UNIDOS,4.6664,-74.084
1,ENGATIVA,4.7071,-74.1072
2,SUMAPAZ,4.03475,-74.3152
3,TEUSAQUILLO,4.6448,-74.0938
4,LA CANDELARIA,4.5939,-74.0739
5,SANTAFE,4.5963,-74.0298
6,SUBA,4.7652,-74.0824
7,FONTIBON,4.6832,-74.1479
8,LOS MARTIRES,4.603,-74.0913
9,SAN CRISTOBAL,4.5463,-74.088


In [24]:
map_bogota2 = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(bogota_data['Latitude'], bogota_data['Longitude'], bogota_data['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bogota2)  
    
map_bogota2

### b. Information about the venues for each neigborhood or "localidad" of Bogota: from Foursquare API

In [34]:
CLIENT_ID = 'CGJ4NHVI1URRDVB0C2XIWDCPKXGFGYDYGGQLOE1DVLVT4BXS' # your Foursquare ID
CLIENT_SECRET = 'ASPFP2JUJXPXH4MTP4TR2PX1CYRAKMI1CJ2WCTKEYEEOFCRG' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: CGJ4NHVI1URRDVB0C2XIWDCPKXGFGYDYGGQLOE1DVLVT4BXS
CLIENT_SECRET:ASPFP2JUJXPXH4MTP4TR2PX1CYRAKMI1CJ2WCTKEYEEOFCRG


In [35]:
neighborhood_latitude = bogota_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = bogota_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = bogota_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of BARRIOS UNIDOS are 4.6664, -74.084.


In [36]:
# type your answer here
LIMIT = 150
radius = 2000 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=CGJ4NHVI1URRDVB0C2XIWDCPKXGFGYDYGGQLOE1DVLVT4BXS&client_secret=ASPFP2JUJXPXH4MTP4TR2PX1CYRAKMI1CJ2WCTKEYEEOFCRG&v=20180605&ll=4.6664,-74.084&radius=2000&limit=150'

In [37]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f2d9c9c6dd9a7231d23abd2'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Barrios Unidos',
  'headerFullLocation': 'Barrios Unidos, Bogotá',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 108,
  'suggestedBounds': {'ne': {'lat': 4.6844000180000185,
    'lng': -74.06597383817429},
   'sw': {'lat': 4.648399981999982, 'lng': -74.10202616182572}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '504f5293e4b099c281883015',
       'name': 'Centro Canino Cruz Roja',
       'location': {'lat': 4.665529769653475,
        'lng': -74.0861505600684,
        'labeledLatLngs': [{'label': 'display',
          'lat': 4.665529769653475,
      

In [38]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [39]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Centro Canino Cruz Roja,Dog Run,4.66553,-74.086151
1,Riquisimo - Postres y Helados Principal,Dessert Shop,4.668366,-74.083662
2,Campo de Practica Fedegolf,Golf Course,4.6634,-74.08451
3,Solo Postres,Dessert Shop,4.667903,-74.083965
4,Centro De Salvamento Acuatico Cruz Roja,Water Park,4.665932,-74.086105


In [40]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


In [41]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [42]:
bogota_venues = getNearbyVenues(names=bogota_data['Neighborhood'],
                                   latitudes=bogota_data['Latitude'],
                                   longitudes=bogota_data['Longitude']
                                  )

BARRIOS UNIDOS
ENGATIVA
SUMAPAZ
TEUSAQUILLO
LA CANDELARIA
SANTAFE
SUBA
FONTIBON
LOS MARTIRES
SAN CRISTOBAL
USME
PUENTE ARANDA
USAQUEN
BOSA
CIUDAD BOLIVAR
RAFAEL URIBE URIBE
KENNEDY
CHAPINERO
TUNJUELITO
ANTONIO NARINO


In [43]:
print(bogota_venues.shape)
bogota_venues.head()

(150, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,BARRIOS UNIDOS,4.6664,-74.084,Riquisimo - Postres y Helados Principal,4.668366,-74.083662,Dessert Shop
1,BARRIOS UNIDOS,4.6664,-74.084,Centro Canino Cruz Roja,4.66553,-74.086151,Dog Run
2,BARRIOS UNIDOS,4.6664,-74.084,Solo Postres,4.667903,-74.083965,Dessert Shop
3,BARRIOS UNIDOS,4.6664,-74.084,Postres La Enramada,4.667113,-74.084464,Dessert Shop
4,BARRIOS UNIDOS,4.6664,-74.084,Centro De Salvamento Acuatico Cruz Roja,4.665932,-74.086105,Water Park


In [44]:
bogota_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,BARRIOS UNIDOS,4.6664,-74.084,Riquisimo - Postres y Helados Principal,4.668366,-74.083662,Dessert Shop
1,BARRIOS UNIDOS,4.6664,-74.084,Centro Canino Cruz Roja,4.66553,-74.086151,Dog Run
2,BARRIOS UNIDOS,4.6664,-74.084,Solo Postres,4.667903,-74.083965,Dessert Shop
3,BARRIOS UNIDOS,4.6664,-74.084,Postres La Enramada,4.667113,-74.084464,Dessert Shop
4,BARRIOS UNIDOS,4.6664,-74.084,Centro De Salvamento Acuatico Cruz Roja,4.665932,-74.086105,Water Park
5,BARRIOS UNIDOS,4.6664,-74.084,Campo de Practica Fedegolf,4.6634,-74.08451,Golf Course
6,BARRIOS UNIDOS,4.6664,-74.084,Cooks,4.667757,-74.084222,Diner
7,BARRIOS UNIDOS,4.6664,-74.084,TodoRico postres,4.668406,-74.083844,South American Restaurant
8,BARRIOS UNIDOS,4.6664,-74.084,Cento De Alto Rendimiento Federacion Colombian...,4.66395,-74.084657,Golf Course
9,BARRIOS UNIDOS,4.6664,-74.084,Fx Pizza Gourmet (Modelo Norte),4.669273,-74.083036,Pizza Place


In [45]:
bogota_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
ANTONIO NARINO,4,4,4,4,4,4
BARRIOS UNIDOS,24,24,24,24,24,24
BOSA,5,5,5,5,5,5
CHAPINERO,1,1,1,1,1,1
ENGATIVA,7,7,7,7,7,7
FONTIBON,3,3,3,3,3,3
KENNEDY,4,4,4,4,4,4
LA CANDELARIA,59,59,59,59,59,59
LOS MARTIRES,8,8,8,8,8,8
PUENTE ARANDA,4,4,4,4,4,4


In [46]:
print('There are {} uniques categories.'.format(len(bogota_venues['Venue Category'].unique())))

There are 71 uniques categories.


## Methodology <a name="methodology"></a>

To solve this problem, the next steps were followed:

<p>  1. Getting the information about the venues for each neigborhood in the radious of 2000 kms: Venue name, venue category, latitude and longitude </p>
<p>  2. Creating dummies variables to each venue category for each neigborhood: So, it is possible to calculate a mean for each category venue according with its frecuency </p>
<p>  3. Implementing k-means algorithm: To define the clusters according with the category types </p>
<p>  4. Analyzing each cluster: Identify the mean characteristics of each cluster  </p>
<p>  5. Suggesting a neighborhood: Decide which Neigborhood would be the best place to locate the restaurant </p>

In [49]:
# one hot encoding
bogota_onehot = pd.get_dummies(bogota_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bogota_onehot['Neighborhood'] = bogota_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bogota_onehot.columns[-1]] + list(bogota_onehot.columns[:-1])
bogota_onehot = bogota_onehot[fixed_columns]

bogota_onehot.head(5)

Unnamed: 0,Neighborhood,Argentinian Restaurant,Art Gallery,Art Museum,Athletics & Sports,BBQ Joint,Bakery,Bookstore,Breakfast Spot,Burger Joint,Burrito Place,Café,Campground,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Diner,Dog Run,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food,French Restaurant,Fried Chicken Joint,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,History Museum,Hostel,Hot Dog Joint,Hotel,Ice Cream Shop,Italian Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Museum,Music Store,Nightlife Spot,Paintball Field,Park,Performing Arts Venue,Peruvian Restaurant,Pizza Place,Plaza,Pub,Restaurant,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Soccer Field,South American Restaurant,Steakhouse,Storage Facility,Supermarket,Theater,Trail,Vegetarian / Vegan Restaurant,Water Park
0,BARRIOS UNIDOS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,BARRIOS UNIDOS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,BARRIOS UNIDOS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,BARRIOS UNIDOS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,BARRIOS UNIDOS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


In [50]:
bogota_onehot.shape

(150, 72)

In [51]:
bogota_grouped = bogota_onehot.groupby('Neighborhood').mean().reset_index()
bogota_grouped.head(5)

Unnamed: 0,Neighborhood,Argentinian Restaurant,Art Gallery,Art Museum,Athletics & Sports,BBQ Joint,Bakery,Bookstore,Breakfast Spot,Burger Joint,Burrito Place,Café,Campground,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Diner,Dog Run,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food,French Restaurant,Fried Chicken Joint,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,History Museum,Hostel,Hot Dog Joint,Hotel,Ice Cream Shop,Italian Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Museum,Music Store,Nightlife Spot,Paintball Field,Park,Performing Arts Venue,Peruvian Restaurant,Pizza Place,Plaza,Pub,Restaurant,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Soccer Field,South American Restaurant,Steakhouse,Storage Facility,Supermarket,Theater,Trail,Vegetarian / Vegan Restaurant,Water Park
0,ANTONIO NARINO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,BARRIOS UNIDOS,0.0,0.0,0.0,0.041667,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.041667,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.125,0.0,0.0,0.041667,0.0,0.083333,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667
2,BOSA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0
3,CHAPINERO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
4,ENGATIVA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0


In [106]:
bogota_grouped.describe()

Unnamed: 0,Argentinian Restaurant,Art Gallery,Art Museum,BBQ Joint,Bakery,Bar,Bookstore,Boutique,Breakfast Spot,Burger Joint,Burrito Place,Café,Campground,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Diner,Dog Run,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food,French Restaurant,Fried Chicken Joint,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,History Museum,Hostel,Hot Dog Joint,Hotel,Italian Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Museum,Music Store,Nightlife Spot,Paintball Field,Park,Performing Arts Venue,Peruvian Restaurant,Pizza Place,Plaza,Pub,Restaurant,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Smoke Shop,Soccer Field,South American Restaurant,Steakhouse,Supermarket,Theater,Trail,Vegetarian / Vegan Restaurant,Water Park
count,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0
mean,0.002119,0.001059,0.002119,0.027101,0.006494,0.022917,0.021893,0.006944,0.001059,0.064298,0.0125,0.029308,0.015625,0.002119,0.015217,0.006944,0.001059,0.023425,0.002119,0.001059,0.036458,0.008929,0.001059,0.006944,0.038194,0.009211,0.011646,0.002717,0.010417,0.032051,0.010417,0.001059,0.001059,0.021429,0.01087,0.040625,0.010417,0.002717,0.020833,0.003178,0.001059,0.005682,0.002119,0.024011,0.03386,0.001059,0.003178,0.001059,0.020833,0.001059,0.020833,0.013134,0.001059,0.005682,0.013834,0.001059,0.001059,0.052533,0.005682,0.011117,0.020833,0.036458,0.010417,0.020833,0.057292,0.002717,0.009662,0.008929,0.001059,0.0625,0.001059,0.002717
std,0.008475,0.004237,0.008475,0.072486,0.021869,0.062915,0.083158,0.027778,0.004237,0.115042,0.05,0.06335,0.0625,0.008475,0.050455,0.027778,0.004237,0.064876,0.008475,0.004237,0.100778,0.035714,0.004237,0.027778,0.126229,0.032602,0.036632,0.01087,0.041667,0.079646,0.041667,0.004237,0.004237,0.059476,0.043478,0.087975,0.041667,0.01087,0.083333,0.012712,0.004237,0.022727,0.008475,0.083455,0.069525,0.004237,0.012712,0.004237,0.083333,0.004237,0.083333,0.042354,0.004237,0.022727,0.038484,0.004237,0.004237,0.103222,0.022727,0.030385,0.083333,0.100778,0.041667,0.083333,0.174055,0.01087,0.029146,0.035714,0.004237,0.25,0.004237,0.01087
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09596,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004237,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012712,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,0.033898,0.016949,0.033898,0.25,0.086957,0.2,0.333333,0.111111,0.016949,0.4,0.2,0.166667,0.25,0.033898,0.2,0.111111,0.016949,0.25,0.033898,0.016949,0.333333,0.142857,0.016949,0.111111,0.5,0.130435,0.142857,0.043478,0.166667,0.285714,0.166667,0.016949,0.016949,0.2,0.173913,0.25,0.166667,0.043478,0.333333,0.050847,0.016949,0.090909,0.033898,0.333333,0.2,0.016949,0.050847,0.016949,0.333333,0.016949,0.333333,0.166667,0.016949,0.090909,0.130435,0.016949,0.016949,0.272727,0.090909,0.090909,0.333333,0.333333,0.166667,0.333333,0.666667,0.043478,0.111111,0.142857,0.016949,1.0,0.016949,0.043478


In [52]:
num_top_venues = 5

for hood in bogota_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = bogota_grouped[bogota_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ANTONIO NARINO----
                        venue  freq
0  Construction & Landscaping  0.25
1                  Campground  0.25
2                  Restaurant  0.25
3                Burger Joint  0.25
4         Peruvian Restaurant  0.00


----BARRIOS UNIDOS----
                venue  freq
0         Golf Course  0.17
1        Dessert Shop  0.12
2         Pizza Place  0.12
3  Seafood Restaurant  0.08
4              Bakery  0.08


----BOSA----
                        venue  freq
0  Construction & Landscaping   0.2
1               Shopping Mall   0.2
2               Grocery Store   0.2
3                  Restaurant   0.2
4            Storage Facility   0.2


----CHAPINERO----
                    venue  freq
0                   Trail   1.0
1  Argentinian Restaurant   0.0
2      Mexican Restaurant   0.0
3   Performing Arts Venue   0.0
4                    Park   0.0


----ENGATIVA----
                  venue  freq
0  Fast Food Restaurant  0.29
1     Convenience Store  0.14
2        Ice Cre

In [54]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [55]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = bogota_grouped['Neighborhood']

for ind in np.arange(bogota_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bogota_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ANTONIO NARINO,Burger Joint,Restaurant,Campground,Construction & Landscaping,Dog Run,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Diner
1,BARRIOS UNIDOS,Golf Course,Dessert Shop,Pizza Place,Seafood Restaurant,Bakery,Water Park,Athletics & Sports,Diner,Dog Run,Fast Food Restaurant
2,BOSA,Shopping Mall,Storage Facility,Grocery Store,Restaurant,Construction & Landscaping,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Dog Run
3,CHAPINERO,Trail,Water Park,Dog Run,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Diner,Farmers Market
4,ENGATIVA,Fast Food Restaurant,Fried Chicken Joint,Convenience Store,Supermarket,Ice Cream Shop,Burger Joint,Water Park,Diner,Cultural Center,Deli / Bodega
5,FONTIBON,Grocery Store,Fried Chicken Joint,Latin American Restaurant,Golf Course,French Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Gym
6,KENNEDY,Department Store,Coffee Shop,Soccer Field,Water Park,Dog Run,Convenience Store,Cultural Center,Deli / Bodega,Dessert Shop,Diner
7,LA CANDELARIA,Café,Restaurant,Mexican Restaurant,Latin American Restaurant,Italian Restaurant,History Museum,Argentinian Restaurant,Burger Joint,Caribbean Restaurant,Comfort Food Restaurant
8,LOS MARTIRES,Shopping Mall,Burger Joint,Clothing Store,Deli / Bodega,Steakhouse,Department Store,Convenience Store,Cultural Center,Dessert Shop,Diner
9,PUENTE ARANDA,Burger Joint,Burrito Place,Grocery Store,Fried Chicken Joint,French Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Construction & Landscaping


In [56]:
kclusters = 5

bogota_grouped_clustering = bogota_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bogota_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20] 

array([1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 4, 2, 0, 1, 1, 1], dtype=int32)

In [57]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted.head()


Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,ANTONIO NARINO,Burger Joint,Restaurant,Campground,Construction & Landscaping,Dog Run,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Diner
1,1,BARRIOS UNIDOS,Golf Course,Dessert Shop,Pizza Place,Seafood Restaurant,Bakery,Water Park,Athletics & Sports,Diner,Dog Run,Fast Food Restaurant
2,1,BOSA,Shopping Mall,Storage Facility,Grocery Store,Restaurant,Construction & Landscaping,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Dog Run
3,3,CHAPINERO,Trail,Water Park,Dog Run,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Diner,Farmers Market
4,1,ENGATIVA,Fast Food Restaurant,Fried Chicken Joint,Convenience Store,Supermarket,Ice Cream Shop,Burger Joint,Water Park,Diner,Cultural Center,Deli / Bodega


In [58]:
bogota_merged = bogota_data

In [59]:
bogota_merged = bogota_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

In [60]:
bogota_merged.head(20)

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,BARRIOS UNIDOS,4.6664,-74.084,1.0,Golf Course,Dessert Shop,Pizza Place,Seafood Restaurant,Bakery,Water Park,Athletics & Sports,Diner,Dog Run,Fast Food Restaurant
1,ENGATIVA,4.7071,-74.1072,1.0,Fast Food Restaurant,Fried Chicken Joint,Convenience Store,Supermarket,Ice Cream Shop,Burger Joint,Water Park,Diner,Cultural Center,Deli / Bodega
2,SUMAPAZ,4.03475,-74.3152,,,,,,,,,,,
3,TEUSAQUILLO,4.6448,-74.0938,1.0,Restaurant,Latin American Restaurant,Sandwich Place,Coffee Shop,Peruvian Restaurant,Pizza Place,Hot Dog Joint,Burger Joint,Seafood Restaurant,Water Park
4,LA CANDELARIA,4.5939,-74.0739,1.0,Café,Restaurant,Mexican Restaurant,Latin American Restaurant,Italian Restaurant,History Museum,Argentinian Restaurant,Burger Joint,Caribbean Restaurant,Comfort Food Restaurant
5,SANTAFE,4.5963,-74.0298,,,,,,,,,,,
6,SUBA,4.7652,-74.0824,0.0,Soccer Field,Paintball Field,Water Park,Diner,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Dog Run
7,FONTIBON,4.6832,-74.1479,1.0,Grocery Store,Fried Chicken Joint,Latin American Restaurant,Golf Course,French Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Gym
8,LOS MARTIRES,4.603,-74.0913,1.0,Shopping Mall,Burger Joint,Clothing Store,Deli / Bodega,Steakhouse,Department Store,Convenience Store,Cultural Center,Dessert Shop,Diner
9,SAN CRISTOBAL,4.5463,-74.088,2.0,Music Store,Construction & Landscaping,Italian Restaurant,Diner,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Water Park


In [61]:
bogota_merged.replace(np.nan, 0)
bogota_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,BARRIOS UNIDOS,4.6664,-74.084,1.0,Golf Course,Dessert Shop,Pizza Place,Seafood Restaurant,Bakery,Water Park,Athletics & Sports,Diner,Dog Run,Fast Food Restaurant
1,ENGATIVA,4.7071,-74.1072,1.0,Fast Food Restaurant,Fried Chicken Joint,Convenience Store,Supermarket,Ice Cream Shop,Burger Joint,Water Park,Diner,Cultural Center,Deli / Bodega
2,SUMAPAZ,4.03475,-74.3152,,,,,,,,,,,
3,TEUSAQUILLO,4.6448,-74.0938,1.0,Restaurant,Latin American Restaurant,Sandwich Place,Coffee Shop,Peruvian Restaurant,Pizza Place,Hot Dog Joint,Burger Joint,Seafood Restaurant,Water Park
4,LA CANDELARIA,4.5939,-74.0739,1.0,Café,Restaurant,Mexican Restaurant,Latin American Restaurant,Italian Restaurant,History Museum,Argentinian Restaurant,Burger Joint,Caribbean Restaurant,Comfort Food Restaurant
5,SANTAFE,4.5963,-74.0298,,,,,,,,,,,
6,SUBA,4.7652,-74.0824,0.0,Soccer Field,Paintball Field,Water Park,Diner,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Dog Run
7,FONTIBON,4.6832,-74.1479,1.0,Grocery Store,Fried Chicken Joint,Latin American Restaurant,Golf Course,French Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Gym
8,LOS MARTIRES,4.603,-74.0913,1.0,Shopping Mall,Burger Joint,Clothing Store,Deli / Bodega,Steakhouse,Department Store,Convenience Store,Cultural Center,Dessert Shop,Diner
9,SAN CRISTOBAL,4.5463,-74.088,2.0,Music Store,Construction & Landscaping,Italian Restaurant,Diner,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Water Park


In [62]:
bc1=bogota_merged.drop(bogota_merged[bogota_merged.Neighborhood=='SUMAPAZ'].index)
bc2=bc1.drop(bc1[bc1.Neighborhood=='SANTAFE'].index)
bc3=bc2.drop(bc2[bc2.Neighborhood=='USME'].index)
bogota_clusters=bc3.drop(bc3[bc3.Neighborhood=='CIUDAD BOLIVAR'].index)

# Updating the index of new data frame
bogota_clusters.reset_index(drop=True, inplace=True)
bogota_clusters

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,BARRIOS UNIDOS,4.6664,-74.084,1.0,Golf Course,Dessert Shop,Pizza Place,Seafood Restaurant,Bakery,Water Park,Athletics & Sports,Diner,Dog Run,Fast Food Restaurant
1,ENGATIVA,4.7071,-74.1072,1.0,Fast Food Restaurant,Fried Chicken Joint,Convenience Store,Supermarket,Ice Cream Shop,Burger Joint,Water Park,Diner,Cultural Center,Deli / Bodega
2,TEUSAQUILLO,4.6448,-74.0938,1.0,Restaurant,Latin American Restaurant,Sandwich Place,Coffee Shop,Peruvian Restaurant,Pizza Place,Hot Dog Joint,Burger Joint,Seafood Restaurant,Water Park
3,LA CANDELARIA,4.5939,-74.0739,1.0,Café,Restaurant,Mexican Restaurant,Latin American Restaurant,Italian Restaurant,History Museum,Argentinian Restaurant,Burger Joint,Caribbean Restaurant,Comfort Food Restaurant
4,SUBA,4.7652,-74.0824,0.0,Soccer Field,Paintball Field,Water Park,Diner,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Dog Run
5,FONTIBON,4.6832,-74.1479,1.0,Grocery Store,Fried Chicken Joint,Latin American Restaurant,Golf Course,French Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Gym
6,LOS MARTIRES,4.603,-74.0913,1.0,Shopping Mall,Burger Joint,Clothing Store,Deli / Bodega,Steakhouse,Department Store,Convenience Store,Cultural Center,Dessert Shop,Diner
7,SAN CRISTOBAL,4.5463,-74.088,2.0,Music Store,Construction & Landscaping,Italian Restaurant,Diner,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Water Park
8,PUENTE ARANDA,4.6149,-74.1227,1.0,Burger Joint,Burrito Place,Grocery Store,Fried Chicken Joint,French Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Construction & Landscaping
9,USAQUEN,4.7485,-74.0312,1.0,Bookstore,Farmers Market,Gym,Café,Dog Run,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Diner


In [91]:
bogota_clusters

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,BARRIOS UNIDOS,4.6664,-74.084,0.0,Golf Course,Dessert Shop,Pizza Place,Seafood Restaurant,Bakery,Water Park,Fast Food Restaurant,Gym / Fitness Center,Dog Run,Diner
1,ENGATIVA,4.7071,-74.1072,0.0,Fast Food Restaurant,Fried Chicken Joint,Convenience Store,Burger Joint,Diner,Supermarket,Fish & Chips Shop,Farmers Market,Dog Run,Coffee Shop
2,TEUSAQUILLO,4.6448,-74.0938,0.0,Restaurant,Sandwich Place,Peruvian Restaurant,Pizza Place,Latin American Restaurant,Hot Dog Joint,Burger Joint,Coffee Shop,Seafood Restaurant,Water Park
3,LA CANDELARIA,4.5939,-74.0739,0.0,Café,Restaurant,History Museum,Mexican Restaurant,Latin American Restaurant,Italian Restaurant,Argentinian Restaurant,Caribbean Restaurant,Hotel,Coffee Shop
4,SUBA,4.7652,-74.0824,2.0,Soccer Field,Paintball Field,Water Park,Dessert Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cultural Center,Deli / Bodega,Department Store
5,FONTIBON,4.6832,-74.1479,0.0,Grocery Store,Bar,Latin American Restaurant,Chinese Restaurant,Fried Chicken Joint,Bakery,Construction & Landscaping,French Restaurant,Food,Fish & Chips Shop
6,LOS MARTIRES,4.603,-74.0913,0.0,Shopping Mall,Boutique,Deli / Bodega,Department Store,Burger Joint,Clothing Store,Steakhouse,Fast Food Restaurant,Farmers Market,Dog Run
7,SAN CRISTOBAL,4.5463,-74.088,0.0,Italian Restaurant,Construction & Landscaping,Music Store,Comfort Food Restaurant,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Water Park
8,PUENTE ARANDA,4.6149,-74.1227,0.0,Burger Joint,Grocery Store,Burrito Place,Latin American Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Dog Run,Diner,Comfort Food Restaurant
9,USAQUEN,4.7485,-74.0312,0.0,Bookstore,Café,Gym,Farmers Market,Park,Water Park,Construction & Landscaping,Convenience Store,Cultural Center,Deli / Bodega


## Analysis and Results <a name="analysis"></a>

<p>According with the results of the k-means algorithm, I analyse the mean features of each cluster based on the higher mean`s category venue of each neighborhood:</p>

<p>  Cluster 0: It has a concentration of parks and place where people practice sports. The neighborhood of this cluster is Suba </p>
<p>  Cluster 1: It has a concentration of restaturants, cafe shops and fast food places. The neighborhoods of this cluster are Barrios unidos, Engativa, Teusaquillo, La Candelaria, Fontibon, Los Martires, Puente Aranda, Usaquen, Kennedy, Bosa, Tunjuelito y Antonio Nariño </p>
<p>  Cluster 2: It has a concentration of stores. The neighborhood of this cluster is San Cristobal </p>
<p>  Cluster 3: It has a concentration of trails, parks and cultural centers. The neighborhood of this cluster is Chapinero </p>
<p>  Cluster 4: It has a concentration of hardware and shoes stores. The neighborhood of this cluster is Rafael Uribe Uribe</p>

### Decision

The cluster where can be located the restauran of this project is the number 1. However this this cluster has 13 posibles neighborhoods. In this sense, the choosen criterion to select the Neigboorhood is that has the lower top five mean of categories type. This logic is based on the idea that the lower mean represents a low concentration of restaurants and in the same way the better opportunity to explote the bussines because there is a potential market.

The lower mean is in the Neighborhood **LA CANDELARIA** with a mean of 0.14. This Neighborhood belongs to the historical center of Bogota and there go many foreings tourist to know the museums and center city. This represents an excellent opportunity of locating the restaurant.

In [63]:
bogota_clusters["Cluster Labels"]=bogota_clusters["Cluster Labels"].astype(int)
bogota_clusters


Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,BARRIOS UNIDOS,4.6664,-74.084,1,Golf Course,Dessert Shop,Pizza Place,Seafood Restaurant,Bakery,Water Park,Athletics & Sports,Diner,Dog Run,Fast Food Restaurant
1,ENGATIVA,4.7071,-74.1072,1,Fast Food Restaurant,Fried Chicken Joint,Convenience Store,Supermarket,Ice Cream Shop,Burger Joint,Water Park,Diner,Cultural Center,Deli / Bodega
2,TEUSAQUILLO,4.6448,-74.0938,1,Restaurant,Latin American Restaurant,Sandwich Place,Coffee Shop,Peruvian Restaurant,Pizza Place,Hot Dog Joint,Burger Joint,Seafood Restaurant,Water Park
3,LA CANDELARIA,4.5939,-74.0739,1,Café,Restaurant,Mexican Restaurant,Latin American Restaurant,Italian Restaurant,History Museum,Argentinian Restaurant,Burger Joint,Caribbean Restaurant,Comfort Food Restaurant
4,SUBA,4.7652,-74.0824,0,Soccer Field,Paintball Field,Water Park,Diner,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Dog Run
5,FONTIBON,4.6832,-74.1479,1,Grocery Store,Fried Chicken Joint,Latin American Restaurant,Golf Course,French Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Gym
6,LOS MARTIRES,4.603,-74.0913,1,Shopping Mall,Burger Joint,Clothing Store,Deli / Bodega,Steakhouse,Department Store,Convenience Store,Cultural Center,Dessert Shop,Diner
7,SAN CRISTOBAL,4.5463,-74.088,2,Music Store,Construction & Landscaping,Italian Restaurant,Diner,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Water Park
8,PUENTE ARANDA,4.6149,-74.1227,1,Burger Joint,Burrito Place,Grocery Store,Fried Chicken Joint,French Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Construction & Landscaping
9,USAQUEN,4.7485,-74.0312,1,Bookstore,Farmers Market,Gym,Café,Dog Run,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Diner


In [64]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bogota_clusters['Latitude'], bogota_clusters['Longitude'], bogota_clusters['Neighborhood'], bogota_clusters['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster 0 - Parks and sports fields

In [66]:
bogota_clusters.loc[bogota_clusters['Cluster Labels'] == 0, bogota_clusters.columns[[0] + list(range(4, bogota_clusters.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,SUBA,Soccer Field,Paintball Field,Water Park,Diner,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Dog Run


### Cluster 1 - Restaurants, cafe shops and fast food places

In [67]:
bogota_clusters.loc[bogota_clusters['Cluster Labels'] == 1, bogota_clusters.columns[[0] + list(range(4, bogota_clusters.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,BARRIOS UNIDOS,Golf Course,Dessert Shop,Pizza Place,Seafood Restaurant,Bakery,Water Park,Athletics & Sports,Diner,Dog Run,Fast Food Restaurant
1,ENGATIVA,Fast Food Restaurant,Fried Chicken Joint,Convenience Store,Supermarket,Ice Cream Shop,Burger Joint,Water Park,Diner,Cultural Center,Deli / Bodega
2,TEUSAQUILLO,Restaurant,Latin American Restaurant,Sandwich Place,Coffee Shop,Peruvian Restaurant,Pizza Place,Hot Dog Joint,Burger Joint,Seafood Restaurant,Water Park
3,LA CANDELARIA,Café,Restaurant,Mexican Restaurant,Latin American Restaurant,Italian Restaurant,History Museum,Argentinian Restaurant,Burger Joint,Caribbean Restaurant,Comfort Food Restaurant
5,FONTIBON,Grocery Store,Fried Chicken Joint,Latin American Restaurant,Golf Course,French Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Gym
6,LOS MARTIRES,Shopping Mall,Burger Joint,Clothing Store,Deli / Bodega,Steakhouse,Department Store,Convenience Store,Cultural Center,Dessert Shop,Diner
8,PUENTE ARANDA,Burger Joint,Burrito Place,Grocery Store,Fried Chicken Joint,French Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Construction & Landscaping
9,USAQUEN,Bookstore,Farmers Market,Gym,Café,Dog Run,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Diner
10,BOSA,Shopping Mall,Storage Facility,Grocery Store,Restaurant,Construction & Landscaping,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Dog Run
12,KENNEDY,Department Store,Coffee Shop,Soccer Field,Water Park,Dog Run,Convenience Store,Cultural Center,Deli / Bodega,Dessert Shop,Diner


### Cluster 2 - Stores

In [68]:
bogota_clusters.loc[bogota_clusters['Cluster Labels'] == 2, bogota_clusters.columns[[0] + list(range(4, bogota_clusters.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,SAN CRISTOBAL,Music Store,Construction & Landscaping,Italian Restaurant,Diner,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Water Park


### Cluster 3 - Trails, parks an cultural centers

In [69]:
bogota_clusters.loc[bogota_clusters['Cluster Labels'] == 3, bogota_clusters.columns[[0] + list(range(4, bogota_clusters.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,CHAPINERO,Trail,Water Park,Dog Run,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Diner,Farmers Market


### Cluster 4 - Hardware and shoe stores

In [70]:
bogota_clusters.loc[bogota_clusters['Cluster Labels'] == 4, bogota_clusters.columns[[0] + list(range(4, bogota_clusters.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,RAFAEL URIBE URIBE,Hardware Store,Shoe Store,Water Park,Diner,Convenience Store,Cultural Center,Deli / Bodega,Department Store,Dessert Shop,Dog Run


In [71]:
num_top_venues = 5

for hood in bogota_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = bogota_grouped[bogota_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ANTONIO NARINO----
                        venue  freq
0  Construction & Landscaping  0.25
1                  Campground  0.25
2                  Restaurant  0.25
3                Burger Joint  0.25
4         Peruvian Restaurant  0.00


----BARRIOS UNIDOS----
                venue  freq
0         Golf Course  0.17
1        Dessert Shop  0.12
2         Pizza Place  0.12
3  Seafood Restaurant  0.08
4              Bakery  0.08


----BOSA----
                        venue  freq
0  Construction & Landscaping   0.2
1               Shopping Mall   0.2
2               Grocery Store   0.2
3                  Restaurant   0.2
4            Storage Facility   0.2


----CHAPINERO----
                    venue  freq
0                   Trail   1.0
1  Argentinian Restaurant   0.0
2      Mexican Restaurant   0.0
3   Performing Arts Venue   0.0
4                    Park   0.0


----ENGATIVA----
                  venue  freq
0  Fast Food Restaurant  0.29
1     Convenience Store  0.14
2        Ice Cre

## Discussion<a name="discussion"></a>

The decision and recommendation of this project is based only in the information about unique source information (FOURSQUARE API). However could be posible to consider other type of information as  distances to specific buildings, demand and prediction of potential customers, etc.

## Conclusion<a name="conclusion"></a>

According with the analysis and results, I recommend the **LA CANDELARIA** as the best Neighborhood to locate the restaurant. This Neighboorhood belongs to the historical center of Bogota and there go many foreings tourist to know the museums and center city. This represents an excellent opportunity of locating the restaurant.

Thanks!