# Capstone Project - The Battle of the Neighborhoods, Week 4

# Introduction

In this project we examine each neighborhood of the city of Medellín, Colombia, from a geographic perspective in order to determine the best location to open a nightclub.
We use a dataset provided by the Medellín mayor's office (Alcaldía de Medellín) together with the Foursquare API, which lets us analyze each neighborhood in detail and identify the nearby and most common venues in our area of interest.
Our target audience is anyone interested in starting a business in Medellín's nightlife sector.


# Business Problem

One effect of the 2020 pandemic was the disappearance of nightlife entertainment in Medellín, so we look for the neighborhood best suited to starting a business in this market segment.
To that end, we use data science to generate clusters and identify the neighborhood of Medellín where opening a nightclub is most feasible.


# Data

This project relies on the Barrios Medellín dataset, available on Geo Medellín, a website administered by the Alcaldía de Medellín. The dataset lists every neighborhood in the city along with other attributes such as area, the name of the comuna it belongs to, and an ID, among others. We then clean the data to keep only the fields of interest.
As a second resource, we use the geopy library to find the latitude and longitude of each neighborhood.
Once the DataFrame holds each neighborhood's name and coordinates, we connect to the Foursquare API to determine the most common business categories in each neighborhood of Medellín.


# Process

### Import Libraries

In [1]:
import numpy as np # library for handling vectorized data

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library for handling JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you have not completed the Foursquare API lab
from geopy.geocoders import Nominatim # converts an address into latitude and longitude values

import requests # library for handling HTTP requests
from pandas import json_normalize # converts a JSON document into a pandas DataFrame

# Matplotlib and associated modules for plotting
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you have not completed the Foursquare API lab
import folium # library for rendering maps

print('Libraries imported.')


Libraries imported.


In [2]:
# Load the Medellín neighborhoods dataset
df = pd.read_csv(r'C:\Users\jaimea.munoz\Documents\DATA SCIENCE\Curso 9 Final\BarriosMedellin.csv')
df.head()

Unnamed: 0,OBJECTID,CODIGO,NOMBRE,SUBTIPO_BARRIOVEREDA,NOMBRE_COMUNA_CORREGIMIENTO,SHAPEAREA,SHAPELEN
0,1112,510,Tricentenario,1,Castilla,420637.970349,2897.304229
1,1113,208,Villa Niza,1,Santa Cruz,143215.327504,1697.303318
2,1114,1108,Laureles,1,Laureles Estadio,707014.821267,3847.112683
3,1115,1303,Santa Rosa de Lima,1,San Javier,139970.996369,2158.954261
4,1116,1206,Santa Lucía,1,La América,275913.740234,3048.703385


In [3]:
# Drop the columns we do not need
df = df.drop(columns=['OBJECTID', 'SUBTIPO_BARRIOVEREDA', 'NOMBRE_COMUNA_CORREGIMIENTO', 'SHAPEAREA', 'SHAPELEN'])

In [4]:
df.head()

Unnamed: 0,CODIGO,NOMBRE
0,510,Tricentenario
1,208,Villa Niza
2,1108,Laureles
3,1303,Santa Rosa de Lima
4,1206,Santa Lucía


In [5]:
# Show the size of the DataFrame
df.shape

(332, 2)

### Determine the coordinates of each Medellín neighborhood

In [6]:
# Create the geocoder once and reuse it for every neighborhood
geolocator = Nominatim(user_agent='Md_explorer')

for i, row in df.iterrows():
    address = row['NOMBRE']
    try:
        location = geolocator.geocode(address)
        # geocode() returns None when nothing is found, which raises AttributeError here
        df.loc[i, 'Latitude'] = location.latitude
        df.loc[i, 'Longitude'] = location.longitude
    except Exception:
        # print the neighborhoods that could not be geocoded
        print(address)

Buga Patio Bonito
Área de Expansión El Noral
Potrera Miserenga
Picachito
Área de Expansión San Cristóbal
Área de Expansión Altavista
Área de Expansión Belén Rincón
Cabecera Urbana Corregimiento San Cristóbal
Área de Expansión Pajarito
Área de Expansión San Antonio de Prado
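Several names above could not be geocoded, and a bare neighborhood name may also match a place outside Colombia. The following is a minimal sketch of one way to reduce both problems: it appends a city qualifier to every query and pauses between requests. The function name `geocode_neighborhoods`, the `city_suffix` value, and the one-second delay are assumptions made for illustration and are not part of the original notebook; `geocode` stands for any callable that behaves like `Nominatim(...).geocode`.

```python
import time
from types import SimpleNamespace


def geocode_neighborhoods(names, geocode, city_suffix=", Medellín, Colombia",
                          delay_seconds=1.0):
    """Geocode each name and return (found, missing).

    `geocode` takes a query string and returns an object with `.latitude`
    and `.longitude`, or None when nothing matches.
    """
    found, missing = {}, []
    for position, name in enumerate(names):
        if position > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)  # pause between requests to a public service
        location = geocode(name + city_suffix)
        if location is None:
            missing.append(name)
        else:
            found[name] = (location.latitude, location.longitude)
    return found, missing


# Stand-in geocoder so the sketch runs offline (hypothetical helper).
def fake_geocode(query):
    known = {
        "Tricentenario, Medellín, Colombia": SimpleNamespace(
            latitude=6.29107, longitude=-75.566325),
    }
    return known.get(query)


found, missing = geocode_neighborhoods(
    ["Tricentenario", "Picachito"], fake_geocode, delay_seconds=0)
print(found)
print(missing)
```

With geopy, the call would look like `geocode_neighborhoods(df['NOMBRE'], geolocator.geocode)`. geopy also provides a `RateLimiter` helper in `geopy.extra.rate_limiter` that could replace the manual pause.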


In [7]:
# Drop rows with null fields
df.dropna(inplace=True)
df.head()

Unnamed: 0,CODIGO,NOMBRE,Latitude,Longitude
0,510,Tricentenario,6.29107,-75.566325
1,208,Villa Niza,6.295645,-75.56345
2,1108,Laureles,-31.498759,-57.52599
3,1303,Santa Rosa de Lima,14.436619,-90.352043
4,1206,Santa Lucía,13.825049,-60.975036


In [8]:
# Flag column: 1 if the latitude falls within Medellín's range, 0 otherwise
for i, row in df.iterrows():
    if row['Latitude'] > 6.1 and row['Latitude'] < 6.6:
        bandera = 1
    else:
        bandera = 0
    
    df.loc[i,0] = bandera

In [9]:
df = df.rename(columns={0:"Bandera"})

In [10]:
df = df[df["Bandera"] == 1]
df

Unnamed: 0,CODIGO,NOMBRE,Latitude,Longitude,Bandera
0,0510,Tricentenario,6.29107,-75.566325,1.0
1,0208,Villa Niza,6.295645,-75.56345,1.0
8,0725,Nueva Villa de La Iguaná,6.25994,-75.581743,1.0
9,0905,Alejandro Echavarría,6.23877,-75.546348,1.0
11,0105,Moscú No.2,6.289982,-75.549095,1.0
12,0101,Santo Domingo Savio No.1,6.297085,-75.54431,1.0
21,1213,Calasanz Parte Alta,6.266179,-75.60182,1.0
23,0202,Playón de Los Comuneros,6.306853,-75.553086,1.0
24,1004,El Chagualo,6.262212,-75.570487,1.0
27,0916,Asomadera No.3,6.218804,-75.558343,1.0


In [11]:
# Show the size of the DataFrame
df.shape

(99, 5)
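The flag used above checks latitude only, so a point at a similar latitude in another country would also pass. A minimal sketch of a stricter filter that checks both coordinates follows. The latitude band is the one used in the notebook; the longitude band and the name `in_medellin_box` are assumptions chosen for illustration.

```python
# Latitude band taken from the notebook; the longitude band is an assumption.
LAT_MIN, LAT_MAX = 6.1, 6.6
LON_MIN, LON_MAX = -75.8, -75.4


def in_medellin_box(lat, lon):
    """Return True when the point falls inside the rectangular bounding box."""
    return LAT_MIN < lat < LAT_MAX and LON_MIN < lon < LON_MAX


# Coordinates as returned by the geocoder earlier in the notebook.
geocoded = [
    ("Tricentenario", 6.29107, -75.566325),
    ("Villa Niza", 6.295645, -75.56345),
    ("Laureles", -31.498759, -57.52599),
    ("Santa Rosa de Lima", 14.436619, -90.352043),
]

kept = [name for name, lat, lon in geocoded if in_medellin_box(lat, lon)]
print(kept)
```

In pandas, the same idea could be written as `df[df['Latitude'].between(6.1, 6.6) & df['Longitude'].between(-75.8, -75.4)]`; note that `between` includes the boundaries by default, while the sketch uses strict comparisons.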

### Use the geopy library to obtain the latitude and longitude of the city of Medellín

In [12]:
# Rename the "NOMBRE" column to "Neighborhood"
df.rename(columns={'NOMBRE':'Neighborhood'}, inplace=True)

In [13]:
# Get the coordinates of the city of Medellín itself
address = 'Medellín, CO'

geolocator = Nominatim(user_agent="Md_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Medellín, CO are 6.2443382, -75.573553.


In [14]:
# Create a map of Medellín
map_medellin = folium.Map(location=[latitude, longitude], zoom_start=10)

# Add markers to the map
for lat, lng, Neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(Neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
    [lat, lng],
    radius=5,
    popup=label,
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_medellin)
    
map_medellin

# Define the Foursquare version and credentials

In [15]:
CLIENT_ID = 'N5LPL5YGBPLQ4RHJJWHJYN4RKXNVR0J3OTMOC35S00UVPFWL' # your Foursquare ID
CLIENT_SECRET = 'OG2J1Z54UD2ATX3MD4SEYSAI5FCZUJBKDE2POA5ZIJDXIC1C' # your Foursquare secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # maximum number of venues returned by the Foursquare API

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: N5LPL5YGBPLQ4RHJJWHJYN4RKXNVR0J3OTMOC35S00UVPFWL
CLIENT_SECRET:OG2J1Z54UD2ATX3MD4SEYSAI5FCZUJBKDE2POA5ZIJDXIC1C
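Hardcoding and printing the client secret exposes it to anyone who can read the notebook. One common alternative, sketched here under assumptions, is to read the values from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical and not part of the original notebook.

```python
import os


def load_foursquare_credentials(env=None):
    """Read the client ID and secret from environment variables."""
    env = os.environ if env is None else env
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret


def mask(value, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


# Demonstration with a plain dict standing in for os.environ.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLEID123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(client_id))
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the two hardcoded assignments.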


# Explore the first neighborhood in the DataFrame

In [16]:
df.loc[0, 'Neighborhood']

'Tricentenario'

In [17]:
# Get the coordinates of our first neighborhood
neighborhood_latitude = df.loc[0, 'Latitude']
neighborhood_longitude = df.loc[0, 'Longitude']
neighborhood_name = df.loc[0, 'Neighborhood']

In [18]:
print("The coordinates of the {} neighborhood are {}, {}".format(
    neighborhood_name, neighborhood_latitude, neighborhood_longitude))

The coordinates of the Tricentenario neighborhood are 6.29107, -75.5663252


### Get up to 100 venues in Tricentenario within a 500-meter radius

In [19]:
radius = 500
limit = 100
url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&limit={}&radius={}".format(CLIENT_ID,
                                                                                                                        CLIENT_SECRET,
                                                                                                                        VERSION,
                                                                                                                        neighborhood_latitude,
                                                                                                                        neighborhood_longitude,
                                                                                                                        limit,
                                                                                                                        radius)
url

'https://api.foursquare.com/v2/venues/explore?client_id=N5LPL5YGBPLQ4RHJJWHJYN4RKXNVR0J3OTMOC35S00UVPFWL&client_secret=OG2J1Z54UD2ATX3MD4SEYSAI5FCZUJBKDE2POA5ZIJDXIC1C&v=20180605&ll=6.29107,-75.5663252&limit=100&radius=500'
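The request URL above is assembled with one long format string. As a hedged alternative, the query parameters can be kept in a dictionary and encoded with the standard library, which keeps the parameter names next to their values. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used in this notebook.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint used in this notebook.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build the explore URL from a dictionary of query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "limit": limit,
        "radius": radius,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; substitute the real values when running the notebook.
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605", 6.29107, -75.5663252)

# Parse the query string back to confirm what was encoded.
query = parse_qs(urlsplit(url).query)
print(query["ll"])
```

`urlencode` percent-encodes the comma in `ll`, so the resulting URL is not character-for-character identical to the one above, but it decodes to the same parameters.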

In [20]:
# Send the GET request
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '61397b6775445b1930cc04ee'},
 'response': {'headerLocation': 'Castilla',
  'headerFullLocation': 'Castilla, Medellín',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 6.2955700045000045,
    'lng': -75.5618063854314},
   'sw': {'lat': 6.286569995499996, 'lng': -75.57084401456859}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ca53e1b8a65bfb726762b22',
       'name': 'Parque Juanes de la Paz',
       'location': {'address': 'Carrera 65 98 - 50',
        'lat': 6.292662952964911,
        'lng': -75.56867270242827,
        'labeledLatLngs': [{'label': 'display',
          'lat': 6.292662952964911,
          'lng': -75.56867270242827}],
        'distance': 314,
        'cc': 'CO',

In [21]:
# function that extracts the category of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [22]:
venues = results['response']['groups'][0]['items']

nearby_venues = pd.json_normalize(venues) # flatten the JSON into a DataFrame

# keep only the columns of interest
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# extract the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean up the column names
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Parque Juanes de la Paz,Recreation Center,6.292663,-75.568673
1,METRO - Estacion Tricentenario,Metro Station,6.290542,-75.564733
2,Club De Tenis El Bosque,Tennis Court,6.293351,-75.568521
3,"Parche Tricen,Tienda mixta",Grocery Store,6.29267,-75.564551


# Analyze Medellín Neighborhoods

In [23]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # send the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
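The function above indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` for a venue with an empty category list, and it assumes the response always contains a `groups` entry. The sketch below shows one defensive way to turn a response into rows. The helper name `extract_venue_rows` and the second venue in the sample payload are hypothetical; the field names follow the response shown earlier.

```python
def extract_venue_rows(neighborhood, lat, lng, payload):
    """Turn an explore response into row tuples, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when the venue has no category instead of failing.
        category = categories[0]["name"] if categories else None
        rows.append((
            neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Small payload in the shape of the response shown earlier.
sample_payload = {
    "meta": {"code": 200},
    "response": {"groups": [{"items": [
        {"venue": {"name": "Parque Juanes de la Paz",
                   "location": {"lat": 6.292662952964911, "lng": -75.56867270242827},
                   "categories": [{"name": "Recreation Center"}]}},
        # Hypothetical venue without categories, added to exercise the fallback.
        {"venue": {"name": "Uncategorized venue",
                   "location": {"lat": 6.29, "lng": -75.56},
                   "categories": []}},
    ]}]},
}

rows = extract_venue_rows("Tricentenario", 6.29107, -75.566325, sample_payload)
for row in rows:
    print(row)
```

Inside `getNearbyVenues`, the list comprehension could then be replaced by `venues_list.append(extract_venue_rows(name, lat, lng, requests.get(url).json()))`.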

# Run the function on each neighborhood

In [24]:
medellin_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Tricentenario
Villa Niza
Nueva Villa de La Iguaná
Alejandro Echavarría
Moscú No.2
Santo Domingo Savio No.1
Calasanz Parte Alta
Playón de Los Comuneros
El Chagualo
Asomadera No.3
El Pesebre
Manrique Central No.1
Las Independencias
Moscú No.1
Bosques de San Pablo
Los Mangos
Los Cerros El Vergel
San José La Cima No.1
María Cano-Carambolas
La Pilarica
Tejelo
Cuarta Brigada
Aures No.1
Carpinelo
Carlos E. Restrepo
Juan XXIII La Quiebra
Veinte de Julio
Las Lomas No.2
Barrio Caicedo
Suramericana
Villa Turbay
Asomadera No.1
Bomboná No.2
Progreso No.2
Barrio Cristóbal
Los Balsos No.2
Los Colores
Batallón Girardot
Villatina
San José La Cima No.2
Doce de Octubre No.2
Granizal
San Antonio de Prado
Batallón Cuarta Brigada
Los Balsos No.1
Villa Carlota
El Compromiso
Villa Lilliam
Monteclaro
La Mansión
Diego Echavarría
Campo Valdés No.1
Manrique Central No.2
Calasanz
Bermejal-Los Álamos
Ocho de Marzo
Cucaracho
Belencito
Cerro Nutibara
Universidad de Antioquia
San Javier No.2
Cerro Nutibara
Mirador del

In [25]:
# size of the resulting DataFrame
print(medellin_venues.shape)
medellin_venues.head()

(871, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Tricentenario,6.29107,-75.566325,Parque Juanes de la Paz,6.292663,-75.568673,Recreation Center
1,Tricentenario,6.29107,-75.566325,METRO - Estacion Tricentenario,6.290542,-75.564733,Metro Station
2,Tricentenario,6.29107,-75.566325,Club De Tenis El Bosque,6.293351,-75.568521,Tennis Court
3,Tricentenario,6.29107,-75.566325,"Parche Tricen,Tienda mixta",6.29267,-75.564551,Grocery Store
4,Villa Niza,6.295645,-75.56345,Central Ganadera S.A.,6.296084,-75.56518,Farm


In [26]:
# Analyze each neighborhood
# one-hot encoding of the venue categories
medellin_onehot = pd.get_dummies(medellin_venues[['Venue Category']])

# add the neighborhood column back to the DataFrame
medellin_onehot['Neighborhood'] = medellin_venues['Neighborhood']

# move the neighborhood column to the first position
fixed_columns = [medellin_onehot.columns[-1]] + list(medellin_onehot.columns[:-1])
medellin_onehot = medellin_onehot[fixed_columns]

medellin_onehot.head()

Unnamed: 0,Neighborhood,Venue Category_Airport,Venue Category_American Restaurant,Venue Category_Amphitheater,Venue Category_Arepa Restaurant,Venue Category_Argentinian Restaurant,Venue Category_Art Gallery,Venue Category_Art Museum,Venue Category_Asian Restaurant,Venue Category_Athletics & Sports,Venue Category_BBQ Joint,Venue Category_Bakery,Venue Category_Bar,Venue Category_Baseball Field,Venue Category_Basketball Court,Venue Category_Bed & Breakfast,Venue Category_Beer Garden,Venue Category_Bike Rental / Bike Share,Venue Category_Bookstore,Venue Category_Bowling Alley,Venue Category_Brazilian Restaurant,Venue Category_Breakfast Spot,Venue Category_Brewery,Venue Category_Burger Joint,Venue Category_Burrito Place,Venue Category_Business Service,Venue Category_Cable Car,Venue Category_Café,Venue Category_Campground,Venue Category_Caribbean Restaurant,Venue Category_Casino,Venue Category_Cemetery,Venue Category_Clothing Store,Venue Category_Cocktail Bar,Venue Category_Coffee Shop,Venue Category_Colombian Restaurant,Venue Category_Concert Hall,Venue Category_Construction & Landscaping,Venue Category_Convenience Store,Venue Category_Cosmetics Shop,Venue Category_Creperie,Venue Category_Cuban Restaurant,Venue Category_Cupcake Shop,Venue Category_Deli / Bodega,Venue Category_Department Store,Venue Category_Dessert Shop,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Donut Shop,Venue Category_Electronics Store,Venue Category_Fabric Shop,Venue Category_Farm,Venue Category_Farmers Market,Venue Category_Fast Food Restaurant,Venue Category_Food,Venue Category_Food & Drink Shop,Venue Category_Food Court,Venue Category_Food Stand,Venue Category_Food Truck,Venue Category_French Restaurant,Venue Category_Fried Chicken Joint,Venue Category_Frozen Yogurt Shop,Venue Category_Furniture / Home Store,Venue Category_Gastropub,Venue Category_Gay Bar,Venue Category_General Entertainment,Venue Category_Gift Shop,Venue Category_Golf Course,Venue Category_Gourmet Shop,Venue Category_Grocery Store,Venue Category_Gym,Venue Category_Gym / Fitness Center,Venue Category_Health & Beauty Service,Venue Category_Historic Site,Venue Category_History Museum,Venue Category_Home Service,Venue Category_Hostel,Venue Category_Hot Dog Joint,Venue Category_Hotel,Venue Category_Housing Development,Venue Category_Ice Cream Shop,Venue Category_Indie Movie Theater,Venue Category_Indie Theater,Venue Category_Intersection,Venue Category_Italian Restaurant,Venue Category_Japanese Restaurant,Venue Category_Juice Bar,Venue Category_Karaoke Bar,Venue Category_Lake,Venue Category_Latin American Restaurant,Venue Category_Liquor Store,Venue Category_Lounge,Venue Category_Market,Venue Category_Men's Store,Venue Category_Metro Station,Venue Category_Mexican Restaurant,Venue Category_Middle Eastern Restaurant,Venue Category_Miscellaneous Shop,Venue Category_Mountain,Venue Category_Movie Theater,Venue Category_Multiplex,Venue Category_Museum,Venue Category_Nightclub,Venue Category_Noodle House,Venue Category_Other Great Outdoors,Venue Category_Other Nightlife,Venue Category_Other Repair Shop,Venue Category_Park,Venue Category_Pedestrian Plaza,Venue Category_Peruvian Restaurant,Venue Category_Pet Store,Venue Category_Pharmacy,Venue Category_Pie Shop,Venue Category_Pier,Venue Category_Pizza Place,Venue Category_Planetarium,Venue Category_Playground,Venue Category_Plaza,Venue Category_Pool,Venue Category_Pub,Venue Category_Public Art,Venue Category_Real Estate Office,Venue Category_Recreation Center,Venue Category_Rental Service,Venue Category_Resort,Venue Category_Rest Area,Venue Category_Restaurant,Venue Category_Rock Club,Venue Category_Salad Place,Venue Category_Salon / Barbershop,Venue Category_Salsa Club,Venue Category_Sandwich Place,Venue Category_Scenic Lookout,Venue Category_School,Venue Category_Seafood Restaurant,Venue Category_Shoe Store,Venue Category_Shopping Mall,Venue Category_Skating Rink,Venue Category_Snack Place,Venue Category_Soccer Field,Venue Category_Soccer Stadium,Venue Category_South American Restaurant,Venue Category_Spa,Venue Category_Spiritual Center,Venue Category_Sporting Goods Shop,Venue Category_Stadium,Venue Category_Steakhouse,Venue Category_Street Art,Venue Category_Supermarket,Venue Category_Sushi Restaurant,Venue Category_TV Station,Venue Category_Tapas Restaurant,Venue Category_Tennis Court,Venue Category_Thai Restaurant,Venue Category_Theater,Venue Category_Theme Park,Venue Category_Theme Restaurant,Venue Category_Tourist Information Center,Venue Category_Toy / Game Store,Venue Category_Tram Station,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Wings Joint
0,Tricentenario,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Tricentenario,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Tricentenario,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,Tricentenario,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Villa Niza,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [27]:

medellin_grouped = medellin_onehot.groupby('Neighborhood').mean().reset_index()
medellin_grouped.head()

Unnamed: 0,Neighborhood,Venue Category_Airport,Venue Category_American Restaurant,Venue Category_Amphitheater,Venue Category_Arepa Restaurant,Venue Category_Argentinian Restaurant,Venue Category_Art Gallery,Venue Category_Art Museum,Venue Category_Asian Restaurant,Venue Category_Athletics & Sports,Venue Category_BBQ Joint,Venue Category_Bakery,Venue Category_Bar,Venue Category_Baseball Field,Venue Category_Basketball Court,Venue Category_Bed & Breakfast,Venue Category_Beer Garden,Venue Category_Bike Rental / Bike Share,Venue Category_Bookstore,Venue Category_Bowling Alley,Venue Category_Brazilian Restaurant,Venue Category_Breakfast Spot,Venue Category_Brewery,Venue Category_Burger Joint,Venue Category_Burrito Place,Venue Category_Business Service,Venue Category_Cable Car,Venue Category_Café,Venue Category_Campground,Venue Category_Caribbean Restaurant,Venue Category_Casino,Venue Category_Cemetery,Venue Category_Clothing Store,Venue Category_Cocktail Bar,Venue Category_Coffee Shop,Venue Category_Colombian Restaurant,Venue Category_Concert Hall,Venue Category_Construction & Landscaping,Venue Category_Convenience Store,Venue Category_Cosmetics Shop,Venue Category_Creperie,Venue Category_Cuban Restaurant,Venue Category_Cupcake Shop,Venue Category_Deli / Bodega,Venue Category_Department Store,Venue Category_Dessert Shop,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Donut Shop,Venue Category_Electronics Store,Venue Category_Fabric Shop,Venue Category_Farm,Venue Category_Farmers Market,Venue Category_Fast Food Restaurant,Venue Category_Food,Venue Category_Food & Drink Shop,Venue Category_Food Court,Venue Category_Food Stand,Venue Category_Food Truck,Venue Category_French Restaurant,Venue Category_Fried Chicken Joint,Venue Category_Frozen Yogurt Shop,Venue Category_Furniture / Home Store,Venue Category_Gastropub,Venue Category_Gay Bar,Venue Category_General Entertainment,Venue Category_Gift Shop,Venue Category_Golf Course,Venue Category_Gourmet Shop,Venue Category_Grocery Store,Venue Category_Gym,Venue Category_Gym / Fitness Center,Venue Category_Health & Beauty Service,Venue Category_Historic Site,Venue Category_History Museum,Venue Category_Home Service,Venue Category_Hostel,Venue Category_Hot Dog Joint,Venue Category_Hotel,Venue Category_Housing Development,Venue Category_Ice Cream Shop,Venue Category_Indie Movie Theater,Venue Category_Indie Theater,Venue Category_Intersection,Venue Category_Italian Restaurant,Venue Category_Japanese Restaurant,Venue Category_Juice Bar,Venue Category_Karaoke Bar,Venue Category_Lake,Venue Category_Latin American Restaurant,Venue Category_Liquor Store,Venue Category_Lounge,Venue Category_Market,Venue Category_Men's Store,Venue Category_Metro Station,Venue Category_Mexican Restaurant,Venue Category_Middle Eastern Restaurant,Venue Category_Miscellaneous Shop,Venue Category_Mountain,Venue Category_Movie Theater,Venue Category_Multiplex,Venue Category_Museum,Venue Category_Nightclub,Venue Category_Noodle House,Venue Category_Other Great Outdoors,Venue Category_Other Nightlife,Venue Category_Other Repair Shop,Venue Category_Park,Venue Category_Pedestrian Plaza,Venue Category_Peruvian Restaurant,Venue Category_Pet Store,Venue Category_Pharmacy,Venue Category_Pie Shop,Venue Category_Pier,Venue Category_Pizza Place,Venue Category_Planetarium,Venue Category_Playground,Venue Category_Plaza,Venue Category_Pool,Venue Category_Pub,Venue Category_Public Art,Venue Category_Real Estate Office,Venue Category_Recreation Center,Venue Category_Rental Service,Venue Category_Resort,Venue Category_Rest Area,Venue Category_Restaurant,Venue Category_Rock Club,Venue Category_Salad Place,Venue Category_Salon / Barbershop,Venue Category_Salsa Club,Venue Category_Sandwich Place,Venue Category_Scenic Lookout,Venue Category_School,Venue Category_Seafood Restaurant,Venue Category_Shoe Store,Venue Category_Shopping Mall,Venue Category_Skating Rink,Venue Category_Snack Place,Venue Category_Soccer Field,Venue Category_Soccer Stadium,Venue Category_South American Restaurant,Venue Category_Spa,Venue Category_Spiritual Center,Venue Category_Sporting Goods Shop,Venue Category_Stadium,Venue Category_Steakhouse,Venue Category_Street Art,Venue Category_Supermarket,Venue Category_Sushi Restaurant,Venue Category_TV Station,Venue Category_Tapas Restaurant,Venue Category_Tennis Court,Venue Category_Thai Restaurant,Venue Category_Theater,Venue Category_Theme Park,Venue Category_Theme Restaurant,Venue Category_Tourist Information Center,Venue Category_Toy / Game Store,Venue Category_Tram Station,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Wings Joint
0,Aldea Pablo VI,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alejandro Echavarría,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0
2,Asomadera No.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Asomadera No.2,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.147059,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.058824,0.029412,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412
4,Asomadera No.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
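Taking the mean of one-hot columns per neighborhood yields, for each category, the share of that neighborhood's venues that belong to it. The sketch below reproduces that calculation in plain Python on the first five rows of `medellin_venues` shown earlier, so the numbers can be checked by hand. The function name `category_frequencies` is an assumption for illustration, and the Villa Niza value reflects only the single sample row, not the full dataset.

```python
from collections import Counter, defaultdict

# (Neighborhood, Venue Category) pairs from the first five rows shown earlier.
venues = [
    ("Tricentenario", "Recreation Center"),
    ("Tricentenario", "Metro Station"),
    ("Tricentenario", "Tennis Court"),
    ("Tricentenario", "Grocery Store"),
    ("Villa Niza", "Farm"),
]


def category_frequencies(pairs):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in counts.items()
    }


freqs = category_frequencies(venues)
print(freqs["Tricentenario"])
```

Tricentenario has four venues in four different categories, so each category gets a share of 0.25. In pandas, `pd.crosstab(medellin_venues['Neighborhood'], medellin_venues['Venue Category'], normalize='index')` should give the same shares in a single step, without the `Venue Category_` column prefix.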


## Print each neighborhood along with its 5 most common venues

In [28]:
num_top_venues = 5

for hood in medellin_grouped['Neighborhood']:
    print('----',hood,'----')
    temp = medellin_grouped[medellin_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Aldea Pablo VI ----
                                       venue  freq
0  Venue Category_Construction & Landscaping  0.67
1                  Venue Category_Restaurant  0.33
2                     Venue Category_Airport  0.00
3                    Venue Category_Pie Shop  0.00
4             Venue Category_Other Nightlife  0.00


---- Alejandro Echavarría ----
                               venue  freq
0      Venue Category_Ice Cream Shop  0.25
1        Venue Category_Tram Station  0.25
2       Venue Category_Shopping Mall  0.25
3  Venue Category_Miscellaneous Shop  0.25
4             Venue Category_Airport  0.00


---- Asomadera No.1 ----
                                 venue  freq
0                   Venue Category_Bar  0.12
1      Venue Category_Sushi Restaurant  0.12
2            Venue Category_Restaurant  0.12
3  Venue Category_Brazilian Restaurant  0.12
4  Venue Category_Fast Food Restaurant  0.12


---- Asomadera No.2 ----
                               venue  freq
0          

### Let's write a function to sort the venues in descending order

In [59]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
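
To see how `return_most_common_venues` behaves, the sketch below applies it to a single hypothetical row. The neighborhood name and the frequencies are made up for illustration; only the function body comes from the notebook.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and keep only the category frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: one neighborhood and its venue-category frequencies
example_row = pd.Series({
    'Neighborhood': 'Example',
    'Venue Category_Bar': 0.50,
    'Venue Category_Park': 0.30,
    'Venue Category_Hotel': 0.20,
    'Venue Category_Airport': 0.00,
})

top_two = list(return_most_common_venues(example_row, 2))
print(top_two)
```

Note that when a neighborhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with categories whose frequency is 0. This is likely why sparse neighborhoods such as Aldea Pablo VI, which has only two non-zero categories in the printout above, still list ten "most common" venues.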

### Let's build the new DataFrame and show the 10 most common venues for each neighborhood.

In [60]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create the columns according to the number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond 'st', 'nd', 'rd' use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new DataFrame
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = medellin_grouped['Neighborhood']

for ind in np.arange(medellin_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(medellin_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aldea Pablo VI,Venue Category_Construction & Landscaping,Venue Category_Restaurant,Venue Category_Wings Joint,Venue Category_Fabric Shop,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
1,Alejandro Echavarría,Venue Category_Ice Cream Shop,Venue Category_Tram Station,Venue Category_Miscellaneous Shop,Venue Category_Shopping Mall,Venue Category_Farm,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant
2,Asomadera No.1,Venue Category_Brazilian Restaurant,Venue Category_Sushi Restaurant,Venue Category_Business Service,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Restaurant,Venue Category_Cocktail Bar,Venue Category_Bar,Venue Category_Food Stand,Venue Category_Food Court
3,Asomadera No.2,Venue Category_Nightclub,Venue Category_Shopping Mall,Venue Category_Restaurant,Venue Category_Bar,Venue Category_Italian Restaurant,Venue Category_Supermarket,Venue Category_Mexican Restaurant,Venue Category_Hotel,Venue Category_History Museum,Venue Category_Café
4,Asomadera No.3,Venue Category_Hotel,Venue Category_Supermarket,Venue Category_South American Restaurant,Venue Category_Nightclub,Venue Category_Scenic Lookout,Venue Category_Tennis Court,Venue Category_Wings Joint,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant


# k-means to group the neighborhoods into 5 clusters

In [61]:
# set the number of clusters
kclusters = 5
# keep only the numeric frequency columns (the positional axis argument was removed in pandas 2.0)
medellin_grouped_clustering = medellin_grouped.drop(columns='Neighborhood')
# run k-means
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(medellin_grouped_clustering)

# check the cluster labels generated for each row of the DataFrame
kmeans.labels_[0:10]

array([3, 1, 1, 1, 1, 3, 2, 1, 1, 4])
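
The notebook fixes `kclusters = 5` without showing how that value was chosen. One common check is the elbow heuristic: fit k-means for several values of k and look at where the inertia stops dropping sharply. The sketch below is only an illustration under stated assumptions: it uses synthetic data as a stand-in for `medellin_grouped_clustering`, and the range of k and `n_init=10` are choices made here, not taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for medellin_grouped_clustering: three well-separated groups
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for several values of k and record the inertia
# (sum of squared distances from each point to its assigned centroid)
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On this synthetic data the inertia collapses once k reaches 3, the true number of groups. Running the same loop on the real frequency table would show whether 5 is a reasonable choice for the Medellín neighborhoods.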

In [62]:
# add the cluster labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [63]:
medellin_merged = df

# join neighborhoods_venues_sorted with df to add the cluster label and top venues to each neighborhood
medellin_merged = medellin_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

medellin_merged.head() # check the last columns


Unnamed: 0,CODIGO,Neighborhood,Latitude,Longitude,Bandera,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,510,Tricentenario,6.29107,-75.566325,1.0,1.0,Venue Category_Grocery Store,Venue Category_Recreation Center,Venue Category_Metro Station,Venue Category_Tennis Court,Venue Category_Electronics Store,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
1,208,Villa Niza,6.295645,-75.56345,1.0,1.0,Venue Category_Real Estate Office,Venue Category_Grocery Store,Venue Category_Health & Beauty Service,Venue Category_Farm,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
8,725,Nueva Villa de La Iguaná,6.25994,-75.581743,1.0,1.0,Venue Category_Hotel,Venue Category_Seafood Restaurant,Venue Category_Housing Development,Venue Category_Cocktail Bar,Venue Category_Soccer Stadium,Venue Category_Fast Food Restaurant,Venue Category_BBQ Joint,Venue Category_Gym,Venue Category_Latin American Restaurant,Venue Category_Shopping Mall
9,905,Alejandro Echavarría,6.23877,-75.546348,1.0,1.0,Venue Category_Ice Cream Shop,Venue Category_Tram Station,Venue Category_Miscellaneous Shop,Venue Category_Shopping Mall,Venue Category_Farm,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant
11,105,Moscú No.2,6.289982,-75.549095,1.0,1.0,Venue Category_Rental Service,Venue Category_Construction & Landscaping,Venue Category_Furniture / Home Store,Venue Category_Park,Venue Category_Wings Joint,Venue Category_Farm,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant


In [64]:
medellin_merged.shape

(99, 16)

In [65]:
#medellin_merged.to_csv(r'C:\Users\jaimea.munoz\Documents\DATA SCIENCE\Curso 9 Final\Categorias.csv')

In [66]:
# Drop rows with null values (copy so later column assignments do not modify a view)
medellin_merged = medellin_merged.dropna().copy()
medellin_merged.head()
medellin_merged.shape

(90, 16)
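
The drop from 99 to 90 rows, and the `1.0` values shown earlier in `Cluster Labels`, are consistent with a left join in which some neighborhoods have no matching row in `neighborhoods_venues_sorted`. The sketch below reproduces that effect with hypothetical stand-ins (`df_example`, `venues_example`); the names and values are assumptions for illustration, not data from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for df: every neighborhood, including one with no venues
df_example = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [6.29, 6.26, 6.24],
})

# Hypothetical stand-in for neighborhoods_venues_sorted: only A and C had venues
venues_example = pd.DataFrame({
    'Neighborhood': ['A', 'C'],
    'Cluster Labels': [1, 3],
})

merged = df_example.join(venues_example.set_index('Neighborhood'), on='Neighborhood')
print(merged)

# 'B' has no match, so its Cluster Labels is NaN and the whole column becomes float
cleaned = merged.dropna().copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned)
```

This is why the notebook drops null rows before converting `Cluster Labels` to `int`: an integer column cannot hold `NaN`, so the conversion has to happen after the unmatched neighborhoods are removed.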

In [67]:
# Convert "Cluster Labels" to int
medellin_merged['Cluster Labels'] = medellin_merged['Cluster Labels'].astype('int')

In [68]:
# create the map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set the color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(medellin_merged['Latitude'], medellin_merged['Longitude'], medellin_merged['Neighborhood'], medellin_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Neighborhoods where nightclubs are most common

In [78]:
# List every venue in the 'Bar' category together with its neighborhood
medellin_venues[medellin_venues['Venue Category'] =='Bar']

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
34,El Chagualo,6.262212,-75.570487,Lico fuente - La mona,6.264126,-75.5679,Bar
51,Manrique Central No.1,6.266456,-75.557399,Gin Tonic,6.26542,-75.555295,Bar
109,Carlos E. Restrepo,6.257775,-75.580938,Aula,6.256565,-75.578337,Bar
166,Barrio Caicedo,6.243292,-75.555425,Humo Rock Bar,6.240764,-75.552485,Bar
196,Asomadera No.1,6.229427,-75.563471,Tapiando,6.227021,-75.565489,Bar
262,Los Balsos No.2,6.195344,-75.570364,Beer Store,6.196202,-75.574525,Bar
371,Manrique Central No.2,6.265049,-75.553665,Gin Tonic,6.26542,-75.555295,Bar
404,Cerro Nutibara,6.23622,-75.579527,Hamburgo,6.233525,-75.580971,Bar
411,Cerro Nutibara,6.23622,-75.579527,Tropical Cocktails La 33,6.239484,-75.579056,Bar
420,Universidad de Antioquia,6.267854,-75.569022,Lico fuente - La mona,6.264126,-75.5679,Bar
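
To turn the listing above into a ranking, the bars can be counted per neighborhood. The sketch below uses a few rows copied from that output; on the full notebook the same expression would presumably be applied to `medellin_venues` directly.

```python
import pandas as pd

# A few rows copied from the output above (only the columns needed here)
bars = pd.DataFrame({
    'Neighborhood': ['El Chagualo', 'Cerro Nutibara', 'Cerro Nutibara', 'Asomadera No.1'],
    'Venue': ['Lico fuente - La mona', 'Hamburgo', 'Tropical Cocktails La 33', 'Tapiando'],
    'Venue Category': ['Bar', 'Bar', 'Bar', 'Bar'],
})

# Count the bars in each neighborhood, most bars first
bar_counts = (
    bars[bars['Venue Category'] == 'Bar']
    .groupby('Neighborhood')
    .size()
    .sort_values(ascending=False)
)
print(bar_counts)
```

One caveat visible in the output above: the same venue can be counted for more than one neighborhood (for example, Gin Tonic appears under both Manrique Central No.1 and No.2), probably because the search areas around neighboring centroids overlap.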


## Analyzing the cluster of the Cerro Nutibara and El Nogal-Los Almendros neighborhoods

In [90]:
# Zoom in on the Cerro Nutibara neighborhood
medellin_merged[medellin_merged['Neighborhood']=='Cerro Nutibara']

Unnamed: 0,CODIGO,Neighborhood,Latitude,Longitude,Bandera,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
191,1621,Cerro Nutibara,6.23622,-75.579527,1.0,1,Venue Category_Bar,Venue Category_Restaurant,Venue Category_Amphitheater,Venue Category_Scenic Lookout,Venue Category_Indie Theater,Venue Category_History Museum,Venue Category_Historic Site,Venue Category_Park,Venue Category_Latin American Restaurant,Venue Category_Beer Garden
200,Inst_18,Cerro Nutibara,6.23622,-75.579527,1.0,1,Venue Category_Bar,Venue Category_Restaurant,Venue Category_Amphitheater,Venue Category_Scenic Lookout,Venue Category_Indie Theater,Venue Category_History Museum,Venue Category_Historic Site,Venue Category_Park,Venue Category_Latin American Restaurant,Venue Category_Beer Garden


In [94]:
# Zoom in on the El Nogal-Los Almendros neighborhood
medellin_merged[medellin_merged['Neighborhood']=='El Nogal-Los Almendros']

Unnamed: 0,CODIGO,Neighborhood,Latitude,Longitude,Bandera,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
202,1620,El Nogal-Los Almendros,6.237403,-75.597307,1.0,1,Venue Category_Bar,Venue Category_Steakhouse,Venue Category_Market,Venue Category_Plaza,Venue Category_Salon / Barbershop,Venue Category_Café,Venue Category_Food Truck,Venue Category_Fried Chicken Joint,Venue Category_Burger Joint,Venue Category_Cocktail Bar


## Methodology
The analysis draws on the Medellín neighborhoods dataset published on the Geo Medellín website of the Alcaldía de Medellín. It contains every neighborhood in the city along with attributes such as name, Id, area, and the comuna it belongs to.

The data was then cleaned, keeping only the fields of interest.

Next, the geopy library was used to add the latitude and longitude of each neighborhood to the DataFrame. This completes the input needed to query the Foursquare API, which reports the most common venues or business categories in each neighborhood.

With the necessary data in place, we build the query URL for the Foursquare API and use its responses to analyze all the neighborhoods of Medellín. The result is the 10 most common venue categories in each neighborhood.
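
This section does not show the request itself, so the following is only a sketch of how such a query URL might be built. It assumes the legacy Foursquare v2 `venues/explore` endpoint; the function name `build_explore_url`, the credentials, the version date, and the `radius` and `limit` defaults are all placeholders chosen for illustration.

```python
from urllib.parse import urlencode

# Placeholder credentials (assumptions): replace with real values
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605'  # API version date, assumed

def build_explore_url(lat, lng, radius=500, limit=100):
    """Build a query URL for the legacy Foursquare v2 venues/explore endpoint."""
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': f'{lat},{lng}',   # latitude,longitude of the neighborhood
        'radius': radius,       # search radius in meters
        'limit': limit,         # maximum number of venues to return
    }
    # Keep the comma in 'll' unescaped for readability
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

# Coordinates of Cerro Nutibara, taken from the tables in this notebook
url = build_explore_url(6.23622, -75.579527)
print(url)
```

The search radius matters for the analysis: a larger radius returns more venues per neighborhood but also makes it more likely that nearby neighborhoods share the same venues.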

Finally, a clustering model is fitted with the K-Means algorithm, grouping the whole dataset into 5 clusters.

## Results

In [82]:
# Cluster 1 (label 0)
medellin_merged.loc[medellin_merged['Cluster Labels'] == 0, medellin_merged.columns[[1] + list(range(5, medellin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,El Pesebre,0,Venue Category_Mountain,Venue Category_Park,Venue Category_Gym / Fitness Center,Venue Category_Wings Joint,Venue Category_Farm,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant
35,Bosques de San Pablo,0,Venue Category_Pizza Place,Venue Category_Park,Venue Category_Gym,Venue Category_Wings Joint,Venue Category_Fabric Shop,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
42,San José La Cima No.1,0,Venue Category_Park,Venue Category_Wings Joint,Venue Category_Cuban Restaurant,Venue Category_Food Truck,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
51,María Cano-Carambolas,0,Venue Category_Park,Venue Category_Pedestrian Plaza,Venue Category_Wings Joint,Venue Category_Cuban Restaurant,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
53,La Pilarica,0,Venue Category_BBQ Joint,Venue Category_Food,Venue Category_Park,Venue Category_Grocery Store,Venue Category_Farm,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
170,La Mansión,0,Venue Category_Burger Joint,Venue Category_Park,Venue Category_Wings Joint,Venue Category_French Restaurant,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
244,Manrique Oriental,0,Venue Category_Park,Venue Category_Construction & Landscaping,Venue Category_Clothing Store,Venue Category_Wings Joint,Venue Category_Farm,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant


In [83]:
# Cluster 2 (label 1)
medellin_merged.loc[medellin_merged['Cluster Labels'] == 1, medellin_merged.columns[[1] + list(range(5, medellin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Tricentenario,1,Venue Category_Grocery Store,Venue Category_Recreation Center,Venue Category_Metro Station,Venue Category_Tennis Court,Venue Category_Electronics Store,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
1,Villa Niza,1,Venue Category_Real Estate Office,Venue Category_Grocery Store,Venue Category_Health & Beauty Service,Venue Category_Farm,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
8,Nueva Villa de La Iguaná,1,Venue Category_Hotel,Venue Category_Seafood Restaurant,Venue Category_Housing Development,Venue Category_Cocktail Bar,Venue Category_Soccer Stadium,Venue Category_Fast Food Restaurant,Venue Category_BBQ Joint,Venue Category_Gym,Venue Category_Latin American Restaurant,Venue Category_Shopping Mall
9,Alejandro Echavarría,1,Venue Category_Ice Cream Shop,Venue Category_Tram Station,Venue Category_Miscellaneous Shop,Venue Category_Shopping Mall,Venue Category_Farm,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant
11,Moscú No.2,1,Venue Category_Rental Service,Venue Category_Construction & Landscaping,Venue Category_Furniture / Home Store,Venue Category_Park,Venue Category_Wings Joint,Venue Category_Farm,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant
12,Santo Domingo Savio No.1,1,Venue Category_South American Restaurant,Venue Category_Cable Car,Venue Category_Caribbean Restaurant,Venue Category_Farmers Market,Venue Category_Food Truck,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant
21,Calasanz Parte Alta,1,Venue Category_Café,Venue Category_Wings Joint,Venue Category_Cuban Restaurant,Venue Category_Food Truck,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
23,Playón de Los Comuneros,1,Venue Category_Nightclub,Venue Category_Wings Joint,Venue Category_Cuban Restaurant,Venue Category_Food Truck,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
24,El Chagualo,1,Venue Category_Soccer Field,Venue Category_Bar,Venue Category_Shopping Mall,Venue Category_Food Court,Venue Category_Café,Venue Category_Supermarket,Venue Category_Food Truck,Venue Category_Food Stand,Venue Category_Food & Drink Shop,Venue Category_Food
27,Asomadera No.3,1,Venue Category_Hotel,Venue Category_Supermarket,Venue Category_South American Restaurant,Venue Category_Nightclub,Venue Category_Scenic Lookout,Venue Category_Tennis Court,Venue Category_Wings Joint,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant


In [84]:
# Cluster 3 (label 2)
medellin_merged.loc[medellin_merged['Cluster Labels'] == 2, medellin_merged.columns[[1] + list(range(5, medellin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,Los Mangos,2,Venue Category_Home Service,Venue Category_Wings Joint,Venue Category_Fabric Shop,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market,Venue Category_Farm
292,Aures No.2,2,Venue Category_Construction & Landscaping,Venue Category_Home Service,Venue Category_Wings Joint,Venue Category_Fabric Shop,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market


In [85]:
# Cluster 4 (label 3)
medellin_merged.loc[medellin_merged['Cluster Labels'] == 3, medellin_merged.columns[[1] + list(range(5, medellin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,Aures No.1,3,Venue Category_Construction & Landscaping,Venue Category_Wings Joint,Venue Category_Farm,Venue Category_Food Truck,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
71,Carpinelo,3,Venue Category_Construction & Landscaping,Venue Category_Restaurant,Venue Category_Wings Joint,Venue Category_Fabric Shop,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
158,El Compromiso,3,Venue Category_Construction & Landscaping,Venue Category_Wings Joint,Venue Category_Farm,Venue Category_Food Truck,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
260,Aldea Pablo VI,3,Venue Category_Construction & Landscaping,Venue Category_Restaurant,Venue Category_Wings Joint,Venue Category_Fabric Shop,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market


In [86]:
# Cluster 5 (label 4)
medellin_merged.loc[medellin_merged['Cluster Labels'] == 4, medellin_merged.columns[[1] + list(range(5, medellin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
207,Las Estancias,4,Venue Category_Shoe Store,Venue Category_Wings Joint,Venue Category_French Restaurant,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market,Venue Category_Farm
221,Barrios de Jesús,4,Venue Category_Shoe Store,Venue Category_Wings Joint,Venue Category_French Restaurant,Venue Category_Food Stand,Venue Category_Food Court,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market,Venue Category_Farm


## Conclusions


According to the results produced by the model, the only neighborhoods in Medellín where the Bar category (the category we use to represent nightclubs, our category of interest) is the most common one are Cerro Nutibara and El Nogal-Los Almendros. These two neighborhoods would therefore be the best candidates for opening our nightclub.
Both neighborhoods also feature places such as restaurants and parks, which makes them attractive to a market niche of young people and therefore excellent candidates for this type of business.
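
The claim about which neighborhoods rank Bar first can be checked with a simple filter. The sketch below uses a few rows copied from the outputs above; on the full notebook the same filter would presumably be applied to `medellin_merged`, followed by `drop_duplicates()`, since Cerro Nutibara appears there under two `CODIGO` values.

```python
import pandas as pd

# Rows copied from the outputs above (only the columns needed here)
top_venues = pd.DataFrame({
    'Neighborhood': ['Cerro Nutibara', 'El Nogal-Los Almendros', 'El Chagualo', 'Asomadera No.3'],
    '1st Most Common Venue': [
        'Venue Category_Bar',
        'Venue Category_Bar',
        'Venue Category_Soccer Field',
        'Venue Category_Hotel',
    ],
})

# Neighborhoods whose single most common venue category is Bar
bar_first = top_venues.loc[
    top_venues['1st Most Common Venue'] == 'Venue Category_Bar', 'Neighborhood'
].tolist()
print(bar_first)
```

The same filter with `'Venue Category_Nightclub'` would pick up neighborhoods such as Asomadera No.2 and Playón de Los Comuneros, which the tables above show with Nightclub ranked first, so it may be worth running both before settling on a location.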
