<h1 align=center><font size = 5>Capstone Final Project - The Battle of the Neighborhoods</font></h1>

<h2>Table of Contents</h2>

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li><a href="#introduccion">Introduction</a></li>
    <li><a href="#problema">Business Problem</a></li>
    <li><a href="#datos">Data</a></li>
    <li><a href="#metodologia">Methodology</a></li>
    <li><a href="#resultados">Results</a></li>
    <li><a href="#conclusiones">Conclusions</a></li>
</ol>
    
</div>
 
<hr>

<h2 id="introduccion">1. Introduction</h2>

In this project we analyze and visualize the neighborhoods of the city of **Medellin, Colombia**, to determine which location is best for opening a **pizzeria**.

We will use a set of data science tools to manipulate the data, obtain geospatial data, and segment the neighborhoods.

This project focuses on a pizzeria, but the same approach can be applied to any other type of business.

<h2 id="problema">2. Business Problem</h2>

A close friend wants to open a food business, specifically a pizzeria, but has no idea where the best place to open it would be. He has put his effort and savings into this venture, so choosing the right location for the pizzeria is vitally important.
With the help of data science, we can give him a clear picture of where opening the pizzeria is feasible and where it is not.

<h2 id="datos">3. Data</h2>

<h3 id="datos">3.1. Data Sources</h3>

The data we will use comes from several sources. One of them is the database provided by the **Medellin** mayor's office, which lists all the neighborhoods of Medellin, the comuna each one belongs to (similar to districts in other cities), and whether it is urban or rural. The data can be consulted at this link: <a href="https://geomedellin-m-medellin.opendata.arcgis.com/datasets/M-Medellin::barrio-vereda/explore?location=6.268900%2C-75.595550%2C12.00&showTable=true">Data source</a>.

<p>We could use web scraping with the <b>BeautifulSoup</b> package, which we used in previous labs, or download the file in <b>.csv</b> format directly from the Medellin mayor's office website and load it into our notebook with Pandas.
We will choose the CSV download for simplicity.</p>
<p>We will use the <b>GeoPy</b> library to retrieve the geospatial data for each neighborhood of the city and combine everything into a single table.</p>
<p>We will also use data from the <b>FourSquare</b> API to retrieve venues and ratings for businesses similar to ours, for which we want to predict the best locations.</p>

<h3 id="datos">3.2. Data Cleaning</h3>

Because the data comes from several sources, some values may be missing or null. I decided to remove those records from our main DataFrame. We will also work only with neighborhoods classified as **Urban** and drop those listed as rural.

### Import libraries

In [1]:
import numpy as np # library for handling vectorized data

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library for handling JSON files

import requests # library for handling HTTP requests
from pandas import json_normalize # converts a JSON document into a pandas DataFrame (pandas >= 1.0)

# Matplotlib and associated modules for plotting
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # library for plotting maps

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


### Load the dataset of **Medellin** neighborhoods.

In [2]:
dfBarrios = pd.read_csv('C:/Users/andrey.zapata/Downloads/Barrio_Vereda.csv')
dfBarrios.head(5)

Unnamed: 0,OBJECTID,CODIGO,NOMBRE,SUBTIPO_BARRIOVEREDA,NOMBRE_COMUNA_CORREGIMIENTO,SHAPEAREA,SHAPELEN
0,1112,510,Tricentenario,1,Castilla,420637.970349,2897.304229
1,1113,208,Villa Niza,1,Santa Cruz,143215.327504,1697.303318
2,1114,1108,Laureles,1,Laureles Estadio,707014.821267,3847.112683
3,1115,1303,Santa Rosa de Lima,1,San Javier,139970.996369,2158.954261
4,1116,1206,Santa Lucía,1,La América,275913.740234,3048.703385


In [3]:
# Drop the OBJECTID, CODIGO, SHAPEAREA and SHAPELEN columns, since we will not use them

dfBarrios=dfBarrios.drop(['OBJECTID','CODIGO', 'SHAPEAREA', 'SHAPELEN'],axis=1)

#### Remove the neighborhoods classified as **Rural**, as well as those that have no name or whose geospatial data GeoPy cannot find

In [4]:
# Unnamed entries and places whose geospatial data GeoPy cannot find
excludedNames = [
    "Hospital San Vicente de Paúl",
    "Área de Expansión El Noral",
    "Facultad de Minas",
    "Facultad Veterinaria y Zootecnia U.de.A.",
    "U.P.B",
    "El Nogal-Los Almendros",
    "Cementerio Universal",
    "Centro Administrativo",
    "Facultad de Minas U. Nacional",
    "Las Acacias",
    "Plaza de Ferias",
    "Terminal de Transporte",
    "Oleoducto",
    "San Isidro",
    "Naranjal",
    "La Palma",
    "Las Palmas",
    "El Salado",
    "Altavista",
    "Sin Nombre",
]
# SUBTIPO_BARRIOVEREDA == 2 marks the rural entries
isRural = dfBarrios['SUBTIPO_BARRIOVEREDA'] == 2
indexNames = dfBarrios[isRural | dfBarrios['NOMBRE'].isin(excludedNames)].index
dfBarrios.drop(indexNames , inplace=True)
dfBarrios.head(5)

Unnamed: 0,NOMBRE,SUBTIPO_BARRIOVEREDA,NOMBRE_COMUNA_CORREGIMIENTO
0,Tricentenario,1,Castilla
1,Villa Niza,1,Santa Cruz
2,Laureles,1,Laureles Estadio
3,Santa Rosa de Lima,1,San Javier
4,Santa Lucía,1,La América


In [5]:
# Sort the data by neighborhood name, rename the columns, and check the DataFrame
dfBarrios=dfBarrios[['NOMBRE','SUBTIPO_BARRIOVEREDA','NOMBRE_COMUNA_CORREGIMIENTO']]
dfBarrios.columns=['barrio','BarrioOVereda','Comuna']
dfBarrios.sort_values(by=['barrio'], ascending=True,inplace=True)
dfBarrios.reset_index(drop=True, inplace=True)
dfBarrios.head()

Unnamed: 0,barrio,BarrioOVereda,Comuna
0,Aldea Pablo VI,1,Popular
1,Alejandro Echavarría,1,Buenos Aires
2,Alejandría,1,El Poblado
3,Alfonso López,1,Castilla
4,Altamira,1,Robledo


#### Check the number of resulting records.

In [6]:
dfBarrios.shape

(251, 3)

#### Import the **GeoPy** library to obtain the geospatial data for the city's neighborhoods

In [7]:
from geopy.geocoders import Nominatim

In [8]:
address = 'Medellin, CO'

geolocator = Nominatim(user_agent="mde_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Medellin City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Medellin City are 6.2443382, -75.573553.


### Insert 2 new columns for latitude and longitude

In [9]:
dfBarrios.insert(3, "lat", "")
dfBarrios.insert(4, "lng", "")
dfBarrios.head(5)

Unnamed: 0,barrio,BarrioOVereda,Comuna,lat,lng
0,Aldea Pablo VI,1,Popular,,
1,Alejandro Echavarría,1,Buenos Aires,,
2,Alejandría,1,El Poblado,,
3,Alfonso López,1,Castilla,,
4,Altamira,1,Robledo,,


#### Build a loop that assigns to each row of our DataFrame the **latitude** and **longitude** returned by the GeoPy library

In [10]:
address = 'Medellin,CO'

# Create the geocoder once instead of on every iteration
geolocator = Nominatim(user_agent="mde_explorer")

for i in dfBarrios.index:
    nomBarrio = dfBarrios.loc[i, 'barrio']
    addressFull = nomBarrio + "," + address
    location = geolocator.geocode(addressFull)
    # geocode() returns None when the address cannot be resolved
    if location is not None:
        latitude = location.latitude
        longitude = location.longitude
    else:
        print(nomBarrio + " has null geospatial data")
        # use NaN so that the isnull() check in the next cell can detect it
        latitude = np.nan
        longitude = np.nan

    dfBarrios.loc[i, 'lat']= latitude
    dfBarrios.loc[i, 'lng']= longitude
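
The loop above sends one request per neighborhood to the public Nominatim service, which can reject clients that send requests too quickly. The following is a minimal sketch, not the method used in this notebook, of the same lookup with spacing between requests and with explicit handling of addresses that cannot be resolved. The helper names `geocode_neighborhoods` and `build_rate_limited_geocoder` are hypothetical, and the one-second delay is an assumption.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Coordinates = Tuple[Optional[float], Optional[float]]


def geocode_neighborhoods(
    names: Iterable[str],
    geocode: Callable[[str], object],
    city: str = "Medellin,CO",
) -> Dict[str, Coordinates]:
    """Map each neighborhood name to (lat, lng), or (None, None) if not found."""
    coordinates = {}
    for name in names:
        location = geocode(f"{name},{city}")
        if location is None:
            # The geocoder returns None when it cannot resolve the query
            coordinates[name] = (None, None)
        else:
            coordinates[name] = (location.latitude, location.longitude)
    return coordinates


def build_rate_limited_geocoder(user_agent: str = "mde_explorer"):
    """Wrap Nominatim.geocode so calls are spaced at least one second apart."""
    # Imported here so the helper above can be used without geopy installed
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)
```

In the notebook this could be called as `geocode_neighborhoods(dfBarrios['barrio'], build_rate_limited_geocoder())`, and the resulting dictionary mapped back onto the `lat` and `lng` columns.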

#### Check that the number of rows and columns matches what we had before, and whether there are any null values

In [11]:
dfBarrios.shape
dfBarrios.isnull().any()

barrio           False
BarrioOVereda    False
Comuna           False
lat              False
lng              False
dtype: bool

<h3 id="analisis">5. Analysis</h3>

#### Initialize the parameters used to generate the map of the city

In [12]:
address = 'Medellin, CO'

geolocator = Nominatim(user_agent="mde_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Medellin City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Medellin City are 6.2443382, -75.573553.


#### Generate the map with its markers

In [13]:
# create a map of Medellin using the latitude and longitude values
map_medellin = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to the map; the popup label shows the neighborhood and its comuna
for lat, lng, comuna, barrio in zip(dfBarrios['lat'], dfBarrios['lng'], dfBarrios['Comuna'], dfBarrios['barrio']):
    label = '{}, {}'.format(barrio, comuna)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_medellin)  
    
map_medellin

#### Initialize the arguments we will use for the Foursquare API

In [14]:
CLIENT_ID = '3KQ23TNDRE4U545FXH421FR5OEUAJ0UU4PKVNC4XNHRE3LKM' # your Foursquare ID
CLIENT_SECRET = 'L1MSXE30SXDOQ4CNBTSMIRK2ZAXX5NZ1S3S2IRU1OTMMEBEA' # your Foursquare secret
VERSION = '20180605' # Foursquare API version
LIMIT = 50 # maximum number of venues returned per Foursquare API request

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 3KQ23TNDRE4U545FXH421FR5OEUAJ0UU4PKVNC4XNHRE3LKM
CLIENT_SECRET:L1MSXE30SXDOQ4CNBTSMIRK2ZAXX5NZ1S3S2IRU1OTMMEBEA
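
The cell above keeps the Foursquare credentials in the notebook and prints them in full. As an alternative, here is a minimal sketch that reads them from environment variables and masks them when they are displayed. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for illustration; they are not part of the original notebook or of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables."""
    env = os.environ if env is None else env
    required = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

With this in place, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` could replace the hard-coded assignments, and `print(mask(CLIENT_ID))` would confirm that a value was loaded without exposing it.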


#### Build a function that retrieves the venues near each neighborhood of the city, within a radius of 800 m and with a limit of 50 venues

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=800):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['BARRIO', 
                  'BARRIO Latitude', 
                  'BARRIO Longitude', 
                  'SITIO', 
                  'SITIO Latitude', 
                  'SITIO Longitude', 
                  'SITIO Category']
    
    return(nearby_venues)
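
The function above assumes that every response contains at least one group and that every venue has at least one category; a neighborhood with no results or a venue without a category would raise an error. The sketch below shows one possible way to build the same request URL with encoded parameters and to flatten a response defensively. The endpoint and parameter names are taken from the function above, while `build_explore_url` and `extract_venues` are hypothetical helpers and the fallback of `None` for a missing category is an assumption.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=800, limit=50):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


def extract_venues(payload, name, lat, lng):
    """Flatten one explore response into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Assumption: a venue without categories is kept with category None
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows
```

Inside the loop, `extract_venues(requests.get(url).json(), name, lat, lng)` would return an empty list for a neighborhood without venues instead of stopping the whole run.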

##### Call the function we built

In [17]:
Medellin_venues = getNearbyVenues(names=dfBarrios['barrio'],
                                   latitudes=dfBarrios['lat'],
                                   longitudes=dfBarrios['lng']
                                  )

Aldea Pablo VI
Alejandro Echavarría
Alejandría
Alfonso López
Altamira
Altos del Poblado
Andalucía
Antonio Nariño
Aranjuez
Asomadera No.1
Asomadera No.2
Asomadera No.3
Astorga
Aures No.1
Aures No.2
B. Cerro El Volador
Barrio Caicedo
Barrio Colombia
Barrio Colón
Barrio Cristóbal
Barrios de Jesús
Batallón Cuarta Brigada
Batallón Girardot
Belalcázar
Belencito
Bello Horizonte
Belén
Berlín
Bermejal-Los Álamos
Betania
Blanquizal
Bolivariana
Bomboná No.1
Bomboná No.2
Bosques de San Pablo
Boston
Boyacá
Brasilia
Buenos Aires
Calasanz
Calasanz Parte Alta
Calle Nueva
Campo Alegre
Campo Amor
Campo Valdés No.1
Campo Valdés No.2
Caribe
Carlos E. Restrepo
Carpinelo
Castilla
Castropol
Cataluña
Cerro Nutibara
Cerro Nutibara
Corazón de Jesús
Cristo Rey
Cuarta Brigada
Cucaracho
Córdoba
Diego Echavarría
Doce de Octubre No.1
Doce de Octubre No.2
Ecoparque Cerro El Volador
Eduardo Santos
El Castillo
El Chagualo
El Compromiso
El Corazón
El Danubio
El Diamante
El Diamante No.2
El Pesebre
El Pinal
El Poblado
El

#### Check the size of the new DataFrame and its first rows

In [18]:
print(Medellin_venues.shape)
Medellin_venues.head()

(4471, 7)


Unnamed: 0,BARRIO,BARRIO Latitude,BARRIO Longitude,SITIO,SITIO Latitude,SITIO Longitude,SITIO Category
0,Aldea Pablo VI,6.288287,-75.542067,Parque Santo Domingo,6.293031,-75.541819,Plaza
1,Aldea Pablo VI,6.288287,-75.542067,Metrocable Linea L - Estación Santo Domingo,6.292908,-75.541689,Cable Car
2,Aldea Pablo VI,6.288287,-75.542067,La Mesa del Barrio,6.294749,-75.544043,South American Restaurant
3,Aldea Pablo VI,6.288287,-75.542067,San José La Cima # 1,6.281684,-75.543637,Park
4,Aldea Pablo VI,6.288287,-75.542067,Alquiler T & M Equipos,6.281701,-75.544761,Rental Service


Let's check how many venues were returned for each neighborhood

In [19]:
Medellin_venues.groupby('BARRIO').count()

BARRIO,BARRIO Latitude,BARRIO Longitude,SITIO,SITIO Latitude,SITIO Longitude,SITIO Category
Aldea Pablo VI,5,5,5,5,5,5
Alejandro Echavarría,8,8,8,8,8,8
Alejandría,50,50,50,50,50,50
Alfonso López,6,6,6,6,6,6
Altamira,3,3,3,3,3,3
Altos del Poblado,5,5,5,5,5,5
Andalucía,4,4,4,4,4,4
Antonio Nariño,4,4,4,4,4,4
Aranjuez,7,7,7,7,7,7
Asomadera No.1,35,35,35,35,35,35


#### Find out how many unique categories there are among all the returned venues

In [20]:
print('There are {} unique categories.'.format(len(Medellin_venues['SITIO Category'].unique())))

There are 228 unique categories.


### Analyze each neighborhood

In [21]:
# one-hot encoding
Medellin_onehot = pd.get_dummies(Medellin_venues[['SITIO Category']], prefix="", prefix_sep="")

# add the neighborhood column back to the DataFrame
Medellin_onehot['BARRIO'] = Medellin_venues['BARRIO'] 

# move the neighborhood column to the first position
fixed_columns = [Medellin_onehot.columns[-1]] + list(Medellin_onehot.columns[:-1])
Medellin_onehot = Medellin_onehot[fixed_columns]

Medellin_onehot.head()

Unnamed: 0,BARRIO,Advertising Agency,Airport,Airport Lounge,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Aquarium,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,BBQ Joint,Bakery,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Betting Shop,Big Box Store,Bike Rental / Bike Share,Bistro,Boarding House,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Station,Business Service,Cable Car,Café,Cajun / Creole Restaurant,Campground,Caribbean Restaurant,Casino,Cemetery,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Gym,Colombian Restaurant,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Doner Restaurant,Donut Shop,Electronics Store,Empanada Restaurant,Eye Doctor,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gymnastics Gym,Hardware Store,Health Food Store,Historic Site,History Museum,Home Service,Hostel,Hot Dog Joint,Hotel,Housing Development,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Liquor Store,Locksmith,Lounge,Market,Medical Supply Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Motel,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nature Preserve,Nightclub,Noodle House,Nursery School,Optical Shop,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Event Space,Outlet Mall,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Peruvian Restaurant,Pet Service,Pet Store,Pharmacy,Pie Shop,Pier,Pizza Place,Planetarium,Playground,Plaza,Poke Place,Pool,Pub,Public Art,Recreation Center,Rental Service,Rest Area,Restaurant,Road,Rock Club,Salad Place,Salon / Barbershop,Salsa Club,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Ski Trail,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Stadium,Steakhouse,Street Art,Supermarket,Sushi Restaurant,TV Station,Tapas Restaurant,Tea Room,Tennis Court,Theater,Theme Park,Theme Restaurant,Tour Provider,Tourist Information Center,Toy / Game Store,Track Stadium,Tram Station,Vegetarian / Vegan Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Aldea Pablo VI,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Aldea Pablo VI,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Aldea Pablo VI,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Aldea Pablo VI,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Aldea Pablo VI,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Check the new size of the DataFrame

In [22]:
Medellin_onehot.shape

(4471, 229)
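
The one-hot table has one row per venue, with a single 1 in the column of that venue's category. Averaging those rows per neighborhood, which is what the next cell does, gives the share of each category among the venues of that neighborhood. The following plain-Python sketch illustrates that arithmetic with the five venues returned for Aldea Pablo VI in `Medellin_venues`; the function `category_frequencies` is a hypothetical helper written for this illustration, not part of the notebook.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs taken from the first rows of Medellin_venues
venues = [
    ("Aldea Pablo VI", "Plaza"),
    ("Aldea Pablo VI", "Cable Car"),
    ("Aldea Pablo VI", "South American Restaurant"),
    ("Aldea Pablo VI", "Park"),
    ("Aldea Pablo VI", "Rental Service"),
]


def category_frequencies(pairs):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {cat: n / sum(counter.values())
                       for cat, n in counter.items()}
        for neighborhood, counter in counts.items()
    }


frequencies = category_frequencies(venues)
print(frequencies["Aldea Pablo VI"]["Plaza"])  # 0.2
```

Each of the five categories appears once among five venues, so each one gets a frequency of 1/5 = 0.2. A neighborhood with few venues therefore produces large frequencies from very little evidence, which is worth keeping in mind when comparing neighborhoods.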

#### Group the rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [23]:
Medellin_grouped = Medellin_onehot.groupby('BARRIO').mean().reset_index()
Medellin_grouped

Unnamed: 0,BARRIO,Advertising Agency,Airport,Airport Lounge,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Aquarium,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,BBQ Joint,Bakery,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Betting Shop,Big Box Store,Bike Rental / Bike Share,Bistro,Boarding House,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Station,Business Service,Cable Car,Café,Cajun / Creole Restaurant,Campground,Caribbean Restaurant,Casino,Cemetery,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Gym,Colombian Restaurant,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Doner Restaurant,Donut Shop,Electronics Store,Empanada Restaurant,Eye Doctor,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gymnastics Gym,Hardware Store,Health Food Store,Historic Site,History Museum,Home Service,Hostel,Hot Dog Joint,Hotel,Housing Development,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Liquor Store,Locksmith,Lounge,Market,Medical Supply Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Motel,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nature Preserve,Nightclub,Noodle House,Nursery School,Optical Shop,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Event Space,Outlet Mall,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Peruvian Restaurant,Pet Service,Pet Store,Pharmacy,Pie Shop,Pier,Pizza Place,Planetarium,Playground,Plaza,Poke Place,Pool,Pub,Public Art,Recreation Center,Rental Service,Rest Area,Restaurant,Road,Rock Club,Salad Place,Salon / Barbershop,Salsa Club,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Ski Trail,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Stadium,Steakhouse,Street Art,Supermarket,Sushi Restaurant,TV Station,Tapas Restaurant,Tea Room,Tennis Court,Theater,Theme Park,Theme Restaurant,Tour Provider,Tourist Information Center,Toy / Game Store,Track Stadium,Tram Station,Vegetarian / Vegan Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Aldea Pablo VI,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alejandro Echavarría,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alejandría,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.04,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.12,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Alfonso López,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Altamira,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0
5,Altos del Poblado,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Andalucía,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Antonio Nariño,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Aranjuez,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Asomadera No.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Let's confirm the size of the DataFrame.

In [24]:
Medellin_grouped.shape

(247, 229)
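The shape above reads as 247 neighborhoods by 229 columns, that is, `BARRIO` plus one column per venue category. The step that built `Medellin_grouped` is not shown in this section, so the following is only a minimal sketch. It assumes the frequencies come from one-hot encoding the venue categories and averaging them per neighborhood. The rows of `toy_venues` are invented for illustration; only the column names `BARRIO` and `SITIO Category` come from the notebook.

```python
import pandas as pd

# Hypothetical toy venues table (invented rows, notebook column names).
toy_venues = pd.DataFrame({
    'BARRIO': ['Barrio A', 'Barrio A', 'Barrio A', 'Barrio A',
               'Barrio B', 'Barrio B'],
    'SITIO Category': ['Park', 'Pizza Place', 'Park', 'Café',
                       'Bar', 'Pizza Place'],
})

# One-hot encode the category: one 0/1 indicator column per category.
onehot = pd.get_dummies(toy_venues['SITIO Category'], dtype=float)
onehot.insert(0, 'BARRIO', toy_venues['BARRIO'])

# Averaging the indicators per neighborhood yields relative frequencies,
# so every row of the grouped table sums to 1.
toy_grouped = onehot.groupby('BARRIO').mean().reset_index()
print(toy_grouped)
```

Under this assumption, a value such as 0.142857 in the real table would mean that one out of seven venues found in that neighborhood belongs to that category.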

#### Let's print each neighborhood along with its 5 most common venues

In [25]:
num_top_venues = 5

for hood in Medellin_grouped['BARRIO']:
    print("----"+hood+"----")
    temp = Medellin_grouped[Medellin_grouped['BARRIO'] == hood].T.reset_index()
    temp.columns = ['lugar','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aldea Pablo VI----
                       lugar  freq
0                      Plaza   0.2
1             Rental Service   0.2
2                  Cable Car   0.2
3                       Park   0.2
4  South American Restaurant   0.2


----Alejandro Echavarría----
                  lugar  freq
0           Pizza Place  0.12
1          Tram Station  0.12
2             Multiplex  0.12
3  Gym / Fitness Center  0.12
4         Shopping Mall  0.12


----Alejandría----
           lugar  freq
0          Hotel  0.12
1           Café  0.10
2  Shopping Mall  0.08
3    Pizza Place  0.06
4         Bakery  0.04


----Alfonso López----
                  lugar  freq
0             Rest Area  0.17
1   Fried Chicken Joint  0.17
2            Public Art  0.17
3             BBQ Joint  0.17
4  Fast Food Restaurant  0.17


----Altamira----
                lugar  freq
0         Wings Joint  0.33
1                Food  0.33
2           BBQ Joint  0.33
3  Advertising Agency  0.00
4   Other Repair Shop  0.00


----

#### Let's put this into a DataFrame
First, let's write a function that sorts the venues in descending order of frequency.

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
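To see what the function returns, here is a small sketch on a single invented row. The function body is copied from the cell above; `toy_row` and its frequencies are hypothetical.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and sort the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row shaped like one row of Medellin_grouped.
toy_row = pd.Series({
    'BARRIO': 'Toy Barrio',
    'Park': 0.10,
    'Pizza Place': 0.50,
    'Café': 0.30,
    'Bar': 0.05,
    'Hotel': 0.05,
})

top3 = list(return_most_common_venues(toy_row, 3))
print(top3)
```

Note that many real rows contain tied frequencies (for example five categories at 0.2). The default sort does not guarantee the order among ties, which is probably why the ordering in the printed top-5 lists can differ from the ordering in the sorted table. Passing `kind='stable'` to `sort_values` would make ties keep their column order.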

Let's build the new dataframe with the top 10 venues of each neighborhood and display its first rows.

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create the columns according to the number of top venues
columns = ['BARRIO']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['BARRIO'] = Medellin_grouped['BARRIO']

for ind in np.arange(Medellin_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Medellin_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,BARRIO,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aldea Pablo VI,South American Restaurant,Cable Car,Plaza,Park,Rental Service,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
1,Alejandro Echavarría,Soccer Field,Shopping Mall,Bar,Multiplex,Tram Station,Gym / Fitness Center,Ice Cream Shop,Pizza Place,Farmers Market,Fast Food Restaurant
2,Alejandría,Hotel,Café,Shopping Mall,Pizza Place,Salad Place,Bakery,Cocktail Bar,Burger Joint,Breakfast Spot,Frozen Yogurt Shop
3,Alfonso López,BBQ Joint,Rest Area,Fast Food Restaurant,Lake,Fried Chicken Joint,Public Art,Donut Shop,Electronics Store,Empanada Restaurant,Eye Doctor
4,Altamira,Wings Joint,Food,BBQ Joint,Zoo,Donut Shop,Food & Drink Shop,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm


### Clustering the neighborhoods with **K-Means**

In [28]:
# Let's run k-means to group the neighborhoods into 5 clusters.
# set the number of clusters
kclusters = 5

In [29]:
Medellin_grouped_clustering = Medellin_grouped.drop(columns='BARRIO')

# run k-means
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Medellin_grouped_clustering)

# check the cluster labels generated for each row of the dataframe
kmeans.labels_[0:5]

array([4, 0, 0, 0, 0])
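The notebook fixes `kclusters = 5` without showing how that number was chosen. One common sanity check, not the method used here, is to compare the silhouette score for several values of k. The sketch below runs on synthetic data (three invented, well-separated blobs) rather than on `Medellin_grouped_clustering`, so the numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the feature matrix: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score gives the most compact, best separated clusters.
best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```

On the real data the same loop could be run with `X = Medellin_grouped_clustering`; a clear peak would support the chosen number of clusters, while a flat curve would suggest the clusters are weakly separated.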

Let's build a new dataframe that includes the cluster label as well as the 10 most common venues of each neighborhood.

In [30]:
# add the cluster labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Medellin_merged = dfBarrios

# join dfBarrios with neighborhoods_venues_sorted to add the cluster label and top venues
Medellin_merged = Medellin_merged.join(neighborhoods_venues_sorted.set_index('BARRIO'), on='barrio')

Medellin_merged.head()  # check the last columns

Unnamed: 0,barrio,BarrioOVereda,Comuna,lat,lng,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aldea Pablo VI,1,Popular,6.28829,-75.5421,4.0,South American Restaurant,Cable Car,Plaza,Park,Rental Service,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
1,Alejandro Echavarría,1,Buenos Aires,6.23877,-75.5463,0.0,Soccer Field,Shopping Mall,Bar,Multiplex,Tram Station,Gym / Fitness Center,Ice Cream Shop,Pizza Place,Farmers Market,Fast Food Restaurant
2,Alejandría,1,El Poblado,6.19997,-75.569,0.0,Hotel,Café,Shopping Mall,Pizza Place,Salad Place,Bakery,Cocktail Bar,Burger Joint,Breakfast Spot,Frozen Yogurt Shop
3,Alfonso López,1,Castilla,6.28489,-75.576,0.0,BBQ Joint,Rest Area,Fast Food Restaurant,Lake,Fried Chicken Joint,Public Art,Donut Shop,Electronics Store,Empanada Restaurant,Eye Doctor
4,Altamira,1,Robledo,6.27983,-75.5814,0.0,Wings Joint,Food,BBQ Joint,Zoo,Donut Shop,Food & Drink Shop,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm


Before visualizing the clusters, let's rename a column, fill the null values, and cast the ClusterLabels column to integer.

In [31]:
Medellin_merged.rename(columns = {'Cluster Labels':'ClusterLabels'}, inplace = True)
Medellin_merged.ClusterLabels = Medellin_merged.ClusterLabels.fillna(9)
Medellin_merged.ClusterLabels = Medellin_merged.ClusterLabels.astype(int)

We drop the rows whose cluster label was null (now marked with the placeholder value 9) and check the data.

In [32]:
indexNames = Medellin_merged[ (Medellin_merged['ClusterLabels'] == 9)].index
Medellin_merged.drop(indexNames, inplace=True)
Medellin_merged.head(5)

Unnamed: 0,barrio,BarrioOVereda,Comuna,lat,lng,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aldea Pablo VI,1,Popular,6.28829,-75.5421,4,South American Restaurant,Cable Car,Plaza,Park,Rental Service,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
1,Alejandro Echavarría,1,Buenos Aires,6.23877,-75.5463,0,Soccer Field,Shopping Mall,Bar,Multiplex,Tram Station,Gym / Fitness Center,Ice Cream Shop,Pizza Place,Farmers Market,Fast Food Restaurant
2,Alejandría,1,El Poblado,6.19997,-75.569,0,Hotel,Café,Shopping Mall,Pizza Place,Salad Place,Bakery,Cocktail Bar,Burger Joint,Breakfast Spot,Frozen Yogurt Shop
3,Alfonso López,1,Castilla,6.28489,-75.576,0,BBQ Joint,Rest Area,Fast Food Restaurant,Lake,Fried Chicken Joint,Public Art,Donut Shop,Electronics Store,Empanada Restaurant,Eye Doctor
4,Altamira,1,Robledo,6.27983,-75.5814,0,Wings Joint,Food,BBQ Joint,Zoo,Donut Shop,Food & Drink Shop,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
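The placeholder value 9 works here because the cluster labels only range from 0 to 4, but it would collide with a real label if `kclusters` were ever 10 or more. As a hedged alternative, the sketch below drops the unmatched rows directly with `dropna` before casting. The two small frames are invented; only the column names `barrio`, `BARRIO` and `Cluster Labels` mirror the notebook.

```python
import pandas as pd

# Hypothetical neighborhoods; 'B' has no venues and therefore no label.
barrios = pd.DataFrame({
    'barrio': ['A', 'B', 'C'],
    'lat': [6.28, 6.23, 6.19],
    'lng': [-75.54, -75.55, -75.57],
})
labels = pd.DataFrame({'BARRIO': ['A', 'C'], 'Cluster Labels': [4, 0]})

# The join leaves NaN (and a float dtype) where no label exists.
merged = barrios.join(labels.set_index('BARRIO'), on='barrio')

# Drop unlabeled rows first, then the integer cast is safe.
cleaned = (
    merged.dropna(subset=['Cluster Labels'])
          .astype({'Cluster Labels': int})
)
print(cleaned)
```

The result is the same set of rows the notebook keeps, without reserving a special label value.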


After adjusting the format of our data, we visualize the resulting clusters.

In [62]:
# create the map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set the color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Medellin_merged['lat'], Medellin_merged['lng'], Medellin_merged['barrio'], Medellin_merged['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Neighborhoods where pizzerias are most common

In [58]:
# let's look at the neighborhoods with the most pizzerias in the city
Barrio_Con_Pizzerias = Medellin_venues[Medellin_venues['SITIO Category'] =='Pizza Place']
Barrio_Con_Pizzerias['BARRIO'].value_counts()

Los Alpes                   5
Nueva Villa del Aburrá      4
La Castellana               4
Boston                      3
Los Pinos                   3
Simón Bolívar               3
Los Ángeles                 3
Suramericana                3
Miravalle                   3
Los Balsos No.2             3
Alejandría                  3
Las Mercedes                2
La Loma de Los Bernal       2
Nueva Villa de La Iguaná    2
El Velódromo                2
Corazón de Jesús            2
Lorena                      2
Versalles No.1              2
Las Playas                  2
Barrio Cristóbal            2
Bomboná No.1                2
Los Balsos No.1             2
La Mota                     2
Florida Nueva               2
La Florida                  2
Universidad Nacional        2
Carlos E. Restrepo          2
La Candelaria               2
Villa Nueva                 2
Diego Echavarría            2
Barrio Caicedo              1
Cristo Rey                  1
Asomadera No.3              1
San Bernar

#### Note that the 5 neighborhoods with the most pizzerias in the city belong to Cluster 0; it is definitely not a good place to open a pizzeria, since there is a lot of competition.

In [60]:
Barrio_Con_Pizzerias1 = Medellin_merged.loc[Medellin_merged['barrio'].isin(['Los Alpes', 'Nueva Villa del Aburrá', 'La Castellana', 'Boston', 'Los Pinos'])]
Barrio_Con_Pizzerias1

Unnamed: 0,barrio,BarrioOVereda,Comuna,lat,lng,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,Boston,1,La Candelaria,6.24804,-75.5576,0,Bar,Plaza,Pizza Place,Restaurant,Italian Restaurant,Theater,Ice Cream Shop,Gastropub,Seafood Restaurant,Park
110,La Castellana,1,Laureles Estadio,6.24023,-75.6054,0,Bar,Café,Pizza Place,Sandwich Place,Shopping Mall,Italian Restaurant,Bakery,Mexican Restaurant,Restaurant,Cocktail Bar
150,Los Alpes,1,Belén,6.2309,-75.6076,0,Pizza Place,Fast Food Restaurant,Sandwich Place,Burger Joint,Café,Italian Restaurant,BBQ Joint,Theater,Shopping Mall,Mexican Restaurant
158,Los Pinos,1,La América,6.25205,-75.599,0,Pizza Place,Restaurant,Gym,Park,Mexican Restaurant,Bar,Café,Shopping Mall,Peruvian Restaurant,Sushi Restaurant
176,Nueva Villa del Aburrá,1,Belén,6.23566,-75.6043,0,Bar,Café,Pizza Place,Sandwich Place,Theater,Italian Restaurant,Burger Joint,Pie Shop,Breakfast Spot,Restaurant
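The table above checks the five busiest pizza neighborhoods by hand. A possible way to generalize that check is to count pizzerias per cluster, which would show the competition in every cluster at once. This is a sketch on invented data, not a result from the notebook; the column names `BARRIO`, `SITIO Category`, `barrio` and `ClusterLabels` are the notebook's, while the `pizzerias` column and all rows are hypothetical.

```python
import pandas as pd

# Hypothetical venues and cluster assignments.
venues = pd.DataFrame({
    'BARRIO': ['X', 'X', 'X', 'Y', 'Z'],
    'SITIO Category': ['Pizza Place', 'Pizza Place', 'Bar',
                       'Pizza Place', 'Park'],
})
merged = pd.DataFrame({
    'barrio': ['X', 'Y', 'Z', 'W'],
    'ClusterLabels': [0, 0, 4, 4],
})

# Count pizzerias per neighborhood, as in the value_counts() cell.
pizza_counts = venues.loc[
    venues['SITIO Category'] == 'Pizza Place', 'BARRIO'
].value_counts()

# Attach the counts; neighborhoods without pizzerias get 0.
merged['pizzerias'] = merged['barrio'].map(pizza_counts).fillna(0).astype(int)

# Total pizzerias in each cluster.
per_cluster = merged.groupby('ClusterLabels')['pizzerias'].sum()
print(per_cluster)
```

Applied to `Medellin_venues` and `Medellin_merged`, a table like `per_cluster` would back the competition argument with a number for each cluster instead of five hand-picked rows.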


<h3 id="metodologia">4. Methodology</h3>

We used the database of neighborhoods and veredas (rural districts) published by the Medellín mayor's office, which is available on the Geo Medellin website. This database contains the demographic data of the neighborhoods and comunas of the city's metropolitan area.

Next, the data was processed: cleaned, organized and formatted.

Then the **GeoPy** library was used to obtain the latitude and longitude of each neighborhood in the dataframe.
With the geospatial data obtained from GeoPy, calls were made to the **Foursquare** API to retrieve the top 10 most common venues for each neighborhood.


Finally, we approached the problem with the clustering technique we already know, **k-means**. This approach lets the audience see how the neighborhoods were grouped into 5 clusters.

<h3 id="resultados">5. Results</h3>

**Group 1**

In [34]:
Mde_group1=Medellin_merged.loc[Medellin_merged['ClusterLabels'] == 0, Medellin_merged.columns[[0] + list(range(5, Medellin_merged.shape[1]))]]
Mde_group1

Unnamed: 0,barrio,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Alejandro Echavarría,0,Soccer Field,Shopping Mall,Bar,Multiplex,Tram Station,Gym / Fitness Center,Ice Cream Shop,Pizza Place,Farmers Market,Fast Food Restaurant
2,Alejandría,0,Hotel,Café,Shopping Mall,Pizza Place,Salad Place,Bakery,Cocktail Bar,Burger Joint,Breakfast Spot,Frozen Yogurt Shop
3,Alfonso López,0,BBQ Joint,Rest Area,Fast Food Restaurant,Lake,Fried Chicken Joint,Public Art,Donut Shop,Electronics Store,Empanada Restaurant,Eye Doctor
4,Altamira,0,Wings Joint,Food,BBQ Joint,Zoo,Donut Shop,Food & Drink Shop,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
5,Altos del Poblado,0,Gym / Fitness Center,Hotel,Breakfast Spot,BBQ Joint,Electronics Store,Food & Drink Shop,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
7,Antonio Nariño,0,Construction & Landscaping,Fast Food Restaurant,Seafood Restaurant,Juice Bar,Doner Restaurant,Food,Fish & Chips Shop,Farmers Market,Farm,Falafel Restaurant
8,Aranjuez,0,Shopping Mall,Recreation Center,Sandwich Place,Art Museum,Donut Shop,Plaza,Convenience Store,Farmers Market,Farm,Falafel Restaurant
9,Asomadera No.1,0,Sandwich Place,Nightclub,Gym,Hotel,Steakhouse,Ice Cream Shop,Italian Restaurant,Café,Chinese Restaurant,Lounge
10,Asomadera No.2,0,Hotel,Supermarket,Italian Restaurant,History Museum,Shopping Mall,Nightclub,Restaurant,Scenic Lookout,Brazilian Restaurant,Seafood Restaurant
11,Asomadera No.3,0,Hotel,Italian Restaurant,Restaurant,Shopping Mall,Supermarket,Burger Joint,Mexican Restaurant,Bed & Breakfast,South American Restaurant,Snack Place


**Group 2**

In [35]:
Mde_group2=Medellin_merged.loc[Medellin_merged['ClusterLabels'] == 1, Medellin_merged.columns[[0] + list(range(5, Medellin_merged.shape[1]))]]
Mde_group2

Unnamed: 0,barrio,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Andalucía,1,Cable Car,Metro Station,Zoo,Food & Drink Shop,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
96,Granizal,1,Cable Car,South American Restaurant,Plaza,Rental Service,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Factory
99,Héctor Abad Gómez,1,Cable Car,Metro Station,Clothing Store,Park,Zoo,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
102,Juan Pablo II,1,BBQ Joint,Cable Car,Zoo,Electronics Store,Food & Drink Shop,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
108,La Avanzada,1,South American Restaurant,Cable Car,Plaza,Liquor Store,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
114,La Esperanza No.2,1,South American Restaurant,Cable Car,Plaza,Construction & Landscaping,History Museum,Discount Store,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
117,La Francia,1,Cable Car,Metro Station,Zoo,Food & Drink Shop,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
118,La Frontera,1,Cable Car,Coffee Shop,Metro Station,BBQ Joint,Zoo,Food & Drink Shop,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
121,La Isla,1,Cable Car,Fast Food Restaurant,Metro Station,Furniture / Home Store,Zoo,Food,Fish & Chips Shop,Farmers Market,Farm,Falafel Restaurant
133,La Sierra,1,BBQ Joint,Campground,Cable Car,Zoo,Electronics Store,Food & Drink Shop,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


**Group 3**

In [36]:
Mde_group3=Medellin_merged.loc[Medellin_merged['ClusterLabels'] == 2, Medellin_merged.columns[[0] + list(range(5, Medellin_merged.shape[1]))]]
Mde_group3

Unnamed: 0,barrio,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,Belencito,2,Ice Cream Shop,General Entertainment,Art Gallery,Café,Museum,Fast Food Restaurant,Coffee Shop,BBQ Joint,Scenic Lookout,Empanada Restaurant
29,Betania,2,Ice Cream Shop,Café,Museum,Coffee Shop,Art Gallery,Scenic Lookout,Zoo,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
67,El Corazón,2,Ice Cream Shop,Café,Coffee Shop,Art Gallery,Scenic Lookout,Zoo,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
76,El Raizal,2,Diner,Ice Cream Shop,Park,Scenic Lookout,Doner Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
82,El Triunfo,2,Mountain,Scenic Lookout,Donut Shop,Food & Drink Shop,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
112,La Cruz,2,Food Truck,Ice Cream Shop,Pedestrian Plaza,Zoo,Donut Shop,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
139,Las Independencias,2,Ice Cream Shop,Café,Coffee Shop,Art Gallery,Scenic Lookout,Zoo,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
177,Nuevos Conquistadores,2,Ice Cream Shop,Café,Coffee Shop,Art Gallery,Scenic Lookout,Zoo,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
229,Tejelo,2,Ice Cream Shop,Burger Joint,Bar,Zoo,Electronics Store,Food & Drink Shop,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
238,Veinte de Julio,2,Ice Cream Shop,Park,Burger Joint,Street Art,Scenic Lookout,Art Gallery,General Entertainment,Café,Coffee Shop,Falafel Restaurant


**Group 4**

In [37]:
Mde_group4=Medellin_merged.loc[Medellin_merged['ClusterLabels'] == 3, Medellin_merged.columns[[0] + list(range(5, Medellin_merged.shape[1]))]]
Mde_group4

Unnamed: 0,barrio,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Barrios de Jesús,3,Tram Station,Soccer Field,Hardware Store,Shoe Store,Salon / Barbershop,Doner Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
33,Bomboná No.2,3,Tram Station,Pizza Place,Soccer Field,Doner Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
60,Doce de Octubre No.1,3,Soccer Field,Mountain,Playground,Recreation Center,Scenic Lookout,Discount Store,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
61,Doce de Octubre No.2,3,Soccer Field,Recreation Center,Scenic Lookout,Laundromat,Zoo,Donut Shop,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
80,El Socorro,3,Cable Car,General Entertainment,Street Art,Burger Joint,Soccer Field,Cuban Restaurant,Empanada Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant
103,Juan XXIII La Quiebra,3,Cable Car,Soccer Field,Ski Trail,Zoo,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Factory
104,Kennedy,3,Soccer Field,Wings Joint,Zoo,Doner Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
123,La Libertad,3,Soccer Field,Multiplex,Shopping Mall,Tram Station,Gym / Fitness Center,Doner Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
130,La Pradera,3,Cable Car,Soccer Field,Street Art,Burger Joint,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
153,Los Cerros El Vergel,3,Tram Station,Athletics & Sports,Ice Cream Shop,Shopping Mall,Soccer Field,Pizza Place,Women's Store,Cosmetics Shop,Donut Shop,Food


**Group 5**

In [38]:
Mde_group5=Medellin_merged.loc[Medellin_merged['ClusterLabels'] == 4, Medellin_merged.columns[[0] + list(range(5, Medellin_merged.shape[1]))]]
Mde_group5

Unnamed: 0,barrio,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aldea Pablo VI,4,South American Restaurant,Cable Car,Plaza,Park,Rental Service,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
22,Batallón Girardot,4,Park,Comfort Food Restaurant,Other Great Outdoors,Zoo,Donut Shop,Food,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
27,Berlín,4,Fast Food Restaurant,Plaza,Café,Recreation Center,Park,Zoo,Farmers Market,Farm,Falafel Restaurant,Factory
36,Boyacá,4,Ice Cream Shop,Burger Joint,Park,Bar,Zoo,Electronics Store,Food & Drink Shop,Food,Fish & Chips Shop,Fast Food Restaurant
66,El Compromiso,4,South American Restaurant,Cable Car,Plaza,Park,Rental Service,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
71,El Pesebre,4,Gym / Fitness Center,Shopping Mall,Sandwich Place,Juice Bar,Park,Fried Chicken Joint,Convenience Store,Big Box Store,Farm,Donut Shop
74,El Pomar,4,Shopping Mall,History Museum,Plaza,Department Store,Gym / Fitness Center,Liquor Store,Park,Eye Doctor,Donut Shop,Electronics Store
79,El Salvador,4,Department Store,Park,Restaurant,Art Gallery,Supermarket,Nature Preserve,Steakhouse,Sandwich Place,Arepa Restaurant,Plaza
84,Enciso,4,Plaza,Park,Caribbean Restaurant,Steakhouse,Museum,Bus Station,South American Restaurant,Construction & Landscaping,Comfort Food Restaurant,Farmers Market
119,La Gloria,4,Park,Pizza Place,Ice Cream Shop,Shopping Mall,Burger Joint,Market,BBQ Joint,Donut Shop,Fast Food Restaurant,Farmers Market


<h3 id="conclusiones">6. Conclusions</h3>

Based on the results obtained, we can see that group #5, 'Mde_group5', which corresponds to **Cluster 4**, matches the kind of location we are looking for. It has venues with heavy foot traffic, such as metro stations, parks, sports venues and gyms, among others. It also has food venues, which indicates that there is demand for places offering gastronomic options. We observe few **pizzerias** in this group (in the rows shown above, only La Gloria lists Pizza Place among its most common venues), so opening one there would be feasible.

However, other aspects would need to be analyzed in depth to determine which of the **Cluster 4** neighborhoods is the most suitable for opening the pizzeria. These aspects could include crime rates, the price range of nearby food venues, and proximity to tourist sites.

### Thank you!!