# Capstone Project - The Battle of Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: The business problem](#Introducción)
* [Data](#Datos)
* [Methodology](#Metodología)
* [Analysis](#Análisis)
* [Results and discussion](#resultados)
* [Conclusion](#Conclusión)

## Introduction: The business problem

This project aims to recommend to an entrepreneur the most suitable type of business to open in the city of Guayaquil (Ecuador). To do so, the city is segmented by its urban parishes, and the K-means algorithm is then applied to cluster them into five zones of high commercial density. With the help of the Foursquare API, the business model with the lowest commercial impact is then chosen.

## Data

The data was obtained from the website https://www.getpostalcodes.com/ecuador/county-guayaquil-guayas/. Using the postal codes of Guayaquil's urban parishes, we collected the names of those parishes together with their geographic coordinates.

Note that the investment required for the business is relatively high, so rural parishes, where there is little commercial activity, are ruled out.

We import the libraries needed for the project.

In [42]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

We load the dataset from an xlsx file with pandas.

In [4]:
ruta1 = 'C:/Users/INFORMEGA/Desktop/G.xlsx'

dfg = pd.read_excel(ruta1)

dfg.head()

Unnamed: 0,Postal Code,Neighborhood,latitud,longitud
0,90101,AYACUCHO,-2.20548,-79.89029
1,90102,BOLIVAR,-2.19923,-79.88992
2,90104,FEBRES CORDERO,-2.2034,-79.93424
3,90105,GARCIA MORENO,-2.20836,-79.8993
4,90106,LETAMENDI,-2.21127,-79.90823


We locate the geographic position of Guayaquil with the help of the geopy library.

In [5]:
from geopy.geocoders import Nominatim

address = 'GUAYAQUIL,EC'
geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Guayaquil are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Guayaquil are -2.1900563, -79.8868741.


We display the map of the city of Guayaquil with markers at the geographic coordinates of its urban parishes.

In [6]:
! pip install folium

import folium

# create a map of Guayaquil using the latitude and longitude values
map_gye = folium.Map(location=[latitude, longitude], zoom_start=11)

# add the markers to the map
for lat, lng, label in zip(dfg['latitud'], dfg['longitud'], dfg['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_gye)  
    
map_gye



### Using the Foursquare API

Using our Foursquare credentials, we call the API for the venues near the geographic positions of Guayaquil's urban parishes.

In [7]:
CLIENT_ID = '2OF3KKCYFHP50KM33SYTWRWORAU5A2IWLQGYH2K3NRPGTTMU' # your Foursquare ID
CLIENT_SECRET = 'TSQBJ2TIQLG5KFUSS234GQO5KGMFVWL5FVKWVV5KYVQJUPXB' # your Foursquare Client Secret
#VERSION = '20180604'
VERSION = '20230417'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 2OF3KKCYFHP50KM33SYTWRWORAU5A2IWLQGYH2K3NRPGTTMU
CLIENT_SECRET:TSQBJ2TIQLG5KFUSS234GQO5KGMFVWL5FVKWVV5KYVQJUPXB
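
Credentials written directly into a notebook are visible to anyone who reads it. As a minimal sketch of an alternative, the key can be read from an environment variable instead. The variable name `FOURSQUARE_API_KEY` and the helpers `load_api_key` and `build_headers` are hypothetical names chosen for illustration; they are not part of Foursquare's tooling.

```python
import os


def load_api_key(env=None, name="FOURSQUARE_API_KEY"):
    """Return the API key stored under `name` (a hypothetical variable name)."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before calling the API")
    return key


def build_headers(api_key):
    """Build request headers in the same shape the notebook uses."""
    return {"accept": "application/json", "Authorization": api_key}
```

With this in place, the request cells would call `build_headers(load_api_key())` rather than embedding the key in the source.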


In [8]:
parroquia_latitude = dfg.loc[0, 'latitud'] # latitude of the parish
parroquia_longitude = dfg.loc[0, 'longitud'] # longitude of the parish

parroquia_name = dfg.loc[0, 'Neighborhood'] # name of the parish

print('Latitude and longitude values of {} are {},{}.'.format(parroquia_name, 
                                                               parroquia_latitude, 
                                                               parroquia_longitude))

Latitude and longitude values of AYACUCHO are -2.20548,-79.89029.


In [9]:
LIMIT= 50
radius= 500
url = "https://api.foursquare.com/v3/places/search?ll={}%2C{}&radius={}&limit={}".format(parroquia_latitude, parroquia_longitude, radius, LIMIT)
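
The `%2C` in the URL above is a percent-encoded comma separating latitude and longitude in the `ll` parameter. A small sketch of building the same query string with the standard library, so the encoding is handled automatically; `build_search_url` is a hypothetical helper name, and the defaults mirror the `radius` and `LIMIT` values used in this cell.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v3/places/search"


def build_search_url(lat, lng, radius=500, limit=50):
    """Return the search URL for venues around (lat, lng)."""
    # urlencode percent-encodes the comma in "lat,lng" as %2C
    query = urlencode({"ll": f"{lat},{lng}", "radius": radius, "limit": limit})
    return f"{BASE_URL}?{query}"


print(build_search_url(-2.20548, -79.89029))
# https://api.foursquare.com/v3/places/search?ll=-2.20548%2C-79.89029&radius=500&limit=50
```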

In [10]:
# the v3 Places API authenticates with an API key sent in the Authorization header
headers = {
    "accept": "application/json",
    "Authorization": "fsq3+btNZJiWH0LtffL3n4zu2NiE6iThx9A752cLEJs5q1U="
}
# send the GET request and parse the JSON response
results = requests.get(url, headers=headers).json()
results

{'results': [{'fsq_id': '4c98da0805a1b1f7b1039153',
   'categories': [{'id': 13065,
     'name': 'Restaurant',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/default_',
      'suffix': '.png'}}],
   'chains': [],
   'distance': 110,
   'geocodes': {'main': {'latitude': -2.20526, 'longitude': -79.891262}},
   'link': '/v3/places/4c98da0805a1b1f7b1039153',
   'location': {'address': 'Villavicencio y',
    'country': 'EC',
    'cross_street': 'Francisco de Marco',
    'formatted_address': 'Villavicencio y (Francisco de Marco), Guayaquil',
    'locality': 'Guayaquil',
    'region': 'Provincia del Guayas'},
   'name': 'El Descanso De Los Amigos',
   'related_places': {},
   'timezone': 'America/Guayaquil'},
  {'fsq_id': '4ff77f42e4b01a84edc578a1',
   'categories': [{'id': 13065,
     'name': 'Restaurant',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/default_',
      'suffix': '.png'}}],
   'chains': [],
   'distance': 120,
   'geocodes': {'main'

We convert the JSON data for the venues within a 500-metre radius of the first parish into a pandas dataframe.

In [14]:
venues = results['results']
nearby_venues = pd.json_normalize(venues) # flatten the nested JSON into a dataframe


In [15]:
nearby_venues.head()

Unnamed: 0,fsq_id,categories,chains,distance,link,name,timezone,geocodes.main.latitude,geocodes.main.longitude,location.address,location.country,location.cross_street,location.formatted_address,location.locality,location.region,related_places.children,related_places.parent.fsq_id,related_places.parent.name,location.postcode
0,4c98da0805a1b1f7b1039153,"[{'id': 13065, 'name': 'Restaurant', 'icon': {...",[],110,/v3/places/4c98da0805a1b1f7b1039153,El Descanso De Los Amigos,America/Guayaquil,-2.20526,-79.891262,Villavicencio y,EC,Francisco de Marco,"Villavicencio y (Francisco de Marco), Guayaquil",Guayaquil,Provincia del Guayas,,,,
1,4ff77f42e4b01a84edc578a1,"[{'id': 13065, 'name': 'Restaurant', 'icon': {...",[],120,/v3/places/4ff77f42e4b01a84edc578a1,Restaurant 2 Hermanos,America/Guayaquil,-2.204454,-79.890628,,EC,,Guayaquil,Guayaquil,Provincia del Guayas,,,,
2,4fe39115e4b079c77b1fc1f1,"[{'id': 17005, 'name': 'Automotive Retail', 'i...",[],166,/v3/places/4fe39115e4b079c77b1fc1f1,Tecnicentro Granja,America/Guayaquil,-2.205115,-79.891747,,EC,,Guayaquil,Guayaquil,Provincia del Guayas,,,,
3,4d1f278cf7a9a14343d1219f,"[{'id': 13338, 'name': 'Seafood Restaurant', '...",[],250,/v3/places/4d1f278cf7a9a14343d1219f,Los Arbolitos,America/Guayaquil,-2.207205,-79.888831,San martin,EC,Rumichaca,"San martin (Rumichaca), Guayaquil",Guayaquil,Provincia del Guayas,,,,
4,4e91be3e30f81ec6fff61503,"[{'id': 13000, 'name': 'Dining and Drinking', ...",[],178,/v3/places/4e91be3e30f81ec6fff61503,Colorado,America/Guayaquil,-2.206837,-79.889434,Ambato,EC,Villavicencio,"Ambato (Villavicencio), Guayaquil",Guayaquil,Provincia del Guayas,,,,
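
The dotted column names in the table above (for example `geocodes.main.latitude`) come from flattening the nested JSON objects. The following is a rough pure-Python approximation of that flattening, not pandas' actual implementation; `flatten` is a hypothetical helper and the sample record is abridged from the response shown earlier.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict) and value:
            # recurse into nested objects, extending the key path
            flat.update(flatten(value, full_key, sep))
        else:
            # scalars and lists are kept as they are
            flat[full_key] = value
    return flat


venue = {
    "name": "El Descanso De Los Amigos",
    "distance": 110,
    "geocodes": {"main": {"latitude": -2.20526, "longitude": -79.891262}},
    "location": {"country": "EC", "locality": "Guayaquil"},
}
flat = flatten(venue)
print(sorted(flat))
```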


We clean the dataframe so that only the venues' names and categories remain, together with their geographic coordinates.

In [16]:
def get_category_type(row):
    # return the name of the venue's first category, or None if it has none
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['categories.name']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
nearby_venues['categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.head()

Unnamed: 0,fsq_id,categories,chains,distance,link,name,timezone,geocodes.main.latitude,geocodes.main.longitude,location.address,location.country,location.cross_street,location.formatted_address,location.locality,location.region,related_places.children,related_places.parent.fsq_id,related_places.parent.name,location.postcode
0,4c98da0805a1b1f7b1039153,Restaurant,[],110,/v3/places/4c98da0805a1b1f7b1039153,El Descanso De Los Amigos,America/Guayaquil,-2.20526,-79.891262,Villavicencio y,EC,Francisco de Marco,"Villavicencio y (Francisco de Marco), Guayaquil",Guayaquil,Provincia del Guayas,,,,
1,4ff77f42e4b01a84edc578a1,Restaurant,[],120,/v3/places/4ff77f42e4b01a84edc578a1,Restaurant 2 Hermanos,America/Guayaquil,-2.204454,-79.890628,,EC,,Guayaquil,Guayaquil,Provincia del Guayas,,,,
2,4fe39115e4b079c77b1fc1f1,Automotive Retail,[],166,/v3/places/4fe39115e4b079c77b1fc1f1,Tecnicentro Granja,America/Guayaquil,-2.205115,-79.891747,,EC,,Guayaquil,Guayaquil,Provincia del Guayas,,,,
3,4d1f278cf7a9a14343d1219f,Seafood Restaurant,[],250,/v3/places/4d1f278cf7a9a14343d1219f,Los Arbolitos,America/Guayaquil,-2.207205,-79.888831,San martin,EC,Rumichaca,"San martin (Rumichaca), Guayaquil",Guayaquil,Provincia del Guayas,,,,
4,4e91be3e30f81ec6fff61503,Dining and Drinking,[],178,/v3/places/4e91be3e30f81ec6fff61503,Colorado,America/Guayaquil,-2.206837,-79.889434,Ambato,EC,Villavicencio,"Ambato (Villavicencio), Guayaquil",Guayaquil,Provincia del Guayas,,,,


In [17]:
filtered_columns = ['name', 'categories', 'geocodes.main.latitude', 'geocodes.main.longitude']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues.head()

Unnamed: 0,name,categories,geocodes.main.latitude,geocodes.main.longitude
0,El Descanso De Los Amigos,Restaurant,-2.20526,-79.891262
1,Restaurant 2 Hermanos,Restaurant,-2.204454,-79.890628
2,Tecnicentro Granja,Automotive Retail,-2.205115,-79.891747
3,Los Arbolitos,Seafood Restaurant,-2.207205,-79.888831
4,Colorado,Dining and Drinking,-2.206837,-79.889434


In [18]:
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,latitude,longitude
0,El Descanso De Los Amigos,Restaurant,-2.20526,-79.891262
1,Restaurant 2 Hermanos,Restaurant,-2.204454,-79.890628
2,Tecnicentro Granja,Automotive Retail,-2.205115,-79.891747
3,Los Arbolitos,Seafood Restaurant,-2.207205,-79.888831
4,Colorado,Dining and Drinking,-2.206837,-79.889434
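
The venue coordinates can be checked against the 500 m search radius with the haversine formula. This is a sketch under the assumption of a spherical Earth with a mean radius of 6,371 km; `haversine_m` is a hypothetical helper, and the coordinates are those of the AYACUCHO parish and the first venue listed above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# AYACUCHO parish centre vs. "El Descanso De Los Amigos"
d = haversine_m(-2.20548, -79.89029, -2.20526, -79.891262)
print(round(d))
```

The result should land close to the `distance` value of 110 that Foursquare reported for this venue, and well inside the 500 m radius.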


In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the API request URL
        url = "https://api.foursquare.com/v3/places/search?ll={}%2C{}&radius={}&limit={}".format(lat, lng, radius, LIMIT)
        headers = {
        "accept": "application/json",
        "Authorization": "fsq3+btNZJiWH0LtffL3n4zu2NiE6iThx9A752cLEJs5q1U="
        }
            
        # GET request
        results = requests.get(url, headers=headers).json()
        venues = results['results']

        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['name'], 
            v['geocodes']['main']['latitude'], 
            v['geocodes']['main']['longitude'],  
            (v['categories'][0]['name'] if len(v['categories']) > 0 else None)) for v in venues])
       


    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [20]:
nearby_venues

Unnamed: 0,name,categories,latitude,longitude
0,El Descanso De Los Amigos,Restaurant,-2.20526,-79.891262
1,Restaurant 2 Hermanos,Restaurant,-2.204454,-79.890628
2,Tecnicentro Granja,Automotive Retail,-2.205115,-79.891747
3,Los Arbolitos,Seafood Restaurant,-2.207205,-79.888831
4,Colorado,Dining and Drinking,-2.206837,-79.889434
5,Chicha resbaladera Rosa Amendaño,Juice Bar,-2.203775,-79.889469
6,Chifa Dinastia Sur,Chinese Restaurant,-2.203905,-79.889119
7,Estadio Banco del Pacífico Capwell,Stadium,-2.206698,-79.89378
8,Salsa Na Mas,Night Club,-2.203558,-79.889556
9,PROINFRA,Hardware Store,-2.207531,-79.890737


In [23]:
gye_venues = getNearbyVenues(names=dfg['Neighborhood'],
                                   latitudes=dfg['latitud'],
                                   longitudes=dfg['longitud']
                                  )

AYACUCHO
BOLIVAR
FEBRES CORDERO
GARCIA MORENO
LETAMENDI
9 DE OCTUBRE
PASCUALES
ROCA
TARQUI
URDANETA
XIMENA


In [24]:
print(gye_venues.shape)
gye_venues.head()

(364, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,AYACUCHO,-2.20548,-79.89029,El Descanso De Los Amigos,-2.20526,-79.891262,Restaurant
1,AYACUCHO,-2.20548,-79.89029,Restaurant 2 Hermanos,-2.204454,-79.890628,Restaurant
2,AYACUCHO,-2.20548,-79.89029,Tecnicentro Granja,-2.205115,-79.891747,Automotive Retail
3,AYACUCHO,-2.20548,-79.89029,Los Arbolitos,-2.207205,-79.888831,Seafood Restaurant
4,AYACUCHO,-2.20548,-79.89029,Colorado,-2.206837,-79.889434,Dining and Drinking


We can see that four parishes show little commercial activity; most of them lie on the periphery of the city's commercial core.

In [25]:
gye_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
9 DE OCTUBRE,46,46,46,46,46,45
AYACUCHO,50,50,50,50,50,50
BOLIVAR,50,50,50,50,50,45
FEBRES CORDERO,6,6,6,6,6,6
GARCIA MORENO,49,49,49,49,49,45
LETAMENDI,10,10,10,10,10,10
PASCUALES,6,6,6,6,6,4
ROCA,50,50,50,50,50,46
TARQUI,33,33,33,33,33,29
URDANETA,50,50,50,50,50,47
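
A `groupby` count only lists parishes that returned at least one venue, so it is worth comparing it against the list of parishes that were requested. The sketch below uses the names printed by `getNearbyVenues` and the venue counts from the table above; the threshold of 20 venues is an assumption chosen for illustration, not a value taken from the analysis.

```python
# parishes requested, in the order printed by getNearbyVenues
requested = ["AYACUCHO", "BOLIVAR", "FEBRES CORDERO", "GARCIA MORENO", "LETAMENDI",
             "9 DE OCTUBRE", "PASCUALES", "ROCA", "TARQUI", "URDANETA", "XIMENA"]

# venue counts per parish, copied from the groupby output
venue_counts = {"9 DE OCTUBRE": 46, "AYACUCHO": 50, "BOLIVAR": 50, "FEBRES CORDERO": 6,
                "GARCIA MORENO": 49, "LETAMENDI": 10, "PASCUALES": 6, "ROCA": 50,
                "TARQUI": 33, "URDANETA": 50}

LOW_ACTIVITY_THRESHOLD = 20  # hypothetical cut-off, assumed for this example


def low_activity(names, counts, threshold):
    """Return the parishes with fewer than `threshold` venues (absent ones count as 0)."""
    return sorted(name for name in names if counts.get(name, 0) < threshold)


missing = sorted(set(requested) - set(venue_counts))
print(missing)                                                        # ['XIMENA']
print(low_activity(requested, venue_counts, LOW_ACTIVITY_THRESHOLD))
# ['FEBRES CORDERO', 'LETAMENDI', 'PASCUALES', 'XIMENA']
```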


## Methodology

With the data preprocessed, it is time to apply the machine learning techniques learned in the course. We group Guayaquil's urban parishes according to the commercial venues near them (the relative frequency of each venue category), using the K-means algorithm with k = 5.

We also display the map with the clusters produced by the algorithm, and then carry out our analysis.

## Analysis

We apply one-hot encoding to convert the categorical venue data of the parishes into numeric data, on which the K-means algorithm is then run.

In [26]:
# one-hot encoding
gye_onehot = pd.get_dummies(gye_venues[['Venue Category']], prefix="", prefix_sep="")

# add the neighborhood column back to the dataframe
gye_onehot['Neighborhood'] = gye_venues['Neighborhood'] 

# move the neighborhood column to the first position
fixed_columns = [gye_onehot.columns[-1]] + list(gye_onehot.columns[:-1])
gye_onehot = gye_onehot[fixed_columns]

gye_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Antique Store,Arcade,Arts and Crafts Store,Arts and Entertainment,Attorney / Law Office,Automotive Retail,Automotive Service,BBQ Joint,...,Sports Club,Stadium,Steakhouse,Storage Facility,Swimming Pool,Swiss Restaurant,Tailor,Vegan and Vegetarian Restaurant,Video Games Store,Volleyball Court
0,AYACUCHO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,AYACUCHO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,AYACUCHO,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
3,AYACUCHO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,AYACUCHO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [27]:
gye_grouped = gye_onehot.groupby('Neighborhood').mean().reset_index()
gye_grouped

Unnamed: 0,Neighborhood,American Restaurant,Antique Store,Arcade,Arts and Crafts Store,Arts and Entertainment,Attorney / Law Office,Automotive Retail,Automotive Service,BBQ Joint,...,Sports Club,Stadium,Steakhouse,Storage Facility,Swimming Pool,Swiss Restaurant,Tailor,Vegan and Vegetarian Restaurant,Video Games Store,Volleyball Court
0,9 DE OCTUBRE,0.0,0.0,0.021739,0.021739,0.043478,0.0,0.0,0.0,0.043478,...,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.0
1,AYACUCHO,0.0,0.0,0.0,0.0,0.02,0.0,0.06,0.0,0.04,...,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,BOLIVAR,0.0,0.02,0.0,0.0,0.0,0.0,0.06,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
3,FEBRES CORDERO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,GARCIA MORENO,0.0,0.0,0.0,0.0,0.020408,0.0,0.061224,0.0,0.020408,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,LETAMENDI,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,PASCUALES,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,ROCA,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.06,0.0,0.0
8,TARQUI,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,...,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,URDANETA,0.02,0.0,0.0,0.0,0.04,0.0,0.08,0.04,0.04,...,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02
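
Taking the mean of one-hot columns per parish yields the share of that parish's venues falling in each category. The following pure-Python sketch computes the same kind of relative frequency on a small made-up sample; the parish names `A` and `B` and the helper `category_frequencies` are hypothetical, and venues without a category are simply left out of the sample.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each parish to {category: share of that parish's venues}."""
    counts = defaultdict(Counter)
    for parish, category in venues:
        counts[parish][category] += 1
    return {
        parish: {cat: n / sum(c.values()) for cat, n in c.items()}
        for parish, c in counts.items()
    }


# hypothetical (parish, category) pairs
sample = [("A", "Restaurant"), ("A", "Restaurant"), ("A", "Bakery"), ("A", "Bar"), ("B", "Bar")]
freqs = category_frequencies(sample)
print(freqs["A"])  # {'Restaurant': 0.5, 'Bakery': 0.25, 'Bar': 0.25}
```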


We build a dataframe with the 10 most common venue categories near each parish.

In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [36]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create the columns according to the number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
parroquias_venues_sorted = pd.DataFrame(columns=columns)
parroquias_venues_sorted['Neighborhood'] = gye_grouped['Neighborhood']

for ind in np.arange(gye_grouped.shape[0]):
    parroquias_venues_sorted.iloc[ind, 1:] = return_most_common_venues(gye_grouped.iloc[ind, :], num_top_venues)

parroquias_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,9 DE OCTUBRE,Pizzeria,Burger Joint,Arts and Entertainment,Department Store,BBQ Joint,Latin American Restaurant,Coffee Shop,Health and Beauty Service,Hot Dog Joint,Cupcake Shop
1,AYACUCHO,Seafood Restaurant,Restaurant,Bakery,Automotive Retail,BBQ Joint,Fast Food Restaurant,Office Supply Store,Coffee Shop,Miscellaneous Store,Night Club
2,BOLIVAR,Restaurant,Automotive Retail,Grocery Store / Supermarket,Hardware Store,Seafood Restaurant,Bank,Clothing Store,Drugstore,Department Store,Design Studio
3,FEBRES CORDERO,Breakfast Spot,Restaurant,Seafood Restaurant,BBQ Joint,Electric Vehicle Charging Station,Bar,American Restaurant,Peruvian Restaurant,Performing Arts Venue,Pastry Shop
4,GARCIA MORENO,Sandwich Restaurant,Drugstore,Automotive Retail,Restaurant,Seafood Restaurant,Health and Beauty Service,Electronics Store,Department Store,Fried Chicken Joint,Furniture and Home Store
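
The ranking behind this table amounts to sorting each parish's category frequencies in descending order and keeping the first entries. A compact sketch of both pieces follows: `ordinal` builds the column labels (it also handles cases such as 11th and 21st, which the notebook never needs with 10 columns), and `top_categories` does the ranking. Both names and the sample frequencies are hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def top_categories(frequencies, n):
    """Return the n category names with the highest frequency."""
    ranked = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


columns = [f"{ordinal(i)} Most Common Venue" for i in range(1, 4)]
print(columns)
print(top_categories({"Bakery": 0.1, "Restaurant": 0.5, "Bar": 0.2}, 2))  # ['Restaurant', 'Bar']
```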


We apply the K-means algorithm with K=5.

In [32]:
from sklearn.cluster import KMeans
# set the number of clusters
kclusters = 5

gye_grouped_clustering = gye_grouped.drop(columns='Neighborhood')

# run k-means
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(gye_grouped_clustering)

# check the cluster labels generated for each row of the dataframe
kmeans.labels_[0:10]


array([1, 1, 1, 2, 1, 0, 3, 1, 1, 1])
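
To make the clustering step more concrete, here is a minimal pure-Python sketch of the iteration K-means performs (Lloyd's algorithm): assign each point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until the labels stop changing. It is an illustration, not scikit-learn's implementation. In particular, seeding the centroids with the first `k` points is a simplification (scikit-learn defaults to k-means++ initialization), and the sample points are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def simple_kmeans(points, k, max_iterations=100):
    """Cluster `points` into k groups; returns (labels, centroids)."""
    # simplification: use the first k points as the initial centroids
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # assignment step: nearest centroid for every point
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
    return labels, centroids


# two well-separated hypothetical groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = simple_kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```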

In [33]:
# add the cluster labels
parroquias_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

gye_merged = dfg

# join the parish data with the sorted venues to add the cluster label and top venues for each parish
gye_merged = gye_merged.join(parroquias_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

gye1 = gye_merged.drop('Postal Code', axis=1)

gye1.head() # check the last columns

Unnamed: 0,Neighborhood,latitud,longitud,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AYACUCHO,-2.20548,-79.89029,1,Seafood Restaurant,Restaurant,Bakery,Automotive Retail,BBQ Joint,Fast Food Restaurant,Office Supply Store,Coffee Shop,Miscellaneous Store,Night Club
1,BOLIVAR,-2.19923,-79.88992,1,Restaurant,Automotive Retail,Grocery Store / Supermarket,Hardware Store,Seafood Restaurant,Bank,Clothing Store,Drugstore,Department Store,Design Studio
2,FEBRES CORDERO,-2.2034,-79.93424,2,Breakfast Spot,Restaurant,Seafood Restaurant,BBQ Joint,Electric Vehicle Charging Station,Bar,American Restaurant,Peruvian Restaurant,Performing Arts Venue,Pastry Shop
3,GARCIA MORENO,-2.20836,-79.8993,1,Sandwich Restaurant,Drugstore,Automotive Retail,Restaurant,Seafood Restaurant,Health and Beauty Service,Electronics Store,Department Store,Fried Chicken Joint,Furniture and Home Store
4,LETAMENDI,-2.21127,-79.90823,0,Drugstore,Farmers' Market,South American Restaurant,Convenience Store,Restaurant,Food Truck,Fast Food Restaurant,South Indian Restaurant,Performing Arts Venue,Pastry Shop


We display the map with the clusters produced by the K-means algorithm; parishes in the same cluster share the same marker color.

In [34]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create the map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set the color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add the markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(gye_merged['latitud'], gye_merged['longitud'], gye_merged['Neighborhood'], gye_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
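
The cell above looks up colors with `rainbow[cluster-1]`, which works for cluster 0 only because Python wraps negative indices around to the end of the list. A sketch of a more direct mapping, one color per label indexed by the label itself, is shown below. It uses the standard-library `colorsys` module in place of matplotlib purely to stay self-contained; `cluster_palette` and the saturation and brightness values are assumptions made for this example.

```python
import colorsys


def cluster_palette(n_clusters, saturation=0.8, value=0.9):
    """Return n_clusters hex colors with evenly spaced hues."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, saturation, value)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)
# the color for a cluster label is then simply palette[label]
print(palette)
```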

We check the venue categories of the parishes in each cluster, including those with the least commercial activity.

In [37]:
gye_merged.loc[gye_merged['Cluster Labels'] == 0, gye_merged.columns[[1] + list(range(5, gye_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,LETAMENDI,Drugstore,Farmers' Market,South American Restaurant,Convenience Store,Restaurant,Food Truck,Fast Food Restaurant,South Indian Restaurant,Performing Arts Venue,Pastry Shop


In [38]:
gye_merged.loc[gye_merged['Cluster Labels'] == 1, gye_merged.columns[[1] + list(range(5, gye_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AYACUCHO,Seafood Restaurant,Restaurant,Bakery,Automotive Retail,BBQ Joint,Fast Food Restaurant,Office Supply Store,Coffee Shop,Miscellaneous Store,Night Club
1,BOLIVAR,Restaurant,Automotive Retail,Grocery Store / Supermarket,Hardware Store,Seafood Restaurant,Bank,Clothing Store,Drugstore,Department Store,Design Studio
3,GARCIA MORENO,Sandwich Restaurant,Drugstore,Automotive Retail,Restaurant,Seafood Restaurant,Health and Beauty Service,Electronics Store,Department Store,Fried Chicken Joint,Furniture and Home Store
5,9 DE OCTUBRE,Pizzeria,Burger Joint,Arts and Entertainment,Department Store,BBQ Joint,Latin American Restaurant,Coffee Shop,Health and Beauty Service,Hot Dog Joint,Cupcake Shop
7,ROCA,Drugstore,Latin American Restaurant,Seafood Restaurant,Bar,Vegan and Vegetarian Restaurant,Restaurant,Music Store,Internet Cafe,Sandwich Restaurant,Plaza
8,TARQUI,Furniture and Home Store,Restaurant,Health and Beauty Service,Miscellaneous Store,Design Studio,Burger Joint,Boutique,Diner,Pizzeria,Electronics Store
9,URDANETA,Seafood Restaurant,Automotive Retail,Bakery,Latin American Restaurant,Fast Food Restaurant,Arts and Entertainment,Automotive Service,BBQ Joint,Health and Beauty Service,Bank


In [39]:
gye_merged.loc[gye_merged['Cluster Labels'] == 2, gye_merged.columns[[1] + list(range(5, gye_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,FEBRES CORDERO,Breakfast Spot,Restaurant,Seafood Restaurant,BBQ Joint,Electric Vehicle Charging Station,Bar,American Restaurant,Peruvian Restaurant,Performing Arts Venue,Pastry Shop


In [40]:
gye_merged.loc[gye_merged['Cluster Labels'] == 3, gye_merged.columns[[1] + list(range(5, gye_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,PASCUALES,Chinese Restaurant,Bakery,Miscellaneous Store,South American Restaurant,American Restaurant,Nail Salon,Pet Service,Peruvian Restaurant,Performing Arts Venue,Pastry Shop


In [65]:
gye_merged.loc[gye_merged['Cluster Labels'] == 4, gye_merged.columns[[1] + list(range(5, gye_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,XIMENA,Breakfast Spot,Chinese Restaurant,Food and Beverage Retail,Burger Joint,Boutique,Stadium,Bank,Bakery,Park,Swimming Pool


## Results and discussion

Clustering Guayaquil's urban parishes according to the number of businesses within a 500-metre radius shows that the future entrepreneur's business is clearly favored if it is located at the center of the diamond formed by the parishes Ayacucho, Bolívar, García Moreno, 9 de Octubre, Roca, Tarqui and Urdaneta.

Likewise, of all the business types that make up this cluster, Night Club is the one with the lowest frequency, so this type of business would be advisable, since there is little supply of it in such a commercial area.
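
One way to check how often a category appears within a cluster is to count the parishes whose top-10 list contains it. The sketch below does this for an excerpt of three parishes from the cluster 1 table (AYACUCHO, BOLIVAR and GARCIA MORENO); `category_presence` is a hypothetical helper, and the full cluster would need all seven parishes.

```python
from collections import Counter

# top-10 categories for three of the cluster 1 parishes, copied from the table above
top_venues = {
    "AYACUCHO": ["Seafood Restaurant", "Restaurant", "Bakery", "Automotive Retail", "BBQ Joint",
                 "Fast Food Restaurant", "Office Supply Store", "Coffee Shop",
                 "Miscellaneous Store", "Night Club"],
    "BOLIVAR": ["Restaurant", "Automotive Retail", "Grocery Store / Supermarket", "Hardware Store",
                "Seafood Restaurant", "Bank", "Clothing Store", "Drugstore", "Department Store",
                "Design Studio"],
    "GARCIA MORENO": ["Sandwich Restaurant", "Drugstore", "Automotive Retail", "Restaurant",
                      "Seafood Restaurant", "Health and Beauty Service", "Electronics Store",
                      "Department Store", "Fried Chicken Joint", "Furniture and Home Store"],
}


def category_presence(top_venues):
    """Count in how many parishes each category appears among the top venues."""
    counts = Counter()
    for categories in top_venues.values():
        counts.update(set(categories))
    return counts


presence = category_presence(top_venues)
lowest = min(presence.values())
rarest = sorted(cat for cat, n in presence.items() if n == lowest)
print(presence["Restaurant"], presence["Night Club"])  # 3 1
```

Categories that tie for the lowest count are returned together in `rarest`.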

## Conclusion

After analyzing the collected data and the geographic positions of the most popular venues in each urban parish of Guayaquil, we can conclude that a Night Club at the center of the area formed by the diamond of parishes identified above could be the best alternative: it is a highly commercial zone of Guayaquil, yet the supply for this type of venture is insufficient.