## Proyecto final La Batalla de los Vecindarios 
## Ciudad de La Paz Bolivia


## Introducción

En Bolivia la ciudad de La Paz, es la sede de gobierno. Es el centro político, financiero, social, académico y cultural más importante del país, además de ser la ciudad con mayor nivel de desarrollo sostenible en Bolivia.
Con una población estimada de 940 000 habitantes (en 2020), La Paz es la tercera ciudad más poblada del país, detrás de Santa Cruz de la Sierra.
La ciudad de La Paz se convirtió en un importante centro político, administrativo, económico y financiero es responsable de generar el 27 % del Producto Interno Bruto del país, además de ser la sede central de la mayoría de bancos, empresas e industrias bolivianas.
La Paz es el centro cultural más importante de Bolivia, motivo por el cual es la ciudad más visitada por turistas internacionales y bolivianos.


## Definición del Problema Comercial


Debido a que La Paz es el centro cultural más importante de Bolivia, motivo por el cual es la ciudad más visitada por turistas internacionales y bolivianos, un grupo de inversionistas desea realizar un estudio del departamento de La Paz para determinar la zona con mayor potencial para desarrollar un proyecto de inversión para turistas.


Utilizaremos los datos de la API de FOURSQUARE, que incluyen variables como locales comerciales, centros turísticos,las coordenadas geográficas, direcciones, la oferta de servicios públicos, entre otras.

Utilizaremos información de la página de Wikipedia en el siguiente link https://es.wikipedia.org/wiki/La_Paz
Se realizará un proceso de limpieza a los datos y se analizarán para determinar la mejor zona de inversión.

## Fuente de datos

Vamos a descargar todas las dependencias necesarias.

In [1]:
import requests # librería para manejar las solicitudes
import pandas as pd # librería para análisis de datos
import numpy as np # librería para manejar datos vectorizados
import random # librería para generar números aleatorios
from bs4 import BeautifulSoup

!pip install geopy
from geopy.geocoders import Nominatim # módulo para convertir una dirección en valores de latitud y longitud 

# librerías para mostrar imágenes 
from IPython.display import Image 
from IPython.core.display import HTML 
    
# librería para convertir un archivo json en un dataframe pandas
from pandas.io.json import json_normalize


! pip install folium==0.5.0
import folium # librería para graficar 
from sklearn.cluster import KMeans 
from sklearn.datasets.samples_generator import make_blobs

print('Folium installed')
print('Libraries imported.')



  LARGE_SPARSE_SUPPORTED = LooseVersion(scipy_version) >= '0.14.0'


Folium installed
Libraries imported.


Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  eps=np.finfo(np.float).eps,
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  eps=np.finfo(np.float).eps, copy_X=True, fit_path=True,
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  eps=np.finfo(np.float).eps, copy_X=True, fit_path=True,
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  eps=np.finfo(np.float).eps, positive=False):
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  max_n_alphas=1000, n_jobs=None, eps=np.finfo(np.float).eps,
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  max_n_alpha

#### Descargamos y exploramos el conjuno de datos


La Paz tiene ocho municipios, obtenemos los datos de los municipios y los transformamos en un dataframe. 

In [2]:
URL = "https://es.wikipedia.org/wiki/%C3%81rea_metropolitana_de_La_Paz"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
table = soup.find('table', class_='wikitable')
df = pd.read_html(str(table))[0]
df

Unnamed: 0,Provincia,Municipio,Superficie,Población
0,Murillo,La Paz,3.04,779.728
1,Murillo,Palca,737.0,16.959
2,Murillo,Mecapaca,511.0,16.324
3,Murillo,Achocalla,182.0,22.594
4,Murillo,El Alto,345.0,860.062
5,Ingavi,Viacha,849.0,81.668
6,Los Andes,Pucarani,930.0,29.040
7,Los Andes,Laja,691.0,24.975
8,Total,Total,7.284,1.831.350


Eliminamos la información que no necesitamos

In [3]:
df = df.drop(index=[8])
df

Unnamed: 0,Provincia,Municipio,Superficie,Población
0,Murillo,La Paz,3.04,779.728
1,Murillo,Palca,737.0,16.959
2,Murillo,Mecapaca,511.0,16.324
3,Murillo,Achocalla,182.0,22.594
4,Murillo,El Alto,345.0,860.062
5,Ingavi,Viacha,849.0,81.668
6,Los Andes,Pucarani,930.0,29.04
7,Los Andes,Laja,691.0,24.975


En esta tabla se puede observar que los Municipios con mayor población son La Paz y El Alto.

Ahora obtendremos los datos de las coordenadas de los Municipios con la libreria de Geopy.

Creamos una función con Geopy y agregamos los datos de las coordenadas nuestro dataframe.

In [4]:
geolocator = Nominatim(user_agent="s_s_explorer")

def get_coordinates(row):
    location = geolocator.geocode(row['Municipio'])
    if row['Municipio'] == 'La Paz':
        location = geolocator.geocode('La Paz, Bolivia')
        return pd.Series({'Latitud': location.latitude, 'Longitud': location.longitude})
    else:
        if row['Municipio'] == 'El Alto':
            location = geolocator.geocode('El Alto, Bolivia')
            return pd.Series({'Latitud': location.latitude, 'Longitud': location.longitude})
        else:
            if location:
                return pd.Series({'Latitud': location.latitude, 'Longitud': location.longitude})
            else:
                return pd.Series({'Latitud': None, 'Longitud': None})

df[['Latitud', 'Longitud']] = df.apply(get_coordinates, axis=1)
df

Unnamed: 0,Provincia,Municipio,Superficie,Población,Latitud,Longitud
0,Murillo,La Paz,3.04,779.728,-16.495545,-68.133623
1,Murillo,Palca,737.0,16.959,-16.559034,-67.951544
2,Murillo,Mecapaca,511.0,16.324,-16.667952,-68.017938
3,Murillo,Achocalla,182.0,22.594,-16.568074,-68.170956
4,Murillo,El Alto,345.0,860.062,-16.504823,-68.162434
5,Ingavi,Viacha,849.0,81.668,-16.653486,-68.302189
6,Los Andes,Pucarani,930.0,29.04,-16.399816,-68.477922
7,Los Andes,Laja,691.0,24.975,-37.279511,-72.714921


Creamos un mapa de La Paz, utilizamos la libreria Folium

In [5]:
map_la_paz = folium.Map(location=[df['Latitud'][0], df['Longitud'][0]], zoom_start=11)

# añadir marcadores al mapa
for lat, lng, Municipio in zip(df['Latitud'], df['Longitud'], df['Municipio']):
    label = '{}'.format(Municipio)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_la_paz)  
    
map_la_paz

Exploramos la información de La Paz, buscaremos los lugares por categorias utilizaremos el sitio de Foursquare y las pasamos a un dataframe.

In [6]:
URL = "https://location.foursquare.com/places/docs/categories"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
table_ = soup.find('table')
df_categories = pd.read_html(str(table_))[0]
df_categories.head()

Unnamed: 0,Category IDs,Category Labels,Countries Supported
0,10000,Arts and Entertainment,all
1,10001,Arts and Entertainment > Amusement Park,all
2,10002,Arts and Entertainment > Aquarium,all
3,10003,Arts and Entertainment > Arcade,all
4,10004,Arts and Entertainment > Art Gallery,all


In [7]:
df_categories = df_categories[~df_categories["Category Labels"].str.contains(">")]
df_categories = df_categories.drop("Countries Supported", axis=1)
df_categories = df_categories.reset_index(drop=True)
df_categories

Unnamed: 0,Category IDs,Category Labels
0,10000,Arts and Entertainment
1,11000,Business and Professional Services
2,12000,Community and Government
3,13000,Dining and Drinking
4,14000,Event
5,15000,Health and Medicine
6,16000,Landmarks and Outdoors
7,17000,Retail
8,18000,Sports and Recreation
9,19000,Travel and Transportation


Cargamos las creenciales 

In [8]:
CLIENT_ID = '55S2ZJJJVK3T1D1ZDNQLHEAH15HG4AHHMYJZNETKJK5EXKBB' # su ID de Foursquare
CLIENT_SECRET = 'NRH3ODLT453CFZY2C0P33DJEDKCXROHBNC1OQSMT52ST3TD4' # su Secreto de Cliente de Foursquare
VERSION = '20180604'
LIMIT = 50
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 55S2ZJJJVK3T1D1ZDNQLHEAH15HG4AHHMYJZNETKJK5EXKBB
CLIENT_SECRET:NRH3ODLT453CFZY2C0P33DJEDKCXROHBNC1OQSMT52ST3TD4


#### Definir la URL correspondiente

#### Procesar el objeto JSON, convetirlo y limpiar el dataframe

Exploramos la información de La Paz en FOURSQUARE.

In [9]:
import requests

url = "https://api.foursquare.com/v3/places/search"

params = {

  	"near": "La Paz, Bo",
  	"open_now": "true",
  	"sort":"DISTANCE"
}

headers = {
    "Accept": "application/json",
    "Authorization": "fsq3a5oXiMEiuZYusTTUUGz8CHLfGZvmouDBQSuiyW4QVys="
}

results= requests.request("GET", url, params=params, headers=headers).json()

results

{'results': [{'fsq_id': '6096cb7fc9549334f85f1e1b',
   'categories': [{'id': 17046,
     'name': 'Lingerie Store',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/apparel_lingerie_',
      'suffix': '.png'}}],
   'chains': [],
   'distance': 169907,
   'geocodes': {'main': {'latitude': -16.495956, 'longitude': -68.135796}},
   'link': '/v3/places/6096cb7fc9549334f85f1e1b',
   'location': {'address': 'Shopping Norte 2do Nivel Of. 252',
    'country': 'BO',
    'cross_street': 'Potosí y Yanacocha',
    'formatted_address': 'Shopping Norte 2do Nivel Of. 252 (Potosí y Yanacocha), La Paz',
    'locality': 'Provincia Murillo',
    'region': 'La Paz'},
   'name': 'E&E Lencería',
   'related_places': {'parent': {'fsq_id': '4d5fbf9d618aa0900bc5f7e1',
     'name': 'Shopping Norte'}},
   'timezone': 'America/La_Paz'},
  {'fsq_id': '4fc26a7be4b02db73e89295d',
   'categories': [{'id': 13026,
     'name': 'BBQ Joint',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories

In [10]:
# asignar la parte relevante del JSON a la variable venues 
venues = results['results']

# convertir venues en un dataframe
dataframe = json_normalize(venues)
dataframe.head()

  """


Unnamed: 0,fsq_id,categories,chains,distance,link,name,timezone,geocodes.main.latitude,geocodes.main.longitude,location.address,location.country,location.cross_street,location.formatted_address,location.locality,location.region,related_places.parent.fsq_id,related_places.parent.name,location.postcode
0,6096cb7fc9549334f85f1e1b,"[{'id': 17046, 'name': 'Lingerie Store', 'icon...",[],169907,/v3/places/6096cb7fc9549334f85f1e1b,E&E Lencería,America/La_Paz,-16.495956,-68.135796,Shopping Norte 2do Nivel Of. 252,BO,Potosí y Yanacocha,Shopping Norte 2do Nivel Of. 252 (Potosí y Yan...,Provincia Murillo,La Paz,4d5fbf9d618aa0900bc5f7e1,Shopping Norte,
1,4fc26a7be4b02db73e89295d,"[{'id': 13026, 'name': 'BBQ Joint', 'icon': {'...",[],169959,/v3/places/4fc26a7be4b02db73e89295d,La Casa de mi Viejita Restaurant,America/La_Paz,-16.49573,-68.11761,Av Pasoskanki,BO,Esquinado Cruce de Villas,Av Pasoskanki (Esquinado Cruce de Villas),,,,,
2,5c7f3f1f2be425002c9feff7,"[{'id': 13031, 'name': 'Burger Joint', 'icon':...",[],170030,/v3/places/5c7f3f1f2be425002c9feff7,Mestiza,America/La_Paz,-16.49714,-68.138003,Calle Sagarnaga #227,BO,Entre Linares Y Murillo,Calle Sagarnaga #227 (Entre Linares Y Murillo)...,La Paz,La Paz,,,
3,4ee7cfa3722e4c7e7133aeb0,"[{'id': 13034, 'name': 'Café', 'icon': {'prefi...",[],170076,/v3/places/4ee7cfa3722e4c7e7133aeb0,Cafe Del Mundo,America/La_Paz,-16.49757,-68.138618,,BO,,La Paz,La Paz,La Paz,,,
4,5c5b01eb345cbe002cfeb67b,"[{'id': 13377, 'name': 'Vegan and Vegetarian R...",[],170086,/v3/places/5c5b01eb345cbe002cfeb67b,Bolivia Green Kitchen,America/La_Paz,-16.497656,-68.13851,Saranaga 315,BO,,"Saranaga 315, La Paz",La Paz,La Paz,,,


#### Definimos la información de interes y filtramos el dataframe

In [11]:
# mantener unicamente las columnas que incluyan el nombre del lugar y cualquier cosa asociada a la ubicación
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['fsq_id'] + [col for col in dataframe.columns if col.startswith('geocodes.main')]
dataframe_filtered = dataframe.loc[:, filtered_columns]

# función para extraer la categoría del lugar
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filtrar la categoría para cada fila
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# limpiar la columna "names" manteniendo solo el último término
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,country,cross_street,formatted_address,locality,region,postcode,fsq_id,latitude,longitude
0,E&E Lencería,Lingerie Store,Shopping Norte 2do Nivel Of. 252,BO,Potosí y Yanacocha,Shopping Norte 2do Nivel Of. 252 (Potosí y Yan...,Provincia Murillo,La Paz,,6096cb7fc9549334f85f1e1b,-16.495956,-68.135796
1,La Casa de mi Viejita Restaurant,BBQ Joint,Av Pasoskanki,BO,Esquinado Cruce de Villas,Av Pasoskanki (Esquinado Cruce de Villas),,,,4fc26a7be4b02db73e89295d,-16.49573,-68.11761
2,Mestiza,Burger Joint,Calle Sagarnaga #227,BO,Entre Linares Y Murillo,Calle Sagarnaga #227 (Entre Linares Y Murillo)...,La Paz,La Paz,,5c7f3f1f2be425002c9feff7,-16.49714,-68.138003
3,Cafe Del Mundo,Café,,BO,,La Paz,La Paz,La Paz,,4ee7cfa3722e4c7e7133aeb0,-16.49757,-68.138618
4,Bolivia Green Kitchen,Vegan and Vegetarian Restaurant,Saranaga 315,BO,,"Saranaga 315, La Paz",La Paz,La Paz,,5c5b01eb345cbe002cfeb67b,-16.497656,-68.13851
5,Sol y Luna,Restaurant,calle Murillo 999,BO,calle Cochabamba,"calle Murillo 999 (calle Cochabamba), La Paz",La Paz,La Paz,,4d02c95d8620224bd7f69e40,-16.498093,-68.137004
6,Hb Bronze Coffeebar,Coffee Shop,Plaza Thomas Frías 1570,BO,,"Plaza Thomas Frías 1570, La Paz",La Paz,La Paz,,5820a83567957a26e48ff29d,-16.498087,-68.13054
7,Restaurante Casa de España,Spanish Restaurant,Av. Camacho Nº1484,BO,Bueno,"Av. Camacho Nº1484 (Bueno), La Paz 00000",Provincia Murillo,La Paz,0.0,53d81ed4498ee5928d83c962,-16.500382,-68.132763
8,Hispania,Paella Restaurant,Av. Camacho,BO,Calle Bueno,"Av. Camacho (Calle Bueno), La Paz 5912",La Paz,La Paz,5912.0,56084342498e9b727b12a5ef,-16.500611,-68.131628
9,Brosso,Ice Cream Parlor,Av. 16 de julio (El Prado) #1473,BO,,"Av. 16 de julio (El Prado) #1473, La Paz",La Paz,LP,,4d0cde647d28721e1c78f520,-16.500951,-68.13308


Exploramos la información

In [12]:
dataframe_filtered.name

0                        E&E Lencería
1    La Casa de mi Viejita Restaurant
2                             Mestiza
3                      Cafe Del Mundo
4               Bolivia Green Kitchen
5                          Sol y Luna
6                 Hb Bronze Coffeebar
7          Restaurante Casa de España
8                            Hispania
9                              Brosso
Name: name, dtype: object

In [13]:
print(dataframe_filtered.shape)
dataframe_filtered.head()

(10, 12)


Unnamed: 0,name,categories,address,country,cross_street,formatted_address,locality,region,postcode,fsq_id,latitude,longitude
0,E&E Lencería,Lingerie Store,Shopping Norte 2do Nivel Of. 252,BO,Potosí y Yanacocha,Shopping Norte 2do Nivel Of. 252 (Potosí y Yan...,Provincia Murillo,La Paz,,6096cb7fc9549334f85f1e1b,-16.495956,-68.135796
1,La Casa de mi Viejita Restaurant,BBQ Joint,Av Pasoskanki,BO,Esquinado Cruce de Villas,Av Pasoskanki (Esquinado Cruce de Villas),,,,4fc26a7be4b02db73e89295d,-16.49573,-68.11761
2,Mestiza,Burger Joint,Calle Sagarnaga #227,BO,Entre Linares Y Murillo,Calle Sagarnaga #227 (Entre Linares Y Murillo)...,La Paz,La Paz,,5c7f3f1f2be425002c9feff7,-16.49714,-68.138003
3,Cafe Del Mundo,Café,,BO,,La Paz,La Paz,La Paz,,4ee7cfa3722e4c7e7133aeb0,-16.49757,-68.138618
4,Bolivia Green Kitchen,Vegan and Vegetarian Restaurant,Saranaga 315,BO,,"Saranaga 315, La Paz",La Paz,La Paz,,5c5b01eb345cbe002cfeb67b,-16.497656,-68.13851


#### Exploraremos una Ubicación, La Paz
## Explorar Sitios Populares

In [14]:
address = 'Calacoto, La Paz'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of La Paz are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of La Paz are -16.5419857, -68.0787444.


In [15]:
search_query = 'San Miguel'
radius = 500
print(search_query + ' .... OK!')

San Miguel .... OK!


In [16]:
url = 'https://api.foursquare.com/v2/venues/trending?client_id={}&client_secret={}&ll={},{}&v={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION)

results= requests.request("GET", url, params=params, headers=headers).json()

results

{'meta': {'code': 200, 'requestId': '645687f0b06b8f4147330270'},
 'response': {'venues': [],
  'geocode': {'what': '',
   'where': 'la paz bo',
   'feature': {'cc': 'BO',
    'name': 'La Paz',
    'displayName': 'La Paz',
    'matchedName': 'La Paz, BO',
    'highlightedName': '<b>La Paz</b>, <b>BO</b>',
    'woeType': 7,
    'slug': 'la-paz-bolivia',
    'id': 'geonameid:3911925',
    'longId': '72057594041839861',
    'geometry': {'center': {'lat': -16.5, 'lng': -68.15},
     'bounds': {'ne': {'lat': -16.425547, 'lng': -68.081635},
      'sw': {'lat': -16.557226, 'lng': -68.222519}}}},
   'parents': []}}}

In [17]:
if len(results['response']['venues']) == 0:
    trending_venues_df = 'No trending venues are available at the moment!'
    
else:
    trending_venues = results['response']['venues']
    trending_venues_df = json_normalize(trending_venues)

    # filtrar columnas
    columns_filtered = ['name', 'categories'] + ['location.distance', 'location.city', 'location.postalCode', 'location.state', 'location.country', 'location.lat', 'location.lng']
    trending_venues_df = trending_venues_df.loc[:, columns_filtered]

    # filtrar la categoría para cada fila
    trending_venues_df['categories'] = trending_venues_df.apply(get_category_type, axis=1)

In [18]:
# mostrar los sitios populares
trending_venues_df

'No trending venues are available at the moment!'

La plataforma no tiene los sitios más populares de Calacoto en este momento.

In [51]:
if len(results['response']['venues']) == 0:
    venues_map = 'Cannot generate visual as no trending venues are available at the moment!'

else:
    venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred around Calacoto


    # añadir Ecco como una marca circular roja
    folium.CircleMarker(
        [latitude, longitude],
        radius=10,
        popup='Café',
        fill=True,
        color='red',
        fill_color='red',
        fill_opacity=0.6
    ).add_to(venues_map)


    # añadir los sitios populares como marcas circulares azules
    for lat, lng, label in zip(trending_venues_df['location.lat'], trending_venues_df['location.lng'], trending_venues_df['name']):
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            poup=label,
            fill=True,
            color='blue',
            fill_color='blue',
            fill_opacity=0.6
        ).add_to(venues_map)

In [52]:
# mostrar el mapa
venues_map

'Cannot generate visual as no trending venues are available at the moment!'

La plataforma tampoco pudo generar un mapa de los sitios más populares de Calacoto en este momento.

## Continuando con el análisis, vamos crear un dataframe con la información de Foursquare API

In [21]:
print(dataframe_filtered.shape)
dataframe_filtered.head()

(10, 12)


Unnamed: 0,name,categories,address,country,cross_street,formatted_address,locality,region,postcode,fsq_id,latitude,longitude
0,E&E Lencería,Lingerie Store,Shopping Norte 2do Nivel Of. 252,BO,Potosí y Yanacocha,Shopping Norte 2do Nivel Of. 252 (Potosí y Yan...,Provincia Murillo,La Paz,,6096cb7fc9549334f85f1e1b,-16.495956,-68.135796
1,La Casa de mi Viejita Restaurant,BBQ Joint,Av Pasoskanki,BO,Esquinado Cruce de Villas,Av Pasoskanki (Esquinado Cruce de Villas),,,,4fc26a7be4b02db73e89295d,-16.49573,-68.11761
2,Mestiza,Burger Joint,Calle Sagarnaga #227,BO,Entre Linares Y Murillo,Calle Sagarnaga #227 (Entre Linares Y Murillo)...,La Paz,La Paz,,5c7f3f1f2be425002c9feff7,-16.49714,-68.138003
3,Cafe Del Mundo,Café,,BO,,La Paz,La Paz,La Paz,,4ee7cfa3722e4c7e7133aeb0,-16.49757,-68.138618
4,Bolivia Green Kitchen,Vegan and Vegetarian Restaurant,Saranaga 315,BO,,"Saranaga 315, La Paz",La Paz,La Paz,,5c5b01eb345cbe002cfeb67b,-16.497656,-68.13851


Revisemos cuantos lugares se tiene para cada categoria

In [22]:
def getNearbyPlaces(names, latitudes, longitudes, categorias, radius=1000):
    LIMIT=50
    places_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        for category in zip(categorias):
            
            # crear la URL de solicitud de API
            url = "https://api.foursquare.com/v3/places/search?ll={}%2C{}&radius={}&categories{}=&limit={}".format(lat, lng, radius, category, LIMIT)
            headers = {
            "accept": "application/json",
            "Authorization": "fsq3a5oXiMEiuZYusTTUUGz8CHLfGZvmouDBQSuiyW4QVys="
            }
            
            # solicitud GET
            results = requests.get(url, headers=headers).json()
            places = results['results']

        
            # regresa solo información relevante de cada sitio cercano
            places_list.append([(
                name, 
                lat, 
                lng, 
                v['name'], 
                v['geocodes']['main']['latitude'], 
                v['geocodes']['main']['longitude'],  
                (v['categories'][0]['name'] if len(v['categories']) > 0 else None)) for v in places])
       


    nearby_places = pd.DataFrame([item for place_list in places_list for item in place_list])
    nearby_places.columns = ['Municipio', 
                  'Municipio Latitud', 
                  'municipio Longitud', 
                  'Place', 
                  'Place Latitud', 
                  'Place Longitud', 
                  'Place Category']
    
    return(nearby_places)

In [23]:
la_paz_places = getNearbyPlaces(names=df['Municipio'],
                                   latitudes=df['Latitud'],
                                   longitudes=df['Longitud'],
                                    categorias=df_categories['Category IDs']
                                  )

La Paz
Palca
Mecapaca
Achocalla
El Alto
Viacha
Pucarani
Laja


Analicemos el dataset de Venues

In [24]:
print(la_paz_places.shape)
la_paz_places.head()

(1220, 7)


Unnamed: 0,Municipio,Municipio Latitud,municipio Longitud,Place,Place Latitud,Place Longitud,Place Category
0,La Paz,-16.495545,-68.133623,Plaza Murillo,-16.495688,-68.133559,Plaza
1,La Paz,-16.495545,-68.133623,The Writer's Coffee,-16.496536,-68.1329,Coffee Shop
2,La Paz,-16.495545,-68.133623,Museo nacional de arte,-16.495748,-68.134499,Art Museum
3,La Paz,-16.495545,-68.133623,Ali Pacha,-16.497424,-68.133053,Vegan and Vegetarian Restaurant
4,La Paz,-16.495545,-68.133623,Alexander Coffee,-16.496545,-68.13464,Coffee Shop


Podemos observar que hay 1220 venues en los 7 Municipios en nuestra dataset. Podemos observar que nos muestra los lugares "Place" con su descripción de categoría, como la Plaza Murillo.

Utilizaremos esta información para categorizar utilizando agrupaciones por K_Means.

Cuantifiquemos la cantidad de Venues por Municipio.

In [25]:
la_paz_places.groupby('Municipio').count()

Unnamed: 0_level_0,Municipio Latitud,municipio Longitud,Place,Place Latitud,Place Longitud,Place Category
Municipio,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Achocalla,10,10,10,10,10,10
El Alto,340,340,340,340,340,340
La Paz,500,500,500,500,500,500
Laja,350,350,350,350,350,290
Viacha,20,20,20,20,20,20


Observemos las categorías únicas

In [26]:
print('There are {} uniques categories.'.format(len(la_paz_places['Place Category'].unique())))

There are 63 uniques categories.


Se tiene 63 categorías únicas. Ahora los agrupamos por categoría del lugar y analizaremos la información.

In [27]:
la_paz_places.groupby('Place Category').count()

Unnamed: 0_level_0,Municipio,Municipio Latitud,municipio Longitud,Place,Place Latitud,Place Longitud
Place Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Accounting and Bookkeeping Service,10,10,10,10,10,10
Argentinian Restaurant,10,10,10,10,10,10
Art Museum,30,30,30,30,30,30
Arts and Entertainment,50,50,50,50,50,50
Automotive Retail,20,20,20,20,20,20
...,...,...,...,...,...,...
Sushi Restaurant,7,7,7,7,7,7
Swimming Pool,10,10,10,10,10,10
Theater,13,13,13,13,13,13
Travel Agency,20,20,20,20,20,20


Se observa información de restaurantes veganos, sushi, museos, piscinas...

Ahora necesitamos la frecuencia para cada lugar, para esto necesitamos convertir a one-hot vector.

In [28]:
# codificación
la_paz_one = pd.get_dummies(la_paz_places[['Place Category']], prefix="", prefix_sep="")

# añadir la columna de barrio de regreso al dataframe
la_paz_one['Municipio'] = la_paz_places['Municipio']


# mover la columna de barrio a la primer columna
fixed_columns = [la_paz_one.columns[-1]] + list(la_paz_one.columns[:-1])
la_paz_one = la_paz_one[fixed_columns]

la_paz_one.head()

Unnamed: 0,Municipio,Accounting and Bookkeeping Service,Argentinian Restaurant,Art Museum,Arts and Entertainment,Automotive Retail,BBQ Joint,Bakery,Bank,Bar,...,South American Restaurant,Spiritual Center,Sports and Recreation,Stadium,Steakhouse,Sushi Restaurant,Swimming Pool,Theater,Travel Agency,Vegan and Vegetarian Restaurant
0,La Paz,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,La Paz,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,La Paz,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,La Paz,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,La Paz,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [29]:
la_paz_one.shape

(1220, 63)

Agrupando las filas tomando la média de la frecuancia de la ocurrencia de cada categoría

In [30]:
la_paz_grouped = la_paz_one.groupby('Municipio').mean().reset_index()
la_paz_grouped

Unnamed: 0,Municipio,Accounting and Bookkeeping Service,Argentinian Restaurant,Art Museum,Arts and Entertainment,Automotive Retail,BBQ Joint,Bakery,Bank,Bar,...,South American Restaurant,Spiritual Center,Sports and Recreation,Stadium,Steakhouse,Sushi Restaurant,Swimming Pool,Theater,Travel Agency,Vegan and Vegetarian Restaurant
0,Achocalla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,El Alto,0.0,0.029412,0.0,0.147059,0.058824,0.0,0.029412,0.088235,0.0,...,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.0
2,La Paz,0.0,0.0,0.06,0.0,0.0,0.02,0.0,0.026,0.04,...,0.046,0.0,0.0,0.0,0.054,0.014,0.0,0.006,0.02,0.04
3,Laja,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,...,0.0,0.028571,0.0,0.028571,0.0,0.0,0.028571,0.0,0.028571,0.0
4,Viacha,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Veamos los TOP 5 lugares con mayor frecuencia.

In [31]:
num_top_venues = 5

for hood in la_paz_grouped['Municipio']:
    print("----"+hood+"----")
    temp = la_paz_grouped[la_paz_grouped['Municipio'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Achocalla----
                                venue  freq
0                          Restaurant   1.0
1  Accounting and Bookkeeping Service   0.0
2                 Sandwich Restaurant   0.0
3           Latin American Restaurant   0.0
4                  Mobile Phone Store   0.0


----El Alto----
                         venue  freq
0       Arts and Entertainment  0.15
1  Grocery Store / Supermarket  0.09
2                         Bank  0.09
3            Automotive Retail  0.06
4                        Plaza  0.06


----La Paz----
            venue  freq
0     Coffee Shop  0.09
1            Café  0.08
2  Breakfast Spot  0.06
3      Art Museum  0.06
4          Museum  0.06


----Laja----
                                venue  freq
0                           Drugstore  0.06
1                                Bank  0.06
2                          Restaurant  0.06
3                            Engineer  0.06
4  Accounting and Bookkeeping Service  0.03


----Viacha----
                     

Colocamos la información en un dataframe

In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

## Veamos los TOP 10 lugares con mayor frecuencia y generemos un dataframe.

In [33]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# crear las columnas acorde al numero de sitios populares
columns = ['Municipio']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# crear un nuevo dataframe
municipios_venues_sorted = pd.DataFrame(columns=columns)
municipios_venues_sorted['Municipio'] = la_paz_grouped['Municipio']

for ind in np.arange(la_paz_grouped.shape[0]):
    municipios_venues_sorted.iloc[ind, 1:] = return_most_common_venues(la_paz_grouped.iloc[ind, :], num_top_venues)

municipios_venues_sorted

Unnamed: 0,Municipio,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Achocalla,Restaurant,Accounting and Bookkeeping Service,Sandwich Restaurant,Latin American Restaurant,Mobile Phone Store,Mountain,Museum,Other Great Outdoors,Performing Arts Venue,Peruvian Restaurant
1,El Alto,Arts and Entertainment,Grocery Store / Supermarket,Bank,Automotive Retail,Plaza,Fast Food Restaurant,Electronics Store,Pizzeria,Mobile Phone Store,Pub
2,La Paz,Coffee Shop,Café,Breakfast Spot,Art Museum,Museum,Steakhouse,South American Restaurant,Latin American Restaurant,Fried Chicken Joint,Vegan and Vegetarian Restaurant
3,Laja,Drugstore,Bank,Restaurant,Engineer,Accounting and Bookkeeping Service,Retail,Farmers' Market,Food Court,Furniture and Home Store,Government Department / Agency
4,Viacha,Plaza,Soccer Field,Accounting and Bookkeeping Service,Sandwich Restaurant,Latin American Restaurant,Mobile Phone Store,Mountain,Museum,Other Great Outdoors,Performing Arts Venue


## Agrupación K-Means algorithm

In [34]:
# establecer el número de agrupaciones

kclusters = 3

la_paz_grouped_clustering = la_paz_grouped.drop('Municipio', 1)

# ejecutar k-means
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(la_paz_grouped_clustering)

# revisar las etiquetas de las agrupaciones generadas para cada fila del dataframe
labels = kmeans.labels_
labels

  """
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  max_iter=max_iter, verbose=verbose)
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype

array([2, 1, 1, 1, 0], dtype=int32)

Creamos un nuevo dataframe incluyendo la agrupación así como los 10 sitios mas populares de cada Municipi.

In [35]:
# añadir etiquetas
municipios_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

la_paz_merged = df

# juntar la_paz_grouped con df 
la_paz_merged = la_paz_merged.join(municipios_venues_sorted.set_index('Municipio'), on='Municipio')

la_paz_merged.head() # revisar las ultimas columnas

Unnamed: 0,Provincia,Municipio,Superficie,Población,Latitud,Longitud,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Murillo,La Paz,3.04,779.728,-16.495545,-68.133623,1.0,Coffee Shop,Café,Breakfast Spot,Art Museum,Museum,Steakhouse,South American Restaurant,Latin American Restaurant,Fried Chicken Joint,Vegan and Vegetarian Restaurant
1,Murillo,Palca,737.0,16.959,-16.559034,-67.951544,,,,,,,,,,,
2,Murillo,Mecapaca,511.0,16.324,-16.667952,-68.017938,,,,,,,,,,,
3,Murillo,Achocalla,182.0,22.594,-16.568074,-68.170956,2.0,Restaurant,Accounting and Bookkeeping Service,Sandwich Restaurant,Latin American Restaurant,Mobile Phone Store,Mountain,Museum,Other Great Outdoors,Performing Arts Venue,Peruvian Restaurant
4,Murillo,El Alto,345.0,860.062,-16.504823,-68.162434,1.0,Arts and Entertainment,Grocery Store / Supermarket,Bank,Automotive Retail,Plaza,Fast Food Restaurant,Electronics Store,Pizzeria,Mobile Phone Store,Pub


Creamos un mapa para visualizar los municipios agrupados

In [38]:
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
from geopy.geocoders import Nominatim

In [46]:
la_paz_merged.loc[la_paz_merged['Cluster Labels'] == 0, la_paz_merged.columns[[1] + list(range(5, la_paz_merged.shape[1]))]]


Unnamed: 0,Municipio,Longitud,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Viacha,-68.302189,0.0,Plaza,Soccer Field,Accounting and Bookkeeping Service,Sandwich Restaurant,Latin American Restaurant,Mobile Phone Store,Mountain,Museum,Other Great Outdoors,Performing Arts Venue


In [41]:
df_places= df
k_1 = 3
kmeans_1 = KMeans(n_clusters=k_1, random_state=0).fit(df_places[["Población"]])

# Añadir etiquetas de cluster al dataframe
df_places["Cluster"] = kmeans_1.labels_
df_places

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  max_iter=max_iter, verbose=verbose)
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.

Unnamed: 0,Provincia,Municipio,Superficie,Población,Latitud,Longitud,Cluster
0,Murillo,La Paz,3.04,779.728,-16.495545,-68.133623,2
1,Murillo,Palca,737.0,16.959,-16.559034,-67.951544,1
2,Murillo,Mecapaca,511.0,16.324,-16.667952,-68.017938,1
3,Murillo,Achocalla,182.0,22.594,-16.568074,-68.170956,1
4,Murillo,El Alto,345.0,860.062,-16.504823,-68.162434,0
5,Ingavi,Viacha,849.0,81.668,-16.653486,-68.302189,1
6,Los Andes,Pucarani,930.0,29.04,-16.399816,-68.477922,1
7,Los Andes,Laja,691.0,24.975,-37.279511,-72.714921,1


Creamos un mapa

In [42]:
# crear mapa
map_clusters_1 = folium.Map(location=[df['Latitud'][0], df['Longitud'][0]], zoom_start=11)

# establecer el esquema de color para las agrupaciones
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# añadir marcadores al mapa
markers_colors = []
for lat, lon, poi, cluster in zip(la_paz_merged['Latitud'], la_paz_merged['Longitud'], la_paz_merged['Municipio'], df_places['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_1)
       
map_clusters_1

In [43]:
places_cluster=la_paz_places[["Municipio", "Place", "Place Latitud", "Place Longitud"]]
places_cluster.head()

Unnamed: 0,Municipio,Place,Place Latitud,Place Longitud
0,La Paz,Plaza Murillo,-16.495688,-68.133559
1,La Paz,The Writer's Coffee,-16.496536,-68.1329
2,La Paz,Museo nacional de arte,-16.495748,-68.134499
3,La Paz,Ali Pacha,-16.497424,-68.133053
4,La Paz,Alexander Coffee,-16.496545,-68.13464


In [44]:
k_2 = 3
kmeans_2 = KMeans(n_clusters=k_2, random_state=0).fit(places_cluster[["Place Latitud", "Place Longitud"]])

# Añadir etiquetas de cluster al dataframe
places_cluster["Cluster"] = kmeans_2.labels_
places_cluster.head()

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  max_iter=max_iter, verbose=verbose)
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.

Unnamed: 0,Municipio,Place,Place Latitud,Place Longitud,Cluster
0,La Paz,Plaza Murillo,-16.495688,-68.133559,0
1,La Paz,The Writer's Coffee,-16.496536,-68.1329,0
2,La Paz,Museo nacional de arte,-16.495748,-68.134499,0
3,La Paz,Ali Pacha,-16.497424,-68.133053,0
4,La Paz,Alexander Coffee,-16.496545,-68.13464,0


In [45]:
# crear mapa
map_clusters_2 = folium.Map(location=[df['Latitud'][0], df['Longitud'][0]], zoom_start=11)

# establecer el esquema de color para las agrupaciones
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# añadir marcadores al mapa
markers_colors = []
for lat, lon, poi, cluster in zip(places_cluster['Place Latitud'], places_cluster['Place Longitud'], places_cluster['Place'], places_cluster['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_2)
       
map_clusters_2

Los lugares más proximos y en mayor cantidad estan ubicados en las zonas de El Alto y la zona central de La Paz.
Para la zona Sur de La Paz, todavía no se tiene información.

## Conclusión

Como conclusión, se recomienda la zona del centro de La Paz, para realizar inversiones turísticas, ya que son las más frecuentadas.
La calle Potosí es donde se ubican más lugares frecuentados.
