<h1 align=center><font size = 5>Segmentacion y Agrupacion de Municipios del AMSS</font></h1>


## Introduccion

### Seccion I: Analisis del problema y antecedentes


Descripción del problema y antecedentes:
El área metropolitana de San Salvador, capital de El Salvador, es una zona altamente poblada y compleja desde el punto de vista social, económico y geográfico. En esta área conviven 14 municipios con características y necesidades distintas, lo que dificulta la toma de decisiones por parte de las autoridades y la identificación de problemáticas comunes.

En este contexto, el objetivo de este proyecto es agrupar y segmentar los municipios del área metropolitana de San Salvador con el fin de identificar patrones y tendencias relevantes que permitan mejorar la comprensión de la dinámica de la zona y la toma de decisiones.

En términos de antecedentes, existen diversos trabajos previos relacionados con la agrupación y segmentación de municipios en otras partes del mundo. Por ejemplo, en México se ha utilizado el análisis de conglomerados para identificar grupos de municipios con características similares en términos de indicadores socioeconómicos y de salud. En Colombia se ha utilizado el análisis de componentes principales para identificar factores clave que explican la variabilidad entre los municipios en términos de desarrollo humano.

### seccion II: Descripción de los datos y su uso en el proyecto

Los datos utilizados en este proyecto provienen de diversas fuentes, como datos geograficos de la api de FOURSQUARE, entre otras. Estos datos incluyen variables como locales comerciales, centros turisticos, el nivel socioeconómico, la oferta de servicios públicos, entre otras.

Para utilizar estos datos en el proyecto, se realizará un proceso de limpieza y procesamiento para asegurar su calidad y consistencia. Luego, se aplicarán técnicas de minería de datos y aprendizaje automático para identificar patrones y tendencias entre los municipios y agruparlos en función de variables relevantes.

Los resultados obtenidos permitirán a los tomadores de decisiones tener una mejor comprensión de la dinámica del área metropolitana de San Salvador y tomar decisiones más informadas en cuanto a la asignación de recursos y políticas públicas.

## Indice

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1.  <a href="https://#item1">Descargar y Explorar el Conjunto de Datos</a>

2.  <a href="https://#item2">Explorar Vecindarios en la Ciudad de San Salvador</a>

3.  <a href="https://#item3">Analizar cada Vecindario</a>

4.  <a href="https://#item4">Agrupaciones de Vecindarios</a>

5.  <a href="https://#item5">Examinar Agrupaciones</a> </font>
    </div>

instalamos todas las dependencias necesarias

In [219]:
!pip install geopy

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim 
import requests 
from pandas.io.json import json_normalize 
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium 



## 1. Descargamos y exploramos el conjuno de datos

El Area Metropolitana de San Salvador es la zona mas densamente poblada de El Salvador y cuenta con 14 municipios asi que necesitamos datos que contengan los 14 municipios ademas de su latitud y longitud 

Obtenemos los datos de los municipios y los combertimos en un dataframe pandas

In [220]:
URL = "https://es.wikipedia.org/wiki/%C3%81rea_metropolitana_de_San_Salvador"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
table = soup.find('table', class_='wikitable')
df = pd.read_html(str(table))[0]
df

Unnamed: 0,Municipio,Población
0,San Salvador,950090
1,Soyapango,665403
2,Mejicanos,240751
3,Apopa,215286
4,Santa Tecla (La Libertad),205908
5,Ciudad Delgado,160200
6,Ilopango,145862
7,Tonacatepeque,100896
8,San Martín,85758
9,Cuscatancingo,66400


Eliminamos las filas que no nos interesan

In [221]:
df = df.drop(index=[14,15])
df

Unnamed: 0,Municipio,Población
0,San Salvador,950090
1,Soyapango,665403
2,Mejicanos,240751
3,Apopa,215286
4,Santa Tecla (La Libertad),205908
5,Ciudad Delgado,160200
6,Ilopango,145862
7,Tonacatepeque,100896
8,San Martín,85758
9,Cuscatancingo,66400


#### Ahora obtendremos los datos de las coordenadas de los municipios con la libreria de geopy

Creamos una funcion con geopy y agregamos los datos de las coordenadas nuestro dataframe

In [222]:

geolocator = Nominatim(user_agent="s_s_explorer")

def get_coordinates(row):
    location = geolocator.geocode(row['Municipio'])
    if row['Municipio'] == 'San Marcos':
        location = geolocator.geocode('San Marcos, El Salvador')
        return pd.Series({'Latitud': location.latitude, 'Longitud': location.longitude})
    else:
        if row['Municipio'] == 'San Martín':
            location = geolocator.geocode('San Martín, El Salvador')
            return pd.Series({'Latitud': location.latitude, 'Longitud': location.longitude})
        else:
            if location:
                return pd.Series({'Latitud': location.latitude, 'Longitud': location.longitude})
            else:
                return pd.Series({'Latitud': None, 'Longitud': None})

df[['Latitud', 'Longitud']] = df.apply(get_coordinates, axis=1)
df


Unnamed: 0,Municipio,Población,Latitud,Longitud
0,San Salvador,950090,13.698994,-89.191425
1,Soyapango,665403,13.703658,-89.150158
2,Mejicanos,240751,13.722484,-89.18699
3,Apopa,215286,13.801304,-89.179078
4,Santa Tecla (La Libertad),205908,13.674299,-89.288041
5,Ciudad Delgado,160200,13.722637,-89.170524
6,Ilopango,145862,13.694188,-89.110474
7,Tonacatepeque,100896,13.779962,-89.118167
8,San Martín,85758,13.738618,-89.055216
9,Cuscatancingo,66400,13.731439,-89.178589


#### Creamos un mapa del Gran San Salavador y superponemos los municipios

Para esto utilizamos la libreria Folium

In [223]:
map_san_salvador = folium.Map(location=[df['Latitud'][0], df['Longitud'][0]], zoom_start=11)

# añadir marcadores al mapa
for lat, lng, Municipio in zip(df['Latitud'], df['Longitud'], df['Municipio']):
    label = '{}'.format(Municipio)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_san_salvador)  
    
map_san_salvador

## 2. Exploramos los municipios del Gran San Salavador

Ya que cada llamada a la API de Foursquare solo nos retorna un maximo de 50 resultados, buscaremos los lugares por categorias asi que obtenemos las  categorias de busqueda del sitio de Foursquare y las pasamos a un dataframe 

In [224]:
URL = "https://location.foursquare.com/places/docs/categories"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
table_ = soup.find('table')
df_categories = pd.read_html(str(table_))[0]
df_categories.head()

Unnamed: 0,Category IDs,Category Labels,Countries Supported
0,10000,Arts and Entertainment,all
1,10001,Arts and Entertainment > Amusement Park,all
2,10002,Arts and Entertainment > Aquarium,all
3,10003,Arts and Entertainment > Arcade,all
4,10004,Arts and Entertainment > Art Gallery,all


como las categorias esta jerarqizadas solo dejamos las principales y las demas subcategorias las eliminamos de nuestro dataframe

In [225]:
df_categories = df_categories[~df_categories["Category Labels"].str.contains(">")]
df_categories = df_categories.drop("Countries Supported", axis=1)
df_categories = df_categories.reset_index(drop=True)
df_categories

Unnamed: 0,Category IDs,Category Labels
0,10000,Arts and Entertainment
1,11000,Business and Professional Services
2,12000,Community and Government
3,13000,Dining and Drinking
4,14000,Event
5,15000,Health and Medicine
6,16000,Landmarks and Outdoors
7,17000,Retail
8,18000,Sports and Recreation
9,19000,Travel and Transportation


#### Vamos a crear una funcion para obteer los lugares de cada municipio

In [226]:
def getNearbyPlaces(names, latitudes, longitudes, categorias, radius=1000):
    LIMIT=50
    places_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        for category in zip(categorias):
            
            # crear la URL de solicitud de API
            url = "https://api.foursquare.com/v3/places/search?ll={}%2C{}&radius={}&categories{}=&limit={}".format(lat, lng, radius, category, LIMIT)
            headers = {
            "accept": "application/json",
            "Authorization": "fsq3+btNZJiWH0LtffL3n4zu2NiE6iThx9A752cLEJs5q1U="
            }
            
            # solicitud GET
            results = requests.get(url, headers=headers).json()
            places = results['results']

        
            # regresa solo información relevante de cada sitio cercano
            places_list.append([(
                name, 
                lat, 
                lng, 
                v['name'], 
                v['geocodes']['main']['latitude'], 
                v['geocodes']['main']['longitude'],  
                (v['categories'][0]['name'] if len(v['categories']) > 0 else None)) for v in places])
       


    nearby_places = pd.DataFrame([item for place_list in places_list for item in place_list])
    nearby_places.columns = ['Municipio', 
                  'Municipio Latitud', 
                  'municipio Longitud', 
                  'Place', 
                  'Place Latitud', 
                  'Place Longitud', 
                  'Place Category']
    
    return(nearby_places)

Ahora ya tenemos lista la funcion creamos un  nuevo dataframe con los sitios

In [227]:
san_salvador_places = getNearbyPlaces(names=df['Municipio'],
                                   latitudes=df['Latitud'],
                                   longitudes=df['Longitud'],
                                    categorias=df_categories['Category IDs']
                                  )

San Salvador
Soyapango
Mejicanos
Apopa
Santa Tecla (La Libertad)
Ciudad Delgado
Ilopango
Tonacatepeque
San Martín
Cuscatancingo
San Marcos
Ayutuxtepeque
Antiguo Cuscatlán (La Libertad)
Nejapa


In [228]:
print(san_salvador_places.shape)
san_salvador_places.head()

(4800, 7)


Unnamed: 0,Municipio,Municipio Latitud,municipio Longitud,Place,Place Latitud,Place Longitud,Place Category
0,San Salvador,13.698994,-89.191425,Teatro Nacional,13.698768,-89.1907,Theater
1,San Salvador,13.698994,-89.191425,Coffee Tempo Centro Histórico,13.69888,-89.190865,Café
2,San Salvador,13.698994,-89.191425,Mori's Rooftop,13.699007,-89.190701,Beer Garden
3,San Salvador,13.698994,-89.191425,Palacio Nacional,13.697736,-89.19199,Historic and Protected Site
4,San Salvador,13.698994,-89.191425,Plaza Gerardo Barrios,13.697527,-89.191239,Plaza


Revisamos cuantos sitios nos regreso por cada municipio

In [229]:
san_salvador_places.groupby('Municipio').count()

Unnamed: 0_level_0,Municipio Latitud,municipio Longitud,Place,Place Latitud,Place Longitud,Place Category
Municipio,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Antiguo Cuscatlán (La Libertad),500,500,500,500,500,490
Apopa,450,450,450,450,450,430
Ayutuxtepeque,500,500,500,500,500,450
Ciudad Delgado,350,350,350,350,350,320
Cuscatancingo,240,240,240,240,240,210
Ilopango,80,80,80,80,80,80
Mejicanos,500,500,500,500,500,460
Nejapa,130,130,130,130,130,130
San Marcos,290,290,290,290,290,270
San Martín,170,170,170,170,170,160


Vemos cuantas categorias unicas tenemos (subcategorias, no las que se usaron en la llamada a la API)

In [230]:
print('There are {} uniques categories.'.format(len(san_salvador_places['Place Category'].unique())))

There are 111 uniques categories.


## 3. Analizar cada Municipio

In [231]:
# codificación
san_salvador_onehot = pd.get_dummies(san_salvador_places[['Place Category']], prefix="", prefix_sep="")

# añadir la columna de barrio de regreso al dataframe
san_salvador_onehot['Municipio'] = san_salvador_places['Municipio']


# mover la columna de barrio a la primer columna
fixed_columns = [san_salvador_onehot.columns[-1]] + list(san_salvador_onehot.columns[:-1])
san_salvador_onehot = san_salvador_onehot[fixed_columns]

san_salvador_onehot.head()

Unnamed: 0,Municipio,ATM,American Restaurant,Arepa Restaurant,Art Gallery,Arts and Crafts Store,Arts and Entertainment,Automotive Retail,Automotive Service,BBQ Joint,...,Taco Restaurant,Tattoo Parlor,Tea Room,Theater,Toy / Game Store,Turkish Restaurant,Vintage and Thrift Store,Water Park,Wine Bar,Wings Joint
0,San Salvador,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,San Salvador,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,San Salvador,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,San Salvador,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,San Salvador,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [232]:
san_salvador_onehot.shape

(4800, 111)

#### Agrupemos las filas por barrios tomando la média de la frecuancia de la ocurrencia de cada categoría

## *****

In [233]:
sansalvador_grouped = san_salvador_onehot.groupby('Municipio').mean().reset_index()
sansalvador_grouped

Unnamed: 0,Municipio,ATM,American Restaurant,Arepa Restaurant,Art Gallery,Arts and Crafts Store,Arts and Entertainment,Automotive Retail,Automotive Service,BBQ Joint,...,Taco Restaurant,Tattoo Parlor,Tea Room,Theater,Toy / Game Store,Turkish Restaurant,Vintage and Thrift Store,Water Park,Wine Bar,Wings Joint
0,Antiguo Cuscatlán (La Libertad),0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Apopa,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.022222,...,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ayutuxtepeque,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.02,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Ciudad Delgado,0.0,0.0,0.0,0.028571,0.0,0.0,0.057143,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0
4,Cuscatancingo,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Ilopango,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Mejicanos,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.04,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Nejapa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,San Marcos,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483
9,San Martín,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [234]:
sansalvador_grouped.shape

(14, 111)

Imprimamos los 5 sitios mas comunes en cada municipio

In [235]:
num_top_venues = 5

for hood in sansalvador_grouped['Municipio']:
    print("----"+hood+"----")
    temp = sansalvador_grouped[sansalvador_grouped['Municipio'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Antiguo Cuscatlán (La Libertad)----
                       venue  freq
0  Latin American Restaurant  0.14
1                Coffee Shop  0.08
2                       Café  0.06
3           Salad Restaurant  0.06
4                 Restaurant  0.04


----Apopa----
                         venue  freq
0  Grocery Store / Supermarket  0.11
1                    Drugstore  0.07
2     Furniture and Home Store  0.07
3                   Restaurant  0.04
4                Shopping Mall  0.04


----Ayutuxtepeque----
                       venue  freq
0  Latin American Restaurant  0.12
1                  Drugstore  0.06
2                     Bakery  0.06
3                 Food Truck  0.04
4                 Restaurant  0.04


----Ciudad Delgado----
                       venue  freq
0                      Plaza  0.06
1  Latin American Restaurant  0.06
2                   Cemetery  0.06
3          Automotive Retail  0.06
4                 Nail Salon  0.06


----Cuscatancingo----
                   

#### Ponemos eso en un dataframe

In [236]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Generemos el nuevo dataframe y mostremos los primeros 10 sitios de cada barrio.


In [237]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# crear las columnas acorde al numero de sitios populares
columns = ['Municipio']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# crear un nuevo dataframe
municipios_venues_sorted = pd.DataFrame(columns=columns)
municipios_venues_sorted['Municipio'] = sansalvador_grouped['Municipio']

for ind in np.arange(sansalvador_grouped.shape[0]):
    municipios_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sansalvador_grouped.iloc[ind, :], num_top_venues)

municipios_venues_sorted

Unnamed: 0,Municipio,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Antiguo Cuscatlán (La Libertad),Latin American Restaurant,Coffee Shop,Café,Salad Restaurant,Restaurant,Shopping Mall,Fast Food Restaurant,Miscellaneous Store,Health and Beauty Service,Diner
1,Apopa,Grocery Store / Supermarket,Drugstore,Furniture and Home Store,Restaurant,Shopping Mall,Credit Union,Fast Food Restaurant,Donut Shop,Diner,Department Store
2,Ayutuxtepeque,Latin American Restaurant,Drugstore,Bakery,Food Truck,Restaurant,Miscellaneous Store,Beer Garden,Other Great Outdoors,Diner,Farmers' Market
3,Ciudad Delgado,Plaza,Latin American Restaurant,Cemetery,Automotive Retail,Nail Salon,Coffee Shop,Drugstore,Design Studio,Snack Place,Candy Store
4,Cuscatancingo,Latin American Restaurant,Ice Cream Parlor,Cosmetics Store,Grocery Store / Supermarket,Bakery,Camera Store,Food Truck,Drugstore,Cemetery,Other Great Outdoors
5,Ilopango,Restaurant,Bank,Miscellaneous Store,Museum,Other Great Outdoors,Park,Hardware Store,American Restaurant,Tea Room,Tattoo Parlor
6,Mejicanos,Bakery,Latin American Restaurant,Mexican Restaurant,Hardware Store,Grocery Store / Supermarket,Food Truck,Diner,Fried Chicken Joint,Automotive Service,Bank
7,Nejapa,Latin American Restaurant,Grocery Store / Supermarket,Bar,Race Track,Cemetery,Pizzeria,Park,Other Great Outdoors,Restaurant,Stadium
8,San Marcos,Restaurant,Bakery,Fast Food Restaurant,Cafeteria,Wings Joint,Miscellaneous Store,Ice Cream Parlor,Coffee Shop,Sandwich Restaurant,Chinese Restaurant
9,San Martín,Restaurant,Department Store,Bank,Burger Joint,Fast Food Restaurant,Snack Place,Bookstore,Fried Chicken Joint,Eyecare Store,Bakery


## 4. Municipios Agrupados

Ejecutemos *k*-means para agrupar los barrios en 3 agrupaciones.


In [238]:
# establecer el número de agrupaciones
kclusters = 3

sansalvador_grouped_clustering = sansalvador_grouped.drop('Municipio', 1)

# ejecutar k-means
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sansalvador_grouped_clustering)

# revisar las etiquetas de las agrupaciones generadas para cada fila del dataframe
kmeans.labels_[0:10] 

  after removing the cwd from sys.path.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  max_iter=max_iter, verbose=verbose)
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20

array([1, 0, 0, 0, 1, 2, 0, 1, 0, 0], dtype=int32)

In [239]:
sansalvador_grouped_clustering.head()

Unnamed: 0,ATM,American Restaurant,Arepa Restaurant,Art Gallery,Arts and Crafts Store,Arts and Entertainment,Automotive Retail,Automotive Service,BBQ Joint,Bakery,...,Taco Restaurant,Tattoo Parlor,Tea Room,Theater,Toy / Game Store,Turkish Restaurant,Vintage and Thrift Store,Water Park,Wine Bar,Wings Joint
0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.022222,0.022222,...,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.02,0.06,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.028571,0.0,0.0,0.057143,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Generemos un nuevo dataframe que incluya la agrupación asi como los 10 sitios mas populares de cada barrio.


In [240]:
# añadir etiquetas
municipios_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sansalvador_merged = df

# juntar sansalvador_grouped con df 
sansalvador_merged = sansalvador_merged.join(municipios_venues_sorted.set_index('Municipio'), on='Municipio')

sansalvador_merged.head() # revisar las ultimas columnas

Unnamed: 0,Municipio,Población,Latitud,Longitud,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,San Salvador,950090,13.698994,-89.191425,0,Café,Fried Chicken Joint,Plaza,Historic and Protected Site,Bookstore,Drugstore,Restaurant,Bakery,ATM,Hot Dog Joint
1,Soyapango,665403,13.703658,-89.150158,0,Drugstore,Department Store,Bakery,Fried Chicken Joint,Pizzeria,Stadium,Mexican Restaurant,American Restaurant,Donut Shop,Fast Food Restaurant
2,Mejicanos,240751,13.722484,-89.18699,0,Bakery,Latin American Restaurant,Mexican Restaurant,Hardware Store,Grocery Store / Supermarket,Food Truck,Diner,Fried Chicken Joint,Automotive Service,Bank
3,Apopa,215286,13.801304,-89.179078,0,Grocery Store / Supermarket,Drugstore,Furniture and Home Store,Restaurant,Shopping Mall,Credit Union,Fast Food Restaurant,Donut Shop,Diner,Department Store
4,Santa Tecla (La Libertad),205908,13.674299,-89.288041,0,Bar,Pizzeria,Café,Hardware Store,Latin American Restaurant,Spanish Restaurant,Candy Store,Shopping Mall,Furniture and Home Store,Seafood Restaurant


Finalmente visualizamos los municipios agrupados

In [241]:
# crear mapa
map_clusters = folium.Map(location=[df['Latitud'][0], df['Longitud'][0]], zoom_start=11)

# establecer el esquema de color para las agrupaciones
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# añadir marcadores al mapa
markers_colors = []
for lat, lon, poi, cluster in zip(sansalvador_merged['Latitud'], sansalvador_merged['Longitud'], sansalvador_merged['Municipio'], sansalvador_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Ahora analisamos los cluster de municipios  que han sido agrupados por catergorias de los lugares mas comunes

Acontinuacion vemso los dataframes de los cluster y los lugares mas comunes en ellos

In [242]:
sansalvador_merged.loc[sansalvador_merged['Cluster Labels'] == 0, sansalvador_merged.columns[[1] + list(range(5, sansalvador_merged.shape[1]))]]

Unnamed: 0,Población,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,950090,Café,Fried Chicken Joint,Plaza,Historic and Protected Site,Bookstore,Drugstore,Restaurant,Bakery,ATM,Hot Dog Joint
1,665403,Drugstore,Department Store,Bakery,Fried Chicken Joint,Pizzeria,Stadium,Mexican Restaurant,American Restaurant,Donut Shop,Fast Food Restaurant
2,240751,Bakery,Latin American Restaurant,Mexican Restaurant,Hardware Store,Grocery Store / Supermarket,Food Truck,Diner,Fried Chicken Joint,Automotive Service,Bank
3,215286,Grocery Store / Supermarket,Drugstore,Furniture and Home Store,Restaurant,Shopping Mall,Credit Union,Fast Food Restaurant,Donut Shop,Diner,Department Store
4,205908,Bar,Pizzeria,Café,Hardware Store,Latin American Restaurant,Spanish Restaurant,Candy Store,Shopping Mall,Furniture and Home Store,Seafood Restaurant
5,160200,Plaza,Latin American Restaurant,Cemetery,Automotive Retail,Nail Salon,Coffee Shop,Drugstore,Design Studio,Snack Place,Candy Store
8,85758,Restaurant,Department Store,Bank,Burger Joint,Fast Food Restaurant,Snack Place,Bookstore,Fried Chicken Joint,Eyecare Store,Bakery
10,85209,Restaurant,Bakery,Fast Food Restaurant,Cafeteria,Wings Joint,Miscellaneous Store,Ice Cream Parlor,Coffee Shop,Sandwich Restaurant,Chinese Restaurant
11,34710,Latin American Restaurant,Drugstore,Bakery,Food Truck,Restaurant,Miscellaneous Store,Beer Garden,Other Great Outdoors,Diner,Farmers' Market


In [243]:
sansalvador_merged.loc[sansalvador_merged['Cluster Labels'] == 1, sansalvador_merged.columns[[1] + list(range(5, sansalvador_merged.shape[1]))]]

Unnamed: 0,Población,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,100896,Other Great Outdoors,Park,Latin American Restaurant,Landmarks and Outdoors,Café,Soccer Field,Art Gallery,Arepa Restaurant,Pizzeria,Plaza
9,66400,Latin American Restaurant,Ice Cream Parlor,Cosmetics Store,Grocery Store / Supermarket,Bakery,Camera Store,Food Truck,Drugstore,Cemetery,Other Great Outdoors
12,60698,Latin American Restaurant,Coffee Shop,Café,Salad Restaurant,Restaurant,Shopping Mall,Fast Food Restaurant,Miscellaneous Store,Health and Beauty Service,Diner
13,29458,Latin American Restaurant,Grocery Store / Supermarket,Bar,Race Track,Cemetery,Pizzeria,Park,Other Great Outdoors,Restaurant,Stadium


In [244]:
sansalvador_merged.loc[sansalvador_merged['Cluster Labels'] == 2, sansalvador_merged.columns[[1] + list(range(5, sansalvador_merged.shape[1]))]]

Unnamed: 0,Población,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,145862,Restaurant,Bank,Miscellaneous Store,Museum,Other Great Outdoors,Park,Hardware Store,American Restaurant,Tea Room,Tattoo Parlor


Algunas cosas saltan a la vista al anizar los cluster:
Primero que en el primer cluster(cluster 0) vemos que los lugares mas comunes que podemos encontrar en estos municipios son restaurantes, cafes, supermercados y bares;
Segundo que el segundo cluster (cluster 1) esta dominado por restaurantes de comida tipica en su mayoria;
Tercero que el utimo cluster (cluster 2) solo es formado por un municipio Ilopango.

#### Agruparemos los municipios en funcion de su poblacion

Creamos un dataframe con la informacion de los municipios y su poblacion y agregamos las etiquetas de sus clusters

In [245]:
df_places= df
k_1 = 3
kmeans_1 = KMeans(n_clusters=k_1, random_state=0).fit(df_places[["Población"]])

# Añadir etiquetas de cluster al dataframe
df_places["Cluster"] = kmeans_1.labels_
df_places

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  max_iter=max_iter, verbose=verbose)
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.

Unnamed: 0,Municipio,Población,Latitud,Longitud,Cluster
0,San Salvador,950090,13.698994,-89.191425,1
1,Soyapango,665403,13.703658,-89.150158,1
2,Mejicanos,240751,13.722484,-89.18699,2
3,Apopa,215286,13.801304,-89.179078,2
4,Santa Tecla (La Libertad),205908,13.674299,-89.288041,2
5,Ciudad Delgado,160200,13.722637,-89.170524,2
6,Ilopango,145862,13.694188,-89.110474,2
7,Tonacatepeque,100896,13.779962,-89.118167,0
8,San Martín,85758,13.738618,-89.055216,0
9,Cuscatancingo,66400,13.731439,-89.178589,0


Generamos un mapa para vizualizar los clusters

In [246]:
# crear mapa
map_clusters_1 = folium.Map(location=[df['Latitud'][0], df['Longitud'][0]], zoom_start=11)

# establecer el esquema de color para las agrupaciones
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# añadir marcadores al mapa
markers_colors = []
for lat, lon, poi, cluster in zip(sansalvador_merged['Latitud'], sansalvador_merged['Longitud'], sansalvador_merged['Municipio'], df_places['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_1)
       
map_clusters_1

#### Para termiar agruparemos los lugares en funcion de su pocicion geografica 

Primero creamos un dataframe con la nformacion de los lugares su coordenadas geograficas y a qu municipio pertenecen

In [247]:
places_cluster=san_salvador_places[["Municipio", "Place", "Place Latitud", "Place Longitud"]]
places_cluster.head()

Unnamed: 0,Municipio,Place,Place Latitud,Place Longitud
0,San Salvador,Teatro Nacional,13.698768,-89.1907
1,San Salvador,Coffee Tempo Centro Histórico,13.69888,-89.190865
2,San Salvador,Mori's Rooftop,13.699007,-89.190701
3,San Salvador,Palacio Nacional,13.697736,-89.19199
4,San Salvador,Plaza Gerardo Barrios,13.697527,-89.191239


Ejecutemos *k*-means para agrupar los lugares siempre en tres clusters


In [248]:
k_2 = 3
kmeans_2 = KMeans(n_clusters=k_2, random_state=0).fit(places_cluster[["Place Latitud", "Place Longitud"]])

# Añadir etiquetas de cluster al dataframe
places_cluster["Cluster"] = kmeans_2.labels_
places_cluster.head()

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  max_iter=max_iter, verbose=verbose)
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.float
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype = np.

Unnamed: 0,Municipio,Place,Place Latitud,Place Longitud,Cluster
0,San Salvador,Teatro Nacional,13.698768,-89.1907,1
1,San Salvador,Coffee Tempo Centro Histórico,13.69888,-89.190865,1
2,San Salvador,Mori's Rooftop,13.699007,-89.190701,1
3,San Salvador,Palacio Nacional,13.697736,-89.19199,1
4,San Salvador,Plaza Gerardo Barrios,13.697527,-89.191239,1


Por utimo visualizamos los clusters en un mapa

In [249]:
# crear mapa
map_clusters_2 = folium.Map(location=[df['Latitud'][0], df['Longitud'][0]], zoom_start=11)

# establecer el esquema de color para las agrupaciones
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# añadir marcadores al mapa
markers_colors = []
for lat, lon, poi, cluster in zip(places_cluster['Place Latitud'], places_cluster['Place Longitud'], places_cluster['Place'], places_cluster['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_2)
       
map_clusters_2

Podemos observar que la mayoria de lugares estan ubicados en el centro del area metropolitana de San Salvador 