> # Clustering of the communes belonging to the capital Santiago de Chile.

![alt text](https://vulcanopro.s3.amazonaws.com/images/lar_0Rx9BpytgiH6wX043ZF6mkDEC1e3jF6WGWs2uLNo.jpeg)

---

> ### A description of the problem and a discussion of the background.

Over recent decades, the large cities of Latin America have grown chaotically, which has led to the proliferation of poor sectors and a segregated urban environment.

As a result, Latin American cities are characterized by profound socioeconomic inequality.

The national capital of Chile is no exception.

The metropolitan capital is located in the province of Santiago, which is subdivided into 32 communes and is home to more than 5 million inhabitants.

*(According to the territorial organization of Chile, the commune is the basic unit of the political-administrative structure.)*

The purpose of this notebook is to visually explore the territorial gaps among the 32 communes that make up the metropolitan city.

To this end, the communes will be characterized and grouped using a series of quality-of-life, socioeconomic, and infrastructure indicators developed by various national and international studies.

These groups will then be contrasted with the availability of services and businesses located in each commune.


---

> ### A description of the data and how it will be used to solve the problem.

**Data**

* *Censo 2017* - Instituto Nacional de Estadísticas (INE)
* *Casen 2015* - Ministerio de Desarrollo Social y Familia (MDSF)
* *IDSE 2013* - Observatorio Chileno de Salud Pública (OCHISAP)
* *ICVU 2019* - Cámara Chilena de la Construcción (CCHC) & Instituto de Estudios Urbanos y Territoriales (IEUT)
* *Wikipedia*
* *Foursquare API*

**Variables considered**
 
* [Communes] = Name of the communes that make up the province of Santiago.
* [Population] = Number of inhabitants per commune.
* [Area] = Surface area per commune, in km².
* [Density] = Population divided by Area, per commune (inhabitants per km²).
* [IPM] = Multidimensional Poverty Index per commune.
* [ICVU] = Urban Quality of Life Index per commune.
* [IDSE] = Socio-Economic Development Index per commune, based on the HDI (UNDP).
* [Coordinate] = Latitude and longitude per commune, in decimal degrees.
* [Venues] = Nearby places per commune, by quantity and category.

---

In [None]:
#$ conda update -n base -c defaults conda
#!conda install -c anaconda lxml

In [1]:
#We import the basic libraries
import numpy as np
import pandas as pd
import lxml

In [37]:
#Web scraping
link1 = "https://es.wikipedia.org/wiki/Anexo:Comunas_de_Chile"
#Unused reference: https://es.wikipedia.org/wiki/Provincia_de_Santiago
tables = pd.read_html(link1)
tables

[     CUT (Código Único Territorial)                Nombre  Unnamed: 2  \
 0                             15101                 Arica         NaN   
 1                             15102             Camarones         NaN   
 2                             15201                 Putre         NaN   
 3                             15202         General Lagos         NaN   
 4                              1101               Iquique         NaN   
 5                              1107         Alto Hospicio         NaN   
 6                              1401          Pozo Almonte         NaN   
 7                              1402                Camiña         NaN   
 8                              1403              Colchane         NaN   
 9                              1404                 Huara         NaN   
 10                             1405                  Pica         NaN   
 11                             2101           Antofagasta         NaN   
 12                             2102  

In [38]:
#Transform to Dataframe
df = pd.DataFrame(tables[0])

print(df.shape)
df.head(10)

(346, 12)


Unnamed: 0,CUT (Código Único Territorial),Nombre,Unnamed: 2,Provincia,Región,Superficie(km2),Población2017,Densidad(hab./km2),IDH 2005,IDH 2005.1,Latitud,Longitud
0,15101,Arica,,Arica,Arica y Parinacota,47994,221 364,46,384.0,Alto,"-18°27'18""","-70°17'24"""
1,15102,Camarones,,Arica,Arica y Parinacota,3927,1255,32,751.0,Alto,"-19°1'1,2""","-69°52'1,2"""
2,15201,Putre,,Parinacota,Arica y Parinacota,59025,2765,47,707.0,Alto,"-18°12'0""","-69°34'58,8"""
3,15202,General Lagos,,Parinacota,Arica y Parinacota,22444,684,31,67.0,Medio,"-17°39'10,8""","-69°38'6"""
4,1101,Iquique,,Iquique,Tarapacá,22421,191 468,854,766.0,Alto,"-20°14'38,4""","-70°8'20,4"""
5,1107,Alto Hospicio,,Iquique,Tarapacá,5729,108 375,1892,,Bajo,"-20°15'25,2""","-70°1'19,2"""
6,1401,Pozo Almonte,,Tamarugal,Tarapacá,"13 765,8",15 711,114,722.0,Alto,"-20°17'27,6""","-69°41'45,6"""
7,1402,Camiña,,Tamarugal,Tarapacá,22002,1250,57,619.0,Medio,"-20°28'58,8""","-69°22'1,2"""
8,1403,Colchane,,Tamarugal,Tarapacá,40156,1728,43,603.0,Medio,"-19°17'2,4""","-68°40'30"""
9,1404,Huara,,Tamarugal,Tarapacá,"10 474,6",2730,26,676.0,Medio,"-19°48'32,4""","-69°58'19,2"""
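
The scraped table follows Spanish number formatting: a space separates thousands and a comma marks the decimals, as in `13 765,8` or `221 364` above. The following is a minimal sketch, in plain Python so that it runs outside the notebook environment, of how such strings could be turned into floats. `parse_es_number` is a hypothetical helper and is not part of the original notebook.

```python
import re

def parse_es_number(text):
    """Convert a Spanish-formatted number such as '13 765,8' to a float."""
    # Drop every whitespace character (including non-breaking spaces)
    # used as a thousands separator.
    cleaned = re.sub(r"\s+", "", text)
    # Swap the decimal comma for a decimal point.
    cleaned = cleaned.replace(",", ".")
    return float(cleaned)

# Sample strings in the same style as the scraped columns.
samples = ["13 765,8", "221 364", "10\u00a0474,6", "1255"]
parsed = [parse_es_number(s) for s in samples]
print(parsed)
```

The same function could be applied to a whole column with `Series.map`, provided every cell is still a string.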


In [39]:
#We select the communes belonging to the province of Santiago
#We sort by name of commune
#We drop the unnecessary columns 
#And we rename the columns
df = df[df['Provincia'] == 'Santiago']
df = df.sort_values('Nombre')
df = df.drop(['CUT (Código Único Territorial)', 'Unnamed: 2', 'Provincia', 'Región', 'IDH 2005', 'IDH 2005.1'], axis=1).reset_index(drop=True)
df.columns = (['Commune', 'Area', 'Population', 'Density', 'Latitude', 'Longitude'])
df

Unnamed: 0,Commune,Area,Population,Density,Latitude,Longitude
0,Cerrillos,21,80 832,38491,"-33°30'0""","-70°43'0"""
1,Cerro Navia,11,132 622,"12 056,5","-33°25'19,2""","-70°44'6"""
2,Conchalí,107,126 955,"11 865,0","-33°22'48""","-70°40'30"""
3,El Bosque,142,162 505,"11 444,0","-33°34'1,2""","-70°40'30"""
4,Estación Central,15,147 041,98027,"-33°27'32,4""","-70°41'56,4"""
5,Huechuraba,448,98 671,22025,"-33°22'4,8""","-70°38'2,4"""
6,Independencia,7,100 281,"14 325,9","-33°24'46,8""","-70°39'57,6"""
7,La Cisterna,10,90 119,90119,"-33°31'44,4""","-70°39'46,8"""
8,La Florida,702,366 916,52267,"-33°31'30""","-70°32'16,8"""
9,La Granja,10,116 571,"11 657,1","-33°31'60""","-70°37'30"""


In [40]:
#Web scraping 2
link2 = "https://es.wikipedia.org/wiki/Anexo:Comunas_de_Santiago_de_Chile"
tables = pd.read_html(link2)
tables

[                                                    0  \
 0   Comunas en la Provincia de Santiago  Cerrillos...   
 1                 Comunas en la Provincia de Santiago   
 2                                           Cerrillos   
 3                                         Cerro Navia   
 4                                            Conchalí   
 5                                           El Bosque   
 6                                    Estación Central   
 7                                          Huechuraba   
 8                                       Independencia   
 9                                         La Cisterna   
 10                                         La Florida   
 11                                         La Pintana   
 12                                          La Granja   
 13                        Comunas en otras provincias   
 14                                            Colina†   
 15                                             Lampa†   
 16           

In [41]:
#Transform to Dataframe
icvu = pd.DataFrame(tables[3])

print(icvu.shape)
icvu

(36, 8)


Unnamed: 0,Comuna,Ubicación?,Población (2017)?,Viviendas (2002)?,Densidad poblacional (2002) ?,Crecimiento demográfico (2002-2017)?,ICVU (2019)?,Pobreza (2015)?
0,Cerrillos,surponiente,80832,19811.0,4329.08,12.9%,47.82 (74),19.7
1,Cerro Navia,norponiente,132622,35277.0,13482.91,-10.7%,42.42 (92),35.6
2,Conchalí,norte,126955,32609.0,12070.29,-4.4%,46.52 (84),21.6
3,El Bosque,sur,162505,42808.0,12270.72,-7.3%,48.54 (70),27.0
4,Estación Central,surponiente,147041,32357.0,9036.31,16.6%,49.96 (64),14.5
5,Huechuraba,norte,98671,16386.0,3493.87,33.4%,55.70 (33),23.8
6,Independencia,norte,100281,18588.0,8824.66,55.4%,49.51 (66),21.3
7,La Cisterna,sur,90119,22817.0,8477.89,6%,51.63 (53),20.0
8,La Florida,suroriente,366916,97137.0,9356.62,0.6%,57.23 (22),17.0
9,La Granja,sur,116571,32035.0,13212.36,-12.2%,51.97 (51),24.5


In [42]:
#Data cleaning
#We take an explicit copy so that the assignment below does not act on a slice of the original table
icvu = icvu[['Comuna','ICVU (2019)?', 'Pobreza (2015)?']].copy()
icvu.columns = ['Commune', 'ICVU', 'IPM']
#We keep the score and drop the number in parentheses, e.g. '47.82 (74)' -> '47.82'
icvu['ICVU'] = icvu['ICVU'].str[0:6].str.strip()
icvu


Unnamed: 0,Commune,ICVU,IPM
0,Cerrillos,47.82,19.7
1,Cerro Navia,42.42,35.6
2,Conchalí,46.52,21.6
3,El Bosque,48.54,27.0
4,Estación Central,49.96,14.5
5,Huechuraba,55.7,23.8
6,Independencia,49.51,21.3
7,La Cisterna,51.63,20.0
8,La Florida,57.23,17.0
9,La Granja,51.97,24.5
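
The raw `ICVU (2019)` column holds values such as `47.82 (74)`: a score followed by a number in parentheses (assumed here to be the commune's position in the ranking). Slicing the first six characters works as long as every score has two integer digits. A regular expression is one way to make the split independent of string length. A minimal sketch in plain Python, where `split_icvu` is a hypothetical helper:

```python
import re

# Score (digits with optional decimals), then an integer in parentheses.
ICVU_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\((\d+)\)\s*$")

def split_icvu(raw):
    """Return (score, rank) parsed from a string like '47.82 (74)'."""
    match = ICVU_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"Unexpected ICVU format: {raw!r}")
    return float(match.group(1)), int(match.group(2))

# Values copied from the scraped table above.
for raw in ["47.82 (74)", "55.70 (33)", "57.23 (22)"]:
    print(raw, "->", split_icvu(raw))
```

Raising an error on an unexpected format, rather than silently truncating, makes it easier to notice if the source page changes its layout.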


In [43]:
# We join both DataFrames on the commune name
df_stgo = df.join(icvu.set_index('Commune'), on='Commune')

df_stgo

Unnamed: 0,Commune,Area,Population,Density,Latitude,Longitude,ICVU,IPM
0,Cerrillos,21,80 832,38491,"-33°30'0""","-70°43'0""",47.82,19.7
1,Cerro Navia,11,132 622,"12 056,5","-33°25'19,2""","-70°44'6""",42.42,35.6
2,Conchalí,107,126 955,"11 865,0","-33°22'48""","-70°40'30""",46.52,21.6
3,El Bosque,142,162 505,"11 444,0","-33°34'1,2""","-70°40'30""",48.54,27.0
4,Estación Central,15,147 041,98027,"-33°27'32,4""","-70°41'56,4""",49.96,14.5
5,Huechuraba,448,98 671,22025,"-33°22'4,8""","-70°38'2,4""",55.7,23.8
6,Independencia,7,100 281,"14 325,9","-33°24'46,8""","-70°39'57,6""",49.51,21.3
7,La Cisterna,10,90 119,90119,"-33°31'44,4""","-70°39'46,8""",51.63,20.0
8,La Florida,702,366 916,52267,"-33°31'30""","-70°32'16,8""",57.23,17.0
9,La Granja,10,116 571,"11 657,1","-33°31'60""","-70°37'30""",51.97,24.5


In [44]:
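#Data wrangling
#'Density' uses a decimal comma: we turn it into a decimal point
df_stgo['Density'] = df_stgo['Density'].str.replace(',', '.', regex=False)

#'Population' and 'Density' use whitespace as thousands separator (e.g. '80 832'): we remove it
#with a pattern instead of slicing by string length, so the code does not depend on the
#number of rows or digits
for col in ['Population', 'Density']:
    df_stgo[col] = df_stgo[col].str.replace(r'\s+', '', regex=True)

df_stgo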

Unnamed: 0,Commune,Area,Population,Density,Latitude,Longitude,ICVU,IPM
0,Cerrillos,21,80832,38491.0,"-33°30'0""","-70°43'0""",47.82,19.7
1,Cerro Navia,11,132622,12056.5,"-33°25'19,2""","-70°44'6""",42.42,35.6
2,Conchalí,107,126955,11865.0,"-33°22'48""","-70°40'30""",46.52,21.6
3,El Bosque,142,162505,11444.0,"-33°34'1,2""","-70°40'30""",48.54,27.0
4,Estación Central,15,147041,98027.0,"-33°27'32,4""","-70°41'56,4""",49.96,14.5
5,Huechuraba,448,98671,22025.0,"-33°22'4,8""","-70°38'2,4""",55.7,23.8
6,Independencia,7,100281,14325.9,"-33°24'46,8""","-70°39'57,6""",49.51,21.3
7,La Cisterna,10,90119,90119.0,"-33°31'44,4""","-70°39'46,8""",51.63,20.0
8,La Florida,702,366916,52267.0,"-33°31'30""","-70°32'16,8""",57.23,17.0
9,La Granja,10,116571,11657.1,"-33°31'60""","-70°37'30""",51.97,24.5
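
Since Density is defined as Population over Area, the three columns can be cross-checked. The sketch below recomputes the density for four rows copied from the table above and flags those where the scraped value disagrees. A factor of about ten would be consistent with a decimal comma lost during scraping (a value such as `3849,1` being read as `38491`), although that explanation is an assumption and should be verified against the source page. `check_density` and the 1 % tolerance are illustrative choices.

```python
def check_density(rows, tolerance=0.01):
    """Yield (commune, ratio) where the scraped density deviates from population / area."""
    for commune, area, population, density in rows:
        expected = population / area
        ratio = density / expected
        if abs(ratio - 1.0) > tolerance:
            yield commune, round(ratio, 2)

# (Commune, Area, Population, Density) as shown in the table above.
rows = [
    ("Cerrillos", 21, 80832, 38491.0),
    ("Cerro Navia", 11, 132622, 12056.5),
    ("Conchalí", 107, 126955, 11865.0),
    ("La Granja", 10, 116571, 11657.1),
]

suspicious = list(check_density(rows))
print(suspicious)
```

Rows that pass the check, such as Cerro Navia and La Granja, have Area, Population and Density values that agree with each other. Flagged rows need a look at either the Area or the Density column.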


In [None]:
#!conda install -c anaconda xlrd

In [45]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Commune,IDSE
0,Cerrillos,0.676
1,Cerro Navia,0.562
2,Conchalí,0.662
3,El Bosque,0.611
4,Estación Central,0.686
5,Huechuraba,0.655
6,Independencia,0.731
7,La Cisterna,0.724
8,La Florida,0.741
9,La Granja,0.592


In [46]:
#We join a third Dataframe
df_stgo = df_stgo.join(idse.set_index('Commune'), on='Commune')

df_stgo

Unnamed: 0,Commune,Area,Population,Density,Latitude,Longitude,ICVU,IPM,IDSE
0,Cerrillos,21,80832,38491.0,"-33°30'0""","-70°43'0""",47.82,19.7,0.676
1,Cerro Navia,11,132622,12056.5,"-33°25'19,2""","-70°44'6""",42.42,35.6,0.562
2,Conchalí,107,126955,11865.0,"-33°22'48""","-70°40'30""",46.52,21.6,0.662
3,El Bosque,142,162505,11444.0,"-33°34'1,2""","-70°40'30""",48.54,27.0,0.611
4,Estación Central,15,147041,98027.0,"-33°27'32,4""","-70°41'56,4""",49.96,14.5,0.686
5,Huechuraba,448,98671,22025.0,"-33°22'4,8""","-70°38'2,4""",55.7,23.8,0.655
6,Independencia,7,100281,14325.9,"-33°24'46,8""","-70°39'57,6""",49.51,21.3,0.731
7,La Cisterna,10,90119,90119.0,"-33°31'44,4""","-70°39'46,8""",51.63,20.0,0.724
8,La Florida,702,366916,52267.0,"-33°31'30""","-70°32'16,8""",57.23,17.0,0.741
9,La Granja,10,116571,11657.1,"-33°31'60""","-70°37'30""",51.97,24.5,0.592


In [None]:
# The code was removed by Watson Studio for sharing.

In [26]:
#We install/import libraries for mapping and geolocation

!conda install -c conda-forge folium=0.10.0 --yes

!conda install -c conda-forge geopy --yes 

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.10.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    folium-0.10.0              |             py_1          59 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    branca:          0.3.1-py_0        conda-forge
    folium:   

In [27]:
import folium
from geopy.geocoders import Nominatim

In [28]:
address = 'Santiago, Chile'

geolocator = Nominatim(user_agent="007")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Santiago are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Santiago are -33.4377968, -70.6504451.


In [47]:
#We replace the coordinates in DMS format with decimal-degree coordinates,
#obtained by geocoding each commune name
geolocator = Nominatim(user_agent="007")

for i in df_stgo.index:
    address = df_stgo.loc[i, 'Commune'] + ', Chile'
    location = geolocator.geocode(address)
    if location is None:
        #No geocoding result: the original DMS value is kept for this commune
        continue
    df_stgo.loc[i, 'Latitude'] = location.latitude
    df_stgo.loc[i, 'Longitude'] = location.longitude

df_stgo

Unnamed: 0,Commune,Area,Population,Density,Latitude,Longitude,ICVU,IPM,IDSE
0,Cerrillos,21,80832,38491.0,-33.5025,-70.7159,47.82,19.7,0.676
1,Cerro Navia,11,132622,12056.5,-33.4251,-70.744,42.42,35.6,0.562
2,Conchalí,107,126955,11865.0,-33.3851,-70.6745,46.52,21.6,0.662
3,El Bosque,142,162505,11444.0,-33.5624,-70.6768,48.54,27.0,0.611
4,Estación Central,15,147041,98027.0,-33.4637,-70.705,49.96,14.5,0.686
5,Huechuraba,448,98671,22025.0,-33.3655,-70.6432,55.7,23.8,0.655
6,Independencia,7,100281,14325.9,-33.4153,-70.6659,49.51,21.3,0.731
7,La Cisterna,10,90119,90119.0,-33.5295,-70.6643,51.63,20.0,0.724
8,La Florida,702,366916,52267.0,-33.5307,-70.544,57.23,17.0,0.741
9,La Granja,10,116571,11657.1,-33.5359,-70.6223,51.97,24.5,0.592
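
The cell above obtains decimal coordinates by geocoding each commune name. An alternative that needs no network access is to convert the scraped DMS strings directly, using decimal degrees = degrees + minutes/60 + seconds/3600, with the sign taken from the degrees. The following is a minimal sketch under the assumption that every value looks like `-33°25'19,2"` (comma as decimal mark in the seconds). `dms_to_decimal` is a hypothetical helper. The results will differ slightly from the geocoded values, because the two sources need not refer to the same point within a commune.

```python
import re

# Optional minus sign, degrees, minutes, and seconds with '.' or ',' decimals.
DMS_PATTERN = re.compile(r"""^\s*(-?)(\d+)°(\d+)'(\d+(?:[.,]\d+)?)"\s*$""")

def dms_to_decimal(dms):
    """Convert a string like -33°25'19,2" to signed decimal degrees."""
    match = DMS_PATTERN.match(dms)
    if match is None:
        raise ValueError(f"Unexpected DMS format: {dms!r}")
    sign, degrees, minutes, seconds = match.groups()
    value = (
        int(degrees)
        + int(minutes) / 60
        + float(seconds.replace(",", ".")) / 3600
    )
    return -value if sign == "-" else value

# Cerro Navia, as scraped in DMS format earlier in the notebook.
lat = dms_to_decimal("-33°25'19,2\"")
lon = dms_to_decimal("-70°44'6\"")
print(round(lat, 4), round(lon, 4))
```

Converting in place keeps the coordinates tied to the Wikipedia table, whereas geocoding depends on how the service resolves each name.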


In [48]:
df_stgo = df_stgo.astype({"Latitude":'float64', "Longitude":'float64'})

In [56]:
# create map of Santiago using latitude and longitude values
map_stgo = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, commune, population in zip(df_stgo['Latitude'], df_stgo['Longitude'], df_stgo['Commune'], df_stgo['Population']):
    label = '{}, {}'.format(commune, population)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_stgo)  
    
map_stgo

In [35]:
# The code was removed by Watson Studio for sharing.

Commune        object
Area           object
Population     object
Density        object
Latitude       object
Longitude      object
ICVU           object
IPM           float64
IDSE          float64
dtype: object

In [53]:
# The code was removed by Watson Studio for sharing.

In [54]:
# The code was removed by Watson Studio for sharing.

5250595.0