# Oxxo: Cleaning

This is the second part of the project to recover information about the popular Oxxo stores.

Here, the data previously obtained from Google Maps is cleaned.

I also tried to cross-reference socioeconomic information about the areas where the stores are located. Unfortunately, the data published by institutes such as INEGI (Instituto Nacional de Estadística y Geografía) and CONEVAL (Consejo Nacional de Evaluación de la Política de Desarrollo Social) only goes down to the municipality level (for example, the Azcapotzalco borough in Mexico City), so this attempt did not reach a fine spatial resolution.

Thus, only the socioeconomic status by municipality was added. This information comes from the "Encuesta Nacional de Ingresos y Gastos de los Hogares 2020" (ENIGH).

ENIGH covers aspects such as household income and expenses by subcategory, housing and household characteristics, and labor conditions. The dataset used here, "conjunto_de_datos_viviendas_enigh_2020_ns.csv", represents 87755 households across Mexico.

UPDATE: More information about income per household in Mexico City was found on [datos.cdmx](https://datos.cdmx.gob.mx/dataset/ingresos-trimestrales-para-la-ciudad-de-mexico) as a shapefile (total quarterly income per household). This data has a finer resolution for average income, and it is added at the end of the notebook. The dataset says it was created in 2014, but its last update was in January 2023, so I'll trust the data.

In [1]:
import pandas as pd
import numpy as np
import geopy
import os
import geopandas as gpd
from shapely.geometry import Point

## Loading results

In [2]:
# Loading results from web scraping
oxxo_1 = pd.read_csv("../data/oxxo/oxxo_coordinates_0.csv")
oxxo_2 = pd.read_csv("../data/oxxo/oxxo_coordinates_1.csv")
oxxo_3 = pd.read_csv("../data/oxxo/oxxo_coordinates_2.csv")

oxxo_df = pd.concat([oxxo_1,oxxo_2,oxxo_3])
oxxo_df

Unnamed: 0,latitude,longitude,name,comments,rating,cp
0,19.353753,-99.189937,OXXO Helenico,4.0,4.3,1000
1,19.348316,-99.185520,Oxxo La Paz,13.0,3.5,1000
2,19.351621,-99.185912,OXXO,9.0,3.0,1000
3,19.341629,-99.203068,Oxxo,27.0,4.0,1000
4,19.360927,-99.185247,Oxxo,15.0,3.0,1000
...,...,...,...,...,...,...
38929,19.300681,-99.112765,Oxxo Camil,2.0,3.0,16797
38930,19.266287,-98.900100,OXXO OBREGÓN MEX,11.0,3.5,16797
38931,19.291616,-98.908509,Oxxo Santa cruz II,12.0,3.4,16797
38932,19.338992,-98.953441,Oxxo Geo san Isidro,24.0,2.5,16797


## Cleaning web scraping results

In [3]:
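# Filtering false positives: keep names that mention Oxxo and drop names
# containing the standalone word "gas"
mask_only_oxxo = oxxo_df["name"].str.contains("oxxo", case=False)
# Raw string so that \b is a regex word boundary, not a backspace character
mask_only_stores = ~oxxo_df["name"].str.contains(r"\bgas\b", case=False)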

oxxo_df = oxxo_df[mask_only_oxxo & mask_only_stores]
oxxo_df

Unnamed: 0,latitude,longitude,name,comments,rating,cp
0,19.353753,-99.189937,OXXO Helenico,4.0,4.3,1000
1,19.348316,-99.185520,Oxxo La Paz,13.0,3.5,1000
2,19.351621,-99.185912,OXXO,9.0,3.0,1000
3,19.341629,-99.203068,Oxxo,27.0,4.0,1000
4,19.360927,-99.185247,Oxxo,15.0,3.0,1000
...,...,...,...,...,...,...
38929,19.300681,-99.112765,Oxxo Camil,2.0,3.0,16797
38930,19.266287,-98.900100,OXXO OBREGÓN MEX,11.0,3.5,16797
38931,19.291616,-98.908509,Oxxo Santa cruz II,12.0,3.4,16797
38932,19.338992,-98.953441,Oxxo Geo san Isidro,24.0,2.5,16797
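
Word-boundary patterns such as `\bgas\b` only behave as intended when they are written as raw strings, because in a regular Python string `\b` is the backspace character. The sketch below illustrates the difference with the standard `re` module; the store names are made up for illustration and `keep_stores` is a hypothetical helper, not part of the notebook.

```python
import re

# Hypothetical store names, made up for illustration.
names = ["OXXO Helenico", "OXXO GAS Pedregal", "Oxxo Gasolinera", "Seven Eleven"]

plain_pattern = "\bgas\b"   # here "\b" is a backspace character (0x08)
raw_pattern = r"\bgas\b"    # here "\b" is a regex word boundary


def keep_stores(store_names, exclude_pattern):
    """Keep names that mention Oxxo and do not match the exclusion pattern."""
    return [
        name for name in store_names
        if re.search("oxxo", name, flags=re.IGNORECASE)
        and not re.search(exclude_pattern, name, flags=re.IGNORECASE)
    ]


kept_plain = keep_stores(names, plain_pattern)  # the "gas" name is not excluded
kept_raw = keep_stores(names, raw_pattern)      # the standalone word "gas" is excluded
print(kept_plain)
print(kept_raw)
```

Note that the word boundary only excludes "gas" as a separate word, so a name like "Oxxo Gasolinera" would still be kept by this pattern.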


In [4]:
# Removing duplicates
oxxo_df = oxxo_df.drop_duplicates(["latitude", "longitude"]).reset_index(drop=True)
oxxo_df

Unnamed: 0,latitude,longitude,name,comments,rating,cp
0,19.353753,-99.189937,OXXO Helenico,4.0,4.3,1000
1,19.348316,-99.185520,Oxxo La Paz,13.0,3.5,1000
2,19.351621,-99.185912,OXXO,9.0,3.0,1000
3,19.341629,-99.203068,Oxxo,27.0,4.0,1000
4,19.360927,-99.185247,Oxxo,15.0,3.0,1000
...,...,...,...,...,...,...
1800,19.263130,-99.105382,Oxxo,,,16000
1801,19.273990,-99.124151,OXXO Aldama,321.0,3.9,16010
1802,19.274346,-99.120606,Museo OXXO,1.0,5.0,16010
1803,32.524194,-116.997760,OXXO POSTAL,1.0,5.0,16083


## Adding address information and cleaning

In [5]:
%%time 

# Getting the actual postal code and address of each Oxxo

# ArcGIS has more accurate data
geolocator = geopy.ArcGIS()

# Build one "latitude,longitude" string per store to pass to geolocator.reverse
get_address = lambda row: ",".join(row.transform(str).values)
lat_lon_list = oxxo_df[["latitude", "longitude"]].apply(get_address, axis=1)

address_df = pd.DataFrame()

for i, lat_lon in enumerate(lat_lon_list):
    
    address = geolocator.reverse(lat_lon)
    current_oxxo_address = pd.DataFrame(address.raw, index=[i])
    address_df = pd.concat([address_df, current_oxxo_address])

    if ((i + 1) % 10) == 0:
        print(f"Iteration {i + 1} out of {len(lat_lon_list)}", end="\r")

print(f"Iteration {i + 1} out of {len(lat_lon_list)}", end="\r")

CPU times: user 23.8 s, sys: 1.54 s, total: 25.4 s
Wall time: 18min 3s
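
Most of the wall time above is spent waiting on the geocoding service, but the loop also rebuilds `address_df` on every iteration. A minimal sketch of an alternative is to collect the raw results in a dictionary and build the DataFrame once, skipping lookups that return nothing. `fake_reverse` is a hypothetical stand-in for `geolocator.reverse(lat_lon).raw` so the example runs without network access, and its values are taken from the first row shown in this notebook.

```python
import pandas as pd


def fake_reverse(lat_lon):
    """Hypothetical stand-in for a reverse geocoder; returns None on a miss."""
    known = {
        "19.353753,-99.189937": {"Address": "Avenida Revolución", "Postal": "1020"},
    }
    return known.get(lat_lon)


lat_lon_list = ["19.353753,-99.189937", "0.0,0.0"]

records = {}
for i, lat_lon in enumerate(lat_lon_list):
    raw = fake_reverse(lat_lon)
    if raw is None:
        # No address found: skip instead of failing on a missing result
        continue
    records[i] = raw

# One DataFrame built at the end, indexed by the position of each store
address_df = pd.DataFrame.from_dict(records, orient="index")
print(address_df)
```

Keeping the original position as the index means the later index-based merge with `oxxo_df` would still line up, and stores without an address simply drop out of an inner join.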


In [6]:
get_columns = ["Address","Neighborhood", "City", "Postal", "Subregion", "Region"]
new_address_df = address_df[get_columns]

In [7]:
# Joining the scraped data with the addresses from the geopy ArcGIS geocoder
new_oxxo_df = oxxo_df.merge(new_address_df, right_index=True, left_index=True, how="inner")
new_oxxo_df.drop("cp", inplace=True, axis=1)

In [8]:
# Keeping only stores in the State of Mexico and Mexico City
mask_cdmx = (new_oxxo_df["Region"] == "Ciudad de México") | (new_oxxo_df["Region"] == "México")
new_oxxo_df = new_oxxo_df[mask_cdmx].reset_index(drop=True)
new_oxxo_df.head()

Unnamed: 0,latitude,longitude,name,comments,rating,Address,Neighborhood,City,Postal,Subregion,Region
0,19.353753,-99.189937,OXXO Helenico,4.0,4.3,Avenida Revolución,Sn Ángel,Guadalupe Inn,1020,Álvaro Obregón,Ciudad de México
1,19.348316,-99.18552,Oxxo La Paz,13.0,3.5,Avenida Miguel Ángel de Quevedo 36B,,Chimalistac,1050,Álvaro Obregón,Ciudad de México
2,19.351621,-99.185912,OXXO,9.0,3.0,Avenida Vito Alessio Robles 12,,Florida,1030,Álvaro Obregón,Ciudad de México
3,19.341629,-99.203068,Oxxo,27.0,4.0,Calle Veracruz 87,,Progreso Tizapán,1080,Álvaro Obregón,Ciudad de México
4,19.360927,-99.185247,Oxxo,15.0,3.0,Calle Manuel M. Ponce,Sn Ángel,Guadalupe Inn,1020,Álvaro Obregón,Ciudad de México


## Normalization

In [9]:
def ConvertAccentMarks(text):
    
    convert_marks = {"á":"a","é":"e","í":"i","ó":"o","ú":"u"}
    new_string = ""
    
    for letter in text:
        
        if letter in convert_marks.keys():
            new_string += convert_marks[letter] 
            
        else:
            new_string += letter
    
    return new_string
            
new_oxxo_df["Subregion"] = new_oxxo_df["Subregion"].apply(str.lower).apply(ConvertAccentMarks)
new_oxxo_df["Region"] = new_oxxo_df["Region"].apply(str.lower)
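
`ConvertAccentMarks` covers the five lowercase accented vowels, which is enough for the names in this notebook once they are lowercased. As a more general alternative, and a different technique from the one used above, the standard `unicodedata` module can strip every combining mark. `strip_accents` is a hypothetical helper name.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


normalized = strip_accents("Álvaro Obregón".lower())
print(normalized)
```

One difference to keep in mind: this version also turns "ñ" into "n" and "ü" into "u", which `ConvertAccentMarks` leaves untouched, so both sides of a join need to use the same function.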

## Adding Mexico City data about socioeconomic status

In [10]:
# Loading socioeconomic status by municipality
path_enigh = "../data/oxxo/conjunto_de_datos_enigh_ns_2020_csv/" 

path_vivienda = "conjunto_de_datos_viviendas_enigh_2020_ns/conjunto_de_datos/conjunto_de_datos_viviendas_enigh_2020_ns.csv"
path_state_vivienda = "conjunto_de_datos_viviendas_enigh_2020_ns/catalogos/ubica_geo.csv"

In [11]:
vivienda_df = pd.read_csv(os.path.join(path_enigh, path_vivienda),
                          usecols=["folioviv","ubica_geo","est_socio"])

states_keys = pd.read_csv(os.path.join(path_enigh, path_state_vivienda))

In [12]:
# est_socio equivalences
#1=Bajo,
#2=Medio bajo,
#3=Medio alto,
#4=Alto,

vivienda_df.head()

Unnamed: 0,folioviv,ubica_geo,est_socio
0,100013605,1001,3
1,100013606,1001,3
2,100017801,1001,3
3,100017802,1001,3
4,100017803,1001,3


In [13]:
# Map the integer socioeconomic status codes to their labels
status_equiv = {1:"Bajo", 2:"Medio Bajo", 3:"Medio Alto", 4:"Alto"}
vivienda_df["est_socio"] = vivienda_df["est_socio"].apply(lambda x: status_equiv[x])
vivienda_df.head()

Unnamed: 0,folioviv,ubica_geo,est_socio
0,100013605,1001,Medio Alto
1,100013606,1001,Medio Alto
2,100017801,1001,Medio Alto
3,100017802,1001,Medio Alto
4,100017803,1001,Medio Alto
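
The lambda lookup above raises a `KeyError` if a code outside 1 to 4 ever appears. A small sketch of an alternative is `Series.map` with the same dictionary, which leaves unknown codes as missing values so they can be inspected. The code `9` below is a made-up invalid value used only to show that behavior.

```python
import pandas as pd

status_equiv = {1: "Bajo", 2: "Medio Bajo", 3: "Medio Alto", 4: "Alto"}

# 9 is a made-up invalid code, included to show what happens to unknown values
codes = pd.Series([3, 1, 4, 9])

# Codes missing from the dictionary become NaN instead of raising an error
labels = codes.map(status_equiv)
print(labels)
```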


In [14]:
states_keys.head()

Unnamed: 0,ubica_geo,entidad,desc_ent,municipio,des_mun
0,1001,1,Aguascalientes,1,Aguascalientes
1,1002,1,Aguascalientes,2,Asientos
2,1003,1,Aguascalientes,3,Calvillo
3,1004,1,Aguascalientes,4,Cosio
4,1005,1,Aguascalientes,5,Jesus Maria


## Filtering socioeconomic status information for Mexico City and the State of Mexico, and joining

In [15]:
mask_states = (states_keys["desc_ent"] == "Ciudad De México") | (states_keys["desc_ent"] == "México" )
states_keys = states_keys[mask_states]
states_keys.head()

Unnamed: 0,ubica_geo,entidad,desc_ent,municipio,des_mun
166,9002,9,Ciudad De México,2,Azcapotzalco
167,9003,9,Ciudad De México,3,Coyoacan
168,9004,9,Ciudad De México,4,Cuajimalpa De Morelos
169,9005,9,Ciudad De México,5,Gustavo A. Madero
170,9006,9,Ciudad De México,6,Iztacalco


In [16]:
# Joining geographic information with the socioeconomic status
population_df = states_keys.merge(vivienda_df, how="left", on="ubica_geo")
population_df = population_df.drop(["municipio","folioviv","ubica_geo","entidad"], axis=1)
population_df

Unnamed: 0,desc_ent,des_mun,est_socio
0,Ciudad De México,Azcapotzalco,Medio Alto
1,Ciudad De México,Azcapotzalco,Medio Alto
2,Ciudad De México,Azcapotzalco,Medio Alto
3,Ciudad De México,Azcapotzalco,Medio Alto
4,Ciudad De México,Azcapotzalco,Medio Alto
...,...,...,...
6047,México,San Jose Del Rincon,Bajo
6048,México,San Jose Del Rincon,Bajo
6049,México,San Jose Del Rincon,Bajo
6050,México,San Jose Del Rincon,Bajo


In [17]:
# Normalization
population_df["des_mun"] = population_df["des_mun"].apply(str.lower).apply(ConvertAccentMarks)
population_df["desc_ent"] = population_df["desc_ent"].apply(str.lower)

In [18]:
# Getting counts of the socioeconomic status by municipality
population_group = population_df.groupby(["desc_ent","des_mun"]).value_counts().reset_index()
population_group.rename({0:"Counts"}, inplace=True, axis=1)
population_group.head()

Unnamed: 0,desc_ent,des_mun,est_socio,Counts
0,ciudad de méxico,alvaro obregon,Medio Bajo,104
1,ciudad de méxico,alvaro obregon,Medio Alto,80
2,ciudad de méxico,alvaro obregon,Alto,24
3,ciudad de méxico,azcapotzalco,Medio Alto,47
4,ciudad de méxico,azcapotzalco,Medio Bajo,40


In [19]:
# Pivoting the est_socio values into separate columns holding the counts of
# each socioeconomic status
population_est_socio = population_group.pivot_table("Counts", ["desc_ent","des_mun"], "est_socio")
population_est_socio = population_est_socio.reset_index()
population_est_socio = population_est_socio[["desc_ent","des_mun","Bajo","Medio Bajo","Medio Alto","Alto"]]
population_est_socio.head()

est_socio,desc_ent,des_mun,Bajo,Medio Bajo,Medio Alto,Alto
0,ciudad de méxico,alvaro obregon,,104.0,80.0,24.0
1,ciudad de méxico,azcapotzalco,,40.0,47.0,23.0
2,ciudad de méxico,benito juarez,,,54.0,39.0
3,ciudad de méxico,coyoacan,,53.0,60.0,43.0
4,ciudad de méxico,cuajimalpa de morelos,,48.0,22.0,11.0
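
The group, count and pivot steps can also be sketched as a single `pd.crosstab` call. This is an alternative formulation on toy data (made-up rows), not the method used above, and it differs in one respect: combinations with no households show up as 0 rather than as missing values.

```python
import pandas as pd

# Toy household rows (made-up), one row per surveyed household
households = pd.DataFrame({
    "desc_ent": ["ciudad de méxico"] * 5,
    "des_mun": ["azcapotzalco", "azcapotzalco", "azcapotzalco",
                "benito juarez", "benito juarez"],
    "est_socio": ["Medio Alto", "Medio Alto", "Alto", "Alto", "Medio Alto"],
})

# Count households per municipality and socioeconomic status in one step
counts = pd.crosstab([households["desc_ent"], households["des_mun"]],
                     households["est_socio"])

# Force a fixed column order, adding absent categories as zero counts
counts = counts.reindex(columns=["Bajo", "Medio Bajo", "Medio Alto", "Alto"],
                        fill_value=0)
counts = counts.reset_index()
print(counts)
```

Whether 0 or a missing value is the better representation depends on how the counts are read later: 0 says no sampled household fell in that category, which is not the same as saying the category does not exist in the municipality.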


In [20]:
# Casting the count columns to nullable integers (Int32)
population_est_socio[["Bajo","Medio Bajo","Medio Alto","Alto"]] = population_est_socio[["Bajo","Medio Bajo","Medio Alto","Alto"]].astype("Int32")
population_est_socio.head()

est_socio,desc_ent,des_mun,Bajo,Medio Bajo,Medio Alto,Alto
0,ciudad de méxico,alvaro obregon,,104.0,80,24
1,ciudad de méxico,azcapotzalco,,40.0,47,23
2,ciudad de méxico,benito juarez,,,54,39
3,ciudad de méxico,coyoacan,,53.0,60,43
4,ciudad de méxico,cuajimalpa de morelos,,48.0,22,11


In [21]:
# Joining the oxxo data with the socioeconomic status of the municipality
new_oxxo_df = new_oxxo_df.merge(population_est_socio, how="left",
                  left_on=["Region", "Subregion"], right_on=["desc_ent", "des_mun"])

new_oxxo_df.drop(["des_mun","desc_ent"], axis=1, inplace=True)

In [22]:
new_oxxo_df.head()

Unnamed: 0,latitude,longitude,name,comments,rating,Address,Neighborhood,City,Postal,Subregion,Region,Bajo,Medio Bajo,Medio Alto,Alto
0,19.353753,-99.189937,OXXO Helenico,4.0,4.3,Avenida Revolución,Sn Ángel,Guadalupe Inn,1020,alvaro obregon,ciudad de méxico,,104,80,24
1,19.348316,-99.18552,Oxxo La Paz,13.0,3.5,Avenida Miguel Ángel de Quevedo 36B,,Chimalistac,1050,alvaro obregon,ciudad de méxico,,104,80,24
2,19.351621,-99.185912,OXXO,9.0,3.0,Avenida Vito Alessio Robles 12,,Florida,1030,alvaro obregon,ciudad de méxico,,104,80,24
3,19.341629,-99.203068,Oxxo,27.0,4.0,Calle Veracruz 87,,Progreso Tizapán,1080,alvaro obregon,ciudad de méxico,,104,80,24
4,19.360927,-99.185247,Oxxo,15.0,3.0,Calle Manuel M. Ponce,Sn Ángel,Guadalupe Inn,1020,alvaro obregon,ciudad de méxico,,104,80,24
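
Because this join relies on text keys that were normalized separately on each side, it can help to check which stores found no matching municipality. A minimal sketch with toy frames (made-up rows, reusing the column names from this notebook) uses the `indicator` option of `merge` for that.

```python
import pandas as pd

# Toy data: the second store has no matching municipality on purpose
stores = pd.DataFrame({
    "name": ["OXXO A", "OXXO B"],
    "Region": ["ciudad de méxico", "méxico"],
    "Subregion": ["alvaro obregon", "nezahualcoyotl"],
})
status = pd.DataFrame({
    "desc_ent": ["ciudad de méxico"],
    "des_mun": ["alvaro obregon"],
    "Medio Alto": [80],
})

# indicator=True adds a "_merge" column telling where each row came from
checked = stores.merge(status, how="left",
                       left_on=["Region", "Subregion"],
                       right_on=["desc_ent", "des_mun"],
                       indicator=True)

# Rows marked "left_only" are stores without socioeconomic data
unmatched = checked.loc[checked["_merge"] == "left_only", ["Region", "Subregion"]]
print(unmatched)
```

An empty `unmatched` frame would suggest that the normalization covered every name; otherwise the listed pairs point at spellings that still differ between the two sources.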


In [23]:
new_oxxo_df.to_csv("../data/oxxo/oxxo_data.csv", index=False)

## Parsing information about income and other details in Mexico City

In [24]:
cdmx_trim_inc = gpd.read_file("../data/oxxo/ingresos_trimestrales/ingresos_trimestrales/ingresos_trimestrales.shp")
# Original projection is EPSG:32614; reproject to WGS84 (latitude/longitude)
cdmx_trim_inc = cdmx_trim_inc.to_crs("WGS84")
cdmx_trim_inc.head()

Unnamed: 0,CVEGEO,IngMon_14,ID,geometry
0,0900200010383,34629.3,1,"POLYGON ((-99.15142 19.47910, -99.15152 19.478..."
1,0900200010379,33061.7,2,"POLYGON ((-99.15813 19.47749, -99.15824 19.477..."
2,090020001067A,44264.7,3,"POLYGON ((-99.18126 19.47366, -99.18115 19.473..."
3,0900200010699,47045.3,4,"POLYGON ((-99.17616 19.46819, -99.17657 19.467..."
4,0900200010260,59562.1,5,"POLYGON ((-99.19685 19.49267, -99.19640 19.492..."


In [25]:
# Raw string avoids an invalid "\d" escape; the first five digits identify the borough
cdmx_trim_inc["CVEGEO_alcaldia"] = cdmx_trim_inc.CVEGEO.str.extract(r"(^\d{5})")
cdmx_trim_inc = cdmx_trim_inc.rename_geometry("geometry_colonias_cdmx")
cdmx_trim_inc.head()

Unnamed: 0,CVEGEO,IngMon_14,ID,geometry_colonias_cdmx,CVEGEO_alcaldia
0,0900200010383,34629.3,1,"POLYGON ((-99.15142 19.47910, -99.15152 19.478...",9002
1,0900200010379,33061.7,2,"POLYGON ((-99.15813 19.47749, -99.15824 19.477...",9002
2,090020001067A,44264.7,3,"POLYGON ((-99.18126 19.47366, -99.18115 19.473...",9002
3,0900200010699,47045.3,4,"POLYGON ((-99.17616 19.46819, -99.17657 19.467...",9002
4,0900200010260,59562.1,5,"POLYGON ((-99.19685 19.49267, -99.19640 19.492...",9002
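
The five-digit prefix works as a borough key because of how these codes appear to be built. The sketch below assumes the usual INEGI layout of two state digits, three municipality digits, four locality digits and a four-character AGEB key; that layout is an assumption here, since the source file does not document it, and `split_cvegeo` is a hypothetical helper.

```python
import re

# Assumed layout: state (2) + municipality (3) + locality (4) + AGEB (4)
CVEGEO_PATTERN = re.compile(r"^(\d{2})(\d{3})(\d{4})(\w{4})$")


def split_cvegeo(cvegeo):
    """Split a 13-character CVEGEO string into its assumed components."""
    match = CVEGEO_PATTERN.match(cvegeo)
    if match is None:
        raise ValueError(f"Unexpected CVEGEO format: {cvegeo!r}")
    state, municipality, locality, ageb = match.groups()
    return {
        "state": state,
        "municipality": municipality,
        "locality": locality,
        "ageb": ageb,
        # Same five characters that the str.extract call keeps
        "borough_key": state + municipality,
    }


parts = split_cvegeo("090020001067A")
print(parts)
```

The key is kept as a string so that the leading zero survives; converting it to an integer would turn "09002" into 9002.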


In [26]:
cdmx_trim_inc.drop("ID", axis=1, inplace=True)
cdmx_trim_inc.explore("geometry_colonias_cdmx", legend=False)

In [27]:
cdmx_poly_alc = gpd.read_file("../data/oxxo/poligonos_alcaldias/poligonos_alcaldias_cdmx.shp")
cdmx_poly_alc.head()

Unnamed: 0,CVEGEO,CVE_ENT,CVE_MUN,NOMGEO,geometry
0,9002,9,2,Azcapotzalco,"POLYGON ((-99.18231 19.50748, -99.18229 19.507..."
1,9003,9,3,Coyoacán,"POLYGON ((-99.13427 19.35654, -99.13397 19.356..."
2,9004,9,4,Cuajimalpa de Morelos,"POLYGON ((-99.25738 19.40112, -99.25698 19.400..."
3,9005,9,5,Gustavo A. Madero,"POLYGON ((-99.11124 19.56150, -99.11485 19.557..."
4,9006,9,6,Iztacalco,"POLYGON ((-99.05751 19.40673, -99.05753 19.406..."


In [28]:
cdmx_poly_alc.drop(["CVE_ENT","CVE_MUN"], axis=1, inplace=True)
cdmx_poly_alc = cdmx_poly_alc.rename_geometry("geometry_alcaldia_cdmx")

In [29]:
cdmx_poly_alc.explore("geometry_alcaldia_cdmx", legend=False)

In [30]:
cdmx_polygon = cdmx_poly_alc.merge(cdmx_trim_inc, left_on="CVEGEO", right_on="CVEGEO_alcaldia", how="left")
cdmx_polygon.drop("CVEGEO_x", axis=1, inplace=True)
cdmx_polygon.rename({"CVEGEO_y":"CVEGEO_completo_colonias",
                     "IngMon_14":"ingreso_promedio_trimestral_por_colonia"},
                    inplace=True, axis=1)
cdmx_polygon.head()

Unnamed: 0,NOMGEO,geometry_alcaldia_cdmx,CVEGEO_completo_colonias,ingreso_promedio_trimestral_por_colonia,geometry_colonias_cdmx,CVEGEO_alcaldia
0,Azcapotzalco,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",0900200010383,34629.3,"POLYGON ((-99.15142 19.47910, -99.15152 19.478...",9002
1,Azcapotzalco,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",0900200010379,33061.7,"POLYGON ((-99.15813 19.47749, -99.15824 19.477...",9002
2,Azcapotzalco,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",090020001067A,44264.7,"POLYGON ((-99.18126 19.47366, -99.18115 19.473...",9002
3,Azcapotzalco,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",0900200010699,47045.3,"POLYGON ((-99.17616 19.46819, -99.17657 19.467...",9002
4,Azcapotzalco,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",0900200010260,59562.1,"POLYGON ((-99.19685 19.49267, -99.19640 19.492...",9002


In [31]:
new_col_names = {"NOMGEO":"boroughs_name",
                 "geometry_alcaldia_cdmx":"boroughs_geometry", 
                 "CVEGEO_completo_colonias":"neighborhoods_id", 
                 "ingreso_promedio_trimestral_por_colonia":"neighborhoods_avg_income_quarterly",
                 "geometry_colonias_cdmx": "neighborhoods_geometry", 
                 "CVEGEO_alcaldia": "boroughs_id"}

cdmx_polygon = cdmx_polygon.rename(new_col_names, axis=1)

friendly_order = cdmx_polygon.columns.sort_values()

cdmx_polygon = cdmx_polygon[friendly_order]
cdmx_polygon.head()

Unnamed: 0,boroughs_geometry,boroughs_id,boroughs_name,neighborhoods_avg_income_quarterly,neighborhoods_geometry,neighborhoods_id
0,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",9002,Azcapotzalco,34629.3,"POLYGON ((-99.15142 19.47910, -99.15152 19.478...",0900200010383
1,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",9002,Azcapotzalco,33061.7,"POLYGON ((-99.15813 19.47749, -99.15824 19.477...",0900200010379
2,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",9002,Azcapotzalco,44264.7,"POLYGON ((-99.18126 19.47366, -99.18115 19.473...",090020001067A
3,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",9002,Azcapotzalco,47045.3,"POLYGON ((-99.17616 19.46819, -99.17657 19.467...",0900200010699
4,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",9002,Azcapotzalco,59562.1,"POLYGON ((-99.19685 19.49267, -99.19640 19.492...",0900200010260


In [32]:
popu_characteristic = pd.read_csv("../data/oxxo/c_demograficas_total_alcaldia.csv", 
                                  usecols=["alcaldia","poblacion"])
popu_characteristic.head()

Unnamed: 0,alcaldia,poblacion
0,AZCAPOTZALCO,432205
1,COYOACAN,614447
2,CUAJIMALPA DE MORELOS,217686
3,GUSTAVO A. MADERO,1173351
4,IZTACALCO,404695


In [33]:
popu_characteristic.alcaldia.unique()

array(['AZCAPOTZALCO', 'COYOACAN', 'CUAJIMALPA DE MORELOS',
       'GUSTAVO A. MADERO', 'IZTACALCO', 'IZTAPALAPA',
       'LA MAGDALENA CONTRERAS', 'MILPA ALTA', 'ALVARO OBREGON',
       'TLAHUAC', 'TLALPAN', 'XOCHIMILCO', 'BENITO JUAREZ', 'CUAUHTEMOC',
       'MIGUEL HIDALGO', 'VENUSTIANO CARRANZA'], dtype=object)

In [34]:
cdmx_polygon.boroughs_name.unique()

array(['Azcapotzalco', 'Coyoacán', 'Cuajimalpa de Morelos',
       'Gustavo A. Madero', 'Iztacalco', 'Iztapalapa',
       'La Magdalena Contreras', 'Milpa Alta', 'Álvaro Obregón',
       'Tláhuac', 'Tlalpan', 'Xochimilco', 'Benito Juárez', 'Cuauhtémoc',
       'Miguel Hidalgo', 'Venustiano Carranza'], dtype=object)

In [35]:
popu_characteristic["alcaldia"] = popu_characteristic.alcaldia.str.lower()
cdmx_polygon["boroughs_name"] = cdmx_polygon.boroughs_name.str.lower().apply(ConvertAccentMarks)

In [36]:
popu_characteristic.alcaldia.unique() == cdmx_polygon.boroughs_name.unique()

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True])
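
The element-wise comparison above only works when both arrays have the same length and list the boroughs in the same order. A sketch of an order-independent check compares the names as sets; the lists below are a shortened, reordered sample of the names shown in this notebook.

```python
# Shortened sample of the normalized names, deliberately in different orders
population_names = ["azcapotzalco", "coyoacan", "alvaro obregon", "tlahuac"]
polygon_names = ["tlahuac", "alvaro obregon", "azcapotzalco", "coyoacan"]

# Names present on one side but not the other; both should be empty
only_in_population = sorted(set(population_names) - set(polygon_names))
only_in_polygons = sorted(set(polygon_names) - set(population_names))

print(only_in_population, only_in_polygons)
```

Any name left in either list is a borough that would get no population value, or no polygon, in the merge that follows.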

In [37]:
cdmx_polygon = cdmx_polygon.merge(popu_characteristic, 
                                  left_on="boroughs_name",
                                  right_on="alcaldia",
                                  how="left")

cdmx_polygon.drop("alcaldia", axis=1, inplace=True)
cdmx_polygon.rename({"poblacion":"boroughs_population"}, axis=1, inplace=True)
cdmx_polygon.head()

Unnamed: 0,boroughs_geometry,boroughs_id,boroughs_name,neighborhoods_avg_income_quarterly,neighborhoods_geometry,neighborhoods_id,boroughs_population
0,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",9002,azcapotzalco,34629.3,"POLYGON ((-99.15142 19.47910, -99.15152 19.478...",0900200010383,432205
1,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",9002,azcapotzalco,33061.7,"POLYGON ((-99.15813 19.47749, -99.15824 19.477...",0900200010379,432205
2,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",9002,azcapotzalco,44264.7,"POLYGON ((-99.18126 19.47366, -99.18115 19.473...",090020001067A,432205
3,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",9002,azcapotzalco,47045.3,"POLYGON ((-99.17616 19.46819, -99.17657 19.467...",0900200010699,432205
4,"POLYGON ((-99.18231 19.50748, -99.18229 19.507...",9002,azcapotzalco,59562.1,"POLYGON ((-99.19685 19.49267, -99.19640 19.492...",0900200010260,432205


In [38]:
boroughs = cdmx_polygon[["boroughs_id", "boroughs_name",
                         "boroughs_population", "boroughs_geometry"]]

boroughs = gpd.GeoDataFrame(boroughs, geometry="boroughs_geometry")
boroughs.to_file("../data/oxxo/boroughs_cdmx.geojson", driver="GeoJSON", 
                 index=True, index_label="id")

In [39]:
neighborhoods = cdmx_polygon[["neighborhoods_id",
                             "neighborhoods_avg_income_quarterly",
                             "neighborhoods_geometry"]]

neighborhoods = gpd.GeoDataFrame(neighborhoods, geometry="neighborhoods_geometry")
neighborhoods.to_file("../data/oxxo/neighborhoods_cdmx.geojson", driver="GeoJSON", 
                      index=True, index_label="id")