<a href="https://colab.research.google.com/github/Jennlg/Ingenier-a_de_Caracteristicas/blob/main/Ing_Car.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ***Consumer Confidence 2022***

---
This notebook downloads and processes the data from the 2022 National Consumer Confidence Survey (ENCO) so that it can later be compared with the data from the National Household Income and Expenditure Survey (ENIGH) for the same year.


First, we load the libraries required for the process.

In [None]:
import requests
import zipfile
import os
import pandas as pd
from io import BytesIO

The following function downloads and unzips the 2022 .csv files we are interested in, one archive per month.

In [None]:
# URLs of the files to be downloaded
base_url = "https://www.inegi.org.mx/contenidos/programas/enco/datosabiertos/2022/"
urls = [f"{base_url}conjunto_de_datos_enco_2022_{str(i).zfill(2)}_csv.zip" for i in range(1, 13)]

# Function to download and extract zip files
def descargar_y_extraer_zip(url, extract_path='/content/enco_2022'):
    # A timeout keeps a stalled connection from hanging the notebook
    response = requests.get(url, timeout=60)
    if response.status_code == 200:
        with zipfile.ZipFile(BytesIO(response.content)) as z:
            z.extractall(extract_path)
        print(f"Files extracted from {url}")
    else:
        print(f"Error downloading {url} (HTTP {response.status_code})")

# Downloading and extracting the files in the enco_2022 folder
os.makedirs('/content/enco_2022', exist_ok=True)

for url in urls:
    descargar_y_extraer_zip(url)

Files extracted from https://www.inegi.org.mx/contenidos/programas/enco/datosabiertos/2022/conjunto_de_datos_enco_2022_01_csv.zip
Files extracted from https://www.inegi.org.mx/contenidos/programas/enco/datosabiertos/2022/conjunto_de_datos_enco_2022_02_csv.zip
Files extracted from https://www.inegi.org.mx/contenidos/programas/enco/datosabiertos/2022/conjunto_de_datos_enco_2022_03_csv.zip
Files extracted from https://www.inegi.org.mx/contenidos/programas/enco/datosabiertos/2022/conjunto_de_datos_enco_2022_04_csv.zip
Files extracted from https://www.inegi.org.mx/contenidos/programas/enco/datosabiertos/2022/conjunto_de_datos_enco_2022_05_csv.zip
Files extracted from https://www.inegi.org.mx/contenidos/programas/enco/datosabiertos/2022/conjunto_de_datos_enco_2022_06_csv.zip
Files extracted from https://www.inegi.org.mx/contenidos/programas/enco/datosabiertos/2022/conjunto_de_datos_enco_2022_07_csv.zip
Files extracted from https://www.inegi.org.mx/contenidos/programas/enco/datosabier
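
The download step above assumes that every successful response is a valid zip archive. As a minimal sketch, the helper below (a hypothetical `extraer_zip_en_memoria`, not part of the original notebook) checks the payload with `zipfile.is_zipfile` before extracting and returns the names of the extracted members. It is exercised with an archive built in memory, so it runs without network access; the file name and contents are invented for illustration.

```python
import os
import tempfile
import zipfile
from io import BytesIO


def extraer_zip_en_memoria(contenido, extract_path):
    """Extract zip bytes into extract_path; return member names ([] if not a zip)."""
    buffer = BytesIO(contenido)
    # is_zipfile accepts a file-like object and looks for the zip signature
    if not zipfile.is_zipfile(buffer):
        return []
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as z:
        z.extractall(extract_path)
        return z.namelist()


# Build a small archive in memory to stand in for a downloaded file
archivo = BytesIO()
with zipfile.ZipFile(archivo, "w") as z:
    z.writestr("demo/datos.csv", "fol,ent\n12A207,1\n")

destino = tempfile.mkdtemp()
extraidos = extraer_zip_en_memoria(archivo.getvalue(), destino)
no_zip = extraer_zip_en_memoria(b"<html>error page</html>", destino)
print(extraidos, no_zip)
```

A check like this could help when a server answers with status 200 but returns an error page instead of the archive.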

Now we load, filter, and merge the ENCO 2022 datasets for each month. Specific columns are read from three types of datasets (cs, viv, cb), the three are merged on their common columns, and the monthly results are concatenated into a final tidy DataFrame. Finally, the processed dataset is saved to a CSV file for later analysis.

In [None]:
# Define the common and specific columns for each dataset
columnas_comunes = ['fol', 'ent', 'con', 'v_sel', 'n_hog', 'h_mud']

# Specific columns for each file
viv_especificas = ['mpio','ageb', 'fch_def']
cs_especificas = ['i_per', 'ing']
cb_especificas = [f'p{i}' for i in range(1, 16)]  # 'p1' through 'p15' for 'cb'

# Define the columns to be used in each dataset
viv_cols = columnas_comunes + viv_especificas
cs_cols = columnas_comunes + cs_especificas
cb_cols = columnas_comunes + cb_especificas

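# Function to select relevant columns
def seleccionar_columnas(df, columnas_relevantes):
    # A missing file yields an empty DataFrame; return it as-is to avoid a KeyError
    if df.empty:
        return df
    return df[columnas_relevantes]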

# Base folder where the downloaded files were extracted
base_path = '/content/enco_2022'

# Function to read the DataFrame for a given month and dataset type
def cargar_datos(mes, tipo):
    file_name = f'conjunto_de_datos_{tipo}_enco_2022_{str(mes).zfill(2)}.CSV'
    folder_path = f'{base_path}/conjunto_de_datos_{tipo}_enco_2022_{str(mes).zfill(2)}/conjunto_de_datos'
    file_path = os.path.join(folder_path, file_name)

    if os.path.exists(file_path):
        return pd.read_csv(file_path)
    else:
        print(f"File not found: {file_name}")
        return pd.DataFrame()  # Return an empty DataFrame if the file is not found

# Store the filtered DataFrames for each dataset
cs_enco_filtrado = [seleccionar_columnas(cargar_datos(i, 'cs'), cs_cols) for i in range(1, 13)]
viv_enco_filtrado = [seleccionar_columnas(cargar_datos(i, 'viv'), viv_cols) for i in range(1, 13)]
cb_enco_filtrado = [seleccionar_columnas(cargar_datos(i, 'cb'), cb_cols) for i in range(1, 13)]

# Merge the DataFrames by the common columns and collect the monthly results
resultados_mensuales = []

for cs_df, viv_df, cb_df in zip(cs_enco_filtrado, viv_enco_filtrado, cb_enco_filtrado):
    # Only merge if the DataFrames are not empty
    if not cs_df.empty and not viv_df.empty and not cb_df.empty:
        temp = pd.merge(cs_df, viv_df, on=columnas_comunes, how='inner')
        temp = pd.merge(temp, cb_df, on=columnas_comunes, how='inner')
        resultados_mensuales.append(temp)

# Concatenate once at the end instead of growing df_final inside the loop
df_final = pd.concat(resultados_mensuales, ignore_index=True) if resultados_mensuales else pd.DataFrame()

# Display the first rows of the final DataFrame
df_final.head()

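| | fol | ent | con | v_sel | n_hog | h_mud | i_per | ing | mpio | ageb | ... | p6 | p7 | p8 | p9 | p10 | p11 | p12 | p13 | p14 | p15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12A207 | 1 | 40007 | 3 | 1 | 0 | 1.0 | 1200.0 | 1 | 049-8 | ... | 4 | 3 | 3 | 2 | 2 | 4 | 6 | 4 | 3 | 3 |
| 1 | 12A207 | 1 | 40007 | 4 | 1 | 0 | 3.0 | 5000.0 | 1 | 049-8 | ... | 3 | 3 | 3 | 2 | 2 | 2 | 4 | 4 | 3 | 2 |
| 2 | 12A207 | 1 | 40007 | 3 | 1 | 0 | 1.0 | 1300.0 | 1 | 049-8 | ... | 4 | 3 | 3 | 2 | 2 | 4 | 6 | 4 | 3 | 3 |
| 3 | 12A207 | 1 | 40007 | 3 | 1 | 0 | 1.0 | 1000.0 | 1 | 049-8 | ... | 4 | 3 | 3 | 2 | 2 | 4 | 6 | 4 | 3 | 3 |
| 4 | 12A207 | 1 | 40007 | 3 | 1 | 0 |  |  | 1 | 049-8 | ... | 4 | 3 | 3 | 2 | 2 | 4 | 6 | 4 | 3 | 3 |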
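
In the rows above, the same household keys repeat with different `ing` values. This suggests (an assumption, not confirmed by the source) that `cs` holds several records per household while `viv` holds one. A minimal sketch on invented toy data shows how the `validate` argument of `pd.merge` can assert that expectation, so that an unexpected duplicate key raises an error instead of silently multiplying rows. The shortened key list and the `_demo` frames are hypothetical.

```python
import pandas as pd

claves = ["fol", "v_sel"]  # shortened key list, for illustration only

# Toy stand-ins: two records for one household, one record for another
cs_demo = pd.DataFrame({
    "fol": ["A1", "A1", "B2"],
    "v_sel": [1, 1, 2],
    "ing": [1200.0, 5000.0, 800.0],
})
viv_demo = pd.DataFrame({
    "fol": ["A1", "B2"],
    "v_sel": [1, 2],
    "mpio": [1, 7],
})

# many_to_one: keys may repeat on the left but must be unique on the right
unido = pd.merge(cs_demo, viv_demo, on=claves, how="inner", validate="many_to_one")
print(len(unido))

# A duplicated key on the right side violates the expectation and raises MergeError
viv_duplicado = pd.concat([viv_demo, viv_demo.iloc[[0]]], ignore_index=True)
try:
    pd.merge(cs_demo, viv_duplicado, on=claves, validate="many_to_one")
    fallo = False
except pd.errors.MergeError:
    fallo = True
print(fallo)
```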


Next, we look at the descriptive statistics of the data and count the missing values in each column.

In [None]:
# Getting basic statistics from the final DataFrame
print("Basic statistics of the final DataFrame:")
df_final.describe(include='all')

Basic statistics of the final DataFrame:


| | fol | ent | con | v_sel | n_hog | h_mud | i_per | ing | mpio | ageb | ... | p6 | p7 | p8 | p9 | p10 | p11 | p12 | p13 | p14 | p15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 80577 | 80577.0 | 80577.0 | 80577.0 | 80577.0 | 80577.0 | 80577.0 | 35017.0 | 80577.0 | 80577 | ... | 80577.0 | 80577.0 | 80577.0 | 80577.0 | 80577.0 | 80577.0 | 80577.0 | 80577.0 | 80577.0 | 80577.0 |
| unique | 81 |  |  |  |  |  | 6.0 |  |  | 1778 | ... |  |  |  |  |  |  |  |  |  |  |
| top | 11A201 |  |  |  |  |  |  |  |  | 036-3 | ... |  |  |  |  |  |  |  |  |  |  |
| freq | 1818 |  |  |  |  |  | 45118.0 |  |  | 236 | ... |  |  |  |  |  |  |  |  |  |  |
| mean |  | 15.686375 | 40285.871129 | 2.49668 | 1.0 | 0.043102 |  | 399391.851244 | 41.892587 |  | ... | 3.172891 | 2.303176 | 2.51748 | 1.726895 | 1.759063 | 3.033806 | 5.099619 | 3.135746 | 2.777021 | 2.676334 |
| std |  | 7.799257 | 588.671605 | 1.116164 | 0.0 | 0.21703 |  | 487287.000362 | 52.820117 |  | ... | 1.010005 | 0.753374 | 0.635385 | 0.473536 | 0.613592 | 0.876353 | 1.089937 | 1.05202 | 0.568731 | 0.668834 |
| min |  | 1.0 | 22251.0 | 1.0 | 1.0 | 0.0 |  | 25.0 | 1.0 |  | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 25% |  | 9.0 | 40141.0 | 1.0 | 1.0 | 0.0 |  | 1900.0 | 6.0 |  | ... | 2.0 | 2.0 | 2.0 | 1.0 | 1.0 | 2.0 | 4.0 | 2.0 | 3.0 | 3.0 |
| 50% |  | 15.0 | 40266.0 | 3.0 | 1.0 | 0.0 |  | 6500.0 | 26.0 |  | ... | 3.0 | 2.0 | 3.0 | 2.0 | 2.0 | 3.0 | 6.0 | 3.0 | 3.0 | 3.0 |
| 75% |  | 20.0 | 40382.0 | 3.0 | 1.0 | 0.0 |  | 999999.0 | 51.0 |  | ... | 4.0 | 3.0 | 3.0 | 2.0 | 2.0 | 4.0 | 6.0 | 4.0 | 3.0 | 3.0 |
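
The 75th percentile of `ing` is 999999 and its mean is far above the median, which suggests that 999999 is a code rather than an actual income. Assuming that is the case (the ENCO data dictionary should be checked to confirm it), a minimal sketch of recoding that value as missing before computing statistics could look like this. The constant name and the data are invented for illustration.

```python
import numpy as np
import pandas as pd

CODIGO_NO_ESPECIFICADO = 999999  # assumed sentinel; verify against the data dictionary

ing_demo = pd.Series([1200.0, 5000.0, 999999.0, np.nan, 6500.0], name="ing")

# mask() replaces the entries where the condition holds with NaN
ing_limpio = ing_demo.mask(ing_demo == CODIGO_NO_ESPECIFICADO)

# Compare the mean before and after recoding the assumed sentinel
print(ing_demo.mean(), ing_limpio.mean())
```

If the assumption holds, the same `mask` call applied to `df_final['ing']` would keep the coded entries from dominating the mean and the upper quartile.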


In [None]:
# Check for missing data
print("Missing data in the final DataFrame:")
missing_data = df_final.isnull().sum()
missing_data

Missing data in the final DataFrame:


| Column | Missing values |
|---|---|
| fol | 0 |
| ent | 0 |
| con | 0 |
| v_sel | 0 |
| n_hog | 0 |
| h_mud | 0 |
| i_per | 0 |
| ing | 45560 |
| mpio | 0 |
| ageb | 0 |
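
`isnull()` only counts true missing values. In the statistics above, `i_per` has no missing entries, yet its most frequent value is displayed as empty, which may mean that blanks are stored as whitespace strings (an assumption; the raw files should be inspected to confirm it). A minimal sketch on invented data converts whitespace-only strings to NaN and reports the share of missing values per column:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "i_per": ["1", " ", "3", " "],
    "ing": [1200.0, np.nan, 5000.0, np.nan],
    "mpio": [1, 1, 7, 7],
})

# Blank strings are not NaN, so isna() does not count them
faltantes_originales = demo.isna().sum()

# Treat empty or whitespace-only strings as missing
demo_limpio = demo.replace(r"^\s*$", np.nan, regex=True)

# Share of missing values per column, as a percentage
porcentaje_faltante = demo_limpio.isna().mean() * 100
print(porcentaje_faltante)
```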


In [None]:
# Save the processed tidy DataFrame
df_final.to_csv('/content/enco_2022/processed_enco_tidy.csv', index=False)