In [1]:
import pandas as pd

# DataKind Financial Inclusion - Mexico

https://en.www.inegi.org.mx/app/scitel/Default?ev=9

## Scraping INEGI Data

I initially tried the API and other tools linked from the main INEGI website, but had no luck finding municipal-level data. I have deleted the related code to keep this notebook clean.

Through a Google search, I found the SCITEL app linked above, which gives access to multiple datasets at the municipal (and even more specific) level. Changing the "ev" parameter switches between datasets; most appear to be older censuses. Some datasets seem to go down to the city-block level! The most interesting ones I found:

- https://en.www.inegi.org.mx/app/scitel/Default?ev=9 (2020 Population and Housing Census)
- https://en.www.inegi.org.mx/app/scitel/Default?ev=8 (2014 Infrastructure data)

The data for all 2469 municipalities can be obtained by adding two filters using the "Advanced Filter" function:

 LOC = '0000' Y MUN ENTRE 001 AND 999

This selects rows with a locality code of 0000 and a municipality code between 001 and 999. (A municipality code of 000 gives totals for the entire state, just as a locality code of 0000 gives totals for the whole municipality.)

**WARNING**: If exporting from the SCITEL data portal, watch out for the municipality "Heroica Villa Tezoatlán de Segura y Luna, Cuna de la Independencia de Oaxaca". I'm not sure if it happened during the initial CSV generation or at some other point, but the comma led the name to be split between two cells and shifted all the data over by one column. Apparently this is an honorific name and the municipality is generally just known as Tezoatlán de Segura y Luna.
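
One way to catch and repair this kind of shifted row is to compare each row's field count against the header before handing the file to pandas. The sketch below is only an illustration: the column names in the tiny sample and all numeric values are placeholders, and it assumes the stray comma sits in the municipality-name column (here called `NOM_MUN`).

```python
import csv
import io

# Tiny stand-in for the export; the codes and the population are placeholders.
raw = (
    "ENTIDAD,NOM_ENT,MUN,NOM_MUN,POBTOT\n"
    "20,Oaxaca,999,Heroica Villa Tezoatlán de Segura y Luna, Cuna de la Independencia de Oaxaca,10000\n"
    "20,Oaxaca,998,Another Municipality,5000\n"
)


def repair_split_names(text, name_col="NOM_MUN"):
    """Re-join a name that an unquoted comma split across several cells."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    idx = header.index(name_col)
    fixed = []
    for row in body:
        extra = len(row) - len(header)  # number of surplus cells in this row
        if extra > 0:
            # Assumption: the surplus cells all belong to the name column.
            name = ",".join(row[idx : idx + extra + 1])
            row = row[:idx] + [name] + row[idx + extra + 1 :]
        fixed.append(row)
    return header, fixed


header, fixed = repair_split_names(raw)
print(fixed[0])
```

Rows that already have the right number of fields pass through untouched, so the same function can be run over the whole export.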


I downloaded what seemed to be the most important, general socioeconomic and demographic data, but there is more fine-grained info available. The data file was exported with inscrutable variable nicknames instead of full column titles, so I manually replaced these with some messy copy/pasting from the website.
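
For anyone repeating the export, the manual replacement could also be done with a lookup table. This is a minimal sketch, not what was done here: the nickname-to-title pairs below are assumed examples and should be checked against the data dictionary on the INEGI site, and `XYZ` is a made-up column used to show how unmapped nicknames can be spotted.

```python
import pandas as pd

# Assumed examples of nickname -> full title pairs; verify before relying on them.
nickname_to_title = {
    "POBTOT": "Población total",
    "P_60YMAS": "Población de 60 años y más",
    "VIVTOT": "Total de viviendas",
}

# Toy export with placeholder values and one nickname missing from the lookup.
raw_df = pd.DataFrame({"POBTOT": [100], "P_60YMAS": [12], "VIVTOT": [30], "XYZ": [1]})

renamed_df = raw_df.rename(columns=nickname_to_title)

# Nicknames that still need a title, so nothing is silently left cryptic.
unmapped = [col for col in raw_df.columns if col not in nickname_to_title]
print(unmapped)
```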

In [2]:
# ITER stands for Información por entidad, municipio, localidad y AGEB urbana del Censo de Población y Vivienda
# (Information by state, municipality, locality, and urban AGEB from the Population and Housing Census)
# I didn't realize until later I'd need more detailed population / age info to properly calculate e.g. literacy rates. I manually added them to the file from a separate export

# Forward slashes keep the path working on both Windows and Unix-like systems
INEGI_df = pd.read_csv('datasets/ITER2020 - Nacional_columntitles.csv')

# Create a municipal code in the same format as the CNBV data set
INEGI_df['mun_code'] = INEGI_df['ENTIDAD'].astype(str) + INEGI_df['MUN'].astype(str).str.zfill(3)

# Some basic cleanup
INEGI_df = INEGI_df.drop(columns = ['LOC', 'MUN', 'ENTIDAD', 'NOM_LOC'])

# Move municipal code to the front
last_col = INEGI_df.columns[-1]
INEGI_df = INEGI_df[[last_col] + list(INEGI_df.columns[:-1])]


In [3]:
# Translated with ChatGPT
translations = {
    'Total de viviendas particulares habitadas con características': 'Total Occupied Private Households with Characteristics (2020)', # This value should be used to calculate e.g. "% of Households with Electricity"
    'Población total': 'Total Population (2020)',
    'Población de 60 años y más': 'Population Aged 60+ (2020)',
    'Promedio de hijas e hijos nacidos vivos': 'Average Number of Live-born Children per Woman (2020)',
    'Población nacida en otra entidad': 'Population Born in Another State (2020)',
    'Población de 5 años y más que habla alguna lengua indígena': 'Population Aged 5+ That Speaks an Indigenous Language (2020)',
    'Población de 5 años y más que habla alguna lengua indígena y no habla español': 'Population Aged 5+ That Speaks Only an Indigenous Language (2020)',
    'Población en hogares censales indígenas': 'Population in Indigenous Census Households (2020)',
    'Población que se considera afromexicana o afrodescendiente': 'Population That Identifies as Afro-Mexican or Afrodescendant (2020)',
    'Población con discapacidad': 'Population with Disabilities (2020)',
    'Población de 15 años y más analfabeta': 'Population Aged 15+ That Is Illiterate (2020)',
    'Población de 15 años y más sin escolaridad': 'Population Aged 15+ Without Schooling (2020)',
    'Población de 15 años y más con primaria completa': 'Population Aged 15+ with Completed Primary Education (2020)',
    'Población de 15 años y más con secundaria completa': 'Population Aged 15+ with Completed Secondary Education (2020)',
    'Grado promedio de escolaridad': 'Average Years of Schooling (2020)',
    'Población de 18 años y más con educación posbásica': 'Population Aged 18+ with Post-Basic Education (2020)',
    'Población de 12 años y más económicamente activa': 'Economically Active Population Aged 12+ (2020)',
    'Población de 12 años y más no económicamente activa': 'Economically Inactive Population Aged 12+ (2020)',
    'Población de 12 años y más desocupada': 'Unemployed Population Aged 12+ (2020)',
    'Población sin afiliación a servicios de salud': 'Population Without Health Service Affiliation (2020)',
    'Población de 12 años y más casada o unida': 'Population Aged 12+ That Is Married or Cohabiting (2020)',
    'Población con religión católica': 'Population with Catholic Religion (2020)',
    'Población con grupo religioso protestante/cristiano evangélico': 'Population with Protestant/Evangelical Christian Religion (2020)',
    'Población con otras religiones diferentes a las anteriores': 'Population with Other Religions (2020)',
    'Población sin religión o sin adscripción religiosa': 'Population Without Religion or Religious Affiliation (2020)',
    'Total de viviendas': 'Total Dwellings (2020)',
    'Total de viviendas habitadas': 'Total Occupied Dwellings (2020)',
    'Total de viviendas particulares': 'Total Private Dwellings (2020)',
    'Viviendas particulares habitadas': 'Occupied Private Dwellings (2020)',
    'Total de viviendas particulares habitadas': 'Total Occupied Private Dwellings (2020)',
    'Promedio de ocupantes en viviendas particulares habitadas': 'Average Occupants per Occupied Private Dwelling (2020)',
    'Promedio de ocupantes por cuarto en viviendas particulares habitadas': 'Average Occupants per Room in Occupied Private Dwellings (2020)',
    'Viviendas particulares habitadas con piso de material diferente de tierra': 'Occupied Private Dwellings with Flooring Material Other Than Dirt (2020)',
    'Viviendas particulares habitadas con piso de tierra': 'Occupied Private Dwellings with Dirt Floor (2020)',
    'Viviendas particulares habitadas que disponen de energía eléctrica': 'Occupied Private Dwellings with Electricity (2020)',
    'Viviendas particulares habitadas que disponen de agua entubada en el ámbito de la vivienda': 'Occupied Private Dwellings with Running Water Inside the Home (2020)',
    'Viviendas particulares habitadas que disponen de excusado o sanitario': 'Occupied Private Dwellings with Toilet or Latrine (2020)',
    'Viviendas particulares habitadas que no disponen de energía eléctrica, agua entubada, ni drenaje': 'Occupied Private Dwellings Without Electricity, Water, or Drainage (2020)',
    'Viviendas particulares habitadas que no disponen de automóvil o camioneta, ni de motocicleta o motoneta': 'Occupied Private Dwellings Without Car, Truck, Motorcycle, or Scooter (2020)',
    'Viviendas particulares habitadas que disponen de refrigerador': 'Occupied Private Dwellings with Refrigerator (2020)',
    'Viviendas particulares habitadas que disponen de computadora, laptop o tablet': 'Occupied Private Dwellings with Computer, Laptop, or Tablet (2020)',
    'Viviendas particulares habitadas que disponen de teléfono celular': 'Occupied Private Dwellings with Cell Phone (2020)',
    'Viviendas particulares habitadas que disponen de Internet': 'Occupied Private Dwellings with Internet Access (2020)',
}

INEGI_df = INEGI_df.rename(columns = translations)
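
The comment on the first dictionary entry says which denominator to use for household-level shares. A minimal sketch of that calculation follows, using a small stand-in frame rather than the full `INEGI_df`; the helper name `pct_of_households` and the output column names are hypothetical.

```python
import pandas as pd

households = "Total Occupied Private Households with Characteristics (2020)"
electricity = "Occupied Private Dwellings with Electricity (2020)"
internet = "Occupied Private Dwellings with Internet Access (2020)"

# Two rows of stand-in data for illustration.
sample = pd.DataFrame({
    "NOM_MUN": ["Aguascalientes", "Asientos"],
    households: [266508, 12522],
    electricity: [265785, 12420],
    internet: [178619, 4526],
})


def pct_of_households(df, count_col):
    """Share of households with a given characteristic, in percent."""
    return (df[count_col] / df[households] * 100).round(1)


sample["% of Households with Electricity"] = pct_of_households(sample, electricity)
sample["% of Households with Internet Access"] = pct_of_households(sample, internet)
print(sample[["NOM_MUN", "% of Households with Electricity", "% of Households with Internet Access"]])
```

The same pattern should apply to person-level rates such as illiteracy, with the matching age-group population (for example "Population Aged 15+") as the denominator instead of the household count.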

In [4]:
INEGI_df

Unnamed: 0,mun_code,NOM_ENT,NOM_MUN,Total Occupied Private Households with Characteristics (2020),Total Population (2020),Population Aged 5+,Population Aged 12+,Population Aged 15+,Population Aged 18+,Population Aged 60+ (2020),...,Occupied Private Dwellings with Dirt Floor (2020),Occupied Private Dwellings with Electricity (2020),Occupied Private Dwellings with Running Water Inside the Home (2020),Occupied Private Dwellings with Toilet or Latrine (2020),"Occupied Private Dwellings Without Electricity, Water, or Drainage (2020)","Occupied Private Dwellings Without Car, Truck, Motorcycle, or Scooter (2020)",Occupied Private Dwellings with Refrigerator (2020),"Occupied Private Dwellings with Computer, Laptop, or Tablet (2020)",Occupied Private Dwellings with Cell Phone (2020),Occupied Private Dwellings with Internet Access (2020)
0,1001,Aguascalientes,Aguascalientes,266508,948990,871193,756970,707473,657539,102987,...,1530,265785,265232,265261,53,88348,254959,136923,251719,178619
1,1002,Aguascalientes,Asientos,12522,51536,46095,38399,35250,32032,4697,...,184,12420,12390,11968,13,4779,11071,2826,10682,4526
2,1003,Aguascalientes,Calvillo,15520,58250,52691,44778,41495,38275,7829,...,174,15398,15364,15378,26,6148,14659,4003,13666,6553
3,1004,Aguascalientes,Cosío,3931,17000,15254,12820,11817,10772,1613,...,40,3906,3900,3842,5,1526,3518,1005,3424,1741
4,1005,Aguascalientes,Jesús María,33171,129929,117571,99250,91487,83807,9815,...,411,32977,32845,32817,26,9392,31414,15687,31408,19920
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2464,32054,Zacatecas,Villa Hidalgo,4950,19446,17273,14279,13072,11957,2126,...,70,4914,4846,4366,6,1626,4164,665,3929,1150
2465,32055,Zacatecas,Villanueva,9032,31558,29105,25147,23446,21847,5638,...,97,8956,8916,8617,20,3237,8528,2206,7418,4411
2466,32056,Zacatecas,Zacatecas,42372,149607,138071,121838,114886,107613,18847,...,203,42244,41852,42089,19,13794,40440,24100,39755,31989
2467,32057,Zacatecas,Trancoso,4668,20455,18145,15081,13817,12605,1785,...,65,4628,4506,4415,5,2435,3863,868,3834,1092


In [5]:
INEGI_df.to_csv(r"datasets/INEGI_ITER2020_data.csv", index = False)

## CNBV Financial Inclusion Data

These reports are published by the Comisión Nacional Bancaria y de Valores (National Banking and Securities Commission) and can be found here: https://www.gob.mx/cnbv/acciones-y-programas/bases-de-datos-de-inclusion-financiera

I downloaded the September 2024 report, which comes as a multi-sheet Excel file (CNBV - Base_de_Datos_de_Inclusion_Financiera_202409.xlsx). These are the sheets with municipal-level data (with my translations in parentheses):

- BD Infraestructura Mun (Municipal Infrastructure Database)
- BD Captación Mun (Municipal Financial Accounts Database)
- BD Crédito Mun (Municipal Credit Database)
- BD por sexo Banca Mun (Municipal Banking by Gender)
- BD por sexo EACP Mun (Municipal Local Savings and Credit Entities by Gender)

I translated the column titles for these worksheets and saved the version as CNBV - Base_de_Datos_de_Inclusion_Financiera_202409_translated.xlsx.

Each sheet has a "general information" section followed by a breakdown of the topic specified in the sheet title for "All sectors" followed by more specific subdomains like "commercial banking", "local credit unions", etc.

I did my best to localize terms and learned more as I went on. At some point I want to revisit my translations, because earlier on there were some terms that have a specific meaning that I may have translated inconsistently. The main example I ran across: "Entidades de Ahorro y Crédito Popular (EACP)" is a frequent category in the dataset, and literally translates to "Popular Savings and Credit Entities". There is also the term "Sociedades Financieras Populares (SFPs)" which literally translates to "Popular Finance Societies". I knew enough not to translate "popular" literally-- in this context, the term means more "for the people" rather than "widely well-liked". However, I did not learn until later that both EACPs and SFPs are rigorously defined terms that are regulated by the Mexican government.

The code in the cell below performs a rough import of all of the municipal-level sheets. However, the formatting of the subheadings is not preserved. I'm leaving it as a starting point in case someone wants to explore the data in more depth.

The Global Partnership for Financial Inclusion uses the "G20 Basic Set of Financial Inclusion Indicators":

- Formally banked adults: Percentage of adults with an account at a formal financial institution
- Adults with credit from regulated institutions: Percentage of adults with at least one loan outstanding from a financial institution   
- Formally banked enterprises: Number or percentage of SMEs with accounts
- Enterprises with outstanding loan from a regulated financial institution: Number or percentage of SMEs with outstanding loan
- Points of service: Number of branches per 100,000 adults

(taken from https://www.worldbank.org/en/topic/financialinclusion/brief/how-to-measure-financial-inclusion)

These all seem pretty straightforward, and not reliant on distinctions between commercial banking, development banking, savings and loan cooperatives, and community financial institutions (SFPs). Therefore, I'm just going to focus on extracting the "All sectors" data.
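
As an illustration of how one of these indicators could be derived from the "All sectors" columns, here is a sketch of the "points of service" measure (branches per 100,000 adults). The municipalities and numbers are hypothetical, and the column names are assumed to match the translated headers of the infrastructure sheet.

```python
import pandas as pd

# Hypothetical values, only to show the arithmetic.
toy = pd.DataFrame({
    "Municipality": ["A", "B"],
    "Adult Population": [200_000, 50_000],
    "Number of Bank Branches": [30, 2],
})

# Branches per 100,000 adults = branches / adult population * 100,000
toy["Branches per 100,000 Adults"] = (
    toy["Number of Bank Branches"] / toy["Adult Population"] * 100_000
)
print(toy)
```

Normalizing by adult population is what makes a large city and a small town comparable; raw branch counts alone would mostly track population size.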

In [6]:
# Going to neglect subheadings unless they appear absolutely necessary. In that case, will append them to column title

#  Starting with the first sheet, "Infrastructure".

CNBV_df = pd.read_excel(
    'datasets/CNBV - Base_de_Datos_de_Inclusion_Financiera_202409_translated.xlsx', 
    sheet_name='BD Infraestructura Mun', 
    header=[11]
)

# Dropping first empty column and columns related to more detailed breakdown
CNBV_df = CNBV_df.iloc[:, 1:20]

In [7]:
# Import "financial accounts" sheet
accounts_df = pd.read_excel(
    'datasets/CNBV - Base_de_Datos_de_Inclusion_Financiera_202409_translated.xlsx', 
    sheet_name='BD Captación Mun', 
    header=[10,11]
)

accounts_df = accounts_df.iloc[:, 12:24]

# Flatten the columns. Even the "all sectors" section is broken up into "Banking" and "EACP" sections
accounts_df.columns = [' - '.join(col) for col in accounts_df.columns]

# Drop full translation of EACP for readability
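# regex=False treats the parentheses literally, whatever the pandas version's default is
accounts_df.columns = accounts_df.columns.str.replace('Community Savings and Loan Institutions (EACP)', 'EACP', regex=False)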

# Combine banking and EACP data
# accounts_df['Savings Accounts'] = accounts_df['Banking - Savings Accounts'] + accounts_df['EACP - Savings Accounts']
# accounts_df['Term Deposit Accounts'] = 

CNBV_df = pd.concat([CNBV_df, accounts_df], axis = 1)
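
The commented-out lines in the cell above point toward combining the Banking and EACP counts into one figure per product. A minimal sketch of that idea on a toy frame follows; the numbers are hypothetical, and treating a missing count as zero (`fill_value=0`) is an assumption that may not suit every column.

```python
import pandas as pd

# Hypothetical counts; None stands for a municipality with no EACP data.
toy = pd.DataFrame({
    "Banking - Savings Accounts": [100, 40],
    "EACP - Savings Accounts": [20, None],
    "Banking - Term Deposit Accounts": [10, 5],
    "EACP - Term Deposit Accounts": [3, 1],
})

# Only products that exist under both prefixes can be summed this way.
for product in ["Savings Accounts", "Term Deposit Accounts"]:
    toy[product] = toy[f"Banking - {product}"].add(
        toy[f"EACP - {product}"], fill_value=0
    )

print(toy[["Savings Accounts", "Term Deposit Accounts"]])
```

Products that appear under only one prefix (for example the transactional account levels under Banking) would need to be carried over as they are rather than summed.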

In [8]:
# Import "credit" sheet
credit_df = pd.read_excel(
    'datasets/CNBV - Base_de_Datos_de_Inclusion_Financiera_202409_translated.xlsx', 
    sheet_name='BD Crédito Mun', 
    header=[10, 11]
)
# Drop extra columns, only keeping "all sector" data  
credit_df = credit_df.iloc[:, 12:24]

# Flatten columns
credit_df.columns = [' - '.join(col) for col in credit_df.columns]

CNBV_df = pd.concat([CNBV_df, credit_df], axis = 1)

# Drop full translation of EACP for readability
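# regex=False treats the parentheses literally, whatever the pandas version's default is
CNBV_df.columns = CNBV_df.columns.str.replace('Community Savings and Credit Institutions (EACP)', 'EACP', regex=False)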
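
To make the header handling in the cells above easier to follow, here is a small self-contained sketch of the same two steps: flattening a two-row header and shortening the EACP label. The frame and its values are made up. Passing `regex=False` is a precaution, since older pandas versions treated the pattern as a regular expression by default, in which case the parentheses would form a group rather than match literally.

```python
import pandas as pd

# A toy two-level header similar in shape to what header=[10, 11] produces.
cols = pd.MultiIndex.from_tuples([
    ("Banking", "Savings Accounts"),
    ("Community Savings and Loan Institutions (EACP)", "Savings Accounts"),
])
demo = pd.DataFrame([[10, 3]], columns=cols)

# Each column label is a (top, bottom) tuple; join the two parts into one string.
demo.columns = [" - ".join(col) for col in demo.columns]

# Literal replacement, so "(EACP)" is matched character for character.
demo.columns = demo.columns.str.replace(
    "Community Savings and Loan Institutions (EACP)", "EACP", regex=False
)
print(list(demo.columns))
```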

In [9]:
# List of columns just to check how everything looks together. I was curious what the difference would be between "personal" and "consumer" loans-- 
# ChatGPT said a personal loan is just a lump sum, whereas a consumer loan would cover financing for something specific, like an auto loan
CNBV_df.columns

Index(['Municipality Code', 'State Code', 'Region', 'State', 'Municipality',
       'Population', 'Adult Population', 'Adult Population (Women)',
       'Adult Population (Men)', 'Type of Population', 'Social Lag Index',
       'Number of Banking Agents', 'Number of Bank Branches', 'Number of ATMs',
       'Number of POS Terminals', 'Number of Businesses with POS Terminals',
       'Number of Accounts with Mobile Transactions',
       'Number of ATM Transactions', 'Number of POS Transactions',
       'Banking - Savings Accounts', 'Banking - Term Deposit Accounts',
       'Banking - Transactional Accounts Level 1',
       'Banking - Transactional Accounts Level 2',
       'Banking - Transactional Accounts Level 3',
       'Banking - Transactional Accounts Level 4',
       'Banking - Debit Card Accounts', 'EACP - Savings Accounts',
       'EACP - Term Deposit Accounts', 'EACP - Demand Deposit Accounts',
       'EACP - Debit Card Accounts', 'Total - Total Accounts',
       'Banking - Cred

In [10]:
# Cleaning up column titles
CNBV_df.columns = CNBV_df.columns.str.replace('Number of ', '')

CNBV_df.columns

Index(['Municipality Code', 'State Code', 'Region', 'State', 'Municipality',
       'Population', 'Adult Population', 'Adult Population (Women)',
       'Adult Population (Men)', 'Type of Population', 'Social Lag Index',
       'Banking Agents', 'Bank Branches', 'ATMs', 'POS Terminals',
       'Businesses with POS Terminals', 'Accounts with Mobile Transactions',
       'ATM Transactions', 'POS Transactions', 'Banking - Savings Accounts',
       'Banking - Term Deposit Accounts',
       'Banking - Transactional Accounts Level 1',
       'Banking - Transactional Accounts Level 2',
       'Banking - Transactional Accounts Level 3',
       'Banking - Transactional Accounts Level 4',
       'Banking - Debit Card Accounts', 'EACP - Savings Accounts',
       'EACP - Term Deposit Accounts', 'EACP - Demand Deposit Accounts',
       'EACP - Debit Card Accounts', 'Total - Total Accounts',
       'Banking - Credit Cards', 'Banking - Personal Loans',
       'Banking - Payroll Loans', 'Banking - ABC

In [11]:
# exporting to CSV
CNBV_df.to_csv(r"datasets/CNBV_allsector_data.csv", index = False)
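
With both files exported, a natural next step is joining them on the municipal code. The sketch below uses toy frames, not the real exports: the branch counts are hypothetical, and it assumes the CNBV "Municipality Code" is read as an integer while `mun_code` is a string, so both are cast to strings before merging.

```python
import pandas as pd

# Toy stand-ins for the two exported tables.
inegi = pd.DataFrame({
    "mun_code": ["1001", "1002"],
    "Total Population (2020)": [948990, 51536],
})
cnbv = pd.DataFrame({
    "Municipality Code": [1001, 1002, 1003],
    "Bank Branches": [5, 1, 0],  # hypothetical values
})

# Align key types before merging; a str/int mismatch would otherwise fail or match nothing.
cnbv["mun_code"] = cnbv["Municipality Code"].astype(str)

merged = inegi.merge(
    cnbv,
    on="mun_code",
    how="outer",
    validate="one_to_one",  # raises if a code is duplicated on either side
    indicator=True,         # adds a _merge column showing where each row came from
)

# Rows present in only one source are worth inspecting by hand.
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["mun_code", "_merge"]])
```

An outer join with `indicator=True` makes it easy to see which municipalities appear in only one source before deciding whether to drop them.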