# Getting historical inflations indices values from IBGE website

**What ?** The IBGE (Instituto Brasileiro de Geografia e Estatística) is the institution that provides various inflation indices for the Brazilian economy. These indices serve as key references for monetary policy and investment decisions. The most important inflation indices provided by the IBGE are:

>- **IPCA**: The primary inflation index in Brazil, reflecting price changes for goods and services consumed by a broad range of households.
>- **IPCA-15**: A preliminary version of the IPCA, providing early insights into inflation trends.
>- **IPCA-E**: The cumulative IPCA-15 over a specific period, used for short-term economic analysis.
>- **INPC**: Measures inflation for households with lower incomes, earning between 1 and 5 minimum wages, often reflecting the inflationary impact on lower-income households more accurately.


**Why ?** Inflation indices such as IPCA, IPCA-15, IPCA-E, and INPC are crucial for data analysis because they provide valuable insights into the economic environment like economic trends, monetary policy, cost of living, investment decisions and more.

**How ?** Historical inflation rate data will be downloaded directly from the IBGE website using the 'requests' library
[(see this link)](https://www.ibge.gov.br/estatisticas/economicas/precos-e-custos/9256-indice-nacional-de-precos-ao-consumidor-amplo?=&t=downloads). Once obtained, the data will be cleaned by removing empty rows, and the data columns will be transformed to create a consolidated dataframe with consistent columns and rules. Finally, the dataframe will be uploaded into a local SQLite database for further analysis.

<img src="https://lh3.googleusercontent.com/d/1WRUYhcCLQ7PNGn6yNmc9m5k8w9LccJq-" alt="texto_alternativo" width="400" align="center">

## Import Libraries

In [1]:
import pandas as pd
import numpy as np
import os
import io

import sqlite3
import requests
import zipfile

### IPCA

#### Downloading IPCA data

In [2]:
url_ipca = 'https://ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/IPCA/Serie_Historica/ipca_SerieHist.zip'

response = requests.get(url_ipca)

columns_name = ['yeardt','monthdt','num_indice','var_mes','var_3mes','var_6mes','var_ano','var_12mes'] # previous mapped columns name 

if response.status_code == 200:
    # Create a ZipFile object from the response content
    with zipfile.ZipFile(io.BytesIO(response.content)) as zip_ref:
        for file_name in zip_ref.namelist():
            if file_name.endswith('.xlsx') or file_name.endswith('.xls'):
                # Read the Excel file into a DataFrame
                with zip_ref.open(file_name) as excel_file:
                    df_ipca = pd.read_excel(excel_file,skiprows = 7, names = columns_name,usecols=range(8))
                break
    print("DataFrame loaded successfully.")
    # Display the DataFrame
    print(df_ipca.head(2))    
else:
    print('Connection erro: {}'.format(response.status_code))

DataFrame loaded successfully.
  yeardt monthdt num_indice var_mes var_3mes var_6mes var_ano var_12mes
0   1994     JAN     141.31   41.31   162.13   533.33   41.31   2693.84
1    NaN     FEV     198.22   40.27   171.24   568.17   98.22   3035.71


#### Clean and prepare the dataframe IPCA

In [3]:
# data cleaning for IPCA
df_ipca['yeardt'] = df_ipca['yeardt'].replace(r'^\s*$', np.nan, regex=True) # replace white space by NaN
df_ipca.dropna(how = 'all', inplace = True) # drop na values
df_ipca['yeardt'].fillna(method = 'ffill', inplace = True) # writiing year over all rows 
df_ipca = df_ipca[df_ipca['yeardt'].apply(lambda x: str(x).isnumeric())] # drop rows without information
map_month = {'JAN':1, 'FEV':2,'MAR':3, 'ABR':4,'MAI':5,'JUN':6,'JUL':7,'AGO':8, 'SET':9,'OUT':10,'NOV':11,'DEZ':12}
df_ipca['year_monthdt'] = df_ipca['yeardt'].astype(str)+'-'+df_ipca['monthdt'].map(map_month).astype(str)  # remap data to a new format
df_ipca['index_name'] = 'IPCA'
df_ipca.head(2)

Unnamed: 0,yeardt,monthdt,num_indice,var_mes,var_3mes,var_6mes,var_ano,var_12mes,year_monthdt,index_name
0,1994,JAN,141.31,41.31,162.13,533.33,41.31,2693.84,1994-1,IPCA
1,1994,FEV,198.22,40.27,171.24,568.17,98.22,3035.71,1994-2,IPCA


### IPCA 15

#### Downloading IPCA-15 data

In [4]:
url_ipca15 = 'https://ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/IPCA_15/Series_Historicas/ipca-15_SerieHist.zip'

response = requests.get(url_ipca15)

columns_name = ['yeardt','monthdt','num_indice','var_mes','var_3mes','var_6mes','var_ano','var_12mes'] # previous mapped columns name 

if response.status_code == 200:
    # Create a ZipFile object from the response content
    with zipfile.ZipFile(io.BytesIO(response.content)) as zip_ref:
        for file_name in zip_ref.namelist():
            if file_name.endswith('.xlsx') or file_name.endswith('.xls'):
                # Read the Excel file into a DataFrame
                with zip_ref.open(file_name) as excel_file:
                    df_ipca_15 = pd.read_excel(excel_file,skiprows = 8, names = columns_name,usecols=range(8))
                break
    print("DataFrame loaded successfully.")
    # Display the DataFrame
    print(df_ipca_15.head(2))    
else:
    print('Connection erro: {}'.format(response.status_code))

DataFrame loaded successfully.
  yeardt monthdt num_indice var_mes var_3mes var_6mes var_ano var_12mes
0   1994     JAN     139.17   39.17   154.72   510.69   39.17   2561.94
1    NaN     FEV     194.42    39.7   165.75   546.36   94.42   2834.59


#### Clean and prepare the dataframe IPCA-15

In [5]:
# data cleaning for IPCA 15
df_ipca_15['yeardt'] = df_ipca_15['yeardt'].replace(r'^\s*$', np.nan, regex=True) # replace white space by NaN
df_ipca_15.dropna(how = 'all', inplace = True) # drop na values
df_ipca_15['yeardt'].fillna(method = 'ffill', inplace = True) # writiing year over all rows 
df_ipca_15 = df_ipca_15[df_ipca_15['yeardt'].apply(lambda x: str(x).isnumeric())] # drop rows without information
map_month = {'JAN':1, 'FEV':2,'MAR':3, 'ABR':4,'MAI':5,'JUN':6,'JUL':7,'AGO':8, 'SET':9,'OUT':10,'NOV':11,'DEZ':12}
df_ipca_15['year_monthdt'] = df_ipca_15['yeardt'].astype(str)+'-'+df_ipca_15['monthdt'].map(map_month).astype(str)  # remap data to a new format
df_ipca_15['index_name'] = 'IPCA-15'
df_ipca_15.head(2)

Unnamed: 0,yeardt,monthdt,num_indice,var_mes,var_3mes,var_6mes,var_ano,var_12mes,year_monthdt,index_name
0,1994,JAN,139.17,39.17,154.72,510.69,39.17,2561.94,1994-1,IPCA-15
1,1994,FEV,194.42,39.7,165.75,546.36,94.42,2834.59,1994-2,IPCA-15


### IPCA E

#### Downloading IPCA-E data

In [6]:
url_ipca_e = 'https://ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/IPCA_E/Series_Historicas/ipca-e_SerieHist.zip'

response = requests.get(url_ipca_e)
columns_name = ['yeardt','monthdt','num_indice','var_mes','var_3mes','var_6mes','var_ano','var_12mes'] # previous mapped columns name 

if response.status_code == 200:
    # Create a ZipFile object from the response content
    with zipfile.ZipFile(io.BytesIO(response.content)) as zip_ref:
        for file_name in zip_ref.namelist():
            if file_name.endswith('.xlsx') or file_name.endswith('.xls'):
                # Read the Excel file into a DataFrame
                with zip_ref.open(file_name) as excel_file:
                    df_ipca_e = pd.read_excel(excel_file,skiprows = 5, names = columns_name, usecols=range(8))
                break
    print("DataFrame loaded successfully.")
    # Display the DataFrame
    print(df_ipca_e.head(2))    
else:
    print('Connection erro: {}'.format(response.status_code))

DataFrame loaded successfully.
  yeardt monthdt num_indice var_mes var_3mes var_6mes var_ano var_12mes
0   1994     JAN     139.17   39.17   154.72   510.69   39.17   2561.94
1    NaN     FEV     194.42    39.7   165.75   546.36   94.42   2834.59


#### Clean and prepare the dataframe IPCA-E

In [7]:
# data cleaning for IPCA E
df_ipca_e['yeardt'] = df_ipca_e['yeardt'].replace(r'^\s*$', np.nan, regex=True) # replace white space by NaN
df_ipca_e.dropna(how = 'all', inplace = True) # drop na values
df_ipca_e['yeardt'].fillna(method = 'ffill', inplace = True) # writiing year over all rows 
df_ipca_e = df_ipca_e[df_ipca_e['yeardt'].apply(lambda x: str(x).isnumeric())] # drop rows without information
map_month = {'JAN':1, 'FEV':2,'MAR':3, 'ABR':4,'MAI':5,'JUN':6,'JUL':7,'AGO':8, 'SET':9,'OUT':10,'NOV':11,'DEZ':12}
df_ipca_e['year_monthdt'] = df_ipca_e['yeardt'].astype(str)+'-'+df_ipca_e['monthdt'].map(map_month).astype(str)  # remap data to a new format
df_ipca_e['index_name'] = 'IPCA-E'
df_ipca_e.head(2)

Unnamed: 0,yeardt,monthdt,num_indice,var_mes,var_3mes,var_6mes,var_ano,var_12mes,year_monthdt,index_name
0,1994,JAN,139.17,39.17,154.72,510.69,39.17,2561.94,1994-1,IPCA-E
1,1994,FEV,194.42,39.7,165.75,546.36,94.42,2834.59,1994-2,IPCA-E


### INPC

#### Downloading INPC data

In [8]:
url_inpc= 'https://ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/INPC/Serie_Historica/inpc_SerieHist.zip'

response = requests.get(url_ipca)

columns_name = ['yeardt','monthdt','num_indice','var_mes','var_3mes','var_6mes','var_ano','var_12mes'] # previous mapped columns name 

if response.status_code == 200:
    # Create a ZipFile object from the response content
    with zipfile.ZipFile(io.BytesIO(response.content)) as zip_ref:
        for file_name in zip_ref.namelist():
            if file_name.endswith('.xlsx') or file_name.endswith('.xls'):
                # Read the Excel file into a DataFrame
                with zip_ref.open(file_name) as excel_file:
                    df_inpc = pd.read_excel(excel_file,skiprows = 7, names = columns_name, usecols=range(8))
                break
    print("DataFrame loaded successfully.")
    # Display the DataFrame
    print(df_inpc.head(2))    
else:
    print('Connection erro: {}'.format(response.status_code))

DataFrame loaded successfully.
  yeardt monthdt num_indice var_mes var_3mes var_6mes var_ano var_12mes
0   1994     JAN     141.31   41.31   162.13   533.33   41.31   2693.84
1    NaN     FEV     198.22   40.27   171.24   568.17   98.22   3035.71


#### Clean and prepare the dataframe INPC

In [9]:
# data cleaning for INPC
df_inpc['yeardt'] = df_inpc['yeardt'].replace(r'^\s*$', np.nan, regex=True) # replace white space by NaN
df_inpc.dropna(how = 'all', inplace = True) # drop na values
df_inpc['yeardt'].fillna(method = 'ffill', inplace = True) # writiing year over all rows 
df_inpc = df_inpc[df_inpc['yeardt'].apply(lambda x: str(x).isnumeric())] # drop rows without information
map_month = {'JAN':1, 'FEV':2,'MAR':3, 'ABR':4,'MAI':5,'JUN':6,'JUL':7,'AGO':8, 'SET':9,'OUT':10,'NOV':11,'DEZ':12}
df_inpc['year_monthdt'] = df_inpc['yeardt'].astype(str)+'-'+df_inpc['monthdt'].map(map_month).astype(str)  # remap data to a new format
df_inpc['index_name'] = 'INPC'
df_inpc.head(2)

Unnamed: 0,yeardt,monthdt,num_indice,var_mes,var_3mes,var_6mes,var_ano,var_12mes,year_monthdt,index_name
0,1994,JAN,141.31,41.31,162.13,533.33,41.31,2693.84,1994-1,INPC
1,1994,FEV,198.22,40.27,171.24,568.17,98.22,3035.71,1994-2,INPC


### Concat all inflations rates in a unique dataframe

In [10]:
df_inflation_rates = pd.concat([df_ipca,df_ipca_15,df_ipca_e,df_inpc])

### Saving the final dataframe into the SQLite database

In [11]:
conn = sqlite3.connect(os.getenv('MY_FINANCE_DB_PATH')+'/finance_database.db')
df_inflation_rates.to_sql('IBGE_inflation_rates',conn,if_exists='replace',index=False)

conn.close()    