# Getting historical quotes from the Brazilian stock exchange

**What?** Historical quotation data consists of the daily prices of every security listed on B3, the Brazilian stock exchange. For each asset (stocks, funds, BDRs, options, futures, and many other financial instruments), the file provides the opening, high, low, and closing prices, along with the number of trades and the total volume traded each day.

**Why?** Quote data is essential to quantitative trading and asset allocation: it underpins algorithmic strategies, market timing, backtesting, and risk management.

**How?** The Brazilian stock exchange (B3) publishes on its website a fixed-width text file (.TXT) for each year, containing historical daily asset prices. The yearly files for 2019 through 2024 were manually downloaded from the website as .zip archives. This notebook demonstrates how to read the text file inside each downloaded .zip file and how to clean and transform the data according to [B3's layout guide](https://www.b3.com.br/data/files/33/67/B9/50/D84057102C784E47AC094EA8/SeriesHistoricas_Layout.pdf). At the end, the cleaned data is written to a local SQLite database for further analysis. ***It is important to note that, at this stage, the historical prices are not adjusted for dividend distributions or for split and reverse split (inplit) events.***


<img src="https://lh3.googleusercontent.com/d/1e-hu9egDMB2j2ZoRQLKXd0qd0GTLuXmL" alt="texto_alternativo" width="400" align="center">

## Import Libraries

In [1]:
import pandas as pd
import numpy as np
import os
import re
from datetime import datetime

import sqlite3
import requests
import zipfile

## Defining paths and constant variables

#### Looking for files manually downloaded from the B3 website into a local folder

In [2]:
script_directory = os.getcwd()  # directory the notebook is run from
file_path = os.path.join(script_directory, 'temp_files')  # subfolder holding the downloaded archives
all_files = os.listdir(file_path)
# keeping only the .zip archives, whatever the case of the extension
zip_files_with_paths = [os.path.join(file_path, file) for file in all_files if file.lower().endswith('.zip')]

zip_files_with_paths

['C:\\Users\\lucas\\OneDrive\\CM_Explorer\\data_scraping\\B3_historical_quotes\\temp_files\\COTAHIST_A2019.ZIP',
 'C:\\Users\\lucas\\OneDrive\\CM_Explorer\\data_scraping\\B3_historical_quotes\\temp_files\\COTAHIST_A2020.ZIP',
 'C:\\Users\\lucas\\OneDrive\\CM_Explorer\\data_scraping\\B3_historical_quotes\\temp_files\\COTAHIST_A2021.ZIP',
 'C:\\Users\\lucas\\OneDrive\\CM_Explorer\\data_scraping\\B3_historical_quotes\\temp_files\\COTAHIST_A2022.ZIP',
 'C:\\Users\\lucas\\OneDrive\\CM_Explorer\\data_scraping\\B3_historical_quotes\\temp_files\\COTAHIST_A2023.ZIP',
 'C:\\Users\\lucas\\OneDrive\\CM_Explorer\\data_scraping\\B3_historical_quotes\\temp_files\\COTAHIST_A2024.ZIP']
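
The same lookup can be sketched with `pathlib`, which also makes the processing order explicit by sorting the result. This is only an illustrative variant: the helper name `find_zip_files` and the temporary folder used to demonstrate it are assumptions, not part of the notebook.

```python
import tempfile
from pathlib import Path


def find_zip_files(folder):
    """Return the .zip files in `folder`, sorted by path, ignoring extension case."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == '.zip'
    )


# Demonstration on a throwaway folder with empty placeholder files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ['COTAHIST_A2020.ZIP', 'COTAHIST_A2019.zip', 'notes.txt']:
        (Path(tmp) / name).touch()
    found = [Path(p).name for p in find_zip_files(tmp)]

print(found)
```

Sorting matters only if the yearly files should reach the database in chronological order; `os.listdir` makes no promise about ordering.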

#### Defining constants

In [3]:
# fixed-width column boundaries, based on the reference document:
# https://www.b3.com.br/data/files/33/67/B9/50/D84057102C784E47AC094EA8/SeriesHistoricas_Layout.pdf
colspecs = [(0,2),(2,10),(10,12),(12,24),(24,27),(27,39),(39,49),
            (49,52),(52,56),(56,69),(69,82),(82,95),(95,108),(108,121),
            (121,134),(134,147),(147,152),(152,170),(170,188),(188,201),
            (201,202),(202,210),(210,217),(217,230),(230,242),(242,None)]

# column names, in the same order as colspecs
names = ['TIPREG','DT_PREGAO','CODBDI','CODNEG','TPMERC','NOMRES',
         'ESPECI','PRAZOT','MODREF','PREABE','PREMAX','PREMIN','PREMED',
         'PREULT','PREOFC','PREOFV','TOTNEG','QUATOT','VOLTOT','PREEXE',
         'INDOPC','DATVEN','FATCOT','PTOEXE','CODISI','DISMES']

# reading the support tables that describe the coded values of some columns
support_table_path = os.path.join(script_directory, 'support_table.xlsx')
df_codbdi = pd.read_excel(support_table_path, sheet_name='CODBDI')
df_especi = pd.read_excel(support_table_path, sheet_name='ESPECI')
df_tpmerc = pd.read_excel(support_table_path, sheet_name='TPMERC')
df_indopc = pd.read_excel(support_table_path, sheet_name='INDOPC')
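
To see what `colspecs` and `names` do, the sketch below checks that the boundaries leave no gaps and parses a single synthetic record. It uses only the first five fields from the cell above, and the record values are made up for illustration. Passing `dtype=str` is an assumption made here to keep leading zeros visible; the notebook itself lets pandas infer the types.

```python
import io

import pandas as pd

# First five fields only, copied from the constants above.
colspecs = [(0, 2), (2, 10), (10, 12), (12, 24), (24, 27)]
names = ['TIPREG', 'DT_PREGAO', 'CODBDI', 'CODNEG', 'TPMERC']

# Each field should start exactly where the previous one ends.
gaps = [(prev, nxt) for prev, nxt in zip(colspecs, colspecs[1:]) if prev[1] != nxt[0]]

# Synthetic record (invented values), padded to the declared widths.
record = '01' + '20240102' + '02' + 'PETR4'.ljust(12) + '010'

sample = pd.read_fwf(io.StringIO(record + '\n'), colspecs=colspecs,
                     names=names, header=None, dtype=str)

print(gaps)
print(sample.iloc[0].to_dict())
```

Whether codes such as `CODBDI` should stay as text or become integers depends on how `support_table.xlsx` stores them, which is not shown here; the join keys need the same type on both sides.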

## Defining functions to prepare and save the dataframe

#### Clean and prepare dataframe

In [4]:
def func_data_cleaning(df,df_codbdi,df_especi,df_tpmerc,df_indopc):

    # prices and volume are stored as integers with two implied decimal places
    price_cols = ['PREABE','PREMAX','PREMIN','PREMED','PREULT','PREOFC','PREOFV','VOLTOT','PREEXE']
    for col in price_cols:
        df[col] = df[col]/100
    # bringing in descriptions for some coded columns from the support tables
    df = pd.merge(df,df_codbdi, how = 'left', on = 'CODBDI')
    df = pd.merge(df,df_especi, how = 'left', on = 'ESPECI')
    df = pd.merge(df,df_tpmerc, how = 'left', on = 'TPMERC')
    df = pd.merge(df,df_indopc, how = 'left', on = 'INDOPC')
    # reformatting the dates from YYYYMMDD to YYYY-MM-DD text
    for col in ['DT_PREGAO','DATVEN']:
        date_str = df[col].astype(str)
        df[col] = date_str.str[:4] + '-' + date_str.str[4:6] + '-' + date_str.str[6:]
    df['pregao_dt'] = pd.to_datetime(df['DT_PREGAO']) # adding a datetime version of the trading date
    # adding a processing timestamp to record when the scraping process ran
    df['proc_datedt'] = datetime.now().replace(microsecond=0)

    return df
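
The two core transformations above (rescaling prices and converting dates) can be tried on a toy dataframe. The values below are invented, and parsing the date directly with `format='%Y%m%d'` is a variation on the string slicing used in the function, not the notebook's own method.

```python
import pandas as pd

# Toy input in the raw layout: integer dates and prices in cents (invented values).
raw = pd.DataFrame({
    'DT_PREGAO': [20240102, 20240103],
    'PREULT': [3725, 3801],
})

# Prices carry two implied decimal places, so dividing by 100 yields currency units.
for col in ['PREULT']:
    raw[col] = raw[col] / 100

# Parsing the YYYYMMDD integers straight into datetimes.
raw['pregao_dt'] = pd.to_datetime(raw['DT_PREGAO'].astype(str), format='%Y%m%d')

print(raw)
```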

#### Save dataframe into a local SQLite file

In [5]:
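def func_write_sqldatabase(df):
    # MY_FINANCE_DB_PATH must point to the folder that holds the database file
    db_folder = os.getenv('MY_FINANCE_DB_PATH')
    if db_folder is None:
        raise EnvironmentError('MY_FINANCE_DB_PATH environment variable is not set')
    conn = sqlite3.connect(os.path.join(db_folder,'finance_database.db'))
    try:
        # appending, so running the notebook twice on the same file duplicates its rows
        df.to_sql('B3_historical_quotes',conn,if_exists='append',index=False)
    finally:
        conn.close()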
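
Because the table is written with `if_exists='append'`, a second run over the same data adds the same rows again. The sketch below illustrates this with an in-memory SQLite database and a toy dataframe (invented values), and shows one simple way to inspect the result afterwards. It does not touch the real `finance_database.db`.

```python
import sqlite3

import pandas as pd

toy = pd.DataFrame({'CODNEG': ['PETR4', 'VALE3'], 'PREULT': [37.25, 68.10]})

conn = sqlite3.connect(':memory:')  # throwaway database, gone once closed
try:
    toy.to_sql('B3_historical_quotes', conn, if_exists='append', index=False)
    toy.to_sql('B3_historical_quotes', conn, if_exists='append', index=False)  # simulated re-run

    row_count = pd.read_sql_query(
        'SELECT COUNT(*) AS n FROM B3_historical_quotes', conn)['n'].iloc[0]
    distinct = pd.read_sql_query(
        'SELECT DISTINCT CODNEG, PREULT FROM B3_historical_quotes', conn)
finally:
    conn.close()

print(row_count, len(distinct))
```

Comparing the total row count with the distinct row count is a quick check for whether a file was loaded more than once.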

## Loop to read the downloaded files

In [6]:
for zip_file_name in zip_files_with_paths:

    with zipfile.ZipFile(zip_file_name, 'r') as zip_file:

        csv_file_name = zip_file.namelist()[0]  # first member of the archive

        # 'latin-1' decodes the file on any platform ('mbcs' exists only on Windows)
        with zip_file.open(csv_file_name) as quotes_file:
            df = pd.read_fwf(quotes_file,skiprows=2,skipfooter=1,
                      colspecs=colspecs,names=names,header=None,index_col=False,encoding='latin-1')

        # applying cleaning and transformations to the dataframe
        df = func_data_cleaning(df,df_codbdi,df_especi,df_tpmerc,df_indopc)

        # writing the final dataframe into the local SQLite database
        func_write_sqldatabase(df)
        # deleting the dataframe before reading the next year's file
        del df

        print('\n File read: {}'.format(csv_file_name))


 File read: COTAHIST_A2019.TXT 


 File read: COTAHIST_A2020.TXT 


 File read: COTAHIST_A2021.TXT 


 File read: COTAHIST_A2022.TXT 


 File read: COTAHIST_A2023.TXT 


 File read: COTAHIST_A2024.TXT 
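
The read step of the loop can be rehearsed without the real archives. The sketch below builds a small zip file in memory and reads its only member with `pd.read_fwf`. Everything about the sample is an assumption made for illustration: the three-field layout, the member name `SAMPLE.TXT`, the invented values, and the single header and trailer lines, which is why it uses `skiprows=1` and `skipfooter=1`.

```python
import io
import zipfile

import pandas as pd

# Hypothetical three-field layout: record type, ticker, last price in cents.
colspecs = [(0, 2), (2, 9), (9, 19)]
names = ['TIPREG', 'CODNEG', 'PREULT']

lines = [
    '00HEADER',
    f"01{'PETR4':<7}{3725:010d}",
    f"01{'VALE3':<7}{6810:010d}",
    '99TRAILER',
]

# Building the archive in memory instead of on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('SAMPLE.TXT', '\n'.join(lines).encode('latin-1'))

# Reading it back the same way the loop does: open the first member, parse fixed widths.
with zipfile.ZipFile(buffer, 'r') as zf:
    member = zf.namelist()[0]
    with zf.open(member) as fh:
        df = pd.read_fwf(fh, colspecs=colspecs, names=names, header=None,
                         skiprows=1, skipfooter=1, encoding='latin-1')

print(member)
print(df)
```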

