# Getting derivatives open position data from the Brazilian stock exchange (B3)

**What?** The Brazilian stock exchange, B3, publishes a daily report on its website listing all open positions in listed derivatives. The dataset includes fields such as ticker-by-ticker open interest, forward prices, covered and uncovered quantities, and expiration codes, among others.


**Why?** This dataset is useful for tracking and analyzing the day-to-day evolution of open positions in derivatives (options, forwards, etc.) on commodities, stocks, currencies, and more. Following open interest over time helps reveal patterns and shows how actively each derivative is used and how its market appeal changes.


**How?** The derivatives open position files for the last 20 days are available on the B3 website
[(see this link)](https://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/boletim-diario/dados-publicos-de-produtos-listados-e-de-balcao/). From there, they are downloaded manually to a local temporary folder. This notebook shows how to read the downloaded .csv files and clean and transform them, using B3's data glossary as a guide [(link here)](https://www.b3.com.br/data/files/1E/D1/BA/58/C841B810E9C1AAA8AC094EA8/DerivativesOpenPositionFile%202023.pdf). Finally, the cleaned data is appended to a local SQLite database for further analysis.

<img src="https://lh3.googleusercontent.com/d/1-txvYkA3629aIVb4EEsat2Q3sEU3wgDV" alt="texto_alternativo" width="400" align="center">


## Import Libraries

```python
# Only these libraries are used in this notebook
import os
import sqlite3

import pandas as pd
```

#### Check which report dates are already loaded in the local SQLite database

```python
# MY_FINANCE_DB_PATH must be set; os.environ raises a clear KeyError if it is missing
db_path = os.path.join(os.environ['MY_FINANCE_DB_PATH'], 'finance_database.db')

conn = sqlite3.connect(db_path)
# The B3_Derivative_open_position table was created beforehand to hold the open position data
df_dt = pd.read_sql_query('''SELECT ReportDate
                             FROM B3_Derivative_open_position
                             GROUP BY ReportDate''', conn)
conn.close()

# Show the three most recent report dates already loaded
df_dt['ReportDate'].sort_values(ascending=False).head(3)
```

```
19    2024-05-21
18    2024-05-20
17    2024-05-17
Name: ReportDate, dtype: object
```
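The cell above pulls every distinct `ReportDate` into pandas and sorts it there. When only the latest loaded date is needed, SQLite can return it directly with `MAX`. The sketch below assumes, as the output above suggests, that `ReportDate` is stored as ISO `YYYY-MM-DD` text, so string ordering matches date ordering. `latest_report_date` is a hypothetical helper name, and the in-memory table only stands in for `finance_database.db`.

```python
import sqlite3
from typing import Optional


def latest_report_date(conn: sqlite3.Connection) -> Optional[str]:
    """Return the most recent ReportDate already loaded, or None if the table is empty."""
    # Assumes ReportDate is ISO 'YYYY-MM-DD' text, so the lexicographic MAX
    # is also the latest date chronologically.
    row = conn.execute(
        "SELECT MAX(ReportDate) FROM B3_Derivative_open_position"
    ).fetchone()
    return row[0]


# Usage against a throwaway in-memory table (dates taken from the output above)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE B3_Derivative_open_position (ReportDate TEXT)")
demo.executemany(
    "INSERT INTO B3_Derivative_open_position VALUES (?)",
    [("2024-05-17",), ("2024-05-21",), ("2024-05-20",)],
)
print(latest_report_date(demo))  # 2024-05-21
demo.close()
```

On an empty table `MAX` returns `NULL`, which arrives in Python as `None`, so a first run can be detected without a separate count query.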

#### Look for new files downloaded manually from the B3 website into a local folder

```python
# Local subfolder holding the .csv files downloaded manually from B3
file_path = 'temp_files'
# Keep only .csv files, sorted so they are processed in date order
all_files = sorted(file for file in os.listdir(file_path) if file.endswith('.csv'))
all_files
```

```
['DerivativesOpenPositionFile_20240423_1.csv',
 'DerivativesOpenPositionFile_20240424_1.csv',
 'DerivativesOpenPositionFile_20240425_1.csv',
 'DerivativesOpenPositionFile_20240426_1.csv',
 'DerivativesOpenPositionFile_20240429_1.csv',
 'DerivativesOpenPositionFile_20240430_1.csv',
 'DerivativesOpenPositionFile_20240502_1.csv',
 'DerivativesOpenPositionFile_20240503_1.csv',
 'DerivativesOpenPositionFile_20240506_1.csv',
 'DerivativesOpenPositionFile_20240507_1.csv',
 'DerivativesOpenPositionFile_20240508_1.csv',
 'DerivativesOpenPositionFile_20240509_1.csv',
 'DerivativesOpenPositionFile_20240510_1.csv',
 'DerivativesOpenPositionFile_20240513_1.csv',
 'DerivativesOpenPositionFile_20240514_1.csv',
 'DerivativesOpenPositionFile_20240515_1.csv',
 'DerivativesOpenPositionFile_20240516_1.csv',
 'DerivativesOpenPositionFile_20240517_1.csv',
 'DerivativesOpenPositionFile_20240520_1.csv']
```
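The listing above contains every file in `temp_files`, including days that may already be in the database. A minimal sketch for keeping only the new files, assuming the date embedded in each file name (`DerivativesOpenPositionFile_YYYYMMDD_1.csv`) is the report date, and comparing it with the latest `ReportDate` found in the previous step. `report_date_from_filename` and `files_newer_than` are hypothetical helpers, not part of the original notebook.

```python
import re
from datetime import date, datetime
from typing import List, Optional

# Assumption: file names follow the pattern listed above,
# e.g. 'DerivativesOpenPositionFile_20240520_1.csv'
FILE_DATE_RE = re.compile(r"DerivativesOpenPositionFile_(\d{8})_\d+\.csv$")


def report_date_from_filename(file_name: str) -> Optional[date]:
    """Hypothetical helper: extract the YYYYMMDD date embedded in a B3 file name."""
    match = FILE_DATE_RE.search(file_name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def files_newer_than(file_names: List[str], latest_loaded: Optional[str]) -> List[str]:
    """Hypothetical helper: keep files dated after the latest ReportDate in the database."""
    # With an empty database (latest_loaded is None) every file counts as new
    cutoff = date.fromisoformat(latest_loaded) if latest_loaded else date.min
    selected = []
    for name in sorted(file_names):
        file_date = report_date_from_filename(name)
        if file_date is not None and file_date > cutoff:
            selected.append(name)
    return selected


files = ['DerivativesOpenPositionFile_20240516_1.csv',
         'DerivativesOpenPositionFile_20240517_1.csv',
         'DerivativesOpenPositionFile_20240520_1.csv']
print(files_newer_than(files, '2024-05-17'))
# ['DerivativesOpenPositionFile_20240520_1.csv']
```

The filtered list could then be used in place of `all_files` in the loading loop, so that only days not yet in the database are appended.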

#### Extract, Transform and Load dataset

```python
# MY_FINANCE_DB_PATH must be set; os.environ raises a clear KeyError if it is missing
db_path = os.path.join(os.environ['MY_FINANCE_DB_PATH'], 'finance_database.db')

# Data cleaning: map B3's abbreviated column names to descriptive ones (see the B3 glossary)
new_col_names = {'RptDt': 'ReportDate',
                 'Asst': 'Asset',
                 'XprtnCd': 'ExpirationCode',
                 'SgmtNm': 'SegmentName',
                 'OpnIntrst': 'OpenInterest',
                 'VartnOpnIntrst': 'VariationOpenInterest',
                 'DstrbtnId': 'DistributionIdentification',
                 'CvrdQty': 'CoveredQuantity',
                 'TtlBlckdPos': 'TotalBlockedPosition',
                 'UcvrdQty': 'UncoveredQuantity',
                 'TtlPos': 'TotalPosition',
                 'BrrwrQty': 'BorrowerQuantity',
                 'LndrQty': 'LenderQuantity',
                 'CurQty': 'CurrentQuantity',
                 'FwdPric': 'ForwardPrice'}

# Numeric columns arrive as text with a decimal comma and are converted to float
numeric_cols = ['OpenInterest', 'VariationOpenInterest', 'CoveredQuantity',
                'TotalBlockedPosition', 'UncoveredQuantity', 'TotalPosition',
                'BorrowerQuantity', 'LenderQuantity', 'CurrentQuantity', 'ForwardPrice']

conn = sqlite3.connect(db_path)  # open once and reuse it for every file

for file_name in all_files:
    # Extract: read each .csv downloaded manually from B3, keeping every column as text
    df_app = pd.read_csv(os.path.join(file_path, file_name), sep=';', encoding='UTF-8',
                         low_memory=False, dtype=str)

    # Transform: rename the columns and change the data types
    df_app = df_app.rename(columns=new_col_names)
    for col in numeric_cols:
        df_app[col] = df_app[col].str.replace(',', '.', regex=False).astype(float)

    # Load: append the cleaned rows to the SQLite table
    df_app.to_sql('B3_Derivative_open_position', conn, if_exists='append', index=False)

    # Print each file name as it is loaded
    print(file_name)

conn.close()
```

```
DerivativesOpenPositionFile_20240423_1.csv
DerivativesOpenPositionFile_20240424_1.csv
DerivativesOpenPositionFile_20240425_1.csv
DerivativesOpenPositionFile_20240426_1.csv
DerivativesOpenPositionFile_20240429_1.csv
DerivativesOpenPositionFile_20240430_1.csv
DerivativesOpenPositionFile_20240502_1.csv
DerivativesOpenPositionFile_20240503_1.csv
DerivativesOpenPositionFile_20240506_1.csv
DerivativesOpenPositionFile_20240507_1.csv
DerivativesOpenPositionFile_20240508_1.csv
DerivativesOpenPositionFile_20240509_1.csv
DerivativesOpenPositionFile_20240510_1.csv
DerivativesOpenPositionFile_20240513_1.csv
DerivativesOpenPositionFile_20240514_1.csv
DerivativesOpenPositionFile_20240515_1.csv
DerivativesOpenPositionFile_20240516_1.csv
DerivativesOpenPositionFile_20240517_1.csv
DerivativesOpenPositionFile_20240520_1.csv
```
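Because the loop appends with `if_exists='append'` and B3 keeps the last 20 days of files online, re-running it over a folder that overlaps earlier downloads would store the same report date twice. One way to make the load idempotent, sketched here under the assumption that every row carries its `ReportDate`, is to delete the rows for that date before appending. `replace_report_date` is a hypothetical helper; it would take the place of the `df_app.to_sql(...)` call inside the loop.

```python
import sqlite3

import pandas as pd


def replace_report_date(df: pd.DataFrame, conn: sqlite3.Connection,
                        table: str = "B3_Derivative_open_position") -> None:
    """Hypothetical helper: remove rows already stored for the report date(s) in df,
    then append df, so loading the same daily file twice does not duplicate it."""
    dates = df["ReportDate"].unique().tolist()
    with conn:  # commit on success, roll back if anything below raises
        table_exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
        ).fetchone() is not None
        if table_exists:
            placeholders = ", ".join("?" for _ in dates)
            # The table name is a trusted constant; only the dates are bound as parameters
            conn.execute(f"DELETE FROM {table} WHERE ReportDate IN ({placeholders})", dates)
        df.to_sql(table, conn, if_exists="append", index=False)


# Usage: loading the same day twice leaves a single copy (rows from the sample below)
demo = sqlite3.connect(":memory:")
day = pd.DataFrame({"ReportDate": ["2024-04-30", "2024-04-30"],
                    "TckrSymb": ["AFSK24", "AFSM24"],
                    "OpenInterest": [670.0, 49.0]})
replace_report_date(day, demo)
replace_report_date(day, demo)
print(demo.execute("SELECT COUNT(*) FROM B3_Derivative_open_position").fetchone()[0])  # 2
demo.close()
```

The delete and the append run inside the same `with conn:` block, so if the append fails the pending deletion should be rolled back rather than leaving the day missing.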


#### Reading a sample from the SQLite database

```python
# Show a sample of the data: all rows for a single report date, as an example
db_path = os.path.join(os.environ['MY_FINANCE_DB_PATH'], 'finance_database.db')
conn = sqlite3.connect(db_path)
df_sample = pd.read_sql_query('''SELECT *
                                 FROM B3_Derivative_open_position
                                 WHERE ReportDate = ?''', conn, params=('2024-04-30',))
conn.close()

df_sample.head()
```

|   | ReportDate | TckrSymb | ISIN | Asset | ExpirationCode | SegmentName | OpenInterest | VariationOpenInterest | DistributionIdentification | CoveredQuantity | TotalBlockedPosition | UncoveredQuantity | TotalPosition | BorrowerQuantity | LenderQuantity | CurrentQuantity | ForwardPrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-04-30 | ABEVOK24 | BRABEV350015 | ABEVO | K24 | FINANCIAL | 191300.0 | -5600.0 |  |  |  |  |  |  |  |  |  |
| 1 | 2024-04-30 | ABEVOM24 | BRABEV360014 | ABEVO | M24 | FINANCIAL | 111400.0 | -1500.0 |  |  |  |  |  |  |  |  |  |
| 2 | 2024-04-30 | AFSK24 | BRBMEFAFS1S2 | AFS | K24 | FINANCIAL | 670.0 | 0.0 |  |  |  |  |  |  |  |  |  |
| 3 | 2024-04-30 | AFSM24 | BRBMEFAFS1T0 | AFS | M24 | FINANCIAL | 49.0 | 0.0 |  |  |  |  |  |  |  |  |  |
| 4 | 2024-04-30 | AUDM24 | BRBMEFAUD4O3 | AUD | M24 | FINANCIAL | 2.0 | 0.0 |  |  |  |  |  |  |  |  |  |
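After loading, a quick sanity check is to look for tickers that appear more than once on the same report date. The sketch assumes, without confirmation from the B3 glossary, that `(ReportDate, TckrSymb)` identifies a single row in a daily file. `find_duplicate_rows` is a hypothetical helper, and the in-memory table only mimics `B3_Derivative_open_position`.

```python
import sqlite3

import pandas as pd


def find_duplicate_rows(conn: sqlite3.Connection,
                        table: str = "B3_Derivative_open_position") -> pd.DataFrame:
    """Return (ReportDate, TckrSymb) pairs stored more than once.

    Assumes each ticker appears at most once per daily B3 file, so any hit
    suggests the same file was appended twice.
    """
    # The table name is a trusted constant, not user input
    query = f"""
        SELECT ReportDate, TckrSymb, COUNT(*) AS n_rows
        FROM {table}
        GROUP BY ReportDate, TckrSymb
        HAVING COUNT(*) > 1
        ORDER BY ReportDate, TckrSymb
    """
    return pd.read_sql_query(query, conn)


# Usage: an in-memory table where one ticker was loaded twice for the same day
demo = sqlite3.connect(":memory:")
pd.DataFrame({"ReportDate": ["2024-04-30"] * 3,
              "TckrSymb": ["AFSK24", "AFSK24", "AFSM24"]}).to_sql(
    "B3_Derivative_open_position", demo, index=False)
print(find_duplicate_rows(demo))
demo.close()
```

An empty result means no duplicated (date, ticker) pair was found in the table.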
