# Brazilian Investment Funds Overview

This project main objective is to map investment fund market in Brazil. Investment funds can trade in different types of assets, like financial products (stocks, options, debentures), real state or buying part of other funds. Each fund raises capital by offering quotas, this can be an ongoing process (open fund) or only one time offer (closed fund).

The funds are created and managed by a financial institution (manager). Managers will also take care of all the paper work involving fund rules, which index should investors use as benchmark and general structure so a bank can issue the fund. The bank role is so ensure the fund is legit, has the paperwork in order and distribute the financial sheets so potential investor can buy a quota. This is a very efficient way to poll resources and diversify the risk for all parties involved.

Learn more in ANBIMA [PDF](https://www.anbima.com.br/data/files/D7/B6/AD/5E/369EC8104606BDC8B82BA2A8/CPA-10-Cap5.pdf).

Now that we know what is an investment fund, and how is its structure, let's be curious about it:
- How many funds are in the market?
- What are the main assets traded? Financial? Real State? Credit?
- Who are the big managers?
- Who are the main issuers?
- How was the market for the last years?

## Importing libraries

In [1]:
import pandas as pd # manipulating data
import numpy as np  # basic math operations
import matplotlib.pyplot as plt # graphs
import seaborn as sns   #graphs
import requests # request files on Brazilian Securities and Exchange Comission (CVM)
import zipfile  # unzip CVM files
import os   # manipulate disk files

## Downloading data
Investment Funds must be registered in Brazilian Securities and Exchange Comission (CVM). CVM has funds data on their daily returns, benchmark index, type of fund, issuer, manager and other informations related to it. All data is open to the public in CVM [website](https://dados.cvm.gov.br/group/fundos-de-investimento):
- Funds return: one database for daily, monthly, quarterly, anual
- Register info: funds name, manager, issuer, type, open/close
- Statement of Income: database with funds link to their state of income
- Investors profile: who owns funds quotas (other business, retirement funds, individual investors, professional investors)
- Performance metrics: how to calculate return, risk accordin to managers, collateral

For this study, we are interested in the daily returns and register info. This way we can map funds by their features and follow their performance in time. The advantage to use daily data is that we can transform daily info into month, quarter and annual.

In [2]:
# Creating parameters to download data
## Date paramenters to match CVM files
years = ['2024','2023','2022','2021','2020']    # Creating a five year window so we can see the end of pandemic and current government
legacy = ['2020']   # Creating legacy list, CVM moves old data to another URL/directory

months = range(1,13)    # Crete month list from Jan(01) to Dec(12)
month_list = []     # List must be a string bc we'll add each emelento to a url request to CVM

for i in months:    # Transform each integer element into a string element
    if i<10:    # For months with only one digit, we need to add zero (0) before to match the csv file
        i = str('0'+str(i))
    else:   # Months with two digits only need to be converted to string
        i = str(i)
    month_list.append(i)    # Append each string to the month list


## URL with daily return data
### URL model: https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/inf_diario_fi_202307.zip
### We will replace the date on the URL '202307'
# daily_return_url = f'https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/HIST/inf_diario_fi_{yyyy}{mm}.zip'

In [12]:
cvm_daily_return = pd.DataFrame()
cvm_legacy_return = pd.DataFrame()
# Create loop to download data from 2020 to July 2024
for yyyy in years:
    # try:
        if yyyy in legacy:
            daily_return_url = f'https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/HIST/inf_diario_fi_{yyyy}.zip'
            download_url = requests.get(daily_return_url)
            zip_ref = zipfile.ZipFile(f'inf_diario_fi_{yyyy}.zip')
            
        else:
            for mm in month_list:
                daily_return_url = f'https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/inf_diario_fi_{yyyy}{mm}.zip'
                download_url = requests.get(daily_return_url)
                zip_filename = f'inf_diario_fi_{yyyy}{mm}.zip'
                with open(zip_filename, 'wb') as zip_ref:
                    zip_ref.write(download_url.content)
                with zipfile.ZipFile(zip_filename) as cvm_zip:
                    for file_name in cvm_zip.namelist():
                        if file_name.endswith('.csv'):
                            with cvm_zip.open(file_name) as cvm_csv:
                                cvm_daily_return_temp = pd.read_csv(cvm_csv, sep=';')
                                cvm_daily_return = pd.concat([cvm_daily_return, cvm_daily_return_temp])
                os.remove(zip_filename)  # Deleta o arquivo ZIP
        cvm_daily_return = pd.concat([cvm_daily_return,cvm_legacy_return])
    # except:
    #     pass
# print(zip_ref)

BadZipFile: File is not a zip file

In [13]:
cvm_daily_return

Unnamed: 0,TP_FUNDO,CNPJ_FUNDO,DT_COMPTC,VL_TOTAL,VL_QUOTA,VL_PATRIM_LIQ,CAPTC_DIA,RESG_DIA,NR_COTST
0,FI,00.017.024/0001-53,2024-01-02,1136699.13,34.298860,1139708.10,0.0,0.0,1
1,FI,00.017.024/0001-53,2024-01-03,1137245.82,34.312303,1140154.80,0.0,0.0,1
2,FI,00.017.024/0001-53,2024-01-04,1137741.93,34.326023,1140610.71,0.0,0.0,1
3,FI,00.017.024/0001-53,2024-01-05,1138240.64,34.338221,1141016.02,0.0,0.0,1
4,FI,00.017.024/0001-53,2024-01-08,1138427.98,34.350495,1141423.89,0.0,0.0,1
...,...,...,...,...,...,...,...,...,...
566096,FI,97.929.213/0001-34,2024-07-24,85934325.45,12.006308,85939053.64,0.0,0.0,2
566097,FI,97.929.213/0001-34,2024-07-25,85830176.90,11.991669,85834264.75,0.0,0.0,2
566098,FI,97.929.213/0001-34,2024-07-26,86024213.90,12.018688,86027661.95,0.0,0.0,2
566099,FI,97.929.213/0001-34,2024-07-29,86057021.22,12.023181,86059828.50,0.0,0.0,2


In [None]:
cvm_daily_return = []
# Create loop to download data from 2020 to July 2024
for yyyy in years:
    # try:
        if yyyy in ['2000']:
            daily_return_url = f'https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/HIST/inf_diario_fi_{yyyy}.zip'
            download_url = requests.get(daily_return_url)
            zip_ref = zipfile.ZipFile(f'inf_diario_fi_{yyyy}.zip')
        else:
            for mm in month_list:
                daily_return_url = f'https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/inf_diario_fi_{yyyy}{mm}.zip'
                download_url = requests.get(daily_return_url)
                zip_ref = zipfile.ZipFile(f'inf_diario_fi_{yyyy}{mm}.zip')
        cvm_daily_return_temp = pd.read_csv(zip_ref.open(f'inf_diario_fi_{yyyy}{mm}.csv'))
        cvm_daily_return = pd.concat(cvm_daily_return, axis = 1)
    # except:
    #     pass
print(zip_ref)

In [18]:
daily_return_url

'https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/inf_diario_fi_202401.zip'