# Brazilian Investment Funds Overview

The main objective of this project is to map the investment fund market in Brazil. Investment funds can trade different types of assets, such as financial products (stocks, options, debentures), real estate, or shares of other funds. Each fund raises capital by offering quotas, either continuously (open fund) or in a one-time offer (closed fund).

Each fund is created and managed by a financial institution (the manager). The manager also takes care of the paperwork: the fund rules, the index investors should use as a benchmark, and the general structure a bank needs to issue the fund. The bank's role is to ensure the fund is legitimate, to check that its paperwork is in order, and to distribute the fund's financial sheets so potential investors can buy a quota. This structure is an efficient way to pool resources and diversify risk for all parties involved.

Learn more in ANBIMA [PDF](https://www.anbima.com.br/data/files/D7/B6/AD/5E/369EC8104606BDC8B82BA2A8/CPA-10-Cap5.pdf).

Now that we know what an investment fund is and how it is structured, let's get curious about it:
- How many funds are in the market?
- What are the main assets traded? Financial? Real estate? Credit?
- Who are the big managers?
- Who are the main issuers?
- How has the market performed over the last few years?

## Importing libraries

In [1]:
import pandas as pd # manipulating data
import numpy as np  # basic math operations
import matplotlib.pyplot as plt # graphs
import seaborn as sns   # graphs
import requests # request files from the Brazilian Securities and Exchange Commission (CVM)
import zipfile  # unzip CVM files
import os   # manipulate disk files

## Downloading data
Investment funds must be registered with the Brazilian Securities and Exchange Commission (CVM). CVM publishes data on each fund's daily returns, benchmark index, type, issuer, manager and other related information. All data is open to the public on the CVM [website](https://dados.cvm.gov.br/group/fundos-de-investimento):
- Funds return: one database each for daily, monthly, quarterly and annual data
- Register info: fund name, manager, issuer, type, open/closed
- Statement of Income: database with links to each fund's statement of income
- Investors profile: who owns fund quotas (other businesses, retirement funds, individual investors, professional investors)
- Performance metrics: how return is calculated, risk according to managers, collateral

For this study, I'm interested in the daily returns and the register info. This way I can map funds by their features and follow their performance over time. The advantage of daily data is that I can aggregate it into monthly, quarterly and annual figures.
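As a minimal sketch of that aggregation, the example below turns a few daily quota values for a single fund into monthly returns. The data is made up for illustration, and the column names `DT_COMPTC` (date) and `VL_QUOTA` (quota value) are assumptions about the CVM daily file layout; adjust them to the columns you actually find.

```python
import pandas as pd

# Toy daily data for ONE fund (illustrative values, not real CVM data).
# Column names are assumed to match the CVM daily file.
daily = pd.DataFrame({
    'DT_COMPTC': pd.to_datetime(['2024-01-02', '2024-01-31', '2024-02-01',
                                 '2024-02-29', '2024-03-28']),
    'VL_QUOTA': [100.0, 102.0, 102.5, 105.06, 107.1612],
})

# Sort by date, then keep the last quota value observed in each calendar month
daily = daily.sort_values('DT_COMPTC')
month_end_quota = daily.groupby(daily['DT_COMPTC'].dt.to_period('M'))['VL_QUOTA'].last()

# Monthly return = change between consecutive month-end quota values
monthly_return = month_end_quota.pct_change()
print(monthly_return)
```

The first month has no previous value, so its return is `NaN`. For quarterly or annual figures, the same idea should work with `to_period('Q')` or `to_period('Y')`; with many funds in one dataframe, the fund identifier would need to be added to the `groupby`.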

In [2]:
# Creating parameters to download data
## Date parameters to match CVM files
years = ['2024','2023','2022','2021','2020']    # Creating a five year window so we can see the end of the pandemic and the current government
legacy = ['2020']   # Creating legacy list, CVM moves old data to another URL/directory

months = range(1,13)    # Create month range from Jan (1) to Dec (12)
month_list = [f'{m:02d}' for m in months]   # Months must be zero-padded strings ('01' to '12') bc I'll add each element to a URL request to CVM

To collect the data, I'll request the zip files containing the [daily returns of financial funds](https://dados.cvm.gov.br/dataset/fi-doc-inf_diario) directly from the CVM website. The main issue here is that there are two types of repositories: (1) monthly zip files from the current year back to three years ago (Y-3), and (2) one yearly zip file for older data. So in my study 2024, 2023, 2022 and 2021 have monthly zip files, while [2020](https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/HIST/) has a single zip file with 12 csv files inside.

To handle both cases, the download loop checks whether each year is considered old/legacy by CVM. I already created a legacy list with 2020, so the loop can use it to pick the right URL.
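The legacy list in this notebook is written by hand. As a hedged alternative, the sketch below derives it from the current year, assuming the rule described above holds (monthly files from the current year back to Y-3, yearly files before that). The function name `split_years` and the `monthly_window` parameter are hypothetical, not part of the notebook.

```python
def split_years(years, current_year, monthly_window=3):
    """Split year strings into (recent, legacy) lists.

    Assumption: CVM keeps monthly zip files from `current_year` back to
    `current_year - monthly_window`, and yearly zip files for older data.
    """
    cutoff = current_year - monthly_window
    recent = [y for y in years if int(y) >= cutoff]
    legacy = [y for y in years if int(y) < cutoff]
    return recent, legacy


recent_years, legacy_years = split_years(
    ['2024', '2023', '2022', '2021', '2020'], current_year=2024
)
print(recent_years)   # years expected to have monthly zip files
print(legacy_years)   # years expected to have a single yearly zip file
```

If CVM changes how far back it keeps monthly files, only `monthly_window` would need to change.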

URL with daily return data
URL model: dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/**DADOS/inf_diario_fi_202307**.zip
- replace the date on the URL '202307' with the loop

URL for legacy data
URL legacy model: dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/**DADOS/HIST/inf_diario_fi_2000**.zip
- replace only the year 2000
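The two URL models above can be sketched as a small helper. The function name `build_daily_return_url` is hypothetical; the URL patterns are the ones listed above.

```python
BASE_URL = 'https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS'


def build_daily_return_url(yyyy, mm=None):
    """Build the zip URL for a year (legacy) or for a year and month (recent).

    When `mm` is None the legacy (HIST) pattern is used; otherwise the
    monthly pattern is used, with the month zero-padded to two digits.
    """
    if mm is None:
        return f'{BASE_URL}/HIST/inf_diario_fi_{yyyy}.zip'
    return f'{BASE_URL}/inf_diario_fi_{yyyy}{int(mm):02d}.zip'


print(build_daily_return_url('2023', '07'))
print(build_daily_return_url('2020'))
```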

In [3]:
daily_frames = []   # One dataframe per monthly csv (recent years)
legacy_frames = []  # One dataframe per csv inside the yearly legacy zip files

# Create loop to download data from 2020 to July 2024
## I'll collect data for each year in our years list
for yyyy in years:
    if yyyy in legacy:  # Years older than Y-3 are considered history and moved to a different directory, I'll call it legacy
        zip_urls = [f'https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/HIST/inf_diario_fi_{yyyy}.zip']   # One zip per year, all 12 month csv are inside it
        target = legacy_frames
    else:   # Recent years are split by year and month, so we need to run all month_list elements
        zip_urls = [f'https://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/inf_diario_fi_{yyyy}{mm}.zip' for mm in month_list]
        target = daily_frames
    for daily_return_url in zip_urls:
        download_url = requests.get(daily_return_url)   # Access URL with zipfile, else it would be just a string on our local machine
        if not download_url.ok:     # Months yet to come (YYYYMM+1) have no file on CVM, so skip them instead of hiding every error with a bare except
            continue
        zip_filename = daily_return_url.split('/')[-1]  # Save zipfile name, it's important to name it here bc I'll delete the zipfile after
        with open(zip_filename, 'wb') as zip_ref:   # Using 'wb' because I need to Write ('w') the file in Binary ('b')
            zip_ref.write(download_url.content)     # Write/save zipfile on local disk. '.content' holds the raw bytes of the downloaded zip
        with zipfile.ZipFile(zip_filename) as cvm_zip:  # Manipulate the zipfile just saved on our disk
            for file_name in cvm_zip.namelist():    # Look for all files within zipfile
                if file_name.endswith('.csv'):      # If a file ends with '.csv' I'll open and read it with pandas
                    with cvm_zip.open(file_name) as cvm_csv:
                        target.append(pd.read_csv(cvm_csv, sep=';'))    # Read csv, but Brazilian data is separated with ';'
        os.remove(zip_filename)  # Delete zipfile from disk so I can keep a clean directory

cvm_legacy_return = pd.concat(legacy_frames) if legacy_frames else pd.DataFrame()   # All legacy data
cvm_daily_return = pd.concat(daily_frames + legacy_frames)  # Concatenate current and legacy data once, outside the loop

cvm_daily_return    # Observe we get all daily data from 2024 back to 2020
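The download cell above writes each zip file to disk and deletes it afterwards. As an alternative technique (not what this notebook does), the bytes can be read straight from memory with `io.BytesIO`. The sketch below is self-contained: it builds a tiny zip in memory instead of calling CVM, and the helper name `read_csvs_from_zip_bytes`, the file names and the values are all made up for illustration.

```python
import io
import zipfile

import pandas as pd


def read_csvs_from_zip_bytes(content, sep=';'):
    """Read every .csv inside a zip given as raw bytes and concatenate them."""
    frames = []
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        for name in archive.namelist():
            if name.endswith('.csv'):           # ignore any non-csv members
                with archive.open(name) as csv_file:
                    frames.append(pd.read_csv(csv_file, sep=sep))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


# Build a small zip in memory to stand in for a downloaded file
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('inf_diario_fi_202401.csv', 'CNPJ_FUNDO;VL_QUOTA\nA;1.0\nB;2.0\n')
    archive.writestr('inf_diario_fi_202402.csv', 'CNPJ_FUNDO;VL_QUOTA\nA;1.1\n')
    archive.writestr('readme.txt', 'not a csv')

demo = read_csvs_from_zip_bytes(buffer.getvalue())
print(demo)
```

With a real download, the argument would be the `.content` of the `requests` response. This avoids the temporary files, at the cost of holding the whole zip in memory.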

The next step is to identify the funds. Investors know them by name, not by register number. I'll also need to check each fund's status to see if it is still active. Here I face a situation similar to the return data: there is [current](https://dados.cvm.gov.br/dados/FI/CAD/DADOS/cad_fi.csv) and [legacy](https://dados.cvm.gov.br/dados/FI/CAD/DADOS/cad_fi_hist.zip) data.