### Standardization of SUPERCIAS data

The objective of this notebook is to automate the preprocessing and generation of the financial statements of all the companies listed in the Superintendencia de Compañías, Valores y Seguros (SUPERCIAS). The data is fairly complete, but the aggregated accounts are often recorded incorrectly, so they need to be filled in automatically from the values of the accounts that compose them.

This can be done with the IFRS manual developed by SUPERCIAS, which specifies exactly which accounts compose each of the faulty aggregated accounts mentioned above. The general idea is to sum those smaller accounts to obtain a more reliable value for each aggregate.

In [32]:
#Import Libraries
import pandas as pd
import os
from zipfile import ZipFile
import tabula

Before working on the standardization, it is crucial to understand how the aggregation method of the Ecuadorian Superintendency of Companies, Securities, and Insurance works. To do this, we picked the largest company by revenue, Corporación Favorita C.A. The reasoning is that such a large company should have the fewest empty values in its disaggregated accounts, which makes it easier to work out the full set of rules needed to generate the aggregated accounts.

We will read the latest balance sheet and income statement submitted by this company and then use the IFRS guide developed by the Superintendency to design the aggregation methodology. The idea is to create a function that takes a specific account and sums all of its disaggregated "children".
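The function described above could look roughly like the sketch below. It assumes a long-format table with one code column and one value column, and that a direct child's code is its parent's code plus two characters. The name `sum_direct_children` and the toy values are illustrative and not part of the notebook.

```python
import pandas as pd

def sum_direct_children(statement: pd.DataFrame, parent_code: str,
                        code_col: str = "CÓDIGO",
                        value_col: str = "VALOR (En USD$)") -> float:
    """Sum the accounts one level below `parent_code`.

    Assumption: a direct child's code is the parent's code plus two characters.
    """
    codes = statement[code_col].astype(str)
    is_child = codes.str.startswith(parent_code) & (codes.str.len() == len(parent_code) + 2)
    return float(statement.loc[is_child, value_col].sum())

# Toy statement with made-up values (not taken from any filing)
toy = pd.DataFrame({
    "CÓDIGO": ["1", "101", "10101", "10102", "102"],
    "VALOR (En USD$)": [0.0, 150.0, 100.0, 50.0, 250.0],
})

total_assets = sum_direct_children(toy, "1")      # adds 101 and 102
current_assets = sum_direct_children(toy, "101")  # adds 10101 and 10102
print(total_assets, current_assets)
```

Restricting the match to codes exactly two characters longer keeps grandchildren out of the sum, so no value is counted twice.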

In [126]:
#Read the submitted statements from the PDF (tabula returns one DataFrame per detected table).
fav = tabula.read_pdf("Guide_Docs/favorita_2022.pdf", pages="all")
for i in range(len(fav)):
    #In the first table the header row sits at position 7; in the others it is the first row.
    num = 7 if i == 0 else 0
    fav[i].columns = fav[i].iloc[num]
    #Keep only the rows below the header row.
    fav[i] = fav[i].iloc[num + 1:].reset_index(drop=True)
#Drop the last table and stack the remaining ones into a single DataFrame.
fav = pd.concat(fav[:-1])
#Set dtypes for each column
fav["CUENTA"] = fav["CUENTA"].astype(str)
fav["CÓDIGO"] = fav["CÓDIGO"].astype(str)
fav["VALOR (En USD$)"] = fav["VALOR (En USD$)"].astype(float)
#Display floats with two decimals instead of scientific notation
pd.options.display.float_format = '{:.2f}'.format
fav


Unnamed: 0,CUENTA,CÓDIGO,VALOR (En USD$)
0,ACTIVO,1,2480403867.05
1,ACTIVO CORRIENTE,101,701635273.03
2,EFECTIVO Y EQUIVALENTES DE EFECTIVO,10101,7283311.87
3,CAJA,1010101,1342366.55
4,INSTITUCIONES FINANCIERAS PÚBLICAS,1010102,0.00
...,...,...,...
48,SUPERÁVIT POR REVALUACIÓN DE INVERSIONES,30607,0.00
49,RESULTADOS DEL EJERCICIO,307,152679114.47
50,GANANCIA NETA DEL PERIODO,30701,152679114.47
51,(-) PÉRDIDA NETA DEL PERIODO,30702,0.00


In [127]:
set(fav["CÓDIGO"].str.len())

{1, 2, 3, 5, 7, 9, 11}
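The length of 2 in the set above stands out, since every other length is odd. One way to find the rows responsible is to filter on even code lengths. The sketch below runs on a toy frame rather than on `fav`, and the code `"99"` is invented purely to mimic such an outlier.

```python
import pandas as pd

# Toy stand-in for `fav`; the code "99" is invented to mimic an outlier.
toy = pd.DataFrame({
    "CUENTA": ["ACTIVO", "ACTIVO CORRIENTE", "HYPOTHETICAL ROW"],
    "CÓDIGO": ["1", "101", "99"],
})

# Count how many codes exist at each length
length_counts = toy["CÓDIGO"].str.len().value_counts().sort_index()

# Keep only the rows whose code has an even number of characters
even_length_rows = toy[toy["CÓDIGO"].str.len() % 2 == 0]

print(length_counts)
print(even_length_rows)
```

Applied to `fav`, the same filter would list whichever rows produced the length of 2.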

In [143]:
#Display parent
print("Displaying Total Assets account:")
parent = fav[fav["CÓDIGO"] == "1"]
display(parent)
#Display children: codes that start with "1" and are exactly three characters long
print("What is the Total Assets account composed of?")
children = fav[(fav["CÓDIGO"].str.startswith("1")) & (fav["CÓDIGO"].str.len() == 3)]
display(children)
s = children["VALOR (En USD$)"].sum()
print(f"Total sum of children = {s}")
#Test if parent matches the sum of children (within half a cent, to avoid float rounding noise)
print(f"""Is sum of children equal to parent?: {abs(parent["VALOR (En USD$)"].iloc[0] - s) < 0.005}""")

Displaying Total Assets account:


Unnamed: 0,CUENTA,CÓDIGO,VALOR (En USD$)
0,ACTIVO,1,2480403867.05


What is the Total Assets account composed of?


Unnamed: 0,CUENTA,CÓDIGO,VALOR (En USD$)
1,ACTIVO CORRIENTE,101,701635273.03
3,ACTIVOS NO CORRIENTES,102,1778768594.02


Total sum of children = 2480403867.05
Is sum of children equal to parent?: True


According to the account numbering, as well as the IFRS guide, the accounts follow a specific logic. Account codes have an odd number of digits (the only exception in the length check above is a length of 2). A child's code is its parent's code plus two more characters. For example, the Total Assets account (1) is composed of the Current Assets (101) and Non-Current Assets (102) accounts. The same holds for the accounts that compose Current Assets and Non-Current Assets, which are (101XX) and (102XX), respectively.
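Because a parent's code is simply the child's code without its last two characters, every parent in a statement can be checked in one pass by grouping on that derived parent code. The following is a minimal sketch under that assumption. The helper `parent_of`, the shortened column name `VALOR`, and the toy values are hypothetical, and the top-level value is deliberately inconsistent to show a failed check.

```python
import pandas as pd

def parent_of(code: str):
    """Return the parent code (the code minus its last two characters), or None."""
    return code[:-2] if len(code) >= 3 else None

# Toy statement with made-up values; account 1 is deliberately wrong (350 expected).
toy = pd.DataFrame({
    "CÓDIGO": ["1", "101", "10101", "10102", "102"],
    "VALOR": [400.0, 150.0, 100.0, 50.0, 200.0],
})

# Derive each account's parent, then sum the children under every parent.
toy["PARENT"] = toy["CÓDIGO"].map(parent_of)
children_sum = toy.groupby("PARENT")["VALOR"].sum()  # rows with no parent are dropped

# Put the reported value and the sum of children side by side.
check = toy.set_index("CÓDIGO")[["VALOR"]].join(
    children_sum.rename("CHILDREN_SUM"), how="inner"
)
check["MATCHES"] = (check["VALOR"] - check["CHILDREN_SUM"]).abs() < 0.005
print(check)
```

Accounts without children do not appear in `check`, since there is nothing to compare them against.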

In [31]:
#Decompress 2022's financial statements
with ZipFile("Financials/estadosFinancieros_2022.zip","r") as zip_ref:
    zip_ref.extractall("Financials")
#Read the financial statements and the account catalog
df = pd.read_csv("Financials/balances_2022_1.txt",sep = "\t",encoding = "latin-1")
dic = pd.read_csv("Financials/catalogo_2022_1.txt",sep = "\t",encoding = "latin-1")
display(df.head(5))
display(dic.head(5))


Unnamed: 0,AÑO,EXPEDIENTE,RUC,NOMBRE,RAMA_ACTIVIDAD,DESCRIPCION_RAMA,CIIU,CUENTA_1,CUENTA_101,CUENTA_10101,...,CUENTA_80004,CUENTA_80005,CUENTA_80006,CUENTA_80007,CUENTA_80008,CUENTA_80009,CUENTA_801,CUENTA_80101,CUENTA_80102,Unnamed: 629
0,2022,1,1790013731001,ACEITES TROPICALES SOCIEDAD ANONIMA ATSA,A,"AGRICULTURA, GANADERÍA, SILVICULTURA Y PESCA.",A0126.01,136005192,7163018,5474404,...,0,-124481370,0,0,0,0,-128854554,0,0,
1,2022,2,1790004724001,ACERIA DEL ECUADOR CA ADELCA.,C,INDUSTRIAS MANUFACTURERAS.,C2410.25,45904562792,24176799046,4180908924,...,0,0,0,0,0,0,0,0,0,
2,2022,3,1790008959001,ACERO COMERCIAL ECUATORIANO S.A.,G,COMERCIO AL POR MAYOR Y AL POR MENOR REPARACIÓ...,G4610.03,1122447090,1089343961,123846608,...,0,0,0,0,0,0,0,0,0,
3,2022,11,1790044149001,AEROVIAS DEL CONTINENTE AMERICANO S.A. AVIANCA,H,TRANSPORTE Y ALMACENAMIENTO.,H5110.01,2091542896,218195471,205045471,...,0,0,0,0,0,0,0,0,0,
4,2022,22,1790023516001,AGENCIAS Y REPRESENTACIONES CORDOVEZ SA,G,COMERCIO AL POR MAYOR Y AL POR MENOR REPARACIÓ...,G4630.95,2779966203,2713138013,121227674,...,0,140276,0,0,0,0,101021323,0,0,


Unnamed: 0,1,ACTIVO
0,101,ACTIVO CORRIENTE
1,10101,EFECTIVO Y EQUIVALENTES DE EFECTIVO
2,1010101,CAJA
3,1010102,INSTITUCIONES FINANCIERAS PÚBLICAS
4,1010103,INSTITUCIONES FINANCIERAS PRIVADAS
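In the `dic` preview above, the column labels are `1` and `ACTIVO`, which suggests that the catalog file has no header row and that its first record was consumed as the header. A minimal sketch of one way around this follows, assuming the file really is header-less. The column names `CODIGO` and `CUENTA` are hypothetical, and an in-memory string stands in for `catalogo_2022_1.txt`.

```python
import io
import pandas as pd

# In-memory stand-in for the tab-separated catalog (rows copied from the preview above)
sample = (
    "1\tACTIVO\n"
    "101\tACTIVO CORRIENTE\n"
    "10101\tEFECTIVO Y EQUIVALENTES DE EFECTIVO\n"
)

dic = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,                 # do not treat the first record as column names
    names=["CODIGO", "CUENTA"],  # hypothetical labels, not defined by the source file
    dtype={"CODIGO": str},       # keep codes as text so prefix and length logic works
)
print(dic)
```

With the real file, the same arguments would be passed together with the path and `encoding="latin-1"`, as in the cell above.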
