# Import and Export Data for Santa Catarina

We want to answer the following questions:

* What are the top 3 most exported products by State for the years 2017, 2018 and 2019?
* What are the top 3 most imported products by State for the years 2017, 2018 and 2019?
* What are the top 3 most exported products in each month of 2019 by State?
* What is the percentage of total national exports by State in 2019?
* What is the percentage of total national imports by State in 2019?
* Predict the monthly value of the top 3 products exported from Santa Catarina to each destination Country.
* Predict the monthly value of the top 3 products imported into Santa Catarina from each source Country.


## Part 1: Data Wrangling for Exports

The first step is to download the files, unzip them, clean them and organize them as well as possible. I decided to store the links to the government website and download the files when the notebook runs, instead of adding them to the repository with Git LFS, because those file locations **should not** change at all. Since we are betting on the Brazilian government's efficiency, we also store the MD5 digests of the files used, "just in case" they ever change for any reason. A faster alternative is to download the zip files yourself and put them in the data folder.

In [57]:
#imports
import pandas as pd
import requests
import numpy as np
import seaborn as sns
import os
import asyncio
import hashlib
import base64
import zipfile

# base64-encoded MD5 digests of the archives used in this analysis
EXP_MD5 = b'qQlAf4t9CVrpQ9h+fMMqfw=='
IMP_MD5 = b'X5m1GyzT+AlRGNciPVHrOA=='

In [58]:
#download files (have patience young padawan...)
# Note: requests is blocking, so these coroutines effectively run one after the other;
# the async wrapper only keeps the notebook's await-based flow.
async def download_to_data_folder(file_url: str) -> str:
    os.makedirs('data', exist_ok=True)  # make sure the target folder exists
    file_name = file_url.split('/')[-1]
    local_filename = 'data/' + file_name
    with requests.get(file_url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            downloaded = 0
            for chunk in r.iter_content(chunk_size=32768):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)
                    downloaded += len(chunk)  # count the bytes actually received
                    print(file_name + '  ' + str(downloaded), end='\r')
    return local_filename

async def download_all_files() -> list:
    urls = [
        'http://www.mdic.gov.br/balanca/bd/comexstat-bd/ncm/EXP_COMPLETA.zip',
        'http://www.mdic.gov.br/balanca/bd/comexstat-bd/ncm/IMP_COMPLETA.zip',
    ]
    # only fetch the archives that are not already in data/
    request_list = [download_to_data_folder(u) for u in urls
                    if not os.path.exists('data/' + u.split('/')[-1])]
    return await asyncio.gather(*request_list)

# top-level await works in Jupyter/IPython, which already runs an event loop
file_list = await download_all_files()

In [60]:
#check MD5 of each archive against its own expected digest
expected_md5 = {'EXP_COMPLETA.zip': EXP_MD5, 'IMP_COMPLETA.zip': IMP_MD5}
for file, expected in expected_md5.items():
    file_hash = hashlib.md5()
    with open('data/' + file, 'rb') as f:
        # hash in 64 MiB blocks so the archive is never loaded into memory at once
        for chunk in iter(lambda: f.read(2**20 * file_hash.block_size), b''):
            file_hash.update(chunk)
    actual = base64.b64encode(file_hash.digest())
    if actual != expected:
        print('expected', expected, 'got', actual)
        raise Exception('Wrong file downloaded: ' + file)
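The EXP_MD5 and IMP_MD5 constants are base64-encoded MD5 digests. If the government ever replaces the archives and you want to regenerate them, a small helper along these lines should work. This is a sketch: b64_md5 is a hypothetical name, and it hashes in the same 64 MiB blocks as the check above.

```python
import base64
import hashlib

def b64_md5(path: str) -> bytes:
    """Return the base64-encoded MD5 digest of a file (same format as EXP_MD5/IMP_MD5)."""
    file_hash = hashlib.md5()
    with open(path, 'rb') as f:
        # read in 64 MiB blocks (2**20 times the 64-byte MD5 block size) to bound memory use
        for chunk in iter(lambda: f.read(2**20 * file_hash.block_size), b''):
            file_hash.update(chunk)
    return base64.b64encode(file_hash.digest())

# Usage (paths as used in this notebook):
# print(b64_md5('data/EXP_COMPLETA.zip'))
```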

In [61]:
#unzip files
for file in os.listdir('data/'):
    print(file)
    if file[-3:] == 'zip':
        with zipfile.ZipFile('data/'+file,"r") as zip_ref:
            zip_ref.extractall("data/")
            print('Extracted '+file)

EXP_COMPLETA.csv
EXP_COMPLETA.zip
Extracted EXP_COMPLETA.zip
IMP_COMPLETA.csv
IMP_COMPLETA.zip
Extracted IMP_COMPLETA.zip
NCM.csv
PAIS.csv
README.txt
URF.csv
VIA.csv


We start with exports. One important point to notice is the file size: unzipped, one file is around 1.5 GB and the other around 2.15 GB. Loading them may be a problem for Python on a computer with less than 8 GB of RAM.
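If memory is tight, one option is to let pandas read the CSV in chunks and keep only the years this analysis needs. A minimal sketch, assuming the same ; delimiter and the CO_ANO year column used below; read_years is a hypothetical helper, and the default of 1,000,000 rows per chunk is an arbitrary choice, not something the data source prescribes.

```python
import pandas as pd

def read_years(path, years, chunksize=1_000_000, **read_csv_kwargs):
    """Read a large ;-separated CSV chunk by chunk, keeping only rows whose CO_ANO is in years."""
    kept = []
    for chunk in pd.read_csv(path, delimiter=';', chunksize=chunksize, **read_csv_kwargs):
        # filter each chunk before storing it, so only matching rows accumulate in memory
        kept.append(chunk[chunk['CO_ANO'].isin(years)])
    return pd.concat(kept, ignore_index=True)

# Usage:
# exp_df = read_years('data/EXP_COMPLETA.csv', [2017, 2018, 2019])
```

Extra keyword arguments (for example usecols) are passed straight to pd.read_csv, which can cut memory further by skipping columns that get dropped later anyway.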

Let's import the exports file first.

In [62]:
exp_df = pd.read_csv('data/EXP_COMPLETA.csv', delimiter=';')

We will rename the columns to more human-friendly names. We also notice that some columns hold integer codes instead of readable names, so those have to be decoded as well. According to the documentation at http://www.mdic.gov.br/index.php/comercio-exterior/estatisticas-de-comercio-exterior/base-de-dados-do-comercio-exterior-brasileiro-arquivos-para-download, we can download the missing lookup tables and turn our data into a friendlier format.

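The first thing I learned was what FOB (Free On Board) means: the value of the goods at the port of shipment, excluding international freight and insurance. https://www.ipea.gov.br/desafios/index.php?option=com_content&view=article&id=2115:catid=28&Itemid=23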

So the column VL_FOB should be renamed to Amount(USD). We also need to download the remaining tables with human-friendly definitions to build our final DataFrame. Another important point is that we only need data from 2017 to 2019, so we will filter those years to save some memory.
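The notebook decodes these codes with merge and then drops the helper columns. For single-key lookups, an alternative worth knowing is Series.map with a code-to-name Series, which adds only the one column you need. This is a sketch with a hypothetical decode helper, assuming each code appears once in the lookup table. Note the semantic difference: map keeps rows whose code is missing from the lookup (as NaN) instead of discarding them like an inner join. The NCM table is joined on two keys (CO_NCM, CO_UNID), so it is better left to merge.

```python
import pandas as pd

def decode(df, code_col, lookup, name_col, new_col):
    """Replace an integer code column with its human-readable name taken from a lookup table."""
    # Series indexed by code, e.g. CO_VIA -> NO_VIA (codes must be unique in the lookup)
    names = lookup.set_index(code_col)[name_col]
    out = df.copy()
    out[new_col] = out[code_col].map(names)  # codes absent from the lookup become NaN
    return out.drop(columns=[code_col])

# Usage with the VIA table loaded further down:
# exp_df = decode(exp_df, 'CO_VIA', via_df, 'NO_VIA', 'Via')
```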




In [63]:
#Filtering years
exp_df.query('CO_ANO in [2017,2018,2019]', inplace=True)





In [64]:
#Renaming
exp_df.rename(
    columns={
        "CO_ANO":"Ano",
        "VL_FOB":"Amount(USD)",
        "KG_LIQUIDO":"Kg",
        "SG_UF_NCM":"Estado",
        "CO_MES":"Mes"
    },inplace=True)


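| | Ano | Mes | CO_NCM | CO_UNID | CO_PAIS | Estado | CO_VIA | CO_URF | QT_ESTAT | Kg | Amount(USD) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 17941682 | 2017 | 5 | 84181000 | 11 | 97 | SC | 7 | 145200 | 982 | 61327 | 312310 |
| 17941683 | 2017 | 11 | 20029090 | 10 | 249 | SP | 1 | 817800 | 73 | 73 | 192 |
| 17941684 | 2017 | 1 | 39235000 | 10 | 493 | RS | 1 | 817800 | 401 | 401 | 2241 |
| 17941685 | 2017 | 3 | 84313110 | 10 | 97 | RS | 7 | 145200 | 662 | 662 | 2865 |
| 17941686 | 2017 | 7 | 69099000 | 10 | 756 | SP | 4 | 817600 | 17 | 17 | 1346 |
| 17941687 | 2017 | 3 | 87089300 | 11 | 521 | SP | 1 | 817800 | 223 | 400 | 13120 |
| 17941688 | 2017 | 12 | 93062100 | 10 | 267 | SP | 1 | 817800 | 66261 | 66261 | 1480327 |
| 17941689 | 2017 | 4 | 84329000 | 10 | 275 | RS | 1 | 817800 | 5 | 5 | 111 |
| 17941690 | 2017 | 7 | 84099912 | 11 | 97 | SP | 7 | 145200 | 25 | 122 | 1922 |
| 17941691 | 2017 | 1 | 76082090 | 10 | 63 | ND | 4 | 817600 | 4 | 4 | 863 |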


In [65]:
# The lookup tables usually ship inside the zip archives (see the listing above);
# only download the ones that are still missing.
async def download_missing_tables() -> list:
    request_list = []
    for table in ['PAIS.csv', 'VIA.csv', 'NCM.csv']:
        if not os.path.exists('data/' + table):
            request_list.append(download_to_data_folder('http://www.mdic.gov.br/balanca/bd/tabelas/' + table))
    # gather() with no coroutines simply resolves to an empty list
    return await asyncio.gather(*request_list)

await download_missing_tables()

In [66]:
pais_df = pd.read_csv('data/PAIS.csv', delimiter=';', encoding="latin-1")


| | CO_PAIS | CO_PAIS_ISON3 | CO_PAIS_ISOA3 | NO_PAIS | NO_PAIS_ING | NO_PAIS_ESP |
|---|---|---|---|---|---|---|
| 0 | 0 | 898.0 | ZZZ | Não Definido | Not defined | No definido |
| 1 | 13 | 4.0 | AFG | Afeganistão | Afghanistan | Afganistan |
| 2 | 15 | 248.0 | ALA | Aland, Ilhas | Aland Islands | Alans, Islas |
| 3 | 17 | 8.0 | ALB | Albânia | Albania | Albania |
| 4 | 20 | 724.0 | ESP | Alboran-Perejil, Ilhas | Alboran-Perejil, Islands | Alboran-Perejil, Islas |
| 5 | 23 | 276.0 | DEU | Alemanha | Germany | Alemania |
| 6 | 25 | 278.0 | DEU | Alemanha Oriental | East Germany | Alemania del Este |
| 7 | 31 | 854.0 | BFA | Burkina Faso | Burkina Faso | Burkina Faso |
| 8 | 37 | 20.0 | AND | Andorra | Andorra | Andorra |
| 9 | 40 | 24.0 | AGO | Angola | Angola | Angola |


In [67]:
via_df = pd.read_csv('data/VIA.csv', delimiter=';', encoding="latin-1")


| | CO_VIA | NO_VIA |
|---|---|---|
| 0 | 0 | VIA NAO DECLARADA |
| 1 | 1 | MARITIMA |
| 2 | 2 | FLUVIAL |
| 3 | 3 | LACUSTRE |
| 4 | 4 | AEREA |
| 5 | 5 | POSTAL |
| 6 | 6 | FERROVIARIA |
| 7 | 7 | RODOVIARIA |
| 8 | 8 | CONDUTO/REDE DE TRANSMISSAO |
| 9 | 9 | MEIOS PROPRIOS |


In [68]:
produto_df = pd.read_csv('data/NCM.csv', delimiter=';', encoding="latin-1")


| | CO_NCM | CO_UNID | CO_SH6 | CO_PPE | CO_PPI | CO_FAT_AGREG | CO_CUCI_ITEM | CO_CGCE_N3 | CO_SIIT | CO_ISIC4 | CO_EXP_SUBSET | NO_NCM_POR | NO_NCM_ESP | NO_NCM_ING |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2042200 | 10 | 20422 | 1990 | 1101 | 1 | 1211 | 324 | 4000 | 10 | 599.0 | Outras peças não desossadas de ovino, frescas ... | Los dem.cortes c/huesos de la esp.ovina, fresc... | Other sheep cuts, with bone in, fresh or chilled |
| 1 | 2042300 | 10 | 20423 | 1990 | 1101 | 1 | 1211 | 324 | 4000 | 10 | 599.0 | Carnes desossadas de ovino, frescas ou refrige... | Carnes deshues.de la esp.ovina, frescas o refr... | Other meat of sheep, boneless, fresh or chilled |
| 2 | 2043000 | 10 | 20430 | 1990 | 1101 | 1 | 1212 | 324 | 4000 | 10 | 599.0 | Carcaças e meias-carcaças de cordeiro, congeladas | Canales o medias canales de cordero, congeladas | Carcases and half-carcases of lamb, frozen |
| 3 | 2044100 | 10 | 20441 | 1990 | 1101 | 1 | 1212 | 324 | 4000 | 10 | 599.0 | Carcaças e meias-carcaças de ovino, congeladas | Canales o medias canales de la esp.ovina, cong... | Carcases and half-carcases of sheep, frozen |
| 4 | 2044200 | 10 | 20442 | 1990 | 1101 | 1 | 1212 | 324 | 4000 | 10 | 599.0 | Outras peças não desossadas de ovino, congeladas | Los demás cortes s/deshuesar de la esp.ovina, ... | Other lamb cuts with bone in, frozen |
| 5 | 2044300 | 10 | 20443 | 1990 | 1101 | 1 | 1212 | 324 | 4000 | 10 | 599.0 | Carnes desossadas de ovino, congeladas | Carnes deshuesadas de la esp.ovina, congeladas | Meat of sheep, boneless, frozen |
| 6 | 2045000 | 10 | 20450 | 1990 | 1101 | 1 | 1213 | 324 | 4000 | 10 | 599.0 | Carnes de caprino, frescas, refrigeradas ou co... | Carnes de la esp.capr.frescas, refrigeradas o... | Meat of goats, fresh, chilled or frozen |
| 7 | 2050000 | 10 | 20500 | 1110 | 1990 | 1 | 124 | 220 | 4000 | 10 | 599.0 | Carnes de animais das espécies cavalar, asinin... | Carnes de la esp.caball.asnal y mular, fresc.r... | Meat of horses, asses, mules, etc.fresh, chill... |
| 8 | 2061000 | 10 | 20610 | 1511 | 1990 | 1 | 1251 | 324 | 4000 | 10 | 599.0 | Miudezas comestíveis de bovino, frescas ou ref... | Despojos comestibles de la esp.bovina, frescos... | Edible offal of bovine, fresh or chilled |
| 9 | 2062100 | 10 | 20621 | 1511 | 1269 | 1 | 1252 | 324 | 4000 | 10 | 599.0 | Línguas de bovino, congeladas | Lenguas de la esp.bovina, congeladas | Frozen edible bovine tongues |


In [69]:
exp_df = exp_df.merge(pais_df,how='inner',on='CO_PAIS')


| | Ano | Mes | CO_NCM | CO_UNID | CO_PAIS | Estado | CO_VIA | CO_URF | QT_ESTAT | Kg | Amount(USD) | CO_PAIS_ISON3 | CO_PAIS_ISOA3 | NO_PAIS | NO_PAIS_ING | NO_PAIS_ESP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | 5 | 84181000 | 11 | 97 | SC | 7 | 145200 | 982 | 61327 | 312310 | 68.0 | BOL | Bolívia | Bolivia | Bolivia |
| 1 | 2017 | 3 | 84313110 | 10 | 97 | RS | 7 | 145200 | 662 | 662 | 2865 | 68.0 | BOL | Bolívia | Bolivia | Bolivia |
| 2 | 2017 | 7 | 84099912 | 11 | 97 | SP | 7 | 145200 | 25 | 122 | 1922 | 68.0 | BOL | Bolívia | Bolivia | Bolivia |
| 3 | 2017 | 9 | 90212900 | 10 | 97 | SP | 5 | 817900 | 12 | 12 | 7980 | 68.0 | BOL | Bolívia | Bolivia | Bolivia |
| 4 | 2017 | 2 | 30039099 | 10 | 97 | RO | 9 | 250151 | 90 | 90 | 683 | 68.0 | BOL | Bolívia | Bolivia | Bolivia |
| 5 | 2017 | 8 | 52094210 | 10 | 97 | CE | 7 | 145200 | 2730 | 2730 | 16714 | 68.0 | BOL | Bolívia | Bolivia | Bolivia |
| 6 | 2017 | 11 | 85444200 | 10 | 97 | ND | 4 | 817600 | 581 | 581 | 24664 | 68.0 | BOL | Bolívia | Bolivia | Bolivia |
| 7 | 2017 | 8 | 58041090 | 10 | 97 | SP | 7 | 145200 | 982 | 982 | 15469 | 68.0 | BOL | Bolívia | Bolivia | Bolivia |
| 8 | 2017 | 6 | 85061010 | 11 | 97 | SC | 9 | 250151 | 240 | 5 | 172 | 68.0 | BOL | Bolívia | Bolivia | Bolivia |
| 9 | 2017 | 7 | 73066100 | 10 | 97 | SC | 7 | 145200 | 310648 | 310648 | 221822 | 68.0 | BOL | Bolívia | Bolivia | Bolivia |


In [70]:
exp_df.drop(['CO_PAIS','QT_ESTAT','CO_PAIS_ISON3','NO_PAIS_ING','CO_PAIS_ISOA3','NO_PAIS_ESP'],axis=1,inplace=True)


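| | Ano | Mes | CO_NCM | CO_UNID | Estado | CO_VIA | CO_URF | Kg | Amount(USD) | NO_PAIS |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | 5 | 84181000 | 11 | SC | 7 | 145200 | 61327 | 312310 | Bolívia |
| 1 | 2017 | 3 | 84313110 | 10 | RS | 7 | 145200 | 662 | 2865 | Bolívia |
| 2 | 2017 | 7 | 84099912 | 11 | SP | 7 | 145200 | 122 | 1922 | Bolívia |
| 3 | 2017 | 9 | 90212900 | 10 | SP | 5 | 817900 | 12 | 7980 | Bolívia |
| 4 | 2017 | 2 | 30039099 | 10 | RO | 9 | 250151 | 90 | 683 | Bolívia |
| 5 | 2017 | 8 | 52094210 | 10 | CE | 7 | 145200 | 2730 | 16714 | Bolívia |
| 6 | 2017 | 11 | 85444200 | 10 | ND | 4 | 817600 | 581 | 24664 | Bolívia |
| 7 | 2017 | 8 | 58041090 | 10 | SP | 7 | 145200 | 982 | 15469 | Bolívia |
| 8 | 2017 | 6 | 85061010 | 11 | SC | 9 | 250151 | 5 | 172 | Bolívia |
| 9 | 2017 | 7 | 73066100 | 10 | SC | 7 | 145200 | 310648 | 221822 | Bolívia |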


In [71]:
exp_df = exp_df.merge(via_df,how='inner',on='CO_VIA')
exp_df = exp_df.merge(produto_df,how='inner',on=['CO_NCM','CO_UNID'])


| | Ano | Mes | CO_NCM | CO_UNID | Estado | CO_VIA | CO_URF | Kg | Amount(USD) | NO_PAIS | ... | CO_PPI | CO_FAT_AGREG | CO_CUCI_ITEM | CO_CGCE_N3 | CO_SIIT | CO_ISIC4 | CO_EXP_SUBSET | NO_NCM_POR | NO_NCM_ESP | NO_NCM_ING |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | 5 | 84181000 | 11 | SC | 7 | 145200 | 61327 | 312310 | Bolívia | ... | 3745 | 3 | 77521 | 311 | 2000 | 27 | 910.0 | Combinações de refrigeradores e congeladores (... | Combin.de refrigerador c/congelad.c/puerta ext... | Refrigerators combin.with freezers, sepap.exte... |
| 1 | 2017 | 5 | 84181000 | 11 | PR | 7 | 230151 | 1092 | 3570 | Bolívia | ... | 3745 | 3 | 77521 | 311 | 2000 | 27 | 910.0 | Combinações de refrigeradores e congeladores (... | Combin.de refrigerador c/congelad.c/puerta ext... | Refrigerators combin.with freezers, sepap.exte... |
| 2 | 2017 | 4 | 84181000 | 11 | SC | 7 | 145200 | 58772 | 281605 | Bolívia | ... | 3745 | 3 | 77521 | 311 | 2000 | 27 | 910.0 | Combinações de refrigeradores e congeladores (... | Combin.de refrigerador c/congelad.c/puerta ext... | Refrigerators combin.with freezers, sepap.exte... |
| 3 | 2017 | 8 | 84181000 | 11 | SC | 7 | 145200 | 95000 | 491717 | Bolívia | ... | 3745 | 3 | 77521 | 311 | 2000 | 27 | 910.0 | Combinações de refrigeradores e congeladores (... | Combin.de refrigerador c/congelad.c/puerta ext... | Refrigerators combin.with freezers, sepap.exte... |
| 4 | 2017 | 5 | 84181000 | 11 | MG | 7 | 145200 | 2970 | 16919 | Bolívia | ... | 3745 | 3 | 77521 | 311 | 2000 | 27 | 910.0 | Combinações de refrigeradores e congeladores (... | Combin.de refrigerador c/congelad.c/puerta ext... | Refrigerators combin.with freezers, sepap.exte... |
| 5 | 2017 | 8 | 84181000 | 11 | PR | 7 | 145200 | 40324 | 228961 | Bolívia | ... | 3745 | 3 | 77521 | 311 | 2000 | 27 | 910.0 | Combinações de refrigeradores e congeladores (... | Combin.de refrigerador c/congelad.c/puerta ext... | Refrigerators combin.with freezers, sepap.exte... |
| 6 | 2017 | 3 | 84181000 | 11 | PR | 7 | 145200 | 20213 | 66990 | Bolívia | ... | 3745 | 3 | 77521 | 311 | 2000 | 27 | 910.0 | Combinações de refrigeradores e congeladores (... | Combin.de refrigerador c/congelad.c/puerta ext... | Refrigerators combin.with freezers, sepap.exte... |
| 7 | 2017 | 2 | 84181000 | 11 | PR | 7 | 145200 | 21078 | 75652 | Bolívia | ... | 3745 | 3 | 77521 | 311 | 2000 | 27 | 910.0 | Combinações de refrigeradores e congeladores (... | Combin.de refrigerador c/congelad.c/puerta ext... | Refrigerators combin.with freezers, sepap.exte... |
| 8 | 2017 | 10 | 84181000 | 11 | SC | 7 | 145200 | 144439 | 662794 | Bolívia | ... | 3745 | 3 | 77521 | 311 | 2000 | 27 | 910.0 | Combinações de refrigeradores e congeladores (... | Combin.de refrigerador c/congelad.c/puerta ext... | Refrigerators combin.with freezers, sepap.exte... |
| 9 | 2017 | 10 | 84181000 | 11 | MG | 7 | 145200 | 7138 | 42485 | Bolívia | ... | 3745 | 3 | 77521 | 311 | 2000 | 27 | 910.0 | Combinações de refrigeradores e congeladores (... | Combin.de refrigerador c/congelad.c/puerta ext... | Refrigerators combin.with freezers, sepap.exte... |


In [72]:
exp_df.drop([
    'CO_NCM',
    'CO_UNID',
    'CO_VIA',
    'CO_URF',
    'CO_PPI',
    'CO_FAT_AGREG',
    'CO_CUCI_ITEM',
    'CO_CGCE_N3',
    'CO_SIIT',
    'CO_ISIC4',
    'CO_EXP_SUBSET',
    'NO_NCM_ESP',
    'NO_NCM_ING',
    'CO_SH6',
    'CO_PPE'
],axis=1,inplace=True)


| | Ano | Mes | Estado | Kg | Amount(USD) | NO_PAIS | NO_VIA | NO_NCM_POR |
|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | 5 | SC | 61327 | 312310 | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... |
| 1 | 2017 | 5 | PR | 1092 | 3570 | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... |
| 2 | 2017 | 4 | SC | 58772 | 281605 | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... |
| 3 | 2017 | 8 | SC | 95000 | 491717 | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... |
| 4 | 2017 | 5 | MG | 2970 | 16919 | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... |
| 5 | 2017 | 8 | PR | 40324 | 228961 | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... |
| 6 | 2017 | 3 | PR | 20213 | 66990 | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... |
| 7 | 2017 | 2 | PR | 21078 | 75652 | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... |
| 8 | 2017 | 10 | SC | 144439 | 662794 | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... |
| 9 | 2017 | 10 | MG | 7138 | 42485 | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... |


After these merges and column drops, we finally have a DataFrame with all the information we need; the cell below gives the remaining columns human-friendly names and a sensible order.

In [73]:
exp_df.rename(
    columns={
        "NO_VIA":"Via",
        "NO_PAIS":"Pais",
        "NO_NCM_POR":"Produto"
    },inplace=True)
column_order = ['Ano','Mes','Estado','Pais','Via','Produto','Kg','Amount(USD)']
exp_df = exp_df[column_order]
exp_df.head(10)

| | Ano | Mes | Estado | Pais | Via | Produto | Kg | Amount(USD) |
|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | 5 | SC | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... | 61327 | 312310 |
| 1 | 2017 | 5 | PR | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... | 1092 | 3570 |
| 2 | 2017 | 4 | SC | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... | 58772 | 281605 |
| 3 | 2017 | 8 | SC | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... | 95000 | 491717 |
| 4 | 2017 | 5 | MG | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... | 2970 | 16919 |
| 5 | 2017 | 8 | PR | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... | 40324 | 228961 |
| 6 | 2017 | 3 | PR | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... | 20213 | 66990 |
| 7 | 2017 | 2 | PR | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... | 21078 | 75652 |
| 8 | 2017 | 10 | SC | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... | 144439 | 662794 |
| 9 | 2017 | 10 | MG | Bolívia | RODOVIARIA | Combinações de refrigeradores e congeladores (... | 7138 | 42485 |


The next step is to check for NAs. There should not be many, because all our merges were inner joins, which discard rows whose key is missing from either table.

In [83]:
exp_df.isna().sum()

Ano            0
Mes            0
Estado         0
Pais           0
Via            0
Produto        0
Kg             0
Amount(USD)    0
dtype: int64
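A zero NA count says nothing about the rows the inner joins threw away. One way to measure that, sketched here under the assumption that you still have the frame as it was before a given merge, is a left join with indicator=True, counting rows that exist only on the left side. count_unmatched is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def count_unmatched(left, right, on):
    """Count rows of `left` that an inner join with `right` on `on` would discard."""
    keys = [on] if isinstance(on, str) else list(on)
    # keep only distinct keys from `right` so the left join cannot duplicate rows
    right_keys = right[keys].drop_duplicates()
    # indicator=True adds a '_merge' column valued 'both', 'left_only' or 'right_only'
    merged = left.merge(right_keys, how='left', on=keys, indicator=True)
    return int((merged['_merge'] == 'left_only').sum())
```

Running it before each merge (CO_PAIS against pais_df, CO_VIA against via_df, and ['CO_NCM', 'CO_UNID'] against produto_df) would show how many export records each inner join drops.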

## Part 2: Analysing Exports

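The next step in our analysis is to answer the first question: the top exports by state in 2017, 2018 and 2019. Note that the cell below keeps only the single largest product per state and year (head(1)); head(3) would return the top 3 asked for in the question.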

In [101]:
exp_grouped = exp_df.groupby(['Ano','Estado','Produto'], as_index=False)['Amount(USD)'].sum()
exp_grouped = exp_grouped.sort_values(by='Amount(USD)',ascending=False)
exp_grouped = exp_grouped.groupby(['Ano','Estado']).head(1)
exp_grouped = exp_grouped.sort_values(by=['Estado','Ano'],ascending=False)
exp_grouped.head(10)

| | Ano | Estado | Produto | Amount(USD) |
|---|---|---|---|---|
| 115117 | 2019 | TO | Soja, mesmo triturada, exceto para semeadura | 772631871 |
| 74880 | 2018 | TO | Soja, mesmo triturada, exceto para semeadura | 995302750 |
| 35046 | 2017 | TO | Soja, mesmo triturada, exceto para semeadura | 755967278 |
| 115021 | 2019 | SP | Óleos brutos de petróleo | 3830462450 |
| 74804 | 2018 | SP | Óleos brutos de petróleo | 4808992195 |
| 32297 | 2017 | SP | Outros açúcares de cana | 5562541254 |
| 108196 | 2019 | SE | Suco (sumo) de laranja, não fermentados, sem a... | 22690534 |
| 68281 | 2018 | SE | Suco (sumo) de laranja, não fermentados, sem a... | 40056880 |
| 28964 | 2017 | SE | Suco (sumo) de laranja, não fermentados, sem a... | 34547201 |
| 107286 | 2019 | SC | Pedaços e miudezas, comestíveis de galos/galin... | 1489058442 |
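The sort-then-groupby().head() pattern used above can be wrapped in a small helper so that the yearly ranking (question 1) and the monthly 2019 ranking (question 3) share the same code. A sketch with a hypothetical top_n function: n=3 gives the top 3 per group, and n=1 should reproduce the table above.

```python
import pandas as pd

def top_n(df, group_cols, item_col, value_col='Amount(USD)', n=3):
    """Sum value_col per group and item, then keep the n largest items in each group."""
    totals = df.groupby(group_cols + [item_col], as_index=False)[value_col].sum()
    # sort once by value; groupby().head(n) then keeps the first n rows of each group
    totals = totals.sort_values(value_col, ascending=False)
    top = totals.groupby(group_cols).head(n)
    # present groups in order, largest value first inside each group
    order = [True] * len(group_cols) + [False]
    return top.sort_values(group_cols + [value_col], ascending=order)

# Usage:
# top_n(exp_df, ['Ano', 'Estado'], 'Produto', n=3)
# top_n(exp_df[exp_df['Ano'] == 2019], ['Ano', 'Mes', 'Estado'], 'Produto', n=3)
```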


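Brazil has 26 states plus the Federal District (DF), and the data also contains the code ND, which appears to mark exports with no declared state. Now we need to find out how many distinct products appear, to decide on the best representation.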

In [103]:
exp_grouped['Produto'].drop_duplicates().count()

23

We have a ratio of roughly 1:1 between products and states. This leaves us some 

In [105]:
exp_grouped = exp_df[exp_df['Ano']==2019].groupby(['Ano','Mes','Estado','Produto'], as_index=False)['Amount(USD)'].sum()
exp_grouped = exp_grouped.sort_values(by='Amount(USD)',ascending=False)
exp_grouped = exp_grouped.groupby(['Ano','Mes','Estado']).head(3)
exp_grouped = exp_grouped.sort_values(by=['Estado','Mes','Ano'],ascending=False)
exp_grouped.head(10)

| | Ano | Mes | Estado | Produto | Amount(USD) |
|---|---|---|---|---|---|
| 237711 | 2019 | 12 | TO | Soja, mesmo triturada, exceto para semeadura | 24404810 |
| 237689 | 2019 | 12 | TO | Carnes desossadas de bovino, congeladas | 21599502 |
| 237695 | 2019 | 12 | TO | Milho em grão, exceto para semeadura | 4946941 |
| 217438 | 2019 | 11 | TO | Soja, mesmo triturada, exceto para semeadura | 39979339 |
| 217404 | 2019 | 11 | TO | Carnes desossadas de bovino, congeladas | 25670606 |
| 217411 | 2019 | 11 | TO | Milho em grão, exceto para semeadura | 5680944 |
| 197570 | 2019 | 10 | TO | Soja, mesmo triturada, exceto para semeadura | 36715944 |
| 197547 | 2019 | 10 | TO | Carnes desossadas de bovino, congeladas | 18106841 |
| 197552 | 2019 | 10 | TO | Milho em grão, exceto para semeadura | 13944446 |
| 176919 | 2019 | 9 | TO | Soja, mesmo triturada, exceto para semeadura | 76427859 |


In [106]:
exp_grouped['Produto'].drop_duplicates().count()

112

In [107]:
exp_grouped = exp_df[exp_df['Ano']==2019].groupby(['Ano','Estado'], as_index=False)['Amount(USD)'].sum()
exp_grouped.head(10)

| | Ano | Estado | Amount(USD) |
|---|---|---|---|
| 0 | 2019 | AC | 32853291 |
| 1 | 2019 | AL | 319088835 |
| 2 | 2019 | AM | 731091968 |
| 3 | 2019 | AP | 261368366 |
| 4 | 2019 | BA | 8168158116 |
| 5 | 2019 | CE | 2275188077 |
| 6 | 2019 | DF | 160701336 |
| 7 | 2019 | ES | 8800321849 |
| 8 | 2019 | GO | 7133398211 |
| 9 | 2019 | MA | 3543622779 |


In [109]:
total_sum = exp_grouped['Amount(USD)'].sum()
print(total_sum)

225383482468


In [115]:
exp_grouped['percentage'] = exp_grouped['Amount(USD)'] / total_sum * 100  # vectorized, no row-wise apply
exp_grouped = exp_grouped.sort_values(by='percentage',ascending=False)
exp_grouped.head(10)

| | Ano | Estado | Amount(USD) | percentage |
|---|---|---|---|---|
| 26 | 2019 | SP | 48852560073 | 21.675306 |
| 19 | 2019 | RJ | 28634458633 | 12.704772 |
| 10 | 2019 | MG | 25138578745 | 11.153692 |
| 23 | 2019 | RS | 18545065307 | 8.228227 |
| 14 | 2019 | PA | 17841239155 | 7.915948 |
| 12 | 2019 | MT | 17206103910 | 7.634146 |
| 18 | 2019 | PR | 16454197120 | 7.300534 |
| 24 | 2019 | SC | 8951838846 | 3.971826 |
| 7 | 2019 | ES | 8800321849 | 3.904599 |
| 4 | 2019 | BA | 8168158116 | 3.624116 |
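The percentage cells above work for a single year with a precomputed total. A sketch that computes every state's share of the national total for each year at once uses groupby(...).transform('sum'), which broadcasts each year's total back onto that year's rows. state_share is a hypothetical helper; it assumes the Ano, Estado and Amount(USD) columns built in Part 1, so it should also apply unchanged to an imports DataFrame prepared the same way.

```python
import pandas as pd

def state_share(df, value_col='Amount(USD)'):
    """Return each state's total and its percentage of the national total, per year."""
    totals = df.groupby(['Ano', 'Estado'], as_index=False)[value_col].sum()
    # transform('sum') returns the per-year total aligned to every row of that year
    year_total = totals.groupby('Ano')[value_col].transform('sum')
    totals['percentage'] = totals[value_col] / year_total * 100
    return totals.sort_values(['Ano', 'percentage'], ascending=[True, False])

# Usage:
# state_share(exp_df[exp_df['Ano'] == 2019])
```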
