# Tech Challenge

## Description

You have been hired as a consultant, and your job involves analyzing Embrapa's viticulture and winemaking data, which is available [here](http://vitibrasil.cnpuv.embrapa.br/index.php).

The goal of the project is to build a public _API_ for querying the data under the following tabs of the site:

* Produção (Production)
* Processamento (Processing)
* Comercialização (Commercialization)
* Importação (Imports)
* Exportação (Exports)

The _API_ will feed a database that will later be used by a _Machine Learning_ model.

Your objectives include:

* Build a REST _API_ in [Python](https://www.python.org/) that queries the [Embrapa site](http://vitibrasil.cnpuv.embrapa.br/index.php).
* Document the _API_.
* Choosing an authentication method ([`JWT`](https://jwt.io), for example) is recommended but not required.
* Create a plan for _deploying_ the _API_, designing the project architecture from ingestion through to feeding the model (there is no need to build an ML model here, but you do need to pick an interesting scenario in which the _API_ could be used).
* Build an _MVP_ and _deploy_ it, with a shareable link and a repository on [GitHub](https://github.com/LucasVmigotto/fiap-pos-ml-eng-techchallenge-1).

## Setup

### Library Imports

In [2]:
from os.path import exists
from os import listdir
from urllib.request import urlretrieve
from pathlib import Path

import numpy as np
import pandas as pd
import seaborn
from tqdm import tqdm
from matplotlib import pyplot as plt

### Constants

In [3]:
BASE_URL = 'http://vitibrasil.cnpuv.embrapa.br/download/'

CSV_URLS = dict(
    producao=dict(producao='Producao.csv'),
    processamento=dict(viniferas='ProcessaViniferas.csv',
                       americanas='ProcessaAmericanas.csv',
                       mesa='ProcessaMesa.csv',
                       sem_classificacao='ProcessaSemclass.csv'),
    comercializacao=dict(comercio='Comercio.csv'),
    importacao=dict(mesa='ImpVinhos.csv',
                    espumantes='ImpEspumantes.csv',
                    frescas='ImpFrescas.csv',
                    passas='ImpPassas.csv',
                    suco='ImpSuco.csv'),
    exportacao=dict(mesa='ExpVinho.csv',
                    espumantes='ExpEspumantes.csv',
                    frescas='ExpUva.csv',
                    suco='ExpSuco.csv'),
)

In [4]:
PATH_BASE_DATA = Path('./data')
PATH_RAW_DATA = PATH_BASE_DATA / 'raw'
PATH_INTERIM_DATA = PATH_BASE_DATA / 'interim'

SEED = 42

np.random.seed(SEED)

PATH_RAW_DATA.mkdir(parents=True, exist_ok=True)
PATH_INTERIM_DATA.mkdir(parents=True, exist_ok=True)

### Utils

In [4]:
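def download_files(url: str, save_path: Path, is_retry: bool = False) -> None:
    """Download `url` to `save_path`, retrying once on a connection reset."""
    try:
        # Skip files that were already downloaded
        if not exists(save_path):
            urlretrieve(url, save_path)
    except ConnectionResetError:
        if not is_retry:
            print(f'Download of {url} was interrupted, retrying once')
            download_files(url, save_path, True)
        else:
            print(f'File {url} could not be downloaded')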
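
`download_files` retries exactly once on `ConnectionResetError`. If the server turns out to be flakier than that, the same idea generalizes to a bounded retry loop. The sketch below is hypothetical: `retry`, `attempts`, and `delay` are names introduced here, not part of the notebook, and `flaky_download` is a stand-in for the real `urlretrieve` call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


def retry(action: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Call `action` until it succeeds or `attempts` is exhausted.

    Hypothetical helper: only ConnectionResetError is retried, mirroring
    the exception handled by `download_files`.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionResetError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            time.sleep(delay * attempt)  # linear backoff between attempts
    raise ValueError('attempts must be at least 1')


# Stand-in for a flaky download: fails twice, then succeeds.
calls = {'count': 0}


def flaky_download() -> str:
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionResetError('connection reset by peer')
    return 'ok'


result = retry(flaky_download, attempts=3)
print(result, calls['count'])
```

In the notebook, `action` could be something like `lambda: urlretrieve(url, save_path)`; whether more than one retry is actually needed for this server is not something the notebook establishes.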

In [5]:
def download_data_files(data_files_structure: dict, parent_path: Path = Path('')):
    for key, value in tqdm(data_files_structure.items()):
        if isinstance(value, dict):
            download_data_files(value, parent_path / key)
            continue
        (PATH_RAW_DATA / parent_path).mkdir(parents=True,
                                            exist_ok=True)
        download_files(url=BASE_URL + value,
                       save_path=PATH_RAW_DATA / parent_path / value.lower())
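
To check what `download_data_files` will fetch before touching the network, the same recursion can be written as a dry run that only yields the planned `(url, save_path)` pairs. This is a sketch, not part of the notebook: `plan_downloads` is a hypothetical name, and the constants are repeated here only to keep the block self-contained.

```python
from pathlib import Path
from typing import Iterator

# Assumed values, copied from the constants defined earlier in the notebook.
BASE_URL = 'http://vitibrasil.cnpuv.embrapa.br/download/'
PATH_RAW_DATA = Path('./data') / 'raw'


def plan_downloads(structure: dict, parent: Path = Path('')) -> Iterator[tuple[str, Path]]:
    """Yield (url, save_path) pairs without touching the network or disk."""
    for key, value in structure.items():
        if isinstance(value, dict):
            # Nested dict: the key becomes a subdirectory name.
            yield from plan_downloads(value, parent / key)
        else:
            # Leaf: the value is the remote file name, stored in lowercase.
            yield BASE_URL + value, PATH_RAW_DATA / parent / value.lower()


# A small subset of CSV_URLS, enough to show the directory layout.
sample = dict(
    producao=dict(producao='Producao.csv'),
    exportacao=dict(mesa='ExpVinho.csv', suco='ExpSuco.csv'),
)
plan = list(plan_downloads(sample))
for url, path in plan:
    print(url, '->', path)
```

Note that the inner keys (`mesa`, `suco`) never reach the file system: only the category name and the lowercased file name are used for the path.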

## Data Collection

In [6]:
download_data_files(CSV_URLS)
listdir(PATH_RAW_DATA)

100%|██████████| 1/1 [00:00<00:00,  3.22it/s]
100%|██████████| 4/4 [00:01<00:00,  3.26it/s]
100%|██████████| 1/1 [00:00<00:00,  3.08it/s]
100%|██████████| 5/5 [00:01<00:00,  3.30it/s]
100%|██████████| 4/4 [00:01<00:00,  3.12it/s]
100%|██████████| 5/5 [00:04<00:00,  1.07it/s]


['producao', 'importacao', 'exportacao', 'comercializacao', 'processamento']

* Check the header of each `.csv` file to confirm which separator it uses

In [7]:
! head -n 1 ./data/raw/**/*.csv

==> ./data/raw/comercializacao/comercio.csv <==
1;VINHO DE MESA;VINHO DE MESA;98327606;114399031;118377367;116617910;94173324;108031792;139238614;140813114;141293379;149609112;122825298;128894580;166861772;195616620;171619507;185191837;203130018;131065191;150678647;172921267;164725646;190134895;180230431;201168480;180295366;146583828;165831436;174768638;181576649;200578746;221023603;221518224;227447392;217082959;225021830;271248493;245625614;226710045;200488612;234525979;221242945;230310468;206969571;221590810;206404427;209198468;166769622;176059959;177186273;180446489;215557931;210012238;187939996

==> ./data/raw/exportacao/expespumantes.csv <==
Id;País;1970;1970;1971;1971;1972;1972;1973;1973;1974;1974;1975;1975;1976;1976;1977;1977;1978;1978;1979;1979;1980;1980;1981;1981;1982;1982;1983;1983;1984;1984;1985;1985;1986;1986;1987;1987;1988;1988;1989;1989;1990;1990;1991;1991;1992;1992;1993;1993;1994;1994;1995;1995;1996;1996;1997;1997;1998;1998;1999;1999;2000;2000;2001;2001;2002;2002;2003;20

**A bit of culture**

* "Let us imagine, for a moment, that humanity were transported to a utopian land where pigeons fly about already roasted, where all food grows from the soil of its own accord, where every man finds his ideal beloved and wins her without any difficulty. In such a land, many men would die of boredom or hang themselves from the branches of trees, while others would take to fighting one another, strangling and murdering each other." (Arthur Schopenhauer)

**TL;DR**

* If it were easy, would it be any fun? (Probably still yes, but where is the meaning of life without the stress?)

**Whatever**

* All files use `;` as the separator, except those in `processamento`, which use `\t`
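
Instead of hardcoding the separator per category, it could also be guessed from the first line of each file. The sketch below simply picks the candidate that occurs most often; `guess_separator` is a hypothetical helper, and the sample lines are shortened stand-ins (the `id` column name in the tab-separated one is an assumption).

```python
def guess_separator(header_line: str, candidates: tuple[str, ...] = (';', '\t', ',')) -> str:
    """Return the candidate separator that occurs most often in the line."""
    return max(candidates, key=header_line.count)


# Shortened header lines modeled on the files inspected above.
semicolon_header = 'Id;País;1970;1970;1971;1971'
tab_header = 'id\tcontrol\tcultivar\t1970\t1971'

print(repr(guess_separator(semicolon_header)))
print(repr(guess_separator(tab_header)))
```

This heuristic is deliberately naive: it would misfire on a file whose text fields contain more commas than real separators, so it is best treated as a sanity check rather than a replacement for inspecting the files.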

## Data Manipulation

In [68]:
dfs = {}

for category in listdir(PATH_RAW_DATA):
    # `processamento` files are tab-separated; every other category uses `;`
    sep = '\t' if category == 'processamento' else ';'
    dfs[category] = {
        # Key each DataFrame by its file name without the `.csv` extension
        # and drop the leading id column
        data_file[:-4]: pd.read_csv(PATH_RAW_DATA / category / data_file,
                                    sep=sep).iloc[:, 1:]
        for data_file in listdir(PATH_RAW_DATA / category)
    }

dfs.keys()

dict_keys(['producao', 'importacao', 'exportacao', 'comercializacao', 'processamento'])
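
A note on building nested dicts such as `dfs`: `dict.fromkeys(keys, {})` binds every key to one shared dict, so an in-place write through one key shows up under all of them. Assigning a freshly built dict to each category, as the loading cell does, sidesteps the problem, but the difference is easy to miss. A minimal illustration with placeholder values:

```python
categories = ['producao', 'importacao']

# Pitfall: every key points at the SAME dict object.
shared = dict.fromkeys(categories, {})
shared['producao']['producao'] = 'df'
print(shared['importacao'])  # the write leaked into the other key

# A comprehension creates a separate dict per key.
independent = {category: {} for category in categories}
independent['producao']['producao'] = 'df'
print(independent['importacao'])  # still empty
```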

In [70]:
def print_head(df_dict: dict, parent_key: str = ''):
    for key, value in df_dict.items():
        if isinstance(value, dict):
            print_head(value, key)
        else:
            print(parent_key, key)
            display(value.head())

print_head(dfs)

producao producao


Unnamed: 0,produto,1970,1971,1972,1973,1974,1975,1976,1977,1978,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,VINHO DE MESA,217208604,154264651,146953297,116710345,193875345,177401209,144565438,195359778,200053669,...,196904222,196173123,210308560,86319015,255015187,218375636,144629737,124200414,173899995,195031611
1,Tinto,174224052,121133369,118180926,88589019,146544484,144274134,118360170,154801826,162917363,...,163111797,157776363,169811472,75279191,1365957,188270142,121045115,103916391,146075996,162844214
2,Branco,748400,1160500,1812367,243900,4138768,1441507,1871473,4954387,5079748,...,32066403,37438069,39557250,10727099,217527985,29229970,22032828,19568734,26432799,30198430
3,Rosado,42236152,31970782,26960004,27877426,43192093,31685568,24333795,35603565,32056558,...,1726022,958691,939838,312725,36121245,875524,1551794,715289,1391200,1988968
4,VINHO FINO DE MESA (VINÍFERA),23899346,23586062,21078771,12368410,31644124,39424590,34500590,41264971,36750933,...,45782530,38464314,37148982,18070626,44537870,38707220,37615422,32516686,43474998,47511796


importacao impvinhos


Unnamed: 0,País,1970,1970.1,1971,1971.1,1972,1972.1,1973,1973.1,1974,...,2018,2018.1,2019,2019.1,2020,2020.1,2021,2021.1,2022,2022.1
0,Africa do Sul,0,0.0,0,0,0,0,0,0,0,...,1127053,3574371,1092042,3604038,627150,1701072,859169,2508140,738116.0,2266827.0
1,Alemanha,52297,30498.0,34606,26027,134438,92103,111523,98638,219173,...,142971,516975,101055,412794,136992,504168,106541,546967,92600.0,438595.0
2,Argélia,0,0.0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0.0,0.0
3,Arábia Saudita,0,0.0,0,0,0,0,0,0,0,...,563,3249,0,0,0,0,2510,8761,0.0,0.0
4,Argentina,19525,12260.0,24942,15022,104906,58137,116887,76121,215930,...,15221318,52817642,16548931,54527380,22610267,66322932,26869241,79527959,27980574.0,87519642.0


importacao impfrescas


Unnamed: 0,País,1970,1970.1,1971,1971.1,1972,1972.1,1973,1973.1,1974,...,2018,2018.1,2019,2019.1,2020,2020.1,2021,2021.1,2022,2022.1
0,Argélia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,21746,34476,0,0,0,0
1,Argentina,2412831,782408,1805310,684328,1965010,779841,396840,253061,1179250,...,1616855,2499358,3211513,4659791,1798220,2559671,1356735,1696659,771187,1053519
2,Brasil,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Chile,1557316,409704,1485897,381705,1680664,578569,2472710,958295,2154407,...,16103415,25237521,9082957,13450429,4943446,6986630,3888723,5404163,6536258,9640996
4,Colômbia,0,0,3312,2168,0,0,63645,37098,131888,...,0,0,0,0,0,0,0,0,0,0


importacao imppassas


Unnamed: 0,País,1970,1970.1,1971,1971.1,1972,1972.1,1973,1973.1,1974,...,2018,2018.1,2019,2019.1,2020,2020.1,2021,2021.1,2022,2022.1
0,Afeganistão,0,0,0,0,0,0,0,0,0,...,0,0,40000,65452,0,0,0,484,0,0
1,África do Sul,0,0,0,0,0,0,0,0,0,...,1089700,3106052,270875,763921,616688,1433381,327000,531327,657826,1266633
2,"Alemanha, República Democrática",0,0,0,0,3000,3366,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Arábia Saudita,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Argentina,1530058,845068,1206326,936632,1047482,981648,1277380,1848195,1004987,...,23564519,43501396,24135143,48810789,24992502,36743996,24015825,33337604,20829418,35659325


importacao impespumantes


Unnamed: 0,País,1970,1970.1,1971,1971.1,1972,1972.1,1973,1973.1,1974,...,2018,2018.1,2019,2019.1,2020,2020.1,2021,2021.1,2022,2022.1
0,Africa do Sul,0,0,0,0,0,0,0,0,0,...,15368,74800,17583,72077,3574,14542,6980,36677,9882,64582
1,Alemanha,0,0,25,36,2864,2479,2900,3130,1667,...,18376,82273,26853,169989,21174,65359,19977,46237,12447,26877
2,Argentina,4980,3836,8811,7543,35301,26909,39208,20230,2831,...,706478,2462909,757716,2282614,469547,1304986,723847,2211657,1333420,4123623
3,Austrália,0,0,0,0,0,0,0,0,0,...,57917,212199,16701,27592,7426,15190,8062,26208,0,0
4,Austria,0,0,0,0,0,0,0,0,0,...,1228,11638,1269,11571,909,9399,90,1434,0,0


importacao impsuco


Unnamed: 0,País,1970,1970.1,1971,1971.1,1972,1972.1,1973,1973.1,1974,...,2018,2018.1,2019,2019.1,2020,2020.1,2021,2021.1,2021.2,2021.3
0,Africa do Sul,0,0,0,0,0,0,0,0,0,...,21618,18334,0,0,0,0,0,0,0,0
1,Alemanha,0,0,0,0,0,0,0,0,0,...,80,652,0,0,0,0,0,0,0,0
2,Argentina,0,0,0,0,0,0,3600,1350,10200,...,2998,3139,0,0,0,0,0,0,0,0
3,Austria,0,0,0,0,0,0,0,0,0,...,0,0,666,655,0,0,0,0,0,0
4,Canadá,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


exportacao expsuco


Unnamed: 0,País,1970,1970.1,1971,1971.1,1972,1972.1,1973,1973.1,1974,...,2018,2018.1,2019,2019.1,2020,2020.1,2021,2021.1,2022,2022.1
0,África do Sul,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,34344,49753
1,"Alemanha, República Democrática da",0,0,0,0,9962,6514,238418,160979,0,...,96,91,12,24,86,44,13,4,5,24
2,Angola,0,0,0,0,0,0,0,0,0,...,1895,1553,0,0,17766,21627,6073,5915,86536,91839
3,Antígua e Barbuda,0,0,0,0,0,0,0,0,0,...,48,53,95,99,36,25,120,168,48,57
4,Antilhas Holandesas,0,0,0,0,1125,945,144,135,0,...,0,0,0,0,0,0,0,0,0,0


exportacao expvinho


Unnamed: 0,País,1970,1970.1,1971,1971.1,1972,1972.1,1973,1973.1,1974,...,2018,2018.1,2019,2019.1,2020,2020.1,2021,2021.1,2022,2022.1
0,Afeganistão,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,11,46,0,0
1,África do Sul,0,0,0,0,0,0,0,0,0,...,0,0,26,95,4,21,0,0,0,0
2,"Alemanha, República Democrática",0,0,0,0,4168,2630,12000,8250,0,...,10794,45382,3660,25467,6261,32605,2698,6741,7630,45367
3,Angola,0,0,0,0,0,0,0,0,0,...,477,709,345,1065,0,0,0,0,4068,4761
4,Anguilla,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


exportacao expespumantes


Unnamed: 0,País,1970,1970.1,1971,1971.1,1972,1972.1,1973,1973.1,1974,...,2018,2018.1,2019,2019.1,2020,2020.1,2021,2021.1,2022,2022.1
0,Alemanha,0,0,0,0,0,0,0,0,0,...,4092,21373,1003,5466,2388,14767,142,265,1164,6560
1,Angola,0,0,0,0,0,0,0,0,0,...,63,280,1007,3615,24,38,0,0,26383,141588
2,Antigua e Barbuda,0,0,0,0,0,0,0,0,0,...,0,0,7,34,32,328,10,82,65,146
3,Antilhas Holandesas,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Argentina,0,0,0,0,0,0,0,0,0,...,4342,17243,0,0,315,894,0,0,0,0


exportacao expuva


Unnamed: 0,País,1970,1970.1,1971,1971.1,1972,1972.1,1973,1973.1,1974,...,2018,2018.1,2019,2019.1,2020,2020.1,2021,2021.1,2022,2022.1
0,Africa do Sul,0,0,0,0,0,0,0,0,0,...,0,0,8,30,44,152,0,0,0,0
1,"Alemanha, República Democrática",0,0,135,103,0,0,0,0,3840,...,2870420,5833592,1863097,3480290,1371694,2791556,1461590,2569452,559012,1213303
2,Angola,0,0,0,0,0,0,0,0,0,...,15,33,75,145,0,0,0,0,0,0
3,Antígua e Barbuda,0,0,0,0,0,0,0,0,0,...,65,164,190,580,304,1013,437,1349,253,999
4,Arabia Saudita,0,0,0,0,0,0,0,0,0,...,14725,28615,167731,271231,32325,95999,2818,14671,12224,53675
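
In the import and export tables every year shows up twice (`1970`, `1970.1`, ...): the raw header repeats each year, and pandas de-duplicates the names with a `.1` suffix. The sketch below reshapes such a table into one row per country and year. It rests on an assumption the notebook does not confirm: that the first column of each pair is a quantity and the second a monetary value. The names `quantity` and `value` are introduced here for illustration only.

```python
import pandas as pd

# Tiny stand-in for an export table; values copied from the `expespumantes` preview.
wide = pd.DataFrame({
    'País': ['Alemanha', 'Angola'],
    '2021': [142, 0],
    '2021.1': [265, 0],
    '2022': [1164, 26383],
    '2022.1': [6560, 141588],
})

melted = wide.melt(id_vars='País', var_name='column', value_name='amount')

# '2022.1' -> ('2022', '.', '1'); '2022' -> ('2022', '', '')
parts = melted['column'].str.partition('.')
melted['year'] = parts[0].astype(int)
# ASSUMPTION: the unsuffixed column is the quantity, the `.1` column the value.
melted['measure'] = parts[2].map({'': 'quantity', '1': 'value'})

tidy = (melted.pivot(index=['País', 'year'], columns='measure', values='amount')
              .reset_index())
print(tidy)
```

The `impsuco` preview ends in `2021.2` and `2021.3` rather than `2022` and `2022.1`, which suggests that its raw header repeats 2021 where 2022 was probably intended. That file would need its header corrected before a reshape like this one.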


comercializacao comercio


Unnamed: 0,VINHO DE MESA,VINHO DE MESA.1,98327606,114399031,118377367,116617910,94173324,108031792,139238614,140813114,...,221590810,206404427,209198468,166769622,176059959,177186273,180446489,215557931,210012238,187939996
0,vm_Tinto,Tinto,83300735,98522869,101167932,98196747,77167303,91528090,116407222,116609545,...,188033494,178250072,182028785,146646365,154309442,155115499,158519218,189573423,185653678,165067340
1,vm_Rosado,Rosado,107681,542274,7770851,8425617,8891367,7261777,11748047,15195525,...,1777648,1419855,1409002,1391942,1097426,1972944,1265435,1394901,1931606,2213723
2,vm_Branco,Branco,14919190,15333888,9438584,9995546,8114654,9241925,11083345,9008044,...,31779668,26734500,25760681,18731315,20653091,20097830,20661836,24589607,22426954,20658933
3,VINHO FINO DE MESA,VINHO FINO DE MESA,4430629,4840369,5602091,7202830,7571802,8848303,14095648,14975330,...,27912934,20424983,20141631,19630158,15874354,14826143,15684588,24310834,27080445,21533487
4,vm_Tinto,Tinto,435354,428927,624499,783508,1616144,2050960,4450570,4504303,...,19121750,15354938,15572632,15228514,12021684,11150517,11433702,18202453,19337862,15258778
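
The `comercio` preview shows a data row (`VINHO DE MESA`, `98327606`, ...) where the column names should be. This matches the raw file, whose first line is already data, so `read_csv` consumed that record as the header. The first raw line holds 53 numeric fields after three leading ones, which fits the years 1970 to 2022. A possible fix, sketched on a shortened stand-in, is to pass `header=None` with explicit names. The names `id`, `control`, and `produto` are assumptions borrowed from the other tables, and the id of the second stand-in row is made up.

```python
import io

import pandas as pd

# Shortened stand-in for the first lines of comercio.csv (three years only).
raw = io.StringIO(
    '1;VINHO DE MESA;VINHO DE MESA;98327606;114399031;118377367\n'
    '2;vm_Tinto;Tinto;83300735;98522869;101167932\n'
)

# ASSUMPTION: column names borrowed from the other tables.
# For the real file the year range would be range(1970, 2023).
names = ['id', 'control', 'produto'] + [str(year) for year in range(1970, 1973)]

# header=None keeps the first line as data; the id column is dropped as before.
comercio = pd.read_csv(raw, sep=';', header=None, names=names).iloc[:, 1:]
print(comercio)
```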


processamento processamesa


Unnamed: 0,control,cultivar,1970,1971,1972,1973,1974,1975,1976,1977,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,TINTAS,TINTAS,56976,43390,4428,8939,125563,183731,88512,113591,...,75362,65850,108797,51310,85510,62567,nd,63474,21732,*
1,ti_ Alphonse Lavallee,Alphonse Lavallee,31878,2333,170,7690,124762,74293,23684,24430,...,0,0,0,0,0,0,nd,0,0,*
2,ti_ Moscato de Hamburgo,Moscato de Hamburgo,25098,41057,4258,1249,801,109438,64828,89161,...,75362,65850,108797,51310,85510,62567,nd,63474,21732,*
3,BRANCAS,BRANCAS,3900,16335,6829,8052,278951,587745,153445,230332,...,4412,30600,64540,3260,5900,730,nd,45610,52760,*
4,br_Cardinal,Cardinal,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,nd,0,0,*


processamento processaviniferas


Unnamed: 0,control,cultivar,1970,1971,1972,1973,1974,1975,1976,1977,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,TINTAS,TINTAS,10448228,11012833,10798824,8213674,17457849,22593885,20265190,24830345,...,36855419,29810706,29935627,13370866,32850915,26868514,nd,28003505,93296587,*
1,ti_Alicante Bouschet,Alicante Bouschet,0,0,0,0,0,0,0,0,...,1524728,1456305,1519576,908841,2040198,2103844,nd,2272985,811140,*
2,ti_Ancelota,Ancelota,0,0,0,0,0,0,0,0,...,1137943,937844,773526,179028,733907,492106,nd,481402,6513974,*
3,ti_Aramon,Aramon,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,nd,0,0,*
4,ti_Alfrocheiro,Alfrocheiro,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,nd,0,0,*


processamento processaamericanas


Unnamed: 0,control,cultivar,1970,1971,1972,1973,1974,1975,1976,1977,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,TINTAS,TINTAS,284285642,193695427,187251954,161593877,295404693,282003228,222289729,272772106,...,460738996,471555903,537592720,235356261,600605876,519506589,nd,376136313,394224543,*
1,ti_Bacarina,Bacarina,82899,106962,67464,58690,138158,101454,57297,0,...,2990,3900,0,0,0,0,nd,0,0,*
2,ti_Bailey,Bailey,0,0,0,0,0,0,0,0,...,756000,991449,963159,442784,1370092,539742,nd,534981,4092669,*
3,ti_Bordo,Bordo,7242197,7227090,6530686,5584243,13341412,16023998,12725233,18714617,...,102788361,113008320,137467196,60976531,160146475,158405972,nd,129978861,117655879,*
4,ti_Bourdin (S),Bourdin (S),0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,nd,0,0,*


processamento processasemclass


Unnamed: 0,control,cultivar,1970,1971,1972,1973,1974,1975,1976,1977,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,sc,Sem classificação,3675463,665425,197232,491357,57307,540146,24440,9743,...,0,0,0,0,0,0,nd,166947,0,*
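
The `processamento` tables carry `nd` in the 2019 column and `*` in the 2022 column, presumably placeholders for unavailable data (the notebook does not say). Columns containing them are read as text, so they need converting before any arithmetic. A minimal sketch on stand-in rows, using `pd.to_numeric` with `errors='coerce'`:

```python
import pandas as pd

# Stand-in rows shaped like the `processamesa` preview (year columns read as text).
df = pd.DataFrame({
    'cultivar': ['TINTAS', 'BRANCAS'],
    '2018': ['62567', '730'],
    '2019': ['nd', 'nd'],
    '2022': ['*', '*'],
})

year_columns = [column for column in df.columns if column.isdigit()]
# errors='coerce' turns anything non-numeric ('nd', '*') into NaN.
df[year_columns] = df[year_columns].apply(pd.to_numeric, errors='coerce')
print(df.dtypes)
```

An alternative is to declare the placeholders at load time with `pd.read_csv(..., na_values=['nd', '*'])`, which avoids the second pass but requires knowing every placeholder in advance.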


## Data Exploration