**Module 01 Project of the Alura Bootcamp in Applied Data Science 2020**

The goal of this project is to carry out an exploratory analysis of the Brasil.IO COVID-19 dataset.

1. Dataset link (downloaded on 6 November 2020): https://github.com/andrevcmelo/BootcampAlura2021/blob/main/COVID19-BRASILIO-06-11-2020.csv.gz

2. Brasil.IO link: https://brasil.io/home/

3. Dataset description link: https://github.com/turicas/covid19-br/blob/master/README.md#licen%C3%A7a-e-cita%C3%A7%C3%A3o


#### Importing Libraries

In [18]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns

# Set the matplotlib plot theme
plt.style.use('ggplot')

#### Importing Data

In [19]:
file_gzip = "https://raw.githubusercontent.com/alura-cursos/agendamento-hospitalar/main/dados/COVID19-BRASILIO-06-11-2020.csv.gz"

# on_bad_lines="skip" (pandas >= 1.3) replaces error_bad_lines=False, which was removed in pandas 2.0
dados = pd.read_csv(file_gzip, compression='gzip',
                    sep=",", decimal=".",
                    on_bad_lines="skip")

# Set the display format for numeric values
pd.options.display.float_format = "{:.2f}".format
# Inspect the first rows of the dataset
dados.head()

Unnamed: 0,date,state,city,place_type,confirmed,deaths,order_for_place,is_last,estimated_population_2019,estimated_population,city_ibge_code,confirmed_per_100k_inhabitants,death_rate
0,2020-11-05,AC,,state,31248,697,234,True,881935.0,894470.0,12.0,3493.47,0.02
1,2020-11-04,AC,,state,31218,696,233,False,881935.0,894470.0,12.0,3490.11,0.02
2,2020-11-03,AC,,state,30962,695,232,False,881935.0,894470.0,12.0,3461.49,0.02
3,2020-11-02,AC,,state,30954,693,231,False,881935.0,894470.0,12.0,3460.6,0.02
4,2020-11-01,AC,,state,30951,693,230,False,881935.0,894470.0,12.0,3460.26,0.02
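
The loading step can be tried offline with a small sketch. The in-memory gzip buffer and its rows are hand-made for illustration and are not the real file. One row deliberately carries an extra field to show how `on_bad_lines="skip"` (the pandas 1.3+ replacement for the older `error_bad_lines=False`) drops malformed lines instead of raising an error.

```python
import gzip
import io

import pandas as pd

# Hand-made CSV content (illustrative only); the second data row has one field too many.
raw = (
    b"date,state,confirmed\n"
    b"2020-11-05,AC,31248\n"
    b"2020-11-04,AC,31218,EXTRA\n"
    b"2020-11-03,AC,30962\n"
)

# Compress in memory so that read_csv goes through the same gzip path as the real file.
buffer = io.BytesIO(gzip.compress(raw))

sample = pd.read_csv(buffer, compression="gzip", sep=",", decimal=".",
                     on_bad_lines="skip")

# The malformed row is skipped, so only the well-formed rows remain.
print(sample)
```

Skipping bad lines silently can hide data problems, so comparing the number of rows read against the expected count is a reasonable follow-up.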


In [20]:
dados.tail()

Unnamed: 0,date,state,city,place_type,confirmed,deaths,order_for_place,is_last,estimated_population_2019,estimated_population,city_ibge_code,confirmed_per_100k_inhabitants,death_rate
936865,2020-06-23,SP,Óleo,city,1,0,5,False,2496.0,2471.0,3533809.0,40.47,0.0
936866,2020-06-22,SP,Óleo,city,1,0,4,False,2496.0,2471.0,3533809.0,40.47,0.0
936867,2020-06-21,SP,Óleo,city,1,0,3,False,2496.0,2471.0,3533809.0,40.47,0.0
936868,2020-06-20,SP,Óleo,city,1,0,2,False,2496.0,2471.0,3533809.0,40.47,0.0
936869,2020-06-19,SP,Óleo,city,1,0,1,False,2496.0,2471.0,3533809.0,40.47,0.0


In [21]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 936870 entries, 0 to 936869
Data columns (total 13 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   date                            936870 non-null  object 
 1   state                           936870 non-null  object 
 2   city                            930535 non-null  object 
 3   place_type                      936870 non-null  object 
 4   confirmed                       936870 non-null  int64  
 5   deaths                          936870 non-null  int64  
 6   order_for_place                 936870 non-null  int64  
 7   is_last                         936870 non-null  bool   
 8   estimated_population_2019       933215 non-null  float64
 9   estimated_population            933215 non-null  float64
 10  city_ibge_code                  933215 non-null  float64
 11  confirmed_per_100k_inhabitants  919888 non-null  float64
 12  death_rate                      936870 non-null  float64

In [37]:
# Number of missing values per column
print(dados.isna().sum())
print('\n')

date                                  0
state                                 0
city                               6335
place_type                            0
confirmed                             0
deaths                                0
order_for_place                       0
is_last                               0
estimated_population_2019          3655
estimated_population               3655
city_ibge_code                     3655
confirmed_per_100k_inhabitants    16982
death_rate                            0
dtype: int64
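
In the rows shown earlier, state-level records have an empty `city`, so the missing `city` values may simply be the state rows. A minimal sketch of how this could be checked, using a tiny hand-made sample (the values are illustrative, not the real dataset), is to count missing values separately for each `place_type`:

```python
import numpy as np
import pandas as pd

# Tiny hand-made sample that mimics the dataset layout (illustrative values only).
sample = pd.DataFrame({
    "state": ["AC", "AC", "SP"],
    "city": [np.nan, "Acrelândia", "Óleo"],
    "place_type": ["state", "city", "city"],
    "confirmed": [31248, 492, 1],
})

# Flag missing cells, then sum the flags within each place_type group.
missing_by_type = (
    sample.drop(columns="place_type")
    .isna()
    .groupby(sample["place_type"])
    .sum()
)
print(missing_by_type)
```

Running the same grouping on `dados` would show whether the missing values in each column are concentrated in one `place_type`.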




#### Data Preparation

Converting the `date` field from *object* to *datetime*:

In [None]:
dados['date'] = pd.to_datetime(dados['date'])
dados.info()
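
As a small illustration of what the conversion enables, the sketch below applies `pd.to_datetime` to a hand-made sample. It assumes the dates are ISO `YYYY-MM-DD` strings, as in the rows displayed earlier. Passing `format` explicitly is optional and is used here only to make that assumption visible.

```python
import pandas as pd

# Hand-made sample with ISO-formatted date strings (illustrative only).
sample = pd.DataFrame({"date": ["2020-11-05", "2020-11-04", "2020-06-19"]})

# Parse the strings into datetime64 values.
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

# Once converted, the .dt accessor exposes date components such as the month.
sample["month"] = sample["date"].dt.month
print(sample.dtypes)
```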

According to the documentation, the `place_type` field takes two values: state and city. I will split the data into two datasets, one for the state-level view and one for the city-level view.

In [35]:
# dados_estado -> states
dados_estado = dados[dados['place_type']=='state'].reset_index(drop=True)
dados_estado.head(5)

Unnamed: 0,date,state,city,place_type,confirmed,deaths,order_for_place,is_last,estimated_population_2019,estimated_population,city_ibge_code,confirmed_per_100k_inhabitants,death_rate
0,2020-11-05,AC,,state,31248,697,234,True,881935.0,894470.0,12.0,3493.47,0.02
1,2020-11-04,AC,,state,31218,696,233,False,881935.0,894470.0,12.0,3490.11,0.02
2,2020-11-03,AC,,state,30962,695,232,False,881935.0,894470.0,12.0,3461.49,0.02
3,2020-11-02,AC,,state,30954,693,231,False,881935.0,894470.0,12.0,3460.6,0.02
4,2020-11-01,AC,,state,30951,693,230,False,881935.0,894470.0,12.0,3460.26,0.02


In [36]:
# dados_cidade -> cities
dados_cidade = dados[dados['place_type']=='city'].reset_index(drop=True)
dados_cidade.head(5)

Unnamed: 0,date,state,city,place_type,confirmed,deaths,order_for_place,is_last,estimated_population_2019,estimated_population,city_ibge_code,confirmed_per_100k_inhabitants,death_rate
0,2020-11-05,AC,Acrelândia,city,492,11,211,True,15256.0,15490.0,1200013.0,3176.24,0.02
1,2020-11-04,AC,Acrelândia,city,492,11,210,False,15256.0,15490.0,1200013.0,3176.24,0.02
2,2020-11-03,AC,Acrelândia,city,492,11,209,False,15256.0,15490.0,1200013.0,3176.24,0.02
3,2020-11-02,AC,Acrelândia,city,492,10,208,False,15256.0,15490.0,1200013.0,3176.24,0.02
4,2020-11-01,AC,Acrelândia,city,492,10,207,False,15256.0,15490.0,1200013.0,3176.24,0.02
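
A quick sanity check for this kind of split, sketched on a hand-made sample rather than the real data: if `place_type` only takes the values `state` and `city`, the two subsets together should account for every row. The names `sample`, `sample_estado`, and `sample_cidade` are hypothetical and used only for this illustration.

```python
import pandas as pd

# Hand-made sample with both place types (illustrative values only).
sample = pd.DataFrame({
    "state": ["AC", "AC", "SP", "SP"],
    "city": [None, "Acrelândia", None, "Óleo"],
    "place_type": ["state", "city", "state", "city"],
})

# Same boolean-mask filtering as used for dados_estado and dados_cidade.
sample_estado = sample[sample["place_type"] == "state"].reset_index(drop=True)
sample_cidade = sample[sample["place_type"] == "city"].reset_index(drop=True)

# Every row should land in exactly one of the two subsets.
rows_covered = len(sample_estado) + len(sample_cidade)
print(rows_covered == len(sample))
print(sample["place_type"].value_counts())
```

If the comparison came out `False` on the full dataset, it would suggest a third `place_type` value or missing entries in that column.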
