<img src="https://raw.githubusercontent.com/andre-marcos-perez/ebac-course-utils/main/media/logo/newebac_logo_black_half.png" alt="ebac-logo">

---

# **Module** | Data Analysis: COVID-19 Dashboard
**Exercise** Notebook<br>
Professor [André Perez](https://www.linkedin.com/in/andremarcosperez/)

---

# **Topics**

<ol type="1">
  <li>Introduction;</li>
  <li>Exploratory Data Analysis;</li>
  <li>Interactive Data Visualization;</li>
  <li>Storytelling.</li>
</ol>


# **Exercises**

This *notebook* is meant as a guide for **you to continue** building your own interactive exploratory data analysis. Feel free to copy the code from the lessons, but try to explore the data as much as possible. Finally, publish your *notebook* on [Kaggle](https://www.kaggle.com/) and your *dashboard* on [Google Data Studio](https://datastudio.google.com/).

---

# **COVID Dashboard**

## 1\. Context

In this project, data on the COVID-19 pandemic covering 2020 to 2022 were extracted, prepared, and analyzed. The resulting dashboard then presents, graphically and didactically, the data on cases, vaccine doses, deaths, population, and the lethality of the virus.

### **1.1. TLDR**

 - **Dashboard**:
  - Google Data Studio ([link](https://lookerstudio.google.com/reporting/bfb640e1-bf15-4e19-b593-08390e7df60d)).
 - **Processing**:
  - Kaggle Notebook ([link](https://www.kaggle.com/athnxz/projeto-an-lise-de-dados-covid-19-dashboard)).
 - **Sources**:
  - Cases, deaths, and vaccination data from Our World in Data, University of Oxford ([link](https://covid.ourworldindata.org/data/owid-covid-data.csv)).

### **1.2. COVID-19 Pandemic**

> COVID-19 is an acute respiratory infection caused by the SARS-CoV-2 coronavirus; it is potentially severe, highly transmissible, and globally distributed. Source: Brazilian government ([link](https://www.gov.br/saude/pt-br/coronavirus/o-que-e-o-coronavirus)).

Throughout the project, the percentages recorded during the pandemic (vaccination coverage and lethality) are explained and compared for the years 2020 to 2022.

### **1.3. Data**

The data come from a survey maintained by the University of Oxford (Our World in Data). The columns listed below are the basis for the processing done in this project.

- **country**: Country;
- **date**: Date;
- **total_cases**: Cumulative number of cases;
- **total_deaths**: Cumulative number of deaths;
- **Lethality**: Lethality of the virus (`total_deaths / total_cases`);
- **total**: Cumulative number of doses administered;
- **one_shot**: Cumulative number of people who received the first dose (`people_vaccinated` in the source);
- **one_shot_hab**: Share of the population with one dose (`one_shot / population`);
- **one_shot_vac**: Ratio of first doses to total doses administered (`one_shot / total`);
- **two_shots**: Cumulative number of people who received the second dose (`people_fully_vaccinated` in the source);
- **two_shots_hab**: Share of the population with two doses (`two_shots / population`);
- **two_shots_vac**: Ratio of second doses to total doses administered (`two_shots / total`);
- **three_shots**: Cumulative number of third (booster) doses (`total_boosters` in the source);
- **three_shots_hab**: Share of the population with three doses (`three_shots / population`);
- **three_shots_vac**: Ratio of third doses to total doses administered (`three_shots / total`);
- **population**: Estimated population;
- **month**: Year and month of the record (`YYYY-MM`).
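
The derived columns in the list above are plain ratios. As a minimal sketch, assuming a toy DataFrame with made-up numbers (the name `toy` and all values are illustrative, not taken from the dataset), they could be computed like this:

```python
import pandas as pd

# Toy data: values are illustrative only, not taken from the real dataset
toy = pd.DataFrame({
    'total_cases': [1000],
    'total_deaths': [25],
    'total': [500],        # doses administered
    'one_shot': [300],     # people with a first dose
    'population': [2000],
})

# Share of the population with one dose
toy['one_shot_hab'] = (toy['one_shot'] / toy['population']).round(4)
# First doses as a share of all doses administered
toy['one_shot_vac'] = (toy['one_shot'] / toy['total']).round(4)
# Lethality: cumulative deaths divided by cumulative cases
toy['Lethality'] = (toy['total_deaths'] / toy['total_cases']).round(4)

print(toy[['one_shot_hab', 'one_shot_vac', 'Lethality']])
```

The same pattern applies to the `two_shots_*` and `three_shots_*` columns by swapping the numerator.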


## 2\. Packages and libraries

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np

## 3\. Extraction

In [2]:
# Data collection:
url = 'https://covid.ourworldindata.org/data/owid-covid-data.csv'
df_original = pd.read_csv(url)

# Keep only the columns used in the analysis
columns_to_keep = ['location', 'date', 'total_cases', 'total_deaths', 'total_vaccinations', 'people_vaccinated', 'people_fully_vaccinated', 'total_boosters', 'population']
df_clear = df_original[columns_to_keep]
df_clear.columns

Index(['location', 'date', 'total_cases', 'total_deaths', 'total_vaccinations',
       'people_vaccinated', 'people_fully_vaccinated', 'total_boosters',
       'population'],
      dtype='object')
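
The column selection can also happen while the file is being read, which avoids loading columns that are discarded right away. The sketch below assumes a tiny in-memory CSV as a stand-in for the real file (the names `sample_csv`, `wanted`, and `df_sample`, and all values, are illustrative) and uses the `usecols` parameter of `pd.read_csv`:

```python
import io
import pandas as pd

# Stand-in for the real CSV: a tiny in-memory file with made-up values
sample_csv = io.StringIO(
    "iso_code,location,date,total_cases,new_cases,population\n"
    "BRA,Brazil,2020-12-01,100,5,1000\n"
    "CHN,China,2020-12-01,50,1,2000\n"
)

# Columns to keep (a shortened version of the nine columns shown above)
wanted = ['location', 'date', 'total_cases', 'population']

# usecols makes read_csv return only the listed columns
df_sample = pd.read_csv(sample_csv, usecols=wanted)
print(df_sample.columns.tolist())
```

With the real URL, the same call would take the full list of nine column names instead of `wanted`.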

In [3]:
# Filter by location:
df_aux1 = df_clear.loc[df_clear['location']=='Brazil']
df_aux2 = df_clear.loc[df_clear['location']=='China']
df_aux3 = pd.concat([df_aux1,df_aux2],ignore_index=True)
df_aux3['date'] = pd.to_datetime(df_aux3['date'])
df_filter = df_aux3.sort_values(by='date')
df_filter

Unnamed: 0,location,date,total_cases,total_deaths,total_vaccinations,people_vaccinated,people_fully_vaccinated,total_boosters,population
0,Brazil,2020-01-03,,,,,,,2.153135e+08
1420,China,2020-01-03,,,,,,,1.425887e+09
1421,China,2020-01-04,1.0,,,,,,1.425887e+09
1,Brazil,2020-01-04,,,,,,,2.153135e+08
2,Brazil,2020-01-05,,,,,,,2.153135e+08
...,...,...,...,...,...,...,...,...,...
1417,Brazil,2023-11-20,37721749.0,704659.0,,,,,2.153135e+08
2838,China,2023-11-21,99319858.0,121798.0,,,,,1.425887e+09
1418,Brazil,2023-11-21,37721749.0,704659.0,,,,,2.153135e+08
1419,Brazil,2023-11-22,37721749.0,704659.0,,,,,2.153135e+08
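
The two-country filter above can also be written with a single mask. This is a minimal sketch on a toy frame (the names `df_toy`, `countries`, and `df_sel` are hypothetical, and the values are made up); it uses `Series.isin` and sorts by date and then country so the row order is deterministic:

```python
import pandas as pd

# Toy frame standing in for df_clear (made-up values)
df_toy = pd.DataFrame({
    'location': ['Brazil', 'China', 'India', 'Brazil'],
    'date': ['2020-01-04', '2020-01-03', '2020-01-03', '2020-01-03'],
    'total_cases': [None, None, 1.0, None],
})

countries = ['Brazil', 'China']  # the same two countries as in the cell above

# One boolean mask replaces the two .loc filters plus pd.concat
df_sel = df_toy[df_toy['location'].isin(countries)].copy()
df_sel['date'] = pd.to_datetime(df_sel['date'])

# Sorting by date and then country makes ties on the same date predictable
df_sel = df_sel.sort_values(by=['date', 'location']).reset_index(drop=True)
print(df_sel)
```

Adding a third country then only requires extending the `countries` list.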


In [4]:
# Filter the data to the period from 2020-12-01 to 2022-09-01 (both dates included)
df_filter = df_filter[(df_filter['date'] >= '2020-12-01') & (df_filter['date'] <= '2022-09-01')].reset_index(drop=True)
df_filter

Unnamed: 0,location,date,total_cases,total_deaths,total_vaccinations,people_vaccinated,people_fully_vaccinated,total_boosters,population
0,China,2020-12-01,93577.0,4750.0,,,,,1.425887e+09
1,Brazil,2020-12-01,6314740.0,172833.0,,,,,2.153135e+08
2,Brazil,2020-12-02,6335878.0,173120.0,,,,,2.153135e+08
3,China,2020-12-02,93668.0,4750.0,,,,,1.425887e+09
4,China,2020-12-03,93797.0,4751.0,,,,,1.425887e+09
...,...,...,...,...,...,...,...,...,...
1275,China,2022-08-30,6366770.0,24836.0,3.432506e+09,,,,1.425887e+09
1276,China,2022-08-31,6401474.0,24883.0,3.432778e+09,,,,1.425887e+09
1277,Brazil,2022-08-31,34397205.0,683622.0,4.703274e+08,186519881.0,171665721.0,117565662.0,2.153135e+08
1278,Brazil,2022-09-01,34414011.0,683851.0,4.704831e+08,186542426.0,171709120.0,117655924.0,2.153135e+08
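
The same inclusive date window can be expressed with `Series.between`. The sketch below runs on a toy frame (the names `df_dates`, `start`, `end`, and `df_window` are illustrative, and the values are made up):

```python
import pandas as pd

# Toy frame with dates just outside and inside the window (made-up values)
df_dates = pd.DataFrame({
    'date': pd.to_datetime(
        ['2020-11-30', '2020-12-01', '2021-06-15', '2022-09-01', '2022-09-02']
    ),
    'total_cases': [1, 2, 3, 4, 5],
})

start = pd.Timestamp('2020-12-01')
end = pd.Timestamp('2022-09-01')

# between() includes both endpoints by default
in_window = df_dates['date'].between(start, end)
df_window = df_dates[in_window].reset_index(drop=True)
print(df_window)
```

Only the three rows dated inside the window remain; the rows for 2020-11-30 and 2022-09-02 are dropped.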


In [5]:
# Convert the numeric columns to nullable integers (Int64 keeps missing values as <NA>)
int_columns = ['population', 'total_cases', 'total_deaths', 'total_vaccinations', 'people_vaccinated', 'people_fully_vaccinated', 'total_boosters']
df_filter = df_filter.astype({column: 'Int64' for column in int_columns})

# Check that the data has the expected types
df = df_filter
df

Unnamed: 0,location,date,total_cases,total_deaths,total_vaccinations,people_vaccinated,people_fully_vaccinated,total_boosters,population
0,China,2020-12-01,93577,4750,,,,,1425887360
1,Brazil,2020-12-01,6314740,172833,,,,,215313504
2,Brazil,2020-12-02,6335878,173120,,,,,215313504
3,China,2020-12-02,93668,4750,,,,,1425887360
4,China,2020-12-03,93797,4751,,,,,1425887360
...,...,...,...,...,...,...,...,...,...
1275,China,2022-08-30,6366770,24836,3432506000,,,,1425887360
1276,China,2022-08-31,6401474,24883,3432778000,,,,1425887360
1277,Brazil,2022-08-31,34397205,683622,470327404,186519881,171665721,117565662,215313504
1278,Brazil,2022-09-01,34414011,683851,470483060,186542426,171709120,117655924,215313504
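
The capitalized `'Int64'` matters here because the vaccination columns contain missing values. The following sketch, on a made-up Series (the names `s`, `plain_cast_failed`, and `s_nullable` are illustrative), shows that the plain NumPy `'int64'` cast fails on `NaN`, while the nullable `'Int64'` dtype keeps the gap as `<NA>`:

```python
import numpy as np
import pandas as pd

# Float column with one missing value, as in the raw data
s = pd.Series([93577.0, np.nan, 6314740.0])

# The plain NumPy integer dtype cannot represent NaN, so the cast raises
plain_cast_failed = False
try:
    s.astype('int64')
except (ValueError, TypeError) as err:
    plain_cast_failed = True
    print('int64 cast failed:', err)

# The nullable pandas dtype stores the missing entry as <NA>
s_nullable = s.astype('Int64')
print(s_nullable)
```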


In [6]:
# Rename the columns to the names listed in section 1.3
df = df.rename(
  columns={
    'location': 'country',
    'total_vaccinations': 'total',
    'people_vaccinated': 'one_shot',
    'people_fully_vaccinated': 'two_shots',
    'total_boosters': 'three_shots',
  }
)

df.columns

Index(['country', 'date', 'total_cases', 'total_deaths', 'total', 'one_shot',
       'two_shots', 'three_shots', 'population'],
      dtype='object')

## 4\. Transformation

In [7]:
# Enrichment:
# Share of the population that received each dose
df['one_shot_hab'] = round(df['one_shot'] / df['population'], 4)
df['two_shots_hab'] = round(df['two_shots'] / df['population'], 4)
df['three_shots_hab'] = round(df['three_shots'] / df['population'], 4)
# Each dose as a share of the total doses administered
df['one_shot_vac'] = round(df['one_shot'] / df['total'], 4)
df['two_shots_vac'] = round(df['two_shots'] / df['total'], 4)
df['three_shots_vac'] = round(df['three_shots'] / df['total'], 4)
# Lethality: cumulative deaths divided by cumulative cases
df['Lethality'] = round(df['total_deaths'] / df['total_cases'], 4)
# Reorder the columns
df = df[['country', 'date', 'total_cases', 'total_deaths', 'Lethality', 'total', 'one_shot', 'one_shot_hab', 'one_shot_vac', 'two_shots', 'two_shots_hab', 'two_shots_vac', 'three_shots', 'three_shots_hab' ,'three_shots_vac', 'population']]
# Replace missing values with 0
df = df.fillna(0)
# Add the month column (YYYY-MM)
df['month'] = df['date'].dt.strftime('%Y-%m')
df

Unnamed: 0,country,date,total_cases,total_deaths,Lethality,total,one_shot,one_shot_hab,one_shot_vac,two_shots,two_shots_hab,two_shots_vac,three_shots,three_shots_hab,three_shots_vac,population,month
0,China,2020-12-01,93577,4750,0.0508,0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,1425887360,2020-12
1,Brazil,2020-12-01,6314740,172833,0.0274,0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,215313504,2020-12
2,Brazil,2020-12-02,6335878,173120,0.0273,0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,215313504,2020-12
3,China,2020-12-02,93668,4750,0.0507,0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,1425887360,2020-12
4,China,2020-12-03,93797,4751,0.0507,0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,1425887360,2020-12
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1275,China,2022-08-30,6366770,24836,0.0039,3432506000,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,1425887360,2022-08
1276,China,2022-08-31,6401474,24883,0.0039,3432778000,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,1425887360,2022-08
1277,Brazil,2022-08-31,34397205,683622,0.0199,470327404,186519881,0.8663,0.3966,171665721,0.7973,0.365,117565662,0.546,0.25,215313504,2022-08
1278,Brazil,2022-09-01,34414011,683851,0.0199,470483060,186542426,0.8664,0.3965,171709120,0.7975,0.365,117655924,0.5464,0.2501,215313504,2022-09
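
Filling every gap with 0 makes a day without a report look like a day with zero vaccinated people, as in the China rows above. Because the vaccination columns are cumulative, one alternative is to carry the last reported value forward within each country and use 0 only before the first report. This is a sketch of a different technique, not what the notebook does; the frame `toy` and the column `one_shot_ffill` are hypothetical and the values are made up:

```python
import numpy as np
import pandas as pd

# Toy cumulative series with gaps (made-up values)
toy = pd.DataFrame({
    'country': ['Brazil', 'Brazil', 'Brazil', 'China', 'China', 'China'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'] * 2),
    'one_shot': [np.nan, 100.0, np.nan, np.nan, np.nan, 50.0],
})
toy = toy.sort_values(['country', 'date'])

# Carry the last reported value forward within each country
toy['one_shot_ffill'] = toy.groupby('country')['one_shot'].ffill()
# Use 0 only for dates before the first report
toy['one_shot_ffill'] = toy['one_shot_ffill'].fillna(0)

print(toy)
```

Whether forward-filling is appropriate depends on how the dashboard should treat unreported days, so it is worth checking the result against the source before adopting it.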


## 5\. Loading

In [9]:
# Save the result as a CSV file
df.to_csv('covid-vaccine.csv', index = False)
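
Before uploading the file to the dashboard, it can help to read it back and confirm that the shape and the date column survived the round trip. This is a minimal sketch on a two-row frame written to a temporary directory (the names `df_out`, `path`, and `df_back` are illustrative; the two `total_cases` values are copied from the output shown earlier):

```python
import os
import tempfile

import pandas as pd

# Small frame standing in for the final df
df_out = pd.DataFrame({
    'country': ['Brazil', 'China'],
    'date': pd.to_datetime(['2022-09-01', '2022-08-31']),
    'total_cases': [34414011, 6401474],
})

# Write to a temporary directory so the real file is not overwritten
path = os.path.join(tempfile.mkdtemp(), 'covid-vaccine.csv')
df_out.to_csv(path, index=False)

# Read the file back, asking pandas to parse the date column again
df_back = pd.read_csv(path, parse_dates=['date'])
print(df_back.dtypes)
print(df_back.shape == df_out.shape)
```

Without `parse_dates`, the `date` column would come back as plain strings.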