# 🦠 COVID-19 Analytics: EDA + KPIs + Visualizations
**Author:** Breno Albuquerque  
**Stack:** Python, Pandas, NumPy, Plotly/Matplotlib, Google Colab, Kaggle, Looker Studio.

Exploratory data analysis project (following CRISP-DM) with a cleaning pipeline, feature engineering, and interactive visualizations.

**See the results:**
- 📓 Kaggle notebook: https://bit.ly/COVID-19Analytics  
- 📊 Interactive dashboard (Looker Studio): https://bit.ly/covid--dashboard  

> TL;DR: I cleaned and normalized case and vaccination data, derived metrics (moving averages, rates, trends), and prepared a final dataset for an interactive dashboard.


In [2]:
# Essential packages
import math
from typing import Iterator
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


In [3]:
# Extraction - COVID cases
cases = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/refs/heads/master/csse_covid_19_data/csse_covid_19_daily_reports/01-01-2022.csv', sep=',')


In [4]:
cases.head()

Unnamed: 0,FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio
0,,,,Afghanistan,2022-01-02 04:20:52,33.93911,67.709953,158107,7356,,,Afghanistan,406.148826,4.652545
1,,,,Albania,2022-01-02 04:20:52,41.1533,20.1683,210224,3217,,,Albania,7305.024672,1.530272
2,,,,Algeria,2022-01-02 04:20:52,28.0339,1.6596,218818,6284,,,Algeria,499.002954,2.871793
3,,,,Andorra,2022-01-02 04:20:52,42.5063,1.5218,23740,140,,,Andorra,30725.425484,0.589722
4,,,,Angola,2022-01-02 04:20:52,-11.2027,17.8739,82398,1772,,,Angola,250.706895,2.150538


In [8]:
# Vaccination (OWID dataset, which also carries case and death columns)
import time
import pandas as pd

URLS = [
    "https://covid.ourworldindata.org/data/owid-covid-data.csv",
    "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv",
]

def read_csv_with_retry(urls, retries=3):
    """Try each URL in order, up to `retries` times, and return the first DataFrame that loads."""
    last_err = None
    for url in urls:
        for i in range(1, retries+1):
            try:
                print(f"Trying ({i}/{retries}): {url}")
                df = pd.read_csv(url, low_memory=False)
                print(f" OK: {url}")
                return df
            except Exception as e:
                last_err = e
                print(f" Failed: {e}")
                time.sleep(2*i)  # wait a little longer after each failure
    raise RuntimeError(f"Could not download from any URL. Last error: {last_err}")

df = read_csv_with_retry(URLS)
df["date"] = pd.to_datetime(df["date"], errors="coerce")
df.head()


Trying (1/3): https://covid.ourworldindata.org/data/owid-covid-data.csv
 Failed: <urlopen error [Errno -2] Name or service not known>
Trying (2/3): https://covid.ourworldindata.org/data/owid-covid-data.csv
 Failed: <urlopen error [Errno -2] Name or service not known>
Trying (3/3): https://covid.ourworldindata.org/data/owid-covid-data.csv
 Failed: <urlopen error [Errno -2] Name or service not known>
Trying (1/3): https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv
 OK: https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv


Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths_per_million,new_deaths_per_million,new_deaths_smoothed_per_million,reproduction_rate,icu_patients,icu_patients_per_million,hosp_patients,hosp_patients_per_million,weekly_icu_admissions,weekly_icu_admissions_per_million,weekly_hosp_admissions,weekly_hosp_admissions_per_million,...,people_fully_vaccinated_per_hundred,total_boosters_per_hundred,new_vaccinations_smoothed_per_million,new_people_vaccinated_smoothed,new_people_vaccinated_smoothed_per_hundred,stringency_index,population_density,median_age,aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,population,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,2020-01-05,0.0,0.0,,0.0,0.0,,0.0,0.0,,0.0,0.0,,,,,,,,,,,...,,,,,,0.0,54.42,18.6,2.58,1.34,1803.99,,597.03,9.59,,,37.75,0.5,64.83,0.51,41128772,,,,
1,AFG,Asia,Afghanistan,2020-01-06,0.0,0.0,,0.0,0.0,,0.0,0.0,,0.0,0.0,,,,,,,,,,,...,,,,,,0.0,54.42,18.6,2.58,1.34,1803.99,,597.03,9.59,,,37.75,0.5,64.83,0.51,41128772,,,,
2,AFG,Asia,Afghanistan,2020-01-07,0.0,0.0,,0.0,0.0,,0.0,0.0,,0.0,0.0,,,,,,,,,,,...,,,,,,0.0,54.42,18.6,2.58,1.34,1803.99,,597.03,9.59,,,37.75,0.5,64.83,0.51,41128772,,,,
3,AFG,Asia,Afghanistan,2020-01-08,0.0,0.0,,0.0,0.0,,0.0,0.0,,0.0,0.0,,,,,,,,,,,...,,,,,,0.0,54.42,18.6,2.58,1.34,1803.99,,597.03,9.59,,,37.75,0.5,64.83,0.51,41128772,,,,
4,AFG,Asia,Afghanistan,2020-01-09,0.0,0.0,,0.0,0.0,,0.0,0.0,,0.0,0.0,,,,,,,,,,,...,,,,,,0.0,54.42,18.6,2.58,1.34,1803.99,,597.03,9.59,,,37.75,0.5,64.83,0.51,41128772,,,,
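
`pandas.read_csv` has no timeout parameter of its own, so a stalled connection can hold up an attempt indefinitely. The sketch below shows one possible way to bound each attempt: download the bytes with `urllib.request.urlopen(url, timeout=...)` and parse them from memory. The names `fetch_bytes` and `read_csv_with_timeout`, and the `backoff` and `fetch` parameters, are hypothetical and exist only for this illustration.

```python
import io
import time
import urllib.request

import pandas as pd


def fetch_bytes(url, timeout=30):
    """Download a URL's raw bytes, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def read_csv_with_timeout(urls, retries=3, timeout=30, backoff=2.0, fetch=fetch_bytes):
    """Try each URL up to `retries` times and parse the first successful download."""
    last_err = None
    for url in urls:
        for attempt in range(1, retries + 1):
            try:
                raw = fetch(url, timeout=timeout)
                return pd.read_csv(io.BytesIO(raw), low_memory=False)
            except Exception as err:  # broad on purpose: network and parse errors alike
                last_err = err
                if attempt < retries:
                    time.sleep(backoff * attempt)  # wait longer after each failure
    raise RuntimeError(f"All URLs failed. Last error: {last_err}")
```

Passing the downloader in as `fetch` keeps the retry logic testable without network access: a stub that raises for one URL and returns CSV bytes for another exercises the whole loop.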


In [10]:
cases = df.loc[:, [
    "iso_code","continent","location","date",
    "new_cases","new_deaths","total_cases","total_deaths"
]].copy()

vaccines = df.loc[:, [
    "iso_code","continent","location","date",
    "new_vaccinations","people_vaccinated","people_fully_vaccinated"
]].copy()

country = "Brazil"
cases_br    = cases[cases["location"] == country].reset_index(drop=True)
vaccines_br = vaccines[vaccines["location"] == country].reset_index(drop=True)

# quick check
cases.head(), vaccines.head()


(  iso_code continent     location       date  new_cases  new_deaths  \
 0      AFG      Asia  Afghanistan 2020-01-05        0.0         0.0   
 1      AFG      Asia  Afghanistan 2020-01-06        0.0         0.0   
 2      AFG      Asia  Afghanistan 2020-01-07        0.0         0.0   
 3      AFG      Asia  Afghanistan 2020-01-08        0.0         0.0   
 4      AFG      Asia  Afghanistan 2020-01-09        0.0         0.0   
 
    total_cases  total_deaths  
 0          0.0           0.0  
 1          0.0           0.0  
 2          0.0           0.0  
 3          0.0           0.0  
 4          0.0           0.0  ,
   iso_code continent     location       date  new_vaccinations  \
 0      AFG      Asia  Afghanistan 2020-01-05               NaN   
 1      AFG      Asia  Afghanistan 2020-01-06               NaN   
 2      AFG      Asia  Afghanistan 2020-01-07               NaN   
 3      AFG      Asia  Afghanistan 2020-01-08               NaN   
 4      AFG      Asia  Afghanistan 202

In [11]:
# Transformation - COVID cases
def date_range(start_date: datetime, end_date: datetime) -> Iterator[datetime]:
  """Yield one datetime per day from start_date up to, but not including, end_date."""
  date_range_days: int = (end_date - start_date).days
  for lag in range(date_range_days):
    yield start_date + timedelta(lag)

start_date = datetime(2022,  1,  1)
end_date   = datetime(2022, 12, 31)  # exclusive bound: the last day fetched is 2022-12-30
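
The generator above stops one day short of `end_date`, so these bounds produce 364 days, which matches the 9828 rows reported further down (364 days × 27 states). If the final day should be included, a variant along these lines could be used. The name `date_range_inclusive` is hypothetical and this is only a sketch, not the notebook's method.

```python
from datetime import datetime, timedelta
from typing import Iterator


def date_range_inclusive(start_date: datetime, end_date: datetime) -> Iterator[datetime]:
    """Yield every day from start_date through end_date, both included."""
    total_days = (end_date - start_date).days
    for lag in range(total_days + 1):
        yield start_date + timedelta(days=lag)


days = list(date_range_inclusive(datetime(2022, 1, 1), datetime(2022, 12, 31)))
print(len(days), days[0].date(), days[-1].date())  # 365 2022-01-01 2022-12-31
```

`pd.date_range(start, end)` is also inclusive on both ends by default and would serve the same purpose.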

In [12]:
# Download one daily report per date and stack the Brazil rows into a single DataFrame
frames = []

for date in date_range(start_date=start_date, end_date=end_date):

  date_str = date.strftime('%m-%d-%Y')  # file names follow the source's MM-DD-YYYY pattern
  data_source_url = f'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/{date_str}.csv'

  case = pd.read_csv(data_source_url, sep=',')

  case = case.drop(['FIPS', 'Admin2', 'Last_Update', 'Lat', 'Long_', 'Recovered', 'Active', 'Combined_Key', 'Case_Fatality_Ratio'], axis=1)  # drop columns that will not be used
  case = case.query('Country_Region == "Brazil"').reset_index(drop=True)  # keep only Brazil
  case['Date'] = pd.to_datetime(date.strftime('%Y-%m-%d'))
  frames.append(case)

# A single concat at the end avoids re-copying the accumulated frame on every iteration
cases = pd.concat(frames, axis=0, ignore_index=True)

In [13]:
cases.query('Province_State == "Sao Paulo"').head() # filtered to São Paulo (for demonstration only)


Unnamed: 0,Province_State,Country_Region,Confirmed,Deaths,Incident_Rate,Date
24,Sao Paulo,Brazil,4456469,155213,9705.055085,2022-01-01
51,Sao Paulo,Brazil,4456581,155216,9705.298993,2022-01-02
78,Sao Paulo,Brazil,4456745,155216,9705.656143,2022-01-03
105,Sao Paulo,Brazil,4458689,155242,9709.88968,2022-01-04
132,Sao Paulo,Brazil,4461651,155242,9709.88968,2022-01-05


In [14]:
cases.shape # DataFrame dimensions (rows, columns)


(9828, 6)

In [15]:
cases.info() # check dtypes and look for null and/or invalid values


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9828 entries, 0 to 9827
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   Province_State  9828 non-null   object        
 1   Country_Region  9828 non-null   object        
 2   Confirmed       9828 non-null   int64         
 3   Deaths          9828 non-null   int64         
 4   Incident_Rate   9828 non-null   float64       
 5   Date            9828 non-null   datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(2), object(2)
memory usage: 460.8+ KB


In [16]:
cases.isna().sum() # count null values per column


Unnamed: 0,0
Province_State,0
Country_Region,0
Confirmed,0
Deaths,0
Incident_Rate,0
Date,0


In [18]:
# Standardizing column names
cases = cases.rename(
  columns={
    'Province_State': 'state',
    'Country_Region': 'country'
  }
)

# Lowercase every column name in one step
cases.columns = cases.columns.str.lower()

In [19]:
# Fixing state names (restoring accents)
states_map = {
    'Amapa': 'Amapá',
    'Ceara': 'Ceará',
    'Espirito Santo': 'Espírito Santo',
    'Goias': 'Goiás',
    'Para': 'Pará',
    'Paraiba': 'Paraíba',
    'Parana': 'Paraná',
    'Piaui': 'Piauí',
    'Rondonia': 'Rondônia',
    'Sao Paulo': 'São Paulo'
}

cases['state'] = cases['state'].replace(states_map)  # names missing from the map are left unchanged
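
The state-name fix depends on unmapped names passing through untouched. The toy example below contrasts `Series.map`, which turns unmapped values into `NaN`, with `Series.replace`, which keeps them. The data and the variable names `toy_map` and `toy_states` are made up for illustration.

```python
import pandas as pd

toy_map = {"Sao Paulo": "São Paulo", "Ceara": "Ceará"}
toy_states = pd.Series(["Sao Paulo", "Bahia", "Ceara"])

mapped = toy_states.map(toy_map)        # "Bahia" is not in the map and becomes NaN
replaced = toy_states.replace(toy_map)  # "Bahia" is kept as-is

print(mapped.isna().tolist())  # [False, True, False]
print(replaced.tolist())       # ['São Paulo', 'Bahia', 'Ceará']
```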

In [23]:
# Adding new columns to enrich the dataset
cases['month'] = cases['date'].dt.strftime('%Y-%m')
cases['year']  = cases['date'].dt.strftime('%Y')

In [29]:
print(cases.columns.tolist())


['state', 'country', 'confirmed', 'deaths', 'date', 'month', 'year', 'population']


In [30]:
# columns of interest
# Note: these are OWID-style names; as the output below shows, only 'date' exists in `cases` at this point.
colunas_interesse = ["total_cases", "total_deaths", "new_cases", "new_deaths", "location", "date"]

colunas_existentes = [c for c in colunas_interesse if c in cases.columns]

# apply the filter
cases = cases[colunas_existentes]

cases.head()


Unnamed: 0,date
0,2022-01-01
1,2022-01-01
2,2022-01-01
3,2022-01-01
4,2022-01-01


In [44]:
# ===== Fill in missing columns and select safely =====
import pandas as pd
import numpy as np

# 1) Create 'country' if it is missing and (re)derive 'month' and 'year' from 'date'
if "country" not in cases.columns:
    cases["country"] = "Brazil"

if "date" in cases.columns:
    cases["month"] = cases["date"].dt.to_period("M").astype(str)  # e.g. '2021-08'
    cases["year"]  = cases["date"].dt.year.astype("Int64")
else:
    raise ValueError("Column 'date' does not exist in 'cases'. Check the earlier pipeline steps.")

# 2) Safe typing (only if the column exists)
to_int_cols = ["population","confirmed_1d","confirmed_moving_avg_7d",
               "deaths_1d","deaths_moving_avg_7d"]
for c in to_int_cols:
    if c in cases.columns:
        cases[c] = pd.to_numeric(cases[c], errors="coerce").round().astype("Int64")

# 3) Target list (everything wanted; it is intersected with what actually exists)
target_cols = [
    "date","country","state","population",
    "confirmed","confirmed_1d","confirmed_moving_avg_7d","confirmed_moving_avg_7d_rate_14d","confirmed_trend",
    "deaths","deaths_1d","deaths_moving_avg_7d","deaths_moving_avg_7d_rate_14d","deaths_trend",
    "month","year"
]

existing = [c for c in target_cols if c in cases.columns]
missing  = [c for c in target_cols if c not in cases.columns]

# 4) Select only what exists (avoids a KeyError)
cases = cases[existing].copy()

print("✅ Selection completed without errors.")
if missing:
    print("⚠️ Missing columns (not present in the DataFrame):", missing)

display(cases.tail(5))


✅ Selection completed without errors.


Unnamed: 0,date,country,state,population,confirmed,confirmed_1d,confirmed_moving_avg_7d,confirmed_moving_avg_7d_rate_14d,confirmed_trend,deaths,deaths_1d,deaths_moving_avg_7d,deaths_moving_avg_7d_rate_14d,deaths_trend,month,year
29719,2023-03-14,Brazil,TO,1590248,365538,0,0,,,4232,0,0,,,2023-03,2023
29720,2023-03-15,Brazil,TO,1590248,365538,0,0,,,4232,0,0,,,2023-03,2023
29721,2023-03-16,Brazil,TO,1590248,365538,0,0,,,4232,0,0,,,2023-03,2023
29722,2023-03-17,Brazil,TO,1590248,365538,0,0,,,4232,0,0,,,2023-03,2023
29723,2023-03-18,Brazil,TO,1590248,365538,0,0,,,4232,0,0,,,2023-03,2023


In [45]:
cases['population'] = cases['population'].astype('Int64')
cases['confirmed_1d'] = cases['confirmed_1d'].astype('Int64')
cases['confirmed_moving_avg_7d'] = cases['confirmed_moving_avg_7d'].astype('Int64')
cases['deaths_1d'] = cases['deaths_1d'].astype('Int64')
cases['deaths_moving_avg_7d'] = cases['deaths_moving_avg_7d'].astype('Int64')

In [46]:
cases = cases[['date', 'country', 'state', 'population', 'confirmed', 'confirmed_1d', 'confirmed_moving_avg_7d', 'confirmed_moving_avg_7d_rate_14d', 'confirmed_trend', 'deaths', 'deaths_1d', 'deaths_moving_avg_7d', 'deaths_moving_avg_7d_rate_14d', 'deaths_trend', 'month', 'year']]


In [47]:
cases.head(n=5)


Unnamed: 0,date,country,state,population,confirmed,confirmed_1d,confirmed_moving_avg_7d,confirmed_moving_avg_7d_rate_14d,confirmed_trend,deaths,deaths_1d,deaths_moving_avg_7d,deaths_moving_avg_7d_rate_14d,deaths_trend,month,year
0,2020-03-17,Brazil,AC,906876,3,,,,,0,,,,,2020-03,2020
1,2020-03-18,Brazil,AC,906876,3,0.0,0.0,,,0,0.0,0.0,,,2020-03,2020
2,2020-03-19,Brazil,AC,906876,4,1.0,1.0,,,0,0.0,0.0,,,2020-03,2020
3,2020-03-20,Brazil,AC,906876,7,3.0,2.0,,,0,0.0,0.0,,,2020-03,2020
4,2020-03-21,Brazil,AC,906876,11,4.0,2.0,,,0,0.0,0.0,,,2020-03,2020


In [48]:
vaccines.head()


Unnamed: 0,iso_code,continent,location,date,new_vaccinations,people_vaccinated,people_fully_vaccinated
0,AFG,Asia,Afghanistan,2020-01-05,,,
1,AFG,Asia,Afghanistan,2020-01-06,,,
2,AFG,Asia,Afghanistan,2020-01-07,,,
3,AFG,Asia,Afghanistan,2020-01-08,,,
4,AFG,Asia,Afghanistan,2020-01-09,,,


In [49]:
vaccines.shape


(429435, 7)

In [50]:
vaccines.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 429435 entries, 0 to 429434
Data columns (total 7 columns):
 #   Column                   Non-Null Count   Dtype         
---  ------                   --------------   -----         
 0   iso_code                 429435 non-null  object        
 1   continent                402910 non-null  object        
 2   location                 429435 non-null  object        
 3   date                     429435 non-null  datetime64[ns]
 4   new_vaccinations         70971 non-null   float64       
 5   people_vaccinated        81132 non-null   float64       
 6   people_fully_vaccinated  78061 non-null   float64       
dtypes: datetime64[ns](1), float64(3), object(3)
memory usage: 22.9+ MB


In [51]:
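# Forward-fill gaps. `fillna(method='ffill')` is deprecated in recent pandas; `ffill()` is the equivalent call.
# Caveat: the fill runs over the whole frame, so a country's leading gaps can inherit the previous country's values.
vaccines = vaccines.ffill()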
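
A forward fill over the whole frame can carry the last value of one country into the first empty rows of the next. The toy example below sketches a per-country alternative with `groupby(...).ffill()`. The data is invented and the column names simply mirror the OWID ones used above.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"] * 2),
    "people_vaccinated": [10.0, np.nan, 12.0, np.nan, 5.0, np.nan],
})
toy = toy.sort_values(["location", "date"])
value_cols = ["people_vaccinated"]

global_fill = toy[value_cols].ffill()                        # B's first row inherits 12.0 from A
grouped_fill = toy.groupby("location")[value_cols].ffill()   # B's first row stays NaN

print(global_fill["people_vaccinated"].tolist())
print(grouped_fill["people_vaccinated"].tolist())
```

Rows that remain `NaN` after the grouped fill are days before a country's first report, which is arguably more honest than borrowing another country's figure.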


In [52]:
vaccines = vaccines[(vaccines['date'] >= '2022-01-01') & (vaccines['date'] <= '2022-12-31')].reset_index(drop=True)


In [53]:
vaccines = vaccines.rename(
  columns={
    'location': 'country',
    'total_vaccinations': 'total',
    'people_vaccinated': 'one_shot',
    'people_fully_vaccinated': 'two_shots',
    'total_boosters': 'three_shots',
  }
)

In [54]:
vaccines['month'] = vaccines['date'].apply(lambda date: date.strftime('%Y-%m'))
vaccines['year']  = vaccines['date'].apply(lambda date: date.strftime('%Y'))

In [59]:
print("Columns in vaccines:", vaccines.columns.tolist())
display(vaccines.head(5))


Columns in vaccines: ['iso_code', 'continent', 'country', 'date', 'new_vaccinations', 'one_shot', 'two_shots', 'month', 'year']


Unnamed: 0,iso_code,continent,country,date,new_vaccinations,one_shot,two_shots,month,year
0,AFG,Asia,Afghanistan,2022-01-01,4015.0,4142857.0,3753224.0,2022-01,2022
1,AFG,Asia,Afghanistan,2022-01-02,4015.0,4142857.0,3753224.0,2022-01,2022
2,AFG,Asia,Afghanistan,2022-01-03,4015.0,4142857.0,3753224.0,2022-01,2022
3,AFG,Asia,Afghanistan,2022-01-04,4015.0,4142857.0,3753224.0,2022-01,2022
4,AFG,Asia,Afghanistan,2022-01-05,4015.0,4142857.0,3753224.0,2022-01,2022


In [60]:
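vaccines = vaccines.rename(columns={"country": "state"})  # reuse the 'state' key from `cases`; the values are still countries

# Placeholder: one fixed population for every country, so the percentages below are not real coverage figures
vaccines["population"] = 100_000_000
vaccines["two_shots_perc"] = (vaccines["two_shots"] / vaccines["population"] * 100).round(2)
display(vaccines.head())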


Unnamed: 0,iso_code,continent,state,date,new_vaccinations,one_shot,two_shots,month,year,population,two_shots_perc
0,AFG,Asia,Afghanistan,2022-01-01,4015.0,4142857.0,3753224.0,2022-01,2022,100000000,3.75
1,AFG,Asia,Afghanistan,2022-01-02,4015.0,4142857.0,3753224.0,2022-01,2022,100000000,3.75
2,AFG,Asia,Afghanistan,2022-01-03,4015.0,4142857.0,3753224.0,2022-01,2022,100000000,3.75
3,AFG,Asia,Afghanistan,2022-01-04,4015.0,4142857.0,3753224.0,2022-01,2022,100000000,3.75
4,AFG,Asia,Afghanistan,2022-01-05,4015.0,4142857.0,3753224.0,2022-01,2022,100000000,3.75
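
The fixed `population = 100_000_000` makes every percentage comparable only in shape, not in value. The OWID frame loaded earlier includes a `population` column, so one option is to build a per-country lookup and merge it in. The sketch below uses invented toy data; `toy_vaccines` and `toy_population` are hypothetical names.

```python
import pandas as pd

toy_vaccines = pd.DataFrame({
    "state": ["A", "A", "B"],
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-01"]),
    "two_shots": [50.0, 60.0, 30.0],
})
# One population value per key; B is deliberately 0 to show the guard below.
toy_population = pd.DataFrame({"state": ["A", "B"], "population": [200, 0]})

merged = toy_vaccines.merge(toy_population, on="state", how="left", validate="many_to_one")

# Non-positive populations become NaN so the division cannot produce inf.
den = merged["population"].where(merged["population"] > 0)
merged["two_shots_perc"] = (merged["two_shots"] / den * 100).round(2)

print(merged[["state", "date", "two_shots_perc"]])
```

`validate="many_to_one"` makes the merge fail loudly if the lookup ever contains duplicate keys, which would otherwise silently duplicate rows.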


In [63]:
import pandas as pd
import numpy as np

# 1) Make sure 'state' and 'population' exist
if "state" not in vaccines.columns and "country" in vaccines.columns:
    vaccines = vaccines.rename(columns={"country": "state"})

if "population" not in vaccines.columns:
    vaccines["population"] = 100_000_000  # replace with the real population if available

# 2) Build 'total' robustly
if "total" not in vaccines.columns:
    if "total_vaccinations" in vaccines.columns:
        vaccines["total"] = pd.to_numeric(vaccines["total_vaccinations"], errors="coerce")
    else:
        # Sum of whatever exists among the 1st, 2nd and 3rd doses (if any)
        soma_cols = [c for c in ["one_shot", "two_shots", "three_shots"] if c in vaccines.columns]
        if soma_cols:
            vaccines["total"] = pd.DataFrame(
                {c: pd.to_numeric(vaccines[c], errors="coerce") for c in soma_cols}
            ).sum(axis=1, min_count=1)
        else:
            vaccines["total"] = np.nan  # last resort

# 3) Safe casts (only for columns that exist)
for c in ["population", "total", "one_shot", "two_shots", "three_shots"]:
    if c in vaccines.columns:
        vaccines[c] = pd.to_numeric(vaccines[c], errors="coerce").round().astype("Int64")

# 4) Percentages (guarded against division by zero)
den = vaccines["population"].replace({0: np.nan}).astype("float64")
if "one_shot" in vaccines.columns:
    vaccines["one_shot_perc"] = (vaccines["one_shot"] / den * 100).round(2)
if "two_shots" in vaccines.columns:
    vaccines["two_shots_perc"] = (vaccines["two_shots"] / den * 100).round(2)
if "three_shots" in vaccines.columns:
    vaccines["three_shots_perc"] = (vaccines["three_shots"] / den * 100).round(2)
if "total" in vaccines.columns:
    vaccines["total_perc"] = (vaccines["total"] / den * 100).round(2)

# 5) Preview
display(
    vaccines.head(5)[
        [c for c in ["state","date","population","one_shot","two_shots","three_shots","total",
                     "one_shot_perc","two_shots_perc","three_shots_perc","total_perc"]
         if c in vaccines.columns]
    ]
)


Unnamed: 0,state,date,population,one_shot,two_shots,total,one_shot_perc,two_shots_perc,total_perc
0,Afghanistan,2022-01-01,100000000,4142857,3753224,7896081,4.14,3.75,7.9
1,Afghanistan,2022-01-02,100000000,4142857,3753224,7896081,4.14,3.75,7.9
2,Afghanistan,2022-01-03,100000000,4142857,3753224,7896081,4.14,3.75,7.9
3,Afghanistan,2022-01-04,100000000,4142857,3753224,7896081,4.14,3.75,7.9
4,Afghanistan,2022-01-05,100000000,4142857,3753224,7896081,4.14,3.75,7.9


In [66]:
# Desired columns
cols_desejadas = [
    "date", "state", "population",
    "total", "one_shot", "one_shot_perc",
    "two_shots", "two_shots_perc",
    "three_shots", "three_shots_perc",
    "total_perc", "month", "year"
]

cols_existentes = [c for c in cols_desejadas if c in vaccines.columns]

vaccines = vaccines[cols_existentes]

display(vaccines.head())


Unnamed: 0,date,state,population,total,one_shot,one_shot_perc,two_shots,two_shots_perc,total_perc,month,year
0,2022-01-01,Afghanistan,100000000,7896081,4142857,4.14,3753224,3.75,7.9,2022-01,2022
1,2022-01-02,Afghanistan,100000000,7896081,4142857,4.14,3753224,3.75,7.9,2022-01,2022
2,2022-01-03,Afghanistan,100000000,7896081,4142857,4.14,3753224,3.75,7.9,2022-01,2022
3,2022-01-04,Afghanistan,100000000,7896081,4142857,4.14,3753224,3.75,7.9,2022-01,2022
4,2022-01-05,Afghanistan,100000000,7896081,4142857,4.14,3753224,3.75,7.9,2022-01,2022


In [67]:
vaccines.tail()


Unnamed: 0,date,state,population,total,one_shot,one_shot_perc,two_shots,two_shots_perc,total_perc,month,year
94927,2022-12-27,Zimbabwe,100000000,11189078,6437808,6.44,4751270,4.75,11.19,2022-12,2022
94928,2022-12-28,Zimbabwe,100000000,11189078,6437808,6.44,4751270,4.75,11.19,2022-12,2022
94929,2022-12-29,Zimbabwe,100000000,11189078,6437808,6.44,4751270,4.75,11.19,2022-12,2022
94930,2022-12-30,Zimbabwe,100000000,11189078,6437808,6.44,4751270,4.75,11.19,2022-12,2022
94931,2022-12-31,Zimbabwe,100000000,11189078,6437808,6.44,4751270,4.75,11.19,2022-12,2022


In [68]:
cases.to_csv('./covid-cases.csv', sep=',', index=False)

In [69]:
vaccines.to_csv('./covid-vaccines.csv', sep=',', index=False)

## Final Remarks

**Goal:** Use EDA to understand patterns in cases, deaths, and vaccination coverage, and turn raw data into actionable signals for stakeholders.

**What was done:**
- Data collection from public sources (Our World in Data / JHU).
- Cleaning and standardization (dates, types, key normalization).
- Feature engineering: daily cases/deaths, 7-day moving averages, 14-day rate of change, and a "trend" label (upward/stable/downward).
- Combination with vaccination data and calculation of percentages relative to population.
- Final dataset prepared for Looker Studio and published on Kaggle.
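
The feature-engineering step listed above can be sketched as follows. This is a minimal illustration, not the exact code behind the dataset: the function name `add_trend_features`, the ±15% `threshold`, and the use of full 7-day windows are assumptions. The output column names follow the ones shown in the `cases` table.

```python
import numpy as np
import pandas as pd


def add_trend_features(df: pd.DataFrame, value_col: str, threshold: float = 0.15) -> pd.DataFrame:
    """Add <value_col>_1d, _moving_avg_7d, _moving_avg_7d_rate_14d and _trend per state."""
    out = df.sort_values(["state", "date"]).copy()
    state = out["state"]

    # Daily increments from the cumulative series, computed within each state.
    daily = out.groupby("state")[value_col].diff()
    # 7-day moving average of the daily increments (full windows only).
    moving_avg = daily.groupby(state).transform(lambda s: s.rolling(window=7).mean())
    # Ratio between the current moving average and the one from 14 days earlier.
    rate = moving_avg / moving_avg.groupby(state).shift(14)
    rate = rate.replace([np.inf, -np.inf], np.nan)  # a zero baseline gives no usable ratio

    def label(r):
        if pd.isna(r):
            return np.nan
        if r > 1 + threshold:
            return "upward"
        if r < 1 - threshold:
            return "downward"
        return "stable"

    out[f"{value_col}_1d"] = daily
    out[f"{value_col}_moving_avg_7d"] = moving_avg
    out[f"{value_col}_moving_avg_7d_rate_14d"] = rate
    out[f"{value_col}_trend"] = rate.apply(label)
    return out


# Toy data: "SP" doubles its daily cases halfway through, "RJ" stays flat.
dates = pd.date_range("2022-01-01", periods=30, freq="D")
demo = pd.concat([
    pd.DataFrame({"state": "SP", "date": dates, "confirmed": np.cumsum([10] * 16 + [20] * 14)}),
    pd.DataFrame({"state": "RJ", "date": dates, "confirmed": np.cumsum([10] * 30)}),
], ignore_index=True)

result = add_trend_features(demo, "confirmed")
print(result.groupby("state").tail(1)[["state", "confirmed_moving_avg_7d_rate_14d", "confirmed_trend"]])
```

Grouping by state before `diff`, `rolling`, and `shift` keeps one state's history from leaking into the next. The same function can be called again with `"deaths"` to produce the matching death columns.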

**Key insights:**
- Infection peaks stand out clearly in the time series; deaths follow with a lag.
- Vaccination progress correlates with a reduction in relative mortality.
- Regional differences point to the impact of logistics and uptake on the public health response.

**Next steps:**
- Time series forecasting (ARIMA/Prophet) by region.
- Severity metrics (adjusted case fatality rate and hospitalizations, if available).
- A/B tests on the visuals: map variants, *sparklines* by state (UF)/country, and *small multiples*.

**Links:**
- 📓 Kaggle: https://bit.ly/COVID-19Analytics  
- 📊 Looker Studio: https://bit.ly/covid--dashboard  

