
# Lab: Public Health Data Warehouse (COVID-19, Brazil)

**Topic:** Public Health (COVID-19)  
**Data sources:**  
1) **OWID COVID** (cases/deaths by country), public CSV  
2) **OWID Vaccinations** (vaccination by country), public CSV  
3) **BrasilAPI Holidays** (enriches `dim_date` with `is_holiday`)

> Objective: consolidate daily COVID and vaccination data for Brazil into a **DW (star schema)**, run the **ETL** with pandas, and apply **clustering** to segment epidemiological periods.



## Data sources (links)
- OWID COVID: `https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv`  
- OWID Vaccinations: `https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv`  
- BrasilAPI (holidays by year, where `{ANO}` is the four-digit year): `https://brasilapi.com.br/api/feriados/v1/{ANO}`

> If internet access is blocked, download the CSVs and place them next to the notebook.
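As a rough sketch of how the BrasilAPI endpoint could feed `is_holiday`, the snippet below builds the per-year URL and turns a response into a set of dates. It uses only the standard library so that it runs without extra packages; `requests.get(url).json()` would do the same job. The names `HOLIDAYS_URL`, `fetch_holidays`, and `holiday_dates` are hypothetical, and the payload shape (a JSON list of objects with a `date` field in `YYYY-MM-DD` form) is an assumption that should be checked against a live response.

```python
import json
from datetime import date
from urllib.request import urlopen

# Same endpoint as listed above, with {ANO} renamed to {year} for str.format
HOLIDAYS_URL = "https://brasilapi.com.br/api/feriados/v1/{year}"


def fetch_holidays(year, timeout=10):
    """Download the holiday list for one year (needs internet access)."""
    with urlopen(HOLIDAYS_URL.format(year=year), timeout=timeout) as resp:
        return json.load(resp)


def holiday_dates(payload):
    """Extract the holiday dates from a payload in the assumed shape."""
    return {date.fromisoformat(item["date"]) for item in payload}


# Hand-written sample in the assumed shape, so the sketch also runs offline
sample_payload = [
    {"date": "2021-01-01", "name": "Confraternização mundial", "type": "national"},
    {"date": "2021-04-21", "name": "Tiradentes", "type": "national"},
]
holidays_2021 = holiday_dates(sample_payload)
```

With internet access, `holiday_dates(fetch_holidays(2021))` would replace the hand-written sample.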

In [1]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import requests

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

%matplotlib inline

OUT_DIR = Path("/content/drive/MyDrive/dw_health_output_aula")
# parents=True also creates any missing intermediate folders
OUT_DIR.mkdir(parents=True, exist_ok=True)

## Conceptual model (star schema)

Fact **`fact_covid_daily`** (grain: **day-country**):  
**Keys:** `date_sk`, `location_sk`  
**Measures:** `new_cases`, `new_deaths`, `new_vaccinations`, `people_fully_vaccinated`, `stringency_index` (when available)

Dimensions:  
- **`dim_date`**: `date_sk`, `date`, `year`, `month`, `day`, `is_holiday`  
- **`dim_location`**: `location_sk`, `iso_code`, `location`, `continent`, `population`, `population_density`


<pre style="font-size:13px; white-space:pre; overflow-x:auto;">
                   +-------------------+
                   |     dim_date      |
                   +-------------------+
                   | date_sk (PK)      |
                   | date              |
                   | year              |
                   | month             |
                   | day               |
                   | is_holiday        |
                   +-------------------+
                          |
                          | (FK)
                          |
+-------------------+     |     +-------------------+
|  dim_location     |     |     |  fact_covid_daily |
+-------------------+     |     +-------------------+
| location_sk (PK)  |-----+-----| date_sk (FK)      |
| iso_code          |           | location_sk (FK)  |
| location          |           | new_cases         |
| continent         |           | new_deaths        |
| population        |           | new_vaccinations  |
| population_density|           | people_fully_vacc |
+-------------------+           | stringency_index  |
                                +-------------------+
</pre>
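To make the schema above concrete, here is a minimal sketch that derives the two dimensions and the fact table from a tiny hand-made frame. The sample values are illustrative, not real OWID figures. The integer `YYYYMMDD` form for `date_sk` and the sequential `location_sk` are assumed conventions; the notebook may choose different surrogate keys.

```python
import pandas as pd

# Tiny hand-made sample standing in for OWID rows (illustrative values only)
raw = pd.DataFrame({
    "iso_code": ["BRA", "BRA", "BRA"],
    "location": ["Brazil", "Brazil", "Brazil"],
    "date": pd.to_datetime(["2021-04-20", "2021-04-21", "2021-04-22"]),
    "new_cases": [100, 80, 120],
    "new_deaths": [5, 4, 6],
})
holidays = pd.to_datetime(["2021-04-21"])  # Tiradentes, a national holiday

# dim_date: one row per calendar day in the covered range
dim_date = pd.DataFrame(
    {"date": pd.date_range(raw["date"].min(), raw["date"].max(), freq="D")}
)
dim_date["date_sk"] = dim_date["date"].dt.strftime("%Y%m%d").astype(int)
dim_date["year"] = dim_date["date"].dt.year
dim_date["month"] = dim_date["date"].dt.month
dim_date["day"] = dim_date["date"].dt.day
dim_date["is_holiday"] = dim_date["date"].isin(holidays)

# dim_location: one row per country, with a sequential surrogate key
dim_location = raw[["iso_code", "location"]].drop_duplicates().reset_index(drop=True)
dim_location["location_sk"] = range(1, len(dim_location) + 1)

# fact_covid_daily: swap the natural keys for surrogate keys, keep the measures
fact_covid_daily = (
    raw.merge(dim_date[["date", "date_sk"]], on="date", how="left")
       .merge(dim_location[["iso_code", "location_sk"]], on="iso_code", how="left")
       [["date_sk", "location_sk", "new_cases", "new_deaths"]]
)
```

The fact table keeps only keys and measures; descriptive attributes such as `location` or `is_holiday` are recovered by joining back to the dimensions.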

In [3]:
def read_csv_web_or_local(url):
    """Load a CSV from a local copy when one exists, otherwise from the web."""
    fname = Path(url.split("/")[-1])
    full_output = OUT_DIR / fname

    # Prefer a local copy: the OUT_DIR cache first, then a file next to the notebook.
    for candidate in (full_output, fname):
        if candidate.exists():
            print("✔ Read from local file:", candidate)
            return pd.read_csv(candidate)

    try:
        df = pd.read_csv(url)
    except Exception as e:
        # Report and re-raise: returning None would only fail later, at covid.shape.
        print("Web download failed:", e)
        raise

    print("✔ Read from the web:", url)
    df.to_csv(full_output, index=False)  # cache for the next run
    return df

OWID_COVID = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv"
OWID_VAX = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv"

covid = read_csv_web_or_local(OWID_COVID)
vax = read_csv_web_or_local(OWID_VAX)

covid.shape, vax.shape

✔ Read from the web: https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv
✔ Read from the web: https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv


((429435, 67), (196246, 16))
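Both files cover every country, so a natural next step is to keep only Brazil and align the two sources by date. The sketch below shows one way to do that on small hand-made frames. The helper `brazil_daily` is hypothetical, the numbers are illustrative, and the column names `iso_code`, `date`, and `people_fully_vaccinated` are assumed to match the OWID files.

```python
import pandas as pd

# Hand-made stand-ins for the two OWID frames (illustrative values only)
covid_sample = pd.DataFrame({
    "iso_code": ["BRA", "BRA", "ARG"],
    "date": ["2021-01-17", "2021-01-18", "2021-01-17"],
    "new_cases": [10, 20, 5],
})
vax_sample = pd.DataFrame({
    "iso_code": ["BRA", "ARG"],
    "date": ["2021-01-18", "2021-01-17"],
    "people_fully_vaccinated": [100, 50],
})


def brazil_daily(covid_df, vax_df, iso="BRA"):
    """Filter both frames to one country and left-join vaccination onto cases."""
    c = covid_df.loc[covid_df["iso_code"] == iso].copy()
    v = vax_df.loc[vax_df["iso_code"] == iso].copy()
    # Parse dates so the join compares timestamps rather than strings
    c["date"] = pd.to_datetime(c["date"])
    v["date"] = pd.to_datetime(v["date"])
    merged = c.merge(v.drop(columns="iso_code"), on="date", how="left")
    return merged.sort_values("date").reset_index(drop=True)


bra = brazil_daily(covid_sample, vax_sample)
```

The left join keeps every day that has case data; days with no vaccination record get `NaN` in the vaccination columns, which the ETL can later fill or leave as missing.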