# EDA SRAG DATASUS

## Summary

The data were obtained from [Datasus](https://opendatasus.saude.gov.br/dataset/srag-2021-a-2024) on 23/10.

The data currently have the following dimensions per yearly file:

| Year | Rows | Columns |
|------|------|---------|
| 2019 | 48,961 | 194 |
| 2020 | 1,206,920 | 194 |
| 2021 | 1,745,672 | 194 |
| 2022 | 560,577 | 194 |
| 2023 | 279,453 | 194 |
| 2024 | 267,984 | 194 |
| 2025 | 278,902 | 194 |
| Total | 4,388,469 | |
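
The counts in the table could be reproduced without loading a whole year into memory. This is a minimal sketch, assuming semicolon-separated files like the ones read later in this notebook; `count_rows_and_columns` and the toy file are hypothetical and not part of the project.

```python
import tempfile
from pathlib import Path

import pandas as pd


def count_rows_and_columns(csv_path, sep=";", chunksize=100_000):
    """Return (rows, columns) for a CSV, reading it in chunks."""
    n_rows = 0
    n_cols = 0
    # dtype=str skips type inference, which is not needed for counting
    for chunk in pd.read_csv(csv_path, sep=sep, chunksize=chunksize, dtype=str):
        n_rows += len(chunk)
        n_cols = chunk.shape[1]
    return n_rows, n_cols


# Toy file standing in for one INFLUD*.csv file (made-up content)
with tempfile.TemporaryDirectory() as tmp:
    toy_csv = Path(tmp) / "toy.csv"
    toy_csv.write_text(
        "A;B;C\n1;x;2019-01-01\n2;y;2019-01-02\n3;z;\n", encoding="utf-8"
    )
    toy_shape = count_rows_and_columns(toy_csv, chunksize=2)

print(toy_shape)
```

Applied to each path in a list of yearly files, this would yield one `(rows, columns)` pair per year.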

## EDA

### Setup

In [1]:
import sys
from pathlib import Path

import pandas as pd

# Absolute path to the project root, so that the src package can be imported
project_root = Path.cwd().resolve().parents[2]
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

In [2]:
# Import the custom utility functions
from src.utils.eda_utils import iter_blocos, proximo_bloco

In [3]:
# Directory containing the CSV files
data_dir = project_root / "data" / "srag_csvs"
csv_files = list(data_dir.glob("*.csv"))
csv_files.sort()
csv_files

[WindowsPath('C:/Users/aurelio/projetos/ai_engineer_projects/data/srag_csvs/INFLUD19-26-06-2025.csv'),
 WindowsPath('C:/Users/aurelio/projetos/ai_engineer_projects/data/srag_csvs/INFLUD20-26-06-2025.csv'),
 WindowsPath('C:/Users/aurelio/projetos/ai_engineer_projects/data/srag_csvs/INFLUD21-26-06-2025.csv'),
 WindowsPath('C:/Users/aurelio/projetos/ai_engineer_projects/data/srag_csvs/INFLUD22-26-06-2025.csv'),
 WindowsPath('C:/Users/aurelio/projetos/ai_engineer_projects/data/srag_csvs/INFLUD23-26-06-2025.csv'),
 WindowsPath('C:/Users/aurelio/projetos/ai_engineer_projects/data/srag_csvs/INFLUD24-26-06-2025.csv'),
 WindowsPath('C:/Users/aurelio/projetos/ai_engineer_projects/data/srag_csvs/INFLUD25-27-10-2025.csv')]

### CSV file consistency check

#### Columns

In [4]:
colunas_por_arquivo = {}

for file in csv_files:
    # Read only the header to get the column names
    cols = set(pd.read_csv(file, sep=";", nrows=0).columns)
    colunas_por_arquivo[file.name] = cols
    print(f"{file.name} - cols: {len(colunas_por_arquivo[file.name])}")

# Take the union of all columns found across all files
todas_colunas = set().union(*colunas_por_arquivo.values())

# Total number of unique columns across all files
print(f"\nQuantidade de colunas após a união de todas as colunas únicas: {len(todas_colunas)}")

INFLUD19-26-06-2025.csv - cols: 194
INFLUD20-26-06-2025.csv - cols: 194
INFLUD21-26-06-2025.csv - cols: 194
INFLUD22-26-06-2025.csv - cols: 194
INFLUD23-26-06-2025.csv - cols: 194
INFLUD24-26-06-2025.csv - cols: 194
INFLUD25-27-10-2025.csv - cols: 194

Quantidade de colunas após a união de todas as colunas únicas: 194


Because the source splits the data by year, I needed to confirm that every file has the same columns. Each file has 194 columns and the union of all column names also has 194 entries, so the column sets are identical (at least by name).
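
The union check confirms that the names match but does not say which names differ when they do not, nor whether the column order is the same. A minimal sketch of a more detailed comparison, using made-up in-memory headers in place of the real files (`toy_files` and `report` are illustrative names):

```python
from io import StringIO

import pandas as pd

# Toy headers standing in for the yearly files (made-up column subsets)
toy_files = {
    "a.csv": "NU_NOTIFIC;DT_NOTIFIC;SG_UF_NOT\n",
    "b.csv": "NU_NOTIFIC;SG_UF_NOT;DT_NOTIFIC\n",
    "c.csv": "NU_NOTIFIC;DT_NOTIFIC;CS_SEXO\n",
}

# Read only the header row of each file, keeping the original order
headers = {
    name: list(pd.read_csv(StringIO(text), sep=";", nrows=0).columns)
    for name, text in toy_files.items()
}

# Use the first file as the reference
ref_name = next(iter(headers))
ref_cols = headers[ref_name]

report = {}
for name, cols in headers.items():
    report[name] = {
        "missing": sorted(set(ref_cols) - set(cols)),  # in reference only
        "extra": sorted(set(cols) - set(ref_cols)),    # in this file only
        "same_order": cols == ref_cols,
    }

for name, info in report.items():
    print(name, info)
```

In this toy case `b.csv` has the same names in a different order, and `c.csv` swaps one column for another.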

#### Types

In [5]:
dtypes_por_csv = {}

for file in csv_files:
    # Read a small sample to infer the dtypes
    df_sample = pd.read_csv(file, sep=";", nrows=10)
    dtypes_por_csv[file.name] = df_sample.dtypes.to_dict()

# Use the dtypes of the first CSV file as the reference
csv_ref = next(iter(dtypes_por_csv))
dtypes_ref = dtypes_por_csv[csv_ref]

# Check for dtype differences
diferencas_dtypes = {}

for nome, dtypes in dtypes_por_csv.items():
    diffs = {
        col: (dtypes_ref[col], dtypes.get(col))
        for col in dtypes_ref
        if col in dtypes and dtypes_ref[col] != dtypes[col]
    }
    if diffs:
        diferencas_dtypes[nome] = diffs

# Show the results
if diferencas_dtypes:
    print("\nDiferenças de tipos encontradas:")
    for nome, diffs in diferencas_dtypes.items():
        print(f"\n * {nome}:")
        for col, (ref, tipo) in diffs.items():
            print(f"  - {col}: {ref} -> {tipo}")
else:
    print("\nTodos os arquivos possuem os mesmos dtypes!")


Diferenças de tipos encontradas:

 * INFLUD20-26-06-2025.csv:
  - CO_REGIONA: int64 -> float64
  - CS_ZONA: int64 -> float64
  - AVE_SUINO: float64 -> int64
  - MORB_DESC: float64 -> object
  - VACINA: int64 -> float64
  - CO_RG_INTE: int64 -> float64
  - SUPORT_VEN: int64 -> float64
  - RAIOX_OUT: object -> float64
  - AMOSTRA: int64 -> float64
  - OUT_AMOST: object -> float64
  - PCR_RESUL: int64 -> float64
  - CLASSI_FIN: int64 -> float64
  - EVOLUCAO: int64 -> float64
  - DT_RES_AN: object -> float64
  - RES_AN: int64 -> float64

 * INFLUD21-26-06-2025.csv:
  - CO_REGIONA: int64 -> float64
  - CS_RACA: float64 -> int64
  - CO_RG_RESI: int64 -> float64
  - CS_ZONA: int64 -> float64
  - FEBRE: int64 -> float64
  - GARGANTA: int64 -> float64
  - DISPNEIA: int64 -> float64
  - DESC_RESP: int64 -> float64
  - SATURACAO: int64 -> float64
  - DIARREIA: int64 -> float64
  - VOMITO: int64 -> float64
  - OUTRO_SIN: int64 -> float64
  - MORB_DESC: float64 -> object
  - VACINA: int64 -> float

The inferred `dtypes` differ between files.
These differences arise because pandas infers each column's type from the values it reads, and here only the first 10 rows of each file are sampled: an integer column with a missing value in the sample is read as `float64`, and a column that is entirely empty in the sample is also read as `float64` rather than `object`.
The types therefore need to be standardized before concatenation, for consistency and more efficient processing.
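
One way to standardize is to pass an explicit schema to `pd.read_csv` instead of relying on inference. The sketch below uses toy CSVs and a two-column schema; the choice of the nullable `Int64` and `string` dtypes for these columns is an assumption for illustration, not a decision taken in this notebook.

```python
from io import StringIO

import pandas as pd

# Toy CSVs: CS_ZONA is complete in one file and has a gap in the other
csv_a = "CS_ZONA;MORB_DESC\n1;\n2;\n"
csv_b = "CS_ZONA;MORB_DESC\n1;ASMA\n;\n"

# With inference, the same column gets different dtypes in each file
inferred_a = pd.read_csv(StringIO(csv_a), sep=";").dtypes.astype(str).to_dict()
inferred_b = pd.read_csv(StringIO(csv_b), sep=";").dtypes.astype(str).to_dict()
print(inferred_a)
print(inferred_b)

# Illustrative schema: nullable integer for codes, string for free text
schema = {"CS_ZONA": "Int64", "MORB_DESC": "string"}

frames = [
    pd.read_csv(StringIO(text), sep=";", dtype=schema)
    for text in (csv_a, csv_b)
]

# Every frame now shares the same dtypes, so concat does not upcast
combined = pd.concat(frames, ignore_index=True)
print(combined.dtypes)
```

The nullable `Int64` dtype keeps integer codes as integers even when some rows are missing, which avoids the `int64`/`float64` split seen in the comparison.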

### INFLUD19-26-06-2025.csv

In [6]:
# Load the first CSV file.
df_19 = pd.read_csv(csv_files[0], sep=";", low_memory=False)
df_19.head()

Unnamed: 0,NU_NOTIFIC,DT_NOTIFIC,SEM_NOT,DT_SIN_PRI,SEM_PRI,SG_UF_NOT,ID_REGIONA,CO_REGIONA,ID_MUNICIP,CO_MUN_NOT,...,VG_OMS,VG_OMSOUT,VG_LIN,VG_MET,VG_METOUT,VG_DTRES,VG_ENC,VG_REINF,VG_CODEST,REINF
0,315478195042,2019-01-10,2,2019-01-06,2,MG,BELO HORIZONTE,1449.0,BELO HORIZONTE,310620.0,...,,,,,,,,,,
1,315478195276,2019-01-03,1,2019-01-01,1,SP,GVE I CAPITAL,1331.0,SAO PAULO,355030.0,...,,,,,,,,,,
2,315478207219,2019-01-02,1,2018-12-31,1,PE,001,1497.0,RECIFE,261160.0,...,,,,,,,,,,
3,315478211086,2019-01-10,2,2019-01-07,2,SP,GVE XVII CAMPINAS,1342.0,CAMPINAS,350950.0,...,,,,,,,,,,
4,315478212765,2019-01-11,2,2019-01-06,2,PE,004,1499.0,CARUARU,260410.0,...,,,,,,,,,,


In [10]:
df_19.shape

(48961, 194)

##### Column analysis

##### 0-9

| # | Column     | Non-Null Count | Dtype   |
| - | ---------- | -------------- | ------- |
| 0 | NU_NOTIFIC | 48961 non-null | int64   |
| 1 | DT_NOTIFIC | 48961 non-null | object  |
| 2 | SEM_NOT    | 48961 non-null | int64   |
| 3 | DT_SIN_PRI | 48961 non-null | object  |
| 4 | SEM_PRI    | 48961 non-null | int64   |
| 5 | SG_UF_NOT  | 48919 non-null | object  |
| 6 | ID_REGIONA | 44025 non-null | object  |
| 7 | CO_REGIONA | 44025 non-null | float64 |
| 8 | ID_MUNICIP | 48919 non-null | object  |
| 9 | CO_MUN_NOT | 48919 non-null | float64 |

In [49]:
# Set the size of each block of columns
bloco = 10
n_colunas = df_19.shape[1]

# Create the iterator
blocos_iter = iter(iter_blocos(n_colunas, bloco))

In [50]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 0 a 9
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   NU_NOTIFIC  48961 non-null  int64  
 1   DT_NOTIFIC  48961 non-null  object 
 2   SEM_NOT     48961 non-null  int64  
 3   DT_SIN_PRI  48961 non-null  object 
 4   SEM_PRI     48961 non-null  int64  
 5   SG_UF_NOT   48919 non-null  object 
 6   ID_REGIONA  44025 non-null  object 
 7   CO_REGIONA  44025 non-null  float64
 8   ID_MUNICIP  48919 non-null  object 
 9   CO_MUN_NOT  48919 non-null  float64
dtypes: float64(2), int64(3), object(5)
memory usage: 3.7+ MB


In [51]:
df_19[columns_name]

Unnamed: 0,NU_NOTIFIC,DT_NOTIFIC,SEM_NOT,DT_SIN_PRI,SEM_PRI,SG_UF_NOT,ID_REGIONA,CO_REGIONA,ID_MUNICIP,CO_MUN_NOT
0,315478195042,2019-01-10,2,2019-01-06,2,MG,BELO HORIZONTE,1449.0,BELO HORIZONTE,310620.0
1,315478195276,2019-01-03,1,2019-01-01,1,SP,GVE I CAPITAL,1331.0,SAO PAULO,355030.0
2,315478207219,2019-01-02,1,2018-12-31,1,PE,001,1497.0,RECIFE,261160.0
3,315478211086,2019-01-10,2,2019-01-07,2,SP,GVE XVII CAMPINAS,1342.0,CAMPINAS,350950.0
4,315478212765,2019-01-11,2,2019-01-06,2,PE,004,1499.0,CARUARU,260410.0
...,...,...,...,...,...,...,...,...,...,...
48956,31692547426126,2023-08-20,34,2019-11-19,47,CE,1 CRES FORTALEZA,1519.0,FORTALEZA,230440.0
48957,31706621109902,2019-05-13,20,2019-05-10,19,RJ,,,CABO FRIO,330070.0
48958,31706621464126,2019-06-11,24,2019-06-11,24,RJ,,,CABO FRIO,330070.0
48959,31709922959685,2019-10-30,44,2019-10-28,44,MT,CUIABA,1578.0,VARZEA GRANDE,510840.0


##### 10-19

| # | Column     | Non-Null Count | Dtype   |
| - | ---------- | -------------- | ------- |
| 0 | CS_SEXO    | 48961 non-null | object  |
| 1 | DT_NASC    | 48797 non-null | object  |
| 2 | NU_IDADE_N | 48961 non-null | int64   |
| 3 | TP_IDADE   | 48961 non-null | int64   |
| 4 | COD_IDADE  | 48961 non-null | int64   |
| 5 | CS_GESTANT | 48961 non-null | int64   |
| 6 | CS_RACA    | 47130 non-null | float64 |
| 7 | CS_ETINIA  | 287 non-null   | object  |
| 8 | CS_ESCOL_N | 45153 non-null | float64 |
| 9 | ID_PAIS    | 48957 non-null | object  |


In [52]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 10 a 19
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   CS_SEXO     48961 non-null  object 
 1   DT_NASC     48797 non-null  object 
 2   NU_IDADE_N  48961 non-null  int64  
 3   TP_IDADE    48961 non-null  int64  
 4   COD_IDADE   48961 non-null  int64  
 5   CS_GESTANT  48961 non-null  int64  
 6   CS_RACA     47130 non-null  float64
 7   CS_ETINIA   287 non-null    object 
 8   CS_ESCOL_N  45153 non-null  float64
 9   ID_PAIS     48957 non-null  object 
dtypes: float64(2), int64(4), object(4)
memory usage: 3.7+ MB


In [53]:
df_19[columns_name]

Unnamed: 0,CS_SEXO,DT_NASC,NU_IDADE_N,TP_IDADE,COD_IDADE,CS_GESTANT,CS_RACA,CS_ETINIA,CS_ESCOL_N,ID_PAIS
0,M,1988-03-17,30,3,3030,6,1.0,,2.0,BRASIL
1,F,2018-05-30,7,2,2007,6,1.0,,5.0,BRASIL
2,M,2017-05-07,1,3,3001,6,4.0,,5.0,BRASIL
3,F,2018-07-15,5,2,2005,6,1.0,,5.0,BRASIL
4,F,2018-09-15,3,2,2003,6,4.0,,5.0,BRASIL
...,...,...,...,...,...,...,...,...,...,...
48956,M,2019-03-16,8,2,2008,6,1.0,,9.0,BRASIL
48957,M,1968-01-17,51,3,3051,6,1.0,,9.0,BRASIL
48958,F,1949-01-23,70,3,3070,5,9.0,,1.0,BRASIL
48959,M,2018-10-07,1,3,3001,6,4.0,,9.0,BRASIL


##### 20-29

| # | Column     | Non-Null Count | Dtype   |
| - | ---------- | -------------- | ------- |
| 0 | CO_PAIS    | 48957 non-null | float64 |
| 1 | SG_UF      | 48924 non-null | object  |
| 2 | ID_RG_RESI | 44623 non-null | object  |
| 3 | CO_RG_RESI | 44623 non-null | float64 |
| 4 | ID_MN_RESI | 48924 non-null | object  |
| 5 | CO_MUN_RES | 48924 non-null | float64 |
| 6 | CS_ZONA    | 47713 non-null | float64 |
| 7 | NOSOCOMIAL | 46996 non-null | float64 |
| 8 | AVE_SUINO  | 47204 non-null | float64 |
| 9 | FEBRE      | 48740 non-null | float64 |


In [55]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 20 a 29
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   CO_PAIS     48957 non-null  float64
 1   SG_UF       48924 non-null  object 
 2   ID_RG_RESI  44623 non-null  object 
 3   CO_RG_RESI  44623 non-null  float64
 4   ID_MN_RESI  48924 non-null  object 
 5   CO_MUN_RES  48924 non-null  float64
 6   CS_ZONA     47713 non-null  float64
 7   NOSOCOMIAL  46996 non-null  float64
 8   AVE_SUINO   47204 non-null  float64
 9   FEBRE       48740 non-null  float64
dtypes: float64(7), object(3)
memory usage: 3.7+ MB


##### 30-39


In [56]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 30 a 39
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   TOSSE       48712 non-null  float64
 1   GARGANTA    47720 non-null  float64
 2   DISPNEIA    48527 non-null  float64
 3   DESC_RESP   48464 non-null  float64
 4   SATURACAO   48049 non-null  float64
 5   DIARREIA    47489 non-null  float64
 6   VOMITO      47525 non-null  float64
 7   OUTRO_SIN   44976 non-null  float64
 8   OUTRO_DES   10740 non-null  object 
 9   FATOR_RISC  20756 non-null  float64
dtypes: float64(9), object(1)
memory usage: 3.7+ MB


##### 40-49


In [57]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 40 a 49
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   PUERPERA    18355 non-null  float64
 1   CARDIOPATI  18804 non-null  float64
 2   HEMATOLOGI  18387 non-null  float64
 3   SIND_DOWN   18446 non-null  float64
 4   HEPATICA    18329 non-null  float64
 5   ASMA        18606 non-null  float64
 6   DIABETES    18617 non-null  float64
 7   NEUROLOGIC  18580 non-null  float64
 8   PNEUMOPATI  18616 non-null  float64
 9   IMUNODEPRE  18489 non-null  float64
dtypes: float64(10)
memory usage: 3.7 MB


##### 50-59


In [58]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 50 a 59
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   RENAL       18285 non-null  float64
 1   OBESIDADE   18299 non-null  float64
 2   OBES_IMC    734 non-null    float64
 3   OUT_MORBI   18490 non-null  float64
 4   MORB_DESC   8624 non-null   object 
 5   TABAG       0 non-null      float64
 6   VACINA      39649 non-null  float64
 7   DT_UT_DOSE  8024 non-null   object 
 8   MAE_VAC     6167 non-null   float64
 9   DT_VAC_MAE  1098 non-null   object 
dtypes: float64(7), object(3)
memory usage: 3.7+ MB


##### 60-69


In [59]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 60 a 69
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   M_AMAMENTA  5520 non-null   float64
 1   DT_DOSEUNI  242 non-null    object 
 2   DT_1_DOSE   300 non-null    object 
 3   DT_2_DOSE   210 non-null    object 
 4   ANTIVIRAL   48268 non-null  float64
 5   TP_ANTIVIR  29675 non-null  float64
 6   OUT_ANTIV   135 non-null    object 
 7   DT_ANTIVIR  29673 non-null  object 
 8   HOSPITAL    48684 non-null  float64
 9   DT_INTERNA  47640 non-null  object 
dtypes: float64(4), object(6)
memory usage: 3.7+ MB


##### 70-79


In [61]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 70 a 79
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   SG_UF_INTE  46127 non-null  object 
 1   ID_RG_INTE  41547 non-null  object 
 2   CO_RG_INTE  41547 non-null  float64
 3   ID_MN_INTE  46127 non-null  object 
 4   CO_MU_INTE  46127 non-null  float64
 5   NM_UN_INTE  46124 non-null  object 
 6   UTI         47392 non-null  float64
 7   DT_ENTUTI   17197 non-null  object 
 8   DT_SAIDUTI  9734 non-null   object 
 9   SUPORT_VEN  47136 non-null  float64
dtypes: float64(4), object(6)
memory usage: 3.7+ MB


##### 80-89


In [62]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 80 a 89
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   RAIOX_RES   45128 non-null  float64
 1   RAIOX_OUT   4979 non-null   object 
 2   DT_RAIOX    36303 non-null  object 
 3   AMOSTRA     48645 non-null  float64
 4   DT_COLETA   42617 non-null  object 
 5   TP_AMOSTRA  41827 non-null  float64
 6   OUT_AMOST   1053 non-null   object 
 7   PCR_RESUL   44059 non-null  float64
 8   DT_PCR      38984 non-null  object 
 9   POS_PCRFLU  13649 non-null  float64
dtypes: float64(5), object(5)
memory usage: 3.7+ MB


##### 90-99


In [63]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 90 a 99
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   TP_FLU_PCR  6377 non-null   float64
 1   PCR_FLUASU  5520 non-null   float64
 2   FLUASU_OUT  85 non-null     object 
 3   PCR_FLUBLI  521 non-null    float64
 4   FLUBLI_OUT  22 non-null     object 
 5   POS_PCROUT  11542 non-null  float64
 6   PCR_VSR     4839 non-null   float64
 7   PCR_PARA1   189 non-null    float64
 8   PCR_PARA2   61 non-null     float64
 9   PCR_PARA3   605 non-null    float64
dtypes: float64(8), object(2)
memory usage: 3.7+ MB


##### 100-109


In [64]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 100 a 109
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   PCR_PARA4   29 non-null     float64
 1   PCR_ADENO   832 non-null    float64
 2   PCR_METAP   723 non-null    float64
 3   PCR_BOCA    195 non-null    float64
 4   PCR_RINO    788 non-null    float64
 5   PCR_OUTRO   372 non-null    float64
 6   DS_PCR_OUT  329 non-null    object 
 7   CLASSI_FIN  47935 non-null  float64
 8   CLASSI_OUT  119 non-null    object 
 9   CRITERIO    46978 non-null  float64
dtypes: float64(8), object(2)
memory usage: 3.7+ MB


##### 110-119


In [65]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 110 a 119
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   EVOLUCAO    46090 non-null  float64
 1   DT_EVOLUCA  41146 non-null  object 
 2   DT_ENCERRA  47179 non-null  object 
 3   DT_DIGITA   32142 non-null  object 
 4   HISTO_VGM   48961 non-null  int64  
 5   PAIS_VGM    1 non-null      object 
 6   CO_PS_VGM   1 non-null      float64
 7   LO_PS_VGM   2 non-null      object 
 8   DT_VGM      1 non-null      object 
 9   DT_RT_VGM   1 non-null      object 
dtypes: float64(2), int64(1), object(7)
memory usage: 3.7+ MB


##### 120-129


In [66]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 120 a 129
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   PCR_SARS2  12 non-null     float64
 1   PAC_COCBO  11 non-null     object 
 2   PAC_DSCBO  11 non-null     object 
 3   OUT_ANIM   0 non-null      float64
 4   DOR_ABD    161 non-null    float64
 5   FADIGA     272 non-null    float64
 6   PERD_OLFT  158 non-null    float64
 7   PERD_PALA  158 non-null    float64
 8   TOMO_RES   84 non-null     float64
 9   TOMO_OUT   4 non-null      object 
dtypes: float64(7), object(3)
memory usage: 3.7+ MB


##### 130-139


In [67]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 130 a 139
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   DT_TOMO     12 non-null     object 
 1   TP_TES_AN   48597 non-null  float64
 2   DT_RES_AN   10452 non-null  object 
 3   RES_AN      43035 non-null  float64
 4   POS_AN_FLU  2863 non-null   float64
 5   TP_FLU_AN   847 non-null    float64
 6   POS_AN_OUT  2745 non-null   float64
 7   AN_SARS2    1 non-null      float64
 8   AN_VSR      1801 non-null   float64
 9   AN_PARA1    31 non-null     float64
dtypes: float64(8), object(2)
memory usage: 3.7+ MB


##### 140-149


In [68]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 140 a 149
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   AN_PARA2   11 non-null     float64
 1   AN_PARA3   96 non-null     float64
 2   AN_ADENO   131 non-null    float64
 3   AN_OUTRO   48 non-null     float64
 4   DS_AN_OUT  32 non-null     object 
 5   TP_AM_SOR  114 non-null    float64
 6   SOR_OUT    0 non-null      float64
 7   DT_CO_SOR  2 non-null      object 
 8   TP_SOR     2 non-null      float64
 9   OUT_SOR    0 non-null      float64
dtypes: float64(8), object(2)
memory usage: 3.7+ MB


##### 150-159


In [69]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 150 a 159
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   DT_RES      2 non-null      object 
 1   RES_IGG     11 non-null     float64
 2   RES_IGM     11 non-null     float64
 3   RES_IGA     9 non-null      float64
 4   POV_CT      845 non-null    float64
 5   TP_POV_CT   1 non-null      object 
 6   TEM_CPF     261 non-null    float64
 7   ESTRANG     236 non-null    float64
 8   VACINA_COV  641 non-null    float64
 9   DOSE_1_COV  49 non-null     object 
dtypes: float64(7), object(3)
memory usage: 3.7+ MB


##### 160-169


In [70]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 160 a 169
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   DOSE_2_COV  36 non-null     object
 1   DOSE_REF    13 non-null     object
 2   DOSE_2REF   4 non-null      object
 3   DOSE_ADIC   1 non-null      object
 4   DOS_RE_BI   3 non-null      object
 5   FAB_COV_1   49 non-null     object
 6   FAB_COV_2   36 non-null     object
 7   FAB_COVRF   13 non-null     object
 8   FAB_COVRF2  4 non-null      object
 9   FAB_ADIC    1 non-null      object
dtypes: object(10)
memory usage: 3.7+ MB


##### 170-179


In [71]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 170 a 179
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   FAB_RE_BI   3 non-null      object 
 1   LOTE_1_COV  48 non-null     object 
 2   LOTE_2_COV  36 non-null     object 
 3   LOTE_REF    13 non-null     object 
 4   LOTE_REF2   4 non-null      object 
 5   LOTE_ADIC   1 non-null      object 
 6   LOT_RE_BI   3 non-null      object 
 7   FNT_IN_COV  641 non-null    float64
 8   TRAT_COV    97 non-null     float64
 9   TIPO_TRAT   0 non-null      float64
dtypes: float64(3), object(7)
memory usage: 3.7+ MB


##### 180-189


In [72]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 180 a 189
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   DT_TRT_COV  0 non-null      float64
 1   OUT_TRAT    0 non-null      float64
 2   SURTO_SG    41 non-null     float64
 3   CO_DETEC    50 non-null     float64
 4   VG_OMS      0 non-null      float64
 5   VG_OMSOUT   0 non-null      float64
 6   VG_LIN      0 non-null      float64
 7   VG_MET      0 non-null      float64
 8   VG_METOUT   0 non-null      float64
 9   VG_DTRES    0 non-null      float64
dtypes: float64(10)
memory usage: 3.7 MB


##### 190-193


In [73]:
_, columns_name = proximo_bloco(df_19, blocos_iter)

📊 Colunas 190 a 193
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48961 entries, 0 to 48960
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   VG_ENC     0 non-null      float64
 1   VG_REINF   0 non-null      float64
 2   VG_CODEST  0 non-null      float64
 3   REINF      361 non-null    float64
dtypes: float64(4)
memory usage: 1.5 MB


### Metrics

**Case growth rate**

In [9]:
df_19["DT_NOTIFIC"] = pd.to_datetime(df_19["DT_NOTIFIC"], errors="coerce")
df_19["ANO_MES"] = df_19["DT_NOTIFIC"].dt.to_period("M")

# Count cases per month
casos_mensais = df_19.groupby("ANO_MES").size().reset_index(name="casos")

# Compute the month-over-month growth rate (%)
casos_mensais["taxa_aumento_%"] = casos_mensais["casos"].pct_change() * 100

In [10]:
casos_mensais

Unnamed: 0,ANO_MES,casos,taxa_aumento_%
0,2018-12,3,
1,2019-01,1128,37500.0
2,2019-02,1830,62.234043
3,2019-03,3860,110.928962
4,2019-04,6131,58.834197
5,2019-05,8399,36.992334
6,2019-06,7769,-7.500893
7,2019-07,5851,-24.687862
8,2019-08,3975,-32.062895
9,2019-09,3084,-22.415094


**Mortality rate**

In [14]:
total_casos = len(df_19)
obitos = df_19[df_19["EVOLUCAO"] == 2].shape[0]

taxa_mortalidade = (obitos / total_casos) * 100
print(f"Taxa de mortalidade: {taxa_mortalidade:.2f}%")

Taxa de mortalidade: 11.09%


In [15]:
mortalidade_por_mes = (
    df_19.groupby(df_19["DT_NOTIFIC"].dt.to_period("M"))
    .apply(lambda x: (x["EVOLUCAO"].eq(2).sum() / len(x)) * 100)
    .reset_index(name="taxa_mortalidade_%")
)

In [17]:
mortalidade_por_mes

Unnamed: 0,DT_NOTIFIC,taxa_mortalidade_%
0,2018-12,0.0
1,2019-01,14.095745
2,2019-02,11.202186
3,2019-03,8.212435
4,2019-04,9.13391
5,2019-05,10.846529
6,2019-06,11.674604
7,2019-07,14.236883
8,2019-08,11.899371
9,2019-09,10.149157
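
The rates above divide deaths by all notified cases, including those with no recorded outcome (`EVOLUCAO` has 46090 non-null values out of 48961 rows). As a sketch of how the choice of denominator matters, the toy example below computes both variants; the records are made up, and treating a missing `EVOLUCAO` as "outcome unknown" is an assumption.

```python
import pandas as pd

# Toy records: EVOLUCAO == 2 marks a death, as in the filter used above
toy = pd.DataFrame(
    {
        "DT_NOTIFIC": pd.to_datetime(
            ["2019-01-05", "2019-01-20", "2019-01-25", "2019-02-03", "2019-02-10"]
        ),
        "EVOLUCAO": [2.0, 1.0, None, 2.0, 2.0],
    }
)

obito = toy["EVOLUCAO"].eq(2)
mes = toy["DT_NOTIFIC"].dt.to_period("M")

# Same definition as above: deaths over all notified cases.
# The mean of a boolean Series is the share of True values.
taxa_todos = obito.groupby(mes).mean() * 100

# Variant: deaths over cases whose outcome is recorded
conhecido = toy["EVOLUCAO"].notna()
taxa_conhecidos = obito[conhecido].groupby(mes[conhecido]).mean() * 100

print(taxa_todos)
print(taxa_conhecidos)
```

Grouping a boolean Series and taking its mean gives the same result as the `apply` with a lambda, without a Python-level function call per group.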


**ICU occupancy rate** (share of hospitalized cases admitted to the ICU)

In [18]:
internados = df_19[df_19["HOSPITAL"] == 1]
uti = internados[internados["UTI"] == 1]

taxa_ocupacao_uti = (len(uti) / len(internados)) * 100
print(f"Taxa de ocupação de UTI: {taxa_ocupacao_uti:.2f}%")

Taxa de ocupação de UTI: 35.94%


In [19]:
df_19["DT_ENTUTI"] = pd.to_datetime(df_19["DT_ENTUTI"], errors="coerce")
df_19["DT_SAIDUTI"] = pd.to_datetime(df_19["DT_SAIDUTI"], errors="coerce")
df_19["dias_uti"] = (df_19["DT_SAIDUTI"] - df_19["DT_ENTUTI"]).dt.days
media_dias_uti = df_19.loc[df_19["dias_uti"] > 0, "dias_uti"].mean()

In [20]:
media_dias_uti

np.float64(10.385061794734014)
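
The average above silently drops rows with unparseable dates (turned into `NaT` by `errors="coerce"`) and rows where the stay is zero or negative. A small sketch with made-up dates shows how many rows each rule excludes; the variable names are illustrative.

```python
import pandas as pd

# Toy ICU dates: one valid stay, one same-day stay, one bad date, one inverted pair
toy = pd.DataFrame(
    {
        "DT_ENTUTI": ["2019-05-01", "2019-05-10", "not a date", "2019-06-01"],
        "DT_SAIDUTI": ["2019-05-11", "2019-05-10", "2019-05-20", "2019-05-30"],
    }
)

# errors="coerce" turns unparseable values into NaT instead of raising
entrada = pd.to_datetime(toy["DT_ENTUTI"], errors="coerce")
saida = pd.to_datetime(toy["DT_SAIDUTI"], errors="coerce")
dias = (saida - entrada).dt.days

n_sem_data = int(dias.isna().sum())        # missing or invalid dates
n_nao_positivo = int((dias <= 0).sum())    # same-day or inverted dates
media = dias[dias > 0].mean()

print(n_sem_data, n_nao_positivo, media)
```

Reporting these counts next to the mean makes it clear how many ICU stays the average is actually based on.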

**Population vaccination rate**

In [None]:
vacinados = df_19[df_19["VACINA_COV"] == 1]
taxa_vacinacao = (len(vacinados) / len(df_19)) * 100
print(f"Taxa de vacinação (entre casos notificados): {taxa_vacinacao:.2f}%")

Taxa de vacinação (entre casos notificados): 0.11%
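
In the 2019 file `VACINA_COV` is filled in only 641 of the 48961 rows, so most of the denominator above is missing data. Before interpreting the rate, it may help to look at the full distribution of the field, missing values included. A sketch with a made-up Series:

```python
import pandas as pd

# Toy values for VACINA_COV; None stands for an unfilled field
toy = pd.Series([1.0, 2.0, None, None, None, 9.0, 1.0, None], name="VACINA_COV")

# Share of each value, keeping missing entries as their own category
distribuicao = toy.value_counts(dropna=False, normalize=True) * 100

# Rate over all rows (as computed above) versus over filled rows only
taxa_todos = toy.eq(1).mean() * 100
taxa_preenchidos = toy.dropna().eq(1).mean() * 100
pct_ausente = toy.isna().mean() * 100

print(distribuicao)
print(taxa_todos, taxa_preenchidos, pct_ausente)
```

The two rates answer different questions, so whichever is reported should state its denominator.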


# End