# Health and Budget Data

This notebook performs an exploratory data analysis (EDA) of public health and budget data, using the provided dataset.

In [1]:
# Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style='whitegrid')

## Load the Dataset
We load the health and budget data for analysis.

In [2]:
# Load the dataset
dataset_path = '../datasets/br_ieps_saude_uf.csv'
saude = pd.read_csv(dataset_path, delimiter=',')

# Display the first rows of the dataset
saude.head()

| | ano | sigla_uf | cob_ab | cob_acs | cob_esf | cob_vac_bcg | cob_vac_rota | cob_vac_menin | cob_vac_pneumo | cob_vac_polio | ... | desp_recp_saude_pc_mun | pct_desp_recp_saude_uf | desp_tot_saude_pc_uf | desp_recp_saude_pc_uf | desp_tot_saude_pc_mun_def | desp_recp_saude_pc_mun_def | desp_tot_saude_pc_uf_def | desp_recp_saude_pc_uf_def | num_familias_bf | gasto_pbf_pc_def |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010 | AC | 75.558215 | 87.753425 | 64.064896 | 100.0 | 66.749718 | 0.652842 | 4.864448 | 100.0 | ... | 114.16634 | 17.43 | 756.49 | 561.04 | 392.055532 | 218.628717 | 1448.679513 | 1074.392463 | 59779 | 194.538143 |
| 1 | 2010 | AL | 78.580644 | 77.051768 | 72.96154 | 100.0 | 74.794916 | 3.434813 | 6.652844 | 100.0 | ... | 124.854673 | 12.34 | 221.78 | 148.88 | 630.157824 | 239.096891 | 424.709041 | 285.105429 | 414112 | 289.369319 |
| 2 | 2010 | AM | 65.216161 | 69.13669 | 50.478242 | 100.0 | 58.681194 | 2.051782 | 7.585714 | 92.480398 | ... | 163.53858 | 20.67 | 504.84 | 360.75 | 505.598301 | 313.176633 | 966.769376 | 690.836805 | 278893 | 195.658268 |
| 3 | 2010 | AP | 90.042754 | 89.235233 | 73.277913 | 100.0 | 73.531058 | 0.395436 | 4.546726 | 90.752658 | ... | 111.239863 | 12.03 | 548.27 | 400.33 | 447.77467 | 213.024509 | 1049.937893 | 766.632566 | 44096 | 166.175598 |
| 4 | 2010 | BA | 66.06322 | 82.518806 | 59.840187 | 100.0 | 72.142332 | 64.723803 | 2.83148 | 95.929891 | ... | 120.605392 | 13.67 | 209.53 | 130.96 | 547.66446 | 230.959512 | 401.250272 | 250.788601 | 1662069 | 259.609686 |
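
The `...` in the preview above marks columns that pandas left out of the display; it does not indicate missing data. Below is a minimal sketch of one way to show every column. It uses a made-up stand-in frame, so `demo` and its `col_*` names are hypothetical; in this notebook the same calls would presumably apply to `saude`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a wide frame: 5 rows, 66 columns of random values.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((5, 66)), columns=[f"col_{i}" for i in range(66)])

# Temporarily lift the column limit so the preview includes every column.
with pd.option_context("display.max_columns", None):
    full = repr(demo.head())

print(full)

# The column names and dtypes can also be inspected directly.
print(list(demo.columns))
print(demo.dtypes.value_counts())
```

Using `pd.option_context` keeps the change local to the `with` block, so later cells still get the compact default preview.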


## Data Cleaning
Before running any analysis, we need to handle missing values and adjust the data format where necessary.
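
How a gap is filled changes the statistics computed afterwards, so it can help to compare strategies on a small example first. The sketch below uses a made-up Series (`spend` and its values are hypothetical, not taken from the dataset). It compares the mean after zero-filling with the mean after mean imputation.

```python
import numpy as np
import pandas as pd

# Made-up per-capita spending values with one gap (illustrative only).
spend = pd.Series([100.0, 200.0, np.nan, 300.0], name="desp_demo")

missing = int(spend.isnull().sum())       # number of missing entries
mean_observed = spend.mean()              # NaN is skipped by default

zero_filled = spend.fillna(0)             # treat the gap as "no spending"
mean_filled = spend.fillna(spend.mean())  # replace the gap with the observed mean

print("missing:", missing)
print("mean of observed values:", mean_observed)
print("mean after zero fill:", zero_filled.mean())
print("mean after mean fill:", mean_filled.mean())
```

In this toy case, zero-filling pulls the mean down from 200.0 to 150.0, while mean imputation leaves it at 200.0. Which behavior is appropriate depends on whether a missing value really means "nothing was spent".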

In [4]:
import os

# Create the output directory if it does not already exist
os.makedirs('datasets', exist_ok=True)

# Count missing values per column
print('Missing values per column:')
print(saude.isnull().sum())

# Fill missing values with 0 in the financial columns
colunas_financeiras = [col for col in saude.columns if 'desp' in col or 'gasto' in col]
saude[colunas_financeiras] = saude[colunas_financeiras].fillna(0)

# Impute the remaining columns: mean for numeric columns, mode otherwise.
# Assigning the result back avoids the chained-assignment FutureWarning
# that pandas raises for saude[col].fillna(..., inplace=True).
for col in saude.columns:
    if pd.api.types.is_numeric_dtype(saude[col]):
        saude[col] = saude[col].fillna(saude[col].mean())
    else:
        saude[col] = saude[col].fillna(saude[col].mode()[0])

# Save the cleaned dataset
saude.to_csv('datasets/uf_limpo.csv', index=False)

Missing values per column:
ano                           0
sigla_uf                      0
cob_ab                        0
cob_acs                       0
cob_esf                       0
                             ..
desp_recp_saude_pc_mun_def    0
desp_tot_saude_pc_uf_def      0
desp_recp_saude_pc_uf_def     0
num_familias_bf               0
gasto_pbf_pc_def              0
Length: 66, dtype: int64
