<a href="https://colab.research.google.com/github/digo-eu/advanced_learning/blob/main/microdados_violencia.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h3>Used to download the dataset as microdados_violencia.csv:</h3>
   pip install basedosdados <br>
   import basedosdados as bd <br>
   df = bd.read_table(dataset_id='br_ms_sinan', table_id='microdados_violencia', billing_project_id="violenciasinan", use_bqstorage_api = True) <br>
   df.to_csv(path_or_buf = 'microdados_violencia.csv')

<b>Included simply as a reference to the origin of the data. The data are already included in the repository via Git LFS under the name <i>"microdados_violencia.csv"</i>.</b> <br>
A Google BigQuery project must be created in order to connect to the original dataset.

In [5]:
# import the libraries used
import itertools

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
# `tf` is only an alias, so submodules must be imported from `tensorflow`
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping
# experimental namespace; it may be unavailable in recent TensorFlow releases
from tensorflow.keras.layers.experimental import preprocessing
from sklearn.metrics import confusion_matrix, classification_report

pd.set_option('display.max_columns', 100)
print(tf.__version__)

# enable memory growth on every available GPU
try:
    physical_devices = tf.config.list_physical_devices('GPU')
    for device in physical_devices:
        tf.config.experimental.set_memory_growth(device, True)
    print(physical_devices if physical_devices else "No GPU")
except Exception:
    print("No GPU")

## Data summary and research objectives

Our data come from the SINAN questionnaire on interpersonal or self-inflicted violence, administered to users of the health system who present as suspected or confirmed cases of domestic/intrafamily violence, or of extrafamilial violence when the victim is a child, adolescent, woman, elderly person, indigenous person, LGBT person, or person with a disability or disorder.

The questionnaire includes several factors that, we believe, represent differences in risk with respect to various characteristics of the violence suffered, namely lethality, repetition, sexual nature, among others.

The large amount of data available in this dataset leads us to believe that it is possible to determine how many of these questionnaire answers relate to the most relevant characteristics of the violence suffered, in order to build a profile and more easily detect cases of violence in a real care setting, where victims often hide the situation they are going through.

We focused on one characteristic in particular: the recurrence of the violence.
This characteristic appears in the questionnaire in binary form, as the question "Did it occur other times?", which can be answered "yes", "no", or "ignored" (unknown).
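
Because the question admits a third answer, the target only becomes strictly binary once the "ignored" answers are handled. The sketch below shows one possible way to do that on a toy table. The numeric codes (1 = yes, 0 = no, 9 = ignored) and the toy values are assumptions made for illustration, not the coding confirmed for this dataset.

```python
import pandas as pd

# Hypothetical answer codes (assumption, not confirmed by the source):
# 1 = "yes", 0 = "no", 9 = "ignored".
UNKNOWN_CODE = 9

# Toy data that only borrows two column names from the real table.
toy = pd.DataFrame({
    "idade_paciente": [26, 27, 20, 16, 40],
    "outras_vezes_ocorrencia": [0, 1, 9, 0, 1],
})

# Keep only rows with a definite yes/no answer so the target is truly binary.
binary = toy[toy["outras_vezes_ocorrencia"] != UNKNOWN_CODE].copy()
binary["outras_vezes_ocorrencia"] = binary["outras_vezes_ocorrencia"].astype(int)

print(binary["outras_vezes_ocorrencia"].value_counts().to_dict())
```

Dropping the rows is only one option. Treating "ignored" as its own class would turn the task into a three-class problem instead.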

This is therefore a binary classification problem, in which we want to predict the probability that an individual with given characteristics suffers repeated interpersonal or self-inflicted violence. We must therefore choose models suited to this task. To compare performance, we chose:

*   Naive Bayes
*   Logistic Regression
*   K-Nearest Neighbours
*   Support Vector Machine
*   Decision Tree
*   Bagging Decision Tree (Ensemble Learning I)
*   Boosted Decision Tree (Ensemble Learning II)
*   Random Forest (Ensemble Learning III)
*   Voting Classification (Ensemble Learning IV)
*   Neural Network (Deep Learning)
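
A comparison of the classical models in the list above could be set up roughly as follows with scikit-learn. This is a minimal sketch on synthetic data rather than the SINAN table. The hyperparameters, the choice of `GradientBoostingClassifier` as the boosted tree, and the three members of the voting ensemble are all assumptions. The neural network is left out because it is built with Keras.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption: not the SINAN table).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

candidates = {
    "naive_bayes": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "svm": SVC(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                 n_estimators=10, random_state=0),
    "boosting": GradientBoostingClassifier(n_estimators=20, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=20, random_state=0),
}
# Hard-voting ensemble over three of the base models (it clones them on fit).
candidates["voting"] = VotingClassifier(
    estimators=[(name, candidates[name])
                for name in ("naive_bayes", "logistic_regression", "decision_tree")]
)

# Fit every model on the same split and record its test accuracy.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in candidates.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Accuracy is used here only because it is the default `score` of these estimators. For an imbalanced target, `classification_report` would probably be more informative.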

In [8]:
# load the data into a pandas dataframe
df = pd.read_csv("microdados_violencia.csv")
# drop columns unsuitable for the analysis (made optional for now)
# df = df.drop(['data_encerramento', 'data_notificacao', 'id_categoria_cid10', 'id_subcategoria_cid10', 'data_ocorrencia', 'hora_ocorrencia', 'id_municipio_notificacao', 'id_municipio_6_notificacao', 'id_unidade_notificacao',
# 'id_regional_saude_notificacao', 'id_municipio_ocorrencia', 'id_municipio_6_ocorrencia', 'id_municipio_residencia', 'id_municipio_6_residencia', 'id_regional_saude_residencia',
# 'houve_qual_outra_violencia_sexual', 'meio_qual_outro', 'ocorreu_qual_outra', 'outro_local_ocorrencia', 'quais_outras_deficiencias_paciente', 'autor_relacao_outros'], axis=1)

# summary of the data
with pd.option_context('display.max_rows', 5, 'display.max_columns', None): 
    display(df)


In [9]:
# replace NaN with 0 (replace() returns a new dataframe, so assign the result)
df = df.replace(np.nan, 0)
df

Unnamed: 0.1,Unnamed: 0,ano,tipo_notificacao,id_uf_notificacao,id_uf_ocorrencia,local_ocorrencia,outras_vezes_ocorrencia,id_uf_residencia,idade_paciente,sexo_paciente,gestante_paciente,raca_paciente,escolaridade_paciente,ocupacao_paciente,estado_civil_paciente,orientacao_sexual_paciente,identidade_genero_paciente,motivacao_violencia,violencia_relacionada_trabalho,emitiu_cat,deficiencia_transtorno_paciente,deficiencia_fisica_paciente,deficiencia_mental_paciente,deficiencia_visual_paciente,deficiencia_auditiva_paciente,transtorno_mental_paciente,transtorno_comportamental_paciente,outras_deficiencias_paciente,lesao_autoprovocada,ocorreu_violencia_fisica,ocorreu_violencia_psicologica,ocorreu_tortura,ocorreu_violencia_sexual,ocorreu_trafico_ser_humano,ocorreu_violencia_financeira,ocorreu_negligencia_abandono,ocorreu_trabalho_infantil,ocorreu_intervencao_legal,ocorreu_outra_violencia,meio_forca,meio_enforcamento,meio_objeto_contundente,meio_objeto_perfurante,meio_objeto_quente,meio_envenenamento,meio_arma_fogo,meio_ameaca,meio_outros,houve_assedio,houve_estupro,houve_pornografia_infantil,houve_exploracao_sexual,houve_outra_violencia_sexual,profilaxia_dst,profilaxia_hiv,profilaxia_hepatite_b,coleta_sangue,coleta_semen,coleta_secrecao_vaginal,profilaxia_contraceptivo,aborto,numero_envolvidos_violencia,autor_pai,autor_mae,autor_padrasto,autor_madrasta,autor_conjugue,autor_ex_conjugue,autor_namorado_a,autor_ex_namorado_a,autor_filho_a,autor_desconhecido,autor_irmao,autor_conhecido,autor_cuidador,autor_patrao_chefe,autor_institucional,autor_policial,autor_propria_pessoa,autor_outros,autor_sexo,autor_usou_alcool,encaminhamento_saude,encaminhamento_assistencia_social,encaminhamento_educacao,encaminhamento_atendimento_mulher,encaminhamento_conselho_tutelar,encaminhamento_conselho_idoso,encaminhamento_delegacia_idoso,encaminhamento_direitos_humanos,encaminhamento_mpu,encaminhamento_delegacia_crianca,encaminhamento_delegacia_mulher,encaminhamento_delegacia,encaminhamento_justica_infancia_juventude,encaminhamento_defensoria_publica
0,0,2009,2,12,12.0,6.0,0.0,12,26.0,1.0,6.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,8.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,2009,2,27,27.0,0.0,0.0,27,27.0,1.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,8.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2,2009,2,27,27.0,1.0,0.0,27,20.0,1.0,6.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,8.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,8.0,8.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,3,2009,2,27,27.0,0.0,0.0,27,16.0,1.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,8.0,8.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,4,2009,2,27,27.0,0.0,0.0,27,12.0,1.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,8.0,8.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2299137,2299137,2016,2,33,33.0,1.0,1.0,33,40.0,0.0,0.0,4.0,3.0,513205.0,1.0,1.0,8.0,10.0,0.0,8.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
2299138,2299138,2016,2,33,33.0,9.0,1.0,33,29.0,0.0,5.0,4.0,5.0,999994.0,4.0,1.0,8.0,10.0,0.0,8.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
2299139,2299139,2016,2,33,33.0,1.0,0.0,33,27.0,1.0,6.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,8.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2299140,2299140,2016,2,33,33.0,1.0,1.0,33,11.0,0.0,5.0,4.0,3.0,999991.0,1.0,1.0,8.0,88.0,0.0,8.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [11]:
# separating labels and features
features = df.copy()
labels = features.pop('outras_vezes_ocorrencia')

In [12]:
from sklearn.model_selection import train_test_split

# test_size=0.6 holds out 60% of the rows for testing
Xtreino, Xteste, Ytreino, Yteste = train_test_split(features, labels, test_size=0.6, shuffle=True, random_state=0)

In [None]:
# optional transformation of the data format
# inputs = {}

# for name, column in features.items():
#  dtype = column.dtype
#   if dtype == object:
#     features.drop(name, axis=1)
#   else:
#     dtype = tf.float32
# 
#   inputs[name] = tf.keras.Input(shape=(1,), name=name, dtype=dtype)

In [13]:
# regression model: two Dense layers without activations, trained with mean squared error
# SGD as the optimizer
modelo = tf.keras.Sequential([
  layers.Dense(64),
  layers.Dense(1)
])

modelo.compile(loss = tf.losses.MeanSquaredError(),
                      optimizer = tf.optimizers.SGD())
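
The task is framed as binary classification, while this cell defines a linear model trained with mean squared error, so its output is not constrained to be a probability. As a point of comparison only, the NumPy sketch below shows how a sigmoid output and a binary cross-entropy loss behave. It is not the notebook's model, and the scores and labels are made-up values.

```python
import numpy as np

def sigmoid(z):
    """Map raw scores (logits) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and probabilities."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Hypothetical raw outputs of a final single-unit layer for four cases.
logits = np.array([-2.0, 0.0, 1.5, 3.0])
y_true = np.array([0, 0, 1, 1])

probs = sigmoid(logits)
loss = binary_cross_entropy(y_true, probs)
print(probs.round(3))
print(loss)
```

In Keras, the corresponding choice would be `layers.Dense(1, activation='sigmoid')` as the last layer together with `loss='binary_crossentropy'`.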

In [14]:
# fit the model with callbacks
lr_reducer = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',patience=2,factor=0.2)
early_stopper = tf.keras.callbacks.EarlyStopping(patience=5)
callbacks = [lr_reducer,early_stopper]

history = modelo.fit(Xtreino,
                    Ytreino,
                    validation_data=(Xteste,Yteste), 
                    callbacks=callbacks,
                    epochs=5,
                    batch_size=64,
                    verbose=1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
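
With `epochs=5` and `EarlyStopping(patience=5)`, the early-stopping callback cannot trigger: the first epoch always counts as an improvement, which leaves at most four epochs without one. The pure-Python function below is a simplified model of patience-based stopping, not Keras's implementation, and the validation losses are hypothetical.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs run before patience-based early stopping halts training."""
    best = float("inf")
    waited = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation losses that get worse after the first epoch.
losses = [0.30, 0.31, 0.32, 0.33, 0.34]

print(epochs_run(losses, patience=5))  # all five epochs run
print(epochs_run(losses, patience=2))  # stops after the third epoch
```

For the callback to matter, either the number of epochs would need to exceed the patience or the patience would need to be lowered.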


In [15]:
modelo.save('./pesos/')

INFO:tensorflow:Assets written to: ./pesos/assets


def show_cf(y_true, y_pred, class_names=None, model_name=None):
    """Plots a confusion matrix"""
    cf = confusion_matrix(y_true, y_pred)
    plt.imshow(cf, cmap=plt.cm.Blues)
    
    if model_name:
        plt.title("Confusion Matrix: {}".format(model_name))
    else:
        plt.title("Confusion Matrix")
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    
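    # use the caller's class names; otherwise derive them in sorted order,
    # which matches the label order used by confusion_matrix
    if class_names is None:
        class_names = sorted(set(y_true))
    tick_marks = np.arange(len(class_names))
    plt.xticks(tick_marks, class_names)
    plt.yticks(tick_marks, class_names)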
    
    thresh = cf.max() / 2.
    
    for i, j in itertools.product(range(cf.shape[0]), range(cf.shape[1])):
        plt.text(j, i, cf[i, j], horizontalalignment='center', color='white' if cf[i, j] > thresh else 'black')

    plt.colorbar()
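
`confusion_matrix` expects hard class labels, whereas a single-unit output layer produces continuous values, so predictions have to be thresholded before being passed to `show_cf`. A minimal sketch follows, with made-up scores standing in for something like `modelo.predict(Xteste).ravel()`. The 0.5 cut-off is an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical continuous model outputs and the matching true labels.
scores = np.array([0.1, 0.8, 0.4, 0.65, 0.2, 0.9])
y_true = np.array([0, 1, 1, 1, 0, 0])

# Threshold at 0.5 (an assumed cut-off) to obtain hard 0/1 predictions.
y_pred = (scores >= 0.5).astype(int)

# Rows are true classes, columns are predicted classes.
cf = confusion_matrix(y_true, y_pred)
print(cf)
```

The resulting `y_true` and `y_pred` arrays are the kind of input that `show_cf(y_true, y_pred)` is written to plot.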

In [16]:
modelo.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 64)                6144      
                                                                 
 dense_1 (Dense)             (None, 1)                 65        
                                                                 
Total params: 6,209
Trainable params: 6,209
Non-trainable params: 0
_________________________________________________________________
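
The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The input width of 95 used below is inferred from the summary itself, since 64 × (95 + 1) = 6,144. It is not stated elsewhere in the notebook.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: weights plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 95  # inferred from the summary: 6144 = 64 * (95 + 1)

hidden = dense_params(n_features, 64)  # first Dense layer
output = dense_params(64, 1)           # final single-unit layer
print(hidden, output, hidden + output)
```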
