# **Activity 11 - Part 2**
# **SVM Classifier**

Work developed for the course Ciência e Visualização de Dados em Saúde (Data Science and Visualization in Health)

Date: 28/06/2022

## Team

Bruna Almeida Osti - RA 231024

Fábio Fogliarini Brolesi - RA 023718

Ingrid Alves de Paiva Barbosa - RA 182849

## Contents

1. [Proposal and objective](#section1)
2. [SVM classifier](#section2)
3. [Analysis](#section3)
4. [Test](#section4)

## Proposal and objective <a name="section1"></a>

In this second part, the goal is to train an SVM classifier that labels an image containing a lesion as either "ischemic" (stroke, AVC) or "demyelinating" (multiple sclerosis, EM). The images and their corresponding masks in the TRAIN folder are used for training and validation; the test set is provided separately, without labels.


## SVM classifier <a name="section2"></a>

To meet these objectives, the classifier was built in the following steps:

1. Reading the images and applying normalization
2. Selecting the region of interest (ROI)
3. Extracting features
4. Preparing the training and validation datasets
5. Training and validation
6. Analyzing the results
7. Running the test with the best configuration

Based on the results of part 1 of this activity, the team intended to use normalization N2 (decimal scaling) and normalization N8 (an extra method that equalizes the means of all images) and compare them to identify which performs better. However, N8 turned out to be problematic: its intensity values range from 0 to roughly 600, exceeding the 8-bit limit of 255, which prevented some of the implemented methods from working.

Since N8 showed very good results in the reference articles, we decided to keep it, but to add an operation that brings its values back within the 0 to 255 range without losing the idea of equalizing the means. N2 (division by 1000, as implemented in the `N2` and `N8` functions below) was therefore applied after N8, and the result of this double normalization is compared with the result obtained without any normalization.
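The double normalization can be sketched as follows, assuming the images are already loaded as 2D grayscale NumPy arrays with non-zero mean. `n8_then_n2` is a hypothetical helper that mirrors `get_N8_ratio_value`, `N8` and `N2` defined further down; it only illustrates that, after N8, every image shares the same mean (that of the brightest image), and N2 then divides everything by 1000.

```python
import numpy as np


def n8_then_n2(images, scale=1000.0):
    """Sketch of N8 (mean equalization) followed by N2 (decimal scaling).

    Every image is multiplied by max_mean / image_mean, so all images end up
    with the mean of the brightest one; the result is then divided by `scale`.
    Assumes no image has mean intensity 0.
    """
    means = [float(np.mean(img)) for img in images]
    max_mean = max(means)
    normalized = []
    for img, mean in zip(images, means):
        equalized = img.astype(np.float64) * (max_mean / mean)  # N8
        normalized.append(equalized / scale)                     # N2
    return normalized


rng = np.random.default_rng(0)
# Three synthetic 8-bit "images" with different brightness levels
imgs = [rng.integers(1, 256, size=(8, 8)).astype(np.uint8) // k for k in (1, 2, 4)]
normed = n8_then_n2(imgs)
print([round(float(x.mean()), 4) for x in normed])  # the same mean for every image
```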

For the region of interest, the team was unsure whether the whole image (with the mask applied) would perform better or worse than an image cropped around the lesion with some dilation. Therefore, both options were also tested and compared.
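The two ROI options can be sketched as below, assuming a binary lesion mask with the same shape as the image. `roi_variants` is a hypothetical helper that follows the description above (mask applied to the whole image vs. dilated mask plus crop); the notebook's own version of the second option is `get_interest_image_data`.

```python
import numpy as np
from scipy import ndimage


def roi_variants(img, mask, dilation=10):
    """Return (masked whole image, dilated and cropped ROI) for a binary mask."""
    lesion = mask > 0
    # Variant 1: whole image, pixels outside the lesion mask set to 0
    masked_full = img * lesion
    # Variant 2: dilate the mask `dilation` times (iterations must be >= 1;
    # with iterations < 1 SciPy repeats the dilation until nothing changes)
    dilated = ndimage.binary_dilation(lesion, iterations=dilation)
    masked = img * dilated
    # Crop to the bounding box of the non-zero pixels
    nonzero = masked > 0
    cropped = masked[np.ix_(nonzero.any(axis=1), nonzero.any(axis=0))]
    return masked_full, cropped


img = np.full((50, 50), 100, dtype=np.uint8)
mask = np.zeros((50, 50), dtype=np.uint8)
mask[20:24, 20:24] = 255          # 4x4 synthetic lesion
full, crop = roi_variants(img, mask, dilation=3)
print(full.shape, crop.shape)     # (50, 50) (10, 10)
```

With the default cross-shaped structuring element, three dilations grow the 4x4 lesion by 3 pixels on each side, hence the 10x10 crop.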

The code developed for steps 1 to 5 above is presented below.

In [None]:
!pip install glrlm

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting glrlm
  Downloading glrlm-0.1.0-py3-none-any.whl (6.9 kB)
Installing collected packages: glrlm
Successfully installed glrlm-0.1.0


In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn import svm
import matplotlib.image as mpimg
from google.colab import drive
from google.colab import files
import glob, os
from skimage import io, color
import skimage.filters
from collections import Counter
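try:
    # scikit-image >= 0.19 renamed greycomatrix/greycoprops to graycomatrix/graycoprops
    from skimage.feature import graycomatrix as greycomatrix, graycoprops as greycoprops
except ImportError:
    from skimage.feature import greycomatrix, greycoprops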
from glrlm import GLRLM
from scipy import ndimage
from skimage.feature import local_binary_pattern
from scipy import stats
import pandas as pd
from tqdm import tqdm
from tqdm import trange
#from pycaret.regression import *
import imageio
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, roc_auc_score, precision_score, recall_score, f1_score
from sklearn.model_selection import GridSearchCV
import pickle
import joblib
from PIL import Image

Connecting to Google Drive

In [None]:
# connect drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Helper functions

In [None]:
def _normalize_image(img):

  # Min-max normalization to the 0-255 range
  # (not used in the pipeline below; assumes img is not constant)
  constant = (255 - 0) / (img.max() - img.min())
  return (img - img.min()) * constant


def get_glrlm_run_lenght_matrix(img):
  app = GLRLM()
  glrlm = app.get_features(img, 8)
  #- SRE = Short Run Emphasis
  #- LRE = Long Run Emphasis
  #- GLU = Grey Level Uniformity
  #- RLU = Run Length Uniformity
  #- RPC = Run Percentage
  res = glrlm.Features
  return {
      'SRE': res[0],
      'LRE': res[1],
      'GLU': res[2],
      'RLU': res[3],
      'RPC': res[4]
  }
  

def get_lbp_pattern(f, p=8, r=1, method='default'):
    """
    Retorna o LBP de uma imagem

    Parameters
    ----------
    f : array
        matriz da imagem.
    p : int, optional
        Número de pontos de ajuste vizinhos circularmente simétricos
        (quantização do espaço angular). O padrão é 8.
    r : float, optional
        Raio do círculo (resolução espacial do operador). O padrão é 1.

    Returns
    -------
    patterns : array
            imagem LBP.
    """

    patterns = local_binary_pattern(f, p, r, method)  
    return patterns


def get_image_metrics(f, nbins=20, min_hist=0,  max_hist=255, prefix=''):
  h, bin_edges = np.histogram(f, nbins, (min_hist, max_hist))
  #bin_centers = bin_edges[1:]-(w/2)
  #mean value
  mean= np.mean(f)
  #median value
  median = np.median(f)
  #mode value
  # mode= stats.mode(f)
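  # Kurtosis (computed on the histogram counts `h`, not on the pixel values)
  kurtosis = stats.kurtosis(h)
  # Skewness (also computed on the histogram counts `h`)
  skewness = stats.skew(h) 
  # Standard deviation
  std = np.std(f)
  # Variance
  var_ = np.var(f)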
  return {
      prefix + 'Mean': mean,
      prefix + 'Median': median,
      # prefix + 'Mode': mode.mode[0],
      prefix + 'Kurtosis': kurtosis,
      prefix + 'Skewness': skewness,
      prefix + "Var": var_,
      prefix + "Std": std}

def get_co_occurrence_matrix(image, prefix='', distance=1, angle=np.pi/4):
  # GLCM for one distance and one angle (256 grey levels). The image is cast
  # to uint8, so non-negative float values below 1 (e.g. after N8 + N2) all become 0
  glcm = greycomatrix(image.astype(np.uint8),
                      [distance], [angle], 256,
                      symmetric=True, normed=True)

  diss = greycoprops(glcm, 'dissimilarity')
  cont = greycoprops(glcm, 'contrast')
  eng = greycoprops(glcm, 'energy')
  corr = greycoprops(glcm, 'correlation')
  ASM = greycoprops(glcm, 'ASM')
  homo = greycoprops(glcm, 'homogeneity')
  #full_data = np.concatenate((diss[0], cont[0], eng[0], corr[0], ASM[0], homo[0]), axis=0)
  return {prefix + 'diss': diss[0][0],
          prefix + 'cont': cont[0][0],
          prefix + 'eng': eng[0][0],
          prefix + 'corr': corr[0][0],
          prefix + 'ASM': ASM[0][0],
          prefix + 'homo': homo[0][0]}


def get_real_file(f):
  if os.path.exists(f + '.bmp'):
    return f + '.bmp'
  if os.path.exists(f + '.png'):
    return f + '.png'
  print('ERROR - File ' + f + ' not found')
  return None

def get_N8_ratio_value(avc_default_figures, em_default_figures):
  # Returns the largest mean intensity over all AVC and EM images (used as `max_mean` by N8)
  media_avc = []
  for i in range(len(avc_default_figures)):
    img_array = np.array(Image.open(avc_default_figures[i]).convert('L'))
    media_atual = np.mean(img_array)
    media_avc.append(media_atual)
  
  media_em = []
  for i in range(len(em_default_figures)):
    img_array = np.array(Image.open(em_default_figures[i]).convert('L'))
    media_atual = np.mean(img_array)
    media_em.append(media_atual)
  return max(media_em + media_avc)

def N2(img_array):
    # N2 (decimal scaling): divide the intensities by 1000
    img_N2 = (img_array/1000)
    return img_N2


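def N8(img_array):
  # N8: rescale the image so its mean equals `max_mean` (global value computed
  # by get_N8_ratio_value), then apply N2 (division by 1000)
  global max_mean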
  ratio = max_mean/np.mean(img_array)
  img_N8 = img_array * ratio
  return img_N8 / 1000

def crop_image(img,tol=0):
    # img is 2D image data
    # tol  is tolerance
    mask = img > tol
    return img[np.ix_(mask.any(1), mask.any(0))]


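def get_interest_image_data(f, f_mask, dilation=0):
  # Dilate the mask, apply it to the image and crop to the non-zero bounding box.
  # With iterations < 1, scipy repeats the dilation until the result stops
  # changing, so pass dilation >= 1 (full_image_list uses 10)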
  f_masked = f * ndimage.binary_dilation(f_mask, iterations=dilation)
  cropped = crop_image(f_masked)
  return cropped

def get_files_list_and_rejects(folder):
  # Retrieve the images: the [!mask] class rejects names whose character right
  # before the "." is m, a, s or k, which filters out the *_mask files
  figures = [i.split('/')[-1] for i in glob.glob(folder + "/*[!mask].*")]
  # Retrieve only the masks
  figures_mask = [i.split('/')[-1] for i in glob.glob(folder + "/*mask.*")]
  files = [i.lower().split('.')[0] for i in figures]
  mask_files = [i.lower().split('_mask')[0] for i in figures_mask]
  # Intersection between images and masks
  default_figures = [value for value in mask_files if value in files]
  # Files without a mask (or whose name did not match exactly)
  without_mask = list(set(files) - set(default_figures))
  path_default_figures = [get_real_file(folder + os.sep + i.upper()) for i in default_figures]
  path_default_figures_mask = [get_real_file(folder + os.sep + i.upper() + '_mask') for i in default_figures]
  path_without_mask = [get_real_file(folder + os.sep + i.upper()) for i in without_mask]
  return path_default_figures, path_default_figures_mask, path_without_mask
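The `[!mask]` character class in the glob above excludes any file whose character right before the extension dot is one of the lowercase letters `m`, `a`, `s` or `k`, not only the `*_mask` files. A more explicit pairing could look like the sketch below, assuming masks are always named `<image name>_mask.<ext>` (extension may differ from the image) and that no two images share a name with different extensions; `pair_images_and_masks` is a hypothetical helper, not part of the notebook.

```python
import tempfile
from pathlib import Path


def pair_images_and_masks(folder):
    """Pair `<name>.<ext>` images with `<name>_mask.<ext>` masks (case-insensitive).

    Returns (pairs, without_mask), where pairs is a list of
    (image_path, mask_path) tuples sorted by image name.
    """
    images, masks = {}, {}
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        stem = path.stem.lower()
        if stem.endswith('_mask'):
            masks[stem[:-len('_mask')]] = path
        else:
            images[stem] = path
    pairs = [(images[k], masks[k]) for k in sorted(images) if k in masks]
    without_mask = sorted(images[k] for k in images if k not in masks)
    return pairs, without_mask


with tempfile.TemporaryDirectory() as tmp:
    for name in ['6_1.bmp', '6_1_mask.png', '7_2.png']:
        (Path(tmp) / name).touch()
    pairs, lonely = pair_images_and_masks(tmp)
    print([(a.name, b.name) for a, b in pairs])  # [('6_1.bmp', '6_1_mask.png')]
    print([p.name for p in lonely])              # ['7_2.png']
```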

## Building the file lists

### AVC

In [None]:
avc_figures_path = 'drive/MyDrive/Train/AVC'
avc_default_figures, avc_default_figures_mask, avc_without_mask = get_files_list_and_rejects(avc_figures_path)

# Files with an associated mask: `avc_default_figures`
# Mask files: `avc_default_figures_mask`
# Files without a mask: `avc_without_mask`
print(len(avc_default_figures), len(avc_default_figures_mask), len(avc_without_mask))

511 511 964


### EM

In [None]:
em_figures_path = 'drive/MyDrive/Train/EM'
em_default_figures, em_default_figures_mask, em_without_mask = get_files_list_and_rejects(em_figures_path)

# Files with an associated mask: `em_default_figures`
# Mask files: `em_default_figures_mask`
# Files without a mask: `em_without_mask`
print(len(em_default_figures), len(em_default_figures_mask), len(em_without_mask))

628 628 1716


# Processing

In [None]:
max_mean = get_N8_ratio_value(avc_default_figures, em_default_figures)

## AVC

In [None]:
def full_image_list(default_figures, default_figures_mask, type_image, norm=True, dilatation=True):
  print("NORM:", norm, "DILATATION:", dilatation)
  full_list = []
  for i in range(len(default_figures)):
    current_image = np.array(Image.open(default_figures[i]).convert('L'))
    current_image_mask =  np.array(Image.open(default_figures_mask[i]).convert('L'))
    # Binarize the mask with Otsu's threshold
    t = skimage.filters.threshold_otsu(current_image_mask)
    current_image_mask = (current_image_mask > t).astype(int)
    
    # Image normalization (N8 followed by N2)
    if norm:
      current_image = N8(current_image)

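    # Region of interest: dilated (10 iterations) and cropped mask region or,
    # when dilatation=False, the whole image (note: the mask is not applied in this branch)
    if dilatation:
      interest_image_data = get_interest_image_data(current_image, current_image_mask, dilation=10)
    else:
      interest_image_data = current_image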

    # Feature extraction: GLRLM, first-order statistics, GLCM and LBP statistics
    glrlm_run_lenght = get_glrlm_run_lenght_matrix(interest_image_data)
    image_metrics = get_image_metrics(interest_image_data)
    co_occurrence_matrix = get_co_occurrence_matrix(interest_image_data)

    lbp81 = get_lbp_pattern(interest_image_data)
    lbp81_image_metrics = get_image_metrics(lbp81, prefix='lbp81')
    

    # Build the row (dict) that will become part of the DataFrame
    cur_dict = {'type': type_image}
    cur_dict.update({'file': default_figures[i].upper()})
    cur_dict.update(glrlm_run_lenght)
    cur_dict.update(image_metrics)
    cur_dict.update(co_occurrence_matrix)
    cur_dict.update(lbp81_image_metrics)
    full_list.append(cur_dict)

  return full_list


##### **Normalization and ROIs with dilatation**

In [None]:
full_avc_list_norm_roi = []
full_avc_list_norm_roi = full_image_list(avc_default_figures, avc_default_figures_mask, 'AVC')

NORM: True DILATATION: True


##### **No normalization and ROIs with dilatation**

In [None]:
full_avc_list_roi = []
full_avc_list_roi = full_image_list(avc_default_figures, avc_default_figures_mask, 'AVC', norm=False)

NORM: False DILATATION: True


##### **Normalization with complete images**

In [None]:
full_avc_list_norm = []
full_avc_list_norm = full_image_list(avc_default_figures, avc_default_figures_mask, 'AVC', dilatation=False)

NORM: True DILATATION: False


##### **No normalization and complete images**

In [None]:
full_avc_list = []
full_avc_list = full_image_list(avc_default_figures, avc_default_figures_mask, 'AVC', norm=False, dilatation=False)

NORM: False DILATATION: False


## EM

##### **Normalization and ROIs with dilatation**

In [None]:
full_em_list_norm_roi = []
full_em_list_norm_roi = full_image_list(em_default_figures, em_default_figures_mask, 'EM')

NORM: True DILATATION: True


##### **No normalization and ROIs with dilatation**

In [None]:
full_em_list_roi = []
full_em_list_roi = full_image_list(em_default_figures, em_default_figures_mask, 'EM', norm=False)

NORM: False DILATATION: True


##### **Normalization with complete images**

In [None]:
full_em_list_norm = []
full_em_list_norm = full_image_list(em_default_figures, em_default_figures_mask, 'EM', dilatation=False)

NORM: True DILATATION: False


##### **No normalization and complete images**

In [None]:
full_em_list = []
full_em_list = full_image_list(em_default_figures, em_default_figures_mask, 'EM', norm=False, dilatation=False)

NORM: False DILATATION: False


## Merging and creating the datasets

In [None]:
df_norm_roi = pd.DataFrame(full_avc_list_norm_roi + full_em_list_norm_roi)
df_roi = pd.DataFrame(full_avc_list_roi + full_em_list_roi)
df_norm = pd.DataFrame(full_avc_list_norm + full_em_list_norm)
df = pd.DataFrame(full_avc_list + full_em_list)

In [None]:
df_norm_roi.to_csv('df_norm_roi.csv', index=False)
df_roi.to_csv('df_roi.csv', index=False)
df_norm.to_csv('df_norm.csv', index=False)
df.to_csv('df.csv', index=False)

files.download('df_norm_roi.csv')
files.download('df_roi.csv')
files.download('df_norm.csv')
files.download('df.csv')



# Processing

## Dataset adjustment

In [None]:
# Patient code taken from the file name (text before the first "_"); the four
# DataFrames list the same files in the same order, so this list fits all of them
patient = [(x.split(os.sep)[-1]).split('_')[0] for x in df['file']]

def adjustment(df_):
  # Create the `patient` feature with the patient code taken from the file name
  
  df_['patient'] = patient

  # Create a binary feature (0 or 1): 1 for multiple sclerosis (EM), 0 for AVC
  is_em = [int(x == 'EM') for x in df_['type']]
  df_['is_em'] = is_em

  # Drop `file` (used to retrieve the patient code) and `type` (used to tell EM from AVC)
  df_ = df_.drop(['file', "type"], axis=1)

  return df_

In [None]:
df_norm_roi = adjustment(df_norm_roi)
df_roi = adjustment(df_roi)
df_norm = adjustment(df_norm)
df = adjustment(df)

In [None]:
df.head()

| | SRE | LRE | GLU | RLU | RPC | Mean | Median | Kurtosis | Skewness | Var | ... | ASM | homo | lbp81Mean | lbp81Median | lbp81Kurtosis | lbp81Skewness | lbp81Var | lbp81Std | patient | is_em |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.12 | 5110.618 | 6843.246 | 14129.76 | 26.342 | 1.446259 | 0.0 | 15.052632 | 4.129483 | 3.706241 | ... | 0.340433 | 0.941844 | 237.632732 | 255.0 | 14.966843 | 4.113127 | 2826.246237 | 53.162451 | 6 | 0 |
| 1 | 1.177 | 6081.51 | 7405.588 | 13122.623 | 24.312 | 1.256084 | 0.0 | 15.052632 | 4.129483 | 3.262176 | ... | 0.377317 | 0.946366 | 239.082607 | 255.0 | 14.983729 | 4.116335 | 2615.159234 | 51.138628 | 6 | 0 |
| 2 | 1.086 | 8340.93 | 9013.865 | 9737.389 | 20.006 | 1.055477 | 0.0 | 15.052632 | 4.129483 | 2.73829 | ... | 0.425781 | 0.955761 | 241.907307 | 255.0 | 15.007239 | 4.120811 | 2192.374522 | 46.822799 | 6 | 0 |
| 3 | 1.534 | 1906.314 | 4057.403 | 26278.653 | 38.915 | 1.810508 | 1.0 | 15.052632 | 4.129483 | 4.081953 | ... | 0.220064 | 0.905396 | 228.47036 | 255.0 | 14.802667 | 4.082255 | 3901.152047 | 62.459203 | 7 | 0 |
| 4 | 1.476 | 2022.739 | 4261.644 | 23671.559 | 37.079 | 1.859826 | 1.0 | 15.052632 | 4.129483 | 4.233676 | ... | 0.227785 | 0.909499 | 229.629089 | 255.0 | 14.821754 | 4.085811 | 3778.1416 | 61.466589 | 7 | 0 |


### Splitting into training and test sets
Since keeping the same patient in both training and test is not appropriate, we first split the patients and only then split the observations accordingly.
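The same patient-level separation can also be expressed with scikit-learn's `GroupShuffleSplit`, which treats the patient code as a group so that all images of a patient land on the same side of the split. A sketch with hypothetical data (10 images from 5 patients):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical data: 10 images from 5 patients
patients = np.array(['1', '1', '2', '2', '3', '3', '4', '4', '5', '5'])
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 1, 1, 0, 0, 1, 1, 0, 0])

splitter = GroupShuffleSplit(n_splits=1, train_size=0.8, random_state=4)
train_idx, test_idx = next(splitter.split(X, y, groups=patients))

# No patient appears on both sides of the split
print(set(patients[train_idx]) & set(patients[test_idx]))  # set()
```

Note that `train_size` here refers to the fraction of patients (groups), not of images, which matches the approach taken in this notebook.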

In [None]:
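# sorted() makes the split reproducible: the iteration order of a set of strings changes between Python sessions
patients_train, patients_test = train_test_split(sorted(set(patient)), train_size=.80, shuffle=True, random_state=4) # 1 or 15 are other acceptable random_state values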

In [None]:
df_norm_roi_train, df_norm_roi_test = df_norm_roi[df_norm_roi['patient'].isin(patients_train)], df_norm_roi[df_norm_roi['patient'].isin(patients_test)]
df_roi_train, df_roi_test = df_roi[df_roi['patient'].isin(patients_train)], df_roi[df_roi['patient'].isin(patients_test)]
df_norm_train, df_norm_test = df_norm[df_norm['patient'].isin(patients_train)], df_norm[df_norm['patient'].isin(patients_test)]
df_train, df_test = df[df['patient'].isin(patients_train)], df[df['patient'].isin(patients_test)]

In [None]:
print(df_roi_train['is_em'].value_counts())
print(df_roi_test['is_em'].value_counts())

1    493
0    410
Name: is_em, dtype: int64
1    135
0    101
Name: is_em, dtype: int64


In [None]:
# Reassign instead of dropping in place: these DataFrames are slices of the full
# datasets, and an in-place drop on them triggers pandas' SettingWithCopyWarning
df_norm_roi_train = df_norm_roi_train.drop(columns=['patient'])
df_norm_roi_test = df_norm_roi_test.drop(columns=['patient'])

df_roi_train = df_roi_train.drop(columns=['patient'])
df_roi_test = df_roi_test.drop(columns=['patient'])

df_norm_train = df_norm_train.drop(columns=['patient'])
df_norm_test = df_norm_test.drop(columns=['patient'])

df_train = df_train.drop(columns=['patient'])
df_test = df_test.drop(columns=['patient'])


In [None]:
X_norm_roi_train, y_norm_roi_train = df_norm_roi_train.drop(['is_em'], axis=1), df_norm_roi_train[['is_em']].reset_index(drop=True)
X_roi_train, y_roi_train = df_roi_train.drop(['is_em'], axis=1), df_roi_train[['is_em']].reset_index(drop=True)
X_norm_train, y_norm_train = df_norm_train.drop(['is_em'], axis=1), df_norm_train[['is_em']].reset_index(drop=True)
X_train, y_train = df_train.drop(['is_em'], axis=1), df_train[['is_em']].reset_index(drop=True)

X_norm_roi_test, y_norm_roi_test = df_norm_roi_test.drop(['is_em'], axis=1), df_norm_roi_test[['is_em']].reset_index(drop=True)
X_roi_test, y_roi_test = df_roi_test.drop(['is_em'], axis=1), df_roi_test[['is_em']].reset_index(drop=True)
X_norm_test, y_norm_test = df_norm_test.drop(['is_em'], axis=1), df_norm_test[['is_em']].reset_index(drop=True)
X_test, y_test = df_test.drop(['is_em'], axis=1), df_test[['is_em']].reset_index(drop=True)

In [None]:
X_norm_roi_train

| | SRE | LRE | GLU | RLU | RPC | Mean | Median | Kurtosis | Skewness | Var | ... | eng | corr | ASM | homo | lbp81Mean | lbp81Median | lbp81Kurtosis | lbp81Skewness | lbp81Var | lbp81Std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.008 | 469527.692 | 221840.097 | 853.082 | 0.696 | 0.0 | 0.0 | 15.052632 | 4.129483 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 255.0 | 255.0 | 15.052632 | 4.129483 | 0.0 | 0.0 |
| 1 | 0.034 | 25286.052 | 18785.248 | 219.918 | 0.604 | 0.0 | 0.0 | 15.052632 | 4.129483 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 255.0 | 255.0 | 15.052632 | 4.129483 | 0.0 | 0.0 |
| 2 | 0.038 | 20687.542 | 10287.635 | 182.914 | 0.685 | 0.0 | 0.0 | 15.052632 | 4.129483 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 255.0 | 255.0 | 15.052632 | 4.129483 | 0.0 | 0.0 |
| 3 | 0.040 | 18401.866 | 8684.988 | 171.734 | 0.697 | 0.0 | 0.0 | 15.052632 | 4.129483 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 255.0 | 255.0 | 15.052632 | 4.129483 | 0.0 | 0.0 |
| 4 | 0.018 | 83931.000 | 67504.255 | 408.272 | 0.589 | 0.0 | 0.0 | 15.052632 | 4.129483 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 255.0 | 255.0 | 15.052632 | 4.129483 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1117 | 0.030 | 33190.334 | 12788.852 | 227.000 | 0.745 | 0.0 | 0.0 | 15.052632 | 4.129483 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 255.0 | 255.0 | 15.052632 | 4.129483 | 0.0 | 0.0 |
| 1118 | 0.030 | 30862.814 | 15192.969 | 222.364 | 0.688 | 0.0 | 0.0 | 15.052632 | 4.129483 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 255.0 | 255.0 | 15.052632 | 4.129483 | 0.0 | 0.0 |
| 1119 | 0.024 | 47572.428 | 20383.137 | 271.902 | 0.718 | 0.0 | 0.0 | 15.052632 | 4.129483 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 255.0 | 255.0 | 15.052632 | 4.129483 | 0.0 | 0.0 |
| 1120 | 0.028 | 36896.676 | 20516.736 | 247.206 | 0.661 | 0.0 | 0.0 | 15.052632 | 4.129483 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 255.0 | 255.0 | 15.052632 | 4.129483 | 0.0 | 0.0 |


### Transformations

In [None]:
scaler = StandardScaler()

X_norm_roi_train = scaler.fit_transform(X_norm_roi_train)
X_norm_roi_test = scaler.transform(X_norm_roi_test)

In [None]:
X_roi_train = scaler.fit_transform(X_roi_train)
X_roi_test = scaler.transform(X_roi_test)

In [None]:
X_norm_train = scaler.fit_transform(X_norm_train)
X_norm_test = scaler.transform(X_norm_test)

In [None]:
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

### PCA

In [None]:
# pca = PCA(n_components=.95) # keeps the components that explain 95% of the variance

In [None]:
# X_norm_roi_train = pca.fit_transform(X_norm_roi_train)
# X_norm_roi_test = pca.transform(X_norm_roi_test)
# print("X_norm_roi_train", X_norm_roi_train.shape, pca.explained_variance_ratio_, abs(pca.components_) )

# X_roi_train = pca.fit_transform(X_roi_train)
# X_roi_test = pca.transform(X_roi_test)
# print("X_roi_train", X_roi_train.shape, pca.explained_variance_ratio_, abs(pca.components_) )

# X_norm_train = pca.fit_transform(X_norm_train)
# X_norm_test = pca.transform(X_norm_test)
# print("X_norm_train", X_norm_train.shape, pca.explained_variance_ratio_, abs(pca.components_) )

# X_train = pca.fit_transform(X_train)
# X_test = pca.transform(X_test)
# print("X_train", X_train.shape, pca.explained_variance_ratio_, abs(pca.components_) )

### SVM

#### Training and testing

In [None]:
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001], 'kernel': ['rbf', 'poly', 'sigmoid', 'linear']}

In [None]:
# Grid search with 5-fold cross-validation; within the training set, images of
# the same patient may fall in both the training and the validation folds
def grid_search(x_train, y_train, x_test, y_test, params, filename):
  grid = GridSearchCV(SVC(), params, refit=True, verbose=2, cv=5)
  grid.fit(x_train, np.ravel(y_train))

  # Best estimator
  print(grid.best_estimator_)

  # Best hyperparameters
  print(grid.best_params_)

  # Predictions and metrics on the held-out patients
  grid_predictions = grid.predict(x_test)
  csv_metrics = {'test': filename,
                 'acc': accuracy_score(y_test, grid_predictions),
                 'precision': precision_score(y_test, grid_predictions),
                 'recall': recall_score(y_test, grid_predictions),
                 'f1': f1_score(y_test, grid_predictions),
                 'confusion_matrix': confusion_matrix(y_test, grid_predictions)}

  # Save the fitted GridSearchCV object (best SVM refitted on the whole
  # training set) to <filename>.pkl
  with open('{}.pkl'.format(filename), 'wb') as file:
    pickle.dump(grid, file)
    #files.download('{}.pkl'.format(filename))

  return csv_metrics
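As the comment above `grid_search` notes, the inner 5-fold split can place images of the same patient in both the training and validation folds; in addition, the scaler is fitted on the whole training set before the folds are built. One way to avoid both is sketched below, assuming the patient code is kept as a separate `groups` array instead of being dropped (data, grid and names are hypothetical):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical data: 60 images from 12 patients (5 images each), 4 features
groups = np.repeat(np.arange(12), 5)
y = np.repeat(np.arange(12) % 2, 5)            # one diagnosis per patient
X = rng.normal(size=(60, 4)) + y[:, None]      # class 1 shifted by +1

pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {'svc__C': [0.1, 1, 10], 'svc__kernel': ['rbf', 'linear']}

# GroupKFold keeps all images of a patient inside the same fold, and the
# pipeline refits the scaler on the training part of each fold only
grid = GridSearchCV(pipe, param_grid, cv=GroupKFold(n_splits=5))
grid.fit(X, y, groups=groups)
print(grid.best_params_, round(grid.best_score_, 3))
```

Inside a pipeline, the SVC hyperparameters must be prefixed with the step name (`svc__`), which is why the grid keys differ from `param_grid` in the notebook.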

In [None]:
csv_norm_roi = grid_search(X_norm_roi_train, y_norm_roi_train, X_norm_roi_test, y_norm_roi_test, param_grid, "norm_roi")
csv_roi = grid_search(X_roi_train, y_roi_train, X_roi_test, y_roi_test, param_grid, "roi")
csv_norm = grid_search(X_norm_train, y_norm_train, X_norm_test, y_norm_test, param_grid, "norm")
csv_x = grid_search(X_train, y_train, X_test, y_test, param_grid, "x")

Fitting 5 folds for each of 64 candidates, totalling 320 fits
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.1s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.0s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.1s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.0s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.1s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.1s
[CV] END .....................C=0.1, gamma=1, kernel=sigmoid; total time=   0.0s
[CV] END .....................C=0.1, gamma=1, k

In [None]:
print('csv_norm_roi', csv_norm_roi)
print('csv_roi', csv_roi)
print('csv_norm', csv_norm)
print('csv_x', csv_x)

csv_norm_roi {'test': 'norm_roi', 'acc': 0.652542372881356, 'precision': 0.6464088397790055, 'recall': 0.8666666666666667, 'f1': 0.7405063291139241, 'confusion_matrix': array([[ 37,  64],
       [ 18, 117]])}
csv_roi {'test': 'roi', 'acc': 0.9152542372881356, 'precision': 0.96, 'recall': 0.8888888888888888, 'f1': 0.923076923076923, 'confusion_matrix': array([[ 96,   5],
       [ 15, 120]])}
csv_norm {'test': 'norm', 'acc': 0.847457627118644, 'precision': 0.8413793103448276, 'recall': 0.9037037037037037, 'f1': 0.8714285714285713, 'confusion_matrix': array([[ 78,  23],
       [ 13, 122]])}
csv_x {'test': 'x', 'acc': 0.940677966101695, 'precision': 0.991869918699187, 'recall': 0.9037037037037037, 'f1': 0.945736434108527, 'confusion_matrix': array([[100,   1],
       [ 13, 122]])}
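The reported metrics can be checked directly from each confusion matrix, with label 1 (EM) as the positive class and scikit-learn's layout (rows are true labels, columns are predictions). `metrics_from_confusion` is a hypothetical helper used only for this check:

```python
import numpy as np


def metrics_from_confusion(cm):
    """Accuracy, precision, recall and F1 for the positive class (label 1 = EM).

    `cm` follows scikit-learn's layout: rows are true labels, columns are
    predicted labels, both ordered [0, 1].
    """
    (tn, fp), (fn, tp) = np.asarray(cm)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        'acc': (tp + tn) / (tn + fp + fn + tp),
        'precision': precision,
        'recall': recall,
        'f1': 2 * precision * recall / (precision + recall),
    }


# Confusion matrix reported above for the "x" configuration
print(metrics_from_confusion([[100, 1], [13, 122]]))
```

For the "x" configuration, for example, accuracy = 222/236 ≈ 0.941 and recall = 122/135 ≈ 0.904, matching the values printed above.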


# Test <a name="section4"></a>

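The best classifier, obtained without normalization and with complete images (configuration `x`: accuracy 0.941 and F1 0.946 on the held-out patients), is now run on the test set provided by the professor, which has no labels. The result is stored in a .csv file for later analysis and computation of the accuracy rate.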
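For the unlabeled test set, the features should be scaled with the statistics learned on the training data rather than with a `StandardScaler` refitted on the test images. One way to guarantee this, sketched below under the assumption that the model is retrained as a scikit-learn `Pipeline`, is to persist scaler and classifier together with `joblib` (data, hyperparameters and file name are hypothetical):

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical training features/labels and unlabeled test features
X_train = rng.normal(size=(40, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_unlabeled = rng.normal(loc=0.5, size=(10, 3))

# Scaler and SVM are fitted together, on the training data only
model = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1)).fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'model.joblib'
    joblib.dump(model, str(path))
    loaded = joblib.load(str(path))
    # The loaded pipeline scales the test data with the *training* mean and std
    predictions = loaded.predict(X_unlabeled)

labels = np.array(['AVC', 'EM'])[predictions]
print(labels)
```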

In [None]:
def get_files_list_and_rejects_test(folder):
  # Retrieve the images (the [!mask] character class filters out the *_mask files)
  figures = [i.split('/')[-1] for i in glob.glob(folder + "/*[!mask].*")]
  # Retrieve only the masks
  figures_mask = [i.split('/')[-1] for i in glob.glob(folder + "/*mask.*")]
  files = [i.lower().split('.')[0] for i in figures]
  mask_files = [i.lower().split('_mask')[0] for i in figures_mask]
  # Intersection between images and masks
  default_figures = [value for value in mask_files if value in files]
  print(len(figures))
  # Files without a mask (or whose name did not match exactly)
  path_default_figures = [get_real_file(folder + os.sep + i.upper()) for i in default_figures]
  path_default_figures_mask = [get_real_file(folder + os.sep + i.upper() + '_mask') for i in default_figures]
  path_without_mask = [get_real_file(folder + os.sep + i.upper()) for i in without_mask]
  return path_default_figures, path_default_figures_mask, path_without_mask

In [None]:
test_folder = '/content/drive/Shareddrives/dsfh_/Test'
test_default_figures, test_default_figures_mask, test_without_mask = get_files_list_and_rejects_test(test_folder)

225


In [None]:
full_test_list = []
# Same configuration as the best model ("x"): no normalization, complete images.
# For the other configurations, e.g.: full_image_list(test_default_figures, test_default_figures_mask, '')
full_test_list = full_image_list(test_default_figures, test_default_figures_mask, '', norm=False, dilatation=False)


NORM: False DILATATION: False


In [None]:
len(full_test_list)

225

In [None]:
df_test = pd.DataFrame(full_test_list)

In [None]:
test_patients = [i.split('/')[-1].split('_')[0] for i in df_test['file']]

In [None]:
df_test = df_test.drop(['type', 'file'], axis=1)

In [None]:
df_test.shape

(225, 23)

In [None]:
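# Reuse the scaler fitted on the training features of the "x" configuration
# (last scaler.fit_transform(X_train) call above) instead of refitting it on the test images
X_test = scaler.transform(df_test)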

In [None]:
# pca = PCA(n_components=4) # keeps 4 components

In [None]:
# X_test = pca.fit_transform(X_test)

In [None]:
X_test.shape

(225, 23)

In [None]:
with open('/content/x.pkl', 'rb') as model_file:
  loaded_model = pickle.load(model_file)

In [None]:
y_test_prediction = loaded_model.predict(X_test)

In [None]:
res = []
result_em = ['AVC', 'EM']
for a, b in zip(test_patients, y_test_prediction):
  res.append([a, result_em[b]])

In [None]:
pd.DataFrame(res).to_csv('outcome_x.csv', index=False, sep=" ", header=False)
files.download('outcome_x.csv')
