# [Lymphography Dataset](https://archive.ics.uci.edu/dataset/63/lymphography)

This lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. 

### Dataset structure

| Target  | Values |
| ------------- | ------------- |
|class| {normal find, metastases, malign lymph, fibrosis}| 

| Features  | Values |
| ------------- | ------------- |
|lymphatics| { normal, arched, deformed, displaced} |
|block of affere| { no, yes} |
|bl. of lymph. c| { no, yes} |
|bl. of lymph. s| { no, yes} |
|by pass| { no, yes} |
|extravasates| { no, yes} |
|regeneration of| { no, yes} |
|early uptake in| { no, yes} |
|lym.nodes dimin| { 0-3} |
|lym.nodes enlar| { 1-4} |
|changes in lym.| { bean, oval, round} |
|defect in node| { no, lacunar, lac. marginal, lac. central} |
|changes in node| { no, lacunar, lac. margin, lac. central} |
|changes in stru| { no, grainy, drop-like, coarse, diluted, reticular, stripped, faint} |
|special forms| { no, chalices, vesicles} |
|dislocation of| { no, yes} |
|exclusion of no| { no, yes} |
|no. of nodes in| { 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, >=70} |

### Installing dependencies

In [301]:
# !pip install ucimlrepo
# !pip install tensorflow
# !pip install scikit-learn
# !pip install pandas

In [303]:
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Flatten, Dense, Dropout

import pandas as pd

import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import LabelEncoder

from ucimlrepo import fetch_ucirepo 
pd.options.mode.copy_on_write = True

### Loading the dataset

In [304]:
# fetch dataset 
dataset = fetch_ucirepo(id=63) 
  
# data (as pandas dataframes) 
X = dataset.data.features 
Y = dataset.data.targets 

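# Map each class label to an integer code, in order of first appearance.
# factorize builds the mapping once, so a label that was already remapped
# cannot be matched again (a risk when the raw labels are themselves integers).
Y['class'] = pd.factorize(Y['class'])[0]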
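
The label-to-integer step can be sketched without pandas. The snippet below is a minimal illustration on a small made-up list of labels (not the real dataset), and `encode_labels` is a hypothetical helper. It builds the mapping once, in order of first appearance, and then applies it in a single pass, so a value that was already remapped is never matched a second time.

```python
def encode_labels(labels):
    """Map each distinct label to an integer, in order of first appearance."""
    mapping = {}
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)
    # Apply the finished mapping in one pass over the original values.
    return [mapping[label] for label in labels], mapping


# Made-up labels for illustration only. Integer labels are the tricky case
# for in-place remapping, because new codes can equal old labels.
raw = [3, 2, 4, 2, 1, 3]
codes, mapping = encode_labels(raw)
print(codes)    # [0, 1, 2, 1, 3, 0]
print(mapping)  # {3: 0, 2: 1, 4: 2, 1: 3}
```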


### One-hot encoding of the features

In [305]:
cat_features = list(X.columns)

In [306]:
X = pd.get_dummies(X, columns=cat_features, dtype=int)
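
To make the effect of one-hot encoding concrete, here is a stdlib-only sketch on two made-up rows. `one_hot_rows` is a hypothetical helper, and the `column_value` naming only imitates the style of `pd.get_dummies`; it is not the pandas implementation.

```python
def one_hot_rows(rows, columns):
    """One-hot encode a list of dict rows over the given categorical columns."""
    # Collect the distinct values of each column, sorted for a stable order.
    categories = {c: sorted({row[c] for row in rows}) for c in columns}
    names = [f"{c}_{v}" for c in columns for v in categories[c]]
    encoded = [
        [int(row[c] == v) for c in columns for v in categories[c]]
        for row in rows
    ]
    return names, encoded


# Two made-up rows using feature names from the table above.
rows = [
    {"lymphatics": "arched", "by pass": "no"},
    {"lymphatics": "normal", "by pass": "yes"},
]
names, encoded = one_hot_rows(rows, ["lymphatics", "by pass"])
print(names)    # ['lymphatics_arched', 'lymphatics_normal', 'by pass_no', 'by pass_yes']
print(encoded)  # [[1, 0, 1, 0], [0, 1, 0, 1]]
```

In the notebook, the same kind of expansion turns the 18 features into the 63 input columns reported in the model summary.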

### Splitting into training and test sets

In [307]:
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=28)
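
As a quick sanity check on the split sizes, the arithmetic can be reproduced by hand. This sketch assumes the 148 instances listed for this dataset on the UCI page (the count is not stated in this notebook) and assumes that scikit-learn rounds a fractional `test_size` up to the next whole sample.

```python
import math

n_samples = 148   # assumption: instance count listed on the UCI page
test_size = 0.3

# Assumption: the fractional test size is rounded up; the rest is training data.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 103 45
```

Under these assumptions, the 80.00% accuracy reported at the end of the notebook would correspond to 36 correct predictions out of 45 test samples.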

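### One-hot encoding of the targets

In [308]:
# Fix the number of classes so both splits get the same number of columns,
# even if a class happens to be missing from one of them.
num_classes = int(Y['class'].max()) + 1
y_true = list(y_test['class'])
y_train = to_categorical(y_train, num_classes=num_classes)
y_test = to_categorical(y_test, num_classes=num_classes)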
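
One-hot encoding of integer labels can be sketched in a few lines. The helper below is a hypothetical stand-in for `to_categorical`, not its implementation. Like the Keras function, it infers the width from the largest label when `num_classes` is not given, which is why two splits encoded separately can end up with different widths.

```python
def one_hot(labels, num_classes=None):
    """Return one row per label with a single 1 at the label's index."""
    if num_classes is None:
        num_classes = max(labels) + 1  # width inferred from the largest label
    return [[int(i == label) for i in range(num_classes)] for label in labels]


# Made-up labels: the second split happens to lack class 3.
print(one_hot([0, 3, 1]))              # [[1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0]]
print(one_hot([0, 2]))                 # [[1, 0, 0], [0, 0, 1]]
print(one_hot([0, 2], num_classes=4))  # [[1, 0, 0, 0], [0, 0, 1, 0]]
```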

### Multilayer perceptron

In [490]:
model = tf.keras.Sequential([
    Flatten(input_shape=(len(X.columns),)),
    Dense(64, activation='relu'),
    Dropout(0.5),  # Dropout layer used to reduce overfitting
    Dense(32, activation='relu'),
    Dense(len(y_train[0]), activation='softmax')
])

In [491]:
model.compile(
  loss='categorical_crossentropy', 
  optimizer='adam', 
  metrics=['Accuracy', 'Precision', 'Recall', 'F1Score']
)
model.summary()

Model: "sequential_76"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_76 (Flatten)        (None, 63)                0         
                                                                 
 dense_258 (Dense)           (None, 64)                4096      
                                                                 
 dropout_41 (Dropout)        (None, 64)                0         
                                                                 
 dense_259 (Dense)           (None, 32)                2080      
                                                                 
 dense_260 (Dense)           (None, 4)                 132       
                                                                 
Total params: 6308 (24.64 KB)
Trainable params: 6308 (24.64 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
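
The parameter counts in the summary can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The layer sizes below are read from the summary above; `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


layer_sizes = [63, 64, 32, 4]  # input width, then units of each Dense layer
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [4096, 2080, 132]
print(sum(counts))  # 6308
```

The `Flatten` and `Dropout` layers contribute no parameters, so the total matches the 6308 reported by `model.summary()`.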


### Training

In [492]:
from tensorflow.keras.callbacks import Callback

class stopAtLossValue(Callback):
    '''
    Callback that stops training once the training loss falls to or below a fixed threshold.

    This was introduced to reduce the overfitting the model was showing.

    Source: https://stackoverflow.com/a/54959664
    '''
    THR = 0.01  # loss threshold

    def on_batch_end(self, batch, logs=None):
        loss = (logs or {}).get('loss')
        if loss is not None and loss <= self.THR:
            self.model.stop_training = True
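
The stopping rule itself is independent of Keras and can be sketched in plain Python. The loss values below are made up for illustration, and `first_stop_index` is a hypothetical helper; it only mirrors the threshold comparison done in the callback.

```python
def first_stop_index(losses, threshold=0.01):
    """Return the index of the first loss at or below the threshold, else None."""
    for i, loss in enumerate(losses):
        if loss is not None and loss <= threshold:
            return i
    return None


made_up_losses = [0.90, 0.35, 0.08, 0.009, 0.004]
print(first_stop_index(made_up_losses))  # 3
print(first_stop_index([0.5, 0.2]))      # None
```

Note that the comparison uses the training loss, so the rule limits how closely the model fits the training set rather than measuring generalization directly.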

In [493]:
model.fit(x_train, y_train, epochs=600, batch_size=64, verbose=1, callbacks=[stopAtLossValue()])
model.evaluate(x_test,  y_test, verbose=2)

Epoch 1/600
[...]
Epoch 78/600
[output truncated]

[1.2226308584213257,
 0.800000011920929,
 0.800000011920929,
 0.800000011920929,
 array([0.78048784, 0.        , 0.6666667 , 0.826087  ], dtype=float32)]

In [494]:
predictions = model.predict(x_test)
predictions = np.argmax(predictions, axis=1)  # index of the most probable class
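
Turning softmax outputs into class predictions amounts to taking the index of the largest probability in each row. A stdlib-only sketch on made-up probabilities, with `argmax_rows` as a hypothetical helper:

```python
def argmax_rows(probabilities):
    """Index of the largest value in each row (first index wins on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


# Made-up softmax outputs for three samples and four classes.
probs = [
    [0.10, 0.05, 0.15, 0.70],
    [0.60, 0.10, 0.20, 0.10],
    [0.25, 0.25, 0.40, 0.10],
]
print(argmax_rows(probs))  # [3, 0, 2]
```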



In [495]:
print(f'Accuracy: {accuracy_score(y_true, predictions) * 100:.2f}%')
print(f'Precision: {precision_score(y_true, predictions, average="macro", zero_division=np.nan) * 100:.2f}%')
print(f'Recall: {recall_score(y_true, predictions, average="macro") * 100:.2f}%')
print(f'F1 score: {f1_score(y_true, predictions, average="macro") * 100:.2f}%')


Accuracy: 80.00%
Precision: 86.36%
Recall: 71.13%
F1 score: 75.77%
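
The macro-averaged F1 above can be related to the per-class F1 array returned by `model.evaluate`. The sketch below assumes that the class with an F1 of 0 in that array has no samples in the test set and is never predicted, so that scikit-learn averages over the remaining three classes. This is an inference from the numbers, not something the notebook states.

```python
# Per-class F1 values copied from the model.evaluate output above.
per_class_f1 = [0.78048784, 0.0, 0.6666667, 0.826087]

# Assumption: the zero entry belongs to a class absent from both the test
# labels and the predictions, so it is left out of the macro average.
present = [f1 for f1 in per_class_f1 if f1 > 0]
macro_f1 = sum(present) / len(present)

print(f"{macro_f1 * 100:.2f}%")  # 75.77%
```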
