# [Soybean Disease (Large) Dataset](https://archive.ics.uci.edu/dataset/90/soybean+large)

Michalski's famous soybean disease database

### Dataset structure

| Target | Values |
| ------------- | ------------- |
|class| {diaporthe-stem-canker, charcoal-rot, rhizoctonia-root-rot, phytophthora-rot, brown-stem-rot, powdery-mildew, downy-mildew, brown-spot, bacterial-blight, bacterial-pustule, purple-seed-stain, anthracnose, phyllosticta-leaf-spot, alternarialeaf-spot, frog-eye-leaf-spot, diaporthe-pod-&-stem-blight, cyst-nematode, 2-4-d-injury, herbicide-injury}|

| Feature | Values |
| ------------- | ------------- |
|date| {6.0, 4.0, 3.0, 5.0, 1.0, 0.0, 2.0, nan}|
|plant-stand| {0.0, 1.0, nan}|
|precip| {2.0, 0.0, 1.0, nan}|
|temp| {1.0, 2.0, 0.0, nan}|
|hail| {0.0, 1.0, nan}|
|crop-hist| {1.0, 2.0, 3.0, 0.0, nan}|
|area-damaged| {1.0, 0.0, 3.0, 2.0, nan}|
|severity| {1.0, 2.0, nan, 0.0}|
|seed-tmt| {0.0, 1.0, nan, 2.0}|
|germination| {0.0, 1.0, 2.0, nan}|
|plant-growth| {1.0, 0.0, nan}|
|leaves| {1, 0}|
|leafspots-halo| {0.0, nan, 2.0, 1.0}|
|leafspots-marg| {2.0, nan, 0.0, 1.0}|
|leafspot-size| {2.0, nan, 1.0, 0.0}|
|leaf-shread| {0.0, nan, 1.0}|
|leaf-malf| {0.0, nan, 1.0}|
|leaf-mild| {0.0, nan, 1.0, 2.0}|
|stem| {1.0, 0.0, nan}|
|lodging| {1.0, 0.0, nan}|
|stem-cankers| {3.0, 0.0, 1.0, 2.0, nan}|
|canker-lesion| {1.0, 0.0, 3.0, 2.0, nan}|
|fruiting-bodies| {1.0, 0.0, nan}|
|external-decay| {1.0, 0.0, nan}|
|mycelium| {0.0, 1.0, nan}|
|int-discolor| {0.0, 2.0, 1.0, nan}|
|sclerotia| {0.0, 1.0, nan}|
|fruit-pods| {0.0, 3.0, nan, 1.0, 2.0}|
|fruit-spots| {4.0, nan, 0.0, 1.0, 2.0}|
|seed| {0.0, nan, 1.0}|
|mold-growth| {0.0, nan, 1.0}|
|seed-discolor| {0.0, nan, 1.0}|
|seed-size| {0.0, nan, 1.0}|
|shriveling| {0.0, nan, 1.0}|
|roots| {0.0, 1.0, 2.0, nan}|
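
A table like the one above can be regenerated from any pandas DataFrame. The sketch below shows one possible way; `summarize_values` and the toy frame are hypothetical names and made-up values used only for illustration, not the code that produced the table.

```python
import numpy as np
import pandas as pd


def summarize_values(df: pd.DataFrame) -> dict:
    """Map each column name to the list of its distinct values (NaN included)."""
    return {col: df[col].unique().tolist() for col in df.columns}


# Toy frame standing in for the real feature table (values are made up).
toy = pd.DataFrame({"hail": [0.0, 1.0, np.nan, 0.0], "leaves": [1, 0, 1, 1]})
summary = summarize_values(toy)

# Print one markdown table row per feature.
for name, values in summary.items():
    print(f"|{name}| {values} |")
```

Applied to the real data, the same function could be called as `summarize_values(dataset.data.features)` once the dataset has been fetched.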

### Installing dependencies

In [248]:
# !pip install ucimlrepo
# !pip install tensorflow
# !pip install scikit-learn
# !pip install pandas
# !pip install numpy

In [338]:
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Flatten, Dense

import pandas as pd

import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import LabelEncoder

from ucimlrepo import fetch_ucirepo 
pd.options.mode.copy_on_write = True

### Loading the dataset

In [250]:
# Fetch the dataset from the UCI repository
dataset = fetch_ucirepo(id=90)

# Data (as pandas DataFrames)
X = dataset.data.features
Y = dataset.data.targets

# Map each class label to an integer, in order of first appearance
labels = Y["class"].unique()
for i, label in enumerate(labels):
    Y.loc[Y["class"] == label, "class"] = i
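
The notebook imports `LabelEncoder` but encodes the labels with a manual loop. As an alternative, the sketch below shows how scikit-learn's encoder could do the same job on a small, made-up subset of labels. Note that `LabelEncoder` sorts the classes alphabetically, so the integer codes would generally differ from those assigned by the loop, which follows order of first appearance.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy labels standing in for Y["class"] (a small, made-up subset).
toy_labels = pd.Series(["charcoal-rot", "brown-spot", "charcoal-rot", "anthracnose"])

encoder = LabelEncoder()
encoded = encoder.fit_transform(toy_labels)  # classes are sorted alphabetically

# inverse_transform recovers the original strings, e.g. for readable reports.
decoded = encoder.inverse_transform(encoded)
print(list(encoder.classes_), list(encoded))
```

Keeping the fitted encoder around makes it easy to translate predicted class indices back into disease names.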


### One-hot encoding of the features

In [251]:
# Collect the features that the dataset metadata marks as categorical
cat_features = []
for feature in dataset.data.features:
    feature_info = dataset.variables.loc[dataset.variables["name"] == feature]
    if feature_info.type.values[0] == "Categorical":
        cat_features.append(feature)


In [252]:
X = pd.get_dummies(X, columns=cat_features, dtype=int)
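
Most features in this dataset contain missing values (`nan`), so it helps to know how `pd.get_dummies` treats them. The sketch below uses a made-up single-column frame: by default a missing value becomes a row of zeros, while `dummy_na=True` adds an explicit indicator column. Whether the explicit column would help this model is untested here.

```python
import numpy as np
import pandas as pd

# Toy column standing in for one categorical feature (values are made up).
toy = pd.DataFrame({"hail": [0.0, 1.0, np.nan]})

# Default behaviour: a missing value becomes an all-zero row.
default = pd.get_dummies(toy, columns=["hail"], dtype=int)

# dummy_na=True adds an explicit indicator column for missing values.
with_na = pd.get_dummies(toy, columns=["hail"], dummy_na=True, dtype=int)

print(default)
print(with_na)
```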

### Splitting into training and test sets

In [253]:
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
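
With 19 classes, a purely random split can leave some classes unevenly represented in the test set. One option, shown below on made-up data, is the `stratify` argument of `train_test_split`, which keeps class proportions equal in both splits. This is a sketch rather than what the notebook does, and stratification raises an error if any class has fewer than two samples, which may well be the case for the rarest diseases here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up balanced data: 20 samples, two classes with 10 samples each.
toy_X = pd.DataFrame({"feature": range(20)})
toy_y = pd.Series([0] * 10 + [1] * 10, name="class")

x_tr, x_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.3, random_state=42, stratify=toy_y
)

# Both classes appear in the test set in the same proportion as in the full data.
print(y_te.value_counts().to_dict())
```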

### One-hot encoding of the targets

In [254]:
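# Pass num_classes explicitly so both splits get the same number of columns
num_classes = len(labels)
y_train = to_categorical(y_train, num_classes=num_classes)
y_true = list(y_test['class'])
y_test = to_categorical(y_test, num_classes=num_classes)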
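
`to_categorical` infers the number of columns from the largest label it sees unless `num_classes` is passed, so two splits encoded separately can end up with different widths. The NumPy sketch below illustrates the effect with a hypothetical `one_hot` helper and made-up labels; it mimics the encoding and is not Keras's implementation.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype=int)[labels]


train_labels = [0, 2, 3]  # made-up labels; highest class is 3
test_labels = [1, 2]      # highest class is 2

# Inferring the width from each split separately gives mismatched shapes.
inferred_train = one_hot(train_labels, max(train_labels) + 1)
inferred_test = one_hot(test_labels, max(test_labels) + 1)
print(inferred_train.shape, inferred_test.shape)

# Fixing the width to the full class count keeps both splits aligned.
fixed_test = one_hot(test_labels, 4)
print(fixed_test.shape)
```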

### Multilayer perceptron

In [331]:
model = tf.keras.Sequential([
    Flatten(input_shape=(len(X.columns),)),
    Dense(16, activation='relu'),
    Dense(32, activation='relu'),
    Dense(32, activation='relu'),
    Dense(len(y_train[0]), activation='softmax')
])

In [332]:
model.compile(
  loss='categorical_crossentropy', 
  optimizer='adam', 
  metrics=['Accuracy', 'Precision', 'Recall', 'F1Score']
)
model.summary()

Model: "sequential_34"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_34 (Flatten)        (None, 98)                0         
                                                                 
 dense_128 (Dense)           (None, 16)                1584      
                                                                 
 dense_129 (Dense)           (None, 32)                544       
                                                                 
 dense_130 (Dense)           (None, 32)                1056      
                                                                 
 dense_131 (Dense)           (None, 19)                627       
                                                                 
Total params: 3811 (14.89 KB)
Trainable params: 3811 (14.89 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
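
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the numbers using the layer widths from the summary; `dense_params` is a hypothetical helper name.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Input width followed by the width of each Dense layer, as in the summary.
layer_sizes = [98, 16, 32, 32, 19]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)
print(counts, total)  # [1584, 544, 1056, 627] 3811
```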


### Training

In [340]:
from tensorflow.keras.callbacks import Callback

class stopAtLossValue(Callback):
    '''
    Callback that stops training once the loss falls below a set threshold.

    This was done to reduce the overfitting the model was showing.

    Source: https://stackoverflow.com/a/54959664
    '''
    def on_batch_end(self, batch, logs=None):
        THR = 0.01  # loss threshold
        loss = (logs or {}).get('loss')
        if loss is not None and loss <= THR:
            self.model.stop_training = True

In [334]:
model.fit(x_train, y_train, epochs=200, batch_size=64, verbose=1, validation_split=0.2, callbacks=[stopAtLossValue()])
model.evaluate(x_test,  y_test, verbose=2)

Epoch 1/200
[...]
Epoch 77/200
Epoch 78

[0.4185304045677185,
 0.9032257795333862,
 0.9130434989929199,
 0.9032257795333862,
 array([1.       , 1.       , 1.       , 0.9166667, 1.       , 1.       ,
        1.       , 0.969697 , 1.       , 1.       , 1.       , 0.9230769,
        0.6666667, 0.7857143, 0.6666667, 0.       , 1.       , 0.       ,
        1.       ], dtype=float32)]

In [335]:
predictions = model.predict(x_test)
predictions = np.argmax(predictions, axis=1)  # index of the highest-probability class




In [339]:
print(f'Accuracy: {accuracy_score(y_true, predictions) * 100:.2f}%')
print(f'Precision: {precision_score(y_true, predictions, average="macro", zero_division=np.nan) * 100:.2f}%')
print(f'Recall: {recall_score(y_true, predictions, average="macro") * 100:.2f}%')
print(f'F1 score: {f1_score(y_true, predictions, average="macro") * 100:.2f}%')


Accuracy: 90.32%
Precision: 95.71%
Recall: 87.80%
F1 score: 88.49%
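
The macro F1 of 88.49% can be related to the per-class F1 array returned by `model.evaluate`. scikit-learn averages only over labels that occur in `y_true` or in the predictions, so the sketch below compares the mean over all 19 classes with the mean over 18. The second figure rests on an assumption: that one of the two zero-F1 classes appears in neither the test labels nor the predictions. The matching number is consistent with that reading but does not prove it.

```python
import numpy as np

# Per-class F1 values copied from the model.evaluate output above.
per_class_f1 = np.array([
    1.0, 1.0, 1.0, 0.9166667, 1.0, 1.0,
    1.0, 0.969697, 1.0, 1.0, 1.0, 0.9230769,
    0.6666667, 0.7857143, 0.6666667, 0.0, 1.0, 0.0,
    1.0,
])

mean_all_19 = per_class_f1.mean()
# Assumption: one zero-F1 class is absent from both y_true and predictions,
# so scikit-learn would average over 18 labels instead of 19.
mean_over_18 = per_class_f1.sum() / 18

print(f"{mean_all_19 * 100:.2f}% vs {mean_over_18 * 100:.2f}%")  # 83.83% vs 88.49%
```

The same effect applies to the precision figure: with `zero_division=np.nan`, classes that are never predicted are left out of the macro average, which is one reason it comes out higher than recall.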
