# [Soybean Disease (Small) Dataset](https://archive.ics.uci.edu/dataset/91/soybean+small)

A small subset of Michalski's famous soybean disease database.

### Dataset structure

| Target | Values |
| ------------- | ------------- |
| class | {D1, D2, D3, D4} |

| Feature | Values |
| ------------- | ------------- |
| date | {0, 1, 2, 3, 4, 5, 6} |
| plant-stand | {0, 1} |
| precip | {0, 1, 2} |
| temp | {0, 1, 2} |
| hail | {0, 1} |
| crop-hist | {0, 1, 2, 3} |
| area-damaged | {0, 1, 2, 3} |
| severity | {1, 2} |
| seed-tmt | {0, 1} |
| germination | {0, 1, 2} |
| plant-growth | {1} |
| leaves | {0, 1} |
| leafspots-halo | {0} |
| leafspots-marg | {2} |
| leafspot-size | {2} |
| leaf-shread | {0} |
| leaf-malf | {0} |
| leaf-mild | {0} |
| stem | {1} |
| lodging | {0, 1} |
| stem-cankers | {0, 1, 2, 3} |
| canker-lesion | {0, 1, 2, 3} |
| fruiting-bodies | {0, 1} |
| external-decay | {0, 1} |
| mycelium | {0, 1} |
| int-discolor | {0, 2} |
| sclerotia | {0, 1} |
| fruit-pods | {0, 3} |
| fruit-spots | {4} |
| seed | {0} |
| mold-growth | {0} |
| seed-discolor | {0} |
| seed-size | {0} |
| shriveling | {0} |
| roots | {0, 1} |
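
Fourteen of the features in the table take a single value in this subset (for example `plant-growth`, `stem` and `fruit-spots`), so they cannot help separate the classes. The sketch below shows one way to spot such columns with pandas. The `toy` frame and its values are made up for illustration and are not the real data.

```python
import pandas as pd

# Toy frame (hypothetical values) mimicking three columns of the table above
toy = pd.DataFrame({
    "hail": [1, 0, 1, 0],
    "plant-growth": [1, 1, 1, 1],
    "fruit-spots": [4, 4, 4, 4],
})

# nunique() counts the distinct values in each column
n_values = toy.nunique()

# Columns with exactly one distinct value are constant
constant_cols = n_values[n_values == 1].index.tolist()
print(constant_cols)

# Dropping constant columns removes no information
reduced = toy.drop(columns=constant_cols)
print(reduced.columns.tolist())
```

Whether to drop these columns is a design choice; the notebook keeps them, which only adds constant inputs to the network.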

### Installing dependencies

In [14]:
# !pip install ucimlrepo
# !pip install tensorflow
# !pip install scikit-learn
# !pip install pandas

In [15]:
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Flatten, Dense

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import LabelEncoder

from ucimlrepo import fetch_ucirepo 

### Importing the dataset

In [16]:
# Fetch the dataset (UCI id 91: Soybean (Small))
dataset = fetch_ucirepo(id=91)

# Data (as pandas DataFrames)
X = dataset.data.features
# Work on a copy so the assignment below does not raise SettingWithCopyWarning
Y = dataset.data.targets.copy()

# Map class labels to integers, in order of first appearance
Y["class"] = pd.factorize(Y["class"])[0]
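
The notebook imports `LabelEncoder` but does not use it. As an alternative way to turn the class labels into integers, a minimal sketch with scikit-learn's `LabelEncoder` follows. The `labels` list is a made-up sample, not the real target column. Unlike the cell above, `LabelEncoder` numbers the classes in sorted order rather than in order of appearance.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of class labels
labels = ["D2", "D1", "D4", "D1", "D3"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)  # integer code per label

print(list(encoded))
print(list(encoder.classes_))  # classes_[i] is the label encoded as i

# inverse_transform maps the integer codes back to the original labels
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

Keeping the fitted encoder around makes it easy to report predictions as `D1`..`D4` instead of integers.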


### One-hot encoding the features

In [17]:
# Collect the features that the dataset metadata marks as categorical
cat_features = []
for feature in dataset.data.features:
    feature_info = dataset.variables.loc[dataset.variables["name"] == feature]
    if feature_info.type.values[0] == "Categorical":
        cat_features.append(feature)


In [18]:
# Replace each categorical column with one 0/1 column per observed value
X = pd.get_dummies(X, columns=cat_features, dtype=int)
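
To illustrate what `pd.get_dummies` does here, the sketch below encodes a tiny made-up frame (the values are hypothetical). Each categorical column becomes one 0/1 column per observed value, named `<column>_<value>`. Summing the number of values per feature in the structure table gives 72, which matches the 72 inputs shown in the model summary further down.

```python
import pandas as pd

# Toy frame with two categorical features (hypothetical values)
toy = pd.DataFrame({"precip": [2, 0, 1], "hail": [1, 0, 1]})

# One indicator column per (feature, value) pair
encoded = pd.get_dummies(toy, columns=["precip", "hail"], dtype=int)

print(encoded.columns.tolist())
print(encoded.iloc[0].tolist())  # first row: precip=2, hail=1
```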

### Splitting the training and test sets

In [19]:
# Note: test_size=0.7 holds out 70% of the samples for testing and trains on the remaining 30%
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.7, random_state=28)
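
With such a small dataset, it helps to see how many samples each split receives. The arithmetic below is a sketch under two assumptions: the dataset has the 47 instances reported on its UCI page, and `train_test_split` rounds a fractional test size up.

```python
import math

n_samples = 47   # assumed number of instances in Soybean (Small)
test_size = 0.7  # fraction passed to train_test_split

# Test count is rounded up; the training set gets the remainder
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

Under these assumptions the network is trained on 14 samples and evaluated on 33. A test set of 33 is consistent with the reported accuracy of 84.85%, which equals 28/33.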

### One-hot encoding the targets

In [20]:
y_train = to_categorical(y_train)
# Keep the integer test labels for the scikit-learn metrics at the end
y_true = list(y_test['class'])
y_test = to_categorical(y_test)
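
For integer labels, `to_categorical` produces one row per sample with a single 1 in the column of its class. A NumPy-only sketch of the same idea follows. The `labels` array is a made-up example, and indexing an identity matrix is a stand-in for the Keras helper, not its implementation.

```python
import numpy as np

# Hypothetical integer labels for four samples
labels = np.array([0, 2, 1, 3])
num_classes = 4

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot)

# argmax along each row recovers the integer labels
recovered = one_hot.argmax(axis=1)
print(recovered.tolist())
```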

### Multilayer perceptron

In [21]:
model = tf.keras.Sequential([
    Flatten(input_shape=(len(X.columns),)),
    Dense(64, activation='relu'),
    Dense(128, activation='relu'),
    Dense(len(y_train[0]), activation='softmax')
])

In [22]:
model.compile(
  loss='categorical_crossentropy', 
  optimizer='sgd', 
  metrics=['Accuracy', 'Precision', 'Recall', 'F1Score']
)
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_1 (Flatten)         (None, 72)                0         
                                                                 
 dense_3 (Dense)             (None, 64)                4672      
                                                                 
 dense_4 (Dense)             (None, 128)               8320      
                                                                 
 dense_5 (Dense)             (None, 4)                 516       
                                                                 
Total params: 13508 (52.77 KB)
Trainable params: 13508 (52.77 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
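
The parameter counts in the summary can be checked by hand: a `Dense` layer with `n_inputs` inputs and `n_units` units has `n_inputs * n_units` weights plus `n_units` biases. The helper name `dense_params` below is introduced only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# Layer widths taken from the summary: input, two hidden layers, output
layer_sizes = [72, 64, 128, 4]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)
```

The result matches the 4672, 8320 and 516 parameters listed above, for a total of 13508. The `Flatten` layer contributes none.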


### Training

In [23]:
model.fit(x_train, y_train, epochs=300, batch_size=32, verbose=1)
model.evaluate(x_test,  y_test, verbose=2)

[0.378959983587265,
 0.8484848737716675,
 0.9629629850387573,
 0.7878788113594055,
 array([1.        , 0.8       , 0.8333334 , 0.82758623], dtype=float32)]
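
`model.evaluate` returns the loss followed by the metrics in the order given to `compile`: accuracy, precision, recall and F1 score. The last entry is an array because Keras `F1Score` reports one value per class by default. Averaging those per-class values gives the macro F1, as sketched below with the numbers copied from the output above.

```python
# Per-class F1 values copied from the evaluate() output
per_class_f1 = [1.0, 0.8, 0.8333334, 0.82758623]

# Macro average: unweighted mean over the classes
macro_f1 = sum(per_class_f1) / len(per_class_f1)

print(f"{macro_f1 * 100:.2f}%")
```

This agrees with the macro F1 of 86.52% computed with scikit-learn at the end of the notebook. The Keras precision and recall values differ from the scikit-learn ones, most likely because Keras by default pools all one-hot entries and applies a 0.5 threshold instead of averaging per class.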

In [26]:
# Predicted class = index of the highest softmax probability
predictions = model.predict(x_test).argmax(axis=1)



In [27]:
print(f'Accuracy: {accuracy_score(y_true, predictions) * 100:.2f}%')
print(f'Precision: {precision_score(y_true, predictions, average="macro") * 100:.2f}%')
print(f'Recall: {recall_score(y_true, predictions, average="macro") * 100:.2f}%')
print(f'F1 score: {f1_score(y_true, predictions, average="macro") * 100:.2f}%')


Accuracy: 84.85%
Precision: 92.65%
Recall: 84.52%
F1 score: 86.52%
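
The scores above use `average="macro"`, which computes each metric per class and then takes the unweighted mean. A pure-Python sketch of that definition follows. The function name `macro_precision_recall` and the toy labels are made up for illustration, and this is not how scikit-learn implements it internally.

```python
def macro_precision_recall(y_true, y_pred):
    """Unweighted mean of per-class precision and recall."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        # Use 0.0 when a class has no predictions or no true samples
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)

# Toy labels (hypothetical), three classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

precision, recall = macro_precision_recall(y_true, y_pred)
print(f"{precision:.4f} {recall:.4f}")
```

Because every class counts equally, a single misclassified sample in a rare class moves the macro scores more than it moves accuracy.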
