# [Soybean Disease (Large) Dataset](https://archive.ics.uci.edu/dataset/90/soybean+large)

Michalski's famous soybean disease database

### Dataset structure

| Target | Values |
| ------------- | ------------- |
|class| {diaporthe-stem-canker, charcoal-rot, rhizoctonia-root-rot, phytophthora-rot, brown-stem-rot, powdery-mildew, downy-mildew, brown-spot, bacterial-blight, bacterial-pustule, purple-seed-stain, anthracnose, phyllosticta-leaf-spot, alternarialeaf-spot, frog-eye-leaf-spot, diaporthe-pod-&-stem-blight, cyst-nematode, 2-4-d-injury, herbicide-injury}|

| Features | Values |
| ------------- | ------------- |
|date| {6.0, 4.0, 3.0, 5.0, 1.0, 0.0, 2.0, nan}|
|plant-stand| {0.0, 1.0, nan}|
|precip| {2.0, 0.0, 1.0, nan}|
|temp| {1.0, 2.0, 0.0, nan}|
|hail| {0.0, 1.0, nan}|
|crop-hist| {1.0, 2.0, 3.0, 0.0, nan}|
|area-damaged| {1.0, 0.0, 3.0, 2.0, nan}|
|severity| {1.0, 2.0, nan, 0.0}|
|seed-tmt| {0.0, 1.0, nan, 2.0}|
|germination| {0.0, 1.0, 2.0, nan}|
|plant-growth| {1.0, 0.0, nan}|
|leaves| {1, 0}|
|leafspots-halo| {0.0, nan, 2.0, 1.0}|
|leafspots-marg| {2.0, nan, 0.0, 1.0}|
|leafspot-size| {2.0, nan, 1.0, 0.0}|
|leaf-shread| {0.0, nan, 1.0}|
|leaf-malf| {0.0, nan, 1.0}|
|leaf-mild| {0.0, nan, 1.0, 2.0}|
|stem| {1.0, 0.0, nan}|
|lodging| {1.0, 0.0, nan}|
|stem-cankers| {3.0, 0.0, 1.0, 2.0, nan}|
|canker-lesion| {1.0, 0.0, 3.0, 2.0, nan}|
|fruiting-bodies| {1.0, 0.0, nan}|
|external-decay| {1.0, 0.0, nan}|
|mycelium| {0.0, 1.0, nan}|
|int-discolor| {0.0, 2.0, 1.0, nan}|
|sclerotia| {0.0, 1.0, nan}|
|fruit-pods| {0.0, 3.0, nan, 1.0, 2.0}|
|fruit-spots| {4.0, nan, 0.0, 1.0, 2.0}|
|seed| {0.0, nan, 1.0}|
|mold-growth| {0.0, nan, 1.0}|
|seed-discolor| {0.0, nan, 1.0}|
|seed-size| {0.0, nan, 1.0}|
|shriveling| {0.0, nan, 1.0}|
|roots| {0.0, 1.0, 2.0, nan}|
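
A table like the one above can be produced directly from a feature DataFrame. The sketch below uses a small made-up frame (`toy` and `distinct_values` are illustrative names, not part of the notebook) and lists the distinct values per column, keeping `nan` so that missing data stays visible.

```python
import pandas as pd

# Hypothetical stand-in for the real feature frame; the values are made up.
toy = pd.DataFrame({
    "hail": [0.0, 1.0, None, 0.0],
    "leaves": [1, 0, 1, 1],
})

def distinct_values(frame):
    """Return {column: distinct values in order of first appearance}."""
    return {col: frame[col].unique().tolist() for col in frame.columns}

summary = distinct_values(toy)
for name, values in summary.items():
    print(f"|{name}| {values} |")
```

Applied to the real features, the same helper would presumably give one row per column in the table.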

### Installing the dependencies

In [14]:
# !pip install ucimlrepo
# !pip install tensorflow
# !pip install scikit-learn
# !pip install pandas

In [15]:
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Flatten, Dense

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import LabelEncoder

from ucimlrepo import fetch_ucirepo 

### Importing the dataset

In [16]:
# Fetch the dataset (UCI id 90: Soybean (Large))
dataset = fetch_ucirepo(id=90)

# Data as pandas DataFrames; copy the targets so they can be modified safely
X = dataset.data.features
Y = dataset.data.targets.copy()

# Encode the class labels as integers, in order of first appearance
codes, labels = pd.factorize(Y["class"])
Y["class"] = codes
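
The integer encoding assigns one code per class name in order of first appearance. A minimal standard-library sketch of that mapping and its inverse follows; the sample names and the `label_to_code` / `code_to_label` dictionaries are illustrative only.

```python
# Made-up sample of class names; the real column has 19 distinct diseases.
classes = ["charcoal-rot", "brown-spot", "charcoal-rot", "anthracnose", "brown-spot"]

# Assign codes in order of first appearance.
label_to_code = {}
for name in classes:
    label_to_code.setdefault(name, len(label_to_code))

codes = [label_to_code[name] for name in classes]

# Inverse mapping, useful for turning predicted codes back into disease names.
code_to_label = {code: name for name, code in label_to_code.items()}

print(codes)
print([code_to_label[c] for c in codes])
```

Keeping the inverse mapping around makes later predictions readable as disease names rather than bare integers.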


### One-hot encoding of the features

In [17]:
# Collect the features that the dataset metadata marks as categorical
cat_features = []
for feature in X.columns:
    feature_info = dataset.variables.loc[dataset.variables["name"] == feature]
    if feature_info.type.values[0] == "Categorical":
        cat_features.append(feature)


In [18]:
# One-hot encode the categorical features
X = pd.get_dummies(X, columns=cat_features, dtype=int)
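
Most features contain `nan`, so it helps to know how `pd.get_dummies` treats missing values. The sketch below uses a made-up single-column frame: by default a missing value becomes a row of zeros, while `dummy_na=True` (an option the notebook does not use) adds a separate indicator column.

```python
import pandas as pd

# Toy column with one missing value (not taken from the real data).
toy = pd.DataFrame({"hail": [0.0, 1.0, float("nan")]})

# Default behaviour: one column per observed value, NaN rows are all zeros.
encoded = pd.get_dummies(toy, columns=["hail"], dtype=int)
print(encoded)

# With dummy_na=True the missing value gets its own indicator column.
encoded_with_na = pd.get_dummies(toy, columns=["hail"], dtype=int, dummy_na=True)
print(encoded_with_na.columns.tolist())
```

Whether missingness should be its own signal is a modelling choice; the default simply leaves it implicit.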

### Splitting into training and test sets

In [19]:
# Note: test_size=0.7 holds out 70% of the rows for testing and trains on the remaining 30%
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.7, random_state=28)
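
With `test_size=0.7`, most rows end up in the test set. The arithmetic below is a rough sketch that assumes the 307 instances listed on the dataset's UCI page (an assumption, not something the notebook prints) and scikit-learn's rule of rounding the test share up.

```python
import math

n_samples = 307   # assumption: instance count from the UCI page
test_size = 0.7   # value passed to train_test_split
n_classes = 19    # number of disease classes in the target table

n_test = math.ceil(test_size * n_samples)  # the test share is rounded up
n_train = n_samples - n_test

print(n_train, n_test)
print(round(n_train / n_classes, 1))  # average training rows per class
```

Under these assumptions fewer than five training rows are available per class on average, so a smaller `test_size` (0.2 to 0.3 is a common choice) may be worth trying.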

### One-hot encoding of the targets

In [20]:
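# Fix the number of columns so both splits are encoded with the same width
num_classes = Y["class"].nunique()
y_train = to_categorical(y_train, num_classes=num_classes)
y_true = list(y_test["class"])  # integer labels, kept for the scikit-learn metrics
y_test = to_categorical(y_test, num_classes=num_classes)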
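
`to_categorical` infers the number of columns from the largest code it sees unless `num_classes` is given, so a split that happens to miss the highest class would come out narrower. The NumPy sketch below (the `one_hot` helper is hypothetical, not the Keras implementation) shows the fixed-width version of the same idea.

```python
import numpy as np

def one_hot(codes, num_classes):
    """One-hot encode integer class codes into an (n_samples, num_classes) array."""
    codes = np.asarray(codes, dtype=int)
    encoded = np.zeros((codes.size, num_classes), dtype="float32")
    encoded[np.arange(codes.size), codes] = 1.0
    return encoded

# Toy codes: class 3 never occurs in this "split", yet it still gets a column.
train_codes = [0, 2, 1, 2]
encoded_train = one_hot(train_codes, num_classes=4)
print(encoded_train)
```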

### Multilayer perceptron

In [21]:
model = tf.keras.Sequential([
    Flatten(input_shape=(len(X.columns),)),
    Dense(64, activation='relu'),
    Dense(128, activation='relu'),
    Dense(len(y_train[0]), activation='softmax')
])

In [22]:
model.compile(
  loss='categorical_crossentropy', 
  optimizer='adam', 
  metrics=['Accuracy', 'Precision', 'Recall', 'F1Score']
)
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_1 (Flatten)         (None, 98)                0         
                                                                 
 dense_3 (Dense)             (None, 64)                6336      
                                                                 
 dense_4 (Dense)             (None, 128)               8320      
                                                                 
 dense_5 (Dense)             (None, 19)                2451      
                                                                 
Total params: 17107 (66.82 KB)
Trainable params: 17107 (66.82 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
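
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A small sketch (the `dense_params` helper is illustrative):

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# Input width followed by the three Dense layer widths from the summary.
layer_sizes = [98, 64, 128, 19]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [6336, 8320, 2451]
print(sum(per_layer))  # 17107
```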


### Training

In [23]:
model.fit(x_train, y_train, epochs=300, batch_size=32, verbose=1)
model.evaluate(x_test,  y_test, verbose=2)

Epoch 1/300
...

[0.6478949785232544,
 0.8279069662094116,
 0.8516746163368225,
 0.8279069662094116,
 array([0.8       , 1.        , 1.        , 0.95652175, 0.9677419 ,
        1.        , 1.        , 0.85714287, 0.875     , 0.9230769 ,
        0.6       , 0.71428573, 0.57142854, 0.68965524, 0.62222224,
        1.        , 1.        , 0.        , 0.8       ], dtype=float32)]
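
`evaluate` returns the loss followed by the metrics in the order given to `compile`, so the list above can be read as loss, accuracy, precision, recall and a per-class F1 array. Assuming that reading, the sketch below averages the per-class values into a macro F1 and finds the weakest class code.

```python
# Last entry of the evaluate() output above, copied as plain floats.
per_class_f1 = [
    0.8, 1.0, 1.0, 0.95652175, 0.9677419,
    1.0, 1.0, 0.85714287, 0.875, 0.9230769,
    0.6, 0.71428573, 0.57142854, 0.68965524, 0.62222224,
    1.0, 1.0, 0.0, 0.8,
]

# Macro F1 is the unweighted mean over classes.
macro_f1 = sum(per_class_f1) / len(per_class_f1)
# Index (class code) with the lowest F1.
worst = min(range(len(per_class_f1)), key=per_class_f1.__getitem__)

print(f"macro F1: {macro_f1:.4f}")
print(f"lowest F1: class code {worst}")
```

The mean comes out at roughly 0.81, in line with the macro F1 that scikit-learn reports at the end of the notebook, and class code 17 stands out with an F1 of 0.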

In [24]:
predictions = model.predict(x_test)
# Take the index of the highest probability as the predicted class
predictions = predictions.argmax(axis=1)


[[5.7992725e-05 1.0408237e-05 4.4474887e-06 ... 1.6413238e-04
  1.9864571e-07 4.9222500e-08]
 [4.2185628e-01 6.9444832e-05 1.6540890e-03 ... 5.6441961e-04
  2.3878595e-06 1.2870449e-04]
 [6.1285245e-04 4.2938525e-05 9.3410039e-05 ... 2.2077073e-05
  3.8914663e-07 9.8284207e-02]
 ...
 [2.4309034e-07 6.6458405e-08 3.2277587e-06 ... 5.5543723e-07
  9.2422084e-07 7.5505253e-05]
 [4.8072319e-04 7.5684288e-03 2.9620141e-01 ... 1.9660069e-05
  1.3295653e-07 5.4602948e-04]
 [2.9699941e-05 6.9330412e-08 1.4467660e-06 ... 3.4977877e-08
  1.5011324e-08 1.0953607e-06]]
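
Each row of the probability matrix sums to one over the classes, and the predicted class is the column with the largest value. The sketch below uses hypothetical probabilities and a hypothetical four-name lookup (not real model output) to show how the codes can be mapped back to disease names.

```python
import numpy as np

# Hypothetical probabilities for 3 samples over 4 classes.
probabilities = np.array([
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])
# Hypothetical code-to-name lookup, in the same order as the integer codes.
class_names = ["brown-spot", "charcoal-rot", "anthracnose", "powdery-mildew"]

predicted_codes = probabilities.argmax(axis=1)
predicted_names = [class_names[code] for code in predicted_codes]

print(predicted_codes.tolist())
print(predicted_names)
```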


In [25]:
print(f'Accuracy: {accuracy_score(y_true, predictions) * 100:.2f}%')
# zero_division=0 scores classes with no predicted samples as 0 without emitting a warning
print(f'Precision: {precision_score(y_true, predictions, average="macro", zero_division=0) * 100:.2f}%')
print(f'Recall: {recall_score(y_true, predictions, average="macro", zero_division=0) * 100:.2f}%')
print(f'F1 score: {f1_score(y_true, predictions, average="macro", zero_division=0) * 100:.2f}%')


Accuracy: 82.79%
Precision: 80.30%
Recall: 83.07%
F1 score: 80.93%
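
Macro averaging gives every class the same weight, so a class that the model never predicts drags the score down: its precision is undefined and is conventionally counted as 0. The following standard-library sketch illustrates the idea on toy labels; `macro_precision` is a hypothetical helper, not scikit-learn's implementation.

```python
def macro_precision(y_true, y_pred, n_classes, zero_division=0.0):
    """Unweighted mean of per-class precision; undefined classes count as zero_division."""
    scores = []
    for c in range(n_classes):
        # True labels of all samples that were predicted as class c.
        predicted_c = [t for t, p in zip(y_true, y_pred) if p == c]
        if not predicted_c:
            scores.append(zero_division)  # class c was never predicted
            continue
        true_positives = sum(1 for t in predicted_c if t == c)
        scores.append(true_positives / len(predicted_c))
    return sum(scores) / n_classes

# Toy labels: class 2 occurs in y_true_toy but is never predicted.
y_true_toy = [0, 0, 1, 1, 2]
y_pred_toy = [0, 0, 1, 0, 1]

result = macro_precision(y_true_toy, y_pred_toy, n_classes=3)
print(round(result, 4))
```

Here the per-class precisions are 2/3, 1/2 and 0, so the macro average is well below the precision of the two classes that are actually predicted.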
