#### Soybean (Large) Model

Description: The dataset contains a range of attributes describing soybean samples, and the task is to classify each sample into one of 19 classes (diaporthe-stem-canker, charcoal-rot, rhizoctonia-root-rot, phytophthora-rot, brown-stem-rot, powdery-mildew, downy-mildew, brown-spot, bacterial-blight, bacterial-pustule, purple-seed-stain, anthracnose, phyllosticta-leaf-spot, alternarialeaf-spot, frog-eye-leaf-spot, diaporthe-pod-&-stem-blight, cyst-nematode, 2-4-d-injury, herbicide-injury). The dataset provides 307 instances in total.

Results: The model did not perform very well on the evaluated metrics. To try to improve the results, every "?" (missing value) was replaced with the most frequent value (mode) of its column. The weak results are probably due to the small number of samples: several classes have only one or two examples in the test split.
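A minimal sketch of the imputation described above, using a small hypothetical DataFrame. The column names `leaves` and `stem` and their values are made up for illustration. Each `'?'` is first turned into `NaN` and then filled with the most frequent value of its column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: '?' marks a missing value, as in the Soybean data
df = pd.DataFrame({
    "leaves": ["1", "1", "?", "0", "1"],
    "stem": ["0", "?", "0", "1", "?"],
})

# Step 1: turn the '?' placeholder into a real missing value
df = df.replace("?", np.nan)

# Step 2: fill each column's gaps with that column's mode
for col in df.columns:
    df[col] = df[col].fillna(df[col].mode()[0])

print(df)
```

After this runs, `leaves` has its gap filled with `"1"` and `stem` has both gaps filled with `"0"`.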

In [2]:

import numpy as np
from ucimlrepo import fetch_ucirepo
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import tensorflow as tf


In [3]:

soybean_large = fetch_ucirepo(id=90)

# data (as pandas dataframes)
X = soybean_large.data.features
y = soybean_large.data.targets


In [6]:

# Replace the '?' placeholders with NaN
X = X.replace('?', np.nan)

# Fill missing values with each column's most frequent value (mode)
for col in X.columns:
    X[col] = X[col].fillna(X[col].mode()[0])
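The cell above computes each mode on the full dataset, before the train/test split, so the test rows influence the imputed values. One way to avoid this is sketched below with scikit-learn's `SimpleImputer(strategy="most_frequent")`. This is a swapped-in technique, not the notebook's own code, and the train/test frames are hypothetical toy data. The imputer learns the modes from the training rows only and reuses them on the test rows.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical train/test split of a single categorical feature
X_train = pd.DataFrame({"leaves": ["1", "1", np.nan, "0"]}, dtype=object)
X_test = pd.DataFrame({"leaves": [np.nan, "0"]}, dtype=object)

# Learn the most frequent value per column from the training rows only
imputer = SimpleImputer(strategy="most_frequent")
X_train_filled = imputer.fit_transform(X_train)

# Reuse the training modes on the test rows; nothing is learned from X_test
X_test_filled = imputer.transform(X_test)

print(imputer.statistics_)  # mode learned for each column
print(X_test_filled)
```

In a full pipeline the same idea applies to the `OneHotEncoder`: call `fit` on the training split and `transform` on both splits.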


In [7]:

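# One-hot encode every categorical feature; categories unseen during fit become all-zero rows
# (fit_transform returns a SciPy sparse matrix by default)
encoder = OneHotEncoder(handle_unknown='ignore')
X_encoded = encoder.fit_transform(X)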
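To show what `OneHotEncoder(handle_unknown='ignore')` produces, here is a sketch on a hypothetical single-column toy feature. Each known category becomes its own indicator column. A category that was never seen during `fit` is encoded as all zeros instead of raising an error.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy feature with three categories seen during fitting
train = np.array([["a"], ["b"], ["c"], ["a"]])
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(train)  # SciPy sparse matrix by default

print(encoder.categories_)  # categories learned for each column
print(encoded.toarray())    # one indicator column per category

# A category never seen in fit ('z') is encoded as all zeros
unseen = encoder.transform(np.array([["z"]])).toarray()
print(unseen)
```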


In [None]:

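le = LabelEncoder()
# targets is a one-column DataFrame; pass it as a 1-D Series to avoid scikit-learn's column_or_1d warning
y_encoded = le.fit_transform(y.iloc[:, 0])
print(np.unique(y))
print(np.unique(y_encoded))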


['2-4-d-injury' 'alternarialeaf-spot' 'anthracnose' 'bacterial-blight'
 'bacterial-pustule' 'brown-spot' 'brown-stem-rot' 'charcoal-rot'
 'cyst-nematode' 'diaporthe-pod-&-stem-blight' 'diaporthe-stem-canker'
 'downy-mildew' 'frog-eye-leaf-spot' 'herbicide-injury'
 'phyllosticta-leaf-spot' 'phytophthora-rot' 'powdery-mildew'
 'purple-seed-stain' 'rhizoctonia-root-rot']
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18]


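A small sketch of how `LabelEncoder` maps class names to the integers printed above. The targets frame is hypothetical: it has a single column named `class` here, which may not be the real column name in `soybean_large.data.targets`. The classes are sorted alphabetically, which is why `2-4-d-injury` receives code 0. `inverse_transform` maps the codes back to the names.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical single-column targets DataFrame (column name is an assumption)
y = pd.DataFrame({"class": ["brown-spot", "anthracnose", "brown-spot", "cyst-nematode"]})

le = LabelEncoder()
# Pass a 1-D Series rather than the 2-D DataFrame to avoid the column_or_1d warning
y_encoded = le.fit_transform(y.iloc[:, 0])

print(le.classes_)                      # class names in sorted order
print(y_encoded)                        # integer code for each row
print(le.inverse_transform(y_encoded))  # back to the original names
```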


In [9]:

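# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y_encoded, test_size=0.3, random_state=42)

# Two hidden layers and a softmax output with one unit per class
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),  # declare the input size here instead of input_shape on Dense
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(len(np.unique(y_encoded)), activation='softmax')
])

# Labels are integers from LabelEncoder, so use the sparse variant of categorical cross-entropy
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Stop when val_loss has not improved for 5 epochs and restore the best weights
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# A further 30% of the training rows is held out for validation
history = model.fit(X_train, y_train, epochs=50, validation_split=0.3, callbacks=[early_stop], verbose=1)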




Epoch 1/50
5/5 ━━━━━━━━━━━━━━━━━━━━ 2s 117ms/step - accuracy: 0.1025 - loss: 3.0053 - val_accuracy: 0.2154 - val_loss: 2.7866
Epoch 2/50
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.2266 - loss: 2.7594 - val_accuracy: 0.2462 - val_loss: 2.6186
Epoch 3/50
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step - accuracy: 0.2438 - loss: 2.5773 - val_accuracy: 0.2615 - val_loss: 2.4825
Epoch 4/50
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.3318 - loss: 2.3925 - val_accuracy: 0.4154 - val_loss: 2.3517
Epoch 5/50
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.4738 - loss: 2.2245 - val_accuracy: 0.5231 - val_loss: 2.2017
Epoch 6/50
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.6447 - loss: 1.9804 - val_accuracy: 0.5846 - val_loss: 2.0346
Epoch 7/50
5/5 ━━━━━━━━━━━━━━━━━

In [46]:
# Predicted class = index of the highest softmax probability
y_pred = model.predict(X_test).argmax(axis=1)
print("Classification report:")
print(classification_report(
    y_test, y_pred,
    labels=np.unique(y_test),                    # only the classes present in the test split
    target_names=le.classes_[np.unique(y_test)],
    zero_division=0                              # report 0.0 instead of warning when a class is never predicted
))
print("Confusion matrix:")
print(confusion_matrix(y_test, y_pred))


3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step
Classification report:
                        precision    recall  f1-score   support

          2-4-d-injury       0.00      0.00      0.00         1
   alternarialeaf-spot       0.67      0.92      0.77        13
           anthracnose       1.00      0.86      0.92         7
      bacterial-blight       0.80      1.00      0.89         4
     bacterial-pustule       1.00      0.50      0.67         2
            brown-spot       0.88      0.94      0.91        16
        brown-stem-rot       1.00      0.88      0.93         8
          charcoal-rot       0.67      1.00      0.80         2
         cyst-nematode       1.00      1.00      1.00         2
 diaporthe-stem-canker       1.00      1.00      1.00         5
          downy-mildew       1.00      1.00      1.00         2
    frog-eye-leaf-spot       0.86      0.60      0.71        10
      herbicide-injury       0.00      0.00      0.00         1
phy

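With supports of one or two, per-class scores in the report can only take a few coarse values. When a class is never predicted, its precision is 0/0. scikit-learn then reports 0.00 and emits an `UndefinedMetricWarning` unless `zero_division` is set. The sketch below derives per-class precision and recall from a confusion matrix. It uses hypothetical labels, where class 2 has one true sample and is never predicted, and checks the result against `precision_score` and `recall_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: class 2 appears once in y_true but is never predicted
y_true = np.array([0, 0, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 0])

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
# Rows are true classes, columns are predicted classes
tp = np.diag(cm)
predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
actual = cm.sum(axis=1)     # row sums: support of each class

# Precision is undefined for a class that is never predicted; report 0 in that case
precision = np.divide(tp, predicted, out=np.zeros(len(tp)), where=predicted > 0)
recall = tp / actual

print(cm)
print(precision, recall)

# scikit-learn returns the same per-class values with zero_division=0
assert np.allclose(precision, precision_score(y_true, y_pred, average=None, zero_division=0))
assert np.allclose(recall, recall_score(y_true, y_pred, average=None, zero_division=0))
```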
