#### Soybean (Large) Model

Description: The dataset contains a range of attributes describing soybean plants, and the goal is to classify each instance into one of the 19 classes listed (diaporthe-stem-canker, charcoal-rot, rhizoctonia-root-rot,
     phytophthora-rot, brown-stem-rot, powdery-mildew,
     downy-mildew, brown-spot, bacterial-blight,
     bacterial-pustule, purple-seed-stain, anthracnose,
     phyllosticta-leaf-spot, alternarialeaf-spot,
     frog-eye-leaf-spot, diaporthe-pod-&-stem-blight,
     cyst-nematode, 2-4-d-injury, herbicide-injury). A total of 307 instances are provided.

### Imports

In [65]:

import numpy as np
from ucimlrepo import fetch_ucirepo
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import wisardpkg as wp


### Loading the Dataset

In [66]:
soybean_large = fetch_ucirepo(id=90)
X_original = soybean_large.data.features
y_original = soybean_large.data.targets.iloc[:, 0]

print(f"Initial shape of X: {X_original.shape}")
# print(f"Columns of X and their data types:\n{X_original.dtypes}")
# print(f"\nFirst 5 rows of X:\n{X_original.head()}")
print(f"\nClasses of y:\n{y_original.value_counts()}")

Initial shape of X: (307, 35)

Classes of y:
class
alternarialeaf-spot            40
brown-spot                     40
phytophthora-rot               40
frog-eye-leaf-spot             40
brown-stem-rot                 20
anthracnose                    20
diaporthe-stem-canker          10
rhizoctonia-root-rot           10
charcoal-rot                   10
downy-mildew                   10
powdery-mildew                 10
purple-seed-stain              10
bacterial-pustule              10
bacterial-blight               10
phyllosticta-leaf-spot         10
diaporthe-pod-&-stem-blight     6
cyst-nematode                   6
herbicide-injury                4
2-4-d-injury                    1
Name: count, dtype: int64


### Preprocessing

#### Filling missing values

In [67]:
# Treat '?' as a missing value as well, before counting anything
X_original = X_original.replace('?', np.nan)

missing_counts = X_original.isnull().sum()
print(missing_counts[missing_counts > 0])

# Fill each column's missing values with its most frequent value (mode)
for col in X_original.columns:
    X_original[col] = X_original[col].fillna(X_original[col].mode()[0])

print("\nMissing values after imputation:")
missing_counts = X_original.isnull().sum()
print(missing_counts[missing_counts > 0])

date                1
plant-stand         8
precip             11
temp                7
hail               41
crop-hist           1
area-damaged        1
severity           41
seed-tmt           41
germination        36
plant-growth        1
leafspots-halo     25
leafspots-marg     25
leafspot-size      25
leaf-shread        26
leaf-malf          25
leaf-mild          30
stem                1
lodging            41
stem-cankers       11
canker-lesion      11
fruiting-bodies    35
external-decay     11
mycelium           11
int-discolor       11
sclerotia          11
fruit-pods         25
fruit-spots        35
seed               29
mold-growth        29
seed-discolor      35
seed-size          29
shriveling         35
roots               7
dtype: int64

Missing values after imputation:
Series([], dtype: int64)
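
To make the imputation step above concrete, here is a minimal standard-library sketch of mode filling on a single column. The helper `fill_with_mode` and the toy `hail` values are hypothetical and only illustrate the idea; the notebook itself relies on pandas `fillna` and `mode`, which may break ties between equally frequent values differently.

```python
from collections import Counter


def fill_with_mode(values, missing=None):
    """Return a copy of `values` with `missing` entries replaced by the mode.

    Hypothetical helper for illustration only. Ties are resolved in favour
    of the value seen first, which may differ from pandas' behaviour.
    """
    observed = [v for v in values if v is not missing]
    if not observed:
        # Nothing to learn the mode from, so leave the column untouched.
        return list(values)
    mode_value = Counter(observed).most_common(1)[0][0]
    return [mode_value if v is missing else v for v in values]


# Toy column with two missing entries (values are made up).
hail = [0, None, 0, 1, None, 0]
filled = fill_with_mode(hail)
print(filled)
```

Note that the notebook computes each mode over all 307 rows before the train/test split, so the test rows contribute to the imputed values.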


#### One-Hot Encoding for categorical attributes

In [68]:
encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False)  # sparse_output=False returns a dense array

X_encoded_dense = encoder.fit_transform(X_original)

print(f"Shape of X after One-Hot Encoding: {X_encoded_dense.shape}")
print(f"Element dtype after One-Hot Encoding: {X_encoded_dense.dtype}")
print(f"First 2 rows of X_encoded_dense:\n{X_encoded_dense[:2]}")

Shape of X after One-Hot Encoding: (307, 98)
Element dtype after One-Hot Encoding: float64
First 2 rows of X_encoded_dense:
[[0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0.
  0. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 1. 0. 0. 1. 1.
  0. 1. 0. 1. 0. 0. 0. 1. 0. 1. 0. 0. 0. 1. 0. 1. 0. 0. 0. 1. 0. 1. 1. 0.
  1. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1.
  0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 0. 1. 0. 1. 0. 0.
  0. 0. 0. 1. 0. 1. 0. 0. 1. 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 1. 0. 0. 1. 1.
  0. 1. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 1. 0. 1. 1. 0.
  1. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1.
  0. 0.]]
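
The expansion from 35 categorical columns to 98 binary columns can be illustrated with a small pure-Python sketch. The function `one_hot_encode` and the toy rows are hypothetical and are not how `OneHotEncoder` is implemented; they only show that each distinct value of a column gets its own bit, so every encoded row has exactly one bit set per original column.

```python
def one_hot_encode(rows):
    """One-hot encode a list of equally sized rows of categorical values.

    Hypothetical helper for illustration only.
    """
    n_cols = len(rows[0])
    # The sorted distinct values of each column define its bit positions.
    categories = [sorted({row[j] for row in rows}) for j in range(n_cols)]
    encoded = []
    for row in rows:
        bits = []
        for j, value in enumerate(row):
            bits.extend(1 if value == cat else 0 for cat in categories[j])
        encoded.append(bits)
    return categories, encoded


# Toy data: two categorical columns with made-up values.
rows = [
    ["lt-normal", "yes"],
    ["normal", "no"],
    ["gt-normal", "yes"],
]
categories, encoded = one_hot_encode(rows)
for bits in encoded:
    print(bits)
```

In this toy case a column with 3 distinct values and a column with 2 distinct values yield 5 bits per row, in the same way that the 35 attributes above yield 98 bits.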


### Splitting the Data into Training and Test Sets

In [69]:
X_encoded_uint8 = X_encoded_dense.astype(np.uint8)
y_str = y_original.astype(str).tolist()

# Note: test_size=0.7 holds out 70% of the data for testing, leaving only 30% for training
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded_uint8, y_str, test_size=0.7, random_state=42
)

print(f"Training set size (X_train): {X_train.shape[0]} samples, {X_train.shape[1]} bits")
print(f"Test set size (X_test): {X_test.shape[0]} samples, {X_test.shape[1]} bits")
print(f"Element dtype of X_train: {X_train.dtype}")


Training set size (X_train): 92 samples, 98 bits
Test set size (X_test): 215 samples, 98 bits
Element dtype of X_train: uint8
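
The 92/215 split reported above follows directly from `test_size=0.7`. The sketch below reproduces the arithmetic, assuming the test count is the fraction rounded up and the training set receives the remainder (which, as far as I know, is how `train_test_split` behaves when only `test_size` is given). The helper `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumes the test count is rounded up and the training set gets the rest.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(307, 0.7)
print(n_train, n_test)  # 92 215, matching the cell output above

# For comparison, a 30% test split would leave far more data for training.
print(split_sizes(307, 0.3))
```

With only 92 training samples spread over 19 classes, several of the rarer classes can end up with very few training examples, or none at all.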


### Training the WiSARD Model

In [70]:
addressSize = 8

wsd = wp.Wisard(addressSize, ignoreZero=False, verbose=True)

print(f"WiSARD initialized with addressSize={addressSize}")
wsd.train(X_train, y_train)

predictions_str = wsd.classify(X_test)

y_pred_np = np.array(predictions_str)
y_test_np = np.array(y_test)

WiSARD initialized with addressSize=8
training 92 of 92
classifying 215 of 215
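
For readers unfamiliar with WiSARD, the following is a minimal pure-Python sketch of the general idea behind the cell above: each class has a discriminator made of RAM nodes, each RAM is addressed by a tuple of `address_size` input bits, training stores the addresses seen, and classification counts how many RAMs recognise the input. The class `TinyWisard` is hypothetical and is not the `wisardpkg` implementation, which may differ in details such as how leftover bits and ties are handled.

```python
import random


class TinyWisard:
    """Illustrative WiSARD: one discriminator (a list of RAM sets) per class."""

    def __init__(self, address_size, input_size, seed=0):
        self.address_size = address_size
        # Fixed pseudo-random permutation of bit positions, shared by all classes.
        rng = random.Random(seed)
        self.mapping = list(range(input_size))
        rng.shuffle(self.mapping)
        # discriminators[label][k] is the set of addresses RAM k saw in training.
        self.discriminators = {}

    def _addresses(self, bits):
        """Split the permuted input into tuples of address_size bits, one per RAM."""
        permuted = [bits[i] for i in self.mapping]
        step = self.address_size
        # The last tuple is shorter when input_size is not a multiple of step.
        return [tuple(permuted[s:s + step]) for s in range(0, len(permuted), step)]

    def train(self, X, y):
        for bits, label in zip(X, y):
            addresses = self._addresses(bits)
            rams = self.discriminators.setdefault(label, [set() for _ in addresses])
            for ram, address in zip(rams, addresses):
                ram.add(address)

    def classify(self, X):
        predictions = []
        for bits in X:
            addresses = self._addresses(bits)
            # Score of a class = number of its RAMs that have seen this address.
            scores = {
                label: sum(addr in ram for ram, addr in zip(rams, addresses))
                for label, rams in self.discriminators.items()
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


# Toy 8-bit patterns (made up): "left" has ones at the start, "right" at the end.
X = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
]
y = ["left", "left", "right", "right"]
model = TinyWisard(address_size=2, input_size=8)
model.train(X, y)
print(model.classify(X))
```

Under this sketch, a 98-bit input with an address size of 8 would be split into 13 tuples (12 of 8 bits plus one of 2 bits), so each class discriminator would hold 13 RAMs.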


### Classification Report

In [71]:
print(classification_report(
    y_test_np, y_pred_np,
    labels=np.unique(y_test)
))

                             precision    recall  f1-score   support

               2-4-d-injury       0.00      0.00      0.00         1
        alternarialeaf-spot       0.41      1.00      0.58        28
                anthracnose       1.00      0.79      0.88        14
           bacterial-blight       0.80      0.57      0.67         7
          bacterial-pustule       1.00      0.83      0.91         6
                 brown-spot       1.00      0.68      0.81        31
             brown-stem-rot       0.87      0.93      0.90        14
               charcoal-rot       1.00      0.62      0.77         8
              cyst-nematode       1.00      1.00      1.00         3
diaporthe-pod-&-stem-blight       1.00      0.80      0.89         5
      diaporthe-stem-canker       1.00      0.88      0.93         8
               downy-mildew       1.00      0.57      0.73         7
         frog-eye-leaf-spot       0.79      0.58      0.67        26
           herbicide-injury      

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
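
The columns of the report can be checked by hand. The sketch below shows how precision, recall and F1 follow from true-positive, false-positive and false-negative counts, and how an undefined ratio (for example, precision for a class that is never predicted) can be replaced by a fallback value, which is the situation the warnings above appear to refer to. The helper names and the example counts are hypothetical.

```python
def f1_score_from(precision, recall, zero_division=0.0):
    """Harmonic mean of precision and recall, with a fallback when both are 0."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom > 0 else zero_division


def precision_recall_f1(tp, fp, fn, zero_division=0.0):
    """Per-class metrics from raw counts; undefined ratios use zero_division."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else zero_division
    recall = tp / (tp + fn) if (tp + fn) > 0 else zero_division
    return precision, recall, f1_score_from(precision, recall, zero_division)


# Check two rows of the report using their (rounded) precision and recall.
print(round(f1_score_from(0.41, 1.00), 2))  # alternarialeaf-spot row: 0.58
print(round(f1_score_from(1.00, 0.68), 2))  # brown-spot row: 0.81

# Hypothetical counts for a class with one test sample that is never predicted:
# precision is 0/0, so the fallback value is reported instead.
print(precision_recall_f1(tp=0, fp=0, fn=1))
```

The alternarialeaf-spot row (recall 1.00 with precision 0.41) means every true sample of that class was found, but most predictions of that class were actually other classes.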


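### WiSARD vs. MLP

The model reached evaluation metrics similar to those of the MLP. The weak results obtained with both methods are probably due to the small number of samples: only 92 instances were used for training across 19 classes, and some classes (such as 2-4-d-injury, with a single instance) are barely represented.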