Training a Kolmogorov-Arnold Network (KAN) for Lithology Classification

In [6]:
import imodelsx
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pandas import set_option
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, f1_score

from imodelsx import KANClassifier

In [None]:
# Read the preprocessed feature vectors
dataset = '/content/MLGeo/KAN/feature_vectors_preprocessed.csv'
training_data = pd.read_csv(dataset)
training_data

In [None]:
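# Separate the features (all columns but the last) from the facies labels (last column)
feature_vectors = training_data.iloc[:,:-1]
correct_facies_labels = training_data.iloc[:,-1]

# Standardize each feature to zero mean and unit variance
scaler = preprocessing.StandardScaler().fit(feature_vectors)
scaled_features = scaler.transform(feature_vectors)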

# Split into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, correct_facies_labels, test_size=0.2, random_state=10)
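
Note that the cell above fits the scaler on the full dataset before splitting, so test-set statistics influence the scaling. A minimal sketch of the leakage-free order (split first, then fit the scaler on the training rows only) is shown below. It uses synthetic stand-in data and hypothetical variable names, so it does not overwrite anything defined above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and facies labels (hypothetical data)
rng = np.random.default_rng(10)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 4))
y = rng.integers(0, 3, size=200)

# Split first, so the test rows never influence the scaling statistics
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=10)

# Fit the scaler on the training rows only, then apply it to both sets
demo_scaler = StandardScaler().fit(X_tr)
X_tr_scaled = demo_scaler.transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

# Training-set means are ~0 by construction; test-set means are only close to 0
print(X_tr_scaled.mean(axis=0).round(3))
print(X_te_scaled.mean(axis=0).round(3))
```

Switching the notebook to this order would change the scaled values slightly, so the scores reported below would not be reproduced exactly.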

It's time to train KAN on our training data. As with other ML models and neural networks, KAN has a few parameters we should be aware of:

* Model hyperparameters: hidden_layer_size, regularize_activation, regularize_entropy, regularize_ridge
* Training configuration: batch_size, lr (learning rate), weight_decay, gamma

We will start with the default parameters and see how accurate the model is. We will use the weighted F1-score to evaluate it on the training and test sets.
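
To make the metric concrete: the weighted F1-score is the mean of the per-class F1-scores, weighted by the number of true samples in each class. A small sketch with toy labels (hypothetical, not taken from the facies data) checks this against scikit-learn:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy labels (hypothetical): three classes with unequal support
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 2, 2, 2, 0, 2])

# Per-class F1 and the number of true samples in each class
per_class = f1_score(y_true, y_pred, average=None)
support = np.bincount(y_true)

# Weighted F1 = support-weighted mean of the per-class F1-scores
manual = np.sum(per_class * support) / support.sum()
weighted = f1_score(y_true, y_pred, average='weighted')

print(per_class, manual, weighted)
```

Because of the weighting, classes with many samples dominate the score, which matters when the lithology classes are imbalanced.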

In [None]:
# Build model
model = KANClassifier(device='cuda')

# Fit with training data
model.fit(X_train, y_train)

# Evaluate the weighted F1-score on the training set
y_train_pred = model.predict(X_train)
f1_train = f1_score(y_train, y_train_pred, average='weighted')
print(f1_train)

# Evaluate the weighted F1-score on the test set
y_test_pred = model.predict(X_test)
f1_test = f1_score(y_test, y_test_pred, average='weighted')
print(f1_test)

Our first result is a training F1-score of 0.29 and a test F1-score of 0.32. Definitely not accurate! Let's try hyperparameter tuning. Since we have two categories of parameters (listed above), we first tune the model hyperparameters.
Currently, imodelsx doesn't provide a hyperparameter tuning utility like the ones in scikit-learn. So, let's build a grid search from scratch using simple for-loops.

In [None]:
# Hyperparameter grid
layers = [256, 512, 1024]
activation = [0.4, 0.5, 0.6]
entropy = [0.4, 0.5, 0.6]
ridge = [0.05, 0.1, 0.2]

# Fit and score one model per hyperparameter combination
for hidden_layer_size in layers:
  for regularize_activation in activation:
    for regularize_entropy in entropy:
      for regularize_ridge in ridge:
        model = KANClassifier(hidden_layer_size=hidden_layer_size, device='cuda',
                              regularize_activation=regularize_activation,
                              regularize_entropy=regularize_entropy,
                              regularize_ridge=regularize_ridge)

        model.fit(X_train, y_train)

        y_train_pred = model.predict(X_train)
        f1_train = f1_score(y_train, y_train_pred, average='weighted')

        y_test_pred = model.predict(X_test)
        f1_test = f1_score(y_test, y_test_pred, average='weighted')

        print(hidden_layer_size, regularize_activation, regularize_entropy, regularize_ridge, f1_train, f1_test)
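
The nested loops above can also be flattened with `itertools.product`, collecting the scores in a table instead of printing them. The sketch below is one possible way to do that. `grid_search` is a hypothetical helper, and the demo uses synthetic data with `LogisticRegression` as a stand-in so that it runs without a GPU or imodelsx.

```python
from itertools import product

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def grid_search(make_model, grid, X_tr, y_tr, X_te, y_te):
    """Fit one model per parameter combination and return the scores as a table."""
    rows = []
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        clf = make_model(**params)  # fresh model for every combination
        clf.fit(X_tr, y_tr)
        rows.append({
            **params,
            'f1_train': f1_score(y_tr, clf.predict(X_tr), average='weighted'),
            'f1_test': f1_score(y_te, clf.predict(X_te), average='weighted'),
        })
    # Best test score first
    return pd.DataFrame(rows).sort_values('f1_test', ascending=False, ignore_index=True)


# Stand-in demo: synthetic data and LogisticRegression instead of KANClassifier
X_demo, y_demo = make_classification(n_samples=300, n_features=6, n_informative=4,
                                     n_classes=3, random_state=10)
Xd_tr, Xd_te, yd_tr, yd_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=10)

demo_grid = {'C': [0.01, 0.1, 1.0], 'max_iter': [200, 500]}
results = grid_search(lambda **p: LogisticRegression(**p), demo_grid,
                      Xd_tr, yd_tr, Xd_te, yd_te)
print(results.head())
```

For the KAN grid above, `make_model` would be something like `lambda **p: KANClassifier(device='cuda', **p)`, with `hidden_layer_size`, `regularize_activation`, `regularize_entropy` and `regularize_ridge` as the grid keys. Keep in mind that choosing hyperparameters by their test score, as the loops above do, makes the reported test F1-score optimistic; a separate validation split would avoid that.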

From this grid search, we manage to improve the training F1-score to 0.48 and the test F1-score to 0.50, using the following hyperparameters:

* hidden_layer_size: 1024
* regularize_activation: 0.5
* regularize_entropy: 0.4
* regularize_ridge: 0.2

That is a 60.7% improvement in the F1-score over the default setup, but the score is still low! Remember, we still have another set of parameters to tune: the training configuration.
So, let's build another grid search.
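
One way to arrive at the 60.7% figure is to compare the mean of the training and test F1-scores before and after tuning. This is an assumption on our part about how the figure was computed, and the sketch uses the rounded scores quoted above:

```python
# F1-scores quoted above (rounded to two decimals)
default_train, default_test = 0.29, 0.32
tuned_train, tuned_test = 0.48, 0.50

# Assumption: the quoted figure compares the mean of the train and test F1-scores
default_mean = (default_train + default_test) / 2
tuned_mean = (tuned_train + tuned_test) / 2

# Relative improvement in percent
improvement = (tuned_mean - default_mean) / default_mean * 100

print(f"{improvement:.1f}%")  # 60.7%
```

Taken separately, the rounded scores give a relative gain of about 66% on the training set and about 56% on the test set.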

In [None]:
# Grid of training configuration
batch = [64, 128, 512]
learning = [0.05, 0.07, 0.1]
weights = [0.005, 0.01, 0.03]
gammas = [0.4, 0.5, 0.6]

# Varying training configuration
for batch_size in batch:
  for lr in learning:
    for weight_decay in weights:
      for gamma in gammas:
        # Build a fresh model with the tuned hyperparameters for each
        # configuration, so no fitted state can carry over between runs
        model = KANClassifier(hidden_layer_size=1024, regularize_activation=.5,
                              regularize_entropy=.4, regularize_ridge=.2,
                              device='cuda')

        model.fit(X_train, y_train, batch_size=batch_size, lr=lr,
                  weight_decay=weight_decay, gamma=gamma)

        y_train_pred = model.predict(X_train)
        f1_train = f1_score(y_train, y_train_pred, average='weighted')

        y_test_pred = model.predict(X_test)
        f1_test = f1_score(y_test, y_test_pred, average='weighted')

        print(batch_size, lr, weight_decay, gamma, f1_train, f1_test)


From this second grid search, we finally manage to improve the training F1-score to 0.58 and the test F1-score to 0.60, using the following training configuration:

* batch_size: 64
* lr: 0.07
* weight_decay: 0.01
* gamma: 0.6

That is a 92.9% improvement in the F1-score over the default setup. Neat!
Now we have a final model.

In [None]:
# Final KAN model with the tuned hyperparameters
model = KANClassifier(hidden_layer_size=1024, regularize_activation=.5,
                      regularize_entropy=.4, regularize_ridge=.2,
                      device='cuda')

# Fit on the training data with the tuned training configuration
model.fit(X_train, y_train, batch_size=64, lr=.07,
          weight_decay=.01, gamma=.6)