## Few-shot learning

Given the limited number of positive (fraudulent) data points available, we explored few-shot learning, an approach designed to learn from very few labelled examples per class. We implemented a classifier using prototypical networks as the underlying architecture.
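To make the mechanics concrete, here is a minimal pure-Python sketch of how a prototypical network classifies a query once inputs have been embedded. Each class prototype is the mean of that class's embedded support examples, and a query's class probabilities are a softmax over the negative squared Euclidean distances to the prototypes. Training minimises the negative log-probability of each query's true class. The toy 2-D points stand in for the outputs of the learned feature extractor, and all function names are hypothetical; this is not the `FSLTrainer` implementation.

```python
import math
from collections import defaultdict

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def compute_prototypes(support_embeddings, support_labels):
    """Average the embedded support examples of each class into one prototype."""
    grouped = defaultdict(list)
    for emb, label in zip(support_embeddings, support_labels):
        grouped[label].append(emb)
    return {
        label: [sum(dim) / len(embs) for dim in zip(*embs)]
        for label, embs in grouped.items()
    }

def class_probabilities(query_embedding, prototypes):
    """Softmax over negative squared distances from the query to each prototype."""
    logits = {label: -squared_euclidean(query_embedding, proto)
              for label, proto in prototypes.items()}
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(l - max_logit) for label, l in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def predict(query_embedding, prototypes):
    """Assign the query to the class of its nearest prototype."""
    return min(prototypes,
               key=lambda label: squared_euclidean(query_embedding, prototypes[label]))

def episode_loss(query_embeddings, query_labels, prototypes):
    """Mean negative log-probability of the true class over the query set."""
    losses = [-math.log(class_probabilities(q, prototypes)[y])
              for q, y in zip(query_embeddings, query_labels)]
    return sum(losses) / len(losses)

# Toy 2-D "embeddings"; in the real model these come from the feature extractor.
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = [0, 0, 1, 1]  # 0 = legitimate, 1 = fraudulent
protos = compute_prototypes(support, labels)
print(protos)                       # {0: [0.0, 1.0], 1: [5.0, 4.0]}
print(predict([4.5, 3.0], protos))  # 1
```

Because classification only needs a distance to each class mean, a new class or a rare class can be handled from a handful of support examples, which is what makes the approach attractive for the scarce fraud labels.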

We hypothesized that a few-shot classifier would perform comparably to the earlier supervised learning models while using fewer samples. To test this, we evaluate the classifier's performance for different values of k, where k is the number of examples per class that the model sees in each training episode (the support set size).

Setup

In [None]:
import warnings

from get_processed_data import get_processed_data
from FSLMethods import form_datasets
from FSLTrainer import FSLTrainer
from show_metrics import show_metrics

warnings.filterwarnings('ignore')

### Preparing data

Train-Test-Validation split

In [None]:
df, X_train, y_train, X_val, y_val, X_test, y_test = get_processed_data()
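The internals of `get_processed_data` are not shown here. With so few fraudulent rows, a stratified split (one that preserves each class's share in every subset) is one common way to make sure the validation and test sets still contain positives. The sketch below illustrates that idea; the function name and the 60/20/20 proportions are assumptions, not the notebook's actual settings.

```python
import random
from collections import defaultdict

def stratified_split(X, y, train_frac=0.6, val_frac=0.2, seed=42):
    """Split (X, y) into train/validation/test sets, preserving each class's share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    splits = {"train": [], "val": [], "test": []}
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before cutting
        n_train = int(len(indices) * train_frac)
        n_val = int(len(indices) * val_frac)
        splits["train"] += indices[:n_train]
        splits["val"] += indices[n_train:n_train + n_val]
        splits["test"] += indices[n_train + n_val:]

    return tuple(
        ([X[i] for i in idx], [y[i] for i in idx])
        for idx in (splits["train"], splits["val"], splits["test"])
    )

X = [[float(i)] for i in range(100)]
y = [1 if i < 10 else 0 for i in range(100)]  # 10% positive class
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = stratified_split(X, y)
print(sum(y_tr), sum(y_va), sum(y_te))  # 6 2 2
```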


### Model training (meta-learning / episodic training)

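Episodic training simulates the few-shot setting to train the prototypical network: the training data is organized into episodes, each resembling a few-shot task with a small labelled support set (k examples per class) and a query set that the model learns to classify.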
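A minimal sketch of drawing one episode for the two-class fraud problem, assuming each episode takes `n_shot` support and `n_query` query examples per class, without replacement. `sample_episode` and `n_query` are hypothetical names; how `FSLTrainer` actually builds its episodes is not shown in this notebook.

```python
import random
from collections import defaultdict

def sample_episode(X, y, n_shot, n_query, rng):
    """Draw one few-shot episode: n_shot support and n_query query examples per class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    support, query = [], []
    for label, examples in sorted(by_class.items()):
        # Sample without replacement so support and query never overlap.
        chosen = rng.sample(examples, n_shot + n_query)
        support += [(x, label) for x in chosen[:n_shot]]
        query += [(x, label) for x in chosen[n_shot:]]
    return support, query

rng = random.Random(0)
X = [[float(i)] for i in range(50)]
y = [1 if i < 15 else 0 for i in range(50)]
support, query = sample_episode(X, y, n_shot=10, n_query=5, rng=rng)
print(len(support), len(query))  # 20 10
```

Since each class must supply `n_shot + n_query` distinct examples, the size of the minority (fraudulent) class caps how large k can be without resampling; `random.Random.sample` raises `ValueError` when a class is too small.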

First, we determine whether to apply feature selection, which resampling method to use (if any), and what embedding size the prototypical network should use. During this search, k is held constant at 10.
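The `'oversampling'` and `'undersampling'` options searched below presumably refer to random resampling that balances the two classes. Since `form_datasets` is not shown, the following is only an assumption about what they do, with hypothetical function names: oversampling duplicates randomly chosen minority rows, while undersampling discards majority rows.

```python
import random

def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes have the same size."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_rows)
    n_extra = max(n_majority - len(minority_rows), 0)
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return X + extra, y + [minority] * n_extra

def random_undersample(X, y, minority=1, seed=0):
    """Keep a random subset of majority rows as large as the minority class."""
    rng = random.Random(seed)
    minority_pairs = [(x, label) for x, label in zip(X, y) if label == minority]
    majority_pairs = [(x, label) for x, label in zip(X, y) if label != minority]
    kept = minority_pairs + rng.sample(majority_pairs, len(minority_pairs))
    return [x for x, _ in kept], [label for _, label in kept]

X = [[float(i)] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(len(y_over), sum(y_over))    # 16 8
print(len(y_under), sum(y_under))  # 4 2
```

Resampling is normally applied to the training split only, so that the validation and test sets keep the real class ratio.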

In [None]:
config_1 = {
    'n_shot': [10],
    'embedding_size': [2 ** x for x in range(2, 6)] ## Try {4, 8, 16, 32}
}
results_1 = {} ## key:value = (feature_selection, sampling_method):(recall, f1-score)
idx = 0

for feature_selection in [True, False]:
    for sampling_method in ['', 'oversampling', 'undersampling', 'smote']: ## '' = no resampling
        print(f'##### Run {idx} #####')
        print(f'Feature selection: {feature_selection}, sampling method: {sampling_method or "none"}')
        train_set, validation_set, test_set = \
            form_datasets(X_train, y_train, X_val, y_val, X_test, y_test,
                          feature_selection = feature_selection, sampling_method = sampling_method)
        trainer_1 = FSLTrainer(train_set, validation_set, test_set, config_1)

        ## Tune the embedding size on the validation set, selecting by recall
        curr_results, best_config = trainer_1.tune(metric = 'recall')
        best_metrics = curr_results[best_config][0]
        results_1[(feature_selection, sampling_method)] = (best_metrics.recall, best_metrics.f1_score)
        print(f'Best embedding size: {best_config[1]}, Recall: {best_metrics.recall}, '
              f'Precision: {best_metrics.precision}, F1: {best_metrics.f1_score}, AUC: {best_metrics.auc}')
        idx += 1

From the above results, the best recall was obtained with feature selection, SMOTE resampling, and an embedding size of 8 (i.e. the prototypical network's feature extractor maps each input to an 8-dimensional vector). We now vary k with these settings fixed.
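SMOTE creates synthetic minority examples rather than duplicating existing ones: it picks a minority row, picks one of its k nearest minority neighbours, and places a new point at a random position on the line segment between them. `form_datasets` may well rely on a library implementation such as imbalanced-learn's `SMOTE`; the sketch below, with hypothetical names, only illustrates the interpolation step.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def smote(minority_rows, n_synthetic, k_neighbors=5, seed=0):
    """Generate synthetic minority rows by interpolating towards random nearest neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority_rows)
        # The k nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row: squared_distance(base, row),
        )[:k_neighbors]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote(minority, n_synthetic=3, k_neighbors=2)
print(len(new_rows))  # 3
```

Because every synthetic row lies between two real fraudulent rows, SMOTE fills in the minority region of feature space instead of repeating identical points, which can help the embedding generalise beyond the few original positives.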

In [None]:
train_set, validation_set, test_set = \
    form_datasets(X_train, y_train, X_val, y_val, X_test, y_test, 
                  feature_selection = True, sampling_method = 'smote')

config_2 = {
    'n_shot': [4, 8, 16, 24, 32, 48, 64],
    'embedding_size': [8] ## Best embedding size from the previous search
}

trainer_2 = FSLTrainer(train_set, validation_set, test_set, config_2)
results, best_config = trainer_2.tune(metric = 'recall') ## Key:Value = (k, embedding_size):(metric, model_params)

Based on the validation recall, we take the optimal k to be 48, with an embedding size of 8.

### Model evaluation

We now determine the minimum k required to match the performance of the supervised learning models by evaluating each previously trained model on the test set, in increasing order of k. Based on the performance of our other models, the recall threshold is set to 0.75.
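Recall is the fraction of actual positives (fraud cases) that the model flags, TP / (TP + FN). The minimum-k search in the next cell amounts to scanning k in increasing order and stopping at the first value whose recall reaches the threshold. A minimal sketch with hypothetical helper names; the recall values in the last line are made up for illustration and are not results from this notebook.

```python
def recall(actuals, predictions, pos_label=1):
    """Recall = TP / (TP + FN): the share of actual positives that were flagged."""
    tp = sum(1 for a, p in zip(actuals, predictions) if a == pos_label and p == pos_label)
    fn = sum(1 for a, p in zip(actuals, predictions) if a == pos_label and p != pos_label)
    return tp / (tp + fn) if tp + fn else 0.0

def minimum_k(recall_by_k, threshold):
    """Return the smallest k whose recall meets the threshold, or None if none does."""
    for k in sorted(recall_by_k):
        if recall_by_k[k] >= threshold:
            return k
    return None

print(recall([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))  # 0.75
# Hypothetical recall values, for illustration only.
print(minimum_k({4: 0.50, 8: 0.70, 16: 0.78, 32: 0.80}, threshold=0.75))  # 16
```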

In [None]:
## Evaluate the trained model for each k on the test set, in increasing order of k
threshold = 0.75 ## Recall threshold stated above
models_by_k = sorted(
    ((k, embedding_size, model_params) for (k, embedding_size), (_, model_params) in results.items()),
    key = lambda x: x[0]
)

for curr_k, curr_size, curr_params in models_by_k:
    curr_config = {
        'n_shot': curr_k,
        'embedding_size': curr_size
    }
    test_metrics = trainer_2.test(curr_params, curr_config)
    if test_metrics.recall >= threshold:
        print(f'Minimum k required to match performance threshold = {curr_k}')
        show_metrics(actual = test_metrics.actuals, predicted = test_metrics.predictions, pos_label = 1, neg_label = 0)
        break
else:
    print(f'No value of k reached a test recall of {threshold}')