## Evaluate Base Model -- Development of a Transaction Categorization Model

This notebook evaluates the base transaction categorization model on the held-out `test` dataset. In this scenario, all categories have the same importance, adjusted by their frequency.

### Tasks:
 - [ ] Load test dataset.
 - [ ] Load models:
     - [ ] Base model.
     - [ ] Clothing model.
 - [ ] Evaluate models on the test set.
     - [ ] Generate confusion matrix.
     - [ ] Check metrics on clothing category.
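
The last two tasks can be illustrated with a minimal, library-free sketch. It is not the project's `generate_confusion_matrix` helper, whose signature is not shown in this notebook. The function names and the category values (`clothing`, `food`, `transport`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs: a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))


def category_precision_recall(y_true, y_pred, category):
    """One-vs-rest precision and recall for a single category."""
    counts = confusion_counts(y_true, y_pred)
    true_positives = counts[(category, category)]
    predicted = sum(n for (_, p), n in counts.items() if p == category)
    actual = sum(n for (t, _), n in counts.items() if t == category)
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical toy labels, not taken from the test dataset.
y_true = ["clothing", "clothing", "food", "food", "transport"]
y_pred = ["clothing", "food", "food", "clothing", "transport"]

matrix = confusion_counts(y_true, y_pred)
clothing_precision, clothing_recall = category_precision_recall(y_true, y_pred, "clothing")
print(matrix[("clothing", "food")], clothing_precision, clothing_recall)
```

Keeping the matrix as pair counts makes it easy to read off a single category without building a dense array.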

## Libraries and Configurations

In [1]:
import pandas as pd

import mlflow
from mlflow.tracking import MlflowClient
from sklearn.preprocessing import LabelEncoder

import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import HTML, display

from application.code.core.configurations import configs
from application.code.adapters.storage import read_dataset
from application.code.core.model_evaluation import (compute_multiclass_classification_metrics,
                                                    generate_feature_importance_report,
                                                    generate_confusion_matrix,
                                                    plot_folds_metrics)

from application.code.adapters.mlflow_adapter import (get_mlflow_artifact_content,
                                                      get_published_model)
from application.code.core.feature_engineering import (format_string_columns, 
                                                       standardize_labels)

sns.set_style("whitegrid")

## Constants

In [2]:
TARGET_COLUMN = 'grupo_estabelecimento'

## Local Functions

In [3]:
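def encode_label(label_encoder: LabelEncoder, label: str) -> int:
    """Encode a single label, returning -1 when the encoder has never seen it."""
    try:
        # transform returns a one-element array; take the scalar explicitly.
        return int(label_encoder.transform([label])[0])
    except ValueError:
        # LabelEncoder raises ValueError for labels it was not fitted on.
        return -1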

## MLflow Settings

In [4]:
mlflow.set_tracking_uri(configs.mlflow.uri)
mlflow.set_experiment(configs.mlflow.experiment_name);

## Load Dataset

The `test` dataset is loaded to perform the final evaluation.

In [5]:
df = read_dataset(base_path=configs.datasets.base_path, stage='raw', file_name='test')

display(HTML('<h4>Dataset</h4>'))
print(f'Records: {len(df)}')

df = df.drop_duplicates()
display(HTML('<h4>Deduplicated Dataset</h4>'))
print(f'Records: {len(df)}')

Records: 1011


Records: 1004


To create the model and run experiments, only the `training` dataset is used. Evaluation during development relies on time-oriented `validation` datasets built with the same methodology used to create the `test` dataset.

Three pairs of `training` and `validation` sets are created, each representing a fold. This yields an efficacy measurement together with a notion of its variance.
Using `validation` sets avoids evaluating on the `test` set several times. Ideally, the `test` set is used only once, for the final assessment.
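
The fold construction is not shown in this notebook. The sketch below is one possible reading of "time-oriented" folds, assuming an expanding-window scheme in which each validation block immediately follows its training window. The `time_ordered_folds` function and the `date` field are hypothetical names introduced for illustration.

```python
from datetime import date, timedelta


def time_ordered_folds(records, n_folds=3, date_key="date"):
    """Build expanding-window (train, validation) pairs ordered by time."""
    ordered = sorted(records, key=lambda r: r[date_key])
    block = len(ordered) // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough records for the requested number of folds")
    folds = []
    for k in range(1, n_folds + 1):
        train = ordered[: k * block]
        # The last fold absorbs any remainder so that no record is dropped.
        end = (k + 1) * block if k < n_folds else len(ordered)
        validation = ordered[k * block : end]
        folds.append((train, validation))
    return folds


# Hypothetical toy data: one record per day.
records = [{"date": date(2023, 1, 1) + timedelta(days=i), "id": i} for i in range(12)]
folds = time_ordered_folds(records)
for train, validation in folds:
    # Every training record precedes every validation record.
    print(len(train), len(validation), train[-1]["date"] < validation[0]["date"])
```

Because the training window always ends before its validation block starts, no fold lets the model see transactions from the period it is evaluated on.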

## Load Models

The models are retrieved from the MLflow server and used as they would be in production.

In [6]:
base_model = get_published_model(model_name=configs.mlflow.base_model_name,
                                 stage="Staging")

## Evaluate Model

Compute predictions and extract the numeric version of the predicted labels.

In [7]:
predictions = base_model.predict(df)
numeric_predictions = [p for _, p in predictions]

Preprocess the raw labels so they can be compared with the model-generated labels.

In [9]:
labels = (
    df
    .copy()
    [[TARGET_COLUMN]]
    .pipe(format_string_columns, columns=[TARGET_COLUMN])
    .pipe(standardize_labels)
    [TARGET_COLUMN]
    .to_list()
)

Extract the label encoder from the wrapped model.

In [17]:
label_encoder = base_model._model_impl.label_encoder

Encode the labels as numbers. Labels the encoder has never seen are mapped to -1.

In [18]:
numeric_labels = [encode_label(label_encoder, l) for l in labels]
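
Calling `transform` once per label is fine for a test set of this size. As a hedged alternative, the same mapping can be built once from the encoder's fitted classes, since `LabelEncoder.transform` returns each label's index in `classes_`. In the sketch below, the `classes` list stands in for `label_encoder.classes_`. The helper names and category values are hypothetical.

```python
def build_label_lookup(classes):
    """Map each known label to its position, mirroring LabelEncoder's encoding."""
    return {label: index for index, label in enumerate(classes)}


def encode_labels(lookup, labels, unknown=-1):
    """Encode labels with a dictionary lookup; unseen labels map to `unknown`."""
    return [lookup.get(label, unknown) for label in labels]


# Stand-in for label_encoder.classes_ (sorted, as LabelEncoder stores them).
classes = ["clothing", "food", "transport"]
lookup = build_label_lookup(classes)

encoded = encode_labels(lookup, ["food", "clothing", "pharmacy"])
print(encoded)  # [1, 0, -1]
```

Since -1 is never a valid class index, a record with an unseen label can never match a prediction and counts as an error.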

Compute the metrics.

In [15]:
metrics = compute_multiclass_classification_metrics(numeric_labels, numeric_predictions)

In [28]:
metrics_df = (
    pd
    .DataFrame([metrics])
    .T
    .reset_index()
    .set_axis(['metric', 'value'], axis=1)
)
metrics_df

|   | metric | value |
|---|---|---|
| 0 | macro_precision | 0.161088 |
| 1 | macro_recall | 0.162674 |
| 2 | macro_f1 | 0.157204 |
| 3 | micro_precision | 0.400398 |
| 4 | micro_recall | 0.400398 |
| 5 | micro_f1 | 0.400398 |
| 6 | weighted_precision | 0.365754 |
| 7 | weighted_recall | 0.400398 |
| 8 | weighted_f1 | 0.376822 |
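
In the table above, the micro-averaged precision, recall, and F1 are identical, and they match the weighted recall. For single-label multiclass data this is expected: all four reduce to plain accuracy. The value is consistent with 402 of the 1004 deduplicated records being classified correctly. The much lower macro scores suggest that some categories are predicted poorly. The sketch below is not the project's `compute_multiclass_classification_metrics`, whose implementation is not shown here. It uses hypothetical toy data to show how the three averaging schemes diverge on an imbalanced label set.

```python
from collections import Counter


def averaged_f1(y_true, y_pred):
    """Macro, weighted, and micro F1 for single-label multiclass data."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true records per class
    predicted = Counter(y_pred)    # predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    per_class = {}
    for c in classes:
        precision = hits[c] / predicted[c] if predicted[c] else 0.0
        recall = hits[c] / support[c] if support[c] else 0.0
        total = precision + recall
        per_class[c] = 2 * precision * recall / total if total else 0.0

    macro = sum(per_class.values()) / len(classes)                       # every class counts equally
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)  # weighted by support
    micro = sum(hits.values()) / len(y_true)                             # equals accuracy here
    return {"macro_f1": macro, "weighted_f1": weighted, "micro_f1": micro}


# Hypothetical imbalanced toy data: class 0 dominates, class 2 is never predicted.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0]

result = averaged_f1(y_true, y_pred)
print({name: round(value, 3) for name, value in result.items()})
# {'macro_f1': 0.405, 'weighted_f1': 0.529, 'micro_f1': 0.6}
```

The toy result shows the same ordering as the table (macro below weighted below micro), which is the typical signature of a model that does better on frequent categories than on rare ones.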


## Concluding Remarks
