## Custom Model Training, Inference and Evaluation

This notebook runs the custom model training Python script with the best parameters found in the first notebook, then evaluates the resulting model on the train set and on both test sets.

In [1]:
# importing packages/modules

import pandas as pd
import numpy as np
import sys
sys.path.append("..")
from src.train import train_model
from src.score import score
from sklearn.metrics import f1_score
import mlflow

In [2]:
# defining dictionary to map our predictions to, for our custom score function

lithology_keys = {
    30000: 0,
    65030: 1,
    65000: 2,
    80000: 3,
    74000: 4,
    70000: 5,
    70032: 6,
    88000: 7,
    86000: 8,
    99000: 9,
    90000: 10,
    93000: 11
}

In [3]:
# loading all available data 

X_train = pd.read_csv("../data/train.csv", sep=";")
target = X_train['FORCE_2020_LITHOFACIES_LITHOLOGY']
X_train = X_train.drop(['FORCE_2020_LITHOFACIES_LITHOLOGY', 'FORCE_2020_LITHOFACIES_CONFIDENCE'], axis=1)

X_test = pd.read_csv("../data/test.csv", sep=";")
y_test = pd.read_csv("../data/test_targets.csv", sep=";")['FORCE_2020_LITHOFACIES_LITHOLOGY']
y_test = y_test.map(lithology_keys)

hidden_X_test = pd.read_csv("../data/hidden_test.csv", sep=";")
hidden_y_test = hidden_X_test['FORCE_2020_LITHOFACIES_LITHOLOGY']
hidden_y_test = hidden_y_test.map(lithology_keys)
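
`Series.map` with a dictionary turns any value that is missing from the dictionary into `NaN`, so an unexpected lithology code would silently become a missing label. The following is a minimal sketch of a check that could be run on the raw label columns before mapping. `find_unmapped_codes` and the sample values are hypothetical and not part of `src`.

```python
# Copy of the mapping defined in the cell above.
lithology_keys = {
    30000: 0, 65030: 1, 65000: 2, 80000: 3, 74000: 4, 70000: 5,
    70032: 6, 88000: 7, 86000: 8, 99000: 9, 90000: 10, 93000: 11,
}


def find_unmapped_codes(codes, mapping):
    """Return the sorted codes that have no entry in `mapping`."""
    return sorted(set(codes) - set(mapping))


# Hypothetical sample values; 12345 is deliberately not a valid code.
sample_codes = [30000, 65000, 12345, 65000]
print(find_unmapped_codes(sample_codes, lithology_keys))  # [12345]
```

Because the helper only iterates over its input, it should accept a plain list as well as a pandas Series such as the raw label column.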

## 1. Training

In [4]:
# training the custom model by calling the training script

train_model(X_train, target)

step 1: Data Transformation Complete!
Parameters: { "n_estimators" } are not used.
step 2: Model Trained!
step 3: Model & Transformer have been logged!
step 4: Custom model logged!


## 2. Evaluation 

**Remember that both test sets have lithology distributions that differ from the train set's, so it's expected that model performance will be lower, but how much lower?**

In [5]:
# loading the custom mlflow model

run_id = "0ba34f33b03f477bb557bbcfd293609d" # mlflow run_id
custom_model = mlflow.pyfunc.load_model(f"runs:/{run_id}/lithology_classifier")

In [6]:
# helper function for model evaluation

def evaluate_model_performance(X, y_true, custom_model=custom_model):
    """Print the custom score and weighted f1-score of `custom_model` on X.

    `y_true` must already be mapped through `lithology_keys`.
    """
    y_pred = custom_model.predict(X)

    score_ = score(y_true, y_pred)
    f1score = f1_score(y_true, y_pred, average="weighted")

    print(f"Custom score: {score_:.4f}")
    print(f"f1-score: {f1score:.1%}")
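
For reference, `average="weighted"` computes one F1 value per class and averages them using each class's number of true samples as its weight, so frequent lithologies dominate the reported figure. The following is a pure-Python sketch of that calculation. `weighted_f1` and the toy labels are hypothetical and for illustration only; this is not the scikit-learn implementation.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    result = 0.0
    for label, n_true in support.items():
        # true positives: samples of this class that were predicted as this class
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (true count + predicted count)
        f1 = 2 * tp / (n_true + n_pred)
        result += f1 * n_true / total
    return result


# Toy labels: class 0 has F1 = 0.8 (weight 0.75), class 1 has F1 = 2/3 (weight 0.25).
print(f"{weighted_f1([0, 0, 0, 1], [0, 0, 1, 1]):.4f}")  # 0.7667
```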

### a. Train set

In [7]:
evaluate_model_performance(X_train, target.map(lithology_keys).values)

Custom score: -0.2969
f1-score: 88.3%


### b. Test set 1

In [8]:
evaluate_model_performance(X_test, y_test)

Custom score: -0.5375
f1-score: 77.5%


### c. Test set 2

In [9]:
evaluate_model_performance(hidden_X_test, hidden_y_test)

Custom score: -0.4627
f1-score: 80.0%


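As expected, the model performs best on the train set and worse on both test sets. Relative to the train set, the weighted f1-score drops by 10.8 percentage points on test set 1 and 8.3 on test set 2, and the custom score drops by 0.24 and 0.17 respectively. Averaged over all three pairs of data sets, the gap is ~7.2 percentage points in f1-score and ~0.16 in custom score.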
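
The averaged figures quoted above can be reproduced as the mean absolute difference over every pair of data sets. This is a small sketch using the scores printed in the cells above; `mean_pairwise_gap` is a hypothetical helper name.

```python
from itertools import combinations

# Scores copied from the cell outputs above.
f1 = {"train": 88.3, "test_1": 77.5, "test_2": 80.0}  # percent
custom = {"train": -0.2969, "test_1": -0.5375, "test_2": -0.4627}


def mean_pairwise_gap(scores):
    """Mean absolute difference over every pair of data sets."""
    gaps = [abs(a - b) for a, b in combinations(scores.values(), 2)]
    return sum(gaps) / len(gaps)


print(f"f1-score gap: {mean_pairwise_gap(f1):.1f} percentage points")  # 7.2
print(f"custom score gap: {mean_pairwise_gap(custom):.2f}")  # 0.16
```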