# BLS MLflow Example Notebook 4- Hugging Face Transformers & ONET
*Remy Stewart, BLS Civic Digitial Fellow Summer 2022*

*The following code builds from a previous script developed by BLS Data Scientist David Oh*.


# 1.0 Introduction

This notebook provides a walkthrough for how to integrate MLflow into a machine learning pipeline featuring  Hugging Face's transformers library. Transformers are state-of-the-art neural network models for natural language processing that are ideal for language-based ML applications at the BLS. This example notebook delineates how to incorperate Mlflow's features with the transformers library. Please refer to the first example notebook titled "sklearn_logreg_example_1" for an introduction to MLflow's core API, as well as to the BLS MLflow User Guide for a comprehensive overview of MLflow overall. 

There is no current model flavor for Huggingface transformers within Mlflow in contrast from other deep learning libraries such as Pytorch and Tensorflow. While Huggingface includes native intergration for MLOps-based logging through platforms such as Mlflow that you'll see featured within this notebook, there are multiple points of further customization required to comprehensively track the many features of transformers models within the BLS Mlflow server. We'll walk through these in detail for easy replication of the following methods within other transformer-based applications. 

The data and modeling goal in this example are the same as from the first Scikit-learn notebook. We will be predicting which tasks are associated with 37 potential General Work Activities (GWAs) from public data sourced from the Occupational Information Network's (ONET) occupational requirements content module.

# 2.0 Set-Up

Let's import our standard ML libraries, libraries for deep learning applications including transformers itself, Hugging Face's associated Dataset module, Pytorch, as well as Scikit-learn methods for splitting our data and computing performance metrics. We additionally retrieve a few customized modules such as a GPU manager for the BLS server this script is trained within along with a set of helper functions for tasks such as dataset splitting and generating multilabel predictions. We also import Mlflow itself along with a few new MLflow modules that we'll explore further. 

In [None]:
# Standard libraries
import pandas as pd
import numpy as np
import tempfile
import os

# Deep learning modules 
import transformers
from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer, EvalPrediction
import torch

# Sklearn modules
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score, accuracy_score

# Helper functions
import pyfiles

# MLflow modules
import mlflow
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec
from mlflow.types import DataType

We'll want to start by configuring our virtual environment to support the intergration of our MLflow server with Hugging Face's transformers. We set an environmental variable within our current conda environment to configure our MLflow server's artifact logging directory as the location to store our logged model training checkpoint files in. This ensures that artifacts will be successfully recorded within the MLflow UI following our transformer model's training. 

In [3]:
%env HF_MLFLOW_LOG_ARTIFACTS=1

env: HF_MLFLOW_LOG_ARTIFACTS=1


We then set our tracking URI to establish our connection with the BLS MLflow server and set our run to its designated experiment.

In [None]:
mlflow.set_tracking_uri("http://<Remote IP>:<Port>")
mlflow.set_experiment('hf-onet-experiment')

## 2.1 Importing Data

We're now ready to load in our ONET data file, where you'll notice a new column called "Task_backtranslated" compared to the previous example notebook. 

In [5]:
onet_base = pd.read_parquet("./data/onet_task_gwa.pqt")
onet_base

Unnamed: 0,Task,GWA,Task_backtranslated
0,"Review and analyze legislation, laws, or publi...",4A2a4,"Review and analysis of legislation, laws or pu..."
1,"Review and analyze legislation, laws, or publi...",4A4b6,"Review and analysis of legislation, laws or pu..."
2,Direct or coordinate an organization's financi...,4A4b4,management or coordination of the financial or...
3,"Confer with board members, organization offici...",4A4a2,"Talk to members of the board, organizational o..."
4,Analyze operations to evaluate performance of ...,4A2a4,Analyze the operations to evaluate the perform...
...,...,...,...
23009,Unload cars containing liquids by connecting h...,4A3a2,Download vehicles that contain liquids by conn...
23010,Copy and attach load specifications to loaded ...,4A1b1,Copy and attach charging specifications to the...
23011,Start pumps and adjust valves or cables to reg...,4A3a3,Start pumps and adapt valves or cables to regu...
23012,"Perform general warehouse activities, such as ...",4A1b3,Carrying out general storage activities such a...


These represent additional records generated from the original task descriptions through a language translation data augmentation strategy. Fine-tuning a pre-trained transformer model towards a specific goal such as robust GWA membership prediction per task is best supported by having as many records to train on as possible. The records were processed by a series of translation models [available through Hugging Face](https://huggingface.co/Helsinki-NLP) from the University of Helsinki's Language Technology Research Group by converting the original English text into Spanish, then German, and then back to English. This data augmentation strategy provides us with more task examples for our model that are the right balance of being conceptually similar to the original tasks but varying enough to provide additional information for our model within training.

# 3.0 Model Preparation

We'll now go through the core steps for establishing a transformers model pipeline by preparing & tokenizing our data sets, establishing the model, and specifying its performance metrics. This section doesn't include any direct MLflow intergration, but we'll walk through the process either way to ensure our understanding for each component.   

## 3.1 Splitting Datasets & Instantiating the Model 


We'll draw from our helper function file to transform the original data frame into columns for each GWA, split the data sets, and convert the data splits into transformer Datasets. We will use 78% of our data set for training, 20% for testing, and 2% for model prediction following our logging and loading of the model into MLflow. This function keeps the backtranslated examples within the training set to prevent data leakage within the testing & prediction sets given these record's non-independence from the original task text. 

While this function converts the training and testing set into transformer Datasets, it leaves the prediction sample as an original Pandas Dataframe. This is because we'll want to test our model's ability to successfully convert the data and generate predictions when it's loaded in directly from MLflow. 

In [6]:
training, testing, prediction, labels = pyfiles.helpers.hf_data_processing(onet_base)

Training Size: 26822
Testing Size: 3356
Prediction Size: 343


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


Our pre-trained transformers model of choice will be **DistilRoBERTa**, which is available for download directly from Hugging Face's model hub. DistilRoBERTa combines the performance advancements of [RoBERTa](https://arxiv.org/pdf/1907.11692.pdf) building from the base BERT model that established transformers into mainstream deep learning use, with model compression through the [knowledge distallation](https://arxiv.org/pdf/1910.01108.pdf) technique. DistilRoBERTa strikes an ideal balance between high performance towards classification tasks with being a smaller model compared to alternative transformers that requires less computing resources and subsequently promotes faster run times.    

We additionally create two dictionaries with the first featuring the 37 original GWA codes to their equivalent numerical encoding used within our model, and the corollary mapping of said numbers back to their original labels. These dictionaries are provided directly to the model as stored attributes. 

In [7]:
label2id = {label:idx for idx, label in enumerate(labels)}
id2label = {idx:label for idx, label in enumerate(labels)}

model = AutoModelForSequenceClassification.from_pretrained('distilroberta-base', 
                                                           problem_type='multi_label_classification',
                                                           num_labels=len(labels),
                                                           id2label=id2label,
                                                           label2id=label2id)

Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.layer_norm.bias', 'roberta.pooler.dense.weight', 'lm_head.bias', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.dense.bias', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.bias

## 3.2 Tokenization

The following function defines how the DistilRoBERTa text tokenizer should process each of the data sets. It tokenizes the tasks in batches following the unique [tokenization approach](https://huggingface.co/docs/transformers/tokenizer_summary) required for our specific transformer model, processing individual task text into vectors that can be read into the model that additionally includes the multi-label multi-class GWA membership output. 

In [8]:
def tokenization(data):
    text = data['Task']
    
    # Setting longest task description as default max length within tokenization
    tokenized = tokenizer(text, padding='max_length')
    
    # Structure Dataset to feature tokenized text, attention masks, and multiclass multilabel membership labels
    labels_batch = {key: data[key] for key in data.keys() if key in labels}
    labels_matrix = np.zeros((len(text), len(labels)))
    for index, label in enumerate(labels):
        labels_matrix[:, index] = labels_batch[label]
        tokenized['labels'] = labels_matrix.tolist()

    return tokenized

We then load in DistilRoBERTa's associated tokenizer, process the training and testing set in batches based on GWA column, and replace the original column names with transformer's preferred column labels.

In [None]:
tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')

tokenized_training = training.map(tokenization, batched=True, remove_columns=training.column_names)
tokenized_testing = testing.map(tokenization, batched=True, remove_columns=testing.column_names)

## 3.3 Metrics 

We'll use the F1 score, ROC AUC, and accuracy metrics to measure model performance for our multi-label multi-class task. This code is adapted from a [script developed](https://jesusleal.io/2021/04/21/Longformer-multilabel-classification/) by Data Scientist Jesus Leal Trujillo designed to compute accurate metrics when performing multilabel classification tasks with transformers. It additionally calls our `helpers_hf.force_prediction` function to ensure at least one GWA assignment for each input task similar to our approach featured in the previous Scikit-learn example notebook. 

In [10]:
def multi_label_metrics(predictions, labels):
    # first, apply sigmoid on predictions which are of shape (batch_size, num_labels)
    sigmoid = torch.nn.Sigmoid()
    probs = sigmoid(torch.Tensor(predictions))
    # next, use threshold to turn them into integer predictions
    y_pred = helpers.force_prediction(pd.DataFrame(probs.numpy())).to_numpy()
    # finally, compute metrics
    y_true = labels
    f1_micro_average = f1_score(y_true=y_true, y_pred=y_pred, average='micro')
    roc_auc = roc_auc_score(y_true, y_pred, average = 'micro')
    accuracy = accuracy_score(y_true, y_pred)
    # return as dictionary
    metrics = {'f1': f1_micro_average,
               'roc_auc': roc_auc,
               'accuracy': accuracy}
    return metrics
  
def compute_metrics(pred: EvalPrediction):
    model_predictions = pred.predictions[0] if isinstance(pred.predictions, tuple) else pred.predictions
    result = multi_label_metrics(predictions=model_predictions, labels=pred.label_ids)
    return result

# 4.0 Model Training 

We have almost all of the necessary components established to train our model, short some final configuration of our training arguments relevant to MLflow logging. 

## 4.1 Training Arguments & Trainer

We'll configure a selection of training arguments for our DistilRoBERTa model that prioritizes faster run times and preserving GPU memory for our example workflow, such as with completing only 1 epoch of training and using small batch sizes to prevent memory overload.  

There are two parameters that are of particular importance to our goal of MLflow tracking intergration for our model. `report_to='mlflow'` clearly implies that this parameter cues our trainer to report its logged values to our linked MLflow server. `mp_parameters` is much less intuitive in its importance than `report_to`. This argument is set as a empty string as its default value and it doesn't have a description within the `TrainingArguments` [documentation](https://huggingface.co/docs/transformers/v4.20.1/en/main_classes/trainer#transformers.TrainingArguments). My brief research on the argument has lead me to suspect that it may be designed for model parallelization within cloud computing platforms such as Amazon Sagemaker. What's most important for our purposes is to use a white space to replace this empty string, as otherwise it will cause logging errors within MLflow's `log_parameters`.   

In [11]:
training_args = TrainingArguments(output_dir='./results', 
                                  num_train_epochs=1,
                                  per_device_train_batch_size=8,
                                  per_device_eval_batch_size=8,
                                  save_strategy='epoch',
                                  save_total_limit=3,
                                  learning_rate=3e-5,
                                  do_eval=True,
                                  evaluation_strategy='epoch',
                                  load_best_model_at_end=True,
                                  metric_for_best_model='f1',
                                  report_to="mlflow", 
                                  mp_parameters=' ')

We're now ready to initialize our full model trainer with the DistilRoBERTa model itself, its configured training arguments, its training and testing sets, and the custom function to compute performance metrics. 

In [12]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_training,
    eval_dataset=tokenized_testing,
    compute_metrics=compute_metrics
)

## 4.2 Training the Model 

Running `trainer.train` automatically cues transformers to log all of the set parameters and generated performance metrics during the training of our model to our MLflow server. We won't have to make any calls to methods such as `mlflow.log_parameters` or `mlflow.log_metrics` ourselves thanks to our earlier steps to configure our transformer  model to report to MLflow. 

In [13]:
trainer.train()

***** Running training *****
  Num examples = 26822
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 3353
Trainer is attempting to log a value of "{0: '4A1a1', 1: '4A1a2', 2: '4A1b1', 3: '4A1b2', 4: '4A1b3', 5: '4A2a1', 6: '4A2a2', 7: '4A2a3', 8: '4A2a4', 9: '4A2b1', 10: '4A2b2', 11: '4A2b3', 12: '4A2b4', 13: '4A2b5', 14: '4A2b6', 15: '4A3a1', 16: '4A3a2', 17: '4A3a3', 18: '4A3a4', 19: '4A3b1', 20: '4A3b4', 21: '4A3b6', 22: '4A4a1', 23: '4A4a2', 24: '4A4a3', 25: '4A4a4', 26: '4A4a5', 27: '4A4a6', 28: '4A4a7', 29: '4A4a8', 30: '4A4b3', 31: '4A4b4', 32: '4A4b5', 33: '4A4b6', 34: '4A4c1', 35: '4A4c2', 36: '4A4c3'}" for key "id2label" as a parameter. MLflow's log_param() only accepts values no longer than 250 characters so we dropped this attribute.
Trainer is attempting to log a value of "{'4A1a1': 0, '4A1a2': 1, '4A1b1': 2, '4A1b2': 3, '4A1b3': 4

Epoch,Training Loss,Validation Loss,F1,Roc Auc,Accuracy
1,0.0717,0.067361,0.605961,0.778505,0.529499


***** Running Evaluation *****
  Num examples = 3356
  Batch size = 8
Saving model checkpoint to ./results/checkpoint-3353
Configuration saved in ./results/checkpoint-3353/config.json
Model weights saved in ./results/checkpoint-3353/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./results/checkpoint-3353 (score: 0.6059611703582172).
Logging artifacts. This may take time.


TrainOutput(global_step=3353, training_loss=0.09345755788983426, metrics={'train_runtime': 837.82, 'train_samples_per_second': 32.014, 'train_steps_per_second': 4.002, 'total_flos': 3555258286958592.0, 'train_loss': 0.09345755788983426, 'epoch': 1.0})

We can see that we've successfully completed our model training and obtained our performance metrics on the test set. If you were to visit the MLflow UI for this run at this point in this demo, you'd see that a range of parameters, metrics, and artifacts such as saved checkpoints have been successfully logged. However, there are a few key components to our logged model still missing that we'll want to take some additional steps to add into MLflow. 

# 5.0 Customized Model Logging

As mentioned at the start of this demo script, Transformers models are not currently supported as one of MLflow's established model flavors, meaning that there isn't a prexisting method such as `mlflow.sklearn.log_model` to easily log our model into the server and ensure its seamless retrieval in the future. However, all flavors inherit MLflow's model base class `mlflow.pyfunc`, which we'll be able use to successfully save and retrieve models without currently specified flavors. 

## 5.1 Model Loader Class

The key step we need to take to use `mlflow.pyfunc.log_model` for our fine-runed DistilRoBERTa model is to customized our own class known as a **model loader**. Creating a model loader script guides MLflow towards knowing how to instantiate a new instance of the model after it's logged into the server. We'll use the loader as a separate Python file within our logging, but since we'd like to review the script itself we'll write the file internally through `%%writefile`

Model loaders need to include three primary components- a model initialization method, a method for how to generate predictions on future data, and a method for calling and loading the customized model from a specified path (which for our purposes will be from our MLflow server). The methods to achieve these goals should import their required packages directly so they can be called upon in production environments. You can add additional methods to your customized model loader as well.

We'll include these features through a class titled `TransformerONET`. Initializing a `TransformerONET` instance automatically retrieves our fine-tuned DistilRoBERTa model, its model configuration, and its tokenizer. We'll add our `helpers_hf.force_prediction` function into the model loader script so we can access this prediction converter within `TransformersONET` without having to package the model loader script along with the helpers_hf file. 
- `predict` tokenizes passed input data, generates predictions, and then forces at least one GWA assignment for each input task through `force_prediction`.
- Finally, `_load_pyfunc` simply initializes the TransformerONET model from a passed directory path, which for us will be MLflow's Model Registry. The method must be consistently labeled as `_load_pyfunc` within customized model loaders. 

In [14]:
%%writefile "./pyfiles/model_loader_hf.py"

import torch
import pandas as pd
from transformers.models.auto import AutoConfig, AutoModelForSequenceClassification
from transformers.models.auto.tokenization_auto import AutoTokenizer

class TransformerONET:
    def __init__(self, model_name: str, tokenizer = None):
        self.model_name = model_name
        self.config = AutoConfig.from_pretrained(model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer or model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name, config=self.config)
    
    def force_prediction(self, predicted_probs_df):
        import pandas
        import numpy as np
        
        predicted_probs_df[predicted_probs_df > 0.5] = 1
        predicted_prob_df_nomax = predicted_probs_df[predicted_probs_df.max(axis=1) < 0.5]
        predicted_prob_df_nomax.values[range(len(predicted_prob_df_nomax.index)), 
                                          np.argmax(predicted_prob_df_nomax.values, axis=1)] = 1
        all_predicted = predicted_prob_df_nomax.combine_first(predicted_probs_df)
        all_predicted[all_predicted != 1] = 0
        return all_predicted
        
    def predict(self, data: pd.DataFrame) -> pd.DataFrame:
        import torch
        import pandas as pd
        import numpy as np
        from datasets import Dataset
        
        with torch.no_grad():
            # Tokenize the passed data set
            data = Dataset.from_pandas(data)
            inputs = self.tokenizer(data['Task'], padding=True, return_tensors='pt')
        
            # Send data set to GPU for predictions by GWA column 
            if self.model.device.index != None:
                torch.cuda.empty_cache()
                for key in inputs.keys():
                    inputs[key] = inputs[key].to(self.model.device.index)    

            # Generate model predictions and retrieve the produced probabilities
            predictions = self.model(**inputs)
            probs = torch.nn.Softmax(dim=1)(predictions.logits)
            probs = probs.detach().cpu().numpy()
            
            # Force at least one prediction for each task & convert numeric column codes to original GWA code labels
            outputs = self.force_prediction(pd.DataFrame(probs[:,]))      
            outputs = outputs.rename(self.config.id2label, axis=1)
       
        return outputs
        
def _load_pyfunc(path):
    import os
    return TransformerONET(os.path.abspath(path))

Overwriting model_loader_hf.py


## 5.2 Declaring Model Signatures

In the previous Scikit-learn based example notebooks we used the `infer_signature` method to obtain our model signature, referring to the structure & data types of the input data and predicition outputs. Let's therefore explore the manual alternative for creating our model's signature through using MLflow's `Schema` objects.

It's important to carefully think through how to best structure our input schema, since loading in a model from MLflow with a set model signature will require any passed data for future predictions to have exactly the same input format. Our model loader is structured to tokenized passed data by referencing the `data['task']` column. Since this model is only being used in a development setting where we have associated class labels, our passed data includes both the task itself and the 37 GWA binary codes. In a production setting with unseen data, the schema would likely be better set as just one text column. 

The following `Schema` objects for the input and output data use `ColSpec` to specify the column values, their accepted data type, and their column label. 

In [15]:
input_schema = Schema([
  ColSpec(DataType.string, "Task"),
  ColSpec(DataType.integer, "4A1a1"),
  ColSpec(DataType.integer, "4A1a2"),
  ColSpec(DataType.integer, "4A1b1"),  
  ColSpec(DataType.integer, "4A1b2"),  
  ColSpec(DataType.integer, "4A1b3"),
  ColSpec(DataType.integer, "4A2a1"),
  ColSpec(DataType.integer, "4A2a2"),
  ColSpec(DataType.integer, "4A2a3"),
  ColSpec(DataType.integer, "4A2a4"),  
  ColSpec(DataType.integer, "4A2b1"),  
  ColSpec(DataType.integer, "4A2b2"),
  ColSpec(DataType.integer, "4A1b2"),
  ColSpec(DataType.integer, "4A2b3"),
  ColSpec(DataType.integer, "4A2b4"),
  ColSpec(DataType.integer, "4A2b5"),  
  ColSpec(DataType.integer, "4A2b6"),  
  ColSpec(DataType.integer, "4A3a1"),
  ColSpec(DataType.integer, "4A3a2"),
  ColSpec(DataType.integer, "4A3a3"),
  ColSpec(DataType.integer, "4A3a4"),
  ColSpec(DataType.integer, "4A3b1"),  
  ColSpec(DataType.integer, "4A3b4"),  
  ColSpec(DataType.integer, "4A3b6"),
  ColSpec(DataType.integer, "4A4a1"),
  ColSpec(DataType.integer, "4A4a2"),
  ColSpec(DataType.integer, "4A4a3"),
  ColSpec(DataType.integer, "4A4a4"),
  ColSpec(DataType.integer, "4A4a5"),  
  ColSpec(DataType.integer, "4A4a6"),  
  ColSpec(DataType.integer, "4A4a7"),
  ColSpec(DataType.integer, "4A4a8"),
  ColSpec(DataType.integer, "4A4b3"),
  ColSpec(DataType.integer, "4A4b4"),
  ColSpec(DataType.integer, "4A4b5"),  
  ColSpec(DataType.integer, "4A4a6"),  
  ColSpec(DataType.integer, "4A4c1"),
  ColSpec(DataType.integer, "4A4c2"),
  ColSpec(DataType.integer, "4A4c3")
])

Our model output has almost the same schema as the model input, but without the 'Task' text as the original input feature.

In [16]:
output_schema = Schema([ColSpec(DataType.integer, "4A1a1"),
  ColSpec(DataType.integer, "4A1a2"),
  ColSpec(DataType.integer, "4A1b1"),  
  ColSpec(DataType.integer, "4A1b2"),  
  ColSpec(DataType.integer, "4A1b3"),
  ColSpec(DataType.integer, "4A2a1"),
  ColSpec(DataType.integer, "4A2a2"),
  ColSpec(DataType.integer, "4A2a3"),
  ColSpec(DataType.integer, "4A2a4"),  
  ColSpec(DataType.integer, "4A2b1"),  
  ColSpec(DataType.integer, "4A2b2"),
  ColSpec(DataType.integer, "4A1b2"),
  ColSpec(DataType.integer, "4A2b3"),
  ColSpec(DataType.integer, "4A2b4"),
  ColSpec(DataType.integer, "4A2b5"),  
  ColSpec(DataType.integer, "4A2b6"),  
  ColSpec(DataType.integer, "4A3a1"),
  ColSpec(DataType.integer, "4A3a2"),
  ColSpec(DataType.integer, "4A3a3"),
  ColSpec(DataType.integer, "4A3a4"),
  ColSpec(DataType.integer, "4A3b1"),  
  ColSpec(DataType.integer, "4A3b4"),  
  ColSpec(DataType.integer, "4A3b6"),
  ColSpec(DataType.integer, "4A4a1"),
  ColSpec(DataType.integer, "4A4a2"),
  ColSpec(DataType.integer, "4A4a3"),
  ColSpec(DataType.integer, "4A4a4"),
  ColSpec(DataType.integer, "4A4a5"),  
  ColSpec(DataType.integer, "4A4a6"),  
  ColSpec(DataType.integer, "4A4a7"),
  ColSpec(DataType.integer, "4A4a8"),
  ColSpec(DataType.integer, "4A4b3"),
  ColSpec(DataType.integer, "4A4b4"),
  ColSpec(DataType.integer, "4A4b5"),  
  ColSpec(DataType.integer, "4A4a6"),  
  ColSpec(DataType.integer, "4A4c1"),
  ColSpec(DataType.integer, "4A4c2"),
  ColSpec(DataType.integer, "4A4c3")
])

We can then pass both objects directly into MLflow's `ModelSignature` method.

In [17]:
signature = ModelSignature(inputs = input_schema, outputs = output_schema)

While `infer_signature` is a simpler approach compared to manually setting Schemas for establishing our model's signature, this alternative strategy may be preferred for ML use cases when the data expected during later stages of a model's lifecycle will be different than what the model was originally trained on. 

## 5.3 Logging Through `mlflow.pyfunc`

We have almost everything configured to load our custom `TransformersONET` model into MLflow. We'll just need to add the fine-tuned model and its tokenizer into a temporary directory through transformer's `save_pretrained` method. 

In [18]:
tempdir = tempfile.mkdtemp()
model.save_pretrained(tempdir)
tokenizer.save_pretrained(tempdir)

Configuration saved in /tmp/tmprspc12q7/config.json
Model weights saved in /tmp/tmprspc12q7/pytorch_model.bin
tokenizer config file saved in /tmp/tmprspc12q7/tokenizer_config.json
Special tokens file saved in /tmp/tmprspc12q7/special_tokens_map.json


('/tmp/tmprspc12q7/tokenizer_config.json',
 '/tmp/tmprspc12q7/special_tokens_map.json',
 '/tmp/tmprspc12q7/vocab.json',
 '/tmp/tmprspc12q7/merges.txt',
 '/tmp/tmprspc12q7/added_tokens.json',
 '/tmp/tmprspc12q7/tokenizer.json')

Let's now break down our specified parameters in the following call to `mlflow.pyfunc.log_model`.

`best_model` will be the name of the directory within this run's artifact storage in MLflow that the model, tokenizer, and configuration file is stored into. `data_path` points to the previous created temporary directory where we saved these files. `code_path` links `mlflow.pyfunc.log_model` to our customized model script, while `loader_module` directly points to the code that includes the required `_load_pyfunc` method. `conda_env` points to our directory's conda.yaml file to log for reproducing the model's training environment. `registered_model_name` and `signature` set the title of our model in the Model Registry and pass the signature we just manually instantiated.

We'll additionally log this notebook and its helper function file as well. 

In [19]:
mlflow.pyfunc.log_model("best_model", 
                            data_path=tempdir, 
                            code_path=["./pyfiles/model_loader_hf.py"], 
                            loader_module="model_loader_hf",
                            conda_env="../conda.yaml",
                            signature=signature,
                            registered_model_name="hf_onet")

mlflow.log_artifact("hf_transformers_example_4.ipynb")
mlflow.log_artifact("./pyfiles/helpers.py")

Successfully registered model 'hf_onet'.
2022/07/13 10:42:03 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: hf_onet, version 1
Created version '1' of model 'hf_onet'.


The above output signifies that our model was successfully logged into MLflow. We didn't have to start the model run we've been tracking throughout this script since transformers does it automatically for us, but we do need to directly end the run ourselves. 

In [20]:
mlflow.end_run()

# 6.0 MLflow Tracking UI & Model Registry 
Let's now refer to the MLflow UI page by switching to `http://<Remote IP>:<Port>/` within our local browser and head to our completed run underneath `hf-onet-experiments`. When we reference our run directly within the MLflow UI, we see that transformers has automatically logged for us 145 parameters for the wide range of customizable training arguments for our model. It's captured performance metrics we originally established such as the F1 score, ROC AUC, and accuracy along with some additional metrics such as the training and evaluation running times in seconds, the number of model optimization steps taken per second, and the cross-entropy loss score. 

Our model has been stored as a MLflow `MLModel` object which signifies that it'll now be successfully processed by both MLflow itself as well as across diverse deployment settings. Additional artifacts have also been logged such as this Jupyter Notebook, the helper function file, and our model's logged checkpoints during training.

We can also switch to the Models tab in the top navigation bar and verify that our model was successfully registered by our customized model loader method.  

![hf_reg.png](../imgs/hf_reg.png)

## 6.1 Comparing Models 

An additional feature to highlight within MLflow is the Compare page, which you can access within individual Experiment logs after selecting the check boxes of at least two model runs. I've separately logged four runs of our TransformersONET model trained for three epochs each, in which the only difference between the runs is their learning rates (2e-5 up to 5e-5). 

We can visually contrast these runs through multiple plots such as the parallel coordinate plot below: 
![pcplot.png](../imgs/pcplot.png)
The first coordinate establishes which line is associated with each tested learning rate, with the coordinates to follow delineates these model's differences in performance on four evaluation metrics. 

We can scroll below the Visualization section of the compare page to investigate these metric differences across the four models as well:
![num_diffs.png](../imgs/num_diffs.png)

# 7.0 Model Loading & Generating Predictions 

Let's now test whether our custom TransformersONET model loader will correctly instantiate our saved model by using `mlflow.pyfunc.load_model` to retrieve it from the Model Registry. 

In [21]:
registered_model = mlflow.pyfunc.load_model("models:/hf_onet/1")

loading configuration file /tmp/tmp75qfyaqg/data/tmprspc12q7/config.json
Model config RobertaConfig {
  "_name_or_path": "distilroberta-base",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "4A1a1",
    "1": "4A1a2",
    "2": "4A1b1",
    "3": "4A1b2",
    "4": "4A1b3",
    "5": "4A2a1",
    "6": "4A2a2",
    "7": "4A2a3",
    "8": "4A2a4",
    "9": "4A2b1",
    "10": "4A2b2",
    "11": "4A2b3",
    "12": "4A2b4",
    "13": "4A2b5",
    "14": "4A2b6",
    "15": "4A3a1",
    "16": "4A3a2",
    "17": "4A3a3",
    "18": "4A3a4",
    "19": "4A3b1",
    "20": "4A3b4",
    "21": "4A3b6",
    "22": "4A4a1",
    "23": "4A4a2",
    "24": "4A4a3",
    "25": "4A4a4",
    "26": "4A4a5",
    "27": "4A4a6",
    "28": "4A4a7",
    "29": "4A4a8",
    "30": "4A4b3",
    "31": 

The tokenizer is loaded automatically within our TransformerONET class, so we can move forward with generating predictions following loading in our transformers model. We'll use the 2% of our original task text sample saved as the `predictions` data frame. 

In [22]:
registered_model.predict(prediction)

Unnamed: 0,4A1a1,4A1a2,4A1b1,4A1b2,4A1b3,4A2a1,4A2a2,4A2a3,4A2a4,4A2b1,...,4A4a6,4A4a7,4A4a8,4A4b3,4A4b4,4A4b5,4A4b6,4A4c1,4A4c2,4A4c3
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
338,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
339,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
340,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
341,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


This dataframe of predicted GWA membership confirms that we've successfully fine-tuned our DistilRoBERTa model, logged it into the MLflow Model Registry, and can generate predictions on unseen data with our custom model loader configuration. 