# BLS MLflow Example Notebook 4 Extended: Hugging Face Transformers & Ray Tune Hyperparameter Optimization

*Remy Stewart, BLS Civic Digital Fellow Summer 2022*

*The following code builds from a previous script developed by BLS Data Scientist David Oh.*


# 1.0 Introduction

This notebook extends the base Hugging Face and ONET MLflow integration notebook in this repository by incorporating the Ray Tune hyperparameter optimization library, which supports state-of-the-art optimization techniques whose trials are logged to the remote BLS MLflow server. Because most of the code mirrors the base transformers notebook, this walkthrough focuses on the Ray Tune configuration and does not repeat explanations from the original example.

Hyperparameter optimization is an essential step in the machine learning development lifecycle: models perform best when they are configured with well-chosen hyperparameters, both during development and in production use. Hyperparameters can significantly influence model behavior, and deep learning models such as Hugging Face transformers depend on a large number of hyperparameters that are strong candidates for tuning.

Finding good hyperparameters takes up a sizable share of model experimentation time, and iterating through the many possible combinations of several hyperparameters is computationally costly. The Ray Tune library offers a suite of tuning methods that are more advanced than traditional techniques such as Grid and Random Search. These methods aim to reduce resource demands by directing the search with performance-informed algorithms. Ray Tune also supports MLflow integration, which is an essential advantage of the library because it ensures our hyperparameter exploration is logged and reproducible via the BLS MLflow server.

This notebook tunes our original DistilRoBERTa transformers classifier on public ONET data with the Tree-Structured Parzen Estimator (TPE) algorithm, which follows a Bayesian optimization approach. The walkthrough covers both how to incorporate the TPE optimizer in code and the key features of the technique.

# 2.0 Set-Up

To get started, we'll install Ray Tune if it's not already present in your current environment (it will be if you are running this script within the bls-mlflow conda environment) and import our supporting libraries. Note the call to `ray.shutdown()` at the very end of the import block, which shuts down any Ray session left over from a previous run, for example after a kernel restart.

In [1]:
# !pip install "ray[tune]"

In [None]:
# Standard libraries
import pandas as pd
import numpy as np
import tempfile
import os

# Deep learning modules 
import transformers
from datasets import Dataset
from transformers import ( AutoTokenizer, AutoConfig,
                          AutoModelForSequenceClassification, 
                          TrainingArguments, Trainer, EvalPrediction )
import torch

# Sklearn modules
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score, accuracy_score

# Helper functions
import pyfiles

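# Ray Tune modules
import ray
from ray import tune
from ray.tune import CLIReporter
# Ray 2.x renamed this module to ray.tune.search.hyperopt
from ray.tune.suggest.hyperopt import HyperOptSearch
from ray.tune.integration.mlflow import MLflowLoggerCallback

# MLflow modules
import mlflow 
from mlflow.tracking import MlflowClient

# Shut down any Ray session left over from a previous run
ray.shutdown()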

We can set an environment variable to automatically log produced model artifacts to our remote BLS server, which is linked via its hosted URL. We'll additionally house our hyperparameter tuning runs under the same experiment we created in the original transformers MLflow integration example notebook.

In [2]:
%env HF_MLFLOW_LOG_ARTIFACTS=1

env: HF_MLFLOW_LOG_ARTIFACTS=1


In [None]:
mlflow.set_tracking_uri("http://<Remote IP>:<Port>")
mlflow.set_experiment('hf-onet-experiment')

Hyperparameter optimization via Ray Tune requires a local file path where the library records trial performance, so that later tuning trials can make hyperparameter updates informed by earlier results. I therefore set a local path on the BLS deep learning server where I am developing this script to ensure that Ray Tune consistently knows where to record and reference past optimization trials.

In [None]:
%env TUNE_RESULT_DIR="<Ray Trial Directory>"
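
The same setting can also be made from Python rather than through the `%env` magic. The sketch below is a minimal illustration that assumes a hypothetical directory under the system temp folder; substitute the real trial directory on your server. It appears that `%env` stores everything after the `=` verbatim, so wrapping the path in quotes may leave the quote characters in the value, which assigning through `os.environ` avoids.

```python
import os
import tempfile

# Hypothetical location for illustration only; replace with your own trial directory
ray_trial_dir = os.path.join(tempfile.gettempdir(), "ray_tune_trials_demo")

# Create the directory up front so that it exists before any trial writes to it
os.makedirs(ray_trial_dir, exist_ok=True)

# Same variable name as in the cell above; the value carries no surrounding quotes
os.environ["TUNE_RESULT_DIR"] = ray_trial_dir
print(os.environ["TUNE_RESULT_DIR"])
```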

# 3.0 Preprocessing & Transformers Set-Up

We then load in our data and preprocess it using our helper package methods to transform the data sets for training, validation, and testing. 

In [6]:
onet_base = pd.read_parquet("./data/onet_task_gwa.pqt")
onet_base

Unnamed: 0,Task,GWA,Task_backtranslated
0,"Review and analyze legislation, laws, or publi...",4A2a4,"Review and analysis of legislation, laws or pu..."
1,"Review and analyze legislation, laws, or publi...",4A4b6,"Review and analysis of legislation, laws or pu..."
2,Direct or coordinate an organization's financi...,4A4b4,management or coordination of the financial or...
3,"Confer with board members, organization offici...",4A4a2,"Talk to members of the board, organizational o..."
4,Analyze operations to evaluate performance of ...,4A2a4,Analyze the operations to evaluate the perform...
...,...,...,...
23009,Unload cars containing liquids by connecting h...,4A3a2,Download vehicles that contain liquids by conn...
23010,Copy and attach load specifications to loaded ...,4A1b1,Copy and attach charging specifications to the...
23011,Start pumps and adjust valves or cables to reg...,4A3a3,Start pumps and adapt valves or cables to regu...
23012,"Perform general warehouse activities, such as ...",4A1b3,Carrying out general storage activities such a...


In [7]:
training, validation, testing, labels = pyfiles.helpers.data_processing(onet_base)
testing = Dataset.from_pandas(testing)

Training Size: 26822
Testing Size: 3356
Prediction Size: 343


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


There's an important difference to note here between the model instantiation featured in the base transformers example script and how the model needs to be configured for successful use within hyperparameter optimization. We'll need to create a function that can be called to initialize the same DistilRoBERTa model multiple times, as it will be re-launched within hyperparameter tuning across multiple trials. 

In [8]:
label2id = {label:idx for idx, label in enumerate(labels)}
id2label = {idx:label for idx, label in enumerate(labels)}

def model_init():
    return AutoModelForSequenceClassification.from_pretrained('distilroberta-base', 
                                                           problem_type='multi_label_classification',
                                                           num_labels=len(labels),
                                                           id2label=id2label,
                                                           label2id=label2id)

From here we initialize our tokenizer, tokenize our data splits, and create our custom metric computation function. These code blocks are identical to those in the initial transformers example script.

In [11]:
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
    
def tokenization(data, max_length=50):
    text = data['Task']
    
    # Setting longest task description as default max length within tokenization
    tokenized = tokenizer(text, padding='max_length', truncation=True, max_length=max_length)
    
    # Structure Dataset to feature tokenized text, attention masks, and multiclass multilabel membership labels
    labels_batch = {key: data[key] for key in data.keys() if key in labels}
    labels_matrix = np.zeros((len(text), len(labels)))
    for index, label in enumerate(labels):
        labels_matrix[:, index] = labels_batch[label]
    tokenized['labels'] = labels_matrix.tolist()
    return tokenized

In [None]:
# Tokenize each data split, dropping the original columns
tokenized_training = training.map(tokenization, batched=True, remove_columns=training.column_names)
tokenized_validation = validation.map(tokenization, batched=True, remove_columns=validation.column_names)
tokenized_testing = testing.map(tokenization, batched=True, remove_columns=testing.column_names)

In [13]:
def multi_label_metrics(predictions, labels):
    # first, apply sigmoid on predictions which are of shape (batch_size, num_labels)
    sigmoid = torch.nn.Sigmoid()
    probs = sigmoid(torch.Tensor(predictions))
    # next, use threshold to turn them into integer predictions
    y_pred = pyfiles.helpers.force_prediction(pd.DataFrame(probs.numpy())).to_numpy()
    # finally, compute metrics
    y_true = labels
    f1_micro_average = f1_score(y_true=y_true, y_pred=y_pred, average='micro')
    roc_auc = roc_auc_score(y_true, y_pred, average = 'micro')
    accuracy = accuracy_score(y_true, y_pred)
    # return as dictionary
    metrics = {'f1': f1_micro_average,
               'roc_auc': roc_auc,
               'accuracy': accuracy}
    return metrics
  
def compute_metrics(pred: EvalPrediction):
    model_predictions = pred.predictions[0] if isinstance(pred.predictions, tuple) else pred.predictions
    result = multi_label_metrics(predictions=model_predictions, labels=pred.label_ids)
    return result
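
To make the metric computation concrete, here is a small standard-library sketch of the same sequence of steps: sigmoid, conversion to 0/1 predictions, then micro-averaged F1. It is an illustration only. It assumes a plain 0.5 threshold in place of `pyfiles.helpers.force_prediction`, whose exact behavior is not shown in this notebook, and the logits and labels are made-up toy values.

```python
import math

def sigmoid(x):
    """Map a raw logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true positives, false positives, and false negatives over every label."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

# Toy logits for 2 samples and 3 labels (made-up values)
toy_logits = [[2.0, -1.0, 0.5], [-0.5, 1.5, -2.0]]
toy_labels = [[1, 0, 0], [0, 1, 1]]

# Assumed stand-in for force_prediction: a fixed 0.5 probability threshold
toy_preds = [[int(sigmoid(z) >= 0.5) for z in row] for row in toy_logits]
toy_f1 = micro_f1(toy_labels, toy_preds)
print(toy_preds, round(toy_f1, 4))
```

Here two predictions are correct positives, one is a false positive, and one positive label is missed, so the pooled F1 is 2·2 / (2·2 + 1 + 1) ≈ 0.667.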

# 4.0 Training Arguments & Trainer

We can then establish the training arguments for our DistilRoBERTa model, particularly those that remain constant throughout hyperparameter tuning. Setting `report_to='mlflow'` ensures that the model's parameters and metrics are recorded in the MLflow server, and setting `mp_parameters` to a single whitespace character keeps this argument from causing issues during MLflow logging.

Because searching for optimal hyperparameters requires launching and recording multiple experimental runs of our model, we have an incentive both to improve runtime performance and to reduce the storage required to write our large models as PyTorch binary files.

We support the first goal with the `gradient_accumulation_steps` parameter, which accumulates gradients over several batches before each optimizer update. This yields a larger effective batch size (the per-device batch size multiplied by the number of accumulation steps) without the CUDA memory cost of loading that many samples at once. For the second goal, setting `save_total_limit` to 1 keeps only one model checkpoint on disk at a time during training, which slows the accumulation of stored files across hyperparameter trials that each need to save their models for recording and optimization purposes.

In [19]:
training_args = TrainingArguments(output_dir='./results',
                                  save_strategy='epoch',
                                  save_total_limit=1,
                                  do_eval=True,
                                  evaluation_strategy='epoch',
                                  gradient_accumulation_steps=4,
                                  metric_for_best_model='f1',
                                  report_to='mlflow', 
                                  mp_parameters=' ')

PyTorch: setting up devices
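
As a quick check on how gradient accumulation changes the training schedule, the arithmetic below reproduces the figures that appear in the training log later in this notebook (26822 examples, a per-device batch size of 16, 10 epochs). It is a sketch that assumes a single device and that the number of optimizer updates per epoch is the number of batches floor-divided by the accumulation steps; `optimizer_updates` is a helper name introduced here for illustration.

```python
import math

def optimizer_updates(num_examples, per_device_batch, accumulation_steps, epochs):
    """Count optimizer updates when gradients are accumulated over several batches."""
    batches_per_epoch = math.ceil(num_examples / per_device_batch)
    updates_per_epoch = max(batches_per_epoch // accumulation_steps, 1)
    return updates_per_epoch * epochs

# Effective batch size: per-device batch size times accumulation steps (single device assumed)
effective_batch_size = 16 * 4

# Figures taken from the training log shown later in this notebook
total_updates = optimizer_updates(num_examples=26822, per_device_batch=16,
                                  accumulation_steps=4, epochs=10)
print(effective_batch_size, total_updates)
```

The result agrees with the `Total train batch size ... = 64` and `Total optimization steps = 4190` lines in that log, and the 419 updates per epoch line up with the `checkpoint-419` saved after the first epoch.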


In [13]:
trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_training,
    eval_dataset=tokenized_validation,
    compute_metrics=compute_metrics
)

loading configuration file https://huggingface.co/distilroberta-base/resolve/main/config.json from cache at /home/stewart_r/.cache/huggingface/transformers/42d6b7c87cbac84fcdf35aa69504a5ccfca878fcee2a1a9b9ff7a3d1297f9094.aa95727ac70adfa1aaf5c88bea30a4f5e50869c68e68bce96ef1ec41b5facf46
Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "4A1a1",
    "1": "4A1a2",
    "2": "4A1b1",
    "3": "4A1b2",
    "4": "4A1b3",
    "5": "4A2a1",
    "6": "4A2a2",
    "7": "4A2a3",
    "8": "4A2a4",
    "9": "4A2b1",
    "10": "4A2b2",
    "11": "4A2b3",
    "12": "4A2b4",
    "13": "4A2b5",
    "14": "4A2b6",
    "15": "4A3a1",
    "16": "4A3a2",
    "17": "4A3a3",
    "18": "4A3a4",
    "19": "4A3b1",
    "20": "4A3b4",
    "21": "4A3b6",
    "22": "4A4a1",
    

# 5.0 Hyperparameter Search via TPE & Bayesian Optimization

With the standard components of a transformers model workflow now established, let's review the core features of the TPE Bayesian optimization algorithm we'll be using for hyperparameter optimization. TPE's Bayesian approach rests on the algorithm's use of its own recorded history of how evaluated hyperparameters performed on a chosen metric. From that history it builds a probabilistic model, which it uses to make well-informed suggestions for the next set of hyperparameters to evaluate.

The approach is sequentially Bayesian because the model uses information on prior performance to produce posterior estimates that maximize the probability that the selected hyperparameters are near the optimal values for minimizing or maximizing the chosen metric. Because the underlying probability distribution is updated after every trial, later suggestions tend to perform better as the algorithm gains more trials to learn from.

These characteristics allow TPE to make informed selections of better-performing hyperparameter values, in contrast to simpler techniques such as Grid and Random Search. This typically leads to faster convergence towards the best-performing values, which reduces experimentation time and resource demands.
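
The idea can be sketched in a few lines of plain Python. The toy below is a deliberately simplified, one-dimensional illustration and not the implementation used by Hyperopt or Ray Tune: it splits past trials into a "good" and a "bad" group, fits a simple kernel density to each, and proposes the candidate where the good density most exceeds the bad one. The objective, the `gamma` split fraction, the bandwidth, and all function names are assumptions chosen for the example.

```python
import math
import random

def kde(points, bandwidth):
    """Equal-weight mixture of Gaussians centred on the observed points."""
    norm = bandwidth * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / (norm * len(points))
    return density

def tpe_suggest(history, low, high, rng, gamma=0.25, n_candidates=24, bandwidth=0.1):
    """Suggest the next value given at least two (value, score) pairs; higher scores are better."""
    ranked = sorted(history, key=lambda pair: pair[1], reverse=True)
    n_good = min(len(ranked) - 1, max(1, math.ceil(gamma * len(ranked))))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    good_density, bad_density = kde(good, bandwidth), kde(bad, bandwidth)
    # Draw candidates near the good trials, clipped to the search range
    candidates = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                  for _ in range(n_candidates)]
    # Keep the candidate that looks most like a good trial and least like a bad one
    return max(candidates, key=lambda x: good_density(x) / (bad_density(x) + 1e-12))

def toy_objective(x):
    """Made-up score with its maximum at x = 0.7."""
    return -(x - 0.7) ** 2

rng = random.Random(0)
low, high = 0.0, 1.0
# A few random start-up trials give the density models something to fit
history = [(x, toy_objective(x)) for x in (rng.uniform(low, high) for _ in range(5))]
startup_best = max(score for _, score in history)
for _ in range(25):
    x = tpe_suggest(history, low, high, rng)
    history.append((x, toy_objective(x)))
best_x, best_score = max(history, key=lambda pair: pair[1])
print(round(best_x, 3), round(best_score, 5))
```

Grid and Random Search would ignore `history` entirely when picking the next value, which is the key difference the section above describes.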

## 5.1 Ray Tune Configuration Dictionary & MLflow Callback Integration

Ray Tune's implementation of TPE takes a configuration dictionary that designates both the hyperparameters included in the search space and the range of values each hyperparameter may take as the search explores candidate values across trials.

Unlike some other Bayesian optimization methods that assume an exclusively continuous search space, TPE handles both discrete and continuous hyperparameters. Ray Tune provides syntax to declare discrete hyperparameters via `tune.choice` and continuous ones via `tune.uniform` within the configuration dictionary. `tune.choice` lets the search select any of the listed discrete values, while `tune.uniform` permits any continuous value between the given minimum and maximum. We set the training batch size, learning rate, weight decay, and number of training epochs as our searchable hyperparameters.

We'll additionally create an `MLflowLoggerCallback` instance that provides the arguments Ray Tune needs to log its hyperparameter searches to the remote BLS MLflow server. This includes the same experiment name we set for the transformer model earlier in the script, as well as `save_artifact=True` so that each trial's output files are logged as MLflow artifacts.

In [26]:
def tune_config(trial): 
    return { "per_device_train_batch_size": tune.choice([8, 16, 32, 64]),
             "learning_rate": tune.uniform(2e-5, 3e-5),
             "weight_decay": tune.uniform(0.0, 0.1),
             "num_train_epochs": tune.choice([3, 5, 7, 10])}

callbacks=[MLflowLoggerCallback(
        tracking_uri="http://<Remote IP>:<Port>",
        experiment_name="hf-onet-experiment",
        save_artifact=True)]
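
To see what kind of trial configuration this search space can produce, the sketch below draws a few candidates using only Python's `random` module. It is a stand-in for illustration, not Ray Tune's sampling code: `rng.choice` plays the role of `tune.choice` and `rng.uniform` the role of `tune.uniform`, and `sample_trial_config` is a hypothetical helper name. Independent draws like these correspond to what Random Search would do; TPE replaces them with suggestions informed by earlier trials.

```python
import random

def sample_trial_config(rng):
    """Draw one candidate configuration from the same ranges as the search space above."""
    return {
        "per_device_train_batch_size": rng.choice([8, 16, 32, 64]),  # discrete options
        "learning_rate": rng.uniform(2e-5, 3e-5),                    # continuous range
        "weight_decay": rng.uniform(0.0, 0.1),                       # continuous range
        "num_train_epochs": rng.choice([3, 5, 7, 10]),               # discrete options
    }

rng = random.Random(0)
sampled_configs = [sample_trial_config(rng) for _ in range(3)]
for config in sampled_configs:
    print(config)
```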

## 5.2 Hyperparameter Training Configuration

The next code block both establishes the full configuration of our hyperparameter search and launches the search when run.

- `trainer.hyperparameter_search` itself comes from the transformers library. Its first argument, `hp_space`, receives the function that returns our configuration dictionary.
- `n_trials` specifies how many unique trials should be launched within the search run. This parameter is key to benefiting from the sequential Bayesian strengths of TPE. I've set it to 3 for demonstration purposes, but I'd encourage higher values when the method is used with the direct intention of settling on hyperparameter selections.
- `direction='maximize'` establishes that the search aims to increase our chosen primary metric of F1 score, in contrast to metrics that are meant to be minimized, such as cross-entropy loss.
- `backend='ray'` tells transformers to use the Ray Tune implementation for hyperparameter optimization, which is particularly helpful when alternative libraries such as Optuna are installed in the environment and transformers might otherwise select them instead.
- `search_alg` initializes `HyperOptSearch`, the class that provides the TPE implementation. The metric to optimize is set to the F1 score on the evaluation set, and `mode='max'` tells Ray Tune to maximize it, in line with the `direction` argument passed to transformers.
- `callbacks` passes the MLflow callback for the optimization trials, while `local_dir` passes the storage directory for each launched trial.
- The `resources_per_trial` dictionary tells Ray Tune that one CPU core and one GPU are available for each trial, while `verbose` lowers the output verbosity, since Ray Tune otherwise produces extensive logs. Even the lower level is still quite extensive, so you'll have to do a fair amount of scrolling to get to the end of the following block.

Finally, note that the result of the hyperparameter search is assigned to `best_run`, which retains the best hyperparameters identified across the 3 trials for later reference in our model workflow.

In [None]:
best_run = trainer.hyperparameter_search(
    hp_space=tune_config, n_trials=3, direction='maximize', backend='ray',
    search_alg=HyperOptSearch(metric='eval_f1', mode='max'),
    callbacks=callbacks, local_dir='<Ray Results Directory>',
    resources_per_trial={"cpu": 1, "gpu": 1}, verbose=1)

# 6.0 Trial Recording within MLflow

With the three trial runs now complete, let's switch over to the remote BLS MLflow server's UI at `http://<Remote IP>:<Port>/` to see how these trials are logged in MLflow. Each run is automatically named after the objective ID that Ray Tune creates to keep track of individual trials. Note that only 4 parameters are logged, compared with the 145 captured when we log a complete transformer model. These 4 are the hyperparameter values selected for the given run.

While the files of the transformer model components, such as the PyTorch model and its training arguments, are captured in the results folder, we cannot save our trials as MLflow models by customizing the `mlflow.pyfunc` model flavor as we did in the base transformers example notebook. Some new artifacts are recorded, however: a JSON file of the hyperparameter values selected for the trial, plus JSON and CSV files tracking metric values over training epochs.

You can think of the logged optimization trials less as full models and more as experimental runs that guide how to configure the final model with the best hyperparameters identified. These runs are particularly well suited to MLflow's Compare feature, which lets you visually contrast metric performance across the tested hyperparameter values:

![tuneviz.png](../imgs/tuneviz.png)

Comparing the runs both visually and by their metric values makes it clear which tested run is our top performer.

# 7.0 Initializing the Best Tuned Model

We can also corroborate which optimization trial and associated hyperparameters achieved the highest performance in MLflow by inspecting our stored `best_run` object:

In [None]:
best_run.hyperparameters.items()
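
One way to reuse such a result, as an alternative to the checkpoint-download route this notebook takes next, is to copy the winning values onto the training arguments before a final training run. The sketch below uses standard-library stand-ins so that it runs without a completed search: `BestRunStandIn` mimics the `run_id`, `objective`, and `hyperparameters` fields of the object returned by `hyperparameter_search`, `demo_args` stands in for `trainer.args`, and every value shown is invented for illustration.

```python
from collections import namedtuple
from types import SimpleNamespace

# Stand-in for the object returned by trainer.hyperparameter_search (values are invented)
BestRunStandIn = namedtuple("BestRunStandIn", ["run_id", "objective", "hyperparameters"])
example_best_run = BestRunStandIn(
    run_id="demo-run",
    objective=0.68,
    hyperparameters={
        "per_device_train_batch_size": 16,
        "learning_rate": 2.5e-5,
        "weight_decay": 0.05,
        "num_train_epochs": 10,
    },
)

# Stand-in for trainer.args, starting from arbitrary default values
demo_args = SimpleNamespace(per_device_train_batch_size=8, learning_rate=5e-5,
                            weight_decay=0.0, num_train_epochs=3)

# Overwrite each tuned hyperparameter on the arguments object
for name, value in example_best_run.hyperparameters.items():
    setattr(demo_args, name, value)

print(demo_args)
```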

With this insight into the optimal hyperparameters, we can now log our tuned transformers model more comprehensively within MLflow. The artifacts logged for each Ray Tune trial include both the PyTorch model file and its training arguments, which we can use to reinitialize our model with the hyperparameter values of the best-performing trial. We use the `MlflowClient` to download these files into a temporary directory by referencing the run ID and the artifact directory path of the best-performing trial. We can then use transformers and PyTorch loading methods to bring the model and its training arguments into our workspace.

In [None]:
client = MlflowClient()
tempdir = tempfile.mkdtemp()
model_path = client.download_artifacts("<Model Run>", "<Logged Model Checkpoint>", tempdir)

loaded_trial = AutoModelForSequenceClassification.from_pretrained(model_path, problem_type='multi_label_classification',
                                                                    num_labels=len(labels), id2label=id2label,
                                                                    label2id=label2id)
loaded_arguments = torch.load(os.path.join(model_path, 'training_args.bin'))

To combine the parameters, metrics, and artifacts from training, validation, and testing, the model needs to be re-run separately from the hyperparameter tuning so that it is logged to MLflow as a complete model. We therefore reinitialize the Trainer with the loaded model and training arguments and record performance on the training and evaluation data in MLflow.

In [14]:
trainer = Trainer(
    model=loaded_trial,
    args=loaded_arguments,
    train_dataset=tokenized_training,
    eval_dataset=tokenized_validation,
    compute_metrics=compute_metrics
)

In [15]:
trainer.train()

***** Running training *****
  Num examples = 26822
  Num Epochs = 10
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 64
  Gradient Accumulation steps = 4
  Total optimization steps = 4190
Trainer is attempting to log a value of "{0: '4A1a1', 1: '4A1a2', 2: '4A1b1', 3: '4A1b2', 4: '4A1b3', 5: '4A2a1', 6: '4A2a2', 7: '4A2a3', 8: '4A2a4', 9: '4A2b1', 10: '4A2b2', 11: '4A2b3', 12: '4A2b4', 13: '4A2b5', 14: '4A2b6', 15: '4A3a1', 16: '4A3a2', 17: '4A3a3', 18: '4A3a4', 19: '4A3b1', 20: '4A3b4', 21: '4A3b6', 22: '4A4a1', 23: '4A4a2', 24: '4A4a3', 25: '4A4a4', 26: '4A4a5', 27: '4A4a6', 28: '4A4a7', 29: '4A4a8', 30: '4A4b3', 31: '4A4b4', 32: '4A4b5', 33: '4A4b6', 34: '4A4c1', 35: '4A4c2', 36: '4A4c3'}" for key "id2label" as a parameter. MLflow's log_param() only accepts values no longer than 250 characters so we dropped this attribute.
Trainer is attempting to log a value of "{'4A1a1': 0, '4A1a2': 1, '4A1b1': 2, '4A1b2': 3, '4A1b3'

Epoch,Training Loss,Validation Loss,F1,Roc Auc,Accuracy
0,No log,0.060143,0.6686,0.819636,0.576877
1,0.033700,0.060942,0.675154,0.824253,0.588796
2,0.027500,0.063044,0.672077,0.82351,0.583135
3,0.022000,0.064608,0.673414,0.826925,0.583433
4,0.018300,0.065929,0.675896,0.829078,0.585518
5,0.015200,0.06708,0.676943,0.829248,0.58671
6,0.015200,0.067927,0.678137,0.830263,0.588498
7,0.012700,0.068793,0.679162,0.83103,0.587306
8,0.011200,0.069479,0.677809,0.832044,0.584625
9,0.009900,0.069436,0.677673,0.83096,0.587902


***** Running Evaluation *****
  Num examples = 3356
  Batch size = 8
Saving model checkpoint to ./results/checkpoint-419
Configuration saved in ./results/checkpoint-419/config.json
Model weights saved in ./results/checkpoint-419/pytorch_model.bin
Deleting older checkpoint [results/checkpoint-2316] due to args.save_total_limit
***** Running Evaluation *****
  Num examples = 3356
  Batch size = 8
Saving model checkpoint to ./results/checkpoint-838
Configuration saved in ./results/checkpoint-838/config.json
Model weights saved in ./results/checkpoint-838/pytorch_model.bin
Deleting older checkpoint [results/checkpoint-2702] due to args.save_total_limit
***** Running Evaluation *****
  Num examples = 3356
  Batch size = 8
Saving model checkpoint to ./results/checkpoint-1257
Configuration saved in ./results/checkpoint-1257/config.json
Model weights saved in ./results/checkpoint-1257/pytorch_model.bin
Deleting older checkpoint [results/checkpoint-419] due to args.save_total_limit
***** Runni

TrainOutput(global_step=4190, training_loss=0.01836893774729071, metrics={'train_runtime': 930.1294, 'train_samples_per_second': 288.368, 'train_steps_per_second': 4.505, 'total_flos': 3471854254794600.0, 'train_loss': 0.01836893774729071, 'epoch': 10.0})

Since identifying good hyperparameters would ideally give us the confidence to move on to our held-out testing data, let's generate predictions on a pseudo-test set and log the computed performance metrics.

In [16]:
predictions = trainer.predict(tokenized_testing)

***** Running Prediction *****
  Num examples = 343
  Batch size = 8


In [17]:
pred_metrics = predictions.metrics
mlflow.log_metrics(pred_metrics)

Let's resave our model following this additional training into the temporary directory we created earlier and then use the `model_loader_hf` script we established within the base transformers notebook to log our model as a custom `mlflow.pyfunc` model.

In [19]:
loaded_trial.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

Configuration saved in /tmp/tmpv4uapgd_/./results/run-41b8700e/checkpoint-4190/config.json
Model weights saved in /tmp/tmpv4uapgd_/./results/run-41b8700e/checkpoint-4190/pytorch_model.bin
tokenizer config file saved in /tmp/tmpv4uapgd_/./results/run-41b8700e/checkpoint-4190/tokenizer_config.json
Special tokens file saved in /tmp/tmpv4uapgd_/./results/run-41b8700e/checkpoint-4190/special_tokens_map.json


('/tmp/tmpv4uapgd_/./results/run-41b8700e/checkpoint-4190/tokenizer_config.json',
 '/tmp/tmpv4uapgd_/./results/run-41b8700e/checkpoint-4190/special_tokens_map.json',
 '/tmp/tmpv4uapgd_/./results/run-41b8700e/checkpoint-4190/vocab.json',
 '/tmp/tmpv4uapgd_/./results/run-41b8700e/checkpoint-4190/merges.txt',
 '/tmp/tmpv4uapgd_/./results/run-41b8700e/checkpoint-4190/added_tokens.json',
 '/tmp/tmpv4uapgd_/./results/run-41b8700e/checkpoint-4190/tokenizer.json')

In [21]:
mlflow.pyfunc.log_model("best_model", 
                            data_path=model_path, 
                            code_path=["./pyfiles/model_loader_hf.py"], 
                            loader_module="model_loader_hf",
                            conda_env="conda.yaml")

mlflow.log_artifact("hf_raytune_example_4_extended.ipynb")
mlflow.log_artifact("./pyfiles/helpers.py")

In [22]:
mlflow.end_run()

By ending the model run, we've now incorporated hyperparameter tuning via the TPE Bayesian optimization approach, from initial hyperparameter exploration all the way to generating predictions on held-out data.