### Fine-tuning a transformer model with enhanced model management using MLflow

- Training Cycle
- Model Logging and Management
- Inference and Deployment

In [28]:
# Disable tokenizers warnings when constructing pipelines
%env TOKENIZERS_PARALLELISM=false

import warnings

# Disable a few less-than-useful UserWarnings from setuptools and pydantic
warnings.filterwarnings("ignore", category=UserWarning)

env: TOKENIZERS_PARALLELISM=false


### Preparing the dataset for the model

In [29]:
import evaluate
import numpy as np
from datasets import load_dataset
from transformers import (
  AutoModelForSequenceClassification,
  AutoTokenizer,
  Trainer,
  TrainingArguments,
  pipeline,
)

import mlflow

# Load the "sms_spam" dataset.
sms_dataset = load_dataset("sms_spam")

# Split train/test by an 8/2 ratio.
sms_train_test = sms_dataset["train"].train_test_split(test_size=0.2)
train_dataset = sms_train_test["train"]
test_dataset = sms_train_test["test"]

### Tokenization Process

In [30]:
# Load the tokenizer for "distilbert-base-uncased" model.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")


def tokenize_function(examples):
  # Pad/truncate each text to 128 tokens. Enforcing a uniform shape
  # can make training faster.
  return tokenizer(
      examples["sms"],
      padding="max_length",
      truncation=True,
      max_length=128,
  )


seed = 22

# Tokenize the train and test datasets
train_tokenized = train_dataset.map(tokenize_function)
train_tokenized = train_tokenized.remove_columns(["sms"]).shuffle(seed=seed)

test_tokenized = test_dataset.map(tokenize_function)
test_tokenized = test_tokenized.remove_columns(["sms"]).shuffle(seed=seed)

Map: 100%|██████████| 4459/4459 [00:01<00:00, 2471.93 examples/s]
Map: 100%|██████████| 1115/1115 [00:00<00:00, 2227.46 examples/s]
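
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a minimal, dependency-free sketch of what happens to a single sequence of token ids. The helper `pad_or_truncate` and the token ids are hypothetical and exist only for illustration. The pad id of 0 is an assumption that matches the `distilbert-base-uncased` vocabulary. The real tokenizer also keeps the closing special token when it truncates, which this sketch ignores.

```python
PAD_ID = 0  # assumed pad token id (0 in the distilbert-base-uncased vocabulary)


def pad_or_truncate(token_ids, max_length, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), each exactly max_length long."""
    kept = token_ids[:max_length]          # truncation: drop tokens past max_length
    n_pad = max_length - len(kept)         # padding: how many slots remain
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# Made-up token ids: one sequence shorter than max_length, one longer.
short_ids, short_mask = pad_or_truncate([5, 6, 7], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(1, 11)), max_length=6)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Every example ends up with the same length, so batches can be stacked into one tensor without any per-batch padding logic.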


### Label Mapping and Model Initialization

In [31]:
# Set the mapping between int label and its meaning.
id2label = {0: "ham", 1: "spam"}
label2id = {"ham": 0, "spam": 1}

# Acquire the model from the Hugging Face Hub, providing label and id mappings so that both we and the model can 'speak' the same language.
model = AutoModelForSequenceClassification.from_pretrained(
  "distilbert-base-uncased",
  num_labels=2,
  label2id=label2id,
  id2label=id2label,
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Setting up Evaluation Metrics

In [32]:
# Define the target optimization metric
metric = evaluate.load("accuracy")


# Define a function for calculating our defined target optimization metric during training
def compute_metrics(eval_pred):
  logits, labels = eval_pred
  predictions = np.argmax(logits, axis=-1)
  return metric.compute(predictions=predictions, references=labels)

Downloading builder script: 4.20kB [00:00, 7.06MB/s]
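
As a rough illustration of what `compute_metrics` calculates, the sketch below reproduces the same argmax-then-compare logic in plain Python. The function `accuracy_from_logits` and the toy logits and labels are hypothetical. The actual notebook relies on `np.argmax` and the `evaluate` accuracy metric.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits, labels):
    """Pick the highest-scoring class per example, then compare to the labels."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Toy data: four examples, two classes (0 = ham, 1 = spam).
toy_logits = [[2.1, -1.3], [-0.4, 3.0], [0.2, 0.1], [-1.0, 0.5]]
toy_labels = [0, 1, 1, 1]

result = accuracy_from_logits(toy_logits, toy_labels)
print(result)  # {'accuracy': 0.75}
```

The third example is predicted as class 0 but labeled 1, so three of the four predictions match.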


### Configuring the training environment

In [33]:
# Checkpoints will be output to this `training_output_dir`.
training_output_dir = "/tmp/sms_trainer"
training_args = TrainingArguments(
  output_dir=training_output_dir,
  eval_strategy='epoch',
  per_device_train_batch_size=8,
  per_device_eval_batch_size=8,
  logging_steps=8,
  num_train_epochs=3,
)

# Instantiate a `Trainer` instance that will be used to initiate a training run.
trainer = Trainer(
  model=model,
  args=training_args,
  train_dataset=train_tokenized,
  eval_dataset=test_tokenized,
  compute_metrics=compute_metrics,
)
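
To get a feel for how long this configuration trains, the following back-of-the-envelope sketch estimates the number of optimizer steps. It assumes a single device, no gradient accumulation, and that the last partial batch is kept. The figure of 4459 training examples comes from the tokenization output earlier in this notebook.

```python
import math

n_train = 4459          # training examples, from the Map output above
per_device_batch = 8    # per_device_train_batch_size
epochs = 3              # num_train_epochs
logging_steps = 8       # logging_steps

# Assumption: one device and gradient_accumulation_steps=1,
# so one batch corresponds to one optimizer step.
steps_per_epoch = math.ceil(n_train / per_device_batch)
total_steps = steps_per_epoch * epochs
log_events = total_steps // logging_steps  # approximate number of loss log entries

print(steps_per_epoch, total_steps, log_events)  # 558 1674 209
```

On multiple GPUs or with gradient accumulation, the effective batch size grows and these counts shrink accordingly.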

### Setting the tracking URI

In [34]:
mlflow.set_tracking_uri("http://127.0.0.1:5000")

### Creating an MLflow Experiment, Initiating an MLflow Run, and Monitoring the training progress

In [37]:
# Pick a name that reflects the nature of the runs you will record in this experiment.
mlflow.set_experiment("Spam Classifier fine tuning")
with mlflow.start_run() as run:
  trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.1973        | 0.032793        | 0.992825 |
| 2     | 0.0001        | 0.044225        | 0.993722 |
| 3     | 0.0001        | 0.043742        | 0.994619 |


🏃 View run sassy-crab-361 at: http://127.0.0.1:5000/#/experiments/804052913340005351/runs/38e365b5613b45ce9407b26df438d373
🧪 View experiment at: http://127.0.0.1:5000/#/experiments/804052913340005351


### Creating a Pipeline with the Fine-Tuned Model

In [38]:
# This example runs on an NVIDIA GPU ("cuda"). Change `device` to match your hardware, e.g. "mps" for Apple Silicon or "cpu" if no accelerator is available.

tuned_pipeline = pipeline(
  task="text-classification",
  model=trainer.model,
  batch_size=8,
  tokenizer=tokenizer,
  device="cuda",
)

Device set to use cuda


### Validating Fine-tuned model

In [39]:
# Perform a validation of our assembled pipeline that contains our fine-tuned model.
quick_check = (
  "I have a question regarding the project development timeline and allocated resources; "
  "specifically, how certain are you that John and Ringo can work together on writing this next song? "
  "Do we need to get Paul involved here, or do you truly believe, as you said, 'nah, they got this'?"
)

tuned_pipeline(quick_check)

[{'label': 'ham', 'score': 0.9999444484710693}]
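
The `score` in the output above is a probability, not a raw logit. For a single-label classifier with two classes, the text-classification pipeline by default applies a softmax to the model's logits and reports the top class. The sketch below mimics that post-processing in plain Python. The function `classify` and the example logits are hypothetical, chosen only to show the shape of the computation.

```python
import math

id2label = {0: "ham", 1: "spam"}


def classify(logits):
    """Softmax the logits, then return the top label and its probability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Made-up logits that strongly favor class 0 ("ham").
example = classify([4.9, -4.9])
print(example)
```

A large gap between the two logits pushes the winning probability very close to 1, which is why confident predictions show scores such as 0.9999.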

### Model Configuration and Signature Inference

In [40]:
# Define a set of parameters that we would like to be able to flexibly override at inference time, along with their default values
model_config = {"batch_size": 8}

# Infer the model signature, including a representative input, the expected output, and the parameters that we would like to be able to override at inference time.
signature = mlflow.models.infer_signature(
  ["This is a test!", "And this is also a test."],
  mlflow.transformers.generate_signature_output(
      tuned_pipeline, ["This is a test response!", "So is this."]
  ),
  params=model_config,
)

### Model Logging

In [41]:
# Log the pipeline to the existing training run
with mlflow.start_run(run_id=run.info.run_id):
  model_info = mlflow.transformers.log_model(
      transformers_model=tuned_pipeline,
      name="fine_tuned",
      signature=signature,
      input_example=["Pass in a string", "And have it mark as spam or not."],
      model_config=model_config,
  )

Device set to use cuda:0


🏃 View run sassy-crab-361 at: http://127.0.0.1:5000/#/experiments/804052913340005351/runs/38e365b5613b45ce9407b26df438d373
🧪 View experiment at: http://127.0.0.1:5000/#/experiments/804052913340005351


### Loading and Testing the Model from MLflow

In [42]:
# Load our saved model in the native transformers format
loaded = mlflow.transformers.load_model(model_uri=model_info.model_uri)

# Define a test example that we expect to be classified as spam
validation_text = (
  "Want to learn how to make MILLIONS with no effort? Click HERE now! See for yourself! Guaranteed to make you instantly rich! "
  "Don't miss out you could be a winner!"
)

# Validate that the model loaded from MLflow classifies the example as expected
loaded(validation_text)

Downloading artifacts: 100%|██████████| 15/15 [00:07<00:00,  2.01it/s]
2025/07/06 18:26:59 INFO mlflow.transformers: 'models:/m-476bbb2842dc487f868a2c70934fc805' resolved as 'mlflow-artifacts:/804052913340005351/models/m-476bbb2842dc487f868a2c70934fc805/artifacts'
Downloading artifacts: 100%|██████████| 1/1 [00:00<00:00, 220.06it/s]
Device set to use cuda:0


[{'label': 'spam', 'score': 0.99962317943573}]