<a href="https://colab.research.google.com/github/ManjunathAdi/LLMs/blob/main/MLflow_Fine_tune_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook builds an end-to-end pipeline for fine-tuning a pre-trained LLM (Large Language Model), with MLflow handling experiment tracking, metrics logging, and model versioning. The examples fine-tune GPT-Neo 125M (EleutherAI/gpt-neo-125M) on the WikiText-2 dataset; other pre-trained models (like GPT-2 or BERT) and a custom dataset can be substituted. The goal is to automate the whole pipeline: loading and preprocessing the dataset, fine-tuning the model, tracking experiments, logging metrics, versioning the fine-tuned model, and deploying it using MLflow’s serving capabilities.

### Steps:

* Dataset Preparation: Load and preprocess a custom dataset for fine-tuning.
* Fine-tuning: Use Hugging Face’s Trainer class for fine-tuning the LLM.
* Automated Experiment Tracking: Log hyperparameters, metrics, and artifacts using MLflow.
* Model Versioning: Register and version the fine-tuned model in MLflow’s Model Registry.
* Deployment: Deploy the fine-tuned LLM using MLflow’s serving capabilities.
* Logging Metrics: Automatically log metrics (like evaluation loss, accuracy, etc.) during the fine-tuning process.

# Libraries Installation

Before starting, ensure you have MLflow, Hugging Face Transformers, PyTorch, and Datasets installed:

In [2]:
#!pip install mlflow transformers torch datasets


# Fine-tuning an LLM with MLflow

### Step 1: Preprocess Custom Dataset

In [3]:
import mlflow
from transformers import AutoTokenizer, GPTNeoForCausalLM, Trainer, TrainingArguments
from datasets import load_dataset

# Preprocess dataset
def preprocess_data():
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")

    # Add padding token to the tokenizer
    # GPT-Neo models typically use the eos_token as the pad_token
    tokenizer.pad_token = tokenizer.eos_token

    # Tokenize dataset
    def tokenize_function(examples):
        return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

    tokenized_datasets = dataset.map(tokenize_function, batched=True, remove_columns=["text"])
    return tokenized_datasets, tokenizer

# Set up MLflow experiment
mlflow.set_experiment("Fine-Tuning-GPT-Neo-Experiment")


2024/10/09 13:53:17 INFO mlflow.tracking.fluent: Experiment with name 'Fine-Tuning-GPT-Neo-Experiment' does not exist. Creating a new experiment.


<Experiment: artifact_location='file:///content/mlruns/302910649505589496', creation_time=1728481997992, experiment_id='302910649505589496', last_update_time=1728481997992, lifecycle_stage='active', name='Fine-Tuning-GPT-Neo-Experiment', tags={}>

* Dataset Loading: We load the WikiText-2 dataset using load_dataset() from Hugging Face.
* Preprocessing: The dataset is tokenized using GPT-Neo’s tokenizer (EleutherAI/gpt-neo-125M). We truncate and pad sequences to a maximum length of 128 tokens.
* MLflow Experiment Setup: Set up the MLflow experiment under the name Fine-Tuning-GPT-Neo-Experiment.
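
To make the effect of `truncation=True, padding="max_length", max_length=128` concrete, here is a minimal pure-Python sketch of what happens to a single sequence of token IDs. The function name `pad_or_truncate` is hypothetical, and the pad ID 50256 is assumed to be GPT-Neo's EOS token ID; in the notebook the tokenizer does this work itself and also returns an attention mask like the one built here.

```python
def pad_or_truncate(token_ids, max_length=128, pad_id=50256):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]            # truncation: drop tokens past max_length
    n_pad = max_length - len(kept)           # number of pad tokens still needed
    input_ids = kept + [pad_id] * n_pad      # right-padding with the pad token
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# A short sequence is padded, a long one is cut
short_ids, short_mask = pad_or_truncate([10, 11, 12], max_length=5)
long_ids, long_mask = pad_or_truncate(list(range(200)), max_length=128)
print(short_ids, short_mask, len(long_ids))
```

Padding every sequence to the same length keeps batches rectangular, at the cost of spending compute on pad positions for short lines.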

### Step 2: Define the Fine-tuning Pipeline

In [4]:
def fine_tune_gpt_neo():
    # Preprocess the data
    tokenized_datasets, tokenizer = preprocess_data()

    # Initialize the GPT-Neo model
    model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

    # Define training arguments
    training_args = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
        learning_rate=2e-5,
        per_device_train_batch_size=128,
        num_train_epochs=1,
        weight_decay=0.01,
        save_total_limit=2,
    )

    # Initialize Hugging Face Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
    )

    return trainer, model, tokenizer

# Fine-tune the model
trainer, model, tokenizer = fine_tune_gpt_neo()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


(Download and tokenization progress bars omitted. Dataset splits: 36,718 training, 3,760 validation, and 4,358 test examples. Model checkpoint: model.safetensors, 526 MB.)



* Model Initialization: We initialize the GPT-Neo model with GPTNeoForCausalLM from Hugging Face, specifically the EleutherAI/gpt-neo-125M version.
* Training Arguments: Training parameters such as learning rate, batch size, number of epochs, and weight decay are set. The evaluation strategy is set to "epoch", meaning evaluation will occur after every epoch.
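
A quick piece of arithmetic helps when reading the training log. The WikiText-2 training split used here has 36,718 rows, so with a per-device batch size of 128 one epoch is only a few hundred optimizer steps. The sketch below assumes a single device and no gradient accumulation. The 500-step figure is the default `logging_steps` of `TrainingArguments`, which would explain a "No log" entry for the training loss in a one-epoch run.

```python
import math

train_examples = 36_718        # rows in the WikiText-2 training split
per_device_batch_size = 128    # per_device_train_batch_size in TrainingArguments
num_devices = 1                # assumption: a single GPU (or CPU)
default_logging_steps = 500    # TrainingArguments default for logging_steps

# The last, partially filled batch still counts as a step
steps_per_epoch = math.ceil(train_examples / (per_device_batch_size * num_devices))

# The training loss is first reported once logging_steps steps have elapsed
training_loss_is_logged = steps_per_epoch >= default_logging_steps
print(steps_per_epoch, training_loss_is_logged)
```

Passing a smaller `logging_steps` (for example 50) to `TrainingArguments` should make the training loss appear during a run this short.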

In [5]:
def compute_loss(model, inputs, return_outputs=False, **kwargs):
    """
    Custom causal language modeling loss: predict token t+1 from tokens up to t.

    Extra keyword arguments that some Trainer versions pass
    (for example num_items_in_batch) are accepted and ignored.
    """
    from torch.nn import CrossEntropyLoss

    # Use a copy of the input IDs as labels
    labels = inputs["input_ids"].clone()

    # Set labels to -100 at padded positions so they are ignored by the loss.
    # The pad token is the EOS token here, so genuine EOS tokens are ignored too.
    labels[labels == tokenizer.pad_token_id] = -100

    # Forward pass without labels: only the logits are needed
    outputs = model(**inputs)

    # Align predictions with targets: the logits at position t score token t+1
    shift_logits = outputs.logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()

    # Flatten to (tokens, vocab) and (tokens,); CrossEntropyLoss skips -100 by default
    loss_fct = CrossEntropyLoss()
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

    # The Trainer asks for (loss, outputs) in some code paths
    return (loss, outputs) if return_outputs else loss

# Override the trainer's compute_loss with the custom function
trainer.compute_loss = compute_loss
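
The shift-and-mask logic is easier to verify on a toy example. The following pure-Python sketch (no PyTorch; the helper name `shifted_cross_entropy` is hypothetical) computes the same quantity for one sequence: the scores at position t are compared with the token at position t+1, and targets equal to -100 are skipped, mirroring the default `ignore_index` of `CrossEntropyLoss`.

```python
import math

IGNORE_INDEX = -100  # same sentinel that CrossEntropyLoss ignores by default


def shifted_cross_entropy(logits, labels):
    """Mean next-token cross-entropy for one sequence.

    logits: list of per-position score lists (seq_len x vocab_size)
    labels: list of token IDs (seq_len), with IGNORE_INDEX at masked positions
    """
    losses = []
    # Position t predicts token t+1: drop the last logit row and the first label
    for scores, target in zip(logits[:-1], labels[1:]):
        if target == IGNORE_INDEX:
            continue
        log_norm = math.log(sum(math.exp(s) for s in scores))  # log-sum-exp
        losses.append(log_norm - scores[target])               # -log softmax[target]
    return sum(losses) / len(losses)


# Uniform scores over a 4-token vocabulary: each prediction has probability 1/4
toy_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(4)]
toy_labels = [2, 1, 3, IGNORE_INDEX]  # the last position stands in for padding
toy_loss = shifted_cross_entropy(toy_logits, toy_labels)
print(toy_loss)
```

With uniform scores the loss equals the natural logarithm of the vocabulary size, which is a handy sanity check for an untrained model.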

### Step 3: Integrate MLflow for Tracking Experiments and Metrics

In [6]:

def run_fine_tuning_experiment():
    #trainer, model, tokenizer = fine_tune_gpt_neo()

    # Start an MLflow run
    with mlflow.start_run() as run:  # Assign the run object to a variable
        # Log parameters
        mlflow.log_param("model_name", "EleutherAI/gpt-neo-125M")
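        # Log the values the Trainer actually uses. A hand-typed value that differs
        # from TrainingArguments conflicts with what the Trainer logs to MLflow itself.
        mlflow.log_param("learning_rate", trainer.args.learning_rate)
        mlflow.log_param("epochs", trainer.args.num_train_epochs)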

        # Fine-tune the model and log metrics
        trainer.train()

        # Evaluate the model and log metrics
        eval_metrics = trainer.evaluate()

        # Check if 'eval_loss' is present in eval_metrics before logging
        if 'eval_loss' in eval_metrics:
            mlflow.log_metrics({
                "eval_loss": eval_metrics["eval_loss"]
            })
        else:
            print("Warning: 'eval_loss' not found in evaluation metrics.")

        # Log the fine-tuned model
        mlflow.pytorch.log_model(model, "model")

        # Register the model in MLflow Model Registry
        # Get the actual run ID from the run object
        model_uri = f"runs:/{run.info.run_id}/model"
        mlflow.register_model(model_uri, "FineTuned_GPT_Neo_Model")

        # Log additional artifacts (tokenizer)
        tokenizer.save_pretrained("./tokenizer")
        mlflow.log_artifact("./tokenizer")

run_fine_tuning_experiment()

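Note on the error below: this output was produced by a run in which learning_rate was logged by hand as 2e-2 (0.02). The Trainer’s MLflow integration also logs the values from TrainingArguments, including learning_rate=2e-05, and MLflow does not allow a parameter to be changed to a different value within the same run.

2024/10/09 13:53:32 ERROR mlflow.utils.async_logging.async_logging_queue: Run Id d48718ac076748fa83fb5ad2d7117b87: Failed to log run data: Exception: Changing param values is not allowed. Param with key='learning_rate' was already logged with value='0.02' for run ID='d48718ac076748fa83fb5ad2d7117b87'. Attempted logging new value '2e-05'.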


Epoch,Training Loss,Validation Loss
1,No log,No log




Successfully registered model 'FineTuned_GPT_Neo_Model'.
Created version '1' of model 'FineTuned_GPT_Neo_Model'.


* MLflow Run: We initiate an MLflow run using mlflow.start_run(). This context ensures all tracking operations (logging of parameters, metrics, and artifacts) are associated with a specific run.
* Logging Parameters: Hyperparameters such as model name, learning rate, and the number of epochs are logged using mlflow.log_param().
* Training and Logging Metrics: The trainer.train() method fine-tunes the model, and evaluation metrics such as evaluation loss are logged using mlflow.log_metrics().
* Model Logging: The fine-tuned model is logged as a PyTorch model with mlflow.pytorch.log_model().
* Model Versioning: The fine-tuned model is registered in MLflow’s Model Registry. This allows us to version the model for future deployments.

### Step 4: Deploy the Fine-Tuned Model Using MLflow

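To deploy the fine-tuned model, MLflow provides a command-line interface that serves a registered model version as a REST API. Run the command from a terminal (in a notebook cell, prefix it with `!`); it keeps running until it is interrupted: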

In [None]:
mlflow models serve -m models:/FineTuned_GPT_Neo_Model/1 -p 5000


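MLflow Model Serving: This command serves version 1 of the registered model on port 5000 as a REST API; predictions are requested by sending a POST request to the /invocations endpoint. Note that the model was logged with the PyTorch flavor and the tokenizer was stored as a separate artifact, so the endpoint does not tokenize raw text on its own. Sending text directly typically requires logging the model with a flavor that bundles the tokenizer, such as mlflow.transformers.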
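
As a rough sketch of what a client call could look like, the snippet below builds (but does not send) a JSON POST request for the `/invocations` endpoint of the server started above. The helper name `build_invocation_request` and the prompt are hypothetical. A text payload like this assumes the served model accepts raw strings, for example a model logged with a flavor that bundles the tokenizer; a model logged with `mlflow.pytorch` expects numeric input instead.

```python
import json
import urllib.request


def build_invocation_request(inputs, host="127.0.0.1", port=5000):
    """Build a POST request for the /invocations endpoint of an MLflow model server."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invocation_request(["MLflow makes model serving"])

# To call a running server, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())
```

Keeping request construction separate from sending makes the payload easy to inspect before a server is available.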

### Step 5: Logging Metrics During Fine-Tuning

In [9]:
# Evaluate the model and log whichever metrics the Trainer returned
eval_metrics = trainer.evaluate()

# The Trainer prefixes evaluation metrics with "eval_", so accuracy (reported only
# when a compute_metrics function is passed to the Trainer) arrives as "eval_accuracy".
metrics_to_log = {}
for key in ("eval_loss", "eval_accuracy"):
    if eval_metrics.get(key) is not None:
        metrics_to_log[key] = eval_metrics[key]

# Only log metrics if the dictionary is not empty. Outside a `with mlflow.start_run()`
# block, MLflow starts a new run implicitly, so these values are recorded in a new run.
if metrics_to_log:
    mlflow.log_metrics(metrics_to_log)
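
The accuracy entry above is only present if the Trainer is given a `compute_metrics` function, which this notebook does not define. As a hedged sketch of what such a metric could compute for next-token prediction, the hypothetical helper below compares predicted token IDs with labels after the same one-position shift used by the loss, skipping positions marked -100. It works on plain lists; a real `compute_metrics` function would receive NumPy arrays of predictions and labels from the Trainer.

```python
IGNORE_INDEX = -100  # label value used for positions excluded from the loss


def next_token_accuracy(predicted_ids, labels):
    """Fraction of non-ignored next-token predictions that are correct.

    predicted_ids[t] is the model's guess for the token at position t + 1.
    """
    correct = 0
    counted = 0
    for guess, target in zip(predicted_ids[:-1], labels[1:]):
        if target == IGNORE_INDEX:
            continue  # padding or otherwise masked position
        counted += 1
        correct += int(guess == target)
    return correct / counted if counted else 0.0


accuracy = next_token_accuracy(
    predicted_ids=[7, 9, 4, 4],
    labels=[5, 7, 8, IGNORE_INDEX],
)
print(accuracy)
```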

# Explanation of the Process

* Automated Experiment Tracking:
MLflow tracks the fine-tuning experiment. Hyperparameters, evaluation metrics, and artifacts (such as the tokenizer and the model) are logged to the tracking store, where runs can be inspected and compared in the MLflow UI.
* Logging Metrics:
Metrics such as evaluation loss are logged after each evaluation; accuracy can be logged the same way once a compute_metrics function is supplied to the Trainer. The MLflow dashboard plots these values, allowing you to monitor the model’s performance across training.
* Model Versioning:
Each fine-tuned model is registered in MLflow’s Model Registry, which assigns it a version number. This simplifies production deployment, because a specific version of the fine-tuned model can be selected and served based on its performance.
* Reduced Deployment Overhead:
MLflow’s built-in model serving reduces deployment overhead. A registered model version can be served locally with a single command, without writing a custom serving application.
* Reproducibility:
By logging the model, tokenizer, and training parameters, MLflow records what is needed to rerun the fine-tuning process and compare results. This is crucial for future experiments and model retraining.
