## Introduction to MLflow and Transformers

Welcome to our first tutorial on leveraging the power of **Transformers** with **MLflow**. This tutorial is designed for beginners who are just getting started with machine learning workflows and model management. We will walk you through a simple example that demonstrates the integration of the Transformers library with MLflow, a platform that simplifies the machine learning lifecycle.

### What are Transformers?

Transformers are a type of deep learning model that have revolutionized the field of natural language processing (NLP). They are designed to handle sequential data and are particularly effective for tasks like language translation, text summarization, and question answering. The Transformers library, developed by [🤗 Hugging Face](https://huggingface.co/docs/transformers/index), provides a collection of state-of-the-art pre-trained models that you can use to perform a variety of NLP tasks.

### Why Combine MLflow with Transformers?

Combining MLflow with Transformers brings several benefits:

- **Experiment Tracking**: Easily log and compare parameters, metrics, and outputs from your Transformers models.
- **Model Management**: Keep track of different versions of your models, configured parameters, and their respective performance.
- **Reproducibility**: Record everything needed to reproduce a model prediction, from data preprocessing to model parameters.
- **Deployment**: Streamline the process of deploying your Transformers models to production.

### What Will You Learn?

In this tutorial, you will learn how to:

- Set up a simple text generation pipeline using the Transformers library.
- Log the model and its parameters using MLflow.
- Infer the input and output signature of the model automatically.
- Simulate serving the model using MLflow and make predictions with it.

By the end of this tutorial, you will have a clear understanding of how MLflow can enhance your workflow with Transformers models, making it easier to track, manage, and deploy your NLP applications backed by the fantastic Hugging Face ecosystem.

Let's get started!


## Imports and Pipeline Configuration

In this first section, we set up our environment and configure the Transformers pipeline that we'll use to generate text responses from the LLM.

### Steps

- We import the necessary libraries: `transformers` for building our NLP pipeline and `mlflow` for model tracking and management.
- We then define the task for our pipeline, which in this case is `text2text-generation`. This task involves generating new text based on the input text.
- Next, we create `generation_pipeline` using the `pipeline` function from the Transformers library. This pipeline is configured to use the `declare-lab/flan-alpaca-large` model, a pre-trained model suitable for text generation.
- We then set up `input_example`, a list of sample prompts. It is used to infer the model signature later on and gives a visual indication of the expected input format when the model is loaded as a `pyfunc`.
- Finally, we define `parameters` that control the model's behavior during inference: the maximum length of the generated text (`max_length`), whether to sample from the predicted token distribution rather than decode greedily (`do_sample`), and the sampling temperature (`temperature`).
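The `do_sample` flag switches between two decoding strategies. Greedy decoding always picks the most likely next token, while sampling draws a token at random in proportion to its probability. The toy sketch below shows the difference on a made-up next-token distribution. The vocabulary and probabilities are hypothetical, and this is an illustration of the idea, not how Transformers implements generation internally.

```python
import random

# Hypothetical next-token probabilities a model might assign after some prompt.
next_token_probs = {"hiking": 0.5, "kayaking": 0.3, "camping": 0.15, "sleeping": 0.05}


def greedy_pick(probs: dict[str, float]) -> str:
    """Greedy decoding (do_sample=False): always return the most likely token."""
    return max(probs, key=probs.get)


def sample_pick(probs: dict[str, float], rng: random.Random) -> str:
    """Sampling (do_sample=True): draw a token in proportion to its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print("greedy: ", [greedy_pick(next_token_probs) for _ in range(5)])
print("sampled:", [sample_pick(next_token_probs, rng) for _ in range(5)])
```

Greedy decoding returns the same token on every call. Sampling varies from call to call, which is why repeated predictions from this tutorial's model (logged with `do_sample=True`) can differ.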

### Understanding Pipelines

Pipelines are a high-level abstraction provided by the Transformers library that simplifies the process of using models for inference. They encapsulate the complexity of the underlying code, offering a straightforward API for a variety of tasks, such as text classification, question answering, and in our case, text generation.

#### The `pipeline()` function

The `pipeline()` function is a versatile tool that can be used to create a pipeline for any supported task. When you specify a task, the function returns a pipeline object tailored for that task. This object wires together the required sub-components (such as a tokenizer and a generative model) and calls them in the order the task requires. This abstraction dramatically simplifies the code required to use these models and their respective components.
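Conceptually, a pipeline chains three stages behind a single callable: preprocessing, a model call, and postprocessing. The following pure-Python toy illustrates only that idea. `ToyPipeline`, `toy_tokenize`, `toy_model`, and `toy_decode` are hypothetical stand-ins, not Transformers classes, and the "model" simply reverses the token order.

```python
from typing import Callable


def toy_tokenize(text: str) -> list[str]:
    """Stand-in tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def toy_model(tokens: list[str]) -> list[str]:
    """Stand-in 'model': reverses the token order."""
    return list(reversed(tokens))


def toy_decode(tokens: list[str]) -> str:
    """Stand-in decoder: join tokens back into a string."""
    return " ".join(tokens)


class ToyPipeline:
    """Runs each step in order, hiding the plumbing from the caller."""

    def __init__(self, *steps: Callable):
        self.steps = steps

    def __call__(self, inputs):
        # Accept a single string or a list of strings.
        single = isinstance(inputs, str)
        batch = [inputs] if single else list(inputs)
        outputs = []
        for item in batch:
            for step in self.steps:
                item = step(item)
            outputs.append(item)
        return outputs[0] if single else outputs


pipe = ToyPipeline(toy_tokenize, toy_model, toy_decode)
print(pipe("Hello MLflow World"))  # world mlflow hello
```

The caller never touches the individual steps. This is the convenience the real `pipeline()` provides, with real tokenizers and models in place of the stand-ins.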

#### Task-Specific Pipelines
In addition to the general `pipeline()` function, there are task-specific pipelines for different domains like audio, computer vision, and natural language processing. These specialized pipelines are optimized for their respective tasks and can provide additional convenience and functionality.

#### Benefits of Using Pipelines
Using pipelines has several advantages:

- **Simplicity**: You can perform complex tasks with a minimal amount of code.
- **Flexibility**: You can specify different models and configurations to customize the pipeline for your needs.
- **Efficiency**: Pipelines handle batching and dataset iteration internally, which can lead to performance improvements.

Because of this simple, high-level API, MLflow's `transformers` flavor uses the `pipeline` abstraction by default, although it also supports a component-only mode.

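```python
import transformers

import mlflow

# The Transformers task our pipeline will perform
task = "text2text-generation"

# Build a pipeline that bundles the tokenizer and model for this task
generation_pipeline = transformers.pipeline(
    task=task,
    model="declare-lab/flan-alpaca-large",
)

# Sample prompts used for signature inference and as a logged input example
input_example = ["prompt 1", "prompt 2", "prompt 3"]

# Default inference parameters recorded in the model signature
parameters = {"max_length": 512, "do_sample": True, "temperature": 0.4}
```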


## Introduction to Model Signatures in MLflow

Reliable predictions depend on models receiving and producing data of the expected types and structures. Model signatures address this by acting as a contract for a model's inputs, outputs, and parameters. MLflow lets you define signatures and store them with your models, which helps standardize and enforce the correct use of ML models.

### Understanding Model Signatures

A model signature in MLflow describes the schema for inputs, outputs, and parameters of an ML model. It is a blueprint that details the expected data types and shapes, facilitating a clear interface for model usage. Signatures are particularly useful as they are:

- Displayed in MLflow's UI for easy reference.
- Employed by MLflow's deployment tools to validate inputs during inference.
- Stored in a standardized JSON format alongside the model's metadata.

### The Role of Signatures in Code

In the next code cell, we use MLflow to infer the signature of our pipeline. We supply an input example and use `mlflow.transformers.generate_signature_output` to generate example model output from it. We also pass in the inference parameters. The resulting signature is used to validate future inputs and document the expected data formats.

### Types of Model Signatures

Model signatures can be:

- **Column-based**: Suitable for models that operate on tabular data, with each column having a specified data type and optional name.
- **Tensor-based**: Designed for models that take tensors as inputs and outputs, with each tensor having a specified data type, shape, and optional name.
- **With Params**: Some models require additional parameters for inference, which can also be included in the signature.

For the transformers flavor, all input types are of the Column-based type (referred to within MLflow as `ColSpec` types).

### Signature Enforcement

MLflow enforces the signature at the time of model inference, ensuring that the provided input and parameters match the expected schema. If there's a mismatch, MLflow will raise an exception or issue a warning, depending on the nature of the mismatch.
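To illustrate what enforcement means, here is a deliberately simplified, hypothetical validator. It models a signature like the one in this tutorial: a list of string inputs plus typed parameters. It is not MLflow's implementation. Consult the MLflow model signature documentation for which mismatches actually raise and which only warn.

```python
import warnings

# Hypothetical, simplified schema mirroring this tutorial's signature:
# inputs are a list of strings, and each param has an expected Python type.
PARAM_TYPES = {"max_length": int, "do_sample": bool, "temperature": float}


def validate_request(inputs, params=None):
    """Toy enforcement: raise on bad inputs or param types, warn on unknown params."""
    if not isinstance(inputs, list) or not all(isinstance(x, str) for x in inputs):
        raise TypeError("inputs must be a list of strings")

    accepted = {}
    for name, value in (params or {}).items():
        if name not in PARAM_TYPES:
            # Unknown params are dropped with a warning rather than failing the call.
            warnings.warn(f"ignoring unknown param {name!r}")
            continue
        expected = PARAM_TYPES[name]
        # Accept plain ints where a float is expected (e.g. temperature=1).
        ok = isinstance(value, expected) or (expected is float and type(value) is int)
        if not ok:
            raise TypeError(f"param {name!r} must be of type {expected.__name__}")
        accepted[name] = value
    return accepted


print(validate_request(["prompt 1", "prompt 2"], {"temperature": 0.7}))
```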

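```python
# Run the pipeline on the input example to produce sample output,
# then infer the input, output, and params schemas from all three
signature = mlflow.models.infer_signature(
    input_example,
    mlflow.transformers.generate_signature_output(generation_pipeline, input_example),
    parameters,
)
```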



## Setting Up the Tracking Server

To view our results in the MLflow UI, we need a running tracking server. For the purposes of this tutorial, we use a local tracking server at `http://127.0.0.1:8080`.

We can start it by running the following command from a terminal:

```bash
mlflow server --host 127.0.0.1 --port 8080
```

With the server running, the following code ensures that all experiments, runs, models, parameters, and metrics that we log are tracked by that server instance. The server also hosts the MLflow UI, which you can open by navigating to the same URL in a browser.
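Before pointing MLflow at the server, it can help to confirm that the server is actually up. The sketch below uses only the standard library. It assumes the server exposes MLflow's `/health` route, which responds with `OK`. The helper name `tracking_server_is_up` is hypothetical; adjust the path if your MLflow version differs.

```python
import urllib.request


def tracking_server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError/HTTPError (refused connection, non-2xx status) and socket
        # timeouts are all OSError subclasses, so one clause covers them.
        return False


print(tracking_server_is_up("http://127.0.0.1:8080"))
```

If this prints `False`, check that the `mlflow server` command above is still running in its terminal before continuing.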

```python
mlflow.set_tracking_uri("http://127.0.0.1:8080")
```

## Logging the Transformers Model with MLflow

Once the model signature has been established, the next step is to log the model within an MLflow run. Logging stores the model artifacts with their metadata, including the signature and input example, on the tracking server. From there the model can be tracked, compared, and later registered and deployed. The [mlflow.transformers.log_model](https://www.mlflow.org/docs/latest/python_api/mlflow.transformers.html#mlflow.transformers.log_model) function is designed specifically to handle pipelines and model components from the `transformers` library, so logging them with their associated metadata takes a single call.


```python
# Log the pipeline, its input example, and its signature to a new MLflow run
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model=generation_pipeline,
        artifact_path="text_generator",
        input_example=input_example,
        signature=signature,
    )
```




## Loading the Text Generation Model

In this section, we load our text generation model using MLflow's `pyfunc` module. `pyfunc` is MLflow's generic Python model interface. Any model logged with MLflow, regardless of its flavor, can be loaded through it and used via a common `predict()` method. This abstracts away the details of how the model was trained and serialized.

By calling [mlflow.pyfunc.load_model](https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.load_model), we load our previously logged model using its unique model URI. This URI points to the location where the model artifacts are stored. MLflow takes care of downloading and deserializing those artifacts so that the model can be used for prediction.
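With the MLflow 2.x APIs used in this tutorial, the URI returned by `log_model` typically has the form `runs:/<run_id>/<artifact_path>`. Here it would end in `text_generator`. The small helper below is hypothetical and uses only the standard library. It splits such a URI into its parts, which can be handy when you want the run ID to find the run in the UI. Newer MLflow releases may return a different URI scheme, so treat this format as an assumption to check against your own `model_info.model_uri`.

```python
def parse_runs_uri(model_uri: str) -> tuple[str, str]:
    """Split 'runs:/<run_id>/<artifact_path>' into (run_id, artifact_path)."""
    prefix = "runs:/"
    if not model_uri.startswith(prefix):
        raise ValueError(f"not a runs:/ URI: {model_uri!r}")
    # Everything before the first '/' is the run ID; the rest is the artifact path.
    run_id, _, artifact_path = model_uri[len(prefix):].partition("/")
    if not run_id or not artifact_path:
        raise ValueError(f"expected runs:/<run_id>/<artifact_path>, got {model_uri!r}")
    return run_id, artifact_path


# Made-up run ID, for illustration only.
print(parse_runs_uri("runs:/0123456789abcdef/text_generator"))
```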

Once loaded, `sentence_generator` is a `PyFuncModel` object whose `predict()` method generates text from the prompts passed to it. It encapsulates the pipeline's preprocessing (tokenization), generation, and postprocessing (decoding), so we get predictions without handling those steps manually. The same logged model can also be served or deployed elsewhere.

```python
sentence_generator = mlflow.pyfunc.load_model(model_info.model_uri)
```


```text
2023/11/08 19:41:05 INFO mlflow.store.artifact.artifact_repo: The progress bar can be disabled by setting the environment variable MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR to false
```



## Formatting Predictions for Tutorial Readability
Please note that the following function, `format_predictions`, is designed specifically for enhancing the readability of model predictions within this Jupyter Notebook environment. It **is not a standard component** of the model's inference pipeline and is **not necessary for the actual usage** of the model in production or other applications.

The function takes a list of prediction strings and formats each one by:

- Splitting the prediction text into individual sentences.
- Ensuring that each sentence is properly stripped of leading/trailing whitespace and ends with a period.
- Joining the sentences back together with newline characters for clear visual separation.

This formatting is particularly useful in a notebook setting where we want to present the output in a more readable and presentable manner for demonstration purposes.

```python
def format_predictions(predictions):
    """Format model output for readability in a Jupyter Notebook."""
    formatted_predictions = []

    for prediction in predictions:
        # Naive split on ". " (this also splits after abbreviations such as "e.g.")
        sentences = []
        for sentence in prediction.split(". "):
            sentence = sentence.strip()
            if not sentence:
                continue
            # Re-append the period removed by the split, unless one is already present
            sentences.append(sentence if sentence.endswith(".") else sentence + ".")

        # Join the sentences with a newline character
        formatted_predictions.append("\n".join(sentences))

    return formatted_predictions
```
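Splitting on `". "` is a rough heuristic. Because only periods are checked, a sentence that ends in `!` or `?` gets an extra period appended, which is visible as `A slob!.` in the output later in this notebook. A slightly more robust variant is sketched below; it is optional and not part of the tutorial's required steps. It splits after `.`, `!`, or `?` with a regular expression and adds a period only when no terminal punctuation is present. The name `format_predictions_v2` is hypothetical. Like the original, it still splits after abbreviations; handling those properly needs a real sentence tokenizer.

```python
import re

# Split after ., ! or ? when followed by whitespace (the lookbehind keeps the punctuation).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def format_predictions_v2(predictions):
    """Hypothetical variant of format_predictions with punctuation-aware splitting."""
    formatted_predictions = []
    for prediction in predictions:
        sentences = []
        for sentence in _SENTENCE_BOUNDARY.split(prediction.strip()):
            sentence = sentence.strip()
            if not sentence:
                continue
            # Only add a period when the sentence lacks terminal punctuation.
            if not sentence.endswith((".", "!", "?")):
                sentence += "."
            sentences.append(sentence)
        formatted_predictions.append("\n".join(sentences))
    return formatted_predictions


print(format_predictions_v2(["What do you call a hiker who never stops? A slob!"])[0])
```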

## Generating Predictions with Custom Parameters

In the code cell below, we invoke the `sentence_generator` model to produce predictions for a set of prompts. This demonstration includes:

- A request for help in choosing between hiking and kayaking for a weekend activity.
- A prompt asking for a joke related to hiking.

We call the `predict` method on the `sentence_generator` pyfunc model with a list of string prompts. To influence the generation process, we override the `temperature` parameter, which affects the randomness of the generated text. A lower temperature leads to more predictable and conservative outputs, while a higher temperature fosters more varied and creative responses.
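Temperature works by dividing the model's raw scores (logits) $z_i$ by $T$ before the softmax, so $p_i = e^{z_i/T} / \sum_j e^{z_j/T}$. A value of $T < 1$ sharpens the distribution toward the top token, while $T > 1$ flattens it. The sketch below applies this standard formula to three made-up logits; it is an illustration, not Transformers' internal code.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.4, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.2f}" for p in probs))
```

At the 0.4 default logged with the model, most of the probability goes to the top token. The 0.7 override used in this section spreads it somewhat more evenly, which yields more varied text.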

In the prediction call below, we explicitly set only the `temperature` parameter. Any parameter not passed in the `params` argument falls back to the default recorded in the model signature when the model was logged (here, `max_length=512` and `do_sample=True`).
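Conceptually, the per-call `params` are layered over the signature defaults, much like a dictionary merge in which later keys win. The snippet below is a hypothetical illustration of that precedence using this tutorial's values; it is not MLflow's internal code.

```python
# Defaults captured in the signature when the model was logged (the `parameters` dict).
signature_defaults = {"max_length": 512, "do_sample": True, "temperature": 0.4}

# Per-call override passed to predict().
call_params = {"temperature": 0.7}

# Later mappings win: overrides replace defaults, everything else is kept.
effective_params = {**signature_defaults, **call_params}
print(effective_params)  # {'max_length': 512, 'do_sample': True, 'temperature': 0.7}
```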

The `predictions` variable will hold the model's output for each input prompt. We can format these outputs for better readability in subsequent steps.

```python
predictions = sentence_generator.predict(
    data=[
        "I can't decide whether to go hiking or kayaking this weekend. Can you help me decide?",
        "Please tell me a joke about hiking.",
    ],
    # Override the logged default temperature (0.4) for this call only
    params={"temperature": 0.7},
)
```



## Formatting and Displaying Predictions

After generating predictions, we present them in a readable format within our Jupyter Notebook. The `format_predictions` function is applied to the list of prediction strings to enhance their readability. This function processes each prediction by:

- Splitting the output into individual sentences.
- Ensuring proper sentence termination with periods.
- Joining the sentences with newline characters for clear visual separation.

The formatted predictions are then printed out, with each response clearly associated with its corresponding prompt.

```python
# Format each prediction for notebook readability
formatted_predictions = format_predictions(predictions)

for i, formatted_text in enumerate(formatted_predictions):
    print(f"Response to prompt {i+1}:\n{formatted_text}\n")
```

```text
Response to prompt 1:
Hiking is a great way to get some exercise, but it can be tiring and exhausting.
Kayaking is a great way to get in some fresh air and experience the outdoors.
It can be more enjoyable and relaxing than hiking, but it requires more commitment and can be a bit more expensive.

Response to prompt 2:
What do you call a hiker who never stops? A slob!.
```



## Closing Remarks

This demonstration shows how a Transformers pipeline can be logged with MLflow, together with its signature and inference parameters, and then loaded back as a generic `pyfunc` model. The loaded model generates contextually relevant and creative text responses. Formatting the outputs makes the results easier to read and understand within this Jupyter Notebook environment.