## Introduction to Conversational AI with MLflow and DialoGPT

In this tutorial, we integrate [Microsoft's DialoGPT](https://huggingface.co/microsoft/DialoGPT-medium), a conversational model, with MLflow's transformers flavor. The guide is aimed at readers who are familiar with machine learning workflows and want to build and manage conversational AI models. We demonstrate how to use MLflow to log, manage, and deploy a chatbot model provided by the [🤗 Hugging Face](https://huggingface.co/) [Transformers](https://huggingface.co/transformers) library.

### What is DialoGPT?

DialoGPT is a conversational model developed by Microsoft, fine-tuned on a large dataset of dialogues to generate human-like responses. It's part of the GPT (Generative Pretrained Transformer) family of models, known for their effectiveness in natural language understanding and generation. DialoGPT is specifically optimized for generating conversational responses, making it an ideal choice for building chatbots.

### Why MLflow with DialoGPT?

Integrating MLflow with DialoGPT offers several benefits:

- **Experiment Tracking**: Keep track of the configurations and performance metrics of conversational models across different experiments.
- **Model Management**: Create a centralized repository for different versions of the chatbot models along with their configurations.
- **Reproducibility**: Record all necessary components to reproduce the conversational AI model's behavior.
- **Deployment**: Streamline the process of deploying conversational models into production environments.

### Learning Objectives

In this tutorial, you will:

- Set up a conversational AI **pipeline** using DialoGPT from the Transformers library.
- **Log** the DialoGPT model along with its configurations using MLflow.
- Infer the input and output **signature** of the DialoGPT model.
- **Load** a stored DialoGPT model from MLflow for interactive usage.
- Interact with the chatbot model and understand the nuances of conversational AI.

By the end of this tutorial, you will have a solid understanding of how to manage and deploy conversational AI models with MLflow, enhancing your capabilities in the field of natural language processing.

Let's embark on this journey to explore conversational AI with MLflow and DialoGPT!


### Setting Up the Conversational Pipeline

In this section, we begin by importing the necessary libraries: `transformers` and `mlflow`. The `transformers` library, developed by Hugging Face, provides us with a wide range of pre-trained models, including the DialoGPT model for conversational AI. MLflow, on the other hand, is a platform for managing the machine learning lifecycle, including experimentation, reproducibility, and deployment.

We then initialize our conversational pipeline using the `transformers.pipeline` function. This function simplifies the process of deploying models for different tasks. Here, we specify the model `"microsoft/DialoGPT-medium"`, which is a medium-sized variant of the DialoGPT model, well-suited for general-purpose conversational tasks.

After setting up the conversational pipeline, we use MLflow to infer the signature of our model. The signature defines the input and output schema of the model, which is crucial for understanding the data requirements and the format of the model's predictions. We do this by providing a sample input, `"Hi there, chatbot!"`, and generating a corresponding output using the `mlflow.transformers.generate_signature_output` function. 

In [1]:
import transformers

import mlflow

conversational_pipeline = transformers.pipeline(model="microsoft/DialoGPT-medium")

signature = mlflow.models.infer_signature(
    "Hi there, chatbot!",
    mlflow.transformers.generate_signature_output(conversational_pipeline, "Hi there, chatbot!"),
)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
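
To get a feel for what the inferred signature records, it can help to look at its serialized form. The sketch below assumes the layout MLflow uses when it writes a signature into the `MLmodel` file, where `inputs` and `outputs` are JSON-encoded strings. `parse_signature_dict` is a hypothetical helper (not part of MLflow), and the sample dictionary is hand-written for illustration rather than captured from this run.

```python
import json


def parse_signature_dict(sig_dict):
    """Decode the JSON-encoded schema strings of a serialized signature.

    Hypothetical helper: assumes `sig_dict` has the shape produced by
    `signature.to_dict()`, with JSON strings under "inputs" and "outputs".
    """
    parsed = {}
    for key in ("inputs", "outputs"):
        raw = sig_dict.get(key)
        # A signature may lack one of the schemas, so tolerate a missing entry.
        parsed[key] = json.loads(raw) if raw else None
    return parsed


# Hand-written stand-in for a string-in, string-out signature.
example = {
    "inputs": '[{"type": "string"}]',
    "outputs": '[{"type": "string"}]',
}
print(parse_signature_dict(example))
```

With the objects from the cell above, the helper would be called as `parse_signature_dict(signature.to_dict())`.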


## Setting the Tracking Server and Creating an Experiment

To view the results of this tutorial in the MLflow UI, we log to a tracking server. For the purposes of this tutorial, we've started a local tracking server at `http://127.0.0.1:8080`.

You can start a local instance of the MLflow server by running the following from a terminal:

```bash
mlflow server --host 127.0.0.1 --port 8080
```

With the server started, the following code ensures that all experiments, runs, models, parameters, and metrics that we log are tracked by that server instance (which also serves the MLflow UI when you open that URL in a browser).

After setting the tracking URI, we create a new MLflow Experiment to hold the run we're about to create.

In [2]:
mlflow.set_tracking_uri("http://127.0.0.1:8080")

mlflow.set_experiment("Conversational")

<Experiment: artifact_location='mlflow-artifacts:/664266092508187059', creation_time=1699630163555, experiment_id='664266092508187059', last_update_time=1699630163555, lifecycle_stage='active', name='Conversational', tags={}>
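
If the calls above fail with a connection error, a quick reachability check can narrow down the cause. The following is a minimal sketch using only the standard library. It assumes the tracking server answers on a `/health` endpoint, as recent MLflow servers do; `tracking_server_is_up` is a hypothetical helper name, not an MLflow function.

```python
import urllib.error
import urllib.request


def tracking_server_is_up(base_url, timeout=2.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an error status.
        return False


# Same address as the tracking URI used in this tutorial.
if not tracking_server_is_up("http://127.0.0.1:8080"):
    print("Tracking server not reachable; start it with `mlflow server`.")
```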

### Logging the Model with MLflow

Having set up our conversational pipeline, the next step involves leveraging MLflow for model logging. This process is crucial for versioning, tracking, and managing our model in a more organized manner.

We initiate this by starting an MLflow run using `with mlflow.start_run()`. This command begins a new MLflow run, under which all subsequent MLflow commands will log data. Each run is an isolated set of code executions and logs its own parameters, metrics, artifacts, etc.

Within this run, we use `mlflow.transformers.log_model` to log our conversational model. This function is specifically tailored for logging transformer models and takes several parameters:

- `transformers_model`: Here, we pass our previously created `conversational_pipeline`.
- `artifact_path`: This is the location within the MLflow run where our model will be stored. We name it `"chatbot"` to reflect its functionality.
- `task`: We specify `"conversational"` to indicate the nature of our model. The task is required to reconstruct a `pipeline` object when the model is saved as a collection of components and later loaded.
- `signature`: The model signature we inferred earlier is passed here. It ensures that MLflow knows the expected input and output format of our model.
- `input_example`: Providing an example input, such as `"A clever and witty question"`, can be helpful for understanding the model's usage and for later testing purposes.

In [3]:
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model=conversational_pipeline,
        artifact_path="chatbot",
        task="conversational",
        signature=signature,
        input_example="A clever and witty question",
    )

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
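
The object returned by `log_model` carries the details needed to find the model again. As a rough sketch, the hypothetical helper below gathers a few of them into a dictionary. It assumes the attributes `model_uri`, `run_id`, `artifact_path`, and `flavors` are present, and the stand-in object uses placeholder values rather than anything recorded in this run.

```python
from types import SimpleNamespace


def summarize_model_info(model_info):
    """Collect a few fields from the object returned by `log_model`.

    Hypothetical helper: `getattr` keeps it tolerant if an attribute
    is absent in a given MLflow version.
    """
    flavors = getattr(model_info, "flavors", None) or {}
    return {
        "model_uri": getattr(model_info, "model_uri", None),
        "run_id": getattr(model_info, "run_id", None),
        "artifact_path": getattr(model_info, "artifact_path", None),
        "flavors": sorted(flavors),
    }


# Placeholder stand-in so the sketch runs without a tracking server.
stand_in = SimpleNamespace(
    model_uri="runs:/<run_id>/chatbot",
    run_id="<run_id>",
    artifact_path="chatbot",
    flavors={"transformers": {}, "python_function": {}},
)
print(summarize_model_info(stand_in))
```

In the notebook, the call would be `summarize_model_info(model_info)` after the run above has finished.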


### Loading and Interacting with the Chatbot Model

After successfully logging our conversational model with MLflow, the next step is to load and interact with it. This is where we see the practical application of our model in a conversational context.

We begin by loading the model using `mlflow.pyfunc.load_model`. This function is a part of MLflow's Python function (`pyfunc`) flavor, which provides a generic interface for Python models. We pass `model_uri=model_info.model_uri` to specify the location of our logged model. The `model_info.model_uri` contains the URI where our model is stored in MLflow, ensuring that we are loading the exact version of the model we just logged.

Once the model is loaded into the variable `chatbot`, we can start interacting with it as if it were a regular Python function. To demonstrate this, we use the `predict` method of our `chatbot` object:

- We ask our chatbot, "What is the best way to get to Antarctica?" This question is passed as an argument to the `predict` method.
- The chatbot's response is then captured and printed. In this case, the response is: "I think you can get there by boat."

This interaction showcases the ease with which we can deploy and utilize MLflow-logged models in practical scenarios. The ability to load and use models seamlessly like this is a powerful feature of MLflow, making it an invaluable tool in the machine learning workflow, especially in scenarios requiring quick model deployment and testing.


In [4]:
chatbot = mlflow.pyfunc.load_model(model_uri=model_info.model_uri)

first = chatbot.predict("What is the best way to get to Antarctica?")

Downloading artifacts:   0%|          | 0/18 [00:00<?, ?it/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2023/11/10 17:00:44 INFO mlflow.store.artifact.artifact_repo: The progress bar can be disabled by setting the environment variable MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR to false


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
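
Two of the messages above name environment variables that control them. A small sketch of setting both is shown below; the variable names are taken directly from the log output, and the assumption here is that the assignments run at the top of the notebook, before `transformers` and `mlflow` are imported. The generation warnings about the attention mask and padding are unrelated to these settings.

```python
import os

# Named in the huggingface/tokenizers fork warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Named in the MLflow artifact download log message.
os.environ["MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR"] = "false"
```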


In [5]:
print(f"Response: {first}")

Response: I think you can get there by boat.


### Continuing the Conversation with the Chatbot

In this part of our tutorial, we continue our interaction with the chatbot, further exploring the capabilities of the MLflow `pyfunc` implementation for the `transformers` package. One of the key features of this `pyfunc` implementation for conversational pipelines is its ability to maintain conversational context, or state. The model remembers the flow of the conversation, which allows for more coherent and contextually relevant responses. This feature is enabled only for this pipeline type. For any other `task` type, if state is required, you will have to manage it yourself, for example by keeping the prompt and response history and passing it back to the model with each request.

We proceed with a follow-up question to our chatbot: "What sort of boat should I use?" This question is designed to test the chatbot's ability to maintain the context of the conversation, which started with a question about traveling to Antarctica.

The response we receive is: "A boat that can go to Antarctica." This response, while somewhat obvious, demonstrates the model's ability to keep track of the conversation's topic and serves to illustrate the state management present in the MLflow implementation for this pipeline type. 

It's important to note the witty and somewhat facetious nature of the response. This characteristic is a direct result of the training data used for the DialoGPT model, which includes (heavily sanitized) conversational exchanges from Reddit. Reddit, known for its diverse and often humorous content, influences the style and tone of the responses generated by the model.

This interaction highlights the importance of understanding the source and nature of the training data when working with machine learning models. The training data not only determines the model's knowledge base but also its conversational style and tone. In practical applications, this understanding is crucial for setting appropriate expectations and for tailoring the model to suit specific conversational needs or scenarios.


In [6]:
second = chatbot.predict("What sort of boat should I use?")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


In [7]:
print(f"Response: {second}")

Response: A boat that can go to Antarctica.
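
For pipeline types that do not keep state, the conversation history has to be carried by the caller. The class below is a minimal sketch of one way to do that, not MLflow's own implementation. `StatefulChat`, `predict_fn`, and `max_turns` are hypothetical names, and the default separator assumes the GPT-2 end-of-text token that DialoGPT uses to delimit turns. A stand-in function replaces the real model so the example runs on its own.

```python
class StatefulChat:
    """Keep a running transcript and replay it on every request.

    Hypothetical sketch: `predict_fn` is any callable that maps a prompt
    string to a response string.
    """

    def __init__(self, predict_fn, separator="<|endoftext|>", max_turns=6):
        self.predict_fn = predict_fn
        self.separator = separator
        self.max_turns = max_turns  # cap on remembered turns
        self.history = []

    def ask(self, user_text):
        # Replay the most recent turns plus the new message as one prompt.
        recent = self.history[-self.max_turns:] if self.max_turns > 0 else []
        prompt = self.separator.join(recent + [user_text]) + self.separator
        reply = self.predict_fn(prompt)
        # Store both sides so the next prompt carries the exchange.
        self.history.extend([user_text, reply])
        return reply


# Stand-in model that reports how many turns it was shown.
def fake_model(prompt):
    return f"saw {prompt.count('<|endoftext|>')} turn(s)"


chat = StatefulChat(fake_model)
print(chat.ask("What is the best way to get to Antarctica?"))
print(chat.ask("What sort of boat should I use?"))
```

Capping the history with `max_turns` is a design choice in this sketch: it keeps the prompt from growing without bound, at the cost of forgetting older turns.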


## Conclusion and Key Takeaways

In this tutorial, we've explored the integration of MLflow with a conversational AI model, specifically using the DialoGPT model from Microsoft. We've covered several important aspects and techniques that are crucial for anyone looking to work with advanced machine learning models in a practical, real-world setting.

### Key Takeaways

1. **MLflow for Model Management**: We demonstrated how MLflow can be effectively used for managing and deploying machine learning models. The ability to log models, track experiments, and manage different versions of models is invaluable in a machine learning workflow.

2. **Conversational AI**: By using the DialoGPT model, we delved into the world of conversational AI, showcasing how to set up and interact with a conversational model. This included understanding the nuances of maintaining conversational context and the impact of training data on the model's responses.

3. **Practical Implementation**: Through practical examples, we showed how to log a model in MLflow, infer a model signature, and use the `pyfunc` model flavor for easy deployment and interaction. This hands-on approach is designed to provide you with the skills needed to implement these techniques in your own projects.

4. **Understanding Model Responses**: We emphasized the importance of understanding the nature of the model's training data. This understanding is crucial for interpreting the model's responses and for tailoring the model to specific use cases.

### Wrapping Up

As we conclude this tutorial, we hope that you have gained a deeper understanding of how to integrate MLflow with conversational AI models and the practical considerations involved in deploying these models. The skills and knowledge acquired here are not only applicable to conversational AI but also to a broader range of machine learning applications.

Remember, the field of machine learning is vast and constantly evolving. Continuous learning and experimentation are key to staying updated and making the most out of these exciting technologies.

Thank you for joining us in this journey through the world of MLflow and conversational AI. We encourage you to take these learnings and apply them to your own unique challenges and projects. Happy coding!
