## Introduction to Translation with Transformers and MLflow

In this tutorial, we explore language translation using [Transformers](https://huggingface.co/docs/transformers/index) and MLflow. This guide is written for practitioners with a grasp of machine learning concepts who want to streamline their translation model workflows. We will use MLflow to log, manage, and serve a translation model, `google/flan-t5-base`, from the [🤗 Hugging Face](https://huggingface.co/) library.

### What is `flan-t5-base`?

`flan-t5-base` is an instruction-tuned language model developed by Google that can translate between several languages, among other tasks. It belongs to the T5 (Text-to-Text Transfer Transformer) family of models, which are designed to handle a variety of text-based tasks within one framework. The model is accessible through the Transformers library, which provides an array of pre-trained models for different natural language processing tasks.

### Why MLflow with `flan-t5-base`?

Combining MLflow with `flan-t5-base` offers numerous benefits:

- **Experiment Tracking**: Monitor and benchmark different configurations and performance metrics of the translation model.
- **Model Management**: Create a centralized model hub for various iterations of translation models along with their settings.
- **Reproducibility**: Record all the details necessary to replicate a specific translation task.
- **Deployment**: Streamline the process of deploying translation models into production settings.

### Learning Objectives

Throughout this tutorial, you will:

- Construct a translation **pipeline** using `flan-t5-base` from the Transformers library.
- **Log** the translation model and its configurations using MLflow.
- Determine the input and output **signature** of the translation model automatically.
- **Retrieve** a logged translation model from MLflow for direct interaction.
- Emulate the deployment of the translation model using MLflow's **pyfunc** model flavor for language translation tasks.

By the conclusion of this tutorial, you'll gain a thorough insight into managing and deploying translation models with MLflow, thereby enhancing your machine learning operations for language processing.

Let's embark on this journey of language translation with MLflow and `flan-t5-base`!


### Setting Up the Translation Environment

Before diving into the translation tasks, we need to set up our environment. This involves importing the necessary libraries and initializing the translation model and tokenizer. We will be using the `google/flan-t5-base` model, which is a part of the T5 family known for its effectiveness in translation tasks.

The following steps will be covered in this setup:

1. **Importing Libraries**: We will import the `transformers` library, which provides us with the translation model and tokenizer, as well as `mlflow` for model tracking and management.
2. **Initializing the Model**: The `google/flan-t5-base` model will be loaded from the Hugging Face model repository. This model has been pre-trained and is ready for translation tasks.
3. **Setting Up the Tokenizer**: The corresponding tokenizer for our model will be initialized. The tokenizer is responsible for converting text input into a format that the model can understand.
4. **Creating the Pipeline**: We will create a pipeline for translation from English to French. This pipeline abstracts away the model and tokenizer interaction, allowing us to directly input text and receive the translation.

With these components in place, we will be able to perform translations seamlessly.

Let's proceed to initialize our translation environment.


In [1]:
import transformers

import mlflow

model_architecture = "google/flan-t5-base"

translation_pipeline = transformers.pipeline(
    task="translation_en_to_fr",
    model=transformers.T5ForConditionalGeneration.from_pretrained(
        model_architecture, max_length=1000
    ),
    tokenizer=transformers.T5TokenizerFast.from_pretrained(model_architecture, return_tensors="pt"),
)

### Testing the Translation Pipeline

Before we proceed to log our model with MLflow, it's crucial to verify that our translation pipeline is functioning correctly. Given the substantial size of these models, it's best practice to ensure that the base model from the library performs as expected. This step helps in avoiding the time-consuming process of saving and then troubleshooting a model that may have issues during inference when loaded back from MLflow.

Here's why this preliminary check is important:

- **Model Verification**: By running a test translation, we can confirm that the model correctly translates text from English to French.
- **Error Prevention**: Identifying any issues early on, before logging the model, can save us from future errors that might arise during model deployment or inference.
- **Resource Management**: Large models require significant resources to save and load. Testing the model beforehand ensures that we are using our resources efficiently.
- **Pipeline Validation**: This step also validates that the pipeline, which includes the model and tokenizer, is set up correctly and can process input data as intended.

By conducting a test translation, we can confidently proceed to log our model with MLflow, knowing that the underlying model is functioning properly. Let's execute a translation to verify our setup:


In [2]:
translation_pipeline(
    "translate English to French: I enjoyed my slow saunter along the Champs-Élysées."
)

[{'translation_text': "J'ai apprécié mon sajour lente sur les Champs-Élysées."}]

### Evaluating the Translation Results

Upon running our initial translation through the pipeline, we received the following output:

```text
[{'translation_text': "J'ai apprécié mon sajour lente sur les Champs-Élysées."}]
```

While this translation captures the essence of the original English sentence, it does exhibit a few areas where accuracy could be improved. Notably, there are minor grammatical errors and the choice of words could be refined. However, considering the complexity of language translation, especially with nuances and context, the model has done a commendable job on its first attempt.

The translation demonstrates the model's ability to grasp the core meaning of the text and convert it into a structurally similar sentence in French. This is a testament to the power of the underlying machine learning model and its training on diverse linguistic data.

To put it into perspective, a more polished translation might look like this:

```text
"J'ai apprécié ma lente promenade le long des Champs-Élysées."
```

This refined version replaces the misspelled 'sajour' (presumably 'séjour', meaning 'stay') with 'promenade', makes the possessive and the adjective agree with the feminine noun ('ma lente promenade'), and renders 'along' as 'le long des'. It's these subtle nuances that transform a good translation into a great one.

It's encouraging to see that with just a base model, we are already close to a natural and accurate translation. With further fine-tuning and possibly more context provided to the model, we can expect even more precise translations. This initial result is promising and indicates that our pipeline is well-configured and ready for more advanced use cases and optimizations.

In machine translation, perfection is an ongoing pursuit, and this result is a solid stepping stone toward that goal. If you choose to improve the results of such a model, MLflow can help you keep track of the iterative fine-tuning process.
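To put a rough number on how close the model's output is to the polished version, we can compare the two strings directly. The sketch below uses the standard library's `difflib` and a hypothetical helper named `similarity`. It measures only character-level overlap, so treat it as a crude sanity check rather than a real translation-quality metric.

```python
from difflib import SequenceMatcher


def similarity(candidate: str, reference: str) -> float:
    """Return a character-level similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, candidate, reference).ratio()


# The model output and the polished translation quoted in this section.
model_output = "J'ai apprécié mon sajour lente sur les Champs-Élysées."
reference = "J'ai apprécié ma lente promenade le long des Champs-Élysées."

score = similarity(model_output, reference)
print(f"similarity: {score:.2f}")
```

A score like this is most useful for comparing several candidate outputs against the same reference, for example across fine-tuning runs.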

### Setting Model Parameters and Inferring Signature

Before we proceed to log our model with MLflow, it's crucial to define the model parameters and infer the model signature. The model parameters dictate how our model behaves during inference. For instance, setting max_length to 1000 specifies the maximum length of the sequence to be generated. This ensures that our model can handle longer sentences without truncating them prematurely, which is essential for maintaining the context and meaning of the translation.

Inferring the model signature is also a pivotal step. The signature represents the schema of the model's inputs and outputs, allowing MLflow to understand the data types and structures that the model expects and produces. By providing a sample input and generating a corresponding output, we enable MLflow to capture this information, which aids in ensuring consistency and reliability when the model is deployed in different environments.

This process is a best practice that enhances model portability and reduces the risk of schema-related errors during deployment. It also provides clear documentation for developers and users of the model, making it easier to integrate the model into downstream applications.

By setting the model parameters and inferring the signature before logging the model, we establish a solid foundation for tracking, managing, and serving our translation model with confidence. 


In [3]:
model_params = {"max_length": 1000}

signature = mlflow.models.infer_signature(
    "This is a sample input sentence.",
    mlflow.transformers.generate_signature_output(translation_pipeline, "This is another sample."),
    params=model_params,
)

### Reviewing the Model Signature

After setting the parameters and inferring the signature for our translation model, it's important to review the signature to ensure that it accurately reflects the input and output data structures. The signature is a blueprint that MLflow uses to understand how to interact with the model. It specifies the types of data the model expects as input and predicts as output, as well as any additional parameters that control the model's behavior.

Here's what each part of the signature represents:

- **Inputs**: This section lists the expected input data type(s). In our case, the model expects a string input, which corresponds to the text we want to translate.

- **Outputs**: This outlines the data type(s) of the model's output. For our translation model, the output is also a string, representing the translated text.

- **Parameters**: These are the additional settings that can be configured for the model. The `max_length` parameter shown here is set to a long integer with a default value of 1000, indicating the maximum length of the generated translation.

By executing the `signature` command, we can visually confirm that the signature matches our expectations. This is a form of validation to ensure that when we deploy the model, it will receive and produce data in the format we have defined. It's a crucial step for model deployment and serves as a contract that guarantees the model's inputs and outputs remain consistent, avoiding potential runtime errors in production.

Additionally, it's worth noting that if we wish to override a parameter for the model at inference time, we must declare it with a default value at the time of signature generation. This ensures that our model's behavior is predictable and that any changes to its configuration are intentional and documented.


In [4]:
signature

inputs: 
  [string]
outputs: 
  [string]
params: 
  ['max_length': long (default: 1000)]
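Because `max_length` is declared in the signature with a default, it can be overridden per request once the model is loaded as a `pyfunc`. The following is a minimal sketch of that idea, assuming a loaded model object whose `predict` method accepts a `params` argument, as MLflow's `pyfunc` models do in recent versions. The helper name `translate` is hypothetical.

```python
from typing import Any, Optional


def translate(pyfunc_model: Any, text: str, max_length: Optional[int] = None):
    """Call predict(), overriding max_length only when a value is given."""
    # With params=None we expect the defaults recorded in the signature
    # (max_length=1000 in this tutorial) to apply.
    params = {"max_length": max_length} if max_length is not None else None
    return pyfunc_model.predict(text, params=params)


# Hypothetical usage, once the model has been logged later in this tutorial:
#   import mlflow
#   loaded = mlflow.pyfunc.load_model(model_info.model_uri)
#   translate(loaded, "I am hungry.", max_length=50)
```

Keeping the override optional means callers who do not care about generation length get the logged default without having to know its value.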

### Setting the Tracking Server and Creating an Experiment

To view the results in the MLflow UI, we need a tracking server. For this tutorial, we use a local tracking server at `http://127.0.0.1:8080`, which can be started by running the following from a terminal:

```bash
mlflow server --host 127.0.0.1 --port 8080
```

With the server started, the following code ensures that all experiments, runs, models, parameters, and metrics that we log are tracked within that server instance (which also serves the MLflow UI when you navigate to that address in a browser).

After setting the tracking URI, we create a new MLflow Experiment to store the run we're about to create.

In [5]:
mlflow.set_tracking_uri("http://127.0.0.1:8080")

mlflow.set_experiment("Translation")

<Experiment: artifact_location='mlflow-artifacts:/390523574656024103', creation_time=1699572826948, experiment_id='390523574656024103', last_update_time=1699572826948, lifecycle_stage='active', name='Translation', tags={}>
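If logging fails with a connection error, it can help to check that the tracking server is actually reachable before debugging anything else. The sketch below is one way to do that with the standard library. It assumes the server exposes a `/health` endpoint at the tracking URI, and the helper name `tracking_server_is_up` is hypothetical.

```python
import urllib.error
import urllib.request

TRACKING_URI = "http://127.0.0.1:8080"


def tracking_server_is_up(uri: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on <uri>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{uri}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


print("tracking server reachable:", tracking_server_is_up(TRACKING_URI))
```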

### Logging the Model with MLflow

Once we have verified the functionality of our translation model and confirmed the signature, the next step is to log the model with MLflow. This process involves starting an MLflow run and using the `mlflow.transformers.log_model` function to save our model. Here's what each argument in the function call is doing:

- `transformers_model`: This is the actual translation model pipeline that we have created and tested. It includes both the model and tokenizer, and it's ready to be logged for tracking and versioning.

- `artifact_path`: This is the directory within the MLflow run where the model artifacts will be stored. In this case, we're naming it "french_translator".

- `signature`: This is the model signature we previously generated and reviewed. It ensures that MLflow knows the expected input and output formats for the model.

- `model_params`: These are the parameters of the model that we want to log. In our case, we're logging the `max_length` parameter, which we've set to 1000.

By wrapping this function call within a `with mlflow.start_run()` block, we're creating a new MLflow run. This run acts as a container for all the information we log about this model, including the model itself, its parameters, and its signature. Once the model is logged, we receive a `model_info` object, which contains metadata about the logged model, including the location where it's stored. This information is crucial for later stages when we want to deploy the model or analyze its performance across different runs.


In [6]:
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model=translation_pipeline,
        artifact_path="french_translator",
        signature=signature,
        model_params=model_params,
    )

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


### Inspecting the Loaded Model Components

Once the model is loaded using MLflow, we can inspect the individual components to confirm their integrity and types. The output of the following cell shows which components are available:

- `task`: A string indicating the type of task the model is intended for.
- `device_map`: A string that represents the device mapping if the model has been configured to run on specific hardware.
- `model`: An instance of `T5ForConditionalGeneration`, which is the core of the translation model.
- `tokenizer`: The `T5TokenizerFast` object used to preprocess text for the model.
- `framework`: A string that indicates the deep learning framework used by the model.

This information is crucial for understanding how the model operates and ensures that each component is correctly loaded and identified. It also confirms that the model can be reconstructed for inference, further training, or analysis, maintaining the flexibility and robustness of the MLflow platform.

In [7]:
translation_components = mlflow.transformers.load_model(
    model_info.model_uri, return_type="components"
)

for key, value in translation_components.items():
    print(f"{key} -> {type(value).__name__}")

Downloading artifacts:   0%|          | 0/15 [00:00<?, ?it/s]

2023/11/09 22:17:16 INFO mlflow.store.artifact.artifact_repo: The progress bar can be disabled by setting the environment variable MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR to false
2023/11/09 22:17:28 INFO mlflow.transformers: 'runs:/50aa1efbd8354d788942b3fe587b6d01/french_translator' resolved as 'mlflow-artifacts:/390523574656024103/50aa1efbd8354d788942b3fe587b6d01/artifacts/french_translator'


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

task -> str
device_map -> str
model -> T5ForConditionalGeneration
tokenizer -> T5TokenizerFast
framework -> str
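The log output in this notebook names two environment variables that control the repeated messages: `TOKENIZERS_PARALLELISM` for the tokenizers fork warning and `MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR` for the download progress bars. A minimal sketch of setting both follows. We assume here that they are set early in the session, before the tokenizer is first used and before any artifacts are downloaded.

```python
import os

# Silence the "process just got forked" warning from huggingface/tokenizers.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Disable MLflow's artifact download progress bars.
os.environ["MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR"] = "false"
```

Both settings only affect console output and tokenizer threading. They do not change what is logged or loaded.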


### Understanding Model Flavors in MLflow

The `model_info.flavors` attribute provides a detailed description of the different "flavors" that MLflow uses to manage and deploy the model. Flavors are a way to abstract the model's capabilities and requirements, making it easier to deploy across different platforms. Here's what each key in the output dictionary represents:

- `python_function`: This flavor indicates that the model can be loaded as a generic Python function. It includes:
  - `model_binary`: The path to the binary model file.
  - `loader_module`: The module used by MLflow to load the model.
  - `python_version`: The version of Python with which the model is compatible.
  - `env`: The environment specifications, including both conda and virtualenv options.

- `transformers`: This flavor is specific to models from the Hugging Face Transformers library. It includes:
  - `transformers_version`: The version of the Transformers library used.
  - `code`: Any additional code dependencies required by the model.
  - `task`: The specific task the model is trained for, in this case, `translation_en_to_fr`.
  - `instance_type`: The class of the pipeline instance, here `TranslationPipeline`.
  - `source_model_name`: The name of the pre-trained model used, `google/flan-t5-base`.
  - `pipeline_model_type`: The type of model, `T5ForConditionalGeneration`.
  - `framework`: The deep learning framework, PyTorch in this case (`pt`).
  - `tokenizer_type`: The type of tokenizer used, `T5TokenizerFast`.
  - `components`: The list of components included in the model, which for this model is the tokenizer.
  - `model_binary`: The path to the binary model file.

This information is essential for understanding how to interact with the model within the MLflow ecosystem, ensuring that the correct environment and dependencies are used when deploying the model for inference or further training.


In [8]:
model_info.flavors

{'python_function': {'model_binary': 'model',
  'loader_module': 'mlflow.transformers',
  'python_version': '3.8.13',
  'env': {'conda': 'conda.yaml', 'virtualenv': 'python_env.yaml'}},
 'transformers': {'transformers_version': '4.34.1',
  'code': None,
  'task': 'translation_en_to_fr',
  'instance_type': 'TranslationPipeline',
  'source_model_name': 'google/flan-t5-base',
  'pipeline_model_type': 'T5ForConditionalGeneration',
  'framework': 'pt',
  'tokenizer_type': 'T5TokenizerFast',
  'components': ['tokenizer'],
  'model_binary': 'model'}}
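One practical use of the flavor metadata is to check whether the environment you are loading the model into matches the one it was logged from. The sketch below compares the logged Python and Transformers versions against the running environment using only the standard library. The helper name `environment_mismatches` is hypothetical, and the dictionary is a trimmed copy of the output above. In the notebook you could pass `model_info.flavors` instead.

```python
import platform
from importlib import metadata

# Subset of the `model_info.flavors` output shown above.
logged_flavors = {
    "python_function": {"python_version": "3.8.13"},
    "transformers": {"transformers_version": "4.34.1"},
}


def environment_mismatches(flavors: dict) -> dict:
    """Map each component to (logged, installed) where the versions differ."""
    mismatches = {}

    logged_python = flavors["python_function"]["python_version"]
    if logged_python != platform.python_version():
        mismatches["python"] = (logged_python, platform.python_version())

    logged_transformers = flavors["transformers"]["transformers_version"]
    try:
        installed_transformers = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        installed_transformers = None  # transformers is not installed here
    if logged_transformers != installed_transformers:
        mismatches["transformers"] = (logged_transformers, installed_transformers)

    return mismatches


print(environment_mismatches(logged_flavors))
```

An empty dictionary means both versions match. A mismatch is not necessarily a problem, but it is a useful first thing to check if a loaded model behaves differently from the original.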

### Evaluating the Translation Output

We're now going to test the logged model with a somewhat challenging sentence. We load it with the default return type, which gives us a pipeline instance.

The result below is quite satisfactory: the model has correctly identified "Nice" as the proper noun referring to the city, rather than just an adjective. It has also handled the play on words by choosing the French word "bien" to convey the sentiment that Nice is a pleasant place to be at this time of the year.

Such nuances in translation from English to French demonstrate the model's capability to understand context and the subtleties of language. This is a positive indication of the model's utility for real-world applications where accurate and context-aware translations are necessary.

It's also a reminder of the importance of testing models with sentences that have multiple meanings or interpretations to ensure that the model can handle a variety of linguistic challenges.




In [9]:
translation_pipeline = mlflow.transformers.load_model(model_info.model_uri)
response = translation_pipeline("I have heard that Nice is nice this time of year.")

print(response)

Downloading artifacts:   0%|          | 0/15 [00:00<?, ?it/s]

2023/11/09 22:17:29 INFO mlflow.store.artifact.artifact_repo: The progress bar can be disabled by setting the environment variable MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR to false
2023/11/09 22:17:38 INFO mlflow.transformers: 'runs:/50aa1efbd8354d788942b3fe587b6d01/french_translator' resolved as 'mlflow-artifacts:/390523574656024103/50aa1efbd8354d788942b3fe587b6d01/artifacts/french_translator'


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

[{'translation_text': "J'ai entendu que Nice est bien cette période de l'année."}]


### Assessing the Reconstructed Pipeline's Translation

In this section, we take the components that we loaded (a dictionary containing the elements required to perform inference) and reconstruct a new pipeline from them.

As the output of the following cell shows, the reconstructed pipeline translates the English input into French and conveys the core idea that the Transformers library simplifies the use of deep learning models. The translation is not perfect: it drops "and fun", and "profonde" should agree with the masculine noun "apprentissage" ("apprentissage profond"). It is nonetheless coherent enough to show that the pipeline works end to end.

This test confirms that our pipeline components have been correctly logged and retrieved from MLflow, and that the reconstructed pipeline is functioning as expected. It's a crucial step to ensure that the model and components we've saved can be used effectively after deployment, maintaining the integrity of the model's performance.

In [10]:
reconstructed_pipeline = transformers.pipeline(**translation_components)

reconstructed_response = reconstructed_pipeline(
    "transformers makes using Deep Learning models easy and fun!"
)

print(reconstructed_response)

[{'translation_text': "transformers simplifie l'utilisation des modèles de l'apprentissage profonde!"}]


### Direct Utilization of Model Components

In addition to using the full pipeline, we have the flexibility to interact with individual components of the model. This can be particularly useful for customizing the translation process or integrating the model into a larger system where you might need to manipulate the inputs and outputs more directly.

By examining the keys of the `translation_components` dictionary, we gain insight into the structure of our model and the available components. This includes the task specification, device mapping, the core model itself, the tokenizer responsible for preparing our inputs, and the framework information. Each component plays a vital role in the translation process and can be utilized independently for a more granular level of control.

In [11]:
translation_components.keys()

dict_keys(['task', 'device_map', 'model', 'tokenizer', 'framework'])

### Advanced Usage: Direct Interaction with Model Components

While the pipeline approach offers a convenient and high-level interface for translations, direct interaction with the model's components can be beneficial in certain scenarios. This method requires a deeper understanding of the model and tokenizer but provides the opportunity to insert custom logic at various points in the process, offering greater flexibility.

In the following code block, we manually handle the translation process by directly using the tokenizer and model components. This allows us to:

- Customize the tokenization process.
- Modify the tensor handling, such as specifying the device (CPU, GPU, MPS, etc.).
- Generate predictions with the possibility of adjusting parameters on-the-fly.
- Decode the outputs with the option to post-process the results.

This granular control can be crucial for advanced use cases where you need to intervene in the model's operations, such as adjusting the inputs based on dynamic conditions or post-processing the model's outputs before presenting them to the end-user.

However, this flexibility comes at the cost of increased complexity. Unlike the pipeline, which abstracts away many of the underlying operations, using the components directly requires managing more code and understanding the intricacies of the model's behavior. It's a trade-off between ease of use and control that needs to be considered based on the specific requirements of your application.


In [12]:
tokenizer = translation_components["tokenizer"]
model = translation_components["model"]

query = "Translate to French: Liberty, equality, fraternity, or death."

# This notebook was run on a Mac laptop, where the model lives on the "mps" device.
# Sending the input tensor to model.device keeps the code portable: the tensor ends up on
# whichever device the model was loaded on ("cpu", "cuda", "mps", ...), so the model can read it.
inputs = tokenizer.encode(query, return_tensors="pt").to(model.device)
outputs = model.generate(inputs)
result = tokenizer.decode(outputs[0])

# Since we're not using a pipeline here, we need to modify the output slightly to get only the translated text.
print(result.replace("<pad> ", "\n").replace("</s>", ""))


La liberté, l'égalité, la fraternité ou la mort.
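The string replacements in the cell above remove the `<pad>` and `</s>` markers by hand. A slightly more general version of that cleanup is sketched below as a hypothetical helper, `strip_special_tokens`. The raw string is an assumption about what the decoded output looks like, inferred from the replacements used above. In practice, passing `skip_special_tokens=True` to `tokenizer.decode` is usually the simpler option.

```python
def strip_special_tokens(text: str, special_tokens=("<pad>", "</s>")) -> str:
    """Remove the given marker strings and trim surrounding whitespace."""
    for token in special_tokens:
        text = text.replace(token, "")
    return text.strip()


# Assumed shape of the raw decoded output, including T5's special tokens.
raw = "<pad> La liberté, l'égalité, la fraternité ou la mort.</s>"
print(strip_special_tokens(raw))
# prints: La liberté, l'égalité, la fraternité ou la mort.
```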


### Reflection on the Translation Output

Upon examining the final translation output, we observe that it is very close to the iconic French motto: "Liberté, égalité, fraternité, ou la mort." While the translation is not exact, it captures the essence of the phrase, demonstrating the model's capability to convey the meaning of complex and historically significant sentences. This slight deviation underscores the importance of context and cultural knowledge in language models.

The phrase "Liberté, égalité, fraternité" is more than just a collection of words; it is an emblematic slogan of the French Republic, a symbol of its values and history. This highlights an area where even advanced models like the one we've used can benefit from further refinement and contextual awareness. For more on this phrase and its significance, you can explore its [Wikipedia page](https://en.wikipedia.org/wiki/Libert%C3%A9,_%C3%A9galit%C3%A9,_fraternit%C3%A9).

### Tutorial Recap

Throughout this tutorial, we've delved into the integration of MLflow with a state-of-the-art language model for translation tasks. We've covered a lot of ground, including:

- Setting up and testing a translation pipeline.
- Logging the model and its parameters to MLflow.
- Inferring and understanding the model's signature.
- Loading and interacting with the model components for greater flexibility.
- Reflecting on the nuances of language translation and the importance of context.

#### The Power of MLflow and Model Metadata

The use of MLflow in this tutorial has demonstrated how it can streamline the process of managing and deploying machine learning models. By logging models with their parameters and metadata, MLflow ensures that we have a robust system for tracking experiments, managing model versions, and simplifying deployment.

The metadata stored within MLflow, such as the model's signature and components, plays a crucial role in ensuring that models are deployed consistently and reliably. It provides valuable information about the expected inputs and outputs, making it easier to integrate the model into production systems.

#### Conclusion

As we conclude this tutorial, it's clear that the combination of powerful language models and robust MLOps tools like MLflow can significantly enhance our ability to deploy sophisticated AI solutions. Whether you're working on translation, speech recognition, or any other machine learning task, the principles we've explored here will help you to manage and deploy your models with confidence and precision.

Thank you for joining me on this journey through automatic language translation and model management. I hope you've found it informative and empowering as you continue to work with these incredible tools in your own projects.