## Introduction to Sentence Transformers and MLflow

Welcome to our tutorial on harnessing the capabilities of **Sentence Transformers** with **MLflow**. This tutorial targets those beginning their journey in advanced natural language processing and model management. We will guide you through a hands-on example showcasing the integration of the `sentence-transformers` library with MLflow, a tool that significantly streamlines the machine learning lifecycle.

### What are Sentence Transformers?

**Sentence Transformers** are transformer models adapted to produce semantically meaningful sentence embeddings. Built on top of the 🤗 Hugging Face Transformers library, the `sentence-transformers` library supports NLP tasks such as semantic search, text clustering, and similarity comparison. It provides models based on architectures such as BERT, RoBERTa, and DistilBERT, fine-tuned to produce high-quality sentence-level embeddings.

### Benefits of Integrating MLflow with Sentence Transformers

Merging MLflow with Sentence Transformers offers a suite of advantages for NLP projects:

- **Efficient Experiment Management**: Streamline the process of tracking experiments, including logging model parameters, metrics, and embeddings.
- **Enhanced Model Lifecycle Control**: Gain better control over your NLP models' versions, configurations, and performance.
- **Reproducibility**: Facilitate the replication of results and model predictions with comprehensive record-keeping.
- **Simplified Model Deployment**: Ease the deployment of your NLP models into production environments, supported by MLflow's robust tools.

### Learning Objectives

In this tutorial, you'll learn how to:

- Set up a pipeline for generating sentence embeddings using the `sentence-transformers` library.
- Log models and their configurations using MLflow.
- Understand the concept of model signatures in MLflow and how they apply to `sentence-transformers`.
- Deploy and utilize these models for inference using MLflow's capabilities.

By the end of this tutorial, you'll have a deeper understanding of how MLflow can amplify your NLP projects, especially when working with sophisticated models like Sentence Transformers, empowering you to efficiently track, manage, and deploy your applications.

Let's embark on this journey of integrating Sentence Transformers with MLflow!

### Setting Up the Environment for Sentence Embedding

Our first step is to set up the working environment: importing the required libraries and initializing the Sentence Transformer model, which forms the core of our sentence embedding pipeline.

#### Key Steps for Initialization:

1. **Library Imports**:
   - Import the `SentenceTransformer` class from the `sentence_transformers` library. This class is crucial for creating our sentence embedding model.
   - Import `mlflow`. This gives us access to the run context and to the `mlflow.sentence_transformers` module, which we use to log and load the model.

2. **Model Initialization**:
   - Initialize the Sentence Transformer model using the `SentenceTransformer` class. We will be using the `"all-MiniLM-L6-v2"` model for this tutorial, a compact yet powerful model known for its efficiency and effectiveness in generating sentence embeddings.

   You can find additional models that are compatible with embedding tasks on the [Hugging Face Hub](https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=trending) by selecting **"Sentence Similarity"** in the task filter pane.

3. **Purpose of the Model**:
   - The `"all-MiniLM-L6-v2"` model is designed to convert sentences into semantically meaningful embeddings. These embeddings can be used in a variety of NLP tasks, such as semantic search, clustering, and similarity comparisons.
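To make "similarity comparison" concrete, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three vectors are hand-made toy values, not model output, and the function name `cosine_similarity` is our own choice for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
pie = [0.9, 0.1, 0.0]
tart = [0.8, 0.2, 0.1]
tax_form = [0.0, 0.1, 0.9]

print(cosine_similarity(pie, tart))      # close to 1: similar direction
print(cosine_similarity(pie, tax_form))  # close to 0: nearly orthogonal
```

Real embeddings from the model work the same way, only with many more dimensions.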

By setting up this environment, we lay the foundation for our exploration into the capabilities of Sentence Transformers in conjunction with MLflow. This setup not only simplifies our workflow but also enables us to delve into advanced NLP tasks with ease.

Let's proceed to initialize our environment and the Sentence Transformer model.

In [1]:
import mlflow
from sentence_transformers import SentenceTransformer


model = SentenceTransformer("all-MiniLM-L6-v2")




### Defining the Model Signature with MLflow

Now that our Sentence Transformer model is set up, the next important step is to define the model signature. A model signature in MLflow is crucial for specifying the input and output formats of the model, ensuring consistent and expected behavior during inference.

#### Steps for Signature Definition:

1. **Prepare Example Sentences**:
   - We start by defining a list of example sentences: `["A sentence to encode.", "Another sentence to encode."]`. These sentences will be used to demonstrate the model's input and output formats.

2. **Generate Model Signature**:
   - Utilize the `mlflow.models.infer_signature` function to automatically infer and define the model signature.
   - The `model_input` parameter is set to our example sentences.
   - For the `model_output` parameter, we use the `encode` method of our Sentence Transformer model to transform the example sentences into embeddings. The output of this method represents the model's expected output format.

#### Importance of the Model Signature:

- **Clarity in Data Formats**: The signature provides clear documentation of the type and format of data the model expects and produces, which is essential for anyone using the model.
- **Model Deployment and Usage**: Accurate model signatures are crucial when deploying models to production, as they ensure that the model receives inputs in the correct format and produces expected outputs.
- **Error Prevention**: A well-defined signature helps prevent errors during model inference by enforcing consistent data formats.

With the model signature defined, we gain a better understanding of how our Sentence Transformer model processes and transforms data, setting the stage for more advanced operations and deployment.

**NOTE**: The `List[str]` input type is equivalent at inference time (for purposes of validation and signature enforcement) to `str`. This MLflow flavor uses a `ColSpec[str]` definition for the input type, which can accept either `str` or `List[str]`.
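As a rough illustration of what "accepts either `str` or `List[str]`" means in practice, the sketch below coerces both shapes into a list of strings. The helper `normalize_sentences` is hypothetical and is not MLflow's internal implementation; it only mirrors the behavior described in the note.

```python
from typing import List, Union


def normalize_sentences(data: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: coerce a str or a list of str into a list of str."""
    if isinstance(data, str):
        return [data]
    if isinstance(data, list) and all(isinstance(item, str) for item in data):
        return data
    raise TypeError("expected a str or a list of str")


print(normalize_sentences("A sentence to encode."))
print(normalize_sentences(["A sentence to encode.", "Another sentence to encode."]))
```

Anything that is neither a string nor a list of strings is rejected, which is the kind of early failure a signature is meant to provide.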

Let's proceed to define the signature for our model.

In [2]:
example_sentences = ["A sentence to encode.", "Another sentence to encode."]

signature = mlflow.models.infer_signature(
    model_input=example_sentences,
    model_output=model.encode(example_sentences),
)

### Setting the tracking server and creating an experiment

To view the results of this tutorial in the MLflow UI, we log to a tracking server. For this tutorial, we use a local tracking server at `http://127.0.0.1:8080`.

To start the server locally, run the following from a terminal:

``` bash
mlflow server --host 127.0.0.1 --port 8080
```

With the server started, the following code ensures that all experiments, runs, models, parameters, and metrics that we log are tracked within that server instance (which also serves the MLflow UI when you navigate to that URL in a browser).

After setting the tracking URI, we create a new MLflow Experiment to hold the run we're about to create.
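Before running the next cell, it can help to confirm that something is actually listening at the tracking address. The sketch below uses only the Python standard library; `is_server_listening` is a hypothetical helper of our own, and it only checks that a TCP connection can be opened, not that the listener is an MLflow server.

```python
import socket
from urllib.parse import urlparse


def is_server_listening(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        return False
    # Fall back to the scheme's default port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


tracking_uri = "http://127.0.0.1:8080"
if is_server_listening(tracking_uri):
    print(f"Something is listening at {tracking_uri}")
else:
    print(f"Nothing is listening at {tracking_uri}; start `mlflow server` first")
```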

In [3]:
mlflow.set_tracking_uri("http://127.0.0.1:8080")

mlflow.set_experiment("Introduction to Sentence Transformers")

<Experiment: artifact_location='mlflow-artifacts:/711030387534394632', creation_time=1700232675282, experiment_id='711030387534394632', last_update_time=1700232675282, lifecycle_stage='active', name='Introduction to Sentence Transformers', tags={}>

### Logging the Sentence Transformer Model with MLflow

With our Sentence Transformer model initialized and its signature defined, the next step is to log the model in MLflow. This process saves the model along with its metadata, which includes the signature and an example of the input data. Logging the model in this way is essential for tracking, version control, and deployment.

#### Steps for Logging the Model:

1. **Start an MLflow Run**:
   - Use `mlflow.start_run()` to initiate a new run. This MLflow run acts as a container for all the operations related to model logging, ensuring that they are grouped together for easy tracking.

2. **Log the Model**:
   - Call `mlflow.sentence_transformers.log_model` to log the Sentence Transformer model.
   - Provide the `model` object itself, which is our Sentence Transformer model.
   - Specify an `artifact_path`, which is the directory within the MLflow run where the model will be stored.
   - Include the `signature` we defined earlier, which documents the model's input and output formats.
   - Add an `input_example` to give a concrete example of the data the model expects.

#### Importance of Model Logging:

- **Model Management**: Logging the model in MLflow aids in managing the model's lifecycle, from training to deployment.
- **Reproducibility and Tracking**: It allows for the tracking of model versions and ensures reproducibility of the model's performance.
- **Ease of Deployment**: Logged models in MLflow can be easily deployed for inference, making the transition from training to production smoother.

By logging our Sentence Transformer model in MLflow, we effectively create a record of the model's configuration and capabilities, paving the way for subsequent model analysis, sharing, and deployment.

Let's log our model in MLflow and move forward in our machine learning workflow.

In [4]:
with mlflow.start_run():
    logged_model = mlflow.sentence_transformers.log_model(
        model=model,
        artifact_path="sbert_model",
        signature=signature,
        input_example=example_sentences,
    )

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


### Loading the Model and Testing Inference

After logging our Sentence Transformer model with MLflow, we will now demonstrate how to load the model for inference and test it with new input sentences. This step is crucial to understand how our model can be deployed for real-time inference in an application service layer.

#### Loading the Model as a PyFunc:

1. **Why PyFunc**:
   - We load our logged model using `mlflow.pyfunc.load_model`. The `pyfunc` format is MLflow's way of abstracting models, enabling them to be used as regular Python functions. This is particularly useful for deployment scenarios where the model needs to seamlessly integrate into an existing Python-based service or application.
   - Loading as a `pyfunc` allows for flexibility and simplicity, especially when the downstream processing or application logic is implemented in Python.

2. **Model URI**: 
   - We use the `logged_model.model_uri` to load the model. This URI points to the location where the logged model's artifacts are stored in MLflow.
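In the setup used for this tutorial, the model URI follows the `runs:/` scheme, combining the run ID with the `artifact_path` passed to `log_model`. The sketch below assembles such a URI by hand. `build_runs_uri` is a hypothetical helper for illustration, the run ID is a placeholder, and other MLflow versions may return a different URI scheme, so prefer `logged_model.model_uri` in real code.

```python
def build_runs_uri(run_id: str, artifact_path: str) -> str:
    """Hypothetical helper: assemble a 'runs:/' URI from a run ID and artifact path."""
    if not run_id or not artifact_path:
        raise ValueError("run_id and artifact_path must be non-empty")
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


# Placeholder run ID for illustration only; a real ID comes from the MLflow run.
print(build_runs_uri("example_run_id", "sbert_model"))
# prints: runs:/example_run_id/sbert_model
```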

#### Conducting Inference Tests:

1. **Test Sentences**:
   - We define a set of test sentences: `["I enjoy pies of both apple and cherry.", "I prefer cookies."]`. These sentences are used to test the model's ability to generate embeddings.

2. **Performing Predictions**:
   - We call the `predict` method on the loaded `pyfunc` model with our test sentences. This method returns the embeddings for each input sentence.

3. **Printing Embedding Lengths**:
   - We print the length of the returned embedding structures to verify that embeddings have been generated for each input sentence. This step confirms the model's functionality in producing embeddings and helps visualize the output structure.
   - The length of each embedding array corresponds to the dimensionality of the vector representation of each sentence.

#### Importance of Inference Testing:

- **Model Validation**: This test confirms that our model, when loaded for inference, behaves as expected and successfully processes input data.
- **Deployment Readiness**: Demonstrating inference with `pyfunc` is a key step in validating the model's readiness for integration into a service layer for real-time applications.

Let's proceed to load the model and test its inference capabilities on our example sentences.

In [5]:
inference_test = ["I enjoy pies of both apple and cherry.", "I prefer cookies."]

loaded_model_pyfunc = mlflow.pyfunc.load_model(logged_model.model_uri)

embeddings1 = loaded_model_pyfunc.predict(inference_test)

print(f"The return structure length is: {len(embeddings1)}")

for i, embedding in enumerate(embeddings1):
    print(f"The size of embedding {i + 1} is: {len(embedding)}")



The return structure length is: 2
The size of embedding 1 is: 384
The size of embedding 2 is: 384


### Displaying Samples of Generated Embeddings

Having confirmed that our Sentence Transformer model successfully generates embeddings for our input sentences, we now move to inspect the actual content of these embeddings. This step is crucial for understanding the nature of the output produced by our model and for verifying the quality of the embeddings.

#### Inspecting the Embedding Samples:

1. **Purpose of Sampling**:
   - We examine a sample of the entries in each embedding to get a sense of what the model output looks like. This inspection is particularly important for understanding the kind of vector representations that the model generates for different sentences.

2. **Printing Embedding Samples**:
   - For each embedding, we print the first 10 entries of the vector. This subset provides a glimpse into the embedding without overwhelming us with the entire high-dimensional vector.
   - The slice expression `embedding[:10]` selects the first 10 elements of each embedding vector.

#### Why Sampling is Important:

- **Quality Check**: Sampling the embeddings allows us to perform a quick quality check, ensuring that the embeddings are not degenerate (e.g., all zeros) and have meaningful values.
- **Understanding Model Output**: Seeing a portion of the embedding vectors helps in gaining an intuitive understanding of the model's output, which can be important for debugging and further model development.
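The quality check described above can also be automated. The following is a minimal sketch, assuming an embedding can be iterated as a sequence of numbers; the function name `is_degenerate` and the tolerance are our own choices, not part of MLflow or `sentence-transformers`.

```python
import math


def is_degenerate(embedding, tol=1e-12):
    """Return True if the vector is empty, contains NaN/inf, or is all (near) zero."""
    values = [float(v) for v in embedding]
    if not values:
        return True
    if any(math.isnan(v) or math.isinf(v) for v in values):
        return True
    return all(abs(v) <= tol for v in values)


# Short illustrative vectors; real embeddings are much longer.
healthy = [0.04866192, -0.03687946, 0.02408808]
print(is_degenerate(healthy))          # False
print(is_degenerate([0.0, 0.0, 0.0]))  # True
```

A check like this catches only obvious failures. It says nothing about whether the embeddings are semantically useful.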

This exploration of the embeddings is a key step in familiarizing ourselves with the model's output, setting the stage for further analysis or integration of these embeddings into downstream tasks.

Let's inspect the samples from our generated embeddings.

In [6]:
for i, embedding in enumerate(embeddings1):
    print(f"The sample of the first 10 entries in embedding {i + 1} is: {embedding[:10]}")

The sample of the first 10 entries in embedding 1 is: [ 0.04866192 -0.03687946  0.02408808  0.03534171 -0.12739632  0.00999414
  0.07135344 -0.01433522  0.04296691 -0.00654414]
The sample of the first 10 entries in embedding 2 is: [-0.03879027 -0.02373698  0.01314073  0.03589077 -0.01641303 -0.0857707
  0.08282158 -0.03173266  0.04507608  0.02777079]


### Native Model Loading in MLflow for Extended Functionality

In addition to loading our model as a generic Python function (`pyfunc`), MLflow also supports native loading of Sentence Transformer models. This approach allows us to utilize the full range of functionalities, methods, and attributes inherent to the Sentence Transformer model, which can be crucial for certain NLP tasks that extend beyond simple embedding generation.

#### Why Support Native Loading?

1. **Access to Native Functionalities**:
   - By loading the model natively, we can access all the native functionalities of the Sentence Transformer model. This is particularly important for tasks that require specific methods or attributes not exposed through the generic `pyfunc` interface.
   - Native loading is essential for scenarios where advanced model functionalities are needed, such as fine-tuning, further training, or using specific model methods for complex NLP tasks.

2. **Loading the Model Natively**:
   - We use `mlflow.sentence_transformers.load_model` to load the model natively. This method is specifically designed for models logged using the Sentence Transformers flavor in MLflow.
   - The model is loaded using its unique model URI, ensuring that we retrieve the correct version of the model that we previously logged.

#### Generating Embeddings Using Native Model:

1. **Model Encoding**:
   - After loading the model natively, we use the `encode` method of the Sentence Transformer model to generate embeddings for our test sentences.
   - This `encode` method is part of the native Sentence Transformer model and provides optimized functionality for converting sentences into embeddings.

2. **Importance of Native Encoding**:
   - Using the native `encode` method ensures that we are leveraging the model's full capabilities in terms of embedding generation.
   - It allows for a more flexible and potentially more efficient embedding process, especially for complex or large-scale NLP applications.

By understanding the benefits of native model loading in MLflow, we can better utilize the full range of features offered by Sentence Transformers, tailoring our NLP projects to specific requirements and extending beyond basic embedding tasks.

Let's proceed to load our model natively and generate embeddings using the model's `encode` method.

In [7]:
loaded_model_native = mlflow.sentence_transformers.load_model(logged_model.model_uri)

embeddings2 = loaded_model_native.encode(inference_test)

for i, embedding in enumerate(embeddings2):
    print(f"The sample of the native library encoding call for embedding {i + 1} is: {embedding[:10]}")

2023/11/20 12:09:48 INFO mlflow.sentence_transformers: 'runs:/471300dac52f489cbc59a3c4013a995e/sbert_model' resolved as 'mlflow-artifacts:/711030387534394632/471300dac52f489cbc59a3c4013a995e/artifacts/sbert_model'


The sample of the native library encoding call for embedding 1 is: [ 0.04866192 -0.03687946  0.02408808  0.03534171 -0.12739632  0.00999414
  0.07135344 -0.01433522  0.04296691 -0.00654414]
The sample of the native library encoding call for embedding 2 is: [-0.03879027 -0.02373698  0.01314073  0.03589077 -0.01641303 -0.0857707
  0.08282158 -0.03173266  0.04507608  0.02777079]
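The native output above matches the `pyfunc` output printed earlier. Rather than comparing by eye, a tolerance-based check can confirm this. The sketch below hard-codes the ten printed values for embedding 1 from both loading paths; `vectors_match` is a hypothetical helper, and the tolerance is an assumption.

```python
import math

# First 10 entries of embedding 1, copied from the pyfunc and native outputs.
pyfunc_sample = [0.04866192, -0.03687946, 0.02408808, 0.03534171, -0.12739632,
                 0.00999414, 0.07135344, -0.01433522, 0.04296691, -0.00654414]
native_sample = [0.04866192, -0.03687946, 0.02408808, 0.03534171, -0.12739632,
                 0.00999414, 0.07135344, -0.01433522, 0.04296691, -0.00654414]


def vectors_match(a, b, abs_tol=1e-6):
    """Return True if both vectors have equal length and element-wise close values."""
    return len(a) == len(b) and all(
        math.isclose(x, y, abs_tol=abs_tol) for x, y in zip(a, b)
    )


print(vectors_match(pyfunc_sample, native_sample))  # True
```

With the full arrays in memory, `numpy.allclose(embeddings1, embeddings2)` performs the same kind of comparison in one call.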


## Conclusion: Embracing the Power of Sentence Transformers with MLflow

As we reach the end of our Introduction to Sentence Transformers tutorial, we have successfully navigated the basics of integrating the Sentence Transformers library with MLflow. This foundational knowledge sets the stage for more advanced and specialized applications in the field of Natural Language Processing (NLP).

### Recap of Key Learnings

1. **Integration Basics**: We covered the essential steps of loading and logging a Sentence Transformer model using MLflow. This process demonstrated the simplicity and effectiveness of integrating cutting-edge NLP tools within MLflow's ecosystem.

2. **Signature and Inference**: Through the creation of a model signature and the execution of inference tasks, we showcased how to operationalize the Sentence Transformer model, ensuring that it's ready for real-world applications.

3. **Model Loading and Prediction**: We explored two ways of loading the model - as a PyFunc model and using the native Sentence Transformers loading mechanism. This dual approach highlighted the versatility of MLflow in accommodating different model interaction methods.

4. **Embeddings Exploration**: By generating and examining sentence embeddings, we glimpsed the transformative potential of transformer models in capturing semantic information from text.

### Looking Ahead

- **Expanding Horizons**: While this tutorial focused on the foundational aspects of Sentence Transformers and MLflow, there's a whole world of advanced applications waiting to be explored. From semantic similarity analysis to paraphrase mining, the potential use cases are vast and varied.

- **Continued Learning**: We strongly encourage you to delve into the other tutorials in this series, which dive deeper into more intriguing use cases like similarity analysis, semantic search, and paraphrase mining. These tutorials will provide you with a broader understanding and more practical applications of Sentence Transformers in various NLP tasks.

### Final Thoughts

The journey into NLP with Sentence Transformers and MLflow is just beginning. With the skills and insights gained from this tutorial, you are well-equipped to explore more complex and exciting applications. The integration of advanced NLP models with MLflow's robust management and deployment capabilities opens up new avenues for innovation and exploration in the field of language understanding and beyond.

Thank you for joining us on this introductory journey, and we look forward to seeing how you apply these tools and concepts in your NLP endeavors!