# How to Deploy a Custom Hugging Face Model to Azure ML

The following is a step-by-step guide to deploying a custom Hugging Face model to Azure ML. To deploy the model from your local environment to Azure ML, first install the libraries listed in ```requirements.txt```.

Then, we can run the cells in the notebook to prepare the model and deploy it to Azure ML.

## 0. Download the model from Hugging Face Hub and save it locally

In this example, we are going to deploy an embedding model. To keep the example as simple as possible, we will use sentence-transformers for inference. Here, we download the model from Hugging Face Hub and save it locally so that we can upload and register it in Azure ML.

If we already have the model locally, we can skip this step.

In [None]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
model.save("path/to/model/bge-m3")

## 1. Connect to Azure ML Workspace

The easiest way to connect to our Azure ML workspace without hard-coding credentials is to pass the path to the ```config.json``` file to the ```Workspace.from_config()``` method.

In [None]:
from azureml.core import Workspace

# Connect to your Azure ML workspace
ws = Workspace.from_config("config.json")
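
For reference, the ```config.json``` file read by ```Workspace.from_config()``` holds three workspace identifiers: the subscription ID, the resource group, and the workspace name. The sketch below writes such a file. The helper name ```write_workspace_config``` is hypothetical and not part of the Azure ML SDK; in practice the file can also be downloaded from the workspace overview page in the Azure portal.

```python
import json
from pathlib import Path


def write_workspace_config(path, subscription_id, resource_group, workspace_name):
    """Write a workspace config file with the three keys Workspace.from_config() reads.

    Hypothetical helper for illustration; it is not part of the Azure ML SDK.
    """
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(path)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return path
```

Calling it with your own identifiers and the target path ```config.json``` produces a file that the cell above can load. Keep the file out of version control if you consider the identifiers sensitive.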

## 2. Register the model

Registering the model means that the model will be uploaded to the Azure ML Model Registry. Depending on the size of the model, this can take a while.

In [None]:
from azureml.core import Model

# Register the model
model = Model.register(workspace=ws,
                       model_name='bge-m3',  # Give a unique name
                       model_path="path/to/model/bge-m3",  # Path to the model directory
                       description="Embedding model from Hugging Face Hub")


## 3. Define the environment

Define the environment by giving it a distinct name and listing all packages needed to run inference with the model. In this case, we essentially need only sentence-transformers, plus ```azureml-defaults```, which Azure ML requires to host the model as a web service.

In [None]:
# Define the environment
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment('huggingface-embeddings')
deps = CondaDependencies.create(conda_packages=[],
                                pip_packages=['azureml-defaults', 'sentence-transformers==2.7.0'])
env.python.conda_dependencies = deps

## 4. Create an inference script

To get inferences from the model, we need to create an inference script. For Azure ML, the script needs to have an ```init()``` and a ```run()``` function. The ```init()``` function runs only once, when the container starts, and is used to load the model into memory. The ```run()``` function is called every time a request is made to the endpoint; it needs to unpack the request data and pass it to the model for inference.

Refer to the ```score.py``` file in this repository for an example.

In [None]:
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(entry_script="score.py", environment=env)
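
For readers without the repository at hand, the following is a minimal sketch of what such a scoring script could look like. It is not the repository's ```score.py```: the ```bge-m3``` subfolder under ```AZUREML_MODEL_DIR```, the accepted payload shapes (a plain string or a JSON list of strings), and the ```parse_texts``` helper are all assumptions made for illustration.

```python
import json
import os

model = None  # populated once by init()


def init():
    """Runs once when the container starts: load the model into memory."""
    global model
    # Imported here so the heavy dependency is only needed inside the container.
    from sentence_transformers import SentenceTransformer

    # AZUREML_MODEL_DIR is set by Azure ML. The "bge-m3" subfolder is an
    # assumption based on the directory name used at registration time.
    model_dir = os.path.join(os.environ["AZUREML_MODEL_DIR"], "bge-m3")
    model = SentenceTransformer(model_dir)


def parse_texts(raw_data):
    """Turn the request body into a list of strings (assumed payload shapes)."""
    if isinstance(raw_data, bytes):
        raw_data = raw_data.decode("utf-8")
    try:
        payload = json.loads(raw_data)
    except (TypeError, ValueError):
        payload = raw_data  # not JSON: treat the body as plain text
    if isinstance(payload, str):
        return [payload]
    if isinstance(payload, list) and all(isinstance(item, str) for item in payload):
        return payload
    raise ValueError("expected a string or a list of strings")


def run(raw_data):
    """Runs on every request: unpack the data and return one embedding per text."""
    try:
        texts = parse_texts(raw_data)
        embeddings = model.encode(texts)
        # Convert to plain floats so the result is JSON serializable.
        return {"embeddings": [[float(x) for x in vector] for vector in embeddings]}
    except Exception as exc:  # report the problem to the caller instead of crashing
        return {"error": str(exc)}
```

Returning an error dictionary rather than raising keeps the container alive on malformed input; whether that is desirable depends on how the calling code handles failures.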

## 5. Create a deployment configuration and deploy the model

Before deploying the model, we need to define the deployment configuration. Set ```cpu_cores``` and ```memory_gb``` high enough that the model can be loaded into memory and the container does not run out of memory while performing inference.
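
One rough way to pick a starting value for ```memory_gb``` is to measure the size of the saved model directory and add headroom. The sketch below does that; both helper names are hypothetical, and the ```headroom``` factor of 1.5 is an assumed rule of thumb rather than an Azure ML recommendation, so treat the result as a first guess to verify against real requests.

```python
import math
from pathlib import Path


def directory_size_gb(path):
    """Return the total size of all files below `path` in gigabytes (1024**3 bytes)."""
    total_bytes = sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())
    return total_bytes / 1024**3


def suggest_memory_gb(path, headroom=1.5):
    """Suggest a memory_gb value: on-disk size times an assumed safety factor."""
    return max(1, math.ceil(directory_size_gb(path) * headroom))
```

For example, ```suggest_memory_gb("path/to/model/bge-m3")``` could be run before choosing the value passed to the deployment configuration.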

The deploy method will then start the deployment process. Here, a container image will be built and an Azure Container Instance will be provisioned to run it. The whole process can take a while.

If we have already registered a model and want to redeploy it, we can retrieve it via ```model = Model(ws, 'bge-m3')``` and pass it to the deploy method.

Double-check that you are using the same version of the Azure ML SDK (v1 or v2) for both the model registration and the deployment. If we registered the model via the Studio UI or SDK v2, we cannot deploy it via SDK v1.
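
To see which SDK generation is installed in the current environment, a small check like the one below can help. It relies on the fact that SDK v1 ships as the ```azureml-core``` package (used throughout this notebook) and SDK v2 as ```azure-ai-ml```; the ```installed_version``` helper is a hypothetical name introduced here.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


sdk_v1 = installed_version("azureml-core")  # Azure ML SDK v1
sdk_v2 = installed_version("azure-ai-ml")   # Azure ML SDK v2

print(f"SDK v1 (azureml-core): {sdk_v1 or 'not installed'}")
print(f"SDK v2 (azure-ai-ml): {sdk_v2 or 'not installed'}")
```

Both packages can be installed side by side, so the check only tells us what is available, not which one was used to register a given model.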

In [None]:
# Deploy the model
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

inference_config = InferenceConfig(entry_script='score.py', environment=env)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=2, memory_gb=4)

service = Model.deploy(workspace=ws,
                       name='huggingface-embeddings',
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config)
service.wait_for_deployment(show_output=True)

## 6. Call the endpoint and get inferences

To call the endpoint, we can either send an HTTP request or use ```service.run()```.

In [None]:
from utils import embed_text
print(service.scoring_uri)

test_sample = "The quick brown fox jumps over the lazy dog."
embeddings = embed_text(test_sample, service.scoring_uri)
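
The ```embed_text``` helper lives in the repository's ```utils``` module and is not shown here. As a rough idea of what an HTTP call to the scoring URI involves, the sketch below builds and sends a POST request using only the standard library. The function names, the JSON-list payload, and the optional ```api_key``` parameter are assumptions; the payload has to match whatever the scoring script expects, and the ```Authorization``` header is only needed if key-based authentication is enabled on the service.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, text, api_key=None):
    """Build a POST request carrying `text` as a JSON list (assumed payload shape)."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only relevant when key-based authentication is enabled on the service.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps([text]).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def post_for_embeddings(scoring_uri, text, api_key=None, timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_scoring_request(scoring_uri, text, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a deployed service, ```post_for_embeddings(service.scoring_uri, test_sample)``` would send the sample sentence to the endpoint.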

In [None]:
from azureml.core.webservice import Webservice

service = Webservice(name='huggingface-embeddings', workspace=ws)

test_sample = "The quick brown fox jumps over the lazy dog."
embeddings = service.run(test_sample)

## 7. Now do the fun part

Once we have the embeddings, we can use them for all kinds of downstream tasks. For instance, we can embed multiple texts and calculate the cosine similarity between them.

In [None]:
from utils import cosine_similarity

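# Pass the scoring URI explicitly, as in the earlier call to embed_text
print(cosine_similarity(embed_text("Bock", service.scoring_uri), embed_text("B", service.scoring_uri)))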
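
The ```cosine_similarity``` helper is imported from the repository's ```utils``` module and is not shown here. For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The pure-Python sketch below, under the hypothetical name ```cosine_similarity_sketch```, illustrates the calculation; it is not the repository's implementation.

```python
import math


def cosine_similarity_sketch(a, b):
    """Cosine similarity of two equal-length vectors given as sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity_sketch([1.0, 0.0], [1.0, 0.0]))  # 1.0, same direction
print(cosine_similarity_sketch([1.0, 0.0], [0.0, 1.0]))  # 0.0, orthogonal
```

Values close to 1 indicate that two embeddings point in nearly the same direction, which is usually read as the texts being semantically similar.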

In [None]:
docs = [
    "The first human landing on the Moon was achieved in 1969.",
    "Neil Armstrong was the first person to walk on the lunar surface.",
    "Apollo 11 was the spaceflight that landed the first two people on the Moon.",
]
query = "Who was the first to walk on the Moon?"

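# Embed the query once, then compare it against each document
query_embedding = embed_text(query, service.scoring_uri)
for doc in docs:
    print(cosine_similarity(query_embedding, embed_text(doc, service.scoring_uri)))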