# Inference Parameters in MLflow 2.6

MLflow 2.6 [added the ability](https://github.com/mlflow/mlflow/releases/tag/v2.6.0) to pass inference parameters to PyFunc models. This makes it easier to, for example, adjust sampling parameters such as `temperature` and `top_k`, or cap the number of generated tokens with `max_new_tokens`, in Hugging Face models at inference time, without building those parameters into the input data.

Here's how it's done.

## Define the PyFunc model

In [None]:
import mlflow.pyfunc
import transformers


class MyModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Load the pre-trained model once, when the logged model is loaded
        self.pipeline = transformers.pipeline(
            task="text-generation", model="gpt2", eos_token_id=50256
        )

    def predict(self, context, model_input, params=None):
        # params is None when the caller passes no inference parameters
        params = params or {}

        # Generate text for each prompt using the provided parameters
        generated_text = []
        for t in model_input["input"]:
            generated_text.append(self.pipeline(t, **params))
        return generated_text


my_model = MyModel()

## Define the model signature
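To use inference parameters, we need to define a model signature that includes a `params` component. The values given here determine each parameter's type and serve as the defaults when a parameter is omitted at inference time.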

In [None]:

data = {"input": ["The sky is", "The ocean is"]}
parameters = {
    "do_sample": False,
    "top_k": 5,
    "temperature": 0.,
    "max_new_tokens": 10,
}

signature = mlflow.models.infer_signature(model_input=data, params=parameters)
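
The values passed as `params` to `infer_signature` do double duty: they fix each parameter's name and type, and they are stored in the signature as the defaults used when a caller omits a parameter. The following is a simplified stand-in for that merging step, not MLflow's actual implementation. `resolve_params` is a hypothetical helper introduced only for illustration, and real MLflow additionally validates the supplied values against the declared types.

```python
def resolve_params(defaults, overrides=None):
    """Hypothetical helper: merge caller-supplied params over signature defaults."""
    resolved = dict(defaults)         # start from the defaults recorded in the signature
    resolved.update(overrides or {})  # caller-supplied values take precedence
    return resolved


# Same values as the `parameters` dict used to infer the signature
signature_defaults = {
    "do_sample": False,
    "top_k": 5,
    "temperature": 0.0,
    "max_new_tokens": 10,
}

# No overrides: every parameter falls back to its default
print(resolve_params(signature_defaults))

# Partial overrides: only the named parameters change
print(resolve_params(signature_defaults, {"do_sample": True, "temperature": 4.0}))
```

Under this model, calling `predict` with `params={}` is equivalent to calling it with the full set of defaults.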

## Log the model

In [None]:
with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        python_model=my_model,
        signature=signature,
        artifact_path="my-model",
    )

## Load and run the logged model
In the cell below, we load the logged model and run it twice: once with the default inference parameters recorded in the signature, and once with the parameters set explicitly.

In [None]:
# Load the logged mlflow model
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

data = {"input": ["The sky is", "The ocean is"]}

# Example with no parameters
result_no_params = loaded_model.predict(data, params={})

# Example with do_sample=True and temperature=4
params = {
    "do_sample": True,
    "temperature": 4.,
    "top_k": 25,
    "max_new_tokens": 100
}
result_with_params = loaded_model.predict(data, params=params)

print("Default params:")
for i, response in enumerate(result_no_params, start=1):
    # Each response is the pipeline output for one prompt: a list of dicts
    print(f"Prompt {i}:", response[0]["generated_text"])

print("\nExplicit params:")
for i, response in enumerate(result_with_params, start=1):
    print(f"Prompt {i}:", response[0]["generated_text"])
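
The model's `predict` returns one entry per prompt, and each entry is what the `text-generation` pipeline returns for a single string: a list of dicts with a `generated_text` key. The sketch below shows how to unpack that nesting. It uses hand-written mock data (the strings are placeholders, not real model output), and `extract_texts` is a hypothetical helper, not part of MLflow or Transformers.

```python
def extract_texts(results):
    """Hypothetical helper: take the first generated string from each per-prompt entry."""
    return [entry[0]["generated_text"] for entry in results]


# Mock data that mimics only the nesting of the real result
mock_results = [
    [{"generated_text": "The sky is <continuation 1>"}],
    [{"generated_text": "The ocean is <continuation 2>"}],
]

for i, text in enumerate(extract_texts(mock_results), start=1):
    print(f"Prompt {i}: {text}")
```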
