# Packaging a PyTorch model with the pyfunc flavor

The pyfunc flavor is the most generic flavor in MLflow. You can package almost any Python logic in it. In this notebook, we will package a PyTorch model.

* If you need to run several models in one server, you can use the pyfunc flavor to package them all.
* If you need to preprocess the data with a custom function, you can use the pyfunc flavor to package the preprocessing function.
* If you need to handle complex inputs, outputs or dependencies, you can use the pyfunc flavor to package the logic.
* If you need more flexibility than a built-in flavor offers, you can use the pyfunc flavor to package the model together with its custom logic.

## 1. Imports

In [1]:
from src.model import SimpleNN, RandomDataset, Trainer
from mlflow.pyfunc import save_model, PythonModel, PythonModelContext
from mlflow.models.signature import infer_signature
import requests
from typing import Any
import subprocess
import torch
import shutil
import os
import signal
import time
import cloudpickle


## 2. Settings

In [2]:
INPUT_SIZE = 10
TARGET_SIZE = 2
SERVE_PORT = 10001
MODEL_PATH = "model"

## 3. Training
In this example we will train two different models and save them in the same pyfunc model. The endpoint will use one or the other
based on the `model` parameter: `model1` or `model2`.
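The routing idea can be tried out without MLflow or PyTorch. The following is a minimal sketch, assuming the models are plain callables; `select_model` and the two lambda stand-ins are hypothetical names used only for illustration, not part of the notebook.

```python
from __future__ import annotations

from typing import Callable, Mapping, Sequence

# A "model" here is just any callable that maps one input row to a list of outputs.
Predictor = Callable[[Sequence[float]], list]


def select_model(models: Mapping[str, Predictor], params: Mapping[str, str] | None) -> Predictor:
    """Return the model named by params["model"], or raise a descriptive error."""
    name = (params or {}).get("model")
    if name is None:
        raise TypeError("Model parameter not found")
    try:
        return models[name]
    except KeyError:
        raise ValueError(f"Model parameter should be one of {sorted(models)}") from None


# Stand-ins for the two trained networks (assumption: illustrative only).
models = {
    "model1": lambda row: [sum(row)],
    "model2": lambda row: [max(row)],
}

chosen = select_model(models, {"model": "model2"})
print(chosen([1.0, 2.0, 3.0]))  # prints [3.0]
```

A dictionary lookup keeps the selection logic in one place, so adding a third model only means adding one more entry.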

In [3]:
model1 = SimpleNN(input_size=INPUT_SIZE, output_size=TARGET_SIZE, hidden_size=10)

train_dataset = RandomDataset(feat_size=INPUT_SIZE, target_size=TARGET_SIZE, num_samples=100)

trainer = Trainer(model1, optimizer=torch.optim.Adam(model1.parameters()), loss_fn=torch.nn.MSELoss())

trained_model1 = trainer.train(train_dataset, epochs=10)

Epoch 1/10, Loss: 1.02
Epoch 2/10, Loss: 1.00
Epoch 3/10, Loss: 0.98
Epoch 4/10, Loss: 0.95
Epoch 5/10, Loss: 0.96
Epoch 6/10, Loss: 1.00
Epoch 7/10, Loss: 0.95
Epoch 8/10, Loss: 0.96
Epoch 9/10, Loss: 0.90
Epoch 10/10, Loss: 0.91

In [4]:
model2 = SimpleNN(input_size=INPUT_SIZE, output_size=TARGET_SIZE, hidden_size=10)

train_dataset = RandomDataset(feat_size=INPUT_SIZE, target_size=TARGET_SIZE, num_samples=100)

trainer = Trainer(model2, optimizer=torch.optim.Adam(model2.parameters()), loss_fn=torch.nn.MSELoss())

trained_model2 = trainer.train(train_dataset, epochs=10)

Epoch 1/10, Loss: 1.18
Epoch 2/10, Loss: 1.11
Epoch 3/10, Loss: 1.09
Epoch 4/10, Loss: 1.06
Epoch 5/10, Loss: 1.05
Epoch 6/10, Loss: 1.02
Epoch 7/10, Loss: 1.05
Epoch 8/10, Loss: 1.00
Epoch 9/10, Loss: 1.03
Epoch 10/10, Loss: 0.98

## 4. Packaging the models
To package the models, call `save_model` with an instance of a custom class that subclasses `PythonModel`. The class implements `predict` and, optionally, `load_context`. We use `load_context` to load the models, which are saved as artifacts.

You also need to pass the code path and the environment the model needs. It is also advisable to pass the model signature.

For the environment, I prefer to use conda.

In [5]:
class PytorchModel(PythonModel):
    def __init__(self):
        self.model1 = None
        self.model2 = None
        self.device = None
        self.cpu_device = None
        self.loaded = False
    
    def load_context(self, context: PythonModelContext):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.cpu_device = torch.device("cpu")

        
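        # weights_only=False: the artifacts are full pickled modules, not state dicts.
        # Newer PyTorch releases default to weights_only=True, which rejects them.
        self.model1 = torch.load(context.artifacts["model1"], map_location=self.device, weights_only=False)
        self.model1.eval()
        self.model2 = torch.load(context.artifacts["model2"], map_location=self.device, weights_only=False)
        self.model2.eval()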

        self.loaded = True
    
    def predict(self, context: PythonModelContext, model_input: list, params: dict[str, Any] | None = None) -> Any:

        # params may be None when the caller sends no parameters at all
        if not params or params.get("model") is None:
            raise TypeError("Model parameter not found")
        
        if not self.loaded:
            self.load_context(context)

        model_input = torch.tensor(model_input, dtype=torch.float32).to(self.device)
        
        # NOTE: depending on the model parameter, we will use model1 or model2
        if params["model"] == "model1":
            model = self.model1
        elif params["model"] == "model2":
            model = self.model2
        else:
            raise ValueError("Model parameter should be either model1 or model2")
        
        # NOTE: you could even run both and average their outputs, or chain them so that
        # one model predicts something the next model needs.
        
        print(self.random_function_made_for_demo())

        with torch.no_grad():
            output = model(model_input)
        return output.cpu().numpy().tolist()
        
    def random_function_made_for_demo(self):
        return "This is a random function made for demo purposes"


In [6]:
# signature
input_example = torch.rand(1, INPUT_SIZE)
output_example = trained_model1(input_example) # trained_model2 gives an output of the same shape since both share the architecture

# NOTE: we are using the params argument which is a dictionary that will be passed to the predict method
# this way we can select which model to use in the predict method
signature = infer_signature(input_example.numpy(), output_example.detach().numpy(), params={"model": "model1"})
signature

inputs: 
  [Tensor('float32', (-1, 10))]
outputs: 
  [Tensor('float32', (-1, 2))]
params: 
  ['model': string (default: model1)]

In [7]:
# conda env
# you usually want to save this to a file and then load it with mlflow
conda_env = {
    "channels": ["defaults"],
    "dependencies": [
        "python=3.11",
        {"pip": ["mlflow", "torch", "tqdm"]}
    ]
}
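As the comment above notes, the environment usually lives in a file. The following is a minimal sketch of turning the dictionary into YAML text, assuming only the simple shape used in this notebook (a list of channels, plus dependencies that are either strings or a one-level mapping such as `pip`). `render_conda_env` is a hypothetical helper written for illustration; in practice a YAML library would normally do this job.

```python
def render_conda_env(env: dict) -> str:
    """Render a simple conda environment dict as YAML text.

    Only handles the shape used in this notebook: string dependencies and
    one-level mappings such as {"pip": [...]}.
    """
    lines = ["channels:"]
    lines += [f"  - {channel}" for channel in env["channels"]]
    lines.append("dependencies:")
    for dep in env["dependencies"]:
        if isinstance(dep, dict):
            # Nested section, e.g. the pip packages
            for key, packages in dep.items():
                lines.append(f"  - {key}:")
                lines += [f"      - {pkg}" for pkg in packages]
        else:
            lines.append(f"  - {dep}")
    return "\n".join(lines) + "\n"


# Same environment as in the cell above
conda_env = {
    "channels": ["defaults"],
    "dependencies": [
        "python=3.11",
        {"pip": ["mlflow", "torch", "tqdm"]},
    ],
}

print(render_conda_env(conda_env))
```

The resulting text can be written to a file such as `conda.yaml` (an assumed file name), and that path can be passed as `conda_env` to `save_model` instead of the dictionary.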

In [8]:
# save the models
torch.save(trained_model1, "model1.pth")
torch.save(trained_model2, "model2.pth")

In [9]:
# save model
shutil.rmtree(MODEL_PATH, ignore_errors=True)
save_model(path=MODEL_PATH, python_model=PytorchModel(), conda_env=conda_env, code_paths=["src"], artifacts={"model1": "model1.pth", "model2": "model2.pth"}, signature=signature)

  from .autonotebook import tqdm as notebook_tqdm
Downloading artifacts: 100%|██████████| 1/1 [00:00<00:00, 2673.23it/s] 
Downloading artifacts: 100%|██████████| 1/1 [00:00<00:00, 1340.89it/s]


## 5. Local serving

You can immediately serve this model and run inference locally.

**Note:** With this approach you may not need Docker. A correctly set up conda environment is enough.

In [10]:
# start model server
cmd = f"mlflow models serve -m {MODEL_PATH} -p {SERVE_PORT} --env-manager local --workers 2" # alternative: --env-manager conda: will create a new conda env
process = subprocess.Popen(cmd, shell=True, preexec_fn=os.setsid)
time.sleep(2)

Downloading artifacts: 100%|██████████| 9/9 [00:00<00:00, 16278.02it/s]
2024/12/02 00:34:29 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
2024/12/02 00:34:29 INFO mlflow.pyfunc.backend: === Running command 'exec gunicorn --timeout=60 -b 127.0.0.1:10001 -w 2 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2024-12-02 00:34:29 +0100] [429396] [INFO] Starting gunicorn 23.0.0
[2024-12-02 00:34:29 +0100] [429396] [INFO] Listening at: http://127.0.0.1:10001 (429396)
[2024-12-02 00:34:29 +0100] [429396] [INFO] Using worker: sync
[2024-12-02 00:34:29 +0100] [429397] [INFO] Booting worker with pid: 429397
[2024-12-02 00:34:29 +0100] [429398] [INFO] Booting worker with pid: 429398
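A fixed `time.sleep(2)` can be too short on a slow machine. The following is a minimal sketch of polling until the server answers, assuming the scoring server exposes a `/ping` health endpoint on the serving port. `ping` and `wait_until_ready` are hypothetical helper names, and the timeout values are arbitrary.

```python
import time
import urllib.request
from typing import Callable


def ping(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts and urllib's URLError/HTTPError
        return False


def wait_until_ready(probe: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call probe repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False


# Usage in this notebook (not executed here):
# ready = wait_until_ready(lambda: ping(f"http://localhost:{SERVE_PORT}/ping"))
```

Passing the probe as a callable keeps the waiting loop independent of the HTTP details, so the same loop works for the Docker container later on.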


In [11]:
result = requests.post(f"http://localhost:{SERVE_PORT}/invocations", json={"inputs": input_example.numpy().tolist(), "params": {"model": "model1"}})
print(result.json())

This is a random function made for demo purposes
{'predictions': [[-0.009028077125549316, 0.18052636086940765]]}
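The request body and the response seen above are plain JSON. The following is a minimal sketch of building one and parsing the other with the standard library. `build_payload` and `first_prediction` are hypothetical helper names, and the response string is copied from the output above.

```python
import json


def build_payload(rows: list, model_name: str) -> str:
    """Serialize input rows and the model selector into an /invocations body."""
    return json.dumps({"inputs": rows, "params": {"model": model_name}})


def first_prediction(response_body: str) -> list:
    """Extract the prediction for the first input row from a response body."""
    return json.loads(response_body)["predictions"][0]


# One row with 10 features, matching INPUT_SIZE in this notebook
body = build_payload([[0.0] * 10], "model1")
print(body)

# Response text taken from the cell output above
response_text = '{"predictions": [[-0.009028077125549316, 0.18052636086940765]]}'
print(first_prediction(response_text))
```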




In [12]:
# stop model server
os.killpg(os.getpgid(process.pid), signal.SIGTERM)

[2024-12-02 00:34:32 +0100] [429398] [INFO] Worker exiting (pid: 429398)
[2024-12-02 00:34:32 +0100] [429396] [INFO] Handling signal: term


## 6. Docker

We can also package the model in a Docker image. This is usually the easier way to distribute it: it works on your machine and on theirs.

### 6.1. Packaging

In [13]:
IMAGE_NAME = "mlflow-model:pyfunc" # the name is mlflow-model and the tag is pyfunc (you can change it)

[2024-12-02 00:34:32 +0100] [429397] [INFO] Worker exiting (pid: 429397)


In [14]:
cmd = f"mlflow models build-docker -m {MODEL_PATH} -n {IMAGE_NAME} --env-manager conda"
subprocess.run(cmd, shell=True)

[2024-12-02 00:34:32 +0100] [429396] [INFO] Shutting down: Master
  value = self.callback(ctx, self, value)
Downloading artifacts: 100%|██████████| 11/11 [00:00<00:00, 21907.57it/s]
2024/12/02 00:34:33 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
Downloading artifacts: 100%|██████████| 11/11 [00:00<00:00, 13710.95it/s]
2024/12/02 00:34:33 INFO mlflow.pyfunc.backend: Building docker image with name mlflow-model:pyfunc
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 1.95kB done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 DONE 0.7s

#3 [internal] load .dockerignore
#3 transferring context: 2B done
#3 DONE 0.0s

#4 [ 1/15] FROM docker.io/library/ubuntu:20.04@sha256:8e5c4f0285ecbb4ead070431d29b576a530d3166df73ec44affc1cd27555141b
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 20.34kB done
#5 DONE 0.

CompletedProcess(args='mlflow models build-docker -m model -n mlflow-model:pyfunc --env-manager conda', returncode=0)

### 6.2. Inference
To get predictions, we need to run the Docker image as a container.

In [15]:
CONTAINER_NAME = "mlflow_server"

In [None]:
cmd = f'docker run -e GUNICORN_CMD_ARGS="--workers=1" -p {SERVE_PORT}:8080 --name {CONTAINER_NAME} {IMAGE_NAME}'

process = subprocess.Popen(cmd, shell=True)

[2024-12-01 23:43:30 +0000] [41] [INFO] Starting gunicorn 23.0.0
[2024-12-01 23:43:30 +0000] [41] [INFO] Listening at: http://127.0.0.1:8000 (41)
[2024-12-01 23:43:30 +0000] [41] [INFO] Using worker: sync
[2024-12-01 23:43:30 +0000] [47] [INFO] Booting worker with pid: 47
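The `docker run` command can also be assembled as an argument list, which avoids shell quoting issues and works with `subprocess.Popen` without `shell=True`. The following is a minimal sketch; `docker_run_command` is a hypothetical helper, and the flags are the same ones used in the cell above.

```python
import shlex


def docker_run_command(image: str, container_name: str, host_port: int,
                       workers: int = 1, container_port: int = 8080) -> list:
    """Build the docker run argument list used to serve the model image."""
    return [
        "docker", "run",
        "-e", f"GUNICORN_CMD_ARGS=--workers={workers}",  # passed to gunicorn inside the container
        "-p", f"{host_port}:{container_port}",           # host port : container port
        "--name", container_name,
        image,
    ]


cmd = docker_run_command("mlflow-model:pyfunc", "mlflow_server", 10001)
print(shlex.join(cmd))  # shell-safe rendering, useful for logging

# Usage in this notebook (not executed here):
# process = subprocess.Popen(cmd)
```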


In [20]:
result = requests.post(f"http://localhost:{SERVE_PORT}/invocations", json={"inputs": input_example.numpy().tolist(), "params": {"model": "model2"}})
result.json()

172.17.0.1 - - [01/Dec/2024:23:43:40 +0000] "POST /invocations HTTP/1.1" 200 61 "-" "python-requests/2.32.3"


{'predictions': [[0.18097594380378723, 0.11162048578262329]]}

In [21]:
cmd_stop = f"docker stop {CONTAINER_NAME}"
subprocess.run(cmd_stop, shell=True)

cmd_rm = f"docker rm {CONTAINER_NAME}"
subprocess.run(cmd_rm, shell=True)

2024/12/01 23:43:49 INFO mlflow.models.container: Got sigterm signal, exiting.
[2024-12-01 23:43:49 +0000] [41] [INFO] Handling signal: term
[2024-12-01 23:43:49 +0000] [47] [INFO] Worker exiting (pid: 47)


This is a random function made for demo purposes
mlflow_server
mlflow_server


CompletedProcess(args='docker rm mlflow_server', returncode=0)

### 6.3. Exporting the docker image
To use the Docker image on another machine, we need to export it to a tar archive and copy that archive to the target machine.

In [22]:
cmd = f"docker save -o model.tar {IMAGE_NAME}"
subprocess.run(cmd, shell=True)

CompletedProcess(args='docker save -o model.tar mlflow-model:pytorch', returncode=0)
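The archive written by `docker save` is an uncompressed tar file, so compressing it before the transfer can save time. The following is a minimal sketch using only the standard library; `gzip_file` is a hypothetical helper, and it assumes the target machine loads the result with `docker load`, which accepts gzip-compressed archives.

```python
from __future__ import annotations

import gzip
import shutil
from pathlib import Path


def gzip_file(source: Path, target: Path | None = None) -> Path:
    """Compress source into target (default: source name plus ".gz") and return the target path."""
    target = target or source.with_name(source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        # Stream in chunks so a large image archive is never held fully in memory
        shutil.copyfileobj(src, dst)
    return target


# Usage in this notebook (not executed here):
# compressed = gzip_file(Path("model.tar"))  # writes model.tar.gz
```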

### 6.4. Importing the docker image
The target machine needs to have docker installed. Then we can load the image and run it.

In [23]:
cmd = "docker load -i model.tar"
subprocess.run(cmd, shell=True)

Loaded image: mlflow-model:pytorch


CompletedProcess(args='docker load -i model.tar', returncode=0)

You can then run inference on the target machine with the same commands as in section 6.2.