# Packaging a PyTorch model with the PyTorch flavor

The `pytorch` flavor is MLflow's built-in flavor for PyTorch models. It has some nice features such as autologging, but it leaves little room for customization.

**NOTE:** The MLflow server delivers inputs as float64. If your network uses float32 (the usual case), you need to cast the input explicitly for the network to work.
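A minimal sketch of that cast, assuming the model can be wrapped in a thin `torch.nn.Module`. `Float32Input` is a hypothetical name, and the `Linear` layer only stands in for `SimpleNN`, whose code is not shown in this notebook. The same cast could equally live inside the model's own `forward`.

```python
import torch


class Float32Input(torch.nn.Module):
    """Hypothetical wrapper that casts incoming tensors to float32."""

    def __init__(self, inner: torch.nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Inputs may arrive as float64; the weights are float32 by default.
        return self.inner(x.to(torch.float32))


# Stand-in network (assumption): 10 input features, 2 outputs.
net = torch.nn.Linear(10, 2)
wrapped = Float32Input(net)

# Simulate a float64 input such as the one described in the note.
x64 = torch.rand(1, 10, dtype=torch.float64)
out = wrapped(x64)
print(out.dtype, tuple(out.shape))
```

Without the cast, passing `x64` directly to `net` fails with a dtype mismatch error, because the layer's weights are float32.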

## 1. Imports

In [None]:
from src.model import SimpleNN, RandomDataset, Trainer

from mlflow.pytorch import save_model
from mlflow.models.signature import infer_signature
import requests
import subprocess
import torch
import shutil
import os
import signal
import time

## 2. Settings

In [2]:
INPUT_SIZE = 10
TARGET_SIZE = 2
SERVE_PORT = 10001
MODEL_PATH = "model"

## 3. Training

In [3]:
model = SimpleNN(input_size=INPUT_SIZE, output_size=TARGET_SIZE, hidden_size=10)

train_dataset = RandomDataset(feat_size=INPUT_SIZE, target_size=TARGET_SIZE, num_samples=100)

trainer = Trainer(model, optimizer=torch.optim.Adam(model.parameters()), loss_fn=torch.nn.MSELoss())

trained_model = trainer.train(train_dataset, epochs=10)

Epoch 1/10, Loss: 1.07
Epoch 2/10, Loss: 1.05
Epoch 3/10, Loss: 1.03
Epoch 4/10, Loss: 1.04
Epoch 5/10, Loss: 1.02
Epoch 6/10, Loss: 1.00
Epoch 7/10, Loss: 1.02
Epoch 8/10, Loss: 1.03
Epoch 9/10, Loss: 0.98
Epoch 10/10, Loss: 0.97

## 4. Packaging the model

To package the model, call `save_model` with the torch model, the code paths it depends on, and the environment it needs. It is also advisable to pass the model signature.

For the environment I prefer to use conda.

In [4]:
# signature
input_example = torch.rand(1, INPUT_SIZE)
output_example = trained_model(input_example)
signature = infer_signature(input_example.numpy(), output_example.detach().numpy())
signature

inputs: 
  [Tensor('float32', (-1, 10))]
outputs: 
  [Tensor('float32', (-1, 2))]
params: 
  None

In [5]:
# conda env
# you usually want to save this to a file and then load it with mlflow
conda_env = {
    "channels": ["defaults"],
    "dependencies": [
        "python",
        {"pip": ["mlflow", "torch", "tqdm"]}
    ]
}
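The environment in this notebook leaves every version unpinned, so the serving environment may resolve to different versions than the training one. As a hedged sketch (not something the notebook itself does), the same dictionary can be built with versions taken from the running interpreter. `pinned` and `build_conda_env` are hypothetical helper names, and the explicit `pip` entry is an addition.

```python
import sys
from importlib import metadata


def pinned(package: str) -> str:
    """Return 'name==version' if the package is installed, else the bare name."""
    try:
        return f"{package}=={metadata.version(package)}"
    except metadata.PackageNotFoundError:
        return package


def build_conda_env(pip_packages):
    """Build a conda env dict pinned to the current Python and package versions."""
    v = sys.version_info
    return {
        "channels": ["defaults"],
        "dependencies": [
            f"python={v.major}.{v.minor}.{v.micro}",
            "pip",  # assumption: list pip explicitly alongside the pip section
            {"pip": [pinned(p) for p in pip_packages]},
        ],
    }


pinned_conda_env = build_conda_env(["mlflow", "torch", "tqdm"])
print(pinned_conda_env)
```

Note that a locally built torch can report a version with a suffix such as `+cu...`, which may not be installable from the default package index. In that case, pin the version by hand.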

In [None]:
# save model
shutil.rmtree(MODEL_PATH, ignore_errors=True)
save_model(trained_model, MODEL_PATH, conda_env=conda_env, signature=signature, code_paths=["src"])

## 5. Local serving

You can immediately serve this model and run inference locally.

**Note:** With this approach you may not need Docker. A correctly set up conda env is enough.

In [26]:
# start model server
cmd = f"mlflow models serve -m {MODEL_PATH} -p {SERVE_PORT} --env-manager local --workers 2" # alternative: --env-manager conda: will create a new conda env
process = subprocess.Popen(cmd, shell=True, preexec_fn=os.setsid)
time.sleep(2)

Downloading artifacts: 100%|██████████| 8/8 [00:00<00:00, 18672.47it/s]
2024/12/01 23:19:29 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
2024/12/01 23:19:29 INFO mlflow.pyfunc.backend: === Running command 'exec gunicorn --timeout=60 -b 127.0.0.1:10001 -w 2 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2024-12-01 23:19:30 +0100] [392037] [INFO] Starting gunicorn 23.0.0
[2024-12-01 23:19:30 +0100] [392037] [INFO] Listening at: http://127.0.0.1:10001 (392037)
[2024-12-01 23:19:30 +0100] [392037] [INFO] Using worker: sync
[2024-12-01 23:19:30 +0100] [392038] [INFO] Booting worker with pid: 392038
[2024-12-01 23:19:30 +0100] [392039] [INFO] Booting worker with pid: 392039
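The fixed `time.sleep(2)` may be too short on a slow machine. A hedged alternative is to poll the server until it responds, assuming it answers `GET /ping` with HTTP 200 once it is ready (MLflow's scoring server exposes this health endpoint). `wait_for_server` is a hypothetical helper, not part of MLflow.

```python
import http.client
import time
import urllib.request


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or HTTP error: server not ready yet.
            pass
        time.sleep(interval)
    return False


# Usage in this notebook (SERVE_PORT is defined in the Settings cell):
# ready = wait_for_server(f"http://localhost:{SERVE_PORT}/ping")
```

The function returns `False` instead of raising, so the caller decides whether to stop the server process or retry.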


In [27]:
result = requests.post(f"http://localhost:{SERVE_PORT}/invocations", json={"inputs": input_example.numpy().tolist()})
print(result.json())

{'predictions': [[0.0012273788452148438, 0.30130383372306824]]}
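The request body and the response follow a simple shape: a JSON object with an `inputs` list of feature rows goes in, and an object with a `predictions` list comes back. A small sketch of both sides, assuming that shape holds. `build_payload` and `parse_predictions` are hypothetical helpers, the 0.5-filled input row is a placeholder, and the sample response is copied from the output of this cell.

```python
import json


def build_payload(rows):
    """Wrap feature rows in the {"inputs": ...} body used in this notebook."""
    return {"inputs": [[float(value) for value in row] for row in rows]}


def parse_predictions(response_body, expected_width):
    """Return the predictions, checking that each row has the expected size."""
    predictions = response_body["predictions"]
    for row in predictions:
        if len(row) != expected_width:
            raise ValueError(f"expected {expected_width} values, got {len(row)}")
    return predictions


# One placeholder row with INPUT_SIZE = 10 features.
payload = build_payload([[0.5] * 10])
body = json.dumps(payload)

# Response copied from the cell output; TARGET_SIZE = 2 values per row.
sample_response = {"predictions": [[0.0012273788452148438, 0.30130383372306824]]}
predictions = parse_predictions(sample_response, expected_width=2)
print(predictions)
```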




In [28]:
# stop model server
os.killpg(os.getpgid(process.pid), signal.SIGTERM)

[2024-12-01 23:19:34 +0100] [392039] [INFO] Worker exiting (pid: 392039)
[2024-12-01 23:19:34 +0100] [392038] [INFO] Worker exiting (pid: 392038)
[2024-12-01 23:19:34 +0100] [392037] [INFO] Handling signal: term


[2024-12-01 23:19:35 +0100] [392037] [INFO] Shutting down: Master


## 6. Docker

We can also package the model as a Docker image. This is usually the easier route, because the same image runs on your machine and on anyone else's.

### 6.1. Packaging

In [33]:
IMAGE_NAME = "mlflow-model:pytorch" # the name is mlflow-model and the tag is pytorch (you can change it)

In [31]:
cmd = f"mlflow models build-docker -m {MODEL_PATH} -n {IMAGE_NAME} --env-manager conda"
subprocess.run(cmd, shell=True)

Downloading artifacts: 100%|██████████| 10/10 [00:00<00:00, 18078.90it/s]
2024/12/01 23:24:44 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
Downloading artifacts: 100%|██████████| 10/10 [00:00<00:00, 10007.88it/s]
2024/12/01 23:24:44 INFO mlflow.pyfunc.backend: Building docker image with name mlflow-model:pytorch
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 1.95kB done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 DONE 0.8s

#3 [internal] load .dockerignore
#3 transferring context: 2B done
#3 DONE 0.0s

#4 [ 1/15] FROM docker.io/library/ubuntu:20.04@sha256:8e5c4f0285ecbb4ead070431d29b576a530d3166df73ec44affc1cd27555141b
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 12.78kB done
#5 DONE 0.0s

#6 [ 4/15] RUN bash ./miniconda.sh -b -p /miniconda && rm ./m

CompletedProcess(args='mlflow models build-docker -m model -n mlflow-model:pytorch --env-manager conda', returncode=0)

### 6.2. Inference

To get predictions, we need to run the Docker image as a container.

In [34]:
CONTAINER_NAME = "mlflow_server"

In [None]:
cmd = f'docker run -e GUNICORN_CMD_ARGS="--workers=1" -p {SERVE_PORT}:8080 --name {CONTAINER_NAME} {IMAGE_NAME}'

process = subprocess.Popen(cmd, shell=True)

[2024-12-01 22:40:28 +0000] [41] [INFO] Starting gunicorn 23.0.0
[2024-12-01 22:40:28 +0000] [41] [INFO] Listening at: http://127.0.0.1:8000 (41)
[2024-12-01 22:40:28 +0000] [41] [INFO] Using worker: sync
[2024-12-01 22:40:28 +0000] [47] [INFO] Booting worker with pid: 47


In [58]:
# use localhost, as in section 5: 0.0.0.0 is a bind address, not a portable destination
result = requests.post(f"http://localhost:{SERVE_PORT}/invocations", json={"inputs": input_example.numpy().tolist()})
result.json()

172.17.0.1 - - [01/Dec/2024:22:40:38 +0000] "POST /invocations HTTP/1.1" 200 63 "-" "python-requests/2.32.3"


{'predictions': [[0.0012273788452148438, 0.30130383372306824]]}

In [59]:
cmd_stop = f"docker stop {CONTAINER_NAME}"
subprocess.run(cmd_stop, shell=True)

cmd_rm = f"docker rm {CONTAINER_NAME}"
subprocess.run(cmd_rm, shell=True)

2024/12/01 22:40:47 INFO mlflow.models.container: Got sigterm signal, exiting.
[2024-12-01 22:40:47 +0000] [41] [INFO] Handling signal: term
[2024-12-01 22:40:47 +0000] [47] [INFO] Worker exiting (pid: 47)


mlflow_server
mlflow_server


CompletedProcess(args='docker rm mlflow_server', returncode=0)

### 6.3. Exporting the docker image

To use the Docker image on another machine, we need to export it to a tar archive and copy that archive to the target machine.

In [60]:
cmd = "docker save -o model.tar mlflow-model:pytorch"
subprocess.run(cmd, shell=True)

CompletedProcess(args='docker save -o model.tar mlflow-model:pytorch', returncode=0)
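`docker save` writes a plain tar archive, which can be large. If transfer size matters, the archive can be gzip-compressed before copying it. `docker load` also accepts gzip-compressed archives. The sketch below uses only the standard library. `gzip_file` is a hypothetical helper, and compressing at all is an optional extra step.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(source, target=None) -> Path:
    """Compress `source` with gzip and return the path of the compressed file."""
    source = Path(source)
    if target is None:
        # Default: append ".gz" to the original name, e.g. model.tar -> model.tar.gz
        target = source.with_name(source.name + ".gz")
    target = Path(target)
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        # Stream in chunks so large images do not have to fit in memory.
        shutil.copyfileobj(src, dst)
    return target


# Usage in this notebook, after `docker save` has produced model.tar:
# compressed = gzip_file("model.tar")
```

On the target machine, `docker load -i model.tar.gz` should then work in place of loading the uncompressed file.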

### 6.4. Importing the docker image

The target machine needs to have Docker installed. Then we can load the image and run it.

In [None]:
cmd = "docker load -i model.tar"
subprocess.run(cmd, shell=True)

You can then run inference on the target machine by executing the same command as in step 6.2.