## Deploying a model with the Triton Inference Server

## Prerequisites
- Podman
- Python notebook

## How models are structured in Triton

In [2]:
!tree modelo_regressao

Folder PATH listing for volume OS
Volume serial number is 6037-BFFF
C:\USERS\018117631\DOCUMENTS\PROJETO BB\TRITON\PROJETO_TRITON\TRITON\MODELO_REGRESSAO
+---1
    +---__pycache__
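
The `tree` output above lists only directories. Judging from the paths used elsewhere in this notebook (`./modelo_regressao/config.pbtxt`, `./modelo_regressao/1/model.py`, `./modelo_regressao/1/model.pickle` and `/models/requirements.txt`), the repository layout can be sketched as follows. The temporary directory and the empty placeholder files are assumptions made for illustration, not the real artifacts.

```python
import tempfile
from pathlib import Path

# Relative paths taken from the cells in this notebook; the files are created
# empty because only the layout matters for this sketch (assumption).
EXPECTED_FILES = [
    "requirements.txt",                 # installed when the container starts
    "modelo_regressao/config.pbtxt",    # model configuration
    "modelo_regressao/1/model.py",      # Python backend code for version 1
    "modelo_regressao/1/model.pickle",  # serialized model loaded by model.py
]


def create_layout(root: Path) -> list[str]:
    """Create placeholder files under root and return their relative paths."""
    for rel in EXPECTED_FILES:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    layout = create_layout(Path(tmp))
    for rel in layout:
        print(rel)
```

The numbered directory (`1`) is the model version, which is why the inference URL later in the notebook contains `versions/1`.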


## config.pbtxt - Protobuf text format
The `config.pbtxt` configuration file specifies the model's inputs and outputs:
- Inputs are the data the model receives to perform inference.
- Outputs are the results of the model's inference.

In [3]:
with open('./modelo_regressao/config.pbtxt', 'r') as f:
    arquivo_configuracao = f.read()

print(arquivo_configuracao)

backend: "python"

input {
    name: "input"
    data_type: TYPE_FP32
    dims: [-1, -1]
}

output {
    name: "PREDICAO"
    data_type: TYPE_STRING
    dims: [ 1 ]
}

instance_group [{ kind: KIND_CPU }]
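
In this configuration, `-1` in `dims` marks a variable-size dimension, so the input accepts any two-dimensional FP32 tensor. A minimal sketch of that matching rule follows. `shape_matches_dims` is a hypothetical helper written for illustration, not part of Triton, and it assumes `max_batch_size` is left unset so that `dims` describes the full tensor shape.

```python
def shape_matches_dims(shape, dims):
    """Return True if a concrete shape satisfies a dims list where -1 is a wildcard."""
    if len(shape) != len(dims):
        return False
    return all(d == -1 or d == s for s, d in zip(shape, dims))


INPUT_DIMS = [-1, -1]  # from config.pbtxt, input "input"
OUTPUT_DIMS = [1]      # from config.pbtxt, output "PREDICAO"

print(shape_matches_dims((1, 1), INPUT_DIMS))  # one row, one feature
print(shape_matches_dims((3,), INPUT_DIMS))    # wrong rank for this input
print(shape_matches_dims((1,), OUTPUT_DIMS))   # a single prediction
```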



## model.py
- The `model.py` file contains the code that loads the model, using the configuration supplied by `config.pbtxt`, and runs inference for each request.

In [4]:
with open('./modelo_regressao/1/model.py', 'r') as f:
    arquivo_modelo = f.read()

print(arquivo_modelo)

import json

import numpy as np
import triton_python_backend_utils as pb_utils
from joblib import load


class TritonPythonModel:
    def initialize(self, args):
        self.model_config = model_config = json.loads(args['model_config'])

        predicao_config = pb_utils.get_output_config_by_name(
            model_config, "PREDICAO")
        
        self.predicao_dtype = pb_utils.triton_string_to_numpy(
            predicao_config['data_type'])

        version_path =  args['model_repository'] + '/' + args['model_version']

        self.model = load(version_path + '/model.pickle')

    def execute(self, requests):
        responses = []

        for request in requests:
            in_0 = pb_utils.get_input_tensor_by_name(request, "input")

            input_0 = in_0.as_numpy()

            predicao = self.model.predict(input_0)

            predicao_tensor = pb_utils.Tensor(
                "PREDICAO", predicao.astype(self.predicao_dtype))

            inference_response = pb_util
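
The printed listing is cut off in the middle of `execute`. In Triton's Python backend, `execute` returns one response per request, so the usual pattern is to wrap the output tensor in an `InferenceResponse`, collect it, and return the list. The sketch below illustrates that pattern and is not the author's actual file: the stand-in classes replace `triton_python_backend_utils` so the code runs outside a Triton container, and `_DoublingModel` is a hypothetical substitute for the pickled model.

```python
from types import SimpleNamespace

import numpy as np


# --- Stand-ins (assumptions) so the sketch runs outside a Triton container ---
class _Tensor:
    def __init__(self, name, array):
        self._name, self._array = name, array

    def name(self):
        return self._name

    def as_numpy(self):
        return self._array


class _InferenceResponse:
    def __init__(self, output_tensors):
        self._outputs = output_tensors

    def output_tensors(self):
        return self._outputs


class _Request:
    def __init__(self, tensors):
        self._tensors = tensors

    def inputs(self):
        return self._tensors


def _get_input_tensor_by_name(request, name):
    return next((t for t in request.inputs() if t.name() == name), None)


pb_utils = SimpleNamespace(
    Tensor=_Tensor,
    InferenceResponse=_InferenceResponse,
    get_input_tensor_by_name=_get_input_tensor_by_name,
)


class _DoublingModel:
    """Hypothetical model standing in for the object loaded from model.pickle."""

    def predict(self, x):
        return x.sum(axis=1) * 2.0


class TritonPythonModel:
    def __init__(self):
        # The real file sets these in initialize(); hard-coded for the sketch.
        self.model = _DoublingModel()
        self.predicao_dtype = np.object_  # assumption: dtype used for TYPE_STRING

    def execute(self, requests):
        responses = []
        for request in requests:
            in_0 = pb_utils.get_input_tensor_by_name(request, "input")
            predicao = self.model.predict(in_0.as_numpy())
            predicao_tensor = pb_utils.Tensor(
                "PREDICAO", predicao.astype(self.predicao_dtype))
            # One response per request, wrapping that request's output tensors.
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[predicao_tensor]))
        # The list must have the same length and order as `requests`.
        return responses


batch = [_Request([_Tensor("input", np.array([[1.5]], dtype=np.float32))])]
result = TritonPythonModel().execute(batch)
print(result[0].output_tensors()[0].as_numpy())
```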

In [7]:
!pip install scikit-learn===1.1.1
# !pip install scikit-learn===0.24.0

Collecting scikit-learn===1.1.1
  Using cached scikit_learn-1.1.1-cp310-cp310-win_amd64.whl (7.3 MB)
Installing collected packages: scikit-learn
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 1.3.0
    Uninstalling scikit-learn-1.3.0:
      Successfully uninstalled scikit-learn-1.3.0


ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\018117631\\AppData\\Local\\Programs\\Python\\Python310\\Lib\\site-packages\\~-learn\\.libs\\msvcp140.dll'
Consider using the `--user` option or check the permissions.


[notice] A new release of pip is available: 23.0.1 -> 24.0
[notice] To update, run: python.exe -m pip install --upgrade pip


In [10]:
# Deserializing the model's .pickle file to find out what type of model it is

""" import pickle
dict_pickle_in = open('./modelo_regressao/1/model.pickle','rb')
dict1 = pickle.load(dict_pickle_in)
type(dict1) """



" import pickle\ndict_pickle_in = open('./modelo_regressao/1/model.pickle','rb')\ndict1 = pickle.load(dict_pickle_in)\ntype(dict1) "

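The cell above keeps the inspection code disabled inside a string literal, which is why its output is just the string itself. A minimal runnable sketch of the same idea is shown here, using a context manager so the file handle is closed. `describe_pickle` is a hypothetical helper name, and the throwaway file holding a dictionary stands in for `model.pickle`.

```python
import pickle
import tempfile
from pathlib import Path


def describe_pickle(path):
    """Load a pickle file and return the type name of the stored object."""
    with open(path, "rb") as fh:
        obj = pickle.load(fh)
    return type(obj).__name__


# Demo with a throwaway file (assumption: it stands in for
# ./modelo_regressao/1/model.pickle, which is not available here).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.pickle"
    with open(demo_path, "wb") as fh:
        pickle.dump({"coef": [1.0, 2.0]}, fh)
    kind = describe_pickle(demo_path)
    print(kind)
```

Unpickling can execute arbitrary code, so this should only be done with files from a trusted source. Loading a pickled scikit-learn model also generally requires a compatible scikit-learn version to be installed.
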
## Deploying the model on Triton with Podman

Most recent Triton image at the time of writing: 23.09-py3

`podman pull nvcr.io/nvidia/tritonserver:23.09-py3`

In a terminal, run:

`podman run --rm -p 8000:8000 -v $HOME/triton:/models nvcr.io/nvidia/tritonserver:23.09-py3 /bin/bash -c "pip install -r /models/requirements.txt && tritonserver --model-repository=/models"`


What this command does:

- Runs a container with Podman, mapping host port 8000 to container port 8000 and mounting the host directory `$HOME/triton` at `/models` inside the container. The `--rm` flag removes the container when it exits.
- Installs the model's dependencies listed in `/models/requirements.txt`.
- Starts `tritonserver` with `/models` as the model repository.
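
For launching the container from a script or notebook instead of a terminal, the same command can be assembled as an argument list. This is only a sketch: `build_podman_command` is a hypothetical helper, and using `Path.home() / "triton"` in place of `$HOME/triton` is an assumption about where the model repository lives.

```python
import shlex
from pathlib import Path

IMAGE = "nvcr.io/nvidia/tritonserver:23.09-py3"


def build_podman_command(host_models_dir, image=IMAGE, http_port=8000):
    """Return the podman argument list equivalent to the command shown above."""
    startup = (
        "pip install -r /models/requirements.txt && "
        "tritonserver --model-repository=/models"
    )
    return [
        "podman", "run", "--rm",
        "-p", f"{http_port}:8000",           # host port : container port
        "-v", f"{host_models_dir}:/models",  # host directory : container path
        image,
        "/bin/bash", "-c", startup,
    ]


command = build_podman_command(Path.home() / "triton")
print(shlex.join(command))
# To actually start the server: subprocess.run(command, check=True)
```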

## After the deploy, let's run some inferences!

In [4]:
import requests
import json
import numpy as np

url = "http://localhost:8000/v2/models/modelo_regressao/versions/1/infer"


# Input data: a single sample with a single feature, shaped (1, 1)
input_data = np.array([11.1]).reshape(-1, 1)
print(input_data)
payload = json.dumps({
  "inputs": [
    {
      "name": "input",  # must match the input name in config.pbtxt
      "shape": input_data.shape,
      "datatype": "FP32",
      "data": input_data.tolist()
    }
  ]
})

headers = {
  'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload)

print(response.text)

[[11.1]]
{"model_name":"modelo_regressao","model_version":"1","outputs":[{"name":"PREDICAO","datatype":"BYTES","shape":[1],"data":["121873.0"]}]}

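The server returns the prediction as a `BYTES` tensor, so the value arrives as a string inside the JSON body. A small sketch of extracting it as a float follows, using the response shown above as a literal. `extract_prediction` is a hypothetical helper, not part of any Triton client.

```python
import json

# Response body copied from the output above.
RAW_RESPONSE = (
    '{"model_name":"modelo_regressao","model_version":"1",'
    '"outputs":[{"name":"PREDICAO","datatype":"BYTES","shape":[1],'
    '"data":["121873.0"]}]}'
)


def extract_prediction(body, output_name="PREDICAO"):
    """Return the values of the named output tensor converted to floats."""
    outputs = json.loads(body)["outputs"]
    for output in outputs:
        if output["name"] == output_name:
            return [float(value) for value in output["data"]]
    raise KeyError(f"output {output_name!r} not found in response")


predictions = extract_prediction(RAW_RESPONSE)
print(predictions)
```
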

## Quick verification commands

> Python 3.10.11 is required to install the libraries below

In [None]:
!pip install tritonclient

In [None]:
!pip install gevent

In [8]:
!pip install geventhttpclient==1.5.4

Collecting geventhttpclient==1.5.4
  Downloading geventhttpclient-1.5.4-cp310-cp310-win_amd64.whl (37 kB)
Collecting brotli
  Using cached Brotli-1.1.0-cp310-cp310-win_amd64.whl (357 kB)
Collecting certifi
  Using cached certifi-2024.2.2-py3-none-any.whl (163 kB)
Installing collected packages: brotli, certifi, geventhttpclient
Successfully installed brotli-1.1.0 certifi-2024.2.2 geventhttpclient-1.5.4


Reason for being yanked: Accidentally introduced a backwards incompatible change see https://github.com/locustio/locust/pull/2083

[notice] A new release of pip is available: 23.0.1 -> 24.0
[notice] To update, run: python.exe -m pip install --upgrade pip


In [9]:
# Creating a client to communicate with Triton
import tritonclient.http as httpclient

triton_client = httpclient.InferenceServerClient(url="localhost:8000", verbose=True)

In [10]:
# Check whether the server is live and able to receive requests
triton_client.is_server_live()

GET /v2/health/live, headers {}
<HTTPSocketPoolResponse status=200 headers={'content-length': '0', 'content-type': 'text/plain'}>


True

In [11]:
# Check whether Triton is ready to serve inferences
triton_client.is_server_ready()

GET /v2/health/ready, headers {}
<HTTPSocketPoolResponse status=200 headers={'content-length': '0', 'content-type': 'text/plain'}>


True
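
As the verbose output shows, both checks are plain HTTP GETs on `/v2/health/live` and `/v2/health/ready`. If installing `tritonclient` is not an option, the same checks can be sketched with the standard library alone. `is_healthy` and its `opener` parameter are hypothetical names introduced for this example, and the base URL assumes the server is reachable on `localhost:8000` as above.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the client above


def is_healthy(path, base_url=BASE_URL, opener=urllib.request.urlopen):
    """Return True if a GET on base_url + path answers with HTTP 200."""
    try:
        with opener(base_url + path, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not healthy.
        return False


if __name__ == "__main__":
    print("live: ", is_healthy("/v2/health/live"))
    print("ready:", is_healthy("/v2/health/ready"))
```

The `opener` argument exists only so the function can be exercised without a running server.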

In [12]:
# Model metadata
triton_client.get_model_metadata("modelo_regressao")

GET /v2/models/modelo_regressao, headers {}
<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '189'}>
bytearray(b'{"name":"modelo_regressao","versions":["1"],"platform":"python","inputs":[{"name":"input","datatype":"FP32","shape":[-1,-1]}],"outputs":[{"name":"PREDICAO","datatype":"BYTES","shape":[1]}]}')


{'name': 'modelo_regressao',
 'versions': ['1'],
 'platform': 'python',
 'inputs': [{'name': 'input', 'datatype': 'FP32', 'shape': [-1, -1]}],
 'outputs': [{'name': 'PREDICAO', 'datatype': 'BYTES', 'shape': [1]}]}

## Documentation

https://github.com/triton-inference-server

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

