# Create a server to run an already-trained TensorFlow + Keras model using TensorFlow Serving
Sources:

https://www.tensorflow.org/tfx/guide/serving?hl=en-419

https://www.tensorflow.org/tfx/tutorials/serving/rest_simple?hl=es-419#install_tensorflow_serving

https://towardsdatascience.com/deploying-machine-learning-models-with-tensorflow-serving-an-introduction-6d49697a1315#d0c4

https://towardsdatascience.com/deploying-keras-models-using-tensorflow-serving-and-flask-508ba00f1037

https://blog.stackademic.com/make-our-local-server-application-public-using-ngrok-18b67acbd9bb?gi=79c2b55712e4

Note: The code in the tutorial used here runs TensorFlow Serving natively, but it can also be run in Docker, which is one of the easiest ways to get started with TensorFlow Serving.
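
For the Docker route mentioned in the note, the sketch below assembles a `docker run` command for the public `tensorflow/serving` image. The helper name `build_docker_command`, the example path, and the host port are hypothetical choices for illustration. The mount target `/models/<name>`, the `MODEL_NAME` variable, and the container REST port 8501 follow that image's usual conventions. Treat it as a starting point under those assumptions, not as the method used in the rest of this notebook.

```python
import shlex

def build_docker_command(model_dir, model_name, host_port=8501):
    """Return the argv list for running TensorFlow Serving in Docker.

    model_dir must contain numeric version subdirectories (e.g. model_dir/1).
    """
    return [
        "docker", "run", "--rm",
        # map the chosen host port to the container's REST port
        "-p", f"{host_port}:8501",
        # expose the local model directory inside the container
        "--mount", f"type=bind,source={model_dir},target=/models/{model_name}",
        "-e", f"MODEL_NAME={model_name}",
        "tensorflow/serving",
    ]

cmd = build_docker_command("/content/model", "IRIS", host_port=5000)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd)` on a machine where Docker is installed; Colab itself does not provide a Docker daemon.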

# Setting up TensorFlow Serving

In [1]:
#@title Download TensorFlow Serving
import sys
# We need sudo prefix if not on a Google Colab.
if 'google.colab' not in sys.modules:
  SUDO_IF_NEEDED = 'sudo'
else:
  SUDO_IF_NEEDED = ''

# This is the same as you would do from your command line, but without the [arch=amd64], and no sudo
# You would instead do:
# echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
# curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | {SUDO_IF_NEEDED} tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | {SUDO_IF_NEEDED} apt-key add -
!{SUDO_IF_NEEDED} apt update

deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2943  100  2943    0     0   9627      0 --:--:-- --:--:-- --:--:--  9617
OK
Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease [3,026 B]
Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Get:4 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Get:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:6 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]
Get:8 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server-

In [2]:
#@title Install TensorFlow Serving
# TODO: Use the latest model server version when colab supports it.
#!{SUDO_IF_NEEDED} apt-get install tensorflow-model-server
# We need to install TensorFlow Model Server 2.8.0 instead of the latest version.
# TensorFlow Serving >2.9.0 requires `GLIBC_2.29` and `GLIBCXX_3.4.26`. The Colab environment does not currently support that version of `GLIBC`, so the workaround is to pin TensorFlow Serving to `2.8.0`.
!wget 'http://storage.googleapis.com/tensorflow-serving-apt/pool/tensorflow-model-server-2.8.0/t/tensorflow-model-server/tensorflow-model-server_2.8.0_all.deb'
!dpkg -i tensorflow-model-server_2.8.0_all.deb
!pip3 install tensorflow-serving-api==2.8.0

--2023-12-01 14:20:40--  http://storage.googleapis.com/tensorflow-serving-apt/pool/tensorflow-model-server-2.8.0/t/tensorflow-model-server/tensorflow-model-server_2.8.0_all.deb
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.201.207, 74.125.69.207, 64.233.181.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.201.207|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 340152790 (324M) [application/x-debian-package]
Saving to: ‘tensorflow-model-server_2.8.0_all.deb’


2023-12-01 14:20:42 (151 MB/s) - ‘tensorflow-model-server_2.8.0_all.deb’ saved [340152790/340152790]

Selecting previously unselected package tensorflow-model-server.
(Reading database ... 120882 files and directories currently installed.)
Preparing to unpack tensorflow-model-server_2.8.0_all.deb ...
Unpacking tensorflow-model-server (2.8.0) ...
Setting up tensorflow-model-server (2.8.0) ...
Collecting tensorflow-serving-api==2.8.0
  Downloading tensorflow
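
The version pin above depends on the glibc available in the runtime. As a rough check, the sketch below reads the local glibc version with the standard library and compares it with 2.29, the minimum quoted in the cell's comment. The helper `glibc_at_least` is a hypothetical name, and the check does not cover `GLIBCXX`, so a positive result is only a hint that a newer TensorFlow Serving might run.

```python
import platform

def glibc_at_least(version_text, minimum=(2, 29)):
    """Compare a dotted glibc version string such as '2.35' with a minimum."""
    try:
        parts = tuple(int(p) for p in version_text.split(".")[:2])
    except ValueError:
        # empty or non-numeric version text
        return False
    return len(parts) == 2 and parts >= minimum

lib, version = platform.libc_ver()
if lib == "glibc":
    print("glibc", version, "meets the 2.29 minimum:", glibc_at_least(version))
else:
    print("glibc version could not be determined on this platform")
```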

In [3]:
#@title Mount Google Drive

# Note: the first time, you must confirm access by logging in to "Google Drive File Stream" and obtaining an authentication code.
from google.colab import drive
drive.mount('/content/gdrive')


Mounted at /content/gdrive


In [4]:
#@title Define parameters for TensorFlow Serving

path_modelo = '/content/gdrive/My Drive/IA/demoModelDeployment/modelo'  #@param {type:"string"}
port_number = "5000" #@param {type:"string"}
model_name = "IRIS" #@param {type:"string"}

import os
import shutil

# copy the model to a temporary path, because TensorFlow Serving
# expects the model inside a numeric version subdirectory (here "1")
path_modelo_TFServ = "/content/model"
shutil.copytree(path_modelo, path_modelo_TFServ+"/1",  dirs_exist_ok=True)

# export the parameters as environment variables
os.environ["MODEL_PORT"] = port_number
os.environ["MODEL_NAME"] = model_name
os.environ["MODEL_DIR"] = path_modelo_TFServ

modelURL_paraNGROK = 'http://localhost:'+port_number
serviceName_paraNGROK = '/v1/models/'+model_name+':predict'
modelURL = modelURL_paraNGROK + serviceName_paraNGROK

print("\n > Local model will be served at:", modelURL, "\n")


 > Local model will be served at: http://localhost:5000/v1/models/IRIS:predict 
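
Because the server only picks up numeric version subdirectories, it can help to confirm the layout before starting it. The sketch below lists the version directories under a base path that contain a `saved_model.pb` file. The helper `find_servable_versions` is a hypothetical name, and the check assumes the model was exported in SavedModel format. The demonstration runs on a throwaway directory so it does not touch the real model.

```python
import os
import tempfile

def find_servable_versions(base_path):
    """Return the numeric version subdirectories of base_path that hold a saved_model.pb."""
    versions = []
    for entry in os.listdir(base_path):
        version_dir = os.path.join(base_path, entry)
        if entry.isdigit() and os.path.isfile(os.path.join(version_dir, "saved_model.pb")):
            versions.append(int(entry))
    return sorted(versions)

# Self-contained demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "1"))
    open(os.path.join(base, "1", "saved_model.pb"), "wb").close()
    os.makedirs(os.path.join(base, "notes"))  # ignored: not a numeric name
    demo_versions = find_servable_versions(base)

print(demo_versions)
```

In the notebook, the same function could be called as `find_servable_versions(path_modelo_TFServ)`; an empty list would suggest the copy did not produce a loadable version.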



In [5]:
#@title Run TensorFlow Serving with the available model
%%bash --bg
nohup tensorflow_model_server \
  --rest_api_port=${MODEL_PORT} \
  --model_name="${MODEL_NAME}" \
  --model_base_path="${MODEL_DIR}" >server.log 2>&1


In [6]:
#@title Wait 10 seconds for the server above to start...
import time
time.sleep(10)
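
A fixed sleep may be too short for a large model and needlessly long for a small one. One alternative is to poll the model status endpoint of the TensorFlow Serving REST API (`GET /v1/models/<name>`) until a version reports the state `AVAILABLE`. The sketch below assumes that response shape; the names `fetch_status` and `wait_until_available`, the retry count, and the delay are hypothetical.

```python
import json
import time
import urllib.request

def fetch_status(url):
    """GET the model status document from TensorFlow Serving and decode it."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)

def wait_until_available(url, fetch=fetch_status, retries=20, delay=1.0):
    """Poll the status URL until some model version reports state AVAILABLE."""
    for _ in range(retries):
        try:
            status = fetch(url)
            states = [v.get("state") for v in status.get("model_version_status", [])]
            if "AVAILABLE" in states:
                return True
        except (OSError, ValueError):
            # server not listening yet (URLError is an OSError) or invalid JSON
            pass
        time.sleep(delay)
    return False
```

With the parameters of this notebook, the call would be `wait_until_available("http://localhost:5000/v1/models/IRIS")`. The `fetch` argument exists only so the polling logic can be exercised without a running server.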

In [7]:
#@title Show the TensorFlow Serving log
!tail server.log

2023-12-01 14:21:32.757867: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:301] SavedModel load for tags { serve }; Status: success: OK. Took 105946 microseconds.
2023-12-01 14:21:32.758870: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /content/model/1/assets.extra/tf_serving_warmup_requests
2023-12-01 14:21:32.759731: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: IRIS version: 1}
2023-12-01 14:21:32.760798: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
2023-12-01 14:21:32.760859: I tensorflow_serving/model_servers/server.cc:133] Using InsecureServerCredentials
2023-12-01 14:21:32.760928: I tensorflow_serving/model_servers/server.cc:391] Profiler service is enabled
2023-12-01 14:21:32.761668: I tensorflow_serving/model_servers/server.cc:417] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for noden
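
The line to look for in this log is `Successfully loaded servable version`. As a sketch, the check can be automated with a regular expression. The pattern is derived from the log output shown above and may not match other TensorFlow Serving versions; `loaded_servables` is a hypothetical helper name.

```python
import re

# Pattern taken from the log line format shown in the output above.
LOADED_PATTERN = re.compile(
    r"Successfully loaded servable version \{name: (\S+) version: (\d+)\}"
)

def loaded_servables(log_text):
    """Return (name, version) pairs for every servable the log reports as loaded."""
    return [(name, int(version)) for name, version in LOADED_PATTERN.findall(log_text)]

sample = (
    "2023-12-01 14:21:32.759731: I tensorflow_serving/core/loader_harness.cc:87] "
    "Successfully loaded servable version {name: IRIS version: 1}"
)
print(loaded_servables(sample))
```

In the notebook, the function could be applied to the whole file with `loaded_servables(open("server.log").read())`, which avoids depending on the last ten lines that `tail` prints.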

# Testing TensorFlow Serving locally

In [8]:
#@title Load the trained model's companion files
import os
import joblib

# load the scaler (if it exists)
fn_scaler = path_modelo+"/scaler.joblib"
if os.path.isfile(fn_scaler):
  scaler = joblib.load(fn_scaler)
  print("\n* Scaler loaded from ", fn_scaler, "\n")
else:
  scaler = None
  print("\n* Scaler not found at ", fn_scaler, "\n")

# load the class names, one per line (if the file exists)
fn_clases = path_modelo+"/CLASES.txt"
CLASES = []
if os.path.isfile(fn_clases):
  with open(fn_clases, 'r') as f:
    # read the data
    auxData = f.readlines()
  for c in auxData:
    CLASES.append( c.replace("\n", "") )
  print("\n* CLASES loaded from ", fn_clases, ":")
  print("\t\t", CLASES, "\n")
else:
  print("\n* CLASES not found at ", fn_clases, "\n")



* Scaler not found at  /content/gdrive/My Drive/IA/demoModelDeployment/modelo/scaler.joblib 


* CLASES loaded from  /content/gdrive/My Drive/IA/demoModelDeployment/modelo/CLASES.txt :
		 ['na', 'Setosa', 'Versicolor', 'Virginica'] 



In [11]:
#@title Test the model using local TensorFlow Serving

import numpy as np
import requests
import json

def ejecutarModeloURL(vals, modelURL):
  # POST the instances to the TensorFlow Serving REST predict endpoint
  headers = {"content-type": "application/json"}
  data_json = json.dumps({"signature_name": "serving_default", "instances": vals})
  json_response = requests.post(modelURL, data=data_json, headers=headers)
  return json.loads(json_response.text)

def clasificarIris(valMedidas, modelURL):
  # normalize the data (if the scaler is defined)
  if (scaler is not None):
    valMedidas = scaler.transform(valMedidas)
  # convert to plain lists, because NumPy arrays are not JSON serializable
  valMedidas = np.asarray(valMedidas).tolist()
  # run the model
  resModel = ejecutarModeloURL(valMedidas, modelURL)
  # at least one result was produced
  if (len(resModel) > 0) and ("predictions" in resModel):
    resClases = []
    for r in resModel["predictions"]:
      if (len(r) == 1):
        # a single output is assumed to be a linear output (just round it)
        claseID = round(r[0])
      else:
        # several outputs are assumed to be softmax scores (take the highest)
        claseID = int( np.argmax(r, axis=0) )
      # look up the class description (if defined)
      if (CLASES is not None) and (len(CLASES)>0) and (claseID<len(CLASES)):
        claseDesc = CLASES[claseID]
      else:
        claseDesc = str(claseID)
      resClases.append( { "claseID" : claseID, "clase" : claseDesc} )
    return resClases
  else:
    # show the raw response (usually an error message from the server)
    print(resModel)
    return []

# run
LargoSepalo = 5.5 #@param{type:"number"}
AnchoSepalo = 2.6 #@param{type:"number"}
LargoPetalo = 4.4 #@param{type:"number"}
AnchoPetalo = 1.2 #@param{type:"number"}

vals = [[LargoSepalo, AnchoSepalo, LargoPetalo, AnchoPetalo]]
res = clasificarIris(vals, modelURL)
if len(res)==0:
  print("--> Class not defined!")
else:
  for r in res:
    print("--> Class " + str(r))


--> Class {'claseID': 2, 'clase': 'Versicolor'}
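
To make the request and response shapes of the predict endpoint explicit, the sketch below builds the JSON body in the row ("instances") format and decodes a response without contacting the server. The helper names are hypothetical, and the scores in `fake_response` are invented for illustration; only the `instances`/`predictions` keys reflect the TensorFlow Serving REST API.

```python
import json
import numpy as np

def build_predict_payload(rows, signature="serving_default"):
    """Serialize feature rows into the body expected by the :predict endpoint."""
    rows = np.asarray(rows, dtype=float).tolist()  # plain lists are JSON serializable
    return json.dumps({"signature_name": signature, "instances": rows})

def decode_predictions(response_text):
    """Return the index of the highest score for each prediction row."""
    predictions = json.loads(response_text)["predictions"]
    return [int(np.argmax(scores)) for scores in predictions]

payload = build_predict_payload([[5.5, 2.6, 4.4, 1.2]])
print(payload)

# Invented scores, shaped like a 4-class softmax output.
fake_response = '{"predictions": [[0.01, 0.02, 0.9, 0.07]]}'
print(decode_predictions(fake_response))
```

Each row in `instances` yields one entry in `predictions`, so several flowers can be classified with a single request.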


In [14]:
#@title Test the model using local TensorFlow Serving with examples from a CSV containing the data shown in <construir-RNA-MLP-IRIS.ipynb>
from google.colab import files

def cargarArchivo(fn):
  # read the file and split it into lines
  with open(fn, 'r', encoding='utf-8') as f:
    uploadedData = f.read().splitlines()

  print('-- File "{name}" with length {length} bytes loaded'.format(
    name=fn, length=len(uploaded[fn])))

  return uploadedData

# upload the file
uploaded = files.upload()
# collect the lines of every uploaded file
uploadedData = []
for fn in uploaded.keys():
    uploadedData.extend( cargarArchivo(fn))

if len(uploadedData)>0:
  print("\n")
  # process the loaded data: four measurements plus the real class, separated by ";"
  classReal = []
  classPreds = []
  for data in uploadedData:
    if data.strip()!="":
      arAux = data.split(";")
      vals = [[float(arAux[0]), float(arAux[1]), float(arAux[2]), float(arAux[3])]]
      classReal.append( arAux[4].strip() )
      claseModelo = clasificarIris(vals, modelURL)
      if len(claseModelo)==0:
        classPreds.append("Class not defined!")
      else:
        classPreds.append( str(claseModelo[0]["claseID"]) )
  print("\n")


  from sklearn.metrics import classification_report
  from sklearn.metrics import confusion_matrix
  import pandas as pd

  # show the classification report
  print("\n Classification report: ")
  print(classification_report(classReal, classPreds))

  # show the confusion matrix
  # (rows and columns are positional: index 0, 1, 2 corresponds to the sorted labels)
  print('\nConfusion matrix ( real / model ): ')
  cm = confusion_matrix(classReal, classPreds)
  cmtx = pd.DataFrame(
      cm
    )
  # raise the display limits so the full confusion matrix is shown
  pd.options.display.max_rows = 100
  pd.options.display.max_columns = 100
  cmtx.sort_index(axis=0, inplace=True)
  cmtx.sort_index(axis=1, inplace=True)
  print(cmtx)
  print("\n")


Saving datos_PRUEBA.csv to datos_PRUEBA (1).csv
-- File "datos_PRUEBA (1).csv" with length 684 bytes loaded





 Classification report: 
              precision    recall  f1-score   support

           1       1.00      1.00      1.00        12
           2       0.92      0.85      0.88        13
           3       0.86      0.92      0.89        13

    accuracy                           0.92        38
   macro avg       0.92      0.92      0.92        38
weighted avg       0.92      0.92      0.92        38


Confusion matrix ( real / model ): 
    0   1   2
0  12   0   0
1   0  11   2
2   0   1  12
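
The figures in the classification report follow directly from the confusion matrix. As a worked check, the sketch below recomputes per-class recall, per-class precision, and overall accuracy from the matrix printed above, with rows as real classes and columns as predicted classes. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: real, columns: model).
cm = np.array([[12, 0, 0],
               [0, 11, 2],
               [0, 1, 12]])

true_positives = np.diag(cm)
recall = true_positives / cm.sum(axis=1)      # correct / all samples of the real class
precision = true_positives / cm.sum(axis=0)   # correct / all samples predicted as the class
accuracy = true_positives.sum() / cm.sum()    # correct / all samples

print("recall   :", np.round(recall, 2))
print("precision:", np.round(precision, 2))
print("accuracy :", round(float(accuracy), 2))
```

For the second class, for example, recall is 11/13 ≈ 0.85 and precision is 11/12 ≈ 0.92, which agrees with the report; accuracy is 35/38 ≈ 0.92.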




# Publishing TensorFlow Serving with ngrok (optional)

In [15]:
#@title Install pyngrok (optional)

usar_ngrok_web_publica = True #@param{type:"boolean"}

if usar_ngrok_web_publica:
  # Install pyngrok
  !pip install pyngrok
  print("")
else:
  print("- ngrok is not used.")


Collecting pyngrok
  Downloading pyngrok-7.0.1.tar.gz (731 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyngrok
  Building wheel for pyngrok (setup.py) ... done
  Created wheel for pyngrok: filename=pyngrok-7.0.1-py3-none-any.whl size=21122 sha256=8c12b6732dbf77cb288dacdfb1117eb67115e68fbdb9c408dcaf332006012707
  Stored in directory: /root/.cache/pip/wheels/3b/32/0e/27789b6fde02bf2b320d6f1a0fd9e1354b257c5f75eefc29bc
Successfully built pyngrok
Installing collected packages: pyngrok
Successfully installed pyngrok-7.0.1



In [16]:
#@title Set up the ngrok connection to make the site public (optional)

ngrok_auth_token = "" #@param {type:"string"}

if usar_ngrok_web_publica:
  import getpass
  from pyngrok import ngrok, conf

  # determine the ngrok authentication token
  if (ngrok_auth_token == ""):
    print("Enter the authtoken shown at https://dashboard.ngrok.com/get-started/your-authtoken after signing up")
    conf.get_default().auth_token = getpass.getpass()
  else:
    conf.get_default().auth_token = ngrok_auth_token
  print("")
  # Open a TCP ngrok tunnel to the SSH server
  connection_string = ngrok.connect("5555", "tcp").public_url
  print("")
  # remove the "tcp://" prefix; str.strip() removes a set of characters, not a prefix
  ssh_url, port = connection_string.replace("tcp://", "", 1).split(":")
  print("")
  print(f" * ngrok tunnel created, reachable with `ssh root@{ssh_url} -p{port}`")

  # Open a ngrok tunnel to the HTTP server
  public_url = ngrok.connect(port_number).public_url
  print(" * ngrok tunnel defined: \"{}\" <-> \"{}\"".format(public_url, modelURL_paraNGROK))

  # ... Update inbound traffic via APIs to use the public-facing ngrok URL
  ngrok_public_Web_API = public_url + serviceName_paraNGROK

  print("\n > ngrok public Web API available at: ",  ngrok_public_Web_API)
else:
  print("- ngrok is not used.")


Enter the authtoken shown at https://dashboard.ngrok.com/get-started/your-authtoken after signing up
··········


 * ngrok tunnel created, reachable with `ssh root@2.tcp.ngrok.io -p13433`
 * ngrok tunnel defined: "https://c772-34-68-28-183.ngrok-free.app" <-> "http://localhost:5000"

 > ngrok public Web API available at:  https://c772-34-68-28-183.ngrok-free.app/v1/models/IRIS:predict
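
The TCP tunnel address arrives as a `tcp://host:port` string that has to be split into host and port. One standard-library option is `urllib.parse.urlsplit`, sketched below with the address from the output above. The helper `split_tcp_url` and the host `pt.example` are hypothetical. The last line illustrates why `str.strip` is risky for removing a prefix: its argument is treated as a set of characters, so it can also consume the start of the host name.

```python
from urllib.parse import urlsplit

def split_tcp_url(url):
    """Return (host, port) from a URL such as tcp://2.tcp.ngrok.io:13433."""
    parts = urlsplit(url)
    return parts.hostname, parts.port

host, port = split_tcp_url("tcp://2.tcp.ngrok.io:13433")
print(host, port)

# strip() removes any leading/trailing characters from the set {t, c, p, :, /},
# so a host that begins with those letters loses part of its name.
stripped = "tcp://pt.example:1".strip("tcp://")
print(stripped)
```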
