<h1>Installing the TensorFlow ModelServer</h1>
This package is the tool TensorFlow Serving uses to expose models through REST API calls.

In [0]:
%%bash
echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
apt-get update && apt-get install -y tensorflow-model-server

deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal
OK


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2943  100  2943    0     0   4465      0 --:--:-- --:--:-- --:--:--  4465


<h1>Parsing the file to build the dataset</h1>
The file is parsed and each line is split into its features.
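
As a rough illustration of the split, here is a minimal plain-Python sketch. The delimiter and the column order are taken from the header line of the uploaded file; the `parse_record` helper and the sample values are hypothetical and only serve to show the idea.

```python
# Delimiter and column names come from the header of sample_text.txt;
# parse_record and the sample line below are hypothetical illustrations.
DELIMITER = '*****||*****'
COLUMNS = ['dominio', 'seccion', 'url', 'title', 'tags', 'pretitle',
           'posttitle', 'imagen_url', 'imagen_descripcion', 'texto']

def parse_record(line):
    """Split one line of the file into a dict keyed by column name."""
    # At most 9 splits, so delimiter-like text inside the last column stays intact
    fields = line.split(DELIMITER, 9)
    return dict(zip(COLUMNS, fields))

sample = DELIMITER.join(['example.com', 'mundo', 'https://example.com/a',
                         'Some title', 'Tag A|Tag B|', 'pre', 'post',
                         'img', 'desc', 'body text'])
record = parse_record(sample)

# Tags are separated by '|' inside their own column
tags = record['tags'].strip('|').split('|')
print(record['seccion'], record['title'], tags)
```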

In [0]:
from google.colab import files
files.upload()

Saving sample_text.txt to sample_text.txt


{'sample_text.txt': b'dominio*****||*****seccion*****||*****url*****||*****title*****||*****tags*****||*****pretitle*****||*****posttitle*****||*****imagen_url*****||*****imagen_descripcion*****||*****texto\r\nwww.elconfidencial.com*****||*****mundo*****||*****https://www.elconfidencial.com/mundo/europa/2019-09-03/ultima-estocada-a-salvini-el-79-de-los-inscritos-del-m5s-apoya-la-nueva-coalicion_2208291/*****||*****Las bases del M5S apoyan por mayor\xc3\xada la coalici\xc3\xb3n de gobierno en Italia con el PD*****||*****Socialdemocracia|Movimiento 5 Estrellas|*****||*****\xc3\x9aLTIMA ESTOCADA A SALVINI*****||*****El 79% de los inscritos del Movimiento Cinco Estrellas (M5S) vot\xc3\xb3 hoy a favor del Gobierno en coalici\xc3\xb3n con el Partido Dem\xc3\xb3crata (PD), con Giuseppe Conte como primer ministro*****||*****https://www.ecestaticos.com/imagestatic/clipping/8b3/17c/8b317c51beca47d40dc5be6c8388e01a/el-gobierno-de-conte-ii-dara-un-respiro-a-italia-pero-no-lograra-frenar-a-salvini.

In [0]:
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
  
import tensorflow as tf
import tensorflow_datasets as tfds
import keras
from keras.layers import Input, Dense, concatenate
from keras.models import Model
import numpy as np

Using TensorFlow backend.


<h1>Building the model</h1>
The text classifier is composed of two models: the tokenizer and the classifier.
<ul>
<li> The tokenizer converts text sequences into vectors and must be fitted on the data</li>
<li> The classifier uses the tokenizer output for both the model's inputs and its target outputs</li>
</ul>
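
To make the tokenizer's role concrete, here is a simplified plain-Python sketch of fitting a vocabulary and turning a text into a binary vector. It approximates the idea behind the Keras `Tokenizer` and `texts_to_matrix`, under the assumption that punctuation filtering can be ignored; the helper names `fit_vocabulary` and `to_multi_hot` are hypothetical.

```python
from collections import Counter

def fit_vocabulary(texts, split=' '):
    """Give each token an index starting at 1, most frequent first (0 is reserved)."""
    counts = Counter(tok for text in texts
                     for tok in text.lower().split(split) if tok)
    return {tok: i + 1 for i, (tok, _) in enumerate(counts.most_common())}

def to_multi_hot(text, vocabulary, num_words, split=' '):
    """Return a 0/1 vector of length num_words marking which tokens occur."""
    row = [0] * num_words
    for tok in text.lower().split(split):
        idx = vocabulary.get(tok)
        # Tokens outside the vocabulary or beyond num_words are dropped
        if idx is not None and idx < num_words:
            row[idx] = 1
    return row

# Tags use '|' as separator, as in the notebook's tags tokenizer
tag_vocab = fit_vocabulary(['Socialdemocracia|Movimiento 5 Estrellas|'], split='|')
tag_vector = to_multi_hot('Movimiento 5 Estrellas|', tag_vocab, num_words=4, split='|')
print(tag_vocab, tag_vector)
```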

In [0]:
# Training the tokenizer
lines_dataset = tf.data.TextLineDataset('./sample_text.txt')
training_samples = 28
INPUTS = ['seccion', 'title']
INLEN = [255,500]
OUTPUTS = ['tags']
OUTLEN = [255]
# Splitting the dataset
def parsing(line):
  """
  Parse the file and split by category
  """
  splitted = tf.strings.split(line, sep='*****||*****', maxsplit=9) 
  #tags = tf.strings.split(splitted[4], '|')
  return (splitted[1], splitted[3]), splitted[4]

parsed_data = lines_dataset.map(parsing)

# Creating the encoders
tags_tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=255, split='|')
section_tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=255)
title_tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=500)

tags_vocabulary_set = set()
section_vocabulary_set = set()
title_vocabulary_set = set()
for (section, title), tags in parsed_data.skip(1).take(training_samples):
  tags_vocabulary_set.update([tags.numpy().decode()])
  section_vocabulary_set.update([section.numpy().decode()])
  title_vocabulary_set.update([title.numpy().decode()])

tags_tokenizer.fit_on_texts(list(tags_vocabulary_set))
section_tokenizer.fit_on_texts(list(section_vocabulary_set))
title_tokenizer.fit_on_texts(list(title_vocabulary_set))



In [0]:
# Preparing the data pipeline
def encoder_data(text_tensor, label):
  """
  Uses the fitted tokenizers to convert the text sequences into arrays
  Note: The tag array contains the labels
  """
  section = section_tokenizer.texts_to_matrix([text_tensor[0].numpy().decode()])[0] 
  title = title_tokenizer.texts_to_matrix([text_tensor[1].numpy().decode()])[0]  
  tag = tags_tokenizer.texts_to_matrix([label.numpy().decode()])[0]
  return section, title, tag

def encode_map_fn(text_tensor, label):
  result = tf.py_function(encoder_data, inp=[text_tensor, label], Tout=(tf.int64, tf.int64, tf.int64))
  return {'seccion': result[0], 'title':result[1]}, {'tags':result[2]}

# Skipping the first line and using a batchsize of 14
train_data = parsed_data.map(encode_map_fn).skip(1)
train_data = train_data.take(training_samples).padded_batch(14, padded_shapes=({'seccion':[None], 'title':[None]},{'tags':[None]}))



In [0]:
for i in train_data:
  print(i[0]['seccion'].shape, i[0]['title'].shape, i[1]['tags'].shape)

(14, 255) (14, 500) (14, 255)
(14, 255) (14, 500) (14, 255)


In [0]:
# Creating the model

section_input = tf.keras.layers.Input(shape=(255,), name='seccion', dtype=tf.int64)
title_input = tf.keras.layers.Input(shape=(500,), name='title', dtype=tf.int64)
section_emb = tf.keras.layers.Embedding(1024,32)(section_input)
title_emb = tf.keras.layers.Embedding(1024,32)(title_input)
mixed = tf.keras.layers.concatenate([tf.keras.layers.Flatten()(section_emb), tf.keras.layers.Flatten()(title_emb)])
out = tf.keras.layers.Dense(255, activation='sigmoid', name='tags')(mixed)
model = tf.keras.models.Model([section_input,title_input], [out])
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
seccion (InputLayer)            [(None, 255)]        0                                            
__________________________________________________________________________________________________
title (InputLayer)              [(None, 500)]        0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 255, 32)      32768       seccion[0][0]                    
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 500, 32)      32768       title[0][0]                      
______________________________________________________________________________________________
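
The parameter counts in the summary can be checked by hand. The short calculation below is a sketch that uses only the layer sizes declared in the cell above; the Dense count is derived arithmetic, not a value read from the truncated summary.

```python
# Layer sizes as declared in the model cell
vocab_size, emb_dim = 1024, 32
section_len, title_len, num_tags = 255, 500, 255

# Each Embedding layer stores one emb_dim vector per vocabulary entry
embedding_params = vocab_size * emb_dim

# Flatten + concatenate yields one long feature vector per example
flat_features = (section_len + title_len) * emb_dim

# Dense layer: one weight per (feature, tag) pair plus one bias per tag
dense_params = flat_features * num_tags + num_tags

print(embedding_params)  # 32768, matching the summary rows for both embeddings
print(flat_features)     # 24160
print(dense_params)      # 6161055
```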

In [0]:
model.compile(loss='binary_crossentropy')
model.fit(train_data, epochs=100)

Epoch 1/100
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Epoch 2/100
...
Epoch 70/100

<tensorflow.python.keras.callbacks.History at 0x7f9ca021f2b0>

In [0]:
class MyWordEmbeddingLayerLayer(tf.keras.layers.Layer):
  """Maps raw strings to fixed-length sequences of word ids using a fitted tokenizer."""
  def __init__(self, tokenizer, output_size=255):
    super(MyWordEmbeddingLayerLayer, self).__init__()
    # Static word -> id table built from the tokenizer vocabulary
    key_values = tf.lookup.KeyValueTensorInitializer(list(tokenizer.word_index.keys()),
                                                     list(tokenizer.word_index.values()),
                                                     key_dtype=tf.string, value_dtype=tf.int64)
    # Out-of-vocabulary words are hashed into 10 extra buckets
    self.table = tf.lookup.StaticVocabularyTable(key_values, 10)
    self.lookup = tf.function(self.table.lookup)
    self.output_size = output_size

  def call(self, input_string):
    # Split each string on whitespace into a ragged batch of tokens
    queries = tf.ragged.map_flat_values(tf.strings.split, input_string)
    # Look up the ids and zero-pad the ragged result into a dense tensor
    word_embeddings = tf.ragged.map_flat_values(lambda x: self.lookup(x), queries).to_tensor(default_value=0)
    # Pad the last axis by output_size, then truncate to exactly output_size ids
    word_embeddings = tf.pad(word_embeddings, [[0,0],[0,0], [0,self.output_size]], 'CONSTANT')[:,:,:self.output_size]
    return tf.reshape(word_embeddings, [queries.nrows(), self.output_size])
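
The effect of this layer on a single string can be sketched in plain Python. This is a simplification rather than the layer itself: the hypothetical `encode_fixed_length` helper sends every unknown word to one assumed `oov_id`, whereas `StaticVocabularyTable` hashes unknown words into buckets placed after the vocabulary.

```python
def encode_fixed_length(text, word_index, output_size, oov_id=0):
    """Split on whitespace, map words to ids, then pad or truncate to output_size."""
    ids = [word_index.get(tok, oov_id) for tok in text.split()]
    ids = ids[:output_size]                       # truncate long inputs
    return ids + [0] * (output_size - len(ids))   # zero-pad short inputs

# Hypothetical vocabulary, in the shape of tokenizer.word_index
word_index = {'la': 1, 'socialdemocracia': 2}
encoded = encode_fixed_length('la socialdemocracia', word_index, output_size=5)
print(encoded)
```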

In [0]:
section_input = tf.keras.layers.Input(shape=(1,), name='seccion', dtype=tf.string)
title_input = tf.keras.layers.Input(shape=(1,), name='title', dtype=tf.string)
section_encode = MyWordEmbeddingLayerLayer(section_tokenizer)(section_input)
title_encode = MyWordEmbeddingLayerLayer(title_tokenizer, 500)(title_input)
#test_model = tf.keras.models.Model([section_input, title_input], [section_encode, title_encode])
#test_model.summary()
#test_model.predict([['mundo'],['la socialdemocracia']])[0].shape, test_model.predict([['mundo'],['la socialdemocracia']])[1].shape

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
seccion (InputLayer)            [(None, 1)]          0                                            
__________________________________________________________________________________________________
title (InputLayer)              [(None, 1)]          0                                            
__________________________________________________________________________________________________
my_word_embedding_layer_layer ( (None, 255)          0           seccion[0][0]                    
__________________________________________________________________________________________________
my_word_embedding_layer_layer_1 (None, 500)          0           title[0][0]                      
Total params: 0
Trainable params: 0
Non-trainable params: 0
________________________________

((1, 255), (1, 500))

In [0]:
new_outputs = model({'seccion':section_encode, 'title':title_encode})
new_model = tf.keras.models.Model(inputs={'seccion':section_input, 'title':title_input}, outputs={'tags':new_outputs})
new_model.summary()
new_model.predict([['mundo'],['la socialdemocracia']])

Model: "model_3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
seccion (InputLayer)            [(None, 1)]          0                                            
__________________________________________________________________________________________________
title (InputLayer)              [(None, 1)]          0                                            
__________________________________________________________________________________________________
my_word_embedding_layer_layer ( (None, 255)          0           seccion[0][0]                    
__________________________________________________________________________________________________
my_word_embedding_layer_layer_1 (None, 500)          0           title[0][0]                      
____________________________________________________________________________________________

array([[0.00000000e+00, 1.17221475e-03, 1.20845437e-03, 1.14375353e-03,
        1.19748712e-03, 6.42240047e-05, 3.40580940e-04, 3.60280275e-04,
        3.54647636e-04, 3.60876322e-04, 6.18755817e-04, 6.18278980e-04,
        6.27845526e-04, 6.29752874e-04, 6.05225563e-04, 6.46203756e-04,
        6.31809235e-04, 6.15864992e-04, 4.49329615e-04, 4.24474478e-04,
        4.41044569e-04, 4.53263521e-04, 3.36766243e-05, 3.36170197e-05,
        6.01023436e-04, 1.21742487e-04, 1.33097172e-04, 1.28984451e-04,
        1.31040812e-04, 1.28984451e-04, 1.22338533e-04, 1.31577253e-04,
        1.28120184e-04, 1.32560730e-04, 1.23292208e-04, 1.26123428e-04,
        1.29342079e-04, 1.30087137e-04, 1.12026930e-04, 1.04367733e-04,
        1.07437372e-04, 7.15851784e-05, 7.65740871e-04, 1.55162811e-03,
        1.62097812e-03, 1.52927637e-03, 6.37650490e-04, 6.25163317e-04,
        5.96225262e-04, 3.56703997e-04, 3.41713428e-04, 3.49074602e-04,
        3.31014395e-04, 3.50326300e-04, 3.33040953e-04, 3.518760

In [0]:
import pickle 
tf.saved_model.save(new_model, "./textclassifier/1")

# saving the tokenizers
with open('section_tokenizer.pickle', 'wb') as handle:
    pickle.dump(section_tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('title_tokenizer.pickle', 'wb') as handle:
    pickle.dump(title_tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('tags_tokenizer.pickle', 'wb') as handle:
    pickle.dump(tags_tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: ./textclassifier/1/assets


<h1>Launching the server</h1>
tensorflow_model_server starts a process managed by TensorFlow Serving that serves predictions from the text classifier saved earlier.
(Restart the Colab runtime at this step.)

In [0]:
import requests

In [0]:
%%bash --bg
nohup tensorflow_model_server --port=8502 --rest_api_port=8503 --model_name=text --model_base_path=`realpath textclassifier` > server.log 2>&1

Starting job # 0 in a separate thread.


In [0]:
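try:
  print(requests.get('http://localhost:8503/v1/models/text').content.decode())
except requests.exceptions.RequestException as err:
  # The server may still be starting; report the failure instead of hiding it
  print('Model server not reachable:', err)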

{
 "model_version_status": [
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}
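
A status body like the one above can also be checked programmatically, for example before sending requests. The sketch below assumes the response has the structure shown in the output; the `model_is_available` helper is hypothetical.

```python
import json

def model_is_available(status_body, version=None):
    """Return True if a model version (optionally a specific one) is AVAILABLE."""
    status = json.loads(status_body)
    for entry in status.get('model_version_status', []):
        if version is not None and entry.get('version') != str(version):
            continue
        if entry.get('state') == 'AVAILABLE':
            return True
    return False

# Same structure as the status response printed above
sample_status = ('{"model_version_status": [{"version": "1", "state": "AVAILABLE", '
                 '"status": {"error_code": "OK", "error_message": ""}}]}')
print(model_is_available(sample_status))
```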



<h1>Evaluating the model</h1>
<p>The model is evaluated on the first entry of the text file, whose tags are: <br>Socialdemocracia|Movimiento 5 Estrellas</p>

In [0]:
import json
import numpy as np

In [0]:
title = 'Las bases del M5S apoyan por mayoría la coalición de gobierno en Italia con el PD'
section = 'mundo'
instances = [{"seccion":[section], "title":[title]}]
data = json.dumps({"signature_name": "serving_default", 'examples':instances})
requests.post('http://localhost:8503/v1/models/text:classify', data=data).content

b'{ "error": "Expected classification signature method_name to be tensorflow/serving/classify. Was: tensorflow/serving/predict" }'
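
The error says the exported signature uses the predict method, so the request would go to the `:predict` endpoint, whose row format expects an `instances` list instead of `examples`. The sketch below only builds such a request body; the `build_predict_request` helper is hypothetical, and the input keys follow the `seccion` and `title` inputs of the saved model.

```python
import json

def build_predict_request(section, title, signature_name='serving_default'):
    """Build a row-format body for TensorFlow Serving's REST predict API."""
    # Each input has shape (1,) per instance, matching the (-1, 1) string inputs
    instance = {'seccion': [section], 'title': [title]}
    return json.dumps({'signature_name': signature_name, 'instances': [instance]})

body = build_predict_request('mundo', 'la socialdemocracia')
print(body)
# The body would be POSTed to http://localhost:8503/v1/models/text:predict
```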

In [0]:
# Target output format: {'socialdemocracia': 0.5, 'otra': 0.2, ...}
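
A tag-to-score mapping in that shape could be derived from one row of scores and the tokenizer's index-to-word table. The sketch below uses hypothetical scores and a hypothetical `index_word` dict shaped like `tags_tokenizer.index_word`; the threshold is an assumption.

```python
def decode_tags(scores, index_word, threshold=0.5):
    """Map score positions above the threshold back to their tag names."""
    return {index_word[i]: score
            for i, score in enumerate(scores)
            if score > threshold and i in index_word}

# Hypothetical values: position 0 is reserved, so it has no tag
index_word = {1: 'socialdemocracia', 2: 'movimiento 5 estrellas', 3: 'otra'}
scores = [0.0, 0.9, 0.7, 0.2]
decoded = decode_tags(scores, index_word)
print(decoded)
```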

In [0]:
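import pickle

with open('tags_tokenizer.pickle', 'rb') as handle:
    tags_tokenizer = pickle.load(handle)

# The exported signature is a predict signature, so query :predict with "instances"
data = json.dumps({"signature_name": "serving_default", "instances": instances})
predictions = requests.post('http://localhost:8503/v1/models/text:predict', data=data).content

# Column indices whose score exceeds 0.5 are the predicted tag ids
_, seq = np.where(np.array(json.loads(predictions)['predictions']) > 0.5)
for tag in seq:
  print(tags_tokenizer.sequences_to_texts([[tag]]))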

FileNotFoundError: ignored

<tensorflow.python.keras.engine.training.Model at 0x7f9c56669160>

In [0]:
!saved_model_cli show --dir textclassifier/1 --tag_set serve --signature_def serving_default

The given SavedModel SignatureDef contains the following input(s):
  inputs['seccion'] tensor_info:
      dtype: DT_STRING
      shape: (-1, 1)
      name: serving_default_seccion:0
  inputs['title'] tensor_info:
      dtype: DT_STRING
      shape: (-1, 1)
      name: serving_default_title:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['model'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 255)
      name: StatefulPartitionedCall_2:0
Method name is: tensorflow/serving/predict


In [0]:
class MyModule(tf.Module):
  def __init__(self, model):
    super().__init__()
    self.model = model

  # The wrapped Keras model expects string inputs of shape (batch, 1)
  @tf.function(input_signature=[{'seccion':tf.TensorSpec(shape=(None, 1), dtype=tf.string),
                                'title': tf.TensorSpec(shape=(None, 1), dtype=tf.string)}])
  def score(self, string_inputs):
    result = self.model(string_inputs)
    return { "scores": result }
modulemodel = MyModule(new_model)

In [0]:
modulemodel.score({'seccion':[['mundo']], 'title':[['la socialdemocracia']]})

In [0]:
!saved_model_cli show --dir test/1 --tag_set serve

The given SavedModel MetaGraphDef contains SignatureDefs with the following keys:
SignatureDef key: "__saved_model_init_op"
SignatureDef key: "serving_default"


In [0]:
load_model = tf.saved_model.load('textclassifier/1')

In [0]:
load_model.signatures['serving_default']

<tensorflow.python.saved_model.load._WrapperFunction at 0x7f9c555c3f28>

In [0]:
tf.saved_model.save()

False