## EXTRACTION OF RELEVANT INFORMATION FROM CONVERSATIONS ABOUT DIABETES MELLITUS

### Master's Thesis
#### Ainhoa García Sánchez

The goal of this project is to build a model that extracts relevant information from conversations between people with diabetes mellitus (DM) and a specialized chatbot.




## 1 - Environment setup

In [None]:
!pip install rouge

In [None]:
!pip install sentencepiece

In [None]:
!pip install transformers

In [None]:
!pip install sacremoses

In [None]:
!pip install nlpaug

In [None]:
# Required libraries
import pandas as pd
import numpy as np
# Use TensorFlow 1.x (Colab magic; a trailing comment on this line triggers a warning)
%tensorflow_version 1.x
import tensorflow as tf
import re
from nltk.corpus import stopwords
import time
from tensorflow.python.layers.core import Dense
from tensorflow.python.ops.rnn_cell_impl import _zero_state_tensors
from rouge import Rouge
import operator
import nltk
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

import os
import matplotlib.pyplot as plt
from transformers import *
from sklearn.model_selection import train_test_split

import random
import nlpaug.augmenter.word as naw
import sentencepiece


print('TensorFlow Version: {}'.format(tf.__version__))



TensorFlow 1.x selected.


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


TensorFlow Version: 1.15.2


In [None]:
# Check that a GPU is available
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## 2 - Data import
This section imports the study data and takes a first look at it.

In [None]:
# Load the data (semicolon-separated CSV)
data = pd.read_csv('/content/drive/MyDrive/TFM_Diabetes/data/dialogues_ampliado_160622.csv', sep = ";")

def dataframe_ok(df):
  ''' Drop the unnamed columns Unnamed: 16 to Unnamed: 25 (CSV columns without a header)'''
  df.drop(['Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19',
           'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25'], axis = 1, inplace = True)
  return(df)


data = dataframe_ok(data)

In [None]:
data.head(1)

Unnamed: 0,Dialogue,Mood,Sport,Glucose,Glucose_binary,Insulin,Insulin_dose,Bad_food,Remedies_low_glucose_level,Remedies_high_glucose_level,Glucose_checks,Symptoms_low_blood_sugar,Symptoms_high_blood_sugar,Risk_situation,Good_food,All_together,Unnamed: 26
0,<SOS> how are you ? <SOS> i'm feeling great to...,i'm feeling great today,play a baskeball match,sugar level was low after lunch,low,did you inject too much insulin ? yes i didn ...,next time think about the exercise when you ca...,,,,,,,inject too much insulin,,i'm feeling great today play a baskeball match...,


In [None]:
## Functions that build one dataset per category: the dialogue plus that category's column

def select_category(df, column):
  ''' Keep the Dialogue column and the given target column, renamed to Text and Answer'''
  df = df[['Dialogue', column]]
  df.columns = ['Text', 'Answer']  # Same column names for every category
  return(df)

def dataframe_Mood(df):
  return select_category(df, 'Mood')

def dataframe_Glucose(df):
  return select_category(df, 'Glucose')

def dataframe_Glucose_binary(df):
  return select_category(df, 'Glucose_binary')

def dataframe_Insulin(df):
  return select_category(df, 'Insulin')

def dataframe_Insulin_dose(df):
  return select_category(df, 'Insulin_dose')

def dataframe_Sport(df):
  return select_category(df, 'Sport')

def dataframe_Bad_food(df):
  return select_category(df, 'Bad_food')

def dataframe_Good_food(df):
  return select_category(df, 'Good_food')

def dataframe_Risk_situation(df):
  return select_category(df, 'Risk_situation')

def dataframe_Remedies_low(df):
  return select_category(df, 'Remedies_low_glucose_level')

def dataframe_Symptoms_low(df):
  return select_category(df, 'Symptoms_low_blood_sugar')

def dataframe_Remedies_high(df):
  return select_category(df, 'Remedies_high_glucose_level')

def dataframe_Symptoms_high(df):
  return select_category(df, 'Symptoms_high_blood_sugar')

def dataframe_Glucose_checks(df):
  return select_category(df, 'Glucose_checks')

## Function that drops the rows with missing values
def drop_missing(df):
  df.dropna(inplace=True)  # Remove rows with missing data
  return(df)

## Function that splits the dataset into training (90%) and test (10%) sets
def split_train_test(data):
  X_train, X_test = train_test_split(data, test_size=0.10, random_state=42)
  return(X_train, X_test)
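When `test_size` is a float, scikit-learn's `train_test_split` rounds the test set up to `ceil(test_size * n)` rows and gives the remaining rows to training. The stdlib-only sketch below reproduces that sizing rule with a seeded shuffle. It is meant for illustration: `split_sizes` and `shuffle_split` are hypothetical helpers, and their shuffle order will not match scikit-learn's `random_state=42`.

```python
import math
import random


def split_sizes(n_samples, test_size=0.10):
    """Return (n_train, n_test): the test size is rounded up, training gets the rest."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def shuffle_split(items, test_size=0.10, seed=42):
    """Shuffle a copy of `items` with a fixed seed and cut it into train/test lists."""
    n_train, _ = split_sizes(len(items), test_size)
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = shuffle_split(range(200))
print(len(train_ids), len(test_ids))  # 180 20
```

With 200 dialogues the split gives 180 training rows and 20 test rows. Because the test size is rounded up, 201 dialogues would also leave 180 rows for training.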

In [None]:
# Inspect the dialogues in the first three rows
for i in range(3):
    print("New #",i+1)
    print(data.Dialogue[i])
    print()

New # 1
<SOS> how are you ? <SOS> i'm feeling great today . what did you do today ? i played a baskeball match this morning . how was it ? did you win ? yes and i scored 40 points . wow you did an amazing match did the sport affect your glucose ? yes my sugar level was low after lunch . did you inject too much insulin ? yes i didn t take into account the exercise . next time think about the exercise when you calculate the insulin dose . yes i will . <EOS> .

New # 2
<SOS> <SOS> how is it going ? great i had a very good blood glucose level today . that s nice keep it up having a good glucose level is great . thank you but it may rise this afternoon . why is that ? because i just ate a big sandwich now . and did you put your insulin ? yes but maybe it was not enough . then you should check your glucose in an hour and a half and correct the dose if necessary . thanks for your advice . no problem see <EOS> .

New # 3
till later . <SOS> i need your help . what s the matter ? i forgot to put

### 2.2. Building a specific dataset for each category

#### 2.2.1. Splitting the dataset into training and test sets

In [None]:
train, test = split_train_test(data)

In [None]:
## Reset the DataFrame indices (drop=True discards the old index instead of keeping it as a column)

train = train.reset_index(drop=True)
test = test.reset_index(drop=True)

In [None]:
train.shape

(180, 17)

#### 2.2.2. Data augmentation (applied to the training set only)

In [None]:
"""
back_trans_aug = naw.BackTranslationAug('Helsinki-NLP/opus-mt-en-de', 'Helsinki-NLP/opus-mt-de-en')
"""

In [None]:
"""

df_augmented = pd.DataFrame(columns=['Dialogue', 'Mood', 'Sport', 'Glucose', 'Glucose_binary', 'Insulin',
       'Insulin_dose', 'Bad_food', 'Remedies_low_glucose_level',
       'Remedies_high_glucose_level', 'Glucose_checks',
       'Symptoms_low_blood_sugar', 'Symptoms_high_blood_sugar',
       'Risk_situation', 'Good_food'])

def data_augmentation(train, df_augmented):

  for i in range(0, len(train)):
    print(i)
    df_augmented = df_augmented.append({'Dialogue': back_trans_aug.augment(train.Dialogue[i]), 'Mood':train.Mood[i], 'Sport':train.Sport[i], 
                                      'Glucose': train.Glucose[i], 'Glucose_binary':train.Glucose_binary[i], 'Insulin': train.Insulin[i],
                                      'Insulin_dose': train.Insulin_dose[i], 'Bad_food': train.Bad_food[i], 
                                      'Remedies_low_glucose_level': train.Remedies_low_glucose_level[i],
                                      'Remedies_high_glucose_level': train.Remedies_high_glucose_level[i], 
                                      'Glucose_checks': train.Glucose_checks[i], 'Symptoms_low_blood_sugar': train.Symptoms_low_blood_sugar[i],
                                      'Symptoms_high_blood_sugar': train.Symptoms_high_blood_sugar[i],
                                      'Risk_situation': train.Risk_situation[i],
                                      'Good_food': train.Good_food[i]}, ignore_index=True)
    
  return(df_augmented)

  """


In [None]:
"""
data_augmented = data_augmentation(train, df_augmented)
"""


In [None]:
#data_augmented.shape

In [None]:
"""
data_augmented.to_csv('/content/drive/MyDrive/TFM_Diabetes/data/data_augmentation_160622.csv', index=False)
"""

In [None]:
data_aug = pd.read_csv('/content/drive/MyDrive/TFM_Diabetes/data/data_augmentation_160622.csv')

In [None]:
data_aug.shape

(180, 15)

In [None]:
## Concatenate the original training set with the augmented set

train_old = train

df_completed = pd.concat([train, data_aug], axis=0, ignore_index=True)  # ignore_index avoids duplicated row labels 0-179
train = df_completed
print(train.shape)

(360, 17)
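Augmentation rewrites only the dialogue. Each back-translated dialogue keeps the labels of the row it came from, and the augmented rows are stacked under the originals, which is why the 180 training rows become 360. The disabled cells above augment `train` only, so the test set never sees paraphrases of its own dialogues during training. Below is a minimal stdlib sketch of that pairing on toy rows. `paraphrase` is a hypothetical word swap standing in for `back_trans_aug.augment`, so no translation model is involved.

```python
def paraphrase(dialogue):
    """Hypothetical stand-in for back-translation (en -> de -> en): a word swap."""
    return dialogue.replace("glucose", "blood sugar")


def augment_rows(rows, text_key="Dialogue"):
    """Return the original rows followed by one augmented copy of each row.

    Only the dialogue is rewritten; every label column is copied unchanged.
    """
    augmented = []
    for row in rows:
        new_row = dict(row)                          # shallow copy keeps all labels
        new_row[text_key] = paraphrase(row[text_key])
        augmented.append(new_row)
    return list(rows) + augmented                    # analogous to pd.concat([train, data_aug])


train_rows = [
    {"Dialogue": "my glucose was low after lunch", "Glucose_binary": "low"},
    {"Dialogue": "i ate a big sandwich", "Glucose_binary": "high"},
]
combined = augment_rows(train_rows)
print(len(combined))            # 4
print(combined[2]["Dialogue"])  # my blood sugar was low after lunch
```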


#### 2.2.3. Splitting into category-specific sets

In [None]:
## Category: Mood
train_mood = dataframe_Mood(train)
test_mood = dataframe_Mood(test)

## Category: Glucose
train_glucose = dataframe_Glucose(train)
test_glucose = dataframe_Glucose(test)

## Category: Sport
train_sport = dataframe_Sport(train)
test_sport = dataframe_Sport(test)

## Category: Glucose binary
train_glucose_binary = dataframe_Glucose_binary(train)
test_glucose_binary = dataframe_Glucose_binary(test)

## Category: Glucose checks
train_glucose_checks = dataframe_Glucose_checks(train)
test_glucose_checks = dataframe_Glucose_checks(test)

## Category: Insulin
train_insulin = dataframe_Insulin(train)
test_insulin = dataframe_Insulin(test)

## Category: Insulin dose
train_insulin_dose = dataframe_Insulin_dose(train)
test_insulin_dose = dataframe_Insulin_dose(test)

## Category: Bad food
train_bad_food = dataframe_Bad_food(train)
test_bad_food = dataframe_Bad_food(test)

## Category: Good food
train_good_food = dataframe_Good_food(train)
test_good_food = dataframe_Good_food(test)

## Category: Remedies low
train_remedies_low = dataframe_Remedies_low(train)
test_remedies_low = dataframe_Remedies_low(test)

## Category: Symptoms low
train_symptoms_low = dataframe_Symptoms_low(train)
test_symptoms_low = dataframe_Symptoms_low(test)

## Category: Remedies high
train_remedies_high = dataframe_Remedies_high(train)
test_remedies_high = dataframe_Remedies_high(test)

## Category: Symptoms high
train_symptoms_high = dataframe_Symptoms_high(train)
test_symptoms_high = dataframe_Symptoms_high(test)

## Category: Risk situation
train_risk_situation = dataframe_Risk_situation(train)
test_risk_situation = dataframe_Risk_situation(test)
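Every category above follows the same pattern: pair `Dialogue` with one target column. The sketch below drives that pattern from a single mapping between the notebook's category suffixes and the CSV column names. It uses plain Python rows so it runs without pandas. `CATEGORY_COLUMNS` and `build_category_sets` are hypothetical names and are not part of the notebook.

```python
# Hypothetical mapping from the notebook's category suffixes to CSV columns
CATEGORY_COLUMNS = {
    "mood": "Mood",
    "glucose": "Glucose",
    "sport": "Sport",
    "glucose_binary": "Glucose_binary",
    "glucose_checks": "Glucose_checks",
    "insulin": "Insulin",
    "insulin_dose": "Insulin_dose",
    "bad_food": "Bad_food",
    "good_food": "Good_food",
    "remedies_low": "Remedies_low_glucose_level",
    "symptoms_low": "Symptoms_low_blood_sugar",
    "remedies_high": "Remedies_high_glucose_level",
    "symptoms_high": "Symptoms_high_blood_sugar",
    "risk_situation": "Risk_situation",
}


def build_category_sets(rows, mapping=CATEGORY_COLUMNS):
    """Return {category: [(Text, Answer), ...]}, pairing each dialogue with one column.

    A missing value becomes None, playing the role of NaN in the DataFrame version.
    """
    return {
        category: [(row["Dialogue"], row.get(column)) for row in rows]
        for category, column in mapping.items()
    }


rows = [{"Dialogue": "i played a basketball match", "Sport": "basketball match"}]
sets = build_category_sets(rows)
print(sets["sport"])    # [('i played a basketball match', 'basketball match')]
print(sets["insulin"])  # [('i played a basketball match', None)]
```

The cleaning cells could iterate over the same dictionary in the same way, instead of repeating one block per category.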

## 3 - Data preprocessing

Data preprocessing is one of the most important parts of a natural language processing project. Models are trained on numerical vectors, so the text has to be processed properly for the model to perform well.

Before choosing how to represent the basic symbols (characters, words or sentences), the texts and the information they contain need to be cleaned.

The cleaning steps may vary depending on the language the model is trained on. Recall that this *notebook* works with English texts.

##### **Preprocessing steps:**
- **Convert to lowercase**: Python distinguishes between uppercase and lowercase characters, so *Diabetes* and *diabetes* would be treated as different words even though they mean the same thing. For this reason, the whole text is converted to lowercase.
- **Remove special characters**
- **Remove 's**
- **Expand contractions to their full form**: [dictionary for expanding contractions](https://www.analyticsvidhya.com/blog/2019/06/comprehensive-guide-text-summarization-using-deep-learning-python/)
- **Remove the `<SOS>` and `<EOS>` markers** and **separate punctuation from words with a space**
- Optional: **Remove stop words**
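The steps listed above can be sketched as a short pipeline, applied in the same order as `clean_text` below. This is a simplified illustration and not the notebook's function. The contraction map and stop-word set are tiny hypothetical subsets, and the `<SOS>`/`<EOS>`/`<PAD>` handling is left out.

```python
import re

CONTRACTIONS = {"i'm": "i am", "isn't": "is not", "didn't": "did not"}  # hypothetical subset
STOP_WORDS = {"the", "a", "is", "to"}                                   # hypothetical subset


def preprocess(text, remove_stopwords=False):
    text = text.lower()                                               # lowercase
    text = " ".join(CONTRACTIONS.get(t, t) for t in text.split(" "))  # expand contractions
    text = re.sub(r"'s", "", text)                                    # drop 's
    text = re.sub(r"[^a-z0-9 .?]", " ", text)                         # drop special characters
    text = text.replace(".", " .").replace("?", " ?")                 # space out punctuation
    tokens = text.split()
    if remove_stopwords:                                              # optional stop-word removal
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(tokens)


print(preprocess("I'm fine, but the sandwich isn't the best option!"))
# i am fine but the sandwich is not the best option
print(preprocess("Tom's glucose?"))
# tom glucose ?
```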


In [None]:
# Dictionary used to expand contractions

### Added: "haven't": "have not", "didn't": "did not", "don't": "do not",

contraction_mapping_upper = {"ain't": "is not","can't": "cannot", "'cause": "because", "could've": "could have", "haven't": "have not", "didn't": "did not", "isn't" : "is not",

                           "he'd": "he would","he'll": "he will", "he's": "he is", "how'd": "how did", "how'd'y": "how do you", "how'll": "how will", "how's": "how is",

                           "I'd": "I would", "I'd've": "I would have", "I'll": "I will", "I'll've": "I will have","I'm": "I am", "I've": "I have", "i'd": "i would",

                           "i'd've": "i would have", "i'll": "i will",  "i'll've": "i will have","i'm": "i am", "i've": "i have", "it'd": "it would",

                           "it'd've": "it would have", "it'll": "it will", "it'll've": "it will have", "let's": "let us", "ma'am": "madam",

                           "mayn't": "may not", "might've": "might have", "mightn't've": "might not have", "must've": "must have",

                           "mustn't've": "must not have", "needn't've": "need not have","o'clock": "of the clock", "could've": "could have",

                           "oughtn't": "ought not", "oughtn't've": "ought not have", "sha'n't": "shall not", "shan't've": "shall not have",

                           "she'd": "she would", "she'd've": "she would have", "she'll": "she will", "she'll've": "she will have",

                           "shouldn't've": "should not have", "so've": "so have","so's": "so as", "don't": "do not",

                           "this's": "this is","that'd": "that would", "that'd've": "that would have", "that's": "that is", "there'd": "there would",

                           "there'd've": "there would have", "there's": "there is", "here's": "here is","they'd": "they would", "they'd've": "they would have",

                           "they'll": "they will", "they'll've": "they will have", "they're": "they are", "they've": "they have", "to've": "to have",

                           "we'd": "we would", "we'd've": "we would have", "we'll": "we will", "we'll've": "we will have", "we're": "we are",

                           "we've": "we have", "what'll": "what will", "what'll've": "what will have", "what're": "what are",

                           "what's": "what is", "what've": "what have", "when's": "when is", "when've": "when have", "where'd": "where did", "where's": "where is",

                           "where've": "where have", "who'll": "who will", "who'll've": "who will have", "who's": "who is", "who've": "who have",

                           "why's": "why is", "why've": "why have", "will've": "will have", "won't've": "will not have",

                           "would've": "would have", "wouldn't've": "would not have", "y'all": "you all",

                           "y'all'd": "you all would","y'all'd've": "you all would have","y'all're": "you all are","y'all've": "you all have",

                           "you'd've": "you would have", "you'll've": "you will have"}


contraction_mapping = {k.lower(): v for k, v in contraction_mapping_upper.items()}  # Lowercase the keys; the values are kept as written

# Stop words: words with little meaning on their own (articles, pronouns, prepositions)
stop_words = set(stopwords.words('english')) 
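There is one caveat to keep in mind before enabling `remove_stopwords=True`. NLTK's English stop-word list contains the negations `no`, `nor` and `not`, so filtering can invert the meaning of a clinically relevant sentence. The sketch below uses a small hand-picked subset of that list (an assumption made so the example runs without NLTK). In the cells of this section, every call passes `remove_stopwords=False`, so the results shown here are not affected.

```python
# Hand-picked subset of NLTK's English stop words (assumption: the real list is larger)
NLTK_SUBSET = {"i", "am", "did", "not", "no", "nor", "my", "was", "after", "the"}


def drop_stop_words(sentence, stop_words=NLTK_SUBSET):
    """Keep only the whitespace-separated tokens that are not stop words."""
    return " ".join(w for w in sentence.split() if w not in stop_words)


print(drop_stop_words("i did not inject insulin"))            # inject insulin
print(drop_stop_words("my sugar level was low after lunch"))  # sugar level low lunch
```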

In [None]:
# Text-cleaning function

def clean_text(text, remove_stopwords):

  ''' Clean a dialogue or an answer '''

  if pd.isna(text):
    text = "nananan"  # Placeholder for missing values, replaced by <PAD> below

  clean = text.lower()  # Convert everything to lowercase

  # Expand contractions (token by token, splitting on spaces)
  clean = ' '.join([contraction_mapping.get(t, t) for t in clean.split(" ")])

  # Remove 's
  clean = re.sub(r"'s", "", clean)

  # Remove the <EOS> markers
  clean = re.sub(r"<eos> ", "", clean)
  # Remove the <SOS> markers
  clean = re.sub(r"<sos> ", "", clean)

  # Remove special characters: only letters, digits, spaces, '.' and '?' are kept
  clean = re.sub("[^a-zA-Z 0-9 . ?]", " ", clean)

  # Replace the missing-value placeholder with <PAD>
  clean = re.sub(r"nananan", "<PAD>", clean)

  # Put a space before punctuation marks
  clean = clean.replace(" . ", ". ")
  clean = clean.replace(".", " .")

  # No effect: ',' was already removed by the regex above
  clean = clean.replace(" , ", ", ")
  clean = clean.replace(",", " ,")

  clean = clean.replace(" ? ", "? ")
  clean = clean.replace("?", " ?")

  # No effect: '!' was already removed by the regex above
  clean = clean.replace(" ! ", "! ")
  clean = clean.replace("!", " !")

  # Optional stop-word removal: intended as True for dialogues and False for answers.
  # Stop words add little information during training, so they can be removed from the dialogues,
  # while the answers (summaries) keep them so that they read naturally.
  if remove_stopwords:
    tokens = [w for w in clean.split() if w not in stop_words]  # Tokenize and drop stop words
  else:
    tokens = clean.split()  # Tokenize on whitespace

  return " ".join(tokens).strip()
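The character class `[^a-zA-Z 0-9 . ?]` keeps only letters, digits, spaces, `.` and `?`. The repeated spaces inside the brackets are redundant. Commas, exclamation marks and apostrophes are therefore turned into spaces before the punctuation-spacing rules run. This is why the cleaned example that follows contains no `,` or `!`, and why a contraction missing from `contraction_mapping` (such as `doesn't`) is split in two. A quick check with the standard `re` module:

```python
import re

PATTERN = "[^a-zA-Z 0-9 . ?]"  # same character class as in clean_text


def strip_special(text):
    """Replace every character outside the class with a space, then collapse spaces."""
    return " ".join(re.sub(PATTERN, " ", text).split())


print(strip_special("i know , but i was alright !"))  # i know but i was alright
print(strip_special("what do you recommend me ?"))    # what do you recommend me ?
print(strip_special("she doesn't know"))              # she doesn t know
```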

In [None]:
## Full dataset (original + augmented dialogues), used to analyse the vocabulary
df_total = pd.concat([data, data_aug], axis=0)
df_total = df_total.reset_index()
clean_total_texts = []
for text in df_total.Dialogue:
    clean_total_texts.append(clean_text(text, remove_stopwords=False))
print("All texts have been cleaned.")

All texts have been cleaned.


In [None]:
df_total.Dialogue[108]

"Hey ! Hi ! How are you doing ? I'm great thanks ! Yesterday night I celebrated my birthday . Oh ! Happy birthday ! Thanks . What did you do ? We went bowling and then we had dinner . What did you eat? We had pizza and some cake after . That is great but it isn't the best option to control your glucose levels . I know , but I was alright last night . Did you check your blood sugar again today ? No , I didn't but I don't have any symptoms . Could you check it to be sure ? Yes , I already did and I am a little high ... what do you recommend me ? You should go do some exercise , do you like cycling ? Yes , I love it . I will do that . Thanks ."

In [None]:
clean_total_texts[108]

'hey hi how are you doing ? i am great thanks yesterday night i celebrated my birthday . oh happy birthday thanks . what did you do ? we went bowling and then we had dinner . what did you eat ? we had pizza and some cake after . that is great but it is not the best option to control your glucose levels . i know but i was alright last night . did you check your blood sugar again today ? no i did not but i do not have any symptoms . could you check it to be sure ? yes i already did and i am a little high . . . what do you recommend me ? you should go do some exercise do you like cycling ? yes i love it . i will do that . thanks .'

In [None]:
## Category: Mood

print("---------------------------- Mood -------------------------------")

cleanT_train_mood = []
for text in train_mood.Text:
    cleanT_train_mood.append(clean_text(text, remove_stopwords=False))
print("All training texts for category Mood have been cleaned.")

cleanA_train_mood = []
for text in train_mood.Answer:
    cleanA_train_mood.append(clean_text(text, remove_stopwords=False))
print("All training answers for category Mood have been cleaned.")

cleanT_test_mood = []
for text in test_mood.Text:
    cleanT_test_mood.append(clean_text(text, remove_stopwords=False))
print("All test texts for category Mood have been cleaned.")

cleanA_test_mood = []
for text in test_mood.Answer:
    cleanA_test_mood.append(clean_text(text, remove_stopwords=False)) 
print("All test answers for category Mood have been cleaned.")


## Category: Glucose

print("---------------------------- Glucose -------------------------------")

cleanT_train_glucose = []
for text in train_glucose.Text:
    cleanT_train_glucose.append(clean_text(text, remove_stopwords=False))
print("All training texts for category Glucose have been cleaned.")

cleanA_train_glucose = []
for text in train_glucose.Answer:
    cleanA_train_glucose.append(clean_text(text, remove_stopwords=False))
print("All training answers for category Glucose have been cleaned.")

cleanT_test_glucose = []
for text in test_glucose.Text:
    cleanT_test_glucose.append(clean_text(text, remove_stopwords=False))
print("All test texts for category Glucose have been cleaned.")

cleanA_test_glucose = []
for text in test_glucose.Answer:
    cleanA_test_glucose.append(clean_text(text, remove_stopwords=False)) 
print("All test answers for category Glucose have been cleaned.")

## Category: Sport

print("---------------------------- Sport -------------------------------")

cleanT_train_sport = []
for text in train_sport.Text:
    cleanT_train_sport.append(clean_text(text, remove_stopwords=False))
print("All training texts for category sport have been cleaned.")

cleanA_train_sport = []
for text in train_sport.Answer:
    cleanA_train_sport.append(clean_text(text, remove_stopwords=False))
print("All training answers for category sport have been cleaned.")

cleanT_test_sport = []
for text in test_sport.Text:
    cleanT_test_sport.append(clean_text(text, remove_stopwords=False))
print("All test texts for category sport have been cleaned.")

cleanA_test_sport = []
for text in test_sport.Answer:
    cleanA_test_sport.append(clean_text(text, remove_stopwords=False)) 
print("All test answers for category sport have been cleaned.")

## Category: Glucose binary

print("---------------------------- Glucose binary -------------------------------")

cleanT_train_glucose_binary = []
for text in train_glucose_binary.Text:
    cleanT_train_glucose_binary.append(clean_text(text, remove_stopwords=False))
print("All training texts for category glucose_binary have been cleaned.")

cleanA_train_glucose_binary = []
for text in train_glucose_binary.Answer:
    cleanA_train_glucose_binary.append(clean_text(text, remove_stopwords=False))
print("All training answers for category glucose_binary have been cleaned.")

cleanT_test_glucose_binary = []
for text in test_glucose_binary.Text:
    cleanT_test_glucose_binary.append(clean_text(text, remove_stopwords=False))
print("All test texts for category glucose_binary have been cleaned.")

cleanA_test_glucose_binary = []
for text in test_glucose_binary.Answer:
    cleanA_test_glucose_binary.append(clean_text(text, remove_stopwords=False)) 
print("All test answers for category glucose_binary have been cleaned.")


## Category: Glucose checks

print("---------------------------- Glucose checks -------------------------------")

cleanT_train_glucose_checks = []
for text in train_glucose_checks.Text:
    cleanT_train_glucose_checks.append(clean_text(text, remove_stopwords=False))
print("All training texts for category glucose_checks have been cleaned.")

cleanA_train_glucose_checks = []
for text in train_glucose_checks.Answer:
    cleanA_train_glucose_checks.append(clean_text(text, remove_stopwords=False))
print("All training answers for category glucose_checks have been cleaned.")

cleanT_test_glucose_checks = []
for text in test_glucose_checks.Text:
    cleanT_test_glucose_checks.append(clean_text(text, remove_stopwords=False))
print("All test texts for category glucose_checks have been cleaned.")

cleanA_test_glucose_checks = []
for text in test_glucose_checks.Answer:
    cleanA_test_glucose_checks.append(clean_text(text, remove_stopwords=False)) 
print("All test answers for category glucose_checks have been cleaned.")

## Category: Insulin

print("---------------------------- Insulin -------------------------------")

cleanT_train_insulin = []
for text in train_insulin.Text:
    cleanT_train_insulin.append(clean_text(text, remove_stopwords=False))
print("All training texts for category insulin have been cleaned.")

cleanA_train_insulin = []
for text in train_insulin.Answer:
    cleanA_train_insulin.append(clean_text(text, remove_stopwords=False))
print("All training answers for category insulin have been cleaned.")

cleanT_test_insulin = []
for text in test_insulin.Text:
    cleanT_test_insulin.append(clean_text(text, remove_stopwords=False))
print("All test texts for category insulin have been cleaned.")

cleanA_test_insulin = []
for text in test_insulin.Answer:
    cleanA_test_insulin.append(clean_text(text, remove_stopwords=False)) 
print("All test answers for category insulin have been cleaned.")


## Category: Insulin dose

print("---------------------------- Insulin dose -------------------------------")

cleanT_train_insulin_dose = []
for text in train_insulin_dose.Text:
    cleanT_train_insulin_dose.append(clean_text(text, remove_stopwords=False))
print("All training texts for category insulin_dose have been cleaned.")

cleanA_train_insulin_dose = []
for text in train_insulin_dose.Answer:
    cleanA_train_insulin_dose.append(clean_text(text, remove_stopwords=False))
print("All training answers for category insulin_dose have been cleaned.")

cleanT_test_insulin_dose = []
for text in test_insulin_dose.Text:
    cleanT_test_insulin_dose.append(clean_text(text, remove_stopwords=False))
print("All test texts for category insulin_dose have been cleaned.")

cleanA_test_insulin_dose = []
for text in test_insulin_dose.Answer:
    cleanA_test_insulin_dose.append(clean_text(text, remove_stopwords=False)) 
print("All test answers for category insulin_dose have been cleaned.")


## Category: Bad food

print("---------------------------- Bad food -------------------------------")

cleanT_train_bad_food = []
for text in train_bad_food.Text:
    cleanT_train_bad_food.append(clean_text(text, remove_stopwords=False))
print("All training texts for category bad_food have been cleaned.")

cleanA_train_bad_food = []
for text in train_bad_food.Answer:
    cleanA_train_bad_food.append(clean_text(text, remove_stopwords=False))
print("All training answers for category bad_food have been cleaned.")

cleanT_test_bad_food = []
for text in test_bad_food.Text:
    cleanT_test_bad_food.append(clean_text(text, remove_stopwords=False))
print("All test texts for category bad_food have been cleaned.")

cleanA_test_bad_food = []
for text in test_bad_food.Answer:
    cleanA_test_bad_food.append(clean_text(text, remove_stopwords=False)) 
print("All test answers for category bad_food have been cleaned.")

## Category: Good food

print("---------------------------- Good food -------------------------------")

cleanT_train_good_food = []
for text in train_good_food.Text:
    cleanT_train_good_food.append(clean_text(text, remove_stopwords=False))
print("All training texts for category good_food have been cleaned.")

cleanA_train_good_food = []
for text in train_good_food.Answer:
    cleanA_train_good_food.append(clean_text(text, remove_stopwords=False))
print("All training answers for category good_food have been cleaned.")

cleanT_test_good_food = []
for text in test_good_food.Text:
    cleanT_test_good_food.append(clean_text(text, remove_stopwords=False))
print("All test texts for category good_food have been cleaned.")

cleanA_test_good_food = []
for text in test_good_food.Answer:
    cleanA_test_good_food.append(clean_text(text, remove_stopwords=False)) 
print("All test answers for category good_food have been cleaned.")

## Category: Remedies low

print("---------------------------- Remedies low -------------------------------")

cleanT_train_remedies_low = []
for text in train_remedies_low.Text:
    cleanT_train_remedies_low.append(clean_text(text, remove_stopwords=False))
print("All training texts for category remedies_low have been cleaned.")

cleanA_train_remedies_low = []
for text in train_remedies_low.Answer:
    cleanA_train_remedies_low.append(clean_text(text, remove_stopwords=False))
print("Todos las respuestas de entrenamiento de la categoría remedies_low han sido tratados.")

cleanT_test_remedies_low = []
for text in test_remedies_low.Text:
    cleanT_test_remedies_low.append(clean_text(text, remove_stopwords=False))
print("Todos los textos de validación de la categoría remedies_low han sido tratados.")

cleanA_test_remedies_low = []
for text in test_remedies_low.Answer:
    cleanA_test_remedies_low.append(clean_text(text, remove_stopwords=False)) 
print("Todos las respuestas de validación de la categoría remedies_low han sido tratados.")

## Category: Symptoms low

print("---------------------------- Symptoms low -------------------------------")

cleanT_train_symptoms_low = []
for text in train_symptoms_low.Text:
    cleanT_train_symptoms_low.append(clean_text(text, remove_stopwords=False))
print("Todos los textos de entrenamiento de la categoría symptoms_low han sido tratados.")

cleanA_train_symptoms_low = []
for text in train_symptoms_low.Answer:
    cleanA_train_symptoms_low.append(clean_text(text, remove_stopwords=False))
print("Todos las respuestas de entrenamiento de la categoría symptoms_low han sido tratados.")

cleanT_test_symptoms_low = []
for text in test_symptoms_low.Text:
    cleanT_test_symptoms_low.append(clean_text(text, remove_stopwords=False))
print("Todos los textos de validación de la categoría symptoms_low han sido tratados.")

cleanA_test_symptoms_low = []
for text in test_symptoms_low.Answer:
    cleanA_test_symptoms_low.append(clean_text(text, remove_stopwords=False)) 
print("Todos las respuestas de validación de la categoría symptoms_low han sido tratados.")

## Category: Remedies high

print("---------------------------- Remedies high -------------------------------")

cleanT_train_remedies_high = []
for text in train_remedies_high.Text:
    cleanT_train_remedies_high.append(clean_text(text, remove_stopwords=False))
print("Todos los textos de entrenamiento de la categoría remedies_high han sido tratados.")

cleanA_train_remedies_high = []
for text in train_remedies_high.Answer:
    cleanA_train_remedies_high.append(clean_text(text, remove_stopwords=False))
print("Todos las respuestas de entrenamiento de la categoría remedies_high han sido tratados.")

cleanT_test_remedies_high = []
for text in test_remedies_high.Text:
    cleanT_test_remedies_high.append(clean_text(text, remove_stopwords=False))
print("Todos los textos de validación de la categoría remedies_high han sido tratados.")

cleanA_test_remedies_high = []
for text in test_remedies_high.Answer:
    cleanA_test_remedies_high.append(clean_text(text, remove_stopwords=False)) 
print("Todos las respuestas de validación de la categoría remedies_high han sido tratados.")


## Category: Symptoms high

print("---------------------------- Symptoms high -------------------------------")

cleanT_train_symptoms_high = []
for text in train_symptoms_high.Text:
    cleanT_train_symptoms_high.append(clean_text(text, remove_stopwords=False))
print("Todos los textos de entrenamiento de la categoría symptoms_high han sido tratados.")

cleanA_train_symptoms_high = []
for text in train_symptoms_high.Answer:
    cleanA_train_symptoms_high.append(clean_text(text, remove_stopwords=False))
print("Todos las respuestas de entrenamiento de la categoría symptoms_high han sido tratados.")

cleanT_test_symptoms_high = []
for text in test_symptoms_high.Text:
    cleanT_test_symptoms_high.append(clean_text(text, remove_stopwords=False))
print("Todos los textos de validación de la categoría symptoms_high han sido tratados.")

cleanA_test_symptoms_high = []
for text in test_symptoms_high.Answer:
    cleanA_test_symptoms_high.append(clean_text(text, remove_stopwords=False)) 
print("Todos las respuestas de validación de la categoría symptoms_high han sido tratados.")

## Category: Risk situation

print("---------------------------- Risk situation -------------------------------")

cleanT_train_risk_situation = []
for text in train_risk_situation.Text:
    cleanT_train_risk_situation.append(clean_text(text, remove_stopwords=False))
print("Todos los textos de entrenamiento de la categoría risk_situation han sido tratados.")

cleanA_train_risk_situation = []
for text in train_risk_situation.Answer:
    cleanA_train_risk_situation.append(clean_text(text, remove_stopwords=False))
print("Todos las respuestas de entrenamiento de la categoría risk_situation han sido tratados.")

cleanT_test_risk_situation = []
for text in test_risk_situation.Text:
    cleanT_test_risk_situation.append(clean_text(text, remove_stopwords=False))
print("Todos los textos de validación de la categoría risk_situation han sido tratados.")

cleanA_test_risk_situation = []
for text in test_risk_situation.Answer:
    cleanA_test_risk_situation.append(clean_text(text, remove_stopwords=False)) 
print("Todos las respuestas de validación de la categoría risk_situation han sido tratados.")

---------------------------- Mood -------------------------------
Todos los textos de entrenamiento de la categoría Mood han sido tratados.
Todos las respuestas de entrenamiento de la categoría Mood han sido tratados.
Todos los textos de validación de la categoría Mood han sido tratados.
Todos las respuestas de validación de la categoría Mood han sido tratados.
---------------------------- Glucose -------------------------------
Todos los textos de entrenamiento de la categoría Glucose han sido tratados.
Todos las respuestas de entrenamiento de la categoría Glucose han sido tratados.
Todos los textos de validación de la categoría Glucose han sido tratados.
Todos las respuestas de validación de la categoría Glucose han sido tratados.
---------------------------- Sport -------------------------------
Todos los textos de entrenamiento de la categoría sport han sido tratados.
Todos las respuestas de entrenamiento de la categoría sport han sido tratados.
Todos los textos de validación de la

## 4 - Analysis of the words in the text: *word embedding*

In [None]:
def count_words(count_dict, text):
    '''Count the occurrences of each word across all the sentences of the text'''
    for sentence in text:
        for word in sentence.split():
            if word not in count_dict:
                count_dict[word] = 1
            else:
                count_dict[word] += 1
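The same word frequencies that `count_words` computes can also be obtained with `collections.Counter` from the standard library. The sketch below assumes cleaned, whitespace-tokenised sentences, as `count_words` does. `count_words_counter` and the two toy sentences are hypothetical.

```python
from collections import Counter


def count_words_counter(text):
    """Count word occurrences across all sentences (whitespace tokenisation)."""
    counts = Counter()
    for sentence in text:
        counts.update(sentence.split())
    return counts


# Toy sentences, for illustration only
sentences = ["my glucose is high .", "is it high ?"]
counts = count_words_counter(sentences)
print(counts.most_common(3))
```

`counts.most_common()` would then replace the `sorted(..., key=operator.itemgetter(1), reverse=True)` call used below to list the most frequent words.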

In [None]:
# Count how many times each word is used and obtain the total vocabulary size.
word_counts = {}

count_words(word_counts, clean_total_texts)

# TODO: the answers are not included in the word counts yet (only clean_total_texts is counted)
print("Tamaño del vocabulario:", len(word_counts))

Tamaño del vocabulario: 1065


In [None]:
sorted(word_counts.items(), key=operator.itemgetter(1), reverse=True )

[('.', 3815),
 ('i', 2187),
 ('you', 1538),
 ('?', 1124),
 ('a', 720),
 ('your', 704),
 ('have', 576),
 ('to', 548),
 ('am', 543),
 ('do', 542),
 ('are', 521),
 ('glucose', 505),
 ('it', 492),
 ('how', 481),
 ('and', 481),
 ('is', 384),
 ('levels', 380),
 ('the', 376),
 ('not', 375),
 ('yes', 359),
 ('my', 356),
 ('that', 320),
 ('will', 316),
 ('ok', 278),
 ('sugar', 268),
 ('for', 268),
 ('did', 263),
 ('should', 258),
 ('some', 257),
 ('feel', 257),
 ('what', 253),
 ('of', 246),
 ('good', 228),
 ('blood', 215),
 ('thanks', 208),
 ('high', 202),
 ('eat', 197),
 ('low', 193),
 ('insulin', 188),
 ('very', 180),
 ('but', 176),
 ('oh', 175),
 ('hello', 175),
 ('be', 165),
 ('okay', 165),
 ('now', 164),
 ('go', 158),
 ('little', 141),
 ('thank', 140),
 ('in', 129),
 ('today', 127),
 ('think', 127),
 ('check', 127),
 ('with', 121),
 ('could', 120),
 ('great', 117),
 ('help', 116),
 ('was', 110),
 ('maybe', 110),
 ('better', 107),
 ('or', 106),
 ('hi', 104),
 ('dose', 100),
 ('right', 98),


### 4.1. *Word embeddings*
***Word embedding***: the representation of the basic symbols (the words) as numerical vectors.

GloVe is a set of semantic vectors (**word embeddings**) that makes it possible to compare the meanings of words numerically. The vectors were downloaded directly from this [link](https://nlp.stanford.edu/projects/glove/).

In [None]:
embeddings_index = {}
# Alternative embedding source: ConceptNet Numberbatch
#with open('/content/drive/MyDrive/TFM_Diabetes/ConceptNet/numberbatch-en.txt', encoding='utf-8') as f:
# GloVe is used because it already covers more of the words in the text (symbols, numbers and "mellitus")
with open('/content/drive/MyDrive/TFM_Diabetes/glove/glove.6B.300d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split(' ')
        word = values[0]
        embedding = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = embedding

print('Word embeddings:', len(embeddings_index))

Word embeddings: 400001
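Each line of the GloVe text file holds a word followed by the components of its vector, separated by single spaces. The loop above relies on this format. The sketch below repeats that parsing on an in-memory stream with two made-up 3-dimensional vectors; the real `glove.6B.300d.txt` vectors have 300 components. `load_embeddings` is a hypothetical helper, not part of the notebook.

```python
import io

import numpy as np


def load_embeddings(f):
    """Build a {word: vector} dict from a GloVe-style text stream."""
    index = {}
    for line in f:
        values = line.rstrip("\n").split(" ")
        index[values[0]] = np.asarray(values[1:], dtype="float32")
    return index


# Toy file with made-up 3-dimensional vectors (illustration only)
toy_file = io.StringIO("glucose 0.1 -0.2 0.3\ninsulin 0.0 0.5 -0.1\n")
index = load_embeddings(toy_file)
print(len(index), index["glucose"].shape)
```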


In [None]:
# Find the number of words that have no pretrained vector and are used more than N (threshold) times
# (CN stands for ConceptNet Numberbatch, the alternative embedding source; the index now holds GloVe vectors)

# Words without a pretrained vector can still be added to the embedding matrix, but only if they are common enough in the texts
missing_words = 0
threshold = 1 

for word, count in word_counts.items():
    if count > threshold:
        if word not in embeddings_index:
            missing_words += 1
            print(word)
            
missing_ratio = round(missing_words/len(word_counts),4)*100
            
print("Number of words missing from CN:", missing_words)
print("Percent of words that are missing from vocabulary: {}%".format(missing_ratio))

.thanks
Number of words missing from CN: 1
Percent of words that are missing from vocabulary: 0.09%


In [None]:
# Limit the vocabulary to use

# Dictionary to convert words into integers
vocab_to_int = {} 

# The vocabulary keeps the words that appear at least `threshold` times or that have a pretrained vector.
# With threshold = 1 every word qualifies, so the whole vocabulary is kept
value = 0
for word, count in word_counts.items():
    if count >= threshold or word in embeddings_index: 
        vocab_to_int[word] = value
        value += 1

# Special tokens that must be added to the vocabulary
codes = ["<UNK>","<PAD>","<EOS>","<GO>"]   

# Add the special tokens to the vocabulary 
for code in codes:
    vocab_to_int[code] = len(vocab_to_int)

# Dictionary to convert integers back into words
int_to_vocab = {}  
for word, value in vocab_to_int.items():
    int_to_vocab[value] = word

# Percentage of words used (it can exceed 100% because vocab_to_int also contains the 4 special tokens)
usage_ratio = round(len(vocab_to_int) / len(word_counts),4)*100 

print("Total number of unique words:", len(word_counts))
print("Number of words we will use:", len(vocab_to_int))
print("Percent of words we will use: {}%".format(usage_ratio))

Total number of unique words: 1065
Number of words we will use: 1069
Percent of words we will use: 100.38%
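The 100.38% above does not come from a counting error. With `threshold = 1`, every word satisfies `count >= threshold`, so all 1065 words are kept. The four special tokens are then appended, giving 1065 + 4 = 1069 ids and 1069 / 1065 ≈ 100.38%. A toy sketch of the same construction follows; the hypothetical `build_vocab` helper and its data are illustrative only.

```python
def build_vocab(word_counts, embeddings_index, threshold,
                codes=("<UNK>", "<PAD>", "<EOS>", "<GO>")):
    """Map words to consecutive ids, then append the special tokens at the end."""
    vocab_to_int = {}
    for word, count in word_counts.items():
        if count >= threshold or word in embeddings_index:
            vocab_to_int[word] = len(vocab_to_int)
    for code in codes:
        vocab_to_int[code] = len(vocab_to_int)
    int_to_vocab = {i: w for w, i in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab


# Toy counts; only membership in embeddings_index matters here
word_counts = {"glucose": 5, "insulin": 3, ".thanks": 1}
embeddings_index = {"glucose": None, "insulin": None}
vocab_to_int, int_to_vocab = build_vocab(word_counts, embeddings_index, threshold=1)
usage_ratio = round(len(vocab_to_int) / len(word_counts), 4) * 100
print(len(vocab_to_int), usage_ratio)
```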


In [None]:
# The GloVe vectors used here (glove.6B.300d) have 300 dimensions, so the embedding needs 300 dimensions
embedding_dim = 300  
nb_words = len(vocab_to_int)

# Create a matrix of zeros, to be filled with the embeddings
word_embedding_matrix = np.zeros((nb_words, embedding_dim), dtype=np.float32)
# For each word in the vocabulary
for word, i in vocab_to_int.items():

    if word in embeddings_index:
        # If it has a pretrained vector, add it to the matrix
        word_embedding_matrix[i] = embeddings_index[word]
    else:
        # Otherwise, create a random embedding for it and add it to the matrix
        new_embedding = np.array(np.random.uniform(-1.0, 1.0, embedding_dim))
        embeddings_index[word] = new_embedding
        word_embedding_matrix[i] = new_embedding

# Check that the matrix has one row per entry of vocab_to_int
print("len(word_embedding_matrix) == len(vocab_to_int) ", len(word_embedding_matrix) == len(vocab_to_int) )

len(word_embedding_matrix) == len(vocab_to_int)  True
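The matrix built above has one row per vocabulary id. Each row holds the pretrained vector when one exists, and a random vector drawn from U(-1, 1) otherwise. Below is a self-contained sketch on a two-word toy vocabulary. `build_embedding_matrix` is hypothetical. Unlike the notebook, it uses a seeded `numpy.random.Generator` and does not write the random vectors back into `embeddings_index`.

```python
import numpy as np


def build_embedding_matrix(vocab_to_int, embeddings_index, embedding_dim, seed=0):
    """One row per vocabulary id: the pretrained vector if available,
    otherwise a random vector drawn uniformly from [-1, 1)."""
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(vocab_to_int), embedding_dim), dtype=np.float32)
    for word, i in vocab_to_int.items():
        if word in embeddings_index:
            matrix[i] = embeddings_index[word]
        else:
            matrix[i] = rng.uniform(-1.0, 1.0, embedding_dim)
    return matrix


# Toy vocabulary: "glucose" has a (made-up) pretrained vector, "<UNK>" does not
vocab_to_int = {"glucose": 0, "<UNK>": 1}
embeddings_index = {"glucose": np.array([0.1, 0.2, 0.3], dtype=np.float32)}
matrix = build_embedding_matrix(vocab_to_int, embeddings_index, embedding_dim=3)
print(matrix.shape)
```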


### 4.2. Formatting the words of the text for the model

In [None]:
def convert_to_ints(text, word_count, unk_count, eos=False):
    ''' Convert the words of the text into integers.
        If a word is not in vocab_to_int, use <UNK>.
        Count the total number of words and of UNKs.
        Optionally append the <EOS> (end of sentence) token to the end of each text. '''

    ints = []
    for sentence in text:
        sentence_ints = []
        for word in sentence.split():
            word_count += 1
            if word in vocab_to_int:
                sentence_ints.append(vocab_to_int[word])
            else:
                sentence_ints.append(vocab_to_int["<UNK>"])
                unk_count += 1
        if eos:
            sentence_ints.append(vocab_to_int["<EOS>"])
        ints.append(sentence_ints)
    return ints, word_count, unk_count
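A toy run of the same mapping shows what `convert_to_ints` returns. The sketch assumes a hypothetical eight-entry vocabulary and passes it as an argument, whereas the notebook reads the global `vocab_to_int`. `to_ids` is an illustrative name.

```python
def to_ids(sentences, vocab_to_int, eos=False):
    """Map each word to its id (<UNK> if unknown) and optionally append <EOS>.
    Returns the id lists, the total word count and the number of UNKs."""
    ids, word_count, unk_count = [], 0, 0
    for sentence in sentences:
        sentence_ids = []
        for word in sentence.split():
            word_count += 1
            if word in vocab_to_int:
                sentence_ids.append(vocab_to_int[word])
            else:
                sentence_ids.append(vocab_to_int["<UNK>"])
                unk_count += 1
        if eos:
            sentence_ids.append(vocab_to_int["<EOS>"])
        ids.append(sentence_ids)
    return ids, word_count, unk_count


# Hypothetical toy vocabulary
vocab = {"my": 0, "glucose": 1, "is": 2, "high": 3,
         "<UNK>": 4, "<PAD>": 5, "<EOS>": 6, "<GO>": 7}
ids, wc, uc = to_ids(["my glucose is very high"], vocab, eos=True)
print(ids, wc, uc)
```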

### 4.3. Functions to analyse sentence lengths

In [None]:
def create_lengths(text):
    '''Create a DataFrame with the lengths of the sentences of a text'''
    lengths = []
    for sentence in text:
        lengths.append(len(sentence))
    return pd.DataFrame(lengths, columns=['counts'])

In [None]:
def unk_counter(sentence):
    '''Count how many times <UNK> appears in a sentence'''
    unk_count = 0
    for word in sentence:
        if word == vocab_to_int["<UNK>"]:
            unk_count += 1
    return unk_count

In [None]:
def text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers):
    '''Sort the answer/text pairs by text length (shortest first) and keep only the pairs
       that satisfy the length and UNK limits.
       Note: the texts are read from the global variable int_texts, and texts whose length
       is >= max_text_length are discarded (range() excludes its upper bound).'''
    min_text_length = min(len(text) for text in int_texts)
    for length in range(min_text_length, max_text_length):
        for count, answer in enumerate(int_answers):
            text = int_texts[count]
            if (min_length <= len(answer) <= max_answer_length and
                len(text) >= min_length and
                unk_counter(answer) <= unk_answer_limit and
                unk_counter(text) <= unk_text_limit and
                length == len(text)):
                sorted_answers.append(answer)
                sorted_texts.append(text)

    return sorted_answers, sorted_texts
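`text_length_selection` sorts the pairs by scanning all of them once for every possible text length. With the same filters, a single filtering pass followed by Python's stable sort on text length should give the same lists in the same order. This is because `list.sort` keeps pairs of equal length in their original order. Below is a sketch with hypothetical names and toy ids (`unk_id = 9`).

```python
def select_and_sort(int_answers, int_texts, max_text_length, max_answer_length,
                    min_length, unk_text_limit, unk_answer_limit, unk_id):
    """Keep the pairs that satisfy the limits, ordered by text length.
    The sort is stable, so pairs with equal text length keep their original order."""
    def n_unk(seq):
        return sum(1 for token in seq if token == unk_id)

    kept = [
        (answer, text)
        for answer, text in zip(int_answers, int_texts)
        if min_length <= len(answer) <= max_answer_length
        and min_length <= len(text) < max_text_length
        and n_unk(answer) <= unk_answer_limit
        and n_unk(text) <= unk_text_limit
    ]
    kept.sort(key=lambda pair: len(pair[1]))
    return [a for a, _ in kept], [t for _, t in kept]


# Toy data: the third answer is too long (8 > 7) and is discarded
int_answers = [[1, 2], [3], [4, 5, 6, 7, 8, 9, 9, 9]]
int_texts = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]
answers_out, texts_out = select_and_sort(int_answers, int_texts, max_text_length=10,
                                         max_answer_length=7, min_length=1,
                                         unk_text_limit=5, unk_answer_limit=2, unk_id=9)
print(answers_out, texts_out)
```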

## 5 - Model design

The goal of this section is to define all the functions needed to build a model that performs automatic text summarization.

In [None]:
def model_inputs():
    '''Create the placeholders for the model inputs'''
    
    input_data = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='learning_rate')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    answer_length = tf.placeholder(tf.int32, (None,), name='answer_length')
    max_answer_length = tf.reduce_max(answer_length, name='max_dec_len')
    text_length = tf.placeholder(tf.int32, (None,), name='text_length')

    return input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length

In [None]:
def process_encoding_input(target_data, vocab_to_int, batch_size):
    '''Remove the id of the last word of each sequence in the batch and prepend <GO> to each one (decoder input)'''
    
    ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    dec_input = tf.concat([tf.fill([batch_size, 1], vocab_to_int['<GO>']), ending], 1)

    return dec_input
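Despite its name, `process_encoding_input` prepares the *decoder* input for teacher forcing. It drops the last token of each target and prepends `<GO>`, so at step t the decoder receives the target token of step t-1. Below is a NumPy sketch of the same `tf.strided_slice` + `tf.concat` pair on a toy batch. `make_decoder_input` and the ids are hypothetical.

```python
import numpy as np


def make_decoder_input(targets, go_id):
    """Drop the last token of every row and prepend <GO> (teacher forcing)."""
    go_column = np.full((targets.shape[0], 1), go_id, dtype=targets.dtype)
    return np.concatenate([go_column, targets[:, :-1]], axis=1)


targets = np.array([[5, 6, 7], [8, 9, 1]])
dec_input = make_decoder_input(targets, go_id=0)
print(dec_input)
```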

In [None]:
def encoding_layer(rnn_size, sequence_length, num_layers, rnn_inputs, keep_prob):
    '''Create the encoder: num_layers bidirectional LSTM layers'''
    
    for layer in range(num_layers):
        with tf.variable_scope('encoder_{}'.format(layer)):
            cell_fw = tf.contrib.rnn.LSTMCell(rnn_size,
                                              initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
            cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, 
                                                    input_keep_prob = keep_prob)

            cell_bw = tf.contrib.rnn.LSTMCell(rnn_size,
                                              initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
            cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, 
                                                    input_keep_prob = keep_prob)

            enc_output, enc_state = tf.nn.bidirectional_dynamic_rnn(cell_fw, 
                                                                    cell_bw, 
                                                                    rnn_inputs,
                                                                    sequence_length,
                                                                    dtype=tf.float32)
            # Join the forward and backward outputs (bidirectional RNN) and use them as the
            # input of the next layer, so that the layers are actually stacked
            rnn_inputs = tf.concat(enc_output, 2)

    enc_output = rnn_inputs
    
    return enc_output, enc_state

In [None]:
def training_decoding_layer(dec_embed_input, answer_length, dec_cell, initial_state, output_layer, 
                            vocab_size, max_answer_length):
    '''Create the training decoder (teacher forcing). The decoding itself is run once,
       with dynamic_decode, in decoding_layer'''
    
    training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,
                                                        sequence_length=answer_length,
                                                        time_major=False)

    training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                       training_helper,
                                                       initial_state,
                                                       output_layer) 

    return training_decoder

In [None]:
def inference_decoding_layer(embeddings, start_token, end_token, dec_cell, initial_state, output_layer,
                             max_answer_length, batch_size):
    '''Create the inference decoder (greedy decoding). The decoding itself is run once,
       with dynamic_decode, in decoding_layer'''
    
    start_tokens = tf.tile(tf.constant([start_token], dtype=tf.int32), [batch_size], name='start_tokens')
    
    inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embeddings,
                                                                start_tokens,
                                                                end_token)
                
    inference_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                        inference_helper,
                                                        initial_state,
                                                        output_layer)
    
    return inference_decoder
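At inference time, `GreedyEmbeddingHelper` starts from `<GO>` and picks the most probable token at each step. It feeds that token's embedding back into the decoder and stops at `<EOS>` (or when the decoding loop reaches its maximum number of iterations). The loop below is a simplified, single-sequence sketch of that idea, without attention or batching. `greedy_decode` and the toy step function are hypothetical stand-ins for the real decoder cell.

```python
import numpy as np


def greedy_decode(step_fn, initial_state, go_id, eos_id, max_length):
    """Greedy decoding: feed the previous token, keep the argmax, and stop
    at <EOS> (included in the output) or after max_length steps.
    step_fn(token, state) -> (logits, new_state) stands in for the decoder."""
    token, state, output = go_id, initial_state, []
    for _ in range(max_length):
        logits, state = step_fn(token, state)
        token = int(np.argmax(logits))
        output.append(token)
        if token == eos_id:
            break
    return output


def toy_step(token, state):
    """Toy decoder over 5 ids: always predicts the previous token + 1."""
    logits = np.zeros(5)
    logits[(token + 1) % 5] = 1.0
    return logits, state


print(greedy_decode(toy_step, None, go_id=1, eos_id=4, max_length=10))
```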

In [None]:
def decoding_layer(dec_embed_input, embeddings, enc_output, enc_state, vocab_size, text_length, answer_length, 
                   max_answer_length, rnn_size, vocab_to_int, keep_prob, batch_size, num_layers):
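    '''Create the decoder cell with attention, and the training and inference decoders'''
    
    # Note: dec_cell is overwritten on every iteration, so only the last LSTM cell is used
    # (the decoder layers are not stacked; this makes no difference with num_layers = 1)
    for layer in range(num_layers):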
        with tf.variable_scope('decoder_{}'.format(layer)):
            lstm = tf.contrib.rnn.LSTMCell(rnn_size,
                                           initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
            dec_cell = tf.contrib.rnn.DropoutWrapper(lstm, 
                                                     input_keep_prob = keep_prob)
    
    output_layer = Dense(vocab_size,
                         kernel_initializer = tf.truncated_normal_initializer(mean = 0.0, stddev=0.1))
    
    attn_mech = tf.contrib.seq2seq.BahdanauAttention(rnn_size,
                                                  enc_output,
                                                  text_length,
                                                  normalize=False,
                                                  name='BahdanauAttention')

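    dec_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell,
                                                   attn_mech,
                                                   rnn_size)

    # Initial state: zero attention state whose cell state is the forward state of the encoder's last layer
    initial_state = dec_cell.zero_state(batch_size=batch_size,dtype=tf.float32).clone(cell_state=enc_state[0])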

    with tf.variable_scope("decode"):
        training_decoder = training_decoding_layer(dec_embed_input, 
                                                  answer_length, 
                                                  dec_cell, 
                                                  initial_state,
                                                  output_layer,
                                                  vocab_size, 
                                                  max_answer_length)
        
        training_logits,_ ,_ = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                  output_time_major=False,
                                  impute_finished=True,
                                  maximum_iterations=max_answer_length)
    with tf.variable_scope("decode", reuse=True):
        inference_decoder = inference_decoding_layer(embeddings,  
                                                    vocab_to_int['<GO>'], 
                                                    vocab_to_int['<EOS>'],
                                                    dec_cell, 
                                                    initial_state, 
                                                    output_layer,
                                                    max_answer_length,
                                                    batch_size)
        
        inference_logits,_ ,_ = tf.contrib.seq2seq.dynamic_decode(inference_decoder,
                                  output_time_major=False,
                                  impute_finished=True,
                                  maximum_iterations=max_answer_length)

    return training_logits, inference_logits
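`BahdanauAttention` with `normalize=False` scores each encoder output h_s against the decoder query h_t with the additive form score = v^T tanh(W_m h_s + W_q h_t). It masks the positions beyond `text_length` and applies a softmax. The context vector is then the weighted sum of the encoder outputs. Below is a single-step NumPy sketch with random toy weights; all names are hypothetical, and in TensorFlow these matrices are learned as layers inside the attention mechanism.

```python
import numpy as np


def bahdanau_attention(query, memory, memory_length, W_query, W_memory, v):
    """Additive (Bahdanau) attention for one decoder step, unnormalised variant.
    query: (d_q,), memory: (T, d_m); positions >= memory_length are masked out."""
    keys = memory @ W_memory                          # (T, units)
    scores = np.tanh(keys + query @ W_query) @ v      # (T,)
    scores[memory_length:] = -np.inf                  # ignore padded positions
    weights = np.exp(scores - scores[:memory_length].max())
    weights /= weights.sum()                          # softmax over valid positions
    context = weights @ memory                        # (d_m,)
    return weights, context


rng = np.random.default_rng(0)
T, d_m, d_q, units = 5, 4, 3, 6
memory = rng.normal(size=(T, d_m))
weights, context = bahdanau_attention(
    query=rng.normal(size=d_q), memory=memory, memory_length=3,
    W_query=rng.normal(size=(d_q, units)), W_memory=rng.normal(size=(d_m, units)),
    v=rng.normal(size=units))
print(weights.round(3), context.shape)
```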

In [None]:
def seq2seq_model(input_data, target_data, keep_prob, text_length, answer_length, max_answer_length, 
                  vocab_size, rnn_size, num_layers, vocab_to_int, batch_size):
    '''Use the previous functions to create the training and inference logits'''
    
    # GloVe vectors plus the random ones added for missing words. word_embedding_matrix is a
    # NumPy array, so embedding_lookup turns it into a constant: the embeddings are not trained
    embeddings = word_embedding_matrix
    
    enc_embed_input = tf.nn.embedding_lookup(embeddings, input_data)
    enc_output, enc_state = encoding_layer(rnn_size, text_length, num_layers, enc_embed_input, keep_prob)
    
    dec_input = process_encoding_input(target_data, vocab_to_int, batch_size)
    dec_embed_input = tf.nn.embedding_lookup(embeddings, dec_input)
    
    training_logits, inference_logits  = decoding_layer(dec_embed_input, 
                                                        embeddings,
                                                        enc_output,
                                                        enc_state, 
                                                        vocab_size, 
                                                        text_length, 
                                                        answer_length, 
                                                        max_answer_length,
                                                        rnn_size, 
                                                        vocab_to_int, 
                                                        keep_prob, 
                                                        batch_size,
                                                        num_layers)
    
    return training_logits, inference_logits

In [None]:
def pad_sentence_batch(sentence_batch):
    """Pad the sentences with <PAD> so that all the sentences in a batch have the same length"""
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [vocab_to_int['<PAD>']] * (max_sentence - len(sentence)) for sentence in sentence_batch]

In [None]:
def get_batches(answers, texts, batch_size):
    """Group the answers, the texts and their lengths into batches.
       The last incomplete batch is discarded."""
    for batch_i in range(0, len(texts)//batch_size):
        start_i = batch_i * batch_size
        answers_batch = answers[start_i:start_i + batch_size]
        texts_batch = texts[start_i:start_i + batch_size]
        pad_answers_batch = np.array(pad_sentence_batch(answers_batch))
        pad_texts_batch = np.array(pad_sentence_batch(texts_batch))
        
        # Real answer lengths (measured before padding), so that the loss mask ignores the <PAD> tokens
        pad_answers_lengths = [len(answer) for answer in answers_batch]
        
        # Text lengths after padding: the encoder input is reversed with tf.reverse, which moves
        # the <PAD> tokens to the front, so the full padded length is kept for the encoder
        pad_texts_lengths = [len(text) for text in pad_texts_batch]
        
        yield pad_answers_batch, pad_texts_batch, pad_answers_lengths, pad_texts_lengths
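Padding makes every sequence in a batch as long as the longest one. The lengths used for masking must therefore be measured before padding. If they are measured afterwards, they all equal the padded length and the `<PAD>` positions count in the loss.

Note also that `get_batches` yields `len(texts) // batch_size` full batches and drops the remainder. With the 348 glucose pairs and `batch_size = 16`, that gives 21 batches, and the last 12 pairs are not used in any epoch.

Below is a minimal sketch, with a hypothetical `pad_batch` helper and `<PAD>` id 0.

```python
PAD_ID = 0  # hypothetical <PAD> id


def pad_batch(batch, pad_id=PAD_ID):
    """Right-pad every sequence to the longest one in the batch and return
    the true (unpadded) lengths alongside."""
    lengths = [len(seq) for seq in batch]
    max_len = max(lengths)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    return padded, lengths


padded, lengths = pad_batch([[5, 6, 7], [8]])
print(padded, lengths)
```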

In [None]:
def text_to_seq(text):
    '''Prepare a text for the model, with the same cleaning as the training data (stopwords kept)'''
    
    text = clean_text(text, remove_stopwords = False)
    return [vocab_to_int.get(word, vocab_to_int['<UNK>']) for word in text.split()]

## 6 - Model training

### 6.1. GLUCOSE category

#### 6.1.1. Sentence analysis to choose the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts of the glucose category (<EOS> is appended to the texts)
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_glucose, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_glucose, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect the text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect the answer lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 34521
Total number of UNKs: 0
answers:
           counts
count  360.000000
mean     4.905556
std      2.141765
min      1.000000
25%      4.000000
50%      5.000000
75%      5.250000
max     21.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
6.100000000000023
7.0
14.230000000000075


In [None]:
###############################

# Sort the answers and texts by text length (shortest to longest)
# Limit the length of the answers and texts to the min-max ranges
# Discard the answers and texts that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 204
max_answer_length = 7
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

348
348


#### 6.1.2. Hyperparameter selection

In [None]:
# Choose the hyperparameters
epochs = 100
batch_size = 16
rnn_size = 16
num_layers = 1
learning_rate = 0.005
keep_probability = 0.75

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make it the default graph so that every op below is added to it
with train_graph.as_default():
    
    # Load the model inputs    
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits (the input sequence is reversed)
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Name the training logits and the inference predictions so that they can be retrieved later
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Weights for sequence_loss: 1 for the first answer_length positions of each answer, 0 afterwards
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer: it uses the lr placeholder, so the learning-rate decay fed during training takes effect
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.
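`tf.sequence_mask` turns each length into a row of ones followed by zeros. With its default averaging across timesteps and batch, `sequence_loss` multiplies the per-token softmax cross-entropy by those weights and divides the sum by the sum of the weights. Below is a NumPy sketch of both steps on a toy batch with uniform predictions, where the loss should equal ln 4 for 4 classes. The function names are hypothetical.

```python
import numpy as np


def sequence_mask(lengths, maxlen):
    """1.0 for positions < length, 0.0 afterwards (like tf.sequence_mask)."""
    return (np.arange(maxlen)[None, :] < np.asarray(lengths)[:, None]).astype(np.float32)


def masked_sequence_loss(logits, targets, weights):
    """Weighted softmax cross-entropy, summed and divided by the sum of the weights."""
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    return float((nll * weights).sum() / weights.sum())


mask = sequence_mask([2, 1], maxlen=3)
logits = np.zeros((2, 3, 4))             # uniform predictions over 4 classes
targets = np.array([[1, 2, 0], [3, 0, 0]])
loss = masked_sequence_loss(logits, targets, mask)
print(mask, loss)
```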


#### 6.1.3. Model training
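The schedule used in the training cell below can be checked by hand. The learning rate starts at 0.005, is multiplied by 0.95 after every epoch and never drops below 0.0005. The loss is checked every `update_check` batches. Below is a small sketch of that arithmetic, assuming the 348 glucose pairs and the hyperparameters chosen above. `learning_rate_at` is a hypothetical helper.

```python
def learning_rate_at(epoch, initial=0.005, decay=0.95, minimum=0.0005):
    """Learning rate used during `epoch` (1-based): multiplied by `decay`
    after every epoch and never allowed to go below `minimum`."""
    return max(initial * decay ** (epoch - 1), minimum)


n_batches = 348 // 16                  # full batches per epoch
update_check = n_batches // 3 - 1      # batches between loss checks (per_epoch = 3)
checks = [b for b in range(n_batches) if b % update_check == 0 and b > 0]
first_floor_epoch = next(e for e in range(1, 101) if learning_rate_at(e) == 0.0005)
print(n_batches, update_check, checks, first_floor_epoch)
```

Under these assumptions there are 21 batches per epoch, with checks at batches 6, 12 and 18. This matches the three "Average loss" lines per epoch in the log. The learning rate reaches its floor at epoch 46.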

In [None]:
# Entrenar el modelo
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20 # Comprobar la pérdida de entrenamiento cada 20 batches 
stop_early = 0 
stop = 6 # Si la función de pérdida no decrece después de 6 chequeos consecutivos, parar el entrenamiento 
per_epoch = 3 # Hacer 3 chequeos de actualización en cada epoch 
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0 
batch_loss = 0
answer_update_loss = [] #Guardar la actualización de las pérdidas para mejoras en el modelo 

  
tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_glucose_test13.ckpt"  #300k sentence
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # Si se quiere continuar entrenando una sesión anterior:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)
                
              
                  
                # If the update loss reaches a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate, never letting it fall below its minimum value
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 7.767
New Record!
Average loss for this update: 5.001
New Record!
Average loss for this update: 3.507
New Record!
Epoch   1/100 Batch   20/21 - Loss:  5.141, Seconds: 10.12
Average loss for this update: 3.291
New Record!
Average loss for this update: 2.413
New Record!
Average loss for this update: 2.804
No Improvement.
Epoch   2/100 Batch   20/21 - Loss:  2.777, Seconds: 10.82
Average loss for this update: 2.775
No Improvement.
Average loss for this update: 2.116
New Record!
Average loss for this update: 2.462
No Improvement.
Epoch   3/100 Batch   20/21 - Loss:  2.401, Seconds: 8.14
Average loss for this update: 2.479
No Improvement.
Average loss for this update: 1.879
New Record!
Average loss for this update: 2.203
No Improvement.
Epoch   4/100 Batch   20/21 - Loss:  2.144, Seconds: 14.09
Average loss for this update: 2.233
No Improvement.
Average loss for this update: 1.676
New Record!
Average loss for this update: 1.993
No Improvement.
Epoch   5/100 Bat
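The training loop multiplies the learning rate by `learning_rate_decay` after every epoch and clamps it at `min_learning_rate`. The short plain-Python sketch below, which only reuses the values from the cell above, tabulates that schedule: starting at 0.005 with a factor of 0.95, the rate reaches the 0.0005 floor in epoch 46. Note that the decay only affects training if the optimizer reads the `lr` placeholder fed at every step; an optimizer built from the Python float `learning_rate` keeps its initial value for the whole run.

```python
# Sketch: per-epoch learning-rate schedule of the training cells
# (multiply by the decay after each epoch, never below the minimum).

def lr_schedule(initial_lr, decay, min_lr, epochs):
    """Return the learning rate used in each epoch, in order."""
    rates = []
    lr = initial_lr
    for _ in range(epochs):
        rates.append(lr)                # rate used during this epoch
        lr = max(lr * decay, min_lr)    # decay, clamped at the floor
    return rates


rates = lr_schedule(initial_lr=0.005, decay=0.95, min_lr=0.0005, epochs=100)
# First epoch (1-indexed) that trains at the floor value
first_floor_epoch = next(i for i, r in enumerate(rates, start=1) if r == 0.0005)
print(rates[:3])
print("Learning rate reaches its minimum in epoch", first_floor_epoch)
```

With early stopping, most runs in this section end long before epoch 46, so in practice the rate stays above the floor.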

### 6.2. MOOD category

#### 6.2.1. Sentence analysis for choosing the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_mood, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_mood, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect answer (summary) lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 34095
Total number of UNKs: 0
answers:
           counts
count  360.000000
mean     3.722222
std      1.963612
min      1.000000
25%      3.000000
50%      3.000000
75%      5.000000
max     10.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
5.0
8.0
10.0
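The percentiles above are fractional even though lengths are word counts, because `np.percentile` interpolates linearly by default: for the q-th percentile of n sorted values it reads position (n - 1) * q / 100 and interpolates between the two neighbouring values. A toy check, using hypothetical lengths rather than the notebook's data:

```python
import numpy as np

# Hypothetical toy lengths (not the notebook's data)
lengths = np.array([30, 72, 86, 104, 202])

p95 = np.percentile(lengths, 95)   # default: linear interpolation

# Manual computation of the same value
sorted_vals = np.sort(lengths)
pos = (len(sorted_vals) - 1) * 95 / 100          # 3.8: between the 4th and 5th values
lo, hi = int(np.floor(pos)), int(np.ceil(pos))
manual = sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

print(p95, manual)
```

For the 360 texts used here, the 99th percentile falls at position 359 * 0.99 = 355.41, between two sorted values, which is why values such as 191.05 appear.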


In [None]:
###############################

# Sort answers (summaries) and texts by text length, from shortest to longest
# Limit answer and text lengths to the given min-max ranges
# Drop answer/text pairs that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 204
max_answer_length = 9
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

348
348
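The selection cell relies on `text_length_selection`, which is defined earlier in the notebook and not reproduced in this section. As a hypothetical illustration of the three rules listed in its comments (length limits, UNK budgets, sorting by text length), the sketch below assumes the UNK token has id 1; the real helper may differ in details such as how `min_length` is applied.

```python
# Hypothetical sketch, NOT the notebook's text_length_selection.
UNK = 1  # assumed integer id of the <UNK> token

def count_unk(seq, unk_id=UNK):
    """Number of UNK tokens in an integer sequence."""
    return sum(1 for tok in seq if tok == unk_id)

def select_by_length(int_answers, int_texts, max_text_length, max_answer_length,
                     min_length, unk_text_limit, unk_answer_limit, unk_id=UNK):
    """Keep pairs within the length limits and UNK budgets, sorted by text length."""
    kept = []
    for answer, text in zip(int_answers, int_texts):
        if (min_length <= len(answer) <= max_answer_length
                and min_length <= len(text) <= max_text_length
                and count_unk(answer, unk_id) <= unk_answer_limit
                and count_unk(text, unk_id) <= unk_text_limit):
            kept.append((answer, text))
    kept.sort(key=lambda pair: len(pair[1]))  # shortest texts first
    return [a for a, _ in kept], [t for _, t in kept]


answers = [[5, 6], [7], [8, 9, 10, 11]]
texts = [[1, 2, 3, 4], [2, 3], [4, 5, 6]]
a, t = select_by_length(answers, texts, 10, 3, 1, 5, 2)
print(a, t)  # the third pair is dropped: its answer is longer than 3
```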


#### 6.2.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100
batch_size = 16
rnn_size = 32
num_layers = 1
learning_rate = 0.005
keep_probability = 0.85

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op created below is added to it
with train_graph.as_default():
    
    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Name the training logits and the inference predictions so they can be retrieved by name later
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Create the weights for sequence_loss (1.0 for real tokens, 0.0 for padding)
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer: built from the lr placeholder so that the learning rate fed
        # at each step (and therefore its per-epoch decay) is actually used
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.
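`tf.sequence_mask(answer_length, max_answer_length)` produces 1.0 for the real tokens of each answer and 0.0 for padding, and `sequence_loss`, with its default averaging over time steps and batch, divides the sum of the weighted token cross-entropies by the sum of the weights. The NumPy sketch below reproduces that computation on toy arrays so that padding positions visibly contribute nothing; it is an illustration, not the TensorFlow implementation.

```python
import numpy as np

def sequence_mask(lengths, maxlen):
    """NumPy equivalent of tf.sequence_mask(lengths, maxlen, dtype=tf.float32)."""
    positions = np.arange(maxlen)                                   # 0 .. maxlen-1
    return (positions[None, :] < np.asarray(lengths)[:, None]).astype(np.float32)

def masked_sequence_loss(logits, targets, weights):
    """Weighted mean token cross-entropy: sum(crossent * weights) / sum(weights)."""
    # Numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability of the target token at every (batch, time) position
    token_logp = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    crossent = -token_logp * weights
    total = weights.sum()
    return float(crossent.sum() / total) if total > 0 else 0.0


mask = sequence_mask([3, 1], maxlen=4)   # two answers of length 3 and 1
print(mask)
```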


#### 6.2.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20 # Print the training loss every 20 batches
stop_early = 0
stop = 6 # Stop training if the update loss fails to reach a new minimum in 6 consecutive checks
per_epoch = 3 # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0
batch_loss = 0
answer_update_loss = [] # History of update losses, used to decide whether the model improved

  
tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_mood_test4.ckpt"  # The best model found so far is saved here
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # To resume training from a previous session, uncomment:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)
                
              
                  
                # If the update loss reaches a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate, never letting it fall below its minimum value
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 6.038
New Record!
Average loss for this update: 3.031
New Record!
Average loss for this update: 2.261
New Record!
Epoch   1/100 Batch   20/21 - Loss:  3.565, Seconds: 8.78
Average loss for this update: 2.09
New Record!
Average loss for this update: 1.85
New Record!
Average loss for this update: 1.704
New Record!
Epoch   2/100 Batch   20/21 - Loss:  1.840, Seconds: 13.00
Average loss for this update: 1.726
No Improvement.
Average loss for this update: 1.564
New Record!
Average loss for this update: 1.466
New Record!
Epoch   3/100 Batch   20/21 - Loss:  1.558, Seconds: 11.38
Average loss for this update: 1.501
No Improvement.
Average loss for this update: 1.388
New Record!
Average loss for this update: 1.309
New Record!
Epoch   4/100 Batch   20/21 - Loss:  1.377, Seconds: 9.95
Average loss for this update: 1.337
No Improvement.
Average loss for this update: 1.237
New Record!
Average loss for this update: 1.198
New Record!
Epoch   5/100 Batch   20/21 - Loss: 
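The number of update checks per epoch follows from `update_check = (len(sorted_texts)//batch_size//per_epoch)-1`. The helper below assumes that `get_batches` yields `len(sorted_texts)//batch_size` batches per epoch (the denominator printed in `Batch 20/21`) and lists the batch indices at which a check runs. For the MOOD data (348 pairs, batch size 16) it gives batches 6, 12 and 18, i.e. the three `Average loss for this update` lines per epoch in the log. On very small datasets the formula yields 0 (a `ZeroDivisionError` in `batch_i % update_check`) or -1 (a check at every batch), so the sketch rejects those cases explicitly.

```python
def update_check_batches(n_examples, batch_size, per_epoch):
    """Batch indices (0-based) at which the training loop runs an update check.

    Assumes get_batches yields n_examples // batch_size batches per epoch.
    """
    n_batches = n_examples // batch_size
    update_check = n_batches // per_epoch - 1    # same formula as the notebook
    if update_check <= 0:
        # The notebook would raise ZeroDivisionError (0) or check every batch (-1)
        raise ValueError("too few batches for %d checks per epoch" % per_epoch)
    return [b for b in range(n_batches) if b > 0 and b % update_check == 0]


print(update_check_batches(348, 16, 3))   # MOOD: 348 pairs, batch size 16
print(update_check_batches(360, 8, 3))    # INSULIN: 360 pairs, batch size 8
```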

### 6.3. SPORT category

#### 6.3.1. Sentence analysis for choosing the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_sport, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_sport, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect answer (summary) lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 33327
Total number of UNKs: 2
answers:
           counts
count  360.000000
mean     1.588889
std      1.554180
min      1.000000
25%      1.000000
50%      1.000000
75%      1.000000
max     10.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
4.0
5.0
9.0


In [None]:
###############################

# Sort answers (summaries) and texts by text length, from shortest to longest
# Limit answer and text lengths to the given min-max ranges
# Drop answer/text pairs that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 203
max_answer_length = 5
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

348
348


#### 6.3.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100
batch_size = 16
rnn_size = 16
num_layers = 2
learning_rate = 0.005
keep_probability = 0.85

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op created below is added to it
with train_graph.as_default():
    
    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Name the training logits and the inference predictions so they can be retrieved by name later
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Create the weights for sequence_loss (1.0 for real tokens, 0.0 for padding)
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer: built from the lr placeholder so that the learning rate fed
        # at each step (and therefore its per-epoch decay) is actually used
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Gr
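The graph cells clip every gradient component to [-5, 5] with `tf.clip_by_value`. Element-wise clipping can change the direction of an update, whereas norm-based clipping (`tf.clip_by_norm` or `tf.clip_by_global_norm` in TensorFlow 1.x) rescales the whole vector and keeps its direction. The NumPy comparison below, on a hypothetical toy gradient, is only illustrative and does not imply that either choice is better for these models.

```python
import numpy as np

g = np.array([12.0, 1.0])  # hypothetical toy gradient

# Element-wise clipping, as tf.clip_by_value(grad, -5., 5.) does
by_value = np.clip(g, -5.0, 5.0)

# Norm clipping: rescale the whole vector if its L2 norm exceeds clip_norm
clip_norm = 5.0
norm = np.linalg.norm(g)
by_norm = g * (clip_norm / norm) if norm > clip_norm else g

print("by value:", by_value, "ratio", by_value[0] / by_value[1])
print("by norm: ", by_norm, "ratio", by_norm[0] / by_norm[1])
```

Clipping by value turns the 12:1 ratio between components into 5:1, while clipping by norm keeps the 12:1 ratio and caps the length at 5.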

#### 6.3.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20 # Print the training loss every 20 batches
stop_early = 0
stop = 6 # Stop training if the update loss fails to reach a new minimum in 6 consecutive checks
per_epoch = 3 # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0
batch_loss = 0
answer_update_loss = [] # History of update losses, used to decide whether the model improved

  
tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_sport_test7.ckpt"  # The best model found so far is saved here
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # To resume training from a previous session, uncomment:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)
                
              
                  
                # If the update loss reaches a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate, never letting it fall below its minimum value
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 7.245
New Record!
Average loss for this update: 3.591
New Record!
Average loss for this update: 1.404
New Record!
Epoch   1/100 Batch   20/21 - Loss:  3.880, Seconds: 5.24
Average loss for this update: 1.702
No Improvement.
Average loss for this update: 1.221
New Record!
Average loss for this update: 1.132
New Record!
Epoch   2/100 Batch   20/21 - Loss:  1.359, Seconds: 5.35
Average loss for this update: 1.247
No Improvement.
Average loss for this update: 1.059
New Record!
Average loss for this update: 0.978
New Record!
Epoch   3/100 Batch   20/21 - Loss:  1.113, Seconds: 5.29
Average loss for this update: 1.174
No Improvement.
Average loss for this update: 1.016
No Improvement.
Average loss for this update: 0.92
New Record!
Epoch   4/100 Batch   20/21 - Loss:  1.054, Seconds: 5.40
Average loss for this update: 1.142
No Improvement.
Average loss for this update: 0.969
No Improvement.
Average loss for this update: 0.851
New Record!
Epoch   5/100 Batch   20/
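Sorting the pairs by text length before batching keeps texts of similar length together, so each padded batch wastes few positions. The sketch below is a hypothetical stand-in for `get_batches`, whose code is not part of this section: it assumes PAD id 0 and that an incomplete final batch is dropped, and it yields the same four items the training loop unpacks (padded answers, padded texts, answer lengths, text lengths).

```python
import numpy as np

PAD = 0  # assumed id of the <PAD> token (hypothetical)

def pad_batch(batch, pad_id=PAD):
    """Right-pad every sequence in the batch to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]

def get_batches_sketch(answers, texts, batch_size, pad_id=PAD):
    """Hypothetical stand-in for get_batches: padded batches plus true lengths."""
    n_full = len(texts) // batch_size           # incomplete final batch dropped (assumption)
    for i in range(n_full):
        start = i * batch_size
        answers_batch = answers[start:start + batch_size]
        texts_batch = texts[start:start + batch_size]
        yield (np.array(pad_batch(answers_batch, pad_id)),
               np.array(pad_batch(texts_batch, pad_id)),
               np.array([len(a) for a in answers_batch]),
               np.array([len(t) for t in texts_batch]))


answers = [[1], [2, 3], [4], [5, 6, 7], [8]]
texts = [[9, 9], [9], [9, 9, 9], [9], [9, 9]]
for answers_batch, texts_batch, answers_lengths, texts_lengths in get_batches_sketch(answers, texts, 2):
    print(texts_batch.tolist(), texts_lengths.tolist())
```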

### 6.4. INSULIN category

#### 6.4.1. Sentence analysis for choosing the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_insulin, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_insulin, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect answer (summary) lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 33245
Total number of UNKs: 0
answers:
           counts
count  360.000000
mean     1.361111
std      1.875703
min      1.000000
25%      1.000000
50%      1.000000
75%      1.000000
max     16.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
1.0
1.0
11.82000000000005


In [None]:
###############################

# Sort answers (summaries) and texts by text length, from shortest to longest
# Limit answer and text lengths to the given min-max ranges
# Drop answer/text pairs that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 203
max_answer_length = 16
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

360
360


#### 6.4.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100 ## Lower the number of epochs
batch_size = 8 ## Lower the batch size
rnn_size = 8
num_layers = 1
learning_rate = 0.005
keep_probability = 0.75

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op created below is added to it
with train_graph.as_default():
    
    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Name the training logits and the inference predictions so they can be retrieved by name later
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Create the weights for sequence_loss (1.0 for real tokens, 0.0 for padding)
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer: built from the lr placeholder so that the learning rate fed
        # at each step (and therefore its per-epoch decay) is actually used
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.
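The encoder receives `tf.reverse(input_data, [-1])`, which flips the whole time axis of the padded batch, so PAD tokens end up at the front of each row. If `seq2seq_model` passes `text_length` to a dynamic RNN encoder (its code is not shown in this section), that encoder would process the leading padding and stop before the end of the reversed text, i.e. before the first words of the original text. `tf.reverse_sequence` instead reverses only the first `text_length` positions of each row. The NumPy sketch contrasts both behaviours on a toy batch, assuming PAD id 0.

```python
import numpy as np

batch = np.array([[5, 6, 7, 0, 0],
                  [8, 9, 0, 0, 0]])   # two padded texts (0 = assumed PAD id)
lengths = [3, 2]

# What tf.reverse(input_data, [-1]) does: flip the whole time axis, padding included
full_flip = batch[:, ::-1]

def reverse_sequence(batch, lengths):
    """Length-aware reversal, like tf.reverse_sequence(x, lengths, seq_axis=1, batch_axis=0)."""
    out = batch.copy()
    for i, n in enumerate(lengths):
        out[i, :n] = batch[i, :n][::-1]   # reverse only the real tokens
    return out

len_flip = reverse_sequence(batch, lengths)
print(full_flip.tolist())
print(len_flip.tolist())
```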


#### 6.4.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20 # Print the training loss every 20 batches
stop_early = 0
stop = 6 # Stop training if the update loss fails to reach a new minimum in 6 consecutive checks
per_epoch = 3 # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0
batch_loss = 0
answer_update_loss = [] # History of update losses, used to decide whether the model improved

  
tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_insulin_test3.ckpt"  # The best model found so far is saved here
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # To resume training from a previous session, uncomment:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)
                
              
                  
                # If the update loss reaches a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate, never letting it fall below its minimum value
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 6.31
New Record!
Average loss for this update: 1.663
New Record!
Average loss for this update: 0.408
New Record!
Epoch   1/100 Batch   20/22 - Loss:  2.652, Seconds: 9.54
Average loss for this update: 0.319
New Record!
Average loss for this update: 0.0
New Record!
Average loss for this update: 0.34
No Improvement.
Epoch   2/100 Batch   20/22 - Loss:  0.310, Seconds: 10.28
Average loss for this update: 0.23
No Improvement.
Average loss for this update: 0.004
No Improvement.
Average loss for this update: 0.243
No Improvement.
Epoch   3/100 Batch   20/22 - Loss:  0.225, Seconds: 10.74
Average loss for this update: 0.182
No Improvement.
Average loss for this update: 0.009
No Improvement.
Stopping Training.
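The checkpoint and early-stopping rule can be replayed outside TensorFlow. The function below is a plain-Python sketch of the logic in the training cell; fed with the averages printed by the INSULIN run, it reproduces the log: five new records up to the 0.0 average in epoch 2, then six checks without improvement and the stop during epoch 4. Because every printed average is the update sum divided by the same `update_check`, comparing averages orders the checks as the sums do, up to the 3-decimal rounding of the log.

```python
def replay_update_checks(update_losses, patience=6):
    """Replay the notebook's checkpoint / early-stopping rule.

    Returns the status printed at each check and whether training stopped.
    """
    history = []
    statuses = []
    stop_early = 0
    for loss in update_losses:
        history.append(loss)
        if loss <= min(history):          # new minimum (ties count): save a checkpoint
            statuses.append("New Record!")
            stop_early = 0
        else:
            statuses.append("No Improvement.")
            stop_early += 1               # not reset between epochs, as in the notebook
            if stop_early == patience:
                return statuses, True
    return statuses, False


# Average losses printed by the INSULIN run above
insulin_log = [6.31, 1.663, 0.408, 0.319, 0.0, 0.34, 0.23, 0.004, 0.243, 0.182, 0.009]
statuses, stopped = replay_update_checks(insulin_log)
for loss, status in zip(insulin_log, statuses):
    print(loss, status)
print("Stopped:", stopped)
```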


### 6.5. INSULIN DOSE category

#### 6.5.1. Sentence analysis for choosing the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_insulin_dose, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_insulin_dose, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect answer (summary) lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 33919
Total number of UNKs: 2
answers:
           counts
count  360.000000
mean     3.233333
std      6.436404
min      1.000000
25%      1.000000
50%      1.000000
75%      1.000000
max     40.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
10.100000000000023
17.05000000000001
34.6400000000001
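By default, `np.percentile` interpolates linearly between neighbouring sorted values. This is why fractional results such as 191.05 or 34.64 appear above. When choosing a cap such as `max_answer_length`, it can help to check how many sequences a candidate cap keeps intact. The helper names below are illustrative and are not taken from the notebook.

```python
import numpy as np


def length_percentiles(lengths, percentiles=(90, 95, 99)):
    """Percentiles of sequence lengths, using NumPy's default linear interpolation."""
    lengths = np.asarray(lengths)
    return {p: float(np.percentile(lengths, p)) for p in percentiles}


def coverage(lengths, cap):
    """Fraction of sequences whose length does not exceed `cap`."""
    lengths = np.asarray(lengths)
    return float(np.mean(lengths <= cap))


# Toy lengths: most answers have a single token, a few are longer.
lengths = [1] * 90 + list(range(2, 12))
print(length_percentiles(lengths))
print(coverage(lengths, 10))
```

For the INSULIN DOSE answers above, for instance, the cap of 34 chosen below sits just under the 99th percentile (34.64). Roughly 1% of the answers would therefore exceed it.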


In [None]:
###############################

# Sort the answers and texts by text length (shortest to longest)
# Limit the length of answers and texts to the chosen min-max ranges
# Drop answer/text pairs that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 204
max_answer_length = 34
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

356
356
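`text_length_selection` is defined earlier in the notebook, and its body is not shown here. The comments in the cell above describe three steps: sort pairs by text length, keep only pairs inside the length limits, and drop pairs with too many UNK tokens. A pure-Python sketch of those steps might look like the following. The function name, the `unk_id` argument, and measuring lengths on the integer-encoded sequences are all assumptions.

```python
def select_by_length(int_answers, int_texts, max_text_length, max_answer_length,
                     min_length, unk_text_limit, unk_answer_limit, unk_id):
    """Hypothetical sketch: filter (answer, text) pairs and sort them by text length."""
    pairs = []
    for answer, text in zip(int_answers, int_texts):
        # Keep only pairs whose lengths fall inside the min-max ranges.
        if not (min_length <= len(text) <= max_text_length):
            continue
        if not (min_length <= len(answer) <= max_answer_length):
            continue
        # Drop pairs with too many unknown tokens.
        if text.count(unk_id) > unk_text_limit or answer.count(unk_id) > unk_answer_limit:
            continue
        pairs.append((answer, text))
    # Shortest texts first (stable sort), so each batch holds texts of similar length.
    pairs.sort(key=lambda pair: len(pair[1]))
    return [a for a, _ in pairs], [t for _, t in pairs]


answers = [[5], [6, 7], [0, 0, 0], [8], [9]]
texts = [[1, 2, 3], [1], [1, 2], [1, 2, 3, 4, 5, 6], [0, 0, 4]]
print(select_by_length(answers, texts, 5, 2, 1, 1, 2, unk_id=0))
```

Sorting by length before batching keeps the padding inside each batch small, which is presumably why the notebook sorts the pairs.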


#### 6.5.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100 ## Lower the number
batch_size = 8 ## Lower the size
rnn_size = 32
num_layers = 1
learning_rate = 0.005
keep_probability = 0.85

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op below is added to it
with train_graph.as_default():
    
    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Create named tensors for the training logits and the inference predictions
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Create the weights (masks) for sequence_loss
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer: use the lr placeholder (fed in sess.run) so the per-epoch learning-rate decay takes effect
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.
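Together, `tf.sequence_mask` and `tf.contrib.seq2seq.sequence_loss` compute a cross-entropy that ignores padded target positions. The NumPy sketch below reproduces that computation under the default settings of `sequence_loss`, where the loss is averaged across both timesteps and batch. In that mode, the weighted sum of per-token losses is divided by the sum of the weights. The sketch is meant for intuition only and does not replace the TensorFlow ops.

```python
import numpy as np


def sequence_mask(lengths, maxlen):
    """NumPy analogue of tf.sequence_mask: 1.0 at real positions, 0.0 at padding."""
    lengths = np.asarray(lengths)
    return (np.arange(maxlen)[None, :] < lengths[:, None]).astype(np.float32)


def masked_sequence_loss(logits, targets, weights):
    """Mean cross-entropy over unmasked tokens (sequence_loss's default averaging)."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets)
    weights = np.asarray(weights, dtype=np.float64)
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-probability of the target token at each (batch, time) position.
    token_nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    # Weighted sum divided by the total weight (TensorFlow adds 1e-12 to avoid 0/0).
    return float((token_nll * weights).sum() / (weights.sum() + 1e-12))


# Two answers of lengths 1 and 3, padded to 3 steps, vocabulary of 4 tokens.
mask = sequence_mask([1, 3], 3)
logits = np.zeros((2, 3, 4))
logits[0, 1:, 0] = -100.0  # very poor predictions, but only at padded positions
targets = np.zeros((2, 3), dtype=np.int64)
print(mask)
print(masked_sequence_loss(logits, targets, mask), np.log(4))
```

With the mask applied, the loss equals log 4 (uniform predictions). The poor predictions at padded positions only show up when every position is weighted.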


#### 6.5.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20 # Print the training loss every 20 batches
stop_early = 0
stop = 6 # Stop training if the loss does not improve for 6 consecutive update checks
per_epoch = 3 # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0
batch_loss = 0
answer_update_loss = [] # Store every update loss to detect improvements in the model

  
tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_insulin_dose_test3.ckpt"
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # To resume training from a previous session:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)

                # If this update loss is a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate (never below its minimum value)
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 2.398
New Record!
Epoch   1/100 Batch   20/44 - Loss:  2.089, Seconds: 2.87
Average loss for this update: 1.044
New Record!
Average loss for this update: 0.995
New Record!
Epoch   1/100 Batch   40/44 - Loss:  0.888, Seconds: 4.46
Average loss for this update: 0.319
New Record!
Epoch   2/100 Batch   20/44 - Loss:  0.443, Seconds: 2.90
Average loss for this update: 0.52
No Improvement.
Average loss for this update: 0.735
No Improvement.
Epoch   2/100 Batch   40/44 - Loss:  0.659, Seconds: 4.61
Average loss for this update: 0.306
New Record!
Epoch   3/100 Batch   20/44 - Loss:  0.427, Seconds: 2.98
Average loss for this update: 0.5
No Improvement.
Average loss for this update: 0.721
No Improvement.
Epoch   3/100 Batch   40/44 - Loss:  0.641, Seconds: 4.59
Average loss for this update: 0.311
No Improvement.
Epoch   4/100 Batch   20/44 - Loss:  0.433, Seconds: 2.77
Average loss for this update: 0.499
No Improvement.
Average loss for this update: 0.699
No Improv
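These cells use `learning_rate = 0.005`, `learning_rate_decay = 0.95` and `min_learning_rate = 0.0005`. With these values, the per-epoch decay reaches its floor after a finite number of epochs. The condition 0.005 · 0.95ⁿ < 0.0005 requires 0.95ⁿ < 0.1, that is n > ln 0.1 / ln 0.95 ≈ 44.9. The floor is therefore reached after 45 decays, from epoch 46 onward.

The decayed value only reaches the optimizer if the optimizer is built from the `lr` placeholder fed in `sess.run`. An optimizer built directly from the Python float `learning_rate` keeps the initial rate for the whole run. The helper below is an illustrative sketch of the schedule; it is not a function from the notebook.

```python
def lr_schedule(initial_lr, decay, min_lr, epochs):
    """Learning rate used in each epoch when it is decayed once per epoch and floored."""
    rates = []
    lr = initial_lr
    for _ in range(epochs):
        rates.append(lr)   # rate fed to the optimizer during this epoch
        lr *= decay        # decay at the end of the epoch, as in the training loop
        if lr < min_lr:
            lr = min_lr
    return rates


rates = lr_schedule(0.005, 0.95, 0.0005, 100)
print(rates.index(0.0005) + 1)  # 46: first epoch trained at the floor
```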

### 6.6. BAD FOOD category

#### 6.6.1. Sentence analysis to determine the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_bad_food, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_bad_food, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect answer (summary) lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 33569
Total number of UNKs: 0
answers:
           counts
count  360.000000
mean     2.261111
std      4.493714
min      1.000000
25%      1.000000
50%      1.000000
75%      1.000000
max     35.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
5.0
8.200000000000045
26.410000000000025


In [None]:
###############################

# Sort the answers and texts by text length (shortest to longest)
# Limit the length of answers and texts to the chosen min-max ranges
# Drop answer/text pairs that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 203
max_answer_length = 26
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

356
356


#### 6.6.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100 ## Lower the number
batch_size = 16 ## Lower the size
rnn_size = 32
num_layers = 1
learning_rate = 0.005
keep_probability = 0.99

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op below is added to it
with train_graph.as_default():
    
    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Create named tensors for the training logits and the inference predictions
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Create the weights (masks) for sequence_loss
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer: use the lr placeholder (fed in sess.run) so the per-epoch learning-rate decay takes effect
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.
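The optimization block clips each gradient element-wise to [-5, 5] with `tf.clip_by_value`, and it skips variables whose gradient is `None`. The NumPy sketch below shows the same step. For comparison, it also implements clipping by global norm, which follows the formula of `tf.clip_by_global_norm`. Global-norm clipping is a common alternative that preserves the gradient direction, but this notebook does not use it.

```python
import numpy as np


def clip_by_value(grads_and_vars, low=-5.0, high=5.0):
    """Element-wise clipping, skipping variables without a gradient (as in the notebook)."""
    return [(np.clip(g, low, high), v) for g, v in grads_and_vars if g is not None]


def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients together so that their joint L2 norm is at most clip_norm."""
    global_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm


pairs = [(np.array([10.0, -7.0, 1.0]), "w"), (None, "b")]
print(clip_by_value(pairs))
print(clip_by_global_norm([np.array([3.0, 4.0])], 1.0))
```

Element-wise clipping changes the direction of `[10, -7, 1]` (it becomes `[5, -5, 1]`). Global-norm clipping only shrinks its length.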


#### 6.6.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20 # Print the training loss every 20 batches
stop_early = 0
stop = 6 # Stop training if the loss does not improve for 6 consecutive update checks
per_epoch = 3 # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0
batch_loss = 0
answer_update_loss = [] # Store every update loss to detect improvements in the model

  
tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_bad_food_test5.ckpt"
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # To resume training from a previous session:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)

                # If this update loss is a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate (never below its minimum value)
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 4.304
New Record!
Average loss for this update: 0.775
New Record!
Average loss for this update: 1.793
No Improvement.
Epoch   1/100 Batch   20/22 - Loss:  2.146, Seconds: 5.01
Average loss for this update: 0.894
No Improvement.
Average loss for this update: 0.4
New Record!
Average loss for this update: 0.886
No Improvement.
Epoch   2/100 Batch   20/22 - Loss:  0.716, Seconds: 5.18
Average loss for this update: 0.766
No Improvement.
Average loss for this update: 0.395
New Record!
Average loss for this update: 0.78
No Improvement.
Epoch   3/100 Batch   20/22 - Loss:  0.639, Seconds: 5.38
Average loss for this update: 0.785
No Improvement.
Average loss for this update: 0.352
New Record!
Average loss for this update: 0.738
No Improvement.
Epoch   4/100 Batch   20/22 - Loss:  0.617, Seconds: 5.47
Average loss for this update: 0.714
No Improvement.
Average loss for this update: 0.341
New Record!
Average loss for this update: 0.723
No Improvement.
Epoch   5/100 B

### 6.7. GOOD FOOD category

#### 6.7.1. Sentence analysis to determine the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_good_food, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_good_food, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect answer (summary) lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 33229
Total number of UNKs: 0
answers:
           counts
count  360.000000
mean     1.316667
std      1.513081
min      1.000000
25%      1.000000
50%      1.000000
75%      1.000000
max     13.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
1.0
2.0500000000000114
9.230000000000075
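`convert_to_ints` is defined earlier in the notebook. Judging from how it is called here, it appears to do three things: map each word to its id in `vocab_to_int`, count words and unknown words, and optionally append an end-of-sequence token when `eos=True`. The sketch below is hypothetical and built on those assumptions. The `<UNK>`/`<EOS>` token names, whitespace tokenization, and the function name are guesses. It also shows how the running counters are threaded through successive calls, and how `unk_percent` is derived from them.

```python
def convert_to_ints_sketch(sentences, vocab_to_int, word_count, unk_count, eos=False):
    """Hypothetical: encode sentences as id lists, updating running word/UNK counters."""
    unk_id = vocab_to_int["<UNK>"]
    encoded = []
    for sentence in sentences:
        ids = []
        for word in sentence.split():
            word_count += 1
            if word in vocab_to_int:
                ids.append(vocab_to_int[word])
            else:
                ids.append(unk_id)  # out-of-vocabulary word
                unk_count += 1
        if eos:
            ids.append(vocab_to_int["<EOS>"])  # end-of-sequence marker for texts
        encoded.append(ids)
    return encoded, word_count, unk_count


vocab = {"<UNK>": 0, "<EOS>": 1, "insulin": 2, "dose": 3}
ints, wc, uc = convert_to_ints_sketch(["insulin dose", "dose units"], vocab, 0, 0, eos=True)
unk_percent = round(uc / wc, 4) * 100  # same formula as in the notebook
print(ints, wc, uc, unk_percent)
```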


In [None]:
###############################

# Sort the answers and texts by text length (shortest to longest)
# Limit the length of answers and texts to the chosen min-max ranges
# Drop answer/text pairs that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 203
max_answer_length = 11
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

358
358


#### 6.7.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100 ## Lower the number
batch_size = 16 ## Lower the size
rnn_size = 16
num_layers = 2
learning_rate = 0.005
keep_probability = 0.95

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op below is added to it
with train_graph.as_default():
    
    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Create named tensors for the training logits and the inference predictions
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Create the weights (masks) for sequence_loss
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer: use the lr placeholder (fed in sess.run) so the per-epoch learning-rate decay takes effect
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.
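The encoder receives `tf.reverse(input_data, [-1])`, so each source sequence is read back to front. Suppose `get_batches` right-pads each text to the longest one in its batch (an assumption). Reversing the whole padded row then moves the padding to the front. If the encoder limits itself to the first `text_length` steps, it reads padding and drops the tokens that originally opened the text. `tf.reverse_sequence`, by contrast, reverses only the first `seq_lengths` entries of each row. The NumPy sketch below contrasts the two behaviours. Whether the difference matters here depends on how the encoder defined earlier in the notebook uses `text_length`, which is not shown in this section.

```python
import numpy as np

PAD = 0  # hypothetical padding id


def full_reverse(batch):
    """Reverse each row along the time axis, padding included (like tf.reverse(x, [-1]))."""
    return np.asarray(batch)[:, ::-1]


def length_aware_reverse(batch, lengths):
    """Reverse only the first lengths[i] tokens of row i (like tf.reverse_sequence)."""
    src = np.asarray(batch)
    out = src.copy()
    for i, n in enumerate(lengths):
        out[i, :n] = src[i, :n][::-1]
    return out


batch = np.array([[5, 6, 7, PAD],
                  [8, 9, PAD, PAD]])
lengths = [3, 2]
print(full_reverse(batch))
print(length_aware_reverse(batch, lengths))
```

With the full reverse, the first `lengths[1] = 2` steps of the second row are both padding. With the length-aware reverse, those steps are the original tokens in reverse order. Because the texts are sorted by length before batching, the padding per batch, and therefore the size of this effect, is likely small here.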


#### 6.7.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20 # Print the training loss every 20 batches
stop_early = 0
stop = 6 # Stop training if the loss does not improve for 6 consecutive update checks
per_epoch = 3 # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0
batch_loss = 0
answer_update_loss = [] # Store every update loss to detect improvements in the model

  
tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_good_food_test4.ckpt"
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # To resume training from a previous session:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)

                # If this update loss is a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate (never below its minimum value)
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 6.137
New Record!
Average loss for this update: 1.386
New Record!
Average loss for this update: 0.463
New Record!
Epoch   1/100 Batch   20/22 - Loss:  2.477, Seconds: 4.56
Average loss for this update: 0.947
No Improvement.
Average loss for this update: 0.766
No Improvement.
Average loss for this update: 0.377
New Record!
Epoch   2/100 Batch   20/22 - Loss:  0.672, Seconds: 4.76
Average loss for this update: 0.536
No Improvement.
Average loss for this update: 0.463
No Improvement.
Average loss for this update: 0.275
New Record!
Epoch   3/100 Batch   20/22 - Loss:  0.417, Seconds: 8.41
Average loss for this update: 0.506
No Improvement.
Average loss for this update: 0.447
No Improvement.
Average loss for this update: 0.252
New Record!
Epoch   4/100 Batch   20/22 - Loss:  0.394, Seconds: 4.65
Average loss for this update: 0.489
No Improvement.
Average loss for this update: 0.435
No Improvement.
Average loss for this update: 0.251
New Record!
Epoch   5/100 Ba

### 6.8. REMEDIES LOW category

#### 6.8.1. Sentence analysis to determine the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_remedies_low, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_remedies_low, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect answer (summary) lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 34131
Total number of UNKs: 4
answers:
           counts
count  360.000000
mean     3.822222
std      6.619929
min      1.000000
25%      1.000000
50%      1.000000
75%      4.000000
max     48.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
9.100000000000023
13.100000000000023
41.82000000000005


In [None]:
###############################

# Sort the answers and texts by text length (shortest to longest)
# Limit the length of answers and texts to the chosen min-max ranges
# Drop answer/text pairs that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 203
max_answer_length = 42
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

356
356
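For each batch, the training loop unpacks four values from `get_batches`: padded answers, padded texts, and the true lengths of each. That helper is defined earlier in the notebook. The sketch below is hypothetical and follows a common approach: each batch is padded to its own longest sequence with a padding id, and only full batches are yielded. The function names and the padding id `0` are assumptions.

```python
def pad_batch(sequences, pad_id):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


def get_batches_sketch(answers, texts, batch_size, pad_id=0):
    """Hypothetical: yield (answers, texts, answer_lengths, text_lengths) for full batches only."""
    n_full = len(texts) // batch_size
    for start in range(0, n_full * batch_size, batch_size):
        answers_batch = answers[start:start + batch_size]
        texts_batch = texts[start:start + batch_size]
        yield (pad_batch(answers_batch, pad_id),
               pad_batch(texts_batch, pad_id),
               [len(a) for a in answers_batch],
               [len(t) for t in texts_batch])


answers = [[1], [2, 3], [4], [5, 6, 7], [8]]
texts = [[9], [9, 9], [9, 9], [9, 9, 9], [9]]
for batch in get_batches_sketch(answers, texts, batch_size=2):
    print(batch)
```

Padding per batch rather than to a global maximum keeps short batches cheap. The separate length lists are what allow the masks and `text_length` to ignore the padded positions.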


#### 6.8.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100 ## Lower the number
batch_size = 16 ## Lower the size
rnn_size = 32
num_layers = 2
learning_rate = 0.005
keep_probability = 0.95

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op below is added to it
with train_graph.as_default():
    
    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Create named tensors for the training logits and the inference predictions
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Create the weights (masks) for sequence_loss
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer: use the lr placeholder (fed in sess.run) so the per-epoch learning-rate decay takes effect
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.


#### 6.8.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20 # Print the training loss every 20 batches
stop_early = 0
stop = 6 # Stop training if the loss does not improve for 6 consecutive update checks
per_epoch = 3 # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0
batch_loss = 0
answer_update_loss = [] # Store every update loss to detect improvements in the model

  
tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_remedies_low_test4.ckpt"  #300k sentence
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # Si se quiere continuar entrenando una sesión anterior:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)
                
              
                  
                # If this update's loss is a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate after each epoch, never going below min_learning_rate
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 5.98
New Record!
Average loss for this update: 2.474
New Record!
Average loss for this update: 1.363
New Record!
Epoch   1/100 Batch   20/22 - Loss:  3.090, Seconds: 11.89
Average loss for this update: 1.345
New Record!
Average loss for this update: 1.5
No Improvement.
Average loss for this update: 0.974
New Record!
Epoch   2/100 Batch   20/22 - Loss:  1.263, Seconds: 9.25
Average loss for this update: 1.162
No Improvement.
Average loss for this update: 1.352
No Improvement.
Average loss for this update: 0.906
New Record!
Epoch   3/100 Batch   20/22 - Loss:  1.136, Seconds: 11.11
Average loss for this update: 1.075
No Improvement.
Average loss for this update: 1.284
No Improvement.
Average loss for this update: 0.858
New Record!
Epoch   4/100 Batch   20/22 - Loss:  1.070, Seconds: 9.86
Average loss for this update: 1.017
No Improvement.
Average loss for this update: 1.21
No Improvement.
Average loss for this update: 0.812
New Record!
Epoch   5/100 Batch   
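
The checkpointing and early-stopping rule in the loop above can be replayed offline as a pure function, which makes it easy to check against a printed log. This is a sketch: `early_stopping_trace` is a hypothetical helper that takes the per-check averages. The loop compares sums, but every average is the sum divided by the same `update_check`, so the ordering, and therefore every "New Record!" decision, is the same.

```python
def early_stopping_trace(update_losses, patience=6):
    """Replay the record / no-improvement logic of the training loop.

    Returns the label printed at each check and the index of the check at
    which training would stop (None if patience is never exhausted).
    """
    history = []
    labels = []
    stop_early = 0
    for i, loss in enumerate(update_losses):
        history.append(loss)
        if loss <= min(history):  # new minimum (ties also count as a record)
            labels.append("New Record!")
            stop_early = 0
        else:
            labels.append("No Improvement.")
            stop_early += 1
            if stop_early == patience:
                return labels, i
    return labels, None

# Averages printed in the REMEDIES LOW log above (first five epochs)
logged = [5.98, 2.474, 1.363, 1.345, 1.5, 0.974, 1.162, 1.352,
          0.906, 1.075, 1.284, 0.858, 1.017, 1.21, 0.812]
labels, stopped_at = early_stopping_trace(logged)
print([i for i, label in enumerate(labels) if label == "New Record!"], stopped_at)
```

The replayed labels match the log line for line; with patience 6, training would only stop after six consecutive checks without a new minimum.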

### 6.9. SYMPTOMS LOW category

#### 6.9.1. Sentence-length analysis for choosing the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_symptoms_low, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_symptoms_low, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect answer lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 33605
Total number of UNKs: 2
answers:
           counts
count  360.000000
mean     2.361111
std      2.638401
min      1.000000
25%      1.000000
50%      1.000000
75%      3.000000
max     14.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
6.100000000000023
8.0
12.0
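
The six numbers printed above are the 90th, 95th and 99th percentiles of the text lengths, followed by the same percentiles of the answer lengths. Fractional values such as 154.05 appear because `np.percentile` interpolates linearly between neighbouring sorted values by default. To judge a cut-off such as `max_answer_length`, it can help to measure directly what fraction of sequences it keeps; `coverage` below is a hypothetical helper for that.

```python
import numpy as np

def coverage(lengths, max_len):
    """Fraction of sequences whose length does not exceed max_len."""
    lengths = np.asarray(lengths)
    return float((lengths <= max_len).mean())

# Linear interpolation: the median of [1, 2, 3, 4] falls halfway between 2 and 3
print(np.percentile([1, 2, 3, 4], 50))  # 2.5
print(coverage([1, 1, 3, 14], 12))      # 0.75
```

In the notebook this would be called as, for example, `coverage(lengths_answers.counts, 12)`.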


In [None]:
###############################

# Sort the answers and texts by text length (shortest to longest)
# Limit answer and text lengths to the chosen min-max ranges
# Drop answer/text pairs that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 203
max_answer_length = 12
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

358
358
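
`text_length_selection` is defined earlier in the notebook and is not shown in this section. Purely as an illustration, the three steps its comments describe could look like the sketch below. The name `select_by_length` and the `unk_id` argument (the integer id of the UNK token) are hypothetical, and the real helper may differ, for example in how it handles boundary lengths.

```python
def select_by_length(int_answers, int_texts, max_text_length, max_answer_length,
                     min_length, unk_text_limit, unk_answer_limit, unk_id):
    """Hypothetical sketch of the length/UNK filtering described in the comments."""
    pairs = []
    for answer, text in zip(int_answers, int_texts):
        # Keep only pairs whose lengths fall inside the chosen ranges
        if not (min_length <= len(answer) <= max_answer_length):
            continue
        if not (min_length <= len(text) <= max_text_length):
            continue
        # Drop pairs with too many unknown tokens
        if answer.count(unk_id) > unk_answer_limit or text.count(unk_id) > unk_text_limit:
            continue
        pairs.append((answer, text))
    # Shortest texts first, so that each batch contains texts of similar length
    pairs.sort(key=lambda pair: len(pair[1]))
    return [a for a, _ in pairs], [t for _, t in pairs]
```

Sorting by text length keeps the amount of padding per batch small, since `get_batches` draws consecutive examples.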


#### 6.9.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100  # Consider lowering
batch_size = 16  # Consider reducing
rnn_size = 16
num_layers = 1
learning_rate = 0.005
keep_probability = 0.75

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op defined below is added to it
with train_graph.as_default():

    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits (the input sequence is fed reversed)
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Name the training and inference outputs so they can be fetched after reloading the graph
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')

    # Weights for sequence_loss: 1.0 for real answer tokens, 0.0 for padding
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer: built from the lr placeholder so that the per-epoch decay
        # fed through `lr` in the training loop actually takes effect
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient clipping (element-wise, to [-5, 5])
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.
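
`seq2seq_model` receives `tf.reverse(input_data, [-1])`, so each source sequence is fed back to front, a trick used in early sequence-to-sequence work. Because the batch is padded, flipping the last axis also moves the `<PAD>` ids of shorter rows to the front. The NumPy sketch below contrasts this with a length-aware reversal, which is what `tf.reverse_sequence` does. The token ids (2 for `<EOS>`, 0 for `<PAD>`) are made up, and whether the difference matters depends on how `seq2seq_model` uses `text_length`, which is not shown in this section.

```python
import numpy as np

# Toy padded batch; ids are assumptions: 2 = <EOS>, 0 = <PAD>
batch = np.array([
    [11, 12, 13, 2, 0, 0],
    [21, 22, 23, 24, 25, 2],
])
lengths = [4, 6]  # real tokens per row, including <EOS>

# Mirrors tf.reverse(input_data, [-1]): the whole row is flipped, padding included
flipped = np.flip(batch, axis=-1)

def reverse_valid(batch, lengths):
    """Reverse only the first `length` positions of each row, leaving padding at the end."""
    out = batch.copy()
    for row, n in enumerate(lengths):
        out[row, :n] = batch[row, :n][::-1]
    return out

print(flipped.tolist())                        # [[0, 0, 2, 13, 12, 11], [2, 25, 24, 23, 22, 21]]
print(reverse_valid(batch, lengths).tolist())  # [[2, 13, 12, 11, 0, 0], [2, 25, 24, 23, 22, 21]]
```

Since batches are built from length-sorted texts, rows within a batch tend to have similar lengths, which limits how much padding ends up at the front.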


#### 6.9.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20  # Print the training loss every 20 batches
stop_early = 0
stop = 6  # Stop training if the update loss fails to improve for 6 consecutive checks
per_epoch = 3  # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0
batch_loss = 0
answer_update_loss = []  # History of update losses, used to detect improvements

tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_low_test3.ckpt"
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # Si se quiere continuar entrenando una sesión anterior:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)
                
              
                  
                # If this update's loss is a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate after each epoch, never going below min_learning_rate
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 7.181
New Record!
Average loss for this update: 3.547
New Record!
Average loss for this update: 1.733
New Record!
Epoch   1/100 Batch   20/22 - Loss:  3.891, Seconds: 6.25
Average loss for this update: 1.422
New Record!
Average loss for this update: 1.528
No Improvement.
Average loss for this update: 1.213
New Record!
Epoch   2/100 Batch   20/22 - Loss:  1.350, Seconds: 10.52
Average loss for this update: 1.005
New Record!
Average loss for this update: 1.257
No Improvement.
Average loss for this update: 1.034
No Improvement.
Epoch   3/100 Batch   20/22 - Loss:  1.082, Seconds: 8.96
Average loss for this update: 0.939
New Record!
Average loss for this update: 1.208
No Improvement.
Average loss for this update: 0.985
No Improvement.
Epoch   4/100 Batch   20/22 - Loss:  1.027, Seconds: 10.13
Average loss for this update: 0.892
New Record!
Average loss for this update: 1.171
No Improvement.
Average loss for this update: 0.952
No Improvement.
Epoch   5/100 Batc
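
The printing and checkpointing cadence follows from integer arithmetic on the dataset size. The sketch below recomputes it, assuming `get_batches` yields `len(sorted_texts) // batch_size` full batches per epoch (consistent with the "Batch 20/22" lines in the log); `check_schedule` is a hypothetical helper.

```python
def check_schedule(n_samples, batch_size, per_epoch=3, display_step=20):
    """Batch indices at which the loop prints the running loss and runs an update check."""
    n_batches = n_samples // batch_size
    update_check = n_batches // per_epoch - 1
    display = [b for b in range(n_batches) if b % display_step == 0 and b > 0]
    checks = [b for b in range(n_batches) if b % update_check == 0 and b > 0]
    return n_batches, update_check, display, checks

print(check_schedule(358, 16))  # (22, 6, [20], [6, 12, 18])
```

This matches the log: three update checks per epoch, then one loss line at batch 20. Note that because `update_loss` is reset at the start of each epoch, the first check sums seven batch losses (batches 0 to 6) while still dividing by `update_check = 6`; the next two checks cover six batches each, and batches 19 to 21 never reach a check.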

### 6.10. REMEDIES HIGH category

#### 6.10.1. Sentence-length analysis for choosing the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_remedies_high, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_remedies_high, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect answer lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 33763
Total number of UNKs: 2
answers:
           counts
count  360.000000
mean     2.800000
std      3.598824
min      1.000000
25%      1.000000
50%      1.000000
75%      3.000000
max     19.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
8.0
12.0
15.6400000000001


In [None]:
###############################

# Sort the answers and texts by text length (shortest to longest)
# Limit answer and text lengths to the chosen min-max ranges
# Drop answer/text pairs that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 203
max_answer_length = 16
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

356
356


#### 6.10.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100  # Consider lowering
batch_size = 16  # Consider reducing
rnn_size = 32
num_layers = 1
learning_rate = 0.005
keep_probability = 0.85

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op defined below is added to it
with train_graph.as_default():

    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits (the input sequence is fed reversed)
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Name the training and inference outputs so they can be fetched after reloading the graph
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')

    # Weights for sequence_loss: 1.0 for real answer tokens, 0.0 for padding
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer: built from the lr placeholder so that the per-epoch decay
        # fed through `lr` in the training loop actually takes effect
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient clipping (element-wise, to [-5, 5])
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.
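
The graph clips every gradient element-wise to [-5, 5]. That bounds each component, but when several components saturate it can also change the direction of the update. TensorFlow 1.x additionally provides `tf.clip_by_global_norm`, which rescales all gradients by one common factor and is a frequent choice for recurrent models. The NumPy sketch below compares the two rules on a single gradient; it illustrates the alternative only and is not what this notebook uses.

```python
import numpy as np

def clip_by_value(grads, limit=5.0):
    """Element-wise clipping, as tf.clip_by_value(grad, -5., 5.) applies per tensor."""
    return [np.clip(g, -limit, limit) for g in grads]

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale all gradients together so that their joint L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / global_norm) if global_norm > 0 else 1.0
    return [g * scale for g in grads], global_norm

grads = [np.array([6.0, 8.0])]
print(clip_by_value(grads)[0])           # direction changes: both components become 5
print(clip_by_global_norm(grads)[0][0])  # direction kept: the vector is halved
```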


#### 6.10.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.85
min_learning_rate = 0.0005
display_step = 20  # Print the training loss every 20 batches
stop_early = 0
stop = 6  # Stop training if the update loss fails to improve for 6 consecutive checks
per_epoch = 3  # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0
batch_loss = 0
answer_update_loss = []  # History of update losses, used to detect improvements

tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_remedies_high_test5.ckpt"
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # Si se quiere continuar entrenando una sesión anterior:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)
                
              
                  
                # If this update's loss is a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate after each epoch, never going below min_learning_rate
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 4.24
New Record!
Average loss for this update: 1.044
New Record!
Average loss for this update: 1.702
No Improvement.
Epoch   1/100 Batch   20/22 - Loss:  2.208, Seconds: 10.73
Average loss for this update: 0.998
New Record!
Average loss for this update: 0.723
New Record!
Average loss for this update: 1.349
No Improvement.
Epoch   2/100 Batch   20/22 - Loss:  1.014, Seconds: 10.52
Average loss for this update: 0.907
No Improvement.
Average loss for this update: 0.672
New Record!
Average loss for this update: 1.26
No Improvement.
Epoch   3/100 Batch   20/22 - Loss:  0.936, Seconds: 8.89
Average loss for this update: 0.875
No Improvement.
Average loss for this update: 0.656
New Record!
Average loss for this update: 1.197
No Improvement.
Epoch   4/100 Batch   20/22 - Loss:  0.897, Seconds: 9.33
Average loss for this update: 0.844
No Improvement.
Average loss for this update: 0.634
New Record!
Average loss for this update: 1.155
No Improvement.
Epoch   5/100 Ba
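
This category uses a faster decay (0.85) than the others shown here (0.95). The sketch below replays the per-epoch schedule the training loop computes for `learning_rate` and feeds through `lr`, to show when the 0.0005 floor is reached. `lr_schedule` is a hypothetical helper. Note that this schedule only reaches Adam if the optimizer is built from the `lr` placeholder; an optimizer constructed from the Python float `learning_rate` keeps the value it had when the graph was built.

```python
def lr_schedule(initial_lr, decay, min_lr, epochs):
    """Learning rate used in each epoch: multiplied by `decay` after every
    epoch and clamped so that it never drops below `min_lr`."""
    rates = []
    lr = initial_lr
    for _ in range(epochs):
        rates.append(lr)
        lr *= decay
        if lr < min_lr:
            lr = min_lr
    return rates

fast = lr_schedule(0.005, 0.85, 0.0005, 100)  # REMEDIES HIGH
slow = lr_schedule(0.005, 0.95, 0.0005, 100)  # other categories in this section
# First epoch (1-based) that runs at the floor
print(fast.index(0.0005) + 1, slow.index(0.0005) + 1)  # 16 46
```

With early stopping typically ending training well before epoch 46, the 0.95 decay may never reach the floor in practice.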

### 6.11. SYMPTOMS HIGH category

#### 6.11.1. Sentence-length analysis for choosing the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_symptoms_high, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_symptoms_high, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect answer lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 33441
Total number of UNKs: 0
answers:
           counts
count  360.000000
mean     1.905556
std      2.719367
min      1.000000
25%      1.000000
50%      1.000000
75%      1.000000
max     26.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
5.0
7.050000000000011
11.230000000000075


In [None]:
###############################

# Sort the answers and texts by text length (shortest to longest)
# Limit answer and text lengths to the chosen min-max ranges
# Drop answer/text pairs that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 203
max_answer_length = 12
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

356
356


#### 6.11.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100  # Consider lowering
batch_size = 16  # Consider reducing
rnn_size = 64
num_layers = 1
learning_rate = 0.005
keep_probability = 0.75

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op defined below is added to it
with train_graph.as_default():

    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits (the input sequence is fed reversed)
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Name the training and inference outputs so they can be fetched after reloading the graph
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')

    # Weights for sequence_loss: 1.0 for real answer tokens, 0.0 for padding
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer: built from the lr placeholder so that the per-epoch decay
        # fed through `lr` in the training loop actually takes effect
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient clipping (element-wise, to [-5, 5])
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.


#### 6.11.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20  # Print the training loss every 20 batches
stop_early = 0
stop = 6  # Stop training if the update loss fails to improve for 6 consecutive checks
per_epoch = 3  # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0
batch_loss = 0
answer_update_loss = []  # History of update losses, used to detect improvements

tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_high_test5.ckpt"
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # Si se quiere continuar entrenando una sesión anterior:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)
                
              
                  
                # If this update's loss is a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate after each epoch, never going below min_learning_rate
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 3.475
New Record!
Average loss for this update: 0.674
New Record!
Average loss for this update: 0.964
No Improvement.
Epoch   1/100 Batch   20/22 - Loss:  1.580, Seconds: 8.17
Average loss for this update: 0.791
No Improvement.
Average loss for this update: 0.462
New Record!
Average loss for this update: 0.731
No Improvement.
Epoch   2/100 Batch   20/22 - Loss:  0.636, Seconds: 10.17
Average loss for this update: 0.801
No Improvement.
Average loss for this update: 0.439
New Record!
Average loss for this update: 0.653
No Improvement.
Epoch   3/100 Batch   20/22 - Loss:  0.616, Seconds: 8.18
Average loss for this update: 0.779
No Improvement.
Average loss for this update: 0.473
No Improvement.
Average loss for this update: 0.678
No Improvement.
Epoch   4/100 Batch   20/22 - Loss:  0.612, Seconds: 7.35
Average loss for this update: 0.652
No Improvement.
Average loss for this update: 0.383
New Record!
Average loss for this update: 0.58
No Improvement.
Epoch   

### 6.12. RISK SITUATION category

#### 6.12.1. Sentence analysis to determine the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_risk_situation, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_risk_situation, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect the text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect the answer (summary) lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 33843
Total number of UNKs: 0
answers:
           counts
count  360.000000
mean     3.022222
std      3.570157
min      1.000000
25%      1.000000
50%      1.000000
75%      4.000000
max     20.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
9.0
11.050000000000011
14.82000000000005
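The answer-length cap chosen in the next cell is the 99th percentile printed above rounded up: 14.82 gives `max_answer_length = 15` here, and the GLUCOSE CHECKS analysis later turns 16.23 into 17. The text cap (203) follows a different rule: it is one above the observed maximum text length of 202. A minimal sketch of the percentile rule; `max_length_from_percentile` is a hypothetical helper, not a function defined in the notebook:

```python
import math

import numpy as np


def max_length_from_percentile(lengths, pct=99):
    """Return the pct-th percentile of `lengths`, rounded up to an integer cap."""
    return int(math.ceil(np.percentile(lengths, pct)))


# Rounding the printed 99th percentiles of the answer lengths
print(math.ceil(14.82))  # RISK SITUATION answers -> 15
print(math.ceil(16.23))  # GLUCOSE CHECKS answers -> 17
```

Because `np.percentile` accepts any array-like, the helper could be called directly on a pandas column such as `lengths_answers.counts`.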


In [None]:
###############################

# Sort the answers and texts by text length (shortest to longest)
# Limit the answer and text lengths to the min-max ranges
# Drop the answers and texts that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 203
max_answer_length = 15
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

356
356


#### 6.12.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100 ## Lower the number
batch_size = 16 ## Lower the size
rnn_size = 64
num_layers = 2
learning_rate = 0.005
keep_probability = 0.75

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op below is added to it
with train_graph.as_default():
    
    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Name the training logits and inference predictions so they can be retrieved from a restored graph
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Create the weights (padding masks) for sequence_loss
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

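        # Optimizer: use the lr placeholder (fed with learning_rate in the training loop)
        # so that the per-epoch learning-rate decay actually takes effect
        optimizer = tf.train.AdamOptimizer(lr)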

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.
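`tf.sequence_mask` turns each answer length into a row of 1.0s (real tokens) followed by 0.0s (padding), and `sequence_loss` uses that mask as per-token weights, so padded positions do not contribute to the cost. With its averaging flags left at their defaults, the result is the weighted sum of the token cross-entropies divided by the sum of the weights. The NumPy sketch below mirrors that computation for illustration only; it is not TensorFlow's implementation, and the function names are hypothetical:

```python
import numpy as np


def sequence_mask(lengths, maxlen):
    """1.0 for real time steps and 0.0 for padding, like tf.sequence_mask(..., dtype=tf.float32)."""
    lengths = np.asarray(lengths)
    return (np.arange(maxlen)[None, :] < lengths[:, None]).astype(np.float32)


def masked_sequence_loss(logits, targets, mask):
    """Mean token cross-entropy over the unmasked positions.

    logits: [batch, time, vocab]; targets: [batch, time] integer ids; mask: [batch, time].
    """
    # Numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-probability of the target id at every time step
    nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    # Padded steps have weight 0, so they add nothing to the weighted average
    return float((nll * mask).sum() / mask.sum())


print(sequence_mask([1, 3], 4))
```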


#### 6.12.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20 # Report the training loss every 20 batches
stop_early = 0 
stop = 6 # Stop training if the loss does not reach a new minimum for 6 consecutive checks
per_epoch = 3 # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0 
batch_loss = 0
answer_update_loss = [] # Loss of every update check, used to detect improvements


tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_risk_situation_test5.ckpt"
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # To resume training from a previous session:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)
                
              
                  
                # If this update's loss is a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate (never letting it fall below min_learning_rate)
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 3.871
New Record!
Average loss for this update: 1.655
New Record!
Average loss for this update: 2.166
No Improvement.
Epoch   1/100 Batch   20/22 - Loss:  2.542, Seconds: 10.47
Average loss for this update: 1.106
New Record!
Average loss for this update: 0.962
New Record!
Average loss for this update: 1.725
No Improvement.
Epoch   2/100 Batch   20/22 - Loss:  1.328, Seconds: 9.83
Average loss for this update: 1.13
No Improvement.
Average loss for this update: 0.966
No Improvement.
Average loss for this update: 1.664
No Improvement.
Epoch   3/100 Batch   20/22 - Loss:  1.310, Seconds: 9.71
Average loss for this update: 1.021
No Improvement.
Average loss for this update: 0.912
New Record!
Average loss for this update: 1.608
No Improvement.
Epoch   4/100 Batch   20/22 - Loss:  1.237, Seconds: 8.75
Average loss for this update: 0.924
No Improvement.
Average loss for this update: 0.836
New Record!
Average loss for this update: 1.466
No Improvement.
Epoch   5/10
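With the 356 examples kept and `batch_size = 16`, `update_check = 356 // 16 // 3 - 1 = 6`, so in each epoch of 22 batches the loss is checked at batches 6, 12 and 18. That matches the three "Average loss for this update" lines per epoch in the log. The learning rate starts at 0.005, is multiplied by 0.95 after every epoch and is floored at 0.0005. The plain-Python sketch below (hypothetical helper names) reproduces both calculations. The decayed value only influences training if the optimizer is built from the `lr` placeholder that the loop feeds, not from the Python float `learning_rate`.

```python
def update_check_interval(num_examples, batch_size, per_epoch):
    """Number of batches between loss checks, as computed in the training cell."""
    return (num_examples // batch_size // per_epoch) - 1


def decayed_learning_rates(initial, decay, minimum, epochs):
    """Learning rate used in each epoch: multiplied by `decay` after every epoch, floored at `minimum`."""
    rates = []
    lr = initial
    for _ in range(epochs):
        rates.append(lr)
        lr = max(lr * decay, minimum)
    return rates


interval = update_check_interval(356, 16, 3)
print(interval)  # 6
# Batch indices (0..21) at which the loop runs an update check
print([b for b in range(356 // 16) if b % interval == 0 and b > 0])  # [6, 12, 18]

rates = decayed_learning_rates(0.005, 0.95, 0.0005, 100)
print("First epoch at the floor:", rates.index(0.0005) + 1)
```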

### 6.13. GLUCOSE CHECKS category

#### 6.13.1. Sentence analysis to determine the hyperparameters

In [None]:
# Apply convert_to_ints to the cleaned answers and texts
word_count = 0
unk_count = 0

int_answers, word_count, unk_count = convert_to_ints(cleanA_train_glucose_checks, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(cleanT_train_glucose_checks, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words:", word_count)
print("Total number of UNKs:", unk_count)

#############################

lengths_answers = create_lengths(int_answers)
lengths_texts = create_lengths(int_texts)

print("answers:")
print(lengths_answers.describe())
print()
print("Texts:")
print(lengths_texts.describe())

##############################

# Inspect the text lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

# Inspect the answer (summary) lengths (90th, 95th and 99th percentiles)
print(np.percentile(lengths_answers.counts, 90))
print(np.percentile(lengths_answers.counts, 95))
print(np.percentile(lengths_answers.counts, 99))


Total number of words: 33451
Total number of UNKs: 2
answers:
           counts
count  360.000000
mean     1.933333
std      3.060395
min      1.000000
25%      1.000000
50%      1.000000
75%      1.000000
max     20.000000

Texts:
           counts
count  360.000000
mean    91.986111
std     29.089571
min     30.000000
25%     72.000000
50%     86.000000
75%    104.000000
max    202.000000
134.0
154.05
191.05000000000013
5.0
8.0
16.230000000000075


In [None]:
###############################

# Sort the answers and texts by text length (shortest to longest)
# Limit the answer and text lengths to the min-max ranges
# Drop the answers and texts that contain too many UNKs

sorted_answers = []
sorted_texts = []
max_text_length = 203
max_answer_length = 17
min_length = 1
unk_text_limit = 100
unk_answer_limit = 2

sorted_answers, sorted_texts = text_length_selection(sorted_answers, sorted_texts, max_text_length, max_answer_length, min_length, unk_text_limit, unk_answer_limit, int_answers)
        
# Compare lengths to ensure they match
print(len(sorted_answers))
print(len(sorted_texts))

356
356
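`get_batches` is defined earlier in the notebook and is not shown in this section. The training loop unpacks padded answer and text batches together with their true lengths, so each batch is presumably right-padded to its longest sequence. Sorting the examples by length, as the cell above does, keeps sequences of similar length in the same batch and reduces that padding. A minimal sketch of the padding step, assuming right-padding with the `<PAD>` id; `pad_batch` is a hypothetical helper, not the notebook's implementation:

```python
import numpy as np


def pad_batch(sequences, pad_id):
    """Right-pad integer sequences to the longest one; return the padded array and the true lengths."""
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = seq
    return padded, lengths


batch, lengths = pad_batch([[5, 6, 7], [8]], pad_id=0)
print(batch)
print(lengths)
```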


#### 6.13.2. Hyperparameter selection

In [None]:
# Select the hyperparameters
epochs = 100 ## Lower the number
batch_size = 16 ## Lower the size
rnn_size = 64
num_layers = 2
learning_rate = 0.005
keep_probability = 0.75

In [None]:
# Build the graph
train_graph = tf.Graph()
# Make train_graph the default graph so that every op below is added to it
with train_graph.as_default():
    
    # Load the model inputs
    input_data, targets, lr, keep_prob, answer_length, max_answer_length, text_length = model_inputs()

    # Create the training and inference logits
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      answer_length,
                                                      max_answer_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Name the training logits and inference predictions so they can be retrieved from a restored graph
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Create the weights (padding masks) for sequence_loss
    masks = tf.sequence_mask(answer_length, max_answer_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

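        # Optimizer: use the lr placeholder (fed with learning_rate in the training loop)
        # so that the per-epoch learning-rate decay actually takes effect
        optimizer = tf.train.AdamOptimizer(lr)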

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.
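The graph clips every gradient element independently to [-5, 5] with `tf.clip_by_value`. That bounds each component but can change the direction of the update; TensorFlow also offers `tf.clip_by_global_norm`, which rescales all gradients jointly and keeps their direction. The notebook uses the element-wise version; the NumPy sketch below (hypothetical function names) only contrasts the two behaviors:

```python
import numpy as np


def clip_by_value(grads, limit=5.0):
    """Element-wise clipping, like tf.clip_by_value(grad, -limit, limit) in the graph cell."""
    return [np.clip(g, -limit, limit) for g in grads]


def clip_by_global_norm(grads, clip_norm=5.0):
    """Rescale all gradients together so that their joint L2 norm is at most clip_norm."""
    global_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, clip_norm / global_norm) if global_norm > 0 else 1.0
    return [g * scale for g in grads]


grads = [np.array([10.0, 1.0])]
print(clip_by_value(grads))        # the first component is cut to 5, so the direction changes
print(clip_by_global_norm(grads))  # both components shrink by the same factor
```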


#### 6.13.3. Model training

In [None]:
# Train the model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20 # Report the training loss every 20 batches
stop_early = 0 
stop = 6 # Stop training if the loss does not reach a new minimum for 6 consecutive checks
per_epoch = 3 # Run 3 update checks per epoch
update_check = (len(sorted_texts)//batch_size//per_epoch)-1

update_loss = 0 
batch_loss = 0
answer_update_loss = [] # Loss of every update check, used to detect improvements


tf.reset_default_graph()
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_glucose_checks_test5.ckpt"
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # To resume training from a previous session:
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    # sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (answers_batch, texts_batch, answers_lengths, texts_lengths) in enumerate(
                get_batches(sorted_answers, sorted_texts, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: answers_batch,
                 lr: learning_rate,
                 answer_length: answers_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                answer_update_loss.append(update_loss)
                
              
                  
                # If this update's loss is a new minimum, save the model
                if update_loss <= min(answer_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Decay the learning rate (never letting it fall below min_learning_rate)
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Average loss for this update: 3.12
New Record!
Average loss for this update: 0.626
New Record!
Average loss for this update: 0.728
No Improvement.
Epoch   1/100 Batch   20/22 - Loss:  1.448, Seconds: 8.42
Average loss for this update: 0.426
New Record!
Average loss for this update: 0.22
New Record!
Average loss for this update: 0.493
No Improvement.
Epoch   2/100 Batch   20/22 - Loss:  0.423, Seconds: 8.39
Average loss for this update: 0.334
No Improvement.
Average loss for this update: 0.198
New Record!
Average loss for this update: 0.422
No Improvement.
Epoch   3/100 Batch   20/22 - Loss:  0.359, Seconds: 10.66
Average loss for this update: 0.308
No Improvement.
Average loss for this update: 0.18
New Record!
Average loss for this update: 0.391
No Improvement.
Epoch   4/100 Batch   20/22 - Loss:  0.333, Seconds: 8.71
Average loss for this update: 0.279
No Improvement.
Average loss for this update: 0.17
New Record!
Average loss for this update: 0.351
No Improvement.
Epoch   5/100 Batch
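The early-stopping rule in the training loop counts a check as "New Record!" when its loss is less than or equal to every loss recorded so far. It stops after `stop = 6` consecutive checks without a new record. The loop stores summed losses while the log prints averages, but the two differ only by the constant factor `update_check`, so replaying the printed averages gives the same verdicts. A minimal replay of that rule (the function name is hypothetical), applied to the GLUCOSE CHECKS log above:

```python
def replay_early_stopping(update_losses, patience=6):
    """Replay the training loop's check: a loss <= every previous one is a new record."""
    history, verdicts, stop_early = [], [], 0
    for loss in update_losses:
        history.append(loss)
        if loss <= min(history):
            verdicts.append("New Record!")
            stop_early = 0
        else:
            verdicts.append("No Improvement.")
            stop_early += 1
            if stop_early == patience:
                break
    return verdicts, stop_early == patience


# "Average loss for this update" values printed for GLUCOSE CHECKS (epochs 1 to 5)
glucose_checks_log = [3.12, 0.626, 0.728, 0.426, 0.22, 0.493, 0.334, 0.198,
                      0.422, 0.308, 0.18, 0.391, 0.279, 0.17, 0.351]
verdicts, stopped = replay_early_stopping(glucose_checks_log)
for loss, verdict in zip(glucose_checks_log, verdicts):
    print(loss, verdict)
print("Stopped early:", stopped)
```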

## 7. Summary generation

In [None]:
## Function to generate the answers

def generate_answers(clean_texts, clean_answers, model):
  """Generate an answer for every text with the model saved at the checkpoint path `model`.

  Uses the notebook globals text_to_seq, vocab_to_int, int_to_vocab and batch_size.
  """
  generated_answer = []
  checkpoint = model
  # Padding id, removed from the generated answers
  pad = vocab_to_int["<PAD>"]

  for i in range(len(clean_texts)):
    print(i + 1)  # Progress counter

    text = text_to_seq(clean_texts[i])

    loaded_graph = tf.Graph()
    with tf.Session(graph=loaded_graph) as sess:
      # Load the saved model
      loader = tf.train.import_meta_graph(checkpoint + '.meta')
      loader.restore(sess, checkpoint)

      input_data = loaded_graph.get_tensor_by_name('input:0')
      logits = loaded_graph.get_tensor_by_name('predictions:0')
      text_length = loaded_graph.get_tensor_by_name('text_length:0')
      answer_length = loaded_graph.get_tensor_by_name('answer_length:0')
      keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

      # Repeat the text batch_size times to match the batch size the graph was built with,
      # keep only the first prediction, and disable dropout (keep_prob = 1.0)
      answer_logits = sess.run(logits, {input_data: [text]*batch_size, 
                                        answer_length: [np.random.randint(70,80)], 
                                        text_length: [len(text)]*batch_size,
                                        keep_prob: 1.0})[0] 

    print('\nOriginal Text:', clean_texts[i])
    print('Original answer:', clean_answers[i])

    print('\nText')
    print('  Word Ids:    {}'.format(list(text)))
    print('  Input Words: {}'.format(" ".join([int_to_vocab[w] for w in text])))

    # Drop the padding ids from the prediction
    answer_ids = [w for w in answer_logits if w != pad]

    print('\nanswer')
    print('  Word Ids:       {}'.format(answer_ids))
    print('  Response Words: {}'.format(" ".join([int_to_vocab[w] for w in answer_ids])))

    generated_answer.append(" ".join([int_to_vocab[w] for w in answer_ids]))

  return(generated_answer)
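After decoding, `generate_answers` removes the `<PAD>` ids and maps the remaining ids back to words with `int_to_vocab`. A standalone sketch of that step; the toy vocabulary reuses the word ids printed in the MOOD example further down (`[4, 12, 85, 160, 7]`), while the `<PAD>` id of 0 is an assumption made only for this example:

```python
def ids_to_words(ids, int_to_vocab, pad_id):
    """Drop padding ids and join the remaining tokens, as generate_answers does."""
    return " ".join(int_to_vocab[i] for i in ids if i != pad_id)


# Toy vocabulary: the word ids come from the MOOD output; pad id 0 is hypothetical
toy_vocab = {0: "<PAD>", 4: "i", 12: "do", 85: "not", 160: "feel", 7: "great"}

print(ids_to_words([4, 12, 85, 160, 7, 0, 0], toy_vocab, pad_id=0))
print(repr(ids_to_words([0, 0, 0], toy_vocab, pad_id=0)))
```

A prediction made only of padding becomes an empty string, which is what the SPORT, INSULIN and INSULIN DOSE cells print as "Response Words". Only `<PAD>` is filtered, so any other special token the decoder emits would remain in the answer.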


In [None]:
# Select the hyperparameters (generate_answers uses batch_size to build its input batch)
epochs = 100
batch_size = 16


### 7.1. GLUCOSE category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_glucose_test12.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Names of all the nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_glucose_test12.ckpt


In [None]:
generated_answer_glucose = generate_answers(cleanT_train_glucose, cleanA_train_glucose, "/content/drive/MyDrive/TFM_Diabetes/best_model_glucose_test12.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_glucose_test12.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: a glucose spike

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       [59, 63]
  Response Words: is good
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_glucose_test12.ckpt

Original Text: how is your day going ? not bad and yours ? i m also fine but

In [None]:
generated_answer_glucose_validation = generate_answers(cleanT_test_glucose, cleanA_test_glucose, "/content/drive/MyDrive/TFM_Diabetes/best_model_glucose_test12.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_glucose_test12.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: <PAD>

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       [319, 1, 355]
  Response Words: they are okay
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_glucose_test12.ckpt

Original Text: how are you ? everything fine . and you ? i m very

### 7.2. MOOD category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_mood_test1.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Names of all the nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_mood_test1.ckpt


In [None]:
generated_answer_mood = generate_answers(cleanT_train_mood, cleanA_train_mood, "/content/drive/MyDrive/TFM_Diabetes/best_model_mood_test1.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_mood_test1.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: i do not feel great

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       [4, 12, 85, 160, 7]
  Response Words: i do not feel great
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_mood_test1.ckpt

Original Text: how is your day going ? not bad and yours 

In [None]:
generated_answer_mood_validation = generate_answers(cleanT_test_mood, cleanA_test_mood, "/content/drive/MyDrive/TFM_Diabetes/best_model_mood_test1.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_mood_test1.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: i am right

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       [4, 5, 63]
  Response Words: i am good
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_mood_test1.ckpt

Original Text: how are you ? everything fine . and you ? i m very tired tod
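The `rouge` package imported at the top of the notebook (its `Rouge().get_scores` method) is the natural tool for scoring the generated answers against the original ones. Two cases still need a decision before scoring: `<PAD>` references, as in the GLUCOSE validation example, and empty predictions. The dependency-free sketch below computes a ROUGE-1-style unigram F1. It is a simplified stand-in, not the `rouge` package. The convention that an empty prediction for a `<PAD>` reference scores 1.0 is an assumption.

```python
from collections import Counter


def unigram_f1(reference, hypothesis, pad_token="<PAD>"):
    """ROUGE-1-style F1 on whitespace tokens; a '<PAD>' reference means 'nothing to extract'."""
    ref = [] if reference.strip() == pad_token else reference.split()
    hyp = hypothesis.split()
    if not ref and not hyp:
        return 1.0  # assumption: correctly extracting nothing counts as a perfect match
    if not ref or not hyp:
        return 0.0
    # Clipped overlap: each token counts at most as often as it appears in both strings
    overlap = sum((Counter(ref) & Counter(hyp)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# MOOD validation example above: reference "i am right", prediction "i am good"
print(round(unigram_f1("i am right", "i am good"), 3))
```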

### 7.3. SPORT category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_sport_test3.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Names of all the nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_sport_test3.ckpt


In [None]:
generated_answer_sport = generate_answers(cleanT_train_sport, cleanA_train_sport, "/content/drive/MyDrive/TFM_Diabetes/best_model_sport_test3.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_sport_test3.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: <PAD>

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_sport_test3.ckpt

Original Text: how is your day going ? not bad and yours ? i m also fine but somehow i don t feel quite w

In [None]:
generated_answer_sport_validation = generate_answers(cleanT_test_sport, cleanA_test_sport, "/content/drive/MyDrive/TFM_Diabetes/best_model_sport_test3.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_sport_test3.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: <PAD>

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_sport_test3.ckpt

Original Text: how are you ? everything fine . and you ? i m very tired today . what did you do

### 7.4. INSULIN category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_insulin_test1.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Names of all the nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_insulin_test1.ckpt


In [None]:
generated_answer_insulin = generate_answers(cleanT_train_insulin, cleanA_train_insulin, "/content/drive/MyDrive/TFM_Diabetes/best_model_insulin_test1.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_insulin_test1.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: <PAD>

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_insulin_test1.ckpt

Original Text: how is your day going ? not bad and yours ? i m also fine but somehow i don t feel qui

In [None]:
generated_answer_insulin_validation = generate_answers(cleanT_test_insulin, cleanA_test_insulin, "/content/drive/MyDrive/TFM_Diabetes/best_model_insulin_test1.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_insulin_test1.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: <PAD>

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_insulin_test1.ckpt

Original Text: how are you ? everything fine . and you ? i m very tired today . what did yo

### 7.5. INSULIN DOSE category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_insulin_dose_test4.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Collect the names of all nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_insulin_dose_test4.ckpt


In [None]:
generated_answer_insulin_dose = generate_answers(cleanT_train_insulin_dose, cleanA_train_insulin_dose, "/content/drive/MyDrive/TFM_Diabetes/best_model_insulin_dose_test4.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_insulin_dose_test4.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: <PAD>

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_insulin_dose_test4.ckpt

Original Text: how is your day going ? not bad and yours ? i m also fine but somehow i don 

In [None]:
generated_answer_insulin_dose_validation = generate_answers(cleanT_test_insulin_dose, cleanA_test_insulin_dose, "/content/drive/MyDrive/TFM_Diabetes/best_model_insulin_dose_test4.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_insulin_dose_test4.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: <PAD>

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_insulin_dose_test4.ckpt

Original Text: how are you ? everything fine . and you ? i m very tired today . w

### 7.6. BAD FOOD category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_bad_food_test4.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Collect the names of all nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_bad_food_test4.ckpt


In [None]:
generated_answer_bad_food = generate_answers(cleanT_train_bad_food, cleanA_train_bad_food, "/content/drive/MyDrive/TFM_Diabetes/best_model_bad_food_test4.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_bad_food_test4.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: <PAD>

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_bad_food_test4.ckpt

Original Text: how is your day going ? not bad and yours ? i m also fine but somehow i don t feel q

In [None]:
generated_answer_bad_food_validation = generate_answers(cleanT_test_bad_food, cleanA_test_bad_food, "/content/drive/MyDrive/TFM_Diabetes/best_model_bad_food_test4.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_bad_food_test4.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: <PAD>

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_bad_food_test4.ckpt

Original Text: how are you ? everything fine . and you ? i m very tired today . what did 

### 7.7. GOOD FOOD category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_good_food_test4.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Collect the names of all nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_good_food_test4.ckpt


In [None]:
generated_answer_good_food = generate_answers(cleanT_train_good_food, cleanA_train_good_food, "/content/drive/MyDrive/TFM_Diabetes/best_model_good_food_test4.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_good_food_test4.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: <PAD>

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_good_food_test4.ckpt

Original Text: how is your day going ? not bad and yours ? i m also fine but somehow i don t feel

In [None]:
generated_answer_good_food_validation = generate_answers(cleanT_test_good_food, cleanA_test_good_food, "/content/drive/MyDrive/TFM_Diabetes/best_model_good_food_test4.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_good_food_test4.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: <PAD>

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_good_food_test4.ckpt

Original Text: how are you ? everything fine . and you ? i m very tired today . what di

### 7.8. REMEDIES LOW category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_remedies_low_test2.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Collect the names of all nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_remedies_low_test2.ckpt


In [None]:
generated_answer_remedies_low = generate_answers(cleanT_train_remedies_low, cleanA_train_remedies_low, "/content/drive/MyDrive/TFM_Diabetes/best_model_remedies_low_test2.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_remedies_low_test2.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: <PAD>

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_remedies_low_test2.ckpt

Original Text: how is your day going ? not bad and yours ? i m also fine but somehow i don 

In [None]:
generated_answer_remedies_low_validation = generate_answers(cleanT_test_remedies_low, cleanA_test_remedies_low, "/content/drive/MyDrive/TFM_Diabetes/best_model_remedies_low_test2.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_remedies_low_test2.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: eat something with sugar

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       [116, 14, 148]
  Response Words: eat a juice
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_remedies_low_test2.ckpt

Original Text: how are you ? everything

### 7.9. SYMPTOMS LOW category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_low_test1.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Collect the names of all nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_low_test1.ckpt


In [None]:
generated_answer_symptoms_low = generate_answers(cleanT_train_symptoms_low, cleanA_train_symptoms_low, "/content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_low_test1.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_low_test1.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: <PAD>

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_low_test1.ckpt

Original Text: how is your day going ? not bad and yours ? i m also fine but somehow i don 

In [None]:
generated_answer_symptoms_low_validation = generate_answers(cleanT_test_symptoms_low, cleanA_test_symptoms_low, "/content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_low_test1.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_low_test1.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: <PAD>

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       [4, 160, 14, 382, 362]
  Response Words: i feel a little tired
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_low_test1.ckpt

Original Text: how are you ? everything 
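Each `Word Ids` list above decodes one-to-one into its `Response Words`, for example `[116, 14, 148]` into `eat a juice` (REMEDIES LOW) and `[4, 160, 14, 382, 362]` into `i feel a little tired` (SYMPTOMS LOW). The sketch below shows that lookup with a small hypothetical vocabulary built only from id/word pairs visible in these outputs; the notebook's own mapping is built earlier, and the id given here to `<PAD>` is an assumption.

```python
# Hypothetical partial vocabulary, reconstructed from the id/word pairs printed above.
# The id 0 for <PAD> is an assumption; the notebook builds its own full mapping earlier.
DEMO_INT_TO_VOCAB = {
    0: "<PAD>",
    4: "i", 14: "a", 116: "eat", 148: "juice",
    160: "feel", 362: "tired", 382: "little",
}


def decode_ids(word_ids, vocab=DEMO_INT_TO_VOCAB, skip=("<PAD>",)):
    """Map predicted ids back to words and drop special tokens such as <PAD>."""
    words = [vocab[i] for i in word_ids if vocab[i] not in skip]
    return " ".join(words)


print(decode_ids([116, 14, 148]))          # REMEDIES LOW prediction
print(decode_ids([4, 160, 14, 382, 362]))  # SYMPTOMS LOW prediction
print(repr(decode_ids([])))                # an empty prediction gives an empty string
```

An empty id list decodes to an empty string, which is what the blank `Response Words:` lines in most of the outputs above correspond to.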

### 7.10. REMEDIES HIGH category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_remedies_high_test.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Collect the names of all nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_remedies_high_test.ckpt


In [None]:
generated_answer_remedies_high = generate_answers(cleanT_train_remedies_high, cleanA_train_remedies_high, "/content/drive/MyDrive/TFM_Diabetes/best_model_remedies_high_test.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_remedies_high_test.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: you should eat something and it will improve

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       [2, 88, 116, 246, 23, 20, 58, 186, 88, 121, 31]
  Response Words: you should eat something and it will improve should some sport
2
INFO:tensorflow:Restoring parameters from /content/drive/My

In [None]:
generated_answer_remedies_high_validation = generate_answers(cleanT_test_remedies_high, cleanA_test_remedies_high, "/content/drive/MyDrive/TFM_Diabetes/best_model_remedies_high_test.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_remedies_high_test.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: <PAD>

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_remedies_high_test.ckpt

Original Text: how are you ? everything fine . and you ? i m very tired today . w

### 7.11. SYMPTOMS HIGH category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_high_test1.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Collect the names of all nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_high_test1.ckpt


In [None]:
generated_answer_symptoms_high = generate_answers(cleanT_train_symptoms_high, cleanA_train_symptoms_high, "/content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_high_test1.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_high_test1.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: i have chest pain

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       [4, 109, 576, 577]
  Response Words: i have chest pain
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_high_test1.ckpt

Original Text: how is your day going ? not b

In [None]:
generated_answer_symptoms_high_validation = generate_answers(cleanT_test_symptoms_high, cleanA_test_symptoms_high, "/content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_high_test1.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_high_test1.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: <PAD>

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_symptoms_high_test1.ckpt

Original Text: how are you ? everything fine . and you ? i m very tired today .

### 7.12. RISK SITUATION category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_risk_situation_test4.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Collect the names of all nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_risk_situation_test4.ckpt


In [None]:
generated_answer_risk_situation = generate_answers(cleanT_train_risk_situation, cleanA_train_risk_situation, "/content/drive/MyDrive/TFM_Diabetes/best_model_risk_situation_test4.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_risk_situation_test4.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: i am still fasting

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       [4, 5, 289, 808]
  Response Words: i am still fasting
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_risk_situation_test4.ckpt

Original Text: how is your day going ? not

In [None]:
generated_answer_risk_situation_validation = generate_answers(cleanT_test_risk_situation, cleanA_test_risk_situation, "/content/drive/MyDrive/TFM_Diabetes/best_model_risk_situation_test4.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_risk_situation_test4.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: i took more insulin than needed

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_risk_situation_test4.ckpt

Original Text: how are you ? everything fine . and 

### 7.13. GLUCOSE CHECKS category

In [None]:
checkpoint = "/content/drive/MyDrive/TFM_Diabetes/best_model_glucose_checks_test1.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    # Collect the names of all nodes in the restored graph
    names = [n.name for n in loaded_graph.as_graph_def().node]

INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_glucose_checks_test1.ckpt


In [None]:
generated_answer_glucose_checks = generate_answers(cleanT_train_glucose_checks, cleanA_train_glucose_checks, "/content/drive/MyDrive/TFM_Diabetes/best_model_glucose_checks_test1.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_glucose_checks_test1.ckpt

Original Text: good morning . hi how are you doing ? i do not feel great i have chest pain . did you ate ? no i am still fasting . that is a risk situation of a glucose spike . you should eat something and it will improve . okay thanks . i have to go now . ok have a nice day .
Original answer: <PAD>

Text
  Word Ids:    [63, 18, 9, 486, 3, 160, 7, 576, 577, 9, 79, 3, 289, 808, 9, 809, 810, 34, 490, 9, 116, 246, 186, 9, 355, 96, 9, 126, 9, 118, 67, 124, 9]
  Input Words: good morning . hi ? feel great chest pain . ate ? still fasting . risk situation glucose spike . eat something improve . okay thanks . go . ok nice day .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_glucose_checks_test1.ckpt

Original Text: how is your day going ? not bad and yours ? i m also fine but somehow i 

In [None]:
generated_answer_glucose_checks_validation = generate_answers(cleanT_test_glucose_checks, cleanA_test_glucose_checks, "/content/drive/MyDrive/TFM_Diabetes/best_model_glucose_checks_test1.ckpt")

1
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_glucose_checks_test1.ckpt

Original Text: hello i have a problem . yesterday i took more insulin than needed . oh and how do you feel ? i am right i think . i am a bit tired though . it may be it . do not worry eat something with sugar . maybe a cookie or a juice . ok . thanks for your advice .
Original answer: <PAD>

Text
  Word Ids:    [609, 100, 9, 212, 707, 44, 733, 9, 270, 160, 3, 191, 53, 9, 238, 362, 361, 9, 73, 9, 112, 116, 246, 36, 9, 84, 732, 148, 9, 118, 9, 96, 98, 9]
  Input Words: hello problem . yesterday took insulin needed . oh feel ? right think . bit tired though . may . worry eat something sugar . maybe cookie juice . ok . thanks advice .

answer
  Word Ids:       []
  Response Words: 
2
INFO:tensorflow:Restoring parameters from /content/drive/MyDrive/TFM_Diabetes/best_model_glucose_checks_test1.ckpt

Original Text: how are you ? everything fine . and you ? i m very tired today
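Sections 7.5 to 7.13 repeat the same pattern for every category: restore the category's checkpoint, then call `generate_answers` on the training split and on the test split. A sketch of driving those calls from a single table follows. The checkpoint paths are copied from the cells above; `run_categories` and the layout of its `splits` argument are hypothetical, and `generate_answers` is passed in as a parameter so the loop can be exercised with a stub.

```python
# Checkpoint per category, copied from the cells in sections 7.5-7.13.
CKPT_DIR = "/content/drive/MyDrive/TFM_Diabetes/"
CHECKPOINTS = {
    "insulin_dose": CKPT_DIR + "best_model_insulin_dose_test4.ckpt",
    "bad_food": CKPT_DIR + "best_model_bad_food_test4.ckpt",
    "good_food": CKPT_DIR + "best_model_good_food_test4.ckpt",
    "remedies_low": CKPT_DIR + "best_model_remedies_low_test2.ckpt",
    "symptoms_low": CKPT_DIR + "best_model_symptoms_low_test1.ckpt",
    "remedies_high": CKPT_DIR + "best_model_remedies_high_test.ckpt",
    "symptoms_high": CKPT_DIR + "best_model_symptoms_high_test1.ckpt",
    "risk_situation": CKPT_DIR + "best_model_risk_situation_test4.ckpt",
    "glucose_checks": CKPT_DIR + "best_model_glucose_checks_test1.ckpt",
}


def run_categories(generate_fn, splits, checkpoints=CHECKPOINTS):
    """Hypothetical driver: run generate_fn on the train and test split of each category.

    splits maps a category name to (texts_train, answers_train, texts_test, answers_test).
    Returns {category: (train_predictions, test_predictions)}.
    """
    results = {}
    for category, ckpt in checkpoints.items():
        if category not in splits:
            continue  # skip categories whose data has not been prepared
        t_train, a_train, t_test, a_test = splits[category]
        results[category] = (
            generate_fn(t_train, a_train, ckpt),
            generate_fn(t_test, a_test, ckpt),
        )
    return results
```

In the notebook this would be called as `run_categories(generate_answers, {"bad_food": (cleanT_train_bad_food, cleanA_train_bad_food, cleanT_test_bad_food, cleanA_test_bad_food)})`, adding one entry per category.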

## 8. Model evaluation

In [None]:
# Function to clean the text

def clean_answers(text, remove_stopwords):

  ''' Clean the texts '''

  if text == '':
    text = "nananan"  # Temporary placeholder for empty texts, replaced by <PAD> below
  
  clean = text.lower()  # Convert everything to lowercase

  # Expand contractions
  clean = ' '.join([contraction_mapping[t] if t in contraction_mapping else t for t in clean.split(" ")])

  # Remove special characters (keep letters, digits, spaces, '.' and '?')
  clean = re.sub("[^a-zA-Z 0-9 . ?]", " ", clean)

  # Replace the placeholder with <PAD>
  clean = re.sub(r"nananan", "<PAD>", clean)

  # Optional: remove stop words --> True: texts, False: answers.
  # Stop words carry no information during model training, so they are removed from the texts.
  # In the summaries (answers) they are kept so that these read more naturally.
  if remove_stopwords:
    tokens = [w for w in clean.split() if w not in stop_words]  # Split into tokens and drop stop words
  else:
    tokens = clean.split()  # Split into tokens

  result = " ".join(tokens).strip()

  if result == '':
    result = "<PAD>"
  return (result)
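To see what `clean_answers` does without the rest of the notebook, the sketch below reproduces its steps with a tiny stand-in contraction map and stop-word set (both hypothetical; the notebook uses its full `contraction_mapping` and the NLTK English stop words). It handles empty input directly instead of through the `"nananan"` placeholder, which gives the same `<PAD>` result. Like the original, it never returns an empty string, presumably because the `rouge` package rejects empty hypotheses.

```python
import re

# Hypothetical stand-ins; kept under separate names so the notebook's globals are not overwritten.
DEMO_CONTRACTIONS = {"don't": "do not", "i'm": "i am"}
DEMO_STOP_WORDS = {"i", "am", "do", "not", "a"}


def clean_answers_sketch(text, remove_stopwords,
                         contractions=DEMO_CONTRACTIONS, stops=DEMO_STOP_WORDS):
    """Same steps as clean_answers: lowercase, expand contractions, strip symbols, drop stop words."""
    if text == "":
        return "<PAD>"  # empty answers become the padding token
    clean = text.lower()
    # Expand contractions token by token
    clean = " ".join(contractions.get(t, t) for t in clean.split(" "))
    # Keep only letters, digits, spaces, '.' and '?'
    clean = re.sub(r"[^a-zA-Z0-9 .?]", " ", clean)
    tokens = clean.split()
    if remove_stopwords:
        tokens = [w for w in tokens if w not in stops]
    return " ".join(tokens) or "<PAD>"


print(clean_answers_sketch("I don't feel well!", remove_stopwords=False))
print(clean_answers_sketch("I don't feel well!", remove_stopwords=True))
print(clean_answers_sketch("I am", remove_stopwords=True))
```

One consequence of these steps, shared with the original function: a reference stored literally as `<PAD>` is cleaned to `pad` (lowercased, with `<` and `>` stripped), whereas an empty generated answer becomes `<PAD>`. Whether this affects the scores below depends on how the `cleanA_*` lists store empty answers.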

### 8.1. GLUCOSE category

In [None]:
# Clean generated and reference answers, removing stop words
generated = []
for text in generated_answer_glucose:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_glucose:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_glucose_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_glucose:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.579288315833097,
  'p': 0.6217129629629627,
  'r': 0.6020634920634919},
 'rouge-2': {'f': 0.180498235122324,
  'p': 0.1779166666666667,
  'r': 0.19229497354497355},
 'rouge-l': {'f': 0.5785938713886526,
  'p': 0.620787037037037,
  'r': 0.6015079365079363}}
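ROUGE-1 and ROUGE-2 count the unigrams and bigrams shared by a generated answer and its reference: precision divides the overlap by the length of the generated answer, recall by the length of the reference, and F is their harmonic mean. With `avg=True` every pair is scored separately and the per-pair values are averaged, which is why the averaged F above (0.579) is lower than both the averaged precision (0.622) and recall (0.602): it is the mean of per-pair F values, not the harmonic mean of the averaged P and R. The sketch below is a simplified re-implementation for illustration only, not the code of the `rouge` package, whose tokenisation and smoothing details may differ slightly.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_sketch(hyp, ref, n):
    """Simplified ROUGE-N (F, P, R) for one generated answer and its reference."""
    h, r = ngram_counts(hyp.split(), n), ngram_counts(ref.split(), n)
    overlap = sum((h & r).values())  # clipped counts: min(count in hyp, count in ref)
    p = overlap / sum(h.values()) if h else 0.0
    rec = overlap / sum(r.values()) if r else 0.0
    f = 2 * p * rec / (p + rec) if p + rec else 0.0
    return {"f": f, "p": p, "r": rec}


def average_rouge_n(hyps, refs, n):
    """Score every pair, then average each field (the idea behind avg=True)."""
    scores = [rouge_n_sketch(h, r, n) for h, r in zip(hyps, refs)]
    return {k: sum(s[k] for s in scores) / len(scores) for k in ("f", "p", "r")}


# Generated vs reference answer from the REMEDIES LOW validation output above.
print(rouge_n_sketch("eat a juice", "eat something with sugar", 1))
print(rouge_n_sketch("eat a juice", "eat something with sugar", 2))
```

For this pair, one of the three generated unigrams ("eat") appears among the four reference unigrams, so P = 1/3, R = 1/4 and F = 2/7 ≈ 0.29; no bigram matches, so ROUGE-2 is 0.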

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.27214285457583903,
  'p': 0.2458333333333333,
  'r': 0.3416666666666667},
 'rouge-2': {'f': 0.024999999750000005, 'p': 0.025, 'r': 0.025},
 'rouge-l': {'f': 0.25785714029012474,
  'p': 0.22916666666666666,
  'r': 0.32916666666666666}}
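For GLUCOSE the validation F scores are well below the training ones, which suggests the model generalises poorly beyond the conversations it was trained on. The sketch below quantifies that drop per metric; the values are copied from the two outputs above, and `f_gap` is a hypothetical helper.

```python
# F scores copied from the GLUCOSE outputs above (stop words removed).
glucose_train_f = {"rouge-1": 0.579288315833097, "rouge-2": 0.180498235122324,
                   "rouge-l": 0.5785938713886526}
glucose_val_f = {"rouge-1": 0.27214285457583903, "rouge-2": 0.024999999750000005,
                 "rouge-l": 0.25785714029012474}


def f_gap(train_scores, val_scores):
    """Absolute drop and drop relative to the training value, per metric."""
    out = {}
    for metric, f_train in train_scores.items():
        drop = f_train - val_scores[metric]
        out[metric] = (round(drop, 3), round(drop / f_train, 2))
    return out


for metric, (drop, rel) in f_gap(glucose_train_f, glucose_val_f).items():
    print(f"{metric}: F drops by {drop} ({rel:.0%} of the training value)")
```

With these values the F score falls by about 0.31 for ROUGE-1 (53% of the training value), 0.16 for ROUGE-2 (86%) and 0.32 for ROUGE-L (55%).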

In [None]:
# Clean generated and reference answers, keeping stop words
generated = []
for text in generated_answer_glucose:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_glucose:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_glucose_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_glucose:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.45300218233408623,
  'p': 0.4855753968253966,
  'r': 0.4442487550575782},
 'rouge-2': {'f': 0.26643728947764705,
  'p': 0.2723489858906527,
  'r': 0.2687166253175027},
 'rouge-l': {'f': 0.4526318119637158,
  'p': 0.485019841269841,
  'r': 0.44397097727980045}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.23537878504277354,
  'p': 0.2588095238095238,
  'r': 0.24583333333333335},
 'rouge-2': {'f': 0.061111110379822545,
  'p': 0.057499999999999996,
  'r': 0.06666666666666667},
 'rouge-l': {'f': 0.22628787595186445, 'p': 0.2488095238095238, 'r': 0.2375}}
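ROUGE-L replaces n-gram overlap with the longest common subsequence (LCS) of the two answers, so it rewards words that appear in the same order even when they are not adjacent. Precision and recall are the LCS length divided by the length of the generated and of the reference answer. A minimal sketch of this sentence-level definition follows; the `rouge` package's ROUGE-L may differ in details, so treat the numbers as illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise keep the best of left/up
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_sketch(hyp, ref):
    """Simplified sentence-level ROUGE-L (F, P, R)."""
    h, r = hyp.split(), ref.split()
    lcs = lcs_length(h, r)
    p = lcs / len(h) if h else 0.0
    rec = lcs / len(r) if r else 0.0
    f = 2 * p * rec / (p + rec) if p + rec else 0.0
    return {"f": f, "p": p, "r": rec}


# Pair from the REMEDIES HIGH training output: the model appends extra words to the reference.
print(rouge_l_sketch("you should eat something and it will improve should some sport",
                     "you should eat something and it will improve"))
```

Here the whole 8-word reference is a subsequence of the 11-word prediction, so recall is 1 while precision is 8/11, and F = 16/19 ≈ 0.84: the extra words "should some sport" are penalised only through precision.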

### 8.2. MOOD category

In [None]:
# Clean generated and reference answers, removing stop words
generated = []
for text in generated_answer_mood:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_mood:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_mood_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_mood:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.5185317432703,
  'p': 0.524537037037037,
  'r': 0.5210648148148148},
 'rouge-2': {'f': 0.19768518416821002,
  'p': 0.19722222222222222,
  'r': 0.2013888888888889},
 'rouge-l': {'f': 0.5185317432703,
  'p': 0.524537037037037,
  'r': 0.5210648148148148}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.1733333323711111, 'p': 0.175, 'r': 0.18333333333333332},
 'rouge-2': {'f': 0.04999999975, 'p': 0.05, 'r': 0.05},
 'rouge-l': {'f': 0.1533333323711111, 'p': 0.15, 'r': 0.16666666666666669}}
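The cleaning-and-scoring cell is repeated for every category and for both stopword settings. A hypothetical helper such as `evaluate_category` could factor that pattern out. It receives the cleaning function and the scorer as arguments, so in this notebook they would be `clean_answers` and `Rouge().get_scores(..., avg=True)`. The helper names are illustrative and are not part of the original code.

```python
def clean_all(texts, clean_fn, remove_stopwords):
    """Apply a cleaning function to every text in a list."""
    return [clean_fn(text, remove_stopwords=remove_stopwords) for text in texts]


def evaluate_category(gen_train, ref_train, gen_val, ref_val,
                      clean_fn, score_fn, remove_stopwords):
    """Clean one category's four lists and score training and validation.

    score_fn(hypotheses, references) should return the averaged scores.
    """
    train = score_fn(clean_all(gen_train, clean_fn, remove_stopwords),
                     clean_all(ref_train, clean_fn, remove_stopwords))
    val = score_fn(clean_all(gen_val, clean_fn, remove_stopwords),
                   clean_all(ref_val, clean_fn, remove_stopwords))
    return {"train": train, "validation": val}


# Hypothetical usage with the notebook's own objects:
# scores = evaluate_category(
#     generated_answer_mood, cleanA_train_mood,
#     generated_answer_mood_validation, cleanA_test_mood,
#     clean_fn=clean_answers,
#     score_fn=lambda h, r: Rouge().get_scores(h, r, avg=True),
#     remove_stopwords=True)
```

Looping this call over a list of category names would replace the 24 near-identical cells with a single one.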

In [None]:
# Same cleaning, this time keeping stopwords
generated = []
for text in generated_answer_mood:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_mood:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_mood_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_mood:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.6294563202282404,
  'p': 0.6433101851851852,
  'r': 0.6265674603174602},
 'rouge-2': {'f': 0.4889844539906549,
  'p': 0.494155643738977,
  'r': 0.4889219576719577},
 'rouge-l': {'f': 0.6284462192181394,
  'p': 0.6423842592592592,
  'r': 0.6254563492063492}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.403882779327405,
  'p': 0.43208333333333326,
  'r': 0.4091666666666667},
 'rouge-2': {'f': 0.15749999863500003, 'p': 0.19166666666666668, 'r': 0.15},
 'rouge-l': {'f': 0.3927716682162939,
  'p': 0.41541666666666666,
  'r': 0.40083333333333326}}
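In several outputs the averaged `f` is lower than both the averaged `p` and `r` (for example, MOOD training with stopwords removed: f = 0.5185, p = 0.5245, r = 0.5211). This is consistent with `avg=True` averaging each of `f`, `p` and `r` separately over the answer pairs, so the mean F1 is not the F1 of the mean precision and recall. The sketch below assumes that behaviour and uses two hypothetical pairs.

```python
def f1(p, r):
    """F1 with an assumed 1e-8 smoothing term in the denominator."""
    return 2.0 * p * r / (p + r + 1e-8)


def macro_average(per_pair):
    """Average f, p and r independently over pairs (assumed avg=True behaviour)."""
    n = len(per_pair)
    return {k: sum(scores[k] for scores in per_pair) / n for k in ("f", "p", "r")}


# Two hypothetical pairs: one precise but incomplete, one complete but verbose
pairs = [{"f": f1(1.0, 0.2), "p": 1.0, "r": 0.2},
         {"f": f1(0.2, 1.0), "p": 0.2, "r": 1.0}]
avg = macro_average(pairs)
print(avg)                      # f is about 0.33 while p and r are both 0.6
print(f1(avg["p"], avg["r"]))   # F1 of the averaged p and r is about 0.6
```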

### 8.3. SPORT category

In [None]:
# Clean generated and reference answers (train and validation), removing stopwords
generated = []
for text in generated_answer_sport:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_sport:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_sport_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_sport:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.24483465474632812,
  'p': 0.25833333333333336,
  'r': 0.24481481481481485},
 'rouge-2': {'f': 0.1154541440144287, 'p': 0.11625, 'r': 0.11944444444444445},
 'rouge-l': {'f': 0.2432473531590265,
  'p': 0.2555555555555556,
  'r': 0.24370370370370376}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.08499999927000002,
  'p': 0.09166666666666666,
  'r': 0.08333333333333333},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.08499999927000002,
  'p': 0.09166666666666666,
  'r': 0.08333333333333333}}

In [None]:
# Same cleaning, this time keeping stopwords
generated = []
for text in generated_answer_sport:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_sport:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_sport_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_sport:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.23558701969275514,
  'p': 0.24974867724867728,
  'r': 0.24054894179894185},
 'rouge-2': {'f': 0.11543931311269129,
  'p': 0.11605158730158731,
  'r': 0.11914682539682539},
 'rouge-l': {'f': 0.2328092419149774,
  'p': 0.24511904761904763,
  'r': 0.23838844797178133}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.07619047547644243,
  'p': 0.07833333333333334,
  'r': 0.07916666666666666},
 'rouge-2': {'f': 0.019999999760000005, 'p': 0.025, 'r': 0.016666666666666666},
 'rouge-l': {'f': 0.06507936436533132,
  'p': 0.06833333333333333,
  'r': 0.06666666666666667}}

### 8.4. INSULIN category

In [None]:
# Clean generated and reference answers (train and validation), removing stopwords
generated = []
for text in generated_answer_insulin:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_insulin:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_insulin_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_insulin:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}

In [None]:
# Same cleaning, this time keeping stopwords
generated = []
for text in generated_answer_insulin:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_insulin:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_insulin_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_insulin:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}
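INSULIN scores exactly 0.0 on every metric, in both splits and with both stopword settings. A quick check of individual pairs can show whether the predictions are empty after cleaning or simply share no word with the references. The helper below is a hypothetical diagnostic with toy data. In the notebook it would be called as `zero_overlap_report(generated_val, original_val)`.

```python
def zero_overlap_report(generated, reference, max_examples=3):
    """Summarise pairs whose cleaned texts share no token at all.

    Helps tell empty predictions apart from predictions that are simply wrong.
    """
    if len(generated) != len(reference):
        raise ValueError("generated and reference must have the same length")
    empty = sum(1 for g in generated if not g.split())
    disjoint = [(g, r) for g, r in zip(generated, reference)
                if not set(g.split()) & set(r.split())]
    return {"pairs": len(generated),
            "empty_predictions": empty,
            "no_shared_token": len(disjoint),
            "examples": disjoint[:max_examples]}


# Hypothetical cleaned predictions and references
print(zero_overlap_report(["units 10", "", "insulin 5 units"],
                          ["10 units", "6 units", "12 units"]))
```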

### 8.5. INSULIN DOSE category

In [None]:
# Clean generated and reference answers (train and validation), removing stopwords
generated = []
for text in generated_answer_insulin_dose:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_insulin_dose:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_insulin_dose_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_insulin_dose:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.03896770001207109,
  'p': 0.04490470177970178,
  'r': 0.038445578568127595},
 'rouge-2': {'f': 0.027343949007864934,
  'p': 0.030310846560846563,
  'r': 0.029175420168067227},
 'rouge-l': {'f': 0.03896770001207109,
  'p': 0.04490470177970178,
  'r': 0.038445578568127595}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}

In [None]:
# Same cleaning, this time keeping stopwords
generated = []
for text in generated_answer_insulin_dose:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_insulin_dose:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_insulin_dose_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_insulin_dose:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.03878428684192828,
  'p': 0.04335530936021132,
  'r': 0.03940180718208865},
 'rouge-2': {'f': 0.030287871179556374,
  'p': 0.0314288570906218,
  'r': 0.032669104870085265},
 'rouge-l': {'f': 0.038599101656743094,
  'p': 0.04318169824910021,
  'r': 0.03920339448367595}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}

### 8.6. BAD FOOD category

In [None]:
# Clean generated and reference answers (train and validation), removing stopwords
generated = []
for text in generated_answer_bad_food:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_bad_food:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_bad_food_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_bad_food:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.014277933249196656,
  'p': 0.015277777777777777,
  'r': 0.01950617283950617},
 'rouge-2': {'f': 0.004208754144934191,
  'p': 0.005092592592592592,
  'r': 0.004012345679012346},
 'rouge-l': {'f': 0.014277933249196656,
  'p': 0.015277777777777777,
  'r': 0.01950617283950617}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}

In [None]:
# Same cleaning, this time keeping stopwords
generated = []
for text in generated_answer_bad_food:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_bad_food:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_bad_food_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_bad_food:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.01644273281959789,
  'p': 0.02018849206349206,
  'r': 0.021841018974639664},
 'rouge-2': {'f': 0.00571974801328031,
  'p': 0.008333333333333331,
  'r': 0.005044786634460547},
 'rouge-l': {'f': 0.01578913804835606,
  'p': 0.017966269841269842,
  'r': 0.021457877212187556}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.006666666417777788,
  'p': 0.007142857142857143,
  'r': 0.00625},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.006666666417777788,
  'p': 0.007142857142857143,
  'r': 0.00625}}

### 8.7. GOOD FOOD category

In [None]:
# Clean generated and reference answers (train and validation), removing stopwords
generated = []
for text in generated_answer_good_food:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_good_food:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_good_food_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_good_food:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.06541267339572875,
  'p': 0.0651851851851852,
  'r': 0.06873015873015874},
 'rouge-2': {'f': 0.03311028453407109,
  'p': 0.028478835978835975,
  'r': 0.04583333333333333},
 'rouge-l': {'f': 0.06324624414596616,
  'p': 0.06324074074074075,
  'r': 0.06623015873015874}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.04999999950000001, 'p': 0.05, 'r': 0.05},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.04999999950000001, 'p': 0.05, 'r': 0.05}}

In [None]:
# Same cleaning, this time keeping stopwords
generated = []
for text in generated_answer_good_food:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_good_food:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_good_food_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_good_food:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.06371794837175505,
  'p': 0.0637125220458554,
  'r': 0.06772486772486772},
 'rouge-2': {'f': 0.036364060528982715,
  'p': 0.03258818342151675,
  'r': 0.047169312169312166},
 'rouge-l': {'f': 0.0623646720184787,
  'p': 0.06254409171075838,
  'r': 0.06610449735449736}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.04999999950000001, 'p': 0.05, 'r': 0.05},
 'rouge-2': {'f': 0.019999999760000005, 'p': 0.016666666666666666, 'r': 0.025},
 'rouge-l': {'f': 0.04999999950000001, 'p': 0.05, 'r': 0.05}}

### 8.8. REMEDIES LOW category

In [None]:
# Clean generated and reference answers (train and validation), removing stopwords
generated = []
for text in generated_answer_remedies_low:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_remedies_low:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_remedies_low_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_remedies_low:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.2801307160754639,
  'p': 0.3117592592592592,
  'r': 0.2662043049543049},
 'rouge-2': {'f': 0.22159644777982285,
  'p': 0.24264029180695848,
  'r': 0.21427398989898985},
 'rouge-l': {'f': 0.2791677531125009,
  'p': 0.30951058201058196,
  'r': 0.26558702100368764}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.18550116401471092,
  'p': 0.19333333333333333,
  'r': 0.17976190476190473},
 'rouge-2': {'f': 0.1321428561479592, 'p': 0.13125, 'r': 0.13333333333333336},
 'rouge-l': {'f': 0.18550116401471092,
  'p': 0.19333333333333333,
  'r': 0.17976190476190473}}

In [None]:
# Same cleaning, this time keeping stopwords
generated = []
for text in generated_answer_remedies_low:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_remedies_low:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_remedies_low_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_remedies_low:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.2756203611057964,
  'p': 0.3152089345839346,
  'r': 0.26098664059719723},
 'rouge-2': {'f': 0.23061372830046875,
  'p': 0.2599899172632016,
  'r': 0.22178087884099235},
 'rouge-l': {'f': 0.2743105940427716,
  'p': 0.31173761423761426,
  'r': 0.2601316753366058}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.1832919239765152,
  'p': 0.18083333333333335,
  'r': 0.18765151515151515},
 'rouge-2': {'f': 0.11666666592361112, 'p': 0.11428571428571428, 'r': 0.12},
 'rouge-l': {'f': 0.1832919239765152,
  'p': 0.18083333333333335,
  'r': 0.18765151515151515}}

### 8.9. SYMPTOMS LOW category

In [None]:
# Clean generated and reference answers (train and validation), removing stopwords
generated = []
for text in generated_answer_symptoms_low:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_symptoms_low:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_symptoms_low_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_symptoms_low:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.23555775887107108,
  'p': 0.23796296296296293,
  'r': 0.23828703703703707},
 'rouge-2': {'f': 0.18789882855569442,
  'p': 0.18125661375661378,
  'r': 0.20277777777777778},
 'rouge-l': {'f': 0.23555775887107108,
  'p': 0.23796296296296293,
  'r': 0.23828703703703707}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.11999999926, 'p': 0.125, 'r': 0.11666666666666667},
 'rouge-2': {'f': 0.0999999995, 'p': 0.1, 'r': 0.1},
 'rouge-l': {'f': 0.11999999926, 'p': 0.125, 'r': 0.11666666666666667}}

In [None]:
# Same cleaning, this time keeping stopwords
generated = []
for text in generated_answer_symptoms_low:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_symptoms_low:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_symptoms_low_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_symptoms_low:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.24068021804645623,
  'p': 0.24082070707070705,
  'r': 0.24591209716209722},
 'rouge-2': {'f': 0.21986413992385134,
  'p': 0.21546131337798002,
  'r': 0.23163720538720542},
 'rouge-l': {'f': 0.24006293409583893,
  'p': 0.24031565656565657,
  'r': 0.24511844636844643}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.14831168713330095,
  'p': 0.14242424242424243,
  'r': 0.16666666666666669},
 'rouge-2': {'f': 0.07249999911871143,
  'p': 0.06749999999999999,
  'r': 0.08916666666666666},
 'rouge-l': {'f': 0.14831168713330095,
  'p': 0.14242424242424243,
  'r': 0.16666666666666669}}

### 8.10. REMEDIES HIGH category

In [None]:
# Clean generated and reference answers (train and validation), removing stopwords
generated = []
for text in generated_answer_remedies_high:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_remedies_high:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_remedies_high_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_remedies_high:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.1877186178080461,
  'p': 0.1851281588781589,
  'r': 0.20275132275132274},
 'rouge-2': {'f': 0.15427727321553514,
  'p': 0.15194410527743865,
  'r': 0.17370370370370372},
 'rouge-l': {'f': 0.18531962790905618,
  'p': 0.18294561919561916,
  'r': 0.19997354497354497}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.04999999975, 'p': 0.05, 'r': 0.05},
 'rouge-2': {'f': 0.04999999975, 'p': 0.05, 'r': 0.05},
 'rouge-l': {'f': 0.04999999975, 'p': 0.05, 'r': 0.05}}
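Values such as `f = 0.04999999975` with `p = r = 0.05` are consistent with macro-averaging in which, for instance, one pair out of 20 matches its reference word for word and all the others score zero. An exact match has p = r = 1 and, with an assumed `1e-8` smoothing term, F1 = 2 / (2 + 1e-8) ≈ 0.999999995. Dividing by 20 gives the printed value. The validation-set size of 20 is an assumption used only for this arithmetic; it is not stated in this section.

```python
# F1 of an exact match (p = r = 1) with the assumed 1e-8 smoothing term
perfect_f = 2.0 * 1.0 * 1.0 / (1.0 + 1.0 + 1e-8)

# Hypothetical validation set: 20 pairs, one exact match, the rest scoring 0
n_pairs = 20
scores_f = [perfect_f] + [0.0] * (n_pairs - 1)
scores_p = [1.0] + [0.0] * (n_pairs - 1)

avg_f = sum(scores_f) / n_pairs
avg_p = sum(scores_p) / n_pairs
print(perfect_f, avg_f, avg_p)
```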

In [None]:
# Same cleaning, this time keeping stopwords
generated = []
for text in generated_answer_remedies_high:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_remedies_high:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_remedies_high_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_remedies_high:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.18534084122077862,
  'p': 0.18241177323994656,
  'r': 0.19886895693268244},
 'rouge-2': {'f': 0.16652019922417308,
  'p': 0.16172079402890807,
  'r': 0.18453476806417987},
 'rouge-l': {'f': 0.18349196025425055,
  'p': 0.18071661654478988,
  'r': 0.19680079633511008}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.04999999975, 'p': 0.05, 'r': 0.05},
 'rouge-2': {'f': 0.04999999975, 'p': 0.05, 'r': 0.05},
 'rouge-l': {'f': 0.04999999975, 'p': 0.05, 'r': 0.05}}

### 8.11. SYMPTOMS HIGH category

In [None]:
# Clean generated and reference answers (train and validation), removing stopwords
generated = []
for text in generated_answer_symptoms_high:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_symptoms_high:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_symptoms_high_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_symptoms_high:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.12426807691214936,
  'p': 0.12675925925925927,
  'r': 0.12314814814814813},
 'rouge-2': {'f': 0.08911375612445467,
  'p': 0.08435185185185186,
  'r': 0.09722222222222222},
 'rouge-l': {'f': 0.11982363246770493,
  'p': 0.1212037037037037,
  'r': 0.11944444444444445}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.024999999750000005, 'p': 0.025, 'r': 0.025},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.024999999750000005, 'p': 0.025, 'r': 0.025}}

In [None]:
# Same cleaning, this time keeping stopwords
generated = []
for text in generated_answer_symptoms_high:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_symptoms_high:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_symptoms_high_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_symptoms_high:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.12366079836918664,
  'p': 0.12774911816578483,
  'r': 0.12316358024691355},
 'rouge-2': {'f': 0.10631321911806517,
  'p': 0.10555555555555554,
  'r': 0.11182760141093474},
 'rouge-l': {'f': 0.12076134442806605,
  'p': 0.1239296737213404,
  'r': 0.12074074074074073}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.029999999750000006, 'p': 0.03, 'r': 0.03},
 'rouge-2': {'f': 0.012499999750000004, 'p': 0.0125, 'r': 0.0125},
 'rouge-l': {'f': 0.029999999750000006, 'p': 0.03, 'r': 0.03}}

### 8.12. RISK SITUATION category

In [None]:
# Clean generated and reference answers (train and validation), removing stopwords
generated = []
for text in generated_answer_risk_situation:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_risk_situation:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_risk_situation_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_risk_situation:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.3111226411564894,
  'p': 0.30401124338624336,
  'r': 0.3240277777777778},
 'rouge-2': {'f': 0.24851529874334757,
  'p': 0.23757936507936506,
  'r': 0.2722222222222222},
 'rouge-l': {'f': 0.3111226411564894,
  'p': 0.30401124338624336,
  'r': 0.3240277777777778}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}

In [None]:
# Same cleaning, this time keeping stopwords
generated = []
for text in generated_answer_risk_situation:
  generated.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original = []
for text in cleanA_train_risk_situation:
  original.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_risk_situation_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_risk_situation:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.3148204924129201,
  'p': 0.3089823024198024,
  'r': 0.32447089947089947},
 'rouge-2': {'f': 0.2852436808177155,
  'p': 0.27619062922371745,
  'r': 0.30069444444444443},
 'rouge-l': {'f': 0.3148204924129201,
  'p': 0.3089823024198024,
  'r': 0.32447089947089947}}

In [None]:
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.011111110864197537, 'p': 0.0125, 'r': 0.01},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.011111110864197537, 'p': 0.0125, 'r': 0.01}}

### 8.13. GLUCOSE CHECKS category

In [None]:
# Clean generated and reference answers (train and validation), removing stopwords
generated = []
for text in generated_answer_glucose_checks:
  generated.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original = []
for text in cleanA_train_glucose_checks:
  original.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

generated_val = []
for text in generated_answer_glucose_checks_validation:
  generated_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

original_val = []
for text in cleanA_test_glucose_checks:
  original_val.append(clean_answers(text, remove_stopwords=True))
print("All texts have been processed.")

All texts have been processed.
All texts have been processed.
All texts have been processed.
All texts have been processed.


In [None]:
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.0711633424097013,
  'p': 0.07250000000000001,
  'r': 0.07242283950617284},
 'rouge-2': {'f': 0.0614831152363684,
  'p': 0.05951719576719577,
  'r': 0.06721009700176367},
 'rouge-l': {'f': 0.0711633424097013,
  'p': 0.07250000000000001,
  'r': 0.07242283950617284}}

In [None]:
# Average ROUGE scores over all validation (test) pairs
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}

In [None]:
# Clean the same answers again, this time keeping stopwords
generated = []
for text in generated_answer_glucose_checks:
  generated.append(clean_answers(text, remove_stopwords=False))
print("Todos los textos han sido tratados.")

original = []
for text in cleanA_train_glucose_checks:
  original.append(clean_answers(text, remove_stopwords=False))
print("Todos los textos han sido tratados.")

generated_val = []
for text in generated_answer_glucose_checks_validation:
  generated_val.append(clean_answers(text, remove_stopwords=False))
print("Todos los textos han sido tratados.")

original_val = []
for text in cleanA_test_glucose_checks:
  original_val.append(clean_answers(text, remove_stopwords=False))
print("Todos los textos han sido tratados.")

Todos los textos han sido tratados.
Todos los textos han sido tratados.
Todos los textos han sido tratados.
Todos los textos han sido tratados.


In [None]:
# Average ROUGE scores over all training pairs
rouge = Rouge()
rouge.get_scores(generated, original, avg=True)

{'rouge-1': {'f': 0.06737015514468454,
  'p': 0.06685295414462081,
  'r': 0.07038027761711972},
 'rouge-2': {'f': 0.0592699650736461,
  'p': 0.05647690272690272,
  'r': 0.06590571850988518},
 'rouge-l': {'f': 0.06737015514468454,
  'p': 0.06685295414462081,
  'r': 0.07038027761711972}}

In [None]:
# Average ROUGE scores over all validation (test) pairs
rouge = Rouge()
rouge.get_scores(generated_val, original_val, avg=True)

{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
 'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}
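The cells in section 8 repeat the same pattern for every category: clean the generated and reference answers with `remove_stopwords=True` and again with `remove_stopwords=False`, then score the training and validation pairs. A sketch of a helper that runs this four-way grid in one call follows. `evaluate_category` is hypothetical, and the cleaning and scoring functions are passed in, so in the notebook they could be `clean_answers` and `lambda h, r: Rouge().get_scores(h, r, avg=True)`. The demo uses toy stand-ins (`toy_clean`, `toy_score`, also hypothetical) so that it runs on its own.

```python
from typing import Callable, Dict, List, Tuple


def evaluate_category(
    generated_train: List[str],
    reference_train: List[str],
    generated_val: List[str],
    reference_val: List[str],
    clean: Callable[..., str],
    score: Callable[[List[str], List[str]], dict],
) -> Dict[Tuple[str, bool], dict]:
    """Score train and validation splits, with and without stopword removal."""
    splits = {
        "train": (generated_train, reference_train),
        "validation": (generated_val, reference_val),
    }
    results = {}
    for remove_stopwords in (True, False):
        for split, (generated, reference) in splits.items():
            # Clean both sides the same way before scoring
            hyps = [clean(text, remove_stopwords=remove_stopwords) for text in generated]
            refs = [clean(text, remove_stopwords=remove_stopwords) for text in reference]
            results[(split, remove_stopwords)] = score(hyps, refs)
    return results


# Toy stand-ins for clean_answers and Rouge().get_scores (assumptions, not the notebook's code)
STOPWORDS = {"i", "a", "is", "my"}


def toy_clean(text: str, remove_stopwords: bool = True) -> str:
    words = text.lower().split()
    if remove_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return " ".join(words)


def toy_score(hyps: List[str], refs: List[str]) -> dict:
    # Fraction of pairs that match exactly after cleaning
    matches = sum(h == r for h, r in zip(hyps, refs))
    return {"exact_match": matches / len(hyps) if hyps else 0.0}


scores = evaluate_category(
    ["My glucose is high"], ["glucose high"],
    ["I feel tired"], ["i feel very tired"],
    clean=toy_clean, score=toy_score,
)
for key, value in scores.items():
    print(key, value)
```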

## 9. Saving the results

In [None]:
# Save each category's generated answers (train and test) as CSV files in Google Drive
glucose_Train = pd.DataFrame(generated_answer_glucose)
glucose_Test = pd.DataFrame(generated_answer_glucose_validation)

glucose_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/glucose_Train_0622.csv', index=False)
glucose_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/glucose_Test_0622.csv', index=False)

In [None]:
mood_Train = pd.DataFrame(generated_answer_mood)
mood_Test = pd.DataFrame(generated_answer_mood_validation)

mood_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/mood_Train_0622.csv', index=False)
mood_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/mood_Test_0622.csv', index=False)

In [None]:
sport_Train = pd.DataFrame(generated_answer_sport)
sport_Test = pd.DataFrame(generated_answer_sport_validation)

sport_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/sport_Train_0622.csv', index=False)
sport_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/sport_Test_0622.csv', index=False)

In [None]:
insulin_Train = pd.DataFrame(generated_answer_insulin)
insulin_Test = pd.DataFrame(generated_answer_insulin_validation)

insulin_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/insulin_Train_0622.csv', index=False)
insulin_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/insulin_Test_0622.csv', index=False)

In [None]:
insulin_dose_Train = pd.DataFrame(generated_answer_insulin_dose)
insulin_dose_Test = pd.DataFrame(generated_answer_insulin_dose_validation)

insulin_dose_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/insulin_dose_Train_0622.csv', index=False)
insulin_dose_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/insulin_dose_Test_0622.csv', index=False)

In [None]:
bad_food_Train = pd.DataFrame(generated_answer_bad_food)
bad_food_Test = pd.DataFrame(generated_answer_bad_food_validation)

bad_food_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/bad_food_Train_0622.csv', index=False)
bad_food_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/bad_food_Test_0622.csv', index=False)

In [None]:
good_food_Train = pd.DataFrame(generated_answer_good_food)
good_food_Test = pd.DataFrame(generated_answer_good_food_validation)

good_food_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/good_food_Train_0622.csv', index=False)
good_food_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/good_food_Test_0622.csv', index=False)

In [None]:
remedies_low_Train = pd.DataFrame(generated_answer_remedies_low)
remedies_low_Test = pd.DataFrame(generated_answer_remedies_low_validation)

remedies_low_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/remedies_low_Train_0622.csv', index=False)
remedies_low_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/remedies_low_Test_0622.csv', index=False)

In [None]:
symptoms_low_Train = pd.DataFrame(generated_answer_symptoms_low)
symptoms_low_Test = pd.DataFrame(generated_answer_symptoms_low_validation)

symptoms_low_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/symptoms_low_Train_0622.csv', index=False)
symptoms_low_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/symptoms_low_Test_0622.csv', index=False)

In [None]:
remedies_high_Train = pd.DataFrame(generated_answer_remedies_high)
remedies_high_Test = pd.DataFrame(generated_answer_remedies_high_validation)

remedies_high_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/remedies_high_Train_0622.csv', index=False)
remedies_high_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/remedies_high_Test_0622.csv', index=False)

In [None]:
symptoms_high_Train = pd.DataFrame(generated_answer_symptoms_high)
symptoms_high_Test = pd.DataFrame(generated_answer_symptoms_high_validation)

symptoms_high_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/symptoms_high_Train_0622.csv', index=False)
symptoms_high_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/symptoms_high_Test_0622.csv', index=False)

In [None]:
risk_situation_Train = pd.DataFrame(generated_answer_risk_situation)
risk_situation_Test = pd.DataFrame(generated_answer_risk_situation_validation)

risk_situation_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/risk_situation_Train_0622.csv', index=False)
risk_situation_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/risk_situation_Test_0622.csv', index=False)

In [None]:
glucose_checks_Train = pd.DataFrame(generated_answer_glucose_checks)
glucose_checks_Test = pd.DataFrame(generated_answer_glucose_checks_validation)

glucose_checks_Train.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/glucose_checks_Train_0622.csv', index=False)
glucose_checks_Test.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/glucose_checks_Test_0622.csv', index=False)
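The thirteen save cells above differ only in the category name. A sketch of a loop that writes the same `<category>_Train_0622.csv` / `<category>_Test_0622.csv` pairs from a dictionary is shown below. `save_category_results` and the layout of its `results` dictionary are hypothetical, and a temporary directory stands in for `/content/drive/MyDrive/TFM_Diabetes/results/`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_category_results(results: dict, out_dir: Path, suffix: str = "0622") -> list:
    """Write <category>_Train_<suffix>.csv and <category>_Test_<suffix>.csv per category.

    `results` maps a category name to a (train_answers, test_answers) pair of lists.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for category, (train_answers, test_answers) in results.items():
        for split, answers in (("Train", train_answers), ("Test", test_answers)):
            path = out_dir / f"{category}_{split}_{suffix}.csv"
            # Same call as in the cells above: one unnamed column, no index
            pd.DataFrame(answers).to_csv(path, index=False)
            written.append(path)
    return written


# Toy data; in the notebook a pair would be e.g.
# (generated_answer_glucose, generated_answer_glucose_validation)
demo_results = {
    "glucose": (["my glucose is ok", "is low"], ["they are okay"]),
    "mood": (["i am sad", "i feel good"], ["i am fine"]),
}
with tempfile.TemporaryDirectory() as tmp:
    for path in save_category_results(demo_results, Path(tmp)):
        print(path.name, len(pd.read_csv(path)))
```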

## 10. Building the final matrix

In [None]:
# Training matrix: one row per dialogue, one column per extracted category
matriz = pd.DataFrame()
matriz["Dialogue"] = cleanT_train_glucose
matriz["Mood"] = generated_answer_mood
matriz["Glucose"] = generated_answer_glucose
matriz["Insulin"] = generated_answer_insulin
matriz["Insulin dose"] = generated_answer_insulin_dose
matriz["Glucose checks"] = generated_answer_glucose_checks
matriz["Symptoms low blood sugar"] = generated_answer_symptoms_low
matriz["Remedies low blood sugar"] = generated_answer_remedies_low
matriz["Symptoms high blood sugar"] = generated_answer_symptoms_high
matriz["Remedies high blood sugar"] = generated_answer_remedies_high
matriz["Risk situation"] = generated_answer_risk_situation
matriz["Good food"] = generated_answer_good_food
matriz["Bad food"] = generated_answer_bad_food
matriz["Sport"] = generated_answer_sport
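The assembly above assigns one list per column, and pandas rejects a list whose length differs from the index created by the first `Dialogue` assignment. The sketch below builds the same kind of table from a dictionary in one call and checks lengths first, so that the error names the offending category. `build_matrix` is a hypothetical helper, and the toy rows are abbreviated from the first rows of `matriz`.

```python
import pandas as pd


def build_matrix(dialogues: list, answers_by_column: dict) -> pd.DataFrame:
    """One row per dialogue, one column per category, in the given column order."""
    for column, answers in answers_by_column.items():
        # Every category must have exactly one generated answer per dialogue
        if len(answers) != len(dialogues):
            raise ValueError(
                f"{column!r} has {len(answers)} answers for {len(dialogues)} dialogues"
            )
    return pd.DataFrame({"Dialogue": dialogues, **answers_by_column})


demo_matrix = build_matrix(
    ["good morning . hi how are you doing ?", "how is your day going ?"],
    {"Mood": ["i do not feel great", "i am not well"], "Glucose": ["is good", ""]},
)
print(demo_matrix)
```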

In [None]:
matriz

| | Dialogue | Mood | Glucose | Insulin | Insulin dose | Glucose checks | Symptoms low blood sugar | Remedies low blood sugar | Symptoms high blood sugar | Remedies high blood sugar | Risk situation | Good food | Bad food | Sport |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | good morning . hi how are you doing ? i do not... | i do not feel great | is good |  |  |  |  |  | i have chest pain | you should eat something and it will improve s... | i am still fasting |  |  |  |
| 1 | how is your day going ? not bad and yours ? i ... | i am not well |  |  |  |  |  | drink water sugar |  |  | i am still fasting |  |  |  |
| 2 | good afternoon . how are you feeling ? i am go... | i am good | my blood sugar sugar was good |  |  |  |  |  |  |  |  |  |  | walk 1 km per day per day per day per day per ... |
| 3 | hello . hi . how are you ? i am sad . why is t... | i am sad | is low |  |  |  | i have a lack of motivation | drink some carbohydrates as a your diet |  |  | the stress of a new job could impact in your g... |  |  |  |
| 4 | hey how have you been ? i have been stressed l... | i have been stressed lately | it low glucose blood glucose blood glucose blo... |  |  |  | i have dizziness | eat some carbohydrates as bread or rice |  |  |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 355 | good night . hey how was your day ? great than... | great | they |  |  |  | i am very tired | eat a snack |  |  |  |  |  |  |
| 356 | what are you doing right now ? i have just coo... |  | my blood glucose is quite high |  |  |  |  |  |  |  | potatoes have a lot of carbohydrates |  |  |  |
| 357 | hey i need to talk to you . i do not feel good... | i do not feel great | they are very low |  |  |  | i feel physical weakness | eat some carbohydrates as bread or pasta |  |  |  |  |  |  |
| 358 | good morning . hey how are you ? i am very goo... | i feel good | my glucose are okay |  |  |  |  |  |  |  |  | dark chocolate |  |  |


In [None]:
# Same matrix for the test (validation) dialogues
T_matriz = pd.DataFrame()
T_matriz["Dialogue"] = cleanT_test_glucose
T_matriz["Mood"] = generated_answer_mood_validation
T_matriz["Glucose"] = generated_answer_glucose_validation
T_matriz["Insulin"] = generated_answer_insulin_validation
T_matriz["Insulin dose"] = generated_answer_insulin_dose_validation
T_matriz["Glucose checks"] = generated_answer_glucose_checks_validation
T_matriz["Symptoms low blood sugar"] = generated_answer_symptoms_low_validation
T_matriz["Remedies low blood sugar"] = generated_answer_remedies_low_validation
T_matriz["Symptoms high blood sugar"] = generated_answer_symptoms_high_validation
T_matriz["Remedies high blood sugar"] = generated_answer_remedies_high_validation
T_matriz["Risk situation"] = generated_answer_risk_situation_validation
T_matriz["Good food"] = generated_answer_good_food_validation
T_matriz["Bad food"] = generated_answer_bad_food_validation
T_matriz["Sport"] = generated_answer_sport_validation

In [None]:
T_matriz.head(20)

| | Dialogue | Mood | Glucose | Insulin | Insulin dose | Glucose checks | Symptoms low blood sugar | Remedies low blood sugar | Symptoms high blood sugar | Remedies high blood sugar | Risk situation | Good food | Bad food | Sport |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | hello i have a problem . yesterday i took more... | i am good | they are okay |  |  |  | i feel a little tired | eat a juice |  |  |  |  |  |  |
| 1 | how are you ? everything fine . and you ? i m ... | i am not okay | i |  |  |  |  | drink some sugar to recover more more more mor... |  |  | calculated my dinner insulin dose incorrectly |  | pasta eggs with some pasta with some pasta wit... |  |
| 2 | hi how are you doing ? i have not been better ... | i am not well | my blood glucose is very high |  |  |  |  | eat some sugar |  |  |  |  |  |  |
| 3 | hello . how are you ? i am okay thanks for ask... | i feel good | my glucose is ok |  |  |  |  |  |  |  |  |  |  | gym exercises in the gym some exercise in the ... |
| 4 | good evening . how are you ? i am good . i nee... | i do not feel ok | my blood glucose is high |  |  |  |  |  |  |  | i did not take the necessary insulin |  |  | play baskeball |
| 5 | hello how are you feeling today ? i am okay bu... | i am fine | my blood glucose is high |  |  |  |  |  | i am a little sleepy | walk around 30 minutes every day |  |  |  | run a walk |
| 6 | hey how are you ? i am not doing well . yester... | i have not feel good | my blood sugar sugar low blood blood glucose b... |  |  |  | i have lack of focus and i `<UNK>` `<UNK>` concent... | drink some sugar water |  | go for a walk or go insulin . minutes ago day | i am still fasting |  |  |  |
| 7 | hi i do not feel well . what is happening to y... | i am okay | they |  |  |  | i feel a little tired | take some sugar as candy |  |  |  |  |  |  |
| 8 | hey how have you been ? i am great i just fini... | i do not feel good | my blood sugar sugar was low after lunch |  |  |  |  |  |  |  |  | raspberries and blueberries and blueberries an... |  |  |
| 9 | hi what is up ? i do not feel good i have a ve... | i do not feel good |  |  |  |  |  | drink water sugar |  | inject an extra dose of insulin |  |  |  |  |


In [None]:
matriz.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/resultadosTrain_280622.csv', index=False)
T_matriz.to_csv('/content/drive/MyDrive/TFM_Diabetes/results/resultadosTest_280622.csv', index=False)
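Cells for categories in which the model generated nothing are written as empty fields. When these CSV files are loaded again, `pd.read_csv` treats empty fields as NaN by default, while passing `keep_default_na=False` keeps them as empty strings. A minimal round-trip sketch, assuming the empty cells hold empty strings as the tables above suggest, with a temporary file in place of the Drive path:

```python
import tempfile
from pathlib import Path

import pandas as pd

matrix = pd.DataFrame({
    "Dialogue": ["hello . hi . how are you ?", "what are you doing right now ?"],
    "Mood": ["i am sad", ""],
    "Glucose": ["is low", "my blood glucose is quite high"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "resultadosTrain.csv"
    matrix.to_csv(path, index=False)
    default = pd.read_csv(path)                           # empty cell becomes NaN
    preserved = pd.read_csv(path, keep_default_na=False)  # empty cell stays ""

print(default["Mood"].isna().tolist())
print(preserved["Mood"].tolist())
```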