Víctor Daniel Cruz González

# Objective
Establish a safe, high-quality communication environment for users.


# Background

## Recurrent Neural Network
A recurrent neural network (RNN) is a type of neural network that operates on sequential or time-series data. These deep learning algorithms are used for temporal problems such as language translation, natural language processing (NLP), and speech recognition.

![Unrolled network](unrolled-network.png)

Whereas traditional deep neural networks assume that inputs and outputs are independent of each other, the output of a recurrent network depends on the preceding elements of the sequence. Recurrent networks also share parameters: the same weights are reused within each layer at every time step. These weights are still adjusted through backpropagation and gradient descent during training.
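To make the two properties above concrete, here is a minimal sketch of a one-unit recurrent cell in plain Python. It is not the model trained later in this notebook: the function name `rnn_forward` and the weight values are assumptions chosen only for illustration.

```python
import math

def rnn_forward(inputs, w_x, w_h, bias, h0=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    states = []
    h = h0
    for x in inputs:
        # The same w_x, w_h and bias are reused at every time step.
        h = math.tanh(w_x * x + w_h * h + bias)
        states.append(h)
    return states

# Two sequences that end with the same values but start differently.
# The weights are arbitrary illustrative values, not learned ones.
states_a = rnn_forward([1.0, 0.0, 0.5], w_x=0.8, w_h=0.5, bias=0.0)
states_b = rnn_forward([-1.0, 0.0, 0.5], w_x=0.8, w_h=0.5, bias=0.0)

# The final states differ because each state depends on the earlier inputs.
print(states_a[-1], states_b[-1])
```

Only three numbers parameterize the whole sequence computation, however long the input is, which is what parameter sharing means here.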

Recurrent neural networks use the backpropagation through time (BPTT) algorithm to compute gradients. It differs slightly from standard backpropagation because it is specific to sequence data. The principle is the same: the model trains itself by propagating errors from its output layer back to its input layer, and these calculations let us adjust the model parameters appropriately. The difference is that BPTT sums the errors at each time step, whereas feedforward networks do not need to sum errors because they do not share parameters across layers.

![BPTT](bptt.svg)
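The summation over time steps can be sketched for a one-unit RNN with a squared-error loss at every step. This is an illustrative assumption, not the loss or architecture used later in the notebook; the helper names `forward`, `loss` and `bptt` are made up for this example. The analytic gradient is compared against a finite-difference estimate as a sanity check.

```python
import math

def forward(xs, w_x, w_h):
    """Return hidden states; hs[0] is the initial state, hs[t] the state at step t."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w_x * x + w_h * hs[-1]))
    return hs

def loss(xs, ys, w_x, w_h):
    """Sum of per-step squared errors (an assumed toy loss)."""
    hs = forward(xs, w_x, w_h)
    return sum(0.5 * (h - y) ** 2 for h, y in zip(hs[1:], ys))

def bptt(xs, ys, w_x, w_h):
    """Gradients of the loss with respect to the shared weights w_x and w_h."""
    hs = forward(xs, w_x, w_h)
    grad_w_x = grad_w_h = 0.0
    dh_next = 0.0  # error flowing back from step t+1 into h_t
    for t in range(len(xs), 0, -1):
        dh = (hs[t] - ys[t - 1]) + dh_next   # local error plus future error
        dz = dh * (1.0 - hs[t] ** 2)         # back through tanh
        grad_w_x += dz * xs[t - 1]           # summed over steps: weights are shared
        grad_w_h += dz * hs[t - 1]
        dh_next = dz * w_h
    return grad_w_x, grad_w_h

xs, ys = [0.5, -1.0, 0.25], [0.2, -0.3, 0.1]
g_wx, g_wh = bptt(xs, ys, w_x=0.7, w_h=-0.4)

# Central finite difference on w_h as an independent check.
eps = 1e-6
num_wh = (loss(xs, ys, 0.7, -0.4 + eps) - loss(xs, ys, 0.7, -0.4 - eps)) / (2 * eps)
print(g_wh, num_wh)
```

The `+=` lines are where BPTT departs from backpropagation in a feedforward network: each time step contributes a term to the gradient of the same weight.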

During this process, RNNs tend to run into two problems, known as exploding gradients and vanishing gradients. Both are defined by the size of the gradient, which is the slope of the loss function along the error curve. When the gradient is too small, it keeps shrinking, so the updates to the weight parameters become insignificant (i.e., approach 0). When that happens, the algorithm is no longer learning. Exploding gradients occur when the gradient is too large, which makes the model unstable. In that case the model weights grow too large and are eventually represented as NaN. One way to mitigate these problems is to reduce the number of hidden layers in the network, removing some of the complexity of the RNN model.
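A rough way to see why both problems appear is to look at a simplified linear recurrence, h_t = w_h * h_{t-1}, where the chain rule multiplies the gradient by w_h once per time step. The sketch below makes that simplifying assumption (no activation function, a single scalar weight); `gradient_through_time` is a hypothetical helper name.

```python
def gradient_through_time(w_h, steps):
    """d h_T / d h_0 for the linear recurrence h_t = w_h * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w_h  # chain rule: one factor of w_h per time step
    return grad

# A weight below 1 shrinks the gradient, a weight above 1 blows it up.
for w_h in (0.5, 1.0, 1.5):
    print(w_h, gradient_through_time(w_h, 50))
```

Over 50 steps the gradient either collapses toward zero or grows by many orders of magnitude, depending only on whether the recurrent weight is below or above 1.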

# Implementation

Importing libraries.

In [1]:
import numpy as np
import tensorflow_datasets as tfds
import tensorflow as tf

import matplotlib.pyplot as plt

In [2]:
def plotGraphs(history, metric):
    """
    Plot the training and validation curves of `metric` per epoch.
    """
    plt.plot(history.history[metric])
    plt.plot(history.history['val_' + metric], '')
    plt.xlabel('Epochs')
    plt.ylabel(metric)
    # Legend labels match the Keras history keys (e.g. 'val_accuracy').
    plt.legend([metric, 'val_' + metric])

Loading the IMDB dataset

In [3]:
dataset, info = tfds.load(
    'imdb_reviews', with_info=True, as_supervised=True)
trainDataset, testDataset = dataset['train'], dataset['test']
print(trainDataset.element_spec)

(TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))


Printing one example

In [4]:
for example, label in trainDataset.take(1):
    print('text: ', example.numpy())
    print('label: ', label.numpy())

text:  b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it."
label:  0


Shuffling and batching the data

In [5]:
BUFFER_SIZE = 10000
BATCH_SIZE = 64

trainDataset = trainDataset.shuffle(BUFFER_SIZE).batch(
    BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
testDataset = testDataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

for example, label in trainDataset.take(1):
    print('texts: ', example.numpy()[:3])
    print()
    print('labels: ', label.numpy()[:3])

texts:  [b"This piece of crap, since I can't call it a movie, can be summed up by the following.<br /><br />-Stereotypical black criminal with black midget partner get in trouble -Black Midget pretends to be a baby with a fully developed adult face, body hair and genitalia -Black midget is mistaken(somehow) by man and woman who happen to want a baby -Black midget than goes on to commit acts of physical and sexual violence, demean white people wherever he sees them, and commit more crimes -Happy Ending<br /><br />Honestly, it could have been a good satire if it hadn't been directed so shallowly and had such talentless bastards star in it."
 b'This movie blows - let\'s get that straight right now. There are a few scene gems nestled inside this pile of crap but none can redeem the limp plot. Colin Farrel looks like Brad Pitt in "12 Monkeys" and acts in a similar manner. I normally hate Colin because he is a fairy in general but he\'s OK in this movie. There were two plot lines in this mov
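The roles of `BUFFER_SIZE` and `BATCH_SIZE` can be sketched in plain Python. This is only an approximation of a buffered shuffle followed by batching, written under the assumption that the shuffle samples uniformly from a fixed-size buffer; it is not the `tf.data` implementation, and `shuffle_buffer` and `batch` are hypothetical helper names.

```python
import random

def shuffle_buffer(items, buffer_size, seed=0):
    """Yield items in a randomized order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # fill phase: nothing is emitted yet
            continue
        i = rng.randrange(buffer_size)
        yield buffer[i]          # emit a random buffered item...
        buffer[i] = item         # ...and put the new item in its slot
    rng.shuffle(buffer)          # drain whatever is left at the end
    yield from buffer

def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == batch_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # the last batch may be smaller

batches = list(batch(shuffle_buffer(range(10), buffer_size=4), batch_size=3))
print(batches)
```

With a buffer smaller than the dataset, an item can only move a limited distance from its original position, so a larger buffer gives a more thorough shuffle at the cost of memory.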

Creating the text encoder

In [6]:
VOCAB_SIZE = 1000
# On TensorFlow releases before 2.6 this layer lives under
# tf.keras.layers.experimental.preprocessing.TextVectorization.
encoder = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE)
encoder.adapt(trainDataset.map(lambda text, label: text))
vocab = np.array(encoder.get_vocabulary())
print(vocab[:20])

['' '[UNK]' 'the' 'and' 'a' 'of' 'to' 'is' 'in' 'it' 'i' 'this' 'that'
 'br' 'was' 'as' 'for' 'with' 'movie' 'but']
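The vocabulary printed above starts with an empty padding token and `[UNK]`, followed by words in decreasing frequency. A rough pure-Python sketch of that behaviour is shown below. It is an approximation under stated assumptions (lowercasing, a simplified punctuation filter, whitespace splitting), and the helper names `tokenize`, `build_vocab` and `encode` are made up for this example rather than part of the Keras API.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase, drop punctuation (simplified rule), split on whitespace."""
    return re.sub(r'[^\w\s]', '', text.lower()).split()

def build_vocab(texts, max_tokens):
    """Index 0 is padding, index 1 is out-of-vocabulary, then frequent words."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    frequent = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return ['', '[UNK]'] + frequent

def encode(texts, vocab):
    """Map tokens to indices and pad every row with 0 to the longest row."""
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = [[index.get(tok, 1) for tok in tokenize(text)] for text in texts]
    width = max(len(row) for row in rows)
    return [row + [0] * (width - len(row)) for row in rows]

texts = ["The movie was great!", "The plot was thin, the acting was worse."]
vocab = build_vocab(texts, max_tokens=6)
encoded = encode(texts, vocab)
print(vocab)
print(encoded)
```

The shorter sentence is padded with zeros, and words outside the six-entry vocabulary map to index 1, which mirrors the trailing zeros and `[UNK]` tokens visible in the encoder output of this notebook.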


Showing encoder examples

In [7]:
for example, label in trainDataset.take(1):
    encodedExample = encoder(example)[:3].numpy()
    print(encodedExample)

for example, label in trainDataset.take(1):
    encodedExample = encoder(example)[:3].numpy()
    for n in range(3):
        print('Original: ', example[n].numpy())
        print('Round-trip: ', ' '.join(vocab[encodedExample[n]]))
        print()

[[163  30   2 ...   0   0   0]
 [ 10 208  11 ...   0   0   0]
 [  2  61 274 ...   0   0   0]]
Original:  b"This movie is god awful. Not one quality to this movie. You would think that the gore would be good but it sucks bad. The effects are worse and the acting if you can call it acting is the worst I've ever seen. This movie was obviously shot on a camcorder and runs on a budget around 500 dollars probably. If you want to watch a good Zombie movie than watch Dawn of the dead or Day of the dead. If you want to watch a good cheap shot on video Zombie movie like this but way better than watch Redneck Zombies. Please avoid this movie at all costs. It is unwatchable and pointless. You've been warned. I've got nothing else to say about this stupid movie."
Round-trip:  this movie is god awful not one quality to this movie you would think that the gore would be good but it [UNK] bad the effects are worse and the acting if you can call it acting is the worst ive ever seen this movie was obviou

Implementing the RNN architecture

In [8]:
model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        mask_zero=True
    ),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1),
])

print([layer.supports_masking for layer in model.layers])


[False, True, True, True, True]


Checking that the model works

Text without padding

In [9]:
sampleText = ('The movie was cool. The animation and the graphics '
              'were out of this world. I would recommend this movie.')
predictions = model.predict(np.array([sampleText]))
predictions[0]

array([0.00603663], dtype=float32)

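Text with padding. Batching the sample with a much longer string pads it with zeros; because the embedding layer uses `mask_zero=True`, the prediction for the sample should not change.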

In [10]:
padding = "the " * 2000
predictions = model.predict(np.array([sampleText, padding]))
predictions[0]

array([0.00603663], dtype=float32)
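The two predictions are identical because padded positions are masked. A minimal sketch of the idea, assuming a one-unit recurrent cell that simply carries its state over masked steps (which is how I understand Keras masking to behave for recurrent layers), is shown below; `masked_rnn_final_state` and the scalar "embeddings" are hypothetical.

```python
import math

def masked_rnn_final_state(token_ids, embed, w_x=0.8, w_h=0.5):
    """Final state of a one-unit RNN that skips padding tokens (id 0)."""
    h = 0.0
    for tok in token_ids:
        if tok == 0:
            continue  # masked step: the previous state is carried over unchanged
        h = math.tanh(w_x * embed[tok] + w_h * h)
    return h

# Made-up scalar "embeddings" for three token ids.
embed = {1: -0.3, 2: 0.1, 3: 0.9}
short = [3, 2, 1]
padded = short + [0] * 5  # same sequence followed by padding

print(masked_rnn_final_state(short, embed), masked_rnn_final_state(padded, embed))
```

Without the mask, the padded zeros would be processed like real tokens and would change the final state, so the prediction would depend on the length of the longest sequence in the batch.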

Compiling the model

In [11]:
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy']
              )
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
text_vectorization (TextVect (None, None)              0         
_________________________________________________________________
embedding (Embedding)        (None, None, 64)          64000     
_________________________________________________________________
bidirectional (Bidirectional (None, 128)               66048     
_________________________________________________________________
dense (Dense)                (None, 64)                8256      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
Total params: 138,369
Trainable params: 138,369
Non-trainable params: 0
_________________________________________________________________
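The parameter counts in the summary can be reproduced by hand from the layer sizes used above. The sketch below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias; the variable names are chosen for this example.

```python
vocab_size, embed_dim, lstm_units, dense_units = 1000, 64, 64, 64

# Embedding: one 64-dimensional vector per vocabulary entry.
embedding = vocab_size * embed_dim

# One LSTM direction: 4 gates, each with input weights, recurrent weights and a bias.
lstm_one_direction = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)
bidirectional = 2 * lstm_one_direction

# Dense layers: the first one receives the concatenated 128-dimensional output.
dense = (2 * lstm_units) * dense_units + dense_units
dense_1 = dense_units * 1 + 1

total = embedding + bidirectional + dense + dense_1
print(embedding, bidirectional, dense, dense_1, total)
```

The bidirectional wrapper doubles the LSTM parameters and the width of its output, which is why the following dense layer sees 128 inputs rather than 64.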


Training the model

In [12]:
history = model.fit(trainDataset, epochs=10,
                    validation_data=testDataset, validation_steps=30)

testLoss, testAcc = model.evaluate(testDataset)
print('Test Loss: ', testLoss)
print('Test Accuracy: ', testAcc)

Epoch 1/10
 52/391 [==>...........................] - ETA: 28:58 - loss: 0.6931 - accuracy: 0.5117

KeyboardInterrupt: 
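The final `Dense(1)` layer has no activation, so the model outputs logits, and the loss is built with `from_logits=True`. The sketch below is a plain-Python version of binary cross-entropy on a single logit, written in a numerically stable form; the function names are assumptions for this example. It shows that a logit of 0 gives a loss of ln 2 ≈ 0.6931, which is consistent with the loss logged early in the interrupted run, when the untrained model's outputs are still close to 0.

```python
import math

def sigmoid(logit):
    """Convert a logit into a probability."""
    return 1.0 / (1.0 + math.exp(-logit))

def bce_with_logits(logit, label):
    """Binary cross-entropy for one example, computed directly from the logit.

    Equivalent to -[y * log(p) + (1 - y) * log(1 - p)] with p = sigmoid(logit),
    rearranged to avoid overflow for large |logit|.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# A logit of 0 means p = 0.5 for either label.
print(bce_with_logits(0.0, 1), math.log(2))

# With logits, the decision threshold is 0 rather than 0.5.
print(sigmoid(0.00603663) >= 0.5)
```

Reading the raw model output therefore means checking its sign: values at or above 0 correspond to the positive class, values below 0 to the negative class.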

# Results

In [None]:
plt.figure(figsize=(16, 8))
plt.subplot(1, 2, 1)
plotGraphs(history, 'accuracy')
plt.ylim(None, 1)
plt.subplot(1, 2, 2)
plotGraphs(history, 'loss')
plt.ylim(0, None)
