<a href="https://colab.research.google.com/github/ProfAI/tf00/blob/master/10%20-%20Modelli%20Sequenziali/rnn_lstm_gru_cnn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recurrent Neural Networks
Recurrent Neural Networks are a neural network architecture widely used for analysing sequential data and text. Their main advantage is that, at prediction time, they also take into account the preceding and/or following observations within a sequence. In this notebook we will see how to use recurrent neural networks to classify a movie review as negative or positive, again using the IMDB Movie Reviews Dataset.

## Import the Modules

In [85]:
import numpy as np

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.layers import Embedding, SimpleRNN, GRU, LSTM, Bidirectional, GlobalMaxPooling1D, Conv1D

## Define the Constants

In [72]:
MAX_WORDS = 6000
NUM_EMBEDDING = 64
SEQ_MAX_LENGTH = 200

BATCH_SIZE = 128
NUM_EPOCHS = 3
VALIDATION_SPLIT = 0.2

## Load the Dataset

In [50]:
import os
from sklearn.utils import shuffle


def load_imdb(files_path, labels=("pos", "neg")):
    # Download and extract the dataset only if the archive is not already present
    if not os.path.isfile("aclImdb_v1.tar.gz"):
        os.system("wget https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz")
        os.system("tar -xf aclImdb_v1.tar.gz")

    # The first label is mapped to class 1, the second to class 0
    label_map = {labels[0]: 1, labels[1]: 0}

    reviews = []
    y = []

    for label in labels:
        path = os.path.join(files_path, label)
        for file in os.listdir(path):
            # Read each review and close the file as soon as it has been read
            with open(os.path.join(path, file), encoding="utf-8") as review_file:
                reviews.append(review_file.read())
            y.append(label_map[label])

    # sklearn's shuffle function lets us shuffle
    # several arrays in the same way
    reviews, y = shuffle(reviews, y)

    return reviews, y

## Prepare the Data

In [51]:
reviews_train, y_train = load_imdb("aclImdb/train/")
reviews_test, y_test = load_imdb("aclImdb/test/")

tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(reviews_train)

X_train = tokenizer.texts_to_sequences(reviews_train)
X_test = tokenizer.texts_to_sequences(reviews_test)

X_train = pad_sequences(X_train, maxlen = SEQ_MAX_LENGTH)
X_test = pad_sequences(X_test, maxlen = SEQ_MAX_LENGTH)

y_train = np.array(y_train)
y_test = np.array(y_test)
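
To make the padding step concrete, here is a minimal pure-Python sketch of what `pad_sequences` does with its default settings: zeros are added at the front of short sequences, and sequences that are too long lose their first tokens. The helper name `pad_pre` is hypothetical and is not part of Keras; it is only meant as an illustration.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad (or left-truncate) each sequence to exactly `maxlen` items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]            # keep only the last `maxlen` tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


example = pad_pre([[5, 8], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(example)  # [[0, 0, 5, 8], [3, 4, 5, 6]]
```

With `SEQ_MAX_LENGTH = 200`, every review therefore becomes a row of exactly 200 word indices, which is what the `Embedding` layer expects.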

## Simple Recurrent Network
To use plain recurrent layers, we can use the *SimpleRNN* class from tf.keras.

In [73]:
model = Sequential()

model.add(Embedding(MAX_WORDS, NUM_EMBEDDING, input_length=SEQ_MAX_LENGTH))
model.add(SimpleRNN(32))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential_27"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_27 (Embedding)     (None, 200, 64)           384000    
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, 32)                3104      
_________________________________________________________________
dropout_41 (Dropout)         (None, 32)                0         
_________________________________________________________________
dense_45 (Dense)             (None, 1)                 33        
Total params: 387,137
Trainable params: 387,137
Non-trainable params: 0
_________________________________________________________________
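
The 3,104 parameters reported for the SimpleRNN layer follow from its recurrence, h_t = tanh(x_t W + h_{t-1} U + b). The NumPy sketch below illustrates that recurrence with random weights; it is not the Keras implementation, and the names `simple_rnn_forward`, `W`, `U` and `b` are chosen here for illustration only.

```python
import numpy as np


def simple_rnn_forward(x, W, U, b):
    """Run a plain recurrent layer over x (timesteps, features); return the last state."""
    h = np.zeros(U.shape[0])
    for x_t in x:
        # the new state mixes the current input with the previous state
        h = np.tanh(x_t @ W + h @ U + b)
    return h


rng = np.random.default_rng(0)
features, units, timesteps = 64, 32, 200
W = rng.normal(scale=0.1, size=(features, units))   # input weights
U = rng.normal(scale=0.1, size=(units, units))      # recurrent weights
b = np.zeros(units)                                 # bias

h_last = simple_rnn_forward(rng.normal(size=(timesteps, features)), W, U, b)
n_params = W.size + U.size + b.size
print(h_last.shape, n_params)  # (32,) 3104
```

The same state vector is overwritten at every step, which gives an intuition for why information from early in a 200-step sequence is easily lost.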


In [74]:
model.compile(loss='binary_crossentropy', optimizer="adam", metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=BATCH_SIZE, validation_split=VALIDATION_SPLIT, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7fc0a64fb9b0>

The result is poor because the *SimpleRNN* class defines, as its name says, a simple recurrent network, which struggles on sequences of this length and is rarely a good choice in practice.

## Gated Recurrent Unit (GRU) Network
Simple recurrent networks have trouble operating on medium-length or very long sequences; Gated Recurrent Unit networks address this problem.

In [75]:
model = Sequential()

model.add(Embedding(MAX_WORDS, NUM_EMBEDDING, input_length=SEQ_MAX_LENGTH))
model.add(GRU(32))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential_28"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_28 (Embedding)     (None, 200, 64)           384000    
_________________________________________________________________
gru_3 (GRU)                  (None, 32)                9408      
_________________________________________________________________
dropout_42 (Dropout)         (None, 32)                0         
_________________________________________________________________
dense_46 (Dense)             (None, 1)                 33        
Total params: 393,441
Trainable params: 393,441
Non-trainable params: 0
_________________________________________________________________
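
As a rough illustration of how gating works, the NumPy sketch below implements a single GRU step. It assumes a layout with three weight blocks (update gate, reset gate, candidate state) and separate input and recurrent bias vectors, which is consistent with the 9,408 parameters in the summary; the function and variable names are hypothetical and the weights are random.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, W, U, b_in, b_rec):
    """One GRU step; W, U, b_in, b_rec hold the z, r and candidate blocks side by side."""
    units = h_prev.shape[0]
    xw = x_t @ W + b_in          # input contribution for the three blocks
    hu = h_prev @ U + b_rec      # recurrent contribution for the three blocks
    z = sigmoid(xw[:units] + hu[:units])                      # update gate
    r = sigmoid(xw[units:2 * units] + hu[units:2 * units])    # reset gate
    candidate = np.tanh(xw[2 * units:] + r * hu[2 * units:])  # proposed new state
    # z close to 1 copies the old state forward almost unchanged
    return z * h_prev + (1.0 - z) * candidate


rng = np.random.default_rng(0)
features, units = 64, 32
W = rng.normal(scale=0.1, size=(features, 3 * units))
U = rng.normal(scale=0.1, size=(units, 3 * units))
b_in = np.zeros(3 * units)
b_rec = np.zeros(3 * units)

h = gru_step(rng.normal(size=features), np.zeros(units), W, U, b_in, b_rec)
n_params = W.size + U.size + b_in.size + b_rec.size
print(h.shape, n_params)  # (32,) 9408
```

Because the update gate can keep the previous state nearly intact, information can survive across many more time steps than in the plain recurrence.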


In [76]:
model.compile(loss='binary_crossentropy', optimizer="adam", metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=BATCH_SIZE, validation_split=VALIDATION_SPLIT, epochs=NUM_EPOCHS)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7fc0a5a39e10>

By using GRU layers we obtained better results.

## Long Short-Term Memory (LSTM) Network
Long Short-Term Memory (LSTM) networks are another recurrent architecture that, like GRUs, can operate on long sequences. LSTMs are often considered able to handle even longer sequences than GRUs, but they are computationally more expensive.

In [77]:
model = Sequential()

model.add(Embedding(MAX_WORDS, NUM_EMBEDDING, input_length=SEQ_MAX_LENGTH))
model.add(LSTM(32))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential_29"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_29 (Embedding)     (None, 200, 64)           384000    
_________________________________________________________________
lstm_40 (LSTM)               (None, 32)                12416     
_________________________________________________________________
dropout_43 (Dropout)         (None, 32)                0         
_________________________________________________________________
dense_47 (Dense)             (None, 1)                 33        
Total params: 396,449
Trainable params: 396,449
Non-trainable params: 0
_________________________________________________________________
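
One way to see the difference in cost between the recurrent layers used so far is to compare their parameter counts. The helper below is a hypothetical sketch, not a Keras function; the formulas are assumptions chosen because they reproduce the 3,104, 9,408 and 12,416 figures shown in the model summaries for an input of size 64 and 32 units.

```python
def recurrent_params(kind, input_dim, units):
    """Trainable weights of one recurrent layer, under the layouts assumed here."""
    per_block = units * (input_dim + units)      # input + recurrent kernels
    if kind == "simple_rnn":
        return per_block + units                 # 1 block, 1 bias vector
    if kind == "gru":
        return 3 * (per_block + 2 * units)       # 3 blocks, 2 bias vectors each
    if kind == "lstm":
        return 4 * (per_block + units)           # 4 blocks, 1 bias vector each
    raise ValueError(f"unknown layer kind: {kind}")


counts = {kind: recurrent_params(kind, input_dim=64, units=32)
          for kind in ("simple_rnn", "gru", "lstm")}
print(counts)  # {'simple_rnn': 3104, 'gru': 9408, 'lstm': 12416}
```

An LSTM layer carries four weight blocks against three for a GRU, so each time step costs roughly a third more multiplications at the same number of units.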


In [78]:
model.compile(loss='binary_crossentropy', optimizer="adam", metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=BATCH_SIZE, validation_split=VALIDATION_SPLIT, epochs=NUM_EPOCHS)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7fc0a3af7208>

The result obtained with LSTM layers is similar to the one obtained with GRU layers.

## Deep Recurrent Network
A recurrent layer requires a sequence as input, so to stack several recurrent layers the preceding one must return a sequence as output. To do this, we just set the *return_sequences* parameter to True.
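
The effect of *return_sequences* on shapes can be sketched in NumPy. The toy layer below uses a plain tanh recurrence without biases instead of an LSTM, purely to keep the example short; `rnn_layer` and the weight names are hypothetical.

```python
import numpy as np


def rnn_layer(x, W, U, return_sequences=False):
    """Toy tanh recurrent layer over x (timesteps, features), biases omitted."""
    h = np.zeros(U.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ W + h @ U)
        states.append(h)
    # either every intermediate state, or only the final one
    return np.stack(states) if return_sequences else h


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 64))
W1, U1 = rng.normal(scale=0.1, size=(64, 32)), rng.normal(scale=0.1, size=(32, 32))
W2, U2 = rng.normal(scale=0.1, size=(32, 32)), rng.normal(scale=0.1, size=(32, 32))

seq_out = rnn_layer(x, W1, U1, return_sequences=True)   # a sequence of states
last_out = rnn_layer(seq_out, W2, U2)                   # a single vector
print(seq_out.shape, last_out.shape)  # (200, 32) (32,)
```

The first layer hands a full (200, 32) sequence to the second one, while the last recurrent layer returns a single vector that a `Dense` layer can consume.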

In [79]:
model = Sequential()

model.add(Embedding(MAX_WORDS, NUM_EMBEDDING, input_length=SEQ_MAX_LENGTH))
model.add(Dropout(0.5))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential_30"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_30 (Embedding)     (None, 200, 64)           384000    
_________________________________________________________________
dropout_44 (Dropout)         (None, 200, 64)           0         
_________________________________________________________________
lstm_41 (LSTM)               (None, 200, 32)           12416     
_________________________________________________________________
lstm_42 (LSTM)               (None, 32)                8320      
_________________________________________________________________
dense_48 (Dense)             (None, 64)                2112      
_________________________________________________________________
dropout_45 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_49 (Dense)             (None, 1)               

In [80]:
model.compile(loss='binary_crossentropy', optimizer="adam", metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=BATCH_SIZE, validation_split=VALIDATION_SPLIT, epochs=NUM_EPOCHS)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7fc0a24644e0>

## Bidirectional Recurrent Networks
The recurrent networks we have defined so far are unidirectional: they take into account only the preceding values in the sequence. To define bidirectional recurrent layers we can use the *Bidirectional* class.
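
Conceptually, a bidirectional layer runs one recurrent pass from left to right and a second, independent pass from right to left, then joins the two results. The NumPy sketch below assumes the results are concatenated and, for brevity, uses a plain tanh recurrence instead of an LSTM; all names are hypothetical.

```python
import numpy as np


def rnn_last_state(x, W, U):
    """Final state of a toy tanh recurrence over x (timesteps, features)."""
    h = np.zeros(U.shape[0])
    for x_t in x:
        h = np.tanh(x_t @ W + h @ U)
    return h


def bidirectional_last_state(x, fwd, bwd):
    """Concatenate the final states of a forward and a backward pass."""
    h_fwd = rnn_last_state(x, *fwd)          # reads the sequence left to right
    h_bwd = rnn_last_state(x[::-1], *bwd)    # reads it right to left
    return np.concatenate([h_fwd, h_bwd])


rng = np.random.default_rng(0)


def make_weights():
    # each direction has its own, independent set of weights
    return rng.normal(scale=0.1, size=(64, 32)), rng.normal(scale=0.1, size=(32, 32))


out = bidirectional_last_state(rng.normal(size=(200, 64)), make_weights(), make_weights())
print(out.shape)  # (64,)
```

This explains why wrapping a 32-unit layer produces 64 output features and twice as many parameters.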

In [81]:
model = Sequential()

model.add(Embedding(MAX_WORDS, NUM_EMBEDDING, input_length=SEQ_MAX_LENGTH))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(32, return_sequences=True)))
model.add(Bidirectional(LSTM(32)))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential_31"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_31 (Embedding)     (None, 200, 64)           384000    
_________________________________________________________________
dropout_46 (Dropout)         (None, 200, 64)           0         
_________________________________________________________________
bidirectional_28 (Bidirectio (None, 200, 64)           24832     
_________________________________________________________________
bidirectional_29 (Bidirectio (None, 64)                24832     
_________________________________________________________________
dense_50 (Dense)             (None, 64)                4160      
_________________________________________________________________
dropout_47 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_51 (Dense)             (None, 1)               

In [82]:
model.compile(loss='binary_crossentropy', optimizer="adam", metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=BATCH_SIZE, validation_split=VALIDATION_SPLIT, epochs=NUM_EPOCHS)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7fc0a0815c18>

## Pooling in Recurrent Networks
Pooling can also be used in recurrent networks, but with a different purpose: here it makes the network location-invariant, that is, not dependent on the position of the values within the sequence.
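
The location invariance can be checked directly with a small NumPy sketch: global max pooling keeps, for each feature, the largest value found anywhere along the time axis, so reordering the time steps leaves the result unchanged. The function name `global_max_pool_1d` is hypothetical.

```python
import numpy as np


def global_max_pool_1d(x):
    """Max over the time axis of x (timesteps, channels)."""
    return x.max(axis=0)


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 64))
shuffled = x[rng.permutation(len(x))]        # same time steps, different order

pooled = global_max_pool_1d(x)
print(pooled.shape)                                          # (64,)
print(np.array_equal(pooled, global_max_pool_1d(shuffled)))  # True
```

Note that the invariance applies to the pooling step itself; the recurrent layers before it still depend on the order of the words.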

In [83]:
model = Sequential()

model.add(Embedding(MAX_WORDS, NUM_EMBEDDING, input_length=SEQ_MAX_LENGTH))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(32, return_sequences=True)))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(32, return_sequences=True)))
model.add(GlobalMaxPooling1D())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential_32"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_32 (Embedding)     (None, 200, 64)           384000    
_________________________________________________________________
dropout_48 (Dropout)         (None, 200, 64)           0         
_________________________________________________________________
bidirectional_30 (Bidirectio (None, 200, 64)           24832     
_________________________________________________________________
dropout_49 (Dropout)         (None, 200, 64)           0         
_________________________________________________________________
bidirectional_31 (Bidirectio (None, 200, 64)           24832     
_________________________________________________________________
global_max_pooling1d_9 (Glob (None, 64)                0         
_________________________________________________________________
dense_52 (Dense)             (None, 64)              

In [90]:
model.compile(loss='binary_crossentropy', optimizer="adam", metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=BATCH_SIZE, validation_split=VALIDATION_SPLIT, epochs=NUM_EPOCHS)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7fc09b153f98>

## Classifying New Reviews
Let us now see how this last model behaves when classifying new reviews.

In [71]:
reviews = ["This movie sucks, I just wasted two hours of my life", "Best movie I have ever seen, the ending was so touching and I made me crying so much.", "Not a bad movie"]

reviews = tokenizer.texts_to_sequences(reviews)
X = pad_sequences(reviews, maxlen = SEQ_MAX_LENGTH)

y = model.predict(X)
print(y)

[[0.19489302]
 [0.97196925]
 [0.4444822 ]]


There are no surprises in the first two classifications, while the third is scored close to "neutral" (about 0.44), which suggests that the model has picked up on the negation.
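
To turn these scores into class labels, one common convention is to threshold the sigmoid output. The snippet below is a small sketch using the three scores printed above; the 0.5 threshold is an assumption, not something fixed by the model.

```python
import numpy as np

scores = np.array([[0.19489302], [0.97196925], [0.4444822]])

labels = (scores.ravel() >= 0.5).astype(int)   # 1 = positive, 0 = negative
margin = np.abs(scores.ravel() - 0.5)          # distance from the decision boundary

print(labels)                # [0 1 0]
print(int(margin.argmin()))  # 2, the least confident prediction
```

The third review sits closest to the boundary, which matches reading its score as roughly neutral.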

## One-Dimensional Convolutional Networks
Convolutional networks can also be used to classify sequences; in this case the class to use is Conv1D.
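
A one-dimensional convolution slides a window of `kernel_size` consecutive time steps along the sequence. The NumPy sketch below assumes no padding and a stride of 1, so each layer shortens the sequence by `kernel_size - 1` steps; `conv1d_valid` is a hypothetical helper with random weights, not the Keras layer.

```python
import numpy as np


def conv1d_valid(x, kernels, bias):
    """x: (timesteps, in_ch); kernels: (kernel_size, in_ch, filters); no padding, stride 1."""
    k = kernels.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, kernels.shape[2]))
    for t in range(steps):
        # each output step sees a window of k consecutive input steps
        out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 64))
k1, b1 = rng.normal(size=(3, 64, 32)), np.zeros(32)
k2, b2 = rng.normal(size=(3, 32, 32)), np.zeros(32)

y1 = conv1d_valid(x, k1, b1)
y2 = conv1d_valid(y1, k2, b2)
print(y1.shape, y2.shape)                    # (198, 32) (196, 32)
print(k1.size + b1.size, k2.size + b2.size)  # 6176 3104
```

Unlike a recurrent layer, each window is processed independently of the others, so all output steps can be computed in parallel.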

In [87]:
model = Sequential()

model.add(Embedding(MAX_WORDS, NUM_EMBEDDING, input_length=SEQ_MAX_LENGTH))
model.add(Dropout(0.5))
model.add(Conv1D(32, kernel_size=3))
model.add(Conv1D(32, kernel_size=3))
model.add(GlobalMaxPooling1D())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential_34"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_34 (Embedding)     (None, 200, 64)           384000    
_________________________________________________________________
dropout_52 (Dropout)         (None, 200, 64)           0         
_________________________________________________________________
conv1d (Conv1D)              (None, 198, 32)           6176      
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 196, 32)           3104      
_________________________________________________________________
global_max_pooling1d_10 (Glo (None, 32)                0         
_________________________________________________________________
dense_54 (Dense)             (None, 64)                2112      
_________________________________________________________________
dropout_53 (Dropout)         (None, 64)              

In [84]:
model.compile(loss='binary_crossentropy', optimizer="adam", metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=BATCH_SIZE, validation_split=VALIDATION_SPLIT, epochs=NUM_EPOCHS)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7fc09ddaa588>