# RNN pour l'analyse de sentiments

Nous allons maintenant faire une analyse de sentiments sur le même jeu de données en utilisant les réseaux de neurones récurrents

## Réseaux de neurones récurrents

Les réseaux de neurones récurrents ou (RNN), sont souvent utilisés pour analyser des séquences.

En effet, dans les réseaux de neurones généraux, un input est traité par un certain nombre de couches et un output est produit à la sortie, avec l'hypothèse que deux inputs successifs sont indépendants.

Cependant, cette hypothèse n'est pas correcte dans un certain nombre de scénarios. 
Par exemple, si on veut prédire le mot suivant dans une séquence, il est indispensable de considérer la dépendance des observations précédentes.

<center> <img src="rnn.png" alt="drawing" width="700"/>

    


On voit sur la partie de gauche de cette figure que cette couche prend en entrée une observation $x_t$ (où $t$ est l’indice de l’observation dans la séquence) et retourne un vecteur $h_t$. On remarque surtout, ce qui n’était pas le cas pour les couches que vous avez étudiées jusqu’à présent, qu’il existe une boucle de rétroaction.

Pour comprendre le fonctionnement de cette boucle, il faut jeter un oeil à la partie droite de l’illustration, dans laquelle on a “déplié” la couche récurrente. On peut alors remarque que pour générer $h_1$, la couche récurrente va non seulement regarder le contenu de $x_1$, mais également obtenir de l’information concernant les états passés (par le biais de la flèche allant de la boîte $A$ du temps $0$ à celle du temps $1$). 
    
Dans notre cas, le modèle RNN prend une séquence de mots $X=\{x_1, ..., x_T\}$, une à la fois, et produit un état caché $h$, pour chaque mot.
On utilise le RNN en lui donnant le mot courant $x_t$ ainsi que l'état caché du mot précédent, $h_{t-1}$, pour produire l'état caché suivant, $h_t$.


$$h_t = \text{RNN}(x_t, h_{t-1})$$


Une fois que l'on a notre état caché final, $h_T$, obtenu après avoir donné le dernier mot de la séquence $x_T$ au modèle, on le donne à une couche linéaire $f$, (qui s'appelle également fully connected layer), pour recevoir notre sentiment prédit, $\hat{y} = f(h_T)$.

<center> <img src="RNN.png" alt="drawing" width="700"/>
        
        
Cette illustration montre un exemple de phrase, avec le RNN prédisant 0, c'est-à-dire que le sentiment est négatif. Le RNN est représenté en orange et la couche linéaire est en gris. On utilise le même RNN pour chaque mot, c'est-à-dire qu'il a les mêmes paramètres. L'état initial caché $h_0$, est un tensor initialisé à zéro.

## Préparation des données

In [1]:
import string
from string import punctuation
from os import listdir
from collections import Counter
import re
import unicodedata

import pandas as pd
import random
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pickle

On reprend les données pré-traitées précédemment.

In [2]:
df = pd.read_pickle("clean_data.pkl")

In [3]:
df.head(10)

Unnamed: 0,Reviews,Tidy_Reviews,label
32,This is the worst thing the TMNT franchise has...,bad thing tmnt franchise ever spawn kid come s...,1
8456,Shame on Julia Roberts and John Cusack. They a...,shame julia robert john cusack talented part s...,1
1650,A poorly written script with no likeable chara...,poorly write script likeable comedy forgot lau...,1
8101,Bo Derek's beauty and John Derek's revolutiona...,bo derek beauty john derek revolutionary direc...,0
4908,For that matter one of the worst FILMS ever ma...,matter bad ever plot follow slog jungle look a...,1
57,Caught the tail end of this movie channel surf...,caught tail end channel surf cable channel int...,0
1363,"the more i think about it, there was nothing r...",think nothing redeem saw month ago memory migh...,1
1503,"I am a big fan of the Spaghetti Western Genre,...",big fan spaghetti western genre usually also l...,1
9557,Remembering the dirty particulars of this insi...,remember dirty particular insidiously vapid ak...,1
5968,Typical De Palma movie made with lot's of styl...,typical de palma lot style bring edge certainl...,0


On sépare les données en données train et en données test.

In [4]:
from sklearn.model_selection import train_test_split
reviews_train, reviews_test, label_train, label_test = train_test_split(df["Tidy_Reviews"], df["label"], test_size=0.2, random_state=42)

On sépare les données test en données de validation et en données test.

In [5]:
data_test, data_val, y_test, y_val = train_test_split(reviews_test, label_test, test_size=0.5, random_state=42)

In [6]:
data_test = data_test.to_numpy()
data_val = data_val.to_numpy()
data_train = reviews_train.to_numpy()

In [7]:
print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(reviews_train.shape), 
      "\nValidation set: \t{}".format(data_val.shape),
      "\nTest set: \t\t{}".format(data_test.shape))

			Feature Shapes:
Train set: 		(18163,) 
Validation set: 	(2271,) 
Test set: 		(2270,)


On fait un vocabulaire seulement sur le train.

In [8]:
import nltk    
voc = Counter()
for review in data_train:
    tokens = nltk.word_tokenize(review)
    voc.update(tokens)

Si un mot du test n'apparaît pas dans le vocabulaire on le remplace par unk pour unknown.

In [9]:
voc.update(["<pad>","<unk>"])

Ensuite, on créé un index. Chaque mot est remplacé par un entier.

In [10]:
vocab_to_int = {word: ii for ii, word in enumerate(voc, 1)}

In [11]:
def encode(data):
    reviews_ints = [] 
    for review in data:
        reviews_ints.append([vocab_to_int[word] if word in voc else vocab_to_int["<unk>"] for word in review.split()])
    return reviews_ints

In [12]:
reviews_ints = encode(data_train)

In [13]:
print('Mots uniques: ', len((vocab_to_int)))
print('Commentaire: \n', reviews_ints[:1])

Mots uniques:  54322
Commentaire: 
 [[1, 2, 3, 4, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 9, 14, 15]]


As an additional pre-processing step, we want to make sure that our reviews are in good shape for standard processing. That is, our network will expect a standard input text size, and so, we’ll want to shape our reviews into a specific length. We’ll approach this task in two main steps:

    Getting rid of extremely long or short reviews; the outliers
    Padding/truncating the remaining data so that we have reviews of the same length.

Before we pad our review text, we should check for reviews of extremely short or long lengths; outliers that may mess with our training.

In [14]:
# outlier review stats
review_lens = Counter([len(x) for x in reviews_ints])
print("Zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))

Zero-length reviews: 0
Maximum review length: 1360


In [15]:
print('Number of reviews before removing outliers: ', len(reviews_ints))  ## remove any reviews/labels with zero length from the reviews_ints list.
# get indices of any reviews with length 0 non_zero_idx = [ii for ii, review in enumerate(reviews_ints) if len(review) != 0]
# remove 0-length reviews and their labels reviews_ints = [reviews_ints[ii] for ii in non_zero_idx] labels = np.array([labels[ii] for ii in non_zero_idx])  
print('Number of reviews after removing outliers: ', len(reviews_ints))      

Number of reviews before removing outliers:  18163
Number of reviews after removing outliers:  18163


To deal with both short and very long reviews, we’ll pad or truncate all our reviews to a specific length for more example you can check this link. For reviews shorter than some seq_length, we'll pad with 0s. For reviews longer than seq_length, we can truncate them to the first seq_length words. A good seq_length, in this case, is 200.

Let’s define a function that returns an array features that contains the padded data, of a standard size, that we'll pass to the network.

    The data should come from review_ints, since we want to feed integers to the network.
    Each row should be seq_length elements long.
    For reviews shorter than seq_length words, left pad with 0s. That is, if the review is ['best', 'movie', 'ever'], [117, 18, 128] as integers, the row will look like [0, 0, 0, ..., 0, 117, 18, 128].
    For reviews longer than seq_length, use only the first seq_length words as the feature vector.


In [16]:
def pad_features(reviews_ints, seq_length):
    ''' Return features of review_ints, where each review is padded with 0'sor truncated to the input seq_length.'''
    
    # getting the correct rows x cols shape
    features = np.zeros((len(reviews_ints), seq_length), dtype=int)
    # for each review, I grab that review and
    for i, row in enumerate(reviews_ints):
        features[i, -len(row):] = np.array(row)[:seq_length]
    return features# Test your implementation!

seq_length = 200

features = pad_features(reviews_ints, seq_length=seq_length)

## test statements - do not change - ##
assert len(features)==len(reviews_ints), "Your features should have as many rows as reviews."
assert len(features[0])==seq_length, "Each feature row should contain seq_length values."

# print first 10 values of the first 30 batches 
print(features[:30,:10])

[[   0    0    0    0    0    0    0    0    0    0]
 [  16   17   18   19   20   21   22   23   24   25]
 [   0    0    0    0    0    0    0    0    0    0]
 [   0    0    0    0    0    0    0    0    0    0]
 [   0    0    0    0    0    0    0    0    0    0]
 [   0    0    0    0    0    0    0    0    0    0]
 [ 441  442  443  304  444  271  445  446  447  448]
 [   0    0    0    0    0    0    0    0    0    0]
 [   0    0    0    0    0    0    0    0    0    0]
 [ 246  747  748  749  750  149  751  752    5  288]
 [   0    0    0    0    0    0    0    0    0    0]
 [   0    0    0    0    0    0    0    0    0    0]
 [   0    0    0    0    0    0    0    0    0    0]
 [   0    0    0    0    0    0    0    0    0    0]
 [1152  412 1153 1154  831 1155 1156 1077 1157 1158]
 [ 587 1367 1368 1369 1370 1371   16    9  611 1372]
 [   0    0    0    0    0    0    0    0    0    0]
 [   0    0    0    0    0    0    0    0    0    0]
 [   0    0    0    0    0    0    0    0    0

In [17]:
data_val_ints = encode(data_val)
data_val = pad_features(data_val_ints, seq_length=seq_length)

In [18]:
data_test_ints = encode(data_test)
data_test = pad_features(data_test_ints, seq_length=seq_length)

In [19]:
train_x = np.array(features)
train_y = np.array(label_train)
val_x = np.array(data_val)
val_y = np.array(y_val)
test_x = np.array(data_test)
test_y = np.array(y_test)

In [20]:
print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

			Feature Shapes:
Train set: 		(18163, 200) 
Validation set: 	(2271, 200) 
Test set: 		(2270, 200)


In [21]:
import torch
from torch.utils.data import TensorDataset, DataLoader

In [22]:
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
train_loader = DataLoader(train_data, batch_size=50)

In [23]:
import torch
from torch.utils.data import TensorDataset, DataLoader

# create Tensor datasets
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
valid_data = TensorDataset(torch.from_numpy(val_x), torch.from_numpy(val_y))
test_data = TensorDataset(torch.from_numpy(test_x), torch.from_numpy(test_y))

# dataloaders
batch_size = 50

# make sure the SHUFFLE your training data
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
valid_loader = DataLoader(valid_data, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=True, batch_size=batch_size)

In [24]:
# obtain one batch of training data
dataiter = iter(valid_loader)
sample_x, sample_y = dataiter.next()

print('Sample input size: ', sample_x.size()) # batch_size, seq_length
print('Sample input: \n', sample_x)
print()
print('Sample label size: ', sample_y.size()) # batch_size
print('Sample label: \n', sample_y)

Sample input size:  torch.Size([50, 200])
Sample input: 
 tensor([[    0,     0,     0,  ...,  1049,   843,    95],
        [ 3284,  1121,   236,  ...,  1478,   126,  2184],
        [    0,     0,     0,  ...,  1285,  3799, 23692],
        ...,
        [ 5716,  4550,   635,  ..., 41754, 33172,  1951],
        [  704,    47,  2082,  ...,    82,  9478,  1893],
        [ 5186,   580,   765,  ...,    58,  1601,  1303]])

Sample label size:  torch.Size([50])
Sample label: 
 tensor([0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
        0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
        1, 0])


In [25]:
# First checking if GPU is available
train_on_gpu=torch.cuda.is_available()
if(train_on_gpu):
    print('Training on GPU.')
else:
    print('No GPU available, training on CPU.')

Training on GPU.


The next stage is building the model that we'll eventually train and evaluate.

There is a small amount of boilerplate code when creating models in PyTorch, note how our RNN class is a sub-class of nn.Module and the use of super.

On définit les trois couches de notre modèle :
 
 - embedding layer : transforme notre vecteur one-hot qui contient des 0 en majorité, en vecteur dense qui est de dimension plus petite et tous les éléments sont des nombres réels
 - RNN 
 - couche linéaire : prend l'état caché final et le donne a fully connected layer, $f(h_T)$, le transformant à la dimension de l'output correcte.
 
Within the __init__ we define the layers of the module. Our three layers are an embedding layer, our RNN, and a linear layer. All layers have their parameters initialized to random values, unless explicitly specified.

The embedding layer is used to transform our sparse one-hot vector (sparse as most of the elements are 0) into a dense embedding vector (dense as the dimensionality is a lot smaller and all the elements are real numbers). This embedding layer is simply a single fully connected layer. As well as reducing the dimensionality of the input to the RNN, there is the theory that words which have similar impact on the sentiment of the review are mapped close together in this dense vector space. For more information about word embeddings, see here.

The RNN layer is our RNN which takes in our dense vector and the previous hidden state $h_{t-1}$, which it uses to calculate the next hidden state, $h_t$.

The forward method is called when we feed examples into our model.

Each batch, text, is a tensor of size [sentence length, batch size]. That is a batch of sentences, each having each word converted into a one-hot vector.

You may notice that this tensor should have another dimension due to the one-hot vectors, however PyTorch conveniently stores a one-hot vector as it's index value, i.e. the tensor representing a sentence is just a tensor of the indexes for each token in that sentence. The act of converting a list of tokens into a list of indexes is commonly called numericalizing.

The input batch is then passed through the embedding layer to get embedded, which gives us a dense vector representation of our sentences. embedded is a tensor of size [sentence length, batch size, embedding dim].

embedded is then fed into the RNN. In some frameworks you must feed the initial hidden state, $h_0$, into the RNN, however in PyTorch, if no initial hidden state is passed as an argument it defaults to a tensor of all zeros.

The RNN returns 2 tensors, output of size [sentence length, batch size, hidden dim] and hidden of size [1, batch size, hidden dim]. output is the concatenation of the hidden state from every time step, whereas hidden is simply the final hidden state. We verify this using the assert statement. Note the squeeze method, which is used to remove a dimension of size 1.

Finally, we feed the last hidden state, hidden, through the linear layer, fc, to produce a prediction.

In [26]:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        
        super().__init__()
        
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        
        self.rnn = nn.RNN(embedding_dim, hidden_dim)
        
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, text):

        #text = [sent len, batch size]
     
        self = self.cuda()
        embedded = self.embedding(text)
        
        #embedded = [sent len, batch size, emb dim]
        
        output, hidden = self.rnn(embedded)
       
        
        #output = [sent len, batch size, hid dim]
        #hidden = [1, batch size, hid dim]
      
        assert torch.equal(output[-1,:,:], hidden.squeeze(0))
        
        return self.fc(hidden.squeeze(0))
  

In [27]:
INPUT_DIM = len(voc)
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1

model = RNN(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM)

In [28]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 5,524,105 trainable parameters


In [29]:
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=1e-3)

In [30]:
criterion = nn.BCEWithLogitsLoss()

In [31]:
def binary_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """
    #round predictions to the closest integer
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float() #convert into float for division 
    acc = correct.sum() / len(correct)
    return acc

In [32]:
def train(model, iterator, optimizer, criterion):
    
    if(train_on_gpu):
        model.cuda()

    model.train()
    
    epoch_loss = 0
    epoch_acc = 0   
    counter = 0
      
    for inputs, labels in train_loader:
        counter += 1
        
        optimizer.zero_grad()
        
        if(train_on_gpu):
            inputs, labels = inputs.cuda(), labels.cuda()
         
        inputs = torch.transpose(inputs, 0, 1)
        
        predictions = model.forward(inputs).squeeze(1) 
        
        labels = labels.float()
        
        loss = criterion(predictions, labels)
        
        acc = binary_accuracy(predictions, labels)
    
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
          
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [33]:
def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    counter = 0    
    model.eval()
    
    with torch.no_grad():
    
        for inputs, labels in valid_loader:
            counter += 1

            if(train_on_gpu):
                inputs, labels = inputs.cuda(), labels.cuda()

            inputs = torch.transpose(inputs, 0, 1)

            predictions = model(inputs).squeeze(1) 

            labels = labels.float()

            loss = criterion(predictions, labels)

            acc = binary_accuracy(predictions, labels)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [34]:
import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

In [35]:
N_EPOCHS = 3
best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    # initialize hidden state
    #h = model.init_hidden(batch_size)
    
    #train_loss, train_acc = train(model, dataiter, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, dataiter, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut1-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    #print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

RuntimeError: cuda runtime error (710) : device-side assert triggered at /tmp/pip-req-build-ufslq_a9/aten/src/THC/THCReduceAll.cuh:327

In [None]:
model.load_state_dict(torch.load('tut1-model.pt'))

test_loss, test_acc = evaluate(model, dataiter, criterion)

print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')

In [None]:
# Get test data loss and accuracy

net = model 
test_losses = [] # track loss
num_correct = 0

# init hidden state
#h = net.init_hidden(batch_size)

net.eval()
# iterate over test data
for inputs, labels in test_loader:

    # Creating new variables for the hidden state, otherwise
    # we'd backprop through the entire training history
    #h = tuple([each.data for each in h])

    if(train_on_gpu):
        inputs, labels = inputs.cuda(), labels.cuda()
    
    inputs = torch.transpose(inputs, 0, 1)
    # get predicted outputs
    output = net(inputs)
    

    # calculate loss
    test_loss = criterion(output.squeeze(), labels.float())
 
    test_losses.append(test_loss.item())
    
    # convert output probabilities to predicted class (0 or 1)
    pred = torch.round(output.squeeze())  # rounds to the nearest integer
    
    # compare predictions to true label
    correct_tensor = pred.eq(labels.float().view_as(pred))
    correct = np.squeeze(correct_tensor.numpy()) if not train_on_gpu else np.squeeze(correct_tensor.cpu().numpy())
    num_correct += np.sum(correct)

# -- stats! -- ##
# avg test loss
print("Test loss: {:.3f}".format(np.mean(test_losses)))

# accuracy over all test data
test_acc = num_correct/len(test_loader.dataset)
print("Test accuracy: {:.3f}".format(test_acc))

# Test

In [None]:
test_reviews = df["Reviews"][1:2]
test_reviews

In [None]:
from string import punctuation

def tokenize_review(test_review, voc):
    
    test_ints = []
    
    for review in test_review:
        # lowercase
        review = review.lower() 
        
        # get rid of punctuation
        test_text = ''.join([c for c in review if c not in punctuation])

        # splitting by spaces
        test_words = test_text.split()

        # tokens       
        test_ints.append([vocab_to_int[word] if word in voc else vocab_to_int["<unk>"] for word in test_words ])

    return test_ints

# test code and generate tokenized review
test_ints = tokenize_review(test_review_pos, voc)
print(test_ints)

In [None]:
# test sequence padding
seq_length=200
features = pad_features(test_ints, seq_length)

print(features)

In [None]:
# test conversion to tensor and pass into your model
feature_tensor = torch.from_numpy(features)
print(feature_tensor.size())

In [None]:
def predict(net, test_review, sequence_length=200):
    
    net.eval()
    
    # tokenize review
    test_ints = tokenize_review(test_review,voc)
    
    # pad tokenized sequence
    seq_length=sequence_length
    features = pad_features(test_ints, seq_length)
    
    # convert to tensor to pass into your model
    feature_tensor = torch.from_numpy(features)
    
    batch_size = feature_tensor.size(0)
    
    # initialize hidden state
    #h = net.init_hidden(batch_size)
    
    if(train_on_gpu):
        feature_tensor = feature_tensor.cuda()
    
    feature_tensor = torch.transpose(inputs, 0, 1)
    
    output = net(feature_tensor)
    
    # get the output from the model
    #output = net(feature_tensor)
    print(output)
    # convert output probabilities to predicted class (0 or 1)
    pred = torch.round(output.squeeze()) 
    print(pred)
    # printing output value, before rounding
    #print('Prediction value, pre-rounding: {:.6f}'.format(output.item()))
    
    # print custom response
    if(pred==1):
        print("Positive review detected!")
    else:
        print("Negative review detected.")

    # positive test review
test_review_pos = ['This movie had the best acting and the dialogue was so good. I loved it.']

In [None]:
# call function
seq_length=200 # good to use the length that was trained on

predict(model, test_review_pos, seq_length)

In [None]:
freq_words = pd.DataFrame({'Word':list(voc.keys()),'Count':list(voc.values())})
freq_words.head(10)

In [None]:
freq_words.shape

In [None]:
import torch
from torchtext import data

In [None]:
import torch
from torch.utils.data import TensorDataset, DataLoader

# create Tensor datasets
train_data = TensorDataset(torch.from_numpy(data_train), torch.from_numpy(train_y))
valid_data = TensorDataset(torch.from_numpy(data_val), torch.from_numpy(val_y))
test_data = TensorDataset(torch.from_numpy(reviews_test), torch.from_numpy(test_y))

# dataloaders
batch_size = 50

# make sure the SHUFFLE your training data
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
valid_loader = DataLoader(valid_data, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=True, batch_size=batch_size)

In [None]:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

BATCH_SIZE = 1

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (data_train, data_val, reviews_test),
    sort = False, #don't sort test/validation data
    batch_size=BATCH_SIZE,
    device=device)

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (data_train, data_val, reviews_test),
    sort_key = lambda x: x.s, #sort by s attribute (quote)
    batch_size=BATCH_SIZE,
    device=device)

print('Train:')
for batch in train_iterator:
    print(batch)
    
print('Valid:')
for batch in valid_iterator:
    print(batch)
    
print('Test:')
for batch in test_iterator:
    print(batch)