# Wine classification project

This is an NLP and deep learning project using `spacy`, `pytorch` and `torchtext`. It aims to retrieve information about a wine from short text reviews written by a taster.

## Objective

The targets we try to predict are the **country** of production, the **province** of production, and the **grape variety**.  
As a side objective, we show that on this dataset we can also recover the **name of the taster** with high accuracy, using only a review they have written.

## Dataset

The dataset contains data scraped from [WineEnthusiast](https://www.winemag.com/?s=&drink_type=wine), and is hosted on [this kaggle page](https://www.kaggle.com/zynicide/wine-reviews#winemag-data-130k-v2.csv). Each example contains a written review of a wine and various data about this wine, such as the country and province of production, the grape variety, the winery, the name and twitter handle of the taster, a general grade and a price index.

## What we'll do

We'll perform the following steps: 
- Load and clean the data
- Set up the training, validation and testing datasets
- Set up pre-trained word embeddings
- Create a CNN to classify our data
- Write a training routine
- Test our model

In the appendix, you'll also find:
- Helpers to analyze our model's performance and diagnose misclassifications
- Our initial RNN implementation, which did not perform as well as our CNN


### Library loading

In [1]:
import torch
import numpy as np
from torchtext import data

SEED = 2753 # We always use the same seed for reproducibility
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

TEXT = data.Field(
    tokenize='spacy',
#     include_lengths=True # Uncomment for RNN (see appendix)
)
LABEL = data.LabelField()

We load `torch` and `torchtext`, and set up our fields for `torchtext`. Note that we tell `torchtext` to use `spacy` as our tokenizer. For this to work you need to have `spacy` installed and to have downloaded an English language model. `torchtext` expects this model to be called `en`, so you might have to rename it.

### Load and clean the dataset

In [2]:
import csv

CURRENT_LABEL = 'country' # Column we're currently trying to guess. Change this to any key of COLUMNS_STOI below.

# String to int relation between column name and column index, to access them easily
COLUMNS_STOI = {
    'country': 1, 
    'province': 6, 
    'taster_name': 9,
    'variety': 12,
}

MIN_SAMPLE_NUMBER = 150

column_number = COLUMNS_STOI[CURRENT_LABEL]

with open('datasets/winemag-data-130k-v2.csv') as f:
    reader = csv.reader(f)
    lines_uncontrolled = []
    counts = {}

    for row in reader:
        if not row[column_number]:
            # Skip the row if it doesn't have the current label
            continue
        if CURRENT_LABEL == 'province':
            # We treat "Burgundy" as an alias of "Bordeaux" in this dataset, so merge the two labels
            if row[6] == "Burgundy":
                row[6] = "Bordeaux"
        # Keep a count of each label occurrence
        counts[row[column_number]] = counts.get(row[column_number], 0) + 1
        lines_uncontrolled.append(row)
        
lines = []

# Remove the rows where the label is too rare
for row in lines_uncontrolled:
    if counts[row[column_number]] >= MIN_SAMPLE_NUMBER:
        lines.append(row)
    
                      
print("Removed " + str(len(lines_uncontrolled) - len(lines)) + " rows")

print("Number of classes before cutting:", len(counts.keys()))
print("Original number of rows:", len(lines_uncontrolled))    
print("Rows after cutting:", len(lines))
print("Classes kept:", [k for k in counts.keys() if counts[k] >= MIN_SAMPLE_NUMBER])

Removed 1277 rows
Number of classes before cutting: 44
Original number of rows: 129909
Rows after cutting: 128632
Classes kept: ['Italy', 'Portugal', 'US', 'Spain', 'France', 'Germany', 'Argentina', 'Chile', 'Australia', 'Austria', 'South Africa', 'New Zealand', 'Israel', 'Greece', 'Canada']


The dataset has missing values, so we only keep the rows where the label we're looking at is present. We also keep only the classes for which we have enough examples: for instance, if a variety is too rare in the dataset, the model won't see enough reviews to learn what characterizes it. The threshold can be tuned with `MIN_SAMPLE_NUMBER`. We set it to `150`, which is roughly `1/1000` of the total dataset size.
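The filtering step can also be written more compactly. The following is only a minimal sketch on toy data, assuming rows shaped like `(id, country)`; the helper name `filter_rare_labels` is hypothetical and not part of the notebook.

```python
from collections import Counter

def filter_rare_labels(rows, label_index, min_count):
    """Keep only the rows whose label occurs at least `min_count` times."""
    counts = Counter(row[label_index] for row in rows)
    return [row for row in rows if counts[row[label_index]] >= min_count]

# Toy rows shaped like (id, country); purely illustrative data
toy_rows = [("0", "Italy"), ("1", "Italy"), ("2", "France"), ("3", "Italy")]
kept_rows = filter_rare_labels(toy_rows, label_index=1, min_count=2)
print(kept_rows)  # the single "France" row is dropped
```

With the real data, `min_count` would play the role of `MIN_SAMPLE_NUMBER`.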

### Train, validation and test splits

In [3]:
TEST_SET_SIZE = .3
VALIDATION_SET_SIZE = .2

indices = list(range(1, len(lines)))
np.random.seed(SEED)
np.random.shuffle(indices)

first_split_index = int(TEST_SET_SIZE * len(lines))
second_split_index = int((TEST_SET_SIZE+VALIDATION_SET_SIZE) * len(lines))

test_indices = indices[:first_split_index]
validation_indices = indices[first_split_index:second_split_index]
train_indices = indices[second_split_index:]

train_set = [lines[k] for k in train_indices]
test_set = [lines[k] for k in test_indices]
validation_set = [lines[k] for k in validation_indices]

print("Train set size:", len(train_set))
print("Validation set size:", len(validation_set))
print("Test set size:", len(test_set))
print("Train set sample:", train_set[0])

Train set size: 64315
Validation set size: 25727
Test set size: 38589
Train set sample: ['60920', 'US', 'Serious Cabernet, rich and complex and full bodied, made from Knights Valley and Alexander Valley fruit, with a few grapes from Atlas Peak and Mount Veeder. Dense and tannic now, but high toned, with waves of black currants, black cherries and oak. A good price for a Cabernet of this quality. Drink now–2012.', 'Grand Reserve', '91', '30.0', 'California', 'Sonoma County', 'Sonoma', '', '', 'Kendall-Jackson 2007 Grand Reserve Cabernet Sauvignon (Sonoma County)', 'Cabernet Sauvignon', 'Kendall-Jackson']


We split our dataset into train, validation and test sets. The validation set is 20% of the total and the test set 30%, leaving 50% for training.

We then write these sets to CSV files so we can load them afterwards. Note that we use the `csv` library to write them: our wine reviews contain commas, so the fields have to be quoted properly.

In [4]:
import os

# Create the output directory if it does not exist yet
os.makedirs('preprocessed_datasets', exist_ok=True)

# newline='' is what the csv module recommends for files passed to csv.writer
with open('preprocessed_datasets/train.csv', 'w', newline='') as train_file:
    writer = csv.writer(train_file)
    writer.writerows(train_set)

with open('preprocessed_datasets/test.csv', 'w', newline='') as test_file:
    writer = csv.writer(test_file)
    writer.writerows(test_set)

with open('preprocessed_datasets/validation.csv', 'w', newline='') as validation_file:
    writer = csv.writer(validation_file)
    writer.writerows(validation_set)
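To see why the `csv` library matters here, the sketch below compares a naive comma join with a `csv` round trip on one shortened review. It works on an in-memory buffer rather than on the real files, and the row content is illustrative.

```python
import csv
import io

review = "Dense and tannic now, but high toned, with waves of black currants."
row = ["60920", "US", review]

# Naive approach: the commas inside the review are mistaken for separators
naive_line = ",".join(row)
naive_fields = naive_line.split(",")
print(len(naive_fields))  # more than the 3 fields we started with

# csv approach: the review is quoted on write and restored intact on read
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
buffer.seek(0)
parsed = next(csv.reader(buffer))
print(parsed == row)
```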

### Setup the datasets

Then we'll set up the datasets so they can be used by `torchtext`. Here, we tell the library what the lines contain, and which data we want to use. We select the label we want to work on by setting it to `LABEL`, and we leave all the other columns to `None`.

The `description` field, which contains the reviews, is always set to `TEXT`: this is the field on which we're going to do some NLP.

In [5]:
# Put the label you want to predict as `LABEL`, all the other ones to `None`.
tv_datafields = [("id", None),
                 ("country", LABEL),
                 ("description", TEXT),
                 ("designation", None),
                 ("points", None),
                 ("price", None),
                 ("province", None),
                 ("region_1", None),
                 ("region_2", None),
                 ("taster_name", None),
                 ("taster_twitter_handle", None),
                 ("title", None),
                 ("variety", None),
                 ("winery", None)]

trn, vld, tst = data.TabularDataset.splits(path='preprocessed_datasets',
                                     format="csv",
                                     train= 'train.csv',
                                     validation='validation.csv',
                                     test='test.csv',
                                     fields=tv_datafields)

### Setup word embedding

Now we'll use pretrained word embeddings to improve the accuracy and speed up the training of our models.  
We'll build a vocabulary of the words encountered in the reviews (and in the labels), but as the reviews contain a lot of different words, we'll only keep the most common ones. For this we set a limit on the number of words in our vocabulary. This is not necessary for the labels, because their vocabulary is much smaller.

**Beware:** `glove.6B.100d` is a set of pretrained vectors. It weighs around **800 MB**, and if you don't have it installed, running the following cell will download it. Make sure you have a good connection.

In [6]:
MAX_VOCAB_SIZE = 25000

TEXT.build_vocab(trn,
                 max_size=MAX_VOCAB_SIZE,
                 vectors = "glove.6B.100d",  # CAREFUL: this will download ~800M of data
                 unk_init = torch.Tensor.normal_)
LABEL.build_vocab(trn)

print("Reviews vocab length:", len(TEXT.vocab))
print("Labels vocab length:", len(LABEL.vocab))

Reviews vocab length: 25002
Labels vocab length: 15


Note how we get `25002` and not `25000` as our TEXT vocab length. This is because `torchtext` adds two reserved tokens: `<unk>` (unknown), which replaces every word that is not in our vocabulary, and `<pad>`, which is used to pad the samples so they all have the same length.
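The effect of the size limit and of the two reserved tokens can be reproduced on a toy corpus. This is a simplified stand-in, not `torchtext`'s actual implementation; `build_vocab` and the sample tokens are hypothetical.

```python
from collections import Counter

def build_vocab(tokenized_texts, max_size):
    """Keep the `max_size` most frequent tokens, plus two reserved tokens."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    itos = ["<unk>", "<pad>"] + [tok for tok, _ in counts.most_common(max_size)]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

texts = [["ripe", "fruit", "and", "soft", "tannins"],
         ["ripe", "fruit", "and", "acidity"]]
itos, stoi = build_vocab(texts, max_size=3)
print(len(itos))  # max_size + 2, the same pattern as 25000 + 2 above

# Words outside the vocabulary fall back to the index of <unk>
unk_index = stoi["<unk>"]
encoded = [stoi.get(tok, unk_index) for tok in ["ripe", "oak"]]
print(encoded)
```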

We can check the most common words in our reviews vocab:

In [7]:
print(TEXT.vocab.freqs.most_common(30))

[(',', 218513), ('.', 175333), ('and', 172093), ('of', 84913), ('the', 83112), ('a', 78408), ('with', 57539), ('is', 48552), ('wine', 39830), ('-', 36812), ('this', 36108), ('in', 29999), ('flavors', 29798), ('to', 27640), ('The', 26346), ("'s", 25521), ('fruit', 24620), ('It', 21413), ('it', 21153), ('on', 21067), ('This', 20297), ('that', 19687), ('palate', 19062), ('aromas', 17373), ('acidity', 17111), ('finish', 16994), ('tannins', 15099), ('from', 14989), ('but', 14604), ('cherry', 14193)]


We notice that the most common word is a comma, which explains why we had to be careful with our csv reading and writing.

Now we'll set up iterators, which will allow us to iterate through batches of our training, validation and testing datasets:

In [8]:
BATCH_SIZE = 64

# If you have cuda support, this will make sure you're using it for training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (trn, vld, tst), 
    batch_size = BATCH_SIZE,
#     sort_within_batch=True, # Uncomment for RNN (see appendix), because batches need to be sorted
    sort_key=lambda x: len(x.description), 
    device = device)

Note that we sort our data according to the length of the review. This is because we need to pad the reviews so that all the samples in a batch have the same length. Grouping samples of similar length together ensures we don't have to add too much padding, which speeds up the process a bit.
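The saving can be estimated with a few lines of plain Python. The sketch below is not how `BucketIterator` is implemented; it only counts how many pad tokens a batch needs when every review is padded to the longest one in its batch. The review lengths are made up.

```python
def padding_needed(lengths, batch_size):
    """Total number of pad tokens when each batch is padded to its longest sample."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += sum(max(batch) - n for n in batch)
    return total

lengths = [5, 40, 6, 38, 7, 41]  # hypothetical review lengths, in tokens

unsorted_padding = padding_needed(lengths, batch_size=3)
sorted_padding = padding_needed(sorted(lengths), batch_size=3)
print(unsorted_padding, sorted_padding)  # sorting needs far fewer pad tokens
```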

### Creating the CNN

In [9]:
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, 
                 dropout, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.convs = nn.ModuleList([
                                    nn.Conv2d(in_channels = 1, 
                                              out_channels = n_filters, 
                                              kernel_size = (fs, embedding_dim)) 
                                    for fs in filter_sizes
                                    ])
        
        self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):
        text = text.permute(1, 0)        
        embedded = self.embedding(text)
        embedded = embedded.unsqueeze(1)
        conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs]
        pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved]
        cat = self.dropout(torch.cat(pooled, dim = 1))
        return self.fc(cat)

In the `__init__` function we define the architecture of our model.
- First we have an **embedding layer**: our input words are indices into the vocabulary (equivalent to sparse one-hot vectors), and this layer turns them into smaller, dense vectors.
- Several **convolution layers**: convolution on text is a bit specific, and we wrote a little more about it in our pdf (in French). Basically, it works a bit like convolution on images, except that each filter slides over n-grams (windows of consecutive words) instead of patches of pixels. Each convolution is followed by a **ReLU** activation and then **max pooling**.
- Finally a **linear layer**, whose output size is our number of classes, so we can perform classification.
- Note that we're using **dropout**: this technique limits overfitting by randomly setting some activations to 0 at each forward pass during training.
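To make the convolution and pooling steps more concrete, here is a minimal pure-Python sketch of what a single filter does. It assumes a filter of size 2 (bigrams) and toy 2-dimensional embeddings; the name `conv_relu_maxpool` and all the values are hypothetical. The real model applies `N_FILTERS` such filters per filter size, in parallel, through `nn.Conv2d`.

```python
def conv_relu_maxpool(embedded, kernel, bias=0.0):
    """Slide one filter over windows of consecutive words, apply ReLU,
    then keep the maximum activation over the whole sentence."""
    size = len(kernel)  # n-gram size covered by the filter
    activations = []
    for start in range(len(embedded) - size + 1):
        window = embedded[start:start + size]
        score = bias + sum(
            w * x
            for kernel_row, word_vector in zip(kernel, window)
            for w, x in zip(kernel_row, word_vector)
        )
        activations.append(max(0.0, score))  # ReLU
    return max(activations)  # max pooling over time

# Four words, each embedded in 2 dimensions (toy values)
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One bigram filter: one row of weights per word in the window
kernel = [[1.0, -1.0], [0.5, 0.5]]

feature = conv_relu_maxpool(embedded, kernel)
print(feature)
```

Each filter therefore produces one number per review, whatever the review length, which is why the linear layer can take `len(filter_sizes) * n_filters` inputs.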

Next we'll have to choose the parameters of this architecture:

In [10]:
INPUT_DIMENSION = len(TEXT.vocab)
EMBEDDING_DIM = 100
N_FILTERS = 100
FILTER_SIZES = [2,3,4]
OUTPUT_DIM = len(LABEL.vocab)
DROPOUT = 0.5
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

model = CNN(INPUT_DIMENSION, EMBEDDING_DIM, N_FILTERS, FILTER_SIZES, OUTPUT_DIM, DROPOUT, PAD_IDX)

- `INPUT_DIMENSION` and `OUTPUT_DIM` are based on our data.
- The embedding dimension `EMBEDDING_DIM` is fixed by the pretrained vectors we've loaded, so we have to keep this one at 100.
- We can choose `N_FILTERS` and `FILTER_SIZES` freely, as well as the dropout rate `DROPOUT`.

We can now use our pre-trained embeddings to set the initial values of our embedding layer:

In [11]:
pretrained_embeddings = TEXT.vocab.vectors

model.embedding.weight.data.copy_(pretrained_embeddings)

tensor([[-0.0737, -1.4165,  0.7548,  ..., -1.6059, -0.2473,  0.2487],
        [-0.0412, -0.6711,  0.8314,  ..., -1.2042,  0.2975, -0.3076],
        [-0.1077,  0.1105,  0.5981,  ..., -0.8316,  0.4529,  0.0826],
        ...,
        [ 0.1787,  0.0940, -0.6176,  ..., -0.1955, -0.3778, -0.0580],
        [ 0.0252, -0.9636, -0.2277,  ..., -2.2039,  0.3247,  0.5036],
        [-1.5420,  0.4095,  0.0563,  ...,  0.4985,  1.0879, -0.5282]])

Of course the pretrained vectors did not contain the `<unk>` and `<pad>` tokens, so we assign all-zero vectors to them:

In [12]:
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]

model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)

### Training

Now that everything is defined, we can train our model.

We choose Adam as our optimizer. The nice thing about Adam is that it adapts the step size of each parameter, so its default learning rate usually works without the manual tuning that plain stochastic gradient descent would need.  
We also need to choose a loss function. Here we use `CrossEntropyLoss` from `pytorch`, which is meant for problems where each sample belongs to exactly one class. This is our case, as each wine has only one country, one province, one taster...

In [13]:
import torch.optim as optim

optimizer = optim.Adam(model.parameters())

criterion = nn.CrossEntropyLoss()

model = model.to(device)
criterion = criterion.to(device)
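For intuition about the loss, the sketch below computes the cross-entropy of a single sample from raw scores (logits) in plain Python. It is an illustration of the formula, not PyTorch's implementation, and the logits are made up; `cross_entropy` is a hypothetical helper.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under a softmax over the logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 0.5, -1.0]  # hypothetical scores for 3 classes

loss_right = cross_entropy(logits, target=0)  # the top-scoring class is the true one
loss_wrong = cross_entropy(logits, target=2)  # the lowest-scoring class is the true one
print(loss_right, loss_wrong)  # the loss is lower when the model is right
```

`nn.CrossEntropyLoss` averages this quantity over the batch by default.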

Now we need to define an accuracy function. As we are doing multi-class classification, we use the proportion of correctly classified samples in a batch. In other words, for each sample we pick the label with the highest score, and then we compute the fraction of the batch for which this label is the correct one:

In [14]:
def categorical_accuracy(preds, y):
    # Index of the highest score for each sample in the batch
    max_preds = preds.argmax(dim = 1)
    # Fraction of samples whose predicted class matches the label
    return max_preds.eq(y).float().mean()
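The same computation on plain Python lists may help to check the logic by hand. This is only a sketch with made-up scores; `categorical_accuracy_py` is a hypothetical name.

```python
def categorical_accuracy_py(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in scores]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return correct / len(labels)

# One row of class scores per sample (toy values), and the true class indices
scores = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 3.0], [0.9, 0.8, 0.7]]
labels = [1, 0, 2, 1]

accuracy = categorical_accuracy_py(scores, labels)
print(accuracy)  # 3 of the 4 samples are classified correctly
```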

We can now define our training and evaluation functions, which will respectively train the model and evaluate its accuracy batch after batch.

We are always using the `description` field (review text) as an input, but we can take varying outputs depending on what label we're experimenting on, so we need to get this one back with `getattr`.

In [15]:
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        optimizer.zero_grad()
        
        predictions = model(batch.description)
        
        loss = criterion(predictions, getattr(batch, CURRENT_LABEL))
        
        acc = categorical_accuracy(predictions, getattr(batch, CURRENT_LABEL))
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            predictions = model(batch.description)
            
            loss = criterion(predictions, getattr(batch, CURRENT_LABEL))
            
            acc = categorical_accuracy(predictions, getattr(batch, CURRENT_LABEL))

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)


We create a helper function to keep track of time during training, so we can compare how fast our different models are:

In [16]:
import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

Now it's time to run the training!

We choose the number of epochs we want to run the model on, and when we get better results, we save the model in a separate file to make sure we don't lose it as this step can be time consuming.

In [17]:
N_EPOCHS = 10

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'wine-prediction-model.pt')
    
    print('Epoch: ' + str(epoch+1) + ' | Epoch Time: ' + str(epoch_mins) + 'm '+ str(epoch_secs) + 's')
    print('\tTrain Loss: ' + str(train_loss) + ' | Train Acc: ' + str(train_acc*100) + '%')
    print('\tVal. Loss: ' + str(valid_loss) + ' |  Val. Acc: ' + str(valid_acc*100) + '%')

Epoch: 1 | Epoch Time: 2m 30s
	Train Loss: 1.038166518857823 | Train Acc: 68.35233261929223%
	Val. Loss: 0.6452360872309006 |  Val. Acc: 79.10355218310853%
Epoch: 2 | Epoch Time: 2m 43s
	Train Loss: 0.6372991372696796 | Train Acc: 79.22780166811017%
	Val. Loss: 0.5362838631245628 |  Val. Acc: 81.66897408108214%
Epoch: 3 | Epoch Time: 2m 42s
	Train Loss: 0.5159955355065379 | Train Acc: 82.80443650573048%
	Val. Loss: 0.4701322091752617 |  Val. Acc: 83.89618129872564%
Epoch: 4 | Epoch Time: 2m 40s
	Train Loss: 0.43658111817504636 | Train Acc: 85.01773442201946%
	Val. Loss: 0.45510196037108624 |  Val. Acc: 84.40134397786649%
Epoch: 5 | Epoch Time: 2m 37s
	Train Loss: 0.3796428862347532 | Train Acc: 86.84245614863154%
	Val. Loss: 0.4365081242216167 |  Val. Acc: 84.96123055617014%
Epoch: 6 | Epoch Time: 2m 34s
	Train Loss: 0.33618637588783284 | Train Acc: 88.08689602571933%
	Val. Loss: 0.4432896591389357 |  Val. Acc: 85.12435343431596%
Epoch: 7 | Epoch Time: 2m 33s
	Train Loss: 0.30325687466
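The checkpointing rule of the training loop can be replayed on the validation losses of the first six epochs, rounded from the log above. This is a sketch only: `best_epoch` is a hypothetical helper, and the place where `torch.save` would be called is marked with a comment.

```python
def best_epoch(valid_losses):
    """Return the 1-based epoch whose checkpoint would be kept."""
    best_loss = float("inf")
    best = None
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:  # same rule as in the training loop
            best_loss = loss
            best = epoch  # this is where torch.save would be called
    return best

# Validation losses of epochs 1 to 6, rounded from the log above
valid_losses = [0.645, 0.536, 0.470, 0.455, 0.437, 0.443]
kept_epoch = best_epoch(valid_losses)
print(kept_epoch)
```

Epoch 6 has a slightly better validation accuracy than epoch 5 but a worse validation loss, so with this rule it does not overwrite the saved model.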

### Testing the results

Now that we have trained the model, we can use the test samples we have set aside to measure its performance on unseen samples:

In [18]:
model.load_state_dict(torch.load('wine-prediction-model.pt'))

test_loss, test_acc = evaluate(model, test_iterator, criterion)

print('Test Loss: ' + str(test_loss) + ' | Test Acc: '+ str(test_acc*100) + '%')

Test Loss: 0.4347682754149287 | Test Acc: 85.83219490241055%


Depending on the experiment you're running, you can get various results at this step. We kept track of the results we obtained here in our pdf.

### Live testing

To play a bit more with the model, we can use `spacy` to classify individual reviews on the fly:

In [19]:
import spacy
nlp = spacy.load('en_core_web_sm')

def predict_class(model, sentence, min_len = 4):
    model.eval()
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    if len(tokenized) < min_len:
        tokenized += ['<pad>'] * (min_len - len(tokenized))
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(1)
    preds = model(tensor)
    max_preds = preds.argmax(dim = 1)
    return max_preds.item()

In the following cell, you can put any review in the description, and check how the model classifies it.

In [20]:
description = "This cooperative, based in Aÿ, has benefited from the fine Pinot Noir in the village to produce a ripe red fruited wine. With balanced acidity and a soft aftertaste, it is ready to drink."
pred_class = predict_class(model, description)
print('Predicted class is: ' + str(pred_class) + ' = ' + str(LABEL.vocab.itos[pred_class]))

Predicted class is: 1 = France


## Appendix

We also provide a few things that are not directly linked to our results above but that we used during our work.  

### Results exploration

To check how our model was performing, and especially in which cases it didn't perform well, we used the following script. It lets us look at a number of samples, optionally only the wrongly classified ones. This is for instance how we found out that **Burgundy** was also called **Bordeaux** in the dataset, which led to lots of classification errors.

In [21]:
import csv

LIMIT = 30  # How many results to display
SHOW_ONLY_WRONG = False # If set to true, will only show the wrongly classified samples

with open('preprocessed_datasets/test.csv') as f:
    reader = csv.reader(f)

    i = 0
    for row in reader:
        if i > LIMIT:
            break
        sentence = row[2]
        real_value = row[COLUMNS_STOI[CURRENT_LABEL]]
        pred_value = predict_class(model, sentence)
        if not SHOW_ONLY_WRONG or real_value != LABEL.vocab.itos[pred_value]:
            print(sentence)
            print("Actual: " + str(real_value) + ", predicted: " + str(LABEL.vocab.itos[pred_value]) + "\n")
        i += 1
    

Still showing its tannins, this wine is developing well. It is relatively light in texture, the sweet berry fruits balanced with a layer of acidity.
Actual: France, predicted: France

A unique blend of fermenting orange, aging white flowers, dried apples and a musky sandalwood show on the nose of this wine. The palate is simultaneously rich, sour and creamy, with tangerine and banana flavors.
Actual: US, predicted: US

Sweet tobacco and overripe cherry open the nose of this thick, jammy Amarone. The wine exhibits a syrupy, bold mouthfeel with lingering tones of smoke, beef jerky and spice on the finish.
Actual: Italy, predicted: Italy

Mineral aromas of gravel, graphite and crushed slate show on the nose of this bottling, leading into baked black plum and oak notes. It's a refreshing example from an appellation that tends toward richer, jammy styles. The palate offers raspberry and dried thyme flavors, with a touch of eucalyptus.
Actual: US, predicted: US

Aromas of plum, wild herbs an
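Reading samples one by one works, but counting the most frequent (actual, predicted) pairs can surface systematic confusions faster. The sketch below is a possible complement to the script above, not something we ran for the reported results; `most_confused` is a hypothetical helper and the label lists are made up.

```python
from collections import Counter

def most_confused(actual, predicted, top=3):
    """Most frequent (actual, predicted) pairs among the misclassified samples."""
    pairs = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return pairs.most_common(top)

# Made-up labels, for illustration only
actual = ["Burgundy", "Bordeaux", "Burgundy", "Tuscany", "Burgundy"]
predicted = ["Bordeaux", "Bordeaux", "Bordeaux", "Tuscany", "Burgundy"]

confusions = most_confused(actual, predicted)
print(confusions)
```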

### RNN experiments

We also implemented a multi-class RNN, as this kind of model usually works well on text analysis because of the sequential nature of text. However, it turned out not to perform as well as our CNN described above, and it took much longer to train. If you want to try running it yourself (beware: the training can take several hours), you'll have to change a few things in the code above: look for the comments about RNN.

The train and evaluate functions are similar to those of the CNN, the main difference being the text length being taken into account.

In [None]:
def trainRNN(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for (index, batch) in enumerate(iterator):
        print(index/len(iterator))
        optimizer.zero_grad()
        
        text, text_lengths = batch.description
        
        predictions = model(text, text_lengths).squeeze(1)
        
        loss = criterion(predictions, batch.province)
        
        acc = categorical_accuracy(predictions, batch.province)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

def evaluateRNN(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            text, text_lengths = batch.description
            
            predictions = model(text, text_lengths).squeeze(1)
            
            loss = criterion(predictions, batch.province)
            
            acc = categorical_accuracy(predictions, batch.province)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

N_EPOCHS = 10
best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = trainRNN(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluateRNN(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'wine-prediction-model.pt')
    
    print('Epoch: ' + str(epoch+1) + ' | Epoch Time: ' + str(epoch_mins) + 'm '+ str(epoch_secs) + 's')
    print('\tTrain Loss: ' + str(train_loss) + ' | Train Acc: ' + str(train_acc*100) + '%')
    print('\tVal. Loss: ' + str(valid_loss) + ' |  Val. Acc: ' + str(valid_acc*100) + '%')

And below you can find the definition of our RNN architecture:

In [381]:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, 
                 bidirectional, dropout, pad_idx):
        
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        self.rnn = nn.LSTM(embedding_dim, 
                           hidden_dim, 
                           num_layers=n_layers, 
                           bidirectional=bidirectional, 
                           dropout=dropout)
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text, text_lengths):
        embedded = self.dropout(self.embedding(text))
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths)   
        packed_output, (hidden, cell) = self.rnn(packed_embedded)
        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)
        hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))
        return self.fc(hidden.squeeze(0))

INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = len(LABEL.vocab)
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

model = RNN(INPUT_DIM, 
            EMBEDDING_DIM, 
            HIDDEN_DIM, 
            OUTPUT_DIM, 
            N_LAYERS, 
            BIDIRECTIONAL, 
            DROPOUT, 
            PAD_IDX)