# Building configurable neural networks

The goal of this part is to get you up and running with the wonderful library from the AllenAI people. The library implements all the details we have coded ourselves (batching, padding, etc.). More importantly it provides a configurable architecture which lets you both experiment and evolve your models.

In this case we will be rebuilding our classifier with configurable embeddings (word-level and char-level) and a configurable encoder (BoE, RNNs, CNNs, etc.)

In particular, we will be implementing three classes. 

- The first is a <a href ="https://allenai.github.io/allennlp-docs/api/allennlp.data.dataset_readers.html">DatasetReader</a>, which contains the logic for reading a file of data and producing a stream of <code>Instance</code>s.
- The second is a configurable `Model`, which can combine different modules (seq2vec encoders such as lstms, cnns, Elmo, etc.).

But first let's prepare our datasets. This time we will be using the training and validation splits provided by fastai.

In [6]:
import fastai
import pandas as pd
data_path = fastai.untar_data(fastai.URLs.IMDB_SAMPLE)
df = pd.read_csv(data_path/'texts.csv')
train_df = df.loc[df['is_valid'] == False] # get examples from train split
validation_df = df.loc[df['is_valid'] == True] # get examples from valid split
train_df.to_csv(data_path/'train.csv')
validation_df.to_csv(data_path/'validation.csv')

  return f(*args, **kwds)


In [2]:
# AllenNLP uses type annotations
from typing import Iterator, List, Dict

import torch
import torch.optim as optim
import numpy as np

# AllenNLP represent each training example as Instances, containing several fields
from allennlp.data import Instance
from allennlp.data.fields import TextField, LabelField
# Abstract DatasetReader, similar to our previous 'Dataset'
from allennlp.data.dataset_readers import DatasetReader
# Tokenizer and numericalizers utilities
from allennlp.data.token_indexers import TokenIndexer, SingleIdTokenIndexer
from allennlp.data.tokenizers import Token

In [3]:
class CSVDatasetReader(DatasetReader):
    #  <code>TokenIndexer</code>s similar to our previous to_id method
    def __init__(self, token_indexers: Dict[str, TokenIndexer] = None) -> None:
        super().__init__(lazy=False)
        self.token_indexers = token_indexers or {"tokens": SingleIdTokenIndexer()}

    # This creates and wraps training and evaluating examples
    def text_to_instance(self, tokens: List[Token], label: str = None) -> Instance:
        text_field = TextField(tokens, self.token_indexers)
        fields = {"text": text_field}

        if label:
            label_field = LabelField(label)
            fields["label"] = label_field

        return Instance(fields)
 
    # This reads the file and builds instance for each example
    def _read(self, file_path: str) -> Iterator[Instance]:
        dataset = pd.read_csv(file_path)
        for _, row in dataset.iterrows():
            yield self.text_to_instance([Token(word) for word in row['text']],
                                       row['label'])

# Building the model

Our model will be composed of the following layers:

- text embeddings
- encoder
- linear layer



In [11]:
from allennlp.modules.text_field_embedders import TextFieldEmbedder
from allennlp.modules.seq2vec_encoders import Seq2VecEncoder
from allennlp.models import Model
import torch.nn.functional as F

from allennlp.training.metrics import CategoricalAccuracy
from allennlp.data.vocabulary import Vocabulary
from allennlp.nn.util import get_text_field_mask
    
class TextClassifier(Model): # Inherit from allenNLP model which wraps torch.nn.Module
    def __init__(self,
                # here we can plug dif embeddings (char, words, elmo, and combinations)
                word_embeddings: TextFieldEmbedder,
                # same for encoders
                encoder: Seq2VecEncoder,
                vocab: Vocabulary) -> None:
        super().__init__(vocab)
        self.word_embeddings = word_embeddings
        self.encoder = encoder
        self.out = torch.nn.Linear(in_features=encoder.get_output_dim(),
                                          out_features=vocab.get_vocab_size('label'))
        self.loss = torch.nn.CrossEntropyLoss()
        self.metrics = {
            "accuracy": CategoricalAccuracy()
        }
        self.loss = torch.nn.CrossEntropyLoss()
    def forward(self,
                text: Dict[str, torch.Tensor],
                label: torch.Tensor = None) -> torch.Tensor:
        # AllenNLP provides out of the box utilities for dealing with padding
        # and also masking to exclude the padding from the computation
        mask = get_text_field_mask(text)
        
        embeddings = self.word_embeddings(text)
        # Sequence of encoded outputs
        encoder_out = self.encoder(embeddings, mask)
        
        label_logits = self.out(encoder_out)
        output = {"label": label_logits}

        class_probabilities = F.softmax(label_logits)
        output_dict = {"class_probabilities": class_probabilities}

        if label is not None:
            loss = self.loss(label_logits, label.squeeze(-1))
            for metric in self.metrics.values():
                metric(label_logits, label.squeeze(-1))
            output["loss"] = loss
        return output
    
    def get_metrics(self, reset: bool = False) -> Dict[str, float]:
        return {metric_name: metric.get_metric(reset) for metric_name, metric in self.metrics.items()}
        

# Reading data and creating the vocab

In [7]:
# Read data and make vocab
csv_reader = CSVDatasetReader()
# create train and validation datasets
train_dataset= csv_reader.read(data_path/'train.csv')
validation_dataset= csv_reader.read(data_path/'validation.csv')
# Make vocab from train and valid
vocab = Vocabulary.from_instances(train_dataset + validation_dataset)

800it [00:04, 176.98it/s]
200it [00:01, 156.57it/s]
100%|██████████| 1000/1000 [00:01<00:00, 918.86it/s]


# Configuring our model
So we said that one of the best thing of AllenNLP and the model we are building are their modularity and extensibility. Let's see how this works:



In [12]:
from allennlp.common import Params
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
word_embeddings_config = Params({
    "tokens": {
        "type": "embedding",
        "embedding_dim": 50
    }
})
word_embeddings = BasicTextFieldEmbedder.from_params(vocab, word_embeddings_config)

encoder_config = Params({
            "type": "boe",
            "embedding_dim": 50
})
encoder = Seq2VecEncoder.from_params(encoder_config)

# Our model gets this configured modules
# Later we can simply change these configurations to try out new ideas
imdb_classifier = TextClassifier(word_embeddings, encoder, vocab)

# Finally, training


In [13]:
from allennlp.data.iterators import BucketIterator
from allennlp.training.trainer import Trainer

# Using an optimizer as before
optimizer = optim.Adam(imdb_classifier.parameters(), lr=0.1)

# This handles batching for our datasets. 
# The iterator sorts instances by the specified fields in order to create 
# batches with similar sequence lengths. 
# Here we indicate that we want to sort the instances by the number of tokens in the text field
iterator = BucketIterator(batch_size=4, sorting_keys=[("text", "num_tokens")])
iterator.index_with(vocab)

# Train for 10 epochs, and early stop if validation does not improve for two consec epochs
trainer = Trainer(model=imdb_classifier,
                  optimizer=optimizer,
                  iterator=iterator,
                  train_dataset=train_dataset,
                  validation_dataset=validation_dataset,
                  patience=2,
                  num_epochs=10)

trainer.train()




accuracy: 0.2500, loss: 14.6712 ||:   0%|          | 1/200 [00:00<02:32,  1.30it/s][A
accuracy: 0.5000, loss: 350.7003 ||:   2%|▏         | 3/200 [00:00<01:50,  1.79it/s][A
accuracy: 0.6250, loss: 476.2503 ||:   3%|▎         | 6/200 [00:01<01:18,  2.48it/s][A
accuracy: 0.5625, loss: 606.9741 ||:   4%|▍         | 8/200 [00:01<01:00,  3.20it/s][A
accuracy: 0.5417, loss: 452.6760 ||:   6%|▌         | 12/200 [00:01<00:42,  4.39it/s][A
accuracy: 0.5000, loss: 530.8801 ||:   8%|▊         | 15/200 [00:01<00:32,  5.73it/s][A
accuracy: 0.5000, loss: 603.7033 ||:   8%|▊         | 17/200 [00:01<00:25,  7.24it/s][A
accuracy: 0.5263, loss: 625.5961 ||:  10%|▉         | 19/200 [00:01<00:20,  8.62it/s][A
accuracy: 0.5208, loss: 571.8606 ||:  12%|█▏        | 24/200 [00:01<00:15, 11.38it/s][A
accuracy: 0.5000, loss: 523.5506 ||:  14%|█▎        | 27/200 [00:01<00:12, 13.69it/s][A
accuracy: 0.5242, loss: 459.9739 ||:  16%|█▌        | 31/200 [00:02<00:09, 16.90it/s][A
accuracy: 0.5000, loss: 

accuracy: 0.4957, loss: 105.0591 ||:  58%|█████▊    | 117/200 [00:05<00:03, 27.51it/s][A
accuracy: 0.4979, loss: 101.6828 ||:  60%|██████    | 121/200 [00:05<00:02, 27.76it/s][A
accuracy: 0.4980, loss: 98.5544 ||:  62%|██████▎   | 125/200 [00:05<00:02, 25.47it/s] [A
accuracy: 0.4961, loss: 96.4654 ||:  64%|██████▍   | 128/200 [00:05<00:02, 25.36it/s][A
accuracy: 0.4981, loss: 94.4460 ||:  66%|██████▌   | 131/200 [00:05<00:02, 23.82it/s][A
accuracy: 0.4981, loss: 92.6289 ||:  67%|██████▋   | 134/200 [00:06<00:02, 22.21it/s][A
accuracy: 0.5018, loss: 90.7268 ||:  68%|██████▊   | 137/200 [00:06<00:03, 18.77it/s][A
accuracy: 0.5018, loss: 88.4570 ||:  70%|███████   | 141/200 [00:06<00:02, 22.08it/s][A
accuracy: 0.5035, loss: 86.6907 ||:  72%|███████▏  | 144/200 [00:06<00:02, 20.77it/s][A
accuracy: 0.5000, loss: 84.4598 ||:  74%|███████▍  | 148/200 [00:06<00:02, 23.42it/s][A
accuracy: 0.5000, loss: 82.9055 ||:  76%|███████▌  | 151/200 [00:06<00:02, 21.38it/s][A
accuracy: 0.5032, 

accuracy: 0.3333, loss: 132.7416 ||:   2%|▏         | 3/200 [00:01<03:41,  1.12s/it][A
accuracy: 0.4000, loss: 159.9533 ||:   2%|▎         | 5/200 [00:01<02:36,  1.25it/s][A
accuracy: 0.4286, loss: 138.8638 ||:   4%|▎         | 7/200 [00:01<01:52,  1.71it/s][A
accuracy: 0.3889, loss: 291.1868 ||:   4%|▍         | 9/200 [00:02<01:24,  2.26it/s][A
accuracy: 0.4091, loss: 246.0780 ||:   6%|▌         | 11/200 [00:02<01:01,  3.07it/s][A
accuracy: 0.4231, loss: 249.2264 ||:   6%|▋         | 13/200 [00:02<00:47,  3.93it/s][A
accuracy: 0.4500, loss: 225.1828 ||:   8%|▊         | 15/200 [00:02<00:37,  4.89it/s][A
accuracy: 0.4706, loss: 233.0927 ||:   8%|▊         | 17/200 [00:02<00:33,  5.53it/s][A
accuracy: 0.4875, loss: 210.8865 ||:  10%|█         | 20/200 [00:03<00:24,  7.21it/s][A
accuracy: 0.5000, loss: 242.7198 ||:  11%|█         | 22/200 [00:03<00:21,  8.20it/s][A
accuracy: 0.5104, loss: 233.7059 ||:  12%|█▏        | 24/200 [00:03<00:17,  9.89it/s][A
accuracy: 0.5185, loss: 2

{'training_duration': '00:00:27',
 'training_start_epoch': 0,
 'training_epochs': 2,
 'epoch': 2,
 'training_accuracy': 0.47875,
 'training_loss': 296.36930656939745,
 'validation_accuracy': 0.535,
 'validation_loss': 222.559275932312,
 'best_epoch': 1,
 'best_validation_accuracy': 0.525,
 'best_validation_loss': 3.8973180186748504}

## Exercises

1. Try using pre-trained word embeddings on the embedding layer.


2. Try adding char level tokenization and embedding.


3. We used the simplest encoder possible, try a CNN encoder and then RNN-based ones.


4. On the network we created before, we used two linear layers with one non-linearity
in between, could you try here? Do you get better results? Try using the Feedforward module from allenNLP, which can be configured like this:
```json
"classifier_feedforward": {
      "input_dim": 400,
      "num_layers": 2,
      "hidden_dims": [200, 3],
      "activations": ["relu", "linear"],
      "dropout": [0.2, 0.0]
    }
```


5. Use Elmo for embedding text.


BONUS: 
You could run the training and/or evaluation with the full-IMDB dataset available by running `fastai.untar_data(fastai.URLs.IMDB)` (hint: you will need to pre-arrange the data a little bit as its organized a bit differently, run path.ls() to see the new structure)

