# Transfer Learning in NLP

In this notebook, we go through the basics of transfer learning in NLP using two pretrained architectures, ELMo and BERT, and compare their results with GloVe embeddings on the [Twitter US Airline Sentiment dataset](https://www.kaggle.com/crowdflower/twitter-airline-sentiment).

### Dataset

The Twitter data was scraped from February 2015. Contributors were asked to first classify each tweet as positive, negative, or neutral, and then to categorize the reason for negative tweets (such as "late flight" or "rude service"). The dataset records this sentiment for tweets about six US airlines.

---

Here we will use [AllenNLP](https://github.com/allenai/allennlp  "AllenNLP GitHub").


Everything is explained in detail in this [blog post](https://dudeperf3ct.github.io/nlp/transfer/learning/2019/02/22/Power-of-Transfer-Learning-in-NLP/). This notebook replicates the results of the blog post and runs in Colab. Enjoy!


#### Run in Colab

You can run this notebook in Google Colab.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dudeperf3ct/DL_notebooks/blob/master/tl_nlp/tl_nlp_allennlp.ipynb)




In [0]:
! pip install allennlp

Collecting allennlp
  Downloading https://files.pythonhosted.org/packages/64/32/d6d0a93a23763f366df2dbd4e007e45ce4d2ad97e6315506db9da8af7731/allennlp-0.8.2-py3-none-any.whl (5.6MB)
    100% |████████████████████████████████| 5.6MB 8.2MB/s 
Collecting flaky (from allennlp)
  Downloading https://files.pythonhosted.org/packages/02/42/cca66659a786567c8af98587d66d75e7d2b6e65662f8daab75db708ac35b/flaky-3.5.3-py2.py3-none-any.whl
Collecting pytz==2017.3 (from allennlp)
  Downloading https://files.pythonhosted.org/packages/a3/7f/e7d1acbd433b929168a4fb4182a2ff3c33653717195a26c1de099ad1ef29/pytz-2017.3-py2.py3-none-any.whl (511kB)
    100% |████████████████████████████████| 512kB 25.0MB/s 
Collecting numpydoc==0.8.0 (from allennlp)
  Downloading https://files.pythonhosted.org/packages/95/a8/b4706a6270f0475541c5c1ee3373c7a3b793936ec1f517f1a1dab4f896c0/numpydoc-0.8.0.tar.gz
Collecting parsimonious==0.8.0 (from allennlp)
  Downloading https://files.pythonhosted.org/packages/

In [0]:
! python -m spacy download en
! python -m spacy download en_core_web_md


    Linking successful
    /usr/local/lib/python3.6/dist-packages/en_core_web_sm -->
    /usr/local/lib/python3.6/dist-packages/spacy/data/en

    You can now load the model via spacy.load('en')


    Linking successful
    /usr/local/lib/python3.6/dist-packages/en_core_web_md -->
    /usr/local/lib/python3.6/dist-packages/spacy/data/en_core_web_md

    You can now load the model via spacy.load('en_core_web_md')



## Twitter Sentiment Data

Code adapted from: [Link](https://github.com/keitakurita/Practical_NLP_in_PyTorch)

ELMo paper: [Link](https://arxiv.org/pdf/1802.05365.pdf)

BERT paper: [Link](https://arxiv.org/pdf/1810.04805.pdf)



In [0]:
from pathlib import Path
from typing import *
import os
import torch
import torch.optim as optim
import numpy as np
import pandas as pd
from functools import partial
from overrides import overrides

from allennlp.data import Instance
from allennlp.data.token_indexers import TokenIndexer
from allennlp.data.tokenizers import Token
from allennlp.nn import util as nn_util
from allennlp.common.checks import ConfigurationError

USE_GPU = torch.cuda.is_available()

Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.


In [0]:
df = pd.read_csv('https://query.data.world/s/hus7zihvuo5vt65cnv4fcfn2ppfj6y', encoding = "ISO-8859-1")
df = df[["airline_sentiment", "text"]]
df.head()

Unnamed: 0,airline_sentiment,text
0,neutral,@VirginAmerica What @dhepburn said.
1,positive,@VirginAmerica plus you've added commercials t...
2,neutral,@VirginAmerica I didn't today... Must mean I n...
3,negative,@VirginAmerica it's really aggressive to blast...
4,negative,@VirginAmerica and it's a really big bad thing...


In [0]:
# one indicator column per sentiment class (1 if the tweet has that label, else 0)
df['positive'] = (df['airline_sentiment'] == 'positive').astype(int)
df['negative'] = (df['airline_sentiment'] == 'negative').astype(int)
df['neutral'] = (df['airline_sentiment'] == 'neutral').astype(int)

In [0]:
df.head()

Unnamed: 0,airline_sentiment,text,positive,negative,neutral
0,neutral,@VirginAmerica What @dhepburn said.,0,0,1
1,positive,@VirginAmerica plus you've added commercials t...,1,0,0
2,neutral,@VirginAmerica I didn't today... Must mean I n...,0,0,1
3,negative,@VirginAmerica it's really aggressive to blast...,0,1,0
4,negative,@VirginAmerica and it's a really big bad thing...,0,1,0
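
The three indicator columns are a one-hot encoding of `airline_sentiment`. As a minimal, library-free sketch of the same mapping (the helper name `one_hot` is hypothetical, and the class order is assumed to follow the `label_cols` list defined later in this notebook):

```python
# Illustrative sketch only; `one_hot` is a hypothetical helper, not part of the notebook.
LABELS = ["negative", "neutral", "positive"]  # assumed to match the order of label_cols

def one_hot(sentiment: str) -> list:
    """Return a 0/1 indicator list with a single 1 at the label's position."""
    return [1 if sentiment == label else 0 for label in LABELS]

rows = ["neutral", "positive", "negative"]
encoded = [one_hot(s) for s in rows]
print(encoded)
```

In this order, a neutral tweet becomes `[0, 1, 0]`, which is the form the label arrays take further down.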


In [0]:
# create the output directory if it does not exist yet
os.makedirs('data', exist_ok=True)
df.to_csv('data/train.csv', index=False)

In [0]:
class Config(dict):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        for k, v in kwargs.items():
            setattr(self, k, v)
    
    def set(self, key, val):
        self[key] = val
        setattr(self, key, val)
        
config = Config(
    testing=False,
    seed=1,
    batch_size=64,
    lr=3e-4,
    epochs=10,
    hidden_sz=64,
    max_seq_len=100, # necessary to limit memory usage
    max_vocab_size=10000,
)

In [0]:
torch.manual_seed(config.seed)
DATA_ROOT = Path("data")

### Load Data

### Prepare Dataset

In [0]:
from allennlp.data.vocabulary import Vocabulary
from allennlp.data.dataset_readers import DatasetReader

In [0]:
label_cols = ["negative", "neutral", "positive"]

In [0]:
from allennlp.data.fields import TextField, MetadataField, ArrayField
from allennlp.data.token_indexers import SingleIdTokenIndexer  # default indexer used below

class SentimentDatasetReader(DatasetReader):
    def __init__(self, tokenizer: Callable[[str], List[str]]=lambda x: x.split(),
                 token_indexers: Dict[str, TokenIndexer] = None,
                 max_seq_len: Optional[int]=config.max_seq_len) -> None:
        super().__init__(lazy=False)
        self.tokenizer = tokenizer
        self.token_indexers = token_indexers or {"tokens": SingleIdTokenIndexer()}
        self.max_seq_len = max_seq_len

    @overrides
    def text_to_instance(self, tokens: List[Token], id: str=None, labels: np.ndarray=None) -> Instance:
        sentence_field = TextField(tokens, self.token_indexers)
        fields = {"tokens": sentence_field}
        
        id_field = MetadataField(id)
        fields["id"] = id_field
        
        if labels is None:
            labels = np.zeros(len(label_cols))
        label_field = ArrayField(array=labels)
        fields["label"] = label_field

        return Instance(fields)
    
    @overrides
    def _read(self, file_path: str) -> Iterator[Instance]:
        df = pd.read_csv(file_path)
        if config.testing: df = df.head(1000)
        for i, row in df.iterrows():
            yield self.text_to_instance([Token(x) for x in self.tokenizer(row["text"])], None, row[label_cols].values)

## GloVe

In [0]:
from allennlp.data.tokenizers.word_splitter import SpacyWordSplitter
from allennlp.data.token_indexers import SingleIdTokenIndexer

# the token indexer is responsible for mapping tokens to integers
token_indexer = SingleIdTokenIndexer()

# build the spaCy splitter once rather than on every call
word_splitter = SpacyWordSplitter(language='en_core_web_sm', pos_tags=False)

def tokenizer(x: str):
    return [w.text for w in word_splitter.split_words(x)[:config.max_seq_len]]

In [0]:
reader = SentimentDatasetReader(
    tokenizer=tokenizer,
    token_indexers={"tokens": token_indexer}
)

In [0]:
train_ds = reader.read(DATA_ROOT / "train.csv")
val_ds = None

14640it [00:19, 735.51it/s]


In [0]:
len(train_ds)

14640

In [0]:
vars(train_ds[0].fields["tokens"])

{'_indexed_tokens': None,
 '_indexer_name_to_indexed_token': None,
 '_token_indexers': {'tokens': <allennlp.data.token_indexers.single_id_token_indexer.SingleIdTokenIndexer at 0x7f42d587d7b8>},
 'tokens': [@VirginAmerica, What, @dhepburn, said, .]}

In [0]:
vars(train_ds[0].fields["label"])

{'array': array([0, 1, 0], dtype=object), 'padding_value': 0}

### Prepare Vocabulary

In [0]:
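# an empty Vocabulary maps every token to the OOV id; Vocabulary.from_instances(train_ds, max_vocab_size=config.max_vocab_size) would build the mapping from the data
vocab = Vocabulary()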
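
A vocabulary maps each token to an integer id, reserving ids for padding and for out-of-vocabulary tokens. The sketch below is a simplified stand-in written for illustration, not AllenNLP's implementation; `build_vocab`, `to_ids`, `PAD`, and `OOV` are hypothetical names. It shows that a token missing from the mapping falls back to the out-of-vocabulary id, which is what happens to every token when the mapping holds only the reserved entries.

```python
from collections import Counter

PAD, OOV = "<pad>", "<unk>"  # hypothetical reserved tokens

def build_vocab(sentences, max_size):
    """Map the `max_size` most frequent tokens to ids; ids 0 and 1 are reserved."""
    counts = Counter(tok for sent in sentences for tok in sent)
    token_to_id = {PAD: 0, OOV: 1}
    for tok, _ in counts.most_common(max_size):
        token_to_id[tok] = len(token_to_id)
    return token_to_id

def to_ids(sentence, token_to_id):
    """Look up each token, falling back to the OOV id."""
    return [token_to_id.get(tok, token_to_id[OOV]) for tok in sentence]

sentences = [["bad", "service"], ["bad", "flight"], ["great", "flight"]]
vocab_map = build_vocab(sentences, max_size=2)

# "great" is not among the two most frequent tokens, so it maps to the OOV id
ids = to_ids(["bad", "great", "flight"], vocab_map)

# with only the reserved entries, every token collapses to the OOV id
empty_ids = to_ids(["bad", "great", "flight"], {PAD: 0, OOV: 1})
print(ids, empty_ids)
```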

### Prepare Iterator

In [0]:
from allennlp.data.iterators import BucketIterator

In [0]:
iterator = BucketIterator(batch_size=config.batch_size, 
                          sorting_keys=[("tokens", "num_tokens")],
                         )

In [0]:
iterator.index_with(vocab)
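
Bucketing groups instances of similar length so that each batch needs little padding. The following is a minimal sketch of that idea in plain Python, assuming lists of token ids as input. `bucket_batches` is a hypothetical helper, and the real `BucketIterator` does more than this (shuffling, for instance).

```python
def bucket_batches(sequences, batch_size, pad_id=0):
    """Sort by length, slice into batches, and pad each batch to its own max length."""
    ordered = sorted(sequences, key=len)
    batches = []
    for i in range(0, len(ordered), batch_size):
        chunk = ordered[i:i + batch_size]
        width = max(len(seq) for seq in chunk)
        batches.append([seq + [pad_id] * (width - len(seq)) for seq in chunk])
    return batches

seqs = [[5, 6, 7, 8], [9], [3, 4], [1, 2, 3]]
batches = bucket_batches(seqs, batch_size=2)
print(batches)
```

Because short sequences are batched together, the first batch here is padded to width 2 rather than to the global maximum of 4.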

### Read Sample

In [0]:
batch = next(iter(iterator(train_ds)))

In [0]:
batch

{'id': [None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None],
 'label': tensor([[1., 0., 0.],
         [0., 0., 1.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [0., 1., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [0., 1., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [0., 0., 1.],
         [1.

In [0]:
batch["tokens"]["tokens"].shape

torch.Size([64, 28])

### Prepare Model

In [0]:
import torch
import torch.nn as nn
import torch.optim as optim

from allennlp.modules.seq2vec_encoders import Seq2VecEncoder, PytorchSeq2VecWrapper
from allennlp.nn.util import get_text_field_mask
from allennlp.models import Model
from allennlp.modules.text_field_embedders import TextFieldEmbedder

In [0]:
class BaselineModel(Model):
    def __init__(self, word_embeddings: TextFieldEmbedder,
                 encoder: Seq2VecEncoder,
                 out_sz: int=len(label_cols)):
        super().__init__(vocab)
        self.word_embeddings = word_embeddings
        self.encoder = encoder
        self.projection = nn.Linear(self.encoder.get_output_dim(), out_sz)
        self.loss = nn.BCEWithLogitsLoss()
        
    def forward(self, tokens: Dict[str, torch.Tensor],
                id: Any, label: torch.Tensor) -> Dict[str, torch.Tensor]:
        mask = get_text_field_mask(tokens)
        embeddings = self.word_embeddings(tokens)
        state = self.encoder(embeddings, mask)
        class_logits = self.projection(state)
        
        output = {"class_logits": class_logits}
        output["loss"] = self.loss(class_logits, label)

        return output
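
`get_text_field_mask` marks which positions in a padded batch hold real tokens, so that the encoder can ignore padding. For a 2-D batch of token ids where padding is 0, the idea reduces to an element-wise comparison. A plain-Python sketch (the function name `padding_mask` is hypothetical):

```python
def padding_mask(batch, pad_id=0):
    """1 where a real token is present, 0 where the position is padding."""
    return [[int(tok != pad_id) for tok in row] for row in batch]

batch_ids = [[4, 9, 2, 0, 0],
             [7, 3, 5, 8, 1]]
mask = padding_mask(batch_ids)
lengths = [sum(row) for row in mask]  # number of real tokens per sequence
print(mask, lengths)
```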

### Prepare Embeddings

In [0]:
from allennlp.modules.token_embedders import Embedding
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder

token_embedding = Embedding(num_embeddings=config.max_vocab_size + 2,
                            embedding_dim=300, padding_index=0)
# the embedder maps the input tokens to the appropriate embedding matrix
word_embeddings: TextFieldEmbedder = BasicTextFieldEmbedder({"tokens": token_embedding})

In [0]:
from allennlp.modules.seq2vec_encoders import PytorchSeq2VecWrapper
encoder: Seq2VecEncoder = PytorchSeq2VecWrapper(nn.LSTM(word_embeddings.get_output_dim(),
                                                        config.hidden_sz, bidirectional=True, batch_first=True))


In [0]:
model = BaselineModel(
    word_embeddings, 
    encoder, 
)

In [0]:
if USE_GPU:
    model.cuda()

### Train

In [0]:
optimizer = optim.Adam(model.parameters(), lr=config.lr)

In [0]:
from allennlp.training.trainer import Trainer

trainer = Trainer(
    model=model,
    optimizer=optimizer,
    iterator=iterator,
    train_dataset=train_ds,
    cuda_device=0 if USE_GPU else -1,
    num_epochs=config.epochs,
)

In [0]:
metrics = trainer.train()

loss: 0.5522 ||: 100%|██████████| 229/229 [00:04<00:00, 46.38it/s]
loss: 0.5034 ||: 100%|██████████| 229/229 [00:04<00:00, 49.46it/s]
loss: 0.5012 ||: 100%|██████████| 229/229 [00:04<00:00, 50.00it/s]
loss: 0.5008 ||: 100%|██████████| 229/229 [00:04<00:00, 50.04it/s]
loss: 0.5013 ||: 100%|██████████| 229/229 [00:04<00:00, 47.44it/s]
loss: 0.5006 ||: 100%|██████████| 229/229 [00:04<00:00, 57.80it/s]
loss: 0.5004 ||: 100%|██████████| 229/229 [00:04<00:00, 50.30it/s]
loss: 0.5008 ||: 100%|██████████| 229/229 [00:04<00:00, 50.60it/s]
loss: 0.4999 ||: 100%|██████████| 229/229 [00:04<00:00, 55.95it/s]
loss: 0.4996 ||: 100%|██████████| 229/229 [00:04<00:00, 50.54it/s]


### Predictions

In [0]:
from allennlp.predictors.sentence_tagger import SentenceTaggerPredictor

In [0]:
tagger = SentenceTaggerPredictor(model, reader)

In [0]:
tagger.predict("Bad Service, utter disaster!")

{'class_logits': [-0.9477578997612, -0.3868182301521301, -0.7099025249481201],
 'loss': 0.41527312994003296}
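
The `loss` value in this output does not measure prediction quality. No label is supplied at prediction time, so `text_to_instance` substitutes an all-zero label vector, and `BCEWithLogitsLoss` is evaluated against it. As a check, a plain-Python version of binary cross-entropy with logits (mean reduction, the PyTorch default) reproduces the number from the logits above. `bce_with_logits` is a hypothetical helper written for this illustration.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy between sigmoid(logits) and 0/1 targets."""
    total = 0.0
    for x, y in zip(logits, targets):
        # numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

# class_logits from the prediction above, scored against all-zero labels
logits = [-0.9477578997612, -0.3868182301521301, -0.7099025249481201]
loss = bce_with_logits(logits, [0.0, 0.0, 0.0])
print(loss)
```

The result agrees with the reported `loss` up to floating-point precision.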

## ELMo

In [0]:
from allennlp.data.tokenizers.word_splitter import SpacyWordSplitter
from allennlp.data.token_indexers.elmo_indexer import ELMoCharacterMapper, ELMoTokenCharactersIndexer

# the token indexer is responsible for mapping tokens to integers
token_indexer = ELMoTokenCharactersIndexer()

# build the spaCy splitter once rather than on every call
word_splitter = SpacyWordSplitter(language='en_core_web_sm', pos_tags=False)

def tokenizer(x: str):
    return [w.text for w in word_splitter.split_words(x)[:config.max_seq_len]]

In [0]:
reader = SentimentDatasetReader(
    tokenizer=tokenizer,
    token_indexers={"tokens": token_indexer}
)

In [0]:
train_ds = reader.read(DATA_ROOT / "train.csv")
val_ds = None

14640it [00:18, 785.09it/s]


In [0]:
len(train_ds)

14640

In [0]:
vars(train_ds[0].fields["tokens"])

{'_indexed_tokens': None,
 '_indexer_name_to_indexed_token': None,
 '_token_indexers': {'tokens': <allennlp.data.token_indexers.elmo_indexer.ELMoTokenCharactersIndexer at 0x7f42b6afe320>},
 'tokens': [@VirginAmerica, What, @dhepburn, said, .]}

In [0]:
vars(train_ds[0].fields["label"])

{'array': array([0, 1, 0], dtype=object), 'padding_value': 0}

### Prepare Vocabulary

In [0]:
vocab = Vocabulary()

### Prepare Iterator

In [0]:
from allennlp.data.iterators import BucketIterator

In [0]:
iterator = BucketIterator(batch_size=config.batch_size, 
                          sorting_keys=[("tokens", "num_tokens")],
                         )

In [0]:
iterator.index_with(vocab)

### Read Sample

In [0]:
batch = next(iter(iterator(train_ds)))

In [0]:
batch

{'id': [None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None],
 'label': tensor([[1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [0., 0., 1.],
         [0., 1., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [0.

In [0]:
batch["tokens"]["tokens"].shape

torch.Size([64, 27, 50])
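
The extra dimension of size 50 arises because ELMo consumes characters rather than word ids: each token is encoded as a fixed-length vector of character ids. The sketch below illustrates the idea in simplified form. It is not AllenNLP's `ELMoCharacterMapper`; the marker and padding ids (`BOW`, `EOW`, `PAD`) are placeholder values chosen for this example.

```python
MAX_WORD_LEN = 50              # matches the last dimension of the batch above
BOW, EOW, PAD = 258, 259, 260  # placeholder ids: word start, word end, padding

def word_to_char_ids(word: str) -> list:
    """Encode a word as UTF-8 bytes wrapped in start/end markers, padded to MAX_WORD_LEN."""
    body = list(word.encode("utf-8"))[:MAX_WORD_LEN - 2]
    ids = [BOW] + body + [EOW]
    return ids + [PAD] * (MAX_WORD_LEN - len(ids))

sentence = ["@VirginAmerica", "What", "@dhepburn", "said", "."]
encoded = [word_to_char_ids(w) for w in sentence]
print(len(encoded), len(encoded[0]))
```

A sentence of 5 tokens becomes a 5 x 50 grid of ids; stacking 64 padded sentences gives a batch with three dimensions, as in the shape printed above.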

### Prepare Model

In [0]:
import torch
import torch.nn as nn
import torch.optim as optim

from allennlp.modules.seq2vec_encoders import Seq2VecEncoder, PytorchSeq2VecWrapper
from allennlp.nn.util import get_text_field_mask
from allennlp.models import Model
from allennlp.modules.text_field_embedders import TextFieldEmbedder

In [0]:
class BaselineModel(Model):
    def __init__(self, word_embeddings: TextFieldEmbedder,
                 encoder: Seq2VecEncoder,
                 out_sz: int=len(label_cols)):
        super().__init__(vocab)
        self.word_embeddings = word_embeddings
        self.encoder = encoder
        self.projection = nn.Linear(self.encoder.get_output_dim(), out_sz)
        self.loss = nn.BCEWithLogitsLoss()
        
    def forward(self, tokens: Dict[str, torch.Tensor],
                id: Any, label: torch.Tensor) -> Dict[str, torch.Tensor]:
        mask = get_text_field_mask(tokens)
        embeddings = self.word_embeddings(tokens)
        state = self.encoder(embeddings, mask)
        class_logits = self.projection(state)
        
        output = {"class_logits": class_logits}
        output["loss"] = self.loss(class_logits, label)

        return output

### Prepare Embeddings

In [0]:
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
from allennlp.modules.token_embedders import ElmoTokenEmbedder

options_file = 'https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x1024_128_2048cnn_1xhighway/elmo_2x1024_128_2048cnn_1xhighway_options.json'
weight_file = 'https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x1024_128_2048cnn_1xhighway/elmo_2x1024_128_2048cnn_1xhighway_weights.hdf5'

elmo_embedder = ElmoTokenEmbedder(options_file, weight_file)
word_embeddings = BasicTextFieldEmbedder({"tokens": elmo_embedder})

100%|██████████| 336/336 [00:00<00:00, 245306.55B/s]
100%|██████████| 54402456/54402456 [00:02<00:00, 25637890.34B/s]


In [0]:
from allennlp.modules.seq2vec_encoders import PytorchSeq2VecWrapper
encoder: Seq2VecEncoder = PytorchSeq2VecWrapper(nn.LSTM(word_embeddings.get_output_dim(), config.hidden_sz, bidirectional=True, batch_first=True))


In [0]:
model = BaselineModel(
    word_embeddings, 
    encoder, 
)

In [0]:
if USE_GPU:
    model.cuda()

### Train

In [0]:
optimizer = optim.Adam(model.parameters(), lr=config.lr)

In [0]:
from allennlp.training.trainer import Trainer

trainer = Trainer(
    model=model,
    optimizer=optimizer,
    iterator=iterator,
    train_dataset=train_ds,
    cuda_device=0 if USE_GPU else -1,
    num_epochs=config.epochs,
)

In [0]:
metrics = trainer.train()

loss: 0.4940 ||: 100%|██████████| 229/229 [00:40<00:00,  5.54it/s]
loss: 0.3640 ||: 100%|██████████| 229/229 [00:40<00:00,  5.94it/s]
loss: 0.3417 ||: 100%|██████████| 229/229 [00:40<00:00,  5.63it/s]
loss: 0.3316 ||: 100%|██████████| 229/229 [00:40<00:00,  5.54it/s]
loss: 0.3232 ||: 100%|██████████| 229/229 [00:40<00:00,  5.67it/s]
loss: 0.3173 ||: 100%|██████████| 229/229 [00:40<00:00,  5.73it/s]
loss: 0.3101 ||: 100%|██████████| 229/229 [00:40<00:00,  6.72it/s]
loss: 0.3064 ||: 100%|██████████| 229/229 [00:40<00:00,  6.03it/s]
loss: 0.2999 ||: 100%|██████████| 229/229 [00:40<00:00,  5.61it/s]
loss: 0.2932 ||: 100%|██████████| 229/229 [00:40<00:00,  6.17it/s]


### Predictions

In [0]:
from allennlp.predictors.sentence_tagger import SentenceTaggerPredictor

In [0]:
tagger = SentenceTaggerPredictor(model, reader)

In [0]:
tagger.predict("Bad Service, utter disaster!")

{'class_logits': [1.424069881439209, -2.3454854488372803, -1.7706302404403687],
 'loss': 0.6294845342636108}

## BERT

In [0]:
from allennlp.data.token_indexers import PretrainedBertIndexer

token_indexer = PretrainedBertIndexer(
    pretrained_model="bert-base-uncased",
    max_pieces=config.max_seq_len,
    do_lowercase=True,
 )
# the indexer allows at most max_pieces wordpieces, so truncate here, leaving room for the [CLS] and [SEP] tokens
def tokenizer(s: str):
    return token_indexer.wordpiece_tokenizer(s)[:config.max_seq_len - 2]

100%|██████████| 231508/231508 [00:00<00:00, 5636031.33B/s]
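
WordPiece splits each whitespace-separated word into the longest vocabulary entries it can find, marking continuation pieces with `##`; a word that cannot be fully covered becomes `[UNK]`. The sketch below is a simplified greedy longest-match-first version with a toy vocabulary invented for this example (the real `bert-base-uncased` vocabulary has about 30,000 entries). It illustrates why applying the wordpiece step directly to a raw, un-lowercased tweet can turn capitalized words into `[UNK]` when the vocabulary is lowercase.

```python
def wordpiece(text, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of each whitespace-separated word."""
    pieces = []
    for word in text.split():
        start, word_pieces = 0, []
        while start < len(word):
            end, match = len(word), None
            while start < end:
                candidate = word[start:end]
                if start > 0:
                    candidate = "##" + candidate  # continuation piece
                if candidate in vocab:
                    match = candidate
                    break
                end -= 1
            if match is None:
                word_pieces = [unk]  # the whole word is replaced if any part fails
                break
            word_pieces.append(match)
            start = end
        pieces.extend(word_pieces)
    return pieces

# toy vocabulary, invented for this example
toy_vocab = {"@", "##dh", "##ep", "##burn", "said", "##.", "what"}
pieces = wordpiece("@VirginAmerica What @dhepburn said.", toy_vocab)
print(pieces)
```

Here `What` is not matched even though `what` is in the toy vocabulary, since no lowercasing happens before the lookup.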


In [0]:
reader = SentimentDatasetReader(
    tokenizer=tokenizer,
    token_indexers={"tokens": token_indexer}
)

In [0]:
train_ds = reader.read(DATA_ROOT / "train.csv")
val_ds = None

14640it [00:15, 937.78it/s] 


In [0]:
len(train_ds)

14640

In [0]:
vars(train_ds[0].fields["tokens"])

{'_indexed_tokens': None,
 '_indexer_name_to_indexed_token': None,
 '_token_indexers': {'tokens': <allennlp.data.token_indexers.wordpiece_indexer.PretrainedBertIndexer at 0x7f42b0c8e198>},
 'tokens': [[UNK], [UNK], @, ##dh, ##ep, ##burn, said, ##.]}

In [0]:
vars(train_ds[0].fields["label"])

{'array': array([0, 1, 0], dtype=object), 'padding_value': 0}

### Prepare Vocabulary

In [0]:
vocab = Vocabulary()

### Prepare Iterator

In [0]:
from allennlp.data.iterators import BucketIterator

In [0]:
iterator = BucketIterator(batch_size=config.batch_size, 
                          sorting_keys=[("tokens", "num_tokens")],
                         )

In [0]:
iterator.index_with(vocab)

### Read Sample

In [0]:
batch = next(iter(iterator(train_ds)))

In [0]:
batch

{'id': [None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None,
  None],
 'label': tensor([[1., 0., 0.],
         [0., 0., 1.],
         [1., 0., 0.],
         [0., 1., 0.],
         [1., 0., 0.],
         [0., 0., 1.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [0., 1., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [0., 0., 1.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.],
         [1.

In [0]:
batch["tokens"]["tokens"].shape

torch.Size([64, 28])

### Prepare Model

In [0]:
import torch
import torch.nn as nn
import torch.optim as optim

from allennlp.modules.seq2vec_encoders import Seq2VecEncoder, PytorchSeq2VecWrapper
from allennlp.nn.util import get_text_field_mask
from allennlp.models import Model
from allennlp.modules.text_field_embedders import TextFieldEmbedder

In [0]:
class BaselineModel(Model):
    def __init__(self, word_embeddings: TextFieldEmbedder,
                 encoder: Seq2VecEncoder,
                 out_sz: int=len(label_cols)):
        super().__init__(vocab)
        self.word_embeddings = word_embeddings
        self.encoder = encoder
        self.projection = nn.Linear(self.encoder.get_output_dim(), out_sz)
        self.loss = nn.BCEWithLogitsLoss()
        
    def forward(self, tokens: Dict[str, torch.Tensor],
                id: Any, label: torch.Tensor) -> Dict[str, torch.Tensor]:
        mask = get_text_field_mask(tokens)
        embeddings = self.word_embeddings(tokens)
        state = self.encoder(embeddings, mask)
        class_logits = self.projection(state)
        
        output = {"class_logits": class_logits}
        output["loss"] = self.loss(class_logits, label)

        return output

### Prepare Embeddings

In [0]:
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
from allennlp.modules.token_embedders.bert_token_embedder import PretrainedBertEmbedder

bert_embedder = PretrainedBertEmbedder(
        pretrained_model="bert-base-uncased",
        top_layer_only=True, # conserve memory
)
word_embeddings: TextFieldEmbedder = BasicTextFieldEmbedder({"tokens": bert_embedder},
                                                            # we'll be ignoring masks so we'll need to set this to True
                                                           allow_unmatched_keys = True)

100%|██████████| 407873900/407873900 [00:08<00:00, 47933335.81B/s]


In [0]:
BERT_DIM = word_embeddings.get_output_dim()

class BertSentencePooler(Seq2VecEncoder):
    def forward(self, embs: torch.Tensor, 
                mask: torch.Tensor=None) -> torch.Tensor:
        # extract first token tensor
        return embs[:, 0]
    
    @overrides
    def get_output_dim(self) -> int:
        return BERT_DIM
    
encoder = BertSentencePooler()

In [0]:
model = BaselineModel(
    word_embeddings, 
    encoder, 
)

In [0]:
if USE_GPU:
    model.cuda()

### Train

In [0]:
optimizer = optim.Adam(model.parameters(), lr=config.lr)

In [0]:
from allennlp.training.trainer import Trainer

trainer = Trainer(
    model=model,
    optimizer=optimizer,
    iterator=iterator,
    train_dataset=train_ds,
    cuda_device=0 if USE_GPU else -1,
    num_epochs=config.epochs,
)

In [0]:
metrics = trainer.train()

loss: 0.5332 ||: 100%|██████████| 229/229 [00:54<00:00,  5.84it/s]
loss: 0.4862 ||: 100%|██████████| 229/229 [00:54<00:00,  4.54it/s]
loss: 0.4623 ||: 100%|██████████| 229/229 [00:54<00:00,  4.65it/s]
loss: 0.4470 ||: 100%|██████████| 229/229 [00:54<00:00,  4.20it/s]
loss: 0.4341 ||: 100%|██████████| 229/229 [00:56<00:00,  3.62it/s]
loss: 0.4245 ||: 100%|██████████| 229/229 [00:54<00:00,  4.71it/s]
loss: 0.4178 ||: 100%|██████████| 229/229 [00:54<00:00,  4.30it/s]
loss: 0.4107 ||: 100%|██████████| 229/229 [00:54<00:00,  3.92it/s]
loss: 0.4076 ||: 100%|██████████| 229/229 [00:54<00:00,  4.22it/s]
loss: 0.4033 ||: 100%|██████████| 229/229 [00:54<00:00,  4.81it/s]


### Predictions

In [0]:
from allennlp.predictors.sentence_tagger import SentenceTaggerPredictor

In [0]:
tagger = SentenceTaggerPredictor(model, reader)

In [0]:
tagger.predict("Bad Service, utter disaster!")

{'class_logits': [0.7946805357933044,
  -2.3529469966888428,
  -1.6355410814285278],
 'loss': 0.4787622392177582}
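
To read these outputs, the logits can be passed through a sigmoid and the highest-scoring class taken as the prediction. The sketch below applies this to the three `class_logits` lists printed in this notebook for the sentence "Bad Service, utter disaster!", assuming the class order of `label_cols`. `predict_label` is a hypothetical helper.

```python
import math

LABELS = ["negative", "neutral", "positive"]  # assumed to match the order of label_cols

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def predict_label(logits):
    """Return the label with the highest sigmoid score, plus all scores."""
    scores = [sigmoid(x) for x in logits]
    best = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best], scores

# class_logits copied from the three prediction cells in this notebook
outputs = {
    "GloVe": [-0.9477578997612, -0.3868182301521301, -0.7099025249481201],
    "ELMo": [1.424069881439209, -2.3454854488372803, -1.7706302404403687],
    "BERT": [0.7946805357933044, -2.3529469966888428, -1.6355410814285278],
}
predictions = {name: predict_label(logits)[0] for name, logits in outputs.items()}
print(predictions)
```

With these logits, the ELMo and BERT models pick `negative`, while the top score of the model from the GloVe section falls on `neutral`.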