# TransformerXL

Attention, which weighs every word in a sequence in proportion to its relevance when interpreting a given word, is the main reason transformers succeed in natural language processing. However, this mechanism comes at a cost: it limits the length of the sequence the model can process. In NLP settings where you have to model long-range dependencies between words, this becomes a major obstacle. In this practice session, we will explore Transformer XL, a Transformer model that can capture long-range dependencies without disrupting temporal coherence.
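
Transformer XL handles long inputs by processing them in fixed-length segments and reusing cached hidden states from earlier segments as extra context. The following is a minimal NumPy sketch of that segment-level recurrence, not the library's implementation: it uses a single toy attention head, leaves out the relative positional encoding, and the names `attend_with_memory` and `run_segments` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_with_memory(segment, memory):
    """Toy single-head attention: queries come from the current segment,
    keys and values come from the cached memory plus the current segment."""
    context = segment if memory is None else np.concatenate([memory, segment], axis=0)
    seg_len = segment.shape[0]
    mem_len = context.shape[0] - seg_len
    scores = segment @ context.T / np.sqrt(segment.shape[-1])
    # Causal mask: position i sees all of the memory and segment positions <= i.
    causal = np.triu(np.ones((seg_len, seg_len), dtype=bool), k=1)
    mask = np.concatenate([np.zeros((seg_len, mem_len), dtype=bool), causal], axis=1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ context

def run_segments(hidden, seg_len, mem_len):
    """Process a long sequence segment by segment (assumes mem_len >= 1)."""
    memory, outputs = None, []
    for start in range(0, hidden.shape[0], seg_len):
        segment = hidden[start:start + seg_len]
        outputs.append(attend_with_memory(segment, memory))
        # Keep only the most recent mem_len states as memory for the next segment.
        history = segment if memory is None else np.concatenate([memory, segment], axis=0)
        memory = history[-mem_len:]
    return np.concatenate(outputs, axis=0)

rng = np.random.default_rng(0)
hidden = rng.normal(size=(10, 4))   # 10 made-up token vectors of width 4
out = run_segments(hidden, seg_len=4, mem_len=4)
print(out.shape)
```

Each segment only builds a score matrix of size `seg_len x (mem_len + seg_len)`, so the cost per step stays fixed while information can still flow across segment boundaries through the memory.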

To read more about it, please refer to [this](https://analyticsindiamag.com/what-is-transformer-xl/) article.

## Usage

Transformer XL is a huge model, so it needs a high-memory GPU setup to pre-train or fine-tune. Due to memory constraints, we will stick to running inference in this article.

Hugging Face provides this model through the `transformers` package. The library also provides a version of Transformer XL with a sequence classification head added on top.

In [None]:
!python -m pip install pip --upgrade --user -q --no-warn-script-location
!python -m pip install numpy pandas seaborn matplotlib scipy statsmodels scikit-learn nltk gensim tensorflow tensorflow-datasets keras torch transformers tqdm --user -q --no-warn-script-location

# Restart the kernel so the freshly installed packages are picked up.
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

There are two steps in the inference pipeline.

1. Tokenize the inputs and format them as the model requires. This is done by a custom tokenizer for this model.

2. Pass the tokenized inputs into the model and collect the outputs.

In [None]:
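from transformers import TransfoXLTokenizer, TransfoXLForSequenceClassification
import torch

# Load the pretrained tokenizer and the model with a sequence classification head.
# The head is not part of the 'transfo-xl-wt103' language-model checkpoint, so it starts from random weights.
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLForSequenceClassification.from_pretrained('transfo-xl-wt103')

# Step 1: tokenize the text into PyTorch tensors.
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1

# Step 2: run the forward pass and collect the loss and logits.
outputs = model(**inputs, labels=labels)
loss = outputs.loss
logits = outputs.logits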

In [None]:
import tensorflow_datasets as tfds
# Load IMDB reviews: 60% of the training split for training, the remaining 40% for validation.
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews",
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True)

In [None]:
# Materialize each split as one batch and decode the review bytes into Python strings.
df_train=list(train_data.batch(15000).as_numpy_iterator())
inputs=[x.decode('utf-8') for x in df_train[0][0]]  # review texts
targets=df_train[0][1]  # integer sentiment labels
valid_train=list(validation_data.batch(15000).as_numpy_iterator())
valid_inputs=[x.decode('utf-8') for x in valid_train[0][0]]
valid_targets=valid_train[0][1]

In [None]:
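from tqdm.notebook import tqdm  # tqdm_notebook is deprecated
# Loss on the first 100 training reviews, one review per forward pass.
with torch.no_grad():  # inference only, so skip gradient tracking
    losses=[model(**tokenizer(inputs[i],return_tensors='pt'),labels=torch.tensor(targets[i]).unsqueeze(0))['loss'].item() for i in tqdm(range(100))]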

In [None]:
import numpy as np
np.mean(losses)

In [None]:
with torch.no_grad():  # collect the raw logits for the same 100 reviews without tracking gradients
    logits=[model(**tokenizer(inputs[i],return_tensors='pt'),labels=torch.tensor(targets[i]).unsqueeze(0))['logits'].detach().numpy() for i in tqdm(range(100))]

In [None]:
# Softmax preserves ordering, so the predicted class is the argmax of the raw logits.
y_pred=[int(np.argmax(i[0])) for i in logits]
y_true=targets[:100]
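
Taking the argmax of softmax probabilities gives the same class as taking the argmax of the raw logits, because softmax preserves ordering. A minimal sketch with made-up logit values shows this; the helper name `stable_softmax` is hypothetical, and it subtracts the row maximum so that large logits do not overflow:

```python
import numpy as np

def stable_softmax(logits):
    # Shifting by the row maximum leaves the result unchanged but avoids overflow in exp.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Made-up logits for three examples and two classes.
example_logits = np.array([[2.0, -1.0], [0.5, 3.0], [1000.0, 1001.0]])
probs = stable_softmax(example_logits)

pred_from_probs = probs.argmax(axis=-1)
pred_from_logits = example_logits.argmax(axis=-1)
print(pred_from_probs, pred_from_logits)
```

The softmax step is only needed when the probabilities themselves are of interest, for example to set a confidence threshold.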

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_true,y_pred))
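
To make the report easier to read, here is a small sketch of how per-class precision and recall are computed, using NumPy only. The labels are made up for illustration and the helper `precision_recall` is hypothetical:

```python
import numpy as np

def precision_recall(y_true, y_pred, cls):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))  # predicted cls and correct
    fp = np.sum((y_pred == cls) & (y_true != cls))  # predicted cls but wrong
    fn = np.sum((y_pred != cls) & (y_true == cls))  # missed examples of cls
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels, not results from the model above.
toy_true = [0, 0, 1, 1, 1]
toy_pred = [0, 1, 1, 1, 0]

p0, r0 = precision_recall(toy_true, toy_pred, cls=0)
p1, r1 = precision_recall(toy_true, toy_pred, cls=1)
accuracy = float(np.mean(np.asarray(toy_true) == np.asarray(toy_pred)))
print(p0, r0, p1, r1, accuracy)
```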

We also tried running the model through TensorFlow, but the kernel crashed due to high RAM utilization. The cells below record that attempt.

In [None]:
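from tqdm.notebook import tqdm  # tqdm_notebook is deprecated
import numpy as np

# Padding needs a pad token, so reuse the end-of-sequence token for it.
tokenizer.pad_token = tokenizer.eos_token

def tokenize(sentences, tokenizer):
    """Encode each sentence as exactly 128 token ids, padding or truncating as needed."""
    input_ids = []
    for sentence in tqdm(sentences):
        # padding/truncation replace the deprecated pad_to_max_length flag
        encoded = tokenizer.encode_plus(sentence, max_length=128, padding='max_length', truncation=True)
        input_ids.append(encoded['input_ids'])
    return np.asarray(input_ids, dtype='int32')

# Use a new name so the list of training reviews in `inputs` is not overwritten.
valid_input_ids=tokenize(valid_inputs,tokenizer)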
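
The fixed length of 128 matters because Keras expects every row of the input array to have the same shape. As a rough sketch of what padding and truncating to a fixed length does, here is a plain NumPy version. It assumes right-side padding and a pad id of 0, which may differ from what the real tokenizer uses, and `pad_or_truncate` is a hypothetical helper:

```python
import numpy as np

def pad_or_truncate(token_ids, max_length, pad_id=0):
    # Cut sequences that are too long, then fill short ones with pad_id on the right.
    ids = list(token_ids)[:max_length]
    return ids + [pad_id] * (max_length - len(ids))

# Made-up token ids: one short sequence and one long sequence.
batch = [[5, 8, 2], [7, 1, 4, 9, 3, 6]]
fixed = np.asarray([pad_or_truncate(ids, max_length=4) for ids in batch], dtype='int32')
print(fixed)
```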

In [None]:
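import tensorflow as tf
from transformers import TFTransfoXLForSequenceClassification

# The PyTorch `model` above cannot be called on Keras tensors, so load the TensorFlow port instead.
# Note: this is the attempt that ran out of RAM, so treat the cell as an unverified sketch.
tf_model = TFTransfoXLForSequenceClassification.from_pretrained('transfo-xl-wt103')
input_ids = tf.keras.layers.Input(shape=(128,), name='input_token', dtype='int32')
# Labels are left out here; the Keras model outputs the classification logits only.
X = tf_model(input_ids=input_ids).logits
classification_model = tf.keras.Model(inputs=input_ids, outputs=X)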

In [None]:
classification_model.layers