# Examples of How to Use Transformers

## Standard Tasks

### Sentiment Analysis

In [1]:
import tensorflow

In [2]:
!pip install --upgrade pip
!pip install --upgrade setuptools
!pip install --user --upgrade tensorflow-gpu
!pip install transformers

In [3]:
!pip install --user ipywidgets jupyter
!pip install --upgrade jupyter_client

In [4]:
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
classifier('Batman Begins is a great movie! Truly a classic!')

[{'label': 'POSITIVE', 'score': 0.9998838305473328}]
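
The `score` in the output above is the probability the model assigns to the returned label. As a rough sketch of where such a number comes from, assume the classifier produces one raw score (logit) per class and that these are normalized with a softmax. The logit values below are made up for illustration; they are not the model's actual outputs.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ['NEGATIVE', 'POSITIVE']
hypothetical_logits = [-4.3, 4.7]  # made-up values, not real model output

probs = softmax(hypothetical_logits)
best = max(range(len(labels)), key=lambda i: probs[i])  # index of the most likely class
result = {'label': labels[best], 'score': probs[best]}
print(result)
```

A large gap between the two logits is what pushes the score close to 1.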

### Question Answering

In [5]:
from transformers import pipeline
question_answerer = pipeline('question-answering')
question_answerer({
  'question': 'What is the name of my dog?',
  'context': 'I have a dog named Sam. He likes to chase cats in the neighborhood.'})

{'score': 0.9907371997833252, 'start': 19, 'end': 22, 'answer': 'Sam'}
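
The `start` and `end` values in this output can be read as character offsets into the context string, with `end` exclusive. The short check below uses only the values shown above, so it runs without loading a model.

```python
context = 'I have a dog named Sam. He likes to chase cats in the neighborhood.'
# Copied from the output above
result = {'score': 0.9907371997833252, 'start': 19, 'end': 22, 'answer': 'Sam'}

# Slice the context with the reported offsets to recover the answer text
span = context[result['start']:result['end']]
print(span)
```

The offsets are handy when you want to highlight the answer inside the original text rather than just print it.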

### Translation

In [6]:
from transformers import pipeline
translator = pipeline('translation_en_to_fr')
translator("The quick brown fox jumped.")

Some weights of the model checkpoint at t5-base were not used when initializing T5Model: ['decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight']
- This IS expected if you are initializing T5Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

[{'translation_text': 'Le renard brun rapide saute.'}]

### Text Summarization

In [None]:
from transformers import pipeline
summarizer = pipeline('summarization', model="t5-base", tokenizer="t5-base", framework="tf")
with open('./data/never_give_in.txt') as f:  # close the file once it has been read
    speech = f.read()
summarizer(speech, min_length=50, max_length=100)

Some layers from the model checkpoint at t5-base were not used when initializing TFT5ForConditionalGeneration: ['decoder/block_._0/layer_._1/EncDecAttention/relative_attention_bias/embeddings:0']
- This IS expected if you are initializing TFT5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFT5ForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at t5-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


## Fine-Tuning Transformers

### Fake News Classification

In [8]:
from transformers import TFBertForSequenceClassification, BertTokenizerFast
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint  # explicit import instead of a wildcard
import pandas as pd
from sklearn.model_selection import train_test_split

# BERT with a newly initialized 2-class classification head
transformer_model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
# Padding and truncation are set when the tokenizer is called, not when it is loaded
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

data = pd.read_csv('./data/combined_news_data_processed.csv')
data.dropna(inplace=True)

X = data['text']
y = data['label']
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=42)

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertForSequenceClassification: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['dropout_37', 'classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [9]:
# Tokenize to fixed-length inputs: every text is truncated or padded to 256 tokens
X_train_encoded = dict(tokenizer(list(X_train.values),
                                add_special_tokens=True,   # add [CLS], [SEP]
                                max_length=256,            # cap on tokens per text (BERT accepts up to 512)
                                truncation=True,           # cut longer texts down to max_length
                                padding='max_length',      # add [PAD] tokens up to max_length
                                return_attention_mask=True))

X_valid_encoded = dict(tokenizer(list(X_valid.values),
                                add_special_tokens=True,   # add [CLS], [SEP]
                                max_length=256,            # cap on tokens per text (BERT accepts up to 512)
                                truncation=True,           # cut longer texts down to max_length
                                padding='max_length',      # add [PAD] tokens up to max_length
                                return_attention_mask=True))
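
To make the effect of `max_length`, padding, and the attention mask concrete, here is a plain-Python sketch of what happens to a single sequence of token ids. The helper `pad_or_truncate` and the token ids are hypothetical and for illustration only. A real tokenizer also keeps the closing [SEP] token when it truncates, which this sketch does not attempt.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = token_ids[:max_length]                   # truncate long sequences
    n_pad = max_length - len(ids)                  # how many pad tokens are needed
    attention_mask = [1] * len(ids) + [0] * n_pad  # 1 = real token, 0 = padding
    input_ids = ids + [pad_id] * n_pad             # pad short sequences
    return input_ids, attention_mask

short = [101, 7592, 2088, 102]  # illustrative ids for a short text
ids, mask = pad_or_truncate(short, max_length=6)
print(ids)   # [101, 7592, 2088, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]

long_ids, long_mask = pad_or_truncate(list(range(1, 11)), max_length=6)
print(long_ids)   # [1, 2, 3, 4, 5, 6]
print(long_mask)  # [1, 1, 1, 1, 1, 1]
```

Because every encoded text ends up the same length, the encodings can be stacked into rectangular tensors, and the mask tells the model which positions to ignore.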


In [10]:
train_data = tf.data.Dataset.from_tensor_slices((X_train_encoded, list(y_train.values)))
valid_data = tf.data.Dataset.from_tensor_slices((X_valid_encoded, list(y_valid.values)))

In [13]:
# Save the model only when the validation loss improves
callbacks = [ModelCheckpoint('./bert_fake_news_model', save_best_only=True, monitor='val_loss')]
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
transformer_model.compile(optimizer=optimizer,
                          loss=transformer_model.compute_loss,  # can also use any keras loss fn
                          metrics=['accuracy'])
# The datasets are already batched, so batch_size is not passed to fit()
transformer_model.fit(train_data.shuffle(1000).batch(16),
                      epochs=1,
                      validation_data=valid_data.batch(16),
                      callbacks=callbacks)



<tensorflow.python.keras.callbacks.History at 0x7f12f39dc080>
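
`batch(16)` groups consecutive examples into batches of 16 and, by default, keeps a smaller final batch when the dataset size is not a multiple of 16, so one epoch runs ceil(n / 16) steps. The following plain-Python sketch mimics that grouping. The `batches` helper is hypothetical (it is not part of `tf.data`), and the dataset size of 100 is made up.

```python
import math

def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

examples = list(range(100))  # stand-in for 100 training examples
sizes = [len(b) for b in batches(examples, 16)]
print(sizes)  # [16, 16, 16, 16, 16, 16, 4]

steps_per_epoch = math.ceil(len(examples) / 16)
print(steps_per_epoch)  # 7
```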