<a href="https://colab.research.google.com/github/benjamininden/AI-teaching-python/blob/main/T5MultiTaskTransformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tackling multiple natural language processing tasks with a single multi-task trained transformer network

Transformer networks can be used on a variety of natural language processing tasks. They are typically pre-trained on a large corpus of data that requires no human annotation, and then fine-tuned on more specific tasks. For fine-tuning, task-specific dense layers ("heads") are put on top of the core architecture and trained for that purpose.

However, it is also possible to train on a mixture of the usual pre-training tasks and some more specific (benchmark) tasks. That way, while performance on the targeted tasks might be slightly lower, it is not necessary to use separate heads for each task. This can be done by prefixing samples from a specific task with a corresponding textual command. For example, to translate sentence *X* from English to German, you input "translate English to German: *X*".
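
As a minimal sketch of this prefixing idea, the two helpers below build such input strings in plain Python. The names `with_prefix` and `with_fields` are made up for illustration and are not part of any library. The second form, with named fields, mirrors how the benchmark tasks in this notebook are phrased.

```python
def with_prefix(prefix: str, text: str) -> str:
    """Prepend a textual command, e.g. 'translate English to German'."""
    return f"{prefix}: {text}"


def with_fields(task: str, **fields: str) -> str:
    """Build inputs such as 'rte sentence1: ... sentence2: ...'.

    Keyword order is preserved, so fields appear in the order given.
    """
    body = " ".join(f"{name}: {value}" for name, value in fields.items())
    return f"{task} {body}"


print(with_prefix("translate English to German", "The house is small."))
print(with_fields("cola", sentence="I gave him the book yesterday"))
```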

Google's T5 (text-to-text transfer transformer) method uses that approach. It uses an encoder-decoder transformer network, and is described in detail in [this paper](https://arxiv.org/pdf/1910.10683.pdf). The Huggingface transformers Python package provides an [easy way](https://huggingface.co/transformers/model_doc/t5.html) to use it. There are actually [five different versions](https://huggingface.co/transformers/pretrained_models.html) of the model: "small" with 60M parameters, "base" with 220M parameters, "large" with 770M parameters, "3B" with 2.8B parameters, and "11B" with 11B parameters. Below, we load the trained "large" and "small" models, each with a suitable tokenizer.
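
To get a feeling for what these parameter counts mean in practice, the sketch below estimates the size of the weights for each version. It assumes that every parameter is stored as a 32-bit float (4 bytes), which is an assumption made here and not a statement from the paper or the library; actual checkpoint files may differ somewhat.

```python
# Parameter counts as quoted in the text above.
T5_PARAMETERS = {
    "small": 60e6,
    "base": 220e6,
    "large": 770e6,
    "3B": 2.8e9,
    "11B": 11e9,
}

BYTES_PER_PARAMETER = 4  # assumption: weights stored as 32-bit floats


def approx_size_gb(variant: str) -> float:
    """Rough size of the weights in gigabytes (10**9 bytes)."""
    return T5_PARAMETERS[variant] * BYTES_PER_PARAMETER / 1e9


for name in T5_PARAMETERS:
    print(f"{name:>5}: ~{approx_size_gb(name):.2f} GB")
```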

In [2]:
! pip install -q transformers
from timeit import default_timer
from transformers import TFAutoModelWithLMHead, AutoTokenizer

model = TFAutoModelWithLMHead.from_pretrained("t5-large")
tokenizer = AutoTokenizer.from_pretrained("t5-large")

smallmodel = TFAutoModelWithLMHead.from_pretrained("t5-small")
smallmtokenizer = AutoTokenizer.from_pretrained("t5-small")




All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at t5-large.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at t5-small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


# Linguistic acceptability



The specific tasks that T5 was trained on include benchmark tasks from [GLUE](https://gluebenchmark.com/) and [SuperGLUE](https://super.gluebenchmark.com/) as well as the [SQuAD benchmark](https://arxiv.org/abs/1606.05250). Let us try to use T5 on some of these tasks. We start with the CoLA (Corpus of Linguistic Acceptability) task, where it has to be decided whether an English sentence is linguistically acceptable.

In [4]:
inputs = tokenizer.encode("cola sentence: I gave him the book yesterday", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))

tstart = default_timer()
inputs = tokenizer.encode("cola sentence: I gave he the book yesterday", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))
print("time elapsed (large model):", default_timer() - tstart)

tstart = default_timer()
inputs = smallmtokenizer.encode("cola sentence: I gave he the book yesterday", return_tensors="tf")
outputs = smallmodel.generate(inputs, max_length=10, num_beams=1)
print(smallmtokenizer.decode(outputs[0]))
print("time elapsed (small model):", default_timer() - tstart)

<pad> acceptable</s>
<pad> unacceptable</s>
time elapsed (large model): 1.853573492999999
<pad> acceptable</s>
time elapsed (small model): 0.35252801699999736


In these examples, the large T5 model classifies both sentences correctly. Unsurprisingly, the small model runs much faster than the large one, and its download is far smaller. However, it produces the wrong answer for the ungrammatical sentence. We will therefore continue to work with the large model. The T5 paper has a detailed comparison of the task performance of the different model sizes.
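
A single timing like the one above can be noisy, and the first call to a model may include one-off setup costs (this is an assumption about the runtime, not something measured here). The sketch below shows one way to make such comparisons a bit more robust: run an untimed warm-up call, repeat the measurement, and keep the best time. The helper name `best_time` is hypothetical.

```python
from timeit import default_timer


def best_time(fn, repeats: int = 3, warmup: int = 1) -> float:
    """Return the shortest wall-clock time of `repeats` calls to `fn`."""
    for _ in range(warmup):
        fn()  # untimed call, absorbs one-off setup costs
    timings = []
    for _ in range(repeats):
        start = default_timer()
        fn()
        timings.append(default_timer() - start)
    return min(timings)


# Stand-in workload; in the notebook this would be a call such as
# lambda: model.generate(inputs, max_length=10, num_beams=1)
elapsed = best_time(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.6f} s")
```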

For the parameters of the method `generate()` (class `Model`), see [here](https://huggingface.co/transformers/main_classes/model.html).
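
Every task in this notebook follows the same three steps: encode the input, call `generate()`, and decode the result. As a sketch, these steps can be wrapped in one function. The name `run_task` is hypothetical; the calls inside are the ones used in the cells of this notebook, plus the `skip_special_tokens=True` argument of `decode()`, which removes markers such as `<pad>` and `</s>` from the returned text.

```python
def run_task(model, tokenizer, prompt: str,
             max_length: int = 10, num_beams: int = 1) -> str:
    """Encode `prompt`, generate an answer, and decode it to plain text."""
    inputs = tokenizer.encode(prompt, return_tensors="tf")
    outputs = model.generate(inputs, max_length=max_length, num_beams=num_beams)
    # skip_special_tokens drops markers such as <pad> and </s> from the text
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

With the objects loaded earlier, a call would look like `run_task(model, tokenizer, "cola sentence: I gave him the book yesterday")`. Longer outputs such as translations need a larger `max_length`.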

# Natural language inference

Next, we deal with tasks that require some reasoning on natural language. The Recognizing Textual Entailment (RTE) task requires a judgement on whether sentence 2 logically follows from sentence 1. The Multi-Genre Natural Language Inference (MNLI) task is similar but distinguishes three classes of judgements: "entailment", "neutral", and "contradiction".
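
Because T5 answers with free text, nothing forces the output to be one of the expected class names. The sketch below checks a decoded output against the label set of its task. The helper `parse_label` and the dictionary `NLI_LABELS` are hypothetical names; the label strings for RTE are the ones produced in this notebook, and "neutral" for MNLI comes from the task definition.

```python
# Label sets per task. 'neutral' for MNLI is taken from the task definition,
# the other labels appear in the outputs of this notebook.
NLI_LABELS = {
    "rte": {"entailment", "not_entailment"},
    "mnli": {"entailment", "neutral", "contradiction"},
}


def parse_label(task: str, decoded: str) -> str:
    """Strip special tokens from `decoded` and check it is a valid label."""
    text = decoded.replace("<pad>", "").replace("</s>", "").strip()
    if text not in NLI_LABELS[task]:
        raise ValueError(f"unexpected output for {task!r}: {text!r}")
    return text


print(parse_label("rte", "<pad> not_entailment</s>"))  # not_entailment
```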

In [5]:
inputs = tokenizer.encode("rte sentence1: Stenocarpus sinuatus is an Australian rainforest tree \
whose flowers are bright red in umbels, in a circular formation, hence the name Firewheel Tree. \
sentence2: There are red flowers in Australia.", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))

inputs = tokenizer.encode("rte sentence1: Stenocarpus sinuatus is an Australian rainforest tree \
whose flowers are bright red in umbels, in a circular formation, hence the name Firewheel Tree. \
sentence2: Australian rainforests are always red.", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))

inputs = tokenizer.encode("mnli hypothesis: There are red flowers in Australia. premise: \
Stenocarpus sinuatus is an Australian rainforest tree whose flowers are bright red in umbels, \
in a circular formation, hence the name Firewheel Tree.", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))

inputs = tokenizer.encode("mnli hypothesis: Firewheel trees have blue flowers. premise: \
Stenocarpus sinuatus is an Australian rainforest tree whose flowers are bright red in umbels, \
in a circular formation, hence the name Firewheel Tree.", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))

<pad> entailment</s>
<pad> not_entailment</s>
<pad> entailment</s>
<pad> contradiction</s>


# Semantic equivalence

The Microsoft Research Paraphrase Corpus (MRPC) consists of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent (have the same meaning).

In [7]:
inputs = tokenizer.encode("mrpc sentence1: Generally speaking, the diagnostic feature \
of Proteaceae is the compound flower head. sentence2: Proteaceae commonly have a \
characteristic compound flower head.", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))

inputs = tokenizer.encode("mrpc sentence1: Generally speaking, the diagnostic feature \
of Proteaceae is the compound flower head. sentence2: Proteaceae rarely have a \
characteristic compound flower head.", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))

<pad> equivalent</s>
<pad> not_equivalent</s>


# Sentiment analysis

Sentiment analysis often deals with whether a text expresses a positive or negative opinion about something. This can be used to find out what customers think about certain products or services. The Stanford Sentiment Treebank (SST-2) corpus is a dataset of human-annotated movie reviews.

In [8]:
inputs = tokenizer.encode("sst2 sentence: If you like shallow dialogues and cheap special effects, \
then you must watch this movie.", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))

inputs = tokenizer.encode("sst2 sentence: If you like intelligent dialogues and state-of-the-art \
special effects, then you must watch this movie.", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))

<pad> negative</s>
<pad> positive</s>


# Commonsense causal reasoning

The Choice of Plausible Alternatives (COPA) task evaluates how well the model can, when provided with two candidate sentences, find which one of them describes a cause or effect of a given statement. Both the "cause" and the "effect" case are shown below.
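
The COPA input has four named parts, and the question field only makes sense with one of two values. A small builder (hypothetical name `copa_input`) can assemble the string in the field order used in this notebook and reject any other question type:

```python
def copa_input(premise: str, choice1: str, choice2: str, question: str) -> str:
    """Build a COPA input; `question` must be 'cause' or 'effect'."""
    if question not in ("cause", "effect"):
        raise ValueError("question must be 'cause' or 'effect'")
    return (
        f"copa choice1: {choice1} choice2: {choice2} "
        f"premise: {premise} question: {question}"
    )


print(copa_input(
    premise="The river flooded its banks.",
    choice1="The river could no longer be crossed.",
    choice2="The river became a habitat for ducks.",
    question="effect",
))
```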

In [9]:
inputs = tokenizer.encode("copa choice1: The river could no longer be crossed. choice2: \
The river became a habitat for ducks. premise: The river flooded its banks. question: effect", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))

inputs = tokenizer.encode("copa choice1: The tree had started to flower prolifically. choice2: \
The tree had grown very tall. premise: The tree provided a lot of shade. question: cause", return_tensors="tf")
outputs = model.generate(inputs, max_length=10, num_beams=1)
print(tokenizer.decode(outputs[0]))

<pad> choice1</s>
<pad> choice2</s>


# Question answering

This is the Stanford Question Answering Dataset (SQuAD) task. You provide a longer text and then ask a question whose answer is contained in the text. Ideally, T5 extracts the correct answer from the text. The text in the example below, like some of the earlier botanical examples, is derived from Wikipedia.
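
Context passages are often pasted from sources with line breaks and uneven spacing. The sketch below (hypothetical helper `squad_input`) builds the "question: ... context: ..." input and collapses all runs of whitespace to single spaces. It assumes that such extra whitespace carries no meaning for the task.

```python
def squad_input(question: str, context: str) -> str:
    """Build 'question: ... context: ...' with whitespace collapsed."""

    def clean(text: str) -> str:
        # str.split() without arguments splits on any whitespace run
        return " ".join(text.split())

    return f"question: {clean(question)} context: {clean(context)}"


print(squad_input(
    "What do the flowers look like?",
    """The flowers are white to pale cream coloured,
    about 5–6 cm diameter.""",
))
```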

In [10]:
inputs = tokenizer.encode("""
question: What do the flowers of Passiflora foetida look like? context: Passiflora foetida \
is a species of passion flower that is native to the southwestern United States \
(southern Texas and Arizona), Mexico, the Caribbean, Central America, and much of South America. \
It has been introduced to tropical regions around the world, such as Southeast Asia, South Asia, \
Hawaii, Africa, and The Maldives. It is a creeping vine like other members of the genus, \
and yields an edible fruit. The specific epithet, foetida, means "stinking" in Latin and \
refers to the strong aroma emitted by damaged foliage. This passion flower tolerates arid ground, \
but favours moist areas. It is known to be an invasive species in some areas. This plant is also \
a widely grown perennial climber, and has been used in traditional medicine. The stems are thin \
and wiry, and are covered with minute sticky yellow hairs. Older stems become woody. The leaves \
are three- to five-lobed and viscid-hairy. When crushed, these leaves give off a pungent odor \
that some people consider unpleasant. The flowers are white to pale cream coloured, about 5–6 cm \
diameter. The fruit is globose, 2–3 cm diameter, yellowish-orange to red when ripe, and has \
numerous black seeds embedded in the pulp; the fruit are eaten and the seeds dispersed by birds. \
Passiflora foetida is able to trap insects on its bracts, which exude a sticky substance that \
also contains digestive enzymes. This minimizes predation on young flowers and fruits. \
Whether or not it gains nourishment from its prey is uncertain, and it is considered a protocarnivorous plant.
""", return_tensors="tf")
outputs = model.generate(inputs, max_length=50, num_beams=4)
print(tokenizer.decode(outputs[0]))

<pad> white to pale cream coloured


# Machine translation

Finally, we come to the task of translating into other languages. T5 has only been trained on translation from English to German, French, and Romanian. The Hugging Face transformers package has [other pre-trained models](https://huggingface.co/transformers/pretrained_models.html) specifically for machine translation, including some that cover 100 different languages.
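
Since only three target languages are covered, it can help to reject other requests before they reach the model. The sketch below (hypothetical names `SUPPORTED_TARGETS` and `translation_input`) builds the prefix in the form used in this notebook and raises an error for any other target language.

```python
SUPPORTED_TARGETS = ("German", "French", "Romanian")  # as stated in the text


def translation_input(text: str, target: str) -> str:
    """Build 'translate English to <target>: <text>' for a supported target."""
    if target not in SUPPORTED_TARGETS:
        raise ValueError(
            f"unsupported target {target!r}; choose one of {SUPPORTED_TARGETS}"
        )
    return f"translate English to {target}: {text}"


print(translation_input("The house is small.", "French"))
```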

In [11]:
inputs = tokenizer.encode("translate English to German: Having already transformed \
many industries, Artificial Intelligence remains a key technology to watch for the \
foreseeable future", return_tensors="tf")
outputs = model.generate(inputs, max_length=50, num_beams=4)
print(tokenizer.decode(outputs[0]))

inputs = tokenizer.encode("translate English to French: Having already transformed \
many industries, Artificial Intelligence remains a key technology to watch for the \
foreseeable future", return_tensors="tf")
outputs = model.generate(inputs, max_length=50, num_beams=4)
print(tokenizer.decode(outputs[0]))

<pad> Die künstliche Intelligenz, die bereits viele Industrien verändert hat, bleibt eine Schlüsseltechnologie, die in absehbarer Zukunft zu beobachten ist.
<pad> Ayant déjà transformé de nombreuses industries, l’intelligence artificielle demeure une technologie clé à surveiller dans un avenir prévisible
