# 4.2 How to use GPT

GPT (Generative Pretrained Transformer) is a model trained to generate text given a preceding input, the so-called prompt (Brown et al., 2020). It does this repeatedly, one token at a time, up to a certain length, and can in this way generate short stories.
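
The generation loop can be illustrated without a neural network. The sketch below is a toy and not GPT: the invented `NEXT_WORD` table stands in for the model and always proposes a single next word, whereas GPT predicts a probability distribution over tokens with a Transformer. What carries over is the loop itself: predict one token from the text so far, append it, and repeat up to a maximum length.

```python
# Toy stand-in for a language model: maps the last word to one likely next word.
# The table is invented for illustration only.
NEXT_WORD = {
    "once": "upon",
    "upon": "a",
    "a": "time",
    "time": "there",
    "there": "was",
    "was": "a",
}


def generate(prompt, max_new_tokens=5):
    """Greedily extend the prompt one word at a time."""
    tokens = prompt.lower().split()
    if not tokens:  # nothing to condition on
        return ""
    for _ in range(max_new_tokens):
        next_word = NEXT_WORD.get(tokens[-1])
        if next_word is None:  # the toy model has no continuation: stop early
            break
        tokens.append(next_word)  # the new word becomes part of the next input
    return " ".join(tokens)


print(generate("Once upon", max_new_tokens=4))
```

The `max_new_tokens` argument plays the role of the length limit mentioned above: without it, the loop in this toy would cycle forever.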

Another generative model is T5 (Text to Text Transfer Transformer, Raffel et al., 2019). T5 casts many tasks as text generation, ranging from translation, sentiment annotation, question answering and similarity scoring to summarisation. Tasks are distinguished through prompt prefixes.
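
A prompt prefix is simply text placed in front of the input. The sketch below builds such inputs; the prefixes follow examples given in the T5 paper, while the helper `to_t5_input` and the task names used as dictionary keys are our own, hypothetical choices. No model is loaded here, the code only shows what the model would receive.

```python
# Task prefixes following examples in Raffel et al. (2019).
# The dictionary keys and the helper function are hypothetical.
TASK_PREFIXES = {
    "translation": "translate English to German: ",
    "summarisation": "summarize: ",
    "sentiment": "sst2 sentence: ",
}


def to_t5_input(task, text):
    """Prepend the task prefix so that one model can tell the tasks apart."""
    return TASK_PREFIXES[task] + text


print(to_t5_input("translation", "That is good."))
print(to_t5_input("sentiment", "This film was a pleasant surprise."))
```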

<img src="T5.gif">

Models such as GPT3, GPT4 (OpenAI, 2023) and T5 perform well but are far too large to work with locally. Therefore, in this notebook we look at an older model, GPT2, which is smaller and publicly available. It is nevertheless a generative model designed along the same lines as the others.

### References

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.

OpenAI, 2023. GPT-4 Technical Report. arXiv:2303.08774

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683, 2019.

Sanh, Victor, Lysandre Debut, Julien Chaumond, and Thomas Wolf. "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter." arXiv preprint arXiv:1910.01108 (2019).

## Generative Models for English

We can load GPT2 from the Huggingface platform as we did before for BERT and XLM-RoBERTa as part of a pipeline. We now specify the task as **text-generation**. As the model is big, it may take a while to load it.

In [1]:
from transformers import pipeline

In [2]:
gpt2pipe = pipeline("text-generation", model="gpt2")

Once you have successfully downloaded the model, it is saved in a cache on disk for further use. The next time you load the model, it is read from disk, which is faster.

You can now pass in any text as a prompt to this pipeline instance and it will complete the text according to the model. We create a list of prompts that are very similar except for the entity as the subject. In this way, we can test if the model also generates different texts relevant for the different entities.

In [3]:
prompts = ['Boris Johnson is called to justice for',
           'Donald Trump is called to justice for', 
           'Angela Merkel is called to justice for']

In [4]:
for prompt in prompts:
    print(gpt2pipe(prompt))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Boris Johnson is called to justice for the man who killed three pedestrians on London Bridge.\n\nThe Independent has launched its #FinalSay campaign to demand that voters are given a voice on the final Brexit deal.\n\n\nSign our petition here'}]


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Donald Trump is called to justice for tweeting this video, but the New York Times reports that it never was.\n\n\nHere's some information about this:\n\nOn March 8 at 10:25 A.M., three New Yorkers were fatally shot"}]
[{'generated_text': 'Angela Merkel is called to justice for what she did. It\'s hard to believe that the chancellor has done it herself and does it because it fits in with political correctness."\n\n\nPolls suggest a second round of EU divorce talks are drawing to'}]


We can clearly see that the stories are different for each entity and that they contain specific details that seem relevant for each. Whether these stories are correct and factual is a different matter. Generative models do not look up facts in an index; they produce text, including apparent facts, from word probabilities.
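
To make the last point concrete, the sketch below samples a continuation from a small probability table. The candidate words and their scores are invented for illustration and do not come from GPT2. The procedure has no notion of which continuation is true: it only has scores, and sampling from them can give a different completion on every run.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(candidates, scores, temperature=1.0, rng=random):
    """Draw one candidate word according to its softmax probability."""
    probs = softmax(scores, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Invented scores for the word following "... is called to justice for"
candidates = ["fraud", "lying", "tweeting", "nothing"]
scores = [2.0, 1.5, 1.0, 0.5]

rng = random.Random(0)  # fixed seed so that a run can be repeated
print([sample_next_word(candidates, scores, rng=rng) for _ in range(5)])
```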

If you do not have a powerful computer, GPT2 may be too big or too slow to use. Researchers have found a way to compress large models into smaller, more efficient models with almost equal performance. Knowledge distillation is a compression technique in which a smaller model is trained to reproduce the behaviour of a larger model or an ensemble of models. The distilled model is trained with a distillation loss over the soft target probabilities of the original model (Sanh et al., 2019).
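
The soft-target part of such a loss can be sketched in a few lines. This is a minimal illustration and not the training code of Sanh et al. (2019), whose full objective combines several loss terms; the logits and the temperature of 2.0 are invented. The loss is the cross-entropy between the softened probabilities of the large model (the teacher) and those of the small model (the student), so it is lower when the student's distribution is closer to the teacher's.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures give softer targets."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's soft targets."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


teacher = [4.0, 1.0, 0.5]       # invented teacher logits over three tokens
good_student = [3.5, 1.2, 0.4]  # similar preferences to the teacher
poor_student = [0.5, 1.0, 4.0]  # prefers the wrong token

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, poor_student))
```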

There is also a distilled version of GPT2 called *distilgpt2*, which is smaller (about 82 million parameters against 124 million for the smallest GPT2, roughly two thirds) and faster, while it is claimed to have almost equal performance.

In [5]:
distilgpt2pipe = pipeline("text-generation", model="distilgpt2")

In [6]:
for prompt in prompts:
    print(distilgpt2pipe(prompt))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Boris Johnson is called to justice for the public's safety, but Mr Johnson insists that Mr Johnson should remain the only candidate to have been selected by the British Labour Party.\n\n\n\n\n\nHe says Boris Johnson should remain with the Conservatives"}]


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Donald Trump is called to justice for his campaign's treatment of African Americans, according to a survey by the Democracy & Media Institute, an academic group for the conservative-leaning left-leaning nonprofit Public Policy Polling that tracks electoral outcomes across three presidential campaign"}]
[{'generated_text': 'Angela Merkel is called to justice for her role in the crisis.'}]


The output for our prompts is quite different. DistilGPT2 has substantially fewer parameters than GPT2 and apparently represents less knowledge about the target entities. The stories tend to be shorter and contain fewer entity-specific details.
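
Such an impression is easier to check with numbers than by eye. The helper below is a sketch under assumptions: `continuation_lengths` is a hypothetical name, and `generator` is any callable that returns a list of dicts with a `generated_text` key, which is the output format printed above. It counts how many words are added to each prompt. A stand-in generator is used so that the code runs without downloading a model.

```python
def continuation_lengths(generator, prompts):
    """Return, per prompt, the number of words generated after the prompt."""
    lengths = {}
    for prompt in prompts:
        text = generator(prompt)[0]["generated_text"]
        continuation = text[len(prompt):]  # the output text starts with the prompt
        lengths[prompt] = len(continuation.split())
    return lengths


# Stand-in for a text-generation pipeline (hypothetical, for illustration only).
def fake_generator(prompt):
    return [{"generated_text": prompt + " her role in the crisis."}]


print(continuation_lengths(fake_generator, ["Angela Merkel is called to justice for"]))
```

With the pipelines loaded in this notebook, the calls would be `continuation_lengths(gpt2pipe, prompts)` and `continuation_lengths(distilgpt2pipe, prompts)`. Because generation involves sampling, a single run per prompt is only anecdotal; averaging over several runs would give a fairer comparison.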

## GPT2 for other languages than English

Building a GPT model from scratch is costly. You need not only a lot of data but also a lot of computing power to create such a model. An interesting alternative is to train only the vocabulary part of a model (the lexical embeddings) for a new language and to keep the hidden layers of the English model, which capture the contextual attention relations and the capability to predict the next token. The intuition is that once the words of a sentence in the new language have reasonable embedding representations, the relations learned from the English data will also hold across these embeddings through the attention mechanism.

Such an approach was followed by *de Vries and Nissim (2021)* from the University of Groningen for Dutch and Italian. You can read the paper for more details. In a nutshell, the hidden layers of the English model are frozen and only the lexical embeddings are trained on Dutch and Italian data, using the same autoregressive self-attention approach as for English. The lexical embeddings are used both to encode the input prompt and to predict the next word to be generated.
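
The idea of training only the lexical embeddings can be shown with a deliberately tiny example. This is not the setup of de Vries and Nissim: the "model" below is a single frozen weight multiplied by a one-dimensional embedding per word, and the words, targets and learning rate are invented. It only illustrates that gradient updates are applied to the embeddings while the rest of the model stays fixed.

```python
def train_embeddings_only(embeddings, hidden_weight, data, lr=0.1, epochs=200):
    """Fit one scalar embedding per word while hidden_weight stays frozen."""
    embeddings = dict(embeddings)  # work on a copy
    for _ in range(epochs):
        for word, target in data:
            prediction = hidden_weight * embeddings[word]
            error = prediction - target
            # Gradient of 0.5 * error**2 with respect to the embedding
            embeddings[word] -= lr * error * hidden_weight
            # hidden_weight is never updated: it plays the role of the frozen layers
    return embeddings


hidden_weight = 2.0                            # stands in for the pretrained English layers
dutch_embeddings = {"kind": 0.0, "huis": 0.0}  # new vocabulary, initialised at zero
data = [("kind", 4.0), ("huis", -2.0)]         # invented training targets

trained = train_embeddings_only(dutch_embeddings, hidden_weight, data)
print(trained)
```

In PyTorch, the same effect is usually obtained by setting `requires_grad = False` on the parameters of the layers that should stay frozen.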

References:

de Vries, Wietse, and Malvina Nissim. "As good as new. How to successfully recycle English GPT-2 to make models for other languages." arXiv preprint arXiv:2012.05628 (2020). https://aclanthology.org/2021.findings-acl.74.pdf

See also: https://github.com/wietsedv/gpt2-recycle


You can download the resulting GPT2 models for Dutch and Italian from Huggingface and generate a Dutch and Italian short story from a prompt.

In [7]:
dutchGpt2pipe = pipeline("text-generation", model="GroNLP/gpt2-small-dutch")

In [8]:
dutch_prompts = ['Mark Rutte is ter verantwoording geroepen voor',
                 'Thierry Baudet is ter verantwoording geroepen voor',
                 'Thierry Rutte is ter verantwoording geroepen voor']

In [9]:
for prompt in dutch_prompts:
    print(dutchGpt2pipe(prompt))

[{'generated_text': 'Mark Rutte is ter verantwoording geroepen voor de aanslagen van 11 september 2001. De VVD-leider zei in een vraaggesprek met The Washington Post, \'Ik denk dat er nu wel iets anders moet zijn.\'\nPremier Mark Rutte (rechts) noemt het "een grote eer om te zien hoe we onze rechtsstaat kunnen hervormen." Op Twitter heeft hij nog niet gereageerd op deze kritiek. Hij twitterde vrijdag:,,We moeten terug naar die tijd toen wij al zo\'n twintig keer zoveel mensen hadden gedood.\'\''}]
[{'generated_text': "Thierry Baudet is ter verantwoording geroepen voor het falen van de Nederlandse politiek. Hij zegt dat hij zich niet goed voelt om in een politieke partij te zijn, maar wel eens als 'de leider'.\n'Je kunt je afvragen waarom er zo veel mensen op straat zitten', zei Baudet aan RTL 5-presentatrice Marlon Brando over haar interview met burgemeester Eberhard van der Laan: 'Ik kan me niets voorstellen wat ik die dag moet meemaken.'\nBaudet was vorige maand na afloop van"}]
[{'g

In [10]:
print(dutchGpt2pipe('Een klein kind')[0])

{'generated_text': 'Een klein kind is een van de beste. Je weet niet eens wat dat betekent, hè?"\nHij knikte en liet zijn hand onder haar arm glijden. \'Ik heb het nooit gezegd.\'\n"Waarom zou ik dan zo\'n domme vrouw willen trouwen? Ik ben er toch geen maagd meer in geslaagd om voor je te leven?\'\nHaar hart bonsde toen ze hem aankeek. Haar lippen krulden omhoog naar voren zodat hij zich nauwelijks kon verroeren. Zijn handen gleden over haar rug heen en weer'}


In [11]:
italianGpt2pipe = pipeline("text-generation", model="GroNLP/gpt2-small-italian")

print(italianGpt2pipe('Uno bambino picolo')[0])

{'generated_text': "Uno bambino picologizzato ha il diritto di scegliere tra due modi: la scelta dell'ambiente e l'acquisto del cibo. Gli Stati Uniti hanno fatto un grande passo in avanti nell'individuazione delle condizioni migliori per vivere con i bambini, grazie ad accordi bilaterali che consentono ai paesi poveri di ottenere una maggiore protezione contro gli abusi commessi dalle loro famiglie o dall'UNICEF (Organizzazione Mondiale della Salute). Le Nazioni Unite stanno facendo progressi nel riconoscimento dei diritti degli individui umani a livello nazionale ed internazionale"}


Larger generative models such as ChatGPT, Bard and LLaMA have seen data in many languages, although English still dominates (93% of the data in the case of GPT3). These models can take prompts in these languages directly as input and generate text in them without further training. However, research into the multilinguality of these models is ongoing, and the dominance of English appears to have an impact on the text generated for other languages.

## End of notebook