# 4.2 How to use GPT

GPT (Generative Pretrained Transformer) is a model trained to generate text given a preceding input (Brown et al., 2020). It can do this repeatedly, token after token, up to a certain length, and in this way generate short stories.
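
The repeated next-word generation can be illustrated with a toy sketch. The table `NEXT_WORD_PROBS` and the function `generate` are invented for illustration. Unlike GPT, this toy only looks at the last word instead of the whole preceding input, so it shows the generation loop rather than the model itself.

```python
import random

# Hypothetical toy "language model": for each word, the probability of each
# possible next word. Words and probabilities are made up for illustration.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}


def generate(prompt, max_new_tokens=5, seed=0):
    """Extend the prompt one word at a time, sampling each word from the table."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(max_new_tokens):
        options = NEXT_WORD_PROBS.get(words[-1])
        if options is None:  # no known continuation for this word: stop early
            break
        candidates = list(options)
        weights = [options[w] for w in candidates]
        words.append(rng.choices(candidates, weights=weights, k=1)[0])
    return " ".join(words)


print(generate("the"))
```

Changing `seed` may give a different continuation, and `max_new_tokens` caps the number of words that are added.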

Another generative model is T5 (Text to Text Transfer Transformer, Raffel et al. 2019). T5 models many tasks as a text generation task, ranging from plain translation, sentiment annotation, question-answering, similarity, to summarisation. Tasks are differentiated through prompt prefixes.
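
As a rough sketch of how prompt prefixes distinguish tasks, the snippet below only builds the input strings and does not load a model. The prefixes are the ones shown as examples in the T5 paper; the helper `build_t5_input` and the task keys are hypothetical names chosen for this illustration.

```python
# Task prefixes as shown in the T5 paper (Raffel et al., 2019). The dictionary
# keys and the helper function are hypothetical and only illustrate the idea.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "acceptability": "cola sentence: ",
}


def build_t5_input(task, text):
    """Prepend the prefix that tells the model which task to perform."""
    return TASK_PREFIXES[task] + text


print(build_t5_input("translate_en_de", "That is good."))
print(build_t5_input("acceptability", "The course is jumping well."))
```

The same model receives both strings as plain text; only the prefix signals whether a translation or an acceptability judgement is expected as output.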

<img src="T5.gif">

Models such as GPT4 (OpenAI, 2023) and T5 perform well but are far too large to work with. In this notebook we therefore look at an older model, GPT2, which is smaller and publicly available. It is nevertheless a generative model that works along the same lines as the others.

### References

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.

OpenAI, 2023. GPT-4 Technical Report. arXiv:2303.08774

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.

Sanh, Victor, Lysandre Debut, Julien Chaumond, and Thomas Wolf. "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter." arXiv preprint arXiv:1910.01108 (2019).

## Generative Models for English

We can load GPT2 from the Huggingface platform as we did before for BERT and XLM-RoBERTa as part of a pipeline. We now specify the task as **text-generation**. As the model is big, it may take a while to load it.

In [1]:
from transformers import pipeline



In [2]:
gpt2pipe = pipeline("text-generation", model="gpt2")

Downloading (…)neration_config.json: 100%|███████████████████████████████████████████| 124/124 [00:00<00:00, 16.2kB/s]


Once you have successfully downloaded the model, it is saved in a cache on disk for further use. The next time you load the model, it is read from disk, which is faster.

You can now pass any text as a prompt to this pipeline instance, and it will complete the text according to the model. We create a list of prompts that are nearly identical except for the well-known entity in subject position. In this way, we can test whether the model generates different texts that are relevant to the different entities.

In [3]:
prompts = ['Boris Johnson is called to justice for',
           'Donald Trump is called to justice for', 
           'Angela Merkel is called to justice for']

In [4]:
for prompt in prompts:
    print(gpt2pipe(prompt))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Boris Johnson is called to justice for his actions. He\'s a man of the people. "It would be a great honour to sit down together and speak to the families of those killed in the London Bridge terror attack," said Mr Johnson.\n\n'}]


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Donald Trump is called to justice for the horrific attack he is accused of assaulting a woman that sparked protests Saturday in Charlottesville that included a violent confrontation that left one woman dead.\n\nThe president said in a Twitter post: "My administration should do all'}]
[{'generated_text': "Angela Merkel is called to justice for her 'inappropriate' behaviour amid a row over her treatment of asylum seekers. The Germany Interior Ministry is looking into whether she breached a federal statute by telling local authorities to be patient with those who seek justice."}]


We can clearly see that the stories differ for each entity and contain specific details that seem relevant to each. Whether these stories are correct and factual is a different matter. Generative models do not index facts; they make up plausible text based on word probabilities.
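
Because the text is sampled from word probabilities, drawing several continuations for the same prompt is a simple way to see that the "facts" are not stable. The helper below is a minimal sketch: `sample_continuations` is a hypothetical name, while `do_sample`, `num_return_sequences` and `max_new_tokens` are generation arguments that the text-generation pipeline accepts. The default values are arbitrary choices.

```python
def sample_continuations(pipe, prompt, n=3, max_new_tokens=30):
    """Return n sampled continuations of the prompt as plain strings.

    `pipe` is expected to behave like a transformers text-generation
    pipeline: called with a prompt, it returns a list of dictionaries
    that each have a 'generated_text' key.
    """
    outputs = pipe(
        prompt,
        do_sample=True,             # sample from the word probabilities
        num_return_sequences=n,     # draw n continuations for the same prompt
        max_new_tokens=max_new_tokens,
    )
    return [out["generated_text"] for out in outputs]


# Usage with the pipeline and prompts from this notebook (not run here,
# because it needs the downloaded model):
#
# from transformers import set_seed
# set_seed(42)  # makes the sampling reproducible
# for text in sample_continuations(gpt2pipe, prompts[0]):
#     print(text)
```

Continuations that contradict each other for the same prompt are a hint that the model is composing plausible text rather than retrieving stored facts.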

If you do not have a powerful computer, GPT2 may be too big or too slow to use. Don't worry! Researchers found a way to compress large models into smaller, more efficient models with almost equal performance. Knowledge distillation is a compression technique in which a smaller model (the student) is trained to reproduce the behaviour of a larger model or an ensemble of models (the teacher). The distilled model is trained with a distillation loss over the soft target probabilities of the original model (Sanh et al., 2019).
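
To make the idea of a loss over soft target probabilities concrete, here is a minimal sketch in plain Python. It computes the cross-entropy between the softened output distribution of the larger model (the teacher) and that of the smaller model (the student). The logits and the temperature of 2.0 are made-up values, and in practice this term is typically combined with other training objectives, so this is an illustration rather than the exact training setup of Sanh et al.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - largest) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# Made-up logits over a vocabulary of three tokens.
teacher = [4.0, 1.0, 0.5]
print(distillation_loss(teacher, [3.5, 1.2, 0.4]))  # student close to the teacher
print(distillation_loss(teacher, [0.5, 1.0, 4.0]))  # student far from the teacher
```

The first value is lower than the second: the loss rewards a student whose full probability distribution resembles the teacher's, not only its top prediction.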

There is also a distilled version of GPT2 called *distilgpt2*, which is smaller (about 82 million parameters versus 124 million for GPT2, roughly two thirds of the original) and faster, while it is claimed to have almost equal performance.

In [5]:
distilgpt2pipe = pipeline("text-generation", model="distilgpt2")

Downloading (…)neration_config.json: 100%|███████████████████████████████████████████| 124/124 [00:00<00:00, 65.4kB/s]


In [6]:
for prompt in prompts:
    print(distilgpt2pipe(prompt))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Boris Johnson is called to justice for the millions he has made in support of the EU in light of the decision to leave the single market.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'}]


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Donald Trump is called to justice for his support of the U.S.'s military in Iraq after a U.S. soldier told him they would be prosecuted for aiding and abetting the Islamic State — which the U.S. says has carried"}]
[{'generated_text': 'Angela Merkel is called to justice for her alleged mishandling of the refugee crisis, German media say, and on Monday called for changes to the law which enables her to seek asylum.\n\n\n\n\n\n\n\n\n\n\n\n\n'}]


It gives very different output for our prompts. DistilGPT2 has substantially fewer parameters (roughly two thirds of GPT2's) and apparently represents less knowledge about the target entities. The stories are shorter and contain fewer entity-specific details.

## GPT2 for other languages than English

Building a GPT model from scratch is costly: you need not only a lot of data but also a lot of computing power. An interesting alternative is to train only the vocabulary part of a model (the token embeddings) for a new language and to keep the hidden layers of the English model, which carry the contextual attention relations and the capability to predict the next token. The intuition is that once the words of the new language have reasonable embedding representations, the attention relations learned from the English data will largely hold across these embeddings as well.

Such an approach was followed by *de Vries and Nissim (2021)* from the University of Groningen for Dutch and Italian. You can read the paper for more details.
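
The idea of retraining only the vocabulary part can be sketched as a choice of which parameters to update. The parameter names below follow the naming used for GPT2 in the transformers library, where `wte` holds the token embeddings, but the list is hand-written and shortened, and `split_trainable` is a hypothetical helper. This is not necessarily the exact procedure used in the paper.

```python
# Hand-written, shortened list of GPT2-style parameter names (an assumption
# for illustration; not read from a real model).
EXAMPLE_PARAM_NAMES = [
    "transformer.wte.weight",              # token (vocabulary) embeddings
    "transformer.wpe.weight",              # position embeddings
    "transformer.h.0.attn.c_attn.weight",  # attention weights of the first layer
    "transformer.h.0.mlp.c_fc.weight",     # feed-forward weights of the first layer
    "transformer.ln_f.weight",             # final layer normalisation
]


def split_trainable(param_names, trainable_marker="wte"):
    """Split names into parameters to retrain and parameters to keep frozen."""
    trainable = [name for name in param_names if trainable_marker in name]
    frozen = [name for name in param_names if trainable_marker not in name]
    return trainable, frozen


trainable, frozen = split_trainable(EXAMPLE_PARAM_NAMES)
print("retrain:", trainable)
print("keep from English model:", frozen)
```

With a real PyTorch model, the same split could be applied by looping over `model.named_parameters()` and setting `requires_grad = False` on the parameters to keep frozen.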

References:

de Vries, Wietse, and Malvina Nissim. "As good as new. How to successfully recycle English GPT-2 to make models for other languages." arXiv preprint arXiv:2012.05628 (2020). https://aclanthology.org/2021.findings-acl.74.pdf

See also: https://github.com/wietsedv/gpt2-recycle


You can download the resulting GPT2 models for Dutch and Italian from Huggingface and generate a Dutch and Italian short story from a prompt.

In [7]:
dutchGpt2pipe = pipeline("text-generation", model="GroNLP/gpt2-small-dutch")

In [8]:
dutch_prompts = ['Mark Rutte is ter verantwoording geroepen voor',
           'Thierry Baudet is ter verantwoording geroepen voor', 
           'Thierry Rutte is ter verantwoording geroepen voor']

In [9]:
for prompt in dutch_prompts:
    print(dutchGpt2pipe(prompt))



[{'generated_text': 'Mark Rutte is ter verantwoording geroepen voor de mislukte coup van Erdogan in november 2016. "Ik weet dat wij het niet goed hebben, maar ik ben er wel overheen gekomen", twitterde hij zondag na zijn speech op NPO Radio 1.\nHet Turkse parlement wil een nieuw kabinet vormen waarin Turkije als premier wordt gekozen. Een nieuwe coalitie zal moeten worden gevormd tussen minister-president Tayyip Erdogan (links) en vicepremier Küdim Aktasoglu (rechts). Er staat volgens sommige peilingen'}]
[{'generated_text': 'Thierry Baudet is ter verantwoording geroepen voor de moord op Marianne Thieme aan het begin van haar politieke carrière.\nIn een interview in Trouw vertelt Thierry Baudet: "Ik zou me niet voorstellen dat er ooit iemand zoiets zo verschrikkelijks te maken had, maar ik heb ook nog nooit iets dergelijks meegemaakt." Hij verwijst naar zijn eigen ervaringen met geweld tegen vrouwen en sekswerkers die door hem werden beschuldigd van \'onbehoorlijke of gewelddadige\' vo

In [10]:
print(dutchGpt2pipe('Een klein kind')[0])

{'generated_text': "Een klein kind, maar ik weet zeker dat ze er niet in staat is om aan te geven wat zij wil. Ze moet het zelf weten.'\n 'Ik zal haar vertellen waar we mee bezig zijn,' zei hij met een ondeugende twinkeling in zijn ogen. De jongen knikte instemmend en ging achter zich op naar de auto die hen hadden gezet. Een paar minuten later was hun huis binnengeleid door twee andere mannen van ongeveer veertig of vijftig meter vijfenvijftig tot zestig jaar oud. Zij waren gekleed in zwarte"}


In [11]:
italianGpt2pipe = pipeline("text-generation", model="GroNLP/gpt2-small-italian")

print(italianGpt2pipe('Uno bambino picolo')[0])

{'generated_text': 'Uno bambino picolosa non ha mai potuto fare il male, è stato preso in custodia da un uomo che gli aveva chiesto di aiutarlo a trovare i soldi per la morte del padre.\n\nL\'episodio narra le vicende delle due sorelle e l\'evoluzione degli atteggiamenti dei loro genitori nella vita quotidiana dell\'adolescente (le storie si mescolano anche con altri episodi). Il narratore racconta così una storia narrata nei flash della prima parte dello stesso romanzo: "Il figlio era morto nel suo letto'}


Larger generative models such as ChatGPT, Bard, and LLaMA have seen data in many languages (although still predominantly English). These models can therefore also generate text in these languages without further training. However, research into the multilinguality of these models is ongoing, and the dominance of English appears to affect the text generated for other languages.

## End of notebook