# Lab 4.2 Using generative language models as text generators

Copyright: Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

Most of you have used ChatGPT from OpenAI and noticed that it can respond to your input in a very natural way. ChatGPT is built on top of GPT (Generative Pretrained Transformer), a model trained to generate text given a preceding input, the so-called prompt (Brown et al. 2020). It can do this repeatedly up to a certain length, thereby generating, for example, short stories.

Another generative model is T5 (Text-to-Text Transfer Transformer, Raffel et al. 2019). T5 models many tasks as text generation tasks, ranging from plain translation, sentiment annotation, question answering and similarity to summarisation. Tasks are differentiated through prompt prefixes.

<img src="T5.gif">
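To make the idea of prompt prefixes concrete, the sketch below builds T5-style inputs as plain strings. The prefixes follow the examples given in the T5 paper; the helper name `with_prefix` and the example sentences are only illustrative, and no model is called.

```python
# Task prefixes in the style of the T5 paper's examples; each task is just a string prefix.
T5_PREFIXES = {
    "translation": "translate English to German: ",
    "summarisation": "summarize: ",
    "acceptability": "cola sentence: ",
}


def with_prefix(task, text):
    """Prepend the task prefix so a single text-to-text model can tell the tasks apart."""
    return T5_PREFIXES[task] + text


# Illustrative inputs (made up for this sketch).
examples = [
    ("translation", "That is good."),
    ("summarisation", "Bach sat down at his organ and played a fugue for the court."),
    ("acceptability", "The course is jumping well."),
]

for task, text in examples:
    print(with_prefix(task, text))
```

The model itself stays the same for every task; only the prefix in the input string changes.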

Models such as GPT-3, GPT-4 and T5 perform well but are far too large to work with locally. Therefore, in this notebook we will use Llama3, which is publicly available: https://ai.meta.com/blog/meta-llama-3-1/

### References

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.

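OpenAI, 2023. GPT-4 Technical Report. arXiv:2303.08774

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., Liu, P. J. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683, 2019.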

Llama: [The Llama 3 Herd of Models](https://scontent-ams2-1.xx.fbcdn.net/v/t39.2365-6/452387774_1036916434819166_4173978747091533306_n.pdf?_nc_cat=104&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=t6egZJ8QdI4Q7kNvgEka7Z4&_nc_ht=scontent-ams2-1.xx&oh=00_AYBPnjp-CQn7YnUQU_P-yJATmlaN6oRuEHZ1VrXshBoBwQ&oe=66A6EB8D)


## Llama3 client-server calls

We could use Llama3 in the same way as we have seen for BERT and XLM-RoBERTa, using a pipeline. This is, however, a bit more difficult because you need to obtain an access key from Meta and log in through a Hugging Face account.

We are therefore going to use the server version of Llama, as we did for our chat conversation at the beginning of the course. For this, we need the OpenAI client package again. If you have not yet installed it, you can install it with the following command.

In [1]:
! pip install openai

Collecting openai
  Downloading openai-1.51.0-py3-none-any.whl.metadata (24 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Using cached distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Using cached jiter-0.5.0-cp310-cp310-macosx_10_12_x86_64.whl.metadata (3.6 kB)
Collecting pydantic<3,>=1.9.0 (from openai)
  Downloading pydantic-2.9.2-py3-none-any.whl.metadata (149 kB)
Collecting annotated-types>=0.6.0 (from pydantic<3,>=1.9.0->openai)
  Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.23.4 (from pydantic<3,>=1.9.0->openai)
  Downloading pydantic_core-2.23.4-cp310-cp310-macosx_10_12_x86_64.whl.metadata (6.6 kB)
Downloading openai-1.51.0-py3-none-any.whl (383 kB)
Using cached distro-1.9.0-py3-none-any.whl (20 kB)
Using cached jiter-0.5.0-cp310-cp310-macosx_10_12_x86_64.whl (284 kB)
Downloading pydantic-2.9.2-py3-none-any.whl (434 kB)
Downloading pydantic_core-2.23.4-cp310-cp310-macosx_10_12_x86_64

After installing the OpenAI package, we can import it:

In [2]:
from openai import OpenAI

We can now create a client that connects to the server version, either locally or to the CLTL server that was given to you during the course. In the next cell, we show how to access a local server that runs on port 9001.

In [6]:
client = OpenAI(base_url="http://localhost:9001/v1", api_key="not-needed")

The OpenAI client has various functions, which are explained here: https://pypi.org/project/openai/

We will use the ```chat``` module with the function ```completions.create```. This function needs several arguments:

* model: this can be a local model or the name of one of the OpenAI models
* messages: this is the input prompt
* temperature: a value, typically between 0 and 1, that controls how much randomness is used when choosing the next word. At 0 the model sticks to its most probable answer; higher values make less likely alternatives more probable, so the output becomes more varied and creative.
* stream: when set to True, the server returns the answer in small chunks as it is generated; we collect these chunks to build the complete answer

You can experiment with the temperature and the prompt to see the effect.
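To get a feeling for what the temperature does, the sketch below rescales a few made-up scores for candidate next words before turning them into probabilities. It illustrates the general idea of temperature scaling, not the server's actual implementation; the function name, words and scores are assumptions chosen for this example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [score / temperature for score in logits]
    highest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - highest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three candidate next words (not taken from a real model).
candidates = ["fugue", "melody", "sandwich"]
logits = [2.0, 1.0, 0.1]

for temperature in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    summary = ", ".join(f"{word}={prob:.2f}" for word, prob in zip(candidates, probs))
    print(f"temperature={temperature}: {summary}")
```

At a low temperature almost all probability goes to the top candidate, while at 1.0 the less likely words keep a real chance of being sampled. A temperature of exactly 0 would divide by zero in this sketch; in practice it is commonly handled as always picking the most probable word.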

The prompt is given as a list of **dict** objects, each with a role and a content string: the **system** message holds the instruction for the model and the **user** message holds the input:

In [7]:
prompt = [{"role": "system", "content": "Generate 10 alternative sentences by completing the next input text."},
          {"role": "user", "content": "Bach sat down at his organ and played"}]
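Because the same message structure comes back in every experiment, it can help to build it with a small function. The sketch below is one possible way to do this; `build_prompt` is a hypothetical helper name and is not part of the OpenAI package.

```python
def build_prompt(instruction, text):
    """Return the list of role/content dicts used as the `messages` argument."""
    return [
        {"role": "system", "content": instruction},  # instruction for the model
        {"role": "user", "content": text},           # the input to complete
    ]


instruction = "Generate 10 alternative sentences by completing the next input text."

# Build one prompt per composer and show the user part of each.
for composer in ("Bach", "Beethoven"):
    messages = build_prompt(instruction, f"{composer} sat down at his organ and played")
    print(messages[1]["content"])
```

Only the user text changes between runs, which makes it easier to compare the effect of a single changed word or language.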

In [8]:
completion = client.chat.completions.create(
                model="local-model", # this field is currently unused
                messages=prompt,
                temperature=0.3,
                stream=True
            )

In [9]:
response = ""
for chunk in completion:
    if chunk.choices[0].delta.content:
        response += chunk.choices[0].delta.content
print(response)

Here are 10 alternative sentences:

1. Bach sat down at his organ and played a soothing melody.
2. Bach sat down at his organ and played with precision and passion.
3. Bach sat down at his organ and played a complex fugue.
4. Bach sat down at his organ and played a lively dance tune.
5. Bach sat down at his organ and played a somber dirge.
6. Bach sat down at his organ and played a beautiful chorale.
7. Bach sat down at his organ and played with reckless abandon.
8. Bach sat down at his organ and played a stately processional.
9. Bach sat down at his organ and played a whimsical piece.
10. Bach sat down at his organ and played a majestic cantata.


There is a lot to consider about these sentences. Are they all grammatical? Is the input modified and still semantically valid? Are the completions semantically correct? Do the completions make sense given that it is Bach playing?
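The loop that joins the streamed chunks is repeated for every prompt, so it could be wrapped in a function. The sketch below shows one way to do that and tries it out on simulated chunks, so that it runs without a server. The names `collect_stream` and `fake_chunk` are hypothetical; the fake chunks mimic only the attributes that the loop in this notebook reads.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text pieces of a streamed chat completion into one string."""
    pieces = []
    for chunk in chunks:
        # Some chunks may carry no choices or no text, so skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            pieces.append(chunk.choices[0].delta.content)
    return "".join(pieces)


def fake_chunk(text):
    """Hypothetical stand-in with the same attribute path as a real chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream; with a running server you would pass `completion` instead.
stream = [fake_chunk("Bach sat down "), fake_chunk(None), fake_chunk("and played.")]
print(collect_stream(stream))
```

With a running server, `response = collect_stream(completion)` would replace the loop used in the cells of this notebook.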

Let us try another one:

In [None]:
prompt = [{"role": "system", "content": "Generate 10 alternative sentences by completing the next input text."},
          {"role": "user", "content": "Beethoven sat down at his organ and played"}]

completion = client.chat.completions.create(
                model="local-model", # this field is currently unused
                messages=prompt,
                temperature=0.3,
                stream=True
            )

response = ""
for chunk in completion:
    if chunk.choices[0].delta.content:
        response += chunk.choices[0].delta.content
print(response)

You can do this all day. Let us switch the language of the input:

In [None]:
prompt = [{"role": "system", "content": "Generate 10 alternative sentences by completing the next input text."},
          {"role": "user", "content": "Beethoven ging achter zijn orgel zitten en speelde"}]

completion = client.chat.completions.create(
                model="local-model", # this field is currently unused
                messages=prompt,
                temperature=0.3,
                stream=True
            )

response = ""
for chunk in completion:
    if chunk.choices[0].delta.content:
        response += chunk.choices[0].delta.content
print(response)

The instructions are in English and the completions in Dutch. You can also give the instructions in Dutch or try other languages. Which languages does Llama cover, and how well? An interesting result is that Llama varied the sitting and, in a few cases, the organ (orgel), but never specified what Beethoven is playing.

In [None]:
prompt = [{"role": "system", "content": "Genereer 10 alternatieve zinnen door de volgende text af te maken."},
          {"role": "user", "content": "Beethoven ging achter zijn orgel zitten en speelde"}]

completion = client.chat.completions.create(
                model="local-model", # this field is currently unused
                messages=prompt,
                temperature=0.3,
                stream=True
            )

response = ""
for chunk in completion:
    if chunk.choices[0].delta.content:
        response += chunk.choices[0].delta.content
print(response)

So Dutch instructions do make a big difference as the quality went down drastically.

You could use the same client to prompt OpenAI models, but for that you need to obtain an API key from OpenAI. You pass this key in the api_key parameter when creating the client and then make the same calls as before.

In [45]:
# client = OpenAI(api_key="OPENAI_API_KEY")
                
# completion = client.chat.completions.create(
#                 model="gpt-4o",
#                 messages=prompt,
#                 temperature=0.3,
#                 stream=True
#             )
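An API key is better kept out of the notebook itself. One common approach, sketched below, is to store it in an environment variable and read it when creating the client. The helper `read_api_key` is a hypothetical name, and the key in the demonstration is a made-up placeholder, not a working key.

```python
import os


def read_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise a clear error if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before creating the client.")
    return key


# Demonstration with a hypothetical mapping instead of the real environment.
print(read_api_key({"OPENAI_API_KEY": "sk-example"}))

# With a real key set in your environment you could then write:
# client = OpenAI(api_key=read_api_key())
```

This keeps the key out of any notebook you share or commit to a repository.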

Generative LLMs such as ChatGPT, Llama and Mixtral have been fine-tuned for various tasks through alignment, possibly with reinforcement learning on top. These are specific behaviours such as **chat**, **summarize**, **paraphrase**, **translate** or answering questions (**Q&A**). You can evoke these functions through the instruction you give, just as we gave the instruction to complete a text. However, the models can also give useful responses to instructions for which they were not specifically trained, e.g. to generate Python code, do calculations or build a website. These emergent capabilities are fascinating but also risky: there is little control over or training for this behaviour, and therefore no guarantee that the answers are correct. In the case of code generation, it may take more time to check and understand the generated code than to write the code yourself, which also makes you a better coder.

## End of notebook