# Generative language models for zero-shot or few-shot learning

Many modern NLP solutions (and software more generally; you know, sprinkle that magic AI dust ...) use LLMs to perform various tasks. Unless the LLM runs on a powerful local machine, this usually means sending requests to an LLM service, much like your browser communicating with a web server. This can be done directly from the command line with the `curl` tool, as shown below. This is essentially what we want to do in Python code.

In [33]:
# NOTE: This will only work if there is a server running with this public URL on UCloud
! curl https://app-language-analytics-llm.cloud.aau.dk/api/generate -d '{"model": "llama3.2:3b", "prompt": "Is there anybody out there?", "stream": true, "options": {"num_predict": 20}}'

{"model":"llama3.2:3b","created_at":"2025-05-05T11:30:15.988960481Z","response":"A","done":false}
{"model":"llama3.2:3b","created_at":"2025-05-05T11:30:16.000652397Z","response":" reference","done":false}
{"model":"llama3.2:3b","created_at":"2025-05-05T11:30:16.010211574Z","response":" to","done":false}
{"model":"llama3.2:3b","created_at":"2025-05-05T11:30:16.0177438Z","response":" Pink","done":false}
{"model":"llama3.2:3b","created_at":"2025-05-05T11:30:16.025058983Z","response":" Floyd","done":false}
{"model":"llama3.2:3b","created_at":"2025-05-05T11:30:16.032516523Z","response":"'s","done":false}
{"model":"llama3.2:3b","created_at":"2025-05-05T11:30:16.039915692Z","response":" iconic","done":false}
{"model":"llama3.2:3b","created_at":"2025-05-05T11:30:16.047482053Z","response":" song","done":false}
{"model":"llama3.2:3b","created_at":"2025-05-05T11:30:16.054696159Z","response":" \"","done":false}
{"model":"llama3.2:3b","created_at":"2025-05-05T11:30:16.062314684Z","response":"Comfor
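
Each line of the streamed output above is a small JSON object that carries one fragment of the answer in its `response` field. As a rough sketch of what a client has to do with such a stream, the fragments can be stitched back together like this. The helper name `join_stream` is made up for illustration, and the sample lines are shortened copies of the output above with the timestamps dropped.

```python
import json


def join_stream(lines):
    """Concatenate the 'response' fragments from newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the server marks the last chunk with "done": true
    return "".join(parts)


# Shortened copies of the first streamed lines shown above (assumption: timestamps omitted)
sample = [
    '{"model":"llama3.2:3b","response":"A","done":false}',
    '{"model":"llama3.2:3b","response":" reference","done":false}',
    '{"model":"llama3.2:3b","response":" to","done":false}',
    '{"model":"llama3.2:3b","response":" Pink","done":false}',
    '{"model":"llama3.2:3b","response":" Floyd","done":false}',
]

print(join_stream(sample))  # A reference to Pink Floyd
```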

## APIs (_Application Programming Interfaces_) and web services

First a metaphor (thanks to `u/berael` on Reddit for the inspiration): An API is a menu in a restaurant. If someone wants to give you access to their food, but not their kitchen, they give you a menu. Now you can tell them what you want and get it without seeing how it is made.

If someone wants to give you access to their program, but not their code, they give you an API. Thus, an API gives you access to the functionality of an application without showing you the implementation and inner logic.
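
To make the metaphor concrete, here is a toy sketch in Python. The `Restaurant` class and its recipes are invented purely for illustration: the public methods play the role of the menu, and the method starting with an underscore is the kitchen that callers never need to look at.

```python
class Restaurant:
    """Public methods are the 'menu'; names starting with _ are the 'kitchen'."""

    def __init__(self):
        # Hypothetical recipes, only here to have something to hide
        self._recipes = {
            "ramen": ["broth", "noodles", "egg"],
            "toast": ["bread", "butter"],
        }

    def menu(self):
        """What callers are allowed to ask for."""
        return sorted(self._recipes)

    def order(self, dish):
        """Return the finished dish; how it is made stays hidden."""
        if dish not in self._recipes:
            raise ValueError(f"{dish!r} is not on the menu")
        return self._cook(dish)

    def _cook(self, dish):
        # Implementation detail: free to change without callers noticing
        return f"{dish} ({', '.join(self._recipes[dish])})"


place = Restaurant()
print(place.menu())          # ['ramen', 'toast']
print(place.order("ramen"))  # ramen (broth, noodles, egg)
```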

In fact, we have already been working with many things that can be considered to have an API. For instance, a lot is going on under the hood in `spaCy` or `transformers`, and we access that functionality via their APIs.

When I say _LLM service_, it is because it is typically an API exposed and used over a network via HTTP/HTTPS requests. If you open the _Inspect_ pane in your browser (Ctrl+Shift+I in Chrome) and open the _Network_ tab, you will see that your browser, as the _client_, sends many (often hundreds of) requests to web _servers_ to get the content that you see in your browser. In a similar fashion, we can send requests (for data, broadly speaking) from our Python code to a server. We then use the data that we get back for our own purposes.
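
As a minimal sketch of what such a request looks like from Python, the block below builds the same request as the `curl` command at the top of this notebook using only the standard library, and inspects it without sending anything. The helper name `build_generate_request` is made up for illustration, and the explicit `Content-Type` header is my own addition rather than something `curl -d` sets.

```python
import json
from urllib.request import Request


def build_generate_request(base_url, model, prompt, num_predict=20):
    """Build (but do not send) a POST request mirroring the curl command above."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": True,
        "options": {"num_predict": num_predict},
    }
    return Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),  # request body must be bytes
        headers={"Content-Type": "application/json"},  # assumption: not set by the curl call
        method="POST",
    )


req = build_generate_request(
    "https://app-language-analytics-llm.cloud.aau.dk",
    "llama3.2:3b",
    "Is there anybody out there?",
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))

# Sending it requires a running server:
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     for line in resp:
#         print(line.decode("utf-8"), end="")
```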

## OpenAI-compatible APIs in Python
Because OpenAI was an early mover and its API became widely used, it quickly turned into a de facto standard. Communication with an LLM service is therefore often done through a so-called "OpenAI-compatible" API. This has many nice consequences. For instance, one really only has to learn how to do it one way, and switching from one model (or provider) to another requires little more than changing a few lines. In practice, it means that we can use the `openai` library and simply point its _client_ at a different server. Here, we point it at an [ollama](https://ollama.com/) server (ollama is a framework for running LLMs locally).

For this exercise, I (Kasper) will get servers up and running on UCloud. You can also try to install and run ollama on your own machine, in which case you should use the `http://localhost:11434/v1` endpoint, as shown below.

In [7]:
!pip install openai

Defaulting to user installation because normal site-packages is not writeable


In [17]:
local_endpoint = 'http://localhost:11434/v1'
ucloud_endpoint_1 = 'https://app-language-analytics-llm.cloud.aau.dk/v1'
ucloud_endpoint_2 = 'https://app-language-analytics-llm2.cloud.aau.dk/v1'
ucloud_endpoint_3 = 'https://app-language-analytics-llm3.cloud.aau.dk/v1'

In [21]:
llm_endpoint = ucloud_endpoint_2

In [22]:
from openai import OpenAI

client = OpenAI(
    base_url=llm_endpoint,
    api_key='ollama',  # required, but unused here; for OpenAI or similar, this is the secret access key that tells the server who you are
)

response = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How do I make proper ramen?"},
    ],
    max_tokens=15
)

In [23]:
response.choices[0].message.content

'Making proper ramen is an art that requires attention to detail, quality ingredients,'
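
The answer above stops mid-sentence because `max_tokens=15` cut the generation short. The response object records this: `finish_reason` is `'length'` when the token limit was hit, and `usage` holds the token counts. The sketch below reads those fields. So that it runs without a server, it uses a hand-built stand-in object (`fake_response`, with made-up values) instead of a real response; the helper name `describe` is also just for illustration.

```python
from types import SimpleNamespace


def describe(response):
    """Summarise a chat completion: the text, whether it was cut off, and tokens used."""
    choice = response.choices[0]
    return {
        "text": choice.message.content,
        "truncated": choice.finish_reason == "length",  # 'length' = hit max_tokens
        "completion_tokens": response.usage.completion_tokens,
    }


# Stand-in with the same attribute names as a real response (values are invented)
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="Making proper ramen is an art"),
            finish_reason="length",
        )
    ],
    usage=SimpleNamespace(completion_tokens=15),
)

print(describe(fake_response))
```

With a live server, `describe(response)` can be called directly on the object returned by `client.chat.completions.create(...)`.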

The available models for this exercise will be:
- llama3.2:3b
- llama3.1:8b
- phi4
- deepseek-r1:7b

### The old way
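Notice that the arguments and the endpoint in the Python code are slightly different from what I used in the `curl` command further above. The Python version uses the modern format, where the input is laid out as a chat between `user` and `assistant`. The `curl` command uses the older style with a single plain `prompt`. That style is considered outdated, but it still works, and the `openai` library offers it through `client.completions.create`.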

In [None]:
response = client.completions.create(
    model="llama3.2:3b",
    prompt="How do I make proper ramen?",
    max_tokens=10
)

In [None]:
response

Completion(id='cmpl-324', choices=[CompletionChoice(finish_reason='length', index=0, logprobs=None, text='The Los Angeles Dodgers won the World Series in ')], created=1746016666, model='llama3.2:3b', object='text_completion', system_fingerprint='fp_ollama', usage=CompletionUsage(completion_tokens=10, prompt_tokens=35, total_tokens=45, completion_tokens_details=None, prompt_tokens_details=None))
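
The two styles carry the same information in different shapes: a single `prompt` string versus a list of role-tagged messages. As a small sketch, a plain prompt can be wrapped into the chat format like this (the helper name `as_chat_messages` is made up for illustration):

```python
def as_chat_messages(prompt, system=None):
    """Wrap a plain prompt string in the list-of-messages chat format."""
    messages = []
    if system:
        # Optional instructions that frame the whole conversation
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


messages = as_chat_messages(
    "How do I make proper ramen?",
    system="You are a helpful assistant.",
)
print(messages)
```

The resulting list can be passed as `messages=` to `client.chat.completions.create`.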

## Prompt engineering

Our goal is to take the knowledge of language that instruction-tuned language models like Llama (from Meta) or Phi (from Microsoft) have already acquired during training and apply it in different domains *without any further fine-tuning*. This is called *zero-shot* learning. If we provide one or more examples as part of the context, it is referred to as *few-shot* learning.

In order for zero-shot learning to be successful, our prompts need to be carefully designed.

In [None]:
# classification
prompt = "classify the following text as positive or negative: I absolutely hated this movie"

# translation
#prompt = "translate from English to French: how old are you?"

# question answering
#prompt = "answer the following question: how is cheese made?"

# named entity recognition
#prompt = "find all location entities in this text: Ross comes from Scotland"

In [13]:
response = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[
        {"role": "system",
         "content": "You are a data annotator. Follow the instructions and respond with only one word."},
        {"role": "user",
         "content": "classify the following text as 'positive' or 'negative':\n\nI absolutely hated this movie"},
    ],
    max_tokens=1  # we really just need one!
)

In [14]:
response.choices[0].message.content

'Negative'
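
Note that the model answered `'Negative'` with a capital N, although the prompt asked for `'negative'`. Model output is not guaranteed to match our labels exactly, so it is worth cleaning it up before using it as data. A minimal sketch, where the helper name `normalize_label` and the choice to return `None` for unexpected answers are my own assumptions:

```python
def normalize_label(raw, allowed=("positive", "negative")):
    """Map raw model output onto one of the allowed labels, or None if it does not fit."""
    # Drop surrounding whitespace, then stray punctuation or quotes, then lowercase
    cleaned = raw.strip().strip(".!\"'").lower()
    return cleaned if cleaned in allowed else None


print(normalize_label("Negative"))     # negative
print(normalize_label(" Positive.\n"))  # positive
print(normalize_label("I think"))      # None
```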

We can also do this a bit more cleverly by combining a single prompt template with an f-string. This means we could, for example, write functions for specific tasks:

In [15]:
def classifier(input_text: str) -> str:
    prompt = f"classify the following text as positive or negative: {input_text}"
    response = client.chat.completions.create(
        model="llama3.2:3b",
        messages=[
            {"role": "system",
             "content": "You are a data annotator. Follow the instructions and respond with only one word."},
            {"role": "user",
             "content": prompt},
        ],
        max_tokens=1  # we really just need one!
    )
    return response.choices[0].message.content.lower()

In [16]:
classifier("I absolutely hated this movie!")

'negative'
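
The classifier above is zero-shot: the model sees only the instruction and the text. A few-shot variant adds worked examples to the conversation as earlier `user`/`assistant` turns. The sketch below only builds the message list, so it runs without a server; the helper name `few_shot_messages` and the example reviews are invented for illustration.

```python
SYSTEM = "You are a data annotator. Follow the instructions and respond with only one word."
TEMPLATE = "classify the following text as positive or negative: {text}"


def few_shot_messages(examples, input_text):
    """Build chat messages with labelled examples placed before the real input."""
    messages = [{"role": "system", "content": SYSTEM}]
    for text, label in examples:
        # Each example is shown as a question the assistant has already answered
        messages.append({"role": "user", "content": TEMPLATE.format(text=text)})
        messages.append({"role": "assistant", "content": label})
    # The text we actually want classified goes last
    messages.append({"role": "user", "content": TEMPLATE.format(text=input_text)})
    return messages


# Made-up examples
examples = [
    ("What a wonderful film!", "positive"),
    ("A complete waste of time.", "negative"),
]
messages = few_shot_messages(examples, "I absolutely hated this movie!")
for m in messages:
    print(m["role"], "|", m["content"])
```

The list can then be passed as `messages=` to `client.chat.completions.create`, exactly as in `classifier`.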

## Tasks

1. Look through previous notebooks, exercises, and datasets from Language Analytics so far this semester. In small groups, try using either a local model, the UCloud endpoints or your favorite chatbot (ChatGPT, Claude, Gemini, Mistral, etc.). Try to solve those problems using generative language models. That would mean, for example:
    - Grammatical analysis
    - Named entity recognition/extraction
    - Classification
    - Topic modelling
2. The API has a lot of options, e.g. sampling strategies, ways to control generation, or "streaming" (receiving the output one token at a time). It is beyond the scope of this class to go through all of that, but now you know where to look. If you are up for it, try tweaking the API calls.
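
As a starting point for the second task, here is a rough sketch of how streamed chat output can be consumed. With `stream=True`, the `openai` client yields chunks whose new text sits in `chunk.choices[0].delta.content`. So that the block runs without a server, it is demonstrated on hand-built stand-in chunks; `collect_stream` and `fake_chunk` are made-up names, and the commented-out call shows how it might be used with a real client.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Print streamed text fragments as they arrive and return the full text."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        piece = chunk.choices[0].delta.content
        if piece:  # the content can be None or empty, e.g. on the final chunk
            parts.append(piece)
            print(piece, end="", flush=True)
    print()
    return "".join(parts)


# With a running server (not executed here):
# stream = client.chat.completions.create(
#     model="llama3.2:3b",
#     messages=[{"role": "user", "content": "How do I make proper ramen?"}],
#     stream=True,       # receive the answer piece by piece
#     temperature=0.2,   # lower values make sampling less random
# )
# text = collect_stream(stream)


def fake_chunk(text):
    """Stand-in with the same attribute layout as a streamed chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


result = collect_stream([fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)])
```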