# Few-shot text classification with Llama 3.1 8B and ollama

For **running on _Google Colab_**, see notebook [llm_zeroshot_classification_ollama.ipynb](./llm_zeroshot_classification_ollama.ipynb).

In [1]:
import os
from ollama import Client

import re

In [2]:
# create the ollama client that communicates with the `ollama` server running in the background
client = Client()
MODEL = 'llama3.1:8b' # maybe try 'mistral-nemo:12b' for better results

In [3]:
def get_available_models():
  # names of all models already downloaded to the local ollama server
  return [m['model'] for m in client.list()['models']]

if MODEL not in get_available_models():
  print(f"Model not found. Running `client.pull('{MODEL}')` to download it.")
  client.pull(MODEL)

# confirm
assert MODEL in get_available_models()

In [4]:
data_path = os.path.join('..', 'data', 'labeled', 'benoit_crowdsourced_2016', '')

In [5]:
SEED=42

## Define the task

In this example, we adapt the instructions for the economic/social/neither policy area classification task from Benoit et al. (2016).

- see [this README file](../data/labeled/benoit_crowdsourced_2016/README.md) for a description of the data and tasks covered in the paper
- see [this file](../data/labeled/benoit_crowdsourced_2016/instructions/econ_social_policy.md) for a copy of their original task instructions

In [6]:
instructions = """
This task involves reading sentences from political texts and judging whether these deal with economic or social policy.

The sentences you will be asked to interpret come from political party manifestos.
Some of these sentences will deal with economic policy; some will deal with social policy; other sentences will deal with neither economic nor social policy. We tell you below about what we mean by "economic" and "social" policy.

First, you will read a short section from a party manifesto.
For the sentence highlighted in red, enter your best judgment about whether it mainly refers to economic policy, to social policy, or to neither.

If the sentence refers to economic policy, select "economic" in the drop down menu; if it refers to social policy, select "social".
If the sentence does not refer to either policy area, select "Not Economic or Social" -- in this case you will move directly to the next sentence.

Now we need to tell you about what we mean by "economic" and "social" policy.

## Your task

Classify the input text in one of the following categories: economic, social, neither

## Response format

Only respond with the chosen category and no additional text or explanations 
"""

### Load the data

In [7]:
from utils.io import read_tabular
from utils.finetuning import split_data

fp = data_path+'benoit_crowdsourced_2016-policy_area.csv'
df = read_tabular(fp, columns=['text', 'label', 'metadata__gold'])

df = df[df.metadata__gold]
del df['metadata__gold']

id2label = {
    2: "economic",
    3: "social",
    1: "neither"
}

df.label = df.label.map(id2label)

# subset to 50 examples per label class
df = df.groupby('label').sample(n=50, random_state=SEED)

In [8]:
df.label.value_counts()

label
economic    50
neither     50
social      50
Name: count, dtype: int64

In [9]:
# split into "training" and "test" data
#  - training is used to select few-shot exemplars
#  - test is used to evaluate few-shot classification
# note: if you wanted to compare different exemplar selection
#  strategies or prompt variations, you could add a dev split for that
data_splits = split_data(df, test_size=0.5, dev_size=None, stratify_by='label', return_dict=True)
data_splits.keys()

dict_keys(['train', 'test'])
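
The `split_data` helper comes from this repository's `utils` package and its implementation is not shown here. As a rough illustration of what a stratified 50/50 split does, here is a minimal standard-library sketch; the function `stratified_split` and the toy records are hypothetical and are not the helper used above.

```python
import random
from collections import defaultdict

def stratified_split(records, label_key="label", test_size=0.5, seed=42):
    """Split a list of dicts into train/test so each label keeps its share."""
    # group records by label
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    rng = random.Random(seed)
    splits = {"train": [], "test": []}
    for label in sorted(by_label):
        group = by_label[label][:]      # copy so the input is not mutated
        rng.shuffle(group)
        n_test = round(len(group) * test_size)
        splits["test"].extend(group[:n_test])
        splits["train"].extend(group[n_test:])
    return splits

# toy data: 4 placeholder sentences per label (hypothetical)
toy = [{"text": f"{label} sentence {i}", "label": label}
       for label in ("economic", "social", "neither") for i in range(4)]
toy_splits = stratified_split(toy)
```

Splitting within each label group is what keeps the class balance identical in both halves.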

## Sample exemplars

There are various ways of sampling few-shot exemplars.

Here, we take the **most representative exemplars** by:

1. embedding exemplars
2. computing the centroid for each label class
3. and ranking exemplars in terms of their closeness to their class centroid
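
To make the three steps concrete before running them on real embeddings, here is a toy sketch with hand-written 2-dimensional vectors standing in for embeddings. It uses only the standard library; `cosine`, `rank_by_centroid`, and `toy_vectors` are illustrative names, not part of this notebook's code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_centroid(vectors):
    """Return indices of `vectors`, most similar to their mean vector first."""
    dim = len(vectors[0])
    # centroid = element-wise mean of all vectors in the class
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    sims = [cosine(centroid, v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: sims[i], reverse=True)

# hypothetical "embeddings" for three texts of one label class
toy_vectors = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
ranking = rank_by_centroid(toy_vectors)
print(ranking)
```

The second vector points most nearly in the direction of the class mean, so it ranks first, and the outlier `[0.0, 1.0]` ranks last.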

In [10]:
import numpy as np
from typing import List
def embed(texts: List[str]):
    # see https://platform.openai.com/docs/guides/embeddings/what-are-embeddings?lang=python
    texts = [text.replace("\n", " ") for text in texts] 
    res = client.embed(
        input=texts,
        model=MODEL
        # alternatively, take 'mxbai-embed-large' for faster embedding
    )
    return np.array(res['embeddings'])

In [11]:
from tqdm.notebook import tqdm
from sklearn.metrics.pairwise import cosine_similarity

ranked_exemplars = {}
for label_class, group_df in tqdm(data_splits['train'].groupby('label')):
    # get embeddings from the ollama model
    embeddings = embed(group_df.text.to_list())
    # compute centroid
    centroid = embeddings.mean(axis=0)
    # compute cosine similarity between the centroid and the embeddings
    sims = cosine_similarity([centroid], embeddings)
    # rank the examples by similarity, descending
    ranked_indices = np.argsort(sims[0])[::-1]
    # add the texts to the output, most representative first
    ranked_exemplars[label_class] = group_df.text.iloc[ranked_indices]

In [12]:
import random
def get_n_examples(ranked_exemplars, k, shuffle=True, random_state=SEED):
    exs = []
    for label_class, exemplars in ranked_exemplars.items():
        ex = exemplars[:k].to_list()
        ex = [{'text': text, 'label': label_class} for text in ex]
        exs.extend(ex)
    if shuffle:
        random.Random(random_state).shuffle(exs)
    return exs

In [13]:
get_n_examples(ranked_exemplars, 3)

[{'text': 'Rigid dogmas, the overriding need for party unity, and indiscriminate three-line whips have all helped to create a climate of conflict and rancour.',
  'label': 'neither'},
 {'text': 'We will encourage the recruitment of ethnic minorities into the police force and require action to be taken against discrimination within the force.',
  'label': 'social'},
 {'text': 'We have encouraged tougher sentences for violent criminals.',
  'label': 'social'},
 {'text': 'We seek the support of the British people to make this achievement truly secure, to build upon it and to extend its benefits to all.',
  'label': 'neither'},
 {'text': 'We will strengthen the law on public order to combat racial hatred and take firm action against the growing menace of racial attacks.',
  'label': 'social'},
 {'text': 'In particular, we will cut income tax still further and reduce the basic rate to 25p in the £ as soon as we prudently can.',
  'label': 'economic'},
 {'text': 'From the White House through

In [14]:
def convert_exemplars_list_to_convo(exemplars):
    convo = []
    for ex in exemplars:
        convo.append({"role": "user", "content": f"'''{ex['text']}'''"})
        convo.append({"role": "assistant", "content": ex['label']})
    return convo

In [15]:
convert_exemplars_list_to_convo(get_n_examples(ranked_exemplars, 3))[:6]

[{'role': 'user',
  'content': "'''Rigid dogmas, the overriding need for party unity, and indiscriminate three-line whips have all helped to create a climate of conflict and rancour.'''"},
 {'role': 'assistant', 'content': 'neither'},
 {'role': 'user',
  'content': "'''We will encourage the recruitment of ethnic minorities into the police force and require action to be taken against discrimination within the force.'''"},
 {'role': 'assistant', 'content': 'social'},
 {'role': 'user',
  'content': "'''We have encouraged tougher sentences for violent criminals.'''"},
 {'role': 'assistant', 'content': 'social'}]

### A single text example

In [16]:
text_input = data_splits['test'].text.values[0]

In [17]:
# convert to conversation history
messages = [
  # system prompt
  {"role": "system", "content": instructions},
  # exemplars
  *convert_exemplars_list_to_convo(get_n_examples(ranked_exemplars, 3))[:6],
  # user input
  {"role": "user", "content": f"'''{text_input}'''"},
]

In [18]:
# `num_predict` is Ollama's cap on generated tokens (`max_tokens` is not an Ollama option)
options = {'seed': 42, 'temperature': 0.0, 'num_predict': 3}
response = client.chat(
  model=MODEL,
  messages=messages,
  options=options
)

In [19]:
# parse the response
response['message']

{'role': 'assistant', 'content': 'economic'}

In [20]:
data_splits['test'].label.values[0]

'economic'

### Iterate over multiple examples

Let's first define a custom function to classify a single text (here, a manifesto sentence):

In [21]:
from typing import List, Dict
def classify_tweet(text, model: str, system_message: str, exemplars: List[Dict]):

  # clean the text 
  text = re.sub(r'\s+', ' ', text).strip()

  # construct input

  messages = [
    # system prompt
    {"role": "system", "content": system_message},
    # exemplars
    *convert_exemplars_list_to_convo(exemplars),
    # user input
    {"role": "user", "content": f"'''{text}'''"},
  ]

  # `num_predict` is Ollama's cap on generated tokens (`max_tokens` is not an Ollama option)
  options = {'seed': 42, 'temperature': 0.0, 'num_predict': 3}
  response = client.chat(
    model=model,  # use the function argument rather than the global MODEL
    messages=messages,
    options=options
  )
  
  if 'message' not in response:
      print("WARNING: Response should have a 'message'")
      return None
  if not response['done'] or response['done_reason'] != 'stop':
      print("WARNING: Response should have 'done_reason' of 'stop' but got:", response['done_reason'])
      return None

  return response['message']['content']
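
The function returns the model's raw text, or `None` after a warning. Before scoring, it may help to map each response onto the three expected labels, so that stray whitespace, casing, or punctuation does not count as a separate class. The helper below is a hypothetical sketch, not part of the original notebook; `VALID_LABELS` and `normalize_label` are assumed names.

```python
VALID_LABELS = ("economic", "social", "neither")

def normalize_label(raw, valid_labels=VALID_LABELS):
    """Map a raw model response onto one of the expected labels, else None."""
    if raw is None:
        return None
    # drop surrounding whitespace, quotes and trailing periods, ignore case
    cleaned = raw.strip().strip("'\".").lower()
    return cleaned if cleaned in valid_labels else None

print(normalize_label(" Economic\n"))
print(normalize_label("I think this is economic"))
```

Responses that do not reduce to exactly one of the labels come back as `None`, which makes them easy to count or inspect separately.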

Now we can iterate over example texts:

In [22]:
from tqdm.notebook import tqdm

texts = data_splits['test'].text.to_list()
exemplars = get_n_examples(ranked_exemplars, 3)
classifications = [classify_tweet(text, model=MODEL, system_message=instructions, exemplars=exemplars) for text in tqdm(texts)]

In [23]:
from sklearn.metrics import classification_report

cr = classification_report(
    y_true = data_splits['test'].label.to_list(),
    y_pred = classifications
)

print(cr)

              precision    recall  f1-score   support

    economic       1.00      0.32      0.48        25
     neither       0.85      0.92      0.88        25
      social       0.60      0.96      0.74        25

    accuracy                           0.73        75
   macro avg       0.82      0.73      0.70        75
weighted avg       0.82      0.73      0.70        75
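
In the report, "economic" has perfect precision but a recall of only 0.32, so most economic sentences are being assigned to another class. The low precision for "social" suggests that is where many of them end up. A cross-tabulation of true against predicted labels shows this directly. The sketch below uses only the standard library and hypothetical toy labels; in this notebook you would pass `data_splits['test'].label.to_list()` and `classifications` instead.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))

# hypothetical toy labels, NOT the notebook's actual predictions
y_true = ["economic", "economic", "economic", "social", "neither"]
y_pred = ["economic", "social", "social", "social", "neither"]

counts = confusion_counts(y_true, y_pred)
for (true_label, pred_label), n in sorted(counts.items()):
    print(f"true={true_label:<9} predicted={pred_label:<9} n={n}")
```

Off-diagonal pairs with large counts point to the confusions worth addressing, for example by adjusting the instructions or the exemplars.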

