# Ollama: Exercises

We saw that the first results of the Llama 3.2 model were not great (to say the least).

Try getting better results via prompt engineering, few-shot classification and using a different model! All of these techniques are viable approaches to improving performance, so feel free to tackle them in any order you like - or use them all together to get the best results!

_Hint_: Larger models will require more powerful hardware to run efficiently. Keep this in mind when trying different models.

In [None]:
# we're loading up our trusty UK media dataset and do some minor data cleaning

import pandas as pd

seed = 20250228

# set the size of the samples. feel free to adjust this for faster iterations
n = 100

uk_media = pd.read_csv('data/uk_media.csv')

# fillna('') makes sure missing values don't result in NaN entries
uk_media['text'] = uk_media['description'].fillna('') + ' ' + uk_media['subtitle'].fillna('')

# we'll also drop duplicates indicated by the filter_duplicate column
uk_media = uk_media[uk_media['filter_duplicate'] == 0]

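# we'll also drop rows where text is NaN (after the fillna('') above this should no longer
# remove anything, but we keep the check so the cleaning steps match the earlier notebook)
uk_media = uk_media[uk_media['text'].notna()]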

# drop rows with majortopic code 0
uk_media = uk_media[uk_media['majortopic'] != 0]

# only keep rows with a majortopic code below 24 OR equal to 99
uk_media = uk_media[(uk_media['majortopic'] < 24) | (uk_media['majortopic'] == 99)]

# drop category 22, which is not in the CAP
uk_media = uk_media[uk_media['majortopic'] != 22]

# turn the majortopic column into a string
uk_media['majortopic'] = uk_media['majortopic'].astype(str)

# this will be the same sample as before, since we set a seed
uk_media_sample = uk_media.sample(n=n, random_state=seed) 
uk_media_sample.reset_index(drop = True, inplace = True) # reset index

In [None]:
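# our classification function - simply replacing the OpenAI client with our ChatOllama client

import re

# assumption: ChatOllama comes from the langchain-ollama package; adjust the import if your setup differs
from langchain_ollama import ChatOllama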

def classify_text(text, system_message, model):

  # clean the text by removing extra spaces
  text = re.sub(r'\s+', ' ', text).strip()

  # construct input

  messages = [
    # system prompt
    {"role": "system", "content": system_message}, # this will contain all instructions for the model
    # user input
    {"role": "user", "content": text}, # text here is the input text to be classified
  ]

  # note that we set parameters such as temperature when setting the client up, rather than when calling it
  llm = ChatOllama(model = model,
                  temperature=0.0,
                  num_ctx = 20000, # this sets the size of the context window!
                  # you can add additional parameters here
                  )

  response = llm.invoke(messages)

  return response.content

## Prompt Engineering

As with the OpenAI GPT models, prompt engineering (changing the wording of the prompt) can have a large impact on model performance, and some models are more sensitive to it than others. As before, try engineering the prompt to get better results (or re-use the prompts you engineered for the GPT models earlier).

In [None]:
# Edit the prompt here

# the CAP labels
cap_labels = {
    "1": "Issues related to general domestic macroeconomic policy; Interest Rates; Unemployment Rate; Monetary Policy; National Budget; Tax Code; Industrial Policy; Price Control; other macroeconomics subtopics",
    "2": "Issues related generally to civil rights and minority rights; Minority Discrimination; Gender Discrimination; Age Discrimination; Handicap Discrimination; Voting Rights; Freedom of Speech; Right to Privacy; Anti-Government; other civil rights subtopics",
    "3": "Issues related generally to health care, including appropriations for general health care government agencies; Health Care Reform; Insurance; Drug Industry; Medical Facilities; Insurance Providers; Medical Liability; Manpower; Disease Prevention; Infants and Children; Mental Health; Long-term Care; Drug Coverage and Cost; Tobacco Abuse; Drug and Alcohol Abuse; health care research and development; issues related to other health care topics",
    "4": "Issues related to general agriculture policy, including appropriations for general agriculture government agencies; agricultural foreign trade; Subsidies to Farmers; Food Inspection & Safety; Food Marketing & Promotion; Animal and Crop Disease; Fisheries & Fishing; agricultural research and development; issues related to other agricultural subtopics",
    "5": "Issues generally related to labor, employment, and pensions, including appropriations for government agencies regulating labor policy; Worker Safety; Employment Training; Employee Benefits; Labor Unions; Fair Labor Standards; Youth Employment; Migrant and Seasonal workers; Issues related to other labor policy",
    "6": "Issues related to General education policy, including appropriations for government agencies regulating education policy; Higher education, student loans and education finance, and the regulation of colleges and universities; Elementary & Secondary education; Underprivileged students; Vocational education; Special education and education for the physically or mentally handicapped; Education Excellence; research and development in education; issues related to other subtopics in education policy",
    "7": "Issues related to General environmental policy, including appropriations for government agencies regulating environmental policy; Drinking Water; Waste Disposal; Hazardous Waste; Air Pollution; Recycling; Indoor Hazards; Species & Forest; Land and Water Conservation; research and development in environmental technology, not including alternative energy; issues related to other environmental subtopics",
    "8": "Issues generally related to energy policy, including appropriations for government agencies regulating energy policy; Nuclear energy, safety and security, and disposal of nuclear waste; Electricity; Natural Gas & Oil; Coal; Alternative & Renewable Energy; Issues related to energy conservation and energy efficiency; issues related to energy research and development; issues related to other energy subtopics",
    "9": "Issues related to immigration, refugees, and citizenship",
    "10": "Issues related generally to transportation, including appropriations for government agencies regulating transportation policy; mass transportation construction, regulation, safety, and availability; public highway construction, maintenance, and safety; Air Travel; Railroad Travel; Maritime transportation; Infrastructure and public works, including employment initiatives; transportation research and development; issues related to other transportation subtopics",
    "12": "Issues related to general law, crime, and family issues; law enforcement agencies, including border, customs, and other specialized enforcement agencies and their appropriations; White Collar Crime; Illegal Drugs; Court Administration; Prisons; Juvenile Crime; Child Abuse; Family Issues, domestic violence, child welfare, family law; Criminal & Civil Code; Crime Control; Police; issues related to other law, crime, and family subtopics",
    "13": "Issues generally related to social welfare policy; Low-Income Assistance; Elderly Assistance; Disabled Assistance; Volunteer Associations; Child Care; issues related to other social welfare policy subtopics",
    "14": "Issues related generally to housing and urban affairs; housing and community development, neighborhood development, and national housing policy; urban development and general urban issues; Rural Housing; economic, infrastructure, and other developments in non-urban areas; housing for low-income individuals and families, including public housing projects and housing affordability programs; housing for military veterans and their families, including subsidies for veterans; housing for the elderly, including housing facilities for the handicapped elderly; housing for the homeless and efforts to reduce homelessness ; housing and community development research and development; Other issues related to housing and community development",
    "15": "Issues generally related to domestic commerce, including appropriations for government agencies regulating domestic commerce; Banking; Securities & Commodities; Consumer Finance; Insurance Regulation; personal, commercial, and municipal bankruptcies; corporate mergers, antitrust regulation, corporate accounting and governance, and corporate management; Small Businesses; Copyrights and Patents; Disaster Relief; Tourism; Consumer Safety; Sports Regulation; domestic commerce research and development; other domestic commerce policy subtopics",
    "16": "Issues related generally to defense policy, and appropriations for agencies that oversee general defense policy; defense alliance and agreement, security assistance, and UN peacekeeping activities; military intelligence, espionage, and covert operations; military readiness, coordination of armed services air support and sealift capabilities, and national stockpiles of strategic materials; Nuclear Arms; Military Aid; military manpower, military personnel and their dependents, military courts, and general veterans' issues; military procurement, conversion of old equipment, and weapons systems evaluation; military installations, construction, and land transfers; military reserves and reserve affairs; military nuclear and hazardous waste disposal and military environmental compliance; domestic civil defense, national security responses to terrorism, and other issues related to homeland security; non-contractor civilian personnel, civilian employment in the defense industry, and military base closings; military contractors and contracting, oversight of military contractors and fraud by military contractors; Foreign Operations; claims against the military, settlements for military dependents, and compensation for civilians injured in military operations; defense research and development; other defense policy subtopics",
    "17": "Issues related to general space, science, technology, and communications; government use of space and space resource exploitation agreements, government space programs and space exploration, military use of space; regulation and promotion of commercial use of space, commercial satellite technology, and government efforts to encourage commercial space development; science and technology transfer and international science cooperation; Telecommunications; Broadcast; Weather Forecasting; computer industry, regulation of the internet, and cyber security; space, science, technology, and communication research and development not mentioned in other subtopics; other issues related to space, science, technology, and communication research and development",
    "18": "Issues generally related to foreign trade and appropriations for government agencies generally regulating foreign trade; Trade Agreements; Exports; Private Investments; productivity of competitiveness of domestic businesses and balance of payments issues; Tariff & Imports; Exchange Rates; other foreign trade policy subtopics",
    "19": "Issues related to general international affairs and foreign aid, including appropriations for general government foreign affairs agencies; Foreign Aid; Resources Exploitation; Developing Countries; International Finance; Western Europe; issues related specifically to a foreign country or region not codable using other codes, assessment of political issues in other countries, relations between individual countries; Human Rights; International organizations, NGOs, the United Nations, International Red Cross, UNESCO, International Olympic Committee, International Criminal Court; international terrorism, hijacking, and acts of piracy in other countries, efforts to fight international terrorism, international legal mechanisms to combat terrorism; diplomats, diplomacy, embassies, citizens abroad, foreign diplomats in the country, visas and passports; issues related to other international affairs policy subtopics",
    "20": "Issues related to general government operations, including appropriations for multiple government agencies; Intergovernmental Relations; Bureaucracy; Postal Service; issues related to civil employees not mentioned in other subtopics, government pensions and general civil service issues; issues related to nominations and appointments not mentioned elsewhere; issues related to the currency, national mints, medals, and commemorative coins; government procurement, government contractors, contractor and procurement fraud, and procurement processes and systems; government property management, construction, and regulation; Tax Administration; public scandal and impeachment; government branch relations, administrative issues, and constitutional reforms; regulation of political campaigns, campaign finance, political advertising and voter registration; Census & Statistics; issues related to the capital city; claims against the government, compensation for the victims of terrorist attacks, compensation policies without other substantive provisions; National Holidays; other government operations subtopics",
    "21": "Issues related to general public lands, water management, and territorial issues; National Parks; Indigenous Affairs; natural resources, public lands, and forest management, including forest fires, livestock grazing; water resources, water resource development and civil works, flood control, and research; territorial and dependency issues and devolution; other public lands policy subtopics",
    "23": "Issues related to general cultural policy issues",
    "99": "Other issues, where none of the above is appropriate.", # dummy category
}

# give the model some context for the task it is about to perform
context = """
You are a political scientist tasked with annotating documents into policy categories. 
The documents can be classified as one of the following numbered categories. 
The description of each category follows the ':' sign.
"""

# turn the CAP dictionary into a string
labels_definitions = ""

for label, definition in cap_labels.items():
    labels_definitions += f'{label}: {definition}\n'

# finally, the question we want the model to answer, including specific instructions for the output
question = """
Which policy category does this document belong to? 
Answer only with the number of the category, and only with a single category.
"""

# now we combine the parts into the system prompt
system_message = f"{context}\n{labels_definitions}\n\n{question}"

print(system_message)
print(f'Prompt length: {len(system_message)} characters')
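
The length printed above counts characters, whereas the `num_ctx` setting in `classify_text` is measured in tokens. As a rough sanity check, the sketch below converts characters to an approximate token count. The ratio of about 4 characters per token is only a rule of thumb for English text and an assumption here; the true count depends on each model's tokenizer. The function names and the `reserve` margin are hypothetical helpers, not part of Ollama or LangChain.

```python
def estimate_tokens(prompt: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; chars_per_token is an assumed heuristic."""
    return int(len(prompt) / chars_per_token) + 1


def fits_context(prompt: str, num_ctx: int, reserve: int = 512) -> bool:
    """True if the estimated prompt tokens plus a reserve for the document
    text and the answer still fit into a context window of num_ctx tokens."""
    return estimate_tokens(prompt) + reserve <= num_ctx


# stand-in prompt for illustration; in the notebook you would pass system_message
example_prompt = "You are a political scientist. " * 100
print(estimate_tokens(example_prompt), fits_context(example_prompt, num_ctx=20000))
```

If the estimate gets close to `num_ctx`, consider shortening the label descriptions or raising `num_ctx` (at the cost of more memory).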

In [None]:
# classify

model = "llama3.2"

classification_results = [classify_text(text, 
                                        system_message = system_message, 
                                        model = model) for text in uk_media_sample['text']] # we're looping our function over the texts

In [None]:
# evaluate

classification_results_df = pd.concat([uk_media_sample, 
                                       pd.DataFrame(classification_results, 
                                                    columns = ['result'])],
                                        axis = 1)
classification_results_df

In [None]:
# check which results are longer than 2 characters
classification_results_df[classification_results_df['result'].str.len() > 2]

In [None]:
from sklearn.metrics import classification_report

# replace results with string length > 2 with '99'
classification_results_df.loc[classification_results_df['result'].str.len() > 2, 'result'] = '99'

print(classification_report(classification_results_df["majortopic"], classification_results_df["result"]))
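
Replacing every answer longer than two characters with `'99'` also discards answers that contain a valid code plus some extra text, such as `"Category: 12"` or `"3."`. A more forgiving approach is to pull the first valid code out of the raw answer. The sketch below is one possible way to do this; `parse_label` and `VALID_LABELS` are hypothetical names, and the label set is written out to mirror the keys of `cap_labels`.

```python
import re

# the CAP codes used in this notebook: 1-10, 12-21, 23 and the dummy category 99
VALID_LABELS = {str(i) for i in range(1, 22) if i != 11} | {"23", "99"}


def parse_label(raw: str, valid_labels=VALID_LABELS, fallback: str = "99") -> str:
    """Return the first standalone 1-2 digit number in raw that is a valid
    label; return fallback if no such number is found."""
    for match in re.findall(r"\b\d{1,2}\b", raw):
        if match in valid_labels:
            return match
    return fallback


print(parse_label("Category: 12"))
print(parse_label("I am not sure which category applies."))
```

In the notebook this could be applied with `classification_results_df['result'].apply(parse_label)` before calling `classification_report`. Note that it takes the first valid number it sees, so a long answer that mentions several numbers may still be parsed incorrectly.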

## Few-shot Classification

As before, we can provide the model with some examples, so that it gets better at understanding what we want it to do. Try adjusting parameters, such as using more than 2 examples per category.

In [None]:
# train/test split for examples

from sklearn.model_selection import train_test_split

# this function pulls n samples per category from the dataframe as train, leaving the rest as test
def stratified_train_test_split(df, category_col, n_train_per_category):
    train_dfs = []
    test_dfs = []
    
    for category, group in df.groupby(category_col):
        train_group, test_group = train_test_split(group, train_size=n_train_per_category, random_state=42)
        train_dfs.append(train_group)
        test_dfs.append(test_group)
    
    train_df = pd.concat(train_dfs).reset_index(drop=True)
    test_df = pd.concat(test_dfs).reset_index(drop=True)
    
    return train_df, test_df


train_sample, test_sample = stratified_train_test_split(uk_media, 'majortopic', 2) # we'll use 2 samples per category. Edit this as needed

In [None]:
# sample for few-shot learning

uk_media_sample_fewshot = test_sample.sample(n = n, random_state = seed)

uk_media_sample_fewshot.reset_index(drop = True, inplace = True) # reset index

In [None]:
# Edit the prompt here (this re-uses the labels above)

# give the model some context for the task it is about to perform
context = """
You are a political scientist tasked with annotating documents into policy categories. 
The documents can be classified as one of the following numbered categories. 
The description of each category follows the ':' sign.
You will be provided with two examples for each category to help you make a decision. These are marked with "Examples:".
"""

# turn the CAP dictionary into a string
labels_definitions = ""

for label in cap_labels.keys():
    examples = train_sample[train_sample['majortopic'] == label]['text'].values
    labels_definitions += f'{label}:\n{cap_labels[label]}\nExamples: {examples}\n'

# finally, the question we want the model to answer, including specific instructions for the output
question = """
Which policy category does this document belong to? 
Answer only with the number of the category, and only with a single category.
"""

# now we combine the parts into the system prompt
system_message = f"{context}\n{labels_definitions}\n\n{question}"

print(system_message)
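
Interpolating a NumPy array directly into the f-string inserts its printed representation (brackets, quotes, and no clear separator between examples), which may be harder for a small model to read. One alternative is to write each example on its own line. The sketch below is a minimal illustration; `format_label_block` is a hypothetical helper and the two example texts are made up, not taken from the dataset.

```python
def format_label_block(label, definition, examples):
    """Format one category as 'label: definition' followed by a bulleted
    list of examples, collapsing whitespace inside each example."""
    lines = [f"{label}: {definition}", "Examples:"]
    lines += [f"- {' '.join(example.split())}" for example in examples]
    return "\n".join(lines) + "\n"


# made-up example texts, for illustration only
demo = format_label_block(
    "9",
    "Issues related to immigration, refugees, and citizenship",
    ["Example headline one.", "Example   headline\ntwo."],
)
print(demo)
```

In the loop above, `examples` would be the array selected from `train_sample`, and the returned string would be appended to `labels_definitions`.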

In [None]:
# classify

model = "llama3.2"

classification_results = [classify_text(text, 
                                        system_message = system_message, 
                                        model = model) for text in uk_media_sample_fewshot['text']] # we're looping our function over the texts (the few-shot examples are not part of this sample)

In [None]:
# evaluate

classification_results_df = pd.concat([uk_media_sample_fewshot, 
                                       pd.DataFrame(classification_results, 
                                                    columns = ['result'])],
                                        axis = 1)
classification_results_df

In [None]:
# check which results are longer than 2 characters
classification_results_df[classification_results_df['result'].str.len() > 2]

In [None]:
from sklearn.metrics import classification_report

# replace results with string length > 2 with '99'
classification_results_df.loc[classification_results_df['result'].str.len() > 2, 'result'] = '99'

print(classification_report(classification_results_df["majortopic"], classification_results_df["result"]))

## Different Models

Try different models and see how they fare. You can find available models on the [Ollama website](https://ollama.com/search). You can download them with `!ollama pull` (leave out the `!` if running the command in a terminal rather than a code chunk).

You can also combine different models with different prompts and few-shot learning to see how this affects the results!
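
One way to organise such comparisons is to loop over every combination of model and prompt and record a score for each. The sketch below illustrates the idea under some assumptions: `run_grid` and `accuracy` are hypothetical helpers, `classify` stands for any function with the same arguments as `classify_text`, and a fake classifier is used so the example runs without Ollama.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    pairs = list(zip(y_true, y_pred))
    return sum(t == p for t, p in pairs) / len(pairs) if pairs else 0.0


def run_grid(texts, y_true, models, prompts, classify):
    """Run classify(text, system_message, model) for every model/prompt
    combination and return one result row per combination, best first."""
    rows = []
    for model, (prompt_name, system_message) in product(models, prompts.items()):
        preds = [classify(text, system_message, model) for text in texts]
        rows.append({"model": model,
                     "prompt": prompt_name,
                     "accuracy": accuracy(y_true, preds)})
    return sorted(rows, key=lambda row: row["accuracy"], reverse=True)


def fake_classify(text, system_message, model):
    # stand-in for classify_text so the sketch runs without a local model
    return "1" if "tax" in text else "99"


results = run_grid(["tax cut announced", "football result"],
                   ["1", "15"],
                   ["model-a"],
                   {"zero-shot": "placeholder prompt"},
                   fake_classify)
print(results)
```

With real models each combination means one full pass over the sample, so keep `n` small while exploring.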

In [1]:
!ollama pull deepseek-r1:1.5b

pulling manifest 
pulling aabd4debf0c8... 100% ▕████████████████▏ 1.1 GB
pulling 369ca498f347... 100% ▕████████████████▏  387 B
pulling 6e4c38e1172f... 100% ▕████████████████▏ 1.1 KB
pulling f4d24e9138dd... 100% ▕████████████████▏  148 B
pulling a85fe2a2e58e... 100% ▕████████████████▏  487 B
verifying sha256 digest 
writing manifest 
success


In [None]:
# classify. This reuses the previous (few-shot) system_message, so we also reuse the few-shot sample

model = "deepseek-r1:1.5b"

classification_results = [classify_text(text, 
                                        system_message = system_message, 
                                        model = model) for text in uk_media_sample_fewshot['text']] # we're looping our function over the texts

In [None]:
# evaluate

classification_results_df = pd.concat([uk_media_sample_fewshot, 
                                       pd.DataFrame(classification_results, 
                                                    columns = ['result'])],
                                        axis = 1)
classification_results_df

In [None]:
# check which results are longer than 2 characters
classification_results_df[classification_results_df['result'].str.len() > 2]

In [None]:
from sklearn.metrics import classification_report

# replace results with string length > 2 with '99'
classification_results_df.loc[classification_results_df['result'].str.len() > 2, 'result'] = '99'

print(classification_report(classification_results_df["majortopic"], classification_results_df["result"]))
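
Reasoning models such as `deepseek-r1` tend to return their chain of thought wrapped in `<think>...</think>` tags as part of the response text, at least in some Ollama versions. If that happens, almost every answer will be longer than two characters and end up as `'99'`. The sketch below shows one way to strip such a block before evaluating; `strip_reasoning` is a hypothetical helper, and the tag format is an assumption you should check against your own raw outputs.

```python
import re

# matches a <think>...</think> block, including line breaks inside it
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(raw: str) -> str:
    """Remove any <think>...</think> blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", raw).strip()


print(strip_reasoning("<think>\nThis text is about taxes.\n</think>\n\n1"))
```

Applied in the notebook, this would be `classification_results_df['result'].apply(strip_reasoning)` before checking the length of the results.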

_Note_: These exercises will not have solutions provided, as they are more about exploring different techniques and models, and all relevant code is already provided.