<a href="https://colab.research.google.com/github/TurkuNLP/DIGHT25/blob/main/01_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DSPy tutorial part 1:

*   Grabs a dataset of climate change claims
*   Classifies the claims using DSPy
*   Compares the predicted labels with the dataset annotations

IMPORTANT

Before you start, click the "key" symbol in the left sidebar and create a new secret
named "openai-api-key" whose value is the key you get from me. Enable "Notebook access" for it.

In [7]:
!pip3 install -q 'datasets<4.0.0'
!pip3 install -q dspy

In [8]:
from datasets import load_dataset
from google.colab import userdata
import random



# Load the Frugal AI Challenge dataset
dataset = load_dataset("QuotaClimat/frugalaichallenge-text-train")

# Access the training split
train_data = dataset["train"]

# Pick 10 random items
sampled_items = random.sample(list(train_data), 10)

# Print the sampled items to see the structure
for item in sampled_items:
    print(" Quote:", item["quote"])
    print(" Label:", item["label"])
    print()

 Quote: There is a lot of confusion in Asia about Alaska and the opportunities here, so we felt it was important to do this with BP and for us to be seen standing alongside each other. It’s an important way of demonstrating alignment,
 Label: 0_not_relevant

 Quote: “While record low sea ice is nothing new in the Arctic, this is a surprising turn of events for the Antarctic. Even as sea ice in the Arctic has seen a rapid and consistent decline over the past decade, its counterpart in the Southern Hemisphere has seen its extent increasing.”
 Label: 1_not_happening

 Quote: In the end, it’s the opinion of climatologists that matter in the global warming debate, and the polls clearly show that the experts are skeptical,
 Label: 6_proponents_biased

 Quote: I want to remind you what climate actually is. What kind of expertise [you need to study it]. If you want to study climate just because of carbon dioxide, I’m sorry, it’s still not enough to study only the atmosphere. You’ve got to know

In [11]:
import dspy


class ClimateViewClassifier(dspy.Signature):
    """Classify a quote or claim about climate change using one of the following labels:

    Not-Relevant: No relevant claim detected or claims that don't fit other categories

    Not-Happening: Claims denying the occurrence of global warming and its effects - Global warming is not happening. Climate change is NOT leading to melting ice (such as glaciers, sea ice, and permafrost), increased extreme weather, or rising sea levels. Cold weather also shows that climate change is not happening

    Not-Human: Claims denying human responsibility in climate change - Greenhouse gases from humans are not causing climate change.

    Not-Bad: Claims minimizing or denying negative impacts of climate change - The impacts of climate change will not be bad and might even be beneficial.

    Solutions-Harmful-Unnecessary: Claims against climate solutions - Climate solutions are harmful or unnecessary

    Science-is-Unreliable: Claims questioning climate science validity - Climate science is uncertain, unsound, unreliable, or biased.

    Proponents-Biased: Claims attacking climate scientists and activists - Climate scientists and proponents of climate action are alarmist, biased, wrong, hypocritical, corrupt, and/or politically motivated.

    Fossil-Fuels-Needed: Claims promoting fossil fuel necessity - We need fossil fuels for economic growth, prosperity, and to maintain our standard of living.
    """

    quote = dspy.InputField(desc="Quote or claim about climate change")
    label = dspy.OutputField(
        desc="Label of the quote",
        # "Not-Relevant" is described in the docstring, so it must be an allowed choice too
        choices=["Not-Relevant","Not-Happening","Not-Human","Not-Bad","Solutions-Harmful-Unnecessary","Science-is-Unreliable","Proponents-Biased","Fossil-Fuels-Needed"],
    )

# Configure DSPy with an OpenAI model; the API key is read from the Colab secret
lm = dspy.LM("openai/gpt-4.1-mini", api_key=userdata.get("openai-api-key"))
dspy.configure(lm=lm)

# Instantiate classifier
classifier = dspy.Predict(signature=ClimateViewClassifier)

# Run predictions on our 10 sampled items
for item in sampled_items:
    quote = item["quote"]
    true_label = item["label"]
    pred = classifier(quote=quote)
    print("Text:", quote.replace("\n", " "))
    print("True:", true_label)
    print("Pred:", pred.label)
    print("-" * 40)


Text: There is a lot of confusion in Asia about Alaska and the opportunities here, so we felt it was important to do this with BP and for us to be seen standing alongside each other. It’s an important way of demonstrating alignment,
True: 0_not_relevant
Pred: Not-Relevant
----------------------------------------
Text: “While record low sea ice is nothing new in the Arctic, this is a surprising turn of events for the Antarctic. Even as sea ice in the Arctic has seen a rapid and consistent decline over the past decade, its counterpart in the Southern Hemisphere has seen its extent increasing.”
True: 1_not_happening
Pred: Not-Happening
----------------------------------------
Text: In the end, it’s the opinion of climatologists that matter in the global warming debate, and the polls clearly show that the experts are skeptical,
True: 6_proponents_biased
Pred: Science-is-Unreliable
----------------------------------------
Text: I want to remind you what climate actually is. What kind of exp

# Follow-up task(s)

1.   Is the model accurate?
2.   Is the model necessarily wrong when it disagrees with the dataset annotation?
3.   Think of another classification schema for these claims, and try to modify the DSPy code to implement it. Did it work?
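
One possible starting point for the accuracy question is sketched below. The dataset labels (e.g. `6_proponents_biased`) and the DSPy labels (e.g. `Proponents-Biased`) are spelled differently, so they have to be aligned before comparing. The sketch assumes that the numeric prefix of a dataset label matches the position of the category in the signature docstring; this holds for the three labels visible in the output above, but has not been checked for the others. The helper names `to_dspy_label` and `accuracy` are made up for this illustration.

```python
# DSPy labels in the order they appear in the signature docstring.
# Assumption: the numeric prefix of a dataset label ("6_proponents_biased")
# is the index of the matching label in this list.
DSPY_LABELS = [
    "Not-Relevant",
    "Not-Happening",
    "Not-Human",
    "Not-Bad",
    "Solutions-Harmful-Unnecessary",
    "Science-is-Unreliable",
    "Proponents-Biased",
    "Fossil-Fuels-Needed",
]


def to_dspy_label(dataset_label):
    """Map a dataset label such as '1_not_happening' to 'Not-Happening'."""
    index = int(dataset_label.split("_", 1)[0])
    return DSPY_LABELS[index]


def accuracy(pairs):
    """Fraction of (dataset_label, predicted_label) pairs that agree."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(to_dspy_label(true) == pred.strip() for true, pred in pairs)
    return correct / len(pairs)


# The three complete predictions shown in the output above.
example_pairs = [
    ("0_not_relevant", "Not-Relevant"),
    ("1_not_happening", "Not-Happening"),
    ("6_proponents_biased", "Science-is-Unreliable"),
]
print(f"Accuracy: {accuracy(example_pairs):.2f}")  # prints "Accuracy: 0.67"
```

In the notebook, the pairs could be collected inside the prediction loop as `(item["label"], pred.label)`. With only 10 sampled items the resulting number is a rough indication at best, and a disagreement is worth reading by hand before counting it as a model error.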
