<a href="https://colab.research.google.com/github/TurkuNLP/DIGHT25/blob/main/01_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DSPy tutorial part 1

This notebook:

*   Loads a dataset of climate change claims
*   Classifies a random sample of those claims using DSPy
*   Compares the predicted labels with the dataset annotations

IMPORTANT

Before you start, click the key icon in the left sidebar and create a new secret
named "openai-api-key" whose value is the API key you get from me. Then enable "Notebook access" for it.
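
The notebook later reads this secret with `userdata.get("openai-api-key")`. A minimal sketch of that lookup with a fallback for running outside Colab; the helper name `get_api_key` and the `OPENAI_API_KEY` environment variable fallback are assumptions for illustration, not part of the notebook:

```python
import os


def get_api_key(secret_name="openai-api-key", env_var="OPENAI_API_KEY"):
    """Return the API key from Colab secrets, else from an environment variable.

    Both this helper and the env_var fallback are illustrative assumptions.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            return userdata.get(secret_name)
        except Exception:
            # Secret missing or notebook access not granted; try the fallback.
            pass

    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found: create the Colab secret {secret_name!r} "
            f"or set the {env_var} environment variable."
        )
    return key
```

If the lookup fails inside Colab, the usual causes are a misspelled secret name or "Notebook access" left disabled.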

In [1]:
!pip3 install -q 'datasets<4.0.0'
!pip3 install -q dspy

In [2]:
from datasets import load_dataset
from google.colab import userdata
import random


# Load the Frugal AI Challenge dataset
dataset = load_dataset("QuotaClimat/frugalaichallenge-text-train")

# Access the training split
train_data = dataset["train"]

# Pick 30 random items
sampled_items = random.sample(list(train_data), 30)

# Print out a few samples to see the structure
for item in sampled_items[:10]:  # print the first 10 samples
    print(" Quote:", item["quote"])
    print(" Label:", item["label"])
    print()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


 Quote: Greece, like many other EU countries, will remain in this paradigm. We should not dream that tomorrow everything will be 100% renewable.
 Label: 4_solutions_harmful_unnecessary

 Quote: The alarmism is not going away without a struggle.
 Label: 6_proponents_biased

 Quote: Why was there a warming period in the Roman times over a couple of hundred years? It was exceptionally warm and then it got cold.
 Label: 2_not_human

 Quote: Proponents of policies to control human-induced global warming cite science as the basis for their claims and proposals. There is only one problem – as much as they claim otherwise, there is no scientific consensus for their theories.
 Label: 5_science_unreliable

 Quote: The assumptions do not lead us to conclude that we should venture forth and take costly action to reduce emissions. […] There is a matter of benefits and costs. Is it possible that global climate change is on balance helpful to human populations and the environment? And if not, is it p
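
Before classifying, it can help to check how the labels are distributed in the random sample, since a sample of 30 may miss some classes entirely. A minimal sketch using `collections.Counter`; `label_counts` is a hypothetical helper, and the three `demo_items` rows are copied from the output above and stand in for `sampled_items`:

```python
from collections import Counter


def label_counts(items):
    """Count how often each label occurs in a list of {"quote", "label"} dicts."""
    return Counter(item["label"] for item in items)


# Stand-in for sampled_items: three rows copied from the printed output.
demo_items = [
    {
        "quote": "Greece, like many other EU countries, will remain in this "
                 "paradigm. We should not dream that tomorrow everything "
                 "will be 100% renewable.",
        "label": "4_solutions_harmful_unnecessary",
    },
    {
        "quote": "The alarmism is not going away without a struggle.",
        "label": "6_proponents_biased",
    },
    {
        "quote": "Why was there a warming period in the Roman times over a "
                 "couple of hundred years? It was exceptionally warm and "
                 "then it got cold.",
        "label": "2_not_human",
    },
]

# Print labels from most to least frequent.
for label, count in label_counts(demo_items).most_common():
    print(f"{count:3d}  {label}")
```

In the notebook itself, `label_counts(sampled_items)` would give the distribution for the actual sample.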

In [4]:
import dspy


class ClimateViewClassifier(dspy.Signature):
    """Classify a quote or claim about climate change using one of the following labels:

    Not-Relevant: No relevant claim detected or claims that don't fit other categories

    Not-Happening: Claims denying the occurrence of global warming and its effects - Global warming is not happening. Climate change is NOT leading to melting ice (such as glaciers, sea ice, and permafrost), increased extreme weather, or rising sea levels. Cold weather also shows that climate change is not happening

    Not-Human: Claims denying human responsibility in climate change - Greenhouse gases from humans are not causing climate change.

    Not-Bad: Claims minimizing or denying negative impacts of climate change - The impacts of climate change will not be bad and might even be beneficial.

    Solutions-Harmful-Unnecessary: Claims against climate solutions - Climate solutions are harmful or unnecessary

    Science-is-Unreliable: Claims questioning climate science validity - Climate science is uncertain, unsound, unreliable, or biased.

    Proponents-Biased: Claims attacking climate scientists and activists - Climate scientists and proponents of climate action are alarmist, biased, wrong, hypocritical, corrupt, and/or politically motivated.

    Fossil-Fuels-Needed: Claims promoting fossil fuel necessity - We need fossil fuels for economic growth, prosperity, and to maintain our standard of living.
    """

    quote = dspy.InputField(desc="Quote or claim about climate change")
    label = dspy.OutputField(
        desc="Label of the quote",
        # List every label described in the docstring, including "Not-Relevant".
        choices=[
            "Not-Relevant", "Not-Happening", "Not-Human", "Not-Bad",
            "Solutions-Harmful-Unnecessary", "Science-is-Unreliable",
            "Proponents-Biased", "Fossil-Fuels-Needed",
        ],
    )

# Configure DSPy to use OpenAI's gpt-4.1-mini, with the API key read from the Colab secret
lm = dspy.LM("openai/gpt-4.1-mini", api_key=userdata.get("openai-api-key"))
dspy.configure(lm=lm)

# Instantiate classifier
classifier = dspy.Predict(signature=ClimateViewClassifier)

# Run predictions on our 30 samples
for item in sampled_items:
    quote = item["quote"]
    true_label = item["label"]
    pred = classifier(quote=quote)
    print("Text:", quote.replace("\n", " "))
    print("True:", true_label)
    print("Pred:", pred.label)
    print("-" * 40)


Text: Greece, like many other EU countries, will remain in this paradigm. We should not dream that tomorrow everything will be 100% renewable.
True: 4_solutions_harmful_unnecessary
Pred: Solutions-Harmful-Unnecessary
----------------------------------------
Text: The alarmism is not going away without a struggle.
True: 6_proponents_biased
Pred: Proponents-Biased
----------------------------------------
Text: Why was there a warming period in the Roman times over a couple of hundred years? It was exceptionally warm and then it got cold.
True: 2_not_human
Pred: Not-Human
----------------------------------------
Text: Proponents of policies to control human-induced global warming cite science as the basis for their claims and proposals. There is only one problem – as much as they claim otherwise, there is no scientific consensus for their theories.
True: 5_science_unreliable
Pred: Science-is-Unreliable
----------------------------------------
Text: The assumptions do not lead us to conclu
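
The dataset labels (for example `4_solutions_harmful_unnecessary`) and the predicted labels (`Solutions-Harmful-Unnecessary`) are spelled differently, so comparing them needs a mapping. A minimal sketch: `DATASET_TO_PRED` is an assumption inferred from the label names in the output above and the signature's docstring (the four dataset labels not shown in the output are assumed to follow the same pattern), `accuracy` is a hypothetical helper, and `results` holds the four complete true/predicted pairs printed above:

```python
# Assumed mapping from dataset label strings to the names used in the signature.
DATASET_TO_PRED = {
    "0_not_relevant": "Not-Relevant",
    "1_not_happening": "Not-Happening",
    "2_not_human": "Not-Human",
    "3_not_bad": "Not-Bad",
    "4_solutions_harmful_unnecessary": "Solutions-Harmful-Unnecessary",
    "5_science_unreliable": "Science-is-Unreliable",
    "6_proponents_biased": "Proponents-Biased",
    "7_fossil_fuels_needed": "Fossil-Fuels-Needed",
}


def accuracy(pairs):
    """Return (fraction correct, mismatched pairs) for (true, pred) label pairs."""
    if not pairs:
        raise ValueError("no predictions to score")
    mismatches = [
        (true, pred)
        for true, pred in pairs
        if DATASET_TO_PRED.get(true) != pred.strip()
    ]
    return 1 - len(mismatches) / len(pairs), mismatches


# The four complete results shown in the output above.
results = [
    ("4_solutions_harmful_unnecessary", "Solutions-Harmful-Unnecessary"),
    ("6_proponents_biased", "Proponents-Biased"),
    ("2_not_human", "Not-Human"),
    ("5_science_unreliable", "Science-is-Unreliable"),
]

score, mismatches = accuracy(results)
print(f"accuracy: {score:.2f}, mismatches: {len(mismatches)}")
```

In the notebook, the pairs could be collected inside the prediction loop as `(true_label, pred.label)`. The mismatched pairs are worth reading by hand, since a disagreement may reflect a debatable annotation rather than a model error.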

# Follow-up task(s)

1.   Is the model accurate?
2.   When the model disagrees with the dataset annotation, is the model necessarily wrong?
3.   If you change the model from gpt-4.1-mini to gpt-4.1, is there any difference? (Do not rerun the whole notebook, so that you keep your random sample.)
4.   Think of another classification schema for these claims and modify the DSPy code to implement it. Did it work?
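
For the model comparison task, one option is to store each model's predictions in a list aligned with `sampled_items` and then diff the two lists. A minimal sketch in plain Python; `diff_predictions` is a hypothetical helper, and the quotes and labels below are made-up placeholders, not real model output:

```python
def diff_predictions(quotes, preds_a, preds_b):
    """Return (agreement rate, rows where the two prediction lists differ)."""
    if not (len(quotes) == len(preds_a) == len(preds_b)):
        raise ValueError("quotes and both prediction lists must have equal length")
    if not quotes:
        raise ValueError("nothing to compare")
    differing = [
        (quote, a, b)
        for quote, a, b in zip(quotes, preds_a, preds_b)
        if a != b
    ]
    return 1 - len(differing) / len(quotes), differing


# Placeholder data for illustration only; not real model output.
quotes = ["quote 1", "quote 2", "quote 3", "quote 4"]
preds_mini = ["Not-Human", "Not-Bad", "Proponents-Biased", "Not-Relevant"]
preds_full = ["Not-Human", "Not-Bad", "Science-is-Unreliable", "Not-Relevant"]

rate, differing = diff_predictions(quotes, preds_mini, preds_full)
print(f"agreement: {rate:.2f}")
for quote, a, b in differing:
    print(f"{quote!r}: {a} vs {b}")
```

To fill the lists with real predictions, change only the model name in the cell that creates `lm` and rerun that cell, leaving the sampling cell untouched.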
