# LLM for stance detection


In this notebook, we will use a Large Language Model (LLM) to perform stance detection: classifying the stance of a given text towards a given target. As mentioned earlier, LLMs can perform this task with few-shot or even zero-shot learning. I will provide a zero-shot example in this notebook.

I will demonstrate how to use LLMs for stance detection with the `transformers` library, which provides a simple API for applying LLMs to various NLP tasks. The text of this notebook was written for the `google/gemma-3-12b-it` model; note that the code cell that loads the model below uses `mistralai/Mistral-7B-Instruct-v0.1`.

We will use the `AutoModelForCausalLM` class to load the model and the `AutoTokenizer` class to load the tokenizer. A causal language model generates the next token given the previous tokens, and the tokenizer converts text into tokens that can be fed into the model.
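
To make "generate the next token given the previous tokens" concrete, here is a minimal toy sketch. The lookup table `NEXT_TOKEN` and the function `greedy_generate` are hypothetical stand-ins, not part of `transformers`; a real causal LM scores every vocabulary token using the whole prefix, while this toy only looks at the last token.

```python
# Toy next-token table standing in for a real language model (hypothetical data).
NEXT_TOKEN = {
    "<s>": "Stance",
    "Stance": ":",
    ":": "Oppose",
    "Oppose": "</s>",
}


def greedy_generate(prompt_tokens, max_new_tokens=10):
    """Append one token at a time, each chosen from what has been produced so far."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # A real model conditions on the full prefix; this toy uses only the last token.
        next_token = NEXT_TOKEN.get(tokens[-1], "</s>")
        if next_token == "</s>":  # stop at the end-of-sequence marker
            break
        tokens.append(next_token)
    return tokens


print(greedy_generate(["<s>"]))
```

The loop is the same shape as what `model.generate` does later in the notebook: feed the tokens so far, pick a next token, append it, and repeat until a stop condition is met.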


## Load the data


In [1]:
!nvidia-smi

Sat Apr  5 13:42:09 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A40                     On  |   00000000:22:00.0 Off |                    0 |
|  0%   38C    P0             56W /  300W |       1MiB /  46068MiB |      0%   E. Process |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [2]:
#!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126 
#!pip3 install -U transformers
#!pip3 install -U accelerate


In [1]:
import pandas as pd

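# Load the manually labelled Reddit comments
stance = pd.read_csv("/home/kklynxxu/622/reddit_stratified_400_with_manually_labels.csv")

# Leftover from a tweet dataset (removes the "#SemST" hashtag); not needed for the Reddit data
# stance["Tweet"] = stance["Tweet"].str.replace("#SemST", "")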

## Load the model and tokenizer


In [2]:
import os

# Read the Hugging Face access token from the environment.
# Never hard-code a real token in a notebook; set HF_TOKEN in the shell before starting Jupyter.
token = os.environ.get("HF_TOKEN")

In [3]:
# Load Huggingface Transformers library
# https://huggingface.co/transformers/
# clear jupyter notebook output
from IPython.display import clear_output
from transformers import AutoModelForCausalLM, AutoTokenizer

device = 'cuda'

model_name = "mistralai/Mistral-7B-Instruct-v0.1"  # LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)  # Load tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device)  # Load model
clear_output()

## Specify the prompt template


Before we format the prompt template, we need to understand what the inputs should look like and why the template takes this form.

When a Large Language Model is trained (here, "trained" refers specifically to the human-alignment stage), each example is a prompt that contains both the input text and the target text, and the model must be able to distinguish the human input from the desired output. As a result, we will roughly see two types of prompt templates: one distinguishes only between the human input and the model output, and the other also marks the instruction separately. The only difference between the two is that the former treats the instruction as part of the input text, while the latter treats it as a separate entity.

According to the [technical report](https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf) published by Google, Gemma uses the first type of prompt template, so we will follow the same format.

The prompt template should look like this:

```
[BOS]<start_of_turn>user
Who are you?<end_of_turn>
<start_of_turn>model
My name is Gemma!<end_of_turn>
<start_of_turn>user
What is 2+2?<end_of_turn>
<start_of_turn>model
```

Here `<start_of_turn>` and `<end_of_turn>` are special tokens that mark the start and end of each turn, and `[BOS]` is the beginning-of-sequence token. `user` and `model` are role names that indicate whether the content comes from the user or from the LLM.
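
As a rough illustration of the layout above, the sketch below renders a list of role/content messages into Gemma-style turns. The function name `render_gemma_turns` is hypothetical; in practice the tokenizer's `apply_chat_template` method does this using the template bundled with the model. `[BOS]` is left out here on the assumption that the tokenizer adds it.

```python
def render_gemma_turns(messages, add_generation_prompt=True):
    """Render [{"role": ..., "content": ...}, ...] into the turn layout shown above.

    Sketch only: the beginning-of-sequence token is omitted because the
    tokenizer normally adds it when the text is tokenized.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<start_of_turn>{message['role']}\n{message['content']}<end_of_turn>\n"
        )
    if add_generation_prompt:
        # An open model turn tells the LLM that it should answer next.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Who are you?"},
    {"role": "model", "content": "My name is Gemma!"},
    {"role": "user", "content": "What is 2+2?"},
]
print(render_gemma_turns(messages))
```

Ending the prompt with an open `model` turn is what makes the LLM continue with an answer rather than with more user text.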

Fortunately, for inference-only tasks (such as zero-shot or few-shot learning), we don't need to provide the model with the target text; only the input text is required. The `transformers` library can also apply a model's chat template for us (for example through the tokenizer's `apply_chat_template` method or the `pipeline` function), so the same list of messages can be formatted correctly for different LLMs, even if they use different chat templates.

To reuse the prompt template for different inputs, we will build it with an f-string that takes the input text (the comment) and the target (the topic) and returns the formatted prompt. If you are not familiar with f-strings, you can think of them as strings that substitute the values of variables into placeholders. More details can be found [here](https://realpython.com/python-f-strings/).


Let's take an example to see what the prompt template looks like for the stance detection task.


In [4]:
topic = "Vaccines"
comment = stance.loc[0, "comment"]

In [5]:
print(f"Topic: {topic}\ncomment: {comment}\n")

Topic: Vaccines
comment: bingo its always about the money



In [8]:
# Define the prompt template
#prompt = [
#   {
#        "role": "system",
#        "content": [{"type": "text", "text": "You are a helpful assistant."}]
#    },
#    {
#        "role": "user",
#        "content": [
#            {"type": "text", "text": f"""
#            Instruction: You have assumed the role of a human annotator. In this task, you will be presented with a tweet, delimited by triple backticks, concerning the {topic}. Please make the following assessment:
#            (1) Determine whether the tweet discusses the topic of the {topic}. If so, please indicate whether the Twitter user who posted the tweet favors, opposes, or has no opinion on the {topic}.
#                                         
#            Your response should be formatted as follows: "Stance: [FAVOR, AGAINST, NONE]"
#
#            Tweet: ```{tweet}```"""}
#        ]
#    }
#]



In [6]:
def build_prompt(topic, comment):
    user_prompt = f"""Instruction: You have assumed the role of a human annotator. In this task, you will be presented with a Reddit comment concerning the topic of "{topic}". Please make the following assessment:

(1) Does the comment discuss the topic?
(2) If yes, indicate the stance of the comment as Favorable, Neutral, or Oppose.

Reply in the following format:
Stance: [Favorable, Neutral, Oppose]

Comment: ```{comment}```"""

    return f"<start_of_turn>user\n{user_prompt}<end_of_turn>\n<start_of_turn>model\n"
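
One caveat: `build_prompt` emits Gemma-style turn markers, while the loading cell above uses `mistralai/Mistral-7B-Instruct-v0.1`. Mistral's instruct models expect the user message wrapped in `[INST] ... [/INST]`, and the token ids printed later in this notebook suggest that the Mistral tokenizer splits `<start_of_turn>` into several ordinary tokens rather than treating it as one special token. The sketch below shows the Mistral-style wrapper; `wrap_mistral_instruct` is a hypothetical helper, and whether it changes the predictions here was not tested in this notebook.

```python
def wrap_mistral_instruct(user_prompt):
    """Wrap a user message in Mistral's instruction markers.

    Hypothetical helper. The leading <s> is omitted because the tokenizer
    adds the beginning-of-sequence token itself.
    """
    return f"[INST] {user_prompt} [/INST]"


# Shortened example instruction; in the notebook this would be the full user_prompt text.
example = wrap_mistral_instruct(
    'Topic: "Vaccines"\nComment: ```bingo its always about the money```'
)
print(example)
```

An alternative that avoids hand-written markers is to pass a list of messages to `tokenizer.apply_chat_template`, which uses whichever template ships with the loaded model.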

In [7]:
prompt = build_prompt(topic, comment)
print(prompt)

<start_of_turn>user
Instruction: You have assumed the role of a human annotator. In this task, you will be presented with a Reddit comment concerning the topic of "Vaccines". Please make the following assessment:

(1) Does the comment discuss the topic?
(2) If yes, indicate the stance of the comment as Favorable, Neutral, or Oppose.

Reply in the following format:
Stance: [Favorable, Neutral, Oppose]

Comment: ```bingo its always about the money```<end_of_turn>
<start_of_turn>model



## LLM inference


### Tokenize the input text


Since we already have the raw input text, we will need to transform it into the format that the model can understand. We will use the `AutoTokenizer` class to convert the input text into tokens that can be fed into the model. The `AutoTokenizer` class will automatically select the appropriate tokenizer for the model.

For more information on how tokenizers work and how to use the `AutoTokenizer` class, refer to the [official documentation](https://huggingface.co/transformers/model_doc/auto.html#autotokenizer).


In [11]:
# Tokenize the prompt
#inputs = tokenizer.apply_chat_template(prompt, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(device)
# Take a look at the tokens
#print(f"Tokens: {inputs}\n")

In [8]:
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
print(f"Tokens: {inputs}\n")

Tokens: {'input_ids': tensor([[    1,   523,  2521, 28730,  1009, 28730,   499, 28767,  1838,    13,
         18066, 28747,   995,   506, 11012,   272,  3905,   302,   264,  2930,
           396,  1478,  1028, 28723,   560,   456,  3638, 28725,   368,   622,
           347,  7567,   395,   264, 22233,  4517, 15942,   272,  9067,   302,
           345, 28790,  4373,  1303,  2586,  5919,  1038,   272,  2296, 15081,
         28747,    13,    13, 28732, 28740, 28731,  9377,   272,  4517,  3342,
           272,  9067, 28804,    13, 28732, 28750, 28731,  1047,  5081, 28725,
         11634,   272, 27379,   302,   272,  4517,   390,   401,  3115,   522,
         28725,  3147,   329,  1650, 28725,   442, 22396,   645, 28723,    13,
            13, 23805,   297,   272,  2296,  5032, 28747,    13,   718,   617,
         28747,   733, 28765,  3115,   522, 28725,  3147,   329,  1650, 28725,
         22396,   645, 28793,    13,    13, 13617, 28747,  8789,  7551, 28709,
           378, 30969, 28713, 

In [13]:
# Note: the tokenized prompt can always be decoded back to the original prompt
# Decode the tokenized prompt
#decoded_prompt = tokenizer.decode(inputs[0])
#print(decoded_prompt)

In [9]:
decoded_prompt = tokenizer.decode(inputs.input_ids[0]) 
print(decoded_prompt)

<s> <start_of_turn>user
Instruction: You have assumed the role of a human annotator. In this task, you will be presented with a Reddit comment concerning the topic of "Vaccines". Please make the following assessment:

(1) Does the comment discuss the topic?
(2) If yes, indicate the stance of the comment as Favorable, Neutral, or Oppose.

Reply in the following format:
Stance: [Favorable, Neutral, Oppose]

Comment: ```bingo its always about the money```<end_of_turn>
<start_of_turn>model



### Feed the input text (tokens) into the model


In [15]:
#outputs = model.generate(inputs, max_new_tokens=20)  # Generate the model output
# Decode the generated output
#generated_output = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
# Print the generated output and compare with human annotation
#print(
#    f"Generated Output: {generated_output}\nHuman annotation: {stance.loc[0,'Stance']}"
#)

In [10]:
outputs = model.generate(inputs.input_ids)
generated_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_output)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


<start_of_turn>user
Instruction: You have assumed the role of a human annotator. In this task, you will be presented with a Reddit comment concerning the topic of "Vaccines". Please make the following assessment:

(1) Does the comment discuss the topic?
(2) If yes, indicate the stance of the comment as Favorable, Neutral, or Oppose.

Reply in the following format:
Stance: [Favorable, Neutral, Oppose]

Comment: ```bingo its always about the money```<end_of_turn>
<start_of_turn>model
Stance: Oppose

The comment discusses the topic of vaccines, but it takes
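
The warnings above appear because only `input_ids` was passed to `generate`, and the printed text repeats the prompt because the whole output sequence was decoded. The sketch below is one possible way to address both, assuming the `model` and `tokenizer` objects loaded earlier; `generate_reply` is a hypothetical helper name, not a `transformers` function.

```python
def generate_reply(model, tokenizer, prompt, max_new_tokens=50):
    """Generate a completion and return only the newly generated text.

    Hypothetical helper; `model` and `tokenizer` are the objects loaded earlier.
    """
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,                             # passes input_ids and attention_mask
        max_new_tokens=max_new_tokens,
        do_sample=False,                      # greedy decoding for repeatable labels
        pad_token_id=tokenizer.eos_token_id,  # makes the pad token explicit
    )
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = outputs[0][prompt_length:]   # drop the echoed prompt tokens
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Usage, once the model and tokenizer are loaded:
# reply = generate_reply(model, tokenizer, prompt)
```

Returning only the new tokens also makes later parsing safer, since the format example inside the prompt can no longer be mistaken for the model's answer.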


There are several parameters that we can use to control the output of the model. In the next cell we use the `max_new_tokens` parameter to control the maximum number of generated tokens; the first call above set no limit explicitly, and its output stops mid-sentence. We also used the tokenizer's `return_tensors` parameter, set to `pt`, to get the tokenized input as PyTorch tensors.

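Besides these parameters, we can use the `temperature` parameter to control the randomness of the output, the `top_k` and `top_p` parameters to control its diversity, and the `num_return_sequences` parameter to control the number of output sequences. Note that `temperature`, `top_k`, and `top_p` only take effect when sampling is enabled with `do_sample=True`; unless the model's generation config enables sampling by default, `generate` uses greedy decoding and ignores them.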

There is a great explanation of the temperature parameter (which you will also use with OpenAI's models) in this [blog post](https://lukesalamone.github.io/posts/what-is-temperature/).

Feel free to play around with these parameters to see how they affect the output of the model.
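
To build some intuition for these parameters, here is a simplified, library-free sketch of temperature scaling followed by top-k and top-p (nucleus) filtering. The logits and function names are hypothetical, and the `transformers` implementation differs in detail, so treat this as an illustration of the idea rather than of the library's code.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k=50, top_p=0.9):
    """Return the token indices that survive top-k, then top-p, filtering."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in ranked)  # renormalize over the top-k candidates
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index] / mass
        if cumulative >= top_p:  # smallest set whose probability mass reaches top_p
            break
    return kept


# Hypothetical logits for three candidate next tokens
logits = [2.0, 1.0, 0.0]
default_probs = softmax_with_temperature(logits, temperature=1.0)
sharper_probs = softmax_with_temperature(logits, temperature=0.5)
print(default_probs, sharper_probs)
print(top_k_top_p_filter(default_probs, top_k=50, top_p=0.9))
```

Sampling then draws the next token only from the surviving candidates. For a classification-style task such as stance detection, greedy decoding is often preferred because it gives the same label on every run.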


In [11]:
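outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=50,  # upper bound on the number of generated tokens
    temperature=0.7,    # sampling parameters: these only take effect with do_sample=True
    top_k=50,
    top_p=0.9,
)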
generated_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_output)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<start_of_turn>user
Instruction: You have assumed the role of a human annotator. In this task, you will be presented with a Reddit comment concerning the topic of "Vaccines". Please make the following assessment:

(1) Does the comment discuss the topic?
(2) If yes, indicate the stance of the comment as Favorable, Neutral, or Oppose.

Reply in the following format:
Stance: [Favorable, Neutral, Oppose]

Comment: ```bingo its always about the money```<end_of_turn>
<start_of_turn>model
Stance: Oppose

The comment discusses the topic of vaccines, but it takes a negative stance by suggesting that the focus is always on money rather than public health.


In [12]:
import re
from tqdm import tqdm 

predictions = []

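def extract_stance(text):
    # The format line in the prompt ("Stance: [Favorable, ...") does not match
    # because of the opening bracket, so only the model's answer is captured.
    match = re.search(r"Stance:\s*(Favorable|Neutral|Oppose)", text)
    # Return None instead of raising when the model does not follow the format
    return match.group(1) if match else None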

for comment in tqdm(stance["comment"], desc="Running LLM predictions"):
    prompt = build_prompt(topic, comment)
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=50,        
        temperature=0.7,         
        top_k=50,
        top_p=0.9,
    )

    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)

    stance_result = extract_stance(decoded)
    predictions.append(stance_result)


stance["llm_predicted_stance"] = predictions


Running LLM predictions:   0%|          | 0/401 [00:00<?, ?it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Running LLM predictions:   0%|          | 1/401 [00:02<14:24,  2.16s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Running LLM predictions:   0%|          | 2/401 [00:05<17:13,  2.59s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Running LLM predictions:   1%|          | 3/401 [00

In [13]:
stance[["post_id", "comment", "manual","llm_predicted_stance"]].head()

|   | post_id | comment | manual | llm_predicted_stance |
|---|---|---|---|---|
| 0 | 3_2_1 | bingo its always about the money | Neutral | Oppose |
| 1 | 3_4_2_1_2_1 | oh bullshit he made a choice he has access to ... | Favorable | Oppose |
| 2 | 7 | well thats it people are going to start dying | Favorable | Oppose |
| 3 | 1_4 | someone said he looks like a ren and stimpy ch... | Neutral | Neutral |
| 4 | 2_1_1 | all his fans keep repeating that but i have no... | Neutral | Oppose |


In [14]:
stance.to_csv("llm_stance_predictions.csv", index=False)

In [15]:
from sklearn.metrics import classification_report

print(classification_report(
    stance["manual"],  
    stance["llm_predicted_stance"],  
    labels=["Favorable", "Neutral", "Oppose"]
))


              precision    recall  f1-score   support

   Favorable       0.50      0.01      0.02        79
     Neutral       0.71      0.56      0.62       236
      Oppose       0.33      0.84      0.48        86

    accuracy                           0.51       401
   macro avg       0.52      0.47      0.38       401
weighted avg       0.59      0.51      0.47       401
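
In the report above, `Oppose` has high recall (0.84) but low precision (0.33), and `Favorable` has a recall of 0.01, which suggests that the model labels most comments as `Oppose`. To show how the per-class numbers are derived, the sketch below recomputes precision, recall, and F1 by hand for the five rows displayed earlier. The helper `per_class_metrics` is hypothetical, and undefined ratios are reported as 0.0.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels):
    """Compute precision, recall, and F1 per label (0.0 when a ratio is undefined)."""
    pair_counts = Counter(zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    metrics = {}
    for label in labels:
        correct = pair_counts[(label, label)]
        precision = correct / pred_counts[label] if pred_counts[label] else 0.0
        recall = correct / true_counts[label] if true_counts[label] else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# The five rows shown in the preview table earlier in the notebook
manual = ["Neutral", "Favorable", "Favorable", "Neutral", "Neutral"]
predicted = ["Oppose", "Oppose", "Oppose", "Neutral", "Oppose"]

metrics = per_class_metrics(manual, predicted, ["Favorable", "Neutral", "Oppose"])
for label, scores in metrics.items():
    print(label, scores)
```

Even in this small sample the pattern from the full report is visible: four of the five predictions are `Oppose`, and none of them matches the manual label.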

