<a href="https://colab.research.google.com/github/RicoStaedeli/NLP2025_CQG/blob/main/2_Baseline_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Baseline Predictions
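This notebook generates the baseline predictions: for every context in the test set, the pretrained `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` model produces one critical question per argumentation schema.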

## Setup

In [1]:
!pip install -U transformers
!pip install accelerate bitsandbytes


In [2]:
import torch
from google.colab import userdata, drive
import logging
import transformers
import os

In [3]:
token = userdata.get('GITHUB')
repo_url = f"https://{token}@github.com/RicoStaedeli/NLP2025_CQG.git"

!git clone {repo_url}

Cloning into 'NLP2025_CQG'...
remote: Enumerating objects: 1104, done.
remote: Counting objects: 100% (191/191), done.
remote: Compressing objects: 100% (115/115), done.
remote: Total 1104 (delta 133), reused 102 (delta 74), pack-reused 913 (from 1)
Receiving objects: 100% (1104/1104), 47.27 MiB | 16.36 MiB/s, done.
Resolving deltas: 100% (609/609), done.
Updating files: 100% (107/107), done.


In [4]:
os.chdir("NLP2025_CQG")
!ls

1_a_Generate_DPO_Dataset.ipynb	      Development
1_Information_preprocessing.md	      Doc
1_Preprocessing.ipynb		      Evaluation
2_Baseline_Generation.ipynb	      INFORMATION.md
2_Information_Baseline_Generation.md  LICENSE
3_Evaluation.ipynb		      Logs
3_Training_1_SFT_3.ipynb	      README.md
4_Finetuned_Generation.ipynb	      requirements.txt
5_Evaluation_Analytics.ipynb	      Training
Data				      Utils


In [5]:
################################################################################
#######################   PATH VARIABLES        ################################
################################################################################

model_name= "Meta-Llama-3.1-8B-Instruct-bnb-4bit"
model_id= f"unsloth/{model_name}"

test_dataset_path = "Data/Processed/test.csv"

results_path = os.path.join(os.getcwd(), f"Evaluation/Results/results_schema_Baseline_{model_name}.json")

log_base_path = "Logs/"
os.makedirs(log_base_path, exist_ok=True)

log_path = log_base_path + "2_baseline_generation.log"


################################################################################
#######################   STATIC VARIABLES      ################################
################################################################################


In [6]:
# Setup logger manually
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Create file handler (only if not already added)
if not logger.handlers:
    fh = logging.FileHandler(log_path)
    fh.setLevel(logging.INFO)
    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
    fh.setFormatter(formatter)
    logger.addHandler(fh)

# Detect device
device = torch.device(
    "mps" if torch.backends.mps.is_available()
    else "cuda" if torch.cuda.is_available()
    else "cpu"
)

# Log the device info
logger.info("--------  Start with Baseline Generation  -------------")
logger.info(f'Device selected: {device}')
logger.info(f'Results Path: {results_path}')
logger.info(f'Log Path: {log_path}')
logger.info("--------------------------------------------------------")

INFO:__main__:--------  Start with Baseline Generation  -------------
INFO:__main__:Device selected: cuda
INFO:__main__:Results Path: /content/NLP2025_CQG/Evaluation/Results/results_schema_Baseline_Meta-Llama-3.1-8B-Instruct-bnb-4bit.json
INFO:__main__:Log Path: Logs/2_baseline_generation.log
INFO:__main__:--------------------------------------------------------


## Generate Answers

In [7]:
import gc


In [8]:
import os
# Let the CUDA caching allocator use expandable segments to reduce memory fragmentation
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

In [9]:
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config={"load_in_4bit": True},
    low_cpu_mem_usage=True,
)

def generate_response(prompt_texts, max_new_tokens=64):
    """Generate one sampled completion per prompt and return the cleaned response texts."""
    # Tokenize the whole batch at once; padding aligns prompts of different lengths
    inputs = tokenizer(prompt_texts, return_tensors="pt", padding=True, truncation=True).to(model.device)

    # Nucleus sampling: outputs are not deterministic between runs
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
    )

    # Each decoded string contains the prompt followed by the completion
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    responses = []
    for full_output in decoded:
        # Extract text after the last "### Response:" marker
        if "### Response:" in full_output:
            response = full_output.split("### Response:")[-1].strip()
        else:
            response = full_output.strip()

        # Remove the leading 'assistant\n\n' role header if present
        if response.lower().startswith("assistant"):
            response = response[len("assistant"):].lstrip("\n ").strip()

        responses.append(response)
    return responses

Error while fetching `HF_TOKEN` secret value from your vault: 'Requesting secret HF_TOKEN timed out. Secrets can only be fetched when running from the Colab UI.'.
You are not authenticated with the Hugging Face Hub in this notebook.
If the error persists, please let us know by opening an issue on GitHub (https://github.com/huggingface/huggingface_hub/issues/new).
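
The post-processing inside `generate_response` can be tried without loading the model. The following is a minimal sketch of the same string logic; the helper name `extract_response` and the sample string are hypothetical and only serve as illustration.

```python
def extract_response(full_output, marker="### Response:"):
    """Return only the generated answer from a decoded prompt+completion string."""
    # Keep the text after the last marker, or the whole string if the marker is absent
    response = full_output.split(marker)[-1].strip()
    # With special tokens skipped, the chat role header survives as plain "assistant"
    if response.lower().startswith("assistant"):
        response = response[len("assistant"):].lstrip("\n ").strip()
    return response


# Hypothetical decoded output: prompt tail, role header, then the generated question
decoded = (
    "### Context:\nTaxes are theft.\n\n"
    "### Response:\nassistant\n\nWho decides what counts as theft?"
)
print(extract_response(decoded))  # Who decides what counts as theft?
```

One limitation of this approach is that a generated question which itself begins with the word "assistant" would also lose that word.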



In [None]:
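# Optional cleanup: frees GPU memory, but the generation cells below need `model`,
# so run this only once generation is finished (or reload the model afterwards).
del model
gc.collect()
torch.cuda.empty_cache()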

In [None]:
# schemas = {
#     "CauseToEffect": """'Cause to Effect' with the examples:
#     How strong is the generalisation that if <eventA> then <eventB>?
#     Are there other factors in this particular case that could have interfered with the event of ‘<eventB>’?
#     """,

#     "ExpertOpinion": """'Expert Opinion' with the examples:
#     Is <expertE> a genuine expert in <domainD>?
#     Did <expertE> really assert that <eventA>? Is <expertE>’s pronouncement directly quoted? If not, is a reference to the original source given? Can it be checked?
#     If <expertE>’s advice is not quoted, does it look like important information or qualifications may have been left out?
#     Is what <expertE> said clear? Are there technical terms used that are not explained clearly?
#     Is <eventA> relevant to domain <domainD>?
#     Is <eventA> consistent with what other experts in <domainD> say?
#     Is <eventA> consistent with known evidence in <domainD>?
#     """,

#     "Analogy": """'Analogy' with the examples:
#     Are <C1> and <C2> similar in the respect cited?
#     Is <eventA> true in <C1>?
#     Are there differences between <C1> and <C2> that would tend to undermine the force of the similarity cited?
#     Is there some other case that is also similar to <C1>, but in which <eventA> is false?
#     """,

#     "FearAppeal": """'Fear Appeal' with the examples:
#     Is <eventB> bad? Why and to whom is it bad?
#     Is <eventA> a way to prevent <eventB>?
#     Is it practically possible for <eventA> to happen?
#     Are there other consequences from <eventA>?
#     """
# }

In [10]:
schemas = {
    "CauseToEffect": """'Cause to Effect' with the examples:
    What if the job is one that will cause their illness to manifest and affect their job performance?
    If so, do you think the lack of any of the pieces influence the decisions they make about whether to create the digital widget or which digital widget to create?
    If miners and the working class are such a small percentage of the population as to not make a difference in a straight up popular vote, then why do we let them have so much influence?
    """,

    "ExpertOpinion": """'Expert Opinion' with the examples:
    Do you have a scientific source that confirm that women only wants one partner to a greater degree than men?
    If you are just stating facts, the surely you can point to some studies that show most women are narcissistic sociopaths?
    Where in any scientific textbook or journal have you seen evidence of sex being described as a spectrum?
    """,

    "Analogy": """'Analogy' with the examples:
    If so, would these groups resemble what are commonly referred to as races?
    If whites are guilty of enjoying the benefits of stolen land after being here for two generations, why would that not mean a 2nd generation American POC not have a similar benefit?
    But why is the way her hair naturally grows from her head seen as less professional than chemically altering to mimic straight white European hair?
    """,

    "FearAppeal": """'Fear Appeal' with the examples:
    If he did such a great job making peace, why do we need sanctions to pressure North Korea into stopping its violent threats?
    And if we should strive to stop all killings then why would the implement of this murder be spared if it causes a very large number of deaths but was intended to cause none?
    If you say you can, then are you implying that you see Islam as being no worse?
    """
}

In [11]:
!pip install tqdm
from tqdm.auto import tqdm



In [12]:
import pandas as pd
import json

results = {}     # maps input id -> {"input": ..., "cqs": [...]}
chunk_size = 20  # CSV rows per chunk; every row yields one prompt per schema

for chunk in pd.read_csv(test_dataset_path, chunksize=chunk_size):
    contexts = chunk['input'].tolist()
    ids = chunk['id'].tolist()
    prompts = []
    input_ids = []
    schema_ids = []


    for idx, input_text in enumerate(contexts):
        input_id = ids[idx]
        for schema_name, schema_template in schemas.items():
            messages = [
                {"role": "system", "content": "You are a system designed to generate critical questions for a given argumentative context."},
                {"role": "user", "content": f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Generate one critical question addressing the given context following the schema:

### Schema:
{schema_template}

Your answer is just the question without anything else.

This is the given context to relate the question to:

### Context:
{input_text}

### Response:
"""}
            ]
            prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
            prompts.append(prompt_text)
            input_ids.append(input_id)
            schema_ids.append(schema_name)

    batch_size = 64
    num_batches = (len(prompts) + batch_size - 1) // batch_size

    with tqdm(total=num_batches, desc="Generating Critical Questions", leave=True) as pbar:
        for batch_start in range(0, len(prompts), batch_size):
            batch_prompts = prompts[batch_start:batch_start+batch_size]
            batch_outputs = generate_response(batch_prompts, max_new_tokens=256)

            for curr_id, schema_name, output in zip(
                input_ids[batch_start:batch_start + batch_size],
                schema_ids[batch_start:batch_start + batch_size],
                batch_outputs
                ):
                if curr_id not in results:
                    results[curr_id] = {
                        "input": chunk.loc[chunk['id'] == curr_id, 'input'].values[0],
                        "cqs": []
                    }
                results[curr_id]['cqs'].append({
                    "schema": schema_name,
                    "cq": output
                    })

            torch.cuda.empty_cache()
            gc.collect()
            pbar.update(1)

with open(results_path, 'w', encoding='utf-8') as f:
    json.dump(results, f, ensure_ascii=False, indent=2)

print(f"Results saved to {results_path}")


Generating Critical Questions:   0%|          | 0/2 [00:00<?, ?it/s]

Generating Critical Questions:   0%|          | 0/1 [00:00<?, ?it/s]
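
The batch totals in the progress output follow from the chunking: each CSV row produces one prompt per schema, so a full chunk of 20 rows with 4 schemas gives 80 prompts, which at `batch_size = 64` takes 2 batches. The sketch below repeats that ceiling division; `count_batches` is a hypothetical helper, not part of the notebook.

```python
def count_batches(num_rows, num_schemas=4, batch_size=64):
    """Number of generation batches needed for one chunk of CSV rows."""
    num_prompts = num_rows * num_schemas
    # Ceiling division, same expression as in the generation loop
    return (num_prompts + batch_size - 1) // batch_size


print(count_batches(20))  # full chunk: 80 prompts -> 2
print(count_batches(16))  # 64 prompts fit exactly -> 1
```

A final chunk with 16 rows or fewer therefore needs only a single batch.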

Results saved to /content/NLP2025_CQG/Evaluation/Results/results_schema_Baseline_Meta-Llama-3.1-8B-Instruct-bnb-4bit.json
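
The saved file maps each input id to its `input` text and a `cqs` list holding one `schema`/`cq` pair per schema. Because `json.dump` writes dictionary keys as strings, the ids come back as strings when the file is reloaded. A quick sanity check is to count the questions per schema; the sketch below uses a hypothetical in-memory `sample` rather than the real file, and `summarize_results` is an illustrative name.

```python
from collections import Counter


def summarize_results(results):
    """Count generated critical questions per schema."""
    counts = Counter()
    for entry in results.values():
        for cq in entry["cqs"]:
            counts[cq["schema"]] += 1
    return dict(counts)


# Hypothetical sample in the layout written by the generation loop
sample = {
    "1": {
        "input": "Some argumentative text.",
        "cqs": [
            {"schema": "CauseToEffect", "cq": "What else could explain the effect?"},
            {"schema": "Analogy", "cq": "Are the two cases really comparable?"},
        ],
    }
}
print(summarize_results(sample))  # {'CauseToEffect': 1, 'Analogy': 1}
```

For the real file, the same function can be applied to the dictionary returned by `json.load`.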


## Commit & Push

In [13]:
!git config --global user.name "Showcas"
!git config --global user.email "cedric.bohni@gmx.de"


commit_message = "Done baseline"
!git add .
!git commit -m "{commit_message}"
!git push

[main a423b4c] Done baseline
 2 files changed, 3913 insertions(+), 3908 deletions(-)
 rewrite Evaluation/Results/results_schema_Baseline_Meta-Llama-3.1-8B-Instruct-bnb-4bit.json (81%)
Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Delta compression using up to 12 threads
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 73.39 KiB | 3.19 MiB/s, done.
Total 7 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), completed with 5 local objects.
To https://github.com/RicoStaedeli/NLP2025_CQG.git
   686d22e..a423b4c  main -> main
