<a href="https://colab.research.google.com/github/RicoStaedeli/NLP2025_CQG/blob/main/4_Finetuned_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Finetuned Predictions
This notebook generates critical questions with the fine-tuned model and writes them to a results file for evaluation.

## Setup

In [1]:
!pip install -U transformers
!pip install accelerate bitsandbytes

In [2]:
import torch
from google.colab import userdata
import logging
import transformers
import os


In [3]:
token = userdata.get('GITHUB')
repo_url = f"https://{token}@github.com/RicoStaedeli/NLP2025_CQG.git"

!git clone {repo_url}

Cloning into 'NLP2025_CQG'...
remote: Enumerating objects: 839, done.
remote: Counting objects: 100% (102/102), done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 839 (delta 67), reused 40 (delta 37), pack-reused 737 (from 2)
Receiving objects: 100% (839/839), 25.71 MiB | 27.09 MiB/s, done.
Resolving deltas: 100% (419/419), done.


In [4]:
os.chdir("NLP2025_CQG")
!ls

1_Information_preprocessing.md	      Doc
1_Preprocessing.ipynb		      Evaluation
2_Baseline_Generation.ipynb	      INFORMATION.md
2_Information_Baseline_Generation.md  LICENSE
3_Evaluation.ipynb		      Logs
4_Finetuned_Generation.ipynb	      README.md
5_Evaluation_Analytics.ipynb	      requirements.txt
Data				      Training
Development			      Utils


In [5]:
################################################################################
#######################   PATH VARIABLES        ################################
################################################################################

model_name= "Meta-Llama-3.1-1B-Instruct_SFT_1"
model_id= f"ricostaedeli/{model_name}"

test_dataset_path = "Data/Processed/test.csv"

results_path = os.path.join(os.getcwd(), f"Evaluation/Results/results_{model_name}.json")

log_base_path = "Logs/"
os.makedirs(log_base_path, exist_ok=True)

log_path = log_base_path + "4_cqs_generation_SFT_1.log"


################################################################################
#######################   STATIC VARIABLES      ################################
################################################################################


In [6]:
# Setup logger manually
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Create file handler (only if not already added)
if not logger.handlers:
    fh = logging.FileHandler(log_path)
    fh.setLevel(logging.INFO)
    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
    fh.setFormatter(formatter)
    logger.addHandler(fh)

# Detect device
device = torch.device(
    "mps" if torch.backends.mps.is_available()
    else "cuda" if torch.cuda.is_available()
    else "cpu"
)

# Log the device info
logger.info("--------  Start with Baseline Generation  -------------")
logger.info(f'Device selected: {device}')
logger.info(f'Results Path: {results_path}')
logger.info(f'Log Path: {log_path}')
logger.info("--------------------------------------------------------")

INFO:__main__:--------  Start with Baseline Generation  -------------
INFO:__main__:Device selected: cuda
INFO:__main__:Results Path: /content/NLP2025_CQG/Evaluation/Results/results_Meta-Llama-3.1-1B-Instruct_SFT_1.json
INFO:__main__:Log Path: Logs/4_cqs_generation_SFT_1.log
INFO:__main__:--------------------------------------------------------


## Generate Answers

In [8]:
import gc


In [7]:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

In [14]:
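from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Decoder-only models should be left-padded for batched generation,
# so that the new tokens directly follow each prompt.
tokenizer.padding_side = "left"
# Fall back to EOS for padding if the checkpoint's tokenizer defines no pad token.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 4-bit quantization via bitsandbytes, computing in float16
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
    low_cpu_mem_usage=True,
)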

def generate_response(prompt_texts, max_new_tokens=64):
    inputs = tokenizer(prompt_texts, return_tensors="pt", padding=True, truncation=True).to("cuda")

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
    )

    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    responses = []
    for full_output in decoded:
        # Extract text after ### Response:
        if "### Response:" in full_output:
            response = full_output.split("### Response:")[-1].strip()
        else:
            response = full_output.strip()

        # Remove leading 'assistant\n\n' if present
        if response.lower().startswith("assistant"):
            response = response[len("assistant"):].lstrip("\n ").strip()

        responses.append(response)
    return responses

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
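
The post-processing step in `generate_response` can be exercised without loading the model. The sketch below pulls the same two rules (keep the text after the last `### Response:` marker, then drop a leading `assistant` role tag) into a standalone function. The name `extract_response` and the sample strings are illustrative assumptions, not part of the notebook.

```python
def extract_response(full_output: str, marker: str = "### Response:") -> str:
    """Return the text after the last marker, minus a leading 'assistant' tag."""
    # Keep only what follows the last occurrence of the marker, if present.
    if marker in full_output:
        response = full_output.split(marker)[-1].strip()
    else:
        response = full_output.strip()

    # The decoded chat template may leave the role name in front of the answer.
    if response.lower().startswith("assistant"):
        response = response[len("assistant"):].lstrip("\n ").strip()
    return response


# Hypothetical decoded output: prompt text, marker, role tag, then the question.
decoded = "### Context:\nSome text\n\n### Response:\n  assistant\n\nIs X a genuine expert?"
print(extract_response(decoded))
```

Because the role tag is matched by prefix, a generated question that itself begins with the word "Assistant" would lose that prefix. Decoding only the newly generated tokens may be a more robust alternative.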

In [None]:
# del model
# gc.collect()
# torch.cuda.empty_cache()

In [10]:
schemas = {
    "CauseToEffect": """'Cause to Effect' with the examples:
    How strong is the generalisation that if <eventA> then <eventB>?
    Are there other factors in this particular case that could have interfered with the event of ‘<eventB>’?
    """,

    "ExpertOpinion": """'Expert Opinion' with the examples:
    Is <expertE> a genuine expert in <domainD>?
    Did <expertE> really assert that <eventA>? Is <expertE>’s pronouncement directly quoted? If not, is a reference to the original source given? Can it be checked?
    If <expertE>’s advice is not quoted, does it look like important information or qualifications may have been left out?
    Is what <expertE> said clear? Are there technical terms used that are not explained clearly?
    Is <eventA> relevant to domain <domainD>?
    Is <eventA> consistent with what other experts in <domainD> say?
    Is <eventA> consistent with known evidence in <domainD>?
    """,

    "Analogy": """'Analogy' with the examples:
    Are <C1> and <C2> similar in the respect cited?
    Is <eventA> true in <C1>?
    Are there differences between <C1> and <C2> that would tend to undermine the force of the similarity cited?
    Is there some other case that is also similar to <C1>, but in which <eventA> is false?
    """,

    "FearAppeal": """'Fear Appeal' with the examples:
    Is <eventB> bad? Why and to whom is it bad?
    Is <eventA> a way to prevent <eventB>?
    Is it practically possible for <eventA> to happen?
    Are there other consequences from <eventA>?
    """
}
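
Every context is paired with each of the four schemas, so the number of prompts grows as contexts × schemas. The sketch below illustrates that pairing and the ceiling division used to count batches. The context strings are made-up placeholders and `num_batches` is a hypothetical helper name; the values 20 and 64 mirror the `chunk_size` and `batch_size` used in the generation cell.

```python
from itertools import product

# Stand-ins for the notebook's data: two contexts and the four schema names.
contexts = {"ctx_1": "First argument text.", "ctx_2": "Second argument text."}
schema_names = ["CauseToEffect", "ExpertOpinion", "Analogy", "FearAppeal"]

# One (context id, schema) pair per prompt, with contexts in the outer loop.
pairs = list(product(contexts, schema_names))


def num_batches(num_prompts: int, batch_size: int) -> int:
    """Ceiling division: how many batches are needed to cover all prompts."""
    return (num_prompts + batch_size - 1) // batch_size


print(len(pairs))               # 8 prompts for 2 contexts x 4 schemas
print(num_batches(20 * 4, 64))  # 2 batches per full chunk of 20 contexts
```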

In [11]:
!pip install tqdm
from tqdm.auto import tqdm



In [16]:
import json

import pandas as pd

# Maps each test id to its input text and the generated critical questions
results = {}
chunk_size = 20  # rows of the test CSV read per iteration

for chunk in pd.read_csv(test_dataset_path, chunksize=chunk_size):
    contexts = chunk['input'].tolist()
    ids = chunk['id'].tolist()
    prompts = []
    input_ids = []
    schema_ids = []


    for idx, input_text in enumerate(contexts):
        input_id = ids[idx]
        for schema_name, schema_template in schemas.items():
            messages = [
                {"role": "system", "content": "You are a system designed to generate critical questions for a given argumentative context."},
                {"role": "user", "content": f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

              ### Instruction:
              Generate one critical question addressing the given context following the schema:

              ### Schema:
              {schema_template}

              Your answer is just the question without anything else.

              This is the given context to relate the question to:

              ### Context:
              {input_text}

              ### Response:
              """}
            ]
            prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
            prompts.append(prompt_text)
            input_ids.append(input_id)
            schema_ids.append(schema_name)

    batch_size = 64
    num_batches = (len(prompts) + batch_size - 1) // batch_size

    with tqdm(total=num_batches, desc="Generating Critical Questions", leave=True) as pbar:
        for batch_start in range(0, len(prompts), batch_size):
            batch_prompts = prompts[batch_start:batch_start+batch_size]
            batch_outputs = generate_response(batch_prompts, max_new_tokens=256)

            for curr_id, schema_name, output in zip(
                input_ids[batch_start:batch_start + batch_size],
                schema_ids[batch_start:batch_start + batch_size],
                batch_outputs
                ):
                if curr_id not in results:
                    results[curr_id] = {
                        "input": chunk.loc[chunk['id'] == curr_id, 'input'].values[0],
                        "cqs": []
                    }
                results[curr_id]['cqs'].append({
                    "schema": schema_name,
                    "cq": output
                    })

            torch.cuda.empty_cache()
            gc.collect()
            pbar.update(1)

with open(results_path, 'w', encoding='utf-8') as f:
    json.dump(results, f, ensure_ascii=False, indent=2)

print(f"Results saved to {results_path}")


Results saved to /content/NLP2025_CQG/Evaluation/Results/results_Meta-Llama-3.1-1B-Instruct_SFT_1.json
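
Before moving on to evaluation, it may be worth checking that every entry in the results file carries one non-empty question per schema. This is a minimal sketch, assuming the JSON layout written above (`input` plus a `cqs` list of `schema`/`cq` pairs). `find_incomplete` and the sample data are hypothetical, and the sample is written to a temporary file rather than the real results path.

```python
import json
import os
import tempfile

EXPECTED_SCHEMAS = {"CauseToEffect", "ExpertOpinion", "Analogy", "FearAppeal"}


def find_incomplete(results: dict) -> list:
    """Return ids whose entry lacks an input, a schema, or a non-empty question."""
    incomplete = []
    for entry_id, entry in results.items():
        cqs = entry.get("cqs", [])
        schemas_seen = sorted(cq["schema"] for cq in cqs)
        has_empty = any(not cq["cq"].strip() for cq in cqs)
        if (
            not entry.get("input")
            or schemas_seen != sorted(EXPECTED_SCHEMAS)
            or has_empty
        ):
            incomplete.append(entry_id)
    return incomplete


# Hypothetical sample: entry 1 is complete, entry 2 covers only one schema.
sample = {
    1: {
        "input": "Some argument.",
        "cqs": [{"schema": s, "cq": f"Question for {s}?"} for s in sorted(EXPECTED_SCHEMAS)],
    },
    2: {
        "input": "Another argument.",
        "cqs": [{"schema": "Analogy", "cq": "Are the two cases similar?"}],
    },
}

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "results_sample.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False, indent=2)
    with open(sample_path, encoding="utf-8") as f:
        loaded = json.load(f)

print(find_incomplete(loaded))  # ['2']
```

Note that JSON object keys are always strings, so integer ids come back as `"1"` and `"2"` after a round trip through the file.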


## Commit & Push

In [17]:
!git config --global user.name "Showcas"
!git config --global user.email "cedric.bohni@gmx.de"


commit_message = "Finetuned generation"
!git add .
!git commit -m "{commit_message}"
!git push

[main a0a14dd] Finetuned generation
 1 file changed, 3735 insertions(+), 39 deletions(-)
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 12 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 86.17 KiB | 4.54 MiB/s, done.
Total 5 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/RicoStaedeli/NLP2025_CQG.git
   f7b3f04..a0a14dd  main -> main
