<a href="https://colab.research.google.com/github/RicoStaedeli/NLP2025_CQG/blob/main/2_baseline_cqs_generation_schema.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Baseline Predictions
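This notebook generates the baseline predictions: for every argument in the test set, the model is prompted once per argumentation schema to produce one critical question.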

## Setup

In [None]:
!pip install -U transformers
!pip install accelerate bitsandbytes


In [None]:
import torch
from google.colab import userdata, drive
import logging
import transformers
import os

In [None]:
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
token = userdata.get('GITHUB')
repo_url = f"https://{token}@github.com/RicoStaedeli/NLP2025_CQG.git"

!git clone {repo_url}

Cloning into 'NLP2025_CQG'...
remote: Enumerating objects: 600, done.
remote: Counting objects: 100% (86/86), done.
remote: Compressing objects: 100% (49/49), done.
remote: Total 600 (delta 56), reused 49 (delta 37), pack-reused 514 (from 1)
Receiving objects: 100% (600/600), 24.81 MiB | 13.59 MiB/s, done.
Resolving deltas: 100% (282/282), done.


In [None]:
os.chdir("NLP2025_CQG")
!ls

1_Preprocessing.ipynb		     Development
2a_Baseline_Evaluation.ipynb	     Doc
2_Baseline_CQS_generation.ipynb      Evaluation
2_Baseline_CQS_generation_old.ipynb  Evaluation_Schema.ipynb
3a_Finetuned_CQS_generation.ipynb    INFORMATION.md
3b_Finetune_Evaluation.ipynb	     LICENSE
3_Training.ipynb		     Logs
4a_RAG_CQS_generation.ipynb	     README.md
4b_RAG_Evaluation.ipynb		     requirements.txt
4_RAG_System,.ipynb		     SocratiQ_final_prepro.ipynb
5_Evaluation_Analytics.ipynb	     Training
Data				     Utils


In [None]:
################################################################################
#######################   PATH VARIABLES        ################################
################################################################################

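model_id = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"  # Hugging Face Hub repo id
model_name = model_id.split("/")[-1]  # file-name-safe part of the id

test_dataset_path = "Data/Processed/test.csv"

results_path = os.path.join(os.getcwd(), f"Evaluation/Results/results_schema_{model_name}.json")
# Create only the parent folder; results_path itself is a file
os.makedirs(os.path.dirname(results_path), exist_ok=True)

log_base_path = "Logs/"
os.makedirs(log_base_path, exist_ok=True)

log_path = os.path.join(log_base_path, "2_baseline_generation.log")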


################################################################################
#######################   STATIC VARIABLES      ################################
################################################################################


In [None]:
print(results_path)

Evaluation/Results/results_schema_Meta-Llama-3.1-8B-Instruct-bnb-4bit.json
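Hub model ids usually have the form `organisation/name`, and the logged path `results_schema_unsloth/Meta-Llama-...json` further down shows what happens when such an id is interpolated directly: the slash turns into an extra folder. A minimal sketch of a path helper, assuming only the last segment of the id should appear in the file name and only the parent folder should be created up front (`results_file_for` is a hypothetical name, not part of the repository):

```python
import os
import tempfile


def results_file_for(model_id: str, base_dir: str) -> str:
    """Build a results JSON path from a Hub model id and ensure its folder exists.

    Only the last segment of the id is used, so "org/name" does not create
    an extra sub-folder named after the organisation.
    """
    model_name = model_id.split("/")[-1]
    path = os.path.join(base_dir, "Evaluation", "Results", f"results_schema_{model_name}.json")
    # Create the parent directory only; the JSON file itself is written later.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    p = results_file_for("unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit", tmp)
    print(os.path.basename(p))  # results_schema_Meta-Llama-3.1-8B-Instruct-bnb-4bit.json
    print(os.path.isdir(os.path.dirname(p)), os.path.exists(p))  # True False
```

Creating the directory rather than the file path matters because `open(path, "w")` fails if `path` already exists as a folder.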


In [None]:
# Setup logger manually
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Create file handler (only if not already added)
if not logger.handlers:
    fh = logging.FileHandler(log_path)
    fh.setLevel(logging.INFO)
    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
    fh.setFormatter(formatter)
    logger.addHandler(fh)

# Detect device
device = torch.device(
    "mps" if torch.backends.mps.is_available()
    else "cuda" if torch.cuda.is_available()
    else "cpu"
)

# Log the device info
logger.info("--------  Start with Baseline Generation  -------------")
logger.info(f'Device selected: {device}')
logger.info(f'Results Path: {results_path}')
logger.info(f'Log Path: {log_path}')
logger.info("--------------------------------------------------------")

INFO:__main__:--------  Start with Baseline Generation  -------------
INFO:__main__:Device selected: cuda
INFO:__main__:Results Path: Evaluation/Results/results_schema_unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit.json
INFO:__main__:Log Path: Logs/2_baseline_generation.log
INFO:__main__:--------------------------------------------------------
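The logger cell guards against duplicate handlers with `if not logger.handlers`, which matters in a notebook because re-running a cell reuses the same module-level logger. A slightly stricter variant, sketched below with the hypothetical helper `setup_file_logger`, skips adding a handler only when one already writes to the same file; changing `log_path` and re-running therefore attaches a handler for the new file (the old one stays attached):

```python
import logging
import os
import tempfile


def setup_file_logger(name: str, log_path: str) -> logging.Logger:
    """Attach at most one FileHandler per log file, even if the cell is re-run."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # FileHandler stores the absolute path of its file in baseFilename.
    already = any(
        isinstance(h, logging.FileHandler) and h.baseFilename == os.path.abspath(log_path)
        for h in logger.handlers
    )
    if not already:
        fh = logging.FileHandler(log_path)
        fh.setLevel(logging.INFO)
        fh.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
        logger.addHandler(fh)
    return logger


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "2_baseline_generation.log")
    log = setup_file_logger("baseline_demo", path)
    log = setup_file_logger("baseline_demo", path)  # simulates re-running the cell
    log.info("Device selected: cpu")
    print(len(log.handlers))  # 1
    for h in list(log.handlers):  # release the file before the temp dir is removed
        h.close()
        log.removeHandler(h)
```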


## Generate Answers

In [None]:
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={
        "torch_dtype": torch.float16,
        # 4-bit bitsandbytes quantization, equivalent to {"load_in_4bit": True}
        "quantization_config": transformers.BitsAndBytesConfig(load_in_4bit=True),
        "low_cpu_mem_usage": True,
    },
    device_map="auto",
)



Device set to use cuda:0


In [None]:
def generate_response(prompt_text):
    messages = [
        {'role': 'system', 'content': 'You are a system designed to generate critical questions for a given argumentative context.'},
        {'role': 'user', 'content': prompt_text},
    ]

    prompt = pipeline.tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    terminators = [
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids('<|eot_id|>')
    ]

    outputs = pipeline(
        prompt,
        max_new_tokens=256,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

    return outputs[0]['generated_text'][len(prompt):]
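`generate_response` recovers the answer by slicing `generated_text[len(prompt):]`, which works only while the pipeline echoes the prompt verbatim at the start of its output. The text-generation pipeline also accepts `return_full_text=False`, which returns only the continuation. The helper below is a hypothetical, model-free sketch of the slicing approach with a defensive fallback and whitespace stripping; the template string in it is a simplified stand-in, not the real Llama 3 chat format:

```python
def strip_prompt(prompt: str, generated_text: str) -> str:
    """Return only the newly generated continuation, without surrounding whitespace.

    Slicing by len(prompt) assumes generated_text starts with the exact prompt
    string; if it does not, the full text is returned instead of a wrong slice.
    """
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


prompt = "<user>Argument text</user><assistant>"  # hypothetical, simplified template
full = prompt + " Is the expert really an expert in this domain?\n"
print(strip_prompt(prompt, full))  # Is the expert really an expert in this domain?
```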

In [None]:
schemas = [
    """'Cause to Effect' with the examples:
    How strong is the generalisation that if <eventA> then <eventB>?
    Are there other factors in this particular case that could have interfered with the event of ‘<eventB>’?
    """,
    """ 'Expert Opinion' with the examples:
    Is <expertE> a genuine expert in <domainD>?
    Did <expertE> really assert that <eventA>? Is <expertE>’s pronouncement directly quoted? If not, is a reference to the original source given? Can it be checked?
    If <expertE>’s advice is not quoted, does it look like important information or qualifications may have been left out?
    Is what <expertE> said clear? Are there technical terms used that are not explained clearly?
    Is <eventA> relevant to domain <domainD>?
    Is <eventA> consistent with what other experts in <domainD> say?
    Is <eventA> consistent with known evidence in <domainD>?
    """,
    """'Analogy' with the examples:
    Are <C1> and <C2> similar in the respect cited?
    Is <eventA> true in <C1>?
    Are there differences between <C1> and <C2> that would tend to undermine the force of the similarity cited?
    Is there some other case that is also similar to <C1>, but in which <eventA> is false?
    """,
    """'Fear Appeal' with the examples:
    Is <eventB> bad? Why and to whom is it bad?
    Is <eventA> a way to prevent <eventB>?
    Is it practically possible for <eventA> to happen?
    Are there other consequences from <eventA>?
    """
    ]
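Each schema string names an argumentation scheme and lists template critical questions whose slots, such as `<eventA>` or `<expertE>`, are written in angle brackets; the prompt leaves it to the model to fill them from the context. As an illustration only, the hypothetical helper `schema_slots` below lists the distinct slot names of a schema, which can help sanity-check the schema text or see which elements a generated question is expected to ground:

```python
import re

# Slots are written as <name> with word characters only, e.g. <eventA>, <C1>.
PLACEHOLDER = re.compile(r"<(\w+)>")


def schema_slots(schema_text: str) -> list[str]:
    """Return the distinct <placeholder> names in first-seen order."""
    seen: list[str] = []
    for name in PLACEHOLDER.findall(schema_text):
        if name not in seen:
            seen.append(name)
    return seen


analogy = """'Analogy' with the examples:
    Are <C1> and <C2> similar in the respect cited?
    Is <eventA> true in <C1>?
    """
print(schema_slots(analogy))  # ['C1', 'C2', 'eventA']
```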

In [None]:
import pandas as pd
import json

def process_dataset(input_csv, output_json):
    data = pd.read_csv(input_csv).head(5)  # trial run: first 5 rows only
    results = {}
    for index, row in data.iterrows():
        input_text = row['input']
        input_id = f'id_{index + 1}'

        cqs = [{'cq': generate_response(
            f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

            ### Instruction:
            Generate one critical question addressing the given context following the schema:

            ### Schema:
            {schema}

            Your answer is just the question without anything else.

            This is the given context to relate the question to:

            ### Context:
            {input_text}

            ### Response:
            """
            )} for schema in schemas]
        results[input_id] = {
            'input': input_text,
            'cqs': cqs
        }

    # Note: this hard-coded path overrides the output_json argument
    output_json = "/content/NLP2025_CQG/Evaluation/Results/output.json"
    # Save results to JSON file
    with open(output_json, 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

    print(f'Results saved to {output_json}')

# Run the processing
process_dataset(test_dataset_path, results_path)

Results saved to /content/NLP2025_CQG/Evaluation/Results/output.json
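The prompt in `process_dataset` is an indented triple-quoted f-string, so every template line reaches the model with leading whitespace. A minimal sketch of an alternative, assuming dedented prompts are preferable (this has not been checked against the baseline results): the template is dedented once with `textwrap.dedent` and then filled with `str.format`. `PROMPT_TEMPLATE`, `build_results` and `fake_generate` are hypothetical names, and the ids use `enumerate(..., start=1)`, which matches `index + 1` only for a default 0-based DataFrame index:

```python
import json
import os
import tempfile
import textwrap

# Hypothetical template mirroring the notebook prompt. dedent() runs on the
# template *before* filling it, because the inserted context is not indented.
PROMPT_TEMPLATE = textwrap.dedent("""\
    Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

    ### Instruction:
    Generate one critical question addressing the given context following the schema:

    ### Schema:
    {schema}

    Your answer is just the question without anything else.

    This is the given context to relate the question to:

    ### Context:
    {context}

    ### Response:
    """)


def build_results(inputs, schemas, generate):
    """Return {'id_1': {'input': ..., 'cqs': [{'cq': ...}, ...]}, ...}."""
    results = {}
    for i, text in enumerate(inputs, start=1):
        cqs = [{"cq": generate(PROMPT_TEMPLATE.format(schema=s, context=text))} for s in schemas]
        results[f"id_{i}"] = {"input": text, "cqs": cqs}
    return results


def fake_generate(prompt):
    """Stand-in for generate_response so the sketch runs without a model."""
    return "Is the cited source a genuine expert in this domain?"


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "output.json")
    res = build_results(["Sample argumentative text."], ["'Expert Opinion' with the examples:"], fake_generate)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(res, f, ensure_ascii=False, indent=2)
    print(list(res), len(res["id_1"]["cqs"]))  # ['id_1'] 1
```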


In [None]:
with open("/content/NLP2025_CQG/Evaluation/Results/output.json", 'r', encoding='utf-8') as f:
    content = f.read()
print(f"File content:\n{content}")

File content:
{
  "id_1": {
    "input": "CLINTON: \"which may prove to be an intelligence benefit\nwe've got to do everything we can to vacuum up intelligence from Europe, from the Middle East\nThat means we've got to work more closely with our allies, and that's something that Donald has been very dismissive of\nWe're working with NATO, the longest military alliance in the history of the world, to really turn our attention to terrorism\nWe're working with our friends in the Middle East, many of which, as you know, are Muslim majority nations\nDonald has consistently insulted Muslims abroad, Muslims at home, when we need to be cooperating with Muslim nations and with the American Muslim community\nThey're on the front lines\nThey can provide information to us that we might not get anywhere else\nThey need to have close working cooperation with law enforcement in these communities, not be alienated and pushed away as some of Donald's rhetoric, unfortunately, has led to\"",
    "cqs": [

## Commit & Push

In [None]:
!git config --global user.name "Showcas"
!git config --global user.email "cedric.bohni@gmx.de"


commit_message = "trial baseline"
!git add .
!git commit -m "{commit_message}"
!git push

[main 4cfbf0c] trial baseline
 2 files changed, 92 insertions(+)
 create mode 100644 Evaluation/Results/output.json
Enumerating objects: 12, done.
Counting objects: 100% (12/12), done.
Delta compression using up to 12 threads
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 3.86 KiB | 3.86 MiB/s, done.
Total 7 (delta 4), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
To https://github.com/RicoStaedeli/NLP2025_CQG.git
   d80e595..4cfbf0c  main -> main
