Part 0:
Initialize Hugging Face + import dataset



# Task
Provide the commands to use Hugging Face in Google Colab, including installation, authentication, model loading, and inference.

## Installation

### Subtask:
Install the necessary libraries for Hugging Face in Colab.


**Reasoning**:
Install the required libraries using pip.



In [1]:
%pip install transformers datasets accelerate evaluate

Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.6


## Authentication

### Subtask:
Authenticate to Hugging Face Hub.


**Reasoning**:
Import the necessary function for authentication and initiate the login process.



In [2]:
import os
from huggingface_hub import login
from google.colab import userdata

# Use the HUGGINGFACE_KEY from Colab's Secrets Manager for authentication
hf_token = userdata.get('HUGGINGFACE_KEY')
if hf_token:
    login(token=hf_token)
else:
    print("HUGGINGFACE_KEY not found in Colab secrets. Please add it to Colab secrets.")

TimeoutException: Requesting secret HUGGINGFACE_KEY timed out. Secrets can only be fetched when running from the Colab UI.
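
The timeout above means the secret lookup only works from an interactive Colab session. One possible workaround is to try several token sources in order and treat a failing source as "no token". The sketch below is a minimal illustration under that assumption: `resolve_token`, `failing_secret_store`, and the `EXAMPLE_HF_TOKEN` variable are hypothetical names, not part of the notebook or of `huggingface_hub`.

```python
import os

def resolve_token(sources):
    """Return the first non-empty token produced by the given callables.

    Each source is a zero-argument callable. Any exception it raises
    (for example a timeout from a secrets backend) is treated as "no token".
    """
    for source in sources:
        try:
            token = source()
        except Exception:
            continue
        if token:
            return token
    return None

def failing_secret_store():
    # Stand-in for a secrets lookup that times out outside the Colab UI
    raise TimeoutError("secret store unavailable")

# Placeholder value for the demonstration; not a real token
os.environ["EXAMPLE_HF_TOKEN"] = "hf_example_not_a_real_token"

token = resolve_token([
    failing_secret_store,
    lambda: os.environ.get("EXAMPLE_HF_TOKEN"),
])
print(token is not None)
```

In the notebook, the first source would be `lambda: userdata.get('HUGGINGFACE_KEY')`, and a non-empty result would be passed to `login(token=...)`.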

# Task
Load multiple JSON files from a shared Google Drive folder into a Hugging Face dataset. The JSON files are located in subfolders within "/content/drive/Shared drives/CSE472".

## Mount google drive

### Subtask:
Mount your Google Drive to access the shared folder.


**Reasoning**:
Mount Google Drive to access the shared folder containing the JSON files.



**Reasoning**:
Google Drive mounting failed due to credential propagation issues. This sometimes happens in Colab and can be resolved by trying to mount again.



In [None]:
from google.colab import drive

drive.mount('/content/drive', force_remount=False)

Mounted at /content/drive


## List files

### Subtask:
Find all the JSON files within the specified path and its subfolders.


**Reasoning**:
Find all JSON files within the specified path and its subfolders.



In [None]:
json_files = []
dataset_path = "/content/drive/Shared drives/CSE472"
for root, _, files in os.walk(dataset_path):
    for file in files:
        if file.endswith(".json"):
            json_files.append(os.path.join(root, file))

print(json_files)

['/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/69zn0n.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/1fujeg3.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/11qi515.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/azwj51.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/i0pe8m.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/i8bw63.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/hnewi2.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/h8ehbg.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/fkydqv.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/b62v0s.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/iiud5e.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/ch8dr5.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/h7g1qh.json', '/content/drive/Shared drives/CSE472/Dataset_v2/Game-47/emjz75.json', '/content/drive/S
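
The same recursive search can also be written with `pathlib`. This is only a sketch of an alternative, run here against a throwaway directory rather than the shared drive; `find_json_files` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

def find_json_files(root):
    """Return sorted paths of all .json files under root, including subfolders."""
    return sorted(str(p) for p in Path(root).rglob("*.json") if p.is_file())

# Demonstration on a temporary directory tree (not the real shared drive)
with tempfile.TemporaryDirectory() as tmp:
    game_dir = Path(tmp) / "Dataset_v2" / "Game-47"
    game_dir.mkdir(parents=True)
    (game_dir / "a.json").write_text("{}")
    (game_dir / "notes.txt").write_text("ignore me")
    found = find_json_files(tmp)
    found_names = [Path(p).name for p in found]
    print(found_names)
```

Sorting the result makes the file order reproducible between runs, which `os.walk` does not guarantee.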

**Reasoning**:
The variable `dataset_path` was not defined in the current execution context. Re-execute the cell that defines `dataset_path` and then find the JSON files.



## Load json files

### Subtask:
Read the content of each JSON file.


**Reasoning**:
Iterate through the list of JSON file paths, load the content of each file using `json.load`, and append the loaded data to a list.



## Load and flatten JSON data

### Subtask:
Read each JSON file and flatten the conversation thread into a chronological list of messages.

In [None]:
import json
import os

all_messages = []

def extract_messages(conversation_part):
    """Recursively extracts messages from nested replies."""
    messages = []
    if isinstance(conversation_part, dict):
        if 'Reply_Text' in conversation_part:
            messages.append(conversation_part)
        if 'Replies' in conversation_part:
            for reply in conversation_part['Replies']:
                messages.extend(extract_messages(reply))
    elif isinstance(conversation_part, list):
        for item in conversation_part:
            messages.extend(extract_messages(item))
    return messages


for file_path in json_files:
    try:
        with open(file_path, 'r') as f:
            data = json.load(f)
            if 'Threads' in data:
                for thread_item in data['Threads']:
                    all_messages.extend(extract_messages(thread_item))

    except FileNotFoundError:
        print(f"File not found: {file_path}")
    except json.JSONDecodeError:
        print(f"Error decoding JSON from file: {file_path}")
    except Exception as e:
        print(f"An error occurred while processing {file_path}: {e}")

print(f"Total number of messages loaded: {len(all_messages)}")

Total number of messages loaded: 9215
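
To make the flattening order concrete, here is a small worked example on a hand-written thread (the data is made up; only the `Reply_Text` and `Replies` keys come from the cell above). The function is repeated so the snippet runs on its own.

```python
def extract_messages(conversation_part):
    """Recursively extracts messages from nested replies (same logic as above)."""
    messages = []
    if isinstance(conversation_part, dict):
        if 'Reply_Text' in conversation_part:
            messages.append(conversation_part)
        if 'Replies' in conversation_part:
            for reply in conversation_part['Replies']:
                messages.extend(extract_messages(reply))
    elif isinstance(conversation_part, list):
        for item in conversation_part:
            messages.extend(extract_messages(item))
    return messages

# Hand-written example thread, not taken from the dataset
thread = {
    'Reply_Text': 'root',
    'Replies': [
        {'Reply_Text': 'child 1', 'Replies': [{'Reply_Text': 'grandchild'}]},
        {'Reply_Text': 'child 2'},
    ],
}

texts = [m['Reply_Text'] for m in extract_messages(thread)]
print(texts)  # ['root', 'child 1', 'grandchild', 'child 2']
```

The traversal is depth-first: a reply and all of its nested replies come before the next sibling. That matches chronological order only if the files store replies in posting order. Each returned dict also still carries its own nested `Replies`.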


## Set Up the LLM Model

### Subtask:
Use HuggingFace Transformers to load a model `meta-llama/Llama-3.2-3B-Instruct`.
Use `AutoTokenizer` and `AutoModelForCausalLM` from `transformers`.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "meta-llama/Llama-3.2-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto"
)

print(f"Model '{model_name}' loaded successfully.")

tokenizer_config.json:   0%|          | 0.00/54.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/296 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/878 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/20.9k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/1.46G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/189 [00:00<?, ?B/s]

Model 'meta-llama/Llama-3.2-3B-Instruct' loaded successfully.


Part 1: Generate Mediation Outputs (Judgment and Steering)

Load and prepare conversation data

In [None]:
# Prepare conversation data for mediation
import json
import random

def format_conversation(thread_messages):
    """Format a list of messages into a readable conversation string."""
    conversation = ""
    for i, msg in enumerate(thread_messages):
        author = msg.get('Reply_User', f'User_{i}')
        text = msg.get('Reply_Text', '')
        conversation += f"{author}: {text}\n\n"
    return conversation.strip()

# Group messages by post_id for processing
conversations_by_post = {}
for file_path in json_files:
    try:
        with open(file_path, 'r') as f:
            data = json.load(f)
            post_id = data.get('Post_id', os.path.basename(file_path).replace('.json', ''))

            if 'Threads' in data:
                thread_messages = extract_messages(data['Threads'])
                if len(thread_messages) >= 2:  # Only process conversations with at least 2 messages
                    conversations_by_post[post_id] = thread_messages
    except Exception as e:
        print(f"Error processing {file_path}: {e}")

print(f"Total conversations to process: {len(conversations_by_post)}")

Total conversations to process: 238
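
The judgment prompt used in the next step asks for a verdict phrased as "User A" or "User B", while `format_conversation` prints each author's original name. One way to line the two up, sketched here as an assumption rather than as part of the original pipeline, is to relabel authors in order of first appearance. `label_authors` and `format_labeled_conversation` are hypothetical helpers.

```python
from string import ascii_uppercase

def label_authors(thread_messages):
    """Map each distinct 'Reply_User' to 'User A', 'User B', ... by first appearance."""
    labels = {}
    for i, msg in enumerate(thread_messages):
        author = msg.get('Reply_User', f'User_{i}')
        if author not in labels:
            n = len(labels)
            # Fall back to a numeric label after 26 distinct authors
            labels[author] = f"User {ascii_uppercase[n]}" if n < 26 else f"User {n + 1}"
    return labels

def format_labeled_conversation(thread_messages):
    """Format messages like format_conversation, but with anonymized labels."""
    labels = label_authors(thread_messages)
    lines = []
    for i, msg in enumerate(thread_messages):
        author = msg.get('Reply_User', f'User_{i}')
        lines.append(f"{labels[author]}: {msg.get('Reply_Text', '')}")
    return "\n\n".join(lines)

# Made-up messages for illustration
example = [
    {'Reply_User': 'alice', 'Reply_Text': 'First point.'},
    {'Reply_User': 'bob', 'Reply_Text': 'I disagree.'},
    {'Reply_User': 'alice', 'Reply_Text': 'Here is why.'},
]
labeled = format_labeled_conversation(example)
print(labeled)
```

Threads with more than two authors get further labels (User C and so on), which the two-party wording of the judgment prompt does not cover.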


Define prompt templates

In [None]:
# Judgment Prompt Template
JUDGMENT_PROMPT = """You are a neutral moderator analyzing a heated argument between two users. Your task is to provide an objective judgment.

Conversation:
{conversation_text}

Instructions:
1. Identify the main points of the disagreement.
2. Determine which participant (if any) displays more reason, evidence, and respectful conduct.
3. Provide a clear and fair verdict (e.g., "User A is more reasonable", "User B is more reasonable", or "Both sides share responsibility").
4. Clearly explain **why** in 3-6 sentences. Focus only on reasoning and tone displayed in the conversation.

Output Format (strict JSON):
{{
  "verdict": "<User A | User B | Both>",
  "rationale": "<Your explanation>"
}}"""

# Steering Prompt Template
STEERING_PROMPT = """You are a conflict mediator. Your goal is to calm tension and encourage productive communication.

Conversation:
{conversation_text}

Instructions:
Write a single message (not a judgment) addressed to both participants that:
- Acknowledges both perspectives respectfully
- Avoids taking sides or assigning blame
- Encourages shared understanding, clarity, and constructive dialogue
- Uses calm, empathetic language
- Is 3–6 sentences long

Output Format:
<Your de-escalation message here>"""
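
Both templates rely on `str.format`, so the literal JSON braces are written as `{{` and `}}`. The short example below, using a cut-down stand-in template rather than the real prompts, shows what a single `format` call produces and why formatting the result a second time fails.

```python
# Cut-down stand-in for the prompt templates above
TEMPLATE = """Conversation:
{conversation_text}

Output Format (strict JSON):
{{
  "verdict": "<User A | User B | Both>"
}}"""

# One pass: the placeholder is filled and "{{" / "}}" collapse to single braces
rendered = TEMPLATE.format(conversation_text="A: hi\n\nB: hello")
print(rendered)

# A second pass now sees the JSON braces as a replacement field and fails
try:
    rendered.format(conversation_text="ignored")
    second_pass_failed = False
except (KeyError, ValueError, IndexError):
    second_pass_failed = True
print(second_pass_failed)
```

This matters for any template that is filled in two stages: the braces have to be re-escaped after the first pass, or all fields have to be supplied in one call.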

Generate mediation outputs

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def generate_mediation(prompt, max_new_tokens=512):
    """Generate mediation output using the LLM."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens. Slicing by token count avoids
    # misalignment when the decoded prompt differs from the original string.
    prompt_length = inputs['input_ids'].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
    return response

# Generate mediations for all conversations
mediation_results = []

# Limit to first 50 conversations for testing (remove limit for full run)
sample_conversations = dict(list(conversations_by_post.items())[:50])

for post_id, thread_messages in sample_conversations.items():
    print(f"Processing conversation: {post_id}")

    conversation_text = format_conversation(thread_messages)

    # Generate judgment
    judgment_prompt = JUDGMENT_PROMPT.format(conversation_text=conversation_text)
    judgment_output = generate_mediation(judgment_prompt, max_new_tokens=256)

    # Generate steering
    steering_prompt = STEERING_PROMPT.format(conversation_text=conversation_text)
    steering_output = generate_mediation(steering_prompt, max_new_tokens=256)

    mediation_results.append({
        'post_id': post_id,
        'conversation': conversation_text,
        'judgment': judgment_output,
        'steering': steering_output
    })

print(f"Generated mediations for {len(mediation_results)} conversations")

Processing conversation: 69zn0n


OutOfMemoryError: CUDA out of memory. Tried to allocate 62.24 GiB. GPU 0 has a total capacity of 14.74 GiB of which 2.93 GiB is free. Process 4046 has 11.80 GiB memory in use. Of the allocated memory 10.87 GiB is allocated by PyTorch, and 832.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
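
The failure occurs on the first conversation, whose entire thread is placed in a single prompt, so an over-long input is a plausible cause. A possible mitigation, sketched below as an assumption and not verified against this run, is to cap the conversation before it is formatted into the prompt. `truncate_messages` and the `max_chars` value are hypothetical, and characters are only a rough proxy for tokens.

```python
def truncate_messages(formatted_messages, max_chars=4000):
    """Keep the most recent messages whose combined length fits within max_chars."""
    kept = []
    total = 0
    for text in reversed(formatted_messages):
        cost = len(text) + 2  # +2 for the blank-line separator
        if kept and total + cost > max_chars:
            break
        kept.append(text[:max_chars])  # a single oversized message is cut short
        total += min(cost, max_chars)
    return "\n\n".join(reversed(kept))

# Ten synthetic messages of 58 characters each
messages = [f"User_{i}: " + "x" * 50 for i in range(10)]
short = truncate_messages(messages, max_chars=200)
print(len(short), short.count("\n\n") + 1)
```

Keeping the tail of the thread favors the latest exchange. Whether the opening post should be kept as well is a design choice this sketch leaves open.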

Save results from part 1

In [None]:
# Save mediation results
output_file = '/content/mediation_outputs.json'

with open(output_file, 'w') as f:
    json.dump(mediation_results, f, indent=2)

print(f"Mediation results saved to {output_file}")

# Display sample result
if mediation_results:
    print("\n=== Sample Result ===")
    sample = mediation_results[0]
    print(f"Post ID: {sample['post_id']}")
    print(f"\nJudgment: {sample['judgment'][:200]}...")
    print(f"\nSteering: {sample['steering'][:200]}...")

Part 2: LLM-as-a-Judge Evaluation

In [None]:
# Load the mediation outputs from Part 1
with open('/content/mediation_outputs.json', 'r') as f:
    mediation_data = json.load(f)

print(f"Loaded {len(mediation_data)} mediation outputs for evaluation")

Setup Judge Model

In [None]:
# Load Qwen judge model
judge_model_name = "Qwen/Qwen2.5-3B-Instruct"  # Using available model

# Free up memory from previous model if needed
if 'model' in locals():
    del model
    torch.cuda.empty_cache()

judge_tokenizer = AutoTokenizer.from_pretrained(judge_model_name)
judge_model = AutoModelForCausalLM.from_pretrained(
    judge_model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

print(f"Judge model '{judge_model_name}' loaded successfully.")

Judge Eval Prompt

In [None]:
EVALUATION_PROMPT = """You are acting as an impartial evaluation judge. You will score the quality of a mediation response.

Original Conversation:
{conversation_text}

Mediation Output to Evaluate:
{mediation_text}

Instructions:
Evaluate the mediation output on:
- Clarity
- Fairness / Neutrality
- Constructiveness (ability to de-escalate conflict)
- Alignment with the conversation context

Provide:
1. A short rationale explaining your evaluation.
2. A numeric score from 1 to 5:
   1 = Poor (biased, unclear, unhelpful)
   3 = Moderate (partially reasonable, needs improvement)
   5 = Excellent (clear, fair, constructive, context-aware)

Output Format (strict JSON):
{{
  "rationale": "<Your reasoning>",
  "score": <number>
}}"""

Function to extract rationale and score

In [None]:
import re

def extract_evaluation(generated_text):
    """Extract rationale and score from LLM output."""
    try:
        # Try to parse as JSON first
        # Look for JSON-like structure in the text
        json_match = re.search(r'\{[^}]+\}', generated_text, re.DOTALL)
        if json_match:
            json_str = json_match.group()
            eval_dict = json.loads(json_str)
            return {
                'rationale': eval_dict.get('rationale', ''),
                'score': int(eval_dict.get('score', 3))
            }
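    except (json.JSONDecodeError, ValueError, TypeError):
        # Malformed JSON or a non-numeric score: fall through to the regex fallback
        pass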

    # Fallback: extract using regex
    rationale_match = re.search(r'"rationale":\s*"([^"]+)"', generated_text)
    score_match = re.search(r'"score":\s*(\d+)', generated_text)

    rationale = rationale_match.group(1) if rationale_match else "Unable to extract rationale"
    score = int(score_match.group(1)) if score_match else 3

    return {'rationale': rationale, 'score': score}

def generate_evaluation(prompt, max_new_tokens=384):
    """Generate evaluation using the judge LLM."""
    inputs = judge_tokenizer(prompt, return_tensors="pt").to(judge_model.device)

    with torch.no_grad():
        outputs = judge_model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.3,  # Lower temperature for more consistent evaluation
            do_sample=True,
            pad_token_id=judge_tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens rather than slicing the decoded string
    prompt_length = inputs['input_ids'].shape[1]
    response = judge_tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
    return response
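
The regex in `extract_evaluation` stops at the first `}`, and a missing score silently becomes 3. As a hedged alternative, the sketch below scans for the first decodable JSON object with `json.JSONDecoder.raw_decode` and reports a missing or out-of-range score as `None`. `find_first_json_object` and `parse_evaluation` are hypothetical names, and the sample text is made up.

```python
import json

def find_first_json_object(text):
    """Return the first decodable JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None

def parse_evaluation(text, min_score=1, max_score=5):
    """Return (rationale, score); score is None when missing or out of range."""
    obj = find_first_json_object(text) or {}
    rationale = str(obj.get("rationale", ""))
    try:
        score = int(obj.get("score"))
    except (TypeError, ValueError):
        score = None
    if score is not None and not (min_score <= score <= max_score):
        score = None
    return rationale, score

# Made-up judge output with braces inside the rationale string
raw = 'Sure! Here is my evaluation:\n{"rationale": "Neutral and clear {mostly}.", "score": 4}\nThanks.'
parsed = parse_evaluation(raw)
missing = parse_evaluation("no json here")
print(parsed)
print(missing)
```

Returning `None` lets later averages exclude unparsed outputs, so they are not counted as a neutral 3.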

Running the eval

In [None]:
# Evaluate all mediation outputs
evaluation_results = []

for item in mediation_data:
    print(f"Evaluating: {item['post_id']}")

    # Evaluate judgment
    judgment_eval_prompt = EVALUATION_PROMPT.format(
        conversation_text=item['conversation'][:1000],  # Truncate to avoid token limits
        mediation_text=item['judgment']
    )
    judgment_eval_raw = generate_evaluation(judgment_eval_prompt)
    judgment_eval = extract_evaluation(judgment_eval_raw)

    # Evaluate steering
    steering_eval_prompt = EVALUATION_PROMPT.format(
        conversation_text=item['conversation'][:1000],
        mediation_text=item['steering']
    )
    steering_eval_raw = generate_evaluation(steering_eval_prompt)
    steering_eval = extract_evaluation(steering_eval_raw)

    evaluation_results.append({
        'post_id': item['post_id'],
        'judgment_evaluation': {
            'rationale': judgment_eval['rationale'],
            'score': judgment_eval['score']
        },
        'steering_evaluation': {
            'rationale': steering_eval['rationale'],
            'score': steering_eval['score']
        }
    })

print(f"Completed evaluations for {len(evaluation_results)} conversations")

Saving Part 2 results

In [None]:
# Save evaluation results
eval_output_file = '/content/evaluation_results.json'

with open(eval_output_file, 'w') as f:
    json.dump(evaluation_results, f, indent=2)

print(f"Evaluation results saved to {eval_output_file}")

# Display sample evaluation
if evaluation_results:
    print("\n=== Sample Evaluation ===")
    sample = evaluation_results[0]
    print(f"Post ID: {sample['post_id']}")
    print(f"\nJudgment Score: {sample['judgment_evaluation']['score']}")
    print(f"Judgment Rationale: {sample['judgment_evaluation']['rationale'][:150]}...")
    print(f"\nSteering Score: {sample['steering_evaluation']['score']}")
    print(f"Steering Rationale: {sample['steering_evaluation']['rationale'][:150]}...")

Part 3: Advanced Prompting Strategies

Part 3A: Demonstration (Few-Shot) Prompting

In [None]:
# Define few-shot examples (manually created high-quality examples)
FEW_SHOT_EXAMPLES = [
    {
        'conversation': """User_A: Your argument makes no sense. You clearly don't understand basic economics.

User_B: Actually, if you'd read any reputable source, you'd know I'm right. Your stubbornness is showing.""",
        'mediation': """I appreciate both of you sharing your perspectives on this economic topic. User_A, you've raised valid concerns about the argument's clarity. User_B, you've pointed to external sources to support your view. Rather than questioning each other's understanding, could we focus on the specific economic principles at hand? Perhaps each of you could cite one key source or example that supports your position, so we can have a more productive discussion.""",
        'rationale': "This mediation effectively acknowledges both participants' points without taking sides. It redirects from personal attacks to substantive discussion by suggesting concrete next steps (citing sources). The tone is calm and constructive, encouraging evidence-based dialogue.",
        'score': 5
    },
    {
        'conversation': """User_C: This policy is ridiculous and anyone who supports it is an idiot.

User_D: Well, you're clearly too biased to see the benefits. Typical.""",
        'mediation': """Let's take a step back from the personal characterizations. Both of you seem passionate about this policy issue. User_C, could you explain what specific aspects of the policy concern you? User_D, what benefits do you see? Understanding the concrete pros and cons might help us move forward.""",
        'rationale': "The mediation acknowledges passion but doesn't fully address the hostile language used. While it attempts redirection toward substance, it could be more explicit about the need for respectful discourse. The suggestion to discuss specifics is good but feels somewhat generic.",
        'score': 3
    }
]

# Few-shot evaluation prompt template
FEW_SHOT_EVAL_PROMPT = """You are an impartial evaluation judge. Below are examples of high-quality evaluations:

Example 1:
Conversation:
{example1_conv}
Mediation:
{example1_med}
Evaluation Output:
{{
  "rationale": "{example1_rationale}",
  "score": {example1_score}
}}

Example 2:
Conversation:
{example2_conv}
Mediation:
{example2_med}
Evaluation Output:
{{
  "rationale": "{example2_rationale}",
  "score": {example2_score}
}}

Now evaluate the following mediation using the same criteria and style:

Conversation:
{conversation_text}

Mediation Output:
{mediation_text}

Output Format (strict JSON):
{{
  "rationale": "<Your reasoning>",
  "score": <number>
}}"""

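# Format the few-shot prompt.
# str.format collapses "{{" to "{", so after filling in the examples the JSON
# braces are re-escaped and the two remaining placeholders are restored.
# Without this, the later .format(conversation_text=..., mediation_text=...)
# call would treat the JSON braces as replacement fields and raise an error.
_filled = FEW_SHOT_EVAL_PROMPT.format(
    example1_conv=FEW_SHOT_EXAMPLES[0]['conversation'],
    example1_med=FEW_SHOT_EXAMPLES[0]['mediation'],
    example1_rationale=FEW_SHOT_EXAMPLES[0]['rationale'],
    example1_score=FEW_SHOT_EXAMPLES[0]['score'],
    example2_conv=FEW_SHOT_EXAMPLES[1]['conversation'],
    example2_med=FEW_SHOT_EXAMPLES[1]['mediation'],
    example2_rationale=FEW_SHOT_EXAMPLES[1]['rationale'],
    example2_score=FEW_SHOT_EXAMPLES[1]['score'],
    conversation_text="<<CONVERSATION_TEXT>>",
    mediation_text="<<MEDIATION_TEXT>>"
)
few_shot_prompt_template = (
    _filled.replace("{", "{{").replace("}", "}}")
    .replace("<<CONVERSATION_TEXT>>", "{conversation_text}")
    .replace("<<MEDIATION_TEXT>>", "{mediation_text}")
)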

Run few-shot evaluation

In [None]:
# Evaluate using few-shot prompting
fewshot_results = []

# Evaluate subset of conversations
for item in mediation_data[:20]:  # Evaluate first 20
    print(f"Few-shot evaluating: {item['post_id']}")

    # Evaluate steering with few-shot
    fewshot_prompt = few_shot_prompt_template.format(
        conversation_text=item['conversation'][:800],
        mediation_text=item['steering']
    )

    fewshot_eval_raw = generate_evaluation(fewshot_prompt, max_new_tokens=384)
    fewshot_eval = extract_evaluation(fewshot_eval_raw)

    fewshot_results.append({
        'post_id': item['post_id'],
        'method': 'few-shot',
        'evaluation': fewshot_eval
    })

print(f"Completed few-shot evaluations for {len(fewshot_results)} conversations")

Part 3B: Rule-Based / Rubric-Guided Evaluation

In [None]:
# Rule-based evaluation prompt with explicit rubric
RUBRIC_EVAL_PROMPT = """You are an impartial evaluation judge. Use the following rubric to evaluate the mediation:

Rubric Criteria:
- **Neutrality (0–2 points):** Does not take sides; avoids accusatory language.
- **Clarity (0–1 point):** Message is understandable, well-structured, and concise.
- **Constructiveness (0–2 points):** Encourages calm, collaboration, and future progress.

Total Score = Neutrality + Clarity + Constructiveness (range 0–5)

Now evaluate the mediation below according to the rubric.

Conversation:
{conversation_text}

Mediation Output:
{mediation_text}

Output Format (strict JSON):
{{
  "rationale": "<Explain how each rubric category applies>",
  "score": <0-5>
}}"""
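
The rubric arithmetic can be checked outside the model. The sketch below assumes the judge reports the three sub-scores separately, which the prompt above does not request (it asks only for the total), so `RUBRIC_MAX` and `total_rubric_score` are hypothetical additions.

```python
# Maximum points per criterion, taken from the rubric text above
RUBRIC_MAX = {"neutrality": 2, "clarity": 1, "constructiveness": 2}

def total_rubric_score(sub_scores):
    """Validate each sub-score against its maximum and return the total (0-5)."""
    total = 0
    for name, maximum in RUBRIC_MAX.items():
        value = sub_scores[name]
        if not 0 <= value <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}, got {value}")
        total += value
    return total

example_total = total_rubric_score(
    {"neutrality": 2, "clarity": 1, "constructiveness": 1}
)
print(example_total)  # 4
```

The rubric total ranges from 0 to 5, whereas the Part 2 prompt uses a 1 to 5 scale, so averages from the two are not on identical scales.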

Run rule based eval

In [None]:
# Evaluate using rubric-based prompting
rubric_results = []

for item in mediation_data[:20]:  # Evaluate first 20
    print(f"Rubric evaluating: {item['post_id']}")

    # Evaluate steering with rubric
    rubric_prompt = RUBRIC_EVAL_PROMPT.format(
        conversation_text=item['conversation'][:800],
        mediation_text=item['steering']
    )

    rubric_eval_raw = generate_evaluation(rubric_prompt, max_new_tokens=384)
    rubric_eval = extract_evaluation(rubric_eval_raw)

    rubric_results.append({
        'post_id': item['post_id'],
        'method': 'rubric-based',
        'evaluation': rubric_eval
    })

print(f"Completed rubric evaluations for {len(rubric_results)} conversations")

Save Part 3 results and compare methods

In [None]:
# Save advanced evaluation results
advanced_eval_file = '/content/advanced_evaluation_results.json'

advanced_results = {
    'few_shot': fewshot_results,
    'rubric_based': rubric_results
}

with open(advanced_eval_file, 'w') as f:
    json.dump(advanced_results, f, indent=2)

print(f"Advanced evaluation results saved to {advanced_eval_file}")

# Compare evaluation methods
print("\n=== Comparison of Evaluation Methods ===")
if fewshot_results and rubric_results:  # Avoid dividing by zero when nothing was evaluated
    print(f"Few-Shot Average Score: {sum(r['evaluation']['score'] for r in fewshot_results) / len(fewshot_results):.2f}")
    print(f"Rubric-Based Average Score: {sum(r['evaluation']['score'] for r in rubric_results) / len(rubric_results):.2f}")

# Display sample comparison
if fewshot_results and rubric_results:
    print("\n=== Sample Comparison ===")
    post_id = fewshot_results[0]['post_id']
    print(f"Post ID: {post_id}")
    print(f"\nFew-Shot Score: {fewshot_results[0]['evaluation']['score']}")
    print(f"Few-Shot Rationale: {fewshot_results[0]['evaluation']['rationale'][:150]}...")
    print(f"\nRubric Score: {rubric_results[0]['evaluation']['score']}")
    print(f"Rubric Rationale: {rubric_results[0]['evaluation']['rationale'][:150]}...")
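
Averages alone hide per-conversation disagreement between the two methods. A minimal sketch of a paired comparison follows, assuming both result lists use the structure built above. `compare_methods` is a hypothetical helper, and the scores are made up for illustration, not results from this notebook.

```python
from statistics import mean

def compare_methods(results_a, results_b):
    """Pair results by post_id and summarize the two score lists."""
    scores_b = {r['post_id']: r['evaluation']['score'] for r in results_b}
    pairs = [
        (r['evaluation']['score'], scores_b[r['post_id']])
        for r in results_a
        if r['post_id'] in scores_b
    ]
    if not pairs:
        return None  # nothing to compare
    a_scores, b_scores = zip(*pairs)
    return {
        'n': len(pairs),
        'mean_a': mean(a_scores),
        'mean_b': mean(b_scores),
        'mean_diff': mean(a - b for a, b in pairs),
    }

# Made-up scores for illustration only
demo_fewshot = [
    {'post_id': 'p1', 'evaluation': {'score': 4}},
    {'post_id': 'p2', 'evaluation': {'score': 2}},
]
demo_rubric = [
    {'post_id': 'p1', 'evaluation': {'score': 3}},
    {'post_id': 'p2', 'evaluation': {'score': 3}},
]
summary = compare_methods(demo_fewshot, demo_rubric)
print(summary)
```

In this made-up case both means are equal even though the methods disagree on every conversation, which is why looking at paired scores can be informative.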