# Policy Simulation for A/B/C using `comments.csv` (sample_id/thread_id/text)

This notebook:
- Loads `../data/comments.csv` with columns:
  - `sample_id`, `thread_id`, `role`, `order_in_thread`, `text`
- Applies three moderation policies (A, B, C) to each `text` using an LLM
- Saves the results to `../results/comments_with_policy_results.csv`

It assumes:
- This notebook is located in `notebooks/`
- There is a `data/` folder in the project root (a sibling of `notebooks/`) containing `comments.csv`
- There is a `.env` file in the project root with `OPENAI_API_KEY=...`
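
The layout assumptions above can be checked before any API calls are made. The following is a minimal sketch, not part of the original notebook; `check_layout` is a hypothetical helper, and the file names it looks for are the ones listed above.

```python
from pathlib import Path


def check_layout(project_root: Path) -> list[str]:
    """Return a list of human-readable problems; empty if the layout looks right."""
    problems = []
    # The input CSV is expected under <project root>/data/
    if not (project_root / "data" / "comments.csv").is_file():
        problems.append("missing data/comments.csv")
    # The .env file is expected directly in the project root
    if not (project_root / ".env").is_file():
        problems.append("missing .env (expected to define OPENAI_API_KEY)")
    return problems
```

When the working directory is `notebooks/`, it could be called as `check_layout(Path("..").resolve())`, and any returned messages printed before continuing.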


In [1]:
# !pip install pandas tqdm python-dotenv openai

import os
import json
import time
from pathlib import Path

import pandas as pd
from tqdm import tqdm
from dotenv import load_dotenv
from openai import OpenAI


In [2]:
# Load environment variables (expects .env with OPENAI_API_KEY)
load_dotenv()

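# Fail early with a clear message instead of sending requests with an empty key
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file in the project root.")

CLIENT = OpenAI(api_key=api_key)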

PROJECT_ROOT = Path("..").resolve()
DATA_DIR = PROJECT_ROOT / "data"
RESULTS_DIR = PROJECT_ROOT / "results"

DATA_DIR.mkdir(parents=True, exist_ok=True)
RESULTS_DIR.mkdir(parents=True, exist_ok=True)

COMMENTS_CSV = DATA_DIR / "comments.csv"
OUTPUT_CSV = RESULTS_DIR / "comments_with_policy_results.csv"

print("Project root:", PROJECT_ROOT)
print("Comments CSV:", COMMENTS_CSV)
print("Output CSV  :", OUTPUT_CSV)


Project root: C:\Users\Gibeom Kim\Desktop\UnderGraduate\3. junior\techno_science
Comments CSV: C:\Users\Gibeom Kim\Desktop\UnderGraduate\3. junior\techno_science\data\comments.csv
Output CSV  : C:\Users\Gibeom Kim\Desktop\UnderGraduate\3. junior\techno_science\results\comments_with_policy_results.csv


In [None]:
import os
from dotenv import load_dotenv

load_dotenv()  # read .env

# Confirm that the key was loaded without printing the secret itself
key = os.getenv("OPENAI_API_KEY")
print("OPENAI_API_KEY loaded:", bool(key))


In [4]:
resp = CLIENT.chat.completions.create(
    model="gpt-5.1",  # alternative: "gpt-4.1-mini"
    # The Korean message says: "This is a test message for the API key."
    messages=[{"role": "user", "content": "이건 API 키 테스트용 메시지입니다."}],
    temperature=0.0,
)

print(resp.choices[0].message.content)


알겠습니다. API 키 테스트용 메시지 잘 전달되었습니다.


In [5]:
# The Korean subtitle means "manners come first".
POLICY_A_TEXT = """Policy A: High Protection, Low Tolerance (매너 최우선)

- BLOCK if:
  - Any direct insult or derogatory expression toward a person or group
    (예: "멍청이", "병x", "저년", "틀딱" 등).
  - Swearing or profanity directed at a person, group, or their work
    (예: "씨발", "좆같은 과제", "저 교수 또 ㅈ같이 하네").
  - Sarcastic or mocking tone that targets a person’s character, ability, or identity.
  - Shaming of appearance, background, academic ability, or mental health.
  - Any hate speech or slur toward social groups (gender, region, race, sexuality, etc.).
  - Any threat, doxxing, or suggestion of harm.

- WARN_AND_ALLOW if:
  - Strong criticism of ideas, policies, or institutions without personal insult.
  - Rude words aimed at a system or policy, not at a person.

- ALLOW if:
  - Polite or neutral comments.
  - Constructive criticism with respectful language.
  - Jokes or slang that are not targeted and not hateful.

Principle: When in doubt, prioritize emotional safety and manner
even if some harsh but meaningful criticism is lost.
""".strip()

# The Korean subtitle means "balance between manners and debate".
POLICY_B_TEXT = """Policy B: Balanced (매너와 토론의 균형)

- BLOCK if:
  - Direct, explicit personal attack on a specific individual using insults or slurs.
  - Hate speech toward a protected/vulnerable group (gender, sexuality, race, disability, region, etc.).
  - Threats of violence, doxxing, or calls for harassment.

- WARN_AND_ALLOW if:
  - Strong, harsh expressions that mainly target ideas, policies, classes, or systems.
  - Profanity is present but not clearly directed at a specific person.
  - Mild sarcasm but focused on a public issue (tuition, dorm policy, grading, etc.).

- ALLOW if:
  - Criticism is expressed in relatively respectful language.
  - Comment contributes arguments, experiences, or information.
  - Emotional expressions that are not direct harassment or hate speech.

Principle: When in doubt, preserve debate value if the target is
an idea or policy, but never allow direct harassment or hate speech.
""".strip()

# The Korean subtitle means "minimal regulation, freedom of expression first".
POLICY_C_TEXT = """Policy C: Minimal Regulation, High Freedom (최소 규제, 표현의 자유 우선)

- BLOCK if:
  - Severe insult or slur naming a person or clearly identifiable small group,
    combined with clear hostility or harm.
  - Clear hate speech toward protected groups using abusive slurs.
  - Explicit threats of violence, doxxing, or incitement to harassment.

- WARN_AND_ALLOW if:
  - Harsh, emotional language with insults or profanity that is not an explicit threat
    and not targeted at a protected group.
  - Rude mockery that is uncomfortable but not clearly hateful or threatening.

- ALLOW if:
  - Almost any criticism, even if rude, as long as:
    - No explicit threats.
    - No hate speech.
    - No doxxing.

Principle: When in doubt, err on the side of freedom of expression,
and only block content that is clearly hateful or threatening.
""".strip()

POLICIES = {
    "A": POLICY_A_TEXT,
    "B": POLICY_B_TEXT,
    "C": POLICY_C_TEXT,
}


In [6]:
def build_prompt(comment_text: str) -> str:
    """Build a single prompt that explains policies A/B/C
    and asks the model to output a strict JSON result."""
    prompt = f"""You are a content moderator for a Korean university online forum.
Your job is to apply three different moderation policies (A, B, C)
to the same comment and show how each policy would treat the comment.

Here is the definition of each policy:

[Policy A]
{POLICIES['A']}

[Policy B]
{POLICIES['B']}

[Policy C]
{POLICIES['C']}

You must strictly follow each policy as written, not your personal opinion.

For each policy, you MUST output:
- decision: one of [BLOCK, WARN_AND_ALLOW, ALLOW]
- short_reason: brief explanation in 1-2 sentences focusing on which rule applies.
- tone_score: integer 1-5 (1 = very polite, 5 = extremely aggressive)
- debate_value_score: integer 1-5 (1 = no argument or information, 5 = high contribution to debate)

Here is the comment from the forum (in Korean):

"{comment_text}"

Please respond in the following JSON format only:

{{
  "A": {{
    "decision": "...",
    "short_reason": "...",
    "tone_score": ...,
    "debate_value_score": ...
  }},
  "B": {{
    "decision": "...",
    "short_reason": "...",
    "tone_score": ...,
    "debate_value_score": ...
  }},
  "C": {{
    "decision": "...",
    "short_reason": "...",
    "tone_score": ...,
    "debate_value_score": ...
  }}
}}
"""  # noqa: E501
    return prompt.strip()
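
`build_prompt` wraps the comment in plain double quotes, so a comment that itself contains quotes or line breaks can blur where the comment ends. One possible way to make the boundary unambiguous is to serialize the text with `json.dumps` before inserting it. This is only a sketch; `quote_comment` is a hypothetical helper and is not used by the notebook as written.

```python
import json


def quote_comment(comment_text: str) -> str:
    """Return the comment as a JSON string literal, surrounding quotes included.

    ensure_ascii=False keeps Korean characters readable instead of
    replacing them with ASCII escape sequences.
    """
    return json.dumps(comment_text, ensure_ascii=False)


# A comment containing both a quote character and a line break
sample = 'He said "no"\nand left'
print(quote_comment(sample))
```

If this were adopted, the f-string in `build_prompt` would insert `{quote_comment(comment_text)}` without adding its own surrounding quotes.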


In [7]:
def call_llm(prompt: str, max_retries: int = 3, sleep_sec: float = 2.0) -> str:
    """Call the OpenAI Chat Completions API and return the model's text output.

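    - Uses `gpt-4.1-mini` by default (inexpensive and strong enough for this task).
    - Temperature is fixed at 0.0 to make outputs as repeatable as possible;
      this reduces, but does not eliminate, run-to-run variation.
    - Retries up to `max_retries` times, waiting `sleep_sec` seconds between attempts.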
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            response = CLIENT.chat.completions.create(
                model="gpt-4.1-mini",
                messages=[
                    {"role": "user", "content": prompt}
                ],
                temperature=0.0,
            )
            text = response.choices[0].message.content
            return text.strip()
        except Exception as e:
            last_error = e
            print(f"[call_llm] Error on attempt {attempt}: {e}")
            if attempt < max_retries:  # no need to wait after the final attempt
                time.sleep(sleep_sec)

    raise RuntimeError(
        f"LLM call failed after {max_retries} attempts. Last error: {last_error}"
    ) from last_error
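
`call_llm` waits a fixed `sleep_sec` between attempts. For rate-limit errors, an increasing delay is a common alternative. The following is a minimal sketch of that retry pattern, independent of the OpenAI client. `retry_with_backoff` and its parameters are hypothetical names, and the injectable `sleep` argument exists only so that the delays can be observed without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func() until it succeeds or max_retries attempts have failed.

    The delay doubles after each failure (base_delay, 2 * base_delay, ...),
    and no delay is added after the final failed attempt.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except Exception as e:  # broad on purpose, mirroring call_llm
            last_error = e
            if attempt < max_retries:
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"Call failed after {max_retries} attempts. Last error: {last_error}"
    )
```

Any zero-argument callable can be passed in, for example a `lambda` wrapping the `CLIENT.chat.completions.create(...)` request.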


In [8]:
def parse_policy_result(raw_text: str) -> dict:
    """Parse the LLM's raw text into a JSON dict.

    - Assumes the model tries to output valid JSON.
    - Handles common patterns like ```json ... ``` wrappers.
    """
    raw_text = raw_text.strip()

    # Handle fenced code blocks
    if raw_text.startswith("```"):
        lines = raw_text.splitlines()
        if lines and lines[0].startswith("```"):
            lines = lines[1:]
        if lines and lines[-1].startswith("```"):
            lines = lines[:-1]
        raw_text = "\n".join(lines).strip()
        if raw_text.lower().startswith("json"):
            raw_text = raw_text[4:].strip()

    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        try:
            start = raw_text.index("{")
            end = raw_text.rindex("}") + 1
            snippet = raw_text[start:end]
            return json.loads(snippet)
        except Exception as e:
            # Chain the original exception so the traceback shows the root cause
            raise ValueError(f"JSON parsing failed. Raw text:\n{raw_text}\nError: {e}") from e
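
`parse_policy_result` only guarantees syntactically valid JSON; it does not check that the model returned the labels and score ranges the prompt asked for. A minimal sketch of such a check follows, assuming the schema described in `build_prompt`. `validate_policy_result`, `ALLOWED_DECISIONS`, and `SCORE_FIELDS` are hypothetical names that do not appear in the notebook.

```python
# Labels and score fields taken from the prompt in build_prompt
ALLOWED_DECISIONS = ("BLOCK", "WARN_AND_ALLOW", "ALLOW")
SCORE_FIELDS = ("tone_score", "debate_value_score")


def validate_policy_result(parsed: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the result is usable."""
    problems = []
    for policy_key in ("A", "B", "C"):
        entry = parsed.get(policy_key)
        if not isinstance(entry, dict):
            problems.append(f"{policy_key}: missing or not an object")
            continue
        if entry.get("decision") not in ALLOWED_DECISIONS:
            problems.append(f"{policy_key}: unexpected decision {entry.get('decision')!r}")
        for field in SCORE_FIELDS:
            score = entry.get(field)
            # bool is a subclass of int, so it is excluded explicitly
            if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 5:
                problems.append(f"{policy_key}: {field} must be an integer 1-5, got {score!r}")
    return problems
```

A non-empty return value could be logged next to the `sample_id`, or used as a reason to retry the request for that comment.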


In [9]:
df = pd.read_csv(COMMENTS_CSV)

expected_cols = {"sample_id", "thread_id", "role", "order_in_thread", "text"}
missing_cols = expected_cols - set(df.columns)
if missing_cols:
    raise ValueError(f"CSV is missing columns: {sorted(missing_cols)}. Found: {df.columns.tolist()}")

print(f"Loaded {len(df)} rows from comments.csv")
df.head()


Loaded 126 rows from comments.csv


Unnamed: 0,sample_id,thread_id,role,order_in_thread,text
0,1,1,post,0,핼스장 팬티 오늘도 다른 사람이 벗어둔거 킁킁해서 기분 좋다
1,2,1,comment,1,헬스장에서 팬티를 왜 벗음
2,3,1,comment,2,게이게이야..
3,4,2,post,0,어효 코스프레하는 애들은 하나같이 젖크고 얼굴 이쁘고 빼빼 말랐네
4,5,2,comment,1,학교에 있음?


In [10]:
results = []

for _, row in tqdm(df.iterrows(), total=len(df)):
    sample_id = row["sample_id"]
    thread_id = row["thread_id"]
    role = row["role"]
    order_in_thread = row["order_in_thread"]
    text = str(row["text"])

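    prompt = build_prompt(text)
    try:
        raw_response = call_llm(prompt)
        parsed = parse_policy_result(raw_response)
    except (RuntimeError, ValueError) as e:
        # Keep going so one bad response does not discard all earlier results;
        # the row is still written, with empty policy fields.
        print(f"[sample {sample_id}] skipped: {e}")
        parsed = {}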

    for policy_key in ["A", "B", "C"]:
        policy_data = parsed.get(policy_key, {})
        results.append({
            "sample_id": sample_id,
            "thread_id": thread_id,
            "role": role,
            "order_in_thread": order_in_thread,
            "policy": policy_key,
            "decision": policy_data.get("decision"),
            "short_reason": policy_data.get("short_reason"),
            "tone_score": policy_data.get("tone_score"),
            "debate_value_score": policy_data.get("debate_value_score"),
        })

results_df = pd.DataFrame(results)
results_df.head()


100%|██████████| 126/126 [06:33<00:00,  3.12s/it]


Unnamed: 0,sample_id,thread_id,role,order_in_thread,policy,decision,short_reason,tone_score,debate_value_score
0,1,1,post,0,A,BLOCK,The comment implies inappropriate and invasive...,4,1
1,1,1,post,0,B,BLOCK,The comment describes inappropriate behavior t...,4,1
2,1,1,post,0,C,BLOCK,The comment depicts invasive and inappropriate...,4,1
3,2,1,comment,1,A,ALLOW,The comment is a neutral question without insu...,2,2
4,2,1,comment,1,B,ALLOW,The comment is a neutral inquiry without perso...,2,2
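
Once the long-format results exist, a natural next step is to compare how often each policy blocks, warns, or allows. The sketch below uses only the standard library and assumes rows shaped like the dictionaries appended to `results`. `tally_decisions` is a hypothetical helper, and `example_rows` contains illustrative placeholder values, not the notebook's actual results.

```python
from collections import Counter, defaultdict


def tally_decisions(rows: list[dict]) -> dict[str, Counter]:
    """Count decisions per policy from long-format result rows."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["policy"]][row["decision"]] += 1
    return dict(counts)


# Placeholder rows for illustration only
example_rows = [
    {"sample_id": 1, "policy": "A", "decision": "BLOCK"},
    {"sample_id": 1, "policy": "B", "decision": "WARN_AND_ALLOW"},
    {"sample_id": 1, "policy": "C", "decision": "ALLOW"},
    {"sample_id": 2, "policy": "A", "decision": "BLOCK"},
    {"sample_id": 2, "policy": "B", "decision": "ALLOW"},
    {"sample_id": 2, "policy": "C", "decision": "ALLOW"},
]

for policy, counter in sorted(tally_decisions(example_rows).items()):
    print(policy, dict(counter))
```

For the real data, `results_df.to_dict("records")` yields rows in this shape, and `pd.crosstab(results_df["policy"], results_df["decision"])` should give an equivalent table directly in pandas.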


In [11]:
results_df.to_csv(OUTPUT_CSV, index=False, encoding="utf-8-sig")
print("Saved results to:", OUTPUT_CSV)


Saved results to: C:\Users\Gibeom Kim\Desktop\UnderGraduate\3. junior\techno_science\results\comments_with_policy_results.csv
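
The `utf-8-sig` encoding used in `to_csv` prepends a byte order mark (BOM) to the file, which spreadsheet tools such as Excel typically use to detect UTF-8; this is presumably why it was chosen for Korean text. A small standard-library sketch shows the effect. The file name and the row contents are placeholders.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.csv"  # placeholder file name

    # Writing with utf-8-sig puts the three BOM bytes at the start of the file
    with path.open("w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["policy", "short_reason"])
        writer.writerow(["A", "예시 문장"])  # placeholder row ("example sentence")

    raw = path.read_bytes()
    has_bom = raw.startswith(b"\xef\xbb\xbf")

    # Reading with utf-8-sig strips the BOM again, so the header stays clean
    with path.open(newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f))

print("BOM present:", has_bom)
print("Header:", header)
```

When the saved file is read back with pandas, passing `encoding="utf-8-sig"` to `pd.read_csv` likewise keeps the BOM out of the first column name.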
