# LLM Weaponisation Detection for the Control Group

This notebook uses the OpenAI API to classify each edit in our control group as weaponising or not. The prompt sent to the model (`gpt-4o-mini`) is the following:

```
You are an expert linguistic analysis assistant specializing in detecting subtle shifts in language that might be used to weaponize cultural heritage.

The input is a JSON record representing a Wikipedia article revision. The record follows one of these structures:
1. For the original article version (version "first_version"):
   {{
     "version": "first_version",
     "Content": "<full article text>"
   }}
2. For a revision (version "diff"):
   {{
     "version": "diff",
     "Timestamp": "<ISO timestamp>",
     "User": "<editor identifier>",
     "Comment": "<revision comment>",
     "Diff": "<textual diff showing changes>"
   }}

Instructions:
- If the record is a revision (version "diff"), focus on the "Diff" field and the context provided by "Comment" and "User".
- For an original version (version "first_version"), treat it as baseline text.
- Analyze the text for subtle shifts in tone, style, or content that may be used to weaponize cultural heritage.

Below are examples of weaponisation terms and narratives:
• Term (Ukraine’s Perspective):
  - "Russian occupation of Crimea"
  - "Russian destruction of cultural sites"
  - "War crimes in Mariupol"
  - "Ukrainian neo-Nazi threats to Russian heritage"
  - "Ukrainian language suppression in occupied territories"
• Term (Russia’s Perspective):
  - "Crimea’s reunification with Russia"
  - "Legitimacy of annexation"
  - "Ukrainian territory"
  - "New Russian regions"
  - "Who controls Donetsk, Luhansk, Zaporizhzhia, Kherson?"
  - "Protection of Russian speakers"
  - "Narrative of cultural oppression"
Implication or consequence: These terms may be used to justify actions, shift narratives, or frame cultural or territorial control in a specific light.

Using these examples as guidance, please provide a clear judgment ("Weaponised" or "Not Weaponised") along with a brief explanation citing specific linguistic indicators.

Here is the input JSON:
{record_json}

Your analysis:
```
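The doubled braces `{{ }}` in the prompt are not part of what the model sees. In the code cell below, the prompt is a Python f-string: `{{` and `}}` are escapes that render as single literal braces, and `{record_json}` is replaced by the record serialised with `json.dumps`. Here is a minimal sketch with a shortened template. The `build_prompt` helper and the sample record are hypothetical and only illustrate the escaping.

```python
import json


# Hypothetical, heavily shortened version of the notebook's prompt template.
# Inside an f-string, "{{" and "}}" produce literal "{" and "}",
# while "{record_json}" is substituted with the serialised record.
def build_prompt(record: dict) -> str:
    record_json = json.dumps(record, indent=2, ensure_ascii=False)
    return f"""Structure example:
{{
  "version": "diff"
}}

Here is the input JSON:
{record_json}

Your analysis:
"""


record = {"version": "diff", "User": "Example", "Comment": "typo", "Diff": "+ jazz"}
prompt = build_prompt(record)
print(prompt)
```

Forgetting to double a brace in the real template would make Python try to evaluate its contents as an expression, typically raising an error before any API call is made.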

The control group contains the following articles:

```python
control_articles = [
    "Pop music",
    "Rock and roll",
    "Eric Clapton",
    "Rolling Stone",
    "Jazz",
    "Swing",
    "Classical music",
    "Ludwig van Beethoven",
    "Wolfgang Amadeus Mozart",
    "Joseph Haydn",
    "Country music",
    "BTS",
    "K-Pop",
    "Electronic music",
    "Daft Punk",
    "Paul Kalkbrenner",
    "Trumpet",
    "Music theory",
    "Fender",
    "Marshall Amplification",
    "Jimi Hendrix",
    "Bob Marley",
    "Edith Piaf",
    "Royal Albert Hall",
    "Piano",
    "Saxophone",
    "Pink Floyd",
    "Nirvana (band)",
    "Nina Simone",
    "Music of Africa",
    "Major scale",
    "Major chord",
    "Minor chord",
    "Red Hot Chili Peppers",
    "Funk rock",
    "James Brown",
    "Dire Straits",
    "Mark Knopfler",
    "John Frusciante",
    "Alan Clark",
    "Stevie Wonder",
    "Guitar"
]
```
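The notebook's `file_names` list currently names a single file, `Holodomor.jsonl`. If each control article's revisions are stored as `<title>.jsonl` in the same way, the list could be built from `control_articles` instead. That naming convention is inferred from this one file and is not confirmed. The sketch below is minimal. `control_file_names` is a hypothetical helper that also reports titles with no matching file on disk.

```python
import os


# Assumption: revision histories are saved as "<article title>.jsonl",
# following the "Holodomor.jsonl" naming used in the notebook.
def control_file_names(titles, directory="."):
    """Return (paths that exist, titles with no JSONL file) for the given titles."""
    found, missing = [], []
    for title in titles:
        path = os.path.join(directory, f"{title}.jsonl")
        if os.path.isfile(path):
            found.append(path)
        else:
            missing.append(title)
    return found, missing


# Example usage (hypothetical):
# file_names, missing = control_file_names(control_articles)
# if missing:
#     print("No JSONL file for:", missing)
```

Reporting missing titles rather than failing lets a partial run go ahead while making gaps in the control group visible.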

In [None]:
import json
import os

import openai  # uses the legacy openai.ChatCompletion interface, which requires openai<1.0

# Read the key from the environment instead of hard-coding a secret in the notebook.
openai.api_key = os.environ["OPENAI_API_KEY"]
# JSONL files to analyse (one per article).
file_names = [
    "Holodomor.jsonl"
]

def load_jsonl(filepath):
    """Load a JSONL file and return a list of records."""
    records = []
    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))
    return records

def detect_weaponisation(record):
    """
    Analyzes a single revision record for indications of language weaponisation.

    Expected JSON structure:
      • For the original article version (version "first_version"):
          {
            "version": "first_version",
            "Content": "<full article text>"
          }
      • For a revision record (version "diff"):
          {
            "version": "diff",
            "Timestamp": "<ISO timestamp>",
            "User": "<editor identifier>",
            "Comment": "<revision comment>",
            "Diff": "<textual diff showing changes>"
          }
    
    The prompt instructs the model to examine the record for subtle shifts in language that might be weaponized.
    It uses insights from "Personas as a Way to Model Truthfulness in Language Models" and includes examples of weaponization.
    """
    record_json = json.dumps(record, indent=2, ensure_ascii=False)
    prompt = f"""
You are an expert linguistic analysis assistant specializing in detecting subtle shifts in language that might be used to weaponize cultural heritage.

The input is a JSON record representing a Wikipedia article revision. The record follows one of these structures:
1. For the original article version (version "first_version"):
   {{
     "version": "first_version",
     "Content": "<full article text>"
   }}
2. For a revision (version "diff"):
   {{
     "version": "diff",
     "Timestamp": "<ISO timestamp>",
     "User": "<editor identifier>",
     "Comment": "<revision comment>",
     "Diff": "<textual diff showing changes>"
   }}

Instructions:
- If the record is a revision (version "diff"), focus on the "Diff" field and the context provided by "Comment" and "User".
- For an original version (version "first_version"), treat it as baseline text.
- Analyze the text for subtle shifts in tone, style, or content that may be used to weaponize cultural heritage.

Below are examples of weaponisation terms and narratives:
• Term (Ukraine’s Perspective):
  - "Russian occupation of Crimea"
  - "Russian destruction of cultural sites"
  - "War crimes in Mariupol"
  - "Ukrainian neo-Nazi threats to Russian heritage"
  - "Ukrainian language suppression in occupied territories"
• Term (Russia’s Perspective):
  - "Crimea’s reunification with Russia"
  - "Legitimacy of annexation"
  - "Ukrainian territory"
  - "New Russian regions"
  - "Who controls Donetsk, Luhansk, Zaporizhzhia, Kherson?"
  - "Protection of Russian speakers"
  - "Narrative of cultural oppression"
Implication or consequence: These terms may be used to justify actions, shift narratives, or frame cultural or territorial control in a specific light.

Using these examples as guidance, please provide a clear judgment ("Weaponised" or "Not Weaponised") along with a brief explanation citing specific linguistic indicators.

Here is the input JSON:
{record_json}

Your analysis:
"""
    try:
        response = openai.ChatCompletion.create(
            model="gpt-4o-mini",  # same model for every record
            messages=[
                {"role": "system", "content": "You are a linguistic analysis expert focused on detecting weaponization of cultural heritage."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.0,
            max_tokens=300,
        )
        return response.choices[0].message["content"].strip()
    except Exception as e:
        return f"Error during API call: {e}"

def analyze_file(input_file):
    """Analyze all records in the given JSONL file and return a list of result strings."""
    records = load_jsonl(input_file)
    print(f"Loaded {len(records)} records from {input_file}")
    results = []
    for idx, record in enumerate(records, start=1):
        analysis = detect_weaponisation(record)
        results.append(f"Record {idx} (Version: {record.get('version', 'N/A')}):\n{analysis}\n{'-'*80}\n")
        print(f"Processed record {idx} in {input_file}")
    return results

# Process each file in the list and save output to a file named after each input file + "_analysis.txt"
for file in file_names:
    results = analyze_file(file)
    base_name = os.path.splitext(os.path.basename(file))[0]
    output_file = f"{base_name}_analysis.txt"
    with open(output_file, "w", encoding="utf-8") as f:
        f.writelines(results)
    print(f"Analysis for {file} saved to {output_file}")
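Because the model answers in free text, the saved `_analysis.txt` files still have to be reduced to labels before the control-group results can be counted. Here is a minimal sketch that assumes each answer contains the literal label "Weaponised" or "Not Weaponised" (the American spelling is also accepted). `parse_verdict`, `tally` and `tally_file` are hypothetical helpers. The first label found in the text wins, which is a heuristic rather than a guarantee.

```python
import re
from collections import Counter

# "Weaponised" is a substring of "Not Weaponised", so one pattern with an
# optional "not" prefix is used and the earliest match decides the label.
_VERDICT = re.compile(r"\b(not\s+)?weaponi[sz]ed\b", re.IGNORECASE)


def parse_verdict(analysis: str) -> str:
    """Map one free-text answer (or formatted result string) to a label."""
    if "Error during API call" in analysis:
        return "error"
    match = _VERDICT.search(analysis)
    if match is None:
        return "unclear"
    return "not_weaponised" if match.group(1) else "weaponised"


def tally(results):
    """Count labels over the strings returned by analyze_file()."""
    return Counter(parse_verdict(r) for r in results)


def tally_file(path):
    """Count labels in a saved *_analysis.txt file (records separated by 80 dashes)."""
    with open(path, encoding="utf-8") as f:
        chunks = f.read().split("-" * 80)
    return Counter(parse_verdict(c) for c in chunks if c.strip())
```

Records labelled `unclear` or `error` should be inspected by hand. Splitting on the 80-dash separator also assumes the model's own answers never contain such a line.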