# **DetectGPT for dummies**: Identifying AI-generated text
This notebook implements the **DetectGPT** method from Mitchell et al. (2023) [1], which helps determine whether a given text is AI-generated. The approach involves perturbing the text and analyzing its log-probabilities.

In [3]:
import os
import re
import json
import torch
import random
from tqdm import tqdm
import matplotlib.pyplot as plt
from functools import lru_cache
from transformers import AutoModelForCausalLM, AutoTokenizer, T5ForConditionalGeneration, T5Tokenizer

import pandas as pd
from datasets import Dataset
from IPython.display import display
from datetime import datetime

In [4]:
device = "cuda" if torch.cuda.is_available() else "cpu"

## I- **Model setup**

This part sets up the transformer-based models needed to:
1. produce the AI-generated text - ``generation_model``
2. compute the log-probabilities - ``computation_model``
3. perturb the text with the T5 perturbation - ``t5_model``

### 1. **Text generation**

In [5]:
CACHE_DIR = "/tmp/huggingface"

GENERATION_MODEL_NAME = "EleutherAI/gpt-j-6B"  # CHANGE GENERATION MODEL
# Model list (all tested)
# openai-community/gpt2
# openai-community/gpt2-large
# EleutherAI/gpt-j-6B
# EleutherAI/gpt-neox-20b

TORCH_DTYPE = torch.bfloat16 # use bfloat16 for all models

# Load model
generation_model = AutoModelForCausalLM.from_pretrained(GENERATION_MODEL_NAME, torch_dtype=TORCH_DTYPE, cache_dir=CACHE_DIR)

# Load tokenizer 
generation_tokenizer = AutoTokenizer.from_pretrained(GENERATION_MODEL_NAME)

# Set model to evaluation mode
generation_model.eval()

generation_model.to(device)
print(device)



cpu


### 2. **Computation**

In [6]:
CACHE_DIR = "/tmp/huggingface"
COMPUTATION_MODEL_NAME = "EleutherAI/gpt-j-6B"  # CHANGE COMPUTATION MODEL
# openai-community/gpt2-large
TORCH_DTYPE = torch.bfloat16 # use bfloat16 for all models

# Load model
computation_model = AutoModelForCausalLM.from_pretrained(COMPUTATION_MODEL_NAME, torch_dtype=TORCH_DTYPE, cache_dir=CACHE_DIR)

# Load tokenizer 
computation_tokenizer = AutoTokenizer.from_pretrained(COMPUTATION_MODEL_NAME, cache_dir=CACHE_DIR)
computation_tokenizer.pad_token = computation_tokenizer.eos_token

# Set model to evaluation mode (ensures stable log prob estimation + disables dropout)
computation_model.eval()

computation_model.to(device)
print(device)

cpu


### 3. **Perturbation**

In [7]:
CACHE_DIR = "/tmp/huggingface"
PERTURBATION_MODEL_NAME = "t5-large"    # CHANGE PERTURBATION MODEL
TORCH_DTYPE = torch.bfloat16 # use bfloat16 for all models

# Load model
t5_model = T5ForConditionalGeneration.from_pretrained(PERTURBATION_MODEL_NAME, torch_dtype=TORCH_DTYPE, cache_dir=CACHE_DIR)

# Load tokenizer 
t5_tokenizer = T5Tokenizer.from_pretrained(PERTURBATION_MODEL_NAME, cache_dir=CACHE_DIR)

# Set to evaluation mode
t5_model.eval()

t5_model.to(device)
print(device)

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


cpu


## II- **Code setup**

### 1. **🔀 Text perturbation**

This section defines the **T5-based perturbation function**, which modifies the input text slightly while preserving its meaning. 

- **Why is perturbation needed?** Text sampled from a language model tends to sit near a local maximum of that model's log-probability function (a region of **negative curvature**), so small perturbations usually lower its log probability, whereas perturbing human-written text can raise or lower it
- **How does it work?** The **T5 model** masks short spans of the text and fills them back in, producing slightly different versions of the same text

These perturbed texts will later be compared to their original versions to compute the discrepancy scores d.
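
The mask-and-fill step can be illustrated without loading any model. The sketch below is only an illustration: `apply_fills` is a hypothetical helper, and the `raw` string is a hand-written stand-in for what T5 might generate, not real model output. It shows how the text following each `<extra_id_N>` sentinel in the output is substituted back into the masked text.

```python
import re

def apply_fills(masked_text: str, raw_output: str) -> str:
    """Substitute the fills from a T5-style output into a masked text."""
    # Drop padding/end markers, then capture the text that follows each sentinel
    cleaned = raw_output.replace("<pad>", "").replace("</s>", "").strip()
    fills = re.findall(r"<extra_id_\d+>\s*(.*?)\s*(?=<extra_id_\d+>|$)", cleaned)
    # The i-th fill replaces the i-th sentinel; sentinels without a match are left as-is
    for i, fill in enumerate(fills):
        masked_text = masked_text.replace(f"<extra_id_{i}>", fill, 1)
    return masked_text

masked = "The quick <extra_id_0> jumps over <extra_id_1> dog"
# Hand-written stand-in for a T5 generation (no model is called here)
raw = "<pad> <extra_id_0> brown fox <extra_id_1> the lazy <extra_id_2>"
perturbed = apply_fills(masked, raw)
print(perturbed)
```

The final sentinel in the output (`<extra_id_2>` here) only marks the end of the last fill, which is why it is used as the stop token during generation.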

In [8]:
def batch_mask_text(texts, mask_ratio=0.15, max_words=512): # CHANGE THIS TO CHANGE THE MAXIMUM NUMBER OF WORDS TO MASK (DETERMINED BY max_words * mask_ratio) (originally 370)
    """Mask multiple texts at once, replacing random 2-word spans with T5 sentinel tokens."""
    masked_texts = []
    
    for text in texts:
        words = text.split()[:max_words]    # Split the text at whitespaces and truncate
        
        num_masks = int(len(words) * mask_ratio)
        
        # Randomly select span starts, then drop any start that overlaps the previous 2-word span
        # (overlapping spans would otherwise delete an already placed sentinel)
        candidates = sorted(random.sample(range(max(len(words) - 1, 0)), num_masks))
        starts = []
        for idx in candidates:
            if not starts or idx >= starts[-1] + 2:
                starts.append(idx)
        
        # Replace spans from right to left so earlier indices stay valid;
        # sentinels are numbered left to right, which is the order T5 expects
        for i in range(len(starts) - 1, -1, -1):
            words[starts[i]:starts[i] + 2] = [f"<extra_id_{i}>"]  # 2-word span - CHANGE THIS TO CHANGE THE WORD SPAN
        
        masked_texts.append(" ".join(words))
    
    return masked_texts

def batch_replace_masks(texts, batch_size=128):
    """Generate T5 model outputs for masked texts in batches."""
    all_outputs = []
    
    # Process in batches
    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i+batch_size]
        n_expected = [text.count("<extra_id_") for text in batch_texts]
        stop_id = t5_tokenizer.encode(f"<extra_id_{max(n_expected)}>")[0]   # Stop token is the largest <extra_id_X> token in the batch of texts
        
        tokens = t5_tokenizer(batch_texts, return_tensors="pt", padding=True)   # Tokenize batch of texts
        # Move input tensors to the model's device
        input_ids = tokens["input_ids"].to(t5_model.device)
        attention_mask = tokens["attention_mask"].to(t5_model.device)
        
        with torch.no_grad():
            outputs = t5_model.generate(
                input_ids=input_ids,
                attention_mask=attention_mask,    # Ensure padded tokens are ignored
                max_length=150, # CHANGE THIS TO CHANGE THE MAXIMUM LENGTH OF THE OUTPUT OF EACH INSTANCE
                do_sample=True,
                top_p=0.9,
                num_return_sequences=1,
                eos_token_id=stop_id
            )
            
        # Move outputs back to CPU to save GPU memory
        outputs = outputs.detach().cpu()
        batch_decoded = t5_tokenizer.batch_decode(outputs, skip_special_tokens=False)   # Decode model outputs
        all_outputs.extend(batch_decoded)
    
    return all_outputs

def batch_extract_fills(texts):
    """Extract the generated fills from T5's output for multiple texts."""
    extracted_fills = []
    for text in texts:
        text = text.replace("<pad>", "").replace("</s>", "").strip()    # Clean up text
        
        # Use regex to extract text inside <extra_id_X> tokens
        fills = re.findall(r"<extra_id_\d+>\s*(.*?)\s*(?=<extra_id_\d+>|$)", text)
        
        # Clean extracted tokens
        extracted_fills.append([fill.strip() for fill in fills])
    
    return extracted_fills

def batch_apply_extracted_fills(masked_texts, extracted_fills):
    """Replace mask tokens in the masked texts with generated fills."""
    filled_texts = []
    
    for masked_text, fills in zip(masked_texts, extracted_fills):
        if not fills:
            filled_texts.append(masked_text)
            continue
        
        filled_text = masked_text
        # Iterate through expected mask positions and replace them
        for i, fill in enumerate(fills):
            filled_text = filled_text.replace(f"<extra_id_{i}>", fill, 1)
        
        filled_texts.append(filled_text)
    
    return filled_texts

In [9]:
def t5_perturbation(texts: list, batch_size: int) -> list:
    """
    T5 perturbation - batch version

    Args:
        texts (list): the input texts to be perturbed
        batch_size (int): batch size used for T5 generation

    Returns:
        all_perturbed_texts (list): the perturbed texts, in the same order as the inputs
    """
    # Step 1: mask all texts at once
    all_masked_texts = batch_mask_text(texts)

    # Step 2: generate replacements in batches
    all_raw_fills = batch_replace_masks(all_masked_texts, batch_size)

    # Step 3: extract fills
    all_extracted_fills = batch_extract_fills(all_raw_fills)

    # Step 4: apply fills
    all_perturbed_texts = batch_apply_extracted_fills(all_masked_texts, all_extracted_fills)
    
    return all_perturbed_texts

### 2. **🔍 Main functions: *DetectGPT* Method**

This section implements the **DetectGPT method**.

- **Key idea:** once again, AI-generated texts tend to **reside in negative-curvature regions** of the model's log-probability function.
- **How does it work?**
  - We perturb the text multiple times (``num_perturbation``). We will use ``n_samples`` texts with ``max_length`` words.
  - Compute log probabilities for both **original** and **perturbed** texts
  - Measure the **discrepancy score** (a higher score suggests AI-generated text)
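
As a rough illustration of the last step, the sketch below computes the normalized discrepancy for a single text from plain Python lists. The function name `discrepancy` and all log-probability values are made up for the example; it assumes at least two perturbations so that a sample standard deviation exists.

```python
from statistics import mean, stdev

def discrepancy(original_log_prob, perturbed_log_probs, normalize=True):
    """Original log prob minus the mean perturbed log prob, optionally divided by their std."""
    mu = mean(perturbed_log_probs)
    d = original_log_prob - mu
    if not normalize:
        return d
    sigma = stdev(perturbed_log_probs)  # sample standard deviation (n - 1 in the denominator)
    return d / sigma if sigma > 0 else 0.0

# Made-up numbers: perturbations consistently lower the log prob of the first text,
# while they scatter around the original value for the second one
machine_like = discrepancy(-2.0, [-2.6, -2.4, -2.5, -2.7, -2.3])
human_like = discrepancy(-3.0, [-3.1, -2.9, -3.0, -3.2, -2.8])
print(machine_like, human_like)
```

With these toy values the first score is clearly positive and the second is close to zero, which is the pattern the method relies on.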

In [10]:
def batch_average_log_prob(texts, batch_size=128):
    """Calculate average log probability for multiple texts in batches."""
    
    all_log_probs = []
    
    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i+batch_size]

        # Tokenize input
        inputs = computation_tokenizer(batch_texts, return_tensors="pt", padding=True, truncation=True)
        input_ids = inputs["input_ids"].to(device)
        attention_mask = inputs["attention_mask"].to(device)

        with torch.no_grad():
            outputs = computation_model(input_ids, labels=input_ids, attention_mask=attention_mask)

        # Extract logits
        logits = outputs.logits  # [batch_size, seq_len, vocab_size]

        # Shift logits and labels to align
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = input_ids[..., 1:].contiguous()
        shift_mask = attention_mask[..., 1:].contiguous()  # Ensure mask aligns

        # Compute per-token loss
        loss_fct = torch.nn.CrossEntropyLoss(reduction='none', ignore_index=computation_tokenizer.pad_token_id)
        loss_per_token = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

        # Reshape to [batch_size, seq_length - 1]
        loss_per_token = loss_per_token.view(shift_labels.size())

        # Compute per-sample log prob
        sample_losses = []
        for j in range(loss_per_token.size(0)):
            mask = shift_mask[j].bool()  # Use shift_mask for actual tokens
            if mask.sum() > 0:
                sample_loss = loss_per_token[j][mask].mean().item()
                sample_losses.append(-sample_loss)  # Negative loss as log prob
            else:
                sample_losses.append(float('-inf'))  # Avoid zero prob bias

        all_log_probs.extend(sample_losses)

    return all_log_probs

In [None]:
# Main optimized processing loop
def optimized_processing(data, num_samples=200, max_length=512, num_perturbation=100, batch_size=128):
    log_probs_per_text_transformed = []
    
    # Process original texts in batches
    original_texts = [" ".join(data[j]["text"].split()[:max_length]) for j in range(num_samples)]
    log_probs_per_text_base = batch_average_log_prob(original_texts, batch_size)
    
    # Inside the loop in optimized_processing()
    for perturbation_idx in tqdm(range(num_perturbation), desc=f"Processing {num_perturbation} perturbations for {num_samples} texts. Perturbation number:"):
        all_perturbed_texts = t5_perturbation(original_texts,batch_size)
        all_log_probs = batch_average_log_prob(all_perturbed_texts, batch_size)
        
        # Organize results by original text
        for j in range(num_samples):
            if perturbation_idx == 0:
                log_probs_per_text_transformed.append([])
            log_probs_per_text_transformed[j].append(all_log_probs[j])
    
    return log_probs_per_text_base, log_probs_per_text_transformed

In [None]:
def gen_perturbed(data, num_samples=200, max_length=512, num_perturbation=100, batch_size=128):
    '''Generates perturbations of the texts. Returns a list of length num_perturbation, each entry being a list of num_samples perturbed texts'''
    # Initialise list to store all perturbed JSON
    all_perturbed_texts = []
    
    # Process original texts in batches
    original_texts = [" ".join(data[j]["text"].split()[:max_length]) for j in range(num_samples)]   # Truncate text to max_length, returns list of strings as before
    
    # Repeat num_perturbation times
    for perturbation_idx in tqdm(range(num_perturbation), desc=f"Processing {num_perturbation} perturbations for {num_samples} texts. Perturbation number:"):

        # Randomly mask about 15% of each text and let T5 fill the masks to create the perturbations
        perturbed_texts = t5_perturbation(original_texts,batch_size)

        # Store in list
        all_perturbed_texts.append(perturbed_texts)
    
    return all_perturbed_texts
    # The output is a list of length num_perturbation, each entry being a list of length num_samples, each entry being a string
    # (so the outer list is over the perturbations, the inner list is over the samples, and the string is the perturbed text)

In [None]:
def compare_log_prob(original_texts, all_perturbed_texts, num_perturbation=100, batch_size=128):
    '''Computes log probs of the original texts (list of strings) and of each list of perturbed texts'''
    # Initialise list to store log prob of each perturbed JSON
    
    log_probs_per_text_pert = []

    # Get num_samples
    num_samples = len(original_texts)

    # Calculate log prob for original text in batch
    log_probs_per_text_base = batch_average_log_prob(original_texts, batch_size)

    # Iterate 
    for perturbation in tqdm(range(num_perturbation)):

        # Get the list of perturbed texts for this perturbation round
        perturbed_texts = all_perturbed_texts[perturbation]

        # Calculate log prob
        log_probs_per_text_tran = batch_average_log_prob(perturbed_texts, batch_size)

        # Organize results by original text
        for j in range(num_samples):
            if perturbation == 0:
                log_probs_per_text_pert.append([])
            log_probs_per_text_pert[j].append(log_probs_per_text_tran[j])
        
    return log_probs_per_text_base, log_probs_per_text_pert
    # log_probs_per_text_base is a list of length num_samples, each entry being the log prob of the corresponding original text
    # log_probs_per_text_pert is a list of length num_samples, each entry being a list of length num_perturbation, each entry being the log prob of the corresponding perturbed text
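
The regrouping described in the comments above (from one list per perturbation round to one list per text) amounts to a transpose. A minimal sketch with made-up scores, where `group_by_text` is a hypothetical helper and not part of the notebook:

```python
def group_by_text(per_perturbation):
    """Turn [perturbation][sample] scores into [sample][perturbation] scores."""
    return [list(scores) for scores in zip(*per_perturbation)]

# Two perturbation rounds over three texts (made-up log probs)
per_perturbation = [
    [-2.1, -3.0, -1.5],
    [-2.3, -3.2, -1.7],
]
per_text = group_by_text(per_perturbation)
print(per_text)
```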

In [None]:
def compute_perplexity(data: list, n_samples: int, max_length: int) -> list:
    """
    Compute perplexity of each individual text in data

    Args:
        data (list): list of dictionaries containing text
        n_samples (int): number of texts to process
        max_length (int): max length for tokenization

    Returns:
        perplexities (list): perplexity scores of each text
    """
    
    original_texts = [" ".join(data[j]["text"].split()[:max_length]) for j in range(n_samples)]
    perplexities = []

    # Tokenize all inputs at once (better efficiency)
    inputs = computation_tokenizer(original_texts, return_tensors="pt", padding=True, truncation=True)
    
    # Move tensors to the selected device (GPU if available, otherwise CPU)
    input_ids = inputs.input_ids.to(device)
    attention_mask = inputs.attention_mask.to(device)

    with torch.no_grad():
        outputs = computation_model(input_ids, attention_mask=attention_mask)
        logits = outputs.logits
    
    # Shift logits and labels to align
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].contiguous()
    shift_mask = attention_mask[:, 1:].contiguous() # Ensure mask aligns

    # per-token loss for each sequence
    loss_fct = torch.nn.CrossEntropyLoss(reduction='none', ignore_index=computation_tokenizer.pad_token_id)
    loss_per_token = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

    # Reshape
    loss_per_token = loss_per_token.view(shift_labels.size())

    # Compute per-sentence loss by averaging over valid tokens
    valid_token_counts = shift_mask.sum(dim=1) # Number of valid tokens per sample
    sentence_losses = (loss_per_token * shift_mask).sum(dim=1) / valid_token_counts.clamp(min=1) # Avoid division by zero

    # Compute perplexity per sample
    perplexities = torch.exp(sentence_losses).cpu().tolist()
    return perplexities
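
For intuition, perplexity is the exponential of the negative average token log probability, with padded positions excluded from the average. The sketch below reproduces that arithmetic on hand-picked token probabilities; both helper names and all values are assumptions made for illustration only.

```python
import math

def masked_mean_log_prob(token_log_probs, mask):
    """Average the log probs of real tokens only (mask == 1), ignoring padding."""
    kept = [lp for lp, m in zip(token_log_probs, mask) if m]
    return sum(kept) / len(kept) if kept else float("-inf")

def perplexity(token_log_probs, mask):
    """Perplexity is exp of the negative average log prob."""
    return math.exp(-masked_mean_log_prob(token_log_probs, mask))

# Three real tokens with probabilities 0.5, 0.25, 0.5, followed by one padded position
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), 0.0]
mask = [1, 1, 1, 0]
avg = masked_mean_log_prob(log_probs, mask)
ppl = perplexity(log_probs, mask)
print(avg, ppl)
```

Here the product of the three probabilities is 1/16, so the perplexity is the cube root of 16.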

In [12]:
def compute_detectgpt_discrepancy(log_probs_per_text_base: list, log_probs_per_text_transformed: list, normalization: bool=True) -> list:
    """
    Compute the DetectGPT discrepancy metric for each of the n_samples texts
    Calculated for num_perturbations perturbations

    Args:
        log_probs_per_text_base (list): original log probability of each text
        log_probs_per_text_transformed (list): list of size n_samples where each element is a list of the num_perturbations perturbed log probabilities
        normalization (bool)

    Returns:
        discrepancy_scores (list): list of discrepancy values (d) for the n_samples texts
    """
    num_samples = len(log_probs_per_text_base) 
    discrepancy_scores = []

    for i in range(num_samples):
        original_log_prob = log_probs_per_text_base[i]
        perturbed_log_probs = log_probs_per_text_transformed[i] # List of perturbed log probs
        num_perturbations = len(perturbed_log_probs) # Number of perturbations

        # Compute mean log probability of the perturbed texts
        mu = sum(perturbed_log_probs) / num_perturbations  

        # Compute discrepancy
        discrepancy_score_unormalized = original_log_prob - mu
        if normalization:
            # Normalize
            variance = sum((log_prob - mu) ** 2 for log_prob in perturbed_log_probs) / (num_perturbations - 1)
            sigma = variance ** 0.5
            discrepancy_score_normalized = discrepancy_score_unormalized / sigma if sigma > 0 else 0
            discrepancy_scores.append(discrepancy_score_normalized)
        else:
            discrepancy_scores.append(discrepancy_score_unormalized)
    
    return discrepancy_scores

### 3. **Utility functions**

In [13]:
# Memory management utilities
def clear_cuda_cache():
    """Clear CUDA cache to free up memory."""
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


# Add caching for tokenization
@lru_cache(maxsize=1024)
def cached_tokenize(text, is_t5=False):
    """Cache tokenization results to avoid repeated work."""
    if is_t5:
        return t5_tokenizer(text, return_tensors="pt", padding=True)
    else:
        return computation_tokenizer(text, return_tensors="pt", padding=True, truncation=True)

## III- **Data loading**

**📌 Dataset format guidelines**

All datasets (human-written and AI-generated) must follow this format:

- Stored as a **`.jsonl`** where each line is a dictionary.
- Each entry contains (minimum requirement):
  - `"text"`: the text content
  - `"model"`: for human text please label it as `"human"` and for AI-generated texts, please specify the model used (e.g. ``"gpt2-large"``)
  - `"source"`: the origin of the text (e.g., `"wikihow"`, `"reddit"`, `"news articles"`)
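
The requirements above can be checked before running the pipeline. The following is a minimal sketch, not part of the original notebook: `validate_jsonl_lines` and `REQUIRED_KEYS` are hypothetical names, and the two sample lines are invented for the example.

```python
import json

REQUIRED_KEYS = {"text", "model", "source"}

def validate_jsonl_lines(lines):
    """Return (records, errors), where errors is a list of (line_number, message)."""
    records, errors = [], []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((line_number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((line_number, "not a JSON object"))
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            errors.append((line_number, f"missing keys: {sorted(missing)}"))
        else:
            records.append(record)
    return records, errors

sample = [
    '{"text": "A short human-written sentence.", "model": "human", "source": "news articles"}',
    '{"text": "A record without a source.", "model": "gpt2-large"}',
]
records, errors = validate_jsonl_lines(sample)
print(len(records), errors)
```

The same function can be applied to an open file handle, since iterating over a file yields its lines.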

#### Example (as in ``subtaskB_train.jsonl`` located in `Datasets\SemEval2024-Task8`):
```json
{"text": "A groundbreaking discovery in physics was made today.", "model": "human", "source": "news articles"}
{"text": "The AI revolution is shaping the future of work.", "model": "chatGPT", "source": "AI Generated"}
```


### 1. **Human texts**

In [5]:
BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
FILE_RELATIVE_PATH = "Data/EdinburghNLP___xsum/default/1.2.0/40db7604fedb616a9d2b0673d11838fa5be8451c/xsum-test.arrow"  # CHANGE THIS TO THE PATH OF THE HUMAN-GENERATED TEXT FILE
FILE_PATH = os.path.join(BASE_DIR,FILE_RELATIVE_PATH) 

if not os.path.exists(FILE_PATH):
    raise FileNotFoundError(f"File not found: {FILE_PATH}")


# For XSUM data - convert into a list of dictionaries
# Load dataset using Hugging Face's datasets library
dataset = Dataset.from_file(FILE_PATH)
# Keep only the first 200 rows
dataset = dataset.select(range(200))    # CHANGE THIS TO CONTROL THE NUMBER OF DATAPOINTS TO USE

# Convert to Pandas DataFrame
df = dataset.to_pandas()
# Adapt the dataframe to the desired style
df = df.rename(columns={"document": "text"}).drop(columns=["summary"])  # Rename and drop columns
df["model"] = "human"   # Add a new column to track that all these documents are human-generated

# Convert to list of dictionaries
data_human = df.to_dict(orient="records")

# Display first few rows of the list
display(data_human[:5])  # Display first 5 rows as sample output


## For JSON format
# data_human = []
## Efficiently process the file line by line
#with open(FILE_PATH, "r", encoding="utf-8") as file:
#    for line in file:
#        record = json.loads(line)  # Parse JSON once
#        if record.get("model") == "human":
#            data_human.append(record)
#
## Print first human record
#print("First human text record:", data_human[0] if data_human else "No human data found.")

[{'text': 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.\nWorkers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.\nThe Welsh Government said more people than ever were getting help to address housing problems.\nChanges to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.\nPrison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because issues such as children or domestic violence were now considered.\nHowever, the same could not be said for men, the charity said, because issues which often affect them, such as post traumatic stress disorder or drug dependency, were often viewed as less of a priority.\nAndrew Stevens, who works in Welsh prisons trying to secure housing for prison leavers, said the ne

### 2. **AI-generated texts**

#### 2.1. *Option 1: produce own AI-generated texts*

In [17]:
def generate_text(prompt: str, max_length: int) -> str:
    """
    Generate AI text from a given prompt

    Args:
        prompt (str): prompt to generate text
        max_length (int): max length of generated text

    Returns:
        generated_text (str): decoded generated text (the prompt is included at the start)
    """
    inputs = generation_tokenizer(prompt, return_tensors="pt").to(generation_model.device)
    with torch.no_grad():
        output = generation_model.generate(**inputs, max_length=max_length, do_sample=True, temperature=1.0)    # CHANGE THE GENERATION MODEL TEMPERATURE HERE (originally 0.7)
    
    generated_text = generation_tokenizer.decode(output[0], skip_special_tokens=True)

    return generated_text
    # The output is a string of the generated text


def generate_prompt(prompt_list: list) -> list:
    """
    Truncate the 'text' field in each dictionary to the first 30 tokens.

    Args:
        prompt_list (list): List of dictionaries with keys 'text', 'id', and 'model'.

    Returns:
        list: Updated list with truncated 'text' values.
    """
    truncated_prompts = []
    for entry in prompt_list:
        tokens = generation_tokenizer.tokenize(entry['text'])
        truncated_text = generation_tokenizer.convert_tokens_to_string(tokens[:30]) # CHANGE THIS TO CONTROL THE NUMBER OF TOKENS THE GENERATION MODEL IS GIVEN AS A PROMPT
        truncated_prompts.append({
            "text": truncated_text,
            "id": entry["id"],
            "model": entry["model"]
        })
    return truncated_prompts
    # The output is a list of dictionaries with keys 'text', 'id', and 'model'


def generate_dataset(prompt_list: list, max_length: int, generation_model_name: str) -> list:
    """
    Generates a dataset of AI-generated texts based on the given original prompt list.

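    Args:
        prompt_list (list): Original list of dictionaries with full 'text'.
        max_length (int): Maximum length (in tokens, prompt included) of each generated text.
        generation_model_name (str): Name of the generation model, stored in the 'model' field.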

    Returns:
        list: Dataset of AI-generated texts (list of dictionaries).
    """
    truncated_prompts = generate_prompt(prompt_list)  # Apply generate_prompt within generate_dataset

    data_ai = [
        {
            "text": generate_text(entry["text"], max_length),
            "id": entry["id"],
            "model": generation_model_name
        }
        for entry in truncated_prompts
    ]
    return data_ai
    # The output is a list of dictionaries, each dictionary has keys 'text', 'id', and 'model'

In [18]:
# Create the AI-generated dataset

max_length = 1000    # Maximum length of each generated text (originally 100)
data_ai_generated = generate_dataset(data_human, max_length, GENERATION_MODEL_NAME)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [19]:
# Function to save the AI-generated dataset in the correct .jsonl format

def save_ai_dataset_jsonl(data, file_name_base, generation_model_name):
    """
    Saves a dataset in JSONL format in a structured directory.
    
    Parameters:
        data (list): List of dictionaries to be saved.
        file_name_base (str): Base name for the file.
        generation_model_name (str): Name of the model that generated the texts (used in the file name).
    """
    BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name = f"{file_name_base}_generated_by_{generation_model_name}_{timestamp}.jsonl"   # CHANGE THIS TO CONTROL THE FORMAT OF THE FILE NAME
    file_relative_path = os.path.join("Data", "AI-Generated", file_name)
    file_path = os.path.join(BASE_DIR, file_relative_path)
    
    # Ensure directory exists
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    
    # Save the dataset in JSONL format
    with open(file_path, "w", encoding="utf-8") as f:
        for entry in data:
            f.write(json.dumps(entry) + "\n")
    
    print(f"Dataset saved at: {file_path}")

In [20]:
# Save and check the AI-generated dataset

# Save the AI-generated dataset
save_ai_dataset_jsonl(data_ai_generated, "XSUM_200_Samples", GENERATION_MODEL_NAME) # CHANGE THIS TO CONTROL THE FILE NAME

# Print first AI-generated text record
print("First AI-generated text record:", data_ai_generated[0] if data_ai_generated else "No AI-generated data found.")

First AI-generated text record: {'text': 'In a faraway galaxy, the alien galaxy is far away, and on the surface of that distant galaxy, we see the stars. The galaxies are very dense, and this is a big difference.\n\nBut there are many other galaxies, and they are very dense, and we can see some of them. There are three other galaxies, and they are very dense, and they are very dark. We can look at these stars as if they are the equivalent of a telescope. We can look', 'model': 'gpt2', 'source': 'FleLLM'}


#### 2.2. *Option 2: load AI-generated texts from a dataset*

In [29]:
# BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
# FILE_RELATIVE_PATH = "Datasets\SemEval2024-Task8\subtaskB_train.jsonl"  # CHANGE THIS TO THE PATH OF THE AI-GENERATED TEXT FILE
# # FILE_RELATIVE_PATH = "Datasets\AI-generated\dataset_ai.jsonl"
# FILE_PATH = os.path.join(BASE_DIR,FILE_RELATIVE_PATH) 

# if not os.path.exists(FILE_PATH):
#     raise FileNotFoundError(f"File not found: {FILE_PATH}")

# data_ai_dataset = []

# # Read the file line by line and keep the records that are not human-written
# with open(FILE_PATH, "r", encoding="utf-8") as file:
#     for line in file:
#         record = json.loads(line)  # Parse JSON once
#         if record.get("model") != "human":
#             data_ai_dataset.append(record)

# # Print first AI-generated text record
# print("First AI-generated text record:", data_ai_dataset[0] if data_ai_dataset else "No AI-generated data found.")

First AI-generated text record: {'text': 'Forza Motorsport is a popular racing game that provides players with the ability to race on various tracks and in different vehicles. Whether you\'re a seasoned racer or a newbie, playing Forza Motorsport can be a fun experience. In this article, we will take you through the different steps on how to play Forza Motorsport.\n\nStep 1. Insert The Game Disc\n\nThe first step is to insert the game disc into your console or computer. Follow the instructions to set up the game.\n\nStep 2. Choose Your Game\n\nOnce the game is set up, choose the game you\'d like to play. Forza Motorsport has different modes: Career, Free Play, and Arcade. In this article, we will focus on the Arcade mode.\n\nStep 3. Just Make A Quick Race By The Arcade Mode\n\nOnce the Arcade mode is selected, choose "Quick Race" to get started quickly.\n\nStep 4. Pick A Racetrack\n\nPick a racetrack from the different ones available like Road Atlanta, New York, Rio de Janeiro, Maple V

## IV- **Example usage**

In [None]:
# Function to save the perturbations in the correct .jsonl format

def save_perturbations_jsonl(all_perturbed_texts, file_name_base, generation_model_name, perturbation_model_name):
    """
    Saves perturbed texts in JSONL format, where each line is a list of perturbed texts (one perturbation batch).

    Parameters:
        all_perturbed_texts (list of lists): List of lists containing perturbed texts.
        file_name_base (str): Base name for the file.
        generation_model_name (str): Name of the model that generated the original texts.
        perturbation_model_name (str): Name of the model that generated the perturbations.
    """
    BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name = f"{file_name_base}_generated_by_{generation_model_name}_perturbed_by_{perturbation_model_name}_{timestamp}.jsonl"
    file_relative_path = os.path.join("Data", "Perturbations", file_name)
    file_path = os.path.join(BASE_DIR, file_relative_path)
    
    # Ensure directory exists
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    
    # Save the dataset in JSONL format
    with open(file_path, "w", encoding="utf-8") as f:
        for perturbed_texts in all_perturbed_texts:
            f.write(json.dumps(perturbed_texts) + "\n")
    
    print(f"Dataset saved at: {file_path}")

In [None]:
# Function to save the results in the correct .jsonl format

def save_results_jsonl(data, file_name_base, generation_model_name):
    """
    Saves a dataset in JSONL format in a structured directory.
    
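    Parameters:
        data (dict): Results to be saved (log probs and discrepancy scores).
        file_name_base (str): Base name for the file.
        generation_model_name (str): Name of the model that generated the texts (used in the file name).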
    """
    BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name = f"{file_name_base}_{generation_model_name}_{timestamp}.jsonl"   # CHANGE THIS TO CONTROL THE FORMAT OF THE FILE NAME
    file_relative_path = os.path.join("Results", file_name)
    file_path = os.path.join(BASE_DIR, file_relative_path)
    
    # Ensure directory exists
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    
    # Save the dataset in JSONL format
    with open(file_path, "w") as f:
        json.dump(data, f, indent=2)
    
    print(f"Dataset saved at: {file_path}")

Human texts

In [None]:
DATA = data_human
NUM_SAMPLES = 200
MAX_LENGTH = 512
NUM_PERTURBATIONS = 100
BATCH_SIZE = 128

# log_probs_base_human, log_probs_transformed_human = optimized_processing(DATA, NUM_SAMPLES, MAX_LENGTH, NUM_PERTURBATIONS, BATCH_SIZE)
perturbed_texts_human = gen_perturbed(DATA, NUM_SAMPLES, MAX_LENGTH, NUM_PERTURBATIONS, BATCH_SIZE)
save_perturbations_jsonl(perturbed_texts_human, "XSUM__Human_100_Perturbations_200_Samples", "Human", PERTURBATION_MODEL_NAME) # CHANGE THIS TO CONTROL THE FILE NAME

log_probs_base_human, log_probs_transformed_human = compare_log_prob(DATA, perturbed_texts_human, NUM_PERTURBATIONS, BATCH_SIZE)
discrepancy_scores_human = compute_detectgpt_discrepancy(log_probs_base_human, log_probs_transformed_human)

results_human = {}
results_human["log_probs_base"] = log_probs_base_human
results_human["log_probs_transformed"] = log_probs_transformed_human
results_human["discrepancy_scores"] = discrepancy_scores_human

In [None]:
# Save the results

save_results_jsonl(results_human, "XSUM__Human_100_Perturbations_200_Samples", "Human") # CHANGE THIS TO CONTROL THE FILE NAME

AI-generated texts

In [None]:
DATA = data_ai_generated
NUM_SAMPLES = 200
MAX_LENGTH = 512
NUM_PERTURBATIONS = 100
BATCH_SIZE = 128

# log_probs_base_ai, log_probs_transformed_ai = optimized_processing(DATA, NUM_SAMPLES, MAX_LENGTH, NUM_PERTURBATIONS, BATCH_SIZE)
perturbed_texts_ai = gen_perturbed(DATA, NUM_SAMPLES, MAX_LENGTH, NUM_PERTURBATIONS, BATCH_SIZE)
save_perturbations_jsonl(perturbed_texts_ai, "XSUM_AI_100_Perturbations_200_Samples", GENERATION_MODEL_NAME, PERTURBATION_MODEL_NAME) # CHANGE THIS TO CONTROL THE FILE NAME

log_probs_base_ai, log_probs_transformed_ai = compare_log_prob(DATA, perturbed_texts_ai, NUM_PERTURBATIONS, BATCH_SIZE)
discrepancy_scores_ai = compute_detectgpt_discrepancy(log_probs_base_ai, log_probs_transformed_ai)

results_ai = {}
results_ai["log_probs_base"] = log_probs_base_ai
results_ai["log_probs_transformed"] = log_probs_transformed_ai
results_ai["discrepancy_scores"] = discrepancy_scores_ai

In [None]:
# Save the results

save_results_jsonl(results_ai, "XSUM_AI_100_Perturbations_200_Samples", GENERATION_MODEL_NAME) # CHANGE THIS TO CONTROL THE FILE NAME

## V- **Analysis**
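
The quantity analysed in this section is the DetectGPT discrepancy: the log-probability of the original text minus the mean log-probability of its perturbations, optionally divided by the standard deviation of the perturbed log-probabilities (the normalized variant described in [1]). The following is a minimal, self-contained sketch on toy numbers. `toy_discrepancy` is a hypothetical name, and the notebook's own `compute_detectgpt_discrepancy` may differ in details such as the standard-deviation estimator.

```python
import statistics

def toy_discrepancy(log_prob_base, log_probs_perturbed, normalization=True):
    """Illustrative only: discrepancy score for a single text."""
    mean_perturbed = statistics.fmean(log_probs_perturbed)
    d = log_prob_base - mean_perturbed
    if normalization:
        # Population standard deviation is an assumption made for this sketch
        std = statistics.pstdev(log_probs_perturbed)
        if std > 0:
            d /= std
    return d

# Toy log-probabilities: the original text is more likely than its perturbations
perturbed = [-104.0, -106.0, -108.0]
raw = toy_discrepancy(-100.0, perturbed, normalization=False)
norm = toy_discrepancy(-100.0, perturbed, normalization=True)
```

A large positive score means the original text sits near a local maximum of the model's log-probability, which is the signature DetectGPT associates with machine-generated text.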

In [None]:
NORMALIZATION = True

In [None]:
# # AI texts results
# BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
# FILE_RELATIVE_PATH = "Results\\"
# FILE_PATH = os.path.join(BASE_DIR,FILE_RELATIVE_PATH)

In [None]:
# with open(FILE_PATH, 'r') as file:
#     data_ai = json.load(file)
# log_probs_base_ai = data_ai["log_probs_base"]
# log_probs_transformed_ai = data_ai["log_probs_transformed"]

# discrepancy_scores_ai = compute_detectgpt_discrepancy(log_probs_base_ai,log_probs_transformed_ai,normalization=NORMALIZATION)

In [None]:
# # Human texts results
# BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
# FILE_RELATIVE_PATH = "Results\\"
# FILE_PATH = os.path.join(BASE_DIR,FILE_RELATIVE_PATH)

In [None]:
# with open(FILE_PATH, 'r') as file:
#     data_human = json.load(file)
# log_probs_base_human = data_human["log_probs_base"]
# log_probs_transformed_human = data_human["log_probs_transformed"]

# discrepancy_scores_human = compute_detectgpt_discrepancy(log_probs_base_human,log_probs_transformed_human,normalization=NORMALIZATION)

In [None]:
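import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression

def optimal_threshold(list1, list2):
    """
    Fits a 1-D logistic regression separating list1 (label 0) from list2 (label 1).
    Returns the decision threshold (score at which P(label 1) = 0.5) and the AUROC.
    """
    scores = np.concatenate([np.asarray(list1, dtype=float), np.asarray(list2, dtype=float)])
    X = scores.reshape(-1, 1)
    y = np.concatenate([np.zeros(len(list1)), np.ones(len(list2))])

    clf = LogisticRegression()
    clf.fit(X, y)

    # Score at which the predicted probability equals 0.5
    best_threshold = -clf.intercept_[0] / clf.coef_[0][0]

    # AUROC is a ranking metric: compute it on the continuous scores,
    # not on thresholded 0/1 predictions
    auroc = roc_auc_score(y, scores)

    return best_threshold, auroc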
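
AUROC can be read as the probability that a randomly chosen AI text receives a higher discrepancy score than a randomly chosen human text, with ties counting one half. The following is a small dependency-free sketch of that definition on toy scores; `pairwise_auroc` is a hypothetical helper used only for illustration and the scores are made up.

```python
def pairwise_auroc(negative_scores, positive_scores):
    """Fraction of (negative, positive) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for neg in negative_scores:
        for pos in positive_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(negative_scores) * len(positive_scores))

# Toy discrepancy scores (not real results)
toy_human = [0.1, 0.4, 0.35]
toy_ai = [0.8, 0.3, 0.9]
toy_auroc = pairwise_auroc(toy_human, toy_ai)
```

Because this definition depends only on how the scores are ranked, it needs the continuous scores as input. Feeding it thresholded 0/1 predictions collapses most pairs into ties and generally gives a different value.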

In [None]:
threshold,auroc = optimal_threshold(discrepancy_scores_human, discrepancy_scores_ai)
print(f"Optimal threshold: {threshold:.2f}")
print(f"AUROC: {auroc:.2f}")

plt.hist(discrepancy_scores_human, bins=15, alpha=0.5, label='Human', edgecolor='black', density=True)
plt.hist(discrepancy_scores_ai, bins=15, alpha=0.5, label='AI', edgecolor='black', density=True)

plt.axvline(threshold, color='red', linestyle='dashed', linewidth=2, label=f'Threshold = {threshold:.2f}')
plt.xlabel(f'Discrepancy scores (normalization={NORMALIZATION})')
plt.ylabel('Density')
plt.title(f'Computation model: {COMPUTATION_MODEL_NAME}')
plt.legend()
plt.show()

## References
[1] E. Mitchell, Y. Lee, A. Khazatsky, C. D. Manning, and C. Finn, "DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature," *arXiv preprint*, 2023. Available at: [arXiv:2301.11305](https://arxiv.org/abs/2301.11305)