# **DetectGPT for dummies**: Identifying AI-generated text
This notebook implements the **DetectGPT** method from Mitchell et al. (2023) [1], which helps determine whether a given text is AI-generated. The approach perturbs the text and compares the log-probability of the original with the log-probabilities of its perturbed versions.

In [None]:
import os
import re
import json
import torch
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer, T5ForConditionalGeneration, T5Tokenizer

In [None]:
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(DEVICE)

## I- **Model setup**

This part sets up the transformer-based models needed to:
1. produce the AI-generated text - ``generation_model``
2. compute the log-probabilities - ``computation_model``
3. perturb the text with T5 mask filling - ``t5_model``

### 1. **Text generation**

In [None]:
CACHE_DIR = "/tmp/huggingface"

GENERATION_MODEL_NAME = "openai-community/gpt2-large"
# Model list (all tested)
# gpt2
# gpt2-large
# EleutherAI/gpt-j-6B
# EleutherAI/gpt-neox-20b

TORCH_DTYPE = torch.bfloat16 # use bfloat16 for all models

# Load model
generation_model = AutoModelForCausalLM.from_pretrained(GENERATION_MODEL_NAME, device_map="auto", torch_dtype=TORCH_DTYPE, cache_dir=CACHE_DIR)

# Load tokenizer 
generation_tokenizer = AutoTokenizer.from_pretrained(GENERATION_MODEL_NAME)

# Set model to evaluation mode
generation_model.eval()

### 2. **Computation**

In [None]:
CACHE_DIR = "/tmp/huggingface"
COMPUTATION_MODEL_NAME = "openai-community/gpt2-large"
TORCH_DTYPE = torch.bfloat16 # use bfloat16 for all models

# Load model
computation_model = AutoModelForCausalLM.from_pretrained(COMPUTATION_MODEL_NAME, device_map="auto", torch_dtype=TORCH_DTYPE, cache_dir=CACHE_DIR)

# Load tokenizer 
computation_tokenizer = AutoTokenizer.from_pretrained(COMPUTATION_MODEL_NAME, cache_dir=CACHE_DIR)
computation_tokenizer.pad_token = computation_tokenizer.eos_token

# Set model to evaluation mode (ensures stable log prob estimation + disables dropout)
computation_model.eval()

computation_model.to(DEVICE)
print(DEVICE)

### 3. **Perturbation**

In [None]:
CACHE_DIR = "/tmp/huggingface"
PERTURBATION_MODEL_NAME = "t5-large"
TORCH_DTYPE = torch.bfloat16 # use bfloat16 for all models

# Load model
t5_model = T5ForConditionalGeneration.from_pretrained(PERTURBATION_MODEL_NAME, device_map="auto", torch_dtype=TORCH_DTYPE, cache_dir=CACHE_DIR)

# Load tokenizer 
t5_tokenizer = T5Tokenizer.from_pretrained(PERTURBATION_MODEL_NAME, cache_dir=CACHE_DIR)

# Set to evaluation mode
t5_model.eval()

t5_model.to(DEVICE)
print(DEVICE)

## II- **Code setup**

### 1. **🔀 Text perturbation** *NEW VERSION*

This section defines the **T5-based perturbation function**, which modifies the input text slightly while preserving its meaning. 

- **Why is perturbation needed?** AI-generated text tends to sit near a **local maximum** of the model's log-probability function (a region of **negative curvature**), so slight perturbations usually lower its log probability, whereas perturbing human-written text may raise or lower it
- **How does it work?** Random spans of words are masked, and the **T5 model** fills them in, producing a slightly different version of the text

These perturbed texts will later be compared to their original versions to compute the discrepancy scores d.
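
To make the mask-and-fill idea concrete, here is a minimal sketch that uses hand-written strings in place of a real T5 call. The strings `masked_text` and `raw_output` and the two helper names are illustrative assumptions, not part of the notebook's pipeline. It shows how the fills sit between `<extra_id_N>` sentinel tokens in the model output and how they are substituted back into the masked text.

```python
import re

# Hypothetical strings: a real run would obtain `raw_output` from T5.
masked_text = "The quick <extra_id_0> fox jumps <extra_id_1> dog"
raw_output = "<pad><extra_id_0> brown<extra_id_1> over the lazy<extra_id_2></s>"

sentinel = re.compile(r"<extra_id_\d+>")

def fills_from_output(raw: str) -> list[str]:
    """Return the text found between consecutive sentinel tokens."""
    cleaned = raw.replace("<pad>", "").replace("</s>", "").strip()
    return [piece.strip() for piece in sentinel.split(cleaned)[1:-1]]

def fill_masks(masked: str, fills: list[str]) -> str:
    """Replace <extra_id_i> in the masked text with the i-th fill."""
    tokens = masked.split(" ")
    for i, fill in enumerate(fills):
        tokens[tokens.index(f"<extra_id_{i}>")] = fill
    return " ".join(tokens)

fills = fills_from_output(raw_output)
perturbed = fill_masks(masked_text, fills)
print(fills)
print(perturbed)
```

The final sentinel (`<extra_id_2>` here) only marks the end of the last fill, which is why the split result is sliced with `[1:-1]`.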

In [None]:
def tokenize_and_mask(text: str, span_length: int, pct: float, buffer_size: int, ceil_pct: bool = False) -> str:
    """
    Tokenizes a text and applies masking by replacing certain spans with placeholder tokens

    Args:
        text (str): input text to be tokenized and masked
        span_length (int): length of each masked span
        pct (float): percentage of the text to be masked (as in DetectGPT codebase, so not exactly pure percentage of the text)
        buffer_size (int): buffer size around masked spans (to prevent overlap)
        ceil_pct (bool - optional): whether to round up the number of spans (Default: False)

    Return:
        text (str): masked text with placeholder token
    """
    tokens = text.split(' ')
    mask_string = '<<<mask>>>'

    # Calculate number of masked spans
    n_spans = pct * len(tokens) / (span_length + buffer_size * 2)

    if ceil_pct:
        n_spans = np.ceil(n_spans)
    n_spans = int(n_spans)

    n_masks = 0
    while n_masks < n_spans:
        start = np.random.randint(0, len(tokens) - span_length)
        end = start + span_length
        search_start = max(0, start - buffer_size)
        search_end = min(len(tokens), end + buffer_size)

        # Ensure no overlapping masks in buffer region
        if mask_string not in tokens[search_start:search_end]:
            tokens[start:end] = [mask_string]
            n_masks += 1
    
    # Replace each occurrence of mask_string with <extra_id_NUM>, where NUM increments
    num_filled = 0
    for idx, token in enumerate(tokens):
        if token == mask_string:
            tokens[idx] = f'<extra_id_{num_filled}>'
            num_filled += 1

    assert num_filled == n_masks, f"num_filled {num_filled} != n_masks {n_masks}"

    text = ' '.join(tokens)
    return text

def count_masks(texts: list[str]) -> list[int]:
    """
    Counts the number of mask tokens in each text

    Args:
        texts (list): list of texts containing mask tokens (format: "<extra_id_N>" where N is any int)

    Returns:
        n_masks (list): list where each element represents the number of mask tokens in the corresponding text
    """
    n_masks = [len([x for x in text.split() if x.startswith("<extra_id_")]) for text in texts]
    return n_masks

def replace_masks(texts: list[str]) -> list[str]:
    """
    Replaces masked spans in texts with generated text using a T5 model

    Args:
        texts (list): list of texts containing mask tokens (format: "<extra_id_N>" where N is any int)

    Returns:
        (list): list of texts where masked spans have been replaced with generated content
    """
    n_expected = count_masks(texts) # Count number of masks per text
    stop_id = t5_tokenizer.encode(f"<extra_id_{max(n_expected)}>")[0] # Define stopping condition

    # Tokenize the input texts
    tokens = t5_tokenizer(texts, return_tensors="pt", padding=True).to(DEVICE)

    # Generate replacements for the masks using T5
    outputs = t5_model.generate(
        **tokens, 
        max_length=150, 
        do_sample=True, 
        top_p=0.96, 
        num_return_sequences=1, 
        eos_token_id=stop_id
    )

    # Decode the generated output
    texts_replaced = t5_tokenizer.batch_decode(outputs, skip_special_tokens=False)
    return texts_replaced

# Define a regex pattern to match all placeholder tokens in the format <extra_id_N> (where N is any int)
pattern = re.compile(r"<extra_id_\d+>")

def extract_fills(texts: list[str]) -> list[list[str]]:
    """
    Extracts the generated text fills from masked texts

    Args:
        texts (list): list of texts where masked spans have been replaced with generated text

    Returns:
        extracted_fills (list): a list of lists, where each inner list contains the extracted fills for the corresponding input text
    """
    # Remove "<pad>" and "</s>" tokens from the beginning/end of each text
    texts = [x.replace("<pad>", "").replace("</s>", "").strip() for x in texts]

    # Extract the text between mask tokens (uses the module-level `pattern` defined above)
    extracted_fills = [pattern.split(x)[1:-1] for x in texts]

    # Trim whitespace from each extracted fill
    extracted_fills = [[y.strip() for y in x] for x in extracted_fills]

    return extracted_fills

def apply_extracted_fills(masked_texts: list[str], extracted_fills: list[list[str]]) -> list[str]:
    """
    Replaces mask tokens in masked texts with corresponding extracted fills

    Args:
        masked_texts (list): list of texts containing mask tokens
        extracted_fills (list): list of lists, where each inner list contains the extracted fill text for the corresponding masked text

    Returns:
        texts (list): list of texts with all masks replaced by their corresponding extracted fills
    """
    # Split masked text into tokens, keeping spaces intact
    tokens = [x.split(' ') for x in masked_texts]

    # Count expected number of masks per text
    n_expected = count_masks(masked_texts)

    # Replace each mask token with the corresponding extracted fill
    for idx, (text, fills, n) in enumerate(zip(tokens, extracted_fills, n_expected)):
        if len(fills) < n:
            tokens[idx] = []  # Empty text if not enough fills are available
        else:
            for fill_idx in range(n):
                if f"<extra_id_{fill_idx}>" in text:
                    text[text.index(f"<extra_id_{fill_idx}>")] = fills[fill_idx]

    # Join tokens back into text format
    texts = [" ".join(x) for x in tokens]
    return texts

def perturb_texts_(texts: list[str], span_length: int, pct: float, buffer_size: int, ceil_pct: bool = False) -> list[str]:
    """
    Applies T5-based perturbation to a list of texts by masking spans, generating replacements, and applying the generated fills

    Args:
        texts (list): list of input texts to be perturbed
        span_length (int): length of each masked span
        pct (float): percentage of the text to be masked (as in DetectGPT codebase, so not exactly pure percentage of the text)
        buffer_size (int): buffer size around masked spans (to prevent overlap)
        ceil_pct (bool, optional): whether to round up the number of spans (Default: False)

    Returns:
        perturbed_texts (list): list of perturbed texts
    """
    # Step 1: Mask spans in the input texts
    masked_texts = [tokenize_and_mask(x, span_length, pct, buffer_size, ceil_pct) for x in texts]
    print(f"Masked texts: {masked_texts}")

    # Step 2: Generate replacement texts
    raw_fills = replace_masks(masked_texts)
    print(f"Raw fills: {raw_fills}")

    # Step 3: Extract only the generated fills
    extracted_fills = extract_fills(raw_fills)
    print(f"Extracted fills: {extracted_fills}")

    # Step 4: Apply the extracted fills to reconstruct the perturbed texts
    perturbed_texts = apply_extracted_fills(masked_texts, extracted_fills)
    print(f"Perturbed texts: {perturbed_texts}")
    print(f"Original texts: {texts}")

    # Handle cases where the model doesn't generate the correct number of fills
    attempts = 1
    while '' in perturbed_texts:
        idxs = [idx for idx, x in enumerate(perturbed_texts) if x == '']
        print(f'WARNING: {len(idxs)} texts have no fills. Retrying [attempt {attempts}].')

        # Retry perturbation for failed cases
        masked_texts = [tokenize_and_mask(texts[idx], span_length, pct, buffer_size, ceil_pct) for idx in idxs]
        raw_fills = replace_masks(masked_texts)
        extracted_fills = extract_fills(raw_fills)
        new_perturbed_texts = apply_extracted_fills(masked_texts, extracted_fills)

        # Update perturbed texts
        for idx, new_text in zip(idxs, new_perturbed_texts):
            perturbed_texts[idx] = new_text

        attempts += 1

    return perturbed_texts

def perturb_texts(texts: list[str], span_length: int, pct: float, buffer_size: int, ceil_pct: bool = False) -> list[str]:
    """
    Applies T5-based perturbation to a list of texts (in chunks for efficiency)

    Note: wrapper function around `perturb_texts_`

    Args:
        texts (list): list of input texts to be perturbed
        span_length (int): length of each masked span
        pct (float): percentage of the text to be masked (as in DetectGPT codebase, so not exactly pure percentage of the text)
        buffer_size (int): buffer size around masked spans (to prevent overlap)
        ceil_pct (bool, optional): whether to round up the number of spans (Default: False)

    Returns:
        outputs (list): list of perturbed texts
    """
    chunk_size = 20  # Process texts in batches of 20 for efficiency
    outputs = []

    for i in range(0, len(texts), chunk_size):
        batch = texts[i:i + chunk_size]
        perturbed_batch = perturb_texts_(batch, span_length, pct, buffer_size, ceil_pct=ceil_pct)
        outputs.extend(perturbed_batch)

    return outputs
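
The docstrings above note that `pct` is not exactly the percentage of masked words. The sketch below reproduces only the span-count arithmetic from `tokenize_and_mask` to show why. The helper name `n_masked_spans` and the example values are assumptions chosen for illustration.

```python
import math

def n_masked_spans(n_words: int, span_length: int, pct: float,
                   buffer_size: int, ceil_pct: bool = False) -> int:
    """Number of spans masked for a text of n_words words."""
    n_spans = pct * n_words / (span_length + buffer_size * 2)
    return math.ceil(n_spans) if ceil_pct else int(n_spans)

# Example values (assumed): a 20-word text, 2-word spans, pct of 0.3, buffer of 1
n_words, span_length, pct, buffer_size = 20, 2, 0.3, 1

n_floor = n_masked_spans(n_words, span_length, pct, buffer_size)
n_ceil = n_masked_spans(n_words, span_length, pct, buffer_size, ceil_pct=True)

# Fraction of words actually masked in each case
frac_floor = n_floor * span_length / n_words
frac_ceil = n_ceil * span_length / n_words
print(n_floor, frac_floor)
print(n_ceil, frac_ceil)
```

Because the denominator includes the buffer on both sides of each span, the fraction of words actually masked is lower than `pct` whenever `buffer_size > 0`.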

### 2. **🔍 Main functions: *DetectGPT* Method**

This section implements the **DetectGPT method**.

- **Key idea:** once again, AI-generated texts tend to lie in **negative-curvature regions** of the model's log-probability function.
- **How does it work?**
  - We perturb each text multiple times (``n_perturbations``). We will use ``n_samples`` texts truncated to ``max_length`` words.
  - Compute log probabilities for both **original** and **perturbed** texts
  - Measure the **discrepancy score** (a higher score suggests AI-generated text)
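
In the spirit of Mitchell et al. [1], the discrepancy score from the last bullet can be written as below. Here $p_\theta$ is the scoring model, $\tilde{x}_1, \dots, \tilde{x}_k$ are the perturbed versions of the text $x$ (with $k$ = ``n_perturbations``), and the normalized variant divides by the sample standard deviation of the perturbed log probabilities. The symbols $\tilde{\mu}$, $\tilde{\sigma}$ and $d_{\text{norm}}$ are notation chosen here for readability. Note that the code in this notebook scores each text with its mean per-token log probability, which is an implementation choice.

$$
d(x) = \log p_\theta(x) - \tilde{\mu}, \qquad \tilde{\mu} = \frac{1}{k} \sum_{i=1}^{k} \log p_\theta(\tilde{x}_i)
$$

$$
d_{\text{norm}}(x) = \frac{d(x)}{\tilde{\sigma}}, \qquad \tilde{\sigma}^2 = \frac{1}{k-1} \sum_{i=1}^{k} \left( \log p_\theta(\tilde{x}_i) - \tilde{\mu} \right)^2
$$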

In [None]:
def get_ll(text: str) -> float:
    """
    Compute the average per-token log prob for a single text
    
    Args:
        text (str): input text

    Returns:
        log_prob (float): mean per-token log prob of the text (negative of the model's cross-entropy loss)
    """
    with torch.no_grad():
        # Tokenize the text
        tokenized = computation_tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(DEVICE)
        labels = tokenized.input_ids

        # Compute loss (NLL per token)
        loss = computation_model(**tokenized, labels=labels).loss.item()

        # Compute log-prob 
        log_prob = -loss

    return log_prob

def get_lls(texts: list) -> list:
    """
    Compute log prob for multiple texts
    
    Args:
        texts (list): list of texts

    Returns:
        log_probs (list): log prob of each text
    """
    log_probs = [get_ll(text) for text in texts]
    return log_probs

In [None]:
def get_ppl(text: str) -> float:
    """
    Compute perplexity for a single text
    
    Args:
        text (str): input text

    Returns:
        perplexity (float): perplexity of the text
    """
    with torch.no_grad():
        # Tokenize the text
        tokenized = computation_tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(DEVICE)
        labels = tokenized.input_ids

        # Compute loss (NLL per token)
        loss = computation_model(**tokenized, labels=labels).loss.item()

        # Compute perplexity
        perplexity_score = torch.exp(torch.tensor(loss)).item()
    
    return perplexity_score


def get_ppls(texts: list) -> list:
    """
    Compute perplexity for multiple texts
    
    Args:
        texts (list): list of texts

    Returns:
        perplexity_scores (list): perplexity score of each text
    """
    perplexity_scores = [get_ppl(text) for text in texts]
    return perplexity_scores
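
The two quantities computed above are closely related: `get_ll` returns the negative of the mean per-token loss, and `get_ppl` returns the exponential of that same loss. A minimal sketch with made-up per-token log probabilities (the values in `token_log_probs` are hypothetical, not model output):

```python
import math

# Hypothetical per-token log probabilities for a 3-token text
token_log_probs = [-2.0, -0.5, -1.5]

# Mean per-token negative log-likelihood, i.e. what the model reports as `loss`
loss = -sum(token_log_probs) / len(token_log_probs)

mean_log_prob = -loss        # what get_ll returns
perplexity = math.exp(loss)  # what get_ppl returns

print(mean_log_prob, perplexity)
```

A higher mean log probability therefore always corresponds to a lower perplexity.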

In [None]:
def compute_detectgpt_discrepancy(log_probs_per_text_base: list, log_probs_per_text_transformed: list, normalization: bool=False) -> list:
    """
    Compute the DetectGPT discrepancy metric for each of the n_samples texts. Computed for n_perturbations perturbations.

    Args:
        log_probs_per_text_base (list): original log probability of each text
        log_probs_per_text_transformed (list): list of size n_samples where each element is a list of the n_perturbations perturbed log probs
        normalization (bool): True if you want to normalize the discrepancy scores, False otherwise

    Returns:
        discrepancy_scores (list): list of discrepancy values (d) for the n_samples texts
    """
    n_samples = len(log_probs_per_text_base) 
    discrepancy_scores = []

    for i in range(n_samples):
        original_log_prob = log_probs_per_text_base[i]
        perturbed_log_probs = log_probs_per_text_transformed[i] # List of perturbed log probs
        n_perturbations = len(perturbed_log_probs) # Number of perturbations

        # Compute mean log probability of the perturbed texts
        mu = sum(perturbed_log_probs) / n_perturbations  

        # Compute discrepancy
        discrepancy_score_unnormalized = original_log_prob - mu
        if normalization:
            # Normalize by the sample standard deviation (needs at least 2 perturbations)
            if n_perturbations > 1:
                variance = sum((log_prob - mu) ** 2 for log_prob in perturbed_log_probs) / (n_perturbations - 1)
            else:
                variance = 0.0
            sigma = variance ** 0.5
            discrepancy_score_normalized = discrepancy_score_unnormalized / sigma if sigma > 0 else 0
            discrepancy_scores.append(discrepancy_score_normalized)
        else:
            discrepancy_scores.append(discrepancy_score_unnormalized)
    
    return discrepancy_scores
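
As a quick sanity check of the function above, the same computation can be done by hand for one text with the standard library. The log-probability values below are made up for illustration.

```python
import statistics

# Hypothetical values: one original text and three perturbed versions
original_log_prob = -2.0
perturbed_log_probs = [-2.5, -3.0, -2.8]

mu = statistics.mean(perturbed_log_probs)       # mean perturbed log prob
sigma = statistics.stdev(perturbed_log_probs)   # sample std (divides by n - 1)

d = original_log_prob - mu   # unnormalized discrepancy
d_norm = d / sigma           # normalized discrepancy

print(d, d_norm)
```

A positive score means the original text is more probable than its perturbations on average, which is the pattern expected for AI-generated text.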

In [None]:
def optimized_processing(data: list, 
                         n_samples: int, 
                         max_length: int, 
                         n_perturbations: int, 
                         span_length: int, 
                         pct: float, 
                         buffer_size: int)-> tuple[list, list]:
    """
    Compute log probabilities for original and perturbed texts.
    
    This function processes multiple text samples, computes their log probabilities, 
    applies perturbations to the texts, and then computes the log probabilities 
    of the perturbed versions.

    Args:
        data (list): list of dictionaries containing text (e.g. [{"text": "sample text"}, ...])
        n_samples (int): number of texts to process
        max_length (int): maximum number of words to consider in each text
        n_perturbations (int): number of perturbations applied to each text
        span_length (int): length of each masked span
        pct (float): percentage of the text to be masked (as in DetectGPT codebase, so not exactly pure percentage of the text)
        buffer_size (int): buffer size around masked spans (to prevent overlap)
    
    Returns:
        tuple: 
            - log_probs_per_text_base (list): log probs of the original texts
            - log_probs_per_text_transformed (list of lists): log probs of the perturbed texts,
              structured as a list where each element corresponds to a text and contains 
              a list of its perturbed log probs
    """
    log_probs_per_text_transformed = []

    # Process original texts in batches
    original_texts = [" ".join(data[j]["text"].split()[:max_length]) for j in range(n_samples)]

    # Calculate log probabilities of the original texts
    log_probs_per_text_base = get_lls(original_texts)

    for perturbation_idx in tqdm(range(n_perturbations), desc=f"For all the {n_samples} texts, processing perturbation"):
        # Apply perturbation
        all_perturbed_texts = perturb_texts(original_texts, span_length=span_length, pct=pct, buffer_size=buffer_size, ceil_pct=False)
        
        # Calculate log probabilities of the perturbed texts
        all_log_probs = get_lls(all_perturbed_texts)

        # Organize results
        for j in range(n_samples):
            if perturbation_idx == 0:
                log_probs_per_text_transformed.append([])
            log_probs_per_text_transformed[j].append(all_log_probs[j])

    return log_probs_per_text_base, log_probs_per_text_transformed
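
The function above collects one log probability per text in each perturbation round and regroups them so that each text ends up with its own list. A minimal sketch of that regrouping using `zip` (the values in `rounds` are hypothetical):

```python
# Hypothetical results: 3 perturbation rounds, each scoring the same 2 texts
rounds = [
    [-3.1, -2.4],  # round 0: text 0, text 1
    [-3.3, -2.2],  # round 1
    [-3.0, -2.6],  # round 2
]

# Transpose: one list per text, holding its log prob from every round
log_probs_per_text = [list(column) for column in zip(*rounds)]
print(log_probs_per_text)
```

The resulting layout, with `n_samples` lists of `n_perturbations` values each, is the one `compute_detectgpt_discrepancy` expects as its second argument.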


### 3. **Utility functions**

In [None]:
# Memory management utilities
def clear_cuda_cache():
    """Clear CUDA cache to free up memory."""
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

## III- **Data loading**

**📌 Dataset format guidelines**

All datasets (human-written and AI-generated) must follow this format:

- Stored as a **`.jsonl`** where each line is a dictionary.
- Each entry contains (minimum requirement):
  - `"text"`: the text content
  - `"model"`: for human text please label it as `"human"` and for AI-generated texts, please specify the model used (e.g. ``"gpt2-large"``)
  - `"source"`: the origin of the text (e.g., `"wikihow"`, `"reddit"`, `"news articles"`)
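
A small check along these lines can catch malformed lines before an experiment starts. This is only a sketch: `REQUIRED_KEYS` and `parse_record` are hypothetical names, not part of the notebook.

```python
import json

# Minimum fields every record must contain (from the guidelines above)
REQUIRED_KEYS = {"text", "model", "source"}

def parse_record(line: str) -> dict:
    """Parse one .jsonl line and verify the required fields are present."""
    record = json.loads(line)
    missing = REQUIRED_KEYS - set(record)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return record

sample_line = '{"text": "A groundbreaking discovery in physics was made today.", "model": "human", "source": "news articles"}'
record = parse_record(sample_line)
is_human = record["model"] == "human"
print(record["source"], is_human)
```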

#### Example (as in ``subtaskB_train.jsonl`` located in `Datasets\SemEval2024-Task8`):
```json
{"text": "A groundbreaking discovery in physics was made today.", "model": "human", "source": "news articles"}
{"text": "The AI revolution is shaping the future of work.", "model": "chatGPT", "source": "AI Generated"}
```


### 1. **Human texts**

In [None]:
BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
FILE_RELATIVE_PATH = os.path.join("Datasets", "SemEval2024-Task8", "subtaskB_train.jsonl")  # portable path, no backslash escapes
FILE_PATH = os.path.join(BASE_DIR,FILE_RELATIVE_PATH) 

if not os.path.exists(FILE_PATH):
    raise FileNotFoundError(f"File not found: {FILE_PATH}")

data_human = []

# Efficiently process the file line by line
with open(FILE_PATH, "r", encoding="utf-8") as file:
    for line in file:
        record = json.loads(line)  # Parse JSON once
        if record.get("model") == "human":
            data_human.append(record)

# Print first human record
print("First human text record:", data_human[0] if data_human else "No human data found.")

### 2. **AI-generated texts**

#### 2.1. *Option 1: produce own AI-generated texts*

In [None]:
def generate_text(prompt: str, max_length: int) -> str:
    """
    Generate AI text from a given prompt

    Args:
        prompt (str): prompt to generate text
        max_length (int): max total length in tokens of the generated sequence (prompt included)

    Returns:
        generated_text (str): generated text
    """
    inputs = generation_tokenizer(prompt, return_tensors="pt").to(DEVICE)
    with torch.no_grad():
        output = generation_model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_length,
            do_sample=True,
            temperature=0.7
        )
    
    generated_text = generation_tokenizer.decode(output[0].cpu(), skip_special_tokens=True)

    return generated_text

def generate_dataset(n_samples: int, max_length: int, generation_model_name: str, prompt: str) -> list:
    """
    Generates a dataset of N AI-generated texts in the required dictionary format

    Args:
        n_samples (int): number of AI-generated texts
        max_length (int): max length of each generated text
        generation_model_name (str): name of the AI generation model
        prompt (str): prompt to start text generation
    
    Returns:
        data_ai (list): dataset of AI-generated texts (list of dictionaries)
    """
    data_ai = []

    for _ in tqdm(range(n_samples), desc="Generating dataset", unit="sample"):
        generated_text = generate_text(prompt, max_length)
        data_ai.append({
            "text": generated_text,
            "model": generation_model_name,
            "source": "FleLLM"
        })

    return data_ai

In [None]:
generation_model.to(DEVICE)
print(DEVICE)

n_samples = 1
max_length = 8
PROMPT = "In a faraway galaxy,"

clear_cuda_cache()
data_ai_generated = generate_dataset(n_samples=n_samples, max_length=max_length, generation_model_name=GENERATION_MODEL_NAME, prompt=PROMPT)

In [None]:
# Saving the dataset in the correct .jsonl format

BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
FILE_RELATIVE_PATH = os.path.join("Datasets", "AI-generated", "dataset_ai.jsonl")  # portable path, no backslash escapes
FILE_PATH = os.path.join(BASE_DIR,FILE_RELATIVE_PATH)

with open(FILE_PATH, "w", encoding="utf-8") as f:
    for entry in data_ai_generated:
        f.write(json.dumps(entry) + "\n")

In [None]:
# Print first AI-generated text record
print("First AI-generated text record:", data_ai_generated[0] if data_ai_generated else "No AI-generated data found.")

#### 2.2. *Option 2: load AI-generated texts from a dataset*

In [None]:
BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
FILE_RELATIVE_PATH = os.path.join("Datasets", "SemEval2024-Task8", "subtaskB_train.jsonl")  # portable path, no backslash escapes
# FILE_RELATIVE_PATH = os.path.join("Datasets", "AI-generated", "dataset_ai.jsonl")
FILE_PATH = os.path.join(BASE_DIR,FILE_RELATIVE_PATH) 

if not os.path.exists(FILE_PATH):
    raise FileNotFoundError(f"File not found: {FILE_PATH}")

data_ai_dataset = []

# Process the file line by line, keeping only the non-human records
with open(FILE_PATH, "r", encoding="utf-8") as file:
    for line in file:
        record = json.loads(line)  # Parse JSON once
        if record.get("model") != "human":
            data_ai_dataset.append(record)

# Print first AI-generated text record
print("First AI-generated text record:", data_ai_dataset[0] if data_ai_dataset else "No AI-generated data found.")

### 3. **Texts perplexity**

In [None]:
DATA = data_human
n_samples = 2
max_length = 10

original_texts = [" ".join(DATA[j]["text"].split()[:max_length]) for j in range(n_samples)]

clear_cuda_cache()
perplexity_scores = get_ppls(texts=original_texts)

## IV- **Example usage**

Human texts

In [None]:
# Data
DATA = data_human

# Experiment setup
n_samples = 2
max_length = 20

# Perturbation setup
n_perturbations = 2
span_length = 2
pct = 0.3
buffer_size = 1

clear_cuda_cache()

# Compute log probs before and after perturbation
log_probs_base_human, log_probs_transformed_human = optimized_processing(
    data=DATA,
    n_samples=n_samples,
    max_length=max_length,
    n_perturbations=n_perturbations,
    span_length=span_length,
    pct=pct,
    buffer_size=buffer_size
    )

In [None]:
normalization = False # True if you want to normalize the discrepancy scores

# Compute discrepancy scores
discrepancy_scores_human = compute_detectgpt_discrepancy(log_probs_base_human,log_probs_transformed_human,normalization=normalization)

# Store all results
results_human = {}
results_human["log_probs_base"] = log_probs_base_human
results_human["log_probs_transformed"] = log_probs_transformed_human
results_human["discrepancy_scores"] = discrepancy_scores_human

In [None]:
# Saving results
BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
FILE_RELATIVE_PATH = os.path.join("Results", "experiment_0_results_human.json")  # portable path, no backslash escapes
FILE_PATH = os.path.join(BASE_DIR,FILE_RELATIVE_PATH) 
with open(FILE_PATH, "w") as f:
    json.dump(results_human, f, indent=2)

AI-generated texts

In [None]:
# Data
DATA = data_ai_dataset # DATA = data_ai_generated

# Experiment setup
n_samples = 2
max_length = 20
n_perturbations = 2

# Perturbation setup
span_length = 2
pct = 0.3
buffer_size = 1

# Compute log probs before and after perturbation
clear_cuda_cache()
log_probs_base_ai, log_probs_transformed_ai = optimized_processing(
    data=DATA,
    n_samples=n_samples,
    max_length=max_length,
    n_perturbations=n_perturbations,
    span_length=span_length,
    pct=pct,
    buffer_size=buffer_size
    )

In [None]:
normalization = False # True if you want to normalize the discrepancy scores

# Compute discrepancy scores
discrepancy_scores_ai = compute_detectgpt_discrepancy(log_probs_base_ai,log_probs_transformed_ai,normalization=normalization)

# Store all results
results_ai = {}
results_ai["log_probs_base"] = log_probs_base_ai
results_ai["log_probs_transformed"] = log_probs_transformed_ai
results_ai["discrepancy_scores"] = discrepancy_scores_ai

In [None]:
# Saving results
BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
FILE_RELATIVE_PATH = os.path.join("Results", "experiment_0_results_ai.json")  # portable path, no backslash escapes
FILE_PATH = os.path.join(BASE_DIR,FILE_RELATIVE_PATH) 

with open(FILE_PATH, "w") as f:
    json.dump(results_ai, f, indent=2)

## V- **Analysis**

### 1. **Histograms of the discrepancy scores**

In [None]:
NORMALIZATION = False # True if you want to normalize the discrepancy scores

In [None]:
# AI texts results
BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
FILE_RELATIVE_PATH = os.path.join("Results", "results_ai.json")  # portable path
FILE_PATH = os.path.join(BASE_DIR,FILE_RELATIVE_PATH) 

with open(FILE_PATH, 'r') as file:
    data_ai = json.load(file)

In [None]:
log_probs_base_ai = data_ai["log_probs_base"]
log_probs_transformed_ai = data_ai["log_probs_transformed"]

discrepancy_scores_ai = compute_detectgpt_discrepancy(log_probs_base_ai,log_probs_transformed_ai,normalization=NORMALIZATION)

In [None]:
# Human texts results
BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
FILE_RELATIVE_PATH = os.path.join("Results", "results_human.json")  # portable path
FILE_PATH = os.path.join(BASE_DIR,FILE_RELATIVE_PATH) 

with open(FILE_PATH, 'r') as file:
    data_human = json.load(file)

In [None]:
log_probs_base_human = data_human["log_probs_base"]
log_probs_transformed_human = data_human["log_probs_transformed"]

discrepancy_scores_human = compute_detectgpt_discrepancy(log_probs_base_human,log_probs_transformed_human,normalization=NORMALIZATION)

In [None]:
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression

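def optimal_threshold(list1, list2):
    """
    Fit a 1-D logistic regression on the scores and return its decision threshold and the AUROC

    Args:
        list1 (list): scores of the negative class (label 0, e.g. human texts)
        list2 (list): scores of the positive class (label 1, e.g. AI-generated texts)
    """
    X = np.concatenate([list1, list2]).reshape(-1, 1)
    y = np.concatenate([np.zeros(len(list1)),np.ones(len(list2))])
    
    clf = LogisticRegression()
    clf.fit(X, y)
    
    # Score at which the predicted probability equals 0.5
    best_threshold = -clf.intercept_[0] / clf.coef_[0][0]
    
    # AUROC must be computed from continuous scores, not from thresholded 0/1 predictions
    y_score = clf.predict_proba(X)[:, 1]
    auroc = roc_auc_score(y, y_score)
    
    return best_threshold, auroc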
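
For intuition about the AUROC value, note that it equals the probability that a randomly chosen AI text receives a higher score than a randomly chosen human text (ties count as one half). The sketch below computes it directly from raw scores by comparing all pairs. The function name `auroc_from_scores` and the example scores are assumptions for illustration, not output of the experiments.

```python
def auroc_from_scores(negative_scores: list[float], positive_scores: list[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical discrepancy scores
human_scores = [-0.2, 0.1, 0.3]
ai_scores = [0.2, 0.5, 0.9]

auroc_pairs = auroc_from_scores(human_scores, ai_scores)
print(auroc_pairs)
```

A value of 0.5 corresponds to a detector that ranks the two classes no better than chance, and 1.0 to perfect separation.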

In [None]:
threshold,auroc = optimal_threshold(discrepancy_scores_human, discrepancy_scores_ai)
print(f"Optimal threshold: {threshold:.2f}")
print(f"AUROC: {auroc:.2f}")

plt.hist(discrepancy_scores_human, bins=15, alpha=0.5, label='Human', edgecolor='black', density=True)
plt.hist(discrepancy_scores_ai, bins=15, alpha=0.5, label='AI', edgecolor='black', density=True)

plt.axvline(threshold, color='red', linestyle='dashed', linewidth=2, label=f'Threshold = {threshold:.2f}')
plt.xlabel(f'Discrepancy scores (normalization={NORMALIZATION})')
plt.ylabel('Density')
plt.title(f'Computation model: {COMPUTATION_MODEL_NAME}')
plt.legend()
plt.show()

## References
[1] E. Mitchell, Y. Lee, A. Khazatsky, C. D. Manning, and C. Finn, "DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature," *arXiv preprint*, 2023. Available at: [arXiv:2301.11305](https://arxiv.org/abs/2301.11305)