# Text Detection Deep Dive with Veridex

This notebook demonstrates how to use Veridex to detect AI-generated text using various statistical and model-based signals.

We will cover:
1. **Zlib Entropy**: A lightweight, dependency-free method.
2. **Perplexity & Burstiness**: Using a language model to measure how predictable the text is and how much that predictability varies across sentences.
3. **Binoculars**: A state-of-the-art zero-shot detection method.

In [None]:
# Install dependencies if not already installed
%pip install "veridex[text]"

In [None]:
from veridex.text import ZlibEntropySignal, PerplexitySignal, BinocularsSignal
import pandas as pd

## 1. Set Up Data

Let's define two short sample passages on the same topic: one written by a human, and one that reads like typical machine-generated text.

In [None]:
human_text = """
The industrial revolution was a period of major mechanization and innovation that began in Great Britain during the mid-18th century and early 19th century and later spread throughout much of the world. The American Industrial Revolution, sometimes referred to as the Second Industrial Revolution, started in the 1870s and continued through World War II.
"""

ai_text = """
The Industrial Revolution marked a significant turning point in history. It was characterized by the transition to new manufacturing processes in Great Britain, continental Europe, and the United States, in the period from about 1760 to sometime between 1820 and 1840. This transition included going from hand production methods to machines.
"""

## 2. Zlib Entropy Signal

This signal compresses the text with zlib and compares the compressed size with the original size. AI-generated text is often more repetitive or predictable, so it tends to compress better, which shows up as a lower entropy ratio.
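To make the idea concrete, here is a minimal sketch of a compression ratio computed with Python's standard `zlib` module. The function name `compression_ratio` and the sample strings are hypothetical, and this is not necessarily how Veridex computes its `entropy_ratio`; it only shows why redundant text yields a lower ratio.

```python
import zlib


def compression_ratio(text: str, level: int = 9) -> float:
    """Return compressed size / raw size for the UTF-8 bytes of `text`.

    Lower values mean the text compressed better (more redundancy).
    Illustrative sketch only; not Veridex's actual implementation.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    compressed = zlib.compress(raw, level)
    return len(compressed) / len(raw)


# Hypothetical inputs: one varied sentence, one highly repetitive passage.
varied = "The quick brown fox jumps over the lazy dog near the riverbank at dawn."
repetitive = "The model is helpful. The model is helpful. The model is helpful. " * 3

print(f"varied:     {compression_ratio(varied):.3f}")
print(f"repetitive: {compression_ratio(repetitive):.3f}")
```

On very short inputs, zlib's fixed overhead (a 2-byte header and a 4-byte Adler-32 checksum) can push the ratio close to or above 1, so compression-based scores are noisy for texts only a sentence or two long, like the samples in this notebook.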

In [None]:
zlib_detector = ZlibEntropySignal()

res_human = zlib_detector.detect(human_text)
res_ai = zlib_detector.detect(ai_text)

print(f"Human Text Score: {res_human.score:.4f} (Raw: {res_human.metadata['entropy_ratio']:.4f})")
print(f"AI Text Score:    {res_ai.score:.4f} (Raw: {res_ai.metadata['entropy_ratio']:.4f})")

## 3. Perplexity Signal

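This signal uses a pre-trained language model (default: GPT-2) to compute the perplexity of the text: the exponentiated average negative log-likelihood per token. Lower perplexity means the model finds the text more predictable, which often, though not always, indicates AI generation.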
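The formula itself is short. The sketch below assumes you already have per-token natural-log probabilities from some model (the values here are made up rather than produced by GPT-2 or Veridex) and shows how perplexity follows from them.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood (natural log) per token."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)


# Hypothetical per-token log-probabilities, as a language model might assign.
predictable = [math.log(0.5)] * 10   # every token has probability 0.5
surprising = [math.log(0.05)] * 10   # every token has probability 0.05

print(perplexity(predictable))  # approximately 2.0
print(perplexity(surprising))   # approximately 20.0
```

A useful reading of the number: if every token had probability 1/k, perplexity would be exactly k, so perplexity behaves like the model's effective number of equally likely choices per token.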

In [None]:
# Initialize detector (downloads model on first run)
ppl_detector = PerplexitySignal()

res_ppl_human = ppl_detector.detect(human_text)
res_ppl_ai = ppl_detector.detect(ai_text)

print("Human Text Results:")
print(f"  Score: {res_ppl_human.score:.4f}")
print(f"  Perplexity: {res_ppl_human.metadata.get('perplexity', 'N/A')}")

print("\nAI Text Results:")
print(f"  Score: {res_ppl_ai.score:.4f}")
print(f"  Perplexity: {res_ppl_ai.metadata.get('perplexity', 'N/A')}")
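The introduction pairs perplexity with burstiness, but the cell above only reports perplexity. Burstiness usually refers to how much perplexity varies from sentence to sentence: human writing tends to mix predictable and surprising sentences, while model output is often more uniform. There is no single standard formula. The sketch below assumes the coefficient of variation of per-sentence perplexities, uses made-up input values, and does not reflect whether or how `PerplexitySignal` reports burstiness.

```python
import statistics


def burstiness(sentence_perplexities: list[float]) -> float:
    """Coefficient of variation (stdev / mean) of per-sentence perplexities.

    One common way to quantify burstiness; other definitions exist.
    """
    if len(sentence_perplexities) < 2:
        return 0.0
    mean = statistics.fmean(sentence_perplexities)
    return statistics.stdev(sentence_perplexities) / mean


# Hypothetical per-sentence perplexities (not produced by Veridex).
human_like = [12.0, 45.0, 8.0, 60.0, 20.0]
machine_like = [14.0, 15.0, 13.0, 16.0, 14.5]

print(f"human-like:   {burstiness(human_like):.3f}")
print(f"machine-like: {burstiness(machine_like):.3f}")
```

Dividing by the mean makes the measure scale-free, so a text that is uniformly hard for the model is not mistaken for a bursty one.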

## 4. Binoculars Signal

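Binoculars compares two perplexity-style quantities: the log-perplexity of the text under one language model, and a cross-perplexity that measures how surprising one model's next-token predictions are to a second model. Their ratio is the Binoculars score, with lower values suggesting machine-generated text. The original paper reports state-of-the-art accuracy among zero-shot detectors.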
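The ratio can be sketched with hand-written next-token distributions over a toy three-token vocabulary. In practice both distributions come from running two related language models over the text. The function names, the toy probabilities, and the assignment of roles to "model A" and "model B" are assumptions for illustration, not Veridex's API or its exact model configuration.

```python
import math

Dist = list[list[float]]  # one next-token distribution per position


def log_perplexity(probs_a: Dist, tokens: list[int]) -> float:
    """Mean negative log-probability that model A assigns to the observed tokens."""
    return -sum(math.log(p[t]) for p, t in zip(probs_a, tokens)) / len(tokens)


def cross_perplexity(probs_a: Dist, probs_b: Dist) -> float:
    """Mean cross-entropy of model B's next-token distribution against model A's."""
    total = 0.0
    for pa, pb in zip(probs_a, probs_b):
        total -= sum(qb * math.log(qa) for qa, qb in zip(pa, pb))
    return total / len(probs_a)


def binoculars_score(probs_a: Dist, probs_b: Dist, tokens: list[int]) -> float:
    """Log-perplexity normalized by cross-perplexity; lower suggests machine text."""
    return log_perplexity(probs_a, tokens) / cross_perplexity(probs_a, probs_b)


# Made-up, strictly positive distributions for 2 positions over 3 tokens.
probs_a = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
probs_b = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]

likely_tokens = [0, 0]    # tokens both models expect
unlikely_tokens = [2, 2]  # tokens both models find surprising

print(f"likely:   {binoculars_score(probs_a, probs_b, likely_tokens):.3f}")
print(f"unlikely: {binoculars_score(probs_a, probs_b, unlikely_tokens):.3f}")
```

The denominator is what distinguishes Binoculars from plain perplexity: it estimates how predictable the context is to language models in general, so a passage about a very predictable topic is not flagged just because its raw perplexity is low.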

In [None]:
# Note: This requires downloading larger models and may be slow on CPU
try:
    binoc_detector = BinocularsSignal()
    
    res_bin_human = binoc_detector.detect(human_text)
    res_bin_ai = binoc_detector.detect(ai_text)
    
    print(f"Human Text AI Probability: {res_bin_human.score:.4f}")
    print(f"AI Text AI Probability:    {res_bin_ai.score:.4f}")
except Exception as e:
    print(f"Binoculars skipped: {e}")