# SynthID Detection: Text & Image AI Detection Guide

This notebook covers:
1. **SynthID Text Detection** (Open Source via Hugging Face Transformers)
2. **SynthID Image Detection** (Limited - Google Cloud Only)
3. **Alternative AI Image Detection Methods**

---

## Part 1: SynthID Text Detection (Open Source)

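SynthID Text is open source: watermarked generation and detection tooling ship with Hugging Face Transformers (v4.46.0+). Detection only works on text generated with the same watermarking keys, and Google's Bayesian detector has to be trained for your specific keys and tokenizer before it can be used.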

In [None]:
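# Install required packages
# Quote the version spec so the shell does not treat ">" as a redirect;
# accelerate is required for device_map="auto"
!pip install "transformers>=4.46.0" torch accelerate -q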

### 1.1 Generate Watermarked Text with SynthID

In [None]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
    SynthIDTextWatermarkDetector,
    SynthIDTextWatermarkLogitsProcessor
)
import torch

# Load a small model for demonstration (larger models work too)
model_name = "facebook/opt-1.3b"  # or "gpt2", "meta-llama/Llama-2-7b-hf" (gated, needs access approval), etc.

print("Loading model and tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Half precision only on GPU; on CPU it is slow or unsupported for some ops
dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype, device_map="auto")

# Configure SynthID watermarking
watermark_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340],  # Example keys, one per tournament layer; use your own random keys and KEEP THEM SECRET in production!
    ngram_len=5,  # Larger = more detectable but more brittle to edits; 5 is a good default
)

print("Model loaded successfully!")
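The `keys` and `ngram_len` settings are easiest to understand through the tournament-sampling idea SynthID Text is built on. The toy sketch below is a simplification, not the Transformers implementation: the SHA-256 hash, the uniform stand-in for the language model and the helper names (`g_value`, `tournament_sample`, `toy_generate`, `mean_g_score`) are all hypothetical. Each key defines one tournament layer, and a candidate's g-value for that layer depends on the previous `ngram_len - 1` tokens plus the candidate itself.

```python
import hashlib
import random


def g_value(context: tuple, candidate: int, key: int) -> int:
    """Toy keyed pseudorandom bit for (context, candidate) under one key.
    Hypothetical: SynthID uses its own hashing scheme, not SHA-256."""
    digest = hashlib.sha256(f"{key}|{context}|{candidate}".encode()).digest()
    return digest[0] & 1


def tournament_sample(context: tuple, candidates: list, keys: list, rng: random.Random) -> int:
    """Pick one token from 2**len(keys) model samples via a knockout tournament.
    In layer i, paired candidates compare g-values under keys[i]; higher wins, ties are random."""
    assert len(candidates) == 2 ** len(keys)
    pool = list(candidates)
    for key in keys:
        winners = []
        for a, b in zip(pool[0::2], pool[1::2]):
            ga, gb = g_value(context, a, key), g_value(context, b, key)
            winners.append(rng.choice([a, b]) if ga == gb else (a if ga > gb else b))
        pool = winners
    return pool[0]


def toy_generate(n_tokens, keys, ngram_len, vocab_size, rng, watermark=True):
    """Generate from a uniform toy 'model'; returns the full sequence incl. a dummy prompt."""
    tokens = [0] * (ngram_len - 1)  # dummy prompt so the first context is defined
    for _ in range(n_tokens):
        context = tuple(tokens[-(ngram_len - 1):])
        candidates = [rng.randrange(vocab_size) for _ in range(2 ** len(keys))]
        tokens.append(tournament_sample(context, candidates, keys, rng) if watermark else candidates[0])
    return tokens


def mean_g_score(tokens, keys, ngram_len):
    """Average g-value over all scored positions and keys (about 0.5 without the watermark)."""
    bits = [
        g_value(tuple(tokens[i - ngram_len + 1:i]), tokens[i], key)
        for i in range(ngram_len - 1, len(tokens))
        for key in keys
    ]
    return sum(bits) / len(bits)


rng = random.Random(0)
keys = [654, 400, 836, 123, 340]
marked = toy_generate(300, keys, ngram_len=5, vocab_size=1000, rng=rng)
plain = toy_generate(300, keys, ngram_len=5, vocab_size=1000, rng=rng, watermark=False)
print(f"watermarked: {mean_g_score(marked, keys, 5):.3f}")
print(f"plain:       {mean_g_score(plain, keys, 5):.3f}")
```

The watermarked sequence should score well above 0.5 while the plain one stays near 0.5: in every layer the tournament favours the candidate whose g-value is 1, which is exactly the bias a detector looks for.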

In [None]:
# Generate watermarked text
prompt = "Artificial intelligence is revolutionizing"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate WITH watermark
watermarked_output = model.generate(
    **inputs,
    watermarking_config=watermark_config,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

watermarked_text = tokenizer.decode(watermarked_output[0], skip_special_tokens=True)

print("\n" + "="*50)
print("WATERMARKED TEXT:")
print("="*50)
print(watermarked_text)
print("="*50)

In [None]:
# Generate WITHOUT watermark (for comparison)
non_watermarked_output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

non_watermarked_text = tokenizer.decode(non_watermarked_output[0], skip_special_tokens=True)

print("\n" + "="*50)
print("NON-WATERMARKED TEXT:")
print("="*50)
print(non_watermarked_text)
print("="*50)

### 1.2 Detect SynthID Watermark in Text

In [None]:
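# How detection works: SynthIDTextWatermarkDetector is not a ready-made classifier. It wraps a
# trained BayesianDetectorModel, which must be trained for your exact keys and tokenizer
# (see github.com/google-deepmind/synthid-text), so it cannot be used out of the box here:
#   detector = SynthIDTextWatermarkDetector(bayesian_detector_model, logits_processor, tokenizer)
#   probs = detector(tokenizer([text], return_tensors="pt").input_ids)  # watermark probabilities
#
# Training-free alternative: the mean g-value score. Each scored token gets one pseudorandom
# g-value (0 or 1) per key. Without this watermark they average ~0.5; watermarked text tends to score higher.

# Rebuild the logits processor from the same config used for generation
logits_processor = SynthIDTextWatermarkLogitsProcessor(
    ngram_len=watermark_config.ngram_len,
    keys=watermark_config.keys,
    sampling_table_size=watermark_config.sampling_table_size,
    sampling_table_seed=watermark_config.sampling_table_seed,
    context_history_size=watermark_config.context_history_size,
    device=model.device,
)

def detect_watermark(text, label=""):
    # Re-tokenize without special tokens so the ids match what the model generated
    input_ids = tokenizer(text, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)

    print(f"\n{'='*50}")
    print(f"DETECTION RESULT: {label}")
    print(f"{'='*50}")
    print(f"Text: {text[:100]}...")

    if input_ids.shape[1] < watermark_config.ngram_len:
        print("⚠️  Text too short to score")
        print(f"{'='*50}")
        return None

    g_values = logits_processor.compute_g_values(input_ids)  # (1, num_ngrams, num_keys)
    # 1 where the (ngram_len - 1)-token context appears for the first time; repeats are not scored
    mask = logits_processor.compute_context_repetition_mask(input_ids).unsqueeze(-1).float()
    score = ((g_values.float() * mask).sum() / (mask.sum() * g_values.shape[-1])).item()

    print(f"\nMean g-value score: {score:.4f}  (≈0.5 expected for text without this watermark)")
    print(f"Scored positions: {int(mask.sum().item())}")
    print(f"{'='*50}")
    return score

# Compare scores: the watermarked text should score clearly above the other two
detect_watermark(watermarked_text, "Watermarked Text")

# Test on non-watermarked text
detect_watermark(non_watermarked_text, "Non-Watermarked Text")

# Test on human-written text
human_text = "The quick brown fox jumps over the lazy dog. This is a classic pangram used in typing tests."
detect_watermark(human_text, "Human-Written Text")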
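How much evidence a mean g-value score carries depends on how many g-values went into it. SynthID detection rests on g-values: pseudorandom 0/1 values, one per scored token and key, that watermarked generation biases toward 1. If the text was not generated with these keys, each g-value behaves roughly like a fair coin flip, so the mean of N of them has standard deviation 0.5/√N. The sketch below turns a mean score into a z-score under that simplified null model; the helper names are hypothetical, and this is a back-of-envelope test, not the trained Bayesian detector SynthID's own tooling uses.

```python
import math


def watermark_z_score(mean_g: float, num_positions: int, num_keys: int) -> float:
    """z-score of a mean g-value against the 'not watermarked' null hypothesis.

    Assumes num_positions * num_keys independent Bernoulli(0.5) g-values under
    the null, which is a simplification.
    """
    n_bits = num_positions * num_keys
    return (mean_g - 0.5) / (0.5 / math.sqrt(n_bits))


def positions_needed(mean_g: float, num_keys: int, z_target: float = 4.0) -> int:
    """Scored positions required before a given mean g-value reaches z_target."""
    if mean_g <= 0.5:
        raise ValueError("mean_g must exceed 0.5 to ever reach a positive z_target")
    # z = 2 * (mean_g - 0.5) * sqrt(n_bits)  =>  n_bits = (z / (2 * (mean_g - 0.5)))**2
    n_bits = (z_target / (2 * (mean_g - 0.5))) ** 2
    return math.ceil(n_bits / num_keys)


# A ~100-token text (96 scored 5-grams) with a weak bias is not conclusive...
print(round(watermark_z_score(0.55, num_positions=96, num_keys=5), 2))
# ...while the same bias becomes significant over a longer text
print(positions_needed(0.55, num_keys=5))
```

This is also why short snippets like the pangram above are hard to judge either way: with few scored positions, even a genuinely watermarked text may not move far from chance.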

### 1.3 Test Robustness to Modifications

In [None]:
# Test watermark detection after simple edits
import re

def modify_text(text, modification_type="minor"):
    if modification_type == "minor":
        # Swap a few whole words (\b avoids mangling words such as "other" or "this")
        modified = re.sub(r"\bthe\b", "a", text)
        return re.sub(r"\bis\b", "was", modified)
    elif modification_type == "paraphrase":
        # Crude simulated paraphrase; real paraphrasing rewrites far more tokens
        modified = text.replace(".", ", which means that.")
        return re.sub(r"\band\b", "as well as", modified)
    elif modification_type == "truncate":
        # Keep only the first 80% of the characters
        return text[:int(len(text) * 0.8)]
    return text

print("\n" + "#"*70)
print("TESTING WATERMARK ROBUSTNESS")
print("#"*70)

# Test minor word substitutions
modified_text = modify_text(watermarked_text, "minor")
detect_watermark(modified_text, "Minor Modifications")

# Test simulated paraphrasing
paraphrased_text = modify_text(watermarked_text, "paraphrase")
detect_watermark(paraphrased_text, "Simulated Paraphrase")

# Test after truncation
truncated_text = modify_text(watermarked_text, "truncate")
detect_watermark(truncated_text, "Truncated Text (80%)")
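The robustness results follow from how g-values are keyed: each one depends on a window of `ngram_len` consecutive tokens, so substituting a single token changes the g-values of every window containing it (up to `ngram_len` scored positions) while the rest of the text keeps its signal. A back-of-envelope sketch, assuming one-for-one token substitutions (real edits can re-tokenize neighbouring text and break a few more windows); `intact_fraction` is a hypothetical helper.

```python
def intact_fraction(num_tokens: int, edited_positions: set, ngram_len: int) -> float:
    """Fraction of ngram_len-token windows that contain no edited token.

    Windows that avoid every edit keep their original g-values; windows that
    touch an edit behave like unwatermarked text.
    """
    num_windows = num_tokens - ngram_len + 1
    intact = sum(
        1
        for start in range(num_windows)
        if not any(pos in edited_positions for pos in range(start, start + ngram_len))
    )
    return intact / num_windows


# One substitution in a 100-token text, for several ngram_len values
for n in (2, 5, 10):
    print(n, round(intact_fraction(100, {50}, n), 3))

# Five scattered substitutions with ngram_len=5
print(round(intact_fraction(100, {10, 30, 50, 70, 90}, 5), 3))
```

Larger `ngram_len` values lose more windows per edit, which is the detectability/robustness trade-off noted in the config. Truncation, by contrast, leaves every remaining window intact; it only lowers the number of scored positions, weakening the overall evidence rather than the per-token signal.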

---

## Part 2: SynthID Image Detection (Limited Availability)

**IMPORTANT:** SynthID for images is NOT open-source. It's only available through:
1. **Google Cloud Vertex AI** (for images generated with Imagen)
2. **SynthID Detector Portal** (waitlist only - for journalists, researchers)

### 2.1 Google Cloud Vertex AI Approach (Requires GCP Account)

In [None]:
# This code requires Google Cloud credentials and Vertex AI API access.
# You need to:
# 1. Set up a GCP project with billing enabled
# 2. Enable the Vertex AI API
# 3. Authenticate, e.g. `gcloud auth application-default login`
# 4. Install the SDK: pip install google-cloud-aiplatform

"""
# EXAMPLE CODE (won't work without GCP setup; model IDs change over time, so check the current Vertex AI docs)
import vertexai
from vertexai.preview.vision_models import Image, ImageGenerationModel, WatermarkVerificationModel

# Initialize Vertex AI
vertexai.init(project="your-project-id", location="us-central1")

# Generate an image (Imagen applies a SynthID watermark by default)
model = ImageGenerationModel.from_pretrained("imagegeneration@006")
response = model.generate_images(
    prompt="A beautiful sunset over mountains",
    number_of_images=1,
)
response.images[0].save(location="watermarked_image.png")

# Verify the watermark
verification_model = WatermarkVerificationModel.from_pretrained("imageverification@001")
result = verification_model.verify_image(
    Image.load_from_file(location="watermarked_image.png")
)

# "ACCEPT" indicates a SynthID watermark was detected, "REJECT" that none was found
print(f"Watermark verification result: {result.watermark_verification_result}")
"""

print("⚠️  SynthID Image Detection is NOT available as open-source.")
print("\nOptions:")
print("1. Use Google Cloud Vertex AI (requires GCP account & billing)")
print("2. Join SynthID Detector Portal waitlist:")
print("   https://deepmind.google/technologies/synthid/")
print("3. Use alternative AI image detection methods (see Part 3 below)")

---

## Part 3: Alternative AI Image Detection Methods

Since SynthID image detection isn't open source, here are some alternatives. Unlike a watermark check, they give probabilistic signals rather than proof:

### 3.1 Check Image Metadata (EXIF/C2PA)

In [None]:
!pip install pillow -q

import re
from PIL import Image
from PIL.ExifTags import TAGS

# Whole-word patterns, so "ai" does not match inside words like "detail" or "paint"
AI_KEYWORDS = re.compile(
    r"\b(ai|artificial|generated|stable diffusion|midjourney|dall-e|imagen|synthid)\b",
    re.IGNORECASE,
)

def check_image_metadata(image_path):
    """Check EXIF tags and PNG text chunks for AI-generation keywords"""
    
    print(f"\n{'='*60}")
    print(f"ANALYZING: {image_path}")
    print(f"{'='*60}")
    
    try:
        img = Image.open(image_path)
        # EXIF tags (mostly JPEG/TIFF) plus PNG text chunks, which some generators write
        fields = {TAGS.get(tag_id, tag_id): value for tag_id, value in img.getexif().items()}
        fields.update({key: value for key, value in img.info.items() if isinstance(value, str)})
        
        if fields:
            print("\n📋 Metadata Found:")
            ai_indicators = []
            
            for name, value in fields.items():
                if AI_KEYWORDS.search(str(value)):
                    ai_indicators.append(f"{name}: {value}")
                    print(f"  ⚠️  {name}: {value}")
            
            if ai_indicators:
                print("\n✅ AI Generation Indicators Found!")
            else:
                print("\n❓ No obvious AI indicators in metadata")
        else:
            print("\n❌ No metadata found (stripped or never written)")
            
    except Exception as e:
        print(f"\n❌ Error reading image: {e}")
    
    print(f"{'='*60}")

# Example usage
# check_image_metadata('your_image.png')

print("\n💡 Note: Metadata is easily stripped (screenshots, re-encoding, social media uploads) and easily forged,")
print("   so this method is unreliable. This is why pixel-based watermarking (like SynthID) is more robust.")
print("   C2PA manifests are not parsed here; verify those with a dedicated tool such as c2patool.")
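PNG metadata lives in simple length-prefixed chunks, so it can also be inspected without Pillow. A stdlib-only sketch, assuming a well-formed PNG; it reads the `tEXt`, `zTXt` and `iTXt` text chunks (where, for example, the AUTOMATIC1111 Stable Diffusion web UI stores its generation settings under the key `parameters`) and does a crude byte search for the `c2pa` label used by C2PA manifest stores. The helper names are hypothetical, CRCs are not checked, and the byte search only hints that a manifest may be present; it does not validate signatures.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} from the tEXt, zTXt and iTXt chunks of a PNG file's bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, text = body.partition(b"\x00")
            texts[key.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"zTXt":
            key, _, rest = body.partition(b"\x00")
            # rest[0] is the compression method (0 = zlib), followed by compressed text
            texts[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"iTXt":
            key, _, rest = body.partition(b"\x00")
            compressed = rest[0] == 1  # rest[1] is the compression method
            _lang, _, rest = rest[2:].partition(b"\x00")
            _translated, _, text = rest.partition(b"\x00")
            texts[key.decode("latin-1")] = (zlib.decompress(text) if compressed else text).decode("utf-8")
        elif ctype == b"IEND":
            break
        pos += 12 + length
    return texts


def has_c2pa_marker(data: bytes) -> bool:
    """Crude hint only: looks for the 'c2pa' label; does NOT parse or validate a manifest."""
    return b"c2pa" in data


def make_png_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one PNG chunk (handy for constructing small test files)."""
    crc = zlib.crc32(ctype + body)
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


# Example usage with a real file:
# with open('your_image.png', 'rb') as f:
#     raw = f.read()
# print(read_png_text_chunks(raw), has_c2pa_marker(raw))
```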

### 3.2 Hugging Face AI Image Detector

In [None]:
!pip install transformers pillow torch torchvision -q

from transformers import pipeline
from PIL import Image

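# Load AI image detection model (a community model on the Hugging Face Hub;
# its accuracy on images from newer generators is not guaranteed)
print("Loading AI image detector...")
# Descriptive name, so it cannot clash with other detector objects in this notebook
ai_image_detector = pipeline("image-classification", model="umm-maybe/AI-image-detector")

def detect_ai_image(image_path):
    """Detect if an image is AI-generated using Hugging Face model"""
    
    img = Image.open(image_path).convert("RGB")  # normalize RGBA/palette images to RGB
    results = ai_image_detector(img)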
    
    print(f"\n{'='*60}")
    print(f"AI IMAGE DETECTION RESULTS: {image_path}")
    print(f"{'='*60}")
    
    for result in results:
        label = result['label']
        score = result['score'] * 100
        
        if label == 'artificial':
            icon = "ü§ñ"
        else:
            icon = "üë§"
        
        print(f"{icon} {label.upper()}: {score:.2f}%")
    
    # Determine verdict
    top_prediction = results[0]
    if top_prediction['label'] == 'artificial' and top_prediction['score'] > 0.7:
        print("\n‚úÖ LIKELY AI-GENERATED")
    elif top_prediction['label'] == 'human' and top_prediction['score'] > 0.7:
        print("\n‚úÖ LIKELY HUMAN-CREATED")
    else:
        print("\n‚ö†Ô∏è  UNCERTAIN - confidence too low")
    
    print(f"{'='*60}")
    return results

# Example usage:
# detect_ai_image('suspicious_image.png')

print("\nModel loaded! Use detect_ai_image('path_to_image.jpg') to analyze images.")

### 3.3 Zero-Shot CLIP Heuristic (OpenAI's CLIP Model)

In [None]:
!pip install torch torchvision transformers pillow -q

from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import torch

print("Loading CLIP model...")
# Separate names so the text-generation `model` from Part 1 is not overwritten
clip_model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def detect_ai_with_clip(image_path):
    """Zero-shot heuristic: score the image against prompts describing AI art vs. photos.
    CLIP was not trained to detect AI images, so treat the result as a weak signal."""
    
    image = Image.open(image_path).convert("RGB")
    
    # Prompts that describe AI vs real images
    text_prompts = [
        "a computer generated image",        # 0: AI
        "an AI generated digital art",       # 1: AI
        "a photograph taken with a camera",  # 2: real
        "a real photograph of the world",    # 3: real
        "artificial intelligence artwork",   # 4: AI
        "authentic photograph"               # 5: real
    ]
    
    inputs = clip_processor(
        text=text_prompts,
        images=image,
        return_tensors="pt",
        padding=True
    )
    
    with torch.no_grad():  # inference only, no gradients needed
        outputs = clip_model(**inputs)
    # Softmax over the six prompts: one probability per prompt, summing to 1
    probs = outputs.logits_per_image.softmax(dim=1)
    
    print(f"\n{'='*60}")
    print(f"CLIP-BASED AI DETECTION: {image_path}")
    print(f"{'='*60}")
    
    for i, prompt in enumerate(text_prompts):
        prob = probs[0][i].item() * 100
        print(f"{prompt}: {prob:.2f}%")
    
    # Sum the probabilities within each group, so the two aggregate scores add up to 100%
    ai_score = probs[0, [0, 1, 4]].sum().item() * 100
    real_score = probs[0, [2, 3, 5]].sum().item() * 100
    
    print(f"\nAggregate Scores:")
    print(f"🤖 AI-Generated: {ai_score:.2f}%")
    print(f"📷 Real Photo: {real_score:.2f}%")
    
    if ai_score > real_score:
        print("\n✅ More likely AI-GENERATED")
    else:
        print("\n✅ More likely REAL PHOTOGRAPH")
    
    print(f"{'='*60}")
    
    return probs

# Example usage:
# detect_ai_with_clip('test_image.jpg')

print("\nCLIP model loaded! Use detect_ai_with_clip('image.jpg') to analyze.")
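The CLIP aggregation step is a grouped softmax: the image's similarity logits for all six prompts become one probability distribution, and each group's probabilities are summed. Summing (rather than averaging) keeps the two group scores on a probability scale that adds up to 1. A plain-Python sketch with made-up logits; the grouping mirrors the prompt list above, and the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def group_scores(logits, groups):
    """Sum softmax probabilities within each named group of prompt indices."""
    probs = softmax(logits)
    return {name: sum(probs[i] for i in indices) for name, indices in groups.items()}


# Made-up image-text logits for the six prompts (illustrative values, not real CLIP output)
logits = [22.0, 21.5, 24.0, 23.0, 20.0, 23.5]
groups = {"ai": [0, 1, 4], "real": [2, 3, 5]}
scores = group_scores(logits, groups)
print({name: round(p, 3) for name, p in scores.items()})
```

Because the scores come from prompt wording rather than a trained detector, rephrasing or adding prompts can shift them noticeably.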

### 3.4 Statistical Analysis Method

In [None]:
!pip install numpy opencv-python -q

import numpy as np
import cv2

def analyze_image_statistics(image_path):
    """Compute simple image statistics that are sometimes associated with AI generation.
    The thresholds below are illustrative, not calibrated, and shift with resolution and content."""
    
    # Load image (cv2.imread returns None instead of raising on failure)
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    print(f"\n{'='*60}")
    print(f"STATISTICAL ANALYSIS: {image_path}")
    print(f"{'='*60}")
    
    # 1. High-frequency energy: std of the Laplacian (captures sensor noise AND fine detail)
    noise_estimate = np.std(cv2.Laplacian(gray, cv2.CV_64F))
    print(f"\n📊 Noise Level: {noise_estimate:.2f}")
    if noise_estimate < 10:
        print("   ⚠️  Very low noise (AI images can lack camera sensor noise)")
    
    # 2. Color histogram analysis
    hist_r = cv2.calcHist([img_rgb], [0], None, [256], [0, 256])
    hist_g = cv2.calcHist([img_rgb], [1], None, [256], [0, 256])
    hist_b = cv2.calcHist([img_rgb], [2], None, [256], [0, 256])
    
    # Variance of the bin counts: high when pixels pile up in a few bins.
    # Counts grow with pixel count, so this threshold depends on image size.
    hist_variance = np.var(hist_r) + np.var(hist_g) + np.var(hist_b)
    print(f"\n🎨 Color Histogram Variance: {hist_variance:.2f}")
    if hist_variance > 1000000:
        print("   ⚠️  Highly concentrated color distribution")
    
    # 3. Edge density: fraction of pixels marked as edges (Canny outputs 0 or 255)
    edges = cv2.Canny(gray, 100, 200)
    edge_density = np.count_nonzero(edges) / edges.size
    print(f"\n🔍 Edge Density: {edge_density:.4f}")
    if edge_density < 0.05:
        print("   ⚠️  Low edge density (AI images can have overly smooth regions)")
    
    # 4. Mean DCT coefficient magnitude (informational only; not used in the verdict below).
    # cv2.dct only accepts even-sized arrays, so crop to even dimensions first.
    h, w = gray.shape
    dct = cv2.dct(np.float32(gray[: h - h % 2, : w - w % 2]))
    dct_mean = np.mean(np.abs(dct))
    print(f"\n📸 DCT Mean: {dct_mean:.2f}")
    
    # Overall assessment
    ai_indicators = 0
    if noise_estimate < 10:
        ai_indicators += 1
    if hist_variance > 1000000:
        ai_indicators += 1
    if edge_density < 0.05:
        ai_indicators += 1
    
    print(f"\n{'='*60}")
    print(f"AI Indicators Found: {ai_indicators}/3")
    
    if ai_indicators >= 2:
        print("⚠️  Statistical properties suggest possible AI generation")
    else:
        print("✅ Few statistical indicators of AI generation (not proof of a real photo)")
    
    print(f"{'='*60}")
    print("\n💡 Note: Statistical analysis is indicative, not definitive.")
    print("   Combine with other methods for better accuracy.")

# Example usage:
# analyze_image_statistics('test_image.jpg')

print("\nStatistical analysis ready! Use analyze_image_statistics('image.jpg')")
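The "noise level" is the standard deviation of the Laplacian response. With its default `ksize=1`, `cv2.Laplacian` uses the 3×3 kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]], so flat regions and smooth linear gradients score 0, while any pixel-to-pixel variation, whether sensor noise or genuine fine texture, raises the score. That is why a low value is only a weak hint. A pure-Python sketch for intuition; `laplacian_std` is a hypothetical helper that skips border pixels instead of reflecting them, so its values can differ slightly from OpenCV's.

```python
import statistics

# 3x3 kernel that cv2.Laplacian applies with its default ksize=1
LAPLACIAN_KERNEL = ((0, 1, 0), (1, -4, 1), (0, 1, 0))


def laplacian_std(gray):
    """Std of the Laplacian response over interior pixels of a 2-D list of intensities."""
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = [
        sum(
            LAPLACIAN_KERNEL[dr + 1][dc + 1] * gray[r + dr][c + dc]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
        )
        for r in range(1, h - 1)
        for c in range(1, w - 1)
    ]
    return statistics.pstdev(responses)


flat = [[128] * 5 for _ in range(5)]
ramp = [[10 * c for c in range(5)] for _ in range(5)]  # smooth horizontal gradient
checker = [[255 if (r + c) % 2 == 0 else 0 for c in range(4)] for r in range(4)]
print(laplacian_std(flat), laplacian_std(ramp), laplacian_std(checker))
```

A smooth gradient scores the same as a flat patch, while a high-contrast checkerboard scores very high, even though neither says anything about how the image was made.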

---

## Summary & Recommendations

### For TEXT Detection:
✅ **Use SynthID Text** (open source)
- Reliable only for text generated with SynthID watermarking enabled and the same keys; it cannot flag arbitrary AI-generated text
- Available through Hugging Face Transformers

### For IMAGE Detection:

#### If you have Google Cloud access:
1. **SynthID on Vertex AI** (most reliable, but limited to Imagen-generated images)

#### If you don't have Google Cloud:
1. **Hugging Face AI Detector** (free ML classifier; accuracy on newer generators not guaranteed)
2. **CLIP-based Detection** (zero-shot heuristic; weak signal)
3. **Metadata Check** (Quick but easily defeated)
4. **Statistical Analysis** (Supplementary indicator)

### Best Practice:
**Combine multiple methods** for higher confidence:
```python
# Comprehensive check
check_image_metadata('image.jpg')        # Quick check
detect_ai_image('image.jpg')             # ML-based
detect_ai_with_clip('image.jpg')         # CLIP-based
analyze_image_statistics('image.jpg')    # Statistical
```
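The functions above print their findings rather than return a common score, so combining them needs one more step. A naive sketch, assuming you first map each method's output to a number in [0, 1] (for example the "artificial" probability from the Hugging Face detector, the AI-group share from CLIP, or indicators/3 from the statistics check). That mapping, the method names and the equal default weights are all assumptions, and the scores are not calibrated against each other, so treat the result as a rough summary rather than a probability.

```python
from typing import Optional


def combine_scores(scores: dict, weights: Optional[dict] = None):
    """Weighted mean of per-method AI-likelihood scores in [0, 1].

    scores:  method name -> score, or None if that method could not run
    weights: optional method name -> weight (hypothetical; defaults to 1.0 each)
    Returns (combined score or None, sorted list of methods used).
    """
    weights = weights or {}
    used = {name: s for name, s in scores.items() if s is not None}
    if not used:
        return None, []
    total_weight = sum(weights.get(name, 1.0) for name in used)
    combined = sum(weights.get(name, 1.0) * s for name, s in used.items()) / total_weight
    return combined, sorted(used)


# Hypothetical scores for one image; the statistics check was skipped
print(combine_scores({"hf_detector": 0.9, "clip": 0.6, "statistics": None}))
```

Reporting which methods actually ran alongside the number makes it clear when a "combined" verdict rests on a single signal.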

### Limitations:
- No detector is 100% accurate
- Sophisticated manipulation can fool all methods
- SynthID Image is the most robust but not publicly available
- Alternative methods give probabilistic estimates, not certainty