# SynthID Detection: Text & Image AI Detection Guide

This notebook covers:
1. **SynthID Text Detection** (Open Source)
2. **SynthID Image Detection** (Limited - Google Cloud Only)
3. **Alternative AI Image Detection Methods**

---

## Part 1: SynthID Text Detection (Open Source)

SynthID Text is fully open-sourced and available through Hugging Face Transformers.

In [1]:
# Install required packages
!pip install "transformers>=4.46.0" torch -q

### 1.1 Generate Watermarked Text with SynthID

In [2]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
    SynthIDTextWatermarkDetector,
    SynthIDTextWatermarkLogitsProcessor
)
import torch

# Load a small model for demonstration (you can use larger models)
model_name = "facebook/opt-1.3b"  # or "gpt2", "meta-llama/Llama-2-7b-hf", etc.

print("Loading model and tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Configure SynthID watermarking
watermark_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340],  # Random keys - KEEP THESE SECRET in production!
    ngram_len=5,  # Balance between detectability and robustness
)

print("Model loaded successfully!")

Loading model and tokenizer...


Model loaded successfully!
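
The `keys` list acts as the watermark's secret, and the same values are needed again at detection time. The five hard-coded integers above are fine for a demo. As a minimal sketch, keys could instead be drawn from the operating system's random source; the helper name `make_watermark_keys`, the default of 30 keys, and the 31-bit range are illustrative assumptions, not values taken from this notebook.

```python
import secrets

def make_watermark_keys(depth=30, upper=2**31):
    """Draw `depth` random integer keys from the OS random source.

    Both defaults are assumptions chosen for illustration.
    """
    return [secrets.randbelow(upper) for _ in range(depth)]

keys = make_watermark_keys()
print(f"{len(keys)} keys generated")
```

The resulting list would be passed as `keys=` to `SynthIDTextWatermarkingConfig` and stored somewhere private, since detection has to be configured with exactly the same list.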


In [3]:
# Generate watermarked text
prompt = "Artificial intelligence is revolutionizing"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate WITH watermark
watermarked_output = model.generate(
    **inputs,
    watermarking_config=watermark_config,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

watermarked_text = tokenizer.decode(watermarked_output[0], skip_special_tokens=True)

print("\n" + "="*50)
print("WATERMARKED TEXT:")
print("="*50)
print(watermarked_text)
print("="*50)


WATERMARKED TEXT:
Artificial intelligence is revolutionizing the way we think about our homes.

The technology is already changing the way we interact with our homes. And it’s coming soon to your neighborhood.

Google, Apple, and Amazon are developing voice-controlled devices that can help you with tasks around your home, such as ordering groceries or cooking.

The devices are part of a broader push to control our homes through voice commands.

“It’s going to be a huge part of the home


In [4]:
# Generate WITHOUT watermark (for comparison)
non_watermarked_output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

non_watermarked_text = tokenizer.decode(non_watermarked_output[0], skip_special_tokens=True)

print("\n" + "="*50)
print("NON-WATERMARKED TEXT:")
print("="*50)
print(non_watermarked_text)
print("="*50)


NON-WATERMARKED TEXT:
Artificial intelligence is revolutionizing the way we think about health.

We’ve already seen the power of machines to improve the quality of care for patients with cancer, and even to save lives.

But the future is also looking bright for AI in the health care field. Here are some of the most promising uses for artificial intelligence in health care.

Health care is one of the most expensive industries in the world, with billions of dollars spent every year on medical services.

There are more than


### 1.2 Detect SynthID Watermark in Text

In [15]:
# Initialize the detector
# First, create the logits processor with the same watermarking settings used for generation
logits_processor_instance = SynthIDTextWatermarkLogitsProcessor(
    ngram_len=watermark_config.ngram_len,
    keys=watermark_config.keys,
    sampling_table_size=watermark_config.sampling_table_size,
    sampling_table_seed=watermark_config.sampling_table_seed,
    context_history_size=watermark_config.context_history_size,
    device=model.device
)

# Note: SynthIDTextWatermarkDetector is designed to wrap a trained BayesianDetectorModel.
# It calls detector_module(g_values, mask), so the causal LM wrapped here receives
# g-values in place of token ids and returns an ordinary CausalLMOutputWithPast.
class WrappedModelForDetection(torch.nn.Module):
    def __init__(self, base_model):
        super().__init__()
        self.base_model = base_model

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        # Disable the KV cache so stale past_key_values cannot affect position_ids
        kwargs['use_cache'] = False
        kwargs['past_key_values'] = None # Explicitly clear past_key_values

        # Explicitly create position_ids for the current input_ids to ensure correct length
        if input_ids is not None:
            seq_len = input_ids.shape[-1]
            kwargs['position_ids'] = torch.arange(seq_len, device=input_ids.device).unsqueeze(0)
        else:
            kwargs['position_ids'] = None # Let the model handle if no input_ids

        # Request hidden states unless the caller already set the flag
        if 'output_hidden_states' not in kwargs:
            kwargs['output_hidden_states'] = True
        return self.base_model(input_ids=input_ids, attention_mask=attention_mask, **kwargs)

detector = SynthIDTextWatermarkDetector(
    tokenizer=tokenizer,
    detector_module=WrappedModelForDetection(model), # Wrap the model here
    logits_processor=logits_processor_instance
)

# Detect watermark in the generated text
def detect_watermark(text, label=""):
    # Tokenize the input text before passing to the detector
    tokenized_input = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)
    result = detector(tokenized_input.input_ids)

    print(f"\n{'='*50}")
    print(f"DETECTION RESULT: {label}")
    print(f"{'='*50}")
    print(f"Text: {text[:100]}...")

    # The wrapped LM's output has no 'prediction' or 'score' entries,
    # so these lookups fall back to the defaults
    prediction = result.get('prediction', 'unknown')
    score = result.get('score', 0.0)

    print(f"\nWatermark Detected: {prediction}")
    print(f"Confidence Score: {score:.4f}")

    # Interpret the result
    if prediction == 'watermarked':
        print("✅ This text likely contains a SynthID watermark")
    elif prediction == 'not_watermarked':
        print("❌ This text does NOT contain a SynthID watermark")
    else:
        print("⚠️  Uncertain - not enough confidence to determine")
    print(f"{'='*50}")

    return result

# Test on watermarked text
detect_watermark(watermarked_text, "Watermarked Text")

# Test on non-watermarked text
detect_watermark(non_watermarked_text, "Non-Watermarked Text")

# Test on human-written text
human_text = "The quick brown fox jumps over the lazy dog. This is a classic pangram used in typing tests."
detect_watermark(human_text, "Human-Written Text")


DETECTION RESULT: Watermarked Text
Text: Artificial intelligence is revolutionizing the way we think about our homes.

The technology is alre...

Watermark Detected: unknown
Confidence Score: 0.0000
⚠️  Uncertain - not enough confidence to determine

DETECTION RESULT: Non-Watermarked Text
Text: Artificial intelligence is revolutionizing the way we think about health.

We’ve already seen the po...

Watermark Detected: unknown
Confidence Score: 0.0000
⚠️  Uncertain - not enough confidence to determine

DETECTION RESULT: Human-Written Text
Text: The quick brown fox jumps over the lazy dog. This is a classic pangram used in typing tests....

Watermark Detected: unknown
Confidence Score: 0.0000
⚠️  Uncertain - not enough confidence to determine


[Return value echoed by the notebook: CausalLMOutputWithPast(loss=None, logits=tensor(...)); remainder truncated in the source]
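
All three results above come back as `unknown` with a score of 0.0000. One scoring rule that needs no trained detector is the mean of the g-values: for every scored token position the logits processor yields one pseudo-random bit per key, so the mean sits near 0.5 on unwatermarked text and tends to be higher on text watermarked with the same keys. The sketch below shows only that arithmetic on made-up toy data; `mean_g_score` is a hypothetical helper, and a usable decision threshold would have to be calibrated on real samples.

```python
def mean_g_score(g_values, mask):
    """Mean of the unmasked g-values over all positions and depths.

    g_values: list of positions, each a list of 0/1 values (one per key).
    mask: list of 0/1 flags, one per position (1 = position is scored).
    """
    depth = len(g_values[0])
    kept = sum(mask)
    if kept == 0:
        raise ValueError("no unmasked positions to score")
    total = sum(sum(row) for row, keep in zip(g_values, mask) if keep)
    return total / (depth * kept)

# Toy data, made up for illustration: 4 positions, 3 keys, third position masked out
g_values = [[1, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]]
mask = [1, 1, 0, 1]

score = mean_g_score(g_values, mask)
print(f"mean g-value score: {score:.3f}")
```

With the notebook's objects, the inputs would presumably come from `logits_processor_instance.compute_g_values(input_ids)` and `logits_processor_instance.compute_context_repetition_mask(input_ids)`, which return tensors rather than nested lists.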

### 1.3 Test Robustness to Modifications

In [16]:
# Test watermark detection after minor modifications
import re

def modify_text(text, modification_type="minor"):
    if modification_type == "minor":
        # Swap a few whole words; \b keeps substrings such as "this" or "other" intact
        modified = re.sub(r"\bthe\b", "a", text)
        modified = re.sub(r"\bis\b", "was", modified)
        return modified[:len(text)]
    elif modification_type == "paraphrase":
        # Mild paraphrasing (simulated)
        return re.sub(r"\band\b", "as well as", text.replace(".", ", which means that."))
    elif modification_type == "truncate":
        # Keep the first 80% of the characters
        return text[:int(len(text) * 0.8)]
    return text

print("\n" + "#"*70)
print("TESTING WATERMARK ROBUSTNESS")
print("#"*70)

# Test minor modifications
modified_text = modify_text(watermarked_text, "minor")
detect_watermark(modified_text, "Minor Modifications")

# Test after truncation
truncated_text = modify_text(watermarked_text, "truncate")
detect_watermark(truncated_text, "Truncated Text (80%)")


######################################################################
TESTING WATERMARK ROBUSTNESS
######################################################################

DETECTION RESULT: Minor Modifications
Text: Artificial intelligence was revolutionizing a way we think about our homes.

The technology was alre...

Watermark Detected: unknown
Confidence Score: 0.0000
⚠️  Uncertain - not enough confidence to determine

DETECTION RESULT: Truncated Text (80%)
Text: Artificial intelligence is revolutionizing the way we think about our homes.

The technology is alre...

Watermark Detected: unknown
Confidence Score: 0.0000
⚠️  Uncertain - not enough confidence to determine


[Return value echoed by the notebook: CausalLMOutputWithPast(loss=None, logits=tensor(...)); remainder truncated in the source]

---

## Part 2: SynthID Image Detection (Limited Availability)

**IMPORTANT:** SynthID for images is NOT open-source. It's only available through:
1. **Google Cloud Vertex AI** (for images generated with Imagen)
2. **SynthID Detector Portal** (waitlist only - for journalists, researchers)

### 2.1 Google Cloud Vertex AI Approach (Requires GCP Account)

In [None]:
# This code requires Google Cloud credentials and Vertex AI API access
# You need to:
# 1. Set up a GCP project
# 2. Enable Vertex AI API
# 3. Have proper authentication

"""
# EXAMPLE CODE (Won't work without GCP setup)
import vertexai
from vertexai.preview.vision_models import (
    Image,
    ImageGenerationModel,
    WatermarkVerificationModel,
)

# Initialize Vertex AI
vertexai.init(project="your-project-id", location="us-central1")

# Generate image with watermark (automatically applied)
model = ImageGenerationModel.from_pretrained("imagegeneration@006")
response = model.generate_images(
    prompt="A beautiful sunset over mountains",
    number_of_images=1,
)

# Images are automatically watermarked with SynthID
images = response.images
images[0].save("watermarked_image.png")

# Verify watermark
verification_model = WatermarkVerificationModel.from_pretrained("imageverification@001")
result = verification_model.verify_image(
    image=Image.load_from_file("watermarked_image.png")
)

# The response carries a verification result such as ACCEPT or REJECT
print(f"Watermark verification result: {result.watermark_verification_result}")
"""

print("⚠️  SynthID Image Detection is NOT available as open-source.")
print("\nOptions:")
print("1. Use Google Cloud Vertex AI (requires GCP account & billing)")
print("2. Join SynthID Detector Portal waitlist:")
print("   https://deepmind.google/technologies/synthid/")
print("3. Use alternative AI image detection methods (see Part 3 below)")

---

## Part 3: Alternative AI Image Detection Methods

Since SynthID for images isn't publicly available, here are working alternatives:

### 3.1 Check Image Metadata (EXIF/C2PA)

In [17]:
!pip install pillow exifread -q

from PIL import Image
from PIL.ExifTags import TAGS
import exifread

def check_image_metadata(image_path):
    """Check image metadata for AI generation indicators"""

    print(f"\n{'='*60}")
    print(f"ANALYZING: {image_path}")
    print(f"{'='*60}")

    # Using PIL
    try:
        img = Image.open(image_path)
        exifdata = img.getexif()

        if exifdata:
            print("\n📋 EXIF Data Found:")
            ai_indicators = []

            for tag_id, value in exifdata.items():
                tag_name = TAGS.get(tag_id, tag_id)

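                # Look for AI-related keywords
                value_str = str(value).lower()
                # Match "ai" only as a standalone word; as a substring it also hits "main", "rain", etc.
                if 'ai' in value_str.split() or any(keyword in value_str for keyword in
                       ['artificial', 'generated', 'stable diffusion',
                        'midjourney', 'dall-e', 'imagen', 'synthid']):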
                    ai_indicators.append(f"{tag_name}: {value}")
                    print(f"  ⚠️  {tag_name}: {value}")

            if ai_indicators:
                print("\n✅ AI Generation Indicators Found!")
            else:
                print("\n❓ No obvious AI indicators in metadata")
        else:
            print("\n❌ No EXIF data found (metadata stripped or never existed)")

    except Exception as e:
        print(f"\n❌ Error reading image: {e}")

    print(f"{'='*60}")

# Example usage
# check_image_metadata('your_image.png')

print("\n💡 Note: Many AI image generators strip metadata, making this method unreliable.")
print("   This is why pixel-based watermarking (like SynthID) is more robust.")

💡 Note: Many AI image generators strip metadata, making this method unreliable.
   This is why pixel-based watermarking (like SynthID) is more robust.


In [18]:
# Example usage
check_image_metadata('/content/Gemini_Generated_Image_rfgo3irfgo3irfgo.png')


ANALYZING: /content/Gemini_Generated_Image_rfgo3irfgo3irfgo.png

❌ No EXIF data found (metadata stripped or never existed)


### 3.2 Hugging Face AI Image Detector

In [19]:
!pip install transformers pillow torch torchvision -q

from transformers import pipeline
from PIL import Image

# Load AI image detection model
print("Loading AI image detector...")
# Named image_detector so it does not overwrite the SynthID text `detector` from Part 1
image_detector = pipeline("image-classification", model="umm-maybe/AI-image-detector")

def detect_ai_image(image_path):
    """Detect if an image is AI-generated using Hugging Face model"""

    img = Image.open(image_path)
    results = image_detector(img)

    print(f"\n{'='*60}")
    print(f"AI IMAGE DETECTION RESULTS: {image_path}")
    print(f"{'='*60}")

    for result in results:
        label = result['label']
        score = result['score'] * 100

        if label == 'artificial':
            icon = "🤖"
        else:
            icon = "👤"

        print(f"{icon} {label.upper()}: {score:.2f}%")

    # Determine verdict
    top_prediction = results[0]
    if top_prediction['label'] == 'artificial' and top_prediction['score'] > 0.7:
        print("\n✅ LIKELY AI-GENERATED")
    elif top_prediction['label'] == 'human' and top_prediction['score'] > 0.7:
        print("\n✅ LIKELY HUMAN-CREATED")
    else:
        print("\n⚠️  UNCERTAIN - confidence too low")

    print(f"{'='*60}")
    return results

# Example usage:
# detect_ai_image('suspicious_image.png')

print("\nModel loaded! Use detect_ai_image('path_to_image.jpg') to analyze images.")

Loading AI image detector...


Model loaded! Use detect_ai_image('path_to_image.jpg') to analyze images.


In [20]:
# Example usage:
detect_ai_image('/content/Gemini_Generated_Image_rfgo3irfgo3irfgo.png')


AI IMAGE DETECTION RESULTS: /content/Gemini_Generated_Image_rfgo3irfgo3irfgo.png
🤖 ARTIFICIAL: 57.22%
👤 HUMAN: 42.78%

⚠️  UNCERTAIN - confidence too low


[{'label': 'artificial', 'score': 0.5721818208694458},
 {'label': 'human', 'score': 0.4278181195259094}]

In [21]:
# Example usage:
detect_ai_image('/content/newsflash_visual_1764039537720.png')


AI IMAGE DETECTION RESULTS: /content/newsflash_visual_1764039537720.png
🤖 ARTIFICIAL: 66.51%
👤 HUMAN: 33.49%

⚠️  UNCERTAIN - confidence too low


[{'label': 'artificial', 'score': 0.6650604009628296},
 {'label': 'human', 'score': 0.334939569234848}]
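
Both runs above end as "UNCERTAIN" because the top score stays below the 0.7 cut-off. The sketch below isolates that decision rule so the effect of the threshold can be seen on the two recorded score lists. `classify` is a hypothetical helper, and 0.6 is just an alternative picked for illustration: a lower threshold produces more verdicts at the cost of more false alarms.

```python
def classify(results, threshold=0.7):
    """Apply the notebook's decision rule to a list of {'label', 'score'} dicts."""
    top = max(results, key=lambda r: r["score"])
    if top["score"] > threshold and top["label"] == "artificial":
        return "likely AI-generated"
    if top["score"] > threshold and top["label"] == "human":
        return "likely human-created"
    return "uncertain"

# Scores copied from the two recorded runs
gemini_run = [
    {"label": "artificial", "score": 0.5721818208694458},
    {"label": "human", "score": 0.4278181195259094},
]
newsflash_run = [
    {"label": "artificial", "score": 0.6650604009628296},
    {"label": "human", "score": 0.334939569234848},
]

for name, run in [("gemini", gemini_run), ("newsflash", newsflash_run)]:
    for threshold in (0.7, 0.6):
        print(f"{name} @ {threshold}: {classify(run, threshold)}")
```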

### 3.3 OpenAI's CLIP-based Detection

In [22]:
!pip install torch torchvision transformers pillow -q

from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import torch

print("Loading CLIP model...")
# Named clip_model so it does not overwrite the text-generation `model` from Part 1
clip_model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def detect_ai_with_clip(image_path):
    """Use CLIP to detect AI-generated images based on visual features"""

    image = Image.open(image_path)

    # Prompts that describe AI vs real images
    text_prompts = [
        "a computer generated image",
        "an AI generated digital art",
        "a photograph taken with a camera",
        "a real photograph of the world",
        "artificial intelligence artwork",
        "authentic photograph"
    ]

    inputs = processor(
        text=text_prompts,
        images=image,
        return_tensors="pt",
        padding=True
    )

    outputs = clip_model(**inputs)
    logits_per_image = outputs.logits_per_image
    probs = logits_per_image.softmax(dim=1)

    print(f"\n{'='*60}")
    print(f"CLIP-BASED AI DETECTION: {image_path}")
    print(f"{'='*60}")

    for i, prompt in enumerate(text_prompts):
        prob = probs[0][i].item() * 100
        print(f"{prompt}: {prob:.2f}%")

    # Calculate AI vs Real scores
    ai_score = (probs[0][0] + probs[0][1] + probs[0][4]).item() * 100 / 3
    real_score = (probs[0][2] + probs[0][3] + probs[0][5]).item() * 100 / 3

    print(f"\nAggregate Scores:")
    print(f"🤖 AI-Generated: {ai_score:.2f}%")
    print(f"📷 Real Photo: {real_score:.2f}%")

    if ai_score > real_score:
        print("\n✅ More likely AI-GENERATED")
    else:
        print("\n✅ More likely REAL PHOTOGRAPH")

    print(f"{'='*60}")

    return probs

# Example usage:
# detect_ai_with_clip('test_image.jpg')

print("\nCLIP model loaded! Use detect_ai_with_clip('image.jpg') to analyze.")

Loading CLIP model...


CLIP model loaded! Use detect_ai_with_clip('image.jpg') to analyze.


In [24]:
# Example usage:
detect_ai_with_clip('/content/newsflash_visual_1764039537720.png')


CLIP-BASED AI DETECTION: /content/newsflash_visual_1764039537720.png
a computer generated image: 88.14%
an AI generated digital art: 1.73%
a photograph taken with a camera: 0.02%
a real photograph of the world: 0.03%
artificial intelligence artwork: 10.08%
authentic photograph: 0.00%

Aggregate Scores:
🤖 AI-Generated: 33.32%
📷 Real Photo: 0.02%

✅ More likely AI-GENERATED


tensor([[8.8141e-01, 1.7301e-02, 1.7684e-04, 2.6067e-04, 1.0084e-01, 1.3525e-05]],
       grad_fn=<SoftmaxBackward0>)
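
The aggregate scores printed above (33.32% and 0.02%) do not add up to 100% because `detect_ai_with_clip` divides each group sum by 3. The softmax already spreads a total probability of 1 across the six prompts, so summing each group without dividing gives a direct two-way split. A small worked example using the probabilities from the recorded tensor (the index tuples mirror the grouping in the function; the variable names are otherwise arbitrary):

```python
# Probabilities copied from the recorded tensor, in prompt order
probs = [8.8141e-01, 1.7301e-02, 1.7684e-04, 2.6067e-04, 1.0084e-01, 1.3525e-05]

AI_IDX = (0, 1, 4)    # "computer generated", "AI generated digital art", "AI artwork"
REAL_IDX = (2, 3, 5)  # the three photograph prompts

ai_share = sum(probs[i] for i in AI_IDX)
real_share = sum(probs[i] for i in REAL_IDX)

print(f"AI prompts:   {ai_share * 100:.2f}%")
print(f"Real prompts: {real_share * 100:.2f}%")
print(f"Total:        {(ai_share + real_share) * 100:.2f}%")
```

The verdict is the same either way, since both groups are scaled by the same factor. These shares only describe which prompts CLIP finds closest to the image, so they are better read as a relative ranking than as a calibrated probability of AI generation.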

### 3.4 Statistical Analysis Method

In [25]:
!pip install numpy opencv-python pillow scipy -q

import numpy as np
import cv2
from PIL import Image
from scipy import stats

def analyze_image_statistics(image_path):
    """Analyze statistical properties that differ between AI and real images"""

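    # Load image (cv2.imread returns None instead of raising when the file cannot be read)
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)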

    print(f"\n{'='*60}")
    print(f"STATISTICAL ANALYSIS: {image_path}")
    print(f"{'='*60}")

    # 1. Noise analysis
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    noise_estimate = np.std(cv2.Laplacian(gray, cv2.CV_64F))
    print(f"\n📊 Noise Level: {noise_estimate:.2f}")
    if noise_estimate < 10:
        print("   ⚠️  Very low noise (AI images often have minimal noise)")

    # 2. Color histogram analysis
    hist_r = cv2.calcHist([img_rgb], [0], None, [256], [0, 256])
    hist_g = cv2.calcHist([img_rgb], [1], None, [256], [0, 256])
    hist_b = cv2.calcHist([img_rgb], [2], None, [256], [0, 256])

    # Measure histogram smoothness (AI images often have smoother distributions)
    hist_variance = np.var(hist_r) + np.var(hist_g) + np.var(hist_b)
    print(f"\n🎨 Color Distribution Variance: {hist_variance:.2f}")
    if hist_variance > 1000000:
        print("   ⚠️  Very smooth color distribution (typical of AI)")

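    # 3. Edge detection
    edges = cv2.Canny(gray, 100, 200)
    # Canny marks edge pixels as 255, so count nonzero pixels to get a fraction between 0 and 1
    edge_density = np.count_nonzero(edges) / edges.size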
    print(f"\n🔍 Edge Density: {edge_density:.4f}")
    if edge_density < 0.05:
        print("   ⚠️  Low edge density (AI images can have overly smooth edges)")

    # 4. Compression artifacts
    # AI-generated images often lack typical JPEG compression artifacts
    dct = cv2.dct(np.float32(gray))
    dct_mean = np.mean(np.abs(dct))
    print(f"\n📸 DCT Mean: {dct_mean:.2f}")

    # Overall assessment
    ai_indicators = 0
    if noise_estimate < 10:
        ai_indicators += 1
    if hist_variance > 1000000:
        ai_indicators += 1
    if edge_density < 0.05:
        ai_indicators += 1

    print(f"\n{'='*60}")
    print(f"AI Indicators Found: {ai_indicators}/3")

    if ai_indicators >= 2:
        print("⚠️  Statistical properties suggest possible AI generation")
    else:
        print("✅ Statistical properties more consistent with real photo")

    print(f"{'='*60}")
    print("\n💡 Note: Statistical analysis is indicative, not definitive.")
    print("   Combine with other methods for better accuracy.")

# Example usage:
# analyze_image_statistics('test_image.jpg')

print("\nStatistical analysis ready! Use analyze_image_statistics('image.jpg')")


Statistical analysis ready! Use analyze_image_statistics('image.jpg')


In [26]:
# Example usage:
analyze_image_statistics('/content/newsflash_visual_1764039537720.png')


STATISTICAL ANALYSIS: /content/newsflash_visual_1764039537720.png

📊 Noise Level: 38.86

🎨 Color Distribution Variance: 4264698368.00
   ⚠️  Very smooth color distribution (typical of AI)

🔍 Edge Density: 14.5897

📸 DCT Mean: 15.92

AI Indicators Found: 1/3
✅ Statistical properties more consistent with real photo

💡 Note: Statistical analysis is indicative, not definitive.
   Combine with other methods for better accuracy.


---

## Summary & Recommendations

### For TEXT Detection:
✅ **Use SynthID Text** (fully open-source)
- Most reliable for text generated with watermarking enabled; detection needs the same keys and settings that were used at generation time
- Available through Hugging Face Transformers

### For IMAGE Detection:

#### If you have Google Cloud access:
1. **SynthID on Vertex AI** (most reliable, but limited to Imagen-generated images)

#### If you don't have Google Cloud:
1. **Hugging Face AI Detector** (Best free option)
2. **CLIP-based Detection** (Good for general assessment)
3. **Metadata Check** (Quick but easily defeated)
4. **Statistical Analysis** (Supplementary indicator)

### Best Practice:
**Combine multiple methods** for higher confidence:
```python
# Comprehensive check
check_image_metadata('image.jpg')        # Quick check
detect_ai_image('image.jpg')             # ML-based
detect_ai_with_clip('image.jpg')         # CLIP-based
analyze_image_statistics('image.jpg')    # Statistical
```
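
The four functions above print their findings separately, so combining them still means reading each verdict by hand. One possible way to merge them is a simple vote, sketched below. `combine_votes`, the three vote labels, and the two-vote agreement rule are all assumptions made for illustration. The example votes are transcribed manually from the recorded runs on the `newsflash_visual` image (the metadata check was not run on that file, so it is left out).

```python
def combine_votes(votes, min_agree=2):
    """Merge per-method verdicts ('ai', 'real' or 'uncertain') into one label."""
    ai = sum(v == "ai" for v in votes.values())
    real = sum(v == "real" for v in votes.values())
    if ai >= min_agree and ai > real:
        return "likely AI-generated"
    if real >= min_agree and real > ai:
        return "likely real"
    return "inconclusive"

# Verdicts transcribed by hand from the recorded outputs
votes = {
    "hf_detector": "uncertain",  # 66.51% artificial, below the 0.7 cut-off
    "clip": "ai",                # "More likely AI-GENERATED"
    "statistics": "real",        # 1/3 indicators
}

combined = combine_votes(votes)
print(combined)
```

Requiring at least two agreeing methods keeps a single noisy detector from deciding the outcome. Here the methods disagree, so the result stays inconclusive.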

### Limitations:
- No detector is 100% accurate
- Sophisticated manipulation can fool all methods
- SynthID Image is the most robust but not publicly available
- Alternative methods give probabilistic estimates, not certainty