# Cross-Model Detection Analysis

This notebook measures how accurately GPT-5 identifies code written by another LLM family (here, Claude Haiku 4.5) and compares that with its accuracy at recognizing its own code.

## Key Finding

**On this benchmark, GPT-5 identifies Claude's code more accurately than its own code (97.7% vs. 79.2%).**

### Results Summary

| Task | Accuracy | Details |
|------|----------|----------|
| GPT-5 self-recognition | 79.2% | GPT-5 identifying its own code (202/255 correct) |
| GPT-5 detecting Claude | **97.7%** | GPT-5 identifying Claude's code (251/257 correct) |

### Interpretation

This surprising result suggests:
1. Claude's code may have distinctive patterns that GPT-5 finds easy to identify.
2. Telling another model family's code apart may be easier than recognizing one's own output; that is, cross-family "signatures" may be more distinguishable.
3. This could strengthen the paper's motivation: stronger models like GPT-5 could act as "monitors" that detect code from other LLM families.
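
As a rough check on how much sampling noise could affect the two accuracies, the sketch below computes 95% Wilson score intervals from the counts in the Results Summary table. The helper name `wilson_interval` is illustrative and not part of the project code, and the interval treats each judgment as an independent trial, which is an assumption.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

# Counts copied from the Results Summary table.
for label, correct, total in [
    ("GPT-5 self-recognition", 202, 255),
    ("GPT-5 detecting Claude", 251, 257),
]:
    low, high = wilson_interval(correct, total)
    print(f"{label}: {correct / total:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With these counts the two intervals do not overlap, which is consistent with the gap being larger than sampling noise alone would explain.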


In [None]:
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

sns.set_style("whitegrid")


## Load Results


In [None]:
def load_results(file_path):
    """Load results from JSONL file."""
    results = []
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines, which json.loads would reject
                results.append(json.loads(line))
    return pd.DataFrame(results)

# Load self-recognition results
self_recognition_path = Path("../data/self_recognition/mbpp-sanitized/test/openai-gpt-5.jsonl")
self_rec_df = load_results(self_recognition_path)

# Load cross-model detection results
cross_detection_path = Path("../data/cross_model_detection/mbpp-sanitized/test/judge-openai-gpt-5_target-anthropic-claude-haiku-4.5.jsonl")
cross_det_df = load_results(cross_detection_path)

print(f"Self-recognition samples: {len(self_rec_df)}")
print(f"Cross-detection samples: {len(cross_det_df)}")


## Calculate Accuracies


In [None]:
# Self-recognition accuracy (rows with a missing is_correct are excluded)
self_rec_valid = self_rec_df[self_rec_df['is_correct'].notna()]
self_rec_acc = self_rec_valid['is_correct'].astype(bool).mean()
self_rec_correct = int(self_rec_valid['is_correct'].astype(bool).sum())
self_rec_total = len(self_rec_valid)

print("Self-Recognition (GPT-5 identifying own code):")
print(f"  Accuracy: {self_rec_acc:.3f} ({self_rec_correct}/{self_rec_total})")

# Cross-detection accuracy (same filtering as above)
cross_det_valid = cross_det_df[cross_det_df['is_correct'].notna()]
cross_det_acc = cross_det_valid['is_correct'].astype(bool).mean()
cross_det_correct = int(cross_det_valid['is_correct'].astype(bool).sum())
cross_det_total = len(cross_det_valid)

print("\nCross-Model Detection (GPT-5 identifying Claude's code):")
print(f"  Accuracy: {cross_det_acc:.3f} ({cross_det_correct}/{cross_det_total})")

# Signed difference, so a negative gap is not printed with a leading "+"
print(f"\nDifference: {(cross_det_acc - self_rec_acc) * 100:+.1f} percentage points")
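
To judge whether the gap between the two accuracies is larger than chance would explain, one option is a pooled two-proportion z-test. The sketch below is not part of the original analysis: the helper name `two_proportion_z_test` is hypothetical, the counts are the ones reported in the Results Summary, and the test assumes the two samples are independent, which may not hold if both runs use the same MBPP problems.

```python
import math

def two_proportion_z_test(correct_a, total_a, correct_b, total_b):
    """Two-sided pooled z-test for a difference between two proportions."""
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    pooled = (correct_a + correct_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Self-recognition: 202/255 correct; cross-detection: 251/257 correct.
z, p_value = two_proportion_z_test(202, 255, 251, 257)
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```

In the notebook itself, the same function could be called with `self_rec_correct`, `self_rec_total`, `cross_det_correct`, and `cross_det_total` instead of the literal counts.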


## Visualize Results


In [None]:
# Create comparison bar plot
fig, ax = plt.subplots(figsize=(10, 6))

tasks = ['Self-Recognition\n(GPT-5 own code)', 'Cross-Detection\n(GPT-5 → Claude)']
accuracies = [self_rec_acc, cross_det_acc]
colors = ['#3498db', '#e74c3c']

bars = ax.bar(tasks, accuracies, color=colors, alpha=0.8, edgecolor='black', linewidth=1.5)

# Add value labels on bars
for bar, acc in zip(bars, accuracies):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height + 0.01,
            f'{acc:.1%}',
            ha='center', va='bottom', fontsize=14, fontweight='bold')

ax.set_ylim(0, 1.1)
ax.set_ylabel('Accuracy', fontsize=14, fontweight='bold')
ax.set_title('GPT-5 Performance: Self-Recognition vs Cross-Model Detection', 
             fontsize=16, fontweight='bold', pad=20)
ax.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5, label='Random chance')
ax.legend(fontsize=10)
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plot_path = Path('../data/cross_model_detection/comparison_plot.png')
plt.savefig(plot_path, dpi=300, bbox_inches='tight')
plt.show()

# Report the same path that was passed to savefig
print(f"Plot saved to: {plot_path}")


## Implications for the Paper

### New Motivation

The paper can now argue that:
1. **Stronger models like GPT-5 can act as effective monitors** for detecting code from other LLM families.
2. **Cross-model detection may be easier than self-recognition**, which suggests that different model families have distinguishable signatures.
3. **This has implications for AI safety and monitoring**: one model family can potentially monitor outputs from another.

### Potential Follow-up Experiments

1. Test other judge-target combinations (e.g., Claude detecting GPT-5)
2. Test detection across more model families (Gemini, DeepSeek, Grok)
3. Analyze what features make Claude's code so identifiable
4. Test whether fine-tuning can make models better at cross-model detection
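
For the first two follow-ups, a small helper can list which judge-target result files already exist. This is a sketch under assumptions: the directory and the `judge-…_target-….jsonl` naming pattern are copied from the cross-detection path loaded earlier in this notebook, `result_path` is a hypothetical helper, and identifiers for any additional models would need to match however those runs are actually named.

```python
from itertools import permutations
from pathlib import Path

# Directory and file-name pattern mirror the cross-detection file used above.
RESULTS_ROOT = Path("../data/cross_model_detection/mbpp-sanitized/test")

def result_path(judge, target, root=RESULTS_ROOT):
    """Hypothetical helper: expected JSONL path for one judge-target pair."""
    return root / f"judge-{judge}_target-{target}.jsonl"

# Only the two identifiers that appear in this notebook; extend as runs are added.
models = ["openai-gpt-5", "anthropic-claude-haiku-4.5"]

for judge, target in permutations(models, 2):
    path = result_path(judge, target)
    status = "found" if path.exists() else "missing"
    print(f"{judge} -> {target}: {status} ({path.name})")
```

Each file that is found can then be passed to `load_results` and scored in the same way as the GPT-5 to Claude run.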
