feat: improve AI tool attribution from metadata signals #22

Merged
MatrixA merged 4 commits into main from MatrixA/aihubmix-api-map
Apr 9, 2026
MatrixA commented Apr 8, 2026

Summary

Improve AI tool identification so detection results tell you WHICH model generated the content, not just "it's AI." Tool identification rate improves from 30% to 79% across 86 detected aihubmix-generated files.

Changes

  • C2PA: Google claim_generator/claim_generator_info fallback → google ai (covers all Imagen & Gemini images)
  • XMP: photoshop:Credit = "Made with Google AI" → google ai
  • MP4: AIGC ContentProducer ID mapping → wan (all Wan video variants)
  • EXIF: UserComment AIGC JSON with ContentProducer prefix → qwen (Qianfan Qwen images)
  • EXIF UserComment decoding: Properly decode raw bytes (8-byte charset prefix + ASCII payload) instead of displaying hex
  • New known tool patterns: google ai, wan, qwen, 通义万相
  • i18n: Add signal_xmp_credit and signal_exif_aigc_label keys to all 7 locale files
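
The UserComment decoding step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `decode_user_comment` is hypothetical, and it assumes the payload is plain ASCII behind either the `ASCII\0\0\0` identifier or an all-zero (undefined) identifier. Other character codes are simply reported as unsupported.

```rust
/// Hypothetical helper: decode a raw EXIF UserComment value into text.
/// Assumes an 8-byte character-code prefix followed by an ASCII payload.
fn decode_user_comment(raw: &[u8]) -> Option<String> {
    // The first 8 bytes identify the character code of the payload.
    if raw.len() < 8 {
        return None;
    }
    let (prefix, payload) = raw.split_at(8);

    // Assumption: only the ASCII identifier and the all-zero
    // "undefined" identifier are treated as decodable here.
    let is_ascii_prefix = prefix == b"ASCII\0\0\0";
    let is_undefined_prefix = prefix.iter().all(|&b| b == 0);
    if !(is_ascii_prefix || is_undefined_prefix) {
        return None;
    }

    // Reject non-ASCII payloads rather than guessing an encoding.
    if !payload.is_ascii() {
        return None;
    }
    let text = String::from_utf8(payload.to_vec()).ok()?;

    // Drop NUL padding and surrounding whitespace.
    Some(text.trim_end_matches('\0').trim().to_string())
}
```

Returning `None` for unsupported prefixes lets a caller fall back to whatever it did before (for example, showing hex) instead of displaying mis-decoded text.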

Checklist

  • cargo fmt -- --check passes
  • cargo clippy -- -D warnings passes
  • cargo test passes
  • New detection methods include appropriate confidence tiers
  • Documentation updated (if applicable)

Map unique metadata patterns to specific AI models so detection results
identify WHICH tool generated the content, not just that it's AI-generated.
Tool identification rate improves from 30% to 79% across aihubmix test files.

- C2PA: Google claim_generator/info fallback → "google ai" (Imagen/Gemini)
- XMP: photoshop:Credit "Made with Google AI" → "google ai"
- MP4: AIGC ContentProducer ID mapping → "wan" (Wan videos)
- EXIF: AIGC JSON in UserComment with ContentProducer prefix → "qwen"
- known_tools: add "google ai", "wan", "qwen", "通义万相" patterns
- i18n: add signal_xmp_credit and signal_exif_aigc_label to all 7 locales
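
The signal-to-tool mappings listed above could be expressed along these lines. This is a sketch under stated assumptions, not the PR's implementation: the `Signal` enum, the `attribute_tool` function, and the `PRODUCER_PREFIXES` table are hypothetical, the prefix values are placeholders (the real ContentProducer IDs are not given here), and the case-insensitive substring match on the C2PA claim generator is an assumed rule.

```rust
/// Hypothetical representation of a metadata signal that may name a tool.
enum Signal<'a> {
    C2paClaimGenerator(&'a str),
    XmpCredit(&'a str),
    AigcContentProducer(&'a str),
}

// Placeholder table: these prefixes are invented for illustration only.
const PRODUCER_PREFIXES: &[(&str, &str)] = &[
    ("EXAMPLE-WAN-", "wan"),
    ("EXAMPLE-QWEN-", "qwen"),
];

/// Map a single signal to a known tool name, if one can be inferred.
fn attribute_tool(signal: &Signal<'_>) -> Option<&'static str> {
    match signal {
        // Assumed rule: any claim generator mentioning Google falls back to "google ai".
        Signal::C2paClaimGenerator(s) if s.to_lowercase().contains("google") => {
            Some("google ai")
        }
        // The XMP photoshop:Credit value named in the change list.
        Signal::XmpCredit(s) if s.trim() == "Made with Google AI" => Some("google ai"),
        // Look up the ContentProducer ID by prefix in the placeholder table.
        Signal::AigcContentProducer(id) => PRODUCER_PREFIXES
            .iter()
            .find(|(prefix, _)| id.starts_with(*prefix))
            .map(|&(_, tool)| tool),
        _ => None,
    }
}
```

Keeping the prefix table as data rather than code means that, in a design like this, supporting another producer is a one-line addition.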

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MatrixA changed the title from "feat: improve AI tool attribution & add aihubmix test dataset" to "feat: improve AI tool attribution from metadata signals" on Apr 8, 2026
MatrixA and others added 3 commits April 8, 2026 21:29
…idated generation script

Add AI-generated media from 60+ models via aihubmix API as test fixtures,
with integration tests covering all detection methods (C2PA, XMP, EXIF AIGC,
MP4 AIGC, watermark, WAV TTS heuristic, filename). Also consolidates 7
generation scripts into one with all API route fixes from retry rounds.

- 70 new fixtures: 39 images, 22 videos, 9 audio from Google Imagen/Gemini,
  OpenAI, Ideogram, Flux, Doubao Seedream/Seedance, Jimeng, Wan, Veo, Qwen,
  Sora, and various TTS models
- 49 new integration tests in tests/aihubmix_detection.rs
- .gitattributes: add LFS tracking for .wav, .png, .jpg, .jpeg in fixtures
- scripts/: consolidated generate_aihubmix_all_models.py with all retry fixes
  (Flux polling, Gemini native API, AIGC producer mapping, multipart upload)
- Cleaned up: removed 6 retry scripts, generated/ directory, __pycache__

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update AI tool count from 61 to 76 across all 9 README files
- Document XMP Credit field detection (e.g. Google AI)
- Document EXIF AIGC label detection (e.g. Qwen via ContentProducer)
- Document MP4 AIGC ContentProducer→tool mapping (e.g. Wan videos)
- Document C2PA claim_generator vendor inference (e.g. Google)
- Add new tools to recognition table: Qwen, Wan, Google AI
- Update Cargo.toml description to list all 10 detection methods

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MatrixA merged commit bede557 into main on Apr 9, 2026
4 checks passed