creatify-ai/VABench

VABench — VideoAdAgent Bench

Coming soon.

VABench is a public, reproducible benchmark for AI agents that produce video advertisements end-to-end. We will be releasing the full harness — briefs, evaluation scorers, baseline runners, and aggregation scripts — as open source under the Apache 2.0 license in a future update.

What's in this benchmark

VABench grades AI video-ad systems on three independent evaluation arms:

  • Arm 1 — Capability. Pairwise head-to-head against general-purpose video agents on the same brief.
  • Arm 2 — Ad Quality. Pairwise against frontier text-to-video models (single-scene and multi-scene configurations) on eight ad-rubric dimensions.
  • Arm 3 — Video Production Quality. Brief-blind grading on production polish, with a frame-level hallucination rubric folded in at 3× weight.
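As a rough illustration of what "folded in at 3× weight" could mean for Arm 3, the sketch below computes a weighted mean in which each production-polish dimension counts once and the hallucination rubric counts three times. The function name, the dimension names, and the 0 to 1 score scale are all assumptions made for this example; the released scorer may aggregate differently.

```python
def production_score(
    polish_scores: dict[str, float],
    hallucination_score: float,
    hallucination_weight: float = 3.0,
) -> float:
    """Weighted mean of polish dimensions (weight 1 each) and the
    hallucination rubric (weight 3 by default).

    Hypothetical helper: scores are assumed to lie in [0, 1], where a
    higher hallucination_score means fewer hallucinated frames.
    """
    if not polish_scores:
        raise ValueError("at least one polish dimension is required")
    weighted_sum = sum(polish_scores.values()) + hallucination_weight * hallucination_score
    total_weight = len(polish_scores) + hallucination_weight
    return weighted_sum / total_weight


# Illustrative dimension names only; they are not the benchmark's rubric.
example = production_score(
    {"lighting": 1.0, "editing": 1.0},
    hallucination_score=0.0,
)
print(round(example, 2))  # 0.4
```

With two perfect polish dimensions and a hallucination score of zero, the result is 2 / 5 = 0.4, which shows how heavily a 3× weight penalizes frame-level hallucinations under this assumed scheme.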

The brief set covers every major performance-ad pattern (problem-solution, before-after, demo, testimonial, lifestyle) plus targeted stress tests against the hardest axes of multi-scene production (anchor storytelling, persona consistency, brand-asset reuse, structured CTA text, persuasion-arc compliance).

Current results

The headline leaderboard and methodology are public now:

What's coming

  • Brief set + JSON Schema
  • Pairwise judge implementation (Claude Opus 4.7 + GPT-5, position-swap consensus)
  • Frame-level hallucination scorer
  • Structural-compliance checks (duration, aspect, audio, OCR)
  • Baseline runners for the raw text-to-video frontier (Veo 3.1, Kling 3.0, Seedance 2.0) and general-purpose video agents (HeyGen V3)
  • Aggregation scripts that emit the published leaderboard tables
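Until the judge implementation is released, the following is a minimal sketch of one common way to do position-swap consensus in pairwise judging. Each judge sees the pair in both orders, a candidate wins only if the judge picks it in both orders, and the final result requires all judges to agree. The `Judge` callable signature, the function names, and the tie-on-disagreement rule are assumptions for illustration, not the VABench implementation.

```python
from typing import Callable, Literal

SlotVerdict = Literal["A", "B", "tie"]   # which presentation slot the judge preferred
Outcome = Literal["x", "y", "tie"]       # which candidate won after de-biasing

# Hypothetical judge interface: (brief, first_video, second_video) -> slot verdict.
Judge = Callable[[str, str, str], SlotVerdict]


def position_swap_verdict(judge: Judge, brief: str, video_x: str, video_y: str) -> Outcome:
    """Ask one judge twice with the order swapped; keep only consistent wins."""
    first = judge(brief, video_x, video_y)    # x shown in slot A
    second = judge(brief, video_y, video_x)   # x shown in slot B
    pass1 = {"A": "x", "B": "y", "tie": "tie"}[first]
    pass2 = {"A": "y", "B": "x", "tie": "tie"}[second]
    # A judge that flips with position is treated as undecided.
    return pass1 if pass1 == pass2 else "tie"


def consensus(judges: list[Judge], brief: str, video_x: str, video_y: str) -> Outcome:
    """Require every judge to reach the same position-swapped verdict."""
    verdicts = [position_swap_verdict(j, brief, video_x, video_y) for j in judges]
    if verdicts and all(v == verdicts[0] for v in verdicts):
        return verdicts[0]
    return "tie"


# Stub judges standing in for real model calls.
def always_first(brief: str, a: str, b: str) -> SlotVerdict:
    return "A"  # purely position-biased


def prefers_longer(brief: str, a: str, b: str) -> SlotVerdict:
    return "A" if len(a) > len(b) else "B"


print(position_swap_verdict(always_first, "brief", "xx", "y"))   # tie
print(consensus([prefers_longer, prefers_longer], "brief", "xx", "y"))  # x
```

The position-biased stub always yields a tie because its preference flips when the order is swapped, which is the failure mode the swap is meant to neutralize.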

License

Released under the Apache 2.0 license (see LICENSE).

Stay tuned

Watch this repository to be notified when the harness lands.

