TheMattBerman/scrollclaw

███████╗ ██████╗ ██████╗  █████╗     ██╗   ██╗ ██████╗  ██████╗
██╔════╝██╔═══██╗██╔══██╗██╔══██╗    ██║   ██║██╔════╝ ██╔════╝
███████╗██║   ██║██████╔╝███████║    ██║   ██║██║  ███╗██║     
╚════██║██║   ██║██╔══██╗██╔══██║    ██║   ██║██║   ██║██║     
███████║╚██████╔╝██║  ██║██║  ██║    ╚██████╔╝╚██████╔╝╚██████╗
╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝     ╚═════╝  ╚═════╝  ╚═════╝

AI-generated UGC videos that look like a real person pulled out their phone and started talking.

Brands pay $500–$5,000 per UGC video from human creators. This skill produces them for $5–$50 in API costs.


The big idea

Most AI video looks like AI video. Cinematic drone shots. Perfect lighting. Orchestral energy. Nobody scrolls past that thinking "real person" — they think "ad" and keep moving.

UGC works because it looks like someone pulled out their phone and talked.

This skill makes AI produce that.

Not by adding filters after. By building the entire pipeline around anti-polish: persona research that steals real customer language, first frames that look like iPhone photos, motion prompts that produce handheld energy, audio that sounds like a kitchen not a studio, and post-production that adds grain instead of removing it.


What it produces

Brand + Audience
      ↓
┌──────────────────────────────┐
│  1. Persona research         │  ← mines real reviews for exact language
│  2. Creator profile          │  ← persistent AI "creator" with locked identity
│  3. Format + script          │  ← 6 formats with shot-by-shot enforcement
│  4. First frame (Nano Banana)│  ← iPhone-realistic, not AI-looking
│  5. A-roll (Sora 2)          │  ← talking head with synced voice + lip sync
│  6. B-roll (Kling 3)         │  ← fast contextual scenes, env-matched
│  7. Audio orchestration      │  ← native voice, continuous over B-roll
│  8. Post-production          │  ← color grade, grain, frame rate
│  9. Captions                 │  ← native platform-style overlays
│ 10. Virality scoring         │  ← 7-criteria gate, only 70+ publishes
└──────────────────────────────┘
      ↓
Scroll-stopping UGC video
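Step 8's anti-polish pass adds texture instead of removing it. A minimal grain sketch in PIL (which the skill already requires); the strength value and per-pixel approach are illustrative, not the skill's actual implementation:

```python
import random
from PIL import Image

def add_grain(img, strength=12):
    """Overlay monochrome noise so a frame reads 'phone camera', not 'render'.

    strength: max per-channel offset in either direction (illustrative value).
    """
    out = img.convert("RGB").copy()
    px = out.load()
    w, h = out.size
    for y in range(h):
        for x in range(w):
            n = random.randint(-strength, strength)  # same offset per channel = mono grain
            r, g, b = px[x, y]
            px[x, y] = (
                max(0, min(255, r + n)),
                max(0, min(255, g + n)),
                max(0, min(255, b + n)),
            )
    return out
```

The real pipeline applies grain per video frame during post-production; the sketch shows only the single-frame operation.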

6 formats

| Format                | What it is                                         | Duration |
|-----------------------|----------------------------------------------------|----------|
| Talking Head          | One person, one camera, honest review              | 15–25s   |
| Hook Face + Demo      | Emotive face stops scroll → product demo           | 15s max  |
| Podcast Clip          | Fake podcast guest — mic, headphones, authority    | 8–20s    |
| Wall of Text          | Animated person + dense text overlay               | 4–8s     |
| Visual Transformation | Named concept ("The Scroll Trap") + before/after   | 10–25s   |
| Hybrid Transformation | Talking head bookends + slideshow mechanism bridge | 20–30s   |
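The durations above double as hard constraints. A sketch of a bounds check (the format keys and the idea of enforcing this at assembly time are assumptions; the numbers come from the table):

```python
# (min_seconds, max_seconds) per format, taken from the format table
DURATION_BOUNDS = {
    "talking_head": (15, 25),
    "hook_face_demo": (0, 15),
    "podcast_clip": (8, 20),
    "wall_of_text": (4, 8),
    "visual_transformation": (10, 25),
    "hybrid_transformation": (20, 30),
}

def duration_ok(fmt: str, seconds: float) -> bool:
    """True when a cut's length fits its format's window."""
    lo, hi = DURATION_BOUNDS[fmt]
    return lo <= seconds <= hi
```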

What you need

API keys

| Key                 | Required    | What it does                                               |
|---------------------|-------------|------------------------------------------------------------|
| FAL_KEY             | Yes         | Sora 2 (talking head video) + Kling 3 (B-roll) via fal.ai  |
| REPLICATE_API_TOKEN | Yes         | Nano Banana (first frame image generation)                 |
| OPENROUTER_API_KEY  | Recommended | Gemini via OpenRouter (virality scoring + analysis)        |
| ELEVENLABS_API_KEY  | Optional    | Only for multi-clip voice consistency (S2S)                |

Get keys: fal.ai · Replicate · OpenRouter · ElevenLabs
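It's worth failing fast if a required key is missing before burning generation time. A minimal sketch (the key names come from the table above; the helper itself is hypothetical — check-deps.sh is the real gate):

```python
import os

REQUIRED = ("FAL_KEY", "REPLICATE_API_TOKEN")
OPTIONAL = ("OPENROUTER_API_KEY", "ELEVENLABS_API_KEY")

def missing_keys(env=None):
    """Return the required API keys that are not set in the environment."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED if not env.get(k)]
```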

System requirements

  • Python 3 with PIL (sudo apt install python3-pil)
  • ffmpeg with libfreetype (sudo apt install ffmpeg — NOT the Homebrew version)
  • Inter font (auto-downloaded by the dependency checker)

Check everything

bash scripts/check-deps.sh

Quick start — your first video in 20 minutes

  1. Run the dependency check to make sure everything's configured
  2. Tell the skill what brand and who the audience is — it handles persona research
  3. Pick a format — the skill recommends one based on your goal
  4. Approve the script — it writes one mapped to the format's shot breakdown
  5. Generate the first frame — review it before committing to video
  6. Generate video + B-roll — Sora for talking head, Kling for B-roll scenes
  7. Post-production + captions — automated color grade, grain, caption overlay
  8. Virality score — only publish if it scores 70+
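Step 8's gate is simple to express. A sketch, assuming the seven criteria are each scored 0–100 and weighted equally (both assumptions — the real rubric lives in score/references/virality-scoring.md):

```python
def passes_gate(criteria_scores: dict, threshold: float = 70.0) -> bool:
    """True when the average across the 7 scoring criteria clears the bar."""
    if len(criteria_scores) != 7:
        raise ValueError("expected exactly 7 criteria")
    return sum(criteria_scores.values()) / 7 >= threshold
```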

Key findings from testing

  • Sora's native voice is always better than ElevenLabs TTS for talking head. TTS sounds fake. Sora does voice + lip sync together.
  • B-roll must be environment-matched. Extract a frame from the A-roll → feed to Kling. Generic B-roll looks like stock footage.
  • Captions go LAST — after post-production. Grain degrades caption pills.
  • AI cannot generate realistic UI/app screens. Use real screenshots for demos.
  • Describe audio by how it sounds, not the gear. "Clean, natural, close and present" works. "Shure SM7B" doesn't.
  • ~1 in 3 Sora generations have hand artifacts. Reroll, don't fix the prompt.
  • Multi-frame formats: chain from frame 1. Generate the canonical face first, then reference it for every subsequent frame. Parallel generation causes face drift.
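The environment-matching step above (extract a frame from the A-roll, feed it to Kling) comes down to one ffmpeg call. A sketch that only builds the argument list; the timestamp and quality settings are illustrative:

```python
def frame_extract_cmd(a_roll_path: str, out_jpg: str, timestamp: str = "00:00:01") -> list:
    """ffmpeg args that grab one reference frame from the A-roll for Kling."""
    return [
        "ffmpeg",
        "-ss", timestamp,   # seek before decode: fast single-frame grab
        "-i", a_roll_path,
        "-frames:v", "1",   # exactly one frame
        "-q:v", "2",        # high JPEG quality for the environment reference
        "-y", out_jpg,
    ]
```

Run it with `subprocess.run(frame_extract_cmd(...), check=True)` once ffmpeg is installed.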


Architecture

scrollclaw/
├── SKILL.md                    Router — orchestrates the full suite
├── README.md                   You are here
├── _system/                    Shared doctrine + context protocol
│   ├── SKILL.md                Core doctrine, format selection, pipeline routing
│   └── references/
│       ├── brand-campaign-context.md   Brand memory + campaign workspace protocol
│       ├── color-reference-system.md
│       ├── creator-system.md
│       ├── format-library.md
│       ├── hook-emotions.md
│       └── taste-calibration.md
├── persona/                    Step 1: Persona research + scripting
│   ├── SKILL.md
│   └── references/
│       ├── persona-research.md
│       └── script-voice.md
├── first-frame/                Step 2: Canonical frame generation
│   ├── SKILL.md
│   └── references/
│       ├── first-frame-prompting.md
│       └── first-frame-psychology.md
├── animate/                    Step 3: A-roll (Sora 2)
│   ├── SKILL.md
│   └── references/
│       ├── motion-prompting.md
│       └── sora-api.md
├── b-roll/                     Step 4: B-roll (Kling 3)
│   ├── SKILL.md
│   └── references/
│       ├── kling-api.md
│       └── orchestrator.md
├── assemble/                   Step 5: Stitch, post, captions, or full-assemble
│   ├── SKILL.md
│   └── references/
│       ├── audio-orchestration.md
│       ├── green-zone.md
│       ├── orchestrator.md
│       ├── post-production.md
│       └── voice-system.md
├── score/                      Step 6: Virality scoring gate
│   ├── SKILL.md
│   └── references/
│       └── virality-scoring.md
├── scripts/                    10 automation scripts
├── evals/                      Baseline, trigger, and execution benchmarks
└── assets/                     Campaign brief template

Brand & Campaign Context

ScrollClaw persists work across sessions, so campaign 10 takes a fraction of the time campaign 1 did.

workspace/
├── brand/                      ← Read-only for ScrollClaw (GrowthClaw or manual)
│   ├── voice-profile.md        ← Informs script tone
│   ├── positioning.md          ← Informs persona research direction
│   └── audience.md             ← Anchors creator archetype selection
├── creators/                   ← Global creator profiles (reuse across campaigns)
└── campaigns/<slug>/
    ├── brief.md                ← Campaign brief
    ├── persona-research.md     ← Extracted customer language
    ├── creators/               ← Campaign-specific creator overrides
    ├── scripts/                ← Approved scripts
    ├── frames/                 ← First frames + context frames
    ├── clips/                  ← A-roll, B-roll, assembled finals
    ├── scores/                 ← Virality score cards
    ├── output-log.md           ← All prompt params (append-only)
    └── learnings.md            ← What worked, what didn't (append-only)

Full context protocol: _system/references/brand-campaign-context.md


Built by Matt Berman · Emerald Digital · Big Players Newsletter

Full documentation: _system/SKILL.md

About

ScrollClaw — AI UGC video skill suite. Anti-polish production pipeline: persona research, Sora 2 A-roll, Kling 3 B-roll, native captions, virality scoring.
