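# VTuber Auto-Performance: Reverse-Engineering a Generation System from Real Performances

## Experiment Goals

**Core hypothesis**: The "human feel" of a real streamer can be extracted as structured patterns, and those patterns can be transferred to an AI streamer.

**MVP definition**: Generate a 2-minute, loopable performance narrative unit (set up a bit → narrate → twist → lose composure) that can be cut directly into short videos.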

---

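## Innovations

1. **Attention Focus as a first-class citizen**: for the first time, the streamer's "attention target" is exposed as a controllable generation parameter
2. **Performance→Prompt reverse engineering**: prompts are derived from real performances rather than designed by guesswork
3. **Clip-oriented structure design**: generated content is structured to be clippable from the start, rather than searching for highlights in post-processing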

---

## Experiment Workflow

```
Phase 1: Data annotation → Phase 2: Pattern extraction → Phase 3: Generation → Phase 4: Evaluation and iteration
```

---

# Phase 1: Data Processing and Annotation

## 1.1 Annotation Schema Definition

In [None]:
# 核心标注Schema
from dataclasses import dataclass, field
from typing import List, Optional, Literal
from enum import Enum
import json

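# ============================================
# Core dimension 1: Attention Focus
# Who the streamer is talking to / attending to right now
# ============================================
class AttentionFocus(str, Enum):
    SELF = "self"           # Talking to oneself / inner monologue / personal stories
    AUDIENCE = "audience"   # Addressing the audience directly ("you all", "everyone")
    SPECIFIC = "specific"   # Responding to a specific viewer (reading an SC, i.e. Super Chat, or naming someone)
    CONTENT = "content"     # Focused on content (playing with the cat, watching the screen, reading something)
    META = "meta"           # Talking about the stream itself (how long today's stream is, equipment issues)

# ============================================
# Core dimension 2: Speech Act
# A simplified set of dialogue acts, keeping only those most relevant to live streaming
# ============================================
class SpeechAct(str, Enum):
    NARRATE = "narrate"         # Narrate: tell a story, describe an experience
    OPINE = "opine"             # Opine: state an opinion or evaluation
    RESPOND = "respond"         # Respond: answer a question, play along with a joke
    ELICIT = "elicit"           # Elicit: ask a question, set up a bit, invite interaction
    PIVOT = "pivot"             # Pivot: change topic, bridge from what came before
    BACKCHANNEL = "backchannel" # Filler: interjections, thinking aloud, transitions

# ============================================
# Core dimension 3: Trigger
# What prompted this stretch of speech
# ============================================
class Trigger(str, Enum):
    SC = "sc"               # Triggered by an SC / donation
    DANMAKU = "danmaku"     # Triggered by danmaku (live chat comments)
    SELF_INIT = "self"      # Self-initiated (no external trigger)
    CONTENT = "content"     # Triggered by content (the cat moved, the screen changed)
    PRIOR = "prior"         # Continues from the preceding speech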

# ============================================
# 数据结构定义
# ============================================
@dataclass
class Segment:
    """最小标注单元：一个语义完整的话语片段"""
    id: str                          # segment唯一ID
    start_time: float                # 开始时间（秒）
    end_time: float                  # 结束时间（秒）
    text: str                        # 转录文本
    
    # 三个核心维度
    attention_focus: AttentionFocus  # 注意力指向
    speech_act: SpeechAct            # 话语行为
    trigger: Trigger                 # 触发源
    
    # 可选：风格标记
    catchphrase: Optional[str] = None      # 口癖（如果出现）
    emotion_shift: bool = False            # 是否有明显情绪变化
    
    def to_dict(self):
        return {
            "id": self.id,
            "start_time": self.start_time,
            "end_time": self.end_time,
            "text": self.text,
            "attention_focus": self.attention_focus.value,
            "speech_act": self.speech_act.value,
            "trigger": self.trigger.value,
            "catchphrase": self.catchphrase,
            "emotion_shift": self.emotion_shift
        }

@dataclass
class Clip:
    """一个完整的切片，包含多个segment"""
    id: str                          # clip唯一ID
    source: str                      # 来源（主播名/平台）
    language: str                    # 语言 (zh/en)
    duration_sec: float              # 总时长
    
    # 元数据
    title: Optional[str] = None      # 切片标题/主题
    quality_score: Optional[int] = None  # 质量评分 1-5
    
    # 内容
    segments: List[Segment] = field(default_factory=list)
    
    # 提取的模式
    skeleton: Optional[str] = None   # 叙事骨架类型
    catchphrases: List[str] = field(default_factory=list)  # 口癖列表
    
    def to_dict(self):
        return {
            "id": self.id,
            "source": self.source,
            "language": self.language,
            "duration_sec": self.duration_sec,
            "title": self.title,
            "quality_score": self.quality_score,
            "segments": [s.to_dict() for s in self.segments],
            "skeleton": self.skeleton,
            "catchphrases": self.catchphrases
        }

print("✓ Schema定义完成")
print(f"  - AttentionFocus: {[e.value for e in AttentionFocus]}")
print(f"  - SpeechAct: {[e.value for e in SpeechAct]}")
print(f"  - Trigger: {[e.value for e in Trigger]}")
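
The schema above only defines `to_dict`, so label strings coming back from JSON still have to be turned into enum members somewhere. Because the enums mix in `str` and store the label as their value, a lookup by value is enough. The sketch below illustrates this with `AttentionFocusDemo`, a hypothetical trimmed stand-in for `AttentionFocus`, and a hypothetical helper `parse_focus`; neither is part of the notebook.

```python
import json
from enum import Enum


class AttentionFocusDemo(str, Enum):
    """Hypothetical trimmed stand-in for the AttentionFocus enum."""
    SELF = "self"
    AUDIENCE = "audience"
    SPECIFIC = "specific"


def parse_focus(label: str) -> AttentionFocusDemo:
    """Look up an enum member by its stored value, with a clearer error message."""
    try:
        return AttentionFocusDemo(label)
    except ValueError:
        valid = [member.value for member in AttentionFocusDemo]
        raise ValueError(
            f"unknown attention_focus {label!r}; expected one of {valid}"
        ) from None


# Round trip: enum -> JSON string -> enum
stored = json.dumps({"attention_focus": AttentionFocusDemo.SPECIFIC.value})
restored = parse_focus(json.loads(stored)["attention_focus"])
print(restored is AttentionFocusDemo.SPECIFIC)
```

The same lookup pattern would apply to `SpeechAct` and `Trigger` when rebuilding `Segment` objects from annotated JSON.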

## 1.2 Raw Data Format Specification

Prepare your data in the following Markdown format:

In [None]:
# 展示期望的输入格式
EXPECTED_INPUT_FORMAT = '''
# 原始切片数据格式规范

请将每个切片整理为以下Markdown格式：

---

## clip_001

- **source**: 某主播名
- **language**: zh
- **duration**: 120  (秒)
- **title**: 起名与责任转嫁

### transcript

```
0:00 谢谢xx的SC，也可以先从宝宝起名开始
0:08 不行不行不行，给活物起名是我的第一线
0:12 有很多人啊，他做选择是什么，他就想赖别人...
...
```

### notes (可选)

- 这段是回复SC引发的深度展开
- 口癖："我跟你们说"、"对不对"
- 有明显的情绪升级

---

## clip_002

...
'''

print(EXPECTED_INPUT_FORMAT)

## 1.3 Data Parser

In [None]:
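import re
from typing import List, Optional, Tuple

def parse_timestamp(ts: str) -> float:
    """Convert an M:SS, MM:SS, or H:MM:SS timestamp to seconds."""
    parts = ts.strip().split(':')
    if len(parts) == 2:
        return int(parts[0]) * 60 + float(parts[1])
    elif len(parts) == 3:
        return int(parts[0]) * 3600 + int(parts[1]) * 60 + float(parts[2])
    # Fail loudly: silently returning 0.0 would move the line to the start of the clip
    raise ValueError(f"Unrecognized timestamp: {ts!r}")

def parse_transcript_line(line: str) -> Tuple[Optional[float], str]:
    """Parse a timestamped transcript line; returns (None, line) if there is no timestamp."""
    # Matches "0:00 text", "00:00 text", or "1:02:03 text"
    match = re.match(r'^(\d{1,2}:\d{2}(?::\d{2})?)\s+(.+)$', line.strip())
    if match:
        return parse_timestamp(match.group(1)), match.group(2)
    return None, line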

def parse_raw_clips_markdown(md_content: str) -> List[dict]:
    """
    解析原始切片Markdown文件
    返回结构化的clip列表（未标注版本）
    """
    clips = []
    current_clip = None
    current_section = None
    transcript_lines = []
    notes_lines = []
    
    for line in md_content.split('\n'):
        line = line.strip()
        
        # 新clip开始
        if line.startswith('## clip_'):
            # 保存之前的clip
            if current_clip:
                current_clip['transcript_lines'] = transcript_lines
                current_clip['notes'] = '\n'.join(notes_lines)
                clips.append(current_clip)
            
            clip_id = line.replace('## ', '').strip()
            current_clip = {'id': clip_id}
            transcript_lines = []
            notes_lines = []
            current_section = None
            
        # Skip anything that appears before the first "## clip_" heading
        elif current_clip is None:
            continue
            
        # Metadata
        elif line.startswith('- **source**:'):
            current_clip['source'] = line.split(':', 1)[1].strip()
        elif line.startswith('- **language**:'):
            current_clip['language'] = line.split(':', 1)[1].strip()
        elif line.startswith('- **duration**:'):
            current_clip['duration_sec'] = float(line.split(':', 1)[1].strip().split()[0])
        elif line.startswith('- **title**:'):
            current_clip['title'] = line.split(':', 1)[1].strip()
            
        # Section markers
        elif line == '### transcript':
            current_section = 'transcript'
        elif line.startswith('### notes'):
            current_section = 'notes'
        elif line == '---':
            # A horizontal rule closes the current section, so it is not collected as a note
            current_section = None
        elif line == '```':
            continue
            
        # 内容收集
        elif current_section == 'transcript' and line:
            ts, text = parse_transcript_line(line)
            if ts is not None:
                transcript_lines.append({'time': ts, 'text': text})
        elif current_section == 'notes' and line:
            notes_lines.append(line)
    
    # 保存最后一个clip
    if current_clip:
        current_clip['transcript_lines'] = transcript_lines
        current_clip['notes'] = '\n'.join(notes_lines)
        clips.append(current_clip)
    
    return clips

print("✓ 数据解析器定义完成")
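
The parser yields only a start time per transcript line, while `Segment` also needs an `end_time`. One possible way to bridge the gap, sketched below, is to assume that each line ends where the next one starts and that the last line ends at the clip duration. The helper name `attach_end_times` and the `end` key are hypothetical and not part of the notebook.

```python
from typing import Dict, List


def attach_end_times(lines: List[Dict], clip_duration: float) -> List[Dict]:
    """Return copies of the transcript lines with an added 'end' key.

    Assumption: a line ends where the next one starts, and the last line
    ends at the clip duration.
    """
    ordered = sorted(lines, key=lambda item: item["time"])
    result = []
    for i, item in enumerate(ordered):
        if i + 1 < len(ordered):
            end = ordered[i + 1]["time"]
        else:
            # Never let the last line end before it starts
            end = max(clip_duration, item["time"])
        result.append({**item, "end": end})
    return result


demo_lines = [
    {"time": 0.0, "text": "line one"},
    {"time": 8.0, "text": "line two"},
    {"time": 12.0, "text": "line three"},
]
for row in attach_end_times(demo_lines, clip_duration=120.0):
    print(row["time"], row["end"], row["text"])
```

Real pauses between lines are ignored under this assumption, so the derived end times are upper bounds rather than measured values.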

## 1.4 LLM Auto Pre-Annotation

In [None]:
# LLM预标注Prompt模板
AUTO_ANNOTATION_PROMPT = '''
你是一个专业的直播内容分析师。请对以下直播切片进行segment级别的标注。

## 标注维度

### 1. attention_focus（注意力指向）
- `self`: 自言自语/内心独白/讲自己的事
- `audience`: 直接对观众说话（你们、大家）
- `specific`: 回应特定观众（读SC、点名）
- `content`: 专注于内容（逗猫、看画面）
- `meta`: 谈论直播本身

### 2. speech_act（话语行为）
- `narrate`: 叙事，讲故事、描述经历
- `opine`: 表态，发表观点、评价
- `respond`: 回应，回答问题、接梗
- `elicit`: 引出，提问、抛梗、邀请互动
- `pivot`: 转折，话题转换、承上启下
- `backchannel`: 填充，语气词、思考、过渡

### 3. trigger（触发源）
- `sc`: SC/打赏触发
- `danmaku`: 弹幕触发
- `self`: 自发（无外部触发）
- `content`: 内容触发（画面变化）
- `prior`: 承接上文

## 切分原则

1. 每个segment应该是一个语义完整的单元（通常5-30秒）
2. 当attention_focus或speech_act发生变化时，应该切分新segment
3. 保持segment数量合理（一个2分钟切片通常5-10个segment）

## 输入

```
{transcript}
```

## 输出格式

请输出JSON格式：

```json
{{
  "segments": [
    {{
      "id": "seg_01",
      "start_time": 0.0,
      "end_time": 8.0,
      "text": "segment的文本内容",
      "attention_focus": "specific",
      "speech_act": "respond",
      "trigger": "sc",
      "catchphrase": null,
      "emotion_shift": false
    }},
    ...
  ],
  "skeleton": "触发→拒绝→说理→升华",
  "catchphrases": ["我跟你们说", "对不对"]
}}
```
'''

def format_transcript_for_llm(transcript_lines: List[dict]) -> str:
    """格式化转录文本供LLM处理"""
    lines = []
    for item in transcript_lines:
        minutes = int(item['time'] // 60)
        seconds = int(item['time'] % 60)
        lines.append(f"{minutes}:{seconds:02d} {item['text']}")
    return '\n'.join(lines)

def create_annotation_prompt(clip: dict) -> str:
    """为单个clip创建标注prompt"""
    transcript = format_transcript_for_llm(clip.get('transcript_lines', []))
    return AUTO_ANNOTATION_PROMPT.format(transcript=transcript)

print("✓ LLM预标注模板定义完成")
print("\n示例Prompt预览（前500字符）：")
print(AUTO_ANNOTATION_PROMPT[:500] + "...")

In [None]:
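# Simulated LLM call (replace with a real API call in actual use)
import os
from typing import Optional

def call_llm_for_annotation(prompt: str, api_key: Optional[str] = None) -> dict:
    """
    Call an LLM to annotate a clip automatically.
    
    In actual use, replace the body with your own API call, for example:
    - Anthropic Claude API
    - OpenAI API
    - Alibaba Cloud Tongyi Qianwen
    """
    # TODO: replace with a real API call
    # Example using the Anthropic API (kept as an inert string until an API is configured)
    '''
    import anthropic
    client = anthropic.Anthropic(api_key=api_key)
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}]
    )
    return json.loads(response.content[0].text)
    '''
    
    # Placeholder return value
    print("[Note] This is a simulated response; configure an API for real use")
    return {
        "segments": [],
        "skeleton": "待标注",
        "catchphrases": []
    }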

print("✓ LLM调用函数定义完成")
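
Once a real model is wired in, its reply may wrap the JSON in a Markdown code fence, and its labels may fall outside the schema. The sketch below shows one way to handle both cases before the data reaches the analysis stage. `extract_json`, `validate_annotation`, and the `ALLOWED` table are hypothetical helpers that copy the label sets from the prompt. They assume `start_time` and `end_time` are numeric when present.

```python
import json
import re
from typing import List

FENCE = "`" * 3  # built indirectly so this example nests cleanly in Markdown
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)

# Label sets copied from the annotation prompt
ALLOWED = {
    "attention_focus": {"self", "audience", "specific", "content", "meta"},
    "speech_act": {"narrate", "opine", "respond", "elicit", "pivot", "backchannel"},
    "trigger": {"sc", "danmaku", "self", "content", "prior"},
}


def extract_json(reply: str) -> dict:
    """Parse a model reply that may wrap its JSON in a Markdown code fence."""
    fenced = FENCED_JSON.search(reply)
    payload = fenced.group(1) if fenced else reply
    return json.loads(payload.strip())


def validate_annotation(data: dict) -> List[str]:
    """Return a list of problems; an empty list means the labels look valid."""
    problems = []
    for i, seg in enumerate(data.get("segments", [])):
        for key, allowed in ALLOWED.items():
            if seg.get(key) not in allowed:
                problems.append(f"segment {i}: bad {key}={seg.get(key)!r}")
        if seg.get("end_time", 0) <= seg.get("start_time", 0):
            problems.append(f"segment {i}: end_time must be after start_time")
    return problems


demo = {"segments": [{"id": "seg_01", "start_time": 0.0, "end_time": 8.0,
                      "attention_focus": "specific", "speech_act": "respond",
                      "trigger": "gift"}]}
reply = f"Here is the annotation:\n{FENCE}json\n{json.dumps(demo)}\n{FENCE}"
problems = validate_annotation(extract_json(reply))
print(problems)
```

Segments that fail validation are natural candidates for the manual correction pass rather than for silent dropping.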

---

# Phase 2: Pattern Extraction and Statistical Analysis

In [None]:
import numpy as np
from collections import Counter, defaultdict

class PatternAnalyzer:
    """从标注数据中提炼模式"""
    
    def __init__(self, clips: List[Clip]):
        self.clips = clips
        self.all_segments = []
        for clip in clips:
            self.all_segments.extend(clip.segments)
    
    def compute_attention_transition_matrix(self) -> dict:
        """
        Compute the attention_focus state-transition matrix.
        Returns: {from_state: {to_state: probability}}, with each row summing to 1
        """
        transitions = defaultdict(lambda: defaultdict(int))
        
        for clip in self.clips:
            for i in range(len(clip.segments) - 1):
                from_state = clip.segments[i].attention_focus.value
                to_state = clip.segments[i + 1].attention_focus.value
                transitions[from_state][to_state] += 1
        
        # 转换为概率
        prob_matrix = {}
        for from_state, to_states in transitions.items():
            total = sum(to_states.values())
            prob_matrix[from_state] = {
                to_state: count / total 
                for to_state, count in to_states.items()
            }
        
        return prob_matrix
    
    def compute_trigger_speech_act_distribution(self) -> dict:
        """
        计算不同trigger下speech_act的条件分布
        返回: {trigger: {speech_act: probability}}
        """
        dist = defaultdict(lambda: defaultdict(int))
        
        for seg in self.all_segments:
            dist[seg.trigger.value][seg.speech_act.value] += 1
        
        # 转换为概率
        prob_dist = {}
        for trigger, speech_acts in dist.items():
            total = sum(speech_acts.values())
            prob_dist[trigger] = {
                sa: count / total 
                for sa, count in speech_acts.items()
            }
        
        return prob_dist
    
    def extract_skeleton_patterns(self) -> List[str]:
        """
        提取常见的叙事骨架模式
        返回: 按频率排序的skeleton列表
        """
        skeletons = [clip.skeleton for clip in self.clips if clip.skeleton]
        return [s for s, _ in Counter(skeletons).most_common()]
    
    def extract_segment_type_sequences(self) -> List[List[str]]:
        """
        提取每个clip的segment类型序列
        返回: [["specific_respond", "self_narrate", ...], ...]
        """
        sequences = []
        for clip in self.clips:
            seq = [
                f"{seg.attention_focus.value}_{seg.speech_act.value}"
                for seg in clip.segments
            ]
            sequences.append(seq)
        return sequences
    
    def compute_statistics(self) -> dict:
        """计算整体统计信息"""
        attention_counts = Counter(seg.attention_focus.value for seg in self.all_segments)
        speech_act_counts = Counter(seg.speech_act.value for seg in self.all_segments)
        trigger_counts = Counter(seg.trigger.value for seg in self.all_segments)
        
        return {
            "total_clips": len(self.clips),
            "total_segments": len(self.all_segments),
            "avg_segments_per_clip": len(self.all_segments) / len(self.clips) if self.clips else 0,
            "attention_distribution": dict(attention_counts),
            "speech_act_distribution": dict(speech_act_counts),
            "trigger_distribution": dict(trigger_counts)
        }
    
    def generate_report(self) -> str:
        """生成分析报告"""
        stats = self.compute_statistics()
        trans_matrix = self.compute_attention_transition_matrix()
        trigger_dist = self.compute_trigger_speech_act_distribution()
        skeletons = self.extract_skeleton_patterns()
        
        report = f"""
# 模式分析报告

## 基础统计
- 总切片数: {stats['total_clips']}
- 总segment数: {stats['total_segments']}
- 平均每clip的segment数: {stats['avg_segments_per_clip']:.1f}

## Attention Focus分布
{json.dumps(stats['attention_distribution'], indent=2, ensure_ascii=False)}

## Speech Act分布
{json.dumps(stats['speech_act_distribution'], indent=2, ensure_ascii=False)}

## Trigger分布
{json.dumps(stats['trigger_distribution'], indent=2, ensure_ascii=False)}

## Attention状态转移矩阵
{json.dumps(trans_matrix, indent=2, ensure_ascii=False)}

## 常见叙事骨架
{chr(10).join(f'- {s}' for s in skeletons[:10])}

## Trigger→SpeechAct条件分布
{json.dumps(trigger_dist, indent=2, ensure_ascii=False)}
        """
        return report

print("✓ PatternAnalyzer定义完成")
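
To make the transition-matrix step concrete, here is a small worked example on plain label sequences. It is a sketch that applies the same counting and row normalization as `compute_attention_transition_matrix`, but on lists of strings. The function name `transition_probabilities` and the toy data are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_probabilities(sequences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Row-normalized first-order transition counts over label sequences."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        # Pair each label with its successor; transitions never cross clip boundaries
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


toy = [
    ["specific", "self", "self", "audience"],
    ["specific", "self", "audience"],
]
matrix = transition_probabilities(toy)
for state, row in matrix.items():
    print(state, row)
```

In this toy data `specific` is always followed by `self`, while `self` moves to `audience` two times out of three. A state that only ever appears last in a clip, such as `audience` here, has no row at all, which is why the evaluator needs a fallback score for unseen transitions.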

---

# Phase 3: Multi-Agent Generation System

In [None]:
# Agent Prompt模板

DIRECTOR_AGENT_PROMPT = '''
你是一个VTuber直播的导演Agent。你的任务是规划2分钟表演单元的结构。

## 输入
- 人设: {persona}
- 背景: {background}
- 主题: {topic}
- 触发事件: {trigger_event}

## 输出要求
规划5-8个segment的序列，每个segment包含:
- attention_focus: self/audience/specific/content/meta
- speech_act: narrate/opine/respond/elicit/pivot/backchannel
- duration_hint: 预估时长（秒）
- content_hint: 这个segment应该讲什么的简要提示

## 结构原则（基于真实主播数据统计）
{structure_rules}

## 输出格式
```json
{{
  "skeleton": "触发→展开→转折→释放",
  "segments": [
    {{
      "attention_focus": "specific",
      "speech_act": "respond",
      "duration_hint": 10,
      "content_hint": "读SC并简短回应"
    }},
    ...
  ]
}}
```
'''

NARRATOR_AGENT_PROMPT = '''
你是一个VTuber直播的内容生成Agent。你的任务是根据导演的规划生成具体台词。

## 人设
{persona}

## 当前segment规划
- attention_focus: {attention_focus}
- speech_act: {speech_act}
- content_hint: {content_hint}
- 预估时长: {duration_hint}秒

## 上下文
- 之前的segment: {previous_segments}
- 触发事件: {trigger_event}

## 参考风格（Few-shot Examples）
{style_examples}

## 输出要求
生成这个segment的具体台词，要求:
1. 符合attention_focus指向（对谁说话）
2. 符合speech_act类型（在做什么）
3. 符合人设和口语风格
4. 时长约{duration_hint}秒（约{word_count}字）

直接输出台词文本，不要加任何标记。
'''

STYLE_AGENT_PROMPT = '''
你是一个风格润色Agent。你的任务是让生成的台词更有"人味"。

## 原始台词
{raw_text}

## 人设口癖
{catchphrases}

## 润色规则
1. 适当加入口癖（但不要过度）
2. 加入语气词和停顿标记
3. 调整句子长度，短句为主
4. 保持口语化，避免书面语

## 输出
直接输出润色后的台词。
'''

CONSISTENCY_CHECKER_PROMPT = '''
你是一个人设一致性检查Agent。检查生成的内容是否与人设矛盾。

## 人设
{persona}

## 生成的内容
{generated_content}

## 检查项
1. 是否有与人设矛盾的陈述？
2. 是否有不合理的知识假设？
3. 语气是否一致？

## 输出格式
```json
{{
  "is_consistent": true/false,
  "issues": ["问题描述", ...],
  "suggestions": ["修改建议", ...]
}}
```
'''

print("✓ Agent Prompt模板定义完成")

In [None]:
@dataclass
class PerformanceUnit:
    """生成的2分钟表演单元"""
    persona: str
    topic: str
    skeleton: str
    segments: List[dict]  # 每个segment包含台词
    total_duration: float
    consistency_check: dict

class VTuberPerformanceGenerator:
    """
    VTuber自动表演生成系统
    
    Pipeline:
    1. Director Agent: 规划结构
    2. Narrator Agent: 生成台词
    3. Style Agent: 风格润色
    4. Consistency Checker: 一致性检查
    """
    
    def __init__(self, 
                 pattern_analyzer: PatternAnalyzer = None,
                 api_key: str = None):
        self.pattern_analyzer = pattern_analyzer
        self.api_key = api_key
        
        # 从分析器提取规则（如果有）
        self.structure_rules = self._extract_structure_rules()
        self.style_examples = self._extract_style_examples()
    
    def _extract_structure_rules(self) -> str:
        """从标注数据提取结构规则"""
        if not self.pattern_analyzer:
            return "（使用默认规则）"
        
        trans_matrix = self.pattern_analyzer.compute_attention_transition_matrix()
        rules = []
        
        for from_state, to_states in trans_matrix.items():
            top_transitions = sorted(to_states.items(), key=lambda x: -x[1])[:2]
            for to_state, prob in top_transitions:
                if prob > 0.2:
                    rules.append(f"- {from_state}后常接{to_state}（{prob:.0%}概率）")
        
        return '\n'.join(rules)
    
    def _extract_style_examples(self) -> str:
        """从标注数据提取风格示例"""
        if not self.pattern_analyzer:
            return "（无示例）"
        
        examples = []
        for clip in self.pattern_analyzer.clips[:3]:
            for seg in clip.segments[:2]:
                examples.append(f"[{seg.attention_focus.value}_{seg.speech_act.value}] {seg.text[:100]}...")
        
        return '\n'.join(examples)
    
    def generate(self, 
                 persona: str,
                 background: str,
                 topic: str,
                 trigger_event: str = None,
                 catchphrases: List[str] = None) -> PerformanceUnit:
        """
        生成一个2分钟表演单元
        
        Args:
            persona: 人设描述
            background: 背景设定
            topic: 直播主题
            trigger_event: 触发事件（如SC内容）
            catchphrases: 口癖列表
        
        Returns:
            PerformanceUnit: 生成的表演单元
        """
        
        # Step 1: Director规划结构
        print("[1/4] Director Agent: 规划结构...")
        structure = self._call_director_agent(
            persona=persona,
            background=background,
            topic=topic,
            trigger_event=trigger_event
        )
        
        # Step 2: Narrator生成台词
        print("[2/4] Narrator Agent: 生成台词...")
        segments_with_text = []
        previous_texts = []
        
        for i, seg_plan in enumerate(structure.get('segments', [])):
            text = self._call_narrator_agent(
                persona=persona,
                segment_plan=seg_plan,
                previous_segments=previous_texts,
                trigger_event=trigger_event
            )
            
            # Step 3: Style润色
            print(f"[3/4] Style Agent: 润色segment {i+1}...")
            styled_text = self._call_style_agent(
                raw_text=text,
                catchphrases=catchphrases or []
            )
            
            # Copy the plan so the Director's output is not mutated in place
            segment = {**seg_plan, 'text': styled_text}
            segments_with_text.append(segment)
            previous_texts.append(styled_text)
        
        # Step 4: 一致性检查
        print("[4/4] Consistency Checker: 检查一致性...")
        full_content = '\n'.join([s['text'] for s in segments_with_text])
        consistency_check = self._call_consistency_checker(
            persona=persona,
            generated_content=full_content
        )
        
        # 估算总时长
        total_duration = sum(s.get('duration_hint', 15) for s in segments_with_text)
        
        return PerformanceUnit(
            persona=persona,
            topic=topic,
            skeleton=structure.get('skeleton', ''),
            segments=segments_with_text,
            total_duration=total_duration,
            consistency_check=consistency_check
        )
    
    def _call_director_agent(self, persona, background, topic, trigger_event) -> dict:
        """调用Director Agent"""
        prompt = DIRECTOR_AGENT_PROMPT.format(
            persona=persona,
            background=background,
            topic=topic,
            trigger_event=trigger_event or "无特定触发",
            structure_rules=self.structure_rules
        )
        # TODO: 替换为实际LLM调用
        return {
            "skeleton": "触发→展开→转折→释放",
            "segments": [
                {"attention_focus": "specific", "speech_act": "respond", "duration_hint": 10, "content_hint": "回应触发"},
                {"attention_focus": "self", "speech_act": "narrate", "duration_hint": 30, "content_hint": "展开叙述"},
                {"attention_focus": "audience", "speech_act": "opine", "duration_hint": 20, "content_hint": "发表观点"},
            ]
        }
    
    def _call_narrator_agent(self, persona, segment_plan, previous_segments, trigger_event) -> str:
        """调用Narrator Agent"""
        # TODO: 替换为实际LLM调用
        return f"[示例台词 - {segment_plan['content_hint']}]"
    
    def _call_style_agent(self, raw_text, catchphrases) -> str:
        """调用Style Agent"""
        # TODO: 替换为实际LLM调用
        return raw_text
    
    def _call_consistency_checker(self, persona, generated_content) -> dict:
        """调用Consistency Checker"""
        # TODO: 替换为实际LLM调用
        return {"is_consistent": True, "issues": [], "suggestions": []}

print("✓ VTuberPerformanceGenerator定义完成")
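
`NARRATOR_AGENT_PROMPT` contains a `{word_count}` placeholder, but the stubbed `_call_narrator_agent` never fills it. A minimal sketch of how the value could be derived from `duration_hint` follows. The default speaking rate of 4 characters per second is an assumption, not a figure from the data; it should be calibrated from the annotated clips. `estimate_word_count` and `MINI_TEMPLATE` are hypothetical names.

```python
def estimate_word_count(duration_sec: float, chars_per_sec: float = 4.0) -> int:
    """Rough character budget for a spoken segment.

    chars_per_sec is an assumed speaking rate, not a measured one; calibrate it
    from annotated clips (characters of text divided by segment duration).
    """
    if duration_sec <= 0 or chars_per_sec <= 0:
        raise ValueError("duration_sec and chars_per_sec must be positive")
    return max(1, round(duration_sec * chars_per_sec))


# Hypothetical, shortened stand-in for the narrator template
MINI_TEMPLATE = "Target length: about {duration_hint} s (about {word_count} characters)."

segment_plan = {"duration_hint": 10, "content_hint": "read the SC and reply briefly"}
prompt = MINI_TEMPLATE.format(
    duration_hint=segment_plan["duration_hint"],
    word_count=estimate_word_count(segment_plan["duration_hint"]),
)
print(prompt)
```

The same call would supply `word_count` when formatting the full narrator template for each planned segment.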

---

# Phase 4: Evaluation and Iteration

In [None]:
@dataclass
class EvaluationResult:
    """评估结果"""
    clip_id: str
    
    # 结构评估
    structure_validity: float      # 结构是否符合统计分布 (0-1)
    attention_flow_score: float    # 注意力切换自然度 (0-1)
    
    # 质量评估
    human_likeness: Optional[float] = None  # 人味得分 (人工评分 1-5)
    clipability_score: Optional[float] = None  # 可切片性 (0-1)
    
    # 一致性
    persona_consistency: bool = True
    
    # 备注
    notes: str = ""

class Evaluator:
    """评估器"""
    
    def __init__(self, pattern_analyzer: PatternAnalyzer = None):
        self.pattern_analyzer = pattern_analyzer
    
    def evaluate_structure(self, performance: PerformanceUnit) -> float:
        """
        评估生成结构是否符合真实数据分布
        """
        if not self.pattern_analyzer:
            return 0.5  # 无参考数据时返回中性分数
        
        # 检查segment数量是否合理
        stats = self.pattern_analyzer.compute_statistics()
        avg_segs = stats['avg_segments_per_clip']
        actual_segs = len(performance.segments)
        
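        # Segment-count score; guard against reference clips with no segments (avg_segs == 0)
        if avg_segs > 0:
            seg_count_score = max(0.0, 1 - abs(actual_segs - avg_segs) / avg_segs)
        else:
            seg_count_score = 0.5  # neutral score, same as the no-reference case above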
        
        # 状态转移评分
        trans_matrix = self.pattern_analyzer.compute_attention_transition_matrix()
        transition_scores = []
        
        for i in range(len(performance.segments) - 1):
            from_state = performance.segments[i].get('attention_focus', '')
            to_state = performance.segments[i + 1].get('attention_focus', '')
            
            if from_state in trans_matrix and to_state in trans_matrix.get(from_state, {}):
                prob = trans_matrix[from_state][to_state]
                transition_scores.append(prob)
            else:
                transition_scores.append(0.1)  # 未见过的转换给低分
        
        trans_score = sum(transition_scores) / len(transition_scores) if transition_scores else 0.5
        
        return (seg_count_score + trans_score) / 2
    
    def evaluate_attention_flow(self, performance: PerformanceUnit) -> float:
        """
        评估注意力切换的自然度
        - 不能连续太多相同的attention_focus
        - 切换不能太频繁
        """
        focuses = [s.get('attention_focus', '') for s in performance.segments]
        
        if len(focuses) < 2:
            return 1.0
        
        # 计算切换次数
        switches = sum(1 for i in range(len(focuses) - 1) if focuses[i] != focuses[i + 1])
        switch_rate = switches / (len(focuses) - 1)
        
        # 理想切换率在0.3-0.7之间
        if 0.3 <= switch_rate <= 0.7:
            return 1.0
        elif switch_rate < 0.3:
            return switch_rate / 0.3
        else:
            return (1 - switch_rate) / 0.3
    
    def check_clipability(self, performance: PerformanceUnit) -> Tuple[float, List[dict]]:
        """
        检查可切片性：能否从中提取出独立的短视频片段
        
        Returns:
            score: 可切片性得分 (0-1)
            potential_clips: 潜在可切片的segment组合
        """
        potential_clips = []
        
        # Look for segment combinations that form a complete narrative arc.
        # A good clip usually has: opening (respond/elicit/pivot) + body (narrate) + closing (opine/respond/pivot)
        
        for i in range(len(performance.segments)):
            for j in range(i + 2, min(i + 5, len(performance.segments) + 1)):
                sub_segments = performance.segments[i:j]
                
                # 检查是否有完整弧线
                speech_acts = [s.get('speech_act', '') for s in sub_segments]
                
                has_opening = speech_acts[0] in ['respond', 'elicit', 'pivot']
                has_body = any(sa == 'narrate' for sa in speech_acts[1:-1]) if len(speech_acts) > 2 else True
                has_closing = speech_acts[-1] in ['opine', 'respond', 'pivot']
                
                if has_opening and has_body and has_closing:
                    duration = sum(s.get('duration_hint', 15) for s in sub_segments)
                    if 15 <= duration <= 60:  # 15-60秒适合短视频
                        potential_clips.append({
                            "start_idx": i,
                            "end_idx": j,
                            "duration": duration,
                            "segments": sub_segments
                        })
        
        # 得分 = 潜在切片数 / 最大可能切片数
        max_possible = len(performance.segments) - 1
        score = min(1.0, len(potential_clips) / max_possible) if max_possible > 0 else 0
        
        return score, potential_clips
    
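    def full_evaluate(self, performance: PerformanceUnit) -> EvaluationResult:
        """Run every automatic check and bundle the results."""
        import zlib  # local import keeps this cell self-contained
        
        structure_score = self.evaluate_structure(performance)
        attention_score = self.evaluate_attention_flow(performance)
        clipability_score, potential_clips = self.check_clipability(performance)
        
        # hash() on str is salted per process, so it would give a different ID on every run;
        # a CRC32 of the topic is stable across runs.
        topic_id = zlib.crc32(performance.topic.encode('utf-8')) % 10000
        
        return EvaluationResult(
            clip_id=f"gen_{topic_id}",
            structure_validity=structure_score,
            attention_flow_score=attention_score,
            clipability_score=clipability_score,
            persona_consistency=performance.consistency_check.get('is_consistent', True),
            notes=f"Found {len(potential_clips)} potential clippable spans"
        )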

print("✓ Evaluator定义完成")
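
The piecewise scoring in `evaluate_attention_flow` is easier to judge on concrete sequences. The sketch below restates it as a standalone function over plain label lists. `switch_rate_score` is a hypothetical name; the 0.3 to 0.7 band comes from the evaluator above.

```python
from typing import List


def switch_rate_score(focuses: List[str], low: float = 0.3, high: float = 0.7) -> float:
    """Score 1.0 when the switch rate is inside [low, high], falling linearly to 0 outside."""
    if len(focuses) < 2:
        return 1.0
    # A switch is any boundary where the attention focus changes
    switches = sum(a != b for a, b in zip(focuses, focuses[1:]))
    rate = switches / (len(focuses) - 1)
    if rate < low:
        return rate / low
    if rate > high:
        return (1 - rate) / (1 - high)
    return 1.0


examples = {
    "always switching": ["specific", "self", "audience"],
    "balanced": ["specific", "self", "self", "audience", "audience"],
    "never switching": ["self", "self", "self", "self"],
}
for name, seq in examples.items():
    print(name, switch_rate_score(seq))
```

Note that the first example is the placeholder Director plan (`specific` → `self` → `audience`): every boundary is a switch, so its rate is 1.0 and it scores 0. With only three segments the possible switch rates are 0, 0.5, and 1, so short plans can only land on a few score values.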

---

# Full Pipeline Run Example

In [None]:
def run_full_pipeline(raw_clips_md_path: str = None, api_key: str = None):
    """
    运行完整Pipeline
    
    Args:
        raw_clips_md_path: 原始切片Markdown文件路径
        api_key: LLM API密钥
    """
    print("="*60)
    print("VTuber Auto-Performance 实验Pipeline")
    print("="*60)
    
    # ============================================
    # Phase 1: 数据处理
    # ============================================
    print("\n[Phase 1] 数据处理与标注")
    print("-"*40)
    
    if raw_clips_md_path:
        with open(raw_clips_md_path, 'r', encoding='utf-8') as f:
            raw_md = f.read()
        raw_clips = parse_raw_clips_markdown(raw_md)
        print(f"✓ 解析到 {len(raw_clips)} 个原始切片")
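    else:
        print("⚠ No raw data file provided; continuing with an empty clip list")
        raw_clips = []
    
    # LLM annotation should happen here; it is not wired up yet, so the list stays empty
    annotated_clips = []  # TODO: fill with annotated Clip objects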
    
    # ============================================
    # Phase 2: 模式分析
    # ============================================
    print("\n[Phase 2] 模式分析")
    print("-"*40)
    
    if annotated_clips:
        analyzer = PatternAnalyzer(annotated_clips)
        report = analyzer.generate_report()
        print(report)
    else:
        print("⚠ 无标注数据，跳过模式分析")
        analyzer = None
    
    # ============================================
    # Phase 3: 生成测试
    # ============================================
    print("\n[Phase 3] 生成测试")
    print("-"*40)
    
    generator = VTuberPerformanceGenerator(
        pattern_analyzer=analyzer,
        api_key=api_key
    )
    
    # 测试生成
    test_persona = """
    你是一个25岁的游戏主播，性格活泼但偶尔会突然深沉。
    喜欢讲自己的糗事，经常自嘲。
    口癖："我跟你们说"、"天哪"、"对不对"
    """
    
    test_background = "日常杂谈直播，没有特定游戏"
    test_topic = "观众问能不能帮忙给宠物起名字"
    test_trigger = "SC: 主播主播，能不能帮我给我的小猫起个名字？"
    
    print(f"生成测试...")
    print(f"  人设: {test_persona[:50]}...")
    print(f"  主题: {test_topic}")
    print(f"  触发: {test_trigger}")
    
    performance = generator.generate(
        persona=test_persona,
        background=test_background,
        topic=test_topic,
        trigger_event=test_trigger,
        catchphrases=["我跟你们说", "天哪", "对不对"]
    )
    
    print(f"\n生成结果:")
    print(f"  骨架: {performance.skeleton}")
    print(f"  段落数: {len(performance.segments)}")
    print(f"  预估时长: {performance.total_duration}秒")
    
    # ============================================
    # Phase 4: 评估
    # ============================================
    print("\n[Phase 4] 评估")
    print("-"*40)
    
    evaluator = Evaluator(pattern_analyzer=analyzer)
    eval_result = evaluator.full_evaluate(performance)
    
    print(f"评估结果:")
    print(f"  结构有效性: {eval_result.structure_validity:.2f}")
    print(f"  注意力流畅度: {eval_result.attention_flow_score:.2f}")
    print(f"  可切片性: {eval_result.clipability_score:.2f}")
    print(f"  人设一致性: {'✓' if eval_result.persona_consistency else '✗'}")
    print(f"  备注: {eval_result.notes}")
    
    print("\n" + "="*60)
    print("Pipeline完成")
    print("="*60)
    
    return {
        "raw_clips": raw_clips,
        "annotated_clips": annotated_clips,
        "analyzer": analyzer,
        "generator": generator,
        "performance": performance,
        "evaluation": eval_result
    }

# 运行示例（无数据版本）
# results = run_full_pipeline()

---

# Data File Template

Prepare your clip data in the following format and save it as `raw_clips.md`:

In [None]:
DATA_TEMPLATE = '''
# VTuber切片原始数据

## clip_001

- **source**: 主播A
- **language**: zh
- **duration**: 120
- **title**: 起名与责任转嫁

### transcript

```
0:00 谢谢xx的SC，也可以先从宝宝起名开始
0:08 不行不行不行，给活物起名是我的第一线
0:15 有很多人啊，他做选择是什么，他就想赖别人
...
```

### notes

- 回复SC引发的深度展开
- 口癖："我跟你们说"、"对不对"

---

## clip_002

- **source**: StreamerB
- **language**: en
- **duration**: 90
- **title**: Why I never give pet naming advice

### transcript

```
0:00 Oh my god chat, someone just asked me to name their cat
0:05 No no no, that\'s where I draw the line
...
```

### notes

- Similar topic to clip_001 but in English
- Catchphrases: "oh my god", "chat"

---

(继续添加更多切片...)
'''

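# Save the template next to the notebook (relative path, so it works on any machine)
with open('raw_clips_template.md', 'w', encoding='utf-8') as f:
    f.write(DATA_TEMPLATE)

print("✓ Data template saved to raw_clips_template.md")
print("\nPrepare 20 clips in this format (a mix of Chinese and English), then run the pipeline")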

---

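# Next Steps

## What you need to do:

1. **Prepare data** (estimated 1-2 days)
   - Collect 20 high-quality streamer clips (roughly 10 Chinese and 10 English)
   - Organize them in the `raw_clips_template.md` format
   - Each clip should be 1-3 minutes long and contain a complete narrative arc

2. **Configure the API** (10 minutes)
   - Obtain a Claude or GPT API key
   - Replace the placeholder body of `call_llm_for_annotation` in the notebook

3. **Run annotation** (estimated 2-3 hours)
   - Run LLM pre-annotation
   - Manually correct the annotation results

4. **Iterate** (ongoing)
   - Analyze the pattern report
   - Adjust the agent prompts
   - Generate → evaluate → improve

## Evaluation metric targets:

| Metric | Target |
|------|--------|
| Structure validity | >0.7 |
| Attention flow | >0.6 |
| Clipability | >0.5 (at least one clippable span per 2 minutes) |
| Human-likeness score | >3.5/5 (blind human rating) |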
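
The targets in the table can be checked mechanically once an evaluation has run. The following is a minimal sketch, assuming the scores arrive as a plain dict keyed by the `EvaluationResult` field names. `TARGETS` and `check_targets` are hypothetical, and the demo scores are made-up numbers, not results.

```python
TARGETS = {
    "structure_validity": 0.7,
    "attention_flow_score": 0.6,
    "clipability_score": 0.5,
    "human_likeness": 3.5,
}


def check_targets(scores: dict) -> dict:
    """Map each metric to True/False, or None when it has not been scored yet."""
    return {
        name: (None if scores.get(name) is None else scores[name] > threshold)
        for name, threshold in TARGETS.items()
    }


# Made-up scores for illustration only
demo_scores = {
    "structure_validity": 0.5,
    "attention_flow_score": 1.0,
    "clipability_score": 0.5,
    "human_likeness": None,  # human rating not collected yet
}
print(check_targets(demo_scores))
```

The comparison is strict, matching the ">" in the table, so a clipability score of exactly 0.5 does not pass.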