# Actor Response Length

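This notebook reads the CSV files in `final_results/data/actor_response_length` and plots one curve per file.

Data format (one curve per CSV):
- Required columns: `step`, `response_length`
- Column meanings:
  - `step`: training step or iteration index (numeric)
  - `response_length`: mean or median response length at that step (numeric)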
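
The format above can be checked before plotting. The following is a minimal sketch, not part of the original notebook: the helper name `load_curve`, the `REQUIRED_COLUMNS` constant, and the sample values are all hypothetical, and sorting by `step` is an added convenience rather than something the data format requires.

```python
import io

import pandas as pd

# Column names taken from the data format described above.
REQUIRED_COLUMNS = ("step", "response_length")


def load_curve(source):
    """Read one curve and check it against the expected format.

    `source` is anything `pd.read_csv` accepts (a path or a file-like object).
    `load_curve` is a hypothetical helper, not part of the original notebook.
    """
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    for col in REQUIRED_COLUMNS:
        # Non-numeric entries raise here instead of failing later in the plot.
        df[col] = pd.to_numeric(df[col], errors="raise")
    # Assumption: sort by step so the curve is drawn left to right.
    return df.sort_values("step").reset_index(drop=True)


# Made-up in-memory CSV standing in for a real file such as grpo.csv.
sample = io.StringIO("step,response_length\n20,512.5\n10,480.0\n")
curve = load_curve(sample)
```

In the plotting cell, a call such as `load_curve(data_dir / filename)` could stand in for the bare `pd.read_csv` call, so that a malformed file fails with a clear message rather than a `KeyError` inside `plt.plot`.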

In [None]:
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

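# Resolved relative to the notebook's working directory.
data_dir = Path('..') / 'data' / 'actor_response_length'
# (CSV filename, legend label) for each curve to plot.
series = [
    ('grpo.csv', 'GRPO'),
    ('ppo.csv', 'PPO'),
    ('reinforce_pp.csv', 'Reinforce++'),
    ('grpo_n32.csv', 'GRPO (n=32)'),
    ('grpo_kl_0p001.csv', 'GRPO (KL=0.001)'),
]
# One marker per series, paired in order by zip() below.
markers = ['o', 's', '^', 'D', 'v']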

plt.figure(figsize=(11, 4.8))
for (filename, label), marker in zip(series, markers):
    # Each CSV must provide the `step` and `response_length` columns.
    df = pd.read_csv(data_dir / filename)
    plt.plot(
        df['step'],
        df['response_length'],
        marker=marker,
        linewidth=1.5,
        markersize=4,
        label=label,
    )

plt.title('Actor Response Length')
plt.xlabel('Step')
plt.ylabel('Response Length')
plt.grid(True, linestyle='--', alpha=0.4)
plt.legend()
plt.tight_layout()
plt.show()
