Benchmark introduced in Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory

Jinzhuo Liu¹, Jiangning Zhang¹✉, Wencan Jiang¹, Yabiao Wang², Dingkang Liang³, Zhucun Xue¹, Ran Yi⁴, Yong Liu¹

¹Zhejiang University, ²Tencent Youtu Lab, ³Huazhong University of Science and Technology,
⁴Shanghai Jiao Tong University, ✉Corresponding author

📷 Introduction

We introduce NarraStream-Bench, a benchmark for narrative streaming video generation. It features 324 multi-prompt scripts spanning six dimensions, along with an evaluation protocol covering three dimensions that combines traditional metrics with multimodal large language model (MLLM)-based assessment. The benchmark is introduced together with IAMFlow in the same paper.

✨ Highlights

1. Overview of NarraStream-Bench

2. Benchmark Comparison

Comparison of related long-video generation benchmarks.

Benchmark	VQ	TC	IC	Prompt Type	Aggregation Strategy	Year
VBench-Long	✓	×	×	Single	Slow-Fast Avg.	2024
LV-Bench	✓	✓	×	Single	VDE	2025
NarrLV	×	✓	✓	Single	TNA-based QA	2025
NarraStream-Bench	✓	✓	✓	Multi	Narrative-Aware	2026

🛠️ Installation

1. Install requirements

git clone git@github.com:Eddie0521/NarraStream-Bench.git
cd NarraStream-Bench
conda create -n NarraStream-Bench python=3.10
conda activate NarraStream-Bench

# Install a PyTorch build that matches your CUDA/runtime first.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

2. Download checkpoints

Download the metric backbones and auxiliary weights:

bash scripts/download_weights.sh

By default, checkpoints are saved to ./pretrained and resolved by configs/paths.yaml. Expected checkpoints include CLIP, DINO, RAFT, AMT, VTSS, and LanguageBind video weights.

🔑 Usage

1. Prepare the API key

NarraStream-Bench uses API-backed MLLM/VLM metrics by default. Set your API key before running the full evaluation:

export SILICONFLOW_API_KEY=your_api_key
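If you drive the benchmark from your own Python tooling, it can help to fail fast when the key is missing instead of partway through the MLLM-based metrics. The sketch below assumes the evaluation reads the `SILICONFLOW_API_KEY` variable shown above; the helper `require_api_key` is hypothetical and not part of NarraStream-Bench.

```python
import os


def require_api_key(var="SILICONFLOW_API_KEY"):
    """Hypothetical preflight check: return the key or fail early with a clear message."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running the full evaluation")
    return key
```

Call it once before launching `scripts/run_narrastream_bench.sh` from a wrapper script so that a missing key surfaces immediately.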

2. Prepare the evaluation data

Prepare generated videos and prompts in the following structure:

your_dataset/
├── prompt.jsonl
└── video/
    ├── sample_0.mp4
    ├── sample_1.mp4
    └── ...

Each line in prompt.jsonl should contain one sample:

{"prompts": ["segment prompt 1", "segment prompt 2", "segment prompt 3"]}

The number of videos must match the number of prompt samples. If the videos are not named sample_0.mp4, sample_1.mp4, ..., NarraStream-Bench reads all supported video files in natural sort order (so sample_2.mp4 comes before sample_10.mp4).
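Before a long run, you may want to check the dataset layout locally. The following is a minimal sketch, not the benchmark's own loader: it reads `prompt.jsonl` one sample per line, lists the videos in natural sort order, and fails if the counts differ. The set of "supported" extensions in `VIDEO_EXTS` is an assumption, and the helper names (`natural_key`, `load_prompts`, `list_videos`, `check_dataset`) are hypothetical.

```python
import json
import re
from pathlib import Path

# Assumed extension list; the benchmark's actual set of supported formats may differ.
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def natural_key(name):
    """Natural sort key: 'sample_2' sorts before 'sample_10'.

    re.split with a capturing group alternates text and digit runs,
    so odd positions are always digit runs and can be compared as ints.
    """
    parts = re.split(r"(\d+)", name)
    return [int(tok) if i % 2 else tok.lower() for i, tok in enumerate(parts)]


def load_prompts(jsonl_path):
    """Read one sample per non-empty line; each must hold a list of segment prompts."""
    samples = []
    with open(jsonl_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            prompts = json.loads(line).get("prompts")
            if not isinstance(prompts, list) or not all(isinstance(p, str) for p in prompts):
                raise ValueError(f"line {lineno}: 'prompts' must be a list of strings")
            samples.append(prompts)
    return samples


def list_videos(video_dir):
    """Return video files (by extension) in natural sort order."""
    files = [p for p in Path(video_dir).iterdir()
             if p.is_file() and p.suffix.lower() in VIDEO_EXTS]
    return sorted(files, key=lambda p: natural_key(p.name))


def check_dataset(root):
    """Pair videos with prompt samples by order; raise if the counts differ."""
    root = Path(root)
    samples = load_prompts(root / "prompt.jsonl")
    videos = list_videos(root / "video")
    if len(samples) != len(videos):
        raise ValueError(f"{len(samples)} prompt samples but {len(videos)} videos")
    return list(zip(videos, samples))
```

For example, `check_dataset("your_dataset")` returns `(video_path, prompts)` pairs, which makes a mismatch between files and script lines easy to spot before any metric runs.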

3. Run the command

bash scripts/run_narrastream_bench.sh \
  --run-name my_eval \
  --video-dir your_dataset/video \
  --prompts your_dataset/prompt.jsonl \
  --gpu-id 0

4. See the output

Results are saved under runs/<run-name>/ by default:

runs/<run-name>/
├── processed/
│   ├── eval_data.json
│   ├── .preprocess_signature
│   └── sample_*/
│       ├── seg_0.mp4
│       ├── seg_1.mp4
│       └── ...
└── results/
    ├── results_latest.json
    ├── results_YYYYMMDD_HHMMSS.json
    ├── steps/
    ├── raw_metrics/
    └── artifacts/

The main files to inspect are:

results_latest.json: latest resumable snapshot, updated after each metric.
results_YYYYMMDD_HHMMSS.json: final timestamped result file.
processed/eval_data.json: preprocessed segment metadata.
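Because the timestamp in results_YYYYMMDD_HHMMSS.json is zero-padded, sorting those file names as strings also sorts them chronologically. The sketch below uses that to pick the newest final result and falls back to results_latest.json while a run is still in progress. It only loads the JSON, since the schema of the result files is not described here; the function names are hypothetical.

```python
import json
import re
from pathlib import Path

# Matches final timestamped files such as results_YYYYMMDD_HHMMSS.json (not results_latest.json).
TIMESTAMPED = re.compile(r"^results_(\d{8}_\d{6})\.json$")


def latest_final_result(results_dir):
    """Return the newest timestamped result file, or None if none exists yet."""
    candidates = [p for p in Path(results_dir).iterdir()
                  if p.is_file() and TIMESTAMPED.match(p.name)]
    # Zero-padded timestamps make lexicographic order equal chronological order.
    return max(candidates, key=lambda p: p.name, default=None)


def load_results(results_dir):
    """Prefer the newest final file; otherwise read the resumable snapshot."""
    results_dir = Path(results_dir)
    path = latest_final_result(results_dir) or results_dir / "results_latest.json"
    with open(path, encoding="utf-8") as f:
        return path.name, json.load(f)
```

A typical call would be `load_results("runs/my_eval/results")`, matching the `--run-name my_eval` example above.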

🌟 Citation

Please leave us a star 🌟 and cite our paper if you find our work helpful.

@misc{liu2026advancingnarrativelongvideo,
      title={Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory}, 
      author={Jinzhuo Liu and Jiangning Zhang and Wencan Jiang and Yabiao Wang and Dingkang Liang and Zhucun Xue and Ran Yi and Yong Liu},
      year={2026},
      eprint={2605.18733},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.18733}, 
}
