2026.03.03 🎉 We release the code repository for our latest work, GTC. The core code is coming soon!
- Game-Theoretic Framework: Novel Stackelberg game formulation for optimal token allocation.
- Model Adaptability: Compatible with most VideoLLMs (e.g., LLaVA, Qwen-VL series).
- Operator Compatibility: Compatible with efficient operators like Flash Attention 2.
- Strong Performance: Maintains competitive benchmark performance after token compression.
- High Efficiency: Significantly reduces generation time and overall latency.
TLDR: We present GTC, a game-theoretic framework that dynamically compresses video tokens, achieving superior efficiency and performance across various VideoLLMs and benchmarks.
- Clone this repository:
git clone https://github.com/KevinConqueror/GTC.git
cd GTC
- Environment Setup and Preparation:
conda create -n gtc python=3.10 -y
conda activate gtc
pip install --upgrade pip
pip install -e ".[train]"
pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git

We use the lmms-eval toolkit to evaluate our models.
Flash Attention is optional. In our efficiency analysis, however, we enable it whenever the model supports it.
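As a minimal sketch of the "use Flash Attention when it is available" policy, the snippet below probes for the `flash_attn` Python package and builds the `--model_args` string accordingly. The helper name `pick_attn` and the `sdpa` fallback are our assumptions for illustration; they are not part of this repository, and you should verify that the model wrapper in your lmms-eval version accepts the fallback value.

```shell
# Hypothetical helper (not part of this repo): choose an attention backend.
# $1 is the Python interpreter to probe; defaults to python3.
pick_attn() {
  py="${1:-python3}"
  # If flash_attn can be imported, prefer Flash Attention 2.
  if "$py" -c "import flash_attn" >/dev/null 2>&1; then
    echo "flash_attention_2"
  else
    # Assumed fallback: PyTorch scaled-dot-product attention.
    echo "sdpa"
  fi
}

ATTN="$(pick_attn)"
MODEL_ARGS="pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,attn_implementation=${ATTN}"
echo "$MODEL_ARGS"
```

The resulting `$MODEL_ARGS` can be passed to `--model_args` in the evaluation command. For efficiency comparisons, keep the backend identical across all runs being compared.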
To evaluate LLaVA-OneVision-7B, you can use:
accelerate launch --num_processes=1 \
-m lmms_eval \
--model llava_onevision \
--model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,attn_implementation=flash_attention_2 \
--tasks videomme,mlvu_dev,longvideobench_val_v,mvbench \
--batch_size 1 \
--log_samples \
--log_samples_suffix llava_onevision \
--output_path ./logs/

We extend our gratitude to the open-source efforts of LLaVA-OneVision and Qwen2.5-VL.
For any questions about our paper or code, please email ykzhou@whu.edu.cn.
The core implementation (llava/model/gtc.py) has not been released yet and will be added in a future update. The repository structure and evaluation code are provided for reference.
