
🎯 GTC: Game-Theoretic Compression for Video Large Language Models 🚀

Yikang Zhou, Hongchen Wei, Zhenzhong Chen

Wuhan University

🔥 News

  • 2026.03.03 🎉 We release the code repository for our latest work on GTC. Core code coming soon!

📌 Highlights

  • Game-Theoretic Framework: Novel Stackelberg game formulation for optimal token allocation.
  • Model Adaptability: Compatible with most VideoLLMs (e.g., LLaVA, Qwen-VL series).
  • Operator Compatibility: Compatible with efficient operators like Flash Attention 2.
  • Strong Performance: Maintains competitive benchmark performance under token compression.
  • High Efficiency: Significantly reduces generation time and overall latency.

✨ Overview

TLDR: We present GTC, a game-theoretic framework that dynamically compresses video tokens, achieving superior efficiency and performance across various VideoLLMs and benchmarks.

🛠 Preparation

  1. Clone this repository:
git clone https://github.com/KevinConqueror/GTC.git
cd GTC
  2. Set up the environment:
conda create -n gtc python=3.10 -y
conda activate gtc
pip install --upgrade pip
pip install -e ".[train]"
pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
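After installation, a quick import check can confirm that the environment is usable before launching an evaluation. The following is a minimal sketch, not part of the repository: the helper name `check_pkgs` is hypothetical, and the module names `torch`, `lmms_eval`, and `flash_attn` are assumptions about what the steps above install (`flash_attn` in particular is optional and may need a separate install).

```shell
# Hypothetical helper: report whether each Python module can be imported
# in the currently active environment. It never aborts; it only prints status.
check_pkgs() {
  for pkg in "$@"; do
    if python -c "import ${pkg}" >/dev/null 2>&1; then
      echo "${pkg}: ok"
    else
      echo "${pkg}: missing"
    fi
  done
}

# Module names are assumptions; adjust them to match your setup.
check_pkgs torch lmms_eval flash_attn
```

A `missing` line for `flash_attn` only means that Flash Attention 2 is unavailable; the other packages are needed for the evaluation command itself.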

🚀 Performance Evaluation

We use the lmms-eval toolkit to evaluate our models.

Flash Attention is optional. In our efficiency analysis, however, it is used whenever it is available, so enable it when you compare latency numbers.

To evaluate LLaVA-OneVision-7B, you can use:

accelerate launch --num_processes=1 \
-m lmms_eval \
--model llava_onevision \
--model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,attn_implementation=flash_attention_2 \
--tasks videomme,mlvu_dev,longvideobench_val_v,mvbench \
--batch_size 1 \
--log_samples \
--log_samples_suffix llava_onevision \
--output_path ./logs/
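If Flash Attention 2 is not installed on your machine, the `attn_implementation` value in the command above has to change. The sketch below picks the value automatically. The variable names `ATTN_IMPL` and `MODEL_ARGS` are illustrative, and the `sdpa` fallback is an assumption: it is a standard Hugging Face attention backend, but whether the `llava_onevision` wrapper accepts it should be checked against your installed lmms-eval version.

```shell
# Use flash_attention_2 only when the flash_attn package imports cleanly;
# otherwise fall back to PyTorch SDPA attention (assumed to be accepted).
if python -c "import flash_attn" >/dev/null 2>&1; then
  ATTN_IMPL="flash_attention_2"
else
  ATTN_IMPL="sdpa"
fi

# Same model arguments as in the command above, with the attention backend substituted.
MODEL_ARGS="pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,attn_implementation=${ATTN_IMPL}"
echo "${MODEL_ARGS}"
```

Pass `"${MODEL_ARGS}"` to `--model_args` in the evaluation command. Efficiency numbers measured with the fallback backend are not directly comparable to those measured with Flash Attention 2.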

👍 Acknowledgment

We extend our gratitude to the open-source efforts of LLaVA-OneVision and Qwen2.5-VL.

📩 Contact

For any questions about our paper or code, please email ykzhou@whu.edu.cn.


🔒 Note on Core Code

The core implementation (llava/model/gtc.py) has not been released yet and will be added in a future update. The repository structure and evaluation code are provided for reference.

About

The official implementation of "GTC: Game-Theoretic Token Compression for Video Large Language Models".
