
[ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models


D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

The code for ICLR 2025 paper: D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models.

📃 [Paper] • 💻 [Github] • 🤗 [Huggingface]

If you find our project helpful, please give us a star ⭐ on GitHub to stay updated.

Setup Environment

We recommend using Anaconda to create a new environment and install the required packages:

conda create -n d2o_v2 python=3.10
conda activate d2o_v2
pip install --upgrade pip  # enable PEP 660 support
pip install -r requirements.txt

Quick Start

Run inference on the LongBench sample with the following command:

CUDA_VISIBLE_DEVICES=0 python run_pred_long_bench_sample.py --model_name_or_path meta-llama/Meta-Llama-3-8B \
    --cache_dir /your_hf_home_path \
    --use_d2o True \
    --model_type llama3 \
    --hh_ratio 0.1 \
    --recent_ratio 0.1 \
    --action_name d2o_0.2 \
    --e True 
  • cache_dir is the directory where model weights are stored.
  • use_d2o enables the D2O cache-eviction strategy.
  • hh_ratio is the proportion of the cache budget kept for important (heavy-hitter) tokens, as described in the main paper.
  • recent_ratio is the proportion kept for the window of tokens closest to the generated token.
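For intuition, here is a minimal sketch (our own illustration, not the repository's code) of how the two ratios could partition a KV-cache budget over a prompt; the function name and the assumption that both ratios are fractions of the prompt length are ours:

```python
# Hypothetical illustration of how hh_ratio and recent_ratio might
# split a KV-cache retention budget. Not taken from the D2O codebase.

def split_cache_budget(seq_len: int, hh_ratio: float, recent_ratio: float):
    """Return (num_heavy_hitter_tokens, num_recent_tokens) to keep."""
    num_hh = int(seq_len * hh_ratio)          # important ("heavy-hitter") tokens
    num_recent = int(seq_len * recent_ratio)  # most recent window of tokens
    return num_hh, num_recent

# With the example flags above (0.1 / 0.1) on a 4096-token prompt,
# roughly 20% of the cache is retained in total.
print(split_cache_budget(4096, 0.1, 0.1))
```

Under this reading, the `--action_name d2o_0.2` label in the example command matches the combined retention ratio (0.1 + 0.1).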

Then, evaluate the results:

python eval_long_bench.py --model Meta-Llama-3-8B_d2o_0.2 --e 

For tasks evaluated with the lm-evaluation-harness GitHub repository,
we recommend cloning the latest version:

git clone https://github.com/EleutherAI/lm-evaluation-harness.git

Then, follow the installation instructions provided in the repository and execute our algorithm accordingly.

Citation

@article{wan2024d2o,
  title={{D2O}: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models},
  author={Wan, Zhongwei and Wu, Xinjian and Zhang, Yu and Xin, Yi and Tao, Chaofan and Zhu, Zhihong and Wang, Xin and Luo, Siqi and Xiong, Jing and Zhang, Mi},
  journal={arXiv preprint arXiv:2406.13035},
  year={2024}
}

or

@inproceedings{wan2025text,
  title={$\text{D}_2\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models},
  author={Wan, Zhongwei and Wu, Xinjian and Zhang, Yu and Xin, Yi and Tao, Chaofan and Zhu, Zhihong and Wang, Xin and Luo, Siqi and Xiong, Jing and Wang, Longyue and others},
  booktitle={The Thirteenth International Conference on Learning Representations}
}
