BESA

This repository contains code to reproduce the key results of the paper "BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation", accepted at the International Conference on Learning Representations (ICLR) 2024.

Dependencies

  • torch: tested on v2.0.1+cu118
  • transformers: tested on v4.31.0
  • accelerate: tested on v0.21.0
  • datasets: tested on v2.14.4
  • timm: tested on v0.9.5
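
The tested versions can be installed with pip. The command below is only a sketch: the torch wheel index is specific to CUDA 11.8, so adjust it to your own CUDA setup.

pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
pip install transformers==4.31.0 accelerate==0.21.0 datasets==2.14.4 timm==0.9.5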

lm-evaluation-harness

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
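
The harness CLI differs across versions. On recent releases it exposes an lm_eval entry point, and a zero-shot run looks roughly like the following; the model name and task list here are placeholders, not part of the official instructions.

lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-2-7b-hf \
    --tasks hellaswag,winogrande \
    --batch_size 8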

Customized CUDA Operator

cd models/ops
python setup.py install
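
The extension is compiled against the local CUDA toolkit, so it helps to confirm that nvcc and the CUDA version PyTorch was built with agree before building. This is just a sanity check, not part of the official instructions.

nvcc --version
python -c "import torch; print(torch.version.cuda)"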

Usage

Here is the command to run the baseline experiments, followed by perplexity evaluation on WikiText2, PTB, and C4, as well as zero-shot task evaluation. See also the CMD-argument documentation.

bash main_exps.sh

Hardware Simulation

We use the ViTCoD accelerator to estimate the speed-up that can be obtained with a sparse accelerator. Here is the command to simulate the average runtime of each module in the pruned model.

python main_hw.py \
    --model-name MODEL_DIR_OR_HF_MODEL_NAME \
    --func SIMULATE_CHOICE

MODEL_DIR_OR_HF_MODEL_NAME is either a local checkpoint directory or a Hugging Face model name, and SIMULATE_CHOICE is one of q, k, v, o, gate, up, and down.
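
For example, to simulate the query projection modules, you could run the following; the Hugging Face model name below is only illustrative, and any local checkpoint directory works as well.

python main_hw.py \
    --model-name meta-llama/Llama-2-7b-hf \
    --func q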

Others

In the experiment section of our paper, we report results for row-wise sparsity, which customizes the sparsity of each row of a target layer's weight within the block. As an extension, we also report results for layer-wise sparsity, where every row of a target layer is assigned the same sparsity. The commands to run the layer-wise sparsity experiments can be found in the main_exps.sh script. Below, we present the perplexity results on the WikiText2 dataset.

Method             1-7B   1-13B  1-30B  1-65B  2-7B   2-13B  2-70B
Dense              5.68   5.09   4.10   3.53   5.47   4.88   3.31
SparseGPT          7.22   6.21   5.33   4.60   6.99   6.02   4.25
Wanda              7.26   6.15   5.25   4.60   6.92   5.97   4.22
BESA (layer-wise)  7.04   6.07   5.16   4.51   6.77   5.85   4.14
BESA (row-wise)    6.86   5.92   5.00   4.33   6.60   5.75   4.09

Acknowledgement

This repo benefits from SparseGPT, Prodigy, and ViTCoD. Thanks for their wonderful work.

About

BESA is a differentiable weight pruning technique for large language models.
