
ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization [ICLR 2026]

Lawrence Liu1, Alexander Liu2, Mengdi Wang3, Tuo Zhao4, Lin F. Yang1
1 UCLA, 2 Independent, 3 Princeton University, 4 Georgia Tech

ARMOR Overview

ARMOR was accepted to ICLR 2026!

Overview

This is the official implementation of ARMOR, a new 2:4 semi-structured pruning method that leverages adaptive matrix factorization. ARMOR significantly outperforms existing one-shot 2:4 semi-structured pruning methods such as Wanda and SparseGPT. Sample results for Qwen2.5 7B/14B/32B/72B are shown below; more results can be found in our paper.
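For context, 2:4 semi-structured sparsity keeps exactly two non-zero weights in every contiguous group of four, a pattern that modern NVIDIA GPUs can accelerate. The sketch below builds such a mask by simple magnitude ranking; this is for illustration only and is not ARMOR's factorization-based selection:

```python
import numpy as np

def mask_2_to_4(w: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude entries in each group of 4.

    Assumes the total number of elements is divisible by 4.
    """
    groups = w.reshape(-1, 4)
    # Indices of the 2 largest-magnitude entries per group of 4.
    keep = np.argsort(np.abs(groups), axis=1)[:, 2:]
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.array([[0.1, -2.0, 0.3, 1.5],
              [4.0,  0.0, -0.2, 3.0]])
print(mask_2_to_4(w))
# [[ 0.  -2.   0.   1.5]
#  [ 4.   0.   0.   3. ]]
```

Every row of four retains its two largest-magnitude weights, so the pruned matrix satisfies the 2:4 pattern exactly.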

Sample Results

Requirements

  • Python 3.13.2+
  • Miniconda/Anaconda
  • CUDA

Installation

We use two separate virtual environments: a main environment, ARMOR_main, for pruning and perplexity evaluation, and a second environment, ARMOR_eval, for evaluation with the LM Evaluation Harness. This split is necessary because of dependency conflicts between hydra and certain tasks in lm-evaluation-harness, such as math.

# Clone the repository
git clone git@github.com:LawrenceRLiu/ARMOR.git
cd ARMOR


# The ARMOR_main environment must be installed via pip due to dependency conflicts
conda create -n ARMOR_main python=3.13.2 -y
conda activate ARMOR_main
python -m pip install -r requirements_main.txt

# The ARMOR_eval environment can be created directly from conda
conda env create -f eval_env.yml

Usage

We have included automated scripts to generate the calibration data and run the pruning and evaluation.

# llama-2 compression 
scripts/replicate/compress_llama2.bash <model_name> <gpus>
# llama-3 compression
scripts/replicate/compress_llama3.bash <model_name> <gpus>
# Qwen2.5/Qwen3 compression 
scripts/replicate/compress_qwen.bash <model_name> <gpus> [num_processes]

The Qwen script takes an optional argument, num_processes, which specifies how many processes to use for data-parallel evaluation. If not specified, it uses as many processes as there are available GPUs. However, larger models (e.g., 70B-scale) require both data parallelism and model parallelism, so you may need to set num_processes below the number of available GPUs.
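For example, on a node with 8 GPUs, a 72B-scale model might need each evaluation process to span 2 GPUs, so you would halve the process count. A minimal sketch (the model name and GPU-list format below are assumptions; check the script for the exact arguments it expects):

```shell
# Use half as many evaluation processes as GPUs, so each process
# spans 2 GPUs (2-way model parallelism alongside data parallelism).
NUM_GPUS=8                       # e.g. from: nvidia-smi -L | wc -l
NUM_PROCESSES=$((NUM_GPUS / 2))
echo "$NUM_PROCESSES"

# Then (hypothetical model name and GPU-list format):
# scripts/replicate/compress_qwen.bash Qwen/Qwen2.5-72B 0,1,2,3,4,5,6,7 "$NUM_PROCESSES"
```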

Baselines

We have also included a modified version of Wanda in the wanda-main folder; we updated the original Wanda codebase to work with current versions of the transformers and datasets libraries. An automated script runs and evaluates Wanda and SparseGPT pruning on Qwen models.

# Qwen2.5/Qwen3 compression with Wanda or SparseGPT
scripts/baselines/qwen_compress.bash <model_name> <wanda/sparsegpt> [num_processes]

Since this repository was built on top of the NoWag repository, we have also included a script to run and evaluate NoWag pruning on Qwen models.

# Qwen2.5/Qwen3 compression with NoWag
scripts/baselines/NoWag_P.bash <model_name> [num_processes]

Contact And Additional Information

For questions or issues, please contact lawrencerliu@ucla.edu.
