Lawrence Liu1, Alexander Liu2, Mengdi Wang3, Tuo Zhao4, Lin F. Yang1
1 UCLA, 2 Independent, 3 Princeton University, 4 Georgia Tech
ARMOR was accepted to ICLR 2026!
This is the official implementation of ARMOR, a new 2:4 semi-structured pruning method that leverages adaptive matrix factorization. ARMOR significantly outperforms existing 2:4 semi-structured pruning methods such as Wanda and SparseGPT. Sample results for Qwen2.5 7B/14B/32B/72B are shown below; more results can be found in our paper.
- Python 3.13.2+
- Miniconda/Anaconda
- CUDA
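Since the exact Python version matters here, the sketch below shows one way to check that the active interpreter meets the 3.13.2+ requirement before installing anything. The `version_ge` helper is hypothetical and not part of the repo's scripts.

```shell
# Hypothetical helper (not part of the repo): check the active Python
# against the 3.13.2+ requirement before installing dependencies.
version_ge() {
  # True when version $1 >= version $2 (natural version ordering via sort -V)
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Works with either `python3` or `python` on PATH
CURRENT=$( (python3 --version 2>/dev/null || python --version 2>/dev/null) | awk '{print $2}')
if version_ge "$CURRENT" "3.13.2"; then
  echo "Python $CURRENT satisfies the requirement"
else
  echo "Python $CURRENT is too old; need 3.13.2+" >&2
fi
```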
We use two separate virtual environments: a main environment `ARMOR_main` for pruning and perplexity evaluation, and an environment `ARMOR_eval` for lm-evaluation-harness based evaluation. This is because there are dependency conflicts between hydra and certain tasks in lm-evaluation-harness, such as math.
```shell
# Clone the repository
git clone git@github.com:LawrenceRLiu/ARMOR.git
cd ARMOR

# ARMOR_main must be installed via pip due to dependency conflicts
conda create -n ARMOR_main python=3.13.2 -y
conda activate ARMOR_main
python -m pip install -r requirements_main.txt

# ARMOR_eval can be created from conda
conda env create -f eval_env.yml
```

We have included automated scripts to generate the calibration data and run the pruning and evaluation.
```shell
# llama-2 compression
scripts/replicate/compress_llama2.bash <model_name> <gpus>

# llama-3 compression
scripts/replicate/compress_llama3.bash <model_name> <gpus>

# Qwen2.5/Qwen3 compression
scripts/replicate/compress_qwen.bash <model_name> <gpus> [num_processes]
```

The Qwen script takes an optional argument num_processes that specifies how many processes to use for data-parallel evaluation. If not specified, it uses one process per available GPU. However, larger (70B+) models require both data parallelism and model parallelism, so you may need to set num_processes lower than the number of available GPUs.
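The default described above can be sketched as follows. The variable names are illustrative, not the actual internals of the repo's scripts: with no override, the process count falls back to the number of GPU ids in the comma-separated list.

```shell
# Illustrative sketch (hypothetical variable names, not the repo's script
# internals): one evaluation process per GPU id unless num_processes is given.
GPUS="0,1,2,3,4,5,6,7"   # example <gpus> argument
NUM_PROCESSES=""         # example [num_processes]; empty means "not specified"

if [ -z "$NUM_PROCESSES" ]; then
  # Default: count the comma-separated GPU ids
  NUM_PROCESSES=$(echo "$GPUS" | tr ',' '\n' | wc -l | tr -d ' ')
fi
echo "data-parallel processes: $NUM_PROCESSES"
# For a 70B+ model you might instead pass num_processes=2 so that each
# process shards the model across 4 of the 8 GPUs (model parallelism).
```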
We have also included a modified version of Wanda in the `wanda-main` folder. We modified the original Wanda codebase to work with the updated transformers and datasets libraries, and have included an automated script to run and evaluate Wanda and SparseGPT pruning on Qwen models.
```shell
# Qwen2.5/Qwen3 compression with Wanda or SparseGPT
scripts/baselines/qwen_compress.bash <model_name> <wanda/sparsegpt> [num_processes]
```

Since this repository was built on top of the NoWag repository, we have also included a script to run and evaluate NoWag pruning on Qwen models.
```shell
# Qwen2.5/Qwen3 compression with NoWag
scripts/baselines/NoWag_P.bash <model_name> [num_processes]
```

For questions or issues, please contact lawrencerliu@ucla.edu.

