As their size increases, Large Language Models (LLMs) are natural candidates for network pruning methods: approaches that drop a subset of network weights while striving to preserve performance. Existing methods, however, require either retraining, which is rarely affordable for billion-scale LLMs, or solving a weight reconstruction problem reliant on second-order information, which may also be computationally expensive. In this paper, we introduce a novel, straightforward yet effective pruning method, termed Wanda (Pruning by Weights and activations), designed to induce sparsity in pretrained LLMs. Motivated by the recent observation of emergent large-magnitude features in LLMs, our approach prunes weights with the smallest magnitudes multiplied by the corresponding input activations, on a per-output basis. Notably, Wanda requires no retraining or weight update, and the pruned LLM can be used as is. We conduct a thorough evaluation of our method on LLaMA across various language benchmarks. Wanda significantly outperforms the established baseline of magnitude pruning and competes favorably against recent methods involving intensive weight updates. Code is available at https://github.com/locuslab/wanda.
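As a rough illustration of the per-output criterion described in the abstract, here is a minimal PyTorch sketch. It is not the authors' implementation (see the linked repository for the official code); the function name `wanda_prune_layer`, the calibration tensor `X`, and the 50% unstructured sparsity level are assumptions made for the example.

```python
import torch

def wanda_prune_layer(W: torch.Tensor, X: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Sketch of Wanda-style pruning for one linear layer.

    W: weight matrix, shape (out_features, in_features)
    X: calibration activations, shape (num_calibration_tokens, in_features)
    sparsity: fraction of weights to drop within each output row
    """
    # Per-input-feature L2 norm of the calibration activations.
    act_norm = X.norm(p=2, dim=0)                    # shape: (in_features,)

    # Importance score: weight magnitude times the matching input activation norm.
    score = W.abs() * act_norm.unsqueeze(0)          # shape: (out_features, in_features)

    # Per-output comparison: within each row, drop the lowest-scoring weights.
    num_prune = int(W.shape[1] * sparsity)
    _, prune_idx = torch.topk(score, num_prune, dim=1, largest=False)

    mask = torch.ones_like(W)
    mask.scatter_(1, prune_idx, 0.0)                 # zero out the pruned positions
    return W * mask                                  # no retraining or weight update


# Toy usage with random tensors; real usage would take a pretrained layer's weight
# and activations gathered from a small calibration set.
W = torch.randn(8, 16)
X = torch.randn(128, 16)
W_pruned = wanda_prune_layer(W, X, sparsity=0.5)
```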
URL
Affiliations
Abstract
Translation (by gpt-3.5-turbo)
Summary (by gpt-3.5-turbo)