The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter

Paper (NeurIPS 2023): https://arxiv.org/abs/2306.03805

Abstract

Large pre-trained transformers are the show-stealers of modern-day deep learning, and it becomes crucial to comprehend the parsimonious patterns that exist within them as they grow in scale. With exploding parameter counts, the Lottery Ticket Hypothesis (LTH) and its variants have lost their pragmatism in sparsifying them, due to the high computation and memory bottleneck of the repetitive train-prune-retrain routine of iterative magnitude pruning (IMP), which worsens with increasing model size. In this paper, we comprehensively study induced sparse patterns across multiple large pre-trained vision and language transformers. We propose the existence of "essential sparsity", defined by a sharp dropping point beyond which performance declines much faster with respect to the rise of the sparsity level, when we directly remove the weights with the smallest magnitudes in one shot. We also present an intriguing emerging phenomenon of abrupt sparsification during the pre-training of BERT, i.e., BERT suddenly becomes heavily sparse after a certain number of pre-training iterations. Moreover, our observations indicate the counter-intuitive finding that BERT trained with a larger amount of pre-training data tends to have a better ability to condense knowledge into comparatively fewer parameters. Lastly, we investigate the effect of the pre-training loss on essential sparsity and discover that self-supervised learning (SSL) objectives trigger stronger emergent sparsification properties than supervised learning (SL).
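For reference, the snippet below is a minimal sketch of the one-shot global magnitude pruning studied in the paper, assuming PyTorch and Hugging Face Transformers; it is an illustration, not the repository's exact implementation. It pools all linear-layer weights and zeroes out the smallest-magnitude fraction in a single step.

import torch
from transformers import AutoModel

def one_shot_magnitude_prune(model, sparsity):
    # Zero out the `sparsity` fraction of weights with the smallest magnitudes,
    # pooled globally over all linear-layer weight matrices.
    weights = [m.weight for m in model.modules() if isinstance(m, torch.nn.Linear)]
    scores = torch.cat([w.detach().abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(scores, k).values
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).float())

model = AutoModel.from_pretrained("bert-base-uncased")
one_shot_magnitude_prune(model, sparsity=0.5)  # remove the 50% smallest-magnitude weights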


Installation

Our implementation is based on the Hugging Face Transformers repository. For further details, please refer to its README.

With pip

First, you need to install at least one of TensorFlow 2.0 and PyTorch. Please refer to the TensorFlow installation page and/or the PyTorch installation page for the specific install command for your platform.

When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:

pip install transformers
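As a quick sanity check of the installation (assuming the PyTorch backend), the lines below load the bert-base-uncased tokenizer and model used by the commands that follow, which also caches the weights locally:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
print(sum(p.numel() for p in model.parameters()), "parameters loaded")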

GLUE task:

python -u bert_analysis.py \
       --output_dir tmp/mnli \
       --logging_steps <ADD_VALUE> \
       --task_name MNLI \
       --do_lower_case \
       --data_dir glue_data/MNLI \
       --model_type bert \
       --model_name_or_path bert-base-uncased \
       --max_seq_length <ADD_VALUE> \
       --learning_rate 2e-5 \
       --num_train_epochs <ADD_VALUE> \
       --overwrite_output_dir \
       --evaluate_during_training \
       --save_steps <ADD_VALUE> \
       --eval_all_checkpoints \
       --seed <ADD_VALUE>
SQuAD task:

python -u squad_analysis.py \
       --output_dir <ADD_VALUE> \
       --model_type bert \
       --model_name_or_path bert-base-uncased \
       --do_train \
       --do_eval \
       --do_lower_case \
       --train_file SQuAD/train-v1.1.json \
       --predict_file SQuAD/dev-v1.1.json \
       --per_gpu_train_batch_size <ADD_VALUE> \
       --learning_rate 3e-5 \
       --num_train_epochs <ADD_VALUE> \
       --max_seq_length <ADD_VALUE> \
       --doc_stride 128 \
       --evaluate_during_training \
       --eval_all_checkpoints \
       --overwrite_output_dir \
       --logging_steps <ADD_VALUE> \
       --save_steps <ADD_VALUE> \
       --seed <ADD_VALUE>
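After fine-tuning with either command above, one way to probe for the sharp dropping point that defines essential sparsity is to prune the saved checkpoint at increasing one-shot sparsity levels and re-evaluate at each level. The sketch below is only an illustration: the checkpoint directory tmp/mnli and the evaluate hook are placeholders for your own fine-tuned output directory and task metric, and this is not the repository's pipeline.

import torch.nn.utils.prune as prune
from torch import nn
from transformers import AutoModelForSequenceClassification

for sparsity in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7):
    # Reload the fine-tuned checkpoint so each sparsity level is pruned from dense weights.
    model = AutoModelForSequenceClassification.from_pretrained("tmp/mnli")
    params = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=sparsity)
    for module, name in params:
        prune.remove(module, name)  # bake the pruning mask into the weights
    # score = evaluate(model)  # plug in the GLUE/SQuAD evaluation used above
    # print(sparsity, score)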

Citation

If you use this code for your research, please cite our paper:

@article{jaiswal2023emergence,
  title={The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter},
  author={Jaiswal, Ajay and Liu, Shiwei and Chen, Tianlong and Wang, Zhangyang},
  journal={arXiv preprint arXiv:2306.03805},
  year={2023}
}
