MA-BERT

This GitHub repository contains the pretrained MA-BERT models from the paper MA-BERT: Towards Matrix Arithmetic-only BERT Inference by Eliminating Complex Non-linear Functions. In particular, three pretrained checkpoints are released in the Pretrained Checkpoints section:

  1. MA-BERT
  2. MA-BERT (Shared Softmax)
  3. MA-DistilBERT

In MA-BERT, we propose four correlated techniques:

  1. Approximating softmax with a two-layer neural network
  2. Replacing GELU with ReLU
  3. Fusing normalization layers with adjacent linear layers
  4. Leveraging knowledge transfer from baseline models

Through these techniques, we eliminate the major non-linear functions in BERT and obtain MA-BERT, which requires only matrix arithmetic and trivial ReLU operations. Our experimental results show that MA-BERT achieves more efficient inference with comparable accuracy on many downstream tasks compared to the baseline BERT models.
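
To make the first technique concrete, the sketch below shows the general shape of a two-layer softmax approximation. It is a minimal illustration under our own assumptions (layer sizes, input handling, and training details), not the exact network from the paper.

    import torch
    import torch.nn as nn

    class TwoLayerSoftmaxApprox(nn.Module):
        # Illustrative stand-in for row-wise softmax over attention scores:
        # two linear layers and a ReLU, i.e. matrix arithmetic only.
        def __init__(self, seq_len):
            super().__init__()
            self.fc1 = nn.Linear(seq_len, seq_len)
            self.fc2 = nn.Linear(seq_len, seq_len)

        def forward(self, scores):
            # scores: (batch, num_heads, seq_len, seq_len) attention logits.
            # The network is trained (e.g. via knowledge transfer) to mimic
            # softmax(scores, dim=-1) without exponentials or divisions.
            return self.fc2(torch.relu(self.fc1(scores)))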

Loading Instructions

To load MA-BERT and MA-BERT (Shared Softmax):

  1. Download the ma-bert folder and its pretrained checkpoint
  2. Move the folder to the BERT folder in the transformers library: transformers/models/bert
  3. Execute the code in loading_example.ipynb

To load MA-DistilBERT:

  1. Download the ma-distilbert folder and its pretrained checkpoint
  2. Move the folder to the DistilBERT folder in the transformers library: transformers/models/distilbert
  3. Execute the code in loading_example.ipynb
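
In both cases, the notebook boils down to constructing the MA model and loading the checkpoint weights into it. The sketch below is only a guess at what this looks like: the module path, class name, and checkpoint file name are assumptions, so defer to loading_example.ipynb for the exact code.

    import torch
    # Assumed import path after moving the ma-bert folder into
    # transformers/models/bert; the real module and class names are
    # defined by this repository, not by the transformers library.
    from transformers.models.bert.ma_bert import MABertForSequenceClassification

    # Build the MA-BERT architecture, then load the released checkpoint
    # (file name assumed) on top of it.
    model = MABertForSequenceClassification.from_pretrained("bert-base-uncased")
    state_dict = torch.load("ma_bert.ckpt", map_location="cpu")
    model.load_state_dict(state_dict, strict=False)
    model.eval()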

Pretrained Checkpoints

The following are the links to our pretrained checkpoints:

  • MA-BERT
  • MA-BERT (Shared Softmax)
  • MA-DistilBERT

Evaluation on GLUE and IMDb

The GLUE benchmark and the IMDb sentiment classification task were used to evaluate MA-BERT.

  • val_glue.py: Example Python script for finetuning MA-BERT on the GLUE tasks
  • val_imdb.py: Example Python script for finetuning MA-BERT on the IMDb sentiment classification task

Both scripts require the following command-line arguments:

  • student_ckpt_file: Path to the pretrained checkpoint file of MA-BERT or MA-DistilBERT
  • teacher_ckpt_file: Path to the finetuned checkpoint file of BERT or DistilBERT
  • save_dir: Path to the directory where results are saved
  • file_name: Name of the output folder
  • model: "bert-base-uncased" or "distilbert-base-uncased"
  • epoch: Number of epochs for finetuning, 10 (CoLA, MRPC, STS-B, RTE) or 5 (otherwise)
  • learning_rate: Learning rate for finetuning, default: 2e-5
  • batch_size: Batch size for finetuning, 16 (CoLA, MRPC, STS-B, RTE) or 32 (otherwise)
  • KD_alpha: Alpha term used in the knowledge transfer, default: 0.9
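
A hypothetical invocation of val_glue.py on MRPC is shown below. The flag syntax, checkpoint paths, and file names are placeholders; check the script's argument parser for the exact interface.

    python val_glue.py \
      --student_ckpt_file ./checkpoints/ma_bert.ckpt \
      --teacher_ckpt_file ./checkpoints/bert_base_mrpc.ckpt \
      --save_dir ./results \
      --file_name ma_bert_mrpc \
      --model bert-base-uncased \
      --epoch 10 \
      --learning_rate 2e-5 \
      --batch_size 16 \
      --KD_alpha 0.9

Here KD_alpha weights the knowledge-transfer term: in the standard formulation, the training loss is KD_alpha * distillation_loss + (1 - KD_alpha) * task_loss, though the exact loss used by the scripts may differ.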

Citations

    @inproceedings{ming2023mabert,
      title={{MA}-{BERT}: Towards Matrix Arithmetic-only {BERT} Inference by Eliminating Complex Non-linear Functions},
      author={Neo Wei Ming and Zhehui Wang and Cheng Liu and Rick Siow Mong Goh and Tao Luo},
      booktitle={International Conference on Learning Representations},
      year={2023},
      url={https://openreview.net/forum?id=HtAfbHa7LAL}
    }