MA-BERT

This GitHub repository contains the pretrained MA-BERT models from the paper MA-BERT: Towards Matrix Arithmetic-only BERT Inference by Eliminating Complex Non-linear Functions. In particular, three pretrained checkpoints are released in the Pretrained Checkpoints section:

  1. MA-BERT
  2. MA-BERT (Shared Softmax)
  3. MA-DistilBERT

In MA-BERT, we propose four correlated techniques:

  1. Approximating softmax with a two-layer neural network
  2. Replacing GELU with ReLU
  3. Fusing normalization layers with adjacent linear layers
  4. Leveraging knowledge transfer from baseline models

Through these techniques, we eliminate the major non-linear functions in BERT and obtain MA-BERT, which requires only matrix arithmetic and trivial ReLU operations. Our experimental results show that MA-BERT achieves more efficient inference with comparable accuracy on many downstream tasks compared to the baseline BERT models.
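
To make the first technique concrete, the sketch below shows the general shape of a two-layer softmax approximation. It is a minimal illustration under our own assumptions (layer sizes, input handling, and training details), not the exact network from the paper.

    import torch
    import torch.nn as nn

    class TwoLayerSoftmaxApprox(nn.Module):
        # Illustrative stand-in for row-wise softmax over attention scores:
        # two linear layers and a ReLU, i.e. matrix arithmetic only.
        def __init__(self, seq_len):
            super().__init__()
            self.fc1 = nn.Linear(seq_len, seq_len)
            self.fc2 = nn.Linear(seq_len, seq_len)

        def forward(self, scores):
            # scores: (batch, num_heads, seq_len, seq_len) attention logits.
            # The network is trained (e.g. via knowledge transfer) to mimic
            # softmax(scores, dim=-1) without exponentials or divisions.
            return self.fc2(torch.relu(self.fc1(scores)))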

Loading Instructions

To load MA-BERT and MA-BERT (Shared Softmax):

  1. Download the ma-bert folder and its pretrained checkpoint
  2. Move the folder to the BERT folder in the transformers library: transformers/models/bert
  3. Execute the code in loading_example.ipynb

To load MA-DistilBERT:

  1. Download the ma-distilbert folder and its pretrained checkpoint
  2. Move the folder to the DistilBERT folder in the transformers library: transformers/models/distilbert
  3. Execute the code in loading_example.ipynb
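
In both cases, the notebook boils down to constructing the MA model and loading the checkpoint weights into it. The sketch below is only a guess at what this looks like: the module path, class name, and checkpoint file name are assumptions, so defer to loading_example.ipynb for the exact code.

    import torch
    # Assumed import path after moving the ma-bert folder into
    # transformers/models/bert; the real module and class names are
    # defined by this repository, not by the transformers library.
    from transformers.models.bert.ma_bert import MABertForSequenceClassification

    # Build the MA-BERT architecture, then load the released checkpoint
    # (file name assumed) on top of it.
    model = MABertForSequenceClassification.from_pretrained("bert-base-uncased")
    state_dict = torch.load("ma_bert.ckpt", map_location="cpu")
    model.load_state_dict(state_dict, strict=False)
    model.eval()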

Pretrained Checkpoints

The following are the links to our pretrained checkpoints:

  • MA-BERT
  • MA-BERT (Shared Softmax)
  • MA-DistilBERT

Evaluation on GLUE and IMDb

The GLUE benchmark and the IMDb sentiment classification task were used to evaluate MA-BERT.

  • val_glue.py: Example Python script for finetuning MA-BERT on the GLUE tasks
  • val_imdb.py: Example Python script for finetuning MA-BERT on the IMDb sentiment classification task

Both scripts require the following command-line arguments:

  • student_ckpt_file: Path to the pretrained checkpoint file of MA-BERT or MA-DistilBERT
  • teacher_ckpt_file: Path to the finetuned checkpoint file of BERT or DistilBERT
  • save_dir: Path to the directory where results are saved
  • file_name: Name of the output folder
  • model: "bert-base-uncased" or "distilbert-base-uncased"
  • epoch: Number of epochs for finetuning, 10 (CoLA, MRPC, STS-B, RTE) or 5 (otherwise)
  • learning_rate: Learning rate for finetuning, default: 2e-5
  • batch_size: Batch size for finetuning, 16 (CoLA, MRPC, STS-B, RTE) or 32 (otherwise)
  • KD_alpha: Alpha term used in the knowledge transfer, default: 0.9
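
A hypothetical invocation of val_glue.py on MRPC is shown below. The flag syntax, checkpoint paths, and file names are placeholders; check the script's argument parser for the exact interface.

    python val_glue.py \
      --student_ckpt_file ./checkpoints/ma_bert.ckpt \
      --teacher_ckpt_file ./checkpoints/bert_base_mrpc.ckpt \
      --save_dir ./results \
      --file_name ma_bert_mrpc \
      --model bert-base-uncased \
      --epoch 10 \
      --learning_rate 2e-5 \
      --batch_size 16 \
      --KD_alpha 0.9

Here KD_alpha weights the knowledge-transfer term: in the standard formulation, the training loss is KD_alpha * distillation_loss + (1 - KD_alpha) * task_loss, though the exact loss used by the scripts may differ.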

Citations

    @inproceedings{ming2023mabert,
      title={{MA}-{BERT}: Towards Matrix Arithmetic-only {BERT} Inference by Eliminating Complex Non-linear Functions},
      author={Neo Wei Ming and Zhehui Wang and Cheng Liu and Rick Siow Mong Goh and Tao Luo},
      booktitle={International Conference on Learning Representations},
      year={2023},
      url={https://openreview.net/forum?id=HtAfbHa7LAL}
    }