PyTorch implementation of the paper "Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers". It includes the code and pretrained JSON lookup tables for approximating non-linear operations in quantized models.
Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers
Pingcheng Dong, Yonghao Tan, Dong Zhang, Tianwei Ni, Xuejiao Liu, Yu Liu, Peng Luo, Luhong Liang, Shih-Yang Liu, Xijie Huang, Huaiyu Zhu, Yun Pan, Fengwei An, Kwang-Ting Cheng
DAC 2024
Clone this repo with submodules:
git clone --recurse-submodules https://github.com/PingchengDong/GQA-LUT
cd GQA-LUT/
The code is tested with Python 3.7 and PyTorch 1.5. We recommend using Anaconda to make sure that all dependencies are in place. To create an Anaconda environment:
conda env create -f environment.yml
conda activate gqa-lut
Non-linear operations
├── GELU
├── HSwish
├── Sigmoid
├── Exponent
├── Reciprocal
├── Reciprocal of square root
└── ...
Example: to approximate GELU with 8 segpoints:
python gqa-lut.py --act_func 'gelu' --x_range -4 4 --sp_range -4.0 4.0 --num_splits 7 --decimal_bit_range 0 6 --total_iters 500 --mutate
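For context, GQA-LUT fits a piecewise-linear lookup table (LUT) to the target function; the genetic search chooses the segpoint positions and decimal bits, which the sketch below does not do. A minimal, hypothetical illustration with uniformly spaced segpoints (8 segpoints, i.e. 7 segments over [-4, 4]):

```python
import numpy as np

def gelu(x):
    # GELU via the common tanh approximation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Hypothetical uniform segpoints: 8 points -> 7 linear segments over [-4, 4].
# GQA-LUT searches these positions (and the decimal bits) genetically instead.
segpoints = np.linspace(-4.0, 4.0, 8)

def pwl_lut(x, sp, fn):
    # Piecewise-linear LUT: interpolate linearly between fn values at the segpoints
    return np.interp(x, sp, fn(sp))

x = np.linspace(-4.0, 4.0, 1001)
err = np.max(np.abs(pwl_lut(x, segpoints, gelu) - gelu(x)))
```

Even these naive uniform segpoints keep the maximum error on the order of 1e-1; the genetic, quantization-aware search in `gqa-lut.py` instead finds hardware-friendly segpoints under the given decimal-bit constraints.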
We've provided pretrained JSONs for several non-linear operations, with 8 and 16 segpoints, that are commonly used in neural networks; see the pretrained folder.
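The exact schema of these JSON files is defined by the repo; purely as an illustration (all field names below are hypothetical, not the actual schema), a piecewise-linear LUT can be round-tripped through JSON like this:

```python
import json

# Hypothetical LUT record: field names are illustrative only,
# not the actual schema of the files in the pretrained folder.
lut = {
    "act_func": "gelu",
    "segpoints": [-4.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 4.0],
    "decimal_bit": 5,
}

text = json.dumps(lut, indent=2)   # serialize to a JSON string
restored = json.loads(text)        # ... and load it back
```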
To help you reproduce our results as closely as possible, we provide a Makefile with the parameter settings and invocation commands for each non-linear function supported by the GQA-LUT code above.
For example, to run the GQA-LUT approximation of the GELU function with 8 segpoints:
make gelu_8
@inproceedings{dong2024gqalut,
author = {Dong, Pingcheng and Tan, Yonghao and Zhang, Dong and Ni, Tianwei and Liu, Xuejiao and Liu, Yu and Luo, Peng and Liang, Luhong and Liu, Shih-Yang and Huang, Xijie and Zhu, Huaiyu and Pan, Yun and An, Fengwei and Cheng, Kwang-Ting},
title = {Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers},
booktitle = {Design Automation Conference (DAC)},
year = {2024}
}