# LLMs Quantization Recipes

Intel® Neural Compressor supports advanced large language model (LLM) quantization technologies, including SmoothQuant (SQ) and Weight-Only Quantization (WOQ), and has verified a list of LLMs on the 4th Gen Intel® Xeon® Scalable Processor (codenamed Sapphire Rapids) with PyTorch, Intel® Extension for PyTorch, and Intel® Extension for Transformers.
This document publishes the specific recipes we achieved for popular LLMs, so that users can quickly obtain an optimized LLM with accuracy loss limited to 1%.
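
As a concrete illustration, the sketch below applies SmoothQuant INT8 post-training quantization, assuming Intel® Neural Compressor 2.x's `PostTrainingQuantConfig` API. The model name, the calibration text, and the `alpha` value are illustrative placeholders, not the tuned per-model recipes this document refers to.

```python
# A minimal SmoothQuant (SQ) INT8 sketch, assuming the Intel Neural Compressor 2.x
# PostTrainingQuantConfig API; model, calibration data, and alpha are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor import PostTrainingQuantConfig, quantization

model_name = "EleutherAI/gpt-j-6b"  # any model from the recipe table below
model = AutoModelForCausalLM.from_pretrained(model_name, torchscript=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

class CalibDataloader:
    """Tiny stand-in calibration loader; real recipes use a proper calibration set."""
    batch_size = 1
    def __iter__(self):
        yield tokenizer("Intel Neural Compressor is", return_tensors="pt")["input_ids"]

conf = PostTrainingQuantConfig(
    backend="ipex",  # execute through Intel Extension for PyTorch
    recipes={
        "smooth_quant": True,
        "smooth_quant_args": {"alpha": 0.5},  # the tuned alpha is model-specific
    },
)
q_model = quantization.fit(model, conf, calib_dataloader=CalibDataloader())
q_model.save("./saved_sq_int8")
```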

## Large Language Models Recipes

✔: a verified recipe is published; WIP: the recipe is still in progress.

| Models | SQ INT8 | WOQ INT8 | WOQ INT4 |
|---|:---:|:---:|:---:|
| EleutherAI/gpt-j-6b | ✔ | ✔ | ✔ |
| facebook/opt-1.3b | ✔ | ✔ | ✔ |
| facebook/opt-30b | ✔ | ✔ | ✔ |
| meta-llama/Llama-2-7b-hf | ✔ | ✔ | ✔ |
| meta-llama/Llama-2-13b-hf | ✔ | ✔ | ✔ |
| meta-llama/Llama-2-70b-hf | ✔ | ✔ | ✔ |
| tiiuae/falcon-7b | ✔ | ✔ | ✔ |
| tiiuae/falcon-40b | ✔ | ✔ | ✔ |
| baichuan-inc/Baichuan-13B-Chat | ✔ | ✔ | ✔ |
| baichuan-inc/Baichuan2-13B-Chat | ✔ | ✔ | ✔ |
| baichuan-inc/Baichuan2-7B-Chat | ✔ | ✔ | ✔ |
| bigscience/bloom-1b7 | ✔ | ✔ | ✔ |
| databricks/dolly-v2-12b | | ✔ | |
| EleutherAI/gpt-neox-20b | | ✔ | ✔ |
| mistralai/Mistral-7B-v0.1 | | ✔ | ✔ |
| THUDM/chatglm2-6b | WIP | ✔ | ✔ |
| THUDM/chatglm3-6b | WIP | ✔ | ✔ |

Detailed recipes can be found HERE; a hedged configuration sketch for the WOQ columns follows the notes below.

Notes:

- This model list comes from IPEX.
- The WIP recipes will be published soon.
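
For the WOQ INT8/INT4 columns, the sketch below shows what a weight-only configuration looks like, again assuming the Intel® Neural Compressor 2.x API. The bits, group size, scheme, and algorithm shown are illustrative defaults, not the published per-model recipes.

```python
# A minimal Weight-Only Quantization (WOQ) sketch, assuming the Intel Neural
# Compressor 2.x weight-only approach; all values shown are illustrative defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor import PostTrainingQuantConfig, quantization

model_name = "EleutherAI/gpt-j-6b"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

class CalibDataloader:
    """Tiny stand-in calibration loader, as in the SmoothQuant sketch above."""
    batch_size = 1
    def __iter__(self):
        yield tokenizer("Intel Neural Compressor is", return_tensors="pt")["input_ids"]

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # apply to every quantizable op type
            "weight": {
                "bits": 4,            # 4 -> WOQ INT4; use 8 for WOQ INT8
                "group_size": 128,    # group-wise quantization granularity
                "scheme": "sym",      # symmetric weight quantization
                "algorithm": "GPTQ",  # "RTN" and "AWQ" are other WOQ algorithms
            },
        },
    },
)
q_model = quantization.fit(model, conf, calib_dataloader=CalibDataloader())
q_model.save("./saved_woq_int4")
```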

## Large Language Models Accuracy

All accuracies are measured on the lambada_openai task; each Ratio column is the quantized accuracy divided by the FP32 accuracy.

| Model | FP32 ACC | SQ INT8 ACC | SQ INT8 Ratio | WOQ INT8 ACC | WOQ INT8 Ratio | WOQ INT4 GPTQ ACC | WOQ INT4 GPTQ Ratio | WOQ INT4 AutoRound ACC | WOQ INT4 AutoRound Ratio |
|---|---|---|---|---|---|---|---|---|---|
| baichuan-inc/Baichuan-13B-Chat | 67.57% | 68.23% | 1.0098 | 67.57% | 1.0000 | 67.84% | 1.0040 | NA | NA |
| baichuan-inc/Baichuan2-13B-Chat | 71.51% | 70.89% | 0.9913 | 71.53% | 1.0003 | 71.76% | 1.0035 | NA | NA |
| baichuan-inc/Baichuan2-7B-Chat | 67.67% | 67.96% | 1.0043 | 67.59% | 0.9988 | 67.24% | 0.9936 | 67.42% | 0.9963 |
| bigscience/bloom-1b7 | 46.34% | 47.99% | 1.0356 | 46.38% | 1.0009 | 46.19% | 0.9968 | NA | NA |
| databricks/dolly-v2-12b | 64.35% | NA | NA | 64.10% | 0.9961 | NA | NA | NA | NA |
| EleutherAI/gpt-j-6b | 68.31% | 68.33% | 1.0003 | 68.23% | 0.9988 | 68.79% | 1.0070 | 68.43% | 1.0018 |
| EleutherAI/gpt-neox-20b | 72.33% | NA | NA | 72.25% | 0.9989 | 71.96% | 0.9949 | NA | NA |
| facebook/opt-1.3b | 57.89% | 57.54% | 0.9940 | 58.08% | 1.0033 | 58.57% | 1.0117 | NA | NA |
| facebook/opt-30b | 71.49% | 71.51% | 1.0003 | 71.51% | 1.0003 | 71.82% | 1.0046 | 72.11% | 1.0087 |
| meta-llama/Llama-2-13b-hf | 76.77% | 76.25% | 0.9932 | 76.75% | 0.9997 | 77.43% | 1.0086 | 76.75% | 0.9997 |
| meta-llama/Llama-2-70b-hf | 79.64% | 79.55% | 0.9989 | 79.57% | 0.9991 | 80.09% | 1.0057 | 79.97% | 1.0041 |
| meta-llama/Llama-2-7b-hf | 73.92% | 73.45% | 0.9936 | 73.96% | 1.0005 | 73.45% | 0.9936 | 73.49% | 0.9942 |
| mistralai/Mistral-7B-v0.1 | 75.90% | NA | NA | 75.80% | 0.9987 | 76.13% | 1.0030 | 75.61% | 0.9962 |
| THUDM/chatglm2-6b | 53.23% | NA | NA | 53.19% | 0.9992 | 52.77% | 0.9914 | 53.35% | 1.0023 |
| THUDM/chatglm3-6b | 59.09% | NA | NA | 59.01% | 0.9986 | NA | NA | 58.61% | 0.9919 |
| tiiuae/falcon-40b | 77.22% | 77.04% | 0.9977 | 77.22% | 1.0000 | 77.94% | 1.0093 | 78.79% | 1.0203 |
| tiiuae/falcon-7b | 74.67% | 76.44% | 1.0237 | 74.77% | 1.0013 | 75.00% | 1.0044 | NA | NA |
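
As a worked check of the table above, the helper below (hypothetical, not part of the published recipes) reproduces a Ratio value and tests it against the 1% accuracy-loss goal.

```python
# Illustrative helper: Ratio = quantized ACC / FP32 ACC, rounded to 4 places.
def accuracy_ratio(quant_acc: float, fp32_acc: float) -> float:
    return round(quant_acc / fp32_acc, 4)

# e.g. EleutherAI/gpt-j-6b under SQ INT8: 68.33% vs. 68.31% FP32
assert accuracy_ratio(68.33, 68.31) == 1.0003
assert accuracy_ratio(68.33, 68.31) >= 0.99  # within the 1% accuracy-loss goal
```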