# 🪄 AWQ: Activation-aware Weight Quantization (2023)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adiel2012/model-quantization/blob/main/chronology/awq_demo.ipynb)

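AWQ starts from the observation that weights are not equally important to a model's output. It uses activation statistics from a small calibration set to find the salient weight channels (those multiplied by the largest activations) and protects them by scaling them up before quantization, with the inverse scale folded into the activations, which lowers their relative quantization error. This often gives better accuracy than GPTQ, especially for smaller models.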
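
The effect of protecting salient channels can be illustrated with a toy example. The sketch below is a simplified illustration, not AutoAWQ's implementation: it assumes a random weight matrix, synthetic calibration activations in which two channels are much larger than the rest, and a scale of the form `act_mag ** alpha` chosen by grid search over `alpha`. All names (`quantize_row`, `output_error`, `act_mag`) are hypothetical.

```python
import random

def quantize_row(row, n_bits=4, group_size=8):
    """Asymmetric (zero-point) round-to-nearest quantization, one group at a time."""
    levels = 2 ** n_bits - 1
    out = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        lo, hi = min(group), max(group)
        step = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
        out.extend(round((w - lo) / step) * step + lo for w in group)
    return out

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def output_error(W, X, scales):
    """Mean squared output error when each input channel of W is multiplied
    by its scale before quantization and the activations are divided by it."""
    W_q = [quantize_row([w * s for w, s in zip(row, scales)]) for row in W]
    total, count = 0.0, 0
    for x in X:
        ref = matvec(W, x)
        approx = matvec(W_q, [xi / s for xi, s in zip(x, scales)])
        total += sum((a - b) ** 2 for a, b in zip(ref, approx))
        count += len(ref)
    return total / count

rng = random.Random(0)
n_in, n_out, n_samples = 32, 16, 64
W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]

# Assumption: synthetic activations where channels 3 and 17 are "salient".
channel_gain = [1.0] * n_in
channel_gain[3] = channel_gain[17] = 20.0
X = [[rng.gauss(0, 1) * g for g in channel_gain] for _ in range(n_samples)]

# Per-channel mean absolute activation over the calibration samples.
act_mag = [sum(abs(x[j]) for x in X) / n_samples for j in range(n_in)]

# Grid search over alpha; alpha = 0 gives scale 1 (plain quantization).
results = {}
for k in range(11):
    alpha = k / 10
    results[alpha] = output_error(W, X, [m ** alpha for m in act_mag])

baseline_err = results[0.0]
best_alpha = min(results, key=results.get)
best_err = results[best_alpha]
print(f"plain round-to-nearest error: {baseline_err:.4f}")
print(f"best alpha = {best_alpha}, error: {best_err:.4f}")
```

Because `alpha = 0` is part of the grid, the searched result can never be worse than plain round-to-nearest on the calibration data; how much it improves depends on how uneven the activation magnitudes are.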

In this notebook, we use `AutoAWQ` to quantize an `OPT-125M` model.

In [None]:
!pip install autoawq transformers -q

In [None]:
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import torch

model_id = "facebook/opt-125m"
quant_path = "opt-125m-awq"
# 4-bit weights, one scale and zero point per group of 128 weights, GEMM kernel layout
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# 1. Load and Quantize
print("--- Loading and Quantizing with AWQ ---")
model = AutoAWQForCausalLM.from_pretrained(model_id)
# OPT is supported natively by transformers, so trust_remote_code is not needed
tokenizer = AutoTokenizer.from_pretrained(model_id)

# No calib_data is passed, so AutoAWQ falls back to its default calibration dataset
model.quantize(tokenizer, quant_config=quant_config)

# 2. Save (Optional for demo)
# model.save_quantized(quant_path)

In [None]:
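print("--- AWQ Inference ---")
# Assumes a CUDA GPU is available and the quantized model is on it
tokens = tokenizer("Deep learning quantization is", return_tensors="pt").to("cuda")
out = model.generate(**tokens, max_new_tokens=20)
# Decode the full sequence (prompt plus the 20 generated tokens)
print(tokenizer.decode(out[0]))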
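
To get a feel for what `w_bit=4` and `q_group_size=128` in `quant_config` mean for storage, the per-weight cost can be estimated by hand. The sketch below is back-of-the-envelope arithmetic, not a measurement of the saved checkpoint: it assumes one FP16 scale and one zero point of `w_bit` bits per group, and it ignores layers that stay unquantized as well as any packing overhead. The helper `effective_bits_per_weight` is hypothetical.

```python
def effective_bits_per_weight(w_bit, group_size, scale_bits=16, zero_point=True):
    """Average storage per quantized weight, including per-group metadata."""
    # Assumption: each group stores one scale and, optionally, one zero point.
    overhead = scale_bits + (w_bit if zero_point else 0)
    return w_bit + overhead / group_size

bits = effective_bits_per_weight(w_bit=4, group_size=128)
print(f"{bits:.5f} bits per weight")                       # 4.15625 bits per weight
print(f"{16 / bits:.2f}x smaller than FP16 (quantized layers only)")  # 3.85x
```

Smaller groups track the weight distribution more closely but raise the metadata overhead; under the same assumptions, a group size of 32 would cost 4.625 bits per weight.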