# 🧊 NF4 & QLoRA: 4-bit Quantization (2023)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adiel2012/model-quantization/blob/main/chronology/nf4_demo.ipynb)

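NormalFloat 4-bit (NF4) is a data type introduced with QLoRA. It is information-theoretically optimal for zero-centered, normally distributed values, which pretrained neural network weights approximately are. QLoRA combines NF4 with Double Quantization (quantizing the quantization constants themselves) and Paged Optimizers (moving optimizer state to CPU memory when GPU memory spikes), which makes fine-tuning large models feasible on a single GPU, including consumer hardware.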
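
To make the idea of a quantile-based data type concrete, here is a minimal NumPy sketch of blockwise 4-bit quantization with a normal-quantile codebook. It is a simplified illustration, not the `bitsandbytes` implementation: the helper names (`normal_quantile_codebook`, `quantize_blockwise`, `dequantize_blockwise`) are hypothetical, the block size of 64 is an assumption taken from the QLoRA paper, and the real NF4 codebook is constructed differently so that it contains an exact zero.

```python
import numpy as np
from statistics import NormalDist


def normal_quantile_codebook(bits: int = 4) -> np.ndarray:
    """Simplified codebook: evenly spaced quantiles of N(0, 1), rescaled to [-1, 1].

    Assumption: this only approximates the idea behind NF4; it is not the
    exact set of 16 values used by bitsandbytes.
    """
    n = 2 ** bits
    # Midpoint probabilities avoid the infinite quantiles at p=0 and p=1.
    probs = (np.arange(n) + 0.5) / n
    levels = np.array([NormalDist().inv_cdf(p) for p in probs])
    return levels / np.abs(levels).max()


def quantize_blockwise(w: np.ndarray, codebook: np.ndarray, block_size: int = 64):
    """Map each weight to the index of its nearest codebook level, per block."""
    blocks = w.reshape(-1, block_size)
    # One absmax scale per block (the "quantization constant").
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    absmax = np.where(absmax == 0, 1.0, absmax)  # guard against all-zero blocks
    normed = blocks / absmax
    idx = np.abs(normed[..., None] - codebook).argmin(axis=-1).astype(np.uint8)
    return idx, absmax


def dequantize_blockwise(idx: np.ndarray, absmax: np.ndarray, codebook: np.ndarray):
    """Look up each 4-bit index and rescale by the block's absmax."""
    return codebook[idx] * absmax


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 64)).astype(np.float32)  # toy "weights"

codebook = normal_quantile_codebook()
idx, absmax = quantize_blockwise(w, codebook)
w_hat = dequantize_blockwise(idx, absmax, codebook).reshape(w.shape)

print("codebook:", np.round(codebook, 3))
print("mean abs error:", float(np.abs(w - w_hat).mean()))
```

Because the levels follow normal quantiles, they are packed densely near zero, where most weights lie, and sparsely in the tails. Each weight is stored as a 4-bit index plus one shared scale per block.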

In this notebook, we use `bitsandbytes` to load an `OPT-125M` model in 4-bit NF4 precision and then generate a few tokens with it.

In [None]:
!pip install bitsandbytes transformers accelerate -q

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "facebook/opt-125m"

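# 1. Define NF4 Configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store linear-layer weights in 4 bits
    bnb_4bit_quant_type="nf4",              # use the NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the per-block scales
    bnb_4bit_compute_dtype=torch.bfloat16   # dequantize to bf16 for matmuls
)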

print("--- Loading Model in NF4 ---")
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    quantization_config=bnb_config, 
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
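
The `bnb_4bit_use_double_quant=True` flag above enables Double Quantization. The saving can be worked out by hand; the sketch below does the arithmetic, assuming the figures from the QLoRA paper (32-bit scales over blocks of 64 weights, re-quantized to 8 bits with one 32-bit second-level constant per 256 scales). These numbers are assumptions taken from the paper rather than read from the library, and `scale_overhead_bits` is a hypothetical helper.

```python
def scale_overhead_bits(
    block_size: int = 64,       # weights sharing one scale (assumed, per QLoRA paper)
    scale_bits: int = 32,       # precision of an unquantized scale
    double_quant: bool = False,
    dq_block_size: int = 256,   # scales sharing one second-level constant (assumed)
    dq_scale_bits: int = 8,     # precision of a quantized scale (assumed)
) -> float:
    """Extra bits per weight spent on storing quantization constants."""
    if not double_quant:
        return scale_bits / block_size
    # 8-bit scale per block, plus a 32-bit constant amortized over 256 blocks.
    return dq_scale_bits / block_size + scale_bits / (block_size * dq_block_size)


plain = scale_overhead_bits()
dq = scale_overhead_bits(double_quant=True)

print(f"overhead without double quant: {plain:.3f} bits/param")
print(f"overhead with double quant:    {dq:.3f} bits/param")
print(f"saved:                         {plain - dq:.3f} bits/param")
```

Under these assumptions the overhead drops from 0.5 to about 0.127 bits per parameter, so the effective storage cost moves from roughly 4.5 to roughly 4.13 bits per weight.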

In [None]:
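print("--- NF4 Inference ---")
# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
# skip_special_tokens drops markers such as the leading </s> from the printed text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))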