# Quantization using GPTQ

## This notebook requires a CUDA-capable GPU to run

HuggingFace GPTQ Integration

https://huggingface.co/blog/gptq-integration

Original Notebook

https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb



## Install required libraries

Let us first install the required libraries: 🤗 transformers, optimum, and auto-gptq, along with a few supporting packages.

In [None]:
!pip install -q -U transformers peft accelerate optimum
!pip install -q datasets

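AutoGPTQ is installed separately. The command below pulls a pre-built wheel from the `cu117` (CUDA 11.7) index rather than compiling from source, so the index should match the CUDA version of your environment.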

In [None]:
!pip install -q auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu117/

### Quantize a model by passing a custom dataset

You can also quantize a model with a custom dataset by passing a list of strings to the quantization config. A good number of samples is 128; with too few samples, the quality of the quantized model will suffer. The example below uses only five samples, so expect some quality loss.
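
One way to assemble such a list from a larger corpus is sketched here. This is a minimal illustration, not part of the original notebook: the helper name `build_calibration_set`, the `min_chars` threshold, and the toy corpus are all hypothetical, and the only requirement taken from the text above is that the result is a list of strings, ideally around 128 of them.

```python
from typing import Iterable, List


def build_calibration_set(
    texts: Iterable[str], n_samples: int = 128, min_chars: int = 50
) -> List[str]:
    """Collect up to `n_samples` cleaned, de-duplicated strings from `texts`.

    `min_chars` is an arbitrary cutoff (an assumption) for dropping very short rows.
    """
    selected: List[str] = []
    seen = set()
    for raw in texts:
        text = " ".join(raw.split())  # collapse newlines and repeated spaces
        if len(text) < min_chars or text in seen:
            continue  # skip very short or duplicate samples
        seen.add(text)
        selected.append(text)
        if len(selected) == n_samples:
            break
    return selected


# Toy corpus standing in for real documents (for example, rows of a text dataset).
corpus = [f"Document {i}: " + "some calibration text " * 20 for i in range(300)]
calibration_data = build_calibration_set(corpus)
print(len(calibration_data))  # 128
```

The resulting list could then be passed as the `dataset` argument of the quantization config in place of a short hand-written list.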

In [None]:
# Top 5 rows from C4 dataset with a bit of cleanup
# https://huggingface.co/datasets/allenai/c4/viewer/en

custom_dataset = [
    "Beginners BBQ Class Taking Place in Missoula! Do you want to get better at making delicious BBQ? You will have the opportunity, put this on your calendar now. Thursday, September 22nd join World Class BBQ Champion, Tony Balay from Lonestar Smoke Rangers. He will be teaching a beginner level class for everyone who wants to get better with their culinary skills. He will teach you everything you need to know to compete in a KCBS BBQ competition, including techniques, recipes, timelines, meat selection and trimming, plus smoker and fire information. The cost to be in the class is $35 per person, and for spectators it is free. Included in the cost will be either a t-shirt or apron and you will be tasting samples of each meat that is prepared.",
    "Discussion in 'Mac OS X Lion (10.7)' started by axboi87, Jan 20, 2012. I've got a 500gb internal drive and a 240gb SSD. When trying to restore using disk utility i'm given the error 'Not enough space on disk ____ to restore' But I shouldn't have to do that!!! Any ideas or workarounds before resorting to the above? Use Carbon Copy Cloner to copy one drive to the other. I've done this several times going from larger HDD to smaller SSD and I wound up with a bootable SSD drive. One step you have to remember not to skip is to use Disk Utility to partition the SSD as GUID partition scheme HFS+ before doing the clone. If it came Apple Partition Scheme, even if you let CCC do the clone, the resulting drive won't be bootable. CCC usually works in 'file mode' and it can easily copy a larger drive (that's mostly empty) onto a smaller drive. If you tell CCC to clone a drive you did NOT boot from, it can work in block copy mode where the destination drive must be the same size or larger than the drive you are cloning from (if I recall). I've actually done this somehow on Disk Utility several times (booting from a different drive (or even the dvd) so not running disk utility from the drive your cloning) and had it work just fine from larger to smaller bootable clone. Definitely format the drive cloning to first, as bootable Apple etc.. Thanks for pointing this out. My only experience using DU to go larger to smaller was when I was trying to make a Lion install stick and I was unable to restore InstallESD.dmg to a 4 GB USB stick but of course the reason that wouldn't fit is there was slightly more than 4 GB of data.",
    "Foil plaid lycra and spandex shortall with metallic slinky insets. Attached metallic elastic belt with O-ring. Headband included. Great hip hop or jazz dance costume. Made in the USA.",
    "How many backlinks per day for new site? Discussion in 'Black Hat SEO' started by Omoplata, Dec 3, 2010. 1) for a newly created site, what's the max # backlinks per day I should do to be safe? 2) how long do I have to let my site age before I can start making more blinks? I did about 6000 forum profiles every 24 hours for 10 days for one of my sites which had a brand new domain. There is three backlinks for every of these forum profile so thats 18 000 backlinks every 24 hours and nothing happened in terms of being penalized or sandboxed. This is now maybe 3 months ago and the site is ranking on first page for a lot of my targeted keywords. build more you can in starting but do manual submission and not spammy type means manual + relevant to the post.. then after 1 month you can make a big blast.. Wow, dude, you built 18k backlinks a day on a brand new site? How quickly did you rank up? What kind of competition/searches did those keywords have?",
    "The Denver Board of Education opened the 2017-18 school year with an update on projects that include new construction, upgrades, heat mitigation and quality learning environments. We are excited that Denver students will be the beneficiaries of a four year, $572 million General Obligation Bond. Since the passage of the bond, our construction team has worked to schedule the projects over the four-year term of the bond. Denver voters on Tuesday approved bond and mill funding measures for students in Denver Public Schools, agreeing to invest $572 million in bond funding to build and improve schools and $56.6 million in operating dollars to support proven initiatives, such as early literacy. Denver voters say yes to bond and mill levy funding support for DPS students and schools. Click to learn more about the details of the voter-approved bond measure. Denver voters on Nov. 8 approved bond and mill funding measures for DPS students and schools. Learn more about what’s included in the mill levy measure."
]

In [None]:
import torch
from transformers import AutoModelForCausalLM, GPTQConfig, AutoTokenizer

# Select the model to be quantized
model_id = "facebook/opt-125m"

# Set up the quantization configuration.
# custom_dataset can be replaced with a larger dataset; keep in mind that
# quantization time grows with the size of the dataset.
quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    desc_act=False,
    dataset=custom_dataset
)

# The tokenizer is not changed by quantization
tokenizer = AutoTokenizer.from_pretrained(model_id, clean_up_tokenization_spaces=True)

# Passing quantization_config makes from_pretrained quantize the model while loading it
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
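
To get a feel for what `bits=4` and `group_size=128` mean for model size, the arithmetic can be sketched as follows. This is a rough estimate rather than AutoGPTQ's exact storage layout: it assumes each group of weights shares one 16-bit scale and one zero point stored at `bits` bits, and the helper name `effective_bits_per_weight` is hypothetical.

```python
def effective_bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Average storage per weight for group-wise quantization.

    Assumption: every group of `group_size` weights shares one scale
    (`scale_bits` bits) and one zero point (`bits` bits).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive in this sketch")
    overhead = (scale_bits + bits) / group_size  # per-group metadata spread over the group
    return bits + overhead


for gs in (32, 64, 128):
    ebw = effective_bits_per_weight(bits=4, group_size=gs)
    print(f"group_size={gs:>3}: {ebw:.3f} bits/weight, {16 / ebw:.2f}x smaller than fp16")
```

Under these assumptions, smaller groups carry more per-group metadata, which is the storage side of the trade-off that `group_size` controls.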

In [None]:
# Uncomment to inspect the attributes of a quantized layer (first attention query projection)
# quantized_model.model.decoder.layers[0].self_attn.q_proj.__dict__

As the generation below shows, output quality seems slightly worse than that of a model quantized with the `c4` dataset.

In [None]:
text = "My name is"
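# Move the inputs to the same device as the model instead of hard-coding GPU 0
inputs = tokenizer(text, return_tensors="pt").to(quantized_model.device)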

out = quantized_model.generate(**inputs)
print(tokenizer.decode(out[0], skip_special_tokens=True))

## Share quantized models on 🤗 Hub

After quantizing the model, you can use it out of the box for inference, or push the quantized weights to the 🤗 Hub to share the model with the community.

In [None]:
from huggingface_hub import notebook_login

notebook_login()

In [None]:
# Uncomment the line that matches the `bits` value used in GPTQConfig (4 in this notebook)

# quantized_model.push_to_hub("opt-125m-gptq-4bit")

# quantized_model.push_to_hub("opt-125m-gptq-8bit")
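
To avoid a repository name that disagrees with the configuration actually used, the name can be derived from the model ID and bit width. The sketch below is only an illustration: the helper `quantized_repo_name` is hypothetical, and it simply reproduces the naming pattern of the two lines above.

```python
def quantized_repo_name(model_id: str, bits: int, method: str = "gptq") -> str:
    """Build a repo name such as 'opt-125m-gptq-4bit' from a model ID and bit width."""
    base = model_id.split("/")[-1]  # drop the organization prefix, e.g. "facebook/"
    return f"{base}-{method}-{bits}bit"


repo_name = quantized_repo_name("facebook/opt-125m", bits=4)
print(repo_name)  # opt-125m-gptq-4bit
```

The resulting string can be passed to `quantized_model.push_to_hub(...)` in place of the hard-coded name.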