[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IST-DASLab/sparsegpt/blob/master/demo.ipynb)

Install dependencies

In [None]:
!pip install -q datasets
!pip install -q transformers

Clone repository

In [None]:
!git clone https://github.com/IST-DASLab/sparsegpt

### Pruning example
---

Below is an example of SparseGPT applied to an OPT model.

In [None]:
%cd sparsegpt

/content/sparsegpt


Create a directory to store the pruned model(s).

In [None]:
!mkdir -p sparse_opt

We will use the `opt.py` script to prune the model.
Select one of the following OPT versions, which fit in Colab (with `bitsandbytes`, the larger 6.7b and 13b models should also be usable):
* facebook/opt-125m
* facebook/opt-350m
* facebook/opt-1.3b

To prune the model, select a calibration dataset (`c4`, `ptb` or `wikitext2`). The SparseGPT paper uses `c4` by default.

SparseGPT can prune a model to uniform sparsity using either unstructured pruning or a semi-structured `N:M` pattern.

To apply unstructured pruning, specify `--sparsity`, a floating-point number in `[0, 1]`.

For semi-structured pruning, specify the `--prunen` and `--prunem` arguments, both integers.
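
To make the `N:M` pattern concrete, here is a minimal pure-Python sketch of a 2:4 mask on a single row of weights. It assumes that `N:M` means "zero `N` weights in every group of `M` consecutive weights", and it picks the weights to drop by absolute value purely for illustration. The helper `nm_mask` is hypothetical and is not part of the repository, and SparseGPT itself selects weights with its own criterion.

```python
def nm_mask(weights, n, m):
    """Return a 0/1 keep-mask that drops n weights in every block of m.

    Illustration only: weights are ranked by absolute value, which is an
    assumption of this sketch, not the SparseGPT selection rule.
    """
    if len(weights) % m != 0:
        raise ValueError("row length must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        block = weights[start:start + m]
        # Positions of the n smallest-magnitude entries in this block.
        drop = set(sorted(range(m), key=lambda i: abs(block[i]))[:n])
        mask.extend(0 if i in drop else 1 for i in range(m))
    return mask


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
mask = nm_mask(row, n=2, m=4)
pruned = [w if keep else 0.0 for w, keep in zip(row, mask)]
print(mask)
print(pruned)
```

Every group of four consecutive entries keeps exactly two weights, so the overall sparsity is 0.5, but unlike unstructured pruning the zeros are spread evenly across the row.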

To apply magnitude pruning instead of SparseGPT, pass the `--gmp` option.
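
As a rough picture of what magnitude pruning to a target sparsity means, the sketch below zeroes the smallest-magnitude fraction of a flat list of weights. It is a toy illustration under the assumption that pruning is done per weight list; `magnitude_prune` is a hypothetical helper and does not reproduce how `opt.py` implements `--gmp`.

```python
def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of entries with the smallest |w|."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    num_pruned = int(len(weights) * sparsity)
    # Positions ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:num_pruned])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


layer = [0.4, -0.02, 0.9, 0.1, -0.6, 0.03]
sparse_layer = magnitude_prune(layer, sparsity=0.5)
achieved = sparse_layer.count(0.0) / len(sparse_layer)
print(sparse_layer, achieved)
```

Magnitude pruning leaves the surviving weights unchanged, which makes it a useful baseline against which to compare SparseGPT at the same sparsity.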

To apply quantization on top of sparsity, specify `--wbits`.

In the example below, we prune `facebook/opt-125m` to 0.5 unstructured sparsity with SparseGPT. Try different options.


In [None]:
!python opt.py facebook/opt-125m c4 --sparsity 0.5 --save sparse_opt/opt-125m

The command above prints perplexity on the `wikitext2`, `ptb` and `c4` benchmarks at the end.
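
For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token, so lower is better. The sketch below computes it from a list of per-token log-probabilities. The `perplexity` function and the toy inputs are illustrative assumptions and are not taken from the repository's evaluation code.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every token is as uncertain as a
# uniform choice among 4 options, so its perplexity is about 4.
uniform_ppl = perplexity([math.log(0.25)] * 8)

# More confident (higher-probability) predictions give lower perplexity.
confident_ppl = perplexity([math.log(0.9)] * 8)
print(uniform_ppl, confident_ppl)
```

When comparing a dense and a pruned model, a small increase in perplexity indicates that pruning has preserved most of the model's predictive quality.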

### Compare generations
---

Let us compare generations produced by the dense and sparse models.

In [None]:
from transformers import AutoTokenizer, OPTForCausalLM

In [None]:
device = 'cuda'

In [None]:
# load dense model
model_dn = OPTForCausalLM.from_pretrained('facebook/opt-125m', torch_dtype='auto').to(device)
# load sparse model
model_sp = OPTForCausalLM.from_pretrained('sparse_opt/opt-125m', torch_dtype='auto').to(device)
# init tokenizer
tokenizer = AutoTokenizer.from_pretrained('facebook/opt-125m')

In [None]:
input_text = "It takes a great deal of bravery"

In [None]:
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)

Completion by dense model:

In [None]:
output_ids = model_dn.generate(input_ids)

In [None]:
print(tokenizer.decode(output_ids[0].cpu(), skip_special_tokens=True))

Completion by sparse model:

In [None]:
output_ids = model_sp.generate(input_ids)

In [None]:
print(tokenizer.decode(output_ids[0].cpu(), skip_special_tokens=True))