:::{.callout-note}

This example code is taken from the fastai [docs](https://docs.fast.ai/tutorial.transformers.html).

:::

In [None]:
#| include: false
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from transformers.modeling_utils import Conv1D
from fastai.text.all import *
import fastcore
from fasterai.sparse.all import *

In [None]:
pretrained_weights = 'gpt2'
tokenizer = GPT2TokenizerFast.from_pretrained(pretrained_weights)
model = GPT2LMHeadModel.from_pretrained(pretrained_weights)

In [None]:
path = untar_data(URLs.WIKITEXT_TINY)

In [None]:
#| include: false
df_train = pd.read_csv(path/'train.csv', header=None)
df_valid = pd.read_csv(path/'test.csv', header=None)

In [None]:
#| include: false
all_texts = np.concatenate([df_train[0].values, df_valid[0].values])

In [None]:
#| include: false
class TransformersTokenizer(Transform):
    def __init__(self, tokenizer): self.tokenizer = tokenizer
    def encodes(self, x): 
        toks = self.tokenizer.tokenize(x)
        return tensor(self.tokenizer.convert_tokens_to_ids(toks))
    def decodes(self, x): return TitledStr(self.tokenizer.decode(x.cpu().numpy()))

In [None]:
#| include: false
splits = [range_of(df_train), list(range(len(df_train), len(all_texts)))]
tls = TfmdLists(all_texts, TransformersTokenizer(tokenizer), splits=splits, dl_type=LMDataLoader)

In [None]:
#| include: false
bs,sl = 4,256
dls = tls.dataloaders(bs=bs, seq_len=sl)

In [None]:
#| include: false
def tokenize(text):
    toks = tokenizer.tokenize(text)
    return tensor(tokenizer.convert_tokens_to_ids(toks))

tokenized = [tokenize(t) for t in progress_bar(all_texts)]

In [None]:
#| include: false
class TransformersTokenizer(Transform):
    def __init__(self, tokenizer): self.tokenizer = tokenizer
    def encodes(self, x): 
        return x if isinstance(x, Tensor) else tokenize(x)
        
    def decodes(self, x): return TitledStr(self.tokenizer.decode(x.cpu().numpy()))

In [None]:
#| include: false
tls = TfmdLists(tokenized, TransformersTokenizer(tokenizer), splits=splits, dl_type=LMDataLoader)
dls = tls.dataloaders(bs=bs, seq_len=sl)

In [None]:
#| include: false
class DropOutput(Callback):
    def after_pred(self): self.learn.pred = self.pred[0]

Let's create our fastai `Learner`.

In [None]:
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), cbs=[DropOutput], metrics=Perplexity())

And let's try to extend a given prompt with the pretrained model.

In [None]:
prompt = "\n = Unicorn = \n \n A unicorn is a magical creature with a rainbow tail and a horn"

In [None]:
#| include: false
prompt_ids = tokenizer.encode(prompt)
inp = tensor(prompt_ids)[None]

In [None]:
preds = learn.model.generate(inp, max_length=40, num_beams=5, temperature=1.5)

In [None]:
tokenizer.decode(preds[0].cpu().numpy())

In [None]:
learn.validate()
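
`learn.validate()` returns the validation loss followed by the metric, here perplexity. As a reminder of how the two relate, perplexity is the exponential of the mean cross-entropy (measured in nats). The snippet below is a minimal sketch with made-up token probabilities, not values produced by this model, and `perplexity` is a hypothetical helper rather than fastai's `Perplexity` metric.

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to the correct tokens."""
    # Mean negative log-likelihood, i.e. the cross-entropy in nats
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical probabilities, for illustration only
confident = [0.9, 0.8, 0.95, 0.85]
hesitant = [0.25, 0.25, 0.25, 0.25]

print(perplexity(confident))  # close to 1: the model is rarely surprised
print(perplexity(hesitant))   # about 4: as uncertain as a uniform guess over 4 tokens
```

A lower perplexity means the model assigns a higher probability to the text it is evaluated on, which is why it is a convenient number to compare before and after fine-tuning or pruning.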

In [None]:
learn.fit_one_cycle(1, 1e-4)

In [None]:
prompt_ids = tokenizer.encode(prompt)
inp = tensor(prompt_ids)[None]

preds = learn.model.generate(inp.cuda(), max_length=40, num_beams=5, temperature=1.5)

tokenizer.decode(preds[0].cpu().numpy())

## Make it sparse!

Let's now retrain our model, this time introducing sparsity.

In [None]:
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), cbs=[DropOutput], metrics=Perplexity())

Unfortunately, the transformer model uses a custom layer, `Conv1D`, which is not part of PyTorch. To work around this, we have to add this layer to our `Granularities` class, so that it knows what to sparsify.

Here, `Conv1D` behaves like a `Linear` layer: it is built as `Conv1D(nf, nx)`, and its weights form a matrix of shape `(nx, nf)`, i.e. the transpose of the layout used by `Linear`.
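
To make the relationship with `Linear` concrete, here is a minimal NumPy sketch; it is not the `transformers` implementation. It assumes the weight is stored with the input dimension first, shape `(nx, nf)`, whereas a `Linear`-style layer stores `(nf, nx)`. The helper names `conv1d_forward` and `linear_forward` are hypothetical.

```python
import numpy as np

def conv1d_forward(x, weight, bias):
    """Conv1D-style affine map; weight is assumed to have shape (nx, nf)."""
    return x @ weight + bias

def linear_forward(x, weight, bias):
    """Linear-style affine map; weight has shape (nf, nx)."""
    return x @ weight.T + bias

rng = np.random.default_rng(0)
nx, nf = 8, 3                      # input and output feature sizes
x = rng.normal(size=(2, 5, nx))    # (batch, sequence, features)
w = rng.normal(size=(nx, nf))
b = rng.normal(size=nf)

out_conv1d = conv1d_forward(x, w, b)
out_linear = linear_forward(x, w.T, b)   # same weights, transposed layout

print(out_conv1d.shape)
print(np.allclose(out_conv1d, out_linear))
```

Both functions compute the same affine map, so pruning individual weights means the same thing for either layer type.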

In [None]:
doc(Conv1D)

We can thus add the `Conv1D` granularity with the `add_granularity` method, indicating the target module and the granularities it can handle (the same as `Linear`, so we can reuse them).

In [None]:
Granularities.add_granularity(Conv1D, Granularities._granularities_Linear)

Let's now define our `SparsifyCallback`. Let's say we want to make our model 30% sparse by removing the lowest-magnitude weights in each `Conv1D` layer (the `large_final` criteria keeps the weights with the largest final value).

In [None]:
sp_cb = SparsifyCallback(sparsity=30, granularity='weight', context='local', criteria=large_final, schedule=one_cycle, layer_type=Conv1D)
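
As a rough illustration of what `granularity='weight'` combined with `context='local'` amounts to, the NumPy sketch below prunes individual weights with a separate threshold for each layer. It assumes the criterion keeps the largest-magnitude weights (our reading of `large_final`) and is not fasterai's implementation; `magnitude_mask` is a hypothetical helper, and the layer shapes are toy values, not GPT-2's.

```python
import numpy as np

def magnitude_mask(weight, sparsity):
    """Binary mask that zeroes the `sparsity` percent smallest-magnitude entries."""
    k = int(weight.size * sparsity / 100)      # number of weights to remove
    if k == 0:
        return np.ones_like(weight)
    # The threshold is the k-th smallest absolute value of this layer only ("local")
    threshold = np.sort(np.abs(weight), axis=None)[k - 1]
    # Ties at the threshold would also be pruned; unlikely with continuous values
    return (np.abs(weight) > threshold).astype(weight.dtype)

rng = np.random.default_rng(0)
# Toy stand-ins for two layers of different sizes
layers = {
    "layer_a": rng.normal(size=(16, 48)),
    "layer_b": rng.normal(size=(16, 16)),
}

for name, w in layers.items():
    mask = magnitude_mask(w, sparsity=30)
    pruned = w * mask
    print(name, f"{(pruned == 0).mean():.1%} zeros")
```

With a local context every layer ends up with roughly the same fraction of zeros. A global context would instead compute a single threshold over all layers, so some layers could be pruned more heavily than others.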

We now only have to pass our callback to fastai.

In [None]:
learn.fit_one_cycle(1, 1e-4, cbs=sp_cb)

And we can check the prediction for the same prompt as before.

In [None]:
prompt_ids = tokenizer.encode(prompt)
inp = tensor(prompt_ids)[None]

preds = learn.model.generate(inp.cuda(), max_length=40, num_beams=5, temperature=1.5)

tokenizer.decode(preds[0].cpu().numpy())

That's it! You now have a sparse Transformer that performs comparably to the dense model. However, this model is not yet any faster or smaller, because the pruned weights are only set to zero and are still stored. To get an actual speed-up, have a look at the [granularity](https://nathanhubens.github.io/fasterai/granularity.html) section.
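
To see why zeroed weights alone do not save storage, consider this small NumPy sketch. It uses a random toy matrix and a random mask purely for illustration; nothing here comes from the GPT-2 model above.

```python
import numpy as np

rng = np.random.default_rng(0)
dense = rng.normal(size=(256, 256)).astype(np.float32)

# Zero out roughly 30% of the entries (chosen at random here, for illustration)
mask = rng.random(dense.shape) >= 0.3
sparse = dense * mask

sparsity = (sparse == 0).mean()
print(f"sparsity: {sparsity:.1%}")
print("bytes before:", dense.nbytes)
print("bytes after: ", sparse.nbytes)
```

The array holds the same number of `float32` values before and after masking, so its memory footprint is unchanged, and a dense matrix multiplication still performs the same number of operations. Gains only appear once the zeros are physically removed or stored in a format that skips them.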