# Env

If you're opening this notebook on Colab, you will probably need to install 🤗 Transformers and 🤗 Datasets. Run the following cell to install them, along with a few helper packages used later.

In [1]:
! pip install datasets transformers
! pip install huggingface-hub
! pip install sentencepiece==0.1.94
! pip install pynvml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


If you're opening this notebook locally, make sure your environment has recent versions of these libraries installed.

To share your model with the community and query it through the Inference API, there are a few more steps to follow.

First, create an access token on the Hugging Face website (sign up [here](https://huggingface.co/join) if you haven't already!), then execute the following cell and paste the token when prompted:

## Log-in

In [2]:
from huggingface_hub import notebook_login

# Paste your access token into the prompt; never hard-code it in the notebook.
notebook_login()


Then install Git LFS, which is needed to push large model files to the Hub:

In [3]:
!apt install git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
git-lfs is already the newest version (2.3.4-1).
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.


Make sure your version of Transformers is at least 4.11.0, since the functionality used here was introduced in that version:

In [4]:
import transformers

print(transformers.__version__)

4.24.0
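Eyeballing the printed string works, but comparing version strings as plain strings does not: `"4.9.1"` sorts after `"4.11.0"`. The sketch below is a minimal pure-Python numeric check, assuming releases of the form `X.Y.Z` optionally followed by a suffix such as `.dev0`; `version_at_least` is a hypothetical helper, not part of Transformers, and `packaging.version.Version` is the more robust choice when `packaging` is available.

```python
import re

def version_at_least(installed: str, required: str) -> bool:
    """Compare the numeric release parts of two version strings (hypothetical helper).

    Suffixes such as ".dev0" are ignored, so a pre-release compares equal to
    its final release; use packaging.version for full PEP 440 semantics.
    """
    def release(v: str) -> tuple:
        # Keep only the leading dotted-number run, e.g. "4.25.0.dev0" -> (4, 25, 0)
        match = re.match(r"\d+(?:\.\d+)*", v)
        if match is None:
            raise ValueError(f"unrecognized version: {v!r}")
        return tuple(int(part) for part in match.group(0).split("."))

    a, b = release(installed), release(required)
    # Pad with zeros so that "4.11" and "4.11.0" compare as equal
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

print(version_at_least("4.24.0", "4.11.0"))  # True
print(version_at_least("4.9.1", "4.11.0"))   # False, although "4.9.1" > "4.11.0" as strings
```

In the notebook this would be `version_at_least(transformers.__version__, "4.11.0")`.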


In [5]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

You can find a script version of this notebook to fine-tune your model in a distributed fashion using multiple GPUs or TPUs [here](https://github.com/huggingface/transformers/tree/master/examples/text-classification).

# Pruning

## Setup

In [9]:
from transformers import default_data_collator
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
from transformers import AutoTokenizer
import numpy as np

# Earlier candidate lists also tried "gpt2", "t5-3b" and "facebook/blenderbot-3B".
data_collator = default_data_collator
num_labels = 2  # binary sequence-classification head
model_choice = ["microsoft/deberta-base", "EleutherAI/gpt-j-6B", "allenai/led-large-16384"]
portion_choice = [0.1, 0.5, 0.9, 0.95, 0.99, 1.0]  # pruning proportions to try

## Prune Model

In [16]:
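from torch import nn
import torch.nn.utils.prune as prune

def prune_model_global_unstructured(model, layer_type=nn.Linear, proportion=0.1):
    """Globally prune the `proportion` of smallest-magnitude weights across all
    modules of `layer_type`, then make the pruning permanent.

    The model is modified in place and also returned. `layer_type` may be a
    single class or a tuple of classes (anything `isinstance` accepts).
    """
    # Collect (module, "weight") pairs for every matching layer
    module_tups = []
    for module in model.modules():
        if isinstance(module, layer_type):
            module_tups.append((module, 'weight'))

    # Rank all collected weights together by |w| and mask the smallest ones
    prune.global_unstructured(
        parameters=module_tups, pruning_method=prune.L1Unstructured,
        amount=proportion
    )
    # Fold each mask into `weight` (drops weight_orig / weight_mask) and log the layer
    for module, _ in module_tups:
        prune.remove(module, 'weight')
        print(module)
    return model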
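`prune.global_unstructured` with `L1Unstructured` ranks the weights of *all* listed modules together by absolute value and zeroes the smallest `amount` fraction; `prune.remove` then makes the zeros permanent. The NumPy sketch below mimics that selection on plain arrays to show the consequence: the target sparsity is global, so individual layers can end up more or less sparse than `amount`. It assumes PyTorch's convention of rounding `amount * total` to get the number of entries to prune, breaks ties by position (PyTorch may break them differently), and `global_l1_prune` is a hypothetical name, not PyTorch's code.

```python
import numpy as np

def global_l1_prune(weights, amount):
    """Zero the `amount` fraction of smallest-|w| entries across all arrays jointly."""
    flat = np.concatenate([w.ravel() for w in weights])
    n_prune = int(round(amount * flat.size))  # a float amount is a fraction of all entries
    # Positions (in the concatenated vector) of the n_prune smallest magnitudes
    prune_idx = np.argsort(np.abs(flat), kind="stable")[:n_prune]
    keep = np.ones(flat.size, dtype=bool)
    keep[prune_idx] = False

    pruned, offset = [], 0
    for w in weights:
        # Slice this array's part of the global mask and apply it permanently
        mask = keep[offset:offset + w.size].reshape(w.shape)
        pruned.append(np.where(mask, w, 0.0))
        offset += w.size
    return pruned

a = np.array([[0.1, -2.0], [0.5, 3.0]])
b = np.array([-0.05, 1.0, -0.7])
pa, pb = global_l1_prune([a, b], amount=0.4)  # round(0.4 * 7) = 3 entries pruned
print(pa.tolist())  # [[0.0, -2.0], [0.0, 3.0]]
print(pb.tolist())  # [0.0, 1.0, -0.7]
print([float(np.mean(p == 0)) for p in (pa, pb)])  # [0.5, 0.3333333333333333]
```

Overall sparsity is 3/7, but `a` lost half its entries and `b` a third: layers whose weights are small in magnitude absorb more of the pruning than they would under per-layer pruning.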

## Deberta

In [35]:
model_name = "microsoft/deberta-base"
model_test = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels).to(device)
# tokenizer_test = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# experiment(model_name, model_test, tokenizer_test)
model_compact = prune_model_global_unstructured(model_test, proportion=0.5)  # in place: model_compact is model_test
# print(model_compact)
# del model_test, model_compact

Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForSequenceClassification: ['lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.dense.weight']
- This IS expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DebertaForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['classifier.weight', 'pooler.d

Linear(in_features=768, out_features=2304, bias=False)
Linear(in_features=768, out_features=768, bias=False)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=3072, bias=True)
Linear(in_features=3072, out_features=768, bias=True)
Linear(in_features=768, out_features=2304, bias=False)
Linear(in_features=768, out_features=768, bias=False)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=3072, bias=True)
Linear(in_features=3072, out_features=768, bias=True)
Linear(in_features=768, out_features=2304, bias=False)
Linear(in_features=768, out_features=768, bias=False)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=3072, bias=True)
Linear(in_features=3072, out_features=768, bias=True)
Linear(in_features=768, out_fea
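The printed module list only shows which layers were visited. To confirm the sparsity actually reached, count exact zeros; after a 0.5 global prune the overall fraction should be close to 0.5 (slightly higher if some weights were already zero), while per-layer fractions may differ. The sketch below works on NumPy arrays; for this notebook one could gather them with something like `{name: m.weight.detach().cpu().numpy() for name, m in model_compact.named_modules() if isinstance(m, nn.Linear)}` (an assumption about how to collect them, not code from the notebook). `sparsity_report` and the layer names in the example are hypothetical.

```python
import numpy as np

def sparsity_report(named_weights):
    """Return ({name: fraction of exact zeros}, overall fraction of exact zeros)."""
    per_tensor = {name: float(np.mean(w == 0)) for name, w in named_weights.items()}
    total = sum(w.size for w in named_weights.values())
    zeros = sum(int(np.count_nonzero(w == 0)) for w in named_weights.values())
    return per_tensor, zeros / total

# Illustrative stand-ins for pruned weight matrices
weights = {
    "layer.0.query": np.array([[0.0, 1.2], [-0.4, 0.0]]),
    "layer.0.output": np.array([0.0, 0.0, 0.3, -0.9, 0.5, 0.7]),
}
per_tensor, overall = sparsity_report(weights)
print(per_tensor)  # {'layer.0.query': 0.5, 'layer.0.output': 0.3333333333333333}
print(overall)     # 0.4
```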

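### Parameters with a large mean

The entries printed below are LayerNorm gains (weights); they are not `nn.Linear` weights, so the pruning step leaves them untouched.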

In [36]:
for key, value in model_compact.state_dict().items():
    # Skip non-floating-point entries (e.g. integer position_ids); mean() raises on them
    if value.is_floating_point() and value.mean() > 0.2:
        print(key)

deberta.embeddings.LayerNorm.weight
deberta.encoder.layer.0.attention.output.LayerNorm.weight
deberta.encoder.layer.0.output.LayerNorm.weight
deberta.encoder.layer.1.attention.output.LayerNorm.weight
deberta.encoder.layer.1.output.LayerNorm.weight
deberta.encoder.layer.2.attention.output.LayerNorm.weight
deberta.encoder.layer.2.output.LayerNorm.weight
deberta.encoder.layer.3.attention.output.LayerNorm.weight
deberta.encoder.layer.3.output.LayerNorm.weight
deberta.encoder.layer.4.attention.output.LayerNorm.weight
deberta.encoder.layer.4.output.LayerNorm.weight
deberta.encoder.layer.5.attention.output.LayerNorm.weight
deberta.encoder.layer.5.output.LayerNorm.weight
deberta.encoder.layer.6.attention.output.LayerNorm.weight
deberta.encoder.layer.6.output.LayerNorm.weight
deberta.encoder.layer.7.attention.output.LayerNorm.weight
deberta.encoder.layer.7.output.LayerNorm.weight
deberta.encoder.layer.8.attention.output.LayerNorm.weight
deberta.encoder.layer.8.output.LayerNorm.weight
deberta.en

In [37]:
for key, value in model_compact.state_dict().items():
    # Skip non-floating-point entries (e.g. integer position_ids); mean() raises on them
    if value.is_floating_point() and value.mean() < 0.2:
        print(key, value)

[1;30;43mStreaming output truncated to the last 5000 lines.[0m
         1.1235e-02,  7.1490e-03,  8.3615e-03, -4.9015e-02, -3.9357e-02,
        -1.1507e-02,  1.3822e-02,  5.6844e-02,  1.9594e-02, -1.1796e-02,
        -3.3466e-02,  4.2032e-02, -2.7210e-02,  5.4287e-02, -1.2042e-02,
        -2.2712e-03, -3.1130e-02, -6.0892e-02,  5.2161e-02, -3.9810e-02,
         2.3931e-02,  6.0386e-02, -4.0602e-04, -2.8765e-02,  9.4860e-03,
         2.7341e-03, -5.3746e-02, -8.1806e-02,  2.8682e-02, -2.5755e-04,
         2.2253e-02,  1.2925e-02, -4.0159e-02, -2.0850e-02, -6.8314e-03,
         4.2512e-02, -5.4706e-02, -4.3381e-02, -1.1599e-02, -4.9444e-03,
         7.9422e-03,  7.3537e-03, -1.0422e-01,  5.7831e-02, -5.2803e-02,
         5.6831e-03, -1.0000e-02, -2.2456e-02,  3.2678e-02,  1.0191e-02,
         4.8497e-02, -4.8756e-02, -6.6327e-02,  2.3080e-02, -3.5474e-02,
        -1.0330e-01,  9.9420e-04, -2.5961e-02,  2.8819e-02, -1.4099e-03,
        -3.7246e-02, -6.0417e-02,  4.5095e-02,  1.1228e-02,
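Printing every tensor whose mean falls below the threshold floods the output (Colab truncated it to the last 5000 lines). A more readable alternative is one line of statistics per entry. This NumPy sketch assumes the state dict has been converted to arrays (e.g. `value.detach().cpu().numpy()` per entry) and skips non-floating-point entries such as integer index buffers; `summarize_state` and the example keys are hypothetical.

```python
import numpy as np

def summarize_state(state, threshold=0.2, below=True):
    """Yield one summary line per float array whose mean lies on the chosen side of `threshold`."""
    for name, value in state.items():
        if not np.issubdtype(value.dtype, np.floating):
            continue  # skip integer buffers such as position ids
        mean = float(value.mean())
        keep = mean < threshold if below else mean > threshold
        if keep:
            yield (f"{name}: shape={value.shape} mean={mean:.4f} std={float(value.std()):.4f} "
                   f"min={float(value.min()):.4f} max={float(value.max()):.4f}")

# Illustrative stand-ins for state-dict entries
state = {
    "encoder.dense.weight": np.array([[0.0, -0.02], [0.04, 0.0]]),
    "encoder.LayerNorm.weight": np.ones(4),
    "embeddings.position_ids": np.arange(4),
}
for line in summarize_state(state):
    print(line)
```

With `below=False` the same helper reproduces the "large mean" listing, here flagging only `encoder.LayerNorm.weight`.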

## GPT-2

In [33]:
model_name = 'gpt2'
model_test = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels).to(device)
# tokenizer_test = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# experiment(model_name, model_test, tokenizer_test)

# GPT-2's attention and MLP projections are Conv1D layers, not nn.Linear, so with
# the default layer_type only the classification head (`score`) is pruned.
model_compact = prune_model_global_unstructured(model_test, proportion=0.5)

# del model_test, tokenizer_test

for key, value in model_compact.state_dict().items():
    # Skip non-floating-point entries (e.g. integer or boolean buffers); mean() raises on them
    if value.is_floating_point() and value.mean() > 0.2:
        print(key)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Linear(in_features=768, out_features=2, bias=False)
transformer.h.0.ln_2.weight
transformer.h.1.ln_1.weight
transformer.h.1.ln_2.weight
transformer.h.2.ln_1.weight
transformer.h.2.ln_2.weight
transformer.h.3.ln_1.weight
transformer.h.3.ln_2.weight
transformer.h.4.ln_1.weight
transformer.h.4.ln_2.weight
transformer.h.5.ln_1.weight
transformer.h.5.ln_2.weight
transformer.h.6.ln_1.weight
transformer.h.6.ln_2.weight
transformer.h.7.ln_1.weight
transformer.h.7.ln_2.weight
transformer.h.8.ln_1.weight
transformer.h.8.ln_2.weight
transformer.h.9.ln_1.weight
transformer.h.9.ln_2.weight
transformer.h.10.ln_1.weight
transformer.h.10.ln_2.weight
transformer.h.11.ln_1.weight
transformer.h.11.ln_2.weight
transformer.ln_f.weight
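Only one `Linear` module (the 2-way `score` head) was printed for GPT-2 because the Hugging Face GPT-2 blocks use Transformers' own `Conv1D` class (`transformers.pytorch_utils.Conv1D` in recent versions) for their projections. `Conv1D` stores its weight as `(in_features, out_features)` and computes `x @ weight + bias`, the transpose of `nn.Linear`'s `(out_features, in_features)` layout. The NumPy sketch below, with arbitrary illustrative shapes, checks that both layouts give the same output and that an elementwise magnitude mask commutes with the transpose, which is why unstructured L1 pruning treats both layer types alike (structured row or column pruning would not).

```python
import numpy as np

rng = np.random.default_rng(0)
in_features, out_features = 4, 3
x = rng.normal(size=(5, in_features))
bias = rng.normal(size=out_features)

w_linear = rng.normal(size=(out_features, in_features))  # nn.Linear layout: (out, in)
w_conv1d = w_linear.T.copy()                               # Conv1D layout: (in, out)

y_linear = x @ w_linear.T + bias  # nn.Linear: y = x W^T + b
y_conv1d = x @ w_conv1d + bias    # Conv1D:    y = x W + b
print(np.allclose(y_linear, y_conv1d))  # True

# An elementwise magnitude mask ignores layout: masking then transposing
# equals transposing then masking.
threshold = np.median(np.abs(w_linear))
mask_linear = np.abs(w_linear) > threshold
mask_conv1d = np.abs(w_conv1d) > threshold
print(np.array_equal(mask_linear.T, mask_conv1d))  # True
```

To prune GPT-2's blocks as well, one could import `Conv1D` and call `prune_model_global_unstructured(model_test, layer_type=(nn.Linear, Conv1D), proportion=0.5)`; `isinstance` accepts a tuple of types, and `Conv1D` exposes a `weight` parameter.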


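## Longformer

This section loads `allenai/longformer-base-4096` rather than the LED checkpoint listed in `model_choice`.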

In [34]:
model_name = "allenai/longformer-base-4096"
model_test = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels).to(device)
# tokenizer_test = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# experiment(model_name, model_test, tokenizer_test)
model_compact = prune_model_global_unstructured(model_test, proportion=0.5)

for key, value in model_compact.state_dict().items():
    # Skip non-floating-point entries (e.g. integer buffers); mean() raises on them
    if value.is_floating_point() and value.mean() > 0.2:
        print(key)

Some weights of the model checkpoint at allenai/longformer-base-4096 were not used when initializing LongformerForSequenceClassification: ['lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.dense.weight', 'lm_head.decoder.weight']
- This IS expected if you are initializing LongformerForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LongformerForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LongformerForSequenceClassification were not initialized from the model checkpoint at allenai/longformer-base-4096 and are newly initialized: ['classifier.out_proj.bias', 'classifier.dense.bias', 

Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=3072, bias=True)
Linear(in_features=3072, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=3072, bias=True)
Linear(in_features=3072, out_features=768, bias=True)
Linear(in_features=768, out_features=768, 
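The module listings also show how the encoders differ per layer. Reading one encoder layer's `Linear` shapes off the DeBERTa and Longformer printouts above (DeBERTa: a fused 768→2304 projection plus three 768→768 projections; Longformer: seven 768→768 projections, covering query/key/value, their global-attention counterparts and the attention output; both: the 768→3072 and 3072→768 feed-forward pair), the sketch below tallies the weight entries each layer contributes to the global ranking. Biases are excluded because only `'weight'` is passed to `prune.global_unstructured`.

```python
# (in_features, out_features) of the Linear layers printed for one encoder layer
deberta_layer = [(768, 2304), (768, 768), (768, 768), (768, 768), (768, 3072), (3072, 768)]
longformer_layer = [(768, 768)] * 7 + [(768, 3072), (3072, 768)]

def weight_count(shapes):
    """Number of weight entries (biases excluded) across a list of Linear shapes."""
    return sum(fan_in * fan_out for fan_in, fan_out in shapes)

for name, shapes in [("DeBERTa", deberta_layer), ("Longformer", longformer_layer)]:
    # With proportion=0.5, global pruning removes half of ALL ranked weights,
    # not necessarily half of each layer's weights.
    print(f"{name}: {weight_count(shapes):,} weights per layer")
# DeBERTa: 8,257,536 weights per layer
# Longformer: 8,847,360 weights per layer
```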