# RoBERTa Testing Notebook

In [1]:
%load_ext autoreload
%autoreload 2
# Enables autoreload; learn more at https://docs.databricks.com/en/files/workspace-modules.html#autoreload-for-python-modules
# To disable autoreload, run %autoreload 0

In [2]:
import os
import sys
sys.path.append(os.path.abspath('..'))

from constants import (
    TARGET_SPARSITY_LOW, TARGET_SPARSITY_MID, TARGET_SPARSITY_HIGH,
    BATCH_SIZE_CNN, BATCH_SIZE_VIT, BATCH_SIZE_LLM,
    EPOCHS_SMALL_MODEL, EPOCHS_LARGE_MODEL, EPOCHS_VIT
)
from utils import get_device, get_num_workers, load_weights, print_statistics
from unstructured_pruning import check_model_sparsity, check_sparsity_distribution
from trainer import TrainingArguments, Trainer
from bacp import BaCPTrainingArguments, BaCPTrainer

from datasets.utils.logging import disable_progress_bar
disable_progress_bar()
os.environ["HF_DATASETS_CACHE"] = "./cache"
os.environ["TOKENIZERS_PARALLELISM"] = "false" 




In [3]:
DEVICE = get_device()
NUM_WORKERS = get_num_workers()
print("Using device:", DEVICE)
print("Using", NUM_WORKERS, "workers")

Using device: cuda
Using 288 workers


In [4]:
MODEL_NAME = "roberta-base"
MODEL_TASK = "sst2"
TRAIN = True

## Baseline Accuracies

In [5]:
training_args = TrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 2e-5),
    scheduler_type='linear_with_warmup',
    epochs=5,
    learning_type="baseline",
    db=False
)
trainer = Trainer(training_args=training_args)
if False:  # Training is skipped here; evaluation below uses the saved baseline checkpoint
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)

[TRAINER] Image size: None


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[TRAINER] Initialized models
[TRAINER] Optimizer type w/ learning rate: (adamw, 2e-05)
[DATALOADERS] ['train', 'validation', 'test']
[TRAINER] Data Initialized for model task: sst2
[TRAINER] Batch size: 64
[TRAINER] Number of dataloders: 2
[TRAINER] Linear scheduler initialized with warmup steps: 526 and total steps: 5260
[TRAINER] Pruning not initialized
[TRAINER] Saving model to: ./research/roberta-base/sst2/roberta-base_sst2_baseline.pt
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_baseline.pt
[TRAINER] Weights loaded successfully
[TRAINER] Model Sparsity: 0.0




TRAINING STATISTICS SUMMARY

Performance Metrics:
------------------------------
  Accuracy:     95.43%

Model Information:
------------------------------
  Total Parameters:     124,647,170
  Trainable Parameters: 124,647,170
  Model Sparsity:       0.0000 (0.00%)

Training Configuration:
------------------------------
  Model:                roberta-base
  Task:                 sst2
  Learning Type:        baseline
  Batch Size:           64
  Learning Rate:        2e-05
  Optimizer:            adamw
  Epochs:               5

System Information:
------------------------------
  Device:               cuda
  Mixed Precision:      True
  Workers:              24
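
The baseline log above reports a linear scheduler with 526 warmup steps out of 5260 total steps. As a rough sketch of what such a schedule usually computes, the learning-rate multiplier ramps from 0 to 1 during warmup and then decays linearly to 0. The function name `linear_warmup_multiplier` is hypothetical, and this is not necessarily how the `trainer` module implements `'linear_with_warmup'`.

```python
def linear_warmup_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """Return the factor applied to the base learning rate at `step`."""
    if step < warmup_steps:
        # Linear ramp from 0 up to 1 during warmup.
        return step / max(1, warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


BASE_LR = 2e-5       # from optimizer_type_and_lr in the baseline cell
WARMUP_STEPS = 526   # from the [TRAINER] log
TOTAL_STEPS = 5260   # from the [TRAINER] log

for step in (0, 263, 526, 2893, 5260):
    lr = BASE_LR * linear_warmup_multiplier(step, WARMUP_STEPS, TOTAL_STEPS)
    print(f"step {step:>4}: lr = {lr:.2e}")
```

With these values the learning rate peaks at `2e-05` on step 526, which is 10% of the way through training.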



## Pruning Accuracies

### Magnitude Prune

In [25]:
# Initializing finetuned weights path
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"
training_args = TrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 5e-5),
    pruning_type="magnitude_pruning",
    target_sparsity=TARGET_SPARSITY_LOW,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type="pruning",
    db=False,
)
trainer = Trainer(training_args)
if False:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[TRAINER] Image size: None
[TRAINER] Initialized models
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_baseline.pt
[TRAINER] Weights loaded
[TRAINER] Optimizer type w/ learning rate: (adamw, 5e-05)
[DATALOADERS] ['train', 'validation', 'test']
[TRAINER] Data Initialized for model task: sst2
[TRAINER] Batch size: 64
[TRAINER] Number of dataloders: 2
[TRAINER] No scheduler initialized
[TRAINER] Pruning initialized
[TRAINER] Pruning type: magnitude_pruning
[TRAINER] Target sparsity: 0.95
[TRAINER] Sparsity scheduler: cubic
[TRAINER] Pruning epochs: 5
[TRAINER] Current sparsity: 0.0000
[TRAINER] Saving model to: ./research/roberta-base/sst2/roberta-base_sst2_magnitude_pruning_0.95_pruning.pt
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_magnitude_pruning_0.95_pruning.pt
[TRAINER] Weights loaded successfully
[TRAINER] Model Sparsity: 0.95


                                                           


TRAINING STATISTICS SUMMARY

Performance Metrics:
------------------------------
  Accuracy:     84.62%

🧠 Model Information:
------------------------------
  Total Parameters:     124,647,170
  Trainable Parameters: 124,647,170
  Model Sparsity:       0.9500 (95.00%)

Training Configuration:
------------------------------
  Model:                roberta-base
  Task:                 sst2
  Learning Type:        pruning
  Batch Size:           64
  Learning Rate:        5e-05
  Optimizer:            adamw
  Epochs:               5

Pruning Configuration:
------------------------------
  Pruning Type:         magnitude_pruning
  Target Sparsity:      0.95
  Sparsity Scheduler:   cubic
  Recovery Epochs:      10

System Information:
------------------------------
  Device:               cuda
  Mixed Precision:      True
  Workers:              24





In [21]:
# Initializing finetuned weights path
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"
training_args = TrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 5e-5),
    pruning_type="magnitude_pruning",
    target_sparsity=TARGET_SPARSITY_MID,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type="pruning",
    db=False,
)
trainer = Trainer(training_args)
if False:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[TRAINER] Image size: None
[TRAINER] Initialized models
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_baseline.pt
[TRAINER] Weights loaded
[TRAINER] Optimizer type w/ learning rate: (adamw, 5e-05)
[DATALOADERS] ['train', 'validation', 'test']
[TRAINER] Data Initialized for model task: sst2
[TRAINER] Batch size: 64
[TRAINER] Number of dataloders: 2
[TRAINER] No scheduler initialized
[TRAINER] Pruning initialized
[TRAINER] Pruning type: magnitude_pruning
[TRAINER] Target sparsity: 0.97
[TRAINER] Sparsity scheduler: cubic
[TRAINER] Pruning epochs: 5
[TRAINER] Current sparsity: 0.0000
[TRAINER] Saving model to: ./research/roberta-base/sst2/roberta-base_sst2_magnitude_pruning_0.97_pruning.pt
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_magnitude_pruning_0.97_pruning.pt
[TRAINER] Weights loaded successfully
[TRAINER] Model Sparsity: 0.97




TRAINING STATISTICS SUMMARY

Performance Metrics:
------------------------------
  Accuracy:     80.77%

🧠 Model Information:
------------------------------
  Total Parameters:     124,647,170
  Trainable Parameters: 124,647,170
  Model Sparsity:       0.9700 (97.00%)

Training Configuration:
------------------------------
  Model:                roberta-base
  Task:                 sst2
  Learning Type:        pruning
  Batch Size:           64
  Learning Rate:        5e-05
  Optimizer:            adamw
  Epochs:               5

Pruning Configuration:
------------------------------
  Pruning Type:         magnitude_pruning
  Target Sparsity:      0.97
  Sparsity Scheduler:   cubic
  Recovery Epochs:      10

System Information:
------------------------------
  Device:               cuda
  Mixed Precision:      True
  Workers:              24





In [20]:
# Initializing finetuned weights path
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"
training_args = TrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 5e-5),
    pruning_type="magnitude_pruning",
    target_sparsity=TARGET_SPARSITY_HIGH,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type="pruning",
    db=False,
)
trainer = Trainer(training_args)
if False:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[TRAINER] Image size: None
[TRAINER] Initialized models
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_baseline.pt
[TRAINER] Weights loaded
[TRAINER] Optimizer type w/ learning rate: (adamw, 5e-05)
[DATALOADERS] ['train', 'validation', 'test']
[TRAINER] Data Initialized for model task: sst2
[TRAINER] Batch size: 64
[TRAINER] Number of dataloders: 2
[TRAINER] No scheduler initialized
[TRAINER] Pruning initialized
[TRAINER] Pruning type: magnitude_pruning
[TRAINER] Target sparsity: 0.99
[TRAINER] Sparsity scheduler: cubic
[TRAINER] Pruning epochs: 5
[TRAINER] Current sparsity: 0.0000
[TRAINER] Saving model to: ./research/roberta-base/sst2/roberta-base_sst2_magnitude_pruning_0.99_pruning.pt
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_magnitude_pruning_0.99_pruning.pt
[TRAINER] Weights loaded successfully
[TRAINER] Model Sparsity: 0.99




TRAINING STATISTICS SUMMARY

Performance Metrics:
------------------------------
  Accuracy:     78.12%

🧠 Model Information:
------------------------------
  Total Parameters:     124,647,170
  Trainable Parameters: 124,647,170
  Model Sparsity:       0.9900 (99.00%)

Training Configuration:
------------------------------
  Model:                roberta-base
  Task:                 sst2
  Learning Type:        pruning
  Batch Size:           64
  Learning Rate:        5e-05
  Optimizer:            adamw
  Epochs:               5

Pruning Configuration:
------------------------------
  Pruning Type:         magnitude_pruning
  Target Sparsity:      0.99
  Sparsity Scheduler:   cubic
  Recovery Epochs:      10

System Information:
------------------------------
  Device:               cuda
  Mixed Precision:      True
  Workers:              24
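
For readers unfamiliar with the technique, here is a minimal sketch of unstructured magnitude pruning on a flat list of weights, together with the zero-fraction measure that the "Model Sparsity" lines report. It is an illustration in plain Python, assuming a single global threshold. The helper names `magnitude_prune` and `sparsity_of` are hypothetical and are not the functions in `unstructured_pruning`.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    num_to_prune = int(len(weights) * sparsity)
    if num_to_prune == 0:
        return list(weights)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


def sparsity_of(weights: list[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


weights = [0.05, -0.9, 0.3, -0.01, 0.7, 0.2, -0.4, 0.02, 0.6, -0.1]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
print(f"sparsity = {sparsity_of(pruned):.2f}")
```

Note that the parameter counts in the summaries stay at 124,647,170 at every sparsity level: masked weights are set to zero rather than removed from the tensors.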





### SNIP-it Prune

In [19]:
# Initializing finetuned weights path
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"
training_args = TrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 5e-5),
    pruning_type="snip_pruning",
    target_sparsity=TARGET_SPARSITY_LOW,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type="pruning",
    db=False,
)
trainer = Trainer(training_args)
if False:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[TRAINER] Image size: None
[TRAINER] Initialized models
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_baseline.pt
[TRAINER] Weights loaded
[TRAINER] Optimizer type w/ learning rate: (adamw, 5e-05)
[DATALOADERS] ['train', 'validation', 'test']
[TRAINER] Data Initialized for model task: sst2
[TRAINER] Batch size: 64
[TRAINER] Number of dataloders: 2
[TRAINER] No scheduler initialized
[TRAINER] Pruning initialized
[TRAINER] Pruning type: snip_pruning
[TRAINER] Target sparsity: 0.95
[TRAINER] Sparsity scheduler: cubic
[TRAINER] Pruning epochs: 5
[TRAINER] Current sparsity: 0.0000
[TRAINER] Saving model to: ./research/roberta-base/sst2/roberta-base_sst2_snip_pruning_0.95_pruning.pt
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_snip_pruning_0.95_pruning.pt
[TRAINER] Weights loaded successfully
[TRAINER] Model Sparsity: 0.95




TRAINING STATISTICS SUMMARY

Performance Metrics:
------------------------------
  Accuracy:     87.38%

🧠 Model Information:
------------------------------
  Total Parameters:     124,647,170
  Trainable Parameters: 124,647,170
  Model Sparsity:       0.9500 (95.00%)

Training Configuration:
------------------------------
  Model:                roberta-base
  Task:                 sst2
  Learning Type:        pruning
  Batch Size:           64
  Learning Rate:        5e-05
  Optimizer:            adamw
  Epochs:               5

Pruning Configuration:
------------------------------
  Pruning Type:         snip_pruning
  Target Sparsity:      0.95
  Sparsity Scheduler:   cubic
  Recovery Epochs:      10

System Information:
------------------------------
  Device:               cuda
  Mixed Precision:      True
  Workers:              24





In [26]:
# Initializing finetuned weights path
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"
training_args = TrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 5e-5),
    pruning_type="snip_pruning",
    target_sparsity=TARGET_SPARSITY_MID,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type="pruning",
    db=False,
)
trainer = Trainer(training_args)
if False:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[TRAINER] Image size: None
[TRAINER] Initialized models
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_baseline.pt


[TRAINER] Weights loaded
[TRAINER] Optimizer type w/ learning rate: (adamw, 5e-05)
[DATALOADERS] ['train', 'validation', 'test']
[TRAINER] Data Initialized for model task: sst2
[TRAINER] Batch size: 64
[TRAINER] Number of dataloders: 2
[TRAINER] No scheduler initialized
[TRAINER] Pruning initialized
[TRAINER] Pruning type: snip_pruning
[TRAINER] Target sparsity: 0.97
[TRAINER] Sparsity scheduler: cubic
[TRAINER] Pruning epochs: 5
[TRAINER] Current sparsity: 0.0000
[TRAINER] Saving model to: ./research/roberta-base/sst2/roberta-base_sst2_snip_pruning_0.97_pruning.pt
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_snip_pruning_0.97_pruning.pt
[TRAINER] Weights loaded successfully
[TRAINER] Model Sparsity: 0.97




TRAINING STATISTICS SUMMARY

Performance Metrics:
------------------------------
  Accuracy:     85.82%

🧠 Model Information:
------------------------------
  Total Parameters:     124,647,170
  Trainable Parameters: 124,647,170
  Model Sparsity:       0.9700 (97.00%)

Training Configuration:
------------------------------
  Model:                roberta-base
  Task:                 sst2
  Learning Type:        pruning
  Batch Size:           64
  Learning Rate:        5e-05
  Optimizer:            adamw
  Epochs:               5

Pruning Configuration:
------------------------------
  Pruning Type:         snip_pruning
  Target Sparsity:      0.97
  Sparsity Scheduler:   cubic
  Recovery Epochs:      10

System Information:
------------------------------
  Device:               cuda
  Mixed Precision:      True
  Workers:              24





In [27]:
# Initializing finetuned weights path
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"
training_args = TrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 5e-5),
    pruning_type="snip_pruning",
    target_sparsity=TARGET_SPARSITY_HIGH,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type="pruning",
    db=False,
)
trainer = Trainer(training_args)
if False:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[TRAINER] Image size: None
[TRAINER] Initialized models
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_baseline.pt
[TRAINER] Weights loaded
[TRAINER] Optimizer type w/ learning rate: (adamw, 5e-05)
[DATALOADERS] ['train', 'validation', 'test']
[TRAINER] Data Initialized for model task: sst2
[TRAINER] Batch size: 64
[TRAINER] Number of dataloders: 2
[TRAINER] No scheduler initialized
[TRAINER] Pruning initialized
[TRAINER] Pruning type: snip_pruning
[TRAINER] Target sparsity: 0.99
[TRAINER] Sparsity scheduler: cubic
[TRAINER] Pruning epochs: 5
[TRAINER] Current sparsity: 0.0000
[TRAINER] Saving model to: ./research/roberta-base/sst2/roberta-base_sst2_snip_pruning_0.99_pruning.pt
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_snip_pruning_0.99_pruning.pt
[TRAINER] Weights loaded successfully
[TRAINER] Model Sparsity: 0.99




TRAINING STATISTICS SUMMARY

Performance Metrics:
------------------------------
  Accuracy:     83.41%

🧠 Model Information:
------------------------------
  Total Parameters:     124,647,170
  Trainable Parameters: 124,647,170
  Model Sparsity:       0.9900 (99.00%)

Training Configuration:
------------------------------
  Model:                roberta-base
  Task:                 sst2
  Learning Type:        pruning
  Batch Size:           64
  Learning Rate:        5e-05
  Optimizer:            adamw
  Epochs:               5

Pruning Configuration:
------------------------------
  Pruning Type:         snip_pruning
  Target Sparsity:      0.99
  Sparsity Scheduler:   cubic
  Recovery Epochs:      10

System Information:
------------------------------
  Device:               cuda
  Mixed Precision:      True
  Workers:              24
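
To make the magnitude and SNIP results easier to compare, the snippet below tabulates the accuracy drop relative to the 95.43% baseline. All numbers are copied from the summaries above. The helper `accuracy_drop` is only for illustration.

```python
BASELINE_ACC = 95.43  # dense roberta-base on sst2, from the baseline summary

# Accuracy (%) at each target sparsity, copied from the summaries above.
results = {
    "magnitude_pruning": {0.95: 84.62, 0.97: 80.77, 0.99: 78.12},
    "snip_pruning": {0.95: 87.38, 0.97: 85.82, 0.99: 83.41},
}


def accuracy_drop(accuracy: float, baseline: float = BASELINE_ACC) -> float:
    """Percentage points lost relative to the dense baseline."""
    return round(baseline - accuracy, 2)


print(f"{'sparsity':>8} | {'mag drop':>9} | {'snip drop':>9} | {'snip gain':>9}")
for sparsity in (0.95, 0.97, 0.99):
    mag = results["magnitude_pruning"][sparsity]
    snip = results["snip_pruning"][sparsity]
    print(
        f"{sparsity:>8.2f} | {accuracy_drop(mag):>9.2f} | "
        f"{accuracy_drop(snip):>9.2f} | {snip - mag:>9.2f}"
    )
```

On these runs SNIP retains more accuracy than magnitude pruning at every sparsity level, and the gap widens from 2.76 points at 95% sparsity to 5.29 points at 99%.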





### WandA Prune

In [0]:
# Initializing finetuned weights path
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"
training_args = TrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 5e-5),
    pruning_type="wanda_pruning",
    target_sparsity=TARGET_SPARSITY_LOW,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type="pruning",
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)
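
As background for the `wanda_pruning` cells, here is a small sketch of the Wanda scoring rule as it is usually described: each weight is scored by its magnitude times the L2 norm of the matching input feature over a calibration batch, and the lowest scores are pruned within each output row. This is a plain-Python illustration under those assumptions. The function `wanda_prune_row_wise` is hypothetical, and the project's own pruner may differ.

```python
import math


def wanda_prune_row_wise(weight, activations, sparsity):
    """Prune each output row of `weight` using |W_ij| * ||X_j||_2 scores.

    weight:      list of rows, shape (out_features, in_features)
    activations: list of calibration samples, shape (num_samples, in_features)
    """
    in_features = len(weight[0])
    # L2 norm of each input feature across the calibration samples.
    feature_norms = [
        math.sqrt(sum(sample[j] ** 2 for sample in activations))
        for j in range(in_features)
    ]
    num_to_prune = int(in_features * sparsity)
    pruned = []
    for row in weight:
        scores = [abs(w) * feature_norms[j] for j, w in enumerate(row)]
        # Indices of the lowest-scoring weights within this row.
        lowest = set(sorted(range(in_features), key=lambda j: scores[j])[:num_to_prune])
        pruned.append([0.0 if j in lowest else w for j, w in enumerate(row)])
    return pruned


weight = [[0.5, -0.1, 0.3, 0.2],
          [-0.4, 0.6, 0.05, -0.3]]
activations = [[0.1, 3.0, 1.0, 0.5],
               [0.2, 4.0, 1.0, 0.5]]
print(wanda_prune_row_wise(weight, activations, sparsity=0.5))
```

In this toy input the first row's largest weight (0.5) is pruned because its input feature is almost always near zero, which is the case where activation-aware scoring differs from plain magnitude pruning.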

In [0]:
# Initializing finetuned weights path
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"
training_args = TrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 5e-5),
    pruning_type="wanda_pruning",
    target_sparsity=TARGET_SPARSITY_MID,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type="pruning",
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)

In [None]:
# Initializing finetuned weights path
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"
training_args = TrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 5e-5),
    pruning_type="wanda_pruning",
    target_sparsity=TARGET_SPARSITY_HIGH,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type="pruning",
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[TRAINER] Image size: None
[TRAINER] Initialized models
[TRAINER] Loading weights: ./research/roberta-base/sst2/roberta-base_sst2_baseline.pt
[TRAINER] Weights loaded
[TRAINER] Optimizer type w/ learning rate: (adamw, 5e-05)
[DATALOADERS] ['train', 'validation', 'test']
[TRAINER] Data Initialized for model task: sst2
[TRAINER] Batch size: 64
[TRAINER] Number of dataloders: 2
[TRAINER] No scheduler initialized
[TRAINER] Pruning initialized
[TRAINER] Pruning type: wanda_pruning
[TRAINER] Target sparsity: 0.99
[TRAINER] Sparsity scheduler: cubic
[TRAINER] Pruning epochs: 5
[TRAINER] Current sparsity: 0.0000
[TRAINER] Saving model to: ./research/roberta-base/sst2/roberta-base_sst2_wanda_pruning_0.99_pruning.pt
[LOGGER] Log file created at location: ./log_records/roberta-base/sst2/pruning/wanda_pruning/0.99/run_2.log
[TRAINER] Training with mixed precision enabled
[TRAINER] Initial model sparsity: 0.0
[Pruner] Adding hooks


Training Epoch [1/5]:   0%|          | 0/1052 [00:00<?, ?it/s]


[Pruner] Cubic Sparsity ratio increased to 0.483.



Training Epoch [1/5]:   0%|          | 3/1052 [00:01<07:42,  2.27it/s, Loss=0.4245, Sparsity=0.4831]


[Pruner] Removing hooks


                                                                                                       

Recovery epoch [1/10]: Avg Loss: 0.2333 | Avg Accuracy: 90.99 | Model Sparsity: 0.4831
[TRAINER] weights saved!
Recovery epoch [1/10]: Avg Loss: 0.1408 | Avg Accuracy: 91.59 | Model Sparsity: 0.4831
[TRAINER] weights saved!
Recovery epoch [2/10]: Avg Loss: 0.1036 | Avg Accuracy: 90.38 | Model Sparsity: 0.4831
Recovery epoch [3/10]: Avg Loss: 0.0804 | Avg Accuracy: 90.75 | Model Sparsity: 0.4831
Recovery epoch [4/10]: Avg Loss: 0.0663 | Avg Accuracy: 91.23 | Model Sparsity: 0.4831
Recovery epoch [5/10]: Avg Loss: 0.0541 | Avg Accuracy: 91.59 | Model Sparsity: 0.4831
Recovery epoch [6/10]: Avg Loss: 0.0466 | Avg Accuracy: 91.83 | Model Sparsity: 0.4831
[TRAINER] weights saved!
Recovery epoch [7/10]: Avg Loss: 0.0404 | Avg Accuracy: 91.71 | Model Sparsity: 0.4831



Recovery Epoch [8/10]:  17%|█▋        | 177/1052 [00:15<01:15, 11.57it/s, Loss=0.0285, Sparsity=0.4831]
Exception in thread Thread-92 (_pin_memory_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 772, in run_closure
    _threading_Thread_run(self)
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 61, in _pin_memory_loop
    do_one_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 37, in do_one_step
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocess

KeyboardInterrupt: 
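
The `[Pruner] Cubic Sparsity ratio increased to 0.483.` line in the log above is consistent with a standard cubic schedule, in which sparsity after epoch `t` of `T` is `target * (1 - (1 - t/T)^3)`. The sketch below reproduces that value. The formula is inferred from the logged numbers rather than taken from the `unstructured_pruning` module, and the name `cubic_sparsity` is hypothetical.

```python
def cubic_sparsity(epoch: int, total_epochs: int, target_sparsity: float) -> float:
    """Sparsity reached after `epoch` of `total_epochs` pruning epochs."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return target_sparsity * (1.0 - (1.0 - progress) ** 3)


# Target sparsity 0.99 over 5 pruning epochs, as in the run above.
for epoch in range(1, 6):
    print(f"epoch {epoch}/5: sparsity = {cubic_sparsity(epoch, 5, 0.99):.4f}")
```

The first epoch gives 0.4831, matching the logged model sparsity. Because the run was interrupted during the first epoch's recovery phase, sparsity never advanced beyond that first step.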

## BaCP Accuracies

### Magnitude Pruning

In [0]:
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"

bacp_training_args = BaCPTrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 1e-5),
    pruning_type="magnitude_pruning",
    target_sparsity=TARGET_SPARSITY_LOW,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type='bacp_pruning',
    db=False,
)
bacp_trainer = BaCPTrainer(bacp_training_args)
if TRAIN:
    bacp_trainer.train()

# Finetuning Phase
bacp_trainer.generate_mask_from_model()
training_args = TrainingArguments(
    model_name=bacp_trainer.model_name,
    model_task=bacp_trainer.model_task,
    batch_size=bacp_trainer.batch_size,
    optimizer_type_and_lr=('adamw', 2e-5),
    pruner=bacp_trainer.get_pruner(),
    pruning_type=bacp_trainer.pruning_type,
    target_sparsity=bacp_trainer.target_sparsity,
    finetuned_weights=bacp_trainer.save_path,
    finetune=True,
    learning_type="bacp_finetune",
    epochs=10,
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)


In [0]:
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"

bacp_training_args = BaCPTrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 1e-5),
    pruning_type="magnitude_pruning",
    target_sparsity=TARGET_SPARSITY_MID,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type='bacp_pruning',
    db=False,
)
bacp_trainer = BaCPTrainer(bacp_training_args)
if TRAIN:
    bacp_trainer.train()

# Finetuning Phase
bacp_trainer.generate_mask_from_model()
training_args = TrainingArguments(
    model_name=bacp_trainer.model_name,
    model_task=bacp_trainer.model_task,
    batch_size=bacp_trainer.batch_size,
    optimizer_type_and_lr=('adamw', 2e-5),
    pruner=bacp_trainer.get_pruner(),
    pruning_type=bacp_trainer.pruning_type,
    target_sparsity=bacp_trainer.target_sparsity,
    finetuned_weights=bacp_trainer.save_path,
    finetune=True,
    learning_type="bacp_finetune",
    epochs=10,
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)


In [0]:
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"

bacp_training_args = BaCPTrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 1e-5),
    pruning_type="magnitude_pruning",
    target_sparsity=TARGET_SPARSITY_HIGH,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type='bacp_pruning',
    db=False,
)
bacp_trainer = BaCPTrainer(bacp_training_args)
if TRAIN:
    bacp_trainer.train()

# Finetuning Phase
bacp_trainer.generate_mask_from_model()
training_args = TrainingArguments(
    model_name=bacp_trainer.model_name,
    model_task=bacp_trainer.model_task,
    batch_size=bacp_trainer.batch_size,
    optimizer_type_and_lr=('adamw', 2e-5),
    pruner=bacp_trainer.get_pruner(),
    pruning_type=bacp_trainer.pruning_type,
    target_sparsity=bacp_trainer.target_sparsity,
    finetuned_weights=bacp_trainer.save_path,
    finetune=True,
    learning_type="bacp_finetune",
    epochs=10,
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)


### SNIP-it Pruning

In [0]:
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"

bacp_training_args = BaCPTrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 1e-5),
    pruning_type="snip_pruning",
    target_sparsity=TARGET_SPARSITY_LOW,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type='bacp_pruning',
    db=False,
)
bacp_trainer = BaCPTrainer(bacp_training_args)
if TRAIN:
    bacp_trainer.train()

# Finetuning Phase
bacp_trainer.generate_mask_from_model()
training_args = TrainingArguments(
    model_name=bacp_trainer.model_name,
    model_task=bacp_trainer.model_task,
    batch_size=bacp_trainer.batch_size,
    optimizer_type_and_lr=('adamw', 2e-5),
    pruner=bacp_trainer.get_pruner(),
    pruning_type=bacp_trainer.pruning_type,
    target_sparsity=bacp_trainer.target_sparsity,
    finetuned_weights=bacp_trainer.save_path,
    finetune=True,
    learning_type="bacp_finetune",
    epochs=10,
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)


Exception ignored in: <function _ConnectionBase.__del__ at 0x7d08b6319fc0>
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 132, in __del__
    self._close()
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 361, in _close
    _close(self._handle)
OSError: [Errno 9] Bad file descriptor
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[TRAINER] Image size: None
[TRAINER] Weights loaded successfully
[TRAINER] Initialized BaCP models
[TRAINER] Optimizer type w/ learning rate: (adamw, 1e-05)
[TRAINER] No scheduler initialized
[DATALOADERS] ['train', 'validation', 'test']
[TRAINER] Data Initialized for model task: sst2
[TRAINER] Batch size: 64
[TRAINER] Number of dataloders: 2
[TRAINER] Pruning initialized
[TRAINER] Pruning type: snip_pruning
[TRAINER] Target sparsity: 0.95
[TRAINER] Sparsity scheduler: cubic
[TRAINER] Pruning epochs: 5
[TRAINER] Current sparsity: 0.0000
[TRAINER] Saving model to: ./research/roberta-base/sst2/roberta-base_sst2_snip_pruning_0.95_bacp_pruning.pt
[LOGGER] Log file created at location: ./log_records/roberta-base/sst2/bacp_pruning/snip_pruning/0.95/run_1.log


Training Epoch [1/5]:   0%|          | 0/1052 [00:00<?, ?it/s]


[Pruner] Cubic Sparsity ratio increased to 0.464.

Epoch [1/5]: Avg Total Loss: 5.8722 | Avg PrC Loss: 2.4632 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 3.2346 | Avg CE Loss: 0.1743 | Model Sparsity: 0.4636

[BaCP] weights saved!

Retraining Epoch [1/10]: Avg Total Loss: 5.5358 | Avg PrC Loss: 2.5841 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.7845 | Avg CE Loss: 0.1672 | Model Sparsity: 0.4636
Retraining Epoch [2/10]: Avg Total Loss: 5.3688 | Avg PrC Loss: 2.6226 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.6006 | Avg CE Loss: 0.1456 | Model Sparsity: 0.4636
Retraining Epoch [3/10]: Avg Total Loss: 5.2679 | Avg PrC Loss: 2.6319 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.5123 | Avg CE Loss: 0.1236 | Model Sparsity: 0.4636
Retraining Epoch [4/10]: Avg Total Loss: 5.2010 | Avg PrC Loss: 2.6350 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.4574 | Avg CE Loss: 0.1086 | Model Sparsity: 0.4636
Retraining Epoch [5/10]: Avg Total Loss: 5.1540 | Avg PrC Loss: 2.6362 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.4191 | Avg CE Loss: 0.0987 | Model Sparsity: 0.4636
Retraining Epoch [6/10]: Avg Total Loss: 5.1164 | Avg PrC Loss: 2.6366 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.3880 | Avg CE Loss: 0.0918 | Model Sparsity: 0.4636
Retraining Epoch [7/10]: Avg Total Loss: 5.0863 | Avg PrC Loss: 2.6366 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.3633 | Avg CE Loss: 0.0863 | Model Sparsity: 0.4636
Retraining Epoch [8/10]: Avg Total Loss: 5.0617 | Avg PrC Loss: 2.6367 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.3426 | Avg CE Loss: 0.0823 | Model Sparsity: 0.4636
Retraining Epoch [9/10]: Avg Total Loss: 5.0405 | Avg PrC Loss: 2.6370 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.3250 | Avg CE Loss: 0.0785 | Model Sparsity: 0.4636
Retraining Epoch [10/10]: Avg Total Loss: 5.0221 | Avg PrC Loss: 2.6368 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.3100 | Avg CE Loss: 0.0753 | Model Sparsity: 0.4636



Training Epoch [2/5]:   0%|          | 0/1052 [00:00<?, ?it/s]


[Pruner] Cubic Sparsity ratio increased to 0.745.

Epoch [2/5]: Avg Total Loss: 8.7748 | Avg PrC Loss: 2.7140 | Avg SnC Loss: 3.6870 | Avg FiC Loss: 2.2975 | Avg CE Loss: 0.0762 | Model Sparsity: 0.7448

[BaCP] weights saved!

Retraining Epoch [1/10]: Avg Total Loss: 8.6120 | Avg PrC Loss: 2.7464 | Avg SnC Loss: 3.4508 | Avg FiC Loss: 2.3369 | Avg CE Loss: 0.0779 | Model Sparsity: 0.7448
Retraining Epoch [2/10]: Avg Total Loss: 8.5390 | Avg PrC Loss: 2.7532 | Avg SnC Loss: 3.3443 | Avg FiC Loss: 2.3637 | Avg CE Loss: 0.0778 | Model Sparsity: 0.7448
Retraining Epoch [3/10]: Avg Total Loss: 8.4886 | Avg PrC Loss: 2.7557 | Avg SnC Loss: 3.2762 | Avg FiC Loss: 2.3798 | Avg CE Loss: 0.0769 | Model Sparsity: 0.7448
Retraining Epoch [4/10]: Avg Total Loss: 8.4526 | Avg PrC Loss: 2.7568 | Avg SnC Loss: 3.2275 | Avg FiC Loss: 2.3922 | Avg CE Loss: 0.0761 | Model Sparsity: 0.7448
Retraining Epoch [5/10]: Avg Total Loss: 8.4208 | Avg PrC Loss: 2.7576 | Avg SnC Loss: 3.1895 | Avg FiC Loss: 2.3987 | Avg CE Loss: 0.0750 | Model Sparsity: 0.7448
Retraining Epoch [6/10]: Avg Total Loss: 8.3956 | Avg PrC Loss: 2.7583 | Avg SnC Loss: 3.1605 | Avg FiC Loss: 2.4028 | Avg CE Loss: 0.0739 | Model Sparsity: 0.7448
Retraining Epoch [7/10]: Avg Total Loss: 8.3753 | Avg PrC Loss: 2.7591 | Avg SnC Loss: 3.1370 | Avg FiC Loss: 2.4062 | Avg CE Loss: 0.0730 | Model Sparsity: 0.7448
Retraining Epoch [8/10]: Avg Total Loss: 8.3571 | Avg PrC Loss: 2.7596 | Avg SnC Loss: 3.1171 | Avg FiC Loss: 2.4081 | Avg CE Loss: 0.0723 | Model Sparsity: 0.7448



Retraining epoch [9/10]:  29%|██▉       | 304/1052 [00:43<01:47,  6.98it/s, Loss=0.0608, PrC Loss=2.75, SnC Loss=3.09, FiC Loss=2.36, CE Loss=0.0608]
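
The `[Pruner] Cubic Sparsity ratio increased to ...` lines above are consistent with a standard cubic ramp, `s_t = s_target * (1 - (1 - t/T)^3)`, evaluated once per pruning epoch. The sketch below is an assumption inferred from the logged numbers, not the pruner's actual code; `cubic_sparsity` is a hypothetical helper name.

```python
def cubic_sparsity(step: int, total_steps: int, target_sparsity: float) -> float:
    """Sparsity after `step` of `total_steps` pruning steps on a cubic ramp.

    Assumed form: s_t = s_target * (1 - (1 - t/T)^3). This is inferred from the
    values printed in the log, not taken from the project's pruner source.
    """
    if not 1 <= step <= total_steps:
        raise ValueError("step must be in [1, total_steps]")
    remaining = 1.0 - step / total_steps
    return target_sparsity * (1.0 - remaining ** 3)


# Target sparsity 0.95 over 5 pruning epochs, as reported by the trainer above.
for epoch in range(1, 6):
    print(epoch, round(cubic_sparsity(epoch, 5, 0.95), 4))
```

Under this assumption, epochs 1 and 2 give 0.4636 and 0.7448, which agree with the `Model Sparsity` values logged for the 0.95 run.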

In [0]:
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"

bacp_training_args = BaCPTrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 1e-5),
    pruning_type="snip_pruning",
    target_sparsity=TARGET_SPARSITY_MID,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type='bacp_pruning',
    db=False,
)
bacp_trainer = BaCPTrainer(bacp_training_args)
if TRAIN:
    bacp_trainer.train()

# Finetuning Phase
bacp_trainer.generate_mask_from_model()
training_args = TrainingArguments(
    model_name=bacp_trainer.model_name,
    model_task=bacp_trainer.model_task,
    batch_size=bacp_trainer.batch_size,
    optimizer_type_and_lr=('adamw', 2e-5),
    pruner=bacp_trainer.get_pruner(),
    pruning_type=bacp_trainer.pruning_type,
    target_sparsity=bacp_trainer.target_sparsity,
    finetuned_weights=bacp_trainer.save_path,
    finetune=True,
    learning_type="bacp_finetune",
    epochs=10,
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)


In [0]:
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"

bacp_training_args = BaCPTrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 1e-5),
    pruning_type="snip_pruning",
    target_sparsity=TARGET_SPARSITY_HIGH,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type='bacp_pruning',
    db=False,
)
bacp_trainer = BaCPTrainer(bacp_training_args)
if TRAIN:
    bacp_trainer.train()

# Finetuning Phase
bacp_trainer.generate_mask_from_model()
training_args = TrainingArguments(
    model_name=bacp_trainer.model_name,
    model_task=bacp_trainer.model_task,
    batch_size=bacp_trainer.batch_size,
    optimizer_type_and_lr=('adamw', 2e-5),
    pruner=bacp_trainer.get_pruner(),
    pruning_type=bacp_trainer.pruning_type,
    target_sparsity=bacp_trainer.target_sparsity,
    finetuned_weights=bacp_trainer.save_path,
    finetune=True,
    learning_type="bacp_finetune",
    epochs=10,
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)
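
The pruning cells in this notebook differ only in `pruning_type` and `target_sparsity`, so the argument sets could be generated in a loop instead of copied. The following is a minimal sketch of that idea: `baseline_checkpoint` and `pruning_sweep` are hypothetical helpers, and the batch size (64) and sparsity values (0.95, 0.99) are the ones printed in this notebook's logs rather than values read from `constants`.

```python
from itertools import product


def baseline_checkpoint(model_name: str, model_task: str, root: str = "./research") -> str:
    """Baseline checkpoint path, mirroring the f-string used in the cells above."""
    return f"{root}/{model_name}/{model_task}/{model_name}_{model_task}_baseline.pt"


def pruning_sweep(model_name, model_task, batch_size, pruning_types, sparsities):
    """Yield one keyword-argument dict per (pruning_type, target_sparsity) pair."""
    for pruning_type, target_sparsity in product(pruning_types, sparsities):
        yield {
            "model_name": model_name,
            "model_task": model_task,
            "batch_size": batch_size,
            "optimizer_type_and_lr": ("adamw", 1e-5),
            "pruning_type": pruning_type,
            "target_sparsity": target_sparsity,
            "sparsity_scheduler": "cubic",
            "finetuned_weights": baseline_checkpoint(model_name, model_task),
            "learning_type": "bacp_pruning",
            "db": False,
        }


configs = list(
    pruning_sweep("roberta-base", "sst2", 64, ["snip_pruning", "wanda_pruning"], [0.95, 0.99])
)
for cfg in configs:
    # In the notebook, each dict would be consumed roughly as:
    #   bacp_trainer = BaCPTrainer(BaCPTrainingArguments(**cfg))
    print(cfg["pruning_type"], cfg["target_sparsity"])
```

The fine-tuning phase is left out of the sketch because its arguments are read from the `bacp_trainer` instance after pruning.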


### Wanda Pruning

In [0]:
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"

bacp_training_args = BaCPTrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 1e-5),
    pruning_type="wanda_pruning",
    target_sparsity=TARGET_SPARSITY_LOW,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type='bacp_pruning',
    db=False,
)
bacp_trainer = BaCPTrainer(bacp_training_args)
if TRAIN:
    bacp_trainer.train()

# Finetuning Phase
bacp_trainer.generate_mask_from_model()
training_args = TrainingArguments(
    model_name=bacp_trainer.model_name,
    model_task=bacp_trainer.model_task,
    batch_size=bacp_trainer.batch_size,
    optimizer_type_and_lr=('adamw', 2e-5),
    pruner=bacp_trainer.get_pruner(),
    pruning_type=bacp_trainer.pruning_type,
    target_sparsity=bacp_trainer.target_sparsity,
    finetuned_weights=bacp_trainer.save_path,
    finetune=True,
    learning_type="bacp_finetune",
    epochs=10,
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)


In [0]:
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"

bacp_training_args = BaCPTrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 1e-5),
    pruning_type="wanda_pruning",
    target_sparsity=TARGET_SPARSITY_MID,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type='bacp_pruning',
    db=False,
)
bacp_trainer = BaCPTrainer(bacp_training_args)
if TRAIN:
    bacp_trainer.train()

# Finetuning Phase
bacp_trainer.generate_mask_from_model()
training_args = TrainingArguments(
    model_name=bacp_trainer.model_name,
    model_task=bacp_trainer.model_task,
    batch_size=bacp_trainer.batch_size,
    optimizer_type_and_lr=('adamw', 2e-5),
    pruner=bacp_trainer.get_pruner(),
    pruning_type=bacp_trainer.pruning_type,
    target_sparsity=bacp_trainer.target_sparsity,
    finetuned_weights=bacp_trainer.save_path,
    finetune=True,
    learning_type="bacp_finetune",
    epochs=10,
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)


In [28]:
finetuned_weights = f"./research/{MODEL_NAME}/{MODEL_TASK}/{MODEL_NAME}_{MODEL_TASK}_baseline.pt"

bacp_training_args = BaCPTrainingArguments(
    model_name=MODEL_NAME,
    model_task=MODEL_TASK,
    batch_size=BATCH_SIZE_LLM,
    optimizer_type_and_lr=('adamw', 1e-5),
    pruning_type="wanda_pruning",
    target_sparsity=TARGET_SPARSITY_HIGH,
    sparsity_scheduler='cubic',
    finetuned_weights=finetuned_weights,
    learning_type='bacp_pruning',
    db=False,
)
bacp_trainer = BaCPTrainer(bacp_training_args)
if TRAIN:
    bacp_trainer.train()

# Finetuning Phase
bacp_trainer.generate_mask_from_model()
training_args = TrainingArguments(
    model_name=bacp_trainer.model_name,
    model_task=bacp_trainer.model_task,
    batch_size=bacp_trainer.batch_size,
    optimizer_type_and_lr=('adamw', 2e-5),
    pruner=bacp_trainer.get_pruner(),
    pruning_type=bacp_trainer.pruning_type,
    target_sparsity=bacp_trainer.target_sparsity,
    finetuned_weights=bacp_trainer.save_path,
    finetune=True,
    learning_type="bacp_finetune",
    epochs=10,
    db=False,
)
trainer = Trainer(training_args)
if TRAIN:
    trainer.train()

metrics = trainer.evaluate()
print_statistics(metrics, trainer)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[TRAINER] Image size: None
[TRAINER] Weights loaded successfully
[TRAINER] Initialized BaCP models
[TRAINER] Optimizer type w/ learning rate: (adamw, 1e-05)
[TRAINER] No scheduler initialized
[DATALOADERS] ['train', 'validation', 'test']
[TRAINER] Data Initialized for model task: sst2
[TRAINER] Batch size: 64
[TRAINER] Number of dataloders: 2
[TRAINER] Pruning initialized
[TRAINER] Pruning type: wanda_pruning
[TRAINER] Target sparsity: 0.99
[TRAINER] Sparsity scheduler: cubic
[TRAINER] Pruning epochs: 5
[TRAINER] Current sparsity: 0.0000
[TRAINER] Saving model to: ./research/roberta-base/sst2/roberta-base_sst2_wanda_pruning_0.99_bacp_pruning.pt
[LOGGER] Log file created at location: ./log_records/roberta-base/sst2/bacp_pruning/wanda_pruning/0.99/run_1.log
[Pruner] Adding hooks


Training Epoch [1/5]:   0%|          | 0/1052 [00:00<?, ?it/s]


[Pruner] Cubic Sparsity ratio increased to 0.483.



Training Epoch [1/5]:   0%|          | 2/1052 [00:01<12:57,  1.35it/s, Loss=0.171, PrC Loss=2.42, SnC Loss=0, FiC Loss=3.45, CE Loss=0.171]


[Pruner] Removing hooks

Epoch [1/5]: Avg Total Loss: 5.8682 | Avg PrC Loss: 2.4634 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 3.2323 | Avg CE Loss: 0.1726 | Model Sparsity: 0.4831

[BaCP] weights saved!

Retraining Epoch [1/10]: Avg Total Loss: 5.5396 | Avg PrC Loss: 2.5838 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.7851 | Avg CE Loss: 0.1707 | Model Sparsity: 0.4831
Retraining Epoch [2/10]: Avg Total Loss: 5.3799 | Avg PrC Loss: 2.6224 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.6015 | Avg CE Loss: 0.1560 | Model Sparsity: 0.4831
Retraining Epoch [3/10]: Avg Total Loss: 5.2802 | Avg PrC Loss: 2.6330 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.5115 | Avg CE Loss: 0.1357 | Model Sparsity: 0.4831
Retraining Epoch [4/10]: Avg Total Loss: 5.2096 | Avg PrC Loss: 2.6374 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.4542 | Avg CE Loss: 0.1180 | Model Sparsity: 0.4831
Retraining Epoch [5/10]: Avg Total Loss: 5.1573 | Avg PrC Loss: 2.6380 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.4145 | Avg CE Loss: 0.1047 | Model Sparsity: 0.4831
Retraining Epoch [6/10]: Avg Total Loss: 5.1185 | Avg PrC Loss: 2.6384 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.3847 | Avg CE Loss: 0.0955 | Model Sparsity: 0.4831
Retraining Epoch [7/10]: Avg Total Loss: 5.0851 | Avg PrC Loss: 2.6380 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.3588 | Avg CE Loss: 0.0883 | Model Sparsity: 0.4831
Retraining Epoch [8/10]: Avg Total Loss: 5.0602 | Avg PrC Loss: 2.6375 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.3390 | Avg CE Loss: 0.0836 | Model Sparsity: 0.4831
Retraining Epoch [9/10]: Avg Total Loss: 5.0386 | Avg PrC Loss: 2.6372 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.3218 | Avg CE Loss: 0.0795 | Model Sparsity: 0.4831
Retraining Epoch [10/10]: Avg Total Loss: 5.0194 | Avg PrC Loss: 2.6371 | Avg SnC Loss: 0.0000 | Avg FiC Loss: 2.3063 | Avg CE Loss: 0.0761 | Model Sparsity: 0.4831

[Pruner] Adding hooks


Training Epoch [2/5]:   0%|          | 0/1052 [00:00<?, ?it/s]


[Pruner] Cubic Sparsity ratio increased to 0.776.



Training Epoch [2/5]:   0%|          | 2/1052 [00:01<08:49,  1.98it/s, Loss=0.0649, PrC Loss=2.62, SnC Loss=3.85, FiC Loss=2.25, CE Loss=0.0649]


[Pruner] Removing hooks

Epoch [2/5]: Avg Total Loss: 8.6467 | Avg PrC Loss: 2.7512 | Avg SnC Loss: 3.5336 | Avg FiC Loss: 2.2777 | Avg CE Loss: 0.0841 | Model Sparsity: 0.7762

[BaCP] weights saved!

Retraining Epoch [1/10]: Avg Total Loss: 8.5347 | Avg PrC Loss: 2.7857 | Avg SnC Loss: 3.3602 | Avg FiC Loss: 2.2980 | Avg CE Loss: 0.0908 | Model Sparsity: 0.7762
Retraining Epoch [2/10]: Avg Total Loss: 8.4873 | Avg PrC Loss: 2.7923 | Avg SnC Loss: 3.2914 | Avg FiC Loss: 2.3097 | Avg CE Loss: 0.0940 | Model Sparsity: 0.7762
Retraining Epoch [3/10]: Avg Total Loss: 8.4523 | Avg PrC Loss: 2.7958 | Avg SnC Loss: 3.2445 | Avg FiC Loss: 2.3161 | Avg CE Loss: 0.0959 | Model Sparsity: 0.7762
Retraining Epoch [4/10]: Avg Total Loss: 8.4260 | Avg PrC Loss: 2.7975 | Avg SnC Loss: 3.2079 | Avg FiC Loss: 2.3222 | Avg CE Loss: 0.0983 | Model Sparsity: 0.7762
Retraining Epoch [5/10]: Avg Total Loss: 8.4030 | Avg PrC Loss: 2.7997 | Avg SnC Loss: 3.1772 | Avg FiC Loss: 2.3253 | Avg CE Loss: 0.1009 | Model Sparsity: 0.7762



                                                                                                                                                      

KeyboardInterrupt: 
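
Because both runs above were interrupted before `print_statistics` ran, the per-epoch summary lines are the only record of these experiments. A small parser makes them easier to compare. This is a sketch that assumes the exact `Name: value | Name: value` layout printed above; `parse_epoch_line` and `component_sum` are hypothetical helpers, and the observation that the total equals the sum of the four components comes from the printed numbers, not from the BaCP source.

```python
import re

# Matches "Epoch [1/5]: ..." and "Retraining Epoch [3/10]: ..." summary lines.
# tqdm progress lines ("Training Epoch [1/5]:   0%|...") do not match.
LINE_RE = re.compile(
    r"^(?P<phase>Retraining Epoch|Epoch) \[(?P<epoch>\d+)/(?P<total>\d+)\]: (?P<body>.+)$"
)


def parse_epoch_line(line: str):
    """Return a dict of the metrics on one summary line, or None if it is not one."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = {
        "phase": match.group("phase"),
        "epoch": int(match.group("epoch")),
        "total_epochs": int(match.group("total")),
    }
    for part in match.group("body").split(" | "):
        name, sep, value = part.partition(": ")
        if not sep:
            return None
        try:
            record[name.strip()] = float(value)
        except ValueError:
            return None
    return record


def component_sum(record: dict) -> float:
    """Sum of the four logged loss components (PrC, SnC, FiC, CE)."""
    keys = ("Avg PrC Loss", "Avg SnC Loss", "Avg FiC Loss", "Avg CE Loss")
    return sum(record[key] for key in keys)


sample = (
    "Epoch [2/5]: Avg Total Loss: 8.6467 | Avg PrC Loss: 2.7512 | "
    "Avg SnC Loss: 3.5336 | Avg FiC Loss: 2.2777 | Avg CE Loss: 0.0841 | "
    "Model Sparsity: 0.7762"
)
record = parse_epoch_line(sample)
print(record["Avg Total Loss"], round(component_sum(record), 4))
```

For the lines in this log, the component sum agrees with `Avg Total Loss` to within the rounding of the printed values.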