<a href="https://colab.research.google.com/github/Birkbeck/msc-projects-2023-4-Gabriele_Monti_PEFT/blob/main/prefix_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook demonstrates prefix tuning, a parameter-efficient fine-tuning method, using the Hugging Face `transformers`, `datasets`, and `peft` libraries. In the first part, a t5-large model is trained on the Financial PhraseBank dataset (`sentences_allagree` subset) to classify the sentiment of financial news sentences. The task is framed as text generation: the model learns to output the label word (negative, neutral, or positive). The notebook covers environment setup, loading and preprocessing the dataset, configuring prefix tuning, training, and evaluating the predictions. Only the prefix parameters (20 virtual tokens, about 0.13% of all parameters) are trained while the base model stays frozen, so the optimizer state and the saved adapter are much smaller than in full fine-tuning.

The code is adapted from the PEFT task guide on prefix tuning for sequence-to-sequence models:

https://huggingface.co/docs/peft/main/en/task_guides/seq2seq-prefix-tuning

In [2]:
!pip install -q peft transformers datasets


In [5]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, default_data_collator, get_linear_schedule_with_warmup
from peft import get_peft_config, get_peft_model, get_peft_model_state_dict, PrefixTuningConfig, TaskType
from datasets import load_dataset
from torch.utils.data import DataLoader
from tqdm import tqdm
import torch
import os

# Set environment variables to control tokenizer parallelism and specify GPU to use
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Use the GPU when available, otherwise fall back to the CPU (training t5-large on CPU is very slow)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Specify the model and tokenizer paths
model_name_or_path = "t5-large"
tokenizer_name_or_path = "t5-large"

# Define the column names for input text and labels in the dataset
text_column = "sentence"
label_column = "text_label"

# Set the maximum sequence length for tokenization
max_length = 128

# Define the learning rate
lr = 1e-2

# Number of training epochs
num_epochs = 5

# Batch size for training
batch_size = 8


In [3]:
from datasets import load_dataset

# Load the financial_phrasebank dataset with all agreed sentences
dataset = load_dataset("financial_phrasebank", "sentences_allagree", trust_remote_code=True)

# Split the train dataset into train and validation sets (90% train, 10% validation)
dataset = dataset["train"].train_test_split(test_size=0.1)

# Rename the test split to validation
dataset["validation"] = dataset["test"]
del dataset["test"]

# Get the class names for the labels
classes = dataset["train"].features["label"].names

# Map the numeric labels to their corresponding text labels
dataset = dataset.map(
    lambda x: {"text_label": [classes[label] for label in x["label"]]},
    batched=True,
    num_proc=1,
)

# Display the first example from the training set
dataset["train"][0]



{'sentence': 'Profit before taxes was EUR 4.0 mn , down from EUR 4.9 mn .',
 'label': 0,
 'text_label': 'negative'}
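The batched `map` call above converts each integer label into its class name, so that T5 can be trained to generate the label as text. A minimal stand-alone sketch of that conversion, assuming the class order `['negative', 'neutral', 'positive']` (consistent with label `0` mapping to `'negative'` in the output above); the second sentence in the toy batch is made up for illustration:

```python
# Assumed stand-in for dataset["train"].features["label"].names
classes = ["negative", "neutral", "positive"]

def add_text_labels(batch):
    """Mimic the batched map: a batch is a dict mapping column name -> list of values."""
    return {"text_label": [classes[label] for label in batch["label"]]}

batch = {
    "sentence": [
        "Profit before taxes was EUR 4.0 mn , down from EUR 4.9 mn .",
        "Sales rose 10 % .",  # hypothetical example sentence
    ],
    "label": [0, 2],
}
print(add_text_labels(batch))  # {'text_label': ['negative', 'positive']}
```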

In [8]:
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

def preprocess_function(examples):
    # Extract input sentences and target labels from the examples
    inputs = examples[text_column]
    targets = examples[label_column]

    # Tokenize the input sentences with specified max length, padding, and truncation
    model_inputs = tokenizer(inputs, max_length=max_length, padding="max_length", truncation=True, return_tensors="pt")

    # Tokenize the target labels with specified max length, padding, and truncation
    labels = tokenizer(targets, max_length=2, padding="max_length", truncation=True, return_tensors="pt")

    # Extract the input_ids for labels
    labels = labels["input_ids"]

    # Replace padding token ids in labels with -100 to ignore them during loss computation
    labels[labels == tokenizer.pad_token_id] = -100

    # Add the processed labels to the model inputs
    model_inputs["labels"] = labels

    return model_inputs


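Replacing the pad id with `-100` matters because the loss used by T5 is PyTorch's cross-entropy, whose default `ignore_index` is `-100`, so those positions are skipped. A small sketch with toy tensors (the token ids, vocabulary size, and random logits are made up for illustration, and T5's pad id `0` is assumed) showing that the masked loss equals the loss computed on the non-padded positions alone:

```python
import torch
import torch.nn.functional as F

pad_token_id = 0   # T5's pad token id (assumed here)
vocab_size = 10    # toy vocabulary size, for illustration only

# Two toy label sequences of length 2; the second one ends with padding
labels = torch.tensor([[4, 1], [7, 0]])
masked = labels.clone()
masked[masked == pad_token_id] = -100  # same replacement as in preprocess_function

torch.manual_seed(0)
logits = torch.randn(2, 2, vocab_size)

# cross_entropy ignores targets equal to ignore_index (default -100)
loss_masked = F.cross_entropy(logits.view(-1, vocab_size), masked.view(-1))

# Same loss computed explicitly over the non-padded positions only
keep = masked.view(-1) != -100
loss_manual = F.cross_entropy(logits.view(-1, vocab_size)[keep], masked.view(-1)[keep])
print(loss_masked.item(), loss_manual.item())
```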

In [5]:
processed_datasets = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=1,
    remove_columns=dataset["train"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)


In [6]:
processed_datasets["train"][0]

{'input_ids': [1377, 13, 8, 12190, 56, 281, 12, 8, 23427, 138, 26, 23, 384, 3, 5, 1, 0, 0, 0, ..., 0],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, ..., 0],
 ...}
(Output abridged and truncated: 16 token ids, the last being T5's end-of-sequence id 1, followed by the pad id 0 up to max_length=128; the attention mask is 1 on the real tokens and 0 on the padding.)

In [7]:
#data loader

train_dataset = processed_datasets["train"]
eval_dataset = processed_datasets["validation"]

# Create a DataLoader for the training dataset
train_dataloader = DataLoader(
    train_dataset,
    shuffle=True,                # Shuffle the dataset to ensure random sampling of data during training
    collate_fn=default_data_collator, # Stack the already-padded features into batch tensors
    batch_size=batch_size,       # Set the batch size for training
    pin_memory=True              # Pin memory to speed up data transfer to GPU
)

# Create a DataLoader for the evaluation dataset
eval_dataloader = DataLoader(
    eval_dataset,
    collate_fn=default_data_collator, # Stack the already-padded features into batch tensors
    batch_size=batch_size,       # Set the batch size for evaluation
    pin_memory=True              # Pin memory to speed up data transfer to GPU
)
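The number of optimizer steps per epoch follows directly from the split sizes and `batch_size = 8`. A small check, assuming the DataLoader's default `drop_last=False` (so a final partial batch is kept); the results match the 255 training and 29 validation iterations shown in the progress bars of the training loop:

```python
import math

def num_batches(num_examples: int, batch_size: int) -> int:
    """Batches per epoch for a DataLoader with drop_last=False (the default)."""
    return math.ceil(num_examples / batch_size)

batch_size = 8
print(num_batches(2037, batch_size))  # 255 (training split)
print(num_batches(227, batch_size))   # 29 (validation split)
```

The same arithmetic gives 2010 and 503 batches for the 16,078 / 4,020 summarization splits used later in the notebook.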


In [8]:
# Create the prefix-tuning model: 20 trainable virtual tokens, while the T5 weights stay frozen

peft_config = PrefixTuningConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, num_virtual_tokens=20)

model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()


trainable params: 983,040 || all params: 738,651,136 || trainable%: 0.1331
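The trainable-parameter count can be checked by hand. This sketch assumes that PEFT's prefix encoder (without the optional projection MLP, which `PrefixTuningConfig` leaves off by default) learns one key vector and one value vector of size `d_model` per virtual token per layer, and takes t5-large's `num_layers = 24` and `d_model = 1024`:

```python
num_virtual_tokens = 20   # from PrefixTuningConfig above
num_layers = 24           # t5-large config.num_layers (assumed)
d_model = 1024            # t5-large config.d_model (assumed)

# One key and one value vector per virtual token per layer
trainable = num_virtual_tokens * num_layers * 2 * d_model
total = 738_651_136       # "all params" reported by print_trainable_parameters()

print(trainable)                          # 983040
print(total - trainable)                  # 737668096 frozen base-model parameters
print(round(100 * trainable / total, 4))  # 0.1331
```

At 4 bytes per float32 value, the prefix takes about 3.9 MB, which is roughly the size to expect for the saved adapter.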



In [9]:
# AdamW optimizer with a linear learning-rate decay to 0 and no warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=(len(train_dataloader) * num_epochs),
)
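With `num_warmup_steps=0`, the schedule decays the learning rate linearly from `lr` to 0 over `len(train_dataloader) * num_epochs` steps (255 × 5 = 1275 for this run). The function below is a plain-Python re-implementation of that multiplier, written for illustration; it is believed to match the lambda that `get_linear_schedule_with_warmup` applies, but it is not the library code:

```python
def linear_schedule_factor(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Multiplier on the base lr: linear warmup, then linear decay to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))

base_lr = 1e-2
total_steps = 255 * 5  # len(train_dataloader) * num_epochs for the sentiment run
for step in (0, 255, 1275):
    print(step, base_lr * linear_schedule_factor(step, 0, total_steps))
```

After the first epoch (step 255), the learning rate has already dropped to 80% of its initial value.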

In [10]:

# Training loop

model = model.to(device)

for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for step, batch in enumerate(tqdm(train_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)  # the batch contains labels, so outputs.loss is the cross-entropy
        loss = outputs.loss
        total_loss += loss.detach().float()
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    model.eval()
    eval_loss = 0
    eval_preds = []
    for step, batch in enumerate(tqdm(eval_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)
        loss = outputs.loss
        eval_loss += loss.detach().float()
        # Argmax over teacher-forced logits (not model.generate); only the final
        # epoch's predictions are kept, since eval_preds is reset every epoch
        eval_preds.extend(
            tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)
        )

    # Epoch averages of the per-batch losses; perplexity is exp(mean loss)
    eval_epoch_loss = eval_loss / len(eval_dataloader)
    eval_ppl = torch.exp(eval_epoch_loss)
    train_epoch_loss = total_loss / len(train_dataloader)
    train_ppl = torch.exp(train_epoch_loss)
    print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")

100%|██████████| 255/255 [00:36<00:00,  6.89it/s]
100%|██████████| 29/29 [00:02<00:00,  9.70it/s]


epoch=0: train_ppl=tensor(4.9660, device='cuda:0') train_epoch_loss=tensor(1.6026, device='cuda:0') eval_ppl=tensor(1.1292, device='cuda:0') eval_epoch_loss=tensor(0.1215, device='cuda:0')


100%|██████████| 255/255 [00:35<00:00,  7.11it/s]
100%|██████████| 29/29 [00:02<00:00,  9.67it/s]


epoch=1: train_ppl=tensor(1.1311, device='cuda:0') train_epoch_loss=tensor(0.1232, device='cuda:0') eval_ppl=tensor(1.0543, device='cuda:0') eval_epoch_loss=tensor(0.0528, device='cuda:0')


100%|██████████| 255/255 [00:35<00:00,  7.09it/s]
100%|██████████| 29/29 [00:03<00:00,  9.59it/s]


epoch=2: train_ppl=tensor(1.0868, device='cuda:0') train_epoch_loss=tensor(0.0833, device='cuda:0') eval_ppl=tensor(1.0415, device='cuda:0') eval_epoch_loss=tensor(0.0407, device='cuda:0')


100%|██████████| 255/255 [00:36<00:00,  6.98it/s]
100%|██████████| 29/29 [00:03<00:00,  9.52it/s]


epoch=3: train_ppl=tensor(1.0699, device='cuda:0') train_epoch_loss=tensor(0.0676, device='cuda:0') eval_ppl=tensor(1.0538, device='cuda:0') eval_epoch_loss=tensor(0.0524, device='cuda:0')


100%|██████████| 255/255 [00:36<00:00,  7.03it/s]
100%|██████████| 29/29 [00:03<00:00,  9.50it/s]

epoch=4: train_ppl=tensor(1.0724, device='cuda:0') train_epoch_loss=tensor(0.0699, device='cuda:0') eval_ppl=tensor(1.0424, device='cuda:0') eval_epoch_loss=tensor(0.0415, device='cuda:0')
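The perplexities in the log are simply the exponential of the mean per-batch cross-entropy, so they carry no information beyond the loss. For example, for the final validation loss (value copied from the log, so rounded to four decimals):

```python
import math

eval_epoch_loss = 0.0415  # final validation loss from the log above
eval_ppl = math.exp(eval_epoch_loss)
avg_token_prob = math.exp(-eval_epoch_loss)  # geometric-mean probability of each correct label token

print(round(eval_ppl, 4))        # 1.0424
print(round(avg_token_prob, 3))  # 0.959
```

Because the loss is an average of per-batch means, the roughly 96% figure is only approximately the per-token geometric mean.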





In [11]:
# Exact-match accuracy of the final epoch's predictions on the validation set

correct = 0
total = 0
for pred, true in zip(eval_preds, dataset["validation"]["text_label"]):
    if pred.strip() == true.strip():
        correct += 1
    total += 1
accuracy = correct / total * 100
print(f"{accuracy=} % on the evaluation dataset")
print(f"{eval_preds[:10]=}")
print(f"{dataset['validation']['text_label'][:10]=}")

accuracy=97.79735682819384 % on the evaluation dataset
eval_preds[:10]=['neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'negative', 'positive', 'neutral', 'neutral']
dataset['validation']['text_label'][:10]=['neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'negative', 'positive', 'neutral', 'neutral']
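The reported accuracy of 97.797...% corresponds to 222 correct predictions out of the 227 validation sentences (222 / 227 ≈ 0.97797). Note that `eval_preds` comes from the argmax of teacher-forced logits in the final epoch rather than from `model.generate`. A sketch of a slightly more informative report that also counts hits per class; the helper name `accuracy_report` and the toy inputs are hypothetical:

```python
from collections import Counter

def accuracy_report(preds, refs):
    """Return overall accuracy in percent and per-class (correct, total) counts."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    totals = Counter(r.strip() for r in refs)
    hits = Counter(r.strip() for p, r in zip(preds, refs) if p.strip() == r.strip())
    accuracy = 100 * sum(hits.values()) / len(refs)
    per_class = {label: (hits[label], totals[label]) for label in sorted(totals)}
    return accuracy, per_class

# Toy example, not the notebook's data
preds = ["neutral", "negative", "positive", "neutral"]
refs = ["neutral", "negative", "neutral", "neutral"]
print(accuracy_report(preds, refs))  # (75.0, {'negative': (1, 1), 'neutral': (2, 3)})
```

In the notebook it would be called as `accuracy_report(eval_preds, dataset["validation"]["text_label"])`.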

In [3]:
from huggingface_hub import notebook_login

notebook_login()




In [None]:
# Push the trained adapter (the prefix weights, not the full T5 model) to the Hub

peft_model_id = "theoracle/t5-large_PREFIX_TUNING_SEQ2SEQ"
model.push_to_hub(peft_model_id)  # uses the token stored by notebook_login()

In [6]:
# Load the adapter from the Hub on top of its base model (config.base_model_name_or_path)

from peft import PeftModel, PeftConfig

peft_model_id = "theoracle/t5-large_PREFIX_TUNING_SEQ2SEQ"

config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)


In [9]:
inputs = tokenizer(
    "today I had an horrible day",
    return_tensors="pt",
)

In [10]:
model.to(device)

with torch.no_grad():
    inputs = {k: v.to(device) for k, v in inputs.items()}
    outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=300)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))


['negative']


In [11]:
!pip install weightwatcher



In [12]:
import weightwatcher as ww

watcher = ww.WeightWatcher(model=model)

# Analyze the model
results = watcher.analyze()

summary = watcher.get_summary(results)
summary

{'log_norm': 4.8408589878954045,
 'alpha': 4.237486565354773,
 'alpha_weighted': 13.559886742748054,
 'log_alpha_norm': 13.832584375943023,
 'log_spectral_norm': 3.2592943056022237,
 'stable_rank': 53.75282011887372}

### Introduction

This second part applies the same prefix-tuning setup (t5-large with 20 virtual tokens) to abstractive summarization on the CNN/DailyMail dataset (version 3.0.0). Because the dataset is large, only the first 7% of its training split is loaded, and that subset is then split 80/20 into training and validation sets. The workflow mirrors the first part: loading and preprocessing the data, wrapping the model with a prefix-tuning adapter, training while monitoring the validation loss, and generating a summary for a new article. Using a subset keeps training time manageable (about 11.5 minutes of training plus 2 minutes of evaluation per epoch in the run below) at the cost of less training data.

In [3]:
# Use only a portion of the dataset, as the full training split is very large
from datasets import load_dataset, DatasetDict

# Load the first 7% of the training split
small_train_dataset = load_dataset("cnn_dailymail", "3.0.0", split="train[:7%]")

# Hold out 20% of the subset for validation; the remaining 80% is used for training
valid_size = 0.2

# Split the dataset
train_valid_split = small_train_dataset.train_test_split(test_size=valid_size)

# Organize the splits in a new DatasetDict for convenience
split_dataset = DatasetDict({
    'train': train_valid_split['train'],
    'validation': train_valid_split['test']
})

# Verify the sizes of the splits
print(f"Training set size: {len(split_dataset['train'])}")
print(f"Validation set size: {len(split_dataset['validation'])}")


Training set size: 16078
Validation set size: 4020
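The split sizes above can be reproduced by hand. This sketch assumes that `datasets` rounds percentage slices such as `train[:7%]` to the nearest row and that `train_test_split` takes the ceiling of `test_size * n` for the test set:

```python
import math

n_total = 287_113                    # rows in the cnn_dailymail 3.0.0 train split (from the log)
n_subset = round(0.07 * n_total)     # "train[:7%]", assuming rounding to the nearest row
n_valid = math.ceil(0.2 * n_subset)  # train_test_split(test_size=0.2), assuming a ceiling
n_train = n_subset - n_valid
print(n_subset, n_train, n_valid)    # 20098 16078 4020
```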


In [8]:
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# Articles and summaries are truncated/padded to 128 tokens, so the model only sees the opening of each article
max_input_length = 128
max_target_length = 128

def preprocess_function(examples):
    model_inputs = tokenizer(
        examples["article"],
        max_length=max_input_length,
        truncation=True,
        padding="max_length"  # Pad every sequence to the same length
    )
    labels = tokenizer(
        examples["highlights"],
        max_length=max_target_length,
        truncation=True,
        padding="max_length"  # Pad every sequence to the same length
    )
    # Pad ids are not replaced with -100 here, so padded target positions count toward the loss
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


encoded_subset = split_dataset.map(preprocess_function, batched=True)

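As written, this `preprocess_function` keeps the pad ids in `labels`, so padded target positions contribute to the losses reported below, unlike the sentiment preprocessing, which replaces them with `-100`. A minimal sketch of the same masking for the list-of-lists output of a batched `map` call; the helper name `mask_padding` and the token ids are hypothetical:

```python
def mask_padding(label_ids, pad_token_id):
    """Replace pad ids with -100 so the loss ignores padded target positions."""
    return [[tok if tok != pad_token_id else -100 for tok in seq] for seq in label_ids]

# Toy batch as returned by a tokenizer without return_tensors (lists of ints)
labels = [[37, 1, 0, 0], [12, 9, 5, 1]]
print(mask_padding(labels, pad_token_id=0))  # [[37, 1, -100, -100], [12, 9, 5, 1]]
```

Inside `preprocess_function` it could be applied as `model_inputs["labels"] = mask_padding(labels["input_ids"], tokenizer.pad_token_id)`; the loss values would then no longer be directly comparable with those printed below.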

In [10]:
peft_config = PrefixTuningConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, num_virtual_tokens=20)

model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()


trainable params: 983,040 || all params: 738,651,136 || trainable%: 0.1331


In [11]:
train_dataset = encoded_subset["train"]
eval_dataset = encoded_subset["validation"]

# Create a DataLoader for the training dataset
train_dataloader = DataLoader(
    train_dataset,
    shuffle=True,                # Shuffle the dataset to ensure random sampling of data during training
    collate_fn=default_data_collator, # Stack the already-padded features into batch tensors
    batch_size=batch_size,       # Set the batch size for training
    pin_memory=True              # Pin memory to speed up data transfer to GPU
)

# Create a DataLoader for the evaluation dataset
eval_dataloader = DataLoader(
    eval_dataset,
    collate_fn=default_data_collator, # Stack the already-padded features into batch tensors
    batch_size=batch_size,       # Set the batch size for evaluation
    pin_memory=True              # Pin memory to speed up data transfer to GPU
)


In [12]:
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=(len(train_dataloader) * num_epochs),
)

In [13]:
model = model.to(device)

for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for step, batch in enumerate(tqdm(train_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        total_loss += loss.detach().float()
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    model.eval()
    eval_loss = 0
    eval_preds = []
    for step, batch in enumerate(tqdm(eval_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)
        loss = outputs.loss
        eval_loss += loss.detach().float()
        eval_preds.extend(
            tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)
        )

    eval_epoch_loss = eval_loss / len(eval_dataloader)
    eval_ppl = torch.exp(eval_epoch_loss)
    train_epoch_loss = total_loss / len(train_dataloader)
    train_ppl = torch.exp(train_epoch_loss)
    print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")

100%|██████████| 2010/2010 [11:35<00:00,  2.89it/s]
100%|██████████| 503/503 [01:48<00:00,  4.64it/s]


epoch=0: train_ppl=tensor(3.8899, device='cuda:0') train_epoch_loss=tensor(1.3584, device='cuda:0') eval_ppl=tensor(2.6327, device='cuda:0') eval_epoch_loss=tensor(0.9680, device='cuda:0')


100%|██████████| 2010/2010 [11:35<00:00,  2.89it/s]
100%|██████████| 503/503 [01:48<00:00,  4.64it/s]


epoch=1: train_ppl=tensor(2.9049, device='cuda:0') train_epoch_loss=tensor(1.0664, device='cuda:0') eval_ppl=tensor(2.5889, device='cuda:0') eval_epoch_loss=tensor(0.9512, device='cuda:0')


100%|██████████| 2010/2010 [11:35<00:00,  2.89it/s]
100%|██████████| 503/503 [01:48<00:00,  4.65it/s]


epoch=2: train_ppl=tensor(2.8601, device='cuda:0') train_epoch_loss=tensor(1.0509, device='cuda:0') eval_ppl=tensor(2.5699, device='cuda:0') eval_epoch_loss=tensor(0.9438, device='cuda:0')


100%|██████████| 2010/2010 [11:35<00:00,  2.89it/s]
100%|██████████| 503/503 [01:48<00:00,  4.65it/s]


epoch=3: train_ppl=tensor(2.8358, device='cuda:0') train_epoch_loss=tensor(1.0423, device='cuda:0') eval_ppl=tensor(2.5623, device='cuda:0') eval_epoch_loss=tensor(0.9409, device='cuda:0')


100%|██████████| 2010/2010 [11:35<00:00,  2.89it/s]
100%|██████████| 503/503 [01:48<00:00,  4.65it/s]

epoch=4: train_ppl=tensor(2.8207, device='cuda:0') train_epoch_loss=tensor(1.0370, device='cuda:0') eval_ppl=tensor(2.5578, device='cuda:0') eval_epoch_loss=tensor(0.9391, device='cuda:0')





In [14]:
text ='''Amazon, the global e-commerce giant, has recently announced the launch of its highly anticipated Prime service, aiming to revolutionize the online shopping experience. Prime, which has already garnered significant attention, promises a host of benefits designed to enhance customer satisfaction and loyalty.

One of the most attractive features of Amazon Prime is its expedited shipping service. Prime members are eligible for free two-day shipping on millions of items, with some locations even offering same-day or one-day delivery. This feature is particularly beneficial for customers who need their purchases quickly, reducing the wait time significantly compared to standard shipping options.

In addition to fast and free shipping, Amazon Prime offers members access to a vast library of streaming content. Prime Video, the platform's streaming service, includes thousands of movies, TV shows, and exclusive content produced by Amazon Studios. This positions Prime as not just a shopping service but also a major player in the entertainment industry, competing with other streaming giants like Netflix and Hulu.

Another compelling feature of Amazon Prime is Prime Music, a streaming service that allows members to listen to over two million songs ad-free. This service is a direct competitor to other music streaming platforms like Spotify and Apple Music, offering curated playlists and personalized recommendations.

Prime members also benefit from exclusive access to early deals and discounts on a wide range of products during special events such as Prime Day. These events offer significant savings on popular items, making membership even more valuable for frequent shoppers.

Amazon has also introduced Prime Reading, which provides members with access to a rotating selection of books, magazines, and comics at no additional cost. This feature enhances the value of Prime membership for avid readers, offering a diverse array of reading material accessible from any device.

Moreover, Amazon Prime includes additional perks such as unlimited photo storage with Amazon Photos, which helps members securely store and organize their photos online. This feature adds another layer of value, especially for customers who frequently use digital photography.

The launch of Amazon Prime represents a significant step in Amazon's strategy to create a comprehensive and integrated service ecosystem. By bundling multiple services into a single membership, Amazon aims to increase customer retention and encourage more frequent use of its platform.

With its extensive range of benefits, Amazon Prime is poised to become an indispensable service for many consumers, offering unparalleled convenience and value. As the service continues to expand and evolve, it is expected to attract even more subscribers, further solidifying Amazon's position as a leader in the e-commerce and digital services industries.'''

In [17]:
peft_model_id = "theoracle/t5-large_PREFIX_TUNING_summarizer"
model.push_to_hub("theoracle/t5-large_PREFIX_TUNING_summarizer", use_auth_token=True)




CommitInfo(commit_url='https://huggingface.co/theoracle/t5-large_PREFIX_TUNING_summarizer/commit/35b1a7655f8d4aeef84358724f833d208b100da6', commit_message='Upload model', commit_description='', oid='35b1a7655f8d4aeef84358724f833d208b100da6', pr_url=None, pr_revision=None, pr_num=None)

In [18]:
from peft import PeftModel, PeftConfig

peft_model_id = "theoracle/t5-large_PREFIX_TUNING_summarizer"

config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)


In [30]:
text = "Former BBC news presenter Huw Edwards, who resigned in April on medical advice, saw his salary increase by £40,000 last year despite being suspended in July 2023 over allegations published in The Sun newspaper. Police found no evidence of a criminal offense. Edwards' salary rose from between £435,000 - £439,999 in 2022/2023 to £475,000 - £479,999 between April 2023 and April 2024. Gary Lineker remains the BBC's highest-paid star, earning around £1.35 million last year. The BBC's annual report also highlighted a reduction in the number of households paying the license fee and announced further job cuts."

In [33]:
text = "Former U.S. President Donald Trump was shot at during a rally in Butler, Pennsylvania, but is now doing well despite being hit in the ear. The attack resulted in one bystander's death and critical injuries to two others. The shooter, identified as 20-year-old Thomas Matthew Crooks, was killed by Secret Service at the scene. The incident is being investigated by the FBI, Secret Service, and the Department of Homeland Security, and it's treated as an assassination attempt. Secret Service Director Kimberly Cheatle will testify before the US House Oversight Committee regarding the incident. President Joe Biden and other political figures have condemned the violence."


In [34]:
# Tokenize a single article; no padding is needed for one input
inputs = tokenizer(
    text,
    return_tensors="pt",
    max_length=512,  # T5 was pretrained on 512-token inputs; longer text is truncated
    truncation=True,
)



In [35]:
model.to(device)

with torch.no_grad():
    inputs = {k: v.to(device) for k, v in inputs.items()}
    outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=300)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))

['Trump was shot at during a rally in Pennsylvania. One bystander was killed and two others critically injured. The incident is being investigated as an assassination attempt.']


In [37]:
!pip install weightwatcher



In [38]:
import weightwatcher as ww

In [39]:
watcher = ww.WeightWatcher(model=model)

# Analyze the model
results = watcher.analyze()

summary = watcher.get_summary(results)
summary

{'log_norm': 4.840226220037563,
 'alpha': 4.209009679576707,
 'alpha_weighted': 13.421672312587289,
 'log_alpha_norm': 13.694369078535692,
 'log_spectral_norm': 3.259092615529022,
 'stable_rank': 53.740730533818756}

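Overall insight: the two weightwatcher summaries are nearly identical. The sentiment model scores marginally higher on all six metrics, but no metric differs by more than about 1%. This is expected: both models share the same frozen t5-large weights and differ only in their 983,040 prefix parameters, so every base-model layer that weightwatcher analyzes is identical in the two models, and the small differences presumably come from the prefix parameters. The comparison therefore says little about differences in generalization between the two adapters.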
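The size of the differences can be quantified from the two summaries printed above. A small sketch computing the relative difference of each metric (values copied from the notebook's outputs; the helper name `relative_differences` is hypothetical):

```python
sentiment = {  # weightwatcher summary of the prefix-tuned sentiment model
    "log_norm": 4.8408589878954045,
    "alpha": 4.237486565354773,
    "alpha_weighted": 13.559886742748054,
    "log_alpha_norm": 13.832584375943023,
    "log_spectral_norm": 3.2592943056022237,
    "stable_rank": 53.75282011887372,
}
summarizer = {  # weightwatcher summary of the prefix-tuned summarization model
    "log_norm": 4.840226220037563,
    "alpha": 4.209009679576707,
    "alpha_weighted": 13.421672312587289,
    "log_alpha_norm": 13.694369078535692,
    "log_spectral_norm": 3.259092615529022,
    "stable_rank": 53.740730533818756,
}

def relative_differences(a, b):
    """Percentage difference of each metric in a relative to b."""
    return {k: 100 * (a[k] - b[k]) / b[k] for k in a}

for metric, diff in relative_differences(sentiment, summarizer).items():
    print(f"{metric:>18}: {diff:+.3f}%")
```

The largest gaps are in `alpha_weighted` and `log_alpha_norm` (about +1.0%), followed by `alpha` (about +0.7%); the norm-based metrics and the stable rank differ by less than 0.03%.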