## CA 2, LLMs Spring 2024

- **Name: Bardia Khalafi**
- **Student ID: 810199414**

---
#### Your submission should be named using the following format: `CA2_LASTNAME_STUDENTID_soft_prompt.ipynb`.

- There is no penalty for using AI assistance on this homework as long as you fully disclose it in the final cell of this notebook (this includes storing any prompts that you feed to large language models). That said, anyone caught using AI assistance without proper disclosure will receive a zero on the assignment (we have several automatic tools to detect such cases). We're literally allowing you to use it with no limitations, so there is no reason to lie!

---

##### *Academic honesty*

- We will audit the Colab notebooks from a set number of students, chosen at random. The audits will check that the code you wrote actually generates the answers in your notebook. If you turn in correct answers on your notebook without code that actually generates those answers, we will consider this a serious case of cheating.

- We will also run automatic checks of Colab notebooks for plagiarism. Copying code from others is also considered a serious case of cheating.

---

If you have any further questions or concerns, contact the TA via email:
mohammad136631@gmail.com

---

# What are Soft prompts?
Soft prompts are learnable tensors concatenated with the input embeddings that can be optimized to a dataset; the downside is that they aren’t human readable because you aren’t matching these “virtual tokens” to the embeddings of a real word.
<br>
<div>
<img src="https://www.researchgate.net/publication/366062946/figure/fig1/AS:11431281105340756@1670383256990/The-comparison-between-the-previous-T5-prompt-tuning-method-part-a-and-the-introduced.jpg"/>
</div>

Read More:
<br>[Youtube : PEFT and Soft Prompt](https://www.youtube.com/watch?v=8uy_WII76L0)
<br>[Paper: The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/pdf/2104.08691.pdf)
https://arxiv.org/pdf/2101.00190.pdf
<br>[Paper: Prefix-Tuning: Optimizing Continuous Prompts for Generation](https://arxiv.org/pdf/2101.00190.pdf)

# Part 1 (20 Points)
Before diving into the practical applications, let's first ensure your foundational knowledge is solid. Please answer the following questions.


**A) Compare and contrast model tuning and prompt tuning in terms of their effectiveness for specific downstream tasks. (5 Points)**

model tuning is more costly because you have to change all the weights
but prompt tuning is cheaper

but model tuning may have better results in most tasks

for storage, you need to tune one model for each task and store them, but with prompt tuning, you just need to store the base model and just the promptsfor each task

**B) Explore the challenges associated with interpreting soft prompts in the continuous embedding space and propose potential solutions. (5 Points)**

there is a hypothesis named :The Waywardness Hypothesis

Khashabi et al.1 propose this incredible hypothesis. It says that given a task, for any discrete target prompt, there exists a continuous prompt that projects to it, while performing well on the task.

so we can find the discrete prompt by exploring and finding the first ( or k ) closest vector(s) to the continuous vectors in the soft prompt so we can interpret it.

**C) What is the effect of initializing prompts randomly versus initializing them from the vocabulary, and how does this impact the performance of prompt tuning? (5 Points)**

initializing from vocabilary has the advantage to converge faster and prone to the overfitting and generalizing better.

**D) How is the optimization process in the prefix tuning(<br>[Prefix-Tuning: Optimizing Continuous Prompts for Generation](https://arxiv.org/pdf/2101.00190.pdf)) and Why did they use this technique? (5 Points)**

they used this method because its cost efficient and has a promising accuracy
and it can remove or reduce the time that is spend on prompt engineering

the steps are like this
first we initialize the prefixes with vocab or random vectors
second we are going thorough iterations to update these prefix vectors using GD
and then we can evaluate the performance of the soft promt


# Part 2 (35 points)

## Imports

In [27]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import transformers
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoModel
from transformers import AdamW
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")

## Model Selection & Constants
We will use `bert-fa-base-uncased` as our base model from Hugging Face ([HF_Link](https://huggingface.co/HooshvareLab/bert-fa-base-uncased)). For our tuning, we intend to utilize 20 soft prompt tokens.

In [28]:
class CONFIG:
    seed = 42
    max_len = 128
    train_batch = 16
    valid_batch = 32
    epochs = 10
    n_tokens=20
    learning_rate = 0.01
    model_name = 'HooshvareLab/bert-fa-base-uncased'
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Dataset

The dataset contains around 7000 Persian sentences and their corresponding polarity, and have been manually classified into 5 categories (i.e. Angry).

### Load Dataset

In [29]:
import pandas as pd
file_path = "softprompt_dataset.csv"
df = pd.read_csv(file_path)

### Pre-Processing

In [30]:
%pip install -q -U clean-text[gpl]
%pip install -q hazm

In [31]:
import re
from cleantext import clean
from hazm import *

In [32]:
import re
def cleanhtml(raw_html):
    cleanr = re.compile('<.*?>')
    cleantext = re.sub(cleanr, '', raw_html)
    return cleantext

def cleaning(text):
    text = text.strip()

    # regular cleaning
    text = clean(text,
        fix_unicode=True,
        to_ascii=False,
        lower=True,
        no_line_breaks=True,
        no_urls=True,
        no_emails=True,
        no_phone_numbers=True,
        no_numbers=False,
        no_digits=False,
        no_currency_symbols=True,
        no_punct=False,
        replace_with_url="",
        replace_with_email="",
        replace_with_phone_number="",
        replace_with_number="",
        replace_with_digit="0",
        replace_with_currency_symbol="",
    )

    text = cleanhtml(text)

    # normalizing
    #normalizer = hazm.Normalizer()
    #text = normalizer.normalize(text)

    wierd_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        u"\U0001f926-\U0001f937"
        u'\U00010000-\U0010ffff'
        u"\u200d"
        u"\u2640-\u2642"
        u"\u2600-\u2B55"
        u"\u23cf"
        u"\u23e9"
        u"\u231a"
        u"\u3030"
        u"\ufe0f"
        u"\u2069"
        u"\u2066"
        u"\u2068"
        u"\u2067"
        "]+", flags=re.UNICODE)

    text = wierd_pattern.sub(r'', text)

    # removing extra spaces, hashtags
    text = re.sub("#", "", text)
    text = re.sub("\s+", " ", text)

    return text

In [33]:
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor

tqdm.pandas()

def parallel_apply_with_progress(df, func, n_workers=4):
    with ThreadPoolExecutor(max_workers=n_workers) as executor, tqdm(total=len(df)) as pbar:
        def update(*args):
            pbar.update()

        results = []
        for result in executor.map(func, df['text']):
            results.append(result)
            update()

        df['text'] = pd.Series(results)

    return df

In [34]:
df = parallel_apply_with_progress(df, cleaning)

100%|██████████| 7023/7023 [00:02<00:00, 2846.44it/s]


In [35]:
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(df.index.values,
                                                  df.label.values,
                                                  test_size=0.15,
                                                  random_state=42,
                                                  stratify=df.label.values)

train_df = df.loc[X_train]
validation_df = df.loc[X_val]

In [36]:
possible_labels = df.label.unique()

label_dict = {}
for index, possible_label in enumerate(possible_labels):
    label_dict[possible_label] = index
label_dict

{0: 0, 1: 1, 2: 2, -1: 3, -2: 4}

In [37]:
train_df['label'] = train_df.label.replace(label_dict)
validation_df['label'] = validation_df.label.replace(label_dict)

### Create Dataset Class (5 Points)
In this step we will getting our dataset ready for training.

In this part we will define BERT-based dataset class for text classification, with configuration parameters. It preprocesses text data and tokenizes it using the BERT tokenizer.


Complete the preprocessing step in the __getitem__ method by adding padding tokens to 'input_ids' and 'attention_mask',
The count of this pad tokens is the same as `n_tokens`.

In [197]:
import torch.nn.functional as F

class BERTDataset(Dataset):
    def __init__(self,df):
        self.text = df['text'].values
        self.labels = df['label'].values
        self.all_labels = [0, 1, 2, 3, 4]
        self.max_len = CONFIG.max_len
        self.tokenizer = CONFIG.tokenizer
        self.n_tokens=CONFIG.n_tokens

    def __len__(self):
        return len(self.text)

    def __getitem__(self, index):
        text = self.text[index]
        text = ' '.join(text.split())
        inputs = self.tokenizer.encode_plus(
            text,
            None,
            truncation=True,
            add_special_tokens=True,
            max_length=self.max_len,
            padding='max_length',
            return_token_type_ids=True
        )

        ######### Your code begins #########
        pad_tokens_tensor = torch.full(((self.n_tokens),), self.tokenizer.pad_token_id, dtype=torch.long)
        inputs['input_ids'] = torch.cat((pad_tokens_tensor, torch.tensor(inputs['input_ids'][:-self.n_tokens])), dim=-1)
        inputs['attention_mask'] = torch.cat((pad_tokens_tensor, torch.tensor(inputs['attention_mask'][:-self.n_tokens])), dim=-1)

        ######### Your code ends ###########

        labels = self.labels[index]
        label_dict = {label: (label == labels) for label in self.all_labels}
        labels_tensor = torch.tensor([float(label_dict[label]) for label in self.all_labels])
        return {
            'ids': inputs['input_ids'],
            'mask': inputs['attention_mask'],
            'label': labels_tensor
        }


In [198]:
train_dataset = BERTDataset(train_df)
validation_dataset = BERTDataset(validation_df)

## Define Prompt Embedding Layer (15 Points)
In this part we will define our prompt layer in `PROMPTEmbedding` module.


<font color='#73FF73'><b>You have to complete</b></font> `initialize_embedding` and  `forward` <font color='#73FF73'><b>functions.</b></font>

In `initialize_embedding` function initialize the learned embeddings based on whether they should be initialized from the vocabulary or randomly within the specified range.

In `forward` function, modify the input_embedding to extract the relevant part based on n_tokens.

Repeat the learned_embedding to match the size of input_embedding.

Concatenate the learned_embedding and input_embedding properly.


In [219]:
class PROMPTEmbedding(nn.Module):
    def __init__(self,
                emb_layer: nn.Embedding,
                n_tokens: int = 20,
                random_range: float = 0.5,
                initialize_from_vocab: bool = True):

      super(PROMPTEmbedding, self).__init__()
      self.emb_layer = emb_layer
      self.n_tokens = n_tokens
      self.learned_embedding = nn.parameter.Parameter(self.initialize_embedding(emb_layer,
                                                                               n_tokens,
                                                                               random_range,
                                                                               initialize_from_vocab))

    def initialize_embedding(self,
                             emb_layer: nn.Embedding,
                             n_tokens: int = 20,
                             random_range: float = 0.5,
                             initialize_from_vocab: bool = True):

      if initialize_from_vocab:
        ######### Your code begins #########
        random_indices = torch.randperm(emb_layer.weight.size(0))[:n_tokens]
        vocab_emb = emb_layer.weight[random_indices]

        return vocab_emb

      else:
        random_emb = torch.rand(n_tokens, emb_layer.weight.size(1)) * random_range
        ######### Your code ends ###########
      return random_emb


    def forward(self, tokens):
      ######### Your code begins #########
      input_embedding = self.emb_layer.weight[tokens[:, self.n_tokens:]]

      input_size = input_embedding.size(0)
      learned_embedding = self.learned_embedding.unsqueeze(0).repeat(input_size, 1, 1)
      '''print(f'token size: {tokens.size()}')
      print(f'learned embedding size: {learned_embedding.size()}')
      print(f'input embedding size: {input_embedding.size()}')'''

      joined_embedding = torch.cat((learned_embedding, input_embedding), dim=1)
      ######### Your code ends ###########
      return joined_embedding

## Replace model's embedding layer with our layer (5 Points)

In [220]:
# Define your BERT model
model = AutoModelForSequenceClassification.from_pretrained(CONFIG.model_name, num_labels=5, output_attentions = False,
                                                           output_hidden_states = False).to(CONFIG.device)
######### Your code begins #########

model_embedding = model.bert.embeddings.word_embeddings

prompt_embedding = PROMPTEmbedding(
    emb_layer = model_embedding,
    n_tokens = 20,
    random_range = 0.5,
    initialize_from_vocab = True
    )

model.bert.embeddings.word_embeddings = prompt_embedding
######### Your code ends ###########


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at HooshvareLab/bert-fa-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Freezing Model Parameters (5 points)
In this part we will freeze entire model except `learned_embedding`

In [221]:
######### Your code begins #########
for name, param in model.named_parameters():
    #if name != 'bert.embeddings.word_embeddings.learned_embedding':
    param.requires_grad = False

model.bert.embeddings.word_embeddings.learned_embedding.requires_grad = True
######### Your code ends ###########

In [222]:
pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
pytorch_total_params # 768 * 20 = 15360

15360

## Optimizer


In [223]:
from transformers import AdamW

optimizer = AdamW(model.parameters(), lr=CONFIG.learning_rate)

## Training & Evaluation


### Define dataloaders

In [224]:
train_loader = DataLoader(train_dataset, batch_size=CONFIG.train_batch,
                              num_workers=2, shuffle=True, pin_memory=True)

validation_loader = DataLoader(validation_dataset, batch_size=CONFIG.valid_batch,
                              num_workers=2, shuffle=True, pin_memory=True)

### Define evaluation function

In [225]:
from sklearn.metrics import f1_score

def f1_score_func(preds, labels):
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = np.argmax(labels, axis=1).flatten()
    return f1_score(labels_flat, preds_flat, average='weighted')

In [226]:
def evaluate(val_dataloader):

    model.eval()

    loss_val_total = 0
    predictions, true_vals = [], []

    for batch in val_dataloader:


        inputs = {'input_ids':      batch['ids'].to(CONFIG.device),
                  'attention_mask': batch['mask'].to(CONFIG.device),
                  'labels':         batch['label'].to(CONFIG.device),
                 }

        with torch.no_grad():
            outputs = model(**inputs)

        loss = outputs["loss"]
        logits = outputs["logits"]
        loss_val_total += loss.item()

        logits = logits.detach().cpu().numpy()
        label_ids = inputs['labels'].cpu().numpy()
        predictions.append(logits)
        true_vals.append(label_ids)

    loss_val_avg = loss_val_total/len(val_dataloader)

    predictions = np.concatenate(predictions, axis=0)
    true_vals = np.concatenate(true_vals, axis=0)

    return loss_val_avg, predictions, true_vals

### Define trainng loop


In [227]:
def train(model, optimizer, train_dataloader, val_dataloader):

    epochs = CONFIG.epochs

    for epoch in tqdm(range(1, epochs+1)):

      model.train()

      loss_train_total = 0

      progress_bar = tqdm(train_loader, desc='Epoch {:1d}'.format(epoch), leave=False, disable=True)

      for batch in progress_bar:

        optimizer.zero_grad()

        inputs = {'input_ids':      batch['ids'].to(CONFIG.device),
                  'attention_mask': batch['mask'].to(CONFIG.device),
                  'labels':         batch['label'].to(CONFIG.device),
                }
        output = model(**inputs)

        loss = output["loss"]
        loss_train_total += loss.item()

        loss.backward()
        optimizer.step()

        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item()/len(batch))})


      tqdm.write(f'\nEpoch {epoch}')
      loss_train_avg = loss_train_total/len(train_loader)
      tqdm.write(f'Training loss: {loss_train_avg}')


      val_loss, predictions, true_vals = evaluate(val_dataloader)
      val_f1 = f1_score_func(predictions, true_vals)
      tqdm.write(f'Validation loss: {val_loss}')
      tqdm.write(f'F1 Score (Weighted): {val_f1}')


### Run

In [228]:
train(model=model, optimizer=optimizer, train_dataloader=train_loader, val_dataloader=validation_loader)

  0%|          | 0/10 [01:20<?, ?it/s]


Epoch 1
Training loss: 0.5079868267723583


 10%|█         | 1/10 [01:27<13:11, 87.95s/it]

Validation loss: 0.47742841279867926
F1 Score (Weighted): 0.27714931230300927


 10%|█         | 1/10 [02:48<13:11, 87.95s/it]


Epoch 2
Training loss: 0.4973959942711866


 20%|██        | 2/10 [02:55<11:40, 87.53s/it]

Validation loss: 0.47584956161903613
F1 Score (Weighted): 0.22911018054744686


 20%|██        | 2/10 [04:15<11:40, 87.53s/it]


Epoch 3
Training loss: 0.4975961350342807


 30%|███       | 3/10 [04:22<10:11, 87.41s/it]

Validation loss: 0.4760757133816228
F1 Score (Weighted): 0.23898700863155997


 30%|███       | 3/10 [05:42<10:11, 87.41s/it]


Epoch 4
Training loss: 0.4966900956662581


 40%|████      | 4/10 [05:49<08:43, 87.32s/it]

Validation loss: 0.47610107515797473
F1 Score (Weighted): 0.21392801810706893


 40%|████      | 4/10 [07:09<08:43, 87.32s/it]


Epoch 5
Training loss: 0.49587863978536373


 50%|█████     | 5/10 [07:16<07:16, 87.24s/it]

Validation loss: 0.4746067975506638
F1 Score (Weighted): 0.28204958777960965


 50%|█████     | 5/10 [08:36<07:16, 87.24s/it]


Epoch 6
Training loss: 0.49633238779351035


 60%|██████    | 6/10 [08:43<05:48, 87.22s/it]

Validation loss: 0.4748729115182703
F1 Score (Weighted): 0.2581352517537825


 60%|██████    | 6/10 [10:04<05:48, 87.22s/it]


Epoch 7
Training loss: 0.4959473603549488


 70%|███████   | 7/10 [10:11<04:21, 87.20s/it]

Validation loss: 0.4751495357715722
F1 Score (Weighted): 0.3012163321226166


 70%|███████   | 7/10 [11:31<04:21, 87.20s/it]


Epoch 8
Training loss: 0.49597818782941544


 80%|████████  | 8/10 [11:38<02:54, 87.17s/it]

Validation loss: 0.47454211748007574
F1 Score (Weighted): 0.26512910323078326


 80%|████████  | 8/10 [12:58<02:54, 87.17s/it]


Epoch 9
Training loss: 0.4954285566660172


 90%|█████████ | 9/10 [13:05<01:27, 87.16s/it]

Validation loss: 0.4743012686570485
F1 Score (Weighted): 0.3106010054894331


 90%|█████████ | 9/10 [14:25<01:27, 87.16s/it]


Epoch 10
Training loss: 0.49515106158460526


100%|██████████| 10/10 [14:32<00:00, 87.24s/it]

Validation loss: 0.4743649354486754
F1 Score (Weighted): 0.25841974963579034





## Using OpenDelta library (5 Points)

In [229]:
!pip install git+https://github.com/thunlp/OpenDelta.git

Collecting git+https://github.com/thunlp/OpenDelta.git
  Cloning https://github.com/thunlp/OpenDelta.git to /tmp/pip-req-build-2rcv88av
  Running command git clone --filter=blob:none --quiet https://github.com/thunlp/OpenDelta.git /tmp/pip-req-build-2rcv88av
  Resolved https://github.com/thunlp/OpenDelta.git to commit 067eed2304cb1bdfe462094e42a37de4de98edff
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting datasets>=1.17.0 (from opendelta==0.3.2)
  Downloading datasets-2.19.0-py3-none-any.whl (542 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m542.0/542.0 kB[0m [31m10.2 MB/s[0m eta [36m0:00:00[0m
Collecting web.py (from opendelta==0.3.2)
  Downloading web.py-0.62.tar.gz (623 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m623.2/623.2 kB[0m [31m37.5 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyprojec

Use `OpenDelta` library to do the same thing. [link](https://opendelta.readthedocs.io/en/latest/modules/deltas.html)

For hyperparameters, test with `N_SOFT_PROMPT_TOKENS=10` and `N_SOFT_PROMPT_TOKENS=20` and report them.

In [230]:
######### Your code begins #########
import opendelta
model = AutoModelForSequenceClassification.from_pretrained(CONFIG.model_name, num_labels=5, output_attentions = False,
                                                           output_hidden_states = False)

train_loader = DataLoader(train_dataset, batch_size=CONFIG.train_batch,
                              num_workers=2, shuffle=True, pin_memory=True)

validation_loader = DataLoader(validation_dataset, batch_size=CONFIG.valid_batch,
                              num_workers=2, shuffle=True, pin_memory=True)

N_SOFT_PROMPT_TOKENS = 10

opendelta_model = opendelta.SoftPromptModel(
    backbone_model = model,
    soft_token_num = N_SOFT_PROMPT_TOKENS,
    token_init = True # init from vocab
)
opendelta_model.freeze_module()

model = model.to(CONFIG.device)

optimizer = AdamW(model.parameters(), lr=CONFIG.learning_rate)

train(model=model, optimizer=optimizer, train_dataloader=train_loader, val_dataloader=validation_loader)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at HooshvareLab/bert-fa-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  0%|          | 0/10 [01:35<?, ?it/s]


Epoch 1
Training loss: 0.4679658020881408


 10%|█         | 1/10 [01:43<15:34, 103.85s/it]

Validation loss: 0.453556551174684
F1 Score (Weighted): 0.2199494082944495


 10%|█         | 1/10 [03:18<15:34, 103.85s/it]


Epoch 2
Training loss: 0.44828497184151633


 20%|██        | 2/10 [03:26<13:46, 103.34s/it]

Validation loss: 0.42555009325345355
F1 Score (Weighted): 0.32672642923138834


 20%|██        | 2/10 [05:01<13:46, 103.34s/it]


Epoch 3
Training loss: 0.4346329357853548


 30%|███       | 3/10 [05:09<12:01, 103.02s/it]

Validation loss: 0.4131843601212357
F1 Score (Weighted): 0.4128297965043291


 30%|███       | 3/10 [06:43<12:01, 103.02s/it]


Epoch 4
Training loss: 0.427244484185535


 40%|████      | 4/10 [06:52<10:17, 102.84s/it]

Validation loss: 0.4060957178925023
F1 Score (Weighted): 0.37455649975363503


 40%|████      | 4/10 [08:26<10:17, 102.84s/it]


Epoch 5
Training loss: 0.4252663210114056


 50%|█████     | 5/10 [08:35<08:34, 102.91s/it]

Validation loss: 0.4034629021630143
F1 Score (Weighted): 0.38048123224058455


 50%|█████     | 5/10 [10:09<08:34, 102.91s/it]


Epoch 6
Training loss: 0.419731044514294


 60%|██████    | 6/10 [10:17<06:51, 102.86s/it]

Validation loss: 0.4087850270849286
F1 Score (Weighted): 0.39855712170572577


 60%|██████    | 6/10 [11:52<06:51, 102.86s/it]


Epoch 7
Training loss: 0.41755617629079256


 70%|███████   | 7/10 [12:00<05:08, 102.80s/it]

Validation loss: 0.40246830293626495
F1 Score (Weighted): 0.4361242675885096


 70%|███████   | 7/10 [13:34<05:08, 102.80s/it]


Epoch 8
Training loss: 0.4172697502342775


 80%|████████  | 8/10 [13:42<03:25, 102.64s/it]

Validation loss: 0.3927671051386631
F1 Score (Weighted): 0.44526203768782463


 80%|████████  | 8/10 [15:16<03:25, 102.64s/it]


Epoch 9
Training loss: 0.41387908296151593


 90%|█████████ | 9/10 [15:25<01:42, 102.59s/it]

Validation loss: 0.3988931684783011
F1 Score (Weighted): 0.4160780354384191


 90%|█████████ | 9/10 [16:59<01:42, 102.59s/it]


Epoch 10
Training loss: 0.41272615557685893


100%|██████████| 10/10 [17:08<00:00, 102.81s/it]

Validation loss: 0.3924840535178329
F1 Score (Weighted): 0.4430442217076746





In [232]:
######### Your code begins #########
import opendelta
model = AutoModelForSequenceClassification.from_pretrained(CONFIG.model_name, num_labels=5, output_attentions = False,
                                                           output_hidden_states = False)

train_loader = DataLoader(train_dataset, batch_size=CONFIG.train_batch,
                              num_workers=2, shuffle=True, pin_memory=True)

validation_loader = DataLoader(validation_dataset, batch_size=CONFIG.valid_batch,
                              num_workers=2, shuffle=True, pin_memory=True)

N_SOFT_PROMPT_TOKENS = 20

opendelta_model = opendelta.SoftPromptModel(
    backbone_model = model,
    soft_token_num = N_SOFT_PROMPT_TOKENS,
    token_init = True # init from vocab
)
opendelta_model.freeze_module()

model = model.to(CONFIG.device)

optimizer = AdamW(model.parameters(), lr=CONFIG.learning_rate)

train(model=model, optimizer=optimizer, train_dataloader=train_loader, val_dataloader=validation_loader)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at HooshvareLab/bert-fa-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  0%|          | 0/10 [01:40<?, ?it/s]


Epoch 1
Training loss: 0.46993288652782134


 10%|█         | 1/10 [01:49<16:27, 109.75s/it]

Validation loss: 0.4585601840958451
F1 Score (Weighted): 0.13184442079142109


 10%|█         | 1/10 [03:30<16:27, 109.75s/it]


Epoch 2
Training loss: 0.4581914386328529


 20%|██        | 2/10 [03:39<14:37, 109.75s/it]

Validation loss: 0.43669540773738513
F1 Score (Weighted): 0.40807216151247105


 20%|██        | 2/10 [05:19<14:37, 109.75s/it]


Epoch 3
Training loss: 0.43944289093030325


 30%|███       | 3/10 [05:28<12:47, 109.63s/it]

Validation loss: 0.4301042195522424
F1 Score (Weighted): 0.3843056671530074


 30%|███       | 3/10 [07:09<12:47, 109.63s/it]


Epoch 4
Training loss: 0.4268120618905613


 40%|████      | 4/10 [07:18<10:57, 109.57s/it]

Validation loss: 0.41472751714966516
F1 Score (Weighted): 0.39798671118646367


 40%|████      | 4/10 [08:58<10:57, 109.57s/it]


Epoch 5
Training loss: 0.41785954393167546


 50%|█████     | 5/10 [09:07<09:07, 109.51s/it]

Validation loss: 0.4069855556343541
F1 Score (Weighted): 0.41871833332766717


 50%|█████     | 5/10 [10:48<09:07, 109.51s/it]


Epoch 6
Training loss: 0.41552514084838926


 60%|██████    | 6/10 [10:57<07:18, 109.59s/it]

Validation loss: 0.38720552849047113
F1 Score (Weighted): 0.474621610528971


 60%|██████    | 6/10 [12:38<07:18, 109.59s/it]


Epoch 7
Training loss: 0.4109762050410643


 70%|███████   | 7/10 [12:47<05:29, 109.69s/it]

Validation loss: 0.40232420509511774
F1 Score (Weighted): 0.42664284218024096


 70%|███████   | 7/10 [14:27<05:29, 109.69s/it]


Epoch 8
Training loss: 0.40702909749140714


 80%|████████  | 8/10 [14:37<03:39, 109.71s/it]

Validation loss: 0.3856089033863761
F1 Score (Weighted): 0.5044313019454423


 80%|████████  | 8/10 [16:17<03:39, 109.71s/it]


Epoch 9
Training loss: 0.40543673502251426


 90%|█████████ | 9/10 [16:27<01:49, 109.75s/it]

Validation loss: 0.3844484659758481
F1 Score (Weighted): 0.5204714881529142


 90%|█████████ | 9/10 [18:07<01:49, 109.75s/it]


Epoch 10
Training loss: 0.4013144043996372


100%|██████████| 10/10 [18:16<00:00, 109.69s/it]

Validation loss: 0.3811029141599482
F1 Score (Weighted): 0.5177478508675886





OpenDelta library performed a little better than the handwritten version

# AI disclosure



*   how to add padding tokens into a tokenized text
*   i have a list of prompts, i want to encode them with left padding, Explain why left-side padding is preferable during inference.
write the code to encode it
*   how can i freeze the whole model except the new_embedding_layer
*   i have loaded a model like this

model = AutoModelForSequenceClassification.from_pretrained(CONFIG.model_name, num_labels=5, output_attentions = False,
                                                           output_hidden_states = False).to(CONFIG.device)

how can i replace the embedding layer of this model with another layer

and i want to extract the embeddings of this model
*   i want to use SoftPromptModel from opendelta module, how can i use it
*   how can i initialize the soft promt in opendelta with vocab

