# Replication of "Evaluating Parameter-Efficient Finetuning Approaches for Pre-trained Models on the Financial Domain" paper results

This notebook replicates the paper's experiments: fine-tuning BERT-family models on four financial tasks, both with full fine-tuning and with the parameter-efficient methods LoRA and Adapters.

## Code structure
- Chapter 1: Sentiment Classification and Regression
  - 1.1 BERT hyperparameter search and training
  - 1.2 LoRA method for each model type (BERT, FinBERT and FLANG-BERT)
  - 1.3 Adapter method for each model

- Chapter 3: News Headline Classification
  - 3.1 BERT hyperparameter search and training
  - 3.2 LoRA method for each model
  - 3.3 Adapter method for each model

- Chapter 4: Named Entity Recognition
  - 4.1 BERT hyperparameter search and training
  - 4.2 LoRA method for each model
  - 4.3 Adapter method for each model

## About Hyperparameter Search
We narrowed the search range for the number of epochs and fixed the batch size for each task, using the batch size that performed best in the original paper.

- Full fine-tuning:
  - epochs: [3, 4, 5]
  - learning rates: [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
  - batch size: depends on the task

- LoRA fine-tuning:
  - epochs: [3, 4, 5]
  - learning rates: [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
  - batch size: depends on the task

- Adapter fine-tuning:
  - epochs: [6, 9, 11]
  - learning rates: [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
  - batch size: depends on the task
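
Each search space above is a full grid. Every learning rate is paired with every epoch budget, and each pair is trained from scratch. The plain-Python sketch below is not part of the original notebook, and the helper name `grid_summary` is illustrative. It counts how many runs one grid implies per model and method. It also gives an upper bound on total training epochs, since early stopping can only shorten runs.

```python
from itertools import product

LEARNING_RATES = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
EPOCH_OPTIONS = {
    "full": [3, 4, 5],
    "lora": [3, 4, 5],
    "adapter": [6, 9, 11],
}

def grid_summary(learning_rates, epoch_options):
    """Return (number of runs, maximum total training epochs) for one grid."""
    grid = list(product(learning_rates, epoch_options))
    max_epochs = sum(epochs for _, epochs in grid)
    return len(grid), max_epochs

for method, options in EPOCH_OPTIONS.items():
    runs, max_epochs = grid_summary(LEARNING_RATES, options)
    print(f"{method}: {runs} runs, at most {max_epochs} epochs")
```

Full and LoRA grids come to 21 runs and at most 84 epochs each. The adapter grid is also 21 runs but up to 182 epochs, which is why adapter searches take noticeably longer per model.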

### For convenience, each chapter keeps its own imports. The chapters were first written as separate notebooks and later merged into this one, so they can be run in any order without affecting each other.

# Installing Necessary Libraries

Before running any of the sections, install the following libraries:

In [None]:
# Import clear_output() to hide the verbose pip logs
from IPython.display import clear_output

# Installing `adapters` also installs a compatible `transformers` version, so the
# notebook may prompt for a restart; press Cancel and continue running.
!pip install adapters
!pip install datasets
!pip install torch
# LoRA is implemented with the PEFT library
!pip install peft
# Optional Xet storage backend for faster downloads from the Hugging Face Hub
!pip install huggingface_hub[hf_xet]
clear_output()
print('Done!')

Done!


# Sentiment Classification and Regression

## Loading and Preparing Datasets

In [None]:
from IPython.display import clear_output
from datasets import load_dataset
from pprint import pprint
import json

# Load dataset

# Financial Phrasebank
FP_dataset = load_dataset("takala/financial_phrasebank", 'sentences_50agree', trust_remote_code=True)['train'] # 4846 instead of 4845
dataset_FP = [x for x in FP_dataset]

# Load the FiQA Task 1 (aspect-based sentiment) training files
with open('task1_headline_ABSA_train.json') as f:
    headlines = json.load(f)

with open('task1_post_ABSA_train.json') as f:
    posts = json.load(f)

# Clear the output
clear_output()

print('The dataset blueprint for Financial Phrasebank is: ')
print(dataset_FP)
print("===================")
print('The dataset blueprint for Question and Answering is: ')
print(len(headlines))
print(len(posts))
print("===================")

The dataset blueprint for Financial Phrasebank is: 
The dataset blueprint for Question and Answering is: 
438
675


In [None]:
import random
from sklearn.model_selection import train_test_split

# Combine and process data
dataset_FIQA = []
for dataset in [headlines, posts]:
    for entry in dataset.values():
        sentence = entry['sentence']
        for info in entry['info']:
            target = info['target']
            score = float(info['sentiment_score'])
            input_text = f"{target}: {sentence}"
            dataset_FIQA.append({'sentence': input_text, 'label': score})

# Randomly select 1173 samples
random.seed(42)
dataset_FIQA = random.sample(dataset_FIQA, 1173)
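
A single FiQA sentence can carry several targets, so the loop above emits one example per (target, sentence) pair. It prefixes the target so the model knows which entity the score refers to. The toy record below is made up for illustration. Only the field names (`sentence`, `info`, `target`, `sentiment_score`) come from the loading code, and `flatten_fiqa` is a hypothetical helper that mirrors that loop.

```python
def flatten_fiqa(raw_entries):
    """Turn {id: {'sentence', 'info': [...]}} into target-prefixed regression examples."""
    examples = []
    for entry in raw_entries.values():
        for info in entry["info"]:
            examples.append({
                "sentence": f"{info['target']}: {entry['sentence']}",
                "label": float(info["sentiment_score"]),
            })
    return examples

# Hypothetical record: one headline, two targets with opposite sentiment
toy = {
    "1": {
        "sentence": "CompanyA shares slide as CompanyB wins the contract",
        "info": [
            {"target": "CompanyA", "sentiment_score": "-0.5"},
            {"target": "CompanyB", "sentiment_score": "0.6"},
        ],
    }
}

for example in flatten_fiqa(toy):
    print(example)
```

The same sentence appears twice with different labels, which is why the target prefix is needed for the regression task to be well defined.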

## Functions for Tokenization and Classifiers

In [None]:
from torch.utils.data import Dataset
import torch

class FP_FIQA_Dataset(Dataset):
    """
    PyTorch Dataset for Word in Context tokenization.
    """
    def __init__(self, task, dataset, tokenizer, max_length=128):
        # Initializing the instances of a class and assigning values from the parameter
        self.task = task
        self.dataset = dataset
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        # Returning the count of the intences in the sample dataset for idx
        return len(self.dataset['sentence'])

    def __getitem__(self, idx):

        # Tokenizer for sentence
        tokens = self.tokenizer(
            self.dataset['sentence'][idx],
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
            return_tensors='pt',
            return_offsets_mapping=True
        )

        return {
            'input_ids': tokens["input_ids"].squeeze(), # A 1 dimensional torch tensor representing input ids of the first sentence
            'attention_mask': tokens["attention_mask"].squeeze(), # A 1 dimensional torch tensor representing attention mask of the first sentence
            'label': torch.tensor(self.dataset['label'][idx], dtype=torch.float if self.task == "FIQA" else torch.long),# A scalar torch tensor representing the instance label
            }


In [None]:
import torch.nn as nn
from peft import LoraConfig, get_peft_model, TaskType

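class BertSentimentClassifier(nn.Module):
    """
    Full fine-tuning model: BERT encoder plus a linear head on the [CLS] token
    for 3-class sentiment. The `lora` argument is unused here.
    """

    def __init__(self, bert_model, lora):
        super().__init__()
        self.bert = bert_model

        self.classifier = nn.Linear(self.bert.config.hidden_size, 3)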

    def forward(self, input_ids, attention_mask):
        # Get BERT outputs
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)

        # Get the CLS token representation (the first token)
        cls_output = outputs.last_hidden_state[:, 0, :]

        # Apply classifier
        logits = self.classifier(cls_output)

        return logits


In [None]:
from torch.utils.data import DataLoader, Dataset
import torch.nn as nn
from adapters import AdapterConfig, AutoAdapterModel
from peft import LoraConfig, get_peft_model, TaskType

# Model for FiQA sentiment-score regression (predicts a score in [-1, 1])
class BertRelevanceRegressor(nn.Module):
    def __init__(self, bert_model, lora):
        super().__init__()
        if lora:
            # Preparing model for LoRA
            peft_config = LoraConfig(
                task_type=TaskType.FEATURE_EXTRACTION,
                r=16,  # Rank
                lora_alpha=32,  # Scaling factor
                lora_dropout=0.05,
                target_modules=["query", "key", "value"]  # Modules to apply
            )

            # Wrapping BERT model with LoRA
            self.bert = get_peft_model(bert_model, peft_config)

            # get_peft_model already freezes non-LoRA weights; this loop makes it explicit
            for name, param in self.bert.named_parameters():
                if "lora" not in name.lower():
                    param.requires_grad = False
        else:
            self.bert = bert_model

        self.regressor = nn.Sequential(
            nn.Linear(bert_model.config.hidden_size, 1),
            nn.Tanh()  # ensure output is in [-1, 1]
        )

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_output = outputs.last_hidden_state[:, 0, :]
        score = self.regressor(cls_output).squeeze(-1)
        return score

# Define the AdapterRegressor class
class AdapterRegressor(nn.Module):
    def __init__(self, adapter_name, model_name):
        super().__init__()
        # Load BERT with adapter support
        self.bert = AutoAdapterModel.from_pretrained(model_name)
        # Add adapter to the model
        self.bert.add_adapter(adapter_name, config="houlsby")
        # Activate the adapter
        self.bert.set_active_adapters(adapter_name)
        # Freeze all parameters except adapter
        self.bert.train_adapter(adapter_name)
        # Regressor head
        self.regressor = nn.Linear(self.bert.config.hidden_size, 1)  # hidden size from the model config (768 for BERT-base)

    def forward(self, input_ids, attention_mask):
        # Set output_hidden_states=True to access hidden states
        outputs = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True  # Request hidden states
        )

        # Properly handle the model outputs
        if hasattr(outputs, 'last_hidden_state'):
            # If the model returns last_hidden_state directly
            cls = outputs.last_hidden_state[:, 0, :]
        elif hasattr(outputs, 'hidden_states'):
            # If the model returns hidden_states
            cls = outputs.hidden_states[-1][:, 0, :]
        else:
            # Fallback: try to get from the first element of the tuple
            cls = outputs[0][:, 0, :]

        return torch.tanh(self.regressor(cls).squeeze(-1))

    def save_adapter(self, path, adapter_name):
        self.bert.save_adapter(path, adapter_name)
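
As a sanity check on the LoRA configuration in `BertRelevanceRegressor` (r=16, lora_alpha=32, applied to `query`, `key`, `value`), the number of added LoRA weights can be worked out by hand. This sketch assumes a BERT-base encoder with 12 layers and hidden size 768. It counts only the low-rank A and B matrices, not the classification or regression head. The function name is illustrative.

```python
def lora_param_count(hidden_size, num_layers, rank, modules_per_layer):
    # Each adapted Linear(d, d) gets A: rank x d and B: d x rank
    per_module = rank * hidden_size + hidden_size * rank
    return per_module * modules_per_layer * num_layers

# Assumed BERT-base shapes: query/key/value are each Linear(768, 768)
added = lora_param_count(hidden_size=768, num_layers=12, rank=16, modules_per_layer=3)

# LoRA scales the low-rank update B @ A by lora_alpha / r
scaling = 32 / 16

print(added, scaling)
```

This gives 884,736 added weights with a scaling factor of 2.0. That is under 1% of BERT-base's roughly 110M parameters, which is where the memory savings of the LoRA runs come from.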


## Functions for Training and Hyperparameter Search

In [None]:
from tqdm import tqdm
from torch.utils.data import DataLoader
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer, AutoModelForSequenceClassification
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from datetime import datetime
import os
import json

# Function to train and evaluate model with specific hyperparameters
def train_and_evaluate_FP(model_name, train_loader, val_loader, test_loader, lr, epochs, device, lora, adapter):
    # Adapter configuration for Houlsby method
    adapter_config = AdapterConfig.load("houlsby")

    if adapter:
      # Initialize model for this specific run
      model = AutoAdapterModel.from_pretrained(model_name)
      # Add the Houlsby adapter
      adapter_name = "fp_adapter"
      model.add_adapter(adapter_name, config=adapter_config)
      model.add_classification_head("fp_houlsby", num_labels=3)
      # Activate adapter and freeze other parameters
      model.train_adapter(adapter_name)
      model.set_active_adapters(adapter_name)
    elif lora:
      lora_config = LoraConfig(
          task_type = TaskType.SEQ_CLS,
          r= 16,
          lora_alpha = 32,
          lora_dropout = 0.05,
          target_modules = ["query", "key", "value"],
          bias = "none"
      )
      model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
      # Parameter count before wrapping with LoRA (the full model is trainable here)
      trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
      print("Trainable parameters before LoRA: ", trainable_params)
      model = get_peft_model(model, lora_config)
      model.print_trainable_parameters()
    else:
      model = AutoModel.from_pretrained(model_name)
      model = BertSentimentClassifier(model, lora)

    model = model.to(device)

    if lora:
      # Optimizer over the trainable LoRA and classification-head parameters
      optimizer = torch.optim.AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=lr, weight_decay=0.01)
    elif adapter:
      # Optimizer which focuses on adapter parameters
      optimizer = torch.optim.AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=lr, weight_decay=0.01)
    else:
      # Regular optimizer
      optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    criterion = nn.CrossEntropyLoss()

    # Track best performance
    best_val_acc = 0
    best_model_state = None
    epochs_since_improve = 0
    training_history = {
        'train_loss': [],
        'val_acc': [],
        'epoch': []
    }

    for epoch in range(epochs):
        # Training phase
        model.train()
        total_loss = 0
        progress_bar = tqdm(train_loader, desc=f"Epoch {epoch+1}/{epochs} [Train]")

        for batch in progress_bar:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['label'].to(device)

            optimizer.zero_grad()
            if adapter:
              outputs = model(input_ids=input_ids,
                    attention_mask=attention_mask,
                    labels=labels,
                    adapter_names=[adapter_name])
              loss = outputs.loss
            elif lora:
              outputs = model(input_ids, attention_mask, labels = labels)
              loss = outputs.loss
            else:
              outputs = model(input_ids, attention_mask)
              loss = criterion(outputs, labels)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()

            total_loss += loss.item()
            progress_bar.set_postfix({"Loss": f"{loss.item():.4f}"})

        avg_train_loss = total_loss / len(train_loader)

        # Validation phase
        model.eval()
        all_preds = []
        all_labels = []

        with torch.no_grad():
            for batch in tqdm(val_loader, desc=f"Epoch {epoch+1}/{epochs} [Val]"):
                input_ids = batch['input_ids'].to(device)
                attention_mask = batch['attention_mask'].to(device)
                labels = batch['label'].to(device)

                if adapter:
                  outputs = model(input_ids=input_ids,
                        attention_mask=attention_mask,
                        labels=labels,
                        adapter_names=[adapter_name])
                  _, predicted = torch.max(outputs.logits, dim=1)
                elif lora:
                  outputs = model(input_ids, attention_mask, labels=labels)
                  _, predicted = torch.max(outputs.logits, dim=1)
                else:
                  outputs = model(input_ids, attention_mask)
                  _, predicted = torch.max(outputs.data, 1)

                all_preds.extend(predicted.cpu().numpy())
                all_labels.extend(labels.cpu().numpy())

        val_acc = accuracy_score(all_labels, all_preds)

        # Save metrics for this epoch
        training_history['train_loss'].append(avg_train_loss)
        training_history['val_acc'].append(val_acc)
        training_history['epoch'].append(epoch + 1)


        print(f"Epoch {epoch+1}/{epochs} done!")
        print(f"Average training loss: {avg_train_loss:.4f}")
        print(f"Validation accuracy: {val_acc:.4f}")

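        # Check for improvement and keep a copy of the best weights.
        # Tensors are cloned because state_dict() returns references that later optimizer steps overwrite.
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            best_model_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}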
            epochs_since_improve = 0
            print(f"✓ New best model (val_acc: {best_val_acc:.4f})")
        else:
            epochs_since_improve += 1
            print(f"⟳ No improvement for {epochs_since_improve}/{3} epochs")
            if epochs_since_improve >= 3:
                print(f"⏹ Early stopping at epoch {epoch+1}")
                break
    # Load best model state for testing
    if best_model_state:
        model.load_state_dict(best_model_state)

    if adapter:
        # Save the adapter weights of the best model
        model.save_adapter(os.path.join("./adapters", adapter_name), adapter_name)

    # Test phase
    model.eval()
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for batch in tqdm(test_loader, desc="Testing"):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['label'].to(device)

            if adapter:
              outputs = model(input_ids=input_ids,
                        attention_mask=attention_mask,
                        labels=labels,
                        adapter_names=[adapter_name])
              _, predicted = torch.max(outputs.logits, dim=1)

            elif lora:
              outputs = model(input_ids, attention_mask, labels=labels)
              _, predicted = torch.max(outputs.logits, dim=1)
            else:
              outputs = model(input_ids, attention_mask)
              _, predicted = torch.max(outputs.data, 1)

            all_preds.extend(predicted.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
    test_acc = accuracy_score(all_labels, all_preds)
    classification_rep = classification_report(all_labels, all_preds, output_dict=True)
    conf_matrix = confusion_matrix(all_labels, all_preds).tolist()

    best_result = {
        'learning_rate': lr,
        'epochs': epochs,
        'best_val_accuracy': best_val_acc,
        'test_accuracy': test_acc,
        'classification_report': classification_rep,
        'confusion_matrix': conf_matrix,
        'training_history': training_history,
        'best_model_state': best_model_state
    }

    # Return results
    return best_result
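
Snapshotting the best weights needs care. `state_dict()` returns references to the live parameter tensors, and `optimizer.step()` updates those tensors in place. A shallow `dict.copy()` of the state dict therefore keeps changing as training continues, so the tensors themselves must be cloned. The sketch below reproduces the effect in plain Python, with lists standing in for tensors. It is an analogy, not PyTorch code.

```python
import copy

# Stand-in for model.state_dict(): parameter name -> mutable "tensor"
weights = {"classifier.weight": [0.1, 0.2]}

shallow_snapshot = weights.copy()       # new dict, same inner objects
deep_snapshot = copy.deepcopy(weights)  # new dict, new inner objects

# Simulate an in-place update such as optimizer.step() in a later epoch
weights["classifier.weight"][0] = 9.9

print(shallow_snapshot["classifier.weight"])  # [9.9, 0.2]: the "best" weights were overwritten
print(deep_snapshot["classifier.weight"])     # [0.1, 0.2]: the snapshot is preserved
```

In PyTorch the equivalent of the deep snapshot is `{k: v.detach().cpu().clone() for k, v in model.state_dict().items()}` or `copy.deepcopy(model.state_dict())`. The first form also keeps the copy off the GPU.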

In [None]:
from tqdm import tqdm
from torch.utils.data import DataLoader
import torch.nn as nn
from adapters import AdapterConfig, AutoAdapterModel
from transformers import AutoModel, AutoTokenizer
from sklearn.model_selection import train_test_split
import torch.nn.functional as F
import pandas as pd
import numpy as np
from sklearn.metrics import  mean_squared_error, mean_absolute_error
from datetime import datetime
import os
import json

# Function to train and evaluate model with specific hyperparameters
def train_and_evaluate_FIQA(model_name, train_loader, val_loader, test_loader, lr, epochs, device, lora, adapter):
    # Adapter configuration for Houlsby method
    adapter_config = AdapterConfig.load("houlsby")
    if adapter:
      # Initialize model for this specific run
      model = AdapterRegressor(adapter_name="fiqa_adapter", model_name=model_name)
    else:
      model = AutoModel.from_pretrained(model_name)
      model = BertRelevanceRegressor(model, lora)
    model = model.to(device)

    if lora:
      # Optimizer which focused on LoRA and classifier parameters
      optimizer = torch.optim.AdamW([
          {'params': model.bert.parameters(), 'lr': lr},
          {'params': model.regressor.parameters(), 'lr': lr}
      ], weight_decay=0.01)
    elif adapter:
      # Optimizer which focuses on adapter parameters
      optimizer = torch.optim.AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=lr, weight_decay=0.02)
    else:
      # Regular optimizer
      optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    criterion = nn.MSELoss()
    best_val_mse = float('inf')
    best_model_state = None
    best_test_result = {}
    epochs_since_improve = 0

    for epoch in range(epochs):
        model.train()
        total_loss = 0
        for batch in tqdm(train_loader, desc=f"Epoch {epoch+1}/{epochs} [Train]"):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['label'].to(device)

            optimizer.zero_grad()
            outputs = model(input_ids, attention_mask)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        avg_train_loss = total_loss / len(train_loader)

        # Validation
        model.eval()
        val_preds, val_labels = [], []
        with torch.no_grad():
            for batch in val_loader:
                input_ids = batch['input_ids'].to(device)
                attention_mask = batch['attention_mask'].to(device)
                labels = batch['label'].to(device)
                outputs = model(input_ids, attention_mask)
                val_preds.extend(outputs.cpu().numpy())
                val_labels.extend(labels.cpu().numpy())

        val_mse = mean_squared_error(val_labels, val_preds)
        val_mae = mean_absolute_error(val_labels, val_preds)

        print(f"Epoch {epoch+1}: Train Loss = {avg_train_loss:.4f}, Val MSE = {val_mse:.4f}, Val MAE = {val_mae:.4f}")

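        # Early-stopping check; tensors are cloned because state_dict() returns
        # references that later optimizer steps overwrite
        if val_mse < best_val_mse:
            best_val_mse = val_mse
            best_model_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}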
            epochs_since_improve = 0  # reset counter
        else:
            epochs_since_improve += 1
            print(f"  ↪ No improvement for {epochs_since_improve}/{3} epochs")
            if epochs_since_improve >= 3:
                print(f"Stopping early after {epoch+1} epochs.")
                break
    # Restore the best weights before saving and testing
    model.load_state_dict(best_model_state)
    if adapter:
      # Save the adapter weights of the best model
      model.save_adapter(os.path.join("./adapters", "fiqa_adapter"), "fiqa_adapter")

    # Evaluate on test set with best model
    model.eval()
    test_preds, test_labels = [], []
    with torch.no_grad():
        for batch in test_loader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['label'].to(device)
            outputs = model(input_ids, attention_mask)
            test_preds.extend(outputs.cpu().numpy())
            test_labels.extend(labels.cpu().numpy())

    test_mse = mean_squared_error(test_labels, test_preds)
    test_mae = mean_absolute_error(test_labels, test_preds)
    best_result = {
        'learning_rate': lr,
        'epochs': epochs,
        'best_val_mse': best_val_mse,
        'test_mse': test_mse,
        'test_mae': test_mae,
        'best_model_state': best_model_state
    }

    return best_result
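
Both training functions use the same patience rule: stop after 3 consecutive epochs without a strict improvement of the validation metric. A small pure-Python sketch of that rule makes it easy to check against the logs. The function name `early_stopping_trace` is illustrative. The first call replays the validation accuracies of the LR 1e-5, 4-epoch Financial PhraseBank run recorded below.

```python
def early_stopping_trace(val_scores, patience=3, higher_is_better=True):
    """Return (best_epoch, epochs_run) under a patience-based early-stopping rule."""
    best_score, best_epoch, since_improve = None, None, 0
    for epoch, score in enumerate(val_scores, start=1):
        if best_score is None:
            improved = True
        elif higher_is_better:
            improved = score > best_score   # strict improvement, as in the notebook
        else:
            improved = score < best_score   # e.g. validation MSE for FiQA
        if improved:
            best_score, best_epoch, since_improve = score, epoch, 0
        else:
            since_improve += 1
            if since_improve >= patience:
                return best_epoch, epoch    # stopped early
    return best_epoch, len(val_scores)

# Validation accuracies from the LR 1e-5, 4-epoch FP run in the log
print(early_stopping_trace([0.8349, 0.8624, 0.8611, 0.8473]))  # (2, 4)
# Lower-is-better metric that triggers the stop after 3 non-improving epochs
print(early_stopping_trace([0.30, 0.31, 0.32, 0.33, 0.10], higher_is_better=False))  # (1, 4)
```

With epoch budgets of 3 to 5 and a patience of 3, the rule can only cut a 5-epoch run short, and only by one epoch. For the full and LoRA grids, restoring the best epoch's weights matters more than the stopping itself. The rule has more effect on the 6 to 11 epoch adapter runs.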

In [None]:
# Main function for grid search
import shutil
import torch
def hyperparameter_grid_search(model_name, lr, epochs, task, dataset, lora, adapter):

    learning_rates = lr
    num_epochs_list = epochs
    max_length = 128

    if task == 'FP' and not lora and not adapter:
      batch_size = 32
    else:
      batch_size = 16
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    print(f"Using device: {device}")

    # Load pretrained tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Load and prepare dataset

    train_data = dataset
    train, test = train_test_split(train_data, test_size=0.3, random_state=42)
    val, test = train_test_split(test, test_size=0.5, random_state=42)

    print(f"Dataset sizes - Train: {len(train)}, Val: {len(val)}, Test: {len(test)}")

    # Convert to dictionaries with 'sentence' and 'label' keys
    train = {'sentence': [d['sentence'] for d in train], 'label': [d['label'] for d in train]}
    val = {'sentence': [d['sentence'] for d in val], 'label': [d['label'] for d in val]}
    test = {'sentence': [d['sentence'] for d in test], 'label': [d['label'] for d in test]}

    # Create datasets
    train_dataset = FP_FIQA_Dataset(task, train, tokenizer, max_length)
    val_dataset = FP_FIQA_Dataset(task, val, tokenizer, max_length)
    test_dataset = FP_FIQA_Dataset(task, test, tokenizer, max_length)

    # Create data loaders
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size)
    test_loader = DataLoader(test_dataset, batch_size=batch_size)

    # Create output directory for results
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    results_dir = f"bert_grid_search_{timestamp}"
    os.makedirs(results_dir, exist_ok=True)

    # Track all results
    all_results = []
    best_result = None
    best_metric = float('inf') if task == "FIQA" else 0

    # Grid search
    for lr in learning_rates:
        for epochs in num_epochs_list:
            run_name = f"lr_{lr}_epochs_{epochs}"
            print(f"\n{'='*50}\nStarting training with LR: {lr}, Epochs: {epochs}\n{'='*50}")

            if task == "FIQA":
                result = train_and_evaluate_FIQA(model_name, train_loader, val_loader, test_loader, lr, epochs, device, lora, adapter)
                metric = result['best_val_mse']
                is_better = metric < best_metric
            elif task == "FP":
                result = train_and_evaluate_FP(model_name, train_loader, val_loader, test_loader, lr, epochs, device, lora, adapter)
                metric = result['test_accuracy']
                is_better = metric > best_metric
            else:
                raise ValueError("Unsupported task type")

            # Save model
            if result['best_model_state']:
                torch.save(result['best_model_state'], f"{results_dir}/model_{run_name}.pt")
                model_state = result.pop('best_model_state')
                with open(f"{results_dir}/result_{run_name}.json", 'w') as f:
                    json.dump(result, f, indent=4)
                result['best_model_state'] = model_state

            if is_better:
                best_metric = metric
                best_result = result
                print(f"✓ New best overall model ({'MSE' if task == 'FIQA' else 'accuracy'}: {best_metric:.4f})")

            all_results.append(result)

    # Save all results (remove model state)
    serializable_results = []
    for result in all_results:
        res = result.copy()
        res.pop('best_model_state', None)
        serializable_results.append(res)

    with open(f"{results_dir}/all_results.json", 'w') as f:
        json.dump(serializable_results, f, indent=4)

    # Save best model info
    if best_result:
        print("\nBest Hyperparameters:")
        print(f"Learning Rate: {best_result['learning_rate']}")
        print(f"Epochs: {best_result['epochs']}")
        print(f"Batch Size: {batch_size}")
        if task == "FIQA":
            print(f"Validation MSE: {best_result['best_val_mse']:.4f}")
            print(f"Test MSE: {best_result['test_mse']:.4f}")
        else:
            print(f"Validation Accuracy: {best_result['best_val_accuracy']:.4f}")
            print(f"Test Accuracy: {best_result['test_accuracy']:.4f}")

        model_path = f"{results_dir}/model_lr_{best_result['learning_rate']}_epochs_{best_result['epochs']}.pt"
        if os.path.exists(model_path):
            shutil.copy(model_path, f"{results_dir}/best_model.pt")

    print(f"\nAll results saved in the '{results_dir}' directory.")
    return best_result
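
The 70/15/15 split and the loader sizes in the log can be checked by hand. With a float `test_size`, scikit-learn's `train_test_split` puts `ceil(test_size * n)` samples in the test part and the rest in the train part. The sketch below applies that rule twice, as `hyperparameter_grid_search` does. It counts batches assuming `DataLoader`'s default `drop_last=False`. The helper names are illustrative.

```python
import math

def split_sizes(n, holdout_frac=0.3, test_frac_of_holdout=0.5):
    """Mimic two chained train_test_split calls: train/holdout, then val/test."""
    holdout = math.ceil(holdout_frac * n)
    train = n - holdout
    test = math.ceil(test_frac_of_holdout * holdout)
    val = holdout - test
    return train, val, test

def num_batches(n, batch_size):
    # DataLoader keeps the final partial batch by default (drop_last=False)
    return math.ceil(n / batch_size)

train, val, test = split_sizes(4846)   # Financial PhraseBank, sentences_50agree
print(train, val, test)                 # 3392 727 727
print(num_batches(train, 32), num_batches(val, 32))  # 106 23
print(split_sizes(1173))                # FiQA after sampling: (821, 176, 176)
```

These numbers match the `Dataset sizes` line and the 106 train / 23 validation batches per epoch in the BERT log with batch size 32.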

## Hyperparameter Search and BERT Model Training (Financial Phrasebank and FIQA)

### Bert on Financial Phrasebank

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'bert-base-uncased'
lr = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
epochs = [3,4,5]
task = 'FP'
dataset = dataset_FP
lora = False
adapter = False

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 3392, Val: 727, Test: 727

Starting training with LR: 1e-06, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 106/106 [00:32<00:00,  3.29it/s, Loss=0.8939]
Epoch 1/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.42it/s]


Epoch 1/3 done!
Average training loss: 0.9598
Validation accuracy: 0.6121
✓ New best model (val_acc: 0.6121)


Epoch 2/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.9028]
Epoch 2/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.63it/s]


Epoch 2/3 done!
Average training loss: 0.8221
Validation accuracy: 0.6740
✓ New best model (val_acc: 0.6740)


Epoch 3/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.7723]
Epoch 3/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.53it/s]


Epoch 3/3 done!
Average training loss: 0.7347
Validation accuracy: 0.7180
✓ New best model (val_acc: 0.7180)


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.47it/s]


✓ New best overall model (accuracy: 0.6836)

Starting training with LR: 1e-06, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.7526]
Epoch 1/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.52it/s]


Epoch 1/4 done!
Average training loss: 0.9579
Validation accuracy: 0.6451
✓ New best model (val_acc: 0.6451)


Epoch 2/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.9091]
Epoch 2/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.52it/s]


Epoch 2/4 done!
Average training loss: 0.7993
Validation accuracy: 0.7043
✓ New best model (val_acc: 0.7043)


Epoch 3/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.6947]
Epoch 3/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.49it/s]


Epoch 3/4 done!
Average training loss: 0.7143
Validation accuracy: 0.7194
✓ New best model (val_acc: 0.7194)


Epoch 4/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.6163]
Epoch 4/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.48it/s]


Epoch 4/4 done!
Average training loss: 0.6458
Validation accuracy: 0.7524
✓ New best model (val_acc: 0.7524)


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.46it/s]


✓ New best overall model (accuracy: 0.7455)

Starting training with LR: 1e-06, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.8145]
Epoch 1/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.52it/s]


Epoch 1/5 done!
Average training loss: 0.9694
Validation accuracy: 0.5997
✓ New best model (val_acc: 0.5997)


Epoch 2/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.32it/s, Loss=0.8608]
Epoch 2/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.56it/s]


Epoch 2/5 done!
Average training loss: 0.8139
Validation accuracy: 0.6823
✓ New best model (val_acc: 0.6823)


Epoch 3/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.6808]
Epoch 3/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 3/5 done!
Average training loss: 0.7275
Validation accuracy: 0.7098
✓ New best model (val_acc: 0.7098)


Epoch 4/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.4867]
Epoch 4/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.55it/s]


Epoch 4/5 done!
Average training loss: 0.6641
Validation accuracy: 0.7249
✓ New best model (val_acc: 0.7249)


Epoch 5/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.5636]
Epoch 5/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 5/5 done!
Average training loss: 0.6008
Validation accuracy: 0.7593
✓ New best model (val_acc: 0.7593)


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.46it/s]


✓ New best overall model (accuracy: 0.7607)

Starting training with LR: 1e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.3023]
Epoch 1/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 1/3 done!
Average training loss: 0.6725
Validation accuracy: 0.8391
✓ New best model (val_acc: 0.8391)


Epoch 2/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.2679]
Epoch 2/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.56it/s]


Epoch 2/3 done!
Average training loss: 0.3580
Validation accuracy: 0.8693
✓ New best model (val_acc: 0.8693)


Epoch 3/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.2565]
Epoch 3/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 3/3 done!
Average training loss: 0.2380
Validation accuracy: 0.8693
⟳ No improvement for 1/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.50it/s]


✓ New best overall model (accuracy: 0.8377)

Starting training with LR: 1e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.4000]
Epoch 1/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.55it/s]


Epoch 1/4 done!
Average training loss: 0.6672
Validation accuracy: 0.8349
✓ New best model (val_acc: 0.8349)


Epoch 2/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.4318]
Epoch 2/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.58it/s]


Epoch 2/4 done!
Average training loss: 0.3434
Validation accuracy: 0.8624
✓ New best model (val_acc: 0.8624)


Epoch 3/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.1140]
Epoch 3/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.56it/s]


Epoch 3/4 done!
Average training loss: 0.2256
Validation accuracy: 0.8611
⟳ No improvement for 1/3 epochs


Epoch 4/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.1486]
Epoch 4/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 4/4 done!
Average training loss: 0.1431
Validation accuracy: 0.8473
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.49it/s]


✓ New best overall model (accuracy: 0.8432)

Starting training with LR: 1e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.4457]
Epoch 1/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.52it/s]


Epoch 1/5 done!
Average training loss: 0.7219
Validation accuracy: 0.7799
✓ New best model (val_acc: 0.7799)


Epoch 2/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.2957]
Epoch 2/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.53it/s]


Epoch 2/5 done!
Average training loss: 0.3818
Validation accuracy: 0.8514
✓ New best model (val_acc: 0.8514)


Epoch 3/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.1195]
Epoch 3/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.55it/s]


Epoch 3/5 done!
Average training loss: 0.2507
Validation accuracy: 0.8542
✓ New best model (val_acc: 0.8542)


Epoch 4/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.1395]
Epoch 4/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.56it/s]


Epoch 4/5 done!
Average training loss: 0.1526
Validation accuracy: 0.8638
✓ New best model (val_acc: 0.8638)


Epoch 5/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.1431]
Epoch 5/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.55it/s]


Epoch 5/5 done!
Average training loss: 0.0976
Validation accuracy: 0.8652
✓ New best model (val_acc: 0.8652)


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.51it/s]


✓ New best overall model (accuracy: 0.8487)

Starting training with LR: 2e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.3473]
Epoch 1/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.53it/s]


Epoch 1/3 done!
Average training loss: 0.5948
Validation accuracy: 0.8597
✓ New best model (val_acc: 0.8597)


Epoch 2/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.2070]
Epoch 2/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 2/3 done!
Average training loss: 0.2776
Validation accuracy: 0.8542
⟳ No improvement for 1/3 epochs


Epoch 3/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.2145]
Epoch 3/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 3/3 done!
Average training loss: 0.1483
Validation accuracy: 0.8487
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.52it/s]



Starting training with LR: 2e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.4410]
Epoch 1/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.53it/s]


Epoch 1/4 done!
Average training loss: 0.6047
Validation accuracy: 0.8473
✓ New best model (val_acc: 0.8473)


Epoch 2/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.1026]
Epoch 2/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.55it/s]


Epoch 2/4 done!
Average training loss: 0.2839
Validation accuracy: 0.8542
✓ New best model (val_acc: 0.8542)


Epoch 3/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.0611]
Epoch 3/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.53it/s]


Epoch 3/4 done!
Average training loss: 0.1536
Validation accuracy: 0.8459
⟳ No improvement for 1/3 epochs


Epoch 4/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.0076]
Epoch 4/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.58it/s]


Epoch 4/4 done!
Average training loss: 0.0751
Validation accuracy: 0.8377
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.50it/s]


✓ New best overall model (accuracy: 0.8514)

Starting training with LR: 2e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.4434]
Epoch 1/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 1/5 done!
Average training loss: 0.6128
Validation accuracy: 0.8446
✓ New best model (val_acc: 0.8446)


Epoch 2/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.2820]
Epoch 2/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 2/5 done!
Average training loss: 0.2893
Validation accuracy: 0.8308
⟳ No improvement for 1/3 epochs


Epoch 3/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.0633]
Epoch 3/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.62it/s]


Epoch 3/5 done!
Average training loss: 0.1550
Validation accuracy: 0.8542
✓ New best model (val_acc: 0.8542)


Epoch 4/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.0174]
Epoch 4/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.57it/s]


Epoch 4/5 done!
Average training loss: 0.0793
Validation accuracy: 0.8391
⟳ No improvement for 1/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.0269]
Epoch 5/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.58it/s]


Epoch 5/5 done!
Average training loss: 0.0424
Validation accuracy: 0.8198
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.53it/s]



Starting training with LR: 3e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.4678]
Epoch 1/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.57it/s]


Epoch 1/3 done!
Average training loss: 0.5441
Validation accuracy: 0.8611
✓ New best model (val_acc: 0.8611)


Epoch 2/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.2789]
Epoch 2/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 2/3 done!
Average training loss: 0.2552
Validation accuracy: 0.8459
⟳ No improvement for 1/3 epochs


Epoch 3/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.1302]
Epoch 3/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.53it/s]


Epoch 3/3 done!
Average training loss: 0.1275
Validation accuracy: 0.8418
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.49it/s]


✓ New best overall model (accuracy: 0.8542)

Starting training with LR: 3e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.3053]
Epoch 1/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.52it/s]


Epoch 1/4 done!
Average training loss: 0.5436
Validation accuracy: 0.8501
✓ New best model (val_acc: 0.8501)


Epoch 2/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.3622]
Epoch 2/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 2/4 done!
Average training loss: 0.2493
Validation accuracy: 0.8583
✓ New best model (val_acc: 0.8583)


Epoch 3/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.3172]
Epoch 3/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.55it/s]


Epoch 3/4 done!
Average training loss: 0.1176
Validation accuracy: 0.8514
⟳ No improvement for 1/3 epochs


Epoch 4/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.0115]
Epoch 4/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 4/4 done!
Average training loss: 0.0530
Validation accuracy: 0.8267
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.50it/s]



Starting training with LR: 3e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.4315]
Epoch 1/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.54it/s]


Epoch 1/5 done!
Average training loss: 0.5379
Validation accuracy: 0.8308
✓ New best model (val_acc: 0.8308)


Epoch 2/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.3896]
Epoch 2/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.53it/s]


Epoch 2/5 done!
Average training loss: 0.2484
Validation accuracy: 0.8638
✓ New best model (val_acc: 0.8638)


Epoch 3/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.1660]
Epoch 3/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.52it/s]


Epoch 3/5 done!
Average training loss: 0.1236
Validation accuracy: 0.8473
⟳ No improvement for 1/3 epochs


Epoch 4/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.0167]
Epoch 4/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.53it/s]


Epoch 4/5 done!
Average training loss: 0.0583
Validation accuracy: 0.8638
⟳ No improvement for 2/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.0262]
Epoch 5/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.53it/s]


Epoch 5/5 done!
Average training loss: 0.0345
Validation accuracy: 0.8418
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 5


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.47it/s]



Starting training with LR: 5e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.2897]
Epoch 1/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.59it/s]


Epoch 1/3 done!
Average training loss: 0.5142
Validation accuracy: 0.8638
✓ New best model (val_acc: 0.8638)


Epoch 2/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.0796]
Epoch 2/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.55it/s]


Epoch 2/3 done!
Average training loss: 0.2257
Validation accuracy: 0.8693
✓ New best model (val_acc: 0.8693)


Epoch 3/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.1994]
Epoch 3/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.59it/s]


Epoch 3/3 done!
Average training loss: 0.1118
Validation accuracy: 0.8624
⟳ No improvement for 1/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.51it/s]


✓ New best overall model (accuracy: 0.8624)

Starting training with LR: 5e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.1940]
Epoch 1/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.58it/s]


Epoch 1/4 done!
Average training loss: 0.5053
Validation accuracy: 0.8583
✓ New best model (val_acc: 0.8583)


Epoch 2/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.3857]
Epoch 2/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.56it/s]


Epoch 2/4 done!
Average training loss: 0.2485
Validation accuracy: 0.8391
⟳ No improvement for 1/3 epochs


Epoch 3/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=0.0405]
Epoch 3/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.55it/s]


Epoch 3/4 done!
Average training loss: 0.1174
Validation accuracy: 0.8638
✓ New best model (val_acc: 0.8638)


Epoch 4/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.0355]
Epoch 4/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.60it/s]


Epoch 4/4 done!
Average training loss: 0.0610
Validation accuracy: 0.8528
⟳ No improvement for 1/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.55it/s]



Starting training with LR: 5e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.33it/s, Loss=0.3300]
Epoch 1/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.59it/s]


Epoch 1/5 done!
Average training loss: 0.5096
Validation accuracy: 0.8597
✓ New best model (val_acc: 0.8597)


Epoch 2/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.1709]
Epoch 2/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.55it/s]


Epoch 2/5 done!
Average training loss: 0.2244
Validation accuracy: 0.8514
⟳ No improvement for 1/3 epochs


Epoch 3/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.0291]
Epoch 3/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.57it/s]


Epoch 3/5 done!
Average training loss: 0.1020
Validation accuracy: 0.8446
⟳ No improvement for 2/3 epochs


Epoch 4/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.0887]
Epoch 4/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.59it/s]


Epoch 4/5 done!
Average training loss: 0.0626
Validation accuracy: 0.8542
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 4


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.52it/s]



Starting training with LR: 0.0001, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.3442]
Epoch 1/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.62it/s]


Epoch 1/3 done!
Average training loss: 0.5360
Validation accuracy: 0.8267
✓ New best model (val_acc: 0.8267)


Epoch 2/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.2535]
Epoch 2/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.59it/s]


Epoch 2/3 done!
Average training loss: 0.2744
Validation accuracy: 0.8363
✓ New best model (val_acc: 0.8363)


Epoch 3/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=0.3920]
Epoch 3/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.62it/s]


Epoch 3/3 done!
Average training loss: 0.1418
Validation accuracy: 0.8391
✓ New best model (val_acc: 0.8391)


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.59it/s]



Starting training with LR: 0.0001, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.3458]
Epoch 1/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.58it/s]


Epoch 1/4 done!
Average training loss: 0.5090
Validation accuracy: 0.8267
✓ New best model (val_acc: 0.8267)


Epoch 2/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.3294]
Epoch 2/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.56it/s]


Epoch 2/4 done!
Average training loss: 0.2640
Validation accuracy: 0.8198
⟳ No improvement for 1/3 epochs


Epoch 3/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.1964]
Epoch 3/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.60it/s]


Epoch 3/4 done!
Average training loss: 0.1483
Validation accuracy: 0.8377
✓ New best model (val_acc: 0.8377)


Epoch 4/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.3015]
Epoch 4/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.57it/s]


Epoch 4/4 done!
Average training loss: 0.1079
Validation accuracy: 0.8349
⟳ No improvement for 1/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.56it/s]



Starting training with LR: 0.0001, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.2872]
Epoch 1/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.53it/s]


Epoch 1/5 done!
Average training loss: 0.5323
Validation accuracy: 0.8446
✓ New best model (val_acc: 0.8446)


Epoch 2/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.2514]
Epoch 2/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.57it/s]


Epoch 2/5 done!
Average training loss: 0.2684
Validation accuracy: 0.8514
✓ New best model (val_acc: 0.8514)


Epoch 3/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=0.1404]
Epoch 3/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.57it/s]


Epoch 3/5 done!
Average training loss: 0.1737
Validation accuracy: 0.8569
✓ New best model (val_acc: 0.8569)


Epoch 4/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.0671]
Epoch 4/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.60it/s]


Epoch 4/5 done!
Average training loss: 0.1059
Validation accuracy: 0.8363
⟳ No improvement for 1/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=0.2101]
Epoch 5/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.55it/s]


Epoch 5/5 done!
Average training loss: 0.0749
Validation accuracy: 0.8404
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.57it/s]



Starting training with LR: 0.001, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=0.8809]
Epoch 1/3 [Val]: 100%|██████████| 23/23 [00:02<00:00, 10.01it/s]


Epoch 1/3 done!
Average training loss: 1.0269
Validation accuracy: 0.5970
✓ New best model (val_acc: 0.5970)


Epoch 2/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=0.8372]
Epoch 2/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.98it/s]


Epoch 2/3 done!
Average training loss: 0.9407
Validation accuracy: 0.5970
⟳ No improvement for 1/3 epochs


Epoch 3/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=0.9324]
Epoch 3/3 [Val]: 100%|██████████| 23/23 [00:02<00:00, 10.01it/s]


Epoch 3/3 done!
Average training loss: 0.9306
Validation accuracy: 0.5970
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00, 10.05it/s]



Starting training with LR: 0.001, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.36it/s, Loss=0.9266]
Epoch 1/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.98it/s]


Epoch 1/4 done!
Average training loss: 0.9977
Validation accuracy: 0.5970
✓ New best model (val_acc: 0.5970)


Epoch 2/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=0.9906]
Epoch 2/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.95it/s]


Epoch 2/4 done!
Average training loss: 0.9413
Validation accuracy: 0.5970
⟳ No improvement for 1/3 epochs


Epoch 3/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=1.0266]
Epoch 3/4 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.94it/s]


Epoch 3/4 done!
Average training loss: 0.9418
Validation accuracy: 0.5970
⟳ No improvement for 2/3 epochs


Epoch 4/4 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=1.0629]
Epoch 4/4 [Val]: 100%|██████████| 23/23 [00:02<00:00, 10.02it/s]


Epoch 4/4 done!
Average training loss: 0.9372
Validation accuracy: 0.5970
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 4


Testing: 100%|██████████| 23/23 [00:02<00:00, 10.05it/s]



Starting training with LR: 0.001, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=1.0374]
Epoch 1/5 [Val]: 100%|██████████| 23/23 [00:02<00:00, 10.08it/s]


Epoch 1/5 done!
Average training loss: 1.0542
Validation accuracy: 0.5970
✓ New best model (val_acc: 0.5970)


Epoch 2/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=0.8933]
Epoch 2/5 [Val]: 100%|██████████| 23/23 [00:02<00:00, 10.03it/s]


Epoch 2/5 done!
Average training loss: 0.9568
Validation accuracy: 0.5970
⟳ No improvement for 1/3 epochs


Epoch 3/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=0.9902]
Epoch 3/5 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.99it/s]


Epoch 3/5 done!
Average training loss: 0.9357
Validation accuracy: 0.5970
⟳ No improvement for 2/3 epochs


Epoch 4/5 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.35it/s, Loss=0.9672]
Epoch 4/5 [Val]: 100%|██████████| 23/23 [00:02<00:00, 10.09it/s]


Epoch 4/5 done!
Average training loss: 0.9366
Validation accuracy: 0.5970
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 4


Testing: 100%|██████████| 23/23 [00:02<00:00, 10.11it/s]



Best Hyperparameters:
Learning Rate: 5e-05
Epochs: 3
Batch Size: 32
Validation Accuracy: 0.8693
Test Accuracy: 0.8624

All results saved in the 'bert_grid_search_20250503_132808' directory.
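In the log above, the `✓ New best overall model` message appears to follow test accuracy rather than validation accuracy. For example, LR 2e-05 with 4 epochs (best validation 0.8542, test 0.8514) replaces LR 1e-05 with 5 epochs (best validation 0.8652, test 0.8487) as the overall best. Choosing hyperparameters on the test split makes the reported test accuracy optimistically biased. Below is a minimal sketch of validation-only selection. It uses the per-run numbers printed above for the runs whose test accuracy was logged; from LR 1e-05 onward, no run in the log exceeds 0.8693 validation accuracy. `RunResult` and `select_by_validation` are hypothetical names, not functions from this notebook.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunResult:
    lr: float
    epochs: int
    val_acc: float             # best validation accuracy reached during the run
    test_acc: Optional[float]  # evaluated on the test split, but never used for selection


def select_by_validation(runs: List[RunResult]) -> RunResult:
    # Highest validation accuracy wins. max() keeps the first maximal element,
    # so ties go to the earliest run in grid order. Test accuracy is never consulted.
    return max(runs, key=lambda r: r.val_acc)


# Per-run numbers copied from the full fine-tuning log above (FP task)
runs = [
    RunResult(1e-5, 3, 0.8693, 0.8377),
    RunResult(1e-5, 4, 0.8624, 0.8432),
    RunResult(1e-5, 5, 0.8652, 0.8487),
    RunResult(2e-5, 4, 0.8542, 0.8514),
    RunResult(3e-5, 3, 0.8611, 0.8542),
    RunResult(5e-5, 3, 0.8693, 0.8624),
]

best = select_by_validation(runs)
print(best)  # → RunResult(lr=1e-05, epochs=3, val_acc=0.8693, test_acc=0.8377)
```

Under validation-only selection, LR 1e-05 and LR 5e-05 (both with 3 epochs) tie at 0.8693. Because `max` keeps the earlier grid point, the test accuracy to report would be 0.8377. Any other tie-break, such as validation loss, macro-F1, or fewer epochs, is equally valid as long as it never looks at the test split.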

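The `⟳ No improvement for k/3 epochs` and `⏹ Early stopping` lines are consistent with patience-based early stopping on validation accuracy, using a patience of 3. In this rule, matching the previous best does not count as an improvement. For example, with LR 3e-05 and 5 epochs, epoch 4 reaches 0.8638, which equals the best so far, yet the counter still advances to 2/3. The sketch below reproduces that rule and replays three runs from the log. It is a reconstruction from the printed messages, not the notebook's own training loop, and `EarlyStopping` and `replay` are hypothetical names.

```python
class EarlyStopping:
    """Stop once validation accuracy has not strictly improved for `patience` epochs."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_acc: float) -> bool:
        """Record one epoch's validation accuracy; return True if training should stop."""
        if val_acc > self.best:  # strict: equalling the best counts as no improvement
            self.best = val_acc
            self.bad_epochs = 0  # a real loop would also save a checkpoint here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def replay(val_accs, patience=3):
    """Return the epoch at which training would stop, or None if it runs to the end."""
    stopper = EarlyStopping(patience)
    for epoch, acc in enumerate(val_accs, start=1):
        if stopper.step(acc):
            return epoch
    return None


# Validation accuracies copied from the log above
print(replay([0.8308, 0.8638, 0.8473, 0.8638, 0.8418]))  # LR 3e-05, 5 epochs → 5
print(replay([0.8597, 0.8514, 0.8446, 0.8542]))          # LR 5e-05, 5 epochs → 4
print(replay([0.8583, 0.8391, 0.8638, 0.8528]))          # LR 5e-05, 4 epochs → None
```

The third replay shows that an improvement resets the counter. This matches the log, where epoch 4 of that run reports `No improvement for 1/3 epochs` rather than 2/3.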

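The progress bars let you recover the batch size. The dataset has 3,392 training and 727 validation examples. The full fine-tuning run above logs 106 training and 23 validation batches per epoch, which implies batch size 32. The LoRA run below logs 212 and 46, which implies batch size 16. The LoRA search therefore appears to use a different batch size from the full fine-tuning search on the same task. This assumes the DataLoader keeps the last partial batch (`drop_last=False`, PyTorch's default). `batches_per_epoch` is a hypothetical helper used only for this check.

```python
import math

# Split sizes printed by the notebook ("Dataset sizes - Train: 3392, Val: 727, Test: 727")
TRAIN, VAL, TEST = 3392, 727, 727


def batches_per_epoch(n_examples: int, batch_size: int) -> int:
    # With drop_last=False the final, smaller batch is kept,
    # so the number of batches is ceil(n / batch_size).
    return math.ceil(n_examples / batch_size)


total = TRAIN + VAL + TEST
print(f"total={total}, split={TRAIN / total:.2%} / {VAL / total:.2%} / {TEST / total:.2%}")
for batch_size in (32, 16):
    # prints "32 106 23" and then "16 212 46"
    print(batch_size, batches_per_epoch(TRAIN, batch_size), batches_per_epoch(VAL, batch_size))
```

The same arithmetic accounts for the test progress bars: 23 batches at size 32 and 46 at size 16. The split itself works out to 70% / 15% / 15% of 4,846 examples.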
### Bert + LoRA on Financial PhraseBank
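The cell below prints `trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037`. It also prints a `number of trainable param:  109484547` line, which appears to be counted before LoRA is applied, since it equals the size of the full `BertForSequenceClassification` with a 3-way head. The LoRA configuration is defined earlier in the notebook and is not shown here. The worked count below therefore rests on two assumptions. First, it assumes rank r=16 on the `query`, `key`, and `value` projections of all 12 layers; r=12 on those three plus the attention output projection gives the same total. Second, it assumes the classifier is trained through PEFT's `modules_to_save`, which keeps the frozen original head next to a trainable copy. This is an arithmetic check under those assumptions, not the notebook's actual config.

```python
# BERT-base (bert-base-uncased) dimensions
VOCAB, MAX_POS, TYPE_VOCAB = 30522, 512, 2
HIDDEN, FFN, LAYERS = 768, 3072, 12
NUM_LABELS = 3  # Financial PhraseBank: negative / neutral / positive


def linear(n_in: int, n_out: int) -> int:
    """Parameters of nn.Linear(n_in, n_out) with bias."""
    return n_in * n_out + n_out


def lora(n_in: int, n_out: int, r: int) -> int:
    """Extra parameters LoRA adds to one linear layer: A is r x n_in, B is n_out x r."""
    return r * n_in + n_out * r


# Frozen backbone: embeddings (+ LayerNorm), 12 encoder layers, pooler
embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + 2 * HIDDEN
layer = (4 * linear(HIDDEN, HIDDEN) + 2 * HIDDEN                   # Q, K, V, attn output + LayerNorm
         + linear(HIDDEN, FFN) + linear(FFN, HIDDEN) + 2 * HIDDEN)  # feed-forward + LayerNorm
backbone = embeddings + LAYERS * layer + linear(HIDDEN, HIDDEN)
classifier = linear(HIDDEN, NUM_LABELS)

# ASSUMED (hypothetical) LoRA config: r=16 on query, key and value in every layer
R, TARGETS_PER_LAYER = 16, 3
lora_total = LAYERS * TARGETS_PER_LAYER * lora(HIDDEN, HIDDEN, R)

trainable = lora_total + classifier                            # LoRA matrices + trainable head copy
all_params = backbone + classifier + lora_total + classifier  # frozen original head kept as well

print(f"before LoRA: {backbone + classifier}")  # → before LoRA: 109484547
print(f"trainable params: {trainable:,} || all params: {all_params:,} "
      f"|| trainable%: {100 * trainable / all_params:.4f}")
# → trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037
```

Under these assumptions, only about 0.8% of the parameters receive gradients. Because the counts match the log exactly, the classifier head is most likely trained alongside the LoRA matrices rather than left frozen.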

In [None]:
# Hyperparameter grid for BERT + LoRA on Financial PhraseBank (task 'FP')
model = 'bert-base-uncased'
lr = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]  # learning rates to search
epochs = [3, 4, 5]                                # epoch counts to search
task = 'FP'
dataset = dataset_FP
lora = True       # wrap the model with LoRA
adapter = False   # do not add adapters in this run

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 3392, Val: 727, Test: 727

Starting training with LR: 1e-06, Epochs: 3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/3 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.00it/s, Loss=0.2545]
Epoch 1/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 18.25it/s]


Epoch 1/3 done!
Average training loss: 0.6104
Validation accuracy: 0.8336
✓ New best model (val_acc: 0.8336)


Epoch 2/3 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.02it/s, Loss=0.5629]
Epoch 2/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 18.09it/s]


Epoch 2/3 done!
Average training loss: 0.4091
Validation accuracy: 0.7552
⟳ No improvement for 1/3 epochs


Epoch 3/3 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.2819]
Epoch 3/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 18.04it/s]


Epoch 3/3 done!
Average training loss: 0.3316
Validation accuracy: 0.8267
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 18.01it/s]


✓ New best overall model (accuracy: 0.8129)

Starting training with LR: 1e-06, Epochs: 4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/4 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.90it/s, Loss=0.4551]
Epoch 1/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.80it/s]


Epoch 1/4 done!
Average training loss: 0.6551
Validation accuracy: 0.7992
✓ New best model (val_acc: 0.7992)


Epoch 2/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.82it/s, Loss=0.2208]
Epoch 2/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.66it/s]


Epoch 2/4 done!
Average training loss: 0.4157
Validation accuracy: 0.8652
✓ New best model (val_acc: 0.8652)


Epoch 3/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.3254]
Epoch 3/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.53it/s]


Epoch 3/4 done!
Average training loss: 0.3301
Validation accuracy: 0.8418
⟳ No improvement for 1/3 epochs


Epoch 4/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.78it/s, Loss=0.2720]
Epoch 4/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.57it/s]


Epoch 4/4 done!
Average training loss: 0.2895
Validation accuracy: 0.8336
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.59it/s]


✓ New best overall model (accuracy: 0.8281)

Starting training with LR: 1e-06, Epochs: 5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.2102]
Epoch 1/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.45it/s]


Epoch 1/5 done!
Average training loss: 0.6643
Validation accuracy: 0.8556
✓ New best model (val_acc: 0.8556)


Epoch 2/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.5053]
Epoch 2/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.46it/s]


Epoch 2/5 done!
Average training loss: 0.4057
Validation accuracy: 0.8349
⟳ No improvement for 1/3 epochs


Epoch 3/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.78it/s, Loss=0.6194]
Epoch 3/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.37it/s]


Epoch 3/5 done!
Average training loss: 0.3387
Validation accuracy: 0.8432
⟳ No improvement for 2/3 epochs


Epoch 4/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.75it/s, Loss=0.2396]
Epoch 4/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.38it/s]


Epoch 4/5 done!
Average training loss: 0.2874
Validation accuracy: 0.8432
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 4


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.37it/s]



Starting training with LR: 1e-05, Epochs: 3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.77it/s, Loss=0.7077]
Epoch 1/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.33it/s]


Epoch 1/3 done!
Average training loss: 0.6175
Validation accuracy: 0.8391
✓ New best model (val_acc: 0.8391)


Epoch 2/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.76it/s, Loss=0.5012]
Epoch 2/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.24it/s]


Epoch 2/3 done!
Average training loss: 0.4018
Validation accuracy: 0.8597
✓ New best model (val_acc: 0.8597)


Epoch 3/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.74it/s, Loss=0.4262]
Epoch 3/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.37it/s]


Epoch 3/3 done!
Average training loss: 0.3207
Validation accuracy: 0.8253
⟳ No improvement for 1/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.35it/s]



Starting training with LR: 1e-05, Epochs: 4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.75it/s, Loss=0.4477]
Epoch 1/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.16it/s]


Epoch 1/4 done!
Average training loss: 0.5842
Validation accuracy: 0.8336
✓ New best model (val_acc: 0.8336)


Epoch 2/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.76it/s, Loss=0.2756]
Epoch 2/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.13it/s]


Epoch 2/4 done!
Average training loss: 0.4019
Validation accuracy: 0.8336
⟳ No improvement for 1/3 epochs


Epoch 3/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.74it/s, Loss=0.2243]
Epoch 3/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.09it/s]


Epoch 3/4 done!
Average training loss: 0.3148
Validation accuracy: 0.8459
✓ New best model (val_acc: 0.8459)


Epoch 4/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.75it/s, Loss=0.7097]
Epoch 4/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.13it/s]


Epoch 4/4 done!
Average training loss: 0.2852
Validation accuracy: 0.8459
⟳ No improvement for 1/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.26it/s]



Starting training with LR: 1e-05, Epochs: 5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.2678]
Epoch 1/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.61it/s]


Epoch 1/5 done!
Average training loss: 0.6178
Validation accuracy: 0.8391
✓ New best model (val_acc: 0.8391)


Epoch 2/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.4672]
Epoch 2/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.44it/s]


Epoch 2/5 done!
Average training loss: 0.4050
Validation accuracy: 0.8198
⟳ No improvement for 1/3 epochs


Epoch 3/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.3097]
Epoch 3/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.38it/s]


Epoch 3/5 done!
Average training loss: 0.3415
Validation accuracy: 0.8487
✓ New best model (val_acc: 0.8487)


Epoch 4/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.6013]
Epoch 4/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.53it/s]


Epoch 4/5 done!
Average training loss: 0.2952
Validation accuracy: 0.8391
⟳ No improvement for 1/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.0818]
Epoch 5/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.50it/s]


Epoch 5/5 done!
Average training loss: 0.2860
Validation accuracy: 0.8404
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.69it/s]



Starting training with LR: 2e-05, Epochs: 3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.3050]
Epoch 1/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.70it/s]


Epoch 1/3 done!
Average training loss: 0.6147
Validation accuracy: 0.8143
✓ New best model (val_acc: 0.8143)


Epoch 2/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.1584]
Epoch 2/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.55it/s]


Epoch 2/3 done!
Average training loss: 0.3895
Validation accuracy: 0.8459
✓ New best model (val_acc: 0.8459)


Epoch 3/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.7738]
Epoch 3/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.34it/s]


Epoch 3/3 done!
Average training loss: 0.3371
Validation accuracy: 0.8239
⟳ No improvement for 1/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.35it/s]



Starting training with LR: 2e-05, Epochs: 4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.77it/s, Loss=0.3684]
Epoch 1/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.39it/s]


Epoch 1/4 done!
Average training loss: 0.6063
Validation accuracy: 0.8281
✓ New best model (val_acc: 0.8281)


Epoch 2/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.76it/s, Loss=0.2426]
Epoch 2/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.51it/s]


Epoch 2/4 done!
Average training loss: 0.4096
Validation accuracy: 0.8253
⟳ No improvement for 1/3 epochs


Epoch 3/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.77it/s, Loss=0.6108]
Epoch 3/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.53it/s]


Epoch 3/4 done!
Average training loss: 0.3259
Validation accuracy: 0.8212
⟳ No improvement for 2/3 epochs


Epoch 4/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.5030]
Epoch 4/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.72it/s]


Epoch 4/4 done!
Average training loss: 0.3002
Validation accuracy: 0.8143
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 4


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.67it/s]



Starting training with LR: 2e-05, Epochs: 5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.9097]
Epoch 1/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.20it/s]


Epoch 1/5 done!
Average training loss: 0.6458
Validation accuracy: 0.8542
✓ New best model (val_acc: 0.8542)


Epoch 2/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.78it/s, Loss=0.4887]
Epoch 2/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.39it/s]


Epoch 2/5 done!
Average training loss: 0.4258
Validation accuracy: 0.8473
⟳ No improvement for 1/3 epochs


Epoch 3/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.81it/s, Loss=0.4711]
Epoch 3/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.45it/s]


Epoch 3/5 done!
Average training loss: 0.3493
Validation accuracy: 0.8446
⟳ No improvement for 2/3 epochs


Epoch 4/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.2360]
Epoch 4/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.41it/s]


Epoch 4/5 done!
Average training loss: 0.2895
Validation accuracy: 0.8322
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 4


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.42it/s]



Starting training with LR: 3e-05, Epochs: 3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.1898]
Epoch 1/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.54it/s]


Epoch 1/3 done!
Average training loss: 0.6434
Validation accuracy: 0.8281
✓ New best model (val_acc: 0.8281)


Epoch 2/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.1473]
Epoch 2/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.26it/s]


Epoch 2/3 done!
Average training loss: 0.4192
Validation accuracy: 0.8459
✓ New best model (val_acc: 0.8459)


Epoch 3/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.2675]
Epoch 3/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.43it/s]


Epoch 3/3 done!
Average training loss: 0.3412
Validation accuracy: 0.8514
✓ New best model (val_acc: 0.8514)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.38it/s]


✓ New best overall model (accuracy: 0.8322)

Starting training with LR: 3e-05, Epochs: 4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.76it/s, Loss=0.7069]
Epoch 1/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.75it/s]


Epoch 1/4 done!
Average training loss: 0.6512
Validation accuracy: 0.8061
✓ New best model (val_acc: 0.8061)


Epoch 2/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.76it/s, Loss=0.3476]
Epoch 2/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.74it/s]


Epoch 2/4 done!
Average training loss: 0.4436
Validation accuracy: 0.8473
✓ New best model (val_acc: 0.8473)


Epoch 3/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.3853]
Epoch 3/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.96it/s]


Epoch 3/4 done!
Average training loss: 0.3851
Validation accuracy: 0.8446
⟳ No improvement for 1/3 epochs


Epoch 4/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.81it/s, Loss=0.5127]
Epoch 4/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.87it/s]


Epoch 4/4 done!
Average training loss: 0.3257
Validation accuracy: 0.8556
✓ New best model (val_acc: 0.8556)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.92it/s]



Starting training with LR: 3e-05, Epochs: 5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.77it/s, Loss=0.5589]
Epoch 1/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.10it/s]


Epoch 1/5 done!
Average training loss: 0.6127
Validation accuracy: 0.8184
✓ New best model (val_acc: 0.8184)


Epoch 2/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.77it/s, Loss=0.4879]
Epoch 2/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.24it/s]


Epoch 2/5 done!
Average training loss: 0.4015
Validation accuracy: 0.8514
✓ New best model (val_acc: 0.8514)


Epoch 3/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.4859]
Epoch 3/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.19it/s]


Epoch 3/5 done!
Average training loss: 0.3242
Validation accuracy: 0.8143
⟳ No improvement for 1/3 epochs


Epoch 4/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.81it/s, Loss=0.2102]
Epoch 4/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.23it/s]


Epoch 4/5 done!
Average training loss: 0.2778
Validation accuracy: 0.8501
⟳ No improvement for 2/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.7571]
Epoch 5/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.16it/s]


Epoch 5/5 done!
Average training loss: 0.2578
Validation accuracy: 0.8377
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 5


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.26it/s]



Starting training with LR: 5e-05, Epochs: 3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.78it/s, Loss=0.4364]
Epoch 1/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.71it/s]


Epoch 1/3 done!
Average training loss: 0.6496
Validation accuracy: 0.7593
✓ New best model (val_acc: 0.7593)


Epoch 2/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.77it/s, Loss=1.1118]
Epoch 2/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.78it/s]


Epoch 2/3 done!
Average training loss: 0.4162
Validation accuracy: 0.8542
✓ New best model (val_acc: 0.8542)


Epoch 3/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.0358]
Epoch 3/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.86it/s]


Epoch 3/3 done!
Average training loss: 0.3438
Validation accuracy: 0.8239
⟳ No improvement for 1/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.94it/s]



Starting training with LR: 5e-05, Epochs: 4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.83it/s, Loss=0.3798]
Epoch 1/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.05it/s]


Epoch 1/4 done!
Average training loss: 0.5886
Validation accuracy: 0.8459
✓ New best model (val_acc: 0.8459)


Epoch 2/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.76it/s, Loss=0.2058]
Epoch 2/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.95it/s]


Epoch 2/4 done!
Average training loss: 0.3902
Validation accuracy: 0.8391
⟳ No improvement for 1/3 epochs


Epoch 3/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.77it/s, Loss=0.2542]
Epoch 3/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.07it/s]


Epoch 3/4 done!
Average training loss: 0.3158
Validation accuracy: 0.8363
⟳ No improvement for 2/3 epochs


Epoch 4/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.75it/s, Loss=0.1200]
Epoch 4/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.13it/s]


Epoch 4/4 done!
Average training loss: 0.2836
Validation accuracy: 0.8377
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 4


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.26it/s]



Starting training with LR: 5e-05, Epochs: 5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.77it/s, Loss=1.3631]
Epoch 1/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.79it/s]


Epoch 1/5 done!
Average training loss: 0.6465
Validation accuracy: 0.7992
✓ New best model (val_acc: 0.7992)


Epoch 2/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.78it/s, Loss=0.2653]
Epoch 2/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.88it/s]


Epoch 2/5 done!
Average training loss: 0.4098
Validation accuracy: 0.8487
✓ New best model (val_acc: 0.8487)


Epoch 3/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.81it/s, Loss=0.4396]
Epoch 3/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.79it/s]


Epoch 3/5 done!
Average training loss: 0.3541
Validation accuracy: 0.8322
⟳ No improvement for 1/3 epochs


Epoch 4/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.76it/s, Loss=0.1476]
Epoch 4/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.54it/s]


Epoch 4/5 done!
Average training loss: 0.2826
Validation accuracy: 0.8253
⟳ No improvement for 2/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.75it/s, Loss=0.1877]
Epoch 5/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.75it/s]


Epoch 5/5 done!
Average training loss: 0.2537
Validation accuracy: 0.8129
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 5


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.86it/s]



Starting training with LR: 0.0001, Epochs: 3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.82it/s, Loss=0.4960]
Epoch 1/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.19it/s]


Epoch 1/3 done!
Average training loss: 0.6208
Validation accuracy: 0.8308
✓ New best model (val_acc: 0.8308)


Epoch 2/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.81it/s, Loss=0.6626]
Epoch 2/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.03it/s]


Epoch 2/3 done!
Average training loss: 0.3820
Validation accuracy: 0.8391
✓ New best model (val_acc: 0.8391)


Epoch 3/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.3780]
Epoch 3/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.08it/s]


Epoch 3/3 done!
Average training loss: 0.3387
Validation accuracy: 0.8501
✓ New best model (val_acc: 0.8501)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.27it/s]



Starting training with LR: 0.0001, Epochs: 4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.73it/s, Loss=0.4996]
Epoch 1/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.85it/s]


Epoch 1/4 done!
Average training loss: 0.5870
Validation accuracy: 0.8473
✓ New best model (val_acc: 0.8473)


Epoch 2/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.75it/s, Loss=0.1442]
Epoch 2/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.86it/s]


Epoch 2/4 done!
Average training loss: 0.3954
Validation accuracy: 0.8336
⟳ No improvement for 1/3 epochs


Epoch 3/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.81it/s, Loss=0.1940]
Epoch 3/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.88it/s]


Epoch 3/4 done!
Average training loss: 0.3347
Validation accuracy: 0.8432
⟳ No improvement for 2/3 epochs


Epoch 4/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.81it/s, Loss=0.4045]
Epoch 4/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.88it/s]


Epoch 4/4 done!
Average training loss: 0.2739
Validation accuracy: 0.8514
✓ New best model (val_acc: 0.8514)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.73it/s]


✓ New best overall model (accuracy: 0.8336)

Starting training with LR: 0.0001, Epochs: 5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.3120]
Epoch 1/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.07it/s]


Epoch 1/5 done!
Average training loss: 0.6053
Validation accuracy: 0.8432
✓ New best model (val_acc: 0.8432)


Epoch 2/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.76it/s, Loss=0.4936]
Epoch 2/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.10it/s]


Epoch 2/5 done!
Average training loss: 0.3890
Validation accuracy: 0.8542
✓ New best model (val_acc: 0.8542)


Epoch 3/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.81it/s, Loss=0.2474]
Epoch 3/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.12it/s]


Epoch 3/5 done!
Average training loss: 0.3302
Validation accuracy: 0.8459
⟳ No improvement for 1/3 epochs


Epoch 4/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.1849]
Epoch 4/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.14it/s]


Epoch 4/5 done!
Average training loss: 0.2809
Validation accuracy: 0.8514
⟳ No improvement for 2/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.77it/s, Loss=0.2013]
Epoch 5/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.94it/s]


Epoch 5/5 done!
Average training loss: 0.2835
Validation accuracy: 0.8459
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 5


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.16it/s]


✓ New best overall model (accuracy: 0.8349)

Starting training with LR: 0.001, Epochs: 3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.4430]
Epoch 1/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.88it/s]


Epoch 1/3 done!
Average training loss: 0.6047
Validation accuracy: 0.8308
✓ New best model (val_acc: 0.8308)


Epoch 2/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.82it/s, Loss=0.5189]
Epoch 2/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.90it/s]


Epoch 2/3 done!
Average training loss: 0.3871
Validation accuracy: 0.8157
⟳ No improvement for 1/3 epochs


Epoch 3/3 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.5651]
Epoch 3/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.53it/s]


Epoch 3/3 done!
Average training loss: 0.3227
Validation accuracy: 0.8473
✓ New best model (val_acc: 0.8473)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.49it/s]


✓ New best overall model (accuracy: 0.8501)

Starting training with LR: 0.001, Epochs: 4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.73it/s, Loss=0.8649]
Epoch 1/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.92it/s]


Epoch 1/4 done!
Average training loss: 0.6493
Validation accuracy: 0.8239
✓ New best model (val_acc: 0.8239)


Epoch 2/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.78it/s, Loss=0.3003]
Epoch 2/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.18it/s]


Epoch 2/4 done!
Average training loss: 0.3871
Validation accuracy: 0.8514
✓ New best model (val_acc: 0.8514)


Epoch 3/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.2638]
Epoch 3/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.25it/s]


Epoch 3/4 done!
Average training loss: 0.3195
Validation accuracy: 0.8446
⟳ No improvement for 1/3 epochs


Epoch 4/4 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.81it/s, Loss=0.4523]
Epoch 4/4 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.16it/s]


Epoch 4/4 done!
Average training loss: 0.2762
Validation accuracy: 0.8143
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.40it/s]



Starting training with LR: 0.001, Epochs: 5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.82it/s, Loss=0.1585]
Epoch 1/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.51it/s]


Epoch 1/5 done!
Average training loss: 0.6465
Validation accuracy: 0.8088
✓ New best model (val_acc: 0.8088)


Epoch 2/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.2767]
Epoch 2/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.72it/s]


Epoch 2/5 done!
Average training loss: 0.4396
Validation accuracy: 0.8487
✓ New best model (val_acc: 0.8487)


Epoch 3/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.80it/s, Loss=0.1529]
Epoch 3/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.48it/s]


Epoch 3/5 done!
Average training loss: 0.3662
Validation accuracy: 0.8569
✓ New best model (val_acc: 0.8569)


Epoch 4/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.77it/s, Loss=0.1209]
Epoch 4/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.46it/s]


Epoch 4/5 done!
Average training loss: 0.3430
Validation accuracy: 0.8514
⟳ No improvement for 1/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.1531]
Epoch 5/5 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.68it/s]


Epoch 5/5 done!
Average training loss: 0.3112
Validation accuracy: 0.7950
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.51it/s]



Best Hyperparameters:
Learning Rate: 0.001
Epochs: 3
Batch Size: 16
Validation Accuracy: 0.8473
Test Accuracy: 0.8501

All results saved in the 'bert_grid_search_20250503_143836' directory.
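Two things in the log above can be checked by hand: the two parameter lines printed for every LoRA run, and the "No improvement for k/3 epochs" messages. The plain-Python sketch below does both. The helper names are hypothetical and not part of the notebook. It assumes the `bert-base-uncased` encoder has 109,482,240 parameters, the Financial Phrasebank head is a single 768×3 linear layer, and PEFT trains a copy of that head alongside the LoRA matrices. The early-stopping replay assumes a strict improvement check on validation accuracy with patience 3 and a counter that resets whenever a new best is found. That rule is consistent with the log, but it is a reconstruction.

```python
# Hypothetical helpers for reading the LoRA grid-search log; not part of the notebook.

HIDDEN = 768             # BERT-base hidden size
NUM_LAYERS = 12          # BERT-base encoder layers
BERT_BASE = 109_482_240  # assumed parameter count of the bert-base-uncased encoder


def lora_param_breakdown(trainable, total, num_labels=3):
    """Split PEFT's printed counts, assuming a trainable copy of the linear head."""
    head = HIDDEN * num_labels + num_labels  # classifier weight + bias
    base = BERT_BASE + head                  # model before LoRA is injected
    lora = trainable - head                  # remainder attributed to LoRA A/B matrices
    # A rank-r LoRA pair on a 768x768 projection adds r * (768 + 768) parameters,
    # so this is the sum of ranks over all adapted projections in one layer.
    rank_sum_per_layer = lora / (NUM_LAYERS * 2 * HIDDEN)
    return {
        "base": base,
        "base_plus_trainable": base + trainable,
        "trainable_pct": round(100 * trainable / total, 4),
        "lora": lora,
        "rank_sum_per_layer": rank_sum_per_layer,
    }


def replay_early_stopping(val_accs, patience=3):
    """Return (best_acc, best_epoch, last_epoch_run) for a patience-based rule."""
    best_acc, best_epoch, bad_epochs = float("-inf"), 0, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best_acc:              # strict improvement resets the counter
            best_acc, best_epoch, bad_epochs = acc, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # corresponds to "Early stopping at epoch N"
                return best_acc, best_epoch, epoch
    return best_acc, best_epoch, len(val_accs)


if __name__ == "__main__":
    print(lora_param_breakdown(trainable=887_043, total=110_371_590))
    # LR 2e-05, 4 epochs: stops at epoch 4 and keeps the epoch-1 checkpoint
    print(replay_early_stopping([0.8281, 0.8253, 0.8212, 0.8143]))
    # LR 0.001, 5 epochs: no early stop, best checkpoint at epoch 3
    print(replay_early_stopping([0.8088, 0.8487, 0.8569, 0.8514, 0.7950]))
```

Under these assumptions, `base` reproduces the `number of trainable param:  109484547` line. That line therefore appears to count the model before LoRA is applied, and `base + trainable` equals PEFT's `all params` exactly. A per-layer rank sum of 48 would fit, for example, r = 16 on three 768×768 projections. The actual `LoraConfig` is not shown in this section, so treat that as an inference.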


### Bert + AdapterH on Financial Phrasebank
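AdapterH presumably denotes the Houlsby et al. (2019) bottleneck adapter. It consists of a down-projection to a small bottleneck, a nonlinearity, an up-projection and a residual connection, and it is inserted twice in each Transformer layer while the pretrained weights are typically kept frozen. The dependency-free sketch below shows one such module. The class name, the ReLU nonlinearity, the reduction factor of 16 and the zero-initialised up-projection are illustrative assumptions. They are not the configuration used inside `hyperparameter_grid_search`.

```python
import random


class BottleneckAdapter:
    """Hypothetical Houlsby-style adapter: h + W_up . relu(W_down . h + b_down) + b_up."""

    def __init__(self, hidden_size, reduction_factor=16, seed=0):
        self.hidden_size = hidden_size
        self.bottleneck = hidden_size // reduction_factor
        rng = random.Random(seed)
        # Down-projection: small random weights, zero bias.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                       for _ in range(self.bottleneck)]
        self.b_down = [0.0] * self.bottleneck
        # Up-projection starts at zero, so the adapter is initially the identity map.
        self.w_up = [[0.0] * self.bottleneck for _ in range(hidden_size)]
        self.b_up = [0.0] * hidden_size

    def num_params(self):
        d, m = self.hidden_size, self.bottleneck
        return d * m + m + m * d + d  # W_down, b_down, W_up, b_up

    def __call__(self, h):
        # Project down to the bottleneck and apply ReLU.
        z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(self.w_down, self.b_down)]
        # Project back up and add the residual connection.
        return [x + sum(w * zi for w, zi in zip(row, z)) + b
                for x, row, b in zip(h, self.w_up, self.b_up)]


if __name__ == "__main__":
    adapter = BottleneckAdapter(hidden_size=768)
    print(adapter.bottleneck, adapter.num_params())
    # Two adapters per layer across the 12 BERT-base layers:
    print(2 * 12 * adapter.num_params())
```

With these assumptions, one adapter adds 74,544 parameters. Two adapters per layer across BERT-base's 12 layers add 1,789,056, which is under 2% of the encoder. Zero-initialising the up-projection is an exact version of the near-identity initialisation the Houlsby design relies on. Training therefore starts from the unmodified pretrained model.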

In [None]:
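import os

# Parameters for the hyperparameter_grid_search function
model = 'bert-base-uncased'
lr = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]  # learning rates to try
epochs = [6, 9, 11]     # adapters use a longer epoch grid than full/LoRA fine-tuning ([3, 4, 5])
task = 'FP'             # Financial Phrasebank
dataset = dataset_FP
lora = False            # LoRA disabled for this run
adapter = True          # train with adapters instead

# Create the folder where the trained adapters are stored
if adapter:
    os.makedirs("./adapters", exist_ok=True)

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)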

Using device: cuda
Dataset sizes - Train: 3392, Val: 727, Test: 727

Starting training with LR: 1e-06, Epochs: 6


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/6 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.69it/s, Loss=1.0681]
Epoch 1/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.88it/s]


Epoch 1/6 done!
Average training loss: 1.0695
Validation accuracy: 0.6011
✓ New best model (val_acc: 0.6011)


Epoch 2/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=1.0776]
Epoch 2/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.39it/s]


Epoch 2/6 done!
Average training loss: 0.9834
Validation accuracy: 0.5970
⟳ No improvement for 1/3 epochs


Epoch 3/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.06it/s, Loss=1.0334]
Epoch 3/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.44it/s]


Epoch 3/6 done!
Average training loss: 0.9386
Validation accuracy: 0.5970
⟳ No improvement for 2/3 epochs


Epoch 4/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.99it/s, Loss=0.8977]
Epoch 4/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.96it/s]


Epoch 4/6 done!
Average training loss: 0.9154
Validation accuracy: 0.5970
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 4


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.02it/s]


✓ New best overall model (accuracy: 0.5681)

Starting training with LR: 1e-06, Epochs: 9


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.93it/s, Loss=0.9636]
Epoch 1/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.01it/s]


Epoch 1/9 done!
Average training loss: 1.0454
Validation accuracy: 0.5970
✓ New best model (val_acc: 0.5970)


Epoch 2/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.9383]
Epoch 2/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.06it/s]


Epoch 2/9 done!
Average training loss: 0.9709
Validation accuracy: 0.5970
⟳ No improvement for 1/3 epochs


Epoch 3/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.01it/s, Loss=0.7252]
Epoch 3/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.27it/s]


Epoch 3/9 done!
Average training loss: 0.9307
Validation accuracy: 0.5970
⟳ No improvement for 2/3 epochs


Epoch 4/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.03it/s, Loss=0.7914]
Epoch 4/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.15it/s]


Epoch 4/9 done!
Average training loss: 0.9108
Validation accuracy: 0.5970
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 4


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.06it/s]



Starting training with LR: 1e-06, Epochs: 11


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=1.0221]
Epoch 1/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.66it/s]


Epoch 1/11 done!
Average training loss: 1.0878
Validation accuracy: 0.5653
✓ New best model (val_acc: 0.5653)


Epoch 2/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.8864]
Epoch 2/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.45it/s]


Epoch 2/11 done!
Average training loss: 0.9916
Validation accuracy: 0.6011
✓ New best model (val_acc: 0.6011)


Epoch 3/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.01it/s, Loss=0.8096]
Epoch 3/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.58it/s]


Epoch 3/11 done!
Average training loss: 0.9463
Validation accuracy: 0.5970
⟳ No improvement for 1/3 epochs


Epoch 4/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.02it/s, Loss=1.0200]
Epoch 4/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.53it/s]


Epoch 4/11 done!
Average training loss: 0.9183
Validation accuracy: 0.5970
⟳ No improvement for 2/3 epochs


Epoch 5/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.00it/s, Loss=0.8333]
Epoch 5/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.54it/s]


Epoch 5/11 done!
Average training loss: 0.9005
Validation accuracy: 0.5970
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 5


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.53it/s]



Starting training with LR: 1e-05, Epochs: 6


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.8370]
Epoch 1/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.37it/s]


Epoch 1/6 done!
Average training loss: 0.9240
Validation accuracy: 0.6080
✓ New best model (val_acc: 0.6080)


Epoch 2/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.8340]
Epoch 2/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.35it/s]


Epoch 2/6 done!
Average training loss: 0.7816
Validation accuracy: 0.7015
✓ New best model (val_acc: 0.7015)


Epoch 3/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.4787]
Epoch 3/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.47it/s]


Epoch 3/6 done!
Average training loss: 0.6853
Validation accuracy: 0.7276
✓ New best model (val_acc: 0.7276)


Epoch 4/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.4645]
Epoch 4/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.56it/s]


Epoch 4/6 done!
Average training loss: 0.6082
Validation accuracy: 0.7552
✓ New best model (val_acc: 0.7552)


Epoch 5/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.5383]
Epoch 5/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.52it/s]


Epoch 5/6 done!
Average training loss: 0.5468
Validation accuracy: 0.7689
✓ New best model (val_acc: 0.7689)


Epoch 6/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.5059]
Epoch 6/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.55it/s]


Epoch 6/6 done!
Average training loss: 0.5053
Validation accuracy: 0.7978
✓ New best model (val_acc: 0.7978)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.58it/s]


✓ New best overall model (accuracy: 0.7868)

Starting training with LR: 1e-05, Epochs: 9


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.7193]
Epoch 1/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.57it/s]


Epoch 1/9 done!
Average training loss: 0.9007
Validation accuracy: 0.6080
✓ New best model (val_acc: 0.6080)


Epoch 2/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.7822]
Epoch 2/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.57it/s]


Epoch 2/9 done!
Average training loss: 0.7774
Validation accuracy: 0.6768
✓ New best model (val_acc: 0.6768)


Epoch 3/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.6030]
Epoch 3/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.74it/s]


Epoch 3/9 done!
Average training loss: 0.6935
Validation accuracy: 0.7331
✓ New best model (val_acc: 0.7331)


Epoch 4/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.01it/s, Loss=0.6630]
Epoch 4/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.58it/s]


Epoch 4/9 done!
Average training loss: 0.6112
Validation accuracy: 0.7455
✓ New best model (val_acc: 0.7455)


Epoch 5/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.00it/s, Loss=0.5981]
Epoch 5/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.65it/s]


Epoch 5/9 done!
Average training loss: 0.5522
Validation accuracy: 0.7717
✓ New best model (val_acc: 0.7717)


Epoch 6/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.4104]
Epoch 6/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.66it/s]


Epoch 6/9 done!
Average training loss: 0.5123
Validation accuracy: 0.7868
✓ New best model (val_acc: 0.7868)


Epoch 7/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.8046]
Epoch 7/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.61it/s]


Epoch 7/9 done!
Average training loss: 0.4806
Validation accuracy: 0.8006
✓ New best model (val_acc: 0.8006)


Epoch 8/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.2218]
Epoch 8/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.66it/s]


Epoch 8/9 done!
Average training loss: 0.4559
Validation accuracy: 0.8129
✓ New best model (val_acc: 0.8129)


Epoch 9/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.3041]
Epoch 9/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.56it/s]


Epoch 9/9 done!
Average training loss: 0.4295
Validation accuracy: 0.8253
✓ New best model (val_acc: 0.8253)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.63it/s]


✓ New best overall model (accuracy: 0.8006)

Starting training with LR: 1e-05, Epochs: 11


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.7537]
Epoch 1/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.51it/s]


Epoch 1/11 done!
Average training loss: 0.9009
Validation accuracy: 0.6327
✓ New best model (val_acc: 0.6327)


Epoch 2/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.7902]
Epoch 2/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.46it/s]


Epoch 2/11 done!
Average training loss: 0.7677
Validation accuracy: 0.7029
✓ New best model (val_acc: 0.7029)


Epoch 3/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.5332]
Epoch 3/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.50it/s]


Epoch 3/11 done!
Average training loss: 0.6807
Validation accuracy: 0.7249
✓ New best model (val_acc: 0.7249)


Epoch 4/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.4629]
Epoch 4/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.55it/s]


Epoch 4/11 done!
Average training loss: 0.6043
Validation accuracy: 0.7538
✓ New best model (val_acc: 0.7538)


Epoch 5/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.6138]
Epoch 5/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.57it/s]


Epoch 5/11 done!
Average training loss: 0.5389
Validation accuracy: 0.7854
✓ New best model (val_acc: 0.7854)


Epoch 6/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.99it/s, Loss=0.3986]
Epoch 6/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.49it/s]


Epoch 6/11 done!
Average training loss: 0.5032
Validation accuracy: 0.7964
✓ New best model (val_acc: 0.7964)


Epoch 7/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.4529]
Epoch 7/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.49it/s]


Epoch 7/11 done!
Average training loss: 0.4684
Validation accuracy: 0.8074
✓ New best model (val_acc: 0.8074)


Epoch 8/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.7372]
Epoch 8/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.54it/s]


Epoch 8/11 done!
Average training loss: 0.4461
Validation accuracy: 0.8184
✓ New best model (val_acc: 0.8184)


Epoch 9/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.3113]
Epoch 9/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.50it/s]


Epoch 9/11 done!
Average training loss: 0.4242
Validation accuracy: 0.8184
⟳ No improvement for 1/3 epochs


Epoch 10/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.5650]
Epoch 10/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.51it/s]


Epoch 10/11 done!
Average training loss: 0.4059
Validation accuracy: 0.8308
✓ New best model (val_acc: 0.8308)


Epoch 11/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.6112]
Epoch 11/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.51it/s]


Epoch 11/11 done!
Average training loss: 0.3941
Validation accuracy: 0.8363
✓ New best model (val_acc: 0.8363)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.51it/s]


✓ New best overall model (accuracy: 0.8267)

Starting training with LR: 2e-05, Epochs: 6


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.00it/s, Loss=0.8087]
Epoch 1/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.63it/s]


Epoch 1/6 done!
Average training loss: 0.8403
Validation accuracy: 0.6891
✓ New best model (val_acc: 0.6891)


Epoch 2/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.99it/s, Loss=0.4137]
Epoch 2/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.86it/s]


Epoch 2/6 done!
Average training loss: 0.6563
Validation accuracy: 0.7579
✓ New best model (val_acc: 0.7579)


Epoch 3/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.00it/s, Loss=0.4493]
Epoch 3/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.67it/s]


Epoch 3/6 done!
Average training loss: 0.5399
Validation accuracy: 0.7923
✓ New best model (val_acc: 0.7923)


Epoch 4/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.00it/s, Loss=0.4182]
Epoch 4/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.76it/s]


Epoch 4/6 done!
Average training loss: 0.4615
Validation accuracy: 0.8198
✓ New best model (val_acc: 0.8198)


Epoch 5/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.00it/s, Loss=0.5063]
Epoch 5/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.73it/s]


Epoch 5/6 done!
Average training loss: 0.4310
Validation accuracy: 0.8267
✓ New best model (val_acc: 0.8267)


Epoch 6/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.00it/s, Loss=0.4033]
Epoch 6/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.70it/s]


Epoch 6/6 done!
Average training loss: 0.4013
Validation accuracy: 0.8377
✓ New best model (val_acc: 0.8377)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.72it/s]



Starting training with LR: 2e-05, Epochs: 9


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.6726]
Epoch 1/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.61it/s]


Epoch 1/9 done!
Average training loss: 0.8519
Validation accuracy: 0.6891
✓ New best model (val_acc: 0.6891)


Epoch 2/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.5588]
Epoch 2/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.53it/s]


Epoch 2/9 done!
Average training loss: 0.6644
Validation accuracy: 0.7524
✓ New best model (val_acc: 0.7524)


Epoch 3/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.2974]
Epoch 3/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.62it/s]


Epoch 3/9 done!
Average training loss: 0.5316
Validation accuracy: 0.7909
✓ New best model (val_acc: 0.7909)


Epoch 4/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.2456]
Epoch 4/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.58it/s]


Epoch 4/9 done!
Average training loss: 0.4640
Validation accuracy: 0.8088
✓ New best model (val_acc: 0.8088)


Epoch 5/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.3413]
Epoch 5/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.64it/s]


Epoch 5/9 done!
Average training loss: 0.4187
Validation accuracy: 0.8322
✓ New best model (val_acc: 0.8322)


Epoch 6/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.3259]
Epoch 6/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.60it/s]


Epoch 6/9 done!
Average training loss: 0.3967
Validation accuracy: 0.8363
✓ New best model (val_acc: 0.8363)


Epoch 7/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.4454]
Epoch 7/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.62it/s]


Epoch 7/9 done!
Average training loss: 0.3692
Validation accuracy: 0.8473
✓ New best model (val_acc: 0.8473)


Epoch 8/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.00it/s, Loss=0.2181]
Epoch 8/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.69it/s]


Epoch 8/9 done!
Average training loss: 0.3550
Validation accuracy: 0.8528
✓ New best model (val_acc: 0.8528)


Epoch 9/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.99it/s, Loss=0.3853]
Epoch 9/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.52it/s]


Epoch 9/9 done!
Average training loss: 0.3324
Validation accuracy: 0.8583
✓ New best model (val_acc: 0.8583)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.64it/s]


✓ New best overall model (accuracy: 0.8308)

Starting training with LR: 2e-05, Epochs: 11


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.8430]
Epoch 1/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.89it/s]


Epoch 1/11 done!
Average training loss: 0.8514
Validation accuracy: 0.6878
✓ New best model (val_acc: 0.6878)


Epoch 2/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.91it/s, Loss=0.6944]
Epoch 2/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.99it/s]


Epoch 2/11 done!
Average training loss: 0.6670
Validation accuracy: 0.7497
✓ New best model (val_acc: 0.7497)


Epoch 3/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.4930]
Epoch 3/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.95it/s]


Epoch 3/11 done!
Average training loss: 0.5371
Validation accuracy: 0.7799
✓ New best model (val_acc: 0.7799)


Epoch 4/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.5366]
Epoch 4/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.88it/s]


Epoch 4/11 done!
Average training loss: 0.4777
Validation accuracy: 0.8019
✓ New best model (val_acc: 0.8019)


Epoch 5/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.93it/s, Loss=0.2806]
Epoch 5/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.97it/s]


Epoch 5/11 done!
Average training loss: 0.4289
Validation accuracy: 0.8226
✓ New best model (val_acc: 0.8226)


Epoch 6/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.3076]
Epoch 6/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.89it/s]


Epoch 6/11 done!
Average training loss: 0.3952
Validation accuracy: 0.8404
✓ New best model (val_acc: 0.8404)


Epoch 7/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.4726]
Epoch 7/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.93it/s]


Epoch 7/11 done!
Average training loss: 0.3795
Validation accuracy: 0.8404
⟳ No improvement for 1/3 epochs


Epoch 8/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.93it/s, Loss=0.4867]
Epoch 8/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.97it/s]


Epoch 8/11 done!
Average training loss: 0.3568
Validation accuracy: 0.8528
✓ New best model (val_acc: 0.8528)


Epoch 9/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.92it/s, Loss=0.1321]
Epoch 9/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.98it/s]


Epoch 9/11 done!
Average training loss: 0.3332
Validation accuracy: 0.8514
⟳ No improvement for 1/3 epochs


Epoch 10/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.6046]
Epoch 10/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.90it/s]


Epoch 10/11 done!
Average training loss: 0.3205
Validation accuracy: 0.8583
✓ New best model (val_acc: 0.8583)


Epoch 11/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.2443]
Epoch 11/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.89it/s]


Epoch 11/11 done!
Average training loss: 0.3024
Validation accuracy: 0.8501
⟳ No improvement for 1/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.04it/s]


✓ New best overall model (accuracy: 0.8418)

Starting training with LR: 3e-05, Epochs: 6


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.6670]
Epoch 1/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.93it/s]


Epoch 1/6 done!
Average training loss: 0.8032
Validation accuracy: 0.7221
✓ New best model (val_acc: 0.7221)


Epoch 2/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.4760]
Epoch 2/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.92it/s]


Epoch 2/6 done!
Average training loss: 0.5857
Validation accuracy: 0.7923
✓ New best model (val_acc: 0.7923)


Epoch 3/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.3375]
Epoch 3/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.13it/s]


Epoch 3/6 done!
Average training loss: 0.4745
Validation accuracy: 0.8171
✓ New best model (val_acc: 0.8171)


Epoch 4/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.6298]
Epoch 4/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.00it/s]


Epoch 4/6 done!
Average training loss: 0.4178
Validation accuracy: 0.8253
✓ New best model (val_acc: 0.8253)


Epoch 5/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.2703]
Epoch 5/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.17it/s]


Epoch 5/6 done!
Average training loss: 0.3792
Validation accuracy: 0.8294
✓ New best model (val_acc: 0.8294)


Epoch 6/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.3017]
Epoch 6/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.10it/s]


Epoch 6/6 done!
Average training loss: 0.3475
Validation accuracy: 0.8404
✓ New best model (val_acc: 0.8404)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.19it/s]



Starting training with LR: 3e-05, Epochs: 9


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.6872]
Epoch 1/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.39it/s]


Epoch 1/9 done!
Average training loss: 0.8123
Validation accuracy: 0.7166
✓ New best model (val_acc: 0.7166)


Epoch 2/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.3813]
Epoch 2/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.44it/s]


Epoch 2/9 done!
Average training loss: 0.5932
Validation accuracy: 0.7565
✓ New best model (val_acc: 0.7565)


Epoch 3/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.3335]
Epoch 3/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.53it/s]


Epoch 3/9 done!
Average training loss: 0.4717
Validation accuracy: 0.8116
✓ New best model (val_acc: 0.8116)


Epoch 4/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.3410]
Epoch 4/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.52it/s]


Epoch 4/9 done!
Average training loss: 0.4151
Validation accuracy: 0.8294
✓ New best model (val_acc: 0.8294)


Epoch 5/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.3056]
Epoch 5/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.56it/s]


Epoch 5/9 done!
Average training loss: 0.3765
Validation accuracy: 0.8446
✓ New best model (val_acc: 0.8446)


Epoch 6/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.3382]
Epoch 6/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.54it/s]


Epoch 6/9 done!
Average training loss: 0.3464
Validation accuracy: 0.8528
✓ New best model (val_acc: 0.8528)


Epoch 7/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.5286]
Epoch 7/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.53it/s]


Epoch 7/9 done!
Average training loss: 0.3227
Validation accuracy: 0.8556
✓ New best model (val_acc: 0.8556)


Epoch 8/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.2599]
Epoch 8/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.58it/s]


Epoch 8/9 done!
Average training loss: 0.2999
Validation accuracy: 0.8638
✓ New best model (val_acc: 0.8638)


Epoch 9/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.4402]
Epoch 9/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.52it/s]


Epoch 9/9 done!
Average training loss: 0.2853
Validation accuracy: 0.8611
⟳ No improvement for 1/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.60it/s]


✓ New best overall model (accuracy: 0.8446)

Starting training with LR: 3e-05, Epochs: 11


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.5029]
Epoch 1/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.96it/s]


Epoch 1/11 done!
Average training loss: 0.8153
Validation accuracy: 0.7098
✓ New best model (val_acc: 0.7098)


Epoch 2/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.92it/s, Loss=0.5659]
Epoch 2/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.04it/s]


Epoch 2/11 done!
Average training loss: 0.5838
Validation accuracy: 0.7854
✓ New best model (val_acc: 0.7854)


Epoch 3/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.4885]
Epoch 3/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.24it/s]


Epoch 3/11 done!
Average training loss: 0.4695
Validation accuracy: 0.8143
✓ New best model (val_acc: 0.8143)


Epoch 4/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.3124]
Epoch 4/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.22it/s]


Epoch 4/11 done!
Average training loss: 0.4184
Validation accuracy: 0.8294
✓ New best model (val_acc: 0.8294)


Epoch 5/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.1475]
Epoch 5/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.27it/s]


Epoch 5/11 done!
Average training loss: 0.3778
Validation accuracy: 0.8418
✓ New best model (val_acc: 0.8418)


Epoch 6/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.2983]
Epoch 6/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.08it/s]


Epoch 6/11 done!
Average training loss: 0.3542
Validation accuracy: 0.8514
✓ New best model (val_acc: 0.8514)


Epoch 7/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.1681]
Epoch 7/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.13it/s]


Epoch 7/11 done!
Average training loss: 0.3146
Validation accuracy: 0.8556
✓ New best model (val_acc: 0.8556)


Epoch 8/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.2276]
Epoch 8/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.30it/s]


Epoch 8/11 done!
Average training loss: 0.3085
Validation accuracy: 0.8583
✓ New best model (val_acc: 0.8583)


Epoch 9/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.2092]
Epoch 9/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.11it/s]


Epoch 9/11 done!
Average training loss: 0.2948
Validation accuracy: 0.8597
✓ New best model (val_acc: 0.8597)


Epoch 10/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.1171]
Epoch 10/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.13it/s]


Epoch 10/11 done!
Average training loss: 0.2755
Validation accuracy: 0.8583
⟳ No improvement for 1/3 epochs


Epoch 11/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.0340]
Epoch 11/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.23it/s]


Epoch 11/11 done!
Average training loss: 0.2691
Validation accuracy: 0.8638
✓ New best model (val_acc: 0.8638)


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.19it/s]


✓ New best overall model (accuracy: 0.8556)

Starting training with LR: 5e-05, Epochs: 6


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.01it/s, Loss=0.4920]
Epoch 1/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.69it/s]


Epoch 1/6 done!
Average training loss: 0.7461
Validation accuracy: 0.7813
✓ New best model (val_acc: 0.7813)


Epoch 2/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.2929]
Epoch 2/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.65it/s]


Epoch 2/6 done!
Average training loss: 0.4920
Validation accuracy: 0.8226
✓ New best model (val_acc: 0.8226)


Epoch 3/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.99it/s, Loss=0.3961]
Epoch 3/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.76it/s]


Epoch 3/6 done!
Average training loss: 0.4003
Validation accuracy: 0.8294
✓ New best model (val_acc: 0.8294)


Epoch 4/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.99it/s, Loss=0.5325]
Epoch 4/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.71it/s]


Epoch 4/6 done!
Average training loss: 0.3522
Validation accuracy: 0.8556
✓ New best model (val_acc: 0.8556)


Epoch 5/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.99it/s, Loss=0.4604]
Epoch 5/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.75it/s]


Epoch 5/6 done!
Average training loss: 0.3247
Validation accuracy: 0.8624
✓ New best model (val_acc: 0.8624)


Epoch 6/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.00it/s, Loss=0.1450]
Epoch 6/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.73it/s]


Epoch 6/6 done!
Average training loss: 0.2916
Validation accuracy: 0.8611
⟳ No improvement for 1/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.75it/s]



Starting training with LR: 5e-05, Epochs: 9


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.5402]
Epoch 1/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.09it/s]


Epoch 1/9 done!
Average training loss: 0.7287
Validation accuracy: 0.7634
✓ New best model (val_acc: 0.7634)


Epoch 2/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.2656]
Epoch 2/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.14it/s]


Epoch 2/9 done!
Average training loss: 0.4781
Validation accuracy: 0.8226
✓ New best model (val_acc: 0.8226)


Epoch 3/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.4756]
Epoch 3/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.06it/s]


Epoch 3/9 done!
Average training loss: 0.3939
Validation accuracy: 0.8336
✓ New best model (val_acc: 0.8336)


Epoch 4/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.99it/s, Loss=0.6799]
Epoch 4/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.14it/s]


Epoch 4/9 done!
Average training loss: 0.3463
Validation accuracy: 0.8514
✓ New best model (val_acc: 0.8514)


Epoch 5/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.5008]
Epoch 5/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.92it/s]


Epoch 5/9 done!
Average training loss: 0.3116
Validation accuracy: 0.8597
✓ New best model (val_acc: 0.8597)


Epoch 6/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.5669]
Epoch 6/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.14it/s]


Epoch 6/9 done!
Average training loss: 0.2811
Validation accuracy: 0.8597
⟳ No improvement for 1/3 epochs


Epoch 7/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.2756]
Epoch 7/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.18it/s]


Epoch 7/9 done!
Average training loss: 0.2789
Validation accuracy: 0.8638
✓ New best model (val_acc: 0.8638)


Epoch 8/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.2020]
Epoch 8/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.08it/s]


Epoch 8/9 done!
Average training loss: 0.2483
Validation accuracy: 0.8611
⟳ No improvement for 1/3 epochs


Epoch 9/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.0904]
Epoch 9/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.92it/s]


Epoch 9/9 done!
Average training loss: 0.2193
Validation accuracy: 0.8528
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.12it/s]



Starting training with LR: 5e-05, Epochs: 11


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.6528]
Epoch 1/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.98it/s]


Epoch 1/11 done!
Average training loss: 0.7459
Validation accuracy: 0.7607
✓ New best model (val_acc: 0.7607)


Epoch 2/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.3439]
Epoch 2/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.10it/s]


Epoch 2/11 done!
Average training loss: 0.4930
Validation accuracy: 0.8212
✓ New best model (val_acc: 0.8212)


Epoch 3/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.5272]
Epoch 3/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.28it/s]


Epoch 3/11 done!
Average training loss: 0.3910
Validation accuracy: 0.8336
✓ New best model (val_acc: 0.8336)


Epoch 4/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.99it/s, Loss=0.2230]
Epoch 4/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.19it/s]


Epoch 4/11 done!
Average training loss: 0.3530
Validation accuracy: 0.8487
✓ New best model (val_acc: 0.8487)


Epoch 5/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.4266]
Epoch 5/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.14it/s]


Epoch 5/11 done!
Average training loss: 0.3148
Validation accuracy: 0.8542
✓ New best model (val_acc: 0.8542)


Epoch 6/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.1390]
Epoch 6/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.23it/s]


Epoch 6/11 done!
Average training loss: 0.2944
Validation accuracy: 0.8624
✓ New best model (val_acc: 0.8624)


Epoch 7/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.0542]
Epoch 7/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.23it/s]


Epoch 7/11 done!
Average training loss: 0.2652
Validation accuracy: 0.8624
⟳ No improvement for 1/3 epochs


Epoch 8/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.1091]
Epoch 8/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.29it/s]


Epoch 8/11 done!
Average training loss: 0.2562
Validation accuracy: 0.8652
✓ New best model (val_acc: 0.8652)


Epoch 9/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.0613]
Epoch 9/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.28it/s]


Epoch 9/11 done!
Average training loss: 0.2312
Validation accuracy: 0.8611
⟳ No improvement for 1/3 epochs


Epoch 10/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.0313]
Epoch 10/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.21it/s]


Epoch 10/11 done!
Average training loss: 0.2200
Validation accuracy: 0.8597
⟳ No improvement for 2/3 epochs


Epoch 11/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.4807]
Epoch 11/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.21it/s]


Epoch 11/11 done!
Average training loss: 0.2029
Validation accuracy: 0.8583
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 11


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.20it/s]



Starting training with LR: 0.0001, Epochs: 6


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.5652]
Epoch 1/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.99it/s]


Epoch 1/6 done!
Average training loss: 0.6437
Validation accuracy: 0.8129
✓ New best model (val_acc: 0.8129)


Epoch 2/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.0870]
Epoch 2/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.92it/s]


Epoch 2/6 done!
Average training loss: 0.4050
Validation accuracy: 0.8322
✓ New best model (val_acc: 0.8322)


Epoch 3/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.5256]
Epoch 3/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.03it/s]


Epoch 3/6 done!
Average training loss: 0.3257
Validation accuracy: 0.8611
✓ New best model (val_acc: 0.8611)


Epoch 4/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.3001]
Epoch 4/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.92it/s]


Epoch 4/6 done!
Average training loss: 0.2929
Validation accuracy: 0.8624
✓ New best model (val_acc: 0.8624)


Epoch 5/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.3609]
Epoch 5/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.95it/s]


Epoch 5/6 done!
Average training loss: 0.2498
Validation accuracy: 0.8597
⟳ No improvement for 1/3 epochs


Epoch 6/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.3867]
Epoch 6/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.07it/s]


Epoch 6/6 done!
Average training loss: 0.2205
Validation accuracy: 0.8569
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.17it/s]



Starting training with LR: 0.0001, Epochs: 9


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.3039]
Epoch 1/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.55it/s]


Epoch 1/9 done!
Average training loss: 0.6365
Validation accuracy: 0.7950
✓ New best model (val_acc: 0.7950)


Epoch 2/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.3521]
Epoch 2/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.68it/s]


Epoch 2/9 done!
Average training loss: 0.4030
Validation accuracy: 0.8556
✓ New best model (val_acc: 0.8556)


Epoch 3/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.3345]
Epoch 3/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.67it/s]


Epoch 3/9 done!
Average training loss: 0.3382
Validation accuracy: 0.8501
⟳ No improvement for 1/3 epochs


Epoch 4/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.2569]
Epoch 4/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.60it/s]


Epoch 4/9 done!
Average training loss: 0.2913
Validation accuracy: 0.8501
⟳ No improvement for 2/3 epochs


Epoch 5/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.3068]
Epoch 5/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.66it/s]


Epoch 5/9 done!
Average training loss: 0.2487
Validation accuracy: 0.8542
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 5


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.70it/s]


✓ New best overall model (accuracy: 0.8583)

Starting training with LR: 0.0001, Epochs: 11


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.95it/s, Loss=0.2199]
Epoch 1/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.02it/s]


Epoch 1/11 done!
Average training loss: 0.6482
Validation accuracy: 0.8061
✓ New best model (val_acc: 0.8061)


Epoch 2/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.3363]
Epoch 2/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.17it/s]


Epoch 2/11 done!
Average training loss: 0.4054
Validation accuracy: 0.8528
✓ New best model (val_acc: 0.8528)


Epoch 3/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.1854]
Epoch 3/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.23it/s]


Epoch 3/11 done!
Average training loss: 0.3349
Validation accuracy: 0.8556
✓ New best model (val_acc: 0.8556)


Epoch 4/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.3844]
Epoch 4/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.13it/s]


Epoch 4/11 done!
Average training loss: 0.2900
Validation accuracy: 0.8666
✓ New best model (val_acc: 0.8666)


Epoch 5/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.8672]
Epoch 5/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.16it/s]


Epoch 5/11 done!
Average training loss: 0.2570
Validation accuracy: 0.8569
⟳ No improvement for 1/3 epochs


Epoch 6/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.98it/s, Loss=0.1261]
Epoch 6/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.04it/s]


Epoch 6/11 done!
Average training loss: 0.2179
Validation accuracy: 0.8666
⟳ No improvement for 2/3 epochs


Epoch 7/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.97it/s, Loss=0.3997]
Epoch 7/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.20it/s]


Epoch 7/11 done!
Average training loss: 0.1868
Validation accuracy: 0.8624
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 7


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.15it/s]



Starting training with LR: 0.001, Epochs: 6


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.93it/s, Loss=0.2433]
Epoch 1/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.97it/s]


Epoch 1/6 done!
Average training loss: 0.5379
Validation accuracy: 0.8459
✓ New best model (val_acc: 0.8459)


Epoch 2/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.90it/s, Loss=0.5600]
Epoch 2/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.99it/s]


Epoch 2/6 done!
Average training loss: 0.3604
Validation accuracy: 0.8473
✓ New best model (val_acc: 0.8473)


Epoch 3/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.93it/s, Loss=0.0217]
Epoch 3/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.92it/s]


Epoch 3/6 done!
Average training loss: 0.2637
Validation accuracy: 0.8528
✓ New best model (val_acc: 0.8528)


Epoch 4/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.0613]
Epoch 4/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.32it/s]


Epoch 4/6 done!
Average training loss: 0.1992
Validation accuracy: 0.8583
✓ New best model (val_acc: 0.8583)


Epoch 5/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.0257]
Epoch 5/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.30it/s]


Epoch 5/6 done!
Average training loss: 0.1658
Validation accuracy: 0.8226
⟳ No improvement for 1/3 epochs


Epoch 6/6 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.92it/s, Loss=0.0179]
Epoch 6/6 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.29it/s]


Epoch 6/6 done!
Average training loss: 0.1184
Validation accuracy: 0.8404
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.20it/s]



Starting training with LR: 0.001, Epochs: 9


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.92it/s, Loss=0.1497]
Epoch 1/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.89it/s]


Epoch 1/9 done!
Average training loss: 0.5230
Validation accuracy: 0.8459
✓ New best model (val_acc: 0.8459)


Epoch 2/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.92it/s, Loss=0.4016]
Epoch 2/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.92it/s]


Epoch 2/9 done!
Average training loss: 0.3554
Validation accuracy: 0.8377
⟳ No improvement for 1/3 epochs


Epoch 3/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.91it/s, Loss=0.0705]
Epoch 3/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.94it/s]


Epoch 3/9 done!
Average training loss: 0.2714
Validation accuracy: 0.8432
⟳ No improvement for 2/3 epochs


Epoch 4/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.93it/s, Loss=0.2595]
Epoch 4/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.11it/s]


Epoch 4/9 done!
Average training loss: 0.1935
Validation accuracy: 0.8528
✓ New best model (val_acc: 0.8528)


Epoch 5/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.92it/s, Loss=0.0156]
Epoch 5/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.91it/s]


Epoch 5/9 done!
Average training loss: 0.1494
Validation accuracy: 0.8556
✓ New best model (val_acc: 0.8556)


Epoch 6/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.5622]
Epoch 6/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.92it/s]


Epoch 6/9 done!
Average training loss: 0.1150
Validation accuracy: 0.8349
⟳ No improvement for 1/3 epochs


Epoch 7/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.93it/s, Loss=0.1828]
Epoch 7/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.90it/s]


Epoch 7/9 done!
Average training loss: 0.1110
Validation accuracy: 0.8391
⟳ No improvement for 2/3 epochs


Epoch 8/9 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.93it/s, Loss=0.0817]
Epoch 8/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.96it/s]


Epoch 8/9 done!
Average training loss: 0.0830
Validation accuracy: 0.8391
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 8


Testing: 100%|██████████| 46/46 [00:02<00:00, 17.06it/s]



Starting training with LR: 0.001, Epochs: 11


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.3628]
Epoch 1/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.23it/s]


Epoch 1/11 done!
Average training loss: 0.5466
Validation accuracy: 0.8514
✓ New best model (val_acc: 0.8514)


Epoch 2/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.92it/s, Loss=0.2433]
Epoch 2/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.25it/s]


Epoch 2/11 done!
Average training loss: 0.3608
Validation accuracy: 0.8363
⟳ No improvement for 1/3 epochs


Epoch 3/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.91it/s, Loss=0.0363]
Epoch 3/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.28it/s]


Epoch 3/11 done!
Average training loss: 0.2737
Validation accuracy: 0.8281
⟳ No improvement for 2/3 epochs


Epoch 4/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.0879]
Epoch 4/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.51it/s]


Epoch 4/11 done!
Average training loss: 0.2039
Validation accuracy: 0.8542
✓ New best model (val_acc: 0.8542)


Epoch 5/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.94it/s, Loss=0.2438]
Epoch 5/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.32it/s]


Epoch 5/11 done!
Average training loss: 0.1368
Validation accuracy: 0.8597
✓ New best model (val_acc: 0.8597)


Epoch 6/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.93it/s, Loss=0.1537]
Epoch 6/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.31it/s]


Epoch 6/11 done!
Average training loss: 0.1199
Validation accuracy: 0.8432
⟳ No improvement for 1/3 epochs


Epoch 7/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.93it/s, Loss=0.1264]
Epoch 7/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.35it/s]


Epoch 7/11 done!
Average training loss: 0.1172
Validation accuracy: 0.8432
⟳ No improvement for 2/3 epochs


Epoch 8/11 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.93it/s, Loss=0.5308]
Epoch 8/11 [Val]: 100%|██████████| 46/46 [00:02<00:00, 17.38it/s]


Epoch 8/11 done!
Average training loss: 0.0946
Validation accuracy: 0.8363
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 8


Testing: 100%|██████████| 46/46 [00:02<00:00, 16.92it/s]



Best Hyperparameters:
Learning Rate: 0.0001
Epochs: 9
Batch Size: 16
Validation Accuracy: 0.8556
Test Accuracy: 0.8583

All results saved in the 'bert_grid_search_20250503_153905' directory.
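The `✓ New best model`, `⟳ No improvement for n/3 epochs` and `⏹ Early stopping` messages above look like patience-based early stopping with a patience of 3. Improvement appears to be strict: in the LR 0.0001 / 11-epoch run, epoch 6 ties the best accuracy (0.8666) but is still counted as "no improvement". The notebook's training loop is not shown in this section, so the code below is only a minimal sketch built on that assumption. `EarlyStopper` is a hypothetical helper, not a function from the notebook.

```python
class EarlyStopper:
    """Patience-based early stopping (hypothetical helper, not the notebook's code).

    mode="max" for metrics such as validation accuracy, mode="min" for validation MSE.
    A new value counts as an improvement only if it is strictly better than the best so far.
    """

    def __init__(self, patience=3, mode="max"):
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.patience = patience
        self.mode = mode
        self.best = None
        self.best_epoch = None
        self.bad_epochs = 0

    def _is_better(self, value):
        if self.best is None:
            return True
        return value > self.best if self.mode == "max" else value < self.best

    def step(self, epoch, value):
        """Record one epoch's validation metric; return True if training should stop."""
        if self._is_better(value):
            self.best = value
            self.best_epoch = epoch
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Validation accuracies logged above for the LR 0.0001 / 11-epoch adapter run.
val_acc = [0.8061, 0.8528, 0.8556, 0.8666, 0.8569, 0.8666, 0.8624]
stopper = EarlyStopper(patience=3, mode="max")
stopped_at = None
for epoch, acc in enumerate(val_acc, start=1):
    if stopper.step(epoch, acc):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_epoch, stopper.best)
```

With `mode="min"`, the same logic reproduces the FIQA regression logs below. One example is the LR 0.001 / 5-epoch run, where four identical validation MSEs of 1.0584 trigger a stop after epoch 4.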


### Bert on FIQA

In [None]:
# Parameters for hyperparameter_grid_search: full fine-tuning (no LoRA, no adapters)
model = 'bert-base-uncased'
task = 'FIQA'
dataset = dataset_FIQA
lr = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
epochs = [3, 4, 5]
lora = False
adapter = False

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda


Dataset sizes - Train: 821, Val: 176, Test: 176

Starting training with LR: 1e-06, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:09<00:00,  5.66it/s]


Epoch 1: Train Loss = 0.1973, Val MSE = 0.1847, Val MAE = 0.3620


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.20it/s]


Epoch 2: Train Loss = 0.1696, Val MSE = 0.1673, Val MAE = 0.3554


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.20it/s]


Epoch 3: Train Loss = 0.1653, Val MSE = 0.1646, Val MAE = 0.3469
✓ New best overall model (MSE: 0.1646)

Starting training with LR: 1e-06, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.01it/s]


Epoch 1: Train Loss = 0.2184, Val MSE = 0.1804, Val MAE = 0.3603


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.18it/s]


Epoch 2: Train Loss = 0.1778, Val MSE = 0.1726, Val MAE = 0.3461


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.16it/s]


Epoch 3: Train Loss = 0.1677, Val MSE = 0.1603, Val MAE = 0.3347


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.14it/s]


Epoch 4: Train Loss = 0.1660, Val MSE = 0.1534, Val MAE = 0.3246
✓ New best overall model (MSE: 0.1534)

Starting training with LR: 1e-06, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.13it/s]


Epoch 1: Train Loss = 0.1924, Val MSE = 0.1716, Val MAE = 0.3601


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.12it/s]


Epoch 2: Train Loss = 0.1607, Val MSE = 0.1701, Val MAE = 0.3497


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.10it/s]


Epoch 3: Train Loss = 0.1547, Val MSE = 0.1564, Val MAE = 0.3411


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.10it/s]


Epoch 4: Train Loss = 0.1511, Val MSE = 0.1522, Val MAE = 0.3282


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.09it/s]


Epoch 5: Train Loss = 0.1415, Val MSE = 0.1428, Val MAE = 0.3190
✓ New best overall model (MSE: 0.1428)

Starting training with LR: 1e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.08it/s]


Epoch 1: Train Loss = 0.1539, Val MSE = 0.1748, Val MAE = 0.3216


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.05it/s]


Epoch 2: Train Loss = 0.1086, Val MSE = 0.1134, Val MAE = 0.2521


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.04it/s]


Epoch 3: Train Loss = 0.0606, Val MSE = 0.0833, Val MAE = 0.2156
✓ New best overall model (MSE: 0.0833)

Starting training with LR: 1e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.04it/s]


Epoch 1: Train Loss = 0.1607, Val MSE = 0.1416, Val MAE = 0.3235


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.04it/s]


Epoch 2: Train Loss = 0.1070, Val MSE = 0.0962, Val MAE = 0.2410


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.02it/s]


Epoch 3: Train Loss = 0.0647, Val MSE = 0.0927, Val MAE = 0.2277


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.01it/s]


Epoch 4: Train Loss = 0.0403, Val MSE = 0.0950, Val MAE = 0.2287
  ↪ No improvement for 1/3 epochs

Starting training with LR: 1e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.00it/s]


Epoch 1: Train Loss = 0.1568, Val MSE = 0.1196, Val MAE = 0.2771


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.99it/s]


Epoch 2: Train Loss = 0.0974, Val MSE = 0.1043, Val MAE = 0.2451


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.97it/s]


Epoch 3: Train Loss = 0.0522, Val MSE = 0.0874, Val MAE = 0.2138


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.96it/s]


Epoch 4: Train Loss = 0.0388, Val MSE = 0.0813, Val MAE = 0.2057


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 5: Train Loss = 0.0256, Val MSE = 0.0849, Val MAE = 0.2033
  ↪ No improvement for 1/3 epochs
✓ New best overall model (MSE: 0.0813)

Starting training with LR: 2e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 1: Train Loss = 0.1475, Val MSE = 0.0931, Val MAE = 0.2467


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 2: Train Loss = 0.0679, Val MSE = 0.1094, Val MAE = 0.2399
  ↪ No improvement for 1/3 epochs


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 3: Train Loss = 0.0390, Val MSE = 0.0824, Val MAE = 0.2090

Starting training with LR: 2e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.97it/s]


Epoch 1: Train Loss = 0.1642, Val MSE = 0.1203, Val MAE = 0.2696


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 2: Train Loss = 0.0788, Val MSE = 0.0895, Val MAE = 0.2145


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 3: Train Loss = 0.0364, Val MSE = 0.0885, Val MAE = 0.2165


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 4: Train Loss = 0.0237, Val MSE = 0.0875, Val MAE = 0.2066

Starting training with LR: 2e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 1: Train Loss = 0.1643, Val MSE = 0.1095, Val MAE = 0.2679


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 2: Train Loss = 0.0836, Val MSE = 0.1109, Val MAE = 0.2449
  ↪ No improvement for 1/3 epochs


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 3: Train Loss = 0.0442, Val MSE = 0.0742, Val MAE = 0.1977


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 4: Train Loss = 0.0250, Val MSE = 0.0784, Val MAE = 0.2039
  ↪ No improvement for 1/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 5: Train Loss = 0.0171, Val MSE = 0.0909, Val MAE = 0.2158
  ↪ No improvement for 2/3 epochs
✓ New best overall model (MSE: 0.0742)

Starting training with LR: 3e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 1: Train Loss = 0.1413, Val MSE = 0.1008, Val MAE = 0.2380


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 2: Train Loss = 0.0645, Val MSE = 0.0778, Val MAE = 0.2020


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 3: Train Loss = 0.0391, Val MSE = 0.0854, Val MAE = 0.1972
  ↪ No improvement for 1/3 epochs

Starting training with LR: 3e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 1: Train Loss = 0.1580, Val MSE = 0.1021, Val MAE = 0.2517


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 2: Train Loss = 0.0785, Val MSE = 0.0721, Val MAE = 0.2020


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 3: Train Loss = 0.0386, Val MSE = 0.0698, Val MAE = 0.1944


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 4: Train Loss = 0.0231, Val MSE = 0.0664, Val MAE = 0.1896
✓ New best overall model (MSE: 0.0664)

Starting training with LR: 3e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 1: Train Loss = 0.1474, Val MSE = 0.1231, Val MAE = 0.2622


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 2: Train Loss = 0.0702, Val MSE = 0.0819, Val MAE = 0.2068


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 3: Train Loss = 0.0364, Val MSE = 0.0789, Val MAE = 0.2020


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 4: Train Loss = 0.0203, Val MSE = 0.0708, Val MAE = 0.1951


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 5: Train Loss = 0.0151, Val MSE = 0.0700, Val MAE = 0.1849

Starting training with LR: 5e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 1: Train Loss = 0.1735, Val MSE = 0.0964, Val MAE = 0.2281


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 2: Train Loss = 0.0571, Val MSE = 0.0915, Val MAE = 0.2202


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 3: Train Loss = 0.0309, Val MSE = 0.0690, Val MAE = 0.1944

Starting training with LR: 5e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 1: Train Loss = 0.1516, Val MSE = 0.1087, Val MAE = 0.2686


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 2: Train Loss = 0.0767, Val MSE = 0.0785, Val MAE = 0.1935


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 3: Train Loss = 0.0348, Val MSE = 0.0746, Val MAE = 0.1932


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 4: Train Loss = 0.0225, Val MSE = 0.0847, Val MAE = 0.2068
  ↪ No improvement for 1/3 epochs

Starting training with LR: 5e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 1: Train Loss = 0.1636, Val MSE = 0.1427, Val MAE = 0.2750


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 2: Train Loss = 0.0653, Val MSE = 0.0763, Val MAE = 0.2190


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 3: Train Loss = 0.0322, Val MSE = 0.0739, Val MAE = 0.2089


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 4: Train Loss = 0.0190, Val MSE = 0.0764, Val MAE = 0.1972
  ↪ No improvement for 1/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 5: Train Loss = 0.0119, Val MSE = 0.0647, Val MAE = 0.1822
✓ New best overall model (MSE: 0.0647)

Starting training with LR: 0.0001, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 1: Train Loss = 0.1850, Val MSE = 0.1612, Val MAE = 0.3536


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 2: Train Loss = 0.1299, Val MSE = 0.0967, Val MAE = 0.2254


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 3: Train Loss = 0.0679, Val MSE = 0.0782, Val MAE = 0.2019

Starting training with LR: 0.0001, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.96it/s]


Epoch 1: Train Loss = 0.2142, Val MSE = 0.1753, Val MAE = 0.3535


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 2: Train Loss = 0.1677, Val MSE = 0.1391, Val MAE = 0.3272


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 3: Train Loss = 0.1024, Val MSE = 0.0803, Val MAE = 0.2010


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 4: Train Loss = 0.0625, Val MSE = 0.0898, Val MAE = 0.2208
  ↪ No improvement for 1/3 epochs

Starting training with LR: 0.0001, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 1: Train Loss = 0.1618, Val MSE = 0.1174, Val MAE = 0.2445


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 2: Train Loss = 0.0774, Val MSE = 0.0972, Val MAE = 0.2418


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 3: Train Loss = 0.0405, Val MSE = 0.0720, Val MAE = 0.1947


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 4: Train Loss = 0.0302, Val MSE = 0.0749, Val MAE = 0.1945
  ↪ No improvement for 1/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.93it/s]


Epoch 5: Train Loss = 0.0220, Val MSE = 0.0573, Val MAE = 0.1781
✓ New best overall model (MSE: 0.0573)

Starting training with LR: 0.001, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.96it/s]


Epoch 1: Train Loss = 1.0790, Val MSE = 1.2773, Val MAE = 1.0547


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 2: Train Loss = 1.2684, Val MSE = 1.0584, Val MAE = 0.9453


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 3: Train Loss = 0.9391, Val MSE = 1.0583, Val MAE = 0.9453

Starting training with LR: 0.001, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.96it/s]


Epoch 1: Train Loss = 0.9099, Val MSE = 1.0581, Val MAE = 0.9451


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.96it/s]


Epoch 2: Train Loss = 1.2336, Val MSE = 1.2357, Val MAE = 1.0348
  ↪ No improvement for 1/3 epochs


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.96it/s]


Epoch 3: Train Loss = 0.3529, Val MSE = 0.2032, Val MAE = 0.3597


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.96it/s]


Epoch 4: Train Loss = 0.2025, Val MSE = 0.1690, Val MAE = 0.3570

Starting training with LR: 0.001, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 1: Train Loss = 1.0622, Val MSE = 1.0584, Val MAE = 0.9453


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.94it/s]


Epoch 2: Train Loss = 0.9372, Val MSE = 1.0584, Val MAE = 0.9453
  ↪ No improvement for 1/3 epochs


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 3: Train Loss = 0.9399, Val MSE = 1.0584, Val MAE = 0.9453
  ↪ No improvement for 2/3 epochs


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 4: Train Loss = 0.9299, Val MSE = 1.0584, Val MAE = 0.9453
  ↪ No improvement for 3/3 epochs
Stopping early after 4 epochs.

Best Hyperparameters:
Learning Rate: 0.0001
Epochs: 5
Batch Size: 16
Validation MSE: 0.0573
Test MSE: 0.0773

All results saved in the 'bert_grid_search_20250503_142242' directory.
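For the FIQA sentiment-score task, the search reports validation MSE and MAE and keeps the configuration with the lowest validation MSE. Neither the metric code nor the grid loop of `hyperparameter_grid_search` appears in this section. The code below is a minimal pure-Python sketch of both computations. `mse`, `mae` and `select_best` are hypothetical helper names. The example dictionary holds the per-run best validation MSEs logged above for three of the configurations.

```python
from itertools import product


def mse(preds, targets):
    """Mean squared error between two equal-length sequences of floats."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def mae(preds, targets):
    """Mean absolute error between two equal-length sequences of floats."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


def select_best(val_mse_by_config):
    """Return the (lr, epochs) key with the lowest validation MSE."""
    return min(val_mse_by_config, key=val_mse_by_config.get)


# The full fine-tuning grid used above: 7 learning rates x 3 epoch budgets = 21 runs.
lrs = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
epoch_options = [3, 4, 5]
grid = list(product(lrs, epoch_options))

# Best validation MSE per run, copied from three of the runs logged above.
logged = {(3e-5, 4): 0.0664, (5e-5, 5): 0.0647, (1e-4, 5): 0.0573}
print(len(grid), select_best(logged))
```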


### Bert + LoRA on FIQA
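This section does not show the LoRA configuration (rank, target modules) used inside `hyperparameter_grid_search`. The arithmetic below assumes a common setup for `bert-base-uncased`: rank r = 8, applied to the query and value projections of all 12 encoder layers, with hidden size 768. LoRA freezes each d x k weight and trains a low-rank update B·A, where B is d x r and A is r x k. That adds r(d + k) trainable parameters per adapted matrix. The small trainable fraction is consistent with the higher throughput in the LoRA logs below (about 9 it/s, versus about 6 it/s for full fine-tuning on the same 52 FIQA batches). This is a sketch under the stated assumptions, not the notebook's configuration.

```python
def lora_params_per_matrix(d_out, d_in, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


# Assumed configuration (not taken from the notebook): r=8 on query and value projections.
hidden = 768
num_layers = 12
rank = 8
targets_per_layer = 2  # query and value projections

lora_total = num_layers * targets_per_layer * lora_params_per_matrix(hidden, hidden, rank)
frozen_total = num_layers * targets_per_layer * hidden * hidden  # the adapted (frozen) weights
ratio = lora_total / frozen_total
print(lora_total, ratio)
```

These counts exclude the regression head, which is typically trained alongside the LoRA matrices.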

In [None]:
# Parameters for hyperparameter_grid_search: LoRA fine-tuning (no adapters)
model = 'bert-base-uncased'
lr = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
epochs = [3, 4, 5]
task = 'FIQA'
dataset = dataset_FIQA
lora = True
adapter = False

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda


Dataset sizes - Train: 821, Val: 176, Test: 176

Starting training with LR: 1e-06, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.37it/s]


Epoch 1: Train Loss = 0.3147, Val MSE = 0.2841, Val MAE = 0.4452


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.38it/s]


Epoch 2: Train Loss = 0.3126, Val MSE = 0.2730, Val MAE = 0.4376


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.35it/s]


Epoch 3: Train Loss = 0.2957, Val MSE = 0.2622, Val MAE = 0.4302
✓ New best overall model (MSE: 0.2622)

Starting training with LR: 1e-06, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.29it/s]


Epoch 1: Train Loss = 0.4482, Val MSE = 0.4027, Val MAE = 0.5245


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.28it/s]


Epoch 2: Train Loss = 0.4371, Val MSE = 0.3873, Val MAE = 0.5142


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.21it/s]


Epoch 3: Train Loss = 0.4222, Val MSE = 0.3715, Val MAE = 0.5037


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.18it/s]


Epoch 4: Train Loss = 0.4080, Val MSE = 0.3555, Val MAE = 0.4928

Starting training with LR: 1e-06, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.16it/s]


Epoch 1: Train Loss = 0.6123, Val MSE = 0.5247, Val MAE = 0.6122


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.08it/s]


Epoch 2: Train Loss = 0.6065, Val MSE = 0.5070, Val MAE = 0.5992


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.02it/s]


Epoch 3: Train Loss = 0.5891, Val MSE = 0.4882, Val MAE = 0.5852


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.97it/s]


Epoch 4: Train Loss = 0.5607, Val MSE = 0.4687, Val MAE = 0.5703


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.89it/s]


Epoch 5: Train Loss = 0.5454, Val MSE = 0.4481, Val MAE = 0.5547

Starting training with LR: 1e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.91it/s]


Epoch 1: Train Loss = 0.3453, Val MSE = 0.2498, Val MAE = 0.4325


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.88it/s]


Epoch 2: Train Loss = 0.2270, Val MSE = 0.1869, Val MAE = 0.3831


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.91it/s]


Epoch 3: Train Loss = 0.1912, Val MSE = 0.1846, Val MAE = 0.3735
✓ New best overall model (MSE: 0.1846)

Starting training with LR: 1e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.96it/s]


Epoch 1: Train Loss = 0.2233, Val MSE = 0.1757, Val MAE = 0.3689


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.99it/s]


Epoch 2: Train Loss = 0.1940, Val MSE = 0.1772, Val MAE = 0.3647
  ↪ No improvement for 1/3 epochs


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.02it/s]


Epoch 3: Train Loss = 0.1869, Val MSE = 0.1764, Val MAE = 0.3615
  ↪ No improvement for 2/3 epochs


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.02it/s]


Epoch 4: Train Loss = 0.1826, Val MSE = 0.1743, Val MAE = 0.3588
✓ New best overall model (MSE: 0.1743)

Starting training with LR: 1e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.08it/s]


Epoch 1: Train Loss = 0.3853, Val MSE = 0.2704, Val MAE = 0.4298


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.02it/s]


Epoch 2: Train Loss = 0.2534, Val MSE = 0.1953, Val MAE = 0.3725


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.04it/s]


Epoch 3: Train Loss = 0.1993, Val MSE = 0.1888, Val MAE = 0.3654


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.06it/s]


Epoch 4: Train Loss = 0.1820, Val MSE = 0.1844, Val MAE = 0.3615


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.02it/s]


Epoch 5: Train Loss = 0.1910, Val MSE = 0.1812, Val MAE = 0.3583

Starting training with LR: 2e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.04it/s]


Epoch 1: Train Loss = 0.1615, Val MSE = 0.1833, Val MAE = 0.3538


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 2: Train Loss = 0.1651, Val MSE = 0.1783, Val MAE = 0.3498


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.99it/s]


Epoch 3: Train Loss = 0.1609, Val MSE = 0.1748, Val MAE = 0.3442

Starting training with LR: 2e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 1: Train Loss = 0.1902, Val MSE = 0.1936, Val MAE = 0.3758


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 2: Train Loss = 0.1745, Val MSE = 0.1890, Val MAE = 0.3700


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.99it/s]


Epoch 3: Train Loss = 0.1714, Val MSE = 0.1864, Val MAE = 0.3647


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 4: Train Loss = 0.1691, Val MSE = 0.1783, Val MAE = 0.3613

Starting training with LR: 2e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 1: Train Loss = 0.1937, Val MSE = 0.1953, Val MAE = 0.3771


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.99it/s]


Epoch 2: Train Loss = 0.1747, Val MSE = 0.1845, Val MAE = 0.3679


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 3: Train Loss = 0.1709, Val MSE = 0.1774, Val MAE = 0.3634


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.01it/s]


Epoch 4: Train Loss = 0.1645, Val MSE = 0.1731, Val MAE = 0.3608


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.01it/s]


Epoch 5: Train Loss = 0.1605, Val MSE = 0.1709, Val MAE = 0.3574
✓ New best overall model (MSE: 0.1709)

Starting training with LR: 3e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.03it/s]


Epoch 1: Train Loss = 0.1936, Val MSE = 0.1739, Val MAE = 0.3557


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.01it/s]


Epoch 2: Train Loss = 0.1642, Val MSE = 0.1627, Val MAE = 0.3456


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.01it/s]


Epoch 3: Train Loss = 0.1595, Val MSE = 0.1594, Val MAE = 0.3376
✓ New best overall model (MSE: 0.1594)

Starting training with LR: 3e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.01it/s]


Epoch 1: Train Loss = 0.2084, Val MSE = 0.1832, Val MAE = 0.3595


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.99it/s]


Epoch 2: Train Loss = 0.1630, Val MSE = 0.1727, Val MAE = 0.3536


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.98it/s]


Epoch 3: Train Loss = 0.1667, Val MSE = 0.1657, Val MAE = 0.3471


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 4: Train Loss = 0.1588, Val MSE = 0.1612, Val MAE = 0.3408

Starting training with LR: 3e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.98it/s]


Epoch 1: Train Loss = 0.1861, Val MSE = 0.1864, Val MAE = 0.3687


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.94it/s]


Epoch 2: Train Loss = 0.1727, Val MSE = 0.1646, Val MAE = 0.3508


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.94it/s]


Epoch 3: Train Loss = 0.1639, Val MSE = 0.1607, Val MAE = 0.3443


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.96it/s]


Epoch 4: Train Loss = 0.1567, Val MSE = 0.1588, Val MAE = 0.3373


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.98it/s]


Epoch 5: Train Loss = 0.1475, Val MSE = 0.1515, Val MAE = 0.3288
✓ New best overall model (MSE: 0.1515)

Starting training with LR: 5e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.02it/s]


Epoch 1: Train Loss = 0.2283, Val MSE = 0.1932, Val MAE = 0.3797


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.01it/s]


Epoch 2: Train Loss = 0.1897, Val MSE = 0.1827, Val MAE = 0.3700


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 3: Train Loss = 0.1742, Val MSE = 0.1749, Val MAE = 0.3557

Starting training with LR: 5e-05, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.01it/s]


Epoch 1: Train Loss = 0.2204, Val MSE = 0.1813, Val MAE = 0.3569


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.97it/s]


Epoch 2: Train Loss = 0.1721, Val MSE = 0.1747, Val MAE = 0.3463


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.97it/s]


Epoch 3: Train Loss = 0.1621, Val MSE = 0.1591, Val MAE = 0.3366


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.97it/s]


Epoch 4: Train Loss = 0.1516, Val MSE = 0.1499, Val MAE = 0.3214
✓ New best overall model (MSE: 0.1499)

Starting training with LR: 5e-05, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.02it/s]


Epoch 1: Train Loss = 0.1957, Val MSE = 0.1764, Val MAE = 0.3662


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.98it/s]


Epoch 2: Train Loss = 0.1756, Val MSE = 0.1783, Val MAE = 0.3563
  ↪ No improvement for 1/3 epochs


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 3: Train Loss = 0.1713, Val MSE = 0.1610, Val MAE = 0.3464


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.99it/s]


Epoch 4: Train Loss = 0.1542, Val MSE = 0.1542, Val MAE = 0.3363


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.01it/s]


Epoch 5: Train Loss = 0.1489, Val MSE = 0.1396, Val MAE = 0.3136
✓ New best overall model (MSE: 0.1396)

Starting training with LR: 0.0001, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.01it/s]


Epoch 1: Train Loss = 0.1807, Val MSE = 0.1678, Val MAE = 0.3465


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.01it/s]


Epoch 2: Train Loss = 0.1633, Val MSE = 0.1561, Val MAE = 0.3238


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 3: Train Loss = 0.1372, Val MSE = 0.1329, Val MAE = 0.2910
✓ New best overall model (MSE: 0.1329)

Starting training with LR: 0.0001, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.98it/s]


Epoch 1: Train Loss = 0.1764, Val MSE = 0.1627, Val MAE = 0.3553


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.97it/s]


Epoch 2: Train Loss = 0.1514, Val MSE = 0.1457, Val MAE = 0.3327


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.97it/s]


Epoch 3: Train Loss = 0.1306, Val MSE = 0.1223, Val MAE = 0.2761


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.97it/s]


Epoch 4: Train Loss = 0.1076, Val MSE = 0.1152, Val MAE = 0.2605
✓ New best overall model (MSE: 0.1152)

Starting training with LR: 0.0001, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.98it/s]


Epoch 1: Train Loss = 0.1936, Val MSE = 0.1649, Val MAE = 0.3448


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.95it/s]


Epoch 2: Train Loss = 0.1625, Val MSE = 0.1513, Val MAE = 0.3241


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.94it/s]


Epoch 3: Train Loss = 0.1428, Val MSE = 0.1345, Val MAE = 0.3157


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.96it/s]


Epoch 4: Train Loss = 0.1181, Val MSE = 0.1068, Val MAE = 0.2642


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.97it/s]


Epoch 5: Train Loss = 0.0979, Val MSE = 0.0943, Val MAE = 0.2399
✓ New best overall model (MSE: 0.0943)

Starting training with LR: 0.001, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 1: Train Loss = 0.1680, Val MSE = 0.1195, Val MAE = 0.2736


Epoch 2/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.99it/s]


Epoch 2: Train Loss = 0.0976, Val MSE = 0.1389, Val MAE = 0.2750
  ↪ No improvement for 1/3 epochs


Epoch 3/3 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.97it/s]


Epoch 3: Train Loss = 0.0520, Val MSE = 0.0889, Val MAE = 0.2126
✓ New best overall model (MSE: 0.0889)

Starting training with LR: 0.001, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.98it/s]


Epoch 1: Train Loss = 0.1689, Val MSE = 0.1926, Val MAE = 0.3455


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.95it/s]


Epoch 2: Train Loss = 0.0871, Val MSE = 0.0802, Val MAE = 0.2323


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.95it/s]


Epoch 3: Train Loss = 0.0533, Val MSE = 0.0723, Val MAE = 0.2083


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.95it/s]


Epoch 4: Train Loss = 0.0330, Val MSE = 0.0696, Val MAE = 0.1947
✓ New best overall model (MSE: 0.0696)

Starting training with LR: 0.001, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.99it/s]


Epoch 1: Train Loss = 0.1723, Val MSE = 0.1328, Val MAE = 0.2747


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.00it/s]


Epoch 2: Train Loss = 0.0889, Val MSE = 0.0915, Val MAE = 0.2135


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.98it/s]


Epoch 3: Train Loss = 0.0523, Val MSE = 0.0710, Val MAE = 0.1978


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.98it/s]


Epoch 4: Train Loss = 0.0360, Val MSE = 0.0958, Val MAE = 0.2238
  ↪ No improvement for 1/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.99it/s]


Epoch 5: Train Loss = 0.0257, Val MSE = 0.0735, Val MAE = 0.2000
  ↪ No improvement for 2/3 epochs

Best Hyperparameters:
Learning Rate: 0.001
Epochs: 4
Batch Size: 16
Validation MSE: 0.0696
Test MSE: 0.1056

All results saved in the 'bert_grid_search_20250503_152352' directory.
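The search above trains one model for every (learning rate, epochs) pair and keeps the run with the lowest validation MSE across all runs; that is what each "✓ New best overall model" line reports. Seven learning rates and three epoch budgets give 7 × 3 = 21 training runs per model, and here the winner was LR 0.001 with 4 epochs (validation MSE 0.0696). Below is a minimal sketch of that selection loop. `train_fn` and `toy_train` are hypothetical stand-ins for the per-run training code. The notebook's own `hyperparameter_grid_search` is defined earlier and may differ in detail.

```python
from itertools import product
from typing import Callable, List, Tuple


def grid_search(
    learning_rates: List[float],
    epoch_options: List[int],
    train_fn: Callable[[float, int], float],
) -> Tuple[dict, List[Tuple[float, int, float]]]:
    """Train one model per (learning rate, epochs) pair; keep the lowest validation MSE.

    `train_fn` is a hypothetical callback that trains a fresh model with the given
    settings and returns the best validation MSE reached during that run.
    """
    best = {"lr": None, "epochs": None, "val_mse": float("inf")}
    history = []
    for lr, n_epochs in product(learning_rates, epoch_options):
        print(f"Starting training with LR: {lr}, Epochs: {n_epochs}")
        val_mse = train_fn(lr, n_epochs)
        history.append((lr, n_epochs, val_mse))
        # Only a strictly lower MSE than every earlier run replaces the overall best.
        if val_mse < best["val_mse"]:
            best = {"lr": lr, "epochs": n_epochs, "val_mse": val_mse}
            print(f"✓ New best overall model (MSE: {val_mse:.4f})")
    return best, history


# Toy stand-in for training, used only to exercise the loop (not real results).
def toy_train(lr: float, n_epochs: int) -> float:
    return abs(lr - 1e-4) * 100 + 1.0 / n_epochs


best, history = grid_search([1e-5, 1e-4, 1e-3], [3, 4, 5], toy_train)
print(best)  # {'lr': 0.0001, 'epochs': 5, 'val_mse': 0.2}
print(len(history))  # 9
```

Each run starts from a fresh model. Because of this, the comparison is between independent runs, not between epochs of a single run.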


### Bert + AdapterH on FIQA
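"AdapterH" presumably refers to the Houlsby-style bottleneck adapter. In that method the pre-trained BERT weights stay frozen, and only small down- and up-projection layers (plus the task head) are trained. The framework-free sketch below shows the forward pass of one such adapter and its parameter count. The bottleneck size of 48 (hidden size 768 with a reduction factor of 16), the ReLU nonlinearity and the zero-initialised up-projection are all assumptions made for illustration; they are not read from this notebook's adapter config. Under these assumptions, two adapters per layer across BERT-base's 12 layers would add roughly 1.8M trainable parameters, a small fraction of the model's roughly 110M.

```python
import numpy as np


class BottleneckAdapter:
    """Houlsby-style bottleneck adapter: x + up(relu(down(x))).

    Hypothetical sketch; sizes, ReLU and initialisation are illustrative assumptions.
    """

    def __init__(self, hidden_size: int = 768, bottleneck: int = 48, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Down-projection: hidden_size -> bottleneck, small random init.
        self.w_down = rng.normal(0.0, 0.02, size=(hidden_size, bottleneck))
        self.b_down = np.zeros(bottleneck)
        # Up-projection starts at zero, so the adapter is initially the identity map.
        self.w_up = np.zeros((bottleneck, hidden_size))
        self.b_up = np.zeros(hidden_size)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (..., hidden_size). The residual connection keeps the frozen
        # model's representation and adds a small learned correction on top.
        h = np.maximum(x @ self.w_down + self.b_down, 0.0)
        return x + h @ self.w_up + self.b_up

    def num_params(self) -> int:
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


adapter = BottleneckAdapter()
x = np.random.default_rng(1).normal(size=(2, 5, 768))  # (batch, seq_len, hidden)
print(np.allclose(adapter(x), x))  # True: identity at initialization
print(adapter.num_params())  # 74544
```

Because the residual path passes the frozen representation through unchanged at the start, training begins from the pre-trained model's behaviour. This may be one reason the adapter grid uses longer epoch budgets ([6, 9, 11]) than full fine-tuning.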

In [None]:
import os

# Parameters for the hyperparameter_grid_search function
model = 'bert-base-uncased'
lr = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
epochs = [6, 9, 11]
task = 'FIQA'
dataset = dataset_FIQA
lora = False
adapter = True

# Create the ./adapters output folder used by adapter runs
if adapter:
    os.makedirs("./adapters", exist_ok=True)

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda

Dataset sizes - Train: 821, Val: 176, Test: 176
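Every epoch's progress bar shows 52 steps. That agrees with 821 training examples in batches of 16 (the batch size reported for the full fine-tuning search), assuming the DataLoader keeps the final partial batch (PyTorch's default, `drop_last=False`). The three split sizes add up to 1173 examples, which appears consistent with a roughly 70/15/15 train/validation/test split. The quick check below reproduces the arithmetic.

```python
import math

train_size, val_size, test_size = 821, 176, 176  # from the "Dataset sizes" log line
batch_size = 16  # assumed to match the full fine-tuning runs

# With drop_last=False the last, smaller batch still counts as a step.
steps_per_epoch = math.ceil(train_size / batch_size)
last_batch = train_size - (steps_per_epoch - 1) * batch_size
total = train_size + val_size + test_size

print(steps_per_epoch)  # 52, matching the "52/52" progress bars
print(last_batch)  # 5 examples in the final, partial batch
print(total, round(train_size / total, 3), round(val_size / total, 3))
```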

Starting training with LR: 1e-06, Epochs: 6


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  7.55it/s]


Epoch 1: Train Loss = 0.1872, Val MSE = 0.1843, Val MAE = 0.3671


Epoch 2/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.51it/s]


Epoch 2: Train Loss = 0.1931, Val MSE = 0.1840, Val MAE = 0.3666


Epoch 3/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.47it/s]


Epoch 3: Train Loss = 0.1929, Val MSE = 0.1835, Val MAE = 0.3660


Epoch 4/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.43it/s]


Epoch 4: Train Loss = 0.1846, Val MSE = 0.1832, Val MAE = 0.3656


Epoch 5/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.43it/s]


Epoch 5: Train Loss = 0.1833, Val MSE = 0.1829, Val MAE = 0.3652


Epoch 6/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.43it/s]


Epoch 6: Train Loss = 0.1819, Val MSE = 0.1826, Val MAE = 0.3647
✓ New best overall model (MSE: 0.1826)

Starting training with LR: 1e-06, Epochs: 9


Epoch 1/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.30it/s]


Epoch 1: Train Loss = 0.1826, Val MSE = 0.1943, Val MAE = 0.3720


Epoch 2/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 2: Train Loss = 0.1728, Val MSE = 0.1945, Val MAE = 0.3714
  ↪ No improvement for 1/3 epochs


Epoch 3/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.14it/s]


Epoch 3: Train Loss = 0.1847, Val MSE = 0.1943, Val MAE = 0.3708
  ↪ No improvement for 2/3 epochs


Epoch 4/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.06it/s]


Epoch 4: Train Loss = 0.1781, Val MSE = 0.1939, Val MAE = 0.3703


Epoch 5/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.06it/s]


Epoch 5: Train Loss = 0.1771, Val MSE = 0.1938, Val MAE = 0.3698


Epoch 6/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.11it/s]


Epoch 6: Train Loss = 0.1773, Val MSE = 0.1933, Val MAE = 0.3693


Epoch 7/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 7: Train Loss = 0.1750, Val MSE = 0.1931, Val MAE = 0.3688


Epoch 8/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 8: Train Loss = 0.1736, Val MSE = 0.1927, Val MAE = 0.3684


Epoch 9/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 9: Train Loss = 0.1779, Val MSE = 0.1923, Val MAE = 0.3679

Starting training with LR: 1e-06, Epochs: 11


Epoch 1/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.30it/s]


Epoch 1: Train Loss = 0.2610, Val MSE = 0.2407, Val MAE = 0.4296


Epoch 2/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 2: Train Loss = 0.2432, Val MSE = 0.2306, Val MAE = 0.4236


Epoch 3/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.24it/s]


Epoch 3: Train Loss = 0.2354, Val MSE = 0.2220, Val MAE = 0.4184


Epoch 4/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.24it/s]


Epoch 4: Train Loss = 0.2191, Val MSE = 0.2153, Val MAE = 0.4141


Epoch 5/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 5: Train Loss = 0.2143, Val MSE = 0.2099, Val MAE = 0.4100


Epoch 6/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 6: Train Loss = 0.2024, Val MSE = 0.2054, Val MAE = 0.4063


Epoch 7/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 7: Train Loss = 0.1956, Val MSE = 0.2020, Val MAE = 0.4029


Epoch 8/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 8: Train Loss = 0.2007, Val MSE = 0.1997, Val MAE = 0.4001


Epoch 9/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 9: Train Loss = 0.1940, Val MSE = 0.1978, Val MAE = 0.3973


Epoch 10/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 10: Train Loss = 0.1894, Val MSE = 0.1966, Val MAE = 0.3949


Epoch 11/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.25it/s]


Epoch 11: Train Loss = 0.1844, Val MSE = 0.1959, Val MAE = 0.3930

Starting training with LR: 1e-05, Epochs: 6


Epoch 1/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 1: Train Loss = 0.1897, Val MSE = 0.1986, Val MAE = 0.3721


Epoch 2/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 2: Train Loss = 0.1731, Val MSE = 0.1883, Val MAE = 0.3683


Epoch 3/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 3: Train Loss = 0.1616, Val MSE = 0.1860, Val MAE = 0.3658


Epoch 4/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 4: Train Loss = 0.1705, Val MSE = 0.1842, Val MAE = 0.3632


Epoch 5/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 5: Train Loss = 0.1593, Val MSE = 0.1805, Val MAE = 0.3608


Epoch 6/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 6: Train Loss = 0.1574, Val MSE = 0.1787, Val MAE = 0.3581
✓ New best overall model (MSE: 0.1787)

Starting training with LR: 1e-05, Epochs: 9


Epoch 1/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.31it/s]


Epoch 1: Train Loss = 0.2269, Val MSE = 0.1875, Val MAE = 0.3860


Epoch 2/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 2: Train Loss = 0.1776, Val MSE = 0.1856, Val MAE = 0.3733


Epoch 3/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.29it/s]


Epoch 3: Train Loss = 0.1696, Val MSE = 0.1858, Val MAE = 0.3701
  ↪ No improvement for 1/3 epochs


Epoch 4/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.28it/s]


Epoch 4: Train Loss = 0.1727, Val MSE = 0.1840, Val MAE = 0.3672


Epoch 5/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.28it/s]


Epoch 5: Train Loss = 0.1630, Val MSE = 0.1820, Val MAE = 0.3646


Epoch 6/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 6: Train Loss = 0.1610, Val MSE = 0.1791, Val MAE = 0.3627


Epoch 7/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.30it/s]


Epoch 7: Train Loss = 0.1627, Val MSE = 0.1770, Val MAE = 0.3606


Epoch 8/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.29it/s]


Epoch 8: Train Loss = 0.1604, Val MSE = 0.1752, Val MAE = 0.3588


Epoch 9/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.30it/s]


Epoch 9: Train Loss = 0.1610, Val MSE = 0.1758, Val MAE = 0.3554
  ↪ No improvement for 1/3 epochs
✓ New best overall model (MSE: 0.1752)

Starting training with LR: 1e-05, Epochs: 11


Epoch 1/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 1: Train Loss = 0.1823, Val MSE = 0.1941, Val MAE = 0.3736


Epoch 2/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 2: Train Loss = 0.1833, Val MSE = 0.1930, Val MAE = 0.3700


Epoch 3/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 3: Train Loss = 0.1807, Val MSE = 0.1895, Val MAE = 0.3680


Epoch 4/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 4: Train Loss = 0.1758, Val MSE = 0.1852, Val MAE = 0.3663


Epoch 5/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 5: Train Loss = 0.1693, Val MSE = 0.1843, Val MAE = 0.3633


Epoch 6/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 6: Train Loss = 0.1672, Val MSE = 0.1823, Val MAE = 0.3607


Epoch 7/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 7: Train Loss = 0.1693, Val MSE = 0.1822, Val MAE = 0.3577


Epoch 8/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 8: Train Loss = 0.1678, Val MSE = 0.1774, Val MAE = 0.3560


Epoch 9/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.25it/s]


Epoch 9: Train Loss = 0.1608, Val MSE = 0.1758, Val MAE = 0.3533


Epoch 10/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.24it/s]


Epoch 10: Train Loss = 0.1652, Val MSE = 0.1723, Val MAE = 0.3513


Epoch 11/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.28it/s]


Epoch 11: Train Loss = 0.1555, Val MSE = 0.1710, Val MAE = 0.3488
✓ New best overall model (MSE: 0.1710)

Starting training with LR: 2e-05, Epochs: 6


Epoch 1/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.31it/s]


Epoch 1: Train Loss = 0.1746, Val MSE = 0.1856, Val MAE = 0.3592


Epoch 2/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.28it/s]


Epoch 2: Train Loss = 0.1659, Val MSE = 0.1788, Val MAE = 0.3564


Epoch 3/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 3: Train Loss = 0.1670, Val MSE = 0.1741, Val MAE = 0.3535


Epoch 4/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.28it/s]


Epoch 4: Train Loss = 0.1654, Val MSE = 0.1697, Val MAE = 0.3515


Epoch 5/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.24it/s]


Epoch 5: Train Loss = 0.1569, Val MSE = 0.1698, Val MAE = 0.3437
  ↪ No improvement for 1/3 epochs


Epoch 6/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.24it/s]


Epoch 6: Train Loss = 0.1520, Val MSE = 0.1669, Val MAE = 0.3401
✓ New best overall model (MSE: 0.1669)

Starting training with LR: 2e-05, Epochs: 9


Epoch 1/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 1: Train Loss = 0.2112, Val MSE = 0.1860, Val MAE = 0.3681


Epoch 2/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 2: Train Loss = 0.1760, Val MSE = 0.1821, Val MAE = 0.3619


Epoch 3/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 3: Train Loss = 0.1647, Val MSE = 0.1774, Val MAE = 0.3579


Epoch 4/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 4: Train Loss = 0.1612, Val MSE = 0.1765, Val MAE = 0.3533


Epoch 5/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 5: Train Loss = 0.1600, Val MSE = 0.1753, Val MAE = 0.3489


Epoch 6/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 6: Train Loss = 0.1540, Val MSE = 0.1636, Val MAE = 0.3474


Epoch 7/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.24it/s]


Epoch 7: Train Loss = 0.1515, Val MSE = 0.1662, Val MAE = 0.3408
  ↪ No improvement for 1/3 epochs


Epoch 8/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 8: Train Loss = 0.1433, Val MSE = 0.1598, Val MAE = 0.3369


Epoch 9/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 9: Train Loss = 0.1462, Val MSE = 0.1558, Val MAE = 0.3317
✓ New best overall model (MSE: 0.1558)

Starting training with LR: 2e-05, Epochs: 11


Epoch 1/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.30it/s]


Epoch 1: Train Loss = 0.1993, Val MSE = 0.1974, Val MAE = 0.3766


Epoch 2/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 2: Train Loss = 0.1882, Val MSE = 0.1911, Val MAE = 0.3706


Epoch 3/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.25it/s]


Epoch 3: Train Loss = 0.1804, Val MSE = 0.1910, Val MAE = 0.3658


Epoch 4/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 4: Train Loss = 0.1734, Val MSE = 0.1806, Val MAE = 0.3621


Epoch 5/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 5: Train Loss = 0.1611, Val MSE = 0.1782, Val MAE = 0.3571


Epoch 6/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 6: Train Loss = 0.1580, Val MSE = 0.1753, Val MAE = 0.3521


Epoch 7/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 7: Train Loss = 0.1512, Val MSE = 0.1685, Val MAE = 0.3485


Epoch 8/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 8: Train Loss = 0.1485, Val MSE = 0.1713, Val MAE = 0.3407
  ↪ No improvement for 1/3 epochs


Epoch 9/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 9: Train Loss = 0.1422, Val MSE = 0.1582, Val MAE = 0.3374


Epoch 10/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 10: Train Loss = 0.1365, Val MSE = 0.1569, Val MAE = 0.3269


Epoch 11/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 11: Train Loss = 0.1344, Val MSE = 0.1486, Val MAE = 0.3205
✓ New best overall model (MSE: 0.1486)

Starting training with LR: 3e-05, Epochs: 6


Epoch 1/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.36it/s]


Epoch 1: Train Loss = 0.1980, Val MSE = 0.1877, Val MAE = 0.3687


Epoch 2/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.28it/s]


Epoch 2: Train Loss = 0.1669, Val MSE = 0.1771, Val MAE = 0.3654


Epoch 3/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 3: Train Loss = 0.1572, Val MSE = 0.1714, Val MAE = 0.3571


Epoch 4/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.25it/s]


Epoch 4: Train Loss = 0.1531, Val MSE = 0.1683, Val MAE = 0.3490


Epoch 5/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.24it/s]


Epoch 5: Train Loss = 0.1440, Val MSE = 0.1590, Val MAE = 0.3408


Epoch 6/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 6: Train Loss = 0.1386, Val MSE = 0.1514, Val MAE = 0.3324

Starting training with LR: 3e-05, Epochs: 9


Epoch 1/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.28it/s]


Epoch 1: Train Loss = 0.1736, Val MSE = 0.1649, Val MAE = 0.3474


Epoch 2/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 2: Train Loss = 0.1608, Val MSE = 0.1628, Val MAE = 0.3394


Epoch 3/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 3: Train Loss = 0.1563, Val MSE = 0.1543, Val MAE = 0.3326


Epoch 4/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 4: Train Loss = 0.1506, Val MSE = 0.1474, Val MAE = 0.3258


Epoch 5/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 5: Train Loss = 0.1456, Val MSE = 0.1437, Val MAE = 0.3167


Epoch 6/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 6: Train Loss = 0.1365, Val MSE = 0.1435, Val MAE = 0.3081


Epoch 7/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 7: Train Loss = 0.1303, Val MSE = 0.1338, Val MAE = 0.2962


Epoch 8/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 8: Train Loss = 0.1237, Val MSE = 0.1236, Val MAE = 0.2822


Epoch 9/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 9: Train Loss = 0.1164, Val MSE = 0.1157, Val MAE = 0.2702
✓ New best overall model (MSE: 0.1157)

Starting training with LR: 3e-05, Epochs: 11


Epoch 1/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.25it/s]


Epoch 1: Train Loss = 0.1764, Val MSE = 0.1667, Val MAE = 0.3512


Epoch 2/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 2: Train Loss = 0.1691, Val MSE = 0.1605, Val MAE = 0.3456


Epoch 3/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 3: Train Loss = 0.1599, Val MSE = 0.1561, Val MAE = 0.3378


Epoch 4/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.17it/s]


Epoch 4: Train Loss = 0.1497, Val MSE = 0.1519, Val MAE = 0.3300


Epoch 5/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.17it/s]


Epoch 5: Train Loss = 0.1426, Val MSE = 0.1481, Val MAE = 0.3194


Epoch 6/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 6: Train Loss = 0.1384, Val MSE = 0.1377, Val MAE = 0.3098


Epoch 7/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 7: Train Loss = 0.1297, Val MSE = 0.1337, Val MAE = 0.2970


Epoch 8/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 8: Train Loss = 0.1246, Val MSE = 0.1216, Val MAE = 0.2854


Epoch 9/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 9: Train Loss = 0.1092, Val MSE = 0.1157, Val MAE = 0.2712


Epoch 10/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 10: Train Loss = 0.1089, Val MSE = 0.1176, Val MAE = 0.2641
  ↪ No improvement for 1/3 epochs


Epoch 11/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 11: Train Loss = 0.0996, Val MSE = 0.1087, Val MAE = 0.2572
✓ New best overall model (MSE: 0.1087)

Starting training with LR: 5e-05, Epochs: 6


Epoch 1/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.36it/s]


Epoch 1: Train Loss = 0.1882, Val MSE = 0.1743, Val MAE = 0.3598


Epoch 2/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.25it/s]


Epoch 2: Train Loss = 0.1663, Val MSE = 0.1617, Val MAE = 0.3515


Epoch 3/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 3: Train Loss = 0.1541, Val MSE = 0.1548, Val MAE = 0.3338


Epoch 4/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.24it/s]


Epoch 4: Train Loss = 0.1377, Val MSE = 0.1427, Val MAE = 0.3138


Epoch 5/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 5: Train Loss = 0.1296, Val MSE = 0.1291, Val MAE = 0.2864


Epoch 6/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 6: Train Loss = 0.1173, Val MSE = 0.1197, Val MAE = 0.2700

Starting training with LR: 5e-05, Epochs: 9


Epoch 1/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.25it/s]


Epoch 1: Train Loss = 0.1786, Val MSE = 0.2010, Val MAE = 0.3757


Epoch 2/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 2: Train Loss = 0.1663, Val MSE = 0.1803, Val MAE = 0.3629


Epoch 3/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 3: Train Loss = 0.1574, Val MSE = 0.1713, Val MAE = 0.3491


Epoch 4/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 4: Train Loss = 0.1486, Val MSE = 0.1701, Val MAE = 0.3341


Epoch 5/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 5: Train Loss = 0.1315, Val MSE = 0.1470, Val MAE = 0.3169


Epoch 6/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 6: Train Loss = 0.1202, Val MSE = 0.1477, Val MAE = 0.2944
  ↪ No improvement for 1/3 epochs


Epoch 7/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 7: Train Loss = 0.1064, Val MSE = 0.1281, Val MAE = 0.2830


Epoch 8/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 8: Train Loss = 0.1036, Val MSE = 0.1208, Val MAE = 0.2651


Epoch 9/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 9: Train Loss = 0.0911, Val MSE = 0.1211, Val MAE = 0.2576
  ↪ No improvement for 1/3 epochs

Starting training with LR: 5e-05, Epochs: 11


Epoch 1/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.28it/s]


Epoch 1: Train Loss = 0.1755, Val MSE = 0.1796, Val MAE = 0.3602


Epoch 2/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.29it/s]


Epoch 2: Train Loss = 0.1630, Val MSE = 0.1717, Val MAE = 0.3481


Epoch 3/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.25it/s]


Epoch 3: Train Loss = 0.1523, Val MSE = 0.1602, Val MAE = 0.3342


Epoch 4/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.24it/s]


Epoch 4: Train Loss = 0.1404, Val MSE = 0.1458, Val MAE = 0.3198


Epoch 5/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 5: Train Loss = 0.1354, Val MSE = 0.1466, Val MAE = 0.3045
  ↪ No improvement for 1/3 epochs


Epoch 6/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 6: Train Loss = 0.1229, Val MSE = 0.1244, Val MAE = 0.2810


Epoch 7/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 7: Train Loss = 0.1062, Val MSE = 0.1160, Val MAE = 0.2666


Epoch 8/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 8: Train Loss = 0.0984, Val MSE = 0.1138, Val MAE = 0.2562


Epoch 9/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 9: Train Loss = 0.0885, Val MSE = 0.1193, Val MAE = 0.2590
  ↪ No improvement for 1/3 epochs


Epoch 10/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 10: Train Loss = 0.0843, Val MSE = 0.1242, Val MAE = 0.2617
  ↪ No improvement for 2/3 epochs


Epoch 11/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.31it/s]


Epoch 11: Train Loss = 0.0748, Val MSE = 0.1036, Val MAE = 0.2439
✓ New best overall model (MSE: 0.1036)

Starting training with LR: 0.0001, Epochs: 6


Epoch 1/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.31it/s]


Epoch 1: Train Loss = 0.1820, Val MSE = 0.1654, Val MAE = 0.3553


Epoch 2/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 2: Train Loss = 0.1548, Val MSE = 0.1632, Val MAE = 0.3302


Epoch 3/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 3: Train Loss = 0.1367, Val MSE = 0.1285, Val MAE = 0.2954


Epoch 4/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.27it/s]


Epoch 4: Train Loss = 0.1155, Val MSE = 0.1123, Val MAE = 0.2552


Epoch 5/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 5: Train Loss = 0.0984, Val MSE = 0.1013, Val MAE = 0.2459


Epoch 6/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.24it/s]


Epoch 6: Train Loss = 0.0890, Val MSE = 0.0978, Val MAE = 0.2373
✓ New best overall model (MSE: 0.0978)

Starting training with LR: 0.0001, Epochs: 9


Epoch 1/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 1: Train Loss = 0.1657, Val MSE = 0.1466, Val MAE = 0.3301


Epoch 2/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.16it/s]


Epoch 2: Train Loss = 0.1505, Val MSE = 0.1372, Val MAE = 0.3136


Epoch 3/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.17it/s]


Epoch 3: Train Loss = 0.1292, Val MSE = 0.1240, Val MAE = 0.2841


Epoch 4/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.17it/s]


Epoch 4: Train Loss = 0.1121, Val MSE = 0.1063, Val MAE = 0.2598


Epoch 5/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.15it/s]


Epoch 5: Train Loss = 0.0984, Val MSE = 0.0962, Val MAE = 0.2534


Epoch 6/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 6: Train Loss = 0.0898, Val MSE = 0.1024, Val MAE = 0.2436
  ↪ No improvement for 1/3 epochs


Epoch 7/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.18it/s]


Epoch 7: Train Loss = 0.0786, Val MSE = 0.0892, Val MAE = 0.2267


Epoch 8/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 8: Train Loss = 0.0646, Val MSE = 0.0880, Val MAE = 0.2281


Epoch 9/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.18it/s]


Epoch 9: Train Loss = 0.0556, Val MSE = 0.0920, Val MAE = 0.2241
  ↪ No improvement for 1/3 epochs
✓ New best overall model (MSE: 0.0880)

Starting training with LR: 0.0001, Epochs: 11


Epoch 1/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.28it/s]


Epoch 1: Train Loss = 0.1780, Val MSE = 0.1651, Val MAE = 0.3460


Epoch 2/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 2: Train Loss = 0.1567, Val MSE = 0.1545, Val MAE = 0.3268


Epoch 3/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.24it/s]


Epoch 3: Train Loss = 0.1412, Val MSE = 0.1295, Val MAE = 0.3034


Epoch 4/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 4: Train Loss = 0.1150, Val MSE = 0.1072, Val MAE = 0.2551


Epoch 5/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 5: Train Loss = 0.0967, Val MSE = 0.1035, Val MAE = 0.2473


Epoch 6/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 6: Train Loss = 0.0843, Val MSE = 0.0898, Val MAE = 0.2235


Epoch 7/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 7: Train Loss = 0.0734, Val MSE = 0.0881, Val MAE = 0.2243


Epoch 8/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.15it/s]


Epoch 8: Train Loss = 0.0633, Val MSE = 0.0908, Val MAE = 0.2162
  ↪ No improvement for 1/3 epochs


Epoch 9/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.16it/s]


Epoch 9: Train Loss = 0.0541, Val MSE = 0.1006, Val MAE = 0.2242
  ↪ No improvement for 2/3 epochs


Epoch 10/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 10: Train Loss = 0.0508, Val MSE = 0.0899, Val MAE = 0.2031
  ↪ No improvement for 3/3 epochs
Stopping early after 10 epochs.
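The "No improvement for k/3 epochs" and "Stopping early" messages come from patience-based early stopping. A run ends once validation MSE has failed to beat that run's best value for 3 consecutive epochs. In the LR 1e-4, 11-epoch run above, the best MSE (0.0881) came at epoch 7. Epochs 8 to 10 did not improve on it, so training stopped after epoch 10. The sketch below replays that rule on the logged validation MSEs. It assumes a strict "lower is better" comparison and a patience of 3, and it reports the logged values only; the notebook's own implementation is not shown in this section.

```python
from typing import List, Optional, Tuple


def replay_early_stopping(
    val_mses: List[float], patience: int = 3
) -> Tuple[float, int, Optional[int]]:
    """Return (best_mse, best_epoch, stopped_at) for a sequence of per-epoch MSEs.

    `stopped_at` is None when the run uses its full epoch budget.
    """
    best_mse, best_epoch, bad_epochs = float("inf"), 0, 0
    for epoch, mse in enumerate(val_mses, start=1):
        if mse < best_mse:
            # Improvement: remember it and reset the patience counter.
            best_mse, best_epoch, bad_epochs = mse, epoch, 0
        else:
            bad_epochs += 1
            print(f"  ↪ No improvement for {bad_epochs}/{patience} epochs")
            if bad_epochs >= patience:
                print(f"Stopping early after {epoch} epochs.")
                return best_mse, best_epoch, epoch
    return best_mse, best_epoch, None


# Validation MSEs logged for the LR 1e-4, 11-epoch adapter run above.
logged = [0.1651, 0.1545, 0.1295, 0.1072, 0.1035, 0.0898, 0.0881, 0.0908, 0.1006, 0.0899]
print(replay_early_stopping(logged))  # (0.0881, 7, 10)
```

The counter resets on any improvement. This explains why runs such as LR 1e-6 with 9 epochs show "2/3" and then continue normally.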

Starting training with LR: 0.001, Epochs: 6


Epoch 1/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.28it/s]


Epoch 1: Train Loss = 0.1679, Val MSE = 0.1211, Val MAE = 0.2810


Epoch 2/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.22it/s]


Epoch 2: Train Loss = 0.1088, Val MSE = 0.1585, Val MAE = 0.3108
  ↪ No improvement for 1/3 epochs


Epoch 3/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 3: Train Loss = 0.0698, Val MSE = 0.0985, Val MAE = 0.2424


Epoch 4/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 4: Train Loss = 0.0447, Val MSE = 0.0868, Val MAE = 0.2075


Epoch 5/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.21it/s]


Epoch 5: Train Loss = 0.0379, Val MSE = 0.0878, Val MAE = 0.2124
  ↪ No improvement for 1/3 epochs


Epoch 6/6 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.19it/s]


Epoch 6: Train Loss = 0.0241, Val MSE = 0.0785, Val MAE = 0.2009
✓ New best overall model (MSE: 0.0785)

Starting training with LR: 0.001, Epochs: 9


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.25it/s]


Epoch 1: Train Loss = 0.1847, Val MSE = 0.1428, Val MAE = 0.3274


Epoch 2/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 2: Train Loss = 0.1110, Val MSE = 0.1030, Val MAE = 0.2393


Epoch 3/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 3: Train Loss = 0.0650, Val MSE = 0.0786, Val MAE = 0.2049


Epoch 4/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.26it/s]


Epoch 4: Train Loss = 0.0390, Val MSE = 0.0884, Val MAE = 0.2230
  ↪ No improvement for 1/3 epochs


Epoch 5/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.25it/s]


Epoch 5: Train Loss = 0.0270, Val MSE = 0.0814, Val MAE = 0.2102
  ↪ No improvement for 2/3 epochs


Epoch 6/9 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.30it/s]


Epoch 6: Train Loss = 0.0218, Val MSE = 0.0838, Val MAE = 0.2009
  ↪ No improvement for 3/3 epochs
Stopping early after 6 epochs.

Starting training with LR: 0.001, Epochs: 11


Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 1: Train Loss = 0.1653, Val MSE = 0.0933, Val MAE = 0.2477


Epoch 2/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 2: Train Loss = 0.0987, Val MSE = 0.0948, Val MAE = 0.2389
  ↪ No improvement for 1/3 epochs


Epoch 3/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.16it/s]


Epoch 3: Train Loss = 0.0722, Val MSE = 0.0778, Val MAE = 0.2024


Epoch 4/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.18it/s]


Epoch 4: Train Loss = 0.0370, Val MSE = 0.0873, Val MAE = 0.2047
  ↪ No improvement for 1/3 epochs


Epoch 5/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.23it/s]


Epoch 5: Train Loss = 0.0264, Val MSE = 0.0868, Val MAE = 0.2054
  ↪ No improvement for 2/3 epochs


Epoch 6/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.20it/s]


Epoch 6: Train Loss = 0.0189, Val MSE = 0.0876, Val MAE = 0.2080
  ↪ No improvement for 3/3 epochs
Stopping early after 6 epochs.
✓ New best overall model (MSE: 0.0778)

Best Hyperparameters:
Learning Rate: 0.001
Epochs: 11
Batch Size: 16
Validation MSE: 0.0778
Test MSE: 0.0940

All results saved in the 'bert_grid_search_20250503_165741' directory.
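The "Best Hyperparameters" summary above appears to keep the run with the lowest validation MSE. The "New best overall model" lines fire on validation MSE (0.0785, then 0.0778). Note that "Epochs: 11" is the configured budget: that run actually stopped early after 6 epochs. Below is a minimal sketch of the selection step. It uses only the best validation MSE of the four runs visible in this excerpt, so earlier grid runs are omitted. `select_best` is a hypothetical helper, not part of `hyperparameter_grid_search`.

```python
from typing import Dict, Tuple

# (learning rate, configured maximum number of epochs)
Config = Tuple[float, int]


def select_best(results: Dict[Config, float]) -> Tuple[Config, float]:
    """Return the configuration with the lowest best-epoch validation MSE."""
    best_config = min(results, key=results.get)  # on ties, the first run wins
    return best_config, results[best_config]


# Best validation MSE of each BERT adapter run on FIQA visible in the log above
logged_runs: Dict[Config, float] = {
    (1e-4, 11): 0.0881,  # stopped early after 10 epochs
    (1e-3, 6): 0.0785,
    (1e-3, 9): 0.0786,   # stopped early after 6 epochs
    (1e-3, 11): 0.0778,  # stopped early after 6 epochs
}

(best_lr, best_epochs), best_mse = select_best(logged_runs)
print(f"Learning Rate: {best_lr}")
print(f"Epochs: {best_epochs}")
print(f"Validation MSE: {best_mse}")
```

This reproduces the reported choice of LR 0.001 with an epoch budget of 11. The 9-epoch run (0.0786) narrowly loses to the 6-epoch run (0.0785), which matches the absence of a "New best overall model" line after it.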


## FinBert on Financial Phrasebank

FinBert Finetuning

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'yiyanghkust/finbert-pretrain'
lr = [5e-5]
epochs = [3]
task = 'FP'
dataset = dataset_FP
lora = False
adapter = False

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 3392, Val: 727, Test: 727

Starting training with LR: 5e-05, Epochs: 3


Epoch 1/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.38it/s, Loss=0.4782]
Epoch 1/3 [Val]: 100%|██████████| 23/23 [00:02<00:00, 11.25it/s]


Epoch 1/3 done!
Average training loss: 0.5066
Validation accuracy: 0.8446
✓ New best model (val_acc: 0.8446)


Epoch 2/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.42it/s, Loss=0.2629]
Epoch 2/3 [Val]: 100%|██████████| 23/23 [00:02<00:00, 11.27it/s]


Epoch 2/3 done!
Average training loss: 0.2123
Validation accuracy: 0.7923
⟳ No improvement for 1/3 epochs


Epoch 3/3 [Train]: 100%|██████████| 106/106 [00:30<00:00,  3.42it/s, Loss=0.0363]
Epoch 3/3 [Val]: 100%|██████████| 23/23 [00:02<00:00, 11.32it/s]


Epoch 3/3 done!
Average training loss: 0.0827
Validation accuracy: 0.8459
✓ New best model (val_acc: 0.8459)


Testing: 100%|██████████| 23/23 [00:02<00:00, 11.36it/s]


✓ New best overall model (accuracy: 0.8556)

Best Hyperparameters:
Learning Rate: 5e-05
Epochs: 3
Batch Size: 32
Validation Accuracy: 0.8459
Test Accuracy: 0.8556

All results saved in the 'bert_grid_search_20250503_172126' directory.


FinBert + LoRA

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'yiyanghkust/finbert-pretrain'
lr = [1e-3]
epochs = [3]
task = 'FP'
dataset = dataset_FP
lora = True
adapter = False

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 3392, Val: 727, Test: 727

Starting training with LR: 0.001, Epochs: 3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at yiyanghkust/finbert-pretrain and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109754115
trainable params: 887,043 || all params: 110,641,158 || trainable%: 0.8017


Epoch 1/3 [Train]: 100%|██████████| 212/212 [00:23<00:00,  9.10it/s, Loss=0.4047]
Epoch 1/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 19.96it/s]


Epoch 1/3 done!
Average training loss: 0.5436
Validation accuracy: 0.8226
✓ New best model (val_acc: 0.8226)


Epoch 2/3 [Train]: 100%|██████████| 212/212 [00:22<00:00,  9.23it/s, Loss=0.3269]
Epoch 2/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 20.27it/s]


Epoch 2/3 done!
Average training loss: 0.3851
Validation accuracy: 0.8184
⟳ No improvement for 1/3 epochs


Epoch 3/3 [Train]: 100%|██████████| 212/212 [00:22<00:00,  9.31it/s, Loss=0.1797]
Epoch 3/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 20.20it/s]


Epoch 3/3 done!
Average training loss: 0.3092
Validation accuracy: 0.8418
✓ New best model (val_acc: 0.8418)


Testing: 100%|██████████| 46/46 [00:02<00:00, 20.16it/s]


✓ New best overall model (accuracy: 0.8294)

Best Hyperparameters:
Learning Rate: 0.001
Epochs: 3
Batch Size: 16
Validation Accuracy: 0.8418
Test Accuracy: 0.8294

All results saved in the 'bert_grid_search_20250503_173501' directory.
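The LoRA log reports 887,043 trainable parameters, 0.8017% of 110,641,158. Each LoRA-adapted linear layer of shape d_in to d_out adds r·(d_in + d_out) parameters (matrices A and B), and the 3-class classifier head is trained in full. The worked example below **assumes** r=16 on the query/key/value projections of the 12 BERT-base layers (hidden size 768). The LoRA config of this chapter is not shown here, and other settings (e.g. r=24 on query and value only) give the same count. `lora_params` and `seq_cls_lora_trainable` are hypothetical helpers.

```python
def lora_params(r: int, d_in: int, d_out: int) -> int:
    """Extra parameters LoRA adds to one d_in -> d_out linear layer: A (r x d_in) + B (d_out x r)."""
    return r * d_in + d_out * r


def seq_cls_lora_trainable(r: int, n_layers: int, n_target_modules: int,
                           hidden: int, num_labels: int) -> int:
    """Trainable parameters for LoRA on square attention projections plus a fully trained classifier."""
    lora = n_layers * n_target_modules * lora_params(r, hidden, hidden)
    classifier = hidden * num_labels + num_labels  # classifier weight + bias
    return lora + classifier


# ASSUMPTION: r=16 on query/key/value, 12 layers, hidden size 768, 3 sentiment classes
trainable = seq_cls_lora_trainable(r=16, n_layers=12, n_target_modules=3, hidden=768, num_labels=3)
all_params = 110_641_158  # "all params" reported for FinBert + LoRA above
print(f"trainable params: {trainable:,} || trainable%: {100 * trainable / all_params:.4f}")
print(all_params - trainable)  # parameter count of the model before LoRA was added
```

The difference, 109,754,115, is exactly the value on the `number of trainable param` line. That line therefore looks like the base model's total parameter count before LoRA is applied, not a trainable count. The same holds for FlangBert: 109,484,547 + 887,043 = 110,371,590.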


FinBert + AdapterH

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'yiyanghkust/finbert-pretrain'
lr = [1e-4]
epochs = [9]
task = 'FP'
dataset = dataset_FP
lora = False
adapter = True

# Make sure the ./adapters output folder exists before adapter training
if adapter:
    os.makedirs("./adapters", exist_ok=True)

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 3392, Val: 727, Test: 727

Starting training with LR: 0.0001, Epochs: 9


Some weights of BertAdapterModel were not initialized from the model checkpoint at yiyanghkust/finbert-pretrain and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/9 [Train]: 100%|██████████| 212/212 [00:22<00:00,  9.42it/s, Loss=0.3071]
Epoch 1/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 19.51it/s]


Epoch 1/9 done!
Average training loss: 0.5815
Validation accuracy: 0.8116
✓ New best model (val_acc: 0.8116)


Epoch 2/9 [Train]: 100%|██████████| 212/212 [00:22<00:00,  9.44it/s, Loss=0.2812]
Epoch 2/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 19.62it/s]


Epoch 2/9 done!
Average training loss: 0.4003
Validation accuracy: 0.8239
✓ New best model (val_acc: 0.8239)


Epoch 3/9 [Train]: 100%|██████████| 212/212 [00:22<00:00,  9.48it/s, Loss=0.6098]
Epoch 3/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 19.65it/s]


Epoch 3/9 done!
Average training loss: 0.3463
Validation accuracy: 0.8377
✓ New best model (val_acc: 0.8377)


Epoch 4/9 [Train]: 100%|██████████| 212/212 [00:22<00:00,  9.46it/s, Loss=0.3531]
Epoch 4/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 19.56it/s]


Epoch 4/9 done!
Average training loss: 0.2987
Validation accuracy: 0.8569
✓ New best model (val_acc: 0.8569)


Epoch 5/9 [Train]: 100%|██████████| 212/212 [00:22<00:00,  9.45it/s, Loss=0.0189]
Epoch 5/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 19.64it/s]


Epoch 5/9 done!
Average training loss: 0.2453
Validation accuracy: 0.8597
✓ New best model (val_acc: 0.8597)


Epoch 6/9 [Train]: 100%|██████████| 212/212 [00:22<00:00,  9.45it/s, Loss=0.4377]
Epoch 6/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 19.61it/s]


Epoch 6/9 done!
Average training loss: 0.2239
Validation accuracy: 0.8473
⟳ No improvement for 1/3 epochs


Epoch 7/9 [Train]: 100%|██████████| 212/212 [00:22<00:00,  9.45it/s, Loss=0.0145]
Epoch 7/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 19.64it/s]


Epoch 7/9 done!
Average training loss: 0.1826
Validation accuracy: 0.8583
⟳ No improvement for 2/3 epochs


Epoch 8/9 [Train]: 100%|██████████| 212/212 [00:22<00:00,  9.44it/s, Loss=0.0340]
Epoch 8/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 19.64it/s]


Epoch 8/9 done!
Average training loss: 0.1545
Validation accuracy: 0.8556
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 8


Testing: 100%|██████████| 46/46 [00:02<00:00, 19.68it/s]


✓ New best overall model (accuracy: 0.8446)

Best Hyperparameters:
Learning Rate: 0.0001
Epochs: 9
Batch Size: 16
Validation Accuracy: 0.8597
Test Accuracy: 0.8446

All results saved in the 'bert_grid_search_20250503_172627' directory.
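Several runs above end with "No improvement for 3/3 epochs" followed by an early stop. The logged numbers are consistent with the following rule:

- Patience is 3.
- It counts consecutive epochs without a strictly better validation score (accuracy for Financial Phrasebank, MSE for FIQA).
- The counter resets whenever a new best appears.

Below is a minimal sketch of that rule. `early_stop_epoch` is a hypothetical helper. The strict comparison with no minimum delta is inferred from the logs, not taken from the notebook's training code.

```python
from typing import List, Optional


def early_stop_epoch(scores: List[float], patience: int = 3, mode: str = "max") -> Optional[int]:
    """Return the 1-based epoch at which training stops early, or None if it runs to the end.

    mode="max" for accuracy-like scores, mode="min" for losses/MSE.
    """
    best: Optional[float] = None
    bad_epochs = 0
    for epoch, score in enumerate(scores, start=1):
        improved = best is None or (score > best if mode == "max" else score < best)
        if improved:
            best = score
            bad_epochs = 0  # a new best resets the patience counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# Validation accuracies of the FinBert + AdapterH run on Financial Phrasebank (log above)
fp_acc = [0.8116, 0.8239, 0.8377, 0.8569, 0.8597, 0.8473, 0.8583, 0.8556]
print(early_stop_epoch(fp_acc, mode="max"))
```

For this run it returns 8, matching "Early stopping at epoch 8". With `mode="min"`, the same function reproduces the FIQA runs, for example "Stopping early after 10 epochs" for the BERT adapter run at LR 0.0001.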


## FinBert on FIQA

FinBert Finetuning

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'yiyanghkust/finbert-pretrain'
lr = [1e-4]
epochs = [5]
task = 'FIQA'
dataset = dataset_FIQA
lora = False
adapter = False

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 821, Val: 176, Test: 176

Starting training with LR: 0.0001, Epochs: 5


Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.16it/s]


Epoch 1: Train Loss = 0.6293, Val MSE = 0.3072, Val MAE = 0.4264


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.08it/s]


Epoch 2: Train Loss = 0.2100, Val MSE = 0.2210, Val MAE = 0.4104


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.03it/s]


Epoch 3: Train Loss = 0.1603, Val MSE = 0.3007, Val MAE = 0.4394
  ↪ No improvement for 1/3 epochs


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.03it/s]


Epoch 4: Train Loss = 0.1288, Val MSE = 0.1044, Val MAE = 0.2456


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.08it/s]


Epoch 5: Train Loss = 0.0679, Val MSE = 0.0901, Val MAE = 0.2309
✓ New best overall model (MSE: 0.0901)

Best Hyperparameters:
Learning Rate: 0.0001
Epochs: 5
Batch Size: 16
Validation MSE: 0.0901
Test MSE: 0.1180

All results saved in the 'bert_grid_search_20250503_172350' directory.


FinBert + LoRA

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'yiyanghkust/finbert-pretrain'
lr = [1e-3]
epochs = [4]
task = 'FIQA'
dataset = dataset_FIQA
lora = True
adapter = False

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 821, Val: 176, Test: 176

Starting training with LR: 0.001, Epochs: 4


Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.45it/s]


Epoch 1: Train Loss = 0.1704, Val MSE = 0.1222, Val MAE = 0.2798


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.41it/s]


Epoch 2: Train Loss = 0.0958, Val MSE = 0.1141, Val MAE = 0.2575


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.42it/s]


Epoch 3: Train Loss = 0.0521, Val MSE = 0.1068, Val MAE = 0.2518


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  9.41it/s]


Epoch 4: Train Loss = 0.0396, Val MSE = 0.0898, Val MAE = 0.2301
✓ New best overall model (MSE: 0.0898)

Best Hyperparameters:
Learning Rate: 0.001
Epochs: 4
Batch Size: 16
Validation MSE: 0.0898
Test MSE: 0.0965

All results saved in the 'bert_grid_search_20250503_172559' directory.


FinBert + AdapterH

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'yiyanghkust/finbert-pretrain'
lr = [1e-3]
epochs = [11]
task = 'FIQA'
dataset = dataset_FIQA
lora = False
adapter = True

# Make sure the ./adapters output folder exists before adapter training
if adapter:
    os.makedirs("./adapters", exist_ok=True)

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 821, Val: 176, Test: 176

Starting training with LR: 0.001, Epochs: 11


Some weights of BertAdapterModel were not initialized from the model checkpoint at yiyanghkust/finbert-pretrain and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.45it/s]


Epoch 1: Train Loss = 0.1996, Val MSE = 0.1194, Val MAE = 0.2812


Epoch 2/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.42it/s]


Epoch 2: Train Loss = 0.1006, Val MSE = 0.1224, Val MAE = 0.2721
  ↪ No improvement for 1/3 epochs


Epoch 3/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.42it/s]


Epoch 3: Train Loss = 0.0676, Val MSE = 0.0868, Val MAE = 0.2313


Epoch 4/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.42it/s]


Epoch 4: Train Loss = 0.0546, Val MSE = 0.0775, Val MAE = 0.2136


Epoch 5/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.43it/s]


Epoch 5: Train Loss = 0.0355, Val MSE = 0.0917, Val MAE = 0.2343
  ↪ No improvement for 1/3 epochs


Epoch 6/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.43it/s]


Epoch 6: Train Loss = 0.0284, Val MSE = 0.1338, Val MAE = 0.2907
  ↪ No improvement for 2/3 epochs


Epoch 7/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.45it/s]


Epoch 7: Train Loss = 0.0284, Val MSE = 0.0738, Val MAE = 0.2055


Epoch 8/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.44it/s]


Epoch 8: Train Loss = 0.0245, Val MSE = 0.0928, Val MAE = 0.2358
  ↪ No improvement for 1/3 epochs


Epoch 9/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.51it/s]


Epoch 9: Train Loss = 0.0208, Val MSE = 0.0775, Val MAE = 0.2097
  ↪ No improvement for 2/3 epochs


Epoch 10/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.49it/s]


Epoch 10: Train Loss = 0.0176, Val MSE = 0.0804, Val MAE = 0.2164
  ↪ No improvement for 3/3 epochs
Stopping early after 10 epochs.
✓ New best overall model (MSE: 0.0738)

Best Hyperparameters:
Learning Rate: 0.001
Epochs: 11
Batch Size: 16
Validation MSE: 0.0738
Test MSE: 0.0752

All results saved in the 'bert_grid_search_20250504_145511' directory.


## FlangBert on Financial Phrasebank

FlangBert Finetuning

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'SALT-NLP/FLANG-BERT'
lr = [5e-5]
epochs = [3]
task = 'FP'
dataset = dataset_FP
lora = False
adapter = False

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda


Dataset sizes - Train: 3392, Val: 727, Test: 727

Starting training with LR: 5e-05, Epochs: 3


Some weights of BertModel were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.39it/s, Loss=0.3431]
Epoch 1/3 [Val]: 100%|██████████| 23/23 [00:02<00:00, 10.07it/s]


Epoch 1/3 done!
Average training loss: 0.5464
Validation accuracy: 0.8459
✓ New best model (val_acc: 0.8459)


Epoch 2/3 [Train]: 100%|██████████| 106/106 [00:30<00:00,  3.42it/s, Loss=0.2929]
Epoch 2/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.70it/s]


Epoch 2/3 done!
Average training loss: 0.2487
Validation accuracy: 0.8432
⟳ No improvement for 1/3 epochs


Epoch 3/3 [Train]: 100%|██████████| 106/106 [00:31<00:00,  3.34it/s, Loss=0.1748]
Epoch 3/3 [Val]: 100%|██████████| 23/23 [00:02<00:00,  9.48it/s]


Epoch 3/3 done!
Average training loss: 0.1277
Validation accuracy: 0.8418
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 23/23 [00:02<00:00,  9.50it/s]


✓ New best overall model (accuracy: 0.8501)

Best Hyperparameters:
Learning Rate: 5e-05
Epochs: 3
Batch Size: 32
Validation Accuracy: 0.8459
Test Accuracy: 0.8501

All results saved in the 'bert_grid_search_20250503_180211' directory.


FlangBert + LoRA

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'SALT-NLP/FLANG-BERT'
lr = [1e-3]
epochs = [3]
task = 'FP'
dataset = dataset_FP
lora = True
adapter = False

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 3392, Val: 727, Test: 727

Starting training with LR: 0.001, Epochs: 3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


number of trainable param:  109484547
trainable params: 887,043 || all params: 110,371,590 || trainable%: 0.8037


Epoch 1/3 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.87it/s, Loss=0.2589]
Epoch 1/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 18.14it/s]


Epoch 1/3 done!
Average training loss: 0.6216
Validation accuracy: 0.8514
✓ New best model (val_acc: 0.8514)


Epoch 2/3 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.0457]
Epoch 2/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 18.21it/s]


Epoch 2/3 done!
Average training loss: 0.3664
Validation accuracy: 0.8391
⟳ No improvement for 1/3 epochs


Epoch 3/3 [Train]: 100%|██████████| 212/212 [00:23<00:00,  8.96it/s, Loss=0.4855]
Epoch 3/3 [Val]: 100%|██████████| 46/46 [00:02<00:00, 18.22it/s]


Epoch 3/3 done!
Average training loss: 0.3222
Validation accuracy: 0.8377
⟳ No improvement for 2/3 epochs


Testing: 100%|██████████| 46/46 [00:02<00:00, 18.22it/s]


✓ New best overall model (accuracy: 0.8212)

Best Hyperparameters:
Learning Rate: 0.001
Epochs: 3
Batch Size: 16
Validation Accuracy: 0.8514
Test Accuracy: 0.8212

All results saved in the 'bert_grid_search_20250503_180406' directory.


FlangBert + AdapterH

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'SALT-NLP/FLANG-BERT'
lr = [1e-4]
epochs = [9]
task = 'FP'
dataset = dataset_FP
lora = False
adapter = True

# Make sure the ./adapters output folder exists before adapter training
if adapter:
    os.makedirs("./adapters", exist_ok=True)

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 3392, Val: 727, Test: 727

Starting training with LR: 0.0001, Epochs: 9


Some weights of BertAdapterModel were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/9 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.76it/s, Loss=0.5608]
Epoch 1/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.92it/s]


Epoch 1/9 done!
Average training loss: 0.6872
Validation accuracy: 0.8116
✓ New best model (val_acc: 0.8116)


Epoch 2/9 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.72it/s, Loss=0.0741]
Epoch 2/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.95it/s]


Epoch 2/9 done!
Average training loss: 0.4165
Validation accuracy: 0.8666
✓ New best model (val_acc: 0.8666)


Epoch 3/9 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.2953]
Epoch 3/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.90it/s]


Epoch 3/9 done!
Average training loss: 0.3459
Validation accuracy: 0.8459
⟳ No improvement for 1/3 epochs


Epoch 4/9 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.81it/s, Loss=0.6507]
Epoch 4/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.99it/s]


Epoch 4/9 done!
Average training loss: 0.3026
Validation accuracy: 0.8817
✓ New best model (val_acc: 0.8817)


Epoch 5/9 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.74it/s, Loss=0.4079]
Epoch 5/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.94it/s]


Epoch 5/9 done!
Average training loss: 0.2585
Validation accuracy: 0.8721
⟳ No improvement for 1/3 epochs


Epoch 6/9 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.77it/s, Loss=0.0749]
Epoch 6/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.94it/s]


Epoch 6/9 done!
Average training loss: 0.2176
Validation accuracy: 0.8638
⟳ No improvement for 2/3 epochs


Epoch 7/9 [Train]: 100%|██████████| 212/212 [00:24<00:00,  8.79it/s, Loss=0.2544]
Epoch 7/9 [Val]: 100%|██████████| 46/46 [00:02<00:00, 16.98it/s]


Epoch 7/9 done!
Average training loss: 0.1962
Validation accuracy: 0.8611
⟳ No improvement for 3/3 epochs
⏹ Early stopping at epoch 7


Testing: 100%|██████████| 46/46 [00:02<00:00, 16.93it/s]


✓ New best overall model (accuracy: 0.8542)

Best Hyperparameters:
Learning Rate: 0.0001
Epochs: 9
Batch Size: 16
Validation Accuracy: 0.8817
Test Accuracy: 0.8542

All results saved in the 'bert_grid_search_20250504_150851' directory.


## FlangBert on FIQA

FlangBert Finetuning

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'SALT-NLP/FLANG-BERT'
lr = [1e-4]
epochs = [5]
task = 'FIQA'
dataset = dataset_FIQA
lora = False
adapter = False

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 821, Val: 176, Test: 176

Starting training with LR: 0.0001, Epochs: 5


Some weights of BertModel were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  5.95it/s]


Epoch 1: Train Loss = 0.1846, Val MSE = 0.1658, Val MAE = 0.3584


Epoch 2/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.04it/s]


Epoch 2: Train Loss = 0.1466, Val MSE = 0.0805, Val MAE = 0.2065


Epoch 3/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.02it/s]


Epoch 3: Train Loss = 0.0812, Val MSE = 0.0770, Val MAE = 0.1936


Epoch 4/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.02it/s]


Epoch 4: Train Loss = 0.0337, Val MSE = 0.0946, Val MAE = 0.2227
  ↪ No improvement for 1/3 epochs


Epoch 5/5 [Train]: 100%|██████████| 52/52 [00:08<00:00,  6.03it/s]


Epoch 5: Train Loss = 0.0201, Val MSE = 0.0736, Val MAE = 0.1945
✓ New best overall model (MSE: 0.0736)

Best Hyperparameters:
Learning Rate: 0.0001
Epochs: 5
Batch Size: 16
Validation MSE: 0.0736
Test MSE: 0.0733

All results saved in the 'bert_grid_search_20250503_180926' directory.


FlangBert + LoRA

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'SALT-NLP/FLANG-BERT'
lr = [1e-3]
epochs = [4]
task = 'FIQA'
dataset = dataset_FIQA
lora = True
adapter = False

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 821, Val: 176, Test: 176

Starting training with LR: 0.001, Epochs: 4


Some weights of BertModel were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.96it/s]


Epoch 1: Train Loss = 0.1672, Val MSE = 0.1676, Val MAE = 0.3537


Epoch 2/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.90it/s]


Epoch 2: Train Loss = 0.1290, Val MSE = 0.0867, Val MAE = 0.2162


Epoch 3/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.87it/s]


Epoch 3: Train Loss = 0.0666, Val MSE = 0.0859, Val MAE = 0.2117


Epoch 4/4 [Train]: 100%|██████████| 52/52 [00:05<00:00,  8.76it/s]


Epoch 4: Train Loss = 0.0358, Val MSE = 0.0618, Val MAE = 0.1782
✓ New best overall model (MSE: 0.0618)

Best Hyperparameters:
Learning Rate: 0.001
Epochs: 4
Batch Size: 16
Validation MSE: 0.0618
Test MSE: 0.0833

All results saved in the 'bert_grid_search_20250505_103931' directory.


FlangBert + AdapterH

In [None]:
# Defining parameters for hyperparameter_grid_search function
model = 'SALT-NLP/FLANG-BERT'
lr = [1e-3]
epochs = [11]
task = 'FIQA'
dataset = dataset_FIQA
lora = False
adapter = True

# Make sure the ./adapters output folder exists before adapter training
if adapter:
    os.makedirs("./adapters", exist_ok=True)

best_params = hyperparameter_grid_search(model, lr, epochs, task, dataset, lora=lora, adapter=adapter)

Using device: cuda
Dataset sizes - Train: 821, Val: 176, Test: 176

Starting training with LR: 0.001, Epochs: 11


Some weights of BertAdapterModel were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.38it/s]


Epoch 1: Train Loss = 0.1325, Val MSE = 0.1084, Val MAE = 0.2788


Epoch 2/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.36it/s]


Epoch 2: Train Loss = 0.0821, Val MSE = 0.0779, Val MAE = 0.1942


Epoch 3/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.32it/s]


Epoch 3: Train Loss = 0.0543, Val MSE = 0.0663, Val MAE = 0.1817


Epoch 4/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.41it/s]


Epoch 4: Train Loss = 0.0350, Val MSE = 0.0702, Val MAE = 0.1900
  ↪ No improvement for 1/3 epochs


Epoch 5/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.43it/s]


Epoch 5: Train Loss = 0.0264, Val MSE = 0.0685, Val MAE = 0.1790
  ↪ No improvement for 2/3 epochs


Epoch 6/11 [Train]: 100%|██████████| 52/52 [00:06<00:00,  8.43it/s]


Epoch 6: Train Loss = 0.0200, Val MSE = 0.0769, Val MAE = 0.1903
  ↪ No improvement for 3/3 epochs
Stopping early after 6 epochs.
✓ New best overall model (MSE: 0.0663)

Best Hyperparameters:
Learning Rate: 0.001
Epochs: 11
Batch Size: 16
Validation MSE: 0.0663
Test MSE: 0.0702

All results saved in the 'bert_grid_search_20250503_181211' directory.


# News Headline Classification

In [None]:
import os
import pandas as pd
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from torch.utils.data import TensorDataset, DataLoader, random_split
import torch.optim as optim
from sklearn.metrics import f1_score
from tqdm import tqdm
# LoRA & adapter imports
from peft import get_peft_model, LoraConfig, TaskType
import adapters
from adapters import AdapterConfig


## Loading Data

In [None]:
import kagglehub
import shutil
from IPython.display import clear_output
# Download the dataset from Kaggle Datasets (this notebook was run on Google Colab)
path = kagglehub.dataset_download("daittan/gold-commodity-news-and-dimensions")

# Copy every file from the downloaded dataset folder to /content for easy access
destination = "/content/"
for filename in os.listdir(path):
    full_file_path = os.path.join(path, filename)
    if os.path.isfile(full_file_path):
        shutil.copy(full_file_path, destination)

# Clear the download progress output
clear_output()
print("Path to dataset files:", path)
print(f"Files moved to {destination}")

Path to dataset files: /root/.cache/kagglehub/datasets/daittan/gold-commodity-news-and-dimensions/versions/1
Files moved to /content/


## Data Preprocessing

Function that tokenizes the news headlines with the tokenizer matching the chosen model and splits them into train (70%), validation (10%) and test (20%) dataloaders.

In [None]:
def prepare_data(model_name, dataset_path, batch_size, seed=42):
    label_names = [
        'Price or Not', 'Direction Up', 'Direction Constant', 'Direction Down',
        'PastPrice', 'FuturePrice', 'PastNews', 'FutureNews', 'Asset Comparision'
    ]
    df = pd.read_csv(dataset_path)
    # load the tokenizer that matches the model checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    tokens = tokenizer(df['News'].astype(str).tolist(), padding=True, truncation=True, return_tensors='pt')
    input_ids = tokens['input_ids']
    attention_masks = tokens['attention_mask']
    labels_tensor = torch.FloatTensor(df[label_names].values.astype(int))

    dataset = TensorDataset(input_ids, attention_masks, labels_tensor)

    torch.manual_seed(seed)
    total = len(dataset)
    test_size = int(0.2 * total)
    val_size = int(0.1 * total)
    train_size = total - val_size - test_size
    train_ds, val_ds, test_ds = random_split(dataset, [train_size, val_size, test_size])

    dataloaders = {
        'train': DataLoader(train_ds, batch_size=batch_size, shuffle=True),
        'val':   DataLoader(val_ds, batch_size=batch_size, shuffle=False),
        'test':  DataLoader(test_ds, batch_size=batch_size, shuffle=False)
    }

    return dataloaders, label_names
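`prepare_data` sizes the splits with `int()`. Test (20%) and validation (10%) are both rounded down, and the training set absorbs the remainder, so the three sizes always add up to the dataset length. Below is a small sketch of just that arithmetic. `split_sizes` is a hypothetical helper, and 1,005 is an arbitrary example size, not the size of the gold news dataset.

```python
from typing import Tuple


def split_sizes(total: int) -> Tuple[int, int, int]:
    """Mirror prepare_data's split sizes: (train, val, test)."""
    test_size = int(0.2 * total)   # 20%, rounded down
    val_size = int(0.1 * total)    # 10%, rounded down
    train_size = total - val_size - test_size  # remainder goes to training
    return train_size, val_size, test_size


print(split_sizes(1005))
```

As a design note, `torch.manual_seed(seed)` right before `random_split` makes the split reproducible through the global RNG. Passing an explicit `generator=torch.Generator().manual_seed(seed)` to `random_split` achieves the same without touching the global seed.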


## Training function
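`train_model` loads the model with `problem_type='multi_label_classification'` and sets `threshold = 0.5`. With that problem type, Hugging Face's `BertForSequenceClassification` trains with `BCEWithLogitsLoss`. Each of the nine labels therefore gets an independent sigmoid rather than a softmax across labels. Below is a minimal sketch of turning one headline's logits into 0/1 predictions. `logits_to_multi_hot` is a hypothetical helper, and the strict `>` comparison is an assumption about the evaluation loop.

```python
import math
from typing import List


def stable_sigmoid(z: float) -> float:
    """Sigmoid that avoids overflow in math.exp for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logits_to_multi_hot(logits: List[float], threshold: float = 0.5) -> List[int]:
    """Independent per-label decision: 1 if sigmoid(logit) > threshold, else 0."""
    return [1 if stable_sigmoid(z) > threshold else 0 for z in logits]


print(logits_to_multi_hot([2.0, -1.0, 0.5]))
```

With a threshold of 0.5 this is equivalent to checking `logit > 0`, since sigmoid(0) = 0.5. Tuning a separate threshold per label is a common alternative when some labels are rare.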

In [None]:
# Compute a weighted F1 score for each label separately (weighted over the 0/1 classes of that label), then average the per-label scores
def compute_per_class_f1(y_true, y_pred):
    scores = []
    for i in range(y_true.shape[1]):
        f1 = f1_score(y_true[:, i], y_pred[:, i], average='weighted', zero_division=0)
        scores.append(f1)
    return np.array(scores), np.mean(scores)
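`compute_per_class_f1` calls scikit-learn's `f1_score(..., average='weighted')` on each label column separately. For a 0/1 column, this scores both classes (0 and 1) and weights each class's F1 by its support in `y_true`. The returned mean is then an unweighted average over the labels. When a label column is dominated by one class, the weighted score can stay fairly high even if the minority class is often missed. The stdlib sketch below reproduces the computation for illustration, assuming 0/1 label values and `zero_division=0`. `per_label_f1` and its helpers are hypothetical names, not part of the notebook.

```python
from typing import List, Sequence, Tuple


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 of one class (0 or 1); 0.0 when undefined, like zero_division=0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Support-weighted average of the class-0 and class-1 F1 scores."""
    n = len(y_true)
    return sum(
        sum(1 for t in y_true if t == cls) / n * binary_f1(y_true, y_pred, cls)
        for cls in (0, 1)
    )


def per_label_f1(y_true_rows: List[List[int]], y_pred_rows: List[List[int]]) -> Tuple[List[float], float]:
    """Weighted F1 for every label column, plus their unweighted mean."""
    n_labels = len(y_true_rows[0])
    scores = []
    for j in range(n_labels):
        col_true = [row[j] for row in y_true_rows]
        col_pred = [row[j] for row in y_pred_rows]
        scores.append(weighted_f1(col_true, col_pred))
    return scores, sum(scores) / n_labels


# Toy data: 4 headlines, 2 labels (hypothetical values)
y_true = [[1, 0], [0, 0], [1, 0], [1, 0]]
y_pred = [[1, 0], [0, 0], [0, 0], [1, 0]]
scores, mean_f1 = per_label_f1(y_true, y_pred)
print([round(s, 4) for s in scores], round(mean_f1, 4))  # [0.7667, 1.0] 0.8833
```

In the first column, class 1 has F1 0.8 (support 3) and class 0 has F1 2/3 (support 1), giving (3·0.8 + 1·2/3)/4 ≈ 0.7667.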

In [None]:
def train_model(model_name, method, learning_rate, num_epochs, batch_size, seed, device, label_names, dataloaders):
    # Setup
    torch.manual_seed(seed)
    np.random.seed(seed)
    num_labels = len(label_names)
    patience = 3
    eps = 1e-2
    threshold = 0.5

    # loading the model
    base_model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=num_labels,
        problem_type='multi_label_classification'
    )

    # wrap the base model according to the chosen fine-tuning method
    if method == 'lora':
        lora_config = LoraConfig(
            task_type=TaskType.SEQ_CLS,
            inference_mode=False,
            r=8, lora_alpha=16, lora_dropout=0.05,
            target_modules=["query", "key", "value"],
            bias="none"
        )
        model = get_peft_model(base_model, lora_config)

        # print the number of trainable parameters to confirm that LoRA was applied
        total_params = sum(p.numel() for p in model.parameters())
        trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
        print(f"Trainable parameters in Lora case: {trainable_params:,} / {total_params:,}",f"({100 * trainable_params / total_params:.2f}%)")
    elif method == 'adapter':
        adapters.init(base_model)
        adapter_config = AdapterConfig.load("houlsby", reduction_factor=16)
        adapter_name = "finance_adapter"
        base_model.add_adapter(adapter_name, config=adapter_config)
        base_model.train_adapter(adapter_name)
        base_model.set_active_adapters(adapter_name)
        model = base_model

        # Print the number of trainable parameters to confirm that the adapter was applied
        total_params = sum(p.numel() for p in model.parameters())
        trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
        print(f"Trainable parameters in Adapter case: {trainable_params:,} / {total_params:,}",f"({100 * trainable_params / total_params:.2f}%)")
    else:
        model = base_model
        # Print the number of trainable parameters as a sanity check (all of them for full fine-tuning)
        total_params = sum(p.numel() for p in model.parameters())
        trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
        print(f"Trainable parameters in Bert case: {trainable_params:,} / {total_params:,}",f"({100 * trainable_params / total_params:.2f}%)")

    model.to(device)
    optimizer = optim.AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=learning_rate)


    best_val_loss = float('inf')
    best_val_f1 = 0.0
    early_stop_counter = 0

    # Actual training loop
    for epoch in range(1, num_epochs + 1):
        if early_stop_counter >= patience:
            print("  ==> Early stopping triggered.")
            break

        print(f"\n=> Epoch {epoch}/{num_epochs}")
        model.train()
        train_loss = 0.0
        train_loop = tqdm(dataloaders['train'], desc="Training", leave=False)
        for ids, masks, lbls in train_loop:
            ids, masks, lbls = ids.to(device), masks.to(device), lbls.to(device)
            optimizer.zero_grad()
            loss = model(input_ids=ids, attention_mask=masks, labels=lbls).loss
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * ids.size(0)
            train_loop.set_postfix(loss=loss.item())
        train_loss /= len(dataloaders['train'].dataset)

        # Validation
        model.eval()
        val_loss, all_true, all_pred = 0.0, [], []
        val_loop = tqdm(dataloaders['val'], desc="Validating", leave=False)
        with torch.no_grad():
            for ids, masks, lbls in val_loop:
                ids, masks, lbls = ids.to(device), masks.to(device), lbls.to(device)
                outputs = model(input_ids=ids, attention_mask=masks, labels=lbls)
                val_loss += outputs.loss.item() * ids.size(0)
                logits = outputs.logits
                preds = (torch.sigmoid(logits) > threshold).int()
                all_true.append(lbls.int())
                all_pred.append(preds)
        val_loss /= len(dataloaders['val'].dataset)
        y_true = torch.cat(all_true, dim=0).cpu().numpy()
        y_pred = torch.cat(all_pred, dim=0).cpu().numpy()
        per_class_f1, val_f1 = compute_per_class_f1(y_true, y_pred)

        # Print per-class weighted F1 scores
        print(f"\n=== Validation Results (Epoch {epoch}):")
        for idx, label in enumerate(label_names):
            print(f"  - {label}: {per_class_f1[idx]:.4f}")
        print(f"  === Mean Weighted F1: {val_f1:.4f}")

        if val_loss < best_val_loss - eps or val_f1 > best_val_f1 + eps:
            best_val_loss = min(best_val_loss, val_loss)
            best_val_f1 = max(best_val_f1, val_f1)
            early_stop_counter = 0
        else:
            early_stop_counter += 1

    # Test: evaluates the weights from the last epoch that ran, not necessarily the best validation epoch
    model.eval()
    test_loss, all_true, all_pred = 0.0, [], []
    test_loop = tqdm(dataloaders['test'], desc="Testing", leave=False)
    with torch.no_grad():
        for ids, masks, lbls in test_loop:
            ids, masks, lbls = ids.to(device), masks.to(device), lbls.to(device)
            outputs = model(input_ids=ids, attention_mask=masks, labels=lbls)
            test_loss += outputs.loss.item() * ids.size(0)
            preds = (torch.sigmoid(outputs.logits) > threshold).int()
            all_true.append(lbls.int())
            all_pred.append(preds)
    test_loss /= len(dataloaders['test'].dataset)
    y_true = torch.cat(all_true, dim=0).cpu().numpy()
    y_pred = torch.cat(all_pred, dim=0).cpu().numpy()
    _, test_f1 = compute_per_class_f1(y_true, y_pred)

    return {
        "LR": learning_rate,
        "Epochs": num_epochs,
        "ValLoss": best_val_loss,
        "ValMeanF1": best_val_f1,
        "TestLoss": test_loss,
        "TestMeanF1": test_f1
    }
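The stopping rule in `train_model` counts an epoch as an improvement when the validation loss drops by more than `eps` or the mean weighted F1 rises by more than `eps`; after `patience` consecutive non-improving epochs, the loop stops at the start of the next epoch (Run 15 in the logs below stops after 4 of its 5 epochs this way). Two consequences follow from the code: `ValLoss` and `ValMeanF1` are updated only on improving epochs, so a gain smaller than `eps` is not recorded, and the test metrics come from the last epoch's weights rather than the best validation epoch. The pure-Python sketch below replays the rule on per-epoch metrics; the helper `simulate_early_stopping` and its inputs are hypothetical.

```python
def simulate_early_stopping(val_losses, val_f1s, patience=3, eps=1e-2):
    """Replay train_model's stopping rule on precomputed per-epoch metrics.

    Returns (epochs_run, best_val_loss, best_val_f1)."""
    best_loss, best_f1 = float("inf"), 0.0
    counter = 0
    epochs_run = 0
    for loss, f1 in zip(val_losses, val_f1s):
        if counter >= patience:  # checked at the start of an epoch, as in train_model
            break
        epochs_run += 1
        if loss < best_loss - eps or f1 > best_f1 + eps:
            best_loss = min(best_loss, loss)
            best_f1 = max(best_f1, f1)
            counter = 0
        else:
            counter += 1
    return epochs_run, best_loss, best_f1

# Plateau after epoch 2: three non-improving epochs, then the sixth never starts.
print(simulate_early_stopping([1.0, 0.5, 0.5, 0.5, 0.5, 0.5],
                              [0.5, 0.6, 0.6, 0.6, 0.6, 0.6]))
# An F1 gain smaller than eps is not recorded as the best F1.
print(simulate_early_stopping([0.5, 0.5], [0.90, 0.905]))
```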

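The trainable-parameter counts printed by `train_model` can be checked by hand. With `r = 8`, LoRA gives each adapted `Linear(768, 768)` two low-rank matrices, A (8 × 768) and B (768 × 8), i.e. 12,288 extra parameters. The sketch assumes BERT-base dimensions (hidden size 768, 12 encoder layers) and the nine labels shown in the logs; it only does the arithmetic and does not query PEFT. As far as we know, PEFT's `TaskType.SEQ_CLS` also keeps the classification head trainable (via `modules_to_save`), so the printed LoRA count should be somewhat larger than the LoRA matrices alone; check that against the actual printout.

```python
def lora_extra_params(d_in, d_out, r):
    """Parameters LoRA adds to one Linear(d_in -> d_out): A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r

HIDDEN = 768      # assumed BERT-base hidden size
NUM_LAYERS = 12   # assumed BERT-base encoder layers
R = 8             # r in the LoraConfig of train_model
TARGETS = 3       # target_modules: "query", "key", "value"
NUM_LABELS = 9    # labels listed in the validation logs

lora_params = lora_extra_params(HIDDEN, HIDDEN, R) * TARGETS * NUM_LAYERS
classifier_params = HIDDEN * NUM_LABELS + NUM_LABELS  # weight + bias of the new head

print(f"LoRA matrices:   {lora_params:,}")
print(f"Classifier head: {classifier_params:,}")
# The full fine-tuning log reports 109,489,161 parameters in total;
# removing the head leaves the size of the BERT body.
print(f"BERT body:       {109_489_161 - classifier_params:,}")
```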
## Hyperparameter Search

In [None]:
def run_hyperparameter_search(model_name, method='full', dataset_path='/content/finalDataset_0208.csv',
                              output_dir='HyperParamSearchResults', learning_rates=(1e-5, 5e-5),
                              epochs_list=(3, 5), batch_size=16, seed=42):
    # making directory for saving the results
    os.makedirs(output_dir, exist_ok=True)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Prepare the data once so that every run uses the same splits
    dataloaders, label_names = prepare_data(
        model_name=model_name,
        dataset_path=dataset_path,
        batch_size=batch_size,
        seed=seed
    )

    all_results = []
    total_runs = len(learning_rates) * len(epochs_list)
    run_id = 0
    # Loop over every (learning rate, epochs) combination
    for lr in learning_rates:
        for epochs in epochs_list:
            run_id += 1
            print(f"\n🔍 Run {run_id}/{total_runs} | LR={lr}, Epochs={epochs}")
            result = train_model(
                model_name=model_name,
                method=method,
                learning_rate=lr,
                num_epochs=epochs,
                batch_size=batch_size,
                seed=seed,
                device=device,
                label_names=label_names,
                dataloaders=dataloaders
            )
            result["Method"] = method
            all_results.append(result)
    # Save all results to a CSV file and print the best configuration
    df_results = pd.DataFrame(all_results)
    df_results.to_csv(f"{output_dir}/grid_{method}_{model_name.replace('/', '_')}.csv", index=False)

    best_row = df_results.loc[df_results["ValMeanF1"].idxmax()]
    print("\n==> Best Configuration Found:")
    print(best_row)


### Hyperparameter search for BERT with full fine-tuning

The best configuration we found (mean weighted validation F1 of 0.9710) is:
- lr: 1e-5
- epochs: 3
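
`ValMeanF1` is the best validation score recorded during a run, not the score after its last epoch, so at lr = 1e-5 the 3-, 4- and 5-epoch runs all report 0.9710 (reached at epoch 3, as the logs below show). `DataFrame.idxmax()` returns the first of several tied rows, which is why the 3-epoch run is selected. A pandas-free sketch of the same selection rule, on a few rows read off the logs (rounded to four decimals as printed):

```python
# ValMeanF1 for a subset of the full fine-tuning runs, read off the logs below.
results = [
    {"LR": 1e-6, "Epochs": 3, "ValMeanF1": 0.9054},
    {"LR": 1e-6, "Epochs": 4, "ValMeanF1": 0.9349},
    {"LR": 1e-5, "Epochs": 3, "ValMeanF1": 0.9710},
    {"LR": 1e-5, "Epochs": 4, "ValMeanF1": 0.9710},
    {"LR": 1e-5, "Epochs": 5, "ValMeanF1": 0.9710},
]

# max() keeps the first of several equal maxima, like DataFrame.idxmax()
best = max(results, key=lambda row: row["ValMeanF1"])
print(best)  # {'LR': 1e-05, 'Epochs': 3, 'ValMeanF1': 0.971}
```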

In [None]:
run_hyperparameter_search(
    model_name="bert-base-uncased",
    method="full",
    learning_rates=[1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3],
    epochs_list=[3, 4, 5],
    batch_size=8  # batch size used in the original paper
)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
🔍 Run 1/21 | LR=1e-06, Epochs=3
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/3

=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4303
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7119
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7502

=> Epoch 2/3

=== Validation Results (Epoch 2):
  - Price or Not: 0.7985
  - Direction Up: 0.6499
  - Direction Constant: 0.9386
  - Direction Down: 0.6773
  - PastPrice: 0.8488
  - FuturePrice: 0.9503
  - PastNews: 0.7673
  - FutureNews: 0.9882
  - Asset Comparision: 0.7099
  === Mean Weighted F1: 0.8143

=> Epoch 3/3

=== Validation Results (Epoch 3):
  - Price or Not: 0.9021
  - Direction Up: 0.8191
  - Direction Constant: 0.9386
  - Direction Down: 0.8523
  - PastPrice: 0.9123
  - FuturePrice: 0.9503
  - PastNews: 0.8908
  - FutureNews: 0.9882
  - Asset Comparision: 0.8951
  === Mean Weighted F1: 0.9054

🔍 Run 2/21 | LR=1e-06, Epochs=4
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/4

=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4303
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7119
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7502

=> Epoch 2/4

=== Validation Results (Epoch 2):
  - Price or Not: 0.7985
  - Direction Up: 0.6499
  - Direction Constant: 0.9386
  - Direction Down: 0.6773
  - PastPrice: 0.8488
  - FuturePrice: 0.9503
  - PastNews: 0.7673
  - FutureNews: 0.9882
  - Asset Comparision: 0.7099
  === Mean Weighted F1: 0.8143

=> Epoch 3/4

=== Validation Results (Epoch 3):
  - Price or Not: 0.9021
  - Direction Up: 0.8191
  - Direction Constant: 0.9386
  - Direction Down: 0.8523
  - PastPrice: 0.9123
  - FuturePrice: 0.9503
  - PastNews: 0.8908
  - FutureNews: 0.9882
  - Asset Comparision: 0.8951
  === Mean Weighted F1: 0.9054

=> Epoch 4/4

=== Validation Results (Epoch 4):
  - Price or Not: 0.9274
  - Direction Up: 0.8974
  - Direction Constant: 0.9386
  - Direction Down: 0.9369
  - PastPrice: 0.9277
  - FuturePrice: 0.9525
  - PastNews: 0.9222
  - FutureNews: 0.9882
  - Asset Comparision: 0.9232
  === Mean Weighted F1: 0.9349

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
🔍 Run 3/21 | LR=1e-06, Epochs=5
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/5

=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4303
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7119
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7502

=> Epoch 2/5

=== Validation Results (Epoch 2):
  - Price or Not: 0.7985
  - Direction Up: 0.6499
  - Direction Constant: 0.9386
  - Direction Down: 0.6773
  - PastPrice: 0.8488
  - FuturePrice: 0.9503
  - PastNews: 0.7673
  - FutureNews: 0.9882
  - Asset Comparision: 0.7099
  === Mean Weighted F1: 0.8143

=> Epoch 3/5

=== Validation Results (Epoch 3):
  - Price or Not: 0.9021
  - Direction Up: 0.8191
  - Direction Constant: 0.9386
  - Direction Down: 0.8523
  - PastPrice: 0.9123
  - FuturePrice: 0.9503
  - PastNews: 0.8908
  - FutureNews: 0.9882
  - Asset Comparision: 0.8951
  === Mean Weighted F1: 0.9054

=> Epoch 4/5

=== Validation Results (Epoch 4):
  - Price or Not: 0.9274
  - Direction Up: 0.8974
  - Direction Constant: 0.9386
  - Direction Down: 0.9369
  - PastPrice: 0.9277
  - FuturePrice: 0.9525
  - PastNews: 0.9222
  - FutureNews: 0.9882
  - Asset Comparision: 0.9232
  === Mean Weighted F1: 0.9349

=> Epoch 5/5

=== Validation Results (Epoch 5):
  - Price or Not: 0.9295
  - Direction Up: 0.8979
  - Direction Constant: 0.9386
  - Direction Down: 0.9454
  - PastPrice: 0.9270
  - FuturePrice: 0.9620
  - PastNews: 0.9259
  - FutureNews: 0.9882
  - Asset Comparision: 0.9599
  === Mean Weighted F1: 0.9416

🔍 Run 4/21 | LR=1e-05, Epochs=3
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/3

=== Validation Results (Epoch 1):
  - Price or Not: 0.9257
  - Direction Up: 0.9474
  - Direction Constant: 0.9386
  - Direction Down: 0.9536
  - PastPrice: 0.9164
  - FuturePrice: 0.9794
  - PastNews: 0.9235
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9518

=> Epoch 2/3

=== Validation Results (Epoch 2):
  - Price or Not: 0.9424
  - Direction Up: 0.9457
  - Direction Constant: 0.9521
  - Direction Down: 0.9563
  - PastPrice: 0.9271
  - FuturePrice: 0.9756
  - PastNews: 0.9399
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9585

=> Epoch 3/3

=== Validation Results (Epoch 3):
  - Price or Not: 0.9577
  - Direction Up: 0.9508
  - Direction Constant: 0.9791
  - Direction Down: 0.9666
  - PastPrice: 0.9532
  - FuturePrice: 0.9899
  - PastNews: 0.9545
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9710

🔍 Run 5/21 | LR=1e-05, Epochs=4
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/4

=== Validation Results (Epoch 1):
  - Price or Not: 0.9257
  - Direction Up: 0.9474
  - Direction Constant: 0.9386
  - Direction Down: 0.9536
  - PastPrice: 0.9164
  - FuturePrice: 0.9794
  - PastNews: 0.9235
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9518

=> Epoch 2/4

=== Validation Results (Epoch 2):
  - Price or Not: 0.9424
  - Direction Up: 0.9457
  - Direction Constant: 0.9521
  - Direction Down: 0.9563
  - PastPrice: 0.9271
  - FuturePrice: 0.9756
  - PastNews: 0.9399
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9585

=> Epoch 3/4

=== Validation Results (Epoch 3):
  - Price or Not: 0.9577
  - Direction Up: 0.9508
  - Direction Constant: 0.9791
  - Direction Down: 0.9666
  - PastPrice: 0.9532
  - FuturePrice: 0.9899
  - PastNews: 0.9545
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9710

=> Epoch 4/4

=== Validation Results (Epoch 4):
  - Price or Not: 0.9388
  - Direction Up: 0.9492
  - Direction Constant: 0.9786
  - Direction Down: 0.9650
  - PastPrice: 0.9373
  - FuturePrice: 0.9887
  - PastNews: 0.9364
  - FutureNews: 0.9877
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9645

🔍 Run 6/21 | LR=1e-05, Epochs=5
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/5

=== Validation Results (Epoch 1):
  - Price or Not: 0.9257
  - Direction Up: 0.9474
  - Direction Constant: 0.9386
  - Direction Down: 0.9536
  - PastPrice: 0.9164
  - FuturePrice: 0.9794
  - PastNews: 0.9235
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9518

=> Epoch 2/5

=== Validation Results (Epoch 2):
  - Price or Not: 0.9424
  - Direction Up: 0.9457
  - Direction Constant: 0.9521
  - Direction Down: 0.9563
  - PastPrice: 0.9271
  - FuturePrice: 0.9756
  - PastNews: 0.9399
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9585

=> Epoch 3/5

=== Validation Results (Epoch 3):
  - Price or Not: 0.9577
  - Direction Up: 0.9508
  - Direction Constant: 0.9791
  - Direction Down: 0.9666
  - PastPrice: 0.9532
  - FuturePrice: 0.9899
  - PastNews: 0.9545
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9710

=> Epoch 4/5

=== Validation Results (Epoch 4):
  - Price or Not: 0.9388
  - Direction Up: 0.9492
  - Direction Constant: 0.9786
  - Direction Down: 0.9650
  - PastPrice: 0.9373
  - FuturePrice: 0.9887
  - PastNews: 0.9364
  - FutureNews: 0.9877
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9645

=> Epoch 5/5

=== Validation Results (Epoch 5):
  - Price or Not: 0.9387
  - Direction Up: 0.9310
  - Direction Constant: 0.9809
  - Direction Down: 0.9622
  - PastPrice: 0.9387
  - FuturePrice: 0.9900
  - PastNews: 0.9344
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9627

🔍 Run 7/21 | LR=2e-05, Epochs=3
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/3

=== Validation Results (Epoch 1):
  - Price or Not: 0.9347
  - Direction Up: 0.9472
  - Direction Constant: 0.9516
  - Direction Down: 0.9580
  - PastPrice: 0.9350
  - FuturePrice: 0.9794
  - PastNews: 0.9313
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9583

=> Epoch 2/3

=== Validation Results (Epoch 2):
  - Price or Not: 0.9357
  - Direction Up: 0.9319
  - Direction Constant: 0.9818
  - Direction Down: 0.9580
  - PastPrice: 0.9120
  - FuturePrice: 0.9748
  - PastNews: 0.9362
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9576

=> Epoch 3/3

=== Validation Results (Epoch 3):
  - Price or Not: 0.9522
  - Direction Up: 0.9409
  - Direction Constant: 0.9780
  - Direction Down: 0.9562
  - PastPrice: 0.9485
  - FuturePrice: 0.9899
  - PastNews: 0.9496
  - FutureNews: 0.9873
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9670

🔍 Run 8/21 | LR=2e-05, Epochs=4
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/4

=== Validation Results (Epoch 1):
  - Price or Not: 0.9347
  - Direction Up: 0.9472
  - Direction Constant: 0.9516
  - Direction Down: 0.9580
  - PastPrice: 0.9350
  - FuturePrice: 0.9794
  - PastNews: 0.9313
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9583

=> Epoch 2/4

=== Validation Results (Epoch 2):
  - Price or Not: 0.9357
  - Direction Up: 0.9319
  - Direction Constant: 0.9818
  - Direction Down: 0.9580
  - PastPrice: 0.9120
  - FuturePrice: 0.9748
  - PastNews: 0.9362
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9576

=> Epoch 3/4

=== Validation Results (Epoch 3):
  - Price or Not: 0.9522
  - Direction Up: 0.9409
  - Direction Constant: 0.9780
  - Direction Down: 0.9562
  - PastPrice: 0.9485
  - FuturePrice: 0.9899
  - PastNews: 0.9496
  - FutureNews: 0.9873
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9670

=> Epoch 4/4

=== Validation Results (Epoch 4):
  - Price or Not: 0.9453
  - Direction Up: 0.9492
  - Direction Constant: 0.9780
  - Direction Down: 0.9580
  - PastPrice: 0.9389
  - FuturePrice: 0.9909
  - PastNews: 0.9415
  - FutureNews: 0.9904
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9658

🔍 Run 9/21 | LR=2e-05, Epochs=5
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/5

=== Validation Results (Epoch 1):
  - Price or Not: 0.9347
  - Direction Up: 0.9472
  - Direction Constant: 0.9516
  - Direction Down: 0.9580
  - PastPrice: 0.9350
  - FuturePrice: 0.9794
  - PastNews: 0.9313
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9583

=> Epoch 2/5

=== Validation Results (Epoch 2):
  - Price or Not: 0.9357
  - Direction Up: 0.9319
  - Direction Constant: 0.9818
  - Direction Down: 0.9580
  - PastPrice: 0.9120
  - FuturePrice: 0.9748
  - PastNews: 0.9362
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9576

=> Epoch 3/5

=== Validation Results (Epoch 3):
  - Price or Not: 0.9522
  - Direction Up: 0.9409
  - Direction Constant: 0.9780
  - Direction Down: 0.9562
  - PastPrice: 0.9485
  - FuturePrice: 0.9899
  - PastNews: 0.9496
  - FutureNews: 0.9873
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9670

=> Epoch 4/5

=== Validation Results (Epoch 4):
  - Price or Not: 0.9453
  - Direction Up: 0.9492
  - Direction Constant: 0.9780
  - Direction Down: 0.9580
  - PastPrice: 0.9389
  - FuturePrice: 0.9909
  - PastNews: 0.9415
  - FutureNews: 0.9904
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9658

=> Epoch 5/5

=== Validation Results (Epoch 5):
  - Price or Not: 0.9553
  - Direction Up: 0.9474
  - Direction Constant: 0.9821
  - Direction Down: 0.9572
  - PastPrice: 0.9498
  - FuturePrice: 0.9899
  - PastNews: 0.9500
  - FutureNews: 0.9869
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9687

🔍 Run 10/21 | LR=3e-05, Epochs=3
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/3

=== Validation Results (Epoch 1):
  - Price or Not: 0.9348
  - Direction Up: 0.9401
  - Direction Constant: 0.9717
  - Direction Down: 0.9597
  - PastPrice: 0.9378
  - FuturePrice: 0.9812
  - PastNews: 0.9226
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9595

=> Epoch 2/3

=== Validation Results (Epoch 2):
  - Price or Not: 0.9345
  - Direction Up: 0.9467
  - Direction Constant: 0.9810
  - Direction Down: 0.9589
  - PastPrice: 0.9275
  - FuturePrice: 0.9844
  - PastNews: 0.9284
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9611

=> Epoch 3/3

=== Validation Results (Epoch 3):
  - Price or Not: 0.9584
  - Direction Up: 0.9463
  - Direction Constant: 0.9795
  - Direction Down: 0.9520
  - PastPrice: 0.9530
  - FuturePrice: 0.9927
  - PastNews: 0.9493
  - FutureNews: 0.9875
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9687

🔍 Run 11/21 | LR=3e-05, Epochs=4
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/4

=== Validation Results (Epoch 1):
  - Price or Not: 0.9348
  - Direction Up: 0.9401
  - Direction Constant: 0.9717
  - Direction Down: 0.9597
  - PastPrice: 0.9378
  - FuturePrice: 0.9812
  - PastNews: 0.9226
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9595

=> Epoch 2/4

=== Validation Results (Epoch 2):
  - Price or Not: 0.9345
  - Direction Up: 0.9467
  - Direction Constant: 0.9810
  - Direction Down: 0.9589
  - PastPrice: 0.9275
  - FuturePrice: 0.9844
  - PastNews: 0.9284
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9611

=> Epoch 3/4

=== Validation Results (Epoch 3):
  - Price or Not: 0.9584
  - Direction Up: 0.9463
  - Direction Constant: 0.9795
  - Direction Down: 0.9520
  - PastPrice: 0.9530
  - FuturePrice: 0.9927
  - PastNews: 0.9493
  - FutureNews: 0.9875
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9687

=> Epoch 4/4

=== Validation Results (Epoch 4):
  - Price or Not: 0.9417
  - Direction Up: 0.9501
  - Direction Constant: 0.9762
  - Direction Down: 0.9587
  - PastPrice: 0.9432
  - FuturePrice: 0.9897
  - PastNews: 0.9406
  - FutureNews: 0.9898
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9655

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
🔍 Run 12/21 | LR=3e-05, Epochs=5
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/5

=== Validation Results (Epoch 1):
  - Price or Not: 0.9348
  - Direction Up: 0.9401
  - Direction Constant: 0.9717
  - Direction Down: 0.9597
  - PastPrice: 0.9378
  - FuturePrice: 0.9812
  - PastNews: 0.9226
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9595

=> Epoch 2/5

=== Validation Results (Epoch 2):
  - Price or Not: 0.9345
  - Direction Up: 0.9467
  - Direction Constant: 0.9810
  - Direction Down: 0.9589
  - PastPrice: 0.9275
  - FuturePrice: 0.9844
  - PastNews: 0.9284
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9611

=> Epoch 3/5

=== Validation Results (Epoch 3):
  - Price or Not: 0.9584
  - Direction Up: 0.9463
  - Direction Constant: 0.9795
  - Direction Down: 0.9520
  - PastPrice: 0.9530
  - FuturePrice: 0.9927
  - PastNews: 0.9493
  - FutureNews: 0.9875
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9687

=> Epoch 4/5

=== Validation Results (Epoch 4):
  - Price or Not: 0.9417
  - Direction Up: 0.9501
  - Direction Constant: 0.9762
  - Direction Down: 0.9587
  - PastPrice: 0.9432
  - FuturePrice: 0.9897
  - PastNews: 0.9406
  - FutureNews: 0.9898
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9655

=> Epoch 5/5

=== Validation Results (Epoch 5):
  - Price or Not: 0.9417
  - Direction Up: 0.9405
  - Direction Constant: 0.9782
  - Direction Down: 0.9520
  - PastPrice: 0.9364
  - FuturePrice: 0.9900
  - PastNews: 0.9390
  - FutureNews: 0.9913
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9632

🔍 Run 13/21 | LR=5e-05, Epochs=3
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/3

=== Validation Results (Epoch 1):
  - Price or Not: 0.9406
  - Direction Up: 0.9320
  - Direction Constant: 0.9785
  - Direction Down: 0.9554
  - PastPrice: 0.9367
  - FuturePrice: 0.9847
  - PastNews: 0.9367
  - FutureNews: 0.9882
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9612

=> Epoch 2/3

=== Validation Results (Epoch 2):
  - Price or Not: 0.9376
  - Direction Up: 0.9423
  - Direction Constant: 0.9833
  - Direction Down: 0.9520
  - PastPrice: 0.9289
  - FuturePrice: 0.9880
  - PastNews: 0.9364
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9619

=> Epoch 3/3

=== Validation Results (Epoch 3):
  - Price or Not: 0.9427
  - Direction Up: 0.9500
  - Direction Constant: 0.9793
  - Direction Down: 0.9446
  - PastPrice: 0.9447
  - FuturePrice: 0.9907
  - PastNews: 0.9277
  - FutureNews: 0.9872
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9630

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
🔍 Run 14/21 | LR=5e-05, Epochs=4
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/4

=== Validation Results (Epoch 1):
  - Price or Not: 0.9406
  - Direction Up: 0.9320
  - Direction Constant: 0.9785
  - Direction Down: 0.9554
  - PastPrice: 0.9367
  - FuturePrice: 0.9847
  - PastNews: 0.9367
  - FutureNews: 0.9882
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9612

=> Epoch 2/4

=== Validation Results (Epoch 2):
  - Price or Not: 0.9376
  - Direction Up: 0.9423
  - Direction Constant: 0.9833
  - Direction Down: 0.9520
  - PastPrice: 0.9289
  - FuturePrice: 0.9880
  - PastNews: 0.9364
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9619

=> Epoch 3/4

=== Validation Results (Epoch 3):
  - Price or Not: 0.9427
  - Direction Up: 0.9500
  - Direction Constant: 0.9793
  - Direction Down: 0.9446
  - PastPrice: 0.9447
  - FuturePrice: 0.9907
  - PastNews: 0.9277
  - FutureNews: 0.9872
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9630

=> Epoch 4/4

=== Validation Results (Epoch 4):
  - Price or Not: 0.9451
  - Direction Up: 0.9388
  - Direction Constant: 0.9780
  - Direction Down: 0.9560
  - PastPrice: 0.9439
  - FuturePrice: 0.9875
  - PastNews: 0.9440
  - FutureNews: 0.9902
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9640

🔍 Run 15/21 | LR=5e-05, Epochs=5
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/5

=== Validation Results (Epoch 1):
  - Price or Not: 0.9406
  - Direction Up: 0.9320
  - Direction Constant: 0.9785
  - Direction Down: 0.9554
  - PastPrice: 0.9367
  - FuturePrice: 0.9847
  - PastNews: 0.9367
  - FutureNews: 0.9882
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9612

=> Epoch 2/5

=== Validation Results (Epoch 2):
  - Price or Not: 0.9376
  - Direction Up: 0.9423
  - Direction Constant: 0.9833
  - Direction Down: 0.9520
  - PastPrice: 0.9289
  - FuturePrice: 0.9880
  - PastNews: 0.9364
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9619

=> Epoch 3/5

=== Validation Results (Epoch 3):
  - Price or Not: 0.9427
  - Direction Up: 0.9500
  - Direction Constant: 0.9793
  - Direction Down: 0.9446
  - PastPrice: 0.9447
  - FuturePrice: 0.9907
  - PastNews: 0.9277
  - FutureNews: 0.9872
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9630

=> Epoch 4/5

=== Validation Results (Epoch 4):
  - Price or Not: 0.9451
  - Direction Up: 0.9388
  - Direction Constant: 0.9780
  - Direction Down: 0.9560
  - PastPrice: 0.9439
  - FuturePrice: 0.9875
  - PastNews: 0.9440
  - FutureNews: 0.9902
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9640
  ==> Early stopping triggered.

🔍 Run 16/21 | LR=0.0001, Epochs=3
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/3

=== Validation Results (Epoch 1):
  - Price or Not: 0.9113
  - Direction Up: 0.9241
  - Direction Constant: 0.9742
  - Direction Down: 0.9409
  - PastPrice: 0.9200
  - FuturePrice: 0.9839
  - PastNews: 0.9005
  - FutureNews: 0.9882
  - Asset Comparision: 0.9903
  === Mean Weighted F1: 0.9482

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.9345
  - Direction Up: 0.9344
  - Direction Constant: 0.9802
  - Direction Down: 0.9444
  - PastPrice: 0.9035
  - FuturePrice: 0.9853
  - PastNews: 0.9317
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9557

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.9513
  - Direction Up: 0.9351
  - Direction Constant: 0.9786
  - Direction Down: 0.9357
  - PastPrice: 0.9433
  - FuturePrice: 0.9868
  - PastNews: 0.9450
  - FutureNews: 0.9860
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9622





🔍 Run 17/21 | LR=0.0001, Epochs=4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/4





=== Validation Results (Epoch 1):
  - Price or Not: 0.9113
  - Direction Up: 0.9241
  - Direction Constant: 0.9742
  - Direction Down: 0.9409
  - PastPrice: 0.9200
  - FuturePrice: 0.9839
  - PastNews: 0.9005
  - FutureNews: 0.9882
  - Asset Comparision: 0.9903
  === Mean Weighted F1: 0.9482

=> Epoch 2/4





=== Validation Results (Epoch 2):
  - Price or Not: 0.9345
  - Direction Up: 0.9344
  - Direction Constant: 0.9802
  - Direction Down: 0.9444
  - PastPrice: 0.9035
  - FuturePrice: 0.9853
  - PastNews: 0.9317
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9557

=> Epoch 3/4





=== Validation Results (Epoch 3):
  - Price or Not: 0.9513
  - Direction Up: 0.9351
  - Direction Constant: 0.9786
  - Direction Down: 0.9357
  - PastPrice: 0.9433
  - FuturePrice: 0.9868
  - PastNews: 0.9450
  - FutureNews: 0.9860
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9622

=> Epoch 4/4





=== Validation Results (Epoch 4):
  - Price or Not: 0.9478
  - Direction Up: 0.9353
  - Direction Constant: 0.9783
  - Direction Down: 0.9341
  - PastPrice: 0.9350
  - FuturePrice: 0.9823
  - PastNews: 0.9385
  - FutureNews: 0.9891
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9600





🔍 Run 18/21 | LR=0.0001, Epochs=5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/5





=== Validation Results (Epoch 1):
  - Price or Not: 0.9113
  - Direction Up: 0.9241
  - Direction Constant: 0.9742
  - Direction Down: 0.9409
  - PastPrice: 0.9200
  - FuturePrice: 0.9839
  - PastNews: 0.9005
  - FutureNews: 0.9882
  - Asset Comparision: 0.9903
  === Mean Weighted F1: 0.9482

=> Epoch 2/5





=== Validation Results (Epoch 2):
  - Price or Not: 0.9345
  - Direction Up: 0.9344
  - Direction Constant: 0.9802
  - Direction Down: 0.9444
  - PastPrice: 0.9035
  - FuturePrice: 0.9853
  - PastNews: 0.9317
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9557

=> Epoch 3/5





=== Validation Results (Epoch 3):
  - Price or Not: 0.9513
  - Direction Up: 0.9351
  - Direction Constant: 0.9786
  - Direction Down: 0.9357
  - PastPrice: 0.9433
  - FuturePrice: 0.9868
  - PastNews: 0.9450
  - FutureNews: 0.9860
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9622

=> Epoch 4/5





=== Validation Results (Epoch 4):
  - Price or Not: 0.9478
  - Direction Up: 0.9353
  - Direction Constant: 0.9783
  - Direction Down: 0.9341
  - PastPrice: 0.9350
  - FuturePrice: 0.9823
  - PastNews: 0.9385
  - FutureNews: 0.9891
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9600

=> Epoch 5/5





=== Validation Results (Epoch 5):
  - Price or Not: 0.9326
  - Direction Up: 0.9232
  - Direction Constant: 0.9756
  - Direction Down: 0.9349
  - PastPrice: 0.9333
  - FuturePrice: 0.9860
  - PastNews: 0.9283
  - FutureNews: 0.9882
  - Asset Comparision: 0.9974
  === Mean Weighted F1: 0.9555





🔍 Run 19/21 | LR=0.001, Epochs=3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483


🔍 Run 20/21 | LR=0.001, Epochs=4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/4





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/4





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 3/4





=== Validation Results (Epoch 3):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 4/4





=== Validation Results (Epoch 4):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483





🔍 Run 21/21 | LR=0.001, Epochs=5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/5





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/5





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 3/5





=== Validation Results (Epoch 3):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 4/5





=== Validation Results (Epoch 4):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483
  ==> Early stopping triggered.




==> Best Configuration Found:
LR             0.00001
Epochs               3
ValLoss       0.098205
ValMeanF1     0.971015
TestLoss      0.096368
TestMeanF1    0.968794
Method            full
Name: 3, dtype: object
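The per-epoch "Mean Weighted F1" in the logs appears to be the plain, unweighted average of the nine per-task weighted F1 scores. Recomputing it from the logged values reproduces the printed figure to four decimals. The check below uses the epoch-1 scores of Run 16 (LR=1e-4, 3 epochs) copied from the log. The helper name `mean_weighted_f1` is ours, not the notebook's, and the averaging rule is inferred from the numbers rather than read from the notebook's code.

```python
from statistics import fmean

# Per-task weighted F1 scores copied from the log: Run 16 (LR=1e-4), epoch 1.
run16_epoch1 = {
    "Price or Not": 0.9113,
    "Direction Up": 0.9241,
    "Direction Constant": 0.9742,
    "Direction Down": 0.9409,
    "PastPrice": 0.9200,
    "FuturePrice": 0.9839,
    "PastNews": 0.9005,
    "FutureNews": 0.9882,
    "Asset Comparision": 0.9903,  # key spelled exactly as in the log
}


def mean_weighted_f1(per_task: dict[str, float]) -> float:
    """Unweighted mean of per-task weighted F1 scores (assumed aggregation rule)."""
    return fmean(per_task.values())


print(f"{mean_weighted_f1(run16_epoch1):.4f}")  # logged value: 0.9482
```

Each per-task score is already rounded to four decimals. Agreement with the logged mean should therefore only be expected to within about 1e-4.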




### Hyperparameter search for BERT with LoRA fine-tuning

The best hyperparameters we found for this setting are:
- lr: 1e-3
- epochs: 3
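The "Trainable parameters" lines in the LoRA runs can be cross-checked against the full fine-tuning runs with a little arithmetic. The sketch below assumes a bert-base hidden size of 768 and a classification head with nine outputs (one per sub-task in the validation report). All names are ours.

The LoRA total exceeds the full-model total by exactly the trainable count. This is consistent with all 449,289 trainable parameters being new tensors added alongside the frozen base weights. For example, they could be LoRA matrices plus a trainable copy of the classifier head, as PEFT's `modules_to_save` would create. This is an inference from the counts, not something the log states, and the notebook's exact LoRA configuration is not shown in this section.

```python
# Parameter counts copied from the "Trainable parameters" log lines.
FULL_TOTAL = 109_489_161    # full fine-tuning: every parameter is trainable
LORA_TRAINABLE = 449_289    # LoRA case: trainable parameters
LORA_TOTAL = 109_938_450    # LoRA case: total parameters after wrapping

HIDDEN = 768     # bert-base hidden size
NUM_LABELS = 9   # assumption: one classifier output per headline sub-task


def pct(part: int, whole: int) -> str:
    """Format a ratio the way the log does: two decimals and a percent sign."""
    return f"{100 * part / whole:.2f}%"


classifier_params = HIDDEN * NUM_LABELS + NUM_LABELS  # weight + bias
added_params = LORA_TOTAL - FULL_TOTAL                # growth caused by the LoRA wrapper

print(pct(FULL_TOTAL, FULL_TOTAL))             # 100.00%
print(pct(LORA_TRAINABLE, LORA_TOTAL))         # 0.41%
print(added_params == LORA_TRAINABLE)          # True
print(LORA_TRAINABLE - classifier_params)      # 442368 (non-head trainable params)
```

Several combinations of LoRA rank and target modules yield 442,368 adapter parameters. The counts alone therefore do not identify the configuration used.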

In [None]:
run_hyperparameter_search(
    model_name="bert-base-uncased",
    method="lora",
    learning_rates=[1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3],
    epochs_list=[3, 4, 5],
    batch_size=16,  # as in the original paper
)


🔍 Run 1/21 | LR=1e-06, Epochs=3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.3117
  - Direction Constant: 0.9386
  - Direction Down: 0.2034
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.4158
  - FutureNews: 0.0001
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.5537

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.3682
  - Direction Constant: 0.9386
  - Direction Down: 0.2197
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7588
  - FutureNews: 0.2641
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.6293

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.7541
  - Direction Up: 0.4875
  - Direction Constant: 0.9386
  - Direction Down: 0.2134
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7710
  - FutureNews: 0.9776
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7224





🔍 Run 2/21 | LR=1e-06, Epochs=4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/4





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.3117
  - Direction Constant: 0.9386
  - Direction Down: 0.2034
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.4158
  - FutureNews: 0.0001
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.5537

=> Epoch 2/4





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.3682
  - Direction Constant: 0.9386
  - Direction Down: 0.2197
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7588
  - FutureNews: 0.2641
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.6293

=> Epoch 3/4





=== Validation Results (Epoch 3):
  - Price or Not: 0.7541
  - Direction Up: 0.4875
  - Direction Constant: 0.9386
  - Direction Down: 0.2134
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7710
  - FutureNews: 0.9776
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7224

=> Epoch 4/4





=== Validation Results (Epoch 4):
  - Price or Not: 0.7541
  - Direction Up: 0.4144
  - Direction Constant: 0.9386
  - Direction Down: 0.4117
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7369





🔍 Run 3/21 | LR=1e-06, Epochs=5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/5





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.3117
  - Direction Constant: 0.9386
  - Direction Down: 0.2034
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.4158
  - FutureNews: 0.0001
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.5537

=> Epoch 2/5





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.3682
  - Direction Constant: 0.9386
  - Direction Down: 0.2197
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7588
  - FutureNews: 0.2641
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.6293

=> Epoch 3/5





=== Validation Results (Epoch 3):
  - Price or Not: 0.7541
  - Direction Up: 0.4875
  - Direction Constant: 0.9386
  - Direction Down: 0.2134
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7710
  - FutureNews: 0.9776
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7224

=> Epoch 4/5





=== Validation Results (Epoch 4):
  - Price or Not: 0.7541
  - Direction Up: 0.4144
  - Direction Constant: 0.9386
  - Direction Down: 0.4117
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7369

=> Epoch 5/5





=== Validation Results (Epoch 5):
  - Price or Not: 0.7541
  - Direction Up: 0.4160
  - Direction Constant: 0.9386
  - Direction Down: 0.5093
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7479


🔍 Run 4/21 | LR=1e-05, Epochs=3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.7782
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7280
  - FuturePrice: 0.9503
  - PastNews: 0.7855
  - FutureNews: 0.9877
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7555


🔍 Run 5/21 | LR=1e-05, Epochs=4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/4





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/4





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 3/4





=== Validation Results (Epoch 3):
  - Price or Not: 0.7782
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7280
  - FuturePrice: 0.9503
  - PastNews: 0.7855
  - FutureNews: 0.9877
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7555

=> Epoch 4/4





=== Validation Results (Epoch 4):
  - Price or Not: 0.8501
  - Direction Up: 0.4692
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.8596
  - FuturePrice: 0.9499
  - PastNews: 0.8364
  - FutureNews: 0.9873
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7894


🔍 Run 6/21 | LR=1e-05, Epochs=5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/5





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/5





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 3/5





=== Validation Results (Epoch 3):
  - Price or Not: 0.7782
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7280
  - FuturePrice: 0.9503
  - PastNews: 0.7855
  - FutureNews: 0.9877
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7555

=> Epoch 4/5





=== Validation Results (Epoch 4):
  - Price or Not: 0.8501
  - Direction Up: 0.4692
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.8596
  - FuturePrice: 0.9499
  - PastNews: 0.8364
  - FutureNews: 0.9873
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7894

=> Epoch 5/5





=== Validation Results (Epoch 5):
  - Price or Not: 0.8813
  - Direction Up: 0.6252
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.8915
  - FuturePrice: 0.9499
  - PastNews: 0.8707
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8176


🔍 Run 7/21 | LR=2e-05, Epochs=3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.8527
  - Direction Up: 0.4251
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.8563
  - FuturePrice: 0.9499
  - PastNews: 0.8420
  - FutureNews: 0.9877
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7851

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.8945
  - Direction Up: 0.6130
  - Direction Constant: 0.9386
  - Direction Down: 0.6572
  - PastPrice: 0.8958
  - FuturePrice: 0.9503
  - PastNews: 0.8847
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8360





🔍 Run 8/21 | LR=2e-05, Epochs=4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/4





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/4





=== Validation Results (Epoch 2):
  - Price or Not: 0.8527
  - Direction Up: 0.4251
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.8563
  - FuturePrice: 0.9499
  - PastNews: 0.8420
  - FutureNews: 0.9877
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7851

=> Epoch 3/4





=== Validation Results (Epoch 3):
  - Price or Not: 0.8945
  - Direction Up: 0.6130
  - Direction Constant: 0.9386
  - Direction Down: 0.6572
  - PastPrice: 0.8958
  - FuturePrice: 0.9503
  - PastNews: 0.8847
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8360

=> Epoch 4/4





=== Validation Results (Epoch 4):
  - Price or Not: 0.9114
  - Direction Up: 0.6043
  - Direction Constant: 0.9386
  - Direction Down: 0.6206
  - PastPrice: 0.9182
  - FuturePrice: 0.9499
  - PastNews: 0.9073
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8378


🔍 Run 9/21 | LR=2e-05, Epochs=5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/5





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/5





=== Validation Results (Epoch 2):
  - Price or Not: 0.8527
  - Direction Up: 0.4251
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.8563
  - FuturePrice: 0.9499
  - PastNews: 0.8420
  - FutureNews: 0.9877
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7851

=> Epoch 3/5





=== Validation Results (Epoch 3):
  - Price or Not: 0.8945
  - Direction Up: 0.6130
  - Direction Constant: 0.9386
  - Direction Down: 0.6572
  - PastPrice: 0.8958
  - FuturePrice: 0.9503
  - PastNews: 0.8847
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8360

=> Epoch 4/5





=== Validation Results (Epoch 4):
  - Price or Not: 0.9114
  - Direction Up: 0.6043
  - Direction Constant: 0.9386
  - Direction Down: 0.6206
  - PastPrice: 0.9182
  - FuturePrice: 0.9499
  - PastNews: 0.9073
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8378

=> Epoch 5/5





=== Validation Results (Epoch 5):
  - Price or Not: 0.9150
  - Direction Up: 0.6109
  - Direction Constant: 0.9386
  - Direction Down: 0.6314
  - PastPrice: 0.9058
  - FuturePrice: 0.9503
  - PastNews: 0.9108
  - FutureNews: 0.9882
  - Asset Comparision: 0.7963
  === Mean Weighted F1: 0.8497


🔍 Run 10/21 | LR=3e-05, Epochs=3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.7761
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7200
  - FuturePrice: 0.9499
  - PastNews: 0.7820
  - FutureNews: 0.9869
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7538

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.8887
  - Direction Up: 0.6150
  - Direction Constant: 0.9386
  - Direction Down: 0.5245
  - PastPrice: 0.9035
  - FuturePrice: 0.9503
  - PastNews: 0.8804
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8212

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.9056
  - Direction Up: 0.6018
  - Direction Constant: 0.9386
  - Direction Down: 0.6471
  - PastPrice: 0.9051
  - FuturePrice: 0.9503
  - PastNews: 0.8990
  - FutureNews: 0.9882
  - Asset Comparision: 0.7335
  === Mean Weighted F1: 0.8410


🔍 Run 11/21 | LR=3e-05, Epochs=4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/4





=== Validation Results (Epoch 1):
  - Price or Not: 0.7761
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7200
  - FuturePrice: 0.9499
  - PastNews: 0.7820
  - FutureNews: 0.9869
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7538

=> Epoch 2/4





=== Validation Results (Epoch 2):
  - Price or Not: 0.8887
  - Direction Up: 0.6150
  - Direction Constant: 0.9386
  - Direction Down: 0.5245
  - PastPrice: 0.9035
  - FuturePrice: 0.9503
  - PastNews: 0.8804
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8212

=> Epoch 3/4





=== Validation Results (Epoch 3):
  - Price or Not: 0.9056
  - Direction Up: 0.6018
  - Direction Constant: 0.9386
  - Direction Down: 0.6471
  - PastPrice: 0.9051
  - FuturePrice: 0.9503
  - PastNews: 0.8990
  - FutureNews: 0.9882
  - Asset Comparision: 0.7335
  === Mean Weighted F1: 0.8410

=> Epoch 4/4





=== Validation Results (Epoch 4):
  - Price or Not: 0.9270
  - Direction Up: 0.6206
  - Direction Constant: 0.9386
  - Direction Down: 0.6291
  - PastPrice: 0.9261
  - FuturePrice: 0.9503
  - PastNews: 0.9213
  - FutureNews: 0.9882
  - Asset Comparision: 0.8805
  === Mean Weighted F1: 0.8646





🔍 Run 12/21 | LR=3e-05, Epochs=5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/5





=== Validation Results (Epoch 1):
  - Price or Not: 0.7761
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7200
  - FuturePrice: 0.9499
  - PastNews: 0.7820
  - FutureNews: 0.9869
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7538

=> Epoch 2/5





=== Validation Results (Epoch 2):
  - Price or Not: 0.8887
  - Direction Up: 0.6150
  - Direction Constant: 0.9386
  - Direction Down: 0.5245
  - PastPrice: 0.9035
  - FuturePrice: 0.9503
  - PastNews: 0.8804
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8212

=> Epoch 3/5





=== Validation Results (Epoch 3):
  - Price or Not: 0.9056
  - Direction Up: 0.6018
  - Direction Constant: 0.9386
  - Direction Down: 0.6471
  - PastPrice: 0.9051
  - FuturePrice: 0.9503
  - PastNews: 0.8990
  - FutureNews: 0.9882
  - Asset Comparision: 0.7335
  === Mean Weighted F1: 0.8410

=> Epoch 4/5





=== Validation Results (Epoch 4):
  - Price or Not: 0.9270
  - Direction Up: 0.6206
  - Direction Constant: 0.9386
  - Direction Down: 0.6291
  - PastPrice: 0.9261
  - FuturePrice: 0.9503
  - PastNews: 0.9213
  - FutureNews: 0.9882
  - Asset Comparision: 0.8805
  === Mean Weighted F1: 0.8646

=> Epoch 5/5





=== Validation Results (Epoch 5):
  - Price or Not: 0.9262
  - Direction Up: 0.6134
  - Direction Constant: 0.9386
  - Direction Down: 0.6778
  - PastPrice: 0.9130
  - FuturePrice: 0.9503
  - PastNews: 0.9179
  - FutureNews: 0.9882
  - Asset Comparision: 0.9247
  === Mean Weighted F1: 0.8722


🔍 Run 13/21 | LR=5e-05, Epochs=3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.8667
  - Direction Up: 0.6016
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.8906
  - FuturePrice: 0.9486
  - PastNews: 0.8566
  - FutureNews: 0.9860
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8113

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.9054
  - Direction Up: 0.6032
  - Direction Constant: 0.9386
  - Direction Down: 0.6248
  - PastPrice: 0.9179
  - FuturePrice: 0.9503
  - PastNews: 0.8968
  - FutureNews: 0.9882
  - Asset Comparision: 0.7639
  === Mean Weighted F1: 0.8432

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.9177
  - Direction Up: 0.6132
  - Direction Constant: 0.9386
  - Direction Down: 0.6600
  - PastPrice: 0.9140
  - FuturePrice: 0.9503
  - PastNews: 0.9116
  - FutureNews: 0.9882
  - Asset Comparision: 0.9264
  === Mean Weighted F1: 0.8689


🔍 Run 14/21 | LR=5e-05, Epochs=4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/4





=== Validation Results (Epoch 1):
  - Price or Not: 0.8667
  - Direction Up: 0.6016
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.8906
  - FuturePrice: 0.9486
  - PastNews: 0.8566
  - FutureNews: 0.9860
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8113

=> Epoch 2/4





=== Validation Results (Epoch 2):
  - Price or Not: 0.9054
  - Direction Up: 0.6032
  - Direction Constant: 0.9386
  - Direction Down: 0.6248
  - PastPrice: 0.9179
  - FuturePrice: 0.9503
  - PastNews: 0.8968
  - FutureNews: 0.9882
  - Asset Comparision: 0.7639
  === Mean Weighted F1: 0.8432

=> Epoch 3/4





=== Validation Results (Epoch 3):
  - Price or Not: 0.9177
  - Direction Up: 0.6132
  - Direction Constant: 0.9386
  - Direction Down: 0.6600
  - PastPrice: 0.9140
  - FuturePrice: 0.9503
  - PastNews: 0.9116
  - FutureNews: 0.9882
  - Asset Comparision: 0.9264
  === Mean Weighted F1: 0.8689

=> Epoch 4/4





=== Validation Results (Epoch 4):
  - Price or Not: 0.9321
  - Direction Up: 0.6493
  - Direction Constant: 0.9386
  - Direction Down: 0.6527
  - PastPrice: 0.9344
  - FuturePrice: 0.9503
  - PastNews: 0.9322
  - FutureNews: 0.9882
  - Asset Comparision: 0.9660
  === Mean Weighted F1: 0.8826





🔍 Run 15/21 | LR=5e-05, Epochs=5


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/5





=== Validation Results (Epoch 1):
  - Price or Not: 0.8667
  - Direction Up: 0.6016
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.8906
  - FuturePrice: 0.9486
  - PastNews: 0.8566
  - FutureNews: 0.9860
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8113

=> Epoch 2/5





=== Validation Results (Epoch 2):
  - Price or Not: 0.9054
  - Direction Up: 0.6032
  - Direction Constant: 0.9386
  - Direction Down: 0.6248
  - PastPrice: 0.9179
  - FuturePrice: 0.9503
  - PastNews: 0.8968
  - FutureNews: 0.9882
  - Asset Comparision: 0.7639
  === Mean Weighted F1: 0.8432

=> Epoch 3/5





=== Validation Results (Epoch 3):
  - Price or Not: 0.9177
  - Direction Up: 0.6132
  - Direction Constant: 0.9386
  - Direction Down: 0.6600
  - PastPrice: 0.9140
  - FuturePrice: 0.9503
  - PastNews: 0.9116
  - FutureNews: 0.9882
  - Asset Comparision: 0.9264
  === Mean Weighted F1: 0.8689

=> Epoch 4/5





=== Validation Results (Epoch 4):
  - Price or Not: 0.9321
  - Direction Up: 0.6493
  - Direction Constant: 0.9386
  - Direction Down: 0.6527
  - PastPrice: 0.9344
  - FuturePrice: 0.9503
  - PastNews: 0.9322
  - FutureNews: 0.9882
  - Asset Comparision: 0.9660
  === Mean Weighted F1: 0.8826

=> Epoch 5/5





=== Validation Results (Epoch 5):
  - Price or Not: 0.9317
  - Direction Up: 0.6955
  - Direction Constant: 0.9386
  - Direction Down: 0.7819
  - PastPrice: 0.9247
  - FuturePrice: 0.9503
  - PastNews: 0.9249
  - FutureNews: 0.9882
  - Asset Comparision: 0.9778
  === Mean Weighted F1: 0.9015


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 16/21 | LR=0.0001, Epochs=3
Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.9067
  - Direction Up: 0.6122
  - Direction Constant: 0.9386
  - Direction Down: 0.6520
  - PastPrice: 0.9058
  - FuturePrice: 0.9503
  - PastNews: 0.9010
  - FutureNews: 0.9882
  - Asset Comparision: 0.7553
  === Mean Weighted F1: 0.8456

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.9224
  - Direction Up: 0.6450
  - Direction Constant: 0.9386
  - Direction Down: 0.6982
  - PastPrice: 0.9333
  - FuturePrice: 0.9503
  - PastNews: 0.9222
  - FutureNews: 0.9882
  - Asset Comparision: 0.9623
  === Mean Weighted F1: 0.8845

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.9322
  - Direction Up: 0.8367
  - Direction Constant: 0.9386
  - Direction Down: 0.8813
  - PastPrice: 0.9377
  - FuturePrice: 0.9503
  - PastNews: 0.9199
  - FutureNews: 0.9882
  - Asset Comparision: 0.9841
  === Mean Weighted F1: 0.9299





🔍 Run 17/21 | LR=0.0001, Epochs=4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/4





=== Validation Results (Epoch 1):
  - Price or Not: 0.9067
  - Direction Up: 0.6122
  - Direction Constant: 0.9386
  - Direction Down: 0.6520
  - PastPrice: 0.9058
  - FuturePrice: 0.9503
  - PastNews: 0.9010
  - FutureNews: 0.9882
  - Asset Comparision: 0.7553
  === Mean Weighted F1: 0.8456

=> Epoch 2/4





=== Validation Results (Epoch 2):
  - Price or Not: 0.9224
  - Direction Up: 0.6450
  - Direction Constant: 0.9386
  - Direction Down: 0.6982
  - PastPrice: 0.9333
  - FuturePrice: 0.9503
  - PastNews: 0.9222
  - FutureNews: 0.9882
  - Asset Comparision: 0.9623
  === Mean Weighted F1: 0.8845

=> Epoch 3/4





=== Validation Results (Epoch 3):
  - Price or Not: 0.9322
  - Direction Up: 0.8367
  - Direction Constant: 0.9386
  - Direction Down: 0.8813
  - PastPrice: 0.9377
  - FuturePrice: 0.9503
  - PastNews: 0.9199
  - FutureNews: 0.9882
  - Asset Comparision: 0.9841
  === Mean Weighted F1: 0.9299

=> Epoch 4/4





=== Validation Results (Epoch 4):
  - Price or Not: 0.9490
  - Direction Up: 0.9229
  - Direction Constant: 0.9386
  - Direction Down: 0.9454
  - PastPrice: 0.9327
  - FuturePrice: 0.9499
  - PastNews: 0.9436
  - FutureNews: 0.9896
  - Asset Comparision: 0.9850
  === Mean Weighted F1: 0.9508


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 18/21 | LR=0.0001, Epochs=5
Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/5





=== Validation Results (Epoch 1):
  - Price or Not: 0.9067
  - Direction Up: 0.6122
  - Direction Constant: 0.9386
  - Direction Down: 0.6520
  - PastPrice: 0.9058
  - FuturePrice: 0.9503
  - PastNews: 0.9010
  - FutureNews: 0.9882
  - Asset Comparision: 0.7553
  === Mean Weighted F1: 0.8456

=> Epoch 2/5





=== Validation Results (Epoch 2):
  - Price or Not: 0.9224
  - Direction Up: 0.6450
  - Direction Constant: 0.9386
  - Direction Down: 0.6982
  - PastPrice: 0.9333
  - FuturePrice: 0.9503
  - PastNews: 0.9222
  - FutureNews: 0.9882
  - Asset Comparision: 0.9623
  === Mean Weighted F1: 0.8845

=> Epoch 3/5





=== Validation Results (Epoch 3):
  - Price or Not: 0.9322
  - Direction Up: 0.8367
  - Direction Constant: 0.9386
  - Direction Down: 0.8813
  - PastPrice: 0.9377
  - FuturePrice: 0.9503
  - PastNews: 0.9199
  - FutureNews: 0.9882
  - Asset Comparision: 0.9841
  === Mean Weighted F1: 0.9299

=> Epoch 4/5





=== Validation Results (Epoch 4):
  - Price or Not: 0.9490
  - Direction Up: 0.9229
  - Direction Constant: 0.9386
  - Direction Down: 0.9454
  - PastPrice: 0.9327
  - FuturePrice: 0.9499
  - PastNews: 0.9436
  - FutureNews: 0.9896
  - Asset Comparision: 0.9850
  === Mean Weighted F1: 0.9508

=> Epoch 5/5





=== Validation Results (Epoch 5):
  - Price or Not: 0.9484
  - Direction Up: 0.9301
  - Direction Constant: 0.9386
  - Direction Down: 0.9507
  - PastPrice: 0.9389
  - FuturePrice: 0.9782
  - PastNews: 0.9412
  - FutureNews: 0.9882
  - Asset Comparision: 0.9904
  === Mean Weighted F1: 0.9561





🔍 Run 19/21 | LR=0.001, Epochs=3


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.9455
  - Direction Up: 0.9211
  - Direction Constant: 0.9456
  - Direction Down: 0.9511
  - PastPrice: 0.9339
  - FuturePrice: 0.9818
  - PastNews: 0.9423
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9558

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.9517
  - Direction Up: 0.9405
  - Direction Constant: 0.9617
  - Direction Down: 0.9396
  - PastPrice: 0.9455
  - FuturePrice: 0.9858
  - PastNews: 0.9487
  - FutureNews: 0.9877
  - Asset Comparision: 0.9921
  === Mean Weighted F1: 0.9615

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.9497
  - Direction Up: 0.9333
  - Direction Constant: 0.9739
  - Direction Down: 0.9474
  - PastPrice: 0.9359
  - FuturePrice: 0.9844
  - PastNews: 0.9459
  - FutureNews: 0.9882
  - Asset Comparision: 0.9965
  === Mean Weighted F1: 0.9617


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 20/21 | LR=0.001, Epochs=4
Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/4





=== Validation Results (Epoch 1):
  - Price or Not: 0.9455
  - Direction Up: 0.9211
  - Direction Constant: 0.9456
  - Direction Down: 0.9511
  - PastPrice: 0.9339
  - FuturePrice: 0.9818
  - PastNews: 0.9423
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9558

=> Epoch 2/4





=== Validation Results (Epoch 2):
  - Price or Not: 0.9517
  - Direction Up: 0.9405
  - Direction Constant: 0.9617
  - Direction Down: 0.9396
  - PastPrice: 0.9455
  - FuturePrice: 0.9858
  - PastNews: 0.9487
  - FutureNews: 0.9877
  - Asset Comparision: 0.9921
  === Mean Weighted F1: 0.9615

=> Epoch 3/4





=== Validation Results (Epoch 3):
  - Price or Not: 0.9497
  - Direction Up: 0.9333
  - Direction Constant: 0.9739
  - Direction Down: 0.9474
  - PastPrice: 0.9359
  - FuturePrice: 0.9844
  - PastNews: 0.9459
  - FutureNews: 0.9882
  - Asset Comparision: 0.9965
  === Mean Weighted F1: 0.9617

=> Epoch 4/4





=== Validation Results (Epoch 4):
  - Price or Not: 0.9430
  - Direction Up: 0.9419
  - Direction Constant: 0.9742
  - Direction Down: 0.9481
  - PastPrice: 0.9448
  - FuturePrice: 0.9877
  - PastNews: 0.9440
  - FutureNews: 0.9882
  - Asset Comparision: 0.9956
  === Mean Weighted F1: 0.9630


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 21/21 | LR=0.001, Epochs=5
Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/5





=== Validation Results (Epoch 1):
  - Price or Not: 0.9455
  - Direction Up: 0.9211
  - Direction Constant: 0.9456
  - Direction Down: 0.9511
  - PastPrice: 0.9339
  - FuturePrice: 0.9818
  - PastNews: 0.9423
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9558

=> Epoch 2/5





=== Validation Results (Epoch 2):
  - Price or Not: 0.9517
  - Direction Up: 0.9405
  - Direction Constant: 0.9617
  - Direction Down: 0.9396
  - PastPrice: 0.9455
  - FuturePrice: 0.9858
  - PastNews: 0.9487
  - FutureNews: 0.9877
  - Asset Comparision: 0.9921
  === Mean Weighted F1: 0.9615

=> Epoch 3/5





=== Validation Results (Epoch 3):
  - Price or Not: 0.9497
  - Direction Up: 0.9333
  - Direction Constant: 0.9739
  - Direction Down: 0.9474
  - PastPrice: 0.9359
  - FuturePrice: 0.9844
  - PastNews: 0.9459
  - FutureNews: 0.9882
  - Asset Comparision: 0.9965
  === Mean Weighted F1: 0.9617

=> Epoch 4/5





=== Validation Results (Epoch 4):
  - Price or Not: 0.9430
  - Direction Up: 0.9419
  - Direction Constant: 0.9742
  - Direction Down: 0.9481
  - PastPrice: 0.9448
  - FuturePrice: 0.9877
  - PastNews: 0.9440
  - FutureNews: 0.9882
  - Asset Comparision: 0.9956
  === Mean Weighted F1: 0.9630

=> Epoch 5/5





=== Validation Results (Epoch 5):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483




==> Best Configuration Found:
LR               0.001
Epochs               3
ValLoss       0.108391
ValMeanF1     0.961674
TestLoss      0.108498
TestMeanF1    0.960967
Method            lora
Name: 18, dtype: object
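The "Mean Weighted F1" printed after every epoch appears to be a plain average of the nine per-task weighted F1 scores. For example, the epoch-3 scores of Run 19 average to 0.9617, which matches the logged value. The sketch below reproduces that computation in pure Python. `weighted_f1` and `mean_weighted_f1` are hypothetical helpers written for illustration, not functions from this notebook. The notebook more likely calls scikit-learn's `f1_score(..., average="weighted")` once per task.

```python
from collections import Counter

# The nine sub-task names exactly as they are printed in the logs above.
TASKS = [
    "Price or Not", "Direction Up", "Direction Constant", "Direction Down",
    "PastPrice", "FuturePrice", "PastNews", "FutureNews", "Asset Comparision",
]


def weighted_f1(y_true, y_pred):
    """Support-weighted F1 over the labels present in y_true.

    Same idea as scikit-learn's f1_score(average="weighted"). Labels that
    occur only in y_pred have zero support, so they would add nothing anyway.
    """
    support = Counter(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # per-class F1 = 2TP / (2TP + FP + FN)
        score += n * f1                         # weight each class by its support
    return score / len(y_true)


def mean_weighted_f1(per_task_scores):
    """Unweighted average of the per-task weighted F1 scores."""
    return sum(per_task_scores.values()) / len(per_task_scores)


# Usage: the epoch-3 validation scores of Run 19 (LR=0.001) copied from the log.
run19_epoch3 = dict(zip(TASKS, [0.9497, 0.9333, 0.9739, 0.9474, 0.9359,
                                0.9844, 0.9459, 0.9882, 0.9965]))
print(f"{mean_weighted_f1(run19_epoch3):.4f}")  # 0.9617
```

Because the final average is unweighted across tasks, an easy task such as FutureNews counts exactly as much as a harder one such as Direction Up.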




### Hyperparameter search for BERT in Adapter fine tuning

The best hyperparameters we found for this setting are:
- lr: 1e-3
- epoch: 6
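The run headers in the logs suggest the search loops over learning rates in the outer loop and epoch counts in the inner loop, giving 7 × 3 = 21 runs. The sketch below rebuilds those run labels from the search space used in the next cell. `grid_labels` is a hypothetical helper for illustration, not the notebook's `run_hyperparameter_search`.

```python
from itertools import product

# Search space copied from the adapter fine-tuning cell below.
learning_rates = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
epochs_list = [6, 9, 11]


def grid_labels(learning_rates, epochs_list):
    """Run labels in the order the logs show: LR outer loop, epochs inner loop."""
    grid = list(product(learning_rates, epochs_list))
    return [f"Run {i}/{len(grid)} | LR={lr}, Epochs={ep}"
            for i, (lr, ep) in enumerate(grid, start=1)]


labels = grid_labels(learning_rates, epochs_list)
# Every configuration is trained separately, so the total training cost is:
total_epochs = len(learning_rates) * sum(epochs_list)

print(len(labels))    # 21
print(labels[0])      # Run 1/21 | LR=1e-06, Epochs=6
print(labels[3])      # Run 4/21 | LR=1e-05, Epochs=6
print(total_epochs)   # 182
```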

In [None]:
run_hyperparameter_search(
    model_name="bert-base-uncased",
    method="adapter",
    learning_rates=[1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3],
    epochs_list=[6, 9, 11],
    batch_size=16,  # same batch size the original paper used for BERT with adapters
)





🔍 Run 1/21 | LR=1e-06, Epochs=6



Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/6





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.3216
  - Direction Constant: 0.9386
  - Direction Down: 0.2052
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.4287
  - FutureNews: 0.0088
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.5574

=> Epoch 2/6





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.3807
  - Direction Constant: 0.9386
  - Direction Down: 0.2095
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7808
  - FutureNews: 0.5426
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.6629

=> Epoch 3/6





=== Validation Results (Epoch 3):
  - Price or Not: 0.7541
  - Direction Up: 0.4488
  - Direction Constant: 0.9386
  - Direction Down: 0.2048
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7673
  - FutureNews: 0.9877
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7179

=> Epoch 4/6





=== Validation Results (Epoch 4):
  - Price or Not: 0.7541
  - Direction Up: 0.4160
  - Direction Constant: 0.9386
  - Direction Down: 0.5409
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7514

=> Epoch 5/6





=== Validation Results (Epoch 5):
  - Price or Not: 0.7541
  - Direction Up: 0.4173
  - Direction Constant: 0.9386
  - Direction Down: 0.5110
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7482

=> Epoch 6/6





=== Validation Results (Epoch 6):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 2/21 | LR=1e-06, Epochs=9
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/9





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.3216
  - Direction Constant: 0.9386
  - Direction Down: 0.2052
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.4287
  - FutureNews: 0.0088
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.5574

=> Epoch 2/9





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.3807
  - Direction Constant: 0.9386
  - Direction Down: 0.2095
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7808
  - FutureNews: 0.5426
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.6629

=> Epoch 3/9





=== Validation Results (Epoch 3):
  - Price or Not: 0.7541
  - Direction Up: 0.4488
  - Direction Constant: 0.9386
  - Direction Down: 0.2048
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7673
  - FutureNews: 0.9877
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7179

=> Epoch 4/9





=== Validation Results (Epoch 4):
  - Price or Not: 0.7541
  - Direction Up: 0.4160
  - Direction Constant: 0.9386
  - Direction Down: 0.5409
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7514

=> Epoch 5/9





=== Validation Results (Epoch 5):
  - Price or Not: 0.7541
  - Direction Up: 0.4173
  - Direction Constant: 0.9386
  - Direction Down: 0.5110
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7482

=> Epoch 6/9





=== Validation Results (Epoch 6):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 7/9





=== Validation Results (Epoch 7):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 8/9





=== Validation Results (Epoch 8):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 9/9





=== Validation Results (Epoch 9):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 3/21 | LR=1e-06, Epochs=11
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/11





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.3216
  - Direction Constant: 0.9386
  - Direction Down: 0.2052
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.4287
  - FutureNews: 0.0088
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.5574

=> Epoch 2/11





=== Validation Results (Epoch 2):
  - Price or Not: 0.7541
  - Direction Up: 0.3807
  - Direction Constant: 0.9386
  - Direction Down: 0.2095
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7808
  - FutureNews: 0.5426
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.6629

=> Epoch 3/11





=== Validation Results (Epoch 3):
  - Price or Not: 0.7541
  - Direction Up: 0.4488
  - Direction Constant: 0.9386
  - Direction Down: 0.2048
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7673
  - FutureNews: 0.9877
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7179

=> Epoch 4/11





=== Validation Results (Epoch 4):
  - Price or Not: 0.7541
  - Direction Up: 0.4160
  - Direction Constant: 0.9386
  - Direction Down: 0.5409
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7514

=> Epoch 5/11





=== Validation Results (Epoch 5):
  - Price or Not: 0.7541
  - Direction Up: 0.4173
  - Direction Constant: 0.9386
  - Direction Down: 0.5110
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7482

=> Epoch 6/11





=== Validation Results (Epoch 6):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 7/11





=== Validation Results (Epoch 7):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 8/11





=== Validation Results (Epoch 8):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 9/11





=== Validation Results (Epoch 9):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 10/11





=== Validation Results (Epoch 10):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 11/11





=== Validation Results (Epoch 11):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 4/21 | LR=1e-05, Epochs=6
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/6





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/6





=== Validation Results (Epoch 2):
  - Price or Not: 0.7705
  - Direction Up: 0.4173
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7139
  - FuturePrice: 0.9503
  - PastNews: 0.7694
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7513

=> Epoch 3/6





=== Validation Results (Epoch 3):
  - Price or Not: 0.8571
  - Direction Up: 0.5320
  - Direction Constant: 0.9386
  - Direction Down: 0.5134
  - PastPrice: 0.8705
  - FuturePrice: 0.9503
  - PastNews: 0.8406
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7992

=> Epoch 4/6





=== Validation Results (Epoch 4):
  - Price or Not: 0.8801
  - Direction Up: 0.5916
  - Direction Constant: 0.9386
  - Direction Down: 0.5860
  - PastPrice: 0.8948
  - FuturePrice: 0.9503
  - PastNews: 0.8669
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8220

=> Epoch 5/6





=== Validation Results (Epoch 5):
  - Price or Not: 0.8860
  - Direction Up: 0.5563
  - Direction Constant: 0.9386
  - Direction Down: 0.5968
  - PastPrice: 0.9064
  - FuturePrice: 0.9503
  - PastNews: 0.8718
  - FutureNews: 0.9882
  - Asset Comparision: 0.7120
  === Mean Weighted F1: 0.8229

=> Epoch 6/6





=== Validation Results (Epoch 6):
  - Price or Not: 0.8907
  - Direction Up: 0.6127
  - Direction Constant: 0.9386
  - Direction Down: 0.6016
  - PastPrice: 0.9143
  - FuturePrice: 0.9503
  - PastNews: 0.8838
  - FutureNews: 0.9882
  - Asset Comparision: 0.7782
  === Mean Weighted F1: 0.8398


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 5/21 | LR=1e-05, Epochs=9
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/9





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/9





=== Validation Results (Epoch 2):
  - Price or Not: 0.7705
  - Direction Up: 0.4173
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7139
  - FuturePrice: 0.9503
  - PastNews: 0.7694
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7513

=> Epoch 3/9





=== Validation Results (Epoch 3):
  - Price or Not: 0.8571
  - Direction Up: 0.5320
  - Direction Constant: 0.9386
  - Direction Down: 0.5134
  - PastPrice: 0.8705
  - FuturePrice: 0.9503
  - PastNews: 0.8406
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7992

=> Epoch 4/9





=== Validation Results (Epoch 4):
  - Price or Not: 0.8801
  - Direction Up: 0.5916
  - Direction Constant: 0.9386
  - Direction Down: 0.5860
  - PastPrice: 0.8948
  - FuturePrice: 0.9503
  - PastNews: 0.8669
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8220

=> Epoch 5/9





=== Validation Results (Epoch 5):
  - Price or Not: 0.8860
  - Direction Up: 0.5563
  - Direction Constant: 0.9386
  - Direction Down: 0.5968
  - PastPrice: 0.9064
  - FuturePrice: 0.9503
  - PastNews: 0.8718
  - FutureNews: 0.9882
  - Asset Comparision: 0.7120
  === Mean Weighted F1: 0.8229

=> Epoch 6/9





=== Validation Results (Epoch 6):
  - Price or Not: 0.8907
  - Direction Up: 0.6127
  - Direction Constant: 0.9386
  - Direction Down: 0.6016
  - PastPrice: 0.9143
  - FuturePrice: 0.9503
  - PastNews: 0.8838
  - FutureNews: 0.9882
  - Asset Comparision: 0.7782
  === Mean Weighted F1: 0.8398

=> Epoch 7/9





=== Validation Results (Epoch 7):
  - Price or Not: 0.8917
  - Direction Up: 0.5757
  - Direction Constant: 0.9386
  - Direction Down: 0.6311
  - PastPrice: 0.9162
  - FuturePrice: 0.9503
  - PastNews: 0.8875
  - FutureNews: 0.9882
  - Asset Comparision: 0.8419
  === Mean Weighted F1: 0.8468

=> Epoch 8/9





=== Validation Results (Epoch 8):
  - Price or Not: 0.9010
  - Direction Up: 0.6184
  - Direction Constant: 0.9386
  - Direction Down: 0.6069
  - PastPrice: 0.9276
  - FuturePrice: 0.9503
  - PastNews: 0.8949
  - FutureNews: 0.9882
  - Asset Comparision: 0.8432
  === Mean Weighted F1: 0.8521

=> Epoch 9/9





=== Validation Results (Epoch 9):
  - Price or Not: 0.9110
  - Direction Up: 0.6182
  - Direction Constant: 0.9386
  - Direction Down: 0.6154
  - PastPrice: 0.9300
  - FuturePrice: 0.9503
  - PastNews: 0.9017
  - FutureNews: 0.9882
  - Asset Comparision: 0.8666
  === Mean Weighted F1: 0.8578


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 6/21 | LR=1e-05, Epochs=11
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/11





=== Validation Results (Epoch 1):
  - Price or Not: 0.7541
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7652
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7483

=> Epoch 2/11





=== Validation Results (Epoch 2):
  - Price or Not: 0.7705
  - Direction Up: 0.4173
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7139
  - FuturePrice: 0.9503
  - PastNews: 0.7694
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7513

=> Epoch 3/11





=== Validation Results (Epoch 3):
  - Price or Not: 0.8571
  - Direction Up: 0.5320
  - Direction Constant: 0.9386
  - Direction Down: 0.5134
  - PastPrice: 0.8705
  - FuturePrice: 0.9503
  - PastNews: 0.8406
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7992

=> Epoch 4/11





=== Validation Results (Epoch 4):
  - Price or Not: 0.8801
  - Direction Up: 0.5916
  - Direction Constant: 0.9386
  - Direction Down: 0.5860
  - PastPrice: 0.8948
  - FuturePrice: 0.9503
  - PastNews: 0.8669
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8220

=> Epoch 5/11





=== Validation Results (Epoch 5):
  - Price or Not: 0.8860
  - Direction Up: 0.5563
  - Direction Constant: 0.9386
  - Direction Down: 0.5968
  - PastPrice: 0.9064
  - FuturePrice: 0.9503
  - PastNews: 0.8718
  - FutureNews: 0.9882
  - Asset Comparision: 0.7120
  === Mean Weighted F1: 0.8229

=> Epoch 6/11





=== Validation Results (Epoch 6):
  - Price or Not: 0.8907
  - Direction Up: 0.6127
  - Direction Constant: 0.9386
  - Direction Down: 0.6016
  - PastPrice: 0.9143
  - FuturePrice: 0.9503
  - PastNews: 0.8838
  - FutureNews: 0.9882
  - Asset Comparision: 0.7782
  === Mean Weighted F1: 0.8398

=> Epoch 7/11





=== Validation Results (Epoch 7):
  - Price or Not: 0.8917
  - Direction Up: 0.5757
  - Direction Constant: 0.9386
  - Direction Down: 0.6311
  - PastPrice: 0.9162
  - FuturePrice: 0.9503
  - PastNews: 0.8875
  - FutureNews: 0.9882
  - Asset Comparision: 0.8419
  === Mean Weighted F1: 0.8468

=> Epoch 8/11





=== Validation Results (Epoch 8):
  - Price or Not: 0.9010
  - Direction Up: 0.6184
  - Direction Constant: 0.9386
  - Direction Down: 0.6069
  - PastPrice: 0.9276
  - FuturePrice: 0.9503
  - PastNews: 0.8949
  - FutureNews: 0.9882
  - Asset Comparision: 0.8432
  === Mean Weighted F1: 0.8521

=> Epoch 9/11





=== Validation Results (Epoch 9):
  - Price or Not: 0.9110
  - Direction Up: 0.6182
  - Direction Constant: 0.9386
  - Direction Down: 0.6154
  - PastPrice: 0.9300
  - FuturePrice: 0.9503
  - PastNews: 0.9017
  - FutureNews: 0.9882
  - Asset Comparision: 0.8666
  === Mean Weighted F1: 0.8578

=> Epoch 10/11





=== Validation Results (Epoch 10):
  - Price or Not: 0.9092
  - Direction Up: 0.6081
  - Direction Constant: 0.9386
  - Direction Down: 0.6181
  - PastPrice: 0.9230
  - FuturePrice: 0.9503
  - PastNews: 0.9037
  - FutureNews: 0.9882
  - Asset Comparision: 0.8924
  === Mean Weighted F1: 0.8591

=> Epoch 11/11





=== Validation Results (Epoch 11):
  - Price or Not: 0.9152
  - Direction Up: 0.6234
  - Direction Constant: 0.9386
  - Direction Down: 0.6243
  - PastPrice: 0.9243
  - FuturePrice: 0.9503
  - PastNews: 0.9115
  - FutureNews: 0.9882
  - Asset Comparision: 0.9040
  === Mean Weighted F1: 0.8644
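Some subtask scores in this log never move: FutureNews stays at 0.9882 in every epoch shown, and FuturePrice (0.9503) and Direction Constant (0.9386) hold still for most early epochs. A frozen weighted F1 like this is consistent with the classifier predicting the majority class for every example, though the log alone cannot confirm it. The arithmetic: if the majority class of a binary label has share p and every prediction is that class, its F1 is 2p/(1+p) while the minority class scores 0, so the support-weighted F1 is p · 2p/(1+p) and stays fixed as long as the predictions do. The pure-Python sketch below uses a hypothetical `weighted_f1` helper that follows the definition behind scikit-learn's `f1_score(..., average="weighted")`, scoring undefined precision as 0, and checks this on toy labels.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1; undefined precision/recall count as 0."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total  # weight each class by its support
    return score


# Hypothetical labels: 99 negatives and 1 positive; the model always predicts the majority class 0.
y_true = [0] * 99 + [1]
y_pred = [0] * 100
p = 0.99  # majority-class share
print(weighted_f1(y_true, y_pred), p * 2 * p / (1 + p))
```

Both printed values agree, so any subtask whose predictions collapse to one class will report the same weighted F1 epoch after epoch.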


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 7/21 | LR=2e-05, Epochs=6
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/6
=== Validation Results (Epoch 1):
  - Price or Not: 0.7624
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7673
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7495

=> Epoch 2/6
=== Validation Results (Epoch 2):
  - Price or Not: 0.8842
  - Direction Up: 0.6099
  - Direction Constant: 0.9386
  - Direction Down: 0.5790
  - PastPrice: 0.8970
  - FuturePrice: 0.9503
  - PastNews: 0.8744
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8248

=> Epoch 3/6
=== Validation Results (Epoch 3):
  - Price or Not: 0.8948
  - Direction Up: 0.5596
  - Direction Constant: 0.9386
  - Direction Down: 0.5899
  - PastPrice: 0.9177
  - FuturePrice: 0.9503
  - PastNews: 0.8850
  - FutureNews: 0.9882
  - Asset Comparision: 0.7878
  === Mean Weighted F1: 0.8347

=> Epoch 4/6
=== Validation Results (Epoch 4):
  - Price or Not: 0.8935
  - Direction Up: 0.6185
  - Direction Constant: 0.9386
  - Direction Down: 0.6063
  - PastPrice: 0.9197
  - FuturePrice: 0.9503
  - PastNews: 0.8872
  - FutureNews: 0.9882
  - Asset Comparision: 0.8422
  === Mean Weighted F1: 0.8494

=> Epoch 5/6
=== Validation Results (Epoch 5):
  - Price or Not: 0.9037
  - Direction Up: 0.6348
  - Direction Constant: 0.9386
  - Direction Down: 0.6075
  - PastPrice: 0.9265
  - FuturePrice: 0.9503
  - PastNews: 0.9045
  - FutureNews: 0.9882
  - Asset Comparision: 0.8757
  === Mean Weighted F1: 0.8589

=> Epoch 6/6
=== Validation Results (Epoch 6):
  - Price or Not: 0.9199
  - Direction Up: 0.6112
  - Direction Constant: 0.9386
  - Direction Down: 0.6145
  - PastPrice: 0.9283
  - FuturePrice: 0.9503
  - PastNews: 0.9141
  - FutureNews: 0.9882
  - Asset Comparision: 0.9148
  === Mean Weighted F1: 0.8644
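Every run prints `Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)`: roughly 1.6% of the weights receive gradients while the rest of BERT stays frozen, and 1,795,977 / 111,278,217 ≈ 0.0161. A minimal PyTorch sketch of how such a line can be produced, assuming frozen weights are marked with `requires_grad=False`; `count_trainable`, `format_trainable`, and the toy two-layer model are hypothetical stand-ins, not the notebook's code.

```python
import torch.nn as nn


def count_trainable(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, total) parameter counts for a PyTorch module."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


def format_trainable(trainable: int, total: int) -> str:
    """Format counts the way the log prints them, e.g. '1,795,977 / 111,278,217 (1.61%)'."""
    return f"{trainable:,} / {total:,} ({100 * trainable / total:.2f}%)"


# Hypothetical toy model: a frozen "backbone" layer plus a small trainable bottleneck adapter.
backbone = nn.Linear(768, 768)
adapter = nn.Sequential(nn.Linear(768, 48), nn.ReLU(), nn.Linear(48, 768))
for param in backbone.parameters():
    param.requires_grad = False  # freeze the backbone so only the adapter is trained
model = nn.Sequential(backbone, adapter)

trainable, total = count_trainable(model)
print("Trainable parameters:", format_trainable(trainable, total))
print(format_trainable(1_795_977, 111_278_217))  # the figures logged above
```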


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 8/21 | LR=2e-05, Epochs=9
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/9
=== Validation Results (Epoch 1):
  - Price or Not: 0.7624
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7673
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7495

=> Epoch 2/9
=== Validation Results (Epoch 2):
  - Price or Not: 0.8842
  - Direction Up: 0.6099
  - Direction Constant: 0.9386
  - Direction Down: 0.5790
  - PastPrice: 0.8970
  - FuturePrice: 0.9503
  - PastNews: 0.8744
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8248

=> Epoch 3/9
=== Validation Results (Epoch 3):
  - Price or Not: 0.8948
  - Direction Up: 0.5596
  - Direction Constant: 0.9386
  - Direction Down: 0.5899
  - PastPrice: 0.9177
  - FuturePrice: 0.9503
  - PastNews: 0.8850
  - FutureNews: 0.9882
  - Asset Comparision: 0.7878
  === Mean Weighted F1: 0.8347

=> Epoch 4/9
=== Validation Results (Epoch 4):
  - Price or Not: 0.8935
  - Direction Up: 0.6185
  - Direction Constant: 0.9386
  - Direction Down: 0.6063
  - PastPrice: 0.9197
  - FuturePrice: 0.9503
  - PastNews: 0.8872
  - FutureNews: 0.9882
  - Asset Comparision: 0.8422
  === Mean Weighted F1: 0.8494

=> Epoch 5/9
=== Validation Results (Epoch 5):
  - Price or Not: 0.9037
  - Direction Up: 0.6348
  - Direction Constant: 0.9386
  - Direction Down: 0.6075
  - PastPrice: 0.9265
  - FuturePrice: 0.9503
  - PastNews: 0.9045
  - FutureNews: 0.9882
  - Asset Comparision: 0.8757
  === Mean Weighted F1: 0.8589

=> Epoch 6/9
=== Validation Results (Epoch 6):
  - Price or Not: 0.9199
  - Direction Up: 0.6112
  - Direction Constant: 0.9386
  - Direction Down: 0.6145
  - PastPrice: 0.9283
  - FuturePrice: 0.9503
  - PastNews: 0.9141
  - FutureNews: 0.9882
  - Asset Comparision: 0.9148
  === Mean Weighted F1: 0.8644

=> Epoch 7/9
=== Validation Results (Epoch 7):
  - Price or Not: 0.9285
  - Direction Up: 0.6351
  - Direction Constant: 0.9386
  - Direction Down: 0.6486
  - PastPrice: 0.9293
  - FuturePrice: 0.9503
  - PastNews: 0.9258
  - FutureNews: 0.9882
  - Asset Comparision: 0.9185
  === Mean Weighted F1: 0.8737

=> Epoch 8/9
=== Validation Results (Epoch 8):
  - Price or Not: 0.9290
  - Direction Up: 0.6400
  - Direction Constant: 0.9386
  - Direction Down: 0.6115
  - PastPrice: 0.9301
  - FuturePrice: 0.9503
  - PastNews: 0.9258
  - FutureNews: 0.9882
  - Asset Comparision: 0.9222
  === Mean Weighted F1: 0.8706

=> Epoch 9/9
=== Validation Results (Epoch 9):
  - Price or Not: 0.9327
  - Direction Up: 0.6569
  - Direction Constant: 0.9386
  - Direction Down: 0.6630
  - PastPrice: 0.9307
  - FuturePrice: 0.9503
  - PastNews: 0.9296
  - FutureNews: 0.9882
  - Asset Comparision: 0.9415
  === Mean Weighted F1: 0.8813


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 9/21 | LR=2e-05, Epochs=11
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/11
=== Validation Results (Epoch 1):
  - Price or Not: 0.7624
  - Direction Up: 0.4177
  - Direction Constant: 0.9386
  - Direction Down: 0.5114
  - PastPrice: 0.7077
  - FuturePrice: 0.9503
  - PastNews: 0.7673
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7495

=> Epoch 2/11
=== Validation Results (Epoch 2):
  - Price or Not: 0.8842
  - Direction Up: 0.6099
  - Direction Constant: 0.9386
  - Direction Down: 0.5790
  - PastPrice: 0.8970
  - FuturePrice: 0.9503
  - PastNews: 0.8744
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.8248

=> Epoch 3/11
=== Validation Results (Epoch 3):
  - Price or Not: 0.8948
  - Direction Up: 0.5596
  - Direction Constant: 0.9386
  - Direction Down: 0.5899
  - PastPrice: 0.9177
  - FuturePrice: 0.9503
  - PastNews: 0.8850
  - FutureNews: 0.9882
  - Asset Comparision: 0.7878
  === Mean Weighted F1: 0.8347

=> Epoch 4/11
=== Validation Results (Epoch 4):
  - Price or Not: 0.8935
  - Direction Up: 0.6185
  - Direction Constant: 0.9386
  - Direction Down: 0.6063
  - PastPrice: 0.9197
  - FuturePrice: 0.9503
  - PastNews: 0.8872
  - FutureNews: 0.9882
  - Asset Comparision: 0.8422
  === Mean Weighted F1: 0.8494

=> Epoch 5/11
=== Validation Results (Epoch 5):
  - Price or Not: 0.9037
  - Direction Up: 0.6348
  - Direction Constant: 0.9386
  - Direction Down: 0.6075
  - PastPrice: 0.9265
  - FuturePrice: 0.9503
  - PastNews: 0.9045
  - FutureNews: 0.9882
  - Asset Comparision: 0.8757
  === Mean Weighted F1: 0.8589

=> Epoch 6/11
=== Validation Results (Epoch 6):
  - Price or Not: 0.9199
  - Direction Up: 0.6112
  - Direction Constant: 0.9386
  - Direction Down: 0.6145
  - PastPrice: 0.9283
  - FuturePrice: 0.9503
  - PastNews: 0.9141
  - FutureNews: 0.9882
  - Asset Comparision: 0.9148
  === Mean Weighted F1: 0.8644

=> Epoch 7/11
=== Validation Results (Epoch 7):
  - Price or Not: 0.9285
  - Direction Up: 0.6351
  - Direction Constant: 0.9386
  - Direction Down: 0.6486
  - PastPrice: 0.9293
  - FuturePrice: 0.9503
  - PastNews: 0.9258
  - FutureNews: 0.9882
  - Asset Comparision: 0.9185
  === Mean Weighted F1: 0.8737

=> Epoch 8/11
=== Validation Results (Epoch 8):
  - Price or Not: 0.9290
  - Direction Up: 0.6400
  - Direction Constant: 0.9386
  - Direction Down: 0.6115
  - PastPrice: 0.9301
  - FuturePrice: 0.9503
  - PastNews: 0.9258
  - FutureNews: 0.9882
  - Asset Comparision: 0.9222
  === Mean Weighted F1: 0.8706

=> Epoch 9/11
=== Validation Results (Epoch 9):
  - Price or Not: 0.9327
  - Direction Up: 0.6569
  - Direction Constant: 0.9386
  - Direction Down: 0.6630
  - PastPrice: 0.9307
  - FuturePrice: 0.9503
  - PastNews: 0.9296
  - FutureNews: 0.9882
  - Asset Comparision: 0.9415
  === Mean Weighted F1: 0.8813

=> Epoch 10/11
=== Validation Results (Epoch 10):
  - Price or Not: 0.9356
  - Direction Up: 0.6722
  - Direction Constant: 0.9386
  - Direction Down: 0.6784
  - PastPrice: 0.9157
  - FuturePrice: 0.9503
  - PastNews: 0.9307
  - FutureNews: 0.9882
  - Asset Comparision: 0.9570
  === Mean Weighted F1: 0.8852

=> Epoch 11/11
=== Validation Results (Epoch 11):
  - Price or Not: 0.9344
  - Direction Up: 0.7978
  - Direction Constant: 0.9408
  - Direction Down: 0.8484
  - PastPrice: 0.9225
  - FuturePrice: 0.9494
  - PastNews: 0.9313
  - FutureNews: 0.9882
  - Asset Comparision: 0.9633
  === Mean Weighted F1: 0.9196
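The `Mean Weighted F1` line is the plain arithmetic mean of the nine per-subtask scores above it: for Run 9, epoch 11, the nine values sum to 8.2761, and 8.2761 / 9 ≈ 0.9196, matching the logged figure (the same check holds elsewhere, e.g. Run 7, epoch 1 gives 6.7453 / 9 ≈ 0.7495). A minimal sketch reproducing that aggregation; the scores are copied from the log, while the variable names are illustrative.

```python
from statistics import mean

# Per-subtask weighted F1 scores copied from Run 9, epoch 11 of the log.
run9_epoch11 = {
    "Price or Not": 0.9344,
    "Direction Up": 0.7978,
    "Direction Constant": 0.9408,
    "Direction Down": 0.8484,
    "PastPrice": 0.9225,
    "FuturePrice": 0.9494,
    "PastNews": 0.9313,
    "FutureNews": 0.9882,
    "Asset Comparision": 0.9633,  # key spelled as in the log
}

# Unweighted mean across the nine subtasks.
mean_f1 = mean(run9_epoch11.values())
print(f"=== Mean Weighted F1: {mean_f1:.4f}")
```

Because every subtask counts equally, a weak subtask such as Direction Up pulls the summary down as much as a strong one like FutureNews pushes it up.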





🔍 Run 10/21 | LR=3e-05, Epochs=6


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/6
=== Validation Results (Epoch 1):
  - Price or Not: 0.8523
  - Direction Up: 0.5028
  - Direction Constant: 0.9386
  - Direction Down: 0.5194
  - PastPrice: 0.8648
  - FuturePrice: 0.9503
  - PastNews: 0.8287
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7941

=> Epoch 2/6
=== Validation Results (Epoch 2):
  - Price or Not: 0.8942
  - Direction Up: 0.6097
  - Direction Constant: 0.9386
  - Direction Down: 0.5825
  - PastPrice: 0.9195
  - FuturePrice: 0.9503
  - PastNews: 0.8881
  - FutureNews: 0.9882
  - Asset Comparision: 0.7667
  === Mean Weighted F1: 0.8375

=> Epoch 3/6
=== Validation Results (Epoch 3):
  - Price or Not: 0.9077
  - Direction Up: 0.6141
  - Direction Constant: 0.9386
  - Direction Down: 0.6002
  - PastPrice: 0.9241
  - FuturePrice: 0.9503
  - PastNews: 0.8986
  - FutureNews: 0.9882
  - Asset Comparision: 0.8761
  === Mean Weighted F1: 0.8553

=> Epoch 4/6
=== Validation Results (Epoch 4):
  - Price or Not: 0.9131
  - Direction Up: 0.6288
  - Direction Constant: 0.9386
  - Direction Down: 0.5948
  - PastPrice: 0.9245
  - FuturePrice: 0.9503
  - PastNews: 0.9123
  - FutureNews: 0.9882
  - Asset Comparision: 0.8915
  === Mean Weighted F1: 0.8602

=> Epoch 5/6
=== Validation Results (Epoch 5):
  - Price or Not: 0.9258
  - Direction Up: 0.6489
  - Direction Constant: 0.9386
  - Direction Down: 0.6122
  - PastPrice: 0.9271
  - FuturePrice: 0.9503
  - PastNews: 0.9207
  - FutureNews: 0.9882
  - Asset Comparision: 0.9247
  === Mean Weighted F1: 0.8707

=> Epoch 6/6
=== Validation Results (Epoch 6):
  - Price or Not: 0.9352
  - Direction Up: 0.6456
  - Direction Constant: 0.9386
  - Direction Down: 0.6076
  - PastPrice: 0.9229
  - FuturePrice: 0.9503
  - PastNews: 0.9299
  - FutureNews: 0.9882
  - Asset Comparision: 0.9416
  === Mean Weighted F1: 0.8733


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 11/21 | LR=3e-05, Epochs=9
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/9
=== Validation Results (Epoch 1):
  - Price or Not: 0.8523
  - Direction Up: 0.5028
  - Direction Constant: 0.9386
  - Direction Down: 0.5194
  - PastPrice: 0.8648
  - FuturePrice: 0.9503
  - PastNews: 0.8287
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7941

=> Epoch 2/9
=== Validation Results (Epoch 2):
  - Price or Not: 0.8942
  - Direction Up: 0.6097
  - Direction Constant: 0.9386
  - Direction Down: 0.5825
  - PastPrice: 0.9195
  - FuturePrice: 0.9503
  - PastNews: 0.8881
  - FutureNews: 0.9882
  - Asset Comparision: 0.7667
  === Mean Weighted F1: 0.8375

=> Epoch 3/9
=== Validation Results (Epoch 3):
  - Price or Not: 0.9077
  - Direction Up: 0.6141
  - Direction Constant: 0.9386
  - Direction Down: 0.6002
  - PastPrice: 0.9241
  - FuturePrice: 0.9503
  - PastNews: 0.8986
  - FutureNews: 0.9882
  - Asset Comparision: 0.8761
  === Mean Weighted F1: 0.8553

=> Epoch 4/9
=== Validation Results (Epoch 4):
  - Price or Not: 0.9131
  - Direction Up: 0.6288
  - Direction Constant: 0.9386
  - Direction Down: 0.5948
  - PastPrice: 0.9245
  - FuturePrice: 0.9503
  - PastNews: 0.9123
  - FutureNews: 0.9882
  - Asset Comparision: 0.8915
  === Mean Weighted F1: 0.8602

=> Epoch 5/9
=== Validation Results (Epoch 5):
  - Price or Not: 0.9258
  - Direction Up: 0.6489
  - Direction Constant: 0.9386
  - Direction Down: 0.6122
  - PastPrice: 0.9271
  - FuturePrice: 0.9503
  - PastNews: 0.9207
  - FutureNews: 0.9882
  - Asset Comparision: 0.9247
  === Mean Weighted F1: 0.8707

=> Epoch 6/9
=== Validation Results (Epoch 6):
  - Price or Not: 0.9352
  - Direction Up: 0.6456
  - Direction Constant: 0.9386
  - Direction Down: 0.6076
  - PastPrice: 0.9229
  - FuturePrice: 0.9503
  - PastNews: 0.9299
  - FutureNews: 0.9882
  - Asset Comparision: 0.9416
  === Mean Weighted F1: 0.8733

=> Epoch 7/9
=== Validation Results (Epoch 7):
  - Price or Not: 0.9362
  - Direction Up: 0.7366
  - Direction Constant: 0.9386
  - Direction Down: 0.7717
  - PastPrice: 0.9219
  - FuturePrice: 0.9503
  - PastNews: 0.9331
  - FutureNews: 0.9882
  - Asset Comparision: 0.9587
  === Mean Weighted F1: 0.9039

=> Epoch 8/9
=== Validation Results (Epoch 8):
  - Price or Not: 0.9351
  - Direction Up: 0.8628
  - Direction Constant: 0.9386
  - Direction Down: 0.9184
  - PastPrice: 0.9207
  - FuturePrice: 0.9559
  - PastNews: 0.9309
  - FutureNews: 0.9882
  - Asset Comparision: 0.9657
  === Mean Weighted F1: 0.9352

=> Epoch 9/9
=== Validation Results (Epoch 9):
  - Price or Not: 0.9429
  - Direction Up: 0.8974
  - Direction Constant: 0.9386
  - Direction Down: 0.9407
  - PastPrice: 0.9377
  - FuturePrice: 0.9535
  - PastNews: 0.9405
  - FutureNews: 0.9882
  - Asset Comparision: 0.9665
  === Mean Weighted F1: 0.9451


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 12/21 | LR=3e-05, Epochs=11
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/11
=== Validation Results (Epoch 1):
  - Price or Not: 0.8523
  - Direction Up: 0.5028
  - Direction Constant: 0.9386
  - Direction Down: 0.5194
  - PastPrice: 0.8648
  - FuturePrice: 0.9503
  - PastNews: 0.8287
  - FutureNews: 0.9882
  - Asset Comparision: 0.7017
  === Mean Weighted F1: 0.7941

=> Epoch 2/11
=== Validation Results (Epoch 2):
  - Price or Not: 0.8942
  - Direction Up: 0.6097
  - Direction Constant: 0.9386
  - Direction Down: 0.5825
  - PastPrice: 0.9195
  - FuturePrice: 0.9503
  - PastNews: 0.8881
  - FutureNews: 0.9882
  - Asset Comparision: 0.7667
  === Mean Weighted F1: 0.8375

=> Epoch 3/11
=== Validation Results (Epoch 3):
  - Price or Not: 0.9077
  - Direction Up: 0.6141
  - Direction Constant: 0.9386
  - Direction Down: 0.6002
  - PastPrice: 0.9241
  - FuturePrice: 0.9503
  - PastNews: 0.8986
  - FutureNews: 0.9882
  - Asset Comparision: 0.8761
  === Mean Weighted F1: 0.8553

=> Epoch 4/11
=== Validation Results (Epoch 4):
  - Price or Not: 0.9131
  - Direction Up: 0.6288
  - Direction Constant: 0.9386
  - Direction Down: 0.5948
  - PastPrice: 0.9245
  - FuturePrice: 0.9503
  - PastNews: 0.9123
  - FutureNews: 0.9882
  - Asset Comparision: 0.8915
  === Mean Weighted F1: 0.8602

=> Epoch 5/11
=== Validation Results (Epoch 5):
  - Price or Not: 0.9258
  - Direction Up: 0.6489
  - Direction Constant: 0.9386
  - Direction Down: 0.6122
  - PastPrice: 0.9271
  - FuturePrice: 0.9503
  - PastNews: 0.9207
  - FutureNews: 0.9882
  - Asset Comparision: 0.9247
  === Mean Weighted F1: 0.8707

=> Epoch 6/11
=== Validation Results (Epoch 6):
  - Price or Not: 0.9352
  - Direction Up: 0.6456
  - Direction Constant: 0.9386
  - Direction Down: 0.6076
  - PastPrice: 0.9229
  - FuturePrice: 0.9503
  - PastNews: 0.9299
  - FutureNews: 0.9882
  - Asset Comparision: 0.9416
  === Mean Weighted F1: 0.8733

=> Epoch 7/11
=== Validation Results (Epoch 7):
  - Price or Not: 0.9362
  - Direction Up: 0.7366
  - Direction Constant: 0.9386
  - Direction Down: 0.7717
  - PastPrice: 0.9219
  - FuturePrice: 0.9503
  - PastNews: 0.9331
  - FutureNews: 0.9882
  - Asset Comparision: 0.9587
  === Mean Weighted F1: 0.9039

=> Epoch 8/11
=== Validation Results (Epoch 8):
  - Price or Not: 0.9351
  - Direction Up: 0.8628
  - Direction Constant: 0.9386
  - Direction Down: 0.9184
  - PastPrice: 0.9207
  - FuturePrice: 0.9559
  - PastNews: 0.9309
  - FutureNews: 0.9882
  - Asset Comparision: 0.9657
  === Mean Weighted F1: 0.9352

=> Epoch 9/11
=== Validation Results (Epoch 9):
  - Price or Not: 0.9429
  - Direction Up: 0.8974
  - Direction Constant: 0.9386
  - Direction Down: 0.9407
  - PastPrice: 0.9377
  - FuturePrice: 0.9535
  - PastNews: 0.9405
  - FutureNews: 0.9882
  - Asset Comparision: 0.9665
  === Mean Weighted F1: 0.9451

=> Epoch 10/11
=== Validation Results (Epoch 10):
  - Price or Not: 0.9424
  - Direction Up: 0.9117
  - Direction Constant: 0.9386
  - Direction Down: 0.9447
  - PastPrice: 0.9328
  - FuturePrice: 0.9554
  - PastNews: 0.9372
  - FutureNews: 0.9882
  - Asset Comparision: 0.9714
  === Mean Weighted F1: 0.9469

=> Epoch 11/11
=== Validation Results (Epoch 11):
  - Price or Not: 0.9507
  - Direction Up: 0.9165
  - Direction Constant: 0.9386
  - Direction Down: 0.9466
  - PastPrice: 0.9417
  - FuturePrice: 0.9699
  - PastNews: 0.9437
  - FutureNews: 0.9882
  - Asset Comparision: 0.9725
  === Mean Weighted F1: 0.9520
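The `Run k/21` counter follows from crossing the seven learning rates with the three adapter epoch budgets listed at the top of the notebook (7 × 3 = 21), with the learning rate as the outer loop: Runs 7 to 9 use 2e-05, Runs 10 to 12 use 3e-05, and so on. A sketch of that enumeration, assuming `itertools.product` ordering (the notebook's own loop may be written differently). Python prints `1e-4` as `0.0001`, which matches the `LR=0.0001` headers later in the log.

```python
from itertools import product

# Search space for adapter fine-tuning, as listed at the top of the notebook.
learning_rates = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
epoch_options = [6, 9, 11]

# Learning rate is the outer loop, so each LR covers three consecutive runs.
grid = list(product(learning_rates, epoch_options))
runs = [(i, lr, ep) for i, (lr, ep) in enumerate(grid, start=1)]

for run_id, lr, ep in runs:
    print(f"Run {run_id}/{len(runs)} | LR={lr}, Epochs={ep}")
```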


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 13/21 | LR=5e-05, Epochs=6
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/6
=== Validation Results (Epoch 1):
  - Price or Not: 0.8762
  - Direction Up: 0.6011
  - Direction Constant: 0.9386
  - Direction Down: 0.6240
  - PastPrice: 0.8968
  - FuturePrice: 0.9503
  - PastNews: 0.8661
  - FutureNews: 0.9882
  - Asset Comparision: 0.7079
  === Mean Weighted F1: 0.8277

=> Epoch 2/6
=== Validation Results (Epoch 2):
  - Price or Not: 0.9134
  - Direction Up: 0.6019
  - Direction Constant: 0.9386
  - Direction Down: 0.5660
  - PastPrice: 0.9285
  - FuturePrice: 0.9503
  - PastNews: 0.9027
  - FutureNews: 0.9882
  - Asset Comparision: 0.8133
  === Mean Weighted F1: 0.8448

=> Epoch 3/6
=== Validation Results (Epoch 3):
  - Price or Not: 0.9245
  - Direction Up: 0.6445
  - Direction Constant: 0.9403
  - Direction Down: 0.6198
  - PastPrice: 0.9246
  - FuturePrice: 0.9503
  - PastNews: 0.9206
  - FutureNews: 0.9882
  - Asset Comparision: 0.9068
  === Mean Weighted F1: 0.8688

=> Epoch 4/6
=== Validation Results (Epoch 4):
  - Price or Not: 0.9318
  - Direction Up: 0.6687
  - Direction Constant: 0.9386
  - Direction Down: 0.7859
  - PastPrice: 0.9229
  - FuturePrice: 0.9503
  - PastNews: 0.9263
  - FutureNews: 0.9882
  - Asset Comparision: 0.9440
  === Mean Weighted F1: 0.8952

=> Epoch 5/6
=== Validation Results (Epoch 5):
  - Price or Not: 0.9296
  - Direction Up: 0.8813
  - Direction Constant: 0.9386
  - Direction Down: 0.9384
  - PastPrice: 0.9317
  - FuturePrice: 0.9503
  - PastNews: 0.9300
  - FutureNews: 0.9882
  - Asset Comparision: 0.9713
  === Mean Weighted F1: 0.9399

=> Epoch 6/6
=== Validation Results (Epoch 6):
  - Price or Not: 0.9419
  - Direction Up: 0.9023
  - Direction Constant: 0.9386
  - Direction Down: 0.9448
  - PastPrice: 0.9274
  - FuturePrice: 0.9503
  - PastNews: 0.9367
  - FutureNews: 0.9882
  - Asset Comparision: 0.9725
  === Mean Weighted F1: 0.9448


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 14/21 | LR=5e-05, Epochs=9
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/9
=== Validation Results (Epoch 1):
  - Price or Not: 0.8762
  - Direction Up: 0.6011
  - Direction Constant: 0.9386
  - Direction Down: 0.6240
  - PastPrice: 0.8968
  - FuturePrice: 0.9503
  - PastNews: 0.8661
  - FutureNews: 0.9882
  - Asset Comparision: 0.7079
  === Mean Weighted F1: 0.8277

=> Epoch 2/9
=== Validation Results (Epoch 2):
  - Price or Not: 0.9134
  - Direction Up: 0.6019
  - Direction Constant: 0.9386
  - Direction Down: 0.5660
  - PastPrice: 0.9285
  - FuturePrice: 0.9503
  - PastNews: 0.9027
  - FutureNews: 0.9882
  - Asset Comparision: 0.8133
  === Mean Weighted F1: 0.8448

=> Epoch 3/9
=== Validation Results (Epoch 3):
  - Price or Not: 0.9245
  - Direction Up: 0.6445
  - Direction Constant: 0.9403
  - Direction Down: 0.6198
  - PastPrice: 0.9246
  - FuturePrice: 0.9503
  - PastNews: 0.9206
  - FutureNews: 0.9882
  - Asset Comparision: 0.9068
  === Mean Weighted F1: 0.8688

=> Epoch 4/9
=== Validation Results (Epoch 4):
  - Price or Not: 0.9318
  - Direction Up: 0.6687
  - Direction Constant: 0.9386
  - Direction Down: 0.7859
  - PastPrice: 0.9229
  - FuturePrice: 0.9503
  - PastNews: 0.9263
  - FutureNews: 0.9882
  - Asset Comparision: 0.9440
  === Mean Weighted F1: 0.8952

=> Epoch 5/9
=== Validation Results (Epoch 5):
  - Price or Not: 0.9296
  - Direction Up: 0.8813
  - Direction Constant: 0.9386
  - Direction Down: 0.9384
  - PastPrice: 0.9317
  - FuturePrice: 0.9503
  - PastNews: 0.9300
  - FutureNews: 0.9882
  - Asset Comparision: 0.9713
  === Mean Weighted F1: 0.9399

=> Epoch 6/9
=== Validation Results (Epoch 6):
  - Price or Not: 0.9419
  - Direction Up: 0.9023
  - Direction Constant: 0.9386
  - Direction Down: 0.9448
  - PastPrice: 0.9274
  - FuturePrice: 0.9503
  - PastNews: 0.9367
  - FutureNews: 0.9882
  - Asset Comparision: 0.9725
  === Mean Weighted F1: 0.9448

=> Epoch 7/9
=== Validation Results (Epoch 7):
  - Price or Not: 0.9456
  - Direction Up: 0.9265
  - Direction Constant: 0.9386
  - Direction Down: 0.9454
  - PastPrice: 0.9371
  - FuturePrice: 0.9748
  - PastNews: 0.9416
  - FutureNews: 0.9882
  - Asset Comparision: 0.9798
  === Mean Weighted F1: 0.9531

=> Epoch 8/9
=== Validation Results (Epoch 8):
  - Price or Not: 0.9446
  - Direction Up: 0.9240
  - Direction Constant: 0.9386
  - Direction Down: 0.9500
  - PastPrice: 0.9409
  - FuturePrice: 0.9814
  - PastNews: 0.9392
  - FutureNews: 0.9882
  - Asset Comparision: 0.9850
  === Mean Weighted F1: 0.9547

=> Epoch 9/9
=== Validation Results (Epoch 9):
  - Price or Not: 0.9549
  - Direction Up: 0.9360
  - Direction Constant: 0.9428
  - Direction Down: 0.9501
  - PastPrice: 0.9425
  - FuturePrice: 0.9830
  - PastNews: 0.9488
  - FutureNews: 0.9882
  - Asset Comparision: 0.9867
  === Mean Weighted F1: 0.9592


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 15/21 | LR=5e-05, Epochs=11
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/11
=== Validation Results (Epoch 1):
  - Price or Not: 0.8762
  - Direction Up: 0.6011
  - Direction Constant: 0.9386
  - Direction Down: 0.6240
  - PastPrice: 0.8968
  - FuturePrice: 0.9503
  - PastNews: 0.8661
  - FutureNews: 0.9882
  - Asset Comparision: 0.7079
  === Mean Weighted F1: 0.8277

=> Epoch 2/11
=== Validation Results (Epoch 2):
  - Price or Not: 0.9134
  - Direction Up: 0.6019
  - Direction Constant: 0.9386
  - Direction Down: 0.5660
  - PastPrice: 0.9285
  - FuturePrice: 0.9503
  - PastNews: 0.9027
  - FutureNews: 0.9882
  - Asset Comparision: 0.8133
  === Mean Weighted F1: 0.8448

=> Epoch 3/11
=== Validation Results (Epoch 3):
  - Price or Not: 0.9245
  - Direction Up: 0.6445
  - Direction Constant: 0.9403
  - Direction Down: 0.6198
  - PastPrice: 0.9246
  - FuturePrice: 0.9503
  - PastNews: 0.9206
  - FutureNews: 0.9882
  - Asset Comparision: 0.9068
  === Mean Weighted F1: 0.8688

=> Epoch 4/11
=== Validation Results (Epoch 4):
  - Price or Not: 0.9318
  - Direction Up: 0.6687
  - Direction Constant: 0.9386
  - Direction Down: 0.7859
  - PastPrice: 0.9229
  - FuturePrice: 0.9503
  - PastNews: 0.9263
  - FutureNews: 0.9882
  - Asset Comparision: 0.9440
  === Mean Weighted F1: 0.8952

=> Epoch 5/11
=== Validation Results (Epoch 5):
  - Price or Not: 0.9296
  - Direction Up: 0.8813
  - Direction Constant: 0.9386
  - Direction Down: 0.9384
  - PastPrice: 0.9317
  - FuturePrice: 0.9503
  - PastNews: 0.9300
  - FutureNews: 0.9882
  - Asset Comparision: 0.9713
  === Mean Weighted F1: 0.9399

=> Epoch 6/11
=== Validation Results (Epoch 6):
  - Price or Not: 0.9419
  - Direction Up: 0.9023
  - Direction Constant: 0.9386
  - Direction Down: 0.9448
  - PastPrice: 0.9274
  - FuturePrice: 0.9503
  - PastNews: 0.9367
  - FutureNews: 0.9882
  - Asset Comparision: 0.9725
  === Mean Weighted F1: 0.9448

=> Epoch 7/11
=== Validation Results (Epoch 7):
  - Price or Not: 0.9456
  - Direction Up: 0.9265
  - Direction Constant: 0.9386
  - Direction Down: 0.9454
  - PastPrice: 0.9371
  - FuturePrice: 0.9748
  - PastNews: 0.9416
  - FutureNews: 0.9882
  - Asset Comparision: 0.9798
  === Mean Weighted F1: 0.9531

=> Epoch 8/11
=== Validation Results (Epoch 8):
  - Price or Not: 0.9446
  - Direction Up: 0.9240
  - Direction Constant: 0.9386
  - Direction Down: 0.9500
  - PastPrice: 0.9409
  - FuturePrice: 0.9814
  - PastNews: 0.9392
  - FutureNews: 0.9882
  - Asset Comparision: 0.9850
  === Mean Weighted F1: 0.9547

=> Epoch 9/11
=== Validation Results (Epoch 9):
  - Price or Not: 0.9549
  - Direction Up: 0.9360
  - Direction Constant: 0.9428
  - Direction Down: 0.9501
  - PastPrice: 0.9425
  - FuturePrice: 0.9830
  - PastNews: 0.9488
  - FutureNews: 0.9882
  - Asset Comparision: 0.9867
  === Mean Weighted F1: 0.9592

=> Epoch 10/11
=== Validation Results (Epoch 10):
  - Price or Not: 0.9504
  - Direction Up: 0.9379
  - Direction Constant: 0.9456
  - Direction Down: 0.9517
  - PastPrice: 0.9437
  - FuturePrice: 0.9844
  - PastNews: 0.9433
  - FutureNews: 0.9882
  - Asset Comparision: 0.9885
  === Mean Weighted F1: 0.9593

=> Epoch 11/11
=== Validation Results (Epoch 11):
  - Price or Not: 0.9571
  - Direction Up: 0.9438
  - Direction Constant: 0.9433
  - Direction Down: 0.9502
  - PastPrice: 0.9413
  - FuturePrice: 0.9875
  - PastNews: 0.9502
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9616
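With Runs 7 to 15 complete, choosing a configuration amounts to taking the run with the highest Mean Weighted F1. A minimal sketch using the final-epoch scores copied from the log, assuming selection on the last epoch. The notebook may instead keep each run's best epoch; for these nine runs that picks the same winner, since Run 15 ends on 0.9616, the highest value in any epoch of any of them.

```python
# Final-epoch Mean Weighted F1 copied from Runs 7 to 15: run -> (lr, epochs, score).
results = {
    7: (2e-05, 6, 0.8644),
    8: (2e-05, 9, 0.8813),
    9: (2e-05, 11, 0.9196),
    10: (3e-05, 6, 0.8733),
    11: (3e-05, 9, 0.9451),
    12: (3e-05, 11, 0.9520),
    13: (5e-05, 6, 0.9448),
    14: (5e-05, 9, 0.9592),
    15: (5e-05, 11, 0.9616),
}

# Select the run whose score (third tuple element) is highest.
best_run, (best_lr, best_epochs, best_f1) = max(results.items(), key=lambda item: item[1][2])
print(f"Best so far: Run {best_run} | LR={best_lr}, Epochs={best_epochs}, Mean Weighted F1={best_f1:.4f}")
```

Within each learning rate the longest budget finished highest. This matches the log: runs that share a learning rate reproduce identical scores epoch for epoch, so the 11-epoch run simply continues where the 6- and 9-epoch runs stop.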





🔍 Run 16/21 | LR=0.0001, Epochs=6


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/6
=== Validation Results (Epoch 1):
  - Price or Not: 0.8867
  - Direction Up: 0.6019
  - Direction Constant: 0.9386
  - Direction Down: 0.6650
  - PastPrice: 0.8887
  - FuturePrice: 0.9503
  - PastNews: 0.8804
  - FutureNews: 0.9882
  - Asset Comparision: 0.8419
  === Mean Weighted F1: 0.8491

=> Epoch 2/6
=== Validation Results (Epoch 2):
  - Price or Not: 0.9326
  - Direction Up: 0.8529
  - Direction Constant: 0.9386
  - Direction Down: 0.9246
  - PastPrice: 0.9262
  - FuturePrice: 0.9503
  - PastNews: 0.9304
  - FutureNews: 0.9882
  - Asset Comparision: 0.9101
  === Mean Weighted F1: 0.9282

=> Epoch 3/6
=== Validation Results (Epoch 3):
  - Price or Not: 0.9472
  - Direction Up: 0.9031
  - Direction Constant: 0.9386
  - Direction Down: 0.9527
  - PastPrice: 0.9302
  - FuturePrice: 0.9499
  - PastNews: 0.9428
  - FutureNews: 0.9882
  - Asset Comparision: 0.9728
  === Mean Weighted F1: 0.9473

=> Epoch 4/6
=== Validation Results (Epoch 4):
  - Price or Not: 0.9426
  - Direction Up: 0.9170
  - Direction Constant: 0.9474
  - Direction Down: 0.9553
  - PastPrice: 0.9212
  - FuturePrice: 0.9806
  - PastNews: 0.9343
  - FutureNews: 0.9882
  - Asset Comparision: 0.9859
  === Mean Weighted F1: 0.9525

=> Epoch 5/6
=== Validation Results (Epoch 5):
  - Price or Not: 0.9547
  - Direction Up: 0.9361
  - Direction Constant: 0.9492
  - Direction Down: 0.9553
  - PastPrice: 0.9434
  - FuturePrice: 0.9818
  - PastNews: 0.9487
  - FutureNews: 0.9882
  - Asset Comparision: 0.9886
  === Mean Weighted F1: 0.9607

=> Epoch 6/6
=== Validation Results (Epoch 6):
  - Price or Not: 0.9462
  - Direction Up: 0.9310
  - Direction Constant: 0.9469
  - Direction Down: 0.9589
  - PastPrice: 0.9353
  - FuturePrice: 0.9818
  - PastNews: 0.9409
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9580
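Because the log repeats the same two line shapes (`🔍 Run k/21 | LR=<lr>, Epochs=<n>` and `=== Mean Weighted F1: <score>`), per-run learning curves can be recovered from the raw text instead of being read off by eye. A minimal sketch, assuming the log has been saved as a string; `parse_log` is a hypothetical helper, and its regular expressions only target the two formats shown here.

```python
import re

# Patterns for the two line shapes printed by the search loop.
RUN_RE = re.compile(r"Run (\d+)/\d+ \| LR=([0-9.eE+-]+), Epochs=(\d+)")
F1_RE = re.compile(r"=== Mean Weighted F1: ([0-9.]+)")


def parse_log(text: str) -> dict[int, dict]:
    """Map run number -> {"lr", "epochs", "mean_f1": [score per epoch]}."""
    runs: dict[int, dict] = {}
    current = None
    for line in text.splitlines():
        run_match = RUN_RE.search(line)
        if run_match:
            current = int(run_match.group(1))
            runs[current] = {
                "lr": float(run_match.group(2)),
                "epochs": int(run_match.group(3)),
                "mean_f1": [],
            }
            continue
        f1_match = F1_RE.search(line)
        if f1_match and current is not None:
            runs[current]["mean_f1"].append(float(f1_match.group(1)))
    return runs


# Short excerpt in the same format as the log above.
sample = """🔍 Run 16/21 | LR=0.0001, Epochs=6
=== Validation Results (Epoch 1):
  - Price or Not: 0.8867
  === Mean Weighted F1: 0.8491
  === Mean Weighted F1: 0.9282
🔍 Run 17/21 | LR=0.0001, Epochs=9
  === Mean Weighted F1: 0.8491
  ==> Early stopping triggered.
"""
for run_id, info in parse_log(sample).items():
    print(run_id, info["lr"], info["epochs"], info["mean_f1"])
```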


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 17/21 | LR=0.0001, Epochs=9
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/9
=== Validation Results (Epoch 1):
  - Price or Not: 0.8867
  - Direction Up: 0.6019
  - Direction Constant: 0.9386
  - Direction Down: 0.6650
  - PastPrice: 0.8887
  - FuturePrice: 0.9503
  - PastNews: 0.8804
  - FutureNews: 0.9882
  - Asset Comparision: 0.8419
  === Mean Weighted F1: 0.8491

=> Epoch 2/9
=== Validation Results (Epoch 2):
  - Price or Not: 0.9326
  - Direction Up: 0.8529
  - Direction Constant: 0.9386
  - Direction Down: 0.9246
  - PastPrice: 0.9262
  - FuturePrice: 0.9503
  - PastNews: 0.9304
  - FutureNews: 0.9882
  - Asset Comparision: 0.9101
  === Mean Weighted F1: 0.9282

=> Epoch 3/9
=== Validation Results (Epoch 3):
  - Price or Not: 0.9472
  - Direction Up: 0.9031
  - Direction Constant: 0.9386
  - Direction Down: 0.9527
  - PastPrice: 0.9302
  - FuturePrice: 0.9499
  - PastNews: 0.9428
  - FutureNews: 0.9882
  - Asset Comparision: 0.9728
  === Mean Weighted F1: 0.9473

=> Epoch 4/9
=== Validation Results (Epoch 4):
  - Price or Not: 0.9426
  - Direction Up: 0.9170
  - Direction Constant: 0.9474
  - Direction Down: 0.9553
  - PastPrice: 0.9212
  - FuturePrice: 0.9806
  - PastNews: 0.9343
  - FutureNews: 0.9882
  - Asset Comparision: 0.9859
  === Mean Weighted F1: 0.9525

=> Epoch 5/9





=== Validation Results (Epoch 5):
  - Price or Not: 0.9547
  - Direction Up: 0.9361
  - Direction Constant: 0.9492
  - Direction Down: 0.9553
  - PastPrice: 0.9434
  - FuturePrice: 0.9818
  - PastNews: 0.9487
  - FutureNews: 0.9882
  - Asset Comparision: 0.9886
  === Mean Weighted F1: 0.9607

=> Epoch 6/9





=== Validation Results (Epoch 6):
  - Price or Not: 0.9462
  - Direction Up: 0.9310
  - Direction Constant: 0.9469
  - Direction Down: 0.9589
  - PastPrice: 0.9353
  - FuturePrice: 0.9818
  - PastNews: 0.9409
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9580

=> Epoch 7/9





=== Validation Results (Epoch 7):
  - Price or Not: 0.9522
  - Direction Up: 0.9404
  - Direction Constant: 0.9461
  - Direction Down: 0.9552
  - PastPrice: 0.9470
  - FuturePrice: 0.9890
  - PastNews: 0.9480
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9621

=> Epoch 8/9





=== Validation Results (Epoch 8):
  - Price or Not: 0.9508
  - Direction Up: 0.9387
  - Direction Constant: 0.9627
  - Direction Down: 0.9579
  - PastPrice: 0.9480
  - FuturePrice: 0.9858
  - PastNews: 0.9466
  - FutureNews: 0.9882
  - Asset Comparision: 0.9956
  === Mean Weighted F1: 0.9638
  ==> Early stopping triggered.


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 18/21 | LR=0.0001, Epochs=11
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/11





=== Validation Results (Epoch 1):
  - Price or Not: 0.8867
  - Direction Up: 0.6019
  - Direction Constant: 0.9386
  - Direction Down: 0.6650
  - PastPrice: 0.8887
  - FuturePrice: 0.9503
  - PastNews: 0.8804
  - FutureNews: 0.9882
  - Asset Comparision: 0.8419
  === Mean Weighted F1: 0.8491

=> Epoch 2/11





=== Validation Results (Epoch 2):
  - Price or Not: 0.9326
  - Direction Up: 0.8529
  - Direction Constant: 0.9386
  - Direction Down: 0.9246
  - PastPrice: 0.9262
  - FuturePrice: 0.9503
  - PastNews: 0.9304
  - FutureNews: 0.9882
  - Asset Comparision: 0.9101
  === Mean Weighted F1: 0.9282

=> Epoch 3/11





=== Validation Results (Epoch 3):
  - Price or Not: 0.9472
  - Direction Up: 0.9031
  - Direction Constant: 0.9386
  - Direction Down: 0.9527
  - PastPrice: 0.9302
  - FuturePrice: 0.9499
  - PastNews: 0.9428
  - FutureNews: 0.9882
  - Asset Comparision: 0.9728
  === Mean Weighted F1: 0.9473

=> Epoch 4/11





=== Validation Results (Epoch 4):
  - Price or Not: 0.9426
  - Direction Up: 0.9170
  - Direction Constant: 0.9474
  - Direction Down: 0.9553
  - PastPrice: 0.9212
  - FuturePrice: 0.9806
  - PastNews: 0.9343
  - FutureNews: 0.9882
  - Asset Comparision: 0.9859
  === Mean Weighted F1: 0.9525

=> Epoch 5/11





=== Validation Results (Epoch 5):
  - Price or Not: 0.9547
  - Direction Up: 0.9361
  - Direction Constant: 0.9492
  - Direction Down: 0.9553
  - PastPrice: 0.9434
  - FuturePrice: 0.9818
  - PastNews: 0.9487
  - FutureNews: 0.9882
  - Asset Comparision: 0.9886
  === Mean Weighted F1: 0.9607

=> Epoch 6/11





=== Validation Results (Epoch 6):
  - Price or Not: 0.9462
  - Direction Up: 0.9310
  - Direction Constant: 0.9469
  - Direction Down: 0.9589
  - PastPrice: 0.9353
  - FuturePrice: 0.9818
  - PastNews: 0.9409
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9580

=> Epoch 7/11





=== Validation Results (Epoch 7):
  - Price or Not: 0.9522
  - Direction Up: 0.9404
  - Direction Constant: 0.9461
  - Direction Down: 0.9552
  - PastPrice: 0.9470
  - FuturePrice: 0.9890
  - PastNews: 0.9480
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9621

=> Epoch 8/11





=== Validation Results (Epoch 8):
  - Price or Not: 0.9508
  - Direction Up: 0.9387
  - Direction Constant: 0.9627
  - Direction Down: 0.9579
  - PastPrice: 0.9480
  - FuturePrice: 0.9858
  - PastNews: 0.9466
  - FutureNews: 0.9882
  - Asset Comparision: 0.9956
  === Mean Weighted F1: 0.9638
  ==> Early stopping triggered.





🔍 Run 19/21 | LR=0.001, Epochs=6


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/6





=== Validation Results (Epoch 1):
  - Price or Not: 0.9079
  - Direction Up: 0.9196
  - Direction Constant: 0.9559
  - Direction Down: 0.9546
  - PastPrice: 0.9156
  - FuturePrice: 0.9885
  - PastNews: 0.9001
  - FutureNews: 0.9882
  - Asset Comparision: 0.9956
  === Mean Weighted F1: 0.9473

=> Epoch 2/6





=== Validation Results (Epoch 2):
  - Price or Not: 0.9425
  - Direction Up: 0.9375
  - Direction Constant: 0.9762
  - Direction Down: 0.9520
  - PastPrice: 0.9442
  - FuturePrice: 0.9921
  - PastNews: 0.9354
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9630

=> Epoch 3/6





=== Validation Results (Epoch 3):
  - Price or Not: 0.9397
  - Direction Up: 0.9135
  - Direction Constant: 0.9730
  - Direction Down: 0.9333
  - PastPrice: 0.9312
  - FuturePrice: 0.9868
  - PastNews: 0.9364
  - FutureNews: 0.9882
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9556

=> Epoch 4/6





=== Validation Results (Epoch 4):
  - Price or Not: 0.9580
  - Direction Up: 0.9396
  - Direction Constant: 0.9770
  - Direction Down: 0.9613
  - PastPrice: 0.9436
  - FuturePrice: 0.9753
  - PastNews: 0.9556
  - FutureNews: 0.9883
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9665

=> Epoch 5/6





=== Validation Results (Epoch 5):
  - Price or Not: 0.9460
  - Direction Up: 0.9319
  - Direction Constant: 0.9809
  - Direction Down: 0.9624
  - PastPrice: 0.9457
  - FuturePrice: 0.9902
  - PastNews: 0.9444
  - FutureNews: 0.9902
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9657
  ==> Early stopping triggered.


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🔍 Run 20/21 | LR=0.001, Epochs=9
Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/9





=== Validation Results (Epoch 1):
  - Price or Not: 0.9079
  - Direction Up: 0.9196
  - Direction Constant: 0.9559
  - Direction Down: 0.9546
  - PastPrice: 0.9156
  - FuturePrice: 0.9885
  - PastNews: 0.9001
  - FutureNews: 0.9882
  - Asset Comparision: 0.9956
  === Mean Weighted F1: 0.9473

=> Epoch 2/9





=== Validation Results (Epoch 2):
  - Price or Not: 0.9425
  - Direction Up: 0.9375
  - Direction Constant: 0.9762
  - Direction Down: 0.9520
  - PastPrice: 0.9442
  - FuturePrice: 0.9921
  - PastNews: 0.9354
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9630

=> Epoch 3/9





=== Validation Results (Epoch 3):
  - Price or Not: 0.9397
  - Direction Up: 0.9135
  - Direction Constant: 0.9730
  - Direction Down: 0.9333
  - PastPrice: 0.9312
  - FuturePrice: 0.9868
  - PastNews: 0.9364
  - FutureNews: 0.9882
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9556

=> Epoch 4/9





=== Validation Results (Epoch 4):
  - Price or Not: 0.9580
  - Direction Up: 0.9396
  - Direction Constant: 0.9770
  - Direction Down: 0.9613
  - PastPrice: 0.9436
  - FuturePrice: 0.9753
  - PastNews: 0.9556
  - FutureNews: 0.9883
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9665

=> Epoch 5/9





=== Validation Results (Epoch 5):
  - Price or Not: 0.9460
  - Direction Up: 0.9319
  - Direction Constant: 0.9809
  - Direction Down: 0.9624
  - PastPrice: 0.9457
  - FuturePrice: 0.9902
  - PastNews: 0.9444
  - FutureNews: 0.9902
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9657
  ==> Early stopping triggered.





🔍 Run 21/21 | LR=0.001, Epochs=11


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/11





=== Validation Results (Epoch 1):
  - Price or Not: 0.9079
  - Direction Up: 0.9196
  - Direction Constant: 0.9559
  - Direction Down: 0.9546
  - PastPrice: 0.9156
  - FuturePrice: 0.9885
  - PastNews: 0.9001
  - FutureNews: 0.9882
  - Asset Comparision: 0.9956
  === Mean Weighted F1: 0.9473

=> Epoch 2/11





=== Validation Results (Epoch 2):
  - Price or Not: 0.9425
  - Direction Up: 0.9375
  - Direction Constant: 0.9762
  - Direction Down: 0.9520
  - PastPrice: 0.9442
  - FuturePrice: 0.9921
  - PastNews: 0.9354
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9630

=> Epoch 3/11





=== Validation Results (Epoch 3):
  - Price or Not: 0.9397
  - Direction Up: 0.9135
  - Direction Constant: 0.9730
  - Direction Down: 0.9333
  - PastPrice: 0.9312
  - FuturePrice: 0.9868
  - PastNews: 0.9364
  - FutureNews: 0.9882
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9556

=> Epoch 4/11





=== Validation Results (Epoch 4):
  - Price or Not: 0.9580
  - Direction Up: 0.9396
  - Direction Constant: 0.9770
  - Direction Down: 0.9613
  - PastPrice: 0.9436
  - FuturePrice: 0.9753
  - PastNews: 0.9556
  - FutureNews: 0.9883
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9665

=> Epoch 5/11





=== Validation Results (Epoch 5):
  - Price or Not: 0.9460
  - Direction Up: 0.9319
  - Direction Constant: 0.9809
  - Direction Down: 0.9624
  - PastPrice: 0.9457
  - FuturePrice: 0.9902
  - PastNews: 0.9444
  - FutureNews: 0.9902
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9657
  ==> Early stopping triggered.





==> Best Configuration Found:
LR               0.001
Epochs               6
ValLoss       0.100429
ValMeanF1     0.963028
TestLoss      0.103347
TestMeanF1    0.963393
Method         adapter
Name: 18, dtype: object
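The search appears to keep the run with the highest validation mean F1 and report its test scores. The sketch below shows that selection step without dependencies. It assumes each run is stored as a dict with the column names printed above. `select_best` is a hypothetical helper. The printout suggests the notebook itself uses a pandas DataFrame, where `results_df.loc[results_df["ValMeanF1"].idxmax()]` would do the same.

Runs 19, 20 and 21 produce identical validation curves: they share a seed and all stop early at epoch 5, so their ValMeanF1 values tie. Like pandas `idxmax`, Python's `max` returns the first maximum it encounters. That explains why the reported row is index 18 (Run 19, Epochs=6).

```python
def select_best(results, metric="ValMeanF1"):
    """Return (index, row) for the run with the highest `metric`.

    `results` is a list of dicts, one per search run (a hypothetical layout
    mirroring the printed columns LR, Epochs, ValLoss, ValMeanF1, ...).
    Ties resolve to the earliest run, as with pandas' idxmax.
    """
    best_index = max(range(len(results)), key=lambda i: results[i][metric])
    return best_index, results[best_index]


# The three LR=0.001 adapter runs (indices 18-20) tie at 0.963028 in the log;
# the 18 earlier rows are placeholders with a lower score, not logged values.
runs = [{"LR": None, "Epochs": None, "ValMeanF1": 0.95} for _ in range(18)]
runs += [{"LR": 0.001, "Epochs": e, "ValMeanF1": 0.963028} for e in (6, 9, 11)]
index, best = select_best(runs)
print(index, best["Epochs"])  # 18 6
```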


## Training models with best hyperparameters
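The nine training cells in this section differ only in `model_name` and `method`. The learning rate, epoch count and batch size depend only on the method. The sketch below drives the same runs from one table. It assumes `prepare_data` and `train_model` keep the keyword signatures used in those cells. `BEST_HPARAMS`, `MODELS` and `run_all` are hypothetical names. The two functions are passed in as arguments so the driver can be exercised without a GPU.

```python
# Best hyperparameters per method, copied from the training cells in this section.
BEST_HPARAMS = {
    "full": {"learning_rate": 1e-5, "num_epochs": 3, "batch_size": 8},
    "lora": {"learning_rate": 1e-3, "num_epochs": 3, "batch_size": 16},
    "adapter": {"learning_rate": 1e-3, "num_epochs": 6, "batch_size": 16},
}
MODELS = ["bert-base-uncased", "yiyanghkust/finbert-pretrain", "SALT-NLP/FLANG-BERT"]


def run_all(prepare_data, train_model, dataset_path, device, seed=42):
    """Train every (model, method) pair and collect the test mean F1 scores."""
    summary = {}
    for model_name in MODELS:
        for method, hp in BEST_HPARAMS.items():
            # Dataloaders are rebuilt per run because the tokenizer and batch size change
            dataloaders, label_names = prepare_data(
                model_name=model_name,
                dataset_path=dataset_path,
                batch_size=hp["batch_size"],
                seed=seed,
            )
            result = train_model(
                model_name=model_name,
                method=method,
                learning_rate=hp["learning_rate"],
                num_epochs=hp["num_epochs"],
                batch_size=hp["batch_size"],
                seed=seed,
                device=device,
                label_names=label_names,
                dataloaders=dataloaders,
            )
            summary[(model_name, method)] = result["TestMeanF1"]
    return summary
```

In the notebook this would be called as `summary = run_all(prepare_data, train_model, '/content/finalDataset_0208.csv', device)`.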

### Bert training

Full fine-tuning

In [None]:
import torch

# Best hyperparameters for full fine-tuning of BERT
model_name = "bert-base-uncased"
method = "full"
lr = 1e-5
epochs = 3
batch_size = 8
seed = 42
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dataset_path = '/content/finalDataset_0208.csv'

# Load the dataset and build dataloaders tokenized with this model's tokenizer
dataloaders, label_names = prepare_data(
    model_name=model_name,
    dataset_path=dataset_path,
    batch_size=batch_size,
    seed=seed,
)

# Train, validate after each epoch and evaluate on the test split
result = train_model(
    model_name=model_name,
    method=method,
    learning_rate=lr,
    num_epochs=epochs,
    batch_size=batch_size,
    seed=seed,
    device=device,
    label_names=label_names,
    dataloaders=dataloaders,
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.9257
  - Direction Up: 0.9474
  - Direction Constant: 0.9386
  - Direction Down: 0.9536
  - PastPrice: 0.9164
  - FuturePrice: 0.9794
  - PastNews: 0.9235
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9518

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.9424
  - Direction Up: 0.9457
  - Direction Constant: 0.9521
  - Direction Down: 0.9563
  - PastPrice: 0.9271
  - FuturePrice: 0.9756
  - PastNews: 0.9399
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9585

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.9577
  - Direction Up: 0.9508
  - Direction Constant: 0.9791
  - Direction Down: 0.9666
  - PastPrice: 0.9532
  - FuturePrice: 0.9899
  - PastNews: 0.9545
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9710




In [None]:
print("Full fine tuning result for News Headline Classification on BERT model",result["TestMeanF1"])

Full fine tuning result for News Headline Classification on BERT model 0.9687936119060078
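Each epoch logs one weighted F1 per label plus a "Mean Weighted F1". The mean is the plain average of the nine per-label scores. For epoch 3 of the run above, the nine values average to 0.9710, which is the reported mean. Each per-label score looks like scikit-learn's `f1_score(..., average="weighted")` applied to that binary column, but this is an assumption.

The pure-Python sketch below reproduces that computation. `f1_for_class`, `weighted_f1` and `mean_weighted_f1` are hypothetical names, not the notebook's functions.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 = 2TP / (2TP + FP + FN) for one class of a single label column."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true support."""
    n = len(y_true)
    return sum(
        y_true.count(cls) / n * f1_for_class(y_true, y_pred, cls)
        for cls in set(y_true)
    )


def mean_weighted_f1(true_rows, pred_rows, label_names):
    """Weighted F1 for each label column, plus the plain mean over labels."""
    per_label = {}
    for j, name in enumerate(label_names):
        col_true = [row[j] for row in true_rows]
        col_pred = [row[j] for row in pred_rows]
        per_label[name] = weighted_f1(col_true, col_pred)
    return per_label, sum(per_label.values()) / len(per_label)


# Check against the log: the nine per-label scores of epoch 3 above
logged_epoch3 = [0.9577, 0.9508, 0.9791, 0.9666, 0.9532, 0.9899, 0.9545, 0.9882, 0.9991]
print(round(sum(logged_epoch3) / len(logged_epoch3), 4))  # 0.971
```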


LoRA fine-tuning

In [None]:
import torch

# Best hyperparameters for LoRA fine-tuning of BERT
model_name = "bert-base-uncased"
method = "lora"
lr = 1e-3
epochs = 3
batch_size = 16
seed = 42
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dataset_path = '/content/finalDataset_0208.csv'

# Load the dataset and build dataloaders tokenized with this model's tokenizer
dataloaders, label_names = prepare_data(
    model_name=model_name,
    dataset_path=dataset_path,
    batch_size=batch_size,
    seed=seed,
)

# Train, validate after each epoch and evaluate on the test split
result = train_model(
    model_name=model_name,
    method=method,
    learning_rate=lr,
    num_epochs=epochs,
    batch_size=batch_size,
    seed=seed,
    device=device,
    label_names=label_names,
    dataloaders=dataloaders,
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.9455
  - Direction Up: 0.9211
  - Direction Constant: 0.9456
  - Direction Down: 0.9511
  - PastPrice: 0.9339
  - FuturePrice: 0.9818
  - PastNews: 0.9423
  - FutureNews: 0.9882
  - Asset Comparision: 0.9930
  === Mean Weighted F1: 0.9558

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.9517
  - Direction Up: 0.9405
  - Direction Constant: 0.9617
  - Direction Down: 0.9396
  - PastPrice: 0.9455
  - FuturePrice: 0.9858
  - PastNews: 0.9487
  - FutureNews: 0.9877
  - Asset Comparision: 0.9921
  === Mean Weighted F1: 0.9615

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.9497
  - Direction Up: 0.9333
  - Direction Constant: 0.9739
  - Direction Down: 0.9474
  - PastPrice: 0.9359
  - FuturePrice: 0.9844
  - PastNews: 0.9459
  - FutureNews: 0.9882
  - Asset Comparision: 0.9965
  === Mean Weighted F1: 0.9617




In [None]:
print("Lora fine tuning result for News Headline Classification on BERT model",result["TestMeanF1"])

Lora fine tuning result for News Headline Classification on BERT model 0.9609673319412221
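Each run logs a line such as `Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)`. The sketch below shows one way to produce that line. It assumes the counts come from summing `numel()` over the model's parameters and splitting on `requires_grad`. `count_parameters` and `format_parameter_report` are hypothetical helpers. The counter only needs objects with `.numel()` and `.requires_grad`, so it accepts `model.parameters()` from PyTorch directly.

```python
def count_parameters(parameters):
    """Return (trainable, total) element counts for an iterable of tensors.

    Works with torch's model.parameters(): each item must expose .numel()
    and .requires_grad.
    """
    trainable = total = 0
    for p in parameters:
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return trainable, total


def format_parameter_report(method, trainable, total):
    """Format the counts like the log lines in this notebook."""
    pct = 100.0 * trainable / total
    return f"Trainable parameters in {method} case: {trainable:,} / {total:,} ({pct:.2f}%)"


# With a real model: print(format_parameter_report("Lora", *count_parameters(model.parameters())))
print(format_parameter_report("Lora", 449_289, 109_938_450))
```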


Adapter fine-tuning

In [None]:
import torch

# Best hyperparameters for adapter fine-tuning of BERT
model_name = "bert-base-uncased"
method = "adapter"
lr = 1e-3
epochs = 6
batch_size = 16
seed = 42
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dataset_path = '/content/finalDataset_0208.csv'

# Load the dataset and build dataloaders tokenized with this model's tokenizer
dataloaders, label_names = prepare_data(
    model_name=model_name,
    dataset_path=dataset_path,
    batch_size=batch_size,
    seed=seed,
)

# Train, validate after each epoch and evaluate on the test split
result = train_model(
    model_name=model_name,
    method=method,
    learning_rate=lr,
    num_epochs=epochs,
    batch_size=batch_size,
    seed=seed,
    device=device,
    label_names=label_names,
    dataloaders=dataloaders,
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/6





=== Validation Results (Epoch 1):
  - Price or Not: 0.9079
  - Direction Up: 0.9196
  - Direction Constant: 0.9559
  - Direction Down: 0.9546
  - PastPrice: 0.9156
  - FuturePrice: 0.9885
  - PastNews: 0.9001
  - FutureNews: 0.9882
  - Asset Comparision: 0.9956
  === Mean Weighted F1: 0.9473

=> Epoch 2/6





=== Validation Results (Epoch 2):
  - Price or Not: 0.9425
  - Direction Up: 0.9375
  - Direction Constant: 0.9762
  - Direction Down: 0.9520
  - PastPrice: 0.9442
  - FuturePrice: 0.9921
  - PastNews: 0.9354
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9630

=> Epoch 3/6





=== Validation Results (Epoch 3):
  - Price or Not: 0.9397
  - Direction Up: 0.9135
  - Direction Constant: 0.9730
  - Direction Down: 0.9333
  - PastPrice: 0.9312
  - FuturePrice: 0.9868
  - PastNews: 0.9364
  - FutureNews: 0.9882
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9556

=> Epoch 4/6





=== Validation Results (Epoch 4):
  - Price or Not: 0.9580
  - Direction Up: 0.9396
  - Direction Constant: 0.9770
  - Direction Down: 0.9613
  - PastPrice: 0.9436
  - FuturePrice: 0.9753
  - PastNews: 0.9556
  - FutureNews: 0.9883
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9665

=> Epoch 5/6





=== Validation Results (Epoch 5):
  - Price or Not: 0.9460
  - Direction Up: 0.9319
  - Direction Constant: 0.9809
  - Direction Down: 0.9624
  - PastPrice: 0.9457
  - FuturePrice: 0.9902
  - PastNews: 0.9444
  - FutureNews: 0.9902
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9657
  ==> Early stopping triggered.




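The "Early stopping triggered" lines are consistent with patience-based stopping on validation loss with a patience of three epochs. This is an inference from the logs, not from the training code. Two details support it:

- The search run with these settings reports ValMeanF1 0.963028, which matches epoch 2 rather than the higher epoch-4 score.
- Training stops after epoch 5, exactly three epochs later.

The sketch below implements that rule under this assumption. `EarlyStopping` is a hypothetical class, and a real loop would also save the model state whenever `best_epoch` changes.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Hypothetical per-epoch validation losses; only 0.100429 (epoch 2) appears in the log.
val_losses = [0.130, 0.100429, 0.110, 0.105, 0.104]
stopper = EarlyStopping(patience=3)
stop_epoch = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(epoch, loss):
        stop_epoch = epoch
        break
print(stop_epoch, stopper.best_epoch)  # 5 2
```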
In [None]:
print("Adapter fine tuning result for News Headline Classification on BERT model",result["TestMeanF1"])

Adapter fine tuning result for News Headline Classification on BERT model 0.9633931512407339


### FinBert training

Full fine-tuning

In [None]:
import torch

# Best hyperparameters for full fine-tuning of FinBERT
model_name = "yiyanghkust/finbert-pretrain"
method = "full"
lr = 1e-5
epochs = 3
batch_size = 8
seed = 42
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dataset_path = '/content/finalDataset_0208.csv'

# Load the dataset and build dataloaders tokenized with this model's tokenizer
dataloaders, label_names = prepare_data(
    model_name=model_name,
    dataset_path=dataset_path,
    batch_size=batch_size,
    seed=seed,
)

# Train, validate after each epoch and evaluate on the test split
result = train_model(
    model_name=model_name,
    method=method,
    learning_rate=lr,
    num_epochs=epochs,
    batch_size=batch_size,
    seed=seed,
    device=device,
    label_names=label_names,
    dataloaders=dataloaders,
)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at yiyanghkust/finbert-pretrain and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Bert case: 109,758,729 / 109,758,729 (100.00%)

=> Epoch 1/3






=== Validation Results (Epoch 1):
  - Price or Not: 0.9372
  - Direction Up: 0.9401
  - Direction Constant: 0.9704
  - Direction Down: 0.9580
  - PastPrice: 0.9195
  - FuturePrice: 0.9769
  - PastNews: 0.9254
  - FutureNews: 0.9882
  - Asset Comparision: 0.9938
  === Mean Weighted F1: 0.9566

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.9510
  - Direction Up: 0.9553
  - Direction Constant: 0.9786
  - Direction Down: 0.9615
  - PastPrice: 0.9432
  - FuturePrice: 0.9870
  - PastNews: 0.9452
  - FutureNews: 0.9882
  - Asset Comparision: 0.9974
  === Mean Weighted F1: 0.9675

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.9544
  - Direction Up: 0.9499
  - Direction Constant: 0.9772
  - Direction Down: 0.9587
  - PastPrice: 0.9508
  - FuturePrice: 0.9876
  - PastNews: 0.9483
  - FutureNews: 0.9895
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9684




In [None]:
print("Full fine tuning result for News Headline Classification on FinBERT model", result["TestMeanF1"])

Full fine tuning result for News Headline Classification on FinBERT model 0.9674159313667696

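The FinBERT runs print `Asking to truncate to max_length but no maximum length is provided...`. This means the checkpoint's tokenizer defines no maximum length, so `truncation=True` alone does nothing for this model. News headlines are short, so this probably does not affect the scores. Passing `max_length` explicitly makes truncation effective either way.

The sketch below assumes `prepare_data` calls the tokenizer with `truncation=True`; its body is not shown in this section. `effective_max_length` is a hypothetical helper, and 512 is the BERT-base position-embedding limit.

```python
# Transformers falls back to a very large sentinel, int(1e30), for
# tokenizer.model_max_length when a checkpoint does not define a limit.
BERT_MAX_POSITIONS = 512  # BERT-base position-embedding limit


def effective_max_length(model_max_length, cap=BERT_MAX_POSITIONS):
    """Return a usable max_length for truncation, capped at `cap`."""
    if model_max_length is None or model_max_length > cap:
        return cap
    return model_max_length


# Hypothetical use inside prepare_data:
# encodings = tokenizer(texts, truncation=True, padding=True,
#                       max_length=effective_max_length(tokenizer.model_max_length))
```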

LoRA fine-tuning

In [None]:
import torch

# Best hyperparameters for LoRA fine-tuning of FinBERT
model_name = "yiyanghkust/finbert-pretrain"
method = "lora"
lr = 1e-3
epochs = 3
batch_size = 16
seed = 42
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dataset_path = '/content/finalDataset_0208.csv'

# Load the dataset and build dataloaders tokenized with this model's tokenizer
dataloaders, label_names = prepare_data(
    model_name=model_name,
    dataset_path=dataset_path,
    batch_size=batch_size,
    seed=seed,
)

# Train, validate after each epoch and evaluate on the test split
result = train_model(
    model_name=model_name,
    method=method,
    learning_rate=lr,
    num_epochs=epochs,
    batch_size=batch_size,
    seed=seed,
    device=device,
    label_names=label_names,
    dataloaders=dataloaders,
)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at yiyanghkust/finbert-pretrain and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 110,208,018 (0.41%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.9366
  - Direction Up: 0.9162
  - Direction Constant: 0.9706
  - Direction Down: 0.9553
  - PastPrice: 0.9219
  - FuturePrice: 0.9836
  - PastNews: 0.9294
  - FutureNews: 0.9882
  - Asset Comparision: 0.9956
  === Mean Weighted F1: 0.9553

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.9420
  - Direction Up: 0.9368
  - Direction Constant: 0.9777
  - Direction Down: 0.9554
  - PastPrice: 0.9420
  - FuturePrice: 0.9847
  - PastNews: 0.9422
  - FutureNews: 0.9900
  - Asset Comparision: 0.9938
  === Mean Weighted F1: 0.9627

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.9347
  - Direction Up: 0.9307
  - Direction Constant: 0.9738
  - Direction Down: 0.9388
  - PastPrice: 0.9301
  - FuturePrice: 0.9849
  - PastNews: 0.9289
  - FutureNews: 0.9877
  - Asset Comparision: 0.9912
  === Mean Weighted F1: 0.9556




In [None]:
print("Lora fine tuning result for News Headline Classification on FinBERT model", result["TestMeanF1"])

Lora fine tuning result for News Headline Classification on FinBERT model 0.958404170123917

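The LoRA configuration is not shown in this section, but the logged counts can be checked by arithmetic. A LoRA update of a `d_out x d_in` weight adds `r * (d_in + d_out)` parameters. Subtracting the classifier head (768 x 9 weights + 9 biases = 6,921) from the 449,289 trainable parameters leaves 442,368. Several configurations would produce that figure, for example:

- rank 8 on the query, key and value projections of all 12 layers;
- rank 12 on query and value only.

The logged totals (109,938,450 for BERT, 110,208,018 for FinBERT) also appear to count the classifier twice. That fits PEFT keeping a trainable copy of the head next to the original, but it is an inference, not the notebook's confirmed setup.

```python
HIDDEN = 768      # BERT-base hidden size
NUM_LAYERS = 12   # BERT-base encoder layers
NUM_LABELS = 9    # one output per headline label in the logs


def lora_params(rank, d_in, d_out):
    """LoRA adds A (rank x d_in) and B (d_out x rank) beside a frozen weight."""
    return rank * (d_in + d_out)


classifier_head = HIDDEN * NUM_LABELS + NUM_LABELS               # 6,921
lora_r8_qkv = NUM_LAYERS * 3 * lora_params(8, HIDDEN, HIDDEN)    # 442,368
lora_r12_qv = NUM_LAYERS * 2 * lora_params(12, HIDDEN, HIDDEN)   # 442,368

trainable = classifier_head + lora_r8_qkv
print(f"{trainable:,}")  # 449,289
# Base BERT model (109,489,161) + LoRA matrices + a second copy of the classifier head
print(f"{109_489_161 + lora_r8_qkv + classifier_head:,}")  # 109,938,450
```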

Adapter fine-tuning

In [None]:
import torch

# Best hyperparameters for adapter fine-tuning of FinBERT
model_name = "yiyanghkust/finbert-pretrain"
method = "adapter"
lr = 1e-3
epochs = 6
batch_size = 16
seed = 42
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dataset_path = '/content/finalDataset_0208.csv'

# Load the dataset and build dataloaders tokenized with this model's tokenizer
dataloaders, label_names = prepare_data(
    model_name=model_name,
    dataset_path=dataset_path,
    batch_size=batch_size,
    seed=seed,
)

# Train, validate after each epoch and evaluate on the test split
result = train_model(
    model_name=model_name,
    method=method,
    learning_rate=lr,
    num_epochs=epochs,
    batch_size=batch_size,
    seed=seed,
    device=device,
    label_names=label_names,
    dataloaders=dataloaders,
)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at yiyanghkust/finbert-pretrain and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Adapter case: 1,795,977 / 111,547,785 (1.61%)

=> Epoch 1/6





=== Validation Results (Epoch 1):
  - Price or Not: 0.8883
  - Direction Up: 0.9357
  - Direction Constant: 0.9733
  - Direction Down: 0.9300
  - PastPrice: 0.8983
  - FuturePrice: 0.9879
  - PastNews: 0.8835
  - FutureNews: 0.9877
  - Asset Comparision: 0.9878
  === Mean Weighted F1: 0.9414

=> Epoch 2/6





=== Validation Results (Epoch 2):
  - Price or Not: 0.9393
  - Direction Up: 0.9455
  - Direction Constant: 0.9787
  - Direction Down: 0.9545
  - PastPrice: 0.9428
  - FuturePrice: 0.9892
  - PastNews: 0.9349
  - FutureNews: 0.9894
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9636

=> Epoch 3/6





=== Validation Results (Epoch 3):
  - Price or Not: 0.9369
  - Direction Up: 0.9457
  - Direction Constant: 0.9802
  - Direction Down: 0.9266
  - PastPrice: 0.9309
  - FuturePrice: 0.9896
  - PastNews: 0.9333
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9590

=> Epoch 4/6





=== Validation Results (Epoch 4):
  - Price or Not: 0.9469
  - Direction Up: 0.9396
  - Direction Constant: 0.9779
  - Direction Down: 0.9560
  - PastPrice: 0.9436
  - FuturePrice: 0.9907
  - PastNews: 0.9399
  - FutureNews: 0.9895
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9647

=> Epoch 5/6





=== Validation Results (Epoch 5):
  - Price or Not: 0.9521
  - Direction Up: 0.9509
  - Direction Constant: 0.9835
  - Direction Down: 0.9477
  - PastPrice: 0.9470
  - FuturePrice: 0.9921
  - PastNews: 0.9438
  - FutureNews: 0.9864
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9669
  ==> Early stopping triggered.




In [None]:
print("Adapter fine tuning result for News Headline Classification on FinBERT model", result["TestMeanF1"])

Adapter fine tuning result for News Headline Classification on FinBERT model 0.9674231836641243

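The adapter configuration is also not shown here, but the logged counts are consistent with a specific bottleneck layout. A bottleneck adapter has a down-projection to `hidden / reduction_factor` and an up-projection back, each with a bias. Placing two of them in each of the 12 layers with reduction factor 16 (a Houlsby-style setup) gives 1,789,056 parameters. Adding the 6,921-parameter classifier head gives the logged 1,795,977 trainable parameters. Adding the same 1,789,056 to the base model reproduces both logged totals: 111,278,217 for BERT and 111,547,785 for FinBERT. This decomposition is an assumption that fits the numbers, not a confirmed description of the notebook's adapter config.

```python
HIDDEN = 768      # BERT-base hidden size
NUM_LAYERS = 12   # BERT-base encoder layers
NUM_LABELS = 9    # one output per headline label in the logs


def bottleneck_adapter_params(hidden, reduction_factor):
    """Parameters of one down-projection + up-projection adapter, with biases."""
    bottleneck = hidden // reduction_factor
    down = hidden * bottleneck + bottleneck   # down-projection weight + bias
    up = bottleneck * hidden + hidden         # up-projection weight + bias
    return down + up


classifier_head = HIDDEN * NUM_LABELS + NUM_LABELS                  # 6,921
adapters = NUM_LAYERS * 2 * bottleneck_adapter_params(HIDDEN, 16)   # two per layer

print(f"{adapters + classifier_head:,}")  # 1,795,977 trainable
print(f"{109_489_161 + adapters:,}")      # 111,278,217 total for BERT
```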

### FlangBert training

Full fine-tuning

In [None]:
import torch

# Best hyperparameters for full fine-tuning of FLANG-BERT
model_name = "SALT-NLP/FLANG-BERT"
method = "full"
lr = 1e-5
epochs = 3
batch_size = 8
seed = 42
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dataset_path = '/content/finalDataset_0208.csv'

# Load the dataset and build dataloaders tokenized with this model's tokenizer
dataloaders, label_names = prepare_data(
    model_name=model_name,
    dataset_path=dataset_path,
    batch_size=batch_size,
    seed=seed,
)

# Train, validate after each epoch and evaluate on the test split
result = train_model(
    model_name=model_name,
    method=method,
    learning_rate=lr,
    num_epochs=epochs,
    batch_size=batch_size,
    seed=seed,
    device=device,
    label_names=label_names,
    dataloaders=dataloaders,
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Bert case: 109,489,161 / 109,489,161 (100.00%)

=> Epoch 1/3






=== Validation Results (Epoch 1):
  - Price or Not: 0.9272
  - Direction Up: 0.9422
  - Direction Constant: 0.9704
  - Direction Down: 0.9544
  - PastPrice: 0.9182
  - FuturePrice: 0.9794
  - PastNews: 0.9207
  - FutureNews: 0.9882
  - Asset Comparision: 0.9921
  === Mean Weighted F1: 0.9548

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.9403
  - Direction Up: 0.9483
  - Direction Constant: 0.9815
  - Direction Down: 0.9564
  - PastPrice: 0.9324
  - FuturePrice: 0.9864
  - PastNews: 0.9369
  - FutureNews: 0.9882
  - Asset Comparision: 0.9974
  === Mean Weighted F1: 0.9631

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.9472
  - Direction Up: 0.9527
  - Direction Constant: 0.9811
  - Direction Down: 0.9607
  - PastPrice: 0.9460
  - FuturePrice: 0.9900
  - PastNews: 0.9400
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9672




In [None]:
print("Full fine tuning result for News Headline Classification on FLANG-BERT model", result["TestMeanF1"])

Full fine tuning result for News Headline Classification on FLANG-BERT model 0.96784471634007


LoRA fine-tuning

In [None]:
import torch

# Best hyperparameters for LoRA fine-tuning of FLANG-BERT
model_name = "SALT-NLP/FLANG-BERT"
method = "lora"
lr = 1e-3
epochs = 3
batch_size = 16
seed = 42
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dataset_path = '/content/finalDataset_0208.csv'

# Load the dataset and build dataloaders tokenized with this model's tokenizer
dataloaders, label_names = prepare_data(
    model_name=model_name,
    dataset_path=dataset_path,
    batch_size=batch_size,
    seed=seed,
)

# Train, validate after each epoch and evaluate on the test split
result = train_model(
    model_name=model_name,
    method=method,
    learning_rate=lr,
    num_epochs=epochs,
    batch_size=batch_size,
    seed=seed,
    device=device,
    label_names=label_names,
    dataloaders=dataloaders,
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Lora case: 449,289 / 109,938,450 (0.41%)

=> Epoch 1/3





=== Validation Results (Epoch 1):
  - Price or Not: 0.9467
  - Direction Up: 0.9413
  - Direction Constant: 0.9727
  - Direction Down: 0.9467
  - PastPrice: 0.9321
  - FuturePrice: 0.9844
  - PastNews: 0.9381
  - FutureNews: 0.9877
  - Asset Comparision: 0.9974
  === Mean Weighted F1: 0.9608

=> Epoch 2/3





=== Validation Results (Epoch 2):
  - Price or Not: 0.9468
  - Direction Up: 0.9369
  - Direction Constant: 0.9793
  - Direction Down: 0.9556
  - PastPrice: 0.9418
  - FuturePrice: 0.9844
  - PastNews: 0.9409
  - FutureNews: 0.9870
  - Asset Comparision: 0.9974
  === Mean Weighted F1: 0.9633

=> Epoch 3/3





=== Validation Results (Epoch 3):
  - Price or Not: 0.9531
  - Direction Up: 0.9378
  - Direction Constant: 0.9785
  - Direction Down: 0.9531
  - PastPrice: 0.9389
  - FuturePrice: 0.9880
  - PastNews: 0.9454
  - FutureNews: 0.9882
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9646




In [None]:
print("Lora fine tuning result for News Headline Classification on FLANG-BERT model",result["TestMeanF1"])

Lora fine tuning result for News Headline Classification on FLANG-BERT model 0.9647934153159602
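The `Mean Weighted F1` line in the validation logs appears to be the plain (unweighted) average of the nine per-label weighted F1 scores. A quick check against the Epoch 3 values of the Lora run above, assuming that is how `train_model` aggregates them (its definition is not part of this section):

```python
# Per-label weighted F1 scores copied from the Epoch 3 validation log of the Lora run
epoch3_f1 = {
    "Price or Not": 0.9531,
    "Direction Up": 0.9378,
    "Direction Constant": 0.9785,
    "Direction Down": 0.9531,
    "PastPrice": 0.9389,
    "FuturePrice": 0.9880,
    "PastNews": 0.9454,
    "FutureNews": 0.9882,
    "Asset Comparision": 0.9982,
}

# Unweighted mean over the nine label columns
mean_f1 = sum(epoch3_f1.values()) / len(epoch3_f1)
print(f"{mean_f1:.4f}")  # → 0.9646, matching the logged Mean Weighted F1
```

The per-label scores are rounded to four decimals in the log, so the recomputed mean can differ from the logged one in the last digit.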


Adapter Fine tuning

In [None]:
model_name = "SALT-NLP/FLANG-BERT"
method = "adapter"
lr = 1e-3
epochs = 6
batch_size = 16
seed = 42
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dataset_path = '/content/finalDataset_0208.csv'

dataloaders, label_names = prepare_data(
        model_name=model_name,
        dataset_path=dataset_path,
        batch_size=batch_size,
        seed=seed
    )

result = train_model(
                model_name=model_name,
                method=method,
                learning_rate=lr,
                num_epochs=epochs,
                batch_size=batch_size,
                seed=seed,
                device=device,
                label_names=label_names,
                dataloaders=dataloaders
            )

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters in Adapter case: 1,795,977 / 111,278,217 (1.61%)

=> Epoch 1/6





=== Validation Results (Epoch 1):
  - Price or Not: 0.9489
  - Direction Up: 0.9244
  - Direction Constant: 0.9627
  - Direction Down: 0.9327
  - PastPrice: 0.9459
  - FuturePrice: 0.9900
  - PastNews: 0.9498
  - FutureNews: 0.9882
  - Asset Comparision: 0.9858
  === Mean Weighted F1: 0.9587

=> Epoch 2/6





=== Validation Results (Epoch 2):
  - Price or Not: 0.9270
  - Direction Up: 0.9275
  - Direction Constant: 0.9802
  - Direction Down: 0.9257
  - PastPrice: 0.9201
  - FuturePrice: 0.9858
  - PastNews: 0.9190
  - FutureNews: 0.9894
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9526

=> Epoch 3/6





=== Validation Results (Epoch 3):
  - Price or Not: 0.9536
  - Direction Up: 0.9491
  - Direction Constant: 0.9699
  - Direction Down: 0.9588
  - PastPrice: 0.9483
  - FuturePrice: 0.9844
  - PastNews: 0.9513
  - FutureNews: 0.9882
  - Asset Comparision: 0.9991
  === Mean Weighted F1: 0.9670

=> Epoch 4/6





=== Validation Results (Epoch 4):
  - Price or Not: 0.9232
  - Direction Up: 0.9371
  - Direction Constant: 0.9783
  - Direction Down: 0.9553
  - PastPrice: 0.9239
  - FuturePrice: 0.9909
  - PastNews: 0.9227
  - FutureNews: 0.9882
  - Asset Comparision: 0.9982
  === Mean Weighted F1: 0.9575

=> Epoch 5/6





=== Validation Results (Epoch 5):
  - Price or Not: 0.9495
  - Direction Up: 0.9312
  - Direction Constant: 0.9792
  - Direction Down: 0.9544
  - PastPrice: 0.9426
  - FuturePrice: 0.9899
  - PastNews: 0.9490
  - FutureNews: 0.9882
  - Asset Comparision: 0.9965
  === Mean Weighted F1: 0.9645

=> Epoch 6/6





=== Validation Results (Epoch 6):
  - Price or Not: 0.9501
  - Direction Up: 0.9492
  - Direction Constant: 0.9842
  - Direction Down: 0.9530
  - PastPrice: 0.9472
  - FuturePrice: 0.9893
  - PastNews: 0.9472
  - FutureNews: 0.9882
  - Asset Comparision: 1.0000
  === Mean Weighted F1: 0.9676




In [None]:
print("Adapter fine tuning result for News Headline Classification on FLANG-BERT model",result["TestMeanF1"])

Adapter fine tuning result for News Headline Classification on FLANG-BERT model 0.968865551473793
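The trainable-parameter count reported for the adapter run (1,795,977 / 111,278,217) can be reproduced by hand. This sketch assumes the run used the `houlsby` configuration of the `adapters` library, which places one bottleneck adapter with reduction factor 16 after the attention sublayer and another after the feed-forward sublayer in each of BERT-base's 12 layers. It also assumes a 9-output classification head and the standard BERT-base size of 109,482,240 parameters including the pooler. The helper names are hypothetical:

```python
def bottleneck_adapter_params(hidden: int, reduction_factor: int) -> int:
    """Parameters of one bottleneck adapter: down- and up-projection, both with bias."""
    bottleneck = hidden // reduction_factor
    down = hidden * bottleneck + bottleneck   # Linear(hidden -> bottleneck)
    up = bottleneck * hidden + hidden         # Linear(bottleneck -> hidden)
    return down + up


def linear_head_params(hidden: int, num_outputs: int) -> int:
    """Parameters of a single Linear classification head with bias."""
    return hidden * num_outputs + num_outputs


HIDDEN, LAYERS = 768, 12             # BERT-base dimensions
ADAPTERS_PER_LAYER = 2               # after attention and after feed-forward (assumed Houlsby layout)
REDUCTION_FACTOR = 16                # assumed default of the "houlsby" config
NUM_LABELS = 9                       # the nine headline label columns
BERT_BASE_WITH_POOLER = 109_482_240  # assumed size of BERT-base including the pooler

adapter_total = bottleneck_adapter_params(HIDDEN, REDUCTION_FACTOR) * ADAPTERS_PER_LAYER * LAYERS
head = linear_head_params(HIDDEN, NUM_LABELS)
trainable = adapter_total + head
total = BERT_BASE_WITH_POOLER + head + adapter_total

print(f"{trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
# → 1,795,977 / 111,278,217 (1.61%)
```

The exact match suggests that nothing besides the adapters and the classifier head is being trained.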


# Named Entity Recognition

## Data loading and preprocessing

The dataset is stored in my Google Drive, so the file paths below depend on my Drive layout. To run the notebook yourself, download the dataset from: https://huggingface.co/datasets/tner/fin

- The paper itself does not say where to obtain this dataset. We found the same dataset on Hugging Face by following the paper's references: it cites another paper that introduced the dataset, and we traced the data from there.
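The loader below reads the raw files as CoNLL-style text. Each token is on its own line with four space-separated columns: the word comes first and the NER tag last. Blank lines separate sentences, and lines containing `DOCSTART` mark document boundaries. Here is a minimal stand-alone sketch of that parsing step, assuming exactly this layout. The function name `parse_conll` and the sample text are illustrative only, and the sketch leaves out the 512-token length filter, which needs a tokenizer:

```python
from typing import List, Tuple


def parse_conll(lines: List[str]) -> Tuple[List[List[str]], List[List[str]]]:
    """Split CoNLL-style lines into sentences (word lists) and their tag lists."""
    sentences, sentences_tags = [], []
    words, tags = [], []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            # A blank line closes the current sentence (if any)
            if words:
                sentences.append(words)
                sentences_tags.append(tags)
                words, tags = [], []
        elif "DOCSTART" not in line:
            columns = line.split(" ")
            words.append(columns[0])   # first column: the word
            tags.append(columns[-1])   # last column: the NER tag
    # Keep a final sentence that is not followed by a blank line
    if words:
        sentences.append(words)
        sentences_tags.append(tags)
    return sentences, sentences_tags


sample = """-DOCSTART- -X- -X- O

Loan NN I-NP O
to TO I-PP O
Acme NNP I-NP I-ORG

Borrower NN I-NP O
John NNP I-NP I-PER""".splitlines()

sents, tags = parse_conll(sample)
print(sents)  # → [['Loan', 'to', 'Acme'], ['Borrower', 'John']]
print(tags)   # → [['O', 'O', 'I-ORG'], ['O', 'I-PER']]
```

A loop that flushes a sentence only when it sees a blank line silently drops the last sentence if the file does not end with one. The final check handles that case.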

In [None]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [None]:
import torch
import torch.optim as optim
import numpy as np
from torch.utils.data import DataLoader, TensorDataset, random_split
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    get_linear_schedule_with_warmup
)
from peft import LoraConfig, get_peft_model, TaskType
from sklearn.metrics import f1_score
from tqdm import tqdm
import adapters
from adapters import AdapterConfig

# Device configuration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
def load_data(tokenizer, train=True, label_all_tokens=True):
    """
    Load and preprocess NER data

    Args:
        tokenizer: HuggingFace tokenizer
        train: If True, load the training file; otherwise load the test file
        label_all_tokens: Whether to label all tokens or just the first token of a word

    Returns:
        dataset: TensorDataset containing input_ids, attention_masks, and labels
        num_labels: Number of unique labels
        label_to_id: Dictionary mapping labels to IDs
    """

    # These paths depend on your Google Drive layout; update them to point at your copy of the data
    if train:
        file_name = '/content/drive/MyDrive/application_nlp cv/advanced_nlp/FIN5.txt'
    else:
        file_name = '/content/drive/MyDrive/application_nlp cv/advanced_nlp/FIN3.txt'

    with open(file_name, encoding="utf-8") as f:
        lines = [line.rstrip() for line in f]

    sentences_tags = []
    sentence_tags = []
    sentences = []
    sentence = []
    i = 0
    max_length = 0
    unique_tags = set()

    # getting sentences and corresponding tokens for each word
    while i < len(lines):
        if not lines[i] and sentence and sentence_tags:
            tokens = tokenizer(sentence, is_split_into_words=True)
            if len(tokens['input_ids']) > 512:
                sentence = []
                sentence_tags = []
                continue
            max_length = max(max_length, len(tokens['input_ids']))
            sentences.append(sentence)
            sentences_tags.append(sentence_tags)
            unique_tags = unique_tags | set(sentence_tags)
            sentence = []
            sentence_tags = []
        elif lines[i] and "DOCSTART" not in lines[i]:
            word, _, _, tag = lines[i].split(" ")
            sentence.append(word)
            sentence_tags.append(tag)
        i += 1

    label_list = list(unique_tags)
    label_list.sort()
    label_to_id = {l: i for i, l in enumerate(label_list)}
    print(label_to_id)
    num_labels = len(label_list)

    # giving it to tokenizer
    tokenized_inputs = tokenizer(sentences, max_length=max_length, padding='max_length', is_split_into_words=True, return_tensors='pt')
    input_ids = tokenized_inputs['input_ids']
    attention_masks = tokenized_inputs['attention_mask']

    # getting labels
    labels = []
    for i, label in enumerate(sentences_tags):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            if word_idx is None:
                label_ids.append(-100)
            elif word_idx != previous_word_idx:
                label_ids.append(label_to_id[label[word_idx]])
            else:
                label_ids.append(label_to_id[label[word_idx]] if label_all_tokens else -100)
            previous_word_idx = word_idx
        labels.append(label_ids)

    labels = torch.LongTensor(labels)

    return TensorDataset(input_ids, attention_masks, labels), num_labels, label_to_id
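The label loop above uses `word_ids()` to map word-level tags onto sub-word tokens:

- Special and padding tokens (word id `None`) get `-100`.
- The first sub-token of each word gets the word's tag.
- Later sub-tokens get either the same tag or `-100`, depending on `label_all_tokens`.

Below is the same logic applied to a hand-written `word_ids` list, so it can run without a tokenizer. The function name `align_labels` and the example ids are hypothetical:

```python
from typing import Dict, List, Optional


def align_labels(
    word_ids: List[Optional[int]],
    word_tags: List[str],
    label_to_id: Dict[str, int],
    label_all_tokens: bool = True,
    ignore_index: int = -100,
) -> List[int]:
    """Expand word-level tags to token-level label ids, mirroring the loop in load_data."""
    label_ids = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            # [CLS], [SEP] and padding positions are ignored by the loss
            label_ids.append(ignore_index)
        elif word_idx != previous_word_idx:
            # First sub-token of a word carries the word's tag
            label_ids.append(label_to_id[word_tags[word_idx]])
        else:
            # Continuation sub-tokens: same tag, or ignored
            label_ids.append(label_to_id[word_tags[word_idx]] if label_all_tokens else ignore_index)
        previous_word_idx = word_idx
    return label_ids


label_to_id = {"I-LOC": 0, "I-MISC": 1, "I-ORG": 2, "I-PER": 3, "O": 4}
# Hypothetical word_ids for "[CLS] acme ##corp lent [SEP] [PAD]": word 0 is split into two sub-tokens
word_ids = [None, 0, 0, 1, None, None]
tags = ["I-ORG", "O"]

print(align_labels(word_ids, tags, label_to_id, label_all_tokens=True))   # → [-100, 2, 2, 4, -100, -100]
print(align_labels(word_ids, tags, label_to_id, label_all_tokens=False))  # → [-100, 2, -100, 4, -100, -100]
```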

In [None]:
def prepare_datasets(model_name, label_all_tokens=True):
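    """
    Prepare datasets for training, validation, and testing

    Args:
        model_name: Name of the pretrained model (one of the three BERT variants compared here)
        label_all_tokens: Whether to label all tokens or just the first token of a word

    Returns:
        train_dataset: Full training dataset (split into train/val later, in train_and_evaluate)
        num_labels: Number of unique labels
        label_to_id: Dictionary mapping labels to IDs
        test_dataset: Test dataset
        train_length: Number of training examples after the 80/20 split
        val_length: Number of validation examples after the 80/20 split
        weights: Class weights for the loss (entity classes weighted equally, 'O' weighted 0)
    """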
    tokenizer = AutoTokenizer.from_pretrained(model_name, do_lower_case=False, do_basic_tokenize=True)

    train_dataset, num_labels, label_to_id = load_data(tokenizer, train=True, label_all_tokens=label_all_tokens)
    test_dataset, _, _ = load_data(tokenizer, train=False, label_all_tokens=label_all_tokens)

    # splitting val and train data
    val_length = int(len(train_dataset) * 0.2)
    train_length = len(train_dataset) - val_length

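    # Loss weights: spread the weight evenly over the entity classes and give 'O' weight 0,
    # so training focuses on the 4 entity classes. Index 'O' by name instead of assuming it is last.
    weights = torch.ones(num_labels).to(device) / (num_labels - 1)
    weights[label_to_id['O']] = 0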
    print(weights)
    print("length_dataset", train_length, val_length, len(test_dataset))

    return train_dataset, num_labels, label_to_id, test_dataset, train_length, val_length, weights
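When `weight=weights` is set, `CrossEntropyLoss` with the default `reduction='mean'` divides the weighted sum of per-token losses by the sum of the target classes' weights. Tokens labelled `O` therefore contribute to neither the numerator nor the denominator. The NumPy re-implementation below illustrates the effect of that zero weight. It is not PyTorch's code, and the logits are random placeholders:

```python
import numpy as np


def weighted_cross_entropy(logits: np.ndarray, targets: np.ndarray, weights: np.ndarray,
                           ignore_index: int = -100) -> float:
    """Weighted-mean cross-entropy, following the formula documented for reduction='mean'."""
    keep = targets != ignore_index
    logits, targets = logits[keep], targets[keep]
    # Numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    w = weights[targets]
    return float((w * nll).sum() / w.sum())


label_to_id = {"I-LOC": 0, "I-MISC": 1, "I-ORG": 2, "I-PER": 3, "O": 4}
num_labels = len(label_to_id)
weights = np.full(num_labels, 1.0 / (num_labels - 1))
weights[label_to_id["O"]] = 0.0   # → [0.25, 0.25, 0.25, 0.25, 0.0]

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, num_labels))
targets = np.array([2, 4, 4, -100])          # one entity token, two 'O' tokens, one padding
only_entity = np.array([2, -100, -100, -100])

# 'O' tokens and padding are both invisible to the loss
print(np.isclose(weighted_cross_entropy(logits, targets, weights),
                 weighted_cross_entropy(logits, only_entity, weights)))  # → True
```

One consequence: if every non-padding token in a batch is `O`, the denominator is zero and the loss is NaN. This may be one source of the NaN-loss batches that the training loop skips.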

In [None]:
def get_model(model_name, num_labels, fine_tuning_method="full"):
    """
    Get model based on fine-tuning method

    Args:
        model_name: Name of the pretrained model
        num_labels: Number of unique labels
        fine_tuning_method: One of "full", "lora", or "adapter"

    Returns:
        model: Model with specified fine-tuning configuration
    """
    model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=num_labels)

    if fine_tuning_method == "lora":
        # LoRA configuration
        lora_config = LoraConfig(
            task_type=TaskType.TOKEN_CLS,
            r=16,
            lora_alpha=16,
            lora_dropout=0.05,
            target_modules=["query", "key", "value"],
            bias="none"
        )
        model = get_peft_model(model, lora_config)
    elif fine_tuning_method == "adapter":
        # Adapter configuration
        adapter_config = AdapterConfig.load("houlsby")
        adapters.init(model)
        adapter_name = "ner_adapter"
        model.add_adapter(adapter_name, config=adapter_config)
        model.train_adapter(adapter_name)
        model.set_active_adapters(adapter_name)

    model.to(device)

    # Print trainable parameters (to verify that LoRA/adapters really train only a small fraction of the weights)
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Trainable parameters: {trainable_params:,} / {total_params:,}",
          f"({100 * trainable_params / total_params:.2f}%)")

    return model
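In the LoRA branch, each targeted `Linear(d_in, d_out)` gains two low-rank matrices, A (r × d_in) and B (d_out × r). Each such layer therefore adds r·(d_in + d_out) trainable parameters, with no bias because `bias="none"`. Below is a back-of-the-envelope count for the configuration above on a BERT-base encoder (12 layers, hidden size 768). It assumes PEFT also keeps the 5-label token-classification head trainable, and `lora_params` is a hypothetical helper:

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Extra trainable parameters LoRA adds to one Linear layer: A (r x d_in) + B (d_out x r)."""
    return r * d_in + d_out * r


HIDDEN, LAYERS = 768, 12          # BERT-base encoder
TARGETS_PER_LAYER = 3             # query, key, value
RANK = 16                         # r=16 as in the LoraConfig above
NUM_LABELS = 5                    # I-LOC, I-MISC, I-ORG, I-PER, O

lora_total = lora_params(HIDDEN, HIDDEN, RANK) * TARGETS_PER_LAYER * LAYERS
head = HIDDEN * NUM_LABELS + NUM_LABELS   # classifier weight + bias

print(f"LoRA matrices: {lora_total:,}")          # → LoRA matrices: 884,736
print(f"With head:     {lora_total + head:,}")   # → With head:     888,581
```

Compared with the 108,895,493 parameters updated in full fine-tuning, this is under 1% of the model.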

In [None]:
def train_and_evaluate(
    model_name,
    fine_tuning_method="full",
    batch_sizes=[16],
    learning_rates=[1e-4],
    num_epochs=5,
    early_stopping=3,
    label_all_tokens=True,
    seed=42
):
    """
    Train and evaluate model with specified hyperparameters

    Args:
        model_name: Name of the pretrained model
        fine_tuning_method: One of "full", "lora", or "adapter"
        batch_sizes: List of batch sizes to try (due to limited compute, we fixed the batch size to the value reported in the paper)
        learning_rates: List of learning rates to try
        num_epochs: Maximum number of epochs to train
        early_stopping: Number of epochs with no improvement before stopping
        label_all_tokens: Whether to label all tokens or just the first token of a word
        seed: Random seed

    Returns:
        results: List of results for each hyperparameter configuration
    """
    # Prepare datasets
    train_dataset, num_labels, label_to_id, test_dataset, train_length, val_length, weights = prepare_datasets(
        model_name, label_all_tokens
    )

    # Set loss function
    criterion = torch.nn.CrossEntropyLoss(weight=weights)

    results = []
    eps = 1e-2
    count = 0

    for j, lr in enumerate(learning_rates):
        for k, bs in enumerate(batch_sizes):
            count += 1
            print(f'\n== Experiment {count} of {len(batch_sizes) * len(learning_rates)}:')
            print(f'=> Learning Rate: {lr}, Batch Size: {bs}')

            # Set random seeds
            torch.manual_seed(seed)
            np.random.seed(seed)

            # Split datasets
            train, val = random_split(dataset=train_dataset, lengths=[train_length, val_length])
            dataloaders_dict = {
                'train': DataLoader(train, batch_size=bs, shuffle=True),
                'val': DataLoader(val, batch_size=bs, shuffle=False),
                'test': DataLoader(test_dataset, batch_size=bs, shuffle=False)
            }

            # Get model
            model = get_model(model_name, num_labels, fine_tuning_method)

            # Setup optimizer
            optimizer = optim.AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=lr)

            # Add learning rate scheduler
            total_steps = len(dataloaders_dict['train']) * num_epochs
            scheduler = get_linear_schedule_with_warmup(
                optimizer,
                num_warmup_steps=int(0.1 * total_steps),  # 10% warmup
                num_training_steps=total_steps
            )

            early_stopping_count = 0
            best_accuracy = float('-inf')
            best_f1 = float('-inf')
            best_ce = float('inf')

            # Previous epoch's training loss, used to flag sudden loss spikes.
            # (Some batches produced NaN logits or losses; those batches are skipped below.)
            prev_loss = None

            for epoch in range(num_epochs):
                print(f'\n=== Epoch {epoch + 1}')
                if early_stopping_count >= early_stopping:
                    break

                for phase in ['train', 'val']:
                    print(f'-- {phase.upper()} Phase')
                    if phase == 'train':
                        model.train()
                        early_stopping_count += 1
                    else:
                        model.eval()

                    curr_total = 0
                    curr_correct = 0
                    curr_accuracy = 0
                    curr_ce = 0
                    actual = np.array([])
                    pred = np.array([])

                    # Track running loss for monitoring
                    running_loss = 0.0
                    num_batches = 0

                    for input_ids, attention_masks, labels in tqdm(dataloaders_dict[phase], desc=f"{phase.capitalize()} Loop"):
                        input_ids = input_ids.to(device)
                        attention_masks = attention_masks.to(device)
                        labels = labels.to(device)

                        optimizer.zero_grad()

                        with torch.set_grad_enabled(phase == 'train'):
                            outputs = model(input_ids=input_ids, attention_mask=attention_masks, labels=labels)

                            # Handle potential NaN values in logits
                            if torch.isnan(outputs.logits).any():
                                print(f"Warning: NaN detected in logits - skipping batch")
                                continue

                            active_loss = attention_masks.view(-1) == 1
                            logits = outputs.logits
                            active_logits = logits.view(-1, num_labels)
                            active_labels = torch.where(
                                active_loss, labels.view(-1), torch.tensor(criterion.ignore_index).type_as(labels)
                            )

                            # Check if we have valid elements for loss calculation
                            if torch.sum(active_loss) == 0:
                                print("Warning: No active elements for loss calculation")
                                continue

                            # Loss calculation with safety checks
                            try:
                                loss = criterion(active_logits, active_labels)

                                # Skip batch if loss is NaN
                                if torch.isnan(loss).any():
                                    print(f"Warning: NaN loss detected - skipping batch")
                                    continue

                                # Track running loss
                                running_loss += loss.item()
                                num_batches += 1

                                if phase == 'train':
                                    loss.backward()

                                    # Gradient clipping to prevent explosion
                                    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

                                    optimizer.step()
                                    scheduler.step()
                                else:
                                    curr_pred = outputs.logits.argmax(dim=-1).detach().cpu().clone().numpy()
                                    curr_actual = labels.detach().cpu().clone().numpy()

                                    # Safe extraction of valid predictions and labels
                                    batch_true_predictions = []
                                    batch_true_labels = []

                                    for pred_seq, gold_seq in zip(curr_pred, curr_actual):
                                        valid_pairs = [(p, l) for p, l in zip(pred_seq, gold_seq)
                                                      if l != -100 and l != label_to_id['O']]
                                        if valid_pairs:
                                            batch_preds, batch_labels = zip(*valid_pairs)
                                            batch_true_predictions.extend(batch_preds)
                                            batch_true_labels.extend(batch_labels)

                                    if batch_true_predictions:
                                        curr_correct += np.sum(np.array(batch_true_predictions) == np.array(batch_true_labels))
                                        curr_total += len(batch_true_predictions)
                                        curr_ce += loss.item() * input_ids.size(0)

                                        # Safely extend our arrays
                                        if len(batch_true_labels) > 0:
                                            if len(actual) == 0:
                                                actual = np.array(batch_true_labels)
                                                pred = np.array(batch_true_predictions)
                                            else:
                                                actual = np.concatenate([actual, np.array(batch_true_labels)], axis=0)
                                                pred = np.concatenate([pred, np.array(batch_true_predictions)], axis=0)
                            except Exception as e:
                                print(f"Error in loss calculation: {e}")
                                continue

                    # Report average loss for this epoch
                    if num_batches > 0:
                        epoch_loss = running_loss / num_batches
                        print(f"{phase} Loss: {epoch_loss:.4f}")

                        # Check for dramatic loss increases that might indicate instability
                        if phase == 'train' and prev_loss is not None:
                            if epoch_loss > prev_loss * 2:  # Loss more than doubled
                                print(f"Warning: Loss increased dramatically from {prev_loss:.4f} to {epoch_loss:.4f}")

                        if phase == 'train':
                            prev_loss = epoch_loss

                    if phase == 'val':
                        if curr_total > 0:  # Make sure we have valid samples
                            curr_accuracy = curr_correct / curr_total
                            curr_f1 = f1_score(actual, pred, average='weighted') if len(actual) > 0 else 0.0
                            curr_ce = curr_ce / len(val) if len(val) > 0 else float('inf')

                            if curr_ce <= best_ce + eps:
                                best_ce = curr_ce
                                early_stopping_count = 0
                            if curr_accuracy >= best_accuracy + eps:
                                best_accuracy = curr_accuracy
                                early_stopping_count = 0
                            if curr_f1 >= best_f1 + eps:
                                best_f1 = curr_f1
                                early_stopping_count = 0

                            print("== Val Cross Entropy: ", curr_ce)
                            print("== Val Accuracy: ", curr_accuracy)
                            print("== Val F1: ", curr_f1)
                            print("== Early Stopping Count: ", early_stopping_count)

            # Test phase
            test_accuracy = 0
            test_total = 0
            test_correct = 0
            actual = np.array([])
            pred = np.array([])

            model.eval()  # Ensure model is in evaluation mode

            for input_ids, attention_masks, labels in tqdm(dataloaders_dict['test'], desc="Testing"):
                input_ids = input_ids.to(device)
                attention_masks = attention_masks.to(device)
                labels = labels.to(device)

                with torch.no_grad():
                    outputs = model(input_ids=input_ids, attention_mask=attention_masks, labels=labels)

                    curr_pred = outputs.logits.argmax(dim=-1).detach().cpu().clone().numpy()
                    curr_actual = labels.detach().cpu().clone().numpy()
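                    # Keep only entity tokens: drop special/padding tokens (-100) and the 'O' class.
                    # Comprehension variables are named pred_seq/gold_seq to avoid confusion with the `pred` accumulator.
                    true_predictions = np.concatenate([
                        [p for (p, l) in zip(pred_seq, gold_seq) if l != -100 and l != label_to_id['O']]
                        for pred_seq, gold_seq in zip(curr_pred, curr_actual)
                    ])
                    true_labels = np.concatenate([
                        [l for (p, l) in zip(pred_seq, gold_seq) if l != -100 and l != label_to_id['O']]
                        for pred_seq, gold_seq in zip(curr_pred, curr_actual)
                    ])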

                    test_total += len(true_predictions)
                    test_correct += np.sum(true_predictions == true_labels)
                    actual = np.concatenate([actual, true_labels], axis=0)
                    pred = np.concatenate([pred, true_predictions], axis=0)

            test_accuracy = test_correct / test_total if test_total > 0 else 0.0
            test_f1 = f1_score(actual, pred, average='weighted') if len(actual) > 0 else 0.0

            print(f"\n Test Results — LR: {lr}, BS: {bs}")
            print(f" Test Accuracy: {test_accuracy:.4f}")
            print(f" Test F1 Score: {test_f1:.4f}")

            results.append([seed, lr, bs, best_ce, best_accuracy, best_f1, test_accuracy, test_f1])

    return results
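The validation and test loops both keep only positions whose gold label is neither `-100` (special, padding, or ignored sub-tokens) nor `O`, and then score the remaining tokens. The same filter can be written as a single NumPy boolean mask over the whole batch. In this minimal sketch, the predictions are made up and `O_ID = 4` follows the `label_to_id` mapping printed in the log below:

```python
import numpy as np

O_ID, IGNORE = 4, -100

# Hypothetical batch of 2 sequences x 5 positions: argmax predictions and gold labels
curr_pred = np.array([[4, 2, 2, 4, 0],
                      [3, 4, 1, 4, 4]])
curr_actual = np.array([[-100, 2, 2, 4, -100],
                        [-100, 3, 1, 4, -100]])

# Keep positions that are real tokens and not the 'O' class (flattened in row-major order)
mask = (curr_actual != IGNORE) & (curr_actual != O_ID)
true_predictions = curr_pred[mask]
true_labels = curr_actual[mask]

print(true_labels)       # → [2 2 3 1]
print(true_predictions)  # → [2 2 4 1]
accuracy = (true_predictions == true_labels).mean()
print(accuracy)          # → 0.75
```

The mask is built from the gold labels only, so a token predicted as an entity but labelled `O` is never counted. The reported accuracy and F1 therefore measure agreement on gold entity tokens; they do not penalize false-positive entities.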

In [None]:
def hyperparameter_search(
    model_name,
    fine_tuning_methods=["full", "lora", "adapter"],
    learning_rates=[1e-5, 1e-4, 1e-3],
    batch_sizes=[16, 32],
    epochs=[3, 5, 7],
    label_all_tokens=True,
    seed=42
):
    """
    Perform hyperparameter search across different fine-tuning methods

    Args:
        model_name: Name of the pretrained model
        fine_tuning_methods: List of fine-tuning methods to try
        learning_rates: List of learning rates to try
        batch_sizes: List of batch sizes to try
        epochs: List of max epochs to try
        label_all_tokens: Whether to label all tokens or just the first token of a word
        seed: Random seed

    Returns:
        all_results: Dictionary mapping fine-tuning methods to their results
    """
    all_results = {}

    for method in fine_tuning_methods:
        print(f"\n{'=' * 50}")
        print(f"Running experiments for {method} fine-tuning on {model_name}")
        print(f"{'=' * 50}")

        method_results = []

        for num_epochs in epochs:
            results = train_and_evaluate(
                model_name=model_name,
                fine_tuning_method=method,
                batch_sizes=batch_sizes,
                learning_rates=learning_rates,
                num_epochs=num_epochs,
                label_all_tokens=label_all_tokens,
                seed=seed
            )

            for result in results:
                # Add fine-tuning method and epochs to results
                method_results.append([method, num_epochs] + result)

        all_results[method] = method_results

    return all_results
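`hyperparameter_search` loops over epochs in its outer loop, and `train_and_evaluate` loops over learning rates and batch sizes, so each combination is trained once per fine-tuning method. The full fine-tuning call further down uses 7 learning rates, 1 batch size and 3 epoch settings, which gives 21 runs. One way to enumerate the same grid is `itertools.product`:

```python
from itertools import product

learning_rates = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
batch_sizes = [16]
epochs = [3, 4, 5]

# Same order as the nested loops: epochs outermost, then learning rate, then batch size
grid = list(product(epochs, learning_rates, batch_sizes))
print(len(grid))   # → 21
print(grid[0])     # → (3, 1e-06, 16)
```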

In [None]:
# This function loops over all methods, although each cell below runs only one method. It is kept because it
# will be useful later for comparing the best configurations across methods.
def display_results(all_results):
    """
    Display results from hyperparameter search

    Args:
        all_results: Dictionary mapping fine-tuning methods to their results
    """
    print("\n" + "=" * 80)
    print("SUMMARY OF RESULTS")
    print("=" * 80)

    for method, results in all_results.items():
        print(f"\n{method.upper()} FINE-TUNING RESULTS:")
        print("-" * 60)
        print(f"{'Method':<8} {'Epochs':<6} {'LR':<8} {'BS':<4} {'Val CE':<8} {'Val Acc':<8} {'Val F1':<8} {'Test Acc':<8} {'Test F1':<8}")
        print("-" * 60)

        # Sort by test F1 (descending) for display only; the best configuration is selected by validation F1 below
        sorted_results = sorted(results, key=lambda x: x[-1], reverse=True)

        for result in sorted_results:
            method, epochs, seed, lr, bs, val_ce, val_acc, val_f1, test_acc, test_f1 = result
            print(f"{method:<8} {epochs:<6d} {lr:<8.6f} {bs:<4d} {val_ce:<8.4f} {val_acc:<8.4f} {val_f1:<8.4f} {test_acc:<8.4f} {test_f1:<8.4f}")

    # Find best overall configuration
    best_method = None
    best_config = None
    best_f1 = -1

    for method, results in all_results.items():
        for result in results:
            if result[-3] > best_f1: # -3 index is val F1
                best_f1 = result[-3]
                best_method = method
                best_config = result

    if best_config:
        method, epochs, seed, lr, bs, val_ce, val_acc, val_f1, test_acc, test_f1 = best_config
        print("\n" + "=" * 80)
        print(f"BEST OVERALL CONFIGURATION (Val F1: {val_f1:.4f}, Test F1: {test_f1:.4f}):")
        print(f"Fine-tuning method: {best_method}")
        print(f"Learning rate: {lr}")
        print(f"Batch size: {bs}")
        print(f"Epochs: {epochs}")
        print("=" * 80)
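Each row in `all_results` has the layout `[method, epochs, seed, lr, bs, val_ce, val_acc, val_f1, test_acc, test_f1]`. That is why `display_results` sorts the table by index `-1` (test F1) but picks the best configuration by index `-3` (validation F1). Below is a compact equivalent of the selection step using `max`, run on hypothetical rows:

```python
# Hypothetical rows in the layout built by hyperparameter_search
rows = [
    ["full", 3, 42, 2e-5, 16, 0.70, 0.74, 0.72, 0.81, 0.79],
    ["full", 4, 42, 5e-5, 16, 0.69, 0.77, 0.76, 0.82, 0.80],
    ["full", 5, 42, 1e-4, 16, 0.75, 0.75, 0.74, 0.83, 0.82],
]

VAL_F1, TEST_F1 = -3, -1

best = max(rows, key=lambda row: row[VAL_F1])   # model selection uses validation F1 only
print(best[1], best[3])                          # → 4 5e-05
print(best[TEST_F1])                             # → 0.8 (reported, not used for selection)
```

The third row has the highest test F1 but is not selected, because the test set is kept out of model selection. On ties, `max` keeps the first row, just like the strict `>` comparison in the loop above.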

## Hyperparameter Search for BERT with Full Fine-Tuning

Best hyperparameters we found:
- lr: 5e-5
- epochs: 4
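`train_and_evaluate` pairs AdamW with `get_linear_schedule_with_warmup` and uses 10% of the total steps as warmup. The multiplier applied to the base learning rate rises linearly from 0 to 1 during warmup and then decays linearly to 0. The plain-Python sketch below re-implements that documented behaviour (it is not the library's code) for the configuration above. It assumes 59 training batches per epoch, as reported in the training logs:

```python
def linear_warmup_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """LR multiplier: linear ramp 0 -> 1 over warmup, then linear decay 1 -> 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


BASE_LR = 5e-5
BATCHES_PER_EPOCH, EPOCHS = 59, 4
total_steps = BATCHES_PER_EPOCH * EPOCHS   # 236 optimizer steps
warmup_steps = int(0.1 * total_steps)      # 23, computed the same way as in train_and_evaluate

for step in (0, warmup_steps, total_steps // 2, total_steps):
    lr = BASE_LR * linear_warmup_multiplier(step, warmup_steps, total_steps)
    print(step, f"{lr:.2e}")
```

Because `num_training_steps` is derived from `num_epochs`, an early stop truncates the decay phase instead of compressing it.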

In [None]:
model_name = "bert-base-uncased"


all_results = hyperparameter_search(
            model_name=model_name,
            fine_tuning_methods=["full"],
            learning_rates=[1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3],
            batch_sizes=[16], # as in the paper for this task
            epochs=[3, 4, 5]
        )


Running experiments for full fine-tuning on bert-base-uncased
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302

== Experiment 1 of 7:
=> Learning Rate: 1e-06, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:32<00:00,  1.84it/s]


train Loss: 1.4927
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.27it/s]


val Loss: 1.1915
== Val Cross Entropy:  1.1920427001755813
== Val Accuracy:  0.5036231884057971
== Val F1:  0.40681062819337943
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.87it/s]


train Loss: 1.1456
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.30it/s]


val Loss: 1.0315
== Val Cross Entropy:  1.0354472213778003
== Val Accuracy:  0.5072463768115942
== Val F1:  0.3809268697633113
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.87it/s]


train Loss: 1.0639
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.20it/s]


val Loss: 1.0040
== Val Cross Entropy:  1.0084256797001279
== Val Accuracy:  0.5181159420289855
== Val F1:  0.39437761934098037
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.49it/s]



 Test Results — LR: 1e-06, BS: 16
 Test Accuracy: 0.5309
 Test F1 Score: 0.3803

== Experiment 2 of 7:
=> Learning Rate: 1e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.85it/s]


train Loss: 1.0936
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.8227
== Val Cross Entropy:  0.8262815249377283
== Val Accuracy:  0.644927536231884
== Val F1:  0.6008080039582199
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.7245
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.25it/s]


val Loss: 0.7777
== Val Cross Entropy:  0.7786990414405691
== Val Accuracy:  0.644927536231884
== Val F1:  0.621427909655706
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.6174
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7566
== Val Cross Entropy:  0.7550210490308958
== Val Accuracy:  0.7318840579710145
== Val F1:  0.7064528807743206
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.50it/s]



 Test Results — LR: 1e-05, BS: 16
 Test Accuracy: 0.7025
 Test F1 Score: 0.6846

== Experiment 3 of 7:
=> Learning Rate: 2e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.9882
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7744
== Val Cross Entropy:  0.7744210865990869
== Val Accuracy:  0.7246376811594203
== Val F1:  0.6819966482973392
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.5827
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7392
== Val Cross Entropy:  0.7358520822278385
== Val Accuracy:  0.7065217391304348
== Val F1:  0.7034279821399753
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.4211
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7075
== Val Cross Entropy:  0.7022274254724897
== Val Accuracy:  0.7427536231884058
== Val F1:  0.7223302822273074
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.53it/s]



 Test Results — LR: 2e-05, BS: 16
 Test Accuracy: 0.8055
 Test F1 Score: 0.7926

== Experiment 4 of 7:
=> Learning Rate: 3e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.9410
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7556
== Val Cross Entropy:  0.7529509869115106
== Val Accuracy:  0.7463768115942029
== Val F1:  0.7077466689961438
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.5031
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.20it/s]


val Loss: 0.7240
== Val Cross Entropy:  0.7194053027136572
== Val Accuracy:  0.7028985507246377
== Val F1:  0.6997525452613539
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.3111
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7166
== Val Cross Entropy:  0.711448986468644
== Val Accuracy:  0.7644927536231884
== Val F1:  0.746142194005127
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.52it/s]



 Test Results — LR: 3e-05, BS: 16
 Test Accuracy: 0.8169
 Test F1 Score: 0.8085

== Experiment 5 of 7:
=> Learning Rate: 5e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.9042
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7032
== Val Cross Entropy:  0.7014108639338921
== Val Accuracy:  0.75
== Val F1:  0.7001445309573855
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.4584
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.6910
== Val Cross Entropy:  0.6894642243097568
== Val Accuracy:  0.7463768115942029
== Val F1:  0.7413873188630408
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.2335
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7210
== Val Cross Entropy:  0.7174089561248648
== Val Accuracy:  0.7717391304347826
== Val F1:  0.7581060609213213
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.51it/s]



 Test Results — LR: 5e-05, BS: 16
 Test Accuracy: 0.8169
 Test F1 Score: 0.8108

== Experiment 6 of 7:
=> Learning Rate: 0.0001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.9277
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7425
== Val Cross Entropy:  0.742058696417973
== Val Accuracy:  0.717391304347826
== Val F1:  0.6721266489847573
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.4847
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7273
== Val Cross Entropy:  0.7246988986072869
== Val Accuracy:  0.7246376811594203
== Val F1:  0.7308776529175764
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.2308
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.8086
== Val Cross Entropy:  0.8088961184538644
== Val Accuracy:  0.7536231884057971
== Val F1:  0.7426057379473529
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.50it/s]



 Test Results — LR: 0.0001, BS: 16
 Test Accuracy: 0.8101
 Test F1 Score: 0.8050

== Experiment 7 of 7:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.2361
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.33it/s]


val Loss: 1.0140
== Val Cross Entropy:  1.0213466615512454
== Val Accuracy:  0.4891304347826087
== Val F1:  0.3213265629958743
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0372
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.35it/s]


val Loss: 1.1165
== Val Cross Entropy:  1.1255419706476146
== Val Accuracy:  0.2028985507246377
== Val F1:  0.06844770385891392
== Early Stopping Count:  1

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0264
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.35it/s]


val Loss: 1.0150
== Val Cross Entropy:  1.0213794214972134
== Val Accuracy:  0.4891304347826087
== Val F1:  0.3213265629958743
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.61it/s]



 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.5240
 Test F1 Score: 0.3604
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302

== Experiment 1 of 7:
=> Learning Rate: 1e-06, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.5099
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 1.1946
== Val Cross Entropy:  1.1951154018270558
== Val Accuracy:  0.5036231884057971
== Val F1:  0.4083326153662836
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.1347
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 1.0111
== Val Cross Entropy:  1.0154825942269687
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3805188083138394
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0306
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.9670
== Val Cross Entropy:  0.9719777579965263
== Val Accuracy:  0.5181159420289855
== Val F1:  0.38840014946885176
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0335
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.9569
== Val Cross Entropy:  0.9620512600602775
== Val Accuracy:  0.5181159420289855
== Val F1:  0.38840014946885176
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.51it/s]



 Test Results — LR: 1e-06, BS: 16
 Test Accuracy: 0.5355
 Test F1 Score: 0.3932

== Experiment 2 of 7:
=> Learning Rate: 1e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.1224
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.8260
== Val Cross Entropy:  0.829394636483028
== Val Accuracy:  0.6521739130434783
== Val F1:  0.6074834162520729
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.7192
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7738
== Val Cross Entropy:  0.774684649089287
== Val Accuracy:  0.6304347826086957
== Val F1:  0.616129119116526
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.5791
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7712
== Val Cross Entropy:  0.76812835808458
== Val Accuracy:  0.7101449275362319
== Val F1:  0.684205134442093
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.5216
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7394
== Val Cross Entropy:  0.736994107221735
== Val Accuracy:  0.717391304347826
== Val F1:  0.6975471910009592
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.51it/s]



 Test Results — LR: 1e-05, BS: 16
 Test Accuracy: 0.7643
 Test F1 Score: 0.7553

== Experiment 3 of 7:
=> Learning Rate: 2e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0177
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7794
== Val Cross Entropy:  0.7791931865544155
== Val Accuracy:  0.7318840579710145
== Val F1:  0.688253423267262
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.5895
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7390
== Val Cross Entropy:  0.735089325699313
== Val Accuracy:  0.7101449275362319
== Val F1:  0.7044832940909294
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.3869
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.7744
== Val Cross Entropy:  0.7671687312681099
== Val Accuracy:  0.7572463768115942
== Val F1:  0.7414819080525963
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.2863
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.20it/s]


val Loss: 0.7236
== Val Cross Entropy:  0.7172924323842443
== Val Accuracy:  0.7608695652173914
== Val F1:  0.7414725402784623
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.50it/s]



 Test Results — LR: 2e-05, BS: 16
 Test Accuracy: 0.8169
 Test F1 Score: 0.8103

== Experiment 4 of 7:
=> Learning Rate: 3e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.9664
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7791
== Val Cross Entropy:  0.7772090527518042
== Val Accuracy:  0.7210144927536232
== Val F1:  0.670873883150098
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.5094
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.7218
== Val Cross Entropy:  0.7152877593862599
== Val Accuracy:  0.7355072463768116
== Val F1:  0.7275112443778111
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.2878
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.8271
== Val Cross Entropy:  0.8214900814767542
== Val Accuracy:  0.7644927536231884
== Val F1:  0.7526900530681249
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.1709
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7928
== Val Cross Entropy:  0.7859073537562428
== Val Accuracy:  0.782608695652174
== Val F1:  0.7674253089721069
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.51it/s]



 Test Results — LR: 3e-05, BS: 16
 Test Accuracy: 0.8261
 Test F1 Score: 0.8209

== Experiment 5 of 7:
=> Learning Rate: 5e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.9215
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7145
== Val Cross Entropy:  0.7126655558059956
== Val Accuracy:  0.7536231884057971
== Val F1:  0.7106923784497482
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.4746
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.24it/s]


val Loss: 0.6819
== Val Cross Entropy:  0.6795264574988135
== Val Accuracy:  0.75
== Val F1:  0.7432638830211912
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.2265
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7928
== Val Cross Entropy:  0.7915174573146063
== Val Accuracy:  0.7753623188405797
== Val F1:  0.7657684288119071
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.1174
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.8078
== Val Cross Entropy:  0.8023039195686579
== Val Accuracy:  0.8007246376811594
== Val F1:  0.7828862869880097
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.51it/s]



 Test Results — LR: 5e-05, BS: 16
 Test Accuracy: 0.8169
 Test F1 Score: 0.8107

== Experiment 6 of 7:
=> Learning Rate: 0.0001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.9340
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.20it/s]


val Loss: 0.6832
== Val Cross Entropy:  0.6833261230896259
== Val Accuracy:  0.7355072463768116
== Val F1:  0.6981812253392472
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.4838
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.18it/s]


val Loss: 0.7039
== Val Cross Entropy:  0.7049424761328204
== Val Accuracy:  0.7101449275362319
== Val F1:  0.7171601651932915
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.2363
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.18it/s]


val Loss: 0.9822
== Val Cross Entropy:  0.9933499876923603
== Val Accuracy:  0.7318840579710145
== Val F1:  0.7166087911031613
== Early Stopping Count:  1

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.0981
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.18it/s]


val Loss: 0.9101
== Val Cross Entropy:  0.9282722708865486
== Val Accuracy:  0.7572463768115942
== Val F1:  0.7523948547330536
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.47it/s]



 Test Results — LR: 0.0001, BS: 16
 Test Accuracy: 0.8146
 Test F1 Score: 0.8089

== Experiment 7 of 7:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.2520
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.41it/s]


val Loss: 1.0468
== Val Cross Entropy:  1.0571159782080815
== Val Accuracy:  0.4891304347826087
== Val F1:  0.3213265629958743
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0627
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.42it/s]


val Loss: 1.1043
== Val Cross Entropy:  1.1134804795528281
== Val Accuracy:  0.2028985507246377
== Val F1:  0.06844770385891392
== Early Stopping Count:  1

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0443
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.42it/s]


val Loss: 1.0253
== Val Cross Entropy:  1.0335016805550148
== Val Accuracy:  0.4891304347826087
== Val F1:  0.3213265629958743
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0398
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.43it/s]


val Loss: 1.0358
== Val Cross Entropy:  1.04160325075018
== Val Accuracy:  0.4891304347826087
== Val F1:  0.3213265629958743
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.72it/s]



 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.5240
 Test F1 Score: 0.3604
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302

== Experiment 1 of 7:
=> Learning Rate: 1e-06, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.5284
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.19it/s]


val Loss: 1.2071
== Val Cross Entropy:  1.207454722503136
== Val Accuracy:  0.5036231884057971
== Val F1:  0.4083326153662836
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.1348
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 1.0037
== Val Cross Entropy:  1.0082930120928535
== Val Accuracy:  0.5108695652173914
== Val F1:  0.37966790373029996
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0171
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.20it/s]


val Loss: 0.9526
== Val Cross Entropy:  0.9576913508875616
== Val Accuracy:  0.5217391304347826
== Val F1:  0.3959893576203774
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0087
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.20it/s]


val Loss: 0.9336
== Val Cross Entropy:  0.9387680641536055
== Val Accuracy:  0.5181159420289855
== Val F1:  0.39382892758783217
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.9841
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.20it/s]


val Loss: 0.9284
== Val Cross Entropy:  0.933592204389901
== Val Accuracy:  0.5144927536231884
== Val F1:  0.3916715677724537
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.52it/s]



 Test Results — LR: 1e-06, BS: 16
 Test Accuracy: 0.5515
 Test F1 Score: 0.4433

== Experiment 2 of 7:
=> Learning Rate: 1e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.1481
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.20it/s]


val Loss: 0.8303
== Val Cross Entropy:  0.8337903146086068
== Val Accuracy:  0.6630434782608695
== Val F1:  0.6205565130421782
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.7213
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7748
== Val Cross Entropy:  0.7755938871153469
== Val Accuracy:  0.6123188405797102
== Val F1:  0.6069751850397047
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.5643
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.7725
== Val Cross Entropy:  0.7685767818113853
== Val Accuracy:  0.717391304347826
== Val F1:  0.6920109916425523
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.4747
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7231
== Val Cross Entropy:  0.7199865661818405
== Val Accuracy:  0.7282608695652174
== Val F1:  0.7104134847671782
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.4059
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7204
== Val Cross Entropy:  0.7181296718531641
== Val Accuracy:  0.7463768115942029
== Val F1:  0.728806300919947
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.50it/s]



 Test Results — LR: 1e-05, BS: 16
 Test Accuracy: 0.7872
 Test F1 Score: 0.7778

== Experiment 3 of 7:
=> Learning Rate: 2e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0430
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7907
== Val Cross Entropy:  0.7906797846843456
== Val Accuracy:  0.7210144927536232
== Val F1:  0.6727233125137753
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.5993
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.7424
== Val Cross Entropy:  0.7382568501192948
== Val Accuracy:  0.7065217391304348
== Val F1:  0.7066848525496593
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.3788
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7809
== Val Cross Entropy:  0.7733867317438126
== Val Accuracy:  0.7608695652173914
== Val F1:  0.7425199580213473
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.2533
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.21it/s]


val Loss: 0.7226
== Val Cross Entropy:  0.7195902791773451
== Val Accuracy:  0.7717391304347826
== Val F1:  0.7539303855047192
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.1879
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7662
== Val Cross Entropy:  0.7648170838325188
== Val Accuracy:  0.7789855072463768
== Val F1:  0.7650755953973776
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.52it/s]



 Test Results — LR: 2e-05, BS: 16
 Test Accuracy: 0.8284
 Test F1 Score: 0.8221

== Experiment 4 of 7:
=> Learning Rate: 3e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.9921
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7882
== Val Cross Entropy:  0.7870543347350483
== Val Accuracy:  0.7101449275362319
== Val F1:  0.6537964310070747
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.5223
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.7136
== Val Cross Entropy:  0.7064761041567243
== Val Accuracy:  0.7427536231884058
== Val F1:  0.7334017326152658
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.2857
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.8412
== Val Cross Entropy:  0.836850561458489
== Val Accuracy:  0.7536231884057971
== Val F1:  0.7396422418161549
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.1718
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.8276
== Val Cross Entropy:  0.8270484764642757
== Val Accuracy:  0.7753623188405797
== Val F1:  0.760791345561493
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.1121
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.20it/s]


val Loss: 0.8943
== Val Cross Entropy:  0.892002459615469
== Val Accuracy:  0.782608695652174
== Val F1:  0.7674237763193875
== Early Stopping Count:  1


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.48it/s]



 Test Results — LR: 3e-05, BS: 16
 Test Accuracy: 0.8215
 Test F1 Score: 0.8166

== Experiment 5 of 7:
=> Learning Rate: 5e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.9446
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.7611
== Val Cross Entropy:  0.7619739169704502
== Val Accuracy:  0.7246376811594203
== Val F1:  0.6721369707303211
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.4909
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.6777
== Val Cross Entropy:  0.6723641041537811
== Val Accuracy:  0.7536231884057971
== Val F1:  0.7398696385432336
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.2369
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.9027
== Val Cross Entropy:  0.89811970210024
== Val Accuracy:  0.7934782608695652
== Val F1:  0.7728028855640946
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.1243
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.9365
== Val Cross Entropy:  0.9115022436298174
== Val Accuracy:  0.7644927536231884
== Val F1:  0.7602497582410358
== Early Stopping Count:  1

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.0684
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.22it/s]


val Loss: 0.9868
== Val Cross Entropy:  0.9608530665844165
== Val Accuracy:  0.7898550724637681
== Val F1:  0.773968672673085
== Early Stopping Count:  2


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.51it/s]



 Test Results — LR: 5e-05, BS: 16
 Test Accuracy: 0.7963
 Test F1 Score: 0.7907

== Experiment 6 of 7:
=> Learning Rate: 0.0001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)
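You can check the full fine-tuning parameter count by hand. The sketch assumes the standard bert-base-uncased configuration: a 30,522-token vocabulary, hidden size 768, 12 layers, feed-forward size 3,072, 512 positions and 2 token types. It also assumes a 5-label linear head, one output per tag in the label map. `BertForTokenClassification` builds its encoder without the pooler, so the total comes in below the often-quoted ~110M for BERT-base.

```python
def linear_params(n_in, n_out):
    """Weight matrix plus bias vector of a dense layer."""
    return n_in * n_out + n_out


def bert_token_classifier_params(vocab=30522, hidden=768, layers=12,
                                 intermediate=3072, max_pos=512,
                                 type_vocab=2, num_labels=5):
    """Parameter count of a BERT encoder without pooler plus a token-level head."""
    layer_norm = 2 * hidden                                   # scale and shift vectors
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    per_layer = (
        4 * linear_params(hidden, hidden)                     # Q, K, V and attention output
        + layer_norm
        + linear_params(hidden, intermediate)                 # feed-forward expansion
        + linear_params(intermediate, hidden)                 # feed-forward projection back
        + layer_norm
    )
    head = linear_params(hidden, num_labels)                  # classifier.weight + classifier.bias
    return embeddings + layers * per_layer + head


print(f"{bert_token_classifier_params():,}")  # 108,895,493, as in the log
```

If you swap the 3,845-parameter head for the 768×768 pooler, you get 109,482,240, the usual BERT-base figure.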

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.9331
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.18it/s]


val Loss: 0.7254
== Val Cross Entropy:  0.7265395213817728
== Val Accuracy:  0.717391304347826
== Val F1:  0.7014137943921612
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.5635
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.20it/s]


val Loss: 0.6752
== Val Cross Entropy:  0.6770530651355612
== Val Accuracy:  0.7101449275362319
== Val F1:  0.7111342343232734
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.2538
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.18it/s]


val Loss: 1.0791
== Val Cross Entropy:  1.0846036673555601
== Val Accuracy:  0.7028985507246377
== Val F1:  0.6787474671819891
== Early Stopping Count:  1

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.1433
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.18it/s]


val Loss: 0.8454
== Val Cross Entropy:  0.8626808381543077
== Val Accuracy:  0.782608695652174
== Val F1:  0.7714564496721357
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.0691
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.19it/s]


val Loss: 0.9172
== Val Cross Entropy:  0.9366195492189506
== Val Accuracy:  0.7898550724637681
== Val F1:  0.7761904949218826
== Early Stopping Count:  1


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.49it/s]



 Test Results — LR: 0.0001, BS: 16
 Test Accuracy: 0.8078
 Test F1 Score: 0.8047

== Experiment 7 of 7:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.87it/s]


train Loss: 1.1413
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.41it/s]


val Loss: 1.1622
== Val Cross Entropy:  1.1772630903227577
== Val Accuracy:  0.4891304347826087
== Val F1:  0.3213265629958743
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.87it/s]


train Loss: 1.0684
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.41it/s]


val Loss: 1.1085
== Val Cross Entropy:  1.116963111121079
== Val Accuracy:  0.2028985507246377
== Val F1:  0.06844770385891392
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 1.0577
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.41it/s]


val Loss: 1.0814
== Val Cross Entropy:  1.0922890942672203
== Val Accuracy:  0.4891304347826087
== Val F1:  0.3213265629958743
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.87it/s]


train Loss: 1.0572
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.42it/s]


val Loss: 1.0125
== Val Cross Entropy:  1.0204858553820644
== Val Accuracy:  0.4891304347826087
== Val F1:  0.3213265629958743
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.87it/s]


train Loss: 1.0541
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.41it/s]


val Loss: 1.0220
== Val Cross Entropy:  1.0287615204679554
== Val Accuracy:  0.4891304347826087
== Val F1:  0.3213265629958743
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.70it/s]


 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.5240
 Test F1 Score: 0.3604





In [None]:
display_results(all_results)


SUMMARY OF RESULTS

FULL FINE-TUNING RESULTS:
------------------------------------------------------------
Method   Epochs LR       BS   Val CE   Val Acc  Val F1   Test Acc Test F1 
------------------------------------------------------------
full     5      0.000020 16   0.7196   0.7717   0.7651   0.8284   0.8221  
full     4      0.000030 16   0.7153   0.7826   0.7674   0.8261   0.8209  
full     5      0.000030 16   0.7065   0.7754   0.7608   0.8215   0.8166  
full     3      0.000050 16   0.6895   0.7717   0.7581   0.8169   0.8108  
full     4      0.000050 16   0.6795   0.8007   0.7829   0.8169   0.8107  
full     4      0.000020 16   0.7173   0.7572   0.7415   0.8169   0.8103  
full     4      0.000100 16   0.6833   0.7572   0.7524   0.8146   0.8089  
full     3      0.000030 16   0.7114   0.7645   0.7461   0.8169   0.8085  
full     3      0.000100 16   0.7247   0.7536   0.7426   0.8101   0.8050  
full     5      0.000100 16   0.6771   0.7826   0.7715   0.8078   0.8047  
full  

## Hyperparameter search for Lora fine tuning

Best hyperparameters we found:
- learning rate: 1e-3
- epochs: 5
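Picking a "best" configuration comes down to choosing which metric to maximise over the result records. The sketch below uses a hypothetical `best_config` helper. It assumes each record is a dict with keys such as `val_f1` and `test_f1`; the notebook's own `all_results` structure is not shown here. Selecting on a validation metric keeps the test split for the final report. The `display_results` summary above is ordered by Test F1, and the two criteria can disagree.

```python
def best_config(results, key="val_f1"):
    """Return the result record with the highest value of `key` (hypothetical helper)."""
    if not results:
        raise ValueError("results is empty")
    return max(results, key=lambda r: r[key])


# Final-epoch validation F1 and test F1 of the two Lora runs at lr=1e-3 in the log below.
runs = [
    {"epochs": 3, "lr": 1e-3, "val_f1": 0.7226, "test_f1": 0.7777},
    {"epochs": 4, "lr": 1e-3, "val_f1": 0.7537, "test_f1": 0.7640},
]
print(best_config(runs)["epochs"], best_config(runs, key="test_f1")["epochs"])  # 4 3
```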

In [None]:
model_name = "bert-base-uncased"

all_results = hyperparameter_search(
    model_name=model_name,
    fine_tuning_methods=["lora"],
    learning_rates=[1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3],
    batch_sizes=[16],  # same batch size as in the paper for this task
    epochs=[3, 4, 5],
)
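`hyperparameter_search` is defined earlier in the notebook and is not shown in this section. The log runs the 7 learning-rate experiments once per epoch setting, which suggests epochs form the outer loop and learning rates the inner one. Below is a minimal sketch of that structure. `grid_search` and `train_fn` are hypothetical names, with `train_fn` standing in for one train-and-evaluate run.

```python
from itertools import product


def grid_search(train_fn, learning_rates, batch_sizes, epochs):
    """Call train_fn once per (epochs, learning rate, batch size) combination."""
    combos = list(product(learning_rates, batch_sizes))
    results = []
    for n_epochs in epochs:                                   # outer loop: epoch budget
        for i, (lr, bs) in enumerate(combos, start=1):        # inner loop: lr x batch size
            print(f"== Experiment {i} of {len(combos)}: lr={lr}, bs={bs}, epochs={n_epochs}")
            metrics = train_fn(lr=lr, batch_size=bs, epochs=n_epochs)
            results.append({"epochs": n_epochs, "lr": lr, "batch_size": bs, **metrics})
    return results
```

With 7 learning rates, one batch size and 3 epoch settings this gives 21 runs, each retrained from scratch. A single 5-epoch run that also records metrics after epochs 3 and 4 would be cheaper. That shortcut only holds if the learning-rate schedule does not depend on the total number of epochs.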


Running experiments for lora fine-tuning on bert-base-uncased
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors
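This warning means at least one input tokenizes to 687 tokens, more than BERT's 512 positions. This output does not show how the notebook handles such inputs. One common option is to split long sequences into overlapping windows, sketched below. The sketch makes three assumptions: bert-base-uncased's `[CLS]`/`[SEP]` ids are 101/102, special-token labels use -100 (the default `ignore_index` of `torch.nn.CrossEntropyLoss`), and `sliding_windows` and its `stride` are illustrative names.

```python
def sliding_windows(token_ids, labels, max_len=512, stride=128,
                    cls_id=101, sep_id=102, ignore_label=-100):
    """Split one long tokenized sequence into overlapping windows that fit BERT."""
    body = max_len - 2                        # leave room for [CLS] and [SEP]
    if not 0 <= stride < body:
        raise ValueError("stride must be in [0, max_len - 2)")
    windows = []
    start = 0
    while True:
        chunk_ids = token_ids[start:start + body]
        chunk_labels = labels[start:start + body]
        windows.append((
            [cls_id] + chunk_ids + [sep_id],
            [ignore_label] + chunk_labels + [ignore_label],  # special tokens are not scored
        ))
        if start + body >= len(token_ids):
            break
        start += body - stride                # consecutive windows share `stride` tokens
    return windows
```

A 687-token sequence becomes two windows of 512 and 307 tokens. At prediction time, tokens that fall in both windows get two predictions, which must be merged, for example by keeping the one farther from a window edge.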


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302
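The tensor printed after the label map looks like a per-class weight vector in label-id order: 0.25 for each of the four entity tags and 0.0 for `O`. Suppose it is passed as `weight=` to `torch.nn.CrossEntropyLoss`; that is an assumption, since the loss setup is not shown here. Then tokens labelled `O` contribute nothing to the loss. The pure-Python sketch below mirrors the weighted `mean` reduction PyTorch documents, which divides the sum of w[y]·loss by the sum of w[y], to show the effect.

```python
import math


def weighted_cross_entropy(logits, targets, weights):
    """Class-weighted mean cross-entropy, normalised by the summed target weights."""
    total, norm = 0.0, 0.0
    for row, y in zip(logits, targets):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))  # stable log-sum-exp
        total += weights[y] * (log_z - row[y])                   # weighted -log softmax
        norm += weights[y]
    return total / norm  # raises ZeroDivisionError if every target has weight 0


weights = [0.25, 0.25, 0.25, 0.25, 0.0]   # I-LOC, I-MISC, I-ORG, I-PER, O
logits = [[2.0, 0.1, 0.3, -1.0, 0.5], [0.0, 1.5, 0.2, 0.1, 3.0]]
without_o = weighted_cross_entropy(logits, [0, 1], weights)
with_o = weighted_cross_entropy(logits + [[5.0, 0.0, 0.0, 0.0, -5.0]], [0, 1, 4], weights)
print(without_o == with_o)  # True: the extra O token changes nothing
```

With uniform logits the loss is ln 5 ≈ 1.609 no matter which non-`O` tag is the target.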

== Experiment 1 of 7:
=> Learning Rate: 1e-06, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)
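LoRA keeps a weight W of shape d_out × d_in frozen and learns a low-rank update B·A, with A of shape r × d_in and B of shape d_out × r. Each adapted matrix therefore adds r(d_in + d_out) trainable parameters. This output does not show the Lora rank or target modules, so the r=8 below is hypothetical. The other numbers are copied from the two `Trainable parameters` lines.

```python
def lora_params_per_matrix(d_in, d_out, r):
    """Trainable parameters LoRA adds for one adapted weight: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


full_total = 108_895_493       # full fine-tuning: all parameters trainable
lora_total = 109_784_074       # total reported once the Lora modules are attached
lora_trainable = 888_581       # trainable part of the Lora model

added = lora_total - full_total
share = 100 * lora_trainable / lora_total

print(f"added: {added:,}, trainable share: {share:.2f}%")  # added: 888,581, trainable share: 0.81%
print(lora_params_per_matrix(768, 768, r=8))               # 12288 for a hypothetical r=8
```

The total grows by exactly the trainable count. This is consistent with every trainable parameter sitting in modules added on top of the frozen base model.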

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.29it/s]


train Loss: 1.7203
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.7181
== Val Cross Entropy:  1.710701852009214
== Val Accuracy:  0.18115942028985507
== Val F1:  0.10227129456780715
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.7141
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.97it/s]


val Loss: 1.7088
== Val Cross Entropy:  1.7015243480945457
== Val Accuracy:  0.18478260869565216
== Val F1:  0.10756793536357061
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.34it/s]


train Loss: 1.7147
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.98it/s]


val Loss: 1.7056
== Val Cross Entropy:  1.6983685329042633
== Val Accuracy:  0.18478260869565216
== Val F1:  0.10756793536357061
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.10it/s]



 Test Results — LR: 1e-06, BS: 16
 Test Accuracy: 0.2471
 Test F1 Score: 0.1301

== Experiment 2 of 7:
=> Learning Rate: 1e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.33it/s]


train Loss: 1.6749
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 1.6005
== Val Cross Entropy:  1.5944507697532917
== Val Accuracy:  0.213768115942029
== Val F1:  0.15112143031333744
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 1.5628
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.5010
== Val Cross Entropy:  1.4961354362553563
== Val Accuracy:  0.2536231884057971
== Val F1:  0.22864824481049334
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.5010
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.4663
== Val Cross Entropy:  1.4618873513978103
== Val Accuracy:  0.2971014492753623
== Val F1:  0.28352545771727805
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.09it/s]



 Test Results — LR: 1e-05, BS: 16
 Test Accuracy: 0.4439
 Test F1 Score: 0.4135

== Experiment 3 of 7:
=> Learning Rate: 2e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.6240
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 1.4670
== Val Cross Entropy:  1.4625629638803417
== Val Accuracy:  0.2971014492753623
== Val F1:  0.28352545771727805
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 1.3921
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 1.2694
== Val Cross Entropy:  1.267903023752673
== Val Accuracy:  0.47101449275362317
== Val F1:  0.4184205233476419
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.2666
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.2073
== Val Cross Entropy:  1.2068542283156822
== Val Accuracy:  0.5036231884057971
== Val F1:  0.4223083821547039
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.08it/s]



 Test Results — LR: 2e-05, BS: 16
 Test Accuracy: 0.5423
 Test F1 Score: 0.4399

== Experiment 4 of 7:
=> Learning Rate: 3e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.33it/s]


train Loss: 1.5736
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.3359
== Val Cross Entropy:  1.3332673845619991
== Val Accuracy:  0.40217391304347827
== Val F1:  0.38025944691579844
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 1.2386
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 1.0861
== Val Cross Entropy:  1.0880840934556106
== Val Accuracy:  0.5217391304347826
== Val F1:  0.4095286758330236
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 1.0912
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 1.0314
== Val Cross Entropy:  1.0346539390498195
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3854331838572366
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.04it/s]



 Test Results — LR: 3e-05, BS: 16
 Test Accuracy: 0.5309
 Test F1 Score: 0.3809

== Experiment 5 of 7:
=> Learning Rate: 5e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.33it/s]


train Loss: 1.4787
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.96it/s]


val Loss: 1.1139
== Val Cross Entropy:  1.1151816557193626
== Val Accuracy:  0.5217391304347826
== Val F1:  0.42013622890674707
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.0500
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 0.9428
== Val Cross Entropy:  0.9484761768373949
== Val Accuracy:  0.5072463768115942
== Val F1:  0.375313682696159
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.9513
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.9257
== Val Cross Entropy:  0.931551807913287
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37565847141021386
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.06it/s]



 Test Results — LR: 5e-05, BS: 16
 Test Accuracy: 0.5309
 Test F1 Score: 0.3789

== Experiment 6 of 7:
=> Learning Rate: 0.0001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.33it/s]


train Loss: 1.3184
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.9302
== Val Cross Entropy:  0.936774745069701
== Val Accuracy:  0.5108695652173914
== Val F1:  0.37746089265138444
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.9232
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 0.8657
== Val Cross Entropy:  0.8716536698670223
== Val Accuracy:  0.5471014492753623
== Val F1:  0.4588382631860892
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 0.8446
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.8568
== Val Cross Entropy:  0.8626130757660702
== Val Accuracy:  0.5471014492753623
== Val F1:  0.4588382631860892
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.06it/s]



 Test Results — LR: 0.0001, BS: 16
 Test Accuracy: 0.5835
 Test F1 Score: 0.5321

== Experiment 7 of 7:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.33it/s]


train Loss: 1.0206
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.7502
== Val Cross Entropy:  0.751889620361657
== Val Accuracy:  0.6739130434782609
== Val F1:  0.669115798562439
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.6189
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 0.6774
== Val Cross Entropy:  0.6759804954816555
== Val Accuracy:  0.7608695652173914
== Val F1:  0.7386631468648089
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.4379
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.6787
== Val Cross Entropy:  0.6772121607229628
== Val Accuracy:  0.7391304347826086
== Val F1:  0.7226070972453236
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.05it/s]



 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.7826
 Test F1 Score: 0.7777
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302
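The `length_dataset 932 232 302` line agrees with the progress bars. At batch size 16, a loader that keeps the last partial batch yields exactly 59 train, 15 validation and 19 test iterations; keeping it is PyTorch `DataLoader`'s default, `drop_last=False`. A quick check, assuming the three numbers are the train, validation and test sizes in that order:

```python
import math

# Split sizes as printed by the `length_dataset` line (assumed order: train, val, test).
sizes = {"train": 932, "val": 232, "test": 302}
batch_size = 16

# Keeping the final partial batch gives ceil(n / batch_size) batches per split.
batches = {split: math.ceil(n / batch_size) for split, n in sizes.items()}
print(batches)  # {'train': 59, 'val': 15, 'test': 19}
```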

== Experiment 1 of 7:
=> Learning Rate: 1e-06, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.7207
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.7183
== Val Cross Entropy:  1.7109267136146282
== Val Accuracy:  0.18115942028985507
== Val F1:  0.10227129456780715
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.7134
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.7067
== Val Cross Entropy:  1.6994473070933902
== Val Accuracy:  0.18478260869565216
== Val F1:  0.10756793536357061
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.7110
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.6995
== Val Cross Entropy:  1.6923677428015347
== Val Accuracy:  0.18840579710144928
== Val F1:  0.10843778484851566
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.6924
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.6971
== Val Cross Entropy:  1.6900060259062668
== Val Accuracy:  0.18840579710144928
== Val F1:  0.10820818554994945
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.07it/s]



 Test Results — LR: 1e-06, BS: 16
 Test Accuracy: 0.2471
 Test F1 Score: 0.1289

== Experiment 2 of 7:
=> Learning Rate: 1e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.6793
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 1.6026
== Val Cross Entropy:  1.5965409237762977
== Val Accuracy:  0.21014492753623187
== Val F1:  0.14428237634254495
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.5544
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.4769
== Val Cross Entropy:  1.4723084877277244
== Val Accuracy:  0.29347826086956524
== Val F1:  0.2735735250223858
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.4592
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 1.3977
== Val Cross Entropy:  1.394243733636264
== Val Accuracy:  0.33695652173913043
== Val F1:  0.33217948335372466
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.3976
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 1.3716
== Val Cross Entropy:  1.3685592371841957
== Val Accuracy:  0.36231884057971014
== Val F1:  0.3585072427364211
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.04it/s]



 Test Results — LR: 1e-05, BS: 16
 Test Accuracy: 0.5423
 Test F1 Score: 0.4949

== Experiment 3 of 7:
=> Learning Rate: 2e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.6329
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.4707
== Val Cross Entropy:  1.4662179782472808
== Val Accuracy:  0.29347826086956524
== Val F1:  0.2758902914134331
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.3751
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.2236
== Val Cross Entropy:  1.2229019156817733
== Val Accuracy:  0.4855072463768116
== Val F1:  0.41525268464646997
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.1938
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.0993
== Val Cross Entropy:  1.1010514086690442
== Val Accuracy:  0.5253623188405797
== Val F1:  0.41621470969297053
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.1136
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.0669
== Val Cross Entropy:  1.0694685023406456
== Val Accuracy:  0.5144927536231884
== Val F1:  0.39408936210101597
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.08it/s]



 Test Results — LR: 2e-05, BS: 16
 Test Accuracy: 0.5309
 Test F1 Score: 0.3814

== Experiment 4 of 7:
=> Learning Rate: 3e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.5866
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.3404
== Val Cross Entropy:  1.3376904027215366
== Val Accuracy:  0.38768115942028986
== Val F1:  0.37460900971994393
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 1.2186
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.0432
== Val Cross Entropy:  1.0462461751082848
== Val Accuracy:  0.5108695652173914
== Val F1:  0.38716170920417636
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.0327
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.9662
== Val Cross Entropy:  0.9713478252805513
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3825951739990904
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 0.9853
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 0.9544
== Val Cross Entropy:  0.9601131965374124
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37582029580658255
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.05it/s]



 Test Results — LR: 3e-05, BS: 16
 Test Accuracy: 0.5309
 Test F1 Score: 0.3779

== Experiment 5 of 7:
=> Learning Rate: 5e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.34it/s]


train Loss: 1.4990
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.1174
== Val Cross Entropy:  1.1186149983570493
== Val Accuracy:  0.5217391304347826
== Val F1:  0.42013622890674707
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.0401
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.9285
== Val Cross Entropy:  0.9343689885632745
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3777965783041925
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.9231
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.96it/s]


val Loss: 0.9019
== Val Cross Entropy:  0.907920763410371
== Val Accuracy:  0.5108695652173914
== Val F1:  0.38231951570798456
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.9018
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.8996
== Val Cross Entropy:  0.9059895482556574
== Val Accuracy:  0.5144927536231884
== Val F1:  0.3845181259718266
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.07it/s]



 Test Results — LR: 5e-05, BS: 16
 Test Accuracy: 0.5286
 Test F1 Score: 0.3886

== Experiment 6 of 7:
=> Learning Rate: 0.0001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.3468
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.9313
== Val Cross Entropy:  0.9379299365240952
== Val Accuracy:  0.5108695652173914
== Val F1:  0.37746089265138444
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.9186
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.8544
== Val Cross Entropy:  0.8603108956896025
== Val Accuracy:  0.5543478260869565
== Val F1:  0.47211157697091205
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.8213
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.8475
== Val Cross Entropy:  0.8531243122857193
== Val Accuracy:  0.5615942028985508
== Val F1:  0.4832657063953509
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 0.8029
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 0.8469
== Val Cross Entropy:  0.8526922186900829
== Val Accuracy:  0.5688405797101449
== Val F1:  0.4948009285475384
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.04it/s]



 Test Results — LR: 0.0001, BS: 16
 Test Accuracy: 0.6018
 Test F1 Score: 0.5737

== Experiment 7 of 7:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.0366
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.7368
== Val Cross Entropy:  0.7389401742096605
== Val Accuracy:  0.6992753623188406
== Val F1:  0.690775323195592
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.6159
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.7106
== Val Cross Entropy:  0.7119464679010983
== Val Accuracy:  0.7391304347826086
== Val F1:  0.7139447129434404
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.4556
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.6945
== Val Cross Entropy:  0.694690258081617
== Val Accuracy:  0.717391304347826
== Val F1:  0.714969352241442
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.3282
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.6680
== Val Cross Entropy:  0.6684919933563677
== Val Accuracy:  0.7644927536231884
== Val F1:  0.7536996299358109
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.05it/s]



 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.7757
 Test F1 Score: 0.7640
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302

== Experiment 1 of 7:
=> Learning Rate: 1e-06, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.7212
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.7188
== Val Cross Entropy:  1.7114513824725974
== Val Accuracy:  0.18115942028985507
== Val F1:  0.10227129456780715
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.7133
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.7058
== Val Cross Entropy:  1.6985851074087208
== Val Accuracy:  0.18840579710144928
== Val F1:  0.10867081247516029
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.7092
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.6963
== Val Cross Entropy:  1.689128144034024
== Val Accuracy:  0.18840579710144928
== Val F1:  0.10820818554994945
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.6880
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 1.6906
== Val Cross Entropy:  1.6834838719203555
== Val Accuracy:  0.18840579710144928
== Val F1:  0.10798208441753081
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 1.6799
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.6887
== Val Cross Entropy:  1.6816235534076034
== Val Accuracy:  0.18840579710144928
== Val F1:  0.10798208441753081
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.06it/s]



 Test Results — LR: 1e-06, BS: 16
 Test Accuracy: 0.2494
 Test F1 Score: 0.1324

== Experiment 2 of 7:
=> Learning Rate: 1e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.6840
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.6080
== Val Cross Entropy:  1.6017985837212925
== Val Accuracy:  0.20652173913043478
== Val F1:  0.1375886524822695
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 1.5530
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.4660
== Val Cross Entropy:  1.4615798284267556
== Val Accuracy:  0.2971014492753623
== Val F1:  0.27717823272433584
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 1.4370
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.3591
== Val Cross Entropy:  1.3562114526485574
== Val Accuracy:  0.36594202898550726
== Val F1:  0.3593873332881809
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.3474
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.2974
== Val Cross Entropy:  1.2955027243186688
== Val Accuracy:  0.44565217391304346
== Val F1:  0.4049624398026662
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.3156
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.2787
== Val Cross Entropy:  1.2771142063469723
== Val Accuracy:  0.47101449275362317
== Val F1:  0.41837626740725675
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.07it/s]



 Test Results — LR: 1e-05, BS: 16
 Test Accuracy: 0.5698
 Test F1 Score: 0.4992

== Experiment 3 of 7:
=> Learning Rate: 2e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.6423
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 1.4811
== Val Cross Entropy:  1.4764311108095893
== Val Accuracy:  0.286231884057971
== Val F1:  0.2622093554076016
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.3716
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 1.2028
== Val Cross Entropy:  1.202515540451839
== Val Accuracy:  0.5
== Val F1:  0.4199944553920527
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.1587
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.96it/s]


val Loss: 1.0509
== Val Cross Entropy:  1.0538682958175396
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3854331838572366
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.0579
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.96it/s]


val Loss: 0.9998
== Val Cross Entropy:  1.0042166154960106
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3837223416340508
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.0424
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.9896
== Val Cross Entropy:  0.9942392069717934
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3837223416340508
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.07it/s]



 Test Results — LR: 2e-05, BS: 16
 Test Accuracy: 0.5309
 Test F1 Score: 0.3789

== Experiment 4 of 7:
=> Learning Rate: 3e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.6006
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.3547
== Val Cross Entropy:  1.3518270665201648
== Val Accuracy:  0.36231884057971014
== Val F1:  0.35696556855606487
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.2152
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 1.0260
== Val Cross Entropy:  1.0295398153107742
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3845755693581781
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.0101
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.9478
== Val Cross Entropy:  0.9535854705448809
== Val Accuracy:  0.5108695652173914
== Val F1:  0.37746089265138444
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.9595
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.9319
== Val Cross Entropy:  0.9382889558529032
== Val Accuracy:  0.5108695652173914
== Val F1:  0.37662284792231054
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.9578
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.9258
== Val Cross Entropy:  0.932106170161017
== Val Accuracy:  0.5108695652173914
== Val F1:  0.37746089265138444
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.04it/s]



 Test Results — LR: 3e-05, BS: 16
 Test Accuracy: 0.5309
 Test F1 Score: 0.3789

== Experiment 5 of 7:
=> Learning Rate: 5e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.5206
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 1.1332
== Val Cross Entropy:  1.1341665284386997
== Val Accuracy:  0.5289855072463768
== Val F1:  0.4321814713323226
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.0407
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.9223
== Val Cross Entropy:  0.9282996469530566
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3777965783041925
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.9105
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.8919
== Val Cross Entropy:  0.8979941627074932
== Val Accuracy:  0.5181159420289855
== Val F1:  0.3963143971551936
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.8811
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.8765
== Val Cross Entropy:  0.8828758247967424
== Val Accuracy:  0.532608695652174
== Val F1:  0.4248763174573751
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.8729
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 0.8681
== Val Cross Entropy:  0.8742140346559985
== Val Accuracy:  0.5289855072463768
== Val F1:  0.4251781066740941
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.07it/s]



 Test Results — LR: 5e-05, BS: 16
 Test Accuracy: 0.5469
 Test F1 Score: 0.4484

== Experiment 6 of 7:
=> Learning Rate: 0.0001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.3757
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.9347
== Val Cross Entropy:  0.9412208199501038
== Val Accuracy:  0.5108695652173914
== Val F1:  0.37746089265138444
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 0.9181
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.8498
== Val Cross Entropy:  0.8555716666682013
== Val Accuracy:  0.5615942028985508
== Val F1:  0.4854351460280221
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.8106
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.8390
== Val Cross Entropy:  0.8443337350056089
== Val Accuracy:  0.5905797101449275
== Val F1:  0.5308763307861001
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.7734
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.8055
== Val Cross Entropy:  0.811548270028213
== Val Accuracy:  0.6123188405797102
== Val F1:  0.5757932721950714
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.7599
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.8066
== Val Cross Entropy:  0.812416921401846
== Val Accuracy:  0.6123188405797102
== Val F1:  0.5750437426861105
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.04it/s]



 Test Results — LR: 0.0001, BS: 16
 Test Accuracy: 0.6636
 Test F1 Score: 0.6551

== Experiment 7 of 7:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 1.0507
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.7537
== Val Cross Entropy:  0.7554141488568537
== Val Accuracy:  0.6666666666666666
== Val F1:  0.6550100974103112
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 0.6186
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.7388
== Val Cross Entropy:  0.7401882435741096
== Val Accuracy:  0.7246376811594203
== Val F1:  0.7032416383806261
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.4648
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.96it/s]


val Loss: 0.7025
== Val Cross Entropy:  0.7034998131466323
== Val Accuracy:  0.7137681159420289
== Val F1:  0.7111131849831711
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.3072
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.6885
== Val Cross Entropy:  0.6918018810322573
== Val Accuracy:  0.7681159420289855
== Val F1:  0.756563168717241
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 0.2433
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.7036
== Val Cross Entropy:  0.7058236805465201
== Val Accuracy:  0.7681159420289855
== Val F1:  0.7599274832138881
== Early Stopping Count:  1


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.03it/s]


 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.7918
 Test F1 Score: 0.7872





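Each LoRA run above logs `Trainable parameters: 888,581 / 109,784,074 (0.81%)`, and the adapter runs below log `1,792,901 / 110,684,549 (1.62%)`. In PyTorch these counts are commonly obtained with `sum(p.numel() for p in model.parameters() if p.requires_grad)`. The sketch below only reproduces the percentage arithmetic and the log format from the two logged counts. The helper name `format_trainable` is hypothetical and is not part of this notebook.

```python
def format_trainable(trainable: int, total: int) -> str:
    """Format a trainable/total parameter count in the style of the training log."""
    # Hypothetical helper: share of weights that receive gradient updates, in percent
    pct = 100.0 * trainable / total
    return f"Trainable parameters: {trainable:,} / {total:,} ({pct:.2f}%)"


# Counts copied from the LoRA and adapter logs in this notebook
lora_line = format_trainable(888_581, 109_784_074)
adapter_line = format_trainable(1_792_901, 110_684_549)
print(lora_line)
print(adapter_line)
```

With these settings, the adapter configuration trains roughly twice as many parameters as LoRA (1.62% vs. 0.81% of the model).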
In [None]:
display_results(all_results)


SUMMARY OF RESULTS

LORA FINE-TUNING RESULTS:
------------------------------------------------------------
Method   Epochs LR       BS   Val CE   Val Acc  Val F1   Test Acc Test F1 
------------------------------------------------------------
lora     5      0.001000 16   0.6918   0.7681   0.7566   0.7918   0.7872  
lora     3      0.001000 16   0.6772   0.7609   0.7387   0.7826   0.7777  
lora     4      0.001000 16   0.6685   0.7645   0.7537   0.7757   0.7640  
lora     5      0.000100 16   0.8124   0.6123   0.5758   0.6636   0.6551  
lora     4      0.000100 16   0.8527   0.5688   0.4948   0.6018   0.5737  
lora     3      0.000100 16   0.8626   0.5471   0.4588   0.5835   0.5321  
lora     5      0.000010 16   1.2771   0.4710   0.4184   0.5698   0.4992  
lora     4      0.000010 16   1.3686   0.3623   0.3585   0.5423   0.4949  
lora     5      0.000050 16   0.8742   0.5290   0.4322   0.5469   0.4484  
lora     3      0.000020 16   1.2069   0.5036   0.4184   0.5423   0.4399  
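The summary table appears to be sorted by test F1 in descending order. The sketch below shows that ranking and assumes each result is stored as a dict. The keys (`method`, `epochs`, `lr`, `bs`, `test_acc`, `test_f1`) are hypothetical and may not match what `display_results` actually uses. The values are copied from four rows of the table above.

```python
# Hypothetical result records; values copied from the LoRA summary table
results = [
    {"method": "lora", "epochs": 4, "lr": 1e-3, "bs": 16, "test_acc": 0.7757, "test_f1": 0.7640},
    {"method": "lora", "epochs": 5, "lr": 1e-3, "bs": 16, "test_acc": 0.7918, "test_f1": 0.7872},
    {"method": "lora", "epochs": 3, "lr": 1e-3, "bs": 16, "test_acc": 0.7826, "test_f1": 0.7777},
    {"method": "lora", "epochs": 5, "lr": 1e-4, "bs": 16, "test_acc": 0.6636, "test_f1": 0.6551},
]

# Rank configurations by test F1, best first
ranked = sorted(results, key=lambda r: r["test_f1"], reverse=True)

print(f"{'Method':<8} {'Epochs':<6} {'LR':<8} {'BS':<4} {'Test Acc':<8} {'Test F1':<8}")
for r in ranked:
    print(f"{r['method']:<8} {r['epochs']:<6} {r['lr']:<8.6f} {r['bs']:<4} "
          f"{r['test_acc']:<8.4f} {r['test_f1']:<8.4f}")

best = ranked[0]
```

To rank by a validation column instead, change only the sort key.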

## Hyperparameter Search for Adapter Fine-Tuning

Best hyperparameters we found:
- learning rate: 1e-3
- epochs: 11
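The adapter search crosses 7 learning rates, one batch size and 3 epoch budgets, which gives 7 × 1 × 3 = 21 training runs. The log labels each run "Experiment k of 7". This suggests that the learning-rate × batch-size grid is repeated once per epoch budget. The sketch below enumerates that grid with `itertools.product`. The loop order is an assumption and is not taken from the `hyperparameter_search` source.

```python
from itertools import product

# Grid copied from the hyperparameter_search call for adapter fine-tuning
learning_rates = [1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3]
batch_sizes = [16]
epoch_budgets = [6, 9, 11]

# Assumed loop order: outer loop over epoch budgets, inner grid over (lr, batch size)
runs = []
for n_epochs in epoch_budgets:
    inner = list(product(learning_rates, batch_sizes))
    for k, (lr, bs) in enumerate(inner, start=1):
        print(f"epochs={n_epochs} | Experiment {k} of {len(inner)}: lr={lr}, bs={bs}")
        runs.append((n_epochs, lr, bs))
```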

In [None]:
model_name = "bert-base-uncased"

all_results = hyperparameter_search(
    model_name=model_name,
    fine_tuning_methods=["adapter"],
    learning_rates=[1e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 1e-3],
    batch_sizes=[16],  # as in the paper for this task
    epochs=[6, 9, 11],  # adapters get a larger epoch budget than full and LoRA fine-tuning
)


Running experiments for adapter fine-tuning on bert-base-uncased
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302

== Experiment 1 of 7:
=> Learning Rate: 1e-06, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.7326
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.82it/s]


val Loss: 1.7132
== Val Cross Entropy:  1.7057340268431038
== Val Accuracy:  0.19927536231884058
== Val F1:  0.1330536941803889
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.38it/s]


train Loss: 1.6885
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.6874
== Val Cross Entropy:  1.6801507061925427
== Val Accuracy:  0.1956521739130435
== Val F1:  0.13445591859403233
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.6836
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 1.6670
== Val Cross Entropy:  1.6600371475877433
== Val Accuracy:  0.19927536231884058
== Val F1:  0.14063028714047815
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.6612
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 1.6523
== Val Cross Entropy:  1.6454454126029179
== Val Accuracy:  0.20652173913043478
== Val F1:  0.14949148483008148
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.6567
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 1.6439
== Val Cross Entropy:  1.6371518579022637
== Val Accuracy:  0.213768115942029
== Val F1:  0.15848297950268547
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.6334
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 1.6410
== Val Cross Entropy:  1.6342821326749077
== Val Accuracy:  0.21739130434782608
== Val F1:  0.16511288006790256
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.02it/s]



 Test Results — LR: 1e-06, BS: 16
 Test Accuracy: 0.2563
 Test F1 Score: 0.1579

== Experiment 2 of 7:
=> Learning Rate: 1e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.6755
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.5300
== Val Cross Entropy:  1.524640412166201
== Val Accuracy:  0.27898550724637683
== Val F1:  0.2635714480396727
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.4380
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.3154
== Val Cross Entropy:  1.312815468886803
== Val Accuracy:  0.427536231884058
== Val F1:  0.3945088660805285
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.2835
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.1703
== Val Cross Entropy:  1.170179843902588
== Val Accuracy:  0.5144927536231884
== Val F1:  0.41311810226740914
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.1844
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 1.0894
== Val Cross Entropy:  1.0908345559547687
== Val Accuracy:  0.5217391304347826
== Val F1:  0.40839956188534465
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.1166
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.0535
== Val Cross Entropy:  1.055777015357182
== Val Accuracy:  0.5181159420289855
== Val F1:  0.40014603525697534
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.1038
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 1.0424
== Val Cross Entropy:  1.044919412711571
== Val Accuracy:  0.5181159420289855
== Val F1:  0.399275055516095
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.03it/s]



 Test Results — LR: 1e-05, BS: 16
 Test Accuracy: 0.5355
 Test F1 Score: 0.3905

== Experiment 3 of 7:
=> Learning Rate: 2e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.6170
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.3509
== Val Cross Entropy:  1.347963600323118
== Val Accuracy:  0.3804347826086957
== Val F1:  0.36377956021611274
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.2226
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 1.0413
== Val Cross Entropy:  1.0438554471936718
== Val Accuracy:  0.5181159420289855
== Val F1:  0.399275055516095
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0328
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.9531
== Val Cross Entropy:  0.9582319567943441
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9886
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.9252
== Val Cross Entropy:  0.9306000964394932
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37481820393981063
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9388
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.9151
== Val Cross Entropy:  0.9207496190893238
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3831683902294377
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9487
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.9122
== Val Cross Entropy:  0.9178271540280046
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3831683902294377
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.02it/s]



 Test Results — LR: 2e-05, BS: 16
 Test Accuracy: 0.5332
 Test F1 Score: 0.3860

== Experiment 4 of 7:
=> Learning Rate: 3e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.5637
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.2005
== Val Cross Entropy:  1.200062870979309
== Val Accuracy:  0.5144927536231884
== Val F1:  0.42312557110596144
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.0887
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.9434
== Val Cross Entropy:  0.9487376233627056
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9483
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.9097
== Val Cross Entropy:  0.915702419034366
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37481820393981063
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9243
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8826
== Val Cross Entropy:  0.8881247393016157
== Val Accuracy:  0.5398550724637681
== Val F1:  0.4415119800597165
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8764
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.8758
== Val Cross Entropy:  0.8813496334799404
== Val Accuracy:  0.5434782608695652
== Val F1:  0.4504842330929288
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 0.8858
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8739
== Val Cross Entropy:  0.8794416538600264
== Val Accuracy:  0.5471014492753623
== Val F1:  0.4565244582211709
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.02it/s]



 Test Results — LR: 3e-05, BS: 16
 Test Accuracy: 0.5904
 Test F1 Score: 0.5191

== Experiment 5 of 7:
=> Learning Rate: 5e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.4751
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 1.0096
== Val Cross Entropy:  1.013169424287204
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37582029580658255
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 0.9726
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8934
== Val Cross Entropy:  0.8993414044380188
== Val Accuracy:  0.5217391304347826
== Val F1:  0.40245602239574735
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8774
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8592
== Val Cross Entropy:  0.8648809811164593
== Val Accuracy:  0.572463768115942
== Val F1:  0.4971420831640255
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8506
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8415
== Val Cross Entropy:  0.8467738299534239
== Val Accuracy:  0.605072463768116
== Val F1:  0.5546011008681024
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8026
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8353
== Val Cross Entropy:  0.8404988728720566
== Val Accuracy:  0.6340579710144928
== Val F1:  0.5893629677698464
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8068
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.8344
== Val Cross Entropy:  0.8395322540710712
== Val Accuracy:  0.6485507246376812
== Val F1:  0.6062317646450051
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.03it/s]



 Test Results — LR: 5e-05, BS: 16
 Test Accuracy: 0.6362
 Test F1 Score: 0.6205

== Experiment 6 of 7:
=> Learning Rate: 0.0001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.3430
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.8963
== Val Cross Entropy:  0.9016191178354723
== Val Accuracy:  0.5217391304347826
== Val F1:  0.4085740754448654
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 0.8766
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8500
== Val Cross Entropy:  0.8554517184865886
== Val Accuracy:  0.6340579710144928
== Val F1:  0.5805728361115885
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8002
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.8041
== Val Cross Entropy:  0.8093339739174679
== Val Accuracy:  0.6413043478260869
== Val F1:  0.6117071154858681
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7456
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8310
== Val Cross Entropy:  0.8338253796100616
== Val Accuracy:  0.6956521739130435
== Val F1:  0.6587658048450429
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.6865
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.7743
== Val Cross Entropy:  0.7779550100194996
== Val Accuracy:  0.7355072463768116
== Val F1:  0.7117382077246351
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.6668
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.7778
== Val Cross Entropy:  0.7810546118637611
== Val Accuracy:  0.7391304347826086
== Val F1:  0.7147328406547374
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.03it/s]



 Test Results — LR: 0.0001, BS: 16
 Test Accuracy: 0.6865
 Test F1 Score: 0.6728
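Every adapter experiment in this log prints the same trainable-parameter line: 1,792,901 of 110,684,549 weights, about 1.62%. This suggests that only the adapter and classifier weights are updated while the BERT backbone stays frozen. The notebook's own counting code is not shown here, so the sketch below is one possible way to produce such a line. `summarize_trainable` and `FakeParam` are hypothetical names. `FakeParam` only stands in for `torch.nn.Parameter` so that the example runs without PyTorch.

```python
from dataclasses import dataclass


def summarize_trainable(params):
    """Return a summary line like the one printed in the log.

    `params` is any iterable of objects with `.numel()` and `.requires_grad`;
    the tensors yielded by `torch.nn.Module.parameters()` qualify.
    """
    params = list(params)  # materialise once so we can sum over it twice
    total = sum(p.numel() for p in params)
    trainable = sum(p.numel() for p in params if p.requires_grad)
    pct = 100.0 * trainable / total
    return f"Trainable parameters: {trainable:,} / {total:,} ({pct:.2f}%)"


@dataclass
class FakeParam:
    """Hypothetical stand-in for a tensor so the sketch runs without PyTorch."""
    n: int
    requires_grad: bool

    def numel(self):
        return self.n


# One frozen block for the backbone and one trainable block for the rest,
# sized with the totals printed in the log.
params = [
    FakeParam(110_684_549 - 1_792_901, requires_grad=False),
    FakeParam(1_792_901, requires_grad=True),
]
print(summarize_trainable(params))
# Trainable parameters: 1,792,901 / 110,684,549 (1.62%)
```

With a real model, the call would be `summarize_trainable(model.parameters())`.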

== Experiment 7 of 7:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0957
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.8074
== Val Cross Entropy:  0.8103231376615064
== Val Accuracy:  0.7137681159420289
== Val F1:  0.6950980686530615
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 0.7122
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.7043
== Val Cross Entropy:  0.7024069279432297
== Val Accuracy:  0.7282608695652174
== Val F1:  0.7053006806034656
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.5202
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.7438
== Val Cross Entropy:  0.7423142821624361
== Val Accuracy:  0.7427536231884058
== Val F1:  0.735583039079312
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.3668
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.6146
== Val Cross Entropy:  0.6203353615670368
== Val Accuracy:  0.7536231884057971
== Val F1:  0.7385662403945855
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.2420
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.7020
== Val Cross Entropy:  0.7078650741484659
== Val Accuracy:  0.7934782608695652
== Val F1:  0.7834809702890063
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.1371
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.7899
== Val Cross Entropy:  0.7954934565671559
== Val Accuracy:  0.782608695652174
== Val F1:  0.7662180415641651
== Early Stopping Count:  1


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.02it/s]



 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.8375
 Test F1 Score: 0.8288
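The `Early Stopping Count` lines count epochs without improvement, but the log does not show which rule the notebook applies. A common rule, sketched below as an assumption, works as follows. It counts the epochs whose validation loss fails to beat the best value so far by at least `min_delta`, and it stops once the count reaches `patience`. `EarlyStopping` and `counter_trace` are hypothetical names. The patience of 3 is also an assumption, suggested by the lr=0.001 run of the 9-epoch sweep later in this log, which ends as soon as its counter reaches 3.

```python
class EarlyStopping:
    """Patience-based early stopping on validation loss (lower is better).

    Hypothetical helper: a common rule, not necessarily the notebook's.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.count = 0

    def step(self, val_loss):
        """Record one epoch; return True once `patience` bad epochs pile up."""
        if self.best is None or val_loss < self.best - self.min_delta:
            self.best = val_loss  # new best: reset the counter
            self.count = 0
        else:
            self.count += 1  # no sufficient improvement this epoch
        return self.count >= self.patience


def counter_trace(val_losses, patience=3, min_delta=0.0):
    """Counter value after each epoch, stopping early if the rule fires."""
    stopper = EarlyStopping(patience, min_delta)
    trace = []
    for loss in val_losses:
        stop = stopper.step(loss)
        trace.append(stopper.count)
        if stop:
            break
    return trace


# "val Loss" values of the lr=0.001 run above (6-epoch sweep).
val_losses = [0.8074, 0.7043, 0.7438, 0.6146, 0.7020, 0.7899]
print(counter_trace(val_losses))  # [0, 0, 1, 0, 1, 2]
```

Under this rule the counter would read 0, 0, 1, 0, 1, 2 for that run. The notebook instead logged 0, 0, 0, 0, 0, 1. Its criterion is therefore evidently more tolerant of small regressions, or it monitors something other than validation loss alone.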
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302
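The tensor printed above reads like per-class loss weights in the label order of the mapping (`I-LOC`, `I-MISC`, `I-ORG`, `I-PER`, `O`), with `O` given weight 0. Assume it is passed as `weight=` to `torch.nn.CrossEntropyLoss` with the default `'mean'` reduction. PyTorch then divides the sum of `w[y] * -log p(y)` by the sum of `w[y]` over the targets, so `O` tokens contribute nothing to the loss. The sketch below reproduces that computation in plain Python so it runs without PyTorch. `weighted_cross_entropy` is a hypothetical name, not the notebook's function.

```python
import math

LABEL2ID = {'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
CLASS_WEIGHTS = [0.25, 0.25, 0.25, 0.25, 0.0]  # as printed in the log


def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def weighted_cross_entropy(batch_logits, targets, weights):
    """sum(w[y] * -log p(y)) / sum(w[y]): the weighted 'mean' reduction."""
    num = 0.0
    den = 0.0
    for logits, y in zip(batch_logits, targets):
        w = weights[y]
        num += w * -log_softmax(logits)[y]
        den += w
    # The weighted mean is 0/0 when every target has weight 0
    # (e.g. an all-'O' batch); return nan in that case.
    return num / den if den > 0 else float("nan")


entity_logits = [2.0, 0.0, 0.0, 0.0, 0.0]  # token labelled I-LOC
o_logits = [5.0, 0.0, 0.0, 0.0, 0.0]       # token labelled O, confidently mispredicted

with_o = weighted_cross_entropy([entity_logits, o_logits],
                                [LABEL2ID['I-LOC'], LABEL2ID['O']], CLASS_WEIGHTS)
entity_only = weighted_cross_entropy([entity_logits], [LABEL2ID['I-LOC']], CLASS_WEIGHTS)
print(math.isclose(with_o, entity_only))  # True: the O token adds nothing
```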

== Experiment 1 of 7:
=> Learning Rate: 1e-06, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.7346
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.7178
== Val Cross Entropy:  1.710271535248592
== Val Accuracy:  0.19927536231884058
== Val F1:  0.1336150667407295
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.6913
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.6890
== Val Cross Entropy:  1.6818079701785384
== Val Accuracy:  0.1956521739130435
== Val F1:  0.13445591859403233
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6831
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 1.6639
== Val Cross Entropy:  1.6569350094630801
== Val Accuracy:  0.19927536231884058
== Val F1:  0.14063028714047815
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6556
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.6425
== Val Cross Entropy:  1.6357353473531788
== Val Accuracy:  0.213768115942029
== Val F1:  0.16053909858257684
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6441
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.6259
== Val Cross Entropy:  1.619378772275201
== Val Accuracy:  0.2246376811594203
== Val F1:  0.1756390988583403
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6134
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 1.6131
== Val Cross Entropy:  1.6067499826694358
== Val Accuracy:  0.2318840579710145
== Val F1:  0.18337134249776518
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6095
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 1.6038
== Val Cross Entropy:  1.5975275738485928
== Val Accuracy:  0.2318840579710145
== Val F1:  0.18337134249776518
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6027
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.5983
== Val Cross Entropy:  1.592115854394847
== Val Accuracy:  0.2318840579710145
== Val F1:  0.18337134249776518
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6017
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.5964
== Val Cross Entropy:  1.590173704870816
== Val Accuracy:  0.23550724637681159
== Val F1:  0.189949523101697
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.04it/s]



 Test Results — LR: 1e-06, BS: 16
 Test Accuracy: 0.3135
 Test F1 Score: 0.2463

== Experiment 2 of 7:
=> Learning Rate: 1e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.6941
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.5714
== Val Cross Entropy:  1.5655197603949185
== Val Accuracy:  0.2463768115942029
== Val F1:  0.21324918336374127
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.4619
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 1.3277
== Val Cross Entropy:  1.3249185660789753
== Val Accuracy:  0.41304347826086957
== Val F1:  0.3871172276764843
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.2794
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 1.1487
== Val Cross Entropy:  1.1489434941061611
== Val Accuracy:  0.5217391304347826
== Val F1:  0.4210598082396267
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.1526
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.0443
== Val Cross Entropy:  1.0467719805651698
== Val Accuracy:  0.5181159420289855
== Val F1:  0.399275055516095
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0618
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.9940
== Val Cross Entropy:  0.9978226895990043
== Val Accuracy:  0.5108695652173914
== Val F1:  0.38343972854842423
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0362
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.9679
== Val Cross Entropy:  0.9724836082294069
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37498428285501684
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0228
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.9551
== Val Cross Entropy:  0.9599596631938013
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9986
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.9484
== Val Cross Entropy:  0.9533430605099119
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0000
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 0.9462
== Val Cross Entropy:  0.9512718340446209
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.02it/s]



 Test Results — LR: 1e-05, BS: 16
 Test Accuracy: 0.5309
 Test F1 Score: 0.3803

== Experiment 3 of 7:
=> Learning Rate: 2e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.6517
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.4241
== Val Cross Entropy:  1.4200915303723565
== Val Accuracy:  0.35144927536231885
== Val F1:  0.35153701399181797
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.2580
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.0513
== Val Cross Entropy:  1.0535680918857968
== Val Accuracy:  0.5181159420289855
== Val F1:  0.399275055516095
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0296
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.9462
== Val Cross Entropy:  0.9516776430195776
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9742
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.9117
== Val Cross Entropy:  0.9173219347822255
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3831683902294377
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9145
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8966
== Val Cross Entropy:  0.9023935178230549
== Val Accuracy:  0.532608695652174
== Val F1:  0.42456908593644965
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9136
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8854
== Val Cross Entropy:  0.891036701613459
== Val Accuracy:  0.5398550724637681
== Val F1:  0.4418729969653138
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9071
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8782
== Val Cross Entropy:  0.8836210411170433
== Val Accuracy:  0.5434782608695652
== Val F1:  0.45011996767965845
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8869
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.86it/s]


val Loss: 0.8715
== Val Cross Entropy:  0.8767767939074286
== Val Accuracy:  0.5579710144927537
== Val F1:  0.4761892520698158
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8895
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.8710
== Val Cross Entropy:  0.8762484973874586
== Val Accuracy:  0.5579710144927537
== Val F1:  0.4756239331749281
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.04it/s]



 Test Results — LR: 2e-05, BS: 16
 Test Accuracy: 0.5950
 Test F1 Score: 0.5331

== Experiment 4 of 7:
=> Learning Rate: 3e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.6119
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.2932
== Val Cross Entropy:  1.2911124763817623
== Val Accuracy:  0.4492753623188406
== Val F1:  0.40338775073099853
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.1220
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.9472
== Val Cross Entropy:  0.9525565422814468
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9466
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.9049
== Val Cross Entropy:  0.9110159051829371
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3814749611586507
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9126
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8734
== Val Cross Entropy:  0.8788701172532707
== Val Accuracy:  0.5471014492753623
== Val F1:  0.4570802724669586
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8553
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8607
== Val Cross Entropy:  0.8661382938253468
== Val Accuracy:  0.5652173913043478
== Val F1:  0.48864464507121974
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8520
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.8499
== Val Cross Entropy:  0.8551548226126309
== Val Accuracy:  0.5942028985507246
== Val F1:  0.5322322824907229
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8462
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8427
== Val Cross Entropy:  0.8477048051768336
== Val Accuracy:  0.6086956521739131
== Val F1:  0.5563231503781637
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8284
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8363
== Val Cross Entropy:  0.8412163360365505
== Val Accuracy:  0.6268115942028986
== Val F1:  0.5823306861253689
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8320
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.8367
== Val Cross Entropy:  0.8416851060143833
== Val Accuracy:  0.6159420289855072
== Val F1:  0.5691557903354578
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.01it/s]



 Test Results — LR: 3e-05, BS: 16
 Test Accuracy: 0.6270
 Test F1 Score: 0.6111

== Experiment 5 of 7:
=> Learning Rate: 5e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.5409
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 1.0919
== Val Cross Entropy:  1.0934983368577629
== Val Accuracy:  0.5217391304347826
== Val F1:  0.40839956188534465
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9945
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.8985
== Val Cross Entropy:  0.9045885081948906
== Val Accuracy:  0.5181159420289855
== Val F1:  0.3961409466464055
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8762
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8539
== Val Cross Entropy:  0.8595730415705977
== Val Accuracy:  0.5688405797101449
== Val F1:  0.4937604027118094
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8392
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8487
== Val Cross Entropy:  0.8539013225456764
== Val Accuracy:  0.6268115942028986
== Val F1:  0.577482633723868
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7832
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8191
== Val Cross Entropy:  0.8242526814855379
== Val Accuracy:  0.6739130434782609
== Val F1:  0.6443904936057899
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7685
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.8098
== Val Cross Entropy:  0.8147906291073767
== Val Accuracy:  0.6884057971014492
== Val F1:  0.6611994182120409
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7606
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8035
== Val Cross Entropy:  0.8082121507874851
== Val Accuracy:  0.677536231884058
== Val F1:  0.6484997544817843
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7403
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.7974
== Val Cross Entropy:  0.802146983557734
== Val Accuracy:  0.6847826086956522
== Val F1:  0.6604190465961719
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7413
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.7999
== Val Cross Entropy:  0.8046318539257707
== Val Accuracy:  0.6847826086956522
== Val F1:  0.6608627463322266
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.04it/s]



 Test Results — LR: 5e-05, BS: 16
 Test Accuracy: 0.6613
 Test F1 Score: 0.6506

== Experiment 6 of 7:
=> Learning Rate: 0.0001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.4176
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.9226
== Val Cross Entropy:  0.9283638247128191
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8909
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.8529
== Val Cross Entropy:  0.8583418870794361
== Val Accuracy:  0.6304347826086957
== Val F1:  0.5774838154767375
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7996
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8009
== Val Cross Entropy:  0.8059097816204203
== Val Accuracy:  0.6159420289855072
== Val F1:  0.5915463687641838
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7284
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8469
== Val Cross Entropy:  0.8479229720502064
== Val Accuracy:  0.7028985507246377
== Val F1:  0.6604858414563318
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.6473
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.7515
== Val Cross Entropy:  0.7538001033766516
== Val Accuracy:  0.7427536231884058
== Val F1:  0.7248097584305329
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.5986
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.7567
== Val Cross Entropy:  0.7570526897907257
== Val Accuracy:  0.7318840579710145
== Val F1:  0.7145470626994789
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.5649
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.7486
== Val Cross Entropy:  0.7474920544131048
== Val Accuracy:  0.7282608695652174
== Val F1:  0.7075165764106529
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.5409
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.7522
== Val Cross Entropy:  0.7508237218034679
== Val Accuracy:  0.7246376811594203
== Val F1:  0.7077294685990337
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.5116
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.7530
== Val Cross Entropy:  0.7513916780208719
== Val Accuracy:  0.7210144927536232
== Val F1:  0.7042609266335152
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.03it/s]



 Test Results — LR: 0.0001, BS: 16
 Test Accuracy: 0.7529
 Test F1 Score: 0.7400

== Experiment 7 of 7:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.44it/s]


train Loss: 1.1081
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 0.8329
== Val Cross Entropy:  0.8353921606622893
== Val Accuracy:  0.6847826086956522
== Val F1:  0.6727114396409695
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 0.6926
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.86it/s]


val Loss: 0.8051
== Val Cross Entropy:  0.80694146248801
== Val Accuracy:  0.6630434782608695
== Val F1:  0.6746082120566411
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.5041
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.7657
== Val Cross Entropy:  0.761344291655154
== Val Accuracy:  0.7355072463768116
== Val F1:  0.7287714087287148
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.3602
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.86it/s]


val Loss: 0.6555
== Val Cross Entropy:  0.6674139319822706
== Val Accuracy:  0.7463768115942029
== Val F1:  0.7345698577240825
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.2466
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.7686
== Val Cross Entropy:  0.7799203059282797
== Val Accuracy:  0.8007246376811594
== Val F1:  0.7816124751164786
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.1415
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.9676
== Val Cross Entropy:  0.9911276320445126
== Val Accuracy:  0.7717391304347826
== Val F1:  0.7534638867876855
== Early Stopping Count:  1

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.1098
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.0746
== Val Cross Entropy:  1.1075047223090098
== Val Accuracy:  0.7898550724637681
== Val F1:  0.7793223428071469
== Early Stopping Count:  2

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.0680
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 1.1042
== Val Cross Entropy:  1.1415293506378759
== Val Accuracy:  0.7934782608695652
== Val F1:  0.786575883816636
== Early Stopping Count:  3

=== Epoch 9


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.03it/s]



 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.8032
 Test F1 Score: 0.8018
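The 9-epoch sweep above is easier to compare when its results sit in one place. The sketch below transcribes three numbers per learning rate from the log: the last logged validation F1, the test accuracy and the test F1. It then picks the learning rate by validation F1, which keeps the test split out of model selection. `SWEEP_9_EPOCHS` and `pick_by_validation` are illustrative names. The notebook's own selection code is not shown.

```python
# (last logged val F1, test accuracy, test F1), transcribed from the log above.
SWEEP_9_EPOCHS = {
    1e-6: (0.1899, 0.3135, 0.2463),
    1e-5: (0.3745, 0.5309, 0.3803),
    2e-5: (0.4756, 0.5950, 0.5331),
    3e-5: (0.5692, 0.6270, 0.6111),
    5e-5: (0.6609, 0.6613, 0.6506),
    1e-4: (0.7043, 0.7529, 0.7400),
    1e-3: (0.7866, 0.8032, 0.8018),  # early-stopped after epoch 8
}


def pick_by_validation(results):
    """Return (lr, test acc, test F1) for the lr with the best validation F1."""
    best_lr = max(results, key=lambda lr: results[lr][0])
    _, test_acc, test_f1 = results[best_lr]
    return best_lr, test_acc, test_f1


for lr, (val_f1, test_acc, test_f1) in sorted(SWEEP_9_EPOCHS.items()):
    print(f"lr={lr:g}  val F1={val_f1:.4f}  test acc={test_acc:.4f}  test F1={test_f1:.4f}")

print(pick_by_validation(SWEEP_9_EPOCHS))  # (0.001, 0.8032, 0.8018)
```

In this sweep, validation and test agree: lr = 0.001 is best on both, with a test F1 of 0.8018.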
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302
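The tokenizer warning above (687 > 512) means that at least one example is longer than BERT's 512-position limit. The log does not show how the notebook handles it. One option is to pass `truncation=True, max_length=512` to the Hugging Face tokenizer call, which drops the tail. Another option, sketched below in plain Python, splits the aligned token and label IDs into windows of at most 510 items so that `[CLS]` and `[SEP]` still fit. `chunk_sequence` is a hypothetical helper. The 510 figure assumes exactly two special tokens per window.

```python
def chunk_sequence(token_ids, label_ids, max_len=512, num_special=2):
    """Split aligned token/label ID lists into windows that fit the model.

    Leaves `num_special` positions free in each window for [CLS] and [SEP],
    which are assumed to be added back to every chunk afterwards.
    """
    if len(token_ids) != len(label_ids):
        raise ValueError("token_ids and label_ids must be the same length")
    window = max_len - num_special
    return [
        (token_ids[start:start + window], label_ids[start:start + window])
        for start in range(0, len(token_ids), window)
    ]


# A 687-token example, as in the warning above (IDs and labels are dummies).
chunks = chunk_sequence(list(range(687)), [4] * 687)
print([len(tokens) for tokens, _ in chunks])  # [510, 177]
```

Fixed windows can cut a multi-token entity across two chunks. An overlapping stride reduces that risk, at the cost of predicting some tokens twice.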

== Experiment 1 of 7:
=> Learning Rate: 1e-06, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.7354
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 1.7206
== Val Cross Entropy:  1.713087377877071
== Val Accuracy:  0.1956521739130435
== Val F1:  0.13252458470218445
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.6936
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.6910
== Val Cross Entropy:  1.6837768760220757
== Val Accuracy:  0.1956521739130435
== Val F1:  0.13445591859403233
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6842
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.6642
== Val Cross Entropy:  1.657193742949387
== Val Accuracy:  0.19927536231884058
== Val F1:  0.14063028714047815
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6548
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 1.6403
== Val Cross Entropy:  1.6336198593008107
== Val Accuracy:  0.2210144927536232
== Val F1:  0.17215936494382378
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6408
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.6209
== Val Cross Entropy:  1.6143755172861034
== Val Accuracy:  0.2246376811594203
== Val F1:  0.17537997686008786
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6075
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.6046
== Val Cross Entropy:  1.5982719577591995
== Val Accuracy:  0.2318840579710145
== Val F1:  0.18337134249776518
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6003
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.5910
== Val Cross Entropy:  1.5848620431176548
== Val Accuracy:  0.23550724637681159
== Val F1:  0.1894052973513243
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.5892
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 1.5807
== Val Cross Entropy:  1.5747018888078887
== Val Accuracy:  0.2463768115942029
== Val F1:  0.21030930192124486
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.5838
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.5733
== Val Cross Entropy:  1.567356931752172
== Val Accuracy:  0.2536231884057971
== Val F1:  0.22257071643821857
== Early Stopping Count:  0

=== Epoch 10
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.5795
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.5688
== Val Cross Entropy:  1.5629034658958172
== Val Accuracy:  0.2536231884057971
== Val F1:  0.22298165824100227
== Early Stopping Count:  0

=== Epoch 11
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.5771
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 1.5671
== Val Cross Entropy:  1.5612698998944512
== Val Accuracy:  0.2536231884057971
== Val F1:  0.22298165824100227
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.03it/s]



 Test Results — LR: 1e-06, BS: 16
 Test Accuracy: 0.3524
 Test F1 Score: 0.3068

== Experiment 2 of 7:
=> Learning Rate: 1e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.7016
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.5975
== Val Cross Entropy:  1.5913287031239476
== Val Accuracy:  0.2318840579710145
== Val F1:  0.18337134249776518
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.4810
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 1.3430
== Val Cross Entropy:  1.3399925190826942
== Val Accuracy:  0.38768115942028986
== Val F1:  0.3771887233233678
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.2869
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 1.1492
== Val Cross Entropy:  1.1494391540001179
== Val Accuracy:  0.5217391304347826
== Val F1:  0.4210598082396267
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.1481
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 1.0352
== Val Cross Entropy:  1.0378912440661727
== Val Accuracy:  0.5144927536231884
== Val F1:  0.3924436599806389
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0497
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 0.9817
== Val Cross Entropy:  0.9859475760624327
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37582029580658255
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0218
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.9545
== Val Cross Entropy:  0.9594848032655388
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0057
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.9404
== Val Cross Entropy:  0.9456603547622418
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9773
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 0.9302
== Val Cross Entropy:  0.9354951628323259
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37565847141021386
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9739
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 0.9247
== Val Cross Entropy:  0.9300454793305233
== Val Accuracy:  0.5072463768115942
== Val F1:  0.3765030259595477
== Early Stopping Count:  0

=== Epoch 10
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9453
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.9211
== Val Cross Entropy:  0.9265469456541127
== Val Accuracy:  0.5072463768115942
== Val F1:  0.3765030259595477
== Early Stopping Count:  0

=== Epoch 11
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9546
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.9200
== Val Cross Entropy:  0.925423761893963
== Val Accuracy:  0.5108695652173914
== Val F1:  0.3831683902294377
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.03it/s]



 Test Results — LR: 1e-05, BS: 16
 Test Accuracy: 0.5332
 Test F1 Score: 0.3858

== Experiment 3 of 7:
=> Learning Rate: 2e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6659
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 1.4715
== Val Cross Entropy:  1.4668786361299713
== Val Accuracy:  0.3188405797101449
== Val F1:  0.3183130963545794
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.2876
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.0669
== Val Cross Entropy:  1.0688569627959152
== Val Accuracy:  0.5217391304347826
== Val F1:  0.40776231818249126
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0356
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.9465
== Val Cross Entropy:  0.952050638609919
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9723
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.9091
== Val Cross Entropy:  0.9147866799913603
== Val Accuracy:  0.5144927536231884
== Val F1:  0.3897130219009602
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9089
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8922
== Val Cross Entropy:  0.8979683580069706
== Val Accuracy:  0.5398550724637681
== Val F1:  0.4371942327430169
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9044
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8783
== Val Cross Entropy:  0.8838063868983038
== Val Accuracy:  0.5471014492753623
== Val F1:  0.45598404728839514
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8936
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8684
== Val Cross Entropy:  0.8736744666921681
== Val Accuracy:  0.5615942028985508
== Val F1:  0.4809689696923797
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8697
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8585
== Val Cross Entropy:  0.8635851165343975
== Val Accuracy:  0.5760869565217391
== Val F1:  0.504656187501797
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8690
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.8561
== Val Cross Entropy:  0.8612343319531145
== Val Accuracy:  0.5797101449275363
== Val F1:  0.5096536401444417
== Early Stopping Count:  0

=== Epoch 10
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8416
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.8537
== Val Cross Entropy:  0.8588978627632404
== Val Accuracy:  0.5760869565217391
== Val F1:  0.5069645866480001
== Early Stopping Count:  0

=== Epoch 11
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8473
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8531
== Val Cross Entropy:  0.8582492244654688
== Val Accuracy:  0.5797101449275363
== Val F1:  0.5118588855160353
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.03it/s]



 Test Results — LR: 2e-05, BS: 16
 Test Accuracy: 0.6178
 Test F1 Score: 0.5849

== Experiment 4 of 7:
=> Learning Rate: 3e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.6320
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 1.3566
== Val Cross Entropy:  1.3535507095271144
== Val Accuracy:  0.3695652173913043
== Val F1:  0.3595669384506163
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 1.1517
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.9533
== Val Cross Entropy:  0.9584637798112015
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9500
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.9053
== Val Cross Entropy:  0.9114737140721288
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37481820393981063
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.9113
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8726
== Val Cross Entropy:  0.8781431687289271
== Val Accuracy:  0.5507246376811594
== Val F1:  0.4636060398627176
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8507
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8569
== Val Cross Entropy:  0.862298505059604
== Val Accuracy:  0.5688405797101449
== Val F1:  0.4938911270380368
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8430
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8435
== Val Cross Entropy:  0.848681443724139
== Val Accuracy:  0.605072463768116
== Val F1:  0.5527341743998967
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8333
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.8343
== Val Cross Entropy:  0.8391797727551954
== Val Accuracy:  0.6485507246376812
== Val F1:  0.604514838743056
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8115
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8261
== Val Cross Entropy:  0.8309719726957124
== Val Accuracy:  0.6666666666666666
== Val F1:  0.6325629312220128
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8114
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.8250
== Val Cross Entropy:  0.8300556992662365
== Val Accuracy:  0.6630434782608695
== Val F1:  0.628549440983037
== Early Stopping Count:  0

=== Epoch 10
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7846
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.8234
== Val Cross Entropy:  0.8284739399778431
== Val Accuracy:  0.6630434782608695
== Val F1:  0.6291202048270581
== Early Stopping Count:  0

=== Epoch 11
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7862
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8233
== Val Cross Entropy:  0.8283202483736235
== Val Accuracy:  0.6666666666666666
== Val F1:  0.6325629312220128
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.02it/s]



 Test Results — LR: 3e-05, BS: 16
 Test Accuracy: 0.6339
 Test F1 Score: 0.6215

== Experiment 5 of 7:
=> Learning Rate: 5e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.5699
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.1651
== Val Cross Entropy:  1.1651769095453723
== Val Accuracy:  0.5144927536231884
== Val F1:  0.4132456693358359
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.0167
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.9041
== Val Cross Entropy:  0.9102797837092959
== Val Accuracy:  0.5181159420289855
== Val F1:  0.3961409466464055
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 0.8796
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.8543
== Val Cross Entropy:  0.8600942410271744
== Val Accuracy:  0.5688405797101449
== Val F1:  0.4938911270380368
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8381
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.8551
== Val Cross Entropy:  0.8601915620524307
== Val Accuracy:  0.6231884057971014
== Val F1:  0.5724654703421134
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7789
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8154
== Val Cross Entropy:  0.8205397416805399
== Val Accuracy:  0.6702898550724637
== Val F1:  0.6449593178855875
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7577
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8028
== Val Cross Entropy:  0.807776952611989
== Val Accuracy:  0.6847826086956522
== Val F1:  0.6614463555231346
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7425
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.7957
== Val Cross Entropy:  0.8001862373845331
== Val Accuracy:  0.6956521739130435
== Val F1:  0.6676260313563812
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7122
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.7912
== Val Cross Entropy:  0.7956283565225273
== Val Accuracy:  0.6920289855072463
== Val F1:  0.6702250583594915
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7020
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.7796
== Val Cross Entropy:  0.7840797767556947
== Val Accuracy:  0.717391304347826
== Val F1:  0.6938898452704041
== Early Stopping Count:  0

=== Epoch 10
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.6790
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 0.7835
== Val Cross Entropy:  0.7877778770594761
== Val Accuracy:  0.7137681159420289
== Val F1:  0.6901480935680606
== Early Stopping Count:  0

=== Epoch 11
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.6652
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.7841
== Val Cross Entropy:  0.7881143319195715
== Val Accuracy:  0.7137681159420289
== Val F1:  0.6857040817837105
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.02it/s]



 Test Results — LR: 5e-05, BS: 16
 Test Accuracy: 0.7094
 Test F1 Score: 0.6963

== Experiment 6 of 7:
=> Learning Rate: 0.0001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.4522
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 0.9472
== Val Cross Entropy:  0.9524591380152209
== Val Accuracy:  0.5072463768115942
== Val F1:  0.37447766974459323
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 0.9036
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8553
== Val Cross Entropy:  0.8607425042267504
== Val Accuracy:  0.6231884057971014
== Val F1:  0.5686681814619616
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.8019
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.8023
== Val Cross Entropy:  0.8072231576360506
== Val Accuracy:  0.6159420289855072
== Val F1:  0.591691074535914
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.7249
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.8429
== Val Cross Entropy:  0.8433759854785328
== Val Accuracy:  0.7101449275362319
== Val F1:  0.6699445290941106
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.6372
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.7487
== Val Cross Entropy:  0.75045455529772
== Val Accuracy:  0.7318840579710145
== Val F1:  0.7109973219076466
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.5824
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.7603
== Val Cross Entropy:  0.7600173076678967
== Val Accuracy:  0.7210144927536232
== Val F1:  0.7060042524584524
== Early Stopping Count:  0

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.5352
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.7409
== Val Cross Entropy:  0.738282678456142
== Val Accuracy:  0.7137681159420289
== Val F1:  0.6960490947009652
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.5096
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.7474
== Val Cross Entropy:  0.7445189850083713
== Val Accuracy:  0.7246376811594203
== Val F1:  0.7123841034870514
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.4622
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 0.7583
== Val Cross Entropy:  0.7552369591490976
== Val Accuracy:  0.7355072463768116
== Val F1:  0.7220454813243784
== Early Stopping Count:  0

=== Epoch 10
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.4646
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.7679
== Val Cross Entropy:  0.7643320257293766
== Val Accuracy:  0.717391304347826
== Val F1:  0.7031017964868728
== Early Stopping Count:  1

=== Epoch 11
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.4288
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.7595
== Val Cross Entropy:  0.7555692874152085
== Val Accuracy:  0.717391304347826
== Val F1:  0.7023887331695049
== Early Stopping Count:  2


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.02it/s]



 Test Results — LR: 0.0001, BS: 16
 Test Accuracy: 0.7391
 Test F1 Score: 0.7251

== Experiment 7 of 7:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.43it/s]


train Loss: 1.1229
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.8200
== Val Cross Entropy:  0.8228849953618543
== Val Accuracy:  0.6920289855072463
== Val F1:  0.6805931443937996
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 0.7191
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.7194
== Val Cross Entropy:  0.7187201319069698
== Val Accuracy:  0.717391304347826
== Val F1:  0.7067106614184808
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.5476
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.7606
== Val Cross Entropy:  0.7562077099393154
== Val Accuracy:  0.7318840579710145
== Val F1:  0.7352187080956453
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.4038
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.6302
== Val Cross Entropy:  0.6338671194068317
== Val Accuracy:  0.7644927536231884
== Val F1:  0.7621437889159348
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.2600
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.7479
== Val Cross Entropy:  0.7522286830534195
== Val Accuracy:  0.7862318840579711
== Val F1:  0.7701500967486061
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.1822
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.8560
== Val Cross Entropy:  0.8699707391447035
== Val Accuracy:  0.7753623188405797
== Val F1:  0.7645856978053029
== Early Stopping Count:  1

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.0966
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.0446
== Val Cross Entropy:  1.0769096257100845
== Val Accuracy:  0.7717391304347826
== Val F1:  0.7668042540165302
== Early Stopping Count:  2

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.0951
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.9967
== Val Cross Entropy:  1.0300351679967394
== Val Accuracy:  0.8152173913043478
== Val F1:  0.8013485883996281
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.0540
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 1.1045
== Val Cross Entropy:  1.1425585846320307
== Val Accuracy:  0.7862318840579711
== Val F1:  0.7807344368639445
== Early Stopping Count:  1

=== Epoch 10
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.0295
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 1.1518
== Val Cross Entropy:  1.1911216898468033
== Val Accuracy:  0.8079710144927537
== Val F1:  0.7962144832140442
== Early Stopping Count:  2

=== Epoch 11
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.0205
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 1.1657
== Val Cross Entropy:  1.2056357234652186
== Val Accuracy:  0.8043478260869565
== Val F1:  0.7928282813560255
== Early Stopping Count:  3


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.03it/s]


 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.8055
 Test F1 Score: 0.8034
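Every epoch above logs an `Early Stopping Count`, the number of consecutive epochs without improvement. This section does not show the criterion the notebook uses. The sketch below is therefore only a generic patience counter. It assumes a single higher-is-better validation metric such as F1. The class name `EarlyStopping` and its `patience` and `min_delta` parameters are hypothetical.

```python
class EarlyStopping:
    """Generic patience counter (illustrative, not the notebook's code).

    Assumption: one validation metric is monitored and higher is better.
    """

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience    # epochs without improvement before stopping
        self.min_delta = min_delta  # minimum gain that counts as improvement
        self.best = None            # best metric value seen so far
        self.count = 0              # consecutive epochs without improvement

    def step(self, metric: float) -> bool:
        """Record one epoch's metric; return True when training should stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.count = 0
        else:
            self.count += 1
        return self.count >= self.patience


# Rounded validation F1 per epoch from Experiment 7 (LR 0.001) above.
es = EarlyStopping(patience=3)
for f1 in [0.6806, 0.7067, 0.7352, 0.7621, 0.7702, 0.7646,
           0.7668, 0.8013, 0.7807, 0.7962, 0.7928]:
    stop = es.step(f1)
    print(es.count, stop)
```

On these values the counter yields 0, 0, 0, 0, 0, 1, 2, 0, 1, 2, 3, which agrees with the counts logged for that run. It does not reproduce every run. In epoch 6 of the LR 0.0001 run, the logged count stays at 0 although none of the printed validation metrics beat its best so far. Treat the sketch as an illustration only.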





In [None]:
display_results(all_results)


SUMMARY OF RESULTS

ADAPTER FINE-TUNING RESULTS:
------------------------------------------------------------
Method   Epochs LR       BS   Val CE   Val Acc  Val F1   Test Acc Test F1 
------------------------------------------------------------
adapter  6      0.001000 16   0.6203   0.7935   0.7835   0.8375   0.8288  
adapter  11     0.001000 16   0.6339   0.8152   0.8013   0.8055   0.8034  
adapter  9      0.001000 16   0.6674   0.8007   0.7816   0.8032   0.8018  
adapter  9      0.000100 16   0.7514   0.7428   0.7248   0.7529   0.7400  
adapter  11     0.000100 16   0.7445   0.7319   0.7220   0.7391   0.7251  
adapter  11     0.000050 16   0.7881   0.7174   0.6939   0.7094   0.6963  
adapter  6      0.000100 16   0.7811   0.7355   0.7117   0.6865   0.6728  
adapter  9      0.000050 16   0.8046   0.6884   0.6612   0.6613   0.6506  
adapter  11     0.000030 16   0.8283   0.6667   0.6326   0.6339   0.6215  
adapter  6      0.000050 16   0.8395   0.6486   0.6062   0.6362   0.6205  

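The summary rows above appear to be ordered by test F1. Choosing hyperparameters by test scores lets the test set leak into model selection. A common alternative is to select on validation metrics only and then report the test score of that choice. Below is a minimal sketch over the ten printed rows; the tail of the table is truncated, so it is left out. `Result` and `select` are illustrative names, not notebook functions.

```python
from operator import attrgetter
from typing import NamedTuple


class Result(NamedTuple):
    epochs: int
    lr: float
    val_ce: float
    val_acc: float
    val_f1: float
    test_acc: float
    test_f1: float


# Rows copied from the adapter summary table (batch size 16 throughout).
ROWS = [
    Result(6, 1e-3, 0.6203, 0.7935, 0.7835, 0.8375, 0.8288),
    Result(11, 1e-3, 0.6339, 0.8152, 0.8013, 0.8055, 0.8034),
    Result(9, 1e-3, 0.6674, 0.8007, 0.7816, 0.8032, 0.8018),
    Result(9, 1e-4, 0.7514, 0.7428, 0.7248, 0.7529, 0.7400),
    Result(11, 1e-4, 0.7445, 0.7319, 0.7220, 0.7391, 0.7251),
    Result(11, 5e-5, 0.7881, 0.7174, 0.6939, 0.7094, 0.6963),
    Result(6, 1e-4, 0.7811, 0.7355, 0.7117, 0.6865, 0.6728),
    Result(9, 5e-5, 0.8046, 0.6884, 0.6612, 0.6613, 0.6506),
    Result(11, 3e-5, 0.8283, 0.6667, 0.6326, 0.6339, 0.6215),
    Result(6, 5e-5, 0.8395, 0.6486, 0.6062, 0.6362, 0.6205),
]


def select(rows, metric="val_f1"):
    """Choose a configuration using a validation metric only."""
    if metric == "val_ce":  # lower cross-entropy is better
        return min(rows, key=attrgetter("val_ce"))
    if metric in ("val_f1", "val_acc"):  # higher is better
        return max(rows, key=attrgetter(metric))
    raise ValueError(f"not a validation metric: {metric}")


best_f1 = select(ROWS, "val_f1")
best_ce = select(ROWS, "val_ce")
print(best_f1.epochs, best_f1.lr, best_f1.test_f1)
print(best_ce.epochs, best_ce.lr, best_ce.test_f1)
```

Both validation criteria keep LR 0.001. They disagree only on the epoch count: validation F1 picks 11 epochs and validation cross-entropy picks 6.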
## FinBERT training with the best hyperparameters

FinBERT full fine-tuning

In [None]:
model_name = "yiyanghkust/finbert-pretrain"
# Full fine-tuning of FinBERT: batch size 16, learning rate 5e-5, 4 epochs
results = train_and_evaluate(
    model_name=model_name,
    fine_tuning_method="full",
    batch_sizes=[16],
    learning_rates=[5e-5],
    num_epochs=4,
)


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302

== Experiment 1 of 1:
=> Learning Rate: 5e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at yiyanghkust/finbert-pretrain and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 109,165,061 / 109,165,061 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:34<00:00,  1.71it/s]


train Loss: 0.8364
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.29it/s]


val Loss: 0.5610
== Val Cross Entropy:  0.5629416272558015
== Val Accuracy:  0.7486910994764397
== Val F1:  0.7099555285716393
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:34<00:00,  1.71it/s]


train Loss: 0.4014
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.39it/s]


val Loss: 0.4635
== Val Cross Entropy:  0.4731697301412451
== Val Accuracy:  0.7696335078534031
== Val F1:  0.7603197666191093
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:34<00:00,  1.72it/s]


train Loss: 0.1522
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.35it/s]


val Loss: 0.4445
== Val Cross Entropy:  0.4528508150372012
== Val Accuracy:  0.7879581151832461
== Val F1:  0.7726047804663787
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:34<00:00,  1.71it/s]


train Loss: 0.0836
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.35it/s]


val Loss: 0.3986
== Val Cross Entropy:  0.410409152507782
== Val Accuracy:  0.8219895287958116
== Val F1:  0.809085655180519
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:02<00:00,  6.45it/s]


 Test Results — LR: 5e-05, BS: 16
 Test Accuracy: 0.8249
 Test F1 Score: 0.8249
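Each run prints a class-weight tensor that gives the four entity tags 0.25 and `O` a weight of 0. This section does not show the loss code. Assuming the tensor is passed as the per-class `weight` of a cross-entropy loss, tokens labelled `O` drop out of the loss entirely. The sketch below spells out that arithmetic in plain Python. It normalises by the summed weights of the targets, which is how `torch.nn.CrossEntropyLoss` computes its weighted mean. The helper names are hypothetical.

```python
import math

LABELS = {'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
WEIGHTS = [0.25, 0.25, 0.25, 0.25, 0.0]  # class weights as printed in the log


def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def weighted_ce(batch_logits, targets, weights):
    """Weighted mean cross-entropy: sum(w_y * nll) / sum(w_y)."""
    num, den = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        w = weights[y]
        num += -w * log_softmax(logits)[y]
        den += w
    if den == 0.0:
        # PyTorch would return nan here; raising makes the case explicit.
        raise ValueError("all targets have zero weight")
    return num / den


entity = [0.0, 0.0, 2.0, 0.0, 0.0]   # one I-ORG token
other = [5.0, 0.0, 0.0, 0.0, -5.0]   # an 'O' token predicted badly
with_o = weighted_ce([entity, other, other],
                     [LABELS['I-ORG'], LABELS['O'], LABELS['O']], WEIGHTS)
without_o = weighted_ce([entity], [LABELS['I-ORG']], WEIGHTS)
print(with_o, without_o)
```

Both losses are equal. With equal entity weights, the weighted mean reduces to an ordinary mean over entity tokens. The zero weight on `O` therefore acts like masking those tokens, similar in effect to giving them the loss's `ignore_index` label.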





FinBERT LoRA fine-tuning

In [None]:
model_name = "yiyanghkust/finbert-pretrain"
# LoRA fine-tuning of FinBERT: batch size 16, learning rate 1e-3, 5 epochs
results = train_and_evaluate(
    model_name=model_name,
    fine_tuning_method="lora",
    batch_sizes=[16],
    learning_rates=[1e-3],
    num_epochs=5,
)

{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302

== Experiment 1 of 1:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at yiyanghkust/finbert-pretrain and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 110,053,642 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:28<00:00,  2.11it/s]


train Loss: 0.9717
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.99it/s]


val Loss: 0.6058
== Val Cross Entropy:  0.6128816923190807
== Val Accuracy:  0.6465968586387435
== Val F1:  0.6297402355013308
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:28<00:00,  2.10it/s]


train Loss: 0.5231
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.01it/s]


val Loss: 0.5980
== Val Cross Entropy:  0.5988265039591953
== Val Accuracy:  0.7356020942408377
== Val F1:  0.6756626552204498
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:28<00:00,  2.10it/s]


train Loss: 0.3482
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.00it/s]


val Loss: 0.5441
== Val Cross Entropy:  0.5544698012286219
== Val Accuracy:  0.7146596858638743
== Val F1:  0.6969395016637293
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:28<00:00,  2.11it/s]


train Loss: 0.2197
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.00it/s]


val Loss: 0.4434
== Val Cross Entropy:  0.45499100253499786
== Val Accuracy:  0.7774869109947644
== Val F1:  0.7613258053676607
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:28<00:00,  2.10it/s]


train Loss: 0.1705
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.99it/s]


val Loss: 0.4609
== Val Cross Entropy:  0.47243820028058414
== Val Accuracy:  0.7670157068062827
== Val F1:  0.7496477070068256
== Early Stopping Count:  1


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.83it/s]


 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.7843
 Test F1 Score: 0.7776
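Each run prints the class-weight tensor `[0.25, 0.25, 0.25, 0.25, 0.0]`. Under the printed label map, this gives the four entity tags equal weight and the `O` tag (index 4) zero weight. If the tensor is passed as the `weight` of a cross-entropy loss (an assumption, since the training code is not shown in this section), tokens labelled `O` contribute nothing to the loss. The pure-Python sketch below mirrors the weighted-mean reduction of PyTorch's `nn.CrossEntropyLoss(weight=..., reduction="mean")`: each token's negative log-likelihood is scaled by the weight of its target class, and the sum is divided by the sum of those weights. The function name is hypothetical.

```python
import math

LABELS = {"I-LOC": 0, "I-MISC": 1, "I-ORG": 2, "I-PER": 3, "O": 4}  # printed label map
WEIGHTS = [0.25, 0.25, 0.25, 0.25, 0.0]  # printed class-weight tensor


def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(batch_logits, targets, weights):
    """Weighted mean NLL: sum(w[y] * -log p[y]) / sum(w[y]) over all tokens."""
    num, den = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        w = weights[y]
        num += -w * log_softmax(logits)[y]
        den += w
    if den == 0.0:
        # PyTorch would return nan here; raising makes the problem explicit.
        raise ValueError("every target has zero weight; the loss is undefined")
    return num / den


# Two entity tokens plus one 'O' token whose prediction is badly wrong.
logits = [[2.0, 0.0, 0.0, 0.0, 0.0],   # target I-LOC
          [0.0, 0.0, 0.0, 2.0, 0.0],   # target I-PER
          [5.0, 0.0, 0.0, 0.0, 0.0]]   # target O, predicted I-LOC
targets = [LABELS["I-LOC"], LABELS["I-PER"], LABELS["O"]]

with_o = weighted_cross_entropy(logits, targets, WEIGHTS)
without_o = weighted_cross_entropy(logits[:2], targets[:2], WEIGHTS)
print(with_o == without_o)  # True: the 'O' token does not change the loss
```

A side effect of this weighting is that the loss gives the model no direct signal for predicting `O`, which may be worth keeping in mind when reading the accuracy and F1 values above.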





FinBert Adapter fine-tuning

In [None]:
model_name = "yiyanghkust/finbert-pretrain"
# Adapter fine-tuning of FinBert: one run with batch size 16, learning rate 1e-3, 11 epochs
results = train_and_evaluate(
    model_name=model_name,
    fine_tuning_method="adapter",
    batch_sizes=[16],
    learning_rates=[1e-3],
    num_epochs=11,
)

{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302

== Experiment 1 of 1:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at yiyanghkust/finbert-pretrain and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,954,117 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:26<00:00,  2.20it/s]


train Loss: 1.0305
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.97it/s]


val Loss: 0.6038
== Val Cross Entropy:  0.6102639426445139
== Val Accuracy:  0.675392670157068
== Val F1:  0.6283329382583721
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:26<00:00,  2.21it/s]


train Loss: 0.6078
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 0.6811
== Val Cross Entropy:  0.690905238020009
== Val Accuracy:  0.643979057591623
== Val F1:  0.6464422766232297
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:26<00:00,  2.21it/s]


train Loss: 0.3629
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 0.5188
== Val Cross Entropy:  0.5236883004163874
== Val Accuracy:  0.7696335078534031
== Val F1:  0.7562196915587607
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:26<00:00,  2.21it/s]


train Loss: 0.2512
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.4927
== Val Cross Entropy:  0.4973817337946645
== Val Accuracy:  0.7905759162303665
== Val F1:  0.7807530354188315
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:26<00:00,  2.21it/s]


train Loss: 0.1524
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.6129
== Val Cross Entropy:  0.6167719107249687
== Val Accuracy:  0.8219895287958116
== Val F1:  0.8096997365489877
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:26<00:00,  2.21it/s]


train Loss: 0.0914
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.95it/s]


val Loss: 0.6000
== Val Cross Entropy:  0.6059009766141916
== Val Accuracy:  0.7722513089005235
== Val F1:  0.7747560368089776
== Early Stopping Count:  1

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:26<00:00,  2.21it/s]


train Loss: 0.0814
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.5653
== Val Cross Entropy:  0.5804427384302534
== Val Accuracy:  0.8167539267015707
== Val F1:  0.8253634192238704
== Early Stopping Count:  0

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:26<00:00,  2.21it/s]


train Loss: 0.0539
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.5903
== Val Cross Entropy:  0.6092284959580364
== Val Accuracy:  0.8219895287958116
== Val F1:  0.8234346754431381
== Early Stopping Count:  1

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:26<00:00,  2.21it/s]


train Loss: 0.0208
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.01it/s]


val Loss: 0.6085
== Val Cross Entropy:  0.6243386058586424
== Val Accuracy:  0.8219895287958116
== Val F1:  0.8231979066284334
== Early Stopping Count:  2

=== Epoch 10
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:26<00:00,  2.21it/s]


train Loss: 0.0125
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.01it/s]


val Loss: 0.6606
== Val Cross Entropy:  0.6828885075218719
== Val Accuracy:  0.8350785340314136
== Val F1:  0.834224399943466
== Early Stopping Count:  0

=== Epoch 11
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:26<00:00,  2.21it/s]


train Loss: 0.0091
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.94it/s]


val Loss: 0.6483
== Val Cross Entropy:  0.669503383033363
== Val Accuracy:  0.837696335078534
== Val F1:  0.8378985083180615
== Early Stopping Count:  0


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.80it/s]


 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.7913
 Test F1 Score: 0.7926
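Both adapter runs report 1,792,901 trainable parameters. As a hypothetical consistency check (not the notebook's code), the sketch below assumes BERT-base dimensions (hidden size 768, 12 layers), bottleneck adapters that project 768 → 48 → 768 (reduction factor 16) with a bias on both projections, two adapters per layer (the Houlsby-style placement after the attention and feed-forward sublayers), and a trainable 5-way classifier. These assumptions reproduce the logged count exactly, but other adapter configurations could in principle give the same number.

```python
# Hypothetical check of the adapter runs' trainable-parameter count.
# All dimensions and the adapter layout are assumptions, not read from the notebook.
HIDDEN = 768      # BERT-base hidden size (assumed)
NUM_LAYERS = 12   # BERT-base encoder layers (assumed)
NUM_LABELS = 5    # I-LOC, I-MISC, I-ORG, I-PER, O
ADAPTERS_PER_LAYER = 2  # one after attention, one after the feed-forward block (assumed)


def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Number of parameters in a dense layer mapping d_in -> d_out."""
    return d_in * d_out + (d_out if bias else 0)


def bottleneck_adapter_params(hidden: int, reduction_factor: int) -> int:
    """Down-projection (hidden -> hidden/rf) plus up-projection back to hidden."""
    bottleneck = hidden // reduction_factor
    return linear_params(hidden, bottleneck) + linear_params(bottleneck, hidden)


per_adapter = bottleneck_adapter_params(HIDDEN, reduction_factor=16)  # 74,544
adapters = NUM_LAYERS * ADAPTERS_PER_LAYER * per_adapter               # 1,789,056
trainable = adapters + linear_params(HIDDEN, NUM_LABELS)               # + classifier
print(f"{trainable:,}")                   # 1,792,901
print(f"{trainable / 110_954_117:.2%}")  # 1.62%
```

For comparison, the LoRA runs train roughly half as many parameters (0.81% versus 1.62% of the model), yet this adapter run reached a slightly higher test F1 (0.7926 versus 0.7776).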





## FlangBert training with the best hyperparameters

FlangBert full fine-tuning

In [None]:
# FLANG-BERT checkpoint from the Hugging Face Hub; full fine-tuning trains all weights
model_name = "SALT-NLP/FLANG-BERT"
results = train_and_evaluate(
    model_name=model_name,
    fine_tuning_method="full",
    batch_sizes=[16],
    learning_rates=[5e-5],
    num_epochs=4,
)

{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302

== Experiment 1 of 1:
=> Learning Rate: 5e-05, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 108,895,493 / 108,895,493 (100.00%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.85it/s]


train Loss: 0.9069
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.07it/s]


val Loss: 0.7824
== Val Cross Entropy:  0.7853617698981844
== Val Accuracy:  0.717391304347826
== Val F1:  0.6439098939439187
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.85it/s]


train Loss: 0.5230
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.7801
== Val Cross Entropy:  0.7804629021677477
== Val Accuracy:  0.6557971014492754
== Val F1:  0.6668611852089598
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.87it/s]


train Loss: 0.2680
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.23it/s]


val Loss: 0.7705
== Val Cross Entropy:  0.7839380149440519
== Val Accuracy:  0.7717391304347826
== Val F1:  0.7569016204314922
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:31<00:00,  1.86it/s]


train Loss: 0.1310
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:02<00:00,  5.16it/s]


val Loss: 0.8225
== Val Cross Entropy:  0.8387456948644129
== Val Accuracy:  0.7753623188405797
== Val F1:  0.762172719219965
== Early Stopping Count:  1


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.46it/s]


 Test Results — LR: 5e-05, BS: 16
 Test Accuracy: 0.8261
 Test F1 Score: 0.8191
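The FlangBert runs warn that at least one tokenized example is 687 tokens long, beyond the model's maximum of 512. This section does not show whether the notebook truncates such examples later. One common way to keep every token for token classification is to split long sequences into windows that fit the limit. The sketch below is a hypothetical helper (not part of this notebook) that uses non-overlapping windows, wraps each one in `[CLS]`/`[SEP]`, and labels the special tokens `-100`, the default `ignore_index` of PyTorch's `nn.CrossEntropyLoss`. The `cls_id`/`sep_id` values in the example are placeholders. In practice, they would come from `tokenizer.cls_token_id` and `tokenizer.sep_token_id`.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def split_into_windows(
    token_ids: List[int],
    label_ids: List[int],
    cls_id: int,
    sep_id: int,
    max_len: int = 512,
) -> List[Tuple[List[int], List[int]]]:
    """Split one aligned (tokens, labels) sequence into windows of at most max_len.

    Each window is wrapped as [CLS] ... [SEP]; the two special tokens get
    IGNORE_INDEX labels so they do not contribute to the loss.
    """
    if len(token_ids) != len(label_ids):
        raise ValueError("token_ids and label_ids must have the same length")
    body = max_len - 2  # room left after [CLS] and [SEP]
    windows = []
    for start in range(0, len(token_ids), body):
        ids = [cls_id] + token_ids[start:start + body] + [sep_id]
        labels = [IGNORE_INDEX] + label_ids[start:start + body] + [IGNORE_INDEX]
        windows.append((ids, labels))
    return windows


# Example: a 687-token sequence with placeholder ids and all-'O' labels (index 4).
tokens = list(range(1000, 1687))
labels = [4] * len(tokens)
windows = split_into_windows(tokens, labels, cls_id=101, sep_id=102)
print([len(ids) for ids, _ in windows])  # [512, 179]
```

Non-overlapping windows can cut an entity at a boundary. Overlapping windows (a stride smaller than `body`) reduce that risk, but then predictions for the duplicated tokens have to be merged.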





FlangBert Lora fine-tuning

In [None]:
model_name = "SALT-NLP/FLANG-BERT"
# LoRA fine-tuning of FlangBert: one run with batch size 16, learning rate 1e-3, 5 epochs
results = train_and_evaluate(
    model_name=model_name,
    fine_tuning_method="lora",
    batch_sizes=[16],
    learning_rates=[1e-3],
    num_epochs=5,
)

{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302

== Experiment 1 of 1:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 888,581 / 109,784,074 (0.81%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 1.0497
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.7637
== Val Cross Entropy:  0.7658754628280113
== Val Accuracy:  0.6884057971014492
== Val F1:  0.6717744605763484
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 0.6413
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.93it/s]


val Loss: 0.7238
== Val Cross Entropy:  0.7250643788740553
== Val Accuracy:  0.7282608695652174
== Val F1:  0.7134472671860358
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.4962
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.92it/s]


val Loss: 0.6989
== Val Cross Entropy:  0.7055339854339073
== Val Accuracy:  0.717391304347826
== Val F1:  0.7039686260065963
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.32it/s]


train Loss: 0.3420
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.91it/s]


val Loss: 0.7333
== Val Cross Entropy:  0.7415021925136961
== Val Accuracy:  0.7536231884057971
== Val F1:  0.751186768874516
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:25<00:00,  2.31it/s]


train Loss: 0.2784
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.7470
== Val Cross Entropy:  0.7537680992792393
== Val Accuracy:  0.7572463768115942
== Val F1:  0.756951381951382
== Early Stopping Count:  1


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.05it/s]


 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.8124
 Test F1 Score: 0.8061
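Both LoRA runs report 888,581 trainable parameters. The sketch below is a hypothetical back-of-the-envelope check, not code from this notebook. It assumes BERT-base dimensions (hidden size 768, 12 layers), LoRA factors A (r × d_in) and B (d_out × r) without bias on 768 × 768 attention projections, and a trainable 5-way classifier. Under these assumptions, two configurations give exactly 888,581: rank 16 on query, key and value, or rank 24 on query and value only. The count alone cannot tell them apart.

```python
# Hypothetical sanity check of the logged LoRA trainable-parameter counts.
# BERT-base dimensions are assumed, not read from the notebook's model config.
HIDDEN = 768      # hidden size (assumed)
NUM_LAYERS = 12   # encoder layers (assumed)
NUM_LABELS = 5    # I-LOC, I-MISC, I-ORG, I-PER, O (from the printed label map)


def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Number of parameters in a dense layer mapping d_in -> d_out."""
    return d_in * d_out + (d_out if bias else 0)


def lora_params(rank: int, modules_per_layer: int, d_in: int = HIDDEN,
                d_out: int = HIDDEN, layers: int = NUM_LAYERS) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank), without bias, per adapted layer."""
    return layers * modules_per_layer * rank * (d_in + d_out)


head = linear_params(HIDDEN, NUM_LABELS)                    # 3,845
qkv_r16 = lora_params(rank=16, modules_per_layer=3) + head  # query, key, value
qv_r24 = lora_params(rank=24, modules_per_layer=2) + head   # query, value only
print(f"{qkv_r16:,} {qv_r24:,}")  # 888,581 888,581

# Denominator check against the FlangBert full fine-tuning log above.
full_model_flang = 108_895_493
print(full_model_flang + qkv_r16 == 109_784_074)  # True
print(f"{qkv_r16 / 109_784_074:.2%}")              # 0.81%
```

The last check suggests that the FlangBert LoRA denominator (109,784,074) is the full model plus all trainable parameters, so the classifier is counted twice. That would happen if the head were trained as a separate copy alongside the frozen original, as PEFT's `modules_to_save` option does. This is an inference from the numbers, not something the log states.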





FlangBert Adapter fine-tuning

In [None]:
model_name = "SALT-NLP/FLANG-BERT"
# Adapter fine-tuning of FlangBert: one run with batch size 16, learning rate 1e-3, 11 epochs
results = train_and_evaluate(
    model_name=model_name,
    fine_tuning_method="adapter",
    batch_sizes=[16],
    learning_rates=[1e-3],
    num_epochs=11,
)

{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}


Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors


{'I-LOC': 0, 'I-MISC': 1, 'I-ORG': 2, 'I-PER': 3, 'O': 4}
tensor([0.2500, 0.2500, 0.2500, 0.2500, 0.0000], device='cuda:0')
length_dataset 932 232 302

== Experiment 1 of 1:
=> Learning Rate: 0.001, Batch Size: 16


Some weights of BertForTokenClassification were not initialized from the model checkpoint at SALT-NLP/FLANG-BERT and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable parameters: 1,792,901 / 110,684,549 (1.62%)

=== Epoch 1
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 1.1204
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.89it/s]


val Loss: 0.8128
== Val Cross Entropy:  0.8181525982659439
== Val Accuracy:  0.6992753623188406
== Val F1:  0.6817657916324856
== Early Stopping Count:  0

=== Epoch 2
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 0.7463
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.85it/s]


val Loss: 0.7342
== Val Cross Entropy:  0.7360247327335949
== Val Accuracy:  0.7065217391304348
== Val F1:  0.6962583458371767
== Early Stopping Count:  0

=== Epoch 3
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.5547
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.86it/s]


val Loss: 0.7555
== Val Cross Entropy:  0.7591403804976364
== Val Accuracy:  0.7318840579710145
== Val F1:  0.7291410791779251
== Early Stopping Count:  0

=== Epoch 4
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.4205
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.6836
== Val Cross Entropy:  0.6901358982612347
== Val Accuracy:  0.7681159420289855
== Val F1:  0.7435754553800795
== Early Stopping Count:  0

=== Epoch 5
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.2945
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.8457
== Val Cross Entropy:  0.8474752644141172
== Val Accuracy:  0.7789855072463768
== Val F1:  0.7604780307642985
== Early Stopping Count:  0

=== Epoch 6
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.1952
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.85it/s]


val Loss: 0.7444
== Val Cross Entropy:  0.7609310913188704
== Val Accuracy:  0.7644927536231884
== Val F1:  0.76032960769316
== Early Stopping Count:  1

=== Epoch 7
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.1365
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.9250
== Val Cross Entropy:  0.9534725071541195
== Val Accuracy:  0.75
== Val F1:  0.7434710414527633
== Early Stopping Count:  2

=== Epoch 8
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.0739
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.88it/s]


val Loss: 0.9717
== Val Cross Entropy:  1.004836787393413
== Val Accuracy:  0.7898550724637681
== Val F1:  0.7744832760472502
== Early Stopping Count:  0

=== Epoch 9
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.0588
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.9596
== Val Cross Entropy:  0.9904050153262657
== Val Accuracy:  0.7463768115942029
== Val F1:  0.740549269254713
== Early Stopping Count:  1

=== Epoch 10
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.42it/s]


train Loss: 0.0212
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.90it/s]


val Loss: 0.9717
== Val Cross Entropy:  1.0043763648718596
== Val Accuracy:  0.7681159420289855
== Val F1:  0.7584385542083019
== Early Stopping Count:  2

=== Epoch 11
-- TRAIN Phase


Train Loop: 100%|██████████| 59/59 [00:24<00:00,  2.41it/s]


train Loss: 0.0145
-- VAL Phase


Val Loop: 100%|██████████| 15/15 [00:03<00:00,  4.87it/s]


val Loss: 0.9960
== Val Cross Entropy:  1.029274794305193
== Val Accuracy:  0.7717391304347826
== Val F1:  0.7626472273689168
== Early Stopping Count:  3


Testing: 100%|██████████| 19/19 [00:03<00:00,  5.00it/s]


 Test Results — LR: 0.001, BS: 16
 Test Accuracy: 0.8032
 Test F1 Score: 0.7996
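In both adapter runs, the `Early Stopping Count` values are consistent with a patience counter on validation F1. The count resets to 0 whenever F1 beats the best value so far and otherwise increases by one. Validation loss does not explain these counts: the count stays at 0 in epochs where the loss rises (for example, FlangBert adapter epochs 5 and 8). This rule is inferred from the printed numbers only. The notebook's actual criterion is not shown here, and the FlangBert full and LoRA runs do not follow it exactly. Training also continued for all 11 epochs after a count of 3, so any stopping threshold must be higher than that. The sketch below replays the logged F1 values, rounded to four decimals, through such a counter. The function name is hypothetical.

```python
from typing import List


def patience_counts(scores: List[float]) -> List[int]:
    """Counter value after each epoch for a 'higher is better' metric.

    The counter resets to 0 when a score strictly beats the best so far,
    otherwise it increments by 1.
    """
    best = float("-inf")
    count = 0
    counts = []
    for score in scores:
        if score > best:
            best, count = score, 0
        else:
            count += 1
        counts.append(count)
    return counts


# Validation F1 per epoch, copied from the two adapter logs above (rounded).
finbert_adapter_f1 = [0.6283, 0.6464, 0.7562, 0.7808, 0.8097, 0.7748,
                      0.8254, 0.8234, 0.8232, 0.8342, 0.8379]
flang_adapter_f1 = [0.6818, 0.6963, 0.7291, 0.7436, 0.7605, 0.7603,
                    0.7435, 0.7745, 0.7405, 0.7584, 0.7626]
print(patience_counts(finbert_adapter_f1))  # [0, 0, 0, 0, 0, 1, 0, 1, 2, 0, 0]
print(patience_counts(flang_adapter_f1))    # [0, 0, 0, 0, 0, 1, 2, 0, 1, 2, 3]
```

Both outputs match the `Early Stopping Count` lines printed in the corresponding adapter runs.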



