# Router Quality Classifier Training (Part Three)

Cost as a prompt property could be the only thing that may matter in many cases. Majority of the LLMs can generate a good response for a lot of promots, this can be see as many of the benchmarks are becoming obsolete. Users don't always have complex queries for LLMs to answer. Going from GPT-4o to GPT-5 many users are not seeing much improvement, other than tech/AI influencers' praises for the model. Mainly because only a small percentage of queries might need responses from a strong model. This is where cost can be optimised. In this part, we train to predict if the query would need a strong model to query or not.


From our (extremely) small dataset we see that the trends holds true, `meta-llama/llama-3.1-8b-instruct` is capable of answering majority (~66%) of the queries. But this is under the assumption that the model `openai/gpt-oss-120b` is a good judge, this can be challenged and researched further.

This is what the trend in our data looks like:

```
=== RATING DISTRIBUTION ===
Rating 1: 37 (2.1%)
Rating 2: 107 (6.1%)
Rating 3: 86 (4.9%)
Rating 4: 329 (18.7%)
Rating 5: 1197 (68.2%)
========================================
```

Everything below rating 5 can be considered as something that should be addressed by a strong model to keep quality high.  

Regarding model selection for this task, `openai/gpt-oss-120b` stands way ahead of our weak model i.e. `meta-llama/llama-3.1-8b-instruct` in benchmarks but the weak model costs substantially less. Hence for the sake of setting an example, these models align to the framework set by the paper RouteLLM.

We evaluate the F1 score of the optimised models here instead of the accuracy as there is a clear imbalance in the dataset.

Selection of model to train, since we have barely any datapoints to actually make learning happen we opt for a model with not many parameters which also gives us speed as bigger models can be resouce heavy. In a perfect world with hundreds of thousands of data points (like in the `routellm/gpt4_dataset` dataset and better with availability of better GPUs, we'd opt for a newer and larger embedding model to finetune or follow what the routeLLM github demonstrates i.e. finetuning a Llama 3.1 8B LLM on Anyscale's infrastructure.

The tl;dr outcome of this notebook is that this method is inappropriate for a tiny dataset, hence the overfit and low f1 score.

All the plots can recreated on your W&B account, make sure you are loggedin when you run this notebook.

*Note:* A response generated that is of good quality should be routed to a weak model (label class 0) and a response that is of bad quality should be routed to a strong model (label class 1). We may be refering to our classes as good quality/weak model and bad quality/strong model, interchangebly in this notebook.

## 1. Import Dependencies

In [2]:
import time
import torch
import os
import json
from torch import nn
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModel, set_seed
from sklearn.model_selection import train_test_split
import random
import numpy as np
import wandb
from tqdm import tqdm
from torch.optim.lr_scheduler import LinearLR, ReduceLROnPlateau
from sklearn.metrics import f1_score
import pandas as pd
from typing import Dict, Any, Tuple, Optional
import warnings
warnings.filterwarnings('ignore')

print("Dependencies imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name()}")

Dependencies imported successfully!
PyTorch version: 2.5.1+cu121
CUDA available: True
CUDA device: NVIDIA GeForce RTX 3060


## 2. Utility Functions and Classes

In [3]:
def set_reproducible_environment(seed: int = 9999) -> None:
    """Set up reproducible environment with fixed random seeds."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        
    # Set transformers seed
    set_seed(seed)
    
    print(f"Reproducible environment set with seed: {seed}")

In [4]:
def load_rated_dataset(file_path: str) -> list:
    """Load the rated dataset from JSONL file."""
    data = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            try:
                item = json.loads(line.strip())
                if 'rating' in item and item['rating'] is not None:
                    data.append(item)
            except json.JSONDecodeError:
                continue
    return data

def analyze_dataset(df: pd.DataFrame, rating_threshold: int = 4) -> None:
    """Analyze and print dataset statistics."""
    total_samples = len(df)
    labels_array = df["quality_label"].values
    label_0_count = np.sum(labels_array == 0)
    label_1_count = np.sum(labels_array == 1)
    
    print("\n=== DATASET ANALYSIS ===")
    print(f"Total samples: {total_samples}")
    print(f"Label 0 (Good quality, rating >= {rating_threshold}): {label_0_count} ({label_0_count/total_samples*100:.1f}%)")
    print(f"Label 1 (Poor quality, rating < {rating_threshold}): {label_1_count} ({label_1_count/total_samples*100:.1f}%)")
    if label_1_count > 0:
        print(f"Class balance ratio (0:1): {label_0_count/label_1_count:.2f}:1")
    
    print("\n=== RATING DISTRIBUTION ===")
    rating_counts = df['rating'].value_counts().sort_index()
    for rating, count in rating_counts.items():
        print(f"Rating {rating}: {count} ({count/total_samples*100:.1f}%)")
    print("=" * 40)

In [5]:
class RatedDataset(Dataset):
    """Custom PyTorch dataset for rated response data."""
    
    def __init__(self, sentences: list, labels: list):
        self.sentences = sentences
        self.labels = labels

    def __len__(self) -> int:
        return len(self.sentences)

    def __getitem__(self, idx: int) -> Tuple[str, torch.Tensor]:
        sentence = self.sentences[idx]
        label = self.labels[idx]
        return sentence, torch.tensor(label, dtype=torch.float)

def worker_init_fn(worker_id: int, seed: int = 9999) -> None:
    """Ensure deterministic behavior in DataLoader workers."""
    np.random.seed(seed + worker_id)
    random.seed(seed + worker_id)

In [6]:
class CompactQualityClassifier(nn.Module):
    """Compact transformer-based quality classifier optimized for limited data."""
    
    def __init__(self, transformer_model_name: str, dropout_rate: float = 0.2, 
                 freeze_transformer: bool = False):
        super(CompactQualityClassifier, self).__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(transformer_model_name)
        self.transformer = AutoModel.from_pretrained(transformer_model_name)
        
        # Freeze transformer parameters if specified
        if freeze_transformer:
            for param in self.transformer.parameters():
                param.requires_grad = False
            print("Transformer layers frozen - only training classification head")
        else:
            print("All transformer layers will be fine-tuned")
        
        # Add dropout for regularization
        self.dropout = nn.Dropout(dropout_rate)
        
        # Simplified architecture for smaller dataset
        hidden_size = self.transformer.config.hidden_size
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(64, 1)
        )

    def forward(self, sentences: list) -> torch.Tensor:
        # Tokenize and encode as batch
        encoding = self.tokenizer(
            sentences, 
            padding=True, 
            truncation=True, 
            max_length=512,
            return_tensors="pt"
        ).to(next(self.transformer.parameters()).device)
        
        outputs = self.transformer(**encoding)
        # Use the [CLS] token embedding for classification
        pooled = outputs.last_hidden_state[:, 0, :]
        
        # Apply dropout and classification layers
        x = self.dropout(pooled)
        logits = self.classifier(x)
        
        return logits
    
    def count_parameters(self) -> Tuple[int, int]:
        """Count total and trainable parameters."""
        total_params = sum(p.numel() for p in self.parameters())
        trainable_params = sum(p.numel() for p in self.parameters() if p.requires_grad)
        
        print(f"\n=== MODEL PARAMETER ANALYSIS ===")
        print(f"Total parameters: {total_params:,}")
        print(f"Trainable parameters: {trainable_params:,}")
        print(f"Frozen parameters: {total_params - trainable_params:,}")
        print(f"Percentage trainable: {trainable_params/total_params*100:.1f}%")
        print("=" * 40)
        
        return total_params, trainable_params

## 3. Training and Validation Functions

In [7]:
def get_metric_for_early_stopping(metric_name: str, loss: float, accuracy: float, f1: float) -> float:
    """Get the metric value based on the early stopping metric name."""
    if metric_name == "loss":
        return loss
    elif metric_name == "accuracy":
        return accuracy
    elif metric_name == "f1":
        return f1
    else:
        raise ValueError(f"Unknown metric: {metric_name}. Choose from 'loss', 'accuracy', 'f1'")

def is_metric_improved(metric_name: str, current_value: float, best_value: float) -> bool:
    """Check if the metric has improved (lower is better for loss, higher is better for accuracy/f1)."""
    if metric_name == "loss":
        return current_value < best_value
    else:  # accuracy or f1
        return current_value > best_value

def validate(model: nn.Module, val_loader: DataLoader, criterion: nn.Module, 
            device: torch.device) -> Tuple[float, float, float]:
    """Perform validation and return the loss, accuracy, and F1 score."""
    model.eval()
    valid_loss = 0.0
    valid_correct = 0
    all_predictions = []
    all_labels = []
    
    with torch.no_grad():
        for sentences, labels in val_loader:
            sentences = list(sentences)
            labels = labels.to(device)
            
            # Forward pass
            outputs = model(sentences).squeeze(1)
            
            # Compute loss
            loss = criterion(outputs, labels)
            valid_loss += loss.item()
            
            # Calculate accuracy and collect predictions for F1
            predictions = torch.round(torch.sigmoid(outputs))
            valid_correct += (predictions == labels).sum().item()
            
            # Store for F1 calculation
            all_predictions.extend(predictions.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    valid_loss /= len(val_loader)
    valid_accuracy = valid_correct / len(val_loader.dataset)
    
    # Handle case where all predictions are the same class
    try:
        valid_f1 = f1_score(all_labels, all_predictions, average='binary')
    except:
        valid_f1 = 0.0
    
    return valid_loss, valid_accuracy, valid_f1

## 4. Main Training Function

In [8]:
def train_router_quality_classifier(
    dataset_path: str = "rated_enhanced_dataset_meta_llama_llama_3.1_8b_instruct_20250811_140130_20250811_151904.jsonl",
    # Model hyperparameters
    transformer_model_name: str = 'distilbert-base-uncased',
    dropout_rate: float = 0.1,
    freeze_transformer: bool = False,
    
    # Training hyperparameters
    learning_rate: float = 0.0001,
    batch_size: int = 16,
    n_epochs: int = 10,
    weight_decay: float = 0.0001,
    gradient_clip_value: float = 1.0,
    
    # Data hyperparameters
    test_size: float = 0.2,
    dataset_fraction: float = 1.0,
    rating_threshold: int = 4,
    
    # Early stopping and scheduling
    early_stopping_metric: str = "f1",
    early_stopping_patience: int = 500,
    warmup_steps: int = 50,
    enable_lr_annealing: bool = True,
    lr_annealing_patience: int = 5,
    lr_annealing_factor: float = 0.5,
    lr_annealing_min_lr: float = 1e-6,
    
    # Experiment settings
    experiment_name: str = "router_training",
    seed: int = 9999,
    use_wandb: bool = True
) -> Dict[str, Any]:
    """
    Train a router quality classifier with comprehensive configuration options.
    
    Returns:
        Dictionary containing training results and best metrics
    """
    
    # Set up reproducible environment
    set_reproducible_environment(seed)
    
    # Generate run name
    timestamp = time.strftime('%Y%m%d_%H%M%S')
    run_name = f"{experiment_name}_{timestamp}" if experiment_name else f"run_{timestamp}"
    
    # Initialize W&B if enabled
    if use_wandb:
        wandb.init(project="router_rated", name=run_name)
    
    print(f"\nStarting training run: {run_name}")
    print(f"Using dataset: {dataset_path}")
    
    # Load and process dataset
    print("\nLoading rated dataset...")
    data_items = load_rated_dataset(dataset_path)
    print(f"Loaded {len(data_items)} items with ratings")
    
    # Convert to DataFrame and filter
    df = pd.DataFrame(data_items)
    df = df.dropna(subset=['rating'])
    print(f"Dataset after filtering: {len(df)} items")
    
    # Create binary labels
    df["quality_label"] = df["rating"].apply(lambda x: 0 if x >= rating_threshold else 1)
    
    # Analyze dataset
    analyze_dataset(df, rating_threshold)
    
    # Extract sentences and labels
    sentences = df["model_response"].tolist()
    labels = df["quality_label"].tolist()
    
    # Sample dataset if needed
    if dataset_fraction < 1.0:
        df_sampled = df.sample(frac=dataset_fraction, random_state=seed).reset_index(drop=True)
        sentences = df_sampled["model_response"].tolist()
        labels = df_sampled["quality_label"].tolist()
        print(f"Sampled {len(sentences)} items ({dataset_fraction*100}% of dataset)")
    
    # Split data
    sentences_train, sentences_val, labels_train, labels_val = train_test_split(
        sentences, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    
    print(f"\nTraining samples: {len(sentences_train)}")
    print(f"Validation samples: {len(sentences_val)}")
    
    # Create datasets and data loaders
    train_data = RatedDataset(sentences_train, labels_train)
    val_data = RatedDataset(sentences_val, labels_val)
    
    # Create generator for reproducible shuffling
    g = torch.Generator()
    g.manual_seed(seed)
    
    train_loader = DataLoader(
        train_data, 
        batch_size=batch_size, 
        shuffle=True,
        generator=g,
        worker_init_fn=lambda w: worker_init_fn(w, seed),
        num_workers=0
    )
    
    val_loader = DataLoader(
        val_data, 
        batch_size=batch_size, 
        shuffle=False,
        worker_init_fn=lambda w: worker_init_fn(w, seed),
        num_workers=0
    )
    
    print("DataLoaders created successfully")
    
    # Initialize model
    print(f"\nInitializing model: {transformer_model_name}")
    model = CompactQualityClassifier(
        transformer_model_name=transformer_model_name,
        dropout_rate=dropout_rate,
        freeze_transformer=freeze_transformer
    )
    
    # Deterministic weight initialization for classifier layers
    def init_weights_deterministic(module):
        if isinstance(module, nn.Linear):
            torch.manual_seed(seed)
            nn.init.xavier_uniform_(module.weight)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)
    
    model.classifier.apply(init_weights_deterministic)
    
    # Move to device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    print(f"📱 Model moved to device: {device}")
    
    # Count parameters
    total_params, trainable_params = model.count_parameters()
    
    # Initialize training components
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    
    # Learning rate scheduler for warmup
    scheduler = LinearLR(optimizer, start_factor=0.1, total_iters=warmup_steps)
    
    # Learning rate annealing scheduler
    lr_annealing_scheduler = None
    if enable_lr_annealing:
        lr_annealing_scheduler = ReduceLROnPlateau(
            optimizer, 
            mode='min',
            factor=lr_annealing_factor,
            patience=lr_annealing_patience,
            min_lr=lr_annealing_min_lr,
            verbose=True
        )
        print(f"LR Annealing enabled: factor={lr_annealing_factor}, patience={lr_annealing_patience}")
    
    # Create runs directory
    runs_dir = "runs"
    os.makedirs(runs_dir, exist_ok=True)
    
    # Initialize tracking variables
    best_valid_loss = float('inf')
    best_valid_accuracy = 0.0
    best_valid_f1 = 0.0
    best_metric_value = float('inf') if early_stopping_metric == "loss" else 0.0
    patience_counter = 0
    
    # Log hyperparameters to W&B
    if use_wandb:
        wandb.config = {
            "learning_rate": learning_rate,
            "epochs": n_epochs,
            "batch_size": batch_size,
            "transformer_model": transformer_model_name,
            "dataset_fraction": dataset_fraction,
            "test_size": test_size,
            "experiment_name": experiment_name,
            "weight_decay": weight_decay,
            "warmup_steps": warmup_steps,
            "early_stopping_patience": early_stopping_patience,
            "early_stopping_metric": early_stopping_metric,
            "enable_lr_annealing": enable_lr_annealing,
            "dropout_rate": dropout_rate,
            "freeze_transformer": freeze_transformer,
            "rating_threshold": rating_threshold,
            "total_parameters": total_params,
            "trainable_parameters": trainable_params,
            "dataset_size": len(sentences),
            "seed": seed,
            "device": str(device),
        }
    
    # Initial validation
    print("\nStarting initial validation...")
    initial_valid_loss, initial_valid_accuracy, initial_valid_f1 = validate(model, val_loader, criterion, device)
    print(f"Initial - Loss: {initial_valid_loss:.4f}, Accuracy: {initial_valid_accuracy:.4f}, F1: {initial_valid_f1:.4f}")
    
    if use_wandb:
        wandb.log({
            "epoch": 0,
            "valid_loss": initial_valid_loss,
            "valid_accuracy": initial_valid_accuracy,
            "valid_f1": initial_valid_f1,
        })
    
    # Training loop
    print(f"\nStarting training for {n_epochs} epochs...")
    global_step = 0
    
    def set_all_seeds(s):
        """Set all random seeds for maximum reproducibility."""
        random.seed(s)
        np.random.seed(s)
        torch.manual_seed(s)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(s)
    
    for epoch in range(n_epochs):
        # Re-set seeds for each epoch
        set_all_seeds(seed + epoch)
        
        # Training phase
        model.train()
        train_loss = 0.0
        train_correct = 0
        train_predictions = []
        train_labels_list = []
        
        progress_bar = tqdm(train_loader, desc=f"Epoch {epoch+1}/{n_epochs}")
        
        for batch_idx, (sentences, labels) in enumerate(progress_bar):
            sentences = list(sentences)
            labels = labels.to(device)
            
            # Clear gradients
            optimizer.zero_grad()
            
            # Forward pass
            outputs = model(sentences).squeeze(1)
            
            # Compute loss
            loss = criterion(outputs, labels)
            
            # Backward pass with gradient clipping
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), gradient_clip_value)
            optimizer.step()
            
            # Update learning rate scheduler for warmup period
            if global_step < warmup_steps:
                scheduler.step()
            
            global_step += 1
            
            # Track training metrics
            train_loss += loss.item()
            predictions = torch.round(torch.sigmoid(outputs))
            train_correct += (predictions == labels).sum().item()
            
            # Store for F1 calculation
            train_predictions.extend(predictions.detach().cpu().numpy())
            train_labels_list.extend(labels.detach().cpu().numpy())
            
            # Update progress bar
            progress_bar.set_postfix({
                'loss': f'{loss.item():.4f}',
                'lr': f'{optimizer.param_groups[0]["lr"]:.6f}'
            })
        
        # Calculate epoch metrics
        train_loss /= len(train_loader)
        train_accuracy = train_correct / len(train_loader.dataset)
        
        try:
            train_f1 = f1_score(train_labels_list, train_predictions, average='binary')
        except:
            train_f1 = 0.0
        
        # Validation
        valid_loss, valid_accuracy, valid_f1 = validate(model, val_loader, criterion, device)
        
        # Apply LR annealing
        if enable_lr_annealing and lr_annealing_scheduler is not None:
            lr_annealing_scheduler.step(train_loss)
        
        # Log metrics
        if use_wandb:
            wandb.log({
                "epoch": epoch + 1,
                "train_loss": train_loss,
                "train_accuracy": train_accuracy,
                "train_f1": train_f1,
                "valid_loss": valid_loss,
                "valid_accuracy": valid_accuracy,
                "valid_f1": valid_f1,
                "learning_rate": optimizer.param_groups[0]["lr"],
            })
        
        print(f'Epoch {epoch+1}/{n_epochs}:')
        print(f'  Training   - Loss: {train_loss:.4f}, Accuracy: {train_accuracy:.4f}, F1: {train_f1:.4f}')
        print(f'  Validation - Loss: {valid_loss:.4f}, Accuracy: {valid_accuracy:.4f}, F1: {valid_f1:.4f}')
        print(f'  Learning Rate: {optimizer.param_groups[0]["lr"]:.8f}')
        
        # Early stopping and model saving
        current_metric_value = get_metric_for_early_stopping(early_stopping_metric, valid_loss, valid_accuracy, valid_f1)
        
        if is_metric_improved(early_stopping_metric, current_metric_value, best_metric_value):
            best_metric_value = current_metric_value
            best_valid_loss = valid_loss
            best_valid_accuracy = valid_accuracy
            best_valid_f1 = valid_f1
            patience_counter = 0
            torch.save(model.state_dict(), os.path.join(runs_dir, f'best_model_{run_name}.pt'))
            print(f"New best model saved with {early_stopping_metric}: {current_metric_value:.4f}")
            
            if use_wandb:
                wandb.log({
                    "best_valid_loss": best_valid_loss,
                    "best_valid_accuracy": best_valid_accuracy,
                    "best_valid_f1": best_valid_f1,
                    f"best_{early_stopping_metric}": best_metric_value,
                })
        else:
            patience_counter += 1
            print(f"No improvement in {early_stopping_metric}. Patience: {patience_counter}/{early_stopping_patience}")
            
            if patience_counter >= early_stopping_patience:
                print(f'Early stopping triggered after {epoch+1} epochs (no improvement in {early_stopping_metric})')
                break
    
    # Final evaluation
    print("\nFinal model evaluation...")
    model.load_state_dict(torch.load(os.path.join(runs_dir, f'best_model_{run_name}.pt')))
    final_valid_loss, final_valid_accuracy, final_valid_f1 = validate(model, val_loader, criterion, device)
    
    print(f'\nTRAINING COMPLETE')
    print(f'Best validation loss: {best_valid_loss:.4f}')
    print(f'Best validation accuracy: {best_valid_accuracy:.4f}')
    print(f'Best validation F1: {best_valid_f1:.4f}')
    print(f'Best {early_stopping_metric}: {best_metric_value:.4f}')
    
    # Log final results
    if use_wandb:
        wandb.log({
            "final_valid_loss": final_valid_loss,
            "final_valid_accuracy": final_valid_accuracy,
            "final_valid_f1": final_valid_f1,
            "training_completed": True,
            "epochs_trained": epoch + 1,
            "early_stopped": patience_counter >= early_stopping_patience,
        })
        
        # Create summary table
        summary_table = wandb.Table(
            columns=["Metric", "Best Validation", "Final Validation"],
            data=[
                ["Loss", best_valid_loss, final_valid_loss],
                ["Accuracy", best_valid_accuracy, final_valid_accuracy],
                ["F1", best_valid_f1, final_valid_f1]
            ]
        )
        wandb.log({"validation_summary": summary_table})
        wandb.finish()
    
    # Return results
    return {
        "run_name": run_name,
        "model_path": os.path.join(runs_dir, f'best_model_{run_name}.pt'),
        "best_metrics": {
            "loss": best_valid_loss,
            "accuracy": best_valid_accuracy,
            "f1": best_valid_f1,
            early_stopping_metric: best_metric_value
        },
        "final_metrics": {
            "loss": final_valid_loss,
            "accuracy": final_valid_accuracy,
            "f1": final_valid_f1
        },
        "training_info": {
            "epochs_trained": epoch + 1,
            "early_stopped": patience_counter >= early_stopping_patience,
            "total_parameters": total_params,
            "trainable_parameters": trainable_params
        }
    }

## 5. Training

### Basic Training with Default Parameters

In [8]:
# Basic training run with default parameters
results = train_router_quality_classifier(
    experiment_name="basic_test",
    n_epochs=3,  # Quick test
    use_wandb=False  # Disable wandb for testing
)

print("\nTraining Results:")
print(f"Model saved at: {results['model_path']}")
print(f"Best F1 score: {results['best_metrics']['f1']:.4f}")
print(f"Total epochs: {results['training_info']['epochs_trained']}")

Reproducible environment set with seed: 9999

Starting training run: basic_test_20250813_005702
Using dataset: rated_enhanced_dataset_meta_llama_llama_3.1_8b_instruct_20250811_140130_20250811_151904.jsonl

Loading rated dataset...
Loaded 1756 items with ratings
Dataset after filtering: 1756 items

=== DATASET ANALYSIS ===
Total samples: 1756
Label 0 (Good quality, rating >= 4): 1526 (86.9%)
Label 1 (Poor quality, rating < 4): 230 (13.1%)
Class balance ratio (0:1): 6.63:1

=== RATING DISTRIBUTION ===
Rating 1: 37 (2.1%)
Rating 2: 107 (6.1%)
Rating 3: 86 (4.9%)
Rating 4: 329 (18.7%)
Rating 5: 1197 (68.2%)

Training samples: 1404
Validation samples: 352
DataLoaders created successfully

Initializing model: distilbert-base-uncased
All transformer layers will be fine-tuned
All transformer layers will be fine-tuned
📱 Model moved to device: cuda

=== MODEL PARAMETER ANALYSIS ===
Total parameters: 66,576,257
Trainable parameters: 66,576,257
Frozen parameters: 0
Percentage trainable: 100.0%
LR 

Epoch 1/3: 100%|██████████| 88/88 [00:50<00:00,  1.75it/s, loss=0.3028, lr=0.000100]



Epoch 1/3:
  Training   - Loss: 0.4040, Accuracy: 0.8583, F1: 0.0569
  Validation - Loss: 0.3556, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 2/3: 100%|██████████| 88/88 [00:50<00:00,  1.73it/s, loss=0.0724, lr=0.000100]



Epoch 2/3:
  Training   - Loss: 0.3602, Accuracy: 0.8796, F1: 0.2283
  Validation - Loss: 0.3520, Accuracy: 0.8750, F1: 0.1200
  Learning Rate: 0.00010000
New best model saved with f1: 0.1200
New best model saved with f1: 0.1200


Epoch 3/3: 100%|██████████| 88/88 [00:51<00:00,  1.72it/s, loss=0.6044, lr=0.000100]



Epoch 3/3:
  Training   - Loss: 0.3335, Accuracy: 0.8974, F1: 0.4375
  Validation - Loss: 0.3310, Accuracy: 0.8864, F1: 0.3103
  Learning Rate: 0.00010000
New best model saved with f1: 0.3103

Final model evaluation...
New best model saved with f1: 0.3103

Final model evaluation...

TRAINING COMPLETE
Best validation loss: 0.3310
Best validation accuracy: 0.8864
Best validation F1: 0.3103
Best f1: 0.3103

Training Results:
Model saved at: runs\best_model_basic_test_20250813_005702.pt
Best F1 score: 0.3103
Total epochs: 3

TRAINING COMPLETE
Best validation loss: 0.3310
Best validation accuracy: 0.8864
Best validation F1: 0.3103
Best f1: 0.3103

Training Results:
Model saved at: runs\best_model_basic_test_20250813_005702.pt
Best F1 score: 0.3103
Total epochs: 3


### Lets Adjust Parameters

In [9]:
intial_results = train_router_quality_classifier(
    # Model configuration
    dropout_rate=0.3,
    
    # Training hyperparameters
    learning_rate=0.0001,
    n_epochs=30,
    weight_decay=0.01,
    
    # Learning rate scheduling
    enable_lr_annealing=True,
    
    # Experiment settings
    experiment_name="dropout30wd01",
    use_wandb=True  # Enable wandb logging
)

Reproducible environment set with seed: 9999


wandb: Currently logged in as: shubhamjune (shubhamjune-vilnius-university) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin



Starting training run: dropout30wd01_20250813_161421
Using dataset: rated_enhanced_dataset_meta_llama_llama_3.1_8b_instruct_20250811_140130_20250811_151904.jsonl

Loading rated dataset...
Loaded 1756 items with ratings
Dataset after filtering: 1756 items

=== DATASET ANALYSIS ===
Total samples: 1756
Label 0 (Good quality, rating >= 4): 1526 (86.9%)
Label 1 (Poor quality, rating < 4): 230 (13.1%)
Class balance ratio (0:1): 6.63:1

=== RATING DISTRIBUTION ===
Rating 1: 37 (2.1%)
Rating 2: 107 (6.1%)
Rating 3: 86 (4.9%)
Rating 4: 329 (18.7%)
Rating 5: 1197 (68.2%)

Training samples: 1404
Validation samples: 352
DataLoaders created successfully

Initializing model: distilbert-base-uncased
All transformer layers will be fine-tuned
📱 Model moved to device: cuda

=== MODEL PARAMETER ANALYSIS ===
Total parameters: 66,576,257
Trainable parameters: 66,576,257
Frozen parameters: 0
Percentage trainable: 100.0%
LR Annealing enabled: factor=0.5, patience=5

Starting initial validation...
Initial - 

Epoch 1/30:  16%|█████████████████████▏                                                                                                               | 14/88 [00:08<00:43,  1.69it/s, loss=0.4647, lr=0.000035]


KeyboardInterrupt: 

## We have a tiny dataset, therefore we need high regularisation to stop from overfitting.

In [10]:
do70 = train_router_quality_classifier(
    # Model configuration
    dropout_rate=0.3,
    
    # Training hyperparameters
    learning_rate=0.0001,
    n_epochs=30,
    weight_decay=0.01,
    
    # Learning rate scheduling
    enable_lr_annealing=True,
    
    # Experiment settings
    experiment_name="dropout70wd01",
    use_wandb=True  # Enable wandb logging
)

Reproducible environment set with seed: 9999


0,1
epoch,▁
valid_accuracy,▁
valid_f1,▁
valid_loss,▁

0,1
epoch,0.0
valid_accuracy,0.2358
valid_f1,0.20178
valid_loss,0.73829



Starting training run: dropout70wd01_20250813_161447
Using dataset: rated_enhanced_dataset_meta_llama_llama_3.1_8b_instruct_20250811_140130_20250811_151904.jsonl

Loading rated dataset...
Loaded 1756 items with ratings
Dataset after filtering: 1756 items

=== DATASET ANALYSIS ===
Total samples: 1756
Label 0 (Good quality, rating >= 4): 1526 (86.9%)
Label 1 (Poor quality, rating < 4): 230 (13.1%)
Class balance ratio (0:1): 6.63:1

=== RATING DISTRIBUTION ===
Rating 1: 37 (2.1%)
Rating 2: 107 (6.1%)
Rating 3: 86 (4.9%)
Rating 4: 329 (18.7%)
Rating 5: 1197 (68.2%)

Training samples: 1404
Validation samples: 352
DataLoaders created successfully

Initializing model: distilbert-base-uncased
All transformer layers will be fine-tuned
📱 Model moved to device: cuda

=== MODEL PARAMETER ANALYSIS ===
Total parameters: 66,576,257
Trainable parameters: 66,576,257
Frozen parameters: 0
Percentage trainable: 100.0%
LR Annealing enabled: factor=0.5, patience=5

Starting initial validation...
Initial - 

Epoch 1/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.86it/s, loss=0.3741, lr=0.000100]


Epoch 1/30:
  Training   - Loss: 0.4266, Accuracy: 0.8476, F1: 0.0446
  Validation - Loss: 0.3660, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 2/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.0998, lr=0.000100]


Epoch 2/30:
  Training   - Loss: 0.3851, Accuracy: 0.8689, F1: 0.1481
  Validation - Loss: 0.3562, Accuracy: 0.8750, F1: 0.2903
  Learning Rate: 0.00010000
New best model saved with f1: 0.2903


Epoch 3/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.6740, lr=0.000100]


Epoch 3/30:
  Training   - Loss: 0.3347, Accuracy: 0.8889, F1: 0.4091
  Validation - Loss: 0.3897, Accuracy: 0.8807, F1: 0.2222
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 4/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.0766, lr=0.000100]


Epoch 4/30:
  Training   - Loss: 0.2688, Accuracy: 0.9231, F1: 0.6707
  Validation - Loss: 0.5148, Accuracy: 0.8835, F1: 0.2545
  Learning Rate: 0.00010000
No improvement in f1. Patience: 2/500


Epoch 5/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.83it/s, loss=0.6606, lr=0.000100]


Epoch 5/30:
  Training   - Loss: 0.2250, Accuracy: 0.9430, F1: 0.7531
  Validation - Loss: 0.3641, Accuracy: 0.8807, F1: 0.1923
  Learning Rate: 0.00010000
No improvement in f1. Patience: 3/500


Epoch 6/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.5700, lr=0.000100]


Epoch 6/30:
  Training   - Loss: 0.1655, Accuracy: 0.9658, F1: 0.8621
  Validation - Loss: 0.5224, Accuracy: 0.8523, F1: 0.3500
  Learning Rate: 0.00010000
New best model saved with f1: 0.3500


Epoch 7/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.5119, lr=0.000100]


Epoch 7/30:
  Training   - Loss: 0.0957, Accuracy: 0.9822, F1: 0.9292
  Validation - Loss: 0.8038, Accuracy: 0.8438, F1: 0.3678
  Learning Rate: 0.00010000
New best model saved with f1: 0.3678


Epoch 8/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.5454, lr=0.000100]


Epoch 8/30:
  Training   - Loss: 0.0814, Accuracy: 0.9865, F1: 0.9465
  Validation - Loss: 0.7519, Accuracy: 0.8722, F1: 0.2373
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 9/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0033, lr=0.000100]


Epoch 9/30:
  Training   - Loss: 0.0857, Accuracy: 0.9858, F1: 0.9441
  Validation - Loss: 0.9535, Accuracy: 0.8494, F1: 0.3291
  Learning Rate: 0.00010000
No improvement in f1. Patience: 2/500


Epoch 10/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.0263, lr=0.000100]


Epoch 10/30:
  Training   - Loss: 0.0357, Accuracy: 0.9957, F1: 0.9834
  Validation - Loss: 1.0838, Accuracy: 0.8608, F1: 0.3467
  Learning Rate: 0.00010000
No improvement in f1. Patience: 3/500


Epoch 11/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.0031, lr=0.000100]


Epoch 11/30:
  Training   - Loss: 0.0428, Accuracy: 0.9943, F1: 0.9780
  Validation - Loss: 0.9382, Accuracy: 0.8580, F1: 0.3421
  Learning Rate: 0.00010000
No improvement in f1. Patience: 4/500


Epoch 12/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0025, lr=0.000100]


Epoch 12/30:
  Training   - Loss: 0.0411, Accuracy: 0.9936, F1: 0.9756
  Validation - Loss: 0.9034, Accuracy: 0.8750, F1: 0.3529
  Learning Rate: 0.00010000
No improvement in f1. Patience: 5/500


Epoch 13/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0012, lr=0.000100]


Epoch 13/30:
  Training   - Loss: 0.0266, Accuracy: 0.9964, F1: 0.9863
  Validation - Loss: 0.8899, Accuracy: 0.8807, F1: 0.3824
  Learning Rate: 0.00010000
New best model saved with f1: 0.3824


Epoch 14/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.0020, lr=0.000100]


Epoch 14/30:
  Training   - Loss: 0.0182, Accuracy: 0.9979, F1: 0.9918
  Validation - Loss: 0.8819, Accuracy: 0.8807, F1: 0.4000
  Learning Rate: 0.00010000
New best model saved with f1: 0.4000


Epoch 15/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0002, lr=0.000100]


Epoch 15/30:
  Training   - Loss: 0.0174, Accuracy: 0.9979, F1: 0.9918
  Validation - Loss: 1.0410, Accuracy: 0.8778, F1: 0.3944
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 16/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0018, lr=0.000100]


Epoch 16/30:
  Training   - Loss: 0.0162, Accuracy: 0.9964, F1: 0.9864
  Validation - Loss: 0.9070, Accuracy: 0.8750, F1: 0.2143
  Learning Rate: 0.00010000
No improvement in f1. Patience: 2/500


Epoch 17/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:50<00:00,  1.74it/s, loss=0.0010, lr=0.000100]


Epoch 17/30:
  Training   - Loss: 0.0175, Accuracy: 0.9986, F1: 0.9945
  Validation - Loss: 1.4354, Accuracy: 0.8409, F1: 0.3636
  Learning Rate: 0.00010000
No improvement in f1. Patience: 3/500


Epoch 18/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:50<00:00,  1.73it/s, loss=0.0015, lr=0.000100]


Epoch 18/30:
  Training   - Loss: 0.0089, Accuracy: 0.9986, F1: 0.9945
  Validation - Loss: 1.4384, Accuracy: 0.8523, F1: 0.3810
  Learning Rate: 0.00010000
No improvement in f1. Patience: 4/500


Epoch 19/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:50<00:00,  1.74it/s, loss=0.0002, lr=0.000100]


Epoch 19/30:
  Training   - Loss: 0.0087, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.1294, Accuracy: 0.8864, F1: 0.3750
  Learning Rate: 0.00010000
No improvement in f1. Patience: 5/500


Epoch 20/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:50<00:00,  1.73it/s, loss=0.0007, lr=0.000100]


Epoch 20/30:
  Training   - Loss: 0.0092, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.0872, Accuracy: 0.8864, F1: 0.3750
  Learning Rate: 0.00010000
No improvement in f1. Patience: 6/500


Epoch 21/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.80it/s, loss=0.0003, lr=0.000100]


Epoch 21/30:
  Training   - Loss: 0.0078, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.1133, Accuracy: 0.8864, F1: 0.3750
  Learning Rate: 0.00010000
No improvement in f1. Patience: 7/500


Epoch 22/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.0003, lr=0.000100]


Epoch 22/30:
  Training   - Loss: 0.0057, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.1267, Accuracy: 0.8864, F1: 0.3939
  Learning Rate: 0.00010000
No improvement in f1. Patience: 8/500


Epoch 23/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.0003, lr=0.000100]


Epoch 23/30:
  Training   - Loss: 0.0062, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.1430, Accuracy: 0.8864, F1: 0.3939
  Learning Rate: 0.00010000
No improvement in f1. Patience: 9/500


Epoch 24/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.0001, lr=0.000100]


Epoch 24/30:
  Training   - Loss: 0.0080, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.1459, Accuracy: 0.8835, F1: 0.3881
  Learning Rate: 0.00010000
No improvement in f1. Patience: 10/500


Epoch 25/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.0001, lr=0.000100]


Epoch 25/30:
  Training   - Loss: 0.0077, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.1991, Accuracy: 0.8835, F1: 0.3881
  Learning Rate: 0.00010000
No improvement in f1. Patience: 11/500


Epoch 26/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.0002, lr=0.000100]


Epoch 26/30:
  Training   - Loss: 0.0063, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.2217, Accuracy: 0.8835, F1: 0.3881
  Learning Rate: 0.00010000
No improvement in f1. Patience: 12/500


Epoch 27/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000100]


Epoch 27/30:
  Training   - Loss: 0.0091, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.3600, Accuracy: 0.8807, F1: 0.3824
  Learning Rate: 0.00010000
No improvement in f1. Patience: 13/500


Epoch 28/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0001, lr=0.000100]


Epoch 28/30:
  Training   - Loss: 0.0001, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.4422, Accuracy: 0.8807, F1: 0.3824
  Learning Rate: 0.00010000
No improvement in f1. Patience: 14/500


Epoch 29/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0005, lr=0.000100]


Epoch 29/30:
  Training   - Loss: 0.0001, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.5068, Accuracy: 0.8807, F1: 0.3824
  Learning Rate: 0.00010000
No improvement in f1. Patience: 15/500


Epoch 30/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.0000, lr=0.000100]


Epoch 30/30:
  Training   - Loss: 0.0001, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.5573, Accuracy: 0.8807, F1: 0.3824
  Learning Rate: 0.00010000
No improvement in f1. Patience: 16/500

Final model evaluation...

TRAINING COMPLETE
Best validation loss: 0.8819
Best validation accuracy: 0.8807
Best validation F1: 0.4000
Best f1: 0.4000


0,1
best_f1,▁▅▆▇█
best_valid_accuracy,▇▃▁██
best_valid_f1,▁▅▆▇█
best_valid_loss,▁▃▇██
epoch,▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▇▇▇▇███
epochs_trained,▁
final_valid_accuracy,▁
final_valid_f1,▁
final_valid_loss,▁
learning_rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_f1,0.4
best_valid_accuracy,0.88068
best_valid_f1,0.4
best_valid_loss,0.88189
early_stopped,False
epoch,30
epochs_trained,30
final_valid_accuracy,0.88068
final_valid_f1,0.4
final_valid_loss,0.88189


In [11]:
do50wdlow = train_router_quality_classifier(
    # Model configuration
    dropout_rate=0.3,
    
    # Training hyperparameters
    learning_rate=0.0001,
    n_epochs=30,
    weight_decay=0.0001,
    
    # Learning rate scheduling
    enable_lr_annealing=True,
    
    # Experiment settings
    experiment_name="do50wdlow",
    use_wandb=True  # Enable wandb logging
)

Reproducible environment set with seed: 9999



Starting training run: do50wdlow_20250813_164111
Using dataset: rated_enhanced_dataset_meta_llama_llama_3.1_8b_instruct_20250811_140130_20250811_151904.jsonl

Loading rated dataset...
Loaded 1756 items with ratings
Dataset after filtering: 1756 items

=== DATASET ANALYSIS ===
Total samples: 1756
Label 0 (Good quality, rating >= 4): 1526 (86.9%)
Label 1 (Poor quality, rating < 4): 230 (13.1%)
Class balance ratio (0:1): 6.63:1

=== RATING DISTRIBUTION ===
Rating 1: 37 (2.1%)
Rating 2: 107 (6.1%)
Rating 3: 86 (4.9%)
Rating 4: 329 (18.7%)
Rating 5: 1197 (68.2%)

Training samples: 1404
Validation samples: 352
DataLoaders created successfully

Initializing model: distilbert-base-uncased
All transformer layers will be fine-tuned
📱 Model moved to device: cuda

=== MODEL PARAMETER ANALYSIS ===
Total parameters: 66,576,257
Trainable parameters: 66,576,257
Frozen parameters: 0
Percentage trainable: 100.0%
LR Annealing enabled: factor=0.5, patience=5

Starting initial validation...
Initial - Loss

Epoch 1/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.2890, lr=0.000100]


Epoch 1/30:
  Training   - Loss: 0.4226, Accuracy: 0.8462, F1: 0.0442
  Validation - Loss: 0.3588, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 2/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.1378, lr=0.000100]


Epoch 2/30:
  Training   - Loss: 0.3876, Accuracy: 0.8668, F1: 0.1382
  Validation - Loss: 0.3742, Accuracy: 0.8778, F1: 0.1887
  Learning Rate: 0.00010000
New best model saved with f1: 0.1887


Epoch 3/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.7245, lr=0.000100]


Epoch 3/30:
  Training   - Loss: 0.3398, Accuracy: 0.8825, F1: 0.3529
  Validation - Loss: 0.3762, Accuracy: 0.8807, F1: 0.2500
  Learning Rate: 0.00010000
New best model saved with f1: 0.2500


Epoch 4/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0551, lr=0.000100]


Epoch 4/30:
  Training   - Loss: 0.2737, Accuracy: 0.9245, F1: 0.6443
  Validation - Loss: 0.4953, Accuracy: 0.8778, F1: 0.3175
  Learning Rate: 0.00010000
New best model saved with f1: 0.3175


Epoch 5/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0828, lr=0.000100]


Epoch 5/30:
  Training   - Loss: 0.1967, Accuracy: 0.9509, F1: 0.7890
  Validation - Loss: 0.4953, Accuracy: 0.8636, F1: 0.3684
  Learning Rate: 0.00010000
New best model saved with f1: 0.3684


Epoch 6/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.1525, lr=0.000100]


Epoch 6/30:
  Training   - Loss: 0.1095, Accuracy: 0.9701, F1: 0.8807
  Validation - Loss: 0.6580, Accuracy: 0.8466, F1: 0.3864
  Learning Rate: 0.00010000
New best model saved with f1: 0.3864


Epoch 7/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0002, lr=0.000100]


Epoch 7/30:
  Training   - Loss: 0.0483, Accuracy: 0.9872, F1: 0.9503
  Validation - Loss: 1.1038, Accuracy: 0.8608, F1: 0.2687
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 8/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.80it/s, loss=0.0002, lr=0.000100]


Epoch 8/30:
  Training   - Loss: 0.0172, Accuracy: 0.9957, F1: 0.9835
  Validation - Loss: 1.2247, Accuracy: 0.8665, F1: 0.2985
  Learning Rate: 0.00010000
No improvement in f1. Patience: 2/500


Epoch 9/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0002, lr=0.000100]


Epoch 9/30:
  Training   - Loss: 0.0105, Accuracy: 0.9986, F1: 0.9945
  Validation - Loss: 1.5269, Accuracy: 0.8466, F1: 0.3415
  Learning Rate: 0.00010000
No improvement in f1. Patience: 3/500


Epoch 10/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.0004, lr=0.000100]


Epoch 10/30:
  Training   - Loss: 0.0002, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.3900, Accuracy: 0.8778, F1: 0.3944
  Learning Rate: 0.00010000
New best model saved with f1: 0.3944


Epoch 11/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:50<00:00,  1.76it/s, loss=0.0000, lr=0.000100]


Epoch 11/30:
  Training   - Loss: 0.0028, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.5387, Accuracy: 0.8608, F1: 0.3636
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 12/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:50<00:00,  1.75it/s, loss=0.0000, lr=0.000100]


Epoch 12/30:
  Training   - Loss: 0.0001, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.5740, Accuracy: 0.8693, F1: 0.3784
  Learning Rate: 0.00010000
No improvement in f1. Patience: 2/500


Epoch 13/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:50<00:00,  1.75it/s, loss=0.0000, lr=0.000100]


Epoch 13/30:
  Training   - Loss: 0.0001, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.6129, Accuracy: 0.8693, F1: 0.3784
  Learning Rate: 0.00010000
No improvement in f1. Patience: 3/500


Epoch 14/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.83it/s, loss=0.0000, lr=0.000100]


Epoch 14/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.6372, Accuracy: 0.8693, F1: 0.3784
  Learning Rate: 0.00010000
No improvement in f1. Patience: 4/500


Epoch 15/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000100]


Epoch 15/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.6600, Accuracy: 0.8693, F1: 0.3784
  Learning Rate: 0.00010000
No improvement in f1. Patience: 5/500


Epoch 16/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000100]


Epoch 16/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.6688, Accuracy: 0.8693, F1: 0.3784
  Learning Rate: 0.00010000
No improvement in f1. Patience: 6/500


Epoch 17/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000100]


Epoch 17/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.7005, Accuracy: 0.8693, F1: 0.3784
  Learning Rate: 0.00010000
No improvement in f1. Patience: 7/500


Epoch 18/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000100]


Epoch 18/30:
  Training   - Loss: 0.0001, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.6783, Accuracy: 0.8693, F1: 0.3784
  Learning Rate: 0.00010000
No improvement in f1. Patience: 8/500


Epoch 19/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000100]


Epoch 19/30:
  Training   - Loss: 0.0001, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.6872, Accuracy: 0.8750, F1: 0.3889
  Learning Rate: 0.00010000
No improvement in f1. Patience: 9/500


Epoch 20/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000100]


Epoch 20/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.7473, Accuracy: 0.8722, F1: 0.3836
  Learning Rate: 0.00010000
No improvement in f1. Patience: 10/500


Epoch 21/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0000, lr=0.000100]


Epoch 21/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.7841, Accuracy: 0.8722, F1: 0.3836
  Learning Rate: 0.00010000
No improvement in f1. Patience: 11/500


Epoch 22/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.0000, lr=0.000100]


Epoch 22/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.8001, Accuracy: 0.8750, F1: 0.3889
  Learning Rate: 0.00010000
No improvement in f1. Patience: 12/500


Epoch 23/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.0000, lr=0.000100]


Epoch 23/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.8288, Accuracy: 0.8750, F1: 0.3889
  Learning Rate: 0.00010000
No improvement in f1. Patience: 13/500


Epoch 24/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000100]


Epoch 24/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.8458, Accuracy: 0.8750, F1: 0.3889
  Learning Rate: 0.00010000
No improvement in f1. Patience: 14/500


Epoch 25/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0000, lr=0.000100]


Epoch 25/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.8631, Accuracy: 0.8750, F1: 0.3889
  Learning Rate: 0.00010000
No improvement in f1. Patience: 15/500


Epoch 26/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000100]


Epoch 26/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.8774, Accuracy: 0.8750, F1: 0.3889
  Learning Rate: 0.00010000
No improvement in f1. Patience: 16/500


Epoch 27/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000100]


Epoch 27/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.8912, Accuracy: 0.8750, F1: 0.3889
  Learning Rate: 0.00010000
No improvement in f1. Patience: 17/500


Epoch 28/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.0000, lr=0.000100]


Epoch 28/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.9141, Accuracy: 0.8750, F1: 0.3889
  Learning Rate: 0.00010000
No improvement in f1. Patience: 18/500


Epoch 29/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000100]


Epoch 29/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.9221, Accuracy: 0.8750, F1: 0.3889
  Learning Rate: 0.00010000
No improvement in f1. Patience: 19/500


Epoch 30/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000100]


Epoch 30/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.9324, Accuracy: 0.8750, F1: 0.3889
  Learning Rate: 0.00010000
No improvement in f1. Patience: 20/500

Final model evaluation...

TRAINING COMPLETE
Best validation loss: 1.3900
Best validation accuracy: 0.8778
Best validation F1: 0.3944
Best f1: 0.3944


0,1
best_f1,▁▃▅▇██
best_valid_accuracy,▇█▇▅▁▇
best_valid_f1,▁▃▅▇██
best_valid_loss,▁▁▂▂▃█
epoch,▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▇▇▇▇███
epochs_trained,▁
final_valid_accuracy,▁
final_valid_f1,▁
final_valid_loss,▁
learning_rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_f1,0.39437
best_valid_accuracy,0.87784
best_valid_f1,0.39437
best_valid_loss,1.38996
early_stopped,False
epoch,30
epochs_trained,30
final_valid_accuracy,0.87784
final_valid_f1,0.39437
final_valid_loss,1.38996


In [12]:
slow_lr = train_router_quality_classifier(
    # Model configuration
    dropout_rate=0.3,
    
    # Training hyperparameters
    learning_rate=0.00005,
    n_epochs=30,
    weight_decay=0.0001,
    
    # Learning rate scheduling
    enable_lr_annealing=True,
    
    # Experiment settings
    experiment_name="slow_lr",
    use_wandb=True  # Enable wandb logging
)

Reproducible environment set with seed: 9999



Starting training run: slow_lr_20250813_170739
Using dataset: rated_enhanced_dataset_meta_llama_llama_3.1_8b_instruct_20250811_140130_20250811_151904.jsonl

Loading rated dataset...
Loaded 1756 items with ratings
Dataset after filtering: 1756 items

=== DATASET ANALYSIS ===
Total samples: 1756
Label 0 (Good quality, rating >= 4): 1526 (86.9%)
Label 1 (Poor quality, rating < 4): 230 (13.1%)
Class balance ratio (0:1): 6.63:1

=== RATING DISTRIBUTION ===
Rating 1: 37 (2.1%)
Rating 2: 107 (6.1%)
Rating 3: 86 (4.9%)
Rating 4: 329 (18.7%)
Rating 5: 1197 (68.2%)

Training samples: 1404
Validation samples: 352
DataLoaders created successfully

Initializing model: distilbert-base-uncased
All transformer layers will be fine-tuned
📱 Model moved to device: cuda

=== MODEL PARAMETER ANALYSIS ===
Total parameters: 66,576,257
Trainable parameters: 66,576,257
Frozen parameters: 0
Percentage trainable: 100.0%
LR Annealing enabled: factor=0.5, patience=5

Starting initial validation...
Initial - Loss: 

Epoch 1/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.3044, lr=0.000050]


Epoch 1/30:
  Training   - Loss: 0.4266, Accuracy: 0.8355, F1: 0.0494
  Validation - Loss: 0.3692, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00005000
No improvement in f1. Patience: 1/500


Epoch 2/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0939, lr=0.000050]


Epoch 2/30:
  Training   - Loss: 0.3749, Accuracy: 0.8761, F1: 0.1386
  Validation - Loss: 0.3294, Accuracy: 0.8835, F1: 0.2264
  Learning Rate: 0.00005000
New best model saved with f1: 0.2264


Epoch 3/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:49<00:00,  1.78it/s, loss=0.8220, lr=0.000050]


Epoch 3/30:
  Training   - Loss: 0.3337, Accuracy: 0.8910, F1: 0.4047
  Validation - Loss: 0.3704, Accuracy: 0.8807, F1: 0.2759
  Learning Rate: 0.00005000
New best model saved with f1: 0.2759


Epoch 4/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:49<00:00,  1.79it/s, loss=0.0246, lr=0.000050]


Epoch 4/30:
  Training   - Loss: 0.2418, Accuracy: 0.9259, F1: 0.6623
  Validation - Loss: 0.4840, Accuracy: 0.8778, F1: 0.3175
  Learning Rate: 0.00005000
New best model saved with f1: 0.3175


Epoch 5/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.1016, lr=0.000050]


Epoch 5/30:
  Training   - Loss: 0.1829, Accuracy: 0.9516, F1: 0.7976
  Validation - Loss: 0.6189, Accuracy: 0.8665, F1: 0.4051
  Learning Rate: 0.00005000
New best model saved with f1: 0.4051


Epoch 6/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.80it/s, loss=0.0381, lr=0.000050]


Epoch 6/30:
  Training   - Loss: 0.1097, Accuracy: 0.9758, F1: 0.9056
  Validation - Loss: 0.6738, Accuracy: 0.8693, F1: 0.3784
  Learning Rate: 0.00005000
No improvement in f1. Patience: 1/500


Epoch 7/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0020, lr=0.000050]


Epoch 7/30:
  Training   - Loss: 0.0716, Accuracy: 0.9836, F1: 0.9352
  Validation - Loss: 0.7870, Accuracy: 0.8807, F1: 0.2759
  Learning Rate: 0.00005000
No improvement in f1. Patience: 2/500


Epoch 8/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.5839, lr=0.000050]


Epoch 8/30:
  Training   - Loss: 0.0379, Accuracy: 0.9943, F1: 0.9778
  Validation - Loss: 0.8366, Accuracy: 0.8778, F1: 0.2951
  Learning Rate: 0.00005000
No improvement in f1. Patience: 3/500


Epoch 9/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.80it/s, loss=0.0005, lr=0.000050]


Epoch 9/30:
  Training   - Loss: 0.0456, Accuracy: 0.9907, F1: 0.9648
  Validation - Loss: 1.0278, Accuracy: 0.8665, F1: 0.2295
  Learning Rate: 0.00005000
No improvement in f1. Patience: 4/500


Epoch 10/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:49<00:00,  1.78it/s, loss=0.0008, lr=0.000050]


Epoch 10/30:
  Training   - Loss: 0.0161, Accuracy: 0.9972, F1: 0.9891
  Validation - Loss: 1.1307, Accuracy: 0.8807, F1: 0.3824
  Learning Rate: 0.00005000
No improvement in f1. Patience: 5/500


Epoch 11/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.80it/s, loss=0.0005, lr=0.000050]


Epoch 11/30:
  Training   - Loss: 0.0050, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.2329, Accuracy: 0.8778, F1: 0.2182
  Learning Rate: 0.00005000
No improvement in f1. Patience: 6/500


Epoch 12/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0001, lr=0.000050]


Epoch 12/30:
  Training   - Loss: 0.0113, Accuracy: 0.9979, F1: 0.9918
  Validation - Loss: 1.2842, Accuracy: 0.8693, F1: 0.3611
  Learning Rate: 0.00005000
No improvement in f1. Patience: 7/500


Epoch 13/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000050]


Epoch 13/30:
  Training   - Loss: 0.0241, Accuracy: 0.9964, F1: 0.9864
  Validation - Loss: 1.1222, Accuracy: 0.8835, F1: 0.3492
  Learning Rate: 0.00005000
No improvement in f1. Patience: 8/500


Epoch 14/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000050]


Epoch 14/30:
  Training   - Loss: 0.0056, Accuracy: 0.9986, F1: 0.9945
  Validation - Loss: 1.4006, Accuracy: 0.8665, F1: 0.4471
  Learning Rate: 0.00005000
New best model saved with f1: 0.4471


Epoch 15/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000050]


Epoch 15/30:
  Training   - Loss: 0.0064, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.2542, Accuracy: 0.8750, F1: 0.2903
  Learning Rate: 0.00005000
No improvement in f1. Patience: 1/500


Epoch 16/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000050]


Epoch 16/30:
  Training   - Loss: 0.0026, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.3121, Accuracy: 0.8722, F1: 0.3836
  Learning Rate: 0.00005000
No improvement in f1. Patience: 2/500


Epoch 17/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0002, lr=0.000050]


Epoch 17/30:
  Training   - Loss: 0.0034, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.5075, Accuracy: 0.8665, F1: 0.4471
  Learning Rate: 0.00005000
No improvement in f1. Patience: 3/500


Epoch 18/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.83it/s, loss=0.0012, lr=0.000050]


Epoch 18/30:
  Training   - Loss: 0.0144, Accuracy: 0.9986, F1: 0.9945
  Validation - Loss: 1.2747, Accuracy: 0.8807, F1: 0.2759
  Learning Rate: 0.00005000
No improvement in f1. Patience: 4/500


Epoch 19/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:49<00:00,  1.77it/s, loss=0.0000, lr=0.000050]


Epoch 19/30:
  Training   - Loss: 0.0004, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.3726, Accuracy: 0.8835, F1: 0.2807
  Learning Rate: 0.00005000
No improvement in f1. Patience: 5/500


Epoch 20/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.80it/s, loss=0.0000, lr=0.000050]


Epoch 20/30:
  Training   - Loss: 0.0404, Accuracy: 0.9936, F1: 0.9756
  Validation - Loss: 1.2523, Accuracy: 0.8835, F1: 0.3051
  Learning Rate: 0.00005000
No improvement in f1. Patience: 6/500


Epoch 21/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.0000, lr=0.000050]


Epoch 21/30:
  Training   - Loss: 0.0149, Accuracy: 0.9972, F1: 0.9892
  Validation - Loss: 1.3436, Accuracy: 0.8835, F1: 0.2807
  Learning Rate: 0.00005000
No improvement in f1. Patience: 7/500


Epoch 22/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=1.0103, lr=0.000050]


Epoch 22/30:
  Training   - Loss: 0.0258, Accuracy: 0.9957, F1: 0.9837
  Validation - Loss: 1.3454, Accuracy: 0.8693, F1: 0.3030
  Learning Rate: 0.00005000
No improvement in f1. Patience: 8/500


Epoch 23/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.83it/s, loss=0.0000, lr=0.000050]


Epoch 23/30:
  Training   - Loss: 0.0102, Accuracy: 0.9986, F1: 0.9945
  Validation - Loss: 1.5004, Accuracy: 0.8551, F1: 0.3544
  Learning Rate: 0.00005000
No improvement in f1. Patience: 9/500


Epoch 24/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.80it/s, loss=0.0000, lr=0.000050]


Epoch 24/30:
  Training   - Loss: 0.0110, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.5071, Accuracy: 0.8636, F1: 0.2727
  Learning Rate: 0.00005000
No improvement in f1. Patience: 10/500


Epoch 25/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000050]


Epoch 25/30:
  Training   - Loss: 0.0085, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.5799, Accuracy: 0.8750, F1: 0.2667
  Learning Rate: 0.00002500
No improvement in f1. Patience: 11/500


Epoch 26/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000025]


Epoch 26/30:
  Training   - Loss: 0.0177, Accuracy: 0.9979, F1: 0.9918
  Validation - Loss: 1.3630, Accuracy: 0.8750, F1: 0.2903
  Learning Rate: 0.00002500
No improvement in f1. Patience: 12/500


Epoch 27/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0001, lr=0.000025]


Epoch 27/30:
  Training   - Loss: 0.0099, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.2396, Accuracy: 0.8807, F1: 0.3000
  Learning Rate: 0.00002500
No improvement in f1. Patience: 13/500


Epoch 28/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0002, lr=0.000025]


Epoch 28/30:
  Training   - Loss: 0.0002, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.3123, Accuracy: 0.8778, F1: 0.2951
  Learning Rate: 0.00002500
No improvement in f1. Patience: 14/500


Epoch 29/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0006, lr=0.000025]


Epoch 29/30:
  Training   - Loss: 0.0001, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.3615, Accuracy: 0.8778, F1: 0.2951
  Learning Rate: 0.00002500
No improvement in f1. Patience: 15/500


Epoch 30/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000025]


Epoch 30/30:
  Training   - Loss: 0.0001, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.4002, Accuracy: 0.8778, F1: 0.2951
  Learning Rate: 0.00002500
No improvement in f1. Patience: 16/500

Final model evaluation...

TRAINING COMPLETE
Best validation loss: 1.4006
Best validation accuracy: 0.8665
Best validation F1: 0.4471
Best f1: 0.4471


0,1
best_f1,▁▃▄▇█
best_valid_accuracy,█▇▆▁▁
best_valid_f1,▁▃▄▇█
best_valid_loss,▁▁▂▃█
epoch,▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▇▇▇▇███
epochs_trained,▁
final_valid_accuracy,▁
final_valid_f1,▁
final_valid_loss,▁
learning_rate,████████████████████████▁▁▁▁▁▁

0,1
best_f1,0.44706
best_valid_accuracy,0.86648
best_valid_f1,0.44706
best_valid_loss,1.40063
early_stopped,False
epoch,30
epochs_trained,30
final_valid_accuracy,0.86648
final_valid_f1,0.44706
final_valid_loss,1.40063


In [13]:
very_slow_lr = train_router_quality_classifier(
    # Model configuration
    dropout_rate=0.3,
    
    # Training hyperparameters
    learning_rate=0.000007,
    n_epochs=30,
    weight_decay=0.0001,
    
    # Learning rate scheduling
    enable_lr_annealing=False,
    
    # Experiment settings
    experiment_name="very_slow_lr",
    use_wandb=True  # Enable wandb logging
)

Reproducible environment set with seed: 9999



Starting training run: very_slow_lr_20250813_180851
Using dataset: rated_enhanced_dataset_meta_llama_llama_3.1_8b_instruct_20250811_140130_20250811_151904.jsonl

Loading rated dataset...
Loaded 1756 items with ratings
Dataset after filtering: 1756 items

=== DATASET ANALYSIS ===
Total samples: 1756
Label 0 (Good quality, rating >= 4): 1526 (86.9%)
Label 1 (Poor quality, rating < 4): 230 (13.1%)
Class balance ratio (0:1): 6.63:1

=== RATING DISTRIBUTION ===
Rating 1: 37 (2.1%)
Rating 2: 107 (6.1%)
Rating 3: 86 (4.9%)
Rating 4: 329 (18.7%)
Rating 5: 1197 (68.2%)

Training samples: 1404
Validation samples: 352
DataLoaders created successfully

Initializing model: distilbert-base-uncased
All transformer layers will be fine-tuned
📱 Model moved to device: cuda

=== MODEL PARAMETER ANALYSIS ===
Total parameters: 66,576,257
Trainable parameters: 66,576,257
Frozen parameters: 0
Percentage trainable: 100.0%

Starting initial validation...
Initial - Loss: 0.7383, Accuracy: 0.2358, F1: 0.2018

St

Epoch 1/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.3129, lr=0.000007]


Epoch 1/30:
  Training   - Loss: 0.5000, Accuracy: 0.7678, F1: 0.1641
  Validation - Loss: 0.3890, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00000700
No improvement in f1. Patience: 1/500


Epoch 2/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.2049, lr=0.000007]


Epoch 2/30:
  Training   - Loss: 0.3966, Accuracy: 0.8640, F1: 0.0402
  Validation - Loss: 0.3656, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00000700
No improvement in f1. Patience: 2/500


Epoch 3/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.86it/s, loss=0.5931, lr=0.000007]


Epoch 3/30:
  Training   - Loss: 0.3802, Accuracy: 0.8654, F1: 0.0207
  Validation - Loss: 0.3704, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00000700
No improvement in f1. Patience: 3/500


Epoch 4/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.2410, lr=0.000007]


Epoch 4/30:
  Training   - Loss: 0.3537, Accuracy: 0.8725, F1: 0.0725
  Validation - Loss: 0.3402, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00000700
No improvement in f1. Patience: 4/500


Epoch 5/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.5750, lr=0.000007]


Epoch 5/30:
  Training   - Loss: 0.3369, Accuracy: 0.8732, F1: 0.1100
  Validation - Loss: 0.3713, Accuracy: 0.8750, F1: 0.0833
  Learning Rate: 0.00000700
New best model saved with f1: 0.0833


Epoch 6/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.4883, lr=0.000007]


Epoch 6/30:
  Training   - Loss: 0.3213, Accuracy: 0.8761, F1: 0.1714
  Validation - Loss: 0.3309, Accuracy: 0.8864, F1: 0.2593
  Learning Rate: 0.00000700
New best model saved with f1: 0.2593


Epoch 7/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.4426, lr=0.000007]


Epoch 7/30:
  Training   - Loss: 0.2804, Accuracy: 0.8925, F1: 0.3837
  Validation - Loss: 0.4062, Accuracy: 0.8807, F1: 0.2500
  Learning Rate: 0.00000700
No improvement in f1. Patience: 1/500


Epoch 8/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.4330, lr=0.000007]


Epoch 8/30:
  Training   - Loss: 0.2269, Accuracy: 0.8981, F1: 0.4521
  Validation - Loss: 0.4011, Accuracy: 0.8807, F1: 0.3226
  Learning Rate: 0.00000700
New best model saved with f1: 0.3226


Epoch 9/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.1289, lr=0.000007]


Epoch 9/30:
  Training   - Loss: 0.2004, Accuracy: 0.9274, F1: 0.6531
  Validation - Loss: 0.4644, Accuracy: 0.8864, F1: 0.3333
  Learning Rate: 0.00000700
New best model saved with f1: 0.3333


Epoch 10/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.1456, lr=0.000007]


Epoch 10/30:
  Training   - Loss: 0.1578, Accuracy: 0.9509, F1: 0.7767
  Validation - Loss: 0.5773, Accuracy: 0.8835, F1: 0.3051
  Learning Rate: 0.00000700
No improvement in f1. Patience: 1/500


Epoch 11/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.0313, lr=0.000007]


Epoch 11/30:
  Training   - Loss: 0.1013, Accuracy: 0.9672, F1: 0.8663
  Validation - Loss: 0.6299, Accuracy: 0.8722, F1: 0.2857
  Learning Rate: 0.00000700
No improvement in f1. Patience: 2/500


Epoch 12/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.0039, lr=0.000007]


Epoch 12/30:
  Training   - Loss: 0.0554, Accuracy: 0.9850, F1: 0.9405
  Validation - Loss: 0.7984, Accuracy: 0.8665, F1: 0.2769
  Learning Rate: 0.00000700
No improvement in f1. Patience: 3/500


Epoch 13/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0024, lr=0.000007]


Epoch 13/30:
  Training   - Loss: 0.0386, Accuracy: 0.9886, F1: 0.9551
  Validation - Loss: 0.9521, Accuracy: 0.8778, F1: 0.2951
  Learning Rate: 0.00000700
No improvement in f1. Patience: 4/500


Epoch 14/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0021, lr=0.000007]


Epoch 14/30:
  Training   - Loss: 0.0217, Accuracy: 0.9957, F1: 0.9834
  Validation - Loss: 0.9626, Accuracy: 0.8693, F1: 0.3235
  Learning Rate: 0.00000700
No improvement in f1. Patience: 5/500


Epoch 15/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0076, lr=0.000007]


Epoch 15/30:
  Training   - Loss: 0.0151, Accuracy: 0.9957, F1: 0.9837
  Validation - Loss: 1.0569, Accuracy: 0.8722, F1: 0.3478
  Learning Rate: 0.00000700
New best model saved with f1: 0.3478


Epoch 16/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.0006, lr=0.000007]


Epoch 16/30:
  Training   - Loss: 0.0242, Accuracy: 0.9957, F1: 0.9834
  Validation - Loss: 1.1067, Accuracy: 0.8636, F1: 0.3143
  Learning Rate: 0.00000700
No improvement in f1. Patience: 1/500


Epoch 17/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0025, lr=0.000007]


Epoch 17/30:
  Training   - Loss: 0.0065, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.1465, Accuracy: 0.8693, F1: 0.3030
  Learning Rate: 0.00000700
No improvement in f1. Patience: 2/500


Epoch 18/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0004, lr=0.000007]


Epoch 18/30:
  Training   - Loss: 0.0052, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.2107, Accuracy: 0.8750, F1: 0.3125
  Learning Rate: 0.00000700
No improvement in f1. Patience: 3/500


Epoch 19/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 19/30:
  Training   - Loss: 0.0021, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.2784, Accuracy: 0.8835, F1: 0.3051
  Learning Rate: 0.00000700
No improvement in f1. Patience: 4/500


Epoch 20/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000007]


Epoch 20/30:
  Training   - Loss: 0.0122, Accuracy: 0.9972, F1: 0.9890
  Validation - Loss: 1.3048, Accuracy: 0.8722, F1: 0.3077
  Learning Rate: 0.00000700
No improvement in f1. Patience: 5/500


Epoch 21/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000007]


Epoch 21/30:
  Training   - Loss: 0.0022, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.3341, Accuracy: 0.8665, F1: 0.2985
  Learning Rate: 0.00000700
No improvement in f1. Patience: 6/500


Epoch 22/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000007]


Epoch 22/30:
  Training   - Loss: 0.0076, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.3822, Accuracy: 0.8750, F1: 0.2903
  Learning Rate: 0.00000700
No improvement in f1. Patience: 7/500


Epoch 23/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000007]


Epoch 23/30:
  Training   - Loss: 0.0031, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.3544, Accuracy: 0.8580, F1: 0.2857
  Learning Rate: 0.00000700
No improvement in f1. Patience: 8/500


Epoch 24/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 24/30:
  Training   - Loss: 0.0015, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.3793, Accuracy: 0.8835, F1: 0.3051
  Learning Rate: 0.00000700
No improvement in f1. Patience: 9/500


Epoch 25/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 25/30:
  Training   - Loss: 0.0004, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.3769, Accuracy: 0.8636, F1: 0.2941
  Learning Rate: 0.00000700
No improvement in f1. Patience: 10/500


Epoch 26/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000007]


Epoch 26/30:
  Training   - Loss: 0.0002, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.4141, Accuracy: 0.8778, F1: 0.2951
  Learning Rate: 0.00000700
No improvement in f1. Patience: 11/500


Epoch 27/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000007]


Epoch 27/30:
  Training   - Loss: 0.0003, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.4261, Accuracy: 0.8750, F1: 0.2903
  Learning Rate: 0.00000700
No improvement in f1. Patience: 12/500


Epoch 28/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 28/30:
  Training   - Loss: 0.0069, Accuracy: 0.9972, F1: 0.9891
  Validation - Loss: 1.4247, Accuracy: 0.8722, F1: 0.3284
  Learning Rate: 0.00000700
No improvement in f1. Patience: 13/500


Epoch 29/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000007]


Epoch 29/30:
  Training   - Loss: 0.0004, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.5155, Accuracy: 0.8778, F1: 0.2712
  Learning Rate: 0.00000700
No improvement in f1. Patience: 14/500


Epoch 30/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 30/30:
  Training   - Loss: 0.0032, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.4932, Accuracy: 0.8750, F1: 0.3125
  Learning Rate: 0.00000700
No improvement in f1. Patience: 15/500

Final model evaluation...

TRAINING COMPLETE
Best validation loss: 1.0569
Best validation accuracy: 0.8722
Best validation F1: 0.3478
Best f1: 0.3478


0,1
best_f1,▁▆▇██
best_valid_accuracy,▂█▅█▁
best_valid_f1,▁▆▇██
best_valid_loss,▁▁▂▂█
epoch,▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▇▇▇▇███
epochs_trained,▁
final_valid_accuracy,▁
final_valid_f1,▁
final_valid_loss,▁
learning_rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_f1,0.34783
best_valid_accuracy,0.87216
best_valid_f1,0.34783
best_valid_loss,1.05691
early_stopped,False
epoch,30
epochs_trained,30
final_valid_accuracy,0.87216
final_valid_f1,0.34783
final_valid_loss,1.05691


In [14]:
very_slow_lr_withA = train_router_quality_classifier(
    # Model configuration
    dropout_rate=0.3,
    
    # Training hyperparameters
    learning_rate=0.000007,
    n_epochs=30,
    weight_decay=0.0001,
    
    # Learning rate scheduling
    enable_lr_annealing=True,
    
    # Experiment settings
    experiment_name="very_slow_lr_withA",
    use_wandb=True  # Enable wandb logging
)

Reproducible environment set with seed: 9999



Starting training run: very_slow_lr_withA_20250813_183459
Using dataset: rated_enhanced_dataset_meta_llama_llama_3.1_8b_instruct_20250811_140130_20250811_151904.jsonl

Loading rated dataset...
Loaded 1756 items with ratings
Dataset after filtering: 1756 items

=== DATASET ANALYSIS ===
Total samples: 1756
Label 0 (Good quality, rating >= 4): 1526 (86.9%)
Label 1 (Poor quality, rating < 4): 230 (13.1%)
Class balance ratio (0:1): 6.63:1

=== RATING DISTRIBUTION ===
Rating 1: 37 (2.1%)
Rating 2: 107 (6.1%)
Rating 3: 86 (4.9%)
Rating 4: 329 (18.7%)
Rating 5: 1197 (68.2%)

Training samples: 1404
Validation samples: 352
DataLoaders created successfully

Initializing model: distilbert-base-uncased
All transformer layers will be fine-tuned
📱 Model moved to device: cuda

=== MODEL PARAMETER ANALYSIS ===
Total parameters: 66,576,257
Trainable parameters: 66,576,257
Frozen parameters: 0
Percentage trainable: 100.0%
LR Annealing enabled: factor=0.5, patience=5

Starting initial validation...
Initi

Epoch 1/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.3130, lr=0.000007]


Epoch 1/30:
  Training   - Loss: 0.5000, Accuracy: 0.7678, F1: 0.1641
  Validation - Loss: 0.3890, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00000700
No improvement in f1. Patience: 1/500


Epoch 2/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.2053, lr=0.000007]


Epoch 2/30:
  Training   - Loss: 0.3966, Accuracy: 0.8640, F1: 0.0402
  Validation - Loss: 0.3654, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00000700
No improvement in f1. Patience: 2/500


Epoch 3/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.5975, lr=0.000007]


Epoch 3/30:
  Training   - Loss: 0.3803, Accuracy: 0.8654, F1: 0.0207
  Validation - Loss: 0.3712, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00000700
No improvement in f1. Patience: 3/500


Epoch 4/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.2433, lr=0.000007]


Epoch 4/30:
  Training   - Loss: 0.3557, Accuracy: 0.8725, F1: 0.0725
  Validation - Loss: 0.3424, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00000700
No improvement in f1. Patience: 4/500


Epoch 5/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.5869, lr=0.000007]


Epoch 5/30:
  Training   - Loss: 0.3375, Accuracy: 0.8732, F1: 0.1100
  Validation - Loss: 0.3690, Accuracy: 0.8722, F1: 0.0426
  Learning Rate: 0.00000700
New best model saved with f1: 0.0426


Epoch 6/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.4736, lr=0.000007]


Epoch 6/30:
  Training   - Loss: 0.3217, Accuracy: 0.8768, F1: 0.1722
  Validation - Loss: 0.3336, Accuracy: 0.8864, F1: 0.2593
  Learning Rate: 0.00000700
New best model saved with f1: 0.2593


Epoch 7/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.4289, lr=0.000007]


Epoch 7/30:
  Training   - Loss: 0.2840, Accuracy: 0.8917, F1: 0.3719
  Validation - Loss: 0.3911, Accuracy: 0.8835, F1: 0.2807
  Learning Rate: 0.00000700
New best model saved with f1: 0.2807


Epoch 8/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.4588, lr=0.000007]


Epoch 8/30:
  Training   - Loss: 0.2302, Accuracy: 0.8939, F1: 0.4157
  Validation - Loss: 0.4045, Accuracy: 0.8864, F1: 0.3548
  Learning Rate: 0.00000700
New best model saved with f1: 0.3548


Epoch 9/30: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.1175, lr=0.000007]


Epoch 9/30:
  Training   - Loss: 0.1987, Accuracy: 0.9252, F1: 0.6392
  Validation - Loss: 0.4675, Accuracy: 0.8835, F1: 0.3051
  Learning Rate: 0.00000700
No improvement in f1. Patience: 1/500


Epoch 10/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.1107, lr=0.000007]


Epoch 10/30:
  Training   - Loss: 0.1603, Accuracy: 0.9494, F1: 0.7746
  Validation - Loss: 0.5674, Accuracy: 0.8807, F1: 0.3000
  Learning Rate: 0.00000700
No improvement in f1. Patience: 2/500


Epoch 11/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0418, lr=0.000007]


Epoch 11/30:
  Training   - Loss: 0.1016, Accuracy: 0.9679, F1: 0.8680
  Validation - Loss: 0.6451, Accuracy: 0.8722, F1: 0.2857
  Learning Rate: 0.00000700
No improvement in f1. Patience: 3/500


Epoch 12/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0045, lr=0.000007]


Epoch 12/30:
  Training   - Loss: 0.0516, Accuracy: 0.9865, F1: 0.9465
  Validation - Loss: 0.7655, Accuracy: 0.8608, F1: 0.2687
  Learning Rate: 0.00000700
No improvement in f1. Patience: 4/500


Epoch 13/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0030, lr=0.000007]


Epoch 13/30:
  Training   - Loss: 0.0402, Accuracy: 0.9900, F1: 0.9609
  Validation - Loss: 0.9146, Accuracy: 0.8778, F1: 0.2951
  Learning Rate: 0.00000700
No improvement in f1. Patience: 5/500


Epoch 14/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0011, lr=0.000007]


Epoch 14/30:
  Training   - Loss: 0.0281, Accuracy: 0.9950, F1: 0.9806
  Validation - Loss: 0.9227, Accuracy: 0.8750, F1: 0.3125
  Learning Rate: 0.00000700
No improvement in f1. Patience: 6/500


Epoch 15/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0069, lr=0.000007]


Epoch 15/30:
  Training   - Loss: 0.0185, Accuracy: 0.9964, F1: 0.9863
  Validation - Loss: 1.0341, Accuracy: 0.8636, F1: 0.3143
  Learning Rate: 0.00000700
No improvement in f1. Patience: 7/500


Epoch 16/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0008, lr=0.000007]


Epoch 16/30:
  Training   - Loss: 0.0241, Accuracy: 0.9943, F1: 0.9779
  Validation - Loss: 1.0913, Accuracy: 0.8722, F1: 0.2857
  Learning Rate: 0.00000700
No improvement in f1. Patience: 8/500


Epoch 17/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0024, lr=0.000007]


Epoch 17/30:
  Training   - Loss: 0.0110, Accuracy: 0.9979, F1: 0.9918
  Validation - Loss: 1.1530, Accuracy: 0.8636, F1: 0.2727
  Learning Rate: 0.00000700
No improvement in f1. Patience: 9/500


Epoch 18/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0004, lr=0.000007]


Epoch 18/30:
  Training   - Loss: 0.0031, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.2075, Accuracy: 0.8636, F1: 0.2727
  Learning Rate: 0.00000700
No improvement in f1. Patience: 10/500


Epoch 19/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 19/30:
  Training   - Loss: 0.0029, Accuracy: 0.9986, F1: 0.9945
  Validation - Loss: 1.2469, Accuracy: 0.8750, F1: 0.2903
  Learning Rate: 0.00000700
No improvement in f1. Patience: 11/500


Epoch 20/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000007]


Epoch 20/30:
  Training   - Loss: 0.0051, Accuracy: 0.9986, F1: 0.9945
  Validation - Loss: 1.2921, Accuracy: 0.8608, F1: 0.2687
  Learning Rate: 0.00000700
No improvement in f1. Patience: 12/500


Epoch 21/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 21/30:
  Training   - Loss: 0.0027, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.3241, Accuracy: 0.8750, F1: 0.2903
  Learning Rate: 0.00000700
No improvement in f1. Patience: 13/500


Epoch 22/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000007]


Epoch 22/30:
  Training   - Loss: 0.0076, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.3283, Accuracy: 0.8778, F1: 0.2951
  Learning Rate: 0.00000700
No improvement in f1. Patience: 14/500


Epoch 23/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000007]


Epoch 23/30:
  Training   - Loss: 0.0037, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.3322, Accuracy: 0.8693, F1: 0.3030
  Learning Rate: 0.00000700
No improvement in f1. Patience: 15/500


Epoch 24/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 24/30:
  Training   - Loss: 0.0022, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.3623, Accuracy: 0.8750, F1: 0.2903
  Learning Rate: 0.00000700
No improvement in f1. Patience: 16/500


Epoch 25/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 25/30:
  Training   - Loss: 0.0004, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.3259, Accuracy: 0.8693, F1: 0.3030
  Learning Rate: 0.00000700
No improvement in f1. Patience: 17/500


Epoch 26/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0002, lr=0.000007]


Epoch 26/30:
  Training   - Loss: 0.0003, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.4059, Accuracy: 0.8722, F1: 0.2857
  Learning Rate: 0.00000700
No improvement in f1. Patience: 18/500


Epoch 27/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 27/30:
  Training   - Loss: 0.0003, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.3957, Accuracy: 0.8778, F1: 0.2951
  Learning Rate: 0.00000700
No improvement in f1. Patience: 19/500


Epoch 28/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 28/30:
  Training   - Loss: 0.0033, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.4000, Accuracy: 0.8778, F1: 0.3385
  Learning Rate: 0.00000700
No improvement in f1. Patience: 20/500


Epoch 29/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0001, lr=0.000007]


Epoch 29/30:
  Training   - Loss: 0.0007, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.4321, Accuracy: 0.8722, F1: 0.3284
  Learning Rate: 0.00000700
No improvement in f1. Patience: 21/500


Epoch 30/30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000007]


Epoch 30/30:
  Training   - Loss: 0.0057, Accuracy: 0.9993, F1: 0.9973
  Validation - Loss: 1.4559, Accuracy: 0.8807, F1: 0.3438
  Learning Rate: 0.00000700
No improvement in f1. Patience: 22/500

Final model evaluation...

TRAINING COMPLETE
Best validation loss: 0.4045
Best validation accuracy: 0.8864
Best validation F1: 0.3548
Best f1: 0.3548


0,1
best_f1,▁▆▆█
best_valid_accuracy,▁█▇█
best_valid_f1,▁▆▆█
best_valid_loss,▄▁▇█
epoch,▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▇▇▇▇███
epochs_trained,▁
final_valid_accuracy,▁
final_valid_f1,▁
final_valid_loss,▁
learning_rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_f1,0.35484
best_valid_accuracy,0.88636
best_valid_f1,0.35484
best_valid_loss,0.40445
early_stopped,False
epoch,30
epochs_trained,30
final_valid_accuracy,0.88636
final_valid_f1,0.35484
final_valid_loss,0.40445


### Hyperparameter Comparison

In [10]:
# Compare different configurations
configs = [
    {
        "name": "low_dropout",
        "dropout_rate": 0.1,
        "learning_rate": 0.0001,
        "n_epochs": 30
    },
    {
        "name": "high_dropout", 
        "dropout_rate": 0.5,
        "learning_rate": 0.0001,
        "n_epochs": 30
    },
    {
        "name": "higher_lr",
        "dropout_rate": 0.3,
        "learning_rate": 0.0005,
        "n_epochs": 30
    }
]

comparison_results = []

for config in configs:
    print(f"\nTesting configuration: {config['name']}")
    
    result = train_router_quality_classifier(
        experiment_name=f"comparison_{config['name']}",
        dropout_rate=config['dropout_rate'],
        learning_rate=config['learning_rate'],
        n_epochs=30,
        use_wandb=True
    )
    
    comparison_results.append({
        "config": config['name'],
        "f1_score": result['best_metrics']['f1'],
        "accuracy": result['best_metrics']['accuracy'],
        "epochs": result['training_info']['epochs_trained']
    })

# Display comparison
print("\nConfiguration Comparison:")
print("-" * 60)
print(f"{'Config':<15} {'F1 Score':<10} {'Accuracy':<10} {'Epochs':<8}")
print("-" * 60)
for result in comparison_results:
    print(f"{result['config']:<15} {result['f1_score']:<10.4f} {result['accuracy']:<10.4f} {result['epochs']:<8}")


Testing configuration: low_dropout
Reproducible environment set with seed: 9999



Starting training run: comparison_low_dropout_20250813_015447
Using dataset: rated_enhanced_dataset_meta_llama_llama_3.1_8b_instruct_20250811_140130_20250811_151904.jsonl

Loading rated dataset...
Loaded 1756 items with ratings
Dataset after filtering: 1756 items

=== DATASET ANALYSIS ===
Total samples: 1756
Label 0 (Good quality, rating >= 4): 1526 (86.9%)
Label 1 (Poor quality, rating < 4): 230 (13.1%)
Class balance ratio (0:1): 6.63:1

=== RATING DISTRIBUTION ===
Rating 1: 37 (2.1%)
Rating 2: 107 (6.1%)
Rating 3: 86 (4.9%)
Rating 4: 329 (18.7%)
Rating 5: 1197 (68.2%)

Training samples: 1404
Validation samples: 352
DataLoaders created successfully

Initializing model: distilbert-base-uncased
All transformer layers will be fine-tuned
📱 Model moved to device: cuda

=== MODEL PARAMETER ANALYSIS ===
Total parameters: 66,576,257
Trainable parameters: 66,576,257
Frozen parameters: 0
Percentage trainable: 100.0%
LR Annealing enabled: factor=0.5, patience=5

Starting initial validation...
A

Epoch 1/30: 100%|██████████| 88/88 [00:47<00:00,  1.85it/s, loss=0.3028, lr=0.000100]



Epoch 1/30:
  Training   - Loss: 0.4040, Accuracy: 0.8583, F1: 0.0569
  Validation - Loss: 0.3556, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 2/30: 100%|██████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0791, lr=0.000100]



Epoch 2/30:
  Training   - Loss: 0.3604, Accuracy: 0.8796, F1: 0.2283
  Validation - Loss: 0.3600, Accuracy: 0.8778, F1: 0.1569
  Learning Rate: 0.00010000
New best model saved with f1: 0.1569
New best model saved with f1: 0.1569


Epoch 3/30: 100%|██████████| 88/88 [00:49<00:00,  1.79it/s, loss=0.7775, lr=0.000100]



Epoch 3/30:
  Training   - Loss: 0.3220, Accuracy: 0.8960, F1: 0.4632
  Validation - Loss: 0.3815, Accuracy: 0.8778, F1: 0.2951
  Learning Rate: 0.00010000
New best model saved with f1: 0.2951
New best model saved with f1: 0.2951


Epoch 4/30: 100%|██████████| 88/88 [00:47<00:00,  1.86it/s, loss=0.0164, lr=0.000100]



Epoch 4/30:
  Training   - Loss: 0.2222, Accuracy: 0.9366, F1: 0.7157
  Validation - Loss: 0.4694, Accuracy: 0.8778, F1: 0.3385
  Learning Rate: 0.00010000
New best model saved with f1: 0.3385
New best model saved with f1: 0.3385


Epoch 5/30: 100%|██████████| 88/88 [00:47<00:00,  1.86it/s, loss=0.0325, lr=0.000100]



Epoch 5/30:
  Training   - Loss: 0.1361, Accuracy: 0.9608, F1: 0.8358
  Validation - Loss: 0.7425, Accuracy: 0.8097, F1: 0.3619
  Learning Rate: 0.00010000
New best model saved with f1: 0.3619
New best model saved with f1: 0.3619


Epoch 6/30: 100%|██████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0007, lr=0.000100]



Epoch 6/30:
  Training   - Loss: 0.0583, Accuracy: 0.9900, F1: 0.9609
  Validation - Loss: 0.8418, Accuracy: 0.8636, F1: 0.3684
  Learning Rate: 0.00010000
New best model saved with f1: 0.3684
New best model saved with f1: 0.3684


Epoch 7/30: 100%|██████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.5804, lr=0.000100]



Epoch 7/30:
  Training   - Loss: 0.0356, Accuracy: 0.9936, F1: 0.9752
  Validation - Loss: 0.9089, Accuracy: 0.8608, F1: 0.4096
  Learning Rate: 0.00010000
New best model saved with f1: 0.4096
New best model saved with f1: 0.4096


Epoch 8/30: 100%|██████████| 88/88 [00:50<00:00,  1.75it/s, loss=0.0113, lr=0.000100]



Epoch 8/30:
  Training   - Loss: 0.0415, Accuracy: 0.9907, F1: 0.9642
  Validation - Loss: 0.9906, Accuracy: 0.8267, F1: 0.3711
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 9/30: 100%|██████████| 88/88 [00:49<00:00,  1.79it/s, loss=0.0003, lr=0.000100]



Epoch 9/30:
  Training   - Loss: 0.0145, Accuracy: 0.9972, F1: 0.9891
  Validation - Loss: 0.9749, Accuracy: 0.8750, F1: 0.3714
  Learning Rate: 0.00010000
No improvement in f1. Patience: 2/500


Epoch 10/30: 100%|██████████| 88/88 [00:49<00:00,  1.79it/s, loss=0.0006, lr=0.000100]



Epoch 10/30:
  Training   - Loss: 0.0267, Accuracy: 0.9957, F1: 0.9835
  Validation - Loss: 0.8646, Accuracy: 0.8636, F1: 0.3684
  Learning Rate: 0.00010000
No improvement in f1. Patience: 3/500


Epoch 11/30: 100%|██████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.0003, lr=0.000100]



Epoch 11/30:
  Training   - Loss: 0.0233, Accuracy: 0.9943, F1: 0.9783
  Validation - Loss: 1.0654, Accuracy: 0.8636, F1: 0.4000
  Learning Rate: 0.00010000
No improvement in f1. Patience: 4/500


Epoch 12/30: 100%|██████████| 88/88 [00:49<00:00,  1.80it/s, loss=0.0001, lr=0.000100]



Epoch 12/30:
  Training   - Loss: 0.0056, Accuracy: 0.9972, F1: 0.9891
  Validation - Loss: 1.1748, Accuracy: 0.8750, F1: 0.3125
  Learning Rate: 0.00010000
No improvement in f1. Patience: 5/500


Epoch 13/30: 100%|██████████| 88/88 [00:49<00:00,  1.77it/s, loss=0.0007, lr=0.000100]



Epoch 13/30:
  Training   - Loss: 0.0106, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.2468, Accuracy: 0.8778, F1: 0.2182
  Learning Rate: 0.00010000
No improvement in f1. Patience: 6/500


Epoch 14/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0003, lr=0.000100]



Epoch 14/30:
  Training   - Loss: 0.0164, Accuracy: 0.9972, F1: 0.9892
  Validation - Loss: 1.1727, Accuracy: 0.8750, F1: 0.3125
  Learning Rate: 0.00010000
No improvement in f1. Patience: 7/500


Epoch 15/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000100]



Epoch 15/30:
  Training   - Loss: 0.0002, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.2682, Accuracy: 0.8580, F1: 0.3243
  Learning Rate: 0.00010000
No improvement in f1. Patience: 8/500


Epoch 16/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0001, lr=0.000100]



Epoch 16/30:
  Training   - Loss: 0.0001, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.3486, Accuracy: 0.8608, F1: 0.3288
  Learning Rate: 0.00010000
No improvement in f1. Patience: 9/500


Epoch 17/30: 100%|██████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0003, lr=0.000100]



Epoch 17/30:
  Training   - Loss: 0.0001, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.3895, Accuracy: 0.8608, F1: 0.3288
  Learning Rate: 0.00010000
No improvement in f1. Patience: 10/500


Epoch 18/30: 100%|██████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0000, lr=0.000100]



Epoch 18/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.4215, Accuracy: 0.8608, F1: 0.3288
  Learning Rate: 0.00010000
No improvement in f1. Patience: 11/500


Epoch 19/30: 100%|██████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0000, lr=0.000100]



Epoch 19/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.4417, Accuracy: 0.8636, F1: 0.3333
  Learning Rate: 0.00010000
No improvement in f1. Patience: 12/500


Epoch 20/30: 100%|██████████| 88/88 [00:48<00:00,  1.80it/s, loss=0.0000, lr=0.000100]



Epoch 20/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.4628, Accuracy: 0.8636, F1: 0.3333
  Learning Rate: 0.00010000
No improvement in f1. Patience: 13/500


Epoch 21/30: 100%|██████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0000, lr=0.000100]



Epoch 21/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.4816, Accuracy: 0.8636, F1: 0.3333
  Learning Rate: 0.00010000
No improvement in f1. Patience: 14/500


Epoch 22/30: 100%|██████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0000, lr=0.000100]



Epoch 22/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.4986, Accuracy: 0.8665, F1: 0.3380
  Learning Rate: 0.00010000
No improvement in f1. Patience: 15/500


Epoch 23/30: 100%|██████████| 88/88 [00:48<00:00,  1.80it/s, loss=0.0000, lr=0.000100]



Epoch 23/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.5168, Accuracy: 0.8636, F1: 0.3143
  Learning Rate: 0.00010000
No improvement in f1. Patience: 16/500


Epoch 24/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000100]



Epoch 24/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.5324, Accuracy: 0.8636, F1: 0.3143
  Learning Rate: 0.00010000
No improvement in f1. Patience: 17/500


Epoch 25/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000100]



Epoch 25/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.5477, Accuracy: 0.8636, F1: 0.3143
  Learning Rate: 0.00010000
No improvement in f1. Patience: 18/500


Epoch 26/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000100]



Epoch 26/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.5651, Accuracy: 0.8636, F1: 0.3143
  Learning Rate: 0.00010000
No improvement in f1. Patience: 19/500


Epoch 27/30: 100%|██████████| 88/88 [00:47<00:00,  1.83it/s, loss=0.0000, lr=0.000100]



Epoch 27/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.5852, Accuracy: 0.8636, F1: 0.3143
  Learning Rate: 0.00010000
No improvement in f1. Patience: 20/500


Epoch 28/30: 100%|██████████| 88/88 [00:47<00:00,  1.83it/s, loss=0.0000, lr=0.000100]



Epoch 28/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.6000, Accuracy: 0.8636, F1: 0.3143
  Learning Rate: 0.00010000
No improvement in f1. Patience: 21/500


Epoch 29/30: 100%|██████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0000, lr=0.000100]



Epoch 29/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.6091, Accuracy: 0.8722, F1: 0.3077
  Learning Rate: 0.00010000
No improvement in f1. Patience: 22/500


Epoch 30/30: 100%|██████████| 88/88 [00:47<00:00,  1.83it/s, loss=0.0000, lr=0.000100]



Epoch 30/30:
  Training   - Loss: 0.0000, Accuracy: 1.0000, F1: 1.0000
  Validation - Loss: 1.6247, Accuracy: 0.8722, F1: 0.3077
  Learning Rate: 0.00010000
No improvement in f1. Patience: 23/500

Final model evaluation...

TRAINING COMPLETE
Best validation loss: 0.9089
Best validation accuracy: 0.8608
Best validation F1: 0.4096
Best f1: 0.4096

TRAINING COMPLETE
Best validation loss: 0.9089
Best validation accuracy: 0.8608
Best validation F1: 0.4096
Best f1: 0.4096


[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


0,1
best_f1,▁▅▆▇▇█
best_valid_accuracy,███▁▇▆
best_valid_f1,▁▅▆▇▇█
best_valid_loss,▁▁▂▆▇█
epoch,▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▇▇▇▇███
epochs_trained,▁
final_valid_accuracy,▁
final_valid_f1,▁
final_valid_loss,▁
learning_rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_f1,0.40964
best_valid_accuracy,0.8608
best_valid_f1,0.40964
best_valid_loss,0.90886
early_stopped,False
epoch,30
epochs_trained,30
final_valid_accuracy,0.8608
final_valid_f1,0.40964
final_valid_loss,0.90886



Testing configuration: high_dropout
Reproducible environment set with seed: 9999



Starting training run: comparison_high_dropout_20250813_022115
Using dataset: rated_enhanced_dataset_meta_llama_llama_3.1_8b_instruct_20250811_140130_20250811_151904.jsonl

Loading rated dataset...
Loaded 1756 items with ratings
Dataset after filtering: 1756 items

=== DATASET ANALYSIS ===
Total samples: 1756
Label 0 (Good quality, rating >= 4): 1526 (86.9%)
Label 1 (Poor quality, rating < 4): 230 (13.1%)
Class balance ratio (0:1): 6.63:1

=== RATING DISTRIBUTION ===
Rating 1: 37 (2.1%)
Rating 2: 107 (6.1%)
Rating 3: 86 (4.9%)
Rating 4: 329 (18.7%)
Rating 5: 1197 (68.2%)

Training samples: 1404
Validation samples: 352
DataLoaders created successfully

Initializing model: distilbert-base-uncased
All transformer layers will be fine-tuned
📱 Model moved to device: cuda

=== MODEL PARAMETER ANALYSIS ===
Total parameters: 66,576,257
Trainable parameters: 66,576,257
Frozen parameters: 0
Percentage trainable: 100.0%
LR Annealing enabled: factor=0.5, patience=5

Starting initial validation...


Epoch 1/30: 100%|██████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.3468, lr=0.000100]



Epoch 1/30:
  Training   - Loss: 0.4883, Accuracy: 0.8105, F1: 0.1014
  Validation - Loss: 0.3791, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 2/30: 100%|██████████| 88/88 [00:47<00:00,  1.83it/s, loss=0.1592, lr=0.000100]



Epoch 2/30:
  Training   - Loss: 0.4484, Accuracy: 0.8640, F1: 0.0683
  Validation - Loss: 0.3606, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00010000
No improvement in f1. Patience: 2/500


Epoch 3/30: 100%|██████████| 88/88 [00:47<00:00,  1.84it/s, loss=1.0691, lr=0.000100]



Epoch 3/30:
  Training   - Loss: 0.4384, Accuracy: 0.8554, F1: 0.1847
  Validation - Loss: 0.3666, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00010000
No improvement in f1. Patience: 3/500


Epoch 4/30: 100%|██████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.8913, lr=0.000100]



Epoch 4/30:
  Training   - Loss: 0.4351, Accuracy: 0.8754, F1: 0.3682
  Validation - Loss: 0.3693, Accuracy: 0.8864, F1: 0.2857
  Learning Rate: 0.00010000
New best model saved with f1: 0.2857
New best model saved with f1: 0.2857


Epoch 5/30: 100%|██████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.5384, lr=0.000100]



Epoch 5/30:
  Training   - Loss: 0.3764, Accuracy: 0.8868, F1: 0.4133
  Validation - Loss: 0.3578, Accuracy: 0.8807, F1: 0.1600
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 6/30: 100%|██████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.4754, lr=0.000100]



Epoch 6/30:
  Training   - Loss: 0.3452, Accuracy: 0.9117, F1: 0.5724
  Validation - Loss: 0.4536, Accuracy: 0.8438, F1: 0.2857
  Learning Rate: 0.00010000
No improvement in f1. Patience: 2/500


Epoch 7/30: 100%|██████████| 88/88 [00:48<00:00,  1.83it/s, loss=0.3468, lr=0.000100]



Epoch 7/30:
  Training   - Loss: 0.3410, Accuracy: 0.9067, F1: 0.5467
  Validation - Loss: 0.4979, Accuracy: 0.8835, F1: 0.2545
  Learning Rate: 0.00010000
No improvement in f1. Patience: 3/500


Epoch 8/30: 100%|██████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.3190, lr=0.000100]



Epoch 8/30:
  Training   - Loss: 0.2772, Accuracy: 0.9338, F1: 0.7085
  Validation - Loss: 0.5124, Accuracy: 0.8523, F1: 0.2778
  Learning Rate: 0.00010000
No improvement in f1. Patience: 4/500


Epoch 9/30: 100%|██████████| 88/88 [00:47<00:00,  1.84it/s, loss=0.0540, lr=0.000100]



Epoch 9/30:
  Training   - Loss: 0.2757, Accuracy: 0.9387, F1: 0.7362
  Validation - Loss: 0.4997, Accuracy: 0.8523, F1: 0.1333
  Learning Rate: 0.00010000
No improvement in f1. Patience: 5/500


Epoch 10/30: 100%|██████████| 88/88 [00:49<00:00,  1.80it/s, loss=0.8909, lr=0.000100]



Epoch 10/30:
  Training   - Loss: 0.2729, Accuracy: 0.9409, F1: 0.7398
  Validation - Loss: 0.4362, Accuracy: 0.8665, F1: 0.2540
  Learning Rate: 0.00010000
No improvement in f1. Patience: 6/500


Epoch 11/30: 100%|██████████| 88/88 [00:48<00:00,  1.80it/s, loss=0.2220, lr=0.000100]



Epoch 11/30:
  Training   - Loss: 0.2306, Accuracy: 0.9452, F1: 0.7688
  Validation - Loss: 0.6042, Accuracy: 0.8750, F1: 0.1852
  Learning Rate: 0.00010000
No improvement in f1. Patience: 7/500


Epoch 12/30: 100%|██████████| 88/88 [00:49<00:00,  1.79it/s, loss=0.0145, lr=0.000100]



Epoch 12/30:
  Training   - Loss: 0.1745, Accuracy: 0.9623, F1: 0.8427
  Validation - Loss: 0.6959, Accuracy: 0.8523, F1: 0.3158
  Learning Rate: 0.00010000
New best model saved with f1: 0.3158
New best model saved with f1: 0.3158


Epoch 13/30: 100%|██████████| 88/88 [00:50<00:00,  1.75it/s, loss=0.0165, lr=0.000100]



Epoch 13/30:
  Training   - Loss: 0.1717, Accuracy: 0.9672, F1: 0.8686
  Validation - Loss: 0.5918, Accuracy: 0.8864, F1: 0.3103
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 14/30: 100%|██████████| 88/88 [00:48<00:00,  1.80it/s, loss=0.0164, lr=0.000100]



Epoch 14/30:
  Training   - Loss: 0.1470, Accuracy: 0.9736, F1: 0.8934
  Validation - Loss: 0.6848, Accuracy: 0.8722, F1: 0.3077
  Learning Rate: 0.00010000
No improvement in f1. Patience: 2/500


Epoch 15/30: 100%|██████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0146, lr=0.000100]



Epoch 15/30:
  Training   - Loss: 0.1223, Accuracy: 0.9765, F1: 0.9065
  Validation - Loss: 0.7956, Accuracy: 0.8750, F1: 0.2667
  Learning Rate: 0.00010000
No improvement in f1. Patience: 3/500


Epoch 16/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0137, lr=0.000100]



Epoch 16/30:
  Training   - Loss: 0.0962, Accuracy: 0.9865, F1: 0.9471
  Validation - Loss: 0.8687, Accuracy: 0.8665, F1: 0.2769
  Learning Rate: 0.00010000
No improvement in f1. Patience: 4/500


Epoch 17/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0156, lr=0.000100]



Epoch 17/30:
  Training   - Loss: 0.0869, Accuracy: 0.9865, F1: 0.9485
  Validation - Loss: 0.9484, Accuracy: 0.8750, F1: 0.3529
  Learning Rate: 0.00010000
New best model saved with f1: 0.3529
New best model saved with f1: 0.3529


Epoch 18/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.6685, lr=0.000100]



Epoch 18/30:
  Training   - Loss: 0.1084, Accuracy: 0.9850, F1: 0.9428
  Validation - Loss: 0.7850, Accuracy: 0.8381, F1: 0.3736
  Learning Rate: 0.00010000
New best model saved with f1: 0.3736
New best model saved with f1: 0.3736


Epoch 19/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0015, lr=0.000100]



Epoch 19/30:
  Training   - Loss: 0.0851, Accuracy: 0.9886, F1: 0.9560
  Validation - Loss: 0.8438, Accuracy: 0.8636, F1: 0.3143
  Learning Rate: 0.00010000
No improvement in f1. Patience: 1/500


Epoch 20/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0042, lr=0.000100]



Epoch 20/30:
  Training   - Loss: 0.0615, Accuracy: 0.9872, F1: 0.9505
  Validation - Loss: 1.0244, Accuracy: 0.8466, F1: 0.3250
  Learning Rate: 0.00010000
No improvement in f1. Patience: 2/500


Epoch 21/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0008, lr=0.000100]



Epoch 21/30:
  Training   - Loss: 0.0453, Accuracy: 0.9929, F1: 0.9725
  Validation - Loss: 1.0827, Accuracy: 0.8608, F1: 0.2899
  Learning Rate: 0.00010000
No improvement in f1. Patience: 3/500


Epoch 22/30: 100%|██████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.3764, lr=0.000100]



Epoch 22/30:
  Training   - Loss: 0.0448, Accuracy: 0.9936, F1: 0.9755
  Validation - Loss: 1.1635, Accuracy: 0.8608, F1: 0.2899
  Learning Rate: 0.00010000
No improvement in f1. Patience: 4/500


Epoch 23/30: 100%|██████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0010, lr=0.000100]



Epoch 23/30:
  Training   - Loss: 0.0127, Accuracy: 0.9972, F1: 0.9891
  Validation - Loss: 1.2995, Accuracy: 0.8693, F1: 0.3235
  Learning Rate: 0.00010000
No improvement in f1. Patience: 5/500


Epoch 24/30: 100%|██████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0001, lr=0.000100]



Epoch 24/30:
  Training   - Loss: 0.0136, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.3693, Accuracy: 0.8722, F1: 0.3077
  Learning Rate: 0.00010000
No improvement in f1. Patience: 6/500


Epoch 25/30: 100%|██████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0000, lr=0.000100]



Epoch 25/30:
  Training   - Loss: 0.0153, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.3580, Accuracy: 0.8722, F1: 0.3284
  Learning Rate: 0.00010000
No improvement in f1. Patience: 7/500


Epoch 26/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0014, lr=0.000100]



Epoch 26/30:
  Training   - Loss: 0.0125, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.4197, Accuracy: 0.8722, F1: 0.3284
  Learning Rate: 0.00010000
No improvement in f1. Patience: 8/500


Epoch 27/30: 100%|██████████| 88/88 [00:48<00:00,  1.81it/s, loss=0.0040, lr=0.000100]



Epoch 27/30:
  Training   - Loss: 0.0155, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.4088, Accuracy: 0.8722, F1: 0.3284
  Learning Rate: 0.00010000
No improvement in f1. Patience: 9/500


Epoch 28/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0002, lr=0.000100]



Epoch 28/30:
  Training   - Loss: 0.0177, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.4163, Accuracy: 0.8722, F1: 0.3284
  Learning Rate: 0.00010000
No improvement in f1. Patience: 10/500


Epoch 29/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0007, lr=0.000100]



Epoch 29/30:
  Training   - Loss: 0.0169, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.4614, Accuracy: 0.8722, F1: 0.3284
  Learning Rate: 0.00010000
No improvement in f1. Patience: 11/500


Epoch 30/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.0000, lr=0.000100]



Epoch 30/30:
  Training   - Loss: 0.0186, Accuracy: 0.9986, F1: 0.9946
  Validation - Loss: 1.4244, Accuracy: 0.8693, F1: 0.3235
  Learning Rate: 0.00010000
No improvement in f1. Patience: 12/500

Final model evaluation...

TRAINING COMPLETE
Best validation loss: 0.7850
Best validation accuracy: 0.8381
Best validation F1: 0.3736
Best f1: 0.3736

TRAINING COMPLETE
Best validation loss: 0.7850
Best validation accuracy: 0.8381
Best validation F1: 0.3736
Best f1: 0.3736


[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


0,1
best_f1,▁▃▆█
best_valid_accuracy,█▃▆▁
best_valid_f1,▁▃▆█
best_valid_loss,▁▅█▆
epoch,▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▇▇▇▇███
epochs_trained,▁
final_valid_accuracy,▁
final_valid_f1,▁
final_valid_loss,▁
learning_rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_f1,0.37363
best_valid_accuracy,0.83807
best_valid_f1,0.37363
best_valid_loss,0.78505
early_stopped,False
epoch,30
epochs_trained,30
final_valid_accuracy,0.83807
final_valid_f1,0.37363
final_valid_loss,0.78505



Testing configuration: higher_lr
Reproducible environment set with seed: 9999



Starting training run: comparison_higher_lr_20250813_024742
Using dataset: rated_enhanced_dataset_meta_llama_llama_3.1_8b_instruct_20250811_140130_20250811_151904.jsonl

Loading rated dataset...
Loaded 1756 items with ratings
Dataset after filtering: 1756 items

=== DATASET ANALYSIS ===
Total samples: 1756
Label 0 (Good quality, rating >= 4): 1526 (86.9%)
Label 1 (Poor quality, rating < 4): 230 (13.1%)
Class balance ratio (0:1): 6.63:1

=== RATING DISTRIBUTION ===
Rating 1: 37 (2.1%)
Rating 2: 107 (6.1%)
Rating 3: 86 (4.9%)
Rating 4: 329 (18.7%)
Rating 5: 1197 (68.2%)

Training samples: 1404
Validation samples: 352
DataLoaders created successfully

Initializing model: distilbert-base-uncased
All transformer layers will be fine-tuned
📱 Model moved to device: cuda

=== MODEL PARAMETER ANALYSIS ===
Total parameters: 66,576,257
Trainable parameters: 66,576,257
Frozen parameters: 0
Percentage trainable: 100.0%
LR Annealing enabled: factor=0.5, patience=5

Starting initial validation...
All

Epoch 1/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.3527, lr=0.000500]



Epoch 1/30:
  Training   - Loss: 0.4749, Accuracy: 0.8462, F1: 0.0442
  Validation - Loss: 0.3933, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 1/500


Epoch 2/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.1653, lr=0.000500]



Epoch 2/30:
  Training   - Loss: 0.4610, Accuracy: 0.8611, F1: 0.0102
  Validation - Loss: 0.3882, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 2/500


Epoch 3/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.5236, lr=0.000500]



Epoch 3/30:
  Training   - Loss: 0.4288, Accuracy: 0.8682, F1: 0.0000
  Validation - Loss: 0.3887, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 3/500


Epoch 4/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.4880, lr=0.000500]



Epoch 4/30:
  Training   - Loss: 0.4277, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.4123, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 4/500


Epoch 5/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.6032, lr=0.000500]



Epoch 5/30:
  Training   - Loss: 0.4319, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.4849, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 5/500


Epoch 6/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.6540, lr=0.000500]



Epoch 6/30:
  Training   - Loss: 0.4270, Accuracy: 0.8654, F1: 0.0000
  Validation - Loss: 0.3948, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 6/500


Epoch 7/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.7129, lr=0.000500]



Epoch 7/30:
  Training   - Loss: 0.4414, Accuracy: 0.8682, F1: 0.0000
  Validation - Loss: 0.3925, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 7/500


Epoch 8/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.3017, lr=0.000500]



Epoch 8/30:
  Training   - Loss: 0.4193, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.3878, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 8/500


Epoch 9/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.3600, lr=0.000500]



Epoch 9/30:
  Training   - Loss: 0.4177, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.3878, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 9/500


Epoch 10/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.6265, lr=0.000500]



Epoch 10/30:
  Training   - Loss: 0.4113, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.4015, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 10/500


Epoch 11/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.1797, lr=0.000500]



Epoch 11/30:
  Training   - Loss: 0.4127, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.3899, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 11/500


Epoch 12/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.1874, lr=0.000500]



Epoch 12/30:
  Training   - Loss: 0.4045, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.3884, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 12/500


Epoch 13/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.2778, lr=0.000500]



Epoch 13/30:
  Training   - Loss: 0.4067, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.3897, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 13/500


Epoch 14/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.3200, lr=0.000500]



Epoch 14/30:
  Training   - Loss: 0.4039, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.3881, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 14/500


Epoch 15/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.8661, lr=0.000500]



Epoch 15/30:
  Training   - Loss: 0.4127, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.3880, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 15/500


Epoch 16/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.5300, lr=0.000500]



Epoch 16/30:
  Training   - Loss: 0.4104, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.4211, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 16/500


Epoch 17/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.4729, lr=0.000500]



Epoch 17/30:
  Training   - Loss: 0.4099, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.3906, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 17/500


Epoch 18/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.1277, lr=0.000500]



Epoch 18/30:
  Training   - Loss: 0.3980, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.3898, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 18/500


Epoch 19/30: 100%|██████████| 88/88 [00:48<00:00,  1.82it/s, loss=0.2801, lr=0.000500]



Epoch 19/30:
  Training   - Loss: 0.4053, Accuracy: 0.8689, F1: 0.0000
  Validation - Loss: 0.4011, Accuracy: 0.8693, F1: 0.0000
  Learning Rate: 0.00050000
No improvement in f1. Patience: 19/500


Epoch 20/30:  86%|████████▋ | 76/88 [00:41<00:06,  1.81it/s, loss=0.7651, lr=0.000500]

: 

## Conclusion: Everything overfits, this method is not appropriate for such a dataset.

### What else could be tried?
- Another embedding model
- Smaller last layer sizes
- Large dataset
- Another framework (papers newer than RouteLLM)

## What if had worked?
- We could explore Performance Gap Recovery
- We could tune Call-Performance Threshold