# Notebook 2: Model Training & Hyperparameter Optimization
## Method A - Sentiment Analysis with Domain Adaptation

### Objectives:
1. Train TF-IDF + FFNN classifier
2. Train E5/MiniLM embedding classifier
3. Fine-tune BERT for sentiment analysis
4. Hyperparameter optimization with Optuna
5. Save best models for evaluation

### Models to Train:
- **Model 1**: TF-IDF features + Feed-Forward Neural Network
- **Model 2**: E5/MiniLM embeddings + Classifier head
- **Model 3**: BERT fine-tuning (bert-base-uncased)

### Training Strategy:
- Train on: IMDB + Yelp combined (multi-domain)
- Validate on: IMDB + Yelp validation split
- Test on: Amazon (cross-domain) - in Notebook 3

---

In [1]:
# Install required packages
# Run this cell only once
%pip install torch transformers sentence-transformers scikit-learn optuna tqdm numpy pandas

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
## Setup & Imports - Part 1: Core Libraries

import os
import json
import pickle
import random
from pathlib import Path
from typing import Dict, List, Tuple
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
from tqdm.auto import tqdm

print("✓ Core libraries loaded")

✓ Core libraries loaded


In [3]:
## Setup & Imports - Part 2: PyTorch & Deep Learning
print("Loading PyTorch... (this may take 30-60 seconds)")

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

print(f"✓ PyTorch loaded (version: {torch.__version__})")
print(f"  CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"  GPU: {torch.cuda.get_device_name(0)}")

Loading PyTorch... (this may take 30-60 seconds)
✓ PyTorch loaded (version: 2.4.1+cu121)
  CUDA available: True
  GPU: NVIDIA A100 80GB PCIe
✓ PyTorch loaded (version: 2.4.1+cu121)
  CUDA available: True
  GPU: NVIDIA A100 80GB PCIe


In [4]:
## Additional imports for K-Fold Cross-Validation
from sklearn.model_selection import StratifiedKFold
import copy

print("✓ K-Fold CV imports loaded")

✓ K-Fold CV imports loaded


In [5]:
## Setup & Imports - Part 3: Transformers & Embeddings
print("Loading Transformers library... (this may take 30-60 seconds)")

from transformers import (
    AutoTokenizer,
    AutoModel,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
    EarlyStoppingCallback,
    TrainerCallback
)
from sentence_transformers import SentenceTransformer

print("✓ Transformers library loaded")

Loading Transformers library... (this may take 30-60 seconds)
✓ Transformers library loaded
✓ Transformers library loaded


In [6]:
## Setup & Imports - Part 4: Scikit-learn & Optuna

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (
    accuracy_score,
    precision_recall_fscore_support,
    classification_report,
    confusion_matrix
)

import optuna
from optuna.samplers import TPESampler

print("✓ Scikit-learn and Optuna loaded")

✓ Scikit-learn and Optuna loaded


In [7]:
## Configuration & Setup

# Set random seeds for reproducibility (matching LoRA approach)
SEED = 42

def set_seed(seed: int = 42):
    """Set random seed for reproducibility across all libraries."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    print(f"✓ Random seed set to {seed}")

set_seed(SEED)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Directory configuration
# Data: shared with LoRA (../data/)
# Models: local to fine_tuning/
DATA_DIR = Path('../data/processed')  # Shared data
MODELS_DIR = Path('models')           # Local models
LOGS_DIR = Path('outputs/logs')       # Local logs

# Create directories
(MODELS_DIR / 'tfidf_ffnn').mkdir(parents=True, exist_ok=True)
(MODELS_DIR / 'e5_classifier').mkdir(parents=True, exist_ok=True)
(MODELS_DIR / 'bert_finetuned').mkdir(parents=True, exist_ok=True)
LOGS_DIR.mkdir(parents=True, exist_ok=True)

# Training configuration (matching LoRA)
MAX_LENGTH = 256  # Token limit
N_TRIALS = 10     # Optuna trials
N_SPLITS = 3      # K-fold splits

print("✓ Setup complete")
print(f"  Working directory: {os.getcwd()}")
print(f"  Data directory (shared): {DATA_DIR.absolute()}")
print(f"  Models directory (local): {MODELS_DIR.absolute()}")
print(f"  Max length: {MAX_LENGTH}")
print(f"  Optuna trials: {N_TRIALS}")
print(f"  K-fold splits: {N_SPLITS}")


✓ Random seed set to 42
Using device: cuda
✓ Setup complete
  Working directory: /home3/rudrapra001/FYP-Research/export_package/fine_tuning
  Data directory (shared): /home3/rudrapra001/FYP-Research/export_package/fine_tuning/../data/processed
  Models directory (local): /home3/rudrapra001/FYP-Research/export_package/fine_tuning/models
  Max length: 256
  Optuna trials: 10
  K-fold splits: 3


---
## Part 1: Load Processed Data

Load the preprocessed datasets from Notebook 1.

---

In [8]:
## 1.1: Load Processed Data

def load_json_data(filepath):
    """Load JSON dataset"""
    with open(filepath, 'r', encoding='utf-8') as f:
        return json.load(f)

print("Loading datasets from ../data/processed/...")

# Load datasets created by first.ipynb (shared data directory)
train_data = load_json_data(DATA_DIR / 'train.json')
val_data = load_json_data(DATA_DIR / 'eval.json')

# Load summary
summary = load_json_data(DATA_DIR / 'dataset_summary.json')

print(f"\n✓ Datasets loaded:")
print(f"  Train: {len(train_data):,} samples (IMDB + Yelp)")
print(f"  Eval:  {len(val_data):,} samples (IMDB + Yelp)")

# Extract texts and labels
train_texts = [item['text'] for item in train_data]
train_labels = [item['label'] for item in train_data]

val_texts = [item['text'] for item in val_data]
val_labels = [item['label'] for item in val_data]

print("\n✓ Data extracted and ready for training")
print(f"  Train samples: {len(train_texts):,}")
print(f"  Eval samples:  {len(val_texts):,}")


Loading datasets from ../data/processed/...

✓ Datasets loaded:
  Train: 48,000 samples (IMDB + Yelp)
  Eval:  12,000 samples (IMDB + Yelp)

✓ Data extracted and ready for training
  Train samples: 48,000
  Eval samples:  12,000

✓ Datasets loaded:
  Train: 48,000 samples (IMDB + Yelp)
  Eval:  12,000 samples (IMDB + Yelp)

✓ Data extracted and ready for training
  Train samples: 48,000
  Eval samples:  12,000


In [9]:
## 1.2: Data Statistics

def print_label_distribution(labels, name):
    """Print label distribution"""
    unique, counts = np.unique(labels, return_counts=True)
    print(f"\n{name} label distribution:")
    for label, count in zip(unique, counts):
        sentiment = 'Positive' if label == 1 else 'Negative'
        print(f"  {sentiment}: {count:,} ({count/len(labels)*100:.1f}%)")

print_label_distribution(train_labels, "Training")
print_label_distribution(val_labels, "Validation")


Training label distribution:
  Negative: 24,030 (50.1%)
  Positive: 23,970 (49.9%)

Validation label distribution:
  Negative: 5,970 (49.8%)
  Positive: 6,030 (50.2%)


---
## Part 2: Model 1 - TF-IDF + Feed-Forward Neural Network

Classic approach: Convert text to TF-IDF features, then train a neural network.

---

In [10]:
## 2.1: TF-IDF Feature Extraction

print("Creating TF-IDF features...")
print("This may take 2-3 minutes...\n")

# Initialize TF-IDF vectorizer
tfidf_vectorizer = TfidfVectorizer(
    max_features=10000,      # Top 10k features
    ngram_range=(1, 2),      # Unigrams and bigrams
    min_df=5,                 # Minimum document frequency
    max_df=0.8,               # Maximum document frequency
    lowercase=True,
    strip_accents='unicode',
    stop_words='english'
)

# Fit on training data and transform
X_train_tfidf = tfidf_vectorizer.fit_transform(train_texts)
X_val_tfidf = tfidf_vectorizer.transform(val_texts)

# Convert to dense arrays for PyTorch
X_train_tfidf = X_train_tfidf.toarray()
X_val_tfidf = X_val_tfidf.toarray()

print(f"✓ TF-IDF features created:")
print(f"  Train shape: {X_train_tfidf.shape}")
print(f"  Val shape:   {X_val_tfidf.shape}")
print(f"  Vocabulary size: {len(tfidf_vectorizer.vocabulary_):,}")

# Save vectorizer
with open('models/tfidf_ffnn/tfidf_vectorizer.pkl', 'wb') as f:
    pickle.dump(tfidf_vectorizer, f)
print("\n✓ TF-IDF vectorizer saved")

Creating TF-IDF features...
This may take 2-3 minutes...

✓ TF-IDF features created:
  Train shape: (48000, 10000)
  Val shape:   (12000, 10000)
  Vocabulary size: 10,000
✓ TF-IDF features created:
  Train shape: (48000, 10000)
  Val shape:   (12000, 10000)
  Vocabulary size: 10,000

✓ TF-IDF vectorizer saved

✓ TF-IDF vectorizer saved


In [11]:
## 2.2: Define FFNN Model

class FFNNClassifier(nn.Module):
    """Feed-Forward Neural Network for classification"""
    
    def __init__(self, input_dim, hidden_dims, dropout_rate=0.3):
        super(FFNNClassifier, self).__init__()
        
        layers = []
        prev_dim = input_dim
        
        # Hidden layers
        for hidden_dim in hidden_dims:
            layers.append(nn.Linear(prev_dim, hidden_dim))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))
            layers.append(nn.BatchNorm1d(hidden_dim))
            prev_dim = hidden_dim
        
        # Output layer
        layers.append(nn.Linear(prev_dim, 2))  # Binary classification
        
        self.network = nn.Sequential(*layers)
    
    def forward(self, x):
        return self.network(x)

print("✓ FFNN model class defined")

✓ FFNN model class defined


In [12]:
## 2.3: FFNN Training Function

def train_ffnn(model, train_loader, val_loader, optimizer, criterion, 
               num_epochs=20, patience=3, device='cpu', save_path=None):
    """Train FFNN with early stopping
    
    Args:
        save_path: Optional path to save best model. If None, model is not saved.
    """
    
    best_val_loss = float('inf')
    best_val_acc = 0.0
    patience_counter = 0
    history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}
    
    for epoch in range(num_epochs):
        # Training
        model.train()
        train_loss = 0.0
        train_correct = 0
        train_total = 0
        
        pbar = tqdm(train_loader, desc=f"Epoch {epoch+1}/{num_epochs} [Train]")
        for batch_X, batch_y in pbar:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)
            
            optimizer.zero_grad()
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            train_total += batch_y.size(0)
            train_correct += (predicted == batch_y).sum().item()
            
            pbar.set_postfix({'loss': f'{loss.item():.4f}'})
        
        avg_train_loss = train_loss / len(train_loader)
        train_acc = 100 * train_correct / train_total
        
        # Validation
        model.eval()
        val_loss = 0.0
        val_correct = 0
        val_total = 0
        
        with torch.no_grad():
            for batch_X, batch_y in val_loader:
                batch_X, batch_y = batch_X.to(device), batch_y.to(device)
                outputs = model(batch_X)
                loss = criterion(outputs, batch_y)
                
                val_loss += loss.item()
                _, predicted = torch.max(outputs.data, 1)
                val_total += batch_y.size(0)
                val_correct += (predicted == batch_y).sum().item()
        
        avg_val_loss = val_loss / len(val_loader)
        val_acc = 100 * val_correct / val_total
        
        # Update history
        history['train_loss'].append(avg_train_loss)
        history['train_acc'].append(train_acc)
        history['val_loss'].append(avg_val_loss)
        history['val_acc'].append(val_acc)
        
        print(f"Epoch {epoch+1}: Train Loss={avg_train_loss:.4f}, Train Acc={train_acc:.2f}% | "
              f"Val Loss={avg_val_loss:.4f}, Val Acc={val_acc:.2f}%")
        
        # Early stopping
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            best_val_loss = avg_val_loss
            patience_counter = 0
            # Save best model if save_path is provided
            if save_path is not None:
                torch.save(model.state_dict(), save_path)
                print(f"  ✓ New best model saved to {save_path} (Val Acc: {val_acc:.2f}%)")
        else:
            patience_counter += 1
            if patience_counter >= patience:
                print(f"\nEarly stopping triggered after {epoch+1} epochs")
                break
    
    return history, best_val_acc

print("✓ Training function defined")

✓ Training function defined


In [13]:
## 2.4: Hyperparameter Optimization with Optuna + K-Fold CV

def objective_ffnn_kfold(trial):
    """Optuna objective function for FFNN with K-Fold Cross-Validation
    
    Uses 3-fold stratified CV to get more robust performance estimates.
    Returns the mean validation accuracy across all folds.
    """
    
    # Hyperparameters to optimize
    hidden_dim1 = trial.suggest_int('hidden_dim1', 128, 512, step=64)
    hidden_dim2 = trial.suggest_int('hidden_dim2', 64, 256, step=32)
    dropout_rate = trial.suggest_float('dropout_rate', 0.2, 0.5)
    learning_rate = trial.suggest_loguniform('learning_rate', 1e-4, 1e-2)
    batch_size = trial.suggest_categorical('batch_size', [32, 64, 128])
    
    print(f"\n[Trial {trial.number}] Testing: hidden={hidden_dim1},{hidden_dim2}, "
          f"dropout={dropout_rate:.3f}, lr={learning_rate:.2e}, batch={batch_size}")
    
    # K-Fold Cross-Validation setup
    n_folds = 3
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=SEED)
    
    fold_scores = []
    
    # Combine train and val for k-fold (we'll use test set for final evaluation)
    X_combined = np.vstack([X_train_tfidf, X_val_tfidf])
    y_combined = np.array(train_labels + val_labels)
    
    for fold_idx, (train_idx, val_idx) in enumerate(skf.split(X_combined, y_combined)):
        print(f"  Fold {fold_idx+1}/{n_folds}...", end=" ")
        
        # Split data for this fold
        X_fold_train = X_combined[train_idx]
        y_fold_train = y_combined[train_idx]
        X_fold_val = X_combined[val_idx]
        y_fold_val = y_combined[val_idx]
        
        # Create fresh model for this fold
        model = FFNNClassifier(
            input_dim=X_train_tfidf.shape[1],
            hidden_dims=[hidden_dim1, hidden_dim2],
            dropout_rate=dropout_rate
        ).to(device)
        
        # Create dataloaders
        train_dataset = torch.utils.data.TensorDataset(
            torch.FloatTensor(X_fold_train),
            torch.LongTensor(y_fold_train)
        )
        val_dataset = torch.utils.data.TensorDataset(
            torch.FloatTensor(X_fold_val),
            torch.LongTensor(y_fold_val)
        )
        
        # Use drop_last=True to avoid BatchNorm issues with single-sample batches
        train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
        val_loader = DataLoader(val_dataset, batch_size=batch_size, drop_last=False)
        
        # Optimizer and loss
        optimizer = optim.Adam(model.parameters(), lr=learning_rate)
        criterion = nn.CrossEntropyLoss()
        
        # Train with reduced epochs for optimization (faster tuning)
        _, best_val_acc = train_ffnn(
            model, train_loader, val_loader, optimizer, criterion,
            num_epochs=10, patience=2, device=device, save_path=None
        )
        
        fold_scores.append(best_val_acc)
        print(f"Acc={best_val_acc:.2f}%")
        
        # Clean up to save memory
        del model, optimizer, train_loader, val_loader
        torch.cuda.empty_cache() if torch.cuda.is_available() else None
    
    # Return mean accuracy across all folds
    mean_acc = np.mean(fold_scores)
    std_acc = np.std(fold_scores)
    print(f"  → Mean CV Accuracy: {mean_acc:.2f}% (±{std_acc:.2f}%)")
    
    return mean_acc

print("✓ K-Fold CV Optuna objective function defined")

✓ K-Fold CV Optuna objective function defined


In [14]:
## 2.5: Run Hyperparameter Optimization with K-Fold CV

print("Starting K-Fold Cross-Validation hyperparameter optimization for TF-IDF + FFNN...")
print("Using 3-fold CV for robust hyperparameter selection")
print("This will take 20-40 minutes...\n")

# Create Optuna study
study_ffnn = optuna.create_study(
    direction='maximize',
    sampler=TPESampler(seed=SEED),
    study_name='tfidf_ffnn_kfold_optimization'
)

# Optimize with k-fold CV
study_ffnn.optimize(objective_ffnn_kfold, n_trials=15, show_progress_bar=True)

# Best parameters
print("\n" + "="*70)
print("K-FOLD CV OPTIMIZATION COMPLETE - TF-IDF + FFNN")
print("="*70)
print(f"Best mean CV accuracy: {study_ffnn.best_value:.2f}%")
print(f"\nBest hyperparameters:")
for key, value in study_ffnn.best_params.items():
    print(f"  {key}: {value}")

# Save study results
os.makedirs('models/tfidf_ffnn', exist_ok=True)
with open('models/tfidf_ffnn/optuna_study_kfold.pkl', 'wb') as f:
    pickle.dump(study_ffnn, f)
print("\n✓ K-Fold CV optimization results saved")

[I 2025-11-10 00:56:00,875] A new study created in memory with name: tfidf_ffnn_kfold_optimization


Starting K-Fold Cross-Validation hyperparameter optimization for TF-IDF + FFNN...
Using 3-fold CV for robust hyperparameter selection
This will take 20-40 minutes...



  0%|          | 0/15 [00:00<?, ?it/s]


[Trial 0] Testing: hidden=256,256, dropout=0.420, lr=1.58e-03, batch=32
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:16<00:00, 74.97it/s, loss=0.3019]



Epoch 1: Train Loss=0.3414, Train Acc=85.16% | Val Loss=0.2644, Val Acc=88.91%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:21<00:00, 58.38it/s, loss=0.0799]
Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:21<00:00, 58.38it/s, loss=0.0799]


Epoch 2: Train Loss=0.2230, Train Acc=90.89% | Val Loss=0.2740, Val Acc=88.44%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:20<00:00, 60.71it/s, loss=0.0380]



Epoch 3: Train Loss=0.1576, Train Acc=93.95% | Val Loss=0.3164, Val Acc=88.13%

Early stopping triggered after 3 epochs
Acc=88.91%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:21<00:00, 58.57it/s, loss=0.2637]
Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:21<00:00, 58.57it/s, loss=0.2637]


Epoch 1: Train Loss=0.3438, Train Acc=85.03% | Val Loss=0.2733, Val Acc=88.62%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:15<00:00, 81.27it/s, loss=0.5946]



Epoch 2: Train Loss=0.2275, Train Acc=90.72% | Val Loss=0.2729, Val Acc=88.94%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:22<00:00, 55.78it/s, loss=0.1538]



Epoch 3: Train Loss=0.1643, Train Acc=93.59% | Val Loss=0.2895, Val Acc=88.50%


Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:26<00:00, 46.44it/s, loss=0.0758]



Epoch 4: Train Loss=0.1215, Train Acc=95.38% | Val Loss=0.3108, Val Acc=88.52%

Early stopping triggered after 4 epochs
Acc=88.94%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:28<00:00, 43.58it/s, loss=0.4328]
Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:28<00:00, 43.58it/s, loss=0.4328]


Epoch 1: Train Loss=0.3392, Train Acc=85.31% | Val Loss=0.2678, Val Acc=88.83%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:21<00:00, 57.40it/s, loss=0.2948]
Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:21<00:00, 57.40it/s, loss=0.2948]


Epoch 2: Train Loss=0.2225, Train Acc=91.06% | Val Loss=0.2778, Val Acc=88.60%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:16<00:00, 77.72it/s, loss=0.0551]



Epoch 3: Train Loss=0.1620, Train Acc=93.67% | Val Loss=0.2840, Val Acc=88.45%

Early stopping triggered after 3 epochs
Acc=88.83%
  → Mean CV Accuracy: 88.89% (±0.04%)


Best trial: 0. Best value: 88.8933:   7%|▋         | 1/15 [05:23<1:15:28, 323.44s/it]

[I 2025-11-10 01:01:24,319] Trial 0 finished with value: 88.89333333333333 and parameters: {'hidden_dim1': 256, 'hidden_dim2': 256, 'dropout_rate': 0.4195981825434215, 'learning_rate': 0.0015751320499779737, 'batch_size': 32}. Best is trial 0 with value: 88.89333333333333.

[Trial 1] Testing: hidden=512,192, dropout=0.412, lr=1.10e-04, batch=32
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:10<00:00, 117.04it/s, loss=0.3379]
Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:10<00:00, 117.04it/s, loss=0.3379]


Epoch 1: Train Loss=0.3574, Train Acc=83.93% | Val Loss=0.2613, Val Acc=89.08%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 70.57it/s, loss=0.2438]
Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 70.57it/s, loss=0.2438]


Epoch 2: Train Loss=0.1951, Train Acc=92.32% | Val Loss=0.2778, Val Acc=88.66%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:18<00:00, 68.66it/s, loss=0.1458]
Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:18<00:00, 68.66it/s, loss=0.1458]


Epoch 3: Train Loss=0.1048, Train Acc=96.19% | Val Loss=0.3313, Val Acc=88.14%

Early stopping triggered after 3 epochs
Acc=89.08%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:18<00:00, 66.57it/s, loss=0.2792]



Epoch 1: Train Loss=0.3546, Train Acc=83.92% | Val Loss=0.2579, Val Acc=89.16%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:26<00:00, 47.43it/s, loss=0.2948]



Epoch 2: Train Loss=0.1987, Train Acc=92.25% | Val Loss=0.2683, Val Acc=89.11%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:22<00:00, 55.85it/s, loss=0.1170]
Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:22<00:00, 55.85it/s, loss=0.1170]


Epoch 3: Train Loss=0.1101, Train Acc=96.05% | Val Loss=0.3320, Val Acc=88.19%

Early stopping triggered after 3 epochs
Acc=89.16%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 70.41it/s, loss=0.3179]
Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 70.41it/s, loss=0.3179]


Epoch 1: Train Loss=0.3564, Train Acc=83.89% | Val Loss=0.2604, Val Acc=89.06%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:24<00:00, 51.18it/s, loss=0.4873]



Epoch 2: Train Loss=0.1976, Train Acc=92.22% | Val Loss=0.2763, Val Acc=88.68%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:22<00:00, 54.95it/s, loss=0.0393]
Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:22<00:00, 54.95it/s, loss=0.0393]


Epoch 3: Train Loss=0.1068, Train Acc=96.27% | Val Loss=0.3301, Val Acc=88.27%

Early stopping triggered after 3 epochs
Acc=89.06%
  → Mean CV Accuracy: 89.10% (±0.04%)


Best trial: 1. Best value: 89.0983:  13%|█▎        | 2/15 [09:07<57:24, 264.97s/it]  

[I 2025-11-10 01:05:08,362] Trial 1 finished with value: 89.09833333333334 and parameters: {'hidden_dim1': 512, 'hidden_dim2': 192, 'dropout_rate': 0.4124217733388137, 'learning_rate': 0.00010994335574766199, 'batch_size': 32}. Best is trial 1 with value: 89.09833333333334.

[Trial 2] Testing: hidden=192,96, dropout=0.291, lr=1.12e-03, batch=128
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 75.48it/s, loss=0.3288]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 75.48it/s, loss=0.3288]


Epoch 1: Train Loss=0.3138, Train Acc=86.41% | Val Loss=0.2622, Val Acc=89.04%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 61.14it/s, loss=0.1747]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 61.14it/s, loss=0.1747]


Epoch 2: Train Loss=0.1676, Train Acc=93.41% | Val Loss=0.2962, Val Acc=88.25%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 62.46it/s, loss=0.1182]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 62.46it/s, loss=0.1182]


Epoch 3: Train Loss=0.0801, Train Acc=97.04% | Val Loss=0.3837, Val Acc=88.13%

Early stopping triggered after 3 epochs
Acc=89.04%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 75.45it/s, loss=0.2491]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 75.45it/s, loss=0.2491]


Epoch 1: Train Loss=0.3129, Train Acc=86.34% | Val Loss=0.2616, Val Acc=89.16%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 97.23it/s, loss=0.2105] 



Epoch 2: Train Loss=0.1718, Train Acc=93.17% | Val Loss=0.2793, Val Acc=88.86%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:02<00:00, 113.45it/s, loss=0.2050]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:02<00:00, 113.45it/s, loss=0.2050]


Epoch 3: Train Loss=0.0791, Train Acc=97.08% | Val Loss=0.3658, Val Acc=88.25%

Early stopping triggered after 3 epochs
Acc=89.16%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:02<00:00, 109.88it/s, loss=0.2489]



Epoch 1: Train Loss=0.3140, Train Acc=86.33% | Val Loss=0.2630, Val Acc=88.87%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 87.05it/s, loss=0.1463]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 87.05it/s, loss=0.1463]


Epoch 2: Train Loss=0.1676, Train Acc=93.51% | Val Loss=0.2901, Val Acc=88.42%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 74.38it/s, loss=0.0754]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 74.38it/s, loss=0.0754]


Epoch 3: Train Loss=0.0825, Train Acc=97.00% | Val Loss=0.3750, Val Acc=88.23%

Early stopping triggered after 3 epochs
Acc=88.87%
  → Mean CV Accuracy: 89.02% (±0.12%)


Best trial: 1. Best value: 89.0983:  20%|██        | 3/15 [10:01<33:45, 168.76s/it]

[I 2025-11-10 01:06:02,633] Trial 2 finished with value: 89.02333333333333 and parameters: {'hidden_dim1': 192, 'hidden_dim2': 96, 'dropout_rate': 0.2912726728878613, 'learning_rate': 0.0011207606211860567, 'batch_size': 128}. Best is trial 1 with value: 89.09833333333334.

[Trial 3] Testing: hidden=128,128, dropout=0.310, lr=8.17e-04, batch=32
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 72.84it/s, loss=0.2992]
Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 72.84it/s, loss=0.2992]


Epoch 1: Train Loss=0.3362, Train Acc=85.11% | Val Loss=0.2657, Val Acc=89.05%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:23<00:00, 53.01it/s, loss=0.1541]



Epoch 2: Train Loss=0.2151, Train Acc=91.30% | Val Loss=0.2711, Val Acc=88.71%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:16<00:00, 74.85it/s, loss=0.1426]
Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:16<00:00, 74.85it/s, loss=0.1426]


Epoch 3: Train Loss=0.1430, Train Acc=94.56% | Val Loss=0.3034, Val Acc=88.22%

Early stopping triggered after 3 epochs
Acc=89.05%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:15<00:00, 80.52it/s, loss=0.3548]
Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:15<00:00, 80.52it/s, loss=0.3548]


Epoch 1: Train Loss=0.3383, Train Acc=85.14% | Val Loss=0.2673, Val Acc=88.83%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:21<00:00, 57.08it/s, loss=0.3465]



Epoch 2: Train Loss=0.2131, Train Acc=91.55% | Val Loss=0.2695, Val Acc=88.77%


Epoch 3/10 [Train]:   0%|          | 0/1250 [00:00<?, ?it/s][A
Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:15<00:00, 79.85it/s, loss=0.0848]


Epoch 3: Train Loss=0.1436, Train Acc=94.45% | Val Loss=0.2987, Val Acc=88.44%

Early stopping triggered after 3 epochs
Acc=88.83%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 72.50it/s, loss=0.2428]
Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 72.50it/s, loss=0.2428]


Epoch 1: Train Loss=0.3390, Train Acc=85.26% | Val Loss=0.2729, Val Acc=88.48%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:20<00:00, 60.88it/s, loss=0.2504] 



Epoch 2: Train Loss=0.2125, Train Acc=91.58% | Val Loss=0.2727, Val Acc=88.64%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:18<00:00, 69.23it/s, loss=0.0803]



Epoch 3: Train Loss=0.1447, Train Acc=94.52% | Val Loss=0.3157, Val Acc=88.06%


Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:16<00:00, 76.98it/s, loss=0.0413] 
Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:16<00:00, 76.98it/s, loss=0.0413] 


Epoch 4: Train Loss=0.1092, Train Acc=95.89% | Val Loss=0.3413, Val Acc=88.17%

Early stopping triggered after 4 epochs
Acc=88.64%
  → Mean CV Accuracy: 88.84% (±0.17%)


Best trial: 1. Best value: 89.0983:  27%|██▋       | 4/15 [14:01<36:04, 196.73s/it]

[I 2025-11-10 01:10:02,238] Trial 3 finished with value: 88.83666666666666 and parameters: {'hidden_dim1': 128, 'hidden_dim2': 128, 'dropout_rate': 0.3099085529881075, 'learning_rate': 0.000816845589476017, 'batch_size': 32}. Best is trial 1 with value: 89.09833333333334.

[Trial 4] Testing: hidden=384,64, dropout=0.382, lr=2.19e-04, batch=128
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 70.80it/s, loss=0.2392]



Epoch 1: Train Loss=0.3470, Train Acc=84.45% | Val Loss=0.2574, Val Acc=89.13%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 69.51it/s, loss=0.2261]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 69.51it/s, loss=0.2261]


Epoch 2: Train Loss=0.1730, Train Acc=93.50% | Val Loss=0.2788, Val Acc=88.91%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 64.20it/s, loss=0.0552]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 64.20it/s, loss=0.0552]


Epoch 3: Train Loss=0.0750, Train Acc=97.54% | Val Loss=0.3416, Val Acc=88.47%

Early stopping triggered after 3 epochs
Acc=89.13%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 63.93it/s, loss=0.2778]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 63.93it/s, loss=0.2778]


Epoch 1: Train Loss=0.3472, Train Acc=84.54% | Val Loss=0.2548, Val Acc=89.44%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 59.58it/s, loss=0.2018]



Epoch 2: Train Loss=0.1681, Train Acc=93.73% | Val Loss=0.2743, Val Acc=88.97%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 65.27it/s, loss=0.0998]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 65.27it/s, loss=0.0998]


Epoch 3: Train Loss=0.0731, Train Acc=97.69% | Val Loss=0.3356, Val Acc=88.62%

Early stopping triggered after 3 epochs
Acc=89.44%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 48.03it/s, loss=0.2801]



Epoch 1: Train Loss=0.3488, Train Acc=84.51% | Val Loss=0.2574, Val Acc=89.46%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 60.09it/s, loss=0.1440]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 60.09it/s, loss=0.1440]


Epoch 2: Train Loss=0.1737, Train Acc=93.44% | Val Loss=0.2750, Val Acc=89.01%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 54.21it/s, loss=0.0631]



Epoch 3: Train Loss=0.0774, Train Acc=97.65% | Val Loss=0.3345, Val Acc=88.38%

Early stopping triggered after 3 epochs
Acc=89.46%
  → Mean CV Accuracy: 89.34% (±0.15%)


Best trial: 4. Best value: 89.345:  33%|███▎      | 5/15 [15:08<24:59, 149.91s/it] 

[I 2025-11-10 01:11:09,117] Trial 4 finished with value: 89.34499999999998 and parameters: {'hidden_dim1': 384, 'hidden_dim2': 64, 'dropout_rate': 0.3822634555704315, 'learning_rate': 0.00021930485556643703, 'batch_size': 128}. Best is trial 4 with value: 89.34499999999998.

[Trial 5] Testing: hidden=448,128, dropout=0.229, lr=2.34e-03, batch=128
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 59.20it/s, loss=0.3277]



Epoch 1: Train Loss=0.3139, Train Acc=86.27% | Val Loss=0.2721, Val Acc=88.62%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 44.88it/s, loss=0.2014]



Epoch 2: Train Loss=0.1640, Train Acc=93.60% | Val Loss=0.3099, Val Acc=87.94%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 64.41it/s, loss=0.1390]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 64.41it/s, loss=0.1390]


Epoch 3: Train Loss=0.0699, Train Acc=97.27% | Val Loss=0.4511, Val Acc=87.61%

Early stopping triggered after 3 epochs
Acc=88.62%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 42.76it/s, loss=0.2349]



Epoch 1: Train Loss=0.3111, Train Acc=86.46% | Val Loss=0.2673, Val Acc=88.75%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 37.50it/s, loss=0.1336]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 37.50it/s, loss=0.1336]


Epoch 2: Train Loss=0.1605, Train Acc=93.53% | Val Loss=0.3131, Val Acc=88.28%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 36.21it/s, loss=0.0688]



Epoch 3: Train Loss=0.0701, Train Acc=97.43% | Val Loss=0.4167, Val Acc=88.23%

Early stopping triggered after 3 epochs
Acc=88.75%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 34.97it/s, loss=0.3587]



Epoch 1: Train Loss=0.3096, Train Acc=86.63% | Val Loss=0.2660, Val Acc=88.77%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 55.48it/s, loss=0.2284]



Epoch 2: Train Loss=0.1653, Train Acc=93.34% | Val Loss=0.2953, Val Acc=88.40%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 55.47it/s, loss=0.0894]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 55.47it/s, loss=0.0894]


Epoch 3: Train Loss=0.0704, Train Acc=97.42% | Val Loss=0.4036, Val Acc=87.94%

Early stopping triggered after 3 epochs
Acc=88.77%
  → Mean CV Accuracy: 88.71% (±0.07%)


Best trial: 4. Best value: 89.345:  40%|████      | 6/15 [16:40<19:33, 130.35s/it]

[I 2025-11-10 01:12:41,506] Trial 5 finished with value: 88.71499999999999 and parameters: {'hidden_dim1': 448, 'hidden_dim2': 128, 'dropout_rate': 0.22930163420191518, 'learning_rate': 0.0023359635026261607, 'batch_size': 128}. Best is trial 4 with value: 89.34499999999998.

[Trial 6] Testing: hidden=128,256, dropout=0.278, lr=2.11e-03, batch=128
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 53.41it/s, loss=0.3376]



Epoch 1: Train Loss=0.3151, Train Acc=86.22% | Val Loss=0.2625, Val Acc=89.12%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 49.42it/s, loss=0.2540]



Epoch 2: Train Loss=0.1837, Train Acc=92.78% | Val Loss=0.2938, Val Acc=88.69%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 43.96it/s, loss=0.0863]



Epoch 3: Train Loss=0.0981, Train Acc=96.39% | Val Loss=0.3474, Val Acc=87.89%

Early stopping triggered after 3 epochs
Acc=89.12%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 39.63it/s, loss=0.2382]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 39.63it/s, loss=0.2382]


Epoch 1: Train Loss=0.3204, Train Acc=85.98% | Val Loss=0.2646, Val Acc=88.83%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 38.69it/s, loss=0.1427]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 38.69it/s, loss=0.1427]


Epoch 2: Train Loss=0.1831, Train Acc=92.82% | Val Loss=0.2901, Val Acc=88.69%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 59.69it/s, loss=0.1008]



Epoch 3: Train Loss=0.0938, Train Acc=96.53% | Val Loss=0.3666, Val Acc=88.08%

Early stopping triggered after 3 epochs
Acc=88.83%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 58.61it/s, loss=0.2414]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 58.61it/s, loss=0.2414]


Epoch 1: Train Loss=0.3172, Train Acc=86.36% | Val Loss=0.2668, Val Acc=88.83%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 50.57it/s, loss=0.1631]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 50.57it/s, loss=0.1631]


Epoch 2: Train Loss=0.1782, Train Acc=92.85% | Val Loss=0.3042, Val Acc=88.30%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 51.49it/s, loss=0.1055]



Epoch 3: Train Loss=0.0986, Train Acc=96.29% | Val Loss=0.3603, Val Acc=87.80%

Early stopping triggered after 3 epochs
Acc=88.83%
  → Mean CV Accuracy: 88.93% (±0.13%)


Best trial: 4. Best value: 89.345:  47%|████▋     | 7/15 [18:02<15:16, 114.55s/it]

[I 2025-11-10 01:14:03,541] Trial 6 finished with value: 88.92999999999999 and parameters: {'hidden_dim1': 128, 'hidden_dim2': 256, 'dropout_rate': 0.2776339944800051, 'learning_rate': 0.0021137059440645744, 'batch_size': 128}. Best is trial 4 with value: 89.34499999999998.

[Trial 7] Testing: hidden=192,256, dropout=0.433, lr=7.57e-03, batch=128
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 75.93it/s, loss=0.3474]



Epoch 1: Train Loss=0.3320, Train Acc=85.74% | Val Loss=0.2678, Val Acc=88.69%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 71.65it/s, loss=0.2731]



Epoch 2: Train Loss=0.1968, Train Acc=92.27% | Val Loss=0.2745, Val Acc=88.55%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 60.46it/s, loss=0.1709]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 60.46it/s, loss=0.1709]


Epoch 3: Train Loss=0.1187, Train Acc=95.34% | Val Loss=0.3317, Val Acc=87.60%

Early stopping triggered after 3 epochs
Acc=88.69%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 64.23it/s, loss=0.2742]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 64.23it/s, loss=0.2742]


Epoch 1: Train Loss=0.3333, Train Acc=85.79% | Val Loss=0.2723, Val Acc=88.80%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 53.59it/s, loss=0.3133]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 53.59it/s, loss=0.3133]


Epoch 2: Train Loss=0.1952, Train Acc=92.35% | Val Loss=0.2811, Val Acc=88.69%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 52.60it/s, loss=0.1054]



Epoch 3: Train Loss=0.1181, Train Acc=95.47% | Val Loss=0.3302, Val Acc=88.02%

Early stopping triggered after 3 epochs
Acc=88.80%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 62.87it/s, loss=0.2572]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 62.87it/s, loss=0.2572]


Epoch 1: Train Loss=0.3328, Train Acc=85.63% | Val Loss=0.2741, Val Acc=88.52%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 54.80it/s, loss=0.1575]



Epoch 2: Train Loss=0.1954, Train Acc=92.25% | Val Loss=0.2816, Val Acc=88.72%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 71.90it/s, loss=0.1708]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 71.90it/s, loss=0.1708]


Epoch 3: Train Loss=0.1180, Train Acc=95.47% | Val Loss=0.3299, Val Acc=88.19%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 50.56it/s, loss=0.0837]



Epoch 4: Train Loss=0.0856, Train Acc=96.84% | Val Loss=0.4001, Val Acc=87.78%

Early stopping triggered after 4 epochs
Acc=88.72%
  → Mean CV Accuracy: 88.74% (±0.05%)


Best trial: 4. Best value: 89.345:  53%|█████▎    | 8/15 [19:13<11:43, 100.49s/it]

[I 2025-11-10 01:15:13,928] Trial 7 finished with value: 88.73500000000001 and parameters: {'hidden_dim1': 192, 'hidden_dim2': 256, 'dropout_rate': 0.43253984700833437, 'learning_rate': 0.007568292060167618, 'batch_size': 128}. Best is trial 4 with value: 89.34499999999998.

[Trial 8] Testing: hidden=128,96, dropout=0.214, lr=4.47e-04, batch=128
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 72.28it/s, loss=0.3186]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 72.28it/s, loss=0.3186]


Epoch 1: Train Loss=0.3249, Train Acc=85.81% | Val Loss=0.2573, Val Acc=89.08%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 65.81it/s, loss=0.1738]



Epoch 2: Train Loss=0.1575, Train Acc=93.96% | Val Loss=0.2936, Val Acc=88.44%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 57.92it/s, loss=0.0346]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 57.92it/s, loss=0.0346]


Epoch 3: Train Loss=0.0707, Train Acc=97.58% | Val Loss=0.3852, Val Acc=87.92%

Early stopping triggered after 3 epochs
Acc=89.08%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 39.55it/s, loss=0.2161]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 39.55it/s, loss=0.2161]


Epoch 1: Train Loss=0.3245, Train Acc=85.72% | Val Loss=0.2584, Val Acc=89.17%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 43.47it/s, loss=0.2001]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 43.47it/s, loss=0.2001]


Epoch 2: Train Loss=0.1602, Train Acc=93.92% | Val Loss=0.2905, Val Acc=88.48%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 41.94it/s, loss=0.0671]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 41.94it/s, loss=0.0671]


Epoch 3: Train Loss=0.0722, Train Acc=97.54% | Val Loss=0.3801, Val Acc=87.66%

Early stopping triggered after 3 epochs
Acc=89.17%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 53.65it/s, loss=0.2899]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 53.65it/s, loss=0.2899]


Epoch 1: Train Loss=0.3227, Train Acc=85.76% | Val Loss=0.2587, Val Acc=89.06%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 42.99it/s, loss=0.1382]



Epoch 2: Train Loss=0.1566, Train Acc=94.25% | Val Loss=0.2939, Val Acc=88.31%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 70.86it/s, loss=0.0899]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 70.86it/s, loss=0.0899]


Epoch 3: Train Loss=0.0674, Train Acc=97.74% | Val Loss=0.3905, Val Acc=87.86%

Early stopping triggered after 3 epochs
Acc=89.06%
  → Mean CV Accuracy: 89.10% (±0.05%)


Best trial: 4. Best value: 89.345:  60%|██████    | 9/15 [20:34<09:26, 94.45s/it] 

[I 2025-11-10 01:16:35,082] Trial 8 finished with value: 89.10166666666667 and parameters: {'hidden_dim1': 128, 'hidden_dim2': 96, 'dropout_rate': 0.21356818667316144, 'learning_rate': 0.0004473636174621269, 'batch_size': 128}. Best is trial 4 with value: 89.34499999999998.

[Trial 9] Testing: hidden=256,96, dropout=0.363, lr=1.91e-04, batch=128
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 53.48it/s, loss=0.2423]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 53.48it/s, loss=0.2423]


Epoch 1: Train Loss=0.3592, Train Acc=83.63% | Val Loss=0.2540, Val Acc=89.34%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 40.56it/s, loss=0.1195]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 40.56it/s, loss=0.1195]


Epoch 2: Train Loss=0.1766, Train Acc=93.29% | Val Loss=0.2747, Val Acc=88.92%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 40.89it/s, loss=0.0909]



Epoch 3: Train Loss=0.0860, Train Acc=97.11% | Val Loss=0.3243, Val Acc=88.24%

Early stopping triggered after 3 epochs
Acc=89.34%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 37.78it/s, loss=0.4710]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 37.78it/s, loss=0.4710]


Epoch 1: Train Loss=0.3583, Train Acc=83.63% | Val Loss=0.2561, Val Acc=89.31%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 76.68it/s, loss=0.2371]



Epoch 2: Train Loss=0.1767, Train Acc=93.35% | Val Loss=0.2740, Val Acc=89.09%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 35.74it/s, loss=0.1322]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 35.74it/s, loss=0.1322]


Epoch 3: Train Loss=0.0868, Train Acc=97.07% | Val Loss=0.3229, Val Acc=88.56%

Early stopping triggered after 3 epochs
Acc=89.31%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 60.29it/s, loss=0.2524] 



Epoch 1: Train Loss=0.3556, Train Acc=83.90% | Val Loss=0.2575, Val Acc=89.22%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 62.31it/s, loss=0.1656]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 62.31it/s, loss=0.1656]


Epoch 2: Train Loss=0.1759, Train Acc=93.36% | Val Loss=0.2732, Val Acc=88.93%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 61.14it/s, loss=0.0603]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 61.14it/s, loss=0.0603]


Epoch 3: Train Loss=0.0899, Train Acc=96.95% | Val Loss=0.3243, Val Acc=88.47%

Early stopping triggered after 3 epochs
Acc=89.22%
  → Mean CV Accuracy: 89.29% (±0.05%)


Best trial: 4. Best value: 89.345:  67%|██████▋   | 10/15 [21:57<07:35, 91.02s/it]

[I 2025-11-10 01:17:58,429] Trial 9 finished with value: 89.29333333333334 and parameters: {'hidden_dim1': 256, 'hidden_dim2': 96, 'dropout_rate': 0.3628088249474746, 'learning_rate': 0.00019135880487692312, 'batch_size': 128}. Best is trial 4 with value: 89.34499999999998.

[Trial 10] Testing: hidden=384,64, dropout=0.491, lr=3.16e-04, batch=64
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 69.99it/s, loss=0.1695]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 69.99it/s, loss=0.1695]


Epoch 1: Train Loss=0.3473, Train Acc=84.50% | Val Loss=0.2585, Val Acc=89.17%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.61it/s, loss=0.1766]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.61it/s, loss=0.1766]


Epoch 2: Train Loss=0.1944, Train Acc=92.72% | Val Loss=0.2751, Val Acc=88.84%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 67.59it/s, loss=0.1822]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 67.59it/s, loss=0.1822]


Epoch 3: Train Loss=0.1117, Train Acc=96.00% | Val Loss=0.3342, Val Acc=88.59%

Early stopping triggered after 3 epochs
Acc=89.17%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.98it/s, loss=0.2831]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.98it/s, loss=0.2831]


Epoch 1: Train Loss=0.3428, Train Acc=84.96% | Val Loss=0.2572, Val Acc=89.22%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 62.73it/s, loss=0.1903]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 62.73it/s, loss=0.1903]


Epoch 2: Train Loss=0.1957, Train Acc=92.33% | Val Loss=0.2709, Val Acc=89.03%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 66.97it/s, loss=0.1378]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 66.97it/s, loss=0.1378]


Epoch 3: Train Loss=0.1157, Train Acc=95.80% | Val Loss=0.3242, Val Acc=88.45%

Early stopping triggered after 3 epochs
Acc=89.22%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 68.04it/s, loss=0.3619]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 68.04it/s, loss=0.3619]


Epoch 1: Train Loss=0.3456, Train Acc=84.81% | Val Loss=0.2578, Val Acc=89.21%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:13<00:00, 46.13it/s, loss=0.2723]



Epoch 2: Train Loss=0.1968, Train Acc=92.46% | Val Loss=0.2747, Val Acc=88.78%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 54.22it/s, loss=0.1784]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 54.22it/s, loss=0.1784]


Epoch 3: Train Loss=0.1158, Train Acc=95.88% | Val Loss=0.3234, Val Acc=88.62%

Early stopping triggered after 3 epochs
Acc=89.21%
  → Mean CV Accuracy: 89.20% (±0.02%)


Best trial: 4. Best value: 89.345:  73%|███████▎  | 11/15 [23:55<06:36, 99.21s/it]

[I 2025-11-10 01:19:56,203] Trial 10 finished with value: 89.19666666666666 and parameters: {'hidden_dim1': 384, 'hidden_dim2': 64, 'dropout_rate': 0.4908611332363598, 'learning_rate': 0.0003156826122536684, 'batch_size': 64}. Best is trial 4 with value: 89.34499999999998.

[Trial 11] Testing: hidden=320,64, dropout=0.354, lr=1.18e-04, batch=64
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 61.64it/s, loss=0.2248]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 61.64it/s, loss=0.2248]


Epoch 1: Train Loss=0.3688, Train Acc=83.51% | Val Loss=0.2573, Val Acc=89.55%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 53.68it/s, loss=0.1743]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 53.68it/s, loss=0.1743]


Epoch 2: Train Loss=0.1844, Train Acc=93.13% | Val Loss=0.2691, Val Acc=88.94%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:14<00:00, 43.26it/s, loss=0.0569]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:14<00:00, 43.26it/s, loss=0.0569]


Epoch 3: Train Loss=0.0920, Train Acc=96.93% | Val Loss=0.3145, Val Acc=88.58%

Early stopping triggered after 3 epochs
Acc=89.55%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 53.91it/s, loss=0.1874]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 53.91it/s, loss=0.1874]


Epoch 1: Train Loss=0.3689, Train Acc=83.41% | Val Loss=0.2567, Val Acc=89.47%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:13<00:00, 45.75it/s, loss=0.1346]



Epoch 2: Train Loss=0.1877, Train Acc=93.01% | Val Loss=0.2664, Val Acc=88.97%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 55.41it/s, loss=0.0874]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 55.41it/s, loss=0.0874]


Epoch 3: Train Loss=0.0962, Train Acc=96.80% | Val Loss=0.3154, Val Acc=88.61%

Early stopping triggered after 3 epochs
Acc=89.47%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 52.93it/s, loss=0.3793]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 52.93it/s, loss=0.3793]


Epoch 1: Train Loss=0.3668, Train Acc=83.29% | Val Loss=0.2581, Val Acc=89.31%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 60.53it/s, loss=0.0765]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 60.53it/s, loss=0.0765]


Epoch 2: Train Loss=0.1883, Train Acc=92.88% | Val Loss=0.2664, Val Acc=89.12%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.67it/s, loss=0.0751]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.67it/s, loss=0.0751]


Epoch 3: Train Loss=0.0956, Train Acc=96.86% | Val Loss=0.3089, Val Acc=88.78%

Early stopping triggered after 3 epochs
Acc=89.31%
  → Mean CV Accuracy: 89.44% (±0.10%)


Best trial: 11. Best value: 89.44:  80%|████████  | 12/15 [26:15<05:35, 111.74s/it]

[I 2025-11-10 01:22:16,607] Trial 11 finished with value: 89.44 and parameters: {'hidden_dim1': 320, 'hidden_dim2': 64, 'dropout_rate': 0.3535602771439335, 'learning_rate': 0.00011835846181383978, 'batch_size': 64}. Best is trial 11 with value: 89.44.

[Trial 12] Testing: hidden=384,64, dropout=0.354, lr=1.15e-04, batch=64
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 68.79it/s, loss=0.1638]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 68.79it/s, loss=0.1638]


Epoch 1: Train Loss=0.3655, Train Acc=83.51% | Val Loss=0.2591, Val Acc=89.32%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 81.53it/s, loss=0.1341]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 81.53it/s, loss=0.1341]


Epoch 2: Train Loss=0.1885, Train Acc=92.82% | Val Loss=0.2693, Val Acc=89.12%


Epoch 3/10 [Train]:   0%|          | 0/625 [00:00<?, ?it/s][A
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 92.18it/s, loss=0.1095] 


Epoch 3: Train Loss=0.0970, Train Acc=96.72% | Val Loss=0.3129, Val Acc=88.78%

Early stopping triggered after 3 epochs
Acc=89.32%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 64.17it/s, loss=0.2393]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 64.17it/s, loss=0.2393]


Epoch 1: Train Loss=0.3641, Train Acc=83.44% | Val Loss=0.2554, Val Acc=89.53%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 66.71it/s, loss=0.2349]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 66.71it/s, loss=0.2349]


Epoch 2: Train Loss=0.1839, Train Acc=93.00% | Val Loss=0.2652, Val Acc=88.98%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.09it/s, loss=0.1022]



Epoch 3: Train Loss=0.0893, Train Acc=97.08% | Val Loss=0.3095, Val Acc=88.58%

Early stopping triggered after 3 epochs
Acc=89.53%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 57.99it/s, loss=0.2697]



Epoch 1: Train Loss=0.3613, Train Acc=83.68% | Val Loss=0.2565, Val Acc=89.46%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 50.36it/s, loss=0.1164]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 50.36it/s, loss=0.1164]


Epoch 2: Train Loss=0.1848, Train Acc=93.03% | Val Loss=0.2705, Val Acc=88.94%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:14<00:00, 42.57it/s, loss=0.1171]



Epoch 3: Train Loss=0.0925, Train Acc=97.07% | Val Loss=0.3221, Val Acc=88.26%

Early stopping triggered after 3 epochs
Acc=89.46%
  → Mean CV Accuracy: 89.44% (±0.09%)


Best trial: 11. Best value: 89.44:  87%|████████▋ | 13/15 [28:13<03:46, 113.47s/it]

[I 2025-11-10 01:24:14,045] Trial 12 finished with value: 89.435 and parameters: {'hidden_dim1': 384, 'hidden_dim2': 64, 'dropout_rate': 0.35410216283608925, 'learning_rate': 0.00011509651352150205, 'batch_size': 64}. Best is trial 11 with value: 89.44.

[Trial 13] Testing: hidden=320,192, dropout=0.338, lr=1.04e-04, batch=64
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 58.51it/s, loss=0.2837]



Epoch 1: Train Loss=0.3620, Train Acc=83.27% | Val Loss=0.2579, Val Acc=89.25%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.48it/s, loss=0.2367]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.48it/s, loss=0.2367]


Epoch 2: Train Loss=0.1811, Train Acc=92.91% | Val Loss=0.2728, Val Acc=89.03%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 58.62it/s, loss=0.0848]



Epoch 3: Train Loss=0.0865, Train Acc=97.15% | Val Loss=0.3267, Val Acc=88.71%

Early stopping triggered after 3 epochs
Acc=89.25%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:14<00:00, 43.87it/s, loss=0.2578]



Epoch 1: Train Loss=0.3583, Train Acc=83.83% | Val Loss=0.2552, Val Acc=89.31%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 57.32it/s, loss=0.2107]



Epoch 2: Train Loss=0.1765, Train Acc=93.23% | Val Loss=0.2722, Val Acc=88.88%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 50.20it/s, loss=0.0464]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 50.20it/s, loss=0.0464]


Epoch 3: Train Loss=0.0809, Train Acc=97.32% | Val Loss=0.3194, Val Acc=88.64%

Early stopping triggered after 3 epochs
Acc=89.31%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 64.19it/s, loss=0.3508]



Epoch 1: Train Loss=0.3570, Train Acc=83.70% | Val Loss=0.2591, Val Acc=89.31%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 73.31it/s, loss=0.1296]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 73.31it/s, loss=0.1296]


Epoch 2: Train Loss=0.1785, Train Acc=93.28% | Val Loss=0.2736, Val Acc=88.91%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 82.17it/s, loss=0.1873]



Epoch 3: Train Loss=0.0861, Train Acc=97.15% | Val Loss=0.3352, Val Acc=88.39%

Early stopping triggered after 3 epochs
Acc=89.31%
  → Mean CV Accuracy: 89.30% (±0.03%)


Best trial: 11. Best value: 89.44:  93%|█████████▎| 14/15 [30:27<01:59, 119.63s/it]

[I 2025-11-10 01:26:27,914] Trial 13 finished with value: 89.295 and parameters: {'hidden_dim1': 320, 'hidden_dim2': 192, 'dropout_rate': 0.33764300167744815, 'learning_rate': 0.00010363921055696355, 'batch_size': 64}. Best is trial 11 with value: 89.44.

[Trial 14] Testing: hidden=384,64, dropout=0.339, lr=5.29e-04, batch=64
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 64.72it/s, loss=0.1959]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 64.72it/s, loss=0.1959]


Epoch 1: Train Loss=0.3224, Train Acc=85.89% | Val Loss=0.2630, Val Acc=88.92%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 68.77it/s, loss=0.2015]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 68.77it/s, loss=0.2015]


Epoch 2: Train Loss=0.1786, Train Acc=93.03% | Val Loss=0.2833, Val Acc=88.36%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 74.76it/s, loss=0.0592]



Epoch 3: Train Loss=0.0846, Train Acc=96.98% | Val Loss=0.3875, Val Acc=88.16%

Early stopping triggered after 3 epochs
Acc=88.92%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 81.87it/s, loss=0.1899]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 81.87it/s, loss=0.1899]


Epoch 1: Train Loss=0.3234, Train Acc=86.06% | Val Loss=0.2602, Val Acc=89.31%


Epoch 2/10 [Train]:   0%|          | 0/625 [00:00<?, ?it/s][A
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 71.48it/s, loss=0.1754]


Epoch 2: Train Loss=0.1760, Train Acc=93.12% | Val Loss=0.2886, Val Acc=88.53%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.84it/s, loss=0.0668]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.84it/s, loss=0.0668]


Epoch 3: Train Loss=0.0833, Train Acc=96.93% | Val Loss=0.3756, Val Acc=88.09%

Early stopping triggered after 3 epochs
Acc=89.31%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 81.34it/s, loss=0.3068]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 81.34it/s, loss=0.3068]


Epoch 1: Train Loss=0.3198, Train Acc=86.16% | Val Loss=0.2621, Val Acc=89.17%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 75.95it/s, loss=0.2174]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 75.95it/s, loss=0.2174]


Epoch 2: Train Loss=0.1730, Train Acc=93.30% | Val Loss=0.2912, Val Acc=88.72%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 67.42it/s, loss=0.0581]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 67.42it/s, loss=0.0581]


Epoch 3: Train Loss=0.0765, Train Acc=97.25% | Val Loss=0.4010, Val Acc=88.31%

Early stopping triggered after 3 epochs
Acc=89.17%
  → Mean CV Accuracy: 89.13% (±0.16%)


Best trial: 11. Best value: 89.44: 100%|██████████| 15/15 [32:14<00:00, 128.95s/it]

[I 2025-11-10 01:28:15,190] Trial 14 finished with value: 89.13333333333334 and parameters: {'hidden_dim1': 384, 'hidden_dim2': 64, 'dropout_rate': 0.33865304013034375, 'learning_rate': 0.000529056994167218, 'batch_size': 64}. Best is trial 11 with value: 89.44.

K-FOLD CV OPTIMIZATION COMPLETE - TF-IDF + FFNN
Best mean CV accuracy: 89.44%

Best hyperparameters:
  hidden_dim1: 320
  hidden_dim2: 64
  dropout_rate: 0.3535602771439335
  learning_rate: 0.00011835846181383978
  batch_size: 64

✓ K-Fold CV optimization results saved





In [15]:
## 2.6: Train Final FFNN Model with Best Hyperparameters

print("Training final TF-IDF + FFNN model with best hyperparameters...\n")

best_params = study_ffnn.best_params

# Create final model
final_ffnn_model = FFNNClassifier(
    input_dim=X_train_tfidf.shape[1],
    hidden_dims=[best_params['hidden_dim1'], best_params['hidden_dim2']],
    dropout_rate=best_params['dropout_rate']
).to(device)

# Create dataloaders
train_dataset = torch.utils.data.TensorDataset(
    torch.FloatTensor(X_train_tfidf),
    torch.LongTensor(train_labels)
)
val_dataset = torch.utils.data.TensorDataset(
    torch.FloatTensor(X_val_tfidf),
    torch.LongTensor(val_labels)
)

train_loader = DataLoader(train_dataset, batch_size=best_params['batch_size'], shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=best_params['batch_size'])

# Optimizer and loss
optimizer = optim.Adam(final_ffnn_model.parameters(), lr=best_params['learning_rate'])
criterion = nn.CrossEntropyLoss()

# Train full model
history_ffnn, best_val_acc = train_ffnn(
    final_ffnn_model, train_loader, val_loader, optimizer, criterion,
    num_epochs=30, patience=5, device=device, save_path='models/tfidf_ffnn/best_model.pt'
)

# Save training history
with open('models/tfidf_ffnn/training_history.pkl', 'wb') as f:
    pickle.dump(history_ffnn, f)

# Save hyperparameters
with open('models/tfidf_ffnn/hyperparameters.json', 'w') as f:
    json.dump(best_params, f, indent=2)

print("\n" + "="*70)
print("TF-IDF + FFNN TRAINING COMPLETE")
print("="*70)
print(f"Best validation accuracy: {best_val_acc:.2f}%")
print("\n✓ Model saved to: models/tfidf_ffnn/")

Training final TF-IDF + FFNN model with best hyperparameters...



Epoch 1/30 [Train]: 100%|██████████| 750/750 [00:13<00:00, 55.79it/s, loss=0.1832]



Epoch 1: Train Loss=0.3527, Train Acc=84.03% | Val Loss=0.2603, Val Acc=89.26%
  ✓ New best model saved to models/tfidf_ffnn/best_model.pt (Val Acc: 89.26%)


Epoch 2/30 [Train]: 100%|██████████| 750/750 [00:09<00:00, 75.25it/s, loss=0.1938] 



Epoch 2: Train Loss=0.1891, Train Acc=92.87% | Val Loss=0.2723, Val Acc=88.61%


Epoch 3/30 [Train]: 100%|██████████| 750/750 [00:10<00:00, 70.39it/s, loss=0.1279] 



Epoch 3: Train Loss=0.1042, Train Acc=96.40% | Val Loss=0.3190, Val Acc=88.33%


Epoch 4/30 [Train]: 100%|██████████| 750/750 [00:09<00:00, 78.51it/s, loss=0.0400] 
Epoch 4/30 [Train]: 100%|██████████| 750/750 [00:09<00:00, 78.51it/s, loss=0.0400]


Epoch 4: Train Loss=0.0508, Train Acc=98.40% | Val Loss=0.3830, Val Acc=87.95%


Epoch 5/30 [Train]: 100%|██████████| 750/750 [00:14<00:00, 50.29it/s, loss=0.0356]
Epoch 5/30 [Train]: 100%|██████████| 750/750 [00:14<00:00, 50.29it/s, loss=0.0356]


Epoch 5: Train Loss=0.0288, Train Acc=99.17% | Val Loss=0.4313, Val Acc=87.92%


Epoch 6/30 [Train]: 100%|██████████| 750/750 [00:11<00:00, 63.86it/s, loss=0.0050] 
Epoch 6/30 [Train]: 100%|██████████| 750/750 [00:11<00:00, 63.86it/s, loss=0.0050]


Epoch 6: Train Loss=0.0211, Train Acc=99.33% | Val Loss=0.4624, Val Acc=87.96%

Early stopping triggered after 6 epochs

TF-IDF + FFNN TRAINING COMPLETE
Best validation accuracy: 89.26%

✓ Model saved to: models/tfidf_ffnn/


In [16]:
## 3.1: Generate E5 Embeddings

print("Loading E5 embedding model...")
print("This will download ~400MB on first run...\n")

# Load E5 model (or use MiniLM as alternative)
# E5 is stronger but larger, MiniLM is faster
embedding_model_name = 'intfloat/e5-small-v2'  # or 'sentence-transformers/all-MiniLM-L6-v2'
embedding_model = SentenceTransformer(embedding_model_name, device=device)

print(f"✓ Loaded: {embedding_model_name}")
print(f"  Embedding dimension: {embedding_model.get_sentence_embedding_dimension()}")

# Generate embeddings
print("\nGenerating embeddings...")
print("This may take 5-10 minutes...\n")

# For E5, prefix with "query: " for better performance
if 'e5' in embedding_model_name.lower():
    train_texts_prefixed = [f"query: {text}" for text in train_texts]
    val_texts_prefixed = [f"query: {text}" for text in val_texts]
else:
    train_texts_prefixed = train_texts
    val_texts_prefixed = val_texts

# Generate embeddings in batches
X_train_embeddings = embedding_model.encode(
    train_texts_prefixed,
    batch_size=64,
    show_progress_bar=True,
    convert_to_numpy=True
)

X_val_embeddings = embedding_model.encode(
    val_texts_prefixed,
    batch_size=64,
    show_progress_bar=True,
    convert_to_numpy=True
)

print(f"\n✓ Embeddings generated:")
print(f"  Train shape: {X_train_embeddings.shape}")
print(f"  Val shape:   {X_val_embeddings.shape}")

# Save embeddings (for faster re-runs)
np.save('models/e5_classifier/train_embeddings.npy', X_train_embeddings)
np.save('models/e5_classifier/val_embeddings.npy', X_val_embeddings)
print("\n✓ Embeddings saved")

Loading E5 embedding model...
This will download ~400MB on first run...

✓ Loaded: intfloat/e5-small-v2
  Embedding dimension: 384

Generating embeddings...
This may take 5-10 minutes...

✓ Loaded: intfloat/e5-small-v2
  Embedding dimension: 384

Generating embeddings...
This may take 5-10 minutes...



Batches: 100%|██████████| 750/750 [02:06<00:00,  5.94it/s]

Batches: 100%|██████████| 188/188 [00:34<00:00,  5.45it/s]




✓ Embeddings generated:
  Train shape: (48000, 384)
  Val shape:   (12000, 384)

✓ Embeddings saved


In [17]:
## 3.2: Define Classifier Head for Embeddings

class EmbeddingClassifier(nn.Module):
    """Simple classifier head for pre-computed embeddings"""
    
    def __init__(self, embedding_dim, hidden_dim, dropout_rate=0.3):
        super(EmbeddingClassifier, self).__init__()
        
        self.classifier = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.BatchNorm1d(hidden_dim),
            nn.Linear(hidden_dim, 2)  # Binary classification
        )
    
    def forward(self, x):
        return self.classifier(x)

print("✓ Embedding classifier class defined")

✓ Embedding classifier class defined


In [18]:
## 3.3: Hyperparameter Optimization for Embedding Classifier with K-Fold CV

def objective_embedding_kfold(trial):
    """Optuna objective for embedding classifier with K-Fold CV"""
    
    # Hyperparameters
    hidden_dim = trial.suggest_int('hidden_dim', 64, 256, step=32)
    dropout_rate = trial.suggest_float('dropout_rate', 0.2, 0.5)
    learning_rate = trial.suggest_loguniform('learning_rate', 1e-4, 1e-2)
    batch_size = trial.suggest_categorical('batch_size', [32, 64, 128])
    
    print(f"\n[Trial {trial.number}] Testing: hidden={hidden_dim}, "
          f"dropout={dropout_rate:.3f}, lr={learning_rate:.2e}, batch={batch_size}")
    
    # K-Fold Cross-Validation setup
    n_folds = 3
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=SEED)
    
    fold_scores = []
    
    # Combine train and val embeddings for k-fold
    X_combined = np.vstack([X_train_embeddings, X_val_embeddings])
    y_combined = np.array(train_labels + val_labels)
    
    for fold_idx, (train_idx, val_idx) in enumerate(skf.split(X_combined, y_combined)):
        print(f"  Fold {fold_idx+1}/{n_folds}...", end=" ")
        
        # Split data for this fold
        X_fold_train = X_combined[train_idx]
        y_fold_train = y_combined[train_idx]
        X_fold_val = X_combined[val_idx]
        y_fold_val = y_combined[val_idx]
        
        # Create fresh model for this fold
        model = EmbeddingClassifier(
            embedding_dim=X_train_embeddings.shape[1],
            hidden_dim=hidden_dim,
            dropout_rate=dropout_rate
        ).to(device)
        
        # Create dataloaders
        train_dataset = torch.utils.data.TensorDataset(
            torch.FloatTensor(X_fold_train),
            torch.LongTensor(y_fold_train)
        )
        val_dataset = torch.utils.data.TensorDataset(
            torch.FloatTensor(X_fold_val),
            torch.LongTensor(y_fold_val)
        )
        
        # Use drop_last=True to avoid BatchNorm issues with single-sample batches
        train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
        val_loader = DataLoader(val_dataset, batch_size=batch_size, drop_last=False)
        
        # Optimizer and loss
        optimizer = optim.Adam(model.parameters(), lr=learning_rate)
        criterion = nn.CrossEntropyLoss()
        
        # Train (reuse training function)
        _, best_val_acc = train_ffnn(
            model, train_loader, val_loader, optimizer, criterion,
            num_epochs=10, patience=2, device=device, save_path=None
        )
        
        fold_scores.append(best_val_acc)
        print(f"Acc={best_val_acc:.2f}%")
        
        # Clean up to save memory
        del model, optimizer, train_loader, val_loader
        torch.cuda.empty_cache() if torch.cuda.is_available() else None
    
    # Return mean accuracy across all folds
    mean_acc = np.mean(fold_scores)
    std_acc = np.std(fold_scores)
    print(f"  → Mean CV Accuracy: {mean_acc:.2f}% (±{std_acc:.2f}%)")
    
    return mean_acc

print("Starting K-Fold CV hyperparameter optimization for E5 classifier...")
print("Using 3-fold CV for robust hyperparameter selection")
print("This will take 15-25 minutes...\n")

# Create study
study_embedding = optuna.create_study(
    direction='maximize',
    sampler=TPESampler(seed=SEED),
    study_name='embedding_classifier_kfold_optimization'
)

# Optimize with k-fold CV
study_embedding.optimize(objective_embedding_kfold, n_trials=12, show_progress_bar=True)

print("\n" + "="*70)
print("K-FOLD CV OPTIMIZATION COMPLETE - E5 Embedding Classifier")
print("="*70)
print(f"Best mean CV accuracy: {study_embedding.best_value:.2f}%")
print(f"\nBest hyperparameters:")
for key, value in study_embedding.best_params.items():
    print(f"  {key}: {value}")

# Save study
os.makedirs('models/e5_classifier', exist_ok=True)
with open('models/e5_classifier/optuna_study_kfold.pkl', 'wb') as f:
    pickle.dump(study_embedding, f)
print("\n✓ K-Fold CV optimization results saved")

[I 2025-11-10 01:32:34,384] A new study created in memory with name: embedding_classifier_kfold_optimization


Starting K-Fold CV hyperparameter optimization for E5 classifier...
Using 3-fold CV for robust hyperparameter selection
This will take 15-25 minutes...



  0%|          | 0/12 [00:00<?, ?it/s]


[Trial 0] Testing: hidden=128, dropout=0.485, lr=2.91e-03, batch=32
  Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:16<00:00, 76.76it/s, loss=0.1688]
Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:16<00:00, 76.76it/s, loss=0.1688]


Epoch 1: Train Loss=0.2164, Train Acc=91.72% | Val Loss=0.1602, Val Acc=93.77%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:24<00:00, 50.40it/s, loss=0.2907]
Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:24<00:00, 50.40it/s, loss=0.2907]


Epoch 2: Train Loss=0.2223, Train Acc=91.31% | Val Loss=0.1519, Val Acc=94.19%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:20<00:00, 60.39it/s, loss=0.2041]



Epoch 3: Train Loss=0.2234, Train Acc=91.25% | Val Loss=0.1600, Val Acc=94.05%


Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:27<00:00, 44.86it/s, loss=0.1895]
Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:27<00:00, 44.86it/s, loss=0.1895]


Epoch 4: Train Loss=0.2102, Train Acc=91.66% | Val Loss=0.1620, Val Acc=93.50%

Early stopping triggered after 4 epochs
Acc=94.19%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:18<00:00, 67.44it/s, loss=0.1271]



Epoch 1: Train Loss=0.2200, Train Acc=91.66% | Val Loss=0.1678, Val Acc=93.97%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 72.01it/s, loss=0.1973]
Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 72.01it/s, loss=0.1973]


Epoch 2: Train Loss=0.2123, Train Acc=91.91% | Val Loss=0.1613, Val Acc=93.94%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:21<00:00, 58.33it/s, loss=0.1679]
Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:21<00:00, 58.33it/s, loss=0.1679]


Epoch 3: Train Loss=0.2093, Train Acc=91.92% | Val Loss=0.1504, Val Acc=94.19%


Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:19<00:00, 65.57it/s, loss=0.2427]
Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:19<00:00, 65.57it/s, loss=0.2427]


Epoch 4: Train Loss=0.2070, Train Acc=92.11% | Val Loss=0.1594, Val Acc=93.77%


Epoch 5/10 [Train]: 100%|██████████| 1250/1250 [00:15<00:00, 79.15it/s, loss=0.2191]
Epoch 5/10 [Train]: 100%|██████████| 1250/1250 [00:15<00:00, 79.15it/s, loss=0.2191]


Epoch 5: Train Loss=0.2067, Train Acc=91.99% | Val Loss=0.1668, Val Acc=92.94%

Early stopping triggered after 5 epochs
Acc=94.19%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:23<00:00, 53.41it/s, loss=0.2113]



Epoch 1: Train Loss=0.2268, Train Acc=91.22% | Val Loss=0.1802, Val Acc=92.96%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:24<00:00, 50.11it/s, loss=0.1604]



Epoch 2: Train Loss=0.2134, Train Acc=91.44% | Val Loss=0.1637, Val Acc=93.92%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:16<00:00, 75.70it/s, loss=0.3366]



Epoch 3: Train Loss=0.2118, Train Acc=91.42% | Val Loss=0.1601, Val Acc=93.84%


Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:19<00:00, 64.41it/s, loss=0.1643]
Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:19<00:00, 64.41it/s, loss=0.1643]
Best trial: 0. Best value: 94.1:   8%|▊         | 1/12 [05:18<58:26, 318.74s/it]

Epoch 4: Train Loss=0.2258, Train Acc=90.32% | Val Loss=0.1645, Val Acc=93.72%

Early stopping triggered after 4 epochs
Acc=93.92%
  → Mean CV Accuracy: 94.10% (±0.13%)
[I 2025-11-10 01:37:53,130] Trial 0 finished with value: 94.10000000000001 and parameters: {'hidden_dim': 128, 'dropout_rate': 0.4852142919229748, 'learning_rate': 0.0029106359131330704, 'batch_size': 32}. Best is trial 0 with value: 94.10000000000001.

[Trial 1] Testing: hidden=64, dropout=0.460, lr=1.59e-03, batch=128
  Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 50.09it/s, loss=0.2467]



Epoch 1: Train Loss=0.1927, Train Acc=92.71% | Val Loss=0.1603, Val Acc=93.92%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 45.99it/s, loss=0.2075]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 45.99it/s, loss=0.2075]


Epoch 2: Train Loss=0.1757, Train Acc=93.44% | Val Loss=0.1511, Val Acc=94.32%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 45.25it/s, loss=0.1285]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 45.25it/s, loss=0.1285]


Epoch 3: Train Loss=0.1752, Train Acc=93.48% | Val Loss=0.1529, Val Acc=94.21%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 43.44it/s, loss=0.1410]



Epoch 4: Train Loss=0.1713, Train Acc=93.60% | Val Loss=0.1510, Val Acc=94.18%

Early stopping triggered after 4 epochs
Acc=94.32%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 40.32it/s, loss=0.1630]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 40.32it/s, loss=0.1630]


Epoch 1: Train Loss=0.1963, Train Acc=92.61% | Val Loss=0.1608, Val Acc=93.99%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 45.70it/s, loss=0.1507]



Epoch 2: Train Loss=0.1796, Train Acc=93.40% | Val Loss=0.1503, Val Acc=94.09%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 68.04it/s, loss=0.1041]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 68.04it/s, loss=0.1041]


Epoch 3: Train Loss=0.1766, Train Acc=93.51% | Val Loss=0.1525, Val Acc=94.23%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 57.76it/s, loss=0.1939]
Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 57.76it/s, loss=0.1939]


Epoch 4: Train Loss=0.1771, Train Acc=93.47% | Val Loss=0.1495, Val Acc=94.17%


Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 66.77it/s, loss=0.1926]



Epoch 5: Train Loss=0.1730, Train Acc=93.68% | Val Loss=0.1529, Val Acc=93.96%

Early stopping triggered after 5 epochs
Acc=94.23%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 77.63it/s, loss=0.1937]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 77.63it/s, loss=0.1937]


Epoch 1: Train Loss=0.1931, Train Acc=92.66% | Val Loss=0.1584, Val Acc=94.11%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 92.61it/s, loss=0.1777] 
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 92.61it/s, loss=0.1777] 


Epoch 2: Train Loss=0.1696, Train Acc=93.53% | Val Loss=0.1562, Val Acc=94.03%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 72.51it/s, loss=0.1877]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 72.51it/s, loss=0.1877]
Best trial: 1. Best value: 94.2217:  17%|█▋        | 2/12 [06:40<29:52, 179.27s/it]

Epoch 3: Train Loss=0.1695, Train Acc=93.59% | Val Loss=0.1654, Val Acc=93.53%

Early stopping triggered after 3 epochs
Acc=94.11%
  → Mean CV Accuracy: 94.22% (±0.09%)
[I 2025-11-10 01:39:14,759] Trial 1 finished with value: 94.22166666666668 and parameters: {'hidden_dim': 64, 'dropout_rate': 0.45985284373248053, 'learning_rate': 0.0015930522616241021, 'batch_size': 128}. Best is trial 1 with value: 94.22166666666668.

[Trial 2] Testing: hidden=224, dropout=0.264, lr=2.31e-04, batch=128
  Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 68.27it/s, loss=0.3238]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 68.27it/s, loss=0.3238]


Epoch 1: Train Loss=0.1953, Train Acc=92.47% | Val Loss=0.1778, Val Acc=93.05%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 86.04it/s, loss=0.2358]



Epoch 2: Train Loss=0.1615, Train Acc=93.87% | Val Loss=0.1574, Val Acc=94.16%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 75.55it/s, loss=0.1507]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 75.55it/s, loss=0.1507]


Epoch 3: Train Loss=0.1566, Train Acc=94.02% | Val Loss=0.1530, Val Acc=93.98%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 74.73it/s, loss=0.1738]
Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 74.73it/s, loss=0.1738]


Epoch 4: Train Loss=0.1538, Train Acc=94.09% | Val Loss=0.1541, Val Acc=93.98%

Early stopping triggered after 4 epochs
Acc=94.16%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 53.42it/s, loss=0.0938]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 53.42it/s, loss=0.0938]


Epoch 1: Train Loss=0.1955, Train Acc=92.52% | Val Loss=0.1556, Val Acc=93.94%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 50.47it/s, loss=0.1639]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 50.47it/s, loss=0.1639]


Epoch 2: Train Loss=0.1627, Train Acc=93.86% | Val Loss=0.1556, Val Acc=93.97%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 102.07it/s, loss=0.1331]



Epoch 3: Train Loss=0.1582, Train Acc=94.12% | Val Loss=0.1509, Val Acc=94.05%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 63.50it/s, loss=0.1475]
Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 63.50it/s, loss=0.1475]


Epoch 4: Train Loss=0.1549, Train Acc=94.08% | Val Loss=0.1493, Val Acc=94.24%


Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 67.66it/s, loss=0.1588]
Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 67.66it/s, loss=0.1588]


Epoch 5: Train Loss=0.1518, Train Acc=94.17% | Val Loss=0.1523, Val Acc=93.95%


Epoch 6/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 66.60it/s, loss=0.1280]
Epoch 6/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 66.60it/s, loss=0.1280]


Epoch 6: Train Loss=0.1517, Train Acc=94.26% | Val Loss=0.1487, Val Acc=94.20%

Early stopping triggered after 6 epochs
Acc=94.24%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 63.76it/s, loss=0.0893]



Epoch 1: Train Loss=0.1930, Train Acc=92.48% | Val Loss=0.1608, Val Acc=93.90%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 72.59it/s, loss=0.1400]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 72.59it/s, loss=0.1400]


Epoch 2: Train Loss=0.1593, Train Acc=93.92% | Val Loss=0.1673, Val Acc=93.48%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 69.25it/s, loss=0.1711]



Epoch 3: Train Loss=0.1568, Train Acc=93.99% | Val Loss=0.1563, Val Acc=94.08%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 54.00it/s, loss=0.1513]
Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 54.00it/s, loss=0.1513]


Epoch 4: Train Loss=0.1528, Train Acc=94.14% | Val Loss=0.1550, Val Acc=94.09%


Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 86.70it/s, loss=0.1482]
Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 86.70it/s, loss=0.1482]


Epoch 5: Train Loss=0.1508, Train Acc=94.26% | Val Loss=0.1553, Val Acc=94.12%


Epoch 6/10 [Train]:   0%|          | 0/312 [00:00<?, ?it/s][A
Epoch 6/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 54.89it/s, loss=0.1523]


Epoch 6: Train Loss=0.1488, Train Acc=94.20% | Val Loss=0.1537, Val Acc=94.19%


Epoch 7/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 62.86it/s, loss=0.1084]
Epoch 7/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 62.86it/s, loss=0.1084]


Epoch 7: Train Loss=0.1478, Train Acc=94.33% | Val Loss=0.1543, Val Acc=94.19%


Epoch 8/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 41.07it/s, loss=0.1075]
Epoch 8/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 41.07it/s, loss=0.1075]


Epoch 8: Train Loss=0.1449, Train Acc=94.30% | Val Loss=0.1539, Val Acc=94.20%


Epoch 9/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 48.99it/s, loss=0.3749]
Epoch 9/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 48.99it/s, loss=0.3749]


Epoch 9: Train Loss=0.1464, Train Acc=94.27% | Val Loss=0.1562, Val Acc=94.09%


Epoch 10/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 38.01it/s, loss=0.1129]
Epoch 10/10 [Train]: 100%|██████████| 312/312 [00:08<00:00, 38.01it/s, loss=0.1129]
Best trial: 1. Best value: 94.2217:  25%|██▌       | 3/12 [08:49<23:27, 156.38s/it]

Epoch 10: Train Loss=0.1433, Train Acc=94.44% | Val Loss=0.1560, Val Acc=94.20%
Acc=94.20%
  → Mean CV Accuracy: 94.20% (±0.03%)
[I 2025-11-10 01:41:23,917] Trial 2 finished with value: 94.20166666666665 and parameters: {'hidden_dim': 224, 'dropout_rate': 0.26370173320348284, 'learning_rate': 0.0002310201887845295, 'batch_size': 128}. Best is trial 1 with value: 94.22166666666668.

[Trial 3] Testing: hidden=160, dropout=0.287, lr=1.67e-03, batch=128
  Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 69.58it/s, loss=0.2324]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 69.58it/s, loss=0.2324]


Epoch 1: Train Loss=0.1787, Train Acc=93.09% | Val Loss=0.1710, Val Acc=93.11%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 83.56it/s, loss=0.1498]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 83.56it/s, loss=0.1498]


Epoch 2: Train Loss=0.1674, Train Acc=93.72% | Val Loss=0.1576, Val Acc=93.89%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 93.19it/s, loss=0.1722]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 93.19it/s, loss=0.1722]


Epoch 3: Train Loss=0.1649, Train Acc=93.76% | Val Loss=0.1733, Val Acc=92.93%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 89.49it/s, loss=0.1571]
Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 89.49it/s, loss=0.1571]


Epoch 4: Train Loss=0.1630, Train Acc=93.81% | Val Loss=0.1498, Val Acc=94.22%


Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 57.41it/s, loss=0.1666] 
Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 57.41it/s, loss=0.1666] 


Epoch 5: Train Loss=0.1634, Train Acc=93.82% | Val Loss=0.1515, Val Acc=94.20%


Epoch 6/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 55.07it/s, loss=0.2132]



Epoch 6: Train Loss=0.1636, Train Acc=93.74% | Val Loss=0.1500, Val Acc=94.34%


Epoch 7/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 47.46it/s, loss=0.1438]
Epoch 7/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 47.46it/s, loss=0.1438]


Epoch 7: Train Loss=0.1602, Train Acc=93.95% | Val Loss=0.1515, Val Acc=94.17%


Epoch 8/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 49.98it/s, loss=0.1367]
Epoch 8/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 49.98it/s, loss=0.1367]


Epoch 8: Train Loss=0.1605, Train Acc=94.00% | Val Loss=0.1488, Val Acc=94.29%

Early stopping triggered after 8 epochs
Acc=94.34%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 59.28it/s, loss=0.1546]



Epoch 1: Train Loss=0.1795, Train Acc=93.28% | Val Loss=0.1562, Val Acc=93.97%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 45.64it/s, loss=0.2478]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 45.64it/s, loss=0.2478]


Epoch 2: Train Loss=0.1673, Train Acc=93.65% | Val Loss=0.2037, Val Acc=91.27%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 52.82it/s, loss=0.1496]



Epoch 3: Train Loss=0.1686, Train Acc=93.56% | Val Loss=0.1582, Val Acc=93.93%

Early stopping triggered after 3 epochs
Acc=93.97%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 69.75it/s, loss=0.1531] 
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 69.75it/s, loss=0.1531] 


Epoch 1: Train Loss=0.1789, Train Acc=93.08% | Val Loss=0.2096, Val Acc=91.58%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 81.85it/s, loss=0.1178]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 81.85it/s, loss=0.1178]


Epoch 2: Train Loss=0.1631, Train Acc=93.71% | Val Loss=0.1569, Val Acc=94.11%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 79.90it/s, loss=0.2067]



Epoch 3: Train Loss=0.1617, Train Acc=93.86% | Val Loss=0.1550, Val Acc=94.06%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 61.02it/s, loss=0.2805]
Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 61.02it/s, loss=0.2805]
Best trial: 1. Best value: 94.2217:  33%|███▎      | 4/12 [10:20<17:25, 130.68s/it]

Epoch 4: Train Loss=0.1613, Train Acc=93.80% | Val Loss=0.1562, Val Acc=94.11%

Early stopping triggered after 4 epochs
Acc=94.11%
  → Mean CV Accuracy: 94.14% (±0.15%)
[I 2025-11-10 01:42:55,188] Trial 3 finished with value: 94.14 and parameters: {'hidden_dim': 160, 'dropout_rate': 0.2873687420594126, 'learning_rate': 0.0016738085788752138, 'batch_size': 128}. Best is trial 1 with value: 94.22166666666668.

[Trial 4] Testing: hidden=160, dropout=0.436, lr=2.51e-04, batch=64
  Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 54.32it/s, loss=0.2298]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 54.32it/s, loss=0.2298]


Epoch 1: Train Loss=0.1970, Train Acc=92.56% | Val Loss=0.1565, Val Acc=94.06%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.16it/s, loss=0.1923]



Epoch 2: Train Loss=0.1699, Train Acc=93.50% | Val Loss=0.1699, Val Acc=93.14%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 69.03it/s, loss=0.1716]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 69.03it/s, loss=0.1716]


Epoch 3: Train Loss=0.1669, Train Acc=93.65% | Val Loss=0.1515, Val Acc=94.39%


Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 82.37it/s, loss=0.1313] 
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 82.37it/s, loss=0.1313] 


Epoch 4: Train Loss=0.1641, Train Acc=93.82% | Val Loss=0.1500, Val Acc=94.20%


Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 75.77it/s, loss=0.2283]



Epoch 5: Train Loss=0.1633, Train Acc=93.83% | Val Loss=0.1543, Val Acc=93.94%

Early stopping triggered after 5 epochs
Acc=94.39%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 83.47it/s, loss=0.1329]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 83.47it/s, loss=0.1329]


Epoch 1: Train Loss=0.1958, Train Acc=92.73% | Val Loss=0.1563, Val Acc=94.00%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 79.36it/s, loss=0.1427]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 79.36it/s, loss=0.1427]


Epoch 2: Train Loss=0.1694, Train Acc=93.77% | Val Loss=0.1530, Val Acc=94.05%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.88it/s, loss=0.1117]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.88it/s, loss=0.1117]


Epoch 3: Train Loss=0.1661, Train Acc=93.75% | Val Loss=0.1514, Val Acc=94.18%


Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 87.38it/s, loss=0.0918] 



Epoch 4: Train Loss=0.1625, Train Acc=93.90% | Val Loss=0.1511, Val Acc=94.16%


Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 75.36it/s, loss=0.3047]



Epoch 5: Train Loss=0.1629, Train Acc=93.82% | Val Loss=0.1506, Val Acc=94.16%

Early stopping triggered after 5 epochs
Acc=94.18%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 80.86it/s, loss=0.1688]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 80.86it/s, loss=0.1688]


Epoch 1: Train Loss=0.1953, Train Acc=92.32% | Val Loss=0.1610, Val Acc=94.00%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 53.81it/s, loss=0.1374]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 53.81it/s, loss=0.1374]


Epoch 2: Train Loss=0.1700, Train Acc=93.42% | Val Loss=0.1579, Val Acc=94.06%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 84.75it/s, loss=0.0874]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 84.75it/s, loss=0.0874]


Epoch 3: Train Loss=0.1634, Train Acc=93.78% | Val Loss=0.1577, Val Acc=93.96%


Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 61.31it/s, loss=0.0958] 
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 61.31it/s, loss=0.0958] 


Epoch 4: Train Loss=0.1600, Train Acc=93.94% | Val Loss=0.1545, Val Acc=94.19%


Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 52.84it/s, loss=0.1249]



Epoch 5: Train Loss=0.1605, Train Acc=93.93% | Val Loss=0.1613, Val Acc=93.73%


Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.53it/s, loss=0.1040]
Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.53it/s, loss=0.1040]
Best trial: 4. Best value: 94.255:  42%|████▏     | 5/12 [13:14<17:03, 146.19s/it] 

Epoch 6: Train Loss=0.1571, Train Acc=93.90% | Val Loss=0.1550, Val Acc=94.17%

Early stopping triggered after 6 epochs
Acc=94.19%
  → Mean CV Accuracy: 94.25% (±0.10%)
[I 2025-11-10 01:45:48,879] Trial 4 finished with value: 94.255 and parameters: {'hidden_dim': 160, 'dropout_rate': 0.43555278841790407, 'learning_rate': 0.00025081156860452336, 'batch_size': 64}. Best is trial 4 with value: 94.255.

[Trial 5] Testing: hidden=192, dropout=0.251, lr=1.35e-04, batch=64
  Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 49.88it/s, loss=0.0798]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 49.88it/s, loss=0.0798]


Epoch 1: Train Loss=0.2041, Train Acc=92.24% | Val Loss=0.1565, Val Acc=94.08%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 101.27it/s, loss=0.1539]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 101.27it/s, loss=0.1539]


Epoch 2: Train Loss=0.1668, Train Acc=93.67% | Val Loss=0.1536, Val Acc=94.14%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 70.79it/s, loss=0.2117]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 70.79it/s, loss=0.2117]


Epoch 3: Train Loss=0.1599, Train Acc=93.92% | Val Loss=0.1604, Val Acc=93.75%


Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 89.20it/s, loss=0.1410] 
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 89.20it/s, loss=0.1410] 


Epoch 4: Train Loss=0.1586, Train Acc=93.96% | Val Loss=0.1510, Val Acc=94.39%


Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 86.91it/s, loss=0.0787]
Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 86.91it/s, loss=0.0787]


Epoch 5: Train Loss=0.1562, Train Acc=94.04% | Val Loss=0.1497, Val Acc=94.37%


Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 48.44it/s, loss=0.1289]
Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 48.44it/s, loss=0.1289]


Epoch 6: Train Loss=0.1554, Train Acc=94.06% | Val Loss=0.1508, Val Acc=94.14%

Early stopping triggered after 6 epochs
Acc=94.39%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 60.33it/s, loss=0.3206]



Epoch 1: Train Loss=0.1967, Train Acc=92.53% | Val Loss=0.1551, Val Acc=93.98%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 91.15it/s, loss=0.2699]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 91.15it/s, loss=0.2699]


Epoch 2: Train Loss=0.1656, Train Acc=93.62% | Val Loss=0.1510, Val Acc=94.16%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 83.14it/s, loss=0.2234]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 83.14it/s, loss=0.2234]


Epoch 3: Train Loss=0.1599, Train Acc=93.85% | Val Loss=0.1509, Val Acc=94.11%


Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 91.12it/s, loss=0.0896] 
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 91.12it/s, loss=0.0896] 


Epoch 4: Train Loss=0.1582, Train Acc=94.06% | Val Loss=0.1500, Val Acc=94.14%

Early stopping triggered after 4 epochs
Acc=94.16%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.33it/s, loss=0.1043]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.33it/s, loss=0.1043]


Epoch 1: Train Loss=0.1970, Train Acc=92.39% | Val Loss=0.1691, Val Acc=93.56%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:05<00:00, 106.73it/s, loss=0.1301]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:05<00:00, 106.73it/s, loss=0.1301]


Epoch 2: Train Loss=0.1655, Train Acc=93.70% | Val Loss=0.1574, Val Acc=94.14%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 75.24it/s, loss=0.1629]



Epoch 3: Train Loss=0.1575, Train Acc=94.03% | Val Loss=0.1579, Val Acc=94.13%


Epoch 4/10 [Train]:   0%|          | 0/625 [00:00<?, ?it/s][A
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.44it/s, loss=0.1325]
Best trial: 4. Best value: 94.255:  50%|█████     | 6/12 [15:38<14:32, 145.36s/it]

Epoch 4: Train Loss=0.1551, Train Acc=94.06% | Val Loss=0.1562, Val Acc=94.11%

Early stopping triggered after 4 epochs
Acc=94.14%
  → Mean CV Accuracy: 94.23% (±0.12%)
[I 2025-11-10 01:48:12,612] Trial 5 finished with value: 94.23166666666667 and parameters: {'hidden_dim': 192, 'dropout_rate': 0.2511572371061875, 'learning_rate': 0.00013492834268013249, 'batch_size': 64}. Best is trial 4 with value: 94.255.

[Trial 6] Testing: hidden=128, dropout=0.229, lr=2.34e-03, batch=128
  Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 80.35it/s, loss=0.1200]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 80.35it/s, loss=0.1200]


Epoch 1: Train Loss=0.1780, Train Acc=93.20% | Val Loss=0.1693, Val Acc=93.47%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 93.58it/s, loss=0.2621] 
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 93.58it/s, loss=0.2621] 


Epoch 2: Train Loss=0.1698, Train Acc=93.50% | Val Loss=0.1504, Val Acc=94.19%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 92.32it/s, loss=0.0843]



Epoch 3: Train Loss=0.1650, Train Acc=93.69% | Val Loss=0.1501, Val Acc=94.32%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 80.76it/s, loss=0.1975]
Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 80.76it/s, loss=0.1975]


Epoch 4: Train Loss=0.1614, Train Acc=93.93% | Val Loss=0.1511, Val Acc=94.19%


Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 79.85it/s, loss=0.1172]
Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 79.85it/s, loss=0.1172]


Epoch 5: Train Loss=0.1596, Train Acc=94.05% | Val Loss=0.1526, Val Acc=93.97%

Early stopping triggered after 5 epochs
Acc=94.32%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 72.91it/s, loss=0.1383]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 72.91it/s, loss=0.1383]


Epoch 1: Train Loss=0.1781, Train Acc=93.22% | Val Loss=0.1631, Val Acc=93.69%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 70.37it/s, loss=0.1688]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 70.37it/s, loss=0.1688]


Epoch 2: Train Loss=0.1679, Train Acc=93.63% | Val Loss=0.1523, Val Acc=94.07%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 75.80it/s, loss=0.2127]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 75.80it/s, loss=0.2127]


Epoch 3: Train Loss=0.1634, Train Acc=93.83% | Val Loss=0.1614, Val Acc=93.78%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 83.49it/s, loss=0.1370]
Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 83.49it/s, loss=0.1370]


Epoch 4: Train Loss=0.1610, Train Acc=93.82% | Val Loss=0.1514, Val Acc=94.22%


Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 70.03it/s, loss=0.1152]
Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 70.03it/s, loss=0.1152]


Epoch 5: Train Loss=0.1604, Train Acc=93.92% | Val Loss=0.1501, Val Acc=94.11%


Epoch 6/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 77.75it/s, loss=0.1417]



Epoch 6: Train Loss=0.1601, Train Acc=94.07% | Val Loss=0.1528, Val Acc=93.90%

Early stopping triggered after 6 epochs
Acc=94.22%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 63.83it/s, loss=0.1640]



Epoch 1: Train Loss=0.1752, Train Acc=93.23% | Val Loss=0.1619, Val Acc=93.81%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 55.44it/s, loss=0.1829]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 55.44it/s, loss=0.1829]


Epoch 2: Train Loss=0.1627, Train Acc=93.76% | Val Loss=0.1661, Val Acc=93.67%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 43.67it/s, loss=0.1577]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 43.67it/s, loss=0.1577]
Best trial: 4. Best value: 94.255:  58%|█████▊    | 7/12 [16:56<10:17, 123.52s/it]

Epoch 3: Train Loss=0.1645, Train Acc=93.74% | Val Loss=0.1618, Val Acc=93.62%

Early stopping triggered after 3 epochs
Acc=93.81%
  → Mean CV Accuracy: 94.12% (±0.22%)
[I 2025-11-10 01:49:31,168] Trial 6 finished with value: 94.11666666666667 and parameters: {'hidden_dim': 128, 'dropout_rate': 0.22930163420191518, 'learning_rate': 0.0023359635026261607, 'batch_size': 128}. Best is trial 4 with value: 94.255.

[Trial 7] Testing: hidden=64, dropout=0.473, lr=3.29e-04, batch=32
  Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:18<00:00, 66.84it/s, loss=0.3579]



Epoch 1: Train Loss=0.2095, Train Acc=92.33% | Val Loss=0.1523, Val Acc=94.23%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:23<00:00, 52.71it/s, loss=0.2171]
Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:23<00:00, 52.71it/s, loss=0.2171]


Epoch 2: Train Loss=0.1817, Train Acc=93.37% | Val Loss=0.1531, Val Acc=94.25%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:22<00:00, 56.46it/s, loss=0.1004]



Epoch 3: Train Loss=0.1805, Train Acc=93.30% | Val Loss=0.1501, Val Acc=94.22%


Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:15<00:00, 78.48it/s, loss=0.0365]



Epoch 4: Train Loss=0.1795, Train Acc=93.19% | Val Loss=0.1516, Val Acc=94.28%


Epoch 5/10 [Train]: 100%|██████████| 1250/1250 [00:23<00:00, 54.21it/s, loss=0.1165]
Epoch 5/10 [Train]: 100%|██████████| 1250/1250 [00:23<00:00, 54.21it/s, loss=0.1165]


Epoch 5: Train Loss=0.1786, Train Acc=93.38% | Val Loss=0.1520, Val Acc=93.97%


Epoch 6/10 [Train]: 100%|██████████| 1250/1250 [00:18<00:00, 68.09it/s, loss=0.1505]
Epoch 6/10 [Train]: 100%|██████████| 1250/1250 [00:18<00:00, 68.09it/s, loss=0.1505]


Epoch 6: Train Loss=0.1757, Train Acc=93.37% | Val Loss=0.1510, Val Acc=94.28%

Early stopping triggered after 6 epochs
Acc=94.28%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:18<00:00, 67.75it/s, loss=0.4082]



Epoch 1: Train Loss=0.2077, Train Acc=92.37% | Val Loss=0.1555, Val Acc=94.00%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:21<00:00, 58.69it/s, loss=0.3815]
Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:21<00:00, 58.69it/s, loss=0.3815]


Epoch 2: Train Loss=0.1855, Train Acc=93.20% | Val Loss=0.1610, Val Acc=93.69%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:19<00:00, 63.51it/s, loss=0.1956]
Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:19<00:00, 63.51it/s, loss=0.1956]


Epoch 3: Train Loss=0.1823, Train Acc=93.28% | Val Loss=0.1520, Val Acc=94.05%


Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:15<00:00, 78.64it/s, loss=0.1393]



Epoch 4: Train Loss=0.1803, Train Acc=93.32% | Val Loss=0.1492, Val Acc=94.19%


Epoch 5/10 [Train]: 100%|██████████| 1250/1250 [00:25<00:00, 49.21it/s, loss=0.1233]
Epoch 5/10 [Train]: 100%|██████████| 1250/1250 [00:25<00:00, 49.21it/s, loss=0.1233]


Epoch 5: Train Loss=0.1782, Train Acc=93.50% | Val Loss=0.1572, Val Acc=93.67%


Epoch 6/10 [Train]: 100%|██████████| 1250/1250 [00:20<00:00, 61.85it/s, loss=0.0995]



Epoch 6: Train Loss=0.1755, Train Acc=93.46% | Val Loss=0.1507, Val Acc=94.03%

Early stopping triggered after 6 epochs
Acc=94.19%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 1250/1250 [00:24<00:00, 51.38it/s, loss=0.0651]



Epoch 1: Train Loss=0.2112, Train Acc=92.14% | Val Loss=0.1621, Val Acc=93.78%


Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:15<00:00, 83.11it/s, loss=0.3259]
Epoch 2/10 [Train]: 100%|██████████| 1250/1250 [00:15<00:00, 83.11it/s, loss=0.3259]


Epoch 2: Train Loss=0.1816, Train Acc=93.09% | Val Loss=0.1617, Val Acc=94.02%


Epoch 3/10 [Train]: 100%|██████████| 1250/1250 [00:20<00:00, 59.70it/s, loss=0.2258]



Epoch 3: Train Loss=0.1779, Train Acc=93.31% | Val Loss=0.1560, Val Acc=94.15%


Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 69.95it/s, loss=0.0943]
Epoch 4/10 [Train]: 100%|██████████| 1250/1250 [00:17<00:00, 69.95it/s, loss=0.0943]


Epoch 4: Train Loss=0.1762, Train Acc=93.42% | Val Loss=0.1621, Val Acc=93.72%


Epoch 5/10 [Train]: 100%|██████████| 1250/1250 [00:18<00:00, 65.84it/s, loss=0.1854]
Epoch 5/10 [Train]: 100%|██████████| 1250/1250 [00:18<00:00, 65.84it/s, loss=0.1854]
Best trial: 4. Best value: 94.255:  67%|██████▋   | 8/12 [23:43<14:15, 213.79s/it]

Epoch 5: Train Loss=0.1754, Train Acc=93.30% | Val Loss=0.1590, Val Acc=94.06%

Early stopping triggered after 5 epochs
Acc=94.15%
  → Mean CV Accuracy: 94.21% (±0.06%)
[I 2025-11-10 01:56:18,256] Trial 7 finished with value: 94.21 and parameters: {'hidden_dim': 64, 'dropout_rate': 0.47279612062363463, 'learning_rate': 0.00032927591344236165, 'batch_size': 32}. Best is trial 4 with value: 94.255.

[Trial 8] Testing: hidden=160, dropout=0.255, lr=8.69e-03, batch=64
  Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 84.73it/s, loss=0.1245]



Epoch 1: Train Loss=0.1914, Train Acc=92.67% | Val Loss=0.1625, Val Acc=93.77%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 82.41it/s, loss=0.2599]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 82.41it/s, loss=0.2599]


Epoch 2: Train Loss=0.1843, Train Acc=92.94% | Val Loss=0.1777, Val Acc=93.10%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 52.04it/s, loss=0.2637]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 52.04it/s, loss=0.2637]


Epoch 3: Train Loss=0.1800, Train Acc=93.22% | Val Loss=0.1572, Val Acc=93.99%


Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.66it/s, loss=0.1936]



Epoch 4: Train Loss=0.1811, Train Acc=93.18% | Val Loss=0.1492, Val Acc=94.24%


Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 71.44it/s, loss=0.2407]
Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 71.44it/s, loss=0.2407]


Epoch 5: Train Loss=0.1757, Train Acc=93.31% | Val Loss=0.1549, Val Acc=94.23%


Epoch 6/10 [Train]:   0%|          | 0/625 [00:00<?, ?it/s][A
Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 90.44it/s, loss=0.1646]


Epoch 6: Train Loss=0.1750, Train Acc=93.35% | Val Loss=0.1530, Val Acc=93.92%

Early stopping triggered after 6 epochs
Acc=94.24%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 86.55it/s, loss=0.0718]



Epoch 1: Train Loss=0.1912, Train Acc=92.63% | Val Loss=0.1785, Val Acc=93.04%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 77.29it/s, loss=0.0999]



Epoch 2: Train Loss=0.1826, Train Acc=93.12% | Val Loss=0.1546, Val Acc=94.18%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.42it/s, loss=0.1213]



Epoch 3: Train Loss=0.1777, Train Acc=93.28% | Val Loss=0.1612, Val Acc=93.63%


Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 82.03it/s, loss=0.1665]
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 82.03it/s, loss=0.1665]


Epoch 4: Train Loss=0.1726, Train Acc=93.48% | Val Loss=0.1702, Val Acc=92.89%

Early stopping triggered after 4 epochs
Acc=94.18%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 59.03it/s, loss=0.1602]



Epoch 1: Train Loss=0.1903, Train Acc=92.50% | Val Loss=0.1705, Val Acc=93.58%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.73it/s, loss=0.2064]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.73it/s, loss=0.2064]


Epoch 2: Train Loss=0.1796, Train Acc=93.00% | Val Loss=0.1648, Val Acc=94.19%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 56.70it/s, loss=0.2881]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 56.70it/s, loss=0.2881]


Epoch 3: Train Loss=0.1813, Train Acc=93.19% | Val Loss=0.1582, Val Acc=94.08%


Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 49.86it/s, loss=0.3474]
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 49.86it/s, loss=0.3474]
Best trial: 4. Best value: 94.255:  75%|███████▌  | 9/12 [26:15<09:43, 194.51s/it]

Epoch 4: Train Loss=0.1784, Train Acc=93.02% | Val Loss=0.1571, Val Acc=94.09%

Early stopping triggered after 4 epochs
Acc=94.19%
  → Mean CV Accuracy: 94.20% (±0.03%)
[I 2025-11-10 01:58:50,352] Trial 8 finished with value: 94.20333333333333 and parameters: {'hidden_dim': 160, 'dropout_rate': 0.2554563366576581, 'learning_rate': 0.00869299151113955, 'batch_size': 64}. Best is trial 4 with value: 94.255.

[Trial 9] Testing: hidden=192, dropout=0.477, lr=1.50e-04, batch=128
  Fold 1/3... 

Epoch 1/10 [Train]:   0%|          | 0/312 [00:00<?, ?it/s][A
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:07<00:00, 40.00it/s, loss=0.1516]


Epoch 1: Train Loss=0.2241, Train Acc=91.46% | Val Loss=0.1610, Val Acc=93.93%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 79.89it/s, loss=0.1151] 
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 79.89it/s, loss=0.1151] 


Epoch 2: Train Loss=0.1681, Train Acc=93.74% | Val Loss=0.1549, Val Acc=94.17%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 58.74it/s, loss=0.1397]
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:05<00:00, 58.74it/s, loss=0.1397]


Epoch 3: Train Loss=0.1631, Train Acc=93.76% | Val Loss=0.1530, Val Acc=94.21%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 46.02it/s, loss=0.3318]



Epoch 4: Train Loss=0.1609, Train Acc=93.92% | Val Loss=0.1625, Val Acc=93.81%


Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 46.92it/s, loss=0.1987]



Epoch 5: Train Loss=0.1582, Train Acc=94.00% | Val Loss=0.1493, Val Acc=94.39%


Epoch 6/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 44.89it/s, loss=0.1269]
Epoch 6/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 44.89it/s, loss=0.1269]


Epoch 6: Train Loss=0.1555, Train Acc=94.06% | Val Loss=0.1488, Val Acc=94.28%


Epoch 7/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 46.63it/s, loss=0.1516]
Epoch 7/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 46.63it/s, loss=0.1516]


Epoch 7: Train Loss=0.1546, Train Acc=94.11% | Val Loss=0.1494, Val Acc=94.36%

Early stopping triggered after 7 epochs
Acc=94.39%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 73.14it/s, loss=0.1928] 
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 73.14it/s, loss=0.1928] 


Epoch 1: Train Loss=0.2253, Train Acc=91.31% | Val Loss=0.1628, Val Acc=93.86%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 70.86it/s, loss=0.1649]
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 70.86it/s, loss=0.1649]


Epoch 2: Train Loss=0.1692, Train Acc=93.66% | Val Loss=0.1561, Val Acc=93.97%


Epoch 3/10 [Train]:   0%|          | 0/312 [00:00<?, ?it/s][A
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:06<00:00, 46.51it/s, loss=0.4632]


Epoch 3: Train Loss=0.1636, Train Acc=93.88% | Val Loss=0.1576, Val Acc=93.80%


Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 74.43it/s, loss=0.1458]
Epoch 4/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 74.43it/s, loss=0.1458]


Epoch 4: Train Loss=0.1609, Train Acc=93.99% | Val Loss=0.1516, Val Acc=94.25%


Epoch 5/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 69.66it/s, loss=0.1101]



Epoch 5: Train Loss=0.1579, Train Acc=94.02% | Val Loss=0.1514, Val Acc=94.08%


Epoch 6/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 69.37it/s, loss=0.2080]
Epoch 6/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 69.37it/s, loss=0.2080]


Epoch 6: Train Loss=0.1561, Train Acc=94.15% | Val Loss=0.1519, Val Acc=94.11%

Early stopping triggered after 6 epochs
Acc=94.25%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 77.47it/s, loss=0.1893]
Epoch 1/10 [Train]: 100%|██████████| 312/312 [00:04<00:00, 77.47it/s, loss=0.1893]


Epoch 1: Train Loss=0.2267, Train Acc=91.28% | Val Loss=0.1656, Val Acc=94.00%


Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 78.25it/s, loss=0.1077] 
Epoch 2/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 78.25it/s, loss=0.1077] 


Epoch 2: Train Loss=0.1661, Train Acc=93.61% | Val Loss=0.1638, Val Acc=93.97%


Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 90.65it/s, loss=0.1155] 
Epoch 3/10 [Train]: 100%|██████████| 312/312 [00:03<00:00, 90.65it/s, loss=0.1155] 
Best trial: 4. Best value: 94.255:  83%|████████▎ | 10/12 [28:00<05:33, 166.71s/it]

Epoch 3: Train Loss=0.1608, Train Acc=93.84% | Val Loss=0.1594, Val Acc=93.98%

Early stopping triggered after 3 epochs
Acc=94.00%
  → Mean CV Accuracy: 94.21% (±0.16%)
[I 2025-11-10 02:00:34,815] Trial 9 finished with value: 94.21499999999999 and parameters: {'hidden_dim': 192, 'dropout_rate': 0.47656227050693506, 'learning_rate': 0.00015030900645056822, 'batch_size': 128}. Best is trial 4 with value: 94.255.

[Trial 10] Testing: hidden=256, dropout=0.402, lr=5.46e-04, batch=64
  Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 96.29it/s, loss=0.0787]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 96.29it/s, loss=0.0787]


Epoch 1: Train Loss=0.1867, Train Acc=92.94% | Val Loss=0.1552, Val Acc=94.25%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 99.50it/s, loss=0.1579]A
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 99.50it/s, loss=0.1579]


Epoch 2: Train Loss=0.1696, Train Acc=93.64% | Val Loss=0.1521, Val Acc=94.19%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 60.57it/s, loss=0.2102]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 60.57it/s, loss=0.2102]


Epoch 3: Train Loss=0.1683, Train Acc=93.66% | Val Loss=0.1509, Val Acc=94.28%


Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 92.48it/s, loss=0.1656]
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 92.48it/s, loss=0.1656]


Epoch 4: Train Loss=0.1663, Train Acc=93.69% | Val Loss=0.1490, Val Acc=94.38%


Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 87.36it/s, loss=0.1181]
Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 87.36it/s, loss=0.1181]


Epoch 5: Train Loss=0.1651, Train Acc=93.68% | Val Loss=0.1491, Val Acc=94.29%


Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.15it/s, loss=0.1802]
Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.15it/s, loss=0.1802]


Epoch 6: Train Loss=0.1662, Train Acc=93.73% | Val Loss=0.1510, Val Acc=94.08%

Early stopping triggered after 6 epochs
Acc=94.38%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 103.68it/s, loss=0.1237]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 103.68it/s, loss=0.1237]


Epoch 1: Train Loss=0.1856, Train Acc=93.04% | Val Loss=0.1623, Val Acc=93.65%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 65.54it/s, loss=0.1348]
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 65.54it/s, loss=0.1348]


Epoch 2: Train Loss=0.1725, Train Acc=93.54% | Val Loss=0.1547, Val Acc=94.00%


[A
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 65.37it/s, loss=0.1583]


Epoch 3: Train Loss=0.1694, Train Acc=93.58% | Val Loss=0.1525, Val Acc=94.16%


Epoch 4/10 [Train]:   0%|          | 0/625 [00:00<?, ?it/s][A
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 65.37it/s, loss=0.2454]


Epoch 4: Train Loss=0.1677, Train Acc=93.72% | Val Loss=0.1538, Val Acc=93.94%


Epoch 5/10 [Train]:   0%|          | 0/625 [00:00<?, ?it/s][A
Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 59.55it/s, loss=0.2217]


Epoch 5: Train Loss=0.1687, Train Acc=93.69% | Val Loss=0.1492, Val Acc=94.17%


Epoch 6/10 [Train]:   0%|          | 0/625 [00:00<?, ?it/s][A
Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:10<00:00, 60.46it/s, loss=0.2285]


Epoch 6: Train Loss=0.1674, Train Acc=93.72% | Val Loss=0.1516, Val Acc=94.03%


[A
Epoch 7/10 [Train]: 100%|██████████| 625/625 [00:05<00:00, 118.80it/s, loss=0.1264]


Epoch 7: Train Loss=0.1653, Train Acc=93.78% | Val Loss=0.1511, Val Acc=94.06%

Early stopping triggered after 7 epochs
Acc=94.17%
  Fold 3/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 63.77it/s, loss=0.1078]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 63.77it/s, loss=0.1078]


Epoch 1: Train Loss=0.1842, Train Acc=92.93% | Val Loss=0.1639, Val Acc=93.99%


[A
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 50.27it/s, loss=0.2622]


Epoch 2: Train Loss=0.1679, Train Acc=93.69% | Val Loss=0.1619, Val Acc=93.78%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 80.34it/s, loss=0.1023]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 80.34it/s, loss=0.1023]
Best trial: 4. Best value: 94.255:  92%|█████████▏| 11/12 [30:50<02:47, 167.62s/it]

Epoch 3: Train Loss=0.1632, Train Acc=93.69% | Val Loss=0.1669, Val Acc=93.55%

Early stopping triggered after 3 epochs
Acc=93.99%
  → Mean CV Accuracy: 94.18% (±0.16%)
[I 2025-11-10 02:03:24,501] Trial 10 finished with value: 94.18 and parameters: {'hidden_dim': 256, 'dropout_rate': 0.4024391203645334, 'learning_rate': 0.0005463331403485196, 'batch_size': 64}. Best is trial 4 with value: 94.255.

[Trial 11] Testing: hidden=192, dropout=0.355, lr=1.07e-04, batch=64
  Fold 1/3...   Fold 1/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 67.04it/s, loss=0.2316]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:09<00:00, 67.04it/s, loss=0.2316]


Epoch 1: Train Loss=0.2142, Train Acc=91.93% | Val Loss=0.1580, Val Acc=94.04%


Epoch 2/10 [Train]:   0%|          | 0/625 [00:00<?, ?it/s][A
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:13<00:00, 47.26it/s, loss=0.2130]


Epoch 2: Train Loss=0.1683, Train Acc=93.65% | Val Loss=0.1567, Val Acc=94.14%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 51.49it/s, loss=0.0654]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:12<00:00, 51.49it/s, loss=0.0654]


Epoch 3: Train Loss=0.1635, Train Acc=93.85% | Val Loss=0.1514, Val Acc=94.30%


Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 54.43it/s, loss=0.2106]
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:11<00:00, 54.43it/s, loss=0.2106]


Epoch 4: Train Loss=0.1619, Train Acc=93.71% | Val Loss=0.1516, Val Acc=94.25%


Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 77.66it/s, loss=0.2772] 
Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 77.66it/s, loss=0.2772] 


Epoch 5: Train Loss=0.1585, Train Acc=93.89% | Val Loss=0.1492, Val Acc=94.25%

Early stopping triggered after 5 epochs
Acc=94.30%
  Fold 2/3... 

Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.53it/s, loss=0.1463]
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.53it/s, loss=0.1463]


Epoch 1: Train Loss=0.2151, Train Acc=91.59% | Val Loss=0.1574, Val Acc=93.94%


[A
Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 82.44it/s, loss=0.0464]


Epoch 2: Train Loss=0.1679, Train Acc=93.68% | Val Loss=0.1533, Val Acc=94.11%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 82.72it/s, loss=0.1601]
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 82.72it/s, loss=0.1601]


Epoch 3: Train Loss=0.1644, Train Acc=93.84% | Val Loss=0.1522, Val Acc=94.12%


Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.06it/s, loss=0.1171]A
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.06it/s, loss=0.1171]


Epoch 4: Train Loss=0.1615, Train Acc=93.95% | Val Loss=0.1509, Val Acc=94.11%


Epoch 5/10 [Train]:   0%|          | 0/625 [00:00<?, ?it/s][A
Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 86.02it/s, loss=0.1280]


Epoch 5: Train Loss=0.1606, Train Acc=93.86% | Val Loss=0.1492, Val Acc=94.28%


Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 89.67it/s, loss=0.1887]
Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:06<00:00, 89.67it/s, loss=0.1887]


Epoch 6: Train Loss=0.1571, Train Acc=94.12% | Val Loss=0.1491, Val Acc=94.17%


Epoch 7/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 72.26it/s, loss=0.1597]A



Epoch 7: Train Loss=0.1554, Train Acc=94.03% | Val Loss=0.1502, Val Acc=94.21%

Early stopping triggered after 7 epochs
Acc=94.28%
  Fold 3/3... 

[A
Epoch 1/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 77.59it/s, loss=0.1660]


Epoch 1: Train Loss=0.2176, Train Acc=91.60% | Val Loss=0.1636, Val Acc=93.88%


Epoch 2/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 80.39it/s, loss=0.1675][A



Epoch 2: Train Loss=0.1677, Train Acc=93.64% | Val Loss=0.1614, Val Acc=93.94%


Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.96it/s, loss=0.1652] 
Epoch 3/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 76.96it/s, loss=0.1652] 


Epoch 3: Train Loss=0.1609, Train Acc=93.97% | Val Loss=0.1567, Val Acc=94.08%


[A
Epoch 4/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 78.91it/s, loss=0.1920]


Epoch 4: Train Loss=0.1578, Train Acc=94.03% | Val Loss=0.1581, Val Acc=94.12%


Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 78.31it/s, loss=0.1753]
Epoch 5/10 [Train]: 100%|██████████| 625/625 [00:07<00:00, 78.31it/s, loss=0.1753]


Epoch 5: Train Loss=0.1581, Train Acc=93.92% | Val Loss=0.1559, Val Acc=94.11%


Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 75.64it/s, loss=0.1089]
Epoch 6/10 [Train]: 100%|██████████| 625/625 [00:08<00:00, 75.64it/s, loss=0.1089]
Best trial: 4. Best value: 94.255: 100%|██████████| 12/12 [34:03<00:00, 170.27s/it]

Epoch 6: Train Loss=0.1537, Train Acc=94.16% | Val Loss=0.1587, Val Acc=94.08%

Early stopping triggered after 6 epochs
Acc=94.12%
  → Mean CV Accuracy: 94.23% (±0.08%)
[I 2025-11-10 02:06:37,570] Trial 11 finished with value: 94.23333333333333 and parameters: {'hidden_dim': 192, 'dropout_rate': 0.3548125484709944, 'learning_rate': 0.00010744583847151158, 'batch_size': 64}. Best is trial 4 with value: 94.255.

K-FOLD CV OPTIMIZATION COMPLETE - E5 Embedding Classifier
Best mean CV accuracy: 94.25%

Best hyperparameters:
  hidden_dim: 160
  dropout_rate: 0.43555278841790407
  learning_rate: 0.00025081156860452336
  batch_size: 64

✓ K-Fold CV optimization results saved





In [19]:
## 3.4: Train Final E5 Classifier

print("Training final E5 classifier with best hyperparameters...\n")

best_params = study_embedding.best_params

# Create final model
final_embedding_model = EmbeddingClassifier(
    embedding_dim=X_train_embeddings.shape[1],
    hidden_dim=best_params['hidden_dim'],
    dropout_rate=best_params['dropout_rate']
).to(device)

# Create dataloaders
train_dataset = torch.utils.data.TensorDataset(
    torch.FloatTensor(X_train_embeddings),
    torch.LongTensor(train_labels)
)
val_dataset = torch.utils.data.TensorDataset(
    torch.FloatTensor(X_val_embeddings),
    torch.LongTensor(val_labels)
)

train_loader = DataLoader(train_dataset, batch_size=best_params['batch_size'], shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=best_params['batch_size'])

# Optimizer and loss
optimizer = optim.Adam(final_embedding_model.parameters(), lr=best_params['learning_rate'])
criterion = nn.CrossEntropyLoss()

# Train
history_embedding, best_val_acc = train_ffnn(
    final_embedding_model, train_loader, val_loader, optimizer, criterion,
    num_epochs=30, patience=5, device=device, save_path='models/e5_classifier/best_model.pt'
)

# Save everything
with open('models/e5_classifier/training_history.pkl', 'wb') as f:
    pickle.dump(history_embedding, f)

with open('models/e5_classifier/hyperparameters.json', 'w') as f:
    json.dump(best_params, f, indent=2)

# Save model name for later use
with open('models/e5_classifier/embedding_model_name.txt', 'w') as f:
    f.write(embedding_model_name)

print("\n" + "="*70)
print("E5 EMBEDDING CLASSIFIER TRAINING COMPLETE")
print("="*70)
print(f"Best validation accuracy: {best_val_acc:.2f}%")
print("\n✓ Model saved to: models/e5_classifier/")

Training final E5 classifier with best hyperparameters...



Epoch 1/30 [Train]: 100%|██████████| 750/750 [00:07<00:00, 99.50it/s, loss=0.1811] 
Epoch 1/30 [Train]: 100%|██████████| 750/750 [00:07<00:00, 99.50it/s, loss=0.1811] 


Epoch 1: Train Loss=0.1905, Train Acc=92.70% | Val Loss=0.1578, Val Acc=94.08%
  ✓ New best model saved to models/e5_classifier/best_model.pt (Val Acc: 94.08%)


Epoch 2/30 [Train]: 100%|██████████| 750/750 [00:07<00:00, 98.17it/s, loss=0.1213] 
Epoch 2/30 [Train]: 100%|██████████| 750/750 [00:07<00:00, 98.17it/s, loss=0.1213] 


Epoch 2: Train Loss=0.1683, Train Acc=93.70% | Val Loss=0.1601, Val Acc=93.69%


Epoch 3/30 [Train]: 100%|██████████| 750/750 [00:10<00:00, 74.74it/s, loss=0.1040] 
Epoch 3/30 [Train]: 100%|██████████| 750/750 [00:10<00:00, 74.74it/s, loss=0.1040]


Epoch 3: Train Loss=0.1653, Train Acc=93.76% | Val Loss=0.1533, Val Acc=93.96%


Epoch 4/30 [Train]: 100%|██████████| 750/750 [00:09<00:00, 79.36it/s, loss=0.1303] 



Epoch 4: Train Loss=0.1625, Train Acc=93.79% | Val Loss=0.1529, Val Acc=94.05%


Epoch 5/30 [Train]: 100%|██████████| 750/750 [00:11<00:00, 63.30it/s, loss=0.1679] 
Epoch 5/30 [Train]: 100%|██████████| 750/750 [00:11<00:00, 63.30it/s, loss=0.1679]


Epoch 5: Train Loss=0.1592, Train Acc=93.96% | Val Loss=0.1510, Val Acc=94.08%


Epoch 6/30 [Train]: 100%|██████████| 750/750 [00:06<00:00, 110.45it/s, loss=0.1956]
Epoch 6/30 [Train]: 100%|██████████| 750/750 [00:06<00:00, 110.45it/s, loss=0.1956]


Epoch 6: Train Loss=0.1587, Train Acc=93.92% | Val Loss=0.1535, Val Acc=94.04%

Early stopping triggered after 6 epochs

E5 EMBEDDING CLASSIFIER TRAINING COMPLETE
Best validation accuracy: 94.08%

✓ Model saved to: models/e5_classifier/


---
## Part 4: Model 3 - BERT Fine-Tuning with K-Fold CV Optimization

Fine-tune BERT (bert-base-uncased) for sentiment classification.
Uses K-Fold CV to find optimal hyperparameters for better performance.

**Key Improvements:**
- 2-fold stratified CV for hyperparameter search (faster than 3-fold for BERT)
- Optimizes: learning rate, weight decay, warmup ratio, epochs, batch size
- More robust than fixed hyperparameters

---

In [20]:
## 4.1: Prepare BERT Dataset

class SentimentDataset(Dataset):
    """Dataset class for BERT tokenization"""
    
    def __init__(self, texts, labels, tokenizer, max_length=256):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
    
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = self.labels[idx]
        
        encoding = self.tokenizer(
            text,
            add_special_tokens=True,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
        )
        
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }

print("✓ BERT dataset class defined")

✓ BERT dataset class defined


In [1]:
## 4.2: Load BERT Tokenizer and Model

print("Loading BERT model and tokenizer...")
print("This will download ~440MB on first run...\n")

model_name = 'bert-base-uncased'

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model for sequence classification
bert_model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,
    output_attentions=False,
    output_hidden_states=False
)

print(f"✓ Loaded: {model_name}")
print(f"  Total parameters: {sum(p.numel() for p in bert_model.parameters()):,}")
print(f"  Trainable parameters: {sum(p.numel() for p in bert_model.parameters() if p.requires_grad):,}")

Loading BERT model and tokenizer...
This will download ~440MB on first run...



NameError: name 'AutoTokenizer' is not defined

In [None]:
## 4.3: Create BERT Datasets

print("Creating tokenized datasets...")
print("This may take 2-3 minutes...\n")

# Create datasets
train_dataset_bert = SentimentDataset(
    texts=train_texts,
    labels=train_labels,
    tokenizer=tokenizer,
    max_length=256
)

val_dataset_bert = SentimentDataset(
    texts=val_texts,
    labels=val_labels,
    tokenizer=tokenizer,
    max_length=256
)

print(f"✓ Datasets created:")
print(f"  Train: {len(train_dataset_bert):,} samples")
print(f"  Val:   {len(val_dataset_bert):,} samples")

# Test tokenization
sample = train_dataset_bert[0]
print(f"\nSample tokenization:")
print(f"  Input IDs shape: {sample['input_ids'].shape}")
print(f"  Attention mask shape: {sample['attention_mask'].shape}")
print(f"  Label: {sample['labels']}")

Creating tokenized datasets...
This may take 2-3 minutes...

✓ Datasets created:
  Train: 48,000 samples
  Val:   12,000 samples

Sample tokenization:
  Input IDs shape: torch.Size([256])
  Attention mask shape: torch.Size([256])
  Label: 0


In [None]:
## 4.4: Define Metrics for BERT Training

def compute_metrics(eval_pred):
    """Compute metrics for BERT evaluation"""
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average='binary'
    )
    acc = accuracy_score(labels, predictions)
    
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

print("✓ Metrics function defined")

✓ Metrics function defined


In [None]:
%pip install 'accelerate>={ACCELERATE_MIN_VERSION}

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


/bin/bash: -c: line 0: unexpected EOF while looking for matching `''
/bin/bash: -c: line 1: syntax error: unexpected end of file
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [None]:
## 4.5: BERT Hyperparameter Optimization with K-Fold CV

def objective_bert_kfold(trial):
    """Optuna objective for BERT with K-Fold CV
    
    Uses stratified 2-fold CV (not 3-fold due to BERT's computational cost).
    Optimizes key hyperparameters that affect BERT performance.
    """
    
    # Hyperparameters to optimize
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 5e-5, log=True)
    weight_decay = trial.suggest_float('weight_decay', 0.0, 0.1)
    warmup_ratio = trial.suggest_float('warmup_ratio', 0.0, 0.2)
    num_epochs = trial.suggest_int('num_epochs', 2, 4)
    batch_size = trial.suggest_categorical('batch_size', [8, 16])
    
    print(f"\n[Trial {trial.number}] Testing: lr={learning_rate:.2e}, wd={weight_decay:.3f}, "
          f"warmup={warmup_ratio:.2f}, epochs={num_epochs}, batch={batch_size}")
    
    # Use 2-fold CV for BERT (computational efficiency)
    n_folds = 2
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=SEED)
    
    fold_scores = []
    
    # Combine train and val for k-fold
    combined_texts = train_texts + val_texts
    combined_labels = train_labels + val_labels
    
    for fold_idx, (train_idx, val_idx) in enumerate(skf.split(combined_texts, combined_labels)):
        print(f"  Fold {fold_idx+1}/{n_folds}...", end=" ")
        
        # Create datasets for this fold
        fold_train_texts = [combined_texts[i] for i in train_idx]
        fold_train_labels = [combined_labels[i] for i in train_idx]
        fold_val_texts = [combined_texts[i] for i in val_idx]
        fold_val_labels = [combined_labels[i] for i in val_idx]
        
        fold_train_dataset = SentimentDataset(
            texts=fold_train_texts,
            labels=fold_train_labels,
            tokenizer=tokenizer,
            max_length=256
        )
        fold_val_dataset = SentimentDataset(
            texts=fold_val_texts,
            labels=fold_val_labels,
            tokenizer=tokenizer,
            max_length=256
        )
        
        # Load fresh BERT model for this fold
        fold_model = AutoModelForSequenceClassification.from_pretrained(
            'bert-base-uncased',
            num_labels=2,
            output_attentions=False,
            output_hidden_states=False
        )
        
        # Training arguments for this fold
        fold_training_args = TrainingArguments(
            output_dir=f'models/bert_finetuned/trial_{trial.number}_fold_{fold_idx}',
            num_train_epochs=num_epochs,
            per_device_train_batch_size=batch_size,
            per_device_eval_batch_size=32,
            warmup_ratio=warmup_ratio,
            weight_decay=weight_decay,
            learning_rate=learning_rate,
            logging_steps=200,
            evaluation_strategy='epoch',
            save_strategy='epoch',
            save_total_limit=1,
            load_best_model_at_end=True,
            metric_for_best_model='f1',
            greater_is_better=True,
            fp16=torch.cuda.is_available(),
            seed=SEED,
            report_to='none',  # Disable wandb/tensorboard during tuning
            logging_dir=None
        )
        
        # Create trainer
        fold_trainer = Trainer(
            model=fold_model,
            args=fold_training_args,
            train_dataset=fold_train_dataset,
            eval_dataset=fold_val_dataset,
            compute_metrics=compute_metrics,
            callbacks=[EarlyStoppingCallback(early_stopping_patience=1)]
        )
        
        # Train
        fold_trainer.train()
        
        # Evaluate
        eval_results = fold_trainer.evaluate()
        fold_f1 = eval_results['eval_f1']
        fold_scores.append(fold_f1)
        
        print(f"F1={fold_f1:.4f}")
        
        # Clean up
        del fold_model, fold_trainer
        torch.cuda.empty_cache() if torch.cuda.is_available() else None
        
        # Remove temporary checkpoint directory
        import shutil
        if os.path.exists(f'models/bert_finetuned/trial_{trial.number}_fold_{fold_idx}'):
            shutil.rmtree(f'models/bert_finetuned/trial_{trial.number}_fold_{fold_idx}')
    
    # Return mean F1 score across folds
    mean_f1 = np.mean(fold_scores)
    std_f1 = np.std(fold_scores)
    print(f"  → Mean CV F1: {mean_f1:.4f} (±{std_f1:.4f})")
    
    return mean_f1

print("Starting K-Fold CV hyperparameter optimization for BERT...")
print("Using 2-fold CV for computational efficiency")
print("This will take 1-3 hours depending on your hardware...\n")
print("NOTE: You can reduce n_trials to 3-5 for faster testing\n")

# Create study
study_bert = optuna.create_study(
    direction='maximize',
    sampler=TPESampler(seed=SEED),
    study_name='bert_kfold_optimization'
)

# Optimize with k-fold CV (fewer trials due to computational cost)
study_bert.optimize(objective_bert_kfold, n_trials=8, show_progress_bar=True)

print("\n" + "="*70)
print("K-FOLD CV OPTIMIZATION COMPLETE - BERT")
print("="*70)
print(f"Best mean CV F1: {study_bert.best_value:.4f}")
print(f"\nBest hyperparameters:")
for key, value in study_bert.best_params.items():
    print(f"  {key}: {value}")

# Save study results
os.makedirs('models/bert_finetuned', exist_ok=True)
with open('models/bert_finetuned/optuna_study_kfold.pkl', 'wb') as f:
    pickle.dump(study_bert, f)

# Save best params
with open('models/bert_finetuned/best_hyperparameters.json', 'w') as f:
    json.dump(study_bert.best_params, f, indent=2)

print("\n✓ K-Fold CV optimization results saved")

[I 2025-11-10 02:07:44,486] A new study created in memory with name: bert_kfold_optimization


Starting K-Fold CV hyperparameter optimization for BERT...
Using 2-fold CV for computational efficiency
This will take 1-3 hours depending on your hardware...

NOTE: You can reduce n_trials to 3-5 for faster testing



  0%|          | 0/8 [00:00<?, ?it/s]


[Trial 0] Testing: lr=1.83e-05, wd=0.095, warmup=0.15, epochs=3, batch=8
  Fold 1/2...   Fold 1/2... 

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
  0%|          | 0/8 [00:10<?, ?it/s]



[W 2025-11-10 02:07:55,203] Trial 0 failed with parameters: {'learning_rate': 1.827226177606625e-05, 'weight_decay': 0.09507143064099162, 'warmup_ratio': 0.146398788362281, 'num_epochs': 3, 'batch_size': 8} because of the following error: OutOfMemoryError('Caught OutOfMemoryError in replica 1 on device 1.\nOriginal Traceback (most recent call last):\n  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 84, in _worker\n    output = module(*input, **kwargs)\n  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl\n    return self._call_impl(*args, **kwargs)\n  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl\n    return forward_call(*args, **kwargs)\n  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line

OutOfMemoryError: Caught OutOfMemoryError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 84, in _worker
    output = module(*input, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1668, in forward
    outputs = self.bert(
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1142, in forward
    encoder_outputs = self.encoder(
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 695, in forward
    layer_outputs = layer_module(
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 585, in forward
    self_attention_outputs = self.attention(
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 515, in forward
    self_outputs = self.self(
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 409, in forward
    value_layer = self.transpose_for_scores(self.value(current_states))
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home3/rudrapra001/FYP-Research/.venv/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 117, in forward
    return F.linear(input, self.weight, self.bias)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 1 has a total capacity of 79.25 GiB of which 5.94 MiB is free. Process 2669594 has 77.63 GiB memory in use. Including non-PyTorch memory, this process has 1.59 GiB memory in use. Of the allocated memory 865.21 MiB is allocated by PyTorch, and 22.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)


In [None]:
## 4.6: Train Final BERT Model with Best Hyperparameters

print("\nTraining final BERT model with best hyperparameters from K-Fold CV...")
print("This will take 30-60 minutes (or 2-4 hours on CPU)...\n")

best_params = study_bert.best_params

# Calculate warmup steps from warmup ratio
total_steps = (len(train_dataset_bert) // best_params['batch_size']) * best_params['num_epochs']
warmup_steps = int(total_steps * best_params['warmup_ratio'])

print(f"Using optimized hyperparameters:")
print(f"  Learning rate: {best_params['learning_rate']:.2e}")
print(f"  Weight decay: {best_params['weight_decay']:.3f}")
print(f"  Warmup ratio: {best_params['warmup_ratio']:.2f} ({warmup_steps} steps)")
print(f"  Epochs: {best_params['num_epochs']}")
print(f"  Batch size: {best_params['batch_size']}\n")

# Load fresh BERT model for final training
bert_model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=2,
    output_attentions=False,
    output_hidden_states=False
)

# Training arguments with optimized hyperparameters
training_args = TrainingArguments(
    output_dir='models/bert_finetuned',
    num_train_epochs=best_params['num_epochs'],
    per_device_train_batch_size=best_params['batch_size'],
    per_device_eval_batch_size=32,
    warmup_steps=warmup_steps,
    weight_decay=best_params['weight_decay'],
    learning_rate=best_params['learning_rate'],
    logging_dir='outputs/training_logs/bert',
    logging_steps=100,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model='f1',
    greater_is_better=True,
    fp16=torch.cuda.is_available(),
    seed=SEED
)

# Create Trainer
trainer = Trainer(
    model=bert_model,
    args=training_args,
    train_dataset=train_dataset_bert,
    eval_dataset=val_dataset_bert,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)

# Train
train_result = trainer.train()

# Save model
trainer.save_model('models/bert_finetuned/final_model')
tokenizer.save_pretrained('models/bert_finetuned/final_model')

# Save training metrics
with open('models/bert_finetuned/training_metrics.json', 'w') as f:
    json.dump(train_result.metrics, f, indent=2)

print("\n" + "="*70)
print("BERT FINE-TUNING COMPLETE")
print("="*70)
print(f"Training loss: {train_result.metrics['train_loss']:.4f}")
print("\n✓ Model saved to: models/bert_finetuned/final_model/")

In [None]:
## 4.7: Evaluate BERT on Validation Set

print("Evaluating BERT on validation set...\n")

eval_results = trainer.evaluate()

print("Validation Results:")
print(f"  Accuracy:  {eval_results['eval_accuracy']:.4f}")
print(f"  F1 Score:  {eval_results['eval_f1']:.4f}")
print(f"  Precision: {eval_results['eval_precision']:.4f}")
print(f"  Recall:    {eval_results['eval_recall']:.4f}")

# Save evaluation results
with open('models/bert_finetuned/validation_metrics.json', 'w') as f:
    json.dump(eval_results, f, indent=2)

print("\n✓ Evaluation complete")

---
## ✅ Notebook 2 Complete!

### What we accomplished:
1. ✓ Trained TF-IDF + FFNN classifier with Optuna optimization
2. ✓ Trained E5 embedding classifier with Optuna optimization
3. ✓ Fine-tuned BERT for sentiment analysis
4. ✓ Saved all models and hyperparameters
5. ✓ Achieved strong validation performance on multi-domain data

### Models Saved:
- `models/tfidf_ffnn/` - TF-IDF + FFNN classifier
- `models/e5_classifier/` - E5 embedding classifier
- `models/bert_finetuned/` - Fine-tuned BERT model

### Next Steps:
→ **Notebook 3**: Evaluation & Domain Adaptation Analysis
- Load all trained models
- In-domain evaluation (IMDB + Yelp test sets)
- **Cross-domain evaluation** (Amazon test set)
- Performance comparison and visualization

---