# 🧪 MS/MS Spectra Modification Classifier with Transformers

This notebook builds and trains a deep learning model designed to detect and classify *post-translational modifications (PTMs)* in MS/MS spectra from shotgun proteomics. The input is MS/MS spectra in `mgf` format and the classifier is based on a hybrid CNN-Transformer architecture.

---

### 🧠 Objectives

- **Multi-class Classification**: Predict whether each spectrum is unmodified or carries one of the following modifications:
  - Unmodified
  - Oxidation
  - Phosphorylation
  - Ubiquitination
  - Acetylation

---

### 🔧 Environment Setup

The following libraries and paths are configured:

#### 📦 Core Libraries
- `torch`, `torch.nn`, `torch.optim`: PyTorch for neural network construction and training.
- `numpy`, `random`, `os`, `sys`: Utilities for array operations, randomness, and file handling.
- `math`, `datetime`, `logging`: Math functions, timestamping, and logging system.
- `matplotlib.pyplot`: (optional) Visualization.
- `scikit-learn`: Evaluation metrics and dataset splitting.


#### 🛠️ Path Configuration
- Adds the dataset directory on Google Drive to the system path to ensure data files can be accessed during training and evaluation.

---

### 🧬 Pipeline Overview

This project includes the following components:
- **MGF File Parsing**: Custom loader to extract raw spectra from the `.mgf` files in the dataset.
- **Spectral Preprocessing**: Converts spectra into binned, normalized vector representations.
- **Metadata Normalization**: Processes and scales parent ion mass (`pepmass`) for model input.
- **Hybrid CNN-Transformer Model**: Hybrid neural architecture combining CNNs, self-attention, and metadata fusion.
- **Training & Evaluation**: Loop with weighted loss, custom metrics, logging, and model checkpointing.

This setup is tailored for PTM classification on Google Colab with GPU acceleration, with hyperparameters tuned using Optuna.



---

## 📁 Directory Setup Instructions

Before running the notebook, ensure your **Google Drive** is properly structured so that the code can:

* Load `.mgf` spectra files.
* Save model weights.
* Persist log files from training.

This is **required** for the notebook to run end-to-end.

---

### 🔗 1. Mount Google Drive

At the beginning of your notebook, run:

```python
from google.colab import drive
drive.mount('/content/drive')
```

You will be prompted to authorize access.

---

### 📂 2. Create This Folder Structure in Your Drive

Organize your files inside `MyDrive` as follows:

```
MyDrive/
├── data/
│   └── balanced_dataset/                ← .mgf files for training; the class distribution does not need to be balanced, but balancing helps training performance
│       ├── split_file_001.mgf
│       ├── split_file_002.mgf
│       └── ...
├── peak_encoder_transformer_pipeline/
│   ├── model_weights/                   ← for saving trained model weights
│   └── logs/                            ← for saving training logs
```

If these folders don't exist, you can create them manually in Google Drive or use Python:

```python
import os

os.makedirs("/content/drive/MyDrive/data/balanced_dataset", exist_ok=True)
os.makedirs("/content/drive/MyDrive/peak_encoder_transformer_pipeline/model_weights", exist_ok=True)
os.makedirs("/content/drive/MyDrive/peak_encoder_transformer_pipeline/logs", exist_ok=True)
```

---

### ⚙️ 3. Update Paths in the Code (if needed)

These variables should point to the correct folders:

```python
input_dir = "your split dataset path"
model_weights_dir = "path where the model weights are saved"
log_dir = "path where the per-batch training logs are written"
```

Make sure the paths you substitute are compatible with what each variable is expected to point to.
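
A quick sanity check before training can catch a mistyped path early. The sketch below is only an illustration: `missing_dirs` is a hypothetical helper that is not part of the notebook, and the temporary directories stand in for the Drive folders.

```python
import os
import tempfile

def missing_dirs(paths):
    """Return the names of entries in `paths` that are not existing directories."""
    return [name for name, path in paths.items() if not os.path.isdir(path)]

# Stand-in locations; in Colab these would be the three Drive paths above.
root = tempfile.mkdtemp()
paths = {
    "input_dir": os.path.join(root, "data", "balanced_dataset"),
    "model_weights_dir": os.path.join(root, "model_weights"),
    "log_dir": os.path.join(root, "logs"),
}
os.makedirs(paths["input_dir"])
os.makedirs(paths["model_weights_dir"])
# log_dir is deliberately left uncreated to show what a failure looks like.

print(missing_dirs(paths))  # ['log_dir']
```

An empty list means all three locations exist and the notebook can proceed.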


---

✅ **Once these are set**, you're ready to run the notebook end-to-end, including training, evaluation, and logging.




In [None]:
pip install optuna

Collecting optuna
  Downloading optuna-4.5.0-py3-none-any.whl.metadata (17 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Downloading optuna-4.5.0-py3-none-any.whl (400 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Installing collected packages: colorlog, optuna
Successfully installed colorlog-6.9.0 optuna-4.5.0


In [None]:
# Set up the environment: imports and paths needed by the cells below

from google.colab import drive
drive.mount('/content/drive')
import sys
sys.path.append('/content/drive/MyDrive/data/balanced_dataset')  # Add the folder containing main.py to sys.path (absolute path, leading slash required)
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import random
import os
from collections import Counter
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, average_precision_score, precision_recall_curve
import matplotlib.pyplot as plt
import logging
from datetime import datetime
from sklearn.model_selection import train_test_split
import math
from sklearn.metrics import classification_report
import torch.nn.functional as F
from torch.nn import SiLU
import re
import optuna

Mounted at /content/drive


## 📂 DatasetHandler class for loading MGF files

This section defines the `DatasetHandler` class responsible for managing the loading and iteration over `.mgf` files containing MS/MS spectra.
Files are loaded one small `.mgf` file at a time so that the pipeline scales without running into out-of-memory issues.

---

### 📦 `DatasetHandler` Overview

The `DatasetHandler` class provides a memory-efficient way to iterate through `.mgf` files stored in a directory. It supports:

- **Shuffling input files** to randomize data order across training loops.
- **Per-file usage tracking** with `MAX_FILE_PASSES`, ensuring that no file is overused during training.
- **Controlled looping** over the dataset using `num_loops` to allow multiple training epochs without data reloading.

---

### 🧩 Key Components

#### 🔧 Initialization
```python
handler = DatasetHandler(input_dir="/path/to/mgf", num_loops=1)
```

In [None]:
# Set up the dataset handler class that manages the input files
# TODO: remove the remaining debug prints

MAX_FILE_PASSES = 1 # Max times a file can be used before being ignored

class DatasetHandler:
    def __init__(self, input_dir, num_loops=1):
        """
        Initialize the dataset handler.

        Args:
            input_dir (str): Path to the directory containing split MGF files.
            num_loops (int): Number of times the dataset should be iterated.
        """
        self.files = [os.path.join(input_dir, f) for f in os.listdir(input_dir) if f.endswith('.mgf')]
        self.files = random.sample(self.files, len(self.files))  # Shuffle files
        self.file_usage_counter = {f: 0 for f in self.files}
        self.num_loops = num_loops
        self.loop_count = 0

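    def get_next_file(self):
      """
      Load one MGF file at a time into RAM and return all valid spectra from it.

      Returns:
          tuple (list of dict, str): The valid spectra (each dict holds one
          spectrum and its metadata) and the path of the file they came from.
          Returns None once all dataset loops are completed.
      """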
      while self.loop_count < self.num_loops:
          available_files = [f for f in self.files if self.file_usage_counter[f] < MAX_FILE_PASSES]
          if not available_files:
              self.loop_count += 1
              if self.loop_count < self.num_loops:
                  print("Restarting dataset loop...")
                  self.file_usage_counter = {f: 0 for f in self.files}
                  continue
              else:
                  print("All dataset loops completed.")
                  return None

          file = random.choice(available_files)
          print(f"Processing file: {file}")

          spectra = []
          spectrum_data = None

          with open(file, 'r') as f:
              for line in f:
                  line = line.strip()

                  if line == "BEGIN IONS":
                      spectrum_data = {"mz_values": [], "intensity_values": []}

                  elif line.startswith("TITLE=") and spectrum_data is not None:
                      spectrum_data["title"] = line.split("=", 1)[1].strip()

                  elif line.startswith("PEPMASS=") and spectrum_data is not None:
                      try:
                          spectrum_data["pepmass"] = float(line.split("=", 1)[1].split()[0])
                      except ValueError:
                          spectrum_data["pepmass"] = None  # mark as missing
                  elif line.startswith("CHARGE=") and spectrum_data is not None:
                      charge_str = line.split("=", 1)[1].strip()
                      match = re.match(r'^(\d+)', charge_str)  # Match one or more digits at the start
                      if match:
                          spectrum_data["charge"] = int(match.group(1))
                      else:
                          print(f"[SKIPPED CHARGE] Invalid charge format: '{charge_str}'")
                          spectrum_data["charge"] = None

                  elif line == "END IONS" and spectrum_data is not None:
                      title = spectrum_data.get("title", "").strip()
                      mz_vals = spectrum_data.get("mz_values", [])
                      int_vals = spectrum_data.get("intensity_values", [])

                      # Final validation before appending
                      if not title or not title.strip():
                          print(f"[SKIPPED] Missing TITLE in file: {file}")
                      elif not mz_vals or not int_vals:
                          print(f"[SKIPPED] Empty m/z or intensity in file: {file}")
                      elif len(mz_vals) != len(int_vals):
                          print(f"[SKIPPED] Mismatched m/z and intensity count in file: {file}")
                      elif np.sum(int_vals) == 0:
                          print(f"[SKIPPED] All-zero intensities in spectrum '{title}'")
                      else:
                          spectra.append(spectrum_data)

                      spectrum_data = None  # Reset for next spectrum

                  else:
                      if spectrum_data is not None:
                          try:
                              parts = line.split()
                              if len(parts) != 2:
                                  raise ValueError("Expected two float values")
                              mz, intensity = map(float, parts)
                              if math.isnan(mz) or math.isnan(intensity):
                                  raise ValueError("NaN detected")
                              spectrum_data["mz_values"].append(mz)
                              spectrum_data["intensity_values"].append(intensity)
                          except ValueError:
                              #print(f"[SKIPPED LINE] Invalid peak: '{line}' in file: {file}")
                              continue

          self.file_usage_counter[file] += 1

          if spectra:
              return spectra, file

      print("All spectra processed.")
      return None
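
For a quick smoke test of the handler, it helps to have one tiny file in the layout the parser reads. The sketch below writes such a file to a temporary directory and repeats the two field conversions the parser applies. The peak values and the precursor value are made up for illustration, `write_sample_mgf` is a hypothetical helper, and only the title is taken from the example used later in this notebook.

```python
import os
import re
import tempfile

# Illustrative two-peak record: BEGIN IONS, metadata lines, "m/z intensity" pairs, END IONS.
SAMPLE_MGF = """BEGIN IONS
TITLE=GWSMSEQSEESVGGR 2,S,Phospho
PEPMASS=824.33 12345.0
CHARGE=2+
175.119 1200.5
262.151 830.0
END IONS
"""

def write_sample_mgf(directory, name="sample_001.mgf"):
    """Write the sample record to `directory` and return the file path."""
    path = os.path.join(directory, name)
    with open(path, "w") as fh:
        fh.write(SAMPLE_MGF)
    return path

tmp_dir = tempfile.mkdtemp()
sample_path = write_sample_mgf(tmp_dir)

# Same field handling as in the parser above:
# PEPMASS keeps only the first token (the m/z), CHARGE keeps only the leading digits.
pepmass = float("PEPMASS=824.33 12345.0".split("=", 1)[1].split()[0])
charge = int(re.match(r"^(\d+)", "2+").group(1))
print(pepmass, charge)  # 824.33 2
```

With the class above defined, `DatasetHandler(input_dir=tmp_dir, num_loops=1).get_next_file()` should return a `(spectra, file)` tuple containing this one spectrum, and a second call on the same handler should return `None`.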


# ⚙️ Dense Vector Binning for 1D CNN Input  
This section defines the updated preprocessing pipeline for converting annotated MS/MS spectra into dense, fixed-length vectors. These are tailored for use in models such as CNNs or hybrid CNN-Transformer architectures.

## 🔧 Functions:

### `bin_spectra_to_dense_vectors`  
Converts a list of spectra into fixed-length vectors by:  
- **Binning the m/z values** across a specified range (`mz_min` to `mz_max`) into `num_bins`.  
- Each bin holds the **intensity sum of peaks** falling into that m/z range.  
- Applies **sliding window normalization**:  
  The m/z axis is divided into fixed-size windows (e.g., 200 m/z), and intensities within each window are normalized individually to the [0, 1] range. This preserves local signal structure and prevents domination by high-intensity regions.

### `process_spectra_with_handler`  
Processes a batch of spectra by:  
- Logging and skipping spectra with empty or invalid m/z or intensity values.  
- Using the above function to apply binning and **sliding window normalization**.  
- Skipping spectra with no signal after binning (i.e., zero-vector).  

Returns a list of valid, normalized dense vectors for CNN input and logs the total number of skipped spectra.

## 📦 Output Format:  
Each spectrum becomes a 1D `np.array` of shape `(num_bins,)` with `float32` values.  

The final output is either:  
- a stacked `np.ndarray` of shape `(batch_size, num_bins)` when using `bin_spectra_to_dense_vectors` directly on a list, or  
- a list of valid vectors (1 per spectrum) when using `process_spectra_with_handler`.
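
To make the effect of window normalization concrete, here is a small worked example on a toy spectrum. It is a simplified sketch rather than the notebook's function: `bin_with_window_norm` is a hypothetical name, the peak values are invented, and each window is divided by its maximum, which matches min-max scaling whenever the window covers only part of the vector (so the minimum is 0).

```python
import numpy as np

def bin_with_window_norm(mz, intensity, num_bins, mz_min, mz_max, window):
    """Sum intensities per bin, then scale each m/z window by its own maximum."""
    edges = np.linspace(mz_min, mz_max, num_bins + 1)
    out = np.zeros(num_bins)
    for start in np.arange(mz_min, mz_max, window):
        mask = (mz >= start) & (mz < start + window)
        if not mask.any():
            continue
        # Histogram over the full bin grid; bins outside the window stay at 0.
        hist, _ = np.histogram(mz[mask], bins=edges, weights=intensity[mask])
        if hist.max() > 0:
            out += hist / hist.max()
    return out.astype(np.float32)

# Toy spectrum: two close peaks share a bin, one mid peak, one weak high-m/z peak.
mz = np.array([150.0, 150.2, 900.0, 1500.0])
intensity = np.array([10.0, 30.0, 5.0, 2.0])

vec = bin_with_window_norm(mz, intensity, num_bins=8,
                           mz_min=100.0, mz_max=2100.0, window=1000.0)
print(vec)  # values: 1, 0, 0, 0.125, 0, 1, 0, 0
```

The two peaks near 150 are summed into one bin (40) and become 1.0, the peak at 900 becomes 5/40 = 0.125, and the weak peak at 1500 is alone in its window and is therefore scaled up to 1.0. That last value illustrates how the scheme keeps low-intensity regions from being dominated by strong peaks elsewhere.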

In [None]:
def bin_spectra_to_dense_vectors(spectra_data, num_bins=5000, mz_min=100.0, mz_max=2200.0, window_size=200.0):
    """
    Converts spectra into dense, fixed-length binned vectors suitable for 1D CNN input with sliding window normalization.

    Parameters:
    - spectra_data: List of spectra dicts with 'mz_values' and 'intensity_values'.
    - num_bins: Number of bins to divide the m/z range [mz_min, mz_max] into.
    - mz_min: Minimum m/z value for binning.
    - mz_max: Maximum m/z value for binning.
    - window_size: Size of m/z window for normalization (default is 200.0).

    Returns:
    - np.ndarray of shape (batch_size, num_bins) with per-window normalized intensities.
    """
    bin_edges = np.linspace(mz_min, mz_max, num_bins + 1)
    binned_spectra = []

    for spectrum in spectra_data:
        mz_values = np.array(spectrum['mz_values'])
        intensity_values = np.array(spectrum['intensity_values'])

        if len(mz_values) == 0 or len(intensity_values) == 0:
            binned_spectra.append(np.zeros(num_bins, dtype=np.float32))
            continue

        # Create an array to hold the binned intensities (fixed size)
        binned_intensity = np.zeros(num_bins)

        # Iterate over windows of m/z values
        for window_start in np.arange(mz_min, mz_max, window_size):
            window_end = window_start + window_size
            window_mask = (mz_values >= window_start) & (mz_values < window_end)
            window_mz_values = mz_values[window_mask]
            window_intensity_values = intensity_values[window_mask]

            if len(window_mz_values) > 0:
                # Bin the intensities for this window
                binned_window_intensity, _ = np.histogram(window_mz_values, bins=bin_edges, weights=window_intensity_values)

                # Normalize the binned intensities within this window
                min_val = binned_window_intensity.min()
                max_val = binned_window_intensity.max()
                range_val = max_val - min_val if max_val != min_val else 1e-6
                normalized_binned_window = (binned_window_intensity - min_val) / range_val

                # Add the normalized intensities to the final vector (same size as before)
                binned_intensity += normalized_binned_window

        binned_spectra.append(binned_intensity.astype(np.float32))

    return np.stack(binned_spectra)  # Shape: (batch_size, num_bins)


def process_spectra_with_handler(spectra_batch, num_bins=1000, window_size=200.0):
    """
    Processes spectra batch and returns a list of 1D CNN-ready vectors (one per spectrum),
    with sliding window normalization applied.
    """
    spectrum_vectors = []
    skipped_spectra = 0

    for idx, spectrum in enumerate(spectra_batch):
        title = spectrum.get("title", f"unnamed_{idx}")
        mz_values = np.array(spectrum['mz_values'])
        intensity_values = np.array(spectrum['intensity_values'])

        if mz_values.size == 0 or intensity_values.size == 0:
            print(f"[SKIPPED] Empty m/z or intensity array: '{title}'")
            skipped_spectra += 1
            continue

        # Call the binning function with windowed normalization
        binned_spectrum = bin_spectra_to_dense_vectors([spectrum], num_bins=num_bins, window_size=window_size)

        # Ensure only valid (non-zero) spectra are added
        if np.sum(binned_spectrum) == 0:
            print(f"[SKIPPED] Zero intensity after binning: '{title}'")
            skipped_spectra += 1
            continue

        spectrum_vectors.append(binned_spectrum[0])  # Extract the vector

    print(f"Total skipped spectra: {skipped_spectra}")
    return spectrum_vectors


## 🔬 Normalize Parent Ion Mass (PEPMASS)

This module provides utilities to **extract sequences**, **convert observed m/z to monoisotopic singly charged mass** (if needed), and **normalize parent ion values** into the range [0, 1].

---

### 🎯 Objectives (current implementation)

- **Extract** peptide sequence from the beginning of the `TITLE` field.  
- **Convert** PEPMASS from **observed m/z** to **monoisotopic singly charged mass** when `assume_observed=True`.  
- **Normalize** the parent ion mass into \[0, 1\] using global bounds from `min_max_dict`.



### 🧩 Key Functions

#### 🔹 `extract_sequence_from_title(title: str) -> str`
Extracts the peptide sequence from the `TITLE`.  
Assumes the sequence is the **first token** (before the first space).

**Example**
```python
TITLE = "GWSMSEQSEESVGGR 2,S,Phospho"
extract_sequence_from_title(TITLE)
# → "GWSMSEQSEESVGGR"
```

#### 🔹 `observed_to_monoisotopic(observed_mz: float, charge: int) -> float`

Converts the observed precursor **m/z** into the **monoisotopic singly charged mass** ($[M+H]^+$). Subtracting one more proton mass would give the neutral mass.

$$
\text{mono\_mass} = z \cdot \text{m/z} - (z - 1)\cdot \text{PROTON\_MASS}
$$

Uses `PROTON_MASS = 1.0072764665789`.

---

#### 🔹 `normalize_parent_ions(data, min_max_dict, assume_observed=True) -> list[float]`

Normalizes parent ion values to the range [0, 1].

* **Inputs per spectrum (dict):**

  * `"pepmass"`: precursor value
  * `"charge"`: integer charge state

* **Behavior:**

  1. If `assume_observed=True`:

     * Converts `"pepmass"` (observed m/z) → monoisotopic singly charged mass.
  2. If `assume_observed=False`:

     * Uses `"pepmass"` directly (assumed monoisotopic).
  3. Normalizes with:

     $$
     \text{norm} = \frac{parent\_ion - min}{max - min}
     $$
  4. Clamps results into [0, 1].
  5. Missing metadata → returns `0.0`.

**Example**

```python
min_max = {"min": 400.0, "max": 6000.0}
normalized = normalize_parent_ions(spectra, min_max, assume_observed=True)
```

---

### ✅ Output

Returns:

```python
[List of float values between 0 and 1]
```

---

### ⚠️ Notes

* Requires `"min"` and `"max"` keys in `min_max_dict`.
* Missing or invalid metadata defaults to **0.0**.
* No theoretical mass calculation or spectrum validation is performed here.
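
As a worked instance of the conversion and normalization described above, the arithmetic for a single precursor can be written out directly. The precursor values are illustrative, and the bounds are the ones from the example above.

```python
PROTON_MASS = 1.0072764665789  # same constant as in the cell below

observed_mz, charge = 500.0, 2  # illustrative doubly charged precursor

# z * m/z - (z - 1) * proton mass
singly_charged = charge * observed_mz - (charge - 1) * PROTON_MASS

# Cross-check: neutral mass plus one proton gives the same number,
# so the formula yields the [M+H]+ mass rather than the neutral mass.
neutral = charge * (observed_mz - PROTON_MASS)
assert abs(singly_charged - (neutral + PROTON_MASS)) < 1e-9

# Min-max scaling with the example bounds, then clamping to [0, 1].
lower, upper = 400.0, 6000.0
norm = (singly_charged - lower) / (upper - lower)
norm = max(0.0, min(1.0, norm))

print(round(singly_charged, 4), round(norm, 3))  # 998.9927 0.107
```

A value below `lower` would be clamped to 0.0 and a value above `upper` to 1.0, so the bounds should cover the mass range of the dataset.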

In [None]:

PROTON_MASS = 1.0072764665789
H2O_MASS = 18.01056

def extract_sequence_from_title(title: str) -> str:
    """
    Extracts the peptide sequence from the TITLE string.
    Assumes the sequence is the first word, before the first space.
    """
    if not isinstance(title, str) or not title.strip():
        return ""
    return title.strip().split(" ")[0]  # safe even with extra spaces



def observed_to_monoisotopic(observed_mz, charge):
    return charge * observed_mz - (charge - 1) * PROTON_MASS



def normalize_parent_ions(data, min_max_dict, assume_observed=True):
    """
    Normalize parent ions to the range [0, 1].

    If assume_observed=True, converts PEPMASS (observed m/z) to monoisotopic mass before computing normalization.
    """
    normalized = []

    for spectrum in data:
        pepmass = spectrum.get("pepmass", None)
        charge = spectrum.get("charge", None)

        if pepmass is None or charge is None:
            normalized.append(0.0)
            continue

        if assume_observed:
            mono_mass = observed_to_monoisotopic(pepmass, charge)
            parent_ion = mono_mass
        else:
            parent_ion = pepmass  # Already monoisotopic

        # Normalize to [0, 1]
        pepmass_min = min_max_dict["min"]
        pepmass_max = min_max_dict["max"]
        norm = (parent_ion - pepmass_min) / (pepmass_max - pepmass_min)
        normalized.append(max(0, min(1, norm)))

    return normalized



### 🧬 Combine Spectra with Parent Ion Mass

> **TODO:** Change the model to always receive the monoisotopic singly charged mass instead of the observed mass it currently receives.


This function constructs the final **input representation** for the neural network by pairing each processed spectrum with its corresponding normalized parent ion mass.

---

### ⚙️ `combine_features(...)`

#### **Purpose**
Aggregates spectral and precursor metadata into a unified format, ready to be passed into the model during training or evaluation.

---

### 🔄 Process Flow

1. **Spectral Preprocessing**
   - Calls `process_spectra_with_handler(...)` to:
     - Apply binning and normalization.
     - Generate a dense, fixed-length vector for each spectrum.
   - Result: `spectra_vectors` — a list of shape `[batch_size, num_bins]`.

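2. **Parent Ion Normalization**
   - Invokes `normalize_parent_ions(...)` to:
     - Convert the observed precursor m/z to monoisotopic singly charged mass (when `assume_observed=True`).
     - Normalize to a range of [0, 1] using dataset-specific bounds.
   - Result: `parent_ions`, a list of length `batch_size`.

3. **Validation**
   - Verifies alignment between spectrum vectors and parent ion list.
   - Prints an error and returns `None` if the lengths differ.
   - Note: `process_spectra_with_handler` drops skipped spectra while `normalize_parent_ions` covers the whole batch, so a single skipped spectrum makes the lengths differ.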

4. **Zipping**
   - Combines each spectrum vector and its corresponding normalized parent ion into a tuple:
     ```python
     (spectrum_vector, normalized_parent_ion)
     ```

---

### 📤 Output Format

```python
[
  (spectrum_vec₁, pepmass₁),
  (spectrum_vec₂, pepmass₂),
  ...
]
```



In [None]:
def combine_features(data, pepmass_min_max, num_bins, window_normaliation_size, assume_observed):
    """
    Converts spectra + metadata into model input tuples:
        (binned spectrum, normalized parent ion mass)
    """

    spectra_vectors = process_spectra_with_handler(data, num_bins, window_normaliation_size)
    if not spectra_vectors:
        return None

    parent_ions = normalize_parent_ions(
        data, pepmass_min_max, assume_observed=assume_observed)


    if len(spectra_vectors) != len(parent_ions):
        print("❌ Mismatch between spectra and parent ions.")
        return None

    return list(zip(spectra_vectors, parent_ions))



### 🏷️ Label Spectra Based on Modifications

This function performs **automatic labeling** of MS/MS spectra for supervised learning, based on the content of the `TITLE` field in each spectrum's metadata.

---

### 🧠 Purpose

Assigns integer labels to each spectrum in a batch according to the presence of post-translational modification (PTM) keywords in the title:

- `0` → **Unmodified**
- `1` → **Oxidation** (if the string `"oxidation"` appears in the title)
- `2` → **Phosphorylation** (if the string `"phospho"` appears in the title)
- `3` → **Ubiquitination** (if the string `"k_gg"` appears in the title)
- `4` → **Acetylation** (if the string `"k_ac"` appears in the title)

The result is a list of labels aligned with the order of input spectra, suitable for classification tasks using `softmax`.

---

### ⚙️ Logic

For each spectrum in the input list:
1. Checks that the entry is a dictionary.
2. Extracts the `title` and converts it to lowercase.
3. Searches for PTM-related keywords.
4. Defaults to `0` if no match or invalid format.
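
One consequence of this logic is that the keywords are checked in a fixed order, so a title that mentions more than one modification receives only the first matching label. The sketch below mirrors that behavior in a compact form. `label_from_title` and `PTM_KEYWORDS` are hypothetical names, and the second and third titles are invented for illustration.

```python
# Keyword -> label pairs, checked in this order (same order as the if/elif chain).
PTM_KEYWORDS = [("oxidation", 1), ("phospho", 2), ("k_gg", 3), ("k_ac", 4)]

def label_from_title(title):
    """Return the label of the first matching keyword, or 0 if none match."""
    lowered = title.lower()
    for keyword, label in PTM_KEYWORDS:
        if keyword in lowered:
            return label
    return 0

titles = [
    "GWSMSEQSEESVGGR 2,S,Phospho",           # title from the earlier example
    "PEPTIDEK",                               # hypothetical unmodified title
    "MPEPTSIDEK 1,M,Oxidation 5,S,Phospho",  # hypothetical title with two PTMs
]
labels = [label_from_title(t) for t in titles]
print(labels)  # [2, 0, 1]
```

The third title is labeled `1` (oxidation) even though it also carries a phosphorylation, which is worth keeping in mind if the dataset contains multiply modified peptides.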

---

### 📤 Output Format

Returns:
```python
[0, 2, 1, 4, 3, ...]
```


In [None]:
# This cell reads the labels of the data and prepares them for the model


# TODO: many labels are still missing

def spectrum_label(spectra_data) -> list:
    """
    Assigns labels to spectra based on known modifications in TITLE.

    Parameters:
    - spectra_data (list of dict): List of spectrum dictionaries (from DatasetHandler).

    Returns:
    - List of labels for each spectrum.
    """
    if not isinstance(spectra_data, list):
        print("ERROR: Expected a list of spectra, got", type(spectra_data))
        return None

    labels = []

    for spectrum in spectra_data:
        if not isinstance(spectrum, dict):
            print(f"WARNING: Expected spectrum to be a dict, got {type(spectrum)}")
            labels.append(0)
            continue

        # Get spectrum title and verify it's a non-empty string
        spectrum_id = spectrum.get("title", "")
        if not isinstance(spectrum_id, str) or not spectrum_id.strip():
            print(f"WARNING: Missing or invalid title for spectrum, assigning label 0")
            labels.append(0)
            continue

        spectrum_id = spectrum_id.lower().strip()  # Normalize for label detection

        # Assign labels based on keywords in TITLE
        if "oxidation" in spectrum_id:
            labels.append(1)
        elif "phospho" in spectrum_id:
            labels.append(2)
        elif "k_gg" in spectrum_id:
            labels.append(3)
        elif "k_ac" in spectrum_id:
            labels.append(4)
        else:
            labels.append(0)

    print(f"Labels: {Counter(labels)}")
    return labels


---

# 🧠 Hybrid CNN-Transformer Classification Model

This module defines the **final architecture** used for **multi-class PTM classification** from MS/MS spectra.
The model integrates **local pattern extraction (CNN)**, **global context modeling (Transformer)**, and **metadata (parent ion mass)** into a single classification head.

---

## 🔹 `PositionalEncoding`

Implements **sinusoidal positional encodings** (Vaswani et al., 2017), injecting sequence order information into embeddings.

* **Signature:**

  ```python
  PositionalEncoding(d_model: int = 64, seq_len: int = 175, dropout: float = 0.1)
  ```
* **Behavior:** Precomputes a tensor of `sin`/`cos` terms and adds it to input, followed by dropout.
* **Input:** `[B, L, d_model]` with `L ≤ seq_len`.
* **Output:** Same shape as input.

**Example**

```python
pe = PositionalEncoding(d_model=64, seq_len=1000, dropout=0.1)
x = torch.randn(32, 10, 64)   # [batch, seq_len, d_model]
x = pe(x)                     # same shape
```
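
To see what the precomputed table contains, the same sinusoidal formula can be reproduced in NumPy. This is a sketch for inspection only: `sinusoidal_table` is a hypothetical helper, and the small sizes are chosen just to keep the output readable.

```python
import numpy as np

def sinusoidal_table(seq_len, d_model):
    """Build the [seq_len, d_model] table of sin (even columns) and cos (odd columns)."""
    position = np.arange(seq_len)[:, None]
    div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(position * div_term)
    pe[:, 1::2] = np.cos(position * div_term)
    return pe

pe = sinusoidal_table(seq_len=4, d_model=6)
print(pe[0])  # position 0: sin(0) = 0 in even columns, cos(0) = 1 in odd columns
```

Because the classifier below expands each spectrum to a single token, only this first row is ever added to the embedding, which amounts to a constant offset.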

---

## 🔹 `EncoderTransformerClassifier`

A **hybrid classifier** with five main blocks:

1. **1D CNN Encoder** – Extracts local spectral patterns

   ```
   Conv1d(1→32, k=5, pad=2) → BN → ReLU
   MaxPool1d(k=2)            # halves length
   Conv1d(32→64, k=3, pad=1) → BN → ReLU
   Flatten
   ```

   * **Output:** `[B, 64 * (input_size // 2)]`

2. **Linear Encoder** – Projects CNN features into Transformer latent space

   ```
   Linear(64*(S//2) → 512) → BN → ReLU
   Linear(512 → latent_size) → BN → ReLU → Dropout
   ```

   * **Output:** `[B, latent_size]`

3. **Positional Encoding + Transformer** – Global context

   * Expand to sequence: `[B, 1, latent_size]`
   * Add sinusoidal encoding
   * Pass through `nn.TransformerEncoder` (`num_layers`, `num_heads`, `dim_feedforward=4*latent_size`)
   * Mean over sequence dim → `[B, latent_size]`

4. **Parent Ion Processor** – Encodes normalized parent mass

   ```
   Linear(1 → 64) → ReLU
   Linear(64 → latent_size) → ReLU
   ```

   * **Output:** `[B, latent_size]`

5. **Fusion & Classification**

   ```
   concat([spectrum, parent]) → [B, 2*latent_size]
   Dropout
   Linear(2*latent_size → num_classes)
   ```

   * **Output:** logits `[B, num_classes]`

---

### ✅ Forward Pass

**Inputs**

* `spectra`: `[B, S]` (dense binned spectrum, length = `input_size`)
* `parent_ion`: `[B]` (normalized precursor mass)

**Output**

* `logits`: `[B, num_classes]`

---

### 🔧 Implementation Notes

* `latent_size % num_heads == 0` is enforced.
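* `input_size` does not have to be even: `MaxPool1d(kernel_size=2)` floors the length to `input_size // 2`, which matches the `64 * (input_size // 2)` input of the linear encoder. With an odd `input_size` (such as the 175 used in the example below), the last position is dropped by the pooling layer.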
* The Transformer currently sees only one token per spectrum (global embedding). To use **true attention over multiple tokens**, pass a sequence (e.g., CNN feature map before flattening).
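
The length bookkeeping behind `64 * (input_size // 2)` can be checked with the standard output-length formula for 1D convolution and pooling layers. The sketch below uses plain Python only. The helper names are hypothetical, and the sizes 1000 and 5000 are the default `num_bins` values of the two preprocessing functions.

```python
def conv1d_out_len(length, kernel, padding=0, stride=1):
    """Output length of a 1D convolution or pooling layer (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1

def flattened_cnn_features(input_size, out_channels=64):
    """Number of features after the CNN encoder's Flatten layer."""
    length = conv1d_out_len(input_size, kernel=5, padding=2)  # Conv1d(1 -> 32), length kept
    length = conv1d_out_len(length, kernel=2, stride=2)       # MaxPool1d(2), length floored to half
    length = conv1d_out_len(length, kernel=3, padding=1)      # Conv1d(32 -> 64), length kept
    return out_channels * length

for size in (175, 1000, 5000):
    flat = flattened_cnn_features(size)
    # Weight count of the first Linear(flat -> 512) layer, excluding biases.
    print(size, flat, flat * 512)
# 175 5568 2850816
# 1000 32000 16384000
# 5000 160000 81920000
```

The first linear layer grows linearly with the number of bins, so the choice of `num_bins` has a direct effect on model size and memory use.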

---

### 🧪 Example

```python
model = EncoderTransformerClassifier(
    input_size=175, latent_size=64, num_classes=5,
    num_heads=4, num_layers=2, dropout_prob=0.1
)

spectra = torch.randn(32, 175)   # [batch, S]
parent  = torch.rand(32)         # [batch], normalized
logits  = model((spectra, parent))  # [32, 5]
```

**Loss**

* Multi-class PTM classification:

  ```python
  loss_fn = nn.CrossEntropyLoss()
  loss = loss_fn(logits, labels)   # labels ∈ [0..num_classes-1]
  ```


In [None]:


class PositionalEncoding(nn.Module):
    def __init__(self, d_model: int = 64, seq_len: int = 175, dropout: float = 0.1):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(dropout)

        position = torch.arange(seq_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0))  # shape: [1, seq_len, d_model]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        x = x + self.pe[:, :seq_len]
        return self.dropout(x)


class EncoderTransformerClassifier(nn.Module):
    def __init__(self, input_size, latent_size, num_classes, num_heads, num_layers, dropout_prob, max_len=1000):
        super(EncoderTransformerClassifier, self).__init__()
        self.input_size = input_size
        self.latent_size = latent_size
        self.num_classes = num_classes

        # Validate divisibility
        if latent_size % num_heads != 0:
            raise ValueError(f"latent_size ({latent_size}) must be divisible by num_heads ({num_heads}).")

        # 1. CNN Encoder (New Layer)
        self.cnn_encoder = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),  # Downsample
            nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Flatten()
        )

        # 2. Linear Encoder (Refactored)
        self.encoder = nn.Sequential(
            nn.Linear(64 * (input_size // 2), 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),
            nn.Linear(512, latent_size),
            nn.BatchNorm1d(latent_size),
            nn.ReLU(),
            nn.Dropout(dropout_prob)
        )

        #3. Positional Encoding
        self.positional_encoding = PositionalEncoding(d_model=latent_size, seq_len=max_len, dropout=dropout_prob)

        # 4. Transformer Encoder
        self.transformer_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(
                d_model=latent_size,
                nhead=num_heads,
                dim_feedforward=latent_size * 4,
                dropout=dropout_prob,
                activation='relu',
                batch_first=True,
                norm_first=False
            ),
            num_layers=num_layers
        )

        # Parent Ion Layer
        self.parent_ion_layer = nn.Sequential(
            nn.Linear(1, 64),
            nn.ReLU(),
            nn.Linear(64, self.latent_size),
            nn.ReLU()
        )

        # Dropout before classification
        self.dropout = nn.Dropout(dropout_prob)


        self.classifier_head = nn.Linear(latent_size * 2, num_classes)


    def forward(self, inputs):
        spectra, parent_ion = inputs
        parent_ion = parent_ion.unsqueeze(1)

        # CNN Encoder
        spectra = spectra.unsqueeze(1)  # Ensure input is [B, 1, S]
        cnn_output = self.cnn_encoder(spectra)

        # Linear Encoder
        x = self.encoder(cnn_output)

        # Positional Encoding and Transformer
        x = x.unsqueeze(1)  # Adding sequence dimension
        x = self.positional_encoding(x)
        x = self.transformer_encoder(x)
        x = x.mean(dim=1)

        # Parent Ion Encoding
        parent = self.parent_ion_layer(parent_ion).squeeze(1)

        # Concatenate
        combined = torch.cat([x, parent], dim=1)
        combined = self.dropout(combined)

        logits = self.classifier_head(combined)  # shape: [batch, num_classes]
        return logits





### 🧪 Training, Evaluation & Logging Utilities

This section documents the **current** training, evaluation, and logging helpers for the Transformer-based classifier on MS/MS spectra. It replaces older notes that referred to multi-label BCE and PR/ROC AUC.

---

#### 🗂️ Logging Setup
Configures a clean logging pipeline for Colab:

- **Clears** any pre-existing root handlers to prevent duplicate logs.
- **Persists** run logs to Google Drive at the path below (change it to the directory where you want your log file):

  ```
  /content/drive/MyDrive/4_mod_model/logs/4_mod_transformer.log
  ```
- Uses a named logger: `spectra_logger` with level **INFO**.

---

#### 🧪 `train_classifier_with_weights(...)`

Trains the classifier with **multi-class CrossEntropyLoss** and optional **L1 regularization**.

**Key features**
- **Optimizer:** `AdamW` with `weight_decay=0.01`.
- **Loss:** `CrossEntropyLoss` on **integer class labels** (not one-hot).
- **Class imbalance:** Per-class counts are computed; a weight vector is prepared **but not applied** by default.  
> To enable class weighting, pass it to CE:  
> `loss_fn = nn.CrossEntropyLoss(weight=class_weights)`.
- **Regularization:** Optional **L1** penalty via `l1_lambda`.
- **Metrics:** Per-epoch **loss** and **accuracy**; prints debug samples every 10 epochs.
- **Checkpointing:** Saves the `state_dict` to `save_path` when training completes and when the custom scoring function improves. The user can also set a minimum score so that the model does not save unnecessarily at the beginning of training and only high-quality results are kept.
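
As a rough illustration of the class-weighting option, the sketch below reproduces the inverse-frequency formula used in `train_classifier_with_weights` (`N / (num_classes * count)`) in plain Python. The label counts are illustrative assumptions rather than dataset statistics, and `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights: N / (num_classes * count); absent classes keep weight 1.0."""
    counts = Counter(labels)
    total = len(labels)
    weights = [1.0] * num_classes
    for cls in range(num_classes):
        if counts[cls] > 0:
            weights[cls] = total / (num_classes * counts[cls])
    return weights


# Illustrative (assumed) label counts for the five classes.
example_labels = [0] * 287 + [1] * 280 + [2] * 141 + [3] * 200 + [4] * 53
class_weights = balanced_class_weights(example_labels, num_classes=5)
print([round(w, 4) for w in class_weights])

# To apply the weights in PyTorch, the list would first be converted to a tensor, e.g.
# nn.CrossEntropyLoss(weight=torch.tensor(class_weights, dtype=torch.float32))
```

Rare classes receive the largest weights, so their errors contribute more to the loss once the weight vector is passed to `CrossEntropyLoss`.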

**Inputs**
- `model`: the `EncoderTransformerClassifier`.
- `data_tensors`: `(spectra_tensors[B, S], parent_ions[B])`.
- `labels`: integer tensor `[B]` with values in `[0, num_classes-1]`.
- Typical kwargs: `epochs=100`, `learning_rate=1e-3`, `l1_lambda=1e-3`, `device='cuda'`.

**Output**
- Saves weights to disk and prints/logs training progress.

---

#### 📊 `evaluate_model(...)`

Evaluates the model on a validation batch using **multi-class** metrics.

**What it does**
- Runs inference with `CrossEntropyLoss` against integer labels.
- Computes:
  - **Macro / Weighted:** Precision, Recall, F1
  - **Accuracy**
  - **Per-class report:** via `classification_report`
  - **Class distribution** (counts)
- **Composite score** (returned) prioritizes balanced performance under imbalance:

\[
\text{Score} \;=\; 0.45\cdot \text{Macro-F1} \;+\; 0.30\cdot \text{Macro-Recall} \;+\; 0.15\cdot \text{Weighted-F1} \;+\; 0.10\cdot \text{Accuracy}
\]

- Logs a formatted summary (including per-class table) via `spectra_logger`.
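
The composite score can be checked by hand. The sketch below is a minimal stand-alone version of the formula above; `composite_score` is a hypothetical helper name, and the metric values are copied from one validation summary in this notebook's captured output.

```python
def composite_score(macro_f1, macro_recall, weighted_f1, accuracy):
    """Weighted blend of four metrics; the weights sum to 1.0."""
    return (
        0.45 * macro_f1
        + 0.30 * macro_recall
        + 0.15 * weighted_f1
        + 0.10 * accuracy
    )


# Metrics taken from a logged validation summary (accuracy expressed as a fraction).
score = composite_score(
    macro_f1=0.5093, macro_recall=0.4894, weighted_f1=0.4922, accuracy=0.5062
)
print(f"Composite score: {score:.4f}")
```

Because 75% of the weight sits on macro-averaged metrics, a model that ignores a rare class is penalized even when its overall accuracy looks acceptable.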

**Inputs**
- `model`, `data_tensors`, `labels` (same shapes as training).
- `batch` (optional): tag to identify the batch in logs.

**Output**
- **Returns:** the composite score (float).

---

### ✅ Notes & Assumptions

- **Task type:** This pipeline is **multi-class**, not multi-label. Use `CrossEntropyLoss` targets of shape `[B]`.
- **Class names:** The list
  ```python
  ["Unmodified", "Oxidation", "Phospho", "Ubiquitination", "Acetylation"]
  ```
  should match your `num_classes`. Update it if you use a different class set/order.
- **Imbalance handling:** Class weights are computed but **not applied** unless passed to `CrossEntropyLoss`.
- **AUC metrics:** PR-AUC / ROC-AUC are **not** computed in the current implementation. Add them explicitly if needed.
- **Parent ion input:** Expected normalized to `[0, 1]` (see the normalization module).

---

### 🔧 Minimal Usage Example

```python
# Train
save_path = "model_final_weights.pth"
train_classifier_with_weights(
    model, (spectra_tensors, parent_ions), labels,
    epochs=50, learning_rate=1e-3, l1_lambda=1e-3,
    save_path=save_path, device='cuda'
)

# Evaluate
val_score = evaluate_model(model, (val_spectra, val_parent_ions), val_labels, batch="val-01")
print("Validation composite score:", val_score)
```


In [None]:
#Testing, evaluating and logging cell

#Remove existing handlers to prevent Colab caching issues
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)

log_dir = "/content/drive/MyDrive/4_mod_model/logs"
os.makedirs(log_dir, exist_ok=True)

log_path = os.path.join(log_dir, "4_mod_transformer.log")

# Reapply config AFTER handlers are cleared
logging.basicConfig(
    filename=log_path,
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    filemode="a"
)

logger = logging.getLogger("spectra_logger")



def train_classifier_with_weights(model, data_tensors, labels, epochs=100, learning_rate=0.001,l1_lambda=0.001, save_path="model_final_weights.pth", device='cuda'):
    """
    Trains the encoder-classifier model with CrossEntropyLoss plus an optional L1 penalty.
    Per-class weights are computed and printed for reference but are not applied to the loss by default.
    Assumes all inputs are already PyTorch tensors on the correct device.
    """
    spectra_tensors, parent_ions = data_tensors
    parent_ions = parent_ions.unsqueeze(1)  # Ensure shape (batch_size, 1)
    labels_tensors = labels  # Already a tensor

    # Compute class weights (just once on CPU for Counter)
    class_counts = Counter(labels_tensors.cpu().numpy())
    num_classes = torch.max(labels_tensors).item() + 1
    class_weights = torch.ones(num_classes, dtype=torch.float32).to(device)

    for cls in range(num_classes):
        if cls in class_counts:
            class_weights[cls] = len(labels_tensors) / (num_classes * class_counts[cls])

    print(f"Class Weights: {class_weights.cpu().numpy()}")

    # Optimizer & Loss
    optimizer = optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01)  # SGD worth a try?
    loss_fn = nn.CrossEntropyLoss()
    # To apply the class weights computed above, use nn.CrossEntropyLoss(weight=class_weights)

    epoch_losses = []
    epoch_accuracies = []

    # Train loop
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()

        outputs = model((spectra_tensors, parent_ions))
        loss = loss_fn(outputs, labels_tensors)

        # L1 Regularization
        l1_loss = 0.0
        for param in model.parameters():
            l1_loss += torch.sum(torch.abs(param))
        loss += l1_lambda * l1_loss

        loss.backward()
        optimizer.step()


        predictions = torch.argmax(outputs, dim=1)
        accuracy = (predictions == labels_tensors).float().mean().item()


        #Store metrics
        epoch_losses.append(loss.item())
        epoch_accuracies.append(accuracy)

        #Log epoch details
        print(f"Epoch [{epoch + 1}/{epochs}] - Loss: {loss.item():.4f}, Accuracy: {accuracy * 100:.2f}%")

        # Debugging Information
        if (epoch + 1) % 10 == 0 or epoch == epochs - 1:
            print(f"Sample Predictions: {predictions[:5].cpu().numpy()}")
            print(f"Actual Labels: {labels[:5]}")
            print(f"Sample Logits: {outputs[:5].detach().cpu().numpy()}")

    #Save model weights after training
    torch.save(model.state_dict(), save_path)
    print(f"Final model weights saved to {save_path}")



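def evaluate_model(model, data_tensors, labels, batch=None):
    """
    Evaluates the model on one validation set with multi-class metrics.
    Prints and logs the metrics, then returns the composite score (float).
    """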
    model.eval()
    spectra_tensors, parent_ions = data_tensors
    parent_ions = parent_ions.unsqueeze(1)


    targets = labels.long()


    with torch.no_grad():
        outputs = model((spectra_tensors, parent_ions))
        loss_fn = nn.CrossEntropyLoss()  # unweighted, matching the training loss
        loss = loss_fn(outputs, targets)

        predictions = torch.argmax(outputs, dim=1)

        predictions_np = predictions.cpu().numpy()
        targets_np = targets.cpu().numpy()

        print("predictions device:", predictions.device)
        print("targets device:", targets.device)

        # Macro & Weighted scores
        macro_precision = precision_score(targets_np, predictions_np, average='macro', zero_division=0)
        macro_recall = recall_score(targets_np, predictions_np, average='macro', zero_division=0)
        macro_f1 = f1_score(targets_np, predictions_np, average='macro', zero_division=0)
        weighted_precision = precision_score(targets_np, predictions_np, average='weighted', zero_division=0)
        weighted_recall = recall_score(targets_np, predictions_np, average='weighted', zero_division=0)
        weighted_f1 = f1_score(targets_np, predictions_np, average='weighted', zero_division=0)

        accuracy = (predictions == targets).float().mean().item()


        # Class distribution
        class_distribution = Counter(targets_np)


        # Per-class report
        class_names = ["Unmodified", "Oxidation", "Phospho", "Ubiquitination", "Acetylation"]
        report = classification_report(targets_np, predictions_np, target_names=class_names, zero_division=0, digits=4)


        score = (
            0.45 * macro_f1 +
            0.30 * macro_recall +
            0.15 * weighted_f1 +
            0.10 * accuracy
        )


        log_message = (
            f"Batch {batch if batch is not None else '-'}: Validation Loss: {loss.item():.4f},\n"
            f"Macro Precision: {macro_precision:.4f}, Macro Recall: {macro_recall:.4f}, Macro F1-score: {macro_f1:.4f},\n"
            f"Weighted Precision: {weighted_precision:.4f}, Weighted Recall: {weighted_recall:.4f}, Weighted F1-score: {weighted_f1:.4f},\n"
            f"Accuracy: {accuracy * 100:.2f}%, "
            f"Class Distribution: {class_distribution}\n"
            f"Per-class metrics:\n{report}"
        )

        print(log_message)
        logger.info(log_message)

    return score


# Optuna Hyperparameter Tuning (5-Class, Streaming Mini-Batches)

We tune the Transformer classifier with **Optuna** using **low-memory, file-streamed mini-batches** from `.mgf` files. The objective is to maximize a validation score computed consistently with the project’s evaluation function.

---

## Setup

* **Classes:** `num_classes = 5`
* **Data:** `input_dir = "/content/drive/MyDrive/data/4_mod_balanced_dataset"`
* **Artifacts:** `model_weights_dir = "/content/drive/MyDrive/4_mod_model/model_weights"`
* **Precursor normalization:** `pepmass_range = {'min': 500.0, 'max': 6000.0}`, `window_normaliation_size = 200.0`
* **Device:** GPU if available (`torch.cuda.is_available()`)

### Controller flags

* `run_optuna = True` (run tuning)
* `n_trials = 100` (can be increased)
* `assume_observed = True` (passed to `combine_features` inside the objective), `load_latest_model_at_start = True` (kept for parity with the training loop; not used inside the objective)

---

## Search Space


Model/training hyperparameters tuned:

* `latent_size ∈ {64, 128, 256}`
* `num_heads ∈ {2, 4, 8}` with constraint **`latent_size % num_heads == 0`** (else prune)
* `num_layers ∈ [2, 6]` (int)
* `dropout_prob ∈ [0.10, 0.35]`
* `l1_lambda ∈ [1e−7, 5e−6]` (log-scaled)
* `learning_rate ∈ [1e−4, 3e−4]` (log-scaled)
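
To see how the head-divisibility constraint interacts with the categorical choices listed above, the following small sketch enumerates every `(latent_size, num_heads)` pair. It is only an illustration of the constraint and is not part of the tuning script.

```python
from itertools import product

latent_sizes = [64, 128, 256]
head_counts = [2, 4, 8]

# Keep only pairs where the embedding splits evenly across attention heads.
valid = [(d, h) for d, h in product(latent_sizes, head_counts) if d % h == 0]

for d, h in valid:
    print(f"latent_size={d:3d}, num_heads={h}, per-head dim={d // h}")

print(f"{len(valid)} of {len(latent_sizes) * len(head_counts)} combinations are valid")
```

With these particular choices every combination passes, so the pruning branch acts as a safeguard that only matters if the search space is later widened (for example, to a `latent_size` that is not a multiple of 8).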

---

## Objective & Data Flow (Low-Memory)

Each trial evaluates the candidate hyperparameters over **two independent streamed mini-batches** (two `.mgf` files), reducing variance without loading the full dataset.

1. **Deterministic seeding per trial**
   `seed = trial.number + 1337`; we seed Python, NumPy, and PyTorch (CPU/GPU).

2. **Mini-batch streaming**
   A `DatasetHandler` yields one file at a time (`NUM_BATCHES = 2`).
   For each file:

   * Build features with `combine_features(...)` →
     **(a)** binned spectrum vector of size `num_bins`, **(b)** normalized parent ion.

3. **Per-batch stratified split (80/20)**
   `train_test_split(..., stratify=labels, random_state=seed + batch_index)` ensures class balance within the batch.

4. **Inner training loop with pruning checkpoints**

   * Initialize a fresh `EncoderTransformerClassifier` for each mini-batch.
   * Train for **60 epochs** in **3 chunks of 20**:

     * after each 20 epochs, compute `val_score = evaluate_model(...)`
     * report to Optuna via `trial.report(val_score, step=...)`, where the step counts the epochs trained so far
     * allow early stopping via `trial.should_prune()`

5. **Trial objective**
   The trial’s score is the **mean** of the validation scores across the available mini-batches.
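
The chunked train/evaluate/report pattern from step 4 can be sketched without Optuna or a model. In the example below, `RecordingTrial` and `run_chunked` are hypothetical names, and the training and evaluation callables are dummies, so this shows only the control flow, not the notebook's actual objective.

```python
class RecordingTrial:
    """Hypothetical stand-in for an Optuna trial: stores reports and never prunes."""

    def __init__(self):
        self.reports = []

    def report(self, value, step):
        self.reports.append((step, value))

    def should_prune(self):
        return False


def run_chunked(trial, train_chunk, evaluate, total_epochs=60, chunk=20):
    """Train in fixed-size chunks and report a validation score after each chunk."""
    val_score = None
    for start in range(0, total_epochs, chunk):
        train_chunk(chunk)        # e.g. 20 more epochs on the same model
        val_score = evaluate()    # e.g. composite score on the held-out 20%
        trial.report(val_score, step=start + chunk)
        if trial.should_prune():
            # The notebook raises optuna.exceptions.TrialPruned at this point.
            raise RuntimeError("trial pruned")
    return val_score


# Dummy training/evaluation callables so the sketch runs without a model.
epochs_trained = []
trial = RecordingTrial()
final_score = run_chunked(
    trial,
    train_chunk=epochs_trained.append,
    evaluate=lambda: sum(epochs_trained) / 100,
)
print([step for step, _ in trial.reports])
print(sum(epochs_trained))
```

Reporting after every chunk gives the pruner three decision points per model instead of one at the end of training.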

---

## Pruning

We use **MedianPruner** (`n_warmup_steps = 5`) to terminate underperforming trials early based on intermediate results, saving compute time.
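
The idea behind median-based pruning can be illustrated in a few lines. This is a simplified sketch of the rule, not Optuna's implementation: the real `MedianPruner` applies further conditions (for example startup trials and warm-up steps), and `should_prune_median` and the scores below are made up for illustration.

```python
from statistics import median


def should_prune_median(current_value, previous_values_at_step, maximize=True):
    """Simplified median rule: prune when the current value is worse than the
    median of the values that earlier trials reported at the same step."""
    if not previous_values_at_step:
        return False  # nothing to compare against yet
    m = median(previous_values_at_step)
    return current_value < m if maximize else current_value > m


earlier_scores = [0.30, 0.40, 0.50]  # made-up validation scores at one step
print(should_prune_median(0.20, earlier_scores))
print(should_prune_median(0.45, earlier_scores))
```

Since the objective first reports at step 20, a warm-up of 5 steps does not delay pruning in this setup.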

---

## Running the Study & Diagnostics

* Create and run the study:

  ```python
  study = optuna.create_study(direction="maximize",
                              pruner=optuna.pruners.MedianPruner(n_warmup_steps=5))
  study.optimize(objective, n_trials=100)  # script currently uses 100 here
  ```
* After optimization, we report the best params and show standard Optuna plots:

  * Parameter Importances
  * Optimization History
  * Slice Plot
  * Parallel Coordinate Plot

These visualizations highlight sensitive hyperparameters and interactions.

---

## Reproducibility

* Fixed random seeds per trial and batch.
* Stratified splits preserve class balance within each streamed mini-batch.
* The attention head divisibility constraint (`latent_size % num_heads == 0`) prevents invalid architectures (invalid trials are pruned immediately).
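
A minimal sketch of the per-trial seeding idea, using only the standard library: the same trial number always yields the same shuffle. The helper names are hypothetical; the notebook itself additionally seeds NumPy and PyTorch and passes `seed + batch_index` to `train_test_split`.

```python
import random


def trial_seed(trial_number, offset=1337):
    """Derive a deterministic seed from the trial number, as in the objective."""
    return trial_number + offset


def shuffled_indices(n, seed):
    """Shuffle range(n) with a private RNG so global state is left untouched."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return indices


first = shuffled_indices(10, trial_seed(0))
again = shuffled_indices(10, trial_seed(0))
print(first == again)  # identical seed, identical order
```

Full determinism on GPU can require additional PyTorch settings, so seeding alone should be read as reducing run-to-run variation rather than guaranteeing bit-identical results.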

---

## Why This Design

* **Memory-efficient:** only one `.mgf` file is in memory at a time.
* **Variance-aware:** averaging over two distinct files stabilizes validation estimates.
* **Compute-savvy:** chunked training with intermediate evaluation enables effective **pruning**.
* **Consistent:** uses the same feature pipeline and evaluation routine as the main training workflow.



In [None]:
# === CONFIGURATION FLAGS ===
run_optuna = True  # Set to False if you want to run the full training loop instead
n_trials = 100  # Number of Optuna trials

# === FIXED PARAMS ===
num_classes = 5
pepmass_range = {'min': 500.0, 'max': 6000.0}
window_normaliation_size = 200.00
epoch = 100
num_loops = 1
min_score_threshold = 0.90
input_dir = "/content/drive/MyDrive/data/4_mod_balanced_dataset"
model_weights_dir = "/content/drive/MyDrive/4_mod_model/model_weights"
assume_observed = True
load_latest_model_at_start = True

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))

def objective(trial):
    # --- Input resolution (kept fixed here rather than tuned) ---
    num_bins = 4500
    # --- Model hyperparams ---
    latent_size  = trial.suggest_categorical("latent_size", [64, 128, 256])
    num_heads    = trial.suggest_categorical("num_heads", [2, 4, 8])
    if latent_size % num_heads != 0:
        raise optuna.exceptions.TrialPruned()

    num_layers   = trial.suggest_int("num_layers", 2, 6)
    dropout_prob = trial.suggest_float("dropout_prob", 0.1, 0.35)   # narrower to avoid underfit with L1
    l1_lambda    = trial.suggest_float("l1_lambda", 1e-7, 5e-6, log=True)
    lr           = trial.suggest_float("learning_rate", 1e-4, 3e-4, log=True)

    # --- Determinism per trial ---
    seed = trial.number + 1337
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # --- Evaluate on multiple mini-batches to reduce variance ---
    NUM_BATCHES = 2  # or 3 if affordable
    scores = []

    handler = DatasetHandler(input_dir, num_loops=1)

    for b in range(NUM_BATCHES):
        batch = handler.get_next_file()
        if batch is None:
            break
        spectra_batch, file_path = batch

        # (Optional) group-aware split: skip here if you can’t extract groups quickly
        feature_batch = combine_features(spectra_batch, pepmass_range, num_bins,
                                         window_normaliation_size, assume_observed)
        if not feature_batch:
            raise optuna.exceptions.TrialPruned()

        spectra, parent_ions = zip(*feature_batch)
        X_spec = torch.tensor(np.array(spectra), dtype=torch.float32, device=device)
        X_pi   = torch.tensor(np.array(parent_ions), dtype=torch.float32, device=device)
        y      = torch.tensor(spectrum_label(spectra_batch), dtype=torch.long, device=device)

        idx_tr, idx_va = train_test_split(np.arange(len(y)), test_size=0.2,
                                          stratify=y.detach().cpu().numpy(),
                                          random_state=seed + b)

        train_data = (X_spec[idx_tr], X_pi[idx_tr])
        val_data   = (X_spec[idx_va], X_pi[idx_va])
        y_tr, y_va = y[idx_tr], y[idx_va]

        model = EncoderTransformerClassifier(
            input_size=num_bins, latent_size=latent_size, num_heads=num_heads,
            num_layers=num_layers, dropout_prob=dropout_prob, num_classes=num_classes
        ).to(device)

        # Shorter inner training + report for pruning
        EPOCHS = 60
        for ep in range(0, EPOCHS, 20):
            train_classifier_with_weights(
                model, train_data, y_tr,
                epochs=20, learning_rate=lr, l1_lambda=l1_lambda,
                save_path="/dev/null", device=device
            )
            val_score = evaluate_model(model, val_data, y_va)
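            # Use a unique step per report; Optuna ignores a value reported for an already-used step
            trial.report(val_score, step=b * EPOCHS + ep + 20)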
            if trial.should_prune():
                raise optuna.exceptions.TrialPruned()

        scores.append(val_score)

    if not scores:
        raise optuna.exceptions.TrialPruned()

    return float(np.mean(scores))



study = optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner(n_warmup_steps=5))
study.optimize(objective, n_trials=n_trials)

print("Best trial:")
print(study.best_trial.params)

optuna.visualization.plot_param_importances(study).show()
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_slice(study).show()
optuna.visualization.plot_parallel_coordinate(study).show()

[I 2025-09-16 02:03:04,239] A new study created in memory with name: no-name-7b11e50a-d00c-4c58-8b60-feb9bc1486fd


CUDA available: True
Device: NVIDIA L4
Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset/split_file_037.mgf
Total skipped spectra: 0
Labels: Counter({0: 359, 1: 350, 3: 251, 2: 176, 4: 66})
Class Weights: [0.66968644 0.68642855 1.3631206  0.961      3.626415  ]
Epoch [1/20] - Loss: 2.1109, Accuracy: 22.37%
Epoch [2/20] - Loss: 1.7459, Accuracy: 46.62%
Epoch [3/20] - Loss: 1.6220, Accuracy: 55.46%
Epoch [4/20] - Loss: 1.5544, Accuracy: 58.79%
Epoch [5/20] - Loss: 1.4790, Accuracy: 64.00%
Epoch [6/20] - Loss: 1.4041, Accuracy: 70.66%
Epoch [7/20] - Loss: 1.3374, Accuracy: 73.78%
Epoch [8/20] - Loss: 1.2524, Accuracy: 78.67%
Epoch [9/20] - Loss: 1.1753, Accuracy: 82.52%
Epoch [10/20] - Loss: 1.0771, Accuracy: 86.47%
Sample Predictions: [1 3 0 1 3]
Actual Labels: tensor([1, 0, 0, 1, 3], device='cuda:0')
Sample Logits: [[-0.20299911  1.0040014   0.14009853  0.05741541 -0.73667336]
 [ 0.09131964 -0.38636446  0.23621047  1.3219966  -1.7356995 ]
 [ 1.8972821   1.2484958   0.1



Epoch [1/20] - Loss: 0.5740, Accuracy: 98.23%
Epoch [2/20] - Loss: 0.5450, Accuracy: 98.75%
Epoch [3/20] - Loss: 0.5170, Accuracy: 99.17%
Epoch [4/20] - Loss: 0.4918, Accuracy: 99.38%
Epoch [5/20] - Loss: 0.4730, Accuracy: 99.17%
Epoch [6/20] - Loss: 0.4556, Accuracy: 99.58%
Epoch [7/20] - Loss: 0.4407, Accuracy: 99.79%
Epoch [8/20] - Loss: 0.4289, Accuracy: 100.00%
Epoch [9/20] - Loss: 0.4209, Accuracy: 99.90%
Epoch [10/20] - Loss: 0.4097, Accuracy: 99.69%
Sample Predictions: [4 3 0 3 1]
Actual Labels: tensor([4, 3, 0, 3, 1], device='cuda:0')
Sample Logits: [[ 0.6491272  -1.0670115  -0.9472309  -0.6647108   4.0056014 ]
 [-1.298186   -2.4734921  -1.9847916   4.116307   -0.19793013]
 [ 4.081837   -1.1361313  -0.48404986 -0.8910955  -0.10031009]
 [-1.1658239  -1.1422902  -2.6387804   4.7766995  -0.8030215 ]
 [-2.760413    4.479639   -0.33274096 -1.5148419   0.25053257]]
Epoch [11/20] - Loss: 0.4013, Accuracy: 99.90%
Epoch [12/20] - Loss: 0.3969, Accuracy: 99.90%
Epoch [13/20] - Loss: 0.3



Epoch [1/20] - Loss: 0.3532, Accuracy: 99.90%
Epoch [2/20] - Loss: 0.3460, Accuracy: 99.79%
Epoch [3/20] - Loss: 0.3370, Accuracy: 100.00%
Epoch [4/20] - Loss: 0.3287, Accuracy: 100.00%
Epoch [5/20] - Loss: 0.3233, Accuracy: 100.00%
Epoch [6/20] - Loss: 0.3169, Accuracy: 100.00%
Epoch [7/20] - Loss: 0.3151, Accuracy: 99.90%
Epoch [8/20] - Loss: 0.3062, Accuracy: 100.00%
Epoch [9/20] - Loss: 0.3050, Accuracy: 99.90%
Epoch [10/20] - Loss: 0.3025, Accuracy: 99.79%
Sample Predictions: [4 3 0 3 1]
Actual Labels: tensor([4, 3, 0, 3, 1], device='cuda:0')
Sample Logits: [[-0.8548459  -0.6808745  -0.9439251  -2.7962697   5.0033236 ]
 [-1.029662   -1.7174983  -1.6527284   5.4841094  -0.7007811 ]
 [ 5.978806   -0.5135433  -2.001393   -0.9120028  -0.7340684 ]
 [-1.9594841  -1.2682695  -2.0802236   5.4405236  -2.437881  ]
 [-3.1794245   5.720708   -1.446897   -0.65961975 -2.4329216 ]]
Epoch [11/20] - Loss: 0.2961, Accuracy: 100.00%
Epoch [12/20] - Loss: 0.2938, Accuracy: 100.00%
Epoch [13/20] - Los

[I 2025-09-16 02:03:47,522] Trial 0 finished with value: 0.45970645228491547 and parameters: {'latent_size': 256, 'num_heads': 8, 'num_layers': 3, 'dropout_prob': 0.3324259231699741, 'l1_lambda': 2.554307905179933e-06, 'learning_rate': 0.00015352333906687565}. Best is trial 0 with value: 0.45970645228491547.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.5844,
Macro Precision: 0.6179, Macro Recall: 0.4894, Macro F1-score: 0.5093,
Weighted Precision: 0.5821, Weighted Recall: 0.5062, Weighted F1-score: 0.4922,
Accuracy: 50.62%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3986    0.8194    0.5364        72
     Oxidation     0.5588    0.2714    0.3654        70
       Phospho     0.5882    0.2778    0.3774        36
Ubiquitination     0.8438    0.5400    0.6585        50
   Acetylation     0.7000    0.5385    0.6087        13

      accuracy                         0.5062       241
     macro avg     0.6179    0.4894    0.5093       241
  weighted avg     0.5821    0.5062    0.4922       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.1794, Accuracy: 100.00%
Epoch [2/20] - Loss: 0.1689, Accuracy: 100.00%
Epoch [3/20] - Loss: 0.1608, Accuracy: 100.00%
Epoch [4/20] - Loss: 0.1539, Accuracy: 100.00%
Epoch [5/20] - Loss: 0.1477, Accuracy: 100.00%
Epoch [6/20] - Loss: 0.1420, Accuracy: 100.00%
Epoch [7/20] - Loss: 0.1367, Accuracy: 100.00%
Epoch [8/20] - Loss: 0.1321, Accuracy: 100.00%
Epoch [9/20] - Loss: 0.1275, Accuracy: 100.00%
Epoch [10/20] - Loss: 0.1236, Accuracy: 100.00%
Sample Predictions: [1 4 3 3 0]
Actual Labels: tensor([1, 4, 3, 3, 0], device='cuda:0')
Sample Logits: [[-0.9015016   5.5824714  -1.615439   -1.9910954  -1.9115705 ]
 [-1.7323582  -1.1244178  -1.4343807  -1.2761532   4.9419928 ]
 [-1.8458118  -1.7181554  -0.71022004  5.6791773  -1.142229  ]
 [-2.077652   -1.2945844  -0.5667371   6.1522255  -1.3739376 ]
 [ 5.7357655  -1.457063   -1.7795018  -1.4462341  -1.3287606 ]]
Epoch [11/20] - Loss: 0.1196, Accuracy: 100.00%
Epoch [12/20] - Loss: 0.1161, Accuracy: 100.00%
Epoch [13/20] 



Epoch [1/20] - Loss: 0.0931, Accuracy: 100.00%
Epoch [2/20] - Loss: 0.0888, Accuracy: 100.00%
Epoch [3/20] - Loss: 0.0853, Accuracy: 100.00%
Epoch [4/20] - Loss: 0.0825, Accuracy: 100.00%
Epoch [5/20] - Loss: 0.0806, Accuracy: 100.00%
Epoch [6/20] - Loss: 0.0791, Accuracy: 100.00%
Epoch [7/20] - Loss: 0.0773, Accuracy: 100.00%
Epoch [8/20] - Loss: 0.0754, Accuracy: 100.00%
Epoch [9/20] - Loss: 0.0740, Accuracy: 100.00%
Epoch [10/20] - Loss: 0.0731, Accuracy: 100.00%
Sample Predictions: [1 4 3 3 0]
Actual Labels: tensor([1, 4, 3, 3, 0], device='cuda:0')
Sample Logits: [[-1.5036025  5.877358  -1.7700018 -1.4335291 -1.7129319]
 [-1.9504555 -0.7943862 -1.2545526 -2.760554   6.0958195]
 [-1.5496368 -1.7522871 -1.5289338  6.5487633 -1.4700581]
 [-1.2560816 -0.9728955 -1.6075246  6.396636  -2.1517239]
 [ 6.4411135 -1.2566941 -2.1498766 -1.6889826 -1.9030912]]
Epoch [11/20] - Loss: 0.0722, Accuracy: 100.00%
Epoch [12/20] - Loss: 0.0712, Accuracy: 100.00%
Epoch [13/20] - Loss: 0.0703, Accuracy:

[I 2025-09-16 02:04:20,950] Trial 1 finished with value: 0.1810526757997597 and parameters: {'latent_size': 256, 'num_heads': 2, 'num_layers': 3, 'dropout_prob': 0.1074495756399585, 'l1_lambda': 9.442178935006351e-07, 'learning_rate': 0.0001392725350542664}. Best is trial 0 with value: 0.45970645228491547.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 4.0211,
Macro Precision: 0.6176, Macro Recall: 0.2577, Macro F1-score: 0.2366,
Weighted Precision: 0.5069, Weighted Recall: 0.3485, Weighted F1-score: 0.2897,
Accuracy: 34.85%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3309    0.6389    0.4360        72
     Oxidation     0.3404    0.4571    0.3902        70
       Phospho     0.6667    0.0556    0.1026        36
Ubiquitination     0.7500    0.0600    0.1111        50
   Acetylation     1.0000    0.0769    0.1429        13

      accuracy                         0.3485       241
     macro avg     0.6176    0.2577    0.2366       241
  weighted avg     0.5069    0.3485    0.2897       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.6429, Accuracy: 98.54%
Epoch [2/20] - Loss: 0.6419, Accuracy: 97.50%
Epoch [3/20] - Loss: 0.6115, Accuracy: 98.54%
Epoch [4/20] - Loss: 0.5946, Accuracy: 98.86%
Epoch [5/20] - Loss: 0.5838, Accuracy: 99.38%
Epoch [6/20] - Loss: 0.5729, Accuracy: 99.48%
Epoch [7/20] - Loss: 0.5612, Accuracy: 99.27%
Epoch [8/20] - Loss: 0.5526, Accuracy: 99.48%
Epoch [9/20] - Loss: 0.5457, Accuracy: 99.58%
Epoch [10/20] - Loss: 0.5318, Accuracy: 99.90%
Sample Predictions: [1 0 0 2 1]
Actual Labels: tensor([1, 0, 0, 2, 1], device='cuda:0')
Sample Logits: [[-1.8444754   4.9157977  -0.9667365  -1.8495096  -1.7762079 ]
 [ 4.984418   -1.0591029  -1.8548821  -2.055855   -0.23275384]
 [ 4.8112254  -0.73211354 -0.76489586 -2.0517476  -0.4817525 ]
 [-0.1159778  -2.2008038   4.301184   -0.09642613 -3.73359   ]
 [-0.58360493  5.097138   -1.303732   -1.8763947  -0.8232776 ]]
Epoch [11/20] - Loss: 0.5224, Accuracy: 100.00%
Epoch [12/20] - Loss: 0.5279, Accuracy: 99.58%
Epoch [13/20] - Loss: 0.5



Epoch [1/20] - Loss: 0.4871, Accuracy: 99.90%
Epoch [2/20] - Loss: 0.4791, Accuracy: 100.00%
Epoch [3/20] - Loss: 0.4701, Accuracy: 100.00%
Epoch [4/20] - Loss: 0.4613, Accuracy: 100.00%
Epoch [5/20] - Loss: 0.4595, Accuracy: 99.79%
Epoch [6/20] - Loss: 0.4523, Accuracy: 99.90%
Epoch [7/20] - Loss: 0.4464, Accuracy: 99.90%
Epoch [8/20] - Loss: 0.4433, Accuracy: 100.00%
Epoch [9/20] - Loss: 0.4398, Accuracy: 99.90%
Epoch [10/20] - Loss: 0.4404, Accuracy: 99.58%
Sample Predictions: [1 0 0 2 1]
Actual Labels: tensor([1, 0, 0, 2, 1], device='cuda:0')
Sample Logits: [[-2.516789    5.466916   -1.7764587  -1.3374783  -0.9201284 ]
 [ 5.519775   -2.0392804  -1.2488459  -1.5923504  -2.3520586 ]
 [ 5.7230477  -1.301386   -0.879325   -2.5928268  -2.300845  ]
 [-3.324475    0.06013795  4.9810224  -1.2025931  -1.8992455 ]
 [-1.552488    5.8095617  -1.6046686  -1.5247418  -0.9263502 ]]
Epoch [11/20] - Loss: 0.4349, Accuracy: 99.79%
Epoch [12/20] - Loss: 0.4328, Accuracy: 99.79%
Epoch [13/20] - Loss: 

[I 2025-09-16 02:04:57,874] Trial 2 finished with value: 0.3851632419056277 and parameters: {'latent_size': 256, 'num_heads': 2, 'num_layers': 6, 'dropout_prob': 0.3156303680738798, 'l1_lambda': 2.286811223700905e-06, 'learning_rate': 0.00013323021262229078}. Best is trial 0 with value: 0.45970645228491547.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.6428,
Macro Precision: 0.6024, Macro Recall: 0.4672, Macro F1-score: 0.4942,
Weighted Precision: 0.5486, Weighted Recall: 0.4689, Weighted F1-score: 0.4700,
Accuracy: 46.89%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 51, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4177    0.4583    0.4371        72
     Oxidation     0.3717    0.6087    0.4615        69
       Phospho     0.6154    0.2222    0.3265        36
Ubiquitination     0.8800    0.4314    0.5789        51
   Acetylation     0.7273    0.6154    0.6667        13

      accuracy                         0.4689       241
     macro avg     0.6024    0.4672    0.4942       241
  weighted avg     0.5486    0.4689    0.4700       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.0852, Accuracy: 100.00%
Epoch [2/20] - Loss: 0.0854, Accuracy: 100.00%
Epoch [3/20] - Loss: 0.0836, Accuracy: 99.79%
Epoch [4/20] - Loss: 0.0766, Accuracy: 100.00%
Epoch [5/20] - Loss: 0.0755, Accuracy: 100.00%
Epoch [6/20] - Loss: 0.0767, Accuracy: 99.90%
Epoch [7/20] - Loss: 0.0720, Accuracy: 100.00%
Epoch [8/20] - Loss: 0.0708, Accuracy: 100.00%
Epoch [9/20] - Loss: 0.0694, Accuracy: 100.00%
Epoch [10/20] - Loss: 0.0701, Accuracy: 99.90%
Sample Predictions: [0 1 2 1 1]
Actual Labels: tensor([0, 1, 2, 1, 1], device='cuda:0')
Sample Logits: [[ 6.1131496  -1.8114359  -2.1537564  -0.9339235  -0.8674173 ]
 [-1.11677     5.6729503  -2.0855412  -1.9064401  -1.2417022 ]
 [-1.6120236  -1.712482    5.7670836  -1.6930152  -1.0094173 ]
 [-1.940094    6.0956297  -2.1873143  -0.76611924 -2.1858623 ]
 [-1.5458897   6.072611   -2.6996717  -1.1902865  -1.3527513 ]]
Epoch [11/20] - Loss: 0.0676, Accuracy: 100.00%
Epoch [12/20] - Loss: 0.0671, Accuracy: 100.00%
Epoch [13/20] - L



Epoch [1/20] - Loss: 0.0623, Accuracy: 100.00%
Epoch [2/20] - Loss: 0.0682, Accuracy: 99.69%
Epoch [3/20] - Loss: 0.0608, Accuracy: 100.00%
Epoch [4/20] - Loss: 0.1536, Accuracy: 97.92%
Epoch [5/20] - Loss: 0.0634, Accuracy: 99.90%
Epoch [6/20] - Loss: 0.0606, Accuracy: 100.00%
Epoch [7/20] - Loss: 0.0776, Accuracy: 99.38%
Epoch [8/20] - Loss: 0.0746, Accuracy: 99.79%
Epoch [9/20] - Loss: 0.0819, Accuracy: 99.48%
Epoch [10/20] - Loss: 0.0733, Accuracy: 99.69%
Sample Predictions: [0 1 2 1 1]
Actual Labels: tensor([0, 1, 2, 1, 1], device='cuda:0')
Sample Logits: [[ 6.7322955  -2.0821352  -1.2371323  -2.055483   -1.7414768 ]
 [-0.20334949  6.0263205  -2.7230985  -2.7699661  -2.0494292 ]
 [-0.45035937 -2.1200972   6.474992   -2.3507814  -0.9505376 ]
 [ 0.3882888   5.840916   -3.4187434  -3.0116324  -2.3408935 ]
 [-1.0214154   6.1240363  -2.5956864  -2.3467078  -1.7605392 ]]
Epoch [11/20] - Loss: 0.0629, Accuracy: 99.90%
Epoch [12/20] - Loss: 0.0617, Accuracy: 100.00%
Epoch [13/20] - Loss: 

[I 2025-09-16 02:05:31,769] Trial 3 finished with value: 0.3574098202586793 and parameters: {'latent_size': 256, 'num_heads': 2, 'num_layers': 6, 'dropout_prob': 0.15882953649629808, 'l1_lambda': 2.871436972456132e-07, 'learning_rate': 0.00017883532499639082}. Best is trial 0 with value: 0.45970645228491547.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 4.2122,
Macro Precision: 0.6223, Macro Recall: 0.3524, Macro F1-score: 0.3642,
Weighted Precision: 0.5460, Weighted Recall: 0.4066, Weighted F1-score: 0.3623,
Accuracy: 40.66%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 51, np.int64(2): 34, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3497    0.8889    0.5020        72
     Oxidation     0.4815    0.1831    0.2653        71
       Phospho     0.3571    0.1471    0.2083        34
Ubiquitination     0.9231    0.2353    0.3750        51
   Acetylation     1.0000    0.3077    0.4706        13

      accuracy                         0.4066       241
     macro avg     0.6223    0.3524    0.3642       241
  weighted avg     0.5460    0.4066    0.3623       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 1.0198, Accuracy: 76.17%
Epoch [2/20] - Loss: 1.0033, Accuracy: 79.50%
Epoch [3/20] - Loss: 0.9474, Accuracy: 82.31%
Epoch [4/20] - Loss: 0.9342, Accuracy: 82.73%
Epoch [5/20] - Loss: 0.8875, Accuracy: 81.89%
Epoch [6/20] - Loss: 0.8817, Accuracy: 82.21%
Epoch [7/20] - Loss: 0.8469, Accuracy: 83.45%
Epoch [8/20] - Loss: 0.8288, Accuracy: 85.74%
Epoch [9/20] - Loss: 0.8084, Accuracy: 84.39%
Epoch [10/20] - Loss: 0.7773, Accuracy: 86.37%
Sample Predictions: [2 0 1 0 1]
Actual Labels: tensor([2, 0, 1, 0, 1], device='cuda:0')
Sample Logits: [[ 0.57931817  0.5453814   1.4917088  -1.1436064  -0.7668064 ]
 [ 2.3510478   0.14683191  0.3091127  -1.2842909  -1.4697292 ]
 [ 1.2044597   1.2079535   0.24396324 -1.7802069  -0.28387892]
 [ 1.2382075  -0.46130157 -0.27521768 -0.27981424 -1.1775322 ]
 [ 0.61870116  1.8372345   0.5958513  -2.1389413  -0.0574432 ]]
Epoch [11/20] - Loss: 0.7532, Accuracy: 87.41%
Epoch [12/20] - Loss: 0.7137, Accuracy: 88.76%
Epoch [13/20] - Loss: 0.71



Epoch [1/20] - Loss: 0.5850, Accuracy: 92.30%
Epoch [2/20] - Loss: 0.5621, Accuracy: 93.24%
Epoch [3/20] - Loss: 0.5527, Accuracy: 94.17%
Epoch [4/20] - Loss: 0.5417, Accuracy: 94.28%
Epoch [5/20] - Loss: 0.5128, Accuracy: 94.69%
Epoch [6/20] - Loss: 0.5157, Accuracy: 94.28%
Epoch [7/20] - Loss: 0.5018, Accuracy: 95.42%
Epoch [8/20] - Loss: 0.4892, Accuracy: 95.32%
Epoch [9/20] - Loss: 0.4805, Accuracy: 95.63%
Epoch [10/20] - Loss: 0.4708, Accuracy: 95.53%
Sample Predictions: [2 0 1 0 0]
Actual Labels: tensor([2, 0, 1, 0, 1], device='cuda:0')
Sample Logits: [[ 0.71913886  0.36124653  1.6373782  -0.25173315 -1.6224294 ]
 [ 2.8551676  -1.2158699  -0.4916129   0.04744786 -1.3321207 ]
 [-0.29796657  2.321404    0.52172357 -1.3439256  -0.45710787]
 [ 3.0197449  -0.40848374 -0.68282455 -1.1058644  -1.0821557 ]
 [ 1.7932496   0.9362526  -0.53853774 -1.6361341  -1.1074337 ]]
Epoch [11/20] - Loss: 0.4523, Accuracy: 96.46%
Epoch [12/20] - Loss: 0.4295, Accuracy: 97.61%
Epoch [13/20] - Loss: 0.44

[I 2025-09-16 02:06:06,059] Trial 4 finished with value: 0.48896709495588064 and parameters: {'latent_size': 64, 'num_heads': 8, 'num_layers': 6, 'dropout_prob': 0.26102994700216464, 'l1_lambda': 1.2390732708942145e-06, 'learning_rate': 0.00012714380152261821}. Best is trial 4 with value: 0.48896709495588064.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.6515,
Macro Precision: 0.6652, Macro Recall: 0.4815, Macro F1-score: 0.5212,
Weighted Precision: 0.5999, Weighted Recall: 0.5187, Weighted F1-score: 0.5146,
Accuracy: 51.87%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4135    0.7639    0.5366        72
     Oxidation     0.4746    0.4000    0.4341        70
       Phospho     0.6154    0.2222    0.3265        36
Ubiquitination     0.9655    0.5600    0.7089        50
   Acetylation     0.8571    0.4615    0.6000        13

      accuracy                         0.5187       241
     macro avg     0.6652    0.4815    0.5212       241
  weighted avg     0.5999    0.5187    0.5146       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:06:23,061] Trial 5 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 4.4662,
Macro Precision: 0.5314, Macro Recall: 0.2834, Macro F1-score: 0.2555,
Weighted Precision: 0.5590, Weighted Recall: 0.3983, Weighted F1-score: 0.3424,
Accuracy: 39.83%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3551    0.6806    0.4667        72
     Oxidation     0.4022    0.5286    0.4568        70
       Phospho     1.0000    0.0278    0.0541        36
Ubiquitination     0.9000    0.1800    0.3000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3983       241
     macro avg     0.5314    0.2834    0.2555       241
  weighted avg     0.5590    0.3983    0.3424       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.5618, Accuracy: 92.82%
Epoch [2/20] - Loss: 0.5196, Accuracy: 94.59%
Epoch [3/20] - Loss: 0.4888, Accuracy: 95.42%
Epoch [4/20] - Loss: 0.4531, Accuracy: 95.84%
Epoch [5/20] - Loss: 0.4205, Accuracy: 96.15%
Epoch [6/20] - Loss: 0.3882, Accuracy: 97.81%
Epoch [7/20] - Loss: 0.3785, Accuracy: 97.50%
Epoch [8/20] - Loss: 0.3481, Accuracy: 98.23%
Epoch [9/20] - Loss: 0.3240, Accuracy: 98.23%
Epoch [10/20] - Loss: 0.3055, Accuracy: 98.75%
Sample Predictions: [0 0 1 3 0]
Actual Labels: tensor([0, 0, 1, 3, 0], device='cuda:0')
Sample Logits: [[ 1.5277525  -1.4290016  -0.41591987  0.25049967 -1.3482459 ]
 [ 3.0396886  -1.0808712  -1.0782348  -1.7712617  -1.2253308 ]
 [-0.28458014  2.789787   -0.3528243  -0.5853885  -0.38841513]
 [ 0.72585636 -1.1641574  -0.3586719   1.8339257  -0.08585422]
 [ 2.7989817  -1.2574205  -1.3911912  -1.0239453  -0.71908593]]
Epoch [11/20] - Loss: 0.2918, Accuracy: 98.44%
Epoch [12/20] - Loss: 0.2790, Accuracy: 99.27%
Epoch [13/20] - Loss: 0.26



Epoch [1/20] - Loss: 0.1864, Accuracy: 99.69%
Epoch [2/20] - Loss: 0.1777, Accuracy: 99.69%
Epoch [3/20] - Loss: 0.1764, Accuracy: 100.00%
Epoch [4/20] - Loss: 0.1686, Accuracy: 99.79%
Epoch [5/20] - Loss: 0.1620, Accuracy: 99.90%
Epoch [6/20] - Loss: 0.1566, Accuracy: 99.79%
Epoch [7/20] - Loss: 0.1533, Accuracy: 99.90%
Epoch [8/20] - Loss: 0.1480, Accuracy: 99.79%
Epoch [9/20] - Loss: 0.1409, Accuracy: 100.00%
Epoch [10/20] - Loss: 0.1380, Accuracy: 99.90%
Sample Predictions: [0 0 1 3 0]
Actual Labels: tensor([0, 0, 1, 3, 0], device='cuda:0')
Sample Logits: [[ 3.509233   -0.67702687 -1.6568195  -1.377723   -1.2514096 ]
 [ 3.6240847  -0.719508   -1.0251371  -1.9760281  -1.6539931 ]
 [-1.0817628   3.040658   -0.873116   -0.29654032 -0.06930879]
 [-1.1392211  -0.49281174 -0.5054391   3.4984467  -1.0978441 ]
 [ 3.4819913  -1.3079486  -1.984263   -2.286075   -0.8245727 ]]
Epoch [11/20] - Loss: 0.1365, Accuracy: 99.90%
Epoch [12/20] - Loss: 0.1347, Accuracy: 100.00%
Epoch [13/20] - Loss: 0

[I 2025-09-16 02:06:56,003] Trial 6 finished with value: 0.4439291601528968 and parameters: {'latent_size': 128, 'num_heads': 2, 'num_layers': 4, 'dropout_prob': 0.286876831159987, 'l1_lambda': 6.714575554423259e-07, 'learning_rate': 0.00011644547861384849}. Best is trial 4 with value: 0.48896709495588064.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.5950,
Macro Precision: 0.6302, Macro Recall: 0.4229, Macro F1-score: 0.4488,
Weighted Precision: 0.5824, Weighted Recall: 0.4647, Weighted F1-score: 0.4449,
Accuracy: 46.47%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3659    0.8333    0.5085        72
     Oxidation     0.5833    0.2000    0.2979        70
       Phospho     0.5217    0.3333    0.4068        36
Ubiquitination     0.8800    0.4400    0.5867        50
   Acetylation     0.8000    0.3077    0.4444        13

      accuracy                         0.4647       241
     macro avg     0.6302    0.4229    0.4488       241
  weighted avg     0.5824    0.4647    0.4449       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.7512, Accuracy: 95.84%
Epoch [2/20] - Loss: 0.6998, Accuracy: 96.67%
Epoch [3/20] - Loss: 0.6571, Accuracy: 97.29%
Epoch [4/20] - Loss: 0.6331, Accuracy: 97.40%
Epoch [5/20] - Loss: 0.6043, Accuracy: 98.13%
Epoch [6/20] - Loss: 0.5734, Accuracy: 98.13%
Epoch [7/20] - Loss: 0.5447, Accuracy: 99.06%
Epoch [8/20] - Loss: 0.5361, Accuracy: 98.96%
Epoch [9/20] - Loss: 0.5208, Accuracy: 98.86%
Epoch [10/20] - Loss: 0.4981, Accuracy: 99.48%
Sample Predictions: [3 1 0 2 0]
Actual Labels: tensor([3, 1, 4, 2, 0], device='cuda:0')
Sample Logits: [[-1.083635   -0.87971014 -1.2709255   3.6667624  -1.855731  ]
 [-0.3529801   3.1764162  -1.4788213  -1.429598   -1.7807314 ]
 [ 1.82805    -0.02614466 -2.0540836  -0.8600005   1.7195865 ]
 [-2.3096776  -0.08757638  3.2707489  -2.2053351   0.35996264]
 [ 3.6899529  -1.0662371  -0.6715085  -0.13858688 -0.8996696 ]]
Epoch [11/20] - Loss: 0.4852, Accuracy: 99.17%
Epoch [12/20] - Loss: 0.4673, Accuracy: 99.38%
Epoch [13/20] - Loss: 0.46



Epoch [1/20] - Loss: 0.3939, Accuracy: 99.79%
Epoch [2/20] - Loss: 0.3729, Accuracy: 99.90%
Epoch [3/20] - Loss: 0.3570, Accuracy: 99.79%
Epoch [4/20] - Loss: 0.3470, Accuracy: 99.69%
Epoch [5/20] - Loss: 0.3288, Accuracy: 99.79%
Epoch [6/20] - Loss: 0.3206, Accuracy: 99.79%
Epoch [7/20] - Loss: 0.3137, Accuracy: 99.90%
Epoch [8/20] - Loss: 0.3037, Accuracy: 99.90%
Epoch [9/20] - Loss: 0.3050, Accuracy: 99.79%
Epoch [10/20] - Loss: 0.2951, Accuracy: 99.90%
Sample Predictions: [3 1 4 2 0]
Actual Labels: tensor([3, 1, 4, 2, 0], device='cuda:0')
Sample Logits: [[-2.0132427   0.1457225  -1.9915156   3.850112   -1.321104  ]
 [-0.9885253   3.2196906  -1.4660218  -0.94673336 -1.6655046 ]
 [-0.2632801  -1.6234516  -0.94818276 -0.34847164  4.1803107 ]
 [-1.2164174  -1.4121763   4.7742305  -1.6113163  -1.4544464 ]
 [ 4.6603336  -1.1238447  -0.7401542  -1.1637874  -1.4665279 ]]
Epoch [11/20] - Loss: 0.2987, Accuracy: 99.79%
Epoch [12/20] - Loss: 0.3079, Accuracy: 99.17%
Epoch [13/20] - Loss: 0.28

[I 2025-09-16 02:07:28,996] Trial 7 finished with value: 0.36598550159777754 and parameters: {'latent_size': 128, 'num_heads': 2, 'num_layers': 4, 'dropout_prob': 0.3072162408804823, 'l1_lambda': 3.5444429102004172e-06, 'learning_rate': 0.00025355131570051084}. Best is trial 4 with value: 0.48896709495588064.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.9607,
Macro Precision: 0.7924, Macro Recall: 0.3599, Macro F1-score: 0.3170,
Weighted Precision: 0.7657, Weighted Recall: 0.3734, Weighted F1-score: 0.2623,
Accuracy: 37.34%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3273    1.0000    0.4932        72
     Oxidation     1.0000    0.0143    0.0282        70
       Phospho     0.8571    0.1667    0.2791        36
Ubiquitination     1.0000    0.0800    0.1481        50
   Acetylation     0.7778    0.5385    0.6364        13

      accuracy                         0.3734       241
     macro avg     0.7924    0.3599    0.3170       241
  weighted avg     0.7657    0.3734    0.2623       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.7870, Accuracy: 91.57%
Epoch [2/20] - Loss: 0.7417, Accuracy: 93.55%
Epoch [3/20] - Loss: 0.6895, Accuracy: 95.01%
Epoch [4/20] - Loss: 0.6894, Accuracy: 92.82%
Epoch [5/20] - Loss: 0.6450, Accuracy: 94.90%
Epoch [6/20] - Loss: 0.6106, Accuracy: 96.05%
Epoch [7/20] - Loss: 0.5970, Accuracy: 95.73%
Epoch [8/20] - Loss: 0.5653, Accuracy: 97.29%
Epoch [9/20] - Loss: 0.5388, Accuracy: 96.98%
Epoch [10/20] - Loss: 0.5245, Accuracy: 97.40%
Sample Predictions: [1 0 2 4 0]
Actual Labels: tensor([1, 0, 2, 4, 0], device='cuda:0')
Sample Logits: [[-0.9317111   2.8848097   1.1565447  -0.8123777  -0.19613844]
 [ 2.623687   -1.1442313   0.19225837 -1.0281458  -1.3384595 ]
 [ 0.3499479   0.02253477  2.0424945  -1.7568079  -0.68864036]
 [-1.7653502  -1.3984326  -1.0097094  -0.16926405  2.7629642 ]
 [ 3.1041358  -0.91882485 -0.4657786  -0.42212743 -2.0655844 ]]
Epoch [11/20] - Loss: 0.4986, Accuracy: 98.23%
Epoch [12/20] - Loss: 0.4844, Accuracy: 97.81%
Epoch [13/20] - Loss: 0.46



Epoch [1/20] - Loss: 0.3671, Accuracy: 99.79%
Epoch [2/20] - Loss: 0.3591, Accuracy: 99.06%
Epoch [3/20] - Loss: 0.3507, Accuracy: 99.38%
Epoch [4/20] - Loss: 0.3356, Accuracy: 99.79%
Epoch [5/20] - Loss: 0.3283, Accuracy: 99.69%
Epoch [6/20] - Loss: 0.3274, Accuracy: 99.06%
Epoch [7/20] - Loss: 0.3132, Accuracy: 99.58%
Epoch [8/20] - Loss: 0.3099, Accuracy: 99.48%
Epoch [9/20] - Loss: 0.2985, Accuracy: 99.90%
Epoch [10/20] - Loss: 0.2918, Accuracy: 99.69%
Sample Predictions: [1 0 2 4 0]
Actual Labels: tensor([1, 0, 2, 4, 0], device='cuda:0')
Sample Logits: [[-0.6584983   3.9629607   0.00624232 -0.27445868 -0.9153255 ]
 [ 3.6006892  -1.3187126  -0.8001404  -1.4055221  -0.9960394 ]
 [-0.8431084  -0.28503272  2.9041715  -0.8743236  -0.20086533]
 [-2.3392892  -1.3748244  -1.0738528  -0.14915848  3.0151951 ]
 [ 3.1938338  -1.1669188   0.19208613 -0.76484704 -2.046569  ]]
Epoch [11/20] - Loss: 0.2894, Accuracy: 99.79%
Epoch [12/20] - Loss: 0.2818, Accuracy: 99.58%
Epoch [13/20] - Loss: 0.27

[I 2025-09-16 02:08:02,098] Trial 8 finished with value: 0.4883033844083547 and parameters: {'latent_size': 128, 'num_heads': 2, 'num_layers': 4, 'dropout_prob': 0.30436257555967405, 'l1_lambda': 2.0327232313833876e-06, 'learning_rate': 0.00010734684777725852}. Best is trial 4 with value: 0.48896709495588064.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.0934,
Macro Precision: 0.5496, Macro Recall: 0.4465, Macro F1-score: 0.4644,
Weighted Precision: 0.5349, Weighted Recall: 0.4813, Weighted F1-score: 0.4731,
Accuracy: 48.13%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 50, np.int64(2): 37, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4667    0.6806    0.5537        72
     Oxidation     0.4268    0.5072    0.4636        69
       Phospho     0.3600    0.2432    0.2903        37
Ubiquitination     0.8947    0.3400    0.4928        50
   Acetylation     0.6000    0.4615    0.5217        13

      accuracy                         0.4813       241
     macro avg     0.5496    0.4465    0.4644       241
  weighted avg     0.5349    0.4813    0.4731       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:08:19,116] Trial 9 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 4.7367,
Macro Precision: 0.7297, Macro Recall: 0.3450, Macro F1-score: 0.3299,
Weighted Precision: 0.7118, Weighted Recall: 0.3900, Weighted F1-score: 0.3072,
Accuracy: 39.00%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3333    0.9722    0.4965        72
     Oxidation     1.0000    0.0429    0.0822        70
       Phospho     0.6154    0.2222    0.3265        36
Ubiquitination     0.9000    0.1800    0.3000        50
   Acetylation     0.8000    0.3077    0.4444        13

      accuracy                         0.3900       241
     macro avg     0.7297    0.3450    0.3299       241
  weighted avg     0.7118    0.3900    0.3072       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.7151, Accuracy: 91.16%
Epoch [2/20] - Loss: 0.6630, Accuracy: 93.03%
Epoch [3/20] - Loss: 0.6487, Accuracy: 94.07%
Epoch [4/20] - Loss: 0.6279, Accuracy: 93.13%
Epoch [5/20] - Loss: 0.6003, Accuracy: 94.48%
Epoch [6/20] - Loss: 0.5640, Accuracy: 95.53%
Epoch [7/20] - Loss: 0.5479, Accuracy: 95.53%
Epoch [8/20] - Loss: 0.5234, Accuracy: 96.77%
Epoch [9/20] - Loss: 0.5068, Accuracy: 96.77%
Epoch [10/20] - Loss: 0.4856, Accuracy: 97.19%
Sample Predictions: [1 3 0 1 4]
Actual Labels: tensor([1, 3, 0, 1, 4], device='cuda:0')
Sample Logits: [[-0.09936766  1.7754691  -0.62100387 -0.16132085 -0.11062351]
 [-1.003631   -1.0056942   0.05262201  1.6027391  -0.14164236]
 [ 1.9989029  -0.08301423 -0.41280168 -0.7626832  -1.1441449 ]
 [-0.44690496  2.4928372  -0.77022755 -1.0380456  -0.29961008]
 [-1.3056499  -0.56948894  0.15010545  0.35562927  1.2742864 ]]
Epoch [11/20] - Loss: 0.4579, Accuracy: 97.09%
Epoch [12/20] - Loss: 0.4468, Accuracy: 97.50%
Epoch [13/20] - Loss: 0.42



Epoch [1/20] - Loss: 0.3234, Accuracy: 99.06%
Epoch [2/20] - Loss: 0.3145, Accuracy: 98.65%
Epoch [3/20] - Loss: 0.3088, Accuracy: 98.44%
Epoch [4/20] - Loss: 0.2847, Accuracy: 99.38%
Epoch [5/20] - Loss: 0.2793, Accuracy: 99.58%
Epoch [6/20] - Loss: 0.2741, Accuracy: 99.27%
Epoch [7/20] - Loss: 0.2754, Accuracy: 98.86%
Epoch [8/20] - Loss: 0.2624, Accuracy: 99.38%
Epoch [9/20] - Loss: 0.2506, Accuracy: 99.27%
Epoch [10/20] - Loss: 0.2532, Accuracy: 99.06%
Sample Predictions: [1 3 0 1 4]
Actual Labels: tensor([1, 3, 0, 1, 4], device='cuda:0')
Sample Logits: [[ 0.5325899   2.7271616  -0.93228567 -1.4941804  -1.5681636 ]
 [-1.2880548  -0.34863454 -0.4900883   2.710211   -0.6747599 ]
 [ 2.0029054  -0.7482904  -0.96009624 -0.59342957 -1.0987883 ]
 [-0.3558101   2.880725   -0.23246187 -1.0283724   0.2223655 ]
 [-0.84839743 -0.8656178   0.33457813 -1.0187712   2.1830077 ]]
Epoch [11/20] - Loss: 0.2407, Accuracy: 99.27%
Epoch [12/20] - Loss: 0.2359, Accuracy: 99.38%
Epoch [13/20] - Loss: 0.22

[I 2025-09-16 02:08:51,534] Trial 10 finished with value: 0.4855964871000936 and parameters: {'latent_size': 64, 'num_heads': 4, 'num_layers': 2, 'dropout_prob': 0.252885327497997, 'l1_lambda': 4.735907101962292e-07, 'learning_rate': 0.00020116372239523293}. Best is trial 4 with value: 0.48896709495588064.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.5736,
Macro Precision: 0.6718, Macro Recall: 0.5358, Macro F1-score: 0.5650,
Weighted Precision: 0.6068, Weighted Recall: 0.5270, Weighted F1-score: 0.5221,
Accuracy: 52.70%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4173    0.8056    0.5498        72
     Oxidation     0.5000    0.3000    0.3750        70
       Phospho     0.5417    0.3611    0.4333        36
Ubiquitination     1.0000    0.5200    0.6842        50
   Acetylation     0.9000    0.6923    0.7826        13

      accuracy                         0.5270       241
     macro avg     0.6718    0.5358    0.5650       241
  weighted avg     0.6068    0.5270    0.5221       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 1.2911, Accuracy: 82.62%
Epoch [2/20] - Loss: 1.2517, Accuracy: 83.45%
Epoch [3/20] - Loss: 1.2181, Accuracy: 85.33%
Epoch [4/20] - Loss: 1.1959, Accuracy: 85.02%
Epoch [5/20] - Loss: 1.1590, Accuracy: 86.68%
Epoch [6/20] - Loss: 1.1447, Accuracy: 87.51%
Epoch [7/20] - Loss: 1.1281, Accuracy: 86.68%
Epoch [8/20] - Loss: 1.1082, Accuracy: 87.30%
Epoch [9/20] - Loss: 1.0767, Accuracy: 87.41%
Epoch [10/20] - Loss: 1.0706, Accuracy: 88.03%
Sample Predictions: [0 1 0 1 3]
Actual Labels: tensor([0, 2, 0, 1, 3], device='cuda:0')
Sample Logits: [[ 1.7655411  -1.3120025  -0.2669053   0.19838189 -0.34514722]
 [-0.53110504  1.0087625   0.9776596  -0.6704491  -0.90718555]
 [ 1.501792   -0.08765934  0.10059367 -0.7429749  -1.0782741 ]
 [-0.50898516  1.6540116   0.31080148 -0.66848886 -0.31972906]
 [ 0.93077314 -0.34401935  0.06503412  1.2565744  -0.8261093 ]]
Epoch [11/20] - Loss: 1.0443, Accuracy: 88.35%
Epoch [12/20] - Loss: 1.0018, Accuracy: 91.47%
Epoch [13/20] - Loss: 1.00



Epoch [1/20] - Loss: 0.8429, Accuracy: 93.76%
Epoch [2/20] - Loss: 0.8214, Accuracy: 93.76%
Epoch [3/20] - Loss: 0.8039, Accuracy: 93.55%
Epoch [4/20] - Loss: 0.8061, Accuracy: 92.92%
Epoch [5/20] - Loss: 0.7866, Accuracy: 93.96%
Epoch [6/20] - Loss: 0.7579, Accuracy: 94.38%
Epoch [7/20] - Loss: 0.7457, Accuracy: 94.59%
Epoch [8/20] - Loss: 0.7462, Accuracy: 95.01%
Epoch [9/20] - Loss: 0.7382, Accuracy: 94.17%
Epoch [10/20] - Loss: 0.7221, Accuracy: 93.86%
Sample Predictions: [0 2 0 1 3]
Actual Labels: tensor([0, 2, 0, 1, 3], device='cuda:0')
Sample Logits: [[ 2.5026233  -1.6490467  -0.5768271  -0.30992466 -0.19262682]
 [-0.44739968  0.4141614   2.2104623  -1.0314862  -0.9463008 ]
 [ 1.7493932  -1.2234923   0.31471172 -1.0013527  -0.09079039]
 [-0.74729425  3.2279992  -0.02191863  0.14714204  0.11844404]
 [ 0.07278349 -0.04840414 -0.8398979   2.8253338  -0.9439336 ]]
Epoch [11/20] - Loss: 0.6995, Accuracy: 95.53%
Epoch [12/20] - Loss: 0.6911, Accuracy: 95.11%
Epoch [13/20] - Loss: 0.67

[I 2025-09-16 02:09:25,494] Trial 11 finished with value: 0.42554668498843845 and parameters: {'latent_size': 64, 'num_heads': 8, 'num_layers': 5, 'dropout_prob': 0.25391542342556, 'l1_lambda': 4.795417971751232e-06, 'learning_rate': 0.00010208995679429773}. Best is trial 4 with value: 0.48896709495588064.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.7198,
Macro Precision: 0.4610, Macro Recall: 0.3802, Macro F1-score: 0.3887,
Weighted Precision: 0.4753, Weighted Recall: 0.4564, Weighted F1-score: 0.4393,
Accuracy: 45.64%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3790    0.6528    0.4796        72
     Oxidation     0.4426    0.3803    0.4091        71
       Phospho     0.4000    0.1143    0.1778        35
Ubiquitination     0.7500    0.6000    0.6667        50
   Acetylation     0.3333    0.1538    0.2105        13

      accuracy                         0.4564       241
     macro avg     0.4610    0.3802    0.3887       241
  weighted avg     0.4753    0.4564    0.4393       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.8958, Accuracy: 86.78%
Epoch [2/20] - Loss: 0.8665, Accuracy: 90.32%
Epoch [3/20] - Loss: 0.8397, Accuracy: 90.22%
Epoch [4/20] - Loss: 0.8303, Accuracy: 89.80%
Epoch [5/20] - Loss: 0.8010, Accuracy: 93.24%
Epoch [6/20] - Loss: 0.7935, Accuracy: 90.53%
Epoch [7/20] - Loss: 0.7644, Accuracy: 92.20%
Epoch [8/20] - Loss: 0.7441, Accuracy: 93.34%
Epoch [9/20] - Loss: 0.7319, Accuracy: 93.24%
Epoch [10/20] - Loss: 0.7033, Accuracy: 94.28%
Sample Predictions: [2 0 0 3 0]
Actual Labels: tensor([2, 0, 0, 3, 0], device='cuda:0')
Sample Logits: [[-0.5476371  -0.5003604   1.1723666  -0.94243765 -0.37684172]
 [ 2.3906958   0.47958803 -0.97434026 -0.2807087  -0.90770483]
 [ 1.7625679   0.4436748   0.27185965 -0.2950087  -1.1927346 ]
 [-0.38641745  0.548909   -1.6428427   1.0507499  -0.9515276 ]
 [ 1.3461231   0.02580935 -1.1801667   0.47299722 -0.698631  ]]
Epoch [11/20] - Loss: 0.6917, Accuracy: 95.11%
Epoch [12/20] - Loss: 0.6810, Accuracy: 93.96%
Epoch [13/20] - Loss: 0.65



Epoch [1/20] - Loss: 0.5580, Accuracy: 96.46%
Epoch [2/20] - Loss: 0.5376, Accuracy: 97.29%
Epoch [3/20] - Loss: 0.5149, Accuracy: 97.40%
Epoch [4/20] - Loss: 0.5069, Accuracy: 98.13%
Epoch [5/20] - Loss: 0.5113, Accuracy: 96.57%
Epoch [6/20] - Loss: 0.4929, Accuracy: 97.09%
Epoch [7/20] - Loss: 0.4882, Accuracy: 96.77%
Epoch [8/20] - Loss: 0.4754, Accuracy: 97.19%
Epoch [9/20] - Loss: 0.4627, Accuracy: 97.40%
Epoch [10/20] - Loss: 0.4501, Accuracy: 98.54%
Sample Predictions: [2 0 0 3 0]
Actual Labels: tensor([2, 0, 0, 3, 0], device='cuda:0')
Sample Logits: [[-0.6317601  -0.9181788   2.2965937  -1.3083026  -0.62202036]
 [ 2.5115683   0.1763939  -1.2010565  -0.42243338 -0.40909916]
 [ 2.1055996  -0.47378072 -1.1326708   0.08213166 -0.02665908]
 [ 0.33448887 -0.5155858  -0.6076135   2.1998394  -0.49368757]
 [ 2.442431   -0.7136097  -0.4528786  -0.5863105  -0.951018  ]]
Epoch [11/20] - Loss: 0.4541, Accuracy: 97.50%
Epoch [12/20] - Loss: 0.4349, Accuracy: 98.23%
Epoch [13/20] - Loss: 0.43

[I 2025-09-16 02:09:59,113] Trial 12 finished with value: 0.46393840256571456 and parameters: {'latent_size': 64, 'num_heads': 8, 'num_layers': 4, 'dropout_prob': 0.22522059888720267, 'l1_lambda': 1.2793705678675405e-06, 'learning_rate': 0.00010232498284648924}. Best is trial 4 with value: 0.48896709495588064.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.5586,
Macro Precision: 0.6255, Macro Recall: 0.4658, Macro F1-score: 0.4891,
Weighted Precision: 0.6026, Weighted Recall: 0.5062, Weighted F1-score: 0.4902,
Accuracy: 50.62%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 50, np.int64(2): 37, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3922    0.8333    0.5333        72
     Oxidation     0.6923    0.2609    0.3789        69
       Phospho     0.5000    0.2703    0.3509        37
Ubiquitination     0.8286    0.5800    0.6824        50
   Acetylation     0.7143    0.3846    0.5000        13

      accuracy                         0.5062       241
     macro avg     0.6255    0.4658    0.4891       241
  weighted avg     0.6026    0.5062    0.4902       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.6412, Accuracy: 93.13%
Epoch [2/20] - Loss: 0.6023, Accuracy: 94.07%
Epoch [3/20] - Loss: 0.5692, Accuracy: 95.53%
Epoch [4/20] - Loss: 0.5465, Accuracy: 96.15%
Epoch [5/20] - Loss: 0.5102, Accuracy: 96.46%
Epoch [6/20] - Loss: 0.4766, Accuracy: 97.71%
Epoch [7/20] - Loss: 0.4587, Accuracy: 97.61%
Epoch [8/20] - Loss: 0.4398, Accuracy: 98.75%
Epoch [9/20] - Loss: 0.4178, Accuracy: 98.54%
Epoch [10/20] - Loss: 0.4033, Accuracy: 98.96%
Sample Predictions: [2 1 0 4 2]
Actual Labels: tensor([2, 1, 0, 4, 2], device='cuda:0')
Sample Logits: [[-1.4886158  -1.1270541   2.4809027  -0.530072   -0.9052881 ]
 [-1.1167455   2.725729   -0.1803489  -0.25224823  0.02463698]
 [ 2.9405096  -1.6298378  -0.4227852  -0.5115466  -0.0743144 ]
 [-0.86658114  1.3361372   0.43229628 -0.8415612   1.7764649 ]
 [-1.2064657  -0.85698086  2.5130732  -0.12023726 -0.84829414]]
Epoch [11/20] - Loss: 0.3916, Accuracy: 98.65%
Epoch [12/20] - Loss: 0.3696, Accuracy: 99.17%
Epoch [13/20] - Loss: 0.35



Epoch [1/20] - Loss: 0.2941, Accuracy: 99.90%
Epoch [2/20] - Loss: 0.2888, Accuracy: 99.58%
Epoch [3/20] - Loss: 0.2818, Accuracy: 99.69%
Epoch [4/20] - Loss: 0.2701, Accuracy: 100.00%
Epoch [5/20] - Loss: 0.2660, Accuracy: 99.79%
Epoch [6/20] - Loss: 0.2644, Accuracy: 99.79%
Epoch [7/20] - Loss: 0.2540, Accuracy: 100.00%
Epoch [8/20] - Loss: 0.2495, Accuracy: 99.69%
Epoch [9/20] - Loss: 0.2438, Accuracy: 99.90%
Epoch [10/20] - Loss: 0.2399, Accuracy: 99.90%
Sample Predictions: [2 1 0 4 2]
Actual Labels: tensor([2, 1, 0, 4, 2], device='cuda:0')
Sample Logits: [[-0.8146743  -0.6436569   3.1489606  -1.1103306  -0.6926019 ]
 [-0.20218591  4.7239094  -0.69788826 -1.7243968  -1.2991579 ]
 [ 4.1043463  -2.183588   -1.4105784  -0.9214696  -1.3601489 ]
 [-0.13115002 -1.3731432  -0.6562053  -0.6424417   2.9979327 ]
 [-0.44071844 -0.22705807  2.8757966  -1.2928743  -0.71159744]]
Epoch [11/20] - Loss: 0.2367, Accuracy: 99.90%
Epoch [12/20] - Loss: 0.2320, Accuracy: 99.90%
Epoch [13/20] - Loss: 0.

[I 2025-09-16 02:10:32,484] Trial 13 finished with value: 0.49189069005318664 and parameters: {'latent_size': 128, 'num_heads': 4, 'num_layers': 5, 'dropout_prob': 0.2707563089515309, 'l1_lambda': 1.8298118713741343e-06, 'learning_rate': 0.00012192036281034578}. Best is trial 13 with value: 0.49189069005318664.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.5852,
Macro Precision: 0.5864, Macro Recall: 0.4452, Macro F1-score: 0.4690,
Weighted Precision: 0.5589, Weighted Recall: 0.4730, Weighted F1-score: 0.4626,
Accuracy: 47.30%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3904    0.7917    0.5229        72
     Oxidation     0.5833    0.2958    0.3925        71
       Phospho     0.3333    0.2571    0.2903        35
Ubiquitination     0.8750    0.4200    0.5676        50
   Acetylation     0.7500    0.4615    0.5714        13

      accuracy                         0.4730       241
     macro avg     0.5864    0.4452    0.4690       241
  weighted avg     0.5589    0.4730    0.4626       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.4257, Accuracy: 94.48%
Epoch [2/20] - Loss: 0.3915, Accuracy: 94.38%
Epoch [3/20] - Loss: 0.3692, Accuracy: 95.21%
Epoch [4/20] - Loss: 0.3414, Accuracy: 96.36%
Epoch [5/20] - Loss: 0.3152, Accuracy: 96.67%
Epoch [6/20] - Loss: 0.2991, Accuracy: 97.50%
Epoch [7/20] - Loss: 0.2890, Accuracy: 97.19%
Epoch [8/20] - Loss: 0.2628, Accuracy: 98.13%
Epoch [9/20] - Loss: 0.2446, Accuracy: 98.34%
Epoch [10/20] - Loss: 0.2303, Accuracy: 98.65%
Sample Predictions: [2 1 0 0 2]
Actual Labels: tensor([2, 1, 0, 0, 2], device='cuda:0')
Sample Logits: [[-1.0495127  -0.77572083  3.237881   -0.74821955 -1.2354686 ]
 [-1.8834236   3.3275769  -0.5736055  -0.68611765 -0.9728243 ]
 [ 3.170656   -1.8681527  -0.60485476 -0.5770362  -0.9456804 ]
 [ 3.485796   -1.5571274  -1.2005073  -1.4692556   0.01110803]
 [-1.018664    0.3872731   2.6924574  -1.4566672  -1.3336529 ]]
Epoch [11/20] - Loss: 0.2220, Accuracy: 98.75%
Epoch [12/20] - Loss: 0.2072, Accuracy: 99.38%
Epoch [13/20] - Loss: 0.20



Epoch [1/20] - Loss: 0.1428, Accuracy: 100.00%
Epoch [2/20] - Loss: 0.1378, Accuracy: 99.90%
Epoch [3/20] - Loss: 0.1355, Accuracy: 99.79%
Epoch [4/20] - Loss: 0.1339, Accuracy: 99.58%
Epoch [5/20] - Loss: 0.1240, Accuracy: 99.90%
Epoch [6/20] - Loss: 0.1199, Accuracy: 100.00%
Epoch [7/20] - Loss: 0.1175, Accuracy: 99.79%
Epoch [8/20] - Loss: 0.1139, Accuracy: 99.90%
Epoch [9/20] - Loss: 0.1102, Accuracy: 99.90%
Epoch [10/20] - Loss: 0.1091, Accuracy: 99.90%
Sample Predictions: [2 1 0 0 2]
Actual Labels: tensor([2, 1, 0, 0, 2], device='cuda:0')
Sample Logits: [[-0.86140215 -1.051619    3.9546397  -0.28136975 -0.99906   ]
 [-0.8468181   3.2384179  -1.4037051  -0.6553581  -0.8658742 ]
 [ 3.7694144  -0.89370185 -2.172404   -0.12507012 -0.5356576 ]
 [ 4.3132877  -0.9716706  -0.2730573  -0.961293   -0.7834435 ]
 [-1.6967984   0.10887693  2.786582   -0.7650996  -0.798312  ]]
Epoch [11/20] - Loss: 0.1074, Accuracy: 100.00%
Epoch [12/20] - Loss: 0.1046, Accuracy: 99.79%
Epoch [13/20] - Loss: 0

[I 2025-09-16 02:11:05,856] Trial 14 finished with value: 0.4575511002309516 and parameters: {'latent_size': 128, 'num_heads': 4, 'num_layers': 5, 'dropout_prob': 0.2645152455964582, 'l1_lambda': 4.824664986034801e-07, 'learning_rate': 0.00012540156645127898}. Best is trial 13 with value: 0.49189069005318664.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.6650,
Macro Precision: 0.6361, Macro Recall: 0.3638, Macro F1-score: 0.3782,
Weighted Precision: 0.6028, Weighted Recall: 0.4274, Weighted F1-score: 0.3925,
Accuracy: 42.74%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 50, np.int64(2): 37, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3497    0.8889    0.5020        72
     Oxidation     0.6364    0.2029    0.3077        69
       Phospho     0.4444    0.2162    0.2909        37
Ubiquitination     1.0000    0.2800    0.4375        50
   Acetylation     0.7500    0.2308    0.3529        13

      accuracy                         0.4274       241
     macro avg     0.6361    0.3638    0.3782       241
  weighted avg     0.6028    0.4274    0.3925       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:11:22,347] Trial 15 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.0093,
Macro Precision: 0.5203, Macro Recall: 0.4223, Macro F1-score: 0.4412,
Weighted Precision: 0.5088, Weighted Recall: 0.4772, Weighted F1-score: 0.4647,
Accuracy: 47.72%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4309    0.7361    0.5436        72
     Oxidation     0.4286    0.3857    0.4060        70
       Phospho     0.5000    0.2222    0.3077        36
Ubiquitination     0.7419    0.4600    0.5679        50
   Acetylation     0.5000    0.3077    0.3810        13

      accuracy                         0.4772       241
     macro avg     0.5203    0.4223    0.4412       241
  weighted avg     0.5088    0.4772    0.4647       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:11:34,534] Trial 16 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.3348,
Macro Precision: 0.2600, Macro Recall: 0.2029, Macro F1-score: 0.0979,
Weighted Precision: 0.3801, Weighted Recall: 0.3029, Weighted F1-score: 0.1461,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     1.0000    0.0143    0.0282        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2600    0.2029    0.0979       241
  weighted avg     0.3801    0.3029    0.1461       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.6526, Accuracy: 95.53%
Epoch [2/20] - Loss: 0.6231, Accuracy: 96.25%
Epoch [3/20] - Loss: 0.5984, Accuracy: 96.05%
Epoch [4/20] - Loss: 0.5741, Accuracy: 96.36%
Epoch [5/20] - Loss: 0.5454, Accuracy: 97.09%
Epoch [6/20] - Loss: 0.5438, Accuracy: 96.98%
Epoch [7/20] - Loss: 0.5278, Accuracy: 96.88%
Epoch [8/20] - Loss: 0.4901, Accuracy: 98.02%
Epoch [9/20] - Loss: 0.4698, Accuracy: 98.02%
Epoch [10/20] - Loss: 0.4544, Accuracy: 97.92%
Sample Predictions: [0 3 1 0 0]
Actual Labels: tensor([0, 3, 1, 0, 0], device='cuda:0')
Sample Logits: [[ 2.197338   -0.07100908 -0.33958545 -1.015199   -0.45006013]
 [-0.27292028 -0.0590806  -0.36735544  2.291285   -0.86399627]
 [-0.09153229  1.5992397  -1.026262    0.6134398  -0.88973653]
 [ 2.541685   -0.86257654 -0.11403967 -0.6002184  -0.81605554]
 [ 1.8797218  -0.36054102 -0.55210155  0.29455653 -0.49990857]]
Epoch [11/20] - Loss: 0.4395, Accuracy: 98.96%
Epoch [12/20] - Loss: 0.4385, Accuracy: 98.44%
Epoch [13/20] - Loss: 0.41



Epoch [1/20] - Loss: 0.3216, Accuracy: 99.69%
Epoch [2/20] - Loss: 0.3172, Accuracy: 99.06%
Epoch [3/20] - Loss: 0.3073, Accuracy: 98.86%
Epoch [4/20] - Loss: 0.3006, Accuracy: 99.06%
Epoch [5/20] - Loss: 0.2863, Accuracy: 99.38%
Epoch [6/20] - Loss: 0.2810, Accuracy: 99.48%
Epoch [7/20] - Loss: 0.2742, Accuracy: 99.06%
Epoch [8/20] - Loss: 0.2621, Accuracy: 99.69%
Epoch [9/20] - Loss: 0.2597, Accuracy: 99.06%
Epoch [10/20] - Loss: 0.2466, Accuracy: 99.58%
Sample Predictions: [0 0 1 0 0]
Actual Labels: tensor([0, 3, 1, 0, 0], device='cuda:0')
Sample Logits: [[ 2.9597456  -1.1764922  -0.54754376 -1.0368335  -0.4320428 ]
 [ 1.0550629   0.5053717  -1.098688    0.62323225 -0.8945479 ]
 [-0.8006641   2.7318304  -1.3231772  -0.4589813  -0.59907866]
 [ 3.2645574  -0.7117144  -0.90301585 -0.6603279  -0.6006835 ]
 [ 2.0348024  -0.2981818  -0.7522608   0.13807194  0.0342741 ]]
Epoch [11/20] - Loss: 0.2515, Accuracy: 99.27%
Epoch [12/20] - Loss: 0.2399, Accuracy: 99.27%
Epoch [13/20] - Loss: 0.23

[I 2025-09-16 02:12:08,037] Trial 17 finished with value: 0.48566353934118667 and parameters: {'latent_size': 64, 'num_heads': 4, 'num_layers': 5, 'dropout_prob': 0.17975337777514894, 'l1_lambda': 2.501796625120027e-07, 'learning_rate': 0.00011913392816864405}. Best is trial 13 with value: 0.49189069005318664.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.6487,
Macro Precision: 0.6258, Macro Recall: 0.4821, Macro F1-score: 0.5075,
Weighted Precision: 0.6051, Weighted Recall: 0.5104, Weighted F1-score: 0.5030,
Accuracy: 51.04%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4041    0.8194    0.5413        72
     Oxidation     0.6216    0.3286    0.4299        70
       Phospho     0.5200    0.3611    0.4262        36
Ubiquitination     0.9167    0.4400    0.5946        50
   Acetylation     0.6667    0.4615    0.5455        13

      accuracy                         0.5104       241
     macro avg     0.6258    0.4821    0.5075       241
  weighted avg     0.6051    0.5104    0.5030       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:12:20,136] Trial 18 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.5172,
Macro Precision: 0.0605, Macro Recall: 0.2000, Macro F1-score: 0.0929,
Weighted Precision: 0.0904, Weighted Recall: 0.2988, Weighted F1-score: 0.1388,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3025    1.0000    0.4645        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0605    0.2000    0.0929       241
  weighted avg     0.0904    0.2988    0.1388       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:12:37,023] Trial 19 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.1316,
Macro Precision: 0.4061, Macro Recall: 0.3950, Macro F1-score: 0.3572,
Weighted Precision: 0.4746, Weighted Recall: 0.4440, Weighted F1-score: 0.4199,
Accuracy: 44.40%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 51, np.int64(2): 34, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3894    0.6111    0.4757        72
     Oxidation     0.5000    0.5493    0.5235        71
       Phospho     0.0000    0.0000    0.0000        34
Ubiquitination     0.9474    0.3529    0.5143        51
   Acetylation     0.1935    0.4615    0.2727        13

      accuracy                         0.4440       241
     macro avg     0.4061    0.3950    0.3572       241
  weighted avg     0.4746    0.4440    0.4199       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:12:49,138] Trial 20 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.3936,
Macro Precision: 0.1938, Macro Recall: 0.2111, Macro F1-score: 0.1134,
Weighted Precision: 0.1900, Weighted Recall: 0.3071, Weighted F1-score: 0.1541,
Accuracy: 30.71%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3025    1.0000    0.4645        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.6667    0.0556    0.1026        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3071       241
     macro avg     0.1938    0.2111    0.1134       241
  weighted avg     0.1900    0.3071    0.1541       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.7520, Accuracy: 93.44%
Epoch [2/20] - Loss: 0.6961, Accuracy: 95.63%
Epoch [3/20] - Loss: 0.6750, Accuracy: 94.38%
Epoch [4/20] - Loss: 0.6640, Accuracy: 94.90%
Epoch [5/20] - Loss: 0.6215, Accuracy: 96.46%
Epoch [6/20] - Loss: 0.5854, Accuracy: 97.29%
Epoch [7/20] - Loss: 0.5630, Accuracy: 97.40%
Epoch [8/20] - Loss: 0.5298, Accuracy: 98.65%
Epoch [9/20] - Loss: 0.5139, Accuracy: 98.44%
Epoch [10/20] - Loss: 0.5028, Accuracy: 98.54%
Sample Predictions: [0 0 1 3 3]
Actual Labels: tensor([2, 0, 1, 3, 3], device='cuda:0')
Sample Logits: [[ 1.2693635  -0.49092138  0.9940384  -1.3541417  -1.4244527 ]
 [ 2.8986964  -0.05107183 -0.4654433  -1.3636489  -0.26250842]
 [-0.76761425  2.7147946  -0.15230128 -1.4168044  -1.0004046 ]
 [-2.0227885   0.26602593  0.05033499  1.6421951  -0.91551954]
 [-1.5346763  -0.65822154 -1.4653724   2.3209667   0.03297698]]
Epoch [11/20] - Loss: 0.4754, Accuracy: 98.75%
Epoch [12/20] - Loss: 0.4591, Accuracy: 98.65%
Epoch [13/20] - Loss: 0.44



Epoch [1/20] - Loss: 0.3562, Accuracy: 99.69%
Epoch [2/20] - Loss: 0.3466, Accuracy: 99.69%
Epoch [3/20] - Loss: 0.3341, Accuracy: 99.79%
Epoch [4/20] - Loss: 0.3214, Accuracy: 99.90%
Epoch [5/20] - Loss: 0.3179, Accuracy: 99.69%
Epoch [6/20] - Loss: 0.3107, Accuracy: 99.79%
Epoch [7/20] - Loss: 0.3067, Accuracy: 99.58%
Epoch [8/20] - Loss: 0.2944, Accuracy: 99.90%
Epoch [9/20] - Loss: 0.2884, Accuracy: 99.79%
Epoch [10/20] - Loss: 0.2831, Accuracy: 99.69%
Sample Predictions: [2 0 1 3 3]
Actual Labels: tensor([2, 0, 1, 3, 3], device='cuda:0')
Sample Logits: [[ 0.311568   -1.6693417   3.8150733   0.10682163 -0.98165184]
 [ 3.59079    -0.7093445  -2.1478975  -1.1135092  -0.23510052]
 [-1.3835461   4.027122   -1.4042188  -1.1822302  -1.1132201 ]
 [-1.6863836  -1.2996621  -0.0715854   4.140104   -1.1991248 ]
 [-1.7140849  -0.6878662  -1.2439989   3.5201702   0.51945937]]
Epoch [11/20] - Loss: 0.2781, Accuracy: 99.90%
Epoch [12/20] - Loss: 0.2724, Accuracy: 99.90%
Epoch [13/20] - Loss: 0.26

[I 2025-09-16 02:13:22,265] Trial 21 finished with value: 0.44216256408888593 and parameters: {'latent_size': 128, 'num_heads': 4, 'num_layers': 4, 'dropout_prob': 0.2941054309139025, 'l1_lambda': 2.2261209005745964e-06, 'learning_rate': 0.00011183864526471359}. Best is trial 13 with value: 0.49189069005318664.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.3595,
Macro Precision: 0.4754, Macro Recall: 0.3684, Macro F1-score: 0.3680,
Weighted Precision: 0.5397, Weighted Recall: 0.4730, Weighted F1-score: 0.4435,
Accuracy: 47.30%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3937    0.8750    0.5431        72
     Oxidation     0.5500    0.3099    0.3964        71
       Phospho     0.6000    0.2571    0.3600        35
Ubiquitination     0.8333    0.4000    0.5405        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.4730       241
     macro avg     0.4754    0.3684    0.3680       241
  weighted avg     0.5397    0.4730    0.4435       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:13:34,550] Trial 22 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.2819,
Macro Precision: 0.0598, Macro Recall: 0.2000, Macro F1-score: 0.0920,
Weighted Precision: 0.0893, Weighted Recall: 0.2988, Weighted F1-score: 0.1374,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2988    1.0000    0.4601        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0598    0.2000    0.0920       241
  weighted avg     0.0893    0.2988    0.1374       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.9594, Accuracy: 88.76%
Epoch [2/20] - Loss: 0.9027, Accuracy: 91.16%
Epoch [3/20] - Loss: 0.8565, Accuracy: 91.88%
Epoch [4/20] - Loss: 0.8525, Accuracy: 90.11%
Epoch [5/20] - Loss: 0.8014, Accuracy: 92.51%
Epoch [6/20] - Loss: 0.7875, Accuracy: 92.30%
Epoch [7/20] - Loss: 0.7454, Accuracy: 94.17%
Epoch [8/20] - Loss: 0.7236, Accuracy: 93.34%
Epoch [9/20] - Loss: 0.7200, Accuracy: 93.13%
Epoch [10/20] - Loss: 0.6990, Accuracy: 93.86%
Sample Predictions: [1 0 0 0 1]
Actual Labels: tensor([1, 0, 0, 0, 1], device='cuda:0')
Sample Logits: [[-1.8082769   2.5382476  -0.6566738  -0.09987704 -0.8392023 ]
 [ 2.7847497  -0.61742765 -1.7980615   0.11232227 -0.76323324]
 [ 3.4515514   0.29459918 -0.6166223  -0.86688155  0.63961625]
 [ 3.5306375  -0.56423825 -1.2082708  -1.2368793   0.6026547 ]
 [-1.5703982   2.3438752  -0.10604821 -0.98574305 -0.27490437]]
Epoch [11/20] - Loss: 0.6709, Accuracy: 95.73%
Epoch [12/20] - Loss: 0.6441, Accuracy: 95.11%
Epoch [13/20] - Loss: 0.63



Epoch [1/20] - Loss: 0.5617, Accuracy: 96.77%
Epoch [2/20] - Loss: 0.5323, Accuracy: 98.65%
Epoch [3/20] - Loss: 0.5352, Accuracy: 97.81%
Epoch [4/20] - Loss: 0.5089, Accuracy: 98.54%
Epoch [5/20] - Loss: 0.4991, Accuracy: 98.23%
Epoch [6/20] - Loss: 0.4814, Accuracy: 99.27%
Epoch [7/20] - Loss: 0.4855, Accuracy: 99.17%
Epoch [8/20] - Loss: 0.4759, Accuracy: 98.96%
Epoch [9/20] - Loss: 0.4676, Accuracy: 99.06%
Epoch [10/20] - Loss: 0.4626, Accuracy: 99.17%
Sample Predictions: [1 0 0 0 1]
Actual Labels: tensor([1, 0, 0, 0, 1], device='cuda:0')
Sample Logits: [[-0.37150776  3.926421   -1.7004832  -0.42905727 -2.042944  ]
 [ 4.066157   -0.8390588  -1.3019866  -0.21232577 -0.86593264]
 [ 3.8884144  -0.06054203 -1.8517605  -0.7053087  -0.60410875]
 [ 3.7430077  -1.2828889  -0.9960013  -0.9060198  -0.32312164]
 [-1.1883444   3.904064   -1.6618313  -0.92964494 -1.2429209 ]]
Epoch [11/20] - Loss: 0.4454, Accuracy: 99.58%
Epoch [12/20] - Loss: 0.4433, Accuracy: 98.86%
Epoch [13/20] - Loss: 0.43

[I 2025-09-16 02:14:08,569] Trial 23 finished with value: 0.46906610196974413 and parameters: {'latent_size': 128, 'num_heads': 4, 'num_layers': 6, 'dropout_prob': 0.3192091951305976, 'l1_lambda': 3.2273344152821056e-06, 'learning_rate': 0.00012577658420243306}. Best is trial 13 with value: 0.49189069005318664.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.7162,
Macro Precision: 0.6475, Macro Recall: 0.3801, Macro F1-score: 0.4044,
Weighted Precision: 0.5928, Weighted Recall: 0.4149, Weighted F1-score: 0.3999,
Accuracy: 41.49%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3313    0.7500    0.4596        72
     Oxidation     0.4130    0.2714    0.3276        70
       Phospho     1.0000    0.1944    0.3256        36
Ubiquitination     0.9375    0.3000    0.4545        50
   Acetylation     0.5556    0.3846    0.4545        13

      accuracy                         0.4149       241
     macro avg     0.6475    0.3801    0.4044       241
  weighted avg     0.5928    0.4149    0.3999       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.8400, Accuracy: 84.08%
Epoch [2/20] - Loss: 0.7961, Accuracy: 85.12%
Epoch [3/20] - Loss: 0.7783, Accuracy: 86.16%
Epoch [4/20] - Loss: 0.7368, Accuracy: 86.99%
Epoch [5/20] - Loss: 0.6862, Accuracy: 89.39%
Epoch [6/20] - Loss: 0.6764, Accuracy: 88.87%
Epoch [7/20] - Loss: 0.6219, Accuracy: 91.47%
Epoch [8/20] - Loss: 0.6110, Accuracy: 90.32%
Epoch [9/20] - Loss: 0.5548, Accuracy: 93.34%
Epoch [10/20] - Loss: 0.5420, Accuracy: 93.86%
Sample Predictions: [1 1 1 1 1]
Actual Labels: tensor([1, 1, 1, 1, 1], device='cuda:0')
Sample Logits: [[-1.1855003   1.4192104  -0.8982952  -1.1635792   0.02597226]
 [ 0.38291967  2.4724195  -1.4102954  -1.135012   -0.9999484 ]
 [-0.43920442  2.2951937  -1.2266347  -0.49263716 -0.786591  ]
 [-0.42620894  2.309885   -0.9530939  -0.9590771  -0.53330255]
 [-1.4626975   2.7672548  -0.02042992 -1.2050948  -0.2640203 ]]
Epoch [11/20] - Loss: 0.5280, Accuracy: 93.44%
Epoch [12/20] - Loss: 0.4852, Accuracy: 95.01%
Epoch [13/20] - Loss: 0.48



Epoch [1/20] - Loss: 0.3510, Accuracy: 97.40%
Epoch [2/20] - Loss: 0.3358, Accuracy: 97.50%
Epoch [3/20] - Loss: 0.3328, Accuracy: 96.88%
Epoch [4/20] - Loss: 0.3251, Accuracy: 98.02%
Epoch [5/20] - Loss: 0.3153, Accuracy: 97.71%
Epoch [6/20] - Loss: 0.3054, Accuracy: 97.61%
Epoch [7/20] - Loss: 0.2977, Accuracy: 97.29%
Epoch [8/20] - Loss: 0.2952, Accuracy: 98.02%
Epoch [9/20] - Loss: 0.2757, Accuracy: 98.44%
Epoch [10/20] - Loss: 0.2662, Accuracy: 98.34%
Sample Predictions: [1 1 1 1 1]
Actual Labels: tensor([1, 1, 1, 1, 1], device='cuda:0')
Sample Logits: [[-1.3735536   2.4329114  -2.1350632  -0.31665447 -0.02200594]
 [-1.4103307   2.8004003  -1.6643996  -0.08120689 -0.12964867]
 [-1.2279433   3.0169306  -1.3322476  -0.86305314  0.28552985]
 [-0.31478873  1.8528476  -0.84046566 -1.6501284   0.6432052 ]
 [-0.62083125  2.5747035  -1.12997    -0.7807614  -0.06555946]]
Epoch [11/20] - Loss: 0.2515, Accuracy: 98.54%
Epoch [12/20] - Loss: 0.2605, Accuracy: 98.44%
Epoch [13/20] - Loss: 0.24

[I 2025-09-16 02:14:42,324] Trial 24 finished with value: 0.5239232890762356 and parameters: {'latent_size': 128, 'num_heads': 4, 'num_layers': 4, 'dropout_prob': 0.34853786537534726, 'l1_lambda': 8.400719443378379e-07, 'learning_rate': 0.00010032261430366369}. Best is trial 24 with value: 0.5239232890762356.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.1712,
Macro Precision: 0.6628, Macro Recall: 0.4691, Macro F1-score: 0.5044,
Weighted Precision: 0.6122, Weighted Recall: 0.4938, Weighted F1-score: 0.4898,
Accuracy: 49.38%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3889    0.7778    0.5185        72
     Oxidation     0.4906    0.3662    0.4194        71
       Phospho     0.8235    0.4000    0.5385        35
Ubiquitination     0.9444    0.3400    0.5000        50
   Acetylation     0.6667    0.4615    0.5455        13

      accuracy                         0.4938       241
     macro avg     0.6628    0.4691    0.5044       241
  weighted avg     0.6122    0.4938    0.4898       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.9832, Accuracy: 77.21%
Epoch [2/20] - Loss: 0.9517, Accuracy: 79.29%
Epoch [3/20] - Loss: 0.9123, Accuracy: 81.06%
Epoch [4/20] - Loss: 0.8856, Accuracy: 82.10%
Epoch [5/20] - Loss: 0.8309, Accuracy: 83.87%
Epoch [6/20] - Loss: 0.8037, Accuracy: 84.91%
Epoch [7/20] - Loss: 0.7707, Accuracy: 86.58%
Epoch [8/20] - Loss: 0.7418, Accuracy: 88.24%
Epoch [9/20] - Loss: 0.7157, Accuracy: 88.45%
Epoch [10/20] - Loss: 0.6820, Accuracy: 89.18%
Sample Predictions: [1 0 0 2 2]
Actual Labels: tensor([0, 0, 0, 2, 2], device='cuda:0')
Sample Logits: [[ 0.6152459   0.7088121  -0.02046071 -1.5196173  -1.6179773 ]
 [ 0.9014926  -0.08888955  0.37599075 -0.85526747 -1.2648891 ]
 [ 1.7950786  -0.7029883  -0.8321414  -0.17282428 -0.5420871 ]
 [ 0.69003314 -0.3106372   1.5076654   0.07719144 -0.5929359 ]
 [ 0.5316635   0.0974755   1.8574413  -0.27249432 -1.4664534 ]]
Epoch [11/20] - Loss: 0.6449, Accuracy: 90.95%
Epoch [12/20] - Loss: 0.6536, Accuracy: 89.91%
Epoch [13/20] - Loss: 0.63



Epoch [1/20] - Loss: 0.4865, Accuracy: 94.07%
Epoch [2/20] - Loss: 0.4606, Accuracy: 94.38%
Epoch [3/20] - Loss: 0.4492, Accuracy: 94.48%
Epoch [4/20] - Loss: 0.4472, Accuracy: 94.48%
Epoch [5/20] - Loss: 0.4198, Accuracy: 95.42%
Epoch [6/20] - Loss: 0.4092, Accuracy: 95.21%
Epoch [7/20] - Loss: 0.4176, Accuracy: 94.69%
Epoch [8/20] - Loss: 0.3851, Accuracy: 96.15%
Epoch [9/20] - Loss: 0.3653, Accuracy: 96.67%
Epoch [10/20] - Loss: 0.3818, Accuracy: 95.94%
Sample Predictions: [0 0 0 2 2]
Actual Labels: tensor([0, 0, 0, 2, 2], device='cuda:0')
Sample Logits: [[ 2.5232882  -1.754608   -0.04464891 -1.1982319  -0.9288788 ]
 [ 2.585932   -0.8919506  -1.4164267  -0.4359124  -0.84989893]
 [ 2.8123357  -0.7202306  -1.3900276  -1.0238819  -1.1132618 ]
 [-0.4190484   0.96226496  2.1984766  -0.5931176  -0.55931354]
 [-0.11149168 -0.467142    2.1626632  -0.75373244 -1.1243467 ]]
Epoch [11/20] - Loss: 0.3618, Accuracy: 96.15%
Epoch [12/20] - Loss: 0.3515, Accuracy: 96.46%
Epoch [13/20] - Loss: 0.33

[I 2025-09-16 02:15:14,882] Trial 25 finished with value: 0.4819211404980522 and parameters: {'latent_size': 64, 'num_heads': 4, 'num_layers': 2, 'dropout_prob': 0.3494742980046216, 'l1_lambda': 5.712294006764694e-07, 'learning_rate': 0.0002979594374839385}. Best is trial 24 with value: 0.5239232890762356.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.8829,
Macro Precision: 0.6605, Macro Recall: 0.3967, Macro F1-score: 0.4353,
Weighted Precision: 0.5638, Weighted Recall: 0.4647, Weighted F1-score: 0.4663,
Accuracy: 46.47%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3607    0.6111    0.4536        72
     Oxidation     0.4110    0.4286    0.4196        70
       Phospho     0.6000    0.2500    0.3529        36
Ubiquitination     0.9310    0.5400    0.6835        50
   Acetylation     1.0000    0.1538    0.2667        13

      accuracy                         0.4647       241
     macro avg     0.6605    0.3967    0.4353       241
  weighted avg     0.5638    0.4647    0.4663       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:15:26,551] Trial 26 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.2784,
Macro Precision: 0.0603, Macro Recall: 0.2000, Macro F1-score: 0.0926,
Weighted Precision: 0.0900, Weighted Recall: 0.2988, Weighted F1-score: 0.1383,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3013    1.0000    0.4630        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0603    0.2000    0.0926       241
  weighted avg     0.0900    0.2988    0.1383       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.7562, Accuracy: 88.87%
Epoch [2/20] - Loss: 0.7525, Accuracy: 88.87%
Epoch [3/20] - Loss: 0.7215, Accuracy: 89.18%
Epoch [4/20] - Loss: 0.6952, Accuracy: 89.80%
Epoch [5/20] - Loss: 0.6826, Accuracy: 90.22%
Epoch [6/20] - Loss: 0.6627, Accuracy: 91.78%
Epoch [7/20] - Loss: 0.6317, Accuracy: 91.47%
Epoch [8/20] - Loss: 0.6355, Accuracy: 91.78%
Epoch [9/20] - Loss: 0.5993, Accuracy: 93.03%
Epoch [10/20] - Loss: 0.5806, Accuracy: 93.65%
Sample Predictions: [1 1 0 3 0]
Actual Labels: tensor([1, 1, 0, 3, 0], device='cuda:0')
Sample Logits: [[ 0.50181824  2.1103683   0.14699242 -1.4014753  -0.52364314]
 [-0.5199475   1.8330454  -0.39510658 -1.5846472  -1.4006606 ]
 [ 1.5985576  -0.02808408 -0.58564466  0.54340595  0.22499004]
 [-0.79766965 -0.48010576 -0.74768525  1.2921184  -0.907379  ]
 [ 1.0253719   0.42286217  0.13110884 -0.66032577 -0.4127586 ]]
Epoch [11/20] - Loss: 0.5715, Accuracy: 93.24%
Epoch [12/20] - Loss: 0.5466, Accuracy: 93.34%
Epoch [13/20] - Loss: 0.53



Epoch [1/20] - Loss: 0.4333, Accuracy: 95.32%
Epoch [2/20] - Loss: 0.4188, Accuracy: 95.63%
Epoch [3/20] - Loss: 0.4060, Accuracy: 96.36%
Epoch [4/20] - Loss: 0.3934, Accuracy: 96.25%
Epoch [5/20] - Loss: 0.4029, Accuracy: 95.32%
Epoch [6/20] - Loss: 0.3726, Accuracy: 96.25%
Epoch [7/20] - Loss: 0.3636, Accuracy: 96.57%
Epoch [8/20] - Loss: 0.3766, Accuracy: 95.53%
Epoch [9/20] - Loss: 0.3462, Accuracy: 96.98%
Epoch [10/20] - Loss: 0.3525, Accuracy: 96.46%
Sample Predictions: [1 1 0 3 0]
Actual Labels: tensor([1, 1, 0, 3, 0], device='cuda:0')
Sample Logits: [[-0.7402084   2.3090959  -0.13128504 -0.62589794 -0.9924996 ]
 [-0.74772483  2.1207943  -0.747863   -1.0928353  -1.2608793 ]
 [ 3.5571408   0.07354155 -0.18770155  0.4518015   0.24771348]
 [-0.906788   -1.6999991  -0.71120185  2.683697   -0.17103311]
 [ 2.164304    0.5539536  -0.6892783   0.12303898 -1.0300255 ]]
Epoch [11/20] - Loss: 0.3397, Accuracy: 96.98%
Epoch [12/20] - Loss: 0.3251, Accuracy: 96.77%
Epoch [13/20] - Loss: 0.32

[I 2025-09-16 02:15:59,325] Trial 27 finished with value: 0.46449886588707195 and parameters: {'latent_size': 64, 'num_heads': 4, 'num_layers': 3, 'dropout_prob': 0.2664518814619523, 'l1_lambda': 3.2548746177820224e-07, 'learning_rate': 0.00014310222554089253}. Best is trial 24 with value: 0.5239232890762356.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.6668,
Macro Precision: 0.5548, Macro Recall: 0.3627, Macro F1-score: 0.3741,
Weighted Precision: 0.5298, Weighted Recall: 0.4440, Weighted F1-score: 0.4152,
Accuracy: 44.40%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 51, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3910    0.8472    0.5351        72
     Oxidation     0.4773    0.3043    0.3717        69
       Phospho     0.3500    0.1944    0.2500        36
Ubiquitination     0.8889    0.3137    0.4638        51
   Acetylation     0.6667    0.1538    0.2500        13

      accuracy                         0.4440       241
     macro avg     0.5548    0.3627    0.3741       241
  weighted avg     0.5298    0.4440    0.4152       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:16:11,058] Trial 28 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.9290,
Macro Precision: 0.0598, Macro Recall: 0.2000, Macro F1-score: 0.0920,
Weighted Precision: 0.0893, Weighted Recall: 0.2988, Weighted F1-score: 0.1374,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2988    1.0000    0.4601        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0598    0.2000    0.0920       241
  weighted avg     0.0893    0.2988    0.1374       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:16:22,732] Trial 29 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.2025,
Macro Precision: 0.4104, Macro Recall: 0.2138, Macro F1-score: 0.1241,
Weighted Precision: 0.5156, Weighted Recall: 0.3154, Weighted F1-score: 0.1777,
Accuracy: 31.54%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3021    0.9861    0.4625        72
     Oxidation     0.7500    0.0429    0.0811        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     1.0000    0.0400    0.0769        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3154       241
     macro avg     0.4104    0.2138    0.1241       241
  weighted avg     0.5156    0.3154    0.1777       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:16:35,027] Trial 30 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.6877,
Macro Precision: 0.3605, Macro Recall: 0.2084, Macro F1-score: 0.1093,
Weighted Precision: 0.3850, Weighted Recall: 0.3071, Weighted F1-score: 0.1549,
Accuracy: 30.71%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3025    1.0000    0.4645        72
     Oxidation     0.5000    0.0143    0.0278        70
       Phospho     1.0000    0.0278    0.0541        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3071       241
     macro avg     0.3605    0.2084    0.1093       241
  weighted avg     0.3850    0.3071    0.1549       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:16:46,977] Trial 31 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.0534,
Macro Precision: 0.0598, Macro Recall: 0.2000, Macro F1-score: 0.0920,
Weighted Precision: 0.0893, Weighted Recall: 0.2988, Weighted F1-score: 0.1374,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2988    1.0000    0.4601        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0598    0.2000    0.0920       241
  weighted avg     0.0893    0.2988    0.1374       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:16:59,020] Trial 32 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.1593,
Macro Precision: 0.1274, Macro Recall: 0.2029, Macro F1-score: 0.0988,
Weighted Precision: 0.1848, Weighted Recall: 0.3029, Weighted F1-score: 0.1472,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 68, np.int64(3): 51, np.int64(2): 37, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3038    1.0000    0.4660        72
     Oxidation     0.3333    0.0147    0.0282        68
       Phospho     0.0000    0.0000    0.0000        37
Ubiquitination     0.0000    0.0000    0.0000        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.1274    0.2029    0.0988       241
  weighted avg     0.1848    0.3029    0.1472       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:17:10,579] Trial 33 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.6037,
Macro Precision: 0.3610, Macro Recall: 0.2136, Macro F1-score: 0.1194,
Weighted Precision: 0.3733, Weighted Recall: 0.3112, Weighted F1-score: 0.1635,
Accuracy: 31.12%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3051    1.0000    0.4675        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.5000    0.0278    0.0526        36
Ubiquitination     1.0000    0.0400    0.0769        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3112       241
     macro avg     0.3610    0.2136    0.1194       241
  weighted avg     0.3733    0.3112    0.1635       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.4671, Accuracy: 98.02%
Epoch [2/20] - Loss: 0.4299, Accuracy: 98.44%
Epoch [3/20] - Loss: 0.4084, Accuracy: 98.34%
Epoch [4/20] - Loss: 0.3850, Accuracy: 99.58%
Epoch [5/20] - Loss: 0.3611, Accuracy: 99.17%
Epoch [6/20] - Loss: 0.3518, Accuracy: 99.38%
Epoch [7/20] - Loss: 0.3346, Accuracy: 99.58%
Epoch [8/20] - Loss: 0.3214, Accuracy: 99.79%
Epoch [9/20] - Loss: 0.3069, Accuracy: 99.69%
Epoch [10/20] - Loss: 0.2998, Accuracy: 99.79%
Sample Predictions: [0 4 0 0 3]
Actual Labels: tensor([0, 4, 0, 0, 3], device='cuda:0')
Sample Logits: [[ 3.6503663  -2.4426172  -1.5026612  -1.0639277  -0.9644419 ]
 [-1.091305   -1.2323407  -0.8432375  -0.5948967   2.6358433 ]
 [ 5.1462445  -2.2453635   0.06715882 -2.712269   -1.6464502 ]
 [ 4.2012076  -1.4893682  -0.8252274  -1.4963708  -1.2398663 ]
 [-1.2616308  -1.9679291  -0.81698215  3.6120598   0.12866223]]
Epoch [11/20] - Loss: 0.2942, Accuracy: 100.00%
Epoch [12/20] - Loss: 0.2851, Accuracy: 99.48%
Epoch [13/20] - Loss: 0.2



Epoch [1/20] - Loss: 0.2486, Accuracy: 100.00%
Epoch [2/20] - Loss: 0.2420, Accuracy: 100.00%
Epoch [3/20] - Loss: 0.2350, Accuracy: 100.00%
Epoch [4/20] - Loss: 0.2306, Accuracy: 100.00%
Epoch [5/20] - Loss: 0.2242, Accuracy: 100.00%
Epoch [6/20] - Loss: 0.2201, Accuracy: 100.00%
Epoch [7/20] - Loss: 0.2150, Accuracy: 100.00%
Epoch [8/20] - Loss: 0.2099, Accuracy: 100.00%
Epoch [9/20] - Loss: 0.2065, Accuracy: 100.00%
Epoch [10/20] - Loss: 0.2027, Accuracy: 100.00%
Sample Predictions: [0 4 0 0 3]
Actual Labels: tensor([0, 4, 0, 0, 3], device='cuda:0')
Sample Logits: [[ 4.738529   -1.258446   -2.1224806  -0.36297247 -0.85330826]
 [-3.064768    0.33493388 -0.94790477 -1.2384835   4.9418573 ]
 [ 5.281398   -1.5418423  -1.8589833  -1.3598     -2.0200484 ]
 [ 5.55737    -2.1017797  -1.1366595  -2.471272   -1.9327403 ]
 [-1.4417981  -0.7997155  -0.8393206   5.4640455  -0.6644828 ]]
Epoch [11/20] - Loss: 0.1995, Accuracy: 100.00%
Epoch [12/20] - Loss: 0.1953, Accuracy: 100.00%
Epoch [13/20] 

[I 2025-09-16 02:17:43,953] Trial 34 finished with value: 0.46671011945237406 and parameters: {'latent_size': 256, 'num_heads': 2, 'num_layers': 3, 'dropout_prob': 0.3304922045539824, 'l1_lambda': 1.5811275872385984e-06, 'learning_rate': 0.00010904598866728358}. Best is trial 24 with value: 0.5239232890762356.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.2816,
Macro Precision: 0.5190, Macro Recall: 0.4399, Macro F1-score: 0.4573,
Weighted Precision: 0.4966, Weighted Recall: 0.4440, Weighted F1-score: 0.4461,
Accuracy: 44.40%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3962    0.5833    0.4719        72
     Oxidation     0.3889    0.4000    0.3944        70
       Phospho     0.3571    0.2778    0.3125        36
Ubiquitination     0.8696    0.4000    0.5479        50
   Acetylation     0.5833    0.5385    0.5600        13

      accuracy                         0.4440       241
     macro avg     0.5190    0.4399    0.4573       241
  weighted avg     0.4966    0.4440    0.4461       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:17:55,980] Trial 35 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.4881,
Macro Precision: 0.0598, Macro Recall: 0.2000, Macro F1-score: 0.0920,
Weighted Precision: 0.0893, Weighted Recall: 0.2988, Weighted F1-score: 0.1374,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 50, np.int64(2): 37, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2988    1.0000    0.4601        72
     Oxidation     0.0000    0.0000    0.0000        69
       Phospho     0.0000    0.0000    0.0000        37
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0598    0.2000    0.0920       241
  weighted avg     0.0893    0.2988    0.1374       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:18:12,787] Trial 36 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.9121,
Macro Precision: 0.5598, Macro Recall: 0.3914, Macro F1-score: 0.3659,
Weighted Precision: 0.5612, Weighted Recall: 0.4149, Weighted F1-score: 0.3438,
Accuracy: 41.49%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 51, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3598    0.9444    0.5211        72
     Oxidation     0.6667    0.0571    0.1053        70
       Phospho     0.4375    0.2000    0.2745        35
Ubiquitination     0.7895    0.2941    0.4286        51
   Acetylation     0.5455    0.4615    0.5000        13

      accuracy                         0.4149       241
     macro avg     0.5598    0.3914    0.3659       241
  weighted avg     0.5612    0.4149    0.3438       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.6560, Accuracy: 86.26%
Epoch [2/20] - Loss: 0.6083, Accuracy: 88.45%
Epoch [3/20] - Loss: 0.5803, Accuracy: 90.32%
Epoch [4/20] - Loss: 0.5572, Accuracy: 91.16%
Epoch [5/20] - Loss: 0.5300, Accuracy: 92.61%
Epoch [6/20] - Loss: 0.5012, Accuracy: 92.61%
Epoch [7/20] - Loss: 0.4668, Accuracy: 95.11%
Epoch [8/20] - Loss: 0.4488, Accuracy: 95.01%
Epoch [9/20] - Loss: 0.4128, Accuracy: 96.98%
Epoch [10/20] - Loss: 0.4025, Accuracy: 96.36%
Sample Predictions: [0 1 3 1 2]
Actual Labels: tensor([0, 1, 3, 1, 2], device='cuda:0')
Sample Logits: [[ 2.4754758e+00 -9.7316259e-01 -1.6085884e-01  5.8913642e-01
  -2.2668491e-01]
 [ 3.0039889e-03  3.1862602e+00 -4.4037607e-01 -1.5586216e+00
  -5.0232923e-01]
 [ 7.9798490e-01 -9.0536290e-01 -4.9698275e-01  3.1407957e+00
  -1.9942882e+00]
 [ 6.8321204e-01  2.7795162e+00 -1.7916189e-01 -5.4199600e-01
  -1.2558216e+00]
 [-1.3507143e+00  1.5994945e+00  2.1415873e+00 -1.6616255e+00
   5.2495268e-03]]
Epoch [11/20] - Loss: 0.3870, Accur



Epoch [1/20] - Loss: 0.2645, Accuracy: 98.54%
Epoch [2/20] - Loss: 0.2545, Accuracy: 98.75%
Epoch [3/20] - Loss: 0.2425, Accuracy: 99.06%
Epoch [4/20] - Loss: 0.2376, Accuracy: 99.38%
Epoch [5/20] - Loss: 0.2303, Accuracy: 99.06%
Epoch [6/20] - Loss: 0.2239, Accuracy: 99.27%
Epoch [7/20] - Loss: 0.2283, Accuracy: 99.17%
Epoch [8/20] - Loss: 0.2139, Accuracy: 99.58%
Epoch [9/20] - Loss: 0.2225, Accuracy: 99.27%
Epoch [10/20] - Loss: 0.2106, Accuracy: 99.27%
Sample Predictions: [0 1 3 1 2]
Actual Labels: tensor([0, 1, 3, 1, 2], device='cuda:0')
Sample Logits: [[ 4.358918   -0.7284012  -1.1975917  -1.2686229  -0.22629845]
 [-2.311408    4.208254   -0.3777161  -0.4367773  -1.3741604 ]
 [-1.1702838  -0.7067002  -0.6139713   3.3335629  -0.22680391]
 [-0.56399304  3.5620577  -0.9378182  -1.2462662  -0.4236393 ]
 [ 0.5117549  -0.29255128  3.1764364  -0.3727901  -0.31898028]]
Epoch [11/20] - Loss: 0.2099, Accuracy: 99.27%
Epoch [12/20] - Loss: 0.2021, Accuracy: 99.48%
Epoch [13/20] - Loss: 0.20

[I 2025-09-16 02:18:46,997] Trial 37 finished with value: 0.49667732197713166 and parameters: {'latent_size': 128, 'num_heads': 8, 'num_layers': 6, 'dropout_prob': 0.31480141038144605, 'l1_lambda': 9.941666246671685e-07, 'learning_rate': 0.00013007857579655954}. Best is trial 24 with value: 0.5239232890762356.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.5334,
Macro Precision: 0.5360, Macro Recall: 0.4905, Macro F1-score: 0.4839,
Weighted Precision: 0.5432, Weighted Recall: 0.4855, Weighted F1-score: 0.4824,
Accuracy: 48.55%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 51, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3984    0.6806    0.5026        72
     Oxidation     0.5490    0.4000    0.4628        70
       Phospho     0.3478    0.2286    0.2759        35
Ubiquitination     0.8846    0.4510    0.5974        51
   Acetylation     0.5000    0.6923    0.5806        13

      accuracy                         0.4855       241
     macro avg     0.5360    0.4905    0.4839       241
  weighted avg     0.5432    0.4855    0.4824       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:18:59,152] Trial 38 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.2290,
Macro Precision: 0.3941, Macro Recall: 0.2211, Macro F1-score: 0.1327,
Weighted Precision: 0.3383, Weighted Recall: 0.3112, Weighted F1-score: 0.1628,
Accuracy: 31.12%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3038    1.0000    0.4660        72
     Oxidation     0.6667    0.0286    0.0548        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     1.0000    0.0769    0.1429        13

      accuracy                         0.3112       241
     macro avg     0.3941    0.2211    0.1327       241
  weighted avg     0.3383    0.3112    0.1628       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:19:11,583] Trial 39 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 5.7145,
Macro Precision: 0.0598, Macro Recall: 0.2000, Macro F1-score: 0.0920,
Weighted Precision: 0.0893, Weighted Recall: 0.2988, Weighted F1-score: 0.1374,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2988    1.0000    0.4601        72
     Oxidation     0.0000    0.0000    0.0000        71
       Phospho     0.0000    0.0000    0.0000        35
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0598    0.2000    0.0920       241
  weighted avg     0.0893    0.2988    0.1374       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:19:23,771] Trial 40 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.2215,
Macro Precision: 0.2600, Macro Recall: 0.2039, Macro F1-score: 0.1000,
Weighted Precision: 0.3012, Weighted Recall: 0.3029, Weighted F1-score: 0.1460,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 51, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        35
Ubiquitination     1.0000    0.0196    0.0385        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2600    0.2039    0.1000       241
  weighted avg     0.3012    0.3029    0.1460       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:19:40,476] Trial 41 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.3341,
Macro Precision: 0.5942, Macro Recall: 0.4453, Macro F1-score: 0.4675,
Weighted Precision: 0.5718, Weighted Recall: 0.4772, Weighted F1-score: 0.4628,
Accuracy: 47.72%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3742    0.8056    0.5110        72
     Oxidation     0.6071    0.2429    0.3469        70
       Phospho     0.5714    0.3333    0.4211        36
Ubiquitination     0.7931    0.4600    0.5823        50
   Acetylation     0.6250    0.3846    0.4762        13

      accuracy                         0.4772       241
     macro avg     0.5942    0.4453    0.4675       241
  weighted avg     0.5718    0.4772    0.4628       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:19:53,036] Trial 42 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.3418,
Macro Precision: 0.0594, Macro Recall: 0.1972, Macro F1-score: 0.0913,
Weighted Precision: 0.0888, Weighted Recall: 0.2946, Weighted F1-score: 0.1364,
Accuracy: 29.46%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 50, np.int64(2): 37, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2971    0.9861    0.4566        72
     Oxidation     0.0000    0.0000    0.0000        69
       Phospho     0.0000    0.0000    0.0000        37
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2946       241
     macro avg     0.0594    0.1972    0.0913       241
  weighted avg     0.0888    0.2946    0.1364       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:20:04,710] Trial 43 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.9754,
Macro Precision: 0.4605, Macro Recall: 0.2183, Macro F1-score: 0.1272,
Weighted Precision: 0.4306, Weighted Recall: 0.3071, Weighted F1-score: 0.1547,
Accuracy: 30.71%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 51, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3025    1.0000    0.4645        72
     Oxidation     1.0000    0.0145    0.0286        69
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        51
   Acetylation     1.0000    0.0769    0.1429        13

      accuracy                         0.3071       241
     macro avg     0.4605    0.2183    0.1272       241
  weighted avg     0.4306    0.3071    0.1547       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:20:16,584] Trial 44 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.1652,
Macro Precision: 0.1938, Macro Recall: 0.2057, Macro F1-score: 0.1039,
Weighted Precision: 0.2840, Weighted Recall: 0.3071, Weighted F1-score: 0.1547,
Accuracy: 30.71%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3025    1.0000    0.4645        72
     Oxidation     0.6667    0.0286    0.0548        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3071       241
     macro avg     0.1938    0.2057    0.1039       241
  weighted avg     0.2840    0.3071    0.1547       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:20:28,214] Trial 45 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.1714,
Macro Precision: 0.2600, Macro Recall: 0.2057, Macro F1-score: 0.1034,
Weighted Precision: 0.2349, Weighted Recall: 0.3029, Weighted F1-score: 0.1460,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        71
       Phospho     1.0000    0.0286    0.0556        35
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2600    0.2057    0.1034       241
  weighted avg     0.2349    0.3029    0.1460       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:20:39,909] Trial 46 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.1751,
Macro Precision: 0.3807, Macro Recall: 0.2201, Macro F1-score: 0.1339,
Weighted Precision: 0.4682, Weighted Recall: 0.3154, Weighted F1-score: 0.1766,
Accuracy: 31.54%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 51, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3034    0.9861    0.4641        72
     Oxidation     1.0000    0.0286    0.0556        70
       Phospho     0.6000    0.0857    0.1500        35
Ubiquitination     0.0000    0.0000    0.0000        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3154       241
     macro avg     0.3807    0.2201    0.1339       241
  weighted avg     0.4682    0.3154    0.1766       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:20:52,445] Trial 47 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 5.5391,
Macro Precision: 0.0600, Macro Recall: 0.2000, Macro F1-score: 0.0923,
Weighted Precision: 0.0896, Weighted Recall: 0.2988, Weighted F1-score: 0.1379,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0600    0.2000    0.0923       241
  weighted avg     0.0896    0.2988    0.1379       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:21:04,199] Trial 48 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.0204,
Macro Precision: 0.0600, Macro Recall: 0.2000, Macro F1-score: 0.0923,
Weighted Precision: 0.0896, Weighted Recall: 0.2988, Weighted F1-score: 0.1379,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        71
       Phospho     0.0000    0.0000    0.0000        35
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0600    0.2000    0.0923       241
  weighted avg     0.0896    0.2988    0.1379       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:21:20,922] Trial 49 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.4257,
Macro Precision: 0.5881, Macro Recall: 0.4451, Macro F1-score: 0.4566,
Weighted Precision: 0.5580, Weighted Recall: 0.4689, Weighted F1-score: 0.4401,
Accuracy: 46.89%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3694    0.8056    0.5066        72
     Oxidation     0.5882    0.1408    0.2273        71
       Phospho     0.4400    0.3143    0.3667        35
Ubiquitination     0.8286    0.5800    0.6824        50
   Acetylation     0.7143    0.3846    0.5000        13

      accuracy                         0.4689       241
     macro avg     0.5881    0.4451    0.4566       241
  weighted avg     0.5580    0.4689    0.4401       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:21:32,528] Trial 50 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.4122,
Macro Precision: 0.1608, Macro Recall: 0.2057, Macro F1-score: 0.1040,
Weighted Precision: 0.2360, Weighted Recall: 0.3071, Weighted F1-score: 0.1549,
Accuracy: 30.71%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3038    1.0000    0.4660        72
     Oxidation     0.5000    0.0286    0.0541        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3071       241
     macro avg     0.1608    0.2057    0.1040       241
  weighted avg     0.2360    0.3071    0.1549       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:21:44,228] Trial 51 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.4788,
Macro Precision: 0.2600, Macro Recall: 0.2039, Macro F1-score: 0.1000,
Weighted Precision: 0.3012, Weighted Recall: 0.3029, Weighted F1-score: 0.1460,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 51, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        35
Ubiquitination     1.0000    0.0196    0.0385        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2600    0.2039    0.1000       241
  weighted avg     0.3012    0.3029    0.1460       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:21:55,853] Trial 52 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.2889,
Macro Precision: 0.2600, Macro Recall: 0.2040, Macro F1-score: 0.1002,
Weighted Precision: 0.2971, Weighted Recall: 0.3029, Weighted F1-score: 0.1460,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     1.0000    0.0200    0.0392        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2600    0.2040    0.1002       241
  weighted avg     0.2971    0.3029    0.1460       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.6984, Accuracy: 90.74%
Epoch [2/20] - Loss: 0.6575, Accuracy: 91.88%
Epoch [3/20] - Loss: 0.6264, Accuracy: 91.88%
Epoch [4/20] - Loss: 0.6116, Accuracy: 92.82%
Epoch [5/20] - Loss: 0.5838, Accuracy: 93.76%
Epoch [6/20] - Loss: 0.5554, Accuracy: 94.28%
Epoch [7/20] - Loss: 0.5335, Accuracy: 94.80%
Epoch [8/20] - Loss: 0.5237, Accuracy: 94.59%
Epoch [9/20] - Loss: 0.5028, Accuracy: 94.59%
Epoch [10/20] - Loss: 0.4736, Accuracy: 96.57%
Sample Predictions: [1 0 1 2 3]
Actual Labels: tensor([1, 0, 1, 2, 3], device='cuda:0')
Sample Logits: [[-0.25004998  1.7394311   0.08948255 -1.224256   -0.762688  ]
 [ 2.2074497  -0.320588   -1.4773124  -0.783776   -0.8989887 ]
 [ 0.9722153   1.6040134  -0.8287796  -0.20956776 -0.6643937 ]
 [-0.5730383  -0.5003795   1.8573543  -1.0035074   0.17660446]
 [ 1.2381643  -1.0626646  -1.3348194   1.41863    -0.6910723 ]]
Epoch [11/20] - Loss: 0.4397, Accuracy: 97.19%
Epoch [12/20] - Loss: 0.4467, Accuracy: 95.94%
Epoch [13/20] - Loss: 0.41



Epoch [1/20] - Loss: 0.3135, Accuracy: 98.02%
Epoch [2/20] - Loss: 0.2994, Accuracy: 98.23%
Epoch [3/20] - Loss: 0.2988, Accuracy: 98.44%
Epoch [4/20] - Loss: 0.2945, Accuracy: 98.34%
Epoch [5/20] - Loss: 0.2752, Accuracy: 98.86%
Epoch [6/20] - Loss: 0.2694, Accuracy: 98.23%
Epoch [7/20] - Loss: 0.2596, Accuracy: 98.44%
Epoch [8/20] - Loss: 0.2632, Accuracy: 98.44%
Epoch [9/20] - Loss: 0.2467, Accuracy: 98.75%
Epoch [10/20] - Loss: 0.2384, Accuracy: 99.06%
Sample Predictions: [1 0 1 2 3]
Actual Labels: tensor([1, 0, 1, 2, 3], device='cuda:0')
Sample Logits: [[-0.37728468  2.5935414  -0.5126577  -0.44270185 -1.5117966 ]
 [ 2.5075326  -0.83490914 -1.3248464  -0.81156313 -0.9084025 ]
 [-0.17264043  2.7640786  -1.0770061  -0.79589516 -0.19349125]
 [-0.45387053 -0.81741786  2.2077608  -0.25254318  0.4867429 ]
 [-0.48394212  0.44672576 -1.0096767   1.7077817  -1.405238  ]]
Epoch [11/20] - Loss: 0.2352, Accuracy: 98.96%
Epoch [12/20] - Loss: 0.2197, Accuracy: 99.38%
Epoch [13/20] - Loss: 0.21

[I 2025-09-16 02:22:29,661] Trial 53 finished with value: 0.5448189082937827 and parameters: {'latent_size': 64, 'num_heads': 4, 'num_layers': 6, 'dropout_prob': 0.22162835150922375, 'l1_lambda': 2.3139046200726137e-07, 'learning_rate': 0.00013235255068305934}. Best is trial 53 with value: 0.5448189082937827.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.6475,
Macro Precision: 0.5942, Macro Recall: 0.5301, Macro F1-score: 0.5511,
Weighted Precision: 0.5287, Weighted Recall: 0.4979, Weighted F1-score: 0.5023,
Accuracy: 49.79%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4200    0.5833    0.4884        72
     Oxidation     0.4364    0.3429    0.3840        70
       Phospho     0.4146    0.4722    0.4416        36
Ubiquitination     0.8000    0.5600    0.6588        50
   Acetylation     0.9000    0.6923    0.7826        13

      accuracy                         0.4979       241
     macro avg     0.5942    0.5301    0.5511       241
  weighted avg     0.5287    0.4979    0.5023       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:22:41,427] Trial 54 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.3046,
Macro Precision: 0.2600, Macro Recall: 0.2054, Macro F1-score: 0.1028,
Weighted Precision: 0.2432, Weighted Recall: 0.3029, Weighted F1-score: 0.1460,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 50, np.int64(2): 37, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        69
       Phospho     1.0000    0.0270    0.0526        37
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2600    0.2054    0.1028       241
  weighted avg     0.2432    0.3029    0.1460       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:22:53,208] Trial 55 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.9084,
Macro Precision: 0.3281, Macro Recall: 0.2097, Macro F1-score: 0.1205,
Weighted Precision: 0.3974, Weighted Recall: 0.3112, Weighted F1-score: 0.1760,
Accuracy: 31.12%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3070    0.9722    0.4667        72
     Oxidation     0.3333    0.0563    0.0964        71
       Phospho     0.0000    0.0000    0.0000        35
Ubiquitination     1.0000    0.0200    0.0392        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3112       241
     macro avg     0.3281    0.2097    0.1205       241
  weighted avg     0.3974    0.3112    0.1760       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:23:05,070] Trial 56 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.5167,
Macro Precision: 0.4597, Macro Recall: 0.2166, Macro F1-score: 0.1280,
Weighted Precision: 0.3505, Weighted Recall: 0.3029, Weighted F1-score: 0.1527,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2983    0.9861    0.4581        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     1.0000    0.0200    0.0392        50
   Acetylation     1.0000    0.0769    0.1429        13

      accuracy                         0.3029       241
     macro avg     0.4597    0.2166    0.1280       241
  weighted avg     0.3505    0.3029    0.1527       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:23:17,530] Trial 57 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.8214,
Macro Precision: 0.2605, Macro Recall: 0.2086, Macro F1-score: 0.1093,
Weighted Precision: 0.3808, Weighted Recall: 0.3112, Weighted F1-score: 0.1626,
Accuracy: 31.12%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3025    1.0000    0.4645        72
     Oxidation     1.0000    0.0429    0.0822        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3112       241
     macro avg     0.2605    0.2086    0.1093       241
  weighted avg     0.3808    0.3112    0.1626       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:23:29,418] Trial 58 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.2963,
Macro Precision: 0.4603, Macro Recall: 0.2068, Macro F1-score: 0.1060,
Weighted Precision: 0.5921, Weighted Recall: 0.3071, Weighted F1-score: 0.1546,
Accuracy: 30.71%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3013    1.0000    0.4630        72
     Oxidation     1.0000    0.0141    0.0278        71
       Phospho     0.0000    0.0000    0.0000        35
Ubiquitination     1.0000    0.0200    0.0392        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3071       241
     macro avg     0.4603    0.2068    0.1060       241
  weighted avg     0.5921    0.3071    0.1546       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:23:41,272] Trial 59 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.1927,
Macro Precision: 0.3268, Macro Recall: 0.2081, Macro F1-score: 0.1131,
Weighted Precision: 0.3942, Weighted Recall: 0.3071, Weighted F1-score: 0.1617,
Accuracy: 30.71%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3008    0.9861    0.4610        72
     Oxidation     0.3333    0.0143    0.0274        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     1.0000    0.0400    0.0769        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3071       241
     macro avg     0.3268    0.2081    0.1131       241
  weighted avg     0.3942    0.3071    0.1617       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:23:52,991] Trial 60 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.5999,
Macro Precision: 0.2603, Macro Recall: 0.2054, Macro F1-score: 0.1031,
Weighted Precision: 0.2435, Weighted Recall: 0.3029, Weighted F1-score: 0.1464,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 50, np.int64(2): 37, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3013    1.0000    0.4630        72
     Oxidation     0.0000    0.0000    0.0000        69
       Phospho     1.0000    0.0270    0.0526        37
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2603    0.2054    0.1031       241
  weighted avg     0.2435    0.3029    0.1464       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:24:04,607] Trial 61 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.2032,
Macro Precision: 0.3613, Macro Recall: 0.2136, Macro F1-score: 0.1199,
Weighted Precision: 0.4463, Weighted Recall: 0.3154, Weighted F1-score: 0.1718,
Accuracy: 31.54%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3064    1.0000    0.4691        72
     Oxidation     0.5000    0.0282    0.0533        71
       Phospho     0.0000    0.0000    0.0000        35
Ubiquitination     1.0000    0.0400    0.0769        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3154       241
     macro avg     0.3613    0.2136    0.1199       241
  weighted avg     0.4463    0.3154    0.1718       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:24:16,313] Trial 62 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.5497,
Macro Precision: 0.4047, Macro Recall: 0.2198, Macro F1-score: 0.1312,
Weighted Precision: 0.4492, Weighted Recall: 0.3237, Weighted F1-score: 0.1868,
Accuracy: 32.37%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3090    1.0000    0.4721        72
     Oxidation     0.7143    0.0714    0.1299        70
       Phospho     1.0000    0.0278    0.0541        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3237       241
     macro avg     0.4047    0.2198    0.1312       241
  weighted avg     0.4492    0.3237    0.1868       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.6370, Accuracy: 95.94%
Epoch [2/20] - Loss: 0.6071, Accuracy: 97.29%
Epoch [3/20] - Loss: 0.5950, Accuracy: 96.98%
Epoch [4/20] - Loss: 0.5701, Accuracy: 96.88%
Epoch [5/20] - Loss: 0.5502, Accuracy: 97.09%
Epoch [6/20] - Loss: 0.5194, Accuracy: 97.29%
Epoch [7/20] - Loss: 0.5042, Accuracy: 98.34%
Epoch [8/20] - Loss: 0.4713, Accuracy: 98.54%
Epoch [9/20] - Loss: 0.4591, Accuracy: 98.75%
Epoch [10/20] - Loss: 0.4336, Accuracy: 99.27%
Sample Predictions: [2 3 0 3 3]
Actual Labels: tensor([2, 3, 0, 3, 3], device='cuda:0')
Sample Logits: [[ 0.61397713 -0.39099145  1.5638162  -0.7565627  -0.12106699]
 [-0.26262432 -0.13047817 -0.5126693   1.9844896  -1.1423815 ]
 [ 1.3873326  -0.11111382 -0.94592446 -0.5266207  -1.6057925 ]
 [ 0.17937349 -1.009566   -1.1854881   1.3737992  -1.1546521 ]
 [-0.6632798  -0.13112749 -0.10327765  1.5138551  -1.5018913 ]]
Epoch [11/20] - Loss: 0.4189, Accuracy: 98.44%
Epoch [12/20] - Loss: 0.3984, Accuracy: 99.38%
Epoch [13/20] - Loss: 0.38



Epoch [1/20] - Loss: 0.2943, Accuracy: 99.69%
Epoch [2/20] - Loss: 0.2951, Accuracy: 98.96%
Epoch [3/20] - Loss: 0.2800, Accuracy: 99.58%
Epoch [4/20] - Loss: 0.2738, Accuracy: 99.79%
Epoch [5/20] - Loss: 0.2678, Accuracy: 99.69%
Epoch [6/20] - Loss: 0.2512, Accuracy: 99.79%
Epoch [7/20] - Loss: 0.2461, Accuracy: 100.00%
Epoch [8/20] - Loss: 0.2431, Accuracy: 99.69%
Epoch [9/20] - Loss: 0.2329, Accuracy: 99.90%
Epoch [10/20] - Loss: 0.2285, Accuracy: 100.00%
Sample Predictions: [2 3 0 3 3]
Actual Labels: tensor([2, 3, 0, 3, 3], device='cuda:0')
Sample Logits: [[ 0.22960284 -1.3542907   1.7991992   0.24888572  0.21295366]
 [-0.55510783 -0.797221   -1.0013617   2.5385346  -0.6020013 ]
 [ 2.5258055  -1.0312889  -1.2160195  -0.39044565 -0.11817188]
 [-0.39688903 -0.723429   -0.834991    2.4865425  -0.8068913 ]
 [-1.1656914  -0.25981957 -1.0269065   2.4319077  -0.5672449 ]]
Epoch [11/20] - Loss: 0.2232, Accuracy: 99.38%
Epoch [12/20] - Loss: 0.2175, Accuracy: 99.90%
Epoch [13/20] - Loss: 0.

[I 2025-09-16 02:24:49,901] Trial 63 finished with value: 0.5160968866299243 and parameters: {'latent_size': 64, 'num_heads': 4, 'num_layers': 5, 'dropout_prob': 0.1550655901381826, 'l1_lambda': 2.0269436633554208e-07, 'learning_rate': 0.00012373757282304937}. Best is trial 53 with value: 0.5448189082937827.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.5635,
Macro Precision: 0.6633, Macro Recall: 0.4701, Macro F1-score: 0.4987,
Weighted Precision: 0.5775, Weighted Recall: 0.5104, Weighted F1-score: 0.4878,
Accuracy: 51.04%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 51, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4150    0.8472    0.5571        72
     Oxidation     0.4318    0.2714    0.3333        70
       Phospho     0.5556    0.1429    0.2273        35
Ubiquitination     0.9143    0.6275    0.7442        51
   Acetylation     1.0000    0.4615    0.6316        13

      accuracy                         0.5104       241
     macro avg     0.6633    0.4701    0.4987       241
  weighted avg     0.5775    0.5104    0.4878       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:25:01,968] Trial 64 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.4456,
Macro Precision: 0.0598, Macro Recall: 0.2000, Macro F1-score: 0.0920,
Weighted Precision: 0.0893, Weighted Recall: 0.2988, Weighted F1-score: 0.1374,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 50, np.int64(2): 37, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2988    1.0000    0.4601        72
     Oxidation     0.0000    0.0000    0.0000        69
       Phospho     0.0000    0.0000    0.0000        37
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0598    0.2000    0.0920       241
  weighted avg     0.0893    0.2988    0.1374       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:25:13,589] Trial 65 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.0992,
Macro Precision: 0.2600, Macro Recall: 0.2029, Macro F1-score: 0.0979,
Weighted Precision: 0.3801, Weighted Recall: 0.3029, Weighted F1-score: 0.1461,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     1.0000    0.0143    0.0282        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2600    0.2029    0.0979       241
  weighted avg     0.3801    0.3029    0.1461       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:25:30,246] Trial 66 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.5777,
Macro Precision: 0.4844, Macro Recall: 0.4536, Macro F1-score: 0.4409,
Weighted Precision: 0.5164, Weighted Recall: 0.4523, Weighted F1-score: 0.4534,
Accuracy: 45.23%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 51, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4035    0.6389    0.4946        72
     Oxidation     0.5135    0.2754    0.3585        69
       Phospho     0.2750    0.3056    0.2895        36
Ubiquitination     0.8966    0.5098    0.6500        51
   Acetylation     0.3333    0.5385    0.4118        13

      accuracy                         0.4523       241
     macro avg     0.4844    0.4536    0.4409       241
  weighted avg     0.5164    0.4523    0.4534       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:25:42,259] Trial 67 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 5.0489,
Macro Precision: 0.2600, Macro Recall: 0.2040, Macro F1-score: 0.1002,
Weighted Precision: 0.2971, Weighted Recall: 0.3029, Weighted F1-score: 0.1460,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 50, np.int64(2): 37, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        69
       Phospho     0.0000    0.0000    0.0000        37
Ubiquitination     1.0000    0.0200    0.0392        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2600    0.2040    0.1002       241
  weighted avg     0.2971    0.3029    0.1460       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.9605, Accuracy: 78.36%
Epoch [2/20] - Loss: 0.8990, Accuracy: 80.12%
Epoch [3/20] - Loss: 0.8657, Accuracy: 83.04%
Epoch [4/20] - Loss: 0.8167, Accuracy: 85.12%
Epoch [5/20] - Loss: 0.7891, Accuracy: 83.98%
Epoch [6/20] - Loss: 0.7427, Accuracy: 87.20%
Epoch [7/20] - Loss: 0.7051, Accuracy: 88.66%
Epoch [8/20] - Loss: 0.6716, Accuracy: 90.43%
Epoch [9/20] - Loss: 0.6472, Accuracy: 90.43%
Epoch [10/20] - Loss: 0.6172, Accuracy: 91.99%
Sample Predictions: [3 1 2 0 0]
Actual Labels: tensor([3, 4, 3, 4, 0], device='cuda:0')
Sample Logits: [[-0.56544006 -0.96990097  0.76146233  2.3442578  -0.7342507 ]
 [-1.5217657   1.0517433  -1.5720139   0.611816   -0.38461405]
 [-1.2623103  -0.1712197   0.94027984  0.89767337 -0.9304107 ]
 [ 1.3992052  -0.01749872 -0.9072755  -0.8071105  -0.45387563]
 [ 3.228186   -0.37554473 -0.95318735 -1.7431306  -0.2768052 ]]
Epoch [11/20] - Loss: 0.5777, Accuracy: 92.72%
Epoch [12/20] - Loss: 0.5587, Accuracy: 92.61%
Epoch [13/20] - Loss: 0.53



Epoch [1/20] - Loss: 0.4031, Accuracy: 95.53%
Epoch [2/20] - Loss: 0.3861, Accuracy: 96.36%
Epoch [3/20] - Loss: 0.3797, Accuracy: 96.46%
Epoch [4/20] - Loss: 0.3529, Accuracy: 97.61%
Epoch [5/20] - Loss: 0.3428, Accuracy: 97.09%
Epoch [6/20] - Loss: 0.3438, Accuracy: 96.77%
Epoch [7/20] - Loss: 0.3199, Accuracy: 97.92%
Epoch [8/20] - Loss: 0.3131, Accuracy: 98.13%
Epoch [9/20] - Loss: 0.2978, Accuracy: 98.44%
Epoch [10/20] - Loss: 0.2912, Accuracy: 98.65%
Sample Predictions: [3 0 3 1 0]
Actual Labels: tensor([3, 4, 3, 4, 0], device='cuda:0')
Sample Logits: [[ 0.12993422 -2.783696    1.4452803   3.088945    0.01422331]
 [ 1.987303   -0.6047826  -1.8492891  -1.4121323   1.7192341 ]
 [-1.7199569  -0.45310715 -0.45220613  3.5112908  -0.8001212 ]
 [-0.54982394  1.8408597   0.43346396 -1.9737358   1.1947577 ]
 [ 3.246456    0.12818508 -2.0474403  -1.0716335  -0.36093408]]
Epoch [11/20] - Loss: 0.2974, Accuracy: 98.23%
Epoch [12/20] - Loss: 0.2870, Accuracy: 98.13%
Epoch [13/20] - Loss: 0.28

[I 2025-09-16 02:26:15,615] Trial 68 finished with value: 0.5098311475047349 and parameters: {'latent_size': 128, 'num_heads': 8, 'num_layers': 4, 'dropout_prob': 0.34994592515584877, 'l1_lambda': 1.2069971280906732e-06, 'learning_rate': 0.00011507754709457142}. Best is trial 53 with value: 0.5448189082937827.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.1751,
Macro Precision: 0.4933, Macro Recall: 0.4348, Macro F1-score: 0.4508,
Weighted Precision: 0.5040, Weighted Recall: 0.4772, Weighted F1-score: 0.4774,
Accuracy: 47.72%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4245    0.6250    0.5056        72
     Oxidation     0.4138    0.3380    0.3721        71
       Phospho     0.4000    0.4000    0.4000        35
Ubiquitination     0.8529    0.5800    0.6905        50
   Acetylation     0.3750    0.2308    0.2857        13

      accuracy                         0.4772       241
     macro avg     0.4933    0.4348    0.4508       241
  weighted avg     0.5040    0.4772    0.4774       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:26:27,512] Trial 69 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.9948,
Macro Precision: 0.3287, Macro Recall: 0.2234, Macro F1-score: 0.1369,
Weighted Precision: 0.3486, Weighted Recall: 0.3195, Weighted F1-score: 0.1792,
Accuracy: 31.95%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3103    1.0000    0.4737        72
     Oxidation     0.0000    0.0000    0.0000        71
       Phospho     0.3333    0.0571    0.0976        35
Ubiquitination     1.0000    0.0600    0.1132        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3195       241
     macro avg     0.3287    0.2234    0.1369       241
  weighted avg     0.3486    0.3195    0.1792       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 1.1219, Accuracy: 67.12%
Epoch [2/20] - Loss: 1.1019, Accuracy: 66.70%
Epoch [3/20] - Loss: 1.0779, Accuracy: 66.49%
Epoch [4/20] - Loss: 1.0654, Accuracy: 70.03%
Epoch [5/20] - Loss: 1.0574, Accuracy: 68.68%
Epoch [6/20] - Loss: 1.0443, Accuracy: 70.03%
Epoch [7/20] - Loss: 1.0034, Accuracy: 72.94%
Epoch [8/20] - Loss: 0.9952, Accuracy: 72.01%
Epoch [9/20] - Loss: 0.9746, Accuracy: 75.44%
Epoch [10/20] - Loss: 0.9523, Accuracy: 75.96%
Sample Predictions: [0 0 0 3 3]
Actual Labels: tensor([0, 0, 0, 3, 3], device='cuda:0')
Sample Logits: [[ 1.7746089  -0.46238744 -0.86893046 -0.6312385  -1.095634  ]
 [ 2.3402321   1.0157588   0.02539911 -0.9520548  -0.24979052]
 [ 1.664674    0.15217492 -0.27923253 -0.24740851 -0.12223883]
 [-0.04063635  0.82678527 -0.29158762  1.5215634  -1.1333277 ]
 [ 0.5838709   0.62385815  0.22149834  1.1869457  -0.8340703 ]]
Epoch [11/20] - Loss: 0.9382, Accuracy: 76.69%
Epoch [12/20] - Loss: 0.9249, Accuracy: 76.69%
Epoch [13/20] - Loss: 0.89



Epoch [1/20] - Loss: 0.7887, Accuracy: 82.31%
Epoch [2/20] - Loss: 0.7701, Accuracy: 82.73%
Epoch [3/20] - Loss: 0.7608, Accuracy: 82.93%
Epoch [4/20] - Loss: 0.7364, Accuracy: 84.39%
Epoch [5/20] - Loss: 0.7159, Accuracy: 84.70%
Epoch [6/20] - Loss: 0.7333, Accuracy: 83.87%
Epoch [7/20] - Loss: 0.6946, Accuracy: 85.74%
Epoch [8/20] - Loss: 0.6909, Accuracy: 86.37%
Epoch [9/20] - Loss: 0.6723, Accuracy: 87.41%
Epoch [10/20] - Loss: 0.6845, Accuracy: 85.85%
Sample Predictions: [0 0 3 3 3]
Actual Labels: tensor([0, 0, 0, 3, 3], device='cuda:0')
Sample Logits: [[ 1.5838934   0.69584227 -0.03649187  0.03095854 -0.22872868]
 [ 1.8560344   0.22125384 -0.28871056  0.24889535 -0.3594604 ]
 [ 0.45320237 -0.63070464 -0.41325933  0.4633278   0.00987002]
 [-1.1998869  -0.22188333 -0.6440453   1.6239899  -1.0690613 ]
 [-0.00533215  0.9489161  -0.04758886  2.046798   -0.69353706]]
Epoch [11/20] - Loss: 0.6607, Accuracy: 87.51%
Epoch [12/20] - Loss: 0.6484, Accuracy: 88.14%
Epoch [13/20] - Loss: 0.65

[I 2025-09-16 02:27:00,832] Trial 70 finished with value: 0.5154728984042963 and parameters: {'latent_size': 64, 'num_heads': 8, 'num_layers': 3, 'dropout_prob': 0.3498303314995269, 'l1_lambda': 3.0337920243412924e-07, 'learning_rate': 0.00011623364440744019}. Best is trial 53 with value: 0.5448189082937827.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.4500,
Macro Precision: 0.7057, Macro Recall: 0.4771, Macro F1-score: 0.5101,
Weighted Precision: 0.5998, Weighted Recall: 0.5228, Weighted F1-score: 0.5098,
Accuracy: 52.28%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4062    0.7222    0.5200        72
     Oxidation     0.4681    0.3099    0.3729        71
       Phospho     0.9091    0.2857    0.4348        35
Ubiquitination     0.7451    0.7600    0.7525        50
   Acetylation     1.0000    0.3077    0.4706        13

      accuracy                         0.5228       241
     macro avg     0.7057    0.4771    0.5101       241
  weighted avg     0.5998    0.5228    0.5098       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:27:17,652] Trial 71 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.3528,
Macro Precision: 0.6336, Macro Recall: 0.4582, Macro F1-score: 0.4592,
Weighted Precision: 0.5628, Weighted Recall: 0.5311, Weighted F1-score: 0.5144,
Accuracy: 53.11%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4815    0.7222    0.5778        72
     Oxidation     0.5200    0.3662    0.4298        71
       Phospho     0.4857    0.4857    0.4857        35
Ubiquitination     0.6809    0.6400    0.6598        50
   Acetylation     1.0000    0.0769    0.1429        13

      accuracy                         0.5311       241
     macro avg     0.6336    0.4582    0.4592       241
  weighted avg     0.5628    0.5311    0.5144       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:27:29,248] Trial 72 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.2978,
Macro Precision: 0.0598, Macro Recall: 0.2000, Macro F1-score: 0.0920,
Weighted Precision: 0.0893, Weighted Recall: 0.2988, Weighted F1-score: 0.1374,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 51, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2988    1.0000    0.4601        72
     Oxidation     0.0000    0.0000    0.0000        69
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0598    0.2000    0.0920       241
  weighted avg     0.0893    0.2988    0.1374       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:27:40,936] Trial 73 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.3149,
Macro Precision: 0.0598, Macro Recall: 0.2000, Macro F1-score: 0.0920,
Weighted Precision: 0.0893, Weighted Recall: 0.2988, Weighted F1-score: 0.1374,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 51, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2988    1.0000    0.4601        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        35
Ubiquitination     0.0000    0.0000    0.0000        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0598    0.2000    0.0920       241
  weighted avg     0.0893    0.2988    0.1374       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:27:52,657] Trial 74 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.9700,
Macro Precision: 0.3102, Macro Recall: 0.2040, Macro F1-score: 0.1052,
Weighted Precision: 0.3751, Weighted Recall: 0.3029, Weighted F1-score: 0.1537,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 51, np.int64(2): 34, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3008    0.9861    0.4610        72
     Oxidation     0.2500    0.0141    0.0267        71
       Phospho     0.0000    0.0000    0.0000        34
Ubiquitination     1.0000    0.0196    0.0385        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.3102    0.2040    0.1052       241
  weighted avg     0.3751    0.3029    0.1537       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:28:04,800] Trial 75 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.7856,
Macro Precision: 0.4603, Macro Recall: 0.2084, Macro F1-score: 0.1090,
Weighted Precision: 0.5298, Weighted Recall: 0.3071, Weighted F1-score: 0.1546,
Accuracy: 30.71%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3013    1.0000    0.4630        72
     Oxidation     1.0000    0.0143    0.0282        70
       Phospho     1.0000    0.0278    0.0541        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3071       241
     macro avg     0.4603    0.2084    0.1090       241
  weighted avg     0.5298    0.3071    0.1546       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.3621, Accuracy: 97.61%
Epoch [2/20] - Loss: 0.3376, Accuracy: 98.23%
Epoch [3/20] - Loss: 0.3143, Accuracy: 98.65%
Epoch [4/20] - Loss: 0.2900, Accuracy: 99.06%
Epoch [5/20] - Loss: 0.2746, Accuracy: 98.65%
Epoch [6/20] - Loss: 0.2575, Accuracy: 99.27%
Epoch [7/20] - Loss: 0.2355, Accuracy: 99.69%
Epoch [8/20] - Loss: 0.2310, Accuracy: 99.69%
Epoch [9/20] - Loss: 0.2174, Accuracy: 99.79%
Epoch [10/20] - Loss: 0.2068, Accuracy: 99.69%
Sample Predictions: [3 3 3 0 1]
Actual Labels: tensor([3, 3, 3, 0, 1], device='cuda:0')
Sample Logits: [[-0.76064986 -1.0704112  -0.8291689   3.3419385  -0.89877254]
 [-1.1503962  -1.6777071  -0.7169137   3.1699114  -1.1398207 ]
 [-1.3993813  -0.5666753  -1.4000597   2.4639733  -0.33974576]
 [ 3.9171638  -0.7637584  -0.6571387  -0.6070791   0.29288003]
 [-2.109124    3.5774384  -1.5289826   0.0597223  -0.9858299 ]]
Epoch [11/20] - Loss: 0.1989, Accuracy: 99.90%
Epoch [12/20] - Loss: 0.1947, Accuracy: 99.69%
Epoch [13/20] - Loss: 0.18



Epoch [1/20] - Loss: 0.1517, Accuracy: 100.00%
Epoch [2/20] - Loss: 0.1495, Accuracy: 99.90%
Epoch [3/20] - Loss: 0.1433, Accuracy: 100.00%
Epoch [4/20] - Loss: 0.1393, Accuracy: 99.90%
Epoch [5/20] - Loss: 0.1353, Accuracy: 99.90%
Epoch [6/20] - Loss: 0.1332, Accuracy: 99.90%
Epoch [7/20] - Loss: 0.1276, Accuracy: 100.00%
Epoch [8/20] - Loss: 0.1246, Accuracy: 100.00%
Epoch [9/20] - Loss: 0.1209, Accuracy: 100.00%
Epoch [10/20] - Loss: 0.1188, Accuracy: 100.00%
Sample Predictions: [3 3 3 0 1]
Actual Labels: tensor([3, 3, 3, 0, 1], device='cuda:0')
Sample Logits: [[-2.0715358  -0.69163483 -0.8494519   4.1839848  -0.610022  ]
 [-0.6646846  -1.005255   -1.3981104   4.254385   -1.3889705 ]
 [-0.9264138  -1.5327556  -0.13563241  4.274175   -1.8642414 ]
 [ 4.4034333  -1.008356   -0.80211043 -0.10178694 -0.97076726]
 [-1.0856802   4.850807   -0.8140216  -1.715236   -1.2602615 ]]
Epoch [11/20] - Loss: 0.1156, Accuracy: 100.00%
Epoch [12/20] - Loss: 0.1135, Accuracy: 100.00%
Epoch [13/20] - Lo

[I 2025-09-16 02:28:38,195] Trial 76 finished with value: 0.44548364471099966 and parameters: {'latent_size': 128, 'num_heads': 8, 'num_layers': 4, 'dropout_prob': 0.24159453082715052, 'l1_lambda': 9.447648411011186e-07, 'learning_rate': 0.00015256899376673905}. Best is trial 53 with value: 0.5448189082937827.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.2235,
Macro Precision: 0.5703, Macro Recall: 0.3643, Macro F1-score: 0.3589,
Weighted Precision: 0.4793, Weighted Recall: 0.3817, Weighted F1-score: 0.3324,
Accuracy: 38.17%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3522    0.7778    0.4848        72
     Oxidation     0.3333    0.0714    0.1176        70
       Phospho     0.3200    0.4444    0.3721        36
Ubiquitination     0.8462    0.2200    0.3492        50
   Acetylation     1.0000    0.3077    0.4706        13

      accuracy                         0.3817       241
     macro avg     0.5703    0.3643    0.3589       241
  weighted avg     0.4793    0.3817    0.3324       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:28:50,546] Trial 77 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.9033,
Macro Precision: 0.4110, Macro Recall: 0.2195, Macro F1-score: 0.1291,
Weighted Precision: 0.4936, Weighted Recall: 0.3154, Weighted F1-score: 0.1703,
Accuracy: 31.54%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3051    1.0000    0.4675        72
     Oxidation     1.0000    0.0143    0.0282        70
       Phospho     0.7500    0.0833    0.1500        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3154       241
     macro avg     0.4110    0.2195    0.1291       241
  weighted avg     0.4936    0.3154    0.1703       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:29:07,598] Trial 78 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.7871,
Macro Precision: 0.5934, Macro Recall: 0.4352, Macro F1-score: 0.4602,
Weighted Precision: 0.5685, Weighted Recall: 0.4938, Weighted F1-score: 0.4823,
Accuracy: 49.38%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 51, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4118    0.6806    0.5131        72
     Oxidation     0.5333    0.5797    0.5556        69
       Phospho     0.4800    0.3333    0.3934        36
Ubiquitination     0.8750    0.2745    0.4179        51
   Acetylation     0.6667    0.3077    0.4211        13

      accuracy                         0.4938       241
     macro avg     0.5934    0.4352    0.4602       241
  weighted avg     0.5685    0.4938    0.4823       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:29:19,175] Trial 79 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.1507,
Macro Precision: 0.0600, Macro Recall: 0.2000, Macro F1-score: 0.0923,
Weighted Precision: 0.0896, Weighted Recall: 0.2988, Weighted F1-score: 0.1379,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 51, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        69
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0600    0.2000    0.0923       241
  weighted avg     0.0896    0.2988    0.1379       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset



Epoch [1/20] - Loss: 0.2820, Accuracy: 95.21%
Epoch [2/20] - Loss: 0.2581, Accuracy: 96.57%
Epoch [3/20] - Loss: 0.2174, Accuracy: 98.54%
Epoch [4/20] - Loss: 0.2006, Accuracy: 98.23%
Epoch [5/20] - Loss: 0.1883, Accuracy: 98.23%
Epoch [6/20] - Loss: 0.1684, Accuracy: 99.06%
Epoch [7/20] - Loss: 0.1646, Accuracy: 98.13%
Epoch [8/20] - Loss: 0.1383, Accuracy: 99.38%
Epoch [9/20] - Loss: 0.1277, Accuracy: 99.27%
Epoch [10/20] - Loss: 0.1240, Accuracy: 99.48%
Sample Predictions: [0 1 1 2 2]
Actual Labels: tensor([0, 1, 1, 2, 2], device='cuda:0')
Sample Logits: [[ 4.9952235  -0.93202925 -0.57662976 -1.8407339  -1.3563722 ]
 [-2.511007    3.657698   -2.2547674  -0.42351148  0.18496364]
 [-2.439963    4.7620816  -0.4013923  -1.272011   -0.46376795]
 [ 0.5575659  -1.2213355   4.1219087  -0.9568165  -0.9524763 ]
 [-1.406647   -1.0235648   4.17185    -0.92417616 -1.6265469 ]]
Epoch [11/20] - Loss: 0.1174, Accuracy: 99.69%
Epoch [12/20] - Loss: 0.1074, Accuracy: 99.79%
Epoch [13/20] - Loss: 0.10



Epoch [1/20] - Loss: 0.0840, Accuracy: 99.79%
Epoch [2/20] - Loss: 0.0810, Accuracy: 99.79%
Epoch [3/20] - Loss: 0.0779, Accuracy: 100.00%
Epoch [4/20] - Loss: 0.0779, Accuracy: 100.00%
Epoch [5/20] - Loss: 0.0772, Accuracy: 99.90%
Epoch [6/20] - Loss: 0.0756, Accuracy: 100.00%
Epoch [7/20] - Loss: 0.0766, Accuracy: 99.69%
Epoch [8/20] - Loss: 0.0731, Accuracy: 99.90%
Epoch [9/20] - Loss: 0.0766, Accuracy: 99.79%
Epoch [10/20] - Loss: 0.0714, Accuracy: 100.00%
Sample Predictions: [0 1 1 2 2]
Actual Labels: tensor([0, 1, 1, 2, 2], device='cuda:0')
Sample Logits: [[ 5.053124   -0.89543533 -1.300867   -0.40283537 -0.6828815 ]
 [-2.6679769   4.5570965  -0.6190639  -0.87293637 -0.79350424]
 [-0.31830317  5.195838   -1.487157   -1.0472484  -1.8783859 ]
 [-2.967065   -0.34358278  4.5017614  -1.3570187   0.12419935]
 [-0.01470466 -0.74558794  5.0824366  -1.9233639  -1.9366701 ]]
Epoch [11/20] - Loss: 0.0715, Accuracy: 99.90%
Epoch [12/20] - Loss: 0.0698, Accuracy: 100.00%
Epoch [13/20] - Loss:

[I 2025-09-16 02:29:53,475] Trial 80 finished with value: 0.49218216304146034 and parameters: {'latent_size': 256, 'num_heads': 8, 'num_layers': 5, 'dropout_prob': 0.32393563714495643, 'l1_lambda': 3.0152043340652367e-07, 'learning_rate': 0.00010318591661674329}. Best is trial 53 with value: 0.5448189082937827.


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.1872,
Macro Precision: 0.5875, Macro Recall: 0.4663, Macro F1-score: 0.4893,
Weighted Precision: 0.5239, Weighted Recall: 0.4606, Weighted F1-score: 0.4571,
Accuracy: 46.06%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3471    0.5833    0.4352        72
     Oxidation     0.4507    0.4571    0.4539        70
       Phospho     0.5000    0.1389    0.2174        36
Ubiquitination     0.8214    0.4600    0.5897        50
   Acetylation     0.8182    0.6923    0.7500        13

      accuracy                         0.4606       241
     macro avg     0.5875    0.4663    0.4893       241
  weighted avg     0.5239    0.4606    0.4571       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:30:06,078] Trial 81 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 4.9122,
Macro Precision: 0.4608, Macro Recall: 0.2325, Macro F1-score: 0.1534,
Weighted Precision: 0.2899, Weighted Recall: 0.3154, Weighted F1-score: 0.1699,
Accuracy: 31.54%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3038    1.0000    0.4660        72
     Oxidation     0.0000    0.0000    0.0000        71
       Phospho     1.0000    0.0857    0.1579        35
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     1.0000    0.0769    0.1429        13

      accuracy                         0.3154       241
     macro avg     0.4608    0.2325    0.1534       241
  weighted avg     0.2899    0.3154    0.1699       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:30:18,310] Trial 82 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 4.7519,
Macro Precision: 0.0598, Macro Recall: 0.2000, Macro F1-score: 0.0920,
Weighted Precision: 0.0893, Weighted Recall: 0.2988, Weighted F1-score: 0.1374,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2988    1.0000    0.4601        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0598    0.2000    0.0920       241
  weighted avg     0.0893    0.2988    0.1374       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:30:30,509] Trial 83 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 4.6566,
Macro Precision: 0.0598, Macro Recall: 0.2000, Macro F1-score: 0.0920,
Weighted Precision: 0.0893, Weighted Recall: 0.2988, Weighted F1-score: 0.1374,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 51, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2988    1.0000    0.4601        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        35
Ubiquitination     0.0000    0.0000    0.0000        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0598    0.2000    0.0920       241
  weighted avg     0.0893    0.2988    0.1374       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:30:48,054] Trial 84 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.2732,
Macro Precision: 0.6446, Macro Recall: 0.3906, Macro F1-score: 0.4228,
Weighted Precision: 0.5562, Weighted Recall: 0.4647, Weighted F1-score: 0.4588,
Accuracy: 46.47%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3952    0.6806    0.5000        72
     Oxidation     0.4110    0.4286    0.4196        70
       Phospho     0.5000    0.2500    0.3333        36
Ubiquitination     0.9167    0.4400    0.5946        50
   Acetylation     1.0000    0.1538    0.2667        13

      accuracy                         0.4647       241
     macro avg     0.6446    0.3906    0.4228       241
  weighted avg     0.5562    0.4647    0.4588       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:31:00,424] Trial 85 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 5.1435,
Macro Precision: 0.5608, Macro Recall: 0.2238, Macro F1-score: 0.1381,
Weighted Precision: 0.4393, Weighted Recall: 0.3112, Weighted F1-score: 0.1631,
Accuracy: 31.12%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3038    1.0000    0.4660        72
     Oxidation     0.5000    0.0143    0.0278        70
       Phospho     1.0000    0.0278    0.0541        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     1.0000    0.0769    0.1429        13

      accuracy                         0.3112       241
     macro avg     0.5608    0.2238    0.1381       241
  weighted avg     0.4393    0.3112    0.1631       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:31:12,114] Trial 86 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.8958,
Macro Precision: 0.3118, Macro Recall: 0.2189, Macro F1-score: 0.1295,
Weighted Precision: 0.3724, Weighted Recall: 0.3195, Weighted F1-score: 0.1796,
Accuracy: 31.95%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3090    1.0000    0.4721        72
     Oxidation     0.2500    0.0143    0.0270        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     1.0000    0.0800    0.1481        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3195       241
     macro avg     0.3118    0.2189    0.1295       241
  weighted avg     0.3724    0.3195    0.1796       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:31:24,387] Trial 87 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.1074,
Macro Precision: 0.0598, Macro Recall: 0.2000, Macro F1-score: 0.0920,
Weighted Precision: 0.0893, Weighted Recall: 0.2988, Weighted F1-score: 0.1374,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2988    1.0000    0.4601        72
     Oxidation     0.0000    0.0000    0.0000        71
       Phospho     0.0000    0.0000    0.0000        35
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0598    0.2000    0.0920       241
  weighted avg     0.0893    0.2988    0.1374       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:31:41,035] Trial 88 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.2955,
Macro Precision: 0.4923, Macro Recall: 0.5004, Macro F1-score: 0.4769,
Weighted Precision: 0.5273, Weighted Recall: 0.4896, Weighted F1-score: 0.4965,
Accuracy: 48.96%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.4222    0.5278    0.4691        72
     Oxidation     0.5532    0.3714    0.4444        70
       Phospho     0.5714    0.4444    0.5000        36
Ubiquitination     0.6889    0.6200    0.6526        50
   Acetylation     0.2258    0.5385    0.3182        13

      accuracy                         0.4896       241
     macro avg     0.4923    0.5004    0.4769       241
  weighted avg     0.5273    0.4896    0.4965       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:31:53,148] Trial 89 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.1948,
Macro Precision: 0.0600, Macro Recall: 0.2000, Macro F1-score: 0.0923,
Weighted Precision: 0.0896, Weighted Recall: 0.2988, Weighted F1-score: 0.1379,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.0600    0.2000    0.0923       241
  weighted avg     0.0896    0.2988    0.1379       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:32:04,926] Trial 90 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.2520,
Macro Precision: 0.2597, Macro Recall: 0.2011, Macro F1-score: 0.0993,
Weighted Precision: 0.3007, Weighted Recall: 0.2988, Weighted F1-score: 0.1450,
Accuracy: 29.88%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 51, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2983    0.9861    0.4581        72
     Oxidation     0.0000    0.0000    0.0000        69
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     1.0000    0.0196    0.0385        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.2988       241
     macro avg     0.2597    0.2011    0.0993       241
  weighted avg     0.3007    0.2988    0.1450       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:32:16,565] Trial 91 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.0519,
Macro Precision: 0.2108, Macro Recall: 0.2171, Macro F1-score: 0.1240,
Weighted Precision: 0.1997, Weighted Recall: 0.3112, Weighted F1-score: 0.1616,
Accuracy: 31.12%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 51, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3038    1.0000    0.4660        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.7500    0.0857    0.1538        35
Ubiquitination     0.0000    0.0000    0.0000        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3112       241
     macro avg     0.2108    0.2171    0.1240       241
  weighted avg     0.1997    0.3112    0.1616       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:32:28,131] Trial 92 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.0919,
Macro Precision: 0.5279, Macro Recall: 0.2164, Macro F1-score: 0.1251,
Weighted Precision: 0.6393, Weighted Recall: 0.3154, Weighted F1-score: 0.1719,
Accuracy: 31.54%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3064    1.0000    0.4691        72
     Oxidation     1.0000    0.0143    0.0282        70
       Phospho     0.3333    0.0278    0.0513        36
Ubiquitination     1.0000    0.0400    0.0769        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3154       241
     macro avg     0.5279    0.2164    0.1251       241
  weighted avg     0.6393    0.3154    0.1719       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:32:40,217] Trial 93 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.1221,
Macro Precision: 0.5599, Macro Recall: 0.2095, Macro F1-score: 0.1159,
Weighted Precision: 0.5936, Weighted Recall: 0.3071, Weighted F1-score: 0.1616,
Accuracy: 30.71%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 69, np.int64(3): 50, np.int64(2): 37, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2996    0.9861    0.4595        72
     Oxidation     0.5000    0.0145    0.0282        69
       Phospho     1.0000    0.0270    0.0526        37
Ubiquitination     1.0000    0.0200    0.0392        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3071       241
     macro avg     0.5599    0.2095    0.1159       241
  weighted avg     0.5936    0.3071    0.1616       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:32:52,235] Trial 94 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.2064,
Macro Precision: 0.2600, Macro Recall: 0.2057, Macro F1-score: 0.1034,
Weighted Precision: 0.2349, Weighted Recall: 0.3029, Weighted F1-score: 0.1460,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        71
       Phospho     1.0000    0.0286    0.0556        35
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2600    0.2057    0.1034       241
  weighted avg     0.2349    0.3029    0.1460       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:33:04,384] Trial 95 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.2950,
Macro Precision: 0.2600, Macro Recall: 0.2040, Macro F1-score: 0.1002,
Weighted Precision: 0.2971, Weighted Recall: 0.3029, Weighted F1-score: 0.1460,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 50, np.int64(2): 35, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3000    1.0000    0.4615        72
     Oxidation     0.0000    0.0000    0.0000        71
       Phospho     0.0000    0.0000    0.0000        35
Ubiquitination     1.0000    0.0200    0.0392        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2600    0.2040    0.1002       241
  weighted avg     0.2971    0.3029    0.1460       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:33:16,347] Trial 96 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 5.1394,
Macro Precision: 0.2603, Macro Recall: 0.2029, Macro F1-score: 0.0982,
Weighted Precision: 0.3805, Weighted Recall: 0.3029, Weighted F1-score: 0.1465,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3013    1.0000    0.4630        72
     Oxidation     1.0000    0.0143    0.0282        70
       Phospho     0.0000    0.0000    0.0000        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.2603    0.2029    0.0982       241
  weighted avg     0.3805    0.3029    0.1465       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:33:28,178] Trial 97 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 1.8520,
Macro Precision: 0.4603, Macro Recall: 0.2098, Macro F1-score: 0.1117,
Weighted Precision: 0.4427, Weighted Recall: 0.3071, Weighted F1-score: 0.1545,
Accuracy: 30.71%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 71, np.int64(3): 51, np.int64(2): 34, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3013    1.0000    0.4630        72
     Oxidation     0.0000    0.0000    0.0000        71
       Phospho     1.0000    0.0294    0.0571        34
Ubiquitination     1.0000    0.0196    0.0385        51
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3071       241
     macro avg     0.4603    0.2098    0.1117       241
  weighted avg     0.4427    0.3071    0.1545       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:33:40,435] Trial 98 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 3.4227,
Macro Precision: 0.1603, Macro Recall: 0.2027, Macro F1-score: 0.0979,
Weighted Precision: 0.2415, Weighted Recall: 0.3029, Weighted F1-score: 0.1464,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(1): 73, np.int64(0): 72, np.int64(3): 50, np.int64(2): 33, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.3013    1.0000    0.4630        72
     Oxidation     0.5000    0.0137    0.0267        73
       Phospho     0.0000    0.0000    0.0000        33
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.1603    0.2027    0.0979       241
  weighted avg     0.2415    0.3029    0.1464       241

Processing file: /content/drive/MyDrive/data/4_mod_balanced_dataset

[I 2025-09-16 02:33:51,995] Trial 99 pruned. 


Final model weights saved to /dev/null
predictions device: cuda:0
targets device: cuda:0
Batch -: Validation Loss: 2.5242,
Macro Precision: 0.1930, Macro Recall: 0.2083, Macro F1-score: 0.1121,
Weighted Precision: 0.1887, Weighted Recall: 0.3029, Weighted F1-score: 0.1522,
Accuracy: 30.29%, Class Distribution: Counter({np.int64(0): 72, np.int64(1): 70, np.int64(3): 50, np.int64(2): 36, np.int64(4): 13})
Per-class metrics:
                precision    recall  f1-score   support

    Unmodified     0.2983    0.9861    0.4581        72
     Oxidation     0.0000    0.0000    0.0000        70
       Phospho     0.6667    0.0556    0.1026        36
Ubiquitination     0.0000    0.0000    0.0000        50
   Acetylation     0.0000    0.0000    0.0000        13

      accuracy                         0.3029       241
     macro avg     0.1930    0.2083    0.1121       241
  weighted avg     0.1887    0.3029    0.1522       241

Best trial:
{'latent_size': 64, 'num_heads': 4, 'num_layers': 6, 'd