# MLIP hackathon starter notebook

In this challenge, we train a Machine Learning Interatomic Potential (MLIP) model using the [`mlip`](https://github.com/instadeepai/mlip) library.

If you run this on Google Colab, make sure to select a GPU runtime (Runtime > Change runtime type > Hardware accelerator > GPU).

In [212]:
#!pip install mlip "jax[cuda12]"

In [213]:
#!pip install wandb

**Install, required imports, and logging setup**


In [214]:
# import wandb
# !wandb login

In [215]:
import os
import numpy as np
import pandas as pd
import logging
import matplotlib.pyplot as plt

# For dataset loading
from mlip.data import GraphDatasetBuilder, ExtxyzReader

# For model
from mlip.models import Mace, Nequip, Visnet, ForceField

# For optimizer
import optax

# For loss function
from mlip.models.loss import MSELoss

# For training
from mlip.training import TrainingLoop
from mlip.models.model_io import save_model_to_zip, load_model_from_zip
from mlip.models.params_loading import load_parameters_from_checkpoint

# For checkpointing
from mlip.training import TrainingIOHandler, log_metrics_to_line
from mlip.training.training_io_handler import LogCategory

# Set up logging
logging.basicConfig(level=logging.INFO, force=True, format='%(levelname)s - %(message)s')

# Set dedicated logging for mlip. Set this to logging.DEBUG to see more detailed logs.
logging.getLogger("mlip").setLevel(logging.INFO)

Let's also check what device we are using:

In [216]:
import jax

print(jax.devices())

[CudaDevice(id=0)]


## 1. Preparing a dataset

For this example, we train on configurations of a molecule called [3-(benzyloxy)pyridin-2-amine](https://pubchem.ncbi.nlm.nih.gov/substance/854545) (molecular formula `C12H12N2O`, abbreviated as `3BPA`) sampled with Molecular Dynamics at a temperature of 300 Kelvin.

It's molecular structure consists of a **pyridin-2-amine core**:

 * A **pyridine ring** (six-membered aromatic ring with one nitrogen atom).

 * An **amino group (-NH₂)** at the 2-position of the pyridine ring (adjacent to the nitrogen).

Benzyloxy substituent at position 3:

 * A **benzyloxy group (-OCH₂C₆H₅)** is attached to the 3-position of the pyridine ring.

 * This consists of a **methylene bridge (–CH₂–)** bonded to a **phenyl ring (C₆H₅)**, connected via an **ether linkage (–O–)**.

![3BPA](https://go.drugbank.com/structures/DB02352/thumb.svg)

The data processing **is a two step process**:

1. **We read the data from disk into [`ChemicalSystem`](https://instadeepai.github.io/mlip/api_reference/data/chemical_system.html) objects**. This is done by a "reader", and since the dataset is stored in extended xyz format, it can be read with the [`ExtxyzReader`](https://instadeepai.github.io/mlip/api_reference/data/chemical_systems_readers/extxyz_reader.html). The *mlip* library also includes a HDF5 format reader: [`Hdf5Reader`](https://instadeepai.github.io/mlip/api_reference/data/chemical_systems_readers/hdf5_reader.html).

In [217]:
%%bash
mkdir -p data

[ -f data/test_public.xyz ] || wget -P data https://raw.githubusercontent.com/BioGeek/hackathon_IndabaX_2025_mlip/refs/heads/main/data/test_public.xyz
[ -f data/train.xyz ] || wget -P data https://raw.githubusercontent.com/BioGeek/hackathon_IndabaX_2025_mlip/refs/heads/main/data/train.xyz
[ -f data/validation.xyz ] || wget -P data https://raw.githubusercontent.com/BioGeek/hackathon_IndabaX_2025_mlip/refs/heads/main/data/validation.xyz

In [218]:
reader = ExtxyzReader(
    ExtxyzReader.Config(
        train_dataset_paths="data/train.xyz",
        valid_dataset_paths="data/validation.xyz")
)

2. **We process these [`ChemicalSystem`](https://instadeepai.github.io/mlip/api_reference/data/chemical_system.html) objects into graphs.** This process uses the class [`GraphDatasetBuilder`](https://instadeepai.github.io/mlip/api_reference/data/graph_dataset_builder.html) which offers some degree of customisation through its [config class](https://instadeepai.github.io/mlip/api_reference/data/dataset_configs.html#mlip.data.configs.GraphDatasetBuilderConfig).



In [219]:
builder_config = GraphDatasetBuilder.Config(
    graph_cutoff_angstrom=5.0,
    batch_size=16,
)

builder = GraphDatasetBuilder(reader, builder_config)
builder.prepare_datasets() # This step is required to compute all dataset information (used later on by most MLIP model)

train_set, validation_set, _ = builder.get_splits()

Graph creation:   0%|          | 0/500 [00:00<?, ?it/s]

valid graph creation:   0%|          | 0/100 [00:00<?, ?it/s]

test graph creation: 0it [00:00, ?it/s]

INFO - Starting to compute mandatory dataset statistics: this may take some time...


Average number of neighbors computation:   0%|          | 0/500 [00:00<?, ?it/s]

INFO - Processed 10% of data
INFO - Processed 20% of data
INFO - Processed 30% of data
INFO - Processed 40% of data
INFO - Processed 50% of data
INFO - Processed 60% of data
INFO - Processed 70% of data
INFO - Processed 80% of data
INFO - Processed 90% of data
INFO - Processed 100% of data


More information can be found in the [deep-dive on data processing](https://instadeepai.github.io/mlip/user_guide/data_processing.html)  in our documentation for more details.

We can now **print some statistics about our dataset** along with the [`DatasetInfo`](https://instadeepai.github.io/mlip/api_reference/data/dataset_info.html) object that will be required for downstream tasks. The **dataset info** holds all the hyperparameters of the models that are directly derived from the dataset or its processing, e.g., the cutoff distance to determine the graph edges.

In [220]:
print("Dataset info:", builder.dataset_info)
print("Number of batches in train set:", len(train_set))
print("Number of batches in validation set:", len(validation_set))

Dataset info: Atomic Energies: {'H': -723.2941604798493, 'C': -723.2941604798525, 'N': -120.54902674664157, 'O': -60.27451337332064}, Avg. num. neighbors: 16.72, Avg. r_min: 1.00, Graph cutoff distance: 5.0
Number of batches in train set: 32
Number of batches in validation set: 7


The atomic energies and forces are stored in the `energy` and `forces` attributes of the [`ChemicalSystem`](https://instadeepai.github.io/mlip/api_reference/data/chemical_system.html) objects, respectively. The energies are in [eV](https://en.wikipedia.org/wiki/Electronvolt), and the forces are in eV/[Å](https://en.wikipedia.org/wiki/Angstrom).

## 2. Preparing a training loop

To start training, we first need to prepare some prerequisites. These are, as for all ML models:
- A **model architecture**,
- An **optimizer**, and
- A **loss function**

We start with the **model architecture**:

We can use one of the pre-defined models in the *mlip* library, such as MACE, NequIP, or ViSNet. These models are designed to handle molecular graphs and can be configured with various hyperparameters.

For this tutorial, we provide the initialization code for MACE, NequIP and ViSnet, but commented out two of them. For all the hyperparameters available, see the documentations of the [MACE config](https://instadeepai.github.io/mlip/api_reference/models/mace.html#mlip.models.mace.config.MaceConfig), the [NequIP config](https://instadeepai.github.io/mlip/api_reference/models/nequip.html#mlip.models.nequip.config.NequipConfig), and the [ViSNet config](https://instadeepai.github.io/mlip/api_reference/models/visnet.html#mlip.models.visnet.config.VisnetConfig).

The model creation process includes two steps:
* 1: the creation of the MLIP network and
* 2:the creation  of the force field.

See our [deep-dive on models](https://instadeepai.github.io/mlip/user_guide/models.html) for a detailed explanation of this pattern.

**Hint**: try some of the other pre-defined models by uncommenting the lines below and running the cell again. You can also try to change some of the hyperparameters in the config classes, e.g., the number of layers or channels

In [221]:
# run = wandb.init(
#     # Set the wandb entity where your project will be logged (generally your team name).
#     entity="cyberCharl",
#     # Set the wandb project where this run will be logged.
#     project="Instadeep Hackathon",
#     # Track hyperparameters and run metadata.
#     config={
#         "epochs": 100,
#         "node_irreps": "4x0e + 4x0o + 4x1o + 4x1e + 4x2e + 4x2o",
#         "num_layers": 4,
#         "weight_decay": 1e-5,
#         "grad_norm": 50,
#         "num_gradient_accumulation_steps": 1,
#         "init_learning_rate": 5e-4,  # Start a bit higher
#         "peak_learning_rate": 2e-3,  # Target higher peak
#         "final_learning_rate": 1e-4,  # Decay to a lower final LR
#         "warmup_steps": 10000,  # Adjusted for your dataset size and epoch count (approx 5 epochs)
#         "transition_steps": 1840000,
#         "energy_weight": 500.0
#     },
# )

In [222]:
sweep_config = {
    'method': 'bayes',  # or 'grid', 'random'
    'metric': {
        'name': 'val_loss',  # This should match your logged metric
        'goal': 'minimize'
    },
    'parameters': {
        'num_layers': {'values': [2, 4, 6]},
        'init_learning_rate': {'min': 1e-4, 'max': 5e-3},
        'peak_learning_rate': {'min': 1e-3, 'max': 5e-3},
        'final_learning_rate': {'min': 1e-5, 'max': 1e-3},
        'weight_decay': {'min': 1e-6, 'max': 1e-3},
        'energy_weight': {'values': [100.0, 500.0, 1000.0]}
        # Add more as needed
    }
}

The force field will be the essential object required for the

---

training below, as well as for running MD simulations.

In [None]:
import wandb
run = wandb.init(
    # Set the wandb entity where your project will be logged (generally your team name).
    entity="cybercharl-ai-safety-south-africa",
    # Set the wandb project where this run will be logged.
    project="instadeep-hackathon-2025",
    # Track hyperparameters and run metadata.
    config={
        "epochs": 100,
        "node_irreps": "4x0e + 4x0o + 4x1o + 4x1e + 4x2e + 4x2o",
        "num_layers": 4,
        "weight_decay": 1e-5,
        "grad_norm": 50,
        "num_gradient_accumulation_steps": 1,
        "init_learning_rate": 5e-4,  # Start a bit higher
        "peak_learning_rate": 2e-3,  # Target higher peak
        "final_learning_rate": 1e-4,  # Decay to a lower final LR
        "warmup_steps": 10000,  # Adjusted for your dataset size and epoch count (approx 5 epochs)
        "transition_steps": 1840000,
        "energy_weight": 500.0
    }
)

def train():
    # 1. Initialize wandb run
    run = wandb.init() # Commented out this line as the sweep will handle init
    config = run.config

    # 2. Build model with sweep hyperparameters
    mlip_network = Nequip(
        Nequip.Config(
            node_irreps="4x0e + 4x0o + 4x1o + 4x1e + 4x2e + 4x2o",  # or sweep this too
            num_layers=config.num_layers,
        ),
        builder.dataset_info,
    )
    force_field = ForceField.from_mlip_network(mlip_network)

    # 3. Optimizer config
    from mlip.training.optimizer_config import OptimizerConfig # Import here
    optimizer_config = OptimizerConfig(
        apply_weight_decay_mask=True,
        weight_decay=config.weight_decay,
        grad_norm=50,  # or config.grad_norm
        num_gradient_accumulation_steps=1,
        init_learning_rate=config.init_learning_rate,
        peak_learning_rate=config.peak_learning_rate,
        final_learning_rate=config.final_learning_rate,
        warmup_steps=10000,  # or config.warmup_steps
        transition_steps=1840000,  # or config.transition_steps
    )
    from mlip.training.optimizer import get_default_mlip_optimizer # Import here
    optimizer = get_default_mlip_optimizer(optimizer_config)

    # 4. Loss and training config
    from mlip.models.loss import MSELoss # Import here
    loss = MSELoss()
    from mlip.training import TrainingLoop # Import here
    training_config = TrainingLoop.Config(
        num_epochs=100,  # or config.epochs
        energy_weight=config.energy_weight,
        force_weight=0,  # or config.force_weight
    )

    # 5. IO handler and logger
    from mlip.training import TrainingIOHandler, log_metrics_to_line # Import here
    io_handler = TrainingIOHandler(
        TrainingIOHandler.Config(local_model_output_dir="training/model_training")
    )
    io_handler.attach_logger(log_metrics_to_line)

    # Custom logger to log val_loss to wandb
    from mlip.training.training_io_handler import LogCategory # Import here
    def wandb_logger(category, to_log, epoch_number):
        if category == LogCategory.EVAL_METRICS:
          run.log(to_log) # Log val_loss with step


    io_handler.attach_logger(wandb_logger)

    # 6. Training loop
    training_loop = TrainingLoop(
        train_dataset=train_set,
        validation_dataset=validation_set,
        force_field=force_field,
        loss=loss,
        optimizer=optimizer,
        config=training_config,
        io_handler=io_handler,
    )
    training_loop.run()

    # Save the best model
    save_model_to_zip(f"training/model_{wandb.run.id}.zip", training_loop.best_model)

    # Run inference and save CSV
    from ase.io import read as ase_read
    test_data = "data/test_public.xyz"
    structures = ase_read(test_data, index=":")

    from mlip.inference import run_batched_inference
    predictions = run_batched_inference(structures, training_loop.best_model, batch_size=8)
    energies = np.array([prediction.energy for prediction in predictions])

    df = pd.DataFrame({
        'ID': np.arange(len(energies)),
        'energies': energies
    })

    csv_name = f"{wandb.run.id}_submission.csv"
    df.to_csv(csv_name, index=False)
    wandb.save(csv_name)  # Uploads to wandb for easy download

sweep_id = wandb.sweep(sweep_config, project="instadeep-hackathon-2025")
wandb.agent(sweep_id, function=train)

ERROR - Task was destroyed but it is pending!
task: <Task cancelling name='Task-86786' coro=<Event.wait() running at /usr/lib/python3.11/asyncio/locks.py:213> wait_for=<Future cancelled>>


Create sweep with ID: 156uxkmm
Sweep URL: https://wandb.ai/cybercharl-ai-safety-south-africa/instadeep-hackathon-2025/sweeps/156uxkmm


[34m[1mwandb[0m: Agent Starting Run: qubguh2d with config:
[34m[1mwandb[0m: 	energy_weight: 500
[34m[1mwandb[0m: 	final_learning_rate: 0.00027571455565362585
[34m[1mwandb[0m: 	init_learning_rate: 0.000585894064037785
[34m[1mwandb[0m: 	num_layers: 2
[34m[1mwandb[0m: 	peak_learning_rate: 0.0028892082906843974
[34m[1mwandb[0m: 	weight_decay: 0.000806450734688251


INFO - Cleaning up existing temporary directories at /content/training/model_training/model.
INFO - Number of parameters: 29098
INFO - Number of parameters in optimizer: 116398
INFO - Starting training loop...
INFO - Validation: Loss = 87.034 | Mae e = 3.000 | Mae f = 1.389 | Rmse e = 3.066 | Rmse f = 1.866
INFO - Best model: Loss = 87.034 | Best epoch = 0


here??


INFO - ------------ Epoch 1 ------------
INFO - Training:   Loss = 22.330 | Gradient norm = 0.760 | Param update norm = 0.084 | Runtime in seconds = 13.467
INFO - Validation: Loss = 86.395 | Mae e = 3.152 | Mae f = 1.383 | Rmse e = 3.215 | Rmse f = 1.859
INFO - Saving checkpoint at epoch 1...
INFO - Best model: Loss = 86.395 | Best epoch = 1


here??


INFO - ------------ Epoch 2 ------------
INFO - Training:   Loss = 21.347 | Gradient norm = 2.438 | Param update norm = 0.088 | Runtime in seconds = 0.294
INFO - Validation: Loss = 84.800 | Mae e = 3.510 | Mae f = 1.370 | Rmse e = 3.566 | Rmse f = 1.842
INFO - Saving checkpoint at epoch 2...
INFO - Best model: Loss = 84.800 | Best epoch = 2


here??


INFO - ------------ Epoch 3 ------------
INFO - Training:   Loss = 17.062 | Gradient norm = 6.496 | Param update norm = 0.100 | Runtime in seconds = 0.432
INFO - Validation: Loss = 79.606 | Mae e = 4.522 | Mae f = 1.327 | Rmse e = 4.565 | Rmse f = 1.784
INFO - Saving checkpoint at epoch 3...
INFO - Best model: Loss = 79.606 | Best epoch = 3


here??


INFO - ------------ Epoch 4 ------------
INFO - Training:   Loss = 11.498 | Gradient norm = 6.398 | Param update norm = 0.089 | Runtime in seconds = 0.259
INFO - Validation: Loss = 70.259 | Mae e = 6.336 | Mae f = 1.247 | Rmse e = 6.365 | Rmse f = 1.676
INFO - Saving checkpoint at epoch 4...
INFO - Best model: Loss = 70.259 | Best epoch = 4


here??


INFO - ------------ Epoch 5 ------------
INFO - Training:   Loss = 7.958 | Gradient norm = 4.455 | Param update norm = 0.067 | Runtime in seconds = 0.268
INFO - Validation: Loss = 58.912 | Mae e = 8.629 | Mae f = 1.139 | Rmse e = 8.649 | Rmse f = 1.534
INFO - Saving checkpoint at epoch 5...
INFO - Best model: Loss = 58.912 | Best epoch = 5


here??


INFO - ------------ Epoch 6 ------------
INFO - Training:   Loss = 6.814 | Gradient norm = 4.101 | Param update norm = 0.051 | Runtime in seconds = 0.264
INFO - Validation: Loss = 49.111 | Mae e = 10.616 | Mae f = 1.036 | Rmse e = 10.630 | Rmse f = 1.399
INFO - Saving checkpoint at epoch 6...
INFO - Best model: Loss = 49.111 | Best epoch = 6


here??


INFO - ------------ Epoch 7 ------------
INFO - Training:   Loss = 6.196 | Gradient norm = 3.778 | Param update norm = 0.043 | Runtime in seconds = 0.431
INFO - Validation: Loss = 41.424 | Mae e = 12.097 | Mae f = 0.947 | Rmse e = 12.109 | Rmse f = 1.284
INFO - Saving checkpoint at epoch 7...
INFO - Best model: Loss = 41.424 | Best epoch = 7


here??


INFO - ------------ Epoch 8 ------------
INFO - Training:   Loss = 5.795 | Gradient norm = 3.514 | Param update norm = 0.038 | Runtime in seconds = 0.267
INFO - Validation: Loss = 35.654 | Mae e = 13.094 | Mae f = 0.874 | Rmse e = 13.104 | Rmse f = 1.190
INFO - Saving checkpoint at epoch 8...
INFO - Best model: Loss = 35.654 | Best epoch = 8


here??


INFO - ------------ Epoch 9 ------------
INFO - Training:   Loss = 5.496 | Gradient norm = 3.735 | Param update norm = 0.035 | Runtime in seconds = 0.264
INFO - Validation: Loss = 31.278 | Mae e = 13.788 | Mae f = 0.815 | Rmse e = 13.797 | Rmse f = 1.114
INFO - Saving checkpoint at epoch 9...
INFO - Best model: Loss = 31.278 | Best epoch = 9


here??


INFO - ------------ Epoch 10 ------------
INFO - Training:   Loss = 5.243 | Gradient norm = 3.252 | Param update norm = 0.032 | Runtime in seconds = 0.270
INFO - Validation: Loss = 27.925 | Mae e = 14.296 | Mae f = 0.767 | Rmse e = 14.304 | Rmse f = 1.052
INFO - Saving checkpoint at epoch 10...
INFO - Best model: Loss = 27.925 | Best epoch = 10


here??


INFO - ------------ Epoch 11 ------------
INFO - Training:   Loss = 5.032 | Gradient norm = 4.211 | Param update norm = 0.031 | Runtime in seconds = 0.260
INFO - Validation: Loss = 25.363 | Mae e = 14.682 | Mae f = 0.728 | Rmse e = 14.690 | Rmse f = 1.001
INFO - Saving checkpoint at epoch 11...
INFO - Best model: Loss = 25.363 | Best epoch = 11


here??


INFO - ------------ Epoch 12 ------------
INFO - Training:   Loss = 4.849 | Gradient norm = 5.166 | Param update norm = 0.031 | Runtime in seconds = 0.433
INFO - Validation: Loss = 23.417 | Mae e = 14.963 | Mae f = 0.698 | Rmse e = 14.970 | Rmse f = 0.961
INFO - Saving checkpoint at epoch 12...
INFO - Best model: Loss = 23.417 | Best epoch = 12


here??


INFO - ------------ Epoch 13 ------------
INFO - Training:   Loss = 4.652 | Gradient norm = 4.344 | Param update norm = 0.028 | Runtime in seconds = 0.266
INFO - Validation: Loss = 21.892 | Mae e = 15.167 | Mae f = 0.674 | Rmse e = 15.174 | Rmse f = 0.929
INFO - Saving checkpoint at epoch 13...
INFO - Best model: Loss = 21.892 | Best epoch = 13


here??


INFO - ------------ Epoch 14 ------------
INFO - Training:   Loss = 4.485 | Gradient norm = 4.806 | Param update norm = 0.028 | Runtime in seconds = 0.427
INFO - Validation: Loss = 20.685 | Mae e = 15.308 | Mae f = 0.655 | Rmse e = 15.315 | Rmse f = 0.903
INFO - Saving checkpoint at epoch 14...
INFO - Best model: Loss = 20.685 | Best epoch = 14


here??


INFO - ------------ Epoch 15 ------------
INFO - Training:   Loss = 4.317 | Gradient norm = 5.409 | Param update norm = 0.027 | Runtime in seconds = 0.281
INFO - Validation: Loss = 19.698 | Mae e = 15.375 | Mae f = 0.639 | Rmse e = 15.381 | Rmse f = 0.880
INFO - Saving checkpoint at epoch 15...
INFO - Best model: Loss = 19.698 | Best epoch = 15


here??


INFO - ------------ Epoch 16 ------------
INFO - Training:   Loss = 4.146 | Gradient norm = 4.741 | Param update norm = 0.026 | Runtime in seconds = 0.349
INFO - Validation: Loss = 18.856 | Mae e = 15.412 | Mae f = 0.626 | Rmse e = 15.418 | Rmse f = 0.861
INFO - Saving checkpoint at epoch 16...
INFO - Best model: Loss = 18.856 | Best epoch = 16


here??


INFO - ------------ Epoch 17 ------------
INFO - Training:   Loss = 3.994 | Gradient norm = 5.785 | Param update norm = 0.027 | Runtime in seconds = 0.443
INFO - Validation: Loss = 18.139 | Mae e = 15.389 | Mae f = 0.614 | Rmse e = 15.395 | Rmse f = 0.844
INFO - Saving checkpoint at epoch 17...
INFO - Best model: Loss = 18.139 | Best epoch = 17


here??


INFO - ------------ Epoch 18 ------------
INFO - Training:   Loss = 3.841 | Gradient norm = 6.331 | Param update norm = 0.026 | Runtime in seconds = 0.317
INFO - Validation: Loss = 17.499 | Mae e = 15.290 | Mae f = 0.604 | Rmse e = 15.297 | Rmse f = 0.829
INFO - Saving checkpoint at epoch 18...
INFO - Best model: Loss = 17.499 | Best epoch = 18


here??


INFO - ------------ Epoch 19 ------------
INFO - Training:   Loss = 3.698 | Gradient norm = 6.612 | Param update norm = 0.027 | Runtime in seconds = 0.336
INFO - Validation: Loss = 16.929 | Mae e = 15.143 | Mae f = 0.594 | Rmse e = 15.150 | Rmse f = 0.815
INFO - Saving checkpoint at epoch 19...
INFO - Best model: Loss = 16.929 | Best epoch = 19


here??


INFO - ------------ Epoch 20 ------------
INFO - Training:   Loss = 3.571 | Gradient norm = 8.884 | Param update norm = 0.027 | Runtime in seconds = 0.326
INFO - Validation: Loss = 16.391 | Mae e = 14.965 | Mae f = 0.585 | Rmse e = 14.971 | Rmse f = 0.802
INFO - Saving checkpoint at epoch 20...
INFO - Best model: Loss = 16.391 | Best epoch = 20


here??


INFO - ------------ Epoch 21 ------------
INFO - Training:   Loss = 3.408 | Gradient norm = 6.412 | Param update norm = 0.024 | Runtime in seconds = 0.471
INFO - Validation: Loss = 15.913 | Mae e = 14.694 | Mae f = 0.576 | Rmse e = 14.701 | Rmse f = 0.790
INFO - Saving checkpoint at epoch 21...
INFO - Best model: Loss = 15.913 | Best epoch = 21


here??


INFO - ------------ Epoch 22 ------------
INFO - Training:   Loss = 3.286 | Gradient norm = 7.066 | Param update norm = 0.024 | Runtime in seconds = 0.264
INFO - Validation: Loss = 15.430 | Mae e = 14.398 | Mae f = 0.568 | Rmse e = 14.404 | Rmse f = 0.778
INFO - Saving checkpoint at epoch 22...
INFO - Best model: Loss = 15.430 | Best epoch = 22


here??


INFO - ------------ Epoch 23 ------------
INFO - Training:   Loss = 3.161 | Gradient norm = 7.659 | Param update norm = 0.024 | Runtime in seconds = 0.263
INFO - Validation: Loss = 14.997 | Mae e = 14.010 | Mae f = 0.560 | Rmse e = 14.016 | Rmse f = 0.768
INFO - Saving checkpoint at epoch 23...
INFO - Best model: Loss = 14.997 | Best epoch = 23


here??


INFO - ------------ Epoch 24 ------------
INFO - Training:   Loss = 3.039 | Gradient norm = 8.095 | Param update norm = 0.024 | Runtime in seconds = 0.436
INFO - Validation: Loss = 14.573 | Mae e = 13.552 | Mae f = 0.552 | Rmse e = 13.559 | Rmse f = 0.757
INFO - Saving checkpoint at epoch 24...
INFO - Best model: Loss = 14.573 | Best epoch = 24


here??


INFO - ------------ Epoch 25 ------------
INFO - Training:   Loss = 2.932 | Gradient norm = 8.859 | Param update norm = 0.025 | Runtime in seconds = 0.260
INFO - Validation: Loss = 14.174 | Mae e = 13.024 | Mae f = 0.545 | Rmse e = 13.031 | Rmse f = 0.747
INFO - Saving checkpoint at epoch 25...
INFO - Best model: Loss = 14.174 | Best epoch = 25


here??


INFO - ------------ Epoch 26 ------------
INFO - Training:   Loss = 2.817 | Gradient norm = 8.108 | Param update norm = 0.024 | Runtime in seconds = 0.305
INFO - Validation: Loss = 13.776 | Mae e = 12.461 | Mae f = 0.537 | Rmse e = 12.468 | Rmse f = 0.737
INFO - Saving checkpoint at epoch 26...
INFO - Best model: Loss = 13.776 | Best epoch = 26


here??


INFO - ------------ Epoch 27 ------------
INFO - Training:   Loss = 2.693 | Gradient norm = 7.334 | Param update norm = 0.022 | Runtime in seconds = 0.452
INFO - Validation: Loss = 13.381 | Mae e = 11.849 | Mae f = 0.530 | Rmse e = 11.855 | Rmse f = 0.726
INFO - Saving checkpoint at epoch 27...
INFO - Best model: Loss = 13.381 | Best epoch = 27


here??


INFO - ------------ Epoch 28 ------------
INFO - Training:   Loss = 2.603 | Gradient norm = 11.282 | Param update norm = 0.024 | Runtime in seconds = 0.445
INFO - Validation: Loss = 13.000 | Mae e = 11.212 | Mae f = 0.523 | Rmse e = 11.219 | Rmse f = 0.716
INFO - Saving checkpoint at epoch 28...
INFO - Best model: Loss = 13.000 | Best epoch = 28


here??


INFO - ------------ Epoch 29 ------------
INFO - Training:   Loss = 2.478 | Gradient norm = 10.272 | Param update norm = 0.022 | Runtime in seconds = 0.473
INFO - Validation: Loss = 12.605 | Mae e = 10.571 | Mae f = 0.516 | Rmse e = 10.578 | Rmse f = 0.706
INFO - Saving checkpoint at epoch 29...
INFO - Best model: Loss = 12.605 | Best epoch = 29


here??


INFO - ------------ Epoch 30 ------------
INFO - Training:   Loss = 2.372 | Gradient norm = 8.334 | Param update norm = 0.022 | Runtime in seconds = 0.356
INFO - Validation: Loss = 12.214 | Mae e = 9.916 | Mae f = 0.508 | Rmse e = 9.923 | Rmse f = 0.695
INFO - Saving checkpoint at epoch 30...
INFO - Best model: Loss = 12.214 | Best epoch = 30


here??


INFO - ------------ Epoch 31 ------------
INFO - Training:   Loss = 2.276 | Gradient norm = 9.202 | Param update norm = 0.023 | Runtime in seconds = 0.421
INFO - Validation: Loss = 11.842 | Mae e = 9.243 | Mae f = 0.501 | Rmse e = 9.251 | Rmse f = 0.685
INFO - Saving checkpoint at epoch 31...
INFO - Best model: Loss = 11.842 | Best epoch = 31


here??


INFO - ------------ Epoch 32 ------------
INFO - Training:   Loss = 2.157 | Gradient norm = 7.736 | Param update norm = 0.021 | Runtime in seconds = 0.263
INFO - Validation: Loss = 11.476 | Mae e = 8.578 | Mae f = 0.493 | Rmse e = 8.586 | Rmse f = 0.675
INFO - Saving checkpoint at epoch 32...
INFO - Best model: Loss = 11.476 | Best epoch = 32


here??


INFO - ------------ Epoch 33 ------------
INFO - Training:   Loss = 2.083 | Gradient norm = 10.759 | Param update norm = 0.021 | Runtime in seconds = 0.268
INFO - Validation: Loss = 11.111 | Mae e = 7.944 | Mae f = 0.486 | Rmse e = 7.953 | Rmse f = 0.664
INFO - Saving checkpoint at epoch 33...
INFO - Best model: Loss = 11.111 | Best epoch = 33


here??


INFO - ------------ Epoch 34 ------------
INFO - Training:   Loss = 1.978 | Gradient norm = 7.121 | Param update norm = 0.019 | Runtime in seconds = 0.263
INFO - Validation: Loss = 10.789 | Mae e = 7.300 | Mae f = 0.478 | Rmse e = 7.309 | Rmse f = 0.655
INFO - Saving checkpoint at epoch 34...
INFO - Best model: Loss = 10.789 | Best epoch = 34


here??


INFO - ------------ Epoch 35 ------------
INFO - Training:   Loss = 1.928 | Gradient norm = 10.636 | Param update norm = 0.021 | Runtime in seconds = 0.265
INFO - Validation: Loss = 10.472 | Mae e = 6.692 | Mae f = 0.471 | Rmse e = 6.702 | Rmse f = 0.645
INFO - Saving checkpoint at epoch 35...
INFO - Best model: Loss = 10.472 | Best epoch = 35


here??


INFO - ------------ Epoch 36 ------------
INFO - Training:   Loss = 1.856 | Gradient norm = 8.533 | Param update norm = 0.020 | Runtime in seconds = 0.428
INFO - Validation: Loss = 10.209 | Mae e = 6.105 | Mae f = 0.465 | Rmse e = 6.117 | Rmse f = 0.637
INFO - Saving checkpoint at epoch 36...
INFO - Best model: Loss = 10.209 | Best epoch = 36


here??


INFO - ------------ Epoch 37 ------------
INFO - Training:   Loss = 1.797 | Gradient norm = 9.222 | Param update norm = 0.019 | Runtime in seconds = 0.258
INFO - Validation: Loss = 9.951 | Mae e = 5.560 | Mae f = 0.459 | Rmse e = 5.572 | Rmse f = 0.630
INFO - Saving checkpoint at epoch 37...
INFO - Best model: Loss = 9.951 | Best epoch = 37


here??


INFO - ------------ Epoch 38 ------------
INFO - Training:   Loss = 1.736 | Gradient norm = 7.757 | Param update norm = 0.018 | Runtime in seconds = 0.428
INFO - Validation: Loss = 9.711 | Mae e = 5.073 | Mae f = 0.453 | Rmse e = 5.086 | Rmse f = 0.622
INFO - Saving checkpoint at epoch 38...
INFO - Best model: Loss = 9.711 | Best epoch = 38


here??


INFO - ------------ Epoch 39 ------------
INFO - Training:   Loss = 1.683 | Gradient norm = 6.835 | Param update norm = 0.017 | Runtime in seconds = 0.258
INFO - Validation: Loss = 9.510 | Mae e = 4.617 | Mae f = 0.448 | Rmse e = 4.631 | Rmse f = 0.616
INFO - Saving checkpoint at epoch 39...
INFO - Best model: Loss = 9.510 | Best epoch = 39


here??


INFO - ------------ Epoch 40 ------------
INFO - Training:   Loss = 1.636 | Gradient norm = 6.694 | Param update norm = 0.017 | Runtime in seconds = 0.432
INFO - Validation: Loss = 9.318 | Mae e = 4.176 | Mae f = 0.443 | Rmse e = 4.192 | Rmse f = 0.610
INFO - Saving checkpoint at epoch 40...
INFO - Best model: Loss = 9.318 | Best epoch = 40


here??


INFO - ------------ Epoch 41 ------------
INFO - Training:   Loss = 1.610 | Gradient norm = 7.857 | Param update norm = 0.018 | Runtime in seconds = 0.430
INFO - Validation: Loss = 9.154 | Mae e = 3.770 | Mae f = 0.438 | Rmse e = 3.788 | Rmse f = 0.604
INFO - Saving checkpoint at epoch 41...
INFO - Best model: Loss = 9.154 | Best epoch = 41


here??


INFO - ------------ Epoch 42 ------------
INFO - Training:   Loss = 1.571 | Gradient norm = 8.534 | Param update norm = 0.018 | Runtime in seconds = 0.266
INFO - Validation: Loss = 9.014 | Mae e = 3.387 | Mae f = 0.434 | Rmse e = 3.406 | Rmse f = 0.600
INFO - Saving checkpoint at epoch 42...
INFO - Best model: Loss = 9.014 | Best epoch = 42


here??


INFO - ------------ Epoch 43 ------------
INFO - Training:   Loss = 1.537 | Gradient norm = 8.871 | Param update norm = 0.018 | Runtime in seconds = 0.275
INFO - Validation: Loss = 8.856 | Mae e = 3.066 | Mae f = 0.430 | Rmse e = 3.087 | Rmse f = 0.595
INFO - Saving checkpoint at epoch 43...
INFO - Best model: Loss = 8.856 | Best epoch = 43


here??


INFO - ------------ Epoch 44 ------------
INFO - Training:   Loss = 1.496 | Gradient norm = 7.190 | Param update norm = 0.017 | Runtime in seconds = 0.321
INFO - Validation: Loss = 8.745 | Mae e = 2.747 | Mae f = 0.427 | Rmse e = 2.770 | Rmse f = 0.591
INFO - Saving checkpoint at epoch 44...
INFO - Best model: Loss = 8.745 | Best epoch = 44


here??


INFO - ------------ Epoch 45 ------------
INFO - Training:   Loss = 1.465 | Gradient norm = 7.185 | Param update norm = 0.016 | Runtime in seconds = 0.513
INFO - Validation: Loss = 8.622 | Mae e = 2.466 | Mae f = 0.423 | Rmse e = 2.493 | Rmse f = 0.587
INFO - Saving checkpoint at epoch 45...
INFO - Best model: Loss = 8.622 | Best epoch = 45


here??


INFO - ------------ Epoch 46 ------------
INFO - Training:   Loss = 1.435 | Gradient norm = 7.036 | Param update norm = 0.016 | Runtime in seconds = 0.308
INFO - Validation: Loss = 8.515 | Mae e = 2.212 | Mae f = 0.420 | Rmse e = 2.241 | Rmse f = 0.583
INFO - Saving checkpoint at epoch 46...
INFO - Best model: Loss = 8.515 | Best epoch = 46


here??


INFO - ------------ Epoch 47 ------------
INFO - Training:   Loss = 1.415 | Gradient norm = 7.683 | Param update norm = 0.017 | Runtime in seconds = 0.341
INFO - Validation: Loss = 8.420 | Mae e = 1.963 | Mae f = 0.417 | Rmse e = 1.996 | Rmse f = 0.580
INFO - Saving checkpoint at epoch 47...
INFO - Best model: Loss = 8.420 | Best epoch = 47


here??


INFO - ------------ Epoch 48 ------------
INFO - Training:   Loss = 1.396 | Gradient norm = 8.983 | Param update norm = 0.017 | Runtime in seconds = 0.509
INFO - Validation: Loss = 8.325 | Mae e = 1.779 | Mae f = 0.414 | Rmse e = 1.815 | Rmse f = 0.577
INFO - Saving checkpoint at epoch 48...
INFO - Best model: Loss = 8.325 | Best epoch = 48


here??


INFO - ------------ Epoch 49 ------------
INFO - Training:   Loss = 1.361 | Gradient norm = 6.717 | Param update norm = 0.016 | Runtime in seconds = 0.279
INFO - Validation: Loss = 8.228 | Mae e = 1.563 | Mae f = 0.411 | Rmse e = 1.604 | Rmse f = 0.574
INFO - Saving checkpoint at epoch 49...
INFO - Best model: Loss = 8.228 | Best epoch = 49


here??


INFO - ------------ Epoch 50 ------------
INFO - Training:   Loss = 1.328 | Gradient norm = 5.614 | Param update norm = 0.015 | Runtime in seconds = 0.253
INFO - Validation: Loss = 8.146 | Mae e = 1.396 | Mae f = 0.408 | Rmse e = 1.441 | Rmse f = 0.571
INFO - Saving checkpoint at epoch 50...
INFO - Best model: Loss = 8.146 | Best epoch = 50


here??


INFO - ------------ Epoch 51 ------------
INFO - Training:   Loss = 1.313 | Gradient norm = 6.771 | Param update norm = 0.015 | Runtime in seconds = 0.253
INFO - Validation: Loss = 8.055 | Mae e = 1.253 | Mae f = 0.405 | Rmse e = 1.300 | Rmse f = 0.568
INFO - Saving checkpoint at epoch 51...
INFO - Best model: Loss = 8.055 | Best epoch = 51


here??


INFO - ------------ Epoch 52 ------------
INFO - Training:   Loss = 1.297 | Gradient norm = 7.227 | Param update norm = 0.016 | Runtime in seconds = 0.261
INFO - Validation: Loss = 7.986 | Mae e = 1.101 | Mae f = 0.403 | Rmse e = 1.151 | Rmse f = 0.565
INFO - Saving checkpoint at epoch 52...
INFO - Best model: Loss = 7.986 | Best epoch = 52


here??


INFO - ------------ Epoch 53 ------------
INFO - Training:   Loss = 1.277 | Gradient norm = 7.282 | Param update norm = 0.016 | Runtime in seconds = 0.259
INFO - Validation: Loss = 7.909 | Mae e = 0.984 | Mae f = 0.400 | Rmse e = 1.038 | Rmse f = 0.562
INFO - Saving checkpoint at epoch 53...
INFO - Best model: Loss = 7.909 | Best epoch = 53


here??


INFO - ------------ Epoch 54 ------------
INFO - Training:   Loss = 1.249 | Gradient norm = 6.614 | Param update norm = 0.015 | Runtime in seconds = 0.439
INFO - Validation: Loss = 7.839 | Mae e = 0.877 | Mae f = 0.398 | Rmse e = 0.935 | Rmse f = 0.560
INFO - Saving checkpoint at epoch 54...
INFO - Best model: Loss = 7.839 | Best epoch = 54


here??


INFO - ------------ Epoch 55 ------------
INFO - Training:   Loss = 1.225 | Gradient norm = 5.871 | Param update norm = 0.014 | Runtime in seconds = 0.258
INFO - Validation: Loss = 7.772 | Mae e = 0.750 | Mae f = 0.396 | Rmse e = 0.814 | Rmse f = 0.558
INFO - Saving checkpoint at epoch 55...
INFO - Best model: Loss = 7.772 | Best epoch = 55


here??


INFO - ------------ Epoch 56 ------------
INFO - Training:   Loss = 1.212 | Gradient norm = 6.603 | Param update norm = 0.015 | Runtime in seconds = 0.425
INFO - Validation: Loss = 7.702 | Mae e = 0.650 | Mae f = 0.393 | Rmse e = 0.720 | Rmse f = 0.555
INFO - Saving checkpoint at epoch 56...
INFO - Best model: Loss = 7.702 | Best epoch = 56


here??


INFO - ------------ Epoch 57 ------------
INFO - Training:   Loss = 1.195 | Gradient norm = 6.769 | Param update norm = 0.015 | Runtime in seconds = 0.261
INFO - Validation: Loss = 7.646 | Mae e = 0.578 | Mae f = 0.391 | Rmse e = 0.652 | Rmse f = 0.553
INFO - Saving checkpoint at epoch 57...
INFO - Best model: Loss = 7.646 | Best epoch = 57


here??


INFO - ------------ Epoch 58 ------------
INFO - Training:   Loss = 1.172 | Gradient norm = 6.056 | Param update norm = 0.015 | Runtime in seconds = 0.257
INFO - Validation: Loss = 7.579 | Mae e = 0.538 | Mae f = 0.389 | Rmse e = 0.615 | Rmse f = 0.551
INFO - Saving checkpoint at epoch 58...
INFO - Best model: Loss = 7.579 | Best epoch = 58


here??


INFO - ------------ Epoch 59 ------------
INFO - Training:   Loss = 1.168 | Gradient norm = 8.508 | Param update norm = 0.016 | Runtime in seconds = 0.441
INFO - Validation: Loss = 7.515 | Mae e = 0.476 | Mae f = 0.387 | Rmse e = 0.560 | Rmse f = 0.548
INFO - Saving checkpoint at epoch 59...
INFO - Best model: Loss = 7.515 | Best epoch = 59


here??


INFO - ------------ Epoch 60 ------------
INFO - Training:   Loss = 1.155 | Gradient norm = 8.580 | Param update norm = 0.017 | Runtime in seconds = 0.269
INFO - Validation: Loss = 7.450 | Mae e = 0.451 | Mae f = 0.385 | Rmse e = 0.538 | Rmse f = 0.546
INFO - Saving checkpoint at epoch 60...
INFO - Best model: Loss = 7.450 | Best epoch = 60


here??


INFO - ------------ Epoch 61 ------------
INFO - Training:   Loss = 1.141 | Gradient norm = 8.913 | Param update norm = 0.017 | Runtime in seconds = 0.431
INFO - Validation: Loss = 7.394 | Mae e = 0.406 | Mae f = 0.383 | Rmse e = 0.498 | Rmse f = 0.544
INFO - Saving checkpoint at epoch 61...
INFO - Best model: Loss = 7.394 | Best epoch = 61


here??


INFO - ------------ Epoch 62 ------------
INFO - Training:   Loss = 1.120 | Gradient norm = 7.619 | Param update norm = 0.016 | Runtime in seconds = 0.279
INFO - Validation: Loss = 7.337 | Mae e = 0.357 | Mae f = 0.381 | Rmse e = 0.455 | Rmse f = 0.542
INFO - Saving checkpoint at epoch 62...
INFO - Best model: Loss = 7.337 | Best epoch = 62


here??


INFO - ------------ Epoch 63 ------------
INFO - Training:   Loss = 1.108 | Gradient norm = 7.252 | Param update norm = 0.017 | Runtime in seconds = 0.270
INFO - Validation: Loss = 7.282 | Mae e = 0.326 | Mae f = 0.379 | Rmse e = 0.424 | Rmse f = 0.540
INFO - Saving checkpoint at epoch 63...
INFO - Best model: Loss = 7.282 | Best epoch = 63


here??


INFO - ------------ Epoch 64 ------------
INFO - Training:   Loss = 1.085 | Gradient norm = 7.508 | Param update norm = 0.015 | Runtime in seconds = 0.268
INFO - Validation: Loss = 7.216 | Mae e = 0.320 | Mae f = 0.377 | Rmse e = 0.417 | Rmse f = 0.537
INFO - Saving checkpoint at epoch 64...
INFO - Best model: Loss = 7.216 | Best epoch = 64


here??


INFO - ------------ Epoch 65 ------------
INFO - Training:   Loss = 1.066 | Gradient norm = 6.033 | Param update norm = 0.014 | Runtime in seconds = 0.255
INFO - Validation: Loss = 7.165 | Mae e = 0.300 | Mae f = 0.375 | Rmse e = 0.397 | Rmse f = 0.535
INFO - Saving checkpoint at epoch 65...
INFO - Best model: Loss = 7.165 | Best epoch = 65


here??


INFO - ------------ Epoch 66 ------------
INFO - Training:   Loss = 1.064 | Gradient norm = 8.352 | Param update norm = 0.016 | Runtime in seconds = 0.427
INFO - Validation: Loss = 7.112 | Mae e = 0.283 | Mae f = 0.373 | Rmse e = 0.378 | Rmse f = 0.533
INFO - Saving checkpoint at epoch 66...
INFO - Best model: Loss = 7.112 | Best epoch = 66


here??


INFO - ------------ Epoch 67 ------------
INFO - Training:   Loss = 1.045 | Gradient norm = 7.516 | Param update norm = 0.015 | Runtime in seconds = 0.260
INFO - Validation: Loss = 7.046 | Mae e = 0.279 | Mae f = 0.371 | Rmse e = 0.373 | Rmse f = 0.531
INFO - Saving checkpoint at epoch 67...
INFO - Best model: Loss = 7.046 | Best epoch = 67


here??


INFO - ------------ Epoch 68 ------------
INFO - Training:   Loss = 1.020 | Gradient norm = 5.810 | Param update norm = 0.013 | Runtime in seconds = 0.428
INFO - Validation: Loss = 6.999 | Mae e = 0.265 | Mae f = 0.369 | Rmse e = 0.357 | Rmse f = 0.529
INFO - Saving checkpoint at epoch 68...
INFO - Best model: Loss = 6.999 | Best epoch = 68


here??


INFO - ------------ Epoch 69 ------------
INFO - Training:   Loss = 1.010 | Gradient norm = 6.961 | Param update norm = 0.014 | Runtime in seconds = 1.237
INFO - Validation: Loss = 6.936 | Mae e = 0.264 | Mae f = 0.367 | Rmse e = 0.355 | Rmse f = 0.527
INFO - Saving checkpoint at epoch 69...
INFO - Best model: Loss = 6.936 | Best epoch = 69


here??


INFO - ------------ Epoch 70 ------------
INFO - Training:   Loss = 1.017 | Gradient norm = 8.988 | Param update norm = 0.017 | Runtime in seconds = 0.264
INFO - Validation: Loss = 6.889 | Mae e = 0.259 | Mae f = 0.365 | Rmse e = 0.349 | Rmse f = 0.525
INFO - Saving checkpoint at epoch 70...
INFO - Best model: Loss = 6.889 | Best epoch = 70


here??


INFO - ------------ Epoch 71 ------------
INFO - Training:   Loss = 0.997 | Gradient norm = 7.650 | Param update norm = 0.016 | Runtime in seconds = 0.264
INFO - Validation: Loss = 6.835 | Mae e = 0.262 | Mae f = 0.364 | Rmse e = 0.352 | Rmse f = 0.523
INFO - Saving checkpoint at epoch 71...
INFO - Best model: Loss = 6.835 | Best epoch = 71


here??


INFO - ------------ Epoch 72 ------------
INFO - Training:   Loss = 0.973 | Gradient norm = 6.215 | Param update norm = 0.013 | Runtime in seconds = 0.328
INFO - Validation: Loss = 6.780 | Mae e = 0.259 | Mae f = 0.362 | Rmse e = 0.347 | Rmse f = 0.521
INFO - Saving checkpoint at epoch 72...
INFO - Best model: Loss = 6.780 | Best epoch = 72


here??


INFO - ------------ Epoch 73 ------------
INFO - Training:   Loss = 0.971 | Gradient norm = 7.476 | Param update norm = 0.016 | Runtime in seconds = 0.315
INFO - Validation: Loss = 6.726 | Mae e = 0.256 | Mae f = 0.360 | Rmse e = 0.344 | Rmse f = 0.519
INFO - Saving checkpoint at epoch 73...
INFO - Best model: Loss = 6.726 | Best epoch = 73


here??


INFO - ------------ Epoch 74 ------------
INFO - Training:   Loss = 0.953 | Gradient norm = 6.995 | Param update norm = 0.014 | Runtime in seconds = 0.315
INFO - Validation: Loss = 6.680 | Mae e = 0.251 | Mae f = 0.358 | Rmse e = 0.337 | Rmse f = 0.517
INFO - Saving checkpoint at epoch 74...
INFO - Best model: Loss = 6.680 | Best epoch = 74


here??


INFO - ------------ Epoch 75 ------------
INFO - Training:   Loss = 0.939 | Gradient norm = 6.286 | Param update norm = 0.014 | Runtime in seconds = 0.317
INFO - Validation: Loss = 6.627 | Mae e = 0.249 | Mae f = 0.357 | Rmse e = 0.333 | Rmse f = 0.515
INFO - Saving checkpoint at epoch 75...
INFO - Best model: Loss = 6.627 | Best epoch = 75


here??


INFO - ------------ Epoch 76 ------------
INFO - Training:   Loss = 0.939 | Gradient norm = 7.318 | Param update norm = 0.016 | Runtime in seconds = 0.455
INFO - Validation: Loss = 6.584 | Mae e = 0.247 | Mae f = 0.355 | Rmse e = 0.330 | Rmse f = 0.513
INFO - Saving checkpoint at epoch 76...
INFO - Best model: Loss = 6.584 | Best epoch = 76


here??


INFO - ------------ Epoch 77 ------------
INFO - Training:   Loss = 0.922 | Gradient norm = 7.170 | Param update norm = 0.014 | Runtime in seconds = 0.327
INFO - Validation: Loss = 6.530 | Mae e = 0.245 | Mae f = 0.353 | Rmse e = 0.328 | Rmse f = 0.511
INFO - Saving checkpoint at epoch 77...
INFO - Best model: Loss = 6.530 | Best epoch = 77


here??


INFO - ------------ Epoch 78 ------------
INFO - Training:   Loss = 0.904 | Gradient norm = 5.270 | Param update norm = 0.013 | Runtime in seconds = 0.258
INFO - Validation: Loss = 6.477 | Mae e = 0.245 | Mae f = 0.352 | Rmse e = 0.327 | Rmse f = 0.509
INFO - Saving checkpoint at epoch 78...
INFO - Best model: Loss = 6.477 | Best epoch = 78


here??


INFO - ------------ Epoch 79 ------------
INFO - Training:   Loss = 0.903 | Gradient norm = 6.911 | Param update norm = 0.015 | Runtime in seconds = 0.269
INFO - Validation: Loss = 6.430 | Mae e = 0.246 | Mae f = 0.350 | Rmse e = 0.329 | Rmse f = 0.507
INFO - Saving checkpoint at epoch 79...
INFO - Best model: Loss = 6.430 | Best epoch = 79


here??


INFO - ------------ Epoch 80 ------------
INFO - Training:   Loss = 0.899 | Gradient norm = 6.731 | Param update norm = 0.016 | Runtime in seconds = 0.432
INFO - Validation: Loss = 6.382 | Mae e = 0.244 | Mae f = 0.348 | Rmse e = 0.326 | Rmse f = 0.505
INFO - Saving checkpoint at epoch 80...
INFO - Best model: Loss = 6.382 | Best epoch = 80


here??


INFO - ------------ Epoch 81 ------------
INFO - Training:   Loss = 0.882 | Gradient norm = 5.890 | Param update norm = 0.014 | Runtime in seconds = 0.261
INFO - Validation: Loss = 6.334 | Mae e = 0.247 | Mae f = 0.347 | Rmse e = 0.330 | Rmse f = 0.503
INFO - Saving checkpoint at epoch 81...
INFO - Best model: Loss = 6.334 | Best epoch = 81


here??


INFO - ------------ Epoch 82 ------------
INFO - Training:   Loss = 0.872 | Gradient norm = 6.644 | Param update norm = 0.014 | Runtime in seconds = 0.269
INFO - Validation: Loss = 6.284 | Mae e = 0.244 | Mae f = 0.345 | Rmse e = 0.326 | Rmse f = 0.501
INFO - Saving checkpoint at epoch 82...
INFO - Best model: Loss = 6.284 | Best epoch = 82


here??


INFO - ------------ Epoch 83 ------------
INFO - Training:   Loss = 0.869 | Gradient norm = 6.878 | Param update norm = 0.015 | Runtime in seconds = 0.262
INFO - Validation: Loss = 6.240 | Mae e = 0.248 | Mae f = 0.344 | Rmse e = 0.331 | Rmse f = 0.500
INFO - Saving checkpoint at epoch 83...
INFO - Best model: Loss = 6.240 | Best epoch = 83


here??


INFO - ------------ Epoch 84 ------------
INFO - Training:   Loss = 0.851 | Gradient norm = 5.766 | Param update norm = 0.013 | Runtime in seconds = 0.263
INFO - Validation: Loss = 6.194 | Mae e = 0.250 | Mae f = 0.342 | Rmse e = 0.333 | Rmse f = 0.498
INFO - Saving checkpoint at epoch 84...
INFO - Best model: Loss = 6.194 | Best epoch = 84


here??


INFO - ------------ Epoch 85 ------------
INFO - Training:   Loss = 0.854 | Gradient norm = 7.125 | Param update norm = 0.016 | Runtime in seconds = 0.424
INFO - Validation: Loss = 6.141 | Mae e = 0.261 | Mae f = 0.341 | Rmse e = 0.343 | Rmse f = 0.496
INFO - Saving checkpoint at epoch 85...
INFO - Best model: Loss = 6.141 | Best epoch = 85


here??


INFO - ------------ Epoch 86 ------------
INFO - Training:   Loss = 0.832 | Gradient norm = 5.463 | Param update norm = 0.013 | Runtime in seconds = 0.261
INFO - Validation: Loss = 6.089 | Mae e = 0.266 | Mae f = 0.339 | Rmse e = 0.347 | Rmse f = 0.494
INFO - Saving checkpoint at epoch 86...
INFO - Best model: Loss = 6.089 | Best epoch = 86


here??


INFO - ------------ Epoch 87 ------------
INFO - Training:   Loss = 0.829 | Gradient norm = 6.478 | Param update norm = 0.014 | Runtime in seconds = 0.276
INFO - Validation: Loss = 6.046 | Mae e = 0.257 | Mae f = 0.337 | Rmse e = 0.337 | Rmse f = 0.492
INFO - Saving checkpoint at epoch 87...
INFO - Best model: Loss = 6.046 | Best epoch = 87


here??


INFO - ------------ Epoch 88 ------------
INFO - Training:   Loss = 0.823 | Gradient norm = 7.326 | Param update norm = 0.014 | Runtime in seconds = 0.278
INFO - Validation: Loss = 6.001 | Mae e = 0.262 | Mae f = 0.336 | Rmse e = 0.342 | Rmse f = 0.490
INFO - Saving checkpoint at epoch 88...
INFO - Best model: Loss = 6.001 | Best epoch = 88


here??


INFO - ------------ Epoch 89 ------------
INFO - Training:   Loss = 0.814 | Gradient norm = 6.449 | Param update norm = 0.014 | Runtime in seconds = 0.274
INFO - Validation: Loss = 5.957 | Mae e = 0.270 | Mae f = 0.335 | Rmse e = 0.349 | Rmse f = 0.488
INFO - Saving checkpoint at epoch 89...
INFO - Best model: Loss = 5.957 | Best epoch = 89


here??


INFO - ------------ Epoch 90 ------------
INFO - Training:   Loss = 0.804 | Gradient norm = 5.353 | Param update norm = 0.014 | Runtime in seconds = 0.432
INFO - Validation: Loss = 5.919 | Mae e = 0.269 | Mae f = 0.333 | Rmse e = 0.347 | Rmse f = 0.487
INFO - Saving checkpoint at epoch 90...
INFO - Best model: Loss = 5.919 | Best epoch = 90


here??


INFO - ------------ Epoch 91 ------------
INFO - Training:   Loss = 0.797 | Gradient norm = 6.402 | Param update norm = 0.014 | Runtime in seconds = 0.259
INFO - Validation: Loss = 5.874 | Mae e = 0.295 | Mae f = 0.332 | Rmse e = 0.370 | Rmse f = 0.485
INFO - Saving checkpoint at epoch 91...
INFO - Best model: Loss = 5.874 | Best epoch = 91


here??


INFO - ------------ Epoch 92 ------------
INFO - Training:   Loss = 0.792 | Gradient norm = 7.774 | Param update norm = 0.014 | Runtime in seconds = 0.427
INFO - Validation: Loss = 5.828 | Mae e = 0.305 | Mae f = 0.330 | Rmse e = 0.379 | Rmse f = 0.483
INFO - Saving checkpoint at epoch 92...
INFO - Best model: Loss = 5.828 | Best epoch = 92


here??


INFO - ------------ Epoch 93 ------------
INFO - Training:   Loss = 0.775 | Gradient norm = 5.973 | Param update norm = 0.011 | Runtime in seconds = 0.261
INFO - Validation: Loss = 5.790 | Mae e = 0.299 | Mae f = 0.329 | Rmse e = 0.372 | Rmse f = 0.481
INFO - Saving checkpoint at epoch 93...
INFO - Best model: Loss = 5.790 | Best epoch = 93


here??


INFO - ------------ Epoch 94 ------------
INFO - Training:   Loss = 0.773 | Gradient norm = 6.047 | Param update norm = 0.013 | Runtime in seconds = 0.268
INFO - Validation: Loss = 5.747 | Mae e = 0.307 | Mae f = 0.328 | Rmse e = 0.379 | Rmse f = 0.479
INFO - Saving checkpoint at epoch 94...
INFO - Best model: Loss = 5.747 | Best epoch = 94


here??


INFO - ------------ Epoch 95 ------------
INFO - Training:   Loss = 0.773 | Gradient norm = 7.432 | Param update norm = 0.015 | Runtime in seconds = 0.266
INFO - Validation: Loss = 5.704 | Mae e = 0.324 | Mae f = 0.326 | Rmse e = 0.394 | Rmse f = 0.478
INFO - Saving checkpoint at epoch 95...
INFO - Best model: Loss = 5.704 | Best epoch = 95


here??


INFO - ------------ Epoch 96 ------------
INFO - Training:   Loss = 0.770 | Gradient norm = 7.053 | Param update norm = 0.016 | Runtime in seconds = 0.265
INFO - Validation: Loss = 5.656 | Mae e = 0.349 | Mae f = 0.325 | Rmse e = 0.414 | Rmse f = 0.476
INFO - Saving checkpoint at epoch 96...
INFO - Best model: Loss = 5.656 | Best epoch = 96


here??


INFO - ------------ Epoch 97 ------------
INFO - Training:   Loss = 0.771 | Gradient norm = 8.488 | Param update norm = 0.016 | Runtime in seconds = 0.443
INFO - Validation: Loss = 5.625 | Mae e = 0.342 | Mae f = 0.324 | Rmse e = 0.408 | Rmse f = 0.474
INFO - Saving checkpoint at epoch 97...
INFO - Best model: Loss = 5.625 | Best epoch = 97


here??


INFO - ------------ Epoch 98 ------------
INFO - Training:   Loss = 0.754 | Gradient norm = 6.439 | Param update norm = 0.015 | Runtime in seconds = 0.265
INFO - Validation: Loss = 5.585 | Mae e = 0.363 | Mae f = 0.323 | Rmse e = 0.425 | Rmse f = 0.473
INFO - Saving checkpoint at epoch 98...
INFO - Best model: Loss = 5.585 | Best epoch = 98


here??


INFO - ------------ Epoch 99 ------------
INFO - Training:   Loss = 0.745 | Gradient norm = 7.150 | Param update norm = 0.015 | Runtime in seconds = 0.435
INFO - Validation: Loss = 5.542 | Mae e = 0.366 | Mae f = 0.321 | Rmse e = 0.426 | Rmse f = 0.471
INFO - Saving checkpoint at epoch 99...
INFO - Best model: Loss = 5.542 | Best epoch = 99


here??


INFO - ------------ Epoch 100 ------------
INFO - Training:   Loss = 0.733 | Gradient norm = 6.229 | Param update norm = 0.013 | Runtime in seconds = 0.264
INFO - Validation: Loss = 5.508 | Mae e = 0.365 | Mae f = 0.320 | Rmse e = 0.424 | Rmse f = 0.469
INFO - Saving checkpoint at epoch 100...
INFO - Best model: Loss = 5.508 | Best epoch = 100
INFO - Training loop completed.


here??


0,1
loss,██▇▄▃▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
mae_e,▂▄▆▇▇███████▇▇▇▅▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
mae_f,█▆▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
mse_e,▇▇██████▇▇▆▆▅▅▄▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
mse_f,█▇▇▆▅▃▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
loss,5.50827
mae_e,0.36479
mae_f,0.32022
mse_e,0.18006
mse_f,0.22032


[34m[1mwandb[0m: Agent Starting Run: ahwoixtb with config:
[34m[1mwandb[0m: 	energy_weight: 500
[34m[1mwandb[0m: 	final_learning_rate: 0.0008861566590326184
[34m[1mwandb[0m: 	init_learning_rate: 0.003296112320353406
[34m[1mwandb[0m: 	num_layers: 6
[34m[1mwandb[0m: 	peak_learning_rate: 0.004651399023766935
[34m[1mwandb[0m: 	weight_decay: 4.767184712332288e-05


INFO - Cleaning up existing temporary directories at /content/training/model_training/model.
INFO - Number of parameters: 90282
INFO - Number of parameters in optimizer: 361134
INFO - Starting training loop...
INFO - Validation: Loss = 86.909 | Mae e = 2.979 | Mae f = 1.388 | Rmse e = 3.046 | Rmse f = 1.864
INFO - Best model: Loss = 86.909 | Best epoch = 0


here??


ERROR - Task was destroyed but it is pending!
task: <Task cancelling name='Task-86793' coro=<Event.wait() running at /usr/lib/python3.11/asyncio/locks.py:213> wait_for=<Future cancelled>>
INFO - ------------ Epoch 1 ------------
INFO - Training:   Loss = 21.719 | Gradient norm = 1.101 | Param update norm = 0.763 | Runtime in seconds = 73.070
INFO - Validation: Loss = 85.916 | Mae e = 3.204 | Mae f = 1.379 | Rmse e = 3.265 | Rmse f = 1.854
INFO - Saving checkpoint at epoch 1...
INFO - Best model: Loss = 85.916 | Best epoch = 1


here??


INFO - ------------ Epoch 2 ------------
INFO - Training:   Loss = 9.029 | Gradient norm = 5.907 | Param update norm = 0.551 | Runtime in seconds = 2.120
INFO - Validation: Loss = 73.903 | Mae e = 4.830 | Mae f = 1.274 | Rmse e = 4.867 | Rmse f = 1.719
INFO - Saving checkpoint at epoch 2...
INFO - Best model: Loss = 73.903 | Best epoch = 2


here??


INFO - ------------ Epoch 3 ------------
INFO - Training:   Loss = 4.565 | Gradient norm = 5.484 | Param update norm = 0.402 | Runtime in seconds = 2.214
INFO - Validation: Loss = 56.311 | Mae e = 6.812 | Mae f = 1.106 | Rmse e = 6.834 | Rmse f = 1.500
INFO - Saving checkpoint at epoch 3...
INFO - Best model: Loss = 56.311 | Best epoch = 3


here??


INFO - ------------ Epoch 4 ------------
INFO - Training:   Loss = 2.835 | Gradient norm = 6.681 | Param update norm = 0.331 | Runtime in seconds = 2.214
INFO - Validation: Loss = 44.543 | Mae e = 8.159 | Mae f = 0.978 | Rmse e = 8.175 | Rmse f = 1.333
INFO - Saving checkpoint at epoch 4...
INFO - Best model: Loss = 44.543 | Best epoch = 4


here??


INFO - ------------ Epoch 5 ------------
INFO - Training:   Loss = 1.763 | Gradient norm = 4.910 | Param update norm = 0.328 | Runtime in seconds = 2.261
INFO - Validation: Loss = 37.484 | Mae e = 8.844 | Mae f = 0.893 | Rmse e = 8.856 | Rmse f = 1.223
INFO - Saving checkpoint at epoch 5...
INFO - Best model: Loss = 37.484 | Best epoch = 5


here??


INFO - ------------ Epoch 6 ------------
INFO - Training:   Loss = 1.258 | Gradient norm = 5.725 | Param update norm = 0.248 | Runtime in seconds = 2.273
INFO - Validation: Loss = 32.714 | Mae e = 8.871 | Mae f = 0.831 | Rmse e = 8.882 | Rmse f = 1.142
INFO - Saving checkpoint at epoch 6...
INFO - Best model: Loss = 32.714 | Best epoch = 6


here??


INFO - ------------ Epoch 7 ------------
INFO - Training:   Loss = 1.017 | Gradient norm = 6.694 | Param update norm = 0.210 | Runtime in seconds = 2.298
INFO - Validation: Loss = 28.646 | Mae e = 8.556 | Mae f = 0.774 | Rmse e = 8.566 | Rmse f = 1.069
INFO - Saving checkpoint at epoch 7...
INFO - Best model: Loss = 28.646 | Best epoch = 7


here??


INFO - ------------ Epoch 8 ------------
INFO - Training:   Loss = 0.812 | Gradient norm = 5.339 | Param update norm = 0.193 | Runtime in seconds = 2.110
INFO - Validation: Loss = 25.089 | Mae e = 8.029 | Mae f = 0.720 | Rmse e = 8.038 | Rmse f = 1.000
INFO - Saving checkpoint at epoch 8...
INFO - Best model: Loss = 25.089 | Best epoch = 8


here??


INFO - ------------ Epoch 9 ------------
INFO - Training:   Loss = 0.706 | Gradient norm = 4.399 | Param update norm = 0.188 | Runtime in seconds = 2.974
INFO - Validation: Loss = 22.000 | Mae e = 7.415 | Mae f = 0.670 | Rmse e = 7.424 | Rmse f = 0.936
INFO - Saving checkpoint at epoch 9...
INFO - Best model: Loss = 22.000 | Best epoch = 9


here??


INFO - ------------ Epoch 10 ------------
INFO - Training:   Loss = 0.608 | Gradient norm = 4.710 | Param update norm = 0.168 | Runtime in seconds = 2.098
INFO - Validation: Loss = 19.320 | Mae e = 6.696 | Mae f = 0.624 | Rmse e = 6.705 | Rmse f = 0.878
INFO - Saving checkpoint at epoch 10...
INFO - Best model: Loss = 19.320 | Best epoch = 10


here??


INFO - ------------ Epoch 11 ------------
INFO - Training:   Loss = 0.544 | Gradient norm = 4.110 | Param update norm = 0.159 | Runtime in seconds = 2.174
INFO - Validation: Loss = 16.900 | Mae e = 6.001 | Mae f = 0.579 | Rmse e = 6.010 | Rmse f = 0.821
INFO - Saving checkpoint at epoch 11...
INFO - Best model: Loss = 16.900 | Best epoch = 11


here??


INFO - ------------ Epoch 12 ------------
INFO - Training:   Loss = 0.498 | Gradient norm = 4.002 | Param update norm = 0.147 | Runtime in seconds = 1.981
INFO - Validation: Loss = 14.797 | Mae e = 5.355 | Mae f = 0.538 | Rmse e = 5.364 | Rmse f = 0.768
INFO - Saving checkpoint at epoch 12...
INFO - Best model: Loss = 14.797 | Best epoch = 12


here??


INFO - ------------ Epoch 13 ------------
INFO - Training:   Loss = 0.448 | Gradient norm = 3.586 | Param update norm = 0.132 | Runtime in seconds = 2.193
INFO - Validation: Loss = 13.049 | Mae e = 4.743 | Mae f = 0.502 | Rmse e = 4.752 | Rmse f = 0.722
INFO - Saving checkpoint at epoch 13...
INFO - Best model: Loss = 13.049 | Best epoch = 13


here??


INFO - ------------ Epoch 14 ------------
INFO - Training:   Loss = 0.410 | Gradient norm = 3.341 | Param update norm = 0.126 | Runtime in seconds = 2.214
INFO - Validation: Loss = 11.611 | Mae e = 4.181 | Mae f = 0.471 | Rmse e = 4.190 | Rmse f = 0.681
INFO - Saving checkpoint at epoch 14...
INFO - Best model: Loss = 11.611 | Best epoch = 14


here??


INFO - ------------ Epoch 15 ------------
INFO - Training:   Loss = 0.375 | Gradient norm = 3.018 | Param update norm = 0.119 | Runtime in seconds = 2.181
INFO - Validation: Loss = 10.405 | Mae e = 3.705 | Mae f = 0.443 | Rmse e = 3.715 | Rmse f = 0.645
INFO - Saving checkpoint at epoch 15...
INFO - Best model: Loss = 10.405 | Best epoch = 15


here??


INFO - ------------ Epoch 16 ------------
INFO - Training:   Loss = 0.355 | Gradient norm = 3.555 | Param update norm = 0.118 | Runtime in seconds = 2.244
INFO - Validation: Loss = 9.400 | Mae e = 3.311 | Mae f = 0.419 | Rmse e = 3.322 | Rmse f = 0.613
INFO - Saving checkpoint at epoch 16...
INFO - Best model: Loss = 9.400 | Best epoch = 16


here??


INFO - ------------ Epoch 17 ------------
INFO - Training:   Loss = 0.339 | Gradient norm = 3.849 | Param update norm = 0.117 | Runtime in seconds = 2.052
INFO - Validation: Loss = 8.563 | Mae e = 2.979 | Mae f = 0.398 | Rmse e = 2.989 | Rmse f = 0.585
INFO - Saving checkpoint at epoch 17...
INFO - Best model: Loss = 8.563 | Best epoch = 17


here??


INFO - ------------ Epoch 18 ------------
INFO - Training:   Loss = 0.321 | Gradient norm = 3.663 | Param update norm = 0.117 | Runtime in seconds = 2.051
INFO - Validation: Loss = 7.855 | Mae e = 2.708 | Mae f = 0.379 | Rmse e = 2.719 | Rmse f = 0.560
INFO - Saving checkpoint at epoch 18...
INFO - Best model: Loss = 7.855 | Best epoch = 18


here??


INFO - ------------ Epoch 19 ------------
INFO - Training:   Loss = 0.291 | Gradient norm = 3.222 | Param update norm = 0.102 | Runtime in seconds = 2.082
INFO - Validation: Loss = 7.296 | Mae e = 2.451 | Mae f = 0.364 | Rmse e = 2.462 | Rmse f = 0.540
INFO - Saving checkpoint at epoch 19...
INFO - Best model: Loss = 7.296 | Best epoch = 19


here??


INFO - ------------ Epoch 20 ------------
INFO - Training:   Loss = 0.289 | Gradient norm = 3.481 | Param update norm = 0.115 | Runtime in seconds = 2.220
INFO - Validation: Loss = 6.800 | Mae e = 2.241 | Mae f = 0.350 | Rmse e = 2.253 | Rmse f = 0.521
INFO - Saving checkpoint at epoch 20...
INFO - Best model: Loss = 6.800 | Best epoch = 20


here??


INFO - ------------ Epoch 21 ------------
INFO - Training:   Loss = 0.267 | Gradient norm = 3.173 | Param update norm = 0.098 | Runtime in seconds = 2.218
INFO - Validation: Loss = 6.413 | Mae e = 2.043 | Mae f = 0.339 | Rmse e = 2.056 | Rmse f = 0.506
INFO - Saving checkpoint at epoch 21...
INFO - Best model: Loss = 6.413 | Best epoch = 21


here??


INFO - ------------ Epoch 22 ------------
INFO - Training:   Loss = 0.247 | Gradient norm = 2.591 | Param update norm = 0.093 | Runtime in seconds = 2.131
INFO - Validation: Loss = 6.074 | Mae e = 1.862 | Mae f = 0.329 | Rmse e = 1.876 | Rmse f = 0.493
INFO - Saving checkpoint at epoch 22...
INFO - Best model: Loss = 6.074 | Best epoch = 22


here??


INFO - ------------ Epoch 23 ------------
INFO - Training:   Loss = 0.232 | Gradient norm = 2.446 | Param update norm = 0.086 | Runtime in seconds = 2.224
INFO - Validation: Loss = 5.786 | Mae e = 1.715 | Mae f = 0.320 | Rmse e = 1.729 | Rmse f = 0.481
INFO - Saving checkpoint at epoch 23...
INFO - Best model: Loss = 5.786 | Best epoch = 23


here??


INFO - ------------ Epoch 24 ------------
INFO - Training:   Loss = 0.232 | Gradient norm = 3.116 | Param update norm = 0.093 | Runtime in seconds = 2.333
INFO - Validation: Loss = 5.525 | Mae e = 1.597 | Mae f = 0.312 | Rmse e = 1.611 | Rmse f = 0.470
INFO - Saving checkpoint at epoch 24...
INFO - Best model: Loss = 5.525 | Best epoch = 24


here??


INFO - ------------ Epoch 25 ------------
INFO - Training:   Loss = 0.217 | Gradient norm = 2.412 | Param update norm = 0.091 | Runtime in seconds = 2.277
INFO - Validation: Loss = 5.305 | Mae e = 1.461 | Mae f = 0.305 | Rmse e = 1.476 | Rmse f = 0.461
INFO - Saving checkpoint at epoch 25...
INFO - Best model: Loss = 5.305 | Best epoch = 25


here??


INFO - ------------ Epoch 26 ------------
INFO - Training:   Loss = 0.207 | Gradient norm = 2.503 | Param update norm = 0.086 | Runtime in seconds = 2.346
INFO - Validation: Loss = 5.121 | Mae e = 1.346 | Mae f = 0.299 | Rmse e = 1.362 | Rmse f = 0.452
INFO - Saving checkpoint at epoch 26...
INFO - Best model: Loss = 5.121 | Best epoch = 26


here??


INFO - ------------ Epoch 27 ------------
INFO - Training:   Loss = 0.196 | Gradient norm = 2.078 | Param update norm = 0.078 | Runtime in seconds = 2.197
INFO - Validation: Loss = 4.936 | Mae e = 1.252 | Mae f = 0.293 | Rmse e = 1.269 | Rmse f = 0.444
INFO - Saving checkpoint at epoch 27...
INFO - Best model: Loss = 4.936 | Best epoch = 27


here??


INFO - ------------ Epoch 28 ------------
INFO - Training:   Loss = 0.201 | Gradient norm = 2.949 | Param update norm = 0.088 | Runtime in seconds = 2.357
INFO - Validation: Loss = 4.794 | Mae e = 1.165 | Mae f = 0.288 | Rmse e = 1.183 | Rmse f = 0.438
INFO - Saving checkpoint at epoch 28...
INFO - Best model: Loss = 4.794 | Best epoch = 28


here??


INFO - ------------ Epoch 29 ------------
INFO - Training:   Loss = 0.183 | Gradient norm = 2.083 | Param update norm = 0.077 | Runtime in seconds = 2.233
INFO - Validation: Loss = 4.631 | Mae e = 1.104 | Mae f = 0.282 | Rmse e = 1.122 | Rmse f = 0.430
INFO - Saving checkpoint at epoch 29...
INFO - Best model: Loss = 4.631 | Best epoch = 29


here??


INFO - ------------ Epoch 30 ------------
INFO - Training:   Loss = 0.176 | Gradient norm = 2.096 | Param update norm = 0.076 | Runtime in seconds = 2.365
INFO - Validation: Loss = 4.492 | Mae e = 1.033 | Mae f = 0.277 | Rmse e = 1.052 | Rmse f = 0.424
INFO - Saving checkpoint at epoch 30...
INFO - Best model: Loss = 4.492 | Best epoch = 30


here??


INFO - ------------ Epoch 31 ------------
INFO - Training:   Loss = 0.171 | Gradient norm = 1.832 | Param update norm = 0.079 | Runtime in seconds = 2.295
INFO - Validation: Loss = 4.360 | Mae e = 0.977 | Mae f = 0.273 | Rmse e = 0.996 | Rmse f = 0.418
INFO - Saving checkpoint at epoch 31...
INFO - Best model: Loss = 4.360 | Best epoch = 31


here??


INFO - ------------ Epoch 32 ------------
INFO - Training:   Loss = 0.163 | Gradient norm = 1.794 | Param update norm = 0.073 | Runtime in seconds = 2.233
INFO - Validation: Loss = 4.256 | Mae e = 0.922 | Mae f = 0.269 | Rmse e = 0.942 | Rmse f = 0.413
INFO - Saving checkpoint at epoch 32...
INFO - Best model: Loss = 4.256 | Best epoch = 32


here??


INFO - ------------ Epoch 33 ------------
INFO - Training:   Loss = 0.170 | Gradient norm = 2.649 | Param update norm = 0.089 | Runtime in seconds = 2.209
INFO - Validation: Loss = 4.140 | Mae e = 0.885 | Mae f = 0.265 | Rmse e = 0.905 | Rmse f = 0.407
INFO - Saving checkpoint at epoch 33...
INFO - Best model: Loss = 4.140 | Best epoch = 33


here??


INFO - ------------ Epoch 34 ------------
INFO - Training:   Loss = 0.158 | Gradient norm = 1.932 | Param update norm = 0.078 | Runtime in seconds = 2.313
INFO - Validation: Loss = 4.051 | Mae e = 0.825 | Mae f = 0.262 | Rmse e = 0.846 | Rmse f = 0.403
INFO - Saving checkpoint at epoch 34...
INFO - Best model: Loss = 4.051 | Best epoch = 34


here??


INFO - ------------ Epoch 35 ------------
INFO - Training:   Loss = 0.158 | Gradient norm = 2.246 | Param update norm = 0.082 | Runtime in seconds = 2.265
INFO - Validation: Loss = 3.948 | Mae e = 0.799 | Mae f = 0.258 | Rmse e = 0.820 | Rmse f = 0.397
INFO - Saving checkpoint at epoch 35...
INFO - Best model: Loss = 3.948 | Best epoch = 35


here??


INFO - ------------ Epoch 36 ------------
INFO - Training:   Loss = 0.147 | Gradient norm = 1.842 | Param update norm = 0.073 | Runtime in seconds = 2.227
INFO - Validation: Loss = 3.861 | Mae e = 0.758 | Mae f = 0.255 | Rmse e = 0.778 | Rmse f = 0.393
INFO - Saving checkpoint at epoch 36...
INFO - Best model: Loss = 3.861 | Best epoch = 36


here??


INFO - ------------ Epoch 37 ------------
INFO - Training:   Loss = 0.146 | Gradient norm = 2.325 | Param update norm = 0.076 | Runtime in seconds = 2.130
INFO - Validation: Loss = 3.757 | Mae e = 0.731 | Mae f = 0.251 | Rmse e = 0.751 | Rmse f = 0.388
INFO - Saving checkpoint at epoch 37...
INFO - Best model: Loss = 3.757 | Best epoch = 37


here??


INFO - ------------ Epoch 38 ------------
INFO - Training:   Loss = 0.139 | Gradient norm = 1.864 | Param update norm = 0.069 | Runtime in seconds = 2.241
INFO - Validation: Loss = 3.677 | Mae e = 0.687 | Mae f = 0.248 | Rmse e = 0.708 | Rmse f = 0.383
INFO - Saving checkpoint at epoch 38...
INFO - Best model: Loss = 3.677 | Best epoch = 38


here??


INFO - ------------ Epoch 39 ------------
INFO - Training:   Loss = 0.142 | Gradient norm = 2.389 | Param update norm = 0.075 | Runtime in seconds = 2.248
INFO - Validation: Loss = 3.583 | Mae e = 0.667 | Mae f = 0.244 | Rmse e = 0.687 | Rmse f = 0.379
INFO - Saving checkpoint at epoch 39...
INFO - Best model: Loss = 3.583 | Best epoch = 39


here??


INFO - ------------ Epoch 40 ------------
INFO - Training:   Loss = 0.140 | Gradient norm = 2.272 | Param update norm = 0.080 | Runtime in seconds = 2.259
INFO - Validation: Loss = 3.510 | Mae e = 0.645 | Mae f = 0.241 | Rmse e = 0.665 | Rmse f = 0.375
INFO - Saving checkpoint at epoch 40...
INFO - Best model: Loss = 3.510 | Best epoch = 40


here??


INFO - ------------ Epoch 41 ------------
INFO - Training:   Loss = 0.134 | Gradient norm = 2.045 | Param update norm = 0.075 | Runtime in seconds = 2.204
INFO - Validation: Loss = 3.433 | Mae e = 0.625 | Mae f = 0.238 | Rmse e = 0.645 | Rmse f = 0.371
INFO - Saving checkpoint at epoch 41...
INFO - Best model: Loss = 3.433 | Best epoch = 41


here??


INFO - ------------ Epoch 42 ------------
INFO - Training:   Loss = 0.137 | Gradient norm = 2.549 | Param update norm = 0.078 | Runtime in seconds = 2.132
INFO - Validation: Loss = 3.364 | Mae e = 0.607 | Mae f = 0.236 | Rmse e = 0.627 | Rmse f = 0.367
INFO - Saving checkpoint at epoch 42...
INFO - Best model: Loss = 3.364 | Best epoch = 42


here??


INFO - ------------ Epoch 43 ------------
INFO - Training:   Loss = 0.127 | Gradient norm = 2.049 | Param update norm = 0.069 | Runtime in seconds = 2.269
INFO - Validation: Loss = 3.307 | Mae e = 0.586 | Mae f = 0.234 | Rmse e = 0.606 | Rmse f = 0.364
INFO - Saving checkpoint at epoch 43...
INFO - Best model: Loss = 3.307 | Best epoch = 43


here??


INFO - ------------ Epoch 44 ------------
INFO - Training:   Loss = 0.128 | Gradient norm = 2.331 | Param update norm = 0.073 | Runtime in seconds = 2.247
INFO - Validation: Loss = 3.256 | Mae e = 0.560 | Mae f = 0.232 | Rmse e = 0.580 | Rmse f = 0.361
INFO - Saving checkpoint at epoch 44...
INFO - Best model: Loss = 3.256 | Best epoch = 44


here??


INFO - ------------ Epoch 45 ------------
INFO - Training:   Loss = 0.121 | Gradient norm = 1.772 | Param update norm = 0.069 | Runtime in seconds = 2.247
INFO - Validation: Loss = 3.203 | Mae e = 0.540 | Mae f = 0.229 | Rmse e = 0.560 | Rmse f = 0.358
INFO - Saving checkpoint at epoch 45...
INFO - Best model: Loss = 3.203 | Best epoch = 45


here??


INFO - ------------ Epoch 46 ------------
INFO - Training:   Loss = 0.119 | Gradient norm = 1.969 | Param update norm = 0.067 | Runtime in seconds = 2.295
INFO - Validation: Loss = 3.161 | Mae e = 0.515 | Mae f = 0.228 | Rmse e = 0.535 | Rmse f = 0.356
INFO - Saving checkpoint at epoch 46...
INFO - Best model: Loss = 3.161 | Best epoch = 46


here??


INFO - ------------ Epoch 47 ------------
INFO - Training:   Loss = 0.120 | Gradient norm = 2.446 | Param update norm = 0.070 | Runtime in seconds = 2.367
INFO - Validation: Loss = 3.108 | Mae e = 0.508 | Mae f = 0.225 | Rmse e = 0.528 | Rmse f = 0.353
INFO - Saving checkpoint at epoch 47...
INFO - Best model: Loss = 3.108 | Best epoch = 47


here??


INFO - ------------ Epoch 48 ------------
INFO - Training:   Loss = 0.118 | Gradient norm = 2.256 | Param update norm = 0.069 | Runtime in seconds = 2.446
INFO - Validation: Loss = 3.056 | Mae e = 0.495 | Mae f = 0.223 | Rmse e = 0.515 | Rmse f = 0.350
INFO - Saving checkpoint at epoch 48...
INFO - Best model: Loss = 3.056 | Best epoch = 48


here??


INFO - ------------ Epoch 49 ------------
INFO - Training:   Loss = 0.114 | Gradient norm = 1.968 | Param update norm = 0.072 | Runtime in seconds = 2.076
INFO - Validation: Loss = 3.009 | Mae e = 0.482 | Mae f = 0.221 | Rmse e = 0.503 | Rmse f = 0.347
INFO - Saving checkpoint at epoch 49...
INFO - Best model: Loss = 3.009 | Best epoch = 49


here??


INFO - ------------ Epoch 50 ------------
INFO - Training:   Loss = 0.114 | Gradient norm = 1.882 | Param update norm = 0.068 | Runtime in seconds = 2.360
INFO - Validation: Loss = 2.964 | Mae e = 0.476 | Mae f = 0.220 | Rmse e = 0.496 | Rmse f = 0.344
INFO - Saving checkpoint at epoch 50...
INFO - Best model: Loss = 2.964 | Best epoch = 50


here??


INFO - ------------ Epoch 51 ------------
INFO - Training:   Loss = 0.113 | Gradient norm = 1.933 | Param update norm = 0.072 | Runtime in seconds = 2.217
INFO - Validation: Loss = 2.929 | Mae e = 0.465 | Mae f = 0.218 | Rmse e = 0.485 | Rmse f = 0.342
INFO - Saving checkpoint at epoch 51...
INFO - Best model: Loss = 2.929 | Best epoch = 51


here??


INFO - ------------ Epoch 52 ------------
INFO - Training:   Loss = 0.107 | Gradient norm = 1.898 | Param update norm = 0.064 | Runtime in seconds = 2.328
INFO - Validation: Loss = 2.895 | Mae e = 0.442 | Mae f = 0.217 | Rmse e = 0.463 | Rmse f = 0.340
INFO - Saving checkpoint at epoch 52...
INFO - Best model: Loss = 2.895 | Best epoch = 52


here??


INFO - ------------ Epoch 53 ------------
INFO - Training:   Loss = 0.108 | Gradient norm = 2.250 | Param update norm = 0.064 | Runtime in seconds = 2.300
INFO - Validation: Loss = 2.861 | Mae e = 0.427 | Mae f = 0.215 | Rmse e = 0.448 | Rmse f = 0.338
INFO - Saving checkpoint at epoch 53...
INFO - Best model: Loss = 2.861 | Best epoch = 53


here??


INFO - ------------ Epoch 54 ------------
INFO - Training:   Loss = 0.107 | Gradient norm = 2.032 | Param update norm = 0.067 | Runtime in seconds = 1.961
INFO - Validation: Loss = 2.817 | Mae e = 0.425 | Mae f = 0.213 | Rmse e = 0.446 | Rmse f = 0.336
INFO - Saving checkpoint at epoch 54...
INFO - Best model: Loss = 2.817 | Best epoch = 54


here??


INFO - ------------ Epoch 55 ------------
INFO - Training:   Loss = 0.107 | Gradient norm = 2.267 | Param update norm = 0.068 | Runtime in seconds = 2.102
INFO - Validation: Loss = 2.782 | Mae e = 0.410 | Mae f = 0.212 | Rmse e = 0.432 | Rmse f = 0.334
INFO - Saving checkpoint at epoch 55...
INFO - Best model: Loss = 2.782 | Best epoch = 55


here??


INFO - ------------ Epoch 56 ------------
INFO - Training:   Loss = 0.105 | Gradient norm = 2.167 | Param update norm = 0.067 | Runtime in seconds = 3.409
INFO - Validation: Loss = 2.750 | Mae e = 0.402 | Mae f = 0.211 | Rmse e = 0.424 | Rmse f = 0.332
INFO - Saving checkpoint at epoch 56...
INFO - Best model: Loss = 2.750 | Best epoch = 56


here??


INFO - ------------ Epoch 57 ------------
INFO - Training:   Loss = 0.108 | Gradient norm = 2.686 | Param update norm = 0.070 | Runtime in seconds = 2.134
INFO - Validation: Loss = 2.721 | Mae e = 0.401 | Mae f = 0.209 | Rmse e = 0.423 | Rmse f = 0.330
INFO - Saving checkpoint at epoch 57...
INFO - Best model: Loss = 2.721 | Best epoch = 57


here??


INFO - ------------ Epoch 58 ------------
INFO - Training:   Loss = 0.100 | Gradient norm = 1.908 | Param update norm = 0.064 | Runtime in seconds = 2.362
INFO - Validation: Loss = 2.692 | Mae e = 0.400 | Mae f = 0.208 | Rmse e = 0.422 | Rmse f = 0.328
INFO - Saving checkpoint at epoch 58...
INFO - Best model: Loss = 2.692 | Best epoch = 58


here??


INFO - ------------ Epoch 59 ------------
INFO - Training:   Loss = 0.101 | Gradient norm = 2.188 | Param update norm = 0.064 | Runtime in seconds = 2.410
INFO - Validation: Loss = 2.667 | Mae e = 0.379 | Mae f = 0.207 | Rmse e = 0.402 | Rmse f = 0.327
INFO - Saving checkpoint at epoch 59...
INFO - Best model: Loss = 2.667 | Best epoch = 59


here??


INFO - ------------ Epoch 60 ------------
INFO - Training:   Loss = 0.097 | Gradient norm = 1.928 | Param update norm = 0.060 | Runtime in seconds = 2.457
INFO - Validation: Loss = 2.638 | Mae e = 0.381 | Mae f = 0.206 | Rmse e = 0.404 | Rmse f = 0.325
INFO - Saving checkpoint at epoch 60...
INFO - Best model: Loss = 2.638 | Best epoch = 60


here??


INFO - ------------ Epoch 61 ------------
INFO - Training:   Loss = 0.094 | Gradient norm = 1.700 | Param update norm = 0.061 | Runtime in seconds = 2.554
INFO - Validation: Loss = 2.608 | Mae e = 0.369 | Mae f = 0.204 | Rmse e = 0.391 | Rmse f = 0.323
INFO - Saving checkpoint at epoch 61...
INFO - Best model: Loss = 2.608 | Best epoch = 61


here??


INFO - ------------ Epoch 62 ------------
INFO - Training:   Loss = 0.096 | Gradient norm = 1.981 | Param update norm = 0.063 | Runtime in seconds = 2.481
INFO - Validation: Loss = 2.578 | Mae e = 0.356 | Mae f = 0.203 | Rmse e = 0.379 | Rmse f = 0.321
INFO - Saving checkpoint at epoch 62...
INFO - Best model: Loss = 2.578 | Best epoch = 62


here??


INFO - ------------ Epoch 63 ------------
INFO - Training:   Loss = 0.106 | Gradient norm = 2.827 | Param update norm = 0.078 | Runtime in seconds = 2.653
INFO - Validation: Loss = 2.555 | Mae e = 0.365 | Mae f = 0.202 | Rmse e = 0.387 | Rmse f = 0.320
INFO - Saving checkpoint at epoch 63...
INFO - Best model: Loss = 2.555 | Best epoch = 63


here??


INFO - ------------ Epoch 64 ------------
INFO - Training:   Loss = 0.098 | Gradient norm = 2.142 | Param update norm = 0.068 | Runtime in seconds = 2.087
INFO - Validation: Loss = 2.534 | Mae e = 0.366 | Mae f = 0.201 | Rmse e = 0.388 | Rmse f = 0.318
INFO - Saving checkpoint at epoch 64...
INFO - Best model: Loss = 2.534 | Best epoch = 64


here??


INFO - ------------ Epoch 65 ------------
INFO - Training:   Loss = 0.093 | Gradient norm = 1.754 | Param update norm = 0.067 | Runtime in seconds = 2.195
INFO - Validation: Loss = 2.514 | Mae e = 0.352 | Mae f = 0.200 | Rmse e = 0.375 | Rmse f = 0.317
INFO - Saving checkpoint at epoch 65...
INFO - Best model: Loss = 2.514 | Best epoch = 65


here??


INFO - ------------ Epoch 66 ------------
INFO - Training:   Loss = 0.094 | Gradient norm = 1.939 | Param update norm = 0.066 | Runtime in seconds = 2.266
INFO - Validation: Loss = 2.494 | Mae e = 0.350 | Mae f = 0.199 | Rmse e = 0.372 | Rmse f = 0.316
INFO - Saving checkpoint at epoch 66...
INFO - Best model: Loss = 2.494 | Best epoch = 66


here??


INFO - ------------ Epoch 67 ------------
INFO - Training:   Loss = 0.096 | Gradient norm = 2.026 | Param update norm = 0.074 | Runtime in seconds = 2.282
INFO - Validation: Loss = 2.479 | Mae e = 0.331 | Mae f = 0.199 | Rmse e = 0.355 | Rmse f = 0.315
INFO - Saving checkpoint at epoch 67...
INFO - Best model: Loss = 2.479 | Best epoch = 67


here??


INFO - ------------ Epoch 68 ------------
INFO - Training:   Loss = 0.095 | Gradient norm = 2.249 | Param update norm = 0.067 | Runtime in seconds = 2.355
INFO - Validation: Loss = 2.464 | Mae e = 0.340 | Mae f = 0.198 | Rmse e = 0.363 | Rmse f = 0.314
INFO - Saving checkpoint at epoch 68...
INFO - Best model: Loss = 2.464 | Best epoch = 68


here??


INFO - ------------ Epoch 69 ------------
INFO - Training:   Loss = 0.090 | Gradient norm = 1.628 | Param update norm = 0.063 | Runtime in seconds = 2.261
INFO - Validation: Loss = 2.443 | Mae e = 0.329 | Mae f = 0.197 | Rmse e = 0.353 | Rmse f = 0.313
INFO - Saving checkpoint at epoch 69...
INFO - Best model: Loss = 2.443 | Best epoch = 69


here??


INFO - ------------ Epoch 70 ------------
INFO - Training:   Loss = 0.089 | Gradient norm = 2.124 | Param update norm = 0.059 | Runtime in seconds = 2.406
INFO - Validation: Loss = 2.420 | Mae e = 0.323 | Mae f = 0.196 | Rmse e = 0.347 | Rmse f = 0.311
INFO - Saving checkpoint at epoch 70...
INFO - Best model: Loss = 2.420 | Best epoch = 70


here??


INFO - ------------ Epoch 71 ------------
INFO - Training:   Loss = 0.090 | Gradient norm = 2.305 | Param update norm = 0.062 | Runtime in seconds = 2.467
INFO - Validation: Loss = 2.397 | Mae e = 0.324 | Mae f = 0.195 | Rmse e = 0.348 | Rmse f = 0.310
INFO - Saving checkpoint at epoch 71...
INFO - Best model: Loss = 2.397 | Best epoch = 71


here??


INFO - ------------ Epoch 72 ------------
INFO - Training:   Loss = 0.085 | Gradient norm = 1.759 | Param update norm = 0.060 | Runtime in seconds = 2.464
INFO - Validation: Loss = 2.379 | Mae e = 0.315 | Mae f = 0.194 | Rmse e = 0.339 | Rmse f = 0.308
INFO - Saving checkpoint at epoch 72...
INFO - Best model: Loss = 2.379 | Best epoch = 72


here??


INFO - ------------ Epoch 73 ------------
INFO - Training:   Loss = 0.089 | Gradient norm = 2.340 | Param update norm = 0.064 | Runtime in seconds = 2.317
INFO - Validation: Loss = 2.370 | Mae e = 0.296 | Mae f = 0.194 | Rmse e = 0.322 | Rmse f = 0.308
INFO - Saving checkpoint at epoch 73...
INFO - Best model: Loss = 2.370 | Best epoch = 73


here??


INFO - ------------ Epoch 74 ------------
INFO - Training:   Loss = 0.087 | Gradient norm = 2.040 | Param update norm = 0.062 | Runtime in seconds = 2.272
INFO - Validation: Loss = 2.348 | Mae e = 0.303 | Mae f = 0.193 | Rmse e = 0.328 | Rmse f = 0.306
INFO - Saving checkpoint at epoch 74...
INFO - Best model: Loss = 2.348 | Best epoch = 74


here??


INFO - ------------ Epoch 75 ------------
INFO - Training:   Loss = 0.084 | Gradient norm = 1.843 | Param update norm = 0.060 | Runtime in seconds = 2.266
INFO - Validation: Loss = 2.332 | Mae e = 0.293 | Mae f = 0.192 | Rmse e = 0.319 | Rmse f = 0.305
INFO - Saving checkpoint at epoch 75...
INFO - Best model: Loss = 2.332 | Best epoch = 75


here??


INFO - ------------ Epoch 76 ------------
INFO - Training:   Loss = 0.084 | Gradient norm = 1.951 | Param update norm = 0.059 | Runtime in seconds = 2.249
INFO - Validation: Loss = 2.319 | Mae e = 0.287 | Mae f = 0.191 | Rmse e = 0.313 | Rmse f = 0.305
INFO - Saving checkpoint at epoch 76...
INFO - Best model: Loss = 2.319 | Best epoch = 76


here??


INFO - ------------ Epoch 77 ------------
INFO - Training:   Loss = 0.081 | Gradient norm = 1.752 | Param update norm = 0.058 | Runtime in seconds = 2.356
INFO - Validation: Loss = 2.301 | Mae e = 0.286 | Mae f = 0.190 | Rmse e = 0.311 | Rmse f = 0.303
INFO - Saving checkpoint at epoch 77...
INFO - Best model: Loss = 2.301 | Best epoch = 77


here??


INFO - ------------ Epoch 78 ------------
INFO - Training:   Loss = 0.082 | Gradient norm = 1.819 | Param update norm = 0.060 | Runtime in seconds = 2.103
INFO - Validation: Loss = 2.286 | Mae e = 0.279 | Mae f = 0.190 | Rmse e = 0.305 | Rmse f = 0.302
INFO - Saving checkpoint at epoch 78...
INFO - Best model: Loss = 2.286 | Best epoch = 78


here??


INFO - ------------ Epoch 79 ------------
INFO - Training:   Loss = 0.083 | Gradient norm = 2.006 | Param update norm = 0.062 | Runtime in seconds = 2.313
INFO - Validation: Loss = 2.271 | Mae e = 0.282 | Mae f = 0.189 | Rmse e = 0.308 | Rmse f = 0.301
INFO - Saving checkpoint at epoch 79...
INFO - Best model: Loss = 2.271 | Best epoch = 79


here??


INFO - ------------ Epoch 80 ------------
INFO - Training:   Loss = 0.080 | Gradient norm = 1.672 | Param update norm = 0.058 | Runtime in seconds = 2.198
INFO - Validation: Loss = 2.254 | Mae e = 0.282 | Mae f = 0.188 | Rmse e = 0.307 | Rmse f = 0.300
INFO - Saving checkpoint at epoch 80...
INFO - Best model: Loss = 2.254 | Best epoch = 80


here??


INFO - ------------ Epoch 81 ------------
INFO - Training:   Loss = 0.083 | Gradient norm = 2.331 | Param update norm = 0.061 | Runtime in seconds = 2.091
INFO - Validation: Loss = 2.236 | Mae e = 0.278 | Mae f = 0.188 | Rmse e = 0.304 | Rmse f = 0.299
INFO - Saving checkpoint at epoch 81...
INFO - Best model: Loss = 2.236 | Best epoch = 81


here??


INFO - ------------ Epoch 82 ------------
INFO - Training:   Loss = 0.078 | Gradient norm = 1.825 | Param update norm = 0.054 | Runtime in seconds = 2.118
INFO - Validation: Loss = 2.217 | Mae e = 0.277 | Mae f = 0.187 | Rmse e = 0.303 | Rmse f = 0.298
INFO - Saving checkpoint at epoch 82...
INFO - Best model: Loss = 2.217 | Best epoch = 82


here??


INFO - ------------ Epoch 83 ------------
INFO - Training:   Loss = 0.077 | Gradient norm = 1.667 | Param update norm = 0.056 | Runtime in seconds = 2.305
INFO - Validation: Loss = 2.204 | Mae e = 0.273 | Mae f = 0.186 | Rmse e = 0.299 | Rmse f = 0.297
INFO - Saving checkpoint at epoch 83...
INFO - Best model: Loss = 2.204 | Best epoch = 83


here??


INFO - ------------ Epoch 84 ------------
INFO - Training:   Loss = 0.078 | Gradient norm = 2.109 | Param update norm = 0.058 | Runtime in seconds = 2.325
INFO - Validation: Loss = 2.194 | Mae e = 0.264 | Mae f = 0.186 | Rmse e = 0.290 | Rmse f = 0.296
INFO - Saving checkpoint at epoch 84...
INFO - Best model: Loss = 2.194 | Best epoch = 84


here??


INFO - ------------ Epoch 85 ------------
INFO - Training:   Loss = 0.074 | Gradient norm = 1.474 | Param update norm = 0.052 | Runtime in seconds = 2.229
INFO - Validation: Loss = 2.179 | Mae e = 0.267 | Mae f = 0.185 | Rmse e = 0.293 | Rmse f = 0.295
INFO - Saving checkpoint at epoch 85...
INFO - Best model: Loss = 2.179 | Best epoch = 85


here??


INFO - ------------ Epoch 86 ------------
INFO - Training:   Loss = 0.079 | Gradient norm = 1.995 | Param update norm = 0.062 | Runtime in seconds = 2.212
INFO - Validation: Loss = 2.175 | Mae e = 0.249 | Mae f = 0.185 | Rmse e = 0.277 | Rmse f = 0.295
INFO - Saving checkpoint at epoch 86...
INFO - Best model: Loss = 2.175 | Best epoch = 86


here??


INFO - ------------ Epoch 87 ------------
INFO - Training:   Loss = 0.076 | Gradient norm = 1.927 | Param update norm = 0.057 | Runtime in seconds = 2.249
INFO - Validation: Loss = 2.156 | Mae e = 0.256 | Mae f = 0.184 | Rmse e = 0.283 | Rmse f = 0.294
INFO - Saving checkpoint at epoch 87...
INFO - Best model: Loss = 2.156 | Best epoch = 87


here??


INFO - ------------ Epoch 88 ------------
INFO - Training:   Loss = 0.074 | Gradient norm = 1.700 | Param update norm = 0.054 | Runtime in seconds = 2.213
INFO - Validation: Loss = 2.147 | Mae e = 0.251 | Mae f = 0.183 | Rmse e = 0.279 | Rmse f = 0.293
INFO - Saving checkpoint at epoch 88...
INFO - Best model: Loss = 2.147 | Best epoch = 88


here??


INFO - ------------ Epoch 89 ------------
INFO - Training:   Loss = 0.085 | Gradient norm = 2.406 | Param update norm = 0.075 | Runtime in seconds = 2.733
INFO - Validation: Loss = 2.132 | Mae e = 0.261 | Mae f = 0.183 | Rmse e = 0.287 | Rmse f = 0.292
INFO - Saving checkpoint at epoch 89...
INFO - Best model: Loss = 2.132 | Best epoch = 89


here??


INFO - ------------ Epoch 90 ------------
INFO - Training:   Loss = 0.079 | Gradient norm = 2.060 | Param update norm = 0.066 | Runtime in seconds = 2.909
INFO - Validation: Loss = 2.118 | Mae e = 0.256 | Mae f = 0.182 | Rmse e = 0.283 | Rmse f = 0.291
INFO - Saving checkpoint at epoch 90...
INFO - Best model: Loss = 2.118 | Best epoch = 90


here??


INFO - ------------ Epoch 91 ------------
INFO - Training:   Loss = 0.078 | Gradient norm = 1.964 | Param update norm = 0.067 | Runtime in seconds = 2.361
INFO - Validation: Loss = 2.102 | Mae e = 0.252 | Mae f = 0.181 | Rmse e = 0.279 | Rmse f = 0.290
INFO - Saving checkpoint at epoch 91...
INFO - Best model: Loss = 2.102 | Best epoch = 91


here??


INFO - ------------ Epoch 92 ------------
INFO - Training:   Loss = 0.076 | Gradient norm = 2.026 | Param update norm = 0.060 | Runtime in seconds = 2.369
INFO - Validation: Loss = 2.088 | Mae e = 0.254 | Mae f = 0.181 | Rmse e = 0.281 | Rmse f = 0.289
INFO - Saving checkpoint at epoch 92...
INFO - Best model: Loss = 2.088 | Best epoch = 92


here??


INFO - ------------ Epoch 93 ------------
INFO - Training:   Loss = 0.071 | Gradient norm = 1.606 | Param update norm = 0.053 | Runtime in seconds = 2.287
INFO - Validation: Loss = 2.077 | Mae e = 0.243 | Mae f = 0.180 | Rmse e = 0.271 | Rmse f = 0.288
INFO - Saving checkpoint at epoch 93...
INFO - Best model: Loss = 2.077 | Best epoch = 93


here??


INFO - ------------ Epoch 94 ------------
INFO - Training:   Loss = 0.071 | Gradient norm = 1.597 | Param update norm = 0.056 | Runtime in seconds = 2.697
INFO - Validation: Loss = 2.066 | Mae e = 0.242 | Mae f = 0.180 | Rmse e = 0.270 | Rmse f = 0.287
INFO - Saving checkpoint at epoch 94...
INFO - Best model: Loss = 2.066 | Best epoch = 94


here??


INFO - ------------ Epoch 95 ------------
INFO - Training:   Loss = 0.074 | Gradient norm = 2.050 | Param update norm = 0.060 | Runtime in seconds = 2.196
INFO - Validation: Loss = 2.054 | Mae e = 0.238 | Mae f = 0.179 | Rmse e = 0.266 | Rmse f = 0.287
INFO - Saving checkpoint at epoch 95...
INFO - Best model: Loss = 2.054 | Best epoch = 95


here??


INFO - ------------ Epoch 96 ------------
INFO - Training:   Loss = 0.070 | Gradient norm = 1.618 | Param update norm = 0.053 | Runtime in seconds = 2.095
INFO - Validation: Loss = 2.046 | Mae e = 0.233 | Mae f = 0.179 | Rmse e = 0.261 | Rmse f = 0.286
INFO - Saving checkpoint at epoch 96...
INFO - Best model: Loss = 2.046 | Best epoch = 96


here??


INFO - ------------ Epoch 97 ------------
INFO - Training:   Loss = 0.072 | Gradient norm = 1.834 | Param update norm = 0.060 | Runtime in seconds = 2.303
INFO - Validation: Loss = 2.031 | Mae e = 0.240 | Mae f = 0.178 | Rmse e = 0.268 | Rmse f = 0.285
INFO - Saving checkpoint at epoch 97...
INFO - Best model: Loss = 2.031 | Best epoch = 97


here??


INFO - ------------ Epoch 98 ------------
INFO - Training:   Loss = 0.072 | Gradient norm = 1.806 | Param update norm = 0.062 | Runtime in seconds = 2.721
INFO - Validation: Loss = 2.017 | Mae e = 0.241 | Mae f = 0.177 | Rmse e = 0.268 | Rmse f = 0.284
INFO - Saving checkpoint at epoch 98...
INFO - Best model: Loss = 2.017 | Best epoch = 98


here??


INFO - ------------ Epoch 99 ------------
INFO - Training:   Loss = 0.067 | Gradient norm = 1.492 | Param update norm = 0.054 | Runtime in seconds = 2.240
INFO - Validation: Loss = 2.007 | Mae e = 0.223 | Mae f = 0.177 | Rmse e = 0.252 | Rmse f = 0.283
INFO - Saving checkpoint at epoch 99...
INFO - Best model: Loss = 2.007 | Best epoch = 99


here??


INFO - ------------ Epoch 100 ------------
INFO - Training:   Loss = 0.066 | Gradient norm = 1.402 | Param update norm = 0.051 | Runtime in seconds = 3.028
INFO - Validation: Loss = 1.993 | Mae e = 0.219 | Mae f = 0.176 | Rmse e = 0.248 | Rmse f = 0.282
INFO - Saving checkpoint at epoch 100...
INFO - Best model: Loss = 1.993 | Best epoch = 100
INFO - Training loop completed.


here??


0,1
loss,█▄▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
mae_e,██▃▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
mae_f,█▇▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
mse_e,▂▂▇█▆▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
mse_f,█▇▅▄▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
loss,1.99279
mae_e,0.21906
mae_f,0.17612
mse_e,0.06171
mse_f,0.07971


ERROR - Task was destroyed but it is pending!
task: <Task cancelling name='Task-86800' coro=<Event.wait() running at /usr/lib/python3.11/asyncio/locks.py:213> wait_for=<Future cancelled>>
[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 6nm8sgmm with config:
[34m[1mwandb[0m: 	energy_weight: 500
[34m[1mwandb[0m: 	final_learning_rate: 0.00026244440126435374
[34m[1mwandb[0m: 	init_learning_rate: 0.0013913684561052452
[34m[1mwandb[0m: 	num_layers: 4
[34m[1mwandb[0m: 	peak_learning_rate: 0.0015217805675181075
[34m[1mwandb[0m: 	weight_decay: 0.0008304696644800984


INFO - Cleaning up existing temporary directories at /content/training/model_training/model.
INFO - Number of parameters: 59626
INFO - Number of parameters in optimizer: 238510
INFO - Starting training loop...
INFO - Validation: Loss = 86.943 | Mae e = 2.963 | Mae f = 1.388 | Rmse e = 3.030 | Rmse f = 1.865
INFO - Best model: Loss = 86.943 | Best epoch = 0


here??


INFO - ------------ Epoch 1 ------------
INFO - Training:   Loss = 22.375 | Gradient norm = 0.251 | Param update norm = 0.268 | Runtime in seconds = 41.324
INFO - Validation: Loss = 86.540 | Mae e = 3.055 | Mae f = 1.384 | Rmse e = 3.120 | Rmse f = 1.860
INFO - Saving checkpoint at epoch 1...
INFO - Best model: Loss = 86.540 | Best epoch = 1


here??


INFO - ------------ Epoch 2 ------------
INFO - Training:   Loss = 20.976 | Gradient norm = 1.678 | Param update norm = 0.271 | Runtime in seconds = 1.330
INFO - Validation: Loss = 84.555 | Mae e = 3.424 | Mae f = 1.368 | Rmse e = 3.481 | Rmse f = 1.839
INFO - Saving checkpoint at epoch 2...
INFO - Best model: Loss = 84.555 | Best epoch = 2


here??


INFO - ------------ Epoch 3 ------------
INFO - Training:   Loss = 13.678 | Gradient norm = 4.085 | Param update norm = 0.313 | Runtime in seconds = 1.255
INFO - Validation: Loss = 76.963 | Mae e = 4.462 | Mae f = 1.305 | Rmse e = 4.504 | Rmse f = 1.754
INFO - Saving checkpoint at epoch 3...
INFO - Best model: Loss = 76.963 | Best epoch = 3


here??


INFO - ------------ Epoch 4 ------------
INFO - Training:   Loss = 6.565 | Gradient norm = 3.763 | Param update norm = 0.228 | Runtime in seconds = 1.240
INFO - Validation: Loss = 64.114 | Mae e = 6.086 | Mae f = 1.189 | Rmse e = 6.114 | Rmse f = 1.601
INFO - Saving checkpoint at epoch 4...
INFO - Best model: Loss = 64.114 | Best epoch = 4


here??


INFO - ------------ Epoch 5 ------------
INFO - Training:   Loss = 3.908 | Gradient norm = 4.295 | Param update norm = 0.156 | Runtime in seconds = 1.401
INFO - Validation: Loss = 51.836 | Mae e = 7.609 | Mae f = 1.066 | Rmse e = 7.628 | Rmse f = 1.439
INFO - Saving checkpoint at epoch 5...
INFO - Best model: Loss = 51.836 | Best epoch = 5


here??


INFO - ------------ Epoch 6 ------------
INFO - Training:   Loss = 3.098 | Gradient norm = 6.141 | Param update norm = 0.116 | Runtime in seconds = 1.262
INFO - Validation: Loss = 42.316 | Mae e = 8.783 | Mae f = 0.960 | Rmse e = 8.798 | Rmse f = 1.299
INFO - Saving checkpoint at epoch 6...
INFO - Best model: Loss = 42.316 | Best epoch = 6


here??


INFO - ------------ Epoch 7 ------------
INFO - Training:   Loss = 2.697 | Gradient norm = 5.619 | Param update norm = 0.099 | Runtime in seconds = 1.394
INFO - Validation: Loss = 35.233 | Mae e = 9.629 | Mae f = 0.872 | Rmse e = 9.641 | Rmse f = 1.185
INFO - Saving checkpoint at epoch 7...
INFO - Best model: Loss = 35.233 | Best epoch = 7


here??


INFO - ------------ Epoch 8 ------------
INFO - Training:   Loss = 2.410 | Gradient norm = 6.055 | Param update norm = 0.086 | Runtime in seconds = 1.252
INFO - Validation: Loss = 30.031 | Mae e = 10.161 | Mae f = 0.801 | Rmse e = 10.171 | Rmse f = 1.093
INFO - Saving checkpoint at epoch 8...
INFO - Best model: Loss = 30.031 | Best epoch = 8


here??


INFO - ------------ Epoch 9 ------------
INFO - Training:   Loss = 2.168 | Gradient norm = 6.624 | Param update norm = 0.082 | Runtime in seconds = 1.405
INFO - Validation: Loss = 26.301 | Mae e = 10.426 | Mae f = 0.746 | Rmse e = 10.435 | Rmse f = 1.023
INFO - Saving checkpoint at epoch 9...
INFO - Best model: Loss = 26.301 | Best epoch = 9


here??


INFO - ------------ Epoch 10 ------------
INFO - Training:   Loss = 2.016 | Gradient norm = 9.937 | Param update norm = 0.081 | Runtime in seconds = 1.254
INFO - Validation: Loss = 23.553 | Mae e = 10.508 | Mae f = 0.702 | Rmse e = 10.515 | Rmse f = 0.968
INFO - Saving checkpoint at epoch 10...
INFO - Best model: Loss = 23.553 | Best epoch = 10


here??


INFO - ------------ Epoch 11 ------------
INFO - Training:   Loss = 1.789 | Gradient norm = 7.623 | Param update norm = 0.072 | Runtime in seconds = 1.325
INFO - Validation: Loss = 21.579 | Mae e = 10.436 | Mae f = 0.667 | Rmse e = 10.443 | Rmse f = 0.926
INFO - Saving checkpoint at epoch 11...
INFO - Best model: Loss = 21.579 | Best epoch = 11


here??


INFO - ------------ Epoch 12 ------------
INFO - Training:   Loss = 1.601 | Gradient norm = 8.686 | Param update norm = 0.068 | Runtime in seconds = 1.343
INFO - Validation: Loss = 20.359 | Mae e = 10.173 | Mae f = 0.644 | Rmse e = 10.179 | Rmse f = 0.899
INFO - Saving checkpoint at epoch 12...
INFO - Best model: Loss = 20.359 | Best epoch = 12


here??


INFO - ------------ Epoch 13 ------------
INFO - Training:   Loss = 1.488 | Gradient norm = 12.824 | Param update norm = 0.069 | Runtime in seconds = 1.242
INFO - Validation: Loss = 19.544 | Mae e = 9.814 | Mae f = 0.626 | Rmse e = 9.820 | Rmse f = 0.881
INFO - Saving checkpoint at epoch 13...
INFO - Best model: Loss = 19.544 | Best epoch = 13


here??


INFO - ------------ Epoch 14 ------------
INFO - Training:   Loss = 1.356 | Gradient norm = 12.266 | Param update norm = 0.062 | Runtime in seconds = 1.269
INFO - Validation: Loss = 18.943 | Mae e = 9.415 | Mae f = 0.612 | Rmse e = 9.421 | Rmse f = 0.868
INFO - Saving checkpoint at epoch 14...
INFO - Best model: Loss = 18.943 | Best epoch = 14


here??


INFO - ------------ Epoch 15 ------------
INFO - Training:   Loss = 1.243 | Gradient norm = 12.228 | Param update norm = 0.057 | Runtime in seconds = 1.253
INFO - Validation: Loss = 18.412 | Mae e = 8.999 | Mae f = 0.599 | Rmse e = 9.005 | Rmse f = 0.856
INFO - Saving checkpoint at epoch 15...
INFO - Best model: Loss = 18.412 | Best epoch = 15


here??


INFO - ------------ Epoch 16 ------------
INFO - Training:   Loss = 1.107 | Gradient norm = 7.386 | Param update norm = 0.052 | Runtime in seconds = 1.298
INFO - Validation: Loss = 18.095 | Mae e = 8.525 | Mae f = 0.591 | Rmse e = 8.531 | Rmse f = 0.848
INFO - Saving checkpoint at epoch 16...
INFO - Best model: Loss = 18.095 | Best epoch = 16


here??


INFO - ------------ Epoch 17 ------------
INFO - Training:   Loss = 1.041 | Gradient norm = 9.060 | Param update norm = 0.051 | Runtime in seconds = 1.246
INFO - Validation: Loss = 17.864 | Mae e = 8.033 | Mae f = 0.583 | Rmse e = 8.039 | Rmse f = 0.843
INFO - Saving checkpoint at epoch 17...
INFO - Best model: Loss = 17.864 | Best epoch = 17


here??


INFO - ------------ Epoch 18 ------------
INFO - Training:   Loss = 0.965 | Gradient norm = 8.572 | Param update norm = 0.046 | Runtime in seconds = 1.255
INFO - Validation: Loss = 17.673 | Mae e = 7.527 | Mae f = 0.577 | Rmse e = 7.533 | Rmse f = 0.839
INFO - Saving checkpoint at epoch 18...
INFO - Best model: Loss = 17.673 | Best epoch = 18


here??


INFO - ------------ Epoch 19 ------------
INFO - Training:   Loss = 0.933 | Gradient norm = 10.010 | Param update norm = 0.048 | Runtime in seconds = 1.340
INFO - Validation: Loss = 17.422 | Mae e = 7.072 | Mae f = 0.570 | Rmse e = 7.078 | Rmse f = 0.833
INFO - Saving checkpoint at epoch 19...
INFO - Best model: Loss = 17.422 | Best epoch = 19


here??


INFO - ------------ Epoch 20 ------------
INFO - Training:   Loss = 0.869 | Gradient norm = 7.534 | Param update norm = 0.044 | Runtime in seconds = 1.311
INFO - Validation: Loss = 17.140 | Mae e = 6.648 | Mae f = 0.562 | Rmse e = 6.654 | Rmse f = 0.827
INFO - Saving checkpoint at epoch 20...
INFO - Best model: Loss = 17.140 | Best epoch = 20


here??


INFO - ------------ Epoch 21 ------------
INFO - Training:   Loss = 0.843 | Gradient norm = 9.021 | Param update norm = 0.044 | Runtime in seconds = 1.115
INFO - Validation: Loss = 16.840 | Mae e = 6.263 | Mae f = 0.555 | Rmse e = 6.270 | Rmse f = 0.819
INFO - Saving checkpoint at epoch 21...
INFO - Best model: Loss = 16.840 | Best epoch = 21


here??


INFO - ------------ Epoch 22 ------------
INFO - Training:   Loss = 0.803 | Gradient norm = 9.006 | Param update norm = 0.042 | Runtime in seconds = 1.452
INFO - Validation: Loss = 16.584 | Mae e = 5.890 | Mae f = 0.549 | Rmse e = 5.897 | Rmse f = 0.813
INFO - Saving checkpoint at epoch 22...
INFO - Best model: Loss = 16.584 | Best epoch = 22


here??


INFO - ------------ Epoch 23 ------------
INFO - Training:   Loss = 0.797 | Gradient norm = 11.142 | Param update norm = 0.044 | Runtime in seconds = 1.242
INFO - Validation: Loss = 16.156 | Mae e = 5.605 | Mae f = 0.540 | Rmse e = 5.612 | Rmse f = 0.803
INFO - Saving checkpoint at epoch 23...
INFO - Best model: Loss = 16.156 | Best epoch = 23


here??


INFO - ------------ Epoch 24 ------------
INFO - Training:   Loss = 0.745 | Gradient norm = 7.291 | Param update norm = 0.041 | Runtime in seconds = 1.368
INFO - Validation: Loss = 15.831 | Mae e = 5.316 | Mae f = 0.533 | Rmse e = 5.324 | Rmse f = 0.795
INFO - Saving checkpoint at epoch 24...
INFO - Best model: Loss = 15.831 | Best epoch = 24


here??


INFO - ------------ Epoch 25 ------------
INFO - Training:   Loss = 0.716 | Gradient norm = 8.784 | Param update norm = 0.037 | Runtime in seconds = 1.258
INFO - Validation: Loss = 15.562 | Mae e = 5.030 | Mae f = 0.526 | Rmse e = 5.038 | Rmse f = 0.788
INFO - Saving checkpoint at epoch 25...
INFO - Best model: Loss = 15.562 | Best epoch = 25


here??


INFO - ------------ Epoch 26 ------------
INFO - Training:   Loss = 0.684 | Gradient norm = 7.686 | Param update norm = 0.036 | Runtime in seconds = 1.246
INFO - Validation: Loss = 15.332 | Mae e = 4.755 | Mae f = 0.521 | Rmse e = 4.763 | Rmse f = 0.782
INFO - Saving checkpoint at epoch 26...
INFO - Best model: Loss = 15.332 | Best epoch = 26


here??


INFO - ------------ Epoch 27 ------------
INFO - Training:   Loss = 0.694 | Gradient norm = 10.688 | Param update norm = 0.040 | Runtime in seconds = 1.263
INFO - Validation: Loss = 15.081 | Mae e = 4.509 | Mae f = 0.515 | Rmse e = 4.517 | Rmse f = 0.776
INFO - Saving checkpoint at epoch 27...
INFO - Best model: Loss = 15.081 | Best epoch = 27


here??


INFO - ------------ Epoch 28 ------------
INFO - Training:   Loss = 0.661 | Gradient norm = 9.248 | Param update norm = 0.037 | Runtime in seconds = 1.289
INFO - Validation: Loss = 14.746 | Mae e = 4.338 | Mae f = 0.508 | Rmse e = 4.347 | Rmse f = 0.767
INFO - Saving checkpoint at epoch 28...
INFO - Best model: Loss = 14.746 | Best epoch = 28


here??


INFO - ------------ Epoch 29 ------------
INFO - Training:   Loss = 0.625 | Gradient norm = 6.367 | Param update norm = 0.035 | Runtime in seconds = 1.301
INFO - Validation: Loss = 14.497 | Mae e = 4.124 | Mae f = 0.503 | Rmse e = 4.133 | Rmse f = 0.761
INFO - Saving checkpoint at epoch 29...
INFO - Best model: Loss = 14.497 | Best epoch = 29


here??


INFO - ------------ Epoch 30 ------------
INFO - Training:   Loss = 0.632 | Gradient norm = 10.135 | Param update norm = 0.037 | Runtime in seconds = 1.259
INFO - Validation: Loss = 14.291 | Mae e = 3.919 | Mae f = 0.498 | Rmse e = 3.929 | Rmse f = 0.756
INFO - Saving checkpoint at epoch 30...
INFO - Best model: Loss = 14.291 | Best epoch = 30


here??


INFO - ------------ Epoch 31 ------------
INFO - Training:   Loss = 0.597 | Gradient norm = 7.444 | Param update norm = 0.032 | Runtime in seconds = 1.079
INFO - Validation: Loss = 13.966 | Mae e = 3.776 | Mae f = 0.492 | Rmse e = 3.786 | Rmse f = 0.747
INFO - Saving checkpoint at epoch 31...
INFO - Best model: Loss = 13.966 | Best epoch = 31


here??


INFO - ------------ Epoch 32 ------------
INFO - Training:   Loss = 0.596 | Gradient norm = 9.184 | Param update norm = 0.035 | Runtime in seconds = 1.085
INFO - Validation: Loss = 13.737 | Mae e = 3.611 | Mae f = 0.487 | Rmse e = 3.621 | Rmse f = 0.741
INFO - Saving checkpoint at epoch 32...
INFO - Best model: Loss = 13.737 | Best epoch = 32


here??


INFO - ------------ Epoch 33 ------------
INFO - Training:   Loss = 0.558 | Gradient norm = 5.570 | Param update norm = 0.029 | Runtime in seconds = 1.260
INFO - Validation: Loss = 13.517 | Mae e = 3.449 | Mae f = 0.482 | Rmse e = 3.460 | Rmse f = 0.735
INFO - Saving checkpoint at epoch 33...
INFO - Best model: Loss = 13.517 | Best epoch = 33


here??


INFO - ------------ Epoch 34 ------------
INFO - Training:   Loss = 0.555 | Gradient norm = 6.673 | Param update norm = 0.032 | Runtime in seconds = 1.224
INFO - Validation: Loss = 13.362 | Mae e = 3.266 | Mae f = 0.478 | Rmse e = 3.277 | Rmse f = 0.731
INFO - Saving checkpoint at epoch 34...
INFO - Best model: Loss = 13.362 | Best epoch = 34


here??


INFO - ------------ Epoch 35 ------------
INFO - Training:   Loss = 0.533 | Gradient norm = 5.531 | Param update norm = 0.029 | Runtime in seconds = 1.247
INFO - Validation: Loss = 13.167 | Mae e = 3.115 | Mae f = 0.474 | Rmse e = 3.127 | Rmse f = 0.725
INFO - Saving checkpoint at epoch 35...
INFO - Best model: Loss = 13.167 | Best epoch = 35


here??


INFO - ------------ Epoch 36 ------------
INFO - Training:   Loss = 0.518 | Gradient norm = 4.991 | Param update norm = 0.028 | Runtime in seconds = 1.426
INFO - Validation: Loss = 13.019 | Mae e = 2.960 | Mae f = 0.470 | Rmse e = 2.972 | Rmse f = 0.721
INFO - Saving checkpoint at epoch 36...
INFO - Best model: Loss = 13.019 | Best epoch = 36


here??


INFO - ------------ Epoch 37 ------------
INFO - Training:   Loss = 0.509 | Gradient norm = 5.436 | Param update norm = 0.029 | Runtime in seconds = 1.285
INFO - Validation: Loss = 12.852 | Mae e = 2.814 | Mae f = 0.466 | Rmse e = 2.827 | Rmse f = 0.717
INFO - Saving checkpoint at epoch 37...
INFO - Best model: Loss = 12.852 | Best epoch = 37


here??


INFO - ------------ Epoch 38 ------------
INFO - Training:   Loss = 0.512 | Gradient norm = 6.277 | Param update norm = 0.032 | Runtime in seconds = 1.305
INFO - Validation: Loss = 12.683 | Mae e = 2.689 | Mae f = 0.463 | Rmse e = 2.701 | Rmse f = 0.712
INFO - Saving checkpoint at epoch 38...
INFO - Best model: Loss = 12.683 | Best epoch = 38


here??


INFO - ------------ Epoch 39 ------------
INFO - Training:   Loss = 0.501 | Gradient norm = 6.762 | Param update norm = 0.031 | Runtime in seconds = 1.239
INFO - Validation: Loss = 12.479 | Mae e = 2.586 | Mae f = 0.458 | Rmse e = 2.599 | Rmse f = 0.706
INFO - Saving checkpoint at epoch 39...
INFO - Best model: Loss = 12.479 | Best epoch = 39


here??


INFO - ------------ Epoch 40 ------------
INFO - Training:   Loss = 0.522 | Gradient norm = 9.798 | Param update norm = 0.035 | Runtime in seconds = 1.389
INFO - Validation: Loss = 12.285 | Mae e = 2.523 | Mae f = 0.454 | Rmse e = 2.536 | Rmse f = 0.701
INFO - Saving checkpoint at epoch 40...
INFO - Best model: Loss = 12.285 | Best epoch = 40


here??


INFO - ------------ Epoch 41 ------------
INFO - Training:   Loss = 0.475 | Gradient norm = 5.902 | Param update norm = 0.026 | Runtime in seconds = 1.280
INFO - Validation: Loss = 12.098 | Mae e = 2.439 | Mae f = 0.450 | Rmse e = 2.453 | Rmse f = 0.695
INFO - Saving checkpoint at epoch 41...
INFO - Best model: Loss = 12.098 | Best epoch = 41


here??


INFO - ------------ Epoch 42 ------------
INFO - Training:   Loss = 0.497 | Gradient norm = 9.493 | Param update norm = 0.033 | Runtime in seconds = 1.416
INFO - Validation: Loss = 11.905 | Mae e = 2.370 | Mae f = 0.446 | Rmse e = 2.384 | Rmse f = 0.690
INFO - Saving checkpoint at epoch 42...
INFO - Best model: Loss = 11.905 | Best epoch = 42


here??


INFO - ------------ Epoch 43 ------------
INFO - Training:   Loss = 0.456 | Gradient norm = 4.652 | Param update norm = 0.025 | Runtime in seconds = 1.392
INFO - Validation: Loss = 11.714 | Mae e = 2.293 | Mae f = 0.442 | Rmse e = 2.308 | Rmse f = 0.684
INFO - Saving checkpoint at epoch 43...
INFO - Best model: Loss = 11.714 | Best epoch = 43


here??


In [None]:
# force_field = ForceField.from_mlip_network(mlip_network)

Next, we **create an optimizer**:

The *mlip* library is set up so that you can use any [`optax`](https://github.com/google-deepmind/optax) optimizer you like, here we've chosen for [`optax.adam`](https://optax.readthedocs.io/en/latest/api/optimizers.html#optax.adam).

**Hint:** Also try an optimizer specialized for MLIP models (see [this](https://instadeepai.github.io/mlip/api_reference/training/optimizer.html#mlip.training.optimizer.get_default_mlip_optimizer) part of the documentation for more details).

In [None]:
# from mlip.training.optimizer import get_default_mlip_optimizer
# from mlip.training.optimizer_config import OptimizerConfig

# # Configure the specialized MLIP optimizer
# optimizer_config = OptimizerConfig(
#     apply_weight_decay_mask=True,
#     weight_decay= run.config["weight_decay"],
#     grad_norm=run.config["grad_norm"],
#     num_gradient_accumulation_steps=run.config["num_gradient_accumulation_steps"],
#     init_learning_rate=run.config["init_learning_rate"],
#     peak_learning_rate=run.config["peak_learning_rate"],
#     final_learning_rate=run.config["final_learning_rate"],
#     warmup_steps=run.config["warmup_steps"],
#     transition_steps=run.config["transition_steps"]
# )

# optimizer = get_default_mlip_optimizer(optimizer_config)

# print("Loss function:", loss)
# print("Optimizer configuration:", optimizer_config)
# print("Optimizer:", optimizer)

For the **loss**:

We use a Mean-Squared-Error (MSE) loss, that by default uses a weighting factor of 25.0 for MSE of forces, 1.0 for MSE of energies, and zero for MSE of stress (which is not available in this dataset). See [this](https://instadeepai.github.io/mlip/user_guide/training.html#loss) part of the documentation for more information on further options such as alternative loss functions and how to create a weight flip schedule between energy and forces.

In [None]:
# loss = MSELoss()

# energy_weight_schedule = optax.piecewise_constant_schedule(run.config["energy_weight"])

We can now set up a **custom I/O handler** with checkpointing for training:

The I/O handler class is documented [here](https://instadeepai.github.io/mlip/api_reference/training/training_io_handling.html). Also check out [this](https://instadeepai.github.io/mlip/user_guide/training.html#io-handling-and-logging) part in our deep-dive on logging for more information. The code below adds a local directory for checkpointing to the I/O handler, which activates checkpointing during training.

In [None]:
# io_handler = TrainingIOHandler(
#     TrainingIOHandler.Config(
#         local_model_output_dir="training/model_training"
#     )
# )

Next, we can **attach logging functions to the I/O handler**:

Users can attached as many logging functions as required to the I/O handler. In this example, we attach two.
1. The [`log_metrics_to_line`](https://instadeepai.github.io/mlip/api_reference/training/training_io_handling.html#mlip.training.training_loggers.log_metrics_to_line) function that is also included in the default I/O handler that we used in the previous example.
2. A custom function that just keeps track of the validation set losses, so we can later easily create a curve from it.

In [None]:
# # The following logger is also attached in the default I/O handler
# # # that was used in the training above
# io_handler.attach_logger(log_metrics_to_line)

# # Define a custom logging function that keeps track of validation loss
# validation_losses = []
# training_losses = []
# def _custom_logger(category, to_log, epoch_number):
#   if category == LogCategory.EVAL_METRICS:
#     validation_losses.append(to_log["loss"])
#   elif category == LogCategory.TRAIN_METRICS:
#     training_losses.append(to_log["loss"])

# # Attach our custom logging function to the I/O handler
# io_handler.attach_logger(_custom_logger)

The custom logging function is called several times during the training loop with the argument `category` (e.g. `TRAIN_METRICS`, `EVAL_METRICS`) telling the function what is currently being logged, for example, train or evaluation metrics. It is of enum type [`LogCategory`](https://instadeepai.github.io/mlip/api_reference/training/training_io_handling.html#mlip.training.training_io_handler.LogCategory). See the documentation of the built-in function [`log_metrics_to_line`](https://instadeepai.github.io/mlip/api_reference/training/training_io_handling.html#mlip.training.training_loggers.log_metrics_to_line) for what we expect the logging function's signature to be.

Finally, we can **create our training loop**:

At a minimum, it needs as input:
- a training dataset
- a validation dataset
- a force field
- a loss
- an optimizer
- a config (which specifies for instace the number of epochs)

Note that the config is documented [here](https://instadeepai.github.io/mlip/api_reference/training/training_loop_config.html). Its only argument that lacks a default value is the number of epochs to train for.

In [None]:
# training_config = TrainingLoop.Config(
#     num_epochs=run.config["epochs"],
#     energy_weight=run.config["energy_weight"],
#     force_weight=0, # Reduce this to make energy more important
# )

# training_loop = TrainingLoop(
#     train_dataset=train_set,
#     validation_dataset=validation_set,
#     force_field=force_field,
#     loss=loss,
#     optimizer=optimizer,
#     config=training_config,
#     io_handler=io_handler,
# )

## 3. Running a training loop

**Running the loop**:

The following box runs the prepared training loop. Note that training will be a **lot more efficient for GPU users** (depending on the GPU, one should expect ~1s to ~12s per epoch, once the code is compiled) - for CPU users our measures ranged from ~12s to ~100s per epoch.

In [None]:
# training_loop.run()

We can now **access the information stored** by the custom logger saved into our validation loss list:

In [None]:
# print(validation_losses)

Let's create a training curve from these values.

In [None]:
# epoch_nums = list(range(len(validation_losses)))
# epoch_nums = list(range(len(training_losses)))
# plt.plot(epoch_nums, training_losses, c="red")
# plt.plot(epoch_nums, validation_losses, c="blue")
# plt.xlabel("Epoch")
# plt.ylabel("Validation loss (blue) & Training Loss (red)")
# plt.xticks(epoch_nums)
# plt.show()

**Recovering the best validation model**:

After training has completed, the [`TrainingLoop`](https://instadeepai.github.io/mlip/api_reference/training/training_loop.html) holds all the relevant information about the run. We can obtain the force field with the best validation parameters as follows:

In [None]:
best_force_field = training_loop.best_model

This force field object can now be applied in, for example, MD simulations or energy minimizations.

Furthermore, these checkpoints can of course also be **used to restart a training from a given checkpoint**. We refer to the [documentation of the I/O handler's config](https://instadeepai.github.io/mlip/api_reference/training/training_io_handling.html#mlip.training.training_io_handler.TrainingIOHandlerConfig) for more information on this.

**Saving the model to a zip file**:

We can also save the trained model in zip format. This is also the format that we provide our pre-trained models in.

In [None]:
save_model_to_zip("training/my_final_model.zip", best_force_field)

**Loading a pre-trained model**

We can also load a pre-trained model from a zip file. This is useful if you want to use a pre-trained model for inference or fine-tuning on a different dataset.

In [None]:
best_force_field = load_model_from_zip(Mace, "training/my_final_model.zip")

print("Dataset info:", best_force_field.dataset_info)

## Evaluation

The model will be tested on its ability to predict the energies of new conformations
of the same molecule. However, to test the generalization
capabilities of the model, these conformations are sampled at higher temperature,
i.e., 1200 Kelvin. The test conformations are located in the
`test_public.xyz` file. You can predict energies for them with a model saved in the
zip format with the mlip library's batched inference functionality, described
[here](https://instadeepai.github.io/mlip/user_guide/simulations.html#batched-inference)
in the mlip documentation or explained in section 2 of
[mlip's simulation tutorial](https://github.com/instadeepai/mlip/blob/main/tutorials/simulation_tutorial.ipynb).
The public leaderboard contains the target energies. The metric the predictions will be scored on is root-mean-square error (RMSE).


**Hint**: to get the best energies, you want to put higher weights on energies during training which is not the case in the default settings, where forces are more highly weighted.


In [None]:
from ase.io import read as ase_read

test_data = "data/test_public.xyz"
structures = ase_read(test_data, index=":")

We can now run inference with a single pre-built function, note Jax starts by compiling all the required functions. It may appear slow at the beginning but this provides significant acceleration at scales (compilation is saved in the notebook kernel, so if you want an illustration of the speed gains, you can run the cell twice):

In [None]:
from mlip.inference import run_batched_inference

predictions = run_batched_inference(structures, best_force_field, batch_size=8)

# Get your output into submission format

We need to get our outputs into their "camera-ready" form.

In [None]:
energies = np.array([prediction.energy for prediction in predictions])

# Create DataFrame
df = pd.DataFrame({
    'ID': np.arange(len(energies)),
    'energies': energies
})

df

In [None]:
your_name = "YOUR_NAME_HERE"
filename = f"{your_name}_submission.csv"
df.to_csv(filename, index=False)
print(f"Saved submission to {filename}")

Submit your solution by uploading the CSV file to the [Zindi competition page](https://zindi.africa/competitions/indabax-south-africa-2025-hackathon-with-instadeep).