---
# 03_training_and_evaluation.ipynb
---

This notebook provides a complete end-to-end example of how to train and evaluate a model using the `ModularML` framework. It demonstrates how to:

1. Define a `ModelGraph` with one or more trainable model stages
2. Wrap training logic in a `TrainingPhase`
3. Run training using the `Experiment` container
4. Evaluate the trained model using `EvaluationPhase`
5. Visualize predictions against ground truth

We use a simple example in which a fully connected MLP regressor is trained to estimate target values from pulse charge features of a battery dataset.

We will be utilizing the FeatureSet and ModelStages created in the prior two notebooks. 
If you haven't already, please go through the following examples:
* [01_featureset_basics.ipynb](./01_featureset_basics.ipynb)
* [02_modelgraph_basics.ipynb](./02_modelgraph_basics.ipynb)


Let's reload the pre-processed FeatureSet and ModelStages.


In [None]:
from pathlib import Path

import modularml as mml
from modularml.core import FeatureSet, ModelGraph, ModelStage, Optimizer
from modularml.models.torch import SequentialMLP

FILE_FEATURE_SET = Path("downloaded_data/charge_samples.joblib")
charge_samples = FeatureSet.load(FILE_FEATURE_SET)
charge_samples

We will start this example with a simple 3-layer MLP model.
It pulls features from the 'ChargePulseFeatures' FeatureSet.

We can confirm the target output size with `FeatureSet.target_shape_spec`.

In [None]:
print(charge_samples.target_shape_spec)

ms_regressor = ModelStage(
    model=SequentialMLP(output_shape=(1, 1), n_layers=3, hidden_dim=32),
    label="Regressor",
    upstream_node=charge_samples,
    optimizer=Optimizer("adam", lr=1e-3),
)

The ModelGraph is constructed via a list of ModelStages and FeatureSets.

Calling `build_all` ensures that all ModelStages instantiate their underlying NN components and infer any missing input/output shapes.

In [None]:
mg = ModelGraph(nodes=[charge_samples, ms_regressor])
mg.build_all()
mg.visualize()

Now that the `ModelGraph` is built, we are ready to move on to the core ModularML training logic and central `Experiment` container.

## Define a Training Phase

Training in ModularML is handled through a `TrainingPhase`. It is a declarative container that defines how to train the ModelGraph.

It has the following initialization arguments:
* `label`: a name (str) assigned to this training phase for logging (e.g., "pretrain_encoder").
* `losses`: a list of `AppliedLoss` objects
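* `train_samplers` and `val_samplers`: dictionaries mapping FeatureSet or FeatureSubset labels in the ModelGraph to the `FeatureSampler` used for training and validation batches.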
* `batch_size`: the batch size to use across all batches in a single training phase.
* `n_epochs`: the number of training epochs.



Let's start with the `AppliedLoss` class.
It defines the loss function and how it should be applied to the ModelGraph.

For example, let's assume we are using a TripletSampler, which produces 'anchor', 'positive', and 'negative' samples.
We would have a corresponding triplet loss that takes in the ModelStage outputs for each of these sample roles.

``` python
def triplet_loss(anchor, positive, negative): ...
```
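
The body of `triplet_loss` is left open above. As a minimal sketch, assuming the standard margin-based formulation with Euclidean distance (the `margin` default and the `euclidean` helper are illustrative choices, not part of ModularML), it could look like this in plain Python:

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean triplet margin loss over a batch of embedding vectors.

    `margin` is an illustrative default, not a value from this notebook.
    """
    per_sample = [
        max(euclidean(a, p) - euclidean(a, n) + margin, 0.0)
        for a, p, n in zip(anchor, positive, negative)
    ]
    return sum(per_sample) / len(per_sample)


# The anchor is close to the positive and far from the negative,
# so the hinge clamps this triplet's loss to zero.
print(triplet_loss([[0.0, 0.0]], [[0.0, 0.1]], [[3.0, 4.0]]))
```

In practice the loss would operate on backend tensors rather than Python lists; the list version only shows which role feeds which argument.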

To apply this triplet_loss to a ModelStage's output (assume it's called 'encoder'), we would define an AppliedLoss where the `inputs` argument maps the loss function keyword arguments to the available samples and roles.


``` python
ap = AppliedLoss(
    label='my_triplet_loss',
    loss=Loss(loss_function=triplet_loss),
    inputs={
        # Each loss keyword is mapped to the 'encoder' output for the matching role
        'anchor': "encoder.output.anchor",
        'positive': "encoder.output.positive",
        'negative': "encoder.output.negative",
    }
)
```

Each `inputs` key-value pair follows this schema:
* the key must be a keyword argument of the loss_function. If the loss function only accepts positional arguments, keys can be integers or their string equivalents (e.g., `{"0": ..., }` or `{0: ..., }`)
* the value is a period-delimited string with the pattern `'node.attribute.role'`:
  * `node` is the label of a FeatureSet or ModelStage contained in the ModelGraph
  * `attribute` is one of `'features'`, `'targets'`, or `'output'`
    * 'features' and 'targets' only apply if `node` is a FeatureSet
    * 'output' is used if `node` is a ModelStage
  * `role` is a sample key defined by the FeatureSampler used. E.g., a TripletSampler creates 'anchor', 'positive', and 'negative' roles. A simple FeatureSampler has only a 'default' role. If `role` is omitted (e.g., `'node.attribute'`), then the role is assumed to be 'default'.
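
To make the schema concrete, here is a small sketch of how such a mapping could be interpreted. The helpers `parse_input_spec` and `split_loss_arguments` are hypothetical names written for this illustration; they are not ModularML functions, and the library's own parsing may differ.

```python
def parse_input_spec(spec):
    """Split 'node.attribute[.role]' into its parts (hypothetical helper)."""
    parts = spec.split(".")
    if len(parts) == 2:
        node, attribute = parts
        role = "default"  # an omitted role falls back to 'default'
    elif len(parts) == 3:
        node, attribute, role = parts
    else:
        raise ValueError(f"Expected 'node.attribute[.role]', got {spec!r}")
    if attribute not in {"features", "targets", "output"}:
        raise ValueError(f"Unknown attribute {attribute!r} in {spec!r}")
    return node, attribute, role


def split_loss_arguments(inputs):
    """Separate integer-like keys (positional) from keyword keys."""
    positional, keyword = {}, {}
    for key, spec in inputs.items():
        if isinstance(key, int) or (isinstance(key, str) and key.isdigit()):
            positional[int(key)] = parse_input_spec(spec)
        else:
            keyword[key] = parse_input_spec(spec)
    # Positional arguments are ordered by their integer index
    args = [positional[i] for i in sorted(positional)]
    return args, keyword


print(parse_input_spec("encoder.output.anchor"))
print(split_loss_arguments({"0": "ChargePulses.targets", "1": "Regressor.output"}))
```

Keyword keys and integer-like keys end up in separate containers, which mirrors the difference between calling `loss_fn(**kwargs)` and `loss_fn(*args)`.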

For our first example, we only have a single MLP ModelStage where our task is estimating battery state-of-health. 

We will start with only a single mean-squared-error loss function applied to the regressor outputs.
Common loss functions are easily accessible using the `name` and `backend` arguments of the `Loss` class.
See the documentation for a more detailed description of the available losses.

In [None]:
from modularml.core import AppliedLoss, Loss

mse_loss = AppliedLoss(
    label="MyAppliedLoss",
    loss=Loss(name="mse", backend=mml.Backend.TORCH),
    all_inputs={  # PyTorch's MSELoss only accepts positional arguments
        "0": "ChargePulses.targets",
        "1": "Regressor.output",
    },
)

The next set of `TrainingPhase` attributes define the sampling configuration.

Since we only have one FeatureSet (`"ChargePulseFeatures"`) in our ModelGraph, we only need to create one FeatureSampler.
Our `mse_loss` only needs a single sample role ("default"), so we can stick with a `SimpleSampler`.

In [None]:
from modularml.core import SimpleSampler

sampler = SimpleSampler(
    shuffle=True,
    stratify_by=["pulse_soc"],
    seed=13,
)

The base `FeatureSampler` supports grouping and stratification via the `group_by` and `stratify_by` parameters.

Using `stratify_by=["pulse_soc"]` ensures that every batch created from the feature set contains an equal distribution of pulse states of charge (SOC).   
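
The idea behind stratified batching can be sketched without the library. The function below is an illustration only (`stratified_batches` is a made-up name and this is not how ModularML implements its samplers): it groups samples by a key, shuffles within each group, interleaves the groups round-robin, and cuts the result into batches.

```python
import random
from collections import Counter, defaultdict


def stratified_batches(samples, key, batch_size, seed=13):
    """Yield batches whose strata proportions roughly match the full set."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample)
    for members in groups.values():
        rng.shuffle(members)

    # Take one sample from each non-empty group in turn
    interleaved = []
    queues = list(groups.values())
    while any(queues):
        for members in queues:
            if members:
                interleaved.append(members.pop())

    for start in range(0, len(interleaved), batch_size):
        yield interleaved[start : start + batch_size]


samples = [{"id": i, "pulse_soc": soc} for i, soc in enumerate([10, 50, 90] * 4)]
batches = list(stratified_batches(samples, "pulse_soc", batch_size=6))
print([Counter(s["pulse_soc"] for s in batch) for batch in batches])
```

With equally sized groups, every batch here receives the same number of samples per SOC level. When group sizes differ, this simple round-robin lets later batches skew toward the larger groups, which a production sampler would need to handle.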

Attaching the sampler to a `TrainingPhase` is done with a key:value entry in a dictionary.
The key must be the name of a FeatureSet or FeatureSubset (e.g., 'ChargePulseFeatures' or 'ChargePulseFeatures.train').

In [None]:
from modularml.core import TrainingPhase

phase1 = TrainingPhase(
    label="train_phase",
    losses=[mse_loss],
    train_samplers={"ChargePulses.train": sampler},
    val_samplers={"ChargePulses.val": sampler},
    batch_size=32,
    n_epochs=10,
    early_stop_patience=10,
    early_stop_metric="val_loss",
    early_stop_min_delta=0.5,
)

We now have a fully configured TrainingPhase that utilizes the 'train' subset of the 'ChargePulseFeatures' FeatureSet.
An MSE loss is applied to the single MLP regressor ModelStage.
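
The `early_stop_patience`, `early_stop_metric`, and `early_stop_min_delta` arguments used above are not described in this notebook. The sketch below shows the common patience-based convention for such settings; whether ModularML follows exactly these semantics is an assumption, and `EarlyStopper` is a hypothetical class written for illustration.

```python
class EarlyStopper:
    """Patience-based early stopping on a metric that should decrease."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.stale_epochs = 0

    def should_stop(self, metric):
        """Record one epoch's metric and return True once patience is exhausted."""
        if metric < self.best - self.min_delta:
            # Improvement larger than min_delta: reset the counter
            self.best = metric
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


stopper = EarlyStopper(patience=2, min_delta=0.5)
val_losses = [4.0, 3.0, 2.8, 2.7, 2.6]
stopped_at = next(
    (epoch for epoch, loss in enumerate(val_losses, start=1) if stopper.should_stop(loss)),
    None,
)
print(stopped_at)
```

Under this convention, a patience equal to the number of epochs (as in `phase1`, where both are 10) can never trigger a stop, because the first epoch always counts as an improvement.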

## Create and Run the Experiment

The `Experiment` container manages both training and evaluation phases. 
It takes a `ModelGraph` and one or more phases (training or evaluation).

Calling `.run()` will automatically execute all training phases and manage phase-level loss computation, sampling, and optimizer stepping.
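
For intuition about what "loss computation, sampling, and optimizer stepping" involve, here is a deliberately tiny, library-free sketch of one training phase: mini-batch gradient descent on a one-variable linear model with an MSE loss. The function `run_phase` and its learning rate are illustrative assumptions; ModularML delegates these steps to the configured backend and optimizer.

```python
import random


def run_phase(data, n_epochs, batch_size, lr=0.05, seed=13):
    """Fit y = w * x + b with mini-batch gradient descent on an MSE loss."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not reordered
    w, b = 0.0, 0.0
    epoch_losses = []
    for _ in range(n_epochs):
        rng.shuffle(data)  # sampling: reshuffle every epoch
        batch_losses = []
        for start in range(0, len(data), batch_size):
            batch = data[start : start + batch_size]
            errors = [(w * x + b) - y for x, y in batch]
            # loss computation: mean squared error over the batch
            batch_losses.append(sum(e * e for e in errors) / len(batch))
            # optimizer step: plain gradient descent on w and b
            grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
        epoch_losses.append(sum(batch_losses) / len(batch_losses))
    return (w, b), epoch_losses


# Noise-free points on the line y = 2x + 1
points = [(x / 10, 2.0 * (x / 10) + 1.0) for x in range(20)]
(w, b), losses = run_phase(points, n_epochs=200, batch_size=5)
print(round(w, 2), round(b, 2), losses[0] > losses[-1])
```

The per-epoch loss list plays the same role as the curves drawn by `res.plot_losses()` in the cells that follow.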

In [None]:
# Initialize and run the experiment
from modularml.core import Experiment

exp = Experiment(
    graph=mg,
    phases=[phase1],
)

# Calling exp.run() would execute every phase in Experiment.phases,
# in the order they were provided. Here we run a single phase explicitly.
res = exp.run_training_phase(phase1)

In [None]:
from matplotlib import pyplot as plt

fig, axes = res.plot_losses()
plt.show()

While we could pass multiple phases directly to the `Experiment` during construction, it is sometimes useful during exploratory work to run one phase at a time and then evaluate the model.

This can be achieved by using the `run_training_phase` or `run_evaluation_phase` methods, which both take in a single phase to execute.

## Define and Run Evaluation Phases

Evaluation of the ModelGraph utilizes a separate `EvaluationPhase` class, which provides functionality catered explicitly to ModelGraph evaluation rather than training.

`EvaluationPhase` utilizes a very similar constructor (minus `n_epochs`), and the `losses` argument is optional.

Here, we define three `EvaluationPhase` objects for the train, val, and test splits of our 'ChargePulseFeatures' FeatureSet.
Each uses the same MSE loss but samples from a different subset of the dataset.

In [None]:
from modularml.core import EvaluationPhase

all_results = {}
val_phase = None
for subset in ["train", "val", "test"]:
    val_phase = EvaluationPhase(
        label=f"val_{subset}",
        samplers={f"ChargePulses.{subset}": sampler},
        batch_size=64,
        losses=[mse_loss],
    )
    all_results[subset] = exp.run_evaluation_phase(val_phase)

In [None]:
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

from modularml.visualization.common.parity_plot import plot_parity_from_node_outputs

fig, axes = plt.subplots(
    figsize=(2.3 * len(all_results), 2),
    ncols=len(all_results),
    sharex=True,
    sharey=True,
)
for i, (subset, val_res) in enumerate(all_results.items()):
    # Reuse the evaluation results computed in the previous cell instead of re-running each phase
    # fig, axes[i] = val_res.plot_parity("Regressor", ax=axes[i], plot_style='kde', kde_levels=5, vmin=0.5, alpha=1.0)

    df_unscaled = exp.inverse_transform_node_outputs(val_res.outputs, node="Regressor")
    pred = np.vstack(df_unscaled["output"].values).reshape(-1)
    true = np.vstack(df_unscaled["target"].values).reshape(-1)
    fig, axes[i] = plot_parity_from_node_outputs(
        outputs=df_unscaled,
        node="Regressor",
        ax=axes[i],
        plot_style="kde",
        kde_levels=5,
        vmin=1e-4,
        alpha=1.0,
    )
    mse = mean_squared_error(true, pred)
    mape = mean_absolute_percentage_error(true, pred) * 100
    axes[i].annotate(f"MSE={mse:.03f}", xy=(0.97, 0.05), xycoords="axes fraction", ha="right", fontsize=8)
    axes[i].annotate(f"MAPE={mape:.03f}%", xy=(0.97, 0.12), xycoords="axes fraction", ha="right", fontsize=8)
    axes[i].set_title(subset, fontsize=10)
    if i > 0:
        axes[i].set_ylabel("")

plt.show()
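
The two metrics annotated on each panel can be written out by hand, which also explains the `* 100` in the cell above: scikit-learn's `mean_absolute_percentage_error` returns a fraction rather than a percentage. The functions below are a plain-Python sketch with made-up sample values, not a replacement for the scikit-learn versions.

```python
def mse(true, pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)


def mape_percent(true, pred):
    """Mean absolute percentage error, scaled to percent.

    Undefined when any true value is zero; this sketch does not guard against that.
    """
    return 100 * sum(abs((t - p) / t) for t, p in zip(true, pred)) / len(true)


# Illustrative values only, not results from this notebook
true = [1.0, 2.0, 4.0]
pred = [1.1, 1.8, 4.0]
print(f"MSE={mse(true, pred):.3f}  MAPE={mape_percent(true, pred):.3f}%")
```

Because MAPE divides by the true value, it is most meaningful on inverse-transformed (unscaled) targets, which is why the cell above calls `inverse_transform_node_outputs` before computing it.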

## Summary

This notebook demonstrated how to:

- Build a ModelGraph with PyTorch backends
- Define training and evaluation phases
- Manage end-to-end execution with the Experiment container
- Visualize model performance


This concludes the **03_training_and_evaluation.ipynb** notebook.

The next tutorial explains `Experiment` tracking and rapid iteration: *...coming soon*