# Tutorial 02 — Train a SOEN Model

In this tutorial, we’ll walk through training a pre-built SOEN model using the training configuration file located at:
`tutorial_notebooks/training/training_configs/pulse_net.yaml`.

We’ll use the `run_from_config` function to launch training. Once all training settings are defined in your YAML file, you can start an experiment with a single call or command.

You can run it either from a Python script or directly from the command line:

- Python (where `BASE_CONFIG` is a `Path` to your training YAML): `run_from_config(str(BASE_CONFIG), script_dir=Path.cwd())`
- CLI: `python -m soen_toolkit.training --config path/to/training_config.yaml`
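If you want to trigger the CLI form from another Python script, the command can be assembled as an argument list. This is only a minimal sketch: `build_train_command` is a hypothetical helper, not part of `soen_toolkit`, and it relies solely on the module name and `--config` flag shown above.

```python
import sys
from pathlib import Path


def build_train_command(config_path):
    """Return the argv list for the CLI invocation shown above.

    Hypothetical helper for illustration; not part of soen_toolkit.
    """
    config_path = Path(config_path)
    # Fail early with a clear message if the config is not a YAML file.
    if config_path.suffix not in {".yaml", ".yml"}:
        raise ValueError(f"Expected a YAML config, got: {config_path}")
    # sys.executable keeps the child process on the current environment's Python.
    return [sys.executable, "-m", "soen_toolkit.training", "--config", str(config_path)]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would then run training in a child process, assuming `soen_toolkit` is installed in that environment.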

### ML Task Overview

This example tackles a binary classification problem on time-series inputs:
- Class 1: Input contains a single pulse.
- Class 2: Input contains two distinct pulses.
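To make the two classes concrete, here is a small sketch that generates toy one-pulse and two-pulse sequences. It is not the dataset used by the tutorial: the rectangular pulse shape, the sequence length, the single feature dimension, and the mapping of the two classes to indices 0 and 1 are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random


def make_pulse_sample(num_pulses, seq_len=64, pulse_width=4, rng=None):
    """Return a list of length seq_len containing num_pulses rectangular pulses."""
    if num_pulses < 1:
        raise ValueError("num_pulses must be at least 1")
    rng = rng or random.Random()
    slot = seq_len // num_pulses
    if slot < pulse_width + 2:
        raise ValueError("seq_len is too short for the requested pulses")
    signal = [0.0] * seq_len
    for i in range(num_pulses):
        # One pulse per slot, with at least one zero step at each slot edge,
        # so neighbouring pulses never touch.
        start = rng.randint(i * slot + 1, (i + 1) * slot - pulse_width - 1)
        for t in range(start, start + pulse_width):
            signal[t] = 1.0
    return signal


def count_pulses(signal):
    """Count rising edges (zero-to-positive transitions)."""
    return sum(
        1 for prev, cur in zip([0.0] + signal, signal) if prev == 0.0 and cur > 0.0
    )


def make_dataset(n_per_class, seed=0):
    """Return (data, labels) with data shaped [N][T][1] as nested lists."""
    rng = random.Random(seed)
    data, labels = [], []
    # Assumed mapping: index 0 = single pulse, index 1 = two pulses.
    for label, num_pulses in ((0, 1), (1, 2)):
        for _ in range(n_per_class):
            sample = make_pulse_sample(num_pulses, rng=rng)
            data.append([[x] for x in sample])  # [T, D] with D = 1
            labels.append(label)
    return data, labels
```

Counting rising edges with `count_pulses` recovers the class of each generated sample, which is a quick sanity check before writing such data to disk.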

**Imports**

```python
from pathlib import Path

from soen_toolkit.training.trainers.experiment import run_from_config
```

**Training**

We’ll use the example model and dataset to launch a local test training run. You can experiment by modifying the training YAML file as needed. For more detailed configurations, see: `src/soen_toolkit/training/examples/training_configs`.

Additional information about the training process can be found in: `src/soen_toolkit/training/README.md`.

If you want to construct your own datasets, use the HDF5 file format. Instructions are in `docs/DATASETS.md`.

```python
# Launch training via Python API
run_from_config("training/training_configs/pulse_net.yaml", script_dir=Path.cwd())
```

**View logs in TensorBoard (Optional)**

Start TensorBoard in a terminal so you can watch metrics live.

1. Activate your environment (if it is not already active).
2. Run TensorBoard, pointing it at the logs root printed above (the line starting with "Logs root:"):
```bash
tensorboard --logdir "/path/to/logs/root"
```

---


### Quick Notes on Datasets

Models trained with `soen_toolkit.training` expect datasets in **HDF5 format** with the following structure:

- **Inputs** (`data`): `[N, T, D]`  
  - `N`: number of samples  
  - `T`: sequence length  
  - `D`: feature dimension (must equal the number of units in the input layer, i.e. the layer with ID 0)

- **Labels** (`labels`): shape depends on the task  
  - Classification (seq2static): `[N]` (int64 class indices)  
  - Classification (seq2seq): `[N, T]` (int64 per-timestep classes)  
  - Regression (seq2static): `[N, K]` (float32)  
  - Regression (seq2seq): `[N, T, K]` (float32)  
  - Unsupervised (seq2seq): labels optional; inputs are used as targets  
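The shape rules above can be written down as a small lookup, which is handy for checking arrays before they are saved. This is a sketch only: `expected_label_spec` and its `task` strings are hypothetical names, not part of `soen_toolkit`, and `K` (which the list leaves undefined) is assumed here to be the number of regression outputs.

```python
def expected_label_spec(task, mapping, n, t, k=None):
    """Return (shape, dtype) for the labels array, or None if labels are optional.

    Hypothetical helper mirroring the list above; not part of soen_toolkit.
    """
    if task == "unsupervised":
        # Labels are optional; the inputs are used as targets.
        return None
    if mapping not in ("seq2static", "seq2seq"):
        raise ValueError(f"Unknown mapping: {mapping}")
    if task == "classification":
        shape = (n,) if mapping == "seq2static" else (n, t)
        return shape, "int64"
    if task == "regression":
        if k is None:
            raise ValueError("Regression needs k, the number of outputs")
        shape = (n, k) if mapping == "seq2static" else (n, t, k)
        return shape, "float32"
    raise ValueError(f"Unknown task: {task}")
```

For example, `expected_label_spec("classification", "seq2static", n=100, t=64)` returns `((100,), "int64")`, matching the one-pulse versus two-pulse task in this tutorial.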

**Recommended layout:**

- `root/`
  - `train/{data, labels}`
  - `val/{data, labels}`
  - `test/{data, labels}`
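As a rough illustration of this layout, the sketch below writes a zero-filled toy file with `h5py` and NumPy. Several things are assumptions rather than documented behaviour: `root/` is read as the top level of the HDF5 file, inputs are stored as `float32` (the input dtype is not stated above), and the sizes and the `write_toy_dataset` name are arbitrary. See `docs/DATASETS.md` for the authoritative instructions.

```python
import os
import tempfile

import h5py
import numpy as np


def write_toy_dataset(path, n=8, t=16, d=1):
    """Write zero-filled train/val/test groups, each with data and labels."""
    with h5py.File(path, "w") as f:
        for split in ("train", "val", "test"):
            group = f.create_group(split)
            # Inputs: [N, T, D]. float32 is an assumption, not a documented requirement.
            group.create_dataset("data", data=np.zeros((n, t, d), dtype=np.float32))
            # Labels for seq2static classification: [N] int64 class indices.
            group.create_dataset("labels", data=np.zeros(n, dtype=np.int64))


# Write the file to a temporary directory and read the shapes back.
path = os.path.join(tempfile.mkdtemp(), "toy_dataset.h5")
write_toy_dataset(path)
with h5py.File(path, "r") as f:
    train_shape = f["train/data"].shape
    label_dtype = f["train/labels"].dtype
print(train_shape, label_dtype)
```

Reading the shapes back right after writing is a cheap way to confirm that `D` matches the input layer size before starting a training run.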

**Key config notes:**
- Set `training.paradigm` and `training.mapping` in your YAML (e.g., `supervised` + `seq2static`).  
- Use `data.target_seq_len` to align input/output sequence lengths.  
- Pooling for seq2static tasks is controlled via `model.time_pooling`.
