[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Deep Learning Methods

## Deep Learning - Convolution Neural Network - 1D Convolution Net for Frequency Estimation

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.1.001 | 19/09/2025 | Royi Avital | Updated link of Google Colab                                       |
| 1.1.000 | 17/09/2025 | Royi Avital | Added visualizations on the Training of a model                    |
| 1.0.000 | 29/04/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/DeepLearningMethods/2025_08/0003DeepLearning1DConvFreqEst.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.datasets import fetch_california_housing

# Deep Learning
import torch
import torch.nn            as nn
import torch.nn.functional as F
from torch.optim.optimizer import Optimizer
from torch.optim.lr_scheduler import LRScheduler
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from torchmetrics.regression import R2Score
import torchinfo

# Miscellaneous
from platform import python_version
import random
import time

# Typing
from typing import Callable, Dict, Generator, List, Optional, Self, Set, Tuple, Union

# Visualization
import matplotlib.pyplot as plt

# Jupyter
from IPython import get_ipython

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

# Improve performance by benchmarking
torch.backends.cudnn.benchmark = True

# Reproducibility
# torch.manual_seed(seedNum)
# torch.backends.cudnn.deterministic = True
# torch.backends.cudnn.benchmark     = False

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

D_CLASSES_CIFAR_10  = {0: 'Airplane', 1: 'Automobile', 2: 'Bird', 3: 'Cat', 4: 'Deer', 5: 'Dog', 6: 'Frog', 7: 'Horse', 8: 'Ship', 9: 'Truck'}
L_CLASSES_CIFAR_10  = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']
T_IMG_SIZE_CIFAR_10 = (32, 32, 3)

DATA_FOLDER_PATH    = 'Data'
TENSOR_BOARD_BASE   = 'TB'

In [None]:
# Download Auxiliary Modules for Google Colab
if runInGoogleColab:
    import os
    for fileName in ('DataManipulation.py', 'DataVisualization.py', 'DeepLearningBlocks.py', 'DeepLearningPyTorch.py'):
        if os.path.exists(fileName):
            continue
        os.system(f'wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/DeepLearningMethods/2025_08/{fileName}')

In [None]:
# Courses Packages

from DataVisualization import PlotRegressionResults
from DeepLearningPyTorch import NNMode

In [None]:
# General Auxiliary Functions

def GenHarmonicData( numSignals: int, numSamples: int, samplingFreq: float, maxFreq: float, σ: float ) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Generate a set of harmonic (Sine) signals with noise.
    Input:
        - numSignals: Number of signals.
        - numSamples: Number of samples per signal.
        - samplingFreq: Sampling frequency of the samples.
        - maxFreq: Maximum frequency of the signals. Must obey `maxFreq < 0.5 * samplingFreq`.
        - σ: The noise Standard Deviation.
    Output:
        - mX: Matrix of signals. Shape: (numSignals, numSamples).
        - vF: Vector of frequencies. Shape: (numSignals,).
    Remark:
        - Sampling by Nyquist.
        - The signals are generated as:
            x_i(t) = sin(2π f_i t + φ_i) + n_i(t)
    """

    π = torch.pi #<! Constant Pi

    vT   = torch.linspace(0, numSamples - 1, numSamples) / samplingFreq #<! Time samples
    vF   = maxFreq * torch.rand(numSignals)                             #<! Frequency per signal
    vPhi = 2 * π * torch.rand(numSignals)                               #<! Phase per signal
    
    # x_i(t) = sin(2π f_i t + φ_i) + n_i(t)
    # mX = torch.zeros((numSignals, numSamples))
    # for ii in range(numSignals):
    #     mX[ii] = torch.sin(2 * π * vF[ii] * vT + vPhi[ii])
    # mX = torch.sin(2 * π * vF[:, None] @ vT[None, :] + vPhi[:, None])
    mX = torch.sin(2 * π * torch.outer(vF, vT) + vPhi[:, None])
    mX = mX + σ * torch.randn(mX.shape) #<! Add noise

    return mX, vF

## Frequency Estimation with 1D Convolution Model in PyTorch

This notebook **estimates the frequency** of a given set of samples of a _Harmonic Signal_.

The notebook presents:

 * Use of convolution layers in PyTorch.
 * Use of pool layers in PyTorch.
 * Use of adaptive pool layer in PyTorch.  
   The motivation is to set a constant output size regardless of input.
 * Use the model for inference on the test data.

<br>

 * <font color='brown'>(**#**)</font> While the layers are called _Convolution Layer_ they actually implement correlation.  
   Since the weights are learned, in practice it makes no difference, as _Correlation_ is convolution with a flipped kernel.



* <font color='red'>(**?**)</font> What kind of a problem is frequency estimation?
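
The note above on correlation vs. convolution can be checked numerically. The following is a minimal sketch, assuming an arbitrary toy signal and kernel (`vSig` and `vKer` are illustrative names, not part of the notebook), with NumPy's `np.correlate()` and `np.convolve()` used as references:

```python
import numpy as np
import torch
import torch.nn.functional as F

vSig = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]) #<! Toy signal (Arbitrary values)
vKer = torch.tensor([1.0, 0.0, -2.0])          #<! Toy kernel (Arbitrary values)

# PyTorch "convolution": the kernel is not flipped -> Cross correlation
vCorr = F.conv1d(vSig.view(1, 1, -1), vKer.view(1, 1, -1)).flatten()
# Flipping the kernel before the call yields the textbook convolution
vConv = F.conv1d(vSig.view(1, 1, -1), vKer.flip(0).view(1, 1, -1)).flatten()

vCorrRef = np.correlate(vSig.numpy(), vKer.numpy(), mode = 'valid')
vConvRef = np.convolve(vSig.numpy(), vKer.numpy(), mode = 'valid')

print(f'Matches correlation: {np.allclose(vCorr.numpy(), vCorrRef)}')
print(f'Flipped kernel matches convolution: {np.allclose(vConv.numpy(), vConvRef)}')
```

Since the kernel is learned, the optimizer can simply converge to the flipped version, hence the distinction has no practical effect on the model.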

In [None]:
# Parameters

# Data
numSignalsTrain = 15_000 #<! Tune model's parameters
numSignalsVal   = 5_000  #<! Tune Hyper Parameters, Evaluate real world performance
numSignalsTest  = 5_000  #<! Real World performance

numSamples = 500 #<! Samples in Signal

maxFreq      = 10.0  #<! [Hz]
samplingFreq = 100.0 #<! [Hz]

σ = 0.1 #<! Noise Std

# Model
dropP = 0.1 #<! Dropout Layer

# Training
batchSize   = 256
numWork     = 2 #<! Number of workers
nEpochs     = 20

# Visualization
numSigPlot = 5

## Generate / Load Data

This section generates the data from the following model:

$$\boldsymbol{x}_{i} \left( t \right) = \sin \left( 2 \pi {f}_{i} t + \phi_{i} \right) + \boldsymbol{n}_{i} \left( t \right) $$

 * $\boldsymbol{x}_{i}$ - Set of input samples (`numSamples`) of the $i$ -th _Harmonic Signal_.
 * ${f}_{i}$ - The signal frequency in the range `[0, maxFreq]`.
 * $t$ - Time sample.
 * $\phi_{i}$ - The signal phase in the range `[0, 2π]`.
 * $\boldsymbol{n}_{i}$ - Set of noise samples (`numSamples`) of the $i$ -th _Harmonic Signal_.  
   The noise standard deviation is set to `σ`.

The model input data is in the format ${N}_{b} \times {C} \times N$:

 * ${N}_{b}$ - Number of signals in the batch (`batchSize`).
 * $C$ - Number of channels (The signal is a single channel).
 * $N$ - Number of samples per signal (`numSamples`).

<br>

* <font color='brown'>(**#**)</font> Since it is a generated data set, data can be generated at will.
* <font color='brown'>(**#**)</font> The signal is 1D with a single channel. EEG signals, Stereo Audio signals and ECG signals are examples of multi channel 1D signals.  
  The concept of 1D in this context is the index which the data is generated along.
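
To illustrate the ${N}_{b} \times {C} \times N$ convention for a multi channel signal, here is a small sketch with a hypothetical stereo batch. The sizes (`numSig`, `numCh`, `numSmp`) and the layer configuration are assumptions chosen for the illustration only:

```python
import torch
import torch.nn as nn

numSig, numCh, numSmp = 8, 2, 500 #<! Hypothetical batch of stereo signals
tXDemo = torch.randn(numSig, numCh, numSmp) #<! Shape: (N_b, C, N)

# The convolution runs along the last axis (N) and mixes the channels (C)
oConv = nn.Conv1d(in_channels = numCh, out_channels = 4, kernel_size = 11)
tZDemo = oConv(tXDemo)

print(f'Input shape : {tuple(tXDemo.shape)}')
print(f'Output shape: {tuple(tZDemo.shape)}') #<! (N_b, 4, N - 11 + 1)
print(f'Kernel shape: {tuple(oConv.weight.shape)}') #<! (Out channels, In channels, Kernel size)
```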

In [None]:
# Generate Data

mXTrain, vYTrain = GenHarmonicData(numSignalsTrain, numSamples, samplingFreq, maxFreq, σ) #<! Train Data
mXVal, vYVal     = GenHarmonicData(numSignalsVal, numSamples, samplingFreq, maxFreq, σ)   #<! Validation Data
mXTest, vYTest   = GenHarmonicData(numSignalsTest, numSamples, samplingFreq, maxFreq, σ)  #<! Test Data

vT = np.linspace(0, numSamples - 1, numSamples) / samplingFreq #<! For plotting

mX = torch.concatenate((mXTrain, mXVal), axis = 0)
vY = torch.concatenate((vYTrain, vYVal), axis = 0)

print(f'The features data shape: {mX.shape}')
print(f'The labels data shape: {vY.shape}')

* <font color='red'>(**?**)</font> What is the content of `vY` above? Explain its shape.

### Plot Data

In [None]:
# Plot the Data

hF, hA = plt.subplots(figsize = (14, 5))
for _ in range(numSigPlot):
    idxSig = np.random.randint(numSignalsTrain)
    vX  = mXTrain[idxSig,:]
    hA.plot(vT, vX, lw = 2, label = f'$f_i = {(vYTrain[idxSig].item()):.3f}$')

hA.set_title('Train Signals')
hA.set_xlabel('Time')
hA.set_ylabel('Sample Value')
hA.legend()
plt.show()

### Input Data

There are several ways to convert the data into the shape expected by the model convention:

```python
# Assume: mX.shape = (N, L)
mX = mX.view(N, 1, L) #<! Option I
mX = mX.unsqueeze(1)  #<! Option II
mX = mX[:, None, :]   #<! Option III
#Output: mX.shape = (N, 1, L)
```
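
A quick sanity check of the options above, as a sketch on a small made up matrix (`mXDemo` is an illustrative name). It verifies the three options produce the same tensor and that none of them copies the data:

```python
import torch

N, L = 4, 6
mXDemo = torch.arange(N * L, dtype = torch.float32).view(N, L) #<! Shape: (N, L)

tX1 = mXDemo.view(N, 1, L) #<! Option I
tX2 = mXDemo.unsqueeze(1)  #<! Option II
tX3 = mXDemo[:, None, :]   #<! Option III

print(tX1.shape, tX2.shape, tX3.shape)
print(torch.equal(tX1, tX2) and torch.equal(tX2, tX3)) #<! Same values
print(tX1.data_ptr() == mXDemo.data_ptr())             #<! Same memory (A view, not a copy)
```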

### PyTorch DataSet  

PyTorch `Dataset` is a _pseudo array_ which allows fetching samples from a storage (Memory / Hard Drive / Net / etc...).  
It must define 3 methods:

1. `__init__()`  
   Sets the configuration of the Dataset.  
   Usually used to set the path of the files on the HD and transformations of the data.
2. `__len__()`  
   Returns the number of samples available in the dataset.
3. `__getitem__()`  
   Given an index it fetches the sample and label.  
   Applies a transformation on the data / label if defined.
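
The three methods above can be sketched as a minimal in memory data set. The class name `HarmonicSignalDataset` and the `hTransform` argument are hypothetical, introduced only for this illustration. The notebook itself uses the built in `TensorDataset`:

```python
import torch
from torch.utils.data import Dataset

class HarmonicSignalDataset(Dataset): #<! Hypothetical class, for illustration only
    def __init__( self, mX: torch.Tensor, vY: torch.Tensor, hTransform = None ) -> None:
        # Configuration: The data is already in memory, keep references
        self.mX         = mX
        self.vY         = vY
        self.hTransform = hTransform #<! Optional callable applied per sample

    def __len__( self ) -> int:
        # Number of samples available
        return self.mX.shape[0]

    def __getitem__( self, idx: int ):
        # Fetch a single sample and add the channel axis: (numSamples,) -> (1, numSamples)
        vXi = self.mX[idx][None, :]
        if self.hTransform is not None:
            vXi = self.hTransform(vXi)
        return vXi, self.vY[idx]

dsDemo = HarmonicSignalDataset(torch.randn(10, 500), torch.rand(10))
vXi, valYi = dsDemo[3]

print(f'Number of samples: {len(dsDemo)}')
print(f'Sample shape: {tuple(vXi.shape)}')
```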

In [None]:
# Data Sets

# Dataset is relatively small. Hence we can use the built in `TensorDataset` which is a subclass of `Dataset`.
dsTrain = torch.utils.data.TensorDataset(mXTrain.view(numSignalsTrain, 1, -1), vYTrain) #<! -1 -> Infer
dsVal   = torch.utils.data.TensorDataset(mXVal.view(numSignalsVal, 1, -1), vYVal)
dsTest  = torch.utils.data.TensorDataset(mXTest.view(numSignalsTest, 1, -1), vYTest)

* <font color='red'>(**?**)</font> Does the data require standardization? Why?

### Train by Epochs

In Deep Learning the model is usually trained on batches of the data.  
The motivations are:

 * Memory Limitations.
 * Speed.
 * Regularization (Avoid Overfit).

An _Epoch_ is a set of batches which together cover the whole data set.

* <font color='brown'>(**#**)</font> If a batch is the size of the whole data set, each iteration is an _Epoch_.
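
As a worked example of the arithmetic, the number of iterations per epoch for the values used in this notebook (15,000 train signals, batch size of 256) can be computed as follows. The variable names of the results are illustrative:

```python
import math

numSignalsTrain = 15_000 #<! As in the parameters of this notebook
batchSize       = 256    #<! As in the parameters of this notebook

numIterDropLast = numSignalsTrain // batchSize           #<! Partial last batch is dropped (`drop_last = True`)
numIterKeepLast = math.ceil(numSignalsTrain / batchSize) #<! Partial last batch is kept (`drop_last = False`)
lastBatchSize   = numSignalsTrain - numIterDropLast * batchSize

print(f'Iterations per epoch (Drop last): {numIterDropLast}') #<! 58
print(f'Iterations per epoch (Keep last): {numIterKeepLast}') #<! 59
print(f'Size of the partial last batch  : {lastBatchSize}')   #<! 152
```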

<!-- ![Number Iterations per Epoch for a Batch Size](https://i.imgur.com/XvK4QtL.png)
 
 
 * Credit to [Chandra Prakash Bathula - Demystifying Epoch in Machine Learning: Unleashing the Power of Iterative Learning](https://scribe.rip/979f4ae5a5b6). -->

![Number Iterations per Epoch for a Batch Size](https://i.imgur.com/HLoYAna.png)

### PyTorch Data Loaders

PyTorch data loaders (`DataLoader`) are _Iterators_ which iterate over an _Epoch_ given a `Dataset`.  
They are optimized to work in parallel in order to keep the accelerator (GPU) fed.

In [None]:
# Data Loaders

# Data is small, no real need for workers
dlTrain = torch.utils.data.DataLoader(dsTrain, shuffle = True, batch_size = 1 * batchSize, num_workers = numWork, drop_last = True, persistent_workers = True)
dlVal   = torch.utils.data.DataLoader(dsVal, shuffle = False, batch_size = 2 * batchSize, num_workers = numWork, persistent_workers = True)
dlTest  = torch.utils.data.DataLoader(dsTest, shuffle = False, batch_size = 2 * batchSize, num_workers = numWork, persistent_workers = True)

In [None]:
# Iterate on the Loader
# The first batch.
tX, vY = next(iter(dlTrain)) #<! PyTorch Tensors

print(f'The batch features dimensions: {tX.shape}')
print(f'The batch labels dimensions: {vY.shape}')

## Train a Model

During training there are 2 operational states:

 * Training Mode  
   Iterating over the _Training Data_ in order to optimize the model weights.  
   The computational graph and the layers are set to _Train Mode_.
 * Evaluation Mode  
   Iterating over the _Validation Data_ in order to evaluate the model.
   The computational graph and the layers are set to _Inference Mode_.

### Training Mode Iteration

![](https://i.imgur.com/m4IKzHU.png)
<!-- ![](https://i.postimg.cc/NF08G3TR/Diagrams-Deep-Learning001.png) -->

The training mode iteration is composed of:

 1. Fetching the iteration data (Batch) using the _Data Loader_ of the _Training Data_.
 2. Executing a _Forward Pass_ of the model using the batch.  
    The pass generates the intermediate values of the Computational Graph and the model output.
 3. Running the _Loss Function_ to calculate the _loss_ of the iteration.  
    Using the data labels and the model output, the _Loss Function_ calculates the loss and the _Gradient_ of the loss with respect to the model output.
 4. Executing a _Backward Pass_ using the _Gradient_ of the loss with respect to the model output.  
    The pass propagates the gradient backwards to generate the gradient of the loss with respect to each parameter of the model.
 5. Executing an _Optimization Step_.  
    The _Optimizer_ applies an update rule to tune the model weights.

* <font color='brown'>(**#**)</font> One may aggregate several iterations into a single optimization step (_Gradient Accumulation_).  
  It is mostly used when there are some memory constraints on the batch size.
* <font color='brown'>(**#**)</font> _Schedulers_ may be used in order to make the Optimizer adaptive to the iteration index or the loss / score curve.
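
The _Gradient Accumulation_ remark can be sketched as follows. This is a minimal illustration on a made up linear model and random data (`oNet`, `mXDemo`, `vYDemo`, `numAcc` are assumptions for the example). It shows that, for equally sized micro batches and a mean reduced loss, summing the scaled gradients reproduces the gradient of the full batch:

```python
import torch
import torch.nn as nn

oNet   = nn.Linear(4, 1) #<! Toy model
hLoss  = nn.MSELoss()    #<! Mean reduction
mXDemo = torch.randn(8, 4)
vYDemo = torch.randn(8, 1)

# Reference: A single pass over the full batch
oNet.zero_grad()
hLoss(oNet(mXDemo), vYDemo).backward()
mGradRef = oNet.weight.grad.clone()

# Accumulation: 4 micro batches of 2 samples each
numAcc = 4
oNet.zero_grad()
for mXi, vYi in zip(mXDemo.chunk(numAcc), vYDemo.chunk(numAcc)):
    valLossI = hLoss(oNet(mXi), vYi) / numAcc #<! Scale to match the mean over the full batch
    valLossI.backward()                       #<! Gradients are summed into `.grad`
# A single optimizer step (`oOpt.step()`) would follow here

print(torch.allclose(mGradRef, oNet.weight.grad, atol = 1e-5))
```

The memory footprint is that of a micro batch while the update matches the larger batch. Layers which depend on batch statistics (Such as Batch Normalization) will still see the micro batch.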

### Evaluation Mode Iteration

An evaluation is commonly run once every few iterations of training.

![](https://i.imgur.com/M5zIs5K.png)
<!-- ![](https://i.postimg.cc/YqvKDcwJ/Diagrams-Deep-Learning002.png) -->

1. Fetching the iteration data (Batch) using the _Data Loader_ of the _Validation Data_.
2. Executing a _Forward Pass_ of the model using the batch.  
   The pass generates the model output.
3. Running the _Score Function_ to calculate the _Score_ of the iteration.  
   Using the data labels and the model output, the _Score Function_ calculates the score.
4. Early Stopping (Optional)  
   Using the curve of the score one may save the best model and decide on stopping the training process.


* <font color='brown'>(**#**)</font> Commonly an evaluation iteration happens after each Epoch.  
  Yet it should be defined according to the time budget and the computational cost of each iteration.
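
The _Early Stopping_ step can be sketched with a simple patience rule. The score values and the `patience` parameter below are made up for the illustration. The training function in this notebook only keeps the best checkpoint and does not stop early:

```python
lScoreDemo = [0.10, 0.45, 0.62, 0.61, 0.60, 0.63, 0.62, 0.61, 0.60] #<! Made up validation scores
patience   = 3 #<! Hypothetical: Number of epochs to wait for an improvement

bestScore, bestEpoch = float('-inf'), -1
for ii, valScr in enumerate(lScoreDemo):
    if valScr > bestScore:
        bestScore, bestEpoch = valScr, ii #<! Here one would save a checkpoint
    elif (ii - bestEpoch) >= patience:
        print(f'Stopping at epoch {ii}, best epoch: {bestEpoch}, best score: {bestScore:0.2f}')
        break
```

For the values above the loop stops at epoch index 8, with the best score (0.63) achieved at epoch index 5.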

<br>

The rest of the notebook goes through a suggested workflow of training a model.

### Define the Model

The model is defined as a sequential model.


In [None]:
# Model
# Defining a sequential model.

numFeatures = mX.shape[1]

def GetModel( ) -> nn.Module:
    oModel = nn.Sequential(
        nn.Identity(),
        
        nn.Conv1d(in_channels = 1,   out_channels = 32,  kernel_size = 11), nn.MaxPool1d(kernel_size = 2), nn.ReLU(),
        nn.Conv1d(in_channels = 32,  out_channels = 64,  kernel_size = 11), nn.MaxPool1d(kernel_size = 2), nn.ReLU(),
        nn.Conv1d(in_channels = 64,  out_channels = 128, kernel_size = 11), nn.MaxPool1d(kernel_size = 2), nn.ReLU(),
        nn.Conv1d(in_channels = 128, out_channels = 256, kernel_size = 11), nn.MaxPool1d(kernel_size = 2), nn.ReLU(),
                
        nn.AdaptiveAvgPool1d(output_size = 1),
        nn.Flatten          (),
        nn.Linear           (in_features = 256, out_features = 1),
        nn.Flatten          (start_dim = 0),
        )
    
    return oModel

* <font color='brown'>(**#**)</font> The [`torch.nn.AdaptiveAvgPool1d`](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool1d.html) layer yields the same output shape regardless of the input length.
* <font color='red'>(**?**)</font> What is the role of the [`torch.nn.Flatten`](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html) layers?
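
To see what the adaptive pooling layer receives, the length of the signal along the convolution blocks can be traced by hand. The following sketch uses a hypothetical helper, `CalcOutLen()`, which assumes the configuration of the model above: 4 blocks of `Conv1d` (Kernel size 11, no padding, stride 1) followed by `MaxPool1d` (Kernel size 2):

```python
def CalcOutLen( sigLen: int, numBlocks: int = 4, kernelSize: int = 11, poolSize: int = 2 ) -> int:
    # Hypothetical helper which mirrors the Conv1d -> MaxPool1d blocks of the model
    outLen = sigLen
    for _ in range(numBlocks):
        outLen = outLen - kernelSize + 1 #<! Conv1d: No padding, stride 1
        outLen = outLen // poolSize      #<! MaxPool1d: Stride equals the kernel size, floor
    return outLen

for sigLen in (500, 200):
    print(f'Input length: {sigLen} -> Length before the adaptive pooling: {CalcOutLen(sigLen)}')
```

For 500 samples the length before the adaptive pooling is 21, while for 200 samples it is 3. In both cases `AdaptiveAvgPool1d(output_size = 1)` reduces it to 1, so the `Linear` layer always sees 256 features.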

In [None]:
# Model Summary

oModel = GetModel()
torchinfo.summary(oModel, tX.shape, col_names = ['input_size', 'output_size', 'num_params'], device = 'cpu')

* <font color='brown'>(**#**)</font> Pay attention: the dropout parameter of PyTorch is the probability of zeroing out a value (Not the probability of keeping it).

In [None]:
# Run Model
# Apply a test run.

mXX = torch.randn(batchSize, numSamples)
mXX = mXX.view(batchSize, 1, numSamples)
with torch.inference_mode():
    vYHat = oModel(mXX)

print(f'The input dimensions: {mXX.shape}')
print(f'The output dimensions: {vYHat.shape}')

### Training Loop

Use the training and validation samples.  
The objective will be defined as the Mean Squared Error and the score as ${R}^{2}$.

* <font color='red'>(**?**)</font> Will the best model loss wise be the model with the best score?  
  Explain specifically and generally (For other loss and scores).
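
As a reference for the two quantities, the following sketch computes the MSE and the ${R}^{2}$ score by their definitions on made up data (`vYDemo` and `vYHatDemo` are illustrative, not the notebook's data). It also verifies the identity ${R}^{2} = 1 - \frac{\text{MSE}}{\operatorname{Var} \left( \boldsymbol{y} \right)}$ when both are evaluated on the same set:

```python
import numpy as np

oRng      = np.random.default_rng(0)
vYDemo    = oRng.uniform(0.0, 10.0, size = 1000)             #<! Made up targets
vYHatDemo = vYDemo + 0.5 * oRng.standard_normal(size = 1000) #<! Made up predictions

valMse = np.mean(np.square(vYDemo - vYHatDemo))
ssRes  = np.sum(np.square(vYDemo - vYHatDemo))      #<! Residual sum of squares
ssTot  = np.sum(np.square(vYDemo - np.mean(vYDemo))) #<! Total sum of squares
valR2  = 1.0 - (ssRes / ssTot)

print(f'MSE: {valMse:0.4f}, R2: {valR2:0.4f}')
print(np.isclose(valR2, 1.0 - valMse / np.var(vYDemo)))
```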

In [None]:
def RunEpoch( 
        oModel: nn.Module, 
        dlData: DataLoader, 
        hL: Callable, 
        hS: Callable, 
        oOpt: Optional[Optimizer] = None, 
        opMode: NNMode = NNMode.TRAIN 
    ) -> Tuple[float, float]:
    """
    Runs a single Epoch (Train / Test) of a model.  
    Input:
        oModel      - PyTorch `nn.Module` object.
        dlData      - PyTorch `Dataloader` object.
        hL          - Callable for the Loss function.
        hS          - Callable for the Score function.
        oOpt        - PyTorch `Optimizer` object.
        opMode      - An `NNMode` to set the mode of operation.
    Output:
        valLoss     - Scalar of the loss.
        valScore    - Scalar of the score.
    Remarks:
      - The `dlData` object returns a Tuple of (mX, vY) per batch.
      - The `hL` function should accept `mZ` (Output of the NN) and `vY` (Reference target).  
        It should return a scalar tensor `valLoss` which supports `backward()`.
      - The `hS` function should accept `mZ` (Output of the NN) and `vY` (Reference target).  
        It should return a scalar `valScore` of the score.
      - The optimizer is required for training mode.
    """
    
    epochLoss   = 0.0
    epochScore  = 0.0
    numSamples  = 0
    numBatches = len(dlData)

    runDevice = next(oModel.parameters()).device #<! CPU \ GPU

    if opMode == NNMode.TRAIN:
        oModel.train(True) #<! Equivalent of `oModel.train()`
        trainMode = True
    elif opMode == NNMode.INFERENCE:
        oModel.eval() #<! Equivalent of `oModel.train(False)`
        trainMode = False
    else:
        raise ValueError(f'The `opMode` value {opMode} is not supported!')
    
    for ii, (mX, vY) in enumerate(dlData):
        # Move Data to Model's device
        mX = mX.to(runDevice) #<! Lazy
        vY = vY.to(runDevice) #<! Lazy

        batchSize = mX.shape[0]
        
        if opMode == NNMode.TRAIN:
            # Forward
            mZ      = oModel(mX) #<! Model output
            valLoss = hL(mZ, vY) #<! Loss
            
            # Backward
            oOpt.zero_grad()    #<! Set gradients to zeros
            valLoss.backward()  #<! Backward
            oOpt.step()         #<! Update parameters
            oModel.eval()       #<! Inference mode for layers
        else: #<! Value of `opMode` was already validated
            with torch.inference_mode(): #<! The `torch.inference_mode()` scope is more optimized than `torch.no_grad()` 
                # No computational graph
                mZ      = oModel(mX) #<! Model output
                valLoss = hL(mZ, vY) #<! Loss

        with torch.inference_mode():
            # Score
            oModel.eval() #<! Ensure Evaluation Mode (Dropout / Normalization layers)
            valScore = hS(mZ, vY)
            # Normalize so each sample has the same weight
            epochLoss  += batchSize * valLoss.item()
            epochScore += batchSize * valScore.item()
            numSamples += batchSize
            oModel.train(trainMode) #<! Restore original mode

        print(f'\r{"Train" if trainMode else "Val"} - Iteration: {(ii + 1):3d} / {numBatches}, Loss: {valLoss:.6f}', end = '')
    
    print('', end = '\r')
            
    return epochLoss / numSamples, epochScore / numSamples

In [None]:
# Training Model Loop Function

def TrainModel( 
        oModel: nn.Module, 
        dlTrain: DataLoader, 
        dlVal: DataLoader, 
        oOpt: Optimizer, 
        numEpoch: int, 
        hL: Callable, 
        hS: Callable, *, 
        oSch: Optional[LRScheduler] = None, 
        oTBWriter: Optional[SummaryWriter] = None
    ) -> Tuple[nn.Module, List, List, List, List]:
    """
    Trains a model given train and validation data loaders.  
    Input:
        oModel      - PyTorch `nn.Module` object.
        dlTrain     - PyTorch `Dataloader` object (Training).
        dlVal       - PyTorch `Dataloader` object (Validation).
        oOpt        - PyTorch `Optimizer` object.
        numEpoch    - Number of epochs to run.
        hL          - Callable for the Loss function.
        hS          - Callable for the Score function.
        oSch        - PyTorch `Scheduler` (`LRScheduler`) object.
        oTBWriter   - PyTorch `SummaryWriter` object (TensorBoard).
    Output:
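        oModel      - The trained model.
        lTrainLoss  - List of the training loss per epoch.
        lTrainScore - List of the training score per epoch.
        lValLoss    - List of the validation loss per epoch.
        lValScore   - List of the validation score per epoch.
        lLearnRate  - List of the learning rate per epoch.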
    Remarks:
      - The `dlTrain` and `dlVal` objects return a Tuple of (mX, vY) per batch.
      - The `hL` function should accept `mZ` (Output of the NN) and `vY` (Reference target).  
        It should return a scalar tensor `valLoss` which supports `backward()`.
      - The `hS` function should accept `mZ` (Output of the NN) and `vY` (Reference target).  
        It should return a scalar `valScore` of the score.
      - The optimizer is required for training mode.
    """

    lTrainLoss  = []
    lTrainScore = []
    lValLoss    = []
    lValScore   = []
    lLearnRate  = []

    # Support R2
    bestScore = -1e9 #<! Assuming higher is better

    learnRate = oOpt.param_groups[0]['lr']

    for ii in range(numEpoch):
        startTime           = time.time()
        trainLoss, trainScr = RunEpoch(oModel, dlTrain, hL, hS, oOpt, opMode = NNMode.TRAIN) #<! Train
        valLoss,   valScr   = RunEpoch(oModel, dlVal, hL, hS, oOpt, opMode = NNMode.INFERENCE) #<! Score Validation
        if oSch is not None:
            # Adjusting the scheduler on Epoch level
            learnRate = oSch.get_last_lr()[0]
            oSch.step()
        epochTime           = time.time() - startTime

        # Aggregate Results
        lTrainLoss.append(trainLoss)
        lTrainScore.append(trainScr)
        lValLoss.append(valLoss)
        lValScore.append(valScr)
        lLearnRate.append(learnRate)

        if oTBWriter is not None:
            oTBWriter.add_scalars('Loss (Epoch)', {'Train': trainLoss, 'Validation': valLoss}, ii)
            oTBWriter.add_scalars('Score (Epoch)', {'Train': trainScr, 'Validation': valScr}, ii)
            oTBWriter.add_scalar('Learning Rate', learnRate, ii)
        
        # Display (Babysitting)
        print('Epoch '              f'{(ii + 1):4d} / ' f'{numEpoch}', end = '')
        print(' | Train Loss: '     f'{trainLoss          :6.3f}', end = '')
        print(' | Val Loss: '       f'{valLoss            :6.3f}', end = '')
        print(' | Train Score: '    f'{trainScr           :6.3f}', end = '')
        print(' | Val Score: '      f'{valScr             :6.3f}', end = '')
        print(' | Epoch Time: '     f'{epochTime          :5.2f}', end = '')

        # Save best model ("Early Stopping")
        if valScr > bestScore:
            bestScore = valScr
            try:
                dCheckPoint = {'Model': oModel.state_dict(), 'Optimizer': oOpt.state_dict()}
                if oSch is not None:
                    dCheckPoint['Scheduler'] = oSch.state_dict()
                torch.save(dCheckPoint, 'BestModel.pt')
                print(' | <-- Checkpoint!', end = '')
            except Exception: #<! Avoid catching `KeyboardInterrupt` / `SystemExit`
                print(' | <-- Failed!', end = '')
        print(' |')
    
    # Load best model ("Early Stopping")
    # dCheckPoint = torch.load('BestModel.pt')
    # oModel.load_state_dict(dCheckPoint['Model'])

    return oModel, lTrainLoss, lTrainScore, lValLoss, lValScore, lLearnRate

### Training Phase

In [None]:
# Check GPU Availability

runDevice = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') #<! The 1st CUDA device
oModel    = oModel.to(runDevice) #<! Transfer model to device

In [None]:
# Set the Loss & Score

hL = nn.MSELoss() #<! Mean Squared Error
hS = R2Score(multioutput = 'uniform_average') #<! R² Score
hS = hS.to(runDevice)

In [None]:
# Define Optimizer

oOpt = torch.optim.AdamW(oModel.parameters(), lr = 1e-4, betas = (0.9, 0.99), weight_decay = 1e-5) #<! Define optimizer

In [None]:
# Train the Model

oRunModel, lTrainLoss, lTrainScore, lValLoss, lValScore, _ = TrainModel(oModel, dlTrain, dlVal, oOpt, nEpochs, hL, hS) #<! Validation data, the test data is kept for the final evaluation

In [None]:
# Plot Results
hF, vHa = plt.subplots(nrows = 1, ncols = 2, figsize = (12, 6))
vHa = vHa.flat

hA = vHa[0]
hA.plot(lTrainLoss, lw = 2, label = 'Train Loss')
hA.plot(lValLoss, lw = 2, label = 'Validation Loss')
hA.grid()
hA.set_title('MSE Loss')
hA.set_xlabel('Epoch Index')
hA.set_ylabel('Loss')
hA.legend();


hA = vHa[1]
hA.plot(lTrainScore, lw = 2, label = 'Train Score')
hA.plot(lValScore, lw = 2, label = 'Validation Score')
hA.grid()
hA.set_title('R2 Score')
hA.set_xlabel('Epoch Index')
hA.set_ylabel('Score')
hA.legend();

## Results Analysis 

This section runs the model on the test data and analyzes the results.

### Test Data Results

In [None]:
# Run on Test Data
lYY     = []
lYYHat  = []
with torch.inference_mode():
    for tXX, vYY in dlTest:
        tXX = tXX.to(runDevice)
        lYY.append(vYY)
        lYYHat.append(oModel(tXX))

vYY    = torch.cat(lYY, dim = 0).cpu().numpy()
vYYHat = torch.cat(lYYHat, dim = 0).cpu().numpy()

* <font color='brown'>(**#**)</font> One could run the above using `mXTest`.  
  The motivation is to show the general approach, which can also handle large data sets.

In [None]:
# Plot Regression Result

# Plot the Data

scoreR2 = hS(torch.tensor(vYYHat, device = runDevice), torch.tensor(vYY, device = runDevice)) #<! Order: (Prediction, Target), on the device of `hS`

hF, hA = plt.subplots(figsize = (14, 5))
hA = PlotRegressionResults(vYY, vYYHat, hA = hA)
hA.set_title(f'Test Data Set, R2 = {scoreR2:0.2f}')
hA.grid()
hA.set_xlabel('Input Frequency')
hA.set_ylabel('Estimated Frequency')

plt.show()

* <font color='red'>(**?**)</font> Can you find where the model struggles?
* <font color='red'>(**?**)</font> Can it handle shorter signals? For example, 200 samples. How?
* <font color='red'>(**?**)</font> How will it generalize to cases with frequency above `maxFreq`?

### Extended Test Set

This section shows the performance of the model on data with frequencies spanning the range `[0, 2 * maxFreq]`.

In [None]:
# Generate Data

mXTest, vYTest = GenHarmonicData(numSignalsTest, numSamples, samplingFreq, 2 * maxFreq, σ)  #<! Test Data

In [None]:
# Run on Test Data

with torch.inference_mode(): #<! No Computational Graph
    vYTestHat = oModel(mXTest.view(numSignalsTest, 1, -1).to(runDevice))

vYTestHat = vYTestHat.cpu()

In [None]:
# Plot Regression Result

# Plot the Data

scoreR2 = hS(vYTestHat.to(runDevice), vYTest.to(runDevice)) #<! Order: (Prediction, Target), on the device of `hS`

hF, hA = plt.subplots(figsize = (14, 5))
hA = PlotRegressionResults(vYTest.cpu().numpy(), vYTestHat.cpu().numpy(), hA = hA)
hA.set_title(f'Test Data Set, R2 = {scoreR2:0.2f}')
hA.grid()
hA.set_xlabel('Input Frequency')
hA.set_ylabel('Estimated Frequency')

plt.show()

* <font color='red'>(**?**)</font> Why does the model perform poorly?
* <font color='brown'>(**#**)</font> DL models can extrapolate and are able to generalize (Think of a model which plays Chess).  
  Yet in order to generalize, the model loss and architecture must reflect the task well.  
  For most common cases, one must validate the train set matches the production data.
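
A simple form of such a validation is checking how much of the new data falls outside the range seen in training. The sketch below assumes labels are available, which holds here only because the data is synthetic (`vFTrain` and `vFProd` are made up stand ins which follow the label model of this notebook). In practice one would usually compare statistics of the inputs instead:

```python
import torch

maxFreq = 10.0 #<! As in the parameters of this notebook
vFTrain = maxFreq * torch.rand(15_000)    #<! Stand in for the train labels: [0, maxFreq]
vFProd  = 2 * maxFreq * torch.rand(5_000) #<! Stand in for "production" labels: [0, 2 * maxFreq]

# Samples with a frequency the model has never seen during training
vOutOfRange = (vFProd < vFTrain.min()) | (vFProd > vFTrain.max())
outRatio    = vOutOfRange.float().mean().item()

print(f'Labels outside the train range: {outRatio:0.1%}')
```

About half of the extended test set lies above `maxFreq`, a region for which the model had no training examples.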