[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Deep Learning Methods

## Deep Learning for Computer Vision - 1D Convolution Net for Audio Classification

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 21/12/2025 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0085DeepLearning1DConvFreqEst.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Deep Learning
import torch
import torch.nn            as nn
import torch.nn.functional as F
from torch.optim.optimizer import Optimizer
from torch.utils.data import Dataset
from torchmetrics.regression import R2Score
import torchinfo

# Miscellaneous
import os
from platform import python_version
import random

# Typing
from typing import Callable, Dict, Generator, List, Literal, Optional, Self, Sequence, Set, Tuple, Union
from numpy.typing import NDArray
from torch import Tensor

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

# Improve performance by benchmarking
torch.backends.cudnn.benchmark = True

# Reproducibility
# torch.manual_seed(seedNum)
# torch.backends.cudnn.deterministic = True
# torch.backends.cudnn.benchmark     = False

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

DATA_FOLDER_NAME           = 'DataSets'
TENSOR_BOARD_FOLDER_NAME   = 'TB'

BASE_FOLDER_NAME = 'FixelCourses'
BASE_FOLDER_PATH = os.getcwd()[:(len(os.getcwd()) - (os.getcwd()[::-1].lower().find(BASE_FOLDER_NAME.lower()[::-1])))]

In [None]:
# Download Auxiliary Modules for Google Colab
if runInGoogleColab:
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataManipulation.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataVisualization.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DeepLearningPyTorch.py

In [None]:
# Courses Packages

from DataManipulation import DownloadUrl
from DataVisualization import PlotRegressionResults
from DeepLearningPyTorch import NNMode, TrainModel

In [None]:
# General Auxiliary Functions

# PyTorch Data Loader
class AudioMNISTDataset(Dataset):
    def __init__( self, filePath: str, *, targetCol: Literal['Digit', 'Speaker', 'Accent', 'Gender', 'NativeSpeaker'] = 'Digit', hMap: Callable[[Union[int, str]], int] ) -> None:
        """
        PyTorch Dataset class for the AudioMNIST dataset.
        Input:
            - filePath (str): Path to the Parquet file containing the AudioMNIST dataset.
            - targetCol (str): The target column to be used for classification. 
                               Options are 'Digit', 'Speaker', 'Accent', 'Gender', 'NativeSpeaker'.
            - hMap (Callable[[Union[int, str]], int]): A mapping function to convert target values to integer labels.
        """


        dfData = pd.read_parquet(filePath)

        lCol = dfData.columns.tolist()
        # Find the index of '0' in `lCol`
        signalStartIdx = lCol.index('0')

        
        self._dfData         = dfData
        self._targetCol      = targetCol
        self._numSamples     = len(dfData)
        self._hmap           = hMap
        self._signalStartIdx = signalStartIdx

    def __len__( self: Self ) -> int:
        
        return self._numSamples

    def __getitem__( self: Self, idx: int ) -> Tuple[NDArray, int]:

        valY = self._hmap(self._dfData.loc[idx, self._targetCol])
        numSamples = self._dfData.loc[idx, 'NumSamples']
        vA = self._dfData.iloc[idx, self._signalStartIdx:(self._signalStartIdx + numSamples)].to_numpy() #<! Int16 Values



        mX = 1

        return mX, valY
    
    def GenTrainValSplit( self: Self, valFraction: float = 0.2, *, shuffle: bool = True, randomSeed: Optional[int] = None ) -> Tuple[Sequence[int], Sequence[int]]:
        """
        Generates training and validation datasets from the current dataset.
        It uses the 'Speaker' column to ensure that samples from the same speaker are not split between training and validation sets.
        
        Input:
            - valFraction (float): Fraction of the dataset to be used for validation.
            - shuffle (bool): Whether to shuffle the dataset before splitting. Currently unused: speakers are always sampled at random.
            - randomSeed (Optional[int]): Random seed for shuffling.
        Output:
            Tuple[Sequence[int], Sequence[int]]: Training and validation indices (Lists of row indices).
        """

        oRng = random.Random(randomSeed)
        
        lSpeakerIdx = self._dfData['Speaker'].unique().tolist()
        numSpeakers = len(lSpeakerIdx)
        
        numValSpeakers   = int(np.floor(valFraction * numSpeakers))
        numTrainSpeakers = numSpeakers - numValSpeakers

        lTrainSpkIdx = oRng.sample(lSpeakerIdx, k = numTrainSpeakers)
        lValSpkIdx   = [spk for spk in lSpeakerIdx if spk not in lTrainSpkIdx]

        # Get indices of the samples for training and validation sets
        lTrainIdx = self._dfData[self._dfData['Speaker'].isin(lTrainSpkIdx)].index.tolist()
        lValIdx   = self._dfData[self._dfData['Speaker'].isin(lValSpkIdx)].index.tolist()

        return lTrainIdx, lValIdx
    

def DownloadUrlFile( fileUrl: str, destFolderPath: str, fileName: str ) -> str:
    """
    Downloads a file from a URL to a destination folder. 
    If the destination folder does not exist, it is created.
    If the file already exists in the destination folder, it is not downloaded again.

    Input:
        - fileUrl (str): The URL of the file to download.
        - destFolderPath (str): The destination folder path.
        - fileName (str): The name to save the file as.

    Output:
        str: The path to the downloaded file.
    """

    if not os.path.isdir(destFolderPath):
        os.makedirs(destFolderPath)
    

    filePath = os.path.join(destFolderPath, fileName)
    filePath = DownloadUrl(fileUrl, filePath)

    return filePath
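
The `__getitem__()` method above still returns a placeholder (`mX = 1`). One possible way to turn the variable length `int16` signal `vA` into a fixed size tensor is sketched below. The helper name `SignalToTensor`, the target length `numSamplesOut` and the scaling by `32768` are illustrative assumptions, not part of the course code.

```python
import numpy as np
import torch

def SignalToTensor( vA: np.ndarray, numSamplesOut: int ) -> torch.Tensor:
    # Hypothetical helper: scale, crop / zero pad and add a channel dimension
    vX = vA.astype(np.float32) / 32768.0           #<! Int16 -> Float32 in [-1, 1)
    vX = vX[:numSamplesOut]                        #<! Crop signals which are too long
    vX = np.pad(vX, (0, numSamplesOut - len(vX)))  #<! Zero pad signals which are too short

    return torch.from_numpy(vX).unsqueeze(0)       #<! Shape: (1, numSamplesOut)

vA = np.array([0, 16384, -32768], dtype = np.int16) #<! Toy signal
tX = SignalToTensor(vA, 5)

print(tX.shape) #<! torch.Size([1, 5])
print(tX)       #<! tensor([[ 0.0000,  0.5000, -1.0000,  0.0000,  0.0000]])
```

A fixed length is one simple way to let the default collate function of the data loader stack the samples into a batch.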

## Frequency Estimation with 1D Convolution Model in PyTorch

This notebook **estimates the frequency** of a given set of samples of a _Harmonic Signal_.

The notebook presents:

 * Use of convolution layers in PyTorch.
 * Use of pooling layers in PyTorch.
 * Use of an adaptive pooling layer in PyTorch.  
   The motivation is to set a constant output size regardless of the input size.
 * Use of the model for inference on the test data.

</br>

 * <font color='brown'>(**#**)</font> While the layers are called _Convolution Layers_, they actually implement correlation.  
   Since the weights are learned, in practice it makes no difference, as _Correlation_ is convolution with a flipped kernel.



* <font color='red'>(**?**)</font> What kind of a problem is frequency estimation?
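
The note above on correlation can be verified numerically. The sketch below uses a small made up signal and kernel (illustrative values only) and compares `torch.nn.functional.conv1d()` with NumPy's `correlate()` and with `convolve()` applied to the flipped kernel.

```python
import numpy as np
import torch
import torch.nn.functional as F

vX = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]) #<! Toy signal
vK = torch.tensor([1.0, 0.0, -1.0])          #<! Toy kernel

# PyTorch "Convolution": input (N, C, L), weights (C_out, C_in, K)
vYTorch = F.conv1d(vX.view(1, 1, -1), vK.view(1, 1, -1)).flatten().numpy()

# NumPy reference
vYCorr = np.correlate(vX.numpy(), vK.numpy(), mode = 'valid')      #<! Correlation
vYConv = np.convolve(vX.numpy(), vK.numpy()[::-1], mode = 'valid') #<! Convolution with the flipped kernel

print(vYTorch) #<! [-2. -2. -2.]
print(vYCorr)  #<! [-2. -2. -2.]
print(vYConv)  #<! [-2. -2. -2.]
```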

In [None]:
# Parameters

# Data
dataSetName     = 'AudioMNIST'
datasetFileUrl  = r'https://huggingface.co/datasets/Royi/AudioMNIST/resolve/main/AudioMNIST.parquet'
datasetFileName = 'AudioMNIST.parquet'


numSignalsTrain = 15_000 #<! Tune model's parameters
numSignalsVal   = 5_000  #<! Tune Hyper Parameters, Evaluate real world performance
numSignalsTest  = 5_000  #<! Real World performance

numSamples = 500 #<! Samples in Signal

maxFreq      = 10.0  #<! [Hz]
samplingFreq = 100.0 #<! [Hz]

σ = 0.1 #<! Noise Std

# Model
dropP = 0.1 #<! Dropout Layer

# Training
batchSize   = 256
numWork     = 2 #<! Number of workers
nEpochs     = 20

# Visualization
numSigPlot = 5

## Generate / Load Data

This section generates the data from the following model:



In [None]:
# Download Data

datasetFolderPath = os.path.join(BASE_FOLDER_PATH, DATA_FOLDER_NAME, dataSetName)
datasetFilePath = DownloadUrlFile(datasetFileUrl, datasetFolderPath, datasetFileName)

In [None]:
# Generate / Load Data

dfData = pd.read_parquet(datasetFilePath)
dfData

In [None]:
# Generate / Load Data

print(f'The number of samples: {dfData.shape[0]}')
print(f'The number of speakers: {dfData["Speaker"].nunique()}') #<! Different quote style inside the f-string (Same style requires Python 3.12+)

* <font color='red'>(**?**)</font> What is the content of `vY` above? Explain its shape.

### Plot Data

In [None]:
# Plot the Data

hF, hA = plt.subplots(figsize = (6, 4))
sns.histplot(data = dfData, x = 'Digit', bins = 10, discrete = True, kde = False, ax = hA)
hA.set_title('Distribution of Digits in AudioMNIST Dataset')
hA.set_xlabel('Digit')
hA.set_ylabel('Count');

In [None]:
hF, hA = plt.subplots(figsize = (6, 4))
sns.histplot(data = dfData, x = 'Speaker', bins = 10, discrete = True, kde = False, ax = hA)
hA.set_title('Distribution of Speakers in AudioMNIST Dataset')
hA.set_xlabel('Speaker')
hA.set_ylabel('Count');

In [None]:
hF, hA = plt.subplots(figsize = (6, 4))
sns.histplot(data = dfData, x = 'Gender', bins = 10, discrete = True, kde = False, ax = hA)
hA.set_title('Distribution of Genders in AudioMNIST Dataset')
hA.set_xlabel('Gender')
hA.set_ylabel('Count');

In [None]:
# See https://github.com/soerenab/AudioMNIST/issues/10
hF, hA = plt.subplots(figsize = (6, 4))
sns.histplot(data = dfData, x = 'Age', kde = False, ax = hA)
hA.set_title('Distribution of Ages in AudioMNIST Dataset')
hA.set_xlabel('Age')
hA.set_ylabel('Count');

In [None]:
valFraction = 0.2

oRng = random.Random(199)
        
lSpeakerIdx = dfData['Speaker'].unique().tolist()
numSpeakers = len(lSpeakerIdx)
        
numValSpeakers   = int(np.floor(valFraction * numSpeakers))
numTrainSpeakers = numSpeakers - numValSpeakers

lTrainSpkIdx = oRng.sample(lSpeakerIdx, k = numTrainSpeakers)
lValSpkIdx   = [spk for spk in lSpeakerIdx if spk not in lTrainSpkIdx]

# Get indices of the samples for training and validation sets
lTrainIdx = dfData[dfData['Speaker'].isin(lTrainSpkIdx)].index.tolist()
lValIdx   = dfData[dfData['Speaker'].isin(lValSpkIdx)].index.tolist()

In [None]:
dfData[dfData['Speaker'].isin(lTrainSpkIdx)].index.tolist()
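
The purpose of splitting by speaker is that no speaker appears in both sets. The sketch below checks this property on a toy data frame. The data frame `dfToy` and its contents are made up for illustration; only the `Speaker` column name follows the dataset.

```python
import random
import numpy as np
import pandas as pd

# Toy data (hypothetical): 10 speakers, 3 recordings each
dfToy = pd.DataFrame({'Speaker': np.repeat(np.arange(10), 3), 'Digit': np.tile(np.arange(3), 10)})

valFraction = 0.2
oRng        = random.Random(199)

lSpeaker       = dfToy['Speaker'].unique().tolist()
numValSpeakers = int(np.floor(valFraction * len(lSpeaker)))
lValSpk        = oRng.sample(lSpeaker, k = numValSpeakers) #<! Speakers held out for validation

vIsVal    = dfToy['Speaker'].isin(lValSpk)
lTrainIdx = dfToy.index[~vIsVal].tolist()
lValIdx   = dfToy.index[vIsVal].tolist()

# Verify the split: no speaker is shared, no sample is lost
sTrainSpk = set(dfToy.loc[lTrainIdx, 'Speaker'])
sValSpk   = set(dfToy.loc[lValIdx, 'Speaker'])

print(f'Shared speakers: {len(sTrainSpk & sValSpk)}')              #<! 0
print(f'Train samples: {len(lTrainIdx)}, Validation samples: {len(lValIdx)}') #<! 24, 6
```

The resulting index lists can be passed to `torch.utils.data.Subset` to build the training and validation data sets from a single `Dataset` object.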

### Input Data

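The 1D layers of PyTorch (For example `nn.Conv1d`) expect input of shape `(N, C, L)`: batch size, number of channels and signal length.  
There are several equivalent ways to add the singleton channel dimension to data of shape `(N, L)`: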

```python
# Assume: mX.shape = (N, L)
mX = mX.view(N, 1, L) #<! Option I
mX = mX.unsqueeze(1)  #<! Option II
mX = mX[:, None, :]   #<! Option III
# Output: mX.shape = (N, 1, L)
```
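
As a quick check, the three options can be compared on a small made up tensor (the sizes `N = 3` and `L = 4` are arbitrary):

```python
import torch

N, L = 3, 4
mX = torch.arange(N * L, dtype = torch.float32).view(N, L) #<! Toy data, shape (N, L)

tX1 = mX.view(N, 1, L) #<! Option I
tX2 = mX.unsqueeze(1)  #<! Option II
tX3 = mX[:, None, :]   #<! Option III

print(tX1.shape, tX2.shape, tX3.shape) #<! torch.Size([3, 1, 4]) for all three
print(torch.equal(tX1, tX2) and torch.equal(tX2, tX3)) #<! True
```

None of the options copies the data: each returns a view of the same underlying storage.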

In [None]:
# Data Sets

dsTrain = torch.utils.data.TensorDataset(mXTrain.view(numSignalsTrain, 1, -1), vYTrain) #<! -1 -> Infer
dsVal   = torch.utils.data.TensorDataset(mXVal.view(numSignalsVal, 1, -1), vYVal)
dsTest  = torch.utils.data.TensorDataset(mXTest.view(numSignalsTest, 1, -1), vYTest)

* <font color='red'>(**?**)</font> Does the data require standardization? Why?

In [None]:
# Data Loaders

# Data is small, no real need for workers
dlTrain = torch.utils.data.DataLoader(dsTrain, shuffle = True, batch_size = 1 * batchSize, num_workers = numWork, drop_last = True, persistent_workers = True)
dlVal   = torch.utils.data.DataLoader(dsVal, shuffle = False, batch_size = 2 * batchSize, num_workers = numWork, persistent_workers = True)
dlTest  = torch.utils.data.DataLoader(dsTest, shuffle = False, batch_size = 2 * batchSize, num_workers = numWork, persistent_workers = True)

In [None]:
# Iterate on the Loader
# The first batch.
tX, vY = next(iter(dlTrain)) #<! PyTorch Tensors

print(f'The batch features dimensions: {tX.shape}')
print(f'The batch labels dimensions: {vY.shape}')

## Define the Model

The model is defined as a sequential model.


In [None]:
# Model
# Defining a sequential model.

numFeatures = tX.shape[-1] #<! Signal length, taken from the batch loaded above (`mX` is not defined in this notebook)

def GetModel( ) -> nn.Module:
    oModel = nn.Sequential(
        nn.Identity(),
        
        nn.Conv1d(in_channels = 1,   out_channels = 32,  kernel_size = 11), nn.MaxPool1d(kernel_size = 2), nn.ReLU(),
        nn.Conv1d(in_channels = 32,  out_channels = 64,  kernel_size = 11), nn.MaxPool1d(kernel_size = 2), nn.ReLU(),
        nn.Conv1d(in_channels = 64,  out_channels = 128, kernel_size = 11), nn.MaxPool1d(kernel_size = 2), nn.ReLU(),
        nn.Conv1d(in_channels = 128, out_channels = 256, kernel_size = 11), nn.MaxPool1d(kernel_size = 2), nn.ReLU(),
                
        nn.AdaptiveAvgPool1d(output_size = 1),
        nn.Flatten          (),
        nn.Linear           (in_features = 256, out_features = 1),
        nn.Flatten          (start_dim = 0),
        )
    
    return oModel

* <font color='brown'>(**#**)</font> The [`torch.nn.AdaptiveAvgPool1d`](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool1d.html) layer yields the same output shape regardless of the input length.
* <font color='red'>(**?**)</font> What is the role of the [`torch.nn.Flatten`](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html) layers?
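
The behavior of the adaptive pooling layer can be illustrated on random tensors of different lengths. The lengths and the batch size below are arbitrary; the number of channels (`256`) matches the last convolution layer of the model above.

```python
import torch
import torch.nn as nn

oPool = nn.AdaptiveAvgPool1d(output_size = 1)

lShape = []
for numSamples in (50, 200, 500):
    tX = torch.randn(4, 256, numSamples) #<! (N, C, L)
    tY = oPool(tX)                       #<! (N, C, 1)
    lShape.append(tuple(tY.shape))
    # With `output_size = 1` the layer is the mean over the length axis
    assert torch.allclose(tY, tX.mean(dim = -1, keepdim = True), atol = 1e-5)

print(lShape) #<! [(4, 256, 1), (4, 256, 1), (4, 256, 1)]
```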

In [None]:
# Model Summary

oModel = GetModel()
torchinfo.summary(oModel, tX.shape, col_names = ['input_size', 'output_size', 'num_params'], device = 'cpu')
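
The signal lengths reported by the summary can be cross checked by hand. The sketch below assumes the architecture defined above (4 blocks, `kernel_size = 11` with no padding and stride 1, max pooling by 2) and an input length of `500`, following `numSamples`. The function name `CalcOutLen` is hypothetical.

```python
from typing import List

def CalcOutLen( numSamples: int, numBlocks: int = 4, kernelSize: int = 11, poolSize: int = 2 ) -> List[int]:
    # Hypothetical helper: signal length after each Conv1d + MaxPool1d block
    lLen = [numSamples]
    for _ in range(numBlocks):
        numSamples = numSamples - kernelSize + 1 #<! Conv1d: no padding, stride 1
        numSamples = numSamples // poolSize      #<! MaxPool1d: stride equals kernel size, floor
        lLen.append(numSamples)

    return lLen

print(CalcOutLen(500)) #<! [500, 245, 117, 53, 21]
print(CalcOutLen(200)) #<! [200, 95, 42, 16, 3]
```

The adaptive pooling layer then averages whatever length is left into a single value per channel.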

* <font color='brown'>(**#**)</font> Pay attention: the `p` parameter of the dropout layers in PyTorch is the probability of zeroing an element, not the probability of keeping it.
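
The meaning of the dropout probability can be checked empirically. The sketch below applies `nn.Dropout` to a vector of ones (the vector size and the seed are arbitrary) in training mode and in evaluation mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropP = 0.1
oDrop = nn.Dropout(p = dropP)
vX    = torch.ones(100_000)

oDrop.train() #<! Training mode: elements are zeroed with probability `p`
vY       = oDrop(vX)
zeroFrac = (vY == 0).float().mean().item() #<! Should be close to `dropP`
keptVal  = vY[vY > 0].unique()             #<! Kept elements are scaled by 1 / (1 - p)

oDrop.eval() #<! Evaluation mode: the layer is the identity
vZ = oDrop(vX)

print(f'Fraction of zeros: {zeroFrac:0.3f}')
print(f'Value of kept elements: {keptVal}')
print(f'Identity in evaluation mode: {torch.equal(vZ, vX)}')
```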

In [None]:
# Run Model
# Apply a test run.

mXX = torch.randn(batchSize, numSamples)
mXX = mXX.view(batchSize, 1, numSamples)
with torch.inference_mode():
    vYHat = oModel(mXX)

print(f'The input dimensions: {mXX.shape}')
print(f'The output dimensions: {vYHat.shape}')

## Training Loop

Use the training and validation samples.  
The objective will be defined as the Mean Squared Error and the score as ${R}^{2}$.


* <font color='red'>(**?**)</font> Will the best model loss wise also be the model with the best score?  
  Explain for this specific case and in general (For other losses and scores).
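
For reference, both quantities can be computed by hand. The sketch below uses a small made up vector (the values are illustrative only) and the standard definitions of the Mean Squared Error and of ${R}^{2}$.

```python
import numpy as np

vY    = np.array([1.0, 2.0, 3.0, 4.0]) #<! Ground truth (toy values)
vYHat = np.array([1.1, 1.9, 3.2, 3.8]) #<! Predictions (toy values)

valMse = np.mean(np.square(vY - vYHat))      #<! Mean Squared Error
ssRes  = np.sum(np.square(vY - vYHat))       #<! Residual sum of squares
ssTot  = np.sum(np.square(vY - np.mean(vY))) #<! Total sum of squares
valR2  = 1.0 - (ssRes / ssTot)               #<! Coefficient of determination

print(f'MSE = {valMse:0.3f}, R2 = {valR2:0.3f}') #<! MSE = 0.025, R2 = 0.980
```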

## Train the Model

In [None]:
# Check GPU Availability

runDevice   = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') #<! The 1st CUDA device
oModel      = oModel.to(runDevice) #<! Transfer model to device

In [None]:
# Set the Loss & Score

hL = nn.MSELoss()
hS = R2Score(multioutput = 'uniform_average')
hS = hS.to(runDevice)

In [None]:
# Define Optimizer

oOpt = torch.optim.AdamW(oModel.parameters(), lr = 1e-4, betas = (0.9, 0.99), weight_decay = 1e-5) #<! Define optimizer

In [None]:
# Train the Model

oRunModel, lTrainLoss, lTrainScore, lValLoss, lValScore, _ = TrainModel(oModel, dlTrain, dlVal, oOpt, nEpochs, hL, hS) #<! Validate on the validation set, keep the test set untouched

In [None]:
# Plot Results
hF, vHa = plt.subplots(nrows = 1, ncols = 2, figsize = (12, 6))
vHa = vHa.flat

hA = vHa[0]
hA.plot(lTrainLoss, lw = 2, label = 'Train Loss')
hA.plot(lValLoss, lw = 2, label = 'Validation Loss')
hA.grid()
hA.set_title('MSE Loss')
hA.set_xlabel('Epoch Index')
hA.set_ylabel('Loss')
hA.legend();


hA = vHa[1]
hA.plot(lTrainScore, lw = 2, label = 'Train Score')
hA.plot(lValScore, lw = 2, label = 'Validation Score')
hA.grid()
hA.set_title('R2 Score')
hA.set_xlabel('Epoch Index')
hA.set_ylabel('Score')
hA.legend();

## Test Data Analysis 

This section runs the model on the test data and analyzes the results.

### Test Data Results

In [None]:
# Run on Test Data
lYY     = []
lYYHat  = []
with torch.inference_mode():
    for tXX, vYY in dlTest:
        tXX = tXX.to(runDevice)
        lYY.append(vYY)
        lYYHat.append(oModel(tXX))

vYY    = torch.cat(lYY, dim = 0).cpu().numpy()
vYYHat = torch.cat(lYYHat, dim = 0).cpu().numpy()

* <font color='brown'>(**#**)</font> One could run the above on `mXTest` directly, in a single forward pass.  
  The loop over the data loader shows the general approach, which also handles data sets too large to fit in memory.
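
The equivalence between batched inference and a single forward pass can be sketched on a toy example. The small model `oNet` and the random data below are made up for illustration and are unrelated to the trained model.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy model (hypothetical), same output convention as the model above: a vector of length N
oNet = nn.Sequential(
    nn.Conv1d(1, 4, kernel_size = 5), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(4, 1), nn.Flatten(start_dim = 0),
)
oNet.eval()

tXData = torch.randn(100, 1, 64) #<! Toy data: (N, C, L)
vYData = torch.randn(100)
dlData = DataLoader(TensorDataset(tXData, vYData), batch_size = 32, shuffle = False)

lYHat = []
with torch.inference_mode():
    for tXB, _ in dlData:
        lYHat.append(oNet(tXB)) #<! Batch by batch
    vYHatFull = oNet(tXData)    #<! Single forward pass

vYHatBatch = torch.cat(lYHat, dim = 0)

print(vYHatBatch.shape) #<! torch.Size([100])
print(torch.allclose(vYHatBatch, vYHatFull, atol = 1e-5))
```

Keeping `shuffle = False` matters here: it guarantees the concatenated predictions are in the same order as the labels.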

In [None]:
# Plot Regression Result

# Plot the Data

scoreR2 = hS(torch.tensor(vYYHat).to(runDevice), torch.tensor(vYY).to(runDevice)) #<! TorchMetrics order: (preds, target), on the device of the metric

hF, hA = plt.subplots(figsize = (14, 5))
hA = PlotRegressionResults(vYY, vYYHat, hA = hA)
hA.set_title(f'Test Data Set, R2 = {scoreR2:0.2f}')
hA.grid()
hA.set_xlabel('Input Frequency')
hA.set_ylabel('Estimated Frequency');

* <font color='red'>(**?**)</font> Can you find where the model struggles?
* <font color='red'>(**?**)</font> Can it handle shorter signals? For example, 200 samples. How?
* <font color='red'>(**?**)</font> How will it generalize to cases with frequency above `maxFreq`?

### Extended Test Set

This section shows the performance of the model on data with frequencies spanning the range `[0, 2 * maxFreq]`.

In [None]:
# Generate Data

mXTest, vYTest = GenHarmonicData(numSignalsTest, numSamples, samplingFreq, 2 * maxFreq, σ)  #<! Test Data

In [None]:
# Run on Test Data

with torch.inference_mode():
    vYTestHat = oModel(mXTest.view(numSignalsTest, 1, -1).to(runDevice))

vYTestHat = vYTestHat.cpu()

In [None]:
# Plot Regression Result

# Plot the Data

scoreR2 = hS(vYTestHat.to(runDevice), vYTest.to(runDevice)) #<! TorchMetrics order: (preds, target), on the device of the metric

hF, hA = plt.subplots(figsize = (14, 5))
hA = PlotRegressionResults(vYTest.cpu().numpy(), vYTestHat.cpu().numpy(), hA = hA)
hA.set_title(f'Test Data Set, R2 = {scoreR2:0.2f}')
hA.grid()
hA.set_xlabel('Input Frequency')
hA.set_ylabel('Estimated Frequency');

* <font color='red'>(**?**)</font> Why does the model perform poorly?
* <font color='brown'>(**#**)</font> DL models can extrapolate and generalize (Think of a model which plays Chess).  
  Yet in order to generalize, the loss and the architecture of the model have to be built accordingly.  
  In most common cases, one must validate that the training set matches the production data.