# Audio Classification with a CNN: Cross-Validation Experiment
Analyze the results with the following metrics,
+ accuracy
+ precision
+ recall

for both the training and the validation stages. We also use box plots to visualize the results.
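A minimal sketch of such a box plot, assuming the per-fold results end up in a pandas DataFrame with one row per (iteration, validation fold) and hypothetical metric columns named `accuracy`, `precision`, and `recall` (the real column names produced by `lib.crossValUtils` are not shown in this notebook). Each box summarizes how one metric varies across folds:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_metric_boxes(records: pd.DataFrame,
                      metrics=("accuracy", "precision", "recall"),
                      title="Cross-validation metrics"):
    """Draw one box per metric; each box covers all folds/iterations in `records`."""
    fig, ax = plt.subplots(figsize=(6, 4))
    data = [records[m].dropna().to_numpy() for m in metrics]
    ax.boxplot(data)
    # Set tick labels explicitly (works across matplotlib versions)
    ax.set_xticks(range(1, len(metrics) + 1))
    ax.set_xticklabels(metrics)
    ax.set_ylabel("score")
    ax.set_title(title)
    return ax
```

To compare the training and validation stages, one option is to call it once per stage (for example on two filtered DataFrames) with different titles.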

# Precision and Recall for Multi-Class Cross-Validation
`This explanation comes from ChatGPT-4o.`
Calculating precision and recall for a multi-class classification algorithm involves considering each class separately and then aggregating the results. Here's how you can do it step by step:

### 1. Definitions

- **Precision** for a class $C_i$: The ratio of true positive predictions for class $C_i$ to the total number of instances predicted as $C_i$ (i.e., true positives + false positives).
  
  $$
  \text{Precision}(C_i) = \frac{\text{True Positives}(C_i)}{\text{True Positives}(C_i) + \text{False Positives}(C_i)}
  $$

- **Recall** for a class $C_i$: The ratio of true positive predictions for class $C_i$ to the total number of instances that actually belong to $C_i$ (i.e., true positives + false negatives).

  $$
  \text{Recall}(C_i) = \frac{\text{True Positives}(C_i)}{\text{True Positives}(C_i) + \text{False Negatives}(C_i)}
  $$

### 2. Confusion Matrix

For a multi-class classification problem, you typically generate a confusion matrix, where each row represents the actual class and each column represents the predicted class. The element at position $(i, j)$ in the confusion matrix is the number of instances of class $i$ that were predicted as class $j$.
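As an illustration, a confusion matrix with this convention (rows = actual, columns = predicted) can be built from integer label arrays with NumPy. The helper name `confusion_matrix` is ours; `sklearn.metrics.confusion_matrix` follows the same row/column convention if scikit-learn is available.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are actual classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (i, j) pair occurs repeatedly
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


cm = confusion_matrix([0, 0, 1, 2, 2], [0, 1, 1, 2, 0], n_classes=3)
# cm.tolist() -> [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
```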

### 3. Calculation for Each Class

- **True Positives (TP)**: The diagonal element for the class $C_i$ in the confusion matrix, i.e., $\text{TP}(C_i) = \text{Confusion Matrix}[i][i]$.
- **False Positives (FP)**: The sum of the corresponding column $i$ minus the true positives, i.e., $\text{FP}(C_i) = \sum_j \text{Confusion Matrix}[j][i] - \text{Confusion Matrix}[i][i]$.
- **False Negatives (FN)**: The sum of the corresponding row $i$ minus the true positives, i.e., $\text{FN}(C_i) = \sum_j \text{Confusion Matrix}[i][j] - \text{Confusion Matrix}[i][i]$.
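With NumPy, the three counts fall out of the diagonal, the column sums, and the row sums. A short sketch (the function name `per_class_counts` is illustrative), checked against the example matrix used later in this section:

```python
import numpy as np


def per_class_counts(cm):
    """Return per-class TP, FP, FN arrays for a confusion matrix (rows = actual)."""
    cm = np.asarray(cm)
    tp = np.diag(cm)              # TP(C_i) = M[i][i]
    fp = cm.sum(axis=0) - tp      # column i sum minus the diagonal
    fn = cm.sum(axis=1) - tp      # row i sum minus the diagonal
    return tp, fp, fn


tp, fp, fn = per_class_counts([[40, 5, 5], [3, 50, 2], [2, 4, 45]])
```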

### 4. Precision and Recall for Each Class

Using the definitions and the components from the confusion matrix, you can calculate precision and recall for each class $C_i$:

$$
\text{Precision}(C_i) = \frac{\text{TP}(C_i)}{\text{TP}(C_i) + \text{FP}(C_i)}
$$

$$
\text{Recall}(C_i) = \frac{\text{TP}(C_i)}{\text{TP}(C_i) + \text{FN}(C_i)}
$$
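A vectorized sketch of these two formulas. One practical detail the formulas leave open: a class that is never predicted gives $\text{TP}(C_i) + \text{FP}(C_i) = 0$, and a class absent from a validation fold gives $\text{TP}(C_i) + \text{FN}(C_i) = 0$. Here such a score is reported as 0.0, which matches what scikit-learn's `precision_score` and `recall_score` return by default (they also emit a warning). The function name is illustrative.

```python
import numpy as np


def precision_recall(cm):
    """Per-class precision and recall from a confusion matrix (rows = actual)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)   # TP + FP for each class
    actual = cm.sum(axis=1)      # TP + FN for each class
    # Avoid division by zero: classes with no predictions / no instances get 0.0
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall
```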

### 5. Aggregating Results

To get overall metrics, you can use macro-averaging or micro-averaging:

- **Macro-Averaging**: Calculate the metrics for each class independently and then take the average.

  $$
  \text{Macro Precision} = \frac{1}{N} \sum_{i=1}^{N} \text{Precision}(C_i)
  $$

  $$
  \text{Macro Recall} = \frac{1}{N} \sum_{i=1}^{N} \text{Recall}(C_i)
  $$

- **Micro-Averaging**: Aggregate the contributions of all classes to calculate the metrics. Essentially, you sum up the true positives, false positives, and false negatives across all classes and then calculate precision and recall.

  $$
  \text{Micro Precision} = \frac{\sum_{i=1}^{N} \text{TP}(C_i)}{\sum_{i=1}^{N} (\text{TP}(C_i) + \text{FP}(C_i))}
  $$

  $$
  \text{Micro Recall} = \frac{\sum_{i=1}^{N} \text{TP}(C_i)}{\sum_{i=1}^{N} (\text{TP}(C_i) + \text{FN}(C_i))}
  $$
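Both averages fit in a few lines of NumPy. This sketch assumes every class is predicted and present at least once, so no per-class denominator is zero. It also illustrates a useful property of single-label multi-class data: every misclassified sample is a false positive for one class and a false negative for another, so $\sum_i \text{FP}(C_i) = \sum_i \text{FN}(C_i)$, and micro precision, micro recall, and accuracy all coincide.

```python
import numpy as np


def macro_micro(cm):
    """Return (macro precision, macro recall, micro precision, micro recall)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    # Macro: average the per-class scores, so every class weighs the same
    macro_p = np.mean(tp / (tp + fp))
    macro_r = np.mean(tp / (tp + fn))
    # Micro: pool the counts first, so larger classes weigh more
    micro_p = tp.sum() / (tp.sum() + fp.sum())
    micro_r = tp.sum() / (tp.sum() + fn.sum())
    return macro_p, macro_r, micro_p, micro_r


macro_p, macro_r, micro_p, micro_r = macro_micro([[40, 5, 5], [3, 50, 2], [2, 4, 45]])
```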

### Example

Let's consider a simple example with three classes A, B, and C, and the following confusion matrix:

$$
\begin{array}{c|ccc}
    & \text{Pred A} & \text{Pred B} & \text{Pred C} \\
    \hline
    \text{Actual A} & 40 & 5 & 5 \\
    \text{Actual B} & 3 & 50 & 2 \\
    \text{Actual C} & 2 & 4 & 45 \\
\end{array}
$$

For class A:

- $\text{TP}(A) = 40$
- $\text{FP}(A) = 3 + 2 = 5$
- $\text{FN}(A) = 5 + 5 = 10$

$$
\text{Precision}(A) = \frac{40}{40 + 5} = \frac{40}{45} \approx 0.89
$$

$$
\text{Recall}(A) = \frac{40}{40 + 10} = \frac{40}{50} = 0.8
$$

You can calculate precision and recall similarly for classes B and C and then aggregate the results using macro-averaging or micro-averaging as needed.
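Carrying the same steps through for the example matrix, with $\text{FP}(B) = 5 + 4 = 9$, $\text{FN}(B) = 3 + 2 = 5$, $\text{FP}(C) = 5 + 2 = 7$, and $\text{FN}(C) = 2 + 4 = 6$, gives (rounded):

$$
\begin{aligned}
\text{Precision}(B) &= \frac{50}{50 + 9} = \frac{50}{59} \approx 0.85 \\
\text{Recall}(B) &= \frac{50}{50 + 5} = \frac{50}{55} \approx 0.91 \\
\text{Precision}(C) &= \frac{45}{45 + 7} = \frac{45}{52} \approx 0.87 \\
\text{Recall}(C) &= \frac{45}{45 + 6} = \frac{45}{51} \approx 0.88
\end{aligned}
$$

$$
\begin{aligned}
\text{Macro Precision} &= \tfrac{1}{3}\left(\tfrac{40}{45} + \tfrac{50}{59} + \tfrac{45}{52}\right) \approx 0.867 \\
\text{Macro Recall} &= \tfrac{1}{3}\left(\tfrac{40}{50} + \tfrac{50}{55} + \tfrac{45}{51}\right) \approx 0.864 \\
\text{Micro Precision} &= \frac{40 + 50 + 45}{(40 + 50 + 45) + (5 + 9 + 7)} = \frac{135}{156} \approx 0.865 \\
\text{Micro Recall} &= \frac{135}{135 + (10 + 5 + 6)} = \frac{135}{156} \approx 0.865
\end{aligned}
$$

The two micro scores are equal, and both equal the accuracy $135/156$, because the total number of false positives equals the total number of false negatives.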

```python
import pandas as pd
from pathlib import Path

datasource_path = Path.home() / 'dataset' / 'UrbanSound8K'

# read the metadata file
metadata_file_path = datasource_path / 'metadata' / 'UrbanSound8K.csv'
df = pd.read_csv(metadata_file_path)
df.head()
```

|   | slice_file_name | fsID | start | end | salience | fold | classID | class |
|---|---|---|---|---|---|---|---|---|
| 0 | 100032-3-0-0.wav | 100032 | 0.0 | 0.317551 | 1 | 5 | 3 | dog_bark |
| 1 | 100263-2-0-117.wav | 100263 | 58.5 | 62.5 | 1 | 5 | 2 | children_playing |
| 2 | 100263-2-0-121.wav | 100263 | 60.5 | 64.5 | 1 | 5 | 2 | children_playing |
| 3 | 100263-2-0-126.wav | 100263 | 63.0 | 67.0 | 1 | 5 | 2 | children_playing |
| 4 | 100263-2-0-137.wav | 100263 | 68.5 | 72.5 | 1 | 5 | 2 | children_playing |


```python
df['path'] = '/fold' + df['fold'].astype(str) + '/' + df['slice_file_name'].astype(str)
df.head()
```

|   | slice_file_name | fsID | start | end | salience | fold | classID | class | path |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 100032-3-0-0.wav | 100032 | 0.0 | 0.317551 | 1 | 5 | 3 | dog_bark | /fold5/100032-3-0-0.wav |
| 1 | 100263-2-0-117.wav | 100263 | 58.5 | 62.5 | 1 | 5 | 2 | children_playing | /fold5/100263-2-0-117.wav |
| 2 | 100263-2-0-121.wav | 100263 | 60.5 | 64.5 | 1 | 5 | 2 | children_playing | /fold5/100263-2-0-121.wav |
| 3 | 100263-2-0-126.wav | 100263 | 63.0 | 67.0 | 1 | 5 | 2 | children_playing | /fold5/100263-2-0-126.wav |
| 4 | 100263-2-0-137.wav | 100263 | 68.5 | 72.5 | 1 | 5 | 2 | children_playing | /fold5/100263-2-0-137.wav |
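The `path` column above starts with a slash (`/fold5/...`). How `WavDataset` joins it with `data_path` is not shown in this notebook. If the join uses pathlib's `/` operator, a right-hand operand that starts with `/` is treated as absolute and the base directory is discarded, so `Path('/data/audio') / '/fold5/a.wav'` gives `/fold5/a.wav`. A sketch that sidesteps this by building relative paths first; the helper name `add_audio_paths` and the `full_path` column are hypothetical:

```python
from pathlib import Path

import pandas as pd


def add_audio_paths(df: pd.DataFrame, audio_dir: Path) -> pd.DataFrame:
    """Add a hypothetical `full_path` column: audio_dir / foldN / slice_file_name."""
    out = df.copy()
    # No leading slash: "fold5/x.wav" stays relative, so the join keeps audio_dir
    rel = "fold" + out["fold"].astype(str) + "/" + out["slice_file_name"].astype(str)
    out["full_path"] = [audio_dir / r for r in rel]
    return out
```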


```python
from lib.crossValUtils import calIndexes
from lib.wavDataUtil import WavDataset
audio_path = datasource_path / 'audio'

audioDS = WavDataset(df=df, data_path=audio_path)

audio, class_id = audioDS[0]
audio.shape, class_id
```

```
OSError: libtorch_cuda.so: cannot open shared object file: No such file or directory
```

```python
from lib.crossValUtils import calIndexes, switchFold

indexes = calIndexes(dataset=audioDS, n_flod=10)
train_ids, val_ids = switchFold(val_fold=2, indexes=indexes)

# train_ids = df[df['fold'] != 5].index.to_numpy()
# val_ids = df[df['fold'] == 5].index.to_numpy()

train_ids.shape, val_ids.shape
```
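`calIndexes` and `switchFold` live in the author's `lib.crossValUtils`, which is not shown, so whether they reuse the dataset's `fold` column is unknown. UrbanSound8K ships with 10 predefined folds, and its documentation asks users not to reshuffle them, because related slices cut from the same recording (same `fsID`) would otherwise land in both the training and validation sets and inflate scores. The commented-out lines already use those folds for a single split; a sketch generalizing that to every fold, with hypothetical function names:

```python
import numpy as np
import pandas as pd


def fold_indexes(df: pd.DataFrame, fold_col: str = "fold"):
    """Map each predefined fold number to the row positions it contains."""
    folds = df[fold_col].to_numpy()
    return {int(f): np.flatnonzero(folds == f) for f in sorted(np.unique(folds))}


def switch_fold(val_fold: int, indexes):
    """Hold out one fold for validation; concatenate all other folds for training."""
    val_ids = indexes[val_fold]
    train_ids = np.concatenate([ids for f, ids in indexes.items() if f != val_fold])
    return train_ids, val_ids
```

The returned values are row positions; they coincide with `df.index` labels here because `pd.read_csv` produces a default `RangeIndex`.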

```python
from lib.crossValUtils import SubsetDs

train_ds = SubsetDs(dataset=audioDS, indexes=train_ids)
val_ds = SubsetDs(dataset=audioDS, indexes=val_ids)

audio, class_id = val_ds[9]
audio.shape, class_id
```
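`SubsetDs` is also part of the unshown `lib.crossValUtils`. It presumably remaps indexes, which is exactly what PyTorch's standard `torch.utils.data.Subset(dataset, indices)` does. A framework-free sketch of that idea (the class name `IndexSubset` is hypothetical); under this assumption, `val_ds[9]` returns the sample at position `val_ids[9]` of the full dataset, not row 9 of `df`:

```python
class IndexSubset:
    """Hypothetical stand-in for SubsetDs: a view of `dataset` restricted to `indexes`."""

    def __init__(self, dataset, indexes):
        self.dataset = dataset
        self.indexes = list(indexes)

    def __len__(self):
        return len(self.indexes)

    def __getitem__(self, i):
        # Position i in the subset maps to position indexes[i] in the full dataset
        return self.dataset[self.indexes[i]]
```

Because it defines `__len__` and `__getitem__`, an object like this can be passed to `DataLoader` as a map-style dataset.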

```python
from torch.utils.data import DataLoader

train_dl = DataLoader(dataset=train_ds, batch_size=16, shuffle=True)
val_dl = DataLoader(dataset=val_ds, batch_size=16, shuffle=False)

feature, labels = next(iter(val_dl))
feature.shape, labels
```

```python
import torch
from lib.acModel import AudioClassifier

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AudioClassifier().to(device=device)

model
```

```python
from lib.acModel import Processor
Processor.training(model=model, train_dl=train_dl, num_epochs=1, device=device)
```
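`Processor.training` is defined in the author's `lib.acModel`, which is not shown here. The loop below is a generic sketch of one training epoch, assuming cross-entropy loss and an externally created optimizer (for example Adam). It also returns the training accuracy, since the introduction asks for metrics on the training stage as well. The name `train_one_epoch` is hypothetical.

```python
from typing import Tuple

import torch
from torch import nn
from torch.utils.data import DataLoader


def train_one_epoch(model: nn.Module, train_dl: DataLoader, optimizer,
                    device: str = "cpu") -> Tuple[float, float]:
    """One pass over train_dl; returns (mean training loss, training accuracy)."""
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for features, labels in train_dl:
        features, labels = features.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(features)
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
        # Weight the batch loss by batch size so the mean is per sample
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen
```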

```python
from lib.crossValUtils import ValidationRecord
import torch

vr = ValidationRecord(n_fold=10, label_size=10)

model.eval()
with torch.no_grad():
    for (feature, labels) in val_dl:
        feature, labels = feature.to(device), labels.to(device)
        outputs = model(feature)
        vr.noteRecord(outputs=outputs, labels=labels, val_fold=2, iter=0)

vr.calRecord()
records = vr.getRecord()
records[records['val_fold'] == 2]
```
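`ValidationRecord` is likewise not shown. One way to keep such records, sketched here as a hypothetical `MiniValidationRecord`, is to accumulate one confusion matrix per (iteration, validation fold) and derive accuracy and macro precision/recall from it after inference. Unlike `noteRecord`, it expects predicted class ids (e.g. `outputs.argmax(dim=1).cpu().numpy()`) rather than raw model outputs, and it scores a class with no predictions or no instances as 0.0.

```python
import numpy as np
import pandas as pd


class MiniValidationRecord:
    """Hypothetical stand-in: one confusion matrix per (iteration, validation fold)."""

    def __init__(self, label_size: int):
        self.label_size = label_size
        self.matrices = {}

    def note_record(self, preds, labels, val_fold: int, it: int):
        key = (it, val_fold)
        if key not in self.matrices:
            self.matrices[key] = np.zeros((self.label_size, self.label_size), dtype=np.int64)
        # Rows = actual labels, columns = predicted labels
        np.add.at(self.matrices[key], (np.asarray(labels), np.asarray(preds)), 1)

    def get_record(self) -> pd.DataFrame:
        rows = []
        for (it, fold), cm in sorted(self.matrices.items()):
            tp = np.diag(cm).astype(float)
            pred, actual = cm.sum(axis=0), cm.sum(axis=1)
            precision = np.divide(tp, pred, out=np.zeros_like(tp), where=pred > 0).mean()
            recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0).mean()
            rows.append({"iter": it, "val_fold": fold, "accuracy": tp.sum() / cm.sum(),
                         "precision": precision, "recall": recall})
        return pd.DataFrame(rows)
```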

```python
records[records['val_fold'] == 5]
```

```python
Processor.inference(model=model, val_dl=val_dl, device=device)
```

```python
from lib.crossValUtils import corss_validation_analysis
from torch.utils.data import Dataset

def training(dataset: Dataset, indexes):
    train_ds = SubsetDs(dataset=dataset, indexes=indexes)
    train_dl = DataLoader(dataset=train_ds, batch_size=16, shuffle=True)
    model = AudioClassifier().to(device=device)
    Processor.training(model=model, train_dl=train_dl, num_epochs=1, device=device)
    return model

def inference(model, iter: int, val_fold: int, dataset: Dataset, indexes, records: ValidationRecord):
    val_ds = SubsetDs(dataset=dataset, indexes=indexes)
    val_dl = DataLoader(dataset=val_ds, batch_size=16, shuffle=False)

    model.eval()
    with torch.no_grad():
        for (feature, labels) in val_dl:
            feature, labels = feature.to(device), labels.to(device)
            outputs = model(feature)
            records.noteRecord(outputs=outputs, labels=labels, val_fold=val_fold, iter=iter)

rc = corss_validation_analysis(dataset=audioDS, n_fold=10, label_size=10, tranFunc=training, inferFunc=inference, n_iter=2)
rc.calRecord()
rc_df = rc.getRecord()
rc_df.to_csv('CNN_AC_Cross_Val.csv', sep=',', header=True, encoding='utf-8', mode='w')
vLog = rc.getValidateLog()
vLog.to_csv('CNN_AC_Cross_Val_Log.csv', sep=',', header=True, encoding='utf-8', mode='w')
```
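`corss_validation_analysis` comes from `lib.crossValUtils` and is not shown either. Judging from its arguments (`n_fold`, `n_iter`, and the `tranFunc`/`inferFunc` callbacks), it appears to repeat k-fold cross-validation. A minimal sketch of that control flow, with a hypothetical name and simplified callback signatures (the training callback receives only the training indexes, the inference callback receives the model, iteration, fold, and validation indexes):

```python
from typing import Callable, Dict, Sequence

import numpy as np


def run_cross_validation(indexes: Dict[int, Sequence[int]], train_fn: Callable,
                         infer_fn: Callable, n_iter: int = 1) -> None:
    """Repeat k-fold cross-validation n_iter times, holding out each fold once per repeat."""
    for it in range(n_iter):
        for val_fold, val_ids in indexes.items():
            train_ids = np.concatenate(
                [np.asarray(ids) for f, ids in indexes.items() if f != val_fold])
            model = train_fn(train_ids)             # a fresh model for every fold
            infer_fn(model, it, val_fold, val_ids)  # record metrics for this fold
```

With 10 folds and `n_iter=2`, as in the cell above, this trains 20 models. If the folds stay fixed between repeats, the repeats differ only through sources of randomness such as weight initialization and `shuffle=True` in the training `DataLoader`.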

```python
iters = torch.ones((20, 1))
iters * 3
```