<hr>
<center> <h1>Voice spoofing detection </h1> <h3>Validation (1)</h3></center>
<center> <h3> By Alhasan Alkhaddour</h3></center>
<center> <h5><a href = "mailto: alkhaddour.alhasan@gmail.com">alkhaddour.alhasan@gmail.com</a></h5></center>
<center> <h5>Last updated 13/12/2021</h5></center>

<hr>


<h3> Task summary </h3>
<p>The human voice is one of the biometrics used to authenticate identity in automatic speaker verification (ASV) systems. However, these systems are vulnerable to voice spoofing attacks such as replay attacks, text-to-speech, and voice conversion. Researchers have recently become more interested in making ASV systems more robust against such attacks.</p>

<h3> Notes </h3>
<ul>
    <li> In this notebook we will validate the model trained in <a href="https://mfd.sk/nX4LvUe9xl3k5XmbciY1nwUq">this notebook</a>.</li>
    <li> We will choose the classification threshold that maximizes AUPRC on the validation set. </li>
    <li> Using this threshold, we will calculate performance metrics for the train and validation sets. </li>
    <li> Finally, we will generate scores for the test set. </li>
</ul>
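
The threshold selection described in the notes happens inside `test(..., pred_threshold='auto')`, whose implementation is not shown in this notebook. The snippet below is only a minimal sketch of one common way to pick an operating point on the precision-recall curve: it scans the candidate thresholds and keeps the one with the highest F1 score. The function name `pick_threshold`, the F1 criterion, and the convention that label `1` is the positive class are all assumptions made for illustration, not the author's method.

```python
def pick_threshold(y_true, y_prob):
    """Return (threshold, f1) for the candidate threshold with the highest F1.

    Assumptions (not taken from the notebook): labels are 0/1, label 1 is the
    positive class, and a sample is predicted positive when prob >= threshold.
    """
    best_threshold, best_f1 = 0.5, -1.0
    # Every distinct score is a candidate operating point on the PR curve.
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 0)
        fn = sum(1 for y, p in zip(y_true, y_prob) if p < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1


# Toy data, invented for this example only.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
threshold, f1 = pick_threshold(y_true, y_prob)
print(f"threshold={threshold}, F1={f1:.3f}")
```

Whatever criterion is used, the threshold should be chosen on the validation set only and then reused unchanged for the train and test sets, as the cells below do.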

### Load libraries

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.optim.lr_scheduler import StepLR
from RNN_trainer import train
from datasets import ReplySpoofDataset, collate_fn_pad
from config import LSTM_NUM_LAYERS, HIDDEN_SIZE, BATCH_SIZE, INPUT_SIZE, LINEAR_SIZE, OUTPUT_SIZE, LR, N_EPOCHS, \
    TRAIN_INDEX, VAL_INDEX, MODELS_DIR, MODEL_NAME, OUTPUT_DIR, TEST_RAW_DIR, SCALER_PATH, SCHEDULER_STEP_SIZE, \
    SCHEDULER_GAMMA
from utilities.basic_utils import make_valid_path, get_accelerator, export_incorrect_samples_to_csv
from utilities.model_utils import ModelManager, Metrics
from utilities.disply_utils import plot_losses, info
from RNN_tester import test, test_sample
from models import AntiSpoofingRNN
import IPython.display as ipd
import pickle
from tqdm import tqdm
import pandas as pd
import time
import csv
import os


### Define datasets

In [2]:
%%time
train_dataset= ReplySpoofDataset(TRAIN_INDEX)
val_dataset = ReplySpoofDataset(VAL_INDEX)

[2021-12-13 18:25:05.314102] -- Loading data (40001 files)...


100%|███████████████████████████████████████████████████████████████████████████| 40001/40001 [02:29<00:00, 267.26it/s]


[2021-12-13 18:27:35.502189] -- Done...
[2021-12-13 18:27:36.030767] -- Loading data (9999 files)...


100%|█████████████████████████████████████████████████████████████████████████████| 9999/9999 [00:25<00:00, 386.44it/s]


[2021-12-13 18:28:02.411332] -- Done...
Wall time: 2min 57s


### Define data loaders

In [3]:
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=BATCH_SIZE, collate_fn=collate_fn_pad)
val_loader = DataLoader(val_dataset, shuffle=True, batch_size=BATCH_SIZE, collate_fn=collate_fn_pad)
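
Audio clips differ in length, so the samples in a batch must be padded to a common length before they can be stacked; this is why a custom `collate_fn` is passed to `DataLoader`. The real `collate_fn_pad` lives in the `datasets` module and is not shown here. The sketch below uses plain Python lists instead of tensors, and the name `pad_batch`, the `(frames, label)` sample layout, and the zero padding value are assumptions. In PyTorch code the same idea is usually expressed with `torch.nn.utils.rnn.pad_sequence`.

```python
def pad_batch(batch, pad_value=0.0):
    """Pad variable-length sequences in a batch to the longest one.

    `batch` is a list of (frames, label) pairs, where `frames` is a list of
    equally sized feature vectors. This layout is hypothetical.
    """
    lengths = [len(frames) for frames, _ in batch]
    max_len = max(lengths)
    n_features = len(batch[0][0][0])
    padded = []
    for frames, _ in batch:
        # Append fresh padding frames so sequences do not share list objects.
        extra = [[pad_value] * n_features for _ in range(max_len - len(frames))]
        padded.append([list(frame) for frame in frames] + extra)
    labels = [label for _, label in batch]
    # The original lengths are returned so a model can ignore padded frames.
    return padded, lengths, labels


# Two toy samples with 3 and 1 frames of 2 features each.
toy_batch = [
    ([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], 1),
    ([[7.0, 8.0]], 0),
]
padded, lengths, labels = pad_batch(toy_batch)
```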

### Validating model performance

In [4]:
# Use the saved model to ensure that the weights were saved successfully
info("Validating model")
model = AntiSpoofingRNN(INPUT_SIZE, HIDDEN_SIZE, LSTM_NUM_LAYERS, LINEAR_SIZE, OUTPUT_SIZE)
model_manager = ModelManager(MODEL_NAME, make_valid_path(MODELS_DIR, is_dir=True, exist_ok=True))
model, _ = model_manager.load_checkpoint(MODEL_NAME + '.pkl', model)

[2021-12-13 18:28:02.957199] -- Validating model


In [5]:
# Determine threshold based on validation data, then calculate metrics
info("Validating performance on validation data")
files, y_true, y_pred, y_prob, threshold = test(model, val_loader, pred_threshold='auto', device=get_accelerator('cuda'))
info(f"Classification threshold = {threshold}")
export_incorrect_samples_to_csv(files, y_pred, y_true, os.path.join(OUTPUT_DIR, 'val_incorrect.csv'))

info("Calculating validation metrics")
val_metrics, val_metrics_str = Metrics.calculate_metrics(y_true, y_pred, y_prob)
info(f"Validation accuracy = {val_metrics['Accuracy']:0.4f}")

# Then find train metrics
info("Validating performance on train data")
files, y_true, y_pred, y_prob, threshold = test(model, train_loader, pred_threshold=threshold, 
                                                device=get_accelerator('cuda'))
export_incorrect_samples_to_csv(files, y_pred, y_true, os.path.join(OUTPUT_DIR, 'train_incorrect.csv'))

info("Calculating train metrics")
train_metrics, train_metrics_str = Metrics.calculate_metrics(y_true, y_pred, y_prob)
info(f"Train accuracy = {train_metrics['Accuracy']:0.4f}")

# Save metrics to file
with open(os.path.join(OUTPUT_DIR, 'metrics.csv'), 'w', newline='') as f:  # newline='' avoids blank rows on Windows
    train_metrics['#'] = 'Train'
    val_metrics['#'] = 'Validation'
    w = csv.DictWriter(f, sorted(val_metrics.keys()))
    w.writeheader()
    w.writerow(train_metrics)
    w.writerow(val_metrics)

info("Validating model done!")

[2021-12-13 18:28:03.709946] -- Validating performance on validation data


Processing batch # 156/157: : 157it [00:12, 12.26it/s]


[2021-12-13 18:28:20.167603] -- Classification threshold = 0.4798378646373749
[2021-12-13 18:28:20.700978] -- Calculating validation metrics
[2021-12-13 18:28:21.243534] -- Validation accuracy = 97.3597
[2021-12-13 18:28:21.744031] -- Validating performance on train data


Processing batch # 625/626: : 626it [00:35, 17.41it/s]


[2021-12-13 18:28:58.268309] -- Calculating train metrics
[2021-12-13 18:28:58.824226] -- Train accuracy = 98.1275
[2021-12-13 18:28:59.333199] -- Validating model done!
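
The accuracies logged above are reported as percentages. `Metrics.calculate_metrics` is defined in `utilities.model_utils` and is not shown in this notebook, so the snippet below is only a rough sketch of how such values can be derived from the true and predicted labels. The function name `basic_metrics`, the `Precision` and `Recall` keys, and the choice of label `1` as the positive class are assumptions; only the `Accuracy` key appears in the cells above.

```python
def basic_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall as percentages from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    return {
        "Accuracy": 100.0 * (tp + tn) / total if total else 0.0,
        "Precision": 100.0 * tp / (tp + fp) if (tp + fp) else 0.0,
        "Recall": 100.0 * tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels, invented for this example only.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
toy_metrics = basic_metrics(toy_true, toy_pred)
print(f"Accuracy = {toy_metrics['Accuracy']:0.4f}")
```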


### Generating scores for test set

In [6]:
with open(SCALER_PATH, "rb") as f:
    scaler = pickle.load(f)

In [7]:
test_files = []
test_scores = []
test_preds = []
bar = tqdm(os.listdir(TEST_RAW_DIR))
time.sleep(1.0)

for wav_file in bar:
    bar.set_description(wav_file)
    audio_path = os.path.join(TEST_RAW_DIR, wav_file)
    score, code = test_sample(model, audio_path, scaler, pred_threshold=threshold, return_types=['score', 'code'])
    test_files.append(wav_file)
    test_scores.append(score)
    test_preds.append(code)
    

sample_4999.wav: 100%|█████████████████████████████████████████████████████████████| 5000/5000 [02:33<00:00, 32.56it/s]


### Save predictions to file

In [8]:
score_file = os.path.join(OUTPUT_DIR, 'test_scores.csv')
predictions_file = os.path.join(OUTPUT_DIR, 'test_predictions.csv')
pd.DataFrame(zip(test_files, test_scores)).to_csv(score_file, index=False, header=['Filename', 'Score'])
pd.DataFrame(zip(test_files, test_preds)).to_csv(predictions_file, index=False, header=['Filename', 'Prediction'])

### Listen to and test a sample file

In [9]:
sample_audio = 'E:/Datasets/ID R&D/data/raw/Testing_Data/sample_0000.wav'
ipd.Audio(sample_audio)

In [10]:
out = test_sample(model, sample_audio, scaler, return_types='class_name', pred_threshold=threshold)
info(f"The sample audio '{os.path.basename(sample_audio)}' is: {out}")

[2021-12-13 18:31:33.603033] -- The sample audio 'sample_0000.wav' is: human
