# Part 2 - Custom Testset Walkthrough
This notebook walks you through creating your own test set and evaluating the pre-trained models on it.

In [None]:
# Setup

# Import the git repo and install required libraries
!git clone --branch Showcase https://github.com/Hapemo/Deepfake-Audio-Detection/
%cd Deepfake-Audio-Detection/
%pip install -r requirements.txt

# Prepare data folder and download speech samples
!mkdir data
%cd data
!gdown 1WevCjrJJ7pv9XzCwjbyJ1iNr2uoOYnZI
!gdown 1TE88aXpA5YvTP5KwjM_1SYs3_HinXjDj
!gdown 1JvitJbYdjojw5ORYv42XTA9iairs8_40
!unzip showcase_samples.zip
!unzip SileroVAD_samples.zip
!unzip SPEAKER0001.zip
%cd ..


## Preparing your testset
There are two things to note when preparing the dataset: the file type and the naming convention.

First and simplest, every file has to be a '.wav' file.

The naming convention is a little more involved. During training and evaluation, the label is derived from the file name: a speech file is treated as spoofed if its name, without the extension, is longer than 8 characters. E.g., 12345678.wav is deemed bonafide and 123456789.wav is deemed spoof.
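
The rule above can be written down as a tiny helper, which is handy for checking your own file names before running anything. This is only a sketch of the rule as described here, not the repository's actual loading code; the function name `label_from_filename` is made up for illustration.

```python
from pathlib import Path

def label_from_filename(path: str) -> str:
    """Apply the naming rule: a stem longer than 8 characters means spoof."""
    stem = Path(path).stem  # file name without its extension
    return "spoof" if len(stem) > 8 else "bonafide"

# Hypothetical file names, including the two examples from the text
for name in ["12345678.wav", "123456789.wav", "b0000001.wav", "s0000000001.wav"]:
    print(name, "->", label_from_filename(name))
```

Only the length of the name matters under this rule, so any characters can be used as long as the length lands on the right side of 8.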

To use your own dataset, you will need a little Python to rename the files so that they fit this convention. The 'os' module is useful for managing paths and the 'shutil' module for copying files under a new name. Below are some tips for your custom file names.

1. Including a letter in every file name ensures it will not clash with the provided sample audio, because the sample file names contain only digits and '-'.
2. Make sure your audio file extension is '.wav'.

In [None]:
# Small code segment to demonstrate how to rename files to the naming convention and convert them to 'wav' files.
%pip install pydub # Comment this out when it's installed already
from pydub import AudioSegment
import os
import shutil

srcdir = "source directory of speech to convert"
dstdir = "final directory of converted speech"
counter = 0
spoof = False # Remember to change this!

os.makedirs(dstdir, exist_ok=True)

def nestLooper(dirpath: str, func):
    ''' Loop through all the files in the directory and its sub directories, and apply a function to each file '''
    for entry in sorted(os.listdir(dirpath)):
        path = os.path.join(dirpath, entry)
        if os.path.isdir(path):
            nestLooper(path, func)  # Recurse into sub directories
        else:
            func(path)              # Only files are passed to func

def mp3FileNamer(filepath: str):
    ''' Generate a unique name for a speech file, then convert it (mp3) or copy it (wav) into dstdir '''
    extension = os.path.splitext(filepath)[1].lower()

    # Skip anything that is not an mp3 or wav file
    if extension not in ('.mp3', '.wav'): return

    global counter
    if spoof:
        # 's' + 10 digits = 11 characters, more than 8, so the file is read as spoof
        newbasename = f"s{str(counter).zfill(10)}.wav"
    else:
        # 'b' + 7 digits = 8 characters, so the file is read as bonafide
        newbasename = f"b{str(counter).zfill(7)}.wav"
    counter += 1

    dstpath = os.path.join(dstdir, newbasename)
    if extension == ".mp3":
        # pydub relies on ffmpeg being installed to decode mp3
        sound = AudioSegment.from_mp3(filepath)
        sound.export(dstpath, format="wav")
        print(f"Converted {filepath} to {dstpath}")
    else:
        shutil.copy(filepath, dstpath)
        print(f"Copied {filepath} to {dstpath}")

nestLooper(srcdir, mp3FileNamer)


## Prepare your config file for dataset
The config file dictates which pretrained model, model parameters, hyperparameters, and dataset to use. This section focuses on the custom dataset, for which you will need to take note of 6 settings in the config file.
1. database_path
    
    database_path indicates the relative path of the database.
2. use_new_fileloader
    
    Since the original AASIST codebase has its own data-loading method that is unsuitable for our usage, this value must be 1 to ensure our custom data-loading method is used.
3. blacklist_folders, eval_folders, dev_folders, train_folders
    
    To explain how to configure these, we first have to introduce the loading structure. Everything residing in database_path is scanned, including all nested folders and files. blacklist_folders lists the folders that are ignored when collecting data. eval_folders lists the folders that, together with all their nested folders, are scanned for wav files; every wav file found is added to the eval dataset. dev_folders and train_folders work the same way for the dev and train datasets.
    
    After scanning, a segment_info.txt file is generated at database_path, recording each data path and the dataset it belongs to. When you want to run a new dataset config, delete this text file; otherwise the model will simply reuse the old dataset config to save time.

### Example
This is the folder structure and the matching config:

<pre>└── dataset_root/
   ├── data1/
   │   ├── eval_1/
   │   │   ├── eval1.1
   │   │   └── eval1.2
   │   ├── dev_1/
   │   │   ├── dev1.1
   │   │   └── dev1.2
   │   └── train
   ├── data2/
   │   ├── eval_2/
   │   │   ├── eval2.1
   │   │   └── eval2.2
   │   └── dev_train_2/
   │       ├── data2_dev
   │       └── data2_train
   └── data3/
       ├── eval_1/
       │
       └── dev_train

"database_path": "dataset_root",
"blacklist_folders": "data3",
"eval_folders": "eval_1, eval_2",
"dev_folders": "dev_1, data2_dev",
"train_folders": "train, dev_train_2",
"use_new_fileloader": 1

</pre>
The final dataset will contain data from these folders:
- Eval Set: eval1.1, eval1.2, eval2.1, eval2.2
- Dev Set: dev1.1, dev1.2, data2_dev
- Train Set: train, data2_train
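
To make the example concrete, here is a minimal sketch that rebuilds the folder tree in a temporary directory and assigns each folder to a dataset. It is not the repository's loader. It assumes two rules that are consistent with the example but not confirmed by the source: a blacklisted ancestor always wins, and otherwise the deepest listed folder in a path decides the dataset. The helper names (`names`, `assign_split`, `scan`) and the dummy file name are hypothetical.

```python
import os
import tempfile

def names(value):
    """Split a comma-separated config value into a set of folder names."""
    return {n.strip() for n in value.split(",") if n.strip()}

def assign_split(rel_dir, cfg):
    """Return 'eval', 'dev', 'train' or None for a folder relative to database_path."""
    parts = rel_dir.split(os.sep)
    # Assumed rule 1: anything below a blacklisted folder is ignored
    if names(cfg["blacklist_folders"]) & set(parts):
        return None
    # Assumed rule 2: the deepest listed folder in the path decides
    for part in reversed(parts):
        for split in ("eval", "dev", "train"):
            if part in names(cfg[f"{split}_folders"]):
                return split
    return None

def scan(cfg):
    """Collect the names of folders holding wav files, grouped by dataset."""
    root = cfg["database_path"]
    found = {"eval": [], "dev": [], "train": []}
    for dirpath, _dirs, files in os.walk(root):
        split = assign_split(os.path.relpath(dirpath, root), cfg)
        if split and any(f.lower().endswith(".wav") for f in files):
            found[split].append(os.path.basename(dirpath))
    return {k: sorted(v) for k, v in found.items()}

# Leaf folders of the example tree; each gets one empty dummy wav file
LEAVES = [
    "data1/eval_1/eval1.1", "data1/eval_1/eval1.2",
    "data1/dev_1/dev1.1", "data1/dev_1/dev1.2", "data1/train",
    "data2/eval_2/eval2.1", "data2/eval_2/eval2.2",
    "data2/dev_train_2/data2_dev", "data2/dev_train_2/data2_train",
    "data3/eval_1", "data3/dev_train",
]

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "dataset_root")
    for leaf in LEAVES:
        folder = os.path.join(root, *leaf.split("/"))
        os.makedirs(folder)
        open(os.path.join(folder, "a0000001.wav"), "wb").close()
    cfg = {
        "database_path": root,
        "blacklist_folders": "data3",
        "eval_folders": "eval_1, eval_2",
        "dev_folders": "dev_1, data2_dev",
        "train_folders": "train, dev_train_2",
    }
    example_result = scan(cfg)

print(example_result)
```

Under these assumed rules the sketch produces the same split as listed above: data2_dev lands in the dev set even though its parent dev_train_2 is a train folder, and everything under data3 is skipped.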

NOTE!
- Every dataset (train, dev and eval) must contain at least one sample, so SPEAKER0001 (dummy data) was added to the train and dev sets of some config files to fulfill this requirement.
- The eval dataset must contain at least one bonafide and one spoof sample during testing.
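
Before running an evaluation, a quick check along these lines can confirm that a folder satisfies the second requirement. It is a sketch that reuses the file-name rule from the "Preparing your testset" section (a name longer than 8 characters means spoof); the function names and the demo file names are hypothetical, and the repository may count files differently.

```python
import tempfile
from pathlib import Path

def count_labels(folder):
    """Count wav files per label, scanning nested folders as well."""
    counts = {"bonafide": 0, "spoof": 0}
    for wav in Path(folder).rglob("*.wav"):
        # Naming rule: a stem longer than 8 characters is treated as spoof
        counts["spoof" if len(wav.stem) > 8 else "bonafide"] += 1
    return counts

def has_both_labels(folder):
    """True when the folder holds at least one bonafide and one spoof file."""
    counts = count_labels(folder)
    return counts["bonafide"] > 0 and counts["spoof"] > 0

# Demo on a throwaway folder with hypothetical file names
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "b0000000.wav").touch()
    only_bonafide_ok = has_both_labels(tmp)   # no spoof file yet
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "nested" / "s0000000000.wav").touch()
    demo_counts = count_labels(tmp)
    both_ok = has_both_labels(tmp)

print(demo_counts, only_bonafide_ok, both_ok)
```

To use it on real data, call `has_both_labels` on each folder you list under eval_folders.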


## Running evaluation
Navigate to the base directory of the repository, set the config file in the cell below, and make sure the '--eval' flag is present.

The EER for each of the following configs should be as follows:
- Apple1.1.conf: 0.14065%
- Apple1.2.conf: 1.54716%
- AASIST.conf: 16.80744%
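
For readers new to the metric, the equal error rate (EER) is the error rate at the decision threshold where the share of spoofed samples accepted as bonafide equals the share of bonafide samples rejected. The sketch below approximates it by trying every observed score as a threshold. It assumes that a higher score means "more likely bonafide" and uses made-up scores; it is not the computation used in `pyfiles/main.py`.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold."""
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # A sample is accepted as bonafide when its score is >= t
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)  # bonafide rejected
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)       # spoof accepted
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where both error rates are closest
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Hypothetical scores: one spoof sample scores higher than one bonafide sample
overlapping = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
# Hypothetical scores: the two classes are perfectly separated
separated = compute_eer([0.9, 0.8], [0.1, 0.2])
print(f"{overlapping * 100:.2f}%", f"{separated * 100:.2f}%")
```

The function returns a fraction, so multiply by 100 to compare with the percentages above. Both score lists must be non-empty, which is one way to see why the eval set needs at least one bonafide and one spoof sample.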

### Details about the pretrained models and evaluation dataset
The AASIST model is trained on the ASVspoof 2019 dataset, which comprises English speech with Western accents. The spoofed speech is generated from a wide range of audio formats and Text-To-Speech and Voice Conversion models.

The Apple1.1 model is trained on all the data used for the AASIST model, plus bonafide English speech with Singaporean accents and its spoofed counterparts. The spoofed counterparts are generated with Mangio-RVC and a wide range of TTS models such as Coqui, FastSpeech2 and StyleTTS2.

The Apple1.2 model is trained on the same data as Apple1.1, except that the Singaporean-accent bonafide and spoofed data was first trimmed with the SileroVAD ML model to remove the non-speech portions of the audio.

In [None]:
%run "pyfiles/main.py" --eval --config "./config/Apple1.1.conf" 
%run "pyfiles/main.py" --eval --config "./config/Apple1.2.conf"
%run "pyfiles/main.py" --eval --config "./config/AASIST.conf"
import os  # Needed if the conversion cell above was skipped

# Remove segment_info.txt so that the next run rescans the dataset.
# We only remove it after these 3 config files because they use the same evaluation dataset.
os.remove("data/segment_info.txt")

### Testing on SileroVAD speeches
The EER with Apple1.1 should be 33.33%, and with Apple1.2 it should be 0%.

Why does Apple1.2 perform way better than Apple1.1?

Try out these pretrained models on your custom evaluation dataset and see which one has the lowest EER!

In [None]:
%run "pyfiles/main.py" --eval --config "./config/Apple1.1_eval_on_SileroVAD.conf"
%run "pyfiles/main.py" --eval --config "./config/Apple1.2_eval_on_SileroVAD.conf"
import os  # Needed if the earlier cells were skipped

# Remove segment_info.txt so that the next run rescans the dataset
os.remove("data/segment_info.txt")