Machine learning based KNN classifier: towards robust, efficient DTMF tone detection for a Noisy environment
Link to publication - https://doi.org/10.1007/s11042-021-11194-3
NOTE:
- All modelling, simulation and analysis were done in MATLAB R2019b. The code is not backwards-compatible with older versions of MATLAB because it relies on newer functions, such as 'audioDataAugmenter' and 'audioDatastore', that are not available in older releases. These functions are part of the 'Machine Learning and Deep Learning for Audio Toolbox'.
- The following instructions will be given in the same order as the proposed methodology of the project, i.e. data acquisition, data augmentation, data exploration followed by modelling of all the KNN classifier models (A, B, C, D, E and F).
INSTRUCTIONS:
Data Acquisition: (All files are inside 'Data Handling' folder)
- The dataset used for KNN models A, B and C was acquired by repeatedly downloading dual-tone audio files from the website 'audiocheck.net' using a Selenium script written in Python. This script is named 'autodownload_dtmftones.py'. When executed, it downloads all 2032 files one after another into the browser's default downloads folder.
NOTE: The web driver path in the script needs to be changed according to the file location of the user's default web browser application.
- All files are downloaded with filenames that follow the naming convention of 'audiocheck.net'. To rename the files to a more legible format, run 'rename_for_dataset.m'. This MATLAB script renames every file to the format 'lower_freq'+'higher_freq'_'index number'.wav, e.g. '770+1336_522.wav'.
NOTE: The 'source' and 'destination' variables need to be changed to the required source (the browser's download folder) and destination file paths on the user's computer. Both scripts have already been executed, and the downloaded, renamed files are stored in the folder 'dataset'.
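The naming format above can be illustrated with a minimal Python sketch. It is not the code in 'rename_for_dataset.m'; the helper names 'build_name', 'parse_name' and 'key_for' are hypothetical, and the keypad table is the standard DTMF frequency layout rather than anything taken from the repository.

```python
import re

# Standard DTMF keypad: row (low-group) and column (high-group) frequencies in Hz.
LOW_FREQS = (697, 770, 852, 941)
HIGH_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def build_name(low_hz, high_hz, index):
    """Return a name in the 'lower_freq+higher_freq_index.wav' format."""
    return f"{low_hz}+{high_hz}_{index}.wav"


def parse_name(filename):
    """Split a renamed file into (low_hz, high_hz, index)."""
    match = re.fullmatch(r"(\d+)\+(\d+)_(\d+)\.wav", filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    low_hz, high_hz, index = (int(group) for group in match.groups())
    return low_hz, high_hz, index


def key_for(low_hz, high_hz):
    """Look up the keypad symbol for a DTMF frequency pair."""
    return KEYPAD[LOW_FREQS.index(low_hz)][HIGH_FREQS.index(high_hz)]


low, high, idx = parse_name("770+1336_522.wav")
print(low, high, idx, key_for(low, high))
```

Keeping both frequencies in the filename means the class label can be recovered from the name alone, without a separate label file.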
Data Augmentation: (All files are inside 'Data Handling' folder)
- The augmented dataset required for training KNN models D, E and F, as well as for testing all KNN models, is created by the MATLAB script 'augmented_dataset_create.m'.
NOTE: The variable 'data_loc' needs to be changed to the file path of the originally downloaded dataset and the variable 'augmented_dataset_loc' needs to be changed to the required file path of the augmented dataset. The above code has been executed and the augmented dataset is already stored in the folder named 'augmented_dataset'.
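As a rough illustration of what augmenting a clean tone for a noisy environment can look like, the Python sketch below adds white Gaussian noise at a chosen signal-to-noise ratio. This is an assumption for illustration only: it does not reproduce 'augmented_dataset_create.m', and the augmentation types, the 10 dB SNR, the 8 kHz sample rate and the function names are all hypothetical choices.

```python
import math
import random


def dtmf_tone(low_hz, high_hz, fs=8000, duration_s=0.5):
    """Synthesize a clean dual tone as the sum of two equal-amplitude sinusoids."""
    n = int(fs * duration_s)
    return [
        0.5 * math.sin(2 * math.pi * low_hz * t / fs)
        + 0.5 * math.sin(2 * math.pi * high_hz * t / fs)
        for t in range(n)
    ]


def add_noise_at_snr(signal, snr_db, rng):
    """Return the signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually achieved, from the clean and noisy signals."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = dtmf_tone(770, 1336)
noisy = add_noise_at_snr(clean, snr_db=10, rng=rng)
print(f"measured SNR: {measured_snr_db(clean, noisy):.1f} dB")
```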
Data Exploration: (All files are inside 'Data Exploration' folder)
- To plot the mel spectrogram of various audio files and visually check whether MFCCs are viable features, run the MATLAB script 'plot_melspec.m'.
NOTE: The variable 'dest' must be set to the names of the audio files whose mel spectrograms are to be plotted. The variable 'prefix' must be set to the file path of the dataset folder containing those audio files.
- To plot the DFT coefficient values computed with Goertzel's algorithm and visually check whether they are viable features, run the MATLAB script 'plot_goertzel.m'.
NOTE: The variable 'loc' must be set to the file path of the audio file whose plot is to be constructed.
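For readers unfamiliar with Goertzel's algorithm, the following Python sketch shows the idea: it evaluates the DFT power at a handful of target frequencies instead of computing a full FFT. It is a minimal sketch and not the code in 'plot_goertzel.m'; the 8 kHz sample rate, the 800-sample block and the helper 'strongest_pair' are assumptions made for the example.

```python
import math

# The eight standard DTMF frequencies (low group, then high group), in Hz.
DTMF_FREQS = (697, 770, 852, 941, 1209, 1336, 1477, 1633)


def goertzel_power(samples, target_hz, fs):
    """Squared DFT magnitude at the bin nearest target_hz, via Goertzel's recurrence."""
    n = len(samples)
    k = round(n * target_hz / fs)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def strongest_pair(samples, fs):
    """Pick the strongest low-group and high-group frequency (hypothetical helper)."""
    powers = {f: goertzel_power(samples, f, fs) for f in DTMF_FREQS}
    low = max(DTMF_FREQS[:4], key=powers.get)
    high = max(DTMF_FREQS[4:], key=powers.get)
    return low, high


fs = 8000
tone = [
    math.sin(2 * math.pi * 770 * t / fs) + math.sin(2 * math.pi * 1336 * t / fs)
    for t in range(800)
]
for f in DTMF_FREQS:
    print(f, round(goertzel_power(tone, f, fs)))
print("detected pair:", strongest_pair(tone, fs))
```

Because only eight frequencies matter for DTMF, eight Goertzel evaluations give a compact feature vector at a lower cost than a full spectrum.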
KNN Classifier Modelling:
KNN Model A: (All files are inside 'KNN MODEL A' folder)
- To extract the features needed to train KNN Model A, run the MATLAB script 'data_prep_features_extract_without_da_mfcc.m'.
NOTE: The variable 'finaldataset_loc' must be set to the file path of the audio dataset downloaded using 'autodownload_dtmftones.py', i.e. the provided folder named 'dataset'.
- To train KNN classifier Model A on the features computed by the script above, run 'KNN_classifier_without_DA_mfcc.m'. The script trains the model, prints the 5-fold stratified cross-validation accuracy and plots the confusion charts for validation and testing accuracy.
- To test the model on the augmented dataset, which imitates real-world conditions, run 'test_accuracy_augmented_data.m'. It feeds the augmented testing dataset to the model and plots the resulting confusion chart.
NOTE: The variable 'augmented_dataset_loc' must be set to the file path of the augmented dataset, i.e. the provided folder named 'augmented_dataset'.
- 'HelperComputePitchAndMFCC.m' and 'HelperTestKNNClassifier.m' are helper functions and do not need to be executed explicitly. They are invoked automatically by the main MATLAB scripts mentioned above.
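The core of a KNN classifier is small enough to sketch in a few lines of Python. This is only an illustration of the technique, not the MATLAB training code: the Euclidean distance, k = 3, the two-dimensional toy features and the function name 'knn_predict' are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D feature vectors standing in for real audio features.
train_features = [
    (0.0, 0.1), (0.1, 0.0), (0.2, 0.1),   # class "1"
    (1.0, 0.9), (0.9, 1.0), (1.1, 1.0),   # class "5"
]
train_labels = ["1", "1", "1", "5", "5", "5"]

print(knn_predict(train_features, train_labels, (0.95, 0.95)))
print(knn_predict(train_features, train_labels, (0.05, 0.05)))
```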
KNN Model B: (All files are inside 'KNN MODEL B' folder)
- To extract the features needed to train KNN Model B, run the MATLAB script 'data_prep_features_extract_without_da_g_without_sh.m'.
NOTE: The variable 'finaldataset_loc' must be set to the file path of the audio dataset downloaded using 'autodownload_dtmftones.py', i.e. the provided folder named 'dataset'.
- To train KNN classifier Model B on the features computed by the script above, run 'KNN_classifier_without_DA_g_without_sh.m'. The script trains the model, prints the 5-fold stratified cross-validation accuracy and plots the confusion charts for validation and testing accuracy.
- To test the model on the augmented dataset, which imitates real-world conditions, run 'test_accuracy_augmented_data.m'. It feeds the augmented testing dataset to the model and plots the resulting confusion chart.
NOTE: The variable 'augmented_dataset_loc' must be set to the file path of the augmented dataset, i.e. the provided folder named 'augmented_dataset'.
- 'HelperComputeGoertzelFreq.m' and 'HelperTestKNNClassifier.m' are helper functions and do not need to be executed explicitly. They are invoked automatically by the main MATLAB scripts mentioned above.
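The 5-fold stratified cross-validation mentioned above can be pictured with the following Python sketch, which deals the samples of each class round-robin into folds so that every fold keeps roughly the same class proportions. It is a hedged illustration, not the MATLAB partitioning code; the function name 'stratified_folds', the seed and the toy labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each sample index to a fold so every class is spread evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)  # randomize order within the class
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Toy label list: three classes with 10, 10 and 5 samples.
labels = ["1"] * 10 + ["5"] * 10 + ["#"] * 5
folds = stratified_folds(labels)
for fold_number, test_indices in enumerate(folds):
    held_out = sorted(labels[i] for i in test_indices)
    print(fold_number, held_out)
```

Each fold is used once as the held-out set while the model is trained on the other four, and the reported accuracy is the average over the five runs.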
KNN Model C: (All files are inside 'KNN MODEL C' folder)
- To extract the features needed to train KNN Model C, run the MATLAB script 'data_prep_features_extract_without_da_g_with_sh.m'.
NOTE: The variable 'finaldataset_loc' must be set to the file path of the audio dataset downloaded using 'autodownload_dtmftones.py', i.e. the provided folder named 'dataset'.
- To train KNN classifier Model C on the features computed by the script above, run 'KNN_classifier_without_DA_g_with_sh.m'. The script trains the model, prints the 5-fold stratified cross-validation accuracy and plots the confusion charts for validation and testing accuracy.
- To test the model on the augmented dataset, which imitates real-world conditions, run 'test_accuracy_augmented_data.m'. It feeds the augmented testing dataset to the model and plots the resulting confusion chart.
NOTE: The variable 'augmented_dataset_loc' must be set to the file path of the augmented dataset, i.e. the provided folder named 'augmented_dataset'.
- 'HelperComputeGoertzelFreq.m' and 'HelperTestKNNClassifier.m' are helper functions and do not need to be executed explicitly. They are invoked automatically by the main MATLAB scripts mentioned above.
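A confusion chart is a tally of (true label, predicted label) pairs. The Python sketch below builds such a tally and the corresponding accuracy from two label lists. It is illustrative only: the labels are made-up toy values, not results from the models, and the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def print_confusion(true_labels, predicted_labels):
    """Print the confusion matrix with true labels as rows, predictions as columns."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    counts = confusion_counts(true_labels, predicted_labels)
    print("true\\pred " + " ".join(f"{c:>3}" for c in classes))
    for t in classes:
        print(f"{t:>9} " + " ".join(f"{counts[(t, p)]:>3}" for p in classes))


# Toy example: one "5" is misclassified as "9".
true_labels = ["1", "1", "5", "5", "9", "9"]
predicted_labels = ["1", "1", "5", "9", "9", "9"]
print_confusion(true_labels, predicted_labels)
print(f"accuracy: {accuracy(true_labels, predicted_labels):.3f}")
```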
KNN Model D: (All files are inside 'KNN MODEL D' folder)
- To extract the features needed to train KNN Model D, run the MATLAB script 'data_prep_features_extract_with_da_mfcc.m'.
NOTE: The variable 'augmented_dataset_loc' must be set to the file path of the augmented audio dataset, i.e. the provided folder named 'augmented_dataset'.
- To train KNN classifier Model D on the features computed by the script above, run 'KNN_classifier_with_DA_mfcc.m'. The script trains the model, prints the 5-fold stratified cross-validation accuracy and plots the confusion charts for validation and testing accuracy.
- 'HelperComputePitchAndMFCC.m' and 'HelperTestKNNClassifier.m' are helper functions and do not need to be executed explicitly. They are invoked automatically by the main MATLAB scripts mentioned above.
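MFCC features, as used by the MFCC-based models, are computed on the mel frequency scale. The Python sketch below uses one common Hz-to-mel conversion formula to show where the eight standard DTMF frequencies fall on that scale. The specific formula is an assumption; the MATLAB helper may use a different mel variant.

```python
import math

# The eight standard DTMF frequencies, in Hz.
DTMF_FREQS = (697, 770, 852, 941, 1209, 1336, 1477, 1633)


def hz_to_mel(freq_hz):
    """Convert Hz to mel with a common logarithmic formula (assumed variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


for f in DTMF_FREQS:
    print(f"{f:>5} Hz -> {hz_to_mel(f):7.1f} mel")
```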
KNN Model E: (All files are inside 'KNN MODEL E' folder)
- To extract the features needed to train KNN Model E, run the MATLAB script 'data_prep_features_extract_with_da_g_without_sh.m'.
NOTE: The variable 'augmented_dataset_loc' must be set to the file path of the augmented audio dataset, i.e. the provided folder named 'augmented_dataset'.
- To train KNN classifier Model E on the features computed by the script above, run 'KNN_classifier_with_DA_mfcc.m'. The script trains the model, prints the 5-fold stratified cross-validation accuracy and plots the confusion charts for validation and testing accuracy.
- 'HelperComputeGoertzelFreq.m' and 'HelperTestKNNClassifier.m' are helper functions and do not need to be executed explicitly. They are invoked automatically by the main MATLAB scripts mentioned above.
KNN Model F: (All files are inside 'KNN MODEL F' folder)
- To extract the features needed to train KNN Model F, run the MATLAB script 'data_prep_features_extract_with_da_g_with_sh.m'.
NOTE: The variable 'augmented_dataset_loc' must be set to the file path of the augmented audio dataset, i.e. the provided folder named 'augmented_dataset'.
- To train KNN classifier Model F on the features computed by the script above, run 'KNN_classifier_with_DA_g_with_sh.m'. The script trains the model, prints the 5-fold stratified cross-validation accuracy and plots the confusion charts for validation and testing accuracy.
- 'HelperComputeGoertzelFreq.m' and 'HelperTestKNNClassifier.m' are helper functions and do not need to be executed explicitly. They are invoked automatically by the main MATLAB scripts mentioned above.