<a href="https://colab.research.google.com/github/dbruce6/ucb_voiceshield/blob/main/VoiceShield_Open_Source_detecting_ai_generated_speech_with_random_forests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Demonstrating Audio
A short listening demonstration before we implement any machine learning.

Let's first play two audio clips: the real audio of Linus Sebastian, followed by a DeepFake conversion of the same audio into Elon Musk's voice.

**Compression:** Because the original audio files are large enough to crash Kaggle notebooks, I have uploaded compressed versions of these files for use here.

**Note:** It is important not to use the files in the "DEMONSTRATION" directory for your experiments.

In [9]:
import IPython.display
import librosa
from google.colab import drive

# Mount Google Drive so that "/content/drive/My Drive" is available
drive.mount("/content/drive")

# Paths to the compressed demonstration clips; adjust them to your own Drive layout.
# The original (real) clip is assumed to sit in the same folder as the converted clip.
real_audio_path = "/content/drive/My Drive/Capstone/data/DEMONSTRATION/DEMONSTRATION/linus-original-DEMO.mp3"
fake_audio_path = "/content/drive/My Drive/Capstone/data/DEMONSTRATION/DEMONSTRATION/linus-to-musk-DEMO.mp3"

# sr=None keeps each file's native sample rate, which playback needs below
real_audio, real_sr = librosa.load(real_audio_path, sr=None)
fake_audio, fake_sr = librosa.load(fake_audio_path, sr=None)

## Real Audio
The audio clip below is real human speech.

In [None]:
print("Real audio (Linus Sebastian):")
# Raw NumPy samples need an explicit sample rate for playback
IPython.display.Audio(real_audio, rate=real_sr)

## DeepFake Audio

The audio clip below is the previous speech converted to Elon Musk's voice using Retrieval-based Voice Conversion (RVC).

In [None]:
print("DeepFake audio (Linus to Elon Musk):")
IPython.display.Audio(fake_audio, rate=fake_sr)

# Machine Learning Demonstration

You may wish to use the audio files in the "AUDIO" directory for your own methods of feature extraction and preprocessing.

Several audio features have already been extracted for you; they are stored in the "DATASET-balanced.csv" file.
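If you do extract your own features, they are usually computed over short overlapping frames. The sketch below is a hedged, NumPy-only illustration of two simple frame-level features, RMS energy and zero-crossing rate; it is not necessarily how the CSV was produced. The function name `frame_features` is hypothetical, and the frame and hop sizes (2048 and 512 samples) are assumptions chosen to match librosa's defaults for `librosa.feature.rms` and `librosa.feature.zero_crossing_rate`. librosa also centres and pads frames, so its values will differ slightly.

```python
import numpy as np

def frame_features(signal, frame_length=2048, hop_length=512):
    """Return per-frame RMS energy and zero-crossing rate for a 1-D signal (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_length:
        raise ValueError("signal must be at least one frame long")
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    rms = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop_length : i * hop_length + frame_length]
        # Root-mean-square amplitude of the frame
        rms[i] = np.sqrt(np.mean(frame ** 2))
        # Fraction of adjacent sample pairs whose sign differs
        zcr[i] = np.mean(np.signbit(frame[1:]) != np.signbit(frame[:-1]))
    return rms, zcr

# Usage: one second of a 440 Hz tone at 22,050 Hz, summarised by per-clip means
sr = 22050
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
rms, zcr = frame_features(tone)
print(rms.mean(), zcr.mean())
```

Averaging the per-frame values, as in the last lines, is one simple way to turn a variable-length clip into a fixed-length feature row.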


## Loading and preparing the data

We first load the CSV file and split it in two: X holds the feature columns and y holds the ground-truth labels, taken from the last column.

In [None]:
import pandas as pd

df = pd.read_csv("/kaggle/input/deep-voice-deepfake-voice-recognition/KAGGLE/DATASET-balanced.csv")

# Every column except the last is a feature; the last column holds the REAL/FAKE label
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

print(X.head(10))
print(y.head(10))

## Binarise the Labels for Binary Classification

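Most scikit-learn classifiers accept string labels such as "REAL" and "FAKE" directly and encode them internally, but other code is less forgiving: for example, `precision_score` with `pos_label=1` raises an error on string labels, and some third-party models expect numeric targets.

In this code, we create a `LabelBinarizer`, fit it to the labels, and transform them into a 1-D array of 0s and 1s. The result is compatible with any scikit-learn model and metric for binary classification. The classes are sorted alphabetically, so "FAKE" becomes 0 and "REAL" becomes 1.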
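For two classes, the mapping the binariser learns can be reproduced by hand, which makes the label order explicit. This is a minimal NumPy sketch on made-up labels, not scikit-learn's implementation; it relies only on `np.unique` returning its values sorted, as `LabelBinarizer.classes_` is.

```python
import numpy as np

# Made-up labels for illustration
labels = np.array(["REAL", "FAKE", "FAKE", "REAL"])

# Sorted unique classes, mirroring LabelBinarizer.classes_
classes = np.unique(labels)
# The second (alphabetically later) class becomes 1
binary = (labels == classes[1]).astype(int)

print(classes, binary)  # FAKE -> 0, REAL -> 1
```

The practical consequence is that any metric using `pos_label=1` treats "REAL" as the positive class.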

In [None]:
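from sklearn import preprocessing

# Fit the binariser on the string labels; classes are sorted, so "FAKE" -> 0 and "REAL" -> 1
lb = preprocessing.LabelBinarizer()
y = lb.fit_transform(y)
# For two classes LabelBinarizer returns a column vector; flatten it to 1-D
y = y.ravel()
print(lb.classes_)
print(y)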

## Prepare the model and K-Fold Cross Validation

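In this code, we first create a Random Forest classifier: an ensemble of 50 randomised decision trees. scikit-learn's implementation combines the trees by averaging their predicted class probabilities rather than by a simple majority vote.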
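The difference between majority voting and probability averaging can be seen on a single sample. This is a hedged NumPy sketch with hypothetical per-tree probabilities; with fully grown trees (scikit-learn's default), leaf probabilities are often exactly 0 or 1, in which case the two schemes usually agree.

```python
import numpy as np

# Hypothetical class-1 probabilities from three trees for one sample
tree_probs_class1 = np.array([0.4, 0.4, 1.0])

# Hard (majority) voting: each tree casts one vote for its most likely class
votes = (tree_probs_class1 > 0.5).astype(int)
hard_vote = int(np.bincount(votes, minlength=2).argmax())

# Soft voting, the scheme RandomForestClassifier uses: average the probabilities
soft_prob = tree_probs_class1.mean()
soft_vote = int(soft_prob > 0.5)

print(hard_vote, soft_vote)  # 0 1
```

Two trees lean weakly towards class 0, but the third is confident about class 1, so the averaged probability (0.6) flips the decision.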

Then, we create a KFold splitter for 5-fold cross-validation. The rows are shuffled before splitting, and `random_state=1` makes the split reproducible.
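Conceptually, shuffled k-fold cross-validation permutes the row indices, cuts them into k nearly equal folds, and uses each fold once as the test set while the rest form the training set. The generator below is a simplified sketch of that idea; `kfold_indices` is a hypothetical name, and it will not reproduce scikit-learn's exact shuffling for the same seed.

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5, seed=1):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold CV (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    # np.array_split gives the first (n_samples % n_splits) folds one extra sample,
    # which matches the fold sizes scikit-learn's KFold produces
    folds = np.array_split(indices, n_splits)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

# Usage: 12 samples into 5 folds gives test sizes 3, 3, 2, 2, 2
splits = list(kfold_indices(12, n_splits=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Every sample appears in exactly one test fold, so each row is scored once by a model that never saw it during training.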

In [None]:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=50, random_state=1)

from sklearn.model_selection import KFold
kf = KFold(n_splits=5,  shuffle=True, random_state=1)

print(model)
print("KFold splits: " + str(kf.get_n_splits(X)))

## Train Each Fold

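We now train the model on each of the KFold splits. The whole loop is timed, and each fold's results are appended to the corresponding list. The elapsed time is rounded to two decimal places for readability.

In this example, we consider accuracy, precision, recall, F1-score, Matthews correlation coefficient (MCC), and the area under the Receiver Operating Characteristic curve (ROC AUC). Because `LabelBinarizer` sorts the classes alphabetically, label 1, the positive class for precision, recall and F1, is "REAL" rather than "FAKE".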
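All of the threshold-based metrics above derive from the four cells of the confusion matrix. The helper below is a minimal NumPy sketch (the name `binary_metrics` is hypothetical) that computes them from 0/1 labels with 1 as the positive class, using the standard definitions, including $\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$. It assumes no denominator is zero; scikit-learn's functions handle those edge cases for you.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and MCC for 0/1 labels (1 = positive class)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Confusion-matrix counts
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / len(y_true)
    mcc = (tp * tn - fp * fn) / np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1, "mcc": mcc}

# Usage on made-up labels: TP=2, TN=2, FP=1, FN=1
print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

MCC uses all four cells, which is why it is often preferred as a single summary when classes are imbalanced; on this balanced dataset it mainly complements accuracy.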

In [None]:
import time

import numpy as np

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, matthews_corrcoef, roc_auc_score

acc_score = []
prec_score = []
rec_score = []
f1s = []
MCCs = []
ROCareas = []

start = time.time()
for train_index, test_index in kf.split(X):
    # Features are a DataFrame (positional .iloc); labels are a NumPy array
    X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
    y_train, y_test = y[train_index], y[test_index]

    model.fit(X_train, y_train)
    pred_values = model.predict(X_test)
    # Probability of the positive class (label 1); ROC AUC needs scores, not hard labels
    pred_proba = model.predict_proba(X_test)[:, 1]

    # scikit-learn metrics take (y_true, y_pred) in that order
    acc_score.append(accuracy_score(y_test, pred_values))
    prec_score.append(precision_score(y_test, pred_values, average="binary", pos_label=1))
    rec_score.append(recall_score(y_test, pred_values, average="binary", pos_label=1))
    f1s.append(f1_score(y_test, pred_values, average="binary", pos_label=1))
    MCCs.append(matthews_corrcoef(y_test, pred_values))
    ROCareas.append(roc_auc_score(y_test, pred_proba))

end = time.time()
timeTaken = end - start
print("Model trained in: " + str(round(timeTaken, 2)) + " seconds.")
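ROC AUC measures how well a continuous score ranks positives above negatives: it equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half. The sketch below (a hypothetical `pairwise_auc` helper, using made-up scores) computes it directly from that definition. It shows that passing hard 0/1 predictions instead of probabilities collapses the curve to a single threshold, so the "AUC" reduces to the mean of the true positive and true negative rates.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

y_true = [0, 0, 1, 1]
probs = [0.1, 0.6, 0.7, 0.9]           # made-up predicted probabilities
hard = [int(p >= 0.5) for p in probs]  # the same predictions thresholded at 0.5

print(pairwise_auc(y_true, probs))  # 1.0
print(pairwise_auc(y_true, hard))   # 0.75
```

The probabilities rank every positive above every negative, but the thresholded labels lose that ordering, which is why `predict_proba` is the appropriate input to `roc_auc_score`.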

## Calculate the Mean Results and Standard Deviation

For each metric, we now have k results, one per fold (k = 5 here). To obtain the final results, we compute the mean and standard deviation of each set of metrics.

Each value is rounded to three decimal places for readability, and accuracy is reported as a percentage.
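Note that `np.std` defaults to the population standard deviation (`ddof=0`, dividing by k). With only five folds, the sample standard deviation (`ddof=1`, dividing by k - 1) is slightly larger and is sometimes preferred when reporting spread across folds. A minimal sketch on hypothetical fold scores:

```python
import numpy as np

# Hypothetical per-fold accuracies, for illustration only
fold_scores = np.array([0.90, 0.92, 0.94, 0.96, 0.98])

mean = fold_scores.mean()
pop_std = np.std(fold_scores)             # ddof=0: divide by k
sample_std = np.std(fold_scores, ddof=1)  # ddof=1: divide by k - 1

print(f"mean={mean:.3f}, std(ddof=0)={pop_std:.4f}, std(ddof=1)={sample_std:.4f}")
# mean=0.940, std(ddof=0)=0.0283, std(ddof=1)=0.0316
```

Whichever convention you choose, state it alongside the results so they can be compared fairly.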

In [None]:
print("Mean results and (std.):\n")
print(f"Accuracy: {round(np.mean(acc_score) * 100, 3)}% ({round(np.std(acc_score) * 100, 3)})\n")

# The remaining metrics lie in [0, 1] (MCC in [-1, 1]) and are printed unscaled
metrics = {"Precision": prec_score, "Recall": rec_score, "F1-Score": f1s, "MCC": MCCs, "ROC AUC": ROCareas}
for name, scores in metrics.items():
    print(f"{name}: {round(np.mean(scores), 3)} ({round(np.std(scores), 3)})")