# Voice Cloning Detection

## Links of interest

### Data sets
- https://lionbridge.ai/datasets/best-speech-recognition-datasets-for-machine-learning/
- https://data.mendeley.com/datasets/k47yd3m28w/2
- https://keithito.com/LJ-Speech-Dataset/
- https://www.kaggle.com/charlesaverill/imagenet-voice
- https://www.kaggle.com/fabawi/augmented-extended-train-robots
- https://www.kaggle.com/jbuchner/synthetic-speech-commands-dataset
- https://www.kaggle.com/primaryobjects/voicegender

### Articles & Studies
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7058910/
- https://yorkspace.library.yorku.ca/xmlui/bitstream/handle/10315/36698/Reimao_Ricardo_AM_2019_Masters.pdf?sequence=2
- https://r9y9.github.io/deepvoice3_pytorch/

In [16]:
import os
import scipy
import numpy as np
import pandas as pd
from glob import glob as globlin
from helpers import *
from tempfile import mktemp
from subprocess import check_call

## Reading VCTK real human speaker data

In [17]:
vctk_info_df = pd.read_table('./Data/VCTK/VCTK-Corpus/VCTK-Corpus/speaker-info.txt', sep=r'\s+', index_col=False)
vctk_info_df.head(5)

Unnamed: 0,ID,AGE,GENDER,ACCENTS,REGION
0,225,23,F,English,Southern
1,226,22,M,English,Surrey
2,227,38,M,English,Cumbria
3,228,22,F,English,Southern
4,229,23,F,English,Southern


In [18]:
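# VCTK contains roughly 110 speakers, so a result of 44243 suggests this helper counts wav files rather than speakers
number_of_real_speakers('./Data/VCTK/VCTK-Corpus/VCTK-Corpus/wav48/*')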

44243
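
To tell speakers and recordings apart, the two counts can be computed separately. The following is a minimal sketch, not the notebook's `number_of_real_speakers` helper: the function name `count_speakers_and_clips` is hypothetical, and it assumes that `wav48` holds one sub-folder per speaker with that speaker's `.wav` files directly inside.

```python
from pathlib import Path


def count_speakers_and_clips(wav_root):
    """Return (number of speaker folders, number of .wav files) under wav_root.

    Assumes a layout of wav_root/<speaker>/<clip>.wav (an assumption about
    the corpus layout, not something read from the helpers module).
    """
    root = Path(wav_root)
    # Each immediate sub-directory is treated as one speaker.
    speaker_dirs = [p for p in root.iterdir() if p.is_dir()]
    # Count only the .wav files that sit directly inside a speaker folder.
    n_clips = sum(1 for d in speaker_dirs for _ in d.glob("*.wav"))
    return len(speaker_dirs), n_clips


# Example call (path taken from the cell above; requires the data on disk):
# n_speakers, n_clips = count_speakers_and_clips('./Data/VCTK/VCTK-Corpus/VCTK-Corpus/wav48')
```

Returning both numbers makes it easy to check that the dataset on disk is complete before any features are extracted.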

## Reading Generated Data

In [19]:
gen_data_annotations = pd.read_csv('./Data/GeneratedData/SpeechMultimodalCSV/virtual/Annotations/all.csv', sep='\t')
gen_data_annotations.head(5)

Unnamed: 0,mr_link,ref,id,mutated
0,{DS_PATH}/GeneratedData/SpeechMultimodalCSV/vi...,This is the first test .,100001,False
1,{DS_PATH}/GeneratedData/SpeechMultimodalCSV/vi...,move the red brick in the corner and place it ...,15,False
2,{DS_PATH}/GeneratedData/SpeechMultimodalCSV/vi...,place green pyramid on top of red brick,19,False
3,{DS_PATH}/GeneratedData/SpeechMultimodalCSV/vi...,place the red pyramid sitting on top of the re...,28,False
4,{DS_PATH}/GeneratedData/SpeechMultimodalCSV/vi...,Move the blue block on top of the grey block .,34,False


In [20]:
len(gen_data_annotations)

186504

In [24]:
mp3filename = gen_data_annotations['mr_link'].iloc[0].replace('{DS_PATH}', 'Data')
mp3filename

'Data/GeneratedData/SpeechMultimodalCSV/virtual/MP3Audio/en-US-Wavenet-B/100001.mp3'
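
The same placeholder substitution can be applied to every row, and the folder that contains each file appears to name the synthetic voice (here `en-US-Wavenet-B`). The sketch below illustrates both steps on the first row's path; `DATA_ROOT`, `resolve_link`, and `voice_of` are hypothetical names, and reading the parent folder as the voice label is an assumption based on this single example.

```python
from pathlib import PurePosixPath

DATA_ROOT = "Data"  # assumption: local folder that stands in for {DS_PATH}


def resolve_link(mr_link, data_root=DATA_ROOT):
    """Replace the {DS_PATH} placeholder with the local data root."""
    return mr_link.replace("{DS_PATH}", data_root)


def voice_of(path):
    """Return the name of the folder that directly contains the audio file."""
    return PurePosixPath(path).parent.name


# First row of the annotations, reconstructed from the path printed above.
link = "{DS_PATH}/GeneratedData/SpeechMultimodalCSV/virtual/MP3Audio/en-US-Wavenet-B/100001.mp3"
example = resolve_link(link)
print(example)
print(voice_of(example))

# With pandas, the same functions could be mapped over the whole column:
# gen_data_annotations['path'] = gen_data_annotations['mr_link'].map(resolve_link)
# gen_data_annotations['voice'] = gen_data_annotations['path'].map(voice_of)
```

Keeping the voice name as its own column would make it possible to group or split the generated clips by voice later on.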

In [25]:
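from pydub import AudioSegment
import matplotlib.pyplot as plt
from scipy.io import wavfile

# pydub relies on the external ffmpeg/ffprobe executables to decode mp3
mp3_audio = AudioSegment.from_file(mp3filename, format="mp3")  # read mp3
wname = mktemp('.wav')  # temporary wav path (mktemp is imported in the first cell)
mp3_audio.export(wname, format="wav")  # convert to wav
FS, data = wavfile.read(wname)  # sample rate and samples
os.unlink(wname)  # remove the temporary file
if data.ndim > 1:
    data = data[:, 0]  # specgram expects 1-D input, so keep the first channel
plt.specgram(data, Fs=FS, NFFT=128, noverlap=0)  # plot the spectrogram
plt.show()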

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
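
pydub reads media information by parsing the JSON output of an external probe tool, so a `JSONDecodeError` at this point may mean that the tool is missing, broken, or returned nothing for this file. A minimal first check, assuming the default executable names `ffmpeg` and `ffprobe` (the function name `missing_audio_tools` is hypothetical), is to confirm that both are on `PATH`:

```python
import shutil


def missing_audio_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


missing = missing_audio_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("ffmpeg and ffprobe are available")
```

If both tools are present, the next things to examine are the file itself (for example, whether it is a valid, non-empty mp3) and the installed ffmpeg version.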

In [None]:
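# Alternative: convert mp3 to wav with the ffmpeg command-line tool, then read the wav
# (avconv from Libav has been superseded by ffmpeg on most systems)
# mp3filename = 'XC124158.mp3'
from scipy.io import wavfile

wname = mktemp('.wav')
check_call(['ffmpeg', '-i', mp3filename, wname])
fs, sig = wavfile.read(wname)  # sample rate and samples
os.unlink(wname)  # remove the temporary file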