# ASVspoof 2017 v2.0 evaluation dataset meta-data analysis

> Color code specifies the level of difficulty
 1. **Green** indicates the easiest case to be detected by the system
 1. **Yellow** - medium level
 1. **Red** - difficult level
 
 
Using the following end-to-end CNN model (the one we submitted to INTERSPEECH), we compute the EER separately for each quality level of the environment, the playback device and the recording device.

> **models_After_ICASSP/InterSpeech2018_v2.0/testing_best_model_with3sec_RELU/keep_0.5_0.5_relurun8**

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import os
%matplotlib inline

## Environment ID List :

1. Red color - the hardest ones (Total = 5)

['E01','E23','E24','E25','E26']

2. Green color - the easiest to detect, because these are the noisy environments (total = 3)

['E02','E03','E06']

3. Yellow color - the medium ones (total = 18)

['E04','E05','E07','E08','E09','E10','E11','E12','E13','E14','E15','E16','E17','E18','E19','E20','E21','E22']
    

In [2]:
yellowE = ['E04','E05','E07','E08','E09','E10','E11','E12','E13','E14','E15','E16','E17','E18','E19','E20','E21','E22']

In [3]:
greenE = ['E02','E03','E06']

In [4]:
redE = ['E01','E23','E24','E25','E26']

In [5]:
print(len(yellowE))

18


## Playback devices

> Red color, total = 12

['P03','P04','P05','P07','P09','P13','P15','P22','P23','P24','P25','P26']

> Yellow color, total = 5

['P01','P02','P12','P14','P19']

> Green color, total = 9

['P06','P08','P10','P11','P16','P17','P18','P20','P21']

## Verifying the counts

> The counts above for the various devices and environment configurations have been verified against the baseline paper by Hector. They match exactly.

> It would also be useful to create an scp and protocol list for each group, in case we want to analyse them separately.

In [6]:
redP = ['P03','P04','P05','P07','P09','P13','P15','P22','P23','P24','P25','P26']
yellowP = ['P01','P02','P12','P14','P19']
greenP = ['P06','P08','P10','P11','P16','P17','P18','P20','P21']

In [7]:
print(len(redP))
print(len(yellowP))
print(len(greenP))

12
5
9


## Recording device

> Red color, total = 13

['R01','R05','R06','R08','R09','R10','R11','R16','R21','R22','R23','R24','R25']


> Yellow color, total = 2

['R03','R15']

> Green color, total = 10

['R02','R04','R07','R12','R13','R14','R17','R18','R19','R20']

In [8]:
redR = ['R01','R05','R06','R08','R09','R10','R11','R16','R21','R22','R23','R24','R25']
yellowR = ['R03','R15']
greenR = ['R02','R04','R07','R12','R13','R14','R17','R18','R19','R20']

In [9]:
print(len(redR))
print(len(yellowR))
print(len(greenR))

13
2
10
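As a quick sanity check on the three groupings above, the sketch below confirms that the red, yellow and green lists within each dimension do not overlap and together cover every ID. The helper name `check_partition` is hypothetical, and the assumption that IDs run contiguously from 01 (E01 to E26, P01 to P26, R01 to R25) is inferred from the lists rather than stated in the protocol.

```python
def check_partition(prefix, n_ids, groups):
    """Return True if the lists in `groups` are disjoint and cover prefix01..prefixNN."""
    # Assumption: IDs are contiguous and zero-padded to two digits
    expected = {"%s%02d" % (prefix, i) for i in range(1, n_ids + 1)}
    merged = [item for group in groups for item in group]
    no_duplicates = len(merged) == len(set(merged))
    return no_duplicates and set(merged) == expected

redE = ['E01', 'E23', 'E24', 'E25', 'E26']
greenE = ['E02', 'E03', 'E06']
yellowE = ['E04', 'E05', 'E07', 'E08', 'E09', 'E10', 'E11', 'E12', 'E13',
           'E14', 'E15', 'E16', 'E17', 'E18', 'E19', 'E20', 'E21', 'E22']

redP = ['P03', 'P04', 'P05', 'P07', 'P09', 'P13', 'P15', 'P22', 'P23', 'P24', 'P25', 'P26']
yellowP = ['P01', 'P02', 'P12', 'P14', 'P19']
greenP = ['P06', 'P08', 'P10', 'P11', 'P16', 'P17', 'P18', 'P20', 'P21']

redR = ['R01', 'R05', 'R06', 'R08', 'R09', 'R10', 'R11', 'R16', 'R21', 'R22', 'R23', 'R24', 'R25']
yellowR = ['R03', 'R15']
greenR = ['R02', 'R04', 'R07', 'R12', 'R13', 'R14', 'R17', 'R18', 'R19', 'R20']

print(check_partition('E', 26, [redE, yellowE, greenE]))
print(check_partition('P', 26, [redP, yellowP, greenP]))
print(check_partition('R', 25, [redR, yellowR, greenR]))
```

A `False` here would point to a typo in one of the lists (a duplicated or missing ID) before any EER is computed on the wrong subset.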


# Replay configurations in the evaluation set

The evaluation dataset has a total of 57 replay configurations. Please refer to Hector's Odyssey paper for more details.

In [10]:
evalProt = '/homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_eval.trl.txt'

In [11]:
protList = []
with open(evalProt) as f:
    for line in f:
        fields = line.strip().split(' ')
        # Fields 4-6 hold the environment, playback device and recording device IDs
        rc = ' '.join(fields[4:7])
        # Genuine trials carry '- - -' and are skipped
        if rc not in protList and rc != '- - -':
            protList.append(rc)
#print(protList)
print(len(protList))

57
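The same count can be reproduced without touching the file system, which is handy for testing. The sketch below collects the distinct `E P R` triples into a set; `unique_replay_configs` is a hypothetical helper name, and the last sample line is made up for illustration (the other lines follow the protocol format shown elsewhere in this notebook).

```python
def unique_replay_configs(lines):
    """Collect distinct 'E P R' triples from protocol lines, skipping genuine trials."""
    configs = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # empty or malformed line
        rc = ' '.join(fields[4:7])
        if rc != '- - -':  # genuine trials have no replay configuration
            configs.add(rc)
    return configs

sample = [
    'E_1000010.wav genuine M0035 S06 - - -',
    'D_1001701.wav spoof M0018 S04 E03 P08 R04',
    'D_1001702.wav spoof M0014 S09 E03 P08 R04',
    'X_0000001.wav spoof M0001 S01 E21 P03 R01',  # hypothetical line, for illustration only
]

print(len(unique_replay_configs(sample)))
```

On the real protocol, passing the open file object instead of `sample` should give the same 57 as the cell above, assuming the file is space-separated as shown.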


> **The eval set has 57 new spoofing configurations! (verified with the paper)**

# Inferring RC-specific file list from the score files

> Note that in most of our implementation we used the genuineFirstSpoof scp and label list for the evaluation data. That is, we first extracted all the genuine trials and placed them at the beginning, followed by all the spoof trials (the same way trials are organised in the training and validation sets). The code structure needs to be cleaned up and properly documented soon, so that reading it a few months from now does not cause confusion about the implementation.

> However, we did not create the corresponding protocol file holding the meta-data for these files, so we have to carefully infer the meta-data for the score files.





In [12]:
# This is the scp file where we put all genuine files first then spoof

evalScp='/homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/filelists/eval_genFirstSpoof.scp'

In [13]:
# The original protocol file where genuine and spoof are mixed up

evalProt='/homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_eval.trl.txt'

In [14]:
# The model prediction scores on eval data that use evalScp for spectrogram computation
# one example

model='/homes/bc305/myphd/stage2/deeplearning.experiment1/CNN3//models_After_ICASSP/InterSpeech2018_v2.0/testing_best_model_with3sec_RELU/'
evalScore=model+'/keep_0.5_0.5_relurun8/predictions_original/eval_prediction_new.txt'

> Using the script referenced below, we generated a new protocol file for the evaluation data that follows the genuine-first, spoof-next ordering we used during spectrogram computation etc.

> **new protocol** /homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_eval.trl_genFirstSpoof.txt  

> The code for producing the new protocol is here: python_codes/make_genuineFirstspoof_protocal_evalset.py
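The script itself is not shown in this notebook. A minimal sketch of the reordering it is described as doing could look like the following, assuming a stable partition that keeps the original relative order within each class; the real script may order files differently. `genuine_first` is a hypothetical name and the file names in `mixed` are made up.

```python
def genuine_first(lines):
    """Stable partition: all genuine trials first, then all spoof trials."""
    # Field 1 of each protocol line is the label ('genuine' or 'spoof')
    genuine = [ln for ln in lines if ln.split()[1] == 'genuine']
    spoof = [ln for ln in lines if ln.split()[1] != 'genuine']
    return genuine + spoof

# Hypothetical protocol lines in mixed order
mixed = [
    'A.wav spoof M0018 S04 E03 P08 R04',
    'B.wav genuine M0035 S06 - - -',
    'C.wav spoof M0014 S09 E06 P09 R05',
    'D.wav genuine M0023 S01 - - -',
]

for line in genuine_first(mixed):
    print(line)
```

The important property is that the scp list, the score file and this protocol all share the same ordering, since later cells pair them up by line index.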

In [15]:
%%bash

cat /homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_eval.trl_genFirstSpoof.txt | head

E_1000010.wav genuine M0035 S06 - - -
E_1000018.wav genuine M0023 S01 - - -
E_1000074.wav genuine M0031 S09 - - -
E_1000102.wav genuine M0030 S09 - - -
E_1000123.wav genuine M0029 S10 - - -
E_1000151.wav genuine M0028 S07 - - -
E_1000161.wav genuine M0034 S10 - - -
E_1000179.wav genuine M0025 S05 - - -
E_1000192.wav genuine M0029 S08 - - -
E_1000217.wav genuine M0028 S05 - - -


# CASE 1: Environment specific list

> **redE** means the hardest ones to detect. Please refer to the baseline paper for more details.

> **yellowE** are the medium-hard ones and **greenE** the easiest ones.

In [16]:
from eer import find_eers
from python_codes import analyse_conf

In [17]:
scoreFile = evalScore
evalScores = analyse_conf.get_scores(scoreFile)

In [18]:
evProtocal='/homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_eval.trl_genFirstSpoof.txt'
evalScpFile='/homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/filelists/eval_genFirstSpoof.scp'

with open(evalScpFile) as f:
    evalScp = [line.strip() for line in f]

### i)  for greenE ['E02', 'E03', 'E06']

In [22]:
# E02

confKey = ['E02']
saveFolder = 'E02'
gc=1298

scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)

1939
1939
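The cell above is repeated almost verbatim for every configuration that follows, with only `confKey` and `saveFolder` changing. The stacking step can be sketched as a small helper; `build_trial_subset` is a hypothetical name, the toy inputs are made up, and casting scores to `float` is my own choice, since the format returned by `analyse_conf.get_scores` is not shown here.

```python
import numpy as np

def build_trial_subset(all_scores, all_scps, spoof_scores, spoof_scps, n_genuine):
    """Stack the first n_genuine (genuine) trials with config-specific spoof trials."""
    scores_gen = np.asarray(all_scores[:n_genuine], dtype=float)
    scores_spf = np.asarray(spoof_scores, dtype=float)
    scores = np.hstack((scores_gen, scores_spf))
    # Label convention as in the notebook: 1 = genuine, 0 = spoof
    labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
    scps = list(all_scps[:n_genuine]) + list(spoof_scps)
    return scores, labels, scps

# Toy data: two genuine trials first, then three spoof trials
all_scores = [0.9, 0.8, 0.1, 0.2, 0.3]
all_scps = ['g1.wav', 'g2.wav', 's1.wav', 's2.wav', 's3.wav']

# Pretend the configuration of interest matched s1 and s3
scores, labels, scps = build_trial_subset(
    all_scores, all_scps, [0.1, 0.3], ['s1.wav', 's3.wav'], n_genuine=2)

print(len(scores), len(labels), len(scps))
```

In the notebook `n_genuine` would be 1298, and the spoof inputs would come from `analyse_conf.get_config_specific_scores`.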


In [23]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/E02


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

   38.7762

Writing to the file !!!
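The EER here is computed by a MATLAB routine called from `eer.py`, which is not shown. As a rough cross-check, a NumPy-only threshold sweep can be sketched as below. This is not the routine used above, and it assumes that a higher score means "more genuine"; its values may differ slightly from the MATLAB ones because it does no interpolation between thresholds. `compute_eer` is a hypothetical name.

```python
import numpy as np

def compute_eer(genuine_scores, spoof_scores):
    """Approximate EER (in %) by sweeping a threshold over all observed scores."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    thresholds = np.unique(np.concatenate([genuine, spoof]))
    # One extra threshold above the maximum, so "reject everything" is included
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)

    # A trial is accepted as genuine when its score is >= threshold
    frr = np.array([(genuine < t).mean() for t in thresholds])  # genuine rejected
    far = np.array([(spoof >= t).mean() for t in thresholds])   # spoof accepted

    # EER is taken where the two error rates are closest
    idx = np.argmin(np.abs(far - frr))
    return 100.0 * (far[idx] + frr[idx]) / 2.0

print(compute_eer([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0]))  # perfectly separated scores
print(compute_eer([0.0, 1.0], [0.0, 1.0]))              # indistinguishable scores
```

The sweep is quadratic in the number of trials, which is acceptable for a few thousand scores but would need a sorted cumulative-count version for much larger sets.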


In [25]:
# E03

confKey = ['E03']
saveFolder = 'E03'
gc=1298

scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)

1411
1411


In [26]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/E03


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

   32.5645

Writing to the file !!!


In [27]:
# E06

confKey = ['E06']
saveFolder = 'E06'
gc=1298

scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)

1583
1583


In [28]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/E06


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

    3.3726

Writing to the file !!!


In [31]:
# Lets do for greenE, the easiest one first: all three in one: E02_E03_E06

confKey = ['E02','E03','E06']
gc=1298
saveFolder = 'greenEnvironment'


scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)

2337
2337


In [32]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/greenEnvironment


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

   32.7170

Writing to the file !!!


# EER stats on greenE, the easiest ones as reported in baseline

These EERs use the scores produced by our end-to-end model (the one reported in our Interspeech analysis paper), which showed about 5% EER on the DEV set and about 29.4% on the eval set. We computed the EER for the green environments, which are highly noisy and should therefore be easy to detect; that is what the baseline paper, which uses CQCC features, argues. Our system nevertheless performs worse here. The reason is fairly simple if we correlate this with our INTERSPEECH findings about the classes.

> **E02** has 641 spoof (+1298 genuine), EER = 38.77

> **E03** has 113 spoof (+1298 genuine), EER = 32.56

> **E06** has 285 spoof (+1298 genuine), EER = 3.37

> **greenEnvironment (E02,E03,E06)** has 1039 spoof (+1298 genuine), EER = 32.71

### ii)  for redE = ['E01','E23','E24','E25','E26']

In [33]:
# Lets do for redE, the most difficult ones as characterized in the paper

confKey = ['E01','E23','E24','E25','E26']
gc=1298
saveFolder = 'redEnvironment'

scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)


2931
2931


In [34]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/redEnvironment


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

   19.3090

Writing to the file !!!


In [35]:
%%bash
# Just to double check number of files under redE


cat '/homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_eval.trl.txt' | grep E26 | wc -l

178


In [36]:
748+342+183+182+178+1298

2931
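The `grep E26` check matches the string anywhere on the line. A slightly stricter count looks only at the environment field (index 4). The sketch below does that with a `Counter` and then repeats the arithmetic of the cell above; `count_by_field` is a hypothetical helper and the sample lines are made up.

```python
from collections import Counter

def count_by_field(lines, field_index):
    """Count protocol lines per value of one field, skipping genuine ('-') entries."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > field_index and fields[field_index] != '-':
            counts[fields[field_index]] += 1
    return counts

# Hypothetical protocol lines, for illustration only
sample = [
    'a.wav genuine M0035 S06 - - -',
    'b.wav spoof M0018 S04 E26 P22 R01',
    'c.wav spoof M0014 S09 E26 P22 R05',
    'd.wav spoof M0016 S01 E01 P03 R01',
]
print(count_by_field(sample, 4))

# Per-environment spoof counts and the genuine count, as summed in the cell above
red_spoof_counts = [748, 342, 183, 182, 178]
n_genuine = 1298
print(sum(red_spoof_counts) + n_genuine)
```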

# EER stats on redE, the most difficult ones

> redE = ['E01','E23','E24','E25','E26']

> **redEnvironment** has a total of 2931 files (of which the first 1298 are genuine), EER = 19.3. This is very interesting, as our system in this configuration seems to ***give better results than the baseline (21.86).*** See Hector's paper.

### iii)  for yellowE = ['E04','E05','E07','E08','E09','E10','E11','E12','E13','E14','E15','E16','E17','E18','E19','E20','E21','E22'] 

The medium difficulty level environment !

In [37]:
# Lets do for yellowE, the medium-difficulty ones as characterized in the paper

confKey = yellowE   # see details of yellowE above
gc=1298
saveFolder='yellowEnvironment'

scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)

10634
10634


In [38]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/yellowEnvironment


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

   30.7771

Writing to the file !!!


# Observation on environment specific results

> **greenEnvironment (E02,E03,E06)** has 1039 spoof (+1298 genuine), EER = 32.71

> **yellowEnvironment** has a total of 10634 files (of which the first 1298 are genuine), EER = 30.77. Maybe these contain the files that we found earlier in our analysis work to bear genuine-file characteristics. Need to double check though.

> **redEnvironment** has a total of 2931 files (of which the first 1298 are genuine), EER = 19.3. This is very interesting, as our system in this configuration seems to ***give better results than the baseline (21.86).*** See Hector's paper.

In [39]:
2931-1298

1633

# CASE 2: Playback device specific list

> **redP** means the hardest ones to detect. Please refer to the baseline paper for more details.

> **yellowP** are the medium-hard ones and **greenP** the easiest ones.

### i)  for greenP ['P06', 'P08', 'P10', 'P11', 'P16', 'P17', 'P18', 'P20', 'P21']

In [40]:
greenP

['P06', 'P08', 'P10', 'P11', 'P16', 'P17', 'P18', 'P20', 'P21']

In [41]:
# Lets do for greenP, the easiest one first

confKey = greenP                  # pass this as a list
saveFolder='greenPlayback'
gc=1298

scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)

5910
5910


In [42]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/greenPlayback/


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

   42.9804

Writing to the file !!!


### ii)  for yellowP ['P01', 'P02', 'P12', 'P14', 'P19']

In [43]:
yellowP

['P01', 'P02', 'P12', 'P14', 'P19']

In [44]:
confKey = yellowP                  # pass this as a list
saveFolder='yellowPlayback'
gc=1298

scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)

2866
2866


In [45]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/yellowPlayback/


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

   22.4272

Writing to the file !!!


### iii)  for redP  ['P03', 'P04', 'P05', 'P07', 'P09', 'P13', 'P15', 'P22', 'P23', 'P24', 'P25', 'P26']

In [46]:
redP

['P03',
 'P04',
 'P05',
 'P07',
 'P09',
 'P13',
 'P15',
 'P22',
 'P23',
 'P24',
 'P25',
 'P26']

In [47]:
confKey = redP                  # pass this as a list
saveFolder='redPlayback'
gc=1298

scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)

7126
7126


In [48]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/redPlayback/


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

   17.4575

Writing to the file !!!


**Observations on Playback devices**

> **greenP** has 5910 total files (first 1298 are genuine and remaining spoof). EER = 42.98

> **yellowP** has 2866 total files (first 1298 are genuine and remaining spoof). EER = 22.42

> **redP** has 7126 total files (first 1298 are genuine and remaining spoof). EER = 17.45

In [49]:
7126-1298

5828

# CASE 3: Recording device specific list

> **redR** means the hardest ones to detect. Please refer to the baseline paper for more details.

> **yellowR** are the medium-hard ones and **greenR** the easiest ones.

### i)  for greenR: ['R02', 'R04', 'R07', 'R12', 'R13', 'R14', 'R17', 'R18', 'R19', 'R20']

In [50]:
greenR

['R02', 'R04', 'R07', 'R12', 'R13', 'R14', 'R17', 'R18', 'R19', 'R20']

In [51]:
confKey = greenR                  # pass this as a list
saveFolder='greenRecording'
gc=1298

scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)

6390
6390


In [52]:
6390-1298

5092

In [53]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/greenRecording/


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

   33.0372

Writing to the file !!!


### ii)  for yellowR: ['R03', 'R15']

In [54]:
yellowR

['R03', 'R15']

In [55]:
confKey = yellowR                  # pass this as a list
saveFolder='yellowRecording'
gc=1298

scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)

2890
2890


In [56]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/yellowRecording/


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

   31.9441

Writing to the file !!!


### iii)  for redR: ['R01', 'R05', 'R06', 'R08', 'R09', 'R10', 'R11', 'R16', 'R21', 'R22', 'R23', 'R24', 'R25']

In [57]:
redR

['R01',
 'R05',
 'R06',
 'R08',
 'R09',
 'R10',
 'R11',
 'R16',
 'R21',
 'R22',
 'R23',
 'R24',
 'R25']

In [58]:
confKey = redR                  # pass this as a list
saveFolder='redRecording'
gc=1298

scores_spf,scp_spf = analyse_conf.get_config_specific_scores(scoreFile, evProtocal, evalScpFile, confKey)

scp_gen = evalScp[0:gc]
scores_gen = evalScores[0:gc]

scores = np.hstack((scores_gen, scores_spf))
labels = np.hstack((np.ones(len(scores_gen)), np.zeros(len(scores_spf))))
scps = np.hstack((scp_gen,scp_spf))

print(len(scores))
print(len(scps))

scoreSavePath = model+'/keep_0.5_0.5_relurun8/environment_wise/'

# Save the score file and label file
analyse_conf.save_scores_labels_scps(scoreSavePath, scores, labels, scps, saveFolder)

6622
6622


In [60]:
%%bash

python eer.py keep_0.5_0.5_relurun8/environment_wise/redRecording/


                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018a (9.4.0.813654) 64-bit (glnxa64)
                             February 23, 2018

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
in get_eer

EER =

   24.2499

Writing to the file !!!


**Observations on Recording devices**

> **greenR** has 6390 total files (first 1298 are genuine and remaining spoof). EER = 33.03

> **yellowR** has 2890 total files (first 1298 are genuine and remaining spoof). EER = 31.94

> **redR** has 6622 total files (first 1298 are genuine and remaining spoof). EER = 24.24

In [61]:
6622-1298

5324
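One consistency check on the three cases: if every spoof trial belongs to exactly one colour group per dimension, the spoof counts should add up to the same total whether we split by environment, playback device or recording device. The sketch below uses the file totals printed in the cells above and subtracts the 1298 genuine trials from each; the variable names are mine.

```python
N_GENUINE = 1298

# Total files per colour group (genuine + spoof), as printed by the cells above
totals = {
    'environment': {'green': 2337, 'yellow': 10634, 'red': 2931},
    'playback':    {'green': 5910, 'yellow': 2866,  'red': 7126},
    'recording':   {'green': 6390, 'yellow': 2890,  'red': 6622},
}

spoof_totals = {}
for dimension, per_colour in totals.items():
    # Each group file list starts with the same 1298 genuine trials
    spoof_totals[dimension] = sum(total - N_GENUINE for total in per_colour.values())
    print(dimension, spoof_totals[dimension])
```

All three dimensions give the same spoof total, which suggests that no spoof trial was dropped or double counted when the groups were built.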

In [None]:
# Input: a) pass the score file and the new protocol filelist corresponding to these scores
#        b) search configuration (could be environment, playback etc)

# Output:
#        a) configuration specific score files along with labels
#        b) should i write this in a file or find a way to pass this into EER computing script
#        c) return the EER for this configuration

import numpy as np
import os

def get_config_specific_scores11(scoreFile, protocal, scpFile, confKeys):
    
    '''
    This function returns the scores corresponding to the replay config parameters in confKeys
    Inputs
       1) scoreFile: the score file ordered as per the genuineFirstSpoof list, which is often used in my work
       2) protocal:  the genuineFirstSpoof protocol file we created using the genuineFirstSpoof.scp file
       3) scpFile :  the scp file that holds the paths to the audio files
       4) confKeys : a list of configuration keys used to search for the spoof scores.
                     This is basically used to create a sub-score list per config and evaluate the EER

    NOTE: if you use it for the dev and train sets, be careful to pass the correct files!

    Output:
       1) list of scores that match the keys in the confKeys list
       2) list of scps that match the keys in the confKeys list. This gives the file paths of all those spoof files,
          which will be useful for later analysis, maybe using slime!

    '''
    
    # Read the scp file into a list
    with open(scpFile) as f:
        scps = [line.strip() for line in f]
                
    # Read scores and put into a list
    with open(scoreFile) as f:
        scores = [line.strip() for line in f]
        
    # Read protocal files into a list
    with open(protocal) as f:
        prots = [line.strip() for line in f]  
    
   
    # Loop over all the items/keys in the confkey and append all the corresponding scores    
    count=0
    confKey_scores = list()
    confKey_scps = list()
    
    for confKey in confKeys:
        # find all those files that match confKey in the protocal, and get the respective scores    
        
        for i in range(0,len(prots)):
            if confKey in prots[i]:            
                # append the spoof scores that match the confKey
                confKey_scores.append(scores[i])
                confKey_scps.append(scps[i])
                
                count += 1
        
    #print('Total spoof in this config is: ', count)    
    
    return confKey_scores, confKey_scps
    # Just return all the scores and scps that match confKey
    
    
def save_scores_labels_scps(savePath,scores,labels,scps,confKey):
    '''
    This function is used to save the scores and labels as per confkeys
    
    Inputs
         1) savePath: where to save the scores? 
         2) scores and labels are corresponding to the confKey
         3) scps: which will have first genuine and then all those spoof ones that matched confKey
    
    '''
              
    #First create directories related to confKey for saving scores and labels
    saveDir = savePath+'/'+confKey
            
    make_directory(saveDir)    
        
    with open(saveDir+'/score.txt', 'w') as f:
        for s in scores:
            f.write(str(s)+'\n')

    with open(saveDir+'/labels.txt','w') as f:
        for l in labels:
            if l == 1.0:
                out='genuine'
            else:
                out='spoof'
                
            f.write(out+'\n')   
            
    with open(saveDir+'/audio.scp','w') as f:
        for path in scps:
            f.write(path+'\n')
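The lookup in `get_config_specific_scores11` tests `confKey in prots[i]`, which is a substring match on the whole protocol line. That appears to be safe for the IDs used in this notebook, but a stricter variant compares only the environment, playback and recording fields. The sketch below is one way to do that; `select_by_config` is a hypothetical name, the toy inputs are made up, and unlike the function above it returns matches in protocol order rather than grouped by key.

```python
def select_by_config(prot_lines, scores, scps, conf_keys):
    """Return scores and scps whose E, P or R field exactly equals one of conf_keys."""
    wanted = set(conf_keys)
    matched_scores, matched_scps = [], []
    for line, score, scp in zip(prot_lines, scores, scps):
        fields = line.split()
        # Fields 4, 5, 6 are environment, playback device, recording device
        if wanted.intersection(fields[4:7]):
            matched_scores.append(score)
            matched_scps.append(scp)
    return matched_scores, matched_scps

# Toy inputs: one genuine trial and three spoof trials (file names are hypothetical)
prot_lines = [
    'g1.wav genuine M0035 S06 - - -',
    's1.wav spoof M0018 S04 E02 P10 R04',
    's2.wav spoof M0014 S09 E03 P08 R04',
    's3.wav spoof M0016 S01 E06 P09 R05',
]
scores = ['1.5', '-0.3', '0.2', '-2.0']
scps = ['/data/g1.wav', '/data/s1.wav', '/data/s2.wav', '/data/s3.wav']

sel_scores, sel_scps = select_by_config(prot_lines, scores, scps, ['E02', 'E06'])
print(sel_scores)
print(sel_scps)
```

Because each line is visited once, a trial is never added twice even if two keys in `conf_keys` match the same line (for example an environment key and a playback key).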
                                    

# Find meta-data details on Training and the Development dataset

In [15]:
trainProt='/homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_train.trn.txt'

In [25]:
%%bash
#cat /homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_train.trn.txt | tail

cat /homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_dev.trl.txt | tail

D_1001701.wav spoof M0018 S04 E03 P08 R04
D_1001702.wav spoof M0014 S09 E03 P08 R04
D_1001703.wav spoof M0016 S01 E03 P08 R04
D_1001704.wav spoof M0018 S01 E03 P08 R04
D_1001705.wav spoof M0018 S02 E03 P08 R04
D_1001706.wav spoof M0016 S07 E03 P08 R04
D_1001707.wav spoof M0011 S03 E03 P08 R04
D_1001708.wav spoof M0011 S02 E03 P08 R04
D_1001709.wav spoof M0012 S08 E03 P08 R04
D_1001710.wav spoof M0012 S06 E03 P08 R04


In [42]:
def count_config(protocal, key):
    '''
    Inputs:
    > key selects the field to collect: 'E' (environment), 'P' (playback device),
      'R' (recording device) or 'EPR' (full replay configuration)
    > protocal is the protocol file of the dataset (train, dev or eval)
    
    Outputs:
    > the list of distinct values found for that key (genuine trials, marked '-', are skipped)
    '''
    if key == 'E':
        n=4
    elif key == 'P':
        n=5
    elif key == 'R':
        n=6
    
    confList = list()
    with open(protocal) as f:
        for line in f:                                  
            if key != 'EPR':
                d = line.strip().split(' ')[n]
            else:
                d = line.strip().split(' ')[4] + ' ' + line.strip().split(' ')[5] + ' ' + line.strip().split(' ')[6]                                                
            
            if d not in confList:
                if d == '-' or d == '- - -':
                    continue
                else:
                    confList.append(d)                                                                                                    
                
    return confList
                

In [43]:
count_config(trainProt, 'E')

['E03', 'E21']

In [44]:
count_config(trainProt, 'P')

['P01', 'P02', 'P03']

In [45]:
count_config(trainProt, 'R')

['R01']

In [46]:
count_config(trainProt, 'EPR')

['E03 P01 R01', 'E21 P02 R01', 'E21 P03 R01']

## On training set, version 2.0

> **Environments**: 2 environments. ['E03', 'E21']

> **Playback devices**: 3 playback devices. ['P01', 'P02', 'P03']

> **Recording devices**: 1 recording device. ['R01']

> **Replay configurations**: 3. ['E03 P01 R01', 'E21 P02 R01', 'E21 P03 R01']

# On development set, version 2.0

In [27]:
devProt='/homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_dev.trl.txt'

In [28]:
count_config(devProt, 'E')

['E16', 'E06', 'E18', 'E04', 'E05', 'E03']

In [29]:
count_config(devProt, 'P')

['P07', 'P09', 'P05', 'P06', 'P01', 'P08']

In [30]:
count_config(devProt, 'R')

['R06', 'R05', 'R07', 'R03', 'R02', 'R01', 'R04']

In [47]:
count_config(devProt, 'EPR')

['E16 P07 R06',
 'E16 P07 R05',
 'E16 P07 R07',
 'E06 P09 R06',
 'E06 P09 R05',
 'E06 P09 R07',
 'E18 P05 R03',
 'E04 P06 R02',
 'E05 P01 R01',
 'E03 P08 R04']

## On development set, version 2.0

> **Environments**: 6 environments. ['E16', 'E06', 'E18', 'E04', 'E05', 'E03']

> **Playback devices**: 6 playback devices. ['P07', 'P09', 'P05', 'P06', 'P01', 'P08']

> **Recording devices**: 7 recording devices. ['R06', 'R05', 'R07', 'R03', 'R02', 'R01', 'R04']

> **Replay configurations**: 10. ['E16 P07 R06', 'E16 P07 R05', 'E16 P07 R07', 'E06 P09 R06', 'E06 P09 R05', 'E06 P09 R07', 'E18 P05 R03', 'E04 P06 R02', 'E05 P01 R01', 'E03 P08 R04']

# On the evaluation set, version 2.0

In [50]:
evalProt='/homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_eval.trl_genFirstSpoof.txt'

In [51]:
%%bash
cat /homes/bc305/myphd/datasets/ASVSpoof2017_v2.0/protocol_V2/ASVspoof2017_V2_eval.trl_genFirstSpoof.txt | head

E_1000010.wav genuine M0035 S06 - - -
E_1000018.wav genuine M0023 S01 - - -
E_1000074.wav genuine M0031 S09 - - -
E_1000102.wav genuine M0030 S09 - - -
E_1000123.wav genuine M0029 S10 - - -
E_1000151.wav genuine M0028 S07 - - -
E_1000161.wav genuine M0034 S10 - - -
E_1000179.wav genuine M0025 S05 - - -
E_1000192.wav genuine M0029 S08 - - -
E_1000217.wav genuine M0028 S05 - - -


In [66]:
#count_config(evalProt, 'E')  # total 24
len(count_config(evalProt, 'E'))


24

In [67]:
#count_config(evalProt, 'P')  # total 23
len(count_config(evalProt, 'P'))


23

In [68]:
#count_config(evalProt, 'R')  # total 24
len(count_config(evalProt, 'R'))


24

In [65]:
#count_config(evalProt, 'EPR')  # total 57
len(count_config(evalProt, 'EPR'))


57

## On evaluation set, version 2.0

> **Environments**: 24 environments. ['E19','E14','E12','E18','E11','E20','E07','E15','E17','E02','E01','E08','E13', 'E21','E26','E22','E25','E24','E10','E16','E23','E06','E09','E03']

> **Playback devices**: 23 playback devices. ['P22','P03','P16','P05','P26','P10','P21','P19','P12','P15','P20','P14', 'P24','P13','P23','P17','P07','P25','P09','P04','P18','P08','P11']

> **Recording devices**: 24 recording devices. ['R22','R04','R11','R03','R16','R15','R18','R10','R25','R13','R14','R01',
 'R24','R08','R23','R19','R06','R17','R09','R21','R05','R07','R20','R12']

> **Replay configurations**: 57 replay configurations. Test set has all replay configurations ! 

In [58]:
# Total environments = 26
# Total Playback = 26
# Total recording = 25
# Total Replay configurations = 57

# Overlap between training and eval set

> **Environment overlap**: both training environments ['E03', 'E21'] also appear in the eval set. E03 is a balcony environment, which is assumed to be easy to detect because it is very noisy. E21 is the office 09 environment, recorded under office conditions, which will have some noise. According to Hector's paper, these two environments fall under the easier conditions (E03 is green, E21 is yellow).

> **Playback devices**: 3 playback devices ['P01', 'P02', 'P03'] are in the training set. ***Of these, only P03 appears in eval.*** P01 and P02 are in the easier category (yellow) and P03 is in the hard category (red). See Hector's paper for details on the color coding.

> **Recording devices**: only 'R01' appears in the training set. ***R01 appears*** in the eval set too. R01 is in the hard category (red).

> **Replay configuration**: 3 RCs appear in the training set ['E03 P01 R01', 'E21 P02 R01', 'E21 P03 R01']. Configuration ***'E21 P03 R01'*** (called RC28, see Hector's paper) is repeated in the eval set. RC28 has a yellow environment and a red playback and recording device.

***Note:*** Green and yellow signify that the replayed signals should be relatively easy to detect, while red signifies that the signals are hard to distinguish as spoof because they leave no discriminative cue.
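The colour of each part of a replay configuration can be looked up mechanically from the red, yellow and green lists defined earlier in this notebook. The sketch below builds a single ID-to-colour dictionary and applies it to the three training configurations; `colour_of` and `colours_for` are hypothetical names.

```python
colour_lists = {
    'red': ['E01', 'E23', 'E24', 'E25', 'E26',
            'P03', 'P04', 'P05', 'P07', 'P09', 'P13', 'P15', 'P22', 'P23', 'P24', 'P25', 'P26',
            'R01', 'R05', 'R06', 'R08', 'R09', 'R10', 'R11', 'R16', 'R21', 'R22', 'R23', 'R24', 'R25'],
    'yellow': ['E04', 'E05', 'E07', 'E08', 'E09', 'E10', 'E11', 'E12', 'E13', 'E14', 'E15',
               'E16', 'E17', 'E18', 'E19', 'E20', 'E21', 'E22',
               'P01', 'P02', 'P12', 'P14', 'P19',
               'R03', 'R15'],
    'green': ['E02', 'E03', 'E06',
              'P06', 'P08', 'P10', 'P11', 'P16', 'P17', 'P18', 'P20', 'P21',
              'R02', 'R04', 'R07', 'R12', 'R13', 'R14', 'R17', 'R18', 'R19', 'R20'],
}

# Invert the lists into one lookup table: ID -> colour
colour_of = {item: colour for colour, items in colour_lists.items() for item in items}

def colours_for(replay_config):
    """Map an 'E P R' string to its (environment, playback, recording) colours."""
    return tuple(colour_of[part] for part in replay_config.split())

for rc in ['E03 P01 R01', 'E21 P02 R01', 'E21 P03 R01']:
    print(rc, colours_for(rc))
```

For 'E21 P03 R01' this gives a yellow environment with a red playback and recording device, matching the description of RC28 above.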

# Overlap between development and eval set

> **Environment overlap**: 6 environments ['E16', 'E06', 'E18', 'E04', 'E05', 'E03'] are in the dev set. All except E04 and E05 appear in the eval set. All of these environments fall under the green and yellow conditions.

> **Playback devices**: 6 playback devices ['P07', 'P09', 'P05', 'P06', 'P01', 'P08'] are in the dev set. All except P06 and P01 appear in the eval set.

> **Recording devices**: 7 recording devices ['R06', 'R05', 'R07', 'R03', 'R02', 'R01', 'R04'] are in the dev set. All except R02 appear in the eval set.

> **Replay configurations**: 10 RCs ['E16 P07 R06', 'E16 P07 R05', 'E16 P07 R07', 'E06 P09 R06', 'E06 P09 R05', 'E06 P09 R07', 'E18 P05 R03', 'E04 P06 R02', 'E05 P01 R01', 'E03 P08 R04'] are in the dev set.

> Repeated in eval set are

    'E16 P07 R06' = RC40
    'E16 P07 R05' = RC51
    'E16 P07 R07' = RC31
    'E06 P09 R06' = RC43
    'E06 P09 R05' = RC54
    'E06 P09 R07' = RC27
    'E18 P05 R03' = RC29


> Not repeated

    'E04 P06 R02'
    'E05 P01 R01'
    'E03 P08 R04'
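The per-field overlap statements above can be reproduced with set operations on the lists returned by `count_config`. The sketch below hard-codes the dev and eval lists printed earlier in this notebook; `missing_from_eval` is a hypothetical helper name.

```python
dev = {
    'E': ['E16', 'E06', 'E18', 'E04', 'E05', 'E03'],
    'P': ['P07', 'P09', 'P05', 'P06', 'P01', 'P08'],
    'R': ['R06', 'R05', 'R07', 'R03', 'R02', 'R01', 'R04'],
}
evaluation = {
    'E': ['E19', 'E14', 'E12', 'E18', 'E11', 'E20', 'E07', 'E15', 'E17', 'E02', 'E01', 'E08',
          'E13', 'E21', 'E26', 'E22', 'E25', 'E24', 'E10', 'E16', 'E23', 'E06', 'E09', 'E03'],
    'P': ['P22', 'P03', 'P16', 'P05', 'P26', 'P10', 'P21', 'P19', 'P12', 'P15', 'P20', 'P14',
          'P24', 'P13', 'P23', 'P17', 'P07', 'P25', 'P09', 'P04', 'P18', 'P08', 'P11'],
    'R': ['R22', 'R04', 'R11', 'R03', 'R16', 'R15', 'R18', 'R10', 'R25', 'R13', 'R14', 'R01',
          'R24', 'R08', 'R23', 'R19', 'R06', 'R17', 'R09', 'R21', 'R05', 'R07', 'R20', 'R12'],
}

def missing_from_eval(key):
    """IDs of the given field that occur in dev but not in eval, sorted."""
    return sorted(set(dev[key]) - set(evaluation[key]))

for key in ['E', 'P', 'R']:
    print(key, missing_from_eval(key))
```

Note that an ID appearing in both sets does not imply the full replay configuration is shared; 'E03 P08 R04' is an example where each part occurs in eval but the combination is listed as not repeated.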




# Overlap between the training and development set

