### Combining 2 systems with DOVER-Lap
> DOVER-Lap is meant to combine more than 2 systems, since black-box voting between only 2 systems does not make much sense. Still, if exactly 2 systems are provided as input, we fall back on the Hungarian algorithm for label mapping, since it is provably optimal for this case. Both systems are assigned equal weights, and in case of a voting conflict, the region is divided equally between the two labels. This is not the intended use case and will almost certainly degrade performance.
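
To make the label-mapping step concrete, here is a minimal sketch of mapping the speaker labels of one system onto another by maximizing total overlap time. It is not DOVER-Lap's implementation: it uses a brute-force search over permutations rather than the Hungarian algorithm (both find the same optimum, but brute force only scales to a handful of speakers). The function names and the toy speaker turns are hypothetical.

```python
from itertools import permutations


def overlap(a, b):
    """Duration shared by two (start, end) intervals, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def total_overlap(turns_a, turns_b):
    """Sum of pairwise overlaps between two lists of (start, end) turns."""
    return sum(overlap(x, y) for x in turns_a for y in turns_b)


def best_label_mapping(sys_a, sys_b):
    """Map each label of sys_a to a distinct label of sys_b (or None).

    sys_a and sys_b are dicts: label -> list of (start, end) turns.
    Every one-to-one assignment is tried and the one with the largest
    total overlap is kept (exhaustive search, NOT the Hungarian algorithm).
    """
    labels_a = sorted(sys_a)
    labels_b = sorted(sys_b)
    # Pad with None so every label in sys_a gets a slot even if sys_b
    # has fewer speakers.
    candidates = labels_b + [None] * max(0, len(labels_a) - len(labels_b))
    best_score, best_map = -1.0, {}
    for perm in permutations(candidates, len(labels_a)):
        score = sum(
            total_overlap(sys_a[a], sys_b[b])
            for a, b in zip(labels_a, perm)
            if b is not None
        )
        if score > best_score:
            best_score, best_map = score, dict(zip(labels_a, perm))
    return best_map, best_score


# Toy example (made-up turns, not taken from the AMI recordings)
sys_a = {"speaker_0": [(0.0, 4.0)], "speaker_1": [(4.0, 9.0)]}
sys_b = {"spk_x": [(3.5, 9.0)], "spk_y": [(0.0, 3.5)]}
mapping, score = best_label_mapping(sys_a, sys_b)
print(mapping, score)
```

With more than two systems, DOVER-Lap needs a mapping across all of them at once, which is where its other label-mapping strategies come in.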

In [1]:
# !pip install dover-lap
# !pip install spy-der


Collecting scalene
  Downloading scalene-1.5.9-cp39-cp39-manylinux_2_24_x86_64.whl (584 kB)
Collecting pynvml>=11.0.0
  Downloading pynvml-11.4.1-py3-none-any.whl (46 kB)
Installing collected packages: pynvml, scalene
Successfully installed pynvml-11.4.1 scalene-1.5.9
Scalene extension successfully loaded. Note: Scalene currently only
supports CPU+GPU profiling inside Jupyter notebooks. For full Scalene
profiling, use the command line version.


In [4]:
# %%scalene --reduced-profile
import os
import glob
# !dover-lap --help

Scalene: Program did not run for long enough to profile.


In [6]:
ROOT = os.getcwd()
data_dir = os.path.join(ROOT,'data/ami_mix_headset')
os.makedirs(data_dir, exist_ok=True)
print("ROOT: ", ROOT)
print("Data Directory: ", data_dir)

output_path = os.path.join(ROOT, 'tmp1')

groundtruth_path = os.path.join(data_dir, 'rttms/test/EN2002a.Mix-Headset.rttm')
nemo_path = '/home/DATA/amit_kesari/SD1/NeMo-Nvidia/data/ami_mix_headset/oracle_vad/pred_rttms/EN2002a.Mix-Headset.rttm'
diart_path = '/home/DATA/amit_kesari/SD1/diart/data/ami_mix_headset/out/EN2002a.Mix-Headset.rttm'
speechbrain_path = '/home/DATA/amit_kesari/SD1/speechbrain_try/results/ami/ecapa/save/sys_rttms/Mix-Headset/AMI_eval/oracle_cos_SC/EN2002a.rttm'
speechbrain_path_tmp = os.path.join(data_dir, 'sb_EN2002a.Mix-Headset.rttm')
!cp {speechbrain_path} {speechbrain_path_tmp}

ROOT:  /home/DATA/amit_kesari/SD1/dover-lap
Data Directory:  /home/DATA/amit_kesari/SD1/dover-lap/data/ami_mix_headset
Scalene: Program did not run for long enough to profile.


In [51]:
def convert_ami_label(filepath, find, replace, debug_file=True):
    """
    Replace every occurrence of `find` in the file with `replace`, in place.

    Include a trailing space (or another delimiter) in `find`; otherwise the
    already-converted label matches again and is replaced on every run.
    Eg: 'TS3003a ' -> 'TS3003a.Mix-Headset '
    """
    # Read in the file
    with open(filepath, 'r') as file:
        filedata = file.read()

    # Replace the target string
    filedata = filedata.replace(find, replace)
    if debug_file:
        print("Changed to: \n", filedata)

    # Write the file out again
    with open(filepath, 'w') as file:
        file.write(filedata)

# Running this once is enough
# convert_ami_label(speechbrain_path_tmp, 'EN2002a ', 'EN2002a.Mix-Headset ', debug_file=True)

Changed to: 
 SPEAKER EN2002a.Mix-Headset 0 0.0 3.745 <NA> <NA> EN2002a_3 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 3.745 1.5 <NA> <NA> EN2002a_2 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 5.245 6.0 <NA> <NA> EN2002a_1 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 11.245 4.5 <NA> <NA> EN2002a_2 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 15.745 3.0 <NA> <NA> EN2002a_0 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 18.745 4.5 <NA> <NA> EN2002a_1 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 23.245 2.019 <NA> <NA> EN2002a_0 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 25.344 3.745 <NA> <NA> EN2002a_3 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 29.089 1.5 <NA> <NA> EN2002a_1 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 30.589 4.5 <NA> <NA> EN2002a_0 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 35.089 6.0 <NA> <NA> EN2002a_2 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 41.089 1.5 <NA> <NA> EN2002a_1 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 42.589 3.139 <NA> <NA> EN2002a_2 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 0 48.256 2.245 <NA> <NA> EN2002a
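
The trailing-space trick above works, but it depends on the caller remembering it. A field-aware variant is sketched below: it rewrites only the second RTTM field (the file ID), so speaker labels such as `EN2002a_3` are never touched and re-running it changes nothing. The helper name `relabel_rttm_lines` is hypothetical, and the sample lines are the first two SpeechBrain turns shown above with the original `EN2002a` file ID restored.

```python
def relabel_rttm_lines(lines, old_id, new_id):
    """Return RTTM lines with the file ID (2nd field) changed from old_id to new_id.

    Only an exact match on the file-ID field is rewritten. Fields are
    re-joined with single spaces, so column alignment is not preserved.
    """
    out = []
    for line in lines:
        fields = line.split()
        # RTTM layout: fields[0] is the record type (SPEAKER), fields[1] the file ID
        if len(fields) > 1 and fields[1] == old_id:
            fields[1] = new_id
        out.append(" ".join(fields))
    return out


sample = [
    "SPEAKER EN2002a 0 0.0 3.745 <NA> <NA> EN2002a_3 <NA> <NA>",
    "SPEAKER EN2002a 0 3.745 1.5 <NA> <NA> EN2002a_2 <NA> <NA>",
]
relabelled = relabel_rttm_lines(sample, "EN2002a", "EN2002a.Mix-Headset")
for line in relabelled:
    print(line)
```

Because the match is on the whole field, calling the helper a second time on its own output is a no-op.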

In [52]:
n = 6
print("NeMo")
!head -{n} {nemo_path}
print("diart")
!head -{n} {diart_path}
print("speechbrain")
!head -{n} {speechbrain_path_tmp}

print("dover-lap final")
!head -{n} {output_path}

print("groundtruth")
!head -{n} {groundtruth_path}


NeMo
SPEAKER EN2002a.Mix-Headset 1   0.370   3.375 <NA> <NA> speaker_3 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 1   3.745   0.750 <NA> <NA> speaker_1 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 1   4.495   0.750 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 1   5.245   0.750 <NA> <NA> speaker_1 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 1   5.995   1.500 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 1   7.495   0.750 <NA> <NA> speaker_1 <NA> <NA>
diart
SPEAKER EN2002a.Mix-Headset 1 0.009 7.499 <NA> <NA> speaker0 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 1 8.792 1.416 <NA> <NA> speaker0 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 1 11.875 0.133 <NA> <NA> speaker0 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 1 12.292 0.217 <NA> <NA> speaker0 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 1 13.158 2.850 <NA> <NA> speaker0 <NA> <NA>
SPEAKER EN2002a.Mix-Headset 1 16.508 0.500 <NA> <NA> speaker0 <NA> <NA>
speechbrain
SPEAKER EN2002a.Mix-Headset 0 0.0 3.745 <NA> <NA> EN2002a_3 <NA> <NA>
SPEAKER EN2002a.Mix-H

In [54]:
# Make sure the file ID is the same in all inputs! Eg: EN2002a.Mix-Headset
!dover-lap --label-mapping=hungarian {output_path} {nemo_path} {diart_path} {speechbrain_path_tmp}

Loading speaker turns from input RTTMs...
Merging overlapping speaker turns...
Processing file EN2002a.Mix-Headset..


In [55]:
!spyder {groundtruth_path} {nemo_path} 
!spyder {groundtruth_path} {diart_path} 
!spyder {groundtruth_path} {speechbrain_path_tmp} 

print("\n\n\n Groundtruth vs Dover-lap:")
!spyder {groundtruth_path} {output_path} --per-file

Average error rates:
----------------------------------------------------
Missed speaker time = 25.11
False alarm speaker time = 0.00
Speaker error time = 3.88
Diarization error rate (DER) = 28.99
Average error rates:
----------------------------------------------------
Missed speaker time = 24.06
False alarm speaker time = 1.35
Speaker error time = 7.72
Diarization error rate (DER) = 33.13
Average error rates:
----------------------------------------------------
Missed speaker time = 25.11
False alarm speaker time = 1.98
Speaker error time = 4.73
Diarization error rate (DER) = 31.82


 Groundtruth vs Dover-lap
EN2002a.Mix-Headset: DERMetrics(miss=25.11,falarm=0.50,conf=3.41,der=29.02)
Average error rates:
----------------------------------------------------
Missed speaker time = 25.11
False alarm speaker time = 0.50
Speaker error time = 3.41
Diarization error rate (DER) = 29.02
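
As a quick sanity check on the numbers above, DER is the sum of missed speech, false alarm and speaker error. The sketch below recomputes it from the reported components and ranks the systems. It assumes the three unlabeled blocks appear in the order the `spyder` commands were issued (NeMo, diart, SpeechBrain); the dictionary and variable names are illustrative.

```python
# (missed, false alarm, speaker error) in percent, copied from the spyder output above
results = {
    "NeMo": (25.11, 0.00, 3.88),
    "diart": (24.06, 1.35, 7.72),
    "SpeechBrain": (25.11, 1.98, 4.73),
    "DOVER-Lap": (25.11, 0.50, 3.41),
}

# DER = missed speech + false alarm + speaker error
der = {name: round(miss + fa + conf, 2) for name, (miss, fa, conf) in results.items()}

# Print systems from lowest to highest DER
for name in sorted(der, key=der.get):
    print(f"{name:12s} DER = {der[name]:.2f}")

best = min(der, key=der.get)
```

On this single recording the combined output has the lowest speaker error (3.41), but its total DER (29.02) is marginally above the best single system (28.99) because of the extra false alarm time.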
