# Accuracy Evaluation

This notebook walks through an example of evaluating the accuracy of a speaker diarization tool.

The `process_accuracy()` function evaluates predictions using the Diarization Error Rate (DER) and the Jaccard Error Rate (JER).
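
As a rough illustration of how each score relates to its components, the sketch below recomputes both rates from dictionaries shaped like the ones printed in this notebook's output. The helper names `der_from_components` and `jer_from_components` are hypothetical (they are not part of this repository or of `pyannote.metrics`), and the sample values are copied from one of the printed results. The JER relation is inferred from the printed `speaker error` and `speaker count` fields, so treat it as an assumption.

```python
def der_from_components(components: dict) -> float:
    """DER = (missed detection + false alarm + confusion) / total reference speech."""
    errors = (
        components["missed detection"]
        + components["false alarm"]
        + components["confusion"]
    )
    return errors / components["total"]


def jer_from_components(components: dict) -> float:
    """JER = summed per-speaker Jaccard error / number of reference speakers (assumed)."""
    return components["speaker error"] / components["speaker count"]


# Values copied from the printed output for the recording with two reference segments.
der_components = {
    "total": 27.400000000000002,
    "correct": 27.400000000000002,
    "missed detection": 0.0,
    "false alarm": 0.5449999999999994,
    "confusion": 0.0,
}
jer_components = {"speaker error": 0.019502594381821368, "speaker count": 1.0}

print(f"DER: {der_from_components(der_components):.4f}")
print(f"JER: {jer_from_components(jer_components):.4f}")
```

Both values are durations in seconds divided by a normalizer, so a DER above 1.0 is possible when false alarms exceed the total reference speech.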

## Accuracy Evaluation Example

This notebook uses the VoxConverse v0.3 dataset via Hugging Face, which contains multi-speaker `.wav` audio files and labeled RTTM files that serve as the ground truth for comparison.
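
For readers unfamiliar with RTTM, each `SPEAKER` line stores a file ID, channel, turn onset, turn duration, and speaker name in whitespace-separated fields. The following is a minimal sketch of reading such lines. `Turn` and `parse_rttm_lines` are hypothetical names for illustration and are not the repository's `load_rttm` helper, and the sample lines (including the file ID and speaker label) are made up.

```python
from typing import Iterable, List, NamedTuple


class Turn(NamedTuple):
    file_id: str
    start: float  # seconds
    end: float    # seconds
    speaker: str


def parse_rttm_lines(lines: Iterable[str]) -> List[Turn]:
    """Parse RTTM SPEAKER records into (file_id, start, end, speaker) turns."""
    turns = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and any record type other than SPEAKER.
        if not fields or fields[0] != "SPEAKER":
            continue
        onset = float(fields[3])      # turn onset
        duration = float(fields[4])   # turn duration
        turns.append(Turn(fields[1], onset, onset + duration, fields[7]))
    return turns


# Illustrative lines only; the file ID and speaker label are invented.
sample_lines = [
    "SPEAKER example_file 1 0.16 4.84 <NA> <NA> spk00 <NA> <NA>",
    "SPEAKER example_file 1 5.40 22.56 <NA> <NA> spk00 <NA> <NA>",
]

turns = parse_rttm_lines(sample_lines)
total_speech = sum(turn.end - turn.start for turn in turns)
print(f"{len(turns)} turns, {total_speech:.2f} s of reference speech")
```

Summing the turn durations this way gives the total reference speech that DER uses as its denominator when turns do not overlap.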

For instructions on setting up this dataset, see the [Calculate Performance Metrics](https://github.com/Digital-Working-Group/speaker-diarization/blob/main/README.md#calculate-performance-metrics) section of the README.

In [6]:
# Standard library
import json
import os

# Third-party
from pyannote.metrics.diarization import DiarizationErrorRate, JaccardErrorRate
from tqdm import tqdm

# Local modules; the wildcard imports supply Dataset, Datasets, Engine, and Engines
from dataset import *
from engine import *
from read_token import read_token
from speaker_diarization_evaluate import _process_accuracy
from util import load_rttm, rttm_to_annotation

In [13]:

DEFAULT_CACHE_FOLDER = os.path.join(os.getcwd(), "cache")
RESULTS_FOLDER = os.path.join(os.getcwd(), "results")

dataset_kwargs = {
    "dataset": Datasets.VOX_CONVERSE,
    "data_folder": "hf_voxconverse_data",
}

engine_kwargs = {
    "engine": Engines.PYANNOTE,
    "auth_token": read_token,
}

process_kwargs = {
    "verbose": True,
}

dataset = Dataset.create(**dataset_kwargs)
print(f"Dataset: {dataset}")

engine = Engine.create(**engine_kwargs)
print(f"Engine: {engine}")

_process_accuracy(engine, dataset, **process_kwargs)

Dataset: VoxConverse




Engine: PYANNOTE


  0%|                                                                                           | 0/10 [00:00<?, ?it/s]

Processing C:\Users\hazel\speaker-diarization\scripts\speaker_diarization\hf_voxconverse_data\sample_0\sample_0.wav...


 10%|████████▏                                                                         | 1/10 [04:51<43:42, 291.37s/it]

GT segments: [<Segment(0.16, 34)>, <Segment(34.72, 42.36)>, <Segment(42.4, 66.2)>, <Segment(65.56, 96.24)>, <Segment(68.28, 68.92)>, <Segment(80.72, 81.24)>, <Segment(96.24, 100.2)>, <Segment(98.04, 98.76)>, <Segment(100.28, 123.24)>, <Segment(113.84, 115)>, <Segment(123.24, 127.04)>, <Segment(127.04, 145.6)>, <Segment(145.76, 152.72)>, <Segment(152.44, 154.44)>, <Segment(153.32, 154.48)>, <Segment(154.52, 163.36)>, <Segment(163.32, 167.04)>, <Segment(167.04, 173.72)>, <Segment(173.76, 174.44)>, <Segment(174.4, 175.24)>, <Segment(175.12, 175.6)>, <Segment(175.4, 178.12)>, <Segment(178.12, 178.76)>]
HYP segments: [<Segment(0.0309687, 33.0385)>, <Segment(33.0385, 33.1228)>, <Segment(33.1228, 33.2578)>, <Segment(33.2578, 33.3085)>, <Segment(33.3085, 33.3422)>, <Segment(33.3422, 33.3591)>, <Segment(33.3591, 33.7978)>, <Segment(33.7978, 33.8485)>, <Segment(33.8485, 34.0172)>, <Segment(34.0172, 43.771)>, <Segment(43.771, 43.8047)>, <Segment(43.8047, 43.8553)>, <Segment(43.8553, 54.706)>, <Se

 20%|████████████████▍                                                                 | 2/10 [10:20<41:48, 313.58s/it]

GT segments: [<Segment(1.88, 3.32)>, <Segment(3.32, 7.76)>, <Segment(9.36, 21.2)>, <Segment(21.72, 34.44)>, <Segment(34.48, 53)>, <Segment(53.92, 58.28)>, <Segment(58.88, 69.16)>, <Segment(69.24, 100.12)>, <Segment(100.16, 136.96)>, <Segment(136.88, 141.8)>, <Segment(142.04, 162.52)>, <Segment(162.6, 163.6)>, <Segment(163.72, 168.24)>, <Segment(169.36, 193.88)>, <Segment(193.36, 195.92)>, <Segment(195.68, 196.12)>, <Segment(197.72, 198.32)>]
HYP segments: [<Segment(1.51597, 3.03472)>, <Segment(3.03472, 7.69222)>, <Segment(9.21097, 21.2766)>, <Segment(21.631, 52.9341)>, <Segment(53.8791, 58.621)>, <Segment(58.891, 69.1847)>, <Segment(69.1847, 99.6441)>, <Segment(100.15, 137.967)>, <Segment(137.967, 141.798)>, <Segment(141.882, 162.52)>, <Segment(162.52, 168.325)>, <Segment(168.865, 193.773)>, <Segment(189.706, 189.942)>, <Segment(193.081, 195.933)>, <Segment(197.502, 198.38)>]
diarization error rate: {'total': 188.63999999999996, 'correct': 180.39784375, 'missed detection': 0.8098749999

 30%|██████████████████████▏                                                   | 3/10 [16:44:43<53:18:17, 27413.87s/it]

GT segments: [<Segment(0.24, 11.12)>, <Segment(11.88, 14.48)>, <Segment(14.68, 18.6)>, <Segment(18.6, 22.28)>, <Segment(18.6, 22.32)>, <Segment(22.28, 26.2)>, <Segment(26.24, 27.12)>, <Segment(27.12, 41.96)>, <Segment(41.96, 45.2)>, <Segment(45.2, 49.56)>, <Segment(49.68, 54.4)>, <Segment(54.4, 63.2)>, <Segment(63.2, 68.56)>, <Segment(68.56, 73.8)>, <Segment(73.8, 77)>, <Segment(77, 81)>, <Segment(81.16, 85.56)>, <Segment(85.68, 88.6)>, <Segment(88.84, 90.32)>, <Segment(88.88, 92.68)>, <Segment(91.16, 92.28)>, <Segment(92.8, 95.8)>, <Segment(95.8, 100.84)>, <Segment(100.92, 104.96)>, <Segment(105.12, 106.64)>, <Segment(107, 114.72)>, <Segment(114.96, 117.84)>, <Segment(117.8, 119.32)>, <Segment(119.4, 123)>, <Segment(123, 125.16)>, <Segment(125.32, 126.36)>, <Segment(126.48, 131.28)>, <Segment(131.32, 135.16)>, <Segment(135.24, 139.52)>, <Segment(139.8, 142.4)>, <Segment(143.8, 150.6)>, <Segment(150.76, 154.08)>, <Segment(154.36, 158.76)>, <Segment(158.6, 161.52)>, <Segment(161.52, 170

 40%|█████████████████████████████▌                                            | 4/10 [16:45:19<27:40:32, 16605.38s/it]

GT segments: [<Segment(0.16, 5)>, <Segment(5.4, 27.96)>]
HYP segments: [<Segment(0.0309687, 27.976)>]
diarization error rate: {'total': 27.400000000000002, 'correct': 27.400000000000002, 'missed detection': 0.0, 'false alarm': 0.5449999999999994, 'confusion': 0.0, 'diarization error rate': 0.019890510948905087}
GT segments: [<Segment(0.16, 5)>, <Segment(5.4, 27.96)>]
HYP segments: [<Segment(0.0309687, 27.976)>]
jaccard error rate: {'speaker error': 0.019502594381821368, 'speaker count': 1.0, 'jaccard error rate': 0.019502594381821368}
Processing C:\Users\hazel\speaker-diarization\scripts\speaker_diarization\hf_voxconverse_data\sample_4\sample_4.wav...


 50%|█████████████████████████████████████                                     | 5/10 [16:46:20<14:46:35, 10639.12s/it]

GT segments: [<Segment(0.04, 2.88)>, <Segment(3.44, 5.2)>, <Segment(5.24, 6.64)>, <Segment(7.52, 9.96)>, <Segment(10.8, 16.08)>, <Segment(16.16, 19.28)>, <Segment(20.56, 24.68)>, <Segment(30.8, 31.2)>, <Segment(32.64, 43.76)>]
HYP segments: [<Segment(0.0309687, 2.71409)>, <Segment(3.42284, 4.72222)>, <Segment(5.24534, 6.54472)>, <Segment(7.35472, 9.83534)>, <Segment(10.7972, 19.0491)>, <Segment(20.551, 24.6516)>, <Segment(30.7941, 30.811)>, <Segment(30.811, 31.081)>, <Segment(32.6503, 43.5347)>]
diarization error rate: {'total': 32.48, 'correct': 30.727062500000002, 'missed detection': 1.4829374999999967, 'false alarm': 0.28918749999999077, 'confusion': 0.2699999999999996, 'diarization error rate': 0.06287330665024592}
GT segments: [<Segment(0.04, 2.88)>, <Segment(3.44, 5.2)>, <Segment(5.24, 6.64)>, <Segment(7.52, 9.96)>, <Segment(10.8, 16.08)>, <Segment(16.16, 19.28)>, <Segment(20.56, 24.68)>, <Segment(30.8, 31.2)>, <Segment(32.64, 43.76)>]
HYP segments: [<Segment(0.0309687, 2.71409)>

 60%|█████████████████████████████████████████████▌                              | 6/10 [16:49:42<7:52:42, 7090.58s/it]

GT segments: [<Segment(0.56, 1.68)>, <Segment(1.76, 4.12)>, <Segment(9, 34.12)>, <Segment(34.16, 58.64)>, <Segment(58.84, 60.88)>, <Segment(61.04, 120.72)>]
HYP segments: [<Segment(0.419094, 4.38472)>, <Segment(8.97472, 58.3678)>, <Segment(58.7897, 60.9666)>, <Segment(60.9666, 60.9835)>, <Segment(61.0003, 113.701)>, <Segment(114.089, 120.434)>]
diarization error rate: {'total': 114.80000000000001, 'correct': 112.73381250000001, 'missed detection': 0.9461874999999935, 'false alarm': 0.7443124999999802, 'confusion': 1.12, 'diarization error rate': 0.02448170731707294}
GT segments: [<Segment(0.56, 1.68)>, <Segment(1.76, 4.12)>, <Segment(9, 34.12)>, <Segment(34.16, 58.64)>, <Segment(58.84, 60.88)>, <Segment(61.04, 120.72)>]
HYP segments: [<Segment(0.419094, 4.38472)>, <Segment(8.97472, 58.3678)>, <Segment(58.7897, 60.9666)>, <Segment(60.9666, 60.9835)>, <Segment(61.0003, 113.701)>, <Segment(114.089, 120.434)>]
jaccard error rate: {'speaker error': 0.0676782779697849, 'speaker count': 2.0, 

 70%|█████████████████████████████████████████████████████▏                      | 7/10 [16:56:11<4:04:59, 4899.73s/it]

GT segments: [<Segment(0.04, 16.32)>, <Segment(16.48, 36)>, <Segment(37.44, 44.64)>, <Segment(45.6, 49.44)>, <Segment(45.88, 46.76)>, <Segment(48.24, 62.04)>, <Segment(64.08, 69.32)>, <Segment(69.96, 73.96)>, <Segment(74.04, 77.6)>, <Segment(78, 126.64)>, <Segment(126.48, 147.76)>, <Segment(148.44, 155.04)>, <Segment(155.36, 156.24)>, <Segment(156.28, 159)>, <Segment(159.56, 180.12)>, <Segment(181.52, 186.4)>, <Segment(187.32, 197.2)>, <Segment(198.32, 202.32)>, <Segment(203.2, 205.44)>, <Segment(205.68, 209.44)>, <Segment(209.48, 213.24)>, <Segment(213.28, 215)>]
HYP segments: [<Segment(0.0309687, 35.9747)>, <Segment(37.2572, 42.1847)>, <Segment(42.4885, 44.6991)>, <Segment(45.5428, 49.1878)>, <Segment(45.7453, 46.5722)>, <Segment(46.5722, 46.6228)>, <Segment(46.6228, 46.7241)>, <Segment(48.1247, 62.0128)>, <Segment(64.0041, 77.6897)>, <Segment(77.9597, 126.762)>, <Segment(126.425, 155.045)>, <Segment(155.247, 156.243)>, <Segment(156.243, 161.592)>, <Segment(161.828, 180.188)>, <Segme

 80%|████████████████████████████████████████████████████████████▊               | 8/10 [17:04:28<1:56:36, 3498.11s/it]

GT segments: [<Segment(1.48, 26.24)>, <Segment(26.28, 111.12)>, <Segment(97.64, 98.04)>, <Segment(101.8, 102.04)>, <Segment(111.36, 114.4)>, <Segment(115.84, 155.96)>, <Segment(156.12, 208.68)>, <Segment(208.88, 254.52)>, <Segment(254.4, 266.32)>, <Segment(267, 295.2)>, <Segment(295.48, 297.72)>, <Segment(296, 308.88)>, <Segment(308.96, 311)>]
HYP segments: [<Segment(1.46534, 26.221)>, <Segment(26.221, 111.119)>, <Segment(101.77, 102.125)>, <Segment(111.119, 114.528)>, <Segment(115.793, 155.905)>, <Segment(156.142, 208.707)>, <Segment(208.758, 254.32)>, <Segment(254.32, 266.335)>, <Segment(266.96, 281.877)>, <Segment(282.097, 297.335)>, <Segment(295.934, 308.81)>, <Segment(305.333, 306.363)>, <Segment(308.928, 310.885)>]
diarization error rate: {'total': 303.91999999999996, 'correct': 302.81984375, 'missed detection': 0.4805624999999907, 'false alarm': 2.1958437499999004, 'confusion': 0.6195937499999822, 'diarization error rate': 0.010844959199789003}
GT segments: [<Segment(1.48, 26.24

 90%|██████████████████████████████████████████████████████████████████████▏       | 9/10 [17:12:59<42:44, 2564.20s/it]

GT segments: [<Segment(0.16, 42.32)>, <Segment(42.24, 44.88)>, <Segment(46.36, 51.64)>, <Segment(55.36, 55.96)>, <Segment(56.44, 57.36)>, <Segment(56.8, 70.08)>, <Segment(60.12, 60.4)>, <Segment(69.84, 71.44)>, <Segment(72.6, 75.96)>, <Segment(76.28, 84.2)>, <Segment(84.24, 85.76)>, <Segment(85.4, 89.6)>, <Segment(90.88, 91.2)>, <Segment(92.2, 95.36)>, <Segment(95.4, 98.24)>, <Segment(98.28, 100.04)>, <Segment(99.44, 100.56)>, <Segment(100.64, 101.8)>, <Segment(102.76, 103.28)>, <Segment(103.88, 109.24)>, <Segment(107.32, 120.88)>, <Segment(121.6, 137.2)>, <Segment(123.04, 127.04)>, <Segment(130.28, 133.16)>, <Segment(133.72, 135.2)>, <Segment(136.44, 137.32)>, <Segment(137.68, 141.16)>, <Segment(142.12, 146)>, <Segment(144.56, 153.96)>, <Segment(151.52, 151.8)>, <Segment(153.24, 156.04)>, <Segment(156.08, 182.28)>, <Segment(182.28, 195.56)>, <Segment(186.28, 188.88)>, <Segment(192.68, 199.2)>, <Segment(196.92, 197.28)>, <Segment(198.04, 202.96)>, <Segment(199.92, 200.72)>, <Segment(20

100%|█████████████████████████████████████████████████████████████████████████████| 10/10 [17:25:24<00:00, 6272.45s/it]

GT segments: [<Segment(0.16, 15.32)>, <Segment(15.8, 39.68)>, <Segment(41.24, 46.24)>, <Segment(46.32, 47)>, <Segment(47.08, 54.36)>, <Segment(55.36, 115.36)>, <Segment(115.52, 119.68)>, <Segment(120.32, 124.48)>, <Segment(125.04, 135.72)>, <Segment(137.48, 190)>, <Segment(190.24, 195.72)>, <Segment(196.88, 237.08)>, <Segment(237.48, 255.24)>, <Segment(256.84, 285.12)>, <Segment(285.64, 312)>, <Segment(313.36, 359.16)>, <Segment(359.32, 361.72)>, <Segment(362.4, 402.32)>, <Segment(402.52, 405.56)>, <Segment(405.76, 406.44)>]
HYP segments: [<Segment(0.0309687, 24.5841)>, <Segment(24.7528, 39.7041)>, <Segment(41.206, 46.3191)>, <Segment(46.3191, 46.876)>, <Segment(46.876, 54.4022)>, <Segment(55.2966, 115.439)>, <Segment(115.439, 119.692)>, <Segment(120.063, 124.585)>, <Segment(125.007, 135.706)>, <Segment(137.056, 190.128)>, <Segment(190.229, 195.595)>, <Segment(196.895, 237.158)>, <Segment(237.344, 255.316)>, <Segment(256.852, 285.269)>, <Segment(285.522, 312.033)>, <Segment(313.096, 35


