# Accuracy Evaluation

This notebook walks through an example of evaluating the accuracy of a speaker diarization tool.

The `_process_accuracy()` function evaluates predictions using two metrics: the diarization error rate (DER) and the Jaccard error rate (JER).
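As a rough illustration of how these two numbers relate to the component values printed in the output further down, the sketch below recomputes them by hand. DER is the sum of missed detection, false alarm, and confusion durations divided by the total reference speech duration. The JER helper assumes, consistent with the sample output, that the reported rate is the summed per-speaker error divided by the speaker count. Both helper names are hypothetical and are not part of this repository or of `pyannote.metrics`.

```python
def diarization_error_rate(missed, false_alarm, confusion, total):
    """Return (missed + false alarm + confusion) / total reference duration."""
    if total <= 0:
        raise ValueError("total reference duration must be positive")
    return (missed + false_alarm + confusion) / total


def jaccard_error_rate(speaker_error, speaker_count):
    """Return the summed per-speaker error averaged over the speakers (assumed form)."""
    if speaker_count <= 0:
        raise ValueError("speaker count must be positive")
    return speaker_error / speaker_count


# Component values copied from one of the single-speaker results in the output below.
der = diarization_error_rate(
    missed=3.1249999999261036e-05,
    false_alarm=3.125000000000003e-05,
    confusion=0.0,
    total=27.945,
)
jer = jaccard_error_rate(speaker_error=2.2365335518643255e-06, speaker_count=1.0)

print(f"DER: {der:.6e}")
print(f"JER: {jer:.6e}")
```

Both values are durations in seconds divided by durations in seconds, so the result is a unitless ratio; a DER of `2.2e-06` means almost none of the reference speech was attributed incorrectly.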

## Accuracy Evaluation Example

This notebook uses the VoxConverse v0.3 dataset via Hugging Face, which contains a collection of multi-speaker `.wav` audio files and labeled RTTM files that serve as the ground truth for comparison.

For instructions on setting up this dataset, see the [calculate performance](https://github.com/Digital-Working-Group/speaker-diarization/blob/main/README.md#calculate-performance-metrics) section of the README.
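The RTTM labels mentioned above are plain text, with one speaker turn per line: the record type (`SPEAKER`), a file ID, a channel, the turn onset in seconds, the turn duration in seconds, and the speaker name, with unused fields written as `<NA>`. The following is a minimal sketch of reading such lines. It is not the repository's `load_rttm` helper, and the `Turn` type, the `parse_rttm_lines` function, and the two example lines are made up for illustration rather than taken from the dataset.

```python
from typing import Iterable, List, NamedTuple


class Turn(NamedTuple):
    """One speaker turn read from an RTTM line (hypothetical helper type)."""
    file_id: str
    start: float
    end: float
    speaker: str


def parse_rttm_lines(lines: Iterable[str]) -> List[Turn]:
    """Parse RTTM SPEAKER records into turns, skipping blank or other lines."""
    turns = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        start = float(fields[3])      # turn onset in seconds
        duration = float(fields[4])   # turn duration in seconds
        turns.append(Turn(fields[1], start, start + duration, fields[7]))
    return turns


# Illustrative lines only; these are not taken from VoxConverse.
example_lines = [
    "SPEAKER sample_x 1 1.50 2.00 <NA> <NA> spk00 <NA> <NA>",
    "SPEAKER sample_x 1 4.00 1.25 <NA> <NA> spk01 <NA> <NA>",
]
turns = parse_rttm_lines(example_lines)
for turn in turns:
    print(turn)
```

Converting onset plus duration into an explicit end time matches how the `GT segments` and `HYP segments` are displayed in the output, as `(start, end)` pairs.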

In [3]:
# Standard library
import json
import os

# Third-party
from pyannote.metrics.diarization import DiarizationErrorRate, JaccardErrorRate
from tqdm import tqdm

# Project modules
from dataset import *
from engine import *
from util import load_rttm, rttm_to_annotation
from read_token import read_token
from speaker_diarization_evaluate import _process_accuracy

In [4]:
import warnings
warnings.filterwarnings('ignore')

In [7]:

DEFAULT_CACHE_FOLDER = os.path.join(os.getcwd(), "cache")
RESULTS_FOLDER = os.path.join(os.getcwd(), "results")

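# Dataset to evaluate and the folder that holds its files
dataset_kwargs = {
    "dataset": Datasets.VOX_CONVERSE,
    "data_folder": "hf_voxconverse_data",
}

# Diarization engine and its authentication token
engine_kwargs = {
    "engine": Engines.PYANNOTE,
    "auth_token": read_token,
}

# Enable detailed per-file output (a boolean rather than the string "True")
process_kwargs = {
    "verbose": True,
}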

dataset = Dataset.create(**dataset_kwargs)
print(f"Dataset: {dataset}")

engine = Engine.create(**engine_kwargs)
print(f"Engine: {engine}")

_process_accuracy(engine, dataset, **process_kwargs)

Dataset: VoxConverse
Engine: PYANNOTE


  0%|                                | 0/9 [00:00<?, ?it/s]

Processing C:\Users\hazel\speaker-diarization\scripts\speaker_diarization\hf_voxconverse_data\sample_1\sample_1.wav...


 11%|██                | 1/9 [1:14:39<9:57:18, 4479.82s/it]

GT segments: [<Segment(1.516, 3.035)>, <Segment(3.035, 7.693)>, <Segment(9.211, 21.277)>, <Segment(21.631, 52.934)>, <Segment(53.879, 58.621)>, <Segment(58.891, 69.185)>, <Segment(69.185, 99.644)>, <Segment(100.15, 137.967)>, <Segment(137.967, 141.798)>, <Segment(141.882, 162.52)>, <Segment(162.52, 168.325)>, <Segment(168.865, 193.772)>, <Segment(189.706, 189.942)>, <Segment(193.081, 195.933)>, <Segment(197.502, 198.379)>]
HYP segments: [<Segment(1.51597, 3.03472)>, <Segment(3.03472, 7.69222)>, <Segment(9.21097, 21.2766)>, <Segment(21.631, 52.9341)>, <Segment(53.8791, 58.621)>, <Segment(58.891, 69.1847)>, <Segment(69.1847, 99.6441)>, <Segment(100.15, 137.967)>, <Segment(137.967, 141.798)>, <Segment(141.882, 162.52)>, <Segment(162.52, 168.325)>, <Segment(168.865, 193.773)>, <Segment(189.706, 189.942)>, <Segment(193.081, 195.933)>, <Segment(197.502, 198.38)>]
diarization error rate: {'missed detection': 0.0027500000000211244, 'confusion': 0.001125000000001375, 'false alarm': 0.0024999999

 22%|████              | 2/9 [1:37:52<5:10:48, 2664.09s/it]

GT segments: [<Segment(0.031, 10.932)>, <Segment(11.962, 11.996)>, <Segment(11.995, 13.969)>, <Segment(13.97, 18.391)>, <Segment(18.526, 22.222)>, <Segment(19.066, 19.15)>, <Segment(19.15, 21.968)>, <Segment(22.34, 28.027)>, <Segment(28.027, 41.881)>, <Segment(41.948, 44.969)>, <Segment(44.969, 54.352)>, <Segment(54.352, 63.498)>, <Segment(63.498, 77.977)>, <Segment(77.977, 78.331)>, <Segment(78.331, 78.348)>, <Segment(78.348, 78.365)>, <Segment(78.365, 78.973)>, <Segment(78.972, 80.946)>, <Segment(80.947, 92.624)>, <Segment(88.928, 90.042)>, <Segment(90.042, 91.679)>, <Segment(91.679, 91.696)>, <Segment(92.692, 95.223)>, <Segment(95.341, 105.753)>, <Segment(104.892, 106.006)>, <Segment(105.753, 105.77)>, <Segment(105.77, 105.787)>, <Segment(106.006, 106.563)>, <Segment(106.563, 111.575)>, <Segment(111.963, 114.562)>, <Segment(114.916, 117.599)>, <Segment(117.903, 122.982)>, <Segment(122.982, 123.033)>, <Segment(123.033, 125.26)>, <Segment(125.26, 126.171)>, <Segment(126.408, 135.014)>

 33%|██████            | 3/9 [1:38:24<2:26:12, 1462.07s/it]

GT segments: [<Segment(0.031, 27.976)>]
HYP segments: [<Segment(0.0309687, 27.976)>]
diarization error rate: {'missed detection': 3.1249999999261036e-05, 'confusion': 0.0, 'false alarm': 3.125000000000003e-05, 'total': 27.945, 'correct': 27.94496875, 'diarization error rate': 2.2365360529347314e-06}
GT segments: [<Segment(0.031, 27.976)>]
HYP segments: [<Segment(0.0309687, 27.976)>]
jaccard error rate: {'speaker count': 1.0, 'speaker error': 2.2365335518643255e-06, 'jaccard error rate': 2.2365335518643255e-06}
Processing C:\Users\hazel\speaker-diarization\scripts\speaker_diarization\hf_voxconverse_data\sample_4\sample_4.wav...


 44%|████████▍          | 4/9 [1:39:24<1:15:42, 908.50s/it]

GT segments: [<Segment(0.031, 2.714)>, <Segment(3.423, 4.722)>, <Segment(5.245, 6.544)>, <Segment(7.355, 9.836)>, <Segment(10.797, 19.049)>, <Segment(20.551, 24.652)>, <Segment(30.794, 30.811)>, <Segment(30.811, 31.081)>, <Segment(32.65, 43.534)>]
HYP segments: [<Segment(0.0309687, 2.71409)>, <Segment(3.42284, 4.72222)>, <Segment(5.24534, 6.54472)>, <Segment(7.35472, 9.83534)>, <Segment(10.7972, 19.0491)>, <Segment(20.551, 24.6516)>, <Segment(30.7941, 30.811)>, <Segment(30.811, 31.081)>, <Segment(32.6503, 43.5347)>]
diarization error rate: {'missed detection': 0.0020937500000064446, 'confusion': 3.1249999999261036e-05, 'false alarm': 0.002343750000002604, 'total': 31.285999999999998, 'correct': 31.28387499999999, 'diarization error rate': 0.00014283545355776736}
GT segments: [<Segment(0.031, 2.714)>, <Segment(3.423, 4.722)>, <Segment(5.245, 6.544)>, <Segment(7.355, 9.836)>, <Segment(10.797, 19.049)>, <Segment(20.551, 24.652)>, <Segment(30.794, 30.811)>, <Segment(30.811, 31.081)>, <Segm

 56%|██████████        | 5/9 [2:03:41<1:13:44, 1106.25s/it]

GT segments: [<Segment(0.419, 4.385)>, <Segment(8.975, 58.368)>, <Segment(58.79, 60.967)>, <Segment(60.967, 60.984)>, <Segment(61, 113.701)>, <Segment(114.089, 120.434)>]
HYP segments: [<Segment(0.419094, 4.38472)>, <Segment(8.97472, 58.3678)>, <Segment(58.7897, 60.9666)>, <Segment(60.9666, 60.9835)>, <Segment(61.0003, 113.701)>, <Segment(114.089, 120.434)>]
diarization error rate: {'missed detection': 0.0015312499999998175, 'confusion': 0.0004062499999974989, 'false alarm': 0.0006562499999969162, 'total': 114.599, 'correct': 114.5970625, 'diarization error rate': 2.2633269051163034e-05}
GT segments: [<Segment(0.419, 4.385)>, <Segment(8.975, 58.368)>, <Segment(58.79, 60.967)>, <Segment(60.967, 60.984)>, <Segment(61, 113.701)>, <Segment(114.089, 120.434)>]
HYP segments: [<Segment(0.419094, 4.38472)>, <Segment(8.97472, 58.3678)>, <Segment(58.7897, 60.9666)>, <Segment(60.9666, 60.9835)>, <Segment(61.0003, 113.701)>, <Segment(114.089, 120.434)>]
jaccard error rate: {'speaker count': 2.0, '

 67%|██████████████       | 6/9 [2:09:16<42:12, 844.02s/it]

GT segments: [<Segment(0.031, 35.975)>, <Segment(37.257, 42.185)>, <Segment(42.488, 44.699)>, <Segment(45.543, 49.188)>, <Segment(45.745, 46.572)>, <Segment(46.572, 46.623)>, <Segment(46.623, 46.724)>, <Segment(48.125, 62.013)>, <Segment(64.004, 77.69)>, <Segment(77.96, 126.762)>, <Segment(126.425, 155.045)>, <Segment(155.247, 156.243)>, <Segment(156.243, 161.592)>, <Segment(161.828, 180.188)>, <Segment(180.897, 186.398)>, <Segment(187.192, 197.384)>, <Segment(198.211, 202.413)>, <Segment(203.088, 205.299)>, <Segment(205.636, 214.934)>]
HYP segments: [<Segment(0.0309687, 35.9747)>, <Segment(37.2572, 42.1847)>, <Segment(42.4885, 44.6991)>, <Segment(45.5428, 49.1878)>, <Segment(45.7453, 46.5722)>, <Segment(46.5722, 46.6228)>, <Segment(46.6228, 46.7241)>, <Segment(48.1247, 62.0128)>, <Segment(64.0041, 77.6897)>, <Segment(77.9597, 126.762)>, <Segment(126.425, 155.045)>, <Segment(155.247, 156.243)>, <Segment(156.243, 161.592)>, <Segment(161.828, 180.188)>, <Segment(180.897, 186.398)>, <Segm

 78%|████████████████▎    | 7/9 [2:17:25<24:16, 728.05s/it]

GT segments: [<Segment(1.465, 26.221)>, <Segment(26.221, 111.119)>, <Segment(101.77, 102.124)>, <Segment(111.119, 114.528)>, <Segment(115.793, 155.905)>, <Segment(156.142, 208.708)>, <Segment(208.758, 254.32)>, <Segment(254.32, 266.335)>, <Segment(266.96, 281.877)>, <Segment(282.097, 297.335)>, <Segment(295.934, 308.81)>, <Segment(305.333, 306.362)>, <Segment(308.928, 310.885)>]
HYP segments: [<Segment(1.46534, 26.221)>, <Segment(26.221, 111.119)>, <Segment(101.77, 102.125)>, <Segment(111.119, 114.528)>, <Segment(115.793, 155.905)>, <Segment(156.142, 208.707)>, <Segment(208.758, 254.32)>, <Segment(254.32, 266.335)>, <Segment(266.96, 281.877)>, <Segment(282.097, 297.335)>, <Segment(295.934, 308.81)>, <Segment(305.333, 306.363)>, <Segment(308.928, 310.885)>]
diarization error rate: {'missed detection': 0.0020312499999886047, 'confusion': 0.00046875000001733724, 'false alarm': 0.0042187499999783995, 'total': 304.121, 'correct': 304.1185, 'diarization error rate': 2.209235797588572e-05}
GT

 89%|█████████████████▊  | 8/9 [3:40:41<34:46, 2086.69s/it]

GT segments: [<Segment(0.031, 42.033)>, <Segment(42.033, 44.902)>, <Segment(42.218, 42.235)>, <Segment(46.235, 51.618)>, <Segment(55.28, 55.702)>, <Segment(56.275, 69.978)>, <Segment(60.022, 60.495)>, <Segment(69.978, 71.564)>, <Segment(72.526, 84.946)>, <Segment(84.946, 89.603)>, <Segment(85.148, 85.57)>, <Segment(90.869, 91.375)>, <Segment(92.135, 101.855)>, <Segment(99.391, 99.948)>, <Segment(102.665, 103.222)>, <Segment(103.694, 109.178)>, <Segment(107.356, 121.092)>, <Segment(121.48, 124.231)>, <Segment(122.965, 141.376)>, <Segment(124.568, 125.429)>, <Segment(125.868, 127.049)>, <Segment(130.424, 135.014)>, <Segment(136.533, 137.36)>, <Segment(141.983, 144.97)>, <Segment(144.97, 153.627)>, <Segment(145.493, 145.78)>, <Segment(151.535, 151.738)>, <Segment(153.627, 182.264)>, <Segment(153.695, 153.897)>, <Segment(182.264, 195.578)>, <Segment(186.179, 188.693)>, <Segment(192.642, 199.122)>, <Segment(197.873, 199.24)>, <Segment(199.24, 199.645)>, <Segment(199.645, 203.307)>, <Segment

100%|████████████████████| 9/9 [3:51:02<00:00, 1540.28s/it]

GT segments: [<Segment(0.031, 24.584)>, <Segment(24.753, 39.704)>, <Segment(41.206, 46.319)>, <Segment(46.319, 46.876)>, <Segment(46.876, 54.402)>, <Segment(55.297, 115.44)>, <Segment(115.439, 119.691)>, <Segment(120.063, 124.585)>, <Segment(125.007, 135.706)>, <Segment(137.056, 190.128)>, <Segment(190.229, 195.595)>, <Segment(196.895, 237.159)>, <Segment(237.344, 255.316)>, <Segment(256.852, 285.27)>, <Segment(285.522, 312.033)>, <Segment(313.096, 359.215)>, <Segment(359.215, 361.814)>, <Segment(362.287, 402.433)>, <Segment(402.432, 406.246)>]
HYP segments: [<Segment(0.0309687, 24.5841)>, <Segment(24.7528, 39.7041)>, <Segment(41.206, 46.3191)>, <Segment(46.3191, 46.876)>, <Segment(46.876, 54.4022)>, <Segment(55.2966, 115.439)>, <Segment(115.439, 119.692)>, <Segment(120.063, 124.585)>, <Segment(125.007, 135.706)>, <Segment(137.056, 190.128)>, <Segment(190.229, 195.595)>, <Segment(196.895, 237.158)>, <Segment(237.344, 255.316)>, <Segment(256.852, 285.269)>, <Segment(285.522, 312.033)>, 
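
The per-file dictionaries printed above can also be combined into a single corpus-level figure. One common approach is a duration-weighted DER: sum the error components across files and divide by the summed reference duration, so that long recordings count for more than short ones. The sketch below applies that to the four DER dictionaries that appear complete in the output above. It is an assumption about how one might summarize the results, not necessarily what `_process_accuracy` does, and `aggregate_der` is a hypothetical name.

```python
def aggregate_der(per_file):
    """Duration-weighted DER: summed error durations over summed reference duration."""
    error = sum(
        d["missed detection"] + d["false alarm"] + d["confusion"] for d in per_file
    )
    total = sum(d["total"] for d in per_file)
    if total <= 0:
        raise ValueError("total reference duration must be positive")
    return error / total


# Components copied from the DER dictionaries that are not truncated in the output above.
per_file = [
    {"missed detection": 3.1249999999261036e-05, "confusion": 0.0,
     "false alarm": 3.125000000000003e-05, "total": 27.945},
    {"missed detection": 0.0020937500000064446, "confusion": 3.1249999999261036e-05,
     "false alarm": 0.002343750000002604, "total": 31.285999999999998},
    {"missed detection": 0.0015312499999998175, "confusion": 0.0004062499999974989,
     "false alarm": 0.0006562499999969162, "total": 114.599},
    {"missed detection": 0.0020312499999886047, "confusion": 0.00046875000001733724,
     "false alarm": 0.0042187499999783995, "total": 304.121},
]

overall = aggregate_der(per_file)
print(f"Duration-weighted DER over {len(per_file)} files: {overall:.3e}")
```

A plain average of the per-file DER values would weight every file equally instead, which can overstate the influence of very short recordings.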


