Removing the dependency on Pyannote for Diarization and VAD #15632
Open
tango4j wants to merge 17 commits into
Signed-off-by: taejinp <tango4j@gmail.com>
Signed-off-by: tango4j <tango4j@users.noreply.github.com>
Collaborator, Author: @pzelasko
Collaborator, Author: @chtruong814
blisc reviewed May 13, 2026
Collaborator, Author: /ok to test dd3055b
Collaborator, Author: /ok to test 69301bf
chtruong814 approved these changes May 16, 2026
Collaborator: /ok to test 0d82d77
Important
The Update branch button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Removes the `pyannote.core` and `pyannote.metrics` dependencies from NeMo's speaker-diarization stack and replaces them with an in-tree, NIST `md-eval-22.pl`-faithful Python engine plus `lhotse.SupervisionSegment`-based annotation objects. The public API of `nemo.collections.asr.metrics.der` is preserved, including byte-for-byte numerical parity with historical NeMo diarization results (no shift in published DER numbers).
Where possible, Pyannote classes are replaced with Lhotse's classes, so that removing the Pyannote imports adds as little new code to the repo as possible. Apart from the RTTM writing functions, the Pyannote classes were mostly replaceable this way.
Collection: ASR (speaker tasks / diarization, VAD)
Changelog
New: in-tree DER engine (`nemo/collections/asr/metrics/md_eval.py`)
- Python port of `md-eval-22.pl`, written in NeMo style (Apache header, type hints, Google-style docstrings, `__all__`, `nemo.utils.logging`, no CLI). Drives all DER computation.
- `DiarizationErrorResult` result object exposing the dict-like interface used throughout NeMo (`abs(result)`, `result['total' | 'confusion' | 'false alarm' | 'missed detection']`, `result.results_`, `result.optimal_mapping(...)`, `result.report()`).
`nemo/collections/asr/metrics/der.py` (DER public API)
- `score_labels`, `evaluate_der`, `score_labels_from_rttm_labels`, `get_partial_ref_labels`, `get_online_DER_stats`, `calculate_session_cpWER`, `calculate_session_cpWER_bruteforce`, `concat_perm_word_error_rate` are all preserved with their original names, signatures, and return shapes. No breaking changes for downstream callers.
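As a rough illustration of how the component keys listed for `DiarizationErrorResult` relate to a single DER figure, the sketch below applies the standard definition: the sum of confusion, false alarm, and missed detection time divided by total reference speech time. It uses a plain dictionary as a hypothetical stand-in for the result object, assumes each value is a duration in seconds, and uses made-up numbers. The function name `der_from_components` is not part of NeMo.

```python
def der_from_components(components: dict) -> float:
    """Return DER = (confusion + false alarm + missed detection) / total.

    `components` is a plain dict standing in for the result object (an
    assumption for illustration); values are durations in seconds.
    """
    total = components["total"]
    if total <= 0:
        raise ValueError("total reference speech time must be positive")
    errors = (
        components["confusion"]
        + components["false alarm"]
        + components["missed detection"]
    )
    return errors / total


# Made-up durations, for illustration only.
example = {
    "total": 100.0,
    "confusion": 3.0,
    "false alarm": 2.0,
    "missed detection": 5.0,
}
print(f"DER = {der_from_components(example):.3f}")
```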
Helpers replacing the `pyannote.core` types:
- `make_diar_segment(start, end, speaker, ...)` -> `SupervisionSegment`
- `make_diar_annotation(labels, uniq_name=...)` -> `list[SupervisionSegment]`
- `make_uem_timeline(uem_lines, uniq_id=...)` -> `list[SupervisionSegment]` (UEM regions carried as supervisions with `speaker="UEM"`)
- `unique_speakers(annotation)` -> `list[str]`
- `write_supervisions_to_rttm(annotation, file_handle, ...)`
- `score_labels_from_rttm_labels(...)` convenience entry point that takes raw `"start end speaker"` label strings (no annotation object construction required by the caller).
- `_default_uem_from_ref_sys(ref_data, sys_data)` helper. When a caller does not supply a UEM, the high-level wrappers now auto-derive `[min(ref ∪ sys TBEG), max(ref ∪ sys TEND)]` per `(file_id, channel)` and pass it to `evaluate()`. This matches the historical no-UEM scoring map used by the previous external engine and prevents any over-shoot of the hypothesis past the last reference segment from being silently dropped. `md_eval.evaluate()` itself remains a faithful NIST port (ref-extent only) for power users that call it directly.
- The description of the `collar` argument in both `score_labels` and `score_labels_from_rttm_labels` clarifies the NIST half-width semantics (total no-score zone = `2 * collar`) and gives the cross-engine conversion rule (NeMo `collar=X` <==> external libs that define collar as total width `collar=2X`).
Source code rename / scrub (no behaviour change)
- `nemo/collections/asr/parts/utils/speaker_utils.py`:
  - `labels_to_pyannote_object` -> `labels_to_supervisions`
  - `timestamps_to_pyannote_object` -> `timestamps_to_supervisions`
  - `list[SupervisionSegment]`
- `nemo/collections/asr/parts/utils/vad_utils.py`:
  - `vad_construct_pyannote_object_per_file` -> `vad_construct_supervisions_per_file`
  - `frame_vad_construct_pyannote_object_per_file` -> `frame_vad_construct_supervisions_per_file`
  - `read_rttm_as_pyannote_object` -> `read_rttm_as_supervisions`
  - `_DetectionErrorRateAccumulator` class replaces `pyannote.metrics.detection.DetectionErrorRate`, backed by `md_eval`. It preserves the `metric(reference, hypothesis)` accumulation + `metric.report(display=False)` API and returns a pandas DataFrame with the same `('detection error rate', '%')`, `('false alarm', '%')`, `('miss', '%')` columns that downstream code consumes.
- `scripts/speaker_tasks/eval_diar_with_asr.py`:
  - `get_pyannote_objs_from_rttms` -> `get_supervisions_from_rttms`
- `examples/speaker_tasks/diarization/neural_diarizer/e2e_diarize_speech.py`:
  - `timestamps_to_supervisions` name
- References to the package by name have been rewritten (or replaced with neutral wording such as "External Annotation Library") so a `git grep -i pyannote` over the branch returns zero matches.
- Tutorial notebooks (`tutorials/speaker_tasks/End_to_End_Diarization_*.ipynb`, `tutorials/tools/Multispeaker_Simulator.ipynb`) and the inference notebook updated to use the new names and `score_labels_from_rttm_labels`.
Dependencies removed
- `requirements/requirements_asr.txt`: removed `pyannote.core` and `pyannote.metrics`.
- `examples/voice_agent/environment.yaml`: removed `pyannote-core==5.0.0`, `pyannote-database==5.1.3`, `pyannote-metrics==3.2.1`.
- `uv.lock`: removed the three corresponding `[[package]]` blocks and every transitive `{ name = "pyannote-..." }` entry. TOML structure validated after edit.
Tests
- `tests/collections/speaker_tasks/utils/test_der.py` (119 unit tests) covering:
  - `score_labels_from_rttm_labels` (string-label public API)
  - DER values verified against the previous engine implementation (class `TestExternalEngineVerifiedValues`)
  - with the string-label path
  - `TestNoUemAutoUnion` regression class pinning the auto-UEM behaviour and the NIST collar semantics with hand-derived expected values from the diarization tutorial sample
  - `pyannote.core` / `pyannote.metrics` submodules are never imported when `der` / `md_eval` are imported
- `tests/collections/{asr,speaker_tasks}/utils/test_vad_utils_*.py` updated to use lhotse-based assertions via a new `_annotation_equals(annotation, expected_segments)` helper.
May/11/2026: Added more changes that fix remaining issues.
- Moved cpWER code from `der.py` into new `metrics/cpwer.py`, and updated internal callers, tests, and tutorial imports to use the new module.
- 153 passed.
Usage
The public API is unchanged, so existing user code continues to work. New
shorthand for users that already have RTTM-style label strings:
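As a rough illustration of the inputs involved, the sketch below parses `"start end speaker"` label strings, derives the union span that the changelog describes as the default UEM when none is supplied, and applies the collar conversion rule. It covers a single session only, whereas the PR derives the span per `(file_id, channel)`. The names `Segment`, `parse_labels`, `default_uem`, and `to_total_width_collar` are hypothetical stand-ins rather than NeMo functions, and the label values are made up. The actual entry point is `score_labels_from_rttm_labels`, whose signature is not shown here.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float
    end: float
    speaker: str


def parse_labels(labels: list[str]) -> list[Segment]:
    """Parse 'start end speaker' strings into Segment tuples."""
    segments = []
    for line in labels:
        start, end, speaker = line.split()
        segments.append(Segment(float(start), float(end), speaker))
    return segments


def default_uem(ref: list[Segment], hyp: list[Segment]) -> tuple[float, float]:
    """Span from the earliest start to the latest end over ref and hyp together."""
    all_segments = list(ref) + list(hyp)
    if not all_segments:
        raise ValueError("need at least one reference or hypothesis segment")
    return (
        min(seg.start for seg in all_segments),
        max(seg.end for seg in all_segments),
    )


def to_total_width_collar(half_width: float) -> float:
    """Convert a NIST-style half-width collar to a total-width collar."""
    return 2.0 * half_width


# Made-up labels for one session.
ref_labels = ["0.0 2.5 speaker_0", "2.5 6.0 speaker_1"]
hyp_labels = ["0.2 2.4 speaker_a", "2.6 7.1 speaker_b"]

uem = default_uem(parse_labels(ref_labels), parse_labels(hyp_labels))
print("default UEM span:", uem)
print("total-width collar for half-width 0.25:", to_total_width_collar(0.25))
```

Here the hypothesis runs past the last reference segment, so the union span ends at the hypothesis end time rather than the reference end time. That is the over-shoot case the auto-derived UEM is meant to keep in scope.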
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information
Removes a maintenance liability: the previous external diarization metric packages have been on pip with infrequent updates and have pulled in a large transitive closure (pyannote-database, pyannote-pipeline, ...). After this PR, NeMo's DER pipeline depends only on numpy, scipy, lhotse, and editdistance -- all already required.
Backward-compatibility audit: git grep -i pyannote over the branch returns zero matches across Python sources, notebooks, configs, lockfile, docs, and shell scripts. import nemo followed by inspecting sys.modules shows no pyannote.* entries.
Numerical-parity audit: 21 verified-against-the-previous-engine DER values hardcoded in TestExternalEngineVerifiedValues, plus 7 regression tests pinning the auto-UEM and collar semantics with hand-derived expected values from the diarization tutorial sample.
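The `sys.modules` inspection mentioned in the backward-compatibility audit can be sketched as follows. This is an assumed form of the check, not the exact script used for the audit, and `loaded_modules_with_prefix` is a hypothetical helper name. The `import nemo` line is left commented out so the snippet runs without NeMo installed.

```python
import sys


def loaded_modules_with_prefix(prefix: str) -> list[str]:
    """Return sorted names of loaded modules equal to `prefix` or under `prefix.`."""
    dotted = prefix + "."
    return sorted(
        name for name in sys.modules if name == prefix or name.startswith(dotted)
    )


# import nemo  # uncomment in an environment where NeMo is installed

leaked = loaded_modules_with_prefix("pyannote")
print(leaked or "no pyannote modules loaded")
```

Matching on `prefix + "."` rather than a bare `startswith(prefix)` avoids false positives from unrelated packages whose names merely begin with the same letters.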