Introduction

Here, we offer an unofficial, automatically generated set of speaker labels for the WenetSpeech dataset, intended for speech research.

The labels follow the Kaldi utt2spk format, with one "<utterance-id> <speaker-id>" pair per line:

X0000018313_42315061_S00016 X0000018313_42315061_spk0
X0000018313_42315061_S00009 X0000018313_42315061_spk0
X0000018313_42315061_S00011 X0000018313_42315061_spk0
X0000018313_42315061_S00033 X0000018313_42315061_spk0
X0000018313_42315061_S00010 X0000018313_42315061_spk0

The first field is the original WenetSpeech segment ID (the utterance ID in Kaldi terms), and the second field is the assigned speaker label.
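Because the file follows the Kaldi convention, it can be read with a few lines of code. The Python sketch below parses utt2spk lines and inverts them into the spk2utt layout (a speaker followed by its utterances) that many Kaldi-style recipes expect; Kaldi itself ships `utils/utt2spk_to_spk2utt.pl` for the same job. The function names `read_utt2spk` and `to_spk2utt` are illustrative only and are not part of this release.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def read_utt2spk(lines: Iterable[str]) -> Dict[str, str]:
    """Parse Kaldi-style utt2spk lines ("<utt-id> <spk-id>") into a dict."""
    utt2spk: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt, spk = line.split(maxsplit=1)
        utt2spk[utt] = spk
    return utt2spk


def to_spk2utt(utt2spk: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert utt2spk into the spk2utt layout (speaker -> sorted utterance IDs)."""
    spk2utt: Dict[str, List[str]] = defaultdict(list)
    for utt, spk in utt2spk.items():
        spk2utt[spk].append(utt)
    return {spk: sorted(utts) for spk, utts in spk2utt.items()}


if __name__ == "__main__":
    sample = [
        "X0000018313_42315061_S00016 X0000018313_42315061_spk0",
        "X0000018313_42315061_S00009 X0000018313_42315061_spk0",
    ]
    for spk, utts in to_spk2utt(read_utt2spk(sample)).items():
        print(spk, " ".join(utts))
```

On the released file, pass an open file handle instead of the inline sample, e.g. `read_utt2spk(open(path, encoding="utf-8"))`.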

Methods

The speaker labels are generated in the following steps:

Enhancing the original speech utterances with a state-of-the-art band-split RNN (BSRNN) speech enhancement model.
Calculating segment-level speaker embeddings with a pre-trained speaker verification model from Wespeaker.
For each long utterance, applying spectral clustering to the embeddings of its speech segments to assign the speaker labels.
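The clustering step can be illustrated with a generic normalized spectral clustering (Ng-Jordan-Weiss style) over cosine affinities, with the number of speakers chosen by the eigengap heuristic. This is a minimal NumPy-only sketch under those assumptions, not the exact configuration used to produce these labels: the affinity definition, the `max_speakers` cap, and the k-means initialization are all hypothetical choices here. The input is assumed to be one embedding row per segment of a single long utterance.

```python
import numpy as np


def cosine_affinity(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity of segment embeddings, negatives clipped to 0."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, 0.0)  # drop self-similarity
    return np.clip(sim, 0.0, None)


def kmeans(x: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[dist.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(emb: np.ndarray, max_speakers: int = 10) -> np.ndarray:
    """Assign a speaker index to each segment embedding of one long utterance."""
    n = len(emb)
    if n < 2:
        return np.zeros(n, dtype=int)  # a single segment is trivially one speaker
    affinity = cosine_affinity(emb)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(affinity.sum(axis=1), 1e-10))
    # symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(n) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    # eigengap heuristic: k is where the largest jump among the smallest eigenvalues occurs
    m = min(max_speakers, n - 1)
    k = int(np.argmax(np.diff(eigvals[: m + 1]))) + 1
    # spectral embedding: first k eigenvectors, each row normalized to unit length
    feats = eigvecs[:, :k]
    feats = feats / np.maximum(np.linalg.norm(feats, axis=1, keepdims=True), 1e-10)
    return kmeans(feats, k)
```

In this sketch, `emb` would hold the speaker embeddings of the segments of one long utterance (shape `[num_segments, dim]`), and the returned integers play the role of the speaker indices in the labels. Production diarization setups often add affinity pruning or thresholds on top of this basic recipe.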

Details

Dataset                   Long utterances   Segments   Speakers
WenetSpeech (original)    0.06M             14.6M      -
WenetSpeech (clustered)   0.06M             11.9M      0.23M
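The counts in the table can be recomputed from an utt2spk file. The sketch below assumes, based on the example IDs above, that a segment ID is its long-utterance ID followed by a final `_S<index>` field; `long_utt_id` and `dataset_stats` are hypothetical helper names.

```python
from typing import Dict, Iterable


def long_utt_id(segment_id: str) -> str:
    """Drop the trailing "_S<index>" field of a segment ID (assumed ID layout)."""
    return segment_id.rsplit("_", 1)[0]


def dataset_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Count long utterances, segments and speakers in utt2spk-formatted lines."""
    segments = 0
    long_utts = set()
    speakers = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        segment_id, speaker_id = fields
        segments += 1
        long_utts.add(long_utt_id(segment_id))
        speakers.add(speaker_id)
    return {
        "long_utterances": len(long_utts),
        "segments": segments,
        "speakers": len(speakers),
    }


if __name__ == "__main__":
    sample = [
        "X0000018313_42315061_S00016 X0000018313_42315061_spk0",
        "X0000018313_42315061_S00009 X0000018313_42315061_spk0",
    ]
    print(dataset_stats(sample))
    # {'long_utterances': 1, 'segments': 2, 'speakers': 1}
```

Run on the full released file, the three numbers should be comparable to the "clustered" row, within the rounding shown in the table.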

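Note that speaker clustering is currently performed only within each long utterance, not across different long utterances of WenetSpeech, so a speaker who appears in several long utterances receives a separate label in each one.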
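Because clustering runs per long utterance, speaker IDs are scoped to that utterance, as in the `X0000018313_42315061_spk0` example. The sketch below groups segments by long utterance, calls a clustering function on each group, and names the clusters `<long-utterance-id>_spk<index>`. The `cluster_fn` callback (standing in for embedding extraction plus clustering), the ID layout, and the cluster numbering are assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence


def label_per_long_utterance(
    segment_ids: Iterable[str],
    cluster_fn: Callable[[List[str]], Sequence[int]],
) -> Dict[str, str]:
    """Cluster each long utterance independently and build utt2spk entries.

    cluster_fn is a hypothetical callback returning one cluster index per
    segment of a single long utterance.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for seg in segment_ids:
        # assumed ID layout: "<long-utterance-id>_S<index>"
        groups[seg.rsplit("_", 1)[0]].append(seg)

    utt2spk: Dict[str, str] = {}
    for long_id, segs in groups.items():
        cluster_ids = cluster_fn(segs)
        # prefixing with the long-utterance ID keeps equal cluster indices
        # from different long utterances distinct
        for seg, cluster in zip(segs, cluster_ids):
            utt2spk[seg] = f"{long_id}_spk{cluster}"
    return utt2spk


if __name__ == "__main__":
    segs = ["X0000018313_42315061_S00016", "X0000018313_42315061_S00009"]
    # dummy clusterer that puts every segment in cluster 0
    for seg, spk in label_per_long_utterance(segs, lambda s: [0] * len(s)).items():
        print(seg, spk)
```

This per-utterance scoping is why the same person can carry several labels across the dataset; merging them would require an additional cross-utterance clustering pass.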

Download

The utt2spk file can be downloaded via Link.

TODO

Automatic speaker labels for GigaSpeech

License

Authorship: Jianwei Yu, Hangting Chen, Shuai Wang

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

TomJwYu/WenetSpeechSpeakerCluster

Folders and files

Latest commit

History

Repository files navigation

Introuction

Methods

Details

Download

TODO

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages