Persona-ASR

Bilingual Kazakh–English target-speaker ASR for overlapping speech.

Persona-ASR transcribes an enrolled target speaker from a multi-talker overlapping mixture and explicitly rejects utterances in which the target speaker is absent. It couples an enrollment-conditioned recognizer with a target-presence gate. In the recognizer, a speaker embedding from a frozen ECAPA-TDNN modulates a WavLM-Base+ encoder through FiLM, and language-specific CTC heads decode the result. Both same-language and cross-language enrollment are supported.
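The conditioning and rejection steps described above can be illustrated with a small, dependency-free sketch. It is not the released implementation: the function names, the toy dimensions, the single FiLM layer, and the 0.5 threshold are all assumptions made for illustration. FiLM here means the usual feature-wise affine transform, where a scale (gamma) and shift (beta) are predicted from the speaker embedding and applied to every encoder frame.

```python
# Illustrative sketch only; names, shapes, and the threshold are assumptions,
# not the Persona-ASR code.

def linear(weights, bias, x):
    """Plain affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def film(frames, speaker_emb, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM to encoder frames.

    gamma and beta are predicted once from the speaker embedding and then
    broadcast over time: out[t][d] = gamma[d] * frames[t][d] + beta[d].
    """
    gamma = linear(w_gamma, b_gamma, speaker_emb)
    beta = linear(w_beta, b_beta, speaker_emb)
    return [[g * h + b for g, h, b in zip(gamma, frame, beta)]
            for frame in frames]

def gated_transcript(presence_prob, transcript, threshold=0.5):
    """Return the transcript only if the target speaker is judged present.

    The threshold value is a placeholder, not a tuned setting.
    """
    return transcript if presence_prob >= threshold else ""

# Toy example: 2-dim speaker embedding, 3-dim encoder features, 2 frames.
speaker_emb = [1.0, 2.0]
frames = [[1.0, 1.0, 1.0], [2.0, 0.0, -1.0]]
w_gamma = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # gamma = [1, 2, 3]
b_gamma = [0.0, 0.0, 0.0]
w_beta = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]    # beta = [0.5, 0.5, 0.5]
b_beta = [0.5, 0.5, 0.5]

modulated = film(frames, speaker_emb, w_gamma, b_gamma, w_beta, b_beta)
```

In a real system the modulated frames would feed the CTC head for the relevant language, and the presence probability would come from the gate rather than being passed in by hand.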

Datasets and checkpoints (Hugging Face)

Resource	Link
Model checkpoints (backbone + presence gate)	https://huggingface.co/issai/Persona-ASR
KazMix-3 (Kazakh 3-speaker TS-ASR dataset)	https://huggingface.co/datasets/issai/KazMix-3
PersonaMix (controlled bilingual benchmark)	https://huggingface.co/datasets/issai/PersonaMix

KazMix-3 ships mixture manifests and generation scripts; the audio is regenerated from the Kazakh Speech Dataset (OpenSLR 140). PersonaMix provides the full controlled benchmark.
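One way to fetch the resources listed above is the `huggingface_hub` client. The sketch below assumes that package is installed and that the repositories are publicly readable; the `RESOURCES` table and the `fetch` helper are hypothetical names introduced here, while the repository IDs come from the links in the table.

```python
# Sketch under assumptions: requires `pip install huggingface_hub` and
# public access to the repositories. `fetch` is a hypothetical helper.

RESOURCES = {
    "checkpoints": ("issai/Persona-ASR", "model"),
    "kazmix3": ("issai/KazMix-3", "dataset"),
    "personamix": ("issai/PersonaMix", "dataset"),
}

def fetch(name, local_dir=None):
    """Download one resource snapshot and return its local path."""
    # Imported lazily so the table above can be used without the package.
    from huggingface_hub import snapshot_download

    repo_id, repo_type = RESOURCES[name]
    return snapshot_download(
        repo_id=repo_id,
        repo_type=repo_type,
        local_dir=local_dir,
    )

# Example (performs a network download):
# path = fetch("personamix", local_dir="data/PersonaMix")
```

For KazMix-3, the download contains manifests and generation scripts; the audio still has to be regenerated from the Kazakh Speech Dataset (OpenSLR 140) as described above.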

Code

Training and evaluation code will be released in this repository.

License

Released under CC BY 4.0.
