forked from Majdoddin/nlp

Diarization and transcription model

albcunha/nlp

If you like my code, please donate

Pyannote plays and Whisper rhymes Open In Colab

Andrej Karpathy suggested training a classifier on top of openai/whisper model features to identify the speaker, so we can visualize the speaker in the transcript. But, as pointed out by Christian Perone, it seems that features from Whisper would not be that useful for speaker recognition, since the model's main objective is essentially to ignore speaker differences.

In the following, I use pyannote-audio, a speaker diarization toolkit by Hervé Bredin, to identify the speakers, and then match them with the transcriptions of Whisper. I do it on the first 30 minutes of Lex's 2nd interview with Yann LeCun. Check the result here.
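The matching step can be sketched as a simple timestamp-overlap assignment: each Whisper segment is labeled with the diarization turn it overlaps most. The data below is illustrative — in practice the turns come from pyannote's diarization pipeline and the segments from `whisper.transcribe()`; the names and numbers here are assumptions, not output of either tool.

```python
# Sketch: label each Whisper segment with the diarization turn that
# overlaps it most in time. All data below is hypothetical.

def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(transcript_segments, diarization_turns):
    """Attach the best-overlapping speaker label to each transcript segment."""
    labeled = []
    for seg in transcript_segments:
        best = max(
            diarization_turns,
            key=lambda turn: overlap(seg["start"], seg["end"],
                                     turn["start"], turn["end"]),
        )
        labeled.append({**seg, "speaker": best["speaker"]})
    return labeled

# Hypothetical outputs of the two tools:
turns = [
    {"start": 0.0, "end": 7.5,  "speaker": "SPEAKER_00"},
    {"start": 7.5, "end": 20.0, "speaker": "SPEAKER_01"},
]
segments = [
    {"start": 0.4, "end": 6.9,  "text": "Welcome back."},
    {"start": 8.1, "end": 19.2, "text": "Thanks for having me."},
]

for seg in assign_speakers(segments, turns):
    print(seg["speaker"], seg["text"])
```

This greedy overlap rule is a simplification; it ignores segments that straddle a speaker change, which is exactly what the split-by-speaker approach below avoids.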

To make it easier to match the transcriptions to the diarization by speaker change, Sarah Kaiser suggested running pyannote.audio first and then running Whisper on the split-by-speaker chunks. For the sake of performance (and possibly transcription quality), we concatenate the audio segments into a single audio file with a silent spacer as a separator, and run Whisper on it. Enjoy it!
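With the spacer trick, Whisper's timestamps refer to the concatenated file, so they must be mapped back to the original per-speaker chunks. A minimal sketch of that bookkeeping, assuming a fixed spacer length and illustrative chunk durations (the 2-second spacer and the speaker labels are assumptions, not values from the notebook):

```python
# Sketch: after joining per-speaker chunks with a fixed silent spacer,
# map a timestamp in the combined file back to the chunk (and speaker)
# it belongs to. Durations and labels are hypothetical.

SPACER = 2.0  # seconds of silence between chunks (assumed)

def build_timeline(chunks, spacer=SPACER):
    """Return (start, end, speaker) for each chunk in the combined file."""
    timeline, cursor = [], 0.0
    for chunk in chunks:
        timeline.append((cursor, cursor + chunk["duration"], chunk["speaker"]))
        cursor += chunk["duration"] + spacer
    return timeline

def speaker_at(timeline, t):
    """Speaker of the chunk containing time t, or None inside a spacer."""
    for start, end, speaker in timeline:
        if start <= t < end:
            return speaker
    return None

chunks = [
    {"duration": 10.0, "speaker": "SPEAKER_00"},
    {"duration": 5.0,  "speaker": "SPEAKER_01"},
]
timeline = build_timeline(chunks)
print(speaker_at(timeline, 3.0))   # falls in the first chunk
print(speaker_at(timeline, 11.0))  # falls in the spacer
print(speaker_at(timeline, 13.0))  # falls in the second chunk
```

A `None` result flags a timestamp inside a spacer, which is also a cheap sanity check that Whisper did not hallucinate text over the silence.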

