This notebook shows how to use some of the pre-trained models available in SpeechBrain. You can find the full list of pre-trained models on Hugging Face [here](https://huggingface.co/speechbrain).

## Installing SpeechBrain

In [None]:
!pip install speechbrain
!pip install transformers

Import the libraries used throughout the notebook: SpeechBrain for the models and audio loading, and IPython for audio playback.

In [2]:
import speechbrain as sb
from speechbrain.dataio.dataio import read_audio
from IPython.display import Audio

## **Automatic Speech Recognition Task**

### English

Here, we use the **EncoderDecoderASR** model to transcribe an English audio file into text.

You can find more information about the model [here](https://speechbrain.readthedocs.io/en/latest/API/speechbrain.pretrained.interfaces.html#speechbrain.pretrained.interfaces.EncoderDecoderASR).

In [6]:
from speechbrain.pretrained import EncoderDecoderASR

# Load the pre-trained model
asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-crdnn-rnnlm-librispeech", savedir="pretrained_models/asr-crdnn-rnnlm-librispeech")

# Load the wav file and transcribe into text
asr_model.transcribe_file('/content/00001.wav')

"BECAUSE OF THAT I CRIED PERSONALITY THAT WOULD HE BRINGS TO A FILM NAY YOU IMAGINE YOU'D BE VERY VERY NERVOUS TO WORK FOR SOMEBODY LIKE THAT BUT ACTUALLY QUITE THE"

In [7]:
signal = read_audio("/content/00001.wav").squeeze()
Audio(signal, rate=16000)
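
The playback cell above hard-codes `rate=16000`, which matches the 16 kHz audio these models are trained on. If you are unsure about your own recording, a minimal sketch like the following reads the sample rate from the file header instead of assuming it. It uses only Python's standard `wave` module, so it handles PCM WAV files but not FLAC. `wav_sample_rate` and the synthetic `silence.wav` file are hypothetical names used for illustration and are not part of SpeechBrain.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


# Demo on a synthetic file: 0.1 s of 16-bit mono silence at 16 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    wav_file.setframerate(16000)
    wav_file.writeframes(b"\x00\x00" * 1600)

demo_rate = wav_sample_rate(demo_path)
print(demo_rate)
```

With a real recording, the same helper could supply the playback rate, for example `Audio(signal, rate=wav_sample_rate("/content/00001.wav"))`.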

### Mandarin

SpeechBrain also provides ASR models for other languages. For example, the following cell transcribes Mandarin speech into text with the `asr-transformer-aishell` model.

In [8]:
from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-transformer-aishell", savedir="pretrained_models/asr-transformer-aishell_model")
asr_model.transcribe_file("speechbrain/asr-transformer-aishell/example_mandarin.wav")

'他 应该 也是 喜欢'

In [9]:
signal = read_audio("example_mandarin.wav").squeeze()
Audio(signal, rate=16000)
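
The Mandarin transcript above is returned as space-separated tokens. If you want to compare it with reference text written without spaces, for instance character by character, one simple option is to strip the whitespace first. The helper name `strip_spaces` is hypothetical and not part of SpeechBrain.

```python
def strip_spaces(transcript):
    """Remove all whitespace so the text can be compared character by character."""
    return "".join(transcript.split())


# The transcript string is copied from the cell output above.
hypothesis = strip_spaces("他 应该 也是 喜欢")
print(hypothesis)
print(list(hypothesis))  # one list entry per character
```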

## **Speaker Verification**

Here, given two utterances, the task is to verify whether they were spoken by the same speaker.

You can find more information about the model [here](https://speechbrain.readthedocs.io/en/latest/API/speechbrain.pretrained.interfaces.html#speechbrain.pretrained.interfaces.SpeakerRecognition).

In [12]:
from speechbrain.pretrained import SpeakerRecognition

# Load the pre-trained model
verification = SpeakerRecognition.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb", savedir="pretrained_models/spkrec-ecapa-voxceleb")

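# Score the two files with cosine similarity between their speaker embeddings;
# prediction is True when the score is high enough to count as the same speaker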
score, prediction = verification.verify_files("speechbrain/spkrec-ecapa-voxceleb/example1.wav", "speechbrain/spkrec-ecapa-voxceleb/example2.flac")

print(prediction, score)

tensor([False]) tensor([0.1635])
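
The output above pairs a similarity score with a boolean decision. As a rough sketch of how the two relate, the code below computes the cosine similarity between toy vectors and compares it with a threshold. The value 0.25 is an assumption, taken to match the default threshold of `SpeakerRecognition.verify_batch`; check it against your installed version. The vectors are made up for illustration, and real speaker embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_speaker(score, threshold=0.25):
    """Decide 'same speaker' when the score exceeds the (assumed) threshold."""
    return score > threshold


# Toy embeddings, not produced by any model.
emb_a = [0.9, 0.1, 0.3]
emb_b = [0.8, 0.2, 0.4]    # points in nearly the same direction as emb_a
emb_c = [-0.2, 0.9, -0.1]  # points in a very different direction

score_ab = cosine_similarity(emb_a, emb_b)
score_ac = cosine_similarity(emb_a, emb_c)

print(same_speaker(score_ab))  # True
print(same_speaker(score_ac))  # False
print(same_speaker(0.1635))    # False, consistent with the output above
```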


In [13]:
signal = read_audio("example1.wav").squeeze()
Audio(signal, rate=16000)

In [14]:
signal = read_audio("example2.flac").squeeze()
Audio(signal, rate=16000)