# Automatic Speech Recognition with DeCRED

This notebook shows how to use the **DeCRED** model for automatic speech recognition (ASR). DeCRED (**De**coder-**C**entric **R**egularisation in **E**ncoder-**D**ecoder) is a regularisation approach for encoder-decoder ASR models that improves robustness and generalisation, especially in out-of-domain scenarios.

For more details on the **DeCRED** model, refer to the [model card on Hugging Face](https://huggingface.co/BUT-FIT/DeCRED-base).

For more information on the **ASR pipeline**, please refer to the [documentation](https://huggingface.co/transformers/main_classes/pipelines.html#transformers.AutomaticSpeechRecognitionPipeline).


In [None]:
%cd ..

In [None]:
!pip install transformers
!pip install torchaudio

# Instantiating the pipeline

In [None]:
from transformers import pipeline

model_id = "BUT-FIT/DeCRED-base"
pipe = pipeline("automatic-speech-recognition", model=model_id, feature_extractor=model_id, trust_remote_code=True)
# Newer versions of transformers (>4.31.0) have a bug in how the pipeline type is inferred,
# so it is set explicitly below. The resulting warning can be ignored.
pipe.type = "seq2seq"

# Load a sample audio file and play it

In [4]:
import IPython.display as ipd
import torchaudio
wav, sr = torchaudio.load("data/audio.wav")
ipd.Audio(wav, rate=sr)

# Transcribe the audio file with default settings (joint CTC-attention beam search decoding)

In [5]:
pipe("data/audio.wav")

{'text': 'this is a demo recording to test the pipeline'}
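
The call above passes a file path. The Transformers ASR pipeline also documents a dict input with `raw` and `sampling_rate` keys for audio that is already in memory, and it expects a single channel. The sketch below shows one way to build such an input from a `(channels, samples)` waveform like the one `torchaudio.load` returns. The helper name `to_pipeline_input` is hypothetical, and the dict form has not been verified against DeCRED specifically.

```python
import numpy as np


def to_pipeline_input(waveform, sampling_rate):
    """Build a dict input for the ASR pipeline from an in-memory waveform.

    `waveform` is assumed to be array-like with shape (channels, samples)
    or (samples,). This helper is illustrative and not part of DeCRED.
    """
    audio = np.asarray(waveform, dtype=np.float32)
    if audio.ndim == 2:
        # Downmix to mono by averaging the channels.
        audio = audio.mean(axis=0)
    elif audio.ndim != 1:
        raise ValueError(f"expected a 1-D or 2-D waveform, got {audio.ndim} dimensions")
    return {"raw": audio, "sampling_rate": int(sampling_rate)}


# Synthetic stereo example: one second of a 440 Hz tone at 16 kHz.
demo_sr = 16000
t = np.arange(demo_sr) / demo_sr
tone = np.sin(2 * np.pi * 440 * t)
stereo = np.stack([tone, tone])

sample = to_pipeline_input(stereo, demo_sr)
print(sample["raw"].shape, sample["sampling_rate"])
```

With the objects from this notebook, the equivalent call would be `pipe(to_pipeline_input(wav.numpy(), sr))`.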

# Run greedy decoding without the joint CTC-attention scorer


In [8]:
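# A CTC weight of 0.0 disables the joint CTC-attention scorer.
pipe.model.generation_config.ctc_weight = 0.0
# A single beam reduces beam search to greedy decoding.
pipe.model.generation_config.num_beams = 1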

pipe("data/audio.wav")

{'text': 'this is a demo recording to test the pipeline'}
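
Assigning to `generation_config` as above changes the settings for every later call. To compare decoding settings without losing the defaults, the overrides can be applied temporarily. The following is a minimal sketch: `override_generation_config` is a hypothetical helper, and the `SimpleNamespace` values are placeholders standing in for the real config, not DeCRED's actual defaults.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_generation_config(config, **overrides):
    """Temporarily set attributes on a config object and restore them on exit."""
    missing = object()  # sentinel for attributes that did not exist before
    previous = {name: getattr(config, name, missing) for name in overrides}
    for name, value in overrides.items():
        setattr(config, name, value)
    try:
        yield config
    finally:
        for name, value in previous.items():
            if value is missing:
                delattr(config, name)
            else:
                setattr(config, name, value)


# Stand-in for pipe.model.generation_config (placeholder values).
demo_config = SimpleNamespace(ctc_weight=0.3, num_beams=5)

with override_generation_config(demo_config, ctc_weight=0.0, num_beams=1):
    # Greedy, attention-only settings are active inside the block.
    print(demo_config.ctc_weight, demo_config.num_beams)

# The previous values are restored afterwards.
print(demo_config.ctc_weight, demo_config.num_beams)
```

In this notebook the same pattern would wrap the transcription call, for example `with override_generation_config(pipe.model.generation_config, ctc_weight=0.0, num_beams=1): pipe("data/audio.wav")`.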