# NVIDIA-Pipecat Automatic Speech Recognition Basics

The RivaASRService provides streaming speech recognition using NVIDIA’s Riva ASR models. It supports real-time transcription with interim results and interruption handling.

## Setup and Prerequisites
Before running this notebook, make sure you have:
- An NVIDIA API key for accessing cloud-hosted models via NVCF: [build.nvidia.com](https://build.nvidia.com)

## Setup Environment and Import Libraries

In [5]:
import os
import getpass
from dotenv import load_dotenv

# Load environment variables from a .env file if available
load_dotenv()
api_key = os.getenv("NVIDIA_API_KEY")

# Prompt if not set or invalid
if not api_key or not api_key.startswith("nvapi-"):
    print("NVIDIA API key not found or invalid.")
    api_key = getpass.getpass("🔐 Enter your NVIDIA API key: ").strip()
    if not api_key.startswith("nvapi-"):
        raise ValueError(f"{api_key[:5]}... is not a valid NVIDIA API key")
    # Set in environment for the current session
    os.environ["NVIDIA_API_KEY"] = api_key

In [2]:
import asyncio
import nest_asyncio
import io
import os
import sys
import grpc
import IPython.display as ipd
from dotenv import load_dotenv
from pipecat.frames.frames import AudioRawFrame, EndFrame, TextFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from nvidia_pipecat.services.nvidia_llm import NvidiaLLMService
from nvidia_pipecat.services.riva_speech import RivaASRService, RivaTTSService
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams

## Transcription with Riva ASR
**ASR** takes an audio stream or audio buffer as input and returns one or more text transcripts, along with additional optional metadata. Speech recognition in Riva is a GPU-accelerated compute pipeline, with optimized performance and accuracy.  

Riva provides state-of-the-art out-of-the-box (OOTB) models and pipelines for multiple languages, such as English, Spanish, German, Russian, and Mandarin, that can be easily deployed with nvidia-pipecat.  

Now, let's generate a transcript using Riva ASR Service for a sample audio clip, starting with English.

In [6]:
# Connect to the RivaASRService (speech recognition takes no voice setting)
stt = RivaASRService(
    api_key=os.getenv("NVIDIA_API_KEY"),  # NVIDIA API key loaded above
)

### Offline recognition for English
You can use Riva ASR in either **streaming** mode or **offline** mode. In streaming mode, a continuous stream of audio is captured and recognized, producing a stream of transcribed text.  
In offline mode, an audio clip of a set length is transcribed to text. Riva ASR supports .wav files in pulse-code modulation (PCM) format, as well as .alaw, .mulaw, and .flac formats.
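
To make the difference between the two modes concrete, the sketch below splits an in-memory PCM buffer into fixed-duration chunks, which is roughly how a streaming client would feed audio, while offline mode hands over the whole buffer at once. The helper name `iter_audio_chunks` and the 16 kHz, 16-bit mono, 100 ms values are illustrative assumptions, not settings taken from Riva or nvidia-pipecat.

```python
from typing import Iterator


def iter_audio_chunks(
    pcm: bytes,
    sample_rate: int = 16000,  # assumed sample rate in Hz
    sample_width: int = 2,     # assumed bytes per sample (16-bit PCM)
    chunk_ms: int = 100,       # assumed chunk duration in milliseconds
) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of a mono PCM buffer."""
    chunk_bytes = sample_rate * sample_width * chunk_ms // 1000
    if chunk_bytes <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of silence: 16000 samples, 2 bytes each.
one_second = bytes(16000 * 2)

offline_input = one_second                              # offline: the whole clip at once
streaming_input = list(iter_audio_chunks(one_second))   # streaming: 100 ms pieces
print(len(offline_input), len(streaming_input), len(streaming_input[0]))
```

The last chunk may be shorter than the others when the buffer length is not a multiple of the chunk size, so a consumer should not assume a fixed length.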

Let's start by loading the sample audio, an English clip that we will transcribe in offline mode:

In [8]:

# This example uses a .wav file with LINEAR_PCM encoding.
# read in an audio file from local disk
audio_path = "audio_samples/en-Mark_Neutral.wav"
with io.open(audio_path, 'rb') as fh:
    audio_data = fh.read()
ipd.Audio(audio_path)
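
Before sending a clip for recognition, it can help to confirm that it really is uncompressed PCM and to check its sample rate and channel count. The following is a minimal sketch using only Python's standard `wave` module; `describe_wav` and `make_silent_wav` are hypothetical helpers written for this illustration, and the generated clip of silence stands in for a real file so that the example runs on its own.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Read the header of an uncompressed PCM WAV held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


def make_silent_wav(seconds: float = 0.5, rate: int = 16000) -> bytes:
    """Build a mono 16-bit WAV of silence so the sketch runs without a file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(bytes(int(seconds * rate) * 2))
    return buffer.getvalue()


info = describe_wav(make_silent_wav())
print(info)
```

With the clip loaded above, `describe_wav(audio_data)` would report the same fields for the sample file. The `wave` module only reads uncompressed PCM, so a `wave.Error` on open is itself a hint that the file uses a different encoding.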

When an STT service processes audio, it generates TranscriptionFrame objects that contain the transcribed text.  
These frames have a .text property that contains the actual transcription:
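
The pattern of reading `.text` from each frame can be sketched without a live service. The example below uses a stand-in class and a fake async generator, both hypothetical and written only for this illustration; they are not part of pipecat or nvidia-pipecat, and a real service would yield its own frame types.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional


@dataclass
class StandInTranscriptionFrame:
    """Hypothetical stand-in exposing only the .text property."""
    text: str


async def fake_stt_frames() -> AsyncIterator[Optional[StandInTranscriptionFrame]]:
    """Mimic a service that yields frames, with None where nothing was produced."""
    items = (
        StandInTranscriptionFrame("hello"),
        None,
        StandInTranscriptionFrame("world"),
    )
    for item in items:
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield item


async def collect_text(
    frames: AsyncIterator[Optional[StandInTranscriptionFrame]],
) -> str:
    """Join the .text of every non-empty frame into one transcript."""
    parts: List[str] = []
    async for frame in frames:
        if frame is not None and frame.text:
            parts.append(frame.text)
    return " ".join(parts)
```

In a notebook cell this can be run with `await collect_text(fake_stt_frames())`; in a plain script, wrap the call in `asyncio.run(...)`.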

In [9]:
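async def main():
    # NOTE: the pipeline and task are built here but never started by a
    # PipelineRunner, and run_stt() is then called on the service directly.
    # This is the likely cause of the "TaskManager is still not initialized"
    # exception shown in the output.
    pipeline = Pipeline([stt])
    task = PipelineTask(pipeline)

    # Print every frame the service yields
    async for frame in stt.run_stt(audio_data):
        if frame is not None:
            print(frame)


if __name__ == "__main__":
    nest_asyncio.apply()  # patch asyncio to allow nested event loops in the notebook
    await main()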

2025-05-14 15:03:13.669 | DEBUG    | pipecat.processors.frame_processor:link:177 - Linking PipelineSource#0 -> RivaASRService#1
2025-05-14 15:03:13.671 | DEBUG    | pipecat.processors.frame_processor:link:177 - Linking RivaASRService#1 -> PipelineSink#0
2025-05-14 15:03:13.672 | DEBUG    | pipecat.processors.frame_processor:link:177 - Linking PipelineTaskSource#0 -> Pipeline#0
2025-05-14 15:03:13.672 | DEBUG    | pipecat.processors.frame_processor:link:177 - Linking Pipeline#0 -> PipelineTaskSink#0


Exception: RivaASRService#1 TaskManager is still not initialized.