![Ambient Provider](./assets/ambientprovider.png)

# Ambient Provider Getting Started Guide

# Table of Contents

- [Important: Git Submodule Setup](#important-git-submodule-setup)
- [Summary](#summary)
- [Prerequisite Setup](#prerequisite-setup)
  - [Prerequisites at a Glance](#prerequisites-at-a-glance)
  - [Key Requirements](#key-requirements)
  - [NGC Account](#ngc-account)
  - [Prepare Your Machine](#prepare-your-machine)
- [Quick Start](#quick-start)
- [Using the Platform](#using-the-platform)
  - [Basic Workflow](#basic-workflow)
  - [How to Convert Between Streaming and Offline Transcription](#how-to-convert-between-streaming-and-offline-transcription)
  - [Use hosted NIM](#use-hosted-nim)
---

# Important: Git Submodule Setup

⚠️⚠️  **Before proceeding, make sure to pull the git submodule first:**  ⚠️ ⚠️ 

In [None]:
#Navigate to the ambient-healthcare-agents directory
%cd ~/ambient-healthcare-agents/

In [None]:
!git submodule update --init --recursive

If you run these commands directly in the Jupyter notebook, enable scrolling for cell outputs so that long outputs stay readable.

# Summary
Ambient Provider is a comprehensive platform that converts audio recordings of medical consultations into structured clinical notes. The system uses NVIDIA NIM (NVIDIA Inference Microservices) for accurate speech recognition with speaker diarization, combined with reasoning large language models to generate medical documentation.

# Prerequisite Setup
There are two key components to this transcription workflow:
1) The NVIDIA NIM ASR transcription services with diarization (Parakeet model)
2) The [llama-3.3-nemotron-super-49b-v1](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1) reasoning model. 

This getting started guide helps you set up the hardware and API keys needed to run the Ambient Provider developer example.

### Prerequisites at a Glance
The steps in this getting started guide are:
- **Set up NGC account**: Create an account to download resources
- **Ensure valid HW**: Confirm your system has the required HW
- **Install Docker & NVIDIA Container Toolkit**: Enable GPU support for containers
- **Deploy NIM**: Launch NVIDIA NIM for ASR and diarization
- **Install SW**: Clone the repository and set up the environment

### Key Requirements
1. **Hardware Requirements**:
   - NVIDIA GPU with 16GB+ VRAM for NVIDIA Riva (e.g., NVIDIA RTX, T4, L4)

2. **Software Requirements**:
   - Docker & Docker Compose v2.0+
   - Git (for cloning repository)
   - npm (Node Package Manager, with Node.js >= 20)

3. **API Keys**:
   - NVIDIA API Key (from NGC)
   - Network access to NVIDIA Riva deployment


### NGC Account
Set up an account on [NGC](https://ngc.nvidia.com) using the procedure in the [NGC user guide](https://docs.nvidia.com/ngc/gpu-cloud/ngc-user-guide/index.html). 

The account is needed to obtain the NGC API key required to pull the [Riva Speech Skills SDK](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/containers/riva-speech) container and to access the cloud API endpoints.

Once your NGC credentials and cloud account are set up, follow the steps below to obtain the NGC_API_KEY needed to access the [NVIDIA API catalog endpoints](https://build.nvidia.com) (AI Models, etc.). 

### Generate an API Key
To access NGC resources, you need an NGC API key:

1. Visit [NGC Personal Key Generation](https://org.ngc.nvidia.com/setup/personal-keys)
2. Create a new API key
3. Ensure "NGC Catalog" is selected from the "Services Included" dropdown
4. Copy the generated API key

### Export the API Key
Make the NGC API key available to Docker:


In [None]:
import os
os.environ["NGC_API_KEY"] = "<your-api-key>"
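
Hard-coding the key in a cell risks saving it with the notebook. As a minimal sketch, assuming you would rather validate the value before exporting it, the hypothetical helper `set_ngc_api_key` below (not part of this repository) rejects empty input and the `<your-api-key>` placeholder:

```python
import os

PLACEHOLDER = "<your-api-key>"  # the placeholder used in the cell above


def set_ngc_api_key(key, env=os.environ):
    """Store a non-empty, non-placeholder key in env["NGC_API_KEY"]."""
    key = key.strip()
    if not key or key == PLACEHOLDER:
        raise ValueError("NGC_API_KEY is empty or still the placeholder")
    env["NGC_API_KEY"] = key
    return key
```

In a notebook you could call `set_ngc_api_key(getpass.getpass("NGC API key: "))` after `import getpass`, so the key is typed at a prompt and never appears in the saved cell.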

## Prepare Your Machine

### Docker
Install [Docker](https://docs.docker.com/engine/install/) on your system. Check to ensure the installation worked:

In [None]:
!docker -v

### NVIDIA Container Toolkit
Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-the-nvidia-container-toolkit) to enable GPU support in Docker containers.

After installing the toolkit, follow the instructions in the Configure Docker section in the NVIDIA Container Toolkit [documentation.](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-docker)

### Verify Installation
Test your setup with the following command:


In [None]:
!docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

This should produce output similar to:
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          On  |   00000000:1B:00.0 Off |                    0 |
| N/A   36C    P0            112W /  700W |   78489MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
```

### Docker Login to NGC
Authenticate with the NVIDIA Container Registry:


In [None]:
!echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

### Checking Docker Storage Location

Before continuing, check the Docker storage location on the Brev instance. Self-hosting the NIMs requires 325 GB of disk space for Docker artifacts, so Docker's data root must point to a location with enough free space.


In [None]:
# view the disk space of the Brev instance you are using,
# you should see a partition /ephemeral with enough space (more than 325 GB)
!df -h
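
The same check can be done from Python. The sketch below uses only the standard library; the helper names `free_gb` and `has_enough_space` are illustrative, and the 325 GB threshold is the figure quoted in this guide:

```python
import shutil

REQUIRED_GB = 325  # disk space this guide says the self-hosted NIMs need


def free_gb(path):
    """Return free space on the filesystem containing `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path, required_gb=REQUIRED_GB):
    """True if the filesystem containing `path` has at least `required_gb` free."""
    return free_gb(path) >= required_gb
```

For example, `has_enough_space("/ephemeral")` reports whether the partition used later in this guide is large enough. Note that `df -h` prints sizes in powers of 1024, so its numbers differ slightly from the decimal GB used here.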

In [None]:
# next view the content of the docker service file
!cat /etc/docker/daemon.json

If `"data-root"` is not set, or points to a partition without enough disk space, edit `/etc/docker/daemon.json` so that it contains `"data-root": "/ephemeral"`:
```
{
 ...,
 "data-root": "/ephemeral"
}
```

Open the terminal, and run 
```
sudo nano /etc/docker/daemon.json 
```

to open the file for editing.

For example, the edited file might look like this:
```
{
    "default-runtime": "nvidia",
    "mtu": 1500,
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "data-root": "/ephemeral"
}
```
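
If you prefer not to edit the file by hand, the change can be expressed as a small JSON merge. This is a sketch only: `with_data_root` is a hypothetical helper, and writing the result back to `/etc/docker/daemon.json` still requires root privileges.

```python
import json


def with_data_root(config_text, data_root="/ephemeral"):
    """Return daemon.json text with "data-root" set, keeping all other keys."""
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(config_text) if config_text.strip() else {}
    config["data-root"] = data_root
    return json.dumps(config, indent=4) + "\n"
```

One way to use it is to read the current file, pass its text through `with_data_root`, review the output, and then write it back from a terminal with `sudo`.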

In [None]:
# view the content of the docker service file to make sure it has the correct setting
!cat /etc/docker/daemon.json

In [None]:
# then restart the docker service
!sudo systemctl restart docker

In [None]:
# ensure the new Docker data root is readable and traversable (mode rwxr-xr-x)
!sudo chmod 755 /ephemeral/

# Quick Start

## 1. Setup the virtual environment


If the uv package manager is not installed, please install with the command below:

In [None]:
!curl -LsSf https://astral.sh/uv/install.sh | sh #sudo snap install uv (if within terminal)
import os
os.environ["PATH"] = os.path.expanduser("~/.local/bin") + ":" + os.environ["PATH"]
!uv --version

In [None]:
!sudo apt-get update
!sudo apt-get install -y portaudio19-dev

In [None]:
%cd ambient-provider

In [None]:
!uv python install 3.13
!uv python pin 3.13

!uv venv --clear
!uv sync

In [None]:
%pwd
%cd ambient-scribe

## 2. Download the medical conversation dataset
The example dataset used in this workflow is available from [Hugging Face](https://huggingface.co/datasets/yfyeung/medical) and consists of simulated patient-physician interactions. Download the dataset onto the machine that you will use to view the UI. You do not need to add it to the repository. 

For example, if you host this developer example on Brev but access the UI from a PC, download the dataset directly to that PC. 

Be sure to extract `audio.tar.gz` from the dataset, for example with `tar -xzvf audio.tar.gz`, to obtain the `audio` folder.

Later, once the UI is deployed, you will be able to select files from this folder from within the UI. 


## 3. Bootstrap the environment


The following command:
- Creates necessary directories
- Sets up environment files
- Validates dependencies

> **Note:** After running make bootstrap, you may notice a WARNING to fill in the .env file located at apps/api/.env. This file is created during the make bootstrap command. The WARNING indicates you must fill in the parameters as specified below before proceeding to the make dev-nim command. 

In [None]:
# 1. Check Node.js version (must be >= 20)
!node --version

In [None]:
# 2. If Node.js is missing or older than 20, install Node.js 20 from NodeSource
!curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
!sudo apt-get install -y nodejs
# Optional: if nvm is already installed, select Node.js 20 (only affects the subshell that runs this line)
!export NVM_DIR="$HOME/.nvm" && [ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh" && nvm install 20 && nvm use 20

In [None]:
# 3. Check node and npm availability
!node --version
!npm --version
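
To make the version requirement explicit rather than reading it off the output, the check can be scripted. The sketch below assumes `node --version` prints a string such as `v20.11.1`; the helper names are illustrative.

```python
import re

MIN_NODE_MAJOR = 20  # minimum major version required by this guide


def node_major(version_output):
    """Extract the major version from `node --version` output such as 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version: {version_output!r}")
    return int(match.group(1))


def node_is_supported(version_output):
    """True if the reported Node.js major version meets the minimum."""
    return node_major(version_output) >= MIN_NODE_MAJOR
```

The version string can be obtained with `subprocess.run(["node", "--version"], capture_output=True, text=True).stdout` and passed to `node_is_supported`.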

In [None]:
# This command may take a few minutes to setup the npm packages
!make bootstrap

## 4. Deploy Riva and the Ambient Provider

You have two options for deploying the NVIDIA NIM:

### Option 1: Riva Integrated with Docker Compose (Recommended)
The easiest way is to use the built-in NIM profile that is integrated with the application:

#### Configure environment variables:


If you do not see the hidden `.env` file in JupyterLab, edit the file from a terminal session instead. 

In [None]:
# Edit the API configuration
# nano apps/api/.env

# Add the following configuration:
# NVIDIA API Configuration (Required)
# NVIDIA_API_KEY=your_nvidia_api_key_here
# RIVA_URI=parakeet-nim:50051
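
As an alternative to editing `apps/api/.env` in `nano`, the values can be set programmatically. The following is a minimal sketch, assuming a plain `KEY=value` file without quoting or `export` prefixes; `upsert_env` is a hypothetical helper, not part of the repository.

```python
def upsert_env(text, updates):
    """Return .env text with each KEY=value in `updates` replaced or appended."""
    remaining = dict(updates)
    lines = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            # Replace the first assignment of a key that needs updating.
            lines.append(f"{key}={remaining.pop(key)}")
        else:
            # Keep comments, blank lines, and unrelated settings untouched.
            lines.append(line)
    # Append any keys that were not present in the file.
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"
```

For example, read `apps/api/.env`, call `upsert_env(text, {"NVIDIA_API_KEY": key, "RIVA_URI": "parakeet-nim:50051"})`, and write the result back to the same path.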

#### Deploy the dev environment:


In [None]:
# Development with local NIMs. The NIMs take 5-10 minutes to start up
!make dev-nim

In [None]:
# Health check for the Riva and LLM NIMs
import requests


def check(name, url):
    """Report whether the service at `url` answers HTTP requests."""
    try:
        resp = requests.get(url, timeout=5)
        # Any HTTP response means the server is up; show the status code for detail
        print(f"{name} reachable (HTTP {resp.status_code})")
    except requests.RequestException as e:
        print(f"{name} not ready: {e}")


check("LLM", "http://localhost:8001/v1/")
check("Riva", "http://localhost:9000/v1/health")
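
Because the NIMs can take several minutes to come up, a single check may report a service as not ready even though it is still starting. A polling loop is one way to handle this. The sketch below is generic: `wait_until_ready` is a hypothetical helper, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.

```python
import time


def wait_until_ready(check, timeout_s=600, interval_s=10,
                     sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout_s elapses.

    Returns True if the check succeeded, False if the deadline passed first.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `check` argument can be any zero-argument callable that returns `True` when the service responds, for example a function that wraps `requests.get(url, timeout=5)` in a `try`/`except requests.RequestException` block and returns `False` on failure.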

In [None]:
!make dev

### Option 2: Standalone NIM Deployment
Deploy the Parakeet 1.1b English ASR model manually with speaker diarization support.

This option is intended mainly for deploying the Riva container on a different machine from the Ambient Provider. If you instead run Riva on the same machine but outside the Ambient Provider Docker network, firewall rules may prevent the two applications from communicating.

#### On the separate machine:


In [None]:
# Set container configuration
export CONTAINER_ID=parakeet-1-1b-ctc-en-us
export NIM_TAGS_SELECTOR="name=parakeet-1-1b-ctc-en-us,mode=all"

# Launch the NIM container
docker run -d --rm --name=$CONTAINER_ID \
   --runtime=nvidia \
   --gpus '"device=0"' \
   --shm-size=8GB \
   -e NGC_API_KEY \
   -e NIM_HTTP_API_PORT=9000 \
   -e NIM_GRPC_API_PORT=50051 \
   -p 9000:9000 \
   -p 50051:50051 \
   -e NIM_TAGS_SELECTOR \
   nvcr.io/nim/nvidia/$CONTAINER_ID:latest

If you are also self-hosting the LLM reasoning NIM, follow the [deployment documentation](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1/deploy).

For Option 2, configure your environment to point to your separate Riva machine:


In [None]:
# Edit the API configuration
# nano apps/api/.env

# Add the following configuration:
# NVIDIA API Configuration (Required)  
# NVIDIA_API_KEY=your_nvidia_api_key_here
# RIVA_URI=<YOUR_RIVA_IP>:50051

In [None]:
# Then deploy without local NIM
# make dev

#### Verify NIM Deployment
Check that the NIM container is running:


In [None]:
docker ps | grep parakeet

You should see output similar to:
```
a1b2c3d4e5f6   nvcr.io/nim/nvidia/parakeet-1-1b-ctc-en-us:latest   "/opt/nvidia/nvidia_…"   2 minutes ago   Up 2 minutes   0.0.0.0:9000->9000/tcp, 0.0.0.0:50051->50051/tcp   parakeet-1-1b-ctc-en-us
```

The NIM will be accessible at:
- **HTTP API**: http://localhost:9000
- **gRPC API**: localhost:50051
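
Since firewall rules are the main risk with a separate Riva machine, it can help to confirm from the Ambient Provider machine that the ports are reachable over the network. The sketch below uses only the standard library; `port_open` is an illustrative helper, and a successful connection shows only that the port accepts TCP connections, not that the models have finished loading.

```python
import socket


def port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False
```

For example, `port_open("<YOUR_RIVA_IP>", 50051)` with your Riva machine's address checks the gRPC port, and port 9000 can be checked the same way for the HTTP API.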

> **Note:** After starting the NIM container, check the container logs to ensure you see a message indicating that Riva is running and listening on port 9000. If you do not see this message, the Riva NIM may still be starting up. You can view the logs with:
>
> ```bash
> docker logs -f $CONTAINER_ID
> ```
>
> Wait until you see confirmation that the service is running on port 9000 before proceeding.

## 5. Access the applications
If you are using Brev, follow step 6 to either expose the port as a secure link or create an ngrok tunnel. 

- **UI**: http://localhost:5173
- **API Documentation**: http://localhost:8000/api/docs
- **Health Check**: http://localhost:8000/api/health


## 6. Enable port access
- **Brev**: If your cloud service provider lets you expose ports through its UI, as Brev does, expose TCP/UDP traffic on port 5173 for this quick start guide. 
- **ngrok**: If you cannot expose ports directly, you can use [ngrok](https://ngrok.com/) to create a secure tunnel to your local development environment.

### Using ngrok for remote access


In [None]:
# Install ngrok (if not already installed)
sudo snap install ngrok

# Add your ngrok authtoken (get from ngrok.com dashboard)
ngrok config add-authtoken <YOUR_NGROK_AUTHTOKEN>

# Expose your local port (e.g., 5173 for the UI)
ngrok http 5173

# Using the Platform

## Basic Workflow

1. **Upload Audio File**:
   - Drag and drop an audio file (MP3, WAV, M4A, FLAC)
   - Supported formats are automatically validated
   - Maximum file size: 100MB (configurable)

2. **Transcription Process**:
   - Audio is converted to 16kHz mono WAV format
   - NVIDIA Riva processes with speaker diarization
   - Transcript segments are created with timestamps and speaker tags

3. **Select Note Template**:
   - Choose from available templates:
     - **SOAP Default**: Standard Subjective, Objective, Assessment, Plan format
     - **Progress Note**: For follow-up visits
     - **Custom templates**: Created by your organization

4. **Generate Medical Note**:
   - AI processes the transcript using the selected template
   - Real-time progress is shown with processing traces
   - Note sections are generated and displayed incrementally

5. **Edit and Refine**:
   - Use the rich text editor to modify content
   - Citations automatically link note content to transcript segments
   - Autocomplete suggests content from the transcript

6. **Export and Save**:
   - Copy note to clipboard
   - Save for future reference
   - Export in various formats
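
The upload checks in step 1 and the conversion in step 2 can be illustrated with a short sketch. This is not the platform's actual implementation, which is not shown in this guide: the helper names are hypothetical, the 100MB limit is interpreted here as 100 MiB, and the use of `ffmpeg` for resampling is an assumption.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}  # formats listed in step 1
MAX_BYTES = 100 * 1024 * 1024  # default 100MB limit, read here as MiB


def is_acceptable_upload(filename, size_bytes, max_bytes=MAX_BYTES):
    """Check the extension and size limits described in step 1."""
    has_allowed_extension = Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
    return has_allowed_extension and 0 < size_bytes <= max_bytes


def ffmpeg_to_16k_mono(src, dst):
    """Build an ffmpeg command that resamples `src` to a 16 kHz mono WAV at `dst`."""
    # -y overwrites dst, -ar sets the sample rate, -ac sets the channel count.
    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]
```

The command list returned by `ffmpeg_to_16k_mono` can be passed to `subprocess.run(..., check=True)` if `ffmpeg` is installed on the machine.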


## How to Convert Between Streaming and Offline Transcription
To switch between streaming and offline transcription modes, you need to update both the frontend and backend environment configuration files:

1. **Frontend**:  
   - Go into your frontend environment file (e.g., `apps/ui/.env`).
   - Find the setting that enables streaming (e.g., `VITE_ENABLE_STREAMING=true`) and change it to `false`:
     ```
     VITE_ENABLE_STREAMING=false
     ```

2. **Backend**:  
   - Open your backend environment file (e.g., `apps/api/.env`).
   - Change `ENABLE_STREAMING=true` to `ENABLE_STREAMING=false`.
   - Update the Riva model name to use the offline model by replacing the word `streaming` with `offline` in the `RIVA_MODEL` variable. For example:
     ```
     ENABLE_STREAMING=false
     RIVA_MODEL=parakeet-1.1b-en-US-asr-offline-silero-vad-sortformer
     ```
   - Make sure to restart both the frontend and backend services after making these changes. If you have a dev deployment the system will restart automatically.

3. **Reload**:

In [None]:
!make down

In [None]:
!make dev
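
The backend values from step 2 follow a simple rule: `ENABLE_STREAMING` flips, and the word `streaming` or `offline` in `RIVA_MODEL` is swapped to match. A minimal sketch of that rule is shown below; `set_transcription_mode` is a hypothetical helper, and the streaming model name in the usage note is inferred from the offline name above rather than confirmed.

```python
def set_transcription_mode(model_name, streaming):
    """Return (ENABLE_STREAMING, RIVA_MODEL) values for the requested mode."""
    # Swap the mode word in the model name, leaving the rest unchanged.
    old, new = ("offline", "streaming") if streaming else ("streaming", "offline")
    return ("true" if streaming else "false", model_name.replace(old, new))
```

For example, calling it with `"parakeet-1.1b-en-US-asr-streaming-silero-vad-sortformer"` and `streaming=False` yields the `ENABLE_STREAMING=false` and `RIVA_MODEL` values listed in step 2.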

## Use hosted NIM
To use the hosted NIM (NVIDIA Inference Microservice) instead of a self-hosted Riva deployment, you need to update your backend environment configuration:

1. **Set `SELF_HOSTED` to `false`**  
   In your backend `.env` file, change:
   ```
   SELF_HOSTED=false
   ```

2. **Update the Riva Function ID**  
   Replace the `RIVA_FUNCTION_ID` value with the function ID provided by NVIDIA for your hosted NIM instance:
   ```
   RIVA_FUNCTION_ID=your_hosted_nim_function_id_here
   ```

3. **Set the Riva URI to the NVIDIA gRPC URL**  
   Update the `RIVA_URI` to point to the NVIDIA hosted endpoint, for example:
   ```
   RIVA_URI=grpc.nvcf.nvidia.com:443
   ```

Make sure to restart your backend service after making these changes for them to take effect.


In [None]:
!make down

In [None]:
!make dev