# Deploy NVIDIA Parakeet TDT 0.6B v2 ASR from AWS Marketplace


NVIDIA NIM, a component of NVIDIA AI Enterprise, provides state-of-the-art Automatic Speech Recognition (ASR) capabilities through the Parakeet TDT 0.6B v2 model. This model delivers high-quality speech-to-text conversion with efficient processing and support for real-time inference. Whether you're developing voice assistants, transcription services, or any application that needs to convert speech to text, NVIDIA NIM for ASR has you covered.


This example shows how to deploy the NVIDIA Parakeet TDT 0.6B v2 ASR model from AWS Marketplace to an Amazon SageMaker endpoint.

NVIDIA Parakeet TDT 0.6B v2 is an Automatic Speech Recognition (ASR) model optimized for efficient processing and real-time inference. This model provides high-quality speech-to-text conversion with excellent accuracy and low latency. The model is designed to handle various audio formats and provides robust transcription capabilities for voice applications.

Key Features:
- **High Accuracy**: State-of-the-art speech recognition performance
- **Real-time Processing**: Optimized for low-latency inference
- **Efficient**: 0.6B parameter model for fast processing
- **GPU Optimized**: Supports multiple GPU architectures (L40S, A100, H100)

The model supports English language processing and is optimized for various audio quality conditions.


Please check out the [NIM ASR docs](https://docs.nvidia.com/nim/automatic-speech-recognition/latest/introduction.html) for more information.


## ⚠️ Important Notes

This ASR model supports **automatic routing** between HTTP and gRPC protocols based on audio file size and requirements:

- **Small files (< 4MB)**: Automatically routed through HTTP for fast processing
- **Large files (≥ 4MB)**: Automatically routed through gRPC for efficient streaming
- **Force HTTP**: Use header `X-Amzn-SageMaker-Custom-Attributes: /invocations/http` or POST to `/invocations/http`
- **Force gRPC**: Use header `X-Amzn-SageMaker-Custom-Attributes: /invocations/grpc` or POST to `/invocations/grpc`

The gRPC mode enables advanced features like speaker diarization and word-level timestamps.
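
The routing rules above can be mirrored on the client side to predict which path a given file will take. The sketch below is illustrative only: the endpoint performs the actual routing, `choose_route` is a hypothetical helper name, and the 4 MB threshold is assumed here to mean 4 × 1024 × 1024 bytes.

```python
from typing import Optional

# Assumption: "4MB" in the notes above means 4 * 1024 * 1024 bytes.
HTTP_SIZE_LIMIT_BYTES = 4 * 1024 * 1024


def choose_route(file_size_bytes: int, force: Optional[str] = None) -> str:
    """Return the route the notes above describe for a file of this size.

    `force` may be "http" or "grpc" to mimic the forced-route options.
    """
    if force is not None:
        if force not in ("http", "grpc"):
            raise ValueError(f"force must be 'http' or 'grpc', got {force!r}")
        return f"/invocations/{force}"
    if file_size_bytes < HTTP_SIZE_LIMIT_BYTES:
        return "/invocations/http"
    return "/invocations/grpc"


# A 237,964-byte file is well under the threshold.
print(choose_route(237_964))
# A file of exactly 4 MiB falls on the gRPC side of the ">= 4MB" rule.
print(choose_route(4 * 1024 * 1024))
```

The string returned for a forced route is the same value that would be passed as `CustomAttributes` to `invoke_endpoint`.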


## Prerequisites
1. **Note**: This notebook contains elements that render correctly only in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role you use has the **AmazonSageMakerFullAccess** policy attached.
1. To deploy this ML model successfully, ensure that:
    1. Either your IAM role has the following three permissions and you are authorized to make AWS Marketplace subscriptions in the AWS account used:
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to the Parakeet TDT 0.6B v2 ASR model package.
1. **Audio Requirements**: The model accepts audio files in common formats (WAV, MP3, FLAC, etc.) and supports various sample rates.
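
Before sending audio, it can help to confirm what a file actually contains. The following is a minimal sketch that uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files only; MP3 or FLAC would need a third-party decoder. The helper names `describe_wav` and `make_silent_wav` are hypothetical and are not part of the NIM or SageMaker APIs.

```python
import io
import wave


def describe_wav(source) -> dict:
    """Read basic properties from a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate if rate else 0.0,
        }


def make_silent_wav(seconds: float = 1.0, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory, handy as a smoke-test input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


# Inspect an in-memory file; pass a path such as "data/test.wav" for a real one.
info = describe_wav(io.BytesIO(make_silent_wav(seconds=0.5)))
print(info)
```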


## Subscribe to the model package
To subscribe to the model package:
1. Open the model package listing page
1. On the AWS Marketplace listing, click **Continue to subscribe**.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
1. Click **Continue to configuration** and choose a **region**. A **Product ARN** is then displayed. This is the model package ARN that you need to specify when creating a deployable model. Copy the ARN for your region and use it in the following cell.
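
A copied Product ARN is easy to truncate or paste with the wrong region. The sketch below checks its shape before it is used. The regular expression is an assumption based on the ARNs that appear in this notebook (standard `aws` partition, 12-digit account ID, unversioned package name), and `parse_model_package_arn` is a hypothetical helper, not an AWS API.

```python
import re

# Assumed shape: arn:aws:sagemaker:<region>:<12-digit account>:model-package/<name>
_MODEL_PACKAGE_ARN = re.compile(
    r"^arn:aws:sagemaker:(?P<region>[a-z0-9-]+):(?P<account>\d{12}):"
    r"model-package/(?P<name>[A-Za-z0-9-]+)$"
)


def parse_model_package_arn(arn: str) -> dict:
    """Split a model package ARN into region, account, and package name."""
    match = _MODEL_PACKAGE_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"Not a recognizable model package ARN: {arn!r}")
    return match.groupdict()


parts = parse_model_package_arn(
    "arn:aws:sagemaker:us-east-1:865070037744:model-package/"
    "nvidia-parakeet-tdt-0-6b-v2-02d6240216f539a185542e8e7f41706d"
)
print(parts["region"], parts["name"])
```

Comparing `parts["region"]` with the region of the boto3 session catches a common mistake: copying the ARN for a different region than the one the endpoint will run in.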


In [25]:
import boto3, json, sagemaker, time, os, uuid  # uuid is used for multipart boundaries in the inference cells
from sagemaker import get_execution_role, ModelPackage
from botocore.config import Config

# Allow long-running inference calls before the client gives up waiting for a response
config = Config(read_timeout=3600)
sess = boto3.Session()
sm = sess.client("sagemaker")
sagemaker_session = sagemaker.Session(boto_session=sess)
role = get_execution_role()
region = sess.region_name
# Runtime client used by the inference cells below
sm_runtime = boto3.client("sagemaker-runtime", config=config)



In [26]:
# Replace the package name below with the one from the Product ARN of the model package you want to deploy
nim_package = "nvidia-parakeet-tdt-0-6b-v2-02d6240216f539a185542e8e7f41706d"

# Mapping for Model Packages
model_package_map = {
    "us-east-1": f"arn:aws:sagemaker:us-east-1:865070037744:model-package/{nim_package}",
    "us-east-2": f"arn:aws:sagemaker:us-east-2:057799348421:model-package/{nim_package}",
    "us-west-1": f"arn:aws:sagemaker:us-west-1:382657785993:model-package/{nim_package}",
    "us-west-2": f"arn:aws:sagemaker:us-west-2:594846645681:model-package/{nim_package}",
    "ca-central-1": f"arn:aws:sagemaker:ca-central-1:470592106596:model-package/{nim_package}",
    "eu-central-1": f"arn:aws:sagemaker:eu-central-1:446921602837:model-package/{nim_package}",
    "eu-west-1": f"arn:aws:sagemaker:eu-west-1:985815980388:model-package/{nim_package}",
    "eu-west-2": f"arn:aws:sagemaker:eu-west-2:856760150666:model-package/{nim_package}",
    "eu-west-3": f"arn:aws:sagemaker:eu-west-3:843114510376:model-package/{nim_package}",
    "eu-north-1": f"arn:aws:sagemaker:eu-north-1:136758871317:model-package/{nim_package}",
    "ap-southeast-1": f"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/{nim_package}",
    "ap-southeast-2": f"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/{nim_package}",
    "ap-northeast-2": f"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/{nim_package}",
    "ap-northeast-1": f"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/{nim_package}",
    "ap-south-1": f"arn:aws:sagemaker:ap-south-1:077584701553:model-package/{nim_package}",
    "sa-east-1": f"arn:aws:sagemaker:sa-east-1:270155090741:model-package/{nim_package}",
}

region = boto3.Session().region_name
if region not in model_package_map:
    raise ValueError(f"Current boto3 session region {region} is not supported.")

model_package_arn = model_package_map[region]
model_package_arn


'arn:aws:sagemaker:us-east-1:865070037744:model-package/nvidia-parakeet-tdt-0-6b-v2-02d6240216f539a185542e8e7f41706d'

## Create the SageMaker Endpoint

First, define a SageMaker model using the specified model package ARN.


In [27]:
# Define the model details
sm_model_name = "nvidia-parakeet-tdt-0-6b-v2-asr"

# Create the SageMaker model
create_model_response = sm.create_model(
    ModelName=sm_model_name,
    PrimaryContainer={
        'ModelPackageName': model_package_arn
    },
    ExecutionRoleArn=role,
    EnableNetworkIsolation=True
)
print("Model Arn: " + create_model_response["ModelArn"])


Model Arn: arn:aws:sagemaker:us-east-1:492681118881:model/nvidia-parakeet-tdt-0-6b-v2-asr


Next, create an endpoint configuration that specifies the instance type.


In [30]:
# Create the endpoint configuration
endpoint_config_name = sm_model_name

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            'VariantName': 'AllTraffic',
            'ModelName': sm_model_name,
            'InitialInstanceCount': 1,
            'InstanceType': 'ml.g6e.2xlarge', 
            'InferenceAmiVersion': 'al2-ami-sagemaker-inference-gpu-2',
            'RoutingConfig': {'RoutingStrategy': 'LEAST_OUTSTANDING_REQUESTS'},
            'ModelDataDownloadTimeoutInSeconds': 3600, # Specify the model download timeout in seconds.
            'ContainerStartupHealthCheckTimeoutInSeconds': 3600, # Specify the health checkup timeout in seconds
        }
    ]
)
print("Endpoint Config Arn: " + create_endpoint_config_response["EndpointConfigArn"])


Endpoint Config Arn: arn:aws:sagemaker:us-east-1:492681118881:endpoint-config/nvidia-parakeet-tdt-0-6b-v2-asr


Using the endpoint configuration above, create a new SageMaker endpoint and wait for the deployment to finish. The status changes to InService once the deployment succeeds.


In [31]:
# Create the endpoint
endpoint_name = endpoint_config_name
create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)

print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])


Endpoint Arn: arn:aws:sagemaker:us-east-1:492681118881:endpoint/nvidia-parakeet-tdt-0-6b-v2-asr


In [36]:
resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

while status == "Creating":
    time.sleep(60)
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)


Status: InService
Arn: arn:aws:sagemaker:us-east-1:492681118881:endpoint/nvidia-parakeet-tdt-0-6b-v2-asr
Status: InService
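
The polling loop above exits as soon as the status is no longer `Creating`, so a failed deployment ends the loop silently. The sketch below is one way to make that case explicit, with a timeout added. `wait_for_endpoint` is a hypothetical helper; it takes the describe call as a function so it can be exercised without AWS access. boto3 also provides a built-in `endpoint_in_service` waiter, which may be the simpler choice in practice.

```python
import time
from typing import Callable


def wait_for_endpoint(
    describe: Callable[[], dict],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the endpoint is InService; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        resp = describe()
        status = resp["EndpointStatus"]
        if status == "InService":
            return resp
        if status != "Creating":
            # Any other state (for example "Failed") means deployment did not succeed.
            reason = resp.get("FailureReason", "no failure reason reported")
            raise RuntimeError(f"Endpoint ended in status {status}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint still Creating after {timeout_seconds} s")
        sleep(poll_seconds)


# In the notebook this would be called as:
# resp = wait_for_endpoint(lambda: sm.describe_endpoint(EndpointName=endpoint_name))
```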


## Run Inference

### Test Endpoint with Different Protocols

Our NIM ASR endpoint supports multiple inference methods:

1. **Auto-routing** (`/invocations`): Automatically chooses HTTP or gRPC based on file size
2. **Force HTTP** (`X-Amzn-SageMaker-Custom-Attributes: /invocations/http`): Direct HTTP route

#### Test 1: Auto-routing (Recommended)


In [37]:
def test_asr_endpoint_autorouting(audio_file_path, endpoint_name):
    """Test endpoint with auto-routing"""
    print(f"Testing auto-routing with {audio_file_path}")
    
    try:
        # Read audio file
        with open(audio_file_path, 'rb') as f:
            audio_data = f.read()
        
        file_size = len(audio_data)
        print(f"Audio file size: {file_size:,} bytes ({file_size / (1024*1024):.2f} MB)")
        
        # Create multipart form data
        boundary = f"----WebKitFormBoundary{uuid.uuid4().hex}"
        
        # Build multipart payload
        parts = []
        parts.append(f'--{boundary}')
        parts.append('Content-Disposition: form-data; name="file"; filename="test.wav"')
        parts.append('Content-Type: audio/wav')
        parts.append('')
        
        # Join text parts
        text_part = '\r\n'.join(parts) + '\r\n'
        language_part = f'\r\n--{boundary}\r\nContent-Disposition: form-data; name="language_code"\r\n\r\nen-US\r\n--{boundary}--'
        
        # Combine all parts
        payload = text_part.encode() + audio_data + language_part.encode()
        
        # Invoke endpoint
        response = sm_runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType=f'multipart/form-data; boundary={boundary}',
            Body=payload
        )
        
        # Parse response
        result = json.loads(response['Body'].read().decode())
        print("result: \n")
        print(result)
        print(f"\nAuto-routing inference successful!")
        print(f"Response: {json.dumps(result, indent=2)}")
        return result
        
    except Exception as e:
        print(f"Auto-routing test failed: {e}")
        return None 

# Run auto-routing test
# Example usage (replace with your audio file path)
audio_file = "data/test.wav"
test_asr_endpoint_autorouting(audio_file, endpoint_name)


Testing auto-routing with data/test.wav
Audio file size: 237,964 bytes (0.23 MB)
result: 

{'text': "Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait. "}

Auto-routing inference successful!
Response: {
  "text": "Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait. "
}


{'text': "Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait. "}
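
Both test functions in this notebook assemble the same multipart body by hand. As a sketch of how that step could be factored out, the hypothetical helper `build_multipart` below produces the same byte layout as the cell above (file part first, then text fields, then the closing boundary) and returns the matching `Content-Type` value. The boundary prefix is arbitrary.

```python
import uuid
from typing import Dict, Tuple


def build_multipart(
    audio_data: bytes,
    fields: Dict[str, str],
    filename: str = "test.wav",
    content_type: str = "audio/wav",
) -> Tuple[bytes, str]:
    """Return (body, Content-Type header) for a multipart/form-data upload."""
    boundary = f"----FormBoundary{uuid.uuid4().hex}"
    crlf = b"\r\n"
    body = bytearray()

    # File part: headers, blank line, raw audio bytes
    body += f"--{boundary}".encode() + crlf
    body += (
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'
    ).encode() + crlf
    body += f"Content-Type: {content_type}".encode() + crlf + crlf
    body += audio_data + crlf

    # One part per text field, such as language_code
    for name, value in fields.items():
        body += f"--{boundary}".encode() + crlf
        body += f'Content-Disposition: form-data; name="{name}"'.encode() + crlf + crlf
        body += value.encode() + crlf

    # Closing boundary
    body += f"--{boundary}--".encode()
    return bytes(body), f"multipart/form-data; boundary={boundary}"


payload, content_type_header = build_multipart(b"RIFFdata", {"language_code": "en-US"})
print(content_type_header.split(";")[0])
```

The two return values map directly onto the `Body` and `ContentType` arguments of `invoke_endpoint`.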

#### Test 2: Force HTTP Route

Test the HTTP-only route (optimized for files <5MB):


In [38]:
# Test 2: Force HTTP route
def test_endpoint_http_route(audio_file_path):
    """Test endpoint with forced HTTP route"""
    print(f"Testing HTTP route with {audio_file_path}")
    
    try:
        # Read audio file
        with open(audio_file_path, 'rb') as f:
            audio_data = f.read()
        
        # Create multipart form data
        boundary = f"----WebKitFormBoundary{uuid.uuid4().hex}"
        
        # Build multipart payload (same as auto-routing)
        parts = []
        parts.append(f'--{boundary}')
        parts.append('Content-Disposition: form-data; name="file"; filename="test.wav"')
        parts.append('Content-Type: audio/wav')
        parts.append('')
        
        text_part = '\r\n'.join(parts) + '\r\n'
        language_part = f'\r\n--{boundary}\r\nContent-Disposition: form-data; name="language_code"\r\n\r\nen-US\r\n--{boundary}--'
        payload = text_part.encode() + audio_data + language_part.encode()
        
        # Invoke endpoint with HTTP route forced
        response = sm_runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType=f'multipart/form-data; boundary={boundary}',
            Body=payload,
            CustomAttributes='/invocations/http'  # Force HTTP route
        )
        
        result = json.loads(response['Body'].read().decode())
        print(f"\nHTTP route inference successful!")
        print(f"Response: {json.dumps(result, indent=2)}")
        return result
        
    except Exception as e:
        print(f"HTTP route test failed: {e}")
        return None

# Run HTTP route test
audio_file = "data/test.wav"
http_result = test_endpoint_http_route(audio_file)

Testing HTTP route with data/test.wav

HTTP route inference successful!
Response: {
  "text": "Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait. "
}


### Clean up resources

Clean up the SageMaker endpoint and related resources when you're done testing.


In [39]:
# Clean up SageMaker resources
print("Cleaning up SageMaker resources...")

# Delete the endpoint
try:
    sm.delete_endpoint(EndpointName=endpoint_name)
    print(f"✅ Deleted endpoint: {endpoint_name}")
except Exception as e:
    print(f"⚠️ Error deleting endpoint: {e}")

# Delete the endpoint configuration
try:
    sm.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    print(f"✅ Deleted endpoint config: {endpoint_config_name}")
except Exception as e:
    print(f"⚠️ Error deleting endpoint config: {e}")

# Delete the model
try:
    sm.delete_model(ModelName=sm_model_name)
    print(f"✅ Deleted model: {sm_model_name}")
except Exception as e:
    print(f"⚠️ Error deleting model: {e}")

print("Cleanup completed!")


Cleaning up SageMaker resources...
✅ Deleted endpoint: nvidia-parakeet-tdt-0-6b-v2-asr
✅ Deleted endpoint config: nvidia-parakeet-tdt-0-6b-v2-asr
✅ Deleted model: nvidia-parakeet-tdt-0-6b-v2-asr
Cleanup completed!
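
The cleanup cell issues the three delete calls back to back. The sketch below is a variation that waits for the endpoint to be fully deleted before removing the configuration and model, and that returns any errors instead of only printing them. `cleanup_endpoint_resources` is a hypothetical helper. It expects a boto3 SageMaker client and relies on that client's `endpoint_deleted` waiter.

```python
def cleanup_endpoint_resources(sm_client, endpoint_name, endpoint_config_name, model_name):
    """Delete endpoint, wait for it to disappear, then delete config and model.

    Returns a dict mapping each failed step to its error message (empty on success).
    """
    errors = {}

    try:
        sm_client.delete_endpoint(EndpointName=endpoint_name)
        # Block until SageMaker reports that the endpoint no longer exists.
        sm_client.get_waiter("endpoint_deleted").wait(EndpointName=endpoint_name)
    except Exception as exc:
        errors["endpoint"] = str(exc)

    remaining_steps = [
        ("endpoint_config", sm_client.delete_endpoint_config,
         {"EndpointConfigName": endpoint_config_name}),
        ("model", sm_client.delete_model, {"ModelName": model_name}),
    ]
    for label, call, kwargs in remaining_steps:
        try:
            call(**kwargs)
        except Exception as exc:
            # Keep going so one failure does not leave later resources behind.
            errors[label] = str(exc)

    return errors


# In the notebook this would be called as:
# cleanup_endpoint_resources(sm, endpoint_name, endpoint_config_name, sm_model_name)
```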
