# Deploy NVIDIA NIM from AWS Marketplace

NVIDIA NIM, a component of NVIDIA AI Enterprise, enhances your applications with optimized inference microservices for NVIDIA models, delivering a standard API, optimized profiles, and enterprise support.

In this example we show how to deploy the **Magpie TTS Multilingual** NIM from AWS Marketplace on Amazon SageMaker. Magpie TTS supports text-to-speech synthesis in English (en-US), Spanish (es-US), French (fr-FR), German (de-DE), Mandarin (zh-CN), Vietnamese (vi-VN), and Italian (it-IT).

Please check out the [Magpie TTS model card](https://build.nvidia.com/nvidia/magpie-tts-multilingual/modelcard) and [NIM TTS docs](https://docs.nvidia.com/nim/riva/tts/latest/index.html) for more information.

## Prerequisites
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open it from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role you use has the **AmazonSageMakerFullAccess** policy.
1. To deploy this model successfully, ensure that either:
    1. your IAM role has the following three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used:
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**
    1. or your AWS account already has a subscription to the model package.
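For reference, an identity-based IAM policy that grants the three Marketplace actions listed above could look like the sketch below. It builds the policy document in Python and prints it as JSON. The `"Resource": "*"` scope is an assumption made for simplicity, so narrow it if your organization requires. The helper name is illustrative.

```python
import json

# The three Marketplace actions listed in the prerequisites.
MARKETPLACE_ACTIONS = [
    "aws-marketplace:ViewSubscriptions",
    "aws-marketplace:Unsubscribe",
    "aws-marketplace:Subscribe",
]


def marketplace_subscription_policy() -> dict:
    """Return an IAM policy document allowing the Marketplace subscription actions.

    Assumption: Resource "*" is used for simplicity; scope it down if your setup allows.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": MARKETPLACE_ACTIONS,
                "Resource": "*",
            }
        ],
    }


print(json.dumps(marketplace_subscription_policy(), indent=2))
```

If the execution role lacks these permissions, you could attach the printed JSON to it as an inline policy.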

## Subscribe to the model package
To subscribe to the model package:
1. Open the model package listing page.
1. On the AWS Marketplace listing, click **Continue to subscribe**.
1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and click **Accept Offer** if you and your organization agree to them.
1. Click **Continue to configuration** and choose a **Region**. A **Product ARN** is displayed; this is the model package ARN you need to specify when creating a deployable model. Copy the ARN for your Region and use it in the following cell. You can either set `nim_package` to the package name (the part after `model-package/`) or paste the full ARN into `model_package_map`.
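Before pasting, it can help to sanity-check the copied ARN. The sketch below assumes the `arn:aws:sagemaker:<region>:<account>:model-package/<name>` layout used in the next cell and splits out the pieces. The `name` group is what would go into `nim_package`. The helper `parse_model_package_arn` and the example package name are hypothetical.

```python
import re

# Matches the model-package ARN layout used in this notebook:
# arn:aws:sagemaker:<region>:<12-digit account>:model-package/<package name>
_ARN_RE = re.compile(
    r"^arn:aws:sagemaker:(?P<region>[a-z0-9-]+):(?P<account>\d{12}):"
    r"model-package/(?P<name>.+)$"
)


def parse_model_package_arn(arn: str) -> dict:
    """Split a model-package ARN into region, account and package name.

    Raises ValueError if the ARN does not match (e.g. the package name is empty).
    """
    match = _ARN_RE.match(arn.strip())
    if match is None:
        raise ValueError(f"Not a SageMaker model-package ARN: {arn!r}")
    return match.groupdict()


# Hypothetical package name, for illustration only.
example = "arn:aws:sagemaker:us-east-1:865070037744:model-package/example-magpie-package"
print(parse_model_package_arn(example))
```

An ARN that ends in `model-package/` with nothing after it is rejected. That is exactly what the next cell produces if `nim_package` is left empty.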

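```python
import base64
import json
import time

import boto3
import sagemaker
from botocore.config import Config
from sagemaker import get_execution_role

# Generous read timeout (seconds) for long-running invocations
config = Config(read_timeout=3600)
sess = boto3.Session()
sm = sess.client("sagemaker")  # control plane: models, endpoint configs, endpoints
sagemaker_session = sagemaker.Session(boto_session=sess)
role = get_execution_role()  # IAM role attached to this notebook or Studio space
client = sess.client("sagemaker-runtime", config=config)  # data plane: invoke_endpoint
region = sess.region_name
```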

```python
# Replace the value below with the AWS Marketplace package name for Magpie TTS Multilingual
nim_package = ""

# Model package ARNs per Region (fill in nim_package above, or paste full ARNs here directly)
model_package_map = {
    "us-east-1": f"arn:aws:sagemaker:us-east-1:865070037744:model-package/{nim_package}",
    "us-east-2": f"arn:aws:sagemaker:us-east-2:057799348421:model-package/{nim_package}",
    "us-west-1": f"arn:aws:sagemaker:us-west-1:382657785993:model-package/{nim_package}",
    "us-west-2": f"arn:aws:sagemaker:us-west-2:594846645681:model-package/{nim_package}",
    "ca-central-1": f"arn:aws:sagemaker:ca-central-1:470592106596:model-package/{nim_package}",
    "eu-central-1": f"arn:aws:sagemaker:eu-central-1:446921602837:model-package/{nim_package}",
    "eu-west-1": f"arn:aws:sagemaker:eu-west-1:985815980388:model-package/{nim_package}",
    "eu-west-2": f"arn:aws:sagemaker:eu-west-2:856760150666:model-package/{nim_package}",
    "eu-west-3": f"arn:aws:sagemaker:eu-west-3:843114510376:model-package/{nim_package}",
    "eu-north-1": f"arn:aws:sagemaker:eu-north-1:136758871317:model-package/{nim_package}",
    "ap-southeast-1": f"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/{nim_package}",
    "ap-southeast-2": f"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/{nim_package}",
    "ap-northeast-2": f"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/{nim_package}",
    "ap-northeast-1": f"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/{nim_package}",
    "ap-south-1": f"arn:aws:sagemaker:ap-south-1:077584701553:model-package/{nim_package}",
    "sa-east-1": f"arn:aws:sagemaker:sa-east-1:270155090741:model-package/{nim_package}",
}

# `region` comes from the boto3 session created in the previous cell
if region not in model_package_map:
    raise ValueError(f"Current boto3 session region {region} is not supported.")

model_package_arn = model_package_map[region]
if model_package_arn.endswith("model-package/"):
    raise ValueError("Set nim_package (or paste the full ARN into model_package_map) before continuing.")
model_package_arn
```

## Create the SageMaker Endpoint

We first define a SageMaker model from the model package ARN selected above, passed to `create_model` as `ModelPackageName`.

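```python
# Name reused below for the endpoint configuration and the endpoint
sm_model_name = "magpie-tts-multilingual"

create_model_response = sm.create_model(
    ModelName=sm_model_name,
    # ModelPackageName accepts either a package name or a full model package ARN
    PrimaryContainer={"ModelPackageName": model_package_arn},
    ExecutionRoleArn=role,
    # Run the container without outbound network access
    EnableNetworkIsolation=True,
)
print("Model Arn: " + create_model_response["ModelArn"])
```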
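`create_model` fails if a model with the same name already exists. Re-running this notebook with the fixed name above therefore requires deleting the old resources first. One option, sketched below, is to append a UTC timestamp to the name. The helper `unique_resource_name` is hypothetical. It assumes SageMaker's 63-character limit on model, endpoint-configuration, and endpoint names, and it expects a base made of letters, digits, and hyphens.

```python
import time
from typing import Optional

MAX_NAME_LEN = 63  # SageMaker limit for model, endpoint-config and endpoint names


def unique_resource_name(base: str, now: Optional[float] = None) -> str:
    """Append a UTC timestamp suffix so repeated runs do not collide with existing resources."""
    # time.gmtime(None) uses the current time; pass `now` for reproducible names.
    suffix = "-" + time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    # Truncate the base (never the suffix) and drop trailing hyphens so the name stays valid.
    trimmed = base[: MAX_NAME_LEN - len(suffix)].rstrip("-")
    return trimmed + suffix


print(unique_resource_name("magpie-tts-multilingual"))
```

In the notebook you could set `sm_model_name = unique_resource_name("magpie-tts-multilingual")`. Because the endpoint configuration and endpoint reuse that name, all three resources stay in sync.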

Next, we create an endpoint configuration that specifies the instance type. `ml.g6e.xlarge` is the recommended starting point.

```python
endpoint_config_name = sm_model_name

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": sm_model_name,
            "InitialInstanceCount": 1,
            "InstanceType": "ml.g6e.xlarge",
            # GPU inference AMI for the hosting instance
            "InferenceAmiVersion": "al2-ami-sagemaker-inference-gpu-2",
            # Route each request to the instance with the fewest in-flight requests
            "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"},
            # Allow up to one hour for model download and container health checks
            "ModelDataDownloadTimeoutInSeconds": 3600,
            "ContainerStartupHealthCheckTimeoutInSeconds": 3600,
        }
    ],
)
print("Endpoint Config Arn: " + create_endpoint_config_response["EndpointConfigArn"])
```

Create an endpoint and wait for it to become `InService`.

```python
endpoint_name = endpoint_config_name
create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)
print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])
```

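```python
resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

# Poll once a minute until the endpoint leaves the Creating state
while status == "Creating":
    time.sleep(60)
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)
if status != "InService":
    # FailureReason explains why endpoint creation did not succeed
    print("FailureReason: " + resp.get("FailureReason", "n/a"))
```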
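The loop above keeps polling for as long as the endpoint reports `Creating`, with no upper bound. A variant with an overall timeout is sketched below. `describe` is any zero-argument callable that returns the `describe_endpoint` response dict, and `sleep` is injectable so the logic can be exercised without AWS. The helper name, the set of statuses treated as transitional, and the default timeout are assumptions, not part of the SageMaker SDK. If you prefer a built-in option, boto3 also provides an `endpoint_in_service` waiter via `sm.get_waiter("endpoint_in_service")`.

```python
import time
from typing import Callable

# Statuses assumed to mean "still changing; keep waiting".
TRANSITIONAL = {"Creating", "Updating", "SystemUpdating", "RollingBack"}


def wait_for_endpoint(
    describe: Callable[[], dict],
    poll_seconds: float = 60,
    timeout_seconds: float = 3 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the endpoint leaves a transitional state; raise on timeout or failure."""
    waited = 0.0
    resp = describe()
    while resp["EndpointStatus"] in TRANSITIONAL:
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint still {resp['EndpointStatus']} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
        resp = describe()
    if resp["EndpointStatus"] != "InService":
        reason = resp.get("FailureReason", "no reason given")
        raise RuntimeError(f"Endpoint is {resp['EndpointStatus']}: {reason}")
    return resp
```

In the notebook, this would be called as `wait_for_endpoint(lambda: sm.describe_endpoint(EndpointName=endpoint_name))`.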

## Run Inference

### API Request Format

The request body follows the [NVIDIA Riva TTS SynthesizeSpeechRequest proto](https://docs.nvidia.com/nim/riva/tts/1.6.0/protos.html#nvidia-riva-tts-synthesizespeechrequest):

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `text` | string | ✅ Yes | Text to synthesize |
| `voice_name` | string | No | Voice name ([available voices](https://docs.nvidia.com/nim/riva/tts/1.10.0/support-matrix.html#available-voices)) |
| `language_code` | string | No | Language code: en-US, es-US, fr-FR, de-DE, zh-CN, vi-VN, it-IT ([docs](https://docs.nvidia.com/nim/riva/tts/1.10.0/support-matrix.html#magpie-tts-multilingual)) |
| `sample_rate_hz` | int | No | Output sample rate in Hz (default: 44100) |
| `encoding` | string | No | Audio encoding: `LINEAR_PCM` or `OGGOPUS` |
| `zero_shot_data` | object | No | `{audio_prompt, quality, transcript}` for zero-shot voice cloning (gRPC only) |
| `custom_dictionary` | string | No | Custom pronunciations as `"word1  pron1,word2  pron2"`: two spaces between a word and its pronunciation, commas between entries (gRPC only) |
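A small builder can catch typos in these fields before a request reaches the endpoint. The sketch below checks what the table states: `text` is required, `language_code` must be one of the seven listed codes, and `encoding` must be one of the two listed values. It also adds one check of its own, that the sample rate is positive. `voice_name` is deliberately left unchecked because the valid names live in the NIM support matrix. The function name `build_tts_payload` is hypothetical.

```python
import json
from typing import Optional

SUPPORTED_LANGUAGES = {"en-US", "es-US", "fr-FR", "de-DE", "zh-CN", "vi-VN", "it-IT"}
SUPPORTED_ENCODINGS = {"LINEAR_PCM", "OGGOPUS"}


def build_tts_payload(
    text: str,
    language_code: Optional[str] = None,
    voice_name: Optional[str] = None,
    sample_rate_hz: Optional[int] = None,
    encoding: Optional[str] = None,
) -> str:
    """Validate fields against the request table and return a JSON request body."""
    if not text:
        raise ValueError("text is required")
    if language_code is not None and language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language_code: {language_code}")
    if encoding is not None and encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"Unsupported encoding: {encoding}")
    if sample_rate_hz is not None and sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    payload = {"text": text}
    # Include only the optional fields that were set, so the server defaults apply otherwise.
    optional = {
        "language_code": language_code,
        "voice_name": voice_name,
        "sample_rate_hz": sample_rate_hz,
        "encoding": encoding,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(payload)
```

The returned string can be passed directly as `Body=` to `client.invoke_endpoint`.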

### Response Format

Response matches [NIM SynthesizeSpeechResponse](https://docs.nvidia.com/nim/riva/tts/1.6.0/protos.html#nvidia-riva-tts-synthesizespeechresponse):

| Field | Type | Description |
|-------|------|-------------|
| `audio` | string | Base64-encoded audio bytes |
| `meta` | object | Optional metadata from NIM |
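Every non-streaming cell below decodes the response the same way: parse the JSON body, then base64-decode `audio`. The hypothetical helper sketched here also surfaces the whole body when `audio` is missing, for example when the endpoint returns an error payload instead of audio.

```python
import base64
import json
from typing import Tuple


def decode_tts_response(body: bytes) -> Tuple[bytes, dict]:
    """Return (audio_bytes, meta) from a JSON SynthesizeSpeechResponse-style body."""
    output = json.loads(body.decode("utf-8"))
    if "audio" not in output:
        # Show the full body so error messages from the endpoint are not lost.
        raise KeyError(f"No 'audio' key in response: {output}")
    return base64.b64decode(output["audio"]), output.get("meta") or {}
```

With a live endpoint, `audio_bytes, meta = decode_tts_response(response["Body"].read())` would replace the manual `json.loads`/`b64decode` lines.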

### Non-Streaming Request

```python
payload = {
    "text": "Hello, this is a Magpie TTS AWS Marketplace test.",
    "language_code": "en-US",
    "voice_name": "Magpie-Multilingual.EN-US.Aria",
    "sample_rate_hz": 44100,
    "encoding": "LINEAR_PCM",
}

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(payload),
    ContentType="application/json",
    Accept="application/json",
)

# The response body is JSON with base64-encoded audio
out = json.loads(response["Body"].read().decode("utf8"))
audio_bytes = base64.b64decode(out["audio"])
with open("magpie_marketplace.wav", "wb") as f:
    f.write(audio_bytes)
print("Wrote magpie_marketplace.wav", len(audio_bytes), "bytes")
```
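The cell above writes the decoded bytes straight to a `.wav` file. This notebook does not state whether those bytes already carry a WAV (RIFF) header. If your player rejects the file, the bytes may be headerless 16-bit PCM, which is what the streaming example later in this notebook assumes. The sketch below adds a header only when one is missing. The helper name and the 16-bit mono defaults are assumptions, and the helper applies to `LINEAR_PCM` output only, not `OGGOPUS`.

```python
import io
import wave


def ensure_wav(audio: bytes, sample_rate_hz: int = 44100, channels: int = 1, sample_width: int = 2) -> bytes:
    """Return audio unchanged if it already has a RIFF/WAVE header; otherwise wrap it as PCM."""
    if audio[:4] == b"RIFF" and audio[8:12] == b"WAVE":
        return audio
    buf = io.BytesIO()
    # Assumption: headerless data is little-endian PCM with the given layout.
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(audio)
    return buf.getvalue()
```

In the cell above, you would then write `f.write(ensure_wav(audio_bytes, sample_rate_hz=payload["sample_rate_hz"]))`.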

### Transport Selection (Optional)

Set the `CustomAttributes` parameter of `invoke_endpoint` to select HTTP or gRPC transport:

| CustomAttributes | Description |
|------------------|-------------|
| `/invocations/http` | Force HTTP transport |
| `/invocations/grpc` | Force gRPC transport |
| *(not set)* | Auto-routing |
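If you switch transports often, a small helper keeps the `CustomAttributes` values in one place. It encodes only the three rows of the table above. The name `transport_kwargs` is hypothetical.

```python
from typing import Optional

# CustomAttributes values from the table; None means "let the endpoint auto-route".
_TRANSPORT_ATTRIBUTES = {
    "http": "/invocations/http",
    "grpc": "/invocations/grpc",
    None: None,
}


def transport_kwargs(transport: Optional[str] = None) -> dict:
    """Return extra invoke_endpoint keyword arguments for the requested transport."""
    if transport not in _TRANSPORT_ATTRIBUTES:
        raise ValueError(f"transport must be 'http', 'grpc' or None, got {transport!r}")
    attribute = _TRANSPORT_ATTRIBUTES[transport]
    # Omit CustomAttributes entirely for auto-routing.
    return {} if attribute is None else {"CustomAttributes": attribute}
```

To use it, unpack the result into the call: `client.invoke_endpoint(EndpointName=endpoint_name, Body=json.dumps(payload), ContentType="application/json", Accept="application/json", **transport_kwargs("grpc"))`.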

```python
payload = {
    "text": "This request forces gRPC transport.",
    "language_code": "en-US",
    "voice_name": "Magpie-Multilingual.EN-US.Aria",
    "sample_rate_hz": 44100,
    "encoding": "LINEAR_PCM",
}

print("Sending TTS request forcing gRPC transport...")

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(payload),
    ContentType="application/json",
    Accept="application/json",
    CustomAttributes="/invocations/grpc",  # force gRPC transport
)

output = json.loads(response["Body"].read().decode("utf8"))

if "audio" in output:
    audio_bytes = base64.b64decode(output["audio"])
    with open("output_forced_grpc.wav", "wb") as f:
        f.write(audio_bytes)
    print(f"Audio saved to output_forced_grpc.wav ({len(audio_bytes)} bytes)")
else:
    print("No 'audio' key in response:", output)
```

### TTS with Custom Dictionary (gRPC-only)
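Use a custom pronunciation dictionary to control how specific words are spoken. Entries follow the `custom_dictionary` format from the request table: two spaces between a word and its pronunciation, with commas between entries.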

```python
payload = {
    "text": "Welcome to NVIDIA and Amazon SageMaker integration.",
    "language_code": "en-US",
    "voice_name": "Magpie-Multilingual.EN-US.Aria",
    # Comma-separated entries; two spaces separate each word from its pronunciation
    "custom_dictionary": "NVIDIA  en-VID-ee-ah,SageMaker  SAGE-may-ker,TTS  tee-tee-ess",
    "sample_rate_hz": 44100,
}

print("Sending TTS request with custom dictionary...")

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(payload),
    ContentType="application/json",
    Accept="application/json",
)

output = json.loads(response["Body"].read().decode("utf8"))

if "audio" in output:
    audio_bytes = base64.b64decode(output["audio"])
    with open("output_custom_dict.wav", "wb") as f:
        f.write(audio_bytes)
    print(f"Audio saved to output_custom_dict.wav ({len(audio_bytes)} bytes)")
else:
    print("No 'audio' key in response:", output)
```
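Writing the double-space, comma-separated format by hand is error-prone. The sketch below builds the `custom_dictionary` string from a Python `dict` using the `word  pron` layout described in the request table. The helper name is hypothetical. It rejects commas and double spaces inside entries, on the assumption that those characters are reserved as separators. It does not validate the pronunciation notation itself, so consult the NIM TTS docs for what the model accepts.

```python
from typing import Dict


def build_custom_dictionary(entries: Dict[str, str]) -> str:
    """Join {word: pronunciation} pairs as 'word  pron' (two spaces), separated by commas."""
    parts = []
    for word, pron in entries.items():
        word, pron = word.strip(), pron.strip()
        # Assumption: commas and double spaces act as separators, so they cannot appear inside an entry.
        if not word or not pron or any(sep in s for s in (word, pron) for sep in (",", "  ")):
            raise ValueError(f"Invalid dictionary entry: {word!r} -> {pron!r}")
        parts.append(f"{word}  {pron}")
    return ",".join(parts)
```

For example, `build_custom_dictionary({"NVIDIA": "en-VID-ee-ah", "SageMaker": "SAGE-may-ker", "TTS": "tee-tee-ess"})` reproduces the string used in the cell above.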

### Streaming Inference

Use `invoke_endpoint_with_response_stream` with `CustomAttributes="/invocations/stream"`.

The response is a stream of raw audio chunks. The cell below concatenates them and saves the raw bytes. It also writes a WAV file, assuming 16-bit mono PCM at the requested sample rate.

```python
import wave

payload = {
    "text": "This is a streaming text-to-speech example. Audio is delivered in real-time.",
    "language_code": "en-US",
    "voice_name": "Magpie-Multilingual.EN-US.Aria",
    "sample_rate_hz": 44100,
}

print("Sending streaming TTS request...")

stream_resp = client.invoke_endpoint_with_response_stream(
    EndpointName=endpoint_name,
    Body=json.dumps(payload),
    ContentType="application/json",
    CustomAttributes="/invocations/stream",
)

audio_chunks = []
try:
    # Each event carries a PayloadPart with the next slice of raw audio bytes
    for event in stream_resp["Body"]:
        chunk = event.get("PayloadPart", {}).get("Bytes")
        if chunk:
            audio_chunks.append(chunk)
            if len(audio_chunks) == 1:
                print(f"First audio chunk received! ({len(chunk)} bytes)")
except Exception as e:
    # Stream errors are raised while iterating, so the try must wrap the whole loop
    print(f"Error processing stream: {e}")

if audio_chunks:
    audio_data = b"".join(audio_chunks)

    with open("output_streaming.raw", "wb") as f:
        f.write(audio_data)
    print(f"Streaming complete: {len(audio_chunks)} chunks, {len(audio_data)} bytes total")
    print("Raw audio saved to output_streaming.raw")

    # Wrap the raw bytes in a WAV header, assuming 16-bit (2-byte) mono PCM
    with wave.open("output_streaming.wav", "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(payload["sample_rate_hz"])
        wav_file.writeframes(audio_data)
    print("WAV audio saved to output_streaming.wav")
```
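The cell above buffers every chunk in memory before writing anything. For longer texts, you could instead write chunks to the WAV file as they arrive, because Python's `wave` module patches the header sizes when the file is closed. This sketch assumes the same 16-bit mono PCM layout as the cell above. `stream_to_wav` is a hypothetical helper. Because it consumes the event stream, it would replace the loop above rather than run after it.

```python
import wave
from typing import Iterable


def stream_to_wav(events: Iterable[dict], path: str, sample_rate_hz: int = 44100) -> int:
    """Write PayloadPart chunks to a 16-bit mono WAV file as they arrive; return bytes written."""
    written = 0
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # assumption: mono
        wav_file.setsampwidth(2)   # assumption: 16-bit samples
        wav_file.setframerate(sample_rate_hz)
        for event in events:
            chunk = event.get("PayloadPart", {}).get("Bytes")
            if chunk:
                # writeframesraw skips the header update; close() patches the sizes at the end.
                wav_file.writeframesraw(chunk)
                written += len(chunk)
    return written
```

With a live endpoint, the call would be `stream_to_wav(stream_resp["Body"], "output_streaming.wav", payload["sample_rate_hz"])`.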

### Multilingual Examples
Supported languages: en-US, es-US, fr-FR, de-DE, zh-CN, vi-VN, it-IT

```python
# Spanish example (no voice_name is given)
payload_spanish = {
    "text": "Hola, bienvenido a NVIDIA.",
    "language_code": "es-US",
    "sample_rate_hz": 44100,
}

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(payload_spanish),
    ContentType="application/json",
    Accept="application/json",
)

output = json.loads(response["Body"].read().decode("utf8"))
if "audio" in output:
    with open("output_spanish.wav", "wb") as f:
        f.write(base64.b64decode(output["audio"]))
    print("Spanish audio saved to output_spanish.wav")
else:
    print("No 'audio' key in response:", output)
```
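To try every supported language in one pass, you can loop over sample sentences. The sketch below takes the invoke function as a parameter so it can be dry-run without an endpoint. Like the Spanish example, it omits `voice_name`. The sample sentences and the `synthesize_all` name are illustrative.

```python
import base64
import json
from typing import Callable, Dict

# Illustrative sentences, one per supported language code.
SAMPLE_TEXTS = {
    "en-US": "Hello, welcome to NVIDIA.",
    "es-US": "Hola, bienvenido a NVIDIA.",
    "fr-FR": "Bonjour, bienvenue chez NVIDIA.",
    "de-DE": "Hallo, willkommen bei NVIDIA.",
    "zh-CN": "你好，欢迎来到英伟达。",
    "vi-VN": "Xin chào, chào mừng bạn đến với NVIDIA.",
    "it-IT": "Ciao, benvenuto in NVIDIA.",
}


def synthesize_all(invoke: Callable[[str], bytes], texts: Dict[str, str], sample_rate_hz: int = 44100) -> Dict[str, bytes]:
    """Call invoke(json_body) once per language and return {language_code: audio_bytes}."""
    results = {}
    for language_code, text in texts.items():
        body = json.dumps({"text": text, "language_code": language_code, "sample_rate_hz": sample_rate_hz})
        output = json.loads(invoke(body))
        if "audio" in output:
            results[language_code] = base64.b64decode(output["audio"])
        else:
            print(f"{language_code}: no 'audio' key in response: {output}")
    return results


# Wiring to the real endpoint (uncomment in the notebook):
# def invoke(body):
#     resp = client.invoke_endpoint(EndpointName=endpoint_name, Body=body,
#                                   ContentType="application/json", Accept="application/json")
#     return resp["Body"].read()
# for lang, audio in synthesize_all(invoke, SAMPLE_TEXTS).items():
#     with open(f"output_{lang}.wav", "wb") as f:
#         f.write(audio)
```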

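## Cleanup

Delete resources when you're done to avoid ongoing charges. The endpoint incurs instance charges for as long as it is running, so delete it first, then its configuration and the model.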

```python
# Uncomment to delete: the endpoint first, then its configuration, then the model
# sm.delete_endpoint(EndpointName=endpoint_name)
# sm.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
# sm.delete_model(ModelName=sm_model_name)
print("Cleanup cells ready (commented out)")
```