# Deploy Nemotron Super 49B to Vertex Model Garden
This notebook shows how to deploy the Llama Nemotron Super 49B NVIDIA Inference Microservice (NIM) from Vertex AI Model Garden on Google Cloud Platform (GCP) and how to send an inference request to the resulting endpoint.

## Use Case
This model is intended for developers building AI agent systems, chatbots, RAG systems, and other AI-powered applications. It is also suitable for general instruction-following tasks.

For more information, see the [Nemotron documentation](https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/) and the model card on [build.nvidia.com](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1_5/modelcard).

## 1. Enable model in Vertex Model Garden

Open the model page in [Vertex Model Garden](https://console.cloud.google.com/vertex-ai/publishers/nvidia/model-garden/llama-nemotron-super), click `Enable`, and follow the prompts.

## 2. Authenticate and install dependencies

Run the following command to authenticate (omit the leading `!` if you run it in a terminal rather than a notebook cell):

In [None]:
! gcloud auth application-default login

Install the dependencies:

In [None]:
! pip install --upgrade --force-reinstall "google-cloud-aiplatform>=1.135.0"
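
After the install, it can be useful to confirm that the environment picked up a recent enough SDK. The following is a minimal sketch using only the standard library. The helper names `parse_version`, `installed_version`, and `meets_minimum` are hypothetical, and the parsing is deliberately simplified: it reads only the leading digits of the first three components, so it is not a full PEP 440 comparison.

```python
import itertools
from importlib import metadata
from typing import Optional, Tuple

# Minimum version taken from the pip constraint above.
MIN_VERSION = (1, 135, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse up to three dot-separated components, keeping leading digits only."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as "2.0" so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(version_text: str) -> bool:
    """Simplified check: compare numeric components against MIN_VERSION."""
    return parse_version(version_text) >= MIN_VERSION


found = installed_version("google-cloud-aiplatform")
if found is None:
    print("google-cloud-aiplatform is not installed")
else:
    print(f"google-cloud-aiplatform {found}, new enough: {meets_minimum(found)}")
```

If the reported version is older than expected, restarting the notebook kernel and re-running the install cell is a reasonable first step.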

## 3. Initialize clients

Set the following variables to match your Google Cloud project and region:

In [None]:
PROJECT_ID = "your-project-id"
REGION = "us-central1"

In [None]:
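# MODEL_ID: Model Garden identifier used to look up the model.
# MODEL_NAME: model name sent in the body of inference requests.
# ENDPOINT_NAME: display-name fragment used later to find the endpoint.
MODEL_ID = "nvidia/llama-nemotron-super@49b"
MODEL_NAME = "nvidia/llama-3.3-nemotron-super-49b-v1.5"
ENDPOINT_NAME = "nvidia-llama-nemotron-super-49b"

import json
import vertexai
from vertexai import model_garden
from google.cloud import aiplatform

# Initialize both SDK entry points with the same project and region.
vertexai.init(project=PROJECT_ID, location=REGION)
aiplatform.init(project=PROJECT_ID, location=REGION)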
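
Leaving the placeholder project ID in place tends to surface later as a less obvious permission or not-found error. A small guard run before the `init` calls can catch this early. This is only a sketch: `check_config` is a hypothetical helper, and `"my-demo-project"` is a made-up project ID used for illustration.

```python
def check_config(project_id: str, region: str) -> None:
    """Raise ValueError if the project or region still looks unset."""
    # "your-project-id" is the placeholder value used in this notebook.
    if not project_id or project_id == "your-project-id":
        raise ValueError("Set PROJECT_ID to your Google Cloud project ID")
    if not region:
        raise ValueError("Set REGION to a Vertex AI region such as us-central1")


# Example call with a hypothetical project ID; passes silently when valid.
check_config("my-demo-project", "us-central1")
```

In the notebook itself the call would be `check_config(PROJECT_ID, REGION)`.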

## 4. Deploy the model

In [None]:
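# Look up the model in Model Garden by its publisher/model@version ID.
model = model_garden.OpenModel(MODEL_ID)

# Deploy the model to an endpoint on a machine with 8 accelerators.
# accept_eula=True accepts the model's end-user license agreement.
endpoint = model.deploy(
    machine_type="g4-standard-384",
    accelerator_type="NVIDIA_RTX_PRO_6000",
    accelerator_count=8,
    accept_eula=True,
)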

## 5. List model endpoints and filter for endpoint name

In [None]:
endpoints = aiplatform.Endpoint.list()
target = next((ep for ep in endpoints if ENDPOINT_NAME in ep.display_name), None)
assert target, f"Endpoint containing {ENDPOINT_NAME} not found"
target
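
The `next(...)` expression above returns the first endpoint whose display name contains `ENDPOINT_NAME`, which may not be the one you expect if earlier deployments were left running. One possible refinement, sketched below, is to pick the most recently created match. It assumes each endpoint object exposes `display_name` and `create_time` attributes, as `aiplatform.Endpoint` does. `FakeEndpoint` and `latest_matching` are hypothetical names, and the sample data is invented so that the sketch runs without a Google Cloud project.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class FakeEndpoint:
    """Stand-in for aiplatform.Endpoint, used only for this illustration."""
    display_name: str
    create_time: datetime


def latest_matching(endpoints: Iterable, name: str) -> Optional[object]:
    """Return the newest endpoint whose display name contains `name`, or None."""
    matches = [ep for ep in endpoints if name in ep.display_name]
    return max(matches, key=lambda ep: ep.create_time, default=None)


# Invented sample data: two matching endpoints and one unrelated endpoint.
demo = [
    FakeEndpoint("nvidia-llama-nemotron-super-49b-old", datetime(2025, 1, 1)),
    FakeEndpoint("nvidia-llama-nemotron-super-49b-new", datetime(2025, 6, 1)),
    FakeEndpoint("unrelated-endpoint", datetime(2025, 7, 1)),
]
picked = latest_matching(demo, "nvidia-llama-nemotron-super-49b")
print(picked.display_name)
```

With the real SDK, the equivalent call would be `latest_matching(aiplatform.Endpoint.list(), ENDPOINT_NAME)`.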

## 6. Perform inference against the filtered endpoint

In [None]:
prompt = "Give one fact about Vertex AI."
body = json.dumps(
    {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
).encode("utf-8")

response = target.raw_predict(
    body=body,
    headers={"Content-Type": "application/json"},
    use_dedicated_endpoint=True,
)

assert response.status_code == 200, response.text
payload = response.json()
print(payload)

print(payload["choices"][0]["message"]["content"])
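
The direct indexing above raises a bare `KeyError` or `IndexError` if the response has an unexpected shape. A small helper can turn that into a clearer error. This sketch assumes the chat-completion layout implied by the indexing above (`choices[0].message.content`). `extract_reply` is a hypothetical helper, and the sample payload is made up for illustration; it is not real model output.

```python
from typing import Any, Dict


def extract_reply(payload: Dict[str, Any]) -> str:
    """Return the assistant text from a chat-completion style payload."""
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError(f"No choices in response: {payload}")
    message = choices[0].get("message") or {}
    content = message.get("content")
    if content is None:
        raise ValueError(f"First choice has no message content: {choices[0]}")
    return content


# Made-up payload in the assumed layout, for illustration only.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Example reply text."}}
    ]
}
print(extract_reply(sample))
```

In the notebook, `extract_reply(payload)` could stand in for the final `print` expression when a descriptive failure message is preferred.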

## 7. Cleanup
Run this cell only if you want to remove the deployment. It undeploys all models from the matching endpoint and then deletes the endpoint.

In [None]:
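endpoints = aiplatform.Endpoint.list()
target = next((ep for ep in endpoints if ENDPOINT_NAME in ep.display_name), None)
if target is None:
    print(f"Endpoint containing {ENDPOINT_NAME} not found; nothing to clean up")
else:
    target.undeploy_all()  # remove every deployed model from the endpoint
    target.delete()  # then delete the now-empty endpoint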