# Deploy JumpStart Models on SageMaker HyperPod Inference

This notebook demonstrates how to deploy a SageMaker JumpStart model on a HyperPod cluster using the Inference Operator.

We use **Qwen2.5-7B-Instruct** as the example model.

## Prerequisites
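- An EKS-orchestrated SageMaker HyperPod cluster with the HyperPod inference operator installed
- Access to the JumpStart model you want to deploy (this notebook uses `huggingface-llm-qwen2-5-7b-instruct` from `SageMakerPublicHub`)
- The AWS CLI and `kubectl` installed, with IAM permissions to describe the HyperPod cluster and access its EKS cluster (section 1.1 writes the kubeconfig entry)
- `envsubst` (from GNU gettext), which section 2.0 uses to fill in the manifest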
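
Before starting, a quick way to confirm that the command-line tools are on your `PATH` is a small helper like the one below. It is a convenience sketch, not part of the HyperPod tooling: `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and the check only confirms that the binaries exist, not that your credentials or kubeconfig work.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Command-line tools this notebook shells out to (envsubst is used in section 2.0).
REQUIRED_TOOLS = ("aws", "kubectl", "envsubst")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


print("Missing tools:", ", ".join(missing_tools()) or "none")
```

The `which` parameter defaults to `shutil.which`; it is injectable only so the check can be exercised without the real tools installed.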

## 1.0 Set Environment Variables

In [None]:
import os

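# Stored in os.environ so the %%bash and ! shell cells below inherit them.
# JumpStart model to deploy, and the name used for both the Kubernetes resource and the SageMaker endpoint
os.environ["MODEL_ID"] = "huggingface-llm-qwen2-5-7b-instruct"
os.environ["SAGEMAKER_ENDPOINT_NAME"] = "qwen25-7b-jumpstart"
# Kubernetes namespace for the JumpStartModel resource
os.environ["CLUSTER_NAMESPACE"] = "default"
# Instance type that will host the model
os.environ["INSTANCE_TYPE"] = "ml.g5.24xlarge"
# Replace the placeholders below with your own values
os.environ["REGION"] = "<region>"  # e.g. us-east-1
os.environ["HYPERPOD_CLUSTER_NAME"] = "<hyperpod-cluster-name>"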
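
Because `<region>` and `<hyperpod-cluster-name>` are placeholders, a forgotten edit only surfaces later as a confusing AWS CLI error. A small guard like the one below reports settings that are empty or still look like `<...>`. It is a sketch: `unset_settings` and `SETTINGS` are hypothetical names, not part of any SDK.

```python
import os
import re
from typing import List, Mapping

# Names of the settings defined in the cell above.
SETTINGS = (
    "MODEL_ID",
    "SAGEMAKER_ENDPOINT_NAME",
    "CLUSTER_NAMESPACE",
    "INSTANCE_TYPE",
    "REGION",
    "HYPERPOD_CLUSTER_NAME",
)

# Matches an unedited template placeholder such as "<region>".
_ANGLE_PLACEHOLDER = re.compile(r"^<[^<>]+>$")


def unset_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return settings that are missing, empty, or still a <placeholder>."""
    problems = []
    for name in SETTINGS:
        value = env.get(name, "").strip()
        if not value or _ANGLE_PLACEHOLDER.match(value):
            problems.append(name)
    return problems


problems = unset_settings()
if problems:
    print("Set these before continuing:", ", ".join(problems))
```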

## 1.1 Resolve EKS Cluster Name and Update Kubeconfig

In [None]:
%%bash
export EKS_CLUSTER_NAME=$(aws --region $REGION sagemaker describe-cluster \
    --cluster-name $HYPERPOD_CLUSTER_NAME \
    --query 'Orchestrator.Eks.ClusterArn' --output text | cut -d'/' -f2)
echo "EKS Cluster: $EKS_CLUSTER_NAME"
aws eks update-kubeconfig --name $EKS_CLUSTER_NAME --region $REGION
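
The shell pipeline above keeps everything after the `/` in the EKS cluster ARN. If the `--query` path matches nothing, `--output text` prints `None` and `cut` passes that line through unchanged, so `update-kubeconfig` receives a bogus name. The sketch below does the same lookup in Python and validates the ARN first. The function names are hypothetical, and `resolve_eks_cluster_name` assumes boto3 is installed and AWS credentials are configured.

```python
def eks_cluster_name_from_arn(arn: str) -> str:
    """Return the cluster name from an EKS cluster ARN.

    EKS cluster ARNs have the form arn:<partition>:eks:<region>:<account>:cluster/<name>,
    so this mirrors the `cut -d'/' -f2` step while rejecting unexpected input.
    """
    parts = arn.strip().split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "eks":
        raise ValueError(f"not an EKS ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "cluster" or not name:
        raise ValueError(f"not an EKS cluster ARN: {arn!r}")
    return name


def resolve_eks_cluster_name(hyperpod_cluster_name: str, region: str) -> str:
    """Look up the EKS cluster behind a HyperPod cluster (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the ARN parser works without boto3

    sagemaker = boto3.client("sagemaker", region_name=region)
    cluster = sagemaker.describe_cluster(ClusterName=hyperpod_cluster_name)
    return eks_cluster_name_from_arn(cluster["Orchestrator"]["Eks"]["ClusterArn"])
```

In the notebook you would call `resolve_eks_cluster_name(os.environ["HYPERPOD_CLUSTER_NAME"], os.environ["REGION"])` and pass the result to `aws eks update-kubeconfig --name` as before.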

## 2.0 Deploy JumpStart Model

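Create the JumpStartModel manifest and apply it. The heredoc delimiter is quoted (`'EOF'`), so the `${...}` placeholders are written to `jumpstart_model.yaml` literally; `envsubst` then fills them in from the environment variables set in section 1.0. Besides the model and instance type, the manifest enables metrics, allows up to 1800 seconds (30 minutes) for the deployment, and configures autoscaling on the endpoint's CloudWatch `Invocations` metric.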

In [None]:
%%bash
cat << 'EOF' > jumpstart_model.yaml
---
apiVersion: inference.sagemaker.aws.amazon.com/v1
kind: JumpStartModel
metadata:
  name: ${SAGEMAKER_ENDPOINT_NAME}
  namespace: ${CLUSTER_NAMESPACE}
spec:
  sageMakerEndpoint:
    name: ${SAGEMAKER_ENDPOINT_NAME}
  model:
    modelHubName: SageMakerPublicHub
    modelId: ${MODEL_ID}
  server:
    instanceType: ${INSTANCE_TYPE}
  metrics:
    enabled: true
  maxDeployTimeInSeconds: 1800
  autoScalingSpec:
    cloudWatchTrigger:
      name: "SageMaker-Invocations"
      namespace: "AWS/SageMaker"
      useCachedMetrics: false
      metricName: "Invocations"
      targetValue: 10
      minValue: 0.0
      metricCollectionPeriod: 30
      metricStat: "Sum"
      metricType: "Average"
      dimensions:
        - name: "EndpointName"
          value: "${SAGEMAKER_ENDPOINT_NAME}"
        - name: "VariantName"
          value: "AllTraffic"
EOF

# Substitute env vars and apply
envsubst < jumpstart_model.yaml > jumpstart_model_resolved.yaml
kubectl apply -f jumpstart_model_resolved.yaml
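
`envsubst` replaces any unset variable with an empty string, so a missing environment variable yields a manifest with, for example, an empty `metadata.name`. If you would rather fail early, the rendering step can be sketched in Python with `string.Template`, which understands the same `${NAME}` syntax. `render_manifest` is a hypothetical helper, and it assumes the template contains no other `$` characters (true for the manifest above).

```python
import os
import re
from string import Template
from typing import Mapping

# ${NAME} references, the only placeholder form used in jumpstart_model.yaml.
_VAR_REFERENCE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_manifest(template_text: str, env: Mapping[str, str] = os.environ) -> str:
    """Fill ${NAME} placeholders like envsubst, but raise on unset names."""
    missing = sorted(
        {name for name in _VAR_REFERENCE.findall(template_text) if name not in env}
    )
    if missing:
        raise KeyError(f"unset variables: {', '.join(missing)}")
    return Template(template_text).substitute(env)
```

To use it in place of the `envsubst` line, read `jumpstart_model.yaml`, pass its text to `render_manifest`, and write the result to `jumpstart_model_resolved.yaml` before running `kubectl apply`.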

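## 3.0 Verify Deployment

Inspect the JumpStartModel resource's status and events. Deployment is not instantaneous; the manifest allows up to 1800 seconds (`maxDeployTimeInSeconds`) for it to complete.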

In [None]:
!kubectl describe JumpStartModel $SAGEMAKER_ENDPOINT_NAME -n $CLUSTER_NAMESPACE

## 4.0 Invoke Model through ALB (Ingress)

The primary invocation path is the Application Load Balancer (ALB) behind the model's ingress, named `alb-<endpoint name>`. The ALB is not directly reachable from a local machine, so look up its hostname here and then send test requests from a debug pod inside the cluster.

In [None]:
%%bash
export INGRESS_URL=$(kubectl get ingress alb-$SAGEMAKER_ENDPOINT_NAME \
    -n $CLUSTER_NAMESPACE \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "Ingress URL: $INGRESS_URL"
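
The jsonpath query typically prints nothing until the load balancer has been provisioned, so `INGRESS_URL` can be blank on a first try. The sketch below reads the same `status.loadBalancer.ingress[0].hostname` field from `kubectl get ingress -o json` and returns `None` when it is not set yet, which makes it easy to poll. `ingress_hostname` and `get_ingress` are hypothetical names; `get_ingress` assumes `kubectl` is on `PATH` with the kubeconfig from section 1.1.

```python
import json
import subprocess
from typing import Any, Mapping, Optional


def ingress_hostname(ingress: Mapping[str, Any]) -> Optional[str]:
    """Return status.loadBalancer.ingress[0].hostname from an Ingress object, or None."""
    entries = ingress.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return entries[0].get("hostname") if entries else None


def get_ingress(name: str, namespace: str) -> Mapping[str, Any]:
    """Fetch an Ingress as parsed JSON via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "ingress", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

For this notebook the call would be `ingress_hostname(get_ingress("alb-" + os.environ["SAGEMAKER_ENDPOINT_NAME"], os.environ["CLUSTER_NAMESPACE"]))`.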

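Create a debug pod and invoke the model from within the cluster. Run these commands in a terminal rather than a notebook cell, because `kubectl run -it` needs an interactive shell. Variables exported in a `%%bash` cell are lost when that cell finishes, so first export `CLUSTER_NAMESPACE` and `INGRESS_URL` (the hostname printed above) in your terminal: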

```bash
kubectl run debug-pod --image=public.ecr.aws/amazonlinux/amazonlinux:latest \
    -n $CLUSTER_NAMESPACE --env="INGRESS_URL=$INGRESS_URL" --rm -it -- sh
```

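Inside the debug pod, send a chat request to the `/invocations` path. The `-k` flag makes curl skip TLS certificate verification, which is acceptable for a quick test like this one but not for production clients: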

```bash
curl -k -X POST https://${INGRESS_URL}/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "huggingface-llm-qwen2-5-7b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Hello! How are you?"
      }
    ]
  }'
```
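
The same request can be sent with Python's standard library, for example from any pod or host with Python that can reach the ALB. This is a sketch with hypothetical helper names. Like `curl -k`, `invoke` disables TLS certificate and hostname verification, so keep it to testing.

```python
import json
import ssl
import urllib.request


def build_invocation_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST /invocations request as the curl command above."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"https://{host}/invocations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request without TLS verification (the equivalent of curl -k) and parse the JSON reply."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be turned off before setting CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `invoke(build_invocation_request(host, "huggingface-llm-qwen2-5-7b-instruct", "Hello! How are you?"))`, where `host` is the ingress hostname printed above.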

## 5.0 (Optional) Invoke through SageMaker Endpoint

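The deployment also creates a SageMaker endpoint named by `spec.sageMakerEndpoint.name` (here `$SAGEMAKER_ENDPOINT_NAME`), so you can invoke the model through the SageMaker Runtime API as well. First confirm that the endpoint exists and that its status is `InService`.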

In [None]:
!aws sagemaker describe-endpoint --region $REGION --endpoint-name $SAGEMAKER_ENDPOINT_NAME --output table
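
`describe-endpoint` shows a single snapshot. To block until the endpoint is ready, boto3's `sagemaker` client also offers a built-in waiter (`get_waiter("endpoint_in_service")`); the loop below is a hand-written alternative whose default timeout matches the manifest's `maxDeployTimeInSeconds` of 1800. It is a sketch: `wait_for_in_service` is a hypothetical name, and `describe` is expected to be the client's `describe_endpoint` method, injected so the loop can be tried without AWS. It treats `Failed` as terminal and keeps polling on any other status.

```python
import time
from typing import Any, Callable, Mapping


def wait_for_in_service(
    describe: Callable[..., Mapping[str, Any]],
    endpoint_name: str,
    timeout_s: float = 1800,
    poll_s: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll describe(EndpointName=...) until EndpointStatus is InService."""
    waited = 0.0  # elapsed time, approximated by summing poll intervals
    while True:
        endpoint = describe(EndpointName=endpoint_name)
        status = endpoint["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = endpoint.get("FailureReason", "")
            raise RuntimeError(f"{endpoint_name} is Failed: {reason}")
        if waited >= timeout_s:
            raise TimeoutError(f"{endpoint_name} still {status} after ~{waited:.0f}s")
        sleep(poll_s)
        waited += poll_s
```

With boto3, the call would look like `wait_for_in_service(boto3.client("sagemaker", region_name=os.environ["REGION"]).describe_endpoint, os.environ["SAGEMAKER_ENDPOINT_NAME"])`.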

In [None]:
import json
import os

import boto3

# SageMaker Runtime client in the cluster's Region
runtime = boto3.client("sagemaker-runtime", region_name=os.environ["REGION"])

# Same chat-style request body as the ALB invocation above
payload = {
    "model": os.environ["MODEL_ID"],
    "messages": [{"role": "user", "content": "Hello! How are you?"}],
}

response = runtime.invoke_endpoint(
    EndpointName=os.environ["SAGEMAKER_ENDPOINT_NAME"],
    ContentType="application/json",
    Body=json.dumps(payload),
)

# The response body is a stream; read it once and parse the JSON
result = json.loads(response["Body"].read().decode("utf-8"))
print(json.dumps(result, indent=2))
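
The response schema depends on the serving container. Because the request uses the chat `messages` format, a reasonable assumption is an OpenAI-style chat completion, where the reply text sits at `choices[0].message.content`. The hypothetical helper below extracts it under that assumption and returns `None` otherwise, in which case inspect the printed JSON instead.

```python
from collections.abc import Mapping
from typing import Any, Optional


def extract_reply(result: Any) -> Optional[str]:
    """Return the assistant text from an OpenAI-style chat completion, or None.

    Assumes a body shaped like {"choices": [{"message": {"content": ...}}]}.
    """
    if not isinstance(result, Mapping):
        return None
    choices = result.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    content = message.get("content")
    return content if isinstance(content, str) else None
```

A typical use is `print(extract_reply(result) or json.dumps(result, indent=2))`, which falls back to the full JSON when the assumed schema does not match.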

## 6.0 Clean Up

Delete the JumpStartModel resource to remove the deployment and associated SageMaker endpoint.

In [None]:
!kubectl delete -f jumpstart_model_resolved.yaml