## Inference Operator PySDK E2E Experience

In [1]:
import sys
import warnings
import logging

logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)
warnings.filterwarnings("ignore")

sys.path.insert(0, '/Users/jzhaoqwa/Documents/GitHub/private-sagemaker-hyperpod-cli-staging/sagemaker-hyperpod/src/sagemaker')

<b>Prerequisite:</b> Before deploying, data scientists should list the available clusters and set the cluster context.

In [2]:
from hyperpod.hyperpod_manager import HyperPodManager
hyperpod_manager = HyperPodManager()

In [3]:
hyperpod_manager.list_clusters(region='us-east-2')

Orchestrator    Cluster Name
--------------  --------------
EKS             ml-cluster-c


In [4]:
# choose the HyperPod cluster to work on
hyperpod_manager.set_context('ml-cluster', region='us-east-2')

Updated context arn:aws:eks:us-west-2:728022909529:cluster/sagemaker-hyperpod-eks-cluster in /tmp/kubeconfig
Successfully set current cluster: ml-cluster


In [4]:
# verify current kube context
hyperpod_manager.get_context()

Current Eks context is: arn:aws:eks:us-east-2:637423555983:cluster/EKSClusterForInf-Beta2try1


### Create JumpStart model endpoint

#### Create from spec object (for experienced users)

In [5]:
from hyperpod.inference.config.hp_jumpstart_endpoint_config import Model, Server, SageMakerEndpoint, JumpStartModelSpec
from hyperpod.inference.jumpstart_model_endpoint import JumpStartModelEndpoint

sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /Users/jzhaoqwa/Library/Application Support/sagemaker/config.yaml


In [11]:
# create configs
model=Model(model_id='sklearn-regression-linear')
server=Server(instance_type='ml.t3.medium')
endpoint_name=SageMakerEndpoint(name='sklearn-regression-/!@#$%^&*()_-linear-endpoint')

# create spec
spec=JumpStartModelSpec(model=model, server=server, sage_maker_endpoint=endpoint_name)

In [12]:
# use spec to deploy
JumpStartModelEndpoint.create_from_spec(namespace='default', spec=spec)


Deploying model and endpoint using config:
 apiVersion: inference.sagemaker.aws.amazon.com/v1alpha1
kind: JumpStartModel
metadata:
  name: sklearn-regression-linear
  namespace: default
spec:
  maxDeployTimeInSeconds: 3600
  model:
    acceptEula: false
    modelHubName: SageMakerPublicHub
    modelId: sklearn-regression-linear
  replicas: 1
  sageMakerEndpoint:
    name: sklearn-regression-/!@#$%^&*()_-linear-endpoint
  server:
    instanceType: ml.t3.medium


Failed to deploy model and its endpoint: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'e4e15edc-d17c-4156-b447-1fa08b3349b3', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'f8f85dbb-f5aa-4a06-9c36-3d79feebdf24', 'X-Kubernetes-Pf-Prioritylevel-Uid': '5b6e12ee-b997-4f26-a5a6-fc672028f3e5', 'Date': 'Mon, 16 Jun 2025 20:42:05 GMT', 'Content-Length': '758'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"stat
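
The response body above is truncated, so the exact validation failure is not shown. One plausible cause is the endpoint name, which contains characters that SageMaker endpoint names do not allow. The sketch below shows a client-side pre-check, assuming the documented SageMaker `EndpointName` constraint (alphanumerics separated by hyphens, at most 63 characters). The helper `is_valid_endpoint_name` is hypothetical and not part of the SDK shown here.

```python
import re

# Assumption: mirrors the SageMaker CreateEndpoint `EndpointName` constraint
# (alphanumeric characters separated by hyphens, at most 63 characters).
_ENDPOINT_NAME_RE = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
_MAX_ENDPOINT_NAME_LEN = 63


def is_valid_endpoint_name(name: str) -> bool:
    """Hypothetical helper: True if `name` satisfies the assumed constraint."""
    if len(name) > _MAX_ENDPOINT_NAME_LEN:
        return False
    return _ENDPOINT_NAME_RE.fullmatch(name) is not None


# The name used by the successful-looking spec vs. the one used in the cell above.
print(is_valid_endpoint_name("sklearn-regression-linear-endpoint"))
print(is_valid_endpoint_name("sklearn-regression-/!@#$%^&*()_-linear-endpoint"))
```

Running a check like this before calling `create_from_spec` would surface a bad name locally instead of as a 422 from the Kubernetes API server.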

<b>Note:</b> The config classes used above (`Model`, `Server`, `SageMakerEndpoint` and `JumpStartModelSpec`) are auto-generated by a script from the [Inference CRD file](https://code.amazon.com/packages/AWSCrescendoInferenceOperator/blobs/mainline/--/dist/config/crd/inference.sagemaker.aws.amazon.com_jumpstartmodels.yaml).
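
To illustrate the idea of generating config classes from a CRD schema, here is a minimal sketch. It is not the actual generation script: the `SCHEMA` fragment is a hypothetical, trimmed-down stand-in for the CRD's OpenAPI schema, and `to_snake` and `render_class` are illustrative names. It does reproduce the camelCase to snake_case mapping visible above (`sageMakerEndpoint` becomes `sage_maker_endpoint`).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, trimmed-down schema fragment; the real CRD has more fields.
SCHEMA = {
    "Model": {
        "properties": {
            "acceptEula": {"type": "boolean"},
            "modelId": {"type": "string"},
        },
        "required": ["modelId"],
    },
}

_TYPE_MAP = {"string": "str", "boolean": "bool", "integer": "int", "number": "float"}


def to_snake(name: str) -> str:
    """Convert a camelCase property name to snake_case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def render_class(class_name: str, schema: dict) -> str:
    """Render dataclass source code for one schema object."""
    required = set(schema.get("required", []))
    lines = ["@dataclass", f"class {class_name}:"]
    # Required fields go first: dataclass fields without defaults
    # must precede fields that have defaults.
    props = sorted(schema["properties"].items(), key=lambda kv: kv[0] not in required)
    for prop, spec in props:
        py_type = _TYPE_MAP.get(spec.get("type"), "object")
        if prop in required:
            lines.append(f"    {to_snake(prop)}: {py_type}")
        else:
            lines.append(f"    {to_snake(prop)}: Optional[{py_type}] = None")
    return "\n".join(lines)


source = render_class("Model", SCHEMA["Model"])
print(source)

# Execute the generated source to confirm it defines a usable class.
namespace = {"__name__": "generated_config", "dataclass": dataclass, "Optional": Optional}
exec(source, namespace)
print(namespace["Model"](model_id="sklearn-regression-linear"))
```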

#### Quick create with required inputs only

The `create` method accepts only the required inputs directly, without a spec object. Internal validation ensures that the user cannot pass `spec` together with the individual inputs.
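
The mutual-exclusion check described here could look something like the following. This is a sketch under assumptions, not the SDK's actual implementation: the function name `validate_create_inputs`, its return values, and the error messages are all hypothetical.

```python
from typing import Optional


def validate_create_inputs(
    spec: Optional[object] = None,
    model_id: Optional[str] = None,
    instance_type: Optional[str] = None,
) -> str:
    """Hypothetical check: return which creation path applies.

    Returns "spec" when only a spec is given and "required_inputs" when
    only the individual inputs are given; raises ValueError otherwise.
    """
    has_inputs = model_id is not None or instance_type is not None
    if spec is not None and has_inputs:
        raise ValueError("Provide either `spec` or individual inputs, not both.")
    if spec is not None:
        return "spec"
    if model_id is None or instance_type is None:
        raise ValueError(
            "`model_id` and `instance_type` are both required when `spec` is not given."
        )
    return "required_inputs"


print(validate_create_inputs(model_id="sklearn-regression-linear", instance_type="ml.t3.medium"))
```

Failing fast with a clear message keeps ambiguous calls from reaching the cluster, where errors are harder to read.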

In [9]:
# create with required inputs
JumpStartModelEndpoint.create(
    namespace='default',
    model_id='sklearn-regression-linear',
    instance_type='ml.t3.medium',
)


Deploying model and endpoint using config:
 apiVersion: inference.sagemaker.aws.amazon.com/v1alpha1
kind: JumpStartModel
metadata:
  name: sklearn-regression-linear
  namespace: default
spec:
  maxDeployTimeInSeconds: 3600
  model:
    acceptEula: false
    modelHubName: SageMakerPublicHub
    modelId: sklearn-regression-linear
  replicas: 1
  sageMakerEndpoint:
    name: sklearn-regression-linear-250613-122217-491618
  server:
    instanceType: ml.t3.medium


Failed to deploy model and its endpoint: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'Audit-Id': '5a01f13a-2195-4d31-b91c-2382b2297aff', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'a63ff61d-2cb1-4dc6-b815-25dc200f4b2c', 'X-Kubernetes-Pf-Prioritylevel-Uid': '30e014a8-613f-408e-b39b-ce6221b3b1be', 'Date': 'Fri, 13 Jun 2025 19:22:18 GMT', 'Content-Length': '330'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure"
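
In the config above, the endpoint name appears to be the model ID followed by a timestamp suffix. A minimal sketch of how such a default name could be built, assuming a `%y%m%d-%H%M%S-%f` format inferred from the sample output; `default_endpoint_name` is a hypothetical helper, not an SDK function.

```python
from datetime import datetime
from typing import Optional


def default_endpoint_name(model_id: str, now: Optional[datetime] = None) -> str:
    """Hypothetical helper: append a timestamp suffix to the model ID."""
    now = now or datetime.now()
    # yymmdd-HHMMSS-microseconds, the format inferred from the output above
    return f"{model_id}-{now.strftime('%y%m%d-%H%M%S-%f')}"


fixed = datetime(2025, 6, 13, 12, 22, 17, 491618)
print(default_endpoint_name("sklearn-regression-linear", fixed))
# prints: sklearn-regression-linear-250613-122217-491618
```

Note that a unique endpoint name alone does not avoid the 409 shown above if the resource name in `metadata.name` (here the model ID) already exists in the namespace.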

#### Other operations

In [10]:
# list all deployed endpoints
JumpStartModelEndpoint.list_endpoints()

# kubectl get jumpstartmodels

METADATA NAME              CREATE TIME
-------------------------  --------------------
sklearn-regression-linear  2025-06-13T19:18:57Z


In [11]:
# describe deployed endpoints
JumpStartModelEndpoint.describe_endpoint(name='sklearn-regression-linear', namespace='default')

# kubectl describe jumpstartmodel sklearn-regression-linear

apiVersion: inference.sagemaker.aws.amazon.com/v1alpha1
kind: JumpStartModel
metadata:
  creationTimestamp: '2025-06-13T19:18:57Z'
  generation: 1
  name: sklearn-regression-linear
  namespace: default
  resourceVersion: '8415824'
  uid: 3c3da046-3c57-4e17-9f91-a2735e6cf470
spec:
  maxDeployTimeInSeconds: 3600
  model:
    acceptEula: false
    modelHubName: SageMakerPublicHub
    modelId: sklearn-regression-linear
  replicas: 1
  sageMakerEndpoint:
    name: sklearn-regression-linear-endpoint
  server:
    instanceType: ml.t3.medium



In [12]:
# delete the deployed model and endpoint
JumpStartModelEndpoint.delete_endpoint(name='sklearn-regression-linear', namespace='default')

Successful deleted model and endpoint!


In [None]:
# invoke endpoint
endpoint = JumpStartModelEndpoint.get_endpoint(endpoint_name='endpoint-name')

body = '{}'
endpoint.invoke(body)