## Inference Operator PySDK E2E Experience

<b>Prerequisite:</b> Data scientists should first list the available clusters and set the cluster context.

In [1]:
from sagemaker.hyperpod.hyperpod_manager import HyperPodManager
hyperpod_manager = HyperPodManager()

In [2]:
hyperpod_manager.list_clusters(region='us-east-2')

Orchestrator    Cluster Name
--------------  ----------------------------
EKS             hp-cluster-for-inf-Beta2try1


In [3]:
# choose the HyperPod cluster to work on
hyperpod_manager.set_context('hp-cluster-for-inf-Beta2try1', region='us-east-2')

Updated context arn:aws:eks:us-east-2:637423555983:cluster/EKSClusterForInf-Beta2try1 in /tmp/kubeconfig
Successfully set current cluster: hp-cluster-for-inf-Beta2try1


In [4]:
# verify current kube context
hyperpod_manager.get_context()

Current Eks context is: arn:aws:eks:us-east-2:637423555983:cluster/EKSClusterForInf-Beta2try1
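The context printed above is an EKS cluster ARN. As a minimal sketch, the ARN could be split into its region, account ID, and cluster name to double-check that the context points where you expect. The helper `parse_eks_cluster_arn` and the `EksClusterArn` tuple are hypothetical names used for illustration; they are not part of the SDK.

```python
from typing import NamedTuple


class EksClusterArn(NamedTuple):
    """Hypothetical container for the parts of an EKS cluster ARN."""
    region: str
    account_id: str
    cluster_name: str


def parse_eks_cluster_arn(arn: str) -> EksClusterArn:
    """Split an EKS cluster ARN into region, account ID, and cluster name."""
    # General ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "eks":
        raise ValueError(f"not an EKS ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "cluster" or not name:
        raise ValueError(f"not an EKS cluster ARN: {arn!r}")
    return EksClusterArn(region=parts[3], account_id=parts[4], cluster_name=name)


# ARN copied from the get_context() output above
context = parse_eks_cluster_arn(
    "arn:aws:eks:us-east-2:637423555983:cluster/EKSClusterForInf-Beta2try1"
)
print(context.region, context.cluster_name)
```

Note that the ARN carries the name of the underlying EKS cluster, which in the output above differs from the HyperPod cluster name passed to `set_context`.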


### Create JumpStart model endpoint

#### Create from spec object (for experienced users)

In [13]:
from sagemaker.hyperpod.inference.config.hp_jumpstart_endpoint_config import Model, Server, SageMakerEndpoint, JumpStartModelSpec, TlsConfig
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint

In [11]:
# create configs
model=Model(model_id='huggingface-eqa-bert-base-cased')
server=Server(instance_type='ml.c5.2xlarge')
endpoint_name=SageMakerEndpoint(name='huggingface-eqa-bert-base-cased')
tls_config=TlsConfig(tls_certificate_output_s3_uri='s3://jupiter-bucket-beta-3/')

# create spec
spec=JumpStartModelSpec(
    model=model,
    server=server,
    sage_maker_endpoint=endpoint_name,
    tls_config=tls_config,
)

In [12]:
# use spec to deploy
HPJumpStartEndpoint.create_from_spec(spec=spec)


Deploying model and endpoint using config:
 apiVersion: inference.sagemaker.aws.amazon.com/v1alpha1
kind: JumpStartModel
metadata:
  name: huggingface-eqa-bert-base-cased
  namespace: default
spec:
  maxDeployTimeInSeconds: 3600
  model:
    acceptEula: false
    modelHubName: SageMakerPublicHub
    modelId: huggingface-eqa-bert-base-cased
  replicas: 1
  sageMakerEndpoint:
    name: huggingface-eqa-bert-base-cased
  server:
    instanceType: ml.c5.2xlarge
  tlsConfig:
    tlsCertificateOutputS3Uri: s3://jupiter-bucket-beta-3/


Deploying model and its endpoint... The process may take a few minutes.


<b>Note:</b> The config classes used above, such as `Model`, `Server`, `SageMakerEndpoint`, and `JumpStartModelSpec`, are auto-generated by a script from the [Inference CRD file](https://code.amazon.com/packages/AWSCrescendoInferenceOperator/blobs/mainline/--/dist/config/crd/inference.sagemaker.aws.amazon.com_jumpstartmodels.yaml).
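The snake_case arguments passed to the config classes (for example `sage_maker_endpoint`, `tls_certificate_output_s3_uri`) appear as camelCase keys in the printed manifest (`sageMakerEndpoint`, `tlsCertificateOutputS3Uri`). The sketch below illustrates one way such a mapping could work, using plain dataclasses. `DemoModel`, `DemoTlsConfig`, `DemoSpec`, `to_camel`, and `to_manifest` are hypothetical stand-ins, not the generated SDK classes, and the SDK may implement this differently.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional


def to_camel(name: str) -> str:
    """Convert a snake_case field name to a camelCase manifest key."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_manifest(obj: Any) -> Any:
    """Recursively turn a dataclass instance into a dict with camelCase keys.

    Fields that are None are skipped so optional sections stay out of the result.
    """
    if is_dataclass(obj) and not isinstance(obj, type):
        return {
            to_camel(f.name): to_manifest(getattr(obj, f.name))
            for f in fields(obj)
            if getattr(obj, f.name) is not None
        }
    return obj


# Hypothetical stand-ins for the generated config classes
@dataclass
class DemoModel:
    model_id: str
    accept_eula: bool = False


@dataclass
class DemoTlsConfig:
    tls_certificate_output_s3_uri: str


@dataclass
class DemoSpec:
    model: DemoModel
    tls_config: Optional[DemoTlsConfig] = None


demo_spec = DemoSpec(
    model=DemoModel(model_id="huggingface-eqa-bert-base-cased"),
    tls_config=DemoTlsConfig(tls_certificate_output_s3_uri="s3://jupiter-bucket-beta-3/"),
)
manifest_spec = to_manifest(demo_spec)
print(manifest_spec)
```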

#### Quick create with required inputs only

In [None]:
# quick create with fewer inputs (cell not executed; the call is kept inside a string)
'''
JumpStartModelEndpoint.create(
    namespace='default',
    model_id='huggingface-eqa-bert-base-cased',
    instance_type='ml.c5.2xlarge',
)
'''
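To make the idea of "required inputs only" concrete, here is a rough sketch of the manifest such a quick-create call might produce from just a model ID and an instance type. The default values (`replicas: 1`, `maxDeployTimeInSeconds: 3600`, `acceptEula: false`, `modelHubName: SageMakerPublicHub`) are copied from the manifest printed earlier. The function `quick_manifest` is hypothetical, and reusing the model ID as the resource name is an assumption.

```python
from typing import Any, Dict


def quick_manifest(
    model_id: str, instance_type: str, namespace: str = "default"
) -> Dict[str, Any]:
    """Build a JumpStartModel manifest dict from the required inputs only.

    Hypothetical helper: the defaults mirror the manifest printed by
    create_from_spec above; the metadata name is assumed to be the model ID.
    """
    return {
        "apiVersion": "inference.sagemaker.aws.amazon.com/v1alpha1",
        "kind": "JumpStartModel",
        "metadata": {"name": model_id, "namespace": namespace},
        "spec": {
            "maxDeployTimeInSeconds": 3600,
            "model": {
                "acceptEula": False,
                "modelHubName": "SageMakerPublicHub",
                "modelId": model_id,
            },
            "replicas": 1,
            "server": {"instanceType": instance_type},
        },
    }


quick = quick_manifest(
    model_id="huggingface-eqa-bert-base-cased",
    instance_type="ml.c5.2xlarge",
)
print(quick["spec"]["server"])
```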

#### Other operations

In [13]:
# output is similar to kubectl get jumpstartmodels
HPJumpStartEndpoint.list_endpoints()

METADATA NAME                    CREATE TIME
-------------------------------  --------------------
huggingface-eqa-bert-base-cased  2025-06-19T23:21:48Z
sklearn-regression-linear-4      2025-06-19T23:00:36Z


In [9]:
# output is similar to kubectl describe jumpstartmodel huggingface-eqa-bert-base-cased
HPJumpStartEndpoint.describe_endpoint(name='huggingface-eqa-bert-base-cased', namespace='default')

apiVersion: inference.sagemaker.aws.amazon.com/v1alpha1
kind: JumpStartModel
metadata:
  creationTimestamp: '2025-06-19T23:02:04Z'
  finalizers:
  - inference.sagemaker.aws.JumpStartModelFinalizer
  generation: 1
  name: huggingface-eqa-bert-base-cased
  namespace: default
  resourceVersion: '4622285'
  uid: b7da864e-1d13-43c9-8b62-63fb1894d39e
spec:
  maxDeployTimeInSeconds: 3600
  model:
    acceptEula: false
    modelHubName: SageMakerPublicHub
    modelId: huggingface-eqa-bert-base-cased
  replicas: 1
  sageMakerEndpoint:
    name: sklearn-regression-linear-endpoint
  server:
    instanceType: ml.c5.2xlarge
  tlsConfig:
    tlsCertificateOutputS3Uri: s3://jupiter-bucket-beta-3/
status:
  conditions:
  - lastTransitionTime: '2025-06-19T23:04:54Z'
    message: Deployment or SageMaker endpoint registration creation for model is in
      progress
    reason: InProgress
    status: 'True'
    type: DeploymentInProgress
  - lastTransitionTime: '2025-06-19T23:07:28Z'
    message: Deployme
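The `status.conditions` list in the describe output is what tells you where a deployment stands. As a minimal sketch, assuming the described resource has been loaded into a plain Python dict (the actual return type of `describe_endpoint` is not shown here), the active conditions could be picked out as follows. `active_condition_types` is a hypothetical helper, and the sample dict contains only the one complete condition visible in the output above.

```python
from typing import Any, Dict, List


def active_condition_types(resource: Dict[str, Any]) -> List[str]:
    """Return the `type` of every condition whose status is the string 'True'."""
    conditions = resource.get("status", {}).get("conditions", [])
    return [c["type"] for c in conditions if c.get("status") == "True"]


# Only the first condition from the output above; the second one is truncated there
described = {
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2025-06-19T23:04:54Z",
                "reason": "InProgress",
                "status": "True",
                "type": "DeploymentInProgress",
            }
        ]
    }
}

active = active_condition_types(described)
print(active)
```

Note that Kubernetes-style conditions report `status` as the strings `'True'`, `'False'`, or `'Unknown'`, so the comparison is against a string rather than a boolean.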

In [10]:
HPJumpStartEndpoint.delete_endpoint(name='huggingface-eqa-bert-base-cased')

Successful deleted model and endpoint!


### Invoke endpoint

In [11]:
from sagemaker_core.resources import Endpoint

In [14]:
# get sagemaker_core Endpoint object
endpoint = HPJumpStartEndpoint.get_endpoint(
    endpoint_name='huggingface-eqa-bert-base-cased',
    region='us-east-2',
)

In [15]:
# invoke
endpoint.invoke(body='{"question" :"what is the name of the planet?","context" : "earth"}')
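Hand-written JSON strings are easy to get wrong once the question or context contains quotes or non-ASCII characters. A small sketch of building the same request body with the standard `json` module is shown below; the `question` and `context` keys come from the cell above, and the variable names are illustrative only.

```python
import json

# Same fields as the hand-written body in the cell above
payload = {
    "question": "what is the name of the planet?",
    "context": "earth",
}

# json.dumps handles quoting and escaping of the values
body = json.dumps(payload)
print(body)

# The string could then be passed on as in the cell above:
# endpoint.invoke(body=body)
```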