## Inference Operator PySDK E2E Experience (JumpStart model)

<b>Prerequisite:</b> Before deploying, list the available HyperPod clusters and set the kube context to the cluster you will work on.

In [5]:
from sagemaker.hyperpod.hyperpod_manager import HyperPodManager

In [16]:
HyperPodManager.list_clusters(region='us-east-2')

Orchestrator    Cluster Name
--------------  ----------------------------
EKS             hp-cluster-for-inf-Beta2try1


In [17]:
# choose the HyperPod cluster to work on
HyperPodManager.set_context('hp-cluster-for-inf-Beta2try1', region='us-east-2')

Updated context arn:aws:eks:us-east-2:637423555983:cluster/EKSClusterForInf-Beta2try1 in /tmp/kubeconfig
Successfully set current cluster as: hp-cluster-for-inf-Beta2try1


In [6]:
# verify current kube context
HyperPodManager.get_context()

'arn:aws:eks:us-east-2:637423555983:cluster/EKSClusterForInf-Beta2try1'
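`get_context()` returns the underlying EKS cluster ARN rather than the HyperPod cluster name: here HyperPod cluster `hp-cluster-for-inf-Beta2try1` is backed by EKS cluster `EKSClusterForInf-Beta2try1`. If you want to check programmatically that you are on the intended cluster, a small helper that splits the ARN can be handy. This is a sketch, not part of the SDK; `parse_eks_cluster_arn` and `EksClusterArn` are hypothetical names, and it assumes the standard `arn:<partition>:eks:<region>:<account>:cluster/<name>` layout.

```python
from typing import NamedTuple


class EksClusterArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    cluster_name: str


def parse_eks_cluster_arn(arn: str) -> EksClusterArn:
    """Split an ARN of the form arn:<partition>:eks:<region>:<account>:cluster/<name>."""
    # maxsplit=5 keeps the resource part ("cluster/<name>") in one piece
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "eks":
        raise ValueError(f"not an EKS ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "cluster" or not name:
        raise ValueError(f"not an EKS cluster ARN: {arn!r}")
    return EksClusterArn(parts[1], parts[3], parts[4], name)


# Usage: pass the value returned by HyperPodManager.get_context()
ctx = "arn:aws:eks:us-east-2:637423555983:cluster/EKSClusterForInf-Beta2try1"
print(parse_eks_cluster_arn(ctx).cluster_name)  # EKSClusterForInf-Beta2try1
```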

### Create JumpStart model endpoint

#### Create from spec object (for experienced users)

In [1]:
from sagemaker.hyperpod.inference.config.hp_jumpstart_endpoint_config import Model, Server, SageMakerEndpoint, TlsConfig, EnvironmentVariables
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint

In [2]:
# create the config objects
model=Model(
    model_id='deepseek-llm-r1-distill-qwen-1-5b',
    model_version='2.0.4',
)
server=Server(
    instance_type='ml.g5.8xlarge',
)
sagemaker_endpoint=SageMakerEndpoint(name='deepsek7bsme-testing-zhaoqi-0627-jumpstart')
# S3 location where the TLS certificate is written
tls_config=TlsConfig(tls_certificate_output_s3_uri='s3://tls-bucket-inf1-beta2')

# combine the configs into an endpoint spec
js_endpoint=HPJumpStartEndpoint(
    model=model,
    server=server,
    sage_maker_endpoint=sagemaker_endpoint,
    tls_config=tls_config,
)
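The name passed to `SageMakerEndpoint` becomes the SageMaker endpoint name, and the SageMaker `CreateEndpoint` API documents `EndpointName` as at most 63 characters matching `^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}`: letters, digits, and hyphens, starting and ending with a letter or digit. Checking the name before calling `create()` fails faster than waiting for a deployment error. A minimal sketch, assuming those are the only constraints that matter here; `is_valid_sagemaker_endpoint_name` is a hypothetical helper, not an SDK function, and the operator may impose further rules.

```python
import re

# EndpointName pattern and length limit as documented for SageMaker CreateEndpoint
_ENDPOINT_NAME_RE = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
_MAX_LEN = 63


def is_valid_sagemaker_endpoint_name(name: str) -> bool:
    """True if name has at most 63 chars of [a-zA-Z0-9-] and starts/ends alphanumeric."""
    # The regex alone does not cap length (hyphen runs can repeat), so check it separately
    return len(name) <= _MAX_LEN and _ENDPOINT_NAME_RE.fullmatch(name) is not None


print(is_valid_sagemaker_endpoint_name("deepsek7bsme-testing-zhaoqi-0627-jumpstart"))  # True
print(is_valid_sagemaker_endpoint_name("my_endpoint"))  # False: underscores are not allowed
```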

In [3]:
# use spec to deploy
js_endpoint.create()


Deploying model and endpoint using config:
 apiVersion: inference.sagemaker.aws.amazon.com/v1alpha1
kind: JumpStartModel
metadata:
  name: deepseek-llm-r1-distill-qwen-1-5b
  namespace: default
spec:
  maxDeployTimeInSeconds: 3600
  model:
    acceptEula: false
    modelHubName: SageMakerPublicHub
    modelId: deepseek-llm-r1-distill-qwen-1-5b
    modelVersion: 2.0.4
  replicas: 1
  sageMakerEndpoint:
    name: deepsek7bsme-testing-zhaoqi-0627-jumpstart
  server:
    instanceType: ml.g5.8xlarge
  tlsConfig:
    tlsCertificateOutputS3Uri: s3://tls-bucket-inf1-beta2


Deploying model and its endpoint... The process may take a few minutes.


In [16]:
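# re-fetch the latest spec and status from the cluster
js_endpoint.refresh()

In [30]:
# check the SageMaker endpoint state; 'CreationCompleted' means the endpoint has been created
js_endpoint.status.endpoints.sagemaker.state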

'CreationCompleted'
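Deployment takes several minutes, so rather than re-running `refresh()` by hand you can poll until the state reaches `'CreationCompleted'`. The sketch below is a generic poller that takes the refresh and state-reading steps as callables, so it only relies on the `refresh()` method and `status.endpoints.sagemaker.state` attribute shown above. `wait_for_state` is a hypothetical helper; the default timeout mirrors `maxDeployTimeInSeconds: 3600` from the deployed config, and it does not detect failure states because their names are not shown here.

```python
import time
from typing import Callable


def wait_for_state(
    refresh: Callable[[], None],
    get_state: Callable[[], str],
    target: str = "CreationCompleted",
    timeout_s: float = 3600,
    poll_s: float = 30,
) -> str:
    """Call refresh() then get_state() until it returns target or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        refresh()
        state = get_state()
        if state == target:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out; last observed state: {state!r}")
        time.sleep(poll_s)


# Usage in the notebook (assumes status.endpoints.sagemaker is populated once deployment starts):
# wait_for_state(js_endpoint.refresh, lambda: js_endpoint.status.endpoints.sagemaker.state)
```

Passing callables keeps the poller independent of the SDK object, which also makes it easy to exercise with a fake status source.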

In [17]:
# print refreshed config
import yaml
print(yaml.dump(js_endpoint.model_dump(exclude_none=True)))

maxDeployTimeInSeconds: 3600
model:
  acceptEula: false
  modelHubName: SageMakerPublicHub
  modelId: deepseek-llm-r1-distill-qwen-1-5b
  modelVersion: 2.0.4
namespace: default
replicas: 1
sageMakerEndpoint:
  name: deepsek7bsme-testing-zhaoqi-0627-jumpstart
server:
  instanceType: ml.g5.8xlarge
status:
  conditions:
  - lastTransitionTime: '2025-06-28T23:50:57Z'
    message: Deployment, ALB Creation or SageMaker endpoint registration creation
      for model is in progress
    reason: InProgress
    status: 'True'
    type: DeploymentInProgress
  - lastTransitionTime: '2025-06-28T23:56:01Z'
    message: Deployment and SageMaker endpoint registration for model have been created
      successfully
    reason: Success
    status: 'True'
    type: DeploymentComplete
  deploymentStatus:
    deploymentObjectOverallState: DeploymentComplete
    lastUpdated: '2025-06-28T23:56:02Z'
    name: deepseek-llm-r1-distill-qwen-1-5b
    reason: NativeDeploymentObjectFound
    status:
      availableRe
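In the refreshed status, earlier conditions keep `status: 'True'` after later ones appear: both `DeploymentInProgress` and `DeploymentComplete` are `'True'` in the output. To know where the deployment currently stands, take the most recent `'True'` condition. A minimal sketch over the dictionary produced by `model_dump(exclude_none=True)`; `latest_true_condition` is a hypothetical helper, and it assumes `lastTransitionTime` values all use the same `YYYY-MM-DDTHH:MM:SSZ` UTC format, so string order equals time order.

```python
from typing import Any, Dict, Optional


def latest_true_condition(dumped: Dict[str, Any]) -> Optional[str]:
    """Return the `type` of the most recent condition whose status is 'True'."""
    conditions = dumped.get("status", {}).get("conditions") or []
    active = [c for c in conditions if c.get("status") == "True"]
    if not active:
        return None
    # Same-format UTC timestamps ('...Z') sort lexicographically in time order
    return max(active, key=lambda c: c["lastTransitionTime"])["type"]


# Usage in the notebook:
# latest_true_condition(js_endpoint.model_dump(exclude_none=True))
```

With the status shown above, this returns `'DeploymentComplete'`.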

<b>Note:</b> The config classes used above, such as `Model`, `Server`, and `SageMakerEndpoint`, are auto-generated by a script from the [Inference CRD file](https://code.amazon.com/packages/AWSCrescendoInferenceOperator/blobs/mainline/--/dist/config/crd/inference.sagemaker.aws.amazon.com_jumpstartmodels.yaml).
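The generated classes take snake_case keyword arguments (`sage_maker_endpoint`, `tls_certificate_output_s3_uri`), while the CRD and the `model_dump()` output use camelCase (`sageMakerEndpoint`, `tlsCertificateOutputS3Uri`). When translating a field from the CRD YAML into Python, the mapping appears to be a plain snake_case/camelCase conversion. The sketch below assumes exactly that; `to_camel` and `to_snake` are hypothetical helpers, and names with consecutive capitals (acronyms) would not round-trip cleanly.

```python
import re


def to_camel(snake: str) -> str:
    """sage_maker_endpoint -> sageMakerEndpoint"""
    first, *rest = snake.split("_")
    return first + "".join(word[:1].upper() + word[1:] for word in rest)


def to_snake(camel: str) -> str:
    """sageMakerEndpoint -> sage_maker_endpoint"""
    # Insert "_" before every uppercase letter except at position 0, then lowercase
    return re.sub(r"(?<!^)(?=[A-Z])", "_", camel).lower()


for field in ["sage_maker_endpoint", "tls_certificate_output_s3_uri", "model_version"]:
    print(field, "->", to_camel(field))
```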

In [18]:
# output is similar to kubectl get jumpstartmodels
endpoint_list = HPJumpStartEndpoint.list()

In [19]:
# output is similar to kubectl describe jumpstartmodel deepseek-llm-r1-distill-qwen-1-5b
endpoint = HPJumpStartEndpoint.get(name='deepseek-llm-r1-distill-qwen-1-5b')
endpoint

HPJumpStartEndpoint(autoScalingSpec=AutoScalingSpec(cloudWatchTrigger=CloudWatchTrigger(dimensions=None, metricCollectionPeriod=300, metricCollectionStartTime=300, metricName=None, metricStat='Average', metricType='Average', minValue=0.0, name=None, namespace=None, targetValue=None, useCachedMetrics=True), cooldownPeriod=300, initialCooldownPeriod=300, maxReplicaCount=5, minReplicaCount=1, pollingInterval=30, prometheusTrigger=PrometheusTrigger(customHeaders=None, metricType='Average', name=None, namespace=None, query=None, serverAddress=None, targetValue=None, useCachedMetrics=True), scaleDownStabilizationTime=300, scaleUpStabilizationTime=0), environmentVariables=None, maxDeployTimeInSeconds=3600, metrics=None, model=Model(acceptEula=False, additionalConfigs=None, gatedModelDownloadRole=None, modelHubName='SageMakerPublicHub', modelId='deepseek-llm-r1-distill-qwen-1-5b', modelVersion='2.0.4'), replicas=1, sageMakerEndpoint=SageMakerEndpoint(name='deepsek7bsme-testing-zhaoqi-0627-jump

### Invoke endpoint

In [20]:
# invoke the endpoint with a JSON request body
data='{"inputs":"What is the capital of USA?"}'

endpoint.invoke(body=data).body.read()

b'{"generated_text": " What is the capital of France? What is the capital of Japan? What is the capital of China? What is the capital of Germany? What is"}'

### Delete endpoint

In [31]:
# delete the endpoint once you are done with it
endpoint.delete()

Deleting model and its endpoint...
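The invoke cell hand-writes the JSON request body and returns raw bytes. Building the body with `json.dumps` avoids quoting mistakes when the prompt contains quotes or newlines, and `json.loads` turns the response into a dict. A minimal sketch, assuming the request and response shapes shown in the invoke output (`{"inputs": ...}` in, `{"generated_text": ...}` out); the optional `parameters` object (for example `max_new_tokens`) is an assumption about the serving container, and `build_payload` and `generated_text` are hypothetical helpers.

```python
import json


def build_payload(prompt: str, **parameters) -> str:
    """Serialize {"inputs": prompt}, plus an optional "parameters" object, as JSON."""
    body = {"inputs": prompt}
    if parameters:
        # Assumption: the container accepts a Hugging Face-style "parameters" object
        body["parameters"] = parameters
    return json.dumps(body)


def generated_text(raw: bytes) -> str:
    """Decode the response bytes and return the "generated_text" field."""
    return json.loads(raw)["generated_text"]


# Usage in the notebook (before deleting the endpoint):
# resp = endpoint.invoke(body=build_payload("What is the capital of USA?"))
# print(generated_text(resp.body.read()))
```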