# SageMaker JumpStart - deploy text generation model

This notebook demonstrates how to use the SageMaker Python SDK to deploy a SageMaker JumpStart text generation model and invoke the endpoint.

In [14]:
from sagemaker.jumpstart.model import JumpStartModel

Select your desired model ID. You can search for available models in the [Built-in Algorithms with pre-trained Model Table](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html).

In [15]:
model_id = "meta-textgeneration-llama-2-13b"

If your selected model is gated, you will need to set `accept_eula` to True to accept the model end-user license agreement (EULA).

In [24]:
accept_eula = True

## Deploy model

Using the model ID, define your model as a JumpStart model. You can deploy the model on other instance types by passing `instance_type` to `JumpStartModel`. See [Deploy publicly available foundation models with the JumpStartModel class](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-use-python-sdk.html#jumpstart-foundation-models-use-python-sdk-model-class) for more configuration options.

In [25]:
model = JumpStartModel(model_id=model_id)
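As a minimal sketch of the `instance_type` option mentioned above: the instance type below is an assumption chosen for illustration, not a recommendation, so check that it is supported for your model and that your account has quota for it. The `build_model` helper is hypothetical; it imports the SDK lazily so the settings can be defined and inspected without it.

```python
# Settings for the JumpStart model. The instance type is an illustrative
# assumption; verify it is supported for this model before using it.
model_kwargs = {
    "model_id": "meta-textgeneration-llama-2-13b",
    "instance_type": "ml.g5.24xlarge",
}


def build_model(**kwargs):
    """Hypothetical helper: construct a JumpStartModel from keyword settings."""
    # Imported here so the settings above can be defined without the SDK installed.
    from sagemaker.jumpstart.model import JumpStartModel

    return JumpStartModel(**kwargs)


# model = build_model(**model_kwargs)  # requires AWS credentials and the SageMaker SDK
```

Keeping the settings in a dictionary makes it easy to switch instance types without editing the cell that builds the model.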

You can now deploy your JumpStart model. The deployment might take a few minutes.

In [26]:
predictor = model.deploy(accept_eula=accept_eula)

ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateEndpoint operation: The account-level service limit 'ml.g5.12xlarge for endpoint usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota.
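The error above reports that the account quota for `ml.g5.12xlarge` endpoint usage is 0, so the endpoint was never created. The following sketch pulls the instance type and limit out of such a message so you know which quota to request in Service Quotas. `parse_quota_error` is a hypothetical helper, and the regular expression assumes the message wording shown above.

```python
import re

# Pattern assumes the wording of the ResourceLimitExceeded message shown above.
_QUOTA_PATTERN = re.compile(
    r"service limit '(?P<instance>\S+) for (?P<usage>[^']+)' is (?P<limit>\d+) Instances"
)


def parse_quota_error(message):
    """Hypothetical helper: return (instance_type, usage, limit) or None."""
    match = _QUOTA_PATTERN.search(message)
    if match is None:
        return None
    return match["instance"], match["usage"], int(match["limit"])


message = (
    "The account-level service limit 'ml.g5.12xlarge for endpoint usage' is "
    "0 Instances, with current utilization of 0 Instances and a request delta "
    "of 1 Instances."
)
print(parse_quota_error(message))
```

A limit of 0 means no instance of that type can be used for endpoints until the quota is raised.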

## Invoke endpoint

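Example payloads can be retrieved programmatically from the `JumpStartModel` object. The cell below instead sends a hand-written payload through the boto3 SageMaker runtime client.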

In [20]:
import json
import logging

import boto3

logger = logging.getLogger(__name__)

# Low-level runtime client used to invoke deployed endpoints.
sm_runtime = boto3.client("sagemaker-runtime")


def call_sagemaker_endpoint(endpoint_name):
    """Send a fixed prompt to the endpoint; return the parsed JSON, or None on failure."""
    payload = json.dumps({
        "inputs": "who is the GOAT in tamil cinema",
        "parameters": {"max_new_tokens": 500, "top_p": 0.9, "temperature": 0.6, "details": True},
    })
    try:
        response = sm_runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=payload,
        )
        return json.loads(response["Body"].read().decode())
    except Exception as e:
        logger.error(f"Error calling SageMaker endpoint: {e}")
        return None


# The endpoint name is left empty here; set it to the name of your deployed endpoint.
response = call_sagemaker_endpoint("")
if response is not None:
    # The result may arrive as a one-element list; unwrap it before reading the text.
    response = response[0] if isinstance(response, list) else response
    print("Output:\n", response["generated_text"].strip(), end="\n\n\n")
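The payload construction and response handling in the cell above can be separated from the network call, which makes them easy to check without a live endpoint. This is only a sketch: `build_payload` and `extract_generated_text` are hypothetical helpers, the defaults are copied from the cell above, and the response shape (a dict, or a list of dicts, with a `generated_text` key) is assumed from that code.

```python
import json


def build_payload(prompt, max_new_tokens=500, top_p=0.9, temperature=0.6):
    """Hypothetical helper: serialize a prompt and generation parameters to JSON."""
    return json.dumps({
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "top_p": top_p,
            "temperature": temperature,
        },
    })


def extract_generated_text(result):
    """Hypothetical helper: return the generated text, or None if it is absent."""
    if isinstance(result, list):
        result = result[0] if result else None
    if not isinstance(result, dict):
        return None
    text = result.get("generated_text")
    return text.strip() if isinstance(text, str) else None


# A stand-in response, not real model output.
fake_result = [{"generated_text": "  example text  "}]
print(extract_generated_text(fake_result))
```

Returning `None` for a missing or malformed result avoids the `TypeError` that indexing a failed call's `None` return value would raise.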


## Clean up

In [None]:
#predictor.delete_predictor()
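
If the deployment failed, as it did above, `predictor` may never have been defined, and running the cleanup line unguarded would raise a `NameError`. A minimal sketch of a guarded cleanup follows; `safe_cleanup` is a hypothetical helper that only assumes the object exposes the `delete_predictor()` method used in the cell above.

```python
def safe_cleanup(predictor):
    """Hypothetical helper: delete the endpoint if a predictor exists.

    Returns True if delete_predictor() was called, False otherwise.
    """
    if predictor is None:
        return False
    predictor.delete_predictor()
    return True


# globals().get avoids a NameError when the deploy cell never succeeded.
safe_cleanup(globals().get("predictor"))
```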