# Run Model Inference on a Fine-Tuned Model Endpoint

Once a model is deployed as a SageMaker endpoint, you can test inference with the `sagemaker.Predictor` class, which takes text as input and handles request serialization and response deserialization for you.

In [None]:
import sagemaker
from datasets import load_dataset
from random import randrange
from sagemaker import serializers, deserializers

In [None]:
sess = sagemaker.Session()

## Sample Dataset

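We need a sample dataset to test model inference. The cells below load a small slice of the `databricks/databricks-dolly-15k` dataset and format one record as a prompt.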

In [None]:
def format_dolly(sample, incl_answer=True):
    """Format a Dolly record as a prompt.

    Returns the prompt string when incl_answer is True; otherwise returns
    a (prompt, reference_answer) tuple with the answer left out of the prompt.
    """
    instruction = f"### Instruction\n{sample['instruction']}"
    context = f"### Context\n{sample['context']}" if len(sample["context"]) > 0 else None
    response = f"### Answer\n{sample['response']}" if incl_answer else None
    # Join the parts that are present, separated by blank lines
    prompt = "\n\n".join([i for i in [instruction, context, response] if i is not None])

    if not incl_answer:
        return prompt, sample['response']
    return prompt

In [None]:
inference_dataset = load_dataset("databricks/databricks-dolly-15k", split="train[15%:17%]")

In [None]:
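# Build the prompt without the answer and keep the reference answer separately
sample_query, gt_answer = format_dolly(inference_dataset[0], incl_answer=False)
# End the prompt with an empty answer header so the model generates the answer
sample_query = sample_query + "\n\n### Answer"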

In [None]:
print(sample_query)
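
To make the prompt layout concrete without downloading the dataset, here is a minimal sketch that mirrors the formatting steps above on a made-up record. The record and the helper name `build_prompt` are hypothetical and only illustrate the expected structure. They are not part of the notebook or the Dolly dataset.

```python
# Hypothetical record that mimics the fields of a Dolly sample (illustration only).
sample = {
    "instruction": "What is the capital of France?",
    "context": "",
    "response": "Paris",
}


def build_prompt(sample):
    """Mirror format_dolly(sample, incl_answer=False) plus the trailing answer header."""
    parts = [f"### Instruction\n{sample['instruction']}"]
    # The context section is skipped when the record has no context.
    if len(sample["context"]) > 0:
        parts.append(f"### Context\n{sample['context']}")
    # An empty answer header cues the model to produce the answer.
    parts.append("### Answer")
    return "\n\n".join(parts)


prompt = build_prompt(sample)
print(prompt)
```

Because the record has an empty `context`, the prompt contains only the instruction section followed by the empty answer header.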

## Run Prediction

To run inference, create an instance of the `sagemaker.Predictor` class that points at the deployed endpoint.

In [None]:
predictor = sagemaker.Predictor(
    endpoint_name="ft-meta-llama2-7b-chat-tg-ep",
    sagemaker_session=sess,
    serializer=serializers.JSONSerializer(),
    deserializer=deserializers.JSONDeserializer(),
)

In [None]:
response = predictor.predict(
    {
        "inputs": sample_query,
        "parameters": {"temperature": 0.6, "max_new_tokens": 256}
    }
)
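
The request in the cell above is a plain Python dict that the JSON serializer turns into the request body. As a rough sketch of that step, the snippet below builds the same payload shape with a hypothetical helper, `build_payload`, and encodes it with the standard library `json` module. The exact bytes sent by the SDK may differ in detail, so treat this as an approximation.

```python
import json


def build_payload(prompt, temperature=0.6, max_new_tokens=256):
    """Assemble a request dict in the same shape used in the cell above."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {
        "inputs": prompt,
        "parameters": {"temperature": temperature, "max_new_tokens": max_new_tokens},
    }


# Made-up prompt for illustration; in the notebook this would be sample_query.
payload = build_payload("### Instruction\nSay hello.\n\n### Answer")
# Approximately what a JSON serializer would produce as the request body.
body = json.dumps(payload)
print(body)
```

Keeping payload construction in one small function makes it easier to try different generation parameters against the same prompt.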

In [None]:
print(sample_query + "\n" + response['generated_text'])
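
The cell above assumes the deserialized response is a dict with a `generated_text` key. The response shape depends on the serving container, and some text-generation containers return a one-element list of such dicts instead. The following is a minimal sketch of a helper that accepts either shape. The name `extract_generated_text` is hypothetical, and the payloads shown are stand-ins rather than real endpoint output.

```python
def extract_generated_text(response):
    """Return the generated text from a dict or from the first item of a list of dicts."""
    if isinstance(response, list):
        if not response:
            raise ValueError("empty response list")
        response = response[0]
    if not isinstance(response, dict) or "generated_text" not in response:
        raise ValueError(f"unexpected response shape: {response!r}")
    return response["generated_text"]


# Stand-in payloads for illustration; real output comes from predictor.predict(...).
print(extract_generated_text({"generated_text": " Paris"}))
print(extract_generated_text([{"generated_text": " Paris"}]))
```

If your endpoint uses a different key for the generated text, adjust the helper to match what it actually returns.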