# Demo: Deploy Models Locally with SageMaker Model Builder in IN_PROCESS Mode

This notebook was tested with the `Python 3` kernel on an Amazon SageMaker notebook instance of type `ml.g5.4xlarge`.

In this notebook, we demonstrate how to deploy a model locally to a FastAPI server without setting up a container. Running the model in-process allows quicker validation and faster iteration. After successful in-process testing, you can switch to local container mode or SageMaker endpoint mode for further testing.

You can either launch this notebook from an Amazon SageMaker notebook instance, which handles all credentials automatically, or run it locally and set credentials manually.
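
If you run the notebook locally, a quick check like the one below can show whether AWS settings are visible to the kernel. This is a minimal sketch that only inspects the standard AWS environment variables. The helper name `missing_aws_env_vars` is hypothetical, and credentials supplied another way (for example a shared credentials file or an instance role) will not show up here, so a "missing" result is a hint rather than an error.

```python
import os

# Standard AWS environment variable names; their values are never printed.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(env=None):
    """Return the names of the AWS variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Not set in the environment:", ", ".join(missing))
else:
    print("AWS credentials and region found in the environment.")
```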


In [None]:
!pip install sagemaker

In [None]:
# Install the packages needed to run the FastAPI server
!pip install --quiet torch transformers fastapi uvicorn nest-asyncio "protobuf==4.23.0"
!pip install -U pyopenssl

# [WalkThrough] Define the custom inference code
Just tell us how to load your model and how to invoke it. We'll take care of the rest.

In [None]:
from sagemaker.serve.spec.inference_spec import InferenceSpec
from transformers import pipeline
import json


class MyInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        return pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

    def invoke(self, input_data, model):
        if isinstance(input_data, str):
            input_data = json.loads(input_data)
        response = model(question=input_data["question"], context=input_data["context"])
        return response


inf_spec = MyInferenceSpec()
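
Before starting a server, it can help to check the input handling of `invoke` on its own. The sketch below mirrors the `invoke` logic above as a plain function and swaps in a hypothetical stand-in model (`fake_qa_model`) for the Hugging Face pipeline, so it runs without downloading any weights. It only shows that a dict and its JSON string form are treated the same way. It says nothing about the real model's answers.

```python
import json


def fake_qa_model(question, context):
    """Hypothetical stand-in for the pipeline: returns the first sentence of the context."""
    first_sentence = context.split(".")[0].strip() + "."
    return {"answer": first_sentence}


def invoke(input_data, model):
    """Same input handling as MyInferenceSpec.invoke: accept a JSON string or a dict."""
    if isinstance(input_data, str):
        input_data = json.loads(input_data)
    return model(question=input_data["question"], context=input_data["context"])


payload = {
    "question": "What is the demo about?",
    "context": "The demo is about SageMaker. It went well.",
}

from_dict = invoke(payload, fake_qa_model)
from_json = invoke(json.dumps(payload), fake_qa_model)

# Both call styles should produce the same result.
print(from_dict == from_json)
```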

# [WalkThrough] Start the IN_PROCESS mode server

In [None]:
from sagemaker.serve import Mode
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

# SchemaBuilder takes a sample input (first argument) and a sample output (second argument)
schema = SchemaBuilder(
    {
        "context": "The demo is focused on SageMaker and machine learning. It has gone well so far, with no major issues, and the participants are engaged.",
        "question": "What is the demo about?"
    },
    {
        "answer": "SageMaker and machine learning."
    }
)

# Deploy the model to a FastAPI server with minimal input from the user
predictor = ModelBuilder(
    inference_spec=inf_spec,
    schema_builder=schema,
    mode=Mode.IN_PROCESS,  # you can change it to Mode.LOCAL_CONTAINER for local container testing
).build().deploy()

# [WalkThrough] Now that the server is running, send a prompt and see the response

In [None]:
# Define input data for the question-answering model
input_data = {
    "question": "What is the main topic?",
    "context": "The demo is focused on SageMaker and machine learning. It has gone well so far, with no major issues, and the participants are engaged."
}

# Pass the input data to `predict`
response = predictor.predict(input_data)

# Check the model's response
print(response)
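
To use only the answer text, a small helper along these lines may be convenient. It is a sketch that assumes the response carries an `answer` field, as the question-answering pipeline output does, and it accepts a dict, a JSON string, or JSON bytes because the exact type returned by `predictor.predict` in this mode is not confirmed here. The helper name `extract_answer` and the sample response are hypothetical.

```python
import json


def extract_answer(response):
    """Return the 'answer' field from a dict, JSON string, or JSON bytes response."""
    if isinstance(response, (bytes, bytearray)):
        response = response.decode("utf-8")
    if isinstance(response, str):
        response = json.loads(response)
    return response["answer"]


# Made-up response for illustration only; not output captured from the model.
sample_response = {"score": 0.5, "start": 0, "end": 9, "answer": "SageMaker"}
print(extract_answer(sample_response))
```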

# [WalkThrough] Clean up the server

In [None]:
predictor.delete_predictor()

---
# Now try it out for yourself


### Your custom load and invoke logic here

In [None]:
class MyInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        # your load logic here <---
        pass

    def invoke(self, input_data, model):
        # your invoke logic here <---
        pass

inf_spec = MyInferenceSpec()

### Now deploy it

In [None]:
from sagemaker.serve import Mode
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

schema = SchemaBuilder(
    {},
    {}
)

predictor = ModelBuilder(
    inference_spec=inf_spec,
    schema_builder=schema,
    mode=Mode.IN_PROCESS,
).build().deploy()

### Now invoke it

In [None]:
input_data = {}  # your input data here <---

response = predictor.predict(input_data)

response