# Demo: Deploy Models Locally with SageMaker Model Builder in IN_PROCESS Mode

This notebook was tested with the `Python 3` kernel on an Amazon SageMaker notebook instance of type `ml.g5.4xlarge`.

This notebook demonstrates how to deploy a model locally to a FastAPI server without setting up a container. Running in-process lets you validate the model and iterate quickly before moving on to local container mode or SageMaker endpoint mode for further testing.

You can launch this notebook from an Amazon SageMaker notebook instance, which handles all credentials automatically, or run it locally and set credentials manually.
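When running locally, one common way to set credentials manually is through the standard AWS environment variables. The following is a minimal sketch, assuming that approach; `missing_aws_env_vars` is a hypothetical helper, and credentials supplied another way (for example, a shared credentials file) would not be detected by it.

```python
import os

# Standard environment variables read by the AWS SDKs.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(env=None):
    """Return the names of the standard AWS variables that are unset or empty.

    Hypothetical helper for illustration; it only inspects environment variables.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Not set in this environment:", ", ".join(missing))
else:
    print("All standard AWS environment variables are set.")
```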

***

The notebook is accompanied by the following files:
- `sagemaker-2.232.4.dev0-py3-none-any.whl`: The wheel file containing all of the SageMaker Python SDK changes for these features.

In [1]:
!pip install "pydantic>=2.0.0"

In [1]:
!pip install --force-reinstall --no-cache-dir --quiet sagemaker-2.232.4.dev0-py3-none-any.whl

Collecting setuptools==65.5.1
  Downloading setuptools-65.5.1-py3-none-any.whl.metadata (6.3 kB)
Downloading setuptools-65.5.1-py3-none-any.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 41.6 MB/s eta 0:00:00
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 75.1.0
    Uninstalling setuptools-75.1.0:
      Successfully uninstalled setuptools-75.1.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autogluon-multimodal 1.1.1 requires nvidia-ml-py3==7.352.0, which is not installed.
dash 2.18.1 requires dash-core-components==2.0.0, which is not installed.
dash 2.18.1 requires dash-html-components==2.0.0, which is not installed.
dash 2.18.1 requires dash-table==5.0.0, which is not installed.
autogluon-core 1.1.1 requires sciki

In [3]:
# Install the packages needed to run the FastAPI server
# TODO: add these to the SageMaker Python SDK requirements.
!pip install --quiet torch transformers fastapi uvicorn nest-asyncio
!pip install -U pyopenssl
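Because the pip output above reports dependency conflicts, it can help to confirm which versions actually ended up in the kernel's environment before continuing. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper, and the package list simply mirrors the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the install cells above.
PACKAGES = ("sagemaker", "pydantic", "torch", "transformers", "fastapi", "uvicorn", "nest-asyncio")


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in PACKAGES:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If a package shows as not installed, or `pydantic` reports a 1.x version, rerun the corresponding install cell and restart the kernel.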



# [WalkThrough] Define the custom inference code
Just tell us how to load your model and how to invoke it. We'll take care of the rest.

In [6]:
from sagemaker.serve.spec.inference_spec import InferenceSpec
from transformers import pipeline
import json


class MyInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        return pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

    def invoke(self, input_data, model):
        if isinstance(input_data, str):
            input_data = json.loads(input_data)
        response = model(question=input_data["question"], context=input_data["context"])
        return response


inf_spec = MyInferenceSpec()
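The `load`/`invoke` contract can be exercised directly before any server is started. The sketch below mirrors the shape of `MyInferenceSpec` but swaps the Hugging Face pipeline for a plain function, so it runs without downloading weights or importing `sagemaker`. `fake_qa_model` and `StandInSpec` are hypothetical names used only for illustration and are not part of the SDK.

```python
import json


def fake_qa_model(question, context):
    # Hypothetical stand-in for the pipeline: returns the first sentence of the context.
    answer = context.split(".")[0]
    return {"answer": answer, "start": 0, "end": len(answer)}


class StandInSpec:
    """Same load/invoke shape as MyInferenceSpec, without the SDK base class."""

    def load(self, model_dir: str):
        # A real spec would load weights here; this one returns a plain function.
        return fake_qa_model

    def invoke(self, input_data, model):
        # Accept either a JSON string or an already-parsed dict.
        if isinstance(input_data, str):
            input_data = json.loads(input_data)
        return model(question=input_data["question"], context=input_data["context"])


spec = StandInSpec()
model = spec.load(model_dir="")
payload = {
    "question": "What is the demo about?",
    "context": "The demo is focused on SageMaker. It has gone well.",
}

result_from_dict = spec.invoke(payload, model)
result_from_json = spec.invoke(json.dumps(payload), model)
print(result_from_dict)
```

Both call styles return the same result, which is the behavior the `isinstance` check in `invoke` is there to provide.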

# [WalkThrough] Start the IN_PROCESS mode server

In [7]:
from sagemaker.serve import Mode
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

# SchemaBuilder takes a sample input and a sample output; the output is the model's answer based on the provided context
schema = SchemaBuilder(
    {
        "context": "The demo is focused on SageMaker and machine learning. It has gone well so far, with no major issues, and the participants are engaged.",
        "question": "What is the demo about?"
    },
    {
        "answer": "SageMaker and machine learning."
    }
)

# Deploy the model to a FastAPI server with minimal input from the user
predictor = ModelBuilder(
    inference_spec=inf_spec,
    schema_builder=schema,
    mode=Mode.IN_PROCESS,  # you can change it to Mode.LOCAL_CONTAINER for local container testing
).build().deploy()

ModelBuilder: INFO:     Either inference spec or model is provided. ModelBuilder is not handling MLflow model input
ModelBuilder: INFO:     ModelBuilder will collect telemetry to help us better understand our user's needs, diagnose issues, and deliver additional features. To opt out of telemetry, please disable via TelemetryOptOut in intelligent defaults. See https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk for more info.
ModelBuilder: INFO:     Waiting for fastapi server to start up...
ModelBuilder: INFO:     Waiting for a connection...
INFO:     Started server process [1797]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:9007 (Press CTRL+C to quit)
ModelBuilder: DEBUG:     Received request: {'context': 'The demo is focused on SageMaker and machine learning. It has gone well so far, w

INFO:     127.0.0.1:56262 - "POST /invoke HTTP/1.1" 200 OK

ModelBuilder: DEBUG:     Ping health check has passed. Returned b'{"score":0.7950457334518433,"start":23,"end":53,"answer":"SageMaker and machine learning"}'
ModelBuilder: DEBUG:     ModelBuilder metrics emitted.
ModelBuilder: DEBUG:     Received request: {'question': 'What is the main topic?', 'context': 'The demo is focused on SageMaker and machine learning. It has gone well so far, with no major issues, and the participants are engaged.'}

INFO:     127.0.0.1:50454 - "POST /invoke HTTP/1.1" 200 OK


# [WalkThrough] Now that the server is running, send a prompt and see the response

In [8]:
# Define input data for the question-answering model
input_data = {
    "question": "What is the main topic?",
    "context": "The demo is focused on SageMaker and machine learning. It has gone well so far, with no major issues, and the participants are engaged."
}

# Pass the input dictionary directly to `predict`
response = predictor.predict(input_data)

# Check the model's response
print(response)

b'{"score":0.8696708679199219,"start":23,"end":53,"answer":"SageMaker and machine learning"}'
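The response above is a JSON document returned as bytes. A minimal sketch of decoding it follows, using the bytes and the context string copied from this notebook; it also checks that the `start` and `end` offsets index into the context.

```python
import json

# Response bytes and context copied from the cells above.
raw = b'{"score":0.8696708679199219,"start":23,"end":53,"answer":"SageMaker and machine learning"}'
context = (
    "The demo is focused on SageMaker and machine learning. "
    "It has gone well so far, with no major issues, and the participants are engaged."
)

result = json.loads(raw)  # json.loads accepts bytes as well as str

# `start` and `end` are character offsets of the answer within the context.
span = context[result["start"]:result["end"]]

print(result["answer"])
print(span == result["answer"])
```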


# [WalkThrough] Clean up the server

In [5]:
predictor.delete_predictor()

ModelBuilder: INFO:     Shutting down the server...
ModelBuilder: INFO:     Server shutdown complete.
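If a test cell raises before cleanup runs, the server keeps running in the kernel. One way to guard against that is to wrap the predictor in a context manager that always calls `delete_predictor()`. This is a sketch rather than an SDK feature: `running` is a hypothetical helper, and `FakePredictor` is a stand-in so the example runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def running(predictor):
    """Yield the predictor and always call delete_predictor() afterwards."""
    try:
        yield predictor
    finally:
        predictor.delete_predictor()


class FakePredictor:
    """Hypothetical stand-in exposing the two methods used in this notebook."""

    def __init__(self):
        self.deleted = False

    def predict(self, data):
        return data

    def delete_predictor(self):
        self.deleted = True


fake = FakePredictor()
try:
    with running(fake) as p:
        p.predict({"question": "What is the main topic?"})
        raise RuntimeError("simulated failure during testing")
except RuntimeError:
    pass

# Cleanup ran even though the block raised.
print(fake.deleted)
```

With the real predictor, the same pattern would be `with running(predictor) as p: ...`.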


---
# Now try it out for yourself

Samples:
- Can this embedding model work with IN_PROCESS mode? https://huggingface.co/BAAI/bge-m3#generate-embedding-for-text
- Can yo

# Your custom load and invoke logic here

In [12]:
class MyInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        # your load logic here <---
        pass

    def invoke(self, input_data, model):
        # your invoke logic here <---
        pass

inf_spec = MyInferenceSpec()

# Now deploy it

In [None]:
from sagemaker.serve import Mode
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

schema = SchemaBuilder(
    {},  # your sample input here <---
    {}   # your sample output here <---
)

predictor = ModelBuilder(
    inference_spec=inf_spec,
    schema_builder=schema,
    mode=Mode.IN_PROCESS,
).build().deploy()

# Now invoke it

In [None]:
input_data = {} # your input data here <---

response = predictor.predict(input_data)

response