You will need to upload a `.tar.gz` file to S3, containing:
- the trained model file(s)
- an inference handler script in `code/`

You will also need an S3 bucket to store the model.

The archive may contain just a model file (e.g., `pytorch_model.pt`) or a whole set of files as, for example, with Hugging Face models (e.g., `WellcomeBertMesh/...`).
IMPORTANT: If you have a folder, it's very important that you don't compress the folder itself, but the contents of the folder.
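For example, the contents of the archive should look like this (file names are illustrative), with the model files and `code/` at the top level rather than wrapped in a parent folder:

```
model.tar.gz
├── pytorch_model.pt    # trained model file(s)
└── code/
    └── inference.py    # inference handler script
```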
Note that some models require an inference handler script, implementing the following SageMaker interface functions:
- `input_fn`: responsible for pre-processing the input
- `model_fn`: responsible for loading the model
- `predict_fn`: responsible for running the model on the input data
- `output_fn`: responsible for post-processing the model's output
See the entrypoints in the `sage/handlers` folder for examples. Note that in `model_fn`, you should refer to the model's path without the `.tar.gz` extension, i.e. as if the model were local and already unzipped (SageMaker will automatically unzip the model for you).
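For illustration, here is a minimal sketch of such a handler, assuming a TorchScript model saved as `model.pt` and a JSON payload (both file name and payload format are assumptions; the actual handlers in `sage/handlers` may differ):

```python
import json
import os

import torch


def model_fn(model_dir):
    # model_dir is the directory where SageMaker extracted the archive;
    # refer to files as if they were already unzipped.
    return torch.jit.load(os.path.join(model_dir, "model.pt"))


def input_fn(request_body, request_content_type):
    # Pre-process the raw request body into a tensor.
    if request_content_type == "application/json":
        return torch.tensor(json.loads(request_body))
    raise ValueError(f"Unsupported content type: {request_content_type}")


def predict_fn(input_data, model):
    # Run the model on the pre-processed input.
    with torch.no_grad():
        return model(input_data)


def output_fn(prediction, accept):
    # Post-process the model output into the response payload.
    return json.dumps(prediction.tolist())
```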
You can create an S3 bucket via:

```bash
aws s3 mb s3://<bucket-name>
```
Add the entrypoint file to a folder named `code/` inside your model folder. Example:

```bash
mkdir <path-to-model>/code
cp sage/handlers/pytorch/inference.py <path-to-model>/code/
```
Next, compress your local model:

```bash
cd <path-to-model>
tar -czvf model.tar.gz *
```
Upload the compressed model to S3:

```bash
aws s3 cp model.tar.gz s3://<bucket-name>
```
To install the CLI, install Poetry and the project dependencies:

```bash
curl -sSL https://install.python-poetry.org | python3 -
poetry install
poetry shell
```
Deploys a model to an inference endpoint. The model can be either from a local path or an ECR URI.

```bash
sage deploy \
    <image-uri> \
    <task> \
    <role> \
    --model-path <model-path> \
    --instance-count <instance-count> \
    --instance-type <instance-type> \
    --endpoint-name <endpoint-name>
```
- `image-uri`: Currently, we support four `image-uri` frameworks for deployment, namely:
  - `aws`: This is where you specify an image URI that is available in ECR. This image will be used as the entrypoint for the deployment app.
  - `pytorch`: This is where you specify a path to a PyTorch model in an S3 bucket. The model must be tar-gzip compressed.
  - `sklearn`: This is where you specify a path to a scikit-learn model in an S3 bucket. The model must be tar-gzip compressed.
  - `transformers`: This is where a model from the Hugging Face model repository is deployed. The model name must be specified.
- `task`: the type of task; by default `text-classification`
- `role`: your `arn:...` role with permissions for SageMaker
- `model-path`: the path to S3 where your model lies. You can also use model names if using Hugging Face, but many of them are not supported.
- `instance-type`: the type of AWS instance (e.g., `ml.m5.4xlarge`), or `local` to run everything locally in a Docker container
- `endpoint-name`: a name for your endpoint. This will be used when calling prediction.
In addition to these supported frameworks, the deployment can be either local or remote. For local deployment, the `--instance-type local` flag must be specified. For remote deployment, specify a SageMaker instance type, e.g. `--instance-type ml.m5.large`.
To list your endpoints and check their status, run:

```bash
sage list
```
Initially, the endpoint will be in the `Creating` state. Wait until it is in the `InService` state.
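If you prefer to poll the status programmatically instead of re-running `sage list`, here is a minimal sketch using `boto3` (this is not part of the `sage` CLI; it assumes AWS credentials are configured in your environment):

```python
import time

import boto3

sagemaker = boto3.client("sagemaker")
endpoint_name = "<endpoint-name>"  # the name passed to --endpoint-name

# Poll until the endpoint leaves the Creating state.
status = "Creating"
while status == "Creating":
    time.sleep(30)
    status = sagemaker.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
    print(f"Endpoint status: {status}")
```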
```bash
sage predict <endpoint-name> "This is a test sentence."
```

The output of the inference call will be displayed to stdout.
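The same call can also be made without the CLI via the SageMaker runtime API. A sketch using `boto3` (the JSON payload format is an assumption and depends on what your `input_fn` expects):

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="<endpoint-name>",
    ContentType="application/json",  # must match what input_fn accepts
    Body=json.dumps("This is a test sentence."),
)
print(response["Body"].read())
```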
Displays the logs of a deployed endpoint.

```bash
sage logs \
    --endpoint-name <endpoint-name>
```
Deletes a deployed endpoint.

```bash
sage delete \
    <endpoint-name>
```
In this section, we will deploy several models as examples.

Download, for example, `WellcomeBertMesh`:

```bash
git lfs install
git clone https://huggingface.co/Wellcome/WellcomeBertMesh
```

Add the inference handler to a `code/` folder inside the model, then compress and upload it:

```bash
mkdir WellcomeBertMesh/code
cp sage/handlers/huggingface_bertmesh/inference.py WellcomeBertMesh/code/
cd WellcomeBertMesh
tar -czvf model.tar.gz *
aws s3 cp model.tar.gz s3://<bucket-name>
```
Activate your poetry shell and run the deploy command:

```bash
poetry shell
sage deploy \
    transformers \
    text-classification \
    <arn-role> \
    --model-path s3://<bucket-name>/model.tar.gz \
    --endpoint-name wellcome \
    --instance-type ml.m5.4xlarge
```
This will deploy the Hugging Face model to a remote SageMaker endpoint. Next, we'll run the inference command against this endpoint:
```bash
sage predict \
    wellcome \
    "This is a test sentence."
```

You should get a `Result: b'[{"label": "Humans", "score": 0.8084890842437744}]'` output.
We provide an example script that loads the model and stores it in a `tar.gz` archive. Note that this is a dummy model intended for demonstration only. You can find the script at `scripts/save_pt_dummy_model.py`:

```bash
python scripts/save_pt_dummy_model.py
```
That script already packages the `pytorch` inference file inside the archive, so there is no need to copy it. Note that you need to run this script from an environment with PyTorch 2.0.0 installed. We purposefully do not include PyTorch in the `pyproject.toml` file, as the CLI itself does not need it and we want to keep the dependencies minimal.
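For reference, here is a hypothetical sketch of what such a script might look like (the actual `scripts/save_pt_dummy_model.py` in the repo may differ):

```python
import tarfile

import torch


class DummyModel(torch.nn.Module):
    def forward(self, x):
        # Always predict class 1, regardless of input.
        return torch.ones(1, dtype=torch.long)


# Save the model as TorchScript so it can be loaded without the class definition.
torch.jit.script(DummyModel()).save("dummy.pt")

# Package the model together with the pytorch inference handler.
with tarfile.open("dummy.pt.tar.gz", "w:gz") as tar:
    tar.add("dummy.pt")
    tar.add("sage/handlers/pytorch/inference.py", arcname="code/inference.py")
```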
You should now have a `dummy.pt.tar.gz` file on your disk. Upload it to S3:

```bash
aws s3 cp dummy.pt.tar.gz s3://<bucket-name>
```
Activate your poetry shell and run the deploy command:

```bash
poetry shell
sage deploy \
    pytorch \
    text-classification \
    <arn-role> \
    --model-path s3://<bucket-name>/dummy.pt.tar.gz \
    --endpoint-name test \
    --instance-type local
```
This will deploy the dummy PyTorch model to your local machine. Next, we'll run the inference command against this endpoint:

```bash
sage predict \
    test \
    "This is a test sentence." \
    --local
```
You should get a `Result: 1` output. This is the output of the dummy model.