21 changes: 14 additions & 7 deletions doc/doc_utils/jumpstart_doc_utils.py
@@ -143,20 +143,26 @@ def create_jumpstart_model_table():
file_content.append(".. |external-link| raw:: html\n\n")
file_content.append(' <i class="fa fa-external-link"></i>\n\n')

file_content.append("==================================\n")
file_content.append("JumpStart Available Model Table\n")
file_content.append("==================================\n")
file_content.append("================================================\n")
file_content.append("Built-in Algorithms with pre-trained Model Table\n")
file_content.append("================================================\n")
file_content.append(
"""
JumpStart for the SageMaker Python SDK uses model IDs and model versions to access the necessary
utilities. This table serves to provide the core material plus some extra information that can be useful
in selecting the correct model ID and corresponding parameters.\n"""
The SageMaker Python SDK uses model IDs and model versions to access the necessary
utilities for pre-trained models. This table lists the available models, along with
information that can be useful in selecting the correct model ID and corresponding
parameters.\n"""
)
file_content.append(
"""
If you want to automatically use the latest version of the model, use "*" for the `model_version` attribute.
However, we strongly recommend pinning an exact model version.\n"""
)
file_content.append(
"""
These models are also available through the
`JumpStart UI in SageMaker Studio <https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html>`__\n"""
)
file_content.append("\n")
file_content.append(".. list-table:: Available Models\n")
file_content.append(" :widths: 50 20 20 20 30 20\n")
@@ -183,5 +189,6 @@ def create_jumpstart_model_table():
" - `{} <{}>`__ |external-link|\n".format(model_source, model_spec["url"])
)

f = open("doc_utils/jumpstart.rst", "w")
f = open("doc_utils/pretrainedmodels.rst", "w")
f.writelines(file_content)
f.close()
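
For illustration, a minimal sketch of how a pinned version and the ``"*"`` wildcard might be passed to the retrieval utility; the model ID below is only an example, and the ``sagemaker.model_uris.retrieve`` helper is assumed to be available as in recent SDK releases.

.. code:: python

    from sagemaker import model_uris

    # Hypothetical example ID; substitute any ID from the generated table.
    model_id = "pytorch-ic-mobilenet-v2"

    # Pinning an exact version is the recommended approach.
    pinned_uri = model_uris.retrieve(
        model_id=model_id, model_version="1.0.0", model_scope="inference"
    )

    # "*" resolves to the latest version available at call time.
    latest_uri = model_uris.retrieve(
        model_id=model_id, model_version="*", model_scope="inference"
    )
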
File renamed without changes.
130 changes: 61 additions & 69 deletions doc/overview.rst
@@ -573,24 +573,31 @@ Here is an example:
# When you are done using your endpoint
model.sagemaker_session.delete_endpoint('my-endpoint')

*********************************************************
Use SageMaker JumpStart Algorithms with Pretrained Models
*********************************************************
***********************************************************************
Use Built-in Algorithms with Pre-trained Models in SageMaker Python SDK
***********************************************************************

The SageMaker Python SDK provides built-in algorithms with pre-trained models from popular open source
model hubs, such as TensorFlow Hub, PyTorch Hub, and HuggingFace. Customers can deploy these pre-trained
models as-is or first fine-tune them on a custom dataset and then deploy to a SageMaker endpoint for inference.


SageMaker SDK built-in algorithms allow customers to access pre-trained models using model IDs and model
versions. The ‘pre-trained model’ table below provides a list of models with information useful in
selecting the correct model ID and corresponding parameters. These models are also available through
the `JumpStart UI in SageMaker Studio <https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html>`__.

JumpStart for the SageMaker Python SDK uses model ids and model versions to access the necessary
utilities. This table serves to provide the core material plus some extra information that can be useful
in selecting the correct model id and corresponding parameters.

.. toctree::
:maxdepth: 2

doc_utils/jumpstart
doc_utils/pretrainedmodels

Example notebooks
=================

JumpStart supports 15 different machine learning problem types. Below is a list of all the supported
problem types with a link to a Jupyter notebook that provides example usage.
SageMaker built-in algorithms with pre-trained models support 15 different machine learning problem types.
Below is a list of all the supported problem types with a link to a Jupyter notebook that provides example usage.

Vision
- `Image Classification <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_image_classification/Amazon_JumpStart_Image_Classification.ipynb>`__
@@ -610,25 +617,15 @@ Text
- `Text Embedding <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_text_embedding/Amazon_JumpStart_Text_Embedding.ipynb>`__

Tabular
- `Tabular Classification (LightGBM & Catboost) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_tabular_classification/Amazon_JumpStart_Tabular_Classification_LightGBM_CatBoost.ipynb>`__
- `Tabular Classification (XGBoost & Linear Learner) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_tabular_classification/Amazon_JumpStart_Tabular_Classification_XGBoost_LinearLearner.ipynb>`__
- `Tabular Regression (LightGBM & Catboost) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_tabular_regression/Amazon_JumpStart_Tabular_Regression_LightGBM_CatBoost.ipynb>`__
- `Tabular Regression (XGBoost & Linear Learner) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_tabular_regression/Amazon_JumpStart_Tabular_Regression_XGBoost_LinearLearner.ipynb>`__


`Amazon SageMaker JumpStart <https://aws.amazon.com/sagemaker/getting-started/>`__ is a
SageMaker feature that helps users bring machine learning (ML)
applications to market using prebuilt solutions for common use cases,
example notebooks, open source models from model zoos, and built-in
algorithms.

A JumpStart model enables you to quickly start a machine learning
workflow. JumpStart takes models from popular open source model hubs,
such as TensorFlow and HuggingFace, and pre-trains them on an open
source dataset. Using the SageMaker Python SDK, you can select a
prebuilt model from the model zoo to train on custom data or deploy
to a SageMaker endpoint for inference without signing up for
SageMaker Studio.
- `Tabular Classification (LightGBM & CatBoost) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Classification_LightGBM_CatBoost.ipynb>`__
- `Tabular Classification (XGBoost & Scikit-learn Linear Learner) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/xgboost_linear_learner_tabular/Amazon_Tabular_Classification_XGBoost_LinearLearner.ipynb>`__
- `Tabular Classification (AutoGluon) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/autogluon_tabular/Amazon_Tabular_Classification_AutoGluon.ipynb>`__
- `Tabular Classification (TabTransformer) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Classification_TabTransformer.ipynb>`__
- `Tabular Regression (LightGBM & CatBoost) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Regression_LightGBM_CatBoost.ipynb>`__
- `Tabular Regression (XGBoost & Scikit-learn Linear Learner) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/xgboost_linear_learner_tabular/Amazon_Tabular_Regression_XGBoost_LinearLearner.ipynb>`__
- `Tabular Regression (AutoGluon) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/autogluon_tabular/Amazon_Tabular_Regression_AutoGluon.ipynb>`__
- `Tabular Regression (TabTransformer) <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Regression_TabTransformer.ipynb>`__


The following topics give you information about JumpStart components,
as well as how to use the SageMaker Python SDK for these workflows.
@@ -644,24 +641,22 @@ Prerequisites
Amazon S3. For more information about IAM role permissions,
see `Policies and permissions in IAM <https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html>`__.

JumpStart Components
====================
Built-in Components
===================

The following sections give information about the main JumpStart
The following sections give information about the main built-in
components and their function.

JumpStart models
----------------
Pre-trained models
------------------

JumpStart maintains a model zoo of over 300 models pre-trained on
open source datasets. You can use the SageMaker Python SDK
to fine-tune a model on your own dataset or deploy it directly to a
SageMaker endpoint for inference.
SageMaker maintains a model zoo of over 300 models from popular open source model hubs, such as
TensorFlow Hub, PyTorch Hub, and HuggingFace. You can use the SageMaker Python SDK to fine-tune
a model on your own dataset or deploy it directly to a SageMaker endpoint for inference.

JumpStart model artifacts are stored as tarballs in the JumpStart S3
bucket. Each model is versioned and contains a unique ID which can be
used to retrieve the model URI. The following information describes
the ``model_id`` and ``model_version`` needed to retrieve the URI.
Model artifacts are stored as tarballs in an S3 bucket. Each model is versioned and contains a
unique ID which can be used to retrieve the model URI. The following information describes the
``model_id`` and ``model_version`` needed to retrieve the URI.

.. container::

@@ -671,7 +666,7 @@ the ``model_id`` and ``model_version`` needed to retrieve the URI.
required parameter.

To retrieve a model, first select a ``model ID`` and ``version`` from
the :doc:`available models <./doc_utils/jumpstart>`.
the :doc:`available models <./doc_utils/pretrainedmodels>`.

.. code:: python

@@ -688,15 +683,13 @@ Then use those values to retrieve the model as follows.
    model_id=model_id, model_version=model_version, model_scope=scope
)
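
For reference, a minimal, self-contained sketch of this retrieval step; the model ID and version are illustrative placeholders, and the ``sagemaker.model_uris`` utility is assumed:

.. code:: python

    from sagemaker import model_uris

    # Illustrative values; pick a real ID and version from the available-models table.
    model_id, model_version = "pytorch-ic-mobilenet-v2", "1.0.0"
    scope = "inference"

    model_uri = model_uris.retrieve(
        model_id=model_id, model_version=model_version, model_scope=scope
    )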

JumpStart scripts
-----------------
Model scripts
-------------

To adapt JumpStart models for SageMaker, a custom
script is needed to perform training or inference. JumpStart
maintains a suite of scripts used for each of the models in the
JumpStart S3 bucket, which can be accessed using the SageMaker Python
SDK. Use the ``model_id`` and ``version`` of the corresponding model
to retrieve the related script as follows.
To adapt pre-trained models for SageMaker, a custom script is needed to perform training
or inference. SageMaker maintains a suite of scripts used for each of the models in the
S3 bucket, which can be accessed using the SageMaker Python SDK. Use the ``model_id`` and
``version`` of the corresponding model to retrieve the related script as follows.

.. code:: python

@@ -706,11 +699,11 @@ to retrieve the related script as follows.
    model_id=model_id, model_version=model_version, script_scope=scope
)
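
A matching sketch for the script retrieval, assuming the ``sagemaker.script_uris`` utility and the same illustrative ``model_id``:

.. code:: python

    from sagemaker import script_uris

    model_id, model_version = "pytorch-ic-mobilenet-v2", "1.0.0"  # illustrative
    scope = "inference"

    script_uri = script_uris.retrieve(
        model_id=model_id, model_version=model_version, script_scope=scope
    )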

JumpStart images
----------------
Model images
------------

A Docker image is required to perform training or inference on all
SageMaker models. JumpStart relies on Docker images from the
SageMaker models. SageMaker relies on Docker images from the
following repos: https://github.com/aws/deep-learning-containers,
https://github.com/aws/sagemaker-xgboost-container,
and https://github.com/aws/sagemaker-scikit-learn-container. Use
@@ -733,16 +726,16 @@ retrieve the related image as follows.
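
A minimal sketch of the image retrieval step, assuming ``sagemaker.image_uris.retrieve`` supports ``model_id``/``model_version`` lookups as in recent SDK releases; the instance type is only an example:

.. code:: python

    from sagemaker import image_uris

    model_id, model_version = "pytorch-ic-mobilenet-v2", "1.0.0"  # illustrative
    scope = "inference"

    image_uri = image_uris.retrieve(
        region=None,
        framework=None,  # inferred from the model ID
        image_scope=scope,
        model_id=model_id,
        model_version=model_version,
        instance_type="ml.m5.xlarge",
    )
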
Deploy a Pre-Trained Model Directly to a SageMaker Endpoint
============================================================

In this section, you learn how to take a pre-trained JumpStart model
and deploy it directly to a SageMaker Endpoint. This is the fastest
way to start machine learning with a JumpStart model. The following
In this section, you learn how to take a pre-trained model and deploy
it directly to a SageMaker Endpoint. This is the fastest way to start
machine learning with a pre-trained model. The following
assumes familiarity with `SageMaker
models <https://sagemaker.readthedocs.io/en/stable/api/inference/model.html>`__
and their deploy functions.

To begin, select a ``model_id`` and ``version`` from the JumpStart
To begin, select a ``model_id`` and ``version`` from the pre-trained
models table, as well as a model scope of either “inference” or
“training”. For this example, you use a pre-trained JumpStart model,
“training”. For this example, you use a pre-trained model,
so select “inference” for your model scope. Use the utility
functions to retrieve the URI of each of the three components you
need to continue.
@@ -772,7 +765,7 @@ need to continue.
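
A combined sketch of this retrieval step might look like the following; the model ID, version, and instance type are illustrative:

.. code:: python

    from sagemaker import image_uris, model_uris, script_uris

    model_id, model_version = "pytorch-ic-mobilenet-v2", "1.0.0"  # illustrative
    instance_type = "ml.m5.xlarge"

    deploy_model_uri = model_uris.retrieve(
        model_id=model_id, model_version=model_version, model_scope="inference"
    )
    deploy_source_uri = script_uris.retrieve(
        model_id=model_id, model_version=model_version, script_scope="inference"
    )
    deploy_image_uri = image_uris.retrieve(
        region=None,
        framework=None,
        image_scope="inference",
        model_id=model_id,
        model_version=model_version,
        instance_type=instance_type,
    )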

Next, pass the URIs and other key parameters as part of a new
SageMaker Model class. The ``entry_point`` is a JumpStart script
named ``inference.py``. JumpStart handles the implementation of this
named ``inference.py``. SageMaker handles the implementation of this
script. You must use this value for model inference to be successful.
For more information about the Model class and its parameters,
see `Model <https://sagemaker.readthedocs.io/en/stable/api/inference/model.html>`__.
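
A hedged sketch of this step, reusing the URIs from the retrieval sketch above and assuming an execution role can be looked up from the current session; ``inference.py`` is the entry point named in the text:

.. code:: python

    from sagemaker.model import Model
    from sagemaker.predictor import Predictor
    from sagemaker.session import Session

    # deploy_image_uri, deploy_source_uri, and deploy_model_uri are assumed to come
    # from the retrieval sketch above.
    model = Model(
        image_uri=deploy_image_uri,
        source_dir=deploy_source_uri,
        model_data=deploy_model_uri,
        entry_point="inference.py",
        role=Session().get_caller_identity_arn(),
        predictor_cls=Predictor,
    )

    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
    )
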
@@ -811,7 +804,7 @@ Deployment may take about 5 minutes.
Because the model and script URIs are distributed by SageMaker JumpStart,
the endpoint, endpoint config and model resources will be prefixed with
``sagemaker-jumpstart``. Refer to the model ``Tags`` to inspect the
JumpStart artifacts involved in the model creation.
model artifacts involved in the model creation.

Perform Inference
-----------------
@@ -829,17 +822,16 @@ the
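
As a rough sketch, querying the deployed endpoint through the ``Predictor`` returned by ``model.deploy`` might look like the following; the payload, file name, and content type are hypothetical and depend on the specific model:

.. code:: python

    # Payload format (image bytes, text, CSV, ...) depends on the chosen model.
    with open("example_input.jpg", "rb") as f:  # hypothetical local file
        payload = f.read()

    response = predictor.predict(
        payload, {"ContentType": "application/x-image", "Accept": "application/json"}
    )
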
Fine-tune a Model and Deploy to a SageMaker Endpoint
====================================================

In this section, you initiate a training job to further train one of
the pretrained JumpStart models for your use case, then deploy it to
a SageMaker Endpoint for inference. This lets you fine tune the model
for your use case with your custom dataset. The following assumes
In this section, you initiate a training job to further train one of the pre-trained models
for your use case, then deploy it to a SageMaker Endpoint for inference. This lets you fine-tune
the model with your own custom dataset. The following assumes
familiarity with `SageMaker training jobs and their
architecture <https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html>`__.

Fine-tune a JumpStart Model on a Custom Dataset
-----------------------------------------------
Fine-tune a Pre-trained Model on a Custom Dataset
-------------------------------------------------

To begin, select a ``model_id`` and ``version`` from the JumpStart
To begin, select a ``model_id`` and ``version`` from the pre-trained
models table, as well as a model scope. In this case, you begin by
using “training” as the model scope. Use the utility functions to
retrieve the URI of each of the three components you need to
@@ -875,10 +867,10 @@ Table <https://aws.amazon.com/sagemaker/pricing/#On-Demand_Pricing>`__ and selec
    instance_type=training_instance_type,
)
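
A hedged, combined sketch of retrieving the training artifacts; the model ID, version, and instance type are illustrative, and ``hyperparameters.retrieve_default`` is assumed to be available as in recent SDK releases:

.. code:: python

    from sagemaker import hyperparameters, image_uris, model_uris, script_uris

    model_id, model_version = "pytorch-ic-mobilenet-v2", "1.0.0"  # illustrative
    training_instance_type = "ml.p3.2xlarge"  # pick from the pricing table linked above

    train_model_uri = model_uris.retrieve(
        model_id=model_id, model_version=model_version, model_scope="training"
    )
    train_source_uri = script_uris.retrieve(
        model_id=model_id, model_version=model_version, script_scope="training"
    )
    train_image_uri = image_uris.retrieve(
        region=None,
        framework=None,
        image_scope="training",
        model_id=model_id,
        model_version=model_version,
        instance_type=training_instance_type,
    )

    # Default hyperparameters can be retrieved and then overridden as needed.
    default_hyperparameters = hyperparameters.retrieve_default(
        model_id=model_id, model_version=model_version
    )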

Next, use the JumpStart resource URIs to create an ``Estimator`` and
Next, use the model resource URIs to create an ``Estimator`` and
train it on a custom training dataset. You must specify the S3 path
of your custom training dataset. The Estimator class requires
an ``entry_point`` parameter. In this case, JumpStart uses
an ``entry_point`` parameter. In this case, SageMaker uses
“transfer_learning.py”. The training job fails to execute if this
value is not set.

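To round out the section, a minimal sketch of the ``Estimator`` step, reusing the training URIs and default hyperparameters from the sketch above; the S3 paths are placeholders, and ``transfer_learning.py`` is the entry point named in the text:

.. code:: python

    from sagemaker.estimator import Estimator
    from sagemaker.session import Session

    # Placeholder S3 locations; replace with your own bucket and prefixes.
    training_dataset_s3_path = "s3://my-bucket/training-data/"
    s3_output_location = "s3://my-bucket/output/"

    estimator = Estimator(
        image_uri=train_image_uri,
        source_dir=train_source_uri,
        model_uri=train_model_uri,
        entry_point="transfer_learning.py",
        role=Session().get_caller_identity_arn(),
        instance_count=1,
        instance_type=training_instance_type,
        hyperparameters=default_hyperparameters,
        output_path=s3_output_location,
    )

    # Launch the training job on the custom dataset.
    estimator.fit({"training": training_dataset_s3_path})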