# Getting Started with Tensor Parallelism using the SageMaker Model Parallelism Library

This notebook walks you through the tensor parallelism feature provided by the SageMaker model parallelism library. You'll learn how to train the GPT-J model with tensor parallelism on synthetic text data.

**Note**: To run this example training job, you must be in `us-east-1` or `us-west-2`. The preview container images are available only in these two Regions.

## Install and Upgrade Libraries

The SageMaker model parallelism library's tensor parallelism feature requires the SageMaker Python SDK and the SageMaker Experiments library. Run the following cell to install or upgrade the libraries.

**Note:** To finish applying the changes, you must restart the kernel.

In [1]:
# run once, restart kernel, then comment out this cell
# update sagemaker to the latest 2.x version
# ! pip install -qU pip
# ! pip install -qU "sagemaker>=2,<3"
# ! pip install -qU sagemaker-experiments

# import IPython
# IPython.Application.instance().kernel.do_shutdown(True)

Import the SageMaker Python SDK and check that it was upgraded to the latest 2.x version.

In [2]:
import sagemaker
print(sagemaker.__version__)

2.86.0
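The install cell pins the SDK to `sagemaker>=2,<3`. As a minimal sketch, that constraint can be checked in code instead of being read off the printed version. The helper name `is_supported_sdk` is hypothetical and not part of the SageMaker SDK; the version string is passed in as a literal so the snippet runs on its own.

```python
def is_supported_sdk(version: str) -> bool:
    """Return True if the version string is a 2.x release (>=2, <3)."""
    major = int(version.split(".")[0])
    return major == 2


# In the notebook you would pass sagemaker.__version__ instead of a literal.
print(is_supported_sdk("2.86.0"))
```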


## Amazon SageMaker Initialization

This private preview feature is available in `us-east-1` and `us-west-2`.
Throughout this example, you'll use a training script for the GPT-J model and a synthetic text dataset.

Run the following cell to import SageMaker modules and retrieve information of your current SageMaker work environment: your AWS account ID, the AWS Region you are using to run the notebook, and the ARN of your Amazon SageMaker execution role.

In [3]:
%%time
import os

from sagemaker import get_execution_role
from sagemaker.huggingface import HuggingFace
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
import boto3

# If running in Sagemaker notebook this can stay commented
# os.environ["AWS_PROFILE"] = "sm"

# supported regions only us-west-2 and us-east-1
# preview images are only in these two regions
os.environ["AWS_DEFAULT_REGION"] = "us-west-2"

# role = get_execution_role() # provide a pre-existing role ARN as an alternative to creating a new role
role = "SageMakerRole"
print(f'SageMaker Execution Role:{role}')

client = boto3.client('sts')
account = client.get_caller_identity()['Account']
print(f'AWS account:{account}')

session = boto3.session.Session()
region = session.region_name
print(f'AWS region:{region}')

sm_boto_client = boto3.client("sagemaker")
sagemaker_session = sagemaker.session.Session(boto_session=session)

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials


SageMaker Execution Role:SageMakerRole


INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials


AWS account:855988369404
AWS region:us-west-2
CPU times: user 190 ms, sys: 43.8 ms, total: 234 ms
Wall time: 700 ms
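Because the preview is limited to two Regions, it can help to fail early when the notebook is running somewhere else. The following is a minimal sketch; `SUPPORTED_REGIONS` and `require_supported_region` are hypothetical names introduced here for illustration.

```python
# Regions named in this notebook as supporting the preview.
SUPPORTED_REGIONS = ("us-east-1", "us-west-2")


def require_supported_region(region):
    """Return the region unchanged, or raise if the preview is not available there."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(
            f"Region {region!r} is not supported; use one of {SUPPORTED_REGIONS}."
        )
    return region


# In the notebook you would pass the `region` variable from the cell above.
print(require_supported_region("us-west-2"))
```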


## Specify Amazon S3 Bucket Paths

Specify the paths to the training data for your job. The bucket must be in the same Region where the training job runs. As part of the private preview artifacts, we provide a synthetic dataset in the `smdistributed-modelparallel-preview` bucket to help you get started quickly. This bucket is in `us-west-2`. We recommend that you copy the data to your own bucket and update the paths in the next cell; depending on your IAM role permissions, this avoids cross-account permission issues.

After you successfully run this example tensor parallel training job, you can change the S3 paths to point to your own dataset.

In [6]:
if region == 'us-east-1':
    s3_train_bucket = "s3://cakarak-playground/gpt-debug/train-synthetic/"
    s3_test_bucket = "s3://cakarak-playground/gpt-debug/val-synthetic/"
elif region == 'us-west-2':
    s3_train_bucket = "s3://smdistributed-modelparallel-preview/synthetic-gpt-data/train_synthetic/"
    s3_test_bucket = "s3://smdistributed-modelparallel-preview/synthetic-gpt-data/val_synthetic/"
else:
    # Fail early instead of leaving the bucket variables undefined
    raise ValueError(f"Unsupported region: {region}. Use us-east-1 or us-west-2.")
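The recommendation to copy the synthetic data into your own bucket can be sketched as follows. This is one possible approach using `boto3`, not the only one (the AWS CLI works equally well). The helper names `split_s3_uri` and `copy_s3_prefix` are hypothetical, and the copy assumes your credentials can read the source bucket and write to the destination bucket.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def copy_s3_prefix(src_uri, dst_uri):
    """Copy every object under src_uri to the same relative key under dst_uri."""
    import boto3  # imported lazily so split_s3_uri works without boto3 installed

    src_bucket, src_prefix = split_s3_uri(src_uri)
    dst_bucket, dst_prefix = split_s3_uri(dst_uri)
    s3 = boto3.resource("s3")
    copied = 0
    for obj in s3.Bucket(src_bucket).objects.filter(Prefix=src_prefix):
        dst_key = dst_prefix + obj.key[len(src_prefix):]
        s3.meta.client.copy({"Bucket": src_bucket, "Key": obj.key}, dst_bucket, dst_key)
        copied += 1
    return copied


print(split_s3_uri("s3://smdistributed-modelparallel-preview/synthetic-gpt-data/train_synthetic/"))
# Example call (the destination bucket is a placeholder you must replace):
# copy_s3_prefix(s3_train_bucket, "s3://<your-bucket>/synthetic-gpt-data/train_synthetic/")
```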

The following bucket stores the output artifacts of the training job. You can modify this path as needed.

In [7]:
s3_output_bucket = f"s3://sagemaker-{region}-{account}/smp-tensorparallel-beta/gpj_synthetic_simpletrainer_outputdir/"

## Define Data Channels for SageMaker Training

In this step, you define SageMaker training data channels using the above buckets.  

In [8]:
# Set use_fsx to False by default
# Set below var to True if you want to use fsx (see next cell)
use_fsx = False
if not use_fsx:
    train = sagemaker.inputs.TrainingInput(s3_train_bucket, distribution='FullyReplicated', s3_data_type='S3Prefix')
    test = sagemaker.inputs.TrainingInput(s3_test_bucket, distribution='FullyReplicated', s3_data_type='S3Prefix')
    data_channels = {'train': train, 'test': test}


## Set Up FSx and Use FSx for Data Channels and Checkpoints

While the S3 option above is easier to set up, using FSx for Lustre can improve performance with large inputs and large models. If your model has more than 13B parameters, use FSx for checkpointing.

See the instructions [here](https://github.com/aws/amazon-sagemaker-examples/blob/master/advanced_functionality/distributed_tensorflow_mask_rcnn/mask-rcnn-scriptmode-fsx.ipynb) to create the FSx for Lustre file system and import the dataset from the S3 bucket into it. Note that the FSx file system must be created in a private subnet with an internet gateway so that the training job has access to the internet.

In [9]:
# Instructions obtained from:
# https://github.com/aws/amazon-sagemaker-examples/blob/master/advanced_functionality/distributed_tensorflow_mask_rcnn/mask-rcnn-scriptmode-fsx.ipynb

if use_fsx:
    from sagemaker.inputs import FileSystemInput

    # Specify FSx Lustre file system id.
    file_system_id = "fs-01bdca7270e7a4c38"
    
    # Specify the SG and subnet used by the FSX, these are passed to SM Estimator so jobs use this as well
    fsx_security_group_id = "sg-0187f4ddd05b7241c"
    fsx_subnet = "subnet-0ae2d781b32d8b3d0"
    
    # Specify directory path for input data on the file system. 
    # You need to provide normalized and absolute path below.
    # Your mount name can be provided by you when creating fsx, or generated automatically.
    # You can find this mount_name on the FSX page in console. 
    # Example of fsx generated mount_name: "3x5lhbmv"
    base_path = "/3x5lhbmv"

    # Specify your file system type.
    file_system_type = 'FSxLustre'

    train = FileSystemInput(file_system_id=file_system_id,
                            file_system_type=file_system_type,
                            directory_path=base_path,
                            file_system_access_mode="rw")

    data_channels = {"train": train, "test": train}

## Set Up Hyperparameters, Metric Definitions, and MPI Options
The following `hyperparameters` dictionary passes arguments to the training script (`train_gptj_simple.py`) and sets the model parallel configuration when creating the training job.
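SageMaker hands the entries of the hyperparameters dictionary to the entry point script as command-line arguments of the form `--key value`. The following is a minimal sketch of how a script could receive a few of them with `argparse`. The helper names and the chosen subset of arguments are assumptions for illustration; the actual training script may define its arguments differently.

```python
import argparse


def hyperparameters_to_argv(hp):
    """Flatten a hyperparameters dict into ['--key', 'value', ...]."""
    argv = []
    for key, value in hp.items():
        argv += [f"--{key}", str(value)]
    return argv


def parse_args(argv):
    """Parse a small, illustrative subset of the hyperparameters."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_steps", type=int, default=100)
    parser.add_argument("--lr", type=float, default=2e-4)
    parser.add_argument("--lr-decay-style", type=str, default="linear")
    parser.add_argument("--fp16", type=int, default=0)
    # parse_known_args ignores hyperparameters this sketch does not declare
    args, _unknown = parser.parse_known_args(argv)
    return args


sample = {"max_steps": 100, "lr": 2.e-4, "lr-decay-style": "linear", "fp16": 1, "seed": 12345}
args = parse_args(hyperparameters_to_argv(sample))
print(args.max_steps, args.lr, args.lr_decay_style, args.fp16)
```

Note that `argparse` maps `--lr-decay-style` to the attribute `lr_decay_style`, so hyphenated and underscored keys can coexist in the same dictionary.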

You can also add custom MPI flags. By default, we have `--mca btl_vader_single_copy_mechanism none` to remove unnecessary logs.

Next, add a base metric definition to enable metric upload in SageMaker. You can add further metric definitions as needed.

In [26]:
hyperparameters = {'max_steps': 100,
                   'seed': 12345,
                   'fp16': 1,
                   'lr': 2.e-4,
                   'lr_decay_iters': 125000,
                   'min_lr': 0.00001,
                   'lr-decay-style': 'linear',
                   'warmup': 0.01,
                   'num_kept_checkpoints': 5,
                   'checkpoint_freq': 200,
                   'validation_freq': 1000,
                   'logging_freq': 10,
                   'save_final_full_model': 1,
                   'skip_full_optimizer': 1,
                   'shard_optimizer_state': 1,
                   'activation_checkpointing': 1,
                   'activation_strategy': 'each',
                   'optimize': 'speed',
                    # below flag loads model and optimizer state from checkpoint_s3_uri
                    # 'load_partial': 1,
                  }


if use_fsx:
    # make sure to update paths for training-dir and test-dir based on the paths of datasets in fsx
    # If you want to resume training, set checkpoint-dir to the same path as a previous job.
    SM_TRAIN_DIR = "/opt/ml/input/data/train"
    hyperparameters['checkpoint-dir'] = f"{SM_TRAIN_DIR}/checkpointdir-job2"
    hyperparameters['model-dir'] = f"{SM_TRAIN_DIR}/modeldir-job2"
    hyperparameters['training-dir'] = f"{SM_TRAIN_DIR}/datasets/pytorch_gpt2/train_synthetic"
    hyperparameters['test-dir'] = f"{SM_TRAIN_DIR}/datasets/pytorch_gpt2/val_synthetic"

# The checkpoint path (hyperparameters['checkpoint-dir'] or checkpoint_s3_uri) is not unique per job. 
# You need to modify as needed for different runs. 
# If same path is used for unrelated runs, this may increase time when downloading unnecessary checkpoints, 
# and cause conflicts when loading checkpoints.


mpioptions = "-x NCCL_DEBUG=WARN -x SMDEBUG_LOG_LEVEL=ERROR "
mpioptions += "-x SMP_DISABLE_D2D=1 -x SMP_D2D_GPU_BUFFER_SIZE_BYTES=1 -x SMP_NCCL_THROTTLE_LIMIT=1 "
mpioptions += "-x FI_EFA_USE_DEVICE_RDMA=1 -x FI_PROVIDER=efa -x RDMAV_FORK_SAFE=1"

metric_definitions = [{"Name": "base_metric", "Regex": "<><><><><><>"}] # Add your custom metric definitions
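The `Regex` value in the cell above is a placeholder. As a hedged sketch, metric definitions for the loss and throughput could look like the following, where each regex has one capture group holding the metric value. The metric names are hypothetical, and the sample line is taken from the training log printed later in this notebook. The patterns are checked here with Python's `re` module only; confirm that they behave the same once the job runs in SageMaker.

```python
import re

# Hypothetical metric names; each regex captures one numeric value.
example_metric_definitions = [
    {"Name": "train_loss", "Regex": r"Loss: ([0-9.]+)"},
    {"Name": "train_speed_samples_per_sec", "Regex": r"Speed: ([0-9.]+) samples/sec"},
]

# A line in the format the training script prints.
sample_line = "(34s), Batch 9 Loss: 10.8359375, Speed: 15.794319691291253 samples/sec"

extracted = {}
for definition in example_metric_definitions:
    match = re.search(definition["Regex"], sample_line)
    extracted[definition["Name"]] = float(match.group(1))

print(extracted)
```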

Set the model configuration below.

In [35]:
model_config = 'gptj-6b'

if model_config == 'gptj-6b':
    model_params = {        
        'max_context_width': 512, 
        'hidden_width': 4096, 
        'num_layers': 28, 
        'num_heads': 16,
        
        'tensor_parallel_degree': 8,
        'pipeline_parallel_degree': 2,

        'train_batch_size': 16,
        'val_batch_size': 16,
        'prescaled_batch': 1,
    }
else:
    raise RuntimeError("Unknown model config")

for k, v in model_params.items():
    hyperparameters[k] = v
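As a rough sanity check on the configuration above, the size of the transformer blocks can be estimated with the common approximation of 12 × layers × hidden width². This is a sketch only: it assumes a standard block with a 4× feed-forward expansion and ignores embeddings, biases, and layer norms, so it is expected to undercount. For comparison, the training log later in this notebook reports 6,050,882,784 total parameters.

```python
def approx_transformer_params(num_layers, hidden_width):
    """Approximate parameters in the transformer blocks (attention + feed-forward)."""
    # 4*h^2 for the attention projections plus 8*h^2 for a 4x feed-forward block
    return 12 * num_layers * hidden_width ** 2


approx = approx_transformer_params(num_layers=28, hidden_width=4096)
head_dim = 4096 // 16  # hidden_width divided by num_heads

print(f"approx block parameters: {approx:,}")
print(f"per-head dimension: {head_dim}")
```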


## Set Up SageMaker Studio Experiment
Create or load a [SageMaker Experiment](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html) for the example training job. This creates an experiment trial object in SageMaker Studio.

In [28]:
from time import gmtime, strftime

# Specify your experiment name
experiment_name = "smp-gptj"
# Specify your trial name
trial_name = f'{experiment_name}-trial1' 

all_experiment_names = [exp.experiment_name for exp in Experiment.list()]
# Load the experiment if it exists, otherwise create 
if experiment_name not in all_experiment_names:
    experiment = Experiment.create(experiment_name=experiment_name, sagemaker_boto_client=sm_boto_client)
else:
    experiment = Experiment.load(experiment_name=experiment_name, sagemaker_boto_client=sm_boto_client)

# Create the trial
trial = Trial.create(
        trial_name="smp-{}-{}".format(trial_name, strftime("%Y-%m-%d-%H-%M-%S", gmtime())),
        experiment_name=experiment.experiment_name,
        sagemaker_boto_client=sm_boto_client,
    )

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials


## Specify Essential Parameters for a SageMaker Training Job

Next, you will use the [`SageMaker Estimator API`](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) to define a SageMaker Training Job, passing values through the following parameters for training job name, the number of EC2 instances, the instance type, and the size of the volume attached to the instances. 

* `instance_count`
* `instance_type`
* `volume_size`
* `base_job_name`

### Update the Type and Number of EC2 Instances to Use

The instance type and the number of instances you specify for the `instance_type` and `instance_count` parameters, respectively, determine the total number of GPUs (world size).

$$ \text{world size} = (\text{number of GPUs on a single instance}) \times (\text{number of instances}) $$

In [36]:
instance_type = 'ml.p4d.24xlarge'

instance_count = 2

# set to the number of GPUs on that instance
processes_per_host = 8

To look up the number of GPUs of different instance types, see [Amazon EC2 Instance Types](https://aws.amazon.com/ec2/instance-types/). Use the section **Accelerated Computing** to see general purpose GPU instances. Note that, for example, a given instance type `p4d.24xlarge` has a corresponding instance type `ml.p4d.24xlarge` in SageMaker.
For SageMaker supported `ml` instances and cost information, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/). 
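The world size formula can be worked through for the values used in this notebook. The sketch below also derives how many model replicas fit, on the assumption that each replica occupies tensor parallel degree × pipeline parallel degree GPUs and the remaining factor is used for data parallelism. The function name `parallel_layout` is hypothetical.

```python
def parallel_layout(gpus_per_instance, instance_count, tp_degree, pp_degree):
    """Compute the world size and the implied data parallel degree."""
    world_size = gpus_per_instance * instance_count
    model_parallel_size = tp_degree * pp_degree
    if world_size % model_parallel_size != 0:
        raise ValueError(
            f"World size {world_size} is not divisible by "
            f"tp x pp = {model_parallel_size}."
        )
    return {
        "world_size": world_size,
        "model_parallel_size": model_parallel_size,
        "data_parallel_degree": world_size // model_parallel_size,
    }


# Two ml.p4d.24xlarge instances with 8 GPUs each, tensor parallel 8, pipeline parallel 2
layout = parallel_layout(gpus_per_instance=8, instance_count=2, tp_degree=8, pp_degree=2)
print(layout)
```

With these values the 16 GPUs hold exactly one model replica, so adding more instances would increase the data parallel degree rather than the model parallel size.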

### Attach an EBS Volume to the Training Instance
The volume size you specify in `volume_size` must be larger than your input data size. In this example, the volume size is set to 500 GB.

In [30]:
volume_size=500

### Specify a Base Job Name

In [31]:
machine_str = instance_type.split('.')[1] + instance_type.split('.')[2][:3]
pp_degree = hyperparameters['pipeline_parallel_degree']
tp_degree = hyperparameters['tensor_parallel_degree']
base_job_name = f'smp-{model_config}-{machine_str}-tp{tp_degree}-pp{pp_degree}-bs{hyperparameters["train_batch_size"]}'
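To make the naming scheme concrete, the same string operations can be wrapped in a function and run on the values used in this notebook. The function name `build_base_job_name` is hypothetical; the logic mirrors the cell above.

```python
def build_base_job_name(model_config, instance_type, tp_degree, pp_degree, train_batch_size):
    """Build a base job name such as 'smp-gptj-6b-p4d24x-tp8-pp2-bs16'."""
    parts = instance_type.split(".")  # e.g. ['ml', 'p4d', '24xlarge']
    machine_str = parts[1] + parts[2][:3]  # 'p4d' + '24x'
    return f"smp-{model_config}-{machine_str}-tp{tp_degree}-pp{pp_degree}-bs{train_batch_size}"


name = build_base_job_name("gptj-6b", "ml.p4d.24xlarge", 8, 2, 16)
print(name)  # smp-gptj-6b-p4d24x-tp8-pp2-bs16
```

SageMaker appends a timestamp to this base name when it creates the training job, which is why the job name in the training output starts with this string.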

In [32]:
if not use_fsx:
    # If you want to resume training, set checkpoint_s3_uri to the same path as a previous job.
    # Previous checkpoint to load must have same model config.
    # No trailing slash here, so the URI below does not contain a double slash
    checkpoint_bucket = f"s3://sagemaker-{region}-{account}"
    checkpoint_s3_uri = f"{checkpoint_bucket}/experiments/gpt_synthetic_simpletrainer_checkpoints/{base_job_name}/"

### Create a SageMaker Hugging Face Estimator

The following cell constructs a SageMaker Hugging Face estimator, which runs on PyTorch, using the parameters defined above. To see how the SageMaker tensor parallelism modules and functions are applied to the script, see the `train_gptj_simple.py` file and the private preview documentation.

In [33]:
kwargs = {}
if use_fsx:
    # Use the security group and subnet that was used to create the fsx filesystem
    kwargs["security_group_ids"] = [fsx_security_group_id]
    kwargs["subnets"] = [fsx_subnet]

smp_estimator = HuggingFace(
        entry_point="train_gptj_simple.py",
        source_dir=os.getcwd(),
        role=role,
        instance_type=instance_type,
        image="763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04-v1.0",
        volume_size=volume_size,
        instance_count=instance_count,
        sagemaker_session=sagemaker_session,
        distribution={
            "mpi": {
                "enabled": True,
                "processes_per_host": processes_per_host,
                "custom_mpi_options": mpioptions,
            },
            "smdistributed": {
                "modelparallel": {
                    "enabled":True,
                    "parameters": {
                        "ddp": True,
                        "tensor_parallel_degree": hyperparameters['tensor_parallel_degree'],
                        # partitions is a required param in the current SM SDK so it needs to be passed,
                        # these two map to the same config
                        "partitions": hyperparameters['pipeline_parallel_degree'],
                        "shard_optimizer_state": hyperparameters['shard_optimizer_state'] > 0,
                        "prescaled_batch": hyperparameters['prescaled_batch'] > 0,
                        "fp16_params": hyperparameters['fp16'] > 0,
                        "optimize": hyperparameters['optimize'],
                        "auto_partition": True,
                        "default_partition": 0,
                    }
                }
            }
        },
        pytorch_version='1.10',
        transformers_version='4.17',
        py_version='py38',
        output_path=s3_output_bucket,
        checkpoint_s3_uri=checkpoint_s3_uri if not use_fsx else None,
        checkpoint_local_path=hyperparameters['checkpoint-dir'] if use_fsx else None,
        metric_definitions=metric_definitions,
        hyperparameters=hyperparameters,
        debugger_hook_config=False,
        disable_profiler=True,
        base_job_name=base_job_name,
        **kwargs
    )

Finally, run the estimator to launch the SageMaker training job for the GPT-J model with tensor parallelism.

In [34]:
smp_estimator.fit(inputs=data_channels, 
                  experiment_config={
                    "ExperimentName": experiment.experiment_name,
                    "TrialName": trial.trial_name,
                    "TrialComponentDisplayName": "Training",
                  },
                  logs=True)

INFO:sagemaker:Creating training-job with name: smp-gptj-6b-p4d24x-tp8-pp2-bs16-2022-04-12-21-00-49-763


2022-04-12 21:00:52 Starting - Starting the training job......
2022-04-12 21:01:51 Starting - Preparing the instances for training................................................
2022-04-12 21:09:50 Downloading - Downloading input data...
2022-04-12 21:10:16 Training - Downloading the training image.......................
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-04-12 21:14:25,865 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2022-04-12 21:14:25,945 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-04-12 21:14:25,952 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-04-12 21:14:25,884 sagemaker-training-toolkit INFO

2022-04-12 21:14:32,079 sagemaker-training-toolkit INFO     Process[es]: [psutil.Process(pid=57, name='orted', status='sleeping', started='21:14:31')]
2022-04-12 21:14:32,080 sagemaker-training-toolkit INFO     Orted process found [psutil.Process(pid=57, name='orted', status='sleeping', started='21:14:31')]
2022-04-12 21:14:32,080 sagemaker-training-toolkit INFO     Waiting for orted process [psutil.Process(pid=57, name='orted', status='sleeping', started='21:14:31')]

2022-04-12 21:14:23 Training - Training image download completed. Training in progress.
[1,mpirank:8,algo-2]<stderr>:Environment variable SAGEMAKER_INSTANCE_TYPE is not set
[1,mpirank:9,algo-2]<stderr>:Environment variable SAGEMAKER_INSTANCE_TYPE is not set
[1,mpirank:10,algo-2]<stderr>:Environment variable SAGEMAKER_INSTANCE_TYPE is not set
[1,mpirank:11,algo-2]<stderr>:Environment variable SAGEMAKER_INSTANCE_TYPE is not set
[1,mpirank:12,algo-2]<stderr>

[1,mpirank:0,algo-1]<stdout>:# total parameters: 6050882784
[1,mpirank:0,algo-1]<stdout>:Learning rate decay style: linear
[1,mpirank:0,algo-1]<stdout>:Creating val dataloader
[1,mpirank:0,algo-1]<stdout>:Created val dataloader
[1,mpirank:0,algo-1]<stdout>:Reading data from training path ['/opt/ml/input/data/train/part-00001-synthetic.json.gz']
[1,mpirank:5,algo-1]<stdout>:[2022-04-12 21:18:05.120: W smdistributed/modelparallel/backend/split.py:166] Non-splittable object of type <class 'fp16.fp16.FP16_Optimizer'> passed to smp.step. If this object contains tensors that need to be split across microbatches, implement a 'smp_slice' method for this class. See SMP documentation for further information.
[1,mpirank:4,algo-1]<stdout>:[2022-04-12 21:18:05.120: W smdistributed/modelparallel/backend/split.py:166] Non-splittable object of type <class 'fp16.fp16.FP16_Optimizer'> passed to smp.step. If this object contains tensors that need

[1,mpirank:8,algo-2]<stdout>:[2022-04-12 21:18:13.966: I smdistributed/modelparallel/torch/model.py:556] Broadcasted parameters and buffers for partition 1
[1,mpirank:0,algo-1]<stdout>:[2022-04-12 21:18:14.016: I smdistributed/modelparallel/torch/model.py:556] Broadcasted parameters and buffers for partition 0
[1,mpirank:8,algo-2]<stdout>:[2022-04-12 21:18:24.255: W smdistributed/modelparallel/torch/nn/transformer.py:1469] Using fused softmax kernel in attention computation, which ignores the attention mask input. To use an attention mask that masks at least one token, disable the fused softmax kernel by passing fused_softmax=False into the smp.tensor_parallelism or smp.set_tensor_parallelism calls.
[1,mpirank:8,algo-2]<stdout>:[2022-04-12 21:18:24.273: W smdistributed/modelparallel/torch/nn/transformer.py:1469] Using fused softmax kernel in attention computation, which ignores the attention mask input. To use an attention mask that masks at least one to

[1,mpirank:0,algo-1]<stdout>:(34s), Batch 9 Loss: 10.8359375, Speed: 15.794319691291253 samples/sec
[1,mpirank:0,algo-1]<stdout>:(45s), Batch 19 Loss: 10.8671875, Speed: 12.390117131145452 samples/sec
[1,mpirank:0,algo-1]<stdout>:(56s), Batch 29 Loss: 10.84375, Speed: 15.60102725355667 samples/sec
[1,mpirank:0,algo-1]<stdout>:(66s), Batch 39 Loss: 10.8671875, Speed: 15.505916886324062 samples/sec
[1,mpirank:0,algo-1]<stdout>:(77s), Batch 49 Loss: 10.8203125, Speed: 15.226126669632835 samples/sec
[1,mpirank:0,algo-1]<stdout>:(88s), Batch 59 Loss: 10.8125, Speed: 15.649176416378209 samples/sec
[1,mpirank:0,algo-1]<stdout>:(98s), Batch 69 Loss: 10.84375, Speed: 15.770490940332909 samples/sec
[1,mpirank:0,algo-1]<stdout>:(109s), Batch 79 Loss: 10.828125, Speed: 15.554608237893346 samples/sec
[1,mpirank:0,algo-1]<stdout>:(120s), Batch 89 Loss: 10.8203125, Speed: 14.937066498337108 samples/sec
[1,mpirank:0,

[1,mpirank:0,algo-1]<stdout>:Skipping saving the final optimizer state
[1,mpirank:8,algo-2]<stdout>:Skipping saving the final optimizer state
[1,mpirank:8,algo-2]<stdout>:Finished checkpointing after 100 steps: /opt/ml/model/trained_gpt_nparams-6050882784_steps-100.pt
[1,mpirank:0,algo-1]<stdout>:Finished checkpointing after 100 steps: /opt/ml/model/trained_gpt_nparams-6050882784_steps-100.pt
[1,mpirank:0,algo-1]<stdout>:SMP training finished successfully
2022-04-12 21:20:53,166 sagemaker-training-toolkit INFO     Orted process exited
2022-04-12 21:20:53,145 sagemaker-training-toolkit INFO     Reporting training SUCCESS
2022-04-12 21:21:23,194 sagemaker-training-toolkit INFO     MPI process finished.
2022-04-12 21:21:23,194 sagemaker-training-toolkit INFO     Reporting training SUCCESS

2022-04-12 21:21:25 Uploading - Uploading generated training model
2022-04-12 21:38:34 Completed - Training job completed

## Accessing the Training Logs

You can access the training logs from [Amazon CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html). Look at the logs of `algo-1`: it is the master node, and its output stream contains the training job logs.

You can use CloudWatch to track SageMaker GPU and memory utilization during training and inference. To view the metrics and logs that SageMaker writes to CloudWatch, see *Processing Job, Training Job, Batch Transform Job, and Endpoint Instance Metrics* in [Monitor Amazon SageMaker with Amazon CloudWatch](https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html).

If you are a new user of CloudWatch, see [Getting Started with Amazon CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/GettingStarted.html). 

For additional information on monitoring and analyzing Amazon SageMaker training jobs, see [Monitor and Analyze Training Jobs Using Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/training-metrics.html).
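As a hedged sketch, the `algo-1` logs can also be read programmatically with the CloudWatch Logs client in `boto3`. It assumes the training job writes to the `/aws/sagemaker/TrainingJobs` log group and that stream names start with the job name followed by `/algo-1`; verify both in the CloudWatch console for your job. The helper names are hypothetical.

```python
LOG_GROUP = "/aws/sagemaker/TrainingJobs"  # assumed log group for training jobs


def algo1_stream_prefix(training_job_name):
    """Build the assumed log stream prefix for the algo-1 host of a job."""
    return f"{training_job_name}/algo-1"


def print_algo1_logs(training_job_name, limit=50):
    """Print up to `limit` log messages from the algo-1 host."""
    import boto3  # imported lazily so the prefix helper works without boto3

    logs = boto3.client("logs")
    response = logs.filter_log_events(
        logGroupName=LOG_GROUP,
        logStreamNamePrefix=algo1_stream_prefix(training_job_name),
        limit=limit,
    )
    for event in response["events"]:
        print(event["message"])


print(algo1_stream_prefix("smp-gptj-6b-p4d24x-tp8-pp2-bs16-2022-04-12-21-00-49-763"))
# Example call (requires AWS credentials with CloudWatch Logs read access):
# print_algo1_logs("smp-gptj-6b-p4d24x-tp8-pp2-bs16-2022-04-12-21-00-49-763")
```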

## Deploying the Trained Model for Inference

In most cases, the trained model can be deployed on a single device for inference, because inference has smaller memory requirements than training. You can use the SMP API to create a single, unified model after training. For TensorFlow, create a SavedModel with the `smp.DistributedModel.save_model` API; for PyTorch, use `smp.save()`.

After you build and train your models, you can deploy them to get predictions in one of two ways:

* To set up a persistent endpoint to get predictions from your models, use SageMaker hosting services. For an overview on deploying a single model or multiple models with SageMaker hosting services, see [Deploy a Model on SageMaker Hosting Services](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html#how-it-works-hosting).
* To get predictions for an entire dataset, use SageMaker batch transform. For an overview on deploying a model with SageMaker batch transform, see [Get Inferences for an Entire Dataset with Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html).

To learn more about deploying models for inference using SageMaker, see [Deploy Models for Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html). 
