# Distributed Training: Data Parallelism with Your Own Container

Amazon SageMaker supports distributed training with your own container. When you use a common ML framework that has direct support in SageMaker, such as [TensorFlow](https://github.com/aws/sagemaker-tensorflow-container), [MXNet](https://github.com/aws/sagemaker-mxnet-container), [PyTorch](https://github.com/aws/sagemaker-pytorch-container) or [Chainer](https://github.com/aws/sagemaker-chainer-container), to train a large deep learning model, you can simply supply the Python code that implements your algorithm using the SDK entry points for that framework.

Even when there is direct SDK support for your environment or framework, you may want to add functionality or configure the container environment differently while still using our container on SageMaker.

**Some of the reasons to extend a SageMaker deep learning framework container are:**
1. Install additional dependencies (for example, a specific Python library that the current SageMaker containers don't install).
2. Use the latest revision of a package, such as the newest Transformers release.
3. Configure your environment (for example, add an environment variable to your container).


This walkthrough shows that it is quite straightforward to extend one of our containers to build your own custom container for Q&A tasks.

## Permissions

Running this notebook requires permissions in addition to the normal `SageMakerFullAccess` permissions, because it creates new repositories in Amazon ECR. The easiest way to add these permissions is to add the managed policy `AmazonEC2ContainerRegistryFullAccess` to the role that you used to start your notebook instance. There's no need to restart your notebook instance when you do this; the new permissions are available immediately.

## The example

In this example we show how to package a Transformers container from the ground up, based on an NVIDIA GPU image and the PyTorch framework, with a Q&A example that works with a Hugging Face dataset. By extending the SageMaker PyTorch container we can utilize the existing training and hosting solution made to work on SageMaker.


## Packaging and Uploading your Algorithm for use with Amazon SageMaker

### An overview of Docker

If you're familiar with Docker already, you can skip ahead to the next section.

For many data scientists, Docker containers are a new technology. But they are not difficult and can significantly simplify the deployment of your software packages. 

Docker provides a simple way to package arbitrary code into an _image_ that is totally self-contained. Once you have an image, you can use Docker to run a _container_ based on that image. Running a container is just like running a program on the machine except that the container creates a fully self-contained environment for the program to run. Containers are isolated from each other and from the host environment, so the way your program is set up is the way it runs, no matter where you run it.

Docker is more powerful than environment managers like conda or virtualenv because (a) it is completely language independent and (b) it comprises your whole operating environment, including startup commands and environment variables.

A Docker container is like a virtual machine, but it is much lighter weight. For example, a program running in a container can start in less than a second and many containers can run simultaneously on the same physical or virtual machine instance.

Docker uses a simple file called a `Dockerfile` to specify how the image is assembled. An example is provided below. You can build your Docker images based on Docker images built by yourself or by others, which can simplify things quite a bit.

Docker has become very popular in programming and devops communities due to its flexibility and its well-defined specification of how code can be run in its containers. It is the underpinning of many services built in the past few years, such as [Amazon ECS].

Amazon SageMaker uses Docker to allow users to train and deploy arbitrary algorithms.

In Amazon SageMaker, Docker containers are invoked in one way for training and in another, slightly different, way for hosting. The following sections outline how to build containers for the SageMaker environment.

Some helpful links:

* [Docker home page](http://www.docker.com)
* [Getting started with Docker](https://docs.docker.com/get-started/)
* [Dockerfile reference](https://docs.docker.com/engine/reference/builder/)
* [`docker run` reference](https://docs.docker.com/engine/reference/run/)

[Amazon ECS]: https://aws.amazon.com/ecs/

### How Amazon SageMaker runs your Docker container

Because you can run the same image in training or hosting, Amazon SageMaker runs your container with the argument `train` or `serve`. How your container processes this argument depends on the container. All SageMaker deep learning framework containers already cover this requirement and will trigger your defined training algorithm and inference code.

* If you specify a program as an `ENTRYPOINT` in the Dockerfile, that program will be run at startup and its first argument will be `train` or `serve`. The program can then look at that argument and decide what to do. The original `ENTRYPOINT` specified within the SageMaker PyTorch is [here](https://github.com/aws/deep-learning-containers/blob/master/pytorch/training/docker/1.5.1/py3/Dockerfile.cpu#L123).
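
The dispatch described above can be sketched as follows. This is a minimal illustration of an `ENTRYPOINT` program that inspects its first argument, not the code the SageMaker PyTorch container actually runs; the `train`, `serve` and `dispatch` functions are hypothetical placeholders.

```python
def train():
    # Placeholder for the real training routine.
    return "training"


def serve():
    # Placeholder for starting the inference server.
    return "serving"


def dispatch(argv):
    """Pick the mode from the first argument, as an ENTRYPOINT program would."""
    handlers = {"train": train, "serve": serve}
    if len(argv) < 2 or argv[1] not in handlers:
        raise SystemExit("usage: <program> train|serve")
    return handlers[argv[1]]()


# SageMaker appends "train" or "serve" to the container command line.
print(dispatch(["program", "train"]))
print(dispatch(["program", "serve"]))
```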

#### Running your container during training

Currently, our SageMaker PyTorch container utilizes [console_scripts](http://python-packaging.readthedocs.io/en/latest/command-line-scripts.html#the-console-scripts-entry-point) to make use of the `train` command issued at training time. The line that gets invoked during `train` is defined within the setup.py file inside [SageMaker Containers](https://github.com/aws/sagemaker-containers/blob/master/setup.py#L48), our common SageMaker deep learning container framework. When this command is run, it will invoke the [trainer class](https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/cli/train.py) to run, which will finally invoke our [PyTorch container code](https://github.com/aws/sagemaker-pytorch-container/blob/master/src/sagemaker_pytorch_container/training.py) to run your Python file.

A number of files are laid out for your use, under the `/opt/ml` directory:

    /opt/ml
    |-- input
    |   |-- config
    |   |   |-- hyperparameters.json
    |   |   `-- resourceConfig.json
    |   `-- data
    |       `-- <channel_name>
    |           `-- <input data>
    |-- model
    |   `-- <model files>
    `-- output
        `-- failure

##### The input

* `/opt/ml/input/config` contains information to control how your program runs. `hyperparameters.json` is a JSON-formatted dictionary of hyperparameter names to values. These values are always strings, so you may need to convert them. `resourceConfig.json` is a JSON-formatted file that describes the network layout used for distributed training.
* `/opt/ml/input/data/<channel_name>/` (for File mode) contains the input data for that channel. The channels are created based on the call to CreateTrainingJob but it's generally important that channels match algorithm expectations. The files for each channel are copied from S3 to this directory, preserving the tree structure indicated by the S3 key structure. 
* `/opt/ml/input/data/<channel_name>_<epoch_number>` (for Pipe mode) is the pipe for a given epoch. Epochs start at zero and go up by one each time you read them. There is no limit to the number of epochs that you can run, but you must close each pipe before reading the next epoch.
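
Because the values in `hyperparameters.json` are always strings, a training program usually decodes them first. The sketch below shows one way to do that; the helper name `load_hyperparameters` is an assumption for illustration, and the demo uses a temporary directory in place of `/opt/ml/input/config`.

```python
import json
import tempfile
from pathlib import Path


def load_hyperparameters(config_dir):
    """Read hyperparameters.json and decode each string value where possible."""
    raw = json.loads((Path(config_dir) / "hyperparameters.json").read_text())
    decoded = {}
    for name, value in raw.items():
        try:
            decoded[name] = json.loads(value)  # "2" -> 2, "true" -> True
        except (TypeError, ValueError):
            decoded[name] = value  # plain strings such as "squad" stay as they are
    return decoded


# Demo against a temporary directory standing in for /opt/ml/input/config.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"num_train_epochs": "2", "fp16": "true", "dataset_name": "squad"}
    (Path(tmp) / "hyperparameters.json").write_text(json.dumps(sample))
    demo_hyperparameters = load_hyperparameters(tmp)

print(demo_hyperparameters)
```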

##### The output

* `/opt/ml/model/` is the directory where you write the model that your algorithm generates. Your model can be in any format that you want. It can be a single file or a whole directory tree. SageMaker packages any files in this directory into a compressed tar archive file. This file is made available at the S3 location returned in the `DescribeTrainingJob` result.
* `/opt/ml/output` is a directory where the algorithm can write a file `failure` that describes why the job failed. The contents of this file are returned in the `FailureReason` field of the `DescribeTrainingJob` result. For jobs that succeed, there is no reason to write this file as it is ignored.
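
One possible way to produce the `failure` file is to wrap the training call and write the traceback when it raises. This is a sketch under that assumption; `run_with_failure_report` and `broken_training` are hypothetical names, and the demo writes to a temporary directory rather than `/opt/ml/output`.

```python
import tempfile
import traceback
from pathlib import Path


def run_with_failure_report(train_fn, output_dir="/opt/ml/output"):
    """Run train_fn; on error, write the traceback to <output_dir>/failure and re-raise."""
    try:
        return train_fn()
    except Exception:
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        (out / "failure").write_text(traceback.format_exc())
        raise


def broken_training():
    # Stand-in for a training routine that fails.
    raise ValueError("learning rate must be positive")


# Demo with a temporary directory in place of /opt/ml/output.
with tempfile.TemporaryDirectory() as tmp:
    try:
        run_with_failure_report(broken_training, output_dir=tmp)
    except ValueError:
        failure_text = (Path(tmp) / "failure").read_text()

print(failure_text.splitlines()[-1])
```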

#### Running your container during hosting

Hosting has a very different model than training because hosting responds to inference requests that come in via HTTP. Currently, the SageMaker PyTorch containers [use](https://github.com/aws/sagemaker-pytorch-container/blob/master/src/sagemaker_pytorch_container/serving.py#L103) our [recommended Python serving stack](https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_server.py#L44) to provide robust and scalable serving of inference requests:

![Request serving stack](stack.png)

Amazon SageMaker uses two URLs in the container:

* `/ping` receives `GET` requests from the infrastructure. Your program returns 200 if the container is up and accepting requests.
* `/invocations` is the endpoint that receives client inference `POST` requests. The format of the request and the response is up to the algorithm. If the client supplied `ContentType` and `Accept` headers, these are passed in as well. 
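
To make the two URLs concrete, here is a minimal sketch of a server that answers `/ping` and `/invocations`, using only the Python standard library. It is not the serving stack the SageMaker containers use; the `InferenceHandler` class and the echo response are placeholders, and the demo binds to a free local port instead of port 8080, on which a SageMaker container is expected to listen.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    def _reply(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: 200 means the container is up and accepting requests.
        self._reply(200 if self.path == "/ping" else 404)

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder "model": echo the request back to the client.
        self._reply(200, json.dumps({"echo": payload}).encode())

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), InferenceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()


def call(method, path, body=None):
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    conn.request(method, path, body=body, headers={"Content-Type": "application/json"})
    response = conn.getresponse()
    data = response.read()
    conn.close()
    return response.status, data


ping_status, _ = call("GET", "/ping")
_, raw_result = call("POST", "/invocations", json.dumps({"question": "Who is Philipp?"}))
invocation_result = json.loads(raw_result)

server.shutdown()
server.server_close()
print(ping_status, invocation_result)
```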

The container has the model files in the same place that they were written to during training:

    /opt/ml
    `-- model
        `-- <model files>



## The Dockerfile

The Dockerfile describes the image that we want to build. You can think of it as describing the complete operating system installation of the system that you want to run. A running Docker container is quite a bit lighter than a full operating system, however, because it takes advantage of Linux on the host machine for the basic operations.

We start from the SageMaker PyTorch image as the base. The base image is an ECR image, so it will have the following pattern.
* {account}.dkr.ecr.{region}.amazonaws.com/sagemaker-{framework}:{framework_version}-{processor_type}-{python_version}

Here is an explanation of each field.
1. account - AWS account ID the ECR image belongs to. Our public deep learning framework images are all under the 520713654638 account.
2. region - The region the ECR image belongs to. [Available regions](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/).
3. framework - The deep learning framework.
4. framework_version - The version of the deep learning framework.
5. processor_type - CPU or GPU.
6. python_version - The supported version of Python.

So the SageMaker PyTorch ECR image would be:
520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-pytorch:0.4.0-cpu-py3
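
The pattern can also be assembled programmatically. The helper below is a small sketch (the function name `sagemaker_image_uri` is an assumption, not part of any SDK) that fills in the six fields described above and reproduces the example URI.

```python
def sagemaker_image_uri(account, region, framework, framework_version,
                        processor_type, python_version):
    """Fill in the ECR image pattern for a SageMaker framework image."""
    return (
        f"{account}.dkr.ecr.{region}.amazonaws.com/"
        f"sagemaker-{framework}:{framework_version}-{processor_type}-{python_version}"
    )


example_uri = sagemaker_image_uri(
    account="520713654638",
    region="us-west-2",
    framework="pytorch",
    framework_version="0.4.0",
    processor_type="cpu",
    python_version="py3",
)
print(example_uri)
```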

Information on supported frameworks and versions can be found in this [README](https://github.com/aws/sagemaker-python-sdk).

Next, we add the code that implements our specific algorithm to the container and set up the right environment for it to run under.

**DISCLAIMER: As of now, the two environment variables below are only supported for the SageMaker Chainer (4.1.0+) and PyTorch (0.4.0+) containers.**

Finally, we need to specify two environment variables.
1. SAGEMAKER_SUBMIT_DIRECTORY - the directory within the container containing our Python script for training and inference.
2. SAGEMAKER_PROGRAM - the Python script that should be invoked for training and inference.

Let's look at the Dockerfile for this example.

In [1]:
!cat scripts/Dockerfile.fixed

FROM nvidia/cuda:11.5.1-base-ubuntu20.04

# Remove any third-party apt sources to avoid issues with expiring keys.
RUN rm -f /etc/apt/sources.list.d/*.list

# Install some basic utilities
RUN apt-get update && apt-get install -yq \
    curl \
    ca-certificates \
    #nginx \
    sudo \
    git \
    bzip2 \
    libx11-6 \
    build-essential \
 && rm -rf /var/lib/apt/lists/*

# Create a working directory
RUN mkdir /app
WORKDIR /app

# Create a non-root user and switch to it
RUN adduser --disabled-password --gecos '' --shell /bin/bash user \
 && chown -R user:user /app
RUN echo "user ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-user
USER user

# All users can use /home/user as their home directory
ENV HOME=/home/user
RUN mkdir $HOME/.cache $HOME/.config \
 && chmod -R 777 $HOME

# Set up the Conda environment (using Miniforge)
ENV PATH=$HOME/mambaforge/bin:$PATH
COPY environment.yml /app/environment.yml
RUN curl -sLo ~/mambaforge.sh https://github.com/conda-forge/miniforge/releases/dow

In [51]:
!cat scripts/environment.yml

name: base
channels:
- conda-forge
dependencies:
- cudatoolkit=11.5.1
- numpy=1.22.4
- pillow=9.1.1
- pip=22.1.2
- python=3.9.13
- pytorch::pytorch=1.11.0=py3.9_cuda11.5_cudnn8.3.2_0
- pytorch::torchvision=0.12.0=py39_cu115
- scipy=1.8.1
- ffmpeg=5.0.1
- tqdm=4.64.0
- transformers=4.19.2
- tensorboard=2.9.1
- tensorboardX=2.5.1
- mpi4py=3.1.3
#- datatsets=2.3.2

## Building and registering the container

The following shell code shows how to build the container image using `docker build` and push the container image to ECR using `docker push`. This code is also available as the shell script `container/build-and-push.sh`, which you can run as `build-and-push.sh pytorch-extending-our-containers-example` to build the image `pytorch-extending-our-containers-example`. 

This code looks for an ECR repository in the account you're using and the current default region (if you're using a SageMaker notebook instance, this is the region where the notebook instance was created). If the repository doesn't exist, the script will create it. In addition, since we are using the SageMaker PyTorch image as the base, we will need to retrieve ECR credentials to pull this public image.

In [52]:
%%sh

# The name of our algorithm
algorithm_name=torch-hf-gpu-991
cd scripts

account=$(aws sts get-caller-identity --query Account --output text)
#ecr_image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account[0], region, algorithm_name)
# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Log in to this account's ECR registry. `aws ecr get-login` is not available in
# AWS CLI version 2; `get-login-password` replaces it. Note that `docker login`
# expects the registry host, not the full image name.
registry="${account}.dkr.ecr.${region}.amazonaws.com"
aws ecr get-login-password --region "${region}" | docker login --username AWS --password-stdin "${registry}"

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} . -f Dockerfile.fixed --build-arg REGION=${region} --progress=plain
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}



### Training with Amazon SageMaker
Once we have pushed our container to Amazon ECR, we are ready to start training with Amazon SageMaker, which requires the ECR path to the Docker container used for training as a parameter for starting a training job.

SageMaker sets up a docker container for a training job where:

* Environment variables are set as described in the SageMaker Docker container environment variables documentation.
* Training data is set up under `/opt/ml/input/data`.
* Training script code is set up under `/opt/ml/code`.

The `/opt/ml/model` and `/opt/ml/output` directories are set up to store training outputs.

```
/opt/ml
├── input
│   ├── config
│   │   ├── hyperparameters.json  <--- From Estimator hyperparameter arg
│   │   └── resourceConfig.json
│   └── data
│       └── <channel_name>        <--- From Estimator fit method inputs arg
│           └── <input data>
├── code
│   └── <code files>              <--- From Estimator src_dir arg
├── model
│   └── <model files>             <--- Location to save the trained model artifacts
└── output
    └── failure                   <--- Training job failure logs
```

The SageMaker Estimator `fit(inputs)` method executes the training script. The Estimator hyperparameters and the `fit` method inputs are provided to the script as command line arguments.

* The training script saves the model artifacts in `/opt/ml/model` once training is complete.

* SageMaker archives the artifacts under `/opt/ml/model` into `model.tar.gz` and saves it to the S3 location specified by the Estimator's `output_path` parameter.

* You can set the Estimator's `metric_definitions` parameter to extract model metrics from the training logs, and then monitor training progress in the SageMaker console metrics.
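
Since hyperparameters arrive as command line arguments, a training script typically parses them with `argparse`. The sketch below assumes a handful of argument names taken from this example's hyperparameters; the defaults and the `parse_training_args` helper are illustrative assumptions, not what `run_qa.py` does internally.

```python
import argparse


def parse_training_args(argv):
    """Parse the hyperparameters that the training script cares about."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_train_epochs", type=int, default=1)
    parser.add_argument("--max_seq_length", type=int, default=128)
    parser.add_argument("--dataset_name", type=str, default="squad")
    parser.add_argument("--output_dir", type=str, default="/opt/ml/model")
    # parse_known_args tolerates hyperparameters this script does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


demo_args = parse_training_args(
    ["--num_train_epochs", "2", "--max_seq_length", "384", "--doc_stride", "128"]
)
print(demo_args.num_train_epochs, demo_args.max_seq_length, demo_args.output_dir)
```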

![SM_Diagram](https://i.stack.imgur.com/gi8bU.png)


The Amazon SageMaker Python SDK provides framework estimators and generic estimators that train your model while orchestrating the machine learning (ML) lifecycle, giving access to the SageMaker training features and the underlying AWS infrastructure. To train a model by using the SageMaker Python SDK:

1. Prepare a training script
2. Create an estimator
3. Call the fit method of the estimator

These steps are concrete, but they leave out comprehensive details about environment variables, the directory structure in the SageMaker Docker container, the S3 locations for uploading code and placing data, the S3 location where the trained model is saved, and so on.


## Setup SageMaker environment

_If you are going to use SageMaker in a local environment, you need access to an IAM role with the required permissions for SageMaker. You can find more about it [here](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html)._

In [1]:
import logging
import transformers
import sagemaker

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
logger.info(f'Using SageMaker version: {sagemaker.__version__}')
logger.info(f'Using Transformers version: {transformers.__version__}')


sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

#role = sagemaker.get_execution_role()
role = 'arn:aws:iam::976939723775:role/service-role/AmazonSageMaker-ExecutionRole-20210317T133000'
sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

Using SageMaker version: 2.96.0
Using Transformers version: 4.19.2


sagemaker role arn: arn:aws:iam::976939723775:role/service-role/AmazonSageMaker-ExecutionRole-20210317T133000
sagemaker bucket: sagemaker-us-west-2-976939723775
sagemaker session region: us-west-2


SageMaker provides useful properties about the training environment through various environment variables, including the following:

* `SM_MODEL_DIR`: A string that represents the path where the training job writes the model artifacts to. After training, artifacts in this directory are uploaded to S3 for model hosting.

* `SM_NUM_GPUS`: An integer representing the number of GPUs available to the host.

* `SM_CHANNEL_XXXX`: A string that represents the path to the directory that contains the input data for the specified channel. For example, if you specify two input channels in the HuggingFace estimator’s fit call, named `train` and `test`, the environment variables `SM_CHANNEL_TRAIN` and `SM_CHANNEL_TEST` are set.
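
A training script can collect these variables in one place. The sketch below is one way to do it; the `training_environment` helper and its fallback values are assumptions for illustration, and the demo passes a plain dictionary instead of the real process environment.

```python
import os


def training_environment(environ=os.environ):
    """Collect the SM_* settings a training script commonly needs."""
    prefix = "SM_CHANNEL_"
    channels = {
        name[len(prefix):].lower(): path
        for name, path in environ.items()
        if name.startswith(prefix)
    }
    return {
        # The fallbacks below are assumptions for running outside SageMaker.
        "model_dir": environ.get("SM_MODEL_DIR", "/opt/ml/model"),
        "num_gpus": int(environ.get("SM_NUM_GPUS", "0")),
        "channels": channels,
    }


demo_env = training_environment({
    "SM_MODEL_DIR": "/opt/ml/model",
    "SM_NUM_GPUS": "1",
    "SM_CHANNEL_TRAIN": "/opt/ml/input/data/train",
    "SM_CHANNEL_TEST": "/opt/ml/input/data/test",
})
print(demo_env)
```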


To run your training job locally you can define `instance_type='local'`, or `instance_type='local_gpu'` for GPU usage. _Note: this does not work within SageMaker Studio._

## Creating an Estimator and starting a training job

In this example we use the capability to download and use a fine-tuning script from a `git` repository. We use `run_qa.py` from the `transformers` example scripts. You can find the code [here](https://github.com/huggingface/transformers/tree/master/examples/question-answering).


### RANK in MPI

Processes are the actual instances of the program that are running. MPI allows you to create logical groups of processes, and in each group, a process is identified by its rank. This is an integer in the range [0, N-1] where N is the size of the group. Communicators are objects that handle communication between processes. An intra-communicator handles processes within a single group, while an inter-communicator handles communication between two distinct groups.

By default, you have a single group that contains all your processes, and the intra-communicator MPI_COMM_WORLD that handles communication between them. This is sufficient for most applications, and does blur the distinction between process and rank a bit. The main thing to remember is that the rank of a process is always relative to a group. If you were to split your processes into two groups (e.g. one group to read input and another group to process data), then each process would now have two ranks: the one it originally had in MPI_COMM_WORLD, and one in its new group.
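
The idea that a rank is always relative to a group can be illustrated without MPI. The pure-Python sketch below only mimics the renumbering that splitting a communicator produces; it makes no MPI calls, and the `split_ranks` helper and the "reader"/"worker" group names are hypothetical.

```python
def split_ranks(world_size, color_of):
    """Return {world_rank: (group, group_rank)} after splitting the world into groups."""
    next_rank = {}
    assignment = {}
    for world_rank in range(world_size):
        group = color_of(world_rank)
        group_rank = next_rank.get(group, 0)  # ranks restart at 0 inside each group
        assignment[world_rank] = (group, group_rank)
        next_rank[group] = group_rank + 1
    return assignment


# Six processes: the first two read input, the remaining four process data.
ranks = split_ranks(6, lambda rank: "reader" if rank < 2 else "worker")
for world_rank, (group, group_rank) in ranks.items():
    print(f"world rank {world_rank} -> {group} rank {group_rank}")
```

Each process keeps its original rank in the world group and gains a second rank inside its new group.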

In [2]:
from transformers import AutoTokenizer

# tokenizer used in preprocessing
tokenizer_name = 'bert-large-uncased-whole-word-masking'
#tokenizer_name = 'roberta-large'

# download tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

### Distributed Training with Parameter Servers
A common pattern in distributed training is to use dedicated processes to collect gradients computed by “worker” processes, then aggregate them and distribute the updated gradients back to the workers. These processes are known as parameter servers. In general, they can be run either on their own machines or co-located on the same machines as the workers. In a parameter server cluster, each parameter server communicates with all workers (“all-to-all”). The Amazon SageMaker prebuilt TensorFlow container comes with a built-in option to use parameter servers for distributed training. The container runs a parameter server thread in each training instance, so there is a 1:1 ratio of parameter servers to workers. With this built-in option, gradient updates are made asynchronously (though some other versions of parameter servers use synchronous updates).
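
As a toy illustration of the asynchronous variant, the sketch below keeps the weights in a single in-process object and applies each worker's gradient as soon as it arrives. It is a simplification for intuition only, not how the SageMaker TensorFlow container implements parameter servers; the `ParameterServer` class, the quadratic loss and the learning rate are all assumptions.

```python
class ParameterServer:
    def __init__(self, weights, learning_rate):
        self.weights = list(weights)
        self.learning_rate = learning_rate

    def push(self, gradient):
        """Apply one worker's gradient immediately (asynchronous update)."""
        self.weights = [
            w - self.learning_rate * g for w, g in zip(self.weights, gradient)
        ]

    def pull(self):
        """Hand the current weights to a worker."""
        return list(self.weights)


def worker_gradient(weights, target):
    # Gradient of 0.5 * (w - target)^2 for each coordinate.
    return [w - t for w, t in zip(weights, target)]


ps = ParameterServer([0.0, 0.0], learning_rate=0.5)
# Two workers, each with its own data (represented here by a target vector).
for target in ([1.0, 2.0], [3.0, 4.0]):
    ps.push(worker_gradient(ps.pull(), target))

print(ps.pull())
```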

In [28]:
from sagemaker.estimator import Estimator

# configuration for running training with smdistributed data parallel
dp_options = {
    "enabled":True,
    # RANK: unique number identifying a process within a communicator;
    # WORLD_SIZE: total number of processes in MPI_COMM_WORLD
    "custom_mpi_options": "-verbose -x NCCL_DEBUG=VERSION  -x RANK=0 -x WORLD_SIZE=2",
    "parameters": {
        "microbatches": 4,
    }
}

# distribution strategies passed to the Estimator
distribution = {
    'smdistributed':{'dataparallel': dp_options}, 
    'parameter_server':{'enabled': True}, # or False
}

# instance configurations
instance_type='ml.g4dn.2xlarge'  # ml.p3.2xlarge or larger types work as well
instance_count=2 # if set 1 you might get "Environment variable SAGEMAKER_INSTANCE_TYPE is not set" error
volume_size=200


# metric definition to extract the results
# raw strings keep the regex escape \D from being treated as a string escape
metric_definitions=[
     {"Name": "train_runtime", "Regex": r"train_runtime.*=\D*(.*?)$"},
     {'Name': 'train_samples_per_second', 'Regex': r"train_samples_per_second.*=\D*(.*?)$"},
     {'Name': 'epoch', 'Regex': r"epoch.*=\D*(.*?)$"},
     {'Name': 'f1', 'Regex': r"f1.*=\D*(.*?)$"},
     {'Name': 'exact_match', 'Regex': r"exact_match.*=\D*(.*?)$"}]
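
To see what these patterns capture, the sketch below applies three of them to sample log lines with Python's `re.search`. The log lines are hypothetical examples in a `name = value` style, and using `re.search` line by line is an assumption that only approximates how SageMaker scans the training logs.

```python
import re

metric_regexes = {
    "train_runtime": r"train_runtime.*=\D*(.*?)$",
    "f1": r"f1.*=\D*(.*?)$",
    "exact_match": r"exact_match.*=\D*(.*?)$",
}

# Hypothetical log lines in the "name = value" style the patterns expect.
sample_log = [
    "  train_runtime = 123.45",
    "  eval_f1 = 88.5",
    "  eval_exact_match = 81.2",
]

extracted = {}
for line in sample_log:
    for name, pattern in metric_regexes.items():
        match = re.search(pattern, line)
        if match:
            # Group 1 is everything from the first digit to the end of the line.
            extracted[name] = float(match.group(1))

print(extracted)
```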

In [4]:
import json
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}

In [5]:
hyperparameters_json=json_encode_hyperparameters({
    'model_name_or_path': tokenizer_name,
    'dataset_name':'squad',
    'do_train': True,
    'do_eval': True,
    'fp16': True,
    'per_device_train_batch_size': 4,
    'per_device_eval_batch_size': 4,
    'num_train_epochs': 2,
    'max_seq_length': 384,
    'max_steps': 100,
    'pad_to_max_length': True,
    'doc_stride': 128,
    'output_dir': '/opt/ml/model',
})

In [37]:
import boto3
session = boto3.session.Session()

account = !aws sts get-caller-identity --query Account --output text
algorithm_name='torch-hf-gpu-991'
ecr_image_self = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account[0], session.region_name, algorithm_name)
ecr_image_smd190= '763104351884.dkr.ecr.{}.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker'.format(session.region_name)
ecr_image_smd181 = '763104351884.dkr.ecr.{}.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04'.format(session.region_name)
ecr_image_smd180 = '763104351884.dkr.ecr.{}.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04'.format(session.region_name)

## Use BYOC to overcome instance type limitation

By default, the SageMaker HuggingFace module supports only `ml.p3.16xlarge`, `ml.p3dn.24xlarge`, `ml.p4d.24xlarge` and `local_gpu`. The following changes allow SageMaker to use other instance types.

In [39]:
# estimator
env={'SAGEMAKER_REQUIREMENTS': 'requirements.txt'}

huggingface_estimator = Estimator(
    image_uri=ecr_image_smd181,
    source_dir="./scripts",
    entry_point="run_qa.py",
    dependencies=['./scripts/trainer_qa.py', './scripts/utils_qa.py'],
    env=env,
    metric_definitions=metric_definitions,
    instance_type=instance_type,
    instance_count=instance_count,
    volume_size=volume_size,
    role=role,
    py_version='3.8',
    distribution=distribution,
    hyperparameters=hyperparameters_json,
)

In [None]:
job_name='torch-hf-gpu-2022-07-01-08'

In [None]:
# starting the train job
huggingface_estimator.fit(job_name=job_name)

2022-07-01 19:55:26 Starting - Starting the training job...
2022-07-01 19:55:50 Starting - Preparing the instances for trainingProfilerReport-1656705326: InProgress
.........
2022-07-01 19:57:14 Downloading - Downloading input data
2022-07-01 19:57:14 Training - Downloading the training image.......................
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
  "class": algorithms.Blowfish,
2022-07-01 20:01:13,080 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2022-07-01 20:01:13,099 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-07-01 20:01:13,105 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-07-01 20:01:13,548 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
/opt/conda/bin/python3.8 -m pip install -

### Display the estimator's stored profiler artifact path

By default, the artifacts are stored in a designated S3 folder in your bucket.

In [11]:
huggingface_estimator.latest_job_profiler_artifacts_path()

's3://sagemaker-us-west-2-976939723775/torch-hf-gpu-2022-07-01-04/profiler-output'

### Deploy an endpoint for inference

In [12]:
from sagemaker.huggingface import HuggingFaceModel

# By default, SageMaker saves the model artifact as model.tar.gz under the job's output/ prefix
model_data = f's3://sagemaker-us-west-2-976939723775/{job_name}/output/model.tar.gz'

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=model_data,  # path to your trained sagemaker model
   role=role, # iam role with permissions to create an Endpoint
   transformers_version="4.12", # transformers version used
   pytorch_version="1.9", # pytorch version used
   py_version="py38", # python version of the DLC
)

In [13]:
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.2xlarge"
)

------!

### Send sample data to the endpoint for inference

In [14]:
data = {
"inputs": {
    "question": "Who is Philipp?",
	"context": "My Name is Philipp and I am a computer eingeer who lives in San Francisco, California."
	}
}
predictor.predict(data)

{'score': 0.1322418451309204,
 'start': 30,
 'end': 46,
 'answer': 'computer eingeer'}
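
The `start` and `end` fields in the response are character offsets into the context string, so the answer can be recovered by slicing. The short check below reuses the context and the response shown above and needs no endpoint.

```python
context = (
    "My Name is Philipp and I am a computer eingeer who lives in "
    "San Francisco, California."
)
response = {
    "score": 0.1322418451309204,
    "start": 30,
    "end": 46,
    "answer": "computer eingeer",
}

# Slice the context with the returned character offsets.
span = context[response["start"]:response["end"]]
print(span == response["answer"])
```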

### Optional cleanup
When you're done with the endpoint, you should clean it up.

All the training jobs, models and endpoints we created can be viewed through the SageMaker console of your AWS account.

In [None]:
predictor.delete_endpoint()