### Install git-lfs and Download the Model

The model weights file, `pytorch_model.bin`, is stored with Git LFS on GitHub. To load the weights in a notebook, install `git-lfs` and pull the file from LFS. This can take several minutes on a small notebook instance.

In [None]:
%%sh

conda install -y -c conda-forge git-lfs
git lfs install
git lfs fetch --all
git lfs pull
git lfs checkout
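
If the LFS download did not complete, `pytorch_model.bin` is left as a small text pointer file rather than the real weights. The sketch below checks for that case by looking for the header line that Git LFS pointer files start with. `is_lfs_pointer` is a hypothetical helper introduced here for illustration, and the check assumes the notebook runs from the repository root.

```python
from pathlib import Path

# Git LFS pointer files begin with this line instead of binary content.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like a Git LFS pointer, not real content."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


# Assumption: the weights file sits in the current working directory.
weights = Path("pytorch_model.bin")
if weights.exists():
    if is_lfs_pointer(weights):
        print("pytorch_model.bin is still an LFS pointer; re-run `git lfs pull`.")
    else:
        print(f"pytorch_model.bin looks downloaded ({weights.stat().st_size} bytes).")
```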

### Package the Model

The SageMaker Python SDK expects a `.tar.gz` archive to be passed as `model_data` to a `PyTorchModel` instance. Create a `model.tar.gz` file with the following structure:

```txt
| model/
    | pytorch_model.bin
| code/
    | infer.py
    | model.py
    | requirements.txt
```

The entry point for the model is `infer.py`, which loads `pytorch_model.bin` from the `model/` directory and imports `BertForWordBoundaryDetection` from `model.py`. `requirements.txt` installs `transformers`, which is not included in the base Docker image for PyTorch models. The cell below adds the local files `sagemaker_infer.py` and `sagemaker_requirements.txt` to the archive under the names `infer.py` and `requirements.txt`.

In [None]:
import os
import tarfile

model_path = "model/"
code_path = "code/"

zipped_model_path = "model.tar.gz"

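# `arcname` sets each file's path inside the archive, so local files can be
# renamed and placed under model/ or code/ without moving them on disk.
with tarfile.open(zipped_model_path, "w:gz") as tar:
    tar.add("pytorch_model.bin", arcname=os.path.join(model_path, "pytorch_model.bin"))
    tar.add("sagemaker_infer.py", arcname=os.path.join(code_path, "infer.py"))
    tar.add("model.py", arcname=os.path.join(code_path, "model.py"))
    tar.add("sagemaker_requirements.txt", arcname=os.path.join(code_path, "requirements.txt"))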
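
Before uploading, it can help to confirm that the archive matches the layout shown above. The following is a minimal sketch that lists the archive members and reports any expected path that is missing. `missing_members` and `EXPECTED_MEMBERS` are hypothetical names introduced here for illustration.

```python
import os
import tarfile

# Paths the archive is expected to contain, per the structure described above.
EXPECTED_MEMBERS = {
    "model/pytorch_model.bin",
    "code/infer.py",
    "code/model.py",
    "code/requirements.txt",
}


def missing_members(archive_path, expected=EXPECTED_MEMBERS):
    """Return a sorted list of expected member paths absent from the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = set(tar.getnames())
    return sorted(set(expected) - names)


# Only run the check if the archive has already been created.
if os.path.exists("model.tar.gz"):
    missing = missing_members("model.tar.gz")
    print("Archive is complete." if not missing else f"Missing: {missing}")
```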

### Deploy the Model

Use the SageMaker Python SDK to initialize a `PyTorchModel` instance with the `model.tar.gz` file created in the previous cell.

The cell below creates a serverless endpoint, which may be suitable for lightweight use cases. To create a provisioned endpoint instead, call `model.deploy` with an instance count and instance type rather than a serverless inference config. For example:

```python
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name=endpoint_name
)
```

In [None]:
import time

from sagemaker import get_execution_role, Session
from sagemaker.pytorch import PyTorchModel
from sagemaker.serverless import ServerlessInferenceConfig

algorithm_name = "bert-small-ko-word-boundary-detection"
timestamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
endpoint_name = f"{algorithm_name}-{timestamp}"

model = PyTorchModel(
    name=f"{algorithm_name}-model-{timestamp}",
    entry_point="infer.py",
    model_data=zipped_model_path,
    role=get_execution_role(),
    framework_version="2.0",
    py_version="py310",
)

config = ServerlessInferenceConfig(
    memory_size_in_mb=3072,
    max_concurrency=4,
)

predictor = model.deploy(
    serverless_inference_config=config,
    endpoint_name=endpoint_name
)
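
The model and endpoint names above are built from `algorithm_name` and a timestamp, and SageMaker rejects names that are too long or contain unsupported characters. The sketch below is a pre-flight check that assumes the constraints documented for the `CreateModel` and `CreateEndpoint` APIs: at most 63 characters, using only letters, digits, and hyphens, and not starting or ending with a hyphen. `is_valid_sagemaker_name` is a hypothetical helper, not part of the SDK.

```python
import re
import time

# Assumed constraints: alphanumerics and hyphens only, no leading or
# trailing hyphen, and at most 63 characters in total.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
MAX_NAME_LENGTH = 63


def is_valid_sagemaker_name(name):
    """Return True if `name` satisfies the assumed SageMaker naming rules."""
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.fullmatch(name) is not None


algorithm_name = "bert-small-ko-word-boundary-detection"
timestamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

for name in (f"{algorithm_name}-model-{timestamp}", f"{algorithm_name}-{timestamp}"):
    print(name, len(name), is_valid_sagemaker_name(name))
```

With the `algorithm_name` used in this notebook, the model name comes to exactly 63 characters, so a longer algorithm name would need a shorter suffix under this assumed limit.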

### Test the Endpoint

Send a test invocation to the endpoint created in the previous cell. Note that the entry point `infer.py` expects the request body to be a JSON list of strings.

In [None]:
import json

client = Session().sagemaker_runtime_client
body = ["공장 훗날"]

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(body, ensure_ascii=False),
    ContentType="application/json",
)

response["Body"].read()
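
`response["Body"].read()` returns raw bytes, so any Korean text in the output is displayed as escaped byte sequences. The sketch below shows one way to encode the request and decode the response explicitly. It assumes the endpoint returns UTF-8 encoded JSON, which depends on how `infer.py` serializes its output. `encode_request` and `decode_response` are hypothetical helpers, and the example uses a stand-in response object rather than calling the endpoint.

```python
import io
import json


def encode_request(texts):
    """Serialize a list of strings to UTF-8 JSON bytes for the request body."""
    return json.dumps(texts, ensure_ascii=False).encode("utf-8")


def decode_response(response):
    """Read the streaming body and parse it as UTF-8 JSON (assumed format)."""
    return json.loads(response["Body"].read().decode("utf-8"))


# Local round trip with a stand-in response; no endpoint is invoked here.
fake_response = {"Body": io.BytesIO(encode_request(["공장 훗날"]))}
print(decode_response(fake_response))
```

Against the live endpoint, `encode_request(body)` would be passed as `Body` to `invoke_endpoint`, and the returned `response` would be passed to `decode_response`.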