### Install git-lfs and Download the Model

The model weights file, `pytorch_model.bin`, is stored on GitHub using Git LFS. To load the weights in a notebook, install `git-lfs` and pull the file from LFS. The download can take several minutes on a small notebook instance.

In [None]:
%%sh

conda install -y -c conda-forge git-lfs
git lfs install
git lfs fetch --all
git lfs pull
git lfs checkout
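
If the pull did not complete, `pytorch_model.bin` remains a small text pointer instead of the real weights, and loading it later fails with a confusing error. The check below is a minimal sketch that relies on the fact that Git LFS pointer files begin with a fixed `version` line. The helper name `is_lfs_pointer` is made up for this example, and the file location (repository root) is an assumption.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like an LFS pointer rather than real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


# Assumes the notebook runs from the repository root.
weights = Path("pytorch_model.bin")
if not weights.exists():
    print(f"{weights} not found; run this from the repository root.")
elif is_lfs_pointer(weights):
    print(f"{weights} is still an LFS pointer; rerun `git lfs pull`.")
else:
    print(f"{weights} looks like real weights ({weights.stat().st_size} bytes).")
```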

### Package the Model

SageMaker expects the model artifacts as a `.tar.gz` archive, whether it is passed as `model_data` to a `PyTorchModel` instance in the SageMaker SDK or as `ModelDataUrl` to `create_model` in `boto3`. Create a `model.tar.gz` file with the following structure:

```txt
| model/
    | pytorch_model.bin
| code/
    | infer.py
    | model.py
    | requirements.txt
```

The entrypoint for the model is `infer.py`, which loads `pytorch_model.bin` from the `model/` directory and imports `BertForWordBoundaryDetection` from `model.py`. `requirements.txt` lists `transformers`, which is not included in the base Docker image for PyTorch models.
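
For orientation, the SageMaker PyTorch serving container looks for optional hook functions in the entrypoint, namely `model_fn`, `input_fn`, `predict_fn`, and `output_fn`. The sketch below shows only the two serialization hooks and is not the repository's actual `infer.py`. It assumes a JSON list of strings as input, as described in the test section of this notebook, and assumes the prediction is JSON-serializable.

```python
import json


def input_fn(request_body, request_content_type):
    """Parse the request body into a list of strings."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    texts = json.loads(request_body)
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        raise ValueError("Request body must be a JSON list of strings")
    return texts


def output_fn(prediction, accept):
    """Serialize the prediction as JSON, keeping non-ASCII text readable.

    Assumption: `prediction` is already a JSON-serializable object.
    """
    return json.dumps(prediction, ensure_ascii=False)
```

Validating the input shape in `input_fn` makes malformed requests fail with a clear message before they reach the model.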

In [None]:
import os
import tarfile

model_path = "model/"
code_path = "code/"

zipped_model_path = "model.tar.gz"

with tarfile.open(zipped_model_path, "w:gz") as tar:
    tar.add("pytorch_model.bin", os.path.join(model_path, "pytorch_model.bin"))
    tar.add("sagemaker_infer.py", os.path.join(code_path, "infer.py"))
    tar.add("model.py", os.path.join(code_path, "model.py"))
    tar.add("sagemaker_requirements.txt", os.path.join(code_path, "requirements.txt"))
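
Before uploading, it can help to confirm that the archive has the layout described above, since a misplaced file typically only surfaces as an error once the endpoint starts. This is a minimal sketch using the standard `tarfile` module. The helper name `missing_members` is made up for this example, and the expected paths are restated from the structure listing.

```python
import os
import tarfile

# Expected layout, restated from the structure described earlier.
EXPECTED_MEMBERS = {
    "model/pytorch_model.bin",
    "code/infer.py",
    "code/model.py",
    "code/requirements.txt",
}


def missing_members(archive_path, expected=EXPECTED_MEMBERS):
    """Return the expected paths that are absent from the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        present = set(tar.getnames())
    return set(expected) - present


# Only check when the archive from the previous cell is present.
if os.path.exists("model.tar.gz"):
    missing = missing_members("model.tar.gz")
    print("Archive layout OK" if not missing else f"Missing from archive: {sorted(missing)}")
```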

### Deploy the Model

Use `boto3` to upload the `model.tar.gz` file created in the previous cell to S3, then create a model on SageMaker and deploy it for serverless inference.

The cell below creates a serverless endpoint, which may be suitable for lightweight use cases. To deploy to a provisioned instance instead, change the endpoint configuration. For example:

```python
instance_type = "ml.m5.large"  # Replace with your desired instance type
initial_instance_count = 1

endpoint_config_name = f"{algorithm_name}-config-{timestamp}"

create_endpoint_config_response = sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": initial_instance_count,
        }
    ],
)
```

In [None]:
import os
import time

import boto3
from sagemaker import get_execution_role, Session

sagemaker_session = Session()
s3_client = boto3.client("s3")
sagemaker_client = boto3.client("sagemaker")

# Define the necessary variables
algorithm_name = "bert-small-ko-wbd"
timestamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
endpoint_name = f"{algorithm_name}-{timestamp}"
model_name = f"{algorithm_name}-model-{timestamp}"

# Upload the model file to S3
bucket_name = sagemaker_session.default_bucket()
s3_key = f"{endpoint_name}/{os.path.basename(zipped_model_path)}"
s3_client.upload_file(zipped_model_path, bucket_name, s3_key)
model_data_s3_uri = f"s3://{bucket_name}/{s3_key}"
print("Model uploaded to S3:", model_data_s3_uri)

# Create the SageMaker model
create_model_response = sagemaker_client.create_model(
    ModelName=model_name,
    PrimaryContainer={
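        # The image URI is pinned to us-east-1; if deploying to another region,
        # update it here and in SAGEMAKER_REGION below.
        "Image": "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.0-cpu-py310",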
        "ModelDataUrl": model_data_s3_uri,
        "Environment": {
            "SAGEMAKER_PROGRAM": "infer.py",
            "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
            "SAGEMAKER_REGION": "us-east-1",
        },
    },
    ExecutionRoleArn=get_execution_role(),
)

print("Model created:", create_model_response)

# Configure serverless inference
serverless_config = {
    "MemorySizeInMB": 3072,
    "MaxConcurrency": 4
}

# Deploy the model to a serverless endpoint
endpoint_config_name = f"{algorithm_name}-config-{timestamp}"

create_endpoint_config_response = sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": serverless_config,
        }
    ],
)

print("Endpoint configuration created:", create_endpoint_config_response)

create_endpoint_response = sagemaker_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

print("Endpoint creation initiated:", create_endpoint_response)

# Wait for the endpoint to be ready
print("Waiting for endpoint to reach InService status...")
while True:
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    status = response["EndpointStatus"]
    print(f"Current endpoint status: {status}")
    if status == "InService":
        print(f"Endpoint {endpoint_name} is ready.")
        break
    elif status == "Failed":
        raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy. Reason: {response.get('FailureReason', 'Unknown')}")
    time.sleep(30)  # Check status every 30 seconds
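
The polling loop above runs until the endpoint reaches `InService` or `Failed`, so a deployment stuck in another state would keep the cell running indefinitely. A variant with a deadline could look like the sketch below. The function name `wait_for_endpoint` and the default timeout are assumptions made for this example; only `describe_endpoint` is called on the client.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll describe_endpoint until InService; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return response
        if status == "Failed":
            reason = response.get("FailureReason", "Unknown")
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy. Reason: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)
```

It would be called as `wait_for_endpoint(sagemaker_client, endpoint_name)`. `boto3` also ships a built-in waiter, `sagemaker_client.get_waiter("endpoint_in_service")`, which may be preferable when custom status logging is not needed.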

### Test the Endpoint

Send a test invocation to the endpoint created in the previous cell. The entrypoint `infer.py` expects the request body to be a JSON list of strings.

In [None]:
import json

client = Session().sagemaker_runtime_client
body = ["공장 훗날"]

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(body, ensure_ascii=False),
    ContentType="application/json",
)

response["Body"].read()
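
`response["Body"].read()` returns raw bytes, and the stream can be read only once, so it is usually convenient to store the bytes and decode them. The helper below is a minimal sketch that assumes the endpoint replies with UTF-8 text, ideally JSON. The actual response format depends on `infer.py`, and the sample payload is hypothetical.

```python
import json


def decode_body(raw):
    """Decode UTF-8 bytes and parse them as JSON; fall back to plain text."""
    text = raw.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


# Hypothetical payload for illustration only, not real endpoint output.
sample = json.dumps(["공장 훗날"], ensure_ascii=False).encode("utf-8")
print(decode_body(sample))
```

With a live endpoint, the equivalent call would be `decode_body(raw)` after `raw = response["Body"].read()`.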