### Batch Transform notebook
- Starts a SageMaker Batch Transform job for a Hugging Face model stored in S3


### Generate input file for batch transform
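- Reads a CSV file and writes a JSONL file (one JSON object per line)
- Adds a `parameters` object to each record; its `return_all_scores` flag makes the batch transform job return the inference score for every label, and `truncation` / `max_length` cap the input length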
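
With `return_all_scores` enabled, each prediction carries one score per label rather than only the top one. The following is a minimal sketch of picking the highest-scoring label from such a result. It assumes each result is a list of `{"label": ..., "score": ...}` objects, possibly nested one level deep; the exact shape of the job's output lines is not shown in this notebook, `top_label` is a hypothetical helper, and the label names and scores are made up for illustration.

```python
import json


def top_label(scores):
    """Return the (label, score) pair with the highest score.

    Assumption: `scores` is a list of {"label": str, "score": float} dicts,
    optionally wrapped in one outer list.
    """
    # Unwrap one level of nesting if the per-label list is wrapped in a list.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# Hypothetical result line; not taken from a real job output.
example_line = (
    '[{"label": "LABEL_0", "score": 0.1}, '
    '{"label": "LABEL_1", "score": 0.7}, '
    '{"label": "LABEL_2", "score": 0.2}]'
)

label, score = top_label(json.loads(example_line))
print(label, score)
```

Keeping all scores is useful when a threshold or a top-k selection is applied later, since that cannot be recovered from a single top label.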

In [2]:
# generate input file
import csv
import json
from sagemaker.s3 import S3Uploader, s3_path_join

# dataset files
dataset_csv_file = "sample_data/sample_data_input.csv"
dataset_jsonl_file = "outputs/data.jsonl"

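# the input is only read and the output only written; newline="" is what the csv module expects
with open(dataset_csv_file, "r", newline="") as infile, open(dataset_jsonl_file, "w") as outfile: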
    reader = csv.DictReader(infile)
    for row in reader:
        row["parameters"] = {"return_all_scores": True, "truncation": True, "max_length": 512}
        json.dump(row, outfile)
        print(row)
        outfile.write("\n")

sagemaker_session_bucket = "sector-classification-aiml"
# S3 locations for the batch transform input and output
input_s3_path = s3_path_join("s3://", sagemaker_session_bucket, "batch_transform/input")
output_s3_path = s3_path_join(
    "s3://", sagemaker_session_bucket, "batch_transform/output"
)
# upload the JSONL file to the input location
s3_file_uri = S3Uploader.upload(dataset_jsonl_file, input_s3_path)

print(f"{dataset_jsonl_file} uploaded to {s3_file_uri}")

{'inputs': 'Theta Lake is a machine learning based cyber security company with a focus on modern work collaboration tools.', 'parameters': {'return_all_scores': True, 'truncation': True, 'max_length': 512}}
{'inputs': 'Pluribus Networks is the developer of an open, virtualized and highly programmable network fabric for next generation Data Centers with simplified management and white-box economics.', 'parameters': {'return_all_scores': True, 'truncation': True, 'max_length': 512}}
{'inputs': 'Remine is the developer of a home buying platform designed to efficiently connect mortgage lenders, real estate agents, and consumers in one streamlined experience.', 'parameters': {'return_all_scores': True, 'truncation': True, 'max_length': 512}}
{'inputs': 'Minted is a lifestyle brand and developer of a design marketplace designed to connect users with the world’s best artists to create something one of a kind.', 'parameters': {'return_all_scores': True, 'truncation': True, 'max_length': 512}}
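
The printed rows above show the shape each record is expected to have: an `inputs` string plus a `parameters` object. Before uploading, it can be worth confirming that every line of the JSONL file parses and has both keys. The check below is a sketch only; `find_invalid_lines` and `REQUIRED_KEYS` are hypothetical names, not part of the SageMaker SDK, and the sample lines are made up.

```python
import json

# Keys every record is assumed to need, based on the rows printed above.
REQUIRED_KEYS = {"inputs", "parameters"}


def find_invalid_lines(lines):
    """Return (line_number, reason) tuples for lines that fail the check."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append((number, f"missing keys: {sorted(missing)}"))
    return problems


# Made-up sample lines: the first is well formed, the second lacks "parameters".
sample_lines = [
    '{"inputs": "some text", "parameters": {"return_all_scores": true}}',
    '{"inputs": "some other text"}',
]
print(find_invalid_lines(sample_lines))
```

To check the real file, the same function can be given an open file handle, for example `with open(dataset_jsonl_file) as f: find_invalid_lines(f)`, since iterating a text file yields its lines.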


### Load model from S3

In [None]:
from sagemaker.huggingface import HuggingFaceModel
from dotenv import load_dotenv
import os

load_dotenv()

model_uri = os.environ.get("MODEL_URI") # S3 URI path to trained model
role = os.environ.get("SAGEMAKER_ROLE")

transformers_version = "4.26"
pytorch_version = "1.13"
python_version = "py39"

huggingface_model = HuggingFaceModel(
    model_data=model_uri,
    role=role,
    transformers_version=transformers_version,
    pytorch_version=pytorch_version,
    py_version=python_version,
    env={"HF_TASK": "text-classification"},
)

### Batch transform job

In [2]:
# source: https://github.com/huggingface/notebooks/blob/main/sagemaker/12_batch_transform_inference/sagemaker-notebook.ipynb
# parameters source: https://discuss.huggingface.co/t/errors-while-running-a-sagemaker-batch-transform-inference-job/38598
output_s3_path = "s3://sector-classification-aiml/batch_transform/output"
s3_file_uri = "s3://sector-classification-aiml/batch_transform/input/data.jsonl"

# create Transformer to run our batch job
batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path=output_s3_path, # results go to batch_transform/output, separate from the input prefix
    strategy='SingleRecord')

# starts batch transform job and uses s3 data as input
batch_job.transform(
    data=s3_file_uri,
    content_type='application/json',
    split_type='Line')

INFO:sagemaker:Creating transform job with name: huggingface-pytorch-inference-2024-06-11-16-03-33-799


....................................
2024-06-11T16:09:48,201 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
MMS Home: /opt/conda/lib/python3.9/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 4008 M
Python executable: /opt/conda/bin/python3.9
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Model Store: /
Initial Models: model=/opt/ml/model
Log dir: null
Metrics dir: null
Netty threads: 0
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2024-06-11T16:09:48,212 [INFO ] main com.amazonaws.ml.mms.ModelServer - Loading initial models: /opt/ml/model preload_model: false
2024-06-11T16:09:48,288 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-model
2024-06-11T1