# Technical Support Ticket Triage Using SageMaker Neo and an LLM

This notebook demonstrates how to compile a large language model (LLM) with Amazon SageMaker Neo and deploy the compiled model on-device. The use case is technical support ticket triage: manual categorization is slow, so the model automates it by classifying each ticket into a category such as **Billing** or **Technical**.

## Setup

We begin by importing the necessary libraries and setting up environment variables. Make sure to update the S3 bucket name and other variables as needed for your AWS environment.

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role
import os

# Set the AWS region
region = boto3.Session().region_name

# Get the SageMaker execution role
role = get_execution_role()

# Specify your S3 bucket name (update this value)
bucket = 'your-s3-bucket-name'  # Replace with your S3 bucket name

# Define S3 paths for the pre-trained model artifact and for compiled output
model_data = f's3://{bucket}/models/ticket_classifier.tar.gz'  # Pre-trained model artifact
compiled_output_path = f's3://{bucket}/compiled-models/'

print(f'AWS Region: {region}')
print(f'Model Data: {model_data}')
print(f'Compiled Model Output Path: {compiled_output_path}')
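
Because the S3 paths above are built from a placeholder bucket name, it can help to validate them before starting a compilation job. The following is a minimal sketch; `split_s3_uri` and `check_s3_uri` are hypothetical helper names, not part of boto3 or the SageMaker SDK.

```python
from urllib.parse import urlparse

PLACEHOLDER_BUCKET = "your-s3-bucket-name"  # placeholder value used in the setup cell


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key), rejecting malformed URIs."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def check_s3_uri(uri):
    """Return (bucket, key), failing early if the placeholder bucket was never replaced."""
    bucket_name, key = split_s3_uri(uri)
    if bucket_name == PLACEHOLDER_BUCKET:
        raise ValueError("Update `bucket` before running the rest of the notebook.")
    return bucket_name, key
```

Calling `check_s3_uri(model_data)` right after the setup cell would surface a forgotten placeholder immediately rather than as a failed compilation job.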

## Model Compilation with SageMaker Neo

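In this section, we define a SageMaker model and compile it with SageMaker Neo. The compilation step optimizes the model for a specific target, set through `target_instance_family`. The example uses `ml_c5`, a cloud CPU instance family; for on-device deployment, choose the value that matches your hardware (for example, `jetson_nano` or `rasp3b`).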

For a text classification model, the input shape might represent token IDs with a fixed sequence length. Adjust the `input_shape` parameter as required by your model.
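
A fixed input shape means every request has to be padded or truncated to the same length before inference. The sketch below shows one way to do that in plain Python for the `[1, 128]` shape used in this notebook. The padding ID of `0` and the sample token IDs are assumptions; use the values from your own tokenizer.

```python
SEQ_LEN = 128  # sequence length from the example input_shape
PAD_ID = 0     # assumed padding token ID; use your tokenizer's value


def to_fixed_length(token_ids, seq_len=SEQ_LEN, pad_id=PAD_ID):
    """Truncate or right-pad token IDs so the result has exactly seq_len entries."""
    clipped = list(token_ids)[:seq_len]
    return clipped + [pad_id] * (seq_len - len(clipped))


def to_batch(token_ids):
    """Wrap one sequence in a batch dimension, giving shape [1, SEQ_LEN]."""
    return {"input_ids": [to_fixed_length(token_ids)]}


batch = to_batch([101, 2026, 4274, 2003, 2091, 102])  # made-up token IDs
print(len(batch["input_ids"]), len(batch["input_ids"][0]))  # prints: 1 128
```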

In [None]:
from sagemaker.model import Model

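# Create a SageMaker Model object using the pre-trained model artifact.
# Framework details are passed to compile() rather than to the generic Model constructor.
model = Model(
    model_data=model_data,
    role=role,
    sagemaker_session=sagemaker.Session()
)

# Compile the model for the target instance family. The input shape here is an example for a text model.
# Assumption: the artifact is a PyTorch model; set framework and framework_version to match yours.
compiled_model = model.compile(
    target_instance_family='ml_c5',
    input_shape={'input_ids': [1, 128]},  # Example: batch size 1, sequence length 128
    output_path=compiled_output_path,
    role=role,
    framework='pytorch',
    framework_version='1.9.0'  # Example framework version; adjust to one Neo supports
)

# compile() waits for the compilation job to finish before returning
print(f'Compilation job finished. Compiled artifacts are in: {compiled_output_path}')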
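
Besides the console, a compilation job can be inspected programmatically. The sketch below wraps the SageMaker `describe_compilation_job` call and extracts the status, the artifact location, and any failure reason. `summarize_compilation_job` is a hypothetical helper name, and the job name in the usage comment is a placeholder.

```python
def summarize_compilation_job(sm_client, job_name):
    """Return status, artifact location, and failure reason for a Neo compilation job.

    sm_client is expected to behave like boto3.client('sagemaker').
    """
    desc = sm_client.describe_compilation_job(CompilationJobName=job_name)
    return {
        "status": desc["CompilationJobStatus"],
        "artifact": desc.get("ModelArtifacts", {}).get("S3ModelArtifacts"),
        "failure_reason": desc.get("FailureReason"),
    }


# Usage (requires AWS credentials; the job name is a placeholder):
# import boto3
# summary = summarize_compilation_job(boto3.client("sagemaker"), "my-compilation-job")
# print(summary)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when no AWS credentials are available.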

## Deploying the Compiled Model On-Device

After the model is compiled, the optimized artifact is stored in S3. In a real-world scenario, you would download this artifact and integrate it with the runtime environment on your target edge device. The following cell lists the compiled model artifacts in S3.

In [None]:
s3 = boto3.client('s3')

# List objects in the compiled model output path
compiled_bucket = bucket
compiled_prefix = 'compiled-models/'  # Adjust if necessary

response = s3.list_objects_v2(Bucket=compiled_bucket, Prefix=compiled_prefix)

if 'Contents' in response:
    print('Compiled model artifacts:')
    for obj in response['Contents']:
        print(obj['Key'])
else:
    print('No compiled model artifacts found.')
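
Once the artifacts are listed, they still need to be downloaded and unpacked before a device runtime can load them. The following is a minimal sketch of that step, assuming the artifacts are `.tar.gz` archives under the prefix used above. The helper names are hypothetical, and `s3_client` is expected to behave like `boto3.client('s3')`.

```python
import os
import tarfile


def download_compiled_artifacts(s3_client, bucket_name, prefix, dest_dir):
    """Download every .tar.gz object under prefix and return the local file paths.

    Note: a single list_objects_v2 call returns at most 1000 keys.
    """
    os.makedirs(dest_dir, exist_ok=True)
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=prefix)
    local_paths = []
    for obj in response.get("Contents", []):
        key = obj["Key"]
        if not key.endswith(".tar.gz"):
            continue  # skip anything that is not a model archive
        local_path = os.path.join(dest_dir, os.path.basename(key))
        s3_client.download_file(bucket_name, key, local_path)
        local_paths.append(local_path)
    return local_paths


def extract_artifact(archive_path, dest_dir):
    """Unpack a .tar.gz artifact into dest_dir and return the member names."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Refuse entries that would be written outside dest_dir
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        archive.extractall(dest_root)
        return archive.getnames()
```

With the variables from the cell above, a call such as `download_compiled_artifacts(s3, compiled_bucket, compiled_prefix, 'compiled_model_artifact')` would fetch the archives into a local directory.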

## On-Device Inference Simulation

The following code simulates on-device inference with the compiled model artifact; no real model is loaded and the scores are hard-coded. In production, the artifact would be loaded by a device-specific runtime (for Neo-compiled models, this is typically the open-source DLR runtime).

For our technical support ticket triage use case, the model processes ticket text and classifies it into categories (e.g., **Billing** or **Technical**).

In [None]:
# Pseudo-code for on-device inference

def load_compiled_model(model_path):
    # In practice, load the compiled model using the device-specific runtime.
    # For a Neo-compiled artifact this is typically DLR, e.g. dlr.DLRModel(model_path, 'cpu').
    print(f'Loading compiled model from: {model_path}')
    # Return a model object (this is a placeholder for the actual model loader)
    return None

def preprocess_ticket(ticket_text):
    # Convert the ticket text into the input format required by the model (e.g., tokenization, padding).
    # In practice, use the tokenizer associated with your LLM.
    return {'input_ids': [[0] * 128]}  # Dummy tokenized input matching input_shape [1, 128]

def infer(model, processed_input):
    # Run inference on the processed input using the loaded model.
    # This function simulates inference and returns dummy probabilities for each category.
    simulated_output = {'Billing': 0.2, 'Technical': 0.8}
    return simulated_output

# Assume the compiled model artifact has been downloaded to a local directory (update this path as needed)
local_compiled_model_path = 'compiled_model_artifact'  # Replace with the actual local path

# Load the compiled model (simulation)
compiled_model_obj = load_compiled_model(local_compiled_model_path)

# Example support ticket text
ticket_text = 'My internet is down and I cannot connect. Please help.'

# Preprocess the ticket text
processed_input = preprocess_ticket(ticket_text)

# Perform inference (simulation)
inference_result = infer(compiled_model_obj, processed_input)

print('Inference Result:')
print(inference_result)

# Determine the predicted category based on the highest score
predicted_category = max(inference_result, key=inference_result.get)
print(f'Predicted Ticket Category: {predicted_category}')
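
The simulation above returns ready-made probabilities, whereas a real classifier usually emits raw scores (logits). The sketch below shows one way to turn logits into a category using a numerically stable softmax. The `min_confidence` threshold and its value are assumptions added for illustration; a ticket below the threshold gets no label, so it could be left for manual review.

```python
import math

LABELS = ["Billing", "Technical"]  # categories from this notebook


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels=LABELS, min_confidence=0.6):
    """Return (label, probability); label is None when confidence is too low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = labels[best] if probs[best] >= min_confidence else None
    return label, probs[best]


label, confidence = classify([0.3, 1.7])  # made-up logits
print(label, round(confidence, 2))
```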

## Conclusion

In this notebook, we demonstrated how to compile an LLM for technical support ticket triage using Amazon SageMaker Neo. The compiled model is optimized for the chosen target hardware, which can reduce inference latency on edge devices. This approach can help automate ticket categorization, reduce manual effort, and improve response times.

For production use, integrate the compiled artifact with your device-specific runtime environment to achieve real-time inference on edge devices.