# Alternative to Amazon Lookout for Vision with SageMaker Algorithm: Computer Vision Defect Detection Model from AWS Marketplace
## Updated with SageMaker Python SDK v2

Amazon Lookout for Vision, the AWS service designed to create customized artificial intelligence and machine learning (AI/ML) computer vision models for automated quality inspection, will be discontinued on October 31, 2025. As part of this transition, the Lookout for Vision (LFV) team has published their algorithm for use within Amazon SageMaker, ensuring continuity and expanded possibilities for users.

This notebook guides you through the following steps:

1. Subscribe to the LFV-published algorithm in Amazon SageMaker
1. Train an image classification model using this algorithm with the **SageMaker Python SDK v2** `AlgorithmEstimator` class
1. Train an image segmentation model using this algorithm with the same `AlgorithmEstimator` class
1. (Optional) Run batch transform and real-time inference with the trained classification model

By following this guide, you'll be able to incorporate LFV's proven computer vision capabilities into your SageMaker workflows using modern SDK patterns. This updated notebook uses the SageMaker Python SDK v2 instead of direct boto3 calls for training and inference, which keeps the code shorter and easier to maintain.

-------------

### Install Required Packages

First, let's ensure we have the correct version of the SageMaker Python SDK v2 installed.

In [None]:
# Install an updated numexpr (quoted so the shell does not treat >= as an output redirect)
!pip install --upgrade 'numexpr>=2.8.4'

# Install or upgrade SageMaker Python SDK v2
!pip install --upgrade 'sagemaker>=2.0,<3.0'

In [None]:
# Verify installation
import sagemaker
print(f"SageMaker SDK version: {sagemaker.__version__}")

# Check if we have the required classes
try:
    from sagemaker.algorithm import AlgorithmEstimator
    from sagemaker.inputs import TrainingInput
    print("✓ Required SageMaker SDK v2 classes are available")
except ImportError as e:
    print(f"✗ Missing required classes: {e}")
    print("Please restart the kernel after installation")

### Pre-requisites

1. Note: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. Some hands-on experience using **Amazon SageMaker**.
1. **SageMaker Python SDK v2** installed (sagemaker>=2.0)
1. To use this algorithm successfully, ensure that:
   
   A. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used:
   
        a. aws-marketplace:ViewSubscriptions
        b. aws-marketplace:Unsubscribe
        c. aws-marketplace:Subscribe
   
   B. Or your AWS account already has a subscription to: [Computer Vision Defect Detection Model](https://aws.amazon.com/marketplace/pp/prodview-j72hhmlt6avp6).

### Subscribe to the algorithm

To subscribe to the algorithm:

1. Open the algorithm listing page: [Computer Vision Defect Detection Model](https://aws.amazon.com/marketplace/pp/prodview-j72hhmlt6avp6).
1. On the AWS Marketplace listing, choose **Continue to subscribe**.
1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and choose **Accept Offer** if you agree.
1. Choose **Continue to configuration** and then select a Region. You will see a **Product Arn**; this is the algorithm ARN you need to specify when training a custom ML model. Copy the ARN and paste it into the following cell.

In [None]:
# TODO: replace with the algorithm ARN (Product Arn) from your AWS Marketplace subscription
algorithm_name = "<algorithm ARN from your subscription>"
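Because the Product Arn is tied to the Region you selected during configuration, it can help to check it against the session's Region before training. The sketch below assumes the standard ARN layout `arn:aws:sagemaker:<region>:<account-id>:algorithm/<name>`; `check_algorithm_arn` is a hypothetical helper, not part of the SageMaker SDK.

```python
import re

# Assumed ARN layout: arn:<partition>:sagemaker:<region>:<12-digit account>:algorithm/<name>
ALGORITHM_ARN_PATTERN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):sagemaker:(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):algorithm/(?P<name>[A-Za-z0-9-]+)$"
)


def check_algorithm_arn(arn, session_region):
    """Return the algorithm name if `arn` looks like an algorithm ARN for `session_region`.

    Raises ValueError for a malformed ARN or a Region mismatch.
    """
    match = ALGORITHM_ARN_PATTERN.match(arn.strip())
    if match is None:
        raise ValueError(f"Not a SageMaker algorithm ARN: {arn!r}")
    arn_region = match.group("region")
    if arn_region != session_region:
        raise ValueError(
            f"Algorithm ARN is for Region {arn_region}, "
            f"but the SageMaker session uses {session_region}"
        )
    return match.group("name")
```

After the setup cell below defines `region`, you could call `check_algorithm_arn(algorithm_name, region)` to catch a placeholder or a wrong-Region ARN early.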

### Initial Setup

Set up your SageMaker environment: First, we'll import necessary libraries, set up our SageMaker session, and define key variables using SageMaker Python SDK v2.

In [None]:
import sagemaker
from sagemaker import get_execution_role
from sagemaker.algorithm import AlgorithmEstimator
from sagemaker.inputs import TrainingInput
from sagemaker.model import Model
from sagemaker.transformer import Transformer
import boto3
import json
import datetime

In [None]:
# Initialize SageMaker session and get execution role
sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
# bucket = sagemaker_session.default_bucket()  # alternatively, use the session's default bucket
bucket = "<your-s3-bucket-name>"
role = get_execution_role()

# The project name is used as a prefix in S3 paths
project = "Computer-Vision-Defect-Detection"

print(f"Region: {region}")
print(f"Bucket: {bucket}")
print(f"Role: {role}")
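`bucket` starts out as a placeholder, so a small guard can stop the notebook before an upload fails. The sketch below applies only a subset of the S3 general-purpose bucket naming rules; `check_bucket_name` is a hypothetical helper, and it does not verify that the bucket exists or that the role can write to it.

```python
import re

# Subset of the bucket naming rules: 3-63 characters; lowercase letters, digits,
# dots and hyphens; must start and end with a letter or digit.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
# Bucket names must not be formatted like an IPv4 address.
_IP_ADDRESS = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def check_bucket_name(name):
    """Return `name` unchanged, or raise ValueError if it cannot be an S3 bucket name."""
    if (not _BUCKET_NAME.match(name)
            or ".." in name
            or _IP_ADDRESS.match(name)):
        raise ValueError(f"{name!r} is not a valid S3 bucket name; set `bucket` first")
    return name
```

Calling `check_bucket_name(bucket)` right after this cell raises on the unedited placeholder and passes through a plausible name such as `my-lfv-bucket`.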

### Create IAM Role with SageMaker Permission (if needed)

If you're not running in a SageMaker environment, `get_execution_role()` cannot infer a role and you may need to create one. In SageMaker Notebook Instances and SageMaker Studio, `get_execution_role()` retrieves the notebook's execution role automatically.

In [None]:
# Only run this cell if you need to create a custom IAM role
# In most SageMaker environments, get_execution_role() is sufficient

try:
    # get_execution_role() raises when no SageMaker execution role can be inferred
    role = get_execution_role()
    print(f"Using existing execution role: {role}")
except Exception:
    print("No execution role found; creating a custom IAM role...")
    iam_client = boto3.client('iam')
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "sagemaker.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
    
    role_name = "SageMakerExecutionRole"
    try:
        response = iam_client.create_role(
            RoleName=role_name,
            AssumeRolePolicyDocument=json.dumps(trust_policy),
            Description="IAM role with full S3 and SageMaker access"
        )
        role = response['Role']['Arn']
        
        # Attach policies
        iam_client.attach_role_policy(
            RoleName=role_name,
            PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess"
        )
        iam_client.attach_role_policy(
            RoleName=role_name,
            PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
        )
        print(f"Created role: {role}")
    except iam_client.exceptions.EntityAlreadyExistsException:
        # The role already exists, so reuse it
        response = iam_client.get_role(RoleName=role_name)
        role = response['Role']['Arn']
        print(f"Using existing role: {role}")

----------------------------------
We will go through two examples, one for an image classification model and one for an image segmentation model, both using the SageMaker Python SDK v2.

## Classification Model with AlgorithmEstimator

**Prepare your classification data:**
For this step, we'll follow the data preparation guidelines in the Amazon Lookout for Vision Developer Guide. This guide uses the cookie dataset.

a. Organize your images:
Place your normal (non-defective) images under an S3 prefix named `normal`.
Place your anomalous (defective) images under an S3 prefix named `anomaly`.

b. Create a manifest file: a JSON Lines file in which each line references one image in S3 and records its classification.
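If you need to build a manifest for your own images rather than use the provided `train_class.manifest`, the sketch below writes one record per image with the three attributes this notebook's training job reads (`source-ref`, `anomaly-label`, `anomaly-label-metadata`). The metadata fields are modeled on Lookout for Vision's Ground Truth-style manifests, and the integer label mapping is an **assumption**: compare the output with `train_class.manifest` and the LFV Developer Guide before training. The helper names and the `job_name` default are hypothetical.

```python
import json

# ASSUMPTION: 1 marks an anomalous image and 0 a normal one. Verify this mapping
# against the Lookout for Vision Developer Guide and your train_class.manifest.
LABELS = {"normal": 0, "anomaly": 1}


def classification_manifest_line(image_s3_uri, class_name, creation_date,
                                 job_name="labeling-job/cookies"):
    """Build one JSON Lines record for a classification manifest.

    `job_name` is a hypothetical placeholder value.
    """
    if class_name not in LABELS:
        raise ValueError(f"class_name must be one of {sorted(LABELS)}, got {class_name!r}")
    record = {
        "source-ref": image_s3_uri,
        "anomaly-label": LABELS[class_name],
        "anomaly-label-metadata": {
            "confidence": 1,
            "job-name": job_name,
            "class-name": class_name,
            "human-annotated": "yes",
            "creation-date": creation_date,
            "type": "groundtruth/image-classification",
        },
    }
    return json.dumps(record)


def write_classification_manifest(path, labeled_uris, creation_date):
    """Write one manifest line per (s3_uri, class_name) pair to `path`."""
    with open(path, "w") as f:
        for uri, class_name in labeled_uris:
            f.write(classification_manifest_line(uri, class_name, creation_date) + "\n")
```

For example, `write_classification_manifest("my_train_class.manifest", [("s3://<bucket>/normal/1.jpg", "normal"), ("s3://<bucket>/anomaly/2.jpg", "anomaly")], datetime.datetime.now().isoformat())` writes a two-line manifest under a new file name, leaving the provided one untouched.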

In [None]:
!cat train_class.manifest

Upload the manifest file to S3 with the SageMaker session's `upload_data` method.

In [None]:
# Upload manifest file using SageMaker session
# upload_data stores the file at s3://<bucket>/<key_prefix>/<file name> and returns that URI
classification_manifest_prefix = f"{project}/manifests"
classification_s3_path = sagemaker_session.upload_data(
    path='train_class.manifest',
    bucket=bucket,
    key_prefix=classification_manifest_prefix
)
print(f"Classification manifest uploaded to: {classification_s3_path}")


**Create and train the model using AlgorithmEstimator:**
Now we'll use the SageMaker Python SDK v2's AlgorithmEstimator class to create and train our model.

In [None]:
# Create AlgorithmEstimator for classification
classification_estimator = AlgorithmEstimator(
    algorithm_arn=algorithm_name,
    role=role,
    instance_count=1,
    instance_type='ml.g4dn.2xlarge',
    volume_size=20,
    max_run=7200,
    input_mode='Pipe',  # REQUIRED: Algorithm only supports Pipe mode
    sagemaker_session=sagemaker_session,
    enable_network_isolation=True
)

# Set hyperparameters
classification_estimator.set_hyperparameters(
    ModelType='classification',
    TestInputDataAttributeNames='source-ref,anomaly-label-metadata,anomaly-label',
    TrainingInputDataAttributeNames='source-ref,anomaly-label-metadata,anomaly-label'
)

print("Classification estimator configured successfully")
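The hyperparameters repeat the attribute names that the `TrainingInput` in the next cell also lists, so the two can drift apart if one is edited without the other. One way to keep them aligned is to build both from a single list. `lfv_hyperparameters` and the constant names are hypothetical; the sketch only reproduces the hyperparameter values already used in this notebook.

```python
# Attribute names used by this notebook, in the order the manifest records list them.
CLASSIFICATION_ATTRIBUTES = ["source-ref", "anomaly-label-metadata", "anomaly-label"]
SEGMENTATION_ATTRIBUTES = CLASSIFICATION_ATTRIBUTES + [
    "anomaly-mask-ref-metadata",
    "anomaly-mask-ref",
]


def lfv_hyperparameters(model_type, attribute_names):
    """Build the hyperparameter dict with comma-joined attribute names."""
    joined = ",".join(attribute_names)
    return {
        "ModelType": model_type,
        "TestInputDataAttributeNames": joined,
        "TrainingInputDataAttributeNames": joined,
    }
```

With these, `classification_estimator.set_hyperparameters(**lfv_hyperparameters("classification", CLASSIFICATION_ATTRIBUTES))` and `TrainingInput(..., attribute_names=CLASSIFICATION_ATTRIBUTES)` read from the same list; the segmentation cells can use `SEGMENTATION_ATTRIBUTES` the same way.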

In [None]:
# Define training input using TrainingInput class
classification_training_input = TrainingInput(
    s3_data=classification_s3_path,
    s3_data_type='AugmentedManifestFile',
    attribute_names=[
        'source-ref',
        'anomaly-label-metadata', 
        'anomaly-label'
    ],
    record_wrapping='RecordIO',
    input_mode='Pipe' # Must match the estimator's input_mode
)

# Start training job
classification_job_name = f'defect-detection-classification-{datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")}'

print(f"Starting classification training job: {classification_job_name}")
classification_estimator.fit(
    inputs={'training': classification_training_input},
    job_name=classification_job_name,
    wait=True,
    logs=True
)

With `wait=True`, `fit()` blocks until the training job finishes, and `logs=True` streams the training logs into the notebook so you can monitor progress.
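If you start a job with `wait=False`, or come back to it after a kernel restart, you can check its status through the `DescribeTrainingJob` API. A minimal sketch, assuming a boto3 SageMaker client such as `sagemaker_session.sagemaker_client`; `summarize_training_job` is a hypothetical helper that keeps only a few fields of the response.

```python
def summarize_training_job(sm_client, job_name):
    """Return the main status fields of a SageMaker training job as a small dict."""
    desc = sm_client.describe_training_job(TrainingJobName=job_name)
    return {
        "status": desc["TrainingJobStatus"],  # InProgress | Completed | Failed | Stopping | Stopped
        "secondary_status": desc.get("SecondaryStatus"),
        "failure_reason": desc.get("FailureReason"),  # only present for failed jobs
        "model_artifacts": desc.get("ModelArtifacts", {}).get("S3ModelArtifacts"),
        "billable_seconds": desc.get("BillableTimeInSeconds"),
    }
```

For example, `summarize_training_job(sagemaker_session.sagemaker_client, classification_job_name)` reports where the trained model artifact was written once the job has completed.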

******************

## Segmentation Model with AlgorithmEstimator

**Prepare your segmentation data:**
For segmentation, each manifest line references both an image and its corresponding segmentation mask.

In [None]:
!cat train_segmentation.manifest
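Before uploading, a quick local check can confirm that every manifest line is valid JSON and carries the attributes the training job is configured to read. This is a sketch: `validate_manifest` is a hypothetical helper, and treating every listed attribute as required on every line is an assumption; adjust it if your dataset legitimately omits mask attributes for some images.

```python
import json

# Attribute names used by the segmentation training job in this notebook.
SEGMENTATION_ATTRIBUTES = [
    "source-ref",
    "anomaly-label-metadata",
    "anomaly-label",
    "anomaly-mask-ref-metadata",
    "anomaly-mask-ref",
]


def validate_manifest(lines, required_attributes):
    """Return (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        missing = [name for name in required_attributes if name not in record]
        if missing:
            problems.append((number, f"missing attributes: {missing}"))
        # Image and mask references are expected to be S3 URIs
        for key in ("source-ref", "anomaly-mask-ref"):
            if key in record and not str(record[key]).startswith("s3://"):
                problems.append((number, f"{key} is not an s3:// URI"))
    return problems
```

For example, `with open('train_segmentation.manifest') as f: print(validate_manifest(f, SEGMENTATION_ATTRIBUTES))` prints an empty list when no problems are found.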

In [None]:
seg_manifest_key = f"{project}/manifests"  # key prefix; upload_data appends the file name

# Upload segmentation manifest file using SageMaker session
segmentation_s3_path = sagemaker_session.upload_data(
    path='train_segmentation.manifest',
    bucket=bucket,
    key_prefix=seg_manifest_key
)
print(f"Segmentation manifest uploaded to: {segmentation_s3_path}")


**Create and train segmentation model using AlgorithmEstimator:**

In [None]:
# Create AlgorithmEstimator for segmentation
segmentation_estimator = AlgorithmEstimator(
    algorithm_arn=algorithm_name,
    role=role,
    instance_count=1,
    instance_type='ml.g4dn.2xlarge',
    volume_size=20,
    max_run=7200,
    input_mode='Pipe',  # REQUIRED: Algorithm only supports Pipe mode
    sagemaker_session=sagemaker_session,
    enable_network_isolation=True
)

# Set hyperparameters for segmentation
segmentation_estimator.set_hyperparameters(
    ModelType='segmentation',
    TestInputDataAttributeNames='source-ref,anomaly-label-metadata,anomaly-label,anomaly-mask-ref-metadata,anomaly-mask-ref',
    TrainingInputDataAttributeNames='source-ref,anomaly-label-metadata,anomaly-label,anomaly-mask-ref-metadata,anomaly-mask-ref'
)

print("Segmentation estimator configured successfully")

In [None]:
# Define training input for segmentation
segmentation_training_input = TrainingInput(
    s3_data=segmentation_s3_path,
    s3_data_type='AugmentedManifestFile',
    attribute_names=[
        'source-ref',
        'anomaly-label-metadata',
        'anomaly-label',
        'anomaly-mask-ref-metadata',
        'anomaly-mask-ref'
    ],
    record_wrapping='RecordIO',
    input_mode='Pipe'
)

# Start segmentation training job
segmentation_job_name = f'defect-detection-segmentation-{datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")}'

print(f"Starting segmentation training job: {segmentation_job_name}")
segmentation_estimator.fit(
    inputs={'training': segmentation_training_input},
    job_name=segmentation_job_name,
    wait=True,
    logs=True
)

***********

### (Optional) Run Batch Transform Inference using SageMaker SDK v2

We'll use the classification model to demonstrate inference using SageMaker Python SDK v2 patterns.

In [None]:
# Attach to the classification training job trained above
# (if you restarted the kernel, set classification_job_name to that job's name first)
estimator = AlgorithmEstimator.attach(classification_job_name)

In [None]:
#############################################
# Change to your input/output data S3 path  #
#############################################

s3_input_data = "s3://<Specify-s3-path-to-test-images>"
s3_output_path = f"s3://{bucket}/{project}/batch-transform-output"

print(f"Input data: {s3_input_data}")
print(f"Output path: {s3_output_path}")

In [None]:
# Create a transformer from the trained estimator; results are written to s3_output_path
transformer = estimator.transformer(
    instance_count=1,
    instance_type='ml.c5.2xlarge',
    output_path=s3_output_path
)

In [None]:
transformer.transform(
    data=s3_input_data,
    content_type='image/jpeg',
    wait=False
)

# Wait with status updates
transformer.wait(logs=True)


print(f"Batch transform job completed. Results available at: {s3_output_path}")

After the batch transform job completes, check the S3 output path for results. Each input image has a corresponding `.out` file containing its prediction in JSON format, for example:

```json
{"Source": {"Type": "direct"}, "IsAnomalous": true, "Confidence": 0.9378743361326908}
```
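To review many results at once, you can read every `.out` object under the output prefix and collect the `IsAnomalous` and `Confidence` fields shown above. A sketch, assuming a boto3 S3 client and that each `.out` object holds a single JSON document like the example; `summarize_batch_results` is a hypothetical helper.

```python
import json


def summarize_batch_results(s3_client, bucket, prefix):
    """Collect IsAnomalous/Confidence from every `.out` object under `prefix`."""
    results = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".out"):
                continue  # skip anything that is not a per-image result file
            body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
            prediction = json.loads(body)
            results.append({
                "key": key,
                "is_anomalous": prediction.get("IsAnomalous"),
                "confidence": prediction.get("Confidence"),
            })
    return results
```

For the output path used above, `summarize_batch_results(boto3.client("s3"), bucket, f"{project}/batch-transform-output/")` returns one entry per image, which you can filter on `is_anomalous` to count flagged images.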



### (Optional) Run Real-Time Inference using Amazon SageMaker Endpoints

In [None]:

# Reuse the classification training job from above, or replace it with the name
# of another completed training job, e.g. "<provide training job name here>"
classification_training_job_name = classification_job_name

# Create estimator from training job
estimator = AlgorithmEstimator.attach(classification_training_job_name)

# Deploy endpoint using SageMaker v2 SDK
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.2xlarge'
)

print(f"Endpoint deployed: {predictor.endpoint_name}")


### Invoke the Endpoint

In [None]:
# Invoke the endpoint with a test image from S3
import json

import boto3
from sagemaker.deserializers import JSONDeserializer
from sagemaker.serializers import IdentitySerializer

# S3 bucket and prefix that contain your test images
s3_bucket = "[s3 bucket]"
s3_prefix = "[s3 prefix]"

# Initialize S3 client
s3 = boto3.client('s3')

# List objects under the prefix to find an image
response = s3.list_objects_v2(
    Bucket=s3_bucket,
    Prefix=s3_prefix,
    MaxKeys=10
)

# Find the first image file
image_key = None
for obj in response.get('Contents', []):
    if obj['Key'].lower().endswith(('.jpg', '.jpeg', '.png')):
        image_key = obj['Key']
        break

if not image_key:
    print("No image files found under the S3 prefix.")
else:
    print(f"Using image: s3://{s3_bucket}/{image_key}")

    # Read the image bytes directly from S3 (no temporary file needed)
    image_data = s3.get_object(Bucket=s3_bucket, Key=image_key)['Body'].read()

    # By default the predictor returned by deploy() sends application/octet-stream
    # and returns raw bytes, which json.dumps cannot serialize. Send the image with
    # an image content type and parse the response as JSON instead.
    # Assumption: the endpoint accepts image/png as well as image/jpeg.
    content_type = 'image/png' if image_key.lower().endswith('.png') else 'image/jpeg'
    predictor.serializer = IdentitySerializer(content_type=content_type)
    predictor.deserializer = JSONDeserializer()
    result = predictor.predict(image_data)

    # Print the result
    print("\nEndpoint Response:")
    print(json.dumps(result, indent=2))


In [None]:
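# Delete the model and the endpoint to stop incurring charges
# (the model is deleted first because delete_model() looks it up through the endpoint)
predictor.delete_model()
predictor.delete_endpoint()
print("Model and endpoint deleted")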
