# SageMaker Training for DDA (Defect Detection Application)

This notebook demonstrates how to train and compile computer vision models for defect detection using Amazon SageMaker. The workflow includes:
1. Training classification and segmentation models
2. Compiling models for different target devices (x86_64, ARM64, Jetson Xavier)
3. Preparing models for deployment with the DDA edge application

## Prerequisites

Before running this notebook, ensure you have:

1. **Environment**: Open this notebook in an Amazon SageMaker notebook instance or SageMaker Studio
2. **SageMaker Experience**: Basic familiarity with Amazon SageMaker
3. **Marketplace Access**: Either:
   - IAM permissions for AWS Marketplace operations:
     - `aws-marketplace:ViewSubscriptions`
     - `aws-marketplace:Unsubscribe`
     - `aws-marketplace:Subscribe`
   - OR existing subscription to [Computer Vision Defect Detection Model](https://aws.amazon.com/marketplace/pp/prodview-j72hhmlt6avp6)

## Step 1: Subscribe to Algorithm

To use the defect detection algorithm:

1. Visit: [Computer Vision Defect Detection Model](https://aws.amazon.com/marketplace/pp/prodview-j72hhmlt6avp6)
2. Click **Continue to subscribe**
3. Review and **Accept Offer** (EULA, pricing, support terms)
4. Click **Continue to configuration** and select your region
5. Copy the **Product ARN** and paste it below

In [None]:
# Replace with your subscribed algorithm ARN from AWS Marketplace
algorithm_name = "<Customer to specify the algorithm name after subscription>"
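
Pasting the wrong value here is an easy mistake, so a quick format check before continuing can save a failed training call. The sketch below assumes the Product ARN follows the usual SageMaker algorithm ARN layout (`arn:aws:sagemaker:<region>:<account-id>:algorithm/<name>`). `looks_like_algorithm_arn` is a hypothetical helper, not part of any SDK, and the sample ARN is made up.

```python
import re

# Assumed layout of a SageMaker algorithm ARN; adjust if your ARN differs.
_ALGORITHM_ARN_PATTERN = re.compile(
    r"^arn:aws[a-z-]*:sagemaker:[a-z0-9-]+:\d{12}:algorithm/[a-zA-Z0-9-]+$"
)


def looks_like_algorithm_arn(value: str) -> bool:
    """Return True if value has the shape of a SageMaker algorithm ARN."""
    return bool(_ALGORITHM_ARN_PATTERN.match(value))


# Made-up values for illustration only.
print(looks_like_algorithm_arn(
    "arn:aws:sagemaker:us-east-1:123456789012:algorithm/example-defect-detection"
))
print(looks_like_algorithm_arn(
    "<Customer to specify the algorithm name after subscription>"
))
```

If the check returns `False` for the value you pasted, the placeholder was probably left in place or the ARN was truncated when copied.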

## Step 2: Environment Setup

In [None]:
# Import required libraries
import boto3
import sagemaker
import json
import datetime
import time

In [None]:
# Initialize SageMaker session and get default configurations
session = sagemaker.Session()
region = session.boto_region_name
bucket = session.default_bucket()

# Project identifier for S3 output paths
project = "LFV-public-test"

print(f"Region: {region}")
print(f"S3 Bucket: {bucket}")
print(f"Project: {project}")

## Step 3: Setup Sample Cookie Dataset (Optional)

**For Testing Only:** Download and prepare the sample cookie dataset from Lookout for Vision.

**Skip this entire section if you have:**
- Your own labeled training images
- Custom manifest files pointing to your S3 data
- Data from SageMaker Ground Truth labeling jobs

In [None]:
# Setup S3 paths and create project structure
s3_client = boto3.client('s3')
s3_uri = f"s3://{bucket}/{project}/"

# Create S3 folder structure
folders = ['', 'output/', 'compilation_output/']
for folder in folders:
    s3_client.put_object(Bucket=bucket, Key=f"{project}/{folder}")

# Define paths for later use
output_path = f's3://{bucket}/{project}/output'
compilation_output_path = f's3://{bucket}/{project}/compilation_output'

print(f"✅ S3 structure created: {s3_uri}")

In [None]:
# Download cookie dataset from GitHub
import os  # needed for os.listdir below

!git clone --depth 1 https://github.com/aws-samples/amazon-lookout-for-vision.git
!cp -r amazon-lookout-for-vision/computer-vision-defect-detection/cookie-dataset ./
!rm -rf amazon-lookout-for-vision

print(f"✅ Cookie dataset downloaded ({len(os.listdir('cookie-dataset/dataset-files/training-images'))} images)")

In [None]:
# Process and upload dataset to S3
import os
import shutil

# Copy dataset files locally
if os.path.exists('dataset-files'):
    shutil.rmtree('dataset-files')
shutil.copytree('cookie-dataset/dataset-files', 'dataset-files')

# Copy mask file
if os.path.exists('cookie-dataset/dummy_anomaly_mask.png'):
    shutil.copy2('cookie-dataset/dummy_anomaly_mask.png', 'dataset-files/mask-images/')

# Upload dataset to S3 using getting_started.py script
!python cookie-dataset/getting_started.py {s3_uri}

# Set manifest URIs
training_manifest_s3_uri = f"{s3_uri}manifests/train.manifest"

print(f"✅ Dataset uploaded to S3: {training_manifest_s3_uri}")

In [None]:
# Download and process segmentation manifest
import json

# Download segmentation manifest from GitHub
!wget -q https://raw.githubusercontent.com/aws-samples/amazon-lookout-for-vision/d4002d64b1ba395d332b994a0c268342ac62b1ed/computer-vision-defect-detection/train_segmentation.manifest

# Update manifest with current S3 bucket
def update_manifest_paths(manifest_file, old_prefix, new_prefix):
    updated_lines = []
    with open(manifest_file, 'r') as f:
        for line in f:
            data = json.loads(line.strip())
            for key in ['source-ref', 'anomaly-mask-ref']:
                if key in data and data[key].startswith(old_prefix):
                    data[key] = data[key].replace(old_prefix, new_prefix)
            updated_lines.append(json.dumps(data))
    return updated_lines

# Process manifest
old_prefix = 's3://lookoutvision-us-east-1-0e205be246/getting-started/'
segmentation_lines = update_manifest_paths('train_segmentation.manifest', old_prefix, s3_uri)

# Save updated manifest
seg_manifest_path = 'dataset-files/manifests/train_segmentation.manifest'
with open(seg_manifest_path, 'w') as f:
    f.write('\n'.join(segmentation_lines))

# Upload to S3
s3_key = f"{project}/manifests/train_segmentation.manifest"
s3_client.upload_file(seg_manifest_path, bucket, s3_key)
segmentation_manifest_s3_uri = f"s3://{bucket}/{s3_key}"

# Cleanup
os.remove('train_segmentation.manifest')

print(f"✅ Segmentation manifest created: {segmentation_manifest_s3_uri}")

In [None]:
# Summary of created resources
print("🎉 Sample Dataset Setup Complete:")
print(f"📁 Training images: {s3_uri}training-images/")
print(f"🎭 Mask images: {s3_uri}mask-images/")
print(f"📋 Classification manifest: {training_manifest_s3_uri}")
print(f"🔍 Segmentation manifest: {segmentation_manifest_s3_uri}")
print(f"📊 Output path: {output_path}")
print(f"⚙️ Compilation path: {compilation_output_path}")

## Step 4: Get SageMaker Execution Role

Get the current SageMaker execution role for training jobs.

In [None]:
# Get the current execution role
sm_role_arn = sagemaker.get_execution_role()
print(f"Current SageMaker execution role ARN: {sm_role_arn}")

# Now you can use sm_role_arn in your SageMaker operations

## Step 5: Train Classification Model

Start a training job for binary classification (normal vs. anomaly).

In [None]:
# Initialize SageMaker client and create unique job name
sagemaker_client = boto3.Session(region_name=region).client("sagemaker")
classification_training_job_name = 'LFV-classification-' + datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')

print(f"Classification training job: {classification_training_job_name}")
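
Every training and compilation job in this notebook builds its name the same way: a prefix plus a timestamp. A small helper can centralize that and reject names SageMaker would refuse. This is a sketch; `make_job_name` is a hypothetical helper, and the check assumes the documented job-name rule of at most 63 alphanumeric characters and hyphens, starting and ending with an alphanumeric character.

```python
import datetime
import re

# Assumed SageMaker job-name rule: 1-63 chars, alphanumerics and hyphens,
# beginning and ending with an alphanumeric character.
_JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def make_job_name(prefix: str, now: datetime.datetime = None) -> str:
    """Return '<prefix>-<YYYY-MM-DD-HH-MM-SS>', validated against the name rule."""
    now = now or datetime.datetime.now()
    name = f"{prefix}-{now.strftime('%Y-%m-%d-%H-%M-%S')}"
    if not _JOB_NAME_PATTERN.match(name):
        raise ValueError(f"Invalid SageMaker job name: {name}")
    return name


print(make_job_name("LFV-classification"))
```

Passing an explicit `now` makes the name reproducible, which helps when the same timestamp should tag a training job and its later compilation jobs.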

**Model Type Options:**
- `classification`: Standard classification model
- `classification-robust`: Enhanced model with improved robustness

**Data Attributes:**
- `source-ref`: Image file location
- `anomaly-label-metadata`: Label metadata
- `anomaly-label`: Binary classification label (normal/anomaly)
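
Because the training job fails late if a manifest line lacks one of these attributes, it can be worth checking a manifest locally first. The sketch below assumes the manifest is JSON Lines (one JSON object per line), as the manifest-processing code earlier in this notebook implies. `missing_attributes` is a hypothetical helper, and the sample bucket, key, and metadata values are made up.

```python
import json

CLASSIFICATION_ATTRIBUTES = ("source-ref", "anomaly-label-metadata", "anomaly-label")


def missing_attributes(manifest_line: str, required=CLASSIFICATION_ATTRIBUTES) -> list:
    """Return the required attribute names that are absent from one manifest line."""
    record = json.loads(manifest_line)
    return [name for name in required if name not in record]


# Illustrative line only: bucket, key, and metadata contents are invented.
sample_line = json.dumps({
    "source-ref": "s3://example-bucket/training-images/img-001.jpg",
    "anomaly-label-metadata": {"class-name": "normal"},
    "anomaly-label": 0,
})

print(missing_attributes(sample_line))
print(missing_attributes('{"source-ref": "s3://example-bucket/img-002.jpg"}'))
```

For segmentation manifests, the same function can be called with the two mask attributes appended to `required`.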

In [None]:
# Create classification training job
response = sagemaker_client.create_training_job(
    TrainingJobName=classification_training_job_name,
    
    # Model configuration
    HyperParameters={
        'ModelType': 'classification',  # Use 'classification-robust' for enhanced model
        'TestInputDataAttributeNames': 'source-ref,anomaly-label-metadata,anomaly-label',
        'TrainingInputDataAttributeNames': 'source-ref,anomaly-label-metadata,anomaly-label'
    },
    
    # Algorithm specification
    AlgorithmSpecification={
        'AlgorithmName': algorithm_name,
        'TrainingInputMode': 'File',
        'EnableSageMakerMetricsTimeSeries': False
    },
    
    # IAM role for training
    RoleArn=sm_role_arn,
    
    # Training data configuration
    InputDataConfig=[
        {
            'ChannelName': 'training',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'AugmentedManifestFile',
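                    # Sample manifest location; replace with your own manifest URI (for example, training_manifest_s3_uri from Step 3)
                    'S3Uri': 's3://lookoutvision-us-east-1-0e205be246/getting-started/manifests/train_class.manifest',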
                    'S3DataDistributionType': 'ShardedByS3Key',
                    'AttributeNames': [
                        'source-ref',
                        'anomaly-label-metadata',
                        'anomaly-label'
                    ],
                }
            },
            'CompressionType': 'None',
            'RecordWrapperType': 'RecordIO',
            'InputMode': 'Pipe'
        },
    ],
    
    # Output configuration
    OutputDataConfig={'S3OutputPath': f's3://{bucket}/{project}/output'},
    
    # Compute resources
    ResourceConfig={
        'InstanceType': 'ml.g4dn.2xlarge',  # GPU instance for faster training
        'InstanceCount': 1,
        'VolumeSizeInGB': 20
    },
    
    # Training time limit (2 hours)
    StoppingCondition={
        'MaxRuntimeInSeconds': 7200
    },

    # Enable network isolation for security
    EnableNetworkIsolation=True
)

print("Classification training job started successfully")

In [None]:
# Monitor classification training progress
print("Monitoring classification training progress...")
print("Status: ", end="")

while True:
    training_response = sagemaker_client.describe_training_job(
        TrainingJobName=classification_training_job_name
    )
    
    status = training_response['TrainingJobStatus']
    
    if status == 'InProgress':
        print(".", end='')
    elif status == 'Completed':
        print("\nClassification training completed successfully!")
        break
    elif status == 'Failed':
        print("\nClassification training failed!")
        print(f"Failure reason: {training_response.get('FailureReason', 'Unknown')}")
        break
    elif status == 'Stopped':
        print("\nClassification training was stopped before completion.")
        break
    else:
        print("?", end='')
    
    time.sleep(60)  # Check every minute
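
The same polling pattern recurs for every training and compilation job in this notebook, so it can be factored into one function. The sketch below is one possible shape, not part of the SageMaker SDK: `wait_for_job` is a hypothetical helper that takes any zero-argument `describe` callable, and the demonstration uses a simulated job rather than a real API call.

```python
import time


def wait_for_job(describe, status_key, terminal_statuses, poll_seconds=60, max_polls=240):
    """Call describe() until the status is terminal or max_polls is exhausted.

    Returns the last describe() response; raises TimeoutError otherwise.
    """
    for _ in range(max_polls):
        response = describe()
        if response[status_key] in terminal_statuses:
            return response
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job did not reach a terminal status after {max_polls} polls")


# Simulated job for demonstration: two in-progress polls, then completion.
_fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final = wait_for_job(
    describe=lambda: {"TrainingJobStatus": next(_fake_statuses)},
    status_key="TrainingJobStatus",
    terminal_statuses={"Completed", "Failed", "Stopped"},
    poll_seconds=0,
)
print(final["TrainingJobStatus"])
```

With a real job, `describe` would wrap `sagemaker_client.describe_training_job(TrainingJobName=...)`. For compilation jobs, the status key is `CompilationJobStatus` and the statuses are uppercase.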

---

## Step 6: Train Segmentation Model

Start a training job for pixel-level segmentation, which identifies exact defect locations.

In [None]:
# Create unique segmentation training job name
segmentation_training_job_name = 'LFV-segmentation-' + datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
print(f"Segmentation training job: {segmentation_training_job_name}")

In [None]:
# Create segmentation training job
response = sagemaker_client.create_training_job(
    TrainingJobName=segmentation_training_job_name,
    
    # Model configuration for segmentation
    HyperParameters={
        'ModelType': 'segmentation',  # Use 'segmentation-robust' for enhanced model
        'TestInputDataAttributeNames': 'source-ref,anomaly-label-metadata,anomaly-label,anomaly-mask-ref-metadata,anomaly-mask-ref',
        'TrainingInputDataAttributeNames': 'source-ref,anomaly-label-metadata,anomaly-label,anomaly-mask-ref-metadata,anomaly-mask-ref'
        # Optional: Add 'classification_logic': 'seg_head' to use segmentation head only
    },
    
    # Algorithm specification
    AlgorithmSpecification={
        'AlgorithmName': algorithm_name,
        'TrainingInputMode': 'File',
        'EnableSageMakerMetricsTimeSeries': False
    },
    
    # IAM role for training
    RoleArn=sm_role_arn,
    
    # Training data with mask annotations
    InputDataConfig=[
        {
            'ChannelName': 'training',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'AugmentedManifestFile',
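                    # Sample manifest location; replace with your own manifest URI (for example, segmentation_manifest_s3_uri from Step 3)
                    'S3Uri': 's3://lookoutvision-us-east-1-0e205be246/getting-started/manifests/train_segmentation.manifest',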
                    'S3DataDistributionType': 'ShardedByS3Key',
                    'AttributeNames': [
                        'source-ref',
                        'anomaly-label-metadata',
                        'anomaly-label',
                        'anomaly-mask-ref-metadata',  # Segmentation mask metadata
                        'anomaly-mask-ref'            # Segmentation mask file
                    ],
                }
            },
            'CompressionType': 'None',
            'RecordWrapperType': 'RecordIO',
            'InputMode': 'Pipe'
        },
    ],
    
    # Output configuration
    OutputDataConfig={'S3OutputPath': f's3://{bucket}/{project}/output'},
    
    # Compute resources
    ResourceConfig={
        'InstanceType': 'ml.g4dn.2xlarge',
        'InstanceCount': 1,
        'VolumeSizeInGB': 20
    },
    
    # Training time limit
    StoppingCondition={
        'MaxRuntimeInSeconds': 7200
    },

    # Enable network isolation for security
    EnableNetworkIsolation=True
)

print("Segmentation training job started successfully")

In [None]:
# Monitor segmentation training progress
print("Monitoring segmentation training progress...")
print("Status: ", end="")

while True:
    training_response = sagemaker_client.describe_training_job(
        TrainingJobName=segmentation_training_job_name
    )
    
    status = training_response['TrainingJobStatus']
    
    if status == 'InProgress':
        print(".", end='')
    elif status == 'Completed':
        print("\nSegmentation training completed successfully!")
        break
    elif status == 'Failed':
        print("\nSegmentation training failed!")
        print(f"Failure reason: {training_response.get('FailureReason', 'Unknown')}")
        break
    elif status == 'Stopped':
        print("\nSegmentation training was stopped before completion.")
        break
    else:
        print("?", end='')
    
    time.sleep(60)

**Segmentation Model Options:**

1. **Segmentation Head Only:**
   ```python
   HyperParameters={
       'ModelType': 'segmentation',
       'classification_logic': 'seg_head',
       # ... other parameters
   }
   ```

2. **Robust Segmentation Model:**
   ```python
   HyperParameters={
       'ModelType': 'segmentation-robust',
       # ... other parameters
   }
   ```
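
The model variants in this notebook differ only in a few hyperparameter values, so they can be assembled by one small function. This is a sketch: `build_hyperparameters` is a hypothetical helper, and the key names and values are copied from the training cells in this notebook. Whether `seg_head` can be combined with `segmentation-robust` is not stated here, so treat that combination as untested.

```python
CLASSIFICATION_ATTRS = ["source-ref", "anomaly-label-metadata", "anomaly-label"]
SEGMENTATION_ATTRS = CLASSIFICATION_ATTRS + ["anomaly-mask-ref-metadata", "anomaly-mask-ref"]
KNOWN_MODEL_TYPES = {
    "classification", "classification-robust",
    "segmentation", "segmentation-robust",
}


def build_hyperparameters(model_type: str, seg_head_only: bool = False) -> dict:
    """Assemble the HyperParameters dict for one of the model types used here."""
    if model_type not in KNOWN_MODEL_TYPES:
        raise ValueError(f"Unknown ModelType: {model_type}")
    is_segmentation = model_type.startswith("segmentation")
    if seg_head_only and not is_segmentation:
        raise ValueError("seg_head_only applies to segmentation models only")
    attrs = ",".join(SEGMENTATION_ATTRS if is_segmentation else CLASSIFICATION_ATTRS)
    params = {
        "ModelType": model_type,
        "TestInputDataAttributeNames": attrs,
        "TrainingInputDataAttributeNames": attrs,
    }
    if seg_head_only:
        # Use the segmentation head only, as in option 1 above.
        params["classification_logic"] = "seg_head"
    return params


print(build_hyperparameters("segmentation", seg_head_only=True))
```

The returned dict can be passed directly as the `HyperParameters` argument of `create_training_job`.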

---

## Step 7: Model Compilation - Classification

Compile the trained classification model for different target devices. SageMaker Neo optimizes models for specific hardware platforms.

### Prepare Model for Compilation

SageMaker compilation requires a single PyTorch model file. We need to:
1. Download the trained model artifact
2. Extract and repackage the `mochi.pt` file
3. Upload to S3 for compilation

In [None]:
# Get classification model artifact location
res_class = sagemaker_client.describe_training_job(TrainingJobName=classification_training_job_name)
output_model_path = res_class['ModelArtifacts']['S3ModelArtifacts']
print(f"Classification model artifact: {output_model_path}")

In [None]:
# Parse S3 URI to extract bucket and key
from urllib.parse import urlparse

parsed_url = urlparse(output_model_path)
output_bucket = parsed_url.netloc
output_key = parsed_url.path.lstrip('/')

print(f"S3 Bucket: {output_bucket}")
print(f"S3 Key: {output_key}")

In [None]:
# Download, extract, and repackage model for compilation
import tarfile
import os
from pathlib import Path

s3_client = boto3.client('s3')
path = "./classification"
Path(path).mkdir(parents=True, exist_ok=True)

# Download model artifact from S3
input_tar_gz = os.path.join(path, 'model.tar.gz')
s3_client.download_file(output_bucket, output_key, input_tar_gz)
print(f"Downloaded model artifact to {input_tar_gz}")

# Extract the model archive
extract_dir = os.path.join(path, 'extracted')
Path(extract_dir).mkdir(parents=True, exist_ok=True)
with tarfile.open(input_tar_gz, 'r:gz') as tar:
    tar.extractall(path=extract_dir)
print(f"Extracted contents to {extract_dir}")

# Find the mochi.pt model file
model_file = os.path.join(extract_dir, 'mochi.pt')
if not os.path.exists(model_file):
    raise FileNotFoundError("mochi.pt file not found in extracted contents")
print(f"Found model file: {model_file}")

# Extract input_shape from mochi.json
mochi_json_path = os.path.join(extract_dir, 'mochi.json')
if not os.path.exists(mochi_json_path):
    raise FileNotFoundError("mochi.json file not found in extracted contents")

print(f"Found mochi.json file: {mochi_json_path}")
with open(mochi_json_path, 'r') as f:
    mochi_data = json.load(f)
    input_shape = mochi_data['stages'][0]['input_shape']
    print(f"Extracted input_shape: {input_shape}")
    
    # Extract height and width
    height = input_shape[2]
    width = input_shape[3]
    
    # Build tensor shape for DataInputConfig
    tensor_shape = [1, 3, height, width]
    classification_data_input_config = json.dumps({"input_shape": tensor_shape})
    
    print(f"Height: {height}, Width: {width}")
    print(f"Classification DataInputConfig: {classification_data_input_config}")

# Create new archive with just the model file
output_tar_gz = os.path.join(path, 'classification.tar.gz')
with tarfile.open(output_tar_gz, "w:gz") as tar:
    tar.add(model_file, arcname=os.path.basename(model_file))
print(f"Created compilation-ready archive: {output_tar_gz}")

# Upload repackaged model to S3
target_key = output_key.rsplit('/', 1)[0] + '/classification.tar.gz'
s3_client.upload_file(output_tar_gz, output_bucket, target_key)
print(f"Uploaded to s3://{output_bucket}/{target_key}")
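
The `DataInputConfig` derivation in the cell above is needed again for the segmentation model, so it can be isolated into a function. This sketch assumes, as the cell does, that `mochi.json` stores a 4-D shape under `stages[0].input_shape` with height and width at indices 2 and 3. `build_data_input_config` is a hypothetical helper and the sample dictionary is made up.

```python
import json


def build_data_input_config(mochi_data: dict) -> str:
    """Build the DataInputConfig JSON string from parsed mochi.json content."""
    input_shape = mochi_data["stages"][0]["input_shape"]
    if len(input_shape) != 4:
        raise ValueError(f"Expected a 4-D shape, got {input_shape}")
    _, _, height, width = input_shape
    # Batch size 1 and 3 channels, matching the tensor shape used in this notebook.
    return json.dumps({"input_shape": [1, 3, height, width]})


# Made-up mochi.json fragment for illustration; real files contain more fields.
sample = {"stages": [{"input_shape": [1, 3, 224, 224]}]}
print(build_data_input_config(sample))
```

Validating the shape length up front gives a clearer error than an `IndexError` if the file layout differs from this assumption.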

### Compile for Jetson Xavier (JetPack 4)

Target: ARM64 architecture with NVIDIA GPU acceleration

In [None]:
# Create compilation job for Jetson Xavier
compilation_job_name = "class-xavier-gpu-" + datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
compressed_model_path = f"s3://{output_bucket}/{target_key}"

print(f"Compilation job: {compilation_job_name}")
print(f"Model path: {compressed_model_path}")

In [None]:
# Start compilation for Jetson Xavier
create_response = sagemaker_client.create_compilation_job(
    CompilationJobName=compilation_job_name,
    RoleArn=sm_role_arn,
    
    # Input model configuration
    InputConfig={
        'S3Uri': compressed_model_path,
        'DataInputConfig': classification_data_input_config,
        'Framework': 'PYTORCH',
        'FrameworkVersion': '1.8'
    },
    
    # Output and target platform configuration
    OutputConfig={
        'S3OutputLocation': f's3://{bucket}/{project}/compilation_output',
        'TargetPlatform': {
            'Os': 'LINUX',
            'Arch': 'ARM64',
            'Accelerator': 'NVIDIA'  # GPU acceleration
        },
        # Jetson Xavier specific compiler options
        'CompilerOptions': '{"cuda-ver": "10.2","gpu-code": "sm_72","trt-ver": "8.2.1"}'
    },
    
    # Compilation time limit (1 hour)
    StoppingCondition={
        'MaxRuntimeInSeconds': 3600
    }
)

print("Jetson Xavier compilation job started")

In [None]:
# Monitor Jetson Xavier compilation progress
print("Monitoring Jetson Xavier compilation...")
print("Status: ", end="")

while True:
    compile_response = sagemaker_client.describe_compilation_job(
        CompilationJobName=compilation_job_name
    )
    
    status = compile_response['CompilationJobStatus']
    
    if status == 'INPROGRESS':
        print(".", end='')
    elif status == 'STARTING':
        print("*", end='')
    elif status == 'COMPLETED':
        print("\nJetson Xavier compilation completed!")
        break
    elif status == 'FAILED':
        print("\nJetson Xavier compilation failed!")
        print(f"Failure reason: {compile_response.get('FailureReason', 'Unknown')}")
        break
    elif status == 'STOPPED':
        print("\nJetson Xavier compilation was stopped before completion.")
        break
    else:
        print("?", end='')
    
    time.sleep(60)

### Compile for x86_64 CPU

Target: Standard x86_64 architecture (Intel/AMD processors)

In [None]:
# Create compilation job for x86_64 CPU
compilation_job_name = "class-x86-cpu-" + datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
print(f"x86_64 compilation job: {compilation_job_name}")

In [None]:
# Start compilation for x86_64 CPU
create_response = sagemaker_client.create_compilation_job(
    CompilationJobName=compilation_job_name,
    RoleArn=sm_role_arn,
    
    # Input model configuration
    InputConfig={
        'S3Uri': compressed_model_path,
        'DataInputConfig': classification_data_input_config,
        'Framework': 'PYTORCH',
        'FrameworkVersion': '1.8'
    },
    
    # Output and target platform configuration
    OutputConfig={
        'S3OutputLocation': f's3://{bucket}/{project}/compilation_output',
        'TargetPlatform': {
            'Os': 'LINUX',
            'Arch': 'X86_64'  # No GPU acceleration for CPU target
        }
    },
    
    StoppingCondition={
        'MaxRuntimeInSeconds': 3600
    }
)

print("x86_64 CPU compilation job started")

In [None]:
# Monitor x86_64 compilation progress
print("Monitoring x86_64 compilation...")
print("Status: ", end="")

while True:
    compile_response = sagemaker_client.describe_compilation_job(
        CompilationJobName=compilation_job_name
    )
    
    status = compile_response['CompilationJobStatus']
    
    if status == 'INPROGRESS':
        print(".", end='')
    elif status == 'STARTING':
        print("*", end='')
    elif status == 'COMPLETED':
        print("\nx86_64 compilation completed!")
        break
    elif status == 'FAILED':
        print("\nx86_64 compilation failed!")
        print(f"Failure reason: {compile_response.get('FailureReason', 'Unknown')}")
        break
    elif status == 'STOPPED':
        print("\nx86_64 compilation was stopped before completion.")
        break
    else:
        print("?", end='')
    
    time.sleep(60)

### Compile for ARM64 CPU

Target: ARM64 architecture without GPU acceleration

In [None]:
# Create compilation job for ARM64 CPU
compilation_arm_cpu = "class-arm-cpu-" + datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
print(f"ARM64 CPU compilation job: {compilation_arm_cpu}")

In [None]:
# Start compilation for ARM64 CPU
create_arm_response = sagemaker_client.create_compilation_job(
    CompilationJobName=compilation_arm_cpu,
    RoleArn=sm_role_arn,
    
    # Input model configuration
    InputConfig={
        'S3Uri': compressed_model_path,
        'DataInputConfig': classification_data_input_config,
        'Framework': 'PYTORCH',
        'FrameworkVersion': '1.8'
    },
    
    # Output and target platform configuration
    OutputConfig={
        'S3OutputLocation': f's3://{bucket}/{project}/compilation_output',
        'TargetPlatform': {
            'Os': 'LINUX',
            'Arch': 'ARM64'  # ARM64 without GPU
        }
    },
    
    StoppingCondition={
        'MaxRuntimeInSeconds': 3600
    }
)

print("ARM64 CPU compilation job started")

In [None]:
# Monitor ARM64 compilation progress
print("Monitoring ARM64 compilation...")
print("Status: ", end="")

while True:
    create_arm_response = sagemaker_client.describe_compilation_job(
        CompilationJobName=compilation_arm_cpu
    )
    
    status = create_arm_response['CompilationJobStatus']
    
    if status == 'INPROGRESS':
        print(".", end='')
    elif status == 'STARTING':
        print("*", end='')
    elif status == 'COMPLETED':
        print("\nARM64 compilation completed!")
        break
    elif status == 'FAILED':
        print("\nARM64 compilation failed!")
        print(f"Failure reason: {create_arm_response.get('FailureReason', 'Unknown')}")
        break
    elif status == 'STOPPED':
        print("\nARM64 compilation was stopped before completion.")
        break
    else:
        print("?", end='')
    
    time.sleep(60)
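
The three classification compilation jobs above differ only in their `OutputConfig`. One way to keep those differences in a single place is a lookup table keyed by target. This is a sketch: `build_output_config` and the target keys are hypothetical names, while the platform fields and Jetson Xavier compiler options are copied from the cells above. The bucket in the demonstration is made up.

```python
import json

# Values copied from the compilation cells in this notebook.
JETSON_XAVIER_OPTIONS = {"cuda-ver": "10.2", "gpu-code": "sm_72", "trt-ver": "8.2.1"}

TARGET_PLATFORMS = {
    "x86_64-cpu": {"Os": "LINUX", "Arch": "X86_64"},
    "arm64-cpu": {"Os": "LINUX", "Arch": "ARM64"},
    "jetson-xavier": {"Os": "LINUX", "Arch": "ARM64", "Accelerator": "NVIDIA"},
}


def build_output_config(target: str, s3_output_location: str) -> dict:
    """Return the OutputConfig dict for one of the targets used in this notebook."""
    if target not in TARGET_PLATFORMS:
        raise ValueError(f"Unknown target: {target}")
    config = {
        "S3OutputLocation": s3_output_location,
        "TargetPlatform": dict(TARGET_PLATFORMS[target]),  # copy to avoid shared state
    }
    if target == "jetson-xavier":
        # CompilerOptions must be a JSON string, not a dict.
        config["CompilerOptions"] = json.dumps(JETSON_XAVIER_OPTIONS)
    return config


print(build_output_config("jetson-xavier", "s3://example-bucket/example-project/compilation_output"))
```

Looping over `TARGET_PLATFORMS` would then start one compilation job per target with otherwise identical arguments.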

---

## Step 8: Model Compilation - Segmentation

Compile the trained segmentation model for x86_64 CPU, ARM64 CPU, and Jetson Xavier targets.

### Prepare Segmentation Model for Compilation

The process mirrors the classification workflow: download the artifact, then extract and repackage the segmentation model.

In [None]:
# Get segmentation model artifact location
res_seg = sagemaker_client.describe_training_job(TrainingJobName=segmentation_training_job_name)
seg_output_model_path = res_seg['ModelArtifacts']['S3ModelArtifacts']
print(f"Segmentation model artifact: {seg_output_model_path}")

In [None]:
# Parse segmentation model S3 URI
parsed_url = urlparse(seg_output_model_path)
output_bucket = parsed_url.netloc
output_key = parsed_url.path.lstrip('/')

print(f"S3 Bucket: {output_bucket}")
print(f"S3 Key: {output_key}")

In [None]:
# Download, extract, and repackage segmentation model
path = "./segmentation"
Path(path).mkdir(parents=True, exist_ok=True)

# Download segmentation model artifact
input_tar_gz = os.path.join(path, 'model.tar.gz')
s3_client.download_file(output_bucket, output_key, input_tar_gz)
print(f"Downloaded segmentation model to {input_tar_gz}")

# Extract the model archive
extract_dir = os.path.join(path, 'extracted')
Path(extract_dir).mkdir(parents=True, exist_ok=True)
with tarfile.open(input_tar_gz, 'r:gz') as tar:
    tar.extractall(path=extract_dir)
print(f"Extracted contents to {extract_dir}")

# Find the mochi.pt model file
model_file = os.path.join(extract_dir, 'mochi.pt')
if not os.path.exists(model_file):
    raise FileNotFoundError("mochi.pt file not found in segmentation model")
print(f"Found segmentation model file: {model_file}")

# Extract input_shape from mochi.json
mochi_json_path = os.path.join(extract_dir, 'mochi.json')
if not os.path.exists(mochi_json_path):
    raise FileNotFoundError("mochi.json file not found in segmentation model")

print(f"Found mochi.json file: {mochi_json_path}")
with open(mochi_json_path, 'r') as f:
    mochi_data = json.load(f)
    input_shape = mochi_data['stages'][0]['input_shape']
    print(f"Extracted input_shape: {input_shape}")
    
    # Extract height and width
    height = input_shape[2]
    width = input_shape[3]
    
    # Build tensor shape for DataInputConfig
    tensor_shape = [1, 3, height, width]
    segmentation_data_input_config = json.dumps({"input_shape": tensor_shape})
    
    print(f"Height: {height}, Width: {width}")
    print(f"Segmentation DataInputConfig: {segmentation_data_input_config}")

# Create new archive for compilation
output_tar_gz = os.path.join(path, 'segmentation.tar.gz')
with tarfile.open(output_tar_gz, "w:gz") as tar:
    tar.add(model_file, arcname=os.path.basename(model_file))
print(f"Created compilation-ready archive: {output_tar_gz}")

# Upload repackaged segmentation model
target_key = output_key.rsplit('/', 1)[0] + '/segmentation.tar.gz'
s3_client.upload_file(output_tar_gz, output_bucket, target_key)
print(f"Uploaded to s3://{output_bucket}/{target_key}")

### Compile Segmentation Model for x86_64 CPU

In [None]:
# Create segmentation compilation job
compilation_job = "seg-x86-cpu-" + datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
model_path = f"s3://{output_bucket}/{target_key}"

print(f"Segmentation compilation job: {compilation_job}")
print(f"Model path: {model_path}")

In [None]:
# Start segmentation model compilation
seg_x86_response = sagemaker_client.create_compilation_job(
    CompilationJobName=compilation_job,
    RoleArn=sm_role_arn,
    
    # Input configuration for segmentation model
    InputConfig={
        'S3Uri': model_path,
        'DataInputConfig': segmentation_data_input_config,
        'Framework': 'PYTORCH',
        'FrameworkVersion': '1.8'
    },
    
    # Output configuration
    OutputConfig={
        'S3OutputLocation': f's3://{bucket}/{project}/compilation_output',
        'TargetPlatform': {
            'Os': 'LINUX',
            'Arch': 'X86_64'
        }
    },
    
    StoppingCondition={
        'MaxRuntimeInSeconds': 3600
    }
)

print("Segmentation x86_64 compilation job started")

In [None]:
# Monitor segmentation compilation progress
print("Monitoring segmentation compilation...")
print("Status: ", end="")

while True:
    create_response = sagemaker_client.describe_compilation_job(
        CompilationJobName=compilation_job
    )
    
    status = create_response['CompilationJobStatus']
    
    if status == 'INPROGRESS':
        print(".", end='')
    elif status == 'STARTING':
        print("*", end='')
    elif status == 'COMPLETED':
        print("\nSegmentation compilation completed!")
        break
    elif status == 'FAILED':
        print("\nSegmentation compilation failed!")
        print(f"Failure reason: {create_response.get('FailureReason', 'Unknown')}")
        break
    elif status == 'STOPPED':
        print("\nSegmentation compilation was stopped before completion.")
        break
    else:
        print("?", end='')
    
    time.sleep(60)

### Compile Segmentation Model for ARM64 CPU

In [None]:
# Create segmentation ARM64 compilation job
compilation_seg_arm = "seg-arm-cpu-" + datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')

print(f"Segmentation ARM64 compilation job: {compilation_seg_arm}")

In [None]:
# Start segmentation ARM64 compilation
seg_arm_response = sagemaker_client.create_compilation_job(
    CompilationJobName=compilation_seg_arm,
    RoleArn=sm_role_arn,
    
    # Input configuration
    InputConfig={
        'S3Uri': model_path,
        'DataInputConfig': segmentation_data_input_config,
        'Framework': 'PYTORCH',
        'FrameworkVersion': '1.8'
    },
    
    # Output configuration for ARM64
    OutputConfig={
        'S3OutputLocation': f's3://{bucket}/{project}/compilation_output',
        'TargetPlatform': {
            'Os': 'LINUX',
            'Arch': 'ARM64'
        }
    },
    
    StoppingCondition={
        'MaxRuntimeInSeconds': 3600
    }
)

print("Segmentation ARM64 compilation job started")

In [None]:
# Monitor segmentation ARM64 compilation
print("Monitoring segmentation ARM64 compilation...")
print("Status: ", end="")

while True:
    seg_arm_response = sagemaker_client.describe_compilation_job(
        CompilationJobName=compilation_seg_arm
    )
    
    status = seg_arm_response['CompilationJobStatus']
    
    if status == 'INPROGRESS':
        print(".", end='')
    elif status == 'STARTING':
        print("*", end='')
    elif status == 'COMPLETED':
        print("\nSegmentation ARM64 compilation completed!")
        break
    elif status == 'FAILED':
        print("\nSegmentation ARM64 compilation failed!")
        print(f"Failure reason: {seg_arm_response.get('FailureReason', 'Unknown')}")
        break
    elif status == 'STOPPED':
        print("\nSegmentation ARM64 compilation was stopped before completion.")
        break
    else:
        print("?", end='')
    
    time.sleep(60)

### Compile Segmentation Model for Jetson Xavier

Target: ARM64 architecture with NVIDIA GPU acceleration

In [None]:
# Create segmentation Jetson compilation job
compilation_seg_jetson = "seg-xavier-gpu-" + datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')

print(f"Segmentation Jetson Xavier compilation job: {compilation_seg_jetson}")

In [None]:
# Start segmentation Jetson Xavier compilation
seg_jetson_response = sagemaker_client.create_compilation_job(
    CompilationJobName=compilation_seg_jetson,
    RoleArn=sm_role_arn,
    
    # Input configuration
    InputConfig={
        'S3Uri': model_path,
        'DataInputConfig': segmentation_data_input_config,
        'Framework': 'PYTORCH',
        'FrameworkVersion': '1.8'
    },
    
    # Output configuration for Jetson Xavier
    OutputConfig={
        'S3OutputLocation': f's3://{bucket}/{project}/compilation_output',
        'TargetPlatform': {
            'Os': 'LINUX',
            'Arch': 'ARM64',
            'Accelerator': 'NVIDIA'
        },
        # Jetson Xavier specific compiler options
        'CompilerOptions': '{"cuda-ver": "10.2","gpu-code": "sm_72","trt-ver": "8.2.1"}'
    },
    
    StoppingCondition={
        'MaxRuntimeInSeconds': 3600
    }
)

print("Segmentation Jetson Xavier compilation job started")

In [None]:
# Monitor segmentation Jetson compilation
print("Monitoring segmentation Jetson compilation...")
print("Status: ", end="")

while True:
    seg_jetson_response = sagemaker_client.describe_compilation_job(
        CompilationJobName=compilation_seg_jetson
    )
    
    status = seg_jetson_response['CompilationJobStatus']
    
    if status == 'INPROGRESS':
        print(".", end='')
    elif status == 'STARTING':
        print("*", end='')
    elif status == 'COMPLETED':
        print("\nSegmentation Jetson compilation completed!")
        break
    elif status == 'FAILED':
        print("\nSegmentation Jetson compilation failed!")
        print(f"Failure reason: {seg_jetson_response.get('FailureReason', 'Unknown')}")
        break
    elif status == 'STOPPED':
        print("\nSegmentation Jetson compilation was stopped before completion.")
        break
    else:
        print("?", end='')
    
    time.sleep(60)

## Summary

This notebook has successfully:

1. **Trained Models:**
   - Classification model for binary defect detection
   - Segmentation model for pixel-level defect localization

2. **Compiled Models for Multiple Targets:**
   - **Classification:** Jetson Xavier (ARM64 + GPU), x86_64 CPU, ARM64 CPU
   - **Segmentation:** x86_64 CPU, ARM64 CPU, Jetson Xavier (ARM64 + GPU)

3. **Prepared for DDA Deployment:**
   - Models are optimized for edge deployment
   - Compatible with AWS IoT Greengrass
   - Ready for integration with DDA application

**Next Steps:**
- Download compiled models from S3
- Deploy to target edge devices using DDA
- Configure inference parameters
- Test with production data
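
For the first of these steps, the compiled artifact location can be read from the compilation job description and downloaded with the S3 client. The sketch below assumes the `describe_compilation_job` response carries the artifact URI under `ModelArtifacts.S3ModelArtifacts`. `split_s3_uri` and `download_compiled_model` are hypothetical helpers, and the clients are passed in so that nothing here contacts AWS until you call it with real ones.

```python
import os
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_compiled_model(sagemaker_client, s3_client, job_name, dest_dir="compiled"):
    """Download the artifact of a finished compilation job; return the local path."""
    response = sagemaker_client.describe_compilation_job(CompilationJobName=job_name)
    bucket, key = split_s3_uri(response["ModelArtifacts"]["S3ModelArtifacts"])
    os.makedirs(dest_dir, exist_ok=True)
    local_path = os.path.join(dest_dir, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)
    return local_path


# Made-up URI for illustration only.
print(split_s3_uri("s3://example-bucket/example-project/compilation_output/model.tar.gz"))
```

In this notebook, the call would look like `download_compiled_model(sagemaker_client, s3_client, compilation_job_name)` once the corresponding job has completed.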