# Deploying Mistral Models on Amazon SageMaker JumpStart

This notebook guides you through deploying and using Mistral models on Amazon SageMaker JumpStart.

## What is SageMaker JumpStart?

SageMaker JumpStart provides pre-trained models, many of them open source, for a wide range of problem types. Benefits:
- **One-click deployment**: Deploy models without writing deployment code
- **Customizable**: Fine-tune models on your own data
- **Cost-effective**: Pay only for the compute you use
- **Full control**: Models are hosted on SageMaker endpoints in your AWS account, which you can configure to run inside your VPC

## Available Mistral Models on JumpStart

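As of June 2025, 16+ Mistral models are available, including:

**Mistral-Small-3.2-24B-Instruct-2506** (latest)
- **Size**: 24 billion parameters
- **Context**: 32K tokens
- **Best for**: Balanced performance and cost
- **Advantages**: Latest improvements, efficient inference

The code cells in this notebook deploy the smaller **Mistral-7B-Instruct** model (see Step 3); the same workflow applies to other Mistral model IDs in JumpStart.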

## Prerequisites

1. **AWS Account** with SageMaker access
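2. **IAM Permissions**: `AmazonSageMakerFullAccess` or equivalent scoped permissions, plus permission to create IAM roles if you run Step 2a
3. **Service Quotas**: Ensure you have endpoint-usage quota for ml.g5 instances
4. **Region**: This notebook uses us-east-2; change `region` in Step 2 to use another region where the model and ml.g5 instances are available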

## Step 1: Install Required Packages

First, let's install the SageMaker Python SDK if it's not already installed.

In [6]:
# Install SageMaker SDK
import sys
import subprocess

print("📦 Installing SageMaker Python SDK...\n")

try:
    import sagemaker
    print(f"✅ SageMaker SDK already installed (version {sagemaker.__version__})")
except ImportError:
    print("Installing sagemaker package...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "sagemaker"])
    print("✅ SageMaker SDK installed successfully")
    print("\n⚠️  Please restart the kernel and run this cell again.")

📦 Installing SageMaker Python SDK...

✅ SageMaker SDK already installed (version 2.254.1)


## Step 2: Setup and Imports

In [9]:
import boto3
import sagemaker
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.session import Session
import json
from datetime import datetime

# Initialize the SageMaker session in the chosen region
region = 'us-east-2'  # Change to another region where JumpStart and ml.g5 instances are available
boto3_session = boto3.Session(region_name=region)
sagemaker_session = Session(boto_session=boto3_session)
role = sagemaker.get_execution_role(sagemaker_session=sagemaker_session)

print("✅ SageMaker session initialized")
print(f"   Region: {region}")
print(f"   Role: {role}")
print(f"   Session: {sagemaker_session}")

✅ SageMaker session initialized
   Region: us-east-2
   Role: arn:aws:iam::314146324612:role/Admin
   Session: <sagemaker.session.Session object at 0x13bca9130>


## Step 2a: Create IAM Role for Workshop Studio (If Needed)

In Workshop Studio and similar environments, the default role may not be usable as a SageMaker execution role, so this cell creates a dedicated one. Creating it requires IAM permissions to create roles and policies.

This cell will:
1. Create a SageMaker execution role, or reuse it if it already exists
2. Attach `AmazonSageMakerFullAccess` and a custom policy for JumpStart model access (S3, ECR, CloudWatch Logs)
3. Switch the `role` variable to the new role

**Note**: If you're in a Workshop Studio environment, you may need to use this role instead of the default one.

In [10]:
import boto3
import json
from botocore.exceptions import ClientError

iam_client = boto3.client('iam')
sts_client = boto3.client('sts')

print("🔧 Setting up IAM Role for SageMaker JumpStart\n")
print("=" * 80)

# Get account ID
account_id = sts_client.get_caller_identity()['Account']
print(f"Account ID: {account_id}")
print(f"Region: {region}\n")

# Define role name
sagemaker_role_name = 'SageMakerJumpStartExecutionRole'

# Trust policy for SageMaker
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "sagemaker.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

# Custom policy for JumpStart S3 access
jumpstart_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                f"arn:aws:s3:::jumpstart-cache-prod-{region}",
                f"arn:aws:s3:::jumpstart-cache-prod-{region}/*",
                "arn:aws:s3:::jumpstart-cache-prod-*",
                "arn:aws:s3:::jumpstart-cache-prod-*/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}

try:
    # Try to create the role
    print(f"Creating IAM role: {sagemaker_role_name}...")
    
    try:
        create_role_response = iam_client.create_role(
            RoleName=sagemaker_role_name,
            AssumeRolePolicyDocument=json.dumps(trust_policy),
            Description='SageMaker execution role for JumpStart models',
            MaxSessionDuration=3600
        )
        print(f"✅ Role created: {create_role_response['Role']['Arn']}")
        new_role_arn = create_role_response['Role']['Arn']
        
    except ClientError as e:
        if e.response['Error']['Code'] == 'EntityAlreadyExists':
            print(f"ℹ️  Role already exists, retrieving...")
            get_role_response = iam_client.get_role(RoleName=sagemaker_role_name)
            new_role_arn = get_role_response['Role']['Arn']
            print(f"✅ Using existing role: {new_role_arn}")
        else:
            raise
    
    # Attach AWS managed policy for SageMaker
    print("\nAttaching managed policies...")
    try:
        iam_client.attach_role_policy(
            RoleName=sagemaker_role_name,
            PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess'
        )
        print("✅ Attached AmazonSageMakerFullAccess")
    except ClientError as e:
        if e.response['Error']['Code'] != 'EntityAlreadyExists':
            print(f"⚠️  Could not attach AmazonSageMakerFullAccess: {e}")
    
    # Create and attach custom JumpStart policy
    jumpstart_policy_name = 'SageMakerJumpStartS3Access'
    jumpstart_policy_arn = f"arn:aws:iam::{account_id}:policy/{jumpstart_policy_name}"
    
    print(f"\nCreating custom policy: {jumpstart_policy_name}...")
    try:
        create_policy_response = iam_client.create_policy(
            PolicyName=jumpstart_policy_name,
            PolicyDocument=json.dumps(jumpstart_policy),
            Description='Custom policy for SageMaker JumpStart S3 access'
        )
        print(f"✅ Policy created: {create_policy_response['Policy']['Arn']}")
        jumpstart_policy_arn = create_policy_response['Policy']['Arn']
        
    except ClientError as e:
        if e.response['Error']['Code'] == 'EntityAlreadyExists':
            print(f"ℹ️  Policy already exists")
        else:
            print(f"⚠️  Could not create policy: {e}")
    
    # Attach custom policy to role
    try:
        iam_client.attach_role_policy(
            RoleName=sagemaker_role_name,
            PolicyArn=jumpstart_policy_arn
        )
        print(f"✅ Attached custom JumpStart policy")
    except ClientError as e:
        if e.response['Error']['Code'] != 'EntityAlreadyExists':
            print(f"⚠️  Could not attach custom policy: {e}")
    
    # Update the role variable to use the new role
    role = new_role_arn
    
    print("\n" + "=" * 80)
    print(f"\n✅ IAM Setup Complete!")
    print(f"\nUsing role: {role}")
    print(f"\n💡 This role has permissions to:")
    print(f"   - Access JumpStart model artifacts in S3")
    print(f"   - Pull container images from ECR")
    print(f"   - Create and manage SageMaker endpoints")
    print(f"   - Write logs to CloudWatch")
    
    # Wait a few seconds for IAM to propagate
    print(f"\n⏳ Waiting 10 seconds for IAM changes to propagate...")
    import time
    time.sleep(10)
    print(f"✅ Ready to deploy!")
    
except ClientError as e:
    error_code = e.response['Error']['Code']
    
    if error_code == 'AccessDenied':
        print(f"\n❌ Access Denied: Your current IAM user/role doesn't have permission to create IAM roles.")
        print(f"\n📋 Required IAM Permissions:")
        print(f"   - iam:CreateRole")
        print(f"   - iam:AttachRolePolicy")
        print(f"   - iam:CreatePolicy")
        print(f"   - iam:GetRole")
        print(f"   - iam:PassRole (so SageMaker can use the role when deploying)")
        print(f"\n🔧 Solutions:")
        print(f"   1. Ask your AWS administrator to create the role with the policies shown above")
        print(f"   2. Use the existing role if it has the necessary permissions")
        print(f"   3. Try using Amazon Bedrock instead (no execution role or endpoint to manage) - see notebook 03")
        print(f"\n📝 Current role: {role}")
        print(f"   You can try to proceed with this role, but deployment may fail if it lacks S3 permissions.")
    else:
        print(f"\n❌ Error: {e}")
        print(f"\n📝 Current role: {role}")
        print(f"   Proceeding with existing role...")

except Exception as e:
    print(f"\n❌ Unexpected error: {e}")
    print(f"\n📝 Current role: {role}")
    print(f"   Proceeding with existing role...")

print("\n" + "=" * 80)

🔧 Setting up IAM Role for SageMaker JumpStart

Account ID: 314146324612
Region: us-east-2

Creating IAM role: SageMakerJumpStartExecutionRole...
✅ Role created: arn:aws:iam::314146324612:role/SageMakerJumpStartExecutionRole

Attaching managed policies...
✅ Attached AmazonSageMakerFullAccess

Creating custom policy: SageMakerJumpStartS3Access...
✅ Policy created: arn:aws:iam::314146324612:policy/SageMakerJumpStartS3Access
✅ Attached custom JumpStart policy


✅ IAM Setup Complete!

Using role: arn:aws:iam::314146324612:role/SageMakerJumpStartExecutionRole

💡 This role has permissions to:
   - Access JumpStart model artifacts in S3
   - Pull container images from ECR
   - Create and manage SageMaker endpoints
   - Write logs to CloudWatch

⏳ Waiting 10 seconds for IAM changes to propagate...
✅ Ready to deploy!



## Step 3: Select and Configure the Model

We'll use **Mistral-7B-Instruct**, a reliable and widely available model in SageMaker JumpStart.

### Why Mistral 7B Instruct?

- **Widely available**: Offered in many of the AWS regions that support JumpStart
- **Cost-effective**: A smaller model fits on a single-GPU instance, which lowers hosting costs
- **Fast**: Low inference latency with good output quality
- **Versatile**: Handles most common use cases well
- **Proven**: Widely used, with strong community support

### Instance Selection

Recommended instances for the 7B model (each has one NVIDIA A10G GPU with 24 GB of memory; prices are approximate on-demand rates that vary by region, so check the SageMaker pricing page):
- **ml.g5.xlarge**: most economical (~$1.01/hour)
- **ml.g5.2xlarge**: good for testing, used in this notebook (~$1.21/hour)
- **ml.g5.4xlarge**: more vCPUs and host memory (~$1.94/hour)
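Why is one 24 GB GPU enough? A back-of-the-envelope check: the weights alone need roughly parameters × bytes per parameter. A minimal sketch, assuming 16-bit (2-byte) weights and a nominal 7 billion parameters; the 20% headroom reserved for the KV cache and runtime overhead is an arbitrary assumption, and both helper names are hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory (GB, where 1 GB = 1e9 bytes) needed just to hold the weights."""
    return num_params * bytes_per_param / 1e9


def fits_on_gpu(num_params: float, gpu_memory_gb: float,
                bytes_per_param: int = 2, headroom: float = 0.2) -> bool:
    """True if the weights fit while leaving `headroom` (a fraction of GPU memory)
    free for the KV cache and runtime overhead. The 20% default is an assumption."""
    return weight_memory_gb(num_params, bytes_per_param) <= gpu_memory_gb * (1 - headroom)


# Nominal 7B model in 16-bit precision on one 24 GB A10G
print(weight_memory_gb(7e9))   # 14.0
print(fits_on_gpu(7e9, 24))    # True: 14.0 GB <= 19.2 GB
# A 24B model at 16-bit precision does not fit on a single such GPU
print(fits_on_gpu(24e9, 24))   # False: 48.0 GB > 19.2 GB
```

The same arithmetic suggests why a 24B model needs a multi-GPU instance or quantization.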

In [11]:
# Model configuration
# Using Mistral 7B Instruct - widely available in SageMaker JumpStart
model_id = "huggingface-llm-mistral-7b-instruct"
model_version = "*"  # "*" = latest version; pin an explicit version for reproducible results

# Instance configuration
instance_type = "ml.g5.2xlarge"  # Start with smaller instance for testing

print(f"Model Configuration:")
print(f"  Model ID: {model_id}")
print(f"  Version: {model_version}")
print(f"  Instance: {instance_type}")
print(f"  Region: {region}")
print(f"\n💡 Tip: You can change instance_type to ml.g5.4xlarge or ml.g5.12xlarge for better performance.")
print(f"\n📝 Note: Using Mistral 7B Instruct - a reliable and widely available model.")

Model Configuration:
  Model ID: huggingface-llm-mistral-7b-instruct
  Version: *
  Instance: ml.g5.2xlarge
  Region: us-east-2

💡 Tip: You can change instance_type to ml.g5.4xlarge or ml.g5.12xlarge for better performance.

📝 Note: Using Mistral 7B Instruct - a reliable and widely available model.


## Step 3a: Verify IAM Permissions (Optional)

Before deploying, let's verify your IAM role has the necessary permissions to access JumpStart models.

In [12]:
# Check IAM role permissions
import boto3

iam_client = boto3.client('iam')
sts_client = boto3.client('sts')

print("🔍 Checking IAM Role Configuration...\n")
print("=" * 80)

try:
    # Get current identity
    identity = sts_client.get_caller_identity()
    print(f"Current Identity: {identity['Arn']}")
    print(f"Account: {identity['Account']}")
    
    # Extract role name from ARN
    role_name = role.split('/')[-1]
    print(f"\nSageMaker Role: {role_name}")
    
    # Check that the role exists and list its attached policies
    try:
        role_info = iam_client.get_role(RoleName=role_name)
        print("✅ Role exists")
        
        # Check attached policies (first 5 shown)
        attached_policies = iam_client.list_attached_role_policies(RoleName=role_name)
        print(f"\nAttached Policies: {len(attached_policies['AttachedPolicies'])}")
        for policy in attached_policies['AttachedPolicies'][:5]:
            print(f"  - {policy['PolicyName']}")
        
        print("\n💡 If deployment fails with S3 access errors, you may need to add:")
        print("   - AmazonSageMakerFullAccess policy")
        print("   - Or custom policy with s3:GetObject on jumpstart-cache-prod-* buckets")
        
    except Exception as e:
        print(f"⚠️  Could not retrieve role details: {e}")
        
except Exception as e:
    print(f"⚠️  Error checking permissions: {e}")

print("\n" + "=" * 80)

🔍 Checking IAM Role Configuration...

Current Identity: arn:aws:sts::314146324612:assumed-role/Admin/joshtam-Isengard
Account: 314146324612

SageMaker Role: SageMakerJumpStartExecutionRole
✅ Role exists

Attached Policies: 2
  - SageMakerJumpStartS3Access
  - AmazonSageMakerFullAccess

💡 If deployment fails with S3 access errors, you may need to add:
   - AmazonSageMakerFullAccess policy
   - Or custom policy with s3:GetObject on jumpstart-cache-prod-* buckets
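The cell above confirms that the role exists and lists its policies, but it does not inspect the role's trust policy, which is what allows SageMaker to assume the role. A minimal sketch of that check, assuming the document from `iam_client.get_role(...)['Role']['AssumeRolePolicyDocument']` (boto3 normally returns it as a dict; the helper also accepts a JSON string). `trusts_sagemaker` is a hypothetical helper and deliberately simplified.

```python
import json
from typing import Union


def trusts_sagemaker(policy_document: Union[str, dict]) -> bool:
    """Return True if an Allow statement lets sagemaker.amazonaws.com call sts:AssumeRole.

    Simplified check: ignores Condition blocks, wildcards, NotPrincipal, and NotAction.
    """
    if isinstance(policy_document, str):
        policy_document = json.loads(policy_document)

    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear as an object
        statements = [statements]

    for stmt in statements:
        principal = stmt.get("Principal", {})
        if stmt.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        services = principal.get("Service", [])
        services = [services] if isinstance(services, str) else services
        if "sts:AssumeRole" in actions and "sagemaker.amazonaws.com" in services:
            return True
    return False


# Usage (requires AWS credentials):
# doc = iam_client.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
# print("Trusts SageMaker:", trusts_sagemaker(doc))
```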



## Step 4: Deploy the Model

This will:
1. Create a SageMaker model from the JumpStart artifacts
2. Create an endpoint configuration for the specified instance type
3. Create the endpoint, which launches the instance and loads the model onto it

**⏱️ Deployment time**: 5-10 minutes

**💰 Cost**: You'll be charged for the instance from deployment until deletion

### Troubleshooting IAM Permissions

If you get S3 access errors, your IAM role needs these permissions:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::jumpstart-cache-prod-*",
                "arn:aws:s3:::jumpstart-cache-prod-*/*"
            ]
        }
    ]
}
```

**Alternative**: Use Amazon Bedrock for Mistral models (no execution role or endpoint to manage) - see notebook 03.

In [13]:
print("🚀 Starting model deployment...\n")
print("This will take 5-10 minutes. Please wait...\n")
print("=" * 80)

try:
    # Create JumpStart model
    model = JumpStartModel(
        model_id=model_id,
        model_version=model_version,
        instance_type=instance_type,
        role=role,
        sagemaker_session=sagemaker_session
    )
    
    # Deploy the model
    start_time = datetime.now()
    predictor = model.deploy(
        initial_instance_count=1,
        wait=True  # Wait for deployment to complete
    )
    end_time = datetime.now()
    
    deployment_time = (end_time - start_time).total_seconds() / 60
    
    print("\n" + "=" * 80)
    print(f"\n✅ Model deployed successfully!")
    print(f"   Endpoint name: {predictor.endpoint_name}")
    print(f"   Deployment time: {deployment_time:.1f} minutes")
    print(f"   Instance: {instance_type}")
    print(f"\n⚠️  Remember: You're now being charged for this instance!")
    print(f"   Delete the endpoint when done to stop charges.")
    
except Exception as e:
    print(f"\n❌ Deployment failed: {e}")
    print("\nCommon issues:")
    print("1. Insufficient service quota for the instance type")
    print("2. Model not available in your region")
    print("3. IAM role lacks necessary permissions")
    print("4. Container failed the ping health check (the model server did not start)")
    print("\nTry:")
    print("- Request quota increase in Service Quotas console")
    print("- Use a different instance type (ml.g5.xlarge)")
    print("- Check IAM role has SageMaker permissions")
    print("- Check the CloudWatch log group /aws/sagemaker/Endpoints/<endpoint-name>")

🚀 Starting model deployment...

This will take 5-10 minutes. Please wait...



Using model 'huggingface-llm-mistral-7b-instruct' with wildcard version identifier '*'. You can pin to version '3.22.2' for more stable results. Note that models may have different input/output signatures after a major version upgrade.


--------------------------------------------------------------*

Please check the troubleshooting guide for common errors: https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-python-sdk-troubleshooting.html#sagemaker-python-sdk-troubleshooting-create-endpoint



❌ Deployment failed: Error hosting endpoint hf-llm-mistral-7b-instruct-2025-11-22-04-29-10-139: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.. Try changing the instance type or reference the troubleshooting page https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-troubleshooting.html

Common issues:
1. Insufficient service quota for the instance type
2. Model not available in your region
3. IAM role lacks necessary permissions

Try:
- Request quota increase in Service Quotas console
- Use a different instance type (ml.g5.xlarge)
- Check IAM role has SageMaker permissions
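The deployment above failed the container's ping health check, and the error points to CloudWatch Logs. SageMaker writes an endpoint's container logs to the log group `/aws/sagemaker/Endpoints/<endpoint-name>`. A minimal sketch for pulling recent messages, assuming boto3 and AWS credentials are available; both helpers are hypothetical, and only the first page of results is fetched.

```python
import time


def endpoint_log_group(endpoint_name: str) -> str:
    """CloudWatch Logs group where SageMaker writes an endpoint's container logs."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def endpoint_logs_since(endpoint_name: str, region_name: str,
                        minutes: int = 60, limit: int = 100) -> list:
    """Return log messages from the last `minutes` minutes (first page only, no nextToken)."""
    import boto3  # lazy import so endpoint_log_group works without boto3 installed

    logs = boto3.client("logs", region_name=region_name)
    response = logs.filter_log_events(
        logGroupName=endpoint_log_group(endpoint_name),
        startTime=int((time.time() - minutes * 60) * 1000),  # milliseconds since epoch
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]


# Usage (endpoint name taken from the error message above):
# for line in endpoint_logs_since("hf-llm-mistral-7b-instruct-2025-11-22-04-29-10-139", region):
#     print(line)
```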


## Step 5: Invoke the Model - Basic Usage

Once the model is deployed, you can invoke it. The endpoint accepts chat-style requests: a list of messages, each with a role and content, plus generation parameters.

### Message Format

Requests use this format:
```python
{
    "messages": [
        {"role": "user", "content": "Your prompt here"}
    ],
    "max_tokens": 512,
    "temperature": 0.7
}
```

### Key Parameters

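- **messages**: List of conversation messages, each with a `role` (`user` or `assistant`; `system` only if the model's chat template supports it) and `content`
- **max_tokens**: Maximum number of tokens to generate (512 in the helper below; the server default may differ)
- **temperature**: Sampling randomness (lower values are more deterministic, higher values more varied)
- **top_p**: Nucleus sampling threshold (0.0-1.0)
- **top_k**: Top-k sampling (not sent by the helpers below; support depends on the serving container)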
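To catch bad parameter values before a request reaches the endpoint, you can build and validate the payload in one place. A minimal sketch: `build_chat_payload` is a hypothetical helper, the ranges it enforces follow the parameter descriptions above (temperature is restricted to 0.0-1.0 to match them), and it emits only the keys the helpers below already send.

```python
from typing import Dict, List


def build_chat_payload(messages: List[Dict[str, str]], max_tokens: int = 512,
                       temperature: float = 0.7, top_p: float = 1.0) -> dict:
    """Validate sampling parameters and return a chat-style request body."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for m in messages:
        if m.get("role") not in {"system", "user", "assistant"}:
            raise ValueError(f"unsupported role: {m.get('role')!r}")
        if not isinstance(m.get("content"), str):
            raise ValueError("each message needs string content")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature should be between 0.0 and 1.0")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0.0, 1.0]")
    return {"messages": messages, "max_tokens": max_tokens,
            "temperature": temperature, "top_p": top_p}


payload = build_chat_payload([{"role": "user", "content": "Hello!"}], max_tokens=64)
print(payload["max_tokens"])  # 64
```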

In [None]:
# Helper function to invoke the model
def invoke_model(prompt, max_tokens=512, temperature=0.7, top_p=1.0):
    """
    Invoke the Mistral model with a prompt.
    
    Args:
        prompt: User prompt string
        max_tokens: Maximum tokens to generate
        temperature: Sampling temperature (0.0-1.0)
        top_p: Nucleus sampling parameter
    
    Returns:
        Generated text response
    """
    payload = {
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p
    }
    
    try:
        response = predictor.predict(payload)
        
        # Extract the generated text
        if isinstance(response, dict) and 'choices' in response:
            return response['choices'][0]['message']['content']
        elif isinstance(response, list) and len(response) > 0:
            return response[0]['generated_text']
        else:
            return str(response)
    
    except Exception as e:
        return f"Error invoking model: {e}"

# Test basic invocation
print("🧪 Testing basic model invocation...\n")
print("=" * 80)

test_prompt = "What is Amazon SageMaker? Explain in 2-3 sentences."
print(f"Prompt: {test_prompt}\n")

response = invoke_model(test_prompt, max_tokens=200, temperature=0.7)
print(f"Response:\n{response}")
print("\n" + "=" * 80)

## Step 6: Use Case 1 - Text Summarization

Mistral 7B Instruct can condense longer documents into concise summaries.

In [None]:
# Example: Summarize a technical document
long_text = """
Amazon SageMaker is a fully managed machine learning service. With SageMaker, data scientists 
and developers can quickly and easily build and train machine learning models, and then directly 
deploy them into a production-ready hosted environment. It provides an integrated Jupyter authoring 
notebook instance for easy access to your data sources for exploration and analysis, so you don't 
have to manage servers. It also provides common machine learning algorithms that are optimized to 
run efficiently against extremely large data in a distributed environment. With native support for 
bring-your-own-algorithms and frameworks, SageMaker offers flexible distributed training options 
that adjust to your specific workflows. Deploy a model into a secure and scalable environment by 
launching it with a single click from the SageMaker console.
"""

summarization_prompt = f"""
Summarize the following text in 2-3 bullet points:

{long_text}

Summary:
"""

print("📝 Use Case: Text Summarization\n")
print("=" * 80)
print(f"Original text length: {len(long_text)} characters\n")

summary = invoke_model(summarization_prompt, max_tokens=300, temperature=0.5)
print(f"Summary:\n{summary}")
print("\n" + "=" * 80)
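
The example text fits in a single request, but longer documents can exceed the model's context window. One common workaround is map-reduce summarization: summarize fixed-size chunks, then summarize the partial summaries. A minimal sketch, assuming character counts as a rough stand-in for tokens (the 4,000-character default is arbitrary); `chunk_text` and `summarize_long` are hypothetical helpers, and `summarize_fn` would wrap `invoke_model`.

```python
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> List[str]:
    """Split text into pieces of at most max_chars characters; neighbors share `overlap` chars."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    chunks, start = [], 0
    while start < len(text):
        end = start + max_chars
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # step back so content spanning a boundary is not lost
    return chunks


def summarize_long(text: str, summarize_fn: Callable[[str], str],
                   max_chars: int = 4000, overlap: int = 200) -> str:
    """Map step: summarize each chunk. Reduce step: summarize the partial summaries."""
    chunks = chunk_text(text, max_chars=max_chars, overlap=overlap)
    if len(chunks) <= 1:
        return summarize_fn(f"Summarize the following text in 2-3 bullet points:\n\n{text}")
    partials = [summarize_fn(f"Summarize this excerpt in 2-3 sentences:\n\n{c}") for c in chunks]
    combined = "\n".join(partials)
    return summarize_fn(f"Combine these partial summaries into 2-3 bullet points:\n\n{combined}")


# Usage with the endpoint from Step 4:
# print(summarize_long(long_text, lambda p: invoke_model(p, max_tokens=300, temperature=0.5)))
```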

## Step 7: Use Case 2 - Code Generation

Generate code snippets in various programming languages.

In [None]:
# Example: Generate Python code
code_prompt = """
Write a Python function that:
1. Takes a list of numbers as input
2. Removes duplicates
3. Sorts the list in descending order
4. Returns the top 5 numbers

Include docstring and type hints.
"""

print("💻 Use Case: Code Generation\n")
print("=" * 80)
print(f"Request: {code_prompt.strip()}\n")

code_response = invoke_model(code_prompt, max_tokens=500, temperature=0.3)
print(f"Generated Code:\n{code_response}")
print("\n" + "=" * 80)
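
Models usually wrap generated code in Markdown fences with explanation around it, so before saving the result you may want only the code. A minimal sketch using a regular expression, assuming the model uses triple-backtick fences (output without fences is returned unchanged); `extract_code_blocks` is a hypothetical helper. Review generated code before running it.

```python
import re
from typing import List

FENCE = "`" * 3  # three backticks, built this way so the snippet can itself sit inside a Markdown fence
_CODE_BLOCK = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(response: str) -> List[str]:
    """Return the contents of all fenced code blocks; fall back to the whole response."""
    blocks = [b.strip("\n") for b in _CODE_BLOCK.findall(response)]
    return blocks or [response.strip()]


sample = f"Here you go:\n{FENCE}python\ndef f():\n    return 1\n{FENCE}\nHope this helps."
print(extract_code_blocks(sample)[0])

# Usage with the response above:
# code_only = extract_code_blocks(code_response)[0]
```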

## Step 8: Use Case 3 - Question Answering with Context

Answer questions based on provided context (RAG-style use case).

In [None]:
# Example: Q&A with context
context = """
Singapore is a sovereign city-state and island country in Southeast Asia. It lies off the southern 
tip of the Malay Peninsula and is separated from Malaysia by the Straits of Johor to the north. 
The country is highly urbanized with very little primary rainforest remaining. Singapore's territory 
consists of one main island along with 62 other islets. Since independence, extensive land reclamation 
has increased its total size by 25%. The country has a tropical rainforest climate with no distinctive 
seasons. Its GDP per capita is one of the highest in the world.
"""

question = "What is Singapore's climate like and how has its land area changed?"

qa_prompt = f"""
Context:
{context}

Question: {question}

Answer the question based only on the context provided above. Be concise.
"""

print("❓ Use Case: Question Answering with Context\n")
print("=" * 80)
print(f"Question: {question}\n")

answer = invoke_model(qa_prompt, max_tokens=300, temperature=0.3)
print(f"Answer:\n{answer}")
print("\n" + "=" * 80)
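
In a full RAG pipeline, the context comes from a retriever instead of being pasted in. A toy sketch of that retrieval step, assuming word-overlap scoring with a tiny stopword list as a stand-in for embeddings or a search index; `retrieve` and `build_qa_prompt` are hypothetical helpers, and the prompt wording mirrors the one above.

```python
import re
from typing import List

# Tiny illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {"a", "an", "and", "by", "has", "how", "in", "is", "its", "like",
             "of", "s", "the", "to", "what", "with"}


def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question: str, passages: List[str], k: int = 2) -> List[str]:
    """Rank passages by how many distinct question words they share; return the top k."""
    q = _words(question)
    ranked = sorted(passages, key=lambda p: len(q & _words(p)), reverse=True)
    return ranked[:k]


def build_qa_prompt(question: str, passages: List[str], k: int = 2) -> str:
    context = "\n\n".join(retrieve(question, passages, k))
    return (f"Context:\n{context}\n\nQuestion: {question}\n\n"
            "Answer the question based only on the context provided above. Be concise.")


docs = [
    "Singapore has a tropical rainforest climate with no distinctive seasons.",
    "Land reclamation has increased Singapore's total size by 25%.",
    "The Straits of Johor separate Singapore from Malaysia.",
]
print(retrieve("What is Singapore's climate like?", docs, k=1))

# Usage with the endpoint: invoke_model(build_qa_prompt(question, docs), max_tokens=300, temperature=0.3)
```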

## Step 9: Use Case 4 - Sentiment Analysis

Analyze sentiment and extract insights from customer feedback.

In [None]:
# Example: Analyze customer reviews
reviews = [
    "The product arrived quickly and works perfectly. Very satisfied!",
    "Disappointed with the quality. Not worth the price.",
    "Good product but customer service could be better."
]

sentiment_prompt = f"""
Analyze the sentiment of these customer reviews and provide:
1. Overall sentiment (Positive/Negative/Mixed)
2. Key themes
3. Actionable insights

Reviews:
{chr(10).join([f'{i+1}. {r}' for i, r in enumerate(reviews)])}

Analysis:
"""

print("😊 Use Case: Sentiment Analysis\n")
print("=" * 80)
print("Analyzing customer reviews...\n")

sentiment_analysis = invoke_model(sentiment_prompt, max_tokens=400, temperature=0.5)
print(f"Analysis:\n{sentiment_analysis}")
print("\n" + "=" * 80)

## Step 10: Use Case 5 - Multi-turn Conversation

Maintain context across multiple conversation turns.

In [None]:
# Example: Multi-turn conversation
def multi_turn_conversation(conversation_history):
    """
    Handle multi-turn conversations with context.
    
    Args:
        conversation_history: List of message dicts with 'role' and 'content'
    
    Returns:
        Assistant's response
    """
    payload = {
        "messages": conversation_history,
        "max_tokens": 300,
        "temperature": 0.7
    }
    
    try:
        response = predictor.predict(payload)
        if isinstance(response, dict) and 'choices' in response:
            return response['choices'][0]['message']['content']
        return str(response)
    except Exception as e:
        return f"Error: {e}"

# Simulate a conversation
print("💬 Use Case: Multi-turn Conversation\n")
print("=" * 80)

conversation = [
    {"role": "user", "content": "What are the main AWS compute services?"},
]

print("User: What are the main AWS compute services?")
response1 = multi_turn_conversation(conversation)
print(f"\nAssistant: {response1}\n")

# Add to conversation history
conversation.append({"role": "assistant", "content": response1})
conversation.append({"role": "user", "content": "Which one is best for machine learning workloads?"})

print("\nUser: Which one is best for machine learning workloads?")
response2 = multi_turn_conversation(conversation)
print(f"\nAssistant: {response2}")
print("\n" + "=" * 80)
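
Each turn resends the full history, so long conversations keep growing the prompt until they hit the context limit. A minimal sketch that keeps only the most recent turns, assuming a character budget as a rough proxy for tokens; `trim_history` is a hypothetical helper. It never lets the history start with an `assistant` turn, because Mistral's instruct chat templates expect user/assistant turns to alternate starting with the user; it does not handle `system` messages.

```python
from typing import Dict, List


def trim_history(history: List[Dict[str, str]], max_chars: int = 6000) -> List[Dict[str, str]]:
    """Drop the oldest messages until total content fits in max_chars.

    Returns a new list (the input is not modified). The newest message is always kept.
    """
    trimmed = list(history)
    while len(trimmed) > 1 and sum(len(m["content"]) for m in trimmed) > max_chars:
        trimmed.pop(0)
    while len(trimmed) > 1 and trimmed[0]["role"] == "assistant":
        trimmed.pop(0)
    return trimmed


# Usage: send a trimmed copy but keep the full history locally
# response3 = multi_turn_conversation(trim_history(conversation))
```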

## Step 11: Use Case 6 - Structured Data Extraction

Extract structured information from unstructured text.

In [None]:
# Example: Extract structured data
unstructured_text = """
John Smith works as a Senior Data Scientist at TechCorp in Singapore. 
He can be reached at john.smith@techcorp.com or +65 9123 4567. 
He specializes in machine learning and has 8 years of experience.
"""

extraction_prompt = f"""
Extract the following information from the text and format as JSON:
- name
- job_title
- company
- location
- email
- phone
- specialization
- years_of_experience

Text:
{unstructured_text}

JSON:
"""

print("📊 Use Case: Structured Data Extraction\n")
print("=" * 80)
print(f"Input text:\n{unstructured_text}\n")

extracted_data = invoke_model(extraction_prompt, max_tokens=300, temperature=0.1)
print(f"Extracted JSON:\n{extracted_data}")
print("\n" + "=" * 80)
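
The model returns text, not a Python object, and even at low temperature it may wrap the JSON in a code fence or add a sentence around it. A minimal sketch of parsing that output defensively; `parse_json_response` is a hypothetical helper that takes the span from the first `{` to the last `}`, which assumes the response contains a single top-level JSON object.

```python
import json
from typing import Optional


def parse_json_response(text: str) -> Optional[dict]:
    """Parse the span from the first '{' to the last '}' as JSON; return None on failure."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


raw = 'Sure! Here is the JSON:\n{"name": "John Smith", "years_of_experience": 8}\nLet me know.'
record = parse_json_response(raw)
print(record["name"], record["years_of_experience"])  # John Smith 8

# Usage with the response above: data = parse_json_response(extracted_data)
```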

## Step 12: Advanced - Batch Processing

Process a list of prompts one at a time, timing each request.

In [None]:
# Example: Batch processing
import time

def batch_process(prompts, max_tokens=200, temperature=0.7):
    """
    Process multiple prompts sequentially, timing each request.
    
    Args:
        prompts: List of prompt strings
        max_tokens: Maximum tokens per response
        temperature: Sampling temperature
    
    Returns:
        List of responses
    """
    results = []
    
    for i, prompt in enumerate(prompts, 1):
        print(f"Processing {i}/{len(prompts)}...", end=" ")
        start = time.time()
        
        response = invoke_model(prompt, max_tokens, temperature)
        
        elapsed = time.time() - start
        print(f"Done ({elapsed:.2f}s)")
        
        results.append({
            "prompt": prompt,
            "response": response,
            "time": elapsed
        })
    
    return results

# Test batch processing
print("⚡ Advanced: Batch Processing\n")
print("=" * 80)

batch_prompts = [
    "What is machine learning in one sentence?",
    "What is deep learning in one sentence?",
    "What is natural language processing in one sentence?"
]

print(f"Processing {len(batch_prompts)} prompts...\n")
batch_results = batch_process(batch_prompts, max_tokens=100, temperature=0.5)

print("\nResults:")
for i, result in enumerate(batch_results, 1):
    print(f"\n{i}. {result['prompt']}")
    print(f"   Response: {result['response']}")
    print(f"   Time: {result['time']:.2f}s")

avg_time = sum(r['time'] for r in batch_results) / len(batch_results)
print(f"\nAverage response time: {avg_time:.2f}s")
print("\n" + "=" * 80)
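
`batch_process` sends one request at a time, so total time is roughly the sum of the individual latencies. If the serving container batches concurrent requests (TGI-based containers typically do), a thread pool can cut wall-clock time for independent prompts. A minimal sketch, assuming 4 workers (an arbitrary choice; too many concurrent requests can raise per-request latency or trigger throttling); `concurrent_process` is a hypothetical helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def concurrent_process(prompts: List[str], invoke_fn: Callable[[str], str],
                       max_workers: int = 4) -> List[Dict]:
    """Send prompts concurrently; results come back in the same order as the prompts."""
    def timed(prompt: str) -> Dict:
        start = time.time()
        response = invoke_fn(prompt)
        return {"prompt": prompt, "response": response, "time": time.time() - start}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(timed, prompts))  # map preserves input order


# Usage with the endpoint from Step 4:
# results = concurrent_process(batch_prompts, lambda p: invoke_model(p, max_tokens=100, temperature=0.5))
```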

## Step 13: Performance Monitoring

Inspect the endpoint's status and configuration, and estimate its running cost.

In [None]:
# Get endpoint information
import boto3

sagemaker_client = boto3.client('sagemaker', region_name=region)

print("📊 Endpoint Performance Monitoring\n")
print("=" * 80)

try:
    # Get endpoint details
    endpoint_desc = sagemaker_client.describe_endpoint(
        EndpointName=predictor.endpoint_name
    )
    
    print(f"Endpoint Name: {endpoint_desc['EndpointName']}")
    print(f"Status: {endpoint_desc['EndpointStatus']}")
    print(f"Creation Time: {endpoint_desc['CreationTime']}")
    print(f"Last Modified: {endpoint_desc['LastModifiedTime']}")
    
    # Get endpoint config
    config_desc = sagemaker_client.describe_endpoint_config(
        EndpointConfigName=endpoint_desc['EndpointConfigName']
    )
    
    variant = config_desc['ProductionVariants'][0]
    print(f"\nInstance Type: {variant['InstanceType']}")
    print(f"Instance Count: {variant['InitialInstanceCount']}")
    
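    # Estimate cost from approximate hourly on-demand prices (USD);
    # rates vary by region, so check current SageMaker pricing
    instance_costs = {
        'ml.g5.xlarge': 1.01,
        'ml.g5.2xlarge': 1.21,
        'ml.g5.4xlarge': 1.94,
        'ml.g5.12xlarge': 7.09
    }
    
    hourly_cost = instance_costs.get(variant['InstanceType'])
    if hourly_cost is None:
        print(f"\nNo price on file for {variant['InstanceType']}; add it to instance_costs.")
    else:
        # Every instance is billed while the endpoint runs
        hourly_cost *= variant['InitialInstanceCount']
        daily_cost = hourly_cost * 24
        monthly_cost = daily_cost * 30
        
        print(f"\nEstimated Costs (USD):")
        print(f"  Hourly: ${hourly_cost:.2f}")
        print(f"  Daily: ${daily_cost:.2f}")
        print(f"  Monthly: ${monthly_cost:.2f}")
    
    print(f"\n💡 Tip: Delete the endpoint when not in use to save costs!")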
    
except Exception as e:
    print(f"Error getting endpoint info: {e}")

print("\n" + "=" * 80)
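
The cell above reports configuration and an estimated cost, but not how the endpoint is performing. SageMaker publishes per-endpoint metrics such as `Invocations` and `ModelLatency` (in microseconds) to the `AWS/SageMaker` CloudWatch namespace, with `EndpointName` and `VariantName` dimensions. A minimal sketch, assuming the default variant name `AllTraffic` (the name shown in the deployment error above) and 5-minute periods; `latency_query` and `average_latency_ms` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def latency_query(endpoint_name: str, hours: int = 1, variant: str = "AllTraffic") -> dict:
    """Keyword arguments for CloudWatch get_metric_statistics (average ModelLatency)."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,
        "Statistics": ["Average"],
    }


def average_latency_ms(datapoints: List[dict]) -> Optional[float]:
    """Unweighted mean of per-period averages, converted from microseconds to ms."""
    if not datapoints:
        return None
    return sum(dp["Average"] for dp in datapoints) / len(datapoints) / 1000


# Usage (requires boto3 and AWS credentials):
# cloudwatch = boto3.client("cloudwatch", region_name=region)
# stats = cloudwatch.get_metric_statistics(**latency_query(predictor.endpoint_name))
# print("Average model latency (ms):", average_latency_ms(stats["Datapoints"]))
```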

## Step 14: Best Practices and Tips

Key recommendations for production use:

### 1. Temperature Settings
- **0.0-0.3**: Factual, deterministic tasks (Q&A, extraction)
- **0.4-0.7**: Balanced creativity (summarization, general chat)
- **0.8-1.0**: Creative tasks (brainstorming, storytelling)

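### 2. Token Management
- Set an appropriate `max_tokens`: shorter generations reduce latency and free the instance for more requests
- Real-time endpoints are billed per instance-hour, not per token, so token usage affects throughput rather than the bill directly
- Use shorter prompts when possible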

### 3. Error Handling
- Always wrap predictions in try-except blocks
- Implement retry logic for transient failures
- Log errors for debugging
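A minimal sketch of retry logic with exponential backoff and full jitter. Which errors count as transient is left to the caller as a predicate, for example one that checks a botocore `ClientError` code such as `ThrottlingException` (an illustrative choice; adjust it to the errors you actually see). `with_retries` is a hypothetical helper. Note that `invoke_model` catches exceptions and returns an error string, so retries need to wrap `predictor.predict` directly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], is_transient: Callable[[Exception], bool],
                 max_attempts: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying transient failures with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts or not is_transient(exc):
                raise
            # Wait a random time in [0, base_delay * 2**(attempt - 1)] seconds
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable: the loop always returns or raises")


# Usage sketch:
# from botocore.exceptions import ClientError
# transient = lambda e: isinstance(e, ClientError) and e.response["Error"]["Code"] == "ThrottlingException"
# response = with_retries(lambda: predictor.predict(payload), transient)
```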

### 4. Cost Optimization
- Delete endpoints when not in use
- Use auto-scaling for variable workloads
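- For sporadic traffic, consider Asynchronous Inference, which can scale down to zero instances (SageMaker Serverless Inference does not support GPUs, so it does not suit a GPU-hosted model like this one)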
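Auto-scaling for SageMaker endpoints goes through Application Auto Scaling: register the endpoint variant as a scalable target, then attach a target-tracking policy, commonly on the predefined `SageMakerVariantInvocationsPerInstance` metric. A minimal sketch that builds both requests, assuming the `AllTraffic` variant; the capacity bounds and the target of 10 invocations per instance are placeholders, not recommendations. `autoscaling_requests` is a hypothetical helper.

```python
def autoscaling_requests(endpoint_name: str, variant: str = "AllTraffic",
                         min_capacity: int = 1, max_capacity: int = 2,
                         invocations_per_instance: float = 10.0) -> tuple:
    """Return kwargs for register_scalable_target and put_scaling_policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return target, policy


# Usage (requires boto3, AWS credentials, and quota for max_capacity instances):
# aas = boto3.client("application-autoscaling", region_name=region)
# target, policy = autoscaling_requests(predictor.endpoint_name)
# aas.register_scalable_target(**target)
# aas.put_scaling_policy(**policy)
```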

### 5. Performance
- Batch similar requests together
- Use appropriate instance types for your workload
- Monitor CloudWatch metrics

### 6. Security
- Use VPC endpoints for private connectivity
- Enable encryption at rest and in transit
- Implement IAM policies for access control

## Step 15: Cleanup - Delete the Endpoint

**IMPORTANT**: Always delete your endpoint when finished to avoid ongoing charges.

This will:
1. Delete the endpoint
2. Delete the endpoint configuration
3. Optionally delete the model

⚠️ **Warning**: This action cannot be undone. You'll need to redeploy to use the model again.

In [None]:
# Cleanup function
def cleanup_endpoint(delete_model=False):
    """
    Delete the SageMaker endpoint and associated resources.
    
    Args:
        delete_model: If True, also delete the model
    """
    print("🧹 Cleaning up resources...\n")
    print("=" * 80)
    
    try:
        # Delete endpoint
        print(f"Deleting endpoint: {predictor.endpoint_name}")
        predictor.delete_endpoint(delete_endpoint_config=True)
        print("✅ Endpoint deleted successfully")
        
        # Optionally delete model
        if delete_model:
            print(f"\nDeleting model...")
            predictor.delete_model()
            print("✅ Model deleted successfully")
        
        print("\n" + "=" * 80)
        print("\n✅ Cleanup complete! You are no longer being charged.")
        
    except Exception as e:
        print(f"\n❌ Error during cleanup: {e}")
        print("\nYou may need to manually delete resources from the SageMaker console.")

# Uncomment the line below to delete the endpoint
# cleanup_endpoint(delete_model=True)

print("⚠️  Endpoint is still running!")
print("\nTo delete the endpoint and stop charges, uncomment and run:")
print("cleanup_endpoint(delete_model=True)")
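
After cleanup, you can confirm that the endpoint is really gone, since a leftover endpoint keeps billing. `describe_endpoint` raises a `ClientError` for an endpoint that does not exist. A minimal sketch; `endpoint_exists` is a hypothetical helper, and treating any `ValidationException` as "not found" is a simplifying assumption. It reads the error code from the exception's `response` attribute (as botocore's `ClientError` provides) so it can be exercised without botocore installed.

```python
def endpoint_exists(sagemaker_client, endpoint_name: str) -> bool:
    """Return True if describe_endpoint succeeds, False if SageMaker reports it missing."""
    try:
        sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
        return True
    except Exception as err:
        # botocore's ClientError carries the code in err.response["Error"]["Code"]
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "ValidationException":
            return False
        raise


# Usage with the client from Step 13:
# print("Endpoint still exists:", endpoint_exists(sagemaker_client, predictor.endpoint_name))
```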