# Fine-tuning Llama 3.2 3B Instruct Model with SageMaker JumpStart

This notebook demonstrates how to fine-tune the Llama 3.2 3B instruct model on Amazon SageMaker using JumpStart. We'll use a small JSONL dataset (~100 rows) and leverage SageMaker JumpStart's pre-built containers and scripts.

In [None]:
import json
import sagemaker
from sagemaker.jumpstart.estimator import JumpStartEstimator
from sagemaker import get_execution_role

## Setup SageMaker Session

In [None]:
# Set up SageMaker session and role
session = sagemaker.Session()
role = get_execution_role()

# Define S3 bucket for storing training data and model artifacts
bucket = session.default_bucket()
prefix = 'llama-3-2-finetuning'

print(f"SageMaker session is using bucket: {bucket}")

## Prepare and Format Training Data

The training script expects each example in a chat-style JSON layout, one JSON object per line. The function below converts a JSONL file of `prompt`/`response` pairs into a list of user and assistant turns per record.

In [None]:
# Sample data preparation
# Assuming you have a jsonl file with 'prompt' and 'response' fields
# You can replace this with loading your actual data

def format_data_for_llama(input_file, output_file):
    """Convert data to Llama 3.2 chat format"""
    formatted_data = []
    
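    with open(input_file, 'r', encoding='utf-8') as f:
        for line in f:
            # Skip blank lines, which would otherwise raise json.JSONDecodeError
            if not line.strip():
                continue
            example = json.loads(line)
            
            # Extract prompt and response (missing fields become empty strings)
            prompt = example.get('prompt', '')
            response = example.get('response', '')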
            
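            # Create conversation in Llama 3.2 format
            # NOTE: the top-level key used below must match what the JumpStart training
            # script expects for chat datasets; verify it against the model's fine-tuning docs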
            conversation = [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": response}
            ]
            
            formatted_data.append({"conversations": conversation})
    
    # Write formatted data to output file
    with open(output_file, 'w') as f:
        for item in formatted_data:
            f.write(json.dumps(item) + '\n')
    
    print(f"Formatted {len(formatted_data)} examples to {output_file}")
    return output_file
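
Because `format_data_for_llama` falls back to empty strings when a field is missing, a malformed input row silently becomes an empty turn. The sketch below is one way to check the formatted file before uploading it. The helper name `validate_chat_jsonl` and its `key` parameter are hypothetical and not part of SageMaker; the default key simply mirrors the one written by the function above.

```python
import json

def validate_chat_jsonl(path, key="conversations"):
    """Return a list of (line_number, problem) tuples for malformed records."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            turns = record.get(key) if isinstance(record, dict) else None
            if not isinstance(turns, list) or not turns:
                problems.append((line_number, f"missing or empty '{key}' list"))
                continue
            for turn in turns:
                # Each turn must be a dict with a known role and non-empty text
                if not isinstance(turn, dict) or turn.get("role") not in ("user", "assistant"):
                    problems.append((line_number, "turn has a missing or unknown role"))
                    break
                content = turn.get("content")
                if not isinstance(content, str) or not content.strip():
                    problems.append((line_number, "turn has empty content"))
                    break
    return problems
```

Calling `validate_chat_jsonl("formatted_data.jsonl")` returns an empty list when every record passes these checks.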

In [None]:
# For demonstration, let's create a simple sample dataset
# In practice, you would use your own dataset file

sample_data = [
    {"prompt": "What are the key features of your product?", "response": "Our product offers seamless integration, robust security, and an intuitive interface."},
    {"prompt": "How do I reset my password?", "response": "To reset your password, click on the 'Forgot Password' link on the login page and follow the instructions sent to your email."}
]

# Create a sample input file
input_file = "sample_data.jsonl"
with open(input_file, 'w') as f:
    for item in sample_data:
        f.write(json.dumps(item) + '\n')

# Format the data for Llama 3.2
formatted_file = "formatted_data.jsonl"
format_data_for_llama(input_file, formatted_file)

# Display the formatted data
with open(formatted_file, 'r') as f:
    for i, line in enumerate(f):
        print(f"Example {i+1}:")
        print(json.loads(line))
        print()

## Upload Training Data to S3

In [None]:
# Upload formatted data to S3
train_data_s3_path = session.upload_data(
    path=formatted_file,
    bucket=bucket,
    key_prefix=f"{prefix}/data"
)

print(f"Training data uploaded to: {train_data_s3_path}")

## Configure and Launch Fine-tuning Job with JumpStart

Now, let's use SageMaker JumpStart to fine-tune the Llama 3.2 3B model:

In [None]:
# Define model ID for Llama 3.2 3B Instruct in JumpStart
model_id = "meta-textgeneration-llama-3-2-3b-instruct"
model_version = "*"

# Configure hyperparameters for fine-tuning
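hyperparameters = {
    # With ~100 rows and an effective batch size of 8, 3 epochs is only about 39 optimizer steps.
    # If the training script follows Hugging Face Trainer semantics, a positive max_steps
    # overrides the epoch count, so 300 steps would mean many more passes over the data.
    "max_steps": "300",               # Upper bound on optimizer steps
    "epoch": "3",                     # 3 epochs for small dataset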
    "learning_rate": "2e-4",          # Slightly higher learning rate
    "per_device_train_batch_size": "4", # Batch size per GPU
    "gradient_accumulation_steps": "2", # Effective batch size = 8
    "max_seq_length": "2048",         # Max sequence length
    
    # LoRA parameters
    "lora_alpha": "32",               # LoRA scaling factor
    "lora_dropout": "0.05",           # Dropout probability
    "lora_r": "16",                   # LoRA rank
    
    # QLoRA parameters for efficient training
    "use_bnb_4bit": "True",            # Use 4-bit quantization
    "use_peft": "True",               # Use parameter-efficient fine-tuning
    
    # Training settings
    "save_strategy": "epoch",         # Save checkpoint every epoch
    "warmup_ratio": "0.03",           # Fraction of total steps used for learning rate warmup
    "hub_private_repo": "False",      # Don't use private repo
    "push_to_hub": "False",           # Don't push to HF Hub, so no hub_model_id or hub_token is set
}
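
The batch size comment above gives an effective batch size of 8 (4 per device times 2 accumulation steps on one GPU). The same arithmetic can be extended to estimate how many optimizer steps a run will take, which helps when choosing `max_steps`. This is a rough sketch: the helper `estimate_training_steps` is hypothetical, the 100-row figure comes from the dataset size mentioned in the introduction, and real trainers may round a final partial batch differently or pack sequences, so treat the result as an approximation.

```python
import math

def estimate_training_steps(num_examples, per_device_batch_size,
                            grad_accum_steps, epochs, num_gpus=1):
    """Approximate optimizer steps, rounding a partial final batch up."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_gpus
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs

# ~100 rows, batch size 4, 2 accumulation steps, 3 epochs, 1 GPU
effective_batch, steps_per_epoch, total_steps = estimate_training_steps(100, 4, 2, 3)
print(effective_batch, steps_per_epoch, total_steps)  # 8 13 39
```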

In [None]:
# Create JumpStart estimator for fine-tuning
estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    role=role,
    instance_count=1,
    instance_type="ml.g5.2xlarge",  # 1 NVIDIA A10G GPU
    hyperparameters=hyperparameters,
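    # Meta Llama models in JumpStart require explicit EULA acceptance for training;
    # review the license terms before setting this to "true".
    environment={"accept_eula": "true", "SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600"},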
    disable_output_compression=True
)

In [None]:
# Start the fine-tuning job
estimator.fit(
    train_data_s3_path,
    logs=True,
    wait=True
)

## Deploy the Fine-tuned Model

After training completes, you can deploy the fine-tuned model as a SageMaker endpoint:

In [None]:
# Deploy the model as an endpoint
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    container_startup_health_check_timeout=600  # Longer timeout for model loading
)

## Test the Fine-tuned Model

In [None]:
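# Create a test function
# NOTE: the request schema depends on the serving container behind the endpoint;
# if this payload is rejected, check the example payloads on the model's JumpStart card.
def test_model(predictor, prompt):
    payload = {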
        "inputs": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "parameters": {
            "max_new_tokens": 512,
            "top_p": 0.9,
            "temperature": 0.7,
            "return_full_text": False
        }
    }
    
    response = predictor.predict(payload)
    return response

In [None]:
# Test with sample prompts
test_prompt = "What are the key features of your product?"
response = test_model(predictor, test_prompt)
print(f"Prompt: {test_prompt}")
print(f"Response: {response}")
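
The cell above prints the raw response object. If only the generated text is needed, a small helper can pull it out. This sketch assumes the endpoint returns text-generation style JSON, either a dict with a `generated_text` field or a one-element list of such dicts; that shape is an assumption about the serving container, and `extract_generated_text` is a hypothetical helper, not a SageMaker API.

```python
def extract_generated_text(response):
    """Best-effort extraction of generated text from a decoded JSON response.

    Returns None when the response does not match the assumed shapes.
    """
    # Some containers wrap the result in a one-element list
    if isinstance(response, list) and response:
        response = response[0]
    if isinstance(response, dict):
        text = response.get("generated_text")
        if isinstance(text, str):
            return text
    return None
```

For example, `extract_generated_text(response) or response` falls back to the raw object when the shape is not recognized.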

## Python Script Version

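Below is a standalone Python script version of the same workflow; the `%%writefile` magic saves it to `finetune_llama_jumpstart.py`. Because the script calls `get_execution_role()`, run it where a SageMaker execution role can be resolved (for example, a SageMaker notebook instance or Studio). Elsewhere, pass an IAM role ARN to the estimator instead.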

In [None]:
%%writefile finetune_llama_jumpstart.py
import json
import argparse
import sagemaker
from sagemaker.jumpstart.estimator import JumpStartEstimator
from sagemaker import get_execution_role

def format_data_for_llama(input_file, output_file):
    """Convert data to Llama 3.2 chat format"""
    formatted_data = []
    
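    with open(input_file, 'r', encoding='utf-8') as f:
        for line in f:
            # Skip blank lines, which would otherwise raise json.JSONDecodeError
            if not line.strip():
                continue
            example = json.loads(line)
            
            # Extract prompt and response (missing fields become empty strings)
            prompt = example.get('prompt', '')
            response = example.get('response', '')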
            
            # Create conversation in Llama 3.2 format
            conversation = [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": response}
            ]
            
            formatted_data.append({"conversations": conversation})
    
    # Write formatted data to output file
    with open(output_file, 'w') as f:
        for item in formatted_data:
            f.write(json.dumps(item) + '\n')
    
    print(f"Formatted {len(formatted_data)} examples to {output_file}")
    return output_file

def main():
    parser = argparse.ArgumentParser(description="Fine-tune Llama 3.2 3B using SageMaker JumpStart")
    parser.add_argument("--input-file", type=str, required=True, help="Path to input JSONL file")
    parser.add_argument("--s3-bucket", type=str, help="S3 bucket name")
    parser.add_argument("--s3-prefix", type=str, default="llama-3-2-finetuning", help="S3 prefix for data and model")
    parser.add_argument("--instance-type", type=str, default="ml.g5.2xlarge", help="Training instance type")
    parser.add_argument("--epochs", type=int, default=3, help="Number of training epochs")
    parser.add_argument("--learning-rate", type=float, default=2e-4, help="Learning rate")
    parser.add_argument("--batch-size", type=int, default=4, help="Batch size per device")
    parser.add_argument("--deploy", action="store_true", help="Deploy model after training")
    parser.add_argument("--deploy-instance-type", type=str, default="ml.g5.xlarge", help="Deployment instance type")
    
    args = parser.parse_args()
    
    # Initialize SageMaker session
    session = sagemaker.Session()
    role = get_execution_role()
    
    # Set S3 bucket
    bucket = args.s3_bucket if args.s3_bucket else session.default_bucket()
    prefix = args.s3_prefix
    
    print(f"Using S3 bucket: {bucket}")
    
    # Format data for Llama 3.2
    formatted_file = "formatted_data.jsonl"
    format_data_for_llama(args.input_file, formatted_file)
    
    # Upload data to S3
    train_data_s3_path = session.upload_data(
        path=formatted_file,
        bucket=bucket,
        key_prefix=f"{prefix}/data"
    )
    
    print(f"Training data uploaded to: {train_data_s3_path}")
    
    # Define model ID for Llama 3.2 3B
    model_id = "meta-textgeneration-llama-3-2-3b-instruct"
    model_version = "*"
    
    # Configure hyperparameters
    hyperparameters = {
        "epoch": str(args.epochs),
        "learning_rate": str(args.learning_rate),
        "per_device_train_batch_size": str(args.batch_size),
        "gradient_accumulation_steps": "2",
        "max_seq_length": "2048",
        
        # LoRA parameters
        "lora_alpha": "32",
        "lora_dropout": "0.05",
        "lora_r": "16",
        
        # QLoRA and PEFT settings
        "use_bnb_4bit": "True",
        "use_peft": "True",
        
        # Training settings
        "save_strategy": "epoch",
        "warmup_ratio": "0.03",
        "push_to_hub": "False",
    }
    
    # Create JumpStart estimator
    estimator = JumpStartEstimator(
        model_id=model_id,
        model_version=model_version,
        role=role,
        instance_count=1,
        instance_type=args.instance_type,
        hyperparameters=hyperparameters,
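        # Meta Llama models in JumpStart require explicit EULA acceptance for training;
        # review the license terms before setting this to "true".
        environment={"accept_eula": "true", "SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600"},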
        disable_output_compression=True
    )
    
    # Start training
    print("Starting fine-tuning job...")
    estimator.fit(
        train_data_s3_path,
        logs=True,
        wait=True
    )
    
    print("Training complete!")
    print(f"Model artifacts saved to: {estimator.model_data}")
    
    # Deploy model if requested
    if args.deploy:
        print(f"Deploying model to endpoint using instance type: {args.deploy_instance_type}")
        predictor = estimator.deploy(
            initial_instance_count=1,
            instance_type=args.deploy_instance_type,
            container_startup_health_check_timeout=600
        )
        
        endpoint_name = predictor.endpoint_name
        print(f"Model deployed successfully to endpoint: {endpoint_name}")
        
        # Save endpoint name to file for future reference
        with open("endpoint_info.json", "w") as f:
            json.dump({"endpoint_name": endpoint_name}, f)

if __name__ == "__main__":
    main()

## Script Usage Example

You can run the script with the following command:

```bash
python finetune_llama_jumpstart.py \
    --input-file your_data.jsonl \
    --s3-bucket your-bucket-name \
    --epochs 3 \
    --learning-rate 2e-4 \
    --batch-size 4 \
    --deploy
```
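
When run with `--deploy`, the script writes the endpoint name to `endpoint_info.json`. A later session can use that file to reconnect to the endpoint without retraining. The sketch below assumes the file layout written by the script and uses the SageMaker SDK's generic `Predictor` with JSON serialization; the helper names `load_endpoint_name` and `connect_to_endpoint` are hypothetical.

```python
import json

def load_endpoint_name(path="endpoint_info.json"):
    """Read the endpoint name saved by finetune_llama_jumpstart.py."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)["endpoint_name"]

def connect_to_endpoint(endpoint_name):
    """Build a predictor for an existing endpoint (requires AWS credentials)."""
    # Imported lazily so load_endpoint_name works without the SDK installed
    from sagemaker.predictor import Predictor
    from sagemaker.serializers import JSONSerializer
    from sagemaker.deserializers import JSONDeserializer

    return Predictor(
        endpoint_name=endpoint_name,
        serializer=JSONSerializer(),
        deserializer=JSONDeserializer(),
    )
```

A typical use would be `predictor = connect_to_endpoint(load_endpoint_name())`, after which `predictor.predict(payload)` and `predictor.delete_endpoint()` behave as they do in the notebook.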

## Clean Up Resources

Remember to clean up resources when you're done to avoid unnecessary charges.

In [None]:
# Delete endpoint if it was created
try:
    predictor.delete_endpoint(delete_endpoint_config=True)
    print("Endpoint deleted successfully.")
except NameError:
    print("No endpoint to delete.")