# SageMaker V3 JumpStart Training Example

This notebook demonstrates how to use SageMaker V3 ModelTrainer with JumpStart models for easy model training and fine-tuning.

### Prerequisites
Note: Ensure you have sagemaker-train and ipywidgets installed in your environment. The ipywidgets package is required to monitor training job progress in Jupyter notebooks.

In [None]:
# Import required libraries
import uuid

from sagemaker.train.model_trainer import ModelTrainer
from sagemaker.core.jumpstart import JumpStartConfig
from sagemaker.core.helper.session_helper import Session, get_execution_role

## Step 1: Setup Session and Configuration

Initialize the SageMaker session and define our training configuration.

In [None]:
# Initialize SageMaker session
sagemaker_session = Session()
role = get_execution_role()

# Configuration
JOB_NAME_PREFIX = "js-v3-training-example"

# Generate unique identifier
unique_id = str(uuid.uuid4())[:8]
base_job_name = f"{JOB_NAME_PREFIX}-{unique_id}"

print(f"Base job name: {base_job_name}")
print(f"SageMaker execution role: {role}")
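The `CreateTrainingJob` API limits training job names to 63 characters and requires them to match `[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}`. A minimal sketch of a local check for the base name follows. The helper `check_base_job_name` and the `SUFFIX_HEADROOM` value are hypothetical: the sketch assumes the SDK may append a suffix (such as a timestamp) to the base name, and the reserved length is a guess rather than a documented figure.

```python
import re
import uuid

# Pattern and 63-character limit as documented for the TrainingJobName
# field of the CreateTrainingJob API.
JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_JOB_NAME_LENGTH = 63

# Assumption: leave room for a suffix the SDK may append to the base name.
# The value 20 is illustrative, not taken from the SDK.
SUFFIX_HEADROOM = 20


def check_base_job_name(name: str, headroom: int = SUFFIX_HEADROOM) -> bool:
    """Return True if `name` is a valid job name with room left for a suffix."""
    if len(name) + headroom > MAX_JOB_NAME_LENGTH:
        return False
    return JOB_NAME_PATTERN.fullmatch(name) is not None


# Same shape as the names built in this notebook: prefix, short id, model tag.
candidate = f"js-v3-training-example-{uuid.uuid4().hex[:8]}-catboost"
print(candidate, check_base_job_name(candidate))
```

Running a check like this before creating trainers surfaces naming problems locally instead of at job submission.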

## Step 2: Train HuggingFace BERT Model

Fine-tune a HuggingFace BERT model (`bert-base-cased`) for sentence-pair text classification using JumpStart.

In [None]:
# Configure JumpStart for HuggingFace BERT
bert_jumpstart_config = JumpStartConfig(
    model_id="huggingface-spc-bert-base-cased",
    accept_eula=False  # This model doesn't require EULA acceptance
)

# Create ModelTrainer from JumpStart config
bert_trainer = ModelTrainer.from_jumpstart_config(
    jumpstart_config=bert_jumpstart_config,
    base_job_name=f"{base_job_name}-bert",
    hyperparameters={
        "epochs": 1,  # Set to 1 for quick demonstration
        "learning_rate": 5e-5,
        "train_batch_size": 32
    },
    sagemaker_session=sagemaker_session
)

print("BERT ModelTrainer created successfully from JumpStart config!")
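The `hyperparameters` argument above sets only three values. The idea of overriding a few values while keeping the rest of the defaults can be sketched in plain Python. This is not the SDK's internal logic: the default values below are hypothetical, `merge_hyperparameters` is an illustrative helper, and rejecting unknown keys is a design choice made here to catch typos early.

```python
# Hypothetical defaults for illustration only; the real defaults come from
# the JumpStart model's metadata and may differ.
HYPOTHETICAL_DEFAULTS = {
    "epochs": 3,
    "learning_rate": 2e-5,
    "train_batch_size": 8,
    "warmup_steps": 0,
}


def merge_hyperparameters(defaults: dict, overrides: dict) -> dict:
    """Return a new dict where overrides win and unspecified defaults are kept."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Flag typos such as "learning-rate" locally, before a job is started.
        raise KeyError(f"Unknown hyperparameters: {sorted(unknown)}")
    return {**defaults, **overrides}


merged = merge_hyperparameters(
    HYPOTHETICAL_DEFAULTS,
    {"epochs": 1, "learning_rate": 5e-5, "train_batch_size": 32},
)
print(merged)
```

The three overridden keys take the new values, while `warmup_steps` keeps its default.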

In [None]:
# Start BERT training
print("Starting BERT training job...")
print("Note: This will use the default JumpStart dataset and may take 10-15 minutes.")

bert_trainer.train()
print("BERT training job completed!")
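The cell above prints a completion message right after `train()` returns. A small wrapper can also report elapsed time and make failures visible. The sketch below assumes that `train()` blocks until the job ends and raises an exception on failure. `run_training` and `_FakeTrainer` are hypothetical names, and the stand-in class exists only so the example runs without AWS access.

```python
import time


def run_training(trainer, label: str) -> float:
    """Call trainer.train() and return elapsed seconds; re-raise on failure."""
    start = time.monotonic()
    try:
        trainer.train()
    except Exception as exc:
        elapsed = time.monotonic() - start
        print(f"{label} training failed after {elapsed:.0f}s: {exc}")
        raise
    elapsed = time.monotonic() - start
    print(f"{label} training finished in {elapsed:.0f}s")
    return elapsed


class _FakeTrainer:
    """Stand-in for a ModelTrainer so the sketch runs offline."""

    def train(self):
        time.sleep(0.01)


# In the notebook, bert_trainer would be passed instead of the stand-in.
elapsed_seconds = run_training(_FakeTrainer(), "BERT")
```

Re-raising after logging keeps the notebook from moving on to later cells as if the job had succeeded.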

## Step 3: Train XGBoost Classification Model

Train an XGBoost model for classification tasks using JumpStart.

In [None]:
# Configure JumpStart for XGBoost
xgboost_jumpstart_config = JumpStartConfig(
    model_id="xgboost-classification-model"
)

# Create ModelTrainer from JumpStart config
xgboost_trainer = ModelTrainer.from_jumpstart_config(
    jumpstart_config=xgboost_jumpstart_config,
    base_job_name=f"{base_job_name}-xgboost",
    hyperparameters={
        "num_round": 10,  # Reduced for quick demonstration
        "max_depth": 5,
        "eta": 0.2,
        "objective": "binary:logistic"
    },
    sagemaker_session=sagemaker_session
)

print("XGBoost ModelTrainer created successfully from JumpStart config!")

In [None]:
# Start XGBoost training
print("Starting XGBoost training job...")
print("Note: This will use the default JumpStart dataset and should complete in 5-10 minutes.")

xgboost_trainer.train()
print("XGBoost training job completed!")

## Step 4: Train CatBoost Regression Model

Train a CatBoost model for regression tasks using JumpStart.

In [None]:
# Configure JumpStart for CatBoost
catboost_jumpstart_config = JumpStartConfig(
    model_id="catboost-regression-model"
)

# Create ModelTrainer from JumpStart config
catboost_trainer = ModelTrainer.from_jumpstart_config(
    jumpstart_config=catboost_jumpstart_config,
    base_job_name=f"{base_job_name}-catboost",
    hyperparameters={
        "iterations": 50,  # Reduced for quick demonstration
        "learning_rate": 0.1,
        "depth": 6,
        "loss_function": "RMSE"
    },
    sagemaker_session=sagemaker_session
)

print("CatBoost ModelTrainer created successfully from JumpStart config!")

In [None]:
# Start CatBoost training
print("Starting CatBoost training job...")
print("Note: This will use the default JumpStart dataset and should complete in 5-10 minutes.")

catboost_trainer.train()
print("CatBoost training job completed!")

## Step 5: Review Training Results

Check the status and results of our training jobs.

In [None]:
# Display training job information
training_jobs = [
    ("BERT", bert_trainer),
    ("XGBoost", xgboost_trainer),
    ("CatBoost", catboost_trainer)
]

print("Training Job Summary:")
print("=" * 50)

for model_name, trainer in training_jobs:
    # _latest_training_job is a private attribute and may change between SDK releases
    job = trainer._latest_training_job

    print(f"\n{model_name} Model:")
    print(f"  Job Name: {job.training_job_name}")
    print(f"  Model Artifacts: {job.model_artifacts}")
    # Report the status recorded on the job object instead of a hard-coded value
    print(f"  Status: {job.training_job_status}")
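The summary loop reads a private attribute, so it fails with an `AttributeError` if a trainer has not run yet or if the attribute is renamed. A more defensive variant can be sketched as follows. `summarize_job` is a hypothetical helper, the `training_job_status` field is an assumption about the job object, and `SimpleNamespace` objects stand in for real trainers so the example runs offline.

```python
from types import SimpleNamespace


def summarize_job(model_name: str, trainer) -> dict:
    """Collect job details, tolerating a trainer that has not run yet."""
    job = getattr(trainer, "_latest_training_job", None)
    if job is None:
        return {
            "model": model_name,
            "job_name": None,
            "status": "NotStarted",
            "artifacts": None,
        }
    return {
        "model": model_name,
        "job_name": getattr(job, "training_job_name", None),
        # Assumption: the job object exposes training_job_status.
        "status": getattr(job, "training_job_status", "Unknown"),
        "artifacts": getattr(job, "model_artifacts", None),
    }


# Stand-ins for trainers: one finished job and one that never started.
finished = SimpleNamespace(
    _latest_training_job=SimpleNamespace(
        training_job_name="example-job",
        training_job_status="Completed",
        model_artifacts="s3://example-bucket/output/model.tar.gz",
    )
)
not_started = SimpleNamespace()

summaries = [
    summarize_job("XGBoost", finished),
    summarize_job("CatBoost", not_started),
]
for summary in summaries:
    print(summary)
```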

## Step 6: Access Training Metrics (Optional)

View training metrics and logs from CloudWatch.

In [None]:
# Example: Access training job details
print("Training Job Details:")
print("\nTo view detailed training metrics and logs:")
print("1. Go to the SageMaker Console")
print("2. Navigate to 'Training' > 'Training jobs'")
print("3. Search for jobs with prefix:", base_job_name)
print("4. Click on any job to view metrics, logs, and model artifacts")

# Job names can also be read programmatically and used to look up each job
print("\nJob names for programmatic lookup:")
for model_name, trainer in training_jobs:
    print(f"{model_name}: {trainer._latest_training_job.training_job_name}")

## Summary

This notebook demonstrated:
1. Creating ModelTrainer instances from JumpStart configurations
2. Training multiple model types (BERT, XGBoost, CatBoost) with custom hyperparameters
3. Using JumpStart's built-in datasets and training scripts
4. Monitoring training job progress and results

## Benefits of JumpStart Training
- **Pre-configured models**: No need to write training scripts or handle data preprocessing
- **Best practices**: Optimized hyperparameters and training configurations
- **Multiple frameworks**: Support for HuggingFace, XGBoost, CatBoost, and more
- **Easy customization**: Override hyperparameters while keeping proven defaults
- **Built-in datasets**: Start training immediately with curated datasets

## Next Steps
- Deploy trained models using SageMaker V3 ModelBuilder
- Fine-tune models with your own datasets
- Experiment with different hyperparameters
- Set up automated training pipelines
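For the hyperparameter experiments mentioned in the list above, one simple starting point is to enumerate every combination of a small grid and pass each resulting dict as the `hyperparameters` argument of a new trainer. The sketch below only builds the dicts. `expand_grid` is a hypothetical helper, and the grid values are illustrative choices around the XGBoost settings used earlier, not recommendations.

```python
from itertools import product


def expand_grid(grid: dict) -> list:
    """Return one hyperparameter dict per combination of the listed values."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Illustrative grid; each key maps to the values to try.
xgboost_grid = {"max_depth": [3, 5], "eta": [0.1, 0.2], "num_round": [10]}
candidates = expand_grid(xgboost_grid)

for params in candidates:
    print(params)
```

Each combination would start a separate training job, so the grid is best kept small to limit cost.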

JumpStart training with V3 ModelTrainer keeps initial setup short while leaving room to customize hyperparameters and datasets as needed.