# Experiment Management with Comet ML and Amazon SageMaker
## Example: Fraud Detection

This notebook demonstrates how to build, track, and evaluate machine learning 
models for credit card fraud detection using Comet ML's experiment tracking 
platform integrated with Amazon SageMaker's managed ML services.

What you'll learn:
- Setting up Comet ML for experiment tracking in SageMaker
- Tracking datasets with Comet Artifacts
- Logging a SageMaker training job with Comet
- Managing experiments end to end with Comet


## Setup
### Environment Setup and Imports

In [None]:
# Install required packages (python-dotenv provides load_dotenv, imported below)
!pip install comet_ml python-dotenv --upgrade --quiet

# Core imports
import os
import pandas as pd
import numpy as np
from dotenv import load_dotenv

# AWS and ML libraries
import sagemaker
import boto3
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.serializers import CSVSerializer
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

# Comet ML for experiment tracking
import comet_ml
from comet_ml import Experiment, API, Artifact
from comet_ml.integration.sagemaker import log_sagemaker_training_job_v1

# Scikit-learn for evaluation
from sklearn.metrics import (confusion_matrix, accuracy_score, f1_score, 
                           precision_score, recall_score, roc_auc_score, 
                           precision_recall_curve, roc_curve)


### Configuration and Authentication
**Make sure the following environment variables are set:**
- COMET_API_KEY
- AWS_PARTNER_APP_ARN
- AWS_PARTNER_APP_AUTH='true'
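
A quick check before running the rest of the notebook can catch a missing variable early. The sketch below is only an illustration: `missing_env_vars` is a hypothetical helper, not part of Comet or SageMaker, and it only tests that the variables listed above are non-empty.

```python
import os

# Variable names taken from the list above.
REQUIRED_ENV_VARS = ("COMET_API_KEY", "AWS_PARTNER_APP_ARN", "AWS_PARTNER_APP_AUTH")


def missing_env_vars(required=REQUIRED_ENV_VARS, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```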

In [None]:
# # Set Environment Variables
# os.environ['AWS_PARTNER_APP_AUTH'] = 'true'
# os.environ['AWS_PARTNER_APP_ARN'] = 'Your AWS PARTNER APP ARN'
# os.environ['COMET_API_KEY'] = ''

# Load environment variables
load_dotenv()

# Comet ML configuration
COMET_WORKSPACE = 'your-workspace'  # Replace with your Comet workspace
COMET_PROJECT_NAME = 'your-project'  # Replace with your Comet project name
comet_api = API()

# SageMaker configuration
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket_name = 'your-sagemaker-bucket'  # Replace with your bucket, or use sagemaker_session.default_bucket()
s3 = boto3.client('s3')

input_data_prefix = 'fraud-detection-demo/datasets'
processed_data_prefix = 'fraud-detection-demo/datasets/processed'
model_output_prefix = 'fraud-detection-demo/models'  # No trailing slash: paths below append '/<n>'

## Data Preparation and Tracking

### Data Loading and Exploration
We'll be working with the credit card fraud dataset.
Download from: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud


In [None]:
# Load the dataset
file_path = 'creditcard.csv'
df = pd.read_csv(file_path)

# Analyze the dataset structure
print("\n Dataset Overview:")
print(f"   Dataset shape: {df.shape}")
print(f"   Columns: {list(df.columns)}")
print(f"   Data types: {df.dtypes.value_counts().to_dict()}")
print(f"   Missing values: {df.isnull().sum().sum()}")

# Analyze class distribution - this is crucial for fraud detection!
class_dist = df['Class'].value_counts()
fraud_percentage = df['Class'].mean() * 100

print(f"\n Class Distribution Analysis:")
print(f"   Normal transactions: {class_dist[0]:,} ({100-fraud_percentage:.2f}%)")
print(f"   Fraudulent transactions: {class_dist[1]:,} ({fraud_percentage:.2f}%)")
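
To see why the class distribution matters, consider a model that labels every transaction as normal. The sketch below computes that baseline's accuracy from class counts. The counts are illustrative placeholders, not values from the dataset, and `majority_baseline_accuracy` is a hypothetical helper.

```python
def majority_baseline_accuracy(n_negative, n_positive):
    """Accuracy of a classifier that always predicts the majority class."""
    total = n_negative + n_positive
    if total == 0:
        raise ValueError("at least one sample is required")
    return max(n_negative, n_positive) / total


# Illustrative counts only (not taken from the dataset): 1 fraud per 1,000 rows.
baseline = majority_baseline_accuracy(n_negative=9_990, n_positive=10)
print(f"Always-normal baseline accuracy: {baseline:.4f}")
```

A model has to beat this number to be useful at all, which is why the evaluation later in the notebook also reports precision, recall, and ROC-AUC rather than accuracy alone.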

### Upload file to S3

Now, let us upload the CSV to S3.

In [None]:
s3_key = 'creditcard.csv'

# S3 keys always use '/', so build the key with an f-string rather than os.path.join
s3.upload_file(file_path, bucket_name, f"{input_data_prefix}/{s3_key}")
# The S3 path is constructed from the bucket and key
s3_data_path = f"s3://{bucket_name}/{input_data_prefix}/{s3_key}"
print(f"File uploaded to {s3_data_path}")
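
S3 keys are plain strings, so a stray leading or trailing slash in a prefix produces a different key (for example `models//1` instead of `models/1`). One way to guard against this is a small join helper. The sketch below is an illustration only: `s3_key` and `s3_uri` are hypothetical helpers, not part of boto3 or the SageMaker SDK.

```python
def s3_key(*parts):
    """Join key segments with single slashes, ignoring empty segments."""
    cleaned = [p.strip("/") for p in parts if p and p.strip("/")]
    return "/".join(cleaned)


def s3_uri(bucket, *parts):
    """Build an s3:// URI from a bucket name and key segments."""
    return f"s3://{bucket}/{s3_key(*parts)}"


print(s3_uri("your-sagemaker-bucket", "fraud-detection-demo/datasets", "creditcard.csv"))
```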

### Dataset Artifact Creation
We'll create a Comet Dataset Artifact to begin tracking and versioning our dataset. The artifact can then be linked to any experiment we start in Comet. We can update it, add assets, or create a new version at any time.

In [None]:
# Create a Comet Artifact to track our raw dataset
dataset_artifact = Artifact(
    name="fraud-dataset",
    artifact_type="dataset",
    aliases=["raw"]
)

# Add the raw dataset file to the artifact
dataset_artifact.add_remote(s3_data_path, metadata={
    "dataset_stage": "raw", 
    "dataset_split": "not_split", 
    "preprocessing": "none"
})

## Initialize an Experiment

To begin tracking our work, we'll create a Comet Experiment. As soon as we initialize the experiment, Comet gets to work, already tracking our code, installed libraries, and other metadata in the background. To begin tracking data lineage for this experiment, we simply need to log the dataset artifact we just created to the experiment.

In [None]:
# Create a new Comet experiment
experiment_1 = comet_ml.Experiment(
    project_name=COMET_PROJECT_NAME,
    workspace=COMET_WORKSPACE,
)

# Log the dataset artifact to this experiment for lineage tracking
experiment_1.log_artifact(dataset_artifact)

## Data Preprocessing

The code for data processing can be found in `preprocess.py`.

In [None]:
!pygmentize preprocess.py
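
The contents of `preprocess.py` are not reproduced here. Later cells read `test_data.csv` without a header and take column 0 as the label, which matches the CSV layout the built-in SageMaker XGBoost algorithm expects for training data: no header row, with the target in the first column. The sketch below shows one way to produce that layout. `to_label_first_rows` is a hypothetical helper and is not taken from `preprocess.py`; the sample rows are made up.

```python
import csv
import io


def to_label_first_rows(rows, header, label="Class", drop=("Time",)):
    """Reorder each row so the label comes first and dropped columns are removed."""
    label_idx = header.index(label)
    keep = [i for i, name in enumerate(header) if name != label and name not in drop]
    return [[row[label_idx]] + [row[i] for i in keep] for row in rows]


# Tiny illustrative sample, not real data.
header = ["Time", "V1", "Amount", "Class"]
rows = [[0, -1.36, 149.62, 0], [1, 1.19, 2.69, 1]]

# Write without a header row, as the training channel expects.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(to_label_first_rows(rows, header))
print(buffer.getvalue())
```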

### Run the processing job

Now, let us run the data processing as a SageMaker Processing job.

In [None]:
# Run SageMaker processing job
processor = SKLearnProcessor(
    framework_version='1.0-1',
    role=role,  # Execution role defined in the configuration cell
    instance_count=1,
    instance_type='ml.t3.medium'
)

processor.run(
    code='preprocess.py',
    inputs=[ProcessingInput(source=s3_data_path, destination='/opt/ml/processing/input')],
    outputs=[ProcessingOutput(source='/opt/ml/processing/output', destination=f's3://{bucket_name}/{processed_data_prefix}')]
)
# run() blocks until the job finishes by default, so the job is done at this point
print('Processing job completed')

### Log Preprocessed Dataset
Now that we've modified our dataset, we'll log the preprocessed version to track lineage. We'll create a new dataset artifact for the preprocessed data with the same name as our original dataset artifact, so Comet will track it as a new version of the dataset. Next, we will **log our S3 dataset path as a remote asset**. By default, Comet links to and tracks all files under this path, so we don't need to log each file separately.

Once we've logged the dataset to our first experiment, we'll be able to access all of these assets through Comet in the future.

In [None]:
# Create an updated version of the 'fraud-dataset' Artifact for the preprocessed data
preprocessed_dataset_artifact = Artifact(
    name="fraud-dataset",
    artifact_type="dataset",
    aliases=["preprocessed"],
    metadata={
        "description": "Credit card fraud detection dataset",
        "source": "Kaggle - Credit Card Fraud Detection",
        "data_card": "https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud",
        "total_samples": len(df),
        "fraud_samples": int(df['Class'].sum()),
        "fraud_percentage": f"{fraud_percentage:.3f}%",
        "dataset_stage": "preprocessed",
        "preprocessing": "StandardScaler + train/val/test split",
        # Label first, then features; drop 'Class' here so it is not listed twice
        "columns": ['Class'] + list(df.columns.drop(['Time', 'Class'])),
        "feature_columns": list(df.columns.drop(['Time', 'Class'])),
        "target": "Class"
    }
)

# Add our train, validation, and test dataset files as remote assets 
preprocessed_dataset_artifact.add_remote(
    uri=f's3://{bucket_name}/{processed_data_prefix}',
    logical_path='split_data'
)

# Add our preprocessing code as an asset to be uploaded to Comet
preprocessed_dataset_artifact.add("preprocess.py")

# Log the updated dataset to the experiment to track the updates
experiment_1.log_artifact(preprocessed_dataset_artifact)

## Define common functions to be used across experiments

Each experiment consists of the following steps:
1. Training the model
2. Logging the training job and model artifact
3. Logging the model metrics
4. Deploying and evaluating the model, and logging the evaluation metrics

We create a function for each step.

### Function for Training the model
This function runs a standard SageMaker training job; no additional logging code is needed.

In [None]:
def train(
    model_output_path,
    execution_role,
    sagemaker_session_obj,
    hyperparameters_dict,
    train_channel_loc,
    val_channel_loc
):
    """
    Train an XGBoost model using SageMaker.

    Args:
        model_output_path (str): Path where the trained model will be saved
        execution_role (str): IAM role for SageMaker execution
        sagemaker_session_obj: SageMaker session object
        hyperparameters_dict (dict): Dictionary of hyperparameters
        train_channel_loc (str): Location of training data
        val_channel_loc (str): Location of validation data

    Returns:
        Estimator: Trained XGBoost estimator
    """
    # Get XGBoost container image
    xgboost_image = sagemaker.image_uris.retrieve(
        "xgboost",
        sagemaker_session_obj.boto_region_name,  # Use the session passed in, not the global
        version='1.5-1'
    )
    print(f"Using XGBoost image: {xgboost_image}")
    print(f"Model output location: {model_output_path}")

    # Create SageMaker estimator
    estimator = Estimator(
        image_uri=xgboost_image,
        role=execution_role,
        instance_count=1,
        instance_type='ml.m5.large',
        output_path=model_output_path,
        sagemaker_session=sagemaker_session_obj,
        hyperparameters=hyperparameters_dict,
        max_run=1800  # Maximum training time in seconds
    )

    # Set up data channels for SageMaker
    train_channel = TrainingInput(
        train_channel_loc,
        content_type='text/csv'
    )
    val_channel = TrainingInput(
        val_channel_loc,
        content_type='text/csv'
    )

    # Start training
    estimator.fit({
        'train': train_channel,
        'validation': val_channel
    })

    return estimator

### Function to Log the SageMaker Training Job with Comet
Once training is complete, we'll use this function to log the training job to Comet. 

In [None]:
def log_training_job(experiment_key, training_estimator):
    """
    Log SageMaker training job details to Comet.

    Args:
        experiment_key: Key identifier for the experiment
        training_estimator: SageMaker estimator object
    """
    # Get the API experiment object associated with our current experiment
    api_experiment = comet_api.get_experiment(
        COMET_WORKSPACE,
        COMET_PROJECT_NAME,
        experiment_key  # accessed through experiment.get_key()
    )

    # Log SageMaker training job details to Comet
    # (Warnings are expected here. They're not a problem!)
    log_sagemaker_training_job_v1(
        estimator=training_estimator,
        experiment=api_experiment
    )

### Function to log the model artifact
We'll log our model as a remote artifact, linking to the model saved on S3. Logging the model artifact lets us register it in the Comet model registry from the UI.

In [None]:
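def log_model_to_comet(experiment, model_name, model_artifact_path, metadata):
    """
    Log a model stored on S3 to Comet as a remote model.
    Args:
        experiment: The currently running Comet experiment
        model_name (str): Name to log the model under
        model_artifact_path (str): S3 URI of the model artifact
        metadata (dict): Metadata to attach to the logged model
    """
    experiment.log_remote_model(
        model_name=model_name,
        uri=model_artifact_path,
        metadata=metadata
    )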

### Function to deploy and evaluate the model
This function will deploy our model to an endpoint and evaluate its performance on our test dataset. We'll log relevant metrics and analytics to Comet to help with model debugging and improvement:

- Model performance metrics
- Confusion matrix
- Performance curves (ROC and precision-recall)

In [None]:
def deploy_and_evaluate_model(
    experiment,
    estimator,
    X_test_scaled,
    y_test
):
    """
    Deploy model to endpoint and evaluate its performance.

    Args:
        experiment: The currently running Comet experiment
        estimator: The trained estimator model
        X_test_scaled: Scaled test features
        y_test: Test labels

    Returns:
        dict: Dictionary containing model performance metrics
    """
    # Deploy to endpoint
    predictor = estimator.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge"
    )
    print(f"Endpoint deployed: {predictor.endpoint_name}")

    # Prepare test data and make predictions
    predictor.serializer = CSVSerializer()

    # Process in batches to handle large datasets
    batch_size = 1000
    all_predictions = []

    for i in range(0, len(X_test_scaled), batch_size):
        batch = X_test_scaled.iloc[i:i+batch_size].values
        batch_pred = predictor.predict(batch)
        batch_decoded = batch_pred.decode('utf-8')
        batch_probs = [float(x.strip()) for x in batch_decoded.split('\n') if x.strip()]
        all_predictions.extend(batch_probs)

    y_pred_prob_as_np_array = np.array(all_predictions)

    # Validate prediction count matches test data
    if len(y_pred_prob_as_np_array) != len(y_test):
        raise ValueError(f"Prediction count ({len(y_pred_prob_as_np_array)}) doesn't match test data ({len(y_test)})")

    decision_threshold = 0.5
    y_pred = (y_pred_prob_as_np_array > decision_threshold).astype(int)

    # Calculate confusion matrix
    cm = confusion_matrix(y_test, y_pred)
    tn, fp, fn, tp = cm.ravel()

    # Calculate comprehensive metrics
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1_score": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_pred_prob_as_np_array),
        "true_positives": int(tp),
        "true_negatives": int(tn),
        "false_positives": int(fp),
        "false_negatives": int(fn),
        "total_fraud_cases": int(y_test.sum()),
        "total_test_samples": len(y_test),
        "decision_threshold": decision_threshold
    }

    # Log metrics to Comet
    experiment.log_metrics(metrics)

    # Print performance results
    print("\nModel Performance Results:")
    print(f"   Accuracy: {metrics['accuracy']:.4f}")
    print(f"   Precision: {metrics['precision']:.4f}")
    print(f"   Recall: {metrics['recall']:.4f}")
    print(f"   F1-Score: {metrics['f1_score']:.4f}")
    print(f"   ROC-AUC: {metrics['roc_auc']:.4f}")

    print("\nConfusion Matrix:")
    print(f"   True Negatives: {tn:,}")
    print(f"   False Positives: {fp:,}")
    print(f"   False Negatives: {fn:,}")
    print(f"   True Positives: {tp:,}")

    # Log confusion matrix to Comet
    labels = ['Normal', 'Fraud']
    experiment.log_confusion_matrix(matrix=cm, labels=labels)

    # Log precision-recall and ROC curves
    precision_curve, recall_curve, pr_thresholds = precision_recall_curve(y_test, y_pred_prob_as_np_array)
    experiment.log_curve("precision_recall_curve", x=recall_curve, y=precision_curve)

    fpr, tpr, _ = roc_curve(y_test, y_pred_prob_as_np_array)
    experiment.log_curve("roc_curve", x=fpr, y=tpr)


    return metrics
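
The function above assumes the endpoint returns one score per line. The separator in the response body may depend on the container version, so this is worth checking against the actual endpoint output. The sketch below is a more tolerant parser that accepts either newlines or commas. `parse_scores` is a hypothetical helper, not part of the SageMaker SDK.

```python
def parse_scores(payload):
    """Parse an endpoint response into floats, accepting commas or newlines."""
    if isinstance(payload, (bytes, bytearray)):
        text = payload.decode("utf-8")
    else:
        text = str(payload)
    # Normalize commas to newlines, then drop empty tokens
    tokens = (tok.strip() for tok in text.replace(",", "\n").splitlines())
    return [float(tok) for tok in tokens if tok]


print(parse_scores(b"0.1\n0.9\n"))
print(parse_scores(b"0.1,0.9"))
```

The length check in `deploy_and_evaluate_model` still applies either way: if the number of parsed scores differs from the number of test rows, the response format is not what the parser assumed.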

## Run the experiments

Now that we have set up all common utility functions, let us use them to run our experiments.

### Set up common variables for all experiments.

Some variables will remain the same across all experiments. Let's define them here.

In [None]:
train_channel_location = f's3://{bucket_name}/{processed_data_prefix}/train_data.csv'
validation_channel_location = f's3://{bucket_name}/{processed_data_prefix}/val_data.csv'

In [None]:
# Save the test dataset locally for evaluation
s3.download_file(bucket_name, f'{processed_data_prefix}/test_data.csv', 'test_data.csv')
test_data = pd.read_csv('test_data.csv', header=None)
y_test = test_data[0]
X_test_scaled = test_data.drop(0, axis=1)

### Experiment 1

In [None]:
# Define hyperparameters for first experiment
hyperparameters_v1 = {
    'objective': 'binary:logistic',          # Binary classification
    'num_round': 100,                        # Number of boosting rounds
    'eval_metric': 'auc',                    # Evaluation metric (good for imbalanced data)
    'learning_rate': 0.15,                   # Learning rate
    'max_depth': 6,                          # Maximum tree depth
    'subsample': 0.7,                        # Fraction of samples for each tree
    'colsample_bytree': 0.7,                 # Fraction of features for each tree
    'booster': 'gbtree'                      # Booster algorithm: 'gbtree', 'gblinear', or 'dart'
}



estimator_1 = train(
    model_output_path=f"s3://{bucket_name}/{model_output_prefix}/1",
    execution_role=role,
    sagemaker_session_obj=sagemaker_session,
    hyperparameters_dict=hyperparameters_v1,
    train_channel_loc=train_channel_location,
    val_channel_loc=validation_channel_location
)

In [None]:
# log the training job
log_training_job(experiment_key = experiment_1.get_key(), training_estimator=estimator_1)

# log the model artifact to comet
metadata = {
    "framework": "XGBoost-SageMaker-Built-In", 
    "algorithm": "Gradient Boosting",
    "use_case": "fraud_detection",
    "data_type": "tabular",
    "target_metric": "auc"
}
log_model_to_comet(experiment = experiment_1,
                   model_name="fraud-detection-xgb-v1", 
                   model_artifact_path=estimator_1.model_data, 
                   metadata=metadata)

In [None]:
# Deploy and evaluate the model with Comet logging
deploy_and_evaluate_model(experiment=experiment_1,
                          estimator=estimator_1,
                          X_test_scaled=X_test_scaled,
                          y_test=y_test
                         )

#### End the first Experiment

In [None]:
# When running Comet from a Jupyter notebook, end the experiment explicitly so that everything is captured
experiment_1.end()

### Experiment 2: Weight positive class to improve true positive rate
Now that we've run our first experiment and logged it to Comet, let's run another experiment to compare side by side.
Here we'll train a new model using the same dataset and different hyperparameters.
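
The XGBoost documentation suggests `sum(negative instances) / sum(positive instances)` as a typical starting value for `scale_pos_weight`. The sketch below computes that ratio from a list of labels. `suggested_scale_pos_weight` is a hypothetical helper and the labels are illustrative; the value used in this experiment was chosen separately and need not match the heuristic.

```python
def suggested_scale_pos_weight(labels):
    """Ratio of negative to positive labels, a common starting point for scale_pos_weight."""
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    if positives == 0:
        raise ValueError("no positive samples: the ratio is undefined")
    return negatives / positives


# Illustrative labels only: 995 normal rows and 5 fraud rows.
labels = [0] * 995 + [1] * 5
print(suggested_scale_pos_weight(labels))
```

In practice the ratio would be computed on the training split, and the final value is usually tuned against validation metrics such as recall and precision.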

#### Access Comet Artifacts

Since the preprocessed data was logged to the first experiment, we can access and re-use the same dataset through Comet. This is useful when collaborating on a project with a team across different notebooks and avoids the hassle of keeping track of dataset versions manually.

In [None]:
# Create a new experiment
experiment_2 = comet_ml.Experiment(
    project_name=COMET_PROJECT_NAME,
    workspace=COMET_WORKSPACE,
)

# Fetch the Artifact object from Comet and attach it to the new experiment
recovered_artifact = experiment_2.get_artifact(
    'fraud-dataset',
    version_or_alias='preprocessed'  # Retrieves the latest version with the 'preprocessed' alias
)

In [None]:
# If we were running the next experiment from a separate notebook, we could use the S3 paths saved in Comet to access our datasets 
for asset in recovered_artifact.assets:
    if asset.logical_path == "split_data/test_data.csv":
        print(asset.link)

#### Train, log, and evaluate the second model
Train a new model using the same dataset and different hyperparameters.

In [None]:
# Experiment 2: Weight positive class to improve true positive rate

hyperparameters_v2 = {
    'objective': 'binary:logistic',
    'num_round': 175,
    'eval_metric': 'auc',
    'learning_rate': 0.14,
    'max_depth': 6,
    'subsample': 0.8,
    'colsample_bytree': 1.0,
    'scale_pos_weight': 1500,  # Handle class imbalance
    'booster': 'gbtree'
}

tags_v2 = ["fraud-detection", "sagemaker", "xgboost", "class-weighted"]
experiment_2.add_tags(tags_v2)


estimator_2 = train(
    model_output_path=f"s3://{bucket_name}/{model_output_prefix}/2",
    execution_role=role,
    sagemaker_session_obj=sagemaker_session,
    hyperparameters_dict=hyperparameters_v2,
    train_channel_loc=train_channel_location,
    val_channel_loc=validation_channel_location
)

# Log the training job
log_training_job(experiment_key = experiment_2.get_key(), training_estimator=estimator_2)

# Log the model artifact to Comet
metadata={
    "framework": "XGBoost-SageMaker-Built-In", 
    "algorithm": "Gradient Boosting",
    "use_case": "fraud_detection",
    "data_type": "tabular",
    "target_metric": "auc",
}
log_model_to_comet(experiment = experiment_2,
                   model_name="fraud-detection-xgb-v2", 
                   model_artifact_path=estimator_2.model_data, 
                   metadata=metadata)

# Deploy the model and log evaluation metrics
deploy_and_evaluate_model(experiment=experiment_2,
                          estimator=estimator_2, 
                          X_test_scaled=X_test_scaled, 
                          y_test=y_test
                         )

# End the Experiment
experiment_2.end()

## View Comet Experiments in the UI
The experiment URL links us to our experiment in the Comet UI, where we'll be able to debug, view metrics, and register our model. From the current experiment, navigate back to the project in the UI to compare it against our previous experiment.

In [None]:
experiment_2.url 
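
Alongside the comparison in the UI, the metric dictionaries returned by `deploy_and_evaluate_model` can be compared locally, assuming the return values were captured in variables. The sketch below is an illustration only: `metric_deltas` is a hypothetical helper, and the numbers are placeholders rather than results from this notebook.

```python
def metric_deltas(baseline, candidate, keys=("precision", "recall", "f1_score", "roc_auc")):
    """Per-metric difference (candidate minus baseline) for keys present in both dicts."""
    return {k: candidate[k] - baseline[k] for k in keys if k in baseline and k in candidate}


# Placeholder values for illustration, not actual experiment results.
metrics_1 = {"precision": 0.90, "recall": 0.70}
metrics_2 = {"precision": 0.80, "recall": 0.85}

for name, delta in metric_deltas(metrics_1, metrics_2).items():
    print(f"{name}: {delta:+.3f}")
```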