<img src="https://cdn.comet.ml/img/notebook_logo.png">

# Comet + PyTorch: Credit Card Fraud Detection

[Comet](https://www.comet.com/site/products/ml-experiment-tracking/?utm_campaign=pytorch&utm_medium=colab) is an MLOps platform designed to help data scientists and teams build better models faster. Comet provides tooling to track, explain, manage, and monitor your models in a single place. It works with Jupyter notebooks and scripts.

[PyTorch](https://pytorch.org/) is a popular open source machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing.

PyTorch enables fast, flexible experimentation and efficient production through a user-friendly front-end, distributed training, and ecosystem of tools and libraries.

Instrument PyTorch with Comet to start managing experiments, create dataset versions and track hyperparameters for faster and easier reproducibility and collaboration.

[Find more information about our integration with PyTorch](https://www.comet.ml/docs/v2/integrations/ml-frameworks/pytorch/).

Curious about how Comet can help you build better models, faster? Find out more about [Comet](https://www.comet.com/site/products/ml-experiment-tracking/?utm_campaign=pytorch&utm_medium=colab) and our [other integrations](https://www.comet.ml/docs/v2/integrations/overview/).


## Importing Required Packages

In [None]:
from comet_ml import Experiment
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# PyTorch imports
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Sklearn imports
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_curve

# Comet PyTorch integration
from comet_ml.integration.pytorch import log_model, watch

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"PyTorch version: {torch.__version__}")
print(f"Using device: {device}")

## Load Data

Let's read the data with the pandas library. The dataset can be downloaded from Kaggle at https://www.kaggle.com/code/rawaaelghali/credit-card-fraud-detection-using-xgboost/data

In [None]:
# Load the credit card fraud dataset
# You can download this from Kaggle or use the path from the XGBoost example
df = pd.read_csv('./creditcard.csv')

In [None]:
print("-" * 50)
print('Shape of the dataframe:', df.shape)
print("Number of records in dataset:", df.shape[0])
print("\nInformation of the dataset:")
df.info()
print("-" * 50)
print("\nFirst 5 records of the dataset:")
df.head()

## Initialize Comet Experiment

In [None]:
# Instantiate Comet Experiment
experiment = Experiment(
    project_name='fraud_detection',
    # api_key="YOUR_API_KEY",  # Uncomment and add your API key
    # workspace="YOUR_WORKSPACE"  # Uncomment and add your workspace
)

experiment.add_tag('pytorch')

## Log Dataframe Profile

In [None]:
# Log pandas profiling report to Comet
experiment.log_dataframe_profile(df, "pandas_profiling_full", minimal=True)

## Log Dataset Artifact

In [None]:
from comet_ml import Artifact

# Create dataset artifact
artifact = Artifact(
    name="fraud-dataset-pytorch",
    artifact_type="dataset",
    aliases=["raw"]
)

artifact.add('./creditcard.csv')

# Log artifact
experiment.log_artifact(artifact)

## Data Exploration and Visualization

In [None]:
# Check class distribution
print("Class distribution:")
print(df['Class'].value_counts())

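# Log class counts to Comet
fraud_count = int(df['Class'].sum())
non_fraud_count = len(df) - fraud_count
experiment.log_other('fraud_count', fraud_count)
experiment.log_other('non_fraud_count', non_fraud_count)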

# Visualize class distribution
plt.figure(figsize=(8, 6))
sns.countplot(x='Class', data=df)
plt.title('Class Distribution (0: Non-Fraud, 1: Fraud)')
experiment.log_figure("class_distribution", plt)
plt.show()

## Data Preprocessing
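
The preprocessing cell in this section standardizes the `Amount` column with `StandardScaler`. As a rough illustration of what that transform does, the sketch below computes z-scores by hand using only the standard library. The `standardize` helper and the sample amounts are hypothetical and are not taken from `creditcard.csv`.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population std (ddof=0), the convention StandardScaler uses
    return [(v - mean) / std for v in values]

# Hypothetical transaction amounts, for illustration only
amounts = [10.0, 20.0, 30.0, 40.0, 50.0]
scaled = standardize(amounts)
print([round(z, 3) for z in scaled])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

After the transform the values have mean 0 and unit variance, which keeps `Amount` on a scale comparable to the other input features.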

In [None]:
# Scale the 'Amount' column
scaler = StandardScaler()
df['Amount'] = scaler.fit_transform(df['Amount'].values.reshape(-1, 1))

# Drop the 'Time' column as it's not useful for our model
df = df.drop(['Time'], axis=1)

print(df['Amount'].head(10))

In [None]:
# Split features and target
X = df.drop('Class', axis=1).values.astype(np.float32)  # Convert to float32 for faster training
y = df['Class'].values.astype(np.float32)

print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
print(f"Fraud cases: {int(y.sum())} ({y.sum()/len(y)*100:.2f}%)")

## Build PyTorch Model
Once the model has been defined, use [watch(model)](https://www.comet.com/docs/v2/integrations/ml-frameworks/pytorch/#weightsbiases-and-gradients-logging) to auto-log weights, biases, and gradients to Comet.

In [None]:
# Define hyperparameters
hyper_params = {
    'test_size': 0.2,
    'learning_rate': 0.001,
    'epochs': 20,  # Reduced for faster demo (increase for production)
    'batch_size': 2048,  # Increased for faster training with large dataset
    'hidden_units_1': 128,
    'hidden_units_2': 64,
    'hidden_units_3': 32,
    'dropout_rate': 0.3
}

experiment.log_parameters(hyper_params)

In [None]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=hyper_params['test_size'],
    random_state=42,
    stratify=y
)

print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")

In [None]:
# Convert to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train).to(device)
y_train_tensor = torch.FloatTensor(y_train).to(device)
X_test_tensor = torch.FloatTensor(X_test).to(device)
y_test_tensor = torch.FloatTensor(y_test).to(device)

# Create DataLoaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=hyper_params['batch_size'], shuffle=True)

test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=hyper_params['batch_size'], shuffle=False)

In [None]:
class FraudDetectionNet(nn.Module):
    """PyTorch Neural Network for Fraud Detection."""
    
    def __init__(self, input_dim, params):
        super(FraudDetectionNet, self).__init__()
        
        self.fc1 = nn.Linear(input_dim, params['hidden_units_1'])
        self.bn1 = nn.BatchNorm1d(params['hidden_units_1'])
        self.dropout1 = nn.Dropout(params['dropout_rate'])
        
        self.fc2 = nn.Linear(params['hidden_units_1'], params['hidden_units_2'])
        self.bn2 = nn.BatchNorm1d(params['hidden_units_2'])
        self.dropout2 = nn.Dropout(params['dropout_rate'])
        
        self.fc3 = nn.Linear(params['hidden_units_2'], params['hidden_units_3'])
        self.bn3 = nn.BatchNorm1d(params['hidden_units_3'])
        self.dropout3 = nn.Dropout(params['dropout_rate'])
        
        self.fc4 = nn.Linear(params['hidden_units_3'], 1)
        
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        x = self.relu(self.bn1(self.fc1(x)))
        x = self.dropout1(x)
        
        x = self.relu(self.bn2(self.fc2(x)))
        x = self.dropout2(x)
        
        x = self.relu(self.bn3(self.fc3(x)))
        x = self.dropout3(x)
        
        x = self.sigmoid(self.fc4(x))
        return x

# Build the model
model = FraudDetectionNet(X_train.shape[1], hyper_params).to(device)

# Log weights, biases, and gradients to Comet using watch
watch(model)

In [None]:
# Define loss function and optimizer
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=hyper_params['learning_rate'])

## Train the Model

Train the model and log custom metrics, such as per-epoch accuracy, to Comet. The loss metric is auto-logged.
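
The criterion defined earlier, `nn.BCELoss`, averages the binary cross-entropy over the samples in a batch. As a minimal sketch of that quantity, the snippet below evaluates it by hand for a few made-up predictions. The `binary_cross_entropy` helper and the values are hypothetical, and the sketch assumes every probability lies strictly between 0 and 1.

```python
import math

def binary_cross_entropy(probs, targets):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)] over all samples."""
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

# Hypothetical sigmoid outputs and labels (1 = fraud), for illustration only
toy_probs = [0.9, 0.2, 0.6, 0.05]
toy_targets = [1, 0, 1, 0]
toy_loss = binary_cross_entropy(toy_probs, toy_targets)
print(round(toy_loss, 4))  # 0.2227
```

Confident, correct predictions (0.9 for a fraud case, 0.05 for a legitimate one) contribute little to the loss, while the hesitant 0.6 on a fraud case contributes the most.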

In [None]:
# Training loop
train_losses = []
train_accuracies = []

with experiment.train():
    for epoch in range(hyper_params['epochs']):
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0
        
        for batch_X, batch_y in train_loader:
            # Zero the gradients
            optimizer.zero_grad()
            
            # Forward pass
            outputs = model(batch_X).squeeze(1)  # (batch, 1) -> (batch,); stays 1-D even for a single-sample batch
            loss = criterion(outputs, batch_y)
            
            # Backward pass and optimize
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            
            # Calculate accuracy
            predicted = (outputs > 0.5).float()
            total += batch_y.size(0)
            correct += (predicted == batch_y).sum().item()
        
        epoch_loss = running_loss / len(train_loader)
        epoch_acc = correct / total
        
        train_losses.append(epoch_loss)
        train_accuracies.append(epoch_acc)
        
        # Log metrics to Comet
        #experiment.log_metric('train_loss', epoch_loss, step=epoch)
        experiment.log_metric('accuracy', epoch_acc, step=epoch)
        experiment.log_current_epoch(epoch)
        
        if (epoch + 1) % 10 == 0:
            print(f'Epoch [{epoch+1}/{hyper_params["epochs"]}], Loss: {epoch_loss:.4f}, Accuracy: {epoch_acc:.4f}')

## Log Model
Log the model to Comet using Comet's [PyTorch integration](https://www.comet.com/docs/v2/integrations/ml-frameworks/pytorch/#pytorch-model-saving-and-loading). Logging the model makes it possible to register it in Comet's Model Registry later.

In [None]:
# Save and log the model to Comet
# torch.save(model.state_dict(), 'fraud_model_pytorch.pth')
# experiment.log_model('fraud-demo-pytorch', 'fraud_model_pytorch.pth')

log_model(experiment, model, "pytorch-fraud-model")

## Evaluate the Model
Use `log_metric` to log any evaluation metrics to Comet.
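
With heavily imbalanced classes, accuracy alone can hide a model that never flags fraud, which is why precision, recall, and F1 are logged as well. The sketch below works through a tiny made-up example. The `binary_metrics` helper and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical data: 2 fraud cases among 20 transactions
toy_true = [1, 1] + [0] * 18
toy_pred_never_fraud = [0] * 20          # flags nothing
toy_pred_mixed = [1, 0, 1] + [0] * 17    # one catch, one miss, one false alarm

print(binary_metrics(toy_true, toy_pred_never_fraud))
print(binary_metrics(toy_true, toy_pred_mixed))
```

Both toy classifiers reach 90% accuracy, but the first has a recall of 0 while the second has precision, recall, and F1 of 0.5.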

In [None]:
# Evaluation
with experiment.test():
    model.eval()
    with torch.no_grad():
        y_pred_proba = model(X_test_tensor).cpu().numpy().flatten()
        y_pred = (y_pred_proba > 0.5).astype(int)
    
    # Calculate metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
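    # Note: F1 is macro-averaged over both classes, while precision and recall above are for the fraud class only
    f1 = f1_score(y_test, y_pred, average='macro')
    
    print(f"Accuracy: {accuracy * 100:.2f}%")
    print(f"Precision: {precision * 100:.2f}%")
    print(f"Recall: {recall * 100:.2f}%")
    print(f"F1 Score (macro): {f1 * 100:.2f}%")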
    
    # Log metrics to Comet
    experiment.log_metric('accuracy', accuracy)
    experiment.log_metric('precision', precision)
    experiment.log_metric('recall', recall)
    experiment.log_metric('f1_score', f1)

## Log Curve
Log the ROC curve to Comet, where it can be displayed interactively in Comet's Curves Panel.
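
An ROC curve plots the true positive rate against the false positive rate as the decision threshold sweeps over the predicted scores. The sketch below computes those points by hand for four made-up predictions. The `roc_points` helper is hypothetical and keeps every threshold, whereas scikit-learn's `roc_curve` may drop some intermediate points by default.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, one point per distinct score threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        # Count samples flagged as positive at this threshold
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    return fpr, tpr

# Hypothetical labels and predicted probabilities, for illustration only
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_fpr, toy_tpr = roc_points(toy_labels, toy_scores)
print(toy_fpr)  # [0.0, 0.0, 0.5, 0.5, 1.0]
print(toy_tpr)  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

Each (fpr, tpr) pair corresponds to one choice of threshold, so the curve shows the trade-off that the fixed 0.5 cutoff used during evaluation only samples at a single point.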

In [None]:
# Log ROC Curve
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
experiment.log_curve("ROC_Curve", x=fpr, y=tpr)

In [None]:
# End the experiment
experiment.end()