
# TASK #1: UNDERSTAND THE PROBLEM STATEMENT AND RESEARCH CONTEXT

## RSA Semiprime Factorization using Machine Learning

**Objective:** Develop and compare neural network architectures for RSA semiprime factorization, improving upon previous research by Murat et al. and Nene & Uludag.

**Problem Definition:**
- Given a semiprime N = p × q (product of two primes), predict the prime factors p and q
- Evaluate models using β-metrics (tolerance for bit errors)
- Compare performance across different architectural approaches

**Previous Research Context:**
- **Murat et al.**: Binary LSTM approach with basic feature engineering
- **Nene & Uludag**: Enhanced feature extraction with mathematical properties
- **Our Improvements**: Advanced feature engineering (125D vs 14-bit), GAN-based generation, Transformer architectures

**Models to Evaluate:**
1. **Binary LSTM** (Murat et al. baseline)
2. **Dual Loss LSTM** (Enhanced with p,q prediction)
3. **Enhanced Transformer** (125D mathematical features)
4. **GAN-based Factorization** (Adversarial approach)

**Evaluation Metrics:**
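- β₀: Fraction of test samples whose predicted factor bits match the target exactly
- β₁: Fraction of test samples with at most 1 bit error
- β₂: Fraction of test samples with at most 2 bit errors
- β₃: Fraction of test samples with at most 3 bit errors
- β₄: Fraction of test samples with at most 4 bit errors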
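
As a rough illustration of how these metrics behave, the sketch below computes β₀ through β₄ for three hand-written 4-bit predictions. The helper names `bit_errors` and `beta_metrics` and the sample bit strings are made up for this example; the notebook's own implementation works on tensors and appears later.

```python
def bit_errors(pred_bits, true_bits):
    """Hamming distance between two equal-length bit strings."""
    if len(pred_bits) != len(true_bits):
        raise ValueError("bit strings must have the same length")
    return sum(a != b for a, b in zip(pred_bits, true_bits))


def beta_metrics(preds, targets, max_tolerance=4):
    """Return {i: fraction of samples with at most i bit errors}."""
    errors = [bit_errors(p, t) for p, t in zip(preds, targets)]
    return {
        i: sum(e <= i for e in errors) / len(errors)
        for i in range(max_tolerance + 1)
    }


# Hypothetical targets and predictions (illustration only)
targets = ["1011", "0111", "1101"]
preds = ["1011", "0101", "0010"]  # 0, 1 and 4 bit errors respectively

example_betas = beta_metrics(preds, targets)
for i, value in example_betas.items():
    print(f"beta_{i}: {value:.3f}")
```

Because each βᵢ counts every sample with at most i errors, the values can only stay equal or grow as i increases.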

**Dataset Scales:** tiny, small, medium, large (varying N bit sizes)

# TASK #2: IMPORT LIBRARIES AND SETUP AWS ENVIRONMENT

In [None]:
# AWS and SageMaker imports
import sagemaker
import boto3
from sagemaker import Session
from sagemaker import get_execution_role
from sagemaker.estimator import Estimator
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner
)

# Standard ML libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix
from tqdm import tqdm
import json
import os
from datetime import datetime

# Display settings
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
%matplotlib inline

print("Libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"SageMaker version: {sagemaker.__version__}")

In [None]:
# Initialize SageMaker session and get execution role
sagemaker_session = sagemaker.Session()
role = get_execution_role()
region = sagemaker_session.boto_region_name

# S3 bucket and prefix configuration
bucket = sagemaker_session.default_bucket()  # Replace with your bucket name if needed
prefix = 'rsa-ml-attack'

print(f"SageMaker session region: {region}")
print(f"SageMaker execution role: {role}")
print(f"S3 bucket: {bucket}")
print(f"S3 prefix: {prefix}")

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Training device: {device}")

# TASK #3: LOAD AND EXPLORE RSA DATASET FROM S3

In [None]:
# S3 paths for different dataset scales
data_scales = ['tiny', 'small', 'medium', 'large']
s3_data_paths = {}

for scale in data_scales:
    s3_data_paths[scale] = {
        'train': f's3://{bucket}/{prefix}/data/{scale}_train.csv',
        'test': f's3://{bucket}/{prefix}/data/{scale}_test.csv'
    }

print("S3 Data Paths:")
for scale, paths in s3_data_paths.items():
    print(f"  {scale.upper()}:")
    print(f"    Train: {paths['train']}")
    print(f"    Test: {paths['test']}")

In [None]:
# Load datasets from S3
def load_dataset_from_s3(s3_path):
    """Load a CSV dataset from an S3 path (pandas reads s3:// URLs through the s3fs package)"""
    return pd.read_csv(s3_path)

# Load small dataset for initial exploration
scale = 'small'  # Start with small dataset
print(f"Loading {scale} dataset for exploration...")

train_df = load_dataset_from_s3(s3_data_paths[scale]['train'])
test_df = load_dataset_from_s3(s3_data_paths[scale]['test'])

print(f"\nDataset loaded:")
print(f"Training samples: {len(train_df)}")
print(f"Test samples: {len(test_df)}")
print(f"Features: {list(train_df.columns)}")

In [None]:
# Data exploration and visualization
print("Dataset Statistics:")
print(f"Max N: {max(train_df['N'].max(), test_df['N'].max()):,}")
print(f"Max p: {max(train_df['p'].max(), test_df['p'].max()):,}")
print(f"Max q: {max(train_df['q'].max(), test_df['q'].max()):,}")

# Display sample data
print("\nSample training data:")
display(train_df.head())

print("\nData types and info:")
print(train_df.info())

# Check for any missing values
print("\nMissing values:")
print(train_df.isnull().sum())

In [None]:
# Visualize data distributions
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Distribution of N values
axes[0,0].hist(train_df['N'], bins=50, alpha=0.7, edgecolor='black')
axes[0,0].set_title('Distribution of Semiprime N Values')
axes[0,0].set_xlabel('N (Semiprime)')
axes[0,0].set_ylabel('Frequency')

# Distribution of p values
axes[0,1].hist(train_df['p'], bins=50, alpha=0.7, color='orange', edgecolor='black')
axes[0,1].set_title('Distribution of Prime p Values')
axes[0,1].set_xlabel('p (Prime Factor)')
axes[0,1].set_ylabel('Frequency')

# Distribution of q values
axes[1,0].hist(train_df['q'], bins=50, alpha=0.7, color='green', edgecolor='black')
axes[1,0].set_title('Distribution of Prime q Values')
axes[1,0].set_xlabel('q (Prime Factor)')
axes[1,0].set_ylabel('Frequency')

# Bit lengths visualization
train_df['N_bits'] = train_df['N'].apply(lambda x: x.bit_length())
train_df['p_bits'] = train_df['p'].apply(lambda x: x.bit_length())
train_df['q_bits'] = train_df['q'].apply(lambda x: x.bit_length())

bit_lengths = ['N_bits', 'p_bits', 'q_bits']
bit_data = [train_df[col] for col in bit_lengths]
axes[1,1].boxplot(bit_data, labels=bit_lengths)
axes[1,1].set_title('Bit Length Distributions')
axes[1,1].set_ylabel('Bit Length')

plt.tight_layout()
plt.show()

print(f"Bit length statistics:")
print(f"N bits: {train_df['N_bits'].min()}-{train_df['N_bits'].max()}")
print(f"p bits: {train_df['p_bits'].min()}-{train_df['p_bits'].max()}")
print(f"q bits: {train_df['q_bits'].min()}-{train_df['q_bits'].max()}")

# TASK #4: FEATURE ENGINEERING AND DATA PREPROCESSING

In [None]:
# Enhanced Feature Engineering (improved over Murat et al. and Nene & Uludag)
class FeatureEngineer:
    """Advanced feature extraction for RSA semiprimes with mathematical properties"""
    
    def __init__(self):
        self.feature_names = None
    
    def basic_features(self, N):
        """Basic mathematical properties"""
        features = []
        
        # Basic properties
        features.extend([
            N,                              # Original number
            N.bit_length(),                 # Bit length
            N % 2,                          # Parity (always 1 for odd semiprimes)
            (N - 1) // 2,                   # (N-1)/2
            (N + 1) // 2,                   # (N+1)/2
        ])
        
        return features
    
    def modular_features(self, N):
        """Modular arithmetic features"""
        features = []
        
        # Modular properties with small primes
        small_primes = [3, 5, 7, 11, 13, 17, 19, 23]
        for p in small_primes:
            features.append(N % p)
        
        # Quadratic residuosity of N modulo small primes (Euler's criterion)
        for p in [3, 5, 7, 11]:
            features.append(pow(N, (p-1)//2, p))  # 1 = residue, p-1 = non-residue, 0 = p divides N
        
        return features
    
    def number_theory_features(self, N):
        """Advanced number theory features (ECPP/GNFS inspired)"""
        features = []
        
        # Digit-based features
        N_str = str(N)
        features.extend([
            len(N_str),                     # Number of digits
            sum(int(d) for d in N_str),     # Digit sum
            int(N_str[-1]),                 # Last digit
            int(N_str[0]),                  # First digit
        ])
        
        # Divisibility tests
        features.extend([
            N % 3,
            N % 9,
            N % 11,
            sum(int(d) for i, d in enumerate(N_str) if i % 2 == 0) - 
            sum(int(d) for i, d in enumerate(N_str) if i % 2 == 1)  # Alternating sum for 11-divisibility
        ])
        
        return features
    
    def statistical_features(self, N):
        """Statistical and bit-pattern features"""
        features = []
        
        # Binary representation analysis
        binary = bin(N)[2:]  # Remove '0b' prefix
        
        features.extend([
            binary.count('1'),              # Hamming weight
            binary.count('0'),              # Number of zeros
            binary.count('11'),             # Non-overlapping '11' pairs
            binary.count('00'),             # Non-overlapping '00' pairs
            len(binary) - len(binary.rstrip('0')),  # Trailing zeros
        ])
        
        # Bit pattern features
        if len(binary) >= 4:
            features.extend([
                int(binary[:4], 2),         # First 4 bits
                int(binary[-4:], 2),        # Last 4 bits
            ])
        else:
            features.extend([0, 0])
        
        return features
    
    def crypto_features(self, N):
        """Cryptographic and complexity features"""
        features = []
        
        # Fermat-like tests (probabilistic)
        for a in [2, 3, 5, 7]:
            if a < N:
                features.append(pow(a, N-1, N))  # Fermat test
            else:
                features.append(0)
        
        # Miller-Rabin inspired features
        # Write N-1 as d * 2^r
        d = N - 1
        r = 0
        while d % 2 == 0:
            d //= 2
            r += 1
        
        features.extend([d, r])
        
        # Additional complexity measures
        features.extend([
            N // 100,                       # Scaled down version
            int(np.log2(N)) if N > 0 else 0,  # Log base 2
            int(np.sqrt(N)),                # Floor of floating-point sqrt (approximate for large N)
        ])
        
        return features
    
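    def extract_all_features(self, N):
        """Extract a fixed-length 125-dimensional feature vector (41 computed features, zero-padded)"""
        # NumPy integer scalars (e.g. values read from a DataFrame) do not provide int.bit_length()
        N = int(N)
        features = []
        
        # Combine all feature types
        features.extend(self.basic_features(N))
        features.extend(self.modular_features(N))
        features.extend(self.number_theory_features(N))
        features.extend(self.statistical_features(N))
        features.extend(self.crypto_features(N))
        
        # The five groups above yield 41 features; zero-pad to a fixed width of 125
        while len(features) < 125:
            features.append(0)
        
        # Note: float32 cannot represent large integers exactly, so raw-magnitude features lose precision
        return np.array(features[:125], dtype=np.float32)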

# Initialize feature engineer
feature_engineer = FeatureEngineer()

# Test feature extraction on sample
sample_N = train_df['N'].iloc[0]
sample_features = feature_engineer.extract_all_features(sample_N)
print(f"Feature extraction test:")
print(f"Input N: {sample_N}")
print(f"Feature vector shape: {sample_features.shape}")
print(f"Feature preview: {sample_features[:10]}")

# TASK #5: BINARY LSTM MODEL (MURAT ET AL. BASELINE)

## Binary LSTM Architecture

This model replicates the approach from **Murat et al.** research:
- Input: Binary representation of semiprime N
- Architecture: LSTM → Dense layers
- Output: Binary representation of prime factor p
- Limitation: Only predicts one factor (p), not both p and q
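
The bit encoding described above can be sketched on a toy semiprime. The helpers `to_bits` and `from_bits` and the example N = 143 = 11 × 13 are illustrative assumptions, not part of the dataset class defined in the next cell.

```python
def to_bits(value, width):
    """Encode a non-negative integer as a fixed-width list of bits, MSB first."""
    if value.bit_length() > width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return [int(b) for b in format(value, f"0{width}b")]


def from_bits(bits):
    """Decode an MSB-first list of bits back into an integer."""
    return int("".join(str(b) for b in bits), 2)


# Toy semiprime (illustration only): 143 = 11 * 13
N, p, q = 143, 11, 13

n_bits = to_bits(N, 8)  # model input: one bit per LSTM timestep
p_bits = to_bits(p, 4)  # model target: bits of the factor p

print("N bits:", n_bits)
print("p bits:", p_bits)

# Once p is predicted exactly, q follows by integer division
recovered_q = N // from_bits(p_bits)
print("recovered q:", recovered_q)
```

This also shows why predicting only p is not a hard limitation when the prediction is exact: q is then N divided by p. With bit errors in p, the division no longer yields a valid factor.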

In [None]:
# Binary LSTM Dataset (Murat et al. approach)
class BinaryLSTMDataset(Dataset):
    """Dataset for binary LSTM using bit representations"""
    
    def __init__(self, N_values, p_values, n_bits=None, p_bits=None):
        self.binary_sequences = []
        self.factor_bits = []
        
        # Determine bit widths from the data unless fixed widths are supplied.
        # The test set must reuse the training widths so tensor shapes match the model.
        self.N_bits = n_bits if n_bits is not None else int(max(N_values)).bit_length()
        self.p_bits = p_bits if p_bits is not None else int(max(p_values)).bit_length()
        
        print(f"Using {self.N_bits} bits for N, {self.p_bits} bits for p")
        
        for N, p in zip(N_values, p_values):
            # Binary representation of N (input sequence)
            N_binary = format(int(N), f'0{self.N_bits}b')
            self.binary_sequences.append([int(bit) for bit in N_binary])
            
            # Binary representation of p (target)
            p_binary = format(int(p), f'0{self.p_bits}b')
            self.factor_bits.append([int(bit) for bit in p_binary])
        
        # Convert to tensors
        self.X = torch.FloatTensor(self.binary_sequences)
        self.y = torch.FloatTensor(self.factor_bits)
        
        print(f"Binary LSTM dataset created: {len(self.X)} samples")
        print(f"Input shape: {self.X.shape}, Output shape: {self.y.shape}")
    
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# Create binary LSTM datasets (the test set reuses the training bit widths)
binary_train_dataset = BinaryLSTMDataset(train_df['N'].values, train_df['p'].values)
binary_test_dataset = BinaryLSTMDataset(
    test_df['N'].values, test_df['p'].values,
    n_bits=binary_train_dataset.N_bits, p_bits=binary_train_dataset.p_bits
)

In [None]:
# Binary LSTM Model (Murat et al. architecture)
class BinaryLSTM(nn.Module):
    """LSTM model for binary sequence prediction (Murat et al. approach)"""
    
    def __init__(self, input_size, hidden_size=128, num_layers=2, output_size=None):
        super(BinaryLSTM, self).__init__()
        
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
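        # LSTM layers: each bit is one timestep with a single feature, so the LSTM
        # input size is 1 (input_size, the sequence length, is not used by the layers)
        self.lstm = nn.LSTM(1, hidden_size, num_layers, batch_first=True, dropout=0.2)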
        
        # Dense layers with LayerNorm (avoiding BatchNorm for small batches)
        self.ln1 = nn.LayerNorm(hidden_size)
        self.fc1 = nn.Linear(hidden_size, 128)
        self.ln2 = nn.LayerNorm(128)
        self.fc2 = nn.Linear(128, 64)
        self.ln3 = nn.LayerNorm(64)
        self.fc3 = nn.Linear(64, output_size)
        
        self.dropout = nn.Dropout(0.3)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        # Reshape for LSTM: (batch, sequence, feature)
        x = x.unsqueeze(-1)  # Add feature dimension
        
        # LSTM forward
        lstm_out, (hidden, cell) = self.lstm(x)
        
        # Use last hidden state
        output = hidden[-1]  # Take last layer's hidden state
        
        # Dense layers
        output = self.ln1(output)
        output = self.relu(self.fc1(output))
        output = self.dropout(output)
        
        output = self.ln2(output)
        output = self.relu(self.fc2(output))
        output = self.dropout(output)
        
        output = self.ln3(output)
        output = self.sigmoid(self.fc3(output))
        
        return output

# Initialize Binary LSTM model
binary_model = BinaryLSTM(
    input_size=binary_train_dataset.N_bits,
    output_size=binary_train_dataset.p_bits
).to(device)

print(f"Binary LSTM Model:")
print(f"Input size: {binary_train_dataset.N_bits} bits")
print(f"Output size: {binary_train_dataset.p_bits} bits")
print(f"Total parameters: {sum(p.numel() for p in binary_model.parameters()):,}")
print(f"Model summary:")
print(binary_model)

In [None]:
# β-metrics evaluation functions
def calculate_beta_metrics(predictions, targets):
    """Calculate β_i metrics (percentage with at most i bit errors)"""
    # Convert to binary predictions
    binary_preds = (predictions > 0.5).float()
    
    # Count bit errors for each sample
    errors = (binary_preds != targets).sum(dim=1)
    
    beta_metrics = {}
    for i in range(5):
        beta_metrics[f'beta_{i}'] = (errors <= i).float().mean().item()
    
    return beta_metrics

def evaluate_model_comprehensive(model, test_loader, model_name="Model"):
    """Comprehensive evaluation with β-metrics"""
    model.eval()
    all_predictions = []
    all_targets = []
    total_loss = 0
    
    criterion = nn.BCELoss()
    
    with torch.no_grad():
        for batch_features, batch_targets in test_loader:
            batch_features = batch_features.to(device)
            batch_targets = batch_targets.to(device)
            
            predictions = model(batch_features)
            loss = criterion(predictions, batch_targets)
            total_loss += loss.item()
            
            all_predictions.append(predictions)
            all_targets.append(batch_targets)
    
    # Concatenate all batches
    final_predictions = torch.cat(all_predictions, dim=0)
    final_targets = torch.cat(all_targets, dim=0)
    
    # Calculate β-metrics
    beta_metrics = calculate_beta_metrics(final_predictions, final_targets)
    avg_loss = total_loss / len(test_loader)
    
    print(f"\n{model_name} Evaluation Results:")
    print(f"  Average Loss: {avg_loss:.4f}")
    print(f"  β₀ (exact match): {beta_metrics['beta_0']:.4f} ({beta_metrics['beta_0']*100:.2f}%)")
    print(f"  β₁ (≤1 bit error): {beta_metrics['beta_1']:.4f} ({beta_metrics['beta_1']*100:.2f}%)")
    print(f"  β₂ (≤2 bit error): {beta_metrics['beta_2']:.4f} ({beta_metrics['beta_2']*100:.2f}%)")
    print(f"  β₃ (≤3 bit error): {beta_metrics['beta_3']:.4f} ({beta_metrics['beta_3']*100:.2f}%)")
    print(f"  β₄ (≤4 bit error): {beta_metrics['beta_4']:.4f} ({beta_metrics['beta_4']*100:.2f}%)")
    
    return beta_metrics, avg_loss

print("Evaluation functions defined successfully!")

In [None]:
# Training function for Binary LSTM
def train_binary_lstm(model, train_loader, test_loader, epochs=30, lr=0.001):
    """Train Binary LSTM model"""
    
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.8)
    
    training_history = []
    
    print(f"Training Binary LSTM for {epochs} epochs...")
    print(f"Batch size: {train_loader.batch_size}")
    print(f"Training batches: {len(train_loader)}")
    print(f"Test batches: {len(test_loader)}")
    
    for epoch in range(epochs):
        # Training phase
        model.train()
        epoch_loss = 0
        
        for batch_features, batch_targets in tqdm(train_loader, desc=f"Epoch {epoch+1}"):
            batch_features = batch_features.to(device)
            batch_targets = batch_targets.to(device)
            
            optimizer.zero_grad()
            predictions = model(batch_features)
            loss = criterion(predictions, batch_targets)
            loss.backward()
            
            # Gradient clipping
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            
            optimizer.step()
            epoch_loss += loss.item()
        
        scheduler.step()
        avg_train_loss = epoch_loss / len(train_loader)
        
        # Evaluation every 5 epochs
        if (epoch + 1) % 5 == 0 or epoch == epochs - 1:
            beta_metrics, test_loss = evaluate_model_comprehensive(
                model, test_loader, f"Binary LSTM (Epoch {epoch+1})"
            )
            
            training_history.append({
                'epoch': epoch + 1,
                'train_loss': avg_train_loss,
                'test_loss': test_loss,
                'beta_0': beta_metrics['beta_0'],
                'beta_1': beta_metrics['beta_1'],
                'beta_2': beta_metrics['beta_2'],
                'beta_3': beta_metrics['beta_3'],
                'beta_4': beta_metrics['beta_4']
            })
        else:
            print(f"Epoch {epoch+1}/{epochs}: Train Loss = {avg_train_loss:.4f}")
    
    return training_history

# Create data loaders
batch_size = 4  # Small batch size for stability
binary_train_loader = DataLoader(binary_train_dataset, batch_size=batch_size, shuffle=True)
binary_test_loader = DataLoader(binary_test_dataset, batch_size=batch_size, shuffle=False)

print(f"Data loaders created with batch size: {batch_size}")

In [None]:
# Execute Binary LSTM training
print("Starting Binary LSTM Training (Murat et al. baseline)...")
print("="*60)

binary_history = train_binary_lstm(
    binary_model, 
    binary_train_loader, 
    binary_test_loader, 
    epochs=30
)

print("\nBinary LSTM training completed!")
print("="*60)

In [None]:
# Visualize Binary LSTM training results
def plot_training_history(history, model_name):
    """Plot training history with β-metrics"""
    if not history:
        print("No training history to plot")
        return
    
    df = pd.DataFrame(history)
    
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Loss curves
    axes[0,0].plot(df['epoch'], df['train_loss'], label='Train Loss', marker='o')
    axes[0,0].plot(df['epoch'], df['test_loss'], label='Test Loss', marker='s')
    axes[0,0].set_title(f'{model_name} - Loss Curves')
    axes[0,0].set_xlabel('Epoch')
    axes[0,0].set_ylabel('Loss')
    axes[0,0].legend()
    axes[0,0].grid(True)
    
    # β-metrics evolution
    for i in range(5):
        axes[0,1].plot(df['epoch'], df[f'beta_{i}'], label=f'β{i}', marker='o')
    axes[0,1].set_title(f'{model_name} - β-metrics Evolution')
    axes[0,1].set_xlabel('Epoch')
    axes[0,1].set_ylabel('Accuracy')
    axes[0,1].legend()
    axes[0,1].grid(True)
    
    # Final β-metrics bar chart
    final_betas = [df[f'beta_{i}'].iloc[-1] for i in range(5)]
    beta_labels = [f'β{i}' for i in range(5)]
    
    bars = axes[1,0].bar(beta_labels, final_betas, color=['red', 'orange', 'yellow', 'lightgreen', 'green'])
    axes[1,0].set_title(f'{model_name} - Final β-metrics')
    axes[1,0].set_ylabel('Accuracy')
    axes[1,0].set_ylim(0, 1)
    
    # Add value labels on bars
    for bar, value in zip(bars, final_betas):
        axes[1,0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
                      f'{value:.3f}', ha='center', va='bottom')
    
    # Performance improvement over epochs
    axes[1,1].plot(df['epoch'], df['beta_0'], label='β₀ (Exact)', linewidth=3)
    axes[1,1].plot(df['epoch'], df['beta_1'], label='β₁ (≤1 error)', linewidth=2)
    axes[1,1].set_title(f'{model_name} - Key Performance Metrics')
    axes[1,1].set_xlabel('Epoch')
    axes[1,1].set_ylabel('Accuracy')
    axes[1,1].legend()
    axes[1,1].grid(True)
    
    plt.tight_layout()
    plt.show()
    
    # Print final results summary
    print(f"\n{model_name} - Final Results Summary:")
    print(f"{'Metric':<16} {'Value':<10} {'Percentage':<12}")
    print("-" * 40)
    for i in range(5):
        value = df[f'beta_{i}'].iloc[-1]
        label = f"β{i} (≤{i} errors)"
        print(f"{label:<16} {value:<10.4f} {value*100:.2f}%")

# Plot Binary LSTM results
plot_training_history(binary_history, "Binary LSTM (Murat et al.)")

# TASK #6: PREPARE FOR ADDITIONAL MODELS

## Next Steps:
1. **Dual Loss LSTM** - Enhanced version with both p and q prediction
2. **Enhanced Transformer** - Using 125D mathematical features
3. **GAN-based Factorization** - Adversarial approach for factor generation
4. **Model Comparison** - Comprehensive comparison across all architectures

Each model will be evaluated using the same β-metrics framework for fair comparison.
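
One possible shape for that side-by-side comparison is sketched below. It assumes every model stores its results in the same layout as the `model_results` dictionary built for the Binary LSTM (a `model_name` plus `final_metrics` with `beta_0` to `beta_4`). The helper names and the numbers in `demo_results` are placeholders for illustration, not measured results.

```python
def comparison_rows(model_results, num_betas=5):
    """Build one row per model: [name, beta_0, ..., beta_{num_betas-1}]."""
    rows = []
    for key, result in model_results.items():
        final = result.get("final_metrics") or {}
        name = result.get("model_name", key)
        rows.append([name] + [final.get(f"beta_{i}") for i in range(num_betas)])
    # Rank by exact-match accuracy (beta_0); models without metrics go last
    rows.sort(key=lambda row: -1.0 if row[1] is None else row[1], reverse=True)
    return rows


def format_comparison(rows, num_betas=5):
    """Render the rows as a fixed-width text table."""
    header = f"{'Model':<28}" + "".join(
        f"{'beta_' + str(i):>9}" for i in range(num_betas)
    )
    lines = [header, "-" * len(header)]
    for name, *betas in rows:
        cells = "".join(
            f"{'n/a':>9}" if b is None else f"{b:>9.4f}" for b in betas
        )
        lines.append(f"{name:<28}{cells}")
    return "\n".join(lines)


# Placeholder numbers for illustration only, NOT measured results
demo_results = {
    "binary_lstm": {
        "model_name": "Binary LSTM",
        "final_metrics": {f"beta_{i}": 0.10 * (i + 1) for i in range(5)},
    },
    "dual_lstm": {
        "model_name": "Dual Loss LSTM",
        "final_metrics": {f"beta_{i}": 0.15 * (i + 1) for i in range(5)},
    },
    "transformer": {"model_name": "Enhanced Transformer", "final_metrics": {}},
}

demo_rows = comparison_rows(demo_results)
print(format_comparison(demo_rows))
```

Models that have not been trained yet show `n/a`, so the table can be regenerated as each model is added.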

In [None]:
# Save Binary LSTM results
binary_results = {
    'model_name': 'Binary LSTM (Murat et al.)',
    'approach': 'Binary sequence prediction with LSTM',
    'training_history': binary_history,
    'final_metrics': binary_history[-1] if binary_history else {},
    'dataset_scale': scale,
    'training_samples': len(binary_train_dataset),
    'test_samples': len(binary_test_dataset),
    'model_parameters': sum(p.numel() for p in binary_model.parameters()),
    'timestamp': datetime.now().isoformat()
}

# Store for comparison
model_results = {'binary_lstm': binary_results}

print("Binary LSTM results stored in memory for comparison.")
print(f"Model achieved:")
if binary_history:
    final = binary_history[-1]
    print(f"  β₀: {final['beta_0']:.4f} ({final['beta_0']*100:.2f}%)")
    print(f"  β₁: {final['beta_1']:.4f} ({final['beta_1']*100:.2f}%)")
    print(f"  β₂: {final['beta_2']:.4f} ({final['beta_2']*100:.2f}%)")

# TASK #7: SAGEMAKER EXPERIMENT TRACKING SETUP

Set up SageMaker Experiments for systematic tracking of all model training runs.

In [None]:
# SageMaker Experiments setup
# In the SageMaker Python SDK (v2.123+), runs are tracked through the public Run class;
# Trial and TrialComponent are internal to the SDK and are not imported directly.
from sagemaker.experiments.experiment import Experiment
from sagemaker.experiments.run import Run
from sagemaker.analytics import ExperimentAnalytics

# Create experiment
experiment_name = f"rsa-ml-attack-{datetime.now().strftime('%Y-%m-%d-%H-%M')}"

try:
    experiment = Experiment.create(
        experiment_name=experiment_name,
        description="RSA Semiprime Factorization using Neural Networks - Comparison Study",
        sagemaker_session=sagemaker_session
    )
    print(f"Experiment created: {experiment_name}")
except Exception as e:
    print(f"Experiment might already exist: {e}")
    experiment = Experiment.load(experiment_name=experiment_name, sagemaker_session=sagemaker_session)

# Function to log metrics to SageMaker
def log_metrics_to_sagemaker(run_name, metrics_dict, model_name):
    """Log a model's metrics to a SageMaker Experiments run"""
    try:
        with Run(
            experiment_name=experiment_name,
            run_name=run_name,
            sagemaker_session=sagemaker_session
        ) as run:
            run.log_parameter("model_name", model_name)
            for metric_name, value in metrics_dict.items():
                run.log_metric(name=metric_name, value=value)
        print(f"Metrics logged to run: {run_name}")
    except Exception as e:
        print(f"Error logging metrics: {e}")

print("SageMaker Experiments tracking setup complete.")

# TASK #8: CONTINUE WITH REMAINING MODELS

**Instructions for completing the notebook:**

1. **Implement the remaining models** in new cells:
   - Dual Loss LSTM (predicts both p and q)
   - Enhanced Transformer (125D features)
   - GAN-based Factorization

2. **Follow the same pattern** as Binary LSTM:
   - Dataset creation
   - Model definition
   - Training function
   - Evaluation with β-metrics
   - Visualization

3. **Final comparison section** will compare all models side-by-side

4. **Results will be saved to S3** for the research paper
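
A minimal sketch of the S3 export step, assuming the results live in a plain dictionary such as `model_results`. The helper names and the object key in the usage comment are hypothetical. `boto3` is imported inside the upload function so the serializer can be checked without AWS credentials.

```python
import json


def results_to_json(results):
    """Serialize a results dictionary to a JSON string.

    default=str converts values json cannot encode natively
    (for example datetime objects) to their string form.
    """
    return json.dumps(results, indent=2, sort_keys=True, default=str)


def upload_results(results, bucket, key):
    """Upload serialized results to s3://bucket/key (requires AWS credentials)."""
    import boto3  # imported lazily so serialization works without AWS access

    body = results_to_json(results).encode("utf-8")
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)
    return f"s3://{bucket}/{key}"


# Local check of the serializer with placeholder content (no upload performed)
demo_payload = {"binary_lstm": {"dataset_scale": "small", "training_history": []}}
demo_json = results_to_json(demo_payload)
print(demo_json)

# Hypothetical usage inside the notebook (the key name is an assumption):
# upload_results(model_results, bucket, f"{prefix}/results/model_results.json")
```

Writing one JSON object per notebook run keeps the training history and final β-metrics together, which makes later comparison across dataset scales easier.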

**Ready to continue with the next model implementation!**