# K-Means Clustering — PyTorch (GPU-Accelerated)

A K-Means implementation built on PyTorch tensors. It follows the same Lloyd's algorithm as the No-Framework version, but the distance computation and centroid updates run on the GPU when CUDA is available (the code falls back to the CPU otherwise).

**Dataset**: Dry Beans — 13,543 samples, 16 geometric features, 7 bean types.

## PyTorch Advantages for K-Means
- **`torch.cdist`**: GPU-accelerated pairwise distance computation
- **`torch.argmin`**: Parallel cluster assignment across all samples
- **Boolean masking on GPU**: Efficient centroid updates
- **`torch.multinomial`**: GPU-native weighted random sampling for K-Means++ init
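
As a rough illustration of how these four pieces can fit together, here is a minimal sketch of K-Means++ seeding followed by Lloyd iterations. The function names (`kmeans_pp_init`, `lloyd_step`, `kmeans`) and the synthetic two-blob data are hypothetical and chosen for this example only; this is not necessarily how the notebook's own implementation is structured.

```python
import torch

def kmeans_pp_init(X, k):
    """K-Means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    first = torch.randint(X.shape[0], (1,), device=X.device)
    centroids = X[first]                                   # (1, D)
    for _ in range(1, k):
        # Squared distance from every point to its closest current centroid
        d2 = torch.cdist(X, centroids).pow(2).min(dim=1).values
        # torch.multinomial accepts unnormalized, non-negative weights
        nxt = torch.multinomial(d2, num_samples=1)
        centroids = torch.cat([centroids, X[nxt]], dim=0)
    return centroids

def lloyd_step(X, centroids):
    """One Lloyd iteration: assign points, then recompute centroids."""
    dists = torch.cdist(X, centroids)                      # (N, K) pairwise distances
    labels = torch.argmin(dists, dim=1)                    # (N,) nearest centroid index
    new_centroids = centroids.clone()
    for j in range(centroids.shape[0]):
        mask = labels == j                                 # boolean mask for cluster j
        if mask.any():                                     # keep old centroid if cluster is empty
            new_centroids[j] = X[mask].mean(dim=0)
    return new_centroids, labels

def kmeans(X, k, max_iter=300, tol=1e-4):
    """Iterate until the largest centroid shift falls below tol."""
    centroids = kmeans_pp_init(X, k)
    for _ in range(max_iter):
        new_centroids, labels = lloyd_step(X, centroids)
        shift = (new_centroids - centroids).norm(dim=1).max()
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels

# Synthetic demo data (an assumption for illustration, not the Dry Beans set)
torch.manual_seed(0)
demo_device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
X_demo = torch.cat([torch.randn(100, 2) + 10.0,
                    torch.randn(100, 2) - 10.0]).to(demo_device)
demo_centroids, demo_labels = kmeans(X_demo, k=2)
```

The per-cluster Python loop is short because K is small; only the distance matrix and the `argmin` scale with the number of samples, and those are the parts that benefit most from running on the GPU.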

## PyTorch-Specific Showcases
- **`torch.compile`**: JIT-compiles K-Means into optimized GPU kernels (PyTorch 2.0+)
- **`torch.vmap`**: Vectorizes n_init runs to execute in parallel on GPU instead of sequentially
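
To make the `torch.vmap` idea concrete, the sketch below vectorizes one Lloyd step over several initializations. It is a simplified illustration under stated assumptions: the helper name `lloyd_step_dense`, the synthetic data, and the fixed iteration count are hypothetical. Boolean masking produces data-dependent shapes, which `vmap` cannot batch, so this version swaps in a dense one-hot matrix product for the centroid update.

```python
import torch

def lloyd_step_dense(X, C):
    """One Lloyd step written with fixed-shape ops only, so vmap can batch it."""
    K = C.shape[0]
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)   # (N, K) squared distances
    labels = d2.argmin(dim=1)                              # (N,)
    # One-hot membership matrix instead of boolean masking
    onehot = (labels[:, None] == torch.arange(K, device=X.device)[None, :]).to(X.dtype)
    sums = onehot.t() @ X                                  # (K, D) per-cluster sums
    counts = onehot.sum(dim=0)                             # (K,) cluster sizes
    means = sums / counts.clamp(min=1)[:, None]
    # Keep the previous centroid wherever a cluster received no points
    new_C = torch.where(counts[:, None] > 0, means, C)
    return new_C, labels

# Map over the leading axis of the centroids; X is shared by every run
batched_step = torch.vmap(lloyd_step_dense, in_dims=(None, 0))

# Synthetic demo data (an assumption for illustration)
torch.manual_seed(0)
demo_device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
n_init, K, N, D = 5, 3, 300, 4
X_demo = torch.randn(N, D, device=demo_device)

# One random set of K starting centroids per run: shape (n_init, K, D)
idx = torch.stack([torch.randperm(N, device=demo_device)[:K] for _ in range(n_init)])
C0 = X_demo[idx]

# Fixed number of steps: per-run early stopping does not vectorize directly
C = C0
for _ in range(20):
    C, labels = batched_step(X_demo, C)

# Inertia of each run, then pick the best initialization
runs = torch.arange(n_init, device=demo_device)[:, None]
inertia = ((X_demo[None] - C[runs, labels]) ** 2).sum(-1).sum(-1)   # (n_init,)
best_run = int(inertia.argmin())
best_centroids = C[best_run]
```

The trade-off is that every run executes the same number of iterations, so runs that converge early do some redundant work in exchange for a single batched kernel launch per step.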


In [2]:
# Imports and configuration

# standard libraries
import sys
import os
import numpy as np

# pytorch for gpu computation
import torch

# add project root to path
sys.path.append(os.path.abspath('../..'))

# project utilities
from utils.data_loader import load_processed_data
from utils.metrics import inertia, silhouette_score, adjusted_rand_index
from utils.visualization import (plot_elbow_curve, plot_silhouette_comparison,
                                  plot_silhouette_analysis, plot_convergence_curve)
from utils.performance import track_performance
from utils.results import save_results, add_result, print_comparison

# Constants shared across all framework implementations
RANDOM_STATE = 113
K_RANGE = range(2, 13)        # Test K=2 through K=12
MAX_ITER = 300                 # Maximum iterations per run
TOL = 1e-4                     # Convergence tolerance (max centroid shift)
N_INIT = 5                     # Number of random initializations
FRAMEWORK = 'PyTorch'

# GPU detection
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
if device.type == 'cuda':
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Data loading
X_train, X_test, y_train, y_test, metadata = load_processed_data('kmeans')

print("=" * 60)
print(f"Dataset: {metadata['dataset']}")
print(f"Train: {X_train.shape[0]} samples, {X_train.shape[1]} features")
print(f"Test: {X_test.shape[0]} samples")
print(f"Classes: {metadata['n_classes']} bean types")
print("=" * 60)

# Convert to float32 tensors on the selected device (CUDA when available);
# all K-Means computation below operates on these tensors
X_train_t = torch.tensor(X_train, dtype=torch.float32, device=device)
X_test_t = torch.tensor(X_test, dtype=torch.float32, device=device)

print(f"\nTensors on: {X_train_t.device}")
print(f"X_train_t: {X_train_t.shape}")
print(f"X_test_t:  {X_test_t.shape}")

Using device: cuda
GPU: NVIDIA GeForce RTX 4090
VRAM: 25.8 GB
Dataset: Dry Beans (UCI ML Repository)
Train: 10834 samples, 16 features
Test: 2709 samples
Classes: 7 bean types

Tensors on: cuda:0
X_train_t: torch.Size([10834, 16])
X_test_t:  torch.Size([2709, 16])
