# Word Permutation Analysis - GPU Accelerated

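This notebook uses CUDA GPU acceleration to analyze **ALL** 5-, 6-, and 7-letter permutations (ordered selections of distinct letters from the 26-letter alphabet) and count the valid dictionary words for each one.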

## Hardware Requirements
- NVIDIA GPU with CUDA 12.x support
- Recommended: H100 (80GB) or A100 (80GB)
- With 8× H100: can process all 3.3 billion 7-letter permutations (~86 GB of GPU memory in total, more than a single 80 GB card holds at once)

## Setup
The first code cell installs all required dependencies. It calls `uv pip install`, so `uv` must already be available in the environment.

## Step 1: Install All Dependencies

This cell installs:
- **Base dependencies**: pandas, numpy, polars, jupyter, notebook, ipykernel, matplotlib, seaborn, kaggle
- **GPU dependencies**: cupy-cuda12x, numba, cudf-cu12
- **word-anal package**: editable install of this project (run the notebook from the project root so that `-e .` resolves)

**Note**: This may take 5-10 minutes on first run.

In [None]:
# Install base dependencies
!uv pip install pandas numpy polars jupyter notebook ipykernel matplotlib seaborn kaggle

# Install GPU acceleration libraries
!uv pip install cupy-cuda12x numba cudf-cu12

# Install word-anal package in editable mode
!uv pip install -e .

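# Shell commands run with "!" do not raise on failure, so this only marks the end of the cell
print("\n✅ Install commands finished. Check the output above for errors.")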

## Step 2: Verify GPU Setup

Check that CUDA is available and detect all GPUs.
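
On a machine without CuPy or a working CUDA driver, the calls in the next cell are likely to raise rather than report zero devices. A minimal defensive sketch, assuming a hypothetical helper named `count_gpus` (not part of `word_anal`), returns 0 in those cases:

```python
def count_gpus() -> int:
    """Return the number of visible CUDA devices, or 0 if none can be queried.

    Hypothetical helper for illustration; not part of word_anal.
    """
    try:
        import cupy as cp  # imported lazily so the check also works without CuPy
        return int(cp.cuda.runtime.getDeviceCount())
    except Exception:
        # Covers a missing CuPy install as well as CUDA runtime errors (no driver or GPU).
        return 0


n = count_gpus()
print(f"Usable GPUs: {n}" if n else "No usable CUDA GPU detected")
```

Running this first makes it clear whether the rest of the notebook can proceed before any GPU-specific import fails.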

In [None]:
import cupy as cp
import numpy as np
from numba import cuda

# Check CUDA availability
print("CUDA Available:", cuda.is_available())
print("\nGPU Devices:")

n_gpus = cp.cuda.runtime.getDeviceCount()
print(f"Number of GPUs: {n_gpus}")

for i in range(n_gpus):
    with cp.cuda.Device(i):
        props = cp.cuda.runtime.getDeviceProperties(i)
        total_mem = cp.cuda.runtime.memGetInfo()[1]
        print(f"\nGPU {i}:")
        print(f"  Name: {props['name'].decode()}")
        print(f"  Total Memory: {total_mem / (1024**3):.2f} GB")
        print(f"  Compute Capability: {props['major']}.{props['minor']}")

## Step 3: Download Kaggle Dataset

Download the English dictionary dataset.
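
The cell below falls back to placeholder strings when the environment variables are unset, so a missing credential would presumably surface only once the download is attempted. A small sketch, assuming a hypothetical helper `load_kaggle_credentials` (not part of `word_anal.kaggle_helper`), fails fast instead:

```python
import os
from typing import Mapping


def load_kaggle_credentials(env: Mapping[str, str] = os.environ) -> dict:
    """Read KAGGLE_USERNAME / KAGGLE_KEY and fail early if either is unset.

    Hypothetical helper for illustration; not part of word_anal.
    """
    missing = [name for name in ("KAGGLE_USERNAME", "KAGGLE_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"username": env["KAGGLE_USERNAME"], "key": env["KAGGLE_KEY"]}


# Example with an explicit mapping so the sketch runs without real credentials.
creds = load_kaggle_credentials({"KAGGLE_USERNAME": "demo", "KAGGLE_KEY": "secret"})
print(sorted(creds))
```

The returned dictionary uses the same `username` and `key` fields as `KAGGLE_CREDENTIALS` in the cell below.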

In [None]:
from word_anal.kaggle_helper import get_dictionary_dataset
import os

# Load credentials from environment variables (recommended)
# Or set them directly here for testing (DO NOT commit!)
KAGGLE_CREDENTIALS = {
    "username": os.getenv("KAGGLE_USERNAME", "YOUR_USERNAME"),
    "key": os.getenv("KAGGLE_KEY", "YOUR_API_KEY")
}

# Download dataset
csv_path = get_dictionary_dataset(
    credentials=KAGGLE_CREDENTIALS,
    download_path="data",
    force=False
)

print(f"\nDataset ready at: {csv_path}")

## Step 4: Initialize GPU Analyzer

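Choose between single-GPU and multi-GPU mode. Multi-GPU mode is used only when `USE_MULTI_GPU` is `True` and more than one GPU is detected; otherwise the notebook falls back to GPU 0.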
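
How `MultiGPUAnalyzer` divides the work is not shown in this notebook. One common approach, sketched here as an assumption rather than the package's actual strategy, is to give each GPU a contiguous, near-equal slice of the permutation index range (`split_range` is a hypothetical helper):

```python
def split_range(total: int, n_parts: int) -> list[tuple[int, int]]:
    """Split range(total) into n_parts contiguous (start, stop) slices.

    Slice sizes differ by at most one. Hypothetical helper, not word_anal API.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(total, n_parts)
    slices, start = [], 0
    for part in range(n_parts):
        # The first `extra` slices take one additional index each.
        stop = start + base + (1 if part < extra else 0)
        slices.append((start, stop))
        start = stop
    return slices


# 7-letter permutations spread over 8 GPUs.
for gpu_id, (start, stop) in enumerate(split_range(3_315_312_000, 8)):
    print(f"GPU {gpu_id}: indices [{start:,}, {stop:,})")
```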

In [None]:
from word_anal.gpu_analyzer import GPUWordPermutationAnalyzer
from word_anal.multi_gpu import MultiGPUAnalyzer, get_available_gpus

# Configuration
WORDS_CSV_PATH = "data/dict.csv"  # should match csv_path returned in Step 3
WORD_COLUMN = "word"
WORD_LENGTHS = [5, 6, 7]  # Analyze 5, 6, and 7-letter permutations

# GPU Configuration
USE_MULTI_GPU = True  # Set to False for single GPU
N_GPUS = None  # None = use all available GPUs

print(f"Available GPUs: {get_available_gpus()}")

if USE_MULTI_GPU and get_available_gpus() > 1:
    print("\nInitializing Multi-GPU Analyzer...")
    analyzer = MultiGPUAnalyzer(
        words_csv_path=WORDS_CSV_PATH,
        word_column=WORD_COLUMN,
        n_gpus=N_GPUS
    )
    mode = "multi-GPU"
else:
    print("\nInitializing Single-GPU Analyzer...")
    analyzer = GPUWordPermutationAnalyzer(
        words_csv_path=WORDS_CSV_PATH,
        word_column=WORD_COLUMN,
        gpu_id=0
    )
    mode = "single-GPU"

print(f"\nAnalyzer ready in {mode} mode!")

## Step 5: Run Analysis on ALL Permutations

This step processes:
- **5-letter**: 7,893,600 permutations (~205 MB GPU memory)
- **6-letter**: 165,765,600 permutations (~4.3 GB GPU memory)
- **7-letter**: 3,315,312,000 permutations (~86 GB GPU memory)

**Total**: 3,488,971,200 permutations (about 3.49 billion)
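
These counts are the number of ordered selections of k distinct letters from 26, that is 26!/(26-k)!. The sketch below reproduces them together with the memory figures, assuming roughly 26 bytes per permutation (a value inferred from the estimates above, not taken from the package) and the 50-million batch size used in the next cell:

```python
import math

ALPHABET_SIZE = 26
BYTES_PER_PERM = 26        # assumption: inferred from ~205 MB / 7,893,600
BATCH_SIZE = 50_000_000    # batch size passed to analyze_all_lengths below

total = 0
for k in (5, 6, 7):
    count = math.perm(ALPHABET_SIZE, k)      # 26! / (26 - k)!
    mem_gb = count * BYTES_PER_PERM / 1e9    # decimal gigabytes
    batches = math.ceil(count / BATCH_SIZE)  # batches needed at BATCH_SIZE each
    total += count
    print(f"{k}-letter: {count:>13,} permutations, ~{mem_gb:6.2f} GB, {batches} batches")

print(f"Total: {total:,}")
```

Under this assumption the 7-letter case needs more memory than a single 80 GB card provides, so it has to be processed in batches or spread over several GPUs.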

In [None]:
import time

print("Starting FULL permutation analysis...")
print("This may take several minutes depending on your GPU.")
print("\n" + "="*70)

start_time = time.time()

# Run analysis for all lengths
if USE_MULTI_GPU and isinstance(analyzer, MultiGPUAnalyzer):
    results = analyzer.analyze_all_lengths(
        lengths=WORD_LENGTHS,
        batch_size_per_gpu=50_000_000  # 50M permutations per batch
    )
else:
    results = analyzer.analyze_all_lengths(
        lengths=WORD_LENGTHS,
        batch_size=50_000_000  # 50M permutations per batch
    )

elapsed_time = time.time() - start_time

print("\n" + "="*70)
print(f"Analysis Complete! Total time: {elapsed_time:.2f} seconds ({elapsed_time/60:.2f} minutes)")
print("="*70)
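
Batches of this size can be addressed without storing the permutations themselves if each permutation is identified by an integer rank and decoded on demand. The sketch below shows that idea in plain Python for lexicographic order. It is illustrative only: `unrank_permutation` is a hypothetical name, and `word_anal`'s CUDA kernels may enumerate permutations differently.

```python
import math
import string


def unrank_permutation(index: int, k: int, alphabet: str = string.ascii_lowercase) -> str:
    """Return the index-th k-letter permutation of alphabet in lexicographic order.

    Hypothetical helper for illustration; not part of word_anal.
    """
    n = len(alphabet)
    if not 0 <= index < math.perm(n, k):
        raise IndexError("index out of range")
    pool = list(alphabet)  # letters not used yet, in alphabet order
    letters = []
    for pos in range(k):
        # Number of ways to fill the remaining positions once this letter is fixed.
        block = math.perm(n - pos - 1, k - pos - 1)
        choice, index = divmod(index, block)
        letters.append(pool.pop(choice))
    return "".join(letters)


# First and last 5-letter permutations, then the first index of a second 50M batch.
print(unrank_permutation(0, 5))
print(unrank_permutation(math.perm(26, 5) - 1, 5))
print(unrank_permutation(50_000_000, 7))
```

With a decoder like this, a batch is fully described by a `(start, stop)` pair of ranks, so only the word counts need to be written back.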

## Step 6: View Results Summary

In [None]:
import pandas as pd

# Create summary table
summary_data = []
for length, result in results.items():
    summary_data.append({
        'Word Length': length,
        'Total Permutations': f"{result['total_permutations']:,}",
        'Mean Words/Perm': f"{result['mean_words']:.2f}",
        'Median': f"{result['median_words']:.0f}",
        'Std Dev': f"{result['std_words']:.2f}",
        'Min': result['min_words'],
        'Max': result['max_words']
    })

summary_df = pd.DataFrame(summary_data)
print("\n" + "="*70)
print("SUMMARY STATISTICS")
print("="*70)
print(summary_df.to_string(index=False))

# Total permutations analyzed and overall throughput
total_perms = sum(r['total_permutations'] for r in results.values())
print(f"\nTotal permutations analyzed: {total_perms:,}")
print(f"Processing rate: {total_perms / elapsed_time:,.0f} permutations/second")

## Step 7: Create DataFrames for Visualization

Convert GPU results to pandas DataFrames for visualization. Lengths whose per-permutation results were not retained (`results` is `None`) are skipped.
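
At 3.3 billion rows, a per-permutation DataFrame for the 7-letter results is large: a single 64-bit integer column alone takes about 26.5 GB. Because word counts are non-negative integers with (presumably) few distinct values, the same summary statistics can be computed from a frequency table instead. A minimal sketch, assuming a hypothetical helper `stats_from_histogram` (not part of `word_anal`):

```python
import math
from collections import Counter


def stats_from_histogram(hist: dict) -> dict:
    """Compute mean, median, population std, min and max from {value: frequency}.

    Hypothetical helper for illustration; not part of word_anal.
    """
    hist = {v: c for v, c in hist.items() if c > 0}  # ignore empty bins
    n = sum(hist.values())
    if n == 0:
        raise ValueError("empty histogram")
    mean = sum(v * c for v, c in hist.items()) / n
    var = sum(c * (v - mean) ** 2 for v, c in hist.items()) / n

    def value_at(rank: int) -> int:
        # Value at 0-based position `rank` in the sorted, expanded data.
        seen = 0
        for v in sorted(hist):
            seen += hist[v]
            if rank < seen:
                return v
        raise IndexError(rank)

    # Average of the two middle values (they coincide when n is odd).
    median = (value_at((n - 1) // 2) + value_at(n // 2)) / 2
    return {"mean": mean, "median": median, "std": math.sqrt(var),
            "min": min(hist), "max": max(hist)}


# Toy data standing in for per-permutation word counts.
hist = Counter([0, 0, 1, 3])
print(stats_from_histogram(hist))
```

The standard deviation here is the population form (divide by n); whether the analyzers report the population or sample form is not stated in this notebook.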

In [None]:
# Convert results to DataFrames
dfs = {}

for length, result in results.items():
    if result['results'] is not None:
        df = pd.DataFrame({
            'word_count': result['results']
        })
        dfs[length] = df
        print(f"{length}-letter DataFrame: {len(df):,} rows")

print("\nDataFrames created successfully!")

## Step 8: Visualize Results

Plot a histogram of valid-word counts for each word length, then a box plot comparing the lengths.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Distribution comparison
fig, axes = plt.subplots(1, len(WORD_LENGTHS), figsize=(18, 5))
if len(WORD_LENGTHS) == 1:
    axes = [axes]

for idx, word_length in enumerate(WORD_LENGTHS):
    if word_length in dfs:
        df = dfs[word_length]
        axes[idx].hist(df['word_count'], bins=100, alpha=0.7, edgecolor='black')
        axes[idx].set_xlabel('Number of Valid Words', fontsize=12)
        axes[idx].set_ylabel('Frequency', fontsize=12)
        axes[idx].set_title(f'{word_length}-Letter Word Distribution\n({len(df):,} permutations)', fontsize=14)
        axes[idx].grid(True, alpha=0.3)
        
        # Add mean line
        mean_val = df['word_count'].mean()
        axes[idx].axvline(mean_val, color='red', linestyle='--', linewidth=2, label=f'Mean: {mean_val:.2f}')
        axes[idx].legend()

plt.tight_layout()
plt.savefig('gpu_distribution_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("Saved: gpu_distribution_comparison.png")

In [None]:
# Box plot comparison
fig, ax = plt.subplots(figsize=(12, 7))

data_for_boxplot = []
labels = []
for word_length in WORD_LENGTHS:
    if word_length in dfs:
        data_for_boxplot.append(dfs[word_length]['word_count'])
        labels.append(f"{word_length}-letter\n({len(dfs[word_length]):,} perms)")

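# Set tick labels separately: boxplot's `labels` argument is deprecated as of Matplotlib 3.9
bp = ax.boxplot(data_for_boxplot, patch_artist=True)
ax.set_xticklabels(labels)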

# Color the boxes
colors = ['lightblue', 'lightgreen', 'lightcoral']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)

ax.set_ylabel('Number of Valid Words per Permutation', fontsize=12)
ax.set_xlabel('Word Length', fontsize=12)
ax.set_title('Distribution Comparison: Valid Words per Permutation (ALL Permutations)', fontsize=14)
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('gpu_boxplot_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("Saved: gpu_boxplot_comparison.png")

## Step 9: Generate Interactive D3.js Visualization

In [None]:
from word_anal.data_processing import DataProcessor
from word_anal.visualizations import VisualizationGenerator

# Initialize data processor
processor = DataProcessor()

# Add results for each word length
for word_length in WORD_LENGTHS:
    if word_length in dfs:
        # Add placeholder for permutation and words columns
        df = dfs[word_length].copy()
        df['permutation'] = ''  # Not needed for visualization
        df['words'] = ''  # Not needed for visualization
        processor.add_results(word_length, df)

# Generate interactive visualization
viz_gen = VisualizationGenerator(processor)
viz_gen.generate_html(output_path="word_analysis_gpu_full.html")

print("\nInteractive visualization saved to: word_analysis_gpu_full.html")
print("Open this file in a web browser to view the interactive D3.js visualizations.")

## Step 10: Export Results to CSV

In [None]:
# Export summary statistics
summary_df.to_csv('gpu_analysis_summary.csv', index=False)
print("Exported summary to: gpu_analysis_summary.csv")

# Export full results (optional - these files will be large!)
export_full = False  # Set to True to export all permutation results

if export_full:
    for word_length in WORD_LENGTHS:
        if word_length in dfs:
            filename = f"gpu_results_{word_length}letter_all_permutations.csv"
            dfs[word_length].to_csv(filename, index=False)
            print(f"Exported {word_length}-letter results to: {filename}")
else:
    print("\nFull results not exported (set export_full=True to export)")
    print("Warning: Full exports can be very large (7-letter = 3.3B rows!)")
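
To put the warning in numbers: a CSV with one integer per row needs at least one digit plus a line ending per row. A rough lower-bound estimate, assuming single-digit counts, one-character line endings, and a hypothetical helper `estimate_csv_bytes`:

```python
def estimate_csv_bytes(rows: int, digits_per_value: int = 1, header: str = "word_count") -> int:
    """Lower-bound size of a one-column CSV: header line plus one value line per row.

    Hypothetical helper; assumes every value has the same number of digits
    and every line ends with a single newline character.
    """
    return len(header) + 1 + rows * (digits_per_value + 1)


for k, rows in {5: 7_893_600, 6: 165_765_600, 7: 3_315_312_000}.items():
    size = estimate_csv_bytes(rows)
    print(f"{k}-letter: at least {size / 1e9:.2f} GB")
```

Multi-digit counts or two-character line endings push the real size higher.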

## Step 11: GPU Memory Usage

In [None]:
# Check GPU memory usage
if USE_MULTI_GPU and isinstance(analyzer, MultiGPUAnalyzer):
    analyzer.get_gpu_memory_info()
else:
    with cp.cuda.Device(0):
        mempool = cp.get_default_memory_pool()
        # memGetInfo() returns device-wide (free, total) bytes
        free, total = cp.cuda.runtime.memGetInfo()
        # used_bytes() covers only memory held by live CuPy allocations
        pool_used = mempool.used_bytes()

        print("GPU 0 Memory:")
        print(f"  Total: {total / (1024**3):.2f} GB")
        print(f"  Used (CuPy pool): {pool_used / (1024**3):.2f} GB")
        print(f"  Free (device): {free / (1024**3):.2f} GB")

## Summary

This notebook processed **ALL** permutations using GPU acceleration:
- Analyzed about 3.49 billion permutations in batches of 50 million (runtime depends on the GPU; see the elapsed time printed in Step 5)
- Used the custom CUDA kernels provided by the `word_anal` analyzers
- Generated summary statistics, static plots, and an interactive visualization

**Key Results:**
- Total permutations analyzed: 3,488,971,200 (about 3.49 billion)
- Processing speed: see the permutations/second figure printed in Step 6
- Complete distribution analysis for 5-, 6-, and 7-letter permutations