**File Location**: `notebooks/02_dice.ipynb`

# Dice Rolling Simulation and Analysis

## Introduction

This notebook explores probability and statistics through dice rolling simulations. We'll simulate single dice, multiple dice, and different dice types (fair and biased six-sided dice, plus D4, D8, D12, and D20). The analysis covers probability distributions, the law of large numbers, the central limit theorem, and comparisons between theoretical and empirical results.

Dice simulations are excellent for understanding fundamental probability concepts, Monte Carlo methods, and statistical convergence. We'll create both static and interactive visualizations to explore these concepts thoroughly.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import yaml
from pathlib import Path
from scipy import stats
import seaborn as sns

# Import our custom modules
from src.generators.dice import DiceGenerator
from src.plots.dice_mpl import DiceMatplotlib
from src.plots.dice_plotly import DicePlotly
from src.utils.io import save_data, load_data
from src.utils.theming import get_plot_theme

# Load configuration
config_path = Path('config/dice.yaml')
with open(config_path, 'r') as file:
    config = yaml.safe_load(file)

print("Dice Simulation Configuration:")
for key, value in config.items():
    print(f"  {key}: {value}")

# Initialize generator and plotting classes
dice_generator = DiceGenerator(config)
mpl_plotter = DiceMatplotlib(config)
plotly_plotter = DicePlotly(config)

# Set random seed for reproducibility
np.random.seed(config.get('random_seed', 42))
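
The cells in this notebook read only two keys from `config/dice.yaml`: `n_rolls` (required) and `random_seed` (optional, defaulting to 42). As a rough sketch, assuming the file holds a flat mapping, a small check like the one below could catch a missing or invalid `n_rolls` before any data is generated. The helper name `check_dice_config` and the example values are hypothetical and not part of the project.

```python
def check_dice_config(config):
    """Hypothetical helper: validate the two keys this notebook reads."""
    if "n_rolls" not in config:
        raise KeyError("config must define 'n_rolls'")
    n_rolls = config["n_rolls"]
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(n_rolls, bool) or not isinstance(n_rolls, int) or n_rolls <= 0:
        raise ValueError("'n_rolls' must be a positive integer")
    seed = config.get("random_seed", 42)  # same default as the notebook
    return n_rolls, seed


# Example values are illustrative, not taken from the real config file
example_config = {"n_rolls": 10000, "random_seed": 123}
n_rolls, seed = check_dice_config(example_config)
print(n_rolls, seed)
```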

## Data Generation

In [None]:
# Generate various dice rolling scenarios

# Single fair 6-sided die
single_die_rolls = dice_generator.roll_single_die(
    n_rolls=config['n_rolls'],
    sides=6
)

# Two dice (sum analysis)
two_dice_rolls = dice_generator.roll_multiple_dice(
    n_rolls=config['n_rolls'],
    n_dice=2,
    sides=6
)

# Multiple dice scenarios for central limit theorem
dice_sums = {}
for n_dice in [1, 2, 3, 5, 10]:
    rolls = dice_generator.roll_multiple_dice(
        n_rolls=config['n_rolls'],
        n_dice=n_dice,
        sides=6
    )
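    # Reshape so the sum works whether the generator returns a 1-D array or an (n_rolls, n_dice) array
    dice_sums[f'{n_dice}_dice'] = np.asarray(rolls).reshape(len(rolls), -1).sum(axis=1)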

print("Generated dice rolling data:")
print(f"  - Single die: {len(single_die_rolls)} rolls")
print(f"  - Two dice: {len(two_dice_rolls)} rolls")
print(f"  - Multiple dice scenarios: {list(dice_sums.keys())}")

# Generate biased/loaded dice for comparison
biased_probabilities = [0.1, 0.1, 0.15, 0.15, 0.25, 0.25]  # Favor 5 and 6
biased_die_rolls = dice_generator.roll_biased_die(
    n_rolls=config['n_rolls'],
    probabilities=biased_probabilities
)

# Different sided dice
d4_rolls = dice_generator.roll_single_die(config['n_rolls'], sides=4)
d8_rolls = dice_generator.roll_single_die(config['n_rolls'], sides=8)
d12_rolls = dice_generator.roll_single_die(config['n_rolls'], sides=12)
d20_rolls = dice_generator.roll_single_die(config['n_rolls'], sides=20)

print("Additional dice types generated:")
print(f"  - Biased 6-sided die: {len(biased_die_rolls)} rolls")
print(f"  - D4, D8, D12, D20: {len(d4_rolls)} rolls each")

# Save generated data
data_dir = Path('data/synthetic/dice')
data_dir.mkdir(parents=True, exist_ok=True)

# Save all dice data
dice_data = {
    'single_die': single_die_rolls,
    'biased_die': biased_die_rolls,
    'd4_rolls': d4_rolls,
    'd8_rolls': d8_rolls,
    'd12_rolls': d12_rolls,
    'd20_rolls': d20_rolls,
    'two_dice_sum': np.sum(two_dice_rolls, axis=1),
    'two_dice_die1': two_dice_rolls[:, 0],
    'two_dice_die2': two_dice_rolls[:, 1]
}

# Add multiple dice sums
dice_data.update(dice_sums)

# Convert to DataFrame and save
dice_df = pd.DataFrame({k: pd.Series(v) for k, v in dice_data.items()})
save_data(dice_df, data_dir / 'all_dice_rolls.csv')

print("Dice data saved to data/synthetic/dice/")
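
`DiceGenerator` lives in `src/generators/dice.py`, which is not shown in this notebook. The sketch below is a hypothetical stand-in that illustrates what the three methods used above might do. It assumes that `roll_single_die` returns a 1-D integer array, that `roll_multiple_dice` returns an array of shape `(n_rolls, n_dice)` (consistent with the `two_dice_rolls[:, 0]` indexing and `np.sum(..., axis=1)` calls above), and that biased faces are numbered from 1. The class name `SketchDiceGenerator` and the use of `np.random.default_rng` are assumptions; the real class may differ.

```python
import numpy as np


class SketchDiceGenerator:
    """Hypothetical stand-in for DiceGenerator; the real class may differ."""

    def __init__(self, seed=42):
        self.rng = np.random.default_rng(seed)

    def roll_single_die(self, n_rolls, sides=6):
        # integers() excludes the upper bound, so pass sides + 1
        return self.rng.integers(1, sides + 1, size=n_rolls)

    def roll_multiple_dice(self, n_rolls, n_dice, sides=6):
        # One row per roll, one column per die
        return self.rng.integers(1, sides + 1, size=(n_rolls, n_dice))

    def roll_biased_die(self, n_rolls, probabilities):
        # Face k is drawn with probability probabilities[k - 1]
        faces = np.arange(1, len(probabilities) + 1)
        return self.rng.choice(faces, size=n_rolls, p=probabilities)


gen = SketchDiceGenerator(seed=0)
two = gen.roll_multiple_dice(n_rolls=1000, n_dice=2)
biased = gen.roll_biased_die(1000, [0.1, 0.1, 0.15, 0.15, 0.25, 0.25])
print(two.shape, two.sum(axis=1).min(), two.sum(axis=1).max())
print(np.bincount(biased, minlength=7)[1:] / len(biased))  # empirical face frequencies
```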

## Theoretical vs Empirical Analysis

In [None]:
# Calculate theoretical probabilities for single die
def theoretical_single_die():
    return {i: 1/6 for i in range(1, 7)}

def theoretical_two_dice_sum():
    probs = {}
    for sum_val in range(2, 13):
        if sum_val <= 7:
            ways = sum_val - 1
        else:
            ways = 13 - sum_val
        probs[sum_val] = ways / 36
    return probs

# Calculate empirical probabilities
empirical_single = {i: np.sum(single_die_rolls == i) / len(single_die_rolls) for i in range(1, 7)}
empirical_two_dice = {i: np.sum(np.sum(two_dice_rolls, axis=1) == i) / len(two_dice_rolls) for i in range(2, 13)}

theoretical_single = theoretical_single_die()
theoretical_two_dice = theoretical_two_dice_sum()

print("Single Die - Theoretical vs Empirical:")
for face in range(1, 7):
    print(f"  Face {face}: Theoretical={theoretical_single[face]:.4f}, Empirical={empirical_single[face]:.4f}")
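
The closed-form count in `theoretical_two_dice_sum` (`sum_val - 1` ways up to 7, `13 - sum_val` ways above it) can be checked by brute force. The sketch below enumerates all 36 equally likely outcomes of two fair six-sided dice and compares the counts with that formula. It uses only the standard library and is independent of the project's modules.

```python
from collections import Counter
from itertools import product

# Enumerate all 36 equally likely (die1, die2) outcomes and count each sum
sum_counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(2, 13):
    # Same closed form as theoretical_two_dice_sum()
    closed_form = total - 1 if total <= 7 else 13 - total
    assert sum_counts[total] == closed_form
    print(f"sum={total:2d}  ways={sum_counts[total]}  P={sum_counts[total] / 36:.4f}")
```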

## Matplotlib Visualizations

In [None]:
# Single die frequency analysis
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Frequency histogram
mpl_plotter.plot_frequency_histogram(single_die_rolls, ax=ax1)
ax1.set_title('Single Die Roll Frequencies')

# Theoretical vs Empirical comparison
faces = list(range(1, 7))
theoretical_probs = [theoretical_single[face] for face in faces]
empirical_probs = [empirical_single[face] for face in faces]

x = np.arange(len(faces))
width = 0.35

ax2.bar(x - width/2, theoretical_probs, width, label='Theoretical', alpha=0.8)
ax2.bar(x + width/2, empirical_probs, width, label='Empirical', alpha=0.8)
ax2.set_xlabel('Die Face')
ax2.set_ylabel('Probability')
ax2.set_title('Theoretical vs Empirical Probabilities')
ax2.set_xticks(x)
ax2.set_xticklabels(faces)
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()

# Save before plt.show(): once the figure has been shown, plt.savefig can write a blank image
exports_dir = Path('exports/images')
exports_dir.mkdir(parents=True, exist_ok=True)
fig.savefig(exports_dir / 'dice_single_analysis_mpl.png', dpi=300, bbox_inches='tight')
plt.show()

# Two dice sum distribution
fig, ax = plt.subplots(figsize=(12, 8))
mpl_plotter.plot_two_dice_distribution(two_dice_rolls, ax=ax)
ax.set_title('Two Dice Sum Distribution')
plt.tight_layout()
# Save before plt.show() so the exported image is not blank
fig.savefig(exports_dir / 'dice_two_dice_sum_mpl.png', dpi=300, bbox_inches='tight')
plt.show()

# Central Limit Theorem Demonstration
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flatten()

dice_counts = [1, 2, 3, 5, 10]
for i, n_dice in enumerate(dice_counts):
    if i < len(axes):
        data = dice_sums[f'{n_dice}_dice']
        
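        # Histogram with one unit-width bin centered on each integer sum,
        # so bar heights are probabilities and no bin straddles two sums
        bins = np.arange(data.min() - 0.5, data.max() + 1.5, 1)
        axes[i].hist(data, bins=bins, density=True, alpha=0.7, color='skyblue', edgecolor='black')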
        
        # Overlay normal distribution if n_dice > 1
        if n_dice > 1:
            # Theoretical mean and std for sum of n dice
            mean = n_dice * 3.5
            std = np.sqrt(n_dice * 35/12)  # Variance of single die is 35/12
            
            x = np.linspace(data.min(), data.max(), 100)
            normal_pdf = stats.norm.pdf(x, mean, std)
            axes[i].plot(x, normal_pdf, 'r-', linewidth=2, label='Normal Approximation')
            axes[i].legend()
        
        axes[i].set_title(f'Sum of {n_dice} Dice (n={len(data)})')
        axes[i].set_xlabel('Sum Value')
        axes[i].set_ylabel('Density')
        axes[i].grid(True, alpha=0.3)

# Remove empty subplot
if len(dice_counts) < len(axes):
    fig.delaxes(axes[-1])

plt.tight_layout()
# Save before plt.show() so the exported image is not blank
fig.savefig(exports_dir / 'dice_central_limit_theorem_mpl.png', dpi=300, bbox_inches='tight')
plt.show()

# Fair vs Biased Die Comparison
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18, 6))

# Fair die
faces = list(range(1, 7))
fair_counts = [np.sum(single_die_rolls == face) for face in faces]
ax1.bar(faces, fair_counts, color='skyblue', alpha=0.8)
ax1.set_title('Fair Die Distribution')
ax1.set_xlabel('Face')
ax1.set_ylabel('Frequency')
ax1.grid(True, alpha=0.3)

# Biased die
biased_counts = [np.sum(biased_die_rolls == face) for face in faces]
ax2.bar(faces, biased_counts, color='lightcoral', alpha=0.8)
ax2.set_title('Biased Die Distribution')
ax2.set_xlabel('Face')
ax2.set_ylabel('Frequency')
ax2.grid(True, alpha=0.3)

# Side by side comparison (probabilities)
fair_probs = [count/len(single_die_rolls) for count in fair_counts]
biased_probs = [count/len(biased_die_rolls) for count in biased_counts]

x = np.arange(len(faces))
width = 0.35

ax3.bar(x - width/2, fair_probs, width, label='Fair Die', alpha=0.8, color='skyblue')
ax3.bar(x + width/2, biased_probs, width, label='Biased Die', alpha=0.8, color='lightcoral')
ax3.axhline(y=1/6, color='gray', linestyle='--', alpha=0.8, label='Expected (1/6)')
ax3.set_xlabel('Face')
ax3.set_ylabel('Probability')
ax3.set_title('Fair vs Biased Die Probabilities')
ax3.set_xticks(x)
ax3.set_xticklabels(faces)
ax3.legend()
ax3.grid(True, alpha=0.3)

plt.tight_layout()
# Save before plt.show() so the exported image is not blank
fig.savefig(exports_dir / 'dice_fair_vs_biased_mpl.png', dpi=300, bbox_inches='tight')
plt.show()
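
The normal overlay in the Central Limit Theorem cell uses a mean of `n_dice * 3.5` and a variance of `n_dice * 35/12`. The sketch below derives the single-die values exactly with the standard library's `Fraction` type; the per-sum values follow because means and variances of independent dice add. The helper name `sum_moments` is hypothetical.

```python
from fractions import Fraction

die_faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die

die_mean = sum(p * f for f in die_faces)
die_variance = sum(p * (f - die_mean) ** 2 for f in die_faces)

print(die_mean, die_variance)  # 7/2 35/12


def sum_moments(n_dice):
    """Mean and variance of the sum of n_dice independent fair dice."""
    return n_dice * die_mean, n_dice * die_variance


print(sum_moments(10))
```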

## Plotly Interactive Visualizations

In [None]:
# Interactive single die analysis
fig = plotly_plotter.plot_interactive_frequency(single_die_rolls)
fig.update_layout(title="Interactive Single Die Analysis")
fig.show()

# Save as HTML
html_dir = Path('exports/html')
html_dir.mkdir(parents=True, exist_ok=True)
fig.write_html(html_dir / 'dice_interactive_single.html')

# Animated convergence to theoretical probability
fig = plotly_plotter.plot_convergence_animation(single_die_rolls)
fig.update_layout(title="Convergence to Theoretical Probability (Law of Large Numbers)")
fig.show()

fig.write_html(html_dir / 'dice_convergence_animation.html')

# Multiple dice types comparison dashboard
dice_types_data = {
    'D4': d4_rolls,
    'D6': single_die_rolls,
    'D8': d8_rolls,
    'D12': d12_rolls,
    'D20': d20_rolls
}

fig = plotly_plotter.create_multi_dice_dashboard(dice_types_data)
fig.update_layout(title="Multi-Dice Type Comparison Dashboard")
fig.show()

fig.write_html(html_dir / 'dice_multi_type_dashboard.html')

# 3D visualization of two dice outcomes
fig = plotly_plotter.plot_two_dice_3d(two_dice_rolls)
fig.update_layout(title="Two Dice Outcomes - 3D Visualization")
fig.show()

fig.write_html(html_dir / 'dice_two_dice_3d.html')
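
The 3D figure comes from `plot_two_dice_3d`, whose implementation is not shown here. One plausible input for such a plot is a 6×6 table of joint counts, where each cell records how often a particular (die 1, die 2) pair occurred. The sketch below builds that table in plain Python. The helper name `joint_counts` and the sample rolls are hypothetical, and the real plotter may prepare its data differently.

```python
def joint_counts(pairs, sides=6):
    """Hypothetical helper: counts[i][j] = rolls with die1 == i + 1 and die2 == j + 1."""
    counts = [[0] * sides for _ in range(sides)]
    for die1, die2 in pairs:
        counts[die1 - 1][die2 - 1] += 1
    return counts


# Made-up rolls for illustration only
sample = [(1, 1), (1, 1), (3, 4), (6, 6), (3, 4), (2, 5)]
table = joint_counts(sample)
print(table[0][0], table[2][3], table[5][5])  # 2 2 1
```

For a fair pair of dice, every cell should approach `n_rolls / 36` as the number of rolls grows.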

## Statistical Analysis and Testing

In [None]:
# Chi-square goodness of fit test for fair die
from scipy.stats import chisquare

# Test single die fairness
observed_frequencies = [np.sum(single_die_rolls == face) for face in range(1, 7)]
expected_frequencies = [len(single_die_rolls) / 6] * 6

chi2_stat, p_value = chisquare(observed_frequencies, expected_frequencies)

print("Chi-square Goodness of Fit Test (Single Fair Die):")
print(f"  Chi-square statistic: {chi2_stat:.4f}")
print(f"  P-value: {p_value:.4f}")
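# A p-value above α means fairness is not rejected; it does not prove the die is fair
print(f"  Result: {'No evidence against fairness' if p_value > 0.05 else 'Fairness rejected'} (α = 0.05)")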

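# Test biased die against its specified probabilities
# (this checks that the rolls match the loaded design, not whether the die is uniform)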
observed_biased = [np.sum(biased_die_rolls == face) for face in range(1, 7)]
expected_biased = [len(biased_die_rolls) * prob for prob in biased_probabilities]

chi2_biased, p_biased = chisquare(observed_biased, expected_biased)

print("\nChi-square Goodness of Fit Test (Biased Die):")
print(f"  Chi-square statistic: {chi2_biased:.4f}")
print(f"  P-value: {p_biased:.4f}")
print(f"  Expected probabilities: {biased_probabilities}")

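# Kolmogorov-Smirnov comparison with a normal distribution (Central Limit Theorem)
# Caveat: the KS test assumes a continuous distribution with known parameters. Dice sums are
# discrete and the mean/std are estimated from the data, so the p-values are only indicative
# and tend to reject normality for large samples. The KS statistic is the more useful
# quantity here: it should generally shrink as n_dice grows.
print("\nKolmogorov-Smirnov Comparison with a Normal Distribution:")
for n_dice in [1, 2, 5, 10]:
    data = dice_sums[f'{n_dice}_dice']
    
    # Standardize the data
    standardized = (data - np.mean(data)) / np.std(data)
    
    # Compare against the standard normal
    ks_stat, ks_p = stats.kstest(standardized, 'norm')
    
    print(f"  {n_dice} dice: KS-statistic={ks_stat:.4f}, p-value={ks_p:.4f}")
    print(f"    {'Normality not rejected' if ks_p > 0.05 else 'Normality rejected'} (α = 0.05)")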

# Running average analysis (Law of Large Numbers)
def calculate_running_averages(rolls):
    """Calculate running averages for demonstration of Law of Large Numbers"""
    running_sums = np.cumsum(rolls)
    running_averages = running_sums / np.arange(1, len(rolls) + 1)
    return running_averages

# Calculate for single die (should converge to 3.5)
running_avg = calculate_running_averages(single_die_rolls)

# Plot convergence
n_points = len(running_avg)
roll_numbers = np.arange(1, n_points + 1)  # 1-based roll count for the x-axis

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(roll_numbers, running_avg, color='blue', linewidth=1, alpha=0.7)
ax.axhline(y=3.5, color='red', linestyle='--', linewidth=2, label='Theoretical Mean (3.5)')
ax.set_xlabel('Number of Rolls')
ax.set_ylabel('Running Average')
ax.set_title('Law of Large Numbers - Convergence to Theoretical Mean')
ax.grid(True, alpha=0.3)

# Add a pointwise 95% band for the running average: 3.5 ± 1.96 * sigma / sqrt(n), with sigma^2 = 35/12
half_width = 1.96 * np.sqrt(35/12) / np.sqrt(roll_numbers)
confidence_upper = 3.5 + half_width
confidence_lower = 3.5 - half_width

ax.fill_between(roll_numbers, confidence_lower, confidence_upper,
                alpha=0.2, color='gray', label='95% Confidence Band')
ax.legend()

plt.tight_layout()
# Save before plt.show() so the exported image is not blank
fig.savefig(exports_dir / 'dice_law_of_large_numbers.png', dpi=300, bbox_inches='tight')
plt.show()
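
The statistic that `chisquare` reports is Pearson's sum of `(observed - expected)**2 / expected` over the six faces. The sketch below computes it by hand in plain Python to make the formula concrete. The function name `chi_square_statistic` is hypothetical and the counts are made up for illustration; they are not results from this notebook.

```python
def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts for 600 rolls, for illustration only
observed = [90, 110, 100, 95, 105, 100]
expected = [sum(observed) / 6] * 6
print(chi_square_statistic(observed, expected))  # 2.5
```

With six faces the test has 5 degrees of freedom, for which the 5% critical value is about 11.07, so a statistic of 2.5 would not reject fairness.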

## Advanced Probability Concepts

In [None]:
# Probability mass function comparisons
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Single die PMF
axes[0,0].bar(range(1, 7), [1/6]*6, alpha=0.7, color='skyblue', label='Theoretical')
empirical_pmf = [np.mean(single_die_rolls == i) for i in range(1, 7)]
axes[0,0].bar(range(1, 7), empirical_pmf, alpha=0.5, color='red', label='Empirical')
axes[0,0].set_title('Single Die PMF')
axes[0,0].set_xlabel('Outcome')
axes[0,0].set_ylabel('Probability')
axes[0,0].legend()
axes[0,0].grid(True, alpha=0.3)

# Two dice sum PMF
two_dice_sums = np.sum(two_dice_rolls, axis=1)
theoretical_two_dice_probs = [theoretical_two_dice[i] for i in range(2, 13)]
empirical_two_dice_probs = [np.mean(two_dice_sums == i) for i in range(2, 13)]

axes[0,1].bar(range(2, 13), theoretical_two_dice_probs, alpha=0.7, color='skyblue', label='Theoretical')
axes[0,1].bar(range(2, 13), empirical_two_dice_probs, alpha=0.5, color='red', label='Empirical')
axes[0,1].set_title('Two Dice Sum PMF')
axes[0,1].set_xlabel('Sum')
axes[0,1].set_ylabel('Probability')
axes[0,1].legend()
axes[0,1].grid(True, alpha=0.3)

# Cumulative distribution functions
axes[1,0].step(range(1, 8), np.cumsum([1/6]*6 + [0]), where='post', 
               color='blue', linewidth=2, label='Theoretical CDF')
empirical_cdf = np.cumsum(empirical_pmf + [0])
axes[1,0].step(range(1, 8), empirical_cdf, where='post', 
               color='red', linewidth=2, alpha=0.7, label='Empirical CDF')
axes[1,0].set_title('Single Die CDF')
axes[1,0].set_xlabel('Outcome')
axes[1,0].set_ylabel('Cumulative Probability')
axes[1,0].legend()
axes[1,0].grid(True, alpha=0.3)

# Variance analysis across different dice
dice_vars = {}
for n_dice in [1, 2, 3, 5, 10]:
    data = dice_sums[f'{n_dice}_dice']
    dice_vars[n_dice] = np.var(data)

n_dice_list = list(dice_vars.keys())
variances = list(dice_vars.values())
theoretical_vars = [n * 35/12 for n in n_dice_list]  # Theoretical variance

axes[1,1].plot(n_dice_list, variances, 'bo-', label='Empirical Variance', linewidth=2)
axes[1,1].plot(n_dice_list, theoretical_vars, 'r--', label='Theoretical Variance', linewidth=2)
axes[1,1].set_title('Variance vs Number of Dice')
axes[1,1].set_xlabel('Number of Dice')
axes[1,1].set_ylabel('Variance')
axes[1,1].legend()
axes[1,1].grid(True, alpha=0.3)

plt.tight_layout()
# Save before plt.show() so the exported image is not blank
fig.savefig(exports_dir / 'dice_probability_concepts.png', dpi=300, bbox_inches='tight')
plt.show()
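
The theoretical variance line in the last panel assumes that the variance of a sum of `n` independent fair dice is `n * 35/12`. As a cross-check, the sketch below builds the exact PMF of the sum by repeated convolution and confirms that variance for the same dice counts used above. The helper names `sum_pmf` and `pmf_variance` are hypothetical, and the code uses only the standard library.

```python
from fractions import Fraction


def sum_pmf(n_dice, sides=6):
    """Exact PMF of the sum of n_dice fair dice, built by repeated convolution."""
    pmf = {0: Fraction(1)}  # sum of zero dice is 0 with probability 1
    face_p = Fraction(1, sides)
    for _ in range(n_dice):
        nxt = {}
        for total, prob in pmf.items():
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, Fraction(0)) + prob * face_p
        pmf = nxt
    return pmf


def pmf_variance(pmf):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    return sum(p * (v - mean) ** 2 for v, p in pmf.items())


for n in [1, 2, 3, 5, 10]:
    assert pmf_variance(sum_pmf(n)) == n * Fraction(35, 12)

print(sum_pmf(2)[7])  # 1/6
```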

## Summary

This notebook demonstrates fundamental concepts in probability theory and statistics through dice simulations. Its main components are:

### Data Generated and Analyzed
- **Single Die Rolls**: `config['n_rolls']` rolls of a fair 6-sided die with frequency analysis
- **Multiple Dice**: Two-dice scenarios showing sum distributions and joint probabilities
- **Biased Dice**: Loaded die simulation demonstrating non-uniform distributions
- **Various Die Types**: D4, D6, D8, D12, and D20 dice for comparative analysis
- **Central Limit Theorem**: Multiple dice sums showing convergence to normality

### Key Statistical Concepts Demonstrated
- **Law of Large Numbers**: Running averages converging to theoretical means
- **Central Limit Theorem**: Sum distributions approaching normality as the number of dice in the sum increases
- **Probability Mass Functions**: Theoretical vs empirical probability comparisons
- **Goodness of Fit Testing**: Chi-square tests validating fairness assumptions
- **Normality Testing**: Kolmogorov-Smirnov tests for distribution assessment

### Visualization Achievements
- **Static Plots**: Comprehensive matplotlib visualizations for publication-quality analysis
- **Interactive Dashboards**: Plotly-based interactive exploration tools
- **3D Visualizations**: Multi-dimensional representation of dice outcome spaces
- **Animation**: Convergence animations demonstrating statistical principles

### Statistical Validation Results
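- Chi-square test on the fair die: a p-value above 0.05 means the rolls give no evidence against fairness (it does not prove the die is fair)
- Chi-square test on the biased die: compares the rolls with the specified loaded probabilities, so a p-value above 0.05 indicates the rolls match the intended bias
- Central Limit Theorem: sums look increasingly normal as the number of dice grows, which the histograms and the KS statistic illustrate
- Law of Large Numbers: the running average of a single die approaches the theoretical mean of 3.5 as rolls accumulate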

### Technical Implementation Highlights
- Modular architecture with separate generation and visualization components
- Configuration-driven simulations for reproducibility
- Comprehensive data export capabilities
- Both theoretical and empirical analysis frameworks

The dice simulations provide an excellent foundation for understanding probability theory, Monte Carlo methods, and statistical inference. These concepts extend to numerous applications in science, engineering, finance, and data analysis, making this a valuable educational and analytical tool.

All generated visualizations and data have been exported for further analysis and documentation purposes.