# Testing Stochastic Simulations: Bayesian Fuzzy Checking Tutorial

*A hands-on guide to catching simulation bugs with automated tests*

This notebook demonstrates **Bayesian fuzzy checking** using the `FuzzyChecker` class from `vivarium_testing_utils`. You'll see how to catch a directional bias bug in a random walk, with a Bayes factor of roughly 10⁷⁹ (decisive evidence).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aflaxman/ai_assisted_research/blob/main/simple_fuzzy_checker_application/fuzzy_checking_tutorial.ipynb)

## Setup: Install Dependencies

First, let's install the required package:

In [None]:
!pip install -q vivarium_testing_utils

## Imports

In [None]:
import random
from vivarium_testing_utils import FuzzyChecker

## Define the Grid Class

A simple 2D grid for tracking visit counts:

In [None]:
class Grid:
    """Store a grid of numbers representing visit counts."""

    def __init__(self, size):
        """Construct empty grid of given size."""
        assert size > 0, f"Grid size must be positive, not {size}"
        self.size = size
        self.grid = [[0 for _ in range(size)] for _ in range(size)]

    def __getitem__(self, key):
        """Get grid element at position (x, y)."""
        x, y = key
        return self.grid[x][y]

    def __setitem__(self, key, value):
        """Set grid element at position (x, y)."""
        x, y = key
        self.grid[x][y] = value

## Define the Random Walk Simulation

The `fill_grid` function simulates a random walk that starts at the center of the grid and stops as soon as it reaches any edge:

In [None]:
def fill_grid(grid, moves):
    """
    Fill grid with a random walk starting from center.

    Args:
        grid: Grid object to fill
        moves: List of [dx, dy] moves, e.g., [[-1, 0], [1, 0], [0, -1], [0, 1]]

    Returns:
        tuple: (num_steps, final_x, final_y) where:
            - num_steps: Number of steps taken before reaching boundary
            - final_x, final_y: Final position on the edge
    """
    center = grid.size // 2
    size_1 = grid.size - 1
    x, y = center, center
    num_steps = 0

    while 0 < x < size_1 and 0 < y < size_1:
        grid[x, y] += 1
        num_steps += 1
        m = random.choice(moves)
        x += m[0]
        y += m[1]

    return num_steps, x, y

# Standard move sets for testing
CORRECT_MOVES = [[-1, 0], [1, 0], [0, -1], [0, 1]]  # left, right, up, down
BUGGY_MOVES = [[-1, 0], [1, 0], [0, -1], [0, -1]]   # left, right, up, up (!)

## Test a Single Walk

Let's run one simulation to see how it works:

In [None]:
random.seed(42)
grid = Grid(size=11)
num_steps, final_x, final_y = fill_grid(grid, CORRECT_MOVES)

print(f"CORRECT VERSION: Took {num_steps} steps")
print(f"Final position: ({final_x}, {final_y})")

# Determine which edge was hit
size_1 = grid.size - 1
if final_x == 0:
    edge = "left edge (x=0)"
elif final_x == size_1:
    edge = f"right edge (x={size_1})"
elif final_y == 0:
    edge = "top edge (y=0)"
else:
    edge = f"bottom edge (y={size_1})"
print(f"Exited at: {edge}")

## The Bug We'll Catch

Can you spot the difference?

**Correct moves**: `[[-1, 0], [1, 0], [0, -1], [0, 1]]` – left, right, up, down

**Buggy moves**: `[[-1, 0], [1, 0], [0, -1], [0, -1]]` – left, right, up, up (!)

The buggy version lists `[0, -1]` (up) twice and omits `[0, 1]` (down), so the walker moves up half the time and never moves down. This creates a strong directional bias!
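
To put a number on that bias without running any random simulations, here is a minimal sketch that propagates probability mass through the grid step by step. It is not part of `vivarium_testing_utils`; the helper name `exit_probabilities` and its `max_steps` parameter are hypothetical, and it assumes the same stopping rule as `fill_grid` (the walk ends at the first edge cell it reaches) and a grid of size 3 or larger.

```python
def exit_probabilities(size, moves, max_steps=2000):
    """Approximate the chance of exiting at each edge by propagating probability mass.

    Hypothetical helper for illustration; assumes size >= 3 and the same
    stopping rule as fill_grid (stop at the first edge cell reached).
    """
    center = size // 2
    last = size - 1
    mass = {(center, center): 1.0}  # probability of being at each interior cell
    exits = {"left": 0.0, "right": 0.0, "top": 0.0, "bottom": 0.0}

    for _ in range(max_steps):
        new_mass = {}
        for (x, y), p in mass.items():
            share = p / len(moves)  # each listed move is equally likely
            for dx, dy in moves:
                nx, ny = x + dx, y + dy
                if nx == 0:
                    exits["left"] += share
                elif nx == last:
                    exits["right"] += share
                elif ny == 0:
                    exits["top"] += share
                elif ny == last:
                    exits["bottom"] += share
                else:
                    new_mass[(nx, ny)] = new_mass.get((nx, ny), 0.0) + share
        mass = new_mass
        if not mass:  # every walk has already exited
            break
    return exits


CORRECT_MOVES = [[-1, 0], [1, 0], [0, -1], [0, 1]]
BUGGY_MOVES = [[-1, 0], [1, 0], [0, -1], [0, -1]]

for name, moves in [("correct", CORRECT_MOVES), ("buggy", BUGGY_MOVES)]:
    probs = exit_probabilities(11, moves)
    print(name, {edge: round(p, 3) for edge, p in probs.items()})
```

Under these assumptions the correct move set gives each edge a probability of 0.25 by symmetry, while the buggy set can never exit at the bottom and leaves only a few percent of walks for the left edge.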

## The Fuzzy Checking Pattern

Instead of arbitrary thresholds, we use **Bayesian hypothesis testing**:

1. Run many simulations
2. Count outcomes (e.g., where walks exit)
3. Use `fuzzy_assert_proportion()` to validate with Bayes factors

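**Bayes factor interpretation** (the Bayes factor, BF, measures how much better a "there is a bug" hypothesis explains the observed counts than "no bug"):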
- BF > 100 = "decisive" evidence of bug → Test FAILS
- BF < 0.1 = "substantial" evidence of correctness → Test PASSES
- 0.1 ≤ BF ≤ 100 = "inconclusive" → Need more data
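
To make the idea of a Bayes factor for a proportion concrete, here is a minimal sketch using only the standard library. It is an illustration, not necessarily how `FuzzyChecker` computes its value internally. It assumes a binomial model at the target proportion for "no bug" and a beta-binomial model with a Beta(0.5, 0.5) prior for "bug"; the function name `log10_bayes_factor` and the example counts are hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log10_bayes_factor(k, n, target, prior=(0.5, 0.5)):
    """log10 of P(data | bug) / P(data | no bug) for k successes in n trials.

    Assumed models (illustrative only):
      no bug: k ~ Binomial(n, target)
      bug:    k ~ BetaBinomial(n, a, b) with a Beta(a, b) prior on the proportion
    The binomial coefficient appears in both likelihoods and cancels.
    """
    a, b = prior
    log_bug = log_beta(k + a, n - k + b) - log_beta(a, b)
    log_no_bug = k * math.log(target) + (n - k) * math.log(1 - target)
    return (log_bug - log_no_bug) / math.log(10)


# Hypothetical counts: one right at the target, one far below it.
for k in (250, 30):
    print(f"{k}/1000 observed vs. target 0.25: log10(BF) = {log10_bayes_factor(k, 1000, 0.25):.1f}")
```

Working in log space avoids floating-point underflow, since the individual likelihoods for 1000 trials are far too small to represent directly. A log10 value above 2 corresponds to BF > 100, and a value below -1 corresponds to BF < 0.1.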

## Test the Correct Version

For an unbiased walk that starts at the center, symmetry implies that 25% of walks should exit at each edge:

In [None]:
num_runs = 1000
num_left_exits = 0

for i in range(num_runs):
    random.seed(2000 + i)
    grid = Grid(size=11)
    num_steps, final_x, final_y = fill_grid(grid, CORRECT_MOVES)
    
    if final_x == 0:
        num_left_exits += 1

print(f"Out of {num_runs} walks, {num_left_exits} exited left ({num_left_exits/num_runs:.1%})")
print("Expected: 25%")

# Validate with Bayesian inference
print("\nRunning fuzzy_assert_proportion...")
FuzzyChecker().fuzzy_assert_proportion(
    observed_numerator=num_left_exits,
    observed_denominator=num_runs,
    target_proportion=0.25,
)
print("✓ Test PASSED! The correct version shows no evidence of bias.")

## Test the Buggy Version

Now let's see what happens with the buggy moves:

In [None]:
num_runs = 1000
num_left_exits = 0

for i in range(num_runs):
    random.seed(5000 + i)
    grid = Grid(size=11)
    num_steps, final_x, final_y = fill_grid(grid, BUGGY_MOVES)
    
    if final_x == 0:
        num_left_exits += 1

print(f"Out of {num_runs} walks, {num_left_exits} exited left ({num_left_exits/num_runs:.1%})")
print("Expected: 25%")
print(f"\nThis is a HUGE difference! The buggy version exits left only {num_left_exits/num_runs:.0%} of the time.")
print("(Most walks exit at the top because the walker moves up half the time and never moves down)")

# Validate with Bayesian inference - this should FAIL
print("\nRunning fuzzy_assert_proportion...")
try:
    FuzzyChecker().fuzzy_assert_proportion(
        observed_numerator=num_left_exits,
        observed_denominator=num_runs,
        target_proportion=0.25,
    )
    print("Test passed (unexpected!)")
except AssertionError as e:
    print("✓ Test FAILED as expected!\n")
    print(f"AssertionError: {e}")
    print("\nThe Bayes factor is astronomically large (~10⁷⁹), providing decisive evidence of a bug!")

## Summary

You've just seen **Bayesian fuzzy checking** in action!

✅ **Correct version**: Bayes factor < 0.1 → Substantial evidence of no bug → Test PASSES

✅ **Buggy version**: Bayes factor ≈ 10⁷⁹ → Decisive evidence of bug → Test FAILS

### Key Takeaways:

1. **No arbitrary thresholds**: We use exact expectations (0.25) and let Bayesian inference handle uncertainty
2. **Quantifies evidence**: Bayes factors tell you *how strong* the evidence is
3. **Catches subtle bugs**: Exact-equality asserts cannot be applied to random output, and a loose hand-picked tolerance can let a bias slip through; Bayesian testing catches this one decisively
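
To illustrate the first takeaway, here is a small sketch of why a fixed tolerance is an arbitrary choice: the same tolerance behaves very differently depending on the number of runs. It computes how often a *correct* simulation would fail a check of the form `assert abs(k/n - p) <= tol`. The helper names and the tolerance of 0.05 are hypothetical choices for illustration.

```python
import math


def binomial_log_pmf(k, n, p):
    """Natural log of P(K = k) for K ~ Binomial(n, p), computed in log space."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )


def false_alarm_rate(n, p, tol):
    """Chance that a correct simulation fails `assert abs(k/n - p) <= tol`."""
    return sum(
        math.exp(binomial_log_pmf(k, n, p))
        for k in range(n + 1)
        if abs(k / n - p) > tol
    )


# Same tolerance, different sample sizes (illustrative values).
for n in (100, 1000):
    print(f"n={n}: a correct walk fails the tolerance check {false_alarm_rate(n, 0.25, 0.05):.2%} of the time")
```

With 100 runs this tolerance fails a correct simulation fairly often, while with 1000 runs it almost never does, so the tolerance would need retuning every time the sample size changes.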

### Next Steps:

- Try changing `BUGGY_MOVES` to a different bug: `[[-1, 0], [1, 0], [1, 0], [0, -1], [0, 1]]` (two right moves)
- Reduce `num_runs` to 100 and see what happens to the Bayes factors
- Read the full tutorial: [Testing Stochastic Simulations](https://github.com/aflaxman/ai_assisted_research/tree/main/simple_fuzzy_checker_application)

### Learn More:

- [Vivarium Testing Utils (GitHub)](https://github.com/ihmeuw/vivarium_testing_utils)
- [Vivarium Fuzzy Checking Docs](https://vivarium-research.readthedocs.io/en/latest/model_design/vivarium_features/automated_v_and_v/index.html#fuzzy-checking)
- [Greg Wilson's Testing Question](https://third-bit.com/2025/04/20/a-testing-question/)