# Bayes Core

> Core Bayesian inference functions: updates, sequential processing, posterior predictive sampling, Bayes factors, and conjugate updates

In [None]:
#| default_exp rbe.bayes_core

In [None]:
#| hide
from nbdev.showdoc import *

In [None]:
#| export
import numpy as np
from typing import Optional, Union, List, Callable
from fastcore.test import test_eq, test_close
from fastcore.all import *
from technical_blog.rbe.probability import normalize, sample

## Core Bayesian Updates

The heart of Bayesian inference is updating beliefs with evidence.

The `update()` function is the core implementation of **Bayes' theorem** for a discrete set of hypotheses: it is how we mathematically revise our beliefs when new evidence arrives.

### What Bayes' Theorem Does

Bayes' theorem tells us how to revise our beliefs (prior) when we observe new evidence:

$$P(H|E) = \frac{P(E|H) \times P(H)}{P(E)}$$

Where:
- **P(H|E)** = posterior (updated belief after seeing evidence)
- **P(E|H)** = likelihood (how probable the evidence is under each hypothesis)
- **P(H)** = prior (our initial belief before seeing evidence)
- **P(E)** = evidence (total probability of observing this evidence, summed over all hypotheses; also called the marginal likelihood)

### How the Function Works

```python
def update(prior, likelihood, evidence=None):
    # Returns: (prior * likelihood) / evidence
    ...  # signature sketch only; the full implementation follows below
```

**Step 1: Input Validation**
- Ensures prior and likelihood have the same shape
- Checks for non-negative values (probabilities can't be negative)
- Auto-normalizes the prior if it doesn't sum to 1

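**Step 2: Calculate Evidence**
If not provided, evidence is computed as: `evidence = sum(prior * likelihood)`

This represents the total probability of seeing the evidence across all possible hypotheses. If you pass your own `evidence`, the result sums to 1 only when that value equals `sum(prior * likelihood)`. Zero evidence raises a `ValueError`, and evidence below `1e-15` triggers a warning.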

**Step 3: Apply Bayes' Rule**
Returns `(prior * likelihood) / evidence`
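
The three steps can be traced by hand on a two-hypothesis example. This is a minimal sketch in plain NumPy rather than a call to `update` itself, and the numbers are purely illustrative.

```python
import numpy as np

prior = np.array([0.3, 0.7])        # P(H) for two hypotheses
likelihood = np.array([0.8, 0.2])   # P(E|H) for the same hypotheses

# Step 2: evidence is the total probability of the observation
unnormalized = prior * likelihood   # [0.24, 0.14]
evidence = unnormalized.sum()       # 0.38

# Step 3: dividing by the evidence makes the posterior sum to 1
posterior = unnormalized / evidence
print(posterior)
```

The first hypothesis starts at 0.3 but explains the evidence four times better, so it ends up with the larger share of the posterior.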



In [None]:
#| export
def update(prior, # Prior probabilities
           likelihood, # Likelihood of evidence given hypothesis
           evidence=None # Optional evidence, defaults to sum(prior * likelihood)
           ):
    """Update prior beliefs with likelihood using Bayes' theorem."""
    prior = np.asarray(prior, dtype=np.float64)
    likelihood = np.asarray(likelihood, dtype=np.float64)
    
    # Validate inputs
    if prior.shape != likelihood.shape: raise ValueError(f"Prior and likelihood shapes don't match: {prior.shape} vs {likelihood.shape}")
    if np.any(prior < 0) or np.any(likelihood < 0): raise ValueError("Prior and likelihood must be non-negative")
    # Normalize prior if needed (common in practice)
    if not np.isclose(np.sum(prior), 1.0): prior = normalize(prior)
    # Compute evidence if not provided
    if evidence is None: evidence = np.sum(prior * likelihood)
    # Check for impossible observation
    if evidence == 0: raise ValueError("Impossible observation: zero evidence")
    # Numerical stability check
    if evidence < 1e-15:
        import warnings
        warnings.warn("Very small evidence value - numerical instability possible")
    
    return (prior * likelihood) / evidence


### Cyber Security Example

Imagine you're detecting network intrusions:

In [None]:
# Prior beliefs about network state
prior = [0.9, 0.08, 0.02]  # [normal, suspicious, attack]

# New evidence: unusual port scanning detected
# Likelihood of seeing port scans under each hypothesis
likelihood = [0.01, 0.7, 0.95]  # Very unlikely if normal, likely if attack

# Update beliefs
posterior = update(prior, likelihood)
# Result: P(attack) rises from 0.02 to about 0.23, while "suspicious" becomes the most likely state
posterior


array([0.10714286, 0.66666667, 0.22619048])


### Key Features for Robust Applications

**Automatic Normalization**: Handles unnormalized priors (common when combining multiple sources)

**Error Handling**: 
- Detects impossible observations (zero evidence)
- Warns about numerical instability
- Validates input shapes and non-negativity

**Numerical Stability**: Uses float64 precision to handle the tiny probabilities common in some applications

The beauty is that this single function encapsulates the mathematical foundation of all Bayesian learning - whether you're tracking individual threats or updating complex network models, it all comes down to this core update rule!

In [None]:
# Test Bayesian updates
prior = np.array([0.3, 0.7])
likelihood = np.array([0.8, 0.2])
posterior = update(prior, likelihood)
test_close(np.sum(posterior), 1.0)
assert posterior[0] > prior[0]  # First hypothesis should increase
# Test with unnormalized prior (common in practice)
unnorm_prior = [3, 7]  # Sums to 10, not 1
likelihood = [0.8, 0.2]
posterior = update(unnorm_prior, likelihood)
test_close(np.sum(posterior), 1.0)

# Test numerical stability with tiny values
tiny_prior = [1e-10, 1-1e-10]
tiny_likelihood = [1e-10, 1-1e-10]
posterior = update(tiny_prior, tiny_likelihood)
test_close(np.sum(posterior), 1.0)

# Test shape mismatch error
try:
    update([0.5, 0.5], [0.8, 0.2, 0.1])
    assert False, "Should raise ValueError for shape mismatch"
except ValueError as e:
    assert "shapes don't match" in str(e)

### Sequential
The `sequential` function implements **sequential Bayesian updating** - it's how you process multiple observations one after another, updating your beliefs with each new piece of evidence.


In [None]:
#| export
def sequential(priors,  # prior probabilities of hypotheses 
               likelihoods, # likelihoods of observations given hypotheses
               evidences=None # evidence for each observation
               ):
    """Sequential Bayesian updates with multiple observations."""
    priors = np.asarray(priors, dtype=np.float64)
    likelihoods = np.asarray(likelihoods, dtype=np.float64)
    
    # Validate inputs
    if len(likelihoods) == 0:
        return np.array([priors])
    
    if likelihoods.ndim != 2:
        raise ValueError("Likelihoods must be 2D array (n_observations, n_hypotheses)")
    
    if likelihoods.shape[1] != len(priors):
        raise ValueError(f"Likelihood shape {likelihoods.shape} incompatible with prior length {len(priors)}")
    
    if evidences is None:
        evidences = [None] * len(likelihoods)
    elif len(evidences) != len(likelihoods):
        raise ValueError("Number of evidences must match number of likelihoods")
    
    # Perform sequential updates
    posterior = priors.copy()
    posteriors = [posterior.copy()]
    
    for likelihood, evidence in zip(likelihoods, evidences):
        posterior = update(posterior, likelihood, evidence)
        posteriors.append(posterior.copy())
    
    return np.array(posteriors)


#### What It Does

Instead of updating beliefs with just one observation (like the basic `update` function), `sequential` handles a whole series of observations:

In [None]:
# Start with initial beliefs
prior = [0.95, 0.04, 0.01]  # [normal, suspicious, attack]

# Process multiple observations over time
observations = [
    [0.1, 0.6, 0.9],   # High anomaly score
    [0.05, 0.8, 0.95], # Even higher anomaly  
    [0.2, 0.3, 0.7],   # Moderate anomaly
    [0.9, 0.1, 0.05]   # Back to normal
]

timeline = sequential(prior, observations)
# Returns: the prior (row 0) followed by the beliefs after each observation
timeline


array([[0.95      , 0.04      , 0.01      ],
       [0.7421875 , 0.1875    , 0.0703125 ],
       [0.14615385, 0.59076923, 0.26307692],
       [0.07483261, 0.45372194, 0.47144545],
       [0.49414824, 0.33289987, 0.17295189]])


#### How It Works

The function performs these steps:

1. **Starts with your prior beliefs**
2. **For each observation**:
   - Takes current beliefs as the "prior" for this update
   - Applies Bayes' theorem using the observation's likelihood
   - The resulting posterior becomes the prior for the next observation
3. **Returns the complete timeline** of how beliefs evolved

Mathematically, it's chaining Bayes' updates, assuming the observations are conditionally independent given the hypothesis:
- After obs 1: `P(H|obs1) = P(obs1|H) × P(H) / P(obs1)`
- After obs 2: `P(H|obs1,obs2) = P(obs2|H) × P(H|obs1) / P(obs2|obs1)`
- And so on...
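
Because each step multiplies by a likelihood and renormalizes, chaining the updates gives the same final belief as a single update with the product of all likelihoods, provided the observations are conditionally independent given the hypothesis. A small self-contained check, where `bayes_step` is a stand-in for `update` so the block runs on its own:

```python
import numpy as np

def bayes_step(prior, likelihood):
    """One normalized Bayes update (stand-in for `update`)."""
    unnormalized = np.asarray(prior, dtype=float) * np.asarray(likelihood, dtype=float)
    return unnormalized / unnormalized.sum()

prior = np.array([0.95, 0.04, 0.01])
observations = np.array([
    [0.1, 0.6, 0.9],
    [0.05, 0.8, 0.95],
    [0.2, 0.3, 0.7],
    [0.9, 0.1, 0.05],
])

# Chain the updates one observation at a time
belief = prior
for likelihood in observations:
    belief = bayes_step(belief, likelihood)

# Single update with the per-hypothesis product of all likelihoods
batch = bayes_step(prior, observations.prod(axis=0))

print(np.allclose(belief, batch))
```

The order of the observations does not affect the final belief either, since multiplication commutes; only the intermediate rows of the timeline change.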

#### Key Features for Cyber Security

**Timeline Tracking**: You get the complete evolution of beliefs, not just the final result. This lets you see:
- When threat probability peaked
- How quickly beliefs changed
- Whether the system is converging or oscillating

**Robust Error Handling**: 
- Validates that likelihoods are properly shaped (2D array)
- Ensures evidence counts match observations
- Handles edge cases like empty observation sequences

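**Simple Iteration**: Applies one update per observation, in order. Note that the likelihoods are converted to a single NumPy array up front and every intermediate posterior is kept, so memory use grows with the number of observations.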
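
If observations arrive as a stream and only the latest belief matters, a generator can avoid holding the whole history. The following is a minimal sketch under that assumption: `stream_posteriors` is a hypothetical helper, not part of this module, and the update step is restated inline so the block runs on its own.

```python
import numpy as np

def stream_posteriors(prior, likelihood_iter):
    """Yield the belief after each observation without storing the history.

    Hypothetical helper for illustration; not part of this module.
    """
    belief = np.asarray(prior, dtype=np.float64)
    belief = belief / belief.sum()
    for likelihood in likelihood_iter:
        unnormalized = belief * np.asarray(likelihood, dtype=np.float64)
        evidence = unnormalized.sum()
        if evidence == 0:
            raise ValueError("Impossible observation: zero evidence")
        belief = unnormalized / evidence
        yield belief

# Any iterable works, including one that reads from a live feed
observations = iter([[0.1, 0.6, 0.9], [0.05, 0.8, 0.95]])
for step, belief in enumerate(stream_posteriors([0.95, 0.04, 0.01], observations), start=1):
    print(step, belief.round(4))
```

The trade-off is that the timeline is no longer available afterwards, so this only fits cases where the evolution of beliefs does not need to be inspected later.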

#### Cyber Security Example

In a network anomaly detection scenario:

```python
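# Prior beliefs about the network state: [normal, suspicious, attack]
network_prior = [0.95, 0.04, 0.01]

# Each row represents likelihood of observation under each network state
# [normal_likelihood, suspicious_likelihood, attack_likelihood]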
network_observations = [
    [0.1, 0.6, 0.9],   # Suspicious traffic pattern
    [0.05, 0.8, 0.95], # Even more suspicious
    [0.9, 0.1, 0.05]   # Returns to normal
]

belief_timeline = sequential(network_prior, network_observations)
```

This gives you a complete picture of how your recursive Bayesian estimator's (RBE's) confidence in different threat levels evolved as new network data arrived - essential for understanding both the current threat state and the system's decision-making process.

The function essentially turns your single-shot Bayesian update into a learning system that accumulates evidence over time!



##### Edge Cases & Error Conditions

In [None]:
# Test sequential updating
priors = [0.5, 0.5]
likelihoods = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
posteriors = sequential(priors, likelihoods)
assert posteriors.shape == (4, 2)  # Initial + 3 updates
test_close(np.sum(posteriors, axis=1), 1.0)  # All normalized

In [None]:
# Empty observations (should return just the prior)
empty_result = sequential([0.6, 0.4], [])
test_eq(empty_result.shape, (1, 2))
test_close(empty_result[0], [0.6, 0.4])

# Single observation (common case)
single_result = sequential([0.5, 0.5], [[0.8, 0.2]])
test_eq(single_result.shape, (2, 2))

# Test with custom evidences
priors = [0.4, 0.6]
likelihoods = [[0.9, 0.1], [0.7, 0.3]]
evidences = [0.5, 0.8]  # Custom evidence values
result = sequential(priors, likelihoods, evidences)
# The provided evidences are used instead of being computed, so rows need not sum to 1
test_close(result[1], [0.72, 0.12])  # 0.4*0.9/0.5 and 0.6*0.1/0.5


##### Cyber Security Specific Tests

In [None]:
# Realistic network anomaly scenario
network_prior = [0.95, 0.04, 0.01]  # [normal, suspicious, attack]

# Sequence of observations over time
observations = [
    [0.1, 0.6, 0.9],   # High anomaly score
    [0.05, 0.8, 0.95], # Even higher anomaly
    [0.2, 0.3, 0.7],   # Moderate anomaly
    [0.9, 0.1, 0.05]   # Back to normal
]

timeline = sequential(network_prior, observations)

# Verify attack probability peaks and then decreases
attack_probs = timeline[:, 2]  # Extract attack column
peak_idx = np.argmax(attack_probs[1:]) + 1  # Skip initial prior
assert attack_probs[peak_idx] > attack_probs[0], "Attack probability should increase"
assert attack_probs[-1] < attack_probs[peak_idx], "Should decrease after normal observation"

##### Numerical Stability Tests

In [None]:
# Very small likelihoods (rare events)
tiny_likelihoods = [[1e-15, 1-1e-15], [1e-14, 1-1e-14]]
result = sequential([0.5, 0.5], tiny_likelihoods)
assert np.all(np.isfinite(result)), "Should handle tiny values"

# Extreme confidence updates
extreme_likes = [[0.999, 0.001], [0.001, 0.999]]
result = sequential([0.5, 0.5], extreme_likes)
# Rapid belief swings should stay finite; the two opposite observations cancel out
assert np.all(np.isfinite(result))
test_close(result[-1], [0.5, 0.5])


##### Input Validation Tests

In [None]:
# Wrong likelihood dimensions
try:
    sequential([0.5, 0.5], [0.8, 0.2])  # 1D instead of 2D
    assert False, "Should reject 1D likelihoods"
except ValueError as e:
    assert "2D array" in str(e)

# Mismatched evidence count
try:
    sequential([0.5, 0.5], [[0.8, 0.2]], evidences=[0.5, 0.6])  # 2 evidences, 1 likelihood
    assert False, "Should reject mismatched evidence count"
except ValueError as e:
    assert "match number of likelihoods" in str(e)

# Incompatible shapes
try:
    sequential([0.5, 0.5], [[0.8, 0.2, 0.1]])  # 3 hypotheses vs 2 in prior
    assert False, "Should reject shape mismatch"
except ValueError as e:
    assert "incompatible" in str(e)

##### Convergence and Learning Tests

In [None]:
# Test convergence with consistent evidence
consistent_evidence = [[0.9, 0.1]] * 10  # Same observation repeated
result = sequential([0.5, 0.5], consistent_evidence)

# Should converge toward first hypothesis
final_belief = result[-1, 0]
assert final_belief > 0.95, "Should strongly favor consistent hypothesis"

# Test belief oscillation with conflicting evidence
conflicting = [[0.9, 0.1], [0.1, 0.9]] * 5  # Alternating evidence
result = sequential([0.5, 0.5], conflicting)
# Final belief shouldn't be too extreme in either direction
assert 0.2 < result[-1, 0] < 0.8, "Conflicting evidence should maintain uncertainty"


## Posterior Predictive

Sample from the posterior predictive distribution - what future observations might look like given our current beliefs.

The `predictive` function implements **posterior predictive sampling** - a key technique in Bayesian inference that answers the question: "Given what I've learned so far, what kinds of observations might I see in the future?"

#### What It Does

The function generates synthetic future observations by combining:
1. **Your current beliefs** (posterior distribution over parameters/hypotheses)
2. **The observation model** (likelihood function that maps parameters to observation probabilities)


In [None]:
#| export
def predictive(posterior, likelihood_fn, n_samples=1000, rng=None):
    """Vectorized posterior predictive sampling."""
    if rng is None: rng = np.random.default_rng()
    
    posterior = normalize(posterior)
    param_samples = sample(posterior, n_samples, rng)
    
    # Group samples by parameter for efficient batch processing
    unique_params, counts = np.unique(param_samples, return_counts=True)
    
    predictions = []
    for param_idx, count in zip(unique_params, counts):
        obs_dist = normalize(likelihood_fn(param_idx))
        obs_samples = sample(obs_dist, count, rng)
        predictions.extend(obs_samples)
    
    # Shuffle to remove parameter ordering bias
    rng.shuffle(predictions)
    return np.array(predictions, dtype=int)



#### How It Works

The algorithm follows a two-step process that mirrors the generative story of Bayesian models:

**Step 1: Sample Parameters**


```python
param_samples = sample(posterior, n_samples, rng)
```


This draws parameter values according to your current beliefs. If you're 70% confident in "normal network state" and 30% confident in "attack state", about 70% of the samples will be "normal".

**Step 2: Generate Observations**
For each sampled parameter, use the likelihood function to determine what observations that parameter would generate:
```python
obs_dist = normalize(likelihood_fn(param_idx))
obs_samples = sample(obs_dist, count, rng)
```
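
The two steps can also be run end to end outside the function. The sketch below uses NumPy's `rng.choice` in place of this module's `sample` helper (whose exact signature is only known here from the calls above), so treat it as an illustration of the technique rather than the module's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
posterior = np.array([0.7, 0.3])            # beliefs over two parameters
obs_given_param = np.array([[0.9, 0.1],     # P(observation | parameter 0)
                            [0.2, 0.8]])    # P(observation | parameter 1)
n_samples = 20_000

# Step 1: sample parameter indices according to the posterior
params = rng.choice(len(posterior), size=n_samples, p=posterior)

# Step 2: for each parameter, sample observations from its own distribution
observations = np.empty(n_samples, dtype=int)
for h in range(len(posterior)):
    mask = params == h
    observations[mask] = rng.choice(obs_given_param.shape[1], size=mask.sum(), p=obs_given_param[h])

share_obs0 = np.mean(observations == 0)
expected = 0.7 * 0.9 + 0.3 * 0.2   # mixture probability of observation 0
print(share_obs0, expected)
```

Writing the draws back through a boolean mask keeps each observation aligned with the parameter that produced it, so no shuffle is needed in this variant.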

#### Clever Optimization

The function groups samples by parameter so that each distinct parameter is processed in one batch:

```python
unique_params, counts = np.unique(param_samples, return_counts=True)
```

Instead of processing 1000 individual samples, it groups them: "I need 700 observations from parameter 0 and 300 from parameter 1." `likelihood_fn` is then called once per distinct parameter rather than once per sample, which is much faster when there are only a few hypotheses.
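
To see what the grouping call returns, here is a tiny standalone example with made-up sample indices:

```python
import numpy as np

# Eight sampled parameter indices (illustrative values)
param_samples = np.array([0, 1, 0, 0, 1, 0, 2, 0])

unique_params, counts = np.unique(param_samples, return_counts=True)
print(unique_params)  # [0 1 2]
print(counts)         # [5 2 1]
```

Each `(param, count)` pair then needs only one likelihood evaluation and one batched draw of `count` observations.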

#### Cyber Security Applications

**1. Threat Forecasting**
```python
# Current beliefs about network state
network_posterior = [0.6, 0.3, 0.1]  # [normal, suspicious, attack]

def network_observations(state_idx):
    if state_idx == 0:  # Normal state
        return [0.9, 0.08, 0.02]  # [normal_traffic, anomaly, alert]
    elif state_idx == 1:  # Suspicious state  
        return [0.4, 0.5, 0.1]
    else:  # Attack state
        return [0.1, 0.3, 0.6]

# What kinds of network events should we expect?
future_events = predictive(network_posterior, network_observations, n_samples=1000)
```

**2. Anomaly Detection Validation**
Generate synthetic data that matches your current model, then compare with actual observations to detect model drift or new attack patterns.

**3. Alert System Tuning**
Predict how many alerts different threshold settings would generate under your current threat model.

**4. Resource Planning**
Estimate future computational or analyst workload based on predicted event distributions.
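
For planning questions like these, the predictive distribution over discrete event types can also be computed exactly as a mixture, without sampling. A minimal sketch reusing the numbers from the threat-forecasting example; the `events_per_hour` figure is a hypothetical workload chosen only for illustration.

```python
import numpy as np

network_posterior = np.array([0.6, 0.3, 0.1])   # [normal, suspicious, attack]
obs_given_state = np.array([
    [0.9, 0.08, 0.02],   # normal state     -> [normal_traffic, anomaly, alert]
    [0.4, 0.5, 0.1],     # suspicious state
    [0.1, 0.3, 0.6],     # attack state
])

# Predictive probability of each event type:
# sum over states of P(state) * P(event | state)
event_probs = network_posterior @ obs_given_state

events_per_hour = 1000                        # hypothetical workload figure
expected_alerts = events_per_hour * event_probs[2]
print(event_probs, expected_alerts)
```

Sampling with `predictive` should reproduce these frequencies approximately; the exact form is handy as a cross-check or when only expected counts are needed.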

#### Key Features

**Numerical Stability**: Normalizes both posterior and likelihood distributions to ensure valid probabilities.

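**Reproducibility**: Accepts an `rng` argument; pass a seeded generator such as `np.random.default_rng(42)` to get identical results across runs. With the default `rng=None`, each call draws fresh randomness.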

**Bias Removal**: Shuffles final predictions to remove any ordering artifacts from the batch processing.

**Type Safety**: Returns integer indices (not floats) since observations are typically categorical.

#### Example Output Interpretation

If you get predictions like `[0, 0, 1, 0, 0, 2, 0, ...]`, each value is the index of a predicted observation type. With the observation types from the threat-forecasting example above, this means:
- Most future observations will be type 0 (normal traffic)
- Occasional type 1 observations (anomalies)
- Rare type 2 observations (alerts)

The relative frequencies tell you what to expect: if 80% are type 0, your model predicts that about 80% of future observations will be normal traffic.

This is invaluable for **proactive security planning** - instead of just reacting to threats, you can anticipate what's likely to happen and prepare accordingly!



In [None]:
# Test posterior predictive
posterior = [0.6, 0.4]
def simple_likelihood(param_idx):
    if param_idx == 0:
        return [0.8, 0.2]  # Biased toward observation 0
    else:
        return [0.3, 0.7]  # Biased toward observation 1

rng = np.random.default_rng(42)
predictions = predictive(posterior, simple_likelihood, n_samples=100, rng=rng)
assert len(predictions) == 100
assert np.all((predictions >= 0) & (predictions <= 1))

In [None]:
posterior = [0.7, 0.3]

def likelihood_fn(param_idx):
    if param_idx == 0:
        return [0.9, 0.1]  # Parameter 0 strongly predicts observation 0
    else:
        return [0.2, 0.8]  # Parameter 1 strongly predicts observation 1

rng = np.random.default_rng(42)
predictions = predictive(posterior, likelihood_fn, n_samples=1000, rng=rng)

# Check output format
assert predictions.shape == (1000,), "Should return 1D array"
assert predictions.dtype == int, "Should return integer indices"
assert np.all((predictions >= 0) & (predictions <= 1)), "All predictions should be valid indices"

# Check statistical properties
# Since posterior favors param 0 (0.7 vs 0.3), and param 0 favors obs 0 (0.9 vs 0.1),
# we should see more 0s than 1s in predictions
obs_0_count = np.sum(predictions == 0)
obs_1_count = np.sum(predictions == 1)
assert obs_0_count > obs_1_count, "Should predict observation 0 more often"

# Rough check: expect about 70% * 90% + 30% * 20% = 69% observation 0
expected_ratio = 0.7 * 0.9 + 0.3 * 0.2  # ≈ 0.69
actual_ratio = obs_0_count / 1000
assert abs(actual_ratio - expected_ratio) < 0.05, f"Expected ~{expected_ratio:.2f}, got {actual_ratio:.2f}"



## Bayes Factors

Compare evidence for different hypotheses.

The `bayes_factor` function is a tool for **comparing the evidence** that supports one hypothesis versus another.

#### What Bayes Factors Do

A Bayes factor quantifies how much more likely the observed data is under one hypothesis compared to another:

$$BF_{12} = \frac{P(\text{data}|H_1)}{P(\text{data}|H_2)}$$

**Interpretation:**
- `BF > 1`: Evidence favors hypothesis 1
- `BF < 1`: Evidence favors hypothesis 2  
- `BF = 1`: Data is equally likely under both hypotheses
- `BF = ∞`: Hypothesis 2 considers the data impossible
- `BF = 0`: Hypothesis 1 considers the data impossible
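
A Bayes factor says nothing about priors on its own; it becomes a posterior probability once combined with prior odds, since posterior odds = Bayes factor × prior odds. A short worked example, with an illustrative prior of 0.3 on H1:

```python
prior_h1 = 0.3                       # illustrative prior probability of H1
bayes_factor_12 = 0.9 / 0.2          # H1 assigns 0.9 to the data, H2 assigns 0.2

prior_odds = prior_h1 / (1 - prior_h1)
posterior_odds = bayes_factor_12 * prior_odds
posterior_h1 = posterior_odds / (1 + posterior_odds)
print(round(posterior_h1, 4))
```

This matches what a two-hypothesis Bayes update gives for the same prior and likelihoods: 0.27 / (0.27 + 0.14).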

#### How the Function Works


In [None]:
#| export
def bayes_factor(likelihood1, # observation probabilities under hypothesis 1
                 likelihood2, # observation probabilities under hypothesis 2
                 data, # observed outcome index, or a sequence of indices
                 eps=1e-15 # epsilon for numerical stability (currently unused)
                 ):
    """Calculate Bayes factor for hypothesis 1 vs 2 given data."""
    likelihood1 = np.asarray(likelihood1)
    likelihood2 = np.asarray(likelihood2)
    
    # Validate inputs
    if len(likelihood1) != len(likelihood2):
        raise ValueError("Likelihood arrays must have same length")
    
    # For single observation
    if np.isscalar(data):
        if likelihood2[data] == 0:
            return np.inf if likelihood1[data] > 0 else np.nan
        return likelihood1[data] / likelihood2[data]
    
    # For multiple observations (assuming independence)
    # Use log-space computation for numerical stability
    log_bf = 0.0
    for obs in data:
        if likelihood2[obs] == 0:
            if likelihood1[obs] > 0:
                return np.inf  # Decisive evidence for H1
            else:
                return np.nan  # Both hypotheses say impossible
        
        if likelihood1[obs] == 0:
            return 0.0  # Decisive evidence for H2
        
        # Accumulate in log space
        log_bf += np.log(likelihood1[obs]) - np.log(likelihood2[obs])
    
    return np.exp(log_bf)

def interpret_bf(bf):
    "Interpret Bayes factor strength"
    if bf < 1/100:
        return "Decisive evidence against H1"
    elif bf < 1/10:
        return "Strong evidence against H1"
    elif bf < 1/3:
        return "Moderate evidence against H1"
    elif bf < 1:
        return "Weak evidence against H1"
    elif bf < 3:
        return "Weak evidence for H1"
    elif bf < 10:
        return "Moderate evidence for H1"
    elif bf < 100:
        return "Strong evidence for H1"
    else:
        return "Decisive evidence for H1"

The function handles two key scenarios:

**Single Observation:**
```python
if np.isscalar(data):
    return likelihood1[data] / likelihood2[data]
```
Simply divides the probability each hypothesis assigns to the observed data.

**Multiple Observations:**
For numerical stability with many observations, it uses **log-space computation**:
```python
log_bf += np.log(likelihood1[obs]) - np.log(likelihood2[obs])
return np.exp(log_bf)
```

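This prevents overflow/underflow in the intermediate product when multiplying many small probabilities. The final `np.exp` can still overflow to `inf` or underflow to `0.0` when the accumulated log-ratio is extreme.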
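
The difference is easy to reproduce. In this sketch (illustrative probabilities, not taken from the examples in this notebook), both naive products underflow to zero, while the log-space sum recovers the correct ratio:

```python
import numpy as np

p1 = np.full(100, 1e-5)   # probability H1 assigns to each of 100 observations
p2 = np.full(100, 2e-5)   # probability H2 assigns to the same observations

# Naive approach: both products underflow to 0.0, so the ratio is undefined
with np.errstate(invalid="ignore"):
    naive_bf = np.prod(p1) / np.prod(p2)

# Log-space approach: sum the log-ratios, exponentiate once at the end
log_bf = np.sum(np.log(p1) - np.log(p2))
stable_bf = np.exp(log_bf)

print(naive_bf, stable_bf)
```

The true value is 0.5 to the power 100, which is tiny but well within float64 range; only the intermediate products were the problem.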

#### Cyber Security Example


In [None]:
# Two competing models of network behavior
normal_model = [0.9, 0.08, 0.02]    # [normal, suspicious, attack]
attack_model = [0.1, 0.3, 0.6]      # Attack-focused model

# Observe sequence of mostly attack indicators
observations = [2, 2, 1, 2]  # [attack, attack, suspicious, attack]

bf = bayes_factor(attack_model, normal_model, observations)
# Result: Large BF means attack_model explains data much better
bf

np.float64(101250.00000000001)


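#### Robust Edge Case Handling

The function handles critical edge cases that would crash simpler implementations:

**Impossible Data Under H2 Only:**
```python
if likelihood2[obs] == 0 and likelihood1[obs] > 0:
    return np.inf  # Decisive evidence for H1
```

**Impossible Data Under H1 Only:**
```python
if likelihood1[obs] == 0:
    return 0.0  # Decisive evidence for H2
```

**Impossible Under Both:**
```python
return np.nan  # Neither hypothesis can explain the data
```

For a sequence of observations, the function returns at the first zero it meets, so the result can depend on the order of the data when different observations are impossible under different hypotheses.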

#### Interpretation Helper

The `interpret_bf` function provides human-readable interpretations:

```python
interpret_bf(4.5)   # "Moderate evidence for H1"
interpret_bf(0.1)   # "Moderate evidence against H1" (thresholds are strict: 0.1 is not < 1/10)
interpret_bf(0.05)  # "Strong evidence against H1"
interpret_bf(150)   # "Decisive evidence for H1"
```
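
One caveat worth knowing: `bayes_factor` can return `np.nan`, and every `<` comparison against NaN is false, so `interpret_bf(np.nan)` falls through to the final branch and reports "Decisive evidence for H1". A possible guard is sketched below; `interpret_bf_safe` is a hypothetical name, and the thresholds are copied from `interpret_bf` so the block runs on its own.

```python
import math

# Upper bounds and labels mirror the thresholds used by interpret_bf
_BF_LABELS = [
    (1 / 100, "Decisive evidence against H1"),
    (1 / 10, "Strong evidence against H1"),
    (1 / 3, "Moderate evidence against H1"),
    (1, "Weak evidence against H1"),
    (3, "Weak evidence for H1"),
    (10, "Moderate evidence for H1"),
    (100, "Strong evidence for H1"),
]

def interpret_bf_safe(bf):
    """Like interpret_bf, but reports NaN explicitly (hypothetical helper)."""
    if math.isnan(bf):
        return "Undefined: data impossible under both hypotheses"
    for upper, label in _BF_LABELS:
        if bf < upper:
            return label
    return "Decisive evidence for H1"

print(interpret_bf_safe(float("nan")))
print(interpret_bf_safe(0.1))
```

Infinite Bayes factors need no special case: `inf` fails every `<` test and correctly lands on the last label.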

#### Why It's Crucial for RBE

In your Recursive Bayesian Estimator for network anomaly detection:

1. **Model Selection**: Compare different threat models to see which best explains current data
2. **Anomaly Scoring**: Quantify how much more likely data is under "attack" vs "normal" hypotheses
3. **Adaptive Thresholds**: Use BF strength to automatically adjust alert sensitivity
4. **Forensic Analysis**: Provide quantitative evidence for security decisions

The key advantage is that Bayes factors give us a **principled, quantitative measure** of evidence strength, rather than arbitrary scores or binary classifications.



In [None]:
# Test Bayes factors
# Basic functionality
like1 = [0.9, 0.1]  # H1: mostly generates observation 0
like2 = [0.2, 0.8]  # H2: mostly generates observation 1

bf_single = bayes_factor(like1, like2, 0)
test_close(bf_single, 4.5)  # 0.9/0.2

bf_multiple = bayes_factor(like1, like2, [0, 0, 1])
test_close(bf_multiple, 4.5 * 4.5 * 0.125)  # (0.9/0.2)^2 * (0.1/0.8)

# Test edge cases that would crash the original

# Case 1: H2 assigns zero probability to observed data
like1_safe = [0.8, 0.15, 0.05]  # Normal traffic model
like2_zero = [0.0, 0.3, 0.7]     # Attack model that never sees normal traffic

bf_inf = bayes_factor(like1_safe, like2_zero, 0)  # Observe normal traffic
assert bf_inf == np.inf, "Should return infinity when H2 impossible"

bf_inf_multi = bayes_factor(like1_safe, like2_zero, [0, 1])  # Mixed observations
assert bf_inf_multi == np.inf, "Should return infinity if any observation impossible under H2"

# Case 2: H1 assigns zero probability to observed data  
like1_zero = [0.0, 0.4, 0.6]     # Model that never sees normal traffic
like2_safe = [0.7, 0.2, 0.1]     # Model that can see normal traffic

bf_zero = bayes_factor(like1_zero, like2_safe, 0)  # Observe normal traffic
test_close(bf_zero, 0.0)  # Decisive evidence for H2

# Case 3: Both hypotheses assign zero probability (impossible data)
like1_impossible = [0.0, 0.5, 0.5]
like2_impossible = [0.0, 0.3, 0.7]

bf_nan = bayes_factor(like1_impossible, like2_impossible, 0)
assert np.isnan(bf_nan), "Should return NaN when both hypotheses say impossible"

# Test numerical stability with many observations
like1_stable = [0.6, 0.4]
like2_stable = [0.4, 0.6]
many_obs = [0] * 100 + [1] * 100  # 100 of each observation

bf_stable = bayes_factor(like1_stable, like2_stable, many_obs)
assert np.isfinite(bf_stable), "Should handle many observations without overflow/underflow"

# Expected value: (0.6/0.4)^100 * (0.4/0.6)^100 = (1.5)^100 * (2/3)^100 = 1
test_close(bf_stable, 1.0, eps=1e-10)

# Test input validation
try:
    bayes_factor([0.5, 0.5], [0.3, 0.3, 0.4], 0)  # Mismatched lengths
    assert False, "Should raise ValueError for mismatched lengths"
except ValueError as e:
    assert "same length" in str(e)

# Test with extreme values (numerical stability)
like1_extreme = [1e-100, 1-1e-100]
like2_extreme = [1-1e-100, 1e-100]

bf_extreme = bayes_factor(like1_extreme, like2_extreme, 0)
assert np.isfinite(bf_extreme), "Should handle extreme values"
assert bf_extreme < 1e-90, "Should be very small but finite"

# Cyber security scenario test
normal_model = [0.9, 0.08, 0.02]    # [normal, suspicious, attack]
attack_model = [0.1, 0.3, 0.6]      # Attack-focused model

# Sequence of observations suggesting attack
attack_sequence = [2, 2, 1, 2, 1]   # Mostly attack/suspicious observations

bf_attack = bayes_factor(attack_model, normal_model, attack_sequence)
assert bf_attack > 1, "Attack model should be favored for attack sequence"

# Normal sequence
normal_sequence = [0, 0, 0, 1, 0]    # Mostly normal observations

bf_normal = bayes_factor(attack_model, normal_model, normal_sequence)
assert bf_normal < 1, "Normal model should be favored for normal sequence"

# Test interpretation
assert "Strong evidence for" in interpret_bf(50)
assert "Weak evidence against" in interpret_bf(0.5)

## Conjugate Priors

Helper functions for common conjugate prior-likelihood pairs.

**Conjugate priors** are a mathematical convenience in Bayesian inference where the prior and posterior distributions belong to the same family of probability distributions. This creates a beautiful mathematical symmetry that makes Bayesian updating much simpler.

### The Mathematical Beauty

When you have a conjugate prior-likelihood pair:
- **Prior**: Some distribution family (e.g., Beta)
- **Likelihood**: Compatible distribution (e.g., Binomial)
- **Posterior**: Same family as prior (Beta again!)

The key insight is that you can update your beliefs using **simple arithmetic** instead of complex integration.

### Beta-Binomial: The Classic Example

The `beta_binomial_update` function implements the most famous conjugate pair:


In [None]:
#| export
def beta_binomial_update(alpha, # prior alpha
                         beta, # prior beta
                         successes, # number of successes
                         failures # number of failures
                         ):
    "Update Beta prior with binomial data"
    return alpha + successes, beta + failures

### How It Works

**Prior Beliefs**: Beta(α, β) distribution represents your initial beliefs about a probability p
- α represents "pseudo-successes" you've already seen
- β represents "pseudo-failures" you've already seen
- Your initial belief about p has mean α/(α+β)

**New Data**: You observe some binomial data (successes and failures)

**Posterior Update**: Simply add the new data to your prior parameters!
- New α = old α + observed successes
- New β = old β + observed failures

### Cyber Security Example

Imagine you're tracking the success rate of a particular attack method:


In [None]:
# Initial belief: uniform prior (no prior knowledge)
alpha_prior, beta_prior = 1, 1  # Beta(1,1) = Uniform(0,1)

# Observe attack attempts: 7 successes, 3 failures
alpha_post, beta_post = beta_binomial_update(alpha_prior, beta_prior, 7, 3)
# Result: Beta(8, 4) - attack success rate ≈ 8/(8+4) = 67%

# More data arrives: 2 more successes, 5 more failures  
alpha_final, beta_final = beta_binomial_update(alpha_post, beta_post, 2, 5)
# Result: Beta(10, 9) - refined estimate ≈ 10/(10+9) = 53%
alpha_final, beta_final

(10, 9)


### Why Conjugate Priors Are Powerful

**1. Computational Efficiency**: No complex integrals - just arithmetic!

**2. Interpretable Parameters**: α and β have clear meanings (pseudo-counts)

**3. Sequential Learning**: Each update gives you a new prior for the next observation

**4. Uncertainty Quantification**: The Beta distribution naturally captures uncertainty about the probability
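
Points 3 and 4 can be made concrete with a few lines. In this sketch, `beta_binomial_update` is restated so the block runs on its own, and `beta_mean_var` is a hypothetical helper that applies the standard Beta mean and variance formulas.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    # Restated from this notebook so the block is self-contained
    return alpha + successes, beta + failures

def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution (hypothetical helper)."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total**2 * (total + 1))
    return mean, var

# Two updates in sequence...
step1 = beta_binomial_update(1, 1, 7, 3)
step2 = beta_binomial_update(*step1, 2, 5)
# ...match one update with the pooled counts
pooled = beta_binomial_update(1, 1, 7 + 2, 3 + 5)

print(step2, pooled, beta_mean_var(*step2))
```

The variance shrinks as counts accumulate, which is how the Beta posterior expresses growing confidence in the estimated rate.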

### Applications in Your RBE System

**Detection Rates**: Track how often each threat type is detected versus missed
```python
# Each threat type gets its own Beta distribution
# (the count variables are placeholders for your own tallies)
malware_params = beta_binomial_update(1, 1, detected_malware, missed_malware)
phishing_params = beta_binomial_update(1, 1, detected_phishing, missed_phishing)
```

**Sensor Reliability**: Model how often each sensor correctly identifies threats
```python
sensor_reliability = beta_binomial_update(5, 2, correct_alerts, false_alarms)
```

**Adaptive Thresholds**: Use the posterior distribution to set confidence-based thresholds

### The Magic Formula

For Beta-Binomial conjugacy:
- **Prior**: Beta(α, β)
- **Likelihood**: Binomial(n, p) with k successes
- **Posterior**: Beta(α + k, β + n - k)

This means your `beta_binomial_update` function is implementing one of the most fundamental results in Bayesian statistics - turning complex probability calculations into simple addition!
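
The reason the update reduces to addition can be seen by multiplying the Binomial likelihood by the Beta prior and dropping the factors that do not depend on $p$:

$$P(p \mid k, n) \;\propto\; \underbrace{p^{k}(1-p)^{n-k}}_{\text{Binomial likelihood}} \times \underbrace{p^{\alpha-1}(1-p)^{\beta-1}}_{\text{Beta prior}} \;=\; p^{\alpha+k-1}(1-p)^{\beta+n-k-1}$$

The right-hand side has exactly the shape of a Beta density with parameters $\alpha + k$ and $\beta + n - k$, so normalizing it gives Beta(α + k, β + n - k) without computing any integral.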

The beauty is that this same pattern (prior parameters + data = posterior parameters) appears throughout Bayesian statistics with different conjugate pairs, making Bayesian learning both principled and computationally tractable.



In [None]:
# Test conjugate updates
# Beta-Binomial
alpha_post, beta_post = beta_binomial_update(1, 1, 7, 3)
test_eq(alpha_post, 8)
test_eq(beta_post, 4)


### Normal-Normal Conjugate Update

The `normal_normal_update` function implements another classic conjugate prior-likelihood pair: **Normal prior with Normal likelihood**. This is particularly powerful for continuous parameter estimation in cyber security applications.

#### The Mathematical Framework

When you have:
- **Prior**: Normal(μ₀, σ₀²) - your initial belief about some parameter
- **Likelihood**: Normal data with known variance - observations from that parameter
- **Posterior**: Normal(μ₁, σ₁²) - updated belief after seeing data

The magic is that the posterior is also Normal, and we can compute it using **precision weighting**.

#### How the Function Works


In [None]:
#| export
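def normal_normal_update(prior_mean, # Mean of the Normal prior
                         prior_var, # Variance of the Normal prior (must be > 0)
                         data_mean, # Sample mean of the observations
                         data_var, # Known per-observation variance (must be > 0)
                         n_obs # Number of observations
                         ):
    "Update Normal prior with Normal likelihood (known variance); returns `(post_mean, post_var)`"
    # Precision = 1 / variance; the mean of n_obs observations has precision n_obs / data_var
    prior_prec = 1 / prior_var
    data_prec = n_obs / data_var
    
    # Precisions add; the posterior mean is the precision-weighted average of the two means
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    post_var = 1 / post_prec
    
    return post_mean, post_var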


## The Precision Weighting Insight

The key insight is working with **precision** (1/variance) instead of variance:
- **High precision** = low variance = high certainty
- **Low precision** = high variance = high uncertainty

The algorithm:
1. **Convert to precisions**: More precise sources get more weight
2. **Add precisions**: Total precision = prior precision + data precision  
3. **Weight the means**: Final mean is precision-weighted average
4. **Convert back**: Final variance = 1/total precision

## Intuitive Understanding

Think of it as **combining two sources of information**:


In [None]:
# Example: Estimating average response time
prior_mean, prior_var = 100, 25    # Prior: 100ms ± 5ms (uncertain)
data_mean, data_var = 95, 4        # Data: mean 95ms, per-observation std 2ms (10 observations)

post_mean, post_var = normal_normal_update(prior_mean, prior_var, data_mean, data_var, 10)
# Result: ~95.1ms with much lower variance (~0.39)
post_mean, post_var

(95.07874015748031, 0.39370078740157477)


The data gets more weight because:
- It has lower variance (more precise)
- It has more observations (n_obs = 10)
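
To make the weighting concrete, the sketch below computes the share of the posterior mean contributed by each source, using the numbers from the response-time example. The names `w_prior` and `w_data` are introduced here for illustration.

```python
# Values from the response-time example
prior_mean, prior_var = 100, 25
data_mean, data_var, n_obs = 95, 4, 10

prior_prec = 1 / prior_var        # 0.04
data_prec = n_obs / data_var      # 2.5

# Fraction of the posterior mean contributed by each source
w_prior = prior_prec / (prior_prec + data_prec)
w_data = data_prec / (prior_prec + data_prec)

post_mean = w_prior * prior_mean + w_data * data_mean
print(f"prior weight = {w_prior:.3f}, data weight = {w_data:.3f}, posterior mean = {post_mean:.2f}")
```

The data carries roughly 98% of the weight, which is why the posterior mean sits almost on top of the data mean.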

## Cyber Security Applications

**1. Response Time Modeling**
```python
# Track system response times under different threat levels
normal_response = normal_normal_update(50, 100, 48, 16, 20)    # Normal traffic
attack_response = normal_normal_update(50, 100, 150, 400, 5)   # During attack
```

**2. Anomaly Score Calibration**
```python
# Calibrate anomaly detection thresholds
baseline_score = normal_normal_update(0, 1, 0.1, 0.25, 100)   # Normal baseline
current_score = normal_normal_update(*baseline_score, 2.5, 0.5, 10)  # Recent data
```

**3. Sensor Drift Detection**
```python
# Track how sensor readings change over time
sensor_baseline = normal_normal_update(1000, 50, 995, 25, 50)  # Calibrated reading
current_reading = normal_normal_update(*sensor_baseline, 1020, 30, 10)  # Recent drift
```

**4. Performance Metrics**
```python
# Model network latency under different conditions
latency_model = normal_normal_update(10, 4, 12, 2, 25)  # ms latency
```

## Key Properties

**Precision Accumulation**: Each update increases total precision (reduces uncertainty)
```python
# More data = more certainty
assert post_var < min(prior_var, data_var/n_obs)
```

**Weighted Compromise**: The posterior mean balances prior belief and data
```python
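# If data is much more precise, posterior approaches data_mean
normal_normal_update(0, 1, 2, 0.01, 10)[0]    # close to 2
# If prior is much more precise, posterior stays near prior_mean
normal_normal_update(0, 0.001, 2, 1, 10)[0]   # close to 0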
```

**Sequential Learning**: You can chain updates naturally
```python
# Day 1 update
post1 = normal_normal_update(prior_mean, prior_var, day1_mean, day1_var, n1)

# Day 2 update (using Day 1 posterior as new prior)
post2 = normal_normal_update(*post1, day2_mean, day2_var, n2)
```
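
A useful check on the chaining pattern is that two sequential updates give the same posterior as one batch update on the pooled data, provided the observation variance is the same on both days. The sketch below repeats the update arithmetic so it runs on its own; the daily means and counts are made-up values.

```python
def normal_normal_update(prior_mean, prior_var, data_mean, data_var, n_obs):
    # Same arithmetic as the exported function, repeated so the sketch is self-contained
    prior_prec = 1 / prior_var
    data_prec = n_obs / data_var
    post_prec = prior_prec + data_prec
    return (prior_prec * prior_mean + data_prec * data_mean) / post_prec, 1 / post_prec

# Hypothetical two days of response-time data with the same known variance
obs_var = 4.0
day1_mean, n1 = 96.0, 10
day2_mean, n2 = 94.0, 30

# Sequential: the Day 1 posterior becomes the Day 2 prior
post1 = normal_normal_update(100.0, 25.0, day1_mean, obs_var, n1)
post2 = normal_normal_update(*post1, day2_mean, obs_var, n2)

# Batch: pool both days into a single sample mean
pooled_mean = (n1 * day1_mean + n2 * day2_mean) / (n1 + n2)
batch = normal_normal_update(100.0, 25.0, pooled_mean, obs_var, n1 + n2)

print(post2, batch)
```

Both routes yield the same mean and variance up to floating-point error, so only the current posterior needs to be kept between updates.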

## The Test Case Breakdown

```python
post_mean, post_var = normal_normal_update(0, 1, 2, 0.5, 10)
```

- **Prior**: Mean=0, Var=1 (precision=1)
- **Data**: Mean=2, Var=0.5, n=10 (precision=10/0.5=20)
- **Result**: Data precision (20) >> Prior precision (1), so posterior ≈ data

The posterior mean (≈1.9) is close to the data mean (2) because the data is much more precise than the prior. The posterior variance (≈0.048) is much smaller than either source alone because precisions add up.
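
The arithmetic behind those two numbers, using the precisions listed above:

$$\mu_1 = \frac{1 \cdot 0 + 20 \cdot 2}{1 + 20} = \frac{40}{21} \approx 1.905, \qquad \sigma_1^2 = \frac{1}{1 + 20} = \frac{1}{21} \approx 0.048$$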

## Why This Matters for RBE

In our Recursive Bayesian Estimator, this conjugate update allows us to:
- **Efficiently track continuous parameters** (response times, scores, etc.)
- **Quantify uncertainty** in our estimates
- **Adapt to new data** without storing historical observations
- **Balance prior knowledge with evidence** in a principled way

The mathematical elegance is that complex Bayesian inference reduces to simple arithmetic with means and precisions!



In [None]:
# Normal-Normal  
post_mean, post_var = normal_normal_update(0, 1, 2, 0.5, 10)
# Should be weighted toward data: data precision (20) far exceeds prior precision (1)
assert 1.5 < post_mean < 2.0
assert post_var < 0.05  # More certain than the prior (1) or the data alone (0.5/10)

## Export

In [None]:
#| export
__all__ = [
    # Core updates
    'update', 'sequential',
    
    # Posterior predictive
    'predictive',
    
    # Model comparison
    'bayes_factor', 'interpret_bf',
    
    # Conjugate priors
    'beta_binomial_update', 'normal_normal_update'
]