# 08 - Bayesian and Frequentist Sample Size Calculation

**Purpose**: This notebook provides a set of functions for calculating the required sample size for both model development and performance evaluation, offering both Bayesian and frequentist approaches.

**Inputs**:
- Pilot study data (for model development).
- Expected performance metrics (sensitivity, specificity, prevalence) for performance evaluation.

**Outputs**:
- The notebook prints the calculated sample sizes for model development and performance evaluation based on the example usage.

### Key Functions:
1.  **`beta_hdi()`**: Calculates the Highest Density Interval (HDI) for a Beta distribution.
2.  **`model_development_sample_size()`**: A Bayesian simulation to determine the sample size needed for stable model parameter estimation.
3.  **`performance_sample_size()`**: Calculates the sample size needed to estimate classification performance metrics (sensitivity and specificity) to a desired precision. This function can use either a frequentist (Buderer's formula) or a Bayesian simulation approach.
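
For reference, the frequentist branch of `performance_sample_size()` can be written out as follows. The notation is an assumption made here for readability: $\mathrm{Se}$ and $\mathrm{Sp}$ are the expected sensitivity and specificity, $P$ is the prevalence, $d$ is half the target interval width (`hdi_width / 2`), and $z$ is the standard normal quantile at $1 - (1 - \text{ci})/2$.

$$
n_{\text{sens}} = \frac{z^{2}\,\mathrm{Se}\,(1-\mathrm{Se})}{d^{2}\,P},
\qquad
n_{\text{spec}} = \frac{z^{2}\,\mathrm{Sp}\,(1-\mathrm{Sp})}{d^{2}\,(1-P)},
\qquad
n_{\text{total}} = \left\lceil \max\left(n_{\text{sens}},\, n_{\text{spec}}\right) \right\rceil
$$

Dividing by $P$ (or $1-P$) converts the number of positive (or negative) cases needed into a total study size.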

### 8.1 Sample Size Calculation Functions and Examples

This cell defines and demonstrates the functions for sample size calculation.

- **`beta_hdi`**: A helper function that finds the Highest Density Interval of a Beta distribution by numerically minimizing the interval width for a given coverage.
- **`model_development_sample_size`**: Implements a Bayesian simulation to find the sample size needed to ensure the model's parameter estimates are stable (i.e., have a narrow HDI). It uses a Beta-Binomial model and iterates through sample sizes until a target assurance level is met.
- **`performance_sample_size`**: A versatile function that calculates the required sample size to evaluate classification performance. It supports both a frequentist method (using Buderer's formula) and a Bayesian simulation method.
- **Example Usage**: The final part of the cell demonstrates how to use these functions with sample pilot data and performance targets, printing the estimated required sample sizes for both model development and performance evaluation.
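
To illustrate what the HDI helper is doing, here is a minimal, self-contained sketch that compares the HDI with the ordinary equal-tailed interval for a skewed Beta distribution. It is not the notebook's implementation: the name `hdi_vs_equal_tailed` is hypothetical, and it swaps in SciPy's bounded `minimize_scalar` in place of `fmin` so the search stays within valid tail masses.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar


def hdi_vs_equal_tailed(a, b, ci=0.95):
    """Return (hdi, eti) bounds for a Beta(a, b) distribution (illustrative helper)."""
    dist = stats.beta(a, b)

    def width(low_tail):
        # Width of the interval holding mass `ci` with `low_tail` mass below it
        return dist.ppf(low_tail + ci) - dist.ppf(low_tail)

    # Only lower-tail masses in [0, 1 - ci] keep the interval inside [0, 1]
    best = minimize_scalar(width, bounds=(0.0, 1.0 - ci), method="bounded")
    hdi = dist.ppf([best.x, best.x + ci])
    eti = dist.ppf([(1 - ci) / 2, (1 + ci) / 2])
    return hdi, eti


hdi, eti = hdi_vs_equal_tailed(2, 10)
print("HDI width:         ", hdi[1] - hdi[0])
print("Equal-tailed width:", eti[1] - eti[0])
```

For a skewed distribution such as Beta(2, 10), the HDI is narrower than the equal-tailed interval while covering the same probability mass, which is why the width-based stopping rules in this notebook use it.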

In [None]:
from importlib import reload
%reload_ext autoreload
%autoreload 2

import numpy as np
import scipy.stats as stats
from scipy.optimize import fmin

def beta_hdi(alpha, beta, ci=0.95):
    """Calculate HDI for Beta distribution using optimization.

    Args:
        alpha (float): The alpha parameter of the Beta distribution.
        beta (float): The beta parameter of the Beta distribution.
        ci (float, optional): The desired credible interval coverage. Defaults to 0.95.

    Returns:
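        tuple: A tuple containing the lower and upper bounds of the HDI.
    """
    def interval_width(low):
        # fmin passes a 1-element array; convert it to a plain float
        low = float(np.squeeze(low))
        # Interval that holds probability mass `ci`, with `low` mass below it
        high = low + ci
        lower = stats.beta.ppf(low, alpha, beta)
        upper = stats.beta.ppf(high, alpha, beta)
        return abs(upper - lower)

    # Minimize the width, starting from the equal-tailed interval
    result = fmin(interval_width, (1 - ci)/2, disp=False)
    low = result[0]
    high = low + ci
    return stats.beta.ppf([low, high], alpha, beta)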

# --------------------------
# 1. Model Development Sample Size (Beta-Binomial)
# --------------------------

def model_development_sample_size(pilot_data, hdi_width=0.2, ci=0.95, 
                                  min_n=10, max_n=1000, step=10, 
                                  simulations=1000, target_prob=0.8):
    """Determine sample size for stable model parameters using Bayesian simulation.

    This function simulates future data collection to find the sample size `n` at
    which there is a high probability (`target_prob`) that the resulting
    posterior distribution's HDI will be narrower than `hdi_width`.

    Args:
        pilot_data (np.array): A binary array (1s/0s) of data from a pilot study.
        hdi_width (float, optional): The desired HDI width threshold. Defaults to 0.2.
        ci (float, optional): The credible interval level. Defaults to 0.95.
        min_n (int, optional): The minimum sample size to test. Defaults to 10.
        max_n (int, optional): The maximum sample size to test. Defaults to 1000.
        step (int, optional): The increment for searching sample sizes. Defaults to 10.
        simulations (int, optional): The number of simulations to run per sample size. Defaults to 1000.
        target_prob (float, optional): The desired probability of achieving the HDI width. Defaults to 0.8.

    Returns:
        int | None: The optimal sample size, or None if not found within the search range.
    """
    # Convert pilot data to a Beta prior (uniform Beta(1, 1) updated with the pilot counts)
    successes = np.sum(pilot_data)
    failures = len(pilot_data) - successes
    alpha_prior = successes + 1
    beta_prior = failures + 1
    p_pilot = np.mean(pilot_data)
    
    # Search through candidate sample sizes
    for n in range(min_n, max_n+1, step):
        valid_count = 0
        
        for _ in range(simulations):
            # Generate synthetic data
            k = np.random.binomial(n, p_pilot)
            
            # Calculate posterior parameters
            alpha_post = alpha_prior + k
            beta_post = beta_prior + (n - k)
            
            # Calculate HDI width
            lower, upper = beta_hdi(alpha_post, beta_post, ci)
            width = upper - lower
            
            if width <= hdi_width:
                valid_count += 1
                
        probability = valid_count / simulations
        if probability >= target_prob:
            return n
        
    return None

# --------------------------
# 2. Performance Evaluation Sample Size
# --------------------------

def performance_sample_size(sens=0.8, spec=0.85, prevalence=0.3, 
                            hdi_width=0.1, ci=0.95, method='bayesian',
                            prior_strength=10, simulations=1000):
    """Calculate sample size for classification performance estimation.

    This function can operate in two modes:
    - 'frequentist': Uses Buderer's formula for a quick, analytical estimate.
    - 'bayesian': Uses a simulation-based approach to find the sample size
      that achieves the desired HDI width with probability at least 0.8
      (the assurance level, which is fixed inside the function).

    Args:
        sens (float, optional): Expected sensitivity. Defaults to 0.8.
        spec (float, optional): Expected specificity. Defaults to 0.85.
        prevalence (float, optional): Expected prevalence of the condition. Defaults to 0.3.
        hdi_width (float, optional): Desired HDI width. Defaults to 0.1.
        ci (float, optional): Credible/confidence interval level. Defaults to 0.95.
        method (str, optional): 'bayesian' or 'frequentist'. Defaults to 'bayesian'.
        prior_strength (int, optional): Strength of the Beta prior for the Bayesian method. Defaults to 10.
        simulations (int, optional): Number of simulations for the Bayesian method. Defaults to 1000.

    Returns:
        tuple: A tuple containing the estimated total sample size and a dictionary
            with the breakdown of required cases for sensitivity and specificity.
    """
    if method == 'frequentist':
        # Buderer's formula implementation [4]
        z = stats.norm.ppf(1 - (1 - ci)/2)
        
        # Sensitivity calculation
        n_sens = (z**2 * sens * (1 - sens)) / (hdi_width/2)**2 
        n_sens /= prevalence
        
        # Specificity calculation
        n_spec = (z**2 * spec * (1 - spec)) / (hdi_width/2)**2
        n_spec /= (1 - prevalence)
        
        n_total = int(np.ceil(max(n_sens, n_spec)))
        
        return n_total, {
            'sensitivity_sample': int(np.ceil(n_sens)),
            'specificity_sample': int(np.ceil(n_spec))
        }
    
    elif method == 'bayesian':
        # Bayesian simulation approach
        def find_sample_size(true_p, prev, is_sens=True):
            alpha_prior = prior_strength * true_p + 1
            beta_prior = prior_strength * (1 - true_p) + 1
            
            for n in range(10, 10000, 10):
                valid = 0
                cases = int(n * prev) if is_sens else int(n * (1 - prev))
                if cases < 1: continue
                
                for _ in range(simulations):
                    k = np.random.binomial(cases, true_p)
                    alpha_post = alpha_prior + k
                    beta_post = beta_prior + (cases - k)
                    
                    lower, upper = beta_hdi(alpha_post, beta_post, ci)
                    if (upper - lower) <= hdi_width:
                        valid += 1
                
                if valid/simulations >= 0.8:
                    return n, cases
            return None, None
        
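        # Sensitivity is estimated from positive cases, specificity from negative cases
        n_sens_total, n_sens_cases = find_sample_size(sens, prevalence, True)
        n_spec_total, n_spec_cases = find_sample_size(spec, prevalence, False)
        
        # If either search failed (None), no total can be reported
        if n_sens_total is None or n_spec_total is None:
            n_total = None
        else:
            n_total = max(n_sens_total, n_spec_total)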
        
        return n_total, {
            'sensitivity': {
                'total_samples': n_sens_total,
                'positive_cases': n_sens_cases
            },
            'specificity': {
                'total_samples': n_spec_total,
                'negative_cases': n_spec_cases
            }
        }

    else:
        # Fail loudly instead of silently returning None for an unknown method
        raise ValueError("method must be 'bayesian' or 'frequentist'")

# --------------------------
# Example Usage
# --------------------------
# Generate example pilot data (50 binary outcomes, true success rate 75%)
np.random.seed(42)
pilot_data = np.random.binomial(1, 0.75, 50)

# 1. Model development sample size
dev_sample_size = model_development_sample_size(
    pilot_data, hdi_width=0.15, min_n=100, max_n=500
)
print(f"Model development sample size: {dev_sample_size}")

# 2. Performance evaluation sample size
# Frequentist approach
n_freq, breakdown_freq = performance_sample_size(method='frequentist')
print(f"Frequentist sample size: {n_freq}")
print(breakdown_freq)

# Bayesian approach
n_bayes, breakdown_bayes = performance_sample_size(method='bayesian')
print(f"Bayesian sample size: {n_bayes}")
print(breakdown_bayes)
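
As a sanity check on the frequentist result, the same arithmetic can be reproduced in a few lines using the default inputs (sensitivity 0.8, specificity 0.85, prevalence 0.3, interval width 0.1). This is a sketch only; the helper name `buderer_n` and its `share` parameter are hypothetical and not part of the notebook.

```python
import math
from scipy import stats


def buderer_n(p, share, width=0.1, ci=0.95):
    """Total sample size so that proportion `p` is estimated to within width/2.

    `share` is the fraction of the sample that contributes to the estimate:
    the prevalence for sensitivity, 1 - prevalence for specificity.
    """
    z = stats.norm.ppf(1 - (1 - ci) / 2)   # two-sided normal quantile
    d = width / 2                          # margin of error (half-width)
    return math.ceil(z**2 * p * (1 - p) / d**2 / share)


n_sens = buderer_n(0.80, share=0.30)   # 820
n_spec = buderer_n(0.85, share=0.70)   # 280
print(max(n_sens, n_spec))             # 820
```

Sensitivity drives the total here because only 30% of the sample are positive cases, so many more participants are needed to reach the required number of positives.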


### 8.2 Alternative BAM Implementation

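This cell defines `bam_n_dev`, an alternative implementation of the Bayesian Assurance Method (BAM) for model development when planning new studies. It draws the Beta hyperparameters from Gamma(2, 1) distributions, updates them with the pilot data, and then finds the sample size by binary search rather than a linear scan. Binary search needs far fewer assurance evaluations, but it assumes that the estimated assurance increases with `n`, which holds only approximately because each estimate carries simulation noise. The function relies on `beta_hdi` from Section 8.1.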
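
The binary search idea can be sketched in isolation as follows. This is an illustration under stated assumptions, not the cell's implementation: the names `estimated_assurance` and `smallest_n` are hypothetical, the equal-tailed interval stands in for the HDI so the width can be computed in a vectorized way, and a fixed seed makes repeated evaluations at the same `n` agree.

```python
import numpy as np
from scipy import stats


def estimated_assurance(n, a, b, width=0.15, ci=0.95, sims=2000, seed=0):
    """Monte Carlo estimate of P(interval width <= width) after n new observations."""
    rng = np.random.default_rng(seed)      # fixed seed: same n gives same estimate
    theta = rng.beta(a, b, size=sims)      # plausible success rates from Beta(a, b)
    k = rng.binomial(n, theta)             # simulated successes among n outcomes
    tail = (1 - ci) / 2
    # Equal-tailed interval of the updated Beta (assumption: stands in for the HDI)
    lower = stats.beta.ppf(tail, a + k, b + n - k)
    upper = stats.beta.ppf(1 - tail, a + k, b + n - k)
    return float(np.mean(upper - lower <= width))


def smallest_n(a, b, target=0.8, lo=10, hi=5000, **kwargs):
    """Bisect for a sample size whose estimated assurance meets `target`."""
    if estimated_assurance(hi, a, b, **kwargs) < target:
        return None                        # target not reachable within the range
    while lo < hi:
        mid = (lo + hi) // 2
        if estimated_assurance(mid, a, b, **kwargs) >= target:
            hi = mid                       # mid works; look for something smaller
        else:
            lo = mid + 1                   # mid fails; need a larger sample
    return lo


# Beta(39, 13): a uniform prior updated with 38 successes and 12 failures
n = smallest_n(39, 13)
print(n, estimated_assurance(n, 39, 13))
```

The loop keeps `hi` at a size that has already met the target, so the returned value always satisfies the estimated criterion. Because the estimate is noisy and not strictly monotone in `n`, it is not guaranteed to be the smallest such size.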

In [None]:
%reload_ext autoreload
%autoreload 2

import numpy as np
import scipy.stats as stats
from scipy.optimize import fmin


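def bam_n_dev(pilot_data, hdi_width=0.1, ci=0.95,
              simulations=1000, target_assurance=0.8):
    """BAM implementation for model development sample size.

    Requires `beta_hdi` from Section 8.1 to be defined.

    Args:
        pilot_data (np.array): A binary array (1s/0s) of data from a pilot study.
        hdi_width (float, optional): The desired HDI width threshold. Defaults to 0.1.
        ci (float, optional): The credible interval level. Defaults to 0.95.
        simulations (int, optional): The number of simulations per candidate size. Defaults to 1000.
        target_assurance (float, optional): The required probability of meeting the HDI width. Defaults to 0.8.

    Returns:
        int: The sample size found by binary search over 10 to 10000. A value
            of 10001 means the target assurance was never reached.
    """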
    # Hierarchical prior for pilot data
    alpha_hyper = stats.gamma(a=2, scale=1).rvs()
    beta_hyper = stats.gamma(a=2, scale=1).rvs()
    alpha_post = alpha_hyper + np.sum(pilot_data)
    beta_post = beta_hyper + len(pilot_data) - np.sum(pilot_data)
    
    # Binary search for optimal n
    low, high = 10, 10000
    while low <= high:
        mid = (low + high) // 2
        assurance = 0
        
        for _ in range(simulations):
            theta = stats.beta(alpha_post, beta_post).rvs()
            y_sim = stats.bernoulli(theta).rvs(mid)
            alpha_post_sim = alpha_post + np.sum(y_sim)
            beta_post_sim = beta_post + mid - np.sum(y_sim)
            
            lower, upper = beta_hdi(alpha_post_sim, beta_post_sim, ci)
            if (upper - lower) <= hdi_width:
                assurance += 1
                
        assurance_prob = assurance / simulations
        if assurance_prob >= target_assurance:
            high = mid - 1
        else:
            low = mid + 1
            
    return low
