ROC Power Analysis For 1 Curve

**Hanley & McNeil formula**


In the code below, the function `auc_sample_size` takes the expected AUC (between 0.5 and 1), a `delta` value (described as the minimum difference from 0.5 that is considered significant), and the desired power of the test. It returns the number of positive (n1) and negative (n2) instances needed to reach that power when testing the AUC against 0.5. The search keeps n1 equal to n2, so it assumes a balanced design.

The function uses Hanley and McNeil's formula for the standard error (SE) of the AUC to compute a Z score: the difference between the expected AUC and 0.5, divided by the SE. Starting from one instance per class, it increases both sample sizes until Z exceeds `norm.ppf(power) - norm.ppf(delta)`. Note that this criterion passes `delta` through the inverse normal CDF, so in the code as written `delta` acts as a one-sided tail probability (a significance level) rather than as a minimum AUC difference.
```
    Parameters:
    auc: float
        The expected AUC (Area Under the ROC Curve) for the diagnostic test. This is the probability that a
        randomly chosen positive instance will be ranked higher by the classifier than a randomly chosen 
        negative instance.
    delta: float
        The minimum difference from 0.5 (random guess) that is considered to be practically significant.
        This determines the alternative hypothesis for the significance test. For example, if delta is 0.05,
        the alternative hypothesis is that the AUC is either 0.55 or more, or 0.45 or less, assuming a 
        two-sided test.
    power: float
        The desired power of the test, i.e., the probability of correctly rejecting the null hypothesis 
        (that the AUC is 0.5) when the alternative hypothesis is true.
```
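
For reference, the standard error that the code computes can be written out as follows. This is a transcription of the expression used in `auc_sample_size`, with $A$ standing for the expected AUC and $n_1$, $n_2$ for the numbers of positive and negative instances; the symbol names are chosen here for readability and are not taken from the notebook.

$$
Q_1 = \frac{A}{2 - A}, \qquad Q_2 = \frac{2A^2}{1 + A}
$$

$$
SE(A) = \sqrt{\frac{A(1 - A) + (n_1 - 1)(Q_1 - A^2) + (n_2 - 1)(Q_2 - A^2)}{n_1 n_2}}, \qquad Z = \frac{A - 0.5}{SE(A)}
$$

With $n_1 = n_2 = n$ the numerator grows linearly in $n$ while the denominator grows as $n^2$, so $SE(A)$ shrinks and $Z$ rises as $n$ increases, which is why the iterative search terminates.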

In [20]:
from math import sqrt
from scipy.stats import norm

def auc_sample_size(auc, delta, power):
    """
    Calculate the sample size required to declare a single ROC AUC significant.

    Parameters:
    auc: Expected AUC
    delta: The minimal detectable difference from 0.5 (which represents a random guess)
    power: Desired power of the test

    Returns:
    Required sample size for both positive and negative instances.
    """

    # Quantities Q1 and Q2
    Q1 = auc / (2 - auc)
    Q2 = 2 * auc**2 / (1 + auc)

    # Required Z score for the desired power
    Z_beta = norm.ppf(power)

    # Iteratively search for the required sample size
    n1 = n2 = 1
    while True:
        SE = sqrt( (auc * (1 - auc) + (n1 - 1)*(Q1 - auc**2) + (n2 - 1)*(Q2 - auc**2)) / (n1 * n2) )
        Z = (auc - 0.5) / SE

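        # norm.ppf(delta) treats delta as a tail probability, so -norm.ppf(delta)
        # is the one-sided critical value z_(1 - delta); the threshold is z_power + z_(1 - delta).
        if Z > Z_beta - norm.ppf(delta):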
            break

        n1 += 1
        n2 += 1

    return n1, n2

# Example usage:
# auc = 0.7
# delta = 0.05
# power = 0.8

# n1, n2 = auc_sample_size(auc, delta, power)

# print(f"Sample size required: {n1} positive instances and {n2} negative instances.")


In [41]:
auc = 0.8
delta = 0.1
power = 0.8
#----------------------------------------------------------------
n1, n2 = auc_sample_size(auc, delta, power)
print(f"Sample size required: {n1} positive instances and {n2} negative instances.")


Sample size required: 6 positive instances and 6 negative instances.
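
The result above can be checked by hand. The sketch below recomputes the Z score for 5 and 6 instances per class with the same Hanley and McNeil expression and compares it with the stopping threshold. The helper names `hanley_mcneil_se` and `z_score` are hypothetical, and `statistics.NormalDist` from the standard library stands in for `scipy.stats.norm`.

```python
from math import sqrt
from statistics import NormalDist


def hanley_mcneil_se(auc, n1, n2):
    """Hanley & McNeil standard error of the AUC for n1 positives and n2 negatives."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n1 - 1) * (q1 - auc**2)
                + (n2 - 1) * (q2 - auc**2)) / (n1 * n2)
    return sqrt(variance)


def z_score(auc, n):
    """Z score of the expected AUC against 0.5 with n instances per class."""
    return (auc - 0.5) / hanley_mcneil_se(auc, n, n)


# Same stopping threshold as auc_sample_size with power=0.8 and delta=0.1,
# using the standard library in place of scipy.stats.norm.ppf.
inv_cdf = NormalDist().inv_cdf
threshold = inv_cdf(0.8) - inv_cdf(0.1)

for n in (5, 6):
    z = z_score(0.8, n)
    print(f"n={n}: Z={z:.2f}, threshold={threshold:.2f}, passes={z > threshold}")
```

With an expected AUC of 0.8, n = 5 falls just short of the threshold (about 2.12) and n = 6 clears it, which agrees with the 6 positive and 6 negative instances reported above.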


ROC Power Analysis Comparing 2 Curves

In [1]:
import math
import scipy.stats as stats

def sample_size(auc, delta, sig_level, power, r, p):
    """
    Calculate the sample size for a study comparing two ROC curves.

    Parameters:
    auc: The expected Area Under the Curve (not used by the formula below)
    delta: The difference in AUC that you want to detect
    sig_level: The significance level (type I error)
    power: The power of the study (1 - type II error)
    r: The correlation coefficient (must be <= 0.5 so that 1 - 2 * r is non-negative)
    p: The proportion of positives in the sample

    Returns:
    The calculated sample size.
    """

    # calculate the Z scores for the significance level and power
    Z_alpha = stats.norm.ppf(1 - sig_level / 2)
    Z_beta = stats.norm.ppf(power)

    # calculate the proportion of negatives in the sample
    q = 1 - p

    # calculate the sample size using the formula
    n = (Z_alpha * math.sqrt(2 * r * (1 - r)) + Z_beta * math.sqrt(2 * p * q * (1 - 2 * r))) ** 2 / delta ** 2

    return math.ceil(n)  # round up to the next whole number

# Example usage:
# auc = 0.7  # expected AUC
# delta = 0.1  # difference in AUC that we want to detect
# sig_level = 0.05  # significance level
# power = 0.8  # power of the study
# r = 0.2  # correlation coefficient
# p = 0.5  # proportion of positives in the sample

In [19]:
auc = 0.8  # expected AUC
delta = 0.15  # difference in AUCs that we want to detect
sig_level = 0.05  # significance level
power = 0.8  # power of the study
r = 0.5  # correlation coefficient
p = 0.5  # proportion of positives in the sample
#------------
sample_size(auc, delta, sig_level, power, r, p)

86
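
The value 86 can be reproduced step by step. The sketch below repeats the formula from `sample_size` with an explicit range check on `r`, since `math.sqrt` raises `ValueError` once `1 - 2 * r` turns negative. The name `sample_size_checked` is hypothetical, the unused `auc` argument is dropped, and `statistics.NormalDist` replaces `scipy.stats.norm` so the block runs with the standard library alone.

```python
import math
from statistics import NormalDist


def sample_size_checked(delta, sig_level, power, r, p):
    """Same formula as sample_size, with an explicit check on r (hypothetical helper)."""
    if not 0 <= r <= 0.5:
        raise ValueError("r must lie in [0, 0.5] so that 1 - 2*r is non-negative")
    z_alpha = NormalDist().inv_cdf(1 - sig_level / 2)
    z_beta = NormalDist().inv_cdf(power)
    q = 1 - p
    n = (z_alpha * math.sqrt(2 * r * (1 - r))
         + z_beta * math.sqrt(2 * p * q * (1 - 2 * r))) ** 2 / delta ** 2
    return math.ceil(n)


# With r = 0.5 the power term is multiplied by sqrt(0), so the formula
# reduces to n = (z_alpha**2 / 2) / delta**2.
result = sample_size_checked(delta=0.15, sig_level=0.05, power=0.8, r=0.5, p=0.5)
print(result)
```

One consequence worth noting: at `r = 0.5` the second square root is zero, so under this formula the result no longer depends on `power` or `p`.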

Intraclass Correlation Coefficient Power Analysis

In [34]:
import numpy as np
import pingouin as pg
import pandas as pd

# def simulate_data(n, mean, std_dev, subject_effects=None):
#     """
#     Simulate data for a single group with shared subject effects.
    
#     n - Number of subjects (sample size).
#     mean - Mean score for the group.
#     std_dev - Standard deviation of the scores for the group.
#     subject_effects - Shared subject effects across groups.
#     """
    
#     # Simulate scores for the group
#     scores = np.random.normal(mean, std_dev, n)
    
#     # Add subject-specific random effects if provided
#     if subject_effects is not None:
#         scores += subject_effects
    
#     return scores


def simulate_data(n, mean_control, std_dev_control, mean_experimental_error, std_dev_experimental_error):
    """
    Simulate data for control and experimental groups with paired sampling.
    
    n - Number of subjects (sample size).
    mean_control - Mean score for the control group.
    std_dev_control - Standard deviation of the scores for the control group.
    mean_experimental_error - Mean of the per-subject offset added to the control score.
    std_dev_experimental_error - Standard deviation of the per-subject offset.
    """
    
    # Generate subject effects
    subject_effects = np.random.normal(mean_experimental_error, std_dev_experimental_error, n)
    
    # Generate scores for control group without any adjustment
    control_scores = np.random.normal(mean_control, std_dev_control, n)
    
    # Add subject effects to the experimental scores
    experimental_scores = control_scores + subject_effects
    
    return control_scores, experimental_scores


def compute_icc(control_scores, experimental_scores):
    """
    Compute ICC using the pingouin package.
    
    control_scores - Scores from the control group.
    experimental_scores - Scores from the experimental group.
    """
    # Combine the scores into a single DataFrame
    data = {'Scores': np.concatenate((control_scores, experimental_scores)),
            'Groups': ['Control'] * len(control_scores) + ['Experimental'] * len(experimental_scores),
            'Subjects': list(range(len(control_scores))) * 2}
    df = pd.DataFrame(data)
    
    # Calculate the ICC
    icc = pg.intraclass_corr(data=df, targets='Subjects', raters='Groups', ratings='Scores').set_index('Type').loc['ICC1', 'ICC']
    
    return icc

def power_analysis(target_icc, mean_control, std_dev_control, mean_experimental_error, std_dev_experimental_error, alpha=0.05, power=0.80, min_n=5, max_n=1000, n_sims=100):
    """
    Perform power analysis to find the number of subjects needed to detect a given ICC.
    
    target_icc - The ICC you want to detect.
    mean_control - Mean score for the control group.
    std_dev_control - Standard deviation of the scores for the control group.
    mean_experimental_error - Mean of the per-subject offset added to the control score.
    std_dev_experimental_error - Standard deviation of the per-subject offset.
    alpha - Significance level (currently unused; no hypothesis test is performed).
    power - The desired power level, here the fraction of simulations whose ICC reaches target_icc.
    min_n - Minimum sample size to try.
    max_n - Maximum sample size to try.
    n_sims - Number of simulations for each sample size.
    """
    
    # Loop through sample sizes
    for n in range(min_n, max_n + 1):
        
        # Counter for simulations whose estimated ICC reaches the target
        null_rejected = 0

        # Simulate data and compute ICC for each simulation
        for _ in range(n_sims):

            control_scores, experimental_scores = simulate_data(n, mean_control, std_dev_control, mean_experimental_error, std_dev_experimental_error)
            computed_icc = compute_icc(control_scores, experimental_scores)
            
            # Check if the computed ICC meets or exceeds the desired ICC
            if computed_icc >= target_icc:
                null_rejected += 1
        
        # Calculate the achieved power
        achieved_power = null_rejected / n_sims
        
        # Print diagnostic information
        print(f"Sample size: {n}, Achieved Power: {achieved_power}")
        
        # Check if the desired power is achieved
        if achieved_power >= power:
            return n
    
    # Return None if the desired power is not achieved within max_n
    return None


# Example usage:
# required_n = power_analysis(target_icc=0.75, mean_control=90, std_dev_control=3, mean_experimental_error=0, std_dev_experimental_error=3, min_n=50, max_n=100, n_sims=100)
# print(f"Required sample size for desired power: {required_n}")


In [60]:
required_n = power_analysis(
    target_icc=0.75,              # Target ICC
    mean_control=60,             # Mean score for the control group
    std_dev_control=5,          # Standard deviation of the scores for the control group
    mean_experimental_error=0,        # Mean of the per-subject offset added to the control score
    std_dev_experimental_error=5,     # Standard deviation of the per-subject offset
    min_n=90, max_n=100, n_sims=100)  # Try sample sizes from 90 to 100

print(f"Required sample size for desired power: {required_n}")


Sample size: 90, Achieved Power: 0.01
Sample size: 91, Achieved Power: 0.02
Sample size: 92, Achieved Power: 0.06
Sample size: 93, Achieved Power: 0.04
Sample size: 94, Achieved Power: 0.04
Sample size: 95, Achieved Power: 0.05
Sample size: 96, Achieved Power: 0.04
Sample size: 97, Achieved Power: 0.04
Sample size: 98, Achieved Power: 0.01
Sample size: 99, Achieved Power: 0.02
Sample size: 100, Achieved Power: 0.05
Required sample size for desired power: None
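
A possible reading of the low values above is that the simulated data simply have a population ICC below the 0.75 target. The sketch below checks this under two assumptions: that pingouin's `ICC1` is the one-way, single-measurement ICC computed from the between-subject and within-subject mean squares, and that the offset has mean zero. Under those assumptions the expected mean squares give a population value of 2·sd_control² / (2·sd_control² + sd_error²). The function names are hypothetical, and the simulation uses only the standard library rather than NumPy and pingouin.

```python
import random


def icc1(pairs):
    """One-way, single-measurement ICC for a list of (score_a, score_b) pairs."""
    n, k = len(pairs), 2
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    subject_means = [(a + b) / k for a, b in pairs]
    # Between-subject and within-subject mean squares from a one-way ANOVA
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for (a, b), m in zip(pairs, subject_means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def simulate_pairs(n, mean_control, sd_control, sd_error, rng):
    """Mirror simulate_data with a zero-mean offset: experimental = control + offset."""
    pairs = []
    for _ in range(n):
        control = rng.gauss(mean_control, sd_control)
        pairs.append((control, control + rng.gauss(0, sd_error)))
    return pairs


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
estimate = icc1(simulate_pairs(20000, 60, 5, 5, rng))
expected = 2 * 5**2 / (2 * 5**2 + 5**2)  # 2*sd_c^2 / (2*sd_c^2 + sd_e^2) = 2/3
print(f"simulated ICC1: {estimate:.3f}, expected: {expected:.3f}")
```

If this reasoning holds, the settings used above correspond to a population ICC of about 0.67, so the share of simulations reaching 0.75 should shrink rather than grow as the sample size increases, which would be consistent with the values of 0.01 to 0.06 printed above.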
