### Sample Size Calculation for Model Accuracy

The required sample size to estimate model accuracy within a given margin of error is calculated using:

$$
n = \frac{Z^2 \cdot p \cdot (1 - p)}{MOE^2}
$$

where:  
- \( n \) = required sample size  
- \( Z \) = Z-score for the desired confidence level (e.g., **1.96** for 95%)  
- \( p \) = expected model accuracy (e.g., 0.75 for 75%)  
- \( MOE \) = desired margin of error (e.g., **0.03** for ±3%)  

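This formula is based on the normal approximation to the binomial distribution, so it is less reliable when \( n \) is small or \( p \) is close to 0 or 1. The product \( p \cdot (1 - p) \) peaks at \( p = 0.5 \), so using \( p = 0.5 \) gives the most conservative (largest) sample size when the expected accuracy is unknown.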
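
As a worked instance of the formula, the sketch below computes \( n \) for the example values listed above (\( p = 0.75 \), \( MOE = 0.03 \), 95% confidence). The function name `required_sample_size` is hypothetical. It takes the Z-score from the standard library's `statistics.NormalDist`, and it rounds up rather than to the nearest integer, on the assumption that a fractional result should be covered by one extra observation.

```python
import math
from statistics import NormalDist


def required_sample_size(p, moe, confidence=0.95):
    """Normal-approximation sample size for estimating a proportion p within ±moe."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if moe <= 0:
        raise ValueError("moe must be positive")
    # Two-tailed Z-score, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up so the requested margin of error is not exceeded
    return math.ceil(z**2 * p * (1 - p) / moe**2)


print(required_sample_size(0.75, 0.03))  # 801
print(required_sample_size(0.50, 0.03))  # 1068
```

The unrounded value for \( p = 0.75 \) is about 800.3, so rounding up gives 801 while rounding to the nearest integer gives 800.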


In [7]:
import scipy.stats as stats
import pandas as pd

# Constants
z = stats.norm.ppf(0.975)  # Z-score for 95% confidence level (two-tailed, 1.96)

# Expected accuracy values to test
accuracy_values = [0.50, 0.55, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90]  # Example expected model accuracies (duplicate 0.60 removed)

# Desired margins of error (MOE)
moe_values = [0.02, 0.03, 0.05, 0.10, 0.15, 0.20]  # Example MOEs from ±2% to ±20%

# Compute required sample sizes
sample_sizes = {}
for p in accuracy_values:
    sample_sizes[p] = {moe: (z**2 * p * (1 - p)) / (moe**2) for moe in moe_values}

# Round each sample size to the nearest whole number
sample_sizes_rounded = {
    p: {moe: round(size, 0) for moe, size in moes.items()}
    for p, moes in sample_sizes.items()
}

# Create a DataFrame for better visualization
df = pd.DataFrame(sample_sizes_rounded).T  # Transpose to match expected format
df.columns = [f"MOE ±{round(moe * 100)}%" for moe in moe_values]  # round() avoids float truncation in the label
df.index.name = "Expected Accuracy"
df

| Expected Accuracy | MOE ±2% | MOE ±3% | MOE ±5% | MOE ±10% | MOE ±15% | MOE ±20% |
|---|---|---|---|---|---|---|
| 0.5 | 2401.0 | 1067.0 | 384.0 | 96.0 | 43.0 | 24.0 |
| 0.55 | 2377.0 | 1056.0 | 380.0 | 95.0 | 42.0 | 24.0 |
| 0.6 | 2305.0 | 1024.0 | 369.0 | 92.0 | 41.0 | 23.0 |
| 0.7 | 2017.0 | 896.0 | 323.0 | 81.0 | 36.0 | 20.0 |
| 0.75 | 1801.0 | 800.0 | 288.0 | 72.0 | 32.0 | 18.0 |
| 0.8 | 1537.0 | 683.0 | 246.0 | 61.0 | 27.0 | 15.0 |
| 0.85 | 1224.0 | 544.0 | 196.0 | 49.0 | 22.0 | 12.0 |
| 0.9 | 864.0 | 384.0 | 138.0 | 35.0 | 15.0 | 9.0 |
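
One way to sanity-check these sample sizes is to invert the formula and compute the margin of error a given \( n \) achieves, \( MOE = Z \sqrt{p(1 - p)/n} \). The sketch below does this with a hypothetical helper, `achieved_moe`, which is not part of the original notebook.

```python
import math
from statistics import NormalDist


def achieved_moe(p, n, confidence=0.95):
    """Margin of error reached with n samples at expected accuracy p."""
    # Two-tailed Z-score, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


# n = 800 at p = 0.75 should give back a margin of roughly ±3%
print(round(achieved_moe(0.75, 800), 4))  # 0.03
```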
