# Sample Sizes for Budgeting

There are two types of sampling in interim tests:

1. Discovery sampling for discovery of *out-of-control* transaction streams
2. Attribute sampling for estimating transaction error rate

Discovery sampling sets a sample size that is likely to discover at least one error in the sample if the actual transaction error rate exceeds the maximum acceptable error rate (alternatively called the out-of-control rate of error). Discovery tests help the auditor decide whether the systems processing a particular transaction stream are in or out of control. Budgeted sample sizes in interim testing depend on whether the RAM suggests that control risk is low or high. If it is low, then the discovery sample size plus a 'security' factor for cases where error is discovered will estimate the scope of auditing.
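
The discovery sample size follows from a short calculation, sketched here under the assumption that transactions are sampled independently and that each one is in error with probability $p$ (the intolerable error rate). The symbols $p$, $C$ (the desired confidence level), and $n$ (the sample size) are introduced for illustration only.

$$
P(\text{at least one error in } n \text{ draws}) = 1 - (1 - p)^n \ge C
\quad\Longrightarrow\quad
n \ge \frac{\ln(1 - C)}{\ln(1 - p)}
$$

The inequality flips when dividing by $\ln(1 - p)$ because that quantity is negative. In practice $n$ is rounded up to the next whole transaction.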

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import pandas as pd

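# Discovery sampling calculation: sample size needed to find at least one error
# at confidence levels from 99% down to 70%, for a 5% intolerable error rate
confidence = np.arange(0.99, 0.69, -0.01)
n = np.log(1 - confidence) / np.log(1 - 0.05)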

plt.figure(figsize=(10, 6))
plt.plot(confidence, n)
plt.xlabel('Confidence Level')
plt.ylabel('Sample Size (n)')
plt.title('Discovery Sample Size vs Confidence Level')
plt.grid(True)
plt.show()

So for a 5% intolerable error rate at 95% confidence we have:

In [None]:
confidence = 0.95
n = np.log(1 - confidence) / np.log(1 - 0.05)
print(f"\nDiscovery sample size = {np.ceil(n):.0f}")
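
As a sanity check on the figure above, the following minimal sketch computes the chance of catching at least one error directly, for the required sample size and for one transaction fewer. It assumes independent draws; the function names `discovery_sample_size` and `detection_probability` are hypothetical helpers, not part of any library.

```python
import math


def discovery_sample_size(error_rate: float, confidence: float) -> int:
    """Smallest n for which P(at least one error in n draws) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - error_rate))


def detection_probability(error_rate: float, n: int) -> float:
    """Chance that n independent draws contain at least one error."""
    return 1 - (1 - error_rate) ** n


# 5% intolerable error rate at 95% confidence, as in the cell above
n_required = discovery_sample_size(0.05, 0.95)
p_at_n = detection_probability(0.05, n_required)
p_below_n = detection_probability(0.05, n_required - 1)

print(f"Required sample size: {n_required}")
print(f"P(detect) with n = {n_required}: {p_at_n:.4f}")
print(f"P(detect) with n = {n_required - 1}: {p_below_n:.4f}")
```

The detection probability should sit at or just above the target confidence for the required size and fall just below it when one transaction is dropped, which is why the sample size is rounded up.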

Where the RAM assesses control risk to be anything higher, the auditor can assume that scope will be expanded to include attribute sampling. Attribute sampling estimates the error rate in the entire transaction population with some confidence (e.g., 95%) that the estimate is within the out-of-control error-rate cutoff for that transaction stream. Error estimates from attribute samples may be *rates*, *amounts*, or both.

If discovery sampling suggests that a particular transaction stream is out of control, then attribute estimation will help us decide on the actual error rate of the systems that process this transaction stream. Attribute sample size is determined using Cohen's power analysis, computed here with the power routines in `statsmodels`.

In [None]:
# Attribute sample for estimating 'rate' of errors
size = 1000  # number of transactions
Delta = 0.05 * size  # detect a 5% occurrence of error
sigma = 0.3 * size  # assumed variability (a guess of roughly one third)
effect = Delta / sigma  # standardized effect size (Cohen's d); size cancels out

# Using power analysis to determine sample size
# For one-sample t-test with effect size, alpha=0.05, power=0.8
from statsmodels.stats.power import TTestPower

power_analysis = TTestPower()
sample_n = power_analysis.solve_power(effect_size=effect, alpha=0.05, power=0.8, alternative='larger')

print(f"\nAttribute sample size for occurrence of error = {np.ceil(sample_n):.0f}")

In [None]:
# Attribute sample for estimating 'amount' of errors
size = 100000  # total amount of transactions
mu = 50  # average value of transaction
Delta = 0.05 * mu  # detect 5% amount intolerable error
sigma = 30  # variability
effect = Delta / sigma

sample_n = power_analysis.solve_power(effect_size=effect, alpha=0.05, power=0.8, alternative='larger')

print(f"\nAttribute sample size for amount of error = {np.ceil(sample_n):.0f}")
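
As a rough cross-check on the two power calculations above, the sketch below uses the familiar normal approximation for a one-sided, one-sample test, n ≈ ((z_alpha + z_power) / d)^2, with only the standard library. This is a swapped-in approximation, not the noncentral-t calculation that `TTestPower` performs, so its results should land close to (and typically slightly below) the figures printed above. The helper name `approx_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_sample_size(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size for a one-sided, one-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)      # quantile for the desired power
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)


# Same effect sizes as the two attribute-sampling cells above
rate_effect = 0.05 / 0.3           # 5% occurrence, variability ~0.3
amount_effect = (0.05 * 50) / 30   # 5% of a 50 average value, sigma = 30

rate_n = approx_sample_size(rate_effect)
amount_n = approx_sample_size(amount_effect)

print(f"Approximate sample size for occurrence of error = {rate_n}")
print(f"Approximate sample size for amount of error = {amount_n}")
```

Because the sample size scales with 1 / d^2, halving the effect size roughly quadruples the budgeted sample, which is why the 'amount' test needs far more items than the 'rate' test here.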

The auditor faces different decisions in substantive testing. The particular type of account determines the impact of control weaknesses found in interim testing. For example, a 5% error rate in a $1 million sales account discovered in interim testing implies a $50,000 error in annual sales on the trial balance. In contrast, if accounts receivable turns over 10 times annually, the average receivables balance is about $100,000, and that 5% error rate implies only a $5,000 misstatement in accounts receivable. Whether sales or accounts receivable are 'fairly stated' depends on the materiality level set by the auditor: a $10,000 materiality level would imply that sales is not fairly presented, while accounts receivable is fairly stated.
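
The arithmetic in this example can be laid out as a short sketch. The figures are the ones quoted in the paragraph above; the variable names are chosen for illustration, and the receivables balance is approximated as annual sales divided by turnover.

```python
annual_sales = 1_000_000   # sales account balance
error_rate = 0.05          # error rate found in interim testing
ar_turnover = 10           # accounts receivable turns over 10 times a year
materiality = 10_000       # materiality level set by the auditor

# Misstatement implied in each account by the same 5% error rate
sales_misstatement = error_rate * annual_sales
ar_balance = annual_sales / ar_turnover  # approximate average receivables
ar_misstatement = error_rate * ar_balance

for account, misstatement in [("Sales", sales_misstatement),
                              ("Accounts receivable", ar_misstatement)]:
    verdict = "not fairly stated" if misstatement > materiality else "fairly stated"
    print(f"{account}: implied misstatement ${misstatement:,.0f} -> {verdict}")
```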

At year-end, a complete set of transactions is available for the year, and substantive samples typically focus on acceptance sampling to determine whether the account balance is 'fairly stated' (does not contain intolerable or material error). The approach is the same as attribute sampling of amounts, and is inherently more straightforward than interim control tests. Substantive tests estimate the error rate in an account balance with some confidence (e.g., 95%) that the estimate is within the 'materiality' or 'intolerable error' cutoff for that account balance.

For example, consider sampling sales invoices from the accounts receivable aging report and comparing them to supporting documentation to see if they were billed in the correct amounts, to the correct customers, and on the correct dates. Additionally, auditors might trace invoices to the shipping log and match invoice dates to the shipment dates for those items, to see if sales are being recorded in the correct accounting period. This can include an examination of invoices issued after the period being audited, to see if they should have been included in a prior period.

Acceptance sampling size is determined using Cohen's power analysis.

In [None]:
# Acceptance sample for estimating 'amount' of error in an account balance
size = 100000  # Account balance
mu = 50  # average value of account transaction
Delta = 0.05 * mu  # detect 5% amount intolerable error
sigma = 30  # variability
effect = Delta / sigma

sample_n = power_analysis.solve_power(effect_size=effect, alpha=0.05, power=0.8, alternative='larger')

print(f"\nAcceptance sample size for amount of error = {np.ceil(sample_n):.0f}")