# Metrics and Estimates

Statisticians estimate; business analysts measure. Statisticians often use the terms *statistic* and *estimate* for values calculated from the data, to distinguish an interpretation of the data from the 'true' state of affairs. Data scientists and business analysts are more likely to call such values *metrics*. The difference reflects the two disciplines' approaches: accounting for uncertainty lies at the heart of statistics, whereas business or organizational objectives are the focus of data science.
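
To make the distinction concrete, the following minimal sketch reports the same sample mean twice: once as a bare metric, and once as an estimate accompanied by a standard error and an approximate 95% interval. The simulated amounts, the seed, and the use of the normal critical value 1.96 are all assumptions made for illustration; none of them come from the chapter's datasets.

```python
import numpy as np
import pandas as pd

# Simulated invoice amounts: purely illustrative, not the chapter's data
rng = np.random.default_rng(seed=0)
sample = pd.Series(rng.normal(loc=500.0, scale=120.0, size=50))

# The "metric" view: a single number, reported as-is
metric = sample.mean()

# The "estimate" view: the same number plus a statement of its uncertainty
std_error = sample.std(ddof=1) / np.sqrt(len(sample))
ci_low = metric - 1.96 * std_error   # approximate 95% interval (normal approximation)
ci_high = metric + 1.96 * std_error

print(f"Metric:   mean amount = {metric:.2f}")
print(f"Estimate: mean amount = {metric:.2f} +/- {1.96 * std_error:.2f} "
      f"(approx. 95% interval {ci_low:.2f} to {ci_high:.2f})")
```

The calculation of the mean is identical in both views; only the second acknowledges that a different sample would have produced a different value.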

Traditionally, the auditor's initial step when confronted with a new database is to 'foot' the dataset (i.e., compute a total) and 'agree' that total to the client's records (i.e., check whether the client's records match the computed total). In Python, this is done with the pandas sum() method.

In [None]:
import pandas as pd
import numpy as np
from pathlib import Path

# Load the disbursements journal
# Note: Update the path to match your data location
data_path = Path('../data')
disburse = pd.read_csv(data_path / 'ch_2_AP_disbursements_journal.csv')

print("Summary statistics:")
print(disburse.describe())

print(f"\n\nFooted total of disbursements journal = {disburse['amount_paid'].sum():.2f}")
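
The cell above performs the footing; the 'agree' step can be sketched as a comparison of the footed total with a control total taken from the client's records. In the sketch below, the journal rows, the `client_control_total` figure, the `foot_and_agree` helper, and the half-cent tolerance are all hypothetical, chosen only to illustrate the idea.

```python
import pandas as pd

# Illustrative journal: these rows are made up, not taken from the chapter's file
journal = pd.DataFrame({
    "check_no": [1001, 1002, 1003, 1004],
    "amount_paid": [1250.00, 310.45, 98.10, 4420.75],
})
client_control_total = 6079.30  # hypothetical total from the client's ledger


def foot_and_agree(amounts, control_total, tolerance=0.005):
    """Foot a column of amounts and agree the result to a control total.

    Returns (footed_total, difference, agrees). The tolerance absorbs
    floating-point noise below half a cent.
    """
    footed = round(float(amounts.sum()), 2)
    difference = round(footed - control_total, 2)
    return footed, difference, abs(difference) <= tolerance


footed, difference, agrees = foot_and_agree(journal["amount_paid"], client_control_total)
print(f"Footed total: {footed:,.2f}  Difference: {difference:,.2f}  Agrees: {agrees}")
```

A nonzero difference does not say where the discrepancy lies; it only signals that the dataset received may be incomplete or altered and needs follow-up before further testing.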

Beyond the pandas describe() method, pandas and other Python packages can generate further basic statistics from the data. The following cells apply some useful approaches to a banking industry dataset.

In [None]:
# Load banking financial data
bank_fin = pd.read_csv(data_path / 'ch_2_yahoo_fin.csv')

# Display basic info (info() prints its report directly and returns None,
# so it should not be wrapped in print())
print("Dataset Info:")
bank_fin.info()

print("\n" + "="*80)
print("Detailed Statistics:")
print("="*80)
print(bank_fin.describe())

In [None]:
# Group-by statistics
if 'name' in bank_fin.columns:
    print("\nStatistics by Bank Name:")
    print(bank_fin.groupby('name').describe())
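
Calling describe() on a grouped DataFrame produces every statistic for every numeric column, which can be wide and hard to read. When only a few measures are needed, a narrower table can be built with agg(). The sketch below uses a made-up DataFrame; the `total_assets` column and the bank names are hypothetical and may not match the columns in the chapter's file.

```python
import pandas as pd

# Toy data for illustration only; column names are assumptions
toy = pd.DataFrame({
    "name": ["Bank A", "Bank A", "Bank B", "Bank B", "Bank B"],
    "total_assets": [100.0, 120.0, 80.0, 90.0, 100.0],
})

# Request only the statistics of interest for one column, per group
by_bank = toy.groupby("name")["total_assets"].agg(["count", "mean", "min", "max"])

# Derived columns can be added to the result like any other DataFrame
by_bank["range"] = by_bank["max"] - by_bank["min"]

print(by_bank)
```

The result is an ordinary DataFrame indexed by group, so individual figures can be retrieved with, for example, `by_bank.loc["Bank A", "mean"]`.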

If the auditor wishes to use summary statistics for further processing, these can be easily accessed as DataFrames. The statistics can also be formatted into professional tables for inclusion in audit papers.

In [None]:
# Create a formatted summary statistics table
summary_stats = bank_fin.describe().T
summary_stats['range'] = summary_stats['max'] - summary_stats['min']

print("\nFormatted Summary Statistics Table:")
print(summary_stats.to_string())

# You can also export to various formats:
# summary_stats.to_csv('summary_stats.csv')
# summary_stats.to_excel('summary_stats.xlsx')
# summary_stats.to_html('summary_stats.html')

In [None]:
# Additional statistical measures
from scipy import stats

# Select numeric columns only
numeric_cols = bank_fin.select_dtypes(include=[np.number]).columns

print("\nAdditional Statistical Measures:")
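for col in numeric_cols:
    data = bank_fin[col].dropna()
    if len(data) > 0:
        mean = data.mean()
        # The coefficient of variation is undefined when the mean is zero
        cv = data.std() / mean if mean != 0 else np.nan
        print(f"\n{col}:")
        print(f"  Skewness: {stats.skew(data):.4f}")
        # stats.kurtosis returns excess kurtosis by default (normal distribution = 0)
        print(f"  Excess kurtosis: {stats.kurtosis(data):.4f}")
        print(f"  Coefficient of Variation: {cv:.4f}")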
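
For readers who want to see what these measures compute, the sketch below derives them from central moments using only NumPy. The `moment_measures` helper and the sample values are hypothetical. The skewness and excess kurtosis here are the population (biased) versions, which is what scipy.stats.skew and scipy.stats.kurtosis return with their default arguments; the pandas skew() and kurt() methods apply small-sample corrections, so their results differ slightly.

```python
import numpy as np


def moment_measures(values):
    """Skewness, excess kurtosis, and coefficient of variation from raw values."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # second central moment (population variance)
    m3 = np.mean(centered ** 3)  # third central moment
    m4 = np.mean(centered ** 4)  # fourth central moment
    return {
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
        "cv": x.std(ddof=1) / x.mean(),         # sample std relative to the mean
    }


# Made-up values with one large item, giving a long right tail
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
symmetric = [1, 2, 3, 4, 5]

for label, values in [("right-skewed", right_skewed), ("symmetric", symmetric)]:
    measures = moment_measures(values)
    print(label, {key: round(val, 4) for key, val in measures.items()})
```

Positive skewness flags a long right tail, which in audit populations often means a few unusually large transactions that may deserve individual examination.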