# Chapter 2: Metrics and Estimates

Converted from R to Python

## Introduction

Statisticians estimate; business analysts measure. Statisticians use the terms *statistic* and *estimate* for values calculated from the data, to distinguish what the data show from the 'true' state of affairs. Data scientists and business analysts are more likely to call such values *metrics*. The difference reflects the two disciplines' approaches: accounting for uncertainty lies at the heart of statistics, whereas data science focuses on business or organizational objectives.
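To make the distinction concrete, the sketch below treats a total computed over every record as a metric, and a total projected from a random sample as an estimate that carries a standard error. The invoice amounts are made-up illustrative values, not data from this chapter, and the interval is a rough normal approximation that ignores the finite population correction.

```python
import numpy as np

# Hypothetical population of 1,000 invoice amounts (illustrative values only)
rng = np.random.default_rng(seed=42)
population = rng.uniform(100.0, 5000.0, size=1000).round(2)

# Metric: computed from every record, so there is no sampling uncertainty
total = population.sum()

# Estimate: mean of a random sample of 50 records, with its standard error
sample = rng.choice(population, size=50, replace=False)
sample_mean = sample.mean()
std_error = sample.std(ddof=1) / np.sqrt(len(sample))

# Scale the sample mean up to the population size to estimate the total
estimated_total = sample_mean * len(population)
margin = 1.96 * std_error * len(population)  # rough 95% margin (normal approximation)

print(f"Metric (actual total):  {total:,.2f}")
print(f"Estimate (from sample): {estimated_total:,.2f} +/- {margin:,.2f}")
```

The metric is a single number with nothing left to infer; the estimate is only meaningful together with its margin of error.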

Traditionally, the auditor's first step when confronted with a new database has been to 'foot' the dataset (i.e., compute its total) and 'agree' that total to the client's records (i.e., check whether the client's records match the computed total).
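Footing and agreeing can be sketched in a few lines. The helper below, `foot_and_agree_sketch()`, is hypothetical and is not the `auditanalytics` package's own function; it assumes the amounts arrive as a plain list of numbers and converts them to `Decimal` so that cents add exactly instead of accumulating binary floating-point error.

```python
from decimal import Decimal

def foot_and_agree_sketch(amounts, client_total, tolerance=Decimal("0.00")):
    """Hypothetical helper: foot a list of amounts and agree it to a client total.

    Values are converted to Decimal via str() so that cents add exactly.
    Returns (footed_total, difference, agrees).
    """
    footed = sum((Decimal(str(a)) for a in amounts), Decimal("0"))
    difference = footed - Decimal(str(client_total))
    return footed, difference, abs(difference) <= tolerance

# Illustrative ledger amounts and a client-reported control total (made up)
amounts = [1250.10, 310.25, 89.99, 4000.00]
footed, diff, agrees = foot_and_agree_sketch(amounts, client_total=5650.34)
print(f"Footed total: {footed}, difference: {diff}, agrees: {agrees}")
# prints "Footed total: 5650.34, difference: 0.00, agrees: True"
```

A nonzero `tolerance` lets small, documented rounding differences pass while any larger discrepancy is flagged for follow-up.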

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import auditanalytics as aa

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

## Footing and Agreeing

Footing can be done with the pandas `sum()` method, as in the cell below, or with our `foot_and_agree()` function.

In [None]:
# Load the disbursements journal
disburse = aa.data.load_data('ch_2_AP_disbursements_journal.csv')

# Display summary
print(disburse.describe())

# Foot the total
print(f"\n\nFooted total of disbursements journal = ${disburse['amount_paid'].sum():,.2f}")
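One caution when footing with pandas: `Series.sum()` skips missing values by default (`skipna=True`), so a blank `amount_paid` field silently drops out of the total. The sketch below uses a small made-up frame (only the `amount_paid` column name comes from the cell above; `check_no` is hypothetical) to count missing amounts before footing and to show how `skipna=False` makes a gap visible.

```python
import numpy as np
import pandas as pd

# Illustrative disbursements (made-up values); one payment amount is missing
journal = pd.DataFrame({
    "check_no": [1001, 1002, 1003, 1004],
    "amount_paid": [500.00, 1250.50, np.nan, 75.25],
})

# Count missing amounts first, because sum() ignores NaN by default
missing = journal["amount_paid"].isna().sum()
footed = journal["amount_paid"].sum()

# With skipna=False any missing value propagates, so the foot itself is NaN
strict_foot = journal["amount_paid"].sum(skipna=False)

print(f"Missing amounts: {missing}")
print(f"Footed total (NaN skipped): {footed:,.2f}")
print(f"Strict foot (NaN propagates): {strict_foot}")
```

Reporting the count of missing amounts alongside the footed total documents that the total covers only the populated records.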

## Summary Statistics with pandas

Python's pandas library has built-in methods for generating summary statistics. The `describe()` method gives a quick overview of the banking industry dataset:

In [None]:
# Load banking financial data
bank_fin = aa.data.load_data('ch_2_yahoo_fin.csv')

# Basic statistics
print("Basic Summary Statistics:")
print(bank_fin.describe())
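`describe()` also takes options that are handy in audit work. A sketch on made-up closing prices for two hypothetical banks (the `name` and `close` columns are assumptions, not the actual layout of `ch_2_yahoo_fin.csv`) shows custom percentiles and `include="all"`, which adds counts for non-numeric columns.

```python
import pandas as pd

# Illustrative closing prices for two hypothetical banks (made-up values)
prices = pd.DataFrame({
    "name": ["BankA", "BankA", "BankA", "BankB", "BankB", "BankB"],
    "close": [40.0, 42.0, 44.0, 10.0, 12.0, 14.0],
})

# Default: numeric columns only (count, mean, std, min, quartiles, max)
basic = prices.describe()
print(basic)

# Custom percentiles such as the 5th and 95th help spot extreme values;
# the median (50%) is always included
tails = prices.describe(percentiles=[0.05, 0.95])
print(tails)

# include="all" also summarizes text columns (count, unique, top, freq)
everything = prices.describe(include="all")
print(everything)
```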

## Advanced Summary Statistics

The auditanalytics package provides enhanced summary statistics similar to R's Hmisc and psych packages:

In [None]:
# Extended statistics including skewness and kurtosis
extended_stats = aa.core.compute_summary_stats(bank_fin, describe_type='extended')
print("\nExtended Summary Statistics:")
print(extended_stats)
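The internals of `compute_summary_stats()` are not shown here, but plain pandas can add skewness and kurtosis to the usual summary. In this sketch on made-up daily returns, note that pandas `kurt()` reports excess kurtosis (about 0 for a normal distribution) and both measures use bias-corrected estimators, so values may differ somewhat from other packages.

```python
import pandas as pd

# Illustrative daily returns (made-up values), including one large gain
returns = pd.DataFrame({
    "ret": [0.01, -0.02, 0.015, 0.005, -0.01, 0.12, 0.0, -0.005],
})

# agg() accepts the names of pandas reductions, including "skew" and "kurt"
extended = returns.agg(["count", "mean", "std", "skew", "kurt"])
print(extended)
```

The single large return produces positive skew and a heavy tail, which is the kind of pattern these extra statistics are meant to surface.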

## Group-wise Statistics

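Statistics can also be computed by group. Here the `group_by` argument of `compute_summary_stats()` summarizes each group in the `name` column separately: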

In [None]:
# If the data has a 'name' column, we can group by it
if 'name' in bank_fin.columns:
    grouped_stats = aa.core.compute_summary_stats(bank_fin, group_by='name')
    print("\nGrouped Statistics:")
    print(grouped_stats)
else:
    print("No 'name' column found for grouping")
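The call above depends on the `auditanalytics` package; plain pandas produces a similar per-group table with `groupby()` and `agg()`. A sketch on made-up prices for two hypothetical banks, grouping on a `name` column as the cell above does:

```python
import pandas as pd

# Illustrative prices for two hypothetical banks (made-up values)
bank_prices = pd.DataFrame({
    "name": ["BankA", "BankB", "BankA", "BankB", "BankA", "BankB"],
    "close": [40.0, 10.0, 42.0, 12.0, 44.0, 14.0],
})

# groupby() splits the rows by bank; agg() computes each statistic per group
by_bank = bank_prices.groupby("name")["close"].agg(["count", "mean", "std", "min", "max"])
print(by_bank)
```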

## Creating Formatted Tables

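For audit working papers, summary tables can be formatted for readability. The cell below rounds the summary to two decimals and shades it with a color gradient: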

In [None]:
# Create a styled table
summary = bank_fin.describe().round(2)
summary.style.background_gradient(cmap='Blues')
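The `Styler` object above renders as HTML in Jupyter and relies on pandas' optional jinja2 dependency. When a working paper needs plain text instead, `DataFrame.to_string()` with a `float_format` callable can format every figure with thousands separators and two decimals. The `loan_balance` figures below are made up for illustration.

```python
import pandas as pd

# Illustrative balances in dollars (made-up values)
balances = pd.DataFrame({"loan_balance": [1200000.0, 875500.5, 2310000.25]})

summary = balances.describe().round(2)

# float_format is applied to every float cell when rendering the text table
table_text = summary.to_string(float_format=lambda x: f"{x:,.2f}")
print(table_text)
```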

## Visualization

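Python's matplotlib and seaborn libraries make it easy to visualize data. The cell below uses the pandas `hist()` method, which draws with matplotlib, to plot histograms of up to three numeric columns: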

In [None]:
# Select numeric columns only
numeric_cols = bank_fin.select_dtypes(include=[np.number]).columns

if len(numeric_cols) > 0:
    # Plot histograms for at most the first three numeric columns
    cols_to_plot = numeric_cols[:3]
    # squeeze=False always returns a 2-D array of Axes, even for one subplot
    fig, axes = plt.subplots(1, len(cols_to_plot), figsize=(15, 4), squeeze=False)

    for ax, col in zip(axes[0], cols_to_plot):
        bank_fin[col].hist(ax=ax, bins=20, edgecolor='black')
        ax.set_title(f'Distribution of {col}')
        ax.set_xlabel(col)
        ax.set_ylabel('Frequency')

    plt.tight_layout()
    plt.show()
else:
    print("No numeric columns found for visualization")
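Each bar in these histograms is the count of values that fall in one bin. To record those counts in a working paper rather than read them off a chart, `numpy.histogram()` returns the counts and bin edges directly. The values below are made up, and 4 bins are used instead of the 20 above to keep the output short.

```python
import numpy as np

# Illustrative values (made-up) split into 4 equal-width bins
values = np.array([1.0, 2.0, 2.5, 3.0, 4.0, 4.5, 7.0, 9.0])
counts, edges = np.histogram(values, bins=4)

# Each bin covers [left, right); the last bin also includes its right edge
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.2f} to {right:5.2f}: {n}")
```

An empty bin, like the one between 5 and 7 here, is easy to miss on a chart but obvious in the printed counts.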

## Conclusion

This notebook demonstrates basic audit analytics operations in Python:

1. **Footing and agreeing** - Verifying totals
2. **Summary statistics** - Computing descriptive statistics
3. **Grouped analysis** - Analyzing data by categories
4. **Visualization** - Creating charts and graphs

Each of these operations from the R version has a Python counterpart in pandas, matplotlib, or the `auditanalytics` package.