# Chapter 2: Foundations of Audit Analytics

Broad application of statistical and data analytic tools is essential for the cost-effective completion of audits. This notebook demonstrates Python tools for analytics that provide a scientific basis for audit decisions.

## Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import auditanalytics as aa

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

## Metrics and Estimates

Statisticians estimate; business analysts measure. In auditing, accounting for uncertainty lies at the heart of the discipline.
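
To make the distinction concrete, here is a minimal sketch of an estimate that carries its uncertainty with it: a point estimate of a mean invoice amount together with a 95% confidence interval. The sample is synthetic and the variable names are hypothetical; none of it comes from the chapter's datasets, and the t-based interval is one common choice, not necessarily the method used later in the book.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 50 invoice amounts (synthetic, for illustration only)
rng = np.random.default_rng(seed=42)
sample = rng.lognormal(mean=6.0, sigma=0.8, size=50)

n = len(sample)
sample_mean = sample.mean()
# Standard error of the mean, using the sample standard deviation (ddof=1)
std_err = sample.std(ddof=1) / np.sqrt(n)

# 95% confidence interval for the mean from Student's t with n - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=sample_mean, scale=std_err)

print(f"Point estimate of mean invoice amount: ${sample_mean:,.2f}")
print(f"95% confidence interval: ${ci_low:,.2f} to ${ci_high:,.2f}")
```

A measurement would report only the first number; the estimate reports the interval as well, which is what lets an auditor judge whether the evidence is precise enough to support a conclusion.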

### Footing and Agreeing

The auditor's initial step when confronted with a new database is to 'foot' the dataset (compute a total) and 'agree' that total to the client's records.

In [None]:
# Load disbursements journal
disburse = aa.load_dataset('ch_2_AP_disbursements_journal')

# Display summary statistics
print(disburse.describe())
print(f"\n\nFooted total of disbursements journal = ${disburse['amount_paid'].sum():,.2f}")
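
The cell above performs the footing step. The agreeing step can be sketched as a comparison of the footed total against a control total from the client's records. The journal rows and the `ledger_total` figure below are hypothetical stand-ins, assumed only for illustration; in practice the control total would come from the client's general ledger or trial balance.

```python
import pandas as pd

# Hypothetical journal extract (illustrative values, not the chapter dataset)
journal = pd.DataFrame({
    'check_no': [1001, 1002, 1003, 1004],
    'amount_paid': [1250.00, 310.45, 98.10, 4720.00],
})

# Hypothetical control total taken from the client's general ledger
ledger_total = 6378.55

# Foot: total the journal, rounded to the cent
footed_total = round(float(journal['amount_paid'].sum()), 2)

# Agree: compare to the cent, allowing for floating-point noise
difference = round(footed_total - ledger_total, 2)
agrees = abs(difference) < 0.005

print(f"Footed total:  ${footed_total:,.2f}")
print(f"Ledger total:  ${ledger_total:,.2f}")
print(f"Difference:    ${difference:,.2f}")
print("Totals agree" if agrees else "Totals do not agree; investigate the difference")
```

Comparing with a small tolerance rather than `==` avoids false mismatches caused by binary floating-point rounding of currency amounts.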

## Banking Industry Dataset Analysis

This section loads the `ch_2_yahoo_fin` dataset of banking-industry financial data and inspects its shape, columns, and summary statistics.

In [None]:
# Load banking financial data
bank_fin = aa.load_dataset('ch_2_yahoo_fin')

# Display basic information
print("Dataset shape:", bank_fin.shape)
print("\nColumn names:")
print(bank_fin.columns.tolist())
print("\nFirst few rows:")
bank_fin.head()

In [None]:
# Descriptive statistics
print("Descriptive Statistics:")
bank_fin.describe(include='all')

In [None]:
# Group-by analysis if 'name' column exists
if 'name' in bank_fin.columns:
    print("\nStatistics by Institution:")
    print(bank_fin.groupby('name').describe())

## Data Visualization

Visual analysis helps identify patterns and anomalies in financial data.

In [None]:
# Example: Distribution plot for numerical columns
numeric_cols = bank_fin.select_dtypes(include=[np.number]).columns

if len(numeric_cols) > 0:
    # Plot first numeric column as example
    plt.figure(figsize=(10, 6))
    sns.histplot(bank_fin[numeric_cols[0]].dropna(), kde=True)
    plt.title(f'Distribution of {numeric_cols[0]}')
    plt.xlabel(numeric_cols[0])
    plt.ylabel('Frequency')
    plt.show()

## Statistical Testing

Statistical tests help validate assumptions about the data.

In [None]:
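# Normality test (D'Agostino-Pearson) on the first numeric column
if len(numeric_cols) > 0:
    sample_col = numeric_cols[0]
    data = bank_fin[sample_col].dropna()

    # normaltest builds on skewtest, which needs at least 8 observations
    if len(data) >= 8:
        statistic, p_value = stats.normaltest(data)
        print(f"Normality test for {sample_col}:")
        print(f"  Statistic: {statistic:.4f}")
        print(f"  P-value: {p_value:.4f}")
        # A large p-value means normality is not rejected; it does not prove normality
        verdict = 'Normality not rejected at the 5% level' if p_value > 0.05 else 'Normality rejected at the 5% level'
        print(f"  Result: {verdict}")
    else:
        print(f"Too few observations in {sample_col} for a normality test")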
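
To get a feel for how `stats.normaltest` behaves, the sketch below applies it to two synthetic samples: one drawn from a normal distribution and one from a lognormal distribution, the kind of right-skewed shape that monetary amounts often show. The data are simulated purely for illustration and are not taken from `bank_fin`; the log transform at the end is an assumption about what might be useful, not a step from the chapter.

```python
import numpy as np
from scipy import stats

# Synthetic samples for illustration only
rng = np.random.default_rng(seed=0)
samples = {
    'normal draw': rng.normal(loc=100.0, scale=15.0, size=500),
    'lognormal draw': rng.lognormal(mean=4.0, sigma=1.0, size=500),
}

# Run the D'Agostino-Pearson test on each sample and keep the p-values
p_values = {}
for label, values in samples.items():
    statistic, p_value = stats.normaltest(values)
    p_values[label] = p_value
    print(f"{label}: statistic = {statistic:.2f}, p-value = {p_value:.4g}")

# Taking logs of lognormal data yields normally distributed values,
# so the test is much less likely to reject after the transform
_, p_logged = stats.normaltest(np.log(samples['lognormal draw']))
print(f"log of lognormal draw: p-value = {p_logged:.4g}")
```

The heavily skewed sample is rejected decisively, while a truly normal sample will usually, though not always, pass at the 5% level. With large samples the test also flags small departures from normality that may not matter in practice, so it is worth reading the result alongside a histogram.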

## Conclusion

This notebook demonstrates fundamental statistical and analytical techniques used in audit analytics with Python. These tools provide the foundation for more advanced audit procedures covered in subsequent chapters.