### Finance – Ensuring Accurate Transactions

**Task 1**: Transaction Data Validation Insights

**Objective**: Maintain transaction integrity by catching duplicate, malformed, or implausible records before they are processed.

**Steps**:
1. Choose a sample financial transaction dataset.
2. Identify common transaction issues, such as duplicate entries or incorrect amounts.
3. Develop a list of validation checks specific to financial transactions.

In [1]:
import pandas as pd

# --- Step 1: Sample Financial Transaction Dataset ---
data = {
    'transaction_id': [1001, 1002, 1003, 1004, 1002],  # duplicate ID 1002
    'account_id': ['A001', 'A002', 'A001', 'A003', 'A002'],
    'transaction_date': ['2025-05-01', '2025-05-02', '2025-05-03', '2025-05-04', '2025-05-02'],
    'amount': [250.00, -150.00, 300.00, 500.00, -150.00],  # negative amounts allowed for refunds
    'transaction_type': ['debit', 'credit', 'debit', 'credit', 'credit'],
}

df = pd.DataFrame(data)
df['transaction_date'] = pd.to_datetime(df['transaction_date'])

# Master list of valid account IDs (could come from a reference table)
valid_accounts = {'A001', 'A002', 'A003', 'A004'}

# --- Steps 2 and 3: Validation Checks for Common Issues ---

def validate_transactions(df, valid_accounts):
    """Print a report of failed validation checks for a transactions DataFrame."""
    print("Starting transaction data validation...\n")
    
    # Check 1: Duplicate transaction IDs
    duplicates = df[df.duplicated(subset=['transaction_id'], keep=False)]
    if not duplicates.empty:
        print("Duplicate transaction IDs found:")
        print(duplicates[['transaction_id', 'account_id', 'amount']], "\n")
    else:
        print("No duplicate transaction IDs found.\n")
        
    # Check 2: Amount validity (e.g., no zero amounts)
    zero_amounts = df[df['amount'] == 0]
    if not zero_amounts.empty:
        print("Transactions with zero amount found:")
        print(zero_amounts, "\n")
    else:
        print("No zero amount transactions found.\n")
    
    # Check 3: Valid transaction types
    valid_types = ['debit', 'credit']
    invalid_types = df[~df['transaction_type'].isin(valid_types)]
    if not invalid_types.empty:
        print("Invalid transaction types found:")
        print(invalid_types[['transaction_id', 'transaction_type']], "\n")
    else:
        print("All transaction types are valid.\n")
        
    # Check 4: Transaction dates not in the future
    today = pd.Timestamp.today()
    future_dates = df[df['transaction_date'] > today]
    if not future_dates.empty:
        print("Transactions with future dates found:")
        print(future_dates[['transaction_id', 'transaction_date']], "\n")
    else:
        print("No future transaction dates found.\n")
    
    # Check 5: Valid account IDs
    invalid_accounts = df[~df['account_id'].isin(valid_accounts)]
    if not invalid_accounts.empty:
        print("Transactions with invalid account IDs found:")
        print(invalid_accounts[['transaction_id', 'account_id']], "\n")
    else:
        print("All account IDs are valid.\n")

# Run validation
validate_transactions(df, valid_accounts)

Starting transaction data validation...

Duplicate transaction IDs found:
   transaction_id account_id  amount
1            1002       A002  -150.0
4            1002       A002  -150.0 

No zero amount transactions found.

All transaction types are valid.

No future transaction dates found.

All account IDs are valid.
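
The function above only prints its findings. When the results feed another step (a report, a rejection queue), it can help to return them as data instead. The following is a minimal sketch, assuming a hypothetical helper `flag_issues` and the same column names as the sample dataset; the rule names and the fixed `now` value are illustrative and not part of the original code:

```python
import pandas as pd

def flag_issues(df, valid_accounts, valid_types=("debit", "credit"), now=None):
    """Return a boolean DataFrame: one column per check, True where a row fails.

    `flag_issues` and its rule names are hypothetical, for illustration only.
    """
    now = pd.Timestamp.now() if now is None else now
    return pd.DataFrame({
        "duplicate_id": df.duplicated(subset=["transaction_id"], keep=False),
        "zero_amount": df["amount"] == 0,
        "invalid_type": ~df["transaction_type"].isin(list(valid_types)),
        "future_date": df["transaction_date"] > now,
        "invalid_account": ~df["account_id"].isin(list(valid_accounts)),
    }, index=df.index)

# Same sample data as in the cell above
sample = pd.DataFrame({
    "transaction_id": [1001, 1002, 1003, 1004, 1002],
    "account_id": ["A001", "A002", "A001", "A003", "A002"],
    "transaction_date": pd.to_datetime(
        ["2025-05-01", "2025-05-02", "2025-05-03", "2025-05-04", "2025-05-02"]),
    "amount": [250.00, -150.00, 300.00, 500.00, -150.00],
    "transaction_type": ["debit", "credit", "debit", "credit", "credit"],
})

# A fixed reference time keeps the future-date check reproducible
issues = flag_issues(sample, {"A001", "A002", "A003", "A004"},
                     now=pd.Timestamp("2025-06-01"))
failure_counts = issues.sum()            # number of failing rows per check
bad_rows = sample[issues.any(axis=1)]    # rows failing at least one check
print(failure_counts)
print(bad_rows)
```

Keeping one boolean column per rule makes it straightforward to count failures per check or to filter out every row that fails any check.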



**Task 2**: Implement Financial Data Validation

**Objective**: Use automated checks to validate each transaction before it is accepted.

**Steps**:
1. Integrate data validation rules into your existing financial systems.
2. Run the checks in real time so that each record is validated on entry.

In [2]:
import pandas as pd

# Master valid accounts list (could come from your DB or config)
valid_accounts = {'A001', 'A002', 'A003', 'A004'}

# Valid transaction types
valid_types = {'debit', 'credit'}

# Validation functions
def check_duplicate(transaction_id, existing_df):
    return transaction_id in existing_df['transaction_id'].values

def check_amount(amount):
    return amount != 0

def check_transaction_type(t_type):
    return t_type in valid_types

def check_transaction_date(t_date):
    return t_date <= pd.Timestamp.now()

def check_account_id(account_id):
    return account_id in valid_accounts

# Real-time transaction validation function
def validate_transaction_entry(transaction, existing_df):
    """
    transaction: dict with keys: transaction_id, account_id, transaction_date, amount, transaction_type
    existing_df: DataFrame of existing transactions to check duplicates
    
    Returns: (bool, list of error messages)
    """
    errors = []
    
    if check_duplicate(transaction['transaction_id'], existing_df):
        errors.append("Duplicate transaction ID.")
    if not check_amount(transaction['amount']):
        errors.append("Amount cannot be zero.")
    if not check_transaction_type(transaction['transaction_type']):
        errors.append(f"Invalid transaction type: {transaction['transaction_type']}")
    if not check_transaction_date(transaction['transaction_date']):
        errors.append("Transaction date is in the future.")
    if not check_account_id(transaction['account_id']):
        errors.append(f"Invalid account ID: {transaction['account_id']}")
    
    return (len(errors) == 0), errors

# --- Simulate existing transactions ---
existing_transactions = pd.DataFrame({
    'transaction_id': [1001, 1002, 1003],
    'account_id': ['A001', 'A002', 'A001'],
    'transaction_date': pd.to_datetime(['2025-05-01', '2025-05-02', '2025-05-03']),
    'amount': [250.00, -150.00, 300.00],
    'transaction_type': ['debit', 'credit', 'debit'],
})

# --- Simulate new incoming transaction entries ---
new_transactions = [
    {
        'transaction_id': 1002,  # duplicate ID
        'account_id': 'A002',
        'transaction_date': pd.Timestamp('2025-05-05'),
        'amount': -150.00,
        'transaction_type': 'credit',
    },
    {
        'transaction_id': 1004,
        'account_id': 'A005',  # invalid account
        'transaction_date': pd.Timestamp('2025-05-06'),
        'amount': 0,           # zero amount invalid
        'transaction_type': 'debit',
    },
    {
        'transaction_id': 1005,
        'account_id': 'A003',
        'transaction_date': pd.Timestamp.now() + pd.Timedelta(days=30),  # always in the future, so invalid
        'amount': 500.00,
        'transaction_type': 'transfer',                 # invalid transaction type
    },
    {
        'transaction_id': 1006,
        'account_id': 'A003',
        'transaction_date': pd.Timestamp('2025-05-04'),
        'amount': 400.00,
        'transaction_type': 'debit',
    },
]

# --- Validate new transactions ---
for tx in new_transactions:
    is_valid, errs = validate_transaction_entry(tx, existing_transactions)
    print(f"Validating transaction ID {tx['transaction_id']}:")
    if is_valid:
        print("  -> Valid transaction. Can be accepted.\n")
        # DataFrame.append was removed in pandas 2.0; use pd.concat instead
        existing_transactions = pd.concat([existing_transactions, pd.DataFrame([tx])], ignore_index=True)
    else:
        print("  -> Invalid transaction due to:")
        for e in errs:
            print(f"     - {e}")
        print()

Validating transaction ID 1002:
  -> Invalid transaction due to:
     - Duplicate transaction ID.

Validating transaction ID 1004:
  -> Invalid transaction due to:
     - Amount cannot be zero.
     - Invalid account ID: A005

Validating transaction ID 1005:
  -> Invalid transaction due to:
     - Invalid transaction type: transfer
     - Transaction date is in the future.

Validating transaction ID 1006:
  -> Valid transaction. Can be accepted.
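
`check_duplicate` scans the whole `transaction_id` column on every call, and the loop rebuilds the DataFrame with `pd.concat` for each accepted row. For entry-time checks on a larger ledger, one option is to track seen IDs in a set. The following is a minimal sketch, assuming a hypothetical `TransactionStore` class (not part of the code above) that reimplements the same five rules:

```python
import pandas as pd

class TransactionStore:
    """Hypothetical in-memory store that validates each transaction on entry."""

    def __init__(self, valid_accounts, valid_types=("debit", "credit")):
        self.valid_accounts = set(valid_accounts)
        self.valid_types = set(valid_types)
        self.seen_ids = set()   # constant-time duplicate lookup
        self.rows = []          # accepted transactions, in arrival order

    def validate(self, tx, now=None):
        """Return a list of error messages; an empty list means the entry is valid."""
        now = pd.Timestamp.now() if now is None else now
        errors = []
        if tx["transaction_id"] in self.seen_ids:
            errors.append("Duplicate transaction ID.")
        if tx["amount"] == 0:
            errors.append("Amount cannot be zero.")
        if tx["transaction_type"] not in self.valid_types:
            errors.append(f"Invalid transaction type: {tx['transaction_type']}")
        if tx["transaction_date"] > now:
            errors.append("Transaction date is in the future.")
        if tx["account_id"] not in self.valid_accounts:
            errors.append(f"Invalid account ID: {tx['account_id']}")
        return errors

    def add(self, tx, now=None):
        """Validate and store the transaction only if it passes every check."""
        errors = self.validate(tx, now)
        if not errors:
            self.seen_ids.add(tx["transaction_id"])
            self.rows.append(tx)
        return errors

    def to_frame(self):
        """Build a DataFrame once, from all accepted rows."""
        return pd.DataFrame(self.rows)

store = TransactionStore({"A001", "A002", "A003", "A004"})
now = pd.Timestamp("2025-06-01")  # fixed reference time for reproducibility

good = {"transaction_id": 1006, "account_id": "A003",
        "transaction_date": pd.Timestamp("2025-05-04"),
        "amount": 400.00, "transaction_type": "debit"}
bad = {"transaction_id": 1007, "account_id": "A005",
       "transaction_date": pd.Timestamp("2025-05-06"),
       "amount": 0, "transaction_type": "debit"}

first = store.add(good, now=now)    # accepted
second = store.add(good, now=now)   # same ID again, rejected
third = store.add(bad, now=now)     # zero amount and unknown account
print(first, second, third, sep="\n")
```

Accepted rows are collected in a plain list and converted with `to_frame()` only when a DataFrame is needed, which avoids copying the whole table on every insert.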

