### Finance – Ensuring Accurate Transactions

**Task 1**: Transaction Data Validation Insights

**Objective**: Maintain transaction integrity by detecting duplicate, invalid, and incomplete records.

**Steps**:
1. Choose a sample financial transaction dataset.
2. Identify common transaction issues such as duplicate entries or incorrect amounts.
3. Develop a list of validation checks specific to financial transactions.

In [1]:
import pandas as pd

# Sample financial transactions data
data = {
    'TransactionID': [101, 102, 103, 104, 102],  # Note duplicate TransactionID 102
    'AccountID': [2001, 2002, 2003, 2004, 2002],
    'TransactionDate': ['2025-05-20', '2025-05-21', '2025-05-21', '2025-05-22', '2025-05-21'],
    'Amount': [100.00, 250.50, -150.00, 300.00, 250.50],  # Negative amount might be invalid depending on context
    'TransactionType': ['Credit', 'Debit', 'Debit', 'Credit', 'Debit']
}

# Load into DataFrame
df = pd.DataFrame(data)

# Step 1: Identify duplicate transactions by TransactionID
duplicates = df[df.duplicated(subset=['TransactionID'], keep=False)]

# Step 2: Identify transactions with invalid amounts (e.g., negative amounts where not allowed)
# Assuming amounts should be positive for this example
invalid_amounts = df[df['Amount'] <= 0]

# Step 3: Define validation checks
def validate_transactions(df):
    issues = {}

    # Check for duplicate TransactionID
    dup = df[df.duplicated(subset=['TransactionID'], keep=False)]
    if not dup.empty:
        issues['DuplicateTransactionIDs'] = dup['TransactionID'].unique().tolist()

    # Check for invalid amounts (<=0)
    invalid_amt = df[df['Amount'] <= 0]
    if not invalid_amt.empty:
        issues['InvalidAmounts'] = invalid_amt.index.tolist()

    # Check for missing values in any column (every column is treated as required here)
    missing_fields = df.isnull().sum()
    missing_fields = missing_fields[missing_fields > 0]
    if not missing_fields.empty:
        issues['MissingFields'] = missing_fields.to_dict()

    return issues

# Run validations
validation_issues = validate_transactions(df)

print("Duplicate Transactions:\n", duplicates)
print("\nInvalid Amount Transactions:\n", invalid_amounts)
print("\nValidation Issues Summary:")
print(validation_issues)


Duplicate Transactions:
    TransactionID  AccountID TransactionDate  Amount TransactionType
1            102       2002      2025-05-21   250.5           Debit
4            102       2002      2025-05-21   250.5           Debit

Invalid Amount Transactions:
    TransactionID  AccountID TransactionDate  Amount TransactionType
2            103       2003      2025-05-21  -150.0           Debit

Validation Issues Summary:
{'DuplicateTransactionIDs': [102], 'InvalidAmounts': [2]}
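
The checks above cover duplicate IDs, non-positive amounts, and missing values. As a rough sketch of how the list of checks might be extended, the snippet below adds three more: unparseable dates, unknown transaction types, and rows that repeat an earlier row in every column. The function name `extra_checks`, the `ALLOWED_TYPES` set, and the `sample` data are assumptions made for illustration; they are not part of the original exercise, and the right rules depend on the system the data comes from.

```python
import pandas as pd

# Assumed set of valid transaction types (hypothetical, adjust to your domain)
ALLOWED_TYPES = {"Credit", "Debit"}

def extra_checks(df: pd.DataFrame) -> dict:
    """Return the row indexes that fail each additional check (illustrative only)."""
    issues = {}

    # Dates that do not match YYYY-MM-DD become NaT when errors="coerce"
    parsed = pd.to_datetime(df["TransactionDate"], format="%Y-%m-%d", errors="coerce")
    bad_dates = df[parsed.isna()].index.tolist()
    if bad_dates:
        issues["UnparseableDates"] = bad_dates

    # Transaction types outside the allowed set
    bad_types = df[~df["TransactionType"].isin(ALLOWED_TYPES)].index.tolist()
    if bad_types:
        issues["UnknownTransactionTypes"] = bad_types

    # Rows identical to an earlier row in every column (possible double submission)
    exact_dupes = df[df.duplicated(keep="first")].index.tolist()
    if exact_dupes:
        issues["ExactDuplicateRows"] = exact_dupes

    return issues

# Hypothetical sample: row 2 repeats row 1, row 3 has a bad date and an unknown type
sample = pd.DataFrame({
    "TransactionID": [101, 102, 102, 103],
    "TransactionDate": ["2025-05-20", "2025-05-21", "2025-05-21", "2025-13-01"],
    "Amount": [100.0, 250.5, 250.5, 75.0],
    "TransactionType": ["Credit", "Debit", "Debit", "Refund"],
})
extra_issues = extra_checks(sample)
print(extra_issues)
```

Reporting row indexes rather than IDs keeps the result usable even when the ID column itself is the problem, which matches how `InvalidAmounts` is reported above.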


**Task 2**: Implement Financial Data Validation

**Objective**: Automate validation so that each incoming transaction is checked before it is stored.

**Steps**:
1. Integrate data validation rules into your existing financial systems.
2. Run the checks in real time so that each record is validated on entry.

In [2]:
import pandas as pd

# Existing dataset (for reference, could be a database or file in practice)
existing_transactions = pd.DataFrame({
    'TransactionID': [101, 102, 103],
    'AccountID': [2001, 2002, 2003],
    'TransactionDate': ['2025-05-20', '2025-05-21', '2025-05-21'],
    'Amount': [100.00, 250.50, 150.00],
    'TransactionType': ['Credit', 'Debit', 'Debit']
})

# Validation function for incoming transaction data
def validate_transaction(new_txn, existing_df):
    errors = []

    # Check required fields first, so the checks below can safely read them
    required_fields = ['TransactionID', 'AccountID', 'TransactionDate', 'Amount', 'TransactionType']
    for field in required_fields:
        if field not in new_txn or new_txn[field] is None:
            errors.append(f"Missing or null field: {field}")
    if errors:
        # Stop here: a missing ID or amount would raise in the checks below
        return errors

    # Check for duplicate TransactionID
    if new_txn['TransactionID'] in existing_df['TransactionID'].values:
        errors.append(f"Duplicate TransactionID: {new_txn['TransactionID']}")

    # Check that amount is positive
    if new_txn['Amount'] <= 0:
        errors.append(f"Invalid Amount: {new_txn['Amount']} (must be positive)")

    return errors

# Example incoming transactions (simulating real-time entries)
incoming_transactions = [
    {'TransactionID': 104, 'AccountID': 2004, 'TransactionDate': '2025-05-22', 'Amount': 300.00, 'TransactionType': 'Credit'},
    {'TransactionID': 102, 'AccountID': 2002, 'TransactionDate': '2025-05-21', 'Amount': 250.50, 'TransactionType': 'Debit'},  # duplicate ID
    {'TransactionID': 105, 'AccountID': 2005, 'TransactionDate': '2025-05-23', 'Amount': -50.00, 'TransactionType': 'Debit'},   # invalid amount
]

# Process each incoming transaction with validation
for txn in incoming_transactions:
    errors = validate_transaction(txn, existing_transactions)
    if errors:
        print(f"Transaction {txn['TransactionID']} validation failed with errors:")
        for err in errors:
            print(f" - {err}")
    else:
        print(f"Transaction {txn['TransactionID']} passed validation.")
        # If valid, add to existing_transactions (simulate database insert)
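        # DataFrame.append was removed in pandas 2.0; use pd.concat instead
        existing_transactions = pd.concat(
            [existing_transactions, pd.DataFrame([txn])], ignore_index=True
        )

print("\nUpdated Transactions DataFrame:")
print(existing_transactions)


Transaction 104 passed validation.
Transaction 102 validation failed with errors:
 - Duplicate TransactionID: 102
Transaction 105 validation failed with errors:
 - Invalid Amount: -50.0 (must be positive)

Updated Transactions DataFrame:
   TransactionID  AccountID TransactionDate  Amount TransactionType
0            101       2001      2025-05-20   100.0          Credit
1            102       2002      2025-05-21   250.5           Debit
2            103       2003      2025-05-21   150.0           Debit
3            104       2004      2025-05-22   300.0          Credit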
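
Growing a DataFrame one row at a time copies the whole frame on every insert, and scanning the ID column for each new record is a linear search. One possible alternative, sketched below, keeps the known IDs in a Python set and collects accepted records in a list, building the DataFrame once at the end. The function `validate_and_collect`, the `known_ids` set, and the `batch` data are hypothetical names chosen for this example, and only the ID and amount checks are shown to keep it short.

```python
import pandas as pd

def validate_and_collect(incoming, known_ids):
    """Split incoming transactions into accepted and rejected lists.

    known_ids is a set of TransactionIDs already stored; it is updated in place
    so that duplicates inside the same batch are also caught.
    """
    accepted, rejected = [], []
    for txn in incoming:
        errors = []
        txn_id = txn.get("TransactionID")
        amount = txn.get("Amount")

        if txn_id is None:
            errors.append("Missing TransactionID")
        elif txn_id in known_ids:  # set membership, constant time on average
            errors.append(f"Duplicate TransactionID: {txn_id}")

        if amount is None:
            errors.append("Missing Amount")
        elif amount <= 0:
            errors.append(f"Invalid Amount: {amount} (must be positive)")

        if errors:
            rejected.append((txn, errors))
        else:
            known_ids.add(txn_id)
            accepted.append(txn)
    return accepted, rejected

# Hypothetical data for illustration
known_ids = {101, 102, 103}
batch = [
    {"TransactionID": 104, "Amount": 300.0},
    {"TransactionID": 102, "Amount": 250.5},  # duplicate of a stored ID
    {"TransactionID": 104, "Amount": 10.0},   # duplicate within the same batch
    {"TransactionID": 105, "Amount": -50.0},  # non-positive amount
]
accepted, rejected = validate_and_collect(batch, known_ids)

# Build the DataFrame once instead of concatenating inside the loop
accepted_df = pd.DataFrame(accepted)
print(accepted_df)
for txn, errs in rejected:
    print(txn["TransactionID"], errs)
```

In a real system the uniqueness guarantee would more likely come from a database constraint on the ID column; the in-memory set here only stands in for that.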