### Finance – Ensuring Accurate Transactions

**Task 1**: Transaction Data Validation Insights

**Objective**: Maintain transaction integrity.

**Steps**:
1. Choose a sample financial transaction dataset.
2. Identify common transaction issues like duplicate entries or incorrect amounts.
3. Develop a list of validation checks specific to financial transactions.

In [1]:
import pandas as pd

# Sample transaction data as a dictionary
data = {
    'Transaction_ID': ['TXN001', 'TXN002', 'TXN003', 'TXN003', 'TXN005'],
    'Date': ['2025-05-01', '2025-05-01', '2025-05-02', '2025-05-02', '2026-01-01'],
    'Sender_Account': ['1234567890', '1234567890', '2345678901', '2345678901', '3456789012'],
    'Receiver_Account': ['0987654321', '0987654321', '8765432109', '8765432109', '4567890123'],
    'Amount': [500.00, -100.00, 250.00, 250.00, 99.999],
    'Currency': ['USD', 'USD', 'USD', 'USD', 'usd'],
    'Status': ['Success', 'Failed', 'Pending', 'Pending', 'Succcess']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Save to CSV
df.to_csv('sample_transactions.csv', index=False)

print("✅ 'sample_transactions.csv' has been created.")


✅ 'sample_transactions.csv' has been created.


In [2]:
# Write your code from here
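import pandas as pd
from datetime import datetime

# Load the CSV file
# Note: read_csv infers the account columns as integers, so the leading zero in
# '0987654321' is dropped; pass dtype={'Sender_Account': str, 'Receiver_Account': str}
# to keep account numbers as text.
df = pd.read_csv('sample_transactions.csv')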

# Convert 'Date' to datetime format
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

print("Original Data:\n", df)

# --- Validation Checks ---

# 1. Duplicate Transaction_ID
duplicate_ids = df[df.duplicated('Transaction_ID', keep=False)]

# 2. Negative or Zero Amounts for 'Success' Status
invalid_amounts = df[(df['Amount'] <= 0) & (df['Status'] == 'Success')]

# 3. Invalid Currency Codes (expect uppercase ISO 3-letter codes)
valid_currencies = ['USD', 'EUR', 'INR']
invalid_currency = df[~df['Currency'].isin(valid_currencies)]

# 4. Missing Critical Fields
missing_values = df[df[['Transaction_ID', 'Amount', 'Sender_Account', 'Status']].isnull().any(axis=1)]

# 5. Invalid Sender Account Numbers (non-numeric or not 10 digits)
invalid_accounts = df[~df['Sender_Account'].astype(str).str.isnumeric() |
                      (df['Sender_Account'].astype(str).str.len() != 10)]

# 6. Invalid Status Values
valid_statuses = ['Success', 'Failed', 'Pending']
invalid_status = df[~df['Status'].isin(valid_statuses)]

# 7. Future Transaction Dates
future_dates = df[df['Date'] > pd.to_datetime(datetime.now())]

# 8. Duplicate Transactions (same sender, receiver, amount, and date)
duplicate_rows = df[df.duplicated(subset=['Sender_Account', 'Receiver_Account', 'Amount', 'Date'], keep=False)]

# 9. Amounts with too many decimal places
decimal_check = df[df['Amount'].apply(lambda x: round(x, 2) != x)]

# --- Display Issues ---

print("\n--- VALIDATION RESULTS ---")
print("\n[1] Duplicate Transaction IDs:\n", duplicate_ids)
print("\n[2] Invalid Amounts for 'Success':\n", invalid_amounts)
print("\n[3] Invalid Currency Codes:\n", invalid_currency)
print("\n[4] Missing Critical Fields:\n", missing_values)
print("\n[5] Invalid Account Numbers:\n", invalid_accounts)
print("\n[6] Invalid Status Values:\n", invalid_status)
print("\n[7] Future-dated Transactions:\n", future_dates)
print("\n[8] Duplicate Transactions (by details):\n", duplicate_rows)
print("\n[9] Amounts with Too Many Decimal Places:\n", decimal_check)


Original Data:
   Transaction_ID       Date  Sender_Account  Receiver_Account   Amount  \
0         TXN001 2025-05-01      1234567890         987654321  500.000   
1         TXN002 2025-05-01      1234567890         987654321 -100.000   
2         TXN003 2025-05-02      2345678901        8765432109  250.000   
3         TXN003 2025-05-02      2345678901        8765432109  250.000   
4         TXN005 2026-01-01      3456789012        4567890123   99.999   

  Currency    Status  
0      USD   Success  
1      USD    Failed  
2      USD   Pending  
3      USD   Pending  
4      usd  Succcess  

--- VALIDATION RESULTS ---

[1] Duplicate Transaction IDs:
   Transaction_ID       Date  Sender_Account  Receiver_Account  Amount  \
2         TXN003 2025-05-02      2345678901        8765432109   250.0   
3         TXN003 2025-05-02      2345678901        8765432109   250.0   

  Currency   Status  
2      USD  Pending  
3      USD  Pending  

[2] Invalid Amounts for 'Success':
 Empty DataFrame
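Check 9 above compares each float amount with its rounded value, which is enough for the sample data. When amounts can be read as text (for example with `pd.read_csv(..., dtype={'Amount': str})`, an assumption that the cell above does not make), a `Decimal`-based check counts decimal places directly and avoids binary floating-point representation. A minimal sketch, with `has_excess_decimals` as a hypothetical helper name:

```python
from decimal import Decimal, InvalidOperation


def has_excess_decimals(amount_text, max_places=2):
    """Return True if the amount has more than `max_places` significant decimal digits.

    Hypothetical helper: expects the amount as a string, e.g. "99.999".
    """
    try:
        # normalize() strips trailing zeros, so "250.00" counts as 0 decimal places
        value = Decimal(amount_text).normalize()
    except InvalidOperation:
        return True  # design choice: unparseable amounts are flagged as invalid
    exponent = value.as_tuple().exponent
    if not isinstance(exponent, int):  # NaN or Infinity carry a non-integer exponent
        return True
    return -exponent > max_places


# Amounts from the sample dataset, written as strings
amounts = ["500.00", "-100.00", "250.00", "250.00", "99.999"]
flagged = [a for a in amounts if has_excess_decimals(a)]
print(flagged)
```

With the sample amounts, only `99.999` is flagged, which matches the row that check 9 targets.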

**Task 2**: Implement Financial Data Validation

**Objective**: Use automated tools to ensure transaction accuracy.

**Steps**:
1. Integrate data validation rules into your existing financial systems.
2. Run the checks in real time so that data is validated upon entry.

In [3]:
# Write your code from here
import pandas as pd
from datetime import datetime

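def validate_transactions(df):
    """Run the nine validation checks and return a dict mapping each check name to the offending rows.

    Expects 'Date' to already be a datetime column (convert it with pd.to_datetime first).
    """
    issues = {}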

    # 1. Duplicate Transaction_ID
    issues['duplicate_ids'] = df[df.duplicated('Transaction_ID', keep=False)]

    # 2. Negative or Zero Amounts for 'Success'
    issues['invalid_amounts'] = df[(df['Amount'] <= 0) & (df['Status'] == 'Success')]

    # 3. Invalid Currency
    valid_currencies = ['USD', 'EUR', 'INR']
    issues['invalid_currency'] = df[~df['Currency'].isin(valid_currencies)]

    # 4. Missing Required Fields
    issues['missing_fields'] = df[df[['Transaction_ID', 'Amount', 'Sender_Account', 'Status']].isnull().any(axis=1)]

    # 5. Invalid Sender Account Format (non-numeric or not 10 digits)
    issues['invalid_accounts'] = df[~df['Sender_Account'].astype(str).str.isnumeric() |
                                    (df['Sender_Account'].astype(str).str.len() != 10)]

    # 6. Incorrect Status
    valid_statuses = ['Success', 'Failed', 'Pending']
    issues['invalid_status'] = df[~df['Status'].isin(valid_statuses)]

    # 7. Future Dates
    issues['future_dates'] = df[df['Date'] > pd.to_datetime(datetime.now())]

    # 8. Duplicate Transaction Details
    issues['duplicate_rows'] = df[df.duplicated(['Sender_Account', 'Receiver_Account', 'Amount', 'Date'], keep=False)]

    # 9. Too Many Decimals
    issues['decimal_precision'] = df[df['Amount'].apply(lambda x: round(x, 2) != x)]

    return issues
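
`validate_transactions` works on a whole DataFrame, so it covers batch validation. Step 2 also asks for checks at the point of entry. One way to sketch that, assuming each incoming transaction arrives as a plain dict keyed by the same column names, is a per-record validator. The names `validate_record` and `seen_ids` are hypothetical, and the duplicate-details check (check 8) is left out because it needs access to transaction history:

```python
import re
from datetime import date

# Allow-lists mirror the batch checks above; extend them for real data.
VALID_CURRENCIES = {"USD", "EUR", "INR"}
VALID_STATUSES = {"Success", "Failed", "Pending"}
ACCOUNT_PATTERN = re.compile(r"\d{10}")  # assumption: accounts are exactly 10 digits


def validate_record(record, seen_ids, today=None):
    """Return a list of problems found in one incoming transaction.

    Hypothetical helper: `record` is a dict keyed by the column names used above,
    `seen_ids` is the set of Transaction_IDs that were already accepted.
    """
    today = today or date.today()
    errors = []

    txn_id = record.get("Transaction_ID")
    if not txn_id:
        errors.append("missing Transaction_ID")
    elif txn_id in seen_ids:
        errors.append("duplicate Transaction_ID")

    if not ACCOUNT_PATTERN.fullmatch(str(record.get("Sender_Account", ""))):
        errors.append("invalid Sender_Account")

    amount = record.get("Amount")
    if amount is None:
        errors.append("missing Amount")
    else:
        if amount <= 0 and record.get("Status") == "Success":
            errors.append("non-positive Amount on a successful transaction")
        if round(amount, 2) != amount:
            errors.append("Amount has more than 2 decimal places")

    if record.get("Currency") not in VALID_CURRENCIES:
        errors.append("invalid Currency")
    if record.get("Status") not in VALID_STATUSES:
        errors.append("invalid Status")

    try:
        if date.fromisoformat(str(record.get("Date"))) > today:
            errors.append("future Date")
    except ValueError:
        errors.append("unparseable Date")

    return errors


# Usage: accept or reject a record as it is entered.
seen_ids = {"TXN001", "TXN002"}
incoming = {
    "Transaction_ID": "TXN005", "Date": "2026-01-01",
    "Sender_Account": "3456789012", "Receiver_Account": "4567890123",
    "Amount": 99.999, "Currency": "usd", "Status": "Succcess",
}
# `today` is pinned here only to make the example reproducible.
problems = validate_record(incoming, seen_ids, today=date(2025, 6, 1))
if problems:
    print("Rejected:", problems)
else:
    seen_ids.add(incoming["Transaction_ID"])
```

Returning a list of messages instead of raising on the first failure lets the entry form report every problem with a record at once.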

