### Finance – Ensuring Accurate Transactions

**Task 1**: Transaction Data Validation Insights

**Objective**: Maintain transaction integrity.

**Steps**:
1. Choose a sample financial transaction dataset.
2. Identify common transaction issues like duplicate entries or incorrect amounts.
3. Develop a list of validation checks specific to financial transactions.

In [1]:
# Write your code from here
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# 1. Create Sample Financial Transaction Data
data = {
    'transaction_id': ['T001', 'T002', 'T003', 'T003', 'T005', 'T006', 'T007', 'T008'],
    'account_id': ['A1001', 'A1002', 'A1001', 'A1001', 'A1003', 'A1002', 'A1004', 'A1004'],
    'amount': [250.0, 520.0, -100.0, -100.0, 100000.0, 0.0, 330.5, None],
    'transaction_date': [
        '2024-06-01', '2024-06-02', '2024-06-03', '2024-06-03',
        '2024-06-04', '2024-13-01', '2024-06-06', ''
    ]
}

df = pd.DataFrame(data)

# Convert transaction_date to datetime, handle errors
df['transaction_date'] = pd.to_datetime(df['transaction_date'], errors='coerce')

print("\n🔍 Original Transaction Data:")
print(df)

# 2. Validation Checks

# Check 1: Duplicate Transactions
duplicates = df[df.duplicated(subset=['transaction_id', 'amount', 'transaction_date'], keep=False)]
print("\n⚠️ Duplicate Transactions Found:")
print(duplicates)

# Check 2: Missing or Null Fields
missing_values = df[df.isnull().any(axis=1)]
print("\n⚠️ Transactions with Missing Fields:")
print(missing_values)

# Check 3: Negative or Zero Transaction Amounts
# (this sample has no 'type' column, so refunds cannot be excluded here)
invalid_amounts = df[df['amount'] <= 0]
print("\n⚠️ Invalid Transaction Amounts (≤ 0):")
print(invalid_amounts)

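# Check 4: Extremely High Amounts (Possible Outliers)
# With a sample this small, the 99th percentile falls just below the maximum,
# so the single largest amount is flagged regardless of its size.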
upper_limit = df['amount'].dropna().quantile(0.99)
outliers = df[df['amount'] > upper_limit]
print("\n⚠️ Potential Outlier Transactions (above 99th percentile):")
print(outliers)

# Check 5: Invalid Dates
invalid_dates = df[df['transaction_date'].isnull()]
print("\n⚠️ Invalid or Malformed Transaction Dates:")
print(invalid_dates)

# 3. Summary of Validation Checks
print("\n✅ Validation Summary:")
print(f"Total Transactions: {len(df)}")
print(f"Duplicate Entries: {len(duplicates)}")
print(f"Missing Fields: {len(missing_values)}")
print(f"Negative/Zero Amounts: {len(invalid_amounts)}")
print(f"Outliers (above 99th percentile): {len(outliers)}")
print(f"Invalid Dates: {len(invalid_dates)}")




🔍 Original Transaction Data:
  transaction_id account_id    amount transaction_date
0           T001      A1001     250.0       2024-06-01
1           T002      A1002     520.0       2024-06-02
2           T003      A1001    -100.0       2024-06-03
3           T003      A1001    -100.0       2024-06-03
4           T005      A1003  100000.0       2024-06-04
5           T006      A1002       0.0              NaT
6           T007      A1004     330.5       2024-06-06
7           T008      A1004       NaN              NaT

⚠️ Duplicate Transactions Found:
  transaction_id account_id  amount transaction_date
2           T003      A1001  -100.0       2024-06-03
3           T003      A1001  -100.0       2024-06-03

⚠️ Transactions with Missing Fields:
  transaction_id account_id  amount transaction_date
5           T006      A1002     0.0              NaT
7           T008      A1004     NaN              NaT

⚠️ Invalid Transaction Amounts (≤ 0):
  transaction_id account_id  amount transactio
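
Step 3 of Task 1 asks for a list of validation checks. One way to make that list explicit is a small registry that maps each check name to a function returning a boolean mask. The following is a minimal sketch, assuming the same column names as the sample data; `CHECKS` and `flag_transactions` are hypothetical names chosen for illustration, not part of pandas.

```python
import pandas as pd

# Hypothetical registry: check name -> function returning a boolean mask
# (True means the row fails that check).
CHECKS = {
    "duplicate": lambda df: df.duplicated(
        subset=["transaction_id", "amount", "transaction_date"], keep=False
    ),
    "missing_field": lambda df: df.isnull().any(axis=1),
    "non_positive_amount": lambda df: df["amount"] <= 0,
    "invalid_date": lambda df: df["transaction_date"].isnull(),
}


def flag_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Return one boolean column per registered check, indexed like the input."""
    return pd.DataFrame(
        {name: check(df) for name, check in CHECKS.items()}, index=df.index
    )


# Small illustrative sample (a subset of the rows used in this task)
sample = pd.DataFrame({
    "transaction_id": ["T001", "T003", "T003", "T006"],
    "account_id": ["A1001", "A1001", "A1001", "A1002"],
    "amount": [250.0, -100.0, -100.0, 0.0],
    "transaction_date": pd.to_datetime(
        ["2024-06-01", "2024-06-03", "2024-06-03", None]
    ),
})

flags = flag_transactions(sample)
print(flags.sum())                   # number of failing rows per check
print(sample[flags.any(axis=1)])     # rows failing at least one check
```

Keeping the checks in one dictionary means a new rule can be added by registering one more entry, and the per-check counts come from a single `flags.sum()` call rather than a separate variable per check.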

**Task 2**: Implement Financial Data Validation

**Objective**: Use automated tools to ensure transaction accuracy.

**Steps**:
1. Integrate data validation rules into your existing financial systems.
2. Run the checks in real time so that each record is validated on entry.

In [2]:
# Write your code from here

import pandas as pd
from datetime import datetime

# In-memory store for transactions
transaction_store = pd.DataFrame(columns=[
    'transaction_id', 'account_id', 'amount', 'transaction_date', 'type'
])

# Define validation rules
def validate_transaction(txn, existing_ids):
    errors = []

    # Rule 1: Unique Transaction ID
    if txn['transaction_id'] in existing_ids:
        errors.append("Duplicate transaction_id.")

    # Rule 2: Required fields must be present and non-empty.
    # Test against None and '' explicitly: a plain truthiness test would
    # also report a legitimate 0 amount as missing.
    for field in ['transaction_id', 'account_id', 'amount', 'transaction_date']:
        if txn.get(field) is None or txn.get(field) == '':
            errors.append(f"Missing required field: {field}")

    # Rule 3: Valid Date Format (TypeError covers a missing or non-string date)
    try:
        datetime.strptime(txn.get('transaction_date'), "%Y-%m-%d")
    except (ValueError, TypeError):
        errors.append("Invalid date format (should be YYYY-MM-DD).")

    # Rule 4: Amount must be > 0 unless it's a refund
    amount = txn.get('amount')
    if txn.get('type') != 'refund' and (amount is None or amount <= 0):
        errors.append("Amount must be > 0 unless it's a refund.")

    # Rule 5: Amount must be within a realistic range
    if amount is not None and abs(amount) > 1_000_000:
        errors.append("Amount exceeds maximum limit.")

    return errors


# Simulated real-time entry function
def submit_transaction(txn):
    global transaction_store

    existing_ids = set(transaction_store['transaction_id'])

    errors = validate_transaction(txn, existing_ids)

    if errors:
        print(f"❌ Transaction Rejected: {txn['transaction_id']}")
        for err in errors:
            print(f" - {err}")
    else:
        # Parse the date on a copy so the caller's dict is not mutated
        row = pd.DataFrame([{**txn, 'transaction_date': pd.to_datetime(txn['transaction_date'])}])
        # Concatenating onto an empty frame raises a pandas FutureWarning,
        # so the first accepted row replaces the empty store instead
        if transaction_store.empty:
            transaction_store = row
        else:
            transaction_store = pd.concat([transaction_store, row], ignore_index=True)
        print(f"✅ Transaction Accepted: {txn['transaction_id']}")


# ----------------------------
# Test Transactions (Simulating real-time entry)
# ----------------------------
transactions = [
    {'transaction_id': 'T001', 'account_id': 'A1001', 'amount': 250.00, 'transaction_date': '2024-06-01', 'type': 'purchase'},
    {'transaction_id': 'T002', 'account_id': 'A1002', 'amount': -50.00, 'transaction_date': '2024-06-02', 'type': 'refund'},
    {'transaction_id': 'T001', 'account_id': 'A1001', 'amount': 250.00, 'transaction_date': '2024-06-01', 'type': 'purchase'},  # Duplicate
    {'transaction_id': 'T003', 'account_id': 'A1003', 'amount': 0.00, 'transaction_date': '2024-06-03', 'type': 'purchase'},   # Invalid amount
    {'transaction_id': 'T004', 'account_id': 'A1004', 'amount': 9999999.99, 'transaction_date': '2024-06-04', 'type': 'purchase'},  # Outlier
    {'transaction_id': 'T005', 'account_id': '', 'amount': 300.00, 'transaction_date': 'invalid-date', 'type': 'purchase'},  # Missing account, bad date
    {'transaction_id': 'T006', 'account_id': 'A1006', 'amount': 100.00, 'transaction_date': '2024-06-06', 'type': 'purchase'}  # Valid
]

# Simulate transaction entry
for txn in transactions:
    submit_transaction(txn)

# View final valid transactions
print("\n✅ Final Valid Transactions Stored:")
print(transaction_store)


✅ Transaction Accepted: T001
✅ Transaction Accepted: T002
❌ Transaction Rejected: T001
 - Duplicate transaction_id.
❌ Transaction Rejected: T003
 - Amount must be > 0 unless it's a refund.
❌ Transaction Rejected: T004
 - Amount exceeds maximum limit.
❌ Transaction Rejected: T005
 - Missing required field: account_id
 - Invalid date format (should be YYYY-MM-DD).
✅ Transaction Accepted: T006

✅ Final Valid Transactions Stored:
  transaction_id account_id  amount transaction_date      type
0           T001      A1001   250.0       2024-06-01  purchase
1           T002      A1002   -50.0       2024-06-02    refund
2           T006      A1006   100.0       2024-06-06  purchase
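
Step 1 of Task 2 asks for the rules to be integrated into existing systems. One possible way to do that without rewriting the system's write path is to wrap the existing write function in a decorator that runs the rules first. This is only a sketch under stated assumptions: `validated`, `basic_rules`, `ValidationError`, `post_to_ledger`, and the `ledger` list are all hypothetical stand-ins, and `basic_rules` is a reduced rule set rather than the full validator used in this task.

```python
from datetime import datetime
from functools import wraps


class ValidationError(ValueError):
    """Raised when a transaction fails one or more rules (hypothetical)."""


def basic_rules(txn):
    """Reduced, illustrative rule set; returns a list of error messages."""
    errors = []
    for field in ("transaction_id", "account_id", "amount", "transaction_date"):
        if txn.get(field) is None or txn.get(field) == "":
            errors.append(f"Missing required field: {field}")
    try:
        datetime.strptime(txn.get("transaction_date"), "%Y-%m-%d")
    except (ValueError, TypeError):
        errors.append("Invalid date format (should be YYYY-MM-DD).")
    amount = txn.get("amount")
    if txn.get("type") != "refund" and (amount is None or amount <= 0):
        errors.append("Amount must be > 0 unless it's a refund.")
    return errors


def validated(validator):
    """Decorator factory: run `validator` before the wrapped write function."""
    def decorator(write_fn):
        @wraps(write_fn)
        def wrapper(txn, *args, **kwargs):
            errors = validator(txn)
            if errors:
                # Reject before anything reaches the underlying system
                raise ValidationError("; ".join(errors))
            return write_fn(txn, *args, **kwargs)
        return wrapper
    return decorator


ledger = []  # stand-in for an existing system's storage (assumption)


@validated(basic_rules)
def post_to_ledger(txn):
    """Stand-in for an existing write function."""
    ledger.append(txn)
    return txn["transaction_id"]


post_to_ledger({"transaction_id": "T001", "account_id": "A1001",
                "amount": 250.0, "transaction_date": "2024-06-01",
                "type": "purchase"})

try:
    post_to_ledger({"transaction_id": "T003", "account_id": "A1003",
                    "amount": 0.0, "transaction_date": "2024-06-03",
                    "type": "purchase"})
except ValidationError as exc:
    print(f"rejected: {exc}")
```

Raising an exception instead of printing lets the calling system decide how to report a rejection, and the decorator leaves the body of the wrapped function unchanged.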


