[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Digital-AI-Finance/Digital-Finance-Introduction/blob/main/day_02/notebooks/NB02_Payment_Analysis.ipynb)

# NB02: Payment Transaction Analysis

**Topic:** 2.1 - Platform Finance and Payment Systems

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Analyze Payment Data**: Generate and explore synthetic payment transaction data
2. **Understand Payment Flows**: Identify patterns in payment volumes and frequencies
3. **Calculate Payment Metrics**: Compute key metrics like settlement times, volumes, and fees
4. **Visualize Payment Networks**: Use network graphs to understand payment relationships
5. **Compare Payment Methods**: Analyze cost structures across different payment types

## 🐍 Before You Start: Python Basics for Day 2

**New to Python?** Don't worry! We've added a comprehensive Python primer in Day 1's NB01 notebook that covers everything you need to get started.

### Key Concepts You'll See Today

In this notebook, we use three powerful Python libraries:

- **`pandas`** - A library for working with data tables (think Excel, but in Python). We use it to organize and analyze transaction data.
- **`networkx`** - A library for analyzing networks and connections. Perfect for visualizing who pays whom.
- **`matplotlib`** - A library for creating charts and visualizations (bar charts, line plots, network graphs).

### Remember

You **don't need to write code** from scratch! Just:
- ✅ **Read** the explanations
- ✅ **Run** the cells (click the play button or press Shift+Enter)
- ✅ **Modify** values to explore different scenarios
- ✅ **Observe** how the results change

Let's dive in!

## Section 1: Setup

We'll use standard data science libraries for analysis and visualization.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
from datetime import datetime, timedelta
import random

# Set random seed for reproducibility
np.random.seed(42)
random.seed(42)

# Configure matplotlib
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

print("Libraries loaded successfully!")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

### Understanding the Imports

Let's break down what each import statement does:

- `import pandas as pd` → **pandas** helps us work with data tables; we call it '**pd**' for short
- `import networkx as nx` → **networkx** lets us analyze networks (who's connected to whom); we call it '**nx**'
- `import matplotlib.pyplot as plt` → **matplotlib** creates charts; '**plt**' is shorthand
- `import numpy as np` → **numpy** provides mathematical functions and random number generation

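The lines `np.random.seed(42)` and `random.seed(42)` fix the starting point of NumPy's and Python's random number generators, so we get the same "random" data every time we run the notebook from the top (useful for reproducibility).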
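
To see what a seed does, the short sketch below draws three numbers, resets the seed, and draws again. It uses only Python's built-in `random` module, and the helper name `draw_three` is made up for this illustration.

```python
import random

def draw_three(seed):
    """Return three pseudo-random integers after seeding the generator."""
    random.seed(seed)
    return [random.randint(1, 100) for _ in range(3)]

first = draw_three(42)
second = draw_three(42)   # same seed, so the same sequence comes back
other = draw_three(7)     # a different seed starts a different sequence

print(first == second)    # True: the draws are reproducible
print(first, other)
```

The same idea applies to `np.random.seed`, which controls NumPy's generator separately from Python's.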

## Section 2: Generate Synthetic Payment Data

In this section, we'll create a realistic dataset of payment transactions. This simulates data you might encounter when analyzing a payment platform or financial institution.

### Payment Methods in Modern Finance

Modern payment systems include:
- **Wire Transfer**: Traditional bank-to-bank transfers (higher fees, slower)
- **ACH (Automated Clearing House)**: Electronic batch processing (lower fees, 1-3 days)
- **Credit Card**: Near-instant authorization, with merchant fees (typically 1.5-3.5%)
- **Debit Card**: Direct bank account access (lower fees than credit)
- **Digital Wallet**: Services like PayPal, Venmo (variable fees)
- **Cryptocurrency**: Blockchain-based transfers (variable fees and times)

### Pandas Basics

Common operations:
- `.head()` - first 5 rows
- `.describe()` - statistics
- `.groupby()` - group by category
- `.agg()` - aggregate stats
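
As a quick illustration of these four operations, here is a minimal sketch on a tiny made-up table. The name `toy` and its values are assumptions for the example only; the notebook applies the same calls to the full payment dataset.

```python
import pandas as pd

# Tiny made-up table: four payments across two methods
toy = pd.DataFrame({
    "payment_method": ["ACH", "ACH", "Credit Card", "Credit Card"],
    "amount": [100.0, 300.0, 50.0, 150.0],
})

print(toy.head(2))                # first 2 rows (the default is 5)
print(toy["amount"].describe())   # count, mean, std, min, quartiles, max

# Group rows by category, then compute several aggregates per group
summary = toy.groupby("payment_method")["amount"].agg(["count", "sum", "mean"])
print(summary)
```

Chaining `.groupby()` with `.agg()` like this is the pattern used for most of the summary tables later in the notebook.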


In [None]:
def generate_payment_data(n_transactions=5000, n_parties=50, start_date='2024-01-01', days=90):
    """
    Generate synthetic payment transaction data.
    
    Args:
        n_transactions: Number of transactions to generate
        n_parties: Number of unique parties (senders/receivers)
        start_date: Start date for transactions
        days: Number of days to span
    
    Returns:
        DataFrame with payment transactions
    """
    
    # Define payment methods with their characteristics
    payment_methods = {
        'Wire Transfer': {'fee_pct': 0.005, 'fee_fixed': 25.0, 'settlement_days': (1, 3)},
        'ACH': {'fee_pct': 0.001, 'fee_fixed': 0.5, 'settlement_days': (1, 3)},
        'Credit Card': {'fee_pct': 0.025, 'fee_fixed': 0.30, 'settlement_days': (0, 1)},
        'Debit Card': {'fee_pct': 0.015, 'fee_fixed': 0.25, 'settlement_days': (0, 1)},
        'Digital Wallet': {'fee_pct': 0.029, 'fee_fixed': 0.30, 'settlement_days': (0, 0)},
        'Cryptocurrency': {'fee_pct': 0.01, 'fee_fixed': 2.0, 'settlement_days': (0, 1)}
    }
    
    # Generate party names (businesses and individuals)
    business_types = ['Corp', 'LLC', 'Inc', 'Ltd', 'GmbH']
    first_names = ['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon', 'Zeta', 'Eta', 'Theta', 'Iota', 'Kappa']
    industries = ['Tech', 'Finance', 'Retail', 'Health', 'Energy', 'Media', 'Food', 'Auto', 'Travel', 'Education']
    
    parties = []
    for i in range(n_parties):
        if i < n_parties * 0.7:  # 70% businesses
            name = f"{random.choice(first_names)} {random.choice(industries)} {random.choice(business_types)}"
        else:  # 30% individuals
            name = f"Individual_{i+1:03d}"
        parties.append(name)
    
    # Generate transactions
    transactions = []
    base_date = datetime.strptime(start_date, '%Y-%m-%d')
    
    for i in range(n_transactions):
        # Random timestamp: a whole-day offset plus a time within business hours (08:00-18:00)
        random_days = random.randint(0, days - 1)
        random_hours = random.uniform(8, 18)
        timestamp = base_date + timedelta(days=random_days, hours=random_hours)
        
        # Select sender and receiver (different parties)
        sender = random.choice(parties)
        receiver = random.choice([p for p in parties if p != sender])
        
        # Payment method (weighted - more common methods more likely)
        method = random.choices(
            list(payment_methods.keys()),
            weights=[0.1, 0.25, 0.25, 0.2, 0.15, 0.05]  # Weights for each method
        )[0]
        
        # Amount (log-normal distribution for realistic spread)
        amount = np.random.lognormal(mean=6, sigma=1.5)  # Median around $400
        amount = min(max(amount, 10), 100000)  # Clip to reasonable range
        
        # Calculate fee
        method_info = payment_methods[method]
        fee = amount * method_info['fee_pct'] + method_info['fee_fixed']
        
        # Settlement time
        settlement_min, settlement_max = method_info['settlement_days']
        settlement_days = random.uniform(settlement_min, settlement_max)
        settlement_date = timestamp + timedelta(days=settlement_days)
        
        # Transaction status
        status = random.choices(
            ['Completed', 'Pending', 'Failed'],
            weights=[0.92, 0.05, 0.03]
        )[0]
        
        transactions.append({
            'transaction_id': f'TXN{i+1:06d}',
            'timestamp': timestamp,
            'sender': sender,
            'receiver': receiver,
            'amount': round(amount, 2),
            'fee': round(fee, 2),
            'payment_method': method,
            'settlement_date': settlement_date,
            'settlement_hours': round(settlement_days * 24, 1),
            'status': status
        })
    
    df = pd.DataFrame(transactions)
    df = df.sort_values('timestamp').reset_index(drop=True)
    
    return df

# Generate the dataset
payments_df = generate_payment_data(n_transactions=5000, n_parties=50, days=90)

print("Payment Dataset Generated")
print("=" * 50)
print(f"Total transactions: {len(payments_df):,}")
print(f"Date range: {payments_df['timestamp'].min().date()} to {payments_df['timestamp'].max().date()}")
print(f"Unique senders: {payments_df['sender'].nunique()}")
print(f"Unique receivers: {payments_df['receiver'].nunique()}")
print(f"\nSample transactions:")
payments_df.head()

In [None]:
# Basic statistics
print("Transaction Amount Statistics")
print("=" * 50)
print(payments_df['amount'].describe())

print("\nPayment Methods Distribution")
print("=" * 50)
print(payments_df['payment_method'].value_counts())

print("\nTransaction Status Distribution")
print("=" * 50)
print(payments_df['status'].value_counts())

## Section 3: Payment Volume Analysis

Understanding payment volumes and patterns is crucial for:
- **Capacity Planning**: Ensuring systems can handle peak loads
- **Fraud Detection**: Identifying unusual patterns
- **Business Intelligence**: Understanding customer behavior
- **Revenue Forecasting**: Predicting fee income

In [None]:
# Filter to completed transactions for volume analysis
completed_df = payments_df[payments_df['status'] == 'Completed'].copy()
print(f"Analyzing {len(completed_df):,} completed transactions")

# Add date components for aggregation
completed_df['date'] = completed_df['timestamp'].dt.date
completed_df['day_of_week'] = completed_df['timestamp'].dt.day_name()
completed_df['hour'] = completed_df['timestamp'].dt.hour
completed_df['week'] = completed_df['timestamp'].dt.isocalendar().week

In [None]:
# Daily transaction volume and value
daily_stats = completed_df.groupby('date').agg({
    'amount': ['count', 'sum', 'mean'],
    'fee': 'sum'
}).round(2)
daily_stats.columns = ['transaction_count', 'total_volume', 'avg_amount', 'total_fees']
daily_stats = daily_stats.reset_index()

# Plot daily patterns
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Daily transaction count
axes[0, 0].plot(daily_stats['date'], daily_stats['transaction_count'], 
                color='steelblue', linewidth=1.5)
axes[0, 0].fill_between(daily_stats['date'], daily_stats['transaction_count'], 
                        alpha=0.3, color='steelblue')
axes[0, 0].set_title('Daily Transaction Count', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Date')
axes[0, 0].set_ylabel('Number of Transactions')
axes[0, 0].tick_params(axis='x', rotation=45)

# Daily volume
axes[0, 1].plot(daily_stats['date'], daily_stats['total_volume']/1000, 
                color='darkgreen', linewidth=1.5)
axes[0, 1].fill_between(daily_stats['date'], daily_stats['total_volume']/1000, 
                        alpha=0.3, color='darkgreen')
axes[0, 1].set_title('Daily Transaction Volume', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Date')
axes[0, 1].set_ylabel('Volume ($K)')
axes[0, 1].tick_params(axis='x', rotation=45)

# Day of week patterns
dow_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dow_stats = completed_df.groupby('day_of_week')['amount'].agg(['count', 'sum']).reindex(dow_order)
axes[1, 0].bar(dow_stats.index, dow_stats['count'], color='coral', edgecolor='darkred')
axes[1, 0].set_title('Transactions by Day of Week', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Day of Week')
axes[1, 0].set_ylabel('Number of Transactions')
axes[1, 0].tick_params(axis='x', rotation=45)

# Hourly patterns
hourly_stats = completed_df.groupby('hour')['amount'].count()
axes[1, 1].bar(hourly_stats.index, hourly_stats.values, color='mediumpurple', edgecolor='indigo')
axes[1, 1].set_title('Transactions by Hour of Day', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Hour')
axes[1, 1].set_ylabel('Number of Transactions')

plt.tight_layout()
plt.show()

print("\nKey Observations:")
print(f"- Average daily transactions: {daily_stats['transaction_count'].mean():.0f}")
print(f"- Average daily volume: ${daily_stats['total_volume'].mean():,.0f}")
print(f"- Peak day of week: {dow_stats['count'].idxmax()}")
print(f"- Peak hour: {hourly_stats.idxmax()}:00")

In [None]:
# Volume analysis by payment method
method_stats = completed_df.groupby('payment_method').agg({
    'amount': ['count', 'sum', 'mean'],
    'fee': ['sum', 'mean']
}).round(2)
method_stats.columns = ['count', 'total_volume', 'avg_amount', 'total_fees', 'avg_fee']
method_stats['fee_rate'] = (method_stats['total_fees'] / method_stats['total_volume'] * 100).round(3)
method_stats = method_stats.sort_values('total_volume', ascending=False)

print("Payment Method Statistics")
print("=" * 80)
print(method_stats.to_string())

# Visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

colors = plt.cm.Set2(range(len(method_stats)))

# Transaction count by method
axes[0].pie(method_stats['count'], labels=method_stats.index, autopct='%1.1f%%', 
            colors=colors, startangle=90)
axes[0].set_title('Transaction Count Share', fontsize=12, fontweight='bold')

# Volume by method
axes[1].pie(method_stats['total_volume'], labels=method_stats.index, autopct='%1.1f%%', 
            colors=colors, startangle=90)
axes[1].set_title('Volume Share', fontsize=12, fontweight='bold')

# Average transaction size
axes[2].barh(method_stats.index, method_stats['avg_amount'], color=colors, edgecolor='gray')
axes[2].set_title('Average Transaction Size', fontsize=12, fontweight='bold')
axes[2].set_xlabel('Amount ($)')

plt.tight_layout()
plt.show()

## Section 4: Settlement Time Analysis

**Settlement time** is the period between when a payment is initiated and when funds are actually transferred. This is critical for:

- **Liquidity Management**: Understanding when funds will be available
- **Working Capital**: Businesses need to manage cash flow
- **Customer Experience**: Faster settlements improve satisfaction
- **Regulatory Compliance**: Some regulations require specific settlement windows
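
To make the definition concrete, the sketch below computes the settlement time for a single made-up payment using Python's standard `datetime` module. The two timestamps are illustrative assumptions, not values from the dataset.

```python
from datetime import datetime

# Made-up example: payment initiated Tuesday 09:00, funds arrive Wednesday 15:00
initiated = datetime(2024, 1, 2, 9, 0)
settled = datetime(2024, 1, 3, 15, 0)

elapsed = settled - initiated                      # a timedelta object
settlement_hours = elapsed.total_seconds() / 3600  # seconds -> hours
settlement_days = settlement_hours / 24

print(f"{settlement_hours:.1f} hours ({settlement_days:.2f} days)")
```

The dataset's `settlement_hours` column holds the same quantity for every transaction.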

In [None]:
# Settlement time analysis
settlement_stats = completed_df.groupby('payment_method')['settlement_hours'].agg([
    'count', 'mean', 'median', 'std', 'min', 'max'
]).round(2)
settlement_stats.columns = ['count', 'mean_hours', 'median_hours', 'std_hours', 'min_hours', 'max_hours']
settlement_stats = settlement_stats.sort_values('mean_hours')

print("Settlement Time Statistics (in hours)")
print("=" * 80)
print(settlement_stats.to_string())

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Box plot of settlement times
settlement_data = [completed_df[completed_df['payment_method'] == method]['settlement_hours'].values 
                   for method in settlement_stats.index]
bp = axes[0].boxplot(settlement_data, labels=settlement_stats.index, patch_artist=True)
colors = plt.cm.viridis(np.linspace(0, 1, len(settlement_stats)))
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
axes[0].set_title('Settlement Time Distribution by Payment Method', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Payment Method')
axes[0].set_ylabel('Settlement Time (hours)')
axes[0].tick_params(axis='x', rotation=45)

# Bar chart of mean settlement times
bars = axes[1].bar(settlement_stats.index, settlement_stats['mean_hours'], 
                   color=colors, edgecolor='black')
axes[1].axhline(y=24, color='red', linestyle='--', label='24 hours (1 day)')
axes[1].axhline(y=48, color='orange', linestyle='--', label='48 hours (2 days)')
axes[1].set_title('Mean Settlement Time by Payment Method', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Payment Method')
axes[1].set_ylabel('Mean Settlement Time (hours)')
axes[1].tick_params(axis='x', rotation=45)
axes[1].legend()

plt.tight_layout()
plt.show()

In [None]:
# Calculate settlement time percentiles for SLA analysis
print("Settlement Time Percentiles (hours)")
print("=" * 60)

percentiles = [50, 75, 90, 95, 99]
for method in settlement_stats.index:
    data = completed_df[completed_df['payment_method'] == method]['settlement_hours']
    print(f"\n{method}:")
    for p in percentiles:
        val = np.percentile(data, p)
        print(f"  P{p}: {val:.1f} hours ({val/24:.2f} days)")

### Graph Theory Primer

- **Directed Graph** - connections with direction (money flows sender to receiver)
- **Node** - a point (person or business)
- **Edge** - a connection (payment relationship)
- **In-degree** - connections coming IN (people who send you money)
- **Out-degree** - connections going OUT (people you pay)
- **Centrality** - how important/well-connected a node is
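
These definitions can be checked by hand on a very small graph. The sketch below uses only the standard library and four made-up parties. The centrality shown is plain degree centrality (in-degree plus out-degree, divided by the number of other nodes), which is only one of several ways to measure importance.

```python
from collections import Counter

# Made-up payments as (sender, receiver) pairs; each pair is one directed edge
edges = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"), ("Dave", "Carol")]

nodes = sorted({party for edge in edges for party in edge})
out_degree = Counter(sender for sender, _ in edges)     # edges leaving each node
in_degree = Counter(receiver for _, receiver in edges)  # edges arriving at each node

# Degree centrality: (in-degree + out-degree) / (number of other nodes)
centrality = {
    n: (in_degree[n] + out_degree[n]) / (len(nodes) - 1) for n in nodes
}

for n in nodes:
    print(f"{n}: in={in_degree[n]}, out={out_degree[n]}, centrality={centrality[n]:.2f}")
```

Carol receives from everyone and pays no one, so she has the highest in-degree and the highest degree centrality. NetworkX reports the same quantities through `G.in_degree()`, `G.out_degree()`, and `nx.degree_centrality(G)`.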


## Section 5: Payment Network Visualization

Payment networks reveal the relationships between parties. Network analysis helps:

- **Identify Key Players**: Who are the major hubs in the network?
- **Detect Fraud**: Unusual network patterns may indicate suspicious activity
- **Understand Money Flow**: Where does money concentrate?
- **Risk Assessment**: Identify systemic risks from interconnections

In [None]:
# Build payment network
def build_payment_network(df, min_transactions=5):
    """
    Build a directed graph from payment transactions.
    
    Args:
        df: DataFrame with sender, receiver, amount columns
        min_transactions: Minimum transactions to include an edge
    
    Returns:
        Tuple of (NetworkX DiGraph, DataFrame of aggregated edges)
    """
    # Aggregate transactions by sender-receiver pairs
    edge_data = df.groupby(['sender', 'receiver']).agg({
        'amount': ['count', 'sum']
    }).reset_index()
    edge_data.columns = ['sender', 'receiver', 'transaction_count', 'total_amount']
    
    # Filter by minimum transactions
    edge_data = edge_data[edge_data['transaction_count'] >= min_transactions]
    
    # Create directed graph
    G = nx.DiGraph()
    
    for _, row in edge_data.iterrows():
        G.add_edge(row['sender'], row['receiver'], 
                   weight=row['transaction_count'],
                   amount=row['total_amount'])
    
    return G, edge_data

# Build network from completed transactions
G, edge_data = build_payment_network(completed_df, min_transactions=3)

print("Payment Network Statistics")
print("=" * 50)
print(f"Number of nodes (parties): {G.number_of_nodes()}")
print(f"Number of edges (relationships): {G.number_of_edges()}")
print(f"Network density: {nx.density(G):.4f}")

# Calculate node metrics
in_degree = dict(G.in_degree(weight='weight'))
out_degree = dict(G.out_degree(weight='weight'))
in_value = dict(G.in_degree(weight='amount'))
out_value = dict(G.out_degree(weight='amount'))

In [None]:
# Analyze key network players
node_metrics = pd.DataFrame({
    'party': list(G.nodes()),
    'incoming_txns': [in_degree.get(n, 0) for n in G.nodes()],
    'outgoing_txns': [out_degree.get(n, 0) for n in G.nodes()],
    'incoming_value': [in_value.get(n, 0) for n in G.nodes()],
    'outgoing_value': [out_value.get(n, 0) for n in G.nodes()]
})
node_metrics['net_flow'] = node_metrics['incoming_value'] - node_metrics['outgoing_value']
node_metrics['total_activity'] = node_metrics['incoming_txns'] + node_metrics['outgoing_txns']

print("Top 10 Parties by Total Activity (transactions sent + received)")
print("=" * 70)
print(node_metrics.nlargest(10, 'total_activity').to_string(index=False))

print("\nTop 5 Net Receivers (more money in than out)")
print("=" * 70)
print(node_metrics.nlargest(5, 'net_flow')[['party', 'net_flow', 'incoming_value', 'outgoing_value']].to_string(index=False))

print("\nTop 5 Net Senders (more money out than in)")
print("=" * 70)
print(node_metrics.nsmallest(5, 'net_flow')[['party', 'net_flow', 'incoming_value', 'outgoing_value']].to_string(index=False))

In [None]:
# Visualize the payment network
fig, axes = plt.subplots(1, 2, figsize=(16, 8))

# Use a subset for clearer visualization (top nodes by activity)
top_parties = node_metrics.nlargest(20, 'total_activity')['party'].tolist()
G_sub = G.subgraph(top_parties)

# Calculate layout
pos = nx.spring_layout(G_sub, k=2, iterations=50, seed=42)

# Node sizes based on total activity
node_sizes = [node_metrics[node_metrics['party'] == n]['total_activity'].values[0] * 3 + 100 
              for n in G_sub.nodes()]

# Node colors based on net flow
net_flows = [node_metrics[node_metrics['party'] == n]['net_flow'].values[0] for n in G_sub.nodes()]

# Edge widths based on transaction count
edge_weights = [G_sub[u][v]['weight'] * 0.5 for u, v in G_sub.edges()]

# Plot 1: Network with net flow coloring
nodes = nx.draw_networkx_nodes(G_sub, pos, ax=axes[0], node_size=node_sizes, 
                                node_color=net_flows, cmap='RdYlGn', alpha=0.8)
nx.draw_networkx_edges(G_sub, pos, ax=axes[0], width=edge_weights, 
                       alpha=0.5, edge_color='gray', arrows=True, arrowsize=10)
nx.draw_networkx_labels(G_sub, pos, ax=axes[0], font_size=7)
axes[0].set_title('Payment Network (Top 20 Parties)\nColor: Net Flow (Green=Receiver, Red=Sender)', 
                  fontsize=12, fontweight='bold')
axes[0].axis('off')
plt.colorbar(nodes, ax=axes[0], label='Net Flow ($)')

# Plot 2: Degree distribution
degrees = [d for n, d in G.degree()]
axes[1].hist(degrees, bins=20, color='steelblue', edgecolor='black', alpha=0.7)
axes[1].axvline(np.mean(degrees), color='red', linestyle='--', label=f'Mean: {np.mean(degrees):.1f}')
axes[1].set_title('Degree Distribution', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Degree (number of connections)')
axes[1].set_ylabel('Number of Parties')
axes[1].legend()

plt.tight_layout()
plt.show()

In [None]:
# Calculate centrality measures
print("Network Centrality Analysis")
print("=" * 60)

# Different centrality measures
degree_centrality = nx.degree_centrality(G)
betweenness_centrality = nx.betweenness_centrality(G)
pagerank = nx.pagerank(G)

centrality_df = pd.DataFrame({
    'party': list(G.nodes()),
    'degree_centrality': [degree_centrality[n] for n in G.nodes()],
    'betweenness_centrality': [betweenness_centrality[n] for n in G.nodes()],
    'pagerank': [pagerank[n] for n in G.nodes()]
})

print("\nTop 10 by Degree Centrality (most connections):")
print(centrality_df.nlargest(10, 'degree_centrality')[['party', 'degree_centrality']].to_string(index=False))

print("\nTop 10 by Betweenness Centrality (bridges between groups):")
print(centrality_df.nlargest(10, 'betweenness_centrality')[['party', 'betweenness_centrality']].to_string(index=False))

print("\nTop 10 by PageRank (important based on who connects to them):")
print(centrality_df.nlargest(10, 'pagerank')[['party', 'pagerank']].to_string(index=False))

### Understanding Basis Points

Fees are often in basis points (bps):
- **1 basis point = 0.01%**
- **100 basis points = 1%**
- **150 basis points = 1.5% = 0.015 decimal**

Example: 250 bps = 2.5%, so the fee on $1,000 is $1,000 × 0.025 = $25.
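
The conversion is simple enough to wrap in two small helpers. The function names `bps_to_decimal` and `fee_from_bps` are made up for this sketch.

```python
def bps_to_decimal(bps):
    """Convert basis points to a decimal rate (1 bp = 0.0001)."""
    return bps / 10_000

def fee_from_bps(amount, bps):
    """Fee charged on `amount` at a rate quoted in basis points."""
    return amount * bps_to_decimal(bps)

print(bps_to_decimal(150))                 # 150 bps as a decimal rate
print(f"${fee_from_bps(1_000, 250):.2f}")  # fee on $1,000 at 250 bps
```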


## Section 6: Fee Analysis Across Payment Methods

Understanding fee structures is critical for:

- **Cost Optimization**: Choosing the right payment method for each transaction
- **Pricing Strategy**: Setting competitive but profitable fees
- **Customer Segmentation**: Understanding price sensitivity
- **Revenue Projection**: Forecasting fee income
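
Every method in this notebook is priced as a percentage of the amount plus a fixed fee, so the effective rate falls as the amount grows and two methods can swap places at a breakeven amount. The sketch below illustrates both effects using the Debit Card and ACH parameters from Section 2. The helper names `fee` and `breakeven_amount` are assumptions for this example.

```python
def fee(amount, pct, fixed):
    """Percentage-plus-fixed fee, the same shape used in this notebook."""
    return amount * pct + fixed

def breakeven_amount(pct_a, fixed_a, pct_b, fixed_b):
    """Amount at which two fee schedules cost the same, or None if they never cross."""
    if pct_a == pct_b:
        return None
    amount = (fixed_b - fixed_a) / (pct_a - pct_b)
    return amount if amount > 0 else None

# (percentage, fixed fee) pairs from the Section 2 payment_methods dictionary
debit = (0.015, 0.25)
ach = (0.001, 0.50)

# The fixed fee dominates small payments, so the effective rate drops with size
for amount in (10, 100, 1_000):
    rate = fee(amount, *ach) / amount * 100
    print(f"ACH on ${amount:,}: effective rate {rate:.2f}%")

crossover = breakeven_amount(*debit, *ach)
print(f"Debit Card and ACH cost the same at about ${crossover:.2f}")
```

Below the crossover the method with the smaller fixed fee wins, and above it the method with the smaller percentage wins.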

In [None]:
# Detailed fee analysis
fee_analysis = completed_df.groupby('payment_method').agg({
    'amount': ['count', 'sum', 'mean', 'median'],
    'fee': ['sum', 'mean', 'median']
}).round(2)
fee_analysis.columns = ['txn_count', 'total_volume', 'avg_amount', 'median_amount',
                        'total_fees', 'avg_fee', 'median_fee']
fee_analysis['effective_rate_pct'] = (fee_analysis['total_fees'] / fee_analysis['total_volume'] * 100).round(3)
fee_analysis['revenue_share_pct'] = (fee_analysis['total_fees'] / fee_analysis['total_fees'].sum() * 100).round(1)

print("Fee Analysis by Payment Method")
print("=" * 100)
print(fee_analysis.to_string())

In [None]:
# Visualize fee structures
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

methods = fee_analysis.index.tolist()
colors = plt.cm.Set3(range(len(methods)))

# Effective fee rate
axes[0, 0].bar(methods, fee_analysis['effective_rate_pct'], color=colors, edgecolor='black')
axes[0, 0].set_title('Effective Fee Rate by Payment Method', fontsize=12, fontweight='bold')
axes[0, 0].set_ylabel('Fee Rate (%)')
axes[0, 0].tick_params(axis='x', rotation=45)
for i, v in enumerate(fee_analysis['effective_rate_pct']):
    axes[0, 0].text(i, v + 0.05, f'{v:.2f}%', ha='center', fontsize=9)

# Total fee revenue
axes[0, 1].bar(methods, fee_analysis['total_fees'], color=colors, edgecolor='black')
axes[0, 1].set_title('Total Fee Revenue by Payment Method', fontsize=12, fontweight='bold')
axes[0, 1].set_ylabel('Total Fees ($)')
axes[0, 1].tick_params(axis='x', rotation=45)

# Fee vs Amount scatter
for method, color in zip(methods, colors):
    subset = completed_df[completed_df['payment_method'] == method]
    method_data = subset.sample(min(200, len(subset)))
    axes[1, 0].scatter(method_data['amount'], method_data['fee'], 
                       alpha=0.5, label=method, color=color, s=20)
axes[1, 0].set_title('Fee vs Transaction Amount', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Transaction Amount ($)')
axes[1, 0].set_ylabel('Fee ($)')
axes[1, 0].legend(loc='upper left', fontsize=8)
axes[1, 0].set_xlim(0, 10000)  # Focus on smaller transactions for clarity

# Revenue share pie
axes[1, 1].pie(fee_analysis['revenue_share_pct'], labels=methods, autopct='%1.1f%%',
               colors=colors, startangle=90)
axes[1, 1].set_title('Fee Revenue Share', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()

In [None]:
# Fee efficiency analysis: which method is cheapest for different transaction sizes?
def calculate_fee(amount, method):
    """Calculate fee for a given amount and payment method."""
    fee_structure = {
        'Wire Transfer': {'pct': 0.005, 'fixed': 25.0},
        'ACH': {'pct': 0.001, 'fixed': 0.5},
        'Credit Card': {'pct': 0.025, 'fixed': 0.30},
        'Debit Card': {'pct': 0.015, 'fixed': 0.25},
        'Digital Wallet': {'pct': 0.029, 'fixed': 0.30},
        'Cryptocurrency': {'pct': 0.01, 'fixed': 2.0}
    }
    return amount * fee_structure[method]['pct'] + fee_structure[method]['fixed']

# Create comparison for different transaction sizes
amounts = [10, 50, 100, 500, 1000, 5000, 10000, 50000]
methods = ['Wire Transfer', 'ACH', 'Credit Card', 'Debit Card', 'Digital Wallet', 'Cryptocurrency']

print("Fee Comparison by Transaction Size")
print("=" * 80)
print(f"{'Amount':<12}", end="")
for method in methods:
    print(f"{method[:10]:<12}", end="")
print(" Best Choice")
print("-" * 80)

for amount in amounts:
    fees = {method: calculate_fee(amount, method) for method in methods}
    best = min(fees, key=fees.get)
    print(f"${amount:<11,}", end="")
    for method in methods:
        fee_str = f"${fees[method]:.2f}"
        if method == best:
            fee_str += "*"
        print(f"{fee_str:<12}", end="")
    print(f" -> {best}")

print("\n* = Lowest fee for this transaction size")

In [None]:
# Visualize fee comparison
amounts_fine = np.logspace(1, 5, 100)  # 10 to 100,000

plt.figure(figsize=(12, 6))

colors = plt.cm.tab10(range(len(methods)))
for method, color in zip(methods, colors):
    fees = [calculate_fee(a, method) for a in amounts_fine]
    plt.plot(amounts_fine, fees, label=method, linewidth=2, color=color)

plt.xscale('log')
plt.yscale('log')
plt.xlabel('Transaction Amount ($)', fontsize=12)
plt.ylabel('Fee ($)', fontsize=12)
plt.title('Payment Method Fee Comparison', fontsize=14, fontweight='bold')
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)

# Add reference lines at $500 and $5,000 (visual guides, not exact breakeven points)
plt.axvline(x=500, color='gray', linestyle='--', alpha=0.5)
plt.axvline(x=5000, color='gray', linestyle='--', alpha=0.5)
plt.text(500, 0.5, '$500', fontsize=9, color='gray')
plt.text(5000, 0.5, '$5,000', fontsize=9, color='gray')

plt.tight_layout()
plt.show()

print("\nKey Insights:")
print("- Under these fee assumptions, ACH is the cheapest method for any amount above about $18")
print("- Below about $18, Debit Card is cheapest because of its lower fixed fee")
print("- Cryptocurrency is second cheapest from about $350 to $4,600; above that, Wire Transfer is")
print("- Wire Transfer's $25 fixed fee makes it expensive for small and medium amounts")
print("- Credit Card and Digital Wallet have the highest percentage fees, so they cost the most on large amounts")

## Try It Yourself!

### Experiment 1: Change Parameters
- Change `n_transactions=5000` to `10000`
- Change `days=90` to `30`

### Experiment 2: Filter Data
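
One possible way to approach this experiment is boolean filtering, sketched here on a small stand-in table. The `toy` DataFrame and its values are made up; in the notebook you would apply the same pattern to `payments_df`.

```python
import pandas as pd

# Small stand-in table with the same column names as the payment dataset
toy = pd.DataFrame({
    "payment_method": ["ACH", "Credit Card", "ACH", "Wire Transfer"],
    "amount": [120.0, 45.0, 980.0, 15000.0],
    "status": ["Completed", "Completed", "Failed", "Completed"],
})

# One condition: keep only ACH payments
ach_only = toy[toy["payment_method"] == "ACH"]

# Two conditions: wrap each in parentheses and combine with & (and) or | (or)
completed_ach = toy[(toy["payment_method"] == "ACH") & (toy["status"] == "Completed")]

print(ach_only)
print(completed_ach)
```

Try swapping in other methods or statuses and watch how the row count changes.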


### Experiment 3: Calculate Fees
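
A possible starting point is to compute the fee for a single amount under every method and rank the results. The sketch below copies the percentage and fixed-fee values from the `payment_methods` dictionary in Section 2. The variable names are made up for this example.

```python
# (percentage, fixed fee in $) per method, copied from Section 2
fee_table = {
    "Wire Transfer": (0.005, 25.00),
    "ACH": (0.001, 0.50),
    "Credit Card": (0.025, 0.30),
    "Debit Card": (0.015, 0.25),
    "Digital Wallet": (0.029, 0.30),
    "Cryptocurrency": (0.010, 2.00),
}

amount = 2_500  # try other values here

# fee = amount * percentage + fixed fee
fees = {method: amount * pct + fixed for method, (pct, fixed) in fee_table.items()}

# Print methods from cheapest to most expensive
for method, method_fee in sorted(fees.items(), key=lambda item: item[1]):
    print(f"{method:<15} ${method_fee:>8.2f}")

cheapest = min(fees, key=fees.get)
print(f"Cheapest for ${amount:,}: {cheapest}")
```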


### Experiment 4: Find Large Transactions
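
Two common ways to find large transactions are a threshold filter and `nlargest`. The sketch below shows both on a made-up table. The $5,000 cutoff is an arbitrary assumption, and in the notebook you would run the same calls on `payments_df`.

```python
import pandas as pd

# Small stand-in table; IDs follow the notebook's TXN000001 style
toy = pd.DataFrame({
    "transaction_id": ["TXN000001", "TXN000002", "TXN000003", "TXN000004", "TXN000005"],
    "amount": [250.0, 18000.0, 75.0, 6400.0, 990.0],
})

threshold = 5_000  # arbitrary cutoff for "large"

# Option 1: everything above a fixed threshold, largest first
large = toy[toy["amount"] > threshold].sort_values("amount", ascending=False)

# Option 2: the N largest rows, regardless of threshold
top_two = toy.nlargest(2, "amount")

print(large)
print(top_two)
```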



## Section 7: Challenge Exercises

Apply what you've learned with these hands-on challenges!

### Challenge 1: Identify Suspicious Transaction Patterns

Fraud detection often relies on identifying unusual patterns. Write code to flag transactions that:
1. Are significantly larger than the party's average transaction
2. Occur at unusual hours
3. Involve parties with very few total transactions

In [None]:
# Challenge 1: Your code here

def flag_suspicious_transactions(df, amount_threshold=3, unusual_hours=(0, 6)):
    """
    Flag potentially suspicious transactions.
    
    Args:
        df: Payment transactions DataFrame
        amount_threshold: Flag if amount > threshold * party's mean
        unusual_hours: Tuple of (start_hour, end_hour) considered unusual
    
    Returns:
        DataFrame with suspicious transactions flagged
    """
    # TODO: Implement this function
    # Hints:
    # 1. Calculate each sender's average transaction amount
    # 2. Flag transactions where amount > threshold * sender's average
    # 3. Flag transactions during unusual hours
    # 4. Count transactions per party and flag those with < 3 transactions
    
    pass  # Replace with your implementation

# Test your function
# suspicious = flag_suspicious_transactions(completed_df)
# print(f"Found {len(suspicious)} suspicious transactions")

### Challenge 2: Optimize Payment Method Selection

Create a recommendation system that suggests the optimal payment method based on:
1. Transaction amount
2. Required settlement time
3. Cost constraints

In [None]:
# Challenge 2: Your code here

def recommend_payment_method(amount, max_settlement_hours=48, max_fee_pct=None):
    """
    Recommend the best payment method based on constraints.
    
    Args:
        amount: Transaction amount
        max_settlement_hours: Maximum acceptable settlement time
        max_fee_pct: Maximum acceptable fee as percentage of amount
    
    Returns:
        Recommended payment method and reasoning
    """
    # TODO: Implement this function
    # Hints:
    # 1. Calculate fee for each payment method
    # 2. Filter by settlement time constraint
    # 3. Filter by fee percentage constraint if provided
    # 4. Return the cheapest option that meets all constraints
    
    pass  # Replace with your implementation

# Test cases
# print(recommend_payment_method(100, max_settlement_hours=1))
# print(recommend_payment_method(10000, max_settlement_hours=72, max_fee_pct=1.0))

### Challenge 3: Build a Payment Flow Dashboard

Create a summary function that generates a comprehensive payment flow report for a specific time period, including:
1. Volume statistics
2. Top senders and receivers
3. Method breakdown
4. Anomaly flags

In [None]:
# Challenge 3: Your code here

def generate_payment_report(df, start_date, end_date):
    """
    Generate a comprehensive payment flow report.
    
    Args:
        df: Payment transactions DataFrame
        start_date: Report start date (string or datetime)
        end_date: Report end date (string or datetime)
    
    Returns:
        Dictionary with report sections
    """
    # TODO: Implement this function
    # Include:
    # - Executive summary (total volume, count, avg)
    # - Top 5 senders by volume
    # - Top 5 receivers by volume
    # - Payment method breakdown
    # - Daily trend summary
    # - Fee analysis
    
    pass  # Replace with your implementation

# Test your function
# report = generate_payment_report(completed_df, '2024-01-01', '2024-01-31')
# print(report)

### Challenge 4: Network Analysis - Find Communities

Use network analysis to identify clusters of parties that frequently transact with each other. This can reveal:
- Business ecosystems
- Supply chains
- Potential fraud rings

In [None]:
# Challenge 4: Your code here

def find_payment_communities(G):
    """
    Identify communities in the payment network.
    
    Args:
        G: NetworkX graph of payment relationships
    
    Returns:
        Dictionary mapping community ID to list of parties
    """
    # TODO: Implement community detection
    # Hints:
    # 1. Convert to undirected graph for community detection
    # 2. Use nx.community.louvain_communities() or similar
    # 3. Analyze the size and composition of each community
    # 4. Visualize the communities with different colors
    
    pass  # Replace with your implementation

# Test your function
# communities = find_payment_communities(G)
# for i, community in communities.items():
#     print(f"Community {i}: {len(community)} members")

## Summary

In this notebook, you learned how to analyze payment transaction data:

### Key Concepts Covered

1. **Payment Data Generation**: Created realistic synthetic payment data with various payment methods, fees, and settlement times

2. **Volume Analysis**: 
   - Daily and weekly transaction patterns
   - Hourly distribution of payments
   - Payment method usage breakdown

3. **Settlement Time Analysis**:
   - Comparison across payment methods
   - Understanding SLA implications
   - Trade-offs between speed and cost

4. **Network Visualization**:
   - Building payment graphs with NetworkX
   - Centrality measures (degree, betweenness, PageRank)
   - Identifying key players in the network

5. **Fee Analysis**:
   - Effective fee rates by payment method
   - Cost optimization for different transaction sizes
   - Revenue breakdown and projections

### Real-World Applications

- **Fraud Detection**: Unusual patterns, network anomalies
- **Liquidity Management**: Settlement time predictions
- **Cost Optimization**: Payment method selection
- **Capacity Planning**: Volume forecasting
- **Regulatory Compliance**: Transaction monitoring

### Next Steps

- Explore real payment data from public datasets
- Implement machine learning for fraud detection
- Study payment system regulations (PSD2, instant payments)
- Learn about payment APIs (Stripe, PayPal, Plaid)
- Understand cross-border payment challenges

### Further Reading

- [Federal Reserve Payments Study](https://www.federalreserve.gov/paymentsystems/fr-payments-study.htm)
- [Bank for International Settlements - Payment Statistics](https://www.bis.org/cpmi/paysysinfo.htm)
- [European Payments Council](https://www.europeanpaymentscouncil.eu/)
- [Faster Payments Innovation](https://www.fasterpaymentscouncil.org/)