# 🔍 AML Suspicious Transaction Detector
## Anti-Money Laundering Analysis Dashboard

This notebook walks through an end-to-end analysis pipeline for detecting suspicious financial activity:

1. **Data Generation** - Create realistic synthetic transaction data
2. **Pattern Detection** - Identify structuring/smurfing patterns
3. **Velocity Analysis** - Detect unusual transaction velocities
4. **ML Anomaly Detection** - Use Isolation Forest & LOF algorithms
5. **Risk Scoring** - Calculate composite risk scores
6. **Network Visualization** - Graph suspicious money flows
7. **Reporting** - Generate investigation reports

---


In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import networkx as nx
import warnings
warnings.filterwarnings('ignore')

# Import our AML modules
import sys
sys.path.append('../src')

from data_generator import AMLDataGenerator
from pattern_structuring import StructuringDetector
from pattern_velocity import VelocityDetector
from anomaly_detection import AnomalyDetector
from risk_scoring import RiskScorer
from graph_builder import TransactionGraphBuilder
from report_builder import AMLReportBuilder

# Set plotting style
plt.style.use('dark_background')
sns.set_palette('husl')

print("✅ All modules loaded successfully!")


---
## 📊 Step 1: Generate Synthetic Transaction Data

We generate realistic synthetic data including:
- **Normal customers** - Regular transaction patterns
- **Money mules** - High in/out velocity, low retention
- **Fraud rings** - Interconnected suspicious accounts
- **Structuring** - Multiple deposits just under $10,000
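
As a rough illustration of the last bullet, structured deposits can be synthesized by drawing amounts slightly below the threshold and spreading them over a few days. This is a minimal sketch, not the `AMLDataGenerator` implementation: the helper name, the 9,000 to 9,999 amount range, the 72-hour spread, the `receiver_id` column, and the `'deposit'` label are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def synth_structuring_deposits(receiver_id, n_deposits, start, rng):
    """Hypothetical helper: n deposits just under $10,000 within 72 hours."""
    # Amounts drawn below the $10,000 threshold (the range is an assumption)
    amounts = rng.uniform(9_000, 9_999, size=n_deposits).round(2)
    # Sorted random offsets so the timestamps come out in chronological order
    hours = np.sort(rng.uniform(0, 72, size=n_deposits))
    return pd.DataFrame({
        "receiver_id": receiver_id,
        "amount": amounts,
        "timestamp": pd.Timestamp(start) + pd.to_timedelta(hours, unit="h"),
        "tx_type": "deposit",  # assumed label
    })

demo_deposits = synth_structuring_deposits(
    "ACC_DEMO", n_deposits=5, start="2024-01-01", rng=np.random.default_rng(42)
)
print(demo_deposits)
```

Passing the random generator in explicitly keeps the output reproducible, in the same spirit as the `seed=42` argument used by the generator in this notebook.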


In [None]:
# Generate synthetic data
generator = AMLDataGenerator(
    n_customers=1000,
    n_transactions=25000,
    seed=42
)

customers, transactions = generator.generate_and_save('../data')


In [None]:
# Display sample data
print("\n📋 Sample Customer Data:")
display(customers.head(10))

print("\n📋 Sample Transaction Data:")
display(transactions.head(10))


In [None]:
# Data overview visualization
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        'Customer Types Distribution',
        'Transaction Amount Distribution',
        'Transaction Types',
        'Daily Transaction Volume'
    ),
    specs=[[{"type": "pie"}, {"type": "histogram"}],
           [{"type": "bar"}, {"type": "scatter"}]]
)

# Customer types pie chart
customer_counts = customers['customer_type'].value_counts()
fig.add_trace(
    go.Pie(labels=customer_counts.index, values=customer_counts.values, hole=0.4),
    row=1, col=1
)

# Amount distribution
fig.add_trace(
    go.Histogram(x=transactions['amount'], nbinsx=50, marker_color='#2196f3'),
    row=1, col=2
)

# Transaction types
tx_types = transactions['tx_type'].value_counts()
fig.add_trace(
    go.Bar(x=tx_types.index, y=tx_types.values, marker_color='#4caf50'),
    row=2, col=1
)

# Daily volume
daily = transactions.groupby(transactions['timestamp'].dt.date)['amount'].sum()
fig.add_trace(
    go.Scatter(x=list(daily.index), y=daily.values, mode='lines', line_color='#ff7043'),
    row=2, col=2
)

fig.update_layout(
    height=800,
    title_text="<b>📊 Data Overview Dashboard</b>",
    showlegend=False,
    template='plotly_dark'
)

fig.show()


---
## 🔴 Step 2: Detect Structuring Patterns

Structuring (smurfing) detection identifies:
- Multiple deposits just under the $10,000 reporting threshold
- Many senders → one receiver patterns
- Rapid-fire deposits
- Round-amount patterns
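
The first rule can be sketched with a rolling count of near-threshold deposits per receiver. This is a minimal sketch under stated assumptions, not the `StructuringDetector` logic: the 10% margin, the minimum of three deposits, the 7-day window, and the `receiver_id` column name are hypothetical.

```python
import pandas as pd

THRESHOLD = 10_000   # reporting threshold from the text
MARGIN = 0.10        # "just under" = within 10% below the threshold (assumption)
MIN_DEPOSITS = 3     # deposits needed to raise a flag (assumption)
WINDOW = "7D"        # look-back window (assumption)

def flag_structuring(tx):
    """Return receivers with several just-under-threshold deposits in one window."""
    near = tx[(tx["amount"] >= THRESHOLD * (1 - MARGIN)) & (tx["amount"] < THRESHOLD)]
    near = near.sort_values("timestamp").set_index("timestamp")
    # Rolling count of near-threshold deposits, computed separately per receiver
    counts = near.groupby("receiver_id")["amount"].rolling(WINDOW).count()
    peak = counts.groupby(level="receiver_id").max()
    flagged = peak[peak >= MIN_DEPOSITS]
    return flagged.rename("near_threshold_deposits").reset_index()

demo_tx = pd.DataFrame({
    "receiver_id": ["A", "A", "A", "B", "B", "C"],
    "amount": [9500, 9800, 9900, 9500, 500, 12000],
    "timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03",
        "2024-01-01", "2024-01-02", "2024-01-01",
    ]),
})
flagged = flag_structuring(demo_tx)
print(flagged)
```

In the demo data only receiver `A` has three near-threshold deposits inside one week, so it is the only account returned.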


In [None]:
# Run structuring detection
structuring_detector = StructuringDetector()
structuring_alerts, structuring_df = structuring_detector.run_all_detectors(transactions)

# Save alerts
if not structuring_df.empty:
    structuring_df.to_csv('../data/structuring_alerts.csv', index=False)
    print(f"\n📁 Saved {len(structuring_df)} structuring alerts")


---
## ⚡ Step 3: Velocity Analysis

Velocity rules detect:
- Too many transactions in short periods
- Sudden activity spikes
- Money mule patterns (high in/out, low retention)
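
Two of these rules can be sketched in a few lines of pandas: a rolling transaction count per sender, and a retention ratio that compares what an account receives with what it sends on. This is a minimal sketch rather than the `VelocityDetector` implementation: the `sender_id` and `receiver_id` column names, the 60-minute window, and the mule cut-offs (at least 5,000 of inflow, at most 5% retained) are assumptions.

```python
import pandas as pd

def max_tx_per_window(tx, window="60min"):
    """Largest number of outgoing transactions per sender in any rolling window."""
    s = tx.sort_values("timestamp").set_index("timestamp")
    counts = s.groupby("sender_id")["amount"].rolling(window).count()
    return counts.groupby(level="sender_id").max().rename("max_tx_in_window")

def retention_ratio(tx):
    """Inflow, outflow and the share of inflow each account keeps."""
    inflow = tx.groupby("receiver_id")["amount"].sum().rename("inflow")
    outflow = tx.groupby("sender_id")["amount"].sum().rename("outflow")
    flows = pd.concat([inflow, outflow], axis=1).fillna(0.0)
    # NaN when an account has no inflow, so it can never match the mule rule
    has_inflow = flows["inflow"].where(flows["inflow"] > 0)
    flows["retention"] = (flows["inflow"] - flows["outflow"]) / has_inflow
    return flows

demo_tx = pd.DataFrame({
    "sender_id": ["A", "A", "A", "M"],
    "receiver_id": ["M", "M", "M", "Z"],
    "amount": [4000, 3000, 3000, 9800],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:10",
        "2024-01-01 10:20", "2024-01-01 12:00",
    ]),
})

bursts = max_tx_per_window(demo_tx)
flows = retention_ratio(demo_tx)
# Mule-like: substantial inflow that is almost entirely passed on (assumed cut-offs)
mule_like = flows[(flows["inflow"] >= 5_000) & (flows["retention"].abs() <= 0.05)]
print(bursts)
print(mule_like)
```

Account `M` receives 10,000 and forwards 9,800, so it retains about 2% and matches the mule-like rule, while `Z` keeps everything it receives and does not.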


In [None]:
# Run velocity detection
velocity_detector = VelocityDetector()
velocity_alerts, velocity_df = velocity_detector.run_all_detectors(transactions)

# Save alerts
if not velocity_df.empty:
    velocity_df.to_csv('../data/velocity_alerts.csv', index=False)
    print(f"\n📁 Saved {len(velocity_df)} velocity alerts")


---
## 🤖 Step 4: Machine Learning Anomaly Detection

Three unsupervised detectors are combined:
- **Isolation Forest** - Tree-based anomaly detection
- **Local Outlier Factor (LOF)** - Density-based detection
- **Statistical Outliers** - Z-score analysis
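
The three detectors can be sketched with scikit-learn on a per-account feature table. This is a minimal sketch, assuming scikit-learn is installed; it is not the `AnomalyDetector` code. The flag column names, the vote count, the 20-neighbor LOF setting, and the z-score cut-off of 3 are assumptions, while the 0.05 contamination mirrors the parameters used in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

def detect_anomalies(features, cols, contamination=0.05, z_cutoff=3.0):
    """Add one boolean flag per detector plus a simple vote count."""
    X = StandardScaler().fit_transform(features[cols])  # columns become z-scores
    out = features.copy()
    # Both estimators return -1 for outliers and 1 for inliers
    iso = IsolationForest(contamination=contamination, random_state=42)
    out["if_flag"] = iso.fit_predict(X) == -1
    lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
    out["lof_flag"] = lof.fit_predict(X) == -1
    # Statistical outlier: any feature more than z_cutoff standard deviations out
    out["z_flag"] = (np.abs(X) > z_cutoff).any(axis=1)
    out["votes"] = out[["if_flag", "lof_flag", "z_flag"]].sum(axis=1)
    return out

# Demo: 200 ordinary accounts plus one extreme account appended at the end
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "tx_count_total": np.append(rng.normal(50, 10, 200), 500.0),
    "amount_total": np.append(rng.normal(5_000, 1_000, 200), 100_000.0),
})
scored = detect_anomalies(demo, ["tx_count_total", "amount_total"])
print(scored.sort_values("votes", ascending=False).head())
```

Counting votes is only one way to combine detectors; requiring agreement between methods trades recall for fewer false positives.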


In [None]:
# Run ML anomaly detection
anomaly_detector = AnomalyDetector(
    isolation_contamination=0.05,
    lof_contamination=0.05
)

features_df, anomaly_alerts, anomaly_df = anomaly_detector.run_all_detectors(transactions, customers)

# Save results
features_df.to_csv('../data/account_features.csv', index=False)
if not anomaly_df.empty:
    anomaly_df.to_csv('../data/anomaly_alerts.csv', index=False)
    print(f"\n📁 Saved {len(anomaly_df)} anomaly alerts")


In [None]:
# Visualize anomaly scores
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Isolation Forest Scores', 'Combined Anomaly Score Distribution')
)

# Scatter plot of accounts by anomaly score
fig.add_trace(
    go.Scatter(
        x=features_df['tx_count_total'],
        y=features_df['amount_total'],
        mode='markers',
        marker=dict(
            size=8,
            color=features_df['if_anomaly_score'],
            colorscale='RdYlGn_r',
            showscale=True,
            colorbar=dict(title='Anomaly<br>Score')
        ),
        text=features_df['account_id'],
        hovertemplate='<b>%{text}</b><br>Transactions: %{x}<br>Amount: $%{y:,.0f}<extra></extra>'
    ),
    row=1, col=1
)

# Score distribution histogram
fig.add_trace(
    go.Histogram(
        x=features_df['combined_anomaly_score'],
        nbinsx=30,
        marker_color='#9c27b0'
    ),
    row=1, col=2
)

fig.update_xaxes(title_text='Transaction Count', row=1, col=1)
fig.update_yaxes(title_text='Total Amount ($)', row=1, col=1)
fig.update_xaxes(title_text='Combined Anomaly Score', row=1, col=2)

fig.update_layout(
    height=500,
    title_text="<b>🤖 ML Anomaly Detection Results</b>",
    template='plotly_dark',
    showlegend=False
)

fig.show()


---
## 📈 Step 5: Risk Scoring

Calculate composite risk scores based on:
- Alert count and severity
- Transaction volume
- Velocity patterns
- Network connections
- Behavioral anomalies
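
One common way to combine such factors is a weighted sum scaled to 0-100 and bucketed into categories. This is a minimal sketch, not the `RiskScorer` formula: the weights, the category cut-offs, and the factor column names are hypothetical, and each factor is assumed to be pre-scaled to the range 0 to 1.

```python
import pandas as pd

# Hypothetical weights in percentage points (summing to 100), one per factor above
WEIGHTS = pd.Series({"alerts": 35, "volume": 20, "velocity": 15, "network": 15, "anomaly": 15})

def composite_score(components):
    """components: one row per account, each factor already scaled to [0, 1]."""
    score = components[WEIGHTS.index].mul(WEIGHTS, axis=1).sum(axis=1).clip(0, 100)
    # Assumed category boundaries; include_lowest keeps a score of 0 in 'low'
    category = pd.cut(
        score,
        bins=[0, 25, 50, 75, 100],
        labels=["low", "medium", "high", "critical"],
        include_lowest=True,
    )
    return pd.DataFrame({"overall_score": score.round(1), "risk_category": category})

demo_components = pd.DataFrame(
    {
        "alerts": [1.0, 0.0, 1.0],
        "volume": [1.0, 0.0, 0.0],
        "velocity": [1.0, 0.0, 0.0],
        "network": [1.0, 0.0, 0.0],
        "anomaly": [1.0, 0.0, 0.0],
    },
    index=["ACC1", "ACC2", "ACC3"],
)
demo_scores = composite_score(demo_components)
print(demo_scores)
```

Keeping the weights in one named series makes the scoring easy to audit and tune, which matters when scores feed investigation decisions.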


In [None]:
# Combine all non-empty alert tables (pd.concat raises on an empty list)
alert_frames = [df for df in (structuring_df, velocity_df, anomaly_df) if not df.empty]
all_alerts = pd.concat(alert_frames, ignore_index=True) if alert_frames else pd.DataFrame()

print(f"Total alerts combined: {len(all_alerts)}")

# Calculate risk scores
risk_scorer = RiskScorer()
risk_scores, risk_df = risk_scorer.calculate_risk_scores(
    transactions, all_alerts, features_df
)

# Save risk scores
risk_df.to_csv('../data/risk_scores.csv', index=False)

# Get suspicious accounts
suspicious = risk_scorer.get_suspicious_accounts(risk_df, 'medium')
suspicious.to_csv('../data/suspicious_accounts.csv', index=False)
print(f"\n📁 Saved risk scores for {len(risk_df)} accounts")
print(f"📁 Identified {len(suspicious)} suspicious accounts")


In [None]:
# Display top 15 risky accounts
print("\n🚨 TOP 15 HIGHEST RISK ACCOUNTS")
print("=" * 80)
display(risk_df.nlargest(15, 'overall_score')[[
    'account_id', 'overall_score', 'risk_category',
    'alert_count', 'total_amount', 'contributing_factors'
]])


---
## 🕸️ Step 6: Network Graph Visualization

Build a transaction network graph:
- Nodes = accounts (colored by risk)
- Edges = money flows (weighted by amount)
- Identify suspicious clusters and paths
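
The structure described above can be sketched directly with `networkx`. This is a minimal sketch, not the `TransactionGraphBuilder` implementation: the `sender_id` and `receiver_id` column names, the risk dictionary, the 70-point risk cut-off, and the 3-hop limit are assumptions.

```python
import networkx as nx
import pandas as pd

def build_flow_graph(tx, risk):
    """Directed graph: nodes are accounts, edge weight is the total amount sent."""
    flows = tx.groupby(["sender_id", "receiver_id"], as_index=False)["amount"].sum()
    G = nx.DiGraph()
    for row in flows.itertuples(index=False):
        G.add_edge(row.sender_id, row.receiver_id, weight=float(row.amount))
    # Accounts without a score default to 0 (assumption)
    nx.set_node_attributes(G, {n: risk.get(n, 0.0) for n in G.nodes}, name="risk")
    return G

def risky_paths(G, min_risk=70.0, max_hops=3):
    """Simple paths of at most max_hops edges between two high-risk accounts."""
    risky = [n for n, r in G.nodes(data="risk") if r >= min_risk]
    paths = []
    for src in risky:
        for dst in risky:
            if src != dst:
                paths.extend(nx.all_simple_paths(G, src, dst, cutoff=max_hops))
    return paths

demo_tx = pd.DataFrame({
    "sender_id": ["A", "A", "M", "X"],
    "receiver_id": ["M", "M", "Z", "Y"],
    "amount": [4000, 3000, 9800, 100],
})
demo_risk = {"A": 80.0, "M": 10.0, "Z": 90.0}  # hypothetical 0-100 scores

demo_G = build_flow_graph(demo_tx, demo_risk)
demo_paths = risky_paths(demo_G)
demo_clusters = list(nx.weakly_connected_components(demo_G))
print(demo_paths)
print(demo_clusters)
```

Here the two transfers from `A` to `M` collapse into one weighted edge, the path from `A` through `M` to `Z` links the two high-risk accounts, and the unrelated `X` to `Y` transfer forms its own cluster.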


In [None]:
# Build transaction graph
graph_builder = TransactionGraphBuilder()
G = graph_builder.build_graph(transactions, risk_df)

# Calculate network metrics
network_metrics = graph_builder.calculate_network_metrics()
network_metrics.to_csv('../data/network_metrics.csv', index=False)

# Find suspicious paths
suspicious_paths = graph_builder.find_suspicious_paths()


In [None]:
# Create network visualization
fig = graph_builder.visualize_network(
    output_path='../data/network_graph.png',
    title='AML Suspicious Transaction Network'
)
plt.show()


In [None]:
# Network metrics analysis
print("\n📊 Network Analysis Summary")
print("=" * 50)
print(f"Total nodes (accounts): {G.number_of_nodes()}")
print(f"Total edges (flows): {G.number_of_edges()}")
print(f"Network density: {nx.density(G):.4f}")
print(f"Communities detected: {len(graph_builder.communities)}")
print(f"Suspicious paths found: {len(suspicious_paths)}")

# Top PageRank accounts
print("\n🔝 Top 10 Most Central Accounts (PageRank):")
display(network_metrics.nlargest(10, 'pagerank')[['account_id', 'pagerank', 'total_degree', 'in_amount', 'out_amount']])


---
## 📝 Step 7: Generate Reports

Create comprehensive investigation reports:
- Executive Summary
- HTML Dashboard
- Suspicious Accounts List


In [None]:
# Generate all reports
reporter = AMLReportBuilder(output_dir='../reports')
reports = reporter.generate_full_report(transactions, all_alerts, risk_df)

# Export graph data
graph_builder.export_graph_data('../data')

print("\n" + "=" * 60)
print("✅ ALL ANALYSIS COMPLETE!")
print("=" * 60)
print("\nCheck the ../data and ../reports directories for all outputs.")
print("Open the HTML dashboard for an interactive view.")


---
## ✅ Analysis Complete!

### Generated Files:
- `data/customers.csv` - Customer information
- `data/transactions.csv` - Transaction records
- `data/structuring_alerts.csv` - Structuring detection alerts
- `data/velocity_alerts.csv` - Velocity rule alerts
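- `data/anomaly_alerts.csv` - ML anomaly alerts
- `data/account_features.csv` - Per-account features used for anomaly detection
- `data/risk_scores.csv` - Composite risk scores
- `data/suspicious_accounts.csv` - Flagged accounts
- `data/network_metrics.csv` - Per-account network metrics
- `data/network_graph.png` - Network visualization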
- `reports/executive_summary_*.txt` - Text summary
- `reports/aml_dashboard_*.html` - Interactive HTML report

### Next Steps:
1. Review critical and high-risk accounts
2. Investigate suspicious transaction paths
3. Generate Suspicious Activity Reports (SARs) for confirmed suspicious activity
4. Update customer risk ratings
