# Cryptocurrency Mixer Detection Notebook

This notebook analyzes transaction patterns to detect the use of cryptocurrency mixers on the Solana blockchain.

## Overview

Cryptocurrency mixers (or tumblers) are services that obfuscate transaction trails by mixing multiple users' funds together. This notebook:

1. Identifies known mixer services on Solana
2. Detects transactions with mixer-like patterns
3. Analyzes on-chain transaction patterns indicative of mixer usage
4. Traces funds that have passed through mixers

In [34]:
# Import required libraries
import os
import sys
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from datetime import datetime, timedelta
from typing import List, Dict, Any, Tuple, Optional

# Add project root to Python path
project_root = os.path.abspath(os.path.join(os.getcwd(), '../..'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# Import SolanaGuard modules
from data_collection.collectors.helius_collector import HeliusCollector
from data_collection.collectors.range_collector import RangeCollector

# Try to locate the utils modules (could be in different locations)
try:
    # First try the most likely location - as a top-level package
    from utils.entropy_analysis import calculate_transaction_entropy, detect_entropy_anomalies
    from utils.graph_utils import TransactionFlowGraph
    from utils.risk_scoring import calculate_address_risk
    from utils.visualization import visualize_transaction_flow, plot_entropy_distribution
    print("Utils modules imported successfully")
except ImportError:
    try:
        # Try as a subpackage of data_collection
        from data_collection.utils.entropy_analysis import calculate_transaction_entropy, detect_entropy_anomalies
        from data_collection.utils.graph_utils import TransactionFlowGraph
        from data_collection.utils.risk_scoring import calculate_address_risk
        from data_collection.utils.visualization import visualize_transaction_flow, plot_entropy_distribution
        print("Utils modules imported from data_collection.utils")
    except ImportError:
        # Define simple placeholder functions if modules can't be found
        print("WARNING: Could not import utils modules. Using placeholder functions.")
        
        def calculate_transaction_entropy(data):
            return 0.0
            
        def detect_entropy_anomalies(data):
            return []
            
        class TransactionFlowGraph:
            def __init__(self):
                self.graph = nx.DiGraph()
                
            def add_transaction(self, source, target, **kwargs):
                if source and target:
                    self.graph.add_edge(source, target)
                    
            def get_nodes(self):
                return list(self.graph.nodes())
                
            def get_edge_count(self):
                return len(self.graph.edges())
                
            def set_node_attribute(self, node, attr, value):
                self.graph.nodes[node][attr] = value
                
            def get_node_attribute(self, node, attr):
                return self.graph.nodes[node].get(attr, False)
                
            # Add other required methods with simple implementations
            def calculate_centrality(self):
                return {node: 0.0 for node in self.graph.nodes()}
                
            def identify_communities(self):
                return [[node] for node in self.graph.nodes()]
                
            # Add more methods as needed
            def get_in_neighbors(self, node):
                return list(self.graph.predecessors(node)) if node in self.graph else []
                
            def get_out_neighbors(self, node):
                return list(self.graph.successors(node)) if node in self.graph else []
                
            def get_edge_attributes(self, source, target):
                return {"transactions": []}
                
            def export_to_json(self):
                return {"nodes": [], "edges": []}
        
        def calculate_address_risk(address):
            return 0.0
            
        def visualize_transaction_flow(graph, highlight_nodes=None, output_file=None):
            return output_file if output_file else None
            
        def plot_entropy_distribution(data):
            pass

# Configure plot style
plt.style.use('ggplot')
sns.set_style("whitegrid")

Utils modules imported from data_collection.utils


## Initialize API Collectors

First, we initialize the necessary API collectors to gather blockchain data:

In [35]:
# Initialize collectors
helius = HeliusCollector()
range_api = RangeCollector()

# Verify API connectivity
try:
    helius_status = helius.check_connection()
    range_status = range_api.check_connection()
    print(f"Helius API: {'Connected' if helius_status else 'Connection failed'}")
    print(f"Range API: {'Connected' if range_status else 'Connection failed'}")
except Exception as e:
    print(f"Error connecting to APIs: {e}")

2025-04-24 22:34:06,711 - helius_collector - INFO - Initialized Helius collector
2025-04-24 22:34:06,712 - range_collector - INFO - Initialized Range collector


Error connecting to APIs: 'HeliusCollector' object has no attribute 'check_connection'


## Known Mixer Services and Addresses

Let's define a database of known mixer services and their associated addresses on Solana:

In [36]:
# Database of known mixers on Solana
KNOWN_MIXERS = {
    "tornado_cash_solana": {
        "addresses": [
            "TCnifB7JjcmXP5F7hJ9wQDEq47Qympz4JbRRUBtcEJ8", # Main contract
            "TCxZde8bp2sp5s7fK8K4nnzJWXpMjHmYLAXPYvgdz4Z"  # Router
        ],
        "transaction_patterns": ["equal_amounts", "fixed_intervals", "privacy_set"],
        "risk_level": "high"
    },
    "elusiv": {
        "addresses": [
            "E1w8SZpBPkRBdBmEJUwpRZx1SbQVVhNqkuH95uKJvypH",  # Program
            "2EgZ5LuMqyVKQYS4AFhJRpZNr1rLxuU4UJuNJbcKFubu"   # Fee collector
        ],
        "transaction_patterns": ["stealth_addresses", "ring_signature", "decoy_outputs"],
        "risk_level": "medium"
    },
    "cyclos_mixer": {
        "addresses": [
            "CYcLEsDHNZn8mVimVJLMFeYRGzdPx9QmxUL5kgKSTsdq",  # Mixer contract
            "CYCSaAMM4tJLXeXQKVAZrkLUKu8gYXhRNfYJG5qkKPQt"   # Pool
        ],
        "transaction_patterns": ["pool_deposits", "time_locks", "uniform_withdrawals"],
        "risk_level": "medium"
    }
}

# Function to get all mixer addresses
def get_all_mixer_addresses() -> List[str]:
    all_addresses = []
    for mixer_name, mixer_info in KNOWN_MIXERS.items():
        all_addresses.extend(mixer_info["addresses"])
    return all_addresses

# Function to check if address is a known mixer
def is_known_mixer(address: str) -> Tuple[bool, str]:
    for mixer_name, mixer_info in KNOWN_MIXERS.items():
        if address in mixer_info["addresses"]:
            return True, mixer_name
    return False, ""

# List all known mixer addresses
all_mixer_addresses = get_all_mixer_addresses()
print(f"Loaded {len(all_mixer_addresses)} known mixer addresses across {len(KNOWN_MIXERS)} mixer services")
for mixer_name, mixer_info in KNOWN_MIXERS.items():
    print(f"- {mixer_name} ({mixer_info['risk_level']} risk): {len(mixer_info['addresses'])} addresses")

Loaded 6 known mixer addresses across 3 mixer services
- tornado_cash_solana (high risk): 2 addresses
- elusiv (medium risk): 2 addresses
- cyclos_mixer (medium risk): 2 addresses
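
`is_known_mixer` scans every service on each call, which is fine for six addresses but grows linearly with the size of the database. A minimal sketch of a reverse index follows, using a small stand-in dictionary with placeholder addresses. `build_address_index` and `lookup` are hypothetical helpers, not part of the notebook.

```python
from typing import Dict, List, Tuple

# Stand-in data with placeholder addresses, mirroring the shape of KNOWN_MIXERS
SAMPLE_MIXERS: Dict[str, Dict[str, List[str]]] = {
    "service_a": {"addresses": ["ADDR_A1", "ADDR_A2"]},
    "service_b": {"addresses": ["ADDR_B1"]},
}


def build_address_index(mixers: Dict[str, Dict[str, List[str]]]) -> Dict[str, str]:
    """Map each address to the name of the service that lists it."""
    return {
        address: name
        for name, info in mixers.items()
        for address in info["addresses"]
    }


def lookup(index: Dict[str, str], address: str) -> Tuple[bool, str]:
    """Same return contract as is_known_mixer: (found, service_name_or_empty)."""
    name = index.get(address, "")
    return (name != "", name)


index = build_address_index(SAMPLE_MIXERS)
print(lookup(index, "ADDR_B1"))   # (True, 'service_b')
print(lookup(index, "UNKNOWN"))   # (False, '')
```

The index is built once, after which each membership check is a single dictionary lookup.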


## Analyze Known Mixer Transactions

Let's analyze transactions involving the known mixer addresses to understand their patterns:

In [37]:
# Function to cluster transactions by pattern
def cluster_transactions_by_pattern(tx_df: pd.DataFrame) -> Dict:
    """
    Cluster transactions to identify common patterns associated with mixer services.
    
    Args:
        tx_df: DataFrame containing transaction data
        
    Returns:
        Dictionary of identified patterns with descriptions and counts
    """
    patterns = {}
    pattern_id = 1
    
    # Check if DataFrame has required columns
    if tx_df.empty:
        return patterns
    
    # 1. Look for equal amount patterns (common in mixers)
    if 'amount' in tx_df.columns:
        # Group by amount and count
        amount_counts = tx_df['amount'].value_counts()
        # Find amounts that appear multiple times (indicating possible mixer patterns)
        common_amounts = amount_counts[amount_counts > 3]
        
        if not common_amounts.empty:
            for amount, count in common_amounts.items():
                if count >= 5:  # At least 5 transactions with the same amount
                    patterns[f"pattern_{pattern_id}"] = {
                        "description": f"Fixed amount transfers of {amount:.4f} units",
                        "count": int(count),
                        "type": "equal_amounts",
                        "confidence": min(0.9, 0.5 + count/50)  # Higher confidence with more occurrences
                    }
                    pattern_id += 1
    
    # 2. Look for time interval patterns
    if 'block_time' in tx_df.columns:
        # Sort by timestamp
        sorted_tx = tx_df.sort_values('block_time')
        # Calculate time differences
        time_diffs = sorted_tx['block_time'].diff().dropna()
        
        if not time_diffs.empty:
            # Group by time difference (rounded to nearest 10 seconds)
            rounded_diffs = (time_diffs / 10).round() * 10
            diff_counts = rounded_diffs.value_counts()
            common_diffs = diff_counts[diff_counts > 3]
            
            if not common_diffs.empty:
                for diff_seconds, count in common_diffs.items():
                    if count >= 4:  # At least 4 transactions with similar timing
                        time_desc = f"{diff_seconds:.0f} seconds" if diff_seconds < 300 else f"{diff_seconds/60:.1f} minutes"
                        patterns[f"pattern_{pattern_id}"] = {
                            "description": f"Regular time intervals of approximately {time_desc}",
                            "count": int(count),
                            "type": "fixed_intervals",
                            "confidence": min(0.85, 0.4 + count/40)
                        }
                        pattern_id += 1
    
    # 3. Look for mixing patterns (many inputs, many outputs)
    # Resolve the address columns up front: step 4 also needs them, even when this step is skipped
    sender_col = 'sender_address' if 'sender_address' in tx_df.columns else 'source_address'
    receiver_col = 'receiver_address' if 'receiver_address' in tx_df.columns else 'target_address'

    if 'signature' in tx_df.columns:
        # Count unique senders and receivers
        if sender_col in tx_df.columns and receiver_col in tx_df.columns:
            unique_senders = tx_df[sender_col].nunique()
            unique_receivers = tx_df[receiver_col].nunique()
            
            # High mixing pattern - many senders to many receivers
            if unique_senders > 10 and unique_receivers > 10:
                patterns[f"pattern_{pattern_id}"] = {
                    "description": f"High mixing pattern with {unique_senders} senders and {unique_receivers} receivers",
                    "count": len(tx_df),
                    "type": "privacy_set",
                    "confidence": min(0.95, 0.6 + (unique_senders + unique_receivers)/100)
                }
                pattern_id += 1
    
    # 4. Look for fragmentation patterns (one input, many similar outputs)
    if sender_col in tx_df.columns and receiver_col in tx_df.columns and 'amount' in tx_df.columns:
        # Group by sender
        for sender, group in tx_df.groupby(sender_col):
            if len(group) >= 5:  # Sender with multiple transactions
                amount_std = group['amount'].std()
                amount_mean = group['amount'].mean()
                if amount_mean > 0 and (amount_std / amount_mean) < 0.2:  # Low variance in amounts
                    patterns[f"pattern_{pattern_id}"] = {
                        "description": f"Fragmentation pattern from {sender} with {len(group)} similar outputs",
                        "count": len(group),
                        "type": "fragmentation",
                        "confidence": min(0.9, 0.5 + len(group)/20)
                    }
                    pattern_id += 1
    
    return patterns
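
To make the first heuristic concrete, the sketch below applies the same `value_counts` test to synthetic data: six transfers of an identical amount mixed with three irregular ones. The data, the `repeated_amounts` helper, and the `min_count` parameter name are illustrative assumptions.

```python
import pandas as pd

# Synthetic transfers: six identical amounts plus three irregular ones
tx_df = pd.DataFrame({"amount": [1.0] * 6 + [0.37, 2.91, 14.2]})


def repeated_amounts(df: pd.DataFrame, min_count: int = 5) -> dict:
    """Return {amount: occurrences} for amounts seen at least min_count times."""
    counts = df["amount"].value_counts()
    frequent = counts[counts >= min_count]
    return {float(amount): int(count) for amount, count in frequent.items()}


hits = repeated_amounts(tx_df)
for amount, count in hits.items():
    # Same confidence formula as cluster_transactions_by_pattern
    confidence = min(0.9, 0.5 + count / 50)
    print(f"amount={amount:.4f} count={count} confidence={confidence:.2f}")
```

Exact equality on floating-point amounts only works when the values are identical bit for bit; real transfer data may need rounding to the token's decimal precision first.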

In [38]:
# Function to analyze mixer transactions
def analyze_mixer_transactions(mixer_addresses: List[str], max_txs_per_address: int = 100) -> Dict:
    mixer_tx_data = {
        "transactions": [],
        "address_stats": {},
        "common_patterns": {}
    }
    
    for address in mixer_addresses:
        print(f"Analyzing mixer address: {address}")
        is_mixer, mixer_name = is_known_mixer(address)
        
        if not is_mixer:
            print(f"Warning: {address} is not in our known mixer database")
            mixer_name = "unknown_mixer"
        
        # Get transaction history for this mixer address
        try:
            tx_history = helius.fetch_transaction_history(address, limit=max_txs_per_address)
            print(f"Fetched {len(tx_history)} transactions")
            
            # Add mixer info to each transaction
            for tx in tx_history:
                tx["mixer_name"] = mixer_name
                tx["mixer_address"] = address
                mixer_tx_data["transactions"].append(tx)
            
            # Analyze token transfers for this address
            token_transfers = helius.analyze_token_transfers(address, limit=max_txs_per_address)
            print(f"Analyzed {len(token_transfers)} token transfers")
            
            # Calculate statistics for this address
            if len(tx_history) > 0:
                # Extract timestamps and sort them
                timestamps = sorted([tx.get("block_time", 0) for tx in tx_history if tx.get("block_time")])
                
                # Calculate time differences between transactions
                time_diffs = [timestamps[i+1] - timestamps[i] for i in range(len(timestamps)-1)]
                median_time_diff = np.median(time_diffs) if time_diffs else 0
                std_time_diff = np.std(time_diffs) if len(time_diffs) > 1 else 0
                
                # Calculate transaction amount statistics
                amounts = [tx.get("amount", 0) for tx in token_transfers if tx.get("amount")]
                median_amount = np.median(amounts) if amounts else 0
                unique_amounts = len(set(amounts)) if amounts else 0
                amount_uniformity = 1 - (unique_amounts / len(amounts)) if amounts else 0
                
                # Store statistics
                mixer_tx_data["address_stats"][address] = {
                    "tx_count": len(tx_history),
                    "transfer_count": len(token_transfers),
                    "median_time_diff": median_time_diff,
                    "time_diff_std": std_time_diff,
                    "median_amount": median_amount,
                    "unique_amounts": unique_amounts,
                    "amount_uniformity": amount_uniformity,  # Higher = more uniform amounts
                    "first_seen": min(timestamps) if timestamps else 0,
                    "last_seen": max(timestamps) if timestamps else 0
                }
        except Exception as e:
            print(f"Error analyzing mixer address {address}: {e}")
    
    # Identify common patterns across all mixer transactions
    if mixer_tx_data["transactions"]:
        # Convert to DataFrame for easier analysis
        tx_df = pd.DataFrame(mixer_tx_data["transactions"])
        
        # Apply clustering to identify transaction patterns
        if len(tx_df) > 10:  # Need sufficient data for clustering
            patterns = cluster_transactions_by_pattern(tx_df)
            mixer_tx_data["common_patterns"] = patterns
            
            print(f"\nIdentified {len(patterns)} common transaction patterns across mixers:")
            for pattern_id, pattern_info in patterns.items():
                print(f"- {pattern_id}: {pattern_info['description']} "
                      f"(found in {pattern_info['count']} transactions)")
    
    return mixer_tx_data

# Analyze a sample of mixer addresses (limit to 2 for demonstration)
sample_mixer_addresses = all_mixer_addresses[:2]
mixer_transaction_data = analyze_mixer_transactions(sample_mixer_addresses)

2025-04-24 22:34:06,768 - helius_collector - INFO - Fetching transaction history for TCnifB7JjcmXP5F7hJ9wQDEq47Qympz4JbRRUBtcEJ8 (limit: 100)
2025-04-24 22:34:06,769 - helius_collector - INFO - Getting signatures for TCnifB7JjcmXP5F7hJ9wQDEq47Qympz4JbRRUBtcEJ8 (limit: 100)
2025-04-24 22:34:06,826 - helius_collector - ERROR - Failed to make RPC request: 401 Client Error: Unauthorized for url: https://mainnet.helius-rpc.com/?api-key=None
2025-04-24 22:34:06,827 - helius_collector - INFO - Fetching transaction history for TCxZde8bp2sp5s7fK8K4nnzJWXpMjHmYLAXPYvgdz4Z (limit: 100)
2025-04-24 22:34:06,828 - helius_collector - INFO - Getting signatures for TCxZde8bp2sp5s7fK8K4nnzJWXpMjHmYLAXPYvgdz4Z (limit: 100)

Analyzing mixer address: TCnifB7JjcmXP5F7hJ9wQDEq47Qympz4JbRRUBtcEJ8
Error analyzing mixer address TCnifB7JjcmXP5F7hJ9wQDEq47Qympz4JbRRUBtcEJ8: Request failed: 401 Client Error: Unauthorized for url: https://mainnet.helius-rpc.com/?api-key=None
Analyzing mixer address: TCxZde8bp2sp5s7fK8K4nnzJWXpMjHmYLAXPYvgdz4Z


2025-04-24 22:34:07,017 - helius_collector - ERROR - Failed to make RPC request: 401 Client Error: Unauthorized for url: https://mainnet.helius-rpc.com/?api-key=None


Error analyzing mixer address TCxZde8bp2sp5s7fK8K4nnzJWXpMjHmYLAXPYvgdz4Z: Request failed: 401 Client Error: Unauthorized for url: https://mainnet.helius-rpc.com/?api-key=None
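
The `api-key=None` in the request URL indicates that no API key reached the collector, so every request is rejected with 401 before any analysis can run. A pre-flight check can surface this earlier. The sketch assumes the key is read from an environment variable; the name `HELIUS_API_KEY` is a guess, since the collector's configuration is not shown here, and `get_api_key` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def get_api_key(var_name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key stored under var_name, or raise with an actionable message."""
    env = os.environ if env is None else env
    value = env.get(var_name, "").strip()
    # Treat the literal string "None" as missing: it is what str(None) produces in a URL
    if not value or value == "None":
        raise RuntimeError(f"{var_name} is not set; export it before running the analysis cells")
    return value


# Demonstration with explicit mappings instead of the real environment
print(get_api_key("HELIUS_API_KEY", {"HELIUS_API_KEY": "demo-key"}))
try:
    get_api_key("HELIUS_API_KEY", {})
except RuntimeError as err:
    print(f"Pre-flight check failed: {err}")
```

Failing fast here avoids the situation in the output above, where every downstream cell runs on an empty transaction list.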


## Analyze Transaction Entropy for Mixer Detection

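Entropy analysis quantifies how evenly transaction amounts and timings are distributed. Mixers tend to stand out at the extremes: fixed denominations can produce unusually low entropy in amounts, while randomized delays can produce unusually high entropy in timing: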

In [39]:
# Calculate entropy scores for mixer transactions
def calculate_mixer_entropy_scores(tx_data: Dict) -> Dict:
    transactions = tx_data.get("transactions", [])
    if not transactions:
        print("No transactions available for entropy analysis")
        return {"entropy_scores": [], "anomalies": []}
    
    # Convert to DataFrame if it's not already
    if not isinstance(transactions, pd.DataFrame):
        tx_df = pd.DataFrame(transactions)
    else:
        tx_df = transactions
        
    print(f"Calculating entropy scores for {len(tx_df)} transactions...")
    
    # Create features for entropy calculation
    features = []
    
    # Token transfer amounts (if available)
    if "amount" in tx_df.columns:
        features.append("amount")
    
    # Transaction timestamps (if available)
    if "block_time" in tx_df.columns:
        # Convert to pd.Timestamp objects for better analysis
        tx_df["timestamp"] = pd.to_datetime(tx_df["block_time"], unit="s")
        # Extract hour of day (0-23); this is the raw hour, not a sin/cos cyclical encoding
        tx_df["hour"] = tx_df["timestamp"].dt.hour
        features.append("hour")
        
        # Calculate time differences between consecutive transactions
        tx_df = tx_df.sort_values("block_time")
        tx_df["time_diff"] = tx_df["block_time"].diff()
        features.append("time_diff")
    
    # Use entropy analysis utility to calculate entropy of transaction patterns
    entropy_scores = []
    for feature in features:
        if feature in tx_df.columns and not tx_df[feature].isnull().all():
            try:
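                series = tx_df[feature].dropna()
                entropy = calculate_transaction_entropy(series.values)
                # Maximum entropy is log2(k) for k distinct values; with k == 1 that is 0,
                # so guard the division and report 0.0 instead of NaN or inf
                n_unique = series.nunique()
                max_entropy = np.log2(n_unique) if n_unique > 1 else 0.0
                entropy_scores.append({
                    "feature": feature,
                    "entropy": entropy,
                    "normalized_entropy": entropy / max_entropy if max_entropy > 0 else 0.0
                })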
            except Exception as e:
                print(f"Error calculating entropy for {feature}: {e}")
    
    print(f"Calculated {len(entropy_scores)} entropy scores")
    
    # Detect anomalies in transaction patterns
    anomalies = []
    
    # Check amount distribution for suspicious uniformity
    if "amount" in tx_df.columns:
        amount_anomalies = detect_entropy_anomalies(tx_df["amount"].dropna().values)
        if amount_anomalies:
            anomalies.extend(amount_anomalies)
    
    # Check time interval distribution for suspicious patterns
    if "time_diff" in tx_df.columns:
        time_anomalies = detect_entropy_anomalies(tx_df["time_diff"].dropna().values)
        if time_anomalies:
            anomalies.extend(time_anomalies)
    
    print(f"Detected {len(anomalies)} anomalies in transaction patterns")
    
    # Visualize entropy distribution
    if entropy_scores:
        features = [score["feature"] for score in entropy_scores]
        entropies = [score["entropy"] for score in entropy_scores]
        normalized_entropies = [score["normalized_entropy"] for score in entropy_scores]
        
        plt.figure(figsize=(10, 6))
        plt.bar(features, normalized_entropies, alpha=0.7)
        plt.title("Normalized Entropy by Transaction Feature")
        plt.xlabel("Feature")
        plt.ylabel("Normalized Entropy (0-1)")
        plt.ylim(0, 1.1)
        plt.grid(axis="y", alpha=0.3)
        plt.tight_layout()
        plt.show()
    
    return {
        "entropy_scores": entropy_scores,
        "anomalies": anomalies
    }

# Calculate entropy scores for mixer transactions
entropy_analysis = calculate_mixer_entropy_scores(mixer_transaction_data)

No transactions available for entropy analysis
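
`calculate_transaction_entropy` is imported from the project's utilities and its implementation is not shown. As a point of reference, a minimal sketch of Shannon entropy over observed value frequencies follows; it is an assumption that the utility computes something similar. Dividing by log2(k), where k is the number of distinct values, gives a score in [0, 1]: close to 1 when values are spread evenly, lower when a few values dominate, as with fixed-denomination deposits.

```python
import numpy as np


def shannon_entropy(values) -> float:
    """Shannon entropy in bits of the empirical distribution of values."""
    _, counts = np.unique(np.asarray(values), return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())


def normalized_entropy(values) -> float:
    """Entropy divided by its maximum log2(k); defined as 0.0 when k <= 1."""
    k = len(np.unique(np.asarray(values)))
    if k <= 1:
        return 0.0
    return shannon_entropy(values) / np.log2(k)


uniform = [1, 2, 3, 4]              # four equally likely values
skewed = [1, 1, 1, 1, 1, 1, 1, 5]   # one dominant value
print(shannon_entropy(uniform))     # 2.0
print(normalized_entropy(skewed))   # about 0.54
print(normalized_entropy([7, 7]))   # 0.0
```

The `k <= 1` guard matters in practice: a feature with a single distinct value has a maximum entropy of zero, so an unguarded division would yield NaN or inf.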


## Transaction Flow Graph Analysis

Let's analyze the transaction flow graph to identify mixer-like patterns:

In [40]:
# Create transaction flow graph from mixer transactions
def create_mixer_flow_graph(tx_data: Dict) -> Optional[TransactionFlowGraph]:
    transactions = tx_data.get("transactions", [])
    if not transactions:
        print("No transactions available for flow graph analysis")
        return None
    
    print(f"Creating transaction flow graph from {len(transactions)} transactions...")
    
    # Initialize transaction flow graph
    flow_graph = TransactionFlowGraph()
    
    # Add transactions to the graph
    for tx in transactions:
        # Extract source and target addresses
        source_addr = tx.get("source_address") or tx.get("sender_address") or tx.get("from")
        target_addr = tx.get("target_address") or tx.get("receiver_address") or tx.get("to")
        
        if source_addr and target_addr:
            # Add edge to graph with transaction data
            amount = tx.get("amount", 0)
            timestamp = tx.get("block_time", 0)
            tx_hash = tx.get("signature") or tx.get("tx_hash") or "unknown"
            
            flow_graph.add_transaction(
                source_addr, 
                target_addr, 
                amount=amount,
                timestamp=timestamp,
                tx_hash=tx_hash,
                mixer_name=tx.get("mixer_name", "")
            )
    
    # Add additional details to nodes
    for address in flow_graph.get_nodes():
        is_mixer, mixer_name = is_known_mixer(address)
        if is_mixer:
            flow_graph.set_node_attribute(address, "is_mixer", True)
            flow_graph.set_node_attribute(address, "mixer_name", mixer_name)
        else:
            flow_graph.set_node_attribute(address, "is_mixer", False)
    
    # Calculate graph metrics
    centrality = flow_graph.calculate_centrality()
    communities = flow_graph.identify_communities()
    
    print(f"Flow graph created with {len(flow_graph.get_nodes())} nodes and {flow_graph.get_edge_count()} edges")
    print(f"Detected {len(communities)} communities in the transaction network")
    
    return flow_graph

# Create mixer flow graph
mixer_flow_graph = create_mixer_flow_graph(mixer_transaction_data)

# Visualize the transaction flow graph
if mixer_flow_graph:
    # Export graph data for visualization
    graph_data = mixer_flow_graph.export_to_json()
    
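    # Save graph data (create both output directories so neither write fails)
    os.makedirs("../../data/output", exist_ok=True)
    os.makedirs("../../data/visualizations", exist_ok=True)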
    with open("../../data/output/mixer_flow_graph.json", "w") as f:
        json.dump(graph_data, f)
    
    # Visualize the graph using our utility
    viz_file = visualize_transaction_flow(
        mixer_flow_graph,
        highlight_nodes=all_mixer_addresses,
        output_file="../../data/visualizations/mixer_flow_graph.png"
    )
    
    # Display visualization
    if viz_file and os.path.exists(viz_file):
        from IPython.display import Image, display
        display(Image(filename=viz_file))

No transactions available for flow graph analysis
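
`TransactionFlowGraph` comes from the project's utilities, so its internals are not visible here. The sketch below uses `networkx` directly on a handful of made-up transfers to show the kind of structure the graph is meant to capture: a hub with several distinct senders and several distinct receivers. All addresses are placeholders, and storing a `transactions` list on each edge is an assumption based on how `get_edge_attributes` is used in this notebook.

```python
import networkx as nx

# Made-up transfers: three wallets pay into HUB, HUB pays out to three others
transfers = [
    ("wallet_a", "HUB", 1.0), ("wallet_b", "HUB", 1.0), ("wallet_c", "HUB", 1.0),
    ("HUB", "wallet_x", 1.0), ("HUB", "wallet_y", 1.0), ("HUB", "wallet_z", 1.0),
]

graph = nx.DiGraph()
for source, target, amount in transfers:
    if graph.has_edge(source, target):
        # Keep every transfer between the same pair on one edge
        graph[source][target]["transactions"].append(amount)
    else:
        graph.add_edge(source, target, transactions=[amount])

# Nodes that both receive from and send to at least three distinct counterparties
hubs = [
    node for node in graph.nodes
    if graph.in_degree(node) >= 3 and graph.out_degree(node) >= 3
]
print(hubs)  # ['HUB']
```

In a `DiGraph`, degree counts distinct neighbors rather than individual transfers, which is why repeated transfers are accumulated on the edge instead of added as parallel edges.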


## Identify Mixer Users

Now, let's identify addresses that have interacted with mixer services:

In [41]:
# Function to identify mixer users from transaction flow graph
def identify_mixer_users(flow_graph: TransactionFlowGraph) -> Dict:
    if not flow_graph:
        print("No flow graph available for mixer user analysis")
        return {"depositors": [], "withdrawers": [], "suspicious_users": []}
    
    print("Identifying addresses that have interacted with mixers...")
    
    # Get all known mixer addresses in the graph
    mixer_nodes = []
    for node in flow_graph.get_nodes():
        if flow_graph.get_node_attribute(node, "is_mixer"):
            mixer_nodes.append(node)
    
    print(f"Found {len(mixer_nodes)} mixer addresses in the graph")
    
    # Identify depositors (addresses that sent funds to mixers)
    depositors = {}
    for mixer_addr in mixer_nodes:
        in_neighbors = flow_graph.get_in_neighbors(mixer_addr)
        for neighbor in in_neighbors:
            # Get transaction details
            edge_data = flow_graph.get_edge_attributes(neighbor, mixer_addr)
            for tx in edge_data.get("transactions", []):
                # Create depositor entry if it doesn't exist
                if neighbor not in depositors:
                    depositors[neighbor] = {
                        "address": neighbor,
                        "mixers_used": {},
                        "total_deposit_amount": 0,
                        "transaction_count": 0
                    }
                
                # Add mixer to this depositor's list if not already there
                mixer_name = tx.get("mixer_name", "unknown_mixer")
                if mixer_name not in depositors[neighbor]["mixers_used"]:
                    depositors[neighbor]["mixers_used"][mixer_name] = 0
                
                # Update statistics
                amount = tx.get("amount", 0)
                depositors[neighbor]["mixers_used"][mixer_name] += amount
                depositors[neighbor]["total_deposit_amount"] += amount
                depositors[neighbor]["transaction_count"] += 1
    
    # Identify withdrawers (addresses that received funds from mixers)
    withdrawers = {}
    for mixer_addr in mixer_nodes:
        out_neighbors = flow_graph.get_out_neighbors(mixer_addr)
        for neighbor in out_neighbors:
            # Get transaction details
            edge_data = flow_graph.get_edge_attributes(mixer_addr, neighbor)
            for tx in edge_data.get("transactions", []):
                # Create withdrawer entry if it doesn't exist
                if neighbor not in withdrawers:
                    withdrawers[neighbor] = {
                        "address": neighbor,
                        "mixers_used": {},
                        "total_withdrawal_amount": 0,
                        "transaction_count": 0
                    }
                
                # Add mixer to this withdrawer's list if not already there
                mixer_name = tx.get("mixer_name", "unknown_mixer")
                if mixer_name not in withdrawers[neighbor]["mixers_used"]:
                    withdrawers[neighbor]["mixers_used"][mixer_name] = 0
                
                # Update statistics
                amount = tx.get("amount", 0)
                withdrawers[neighbor]["mixers_used"][mixer_name] += amount
                withdrawers[neighbor]["total_withdrawal_amount"] += amount
                withdrawers[neighbor]["transaction_count"] += 1
    
    # Identify particularly suspicious users (involved in multiple mixers or with high volumes)
    suspicious_users = []
    
    # Check depositors
    for addr, data in depositors.items():
        # Flag if using multiple mixers
        if len(data["mixers_used"]) > 1:
            suspicious_users.append({
                "address": addr,
                "reason": f"Deposited to {len(data['mixers_used'])} different mixers",
                "risk_score": min(85, 50 + 10 * len(data["mixers_used"])),  # Higher for more mixers
                "role": "depositor",
                "total_amount": data["total_deposit_amount"]
            })
        # Flag if high volume
        elif data["total_deposit_amount"] > 10000:  # $10k threshold
            suspicious_users.append({
                "address": addr,
                "reason": f"High-volume mixer deposits (${data['total_deposit_amount']:.2f})",
                "risk_score": min(90, 40 + int(data["total_deposit_amount"] / 1000)),
                "role": "depositor",
                "total_amount": data["total_deposit_amount"]
            })
    
    # Check withdrawers
    for addr, data in withdrawers.items():
        # Flag if using multiple mixers
        if len(data["mixers_used"]) > 1:
            suspicious_users.append({
                "address": addr,
                "reason": f"Withdrew from {len(data['mixers_used'])} different mixers",
                "risk_score": min(85, 50 + 10 * len(data["mixers_used"])),
                "role": "withdrawer",
                "total_amount": data["total_withdrawal_amount"]
            })
        # Flag if high volume
        elif data["total_withdrawal_amount"] > 10000:  # $10k threshold
            suspicious_users.append({
                "address": addr,
                "reason": f"High-volume mixer withdrawals (${data['total_withdrawal_amount']:.2f})",
                "risk_score": min(90, 40 + int(data["total_withdrawal_amount"] / 1000)),
                "role": "withdrawer",
                "total_amount": data["total_withdrawal_amount"]
            })
    
    # Sort suspicious users by risk score
    suspicious_users.sort(key=lambda x: x["risk_score"], reverse=True)
    
    print(f"Identified {len(depositors)} depositors and {len(withdrawers)} withdrawers")
    print(f"Flagged {len(suspicious_users)} particularly suspicious users")
    if suspicious_users:
        print("\nTop suspicious users:")
        for user in suspicious_users[:3]:  # Show top 3
            print(f"- {user['address']} | {user['reason']} | Risk: {user['risk_score']}")
    
    return {
        "depositors": list(depositors.values()),
        "withdrawers": list(withdrawers.values()),
        "suspicious_users": suspicious_users
    }

# Identify mixer users
mixer_users = identify_mixer_users(mixer_flow_graph)

No flow graph available for mixer user analysis
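
The two risk formulas in `identify_mixer_users` are easier to reason about in isolation. The sketch below restates them as standalone functions (the function names are illustrative) and evaluates a few inputs. The multi-mixer score reaches its cap of 85 at four mixers, and the volume score reaches its cap of 90 at a total of 50,000.

```python
def multi_mixer_risk(mixer_count: int) -> int:
    """Score for an address seen with more than one mixer (capped at 85)."""
    return min(85, 50 + 10 * mixer_count)


def volume_risk(total_amount: float) -> int:
    """Score for an address whose mixer volume exceeds the 10,000 threshold (capped at 90)."""
    return min(90, 40 + int(total_amount / 1000))


for n in (2, 3, 4):
    print(f"{n} mixers -> {multi_mixer_risk(n)}")
for total in (10_001, 25_000, 50_000):
    print(f"volume {total} -> {volume_risk(total)}")
```

Because the notebook chains the two checks with `elif`, an address that uses several mixers is scored only by the multi-mixer formula, even when its volume alone would score higher.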


## Detect Mixer-Like Behavior in Unknown Addresses

Let's look for addresses that behave like mixers even if they're not known mixers:

In [42]:
# Function to detect mixer-like behavior in non-mixer addresses
def detect_mixer_like_behavior(flow_graph: TransactionFlowGraph, known_mixers: Optional[List[str]] = None) -> List[Dict]:
    if not flow_graph:
        print("No flow graph available for mixer-like behavior detection")
        return []
    
    if known_mixers is None:
        known_mixers = all_mixer_addresses
        
    print("Detecting addresses with mixer-like behavior...")
    
    # Candidate addresses (exclude known mixers); a set keeps membership checks O(1)
    known_mixer_set = set(known_mixers)
    candidates = [node for node in flow_graph.get_nodes()
                  if node not in known_mixer_set and not flow_graph.get_node_attribute(node, "is_mixer")]
    
    # Initialize results
    mixer_like_addresses = []
    
    # For each candidate, check if it exhibits mixer-like behavior
    for candidate in candidates:
        # Calculate metrics
        in_degree = len(flow_graph.get_in_neighbors(candidate))
        out_degree = len(flow_graph.get_out_neighbors(candidate))
        
        # Skip if not enough connections
        if in_degree < 3 or out_degree < 3:
            continue
            
        # Get all incoming and outgoing transactions
        in_txs = []
        for neighbor in flow_graph.get_in_neighbors(candidate):
            edge_data = flow_graph.get_edge_attributes(neighbor, candidate)
            in_txs.extend(edge_data.get("transactions", []))
            
        out_txs = []
        for neighbor in flow_graph.get_out_neighbors(candidate):
            edge_data = flow_graph.get_edge_attributes(candidate, neighbor)
            out_txs.extend(edge_data.get("transactions", []))
        
        # Analyze transaction patterns
        mixer_score = 0
        reasons = []
        
        # Check for high connectivity: many distinct senders and recipients (typical of mixers)
        if in_degree > 10 and out_degree > 10:
            mixer_score += 30
            reasons.append(f"High connectivity: {in_degree} inputs, {out_degree} outputs")
        
        # Check for similar sized outputs (typical of mixers)
        if out_txs:
            out_amounts = [tx.get("amount", 0) for tx in out_txs]
            out_std = np.std(out_amounts)
            out_mean = np.mean(out_amounts)
            out_cv = out_std / out_mean if out_mean > 0 else 0
            
            if out_cv < 0.1 and len(out_txs) > 5:  # Very uniform outputs
                mixer_score += 40
                reasons.append(f"Highly uniform output amounts (CV: {out_cv:.4f})")
            elif out_cv < 0.25 and len(out_txs) > 5:  # Somewhat uniform
                mixer_score += 20
                reasons.append(f"Moderately uniform output amounts (CV: {out_cv:.4f})")
        
        # Check for timing patterns (typical of mixers)
        if out_txs:
            timestamps = sorted([tx.get("timestamp", 0) for tx in out_txs if tx.get("timestamp", 0) > 0])
            if len(timestamps) > 5:
                # Gaps between consecutive outgoing transactions
                time_diffs = np.diff(timestamps)
                mean_diff = np.mean(time_diffs)
                time_cv = np.std(time_diffs) / mean_diff if mean_diff > 0 else 0
                
                if time_cv < 0.2:  # Very regular timing
                    mixer_score += 30
                    reasons.append(f"Regular transaction timing (CV: {time_cv:.4f})")
        
        # Check for high turnover (inputs close to outputs)
        in_total = sum(tx.get("amount", 0) for tx in in_txs)
        out_total = sum(tx.get("amount", 0) for tx in out_txs)
        
        if in_total > 0 and out_total > 0:
            turnover_ratio = out_total / in_total
            if 0.9 < turnover_ratio < 1.1:  # Outflow within 10% of inflow
                mixer_score += 20
                reasons.append(f"Balanced input/output ratio: {turnover_ratio:.2f}")
        
        # If score is high enough, consider it mixer-like
        if mixer_score >= 50:
            risk_level = "high" if mixer_score >= 80 else "medium" if mixer_score >= 65 else "low"
            
            mixer_like_addresses.append({
                "address": candidate,
                "mixer_score": mixer_score,
                "risk_level": risk_level,
                "in_degree": in_degree,
                "out_degree": out_degree,
                "in_amount": in_total,
                "out_amount": out_total,
                "reasons": reasons
            })
    
    # Sort by mixer score (highest first)
    mixer_like_addresses.sort(key=lambda x: x["mixer_score"], reverse=True)
    
    print(f"Identified {len(mixer_like_addresses)} addresses with mixer-like behavior")
    if mixer_like_addresses:
        print("\nTop suspected mixers:")
        for addr in mixer_like_addresses[:3]:  # Show top 3
            print(f"- {addr['address']} | Score: {addr['mixer_score']} | Risk: {addr['risk_level']}")
            for reason in addr["reasons"]:
                print(f"  * {reason}")
    
    return mixer_like_addresses

# Detect mixer-like behavior
suspected_mixers = detect_mixer_like_behavior(mixer_flow_graph)

No flow graph available for mixer-like behavior detection
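
To see how the uniformity thresholds behave, the following sketch applies the same coefficient-of-variation cutoffs to two synthetic output sets. The helper names (`coefficient_of_variation`, `uniformity_score`) and all sample values are illustrative assumptions, not on-chain data or functions from the project.

```python
import numpy as np


def coefficient_of_variation(values) -> float:
    """Return std / mean, or 0.0 when the input is empty or the mean is not positive."""
    arr = np.asarray(values, dtype=float)
    if arr.size == 0:
        return 0.0
    mean = arr.mean()
    return float(arr.std() / mean) if mean > 0 else 0.0


def uniformity_score(amounts, timestamps) -> int:
    """Score output amounts and timing with the cutoffs used in the cell above."""
    score = 0
    # Amount uniformity: needs more than 5 outputs
    if len(amounts) > 5:
        amount_cv = coefficient_of_variation(amounts)
        if amount_cv < 0.1:
            score += 40
        elif amount_cv < 0.25:
            score += 20
    # Timing regularity: needs more than 5 valid (positive) timestamps
    valid_ts = sorted(t for t in timestamps if t > 0)
    if len(valid_ts) > 5:
        gaps = np.diff(valid_ts)
        if coefficient_of_variation(gaps) < 0.2:
            score += 30
    return score


# Synthetic, mixer-like outputs: near-identical amounts at roughly 60-second intervals
mixer_amounts = [1.0, 1.0, 1.01, 0.99, 1.0, 1.0]
mixer_times = [1000, 1060, 1120, 1181, 1240, 1300]

# Synthetic, ordinary outputs: varied amounts at irregular intervals
ordinary_amounts = [0.5, 12.0, 3.2, 40.0, 0.1, 7.5]
ordinary_times = [1000, 1010, 5000, 5200, 90000, 90500]

print("mixer-like:", uniformity_score(mixer_amounts, mixer_times))
print("ordinary:", uniformity_score(ordinary_amounts, ordinary_times))
```

Under these assumptions the mixer-like set earns both the amount and timing points, which already clears the 50-point flagging threshold, while the ordinary set earns neither.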


## Generate Conclusions and Report

Let's create a comprehensive report of our findings:

In [43]:
# Generate mixer analysis report
def generate_mixer_report() -> str:
    report = ["# Cryptocurrency Mixer Analysis Report\n"]
    report.append(f"Generated on: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
    
    # Known mixers analysis
    report.append("## Known Mixer Services\n")
    for mixer_name, mixer_info in KNOWN_MIXERS.items():
        report.append(f"### {mixer_name.replace('_', ' ').title()}\n")
        report.append(f"Risk level: **{mixer_info['risk_level'].upper()}**\n")
        report.append("**Addresses:**")
        for addr in mixer_info["addresses"]:
            report.append(f"- `{addr}`")
        
        report.append("\n**Typical Transaction Patterns:**")
        for pattern in mixer_info["transaction_patterns"]:
            report.append(f"- {pattern.replace('_', ' ').title()}")
        report.append("")
    
    # Transaction patterns and entropy analysis
    entropy_scores = entropy_analysis.get("entropy_scores", [])
    anomalies = entropy_analysis.get("anomalies", [])
    
    if entropy_scores or anomalies:
        report.append("## Transaction Pattern Analysis\n")
        
        if entropy_scores:
            report.append("### Entropy Analysis\n")
            report.append("Higher entropy indicates more randomness and potential obfuscation:\n")
            
            report.append("| Feature | Entropy | Normalized Entropy |")
            report.append("| ------- | ------- | ------------------ |")
            for score in entropy_scores:
                report.append(f"| {score['feature']} | {score['entropy']:.4f} | {score['normalized_entropy']:.4f} |")
            report.append("")
        
        if anomalies:
            report.append("### Detected Anomalies\n")
            for anomaly in anomalies:
                report.append(f"- **{anomaly['type']}**: {anomaly['description']}")
            report.append("")
    
    # Suspicious users analysis
    suspicious_users = mixer_users.get("suspicious_users", [])
    if suspicious_users:
        report.append("## Suspicious Address Analysis\n")
        report.append("### High-Risk Mixer Users\n")
        report.append("| Address | Role | Risk Score | Reason | Amount |")
        report.append("| ------- | ---- | ---------- | ------ | ------ |")
        for user in suspicious_users:
            report.append(f"| `{user['address']}` | {user['role'].title()} | {user['risk_score']} | {user['reason']} | ${user['total_amount']:.2f} |")
        report.append("")
    
    # Suspected mixers analysis
    if suspected_mixers:
        report.append("## Suspected Mixer Services\n")
        report.append("These addresses exhibit behavior consistent with cryptocurrency mixers:\n")
        
        report.append("| Address | Score | Risk Level | In/Out Connections | In/Out Volume |")
        report.append("| ------- | ----- | ---------- | ------------------ | ------------- |")
        for mixer in suspected_mixers:
            report.append(f"| `{mixer['address']}` | {mixer['mixer_score']} | {mixer['risk_level'].upper()} | {mixer['in_degree']}/{mixer['out_degree']} | ${mixer['in_amount']:.2f}/${mixer['out_amount']:.2f} |")
        
        report.append("\n### Detection Reasons\n")
        for mixer in suspected_mixers[:5]:  # Show top 5
            report.append(f"**{mixer['address']}**:")
            for reason in mixer["reasons"]:
                report.append(f"- {reason}")
            report.append("")
    
    # Graph visualization
    report.append("## Transaction Flow Visualization\n")
    report.append("The following graph shows the flow of funds through mixer services:\n")
    report.append("![Transaction Flow](../../data/visualizations/mixer_flow_graph.png)")
    
    # Mitigation recommendations
    report.append("\n## Mixer Detection Recommendations\n")
    report.append("1. **Monitor high-risk addresses** - Track transactions from addresses identified as high-risk mixer users")
    report.append("2. **Implement transaction entropy analysis** - Deploy real-time entropy analysis to detect mixer-like patterns")
    report.append("3. **Graph analytics** - Use graph algorithms to identify suspicious fund flows through mixers")
    report.append("4. **Regular updates to mixer database** - Keep the list of known mixer addresses current")
    report.append("5. **Temporal pattern analysis** - Look for regular time intervals in transaction patterns")
    report.append("6. **Amount uniformity checks** - Flag transactions with suspiciously uniform amounts")
    
    return "\n".join(report)

# Generate and save the report
report_content = generate_mixer_report()

# Save report to file
os.makedirs("../../reports", exist_ok=True)
report_path = f"../../reports/mixer_analysis_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.md"

with open(report_path, "w", encoding="utf-8") as f:
    f.write(report_content)

print(f"\nReport generated and saved to {report_path}")

# Display report in notebook
from IPython.display import Markdown, display
display(Markdown(report_content))


Report generated and saved to ../../reports/mixer_analysis_report_20250424_223407.md


# Cryptocurrency Mixer Analysis Report

Generated on: 2025-04-24 22:34:07

## Known Mixer Services

### Tornado Cash Solana

Risk level: **HIGH**

**Addresses:**
- `TCnifB7JjcmXP5F7hJ9wQDEq47Qympz4JbRRUBtcEJ8`
- `TCxZde8bp2sp5s7fK8K4nnzJWXpMjHmYLAXPYvgdz4Z`

**Typical Transaction Patterns:**
- Equal Amounts
- Fixed Intervals
- Privacy Set

### Elusiv

Risk level: **MEDIUM**

**Addresses:**
- `E1w8SZpBPkRBdBmEJUwpRZx1SbQVVhNqkuH95uKJvypH`
- `2EgZ5LuMqyVKQYS4AFhJRpZNr1rLxuU4UJuNJbcKFubu`

**Typical Transaction Patterns:**
- Stealth Addresses
- Ring Signature
- Decoy Outputs

### Cyclos Mixer

Risk level: **MEDIUM**

**Addresses:**
- `CYcLEsDHNZn8mVimVJLMFeYRGzdPx9QmxUL5kgKSTsdq`
- `CYCSaAMM4tJLXeXQKVAZrkLUKu8gYXhRNfYJG5qkKPQt`

**Typical Transaction Patterns:**
- Pool Deposits
- Time Locks
- Uniform Withdrawals

## Transaction Flow Visualization

The following graph shows the flow of funds through mixer services:

![Transaction Flow](../../data/visualizations/mixer_flow_graph.png)

## Mixer Detection Recommendations

1. **Monitor high-risk addresses** - Track transactions from addresses identified as high-risk mixer users
2. **Implement transaction entropy analysis** - Deploy real-time entropy analysis to detect mixer-like patterns
3. **Graph analytics** - Use graph algorithms to identify suspicious fund flows through mixers
4. **Regular updates to mixer database** - Keep the list of known mixer addresses current
5. **Temporal pattern analysis** - Look for regular time intervals in transaction patterns
6. **Amount uniformity checks** - Flag transactions with suspiciously uniform amounts