# 🔄 Base Chain Arbitrage Pools Analysis

This notebook analyzes the results from the Base chain pool screener to identify arbitrage opportunities.

**Data Source**: MongoDB `quants_lab` database, `pools` collection  
**Network**: Base  
**Purpose**: Analyze pool liquidity, trading volume, and identify arbitrage potential


## 📦 Environment Setup


In [1]:
# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

# Standard libraries
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Visualization
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

# Database connection
from core.database_manager import db_manager

print("✅ Environment setup complete")


✅ Environment setup complete


## 🗄️ Load Data from MongoDB

Fetch the latest Base chain pool screening results from the `pools` collection


In [2]:
# Connect to MongoDB
import os
from pymongo import MongoClient

MONGO_URI = os.getenv('MONGO_URI', 'mongodb://admin:admin@localhost:27017/quants_lab?authSource=admin')
client = MongoClient(MONGO_URI)
db = client['quants_lab']
pools_collection = db['pools']

print("🔗 Connected to MongoDB")

# Get the latest 10 results (to get data from multiple runs)
latest_results = list(pools_collection.find({}).sort("timestamp", -1).limit(10))

print(f"📥 Found {len(latest_results)} recent screening results")

# Collect all pools from filtered results
all_pools = []
for result in latest_results:
    # Get filtered pools from both trending and new pools
    filtered_trending = result.get('filtered_trending_pools', [])
    filtered_new = result.get('filtered_new_pools', [])
    
    # Add timestamp to each pool for tracking
    for pool in filtered_trending:
        pool['source'] = 'trending'
        pool['screened_at'] = result.get('timestamp')
    
    for pool in filtered_new:
        pool['source'] = 'new'
        pool['screened_at'] = result.get('timestamp')
    
    all_pools.extend(filtered_trending)
    all_pools.extend(filtered_new)

print(f"✅ Loaded {len(all_pools)} pools in total")

# Remove duplicates based on pool address
if all_pools:
    # Create a dict to track unique pools (keep the most recent)
    unique_pools = {}
    for pool in all_pools:
        address = pool.get('address', '')
        if address:
            if address not in unique_pools:
                unique_pools[address] = pool
    
    all_pools = list(unique_pools.values())
    print(f"📊 After removing duplicates: {len(all_pools)} unique pools")
else:
    print("⚠️ No pools found in the database")
    print("\n💡 Possible reasons:")
    print("   1. Pool screener hasn't run yet")
    print("   2. All pools were filtered out by screening criteria")
    print("   3. MongoDB connection issue")
    print("\n🔧 To fix:")
    print("   1. Run: export MONGO_URI='mongodb://admin:admin@localhost:27017/quants_lab?authSource=admin'")
    print("   2. Run: python cli.py run-tasks --config config/base_pools_production.yml")
    print("   3. Wait for the task to complete (check logs)")
    print("   4. Re-run this notebook")


🔗 Connected to MongoDB
📥 Found 10 recent screening results
✅ Loaded 11 pools in total
📊 After removing duplicates: 10 unique pools
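The loop above keeps the first record seen for each address. Because the results are sorted newest first, that record comes from the most recent screening. A minimal pandas sketch of the same flattening and dedup follows. It assumes the result documents carry the `timestamp`, `filtered_trending_pools` and `filtered_new_pools` fields read above. `latest_unique_pools` is a hypothetical helper, not part of the screener. Unlike the loop, it copies each pool instead of mutating the fetched documents.

```python
import pandas as pd


def latest_unique_pools(results: list[dict]) -> pd.DataFrame:
    """Flatten screener results and keep the most recent row per pool address.

    Assumes each result has 'timestamp', 'filtered_trending_pools' and
    'filtered_new_pools' keys, as used in the loading cell.
    """
    rows = []
    for result in results:
        for source, key in (('trending', 'filtered_trending_pools'),
                            ('new', 'filtered_new_pools')):
            for pool in result.get(key, []):
                # Copy the pool dict so the fetched documents are not mutated
                rows.append({**pool, 'source': source,
                             'screened_at': result.get('timestamp')})
    if not rows:
        return pd.DataFrame()
    df = pd.DataFrame(rows)
    if 'address' not in df.columns:
        return pd.DataFrame()
    # Drop pools without an address, as the original loop does
    df = df[df['address'].fillna('') != '']
    # Stable sort newest first, then keep the first row for each address
    return (df.sort_values('screened_at', ascending=False, kind='stable')
              .drop_duplicates(subset='address', keep='first')
              .reset_index(drop=True))
```

Usage in this notebook would be `pools_df = latest_unique_pools(latest_results)`.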


## 📊 Data Processing

Convert pool data to DataFrame and prepare for analysis


In [3]:
if all_pools:
    # Convert to DataFrame
    pools_df = pd.DataFrame(all_pools)
    
    # Convert numeric columns
    numeric_columns = [
        'fdv_usd', 'market_cap_usd', 'volume_usd_h24', 'reserve_in_usd',
        'transactions_h24_buys', 'transactions_h24_sells',
        'price_change_percentage_h1', 'price_change_percentage_h24'
    ]
    
    for col in numeric_columns:
        if col in pools_df.columns:
            pools_df[col] = pd.to_numeric(pools_df[col], errors='coerce')
    
    # Parse datetime as UTC so the subtraction below never mixes tz-naive and tz-aware values
    if 'pool_created_at' in pools_df.columns:
        pools_df['pool_created_at'] = pd.to_datetime(pools_df['pool_created_at'], errors='coerce', utc=True)
        pools_df['pool_age_days'] = (pd.Timestamp.now(tz='UTC') - pools_df['pool_created_at']).dt.total_seconds() / 86400
    
    print("✅ DataFrame created successfully")
    print(f"📊 Shape: {pools_df.shape[0]} rows × {pools_df.shape[1]} columns")
    print(f"\n📋 Available columns:")
    for i, col in enumerate(pools_df.columns, 1):
        print(f"   {i:2d}. {col}")
else:
    pools_df = pd.DataFrame()
    print("❌ No data to process")


✅ DataFrame created successfully
📊 Shape: 10 rows × 31 columns

📋 Available columns:
    1. id
    2. type
    3. name
    4. base_token_price_usd
    5. base_token_price_native_currency
    6. quote_token_price_usd
    7. quote_token_price_native_currency
    8. address
    9. reserve_in_usd
   10. pool_created_at
   11. fdv_usd
   12. market_cap_usd
   13. price_change_percentage_h1
   14. price_change_percentage_h24
   15. transactions_h1_buys
   16. transactions_h1_sells
   17. transactions_h24_buys
   18. transactions_h24_sells
   19. volume_usd_h24
   20. dex_id
   21. base_token_id
   22. quote_token_id
   23. network_id
   24. base
   25. quote
   26. volume_liquidity_ratio
   27. fdv_liquidity_ratio
   28. fdv_volume_ratio
   29. source
   30. screened_at
   31. pool_age_days


## 🔍 Data Overview & Basic Statistics


In [None]:
if not pools_df.empty:
    print("📋 BASE CHAIN POOL SCREENER RESULTS")
    print("=" * 100)
    
    # Source distribution (trending vs new)
    if 'source' in pools_df.columns:
        print("\n🎯 Pools by Source:")
        source_counts = pools_df['source'].value_counts()
        for source, count in source_counts.items():
            print(f"   • {source:35s}: {count:3d} pools")
    
    # Volume statistics
    if 'volume_usd_h24' in pools_df.columns:
        total_volume = pools_df['volume_usd_h24'].sum()
        avg_volume = pools_df['volume_usd_h24'].mean()
        median_volume = pools_df['volume_usd_h24'].median()
        print(f"\n💰 24h Volume Statistics:")
        print(f"   • Total Volume:   ${total_volume:>15,.0f}")
        print(f"   • Average/Pool:   ${avg_volume:>15,.0f}")
        print(f"   • Median/Pool:    ${median_volume:>15,.0f}")
        print(f"   • Min/Pool:       ${pools_df['volume_usd_h24'].min():>15,.0f}")
        print(f"   • Max/Pool:       ${pools_df['volume_usd_h24'].max():>15,.0f}")
    
    # Liquidity statistics  
    if 'reserve_in_usd' in pools_df.columns:
        total_liquidity = pools_df['reserve_in_usd'].sum()
        avg_liquidity = pools_df['reserve_in_usd'].mean()
        median_liquidity = pools_df['reserve_in_usd'].median()
        print(f"\n💎 Liquidity Statistics:")
        print(f"   • Total Liquidity: ${total_liquidity:>15,.0f}")
        print(f"   • Average/Pool:    ${avg_liquidity:>15,.0f}")
        print(f"   • Median/Pool:     ${median_liquidity:>15,.0f}")
        print(f"   • Min/Pool:        ${pools_df['reserve_in_usd'].min():>15,.0f}")
        print(f"   • Max/Pool:        ${pools_df['reserve_in_usd'].max():>15,.0f}")
    
    # FDV statistics
    if 'fdv_usd' in pools_df.columns:
        avg_fdv = pools_df['fdv_usd'].mean()
        median_fdv = pools_df['fdv_usd'].median()
        print(f"\n📈 FDV (Fully Diluted Valuation) Statistics:")
        print(f"   • Average FDV:  ${avg_fdv:>15,.0f}")
        print(f"   • Median FDV:   ${median_fdv:>15,.0f}")
        print(f"   • Min FDV:      ${pools_df['fdv_usd'].min():>15,.0f}")
        print(f"   • Max FDV:      ${pools_df['fdv_usd'].max():>15,.0f}")
    
    # Pool age statistics
    if 'pool_age_days' in pools_df.columns:
        avg_age = pools_df['pool_age_days'].mean()
        median_age = pools_df['pool_age_days'].median()
        print(f"\n⏰ Pool Age Statistics:")
        print(f"   • Average Age: {avg_age:>6.1f} days")
        print(f"   • Median Age:  {median_age:>6.1f} days")
        print(f"   • Newest:      {pools_df['pool_age_days'].min():>6.1f} days")
        print(f"   • Oldest:      {pools_df['pool_age_days'].max():>6.1f} days")
    
    print("\n" + "=" * 100)
    
    # Display sample data
    print("\n📋 Sample Pool Data (first 5):")
    display_cols = ['name', 'source', 'volume_usd_h24', 'reserve_in_usd', 'fdv_usd']
    available_cols = [col for col in display_cols if col in pools_df.columns]
    display(pools_df[available_cols].head())
else:
    print("❌ No pools data to display")
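Each statistics block above repeats the same sum/mean/median/min/max pattern for one column. The sketch below uses pandas `DataFrame.agg` to produce the same numbers as a single table. `summary_table` is a hypothetical helper name.

```python
import pandas as pd


def summary_table(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return sum/mean/median/min/max for each listed column present in df.

    Missing columns are skipped, mirroring the `if col in pools_df.columns` guards.
    """
    present = [c for c in columns if c in df.columns]
    # agg with a list of function names gives one row per statistic; transpose to one row per column
    return df[present].agg(['sum', 'mean', 'median', 'min', 'max']).T
```

For example, `display(summary_table(pools_df, ['volume_usd_h24', 'reserve_in_usd', 'fdv_usd', 'pool_age_days']))`. Only the volume and liquidity totals are meaningful, so ignore the `sum` column for FDV and pool age.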


## 📈 Calculate Arbitrage Indicators

Calculate key metrics for identifying arbitrage opportunities


In [5]:
if not pools_df.empty:
    print("🔢 Calculating Arbitrage Indicators...")
    print("=" * 80)
    
    # 1. Volume/Liquidity Ratio (higher = more active trading, better for arbitrage)
    if 'volume_usd_h24' in pools_df.columns and 'reserve_in_usd' in pools_df.columns:
        pools_df['volume_liquidity_ratio'] = (
            pools_df['volume_usd_h24'] / (pools_df['reserve_in_usd'] + 1)
        ).fillna(0)
        print("✅ Volume/Liquidity Ratio calculated")
    
    # 2. FDV/Liquidity Ratio (lower = deeper liquidity relative to fully diluted valuation)
    if 'fdv_usd' in pools_df.columns and 'reserve_in_usd' in pools_df.columns:
        pools_df['fdv_liquidity_ratio'] = (
            pools_df['fdv_usd'] / (pools_df['reserve_in_usd'] + 1)
        ).fillna(0)
        print("✅ FDV/Liquidity Ratio calculated")
    
    # 3. Total Transactions
    if 'transactions_h24_buys' in pools_df.columns and 'transactions_h24_sells' in pools_df.columns:
        pools_df['total_transactions'] = (
            pools_df['transactions_h24_buys'] + pools_df['transactions_h24_sells']
        ).fillna(0)
        print("✅ Total Transactions calculated")
        
        # 4. Buy/Sell Pressure Ratio
        pools_df['buy_sell_ratio'] = (
            pools_df['transactions_h24_buys'] / (pools_df['transactions_h24_sells'] + 1)
        ).fillna(1.0)
        print("✅ Buy/Sell Pressure Ratio calculated")
    
    # 5. Liquidity Per Transaction (higher = less slippage per trade)
    if 'reserve_in_usd' in pools_df.columns and 'total_transactions' in pools_df.columns:
        pools_df['liquidity_per_transaction'] = (
            pools_df['reserve_in_usd'] / (pools_df['total_transactions'] + 1)
        ).fillna(0)
        print("✅ Liquidity Per Transaction calculated")
    
    # 6. Composite Arbitrage Score
    if all(col in pools_df.columns for col in ['volume_liquidity_ratio', 'total_transactions']):
        # Normalize individual metrics to 0-1 scale
        pools_df['vol_liq_norm'] = (
            (pools_df['volume_liquidity_ratio'] - pools_df['volume_liquidity_ratio'].min()) / 
            (pools_df['volume_liquidity_ratio'].max() - pools_df['volume_liquidity_ratio'].min() + 0.0001)
        ).fillna(0)
        
        pools_df['transactions_norm'] = (
            (pools_df['total_transactions'] - pools_df['total_transactions'].min()) / 
            (pools_df['total_transactions'].max() - pools_df['total_transactions'].min() + 0.0001)
        ).fillna(0)
        
        # Balanced buy/sell = better for stable arbitrage
        pools_df['balance_score'] = 1 - abs(pools_df['buy_sell_ratio'] - 1.0).clip(0, 1)
        
        # Composite score: weighted combination
        pools_df['arbitrage_score'] = (
            pools_df['vol_liq_norm'] * 0.35 +           # 35% volume/liquidity activity
            pools_df['transactions_norm'] * 0.35 +      # 35% transaction volume
            pools_df['balance_score'] * 0.30            # 30% buy/sell balance
        )
        print("✅ Composite Arbitrage Score calculated")
    
    print("\n✅ All arbitrage indicators calculated successfully!")
    print("=" * 80)


🔢 Calculating Arbitrage Indicators...
✅ Volume/Liquidity Ratio calculated
✅ FDV/Liquidity Ratio calculated
✅ Total Transactions calculated
✅ Buy/Sell Pressure Ratio calculated
✅ Liquidity Per Transaction calculated
✅ Composite Arbitrage Score calculated

✅ All arbitrage indicators calculated successfully!
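To make the weighting concrete, here is a simplified, self-contained recomputation of the composite score on two made-up pools. It uses the same formulas and weights as the cell above. `min_max` and `arbitrage_score` are illustrative helper names. Both activity terms are min-max normalized across the current batch, so a pool's score is relative: adding or removing pools can change every other pool's score. The `+ 1` in the denominators also pulls small pools' ratios down. For example, 10 buys and 10 sells give a buy/sell ratio of about 0.91, not 1.0.

```python
import pandas as pd


def min_max(s: pd.Series, eps: float = 1e-4) -> pd.Series:
    # Same normalization as the notebook: (x - min) / (max - min + eps), NaN -> 0
    return ((s - s.min()) / (s.max() - s.min() + eps)).fillna(0)


def arbitrage_score(df: pd.DataFrame) -> pd.Series:
    """Recompute the notebook's composite score from the raw 24h fields."""
    vol_liq = df['volume_usd_h24'] / (df['reserve_in_usd'] + 1)
    txns = df['transactions_h24_buys'] + df['transactions_h24_sells']
    buy_sell = df['transactions_h24_buys'] / (df['transactions_h24_sells'] + 1)
    # 1.0 when buys and sells match, falling to 0 once the ratio is 1 away from balance
    balance = 1 - (buy_sell - 1.0).abs().clip(0, 1)
    return 0.35 * min_max(vol_liq) + 0.35 * min_max(txns) + 0.30 * balance


pools = pd.DataFrame({
    'volume_usd_h24': [999.0, 0.0],
    'reserve_in_usd': [999.0, 999.0],
    'transactions_h24_buys': [99, 0],
    'transactions_h24_sells': [99, 0],
})
print(arbitrage_score(pools).round(3).tolist())
```

The first pool takes the maximum of both normalized terms and is almost balanced, so it scores close to 1. The second pool has no trades and scores exactly 0.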


In [None]:
if not pools_df.empty and 'arbitrage_score' in pools_df.columns:
    # Sort by arbitrage score
    top_opportunities = pools_df.sort_values('arbitrage_score', ascending=False).head(20)
    
    print("🏆 TOP 20 ARBITRAGE OPPORTUNITIES")
    print("=" * 120)
    print(f"\n{'Rank':<5} {'Pool Name':<25} {'Score':<7} {'Vol/Liq':<8} {'Volume 24h':<15} {'Liquidity':<15} {'Txns':<6} {'Source':<20}")
    print("-" * 120)
    
    for i, (idx, row) in enumerate(top_opportunities.iterrows(), 1):
        name = str(row.get('name', 'N/A'))[:24]
        score = row.get('arbitrage_score', 0)
        vol_liq = row.get('volume_liquidity_ratio', 0)
        volume = row.get('volume_usd_h24', 0)
        liquidity = row.get('reserve_in_usd', 0)
        txns = int(row.get('total_transactions', 0))
        strategy = str(row.get('source', 'N/A'))[:19]
        
        print(f"{i:<5} {name:<25} {score:>6.3f} {vol_liq:>7.2f} ${volume:>13,.0f} ${liquidity:>13,.0f} {txns:>5} {strategy:<20}")
    
    print("=" * 120)
    
    # Key insights
    print(f"\n📊 Key Insights:")
    print(f"   • Best opportunity: {top_opportunities.iloc[0]['name']}")
    print(f"   • Highest arbitrage score: {top_opportunities.iloc[0]['arbitrage_score']:.3f}")
    print(f"   • Average score (top 20): {top_opportunities['arbitrage_score'].mean():.3f}")
    
    if 'total_transactions' in top_opportunities.columns:
        high_activity = len(top_opportunities[top_opportunities['total_transactions'] > 1000])
        print(f"   • High-activity pools (>1000 txns): {high_activity}")
    
    # Display detailed table
    print(f"\n📋 Detailed Top 20 Opportunities:")
    display_cols = ['name', 'arbitrage_score', 'volume_liquidity_ratio', 'volume_usd_h24', 
                    'reserve_in_usd', 'total_transactions', 'buy_sell_ratio', 'source']
    available_cols = [col for col in display_cols if col in top_opportunities.columns]
    display(top_opportunities[available_cols].reset_index(drop=True))


## 📊 Visualization: Arbitrage Score Distribution


In [7]:
if not pools_df.empty and 'arbitrage_score' in pools_df.columns:
    # Create histogram
    fig = px.histogram(
        pools_df, 
        x='arbitrage_score',
        nbins=30,
        title='Distribution of Arbitrage Scores Across All Pools',
        labels={'arbitrage_score': 'Arbitrage Score', 'count': 'Number of Pools'},
        color_discrete_sequence=['#1f77b4']
    )
    
    fig.add_vline(
        x=pools_df['arbitrage_score'].mean(), 
        line_dash="dash", 
        line_color="red",
        annotation_text=f"Mean: {pools_df['arbitrage_score'].mean():.3f}",
        annotation_position="top right"
    )
    
    fig.update_layout(
        xaxis_title='Arbitrage Score',
        yaxis_title='Number of Pools',
        showlegend=False,
        height=500
    )
    
    fig.show()


## 📊 Visualization: Volume vs Liquidity Scatter


In [8]:
if not pools_df.empty and all(col in pools_df.columns for col in ['volume_usd_h24', 'reserve_in_usd', 'arbitrage_score']):
    # Create scatter plot
    fig = px.scatter(
        pools_df,
        x='reserve_in_usd',
        y='volume_usd_h24',
        size='total_transactions' if 'total_transactions' in pools_df.columns else None,
        color='arbitrage_score',
        hover_data=['name', 'volume_liquidity_ratio', 'source', 'dex_id'],
        title='Volume vs Liquidity (colored by Arbitrage Score, sized by Transactions)',
        labels={
            'reserve_in_usd': 'Pool Liquidity (USD)',
            'volume_usd_h24': '24h Trading Volume (USD)',
            'arbitrage_score': 'Arb Score'
        },
        color_continuous_scale='Viridis'
    )
    
    fig.update_layout(
        height=600,
        xaxis_type='log',
        yaxis_type='log'
    )
    
    fig.show()


## 📊 Visualization: Source Comparison (Trending vs New)


In [None]:
if not pools_df.empty and 'source' in pools_df.columns and 'arbitrage_score' in pools_df.columns:
    # Calculate average metrics by source (trending vs new)
    strategy_stats = pools_df.groupby('source').agg({
        'arbitrage_score': 'mean',
        'volume_liquidity_ratio': 'mean',
        'total_transactions': 'mean',
        'volume_usd_h24': 'mean',
        'reserve_in_usd': 'mean'
    }).round(2)
    
    # Create bar chart for arbitrage scores by source
    fig = px.bar(
        strategy_stats.reset_index(),
        x='source',
        y='arbitrage_score',
        title='Average Arbitrage Score by Source (Trending/New)',
        labels={'source': 'Source', 'arbitrage_score': 'Avg Arbitrage Score'},
        color='arbitrage_score',
        color_continuous_scale='Blues'
    )
    
    fig.update_layout(
        xaxis_tickangle=-45,
        height=500
    )
    
    fig.show()
    
    # Display detailed stats table
    print("\n📊 Source Performance Comparison (Trending/New):")
    print("=" * 100)
    display(strategy_stats)


## 📊 Visualization: Buy/Sell Pressure Analysis


In [10]:
if not pools_df.empty and 'buy_sell_ratio' in pools_df.columns:
    # Categorize pools by buy/sell pressure (include_lowest keeps ratio == 0, i.e. pools with no buys)
    pools_df['pressure_category'] = pd.cut(
        pools_df['buy_sell_ratio'],
        bins=[0, 0.8, 1.2, float('inf')],
        labels=['Sell Pressure', 'Balanced', 'Buy Pressure'],
        include_lowest=True
    )
    
    # Count by category
    pressure_counts = pools_df['pressure_category'].value_counts()
    
    # Create pie chart
    fig = px.pie(
        values=pressure_counts.values,
        names=pressure_counts.index,
        title='Market Pressure Distribution',
        color_discrete_sequence=['#ff7f0e', '#2ca02c', '#1f77b4']
    )
    
    fig.update_traces(textposition='inside', textinfo='percent+label')
    fig.update_layout(height=500)
    
    fig.show()
    
    # Print detailed analysis
    print("\n📊 Pressure Analysis by Category:")
    print("=" * 80)
    print(f"   • Sell Pressure (ratio < 0.8):  {pressure_counts.get('Sell Pressure', 0):3d} pools ({pressure_counts.get('Sell Pressure', 0)/len(pools_df)*100:5.1f}%)")
    print(f"   • Balanced (0.8-1.2):           {pressure_counts.get('Balanced', 0):3d} pools ({pressure_counts.get('Balanced', 0)/len(pools_df)*100:5.1f}%)")
    print(f"   • Buy Pressure (ratio > 1.2):   {pressure_counts.get('Buy Pressure', 0):3d} pools ({pressure_counts.get('Buy Pressure', 0)/len(pools_df)*100:5.1f}%)")
    print("\n💡 Balanced pools (0.8-1.2 ratio) are often best for stable arbitrage")



📊 Pressure Analysis by Category:
   • Sell Pressure (ratio < 0.8):    1 pools ( 10.0%)
   • Balanced (0.8-1.2):             8 pools ( 80.0%)
   • Buy Pressure (ratio > 1.2):     1 pools ( 10.0%)

💡 Balanced pools (0.8-1.2 ratio) are often best for stable arbitrage
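`pd.cut` intervals are closed on the right by default. With these bins, a ratio of exactly 0 (a pool with no buys in 24h) falls outside every bin and becomes NaN unless `include_lowest=True` is set. Such a pool then silently disappears from `value_counts()` and the pie chart. The short check below uses made-up ratios to show where the boundary values land.

```python
import pandas as pd

ratios = pd.Series([0.0, 0.5, 0.8, 1.0, 1.2, 3.0])  # made-up buy/sell ratios
bins = [0, 0.8, 1.2, float('inf')]
labels = ['Sell Pressure', 'Balanced', 'Buy Pressure']

# Default right-closed intervals: (0, 0.8], (0.8, 1.2], (1.2, inf); 0.0 matches none of them
default_cut = pd.cut(ratios, bins=bins, labels=labels)
# include_lowest=True also closes the first interval on the left, so 0.0 becomes Sell Pressure
inclusive_cut = pd.cut(ratios, bins=bins, labels=labels, include_lowest=True)

print(default_cut.isna().sum())  # number of ratios that fall outside every bin
print(inclusive_cut.tolist())
```

A ratio of exactly 0.8 is labeled Sell Pressure here. The Approach 3 filter further down uses `>= 0.8` and treats the same pool as balanced, so the two views can disagree at that edge.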


## 📈 Top Pools by Volume/Liquidity Ratio


In [None]:
if not pools_df.empty and 'volume_liquidity_ratio' in pools_df.columns:
    # Get top 15 by volume/liquidity ratio
    top_vol_liq = pools_df.nlargest(15, 'volume_liquidity_ratio')
    
    # Create bar chart
    fig = px.bar(
        top_vol_liq,
        x='name',
        y='volume_liquidity_ratio',
        title='Top 15 Pools by Volume/Liquidity Ratio (Most Active)',
        labels={'name': 'Pool Name', 'volume_liquidity_ratio': 'Vol/Liq Ratio'},
        color='volume_liquidity_ratio',
        color_continuous_scale='Blues',
        hover_data=['source', 'dex_id', 'volume_usd_h24', 'reserve_in_usd']
    )
    
    fig.update_layout(
        xaxis_tickangle=-45,
        height=600,
        showlegend=False
    )
    
    fig.show()
    
    # Print high activity pools
    print("\n🔥 High Activity Pools (Vol/Liq Ratio > 3.0):")
    print("=" * 100)
    high_activity = pools_df[pools_df['volume_liquidity_ratio'] > 3.0].sort_values('volume_liquidity_ratio', ascending=False)
    
    if not high_activity.empty:
        print(f"\n{'Pool Name':<30} {'Ratio':<8} {'Volume 24h':<15} {'Liquidity':<15} {'Source':<25}")
        print("-" * 100)
        for idx, row in high_activity.head(10).iterrows():
            print(f"{row['name']:<30} {row['volume_liquidity_ratio']:>7.2f} ${row['volume_usd_h24']:>13,.0f} ${row['reserve_in_usd']:>13,.0f} {row['source']:<25}")
    else:
        print("   No pools with ratio > 3.0 found")


## 🎯 Arbitrage Strategy Recommendations

Based on pool characteristics, identify suitable arbitrage strategies


In [None]:
if not pools_df.empty and all(col in pools_df.columns for col in ['arbitrage_score', 'volume_liquidity_ratio', 'reserve_in_usd', 'total_transactions', 'buy_sell_ratio', 'volume_usd_h24']):
    print("🎯 ARBITRAGE OPPORTUNITIES BY APPROACH")
    print("=" * 120)
    
    # Approach 1: High Liquidity Stable Arbitrage (Low Risk)
    high_liq = pools_df[
        (pools_df['reserve_in_usd'] > 200000) &
        (pools_df['volume_liquidity_ratio'] > 1.5) &
        (pools_df['arbitrage_score'] > 0.4)
    ].nlargest(5, 'arbitrage_score')
    
    print("\n💎 Approach 1: HIGH LIQUIDITY STABLE ARBITRAGE")
    print("   Risk Level: LOW | Position Size: LARGE | Frequency: MEDIUM")
    print("   Characteristics: Deep liquidity, stable spreads, minimal slippage")
    print(f"   Suitable pools: {len(high_liq)}")
    print("-" * 120)
    
    if not high_liq.empty:
        for idx, row in high_liq.iterrows():
            print(f"   ✓ {row['name']:<25} | Liq: ${row['reserve_in_usd']:>12,.0f} | Score: {row['arbitrage_score']:>5.3f} | Source: {row['source']}")
    else:
        print("   ⚠️  No pools currently meet high liquidity criteria")
    
    # Approach 2: High Frequency Trading (Medium Risk)
    high_freq = pools_df[
        (pools_df['volume_liquidity_ratio'] > 3.0) &
        (pools_df['total_transactions'] > 800) &
        (pools_df['arbitrage_score'] > 0.3)
    ].nlargest(5, 'volume_liquidity_ratio')
    
    print("\n\n🔥 Approach 2: HIGH FREQUENCY ARBITRAGE")
    print("   Risk Level: MEDIUM | Position Size: SMALL-MEDIUM | Frequency: HIGH")
    print("   Characteristics: Fast turnover, frequent opportunities, active trading")
    print(f"   Suitable pools: {len(high_freq)}")
    print("-" * 120)
    
    if not high_freq.empty:
        for idx, row in high_freq.iterrows():
            print(f"   ✓ {row['name']:<25} | Vol/Liq: {row['volume_liquidity_ratio']:>6.2f}x | Txns: {int(row['total_transactions']):>5} | Score: {row['arbitrage_score']:>5.3f}")
    else:
        print("   ⚠️  No pools currently meet high frequency criteria")
    
    # Approach 3: Balanced Multi-Pool Arbitrage (Medium-Low Risk)
    balanced = pools_df[
        (pools_df['buy_sell_ratio'] >= 0.8) &
        (pools_df['buy_sell_ratio'] <= 1.2) &
        (pools_df['arbitrage_score'] > 0.4) &
        (pools_df['reserve_in_usd'] > 50000)
    ].nlargest(5, 'arbitrage_score')
    
    print("\n\n⚖️  Approach 3: BALANCED MULTI-POOL ARBITRAGE")
    print("   Risk Level: MEDIUM-LOW | Position Size: MEDIUM | Frequency: MEDIUM-HIGH")
    print("   Characteristics: Balanced buy/sell pressure, stable price action")
    print(f"   Suitable pools: {len(balanced)}")
    print("-" * 120)
    
    if not balanced.empty:
        for idx, row in balanced.iterrows():
            print(f"   ✓ {row['name']:<25} | Buy/Sell: {row['buy_sell_ratio']:>5.2f} | Score: {row['arbitrage_score']:>5.3f} | Liq: ${row['reserve_in_usd']:>12,.0f}")
    else:
        print("   ⚠️  No pools currently meet balanced criteria")
    
    # Approach 4: New Pool Early Arbitrage (High Risk)
    if 'pool_age_days' in pools_df.columns:
        new_pools = pools_df[
            (pools_df['pool_age_days'] < 1) &
            (pools_df['arbitrage_score'] > 0.3) &
            (pools_df['volume_usd_h24'] > 50000)
        ].nlargest(5, 'arbitrage_score')
        
        print("\n\n🆕 Approach 4: NEW POOL EARLY ARBITRAGE")
        print("   Risk Level: HIGH ⚠️  | Position Size: SMALL | Frequency: OPPORTUNISTIC")
        print("   Characteristics: High volatility, price discovery, rug pull risk")
        print(f"   Suitable pools: {len(new_pools)}")
        print("-" * 120)
        
        if not new_pools.empty:
            for idx, row in new_pools.iterrows():
                age_hours = row['pool_age_days'] * 24
                print(f"   ⚠️  {row['name']:<25} | Age: {age_hours:>5.1f}h | Score: {row['arbitrage_score']:>5.3f} | Vol: ${row['volume_usd_h24']:>12,.0f}")
        else:
            print("   ℹ️  No very new pools currently available")
    
    print("\n" + "=" * 120)
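The four screens above share one pattern: AND-combined threshold filters followed by a top-5 ranking. The sketch below is a hypothetical refactor, not how the screener itself is configured. It expresses the thresholds as data so they are easier to tune and compare, and the values are copied from the cell above. `APPROACHES` and `screen` are illustrative names. An approach whose columns are missing is skipped rather than raising `KeyError`.

```python
import operator

import pandas as pd

OPS = {'>': operator.gt, '>=': operator.ge, '<': operator.lt, '<=': operator.le}

# (rules, rank_by) per approach; each rule is (column, operator, threshold)
APPROACHES = {
    'high_liquidity': ([('reserve_in_usd', '>', 200_000), ('volume_liquidity_ratio', '>', 1.5),
                        ('arbitrage_score', '>', 0.4)], 'arbitrage_score'),
    'high_frequency': ([('volume_liquidity_ratio', '>', 3.0), ('total_transactions', '>', 800),
                        ('arbitrage_score', '>', 0.3)], 'volume_liquidity_ratio'),
    'balanced': ([('buy_sell_ratio', '>=', 0.8), ('buy_sell_ratio', '<=', 1.2),
                  ('arbitrage_score', '>', 0.4), ('reserve_in_usd', '>', 50_000)], 'arbitrage_score'),
    'new_pool': ([('pool_age_days', '<', 1), ('arbitrage_score', '>', 0.3),
                  ('volume_usd_h24', '>', 50_000)], 'arbitrage_score'),
}


def screen(df: pd.DataFrame, rules, rank_by: str, top_n: int = 5) -> pd.DataFrame:
    """AND together every (column, op, threshold) rule, then keep the top_n rows by rank_by."""
    needed = {col for col, _, _ in rules} | {rank_by}
    if needed - set(df.columns):
        # Skip the approach when a column is missing (returns an empty frame with df's columns)
        return df.iloc[0:0]
    mask = pd.Series(True, index=df.index)
    for col, op, threshold in rules:
        mask &= OPS[op](df[col], threshold)
    return df[mask].nlargest(top_n, rank_by)
```

Usage: `for name, (rules, rank_by) in APPROACHES.items(): print(name, len(screen(pools_df, rules, rank_by)))`.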


In [None]:
if not pools_df.empty:
    print("📊 FINAL SUMMARY STATISTICS")
    print("=" * 100)
    
    print(f"\n📈 Overall Market Stats:")
    print(f"   • Total pools analyzed: {len(pools_df)}")
    print(f"   • Total 24h volume: ${pools_df['volume_usd_h24'].sum():,.0f}")
    print(f"   • Total liquidity: ${pools_df['reserve_in_usd'].sum():,.0f}")
    if 'pool_age_days' in pools_df.columns:
        print(f"   • Average pool age: {pools_df['pool_age_days'].mean():.1f} days")
    
    if 'arbitrage_score' in pools_df.columns:
        print(f"\n🎯 Arbitrage Potential Distribution:")
        high = len(pools_df[pools_df['arbitrage_score'] > 0.7])
        medium = len(pools_df[(pools_df['arbitrage_score'] >= 0.4) & (pools_df['arbitrage_score'] <= 0.7)])
        low = len(pools_df[pools_df['arbitrage_score'] < 0.4])
        
        print(f"   • High-potential (score > 0.7):      {high:3d} pools ({high/len(pools_df)*100:5.1f}%)")
        print(f"   • Medium-potential (0.4-0.7):        {medium:3d} pools ({medium/len(pools_df)*100:5.1f}%)")
        print(f"   • Low-potential (< 0.4):             {low:3d} pools ({low/len(pools_df)*100:5.1f}%)")
        print(f"   • Average arbitrage score: {pools_df['arbitrage_score'].mean():.3f}")
    
    if 'volume_liquidity_ratio' in pools_df.columns:
        print(f"\n⚡ Activity Metrics:")
        print(f"   • Average Vol/Liq ratio: {pools_df['volume_liquidity_ratio'].mean():.2f}")
        high_activity = len(pools_df[pools_df['volume_liquidity_ratio'] > 3])
        print(f"   • Highly active pools (ratio > 3): {high_activity} ({high_activity/len(pools_df)*100:5.1f}%)")
    
    if 'source' in pools_df.columns:
        print(f"\n📋 Source Coverage (Trending/New):")
        for strategy, count in pools_df['source'].value_counts().items():
            print(f"   • {strategy:35s}: {count:3d} pools ({count/len(pools_df)*100:5.1f}%)")
    
    print("\n" + "=" * 100)


## 💾 Export Results

Save top opportunities to CSV for further analysis or trading bot integration


In [None]:
if not pools_df.empty and 'arbitrage_score' in pools_df.columns:
    # Prepare export data
    export_df = pools_df.nlargest(50, 'arbitrage_score').copy()
    
    # Select and order columns for export
    export_cols = [
        'name', 'address', 'source', 'arbitrage_score',
        'volume_liquidity_ratio', 'volume_usd_h24', 'reserve_in_usd', 'fdv_usd',
        'total_transactions', 'buy_sell_ratio', 'liquidity_per_transaction',
        'price_change_percentage_h1', 'price_change_percentage_h24',
        'pool_created_at', 'pool_age_days'
    ]
    available_export_cols = [col for col in export_cols if col in export_df.columns]
    
    # Generate filename with timestamp
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
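    filename = f'../../app/outputs/base_arbitrage_top50_{timestamp}.csv'
    
    # Export to CSV, creating the output directory first if needed (os is imported in the MongoDB cell)
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    export_df[available_export_cols].to_csv(filename, index=False)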
    
    print("💾 EXPORT COMPLETE")
    print("=" * 80)
    print(f"✅ Exported top {len(export_df)} opportunities to:")
    print(f"   {filename}")
    print(f"\n📊 Export details:")
    print(f"   • Number of pools: {len(export_df)}")
    print(f"   • Number of columns: {len(available_export_cols)}")
    print(f"   • Columns: {', '.join(available_export_cols[:5])}...")
    print(f"\n💡 Use this CSV for:")
    print(f"   • Trading bot configuration")
    print(f"   • Manual trade execution")
    print(f"   • Historical comparison")
    print(f"   • Risk analysis")
else:
    print("⚠️  No data to export")


## 📝 Conclusion

This analysis ranks the screened Base chain liquidity pools with a heuristic arbitrage score and groups them into candidate trading approaches by risk profile.

### 🎯 Key Takeaways

1. **Arbitrage Scoring**: Pools are ranked using a composite score that considers:
   - Volume/Liquidity ratio (trading activity)
   - Transaction count (market participation)
   - Buy/Sell balance (price stability)

2. **Risk Levels**: Different strategies suit different risk profiles:
   - **Low Risk**: High liquidity stable arbitrage (>$200k liquidity)
   - **Medium-Low Risk**: Balanced multi-pool arbitrage (buy/sell ratio 0.8-1.2, >$50k liquidity)
   - **Medium Risk**: High frequency arbitrage (>3x vol/liq ratio)
   - **High Risk**: New pool opportunities (<24h old)

3. **Pool Characteristics**:
   - High Vol/Liq ratio (>3.0) = 24h volume above 3x the pool's liquidity; very active, frequent opportunities
   - Balanced buy/sell (0.8-1.2) = Stable spreads, lower risk
   - High transaction count (>1000/day) = Active market participation (a measure of activity, not liquidity depth)

### 🚀 Next Steps

1. **Real-time Monitoring**: Set up alerts for top-scoring pools
2. **Cross-DEX Analysis**: Compare prices with other Base chain DEXs
3. **Gas Fee Calculation**: Factor in transaction costs for profit estimation
4. **Backtesting**: Test strategies on historical data
5. **Automation**: Implement automated arbitrage bots for high-frequency strategies
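The notebook ranks pools individually, but an arbitrage needs the same token priced differently in two places. Below is a minimal sketch of a cross-pool comparison. It assumes that `base_token_price_usd` is comparable across pools sharing a `base_token_id`. The screener only keeps filtered pools, so few tokens may appear more than once, and the prices are snapshots rather than simultaneous quotes. `cross_pool_spreads` is a hypothetical helper, and the spread it reports ignores fees, gas and slippage.

```python
import pandas as pd


def cross_pool_spreads(df: pd.DataFrame, min_pools: int = 2) -> pd.DataFrame:
    """Per base token listed in at least min_pools pools, report the USD price spread.

    spread_pct = (max_price - min_price) / min_price * 100, before any costs.
    """
    prices = df[['base_token_id', 'base_token_price_usd']].copy()
    # Prices may arrive as strings from the API; coerce and drop unusable values
    prices['base_token_price_usd'] = pd.to_numeric(prices['base_token_price_usd'], errors='coerce')
    prices = prices.dropna(subset=['base_token_price_usd'])
    prices = prices[prices['base_token_price_usd'] > 0]
    stats = (prices.groupby('base_token_id')['base_token_price_usd']
                   .agg(n_pools='count', min_price='min', max_price='max'))
    stats = stats[stats['n_pools'] >= min_pools]
    stats = stats.assign(
        spread_pct=(stats['max_price'] - stats['min_price']) / stats['min_price'] * 100
    )
    return stats.sort_values('spread_pct', ascending=False)
```

Usage: `display(cross_pool_spreads(pools_df).head(10))`. A token that appears only once in the screener output is excluded, so a larger, unfiltered pool set gives more useful results.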

### ⚠️ Important Considerations

- **Gas Fees**: Fees on Base are low, but they still eat into the margin on small arbitrage trades
- **Slippage**: Calculate expected slippage based on liquidity depth
- **Timing**: High-frequency strategies require fast execution
- **Risk Management**: Never risk more than you can afford to lose
- **Rug Pulls**: Be extremely cautious with new pools (<24h old)
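For a rough slippage estimate, the sketch below assumes a constant-product (x * y = k, Uniswap V2 style) pool with a 0.3% fee taken from the input. It also approximates each side's reserve as half of `reserve_in_usd`. Concentrated-liquidity (V3-style) pools and other curve designs behave differently, so treat the numbers as an order-of-magnitude check. `net_round_trip_usd` is a hypothetical back-of-the-envelope helper that linearly subtracts price impact and a gas estimate from the gross spread.

```python
def constant_product_out(amount_in: float, reserve_in: float, reserve_out: float,
                         fee: float = 0.003) -> float:
    """Output of a swap on an x * y = k pool, with the fee deducted from the input."""
    amount_in_after_fee = amount_in * (1 - fee)
    return reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)


def price_impact(amount_in: float, reserve_in: float, reserve_out: float,
                 fee: float = 0.003) -> float:
    """Fractional shortfall versus the pre-trade spot price reserve_out / reserve_in (fee included)."""
    spot_out = amount_in * reserve_out / reserve_in
    return 1 - constant_product_out(amount_in, reserve_in, reserve_out, fee) / spot_out


def net_round_trip_usd(trade_usd: float, spread_pct: float, impact_buy: float,
                       impact_sell: float, gas_usd: float) -> float:
    """Rough P&L of buying in one pool and selling in another: spread minus impact minus gas."""
    gross = trade_usd * spread_pct / 100
    costs = trade_usd * (impact_buy + impact_sell) + gas_usd
    return gross - costs
```

Consider a $1,000 trade against a pool with $200k of liquidity (about $100k per side). `price_impact(1_000, 100_000, 100_000)` comes to about 0.0128, roughly 1.3% including the fee. A round trip through two such pools therefore needs a spread well above 2.5% before gas is even counted.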

---

*Generated from Base pool screener results*  
*Configuration: `config/base_arbitrage_pools_screener.yml`*  
*Documentation: `docs/BASE_ARBITRAGE_GUIDE.md`*

**Happy Trading! 🚀**
