# Polymarket Hourly Prediction Market Analysis

This notebook analyzes orderbook data from Polymarket hourly prediction markets for crypto pairs (BTCUSDT, ETHUSDT, SOLUSDT, XRPUSDT) to develop market making strategies.

## Objectives:
1. **Data Exploration**: Understand orderbook patterns and market behavior
2. **Feature Engineering**: Create predictive features from orderbook and KLINES data
3. **Market Microstructure Analysis**: Analyze bid-ask spreads, volume imbalances, and price efficiency
4. **Strategy Development**: Identify profitable market making opportunities
5. **Correlation Analysis**: Find relationships between crypto price movements and prediction market behavior

In [1]:
# Import required libraries
import os
import sys
import polars as pl
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.offline as pyo
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Set up plotly for notebook display
pyo.init_notebook_mode(connected=True)

# Import our custom processor from the parent directory
sys.path.append(os.path.abspath(".."))
from analysis.data_processor import PolymarketDataProcessor

print("Libraries imported successfully!")

Libraries imported successfully!


## 1. Data Loading and Initial Exploration

In [2]:
# Initialize the data processor
processor = PolymarketDataProcessor("../orderbook_data")

# Load the most recent orderbook data
print("Loading orderbook data...")
orderbook_df = processor.load_orderbook_data()

print(f"Loaded {orderbook_df.height:,} orderbook records")
print(f"Date range: {orderbook_df['timestamp'].min()} to {orderbook_df['timestamp'].max()}")
print(f"Crypto pairs: {sorted(orderbook_df['crypto'].unique().to_list())}")
print(f"Unique markets: {orderbook_df['market_slug'].n_unique()}")
print(f"Unique assets: {orderbook_df['asset_id'].n_unique()}")

Loading orderbook data...
Loaded 33,700,390 orderbook records
Date range: 2025-07-15 06:02:28.796000 to 2025-07-18 15:04:22.190000
Crypto pairs: ['bitcoin', 'ethereum', 'solana', 'xrp']
Unique markets: 332
Unique assets: 664


In [3]:
# Display the column dtypes and the first few records
print("\nOrderbook Data Schema:")
print(orderbook_df.dtypes)

print("\nFirst 10 records:")
orderbook_df.head(10)


Orderbook Data Schema:
[Datetime(time_unit='us', time_zone=None), String, String, String, String, Float64, Float64, String]

First 10 records:


| timestamp | market_slug | asset_id | crypto | side | price | size | event_type |
|---|---|---|---|---|---|---|---|
| datetime[μs] | str | str | str | str | f64 | f64 | str |
| 2025-07-15 06:23:26.688 | "s1" | "a1" | "solana" | "bid" | 0.3 | 32.0 | "book" |
| 2025-07-15 06:23:26.688 | "s1" | "a1" | "solana" | "bid" | 0.32 | 32.0 | "book" |
| 2025-07-15 06:23:26.688 | "s1" | "a1" | "solana" | "bid" | 0.34 | 32.0 | "book" |
| 2025-07-15 06:23:26.688 | "s1" | "a1" | "solana" | "bid" | 0.36 | 32.0 | "book" |
| 2025-07-15 06:23:26.688 | "s1" | "a1" | "solana" | "bid" | 0.38 | 1532.0 | "book" |
| 2025-07-15 06:23:26.688 | "s1" | "a1" | "solana" | "bid" | 0.4 | 64.0 | "book" |
| 2025-07-15 06:23:26.688 | "s1" | "a1" | "solana" | "bid" | 0.42 | 1064.0 | "book" |
| 2025-07-15 06:23:26.688 | "s1" | "a1" | "solana" | "bid" | 0.44 | 64.0 | "book" |
| 2025-07-15 06:23:26.688 | "s1" | "a1" | "solana" | "bid" | 0.45 | 564.0 | "book" |
| 2025-07-15 06:23:26.688 | "s1" | "a1" | "solana" | "bid" | 0.46 | 564.0 | "book" |


In [4]:
# Analyze data distribution by crypto
crypto_stats = (
    orderbook_df
    .group_by("crypto")
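    .agg([
        pl.len().alias("total_records"),  # pl.count() is deprecated in recent Polars
        pl.col("market_slug").n_unique().alias("unique_markets"),
        pl.col("asset_id").n_unique().alias("unique_assets"),
        pl.col("price").mean().alias("avg_price"),
        # Sum of the size column over all rows (quoted book size, not necessarily traded volume)
        pl.col("size").sum().alias("total_volume"),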
        pl.col("timestamp").min().alias("first_record"),
        pl.col("timestamp").max().alias("last_record")
    ])
    .sort("total_records", descending=True)
)

print("Data Distribution by Crypto:")
crypto_stats

Data Distribution by Crypto:


| crypto | total_records | unique_markets | unique_assets | avg_price | total_volume | first_record | last_record |
|---|---|---|---|---|---|---|---|
| str | u32 | u32 | u32 | f64 | f64 | datetime[μs] | datetime[μs] |
| "bitcoin" | 12172649 | 83 | 166 | 0.500004 | 13492000000.0 | 2025-07-15 06:20:47.479 | 2025-07-18 15:04:22.045 |
| "ethereum" | 8510252 | 83 | 166 | 0.499998 | 8472600000.0 | 2025-07-15 06:18:09.297 | 2025-07-18 15:04:22.190 |
| "solana" | 6645500 | 83 | 166 | 0.5 | 6324800000.0 | 2025-07-15 06:23:26.688 | 2025-07-18 15:04:22.155 |
| "xrp" | 6371989 | 83 | 166 | 0.5 | 6687700000.0 | 2025-07-15 06:02:28.796 | 2025-07-18 15:04:22.181 |


## 2. Market Microstructure Analysis

In [5]:
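# Resample data to 1-minute intervals for analysis
# Polars' time-window grouping needs time-sorted input; the recorded run below
# failed with "input data is not sorted", so sort by timestamp first
# (assumes the processor groups on the "timestamp" column).
print("Resampling orderbook data to 1-minute intervals...")
resampled_data = processor.resample_orderbook_to_intervals(orderbook_df.sort("timestamp"), "1m")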

print(f"Resampled to {resampled_data.height:,} 1-minute intervals")
print("\nSample of resampled data:")
resampled_data.head()

Resampling orderbook data to 1-minute intervals...


ComputeError: input data is not sorted
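
The `ComputeError` above most likely comes from a time-window grouping step that expects rows ordered by time. The internals of `resample_orderbook_to_intervals` are not shown here, so the following is only a standard-library sketch of the same idea, with hypothetical ticks: flooring timestamps to the minute and grouping consecutive rows only gives one bucket per minute if the rows are sorted first.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical ticks as (timestamp, price), deliberately out of order
ticks = [
    (datetime(2025, 7, 15, 6, 24, 5), 0.46),
    (datetime(2025, 7, 15, 6, 23, 26), 0.44),
    (datetime(2025, 7, 15, 6, 23, 50), 0.45),
    (datetime(2025, 7, 15, 6, 24, 40), 0.47),
]


def minute_bucket(tick):
    """Floor a tick's timestamp to the start of its minute."""
    return tick[0].replace(second=0, microsecond=0)


def last_price_per_minute(rows):
    """Sort by time, then keep the last price seen in each 1-minute bucket."""
    ordered = sorted(rows, key=lambda t: t[0])
    # groupby only merges *consecutive* rows with equal keys,
    # which is why the sort above is required.
    return {
        bucket: list(group)[-1][1]
        for bucket, group in groupby(ordered, key=minute_bucket)
    }


bars = last_price_per_minute(ticks)
for bucket, price in bars.items():
    print(bucket, price)
```

Without the `sorted` call, `groupby` would emit the 06:24 bucket twice and the dictionary would silently keep only one of the fragments, which is the kind of inconsistency a sortedness check guards against.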

In [None]:
# Calculate market features
print("Calculating market microstructure features...")
features_df = processor.calculate_market_features(resampled_data)

print(f"Features calculated for {features_df.height:,} intervals")
print("\nAvailable features:")
print([col for col in features_df.columns if col not in orderbook_df.columns])
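
The feature names used later in this notebook (`spread_pct`, `bid_ratio`) come from `calculate_market_features`, whose definitions are not shown. As a rough sketch, assuming `spread_pct` is the spread divided by the mid price and `bid_ratio` is the bid share of total quoted size, the features for a single book snapshot could be computed like this. The ask levels below are hypothetical.

```python
def book_features(bids, asks):
    """Compute top-of-book features from lists of (price, size) levels.

    The formulas are assumptions for illustration, not the processor's
    actual definitions.
    """
    best_bid = max(price for price, _ in bids)
    best_ask = min(price for price, _ in asks)
    mid = (best_bid + best_ask) / 2
    bid_size = sum(size for _, size in bids)
    ask_size = sum(size for _, size in asks)
    return {
        "best_bid": best_bid,
        "best_ask": best_ask,
        "mid": mid,
        "spread": best_ask - best_bid,
        "spread_pct": (best_ask - best_bid) / mid,  # fraction, not percent
        "bid_ratio": bid_size / (bid_size + ask_size),
    }


# Top three bid levels from the sample rows shown earlier; asks are made up
bids = [(0.44, 64.0), (0.45, 564.0), (0.46, 564.0)]
asks = [(0.48, 500.0), (0.50, 300.0)]

features = book_features(bids, asks)
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

A `bid_ratio` above 0.5 under this definition means more size is resting on the bid side than on the ask side.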

In [None]:
# Visualize bid-ask spreads across different crypto pairs
spread_analysis = (
    features_df
    .filter(pl.col("spread_pct").is_not_null() & (pl.col("spread_pct") > 0))
    .group_by("crypto")
    .agg([
        pl.col("spread_pct").mean().alias("avg_spread_pct"),
        pl.col("spread_pct").median().alias("median_spread_pct"),
        pl.col("spread_pct").quantile(0.95).alias("p95_spread_pct"),
        pl.col("total_volume").mean().alias("avg_volume")
    ])
    .sort("avg_spread_pct", descending=True)
)

print("Bid-Ask Spread Analysis:")
spread_analysis

In [None]:
# Create spread visualization
spread_data = features_df.filter(
    pl.col("spread_pct").is_not_null() & 
    (pl.col("spread_pct") > 0) & 
    (pl.col("spread_pct") < 0.5)  # Filter outliers (spreads of 50% or more)
).to_pandas()

fig = px.box(
    spread_data, 
    x="crypto", 
    y="spread_pct",
    title="Bid-Ask Spread Distribution by Crypto",
    labels={"spread_pct": "Spread (fraction)", "crypto": "Cryptocurrency"}
)
fig.update_layout(height=500)
fig.show()

## 3. Volume and Liquidity Analysis

In [None]:
# Analyze volume patterns and liquidity
volume_analysis = (
    features_df
    .with_columns([
        pl.col("timestamp").dt.hour().alias("hour"),
        pl.col("timestamp").dt.strftime("%A").alias("day_of_week")  # weekday name; Polars has no dt.day_name()
    ])
    .group_by(["crypto", "hour"])
    .agg([
        pl.col("total_volume").mean().alias("avg_volume"),
        pl.col("spread_pct").mean().alias("avg_spread"),
        pl.col("bid_ratio").mean().alias("avg_bid_ratio")
    ])
    .sort(["crypto", "hour"])
)

print("Volume patterns by hour of day:")
volume_analysis.head(24)  # First 24 rows: the hourly profile of the first crypto in sort order

In [None]:
# Visualize hourly volume patterns
volume_hourly = volume_analysis.to_pandas()

fig = px.line(
    volume_hourly, 
    x="hour", 
    y="avg_volume", 
    color="crypto",
    title="Average Hourly Volume Patterns by Cryptocurrency",
    labels={"avg_volume": "Average Volume", "hour": "Hour of Day"}
)
fig.update_layout(height=500)
fig.show()

## 4. Market Making Opportunity Identification

In [None]:
# Identify market making opportunities
print("Identifying market making opportunities...")
opportunities = processor.identify_market_opportunities(features_df, min_spread_threshold=0.02)

print(f"Found {opportunities.height:,} potential opportunities")
print("\nTop 10 opportunities:")
opportunities.head(10)

In [None]:
# Analyze opportunity distribution by crypto
opportunity_stats = (
    opportunities
    .group_by("crypto")
    .agg([
        pl.len().alias("total_opportunities"),
        pl.col("opportunity_score").mean().alias("avg_opportunity_score"),
        pl.col("spread_pct").mean().alias("avg_spread"),
        pl.col("total_volume").mean().alias("avg_volume"),
        pl.col("market_sentiment").mode().first().alias("dominant_sentiment")
    ])
    .sort("avg_opportunity_score", descending=True)
)

print("Opportunity Analysis by Crypto:")
opportunity_stats

## 5. KLINES Integration and Correlation Analysis

In [None]:
# Load KLINES data for correlation analysis
crypto_pairs = ["ETHUSDT", "BTCUSDT"]  # Start with these two

correlation_results = {}

for pair in crypto_pairs:
    try:
        print(f"\nAnalyzing {pair}...")
        
        # Load KLINES data (automatically synced with orderbook date range)
        klines_df = processor.load_klines_data(pair)
        print(f"Loaded {klines_df.height:,} KLINES records for {pair}")
        
        # Show time range synchronization info
        time_info = processor.get_data_time_range_info()
        print(f"  Orderbook range: {time_info['orderbook_start']} to {time_info['orderbook_end']}")
        print(f"  KLINES range: {time_info['klines_start']} to {time_info['klines_end']}")
        print(f"  Duration: {time_info['orderbook_duration_hours']:.1f} hours (orderbook), {time_info['klines_duration_hours']:.1f} hours (klines)")
        
        # Validate data coverage
        coverage = processor.validate_data_coverage(klines_df)
        coverage_status = "✓ Good" if all(coverage.values()) else "⚠ Issues detected"
        print(f"  Coverage validation: {coverage_status}")
        
        # Merge with orderbook data
        merged_data = processor.merge_with_klines(features_df, klines_df, pair)
        print(f"Merged dataset has {merged_data.height:,} records")
        
        correlation_results[pair] = merged_data
        
    except Exception as e:
        print(f"Error processing {pair}: {e}")

In [None]:
# Analyze correlations between crypto price movements and prediction market behavior
if correlation_results:
    pair = list(correlation_results.keys())[0]  # Use first available pair
    merged_df = correlation_results[pair]
    
    # Calculate price returns and prediction market returns
    analysis_df = merged_df.with_columns([
        # Crypto price returns (from KLINES)
        (pl.col("close") / pl.col("open") - 1).alias("crypto_return"),
        
        # Prediction market returns (from orderbook)
        pl.col("price_return").alias("pred_market_return"),
        
        # Volume correlation
        pl.col("volume").alias("crypto_volume"),
        pl.col("total_volume").alias("pred_market_volume")
    ]).filter(
        pl.col("crypto_return").is_not_null() &
        pl.col("pred_market_return").is_not_null()
    )
    
    print(f"\nCorrelation analysis for {pair}:")
    print(f"Analysis dataset: {analysis_df.height:,} records")
    
    # Convert to pandas for correlation analysis
    corr_data = analysis_df.select([
        "crypto_return", "pred_market_return", "crypto_volume", "pred_market_volume"
    ]).to_pandas()
    
    correlation_matrix = corr_data.corr()
    print("\nCorrelation Matrix:")
    print(correlation_matrix)
else:
    print("No correlation data available for analysis.")
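
`DataFrame.corr()` in pandas computes pairwise Pearson correlation by default. The sketch below spells that calculation out in plain Python and adds a hypothetical `lagged_pearson` helper for checking whether one return series leads another. The sample numbers are made up and only show the mechanics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def lagged_pearson(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


# Made-up returns: the follower repeats the leader one step later
crypto_return = [1.0, 3.0, 2.0, 5.0, 4.0]
pred_market_return = [0.0, 1.0, 3.0, 2.0, 5.0]

print("lag 0:", pearson(crypto_return, pred_market_return))
print("lag 1:", lagged_pearson(crypto_return, pred_market_return, 1))
```

A correlation that is clearly stronger at a positive lag than at lag 0 would be one piece of evidence that crypto returns lead the prediction market, though a few days of data is a small sample for that conclusion.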

## 6. Strategy Development Insights

In [None]:
# Develop market making strategy insights
strategy_insights = {}

# 1. Spread thresholds by crypto (75th percentile of observed spreads, used as a heuristic cutoff)
spread_thresholds = (
    features_df
    .filter(pl.col("total_volume") > 0)
    .group_by("crypto")
    .agg([
        pl.col("spread_pct").quantile(0.75).alias("profitable_spread_threshold"),
        pl.col("total_volume").mean().alias("avg_volume"),
        pl.col("price_return").std().alias("volatility")
    ])
)

print("Optimal Spread Thresholds by Crypto:")
spread_thresholds

In [None]:
# 2. Best trading hours analysis
hourly_profitability = (
    features_df
    .with_columns(pl.col("timestamp").dt.hour().alias("hour"))
    .group_by(["crypto", "hour"])
    .agg([
        pl.col("spread_pct").mean().alias("avg_spread"),
        pl.col("total_volume").mean().alias("avg_volume"),
        pl.col("tick_count").mean().alias("avg_activity"),
        (pl.col("spread_pct") * pl.col("total_volume")).mean().alias("profitability_proxy")
    ])
    .sort(["crypto", "profitability_proxy"], descending=[False, True])
)

print("\nBest Trading Hours (Top 3 per crypto):")
for crypto in sorted(features_df['crypto'].unique().to_list()):
    crypto_hours = hourly_profitability.filter(pl.col("crypto") == crypto).head(3)
    print(f"\n{crypto.upper()}:")
    print(crypto_hours)
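
The `profitability_proxy` above is the mean of `spread_pct * total_volume` within each hour, which is not the same as multiplying the hourly mean spread by the hourly mean volume. A small plain-Python sketch with made-up rows shows the difference:

```python
from collections import defaultdict

# Made-up rows: (hour, spread_pct, total_volume)
rows = [
    (14, 0.02, 1000.0),
    (14, 0.04, 500.0),
    (15, 0.01, 4000.0),
    (15, 0.03, 2000.0),
]


def hourly_proxy(data):
    """Mean of spread * volume per hour (mirrors the aggregation above)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, spread, volume in data:
        sums[hour] += spread * volume
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sums}


proxy = hourly_proxy(rows)
best_hours = sorted(proxy, key=proxy.get, reverse=True)

for hour in best_hours:
    print(f"hour {hour}: proxy {proxy[hour]:.1f}")
```

For hour 14 the proxy is 20.0, while mean spread (0.03) times mean volume (750) would give 22.5. The proxy rewards intervals where wide spreads and high volume occur together, not hours where they merely both occur.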

In [None]:
# 3. Risk-adjusted opportunity scoring
risk_adjusted_opportunities = (
    opportunities
    .with_columns([
        # Risk-adjusted score = opportunity_score / (1 + volatility_risk)
        (pl.col("opportunity_score") / (1 + pl.col("volatility_risk"))).alias("risk_adjusted_score"),
        
        # Liquidity score
        (pl.col("total_volume") * pl.col("tick_count")).alias("liquidity_score")
    ])
    .sort("risk_adjusted_score", descending=True)
)

print("\nTop 10 Risk-Adjusted Opportunities:")
risk_adjusted_opportunities.select([
    "timestamp", "crypto", "market_slug", "spread_pct", 
    "total_volume", "risk_adjusted_score", "market_sentiment"
]).head(10)
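
Dividing by `1 + volatility_risk` instead of `volatility_risk` keeps the score finite when the risk is zero and leaves zero-risk scores unchanged. The sketch below applies the same formula to made-up opportunities to show how it can reorder a ranking. The labels and values are hypothetical.

```python
def risk_adjusted(score, volatility_risk):
    """Same formula as the cell above: score / (1 + volatility_risk)."""
    return score / (1.0 + volatility_risk)


# Made-up opportunities: (label, opportunity_score, volatility_risk)
candidates = [
    ("A", 3.0, 2.0),   # highest raw score, but very volatile
    ("B", 2.0, 0.25),  # lower raw score, calmer market
    ("C", 1.0, 0.0),   # zero risk leaves the score unchanged
]

ranked = sorted(
    candidates,
    key=lambda c: risk_adjusted(c[1], c[2]),
    reverse=True,
)

for label, score, risk in ranked:
    print(label, round(risk_adjusted(score, risk), 3))
```

Here "B" outranks "A" after adjustment even though "A" has the higher raw score.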

## 7. Market Making Bot Strategy Recommendations

In [None]:
# Generate comprehensive strategy recommendations
print("=" * 60)
print("POLYMARKET MARKET MAKING STRATEGY RECOMMENDATIONS")
print("=" * 60)

# Best performing cryptos
best_cryptos = (
    opportunity_stats
    .sort("avg_opportunity_score", descending=True)
    .head(2)
)

print("\n1. PRIORITY CRYPTOCURRENCIES:")
for row in best_cryptos.iter_rows(named=True):
    crypto = row['crypto']
    score = row['avg_opportunity_score']
    spread = row['avg_spread'] * 100
    print(f"   • {crypto.upper()}: Avg Score {score:.2f}, Avg Spread {spread:.2f}%")

print("\n2. OPTIMAL SPREAD THRESHOLDS:")
for row in spread_thresholds.iter_rows(named=True):
    crypto = row['crypto']
    threshold = row['profitable_spread_threshold'] * 100
    volatility = row['volatility']
    print(f"   • {crypto.upper()}: Min Spread {threshold:.2f}%, Volatility {volatility:.4f}")

print("\n3. BEST TRADING HOURS:")
print("   • Based on profitability proxy (spread × volume)")
print("   • Focus on hours with high volume AND wide spreads")

print("\n4. KEY MARKET MAKING PARAMETERS:")
total_opportunities = opportunities.height
# Derive the sample length from the data instead of assuming ~3 days
time_span = orderbook_df["timestamp"].max() - orderbook_df["timestamp"].min()
data_days = time_span.total_seconds() / 86_400
avg_daily_opportunities = total_opportunities / data_days
print(f"   • Expected daily opportunities: ~{avg_daily_opportunities:.0f}")
print("   • Minimum spread threshold: 2%")
print("   • Volume threshold: Above 10-min moving average")
print("   • Risk management: Monitor volatility_risk metric")

print("\n5. CORRELATION INSIGHTS:")
if correlation_results:
    print("   • Check the correlation matrix above before treating crypto moves as a signal")
    print("   • If returns are correlated, test crypto price momentum as a leading indicator")
    print("   • Monitor volume correlations for liquidity timing")
else:
    print("   • Need more KLINES data for correlation analysis")
    print("   • Recommend collecting real-time crypto price feeds")

## 8. Next Steps for Bot Implementation

Based on this analysis, here are the recommended next steps:

### Immediate Actions:
1. **Deploy monitoring system** for the top-performing crypto pairs
2. **Set up real-time data feeds** for both orderbook and KLINES data
3. **Implement risk management** based on volatility_risk metrics

### Bot Architecture:
1. **Data ingestion layer** - Real-time orderbook and price feeds
2. **Feature calculation engine** - Real-time market microstructure features
3. **Opportunity detection** - Based on our scoring algorithms
4. **Risk management** - Position sizing and volatility controls
5. **Execution engine** - Automated order placement and management
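
As a rough illustration of how the detection, risk management, and execution layers listed above could fit together, here is a minimal sketch. Every name in it (`Snapshot`, `Order`, `detect`, `position_size`, `make_quotes`) is hypothetical, the 2% threshold is applied to the raw price spread for simplicity, and no real exchange API is involved.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Top-of-book state for one market (hypothetical structure)."""
    market_slug: str
    best_bid: float
    best_ask: float
    volatility_risk: float


@dataclass
class Order:
    market_slug: str
    side: str
    price: float
    size: float


def detect(snapshot, min_spread=0.02):
    """Opportunity detection: is the quoted spread wide enough?"""
    return (snapshot.best_ask - snapshot.best_bid) >= min_spread


def position_size(snapshot, base_size=100.0):
    """Risk management: shrink size as volatility_risk grows."""
    return base_size / (1.0 + snapshot.volatility_risk)


def make_quotes(snapshot):
    """Execution input: join the best bid and ask when an opportunity exists."""
    if not detect(snapshot):
        return []
    size = position_size(snapshot)
    return [
        Order(snapshot.market_slug, "bid", snapshot.best_bid, size),
        Order(snapshot.market_slug, "ask", snapshot.best_ask, size),
    ]


wide = Snapshot("s1", best_bid=0.46, best_ask=0.50, volatility_risk=1.0)
narrow = Snapshot("s1", best_bid=0.46, best_ask=0.47, volatility_risk=1.0)

print(make_quotes(wide))    # two orders of size 50.0
print(make_quotes(narrow))  # spread below threshold, no orders
```

Keeping each layer as a small pure function makes it straightforward to test the decision logic offline against the saved feature data before any order is sent.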

### Performance Monitoring:
1. **Track realized spreads** vs predicted spreads
2. **Monitor fill rates** and execution quality
3. **Measure correlation accuracy** between crypto and prediction markets
4. **Analyze profitability** by time of day and crypto pair
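
The first two monitoring metrics can be computed from a simple fill log. The sketch below assumes fills are recorded as `(price, size)` tuples and defines the realized spread as the size-weighted average sell price minus the size-weighted average buy price. Both the log format and that definition are assumptions for illustration.

```python
def fill_rate(orders_placed, orders_filled):
    """Fraction of placed orders that were filled."""
    return orders_filled / orders_placed if orders_placed else 0.0


def vwap(fills):
    """Size-weighted average price of a list of (price, size) fills."""
    total_size = sum(size for _, size in fills)
    if total_size == 0:
        raise ValueError("no filled size")
    return sum(price * size for price, size in fills) / total_size


def realized_spread(buy_fills, sell_fills):
    """Average sell price minus average buy price, both size-weighted."""
    return vwap(sell_fills) - vwap(buy_fills)


# Made-up fill log
buys = [(0.46, 100.0), (0.47, 100.0)]
sells = [(0.49, 200.0)]
predicted_spread = 0.03  # hypothetical spread quoted at order time

realized = realized_spread(buys, sells)
print(f"fill rate: {fill_rate(40, 10):.0%}")
print(f"realized spread: {realized:.4f} vs predicted {predicted_spread:.4f}")
```

A realized spread that stays below the predicted one suggests fills are concentrated at unfavourable prices, which is worth checking alongside the fill rate.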

In [None]:
# Save processed data for bot implementation
print("Saving processed data for bot implementation...")

# Make sure the output directory exists before writing
os.makedirs("../data", exist_ok=True)

# Save opportunities data
opportunities.write_csv("../data/market_making_opportunities.csv")

# Save feature data
features_df.write_csv("../data/orderbook_features.csv")

# Save strategy parameters
spread_thresholds.write_csv("../data/spread_thresholds.csv")
hourly_profitability.write_csv("../data/hourly_profitability.csv")

print("Data saved successfully!")
print("\nFiles created:")
print("- market_making_opportunities.csv")
print("- orderbook_features.csv")
print("- spread_thresholds.csv")
print("- hourly_profitability.csv")