# Polymarket Hourly Prediction Market Analysis

This notebook analyzes orderbook data from Polymarket hourly prediction markets for crypto pairs (BTCUSDT, ETHUSDT, SOLUSDT, XRPUSDT) to develop market making strategies.

## Objectives:
1. **Data Exploration**: Understand orderbook patterns and market behavior
2. **Feature Engineering**: Create predictive features from orderbook and KLINES data
3. **Market Microstructure Analysis**: Analyze bid-ask spreads, volume imbalances, and price efficiency
4. **Strategy Development**: Identify profitable market making opportunities
5. **Correlation Analysis**: Find relationships between crypto price movements and prediction market behavior

In [1]:
# Import required libraries
import os
import sys
import polars as pl
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.offline as pyo
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Set up plotly for notebook display
pyo.init_notebook_mode(connected=True)

# Import our custom processor

sys.path.append(os.path.abspath(".."))
from analysis.data_processor import PolymarketDataProcessor

processor = PolymarketDataProcessor()

print("Libraries imported successfully!")

Libraries imported successfully!


## 1. Data Loading and Initial Exploration

In [2]:
# Initialize the data processor
processor = PolymarketDataProcessor("../orderbook_data")

# Load the most recent orderbook data
print("Loading orderbook data...")
orderbook_df = processor.load_orderbook_data()

print(f"Loaded {orderbook_df.height:,} orderbook records")
print(f"Date range: {orderbook_df['timestamp'].min()} to {orderbook_df['timestamp'].max()}")
print(f"Crypto pairs: {sorted(orderbook_df['crypto'].unique().to_list())}")
print(f"Unique markets: {orderbook_df['market_slug'].n_unique()}")
print(f"Unique assets: {orderbook_df['asset_id'].n_unique()}")

Loading orderbook data...
Loaded 33,700,390 orderbook records
Date range: 2025-07-15 06:02:28.796000 to 2025-07-18 15:04:22.190000
Crypto pairs: ['bitcoin', 'ethereum', 'solana', 'xrp']
Unique markets: 332
Unique assets: 664


In [3]:
# Display basic statistics
print("\nOrderbook Data Schema:")
print(orderbook_df.dtypes)

print("\nFirst 10 records:")
orderbook_df.head(10)


Orderbook Data Schema:
[Datetime(time_unit='us', time_zone=None), String, String, String, String, Float64, Float64, String]

First 10 records:


timestamp,market_slug,asset_id,crypto,side,price,size,event_type
datetime[μs],str,str,str,str,f64,f64,str
2025-07-15 06:23:26.688,"""s1""","""a1""","""solana""","""bid""",0.3,32.0,"""book"""
2025-07-15 06:23:26.688,"""s1""","""a1""","""solana""","""bid""",0.32,32.0,"""book"""
2025-07-15 06:23:26.688,"""s1""","""a1""","""solana""","""bid""",0.34,32.0,"""book"""
2025-07-15 06:23:26.688,"""s1""","""a1""","""solana""","""bid""",0.36,32.0,"""book"""
2025-07-15 06:23:26.688,"""s1""","""a1""","""solana""","""bid""",0.38,1532.0,"""book"""
2025-07-15 06:23:26.688,"""s1""","""a1""","""solana""","""bid""",0.4,64.0,"""book"""
2025-07-15 06:23:26.688,"""s1""","""a1""","""solana""","""bid""",0.42,1064.0,"""book"""
2025-07-15 06:23:26.688,"""s1""","""a1""","""solana""","""bid""",0.44,64.0,"""book"""
2025-07-15 06:23:26.688,"""s1""","""a1""","""solana""","""bid""",0.45,564.0,"""book"""
2025-07-15 06:23:26.688,"""s1""","""a1""","""solana""","""bid""",0.46,564.0,"""book"""


In [4]:
# Analyze data distribution by crypto
crypto_stats = (
    orderbook_df
    .group_by("crypto")
    .agg([
        pl.len().alias("total_records"),  # pl.count() is deprecated in recent Polars releases
        pl.col("market_slug").n_unique().alias("unique_markets"),
        pl.col("asset_id").n_unique().alias("unique_assets"),
        pl.col("price").mean().alias("avg_price"),
        pl.col("size").sum().alias("total_volume"),
        pl.col("timestamp").min().alias("first_record"),
        pl.col("timestamp").max().alias("last_record")
    ])
    .sort("total_records", descending=True)
)

print("Data Distribution by Crypto:")
crypto_stats

Data Distribution by Crypto:


crypto,total_records,unique_markets,unique_assets,avg_price,total_volume,first_record,last_record
str,u32,u32,u32,f64,f64,datetime[μs],datetime[μs]
"""bitcoin""",12172649,83,166,0.500004,13492000000.0,2025-07-15 06:20:47.479,2025-07-18 15:04:22.045
"""ethereum""",8510252,83,166,0.499998,8472600000.0,2025-07-15 06:18:09.297,2025-07-18 15:04:22.190
"""solana""",6645500,83,166,0.5,6324800000.0,2025-07-15 06:23:26.688,2025-07-18 15:04:22.155
"""xrp""",6371989,83,166,0.5,6687700000.0,2025-07-15 06:02:28.796,2025-07-18 15:04:22.181


## 2. Market Microstructure Analysis

In [5]:
# Resample data to 1-minute intervals for analysis
print("Resampling orderbook data to 1-minute intervals...")
resampled_data = processor.resample_orderbook_to_intervals(orderbook_df, "1m")

print(f"Resampled to {resampled_data.height:,} 1-minute intervals")
print("\nSample of resampled data:")
resampled_data.head()

Resampling orderbook data to 1-minute intervals...
Resampled to 52,792 1-minute intervals

Sample of resampled data:


crypto,market_slug,asset_id,timestamp,open_price,high_price,low_price,close_price,total_volume,tick_count,avg_bid,avg_ask,bid_volume,ask_volume,bid_ask_spread,bid_ratio
str,str,str,datetime[μs],f64,f64,f64,f64,f64,u32,f64,f64,f64,f64,f64,f64
"""xrp""","""s6""","""a11""",2025-07-15 06:02:00,0.68,0.97,0.03,0.38,8222.0,29,0.333333,0.677143,4127.0,4095.0,0.34381,0.501946
"""xrp""","""s6""","""a8""",2025-07-15 06:02:00,0.09,0.97,0.03,0.52,8222.0,29,0.322857,0.666667,4095.0,4127.0,0.34381,0.498054
"""ethereum""","""s8""","""a16""",2025-07-15 06:18:00,0.3,0.7,0.3,0.42,26292.0,27,0.392308,0.601429,13130.0,13162.0,0.209121,0.499391
"""ethereum""","""s8""","""a15""",2025-07-15 06:18:00,0.57,0.7,0.3,0.68,26292.0,27,0.398571,0.607692,13162.0,13130.0,0.209121,0.500609
"""bitcoin""","""s3""","""a13""",2025-07-15 06:20:00,0.66,0.99,0.01,0.36,29240.25,43,0.289545,0.719048,14639.25,14601.0,0.429502,0.500654

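The `avg_bid`, `avg_ask` and `bid_ask_spread` columns in the sample above are consistent with a spread taken between the average bid level and the average ask level (first row: 0.677143 - 0.333333 = 0.34381). That is usually much wider than the top-of-book spread. The sketch below contrasts the two on one snapshot. The bid levels are the ten rows from the first-records table. The ask levels, the helper names, and the use of an unweighted mean are assumptions, since the processor's own definition is not shown.

```python
# Bid levels (price, size) copied from the first ten orderbook records shown earlier.
bids = [(0.30, 32.0), (0.32, 32.0), (0.34, 32.0), (0.36, 32.0), (0.38, 1532.0),
        (0.40, 64.0), (0.42, 1064.0), (0.44, 64.0), (0.45, 564.0), (0.46, 564.0)]
# Hypothetical ask levels (not from the dataset), for illustration only.
asks = [(0.48, 500.0), (0.50, 100.0), (0.55, 1000.0), (0.60, 50.0)]


def top_of_book_spread(bids, asks):
    """Best ask minus best bid: the spread a taker actually crosses."""
    best_bid = max(price for price, _ in bids)
    best_ask = min(price for price, _ in asks)
    return best_ask - best_bid


def average_level_spread(bids, asks):
    """Mean ask level minus mean bid level (assumed unweighted by size)."""
    avg_bid = sum(price for price, _ in bids) / len(bids)
    avg_ask = sum(price for price, _ in asks) / len(asks)
    return avg_ask - avg_bid


tob = top_of_book_spread(bids, asks)
avg = average_level_spread(bids, asks)
print(f"top-of-book spread:   {tob:.4f}")
print(f"average-level spread: {avg:.4f}")
```

If the quoting strategy depends on the spread a counterparty actually crosses, the top-of-book figure is the more relevant one. The average-level figure mostly reflects how deep the resting book extends.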

In [6]:
# Calculate market features
print("Calculating market microstructure features...")
features_df = processor.calculate_market_features(resampled_data)

print(f"Features calculated for {features_df.height:,} intervals")
print("\nAvailable features:")
print([col for col in features_df.columns if col not in orderbook_df.columns])

Calculating market microstructure features...
Features calculated for 52,792 intervals

Available features:
['open_price', 'high_price', 'low_price', 'close_price', 'total_volume', 'tick_count', 'avg_bid', 'avg_ask', 'bid_volume', 'ask_volume', 'bid_ask_spread', 'bid_ratio', 'price_return', 'price_range', 'log_volume', 'volume_imbalance', 'spread', 'spread_pct', 'price_return_ma_5', 'volume_ma_5', 'spread_ma_5', 'bid_ratio_ma_5', 'price_return_ma_10', 'volume_ma_10', 'spread_ma_10', 'bid_ratio_ma_10', 'price_return_ma_30', 'volume_ma_30', 'spread_ma_30', 'bid_ratio_ma_30']
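The implementation of `calculate_market_features` is not shown, but the values printed in the top-opportunities table later in this notebook are consistent with the formulas below. This is a minimal sketch inferred from those outputs, not the processor's actual code. The function name `derive_features` is hypothetical.

```python
import math


def derive_features(row):
    """Recompute a few derived features from one resampled 1-minute row.

    The formulas are inferred from the notebook's printed outputs (assumption).
    """
    mid = (row["avg_bid"] + row["avg_ask"]) / 2
    spread = row["avg_ask"] - row["avg_bid"]
    side_volume = row["bid_volume"] + row["ask_volume"]
    return {
        "price_return": row["close_price"] / row["open_price"] - 1,
        "price_range": (row["high_price"] - row["low_price"]) / row["low_price"],
        "log_volume": math.log(row["total_volume"]),
        "volume_imbalance": row["bid_volume"] - row["ask_volume"],
        "bid_ratio": row["bid_volume"] / side_volume,
        "spread": spread,
        # spread_pct is a fraction of the mid price, not a percentage
        "spread_pct": spread / mid,
    }


# Inputs copied from the first row of the top-opportunities table (bitcoin, 04:05).
# Volumes are printed rounded there, so recomputed values match only approximately.
sample = {
    "open_price": 0.31, "high_price": 0.99, "low_price": 0.01, "close_price": 0.23,
    "total_volume": 11078000.0, "avg_bid": 0.131119, "avg_ask": 0.70289,
    "bid_volume": 8223900.0, "ask_volume": 2184800.0,
}
features = derive_features(sample)
for name, value in features.items():
    print(f"{name}: {value:.6f}")
```

Because `spread_pct` is a ratio to the mid price, a value of 1.37 means the average-level spread is 137% of the mid, and `min_spread_threshold=0.02` corresponds to 2% of the mid.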


In [7]:
# Summarize bid-ask spreads across different crypto pairs
spread_analysis = (
    features_df
    .filter(pl.col("spread_pct").is_not_null() & (pl.col("spread_pct") > 0))
    .group_by("crypto")
    .agg([
        pl.col("spread_pct").mean().alias("avg_spread_pct"),
        pl.col("spread_pct").median().alias("median_spread_pct"),
        pl.col("spread_pct").quantile(0.95).alias("p95_spread_pct"),
        pl.col("total_volume").mean().alias("avg_volume")
    ])
    .sort("avg_spread_pct", descending=True)
)

print("Bid-Ask Spread Analysis:")
spread_analysis

Bid-Ask Spread Analysis:


crypto,avg_spread_pct,median_spread_pct,p95_spread_pct,avg_volume
str,f64,f64,f64,f64
"""bitcoin""",1.055756,0.963821,1.829442,1177300.0
"""xrp""",0.98282,0.866215,1.745926,675572.584237
"""ethereum""",0.811319,0.695819,1.666667,853320.858723
"""solana""",0.776281,0.631304,1.700409,681515.266285


In [8]:
# Create spread visualization
spread_data = features_df.filter(
    pl.col("spread_pct").is_not_null() & 
    (pl.col("spread_pct") > 0) & 
    (pl.col("spread_pct") < 0.5)  # Cutoff at 0.5; median spread_pct is 0.63-0.96 per the table above, so this keeps only the tightest intervals
).to_pandas()

fig = px.box(
    spread_data, 
    x="crypto", 
    y="spread_pct",
    title="Bid-Ask Spread Distribution by Crypto",
    labels={"spread_pct": "Spread (fraction of mid price)", "crypto": "Cryptocurrency"}
)
fig.update_layout(height=500)
fig.show()

## 3. Volume and Liquidity Analysis

In [9]:
# Analyze volume patterns and liquidity
volume_analysis = (
    features_df
    .with_columns([
        pl.col("timestamp").dt.hour().alias("hour"),
        pl.col("timestamp").dt.strftime("%A").alias("day_of_week")
    ])
    .group_by(["crypto", "hour"])
    .agg([
        pl.col("total_volume").mean().alias("avg_volume"),
        pl.col("spread_pct").mean().alias("avg_spread"),
        pl.col("bid_ratio").mean().alias("avg_bid_ratio")
    ])
    .sort(["crypto", "hour"])
)

print("Volume patterns by hour of day:")
volume_analysis.head(24)  # First 24 rows: bitcoin's hourly averages (sorted by crypto, then hour)

Volume patterns by hour of day:


crypto,hour,avg_volume,avg_spread,avg_bid_ratio
str,i8,f64,f64,f64
"""bitcoin""",0,631121.048961,1.04695,
"""bitcoin""",1,692330.059208,1.094723,
"""bitcoin""",2,1.0374e6,1.040136,
"""bitcoin""",3,853023.738958,1.059477,
"""bitcoin""",4,917875.158701,1.08393,
…,…,…,…,…
"""bitcoin""",19,845984.558444,1.035202,
"""bitcoin""",20,710703.648206,1.046511,
"""bitcoin""",21,688352.207305,1.085316,
"""bitcoin""",22,806748.926798,1.022872,


In [10]:
# Visualize hourly volume patterns
volume_hourly = volume_analysis.to_pandas()

fig = px.line(
    volume_hourly, 
    x="hour", 
    y="avg_volume", 
    color="crypto",
    title="Average Hourly Volume Patterns by Cryptocurrency",
    labels={"avg_volume": "Average Volume", "hour": "Hour of Day"}
)
fig.update_layout(height=500)
fig.show()

## 4. Market Making Opportunity Identification

In [11]:
# Identify market making opportunities
print("Identifying market making opportunities...")
opportunities = processor.identify_market_opportunities(features_df, min_spread_threshold=0.02)

print(f"Found {opportunities.height:,} potential opportunities")
print("\nTop 10 opportunities:")
opportunities.head(10)

Identifying market making opportunities...
Found 18,306 potential opportunities

Top 10 opportunities:


crypto,market_slug,asset_id,timestamp,open_price,high_price,low_price,close_price,total_volume,tick_count,avg_bid,avg_ask,bid_volume,ask_volume,bid_ask_spread,bid_ratio,price_return,price_range,log_volume,volume_imbalance,spread,spread_pct,price_return_ma_5,volume_ma_5,spread_ma_5,bid_ratio_ma_5,price_return_ma_10,volume_ma_10,spread_ma_10,bid_ratio_ma_10,price_return_ma_30,volume_ma_30,spread_ma_30,bid_ratio_ma_30,opportunity_score,market_sentiment,volatility_risk
str,str,str,datetime[μs],f64,f64,f64,f64,f64,u32,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,str,f64
"""bitcoin""","""s281""","""a561""",2025-07-18 04:05:00,0.31,0.99,0.01,0.23,11078000.0,4039,0.131119,0.70289,8223900.0,2184800.0,0.57177,0.790095,-0.258065,98.0,16.220449,6039000.0,0.57177,1.371138,0.029912,5131600.0,1.194278,0.499489,-0.007793,2975600.0,1.143034,0.500154,1.521126,1061400.0,,,91727000000000.0,"""bullish""",98.0
"""bitcoin""","""s281""","""a564""",2025-07-18 04:05:00,0.74,0.99,0.01,0.77,11086000.0,4036,0.296703,0.869138,2170400.0,8246700.0,0.572435,0.208352,0.040541,98.0,16.221216,-6076300.0,0.572435,0.982013,0.052191,3322800.0,1.12207,0.44158,0.026014,1886700.0,1.114694,0.474784,1.516871,693381.073333,,,66152000000000.0,"""bearish""",98.0
"""bitcoin""","""s277""","""a553""",2025-07-18 03:42:00,0.94,0.99,0.01,0.23,7676200.0,4497,0.039296,0.519712,341645.15,5250400.0,0.480416,0.061095,-0.755319,98.0,15.85364,-4908800.0,0.480416,1.718815,9.036839,6884700.0,1.084438,0.537716,5.711947,5749700.0,1.090872,0.500001,5.615875,3963900.0,1.042545,0.500027,64766000000000.0,"""bearish""",98.0
"""bitcoin""","""s277""","""a553""",2025-07-18 03:41:00,0.01,0.99,0.01,0.25,7561200.0,5203,0.062805,0.558922,695505.38,4968600.0,0.496117,0.122793,24.0,98.0,15.83854,-4273100.0,0.496117,1.59593,9.043536,5756600.0,1.026435,0.517852,11.294538,5094200.0,1.051308,0.5,5.646739,3526400.0,1.030168,0.500027,51564000000000.0,"""bearish""",98.0
"""bitcoin""","""s297""","""a593""",2025-07-18 08:34:00,0.12,0.99,0.01,0.11,6403900.0,4584,0.042167,0.491455,1004600.0,4389200.0,0.449288,0.186257,-0.083333,98.0,15.672415,-3384500.0,0.449288,1.683919,8.639413,4572400.0,1.268899,0.436453,4.861241,3665300.0,1.146589,0.470076,5.72832,2196600.0,1.068309,0.487135,36498000000000.0,"""bearish""",98.0
"""bitcoin""","""s274""","""a550""",2025-07-18 02:48:00,0.17,0.99,0.01,0.24,7335100.0,5058,0.065822,0.460243,1410100.0,4365900.0,0.394421,0.244133,0.411765,98.0,15.808178,-2955800.0,0.394421,1.499516,-0.21749,4916900.0,1.0374,0.482247,14.12093,4476300.0,0.972293,0.502486,10.031819,3205000.0,0.965678,0.499329,32511000000000.0,"""bearish""",98.0
"""bitcoin""","""s277""","""a557""",2025-07-18 03:33:00,0.38,0.99,0.01,0.57,9551000.0,5726,0.231513,0.720706,5743800.0,2937200.0,0.489193,0.661654,0.5,98.0,16.072152,2806600.0,0.489193,1.02748,11.031775,3378700.0,0.999825,0.532489,5.693593,2578400.0,1.017712,0.51456,5.779462,1664200.0,1.011701,0.502096,27543000000000.0,"""bullish""",98.0
"""bitcoin""","""s309""","""a622""",2025-07-18 11:48:00,0.99,0.99,0.01,0.27,6567600.0,3237,0.066059,0.493749,452285.33,3158400.0,0.427689,0.125262,-0.727273,98.0,15.697659,-2706200.0,0.427689,1.527987,2.262879,6669000.0,1.173776,0.416324,1.324909,6217300.0,1.090449,0.498653,2.596702,3465300.0,1.114781,0.49919,27157000000000.0,"""bearish""",98.0
"""bitcoin""","""s281""","""a561""",2025-07-18 04:09:00,0.32,0.99,0.01,0.15,8167800.0,5256,0.12967,0.710356,5071200.0,2676700.0,0.580687,0.654521,-0.53125,98.0,15.915709,2394400.0,0.580687,1.382545,-0.013204,3630700.0,1.191707,0.549225,-0.054435,5132300.0,1.156386,0.499845,0.049619,2025900.0,1.091439,0.499916,27039000000000.0,"""bullish""",98.0
"""bitcoin""","""s297""","""a593""",2025-07-18 08:33:00,0.15,0.99,0.01,0.95,5942200.0,4936,0.054876,0.542014,1157800.0,3757100.0,0.487138,0.235575,5.333333,98.0,15.59759,-2599300.0,0.487138,1.632254,9.878632,3669500.0,1.259142,0.446362,6.174202,2674100.0,1.12637,0.479221,5.703473,1840300.0,1.060782,0.489296,25211000000000.0,"""bearish""",98.0


In [12]:
# Analyze opportunity distribution by crypto
opportunity_stats = (
    opportunities
    .group_by("crypto")
    .agg([
        pl.len().alias("total_opportunities"),  # pl.count() is deprecated in recent Polars releases
        pl.col("opportunity_score").mean().alias("avg_opportunity_score"),
        pl.col("spread_pct").mean().alias("avg_spread"),
        pl.col("total_volume").mean().alias("avg_volume"),
        pl.col("market_sentiment").mode().first().alias("dominant_sentiment")
    ])
    .sort("avg_opportunity_score", descending=True)
)

print("Opportunity Analysis by Crypto:")
opportunity_stats

Opportunity Analysis by Crypto:


crypto,total_opportunities,avg_opportunity_score,avg_spread,avg_volume,dominant_sentiment
str,u32,f64,f64,f64,str
"""bitcoin""",6024,444380000000.0,1.03784,1402200.0,"""neutral"""
"""ethereum""",4851,67082000000.0,0.793165,995265.731773,"""neutral"""
"""solana""",3822,54609000000.0,0.761315,788364.487747,"""neutral"""
"""xrp""",3609,30182000000.0,0.964128,761330.77501,"""neutral"""


## 5. KLINES Integration and Correlation Analysis

In [14]:
# Load KLINES data for correlation analysis
crypto_pairs = ["ETHUSDT", "BTCUSDT"]  # Start with these two

correlation_results = {}

for pair in crypto_pairs:
    try:
        print(f"\nAnalyzing {pair}...")
        
        # Load KLINES data (automatically synced with orderbook date range)
        klines_df = processor.load_klines_data(pair)
        print(f"Loaded {klines_df.height:,} KLINES records for {pair}")
        
        # Show time range synchronization info
        time_info = processor.get_data_time_range_info()
        print(f"  Orderbook range: {time_info['orderbook_start']} to {time_info['orderbook_end']}")
        print(f"  KLINES range: {time_info['klines_start']} to {time_info['klines_end']}")
        print(f"  Duration: {time_info['orderbook_duration_hours']:.1f} hours (orderbook), {time_info['klines_duration_hours']:.1f} hours (klines)")
        
        # Validate data coverage
        coverage = processor.validate_data_coverage(klines_df)
        coverage_status = "✓ Good" if all(coverage.values()) else "⚠ Issues detected"
        print(f"  Coverage validation: {coverage_status}")
        
        # Merge with orderbook data
        merged_data = processor.merge_with_klines(features_df, klines_df, pair)
        print(f"Merged dataset has {merged_data.height:,} records")
        
        correlation_results[pair] = merged_data
        
    except Exception as e:
        print(f"Error processing {pair}: {e}")


Analyzing ETHUSDT...


Loaded 5,760 KLINES records for ETHUSDT
  Orderbook range: 2025-07-15 06:02:28.796000 to 2025-07-18 15:04:22.190000
  KLINES range: 2025-07-15 04:02:28.796000 to 2025-07-18 17:04:22.190000
  Duration: 81.0 hours (orderbook), 85.0 hours (klines)
  Coverage validation: ✓ Good
Error processing ETHUSDT: datatypes of join keys don't match - `timestamp_1m`: datetime[μs] on left does not match `timestamp_1m`: datetime[ns] on right

Analyzing BTCUSDT...


Loaded 5,760 KLINES records for BTCUSDT
  Orderbook range: 2025-07-15 06:02:28.796000 to 2025-07-18 15:04:22.190000
  KLINES range: 2025-07-15 04:02:28.796000 to 2025-07-18 17:04:22.190000
  Duration: 81.0 hours (orderbook), 85.0 hours (klines)
  Coverage validation: ✓ Good
Error processing BTCUSDT: datatypes of join keys don't match - `timestamp_1m`: datetime[μs] on left does not match `timestamp_1m`: datetime[ns] on right


🔍 Analyzing candle flipping behavior...
   Processing candle 1/4317
   Processing candle 1001/4317
   Processing candle 2001/4317
   Processing candle 3001/4317
   Processing candle 4001/4317

📊 Historical Analysis (4,317 candles):
   🟢 Green candles: 2,206 (51.1%)
   🔴 Red candles: 2,111 (48.9%)
   🔄 Overall flip rate: 14.7%

🎯 Flip Rate Analysis by Magnitude:
   📈 Up > 0.1% at 45min: 9.7% flip rate (1,838 samples)
   📉 Down <-0.1% at 45min: 9.4% flip rate (1,726 samples)
   📈 Up >0.25% at 45min: 5.2% flip rate (1,332 samples)
   📉 Down <-0.25% at 45min: 5.2% flip rate (1,219 samples)
   📈 Up > 0.5% at 45min: 1.9% flip rate (726 samples)
   📉 Down <-0.5% at 45min: 1.8% flip rate (679 samples)
   📈 Up > 1.0% at 45min: 0.9% flip rate (214 samples)
   📉 Down <-1.0% at 45min: 1.4% flip rate (222 samples)

🕐 Flip Rate by Hour of Day (UTC):
    0:00- 0:59  12.2% flip rate (180 candles)
    1:00- 1:59  13.3% flip rate (180 candles)
    2:00- 2:59  17.8% flip rate (180 candles)
    3:00- 3:59

Unnamed: 0,timestamp,open,price_at_45min,close,delta_45_pct,delta_close_pct,direction_45,direction_close,flipped,green_candle,hour_of_day,day_of_week
0,2025-01-01 00:00:00+00:00,3337.78,3356.09,3363.7,0.548568,0.776564,up,up,False,True,0,Wednesday
1,2025-01-01 01:00:00+00:00,3363.69,3352.46,3346.54,-0.33386,-0.509857,down,down,False,False,1,Wednesday
2,2025-01-01 02:00:00+00:00,3346.54,3357.78,3362.61,0.335869,0.480197,up,up,False,True,2,Wednesday
3,2025-01-01 03:00:00+00:00,3362.61,3355.95,3355.2,-0.19806,-0.220365,down,down,False,False,3,Wednesday
4,2025-01-01 04:00:00+00:00,3355.2,3345.51,3341.14,-0.288805,-0.419051,down,down,False,False,4,Wednesday
5,2025-01-01 05:00:00+00:00,3341.14,3348.09,3345.41,0.208013,0.127801,up,up,False,True,5,Wednesday
6,2025-01-01 06:00:00+00:00,3345.41,3344.48,3346.6,-0.027799,0.035571,down,up,True,True,6,Wednesday
7,2025-01-01 07:00:00+00:00,3346.61,3344.02,3347.12,-0.077392,0.015239,down,up,True,True,7,Wednesday
8,2025-01-01 08:00:00+00:00,3347.11,3339.69,3337.01,-0.221684,-0.301753,down,down,False,False,8,Wednesday
9,2025-01-01 09:00:00+00:00,3337.0,3327.16,3334.64,-0.294876,-0.070722,down,down,False,False,9,Wednesday



✅ Validation Check:
   📊 Same price at 45min and close: 3 (0.1%)
   📊 Different price at 45min and close: 4,314 (99.9%)
✅ Logic appears correct - 45-min prices are properly different from close prices
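The cell that produced the candle-flip output is not included in this notebook. As a rough sketch of the definition the table implies, a candle counts as flipped when its direction at the 45-minute mark differs from its direction at the close, both measured against the open. The function names are hypothetical, and labelling an unchanged price as "down" is an assumption.

```python
def pct_change(open_price, price):
    """Percentage move from the candle open."""
    return (price / open_price - 1) * 100


def direction(delta_pct):
    # Assumption: a zero move is labelled "down"
    return "up" if delta_pct > 0 else "down"


def is_flipped(open_price, price_at_45min, close):
    """True when the 45-minute direction and the closing direction disagree."""
    at_45 = direction(pct_change(open_price, price_at_45min))
    at_close = direction(pct_change(open_price, close))
    return at_45 != at_close


# (open, price_at_45min, close) for the ten candles shown in the table above
candles = [
    (3337.78, 3356.09, 3363.70),
    (3363.69, 3352.46, 3346.54),
    (3346.54, 3357.78, 3362.61),
    (3362.61, 3355.95, 3355.20),
    (3355.20, 3345.51, 3341.14),
    (3341.14, 3348.09, 3345.41),
    (3345.41, 3344.48, 3346.60),
    (3346.61, 3344.02, 3347.12),
    (3347.11, 3339.69, 3337.01),
    (3337.00, 3327.16, 3334.64),
]
flags = [is_flipped(*candle) for candle in candles]
flip_rate = sum(flags) / len(flags)
print(f"flip rate on the sample: {flip_rate:.1%}")
```

On these ten rows the flags reproduce the `flipped` column, and both flips occur on candles whose 45-minute move is under 0.1% in magnitude, in line with the magnitude breakdown above.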


In [None]:
# Analyze correlations between crypto price movements and prediction market behavior
if correlation_results:
    pair = list(correlation_results.keys())[0]  # Use first available pair
    merged_df = correlation_results[pair]
    
    # Calculate price returns and prediction market returns
    analysis_df = merged_df.with_columns([
        # Crypto price returns (from KLINES)
        (pl.col("close") / pl.col("open") - 1).alias("crypto_return"),
        
        # Prediction market returns (from orderbook)
        pl.col("price_return").alias("pred_market_return"),
        
        # Volume correlation
        pl.col("volume").alias("crypto_volume"),
        pl.col("total_volume").alias("pred_market_volume")
    ]).filter(
        pl.col("crypto_return").is_not_null() &
        pl.col("pred_market_return").is_not_null()
    )
    
    print(f"\nCorrelation analysis for {pair}:")
    print(f"Analysis dataset: {analysis_df.height:,} records")
    
    # Convert to pandas for correlation analysis
    corr_data = analysis_df.select([
        "crypto_return", "pred_market_return", "crypto_volume", "pred_market_volume"
    ]).to_pandas()
    
    correlation_matrix = corr_data.corr()
    print("\nCorrelation Matrix:")
    print(correlation_matrix)
else:
    print("No correlation data available for analysis.")

## 6. Strategy Development Insights

In [None]:
# Develop market making strategy insights
strategy_insights = {}

# 1. Optimal spread thresholds by crypto
spread_thresholds = (
    features_df
    .filter(pl.col("total_volume") > 0)
    .group_by("crypto")
    .agg([
        pl.col("spread_pct").quantile(0.75).alias("profitable_spread_threshold"),
        pl.col("total_volume").mean().alias("avg_volume"),
        pl.col("price_return").std().alias("volatility")
    ])
)

print("Optimal Spread Thresholds by Crypto:")
spread_thresholds

In [None]:
# 2. Best trading hours analysis
hourly_profitability = (
    features_df
    .with_columns(pl.col("timestamp").dt.hour().alias("hour"))
    .group_by(["crypto", "hour"])
    .agg([
        pl.col("spread_pct").mean().alias("avg_spread"),
        pl.col("total_volume").mean().alias("avg_volume"),
        pl.col("tick_count").mean().alias("avg_activity"),
        (pl.col("spread_pct") * pl.col("total_volume")).mean().alias("profitability_proxy")
    ])
    .sort(["crypto", "profitability_proxy"], descending=[False, True])
)

print("\nBest Trading Hours (Top 3 per crypto):")
for crypto in sorted(features_df['crypto'].unique().to_list()):
    crypto_hours = hourly_profitability.filter(pl.col("crypto") == crypto).head(3)
    print(f"\n{crypto.upper()}:")
    print(crypto_hours)

In [None]:
# 3. Risk-adjusted opportunity scoring
risk_adjusted_opportunities = (
    opportunities
    .with_columns([
        # Risk-adjusted score = opportunity_score / (1 + volatility_risk)
        (pl.col("opportunity_score") / (1 + pl.col("volatility_risk"))).alias("risk_adjusted_score"),
        
        # Liquidity score
        (pl.col("total_volume") * pl.col("tick_count")).alias("liquidity_score")
    ])
    .sort("risk_adjusted_score", descending=True)
)

print("\nTop 10 Risk-Adjusted Opportunities:")
risk_adjusted_opportunities.select([
    "timestamp", "crypto", "market_slug", "spread_pct", 
    "total_volume", "risk_adjusted_score", "market_sentiment"
]).head(10)
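Every row in the top-10 opportunity table has `volatility_risk` equal to 98.0, so dividing by `1 + volatility_risk` rescales those scores by the same factor and leaves their order unchanged. The sketch below illustrates this with three rows from that table. `risk_adjusted` is a hypothetical helper that mirrors the expression in the cell above.

```python
def risk_adjusted(opportunity_score, volatility_risk):
    """Same expression as the risk_adjusted_score column above."""
    return opportunity_score / (1 + volatility_risk)


# (asset_id, opportunity_score, volatility_risk) from the top-opportunities table
rows = [
    ("a561", 91727000000000.0, 98.0),
    ("a564", 66152000000000.0, 98.0),
    ("a553", 64766000000000.0, 98.0),
]

raw_rank = [asset for asset, _, _ in sorted(rows, key=lambda r: r[1], reverse=True)]
adj_rank = [
    asset
    for asset, _, _ in sorted(rows, key=lambda r: risk_adjusted(r[1], r[2]), reverse=True)
]
print("ranking unchanged:", raw_rank == adj_rank)
```

The adjustment only reorders opportunities when `volatility_risk` varies across rows, so it is worth checking the spread of that column before relying on the adjusted ranking.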

## 7. Market Making Bot Strategy Recommendations

In [None]:
# Generate comprehensive strategy recommendations
print("=" * 60)
print("POLYMARKET MARKET MAKING STRATEGY RECOMMENDATIONS")
print("=" * 60)

# Best performing cryptos
best_cryptos = (
    opportunity_stats
    .sort("avg_opportunity_score", descending=True)
    .head(2)
)

print("\n1. PRIORITY CRYPTOCURRENCIES:")
for row in best_cryptos.iter_rows(named=True):
    crypto = row['crypto']
    score = row['avg_opportunity_score']
    spread = row['avg_spread'] * 100
    print(f"   • {crypto.upper()}: Avg Score {score:.2f}, Avg Spread {spread:.2f}%")

print("\n2. OPTIMAL SPREAD THRESHOLDS:")
for row in spread_thresholds.iter_rows(named=True):
    crypto = row['crypto']
    threshold = row['profitable_spread_threshold'] * 100
    volatility = row['volatility']
    print(f"   • {crypto.upper()}: Min Spread {threshold:.2f}%, Volatility {volatility:.4f}")

print("\n3. BEST TRADING HOURS:")
print("   • Based on profitability proxy (spread × volume)")
print("   • Focus on hours with high volume AND wide spreads")

print("\n4. KEY MARKET MAKING PARAMETERS:")
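total_opportunities = opportunities.height
# The data spans about 81 hours (~3.4 days), so derive the span from the timestamps
data_span_days = (orderbook_df["timestamp"].max() - orderbook_df["timestamp"].min()).total_seconds() / 86400
avg_daily_opportunities = total_opportunities / data_span_days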
print(f"   • Expected daily opportunities: ~{avg_daily_opportunities:.0f}")
print(f"   • Minimum spread threshold: 2%")
print(f"   • Volume threshold: Above 10-min moving average")
print(f"   • Risk management: Monitor volatility_risk metric")

print("\n5. CORRELATION INSIGHTS:")
if correlation_results:
    print("   • Check the correlation matrix before treating crypto moves as a signal")
    print("   • If returns are correlated, test crypto price momentum as a leading indicator")
    print("   • Monitor volume correlations for liquidity timing")
else:
    print("   • No merged KLINES data available (see the join-key dtype error above)")
    print("   • Recommend collecting real-time crypto price feeds")

## 8. Next Steps for Bot Implementation

Based on this analysis, here are the recommended next steps:

### Immediate Actions:
1. **Deploy monitoring system** for the top-performing crypto pairs
2. **Set up real-time data feeds** for both orderbook and KLINES data
3. **Implement risk management** based on volatility_risk metrics

### Bot Architecture:
1. **Data ingestion layer** - Real-time orderbook and price feeds
2. **Feature calculation engine** - Real-time market microstructure features
3. **Opportunity detection** - Based on our scoring algorithms
4. **Risk management** - Position sizing and volatility controls
5. **Execution engine** - Automated order placement and management

### Performance Monitoring:
1. **Track realized spreads** vs predicted spreads
2. **Monitor fill rates** and execution quality
3. **Measure correlation accuracy** between crypto and prediction markets
4. **Analyze profitability** by time of day and crypto pair

In [None]:
# Save processed data for bot implementation
print("Saving processed data for bot implementation...")

# Save opportunities data (create the output directory first if it is missing)
os.makedirs("../data", exist_ok=True)
opportunities.write_csv("../data/market_making_opportunities.csv")

# Save feature data
features_df.write_csv("../data/orderbook_features.csv")

# Save strategy parameters
spread_thresholds.write_csv("../data/spread_thresholds.csv")
hourly_profitability.write_csv("../data/hourly_profitability.csv")

print("Data saved successfully!")
print("\nFiles created:")
print("- market_making_opportunities.csv")
print("- orderbook_features.csv") 
print("- spread_thresholds.csv")
print("- hourly_profitability.csv")