# 📊 Data Exploration: Euler Subgraphs and Synthetic EulerSwap Data

This notebook explores the available data sources for our delta-neutral LP strategy:

1. **Real Euler Finance Data** from The Graph Protocol subgraphs
2. **Synthetic EulerSwap AMM Data** for backtesting demonstration
3. **Data Quality Analysis** and validation
4. **Market Structure Understanding** for strategy development

⚠️ **Educational Purpose**: This notebook uses synthetic data for demonstration. Real trading requires live data sources.

In [None]:
# Standard library imports
import sys
import warnings
from datetime import datetime, timedelta

# Third-party imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import seaborn as sns
from plotly.subplots import make_subplots

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

# Project imports (add the parent directory to the path so `src` is importable)
sys.path.append("..")
from src.data.data_loader import DataStore
from src.data.preprocessor import EulerDataProcessor
from src.data.subgraph_client import EulerSubgraphClient

# Configure plotting
plt.style.use("seaborn-v0_8")
sns.set_palette("husl")
%matplotlib inline

## 🔍 Available Data Sources Investigation

Let's first investigate what data is available from Euler's subgraphs and understand why we need synthetic data.
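
For orientation, subgraphs are queried by sending a GraphQL document in a JSON POST body. The sketch below only builds such a body and does not send it. The endpoint URL, the helper name `build_subgraph_query`, and the entity and field names (`vaults`, `totalBorrows`) are hypothetical placeholders, not taken from the Euler subgraph schemas or from `EulerSubgraphClient`.

```python
import json

# Hypothetical placeholder; substitute the real subgraph endpoint you are using.
SUBGRAPH_URL = "https://example.invalid/subgraphs/name/euler"


def build_subgraph_query(entity: str, fields: list[str], first: int = 100) -> dict:
    """Build the JSON body for a GraphQL POST request to a subgraph."""
    field_block = "\n    ".join(fields)
    query = f"{{\n  {entity}(first: {first}) {{\n    {field_block}\n  }}\n}}"
    return {"query": query}


# Entity and field names are illustrative; check the subgraph schema for real ones.
payload = build_subgraph_query("vaults", ["id", "totalBorrows"], first=5)
print(json.dumps(payload, indent=2))

# To send it (requires network access and the `requests` package):
# response = requests.post(SUBGRAPH_URL, json=payload, timeout=30)
# data = response.json()["data"]
```

Inspecting a subgraph's schema this way is one route to confirming whether it exposes swap or pool entities at all.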

In [None]:
# Initialize the subgraph client
client = EulerSubgraphClient()

print("🔬 Investigating Euler Subgraphs")
print("=" * 50)
print("🎯 Goal: Find EulerSwap AMM data for delta-neutral strategies")
print("")
print("Available subgraphs:")
print("1. Official Euler Finance (2 years old): Lending protocol data")
print("2. Community Euler (4 months old): Vault and EVault data")
print("")
print("⚠️  Finding: Neither contains EulerSwap AMM/DEX data")
print("💡 Solution: Generate realistic synthetic data for demonstration")

## 🎭 Synthetic EulerSwap Data Generation

Since EulerSwap is very new and not yet indexed by available subgraphs, we'll generate realistic synthetic data that demonstrates the concepts while being transparent about its educational nature.
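
To make "realistic synthetic data" concrete, here is a minimal sketch of one common way to simulate an hourly price path: geometric Brownian motion. This is not necessarily how `generate_synthetic_eulerswap_data` works internally. The function name `simulate_hourly_prices` and the volatility, drift, and seed values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def simulate_hourly_prices(
    start: str,
    hours: int,
    initial_price: float = 2000.0,
    annual_vol: float = 0.8,  # assumed 80% annualized volatility
    annual_drift: float = 0.0,  # assumed zero drift
    seed: int = 42,
) -> pd.DataFrame:
    """Simulate an hourly price path with geometric Brownian motion."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / (24 * 365)  # one hour expressed in years

    # Log-return for each hourly step after the first observation
    shocks = rng.standard_normal(hours - 1)
    log_steps = (annual_drift - 0.5 * annual_vol**2) * dt + annual_vol * np.sqrt(dt) * shocks

    # The path starts exactly at initial_price, then accumulates log-returns
    log_path = np.concatenate([[0.0], np.cumsum(log_steps)])
    prices = initial_price * np.exp(log_path)

    timestamps = pd.Timestamp(start) + pd.to_timedelta(np.arange(hours), unit="h")
    return pd.DataFrame({"timestamp": timestamps, "price_ratio": prices})


demo_prices = simulate_hourly_prices("2024-01-01", hours=24 * 30)
print(demo_prices.head())
```

Because prices are exponentials of a random walk, they stay strictly positive, which a plain additive random walk would not guarantee.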

In [None]:
# Generate synthetic data for analysis
end_date = datetime.now()
start_date = end_date - timedelta(days=30)  # 30 days of data

print(f"📅 Generating synthetic data from {start_date.date()} to {end_date.date()}")

# Generate the synthetic dataset
synthetic_data = client.generate_synthetic_eulerswap_data(
    start_date=start_date,
    end_date=end_date,
    base_asset="WETH",
    quote_asset="USDC",
    initial_price=2000.0,
    hourly_frequency=True,
)

print("\n📊 Dataset Overview:")
print(f"Shape: {synthetic_data.shape}")
print(
    f"Date range: {synthetic_data['timestamp'].min()} to {synthetic_data['timestamp'].max()}"
)
print(f"Columns: {list(synthetic_data.columns)}")

## 📈 Price Movement Analysis

In [None]:
# Create interactive price chart
fig = make_subplots(
    rows=3,
    cols=1,
    subplot_titles=("WETH/USDC Price", "Trading Volume", "Total Liquidity"),
    vertical_spacing=0.1,
    specs=[
        [{"secondary_y": False}],
        [{"secondary_y": False}],
        [{"secondary_y": False}],
    ],
)

# Price chart
fig.add_trace(
    go.Scatter(
        x=synthetic_data["timestamp"],
        y=synthetic_data["price_ratio"],
        name="WETH Price",
        line=dict(color="blue", width=2),
    ),
    row=1,
    col=1,
)

# Volume chart
fig.add_trace(
    go.Bar(
        x=synthetic_data["timestamp"],
        y=synthetic_data["swap_volume_usd"],
        name="Volume USD",
        marker_color="green",
        opacity=0.7,
    ),
    row=2,
    col=1,
)

# Liquidity chart
fig.add_trace(
    go.Scatter(
        x=synthetic_data["timestamp"],
        y=synthetic_data["total_liquidity_usd"],
        name="Total Liquidity",
        fill="tozeroy",  # only trace in this subplot, so fill down to zero
        line=dict(color="purple"),
    ),
    row=3,
    col=1,
)

fig.update_layout(
    title="📊 Synthetic EulerSwap Market Data Overview", height=800, showlegend=True
)

fig.show()

# Basic statistics
print("\n📊 Price Statistics:")
print(f"Mean Price: ${synthetic_data['price_ratio'].mean():.2f}")
print(
    f"Price Range: ${synthetic_data['price_ratio'].min():.2f} - ${synthetic_data['price_ratio'].max():.2f}"
)
print(
    f"Daily Volatility: {synthetic_data['price_ratio'].pct_change().std() * np.sqrt(24):.2%}"
)

print("\n💰 Volume Statistics:")
print(f"Average Hourly Volume: ${synthetic_data['swap_volume_usd'].mean():,.0f}")
print(f"Total Volume (30d): ${synthetic_data['swap_volume_usd'].sum():,.0f}")

print("\n🏦 Liquidity Statistics:")
print(f"Average TVL: ${synthetic_data['total_liquidity_usd'].mean():,.0f}")
print(
    f"TVL Range: ${synthetic_data['total_liquidity_usd'].min():,.0f} - ${synthetic_data['total_liquidity_usd'].max():,.0f}"
)

## 🏦 EulerSwap Vault Analysis

Understanding the vault integration that makes EulerSwap unique for delta-neutral strategies.
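
When comparing trading fees with vault yield, it helps to be explicit about how hourly fee income is annualized. The sketch below uses a simple, non-compounded annualization: fee rate times hourly volume divided by liquidity, scaled by 24 × 365. The function name and the input numbers are illustrative assumptions, not values from the dataset.

```python
def annualized_fee_rate(
    fee_rate: float, hourly_volume_usd: float, liquidity_usd: float
) -> float:
    """Return the simple (non-compounded) annualized fee yield as a fraction."""
    hourly_yield = fee_rate * hourly_volume_usd / liquidity_usd
    return hourly_yield * 24 * 365


# Illustrative inputs: 0.3% fee, $50k hourly volume, $10M liquidity
rate = annualized_fee_rate(0.003, 50_000, 10_000_000)

# The "%" format spec multiplies by 100 itself, so pass the raw fraction
print(f"Annualized fee yield: {rate:.1%}")  # 13.1%
```

Keeping the value as a fraction until the final print avoids scaling by 100 twice when the `%` format spec is used.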

In [None]:
# Analyze vault utilization and borrowing capacity
fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=(
        "Vault Utilization Rate",
        "Available Borrow Capacity",
        "Vault APY Comparison",
        "Trading Fees vs Vault Yield",
    ),
    specs=[
        [{"secondary_y": False}, {"secondary_y": False}],
        [{"secondary_y": False}, {"secondary_y": False}],
    ],
)

# Vault utilization
fig.add_trace(
    go.Scatter(
        x=synthetic_data["timestamp"],
        y=synthetic_data["vault_utilization_rate"] * 100,
        name="Utilization %",
        line=dict(color="red"),
    ),
    row=1,
    col=1,
)

# Available borrow capacity
fig.add_trace(
    go.Scatter(
        x=synthetic_data["timestamp"],
        y=synthetic_data["available_borrow_usd"],
        name="Borrow Capacity",
        fill="tozeroy",
        line=dict(color="green"),
    ),
    row=1,
    col=2,
)

# APY comparison
fig.add_trace(
    go.Scatter(
        x=synthetic_data["timestamp"],
        y=synthetic_data["vault_apy_asset0"] * 100,
        name="WETH Vault APY %",
        line=dict(color="blue"),
    ),
    row=2,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=synthetic_data["timestamp"],
        y=synthetic_data["vault_apy_asset1"] * 100,
        name="USDC Vault APY %",
        line=dict(color="orange"),
    ),
    row=2,
    col=1,
)

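# Trading fees vs vault yield: simple (non-compounded) annualization of
# hourly fee income relative to liquidity, expressed in percent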
trading_fee_apy = (
    synthetic_data["trading_fee_rate"]
    * synthetic_data["swap_volume_usd"]
    / synthetic_data["total_liquidity_usd"]
    * 24
    * 365
) * 100

fig.add_trace(
    go.Scatter(
        x=synthetic_data["timestamp"],
        y=trading_fee_apy,
        name="Trading Fee APY %",
        line=dict(color="purple"),
    ),
    row=2,
    col=2,
)

fig.update_layout(
    title="🏦 EulerSwap Vault Integration Analysis", height=600, showlegend=True
)

fig.show()

print("\n🏦 Vault Statistics:")
print(
    f"Average Vault Utilization: {synthetic_data['vault_utilization_rate'].mean():.1%}"
)
print(
    f"Average Available Borrow: ${synthetic_data['available_borrow_usd'].mean():,.0f}"
)
print(f"Average WETH Vault APY: {synthetic_data['vault_apy_asset0'].mean():.1%}")
print(f"Average USDC Vault APY: {synthetic_data['vault_apy_asset1'].mean():.1%}")
# trading_fee_apy is already in percent, so format as a plain number
print(f"Average Trading Fee APY: {trading_fee_apy.mean():.1f}%")

## 📊 Data Quality Assessment

Let's perform a comprehensive data quality check to ensure our synthetic data is suitable for backtesting.
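
As a rough idea of what such a check can involve, the sketch below counts missing values and flags outliers with the interquartile-range (IQR) rule. It is an illustration only and makes no claim about how `EulerDataProcessor.validate_data` is implemented. The function name `basic_quality_checks`, the 1.5 × IQR threshold, and the sample values are assumptions.

```python
import numpy as np
import pandas as pd


def basic_quality_checks(df: pd.DataFrame, iqr_multiplier: float = 1.5) -> dict:
    """Count missing values and IQR-rule outliers per column."""
    missing = {col: int(n) for col, n in df.isna().sum().items() if n > 0}

    outliers = {}
    for col in df.select_dtypes(include="number").columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - iqr_multiplier * iqr, q3 + iqr_multiplier * iqr
        # Comparisons with NaN are False, so missing values are not counted here
        count = int(((df[col] < lower) | (df[col] > upper)).sum())
        if count:
            outliers[col] = count

    return {"total_records": len(df), "missing_values": missing, "outliers": outliers}


sample = pd.DataFrame(
    {
        "price_ratio": [2000.0, 2001.0, 1999.0, 2002.0, np.nan, 9000.0],
        "swap_volume_usd": [1e5, 1.1e5, 0.9e5, 1.05e5, 1e5, 0.95e5],
    }
)
report = basic_quality_checks(sample)
print(report)
```

In this sample, the `NaN` is reported as one missing value and the 9000.0 price as one outlier, while the volume column passes both checks.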

In [None]:
# Initialize data processor for quality assessment
processor = EulerDataProcessor()

# Run quality validation
quality_report = processor.validate_data(synthetic_data)

print("🔍 Data Quality Report")
print("=" * 40)
print(f"📊 Overall Quality Score: {quality_report.quality_score:.1f}/100")
print(f"📝 Total Records: {quality_report.total_records:,}")
print(
    f"📅 Date Range: {quality_report.date_range[0]} to {quality_report.date_range[1]}"
)

if quality_report.missing_values:
    print("\n⚠️  Missing Values:")
    for col, count in quality_report.missing_values.items():
        print(f"   {col}: {count} ({count/quality_report.total_records*100:.1f}%)")
else:
    print("\n✅ No missing values found")

if quality_report.outliers:
    print("\n📊 Outliers Detected:")
    for col, count in quality_report.outliers.items():
        print(f"   {col}: {count} outliers")
else:
    print("\n✅ No significant outliers detected")

if quality_report.warnings:
    print("\n⚠️  Warnings:")
    for warning in quality_report.warnings:
        print(f"   • {warning}")

if quality_report.recommendations:
    print("\n💡 Recommendations:")
    for rec in quality_report.recommendations:
        print(f"   • {rec}")

## 📈 Market Microstructure Analysis

Understanding price-volume relationships and market efficiency for strategy development.
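
The analysis relies on two return definitions and on scaling hourly volatility to a daily figure. The short sketch below, using made-up prices, shows how simple and log returns relate and why hourly volatility is multiplied by √24. That scaling assumes hourly returns are uncorrelated.

```python
import numpy as np
import pandas as pd

# Made-up hourly prices for illustration
prices = pd.Series([2000.0, 2010.0, 1990.0, 2005.0, 2005.0])

simple_returns = prices.pct_change()  # (p_t / p_{t-1}) - 1
log_returns = np.log(prices).diff()  # log(p_t) - log(p_{t-1})

# Log returns add up across time: their sum equals the log of the total price ratio
total_log_return = log_returns.sum()

# Variance of uncorrelated returns adds over 24 hours, so std scales by sqrt(24)
hourly_vol = log_returns.std()
daily_vol = hourly_vol * np.sqrt(24)

print(f"Total log return: {total_log_return:.6f}")
print(f"Hourly vol: {hourly_vol:.4%}, daily vol: {daily_vol:.4%}")
```

For small hourly moves the two return definitions are nearly identical, so the choice matters mainly when returns are summed over long windows.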

In [None]:
# Calculate returns and analyze market microstructure
synthetic_data["returns"] = synthetic_data["price_ratio"].pct_change()
synthetic_data["log_returns"] = np.log(synthetic_data["price_ratio"]).diff()
synthetic_data["volume_ratio"] = (
    synthetic_data["swap_volume_usd"]
    / synthetic_data["swap_volume_usd"].rolling(24).mean()
)

# Create microstructure analysis plots
fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=(
        "Returns Distribution",
        "Price vs Volume",
        "Volatility Clustering",
        "Volume Profile",
    ),
    specs=[
        [{"secondary_y": False}, {"secondary_y": False}],
        [{"secondary_y": False}, {"secondary_y": False}],
    ],
)

# Returns distribution
fig.add_trace(
    go.Histogram(
        x=synthetic_data["returns"].dropna() * 100,
        nbinsx=50,
        name="Returns %",
        marker_color="blue",
        opacity=0.7,
    ),
    row=1,
    col=1,
)

# Price vs Volume scatter
fig.add_trace(
    go.Scatter(
        x=synthetic_data["swap_volume_usd"] / 1e6,  # In millions
        y=synthetic_data["price_ratio"],
        mode="markers",
        name="Price vs Volume",
        marker=dict(color="green", size=4, opacity=0.6),
    ),
    row=1,
    col=2,
)

# Volatility clustering (rolling volatility)
rolling_vol = synthetic_data["returns"].rolling(24).std() * np.sqrt(24) * 100
fig.add_trace(
    go.Scatter(
        x=synthetic_data["timestamp"],
        y=rolling_vol,
        name="24h Volatility %",
        line=dict(color="red"),
    ),
    row=2,
    col=1,
)

# Volume profile
volume_bins = pd.cut(synthetic_data["swap_volume_usd"], bins=20)
# observed=False keeps empty bins regardless of the pandas version's default
volume_profile = synthetic_data.groupby(volume_bins, observed=False)[
    "price_ratio"
].count()
bin_centers = [
    (interval.left + interval.right) / 2 for interval in volume_profile.index
]

fig.add_trace(
    go.Bar(
        x=volume_profile.values,
        y=[f"{center/1e6:.1f}M" for center in bin_centers],
        orientation="h",
        name="Volume Profile",
        marker_color="purple",
    ),
    row=2,
    col=2,
)

fig.update_layout(
    title="📊 Market Microstructure Analysis", height=600, showlegend=True
)

fig.show()

# Market statistics
print("\n📊 Market Microstructure Statistics:")
print("Return Statistics:")
print(f"  Mean Return: {synthetic_data['returns'].mean()*100:.3f}%")
print(f"  Return Volatility: {synthetic_data['returns'].std()*100:.3f}%")
print(f"  Skewness: {synthetic_data['returns'].skew():.3f}")
print(f"  Excess Kurtosis: {synthetic_data['returns'].kurtosis():.3f}")

print(
    f"\nVolume-Price Correlation: {synthetic_data['swap_volume_usd'].corr(synthetic_data['price_ratio']):.3f}"
)
print(
    f"Volume-Volatility Correlation: {synthetic_data['swap_volume_usd'].corr(rolling_vol):.3f}"
)

## 💾 Data Storage for Strategy Development

Save the processed data for use in subsequent notebooks.
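
One simple way to keep a dataset and its disclaimers together is to write the metadata to a JSON file next to the data file. The sketch below illustrates that pattern with a CSV and a temporary directory. It is not the `DataStore` implementation; the function name `save_with_metadata` and the `.meta.json` naming are assumptions.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def save_with_metadata(
    df: pd.DataFrame, name: str, directory: str, metadata: dict
) -> Path:
    """Write df to <name>.csv and metadata to <name>.meta.json in directory."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)

    data_path = target / f"{name}.csv"
    df.to_csv(data_path, index=False)

    meta_path = target / f"{name}.meta.json"
    meta_path.write_text(json.dumps(metadata, indent=2))
    return data_path


frame = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00"]),
        "price_ratio": [2000.0, 2004.5],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = save_with_metadata(
        frame, "demo", tmp, {"data_type": "synthetic", "frequency": "hourly"}
    )
    # Read both files back to confirm the round trip
    restored = pd.read_csv(path, parse_dates=["timestamp"])
    restored_meta = json.loads(path.with_name("demo.meta.json").read_text())

print(restored)
print(restored_meta)
```

Storing the warning text in the metadata means any later notebook that loads the file can surface the synthetic-data disclaimer automatically.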

In [None]:
# Save synthetic data for strategy development
store = DataStore()

dataset_name = f"synthetic_eulerswap_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

metadata = {
    "data_type": "synthetic_eulerswap_amm",
    "purpose": "delta_neutral_strategy_backtesting",
    "base_asset": "WETH",
    "quote_asset": "USDC",
    "initial_price": 2000.0,
    "frequency": "hourly",
    "quality_score": quality_report.quality_score,
    "warning": "SYNTHETIC_DATA_FOR_EDUCATIONAL_DEMONSTRATION_ONLY",
    "disclaimer": "This data is artificially generated for educational purposes and should not be used for real trading decisions",
}

file_path = store.save_dataset(
    synthetic_data, dataset_name, category="processed", metadata=metadata
)

print(f"\n💾 Dataset saved as: {dataset_name}")
print(f"📁 File path: {file_path}")
print("📊 Ready for strategy development in next notebooks")

## 📋 Summary and Next Steps

### Key Findings:

1. **🔍 Data Availability**: Real EulerSwap AMM data is not yet available in The Graph subgraphs
2. **🎭 Synthetic Solution**: Generated realistic synthetic data with proper disclaimers
3. **📊 Data Quality**: High-quality synthetic dataset suitable for educational backtesting
4. **🏦 Vault Integration**: Modeled EulerSwap's unique vault borrowing capabilities
5. **📈 Market Structure**: Realistic price movements and volume patterns

### ⚠️ Important Disclaimers:

- This is **synthetic data** generated for educational demonstration
- Results are **intentionally overfit** to the generated patterns, since the data contains no real market history
- **DO NOT use** for real trading decisions
- Designed for **learning delta-neutral concepts** and backtesting methodology

### 🎯 Next Steps:

1. **Notebook 02**: Data processing and feature engineering
2. **Notebook 03**: Delta-neutral strategy development
3. **Notebook 04**: VectorBT backtesting implementation
4. **Notebook 05**: Hyperparameter optimization with Optuna
5. **Notebook 06**: Results dashboard and analysis

The synthetic dataset is now ready for strategy development! 🚀