# Phase 2B: Statistical Analysis by Asset Class

Comprehensive statistical analysis of DeFi vs TradFi trading for all asset classes:
- **Traditional Commodities** (Gold, Silver, Oil, Natural Gas)
- **Traditional Equities** (AAPL, GOOGL, MSFT, NVDA, TSLA, etc.)
- **Crypto Coins** (BTC, ETH, SOL, LINK, ADA)

**Analysis Period:** July 1, 2025 to February 15, 2026
- Assets with less historical data will show their available data within this range
- Inception dates are tracked for all assets

**Statistical Analyses:**
1. **Volume Statistics** - Average daily notional volumes and volume ratios (DeFi/TradFi)
2. **Daily Volume T-Test** - Rolling 3-day window analysis to detect statistically significant volume patterns
3. **Cross-Correlation** - Lag analysis (-7 to +7 days) to identify lead/lag relationships between DeFi and TradFi volumes
4. **Price Correlation** - Correlation coefficients and tracking error analysis
5. **Asset Type Summary** - Aggregated statistics by asset class

---

## Implementation Notes

This notebook uses utility functions from `utils.statistical_analysis` for clean, maintainable code:

**Data Loading:**
- `load_all_assets_filtered()` - Loads assets with configurable date range filtering
- `group_assets_by_type()` - Groups assets by type with optional sorting

**Analysis Functions:**
- `create_volume_statistics_table()` - Generates volume statistics tables
- `plot_daily_volume_ttest()` - Creates T-test plots with rolling window analysis
- `plot_cross_correlation()` - Generates cross-correlation plots with customizable lag ranges
- `analyze_assets_by_type()` - Generic function to apply any analysis to all assets of a type
- `create_price_correlation_table()` - Comprehensive price correlation analysis
- `create_asset_type_summary()` - Aggregated statistics by asset class

**Key Features:**
- **Parameterized Analysis** - Functions adapt labels, colors, and calculations based on asset type
- **Trading Day Windows** - Rolling windows use actual trading days (non-zero, non-NaN), automatically excluding weekends/holidays for TradFi and pre-launch periods for crypto
- **Flexible Configuration** - Window size, confidence levels, and lag ranges are configurable
- **CSV Export** - T-test results automatically exported to `output/Phase 2B/Daily Volume Analysis/`
- **DRY Principle** - Single source of truth for each analysis type eliminates code duplication
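
The generic-dispatch idea behind `analyze_assets_by_type()` can be sketched roughly as follows. This is only an illustration of the pattern, not the actual implementation: the helper names `apply_to_type` and `describe`, and the assumption that assets are grouped in a plain `dict` of lists, are hypothetical.

```python
from typing import Any, Callable


def apply_to_type(
    assets_by_type: dict[str, list[dict]],
    asset_type: str,
    analysis: Callable[..., Any],
    **kwargs: Any,
) -> list[Any]:
    """Run one analysis callable on every asset of the given type (hypothetical helper)."""
    results = []
    for asset in assets_by_type.get(asset_type, []):
        # Keyword arguments (window size, lag range, ...) pass straight through,
        # so one dispatcher serves every analysis function.
        results.append(analysis(asset, **kwargs))
    return results


def describe(asset: dict, prefix: str = "") -> str:
    """Toy analysis function used only to exercise the dispatcher."""
    return f"{prefix}{asset['name']}"


grouped = {"Crypto Coin": [{"name": "BTC"}, {"name": "ETH"}]}
labels = apply_to_type(grouped, "Crypto Coin", describe, prefix="asset: ")
print(labels)
```

Passing the analysis function as an argument is what lets a single loop serve the T-test and cross-correlation cells alike.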

**Statistical Methodology:**
- **T-Test (95% CI):** t-critical = ±4.303 (df=2) for 3-day window
- **Cross-Correlation:** Pearson correlation with lag shifts
- **Trading Days Only:** All rolling calculations exclude non-trading periods

In [None]:
import os

from utils import (
    load_all_assets_filtered,
    group_assets_by_type,
    create_volume_statistics_table,
    plot_daily_volume_ttest,
    plot_cross_correlation,
    analyze_assets_by_type,
    create_price_correlation_table,
    create_asset_type_summary,
)

PHASE_1B_DIR = os.path.join("output", "Phase 1B")

# Date range for analysis (July 1, 2025 to Feb 15, 2026)
START_DATE = "2025-07-01"
END_DATE = "2026-02-15"

## Load All Assets from Phase 1B

In [None]:
# Load all assets from Phase 1B output directory with date filtering
assets = load_all_assets_filtered(PHASE_1B_DIR, START_DATE, END_DATE)
print(f"\nTotal assets loaded: {len(assets)}")
print(f"Analysis period: {START_DATE} to {END_DATE}")

## 1. Volume Statistics and Comparisons

Calculate summary statistics for notional volumes across all assets.
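
As a rough illustration of what one row of such a table might hold, the sketch below computes average daily notional volume per venue and the DeFi/TradFi ratio over overlapping days. The column names `defi_volume` and `tradfi_volume` and the helper `volume_summary` are assumptions made for this example; they are not the schema or implementation used by `create_volume_statistics_table()`, and the input values are toy numbers.

```python
import numpy as np
import pandas as pd


def volume_summary(df: pd.DataFrame) -> dict:
    """Average daily notional volumes and their ratio on overlapping days.

    Assumes hypothetical columns 'defi_volume' and 'tradfi_volume' (USD notional).
    """
    # Keep only days where both venues traded; NaN > 0 is False, so NaN rows drop too.
    both = df[(df["defi_volume"] > 0) & (df["tradfi_volume"] > 0)]
    if both.empty:
        return {"overlap_days": 0, "avg_defi": np.nan, "avg_tradfi": np.nan, "ratio": np.nan}
    avg_defi = both["defi_volume"].mean()
    avg_tradfi = both["tradfi_volume"].mean()
    return {
        "overlap_days": len(both),
        "avg_defi": avg_defi,
        "avg_tradfi": avg_tradfi,
        "ratio": avg_defi / avg_tradfi,  # DeFi/TradFi volume ratio
    }


# Toy input: one zero-volume day and one missing day on the DeFi side.
example = pd.DataFrame(
    {
        "defi_volume": [10.0, 0.0, 30.0, np.nan],
        "tradfi_volume": [100.0, 200.0, 300.0, 400.0],
    }
)
summary = volume_summary(example)
print(summary)
```

Restricting both averages to the same overlapping days keeps the ratio from being skewed by days on which only one venue was active.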

In [None]:
# Group assets by type and sort by number of overlapping days
assets_by_type = group_assets_by_type(assets, sort_by_overlap=True)

In [None]:
print("\n" + "="*100)
print("TRADITIONAL COMMODITY SPOT TRADING VOLUMES (USD NOTIONAL)")
print("="*100)

df = create_volume_statistics_table(assets_by_type, "Traditional Commodity")
if df is not None:
    display(df)

In [None]:
print("\n" + "="*100)
print("TRADITIONAL EQUITY SPOT TRADING VOLUMES (USD NOTIONAL)")
print("="*100)

df = create_volume_statistics_table(assets_by_type, "Traditional Equity")
if df is not None:
    display(df)

In [None]:
print("\n" + "="*100)
print("CRYPTO COIN SPOT TRADING VOLUMES (USD NOTIONAL)")
print("="*100)

df = create_volume_statistics_table(assets_by_type, "Crypto Coin")
if df is not None:
    display(df)

## 2. Daily Volume T-Test Analysis: Rolling 3-Day Window

Compute t-scores using a rolling 3-day **trading day** window to detect statistically significant volume patterns.

**Methodology:**
- **Window Type:** 3 most recent **trading days** (non-zero, non-NaN values), not calendar days
- **Calculation:** For each day, compare current volume against mean and standard deviation of previous 3 trading days
- **T-Score Formula:** `t = (today_volume - window_mean) / (window_std / sqrt(3))`

**Automatic Handling of Non-Trading Periods:**
- **TradFi equities/commodities:** Weekends and holidays automatically excluded
- **Crypto coins:** Pre-DEX-launch zeros automatically excluded
- **All assets:** Statistics based on actual market activity only

**Statistical Significance:**
- **95% Confidence Interval:** t-critical = ±4.303 (df=2)
- 🟢 **Normal:** |t| < 4.303 (within expected range)
- 🔴 **Significant:** |t| ≥ 4.303 (statistically unusual volume, potential pattern or anomaly)

**Implementation:**
- Uses `plot_daily_volume_ttest()` from `utils.statistical_analysis`
- Exports detailed CSV files with t-scores, rolling means/stds, and window indices
- Creates 2×2 subplot: T-scores (top) and actual volumes with rolling means (bottom)
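
The methodology above can be sketched in a few lines of pandas. This is a minimal illustration under stated assumptions, not the code inside `plot_daily_volume_ttest()`: the function name `rolling_t_scores` is hypothetical, the standard deviation is assumed to be the sample standard deviation (ddof=1, consistent with df=2), and the input series is toy data.

```python
import numpy as np
import pandas as pd
from scipy import stats


def rolling_t_scores(volume: pd.Series, window: int = 3, confidence: float = 0.95) -> pd.DataFrame:
    """t-score of each trading day's volume against the previous `window` trading days."""
    # Keep trading days only: the comparison drops zeros and NaN alike.
    traded = volume[volume > 0]
    # Shift by one so the window covers the previous days and excludes today.
    previous = traded.shift(1)
    window_mean = previous.rolling(window).mean()
    # Sample std (ddof=1); a zero std would make the t-score undefined.
    window_std = previous.rolling(window).std().replace(0.0, np.nan)
    t_score = (traded - window_mean) / (window_std / np.sqrt(window))
    # Two-tailed critical value with window - 1 degrees of freedom.
    t_critical = stats.t.ppf(1 - (1 - confidence) / 2, df=window - 1)
    return pd.DataFrame(
        {
            "volume": traded,
            "window_mean": window_mean,
            "window_std": window_std,
            "t_score": t_score,
            "significant": t_score.abs() >= t_critical,
        }
    )


# Toy series: one zero-volume day and one missing day are skipped.
dates = pd.date_range("2025-07-01", periods=6, freq="D")
volume = pd.Series([100.0, 110.0, 0.0, 90.0, np.nan, 400.0], index=dates)
scores = rolling_t_scores(volume)
print(scores)
```

In this toy input the last day is compared against the three prior trading days (100, 110, 90), whose mean is 100 and sample standard deviation is 10, so its t-score is about 51.96 and is flagged as significant. The first three trading days have no full window and get NaN.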

In [None]:
from scipy import stats

WINDOW_3D = 3  # rolling window length in trading days
DF = WINDOW_3D - 1  # degrees of freedom = 2
T_CRITICAL_95 = stats.t.ppf(0.975, DF)  # 95% confidence interval, two-tailed

# Create output directory for CSV exports (no-op if it already exists)
output_dir = os.path.join("output", "Phase 2B", "Daily Volume Analysis")
os.makedirs(output_dir, exist_ok=True)

# Traditional Commodity Daily Volume T-Test
print("="*100)
print("TRADITIONAL COMMODITY DAILY VOLUME T-TEST ANALYSIS")
print("="*100)
print(f"95% Confidence Interval: ±{T_CRITICAL_95:.3f} (df={DF})")

analyze_assets_by_type(
    assets_by_type,
    "Traditional Commodity",
    plot_daily_volume_ttest,
    window_size=WINDOW_3D,
    confidence_level=0.95,
    export_dir=output_dir
)

In [None]:
# Traditional Equity Daily Volume T-Test
print("="*100)
print("TRADITIONAL EQUITY DAILY VOLUME T-TEST ANALYSIS")
print("="*100)
print(f"95% Confidence Interval: ±{T_CRITICAL_95:.3f} (df={DF})")

analyze_assets_by_type(
    assets_by_type,
    "Traditional Equity",
    plot_daily_volume_ttest,
    window_size=WINDOW_3D,
    confidence_level=0.95,
    export_dir=output_dir
)

In [None]:
# Crypto Coin Daily Volume T-Test
print("="*100)
print("CRYPTO COIN DAILY VOLUME T-TEST ANALYSIS")
print("="*100)
print(f"95% Confidence Interval: ±{T_CRITICAL_95:.3f} (df={DF})")

analyze_assets_by_type(
    assets_by_type,
    "Crypto Coin",
    plot_daily_volume_ttest,
    window_size=WINDOW_3D,
    confidence_level=0.95,
    export_dir=output_dir
)

## 3. Cross-Correlation: DeFi vs TradFi Volume

Pearson correlation analysis with lag shifts to identify lead/lag relationships between DeFi and TradFi trading volumes.

**Methodology:**
- **Lag Range:** -7 to +7 days (passed in the code as `lag_range=(-7, 8)`, which suggests the upper bound is exclusive)
- **Interpretation:**
  - **Negative lag** = DeFi leads TradFi (DeFi volume correlates with later TradFi volume)
  - **Positive lag** = TradFi leads DeFi (TradFi volume correlates with later DeFi volume)
  - **Peak correlation** indicates strongest lead/lag relationship
- **Correlation Metric:** Pearson correlation coefficient (r)

**Visualization:**
- Bar chart showing correlation at each lag
- Red bars = negative correlation, Blue bars = positive correlation
- Peak lag annotated with specific values
- Requires minimum 10 overlapping days for analysis

**Implementation:**
- Uses `plot_cross_correlation()` from `utils.statistical_analysis`
- Configurable lag range via `lag_range` parameter
- Automatically adapts labels based on asset type (DeFi/TradFi vs DEX/CEX)
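
The lag analysis described above can be sketched with `Series.shift` and `Series.corr`. This is an illustrative version only: the function name `lagged_correlations` is hypothetical, the sign convention is chosen to match the interpretation above, and the sketch assumes one row per day so that a positional shift equals a day shift. The actual `plot_cross_correlation()` may differ.

```python
import numpy as np
import pandas as pd


def lagged_correlations(
    defi: pd.Series,
    tradfi: pd.Series,
    lags=range(-7, 8),
    min_overlap: int = 10,
) -> pd.Series:
    """Pearson r between DeFi volume on day t and TradFi volume on day t - lag.

    A negative lag pairs DeFi today with TradFi in the future (DeFi leads);
    a positive lag pairs DeFi today with TradFi in the past (TradFi leads).
    """
    out = {}
    for lag in lags:
        shifted = tradfi.shift(lag)  # shifted[t] == tradfi[t - lag]
        # Series.corr ignores incomplete pairs and returns NaN below min_periods.
        out[lag] = defi.corr(shifted, min_periods=min_overlap)
    return pd.Series(out, name="pearson_r")


# Synthetic check: TradFi is an exact copy of DeFi delayed by two days.
rng = np.random.default_rng(0)
defi = pd.Series(rng.normal(size=60))
tradfi = defi.shift(2)
r = lagged_correlations(defi, tradfi)
peak_lag = r.idxmax()
print(r.round(3))
print("peak lag:", peak_lag)
```

Because TradFi here is DeFi delayed by two days, the peak lands at lag -2, which is the "DeFi leads" side of the convention.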

In [None]:
# Traditional Commodity Cross-Correlation
print("="*100)
print("TRADITIONAL COMMODITY CROSS-CORRELATION")
print("="*100)

analyze_assets_by_type(
    assets_by_type,
    "Traditional Commodity",
    plot_cross_correlation,
    lag_range=(-7, 8)
)

In [None]:
# Traditional Equity Cross-Correlation
print("="*100)
print("TRADITIONAL EQUITY CROSS-CORRELATION")
print("="*100)

analyze_assets_by_type(
    assets_by_type,
    "Traditional Equity",
    plot_cross_correlation,
    lag_range=(-7, 8)
)

In [None]:
# Crypto Coin Cross-Correlation
print("="*100)
print("CRYPTO COIN CROSS-CORRELATION")
print("="*100)

analyze_assets_by_type(
    assets_by_type,
    "Crypto Coin",
    plot_cross_correlation,
    lag_range=(-7, 8)
)

## 4. Price Correlation and Tracking Error

Analyze how well DeFi prices track TradFi prices across all assets.

**Metrics:**
- **Price Correlation:** Pearson correlation coefficient between DeFi and TradFi closing prices
- **Tracking Error:** Standard deviation of price differences (in USD)
- **Average Price Difference:** Mean percentage difference between DeFi and TradFi prices

**Interpretation:**
- **High correlation (>0.99):** Prices move together very closely
- **Low tracking error:** Small absolute price differences
- **Low avg % difference:** Prices are well-aligned on average

**Implementation:**
- Uses `create_price_correlation_table()` from `utils.statistical_analysis`
- Analyzes all assets with sufficient overlapping data (minimum 5 days)
- Sorted by asset name for easy comparison
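
The three metrics can be sketched as below. This is a hedged illustration rather than the logic of `create_price_correlation_table()`: the function name `price_tracking_metrics` is hypothetical, the percentage difference is assumed to be signed and measured relative to the TradFi price, and the prices are toy values.

```python
import pandas as pd


def price_tracking_metrics(defi_close: pd.Series, tradfi_close: pd.Series, min_days: int = 5):
    """Correlation, tracking error (USD), and mean % difference on overlapping days."""
    # Align on the index and keep only days where both prices exist.
    prices = pd.concat({"defi": defi_close, "tradfi": tradfi_close}, axis=1).dropna()
    if len(prices) < min_days:
        return None  # not enough overlapping data
    diff = prices["defi"] - prices["tradfi"]
    return {
        "days": len(prices),
        "correlation": prices["defi"].corr(prices["tradfi"]),
        "tracking_error_usd": diff.std(),  # std of absolute price differences
        "avg_pct_diff": (diff / prices["tradfi"] * 100).mean(),
    }


# Toy prices: DeFi trades at a constant 1% premium to TradFi.
tradfi_close = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0])
defi_close = tradfi_close * 1.01
metrics = price_tracking_metrics(defi_close, tradfi_close)
print(metrics)
```

A constant 1% premium gives a correlation of 1 and an average difference of 1%, which shows why correlation alone cannot reveal a persistent price gap and the other two metrics are reported alongside it.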

In [None]:
print("\n" + "="*100)
print("PRICE CORRELATION AND TRACKING ERROR")
print("="*100)

df = create_price_correlation_table(assets)
if df is not None:
    display(df)

## Summary Statistics by Asset Type

Aggregated statistics across all assets within each asset class.

**Aggregation Metrics:**
- **Number of Assets:** Count of assets analyzed in each class
- **Average Price Correlation:** Mean correlation across all assets in the class
- **Total DeFi/TradFi Volume:** Sum of average daily volumes across all assets
- **Average Days Analyzed:** Mean number of overlapping days per asset

**Purpose:**
- Compare overall market characteristics across asset types
- Identify which asset classes have the most/least DeFi adoption
- Understand volume distribution across traditional vs. crypto assets

**Implementation:**
- Uses `create_asset_type_summary()` from `utils.statistical_analysis`
- Automatically aggregates all metrics by asset type
- Provides high-level overview of market structure
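
The aggregation described above maps naturally onto a pandas `groupby` with named aggregations. The sketch below is illustrative only: the per-asset column names are assumptions, the asset names and numbers are toy placeholders, and `create_asset_type_summary()` may compute its table differently.

```python
import pandas as pd

# Toy per-asset table; every value here is a made-up placeholder.
per_asset = pd.DataFrame(
    {
        "asset": ["A", "B", "C"],
        "asset_type": ["Traditional Commodity", "Traditional Commodity", "Crypto Coin"],
        "price_corr": [0.98, 0.96, 1.00],
        "avg_defi_volume": [1.0e6, 2.0e6, 5.0e8],
        "avg_tradfi_volume": [1.0e9, 3.0e9, 2.0e10],
        "overlap_days": [100, 120, 230],
    }
)

# One row per asset class, one named aggregation per summary metric.
type_summary = per_asset.groupby("asset_type").agg(
    num_assets=("asset", "count"),
    avg_price_corr=("price_corr", "mean"),
    total_defi_volume=("avg_defi_volume", "sum"),
    total_tradfi_volume=("avg_tradfi_volume", "sum"),
    avg_days=("overlap_days", "mean"),
)
print(type_summary)
```

Note that the volume totals sum each asset's average daily volume, as the metric definition above states, so they describe a typical day rather than cumulative volume over the period.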

In [None]:
print("\n" + "="*100)
print("SUMMARY BY ASSET TYPE")
print("="*100)

df = create_asset_type_summary(assets)
if df is not None:
    display(df)