# 📊 Bitcoin Market Analysis

#### 🗂️ **Notebook Structure**

| Section | Description |
|---------|-------------|
| ***1️⃣ Configuration*** | Imports, paths, parameters, events |
| ***2️⃣ Functions*** | Reusable utility functions |
| ***3️⃣ ETL Pipeline*** | Extract → Transform → Load → Export |
| ***4️⃣ Calculations*** | Correlations, rolling metrics, regression prep |
| ***5️⃣ Visualizations*** | Interactive charts & animations |

---

## 1️⃣ CONFIGURATION (PYTHON VERS. 3.12.7)

---

In [None]:
# --- Standard Libraries ---
import json  # JSON file operations
import os  # Operating system interface
import webbrowser  # Open HTML files in default browser
from datetime import timedelta  # Date arithmetic operations
from pathlib import Path  # Cross-platform path handling
# --- Data Science Libraries ---
import numpy as np  # Numerical computations and arrays
import pandas as pd  # DataFrame operations and data manipulation
from scipy import stats  # Statistical functions (regression, correlation)
# --- Financial Data ---
import yfinance as yf  # Yahoo Finance API for market data
# --- Visualization ---
import plotly.graph_objects as go  # Interactive Plotly charts
import plotly.io as pio  # Plotly rendering and export


---

###  **Path Configuration**

---
#### Modify these paths to match your directory structure

The notebook uses `pathlib.Path` for cross-platform compatibility (Windows, Mac, Linux).

**Directory Structure**:
```
project_folder/
├── FINAL.ipynb
├── DATASETS/
│   ├── btc_cap_price.csv
│   └── global_crypto_cap.csv
└── OUTPUTS/          (auto-created)
    ├── CSV/
    └── HTML/
```
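
Before running the pipeline, it can help to confirm that the expected input files are in place. The following is a minimal sketch, assuming the layout shown above; `missing_inputs` is a hypothetical helper and is not part of the notebook.

```python
from pathlib import Path


def missing_inputs(base_dir: Path, filenames: list[str]) -> list[Path]:
    """Return the expected input files that are absent from base_dir/DATASETS."""
    input_dir = base_dir / "DATASETS"
    return [input_dir / name for name in filenames if not (input_dir / name).is_file()]


# Hypothetical usage: report problems early instead of failing inside pd.read_csv
missing = missing_inputs(Path.cwd(), ["btc_cap_price.csv", "global_crypto_cap.csv"])
for path in missing:
    print(f"Missing input file: {path}")
```

An empty result means both CSV files were found under `DATASETS/`.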


In [None]:
# --- Base Directory (assumes the notebook is launched from the project folder) ---
BASE_DIR = Path.cwd()  # Current working directory
# --- Input Paths (CSV data sources) ---
INPUT_DIR = BASE_DIR / "DATASETS"  # Folder containing input CSV files
CSV_BTC = INPUT_DIR / "btc_cap_price.csv"  # Bitcoin price and market cap data
CSV_CRYPTO = INPUT_DIR / "global_crypto_cap.csv"  # Total cryptocurrency market cap
# --- Output Paths (generated files) ---
OUTPUT_DIR = BASE_DIR / "OUTPUTS"  # Main output folder
OUTPUT_CSV = OUTPUT_DIR / "CSV"  # Merged dataframes and intermediate CSVs
OUTPUT_HTML = OUTPUT_DIR / "HTML"  # Interactive Plotly visualizations
# Create output directories if they don't exist
OUTPUT_CSV.mkdir(parents=True, exist_ok=True)  # Create CSV folder + parents
OUTPUT_HTML.mkdir(parents=True, exist_ok=True)  # Create HTML folder + parents


---

####  Parameters




In [None]:
# --- Date Range ---
START_DATE = "2022-01-04"  # Analysis start date (YYYY-MM-DD)
END_DATE = "2025-10-10"  # Analysis end date (YYYY-MM-DD)
# --- Asset Tickers (Yahoo Finance symbols) ---
TICKERS = [
    "^GSPC",    # S&P 500 Index
    "^IXIC",    # NASDAQ Composite Index
    "GLD",      # SPDR Gold Trust ETF
    "CL=F",     # WTI Crude Oil Futures
    "BTC-USD",  # Bitcoin (USD)
    "^VIX"      # CBOE Volatility Index
]
# --- Rolling Window Parameters ---
ROLLING_WINDOW = 30  # Days for rolling correlation calculation
FRAME_STEP = 2  # Animation frame sampling (every N days for performance)


---

####  Market Events 

| Date | Event | Type |
|------|-------|------|
| 2022-02-24 | Russia-Ukraine War | 🔴 Negative |
| 2022-03-16 | Fed Rate Hikes Start | 🟠 Negative |
| 2024-01-10 | BTC ETF Approval | 🟢 Positive |
| 2024-04-19 | Bitcoin Halving | 🟢 Positive |
| 2024-11-05 | US Election (Pro-Crypto) | 🟢 Positive |
| 2025-04-02 | Liberation Day Tariffs | 🔴 Negative |
| 2025-10-10 | Raw Materials Tariffs | 🔴 Negative |
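
Each event opens a period that lasts until the next event, so a date can be assigned to the most recent event on or before it. The sketch below illustrates that lookup with three rows from the table; `latest_event` is a hypothetical helper, and returning `None` for dates before the first event is an assumption made for this example.

```python
from bisect import bisect_right
from datetime import date

# Three rows from the events table, as (date, label, sentiment)
EVENTS = [
    (date(2022, 2, 24), "Russia-Ukraine War", "negative"),
    (date(2024, 1, 10), "BTC ETF Approval", "positive"),
    (date(2024, 4, 19), "Bitcoin Halving", "positive"),
]


def latest_event(day: date):
    """Return the most recent event on or before `day`, or None if there is none."""
    idx = bisect_right([d for d, _, _ in EVENTS], day)
    return EVENTS[idx - 1] if idx > 0 else None


print(latest_event(date(2024, 3, 1))[1])  # BTC ETF Approval
print(latest_event(date(2022, 1, 4)))     # None
```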


In [None]:
# Key events for annotation on charts (date, label, color, sentiment)

MARKET_EVENTS = [
    ("2022-02-24", "Russia-Ukraine\nWar", "#E74C3C", "negative"),
    ("2022-03-16", "Fed Rate\nHikes Start", "#E67E22", "negative"),
    ("2024-01-10", "BTC ETF\nApproval", "#27AE60", "positive"),
    ("2024-04-19", "Bitcoin\nHalving", "#16A085", "positive"),
    ("2024-11-05", "US Election\n(Pro-Crypto)", "#27AE60", "positive"),
    ("2025-04-02", "Liberation Day\nTariffs", "#E74C3C", "negative"),
    ("2025-10-10", "Raw Materials\nTariffs", "#E74C3C", "negative")
]
# Convert string dates to pandas Timestamps for easier manipulation
EVENT_DATES = [
    (pd.Timestamp(date), label, color, sentiment)
    for date, label, color, sentiment in MARKET_EVENTS
]


---

####  Plotting Configuration


In [None]:
# Custom colorscale for correlation heatmap (low to high correlation)
COLORSCALE_CORR = [
    [0.0, "#042E16"],  # Dark green (lowest correlation)
    [0.1, "#0C552C"],
    [0.2, '#7DCEA0'],
    [0.3, "#ABDFEB"],  # Light blue
    [0.4, "#86A7FC"],
    [0.5, '#ECF0F1'],  # Light gray (medium correlation)
    [0.6, '#FADBD8'],  # Light red
    [0.7, '#F5B7B1'],
    [0.8, '#F1948A'],
    [0.9, '#EC7063'],
    [1.0, '#E74C3C']   # Bright red (highest correlation)
]
# Asset name mapping for chart labels (cleaner display names)
ASSET_RENAME = {
    'BTC_Price': 'BTC',
    'Close_^GSPC': 'S&P 500',
    'Close_^IXIC': 'NASDAQ',
    'Close_GLD': 'Gold',
    'Close_CL=F': 'WTI'
}
# =============================================================================
print(f"📁 Base directory: {BASE_DIR}")
print(f"📂 Input CSVs: {INPUT_DIR}")
print(f"📂 Output folder: {OUTPUT_DIR}")

---

## 2️⃣ FUNCTIONS 

---



### 📚 Function Categories:

| Category | Functions | Purpose |
|----------|-----------|----------|
| **Data Cleaning** | `clean_index`, `safe_join`, `canonicalize_columns` | DataFrame preprocessing |
| **Correlation** | `get_lower_triangle`, `compute_period_corr` | Correlation matrix operations |
| **Visualization** | `add_highlight_annotations`, `get_text_color` | Chart annotations |
| **Regression** | `compute_regression`, `get_event_color` | Statistical analysis |
| **Export** | `open_html_in_browser` | File operations |
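
To make the correlation helpers concrete, here is a minimal standalone sketch of the same idea on synthetic data: log returns, absolute Pearson correlation, then a strictly lower-triangle mask. The asset names `A`, `B`, `C` and all prices are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic prices for three hypothetical assets (illustrative values only)
prices = pd.DataFrame({
    "A": [100.0, 102.0, 101.0, 105.0, 107.0],
    "B": [50.0, 50.5, 51.5, 51.0, 53.0],
    "C": [20.0, 19.0, 21.0, 20.0, 22.0],
})

# Log returns: ln(P_t / P_{t-1})
log_returns = np.log(prices).diff().dropna()

# Absolute Pearson correlation, so values lie in [0, 1]
corr = log_returns.corr().abs()

# Keep the strictly lower triangle; the diagonal and upper triangle become NaN
mask = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
lower = corr.where(mask)
print(lower.round(2))
```

Taking the absolute value discards the sign, so the result measures the strength of co-movement rather than its direction.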

---



In [None]:
# REUSABLE FUNCTIONS

def clean_index(df: pd.DataFrame) -> pd.DataFrame:
    """
    Clean DataFrame index: convert to datetime, remove duplicates, sort chronologically
    
    Args:
        df: Input DataFrame with potentially dirty index
    
    Returns:
        DataFrame with cleaned datetime index
    """
    d = df.copy()  # Create copy to avoid modifying original
    d.index = pd.to_datetime(d.index)  # Convert index to datetime format
    d = d[~d.index.duplicated(keep='last')]  # Remove duplicates (keep most recent)
    return d.sort_index()  # Sort chronologically


def safe_join(base: pd.DataFrame, other: pd.DataFrame, cols: list) -> pd.DataFrame:
    """
    Safely join DataFrames by checking column existence first (prevents KeyError)
    
    Args:
        base: Left DataFrame (preserved structure)
        other: Right DataFrame (columns to add)
        cols: List of column names to join from 'other'
    
    Returns:
        Merged DataFrame with only existing columns from 'cols'
    """
    # Filter to only columns that actually exist in 'other' DataFrame
    existing_cols = [c for c in cols if c in other.columns]
    
    if existing_cols:
        return base.join(other[existing_cols], how='left')  # Left join (preserve all base rows)
    else:
        return base  # No valid columns to join, return base unchanged


def canonicalize_columns(df_btc: pd.DataFrame, df_crypto: pd.DataFrame) -> tuple:
    """
    Standardize CSV column names to unified nomenclature (handles different formats)
    
    Args:
        df_btc: Bitcoin DataFrame with potential naming variations
        df_crypto: Crypto market DataFrame with potential naming variations
    
    Returns:
        Tuple of (renamed_btc_df, renamed_crypto_df)
    """
    # --- BTC Price Mapping (try multiple common column names) ---
    btc_map = {}
    
    # Search for price column variants
    for candidate in ['prices', 'price', 'Price', 'open_price']:
        if candidate in df_btc.columns:
            btc_map[candidate] = 'BTC_Price_CSV'  # Standardized name
            break  # Stop after first match
    
    # Search for market cap column variants
    for candidate in ['market_cap', 'market_cap_usd', 'marketCap']:
        if candidate in df_btc.columns:
            btc_map[candidate] = 'BTC_MarketCap'  # Standardized name
            break
    
    # --- Crypto Total Market Cap Mapping ---
    crypto_map = {}
    
    for candidate in ['total_market_cap', 'market_cap', 'total_mcap']:
        if candidate in df_crypto.columns:
            crypto_map[candidate] = 'Crypto_Total_Cap'  # Standardized name
            break
    
    # Apply renamings and return both DataFrames
    return df_btc.rename(columns=btc_map), df_crypto.rename(columns=crypto_map)



In [None]:
# CORRELATION ANALYSIS FUNCTIONS
def get_lower_triangle(matrix: pd.DataFrame) -> pd.DataFrame:
    """
    Extract lower triangle of correlation matrix (remove redundant upper triangle)
    
    Args:
        matrix: Square correlation matrix
    
    Returns:
        Matrix with only lower triangle values (upper triangle = NaN)
    """
    # Create boolean mask: True for lower triangle (excluding diagonal)
    mask = np.tril(np.ones_like(matrix, dtype=bool), k=-1)
    return matrix.where(mask)  # Keep only True positions, rest become NaN


def compute_period_corr(start: pd.Timestamp, end: pd.Timestamp, 
                        df: pd.DataFrame, cols: list, 
                        rename_dict: dict) -> tuple:
    """
    Calculate correlation matrix for a specific time period
    
    Args:
        start: Period start date
        end: Period end date
        df: Main DataFrame with price data
        cols: Column names to correlate
        rename_dict: Mapping for readable asset names
    
    Returns:
        Tuple of (correlation_matrix, formatted_start_date, formatted_end_date)
    """
    # 1. Filter data to period
    period_df = df.loc[start:end]
    
    # 2. Keep only available columns
    period_cols = [c for c in cols if c in period_df.columns]
    period_base = period_df[period_cols].dropna()  # Remove missing values
    
    # 3. Calculate log returns: ln(P_t / P_{t-1})
    period_logret = np.log(period_base).diff().dropna()
    
    # 4. Compute absolute correlation matrix
    period_corr = period_logret.corr().abs()
    
    # 5. Rename for readability
    period_corr = period_corr.rename(columns=rename_dict, index=rename_dict)
    
    # 6. Extract lower triangle only
    period_lower = get_lower_triangle(period_corr).round(2)
    
    # 7. Format dates for display
    return period_lower, start.strftime('%d %b %Y'), end.strftime('%d %b %Y')


In [None]:
# HEATMAP ANNOTATION FUNCTIONS

def get_text_color(corr_value: float) -> str:
    """
    Choose readable text color based on correlation intensity (contrast optimization)
    
    Args:
        corr_value: Correlation coefficient (0 to 1)
    
    Returns:
        Hex color code for text
    """
    if np.isnan(corr_value):
        return '#2C3E50'  # Dark gray for empty cells
    
    if corr_value < 0.35:
        return '#FFFFFF'  # White text on dark background (low correlation)
    elif corr_value < 0.65:
        return '#2C3E50'  # Dark text on light background (medium correlation)
    else:
        return '#FFFFFF'  # White text on dark background (high correlation)


def add_highlight_annotations(z_values: np.ndarray, 
                              x_labels: list, 
                              y_labels: list) -> list:
    """
    Generate Plotly annotations for heatmap cells (display correlation values)
    
    Args:
        z_values: 2D array of correlation values
        x_labels: Column labels (asset names)
        y_labels: Row labels (asset names)
    
    Returns:
        List of Plotly annotation dictionaries
    """
    annotations = []
    
    for i, row in enumerate(z_values):  # Iterate rows
        for j, val in enumerate(row):  # Iterate columns
            if not np.isnan(val):  # Only annotate non-empty cells
                # High correlations (>0.7) get larger text
                text = f"{val:.2f}"
                font_size = 18 if val > 0.7 else 15
                font_color = get_text_color(val)
                
                annotations.append(dict(
                    x=x_labels[j],  # X position (column)
                    y=y_labels[i],  # Y position (row)
                    text=text,
                    showarrow=False,  # No arrow pointer
                    font=dict(size=font_size, color=font_color, family='Arial')
                ))
    
    return annotations

In [None]:
# REGRESSION ANALYSIS FUNCTIONS

def compute_regression(x: np.ndarray, y: np.ndarray) -> dict | None:
    """
    Calculate linear regression statistics for two variables
    
    Args:
        x: Independent variable (log returns)
        y: Dependent variable (log returns)
    
    Returns:
        Dictionary with regression stats or None if insufficient data
    """
    # Remove NaN pairs
    mask = ~(np.isnan(x) | np.isnan(y))
    x_clean, y_clean = x[mask], y[mask]
    
    # Need at least 10 points for meaningful regression
    if len(x_clean) < 10:
        return None
    
    # Compute OLS regression
    slope, intercept, r_value, p_value, std_err = stats.linregress(x_clean, y_clean)
    
    return {
        'slope': slope,           # Beta coefficient (sensitivity)
        'intercept': intercept,   # Alpha (constant term)
        'r_squared': r_value**2,  # Coefficient of determination (goodness of fit)
        'p_value': p_value,       # Statistical significance
        'n': len(x_clean)         # Sample size
    }


def get_event_color(date: pd.Timestamp, event_dates: list) -> str:
    """
    Determine which event period a date falls into (for color coding)
    
    Args:
        date: Date to check
        event_dates: List of (date, label, color, sentiment) tuples
    
    Returns:
        Hex color code for the event period
    """
    # Find the event period containing this date
    for i in range(len(event_dates) - 1):
        if event_dates[i][0] <= date < event_dates[i+1][0]:
            return event_dates[i][2]  # Return color of matching period
    
    # Fallback: last event's color (also reached by dates before the first event)
    return event_dates[-1][2]

In [None]:
# HTML EXPORT FUNCTION
def open_html_in_browser(filepath: Path) -> None:
    """
    Open HTML file in system's default web browser
    
    Args:
        filepath: Path object pointing to HTML file
    """
    # Convert to an absolute file:// URI (valid on Windows too) and open it
    webbrowser.open(filepath.resolve().as_uri())
    print(f"🌐 Opened {filepath.name} in default browser")

In [None]:
# =============================================================================




print("✅ All functions defined successfully")
print(f"   - Data cleaning: clean_index, safe_join, canonicalize_columns")
print(f"   - Correlation: get_lower_triangle, compute_period_corr")
print(f"   - Visualization: add_highlight_annotations, get_text_color")
print(f"   - Regression: compute_regression, get_event_color")
print(f"   - Export: open_html_in_browser")


---

## 3️⃣ ETL PIPELINE

---



##### 📊 Data Processing Workflow

###### The ETL (Extract-Transform-Load) pipeline processes data from multiple sources into a unified dataset.

```
┌─────────────┐
│  EXTRACT    │ → Download Yahoo Finance data + Load CSV files
└─────────────┘
       ↓
┌─────────────┐
│ TRANSFORM   │ → Clean dates, standardize columns, remove duplicates
└─────────────┘
       ↓
┌─────────────┐
│   LOAD      │ → Merge all sources, calculate derived metrics
└─────────────┘
       ↓
┌─────────────┐
│  EXPORT     │ → Save merged DataFrame to CSV
└─────────────┘
```
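
The merge step at the heart of this workflow can be sketched on synthetic data: a left join keeps every row of the primary source, and a fallback column fills only the gaps. The frames and values below are made-up stand-ins for the Yahoo Finance and CSV inputs.

```python
import pandas as pd

# Synthetic stand-ins for the two sources (values are illustrative only)
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"])
market = pd.DataFrame({"Close_BTC-USD": [42000.0, None, 43000.0]}, index=idx)
csv_data = pd.DataFrame(
    {"BTC_Price_CSV": [41990.0, 42500.0], "BTC_MarketCap": [8.2e11, 8.3e11]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)

# Left join keeps every row of the market frame; unmatched dates get NaN
merged = market.join(csv_data, how="left")

# Primary source first, CSV value only where the primary one is missing
merged["BTC_Price"] = merged["Close_BTC-USD"].fillna(merged["BTC_Price_CSV"])
print(merged["BTC_Price"].tolist())  # [42000.0, 42500.0, 43000.0]
```

Because the join is a left join, dates that exist only in the CSV source are dropped rather than added to the result.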

---

##### Data Sources:
###### 1. **Yahoo Finance (via `yfinance`)**: Daily closing prices for BTC, S&P 500, NASDAQ, Gold, Oil, VIX
###### 2. **CSV Files**: Historical Bitcoin price/market cap + total crypto market cap


---

#### EXTRACT 

In [None]:
# --- Yahoo Finance Data Download ---
print(f"📥 Downloading {len(TICKERS)} assets from Yahoo Finance...")
raw_close = yf.download(
    TICKERS,                    # List of ticker symbols
    start=START_DATE,           # Start date
    end=END_DATE,               # End date (exclusive in yfinance)
    auto_adjust=True,           # Adjust for splits and dividends
    progress=False              # Suppress progress bar
)['Close']  # Extract only closing prices
# Force conversion to DataFrame (handles single ticker edge case)
if not isinstance(raw_close, pd.DataFrame):
    raw_close = raw_close.to_frame()
# Clean index and add prefix to columns
prices = clean_index(raw_close)
prices.columns = [f"Close_{c}" for c in prices.columns]  # Prefix: Close_BTC-USD, Close_^GSPC, etc.

print(f"   ✅ Downloaded {len(prices)} rows, {len(prices.columns)} assets")
print(f"   📅 Coverage: {prices.index.min()} → {prices.index.max()}\n")
# --- CSV Data Loading ---
print("📥 Loading CSV files...")

# Load Bitcoin price and market cap data
df_btc = pd.read_csv(CSV_BTC)
print(f"   ✅ Loaded {CSV_BTC.name}: {len(df_btc)} rows")

# Load total cryptocurrency market cap data
df_crypto = pd.read_csv(CSV_CRYPTO)
print(f"   ✅ Loaded {CSV_CRYPTO.name}: {len(df_crypto)} rows\n")

print("✅ Extraction phase completed")



---

#### TRANSFORM 




In [None]:
# Standardize formats, clean dates, rename columns

print("🔄 Starting data transformation...\n")

# --- Date Parsing ---
print("📅 Parsing dates...")

# BTC CSV: automatic date detection (handles multiple formats)
df_btc['date'] = pd.to_datetime(df_btc['date'], errors='coerce')

# Crypto CSV: dates stored as milliseconds since epoch (Unix timestamp)
df_crypto['date'] = pd.to_datetime(df_crypto['date'], unit='ms', errors='coerce')

# Set date as index for time series operations
df_btc = df_btc.set_index('date')
df_crypto = df_crypto.set_index('date')

print(f"   ✅ BTC data: {df_btc.index.min()} → {df_btc.index.max()}")
print(f"   ✅ Crypto data: {df_crypto.index.min()} → {df_crypto.index.max()}\n")
# --- Column Standardization ---
print("🔤 Standardizing column names...")
df_btc, df_crypto = canonicalize_columns(df_btc, df_crypto)
print(f"   ✅ BTC columns: {list(df_btc.columns)}")
print(f"   ✅ Crypto columns: {list(df_crypto.columns)}\n")
# --- Final Index Cleaning ---
print("🧹 Cleaning indices (removing duplicates, sorting)...")
prices = clean_index(prices)
df_btc = clean_index(df_btc)
df_crypto = clean_index(df_crypto)
print("   ✅ All indices cleaned and sorted\n")

print("✅ Transformation phase completed")


---

####  LOAD 



In [None]:
# Join Yahoo Finance data with CSV sources

print("🔄 Starting data merge...\n")

# Start with Yahoo Finance prices as base
df = prices.copy()
print(f"📊 Base DataFrame: {len(df)} rows, {len(df.columns)} columns")

# Merge Bitcoin CSV data (price and market cap)
df = safe_join(df, df_btc, ['BTC_Price_CSV', 'BTC_MarketCap'])
print(f"   ✅ Merged Bitcoin data: {len(df)} rows, {len(df.columns)} columns")

# Merge total crypto market cap data
df = safe_join(df, df_crypto, ['Crypto_Total_Cap'])
print(f"   ✅ Merged crypto cap data: {len(df)} rows, {len(df.columns)} columns\n")

# --- Derived Column Calculations ---
print("🧮 Computing derived metrics...\n")
# 1. Unified BTC Price (merge yfinance + CSV sources)
# Priority: Yahoo Finance (more reliable) → CSV fallback
if 'Close_BTC-USD' in df.columns and 'BTC_Price_CSV' in df.columns:
    df['BTC_Price'] = df['Close_BTC-USD'].fillna(df['BTC_Price_CSV'])  # Fill gaps with CSV data
    print("   ✅ BTC_Price: Merged yfinance + CSV (yfinance priority)")
elif 'Close_BTC-USD' in df.columns:
    df['BTC_Price'] = df['Close_BTC-USD']
    print("   ✅ BTC_Price: Using yfinance only")
elif 'BTC_Price_CSV' in df.columns:
    df['BTC_Price'] = df['BTC_Price_CSV']
    print("   ✅ BTC_Price: Using CSV only")
# 2. Bitcoin Dominance (BTC market cap / total crypto market cap)
if 'BTC_MarketCap' in df.columns and 'Crypto_Total_Cap' in df.columns:
    df['BTC_Dominance'] = (df['BTC_MarketCap'] / df['Crypto_Total_Cap']) * 100  # Percentage
    print(f"   ✅ BTC_Dominance: Calculated ({df['BTC_Dominance'].notna().sum()} values)")
    print(f"      Range: {df['BTC_Dominance'].min():.2f}% → {df['BTC_Dominance'].max():.2f}%\n")

print("✅ Load phase completed")
print(f"📊 Final DataFrame: {len(df)} rows × {len(df.columns)} columns")
print(f"📅 Date range: {df.index.min()} → {df.index.max()}")



---

#### EXPORT

###### **Output**: `OUTPUTS/CSV/Merged_df.csv`
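
To reuse the exported file in a later session, the date index has to be parsed again on load. A minimal round-trip sketch, assuming the file is written with `index=True`; the tiny frame and the temporary path are stand-ins for the real merged data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the merged DataFrame (illustrative values only)
frame = pd.DataFrame(
    {"BTC_Price": [42000.0, 42500.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "Merged_df.csv"
    frame.to_csv(csv_path, index=True)

    # index_col=0 restores the first column as the index;
    # parse_dates=True converts that index back to datetimes
    reloaded = pd.read_csv(csv_path, index_col=0, parse_dates=True)

print(reloaded.index.dtype)
```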


In [None]:
# Save final merged DataFrame for reproducibility

# Define export path
MERGED_CSV_PATH = OUTPUT_CSV / "Merged_df.csv"

# Export to CSV with index (dates)
df.to_csv(MERGED_CSV_PATH, index=True)

print("💾 ETL Export Summary")
print("=" * 50)
print(f"✅ Saved: {MERGED_CSV_PATH}")
print(f"📊 Rows: {len(df):,}")
print(f"📋 Columns: {len(df.columns)}")
print(f"📅 Period: {df.index.min().strftime('%Y-%m-%d')} → {df.index.max().strftime('%Y-%m-%d')}")
print(f"💽 File size: {MERGED_CSV_PATH.stat().st_size / 1024:.2f} KB\n")

print("✅ ETL pipeline completed successfully")


---

## 4️⃣ CALCULATIONS & METRICS

---


In [None]:
# --- Asset Selection ---
price_cols = ['BTC_Price', 'Close_^GSPC', 'Close_^IXIC', 'Close_GLD', 'Close_CL=F']
available_cols = [c for c in price_cols if c in df.columns]  # Keep only existing columns

print(f"📊 Assets included: {', '.join([ASSET_RENAME.get(c, c) for c in available_cols])}\n")
# --- Log Returns Calculation ---
# Log returns are additive over time and far closer to stationary than price levels,
# so correlations computed on them are not driven by shared trends
base = df[available_cols].dropna()  # Remove rows with any missing values
logret = np.log(base).diff().dropna()  # ln(P_t / P_{t-1})

print(f"   ✅ Log returns computed: {len(logret)} observations")

# --- Overall Correlation Matrix ---
corr = logret.corr().abs()  # Absolute correlation (0 to 1)
corr = corr.rename(columns=ASSET_RENAME, index=ASSET_RENAME)  # Readable names
corr_lower = get_lower_triangle(corr).round(2)  # Lower triangle only (remove redundancy)

print(f"   ✅ Overall correlation matrix computed\n")
# --- Period-Specific Correlations ---
# Define multiple time windows for comparative analysis
periods_map = {
    'Full Period': (df.index.min(), df.index.max()),
    'Last Year': (df.index.max() - pd.DateOffset(years=1), df.index.max()),
    'Year 2024': (pd.Timestamp('2024-01-01'), pd.Timestamp('2024-12-31')),
    'Last 90d': (df.index.max() - pd.DateOffset(days=90), df.index.max()),
    'Last 30d': (df.index.max() - pd.DateOffset(days=30), df.index.max()),
    'BTC ETF → Halving': (pd.Timestamp('2024-01-10'), pd.Timestamp('2024-04-20')),
    'Halving → US Election': (pd.Timestamp('2024-04-20'), pd.Timestamp('2024-11-05'))
}

print("📅 Computing period-specific correlations...")
# Pre-compute all period correlations (improves chart performance)
period_data = {}
for name, (start, end) in periods_map.items():
    period_corr, start_str, end_str = compute_period_corr(
        start, end, df, available_cols, ASSET_RENAME
    )
    period_data[name] = (period_corr, start_str, end_str)
    print(f"   ✅ {name}: {start_str} → {end_str}")

print(f"\n✅ Correlation calculations completed for {len(periods_map)} periods")


---

### Rolling Correlation (BTC vs S&P 500)

##### 30-Day Rolling Window
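
As a standalone illustration of the technique, the sketch below computes a 30-observation rolling correlation on synthetic log returns. The series `x` and `y` are random stand-ins, not market data, and the fixed seed is only there to make the run reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible
dates = pd.date_range("2024-01-01", periods=120, freq="B")

# Synthetic daily log returns: y shares a common component with x
x = pd.Series(rng.normal(0, 0.02, len(dates)), index=dates)
y = 0.5 * x + pd.Series(rng.normal(0, 0.02, len(dates)), index=dates)

window = 30  # number of observations per window, mirroring ROLLING_WINDOW
rolling_corr = x.rolling(window=window).corr(y).dropna()

# The first window - 1 dates have too few observations and are dropped
print(len(rolling_corr))  # 91
```

The window counts observations (rows), so on data with gaps such as weekends or market holidays it spans more than 30 calendar days.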


In [None]:
# --- Data Preparation ---
# Extract BTC and S&P 500 prices
pair = df[['BTC_Price', 'Close_^GSPC']].dropna().copy()

print(f"📊 Data points: {len(pair)}")
print(f"📅 Coverage: {pair.index.min()} → {pair.index.max()}\n")

# --- Log Returns ---
logret_pair = np.log(pair).diff().dropna()  # Daily log returns
# --- Rolling Correlation ---
# Calculate correlation over sliding 30-day window
corr_series = (
    logret_pair['BTC_Price']
    .rolling(window=ROLLING_WINDOW)  # 30-day window
    .corr(logret_pair['Close_^GSPC'])  # Correlate with S&P 500
    .dropna()  # Remove initial NaN values (insufficient window)
)

# Convert to DataFrame for easier manipulation
corr_df = corr_series.to_frame('Corr').copy()

print(f"   ✅ Rolling correlation computed: {len(corr_df)} values")
print(f"   📉 Min correlation: {corr_df['Corr'].min():.3f}")
print(f"   📈 Max correlation: {corr_df['Corr'].max():.3f}")
print(f"   📊 Mean correlation: {corr_df['Corr'].mean():.3f}\n")
# --- Align Price Data ---
# Match price data to correlation dates (for tooltips and annotations)
price_aligned = pair.loc[corr_df.index, ['BTC_Price', 'Close_^GSPC']].copy()

# --- Y-axis Range (fixed for animation stability) ---
cmin, cmax = float(corr_df['Corr'].min()), float(corr_df['Corr'].max())
corr_range = max(1e-6, cmax - cmin)  # Avoid division by zero
padding = 0.08 * corr_range  # 8% padding for visual clarity
Y_RANGE_CORR = [
    max(-1.0, cmin - padding),  # Lower bound (min -1.0)
    min(1.0, cmax + padding)    # Upper bound (max 1.0)
]

print(f"📏 Y-axis range: [{Y_RANGE_CORR[0]:.3f}, {Y_RANGE_CORR[1]:.3f}]\n")
# --- Animation Frame Selection ---
# Sample dates for performance (every FRAME_STEP days)
all_dates_corr = corr_df.index.tolist()
frame_dates_corr = all_dates_corr[::FRAME_STEP]  # Every FRAME_STEP-th date

# Ensure last date is included
if frame_dates_corr[-1] != all_dates_corr[-1]:
    frame_dates_corr.append(all_dates_corr[-1])



---

### Bitcoin Dominance & Price

##### **Interpretation**:

##### **High dominance (>50%)**: Bitcoin leads crypto market

##### **Low dominance (<40%)**: Altcoins gaining market share

##### **30-day MA**: Smoothed trend line to filter noise
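
The thresholds above can be applied mechanically. The following sketch computes dominance from made-up market caps and labels each day; `regime` is a hypothetical helper, and the "neutral" label for the 40-50% band is an assumption, since that band is not named in the interpretation.

```python
import pandas as pd

# Illustrative market caps in USD (made-up values)
caps = pd.DataFrame(
    {
        "BTC_MarketCap": [6.0e11, 9.0e11, 5.0e11],
        "Crypto_Total_Cap": [1.0e12, 2.0e12, 1.5e12],
    },
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)

# Dominance = BTC market cap as a percentage of total crypto market cap
caps["BTC_Dominance"] = caps["BTC_MarketCap"] / caps["Crypto_Total_Cap"] * 100


def regime(dominance: float) -> str:
    """Label a dominance value using the >50% and <40% thresholds."""
    if dominance > 50:
        return "BTC-led"
    if dominance < 40:
        return "altcoins gaining"
    return "neutral"  # assumption: the 40-50% band has no label in the text


caps["Regime"] = caps["BTC_Dominance"].apply(regime)
print(caps[["BTC_Dominance", "Regime"]])
```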


In [None]:
# --- Data Extraction ---
df_dominance = df[['BTC_Dominance', 'BTC_Price']].dropna().copy()

print(f"📊 Data points: {len(df_dominance)}")
print(f"📅 Coverage: {df_dominance.index.min()} → {df_dominance.index.max()}\n")
# --- Moving Average (Smoothing) ---
# 30-day moving average to reduce noise and identify trends
df_dominance['MA_30'] = df_dominance['BTC_Dominance'].rolling(window=30).mean()

# --- Summary Statistics ---
print("📈 Bitcoin Dominance:")
print(f"   Min: {df_dominance['BTC_Dominance'].min():.2f}%")
print(f"   Max: {df_dominance['BTC_Dominance'].max():.2f}%")
print(f"   Mean: {df_dominance['BTC_Dominance'].mean():.2f}%")
print(f"   Current: {df_dominance['BTC_Dominance'].iloc[-1]:.2f}%\n")

print("💰 Bitcoin Price:")
print(f"   Min: ${df_dominance['BTC_Price'].min():,.0f}")
print(f"   Max: ${df_dominance['BTC_Price'].max():,.0f}")
print(f"   Current: ${df_dominance['BTC_Price'].iloc[-1]:,.0f}\n")
# --- Fixed Y-axis Ranges (for stable animation) ---
dom_min, dom_max = df_dominance['BTC_Dominance'].min(), df_dominance['BTC_Dominance'].max()
price_min, price_max = df_dominance['BTC_Price'].min(), df_dominance['BTC_Price'].max()

# Calculate ranges with padding
dom_range = max(1e-6, float(dom_max - dom_min))
price_range = max(1e-6, float(price_max - price_min))

dom_padding = 0.06 * dom_range
price_padding = 0.08 * price_range

Y1_RANGE_DOM = [round(dom_min - dom_padding, 2), round(dom_max + dom_padding, 2)]
Y2_RANGE_PRICE = [price_min - price_padding, price_max + price_padding]

print("📏 Y-axis ranges:")
print(f"   Dominance: [{Y1_RANGE_DOM[0]:.2f}%, {Y1_RANGE_DOM[1]:.2f}%]")
print(f"   Price: [${Y2_RANGE_PRICE[0]:,.0f}, ${Y2_RANGE_PRICE[1]:,.0f}]\n")
# --- Animation Frames ---
all_dates_dom = df_dominance.index.tolist()
frame_dates_dom = all_dates_dom[::15]  # Every 15th date (fewer frames, lighter animation)

if frame_dates_dom[-1] != all_dates_dom[-1]:
    frame_dates_dom.append(all_dates_dom[-1])


---

## 5️⃣ INTERACTIVE VISUALIZATIONS

---







| # | Chart Type | Features | Output File |
|---|------------|----------|-------------|
| **1** | Correlation Heatmap | Multi-period selector, annotations | `1_correlation_matrix.html` |
| **2** | Rolling Correlation | Animation, event markers | `2_btc_spx_rolling_corr_dynamic.html` |
| **3** | Dominance & Price | Dual-axis, moving average | `3_btc_dominance_price_animated.html` |
| **4** | Regression Scatter | BTC vs S&P 500 with regression line | `4_regression_btc_sp500_animated.html` |
| **5** | Regression Scatter | BTC vs Gold with regression line | `5_regression_btc_gold_animated.html` |



---

### Correlation Matrix Heatmap
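
The period selector relies on Plotly's `update` button method, where the first element of `args` restyles the trace and the second relayouts the figure. The sketch below builds such button payloads as plain dictionaries, without creating a figure; the two periods, the 2x2 matrices, and the `cell_annotations` helper are hypothetical stand-ins.

```python
import numpy as np

# Two hypothetical periods with 2x2 lower-triangle matrices (made-up values)
labels = ["BTC", "S&P 500"]
period_matrices = {
    "Full Period": np.array([[np.nan, np.nan], [0.35, np.nan]]),
    "Last 30d": np.array([[np.nan, np.nan], [0.72, np.nan]]),
}


def cell_annotations(z: np.ndarray) -> list[dict]:
    """Build one text annotation per non-NaN cell."""
    return [
        dict(x=labels[j], y=labels[i], text=f"{z[i, j]:.2f}", showarrow=False)
        for i in range(z.shape[0])
        for j in range(z.shape[1])
        if not np.isnan(z[i, j])
    ]


# With method="update", args[0] changes trace data and args[1] changes the layout
buttons = [
    dict(
        label=name,
        method="update",
        args=[{"z": [z]}, {"title.text": name, "annotations": cell_annotations(z)}],
    )
    for name, z in period_matrices.items()
]

print(buttons[1]["args"][1]["annotations"][0]["text"])  # 0.72
```

A list like this can be passed to a figure through `fig.update_layout(updatemenus=[dict(type="buttons", buttons=buttons)])`. Replacing the annotations together with `z` keeps the cell labels in sync with the displayed matrix.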




In [None]:
# --- Setup for Correlation Heatmap ---

price_cols = ['BTC_Price', 'Close_^GSPC', 'Close_^IXIC', 'Close_GLD', 'Close_CL=F']
cols = [c for c in price_cols if c in df.columns]  # Keep only existing columns
base = df[cols].dropna()  # Remove rows with missing values (correlation needs complete data)
logret = np.log(base).diff().dropna()  # Compute log returns: ln(P_t / P_{t-1})
corr = logret.corr().abs()  # Absolute correlation values (0 to 1)

# Rename columns for readability in the chart
rename_dict = {
    'BTC_Price': 'BTC', 
    'Close_^GSPC': 'S&P 500', 
    'Close_^IXIC': 'NASDAQ',
    'Close_GLD': 'Gold', 
    'Close_CL=F': 'WTI'
}
corr = corr.rename(columns=rename_dict, index=rename_dict)
corr_lower = get_lower_triangle(corr).round(2)  # Extract lower triangle only


In [None]:
# --- PERIOD DEFINITIONS ---

# Define time periods for analysis (users can switch via buttons)
periods_map = {
    'Full Period': (df.index.min(), df.index.max()),
    'Last Year': (df.index.max() - pd.DateOffset(years=1), df.index.max()),
    'Year 2024': (pd.Timestamp('2024-01-01'), pd.Timestamp('2024-12-31')),
    'Last 90d': (df.index.max() - pd.DateOffset(days=90), df.index.max()),
    'Last 30d': (df.index.max() - pd.DateOffset(days=30), df.index.max()),
    'BTC ETF → Halving': (pd.Timestamp('2024-01-10'), pd.Timestamp('2024-04-20')),
    'Halving → US Election': (pd.Timestamp('2024-04-20'), pd.Timestamp('2024-11-05'))
}

# Pre-compute correlations for each period (improves performance)
period_data = {
    name: compute_period_corr(start, end, df, cols, rename_dict) 
    for name, (start, end) in periods_map.items()
}

In [None]:
# --- FIGURE CREATION ---

# Create heatmap trace
fig = go.Figure(data=go.Heatmap(
    z=corr_lower.values,           # 2D matrix of correlation values
    x=corr_lower.columns,          # X-axis labels (asset names)
    y=corr_lower.index,            # Y-axis labels (asset names)
    colorscale=COLORSCALE_CORR,    # Custom color gradient (green→red)
    zmin=0,                        # Minimum value for color scale
    zmax=1,                        # Maximum value for color scale
    showscale=True,                # Display color bar legend
    hovertemplate='%{y} vs %{x}<br>Correlation: %{z:.2f}',  # Tooltip format
    colorbar=dict(
        title=dict(
            text="Strength",       # Legend title
            side="right",          # Title position
            font=dict(size=14, color='#2C3E50')
        ),
        thickness=22,              # Color bar width in pixels
        len=0.75,                  # Color bar length (75% of plot height)
        x=1.02,                    # Position (right of plot)
        tickmode='array',          # Manual tick definition
        tickvals=[0, 0.2, 0.4, 0.6, 0.8, 1.0],  # Tick positions
        ticktext=['0', '0.2', '0.4', '0.6', '0.8', '1'],  # Tick labels
        tickfont=dict(size=13, color='#2C3E50'),
        outlinewidth=1,            # Border thickness
        outlinecolor='#BDC3C7'     # Border color (light gray)
    )
))

In [None]:
# --- ANNOTATIONS (CELL VALUES) ---

# Add initial annotations (correlation values displayed on cells)
initial_annotations = add_highlight_annotations(
    corr_lower.values,              # Matrix values
    corr_lower.columns.tolist(),    # Column labels
    corr_lower.index.tolist()       # Row labels
)

# --- PERIOD SELECTOR BUTTONS ---

# Create buttons to switch between time periods
buttons = []
for name, (corr_matrix, period_start, period_end) in period_data.items():
    # Generate new annotations for this period
    new_annotations = add_highlight_annotations(
        corr_matrix.values, 
        corr_matrix.columns.tolist(), 
        corr_matrix.index.tolist()
    )
    
    # Define button action
    buttons.append(dict(
        label=name,                    # Button text
        method='update',               # Update both data AND layout
        args=[
            {'z': [corr_matrix.values]},  # Update heatmap data
            {
                'title.text': f'Correlation Strength Between Assets<br>{period_start} → {period_end}',
                'annotations': new_annotations  # Update cell values
            }
        ]
    ))

In [None]:
# --- LAYOUT CONFIGURATION ---

fig.update_layout(
    # Title
    title={
        'text': f'Correlation Strength Between Assets<br>{df.index.min().strftime("%d %b %Y")} → {df.index.max().strftime("%d %b %Y")}',
        'x': 0.5,                      # Centered horizontally
        'xanchor': 'center',
        'y': 0.95,                     # Near top
        'font': {
            'size': 24,
            'color': '#2C3E50',        # Dark gray
            'family': 'Arial Black'
        }
    },
    
    # Initial annotations
    annotations=initial_annotations,
    
    # X-axis configuration
    xaxis={
        'side': 'bottom',              # Place labels at bottom
        'tickfont': {'size': 14, 'color': '#2C3E50'},
        'showgrid': False,             # Hide grid lines
        'showline': True,              # Show axis line
        'linecolor': '#BDC3C7',        # Light gray
        'linewidth': 2
    },
    
    # Y-axis configuration
    yaxis={
        'tickfont': {'size': 14, 'color': '#2C3E50'},
        'showgrid': False,
        'showline': True,
        'linecolor': '#BDC3C7',
        'linewidth': 2
    },
    
    # Colors
    plot_bgcolor='#ECF0F1',            # Light gray background
    paper_bgcolor='#ECF0F1',           # Canvas background
    
    # Dimensions
    width=800,
    height=600,
    margin=dict(l=120, r=180, t=150, b=100),  # Space for labels and legend
    
    # Interactive menu (period selector)
    updatemenus=[dict(
        type='buttons',                # Always-visible buttons (not a dropdown)
        direction='down',              # Stack buttons vertically
        x=0.9,                         # Horizontal position (right side)
        y=0.6,                         # Vertical position (upper middle)
        buttons=buttons,               # List of period buttons
        bgcolor="#D8D8D8",             # Light gray background
        bordercolor='#2C3E50',         # Dark border
        borderwidth=2,
        font=dict(size=12, color='#2C3E50')
    )]
)

In [None]:
# --- EXPORT ---

fig.write_html(OUTPUT_HTML / '1_correlation_matrix.html')  # Save as standalone HTML file


In [None]:
# HEATMAP IN BROWSER
# =============================================================================
# Opens the generated HTML file using your system's default web browser

#correlation_html = OUTPUT_HTML / '1_correlation_matrix.html'

#if correlation_html.exists():
#    open_html_in_browser(correlation_html)
#else:
#    print(f"❌ File not found: {correlation_html}")
#    print("   Please run the visualization cell first")



---



### BTC vs S&P 500 - 30-Day Rolling Correlation Evolution
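
The rolling correlation plotted in this section can be sketched in isolation as follows. This is a minimal illustration on synthetic prices, not the notebook's data: the column names `asset_a` and `asset_b`, the random seed, and the local `window` variable (standing in for `ROLLING_WINDOW`) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic prices (illustrative only; not the notebook's data)
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=120, freq="D")
common = rng.normal(0, 0.01, size=120)  # Shared shock so the series co-move

prices = pd.DataFrame({
    "asset_a": 100 * np.exp(np.cumsum(common + rng.normal(0, 0.01, 120))),
    "asset_b": 50 * np.exp(np.cumsum(common + rng.normal(0, 0.01, 120))),
}, index=dates)

window = 30  # Assumed window length, standing in for ROLLING_WINDOW

# Log returns: first difference of log prices
log_returns = np.log(prices).diff().dropna()

# Pearson correlation over each trailing window; incomplete windows are NaN
rolling_corr = (
    log_returns["asset_a"]
    .rolling(window)
    .corr(log_returns["asset_b"])
    .dropna()
)

print(rolling_corr.tail())
```

Each value summarizes only the most recent `window` returns, so the series starts `window - 1` observations after the first return.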



In [None]:
# 1) Parameters
# Use ROLLING_WINDOW defined in configuration
# Use FRAME_STEP defined in configuration



In [None]:
# 2) Prepare correlation + price data
# `pair` (aligned BTC and S&P 500 prices) comes from Cell 8.
# corr_df and price_aligned are rebuilt below from log returns,
# replacing any versions computed earlier.
logret = np.log(pair).diff().dropna()
corr_series = logret['BTC_Price'].rolling(ROLLING_WINDOW).corr(logret['Close_^GSPC']).dropna()
corr_df = corr_series.to_frame('Corr').copy()
price_aligned = pair.loc[corr_df.index, ['BTC_Price', 'Close_^GSPC']].copy()


In [None]:
# 3) Fix axis ranges
cmin, cmax = float(corr_df['Corr'].min()), float(corr_df['Corr'].max())
crng = max(1e-6, cmax - cmin)
pad = 0.08 * crng
Y_RANGE = [max(-1.0, cmin - pad), min(1.0, cmax + pad)]

In [None]:
# 4) Build event shapes (no pre-calculated variations)
shapes, base_annotations = [], []
interval_info = []  # Store interval boundaries only

for i, (date_i, label_i, color_i, sent_i) in enumerate(EVENT_DATES):  # Use configured events
    # Vertical line
    shapes.append(dict(
        type='line', xref='x', yref='paper', x0=date_i, x1=date_i, y0=0, y1=1,
        line=dict(color=color_i, width=2, dash='dash')
    ))
    # Rotated label
    base_annotations.append(dict(
        x=date_i, y=1.10, xref='x', yref='paper', text=label_i,
        showarrow=False, textangle=-60, font=dict(size=9, color=color_i, family='Arial'),
        xanchor='right', yanchor='bottom'
    ))
    
    # Store interval info
    if i < len(EVENT_DATES) - 1:
        date_j = EVENT_DATES[i + 1][0]
        
        # Rectangle
        shapes.append(dict(
            type='rect', xref='x', yref='paper', x0=date_i, x1=date_j,
            y0=0, y1=1, fillcolor=color_i, opacity=0.08, layer='below', line_width=0
        ))
        
        mid_date = date_i + (date_j - date_i) / 2
        interval_info.append({
            'start': date_i,
            'end': date_j,
            'mid': mid_date,
            'color': color_i
        })


In [None]:
# 5) Frame dates
all_dates = corr_df.index.to_list()
frame_dates = all_dates[::FRAME_STEP]
if frame_dates[-1] != all_dates[-1]:
    frame_dates.append(all_dates[-1])

t0 = frame_dates[0]
init = corr_df.loc[:t0]
x0 = init.index

In [None]:
# 6) Base figure
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=x0, y=init['Corr'], name='BTC–S&P500 Corr',
    line=dict(color="#2C3E50", width=3),
    hovertemplate='%{x|%d %b %Y}<br>Corr: %{y:.2f}',
))
# Build frames with DYNAMIC variation calculation
frames = []
for t in frame_dates:
    span = corr_df.loc[:t]
    x = span.index
    y = span['Corr'].values
    
    # Current values
    tip_y = float(y[-1])
    btc_price = float(price_aligned.loc[t, 'BTC_Price'])
    spx_price = float(price_aligned.loc[t, 'Close_^GSPC'])
    
    # Tip tag
    tip_annot = dict(
        x=t, y=tip_y, xref='x', yref='y',
        text=f'{tip_y:+.2f}', showarrow=True, arrowhead=2, arrowsize=1.2,
        arrowcolor='#2C3E50', ax=36, ay=-24, standoff=6,
        font=dict(size=11, color='white', family='Arial Black'),
        bgcolor='#2C3E50', bordercolor='#2C3E50', borderwidth=1
    )
    
    # Price indicators
    price_indicators = [
        dict(
            x=0.88, y=0.94, xref='paper', yref='paper',
            text=f'<b>BTC</b><br>${btc_price:,.0f}',
            showarrow=False, align='center',
            font=dict(size=11, color='white', family='Arial'),
            bgcolor='#F39C12', bordercolor='#F39C12',
            borderwidth=2, borderpad=6
        ),
        dict(
            x=0.98, y=0.94, xref='paper', yref='paper',
            text=f'<b>S&P 500</b><br>{spx_price:,.0f}',
            showarrow=False, align='center',
            font=dict(size=11, color='white', family='Arial'),
            bgcolor='#3498DB', bordercolor='#3498DB',
            borderwidth=2, borderpad=6
        )
    ]
    
    # DYNAMIC interval badges: calculate variation from start to current time t
    visible_badges = []
    for interval in interval_info:
        # Show badge if we've passed the start of the interval
        if t >= interval['start']:
            # Calculate variation from interval start to MIN(current_time, interval_end)
            effective_end = min(t, interval['end'])
            
            try:
                # Get nearest data points
                idx_start = corr_df.index.get_indexer([interval['start']], method='nearest')[0]
                idx_end = corr_df.index.get_indexer([effective_end], method='nearest')[0]
                
                start_val = float(corr_df.iloc[idx_start]['Corr'])
                end_val = float(corr_df.iloc[idx_end]['Corr'])
                variation = end_val - start_val
                
                var_color = '#27AE60' if variation >= 0 else '#E74C3C'
                
                # Show badge with current variation (updates as animation progresses)
                visible_badges.append(dict(
                    x=interval['mid'], y=-0.12, xref='x', yref='paper',
                    text=f"{variation:+.2f}",
                    showarrow=False,
                    font=dict(size=11, color='white', family='Arial Black'),
                    bgcolor=var_color, bordercolor=var_color,
                    borderwidth=2, borderpad=4, align='center'
                ))
            except Exception:
                # Skip this badge if no usable data point can be located
                continue

    frames.append(go.Frame(
        name=str(pd.to_datetime(t).date()),
        data=[go.Scatter(x=x, y=y)],
        layout=go.Layout(annotations=(
            base_annotations + visible_badges + [tip_annot] + price_indicators
        ))
    ))

fig.frames = frames


In [None]:
# 7) Slider
slider_steps = [dict(
    args=[[fr.name], dict(mode='immediate',
                          frame=dict(duration=50, redraw=True),
                          transition=dict(duration=0))],
    label=fr.name, method='animate'
) for fr in frames]

sliders = [dict(
    active=len(frames)-1,
    currentvalue=dict(prefix='Date: ', font=dict(size=14, color='#2C3E50', family='Arial Black'),
                         visible=True, xanchor='left'),
    pad=dict(t=60, b=10), steps=slider_steps,
    x=0.05, xanchor='left', y=0, yanchor='top', len=0.85,
    bgcolor='#E8E8E8', bordercolor='#2C3E50', borderwidth=2,
    activebgcolor='#3498DB', font=dict(size=10)
)]

In [None]:
# 8) Play/Pause/Reset buttons
play_pause = [dict(
    type='buttons', direction='left',
    x=0.5, xanchor='left', y=1.38, yanchor='middle',
    buttons=[
        dict(label='▶️', method='animate',
             args=[None, dict(frame=dict(duration=50, redraw=True),
                              fromcurrent=True, transition=dict(duration=0))]),
        dict(label='⏸️', method='animate',
             args=[[None], dict(mode='immediate',
                                frame=dict(duration=0, redraw=False),
                                transition=dict(duration=0))]),
        dict(label='🔄', method='animate',
             args=[[frames[0].name], dict(mode='immediate',  # Frame names are date strings
                                          frame=dict(duration=0, redraw=True),
                                          transition=dict(duration=0))])
    ],
    bgcolor='#ECF0F1', bordercolor='#2C3E50', borderwidth=2,
    font=dict(size=11, color='#2C3E50'), pad=dict(l=3, r=3, t=3, b=3)
)]


In [None]:
# 9) Layout + Show
fig.update_layout(
    title={
        'text': f"Rolling Correlation (BTC vs S&P 500) — {ROLLING_WINDOW}-Day Window<br>"
                f"<sub>{corr_df.index.min().strftime('%d %b %Y')} → "
                f"{corr_df.index.max().strftime('%d %b %Y')}</sub>",
        'x': 0.5, 'xanchor': 'center', 'y': 0.90,
        'font': {'size': 20, 'color': '#2C3E50', 'family': 'Arial Black'}
    },
    autosize=False, width=1280, height=720,
    plot_bgcolor='#FFFFFF', paper_bgcolor='#FFFFFF',
    margin=dict(l=90, r=110, t=240, b=160),
    
    xaxis=dict(
        title=dict(text='Date', font=dict(size=13, color='#2C3E50')),
        showgrid=True, gridwidth=1, gridcolor='#E8E8E8',
        showline=True, linewidth=2, linecolor='#2C3E50',
        rangeslider=dict(visible=True, thickness=0.05), type='date'
    ),
    yaxis=dict(
        title=dict(text='Correlation', font=dict(color='#2C3E50', size=13)),
        tickfont=dict(color='#2C3E50', size=11),
        showgrid=True, gridcolor='#F0F0F0',
        showline=True, linewidth=2, linecolor='#2C3E50',
        autorange=False, range=Y_RANGE
    ),
    
    legend=dict(x=0, y=1.5, xanchor='left', yanchor='top',
                bgcolor='rgba(255,255,255,0.9)', bordercolor='#2C3E50',
                borderwidth=2, font=dict(size=12)),
    shapes=shapes,
    updatemenus=play_pause,
    sliders=sliders,
    hovermode='x unified',
    uirevision='lock-axes'
)

fig.write_html(OUTPUT_HTML / '2_btc_spx_rolling_corr_dynamic.html')  # Save HTML
fig.show()
print('✅ Correlation graph with DYNAMIC variations (updates during animation)')

In [None]:
# OPEN ROLLING CORRELATION CHART IN BROWSER
# =============================================================================

#rolling_corr_html = OUTPUT_HTML / '2_btc_spx_rolling_corr_dynamic.html'

#if rolling_corr_html.exists():
#    open_html_in_browser(rolling_corr_html)
#else:
#    print(f"❌ File not found: {rolling_corr_html}")
#    print("   Please run the visualization cell first")



---

### Bitcoin Dominance & Price (Dual-Axis)
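
For orientation, the two series on the left axis can be reproduced on toy numbers. The sketch below assumes the usual definition of dominance (BTC market cap as a percentage of total crypto market cap) and a 30-day simple moving average; the frame `demo` and its column names are made up for the example and are not the notebook's `df_dominance`.

```python
import numpy as np
import pandas as pd

# Synthetic market caps in USD (illustrative values, not real data)
dates = pd.date_range("2024-01-01", periods=60, freq="D")
demo = pd.DataFrame({
    "btc_cap": np.linspace(800e9, 900e9, 60),
    "total_cap": np.linspace(1600e9, 2000e9, 60),
}, index=dates)

# Dominance: BTC share of the total crypto market cap, in percent
demo["dominance_pct"] = demo["btc_cap"] / demo["total_cap"] * 100

# 30-day simple moving average; the first 29 rows have no full window (NaN)
demo["ma_30"] = demo["dominance_pct"].rolling(30).mean()

print(demo[["dominance_pct", "ma_30"]].tail())
```

Because the moving average needs a full window, the dotted MA line begins later than the dominance line.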






In [None]:
# 1) Prepare data
# Use df_dominance from Cell 9 (already has BTC_Dominance, BTC_Price, MA_30)
df_plot = df_dominance.copy()
dom_min, dom_max = df_plot['BTC_Dominance'].min(), df_plot['BTC_Dominance'].max()
price_min, price_max = df_plot['BTC_Price'].min(), df_plot['BTC_Price'].max()
dom_rng   = max(1e-6, float(dom_max - dom_min))
price_rng = max(1e-6, float(price_max - price_min))
dom_pad, price_pad = 0.06*dom_rng, 0.08*price_rng
Y1_RANGE = [round(dom_min - dom_pad, 2), round(dom_max + dom_pad, 2)]
Y2_RANGE = [price_min - price_pad,       price_max + price_pad]
ENDPOINT_CLIP = False  # Passed to cliponaxis; False lets markers/text draw past the axis edges


In [None]:
# 2) Build static event shapes + annotations (reuse your earlier event pipeline)
shapes, base_annotations = [], []
if 'EVENT_DATES' in globals():  # Use configured events
    for i, (event_date, event_label, color, sentiment) in enumerate(EVENT_DATES):
        # Vertical dashed line
        shapes.append(dict(
            type='line', xref='x', yref='paper', x0=event_date, x1=event_date, y0=0, y1=1,
            line=dict(color=color, width=2, dash='dash')
        ))
        # Rotated label above plot
        base_annotations.append(dict(
            x=event_date, y=1.08, xref='x', yref='paper', text=event_label, showarrow=False,
            textangle=-60, font=dict(size=9, color=color, family='Arial'),
            xanchor='right', yanchor='bottom'
        ))
        # Optional period shading between consecutive events
        if i < len(EVENT_DATES) - 1:
            next_event_date = EVENT_DATES[i + 1][0]
            shapes.append(dict(
                type='rect', xref='x', yref='paper', x0=event_date, x1=next_event_date,
                y0=0, y1=1, fillcolor=color, opacity=0.08, layer='below', line_width=0
            ))




In [None]:
# 3) Choose frame dates (sample for performance if needed)
all_dates = df_plot.index.to_list()
# Use every 7th day plus the last date for smoother performance; adapt as needed
frame_dates = all_dates[::7]
if frame_dates[-1] != all_dates[-1]:
    frame_dates.append(all_dates[-1])


In [None]:
# 4) Initial data (first frame span)
t0 = frame_dates[0]
init = df_plot.loc[:t0]
x0 = init.index

fig = go.Figure()

# Trace 0: Dominance (left Y)
fig.add_trace(go.Scatter(
    x=x0, y=init['BTC_Dominance'], name='BTC Dominance',
    line=dict(color="#00340A", width=2.5), yaxis='y',
    hovertemplate='%{x|%d %b %Y}<br>Dominance: %{y:.2f}%',
    cliponaxis=ENDPOINT_CLIP
))

# Trace 1: MA 30 (left Y)
fig.add_trace(go.Scatter(
    x=x0, y=init['MA_30'], name='30-Day MA',
    line=dict(color="#010101", width=1.5, dash='dot'), yaxis='y', opacity=0.6,
    hovertemplate='%{x|%d %b %Y}<br>MA 30: %{y:.2f}%',
    cliponaxis=ENDPOINT_CLIP
))

# Trace 2: Price (right Y)
fig.add_trace(go.Scatter(
    x=x0, y=init['BTC_Price'], name='BTC Price (USD)',
    line=dict(color="#DA1A1A", width=2.5), yaxis='y2',
    hovertemplate='%{x|%d %b %Y}<br>Price: $%{y:,.0f}',
    cliponaxis=ENDPOINT_CLIP
))



In [None]:
# 5) Build animation frames with tip tags (dynamic annotations)
frames = []
for t in frame_dates:
    span = df_plot.loc[:t]
    x = span.index

    # Current tip values
    dom_t = float(span['BTC_Dominance'].iloc[-1])
    price_t = float(span['BTC_Price'].iloc[-1])

    # Dynamic tip annotations (arrow badges)
    tip_annotations = [
        dict(
            x=t, y=dom_t, xref='x', yref='y',
            text=f'{dom_t:.2f}%', showarrow=True, arrowhead=2, ax=40, ay=-20,
            font=dict(size=12, color='white', family='Arial Black'),
            bgcolor='#00340A', bordercolor='#00340A', borderwidth=1
        ),
        dict(
            x=t, y=price_t, xref='x', yref='y2',
            text=f'${price_t:,.0f}', showarrow=True, arrowhead=2, ax=40, ay=-20,
            font=dict(size=12, color='white', family='Arial Black'),
            bgcolor='#DA1A1A', bordercolor='#DA1A1A', borderwidth=1
        )
    ]

    frames.append(go.Frame(
        name=str(pd.to_datetime(t).date()),
        data=[
            go.Scatter(x=x, y=span['BTC_Dominance']),
            go.Scatter(x=x, y=span['MA_30']),
            go.Scatter(x=x, y=span['BTC_Price'])
        ],
        layout=go.Layout(annotations=(base_annotations + tip_annotations))
    ))

fig.frames = frames

In [None]:
# 6) Slider steps (one per frame)
slider_steps = []
for fr in frames:
    slider_steps.append(dict(
        args=[[fr.name], dict(mode='immediate',
                              frame=dict(duration=80, redraw=True),
                              transition=dict(duration=0))],
        label=fr.name,
        method='animate'
    ))

sliders = [dict(
    active=len(frames) - 1,
    currentvalue=dict(prefix='Date: ', font=dict(size=14, color='#2C3E50', family='Arial Black')),
    pad=dict(t=60),
    steps=slider_steps,
    x=0.05, xanchor='left', y=0, yanchor='top', len=0.9
)]


In [None]:
# 7) Play/Pause buttons
play_pause = [dict(
    type='buttons', direction='left', x=-0.2, y=2, xanchor='left', yanchor='top',
    buttons=[
        dict(label='Play', method='animate',
             args=[None, dict(frame=dict(duration=20, redraw=True),
                              fromcurrent=True, transition=dict(duration=0))]),
        dict(label='Pause', method='animate',
             args=[[None], dict(mode='immediate',
                                frame=dict(duration=0, redraw=False),
                                transition=dict(duration=0))])
    ],
    bgcolor='#FFFFFF', bordercolor='#2C3E50', borderwidth=2,
    font=dict(size=12, color='#2C3E50')
)]



In [None]:
# 8) Layout
fig.update_layout(
    title={'text': f'Bitcoin Dominance & Price (Animated Timeline)<br>'
                   f'{df_plot.index.min().strftime("%d %b %Y")} → '
                   f'{df_plot.index.max().strftime("%d %b %Y")}',
           'x': 0.5, 'xanchor': 'center', 'y': 0.90,
           'font': {'size': 24, 'color': '#2C3E50', 'family': 'Arial Black'}},
    autosize=False,  # Fixed size; prevents autosize toggling during redraws
    width=1152, height=648,
    plot_bgcolor='#FFFFFF', paper_bgcolor='#FFFFFF',
    margin=dict(l=90, r=110, t=240, b=160),

    xaxis=dict(
        title=dict(text='Date', font=dict(size=15, color='#2C3E50')),
        showgrid=True, gridwidth=1, gridcolor='#E8E8E8',
        showline=True, linewidth=2, linecolor='#2C3E50',
        rangeslider=dict(visible=True, thickness=0.045), type='date'
    ),

    # LOCKED ranges (no autoscale across frames)
    yaxis=dict(
        title=dict(text='BTC Dominance (%)',
                   font=dict(color='#00340A', size=14)),
        tickfont=dict(color='#00340A', size=12),
        showgrid=True, gridcolor='#F0F0F0',
        showline=True, linewidth=2, linecolor='#00340A',
        autorange=False, range=Y1_RANGE
    ),
    yaxis2=dict(
        title=dict(text='BTC Price (USD)',
                   font=dict(color='#DA1A1A', size=14)),
        tickfont=dict(color='#DA1A1A', size=12),
        overlaying='y', side='right',
        showline=True, linewidth=2, linecolor='#DA1A1A',
        autorange=False, range=Y2_RANGE
    ),

    legend=dict(x=0.83, y=1.95, xanchor='left', yanchor='top',
                bgcolor='rgba(255,255,255,0.95)', bordercolor='#2C3E50',
                borderwidth=2, font=dict(size=12)),
    shapes=shapes,
    updatemenus=play_pause,
    sliders=sliders,
    hovermode='x unified',
    uirevision='lock-axes'  # preserve manual zoom/position between frame updates
)

In [None]:
# 9) Export
fig.write_html(OUTPUT_HTML / '3_btc_dominance_price_animated.html')  # Save HTML

---

### Animated Regression Analysis




##### Scatter Plots with Evolving Regression Lines

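**Interpretation** (β is the slope from regressing the y-axis asset's log returns on the x-axis asset's; in the calls below, BTC is on the x-axis):
- **β > 1**: The y-axis asset moves more than the x-axis asset (amplified)
- **β = 1**: The two assets move proportionally
- **0 < β < 1**: The y-axis asset moves less than the x-axis asset (dampened)
- **R² > 0.5**: Strong linear relationship (rule of thumb)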
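
To make these quantities concrete, here is a minimal sketch of an ordinary least-squares fit with `scipy.stats.linregress` on synthetic returns. The data, the seed, and the true slope of 0.5 are assumptions chosen for illustration; the notebook's own `compute_regression` helper is defined elsewhere and may differ in detail.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic daily log returns (illustrative only)
x = rng.normal(0, 0.02, size=500)             # x-axis asset
y = 0.5 * x + rng.normal(0, 0.005, size=500)  # y responds at half strength, plus noise

fit = stats.linregress(x, y)  # Regress y on x
beta = fit.slope              # Slope of the fitted line
r_squared = fit.rvalue ** 2   # Share of variance in y explained by x

print(f"beta = {beta:.3f}, R^2 = {r_squared:.3f}, p = {fit.pvalue:.2e}, n = {len(x)}")
```

Swapping `x` and `y` changes the slope, so the reading of β always depends on which asset sits on which axis.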


In [None]:
# --- 1) Data Preparation ---
# Normally computed in Cell 10; rebuilt here only if it is missing
if 'logret_regression' not in globals():
    print("⚠️ Computing log returns (should be from Cell 10)...")
    assets_cols = {'BTC': 'BTC_Price', 'S&P 500': 'Close_^GSPC', 'Gold': 'Close_GLD'}
    available = {name: col for name, col in assets_cols.items() if col in df.columns}
    prices = df[list(available.values())].dropna()
    logret_regression = np.log(prices).diff().dropna()
    logret_regression.columns = list(available.keys())

# Use the already computed data
logret = logret_regression.copy()


In [None]:
# --- 2) Animation Function ---
def create_animated_scatter(x_name, y_name, title, filename):
    """
    Create animated scatter plot with evolving regression line
    
    Args:
        x_name: Independent variable name (asset)
        y_name: Dependent variable name (asset)
        title: Chart title
        filename: Output HTML filename
    """
    
    # --- Frame Selection ---
    all_dates = logret.index.to_list()
    frame_dates = all_dates[::14]  # Every 14th observation, to limit the number of frames
    if frame_dates[-1] != all_dates[-1]:
        frame_dates.append(all_dates[-1])  # Ensure last date included
    
    # --- Axis Ranges (fixed for stability) ---
    x_all, y_all = logret[x_name].dropna(), logret[y_name].dropna()
    x_pad = (x_all.max() - x_all.min()) * 0.08
    y_pad = (y_all.max() - y_all.min()) * 0.08
    X_RANGE = [float(x_all.min() - x_pad), float(x_all.max() + x_pad)]
    Y_RANGE = [float(y_all.min() - y_pad), float(y_all.max() + y_pad)]
    
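    # --- Event Background Shapes ---
    # NOTE: both axes of this chart are log returns, not dates, so these
    # date-positioned shapes do not map onto the visible range; event periods
    # are conveyed by the marker colors instead.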
    shapes = []
    for i, (date_i, label_i, color_i, sent_i) in enumerate(EVENT_DATES):
        # Vertical event line
        shapes.append(dict(
            type='line', xref='x', yref='paper', 
            x0=date_i, x1=date_i, y0=0, y1=1,
            line=dict(color=color_i, width=2, dash='dash')
        ))
        # Colored period rectangle
        if i < len(EVENT_DATES) - 1:
            date_j = EVENT_DATES[i + 1][0]
            shapes.append(dict(
                type='rect', xref='x', yref='paper',
                x0=logret.index.min() if i == 0 else date_i, 
                x1=date_j, y0=0, y1=1,
                fillcolor=color_i, opacity=0.08, 
                layer='below', line_width=0
            ))
    
    # --- Initial Frame (t0) ---
    t0 = frame_dates[0]
    mask0 = logret.index <= t0
    init_data = logret[mask0]
    x0, y0 = init_data[x_name].values, init_data[y_name].values
    colors0 = [get_event_color(d, EVENT_DATES) for d in init_data.index]
    
    # Create figure
    fig = go.Figure()
    
    # Trace 0: Scatter points
    fig.add_trace(go.Scatter(
        x=x0, y=y0, mode='markers',
        marker=dict(size=5, color=colors0, opacity=0.5),
        hovertemplate=f"{x_name}: %{{x:.4f}}<br>{y_name}: %{{y:.4f}}<extra></extra>",
        name='Data Points', showlegend=False
    ))
    
    # Trace 1: Regression line (MUST exist in initial figure)
    reg0 = compute_regression(x0, y0)
    if reg0:
        x_line = np.array(X_RANGE)
        y_line = reg0['slope'] * x_line + reg0['intercept']
        fig.add_trace(go.Scatter(
            x=x_line, y=y_line, mode='lines',
            line=dict(color='#E74C3C', width=4, dash='dash'),  # Thick red line
            name='Regression Line',
            hovertemplate=f"y = {reg0['slope']:.3f}x + {reg0['intercept']:.3f}<extra></extra>",
            showlegend=True
        ))
    else:
        # Placeholder if not enough data yet
        fig.add_trace(go.Scatter(
            x=[], y=[], mode='lines',
            line=dict(color='#E74C3C', width=4),
            name='Regression Line', showlegend=True
        ))
    
    # --- Build Animation Frames ---
    frames = []
    for idx, t in enumerate(frame_dates):
        mask = logret.index <= t
        frame_data = logret[mask]
        
        x_t = frame_data[x_name].values
        y_t = frame_data[y_name].values
        colors_t = [get_event_color(d, EVENT_DATES) for d in frame_data.index]
        
        # Compute regression for this frame
        reg = compute_regression(x_t, y_t)
        
        # Frame trace 0: Updated scatter points
        scatter_trace = go.Scatter(
            x=x_t, y=y_t, mode='markers',
            marker=dict(size=5, color=colors_t, opacity=0.5)
        )
        
        # Frame trace 1: Updated regression line
        if reg:
            x_line = np.array(X_RANGE)
            y_line = reg['slope'] * x_line + reg['intercept']
            line_trace = go.Scatter(
                x=x_line, y=y_line, mode='lines',
                line=dict(color='#E74C3C', width=4, dash='dash'),
                hovertemplate=f"y = {reg['slope']:.3f}x + {reg['intercept']:.3f}<extra></extra>"
            )
        else:
            line_trace = go.Scatter(
                x=[], y=[], mode='lines',
                line=dict(color='#E74C3C', width=4)
            )
        
        # Annotation with regression stats
        annotations = []
        if reg:
            progress = ((idx + 1) / len(frame_dates)) * 100
            annotations.append(dict(
                x=0.05, y=0.95, xref='paper', yref='paper',
                text=(f"<b>📅 {t.strftime('%d %b %Y')}</b> "
                      f"<span style='color:#3498DB;'>({progress:.0f}%)</span><br>"
                      f"<b style='font-size:15px; color:#E74C3C;'>"
                      f"y = {reg['slope']:.3f}x + {reg['intercept']:.3f}</b><br>"
                      f"<b>β = {reg['slope']:.3f}</b><br>"
                      f"R² = {reg['r_squared']:.3f}<br>"
                      f"p = {reg['p_value']:.2e}<br>"
                      f"n = {reg['n']}"),
                showarrow=False,
                font=dict(size=12, color='#2C3E50', family='Arial'),
                bgcolor='rgba(255,255,255,0.95)',
                bordercolor='#E74C3C', borderwidth=3, borderpad=6,
                align='left', xanchor='left', yanchor='top'
            ))
        
        # Add frame with BOTH traces (scatter + line)
        frames.append(go.Frame(
            name=str(t.date()),
            data=[scatter_trace, line_trace],
            layout=go.Layout(annotations=annotations)
        ))
    
    fig.frames = frames
    
    # --- Slider Configuration ---
    slider_steps = [dict(
        args=[[fr.name], dict(
            mode='immediate',
            frame=dict(duration=50, redraw=True),
            transition=dict(duration=0)
        )],
        label=fr.name, 
        method='animate'
    ) for fr in frames]
    
    sliders = [dict(
        active=len(frames)-1,
        currentvalue=dict(
            prefix='📍 Date: ', 
            font=dict(size=14, color='#2C3E50', family='Arial Black'),
            visible=True, xanchor='left'
        ),
        pad=dict(t=60, b=10), 
        steps=slider_steps,
        x=0.05, xanchor='left', y=0, yanchor='top', len=0.85,
        bgcolor='#E8E8E8', bordercolor='#2C3E50', borderwidth=2,
        activebgcolor='#3498DB', font=dict(size=10)
    )]
    
    # --- Play/Pause/Reset Buttons ---
    play_pause = [dict(
        type='buttons', direction='left', 
        x=0.78, xanchor='left', y=1.06, yanchor='middle',
        buttons=[
            dict(label='▶️', method='animate',
                 args=[None, dict(
                     frame=dict(duration=50, redraw=True),
                     fromcurrent=True, 
                     transition=dict(duration=0)
                 )]),
            dict(label='⏸️', method='animate',
                 args=[[None], dict(
                     mode='immediate', 
                     frame=dict(duration=0)
                 )]),
            dict(label='🔁', method='animate',
                 args=[[str(frame_dates[0].date())], dict(
                     mode='immediate',
                     frame=dict(duration=0, redraw=True)
                 )])
        ],
        bgcolor='#ECF0F1', bordercolor='#2C3E50', borderwidth=2,
        font=dict(size=11), pad=dict(l=3, r=3, t=3, b=3)
    )]
    
    # --- Final Layout ---
    fig.update_layout(
        title={
            'text': f"{title}<br><sub>Animated Regression • Red Line Shows Evolving Relationship</sub>",
            'x': 0.5, 'xanchor': 'center', 'y': 0.92,
            'font': {'size': 20, 'color': '#2C3E50', 'family': 'Arial Black'}
        },
        width=1152, height=648,
        plot_bgcolor='#F8F9FA', 
        paper_bgcolor='#FFFFFF',
        margin=dict(l=90, r=130, t=140, b=140),
        xaxis=dict(
            title=dict(text=f'{x_name} Log Returns',
                       font=dict(size=14, color='#2C3E50')),
            showgrid=True, gridcolor='#E8E8E8',
            zeroline=True, zerolinecolor='#34495E', zerolinewidth=2,
            autorange=False, range=X_RANGE
        ),
        yaxis=dict(
            title=dict(text=f'{y_name} Log Returns',
                       font=dict(size=14, color='#2C3E50')),
            showgrid=True, gridcolor='#E8E8E8',
            zeroline=True, zerolinecolor='#34495E', zerolinewidth=2,
            autorange=False, range=Y_RANGE
        ),
        shapes=shapes, 
        updatemenus=play_pause, 
        sliders=sliders,
        hovermode='closest',
        legend=dict(
            x=0.98, y=0.02, xanchor='right', yanchor='bottom',
            bgcolor='rgba(255,255,255,0.95)', 
            bordercolor='#E74C3C',
            borderwidth=2, font=dict(size=12)
        ),
        uirevision='lock'  # Prevents axis reset on frame change
    )
    
    # --- Export ---
    output_path = OUTPUT_HTML / filename
    fig.write_html(output_path)
    fig.show()
    print(f"   ✅ Saved: {output_path}")


In [None]:
# BTC vs S&P 500
if 'BTC' in logret.columns and 'S&P 500' in logret.columns:
    create_animated_scatter(
        'BTC', 'S&P 500', 
        'BTC vs S&P 500 • Correlation Evolution', 
        '4_regression_btc_sp500_animated.html'
    )


In [None]:
# BTC vs Gold
if 'BTC' in logret.columns and 'Gold' in logret.columns:
    create_animated_scatter(
        'BTC', 'Gold', 
        'BTC vs Gold • Correlation Evolution', 
        '5_regression_btc_gold_animated.html'
    )

print("\n✅ Scatter plot animations completed")


---

# 🎉 ANALYSIS COMPLETE!

---

## 📂 Generated Files

### CSV Output:
```
OUTPUTS/CSV/
└── Merged_df.csv          # Complete merged dataset
```

### HTML Visualizations:
```
OUTPUTS/HTML/
├── 1_correlation_matrix.html                   # Chart 1: Heatmap
├── 2_btc_spx_rolling_corr_dynamic.html         # Chart 2: Rolling correlation
├── 3_btc_dominance_price_animated.html         # Chart 3: Dominance
├── 4_regression_btc_sp500_animated.html        # Chart 4: BTC vs S&P 500
└── 5_regression_btc_gold_animated.html         # Chart 5: BTC vs Gold
```

---

## 🔧 How to Adapt This Notebook

### 1️⃣ Change Data Sources
Modify in **Section 1 - Path Configuration**:
```python
CSV_BTC = INPUT_DIR / "your_btc_data.csv"
CSV_CRYPTO = INPUT_DIR / "your_crypto_data.csv"
```

### 2️⃣ Adjust Date Range
Modify in **Section 1 - Analysis Parameters**:
```python
START_DATE = "2020-01-01"
END_DATE = "2025-12-31"
```
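
As a rough illustration of what such a date window does, the sketch below slices a toy DataFrame by label. How the notebook itself applies `START_DATE` and `END_DATE` is defined in Section 1 and may differ; the frame `demo` and its `price` column are invented for the example.

```python
import pandas as pd

START_DATE = "2020-01-01"
END_DATE = "2025-12-31"

# Toy daily data that starts two days before the window (illustrative only)
idx = pd.date_range("2019-12-30", "2020-01-03", freq="D")
demo = pd.DataFrame({"price": range(len(idx))}, index=idx)

# Label-based slicing on a DatetimeIndex includes both endpoints
in_window = demo.loc[START_DATE:END_DATE]

print(in_window)
```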

### 3️⃣ Add/Remove Assets
Modify in **Section 1 - Analysis Parameters**:
```python
TICKERS = [
    "BTC-USD",  # Bitcoin
    "^GSPC",    # S&P 500
    "GLD",      # Gold
    "ETH-USD",  # Add Ethereum
    "^DJI"      # Add Dow Jones
]
```

### 4️⃣ Customize Market Events
Modify in **Section 1 - Market Events Configuration**:
```python
MARKET_EVENTS = [
    ("2024-01-10", "Your Event", "#27AE60", "positive"),
    # Add more events here
]
```
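
Each entry above is a `(date, label, color, sentiment)` tuple. The sketch below shows one way such tuples could be parsed and used to look up the color of the period a date falls in. `period_color` and the sample events are hypothetical; the notebook's own `get_event_color` helper is defined elsewhere and may behave differently.

```python
import pandas as pd

# Hypothetical events in the (date, label, color, sentiment) shape
events = [
    ("2024-01-10", "Event A", "#27AE60", "positive"),
    ("2024-04-20", "Event B", "#E74C3C", "negative"),
]

# Parse the date strings once so they compare cleanly with a DatetimeIndex
parsed = sorted(
    (pd.Timestamp(d), label, color, sentiment)
    for d, label, color, sentiment in events
)

def period_color(date, parsed_events, default="#95A5A6"):
    """Return the color of the most recent event on or before `date`."""
    color = default
    for event_date, _label, event_color, _sentiment in parsed_events:
        if pd.Timestamp(date) >= event_date:
            color = event_color  # This event has already happened
        else:
            break                # Events are sorted, so later ones cannot match
    return color

print(period_color("2024-02-01", parsed))
```

Dates before the first event fall back to the `default` color, which is an arbitrary gray chosen for this sketch.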

### 5️⃣ Change Output Location
Modify in **Section 1 - Path Configuration**:
```python
OUTPUT_DIR = BASE_DIR / "MY_OUTPUTS"
```

---

## 🌍 For GitHub Users

### ✅ Repository Setup
1. **Clone** the repository
2. **Create** `DATASETS/` folder with required CSV files:
   - `btc_cap_price.csv`
   - `global_crypto_cap.csv`
3. **Run** all cells (Cell → Run All)
4. **View** generated HTML files in `OUTPUTS/HTML/`

### ✅ Cross-Platform Compatibility
- ✅ Uses `pathlib.Path` (works on Windows, Mac, Linux)
- ✅ Automatic directory creation
- ✅ No hardcoded absolute paths
- ✅ Relative path structure
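
The automatic directory creation mentioned above can be done with `Path.mkdir`. The following is a minimal sketch assuming the `OUTPUTS/CSV` and `OUTPUTS/HTML` layout shown earlier; `ensure_output_dirs` is a hypothetical helper name, and the demo runs in a temporary directory so it does not touch a real project folder.

```python
import tempfile
from pathlib import Path

def ensure_output_dirs(base_dir):
    """Create OUTPUTS/CSV and OUTPUTS/HTML under base_dir if they are missing."""
    output_dir = Path(base_dir) / "OUTPUTS"
    csv_dir = output_dir / "CSV"
    html_dir = output_dir / "HTML"
    for folder in (csv_dir, html_dir):
        # parents=True creates OUTPUTS/ as needed; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
    return csv_dir, html_dir

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    csv_dir, html_dir = ensure_output_dirs(tmp)
    created = (csv_dir.is_dir(), html_dir.is_dir())

print(created)
```

Because of `exist_ok=True`, re-running the notebook does not fail when the folders already exist.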

### ✅ Dependencies
Install required packages:
```bash
pip install pandas numpy yfinance plotly scipy
```

Or use requirements.txt:
```bash
pip install -r requirements.txt
```

## 🙏 Acknowledgments

**Data Sources**:
- Yahoo Finance (via yfinance)
- Historical crypto market cap data

**Libraries**:
- Plotly (interactive visualizations)
- Pandas (data manipulation)
- SciPy (statistical analysis)

---

<div align="center">

### 📊 Happy Analyzing! 🚀

</div>
