# Step 1: Feature Engineering - The Foundation of Everything
# ----------------------------------------------------------
# This step determines the upper bound of model performance.
# All features must be non-forward-looking (i.e., only based on information available up to time t).

# Price-Derived Features:
# - Momentum: Past N-day returns (e.g., 7, 30, 90), Sharpe ratio over past N days.
# - Volatility: Historical volatility (e.g., 20-day rolling std), ATR (Average True Range).
# - Trend Strength: ADX (Average Directional Index), deviation from moving averages (e.g., Close / SMA(200)).
# - Relative Position: Where the current price sits within its rolling range, min-max scaled: (Close - Min(90)) / (Max(90) - Min(90)).
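
# A minimal pandas sketch of some of the price-derived features listed above. The helper
# name `price_features` and the column names are illustrative assumptions, and ATR/ADX are
# left out because they need high/low data. Every rolling window ends at day t, so nothing
# looks forward.

```python
import numpy as np
import pandas as pd


def price_features(close: pd.Series) -> pd.DataFrame:
    """Backward-looking features from a daily close series (hypothetical helper)."""
    ret = close / close.shift(1) - 1.0
    feats = pd.DataFrame(index=close.index)
    # Momentum: trailing N-day simple returns.
    for n in (7, 30, 90):
        feats[f"mom_{n}"] = close / close.shift(n) - 1.0
    # Trailing 30-day Sharpe-like ratio: mean / std of daily returns, not annualized.
    feats["sharpe_30"] = ret.rolling(30).mean() / ret.rolling(30).std()
    # Volatility: 20-day rolling standard deviation of daily returns.
    feats["vol_20"] = ret.rolling(20).std()
    # Trend: close relative to its 200-day simple moving average.
    feats["close_over_sma200"] = close / close.rolling(200).mean()
    # Relative position inside the trailing 90-day range, between 0 and 1.
    lo, hi = close.rolling(90).min(), close.rolling(90).max()
    feats["range_pos_90"] = (close - lo) / (hi - lo)
    return feats
```

# The first rows of each column are NaN until its window has filled; drop or mask them
# before training.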

# On-Chain Data:
# - SOPR: Spent Output Profit Ratio (SOPR < 1 means coins are, on average, being spent at a loss, which has often coincided with good entry points).
# - MVRV Z-Score: MVRV Z < 0 often indicates market bottom zone.
# - Network Activity: Active addresses, number of transactions, etc.

# Macro & Sentiment Features:
# - Fear & Greed Index
# - DXY (Dollar Index), S&P 500 returns and volatility

# Feature Transformations:
# - Z-score standardization (use rolling or expanding mean/std so that no future data leaks into X_t)
# - First and second differences (e.g., delta, acceleration)
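
# A small sketch of both transformations, assuming pandas. The function names and the
# 365-day default window are assumptions for illustration; the z-score uses only the
# trailing window ending at day t.

```python
import numpy as np
import pandas as pd


def rolling_zscore(x: pd.Series, window: int = 365) -> pd.Series:
    """Z-score of x_t against the trailing `window` observations ending at t."""
    mean = x.rolling(window).mean()
    std = x.rolling(window).std()  # sample standard deviation (ddof=1)
    return (x - mean) / std


def add_differences(x: pd.Series) -> pd.DataFrame:
    """First difference (delta) and second difference (acceleration) of x."""
    delta = x.diff()
    return pd.DataFrame({"delta": delta, "accel": delta.diff()})
```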

# Final Output: For each day t, compute a feature vector X_t

# Step 2: Label Definition - Teaching the Model What to Learn
# -----------------------------------------------------------
# Each X_t should have a corresponding y_t label, representing the "desirability" of buying at time t.

# Label A: Future Price Reversal (Regression)
# y_t = - (future_price_N_days / current_price - 1)
# Goal: Encourage the model to buy before price increases. A more negative label marks a better buy, so flip the sign of y_pred before converting it into a weight.

# Label B: Future Lowest Price Index (Multi-class Classification)
# y_t = argmin(future_prices_over_N_days)   (the class is the day offset within the window)
# Goal: Predict which of the next N days will have the lowest price.

# Label C: Future Sharpe Ratio (Recommended, Regression)
# y_t = mean(future_N_day_returns) / std(future_N_day_returns)
# Goal: Predict risk-adjusted return; identifies low-volatility, high-return buy points.
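
# The three labels can be sketched with pandas as below. The helper name `make_labels`,
# the column names, the default N = 30, and the choice of days t+1 .. t+N as the future
# window are all assumptions. Labels look forward by design; only the features must not.

```python
import numpy as np
import pandas as pd


def make_labels(close: pd.Series, n: int = 30) -> pd.DataFrame:
    """Forward-looking labels for day t, built from days t+1 .. t+n (assumed window)."""
    ret = close / close.shift(1) - 1.0
    labels = pd.DataFrame(index=close.index)
    # Label A: negative n-day forward return (more negative = better buy).
    labels["label_a"] = -(close.shift(-n) / close - 1.0)
    # Label B: 0-based offset of the lowest close among days t+1 .. t+n.
    labels["label_b"] = close.rolling(n).apply(np.argmin, raw=True).shift(-n)
    # Label C: mean / std of the daily returns over days t+1 .. t+n.
    fwd_mean = ret.rolling(n).mean().shift(-n)
    fwd_std = ret.rolling(n).std().shift(-n)
    labels["label_c"] = fwd_mean / fwd_std
    # The last n rows have no complete future window and stay NaN.
    return labels
```

# Rows with NaN labels (the final N days) should be dropped before training.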

# Step 3: Model Selection and Training
# ------------------------------------

# 1. Model Choice:
# - Gradient Boosting Trees (LightGBM, XGBoost, CatBoost): Strongly recommended for tabular data.
#   Pros: Handles missing values natively, supports regularization, exposes feature importances, and trains fast.
# - Time Series Models (LSTM, Transformer): More advanced; can capture long-term dependencies but need more data and tuning.

# 2. Walk-Forward Validation (Time Series Friendly Training)
# - Sort data chronologically.
# - Loop training as follows:
#     1. Train: Fit LightGBM on the initial window (e.g., 2016–2020).
#     2. Predict: Score the next short period (e.g., Jan 2021) with the fitted model.
#     3. Store: Save the predicted scores y_pred for that period.
#     4. Slide window: Move both the training window and the prediction period forward by one month.
#     5. Repeat until entire backtest period is covered.
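
# The loop above can be sketched with NumPy alone. To stay dependency-free, this sketch
# swaps in an ordinary least squares stand-in where the steps use LightGBM; any model
# wrapped as fit_predict(X_train, y_train, X_test) fits the same slot. The names
# `walk_forward_predict`, `fit_predict`, and `gap` are assumptions, and window sizes are
# counted in rows rather than calendar months. `gap` drops the last rows of each training
# slice, whose N-day-ahead labels would otherwise overlap the prediction period.

```python
import numpy as np


def walk_forward_predict(X, y, fit_predict, train_size, step, gap=0):
    """Out-of-sample predictions from a sliding training window.

    Rows must already be sorted chronologically. Entries before the first
    prediction period stay NaN.
    """
    n = len(X)
    y_pred = np.full(n, np.nan)
    start = 0
    while start + train_size < n:
        train = slice(start, start + train_size - gap)
        test = slice(start + train_size, min(start + train_size + step, n))
        y_pred[test] = fit_predict(X[train], y[train], X[test])
        start += step  # slide the training window and prediction period forward
    return y_pred


def least_squares_fit_predict(X_train, y_train, X_test):
    """Stand-in model (not LightGBM): least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef
```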

# Result: Daily y_pred sequence aligned with price data for the full period.

# Step 4: Weight Transformation & Backtesting
# -------------------------------------------

# 1. Signal Processing:
# - Convert raw y_pred into positive signal scores.
# - Use Sigmoid or modified Softmax:
#   raw_signal = 1 / (1 + exp(-y_pred))
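
# A numerically stable version of the sigmoid formula, as a sketch. It assumes that a
# higher y_pred means a more desirable buy; for a label where lower is better, pass
# -y_pred instead. The function name is an assumption.

```python
import numpy as np


def sigmoid(y_pred):
    """Map real-valued scores into [0, 1] without overflow for large |y_pred|."""
    y_pred = np.asarray(y_pred, dtype=float)
    e = np.exp(-np.abs(y_pred))  # always in [0, 1], so exp never overflows
    return np.where(y_pred >= 0, 1.0 / (1.0 + e), e / (1.0 + e))
```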

# 2. Rolling Window Normalization:
# - (Optional) Add a small epsilon first so that every weight ends up strictly positive:
#   raw_signals = raw_signals + epsilon
# - Then, for each 365-day evaluation window, normalize so the weights sum to 1:
#   weights = raw_signals / sum(raw_signals)
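
# A sketch of the per-window normalization with NumPy. The function names and the default
# epsilon of 1e-6 are assumptions; raw signals are assumed to be non-negative (for example,
# sigmoid outputs). Epsilon is added before dividing so the weights still sum to 1.

```python
import numpy as np


def window_weights(raw_signals, epsilon=1e-6):
    """Strictly positive weights that sum to 1 over one evaluation window."""
    s = np.asarray(raw_signals, dtype=float) + epsilon
    return s / s.sum()


def rolling_window_weights(raw_signals, window=365, epsilon=1e-6):
    """Yield (start_index, weights) for every full window of length `window`."""
    raw = np.asarray(raw_signals, dtype=float)
    for start in range(len(raw) - window + 1):
        yield start, window_weights(raw[start:start + window], epsilon)
```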

# 3. Metric Computation:
# - Use weights and prices to calculate metrics for each window:
#   - SPD (Signal-Weighted Price Delta)
#   - SPD Percentile
#   - Win Rate

# - Final Score: Average across all rolling windows.
