# Condor Game

The goal is to anticipate how asset prices will evolve by providing not a single forecasted value, but a **full probability distribution over the future price change at multiple forecast horizons and steps.**

## Probabilistic Forecasting

Probabilistic forecasting provides **a distribution of possible future values** rather than a single point estimate, allowing for uncertainty quantification. Instead of predicting only the most likely outcome, it estimates a range of potential outcomes along with their probabilities by outputting a **probability distribution**.

A probabilistic forecast models the conditional probability distribution of a future value $(Y_t)$ given past observations $(\mathcal{H}_{t-1})$. This can be expressed as:  

$$P(Y_t \mid \mathcal{H}_{t-1})$$

where $(\mathcal{H}_{t-1})$ represents the historical data up to time $(t-1)$. Instead of a single prediction $(\hat{Y}_t)$, the model estimates a full probability distribution $(f(Y_t \mid \mathcal{H}_{t-1}))$, which can take different parametric forms, such as a Gaussian:

$$Y_t \mid \mathcal{H}_{t-1} \sim \mathcal{N}(\mu_t, \sigma_t^2)$$

where $(\mu_t)$ is the predicted mean and $(\sigma_t^2)$ represents the uncertainty in the forecast.
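
To make the Gaussian form above concrete, here is a minimal sketch using only the Python standard library. The values of `mu_t` and `sigma_t` are hypothetical and chosen purely for illustration; they are not estimated from any data in this notebook.

```python
from statistics import NormalDist

# Hypothetical forecast parameters (illustration only).
mu_t, sigma_t = 0.0, 2.0
forecast = NormalDist(mu=mu_t, sigma=sigma_t)

# Density of the forecast at a candidate outcome.
density_at_zero = forecast.pdf(0.0)

# Probability that the realized value exceeds +1.
prob_above_one = 1.0 - forecast.cdf(1.0)

print(f"f(0 | H) = {density_at_zero:.4f}")
print(f"P(Y_t > 1 | H) = {prob_above_one:.4f}")
```

Unlike a point forecast, the same object answers many questions at once: how likely a given move is, how concentrated the forecast is, and so on.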

Probabilistic forecasting can be handled through various approaches, including **variance forecasters**, **quantile forecasters**, **interval forecasters** or **distribution forecasters**, each capturing uncertainty differently.
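
The relationship between these approaches can be sketched as follows: once a full distribution forecast is available, the variance, quantile, and interval forecasts can all be read off it. The distribution parameters below are hypothetical.

```python
from statistics import NormalDist

# Hypothetical distribution forecast (illustration only).
forecast = NormalDist(mu=0.0, sigma=2.0)

# Variance forecast: only the spread of the distribution is reported.
variance = forecast.variance

# Quantile forecast: selected quantiles of the predictive distribution.
quantiles = {q: forecast.inv_cdf(q) for q in (0.05, 0.5, 0.95)}

# Interval forecast: the central 90% prediction interval.
interval_90 = (quantiles[0.05], quantiles[0.95])

print("variance:", variance)
print("quantiles:", quantiles)
print("90% interval:", interval_90)
```

The reverse direction does not hold in general: a single interval or variance does not determine the full distribution.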

In this notebook, we model the forecast target with a Gaussian density function (or a mixture of Gaussians). The model output has the following form:

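```python
[
    {
        "step": (k + 1) * step,
        "type": "mixture",
        "components": [
            {
                "density": {
                    "type": "builtin",
                    "name": "norm",
                    # `scale` is a standard deviation, not a variance
                    "params": {"loc": y_mean, "scale": y_std},
                },
                "weight": weight,
            },
            ...  # further mixture components, if any
        ],
    }
    for k in range(0, horizon // step)
]
```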

A **mixture density**, such as the Gaussian mixture $\sum_{i=1}^{K} w_i \mathcal{N}(Y_t \mid \mu_i, \sigma_i^2)$, can capture multi-modal distributions and approximate more complex ones.
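
As a rough illustration of the mixture formula, the sketch below evaluates a two-component Gaussian mixture and checks numerically that it integrates to about one. The helper `mixture_pdf` and the component values are hypothetical, not part of `condorgame`.

```python
from statistics import NormalDist

def mixture_pdf(y, components):
    """Evaluate sum_i w_i * N(y | mu_i, sigma_i^2) for (w, mu, sigma) triples."""
    return sum(w * NormalDist(mu, sigma).pdf(y) for w, mu, sigma in components)

# Hypothetical bimodal mixture: weights must sum to 1.
components = [(0.7, -1.0, 0.5), (0.3, 2.0, 1.0)]

# Riemann-sum check over [-10, 10] that the density has total mass ~1.
dx = 0.01
total_mass = sum(mixture_pdf(-10 + i * dx, components) * dx for i in range(2001))

print(f"total mass ~ {total_mass:.4f}")
print(f"density at modes: {mixture_pdf(-1.0, components):.3f}, "
      f"{mixture_pdf(2.0, components):.3f}")
```

The density has two local peaks, one near each component mean, which a single Gaussian cannot represent.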

![proba_forecast_v3](https://github.com/Tarandro/image_broad/blob/main/proba_forecast_v3.png?raw=true)


**Probabilistic Forecasting** is particularly valuable in supply chain management. Below are some interesting resources for a deeper understanding:  

- [Probabilistic Forecasting](https://www.lokad.com/probabilistic-forecasting-definition/) – Overview of probabilistic forecasting and its applications.  
- [Quantile Forecasting](https://www.lokad.com/quantile-regression-time-series-definition/) – Explanation of quantile-based forecasting methods.  
- **Evaluation Metrics:**  
  - [Continuous Ranked Probability Score (CRPS)](https://www.lokad.com/continuous-ranked-probability-score/)  
  - [Cross-Entropy](https://www.lokad.com/cross-entropy-definition/)  
  - [Pinball Loss](https://www.lokad.com/pinball-loss-function-definition/)
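
For intuition about two of these metrics, here is a minimal sketch of the closed-form CRPS of a Gaussian forecast and of the pinball loss for a single quantile. Note the assumption: this is the plain, unnormalized CRPS; the scores reported later in this notebook are normalized by the evaluator, and that normalization is not reproduced here.

```python
import math
from statistics import NormalDist

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of the forecast N(mu, sigma^2) against outcome y."""
    z = (y - mu) / sigma
    std = NormalDist()  # standard normal
    return sigma * (z * (2 * std.cdf(z) - 1) + 2 * std.pdf(z) - 1 / math.sqrt(math.pi))

def pinball_loss(y, q_pred, tau):
    """Pinball (quantile) loss for predicted tau-quantile q_pred and outcome y."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1) * diff)

# Lower is better: a forecast centered on the outcome beats one far away.
print(crps_gaussian(0.0, 1.0, 0.0), crps_gaussian(0.0, 1.0, 3.0))
# Under-predicting a high quantile costs more than over-predicting it.
print(pinball_loss(1.0, 0.0, 0.9), pinball_loss(-1.0, 0.0, 0.9))
```

Both scores reward forecasts that are concentrated near the realized value while penalizing overconfidence when the outcome lands in the tails.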

In [1]:
import numpy as np
import pandas as pd
import os
from tqdm import tqdm
from datetime import datetime, timezone, timedelta

from condorgame.price_provider import shared_pricedb
from condorgame.tracker import TrackerBase
from condorgame.tracker_evaluator import TrackerEvaluator
from condorgame.examples.utils import load_test_prices_once, load_initial_price_histories_once, visualize_price_data, count_evaluations
from condorgame.debug.plots import plot_quarantine, plot_prices, plot_scores

## What You Must Predict

Trackers must predict the **probability distribution of price changes**, defined as:

$$
r_{t,k} = P_t - P_{t-k}
$$

For each defined step **$k$** (e.g., 5 minutes, 1 hour, …), your tracker must return a full **probability density function (PDF)** over the future price change **$r_{t,k}$**.
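
The definition above can be sketched on a toy price series. In this sketch the lag `k` is expressed in number of samples rather than seconds, and the helper `price_changes` and the prices are hypothetical.

```python
def price_changes(prices, k):
    """Return r_{t,k} = P_t - P_{t-k} for every t that has k samples of history."""
    return [prices[t] - prices[t - k] for t in range(k, len(prices))]

# Toy prices on a regular grid (illustration only).
prices = [100.0, 101.5, 101.0, 103.0, 102.0]

one_step = price_changes(prices, 1)
three_step = price_changes(prices, 3)

print(one_step)    # [1.5, -0.5, 2.0, -1.0]
print(three_step)  # [3.0, 0.5]
```

These are absolute price differences, so their scale depends on the asset's price level.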

# Gaussian Step Tracker

A simple benchmark that predicts future price changes by assuming they follow a Gaussian (normal) distribution estimated from recent historical data. It models the incremental price change (a price difference, not a percentage return) over each prediction step.

### **Key Ideas**  

- Historical prices sampled at 5-minute (300s) resolution are converted into returns: $r_{t} = P_t - P_{t-1}$
- The tracker estimates:
    - Drift: mean historical return 𝜇
    - Volatility: standard deviation of historical returns 𝜎
- For each future step 𝑘, it outputs a normal density with mean $\frac{k}{300}\mu$ and standard deviation $\sqrt{\frac{k}{300}}\sigma$:
$$r_{t,k} \sim \mathcal{N}\!\left(\frac{k}{300}\mu,\; \frac{k}{300}\sigma^2\right)$$

> The coefficient $\frac{k}{300}$ represents the ratio of the forecast step length to the base 5-minute (300s) interval, scaling the 5-minute return distribution to the target prediction step.

Each density prediction must comply with the [density_pdf](https://github.com/microprediction/densitypdf/blob/main/densitypdf/__init__.py) specification.
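
The scaling rule above follows from treating the 5-minute returns as independent and identically distributed (an assumption of this benchmark): the sum of $n$ such returns has mean $n\mu$ and variance $n\sigma^2$, hence standard deviation $\sqrt{n}\,\sigma$. A minimal sketch, with a hypothetical helper `scale_gaussian` and made-up parameter values:

```python
import math

def scale_gaussian(mu, sigma, step, base=300):
    """Scale a base-interval (mu, sigma) to a `step`-second interval.

    Assumes i.i.d. returns: the mean grows linearly with the ratio
    a = step / base, the standard deviation with sqrt(a).
    """
    a = step / base
    return a * mu, math.sqrt(a) * sigma

# Hypothetical 5-minute drift and volatility (illustration only).
mu_5m, sigma_5m = 0.2, 10.0

loc_1h, scale_1h = scale_gaussian(mu_5m, sigma_5m, step=3600)
print(f"1h loc = {loc_1h:.3f}, 1h scale = {scale_1h:.3f}")
```

Because volatility grows with the square root of time while drift grows linearly, uncertainty dominates at short steps and drift matters more at long ones.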

In [2]:
class GaussianStepTracker(TrackerBase):
    """
    A benchmark tracker that models *future incremental returns* as Gaussian-distributed.

    For each forecast step, the tracker returns a normal distribution
    r_{t,step} ~ N(a · mu, √a · sigma) where:
        - mu    = mean historical return
        - sigma = std historical return
        - a = (step / 300) represents the ratio of the forecast step duration to the historical 5-minute return interval.

    Multi-resolution forecasts (5min, 1h, 6h, 24h, ...)
    are automatically handled by `TrackerBase.predict_all()`,
    which calls the `predict()` method once per step size.

    /!/ This is not a price-distribution; it is a distribution over 
    incremental returns between consecutive steps /!/
    """
    def __init__(self):
        super().__init__()

    def predict(self, asset: str, horizon: int, step: int):
        """
        Produce a sequence of incremental return distributions
        for a single (asset, horizon, step) configuration.

        This method is called automatically by `TrackerBase.predict_all()`
        for each step resolution requested by the game.
        """

        # Retrieve recent historical prices sampled at 5-minute resolution
        resolution=300
        pairs = self.prices.get_prices(asset, days=3, resolution=resolution)
        if not pairs:
            return []

        _, past_prices = zip(*pairs)

        if len(past_prices) < 3:
            return []

        # Compute historical incremental returns (price differences)
        returns = np.diff(past_prices)

        # Estimate drift (mean return) and volatility (std dev of returns)
        mu = float(np.mean(returns))
        sigma = float(np.std(returns))

        if sigma <= 0:
            return []

        num_segments = horizon // step

        # Construct one predictive distribution per future time step.
        # Each distribution models the incremental return over a `step`-second interval.
        #
        # IMPORTANT:
        # - The returned objects must strictly follow the `density_pdf` specification.
        # - Each entry corresponds to the return between t + (k−1)·step and t + k·step.
        #
        # We use a single-component Gaussian mixture for simplicity:
        #   r_{t,k} ~ N( (step / 300) · μ , sqrt(step / 300) · σ )
        #
        # where μ and σ are estimated from historical 5-minute returns.
        distributions = []
        for k in range(1, num_segments + 1):
            distributions.append({
                "step": k * step,                      # Time offset (in seconds) from forecast origin
                "type": "mixture",
                "components": [{
                    "density": {
                        "type": "builtin",             # Note: use 'builtin' distributions instead of 'scipy' for speed
                        "name": "norm",  
                        "params": {
                            "loc": (step/resolution) * mu, 
                            "scale": np.sqrt(step/resolution) * sigma
                            }
                    },
                    "weight": 1
                }]
            })

        return distributions
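
As a quick sanity check on the structure returned by `predict()` above, the sketch below rebuilds the same payload without `TrackerBase` and verifies a few properties. The helpers `gaussian_step_payload` and `check_payload` are hypothetical, and the checks are our own assumptions about a well-formed payload rather than rules taken from the `density_pdf` specification.

```python
import math

def gaussian_step_payload(mu, sigma, horizon, step, resolution=300):
    """Build the same list-of-mixtures structure as `predict()`."""
    a = step / resolution
    return [
        {
            "step": k * step,
            "type": "mixture",
            "components": [{
                "density": {
                    "type": "builtin",
                    "name": "norm",
                    "params": {"loc": a * mu, "scale": math.sqrt(a) * sigma},
                },
                "weight": 1,
            }],
        }
        for k in range(1, horizon // step + 1)
    ]

def check_payload(distributions, horizon, step):
    """Assumed checks: one entry per step, weights sum to 1, positive scales."""
    assert len(distributions) == horizon // step
    for k, entry in enumerate(distributions, start=1):
        assert entry["step"] == k * step
        weights = [c["weight"] for c in entry["components"]]
        assert abs(sum(weights) - 1.0) < 1e-9
        assert all(c["density"]["params"]["scale"] > 0 for c in entry["components"])
    return True

# Hypothetical drift and volatility (illustration only).
payload = gaussian_step_payload(mu=0.2, sigma=10.0, horizon=3600, step=300)
ok = check_payload(payload, horizon=3600, step=300)
print(len(payload), "entries, valid:", ok)
```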

## Configurations

In [None]:
##########
# For each asset and historical timestamp, generate density forecasts
# over a fixed forecast horizon (e.g. 24h or 1h) at multiple temporal
# resolutions and evaluate them against realized outcomes.

# Assets to evaluate
assets = ["BTC", "SOL"] # Supported assets: "BTC", "SOL", "ETH", "XAU"

###
# Forecast configuration (in seconds)
# Each profile defines:
# - a forecast horizon
# - a set of step resolutions
# - how often predictions are triggered

FORECAST_PROFILES = {
    "24h": {
        "horizon": 24 * 3600,  # 24 hours
        # Multi-resolution forecast grid
        # All forecasts span the same horizon but differ in temporal granularity.
        "steps": [
                    300,       # "5min"
                    3600,      # "1hour"
                    6 * 3600,  # "6hour"
                    24 * 3600, # "24hour"
        ],
        "interval": 3600,  # triggered every hour
    },
    "1h": {
        "horizon": 1 * 3600,  # 1 hour
        "steps": [
                    60,       # "1min"
                    60 * 5,   # "5min"
                    60 * 15,  # "15min"
                    60 * 30,  # "30min"
                    3600,     # "1hour"
        ],
        "interval": 60 * 12,  # triggered every 12 minutes
    },
}

# Select which forecast profile to evaluate
ACTIVE_HORIZON = "24h"  # options: "24h", "1h"

HORIZON = FORECAST_PROFILES[ACTIVE_HORIZON]["horizon"]
STEPS = FORECAST_PROFILES[ACTIVE_HORIZON]["steps"]
INTERVAL = FORECAST_PROFILES[ACTIVE_HORIZON]["interval"]

# Base directory where all evaluation results will be stored
base_dir_results = "results"
os.makedirs(base_dir_results, exist_ok=True)

# End timestamp for the test data
# evaluation_end: datetime = datetime.now(timezone.utc)
evaluation_end: datetime = datetime(2025, 11, 15, 0, 0, 0, tzinfo=timezone.utc)

# Number of days of test data to load
# Note: the last `horizon` seconds of the time series will not be scored
days = 5

# Number of days of historical data used as warm-up before evaluation.
# This history is used only to initialize the tracker and is not scored.
days_history = 30
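
To get a feel for what these profiles imply, the sketch below counts how many densities each step size produces (`horizon // step`, as in the tracker) and gives a back-of-envelope estimate of the number of scorable predictions. The profile values are copied from the configuration above; `rough_eval_count` is a hypothetical helper and is not the library's `count_evaluations`.

```python
profiles = {
    "24h": {"horizon": 24 * 3600,
            "steps": [300, 3600, 6 * 3600, 24 * 3600],
            "interval": 3600},
    "1h": {"horizon": 3600,
           "steps": [60, 300, 900, 1800, 3600],
           "interval": 720},
}

def densities_per_step(profile):
    """Number of densities a tracker returns for each step size."""
    return {step: profile["horizon"] // step for step in profile["steps"]}

def rough_eval_count(days, profile):
    """Rough estimate: predictions in the last `horizon` seconds are unscorable."""
    return (days * 86400 - profile["horizon"]) // profile["interval"]

counts_24h = densities_per_step(profiles["24h"])
counts_1h = densities_per_step(profiles["1h"])
evals_24h = rough_eval_count(5, profiles["24h"])

print(counts_24h)  # {300: 288, 3600: 24, 21600: 4, 86400: 1}
print(counts_1h)   # {60: 60, 300: 12, 900: 4, 1800: 2, 3600: 1}
print(evals_24h)   # 96
```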

## Data

In [4]:
## Load the last N days of price data (test period)
test_asset_prices = load_test_prices_once(
    assets, shared_pricedb, evaluation_end, days=days
)
# test_asset_prices : dict : {asset -> [(timestamp, price), ...]} used for evaluation.

## Provide the tracker with initial historical data (for the first tick):
## load prices from the last H days up to N days ago
initial_histories = load_initial_price_histories_once(
    assets, shared_pricedb, evaluation_end, days_history=days_history, days_offset=days
)
# initial_histories : dict : {asset -> [(timestamp, price), ...]} used as warm-up history.

In [5]:
visualize_price_data(
    history_data=initial_histories, test_data=test_asset_prices,
    selected_assets=None, show_graph=True
)

Dataset:

| | asset | ts | price | split | time |
|---|---|---|---|---|---|
| 0 | BTC | 1760140800 | 112725.082912 | history | 2025-10-11 00:00:00+00:00 |
| 1 | BTC | 1760140860 | 112640.608677 | history | 2025-10-11 00:01:00+00:00 |
| 2 | BTC | 1760140920 | 112518.000000 | history | 2025-10-11 00:02:00+00:00 |
| 3 | BTC | 1760140980 | 112899.083463 | history | 2025-10-11 00:03:00+00:00 |
| 4 | BTC | 1760141040 | 113015.138623 | history | 2025-10-11 00:04:00+00:00 |
| ... | ... | ... | ... | ... | ... |
| 100795 | SOL | 1763164500 | 138.768610 | test | 2025-11-14 23:55:00+00:00 |
| 100796 | SOL | 1763164560 | 138.620682 | test | 2025-11-14 23:56:00+00:00 |
| 100797 | SOL | 1763164620 | 138.604360 | test | 2025-11-14 23:57:00+00:00 |
| 100798 | SOL | 1763164680 | 138.554917 | test | 2025-11-14 23:58:00+00:00 |


## Run live simulation on historic data

In [6]:
# Setup tracker + evaluator
tracker_evaluator = TrackerEvaluator(GaussianStepTracker())

for asset, history_price in test_asset_prices.items():

    # First tick: initialize historical data
    tracker_evaluator.tick({asset: initial_histories[asset]})

    prev_ts = 0
    predict_count = 0
    pbar = tqdm(desc=f"Evaluating {asset}", total=count_evaluations(history_price, HORIZON, INTERVAL), unit="eval")
    for ts, price in history_price:
        # Feed the new tick
        tracker_evaluator.tick({asset: [(ts, price)]})

        # Trigger a prediction every INTERVAL seconds (ts is in seconds)
        if ts - prev_ts >= INTERVAL:
            prev_ts = ts
            predictions_evaluated = tracker_evaluator.predict(asset, HORIZON, STEPS)

            # Quarantine mechanism:
            # - Predictions are not scored immediately. Each prediction is placed in a quarantine 
            #   until sufficient future price data (up to the full horizon ticks) becomes available.
            # - Predictions issued within the final `horizon` seconds of the
            #   time series cannot be scored, as future observations are unavailable.

            if predictions_evaluated:
                pbar.update(1)

            # Periodically display results
            if predictions_evaluated and predict_count % 20 == 0:
                pbar.write(
                    f"[{asset}] avg norm CRPS={tracker_evaluator.overall_crps_score_asset(asset):.4f} | "
                    f"recent={tracker_evaluator.recent_crps_score_asset(asset):.4f}"
                )
            predict_count += 1
    
    pbar.write(
            f"[{asset}] avg norm CRPS={tracker_evaluator.overall_crps_score_asset(asset):.4f} | "
            f"recent={tracker_evaluator.recent_crps_score_asset(asset):.4f}"
        )
    
    pbar.close()
    print()

tracker_name = tracker_evaluator.tracker.__class__.__name__
print(f"\nTracker {tracker_name}:"
      f"\nFinal average normalized crps score: {tracker_evaluator.overall_crps_score():.4f}")

current_results_dir = tracker_evaluator.to_json(horizon=HORIZON, steps=STEPS,
                                                interval=INTERVAL, base_dir=base_dir_results)

# Plot scoring timeline
timestamped_scores = tracker_evaluator.scores
print("\n(Note - Scores appear after quarantine: a score at time t evaluates a forecast issued at (t - horizon))")
plot_scores(timestamped_scores)

Evaluating BTC:  18%|█▊        | 17/96 [00:11<00:54,  1.45eval/s]

[BTC] avg norm CRPS=21.1432 | recent=21.1432


Evaluating BTC:  39%|███▊      | 37/96 [00:25<00:40,  1.46eval/s]

[BTC] avg norm CRPS=20.8594 | recent=20.8594


Evaluating BTC:  59%|█████▉    | 57/96 [00:39<00:29,  1.31eval/s]

[BTC] avg norm CRPS=21.0428 | recent=21.0428


Evaluating BTC:  80%|████████  | 77/96 [00:54<00:13,  1.37eval/s]

[BTC] avg norm CRPS=22.6941 | recent=22.6941


Evaluating BTC: 100%|██████████| 96/96 [01:07<00:00,  1.41eval/s]


[BTC] avg norm CRPS=24.9077 | recent=24.9077



Evaluating SOL:  18%|█▊        | 17/96 [00:12<00:56,  1.40eval/s]

[SOL] avg norm CRPS=23.5509 | recent=23.5509


Evaluating SOL:  39%|███▊      | 37/96 [00:25<00:42,  1.39eval/s]

[SOL] avg norm CRPS=23.0926 | recent=23.0926


Evaluating SOL:  59%|█████▉    | 57/96 [00:40<00:28,  1.36eval/s]

[SOL] avg norm CRPS=22.7689 | recent=22.7689


Evaluating SOL:  80%|████████  | 77/96 [00:54<00:13,  1.43eval/s]

[SOL] avg norm CRPS=23.3491 | recent=23.3491


Evaluating SOL: 100%|██████████| 96/96 [01:07<00:00,  1.41eval/s]


[SOL] avg norm CRPS=24.6447 | recent=24.6447


Tracker GaussianStepTracker:
Final average normalized crps score: 24.7762
[✔] Tracker results saved to results\2025-11-11T00-00-00_to_2025-11-14T23-00-00\GaussianStepTracker_h86400.json

(Note - Scores appear after quarantine: a score at time t evaluates a forecast issued at (t - horizon))
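
The quarantine behaviour described in the note above can be sketched with a simple queue: a forecast issued at time `t` only becomes scorable once prices up to `t + horizon` have arrived. The `Quarantine` class below is a hypothetical illustration, not the mechanism implemented in `TrackerEvaluator`.

```python
from collections import deque

class Quarantine:
    """Hold forecasts until `horizon` seconds of later prices are available."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.pending = deque()  # (issued_ts, forecast), oldest first

    def add(self, issued_ts, forecast):
        self.pending.append((issued_ts, forecast))

    def release(self, now_ts):
        """Pop and return every forecast whose full horizon has elapsed."""
        ready = []
        while self.pending and now_ts - self.pending[0][0] >= self.horizon:
            ready.append(self.pending.popleft())
        return ready

q = Quarantine(horizon=86400)
q.add(0, "forecast A")
q.add(3600, "forecast B")

released_early = q.release(now_ts=50000)  # horizon not yet elapsed for either
released_late = q.release(now_ts=86400)   # only the forecast issued at t=0
print(released_early, released_late)
```

Forecasts still pending when the price series ends are never scored, which is why the last `horizon` seconds of the test data produce no scores.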


In [None]:
## Density forecast over returns (for the last asset and last prediction)
plot_quarantine(asset, predictions_evaluated[0], step=STEPS[0], prices=tracker_evaluator.tracker.prices, mode="direct")

In [8]:
## Return forecast mapped into price space (for the last asset and last quarantine prediction)
print("Normalized CRPS score:", tracker_evaluator.scores[asset][-1][1])
plot_quarantine(asset, predictions_evaluated[0], step=STEPS[0], prices=tracker_evaluator.tracker.prices, mode="incremental", lookback_seconds=HORIZON/4)

Normalized CRPS score: 27.76472495941874
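
One way to picture the mapping from incremental returns to price space is to accumulate the per-step Gaussians: if the increments are assumed independent, their means and variances add up along the path. The sketch below builds approximate 90% price bands under that assumption; `cumulative_price_bands` is a hypothetical helper and may differ from what `plot_quarantine` does internally.

```python
import math

def cumulative_price_bands(last_price, increments, z=1.645):
    """Accumulate independent Gaussian increments into price-space bands.

    increments: list of (loc, scale) pairs, one per step.
    Returns a list of (mean, low, high) tuples; z=1.645 gives a ~90% band.
    """
    bands, mean, var = [], last_price, 0.0
    for loc, scale in increments:
        mean += loc          # means of independent increments add
        var += scale ** 2    # so do their variances
        half_width = z * math.sqrt(var)
        bands.append((mean, mean - half_width, mean + half_width))
    return bands

# Hypothetical: four identical increments starting from a price of 100.
bands = cumulative_price_bands(100.0, [(0.5, 2.0)] * 4)
for mean, low, high in bands:
    print(f"mean={mean:.2f}  band=[{low:.2f}, {high:.2f}]")
```

The band widens with the square root of the number of steps, giving the familiar cone shape of a forecast drawn in price space.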


# Tracker Comparison

In [9]:
from condorgame.examples.utils import load_all_results, plot_tracker_comparison

In [10]:
df_all = load_all_results(current_results_dir, horizon=HORIZON)
df_all

Directory: results\2025-11-11T00-00-00_to_2025-11-14T23-00-00\*h86400.json
[✔] Found 1 files:
   - GaussianStepTracker_h86400.json


| | tracker | asset | horizon | ts | score | time |
|---|---|---|---|---|---|---|
| 0 | GaussianStepTracker | BTC | 86400 | 1762819200 | 19.513205 | 2025-11-11 00:00:00+00:00 |
| 1 | GaussianStepTracker | BTC | 86400 | 1762822800 | 18.411285 | 2025-11-11 01:00:00+00:00 |
| 2 | GaussianStepTracker | BTC | 86400 | 1762826400 | 18.761005 | 2025-11-11 02:00:00+00:00 |
| 3 | GaussianStepTracker | BTC | 86400 | 1762830000 | 19.957813 | 2025-11-11 03:00:00+00:00 |
| 4 | GaussianStepTracker | BTC | 86400 | 1762833600 | 20.276069 | 2025-11-11 04:00:00+00:00 |
| ... | ... | ... | ... | ... | ... | ... |
| 187 | GaussianStepTracker | SOL | 86400 | 1763146800 | 29.329598 | 2025-11-14 19:00:00+00:00 |
| 188 | GaussianStepTracker | SOL | 86400 | 1763150400 | 29.595340 | 2025-11-14 20:00:00+00:00 |
| 189 | GaussianStepTracker | SOL | 86400 | 1763154000 | 28.696314 | 2025-11-14 21:00:00+00:00 |
| 190 | GaussianStepTracker | SOL | 86400 | 1763157600 | 28.808491 | 2025-11-14 22:00:00+00:00 |


In [11]:
# Tracker comparison all assets (A lower CRPS score reflects more accurate predictions)
# Scores appear after quarantine: a score at time t evaluates a forecast issued at (t - horizon)
plot_tracker_comparison(df_all)

In [12]:
plot_tracker_comparison(df_all, 'SOL')

In [13]:
plot_tracker_comparison(df_all, 'BTC')
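
Alongside the plots, a numeric ranking can be useful once several trackers have been evaluated. The sketch below averages scores per tracker and sorts them, lower being better. The tracker names and scores are hypothetical toy values, and `rank_trackers` is not part of `condorgame`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows shaped like the columns of `df_all` (toy values only).
rows = [
    {"tracker": "TrackerA", "asset": "BTC", "score": 20.0},
    {"tracker": "TrackerA", "asset": "SOL", "score": 24.0},
    {"tracker": "TrackerB", "asset": "BTC", "score": 18.0},
    {"tracker": "TrackerB", "asset": "SOL", "score": 25.0},
]

def rank_trackers(rows):
    """Return (mean_score, tracker) pairs sorted from best (lowest) to worst."""
    by_tracker = defaultdict(list)
    for row in rows:
        by_tracker[row["tracker"]].append(row["score"])
    return sorted((mean(scores), name) for name, scores in by_tracker.items())

ranking = rank_trackers(rows)
for avg, name in ranking:
    print(f"{name}: mean score = {avg:.2f}")
```

An overall mean can hide per-asset differences: in this toy data the tracker with the better average is the worse one on SOL, which is why the per-asset plots above remain informative.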