# Financial-IA — Latent Market Intelligence Demo

**End-to-end demo of the Strate IV PPO agent** operating in the latent space learned by Fin-JEPA.

This notebook:
1. Installs dependencies and clones the repo
2. Downloads pre-trained checkpoints (PPO agent + trajectory buffer)
3. Runs the agent on held-out evaluation episodes
4. Visualizes regime switching, position management, and PnL vs Buy & Hold

**No training required.** The notebook runs inference only (about 30 s on CPU, about 5 s on GPU).

---
> Architecture: Spherical VQ-VAE → Fin-JEPA (Mamba-2) → Stochastic Predictor → PPO Agent  
> Paper reference: LeCun (2022) *A Path Towards Autonomous Machine Intelligence* — JEPA framework

## 1. Setup

In [None]:
# Install dependencies
!pip install -q torch pytorch-lightning tslearn numpy pandas dacite pyyaml einops gymnasium stable-baselines3 matplotlib

In [None]:
import os

# Clone the repo (skip if already cloned, or if this cell is re-run from inside it)
REPO = 'World-IA-Finance'
if os.path.basename(os.getcwd()) != REPO:
    if not os.path.exists(REPO):
        !git clone https://github.com/ElMonstroDelBrest/World-IA-Finance.git
    os.chdir(REPO)

print('Working directory:', os.getcwd())

## 2. Download Pre-trained Checkpoints

In [None]:
import urllib.request
import zipfile
from pathlib import Path

BASE_URL = "https://github.com/ElMonstroDelBrest/World-IA-Finance/releases/download/v1.0.0"

def download(url, dest):
    dest = Path(dest)
    if dest.exists():
        print(f'  {dest} already exists, skipping.')
        return
    dest.parent.mkdir(parents=True, exist_ok=True)
    print(f'  Downloading {dest.name}...')
    urllib.request.urlretrieve(url, dest)
    print(f'  Done ({dest.stat().st_size / 1e6:.1f} MB)')

# PPO agent checkpoint
print('Downloading PPO agent...')
download(
    f"{BASE_URL}/ppo_strate_iv_1000000_steps.zip",
    "checkpoints/strate_iv/ppo_strate_iv_1000000_steps.zip"
)

# Pre-computed trajectory buffer (JEPA latent representations)
print('Downloading trajectory buffer...')
download(
    f"{BASE_URL}/trajectory_buffer.zip",
    "/tmp/trajectory_buffer.zip"
)

# Extract trajectory buffer
buf_dir = Path('data/trajectory_buffer')
if not buf_dir.exists() or not any(buf_dir.glob('*.pt')):
    print('Extracting trajectory buffer...')
    with zipfile.ZipFile('/tmp/trajectory_buffer.zip', 'r') as z:
        z.extractall('.')
    print(f'  Extracted {len(list(buf_dir.glob("*.pt")))} episodes')
else:
    print(f'  Buffer already extracted: {len(list(buf_dir.glob("*.pt")))} episodes')

print('\nAll assets ready.')
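
Because `download` skips any file that already exists, an interrupted transfer can leave a truncated archive on disk that is never re-fetched. A minimal sketch of an integrity check that could run before extraction, using only the standard library. The helper name `is_valid_zip` is hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path


def is_valid_zip(path):
    """Return True if `path` is a readable zip archive whose members pass their CRC checks."""
    path = Path(path)
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path, "r") as archive:
            # testzip() returns the name of the first corrupt member, or None if all are fine.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False
```

If the check fails, deleting the file and calling `download` again forces a fresh transfer.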

## 3. Run Agent Demo

The PPO agent operates on **latent observations**, not raw prices. At each step, it receives:
- The JEPA context encoding of past market regimes
- A distribution of N=16 stochastic future trajectories
- Its current position and cumulative PnL

It outputs a continuous action in `[-1, 1]` (short → flat → long).
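
As a rough illustration of this action convention, the sketch below clips a raw action to `[-1, 1]`, labels it short, flat, or long, and computes a one-step PnL as position times simple return. The helper names and the 0.1 dead zone are assumptions for illustration (the dead zone mirrors the shading threshold used in this notebook's plots); the environment's actual reward and cost model may differ.

```python
def describe_action(action, dead_zone=0.1):
    """Clip an action to [-1, 1] and label it as short, flat, or long."""
    a = max(-1.0, min(1.0, float(action)))
    if a > dead_zone:
        side = "long"
    elif a < -dead_zone:
        side = "short"
    else:
        side = "flat"
    return a, side


def step_pnl(position, price_prev, price_next):
    """PnL of holding `position` (fraction of capital, signed) over one price move."""
    return position * (price_next - price_prev) / price_prev


# Hypothetical example: a half-size short position while the price falls from 100 to 98.
position, side = describe_action(-0.5)
print(side, step_pnl(position, 100.0, 98.0))
```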

In [None]:
import sys
sys.path.insert(0, '.')

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from IPython.display import display, Image
from pathlib import Path
import gymnasium as gym

from stable_baselines3 import PPO
from src.strate_iv.config import load_config
from src.strate_iv.env import LatentCryptoEnv
from src.strate_iv.trajectory_buffer import TrajectoryBuffer

# Load config and buffer
config = load_config('configs/strate_iv.yaml')
buffer = TrajectoryBuffer('data/trajectory_buffer/')
_, eval_buffer = buffer.split(val_ratio=config.buffer.val_ratio)
print(f'Eval buffer: {len(eval_buffer)} episodes')

# Load PPO agent — detect expected obs dim from saved model
model_path = 'checkpoints/strate_iv/ppo_strate_iv_1000000_steps.zip'
model = PPO.load(model_path)
expected_obs_dim = model.observation_space.shape[0]
print(f'PPO agent loaded — expected obs dim: {expected_obs_dim}')

# Create environment
env = LatentCryptoEnv(buffer=eval_buffer, config=config.env)
actual_obs_dim = env.observation_space.shape[0]
print(f'Environment obs dim: {actual_obs_dim}')

# Compatibility shim: env was updated after training, so truncate (or zero-pad) obs to the trained size
if actual_obs_dim != expected_obs_dim:
    print(f'Obs dim mismatch ({actual_obs_dim} vs {expected_obs_dim}) — applying compatibility shim')
    class ObsCompatWrapper(gym.ObservationWrapper):
        def __init__(self, env, target_dim):
            super().__init__(env)
            self.target_dim = target_dim
            self.observation_space = gym.spaces.Box(
                low=-np.inf, high=np.inf, shape=(target_dim,), dtype=np.float32
            )
        def observation(self, obs):
            if len(obs) >= self.target_dim:
                return obs[:self.target_dim]
            return np.pad(obs, (0, self.target_dim - len(obs)))
    env = ObsCompatWrapper(env, expected_obs_dim)

print(f'Environment: {config.env.n_tgt} patches x {config.env.patch_len} candles = {config.env.n_tgt * config.env.patch_len}h window')
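
The shim above keeps the leading `target_dim` entries when the environment emits more features than the policy expects, and zero-pads otherwise. A list-based sketch of the same rule, useful for checking which entries are dropped. `fit_to_dim` is a hypothetical helper, not part of the repository.

```python
def fit_to_dim(obs, target_dim, fill=0.0):
    """Truncate or right-pad a flat observation so it has exactly `target_dim` entries."""
    obs = list(obs)
    if len(obs) >= target_dim:
        # Keep the leading entries; anything appended after them is discarded.
        return obs[:target_dim]
    return obs + [fill] * (target_dim - len(obs))


print(fit_to_dim([0.1, 0.2, 0.3, 0.4], 3))
print(fit_to_dim([0.1, 0.2], 4))
```

Truncation only preserves the trained feature layout if newer environment versions appended their extra features at the end of the observation vector. That is an assumption worth verifying against the environment code before trusting the results.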


In [None]:
def run_episode(model, env, seed=None):
    """Run one episode and collect trajectory data."""
    obs, info = env.reset(seed=seed)
    realized_idx = info['realized_future_idx']
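    # Go through `unwrapped`: Gymnasium wrappers (e.g. ObsCompatWrapper) do not forward private attributes
    entry = env.unwrapped._entry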
    realized = entry.future_ohlcv[realized_idx].numpy()  # (N_tgt, patch_len, 5)

    actions, positions, cum_pnls = [], [], []

    for _ in range(config.env.n_tgt):
        action, _ = model.predict(obs, deterministic=True)
        obs, _, terminated, truncated, step_info = env.step(action)
        actions.append(float(action[0]))
        positions.append(step_info['position'])
        cum_pnls.append(step_info['cumulative_pnl'])
        # Stop on either end-of-episode signal
        if terminated or truncated:
            break

    return {
        'realized_ohlcv': realized,
        'actions': np.array(actions),
        'positions': np.array(positions),
        'cum_pnls': np.array(cum_pnls),
        'last_close': entry.last_close,
        'sigma_close': entry.revin_stds[0, 3].item(),
    }


def plot_episode(traj, output_path, title_suffix=''):
    """3-panel visualization: price + position + PnL vs B&H."""
    realized = traj['realized_ohlcv']
    n_tgt, patch_len, _ = realized.shape

    close_flat = realized[:, :, 3].flatten()
    time_x = np.arange(len(close_flat))
    patch_mids = np.arange(n_tgt) * patch_len + patch_len // 2
    patch_edges = np.arange(n_tgt + 1) * patch_len

    bh_returns = (close_flat - close_flat[0]) / (close_flat[0] + 1e-8)

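    # Reconstruct agent PnL in price space as a running sum of action-weighted
    # simple returns (additive, not compounded; independent of the env's own cum_pnls)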
    agent_pnl = np.zeros(len(close_flat))
    cum = 0.0
    for t in range(n_tgt):
        a = traj['actions'][t]
        for c in range(patch_len):
            idx = t * patch_len + c
            if idx > 0:
                ret = (close_flat[idx] - close_flat[idx-1]) / (close_flat[idx-1] + 1e-8)
                cum += a * ret
            agent_pnl[idx] = cum

    fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True,
                              gridspec_kw={'height_ratios': [3, 1.5, 2]})
    fig.patch.set_facecolor('#0d1117')
    for ax in axes:
        ax.set_facecolor('#161b22')
        ax.tick_params(colors='#8b949e')
        ax.spines[:].set_color('#30363d')
        ax.yaxis.label.set_color('#8b949e')
        ax.xaxis.label.set_color('#8b949e')

    # Panel 1: Price
    ax1 = axes[0]
    ax1.plot(time_x, close_flat, color='#58a6ff', linewidth=1.2, label='Close')
    for t in range(n_tgt):
        a = traj['actions'][t]
        color = '#238636' if a > 0.1 else '#da3633' if a < -0.1 else '#6e7681'
        alpha = min(0.35, abs(a) * 0.35) if abs(a) > 0.1 else 0.05
        ax1.axvspan(patch_edges[t], patch_edges[t+1], color=color, alpha=alpha)
    ax1.set_ylabel('Close Price')
    ax1.set_title(
        f'Strate IV — Latent Regime RL Agent{title_suffix}   |   '
        f'σ_close = {traj["sigma_close"]:.4f}   |   '
        f'Last context close = {traj["last_close"]:.4f}',
        color='#e6edf3', fontsize=11
    )
    ax1.legend(facecolor='#161b22', labelcolor='#e6edf3')
    ax1.grid(True, alpha=0.2, color='#30363d')

    # Panel 2: Position
    ax2 = axes[1]
    colors = ['#238636' if a > 0 else '#da3633' if a < 0 else '#6e7681'
              for a in traj['actions']]
    ax2.bar(patch_mids, traj['actions'], width=patch_len * 0.8,
            color=colors, alpha=0.85, edgecolor='#30363d', linewidth=0.5)
    ax2.axhline(0, color='#8b949e', linewidth=0.8)
    ax2.set_ylabel('Position')
    ax2.set_ylim(-1.2, 1.2)
    ax2.set_yticks([-1, -0.5, 0, 0.5, 1])
    ax2.grid(True, alpha=0.2, color='#30363d')

    # Panel 3: PnL
    ax3 = axes[2]
    ax3.plot(time_x, agent_pnl * 100, color='#3fb950',
             linewidth=2.2, label='Agent (Latent RL)')
    ax3.plot(time_x, bh_returns * 100, color='#8b949e',
             linewidth=1.5, linestyle='--', label='Buy & Hold')
    ax3.axhline(0, color='#6e7681', linewidth=0.6)
    ax3.set_ylabel('Cumulative Return (%)')
    ax3.set_xlabel(f'Candles (1h)  |  {n_tgt} patches × {patch_len} candles = {n_tgt * patch_len}h window')
    ax3.legend(facecolor='#161b22', labelcolor='#e6edf3')
    ax3.grid(True, alpha=0.2, color='#30363d')

    agent_final = agent_pnl[-1] * 100
    bh_final = bh_returns[-1] * 100
    alpha_val = agent_final - bh_final
    color_box = '#1a4731' if alpha_val >= 0 else '#4a1515'
    ax3.annotate(
        f'Agent:  {agent_final:+.2f}%\nB&H:    {bh_final:+.2f}%\nAlpha: {alpha_val:+.2f}%',
        xy=(0.98, 0.95), xycoords='axes fraction', ha='right', va='top',
        fontsize=10, fontfamily='monospace', color='#e6edf3',
        bbox=dict(boxstyle='round,pad=0.5', facecolor=color_box, edgecolor='#30363d')
    )

    plt.tight_layout()
    plt.savefig(output_path, dpi=150, bbox_inches='tight', facecolor=fig.get_facecolor())
    plt.close(fig)
    return agent_final, bh_final, alpha_val


# Run 5 episodes
Path('outputs/demo').mkdir(parents=True, exist_ok=True)
print('Running 5 evaluation episodes...\n')
results = []
for i in range(5):
    traj = run_episode(model, env, seed=i)
    out = f'outputs/demo/episode_{i:02d}.png'
    agent_r, bh_r, alpha = plot_episode(traj, out, title_suffix=f'  —  Episode {i+1}/5')
    results.append((agent_r, bh_r, alpha))
    print(f'  Episode {i+1}: Agent {agent_r:+.2f}% | B&H {bh_r:+.2f}% | Alpha {alpha:+.2f}%')

print(f'\nMean alpha over 5 episodes: {np.mean([r[2] for r in results]):+.2f}%')
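
`plot_episode` builds the agent curve as a running sum of action-weighted simple returns, while the Buy & Hold curve is the total return relative to the first close. The two conventions agree only to first order, so even a constant action of 1.0 would not reproduce the Buy & Hold line exactly. A small standard-library sketch with made-up prices shows the gap; the function names are hypothetical.

```python
def summed_returns(closes, action=1.0):
    """Running-sum convention: add up action-weighted simple returns."""
    total = 0.0
    for prev, cur in zip(closes, closes[1:]):
        total += action * (cur - prev) / prev
    return total


def total_return(closes):
    """Buy & Hold convention: change relative to the first close."""
    return (closes[-1] - closes[0]) / closes[0]


closes = [100.0, 110.0, 99.0, 108.9]  # hypothetical prices, not market data
additive = summed_returns(closes)
compounded = total_return(closes)
print(f"sum of returns: {additive:+.4f} | total return: {compounded:+.4f}")
```

For short windows and small price moves the difference is minor, but it is worth keeping in mind when reading the Alpha figure.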

## 4. Results

In [None]:
from IPython.display import display, Image

for i in range(5):
    agent_r, bh_r, alpha = results[i]
    print(f'\n--- Episode {i+1} | Agent: {agent_r:+.2f}% | B&H: {bh_r:+.2f}% | Alpha: {alpha:+.2f}% ---')
    display(Image(filename=f'outputs/demo/episode_{i:02d}.png', width=900))
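
Five episodes are a small sample, so the mean alpha alone says little about consistency. A sketch of a slightly fuller summary over `(agent, B&H, alpha)` tuples shaped like those stored in `results`. The `summarize` helper is hypothetical and the numbers below are placeholders, not outputs of the agent.

```python
import statistics


def summarize(results):
    """Mean, sample standard deviation, and hit rate of per-episode alpha."""
    alphas = [alpha for _, _, alpha in results]
    return {
        "mean_alpha": statistics.mean(alphas),
        "stdev_alpha": statistics.stdev(alphas) if len(alphas) > 1 else 0.0,
        "hit_rate": sum(a > 0 for a in alphas) / len(alphas),
    }


placeholder = [(1.0, 0.5, 0.5), (-0.2, 0.3, -0.5), (0.8, 0.2, 0.6)]
summary = summarize(placeholder)
print(summary)
```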

## 5. Summary

### What you just saw

The agent **never sees raw prices** during inference. Apart from its own position and cumulative PnL, every input is a latent representation produced by Fin-JEPA:

| Component | Role |
|---|---|
| **Spherical VQ-VAE** (Strate I) | Tokenizes OHLCV patches → discrete market regime tokens |
| **Fin-JEPA + Mamba-2** (Strate II) | Self-supervised temporal model over token sequences |
| **Stochastic Predictor** (Strate III) | Samples N=16 divergent future latent trajectories |
| **PPO Agent** (Strate IV) | Plans in latent space, outputs continuous position [-1, 1] |
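
To make the first row more concrete: one common reading of "spherical" vector quantization is that embeddings and codebook vectors are normalized to unit length, and each embedding is assigned to the code with the highest cosine similarity. The toy sketch below illustrates that general idea in plain Python. It is an assumption about the technique, not the repository's Strate I implementation, and all names and values are hypothetical.

```python
import math


def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def quantize(embedding, codebook):
    """Return the index of the codebook vector with the highest cosine similarity."""
    e = normalize(embedding)
    sims = [sum(a * b for a, b in zip(e, normalize(code))) for code in codebook]
    return max(range(len(codebook)), key=lambda i: sims[i])


# Toy 2-D codebook with three "regime" codes.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
token = quantize([0.2, 3.0], codebook)
print(token)
```

Only the direction of the embedding matters under this scheme, so patches that differ in scale but share a shape map to the same token.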

### Why this matters

Classical approaches (LSTM, Transformer on raw prices) are trained to predict **every tick**, which encourages fitting noise rather than learning structure.  
JEPA instead learns to predict **latent representations** of future states, which lets it ignore unpredictable details.  
The agent then plans in this cleaner latent space. The position changes in the plots above are meant to illustrate **regime switching**, although five episodes are too few to rule out curve-fitting.

---

**Repository:** https://github.com/ElMonstroDelBrest/World-IA-Finance  
**License:** AGPL-3.0