# GPU Black-Scholes Trainer with SpectralMC
This interactive Jupyter notebook demonstrates how to:
1. Define a **Black-Scholes** simulation via `spectralmc.gbm`.
2. Construct a **CVNN** from `spectralmc.cvnn`.
3. Train via the **GbmTrainer** in `spectralmc.gbm_trainer`.
4. Track and visualize the training progress with **TensorBoard**.
5. Predict final prices on sample inputs.

Make sure you have:
- A CUDA-capable GPU.
- The `spectralmc` package installed (including the `sobol_sampler.py`, `gbm.py`, `cvnn.py`, and `gbm_trainer.py` modules).
- The `tensorboard` Python package installed.

We will do a brief training run, then use TensorBoard to visualize the loss.

> **Note**: A large number of Monte Carlo paths can require significant GPU memory. If your GPU is memory-limited, reduce `batches_per_mc_run` or `network_size`.
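
To get a feel for how these two parameters drive memory use, here is a rough back-of-the-envelope sketch. It assumes the simulator keeps one floating-point value per path per timestep, which this notebook does not confirm, so treat the result as an order-of-magnitude guide only. `estimate_path_memory` is a hypothetical helper, not part of `spectralmc`.

```python
# Bytes per value for the two dtypes the notebook mentions.
BYTES_PER_VALUE = {"float32": 4, "float64": 8}


def estimate_path_memory(timesteps, network_size, batches_per_mc_run, dtype="float32"):
    """Return (total_paths, bytes), assuming one stored value per path per timestep."""
    total_paths = network_size * batches_per_mc_run
    total_bytes = timesteps * total_paths * BYTES_PER_VALUE[dtype]
    return total_paths, total_bytes


# Values matching the configuration used later in this notebook.
paths, nbytes = estimate_path_memory(16, 8, 256, "float32")
print(f"{paths} paths, about {nbytes / 1024:.0f} KiB")
```

Doubling either `network_size` or `batches_per_mc_run` doubles both numbers, and switching to `float64` doubles the byte count again.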

In [None]:
import datetime
import torch
import cupy as cp  # type: ignore[import-untyped]
from torch.utils.tensorboard import SummaryWriter
from spectralmc.gbm import SimulationParams, BlackScholes
from spectralmc.cvnn import CVNN
from spectralmc.gbm_trainer import GbmTrainer, _inputs_to_real_imag
from spectralmc.sobol_sampler import BoundSpec
print("PyTorch version:", torch.__version__)
print("CUDA available?:", torch.cuda.is_available())

# If TensorBoard is missing, install it to enable in-notebook visualization:
#   pip install tensorboard

## Set up TensorBoard
We'll create a new log directory each run to keep logs separate.
You can adjust the path as you wish.
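
If you want to create the run directory explicitly (for example, to store other artifacts next to the logs), the following is a minimal sketch using only the standard library. `make_run_dir` is a hypothetical helper, and a temporary directory stands in for the real log root so the example can run anywhere.

```python
import datetime
import tempfile
from pathlib import Path


def make_run_dir(root, prefix="tb_run_"):
    """Create and return a timestamped run directory under `root`."""
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = Path(root) / f"{prefix}{stamp}"
    run_dir.mkdir(parents=True, exist_ok=True)  # also creates `root` if missing
    return run_dir


# Demo: a temporary directory plays the role of the log root.
demo_root = Path(tempfile.mkdtemp())
run_dir = make_run_dir(demo_root)
print(run_dir)
```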

In [None]:
log_dir = "./.logs/tb_run_" + datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
writer = SummaryWriter(log_dir=log_dir)
print(f"TensorBoard log directory: {log_dir}")

## Define Training Configuration
We'll pick a small Monte Carlo configuration so the notebook runs quickly. If you have plenty of GPU memory, you can increase `batches_per_mc_run` or `network_size`.

We also define the sampling domain for `(X0, K, T, r, d, v)` and create our **CVNN** with 16 hidden features and one residual block. Finally, we instantiate the **GbmTrainer**.
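
The domain bounds are what turn quasi-random points from the unit cube into pricing inputs. How `spectralmc`'s sampler does this internally is not shown in this notebook. The sketch below illustrates the standard affine scaling `lower + u * (upper - lower)` with plain tuples in place of `BoundSpec`; `scale_unit_point` is a hypothetical helper.

```python
# (lower, upper) pairs mirroring the bounds used in this notebook.
DOMAIN = {
    "X0": (50.0, 150.0),
    "K": (50.0, 150.0),
    "T": (0.1, 2.0),
    "r": (0.0, 0.1),
    "d": (0.0, 0.05),
    "v": (0.1, 0.5),
}


def scale_unit_point(u, domain):
    """Map a point u in [0, 1)^n onto the box described by `domain`."""
    if len(u) != len(domain):
        raise ValueError("point dimension must match the number of bounds")
    return {
        name: lo + x * (hi - lo)
        for x, (name, (lo, hi)) in zip(u, domain.items())
    }


# The centre of the unit cube maps to the midpoint of every range.
midpoint = scale_unit_point([0.5] * 6, DOMAIN)
print(midpoint)
```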

In [None]:
sim_params = SimulationParams(
    timesteps=16,
    network_size=8,
    batches_per_mc_run=256,  # total_paths = 8 * 256 = 2048
    threads_per_block=128,
    mc_seed=123,
    buffer_size=1,
    dtype="float32",  # can switch to "float64" if desired
    simulate_log_return=True,
    normalize_forwards=False,
)

domain_example = {
    "X0": BoundSpec(lower=50.0, upper=150.0),
    "K":  BoundSpec(lower=50.0, upper=150.0),
    "T":  BoundSpec(lower=0.1,  upper=2.0),
    "r":  BoundSpec(lower=0.0,  upper=0.1),
    "d":  BoundSpec(lower=0.0,  upper=0.05),
    "v":  BoundSpec(lower=0.1,  upper=0.5),
}

cvnn_net = CVNN(
    input_features=6,
    output_features=sim_params.network_size,
    hidden_features=16,
    num_residual_blocks=1,
)

trainer = GbmTrainer(
    sim_params=sim_params,
    domain_bounds=domain_example,
    skip_sobol=0,
    sobol_seed=42,
    cvnn=cvnn_net,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

## Train and Log the Loss to TensorBoard
We'll do a short run of 30 batches, each with 8 Sobol points. The loss is written to the TensorBoard logs at every step and printed every 10 steps.

To view TensorBoard inside Jupyter, run `%load_ext tensorboard` and then `%tensorboard --logdir=./.logs`.
If that doesn't work, run `tensorboard --logdir=./.logs` in a terminal.
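
The training loop below builds its target by reshaping each Monte Carlo payoff vector to `(batches_per_mc_run, network_size)`, taking a DFT along each row, and averaging the spectra over the rows. The sketch below reproduces that construction on made-up payoffs with a plain-Python DFT that uses the same sign convention as `cp.fft.fft`. `dft` and `mean_dft_target` are illustrative helpers, not part of `spectralmc`. By linearity, the 0-frequency bin of the averaged spectrum equals `network_size` times the overall mean payoff.

```python
import cmath


def dft(row):
    """Unnormalized forward DFT: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(row)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(row))
        for k in range(n)
    ]


def mean_dft_target(payoffs, batches, network_size):
    """Row-major reshape to (batches, network_size), DFT each row, average over rows."""
    rows = [payoffs[b * network_size:(b + 1) * network_size] for b in range(batches)]
    spectra = [dft(row) for row in rows]
    return [sum(s[k] for s in spectra) / batches for k in range(network_size)]


# Made-up payoffs: 3 batches of 4 values each.
payoffs = [float(i % 5) for i in range(12)]
target = mean_dft_target(payoffs, batches=3, network_size=4)
overall_mean = sum(payoffs) / len(payoffs)
print(target[0].real / 4, overall_mean)
```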

In [None]:
# We'll define a train_and_log function that uses the trainer.

def train_and_log(
    trainer: GbmTrainer,
    num_batches: int = 30,
    batch_size: int = 8,
    learning_rate: float = 1e-3,
    writer: SummaryWriter = writer,
) -> None:
    """Train the CVNN for a few batches, logging loss to TensorBoard."""
    import torch.nn.functional as F

    sim_params = trainer.sim_params
    # Decide complex dtype from sim_params
    cupy_complex_dtype = cp.complex64 if sim_params.dtype == "float32" else cp.complex128
    torch_complex_dtype = torch.complex64 if sim_params.dtype == "float32" else torch.complex128
    torch_real_dtype = torch.float32 if sim_params.dtype == "float32" else torch.float64

    trainer.cvnn.train()
    optimizer = torch.optim.Adam(trainer.cvnn.parameters(), lr=learning_rate)

    for step in range(1, num_batches + 1):
        sobol_points = trainer.sampler.sample(batch_size)

        payoff_fft_cp = cp.zeros(
            (batch_size, sim_params.network_size), dtype=cupy_complex_dtype
        )

        for i, bs_input in enumerate(sobol_points):
            pr = trainer.bsm_engine.price(inputs=bs_input)
            put_price_cp = pr.put_price
            put_mat = put_price_cp.reshape(
                (sim_params.batches_per_mc_run, sim_params.network_size)
            )
            put_fft = cp.fft.fft(put_mat, axis=1)
            payoff_mean_fft = cp.mean(put_fft, axis=0)
            payoff_fft_cp[i, :] = payoff_mean_fft

        # Convert CuPy->Torch
        dlpack_capsule = payoff_fft_cp.toDlpack()
        payoff_fft_torch = torch.utils.dlpack.from_dlpack(dlpack_capsule)
        payoff_fft_torch = payoff_fft_torch.to(torch_complex_dtype)

        target_real = payoff_fft_torch.real
        target_imag = payoff_fft_torch.imag

        real_in, imag_in = _inputs_to_real_imag(
            sobol_points, dtype=torch_real_dtype, device=trainer.device
        )

        pred_r, pred_i = trainer.cvnn(real_in, imag_in)
        loss_r = F.mse_loss(pred_r, target_real)
        loss_i = F.mse_loss(pred_i, target_imag)
        loss_val = loss_r + loss_i

        optimizer.zero_grad()
        loss_val.backward()  # type: ignore[no-untyped-call]
        optimizer.step()

        # Write to TensorBoard
        writer.add_scalar("Loss/train", loss_val.item(), step)

        if step % 10 == 0 or step == num_batches:
            print(f"[TRAIN] step={step}/{num_batches}, loss={loss_val.item():.6f}")


In [None]:
# Short training run: 30 steps with batch_size=8.
train_and_log(trainer, num_batches=30, batch_size=8, learning_rate=1e-3)
writer.flush()  # make sure all logged scalars are on disk before opening TensorBoard

### Launch TensorBoard (Optional)
If you're in Jupyter, you can try:
```
%load_ext tensorboard
%tensorboard --logdir=./.logs
```
If that doesn't work, you can run:
```
tensorboard --logdir=./.logs
```
in a separate terminal, then open the displayed link in your browser.

In [None]:
# Uncomment these lines to show TensorBoard inline (this does not work well in every Jupyter environment):
# %load_ext tensorboard
# %tensorboard --logdir=./.logs

## Predict on sample inputs
We'll now run inference on two custom `(X0, K, T, r, d, v)` inputs. The `predict_price` method
applies an inverse FFT to the network's DFT output and interprets the 0-frequency component as the put price.
If the result has a non-trivial imaginary component, it raises an error.
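
To illustrate the kind of check described here, the sketch below inverts a spectrum and rejects it if the imaginary part is not negligible. It is not the `predict_price` implementation: `idft`, `real_part_or_raise`, and the tolerance value are assumptions made for the example.

```python
import cmath


def idft(spectrum):
    """Inverse DFT with 1/n normalization: x[t] = (1/n) * sum_k X[k] * exp(+2*pi*i*k*t/n)."""
    n = len(spectrum)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(spectrum)) / n
        for t in range(n)
    ]


def real_part_or_raise(spectrum, tol=1e-6):
    """Invert `spectrum` and return the real parts, or raise if any imaginary part exceeds `tol`."""
    values = idft(spectrum)
    worst = max(abs(v.imag) for v in values)
    if worst > tol:
        raise ValueError(f"imaginary part too large: {worst:.3g} > {tol:.3g}")
    return [v.real for v in values]


# The DFT of the real signal [1, 2, 3, 4] is conjugate-symmetric, so it passes.
recovered = real_part_or_raise([10, -2 + 2j, -2, -2 - 2j])
print(recovered)
```

A spectrum that breaks the conjugate symmetry, such as `[10, -2 + 2j, -2, -2 + 2j]`, does not correspond to a real signal and makes `real_part_or_raise` raise `ValueError`.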

In [None]:
sample_inputs = [
    BlackScholes.Inputs(X0=100.0, K=100.0, T=1.0, r=0.02, d=0.01, v=0.2),
    BlackScholes.Inputs(X0=110.0, K=105.0, T=0.5, r=0.03, d=0.0, v=0.3)
]
results = trainer.predict_price(sample_inputs)
for idx, res in enumerate(results):
    print(f"Sample {idx}: put={res.put_price:.4f}, call={res.call_price:.4f}, "
          f"underlying={res.underlying:.4f}, put_convexity={res.put_convexity:.4f}")


## Conclusions
- We have set up a short training loop for GPU-based MC via `BlackScholes`.
- We used `CVNN` to learn the payoff DFT, logging the training progress to TensorBoard.
- We can visualize the training progress by opening TensorBoard and checking the `Loss/train` plot.

If `predict_price` reports a large imaginary part, the network's output spectrum does not yet correspond to a real-valued signal, which may mean the model needs more training.
Feel free to tweak the hyperparameters (such as `learning_rate`, `timesteps`, or `network_size`) to get better results.
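
One way to judge whether a change actually improves results is to compare predictions against the closed-form Black-Scholes price. The sketch below assumes European options, with `d` a continuous dividend yield and `v` the annualized volatility. The notebook does not state these conventions explicitly, so check them against `spectralmc` before relying on the comparison. `norm_cdf` and `bs_put_call` are hypothetical helpers.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_put_call(X0, K, T, r, d, v):
    """Closed-form European put and call prices with continuous dividend yield d."""
    sqrt_t = math.sqrt(T)
    d1 = (math.log(X0 / K) + (r - d + 0.5 * v * v) * T) / (v * sqrt_t)
    d2 = d1 - v * sqrt_t
    disc_spot = X0 * math.exp(-d * T)    # spot discounted at the dividend yield
    disc_strike = K * math.exp(-r * T)   # strike discounted at the risk-free rate
    put = disc_strike * norm_cdf(-d2) - disc_spot * norm_cdf(-d1)
    call = disc_spot * norm_cdf(d1) - disc_strike * norm_cdf(d2)
    return put, call


# First sample input from the prediction cell.
put, call = bs_put_call(X0=100.0, K=100.0, T=1.0, r=0.02, d=0.01, v=0.2)
print(f"put={put:.4f}, call={call:.4f}")
```

With only 30 training steps the network's predictions should not be expected to match these values closely; the comparison is mainly useful for tracking whether longer training moves them in the right direction.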