# TCN Architecture Types Guide (Project-Aligned)

This notebook explains the TCN architecture variants implemented in this project, how they process data, and why you would pick one over another. It is written to be both technical and readable for non-specialists.

## 1) What is implemented right now

From the current codebase, the TCN family includes:

- `TCN` (baseline temporal convolution model)
- `TCN_ATTENTION` (TCN + self-attention on the time axis)
- `TCN_FUSION` (hierarchical fusion: per-asset temporal encoding + cross-asset attention + global context + gating)

There is also a compatibility path where `actor_critic_type='TCN'` can route to fusion or attention based on `use_fusion` or `use_attention` flags.
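
The compatibility path can be pictured as a small dispatch step. The sketch below is illustrative only: `resolve_tcn_variant` is a hypothetical helper, and the precedence it applies when both flags are set (fusion before attention) is an assumption rather than something confirmed by the project code.

```python
def resolve_tcn_variant(agent_params: dict) -> str:
    """Return the concrete TCN variant implied by a set of agent parameters.

    Hypothetical helper. The fusion-before-attention precedence is an
    assumption made for this sketch.
    """
    requested = agent_params.get("actor_critic_type", "TCN")
    if requested != "TCN":
        # Explicit variant names are passed through unchanged.
        return requested
    if agent_params.get("use_fusion", False):
        return "TCN_FUSION"
    if agent_params.get("use_attention", False):
        return "TCN_ATTENTION"
    return "TCN"


print(resolve_tcn_variant({"actor_critic_type": "TCN", "use_attention": True}))
```

If the real router resolves conflicting flags differently, only the order of the two `if` checks needs to change.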

## 2) Shared input data pipeline (all TCN variants)

All TCN variants consume the same prepared feature stream, then differ only in model internals.

Core groups in the pipeline include:
- Technical indicators and return transforms
- Dynamic covariance signals
- Fundamental features (daily-aligned from quarterly reports)
- Regime and macro features
- Actuarial features
- Cross-sectional and quant-alpha-style features

After feature construction, normalization is applied on the configured feature list. Sequence models then consume windows of shape:

\[ X \in \mathbb{R}^{B \times T \times F} \]
where:
- \(B\): batch size
- \(T\): sequence length (lookback window)
- \(F\): total number of features per timestep
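
To make the shape concrete, here is a minimal sketch of how a feature table could be sliced into lookback windows. It uses plain Python lists so it runs without dependencies. `make_windows` is a hypothetical name, and taking every overlapping window as the batch is an assumption: the project may sample or batch windows differently.

```python
def make_windows(series, seq_len):
    """Slice a [num_steps][F] feature table into overlapping [T][F] windows."""
    if seq_len > len(series):
        raise ValueError("seq_len exceeds the number of timesteps")
    return [series[start:start + seq_len]
            for start in range(len(series) - seq_len + 1)]


# Toy data: 6 timesteps, F = 2 features per timestep.
features = [[float(t), float(10 * t)] for t in range(6)]
windows = make_windows(features, seq_len=4)   # T = 4

B, T, F = len(windows), len(windows[0]), len(windows[0][0])
print(B, T, F)
```

Each window ends at the most recent timestep it contains, so no window includes information from after its own last row.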

## 3) Common actor output: Dirichlet concentration parameters

All TCN actors output logits, then transform logits into positive Dirichlet concentration values \(\alpha\).

General form:
\[ \alpha = g(\text{logits}/\tau) + \epsilon \]
where:
- \(g(\cdot)\) is the selected activation (`elu`, `softplus_shift`, `swish`, `mish`, `exp_clip`, or default softplus)
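- \(\tau\) is an optional logit temperature
- \(\epsilon\) is the adaptive Dirichlet floor (annealed during training)

With a non-negative \(g\) (for example softplus) and \(\epsilon > 0\), every \(\alpha\) is strictly positive, which is what a valid Dirichlet distribution over portfolio weights requires. Activations whose raw range includes negative values (ELU, Swish, Mish) need a shift or clamp before the floor is added for this to hold.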

Plain language: the network does not output weights directly. It outputs confidence parameters for each asset, and those parameters define how concentrated or diversified the sampled weights are.
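
The mapping from logits to sampled weights can be sketched in a few lines. This is a simplified illustration using the default softplus activation and a fixed floor. The function names, the example logits, and the floor value are assumptions, and the Dirichlet draw uses the standard construction of normalizing independent Gamma samples.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dirichlet_alpha(logits, temperature=1.0, floor=1e-3):
    """alpha = softplus(logits / temperature) + floor, strictly positive."""
    return [softplus(z / temperature) + floor for z in logits]


def sample_weights(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
alpha = dirichlet_alpha([2.0, 0.0, -1.0], temperature=1.0, floor=1e-3)
weights = sample_weights(alpha, rng)
print(alpha, weights)
```

Larger \(\alpha\) values concentrate samples around the mean allocation, while values below 1 push samples toward a few assets.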

## 4) Baseline TCN (`TCN`)

### Core idea
Use stacked **dilated causal convolutions** with residual connections to learn temporal patterns efficiently.

A simplified block is:
\[ h^{(l)} = \sigma\left(\text{Conv}_{d_l}(\text{Conv}_{d_l}(h^{(l-1)})) + R(h^{(l-1)})\right) \]
- \(d_l\): dilation at block \(l\)
- \(R\): residual projection when dimensions differ
- \(\sigma\): ReLU in this implementation

Then a global average over time produces one vector for allocation output.
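
The block equation and the pooling step can be illustrated on a single channel. The sketch below is a toy version under stated assumptions: one channel (so the residual \(R\) is the identity), a fixed two-tap kernel instead of learned weights, and no normalization or dropout. It follows the simplified equation above, not the project's actual layer code.

```python
def causal_dilated_conv(x, kernel, dilation):
    """1-D causal convolution: out[t] uses only x[t], x[t-d], x[t-2d], ..."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:              # implicit zero-padding on the left
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(values):
    return [max(v, 0.0) for v in values]


def tcn_block(x, kernel, dilation):
    """Two stacked causal convolutions plus an identity residual, then ReLU."""
    h = causal_dilated_conv(x, kernel, dilation)
    h = causal_dilated_conv(h, kernel, dilation)
    return relu([a + b for a, b in zip(h, x)])


signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # impulse at t = 2
h = signal
for d in (1, 2):                                     # illustrative dilation stack
    h = tcn_block(h, kernel=[0.5, 0.5], dilation=d)

pooled = sum(h) / len(h)                             # global average over time
print(h, pooled)
```

Because every convolution is causal, the outputs before the impulse stay at zero: information only flows forward in time.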

### Receptive field intuition
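Temporal reach grows with kernel size and dilation. With kernel size \(k\) and dilation stack \((d_1,\dots,d_L)\), a common approximation that counts one convolution per block is:
\[ RF \approx 1 + (k-1)\sum_{l=1}^{L} d_l \]
Each causal convolution with kernel size \(k\) and dilation \(d\) extends the reach by \((k-1)d\) steps, so with two convolutions per block, as in the simplified block above, the reach is closer to \(1 + 2(k-1)\sum_{l=1}^{L} d_l\).
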
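
A quick way to sanity-check a lookback window against the formula is to compute it directly. The kernel size and dilations below are illustrative values, not the project's configuration, and `convs_per_block` is a hypothetical parameter added because the simplified block applies two convolutions per block.

```python
def receptive_field(kernel_size, dilations, convs_per_block=1):
    """Timesteps visible to the last output of a stack of causal convolutions."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


print(receptive_field(3, [1, 2, 4, 8]))                     # 31
print(receptive_field(3, [1, 2, 4, 8], convs_per_block=2))  # 61
```

If the receptive field is much shorter than the sequence length \(T\), the earliest timesteps in each window cannot influence the final convolutional output directly.
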
### Pros
- Fast and parallel compared with recurrent models
- Stable training due to residual design
- Strong baseline for medium-horizon pattern extraction

### Cons
- No explicit mechanism to reweight important timestamps globally
- Can under-emphasize sparse but critical events

## 5) TCN + Attention (`TCN_ATTENTION`)

### Core idea
Run TCN blocks first, then apply multi-head self-attention over time to let the model focus on key timesteps.

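Attention equations, where \(X\) here denotes the sequence of TCN block outputs rather than the raw input window:
\[ Q = XW_Q,\; K = XW_K,\; V = XW_V \]
\[ \text{Attention}(Q,K,V) = \text{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V \]
The heads are computed in parallel, concatenated, and passed through an output projection.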
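
The attention step can be sketched for a single head in plain Python. This is a simplified illustration: one head, identity projection matrices in place of learned weights, and no masking, all of which are assumptions made for readability. The project's multi-head layer will differ in detail.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    m = max(row)                                  # subtract max for stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over T timesteps."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]          # transpose K
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]    # [T][T], each row sums to 1
    return matmul(weights, v), weights


# Toy sequence: T = 3 timesteps, d = 2 channels, identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
context, weights = self_attention(x, identity, identity, identity)
print(weights)
```

Row \(t\) of `weights` shows how much timestep \(t\) draws on every other timestep, which is the "which timesteps matter" signal described in this section.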

### What changes vs baseline
- Baseline TCN asks: "what local-to-midrange motifs exist?"
- TCN+Attention adds: "which timesteps matter most for the decision?"

### Pros
- Better at emphasizing eventful periods
- More expressive than pure convolutional pooling

### Cons
- More compute and memory than baseline TCN
- Can overfit faster if regularization is weak

## 6) TCN Fusion (`TCN_FUSION`)

### Core idea
This is a structured fusion architecture that splits the signal into an asset-level pathway and a portfolio-level pathway, then recombines them. It has four components:

1. **Per-asset temporal encoder** (shared TCN blocks)
2. **Cross-asset attention** to model relationships between assets
3. **Global context branch** from the whole state
4. **Learnable gate** that mixes asset-context and global-context

Gated fusion form:
\[ g = \sigma(W_g[a \| c]) \]
\[ z = g \odot a + (1-g) \odot c \]
- \(a\): asset-context embedding
- \(c\): global-context embedding
- \(z\): fused representation sent to output head
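
The gate can be checked numerically with a tiny example. The sketch below follows the two equations above directly. The embeddings and the all-zero gate weights are made-up values chosen so the result is easy to verify by hand, and `gated_fusion` is a hypothetical name.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(a, c, w_g):
    """z = g * a + (1 - g) * c, with g = sigmoid(W_g [a || c])."""
    concat = a + c                                # list concatenation [a || c]
    g = [sigmoid(sum(w * v for w, v in zip(row, concat))) for row in w_g]
    z = [gi * ai + (1.0 - gi) * ci for gi, ai, ci in zip(g, a, c)]
    return z, g


a = [1.0, -2.0]                   # asset-context embedding
c = [0.0, 4.0]                    # global-context embedding
w_g = [[0.0] * 4, [0.0] * 4]      # zero weights give a gate of 0.5 everywhere
z, g = gated_fusion(a, c, w_g)
print(g, z)
```

With a gate of 0.5 the output is the plain average of the two branches. Training moves each gate element toward 1 (trust the asset branch) or 0 (trust the global branch).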

### Why this is different
Baseline and attention variants mostly treat the sequence as one unified stream. Fusion explicitly builds an asset-level pathway and a portfolio-level pathway, then learns how much to trust each at each step.

### Pros
- Best structural fit when you want both asset-specific and market-wide context
- Better interpretability of where signal comes from (asset branch vs global branch)

### Cons
- Highest implementation complexity
- Most sensitive to shape/config mismatches (asset count, feature slicing)
- Highest compute cost among TCN options

## 7) Quick comparison

| Variant | Best Use Case | Complexity | Speed | Main Risk |
|---|---|---|---|---|
| `TCN` | Strong baseline, fast iteration | Low | Fastest | May miss global importance weighting |
| `TCN_ATTENTION` | Event-aware temporal weighting | Medium | Medium | Higher overfitting risk |
| `TCN_FUSION` | Joint asset-level + market-level reasoning | High | Slowest | Config/shape sensitivity |

## 8) Practical selection logic

Use this progression in practice:

1. Start with `TCN` to validate data pipeline, reward shaping, and stability.
2. Move to `TCN_ATTENTION` when baseline converges but misses regime/event timing.
3. Use `TCN_FUSION` when you need explicit asset-interaction modeling and have strong data hygiene + compute budget.

For publication-quality comparisons, keep all non-architecture factors fixed (data window, reward profile, seed protocol, evaluation protocol).
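
One way to enforce this is to generate every run from a single base config and override only the architecture keys. The sketch below is a hypothetical setup: the `seed` key and the `build_runs` helper are assumptions for illustration, while the `agent_params` keys are the ones used elsewhere in this guide.

```python
import copy

base_config = {
    "seed": 0,  # hypothetical key; use whatever the project's seed field is
    "agent_params": {
        "actor_critic_type": "TCN",
        "use_attention": False,
        "use_fusion": False,
    },
}

# Only architecture-related keys may differ between variants.
variants = {
    "TCN": {"actor_critic_type": "TCN"},
    "TCN_ATTENTION": {"actor_critic_type": "TCN_ATTENTION", "use_attention": True},
    "TCN_FUSION": {"actor_critic_type": "TCN_FUSION", "use_fusion": True},
}


def build_runs(base, variants, seeds):
    """Return (variant, seed, config) triples sharing every non-architecture setting."""
    runs = []
    for name, overrides in variants.items():
        for seed in seeds:
            cfg = copy.deepcopy(base)          # never mutate the shared base
            cfg["agent_params"].update(overrides)
            cfg["seed"] = seed
            runs.append((name, seed, cfg))
    return runs


runs = build_runs(base_config, variants, seeds=[0, 1, 2])
print(len(runs))
```

Running each variant on the same seed list makes it possible to compare paired results rather than single runs.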

```python
# Example config toggles for the TCN family

# Baseline TCN
config["agent_params"]["actor_critic_type"] = "TCN"
config["agent_params"]["use_attention"] = False
config["agent_params"]["use_fusion"] = False

# TCN + Attention
# config["agent_params"]["actor_critic_type"] = "TCN_ATTENTION"
# config["agent_params"]["use_attention"] = True
# config["agent_params"]["use_fusion"] = False

# TCN + Fusion
# config["agent_params"]["actor_critic_type"] = "TCN_FUSION"
# config["agent_params"]["use_fusion"] = True
# config["agent_params"]["use_attention"] = False
```

## 9) Notes for non-technical readers

- Think of `TCN` as a fast scanner over recent history.
- Think of `TCN_ATTENTION` as a scanner plus a highlighter that marks important time points.
- Think of `TCN_FUSION` as two analysts (asset-level and global-level) whose opinions are blended by a learned referee.

All three feed into the same portfolio-allocation logic via Dirichlet concentration outputs, so differences in behavior come from representation quality, not from changing the action distribution family.