# Research Paper Outline: The TAPE-TCN Framework

**Working Title**: *TAPE-TCN: Horizon-Agnostic Portfolio Optimization via Temporal Convolutional Networks and Actuarial Drawdown Control*

**Target Venue**: NeurIPS (FinAI Workshop), ICAIF, or Quantitative Finance (Journal)
**Status**: Draft Structure


## Abstract <a id='abstract'></a>

**Context**: Deep Reinforcement Learning (DRL) for portfolio optimization often struggles with two key issues: (1) the credit-assignment problem in long-horizon financial returns, and (2) path-dependent tail risks (drawdowns).

**Innovation**: We propose **TAPE-TCN**, a framework combining:
1. **Temporal Convolutional Networks (TCN)**: Dilated causal convolutions for multi-scale sequence modeling.
2. **TAPE Reward System**: A three-component step reward (Base + DSR/PBRS + Turnover proximity) with drawdown dual-penalty control and terminal TAPE utility.
3. **Actuarial State Augmentation**: Survival-analysis-inspired drawdown recovery features.

**Evaluation Plan**: We evaluate across deterministic (`mode`, `mean`) and stochastic tracks on a strict out-of-sample window (2020-01-01 to 2025-11-30), with all final performance metrics reported after full TCN variant runs.


## 1. Introduction <a id='intro'></a>

### 1.1 The Problem
- Financial markets are noisy, non-stationary, and have low signal-to-noise ratios.
- Standard RL objectives (maximize discounted cumulative return) do not adequately penalize **path-dependent risks** such as drawdowns.
- Practical deployment requires balancing return, risk, and transaction realism.

### 1.2 The TAPE-TCN Solution
- **TCN**: Parallelizable sequence modeling with stable gradients and controllable receptive field.
- **Actuarial Intelligence**: Explicit drawdown recovery-state variables in the observation space.
- **TAPE**: Daily reward shaping + terminal utility alignment for long-horizon portfolio health.

### 1.3 Contributions
1. Actuarial survival-style drawdown features integrated into a portfolio RL state pipeline.
2. Three-component TAPE reward with drawdown dual-control and terminal alignment.
3. Comparative study across TCN variants (`TCN`, `TCN_ATTENTION`, `TCN_FUSION`) under a unified evaluation protocol.


## 2. Methodology: Temporal Convolutional Architecture <a id='method-tcn'></a>

### 2.1 Dilated Causal Convolutions
- **Causal**: Output at time $t$ depends only on inputs $x_{0:t}$.
- **Dilated**: A filter $f$ of size $k$ is applied with spacing $d$ between taps, so temporal context grows with $d$ at a fixed parameter count.

$$ (F *_d X)(x_t) = \sum_{i=0}^{k-1} f(i) \cdot x_{t - d i} $$
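
The formula above can be checked numerically with a minimal NumPy sketch. The function name `dilated_causal_conv1d` and the zero-padding of inputs before $t=0$ are illustrative assumptions, not part of the project code.

```python
import numpy as np

def dilated_causal_conv1d(x, f, d):
    """Compute y[t] = sum_i f[i] * x[t - d*i], treating x[s] as 0 for s < 0.

    Hypothetical helper for illustration; single channel, no bias.
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    k = len(f)
    pad = d * (k - 1)                      # left padding keeps the output causal
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x)
    for i in range(k):
        # x[t - d*i] sits at index t + pad - d*i in the padded array
        start = pad - d * i
        y += f[i] * xp[start:start + len(x)]
    return y
```

Because only left padding is used, changing a future input never alters an earlier output, which is the causality property stated in Section 2.1.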

### 2.2 Receptive Field Math (Current Stack)
With two causal convolution layers per residual block, each with kernel size $k$ and block dilation $d_b$, the receptive field is:

$$ R = 1 + \sum_b 2(k-1)d_b $$

For kernel size $k=5$ and dilation stack $[2,4,8]$:

$$ R = 1 + 2\cdot(5-1)\cdot(2+4+8) = 113 $$

Assuming daily bars, 113 steps span roughly five to six months of trading history, which covers medium- to long-horizon regime dynamics.
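
The arithmetic can be reproduced with a short helper. The name `receptive_field` and the `convs_per_block` parameter are hypothetical and only mirror the formula in this section.

```python
def receptive_field(kernel_size, dilations, convs_per_block=2):
    """R = 1 + sum over blocks of convs_per_block * (k - 1) * d_b."""
    return 1 + sum(convs_per_block * (kernel_size - 1) * d for d in dilations)

# Current stack: k = 5, dilations [2, 4, 8]
print(receptive_field(5, [2, 4, 8]))
```

The same helper makes it easy to see how adding a block or widening the kernel would change the lookback window.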


## 3. Methodology: TAPE Reward System <a id='method-tape'></a>

We decompose the reward into dense per-step terms plus a sparse terminal utility.

### 3.1 Step Reward (Dense)

$$ r_t = r_t^{\text{base}} + r_t^{\text{DSR/PBRS}} + r_t^{\text{turnover}} - r_t^{\text{drawdown}} $$

where the drawdown term is dual-controlled:

$$ r_t^{\text{drawdown}} = \lambda_t \cdot \max(0, \text{DD}_t - d_{\text{trigger}}), \quad
\lambda_{t+1}=\Pi_{[\lambda_{\min},\lambda_{\max}]}\left(\lambda_t + \eta_\lambda g_t\right) $$

- **DSR/PBRS**: Differential Sharpe-based potential shaping.
- **Turnover proximity**: Reward near target turnover band; suppress excessive churn.
- **Drawdown dual control**: Adaptive penalty intensity under drawdown stress.
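
A minimal sketch of how the step reward and the dual update might fit together is given below. The outline does not define $g_t$; here it is assumed to be the signed constraint violation $\text{DD}_t - d_{\text{trigger}}$, and the class name, function name, and default values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DrawdownDualController:
    """Illustrative dual controller; all defaults are placeholder values."""
    lam: float = 1.0
    lam_min: float = 0.0
    lam_max: float = 10.0
    eta: float = 0.1
    d_trigger: float = 0.10

    def penalty(self, drawdown):
        # lambda_t * max(0, DD_t - d_trigger)
        return self.lam * max(0.0, drawdown - self.d_trigger)

    def update(self, drawdown):
        # ASSUMPTION: g_t is the signed violation DD_t - d_trigger
        g = drawdown - self.d_trigger
        # Projection onto [lam_min, lam_max] is a simple clip
        self.lam = min(self.lam_max, max(self.lam_min, self.lam + self.eta * g))
        return self.lam

def step_reward(r_base, r_shaping, r_turnover, drawdown, ctrl):
    """r_t = base + DSR/PBRS + turnover - drawdown penalty, then update lambda."""
    penalty = ctrl.penalty(drawdown)   # uses lambda_t
    ctrl.update(drawdown)              # produces lambda_{t+1}
    return r_base + r_shaping + r_turnover - penalty
```

Under this assumption the multiplier rises while drawdown exceeds the trigger and decays once it falls back below, which is one way to obtain the adaptive penalty intensity described above.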

### 3.2 Terminal Utility (Sparse)
At episode end, compute the aggregate TAPE utility $S_{\text{TAPE}} \in [0,1]$ and add it to the final step reward, scaled by a weight $\Lambda$:

$$ r_T \leftarrow r_T + \Lambda \cdot S_{\text{TAPE}} $$

This aligns local actions with episode-level portfolio quality.
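
As a small illustration, the terminal adjustment could be applied to a stored episode as follows. The function name `apply_terminal_utility` and its arguments are hypothetical; how $S_{\text{TAPE}}$ itself is computed is outside this sketch.

```python
def apply_terminal_utility(rewards, s_tape, big_lambda):
    """Return a copy of the episode rewards with Lambda * S_TAPE added to the last step."""
    if not rewards:
        raise ValueError("episode must contain at least one step")
    if not 0.0 <= s_tape <= 1.0:
        raise ValueError("S_TAPE is expected to lie in [0, 1]")
    out = list(rewards)
    out[-1] += big_lambda * s_tape   # only the final step r_T is modified
    return out
```

All earlier step rewards are left untouched, so the dense shaping terms and the sparse terminal bonus remain separable in logs.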


## 4. Methodology: Actuarial State Augmentation <a id='method-actuarial'></a>

We inject survival-analysis-inspired drawdown features into state $s_t$.

### 4.1 Core Idea
Estimate recovery characteristics from historical drawdown episodes and expose them as real-time risk context.

### 4.2 Implemented Features
- `Actuarial_Expected_Recovery`
- `Actuarial_Prob_30d`
- `Actuarial_Prob_60d`
- `Actuarial_Reserve_Severity`

**Hypothesis**: These features help distinguish recoverable dips from persistent stress regimes, improving risk-aware allocation.
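
One simple way such features could be estimated is sketched below. This is an assumption for illustration only: it uses empirical frequencies over completed drawdown episodes, ignores censoring of unrecovered episodes, and does not attempt `Actuarial_Reserve_Severity`, whose definition is not given here. Function names and dictionary keys are hypothetical.

```python
import numpy as np

def drawdown_episode_durations(equity):
    """Lengths (in steps) of completed drawdown episodes, from prior peak to recovery."""
    equity = np.asarray(equity, dtype=float)
    durations = []
    peak = equity[0]
    peak_idx = 0
    start = None  # index of the peak that opened the current drawdown
    for t in range(1, len(equity)):
        if equity[t] >= peak:
            if start is not None:
                durations.append(t - start)  # recovered to the old peak
                start = None
            peak = equity[t]
            peak_idx = t
        elif start is None:
            start = peak_idx
    return durations

def recovery_features(durations, horizons=(30, 60)):
    """Empirical expected recovery time and P(recovery within h steps)."""
    d = np.asarray(durations, dtype=float)
    if d.size == 0:
        feats = {"expected_recovery": float("nan")}
        feats.update({f"prob_{h}d": float("nan") for h in horizons})
        return feats
    feats = {"expected_recovery": float(d.mean())}
    for h in horizons:
        feats[f"prob_{h}d"] = float((d <= h).mean())
    return feats
```

To avoid look-ahead, such estimates would need to be computed only from episodes that completed before the current time step.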


## 5. Experimental Design <a id='experiments'></a>

### 5.1 Data and Splits (Implementation-Aligned)
- **Assets**: 10 US equities + cash.
- **Training window**: 2011-01-01 to 2019-12-31.
- **Out-of-sample (OOS)**: 2020-01-01 to 2025-11-30.

### 5.2 Model Families in Scope
1. **TCN** (core)
2. **TCN_ATTENTION**
3. **TCN_FUSION**

External/legacy comparisons (if reported) should be clearly labeled as non-core implementation baselines.

### 5.3 Planned Ablation Axes (for later execution)
1. **Architecture**: TCN vs TCN_ATTENTION vs TCN_FUSION
2. **Reward**: TAPE full vs return-only / reduced components
3. **Features**: with vs without actuarial block


## 6. Results and Discussion <a id='results'></a>

*(Placeholders only until full TCN variant campaign is completed.)*

### 6.1 Performance Table (Template)
| Model | Sharpe | Sortino | Max DD | Turnover |
|-------|--------|---------|--------|----------|
| TCN | TBD | TBD | TBD | TBD |
| TCN_ATTENTION | TBD | TBD | TBD | TBD |
| TCN_FUSION | TBD | TBD | TBD | TBD |
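
The table columns could be computed along the following lines. The conventions are assumptions rather than the project's logging protocol: daily simple returns, 252 periods per year, a zero risk-free rate and Sortino target, and turnover measured as the mean L1 change in weights between consecutive steps.

```python
import numpy as np

def sharpe(returns, periods=252):
    """Annualized Sharpe ratio with zero risk-free rate (assumed convention)."""
    r = np.asarray(returns, dtype=float)
    return float(np.sqrt(periods) * r.mean() / r.std(ddof=1))

def sortino(returns, periods=252, target=0.0):
    """Annualized Sortino ratio; assumes at least one below-target return."""
    r = np.asarray(returns, dtype=float)
    downside = np.minimum(r - target, 0.0)
    downside_dev = np.sqrt(np.mean(downside ** 2))
    return float(np.sqrt(periods) * (r.mean() - target) / downside_dev)

def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    r = np.asarray(returns, dtype=float)
    equity = np.concatenate([[1.0], np.cumprod(1.0 + r)])  # include starting capital
    peak = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peak))

def mean_turnover(weights):
    """Mean L1 change in portfolio weights between consecutive steps."""
    w = np.asarray(weights, dtype=float)
    return float(np.abs(np.diff(w, axis=0)).sum(axis=1).mean())
```

Whichever conventions are finally adopted, applying the same functions to all three variants keeps the comparison consistent.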

### 6.2 Behavioral Analysis (To Fill)
- Allocation concentration dynamics (weights/alphas)
- Turnover governance behavior versus target band
- Crisis-window behavior (2020, 2022, 2025 segments)
- Deterministic (`mode`, `mean`) vs stochastic robustness
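
The three evaluation tracks can be illustrated under the assumption that the policy head outputs Dirichlet concentration parameters (the "alphas"), as suggested by the Dirichlet policy citation group. The function name `dirichlet_action` and the `track` argument are hypothetical.

```python
import numpy as np

def dirichlet_action(alpha, track="mean", rng=None):
    """Map Dirichlet concentrations to portfolio weights for one evaluation track."""
    alpha = np.asarray(alpha, dtype=float)
    if track == "mean":
        # Mean of Dirichlet(alpha): alpha_i / sum(alpha)
        return alpha / alpha.sum()
    if track == "mode":
        # Mode formula is valid only when every alpha_i > 1
        if np.any(alpha <= 1.0):
            raise ValueError("mode requires all alpha > 1")
        return (alpha - 1.0) / (alpha.sum() - alpha.size)
    if track == "stochastic":
        rng = np.random.default_rng() if rng is None else rng
        return rng.dirichlet(alpha)
    raise ValueError(f"unknown track: {track}")
```

The mode concentrates more weight on the largest alpha than the mean does, so the two deterministic tracks can produce visibly different allocation concentration.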


## 7. Conclusion <a id='conclusion'></a>

This study defines a risk-aware TCN framework for portfolio RL that integrates actuarial drawdown context, shaped reward engineering, and execution constraints. Final empirical conclusions will be made only after the full TCN variant and robustness campaign is completed under the unified logging protocol.


## References

Use canonical citations in `paper/references.bib`, and align with the project-local reading set in:

`notebooks/documentation/related_works/`

Core citation groups:
- Reward shaping / PBRS / safe RL
- Portfolio RL and risk-sensitive optimization
- Dirichlet policy for simplex allocation
- TCN / attention / fusion sequence modeling
- Activation function studies for stable alpha mapping
