In [1]:
import pandas as pd
import numpy as np

# Derivatives for reinforcement trading notebook

Given a trading system model $F_{t}(\theta)$, the goal is to adjust the parameters $\theta$ to maximise the wealth function $U_{T}$.

# $$ \frac{dU_{T}(\theta)}{d\theta} = \sum \limits _{t=1} ^ {T} \frac{dU_{T}}{dR_{t}} \left\{ \frac{dR_{t}}{dF_{t}} \frac{dF_{t}}{d\theta} + \frac{dR_{t}}{dF_{t-1}} \frac{dF_{t-1}}{d\theta} \right\} $$

Where:
* $U$ = wealth function (Sharpe ratio / Sterling ratio)
* $R_{t}$ = realised returns 
* $r_{t}$ = asset returns
* $F_{t}$ = position  
* $\theta$ = model weights
* $\delta$ = transaction costs
* $\mu$ = maximum position size (scales $F_{t}$, which ranges from -1 to 1)
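
To make the symbols above concrete, here is a minimal sketch of how realised returns could be computed from positions. It assumes the form $R_{t} = \mu (F_{t-1} r_{t} - \delta |F_{t} - F_{t-1}|)$ from the Moody and Saffell paper cited below, which is consistent with the derivatives used later in this notebook. The function name `realised_returns` and the sample numbers are illustrative only.

```python
import numpy as np

def realised_returns(F, r, mu=1.0, delta=0.0):
    """Assumed form: R_t = mu * (F_{t-1} * r_t - delta * |F_t - F_{t-1}|), with R_0 = 0."""
    F = np.asarray(F, dtype=float)
    r = np.asarray(r, dtype=float)
    R = np.zeros(len(r))
    # The position held over step t is F_{t-1}; costs are paid on the position change.
    R[1:] = mu * (F[:-1] * r[1:] - delta * np.abs(np.diff(F)))
    return R

F = np.array([0.0, 1.0, 1.0, -1.0])      # positions in [-1, 1]
r = np.array([0.0, 0.02, -0.01, 0.03])   # asset returns
R = realised_returns(F, r, mu=1.0, delta=0.001)
print(R)
```

Note that a position flip from 1 to -1 is charged twice the cost of entering from flat, because the position change is 2.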

#### References

* J. Moody and M. Saffell, "Learning to trade via direct reinforcement," in IEEE Transactions on Neural Networks, vol. 12, no. 4, pp. 875-889, July 2001, doi: 10.1109/72.935097.

* https://teddykoker.com/2019/06/trading-with-reinforcement-learning-in-python-part-ii-application/
* http://cs229.stanford.edu/proj2006/Molina-StockTradingWithRecurrentReinforcementLearning.pdf

----

## $$\frac{dU_{T}}{dR_{t}} $$

The *Sterling Ratio* is used as the wealth function:

$$ U_{T} = \text{Sterling Ratio} = \frac{\text{Annualized Average Return}}{\text{Maximum Drawdown}}$$

Here it is approximated by the Downside Deviation Ratio (DDR):

$$ DDR_{T} = \frac{\text{Average}(R_{t})}{DD_{T}} $$

where $$DD_{T} = \left(\frac{1}{T}\sum \limits _{t=1} ^ {T} \min \{R_{t},0\}^2\right)^\frac{1}{2} $$
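
As a quick worked example of the two formulas above, the following NumPy sketch evaluates $DD_{T}$ and $DDR_{T}$ on four made-up returns. The helper names `downside_deviation` and `ddr` are hypothetical and only serve this illustration.

```python
import numpy as np

def downside_deviation(R):
    """DD_T: root mean square of the negative part of the returns."""
    R = np.asarray(R, dtype=float)
    return np.sqrt(np.mean(np.minimum(R, 0.0) ** 2))

def ddr(R):
    """DDR_T: average return divided by the downside deviation."""
    return np.mean(R) / downside_deviation(R)

R = np.array([0.02, -0.01, 0.03, -0.02])
dd_value = downside_deviation(R)   # sqrt((0.01**2 + 0.02**2) / 4)
ddr_value = ddr(R)                 # 0.005 / dd_value
print(dd_value, ddr_value)
```

If no return in the sample is negative, $DD_{T}$ is zero and the ratio is undefined, so a real implementation would need to guard that case.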

In [2]:
import jax.numpy as jnp
from jax import grad, jit, vmap
from jax import random

In [6]:
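# Shortcut: let JAX automatic differentiation compute the gradient
# instead of deriving it by hand.

def DDR(R, Rt):
    # Append the newest return Rt to the history R, then take mean / downside deviation.
    _R = jnp.append(R, Rt)
    return jnp.mean(_R) / jnp.mean(jnp.minimum(_R, 0)**2)**0.5

# Gradient of DDR with respect to argument 1 (Rt), JIT-compiled.
ddr_auto = jit(grad(DDR, 1))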
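
One way to sanity-check an automatic gradient like this is to compare it with a central finite difference and a hand-derived expression. The sketch below does that in plain NumPy so it does not depend on JAX. The names `ddr_np`, `ddr_grad_fd` and `ddr_grad_analytic` are hypothetical, and the analytic form is derived here for the batch DDR only.

```python
import numpy as np

def ddr_np(R, Rt):
    """NumPy version of DDR: history R with the newest return Rt appended."""
    _R = np.append(R, Rt)
    return np.mean(_R) / np.mean(np.minimum(_R, 0.0) ** 2) ** 0.5

def ddr_grad_fd(R, Rt, eps=1e-6):
    """Central finite difference of DDR with respect to Rt."""
    return (ddr_np(R, Rt + eps) - ddr_np(R, Rt - eps)) / (2 * eps)

def ddr_grad_analytic(R, Rt):
    """d(mean/DD)/dRt = 1/(N*DD) - mean * min(Rt, 0) / (N * DD**3)."""
    _R = np.append(R, Rt)
    N = len(_R)
    dd = np.mean(np.minimum(_R, 0.0) ** 2) ** 0.5
    return 1.0 / (N * dd) - np.mean(_R) * min(Rt, 0.0) / (N * dd ** 3)

R_hist = np.array([0.02, -0.01, 0.03, -0.02])
fd = ddr_grad_fd(R_hist, -0.015)
an = ddr_grad_analytic(R_hist, -0.015)
print(fd, an)
```

The two values should agree to several significant digits; a larger gap would point to a mistake in either the derivation or the function being differentiated.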

#### Approximated for online learning as:
The differential Downside Deviation Ratio ($D_{t}$):

$$D_{t} = \frac{DD_{t-1}^2 \cdot (R_{t} - \frac{1}{2} A_{t-1}) - \frac{1}{2} A_{t-1}R_{t}^2}{DD_{t-1}^3} $$

$D_{t}$ is built from exponential moving averages of the returns ($A$) and of the squared downside returns ($DD^2$, `DD2` in the code).

Exponential moving average of returns, with adaptation rate $n$:
$$A_{t} = A_{t-1} + n(R_{t} - A_{t-1})$$

In [10]:
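def get_A(R, n):
    """Exponential moving average of returns R with adaptation rate n (A[0] = 0)."""
    T = len(R)
    A = np.zeros(T)

    for t in range(1, T):
        A[t] = A[t-1] + (n * (R[t] - A[t-1]))
        A[t] = np.nan_to_num(A[t])  # replace NaN with 0 so it does not propagate

    return A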

Exponential moving average of squared downside returns:
$$DD_{t}^2 = DD_{t-1}^2 + n(\min(R_{t},0)^2 - DD_{t-1}^2)$$

In [11]:
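def get_DD2(R, n):
    """Exponential moving average of squared downside returns (DD2[0] = 0)."""
    T = len(R)
    DD2 = np.zeros(T)
    for t in range(1, T):
        DD2[t] = DD2[t-1] + n * (min(R[t],0)**2 - DD2[t-1])
        DD2[t] = np.nan_to_num(DD2[t])  # replace NaN with 0 so it does not propagate

    return DD2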
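
Because the point of these moving averages is online learning, the same recurrences can also be written as single-step updates that need only the previous value and the newest return. This is a minimal sketch under that reading. The names `update_A` and `update_DD2` and the sample values are hypothetical.

```python
def update_A(A_prev, R_t, n):
    """One step of A_t = A_{t-1} + n * (R_t - A_{t-1})."""
    return A_prev + n * (R_t - A_prev)

def update_DD2(DD2_prev, R_t, n):
    """One step of DD2_t = DD2_{t-1} + n * (min(R_t, 0)**2 - DD2_{t-1})."""
    return DD2_prev + n * (min(R_t, 0.0) ** 2 - DD2_prev)

A, DD2 = 0.0, 0.0          # same zero initialisation as the array versions
history = []
for R_t in [0.1, -0.2]:    # made-up returns
    A = update_A(A, R_t, n=0.5)
    DD2 = update_DD2(DD2, R_t, n=0.5)
    history.append((A, DD2))
print(history)
```

With `n = 0.5`, the positive return leaves `DD2` at zero and only the negative return moves it, which is the intended asymmetry of the downside measure.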

#### Differential downside deviation ratio 

In [12]:
import sympy as sp

dd_t_1, rt, A_t_1 = sp.symbols('DD_t_-1, R_t, A_t_-1')
Dt = (dd_t_1**2 * (rt - 0.5*A_t_1) - (0.5*A_t_1*rt**2)) / dd_t_1**3

Differential downside deviation ratio (Dt)

In [13]:
Dt

(-0.5*A_t_-1*R_t**2 + DD_t_-1**2*(-0.5*A_t_-1 + R_t))/DD_t_-1**3

Partial derivative of Dt with respect to Rt

In [14]:
dDdR = sp.diff(Dt, rt)
dDdR

(-1.0*A_t_-1*R_t + DD_t_-1**2)/DD_t_-1**3

## $$ \frac{dU_{T}}{dR_{t}} = \frac{DD^2_{t-1} - A_{t-1}R_{t}}{DD^3_{t-1}}$$

In [15]:
def get_dUdR(A, t, R, DD2):
    # DD2 holds the squared downside deviation, so DD^3 = DD2 ** 1.5.
    return ((-1.0*A[t-1]*R[t]) + DD2[t-1]) / DD2[t-1]**1.5
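
The closed form for the derivative can be checked numerically against a central difference of $D_{t}$ itself. The sketch below uses plain Python and takes $DD_{t-1}$ directly (not its square). The function names and the sample values are illustrative assumptions.

```python
def D_t(R_t, A_prev, DD_prev):
    """Differential downside deviation ratio as defined above."""
    return (DD_prev**2 * (R_t - 0.5 * A_prev) - 0.5 * A_prev * R_t**2) / DD_prev**3

def dD_dR(R_t, A_prev, DD_prev):
    """Closed-form derivative of D_t with respect to R_t."""
    return (DD_prev**2 - A_prev * R_t) / DD_prev**3

A_prev, DD_prev, R_t, eps = 0.01, 0.02, -0.015, 1e-6   # made-up values
numeric = (D_t(R_t + eps, A_prev, DD_prev) - D_t(R_t - eps, A_prev, DD_prev)) / (2 * eps)
closed = dD_dR(R_t, A_prev, DD_prev)
print(numeric, closed)
```

Since $D_{t}$ is quadratic in $R_{t}$, the central difference has no truncation error here and the two numbers should match up to floating-point rounding.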

-------

## $$\frac{dR_{t}}{dF_{t}} = -\mu\delta \cdot sgn(F_{t} - F_{t-1})$$

In [16]:
def get_dRdFt(delta, Ft, t):
    # mu (max position) is not passed in, i.e. it is taken as 1 here.
    return -delta * np.sign(Ft[t] - Ft[t-1])

-----

## $$\frac{dF_{t}}{d\theta} = (1-tanh(\theta^Tx_{t})^2) \cdot \{ x_{t} + w_{M+2} \frac{dF_{t-1}}{d\theta} \}$$

In [17]:
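def get_dFtdTheta(Ft, t, state, theta, prev):
    # Ft[t] = tanh(theta^T x_t), so (1 - Ft[t]**2) is the tanh derivative.
    # theta[-1] is w_{M+2}, the weight on F_{t-1}; prev is dF_{t-1}/dtheta.
    return (1 - Ft[t] ** 2) * (state + theta[-1] * prev)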
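
The recursion can be verified by unrolling it over a short series and comparing the result with a numerical gradient of the final position. This sketch assumes $F_{t} = \tanh(\theta^T x_{t})$ with a deliberately small feature vector $x_{t} = [1, r_{t}, F_{t-1}]$ and $F_{0} = 0$. That layout, the function names and the numbers are assumptions for illustration, not the notebook's actual state construction.

```python
import numpy as np

def forward_with_grad(theta, r):
    """Run F_t = tanh(theta . x_t) and carry dF_t/dtheta along with the recursion."""
    F_prev = 0.0
    dF_prev = np.zeros_like(theta)
    for r_t in r:
        x_t = np.array([1.0, r_t, F_prev])   # assumed layout: bias, return, previous position
        F_t = np.tanh(theta @ x_t)
        dF_t = (1.0 - F_t**2) * (x_t + theta[-1] * dF_prev)
        F_prev, dF_prev = F_t, dF_t
    return F_prev, dF_prev

def numeric_grad(theta, r, eps=1e-6):
    """Central finite difference of the final position with respect to each weight."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (forward_with_grad(theta + step, r)[0]
                - forward_with_grad(theta - step, r)[0]) / (2 * eps)
    return g

theta = np.array([0.1, 2.0, 0.5])
r = np.array([0.02, -0.01, 0.03])
F_T, analytic = forward_with_grad(theta, r)
numeric = numeric_grad(theta, r)
print(analytic, numeric)
```

The recurrent term `theta[-1] * dF_prev` is what carries the dependence of earlier positions on $\theta$ forward in time. Dropping it would give only the gradient through the current step.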

------

## $$\frac{dR_{t}}{dF_{t-1}} = \mu \cdot r_{t} + \mu\delta \cdot sgn(F_{t}-F_{t-1})$$

In [18]:
def get_dRdFtp(r, delta, Ft, t):
    # mu (max position) is not passed in, i.e. it is taken as 1 here.
    return r[t] + delta * np.sign(Ft[t] - Ft[t-1])
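
Both return derivatives can be checked against finite differences of the realised return. The sketch assumes $R_{t} = \mu (F_{t-1} r_{t} - \delta |F_{t} - F_{t-1}|)$, the form in the cited Moody and Saffell paper, and keeps $\mu$ explicit. The function names and numbers are illustrative.

```python
import numpy as np

def realised_return(F_t, F_prev, r_t, mu, delta):
    """Assumed form: R_t = mu * (F_{t-1} * r_t - delta * |F_t - F_{t-1}|)."""
    return mu * (F_prev * r_t - delta * abs(F_t - F_prev))

def dR_dFt(F_t, F_prev, mu, delta):
    return -mu * delta * np.sign(F_t - F_prev)

def dR_dFprev(F_t, F_prev, r_t, mu, delta):
    return mu * r_t + mu * delta * np.sign(F_t - F_prev)

F_t, F_prev, r_t, mu, delta, eps = 0.6, -0.2, 0.01, 2.0, 0.002, 1e-6   # made-up values

num_Ft = (realised_return(F_t + eps, F_prev, r_t, mu, delta)
          - realised_return(F_t - eps, F_prev, r_t, mu, delta)) / (2 * eps)
num_Fprev = (realised_return(F_t, F_prev + eps, r_t, mu, delta)
             - realised_return(F_t, F_prev - eps, r_t, mu, delta)) / (2 * eps)

print(num_Ft, dR_dFt(F_t, F_prev, mu, delta))
print(num_Fprev, dR_dFprev(F_t, F_prev, r_t, mu, delta))
```

The absolute value has a kink at $F_{t} = F_{t-1}$, where the derivative is not defined. `np.sign` returns 0 there, which amounts to treating the cost term as flat at that point.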

### Debug

In [19]:
A, B = sp.symbols('A, B', real=True)
S_T = A / sp.sqrt(B-A**2)
S_T

A/sqrt(-A**2 + B)

In [20]:
sp.simplify(S_T.diff(A))

B/(-A**2 + B)**(3/2)

In [21]:
# deriv of sums = sum of derivs
# https://math.stackexchange.com/a/3910714

$$A=\frac{1}{T} \sum_{t=1} ^{T} R_{t}$$

In [22]:
sp.diff(S_T, A)

A**2/(-A**2 + B)**(3/2) + 1/sqrt(-A**2 + B)
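
The unsimplified output of `sp.diff` and the simplified output of `sp.simplify` are the same expression. Putting both terms over the common denominator $(B - A^2)^{3/2}$ shows this (here $B$ is presumably the mean of squared returns, which is an assumption, since the notebook only defines $A$):

$$ \frac{\partial S_{T}}{\partial A} = \frac{1}{\sqrt{B - A^2}} + \frac{A^2}{(B - A^2)^{3/2}} = \frac{(B - A^2) + A^2}{(B - A^2)^{3/2}} = \frac{B}{(B - A^2)^{3/2}} $$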