For a given trading day $t$ and an allocation $S$, let:

- $M$ : the number of assets in the universe  
- $w_{S,t} = (w_{S,t,1}, w_{S,t,2}, \dots, w_{S,t,M})$ : the weights of allocation $S$ at time $t$  
- $r_{i,t+1}$ : the performance (or return) of asset $i$ from day $t$ to day $t+1$  

Then the realized return of allocation $S$ at $t+1$ is given by:

$$
R_{S,t+1} = \sum_{i=1}^M w_{S,t,i} \times r_{i,t+1}
$$

The prediction task is to estimate the sign of $R_{S,t+1}$.
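
As a quick illustration of the formula above, here is a minimal sketch on a made-up universe of three assets. The weights and asset returns are hypothetical values chosen for the example; they do not come from the dataset.

```python
import numpy as np

# Hypothetical toy universe of M = 3 assets (values are made up for illustration).
weights = np.array([0.5, -0.3, 0.2])             # w_{S,t,i}
asset_returns = np.array([0.01, 0.02, -0.005])   # r_{i,t+1}

# Realized return of the allocation: R_{S,t+1} = sum_i w_{S,t,i} * r_{i,t+1}
realized_return = float(weights @ asset_returns)

# The quantity to predict is the sign of that return.
target_sign = int(np.sign(realized_return))

print(realized_return, target_sign)
```

Here the short position in the second asset loses more than the two long positions gain, so the realized return is negative and the label to predict would be the negative class.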


In [1]:
import pandas as pd

## `X_train.csv`

At every day $t$, each allocation $S$ follows this property:

$$
\forall S, \ \forall t : \ \sum_{i=1}^M |w_{S,t,i}| = 1
$$

The **SIGNED\_VOLUME** of an allocation $S$ at $t$ is given by:

$$
V_{S,t} = \sum_{i=1}^M w_{S,t,i} \times v_{i,t}
$$

where $v_{i,t}$ is the traded volume of asset $i$ during the trading session of day $t$.


For homogeneity, these $V_{S,t}$ were rescaled in a rolling fashion to ensure comparability across different styles of allocations.

The **AVG\_DAILY\_TURNOVER** of an allocation $S$ at $t$ is given by:

$$
TURNOVER_{S,t} = \sum_{i=1}^M | w_{S,t,i} - w_{S,t-1,i} |
$$

$$
ADT_{S,t} = \operatorname{median}(TURNOVER_{S,t}, \dots, TURNOVER_{S,t-20})
$$
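
The two definitions above can be sketched with pandas. The dataset does not expose the underlying weights, so the weight history below is randomly generated and purely hypothetical; scaling each row so that absolute weights sum to 1 is an illustrative choice, and the 21-observation window follows from the range $t-20, \dots, t$.

```python
import numpy as np
import pandas as pd

# Hypothetical weight history: rows are days, columns are assets (made-up values).
rng = np.random.default_rng(0)
raw = rng.normal(size=(30, 4))
weights = pd.DataFrame(raw / np.abs(raw).sum(axis=1, keepdims=True))

# TURNOVER_{S,t} = sum_i |w_{S,t,i} - w_{S,t-1,i}|; undefined on the first day.
turnover = weights.diff().abs().sum(axis=1, min_count=1)

# ADT_{S,t}: median over the window (t-20, ..., t), i.e. 21 observations.
adt = turnover.rolling(window=21).median()

print(adt.dropna().head())
```

With the default `min_periods`, the rolling median stays `NaN` until 21 valid turnover values are available, which is why the first rows of `adt` are empty.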

In [2]:
X_train = pd.read_csv("data/X_train.csv")
X_train.head()

Unnamed: 0,ROW_ID,TS,ALLOCATION,RET_20,RET_19,RET_18,RET_17,RET_16,RET_15,RET_14,...,SIGNED_VOLUME_9,SIGNED_VOLUME_8,SIGNED_VOLUME_7,SIGNED_VOLUME_6,SIGNED_VOLUME_5,SIGNED_VOLUME_4,SIGNED_VOLUME_3,SIGNED_VOLUME_2,SIGNED_VOLUME_1,AVG_DAILY_TURNOVER
0,0,DATE_0001,ALLOCATION_01,-0.002477,0.004826,0.005374,-0.001688,-0.000152,-0.000685,-0.002217,...,-1.016154,-1.01145,-1.171714,-0.729594,-1.208138,-1.215123,-0.848346,-0.642461,-0.203447,0.054324
1,1,DATE_0001,ALLOCATION_02,0.006863,-0.005265,-0.004249,0.002686,-0.002638,0.003056,0.002712,...,0.896098,1.429419,0.946527,1.059767,0.988289,0.956915,0.943508,0.124168,0.081083,0.015669
2,2,DATE_0001,ALLOCATION_03,-0.005535,0.008541,0.00536,-0.002491,0.004679,-0.000848,-0.007197,...,-0.889142,-0.939257,-0.98037,-0.863196,-0.839662,-0.882459,-1.172723,-0.863937,-0.695998,0.057961
3,3,DATE_0001,ALLOCATION_04,0.003178,-0.001352,-0.004051,-0.001841,-0.005659,0.000627,0.006686,...,-1.788263,-0.807971,-1.587942,-0.042083,-1.356051,-1.007006,-1.821786,-0.45566,-1.090989,0.096004
4,4,DATE_0001,ALLOCATION_05,0.003359,-0.003349,-0.00546,0.000416,-0.003533,0.000913,0.005088,...,0.326148,1.0131,0.362135,0.77467,0.370484,-0.132558,-0.417645,-1.284208,-1.3829,0.005816


In [3]:
X_train.shape

(180245, 44)

In [4]:
X_train.columns

Index(['ROW_ID', 'TS', 'ALLOCATION', 'RET_20', 'RET_19', 'RET_18', 'RET_17',
       'RET_16', 'RET_15', 'RET_14', 'RET_13', 'RET_12', 'RET_11', 'RET_10',
       'RET_9', 'RET_8', 'RET_7', 'RET_6', 'RET_5', 'RET_4', 'RET_3', 'RET_2',
       'RET_1', 'SIGNED_VOLUME_20', 'SIGNED_VOLUME_19', 'SIGNED_VOLUME_18',
       'SIGNED_VOLUME_17', 'SIGNED_VOLUME_16', 'SIGNED_VOLUME_15',
       'SIGNED_VOLUME_14', 'SIGNED_VOLUME_13', 'SIGNED_VOLUME_12',
       'SIGNED_VOLUME_11', 'SIGNED_VOLUME_10', 'SIGNED_VOLUME_9',
       'SIGNED_VOLUME_8', 'SIGNED_VOLUME_7', 'SIGNED_VOLUME_6',
       'SIGNED_VOLUME_5', 'SIGNED_VOLUME_4', 'SIGNED_VOLUME_3',
       'SIGNED_VOLUME_2', 'SIGNED_VOLUME_1', 'AVG_DAILY_TURNOVER'],
      dtype='object')

In [5]:
X_train["TS"].unique()

array(['DATE_0001', 'DATE_0002', 'DATE_0003', ..., 'DATE_2771',
       'DATE_2772', 'DATE_2773'], shape=(2773,), dtype=object)

In [6]:
X_train["ALLOCATION"].unique()

array(['ALLOCATION_01', 'ALLOCATION_02', 'ALLOCATION_03', 'ALLOCATION_04',
       'ALLOCATION_05', 'ALLOCATION_06', 'ALLOCATION_07', 'ALLOCATION_08',
       'ALLOCATION_09', 'ALLOCATION_10', 'ALLOCATION_11', 'ALLOCATION_12',
       'ALLOCATION_13', 'ALLOCATION_14', 'ALLOCATION_15', 'ALLOCATION_16',
       'ALLOCATION_17', 'ALLOCATION_18', 'ALLOCATION_19', 'ALLOCATION_20',
       'ALLOCATION_21', 'ALLOCATION_22', 'ALLOCATION_23', 'ALLOCATION_24',
       'ALLOCATION_25', 'ALLOCATION_26', 'ALLOCATION_27', 'ALLOCATION_28',
       'ALLOCATION_29', 'ALLOCATION_30', 'ALLOCATION_31', 'ALLOCATION_32',
       'ALLOCATION_33', 'ALLOCATION_34', 'ALLOCATION_35', 'ALLOCATION_36',
       'ALLOCATION_37', 'ALLOCATION_38', 'ALLOCATION_39', 'ALLOCATION_40',
       'ALLOCATION_41', 'ALLOCATION_42', 'ALLOCATION_43', 'ALLOCATION_44',
       'ALLOCATION_45', 'ALLOCATION_46', 'ALLOCATION_47', 'ALLOCATION_48',
       'ALLOCATION_49', 'ALLOCATION_50', 'ALLOCATION_51', 'ALLOCATION_52',
       'ALLOCATION_53', '

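Be careful: there is no continuity between dates. For the same allocation, `RET_2` at `DATE_0002` does not equal `RET_1` at `DATE_0001`, so consecutive `TS` labels should not be treated as consecutive trading days.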

In [10]:
X_train[X_train["ALLOCATION"] == "ALLOCATION_01"][["TS", "RET_1", "RET_2"]]

Unnamed: 0,TS,RET_1,RET_2
0,DATE_0001,0.001061,0.004822
65,DATE_0002,0.003451,-0.000827
130,DATE_0003,0.001105,0.000013
195,DATE_0004,0.001886,0.000974
260,DATE_0005,0.001579,0.004163
...,...,...,...
179920,DATE_2769,-0.001257,0.003040
179985,DATE_2770,-0.003350,0.000868
180050,DATE_2771,-0.002333,-0.000563
180115,DATE_2772,-0.001352,-0.000667
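
One way to check this is to compare each row's `RET_2` with the previous row's `RET_1`. The sketch below assumes that `RET_k` is the allocation's return $k$ days before the target day (an interpretation, not something the data files state) and uses only the first five rows printed above.

```python
import pandas as pd

# First rows for ALLOCATION_01, copied from the output above.
sample = pd.DataFrame({
    "TS": ["DATE_0001", "DATE_0002", "DATE_0003", "DATE_0004", "DATE_0005"],
    "RET_1": [0.001061, 0.003451, 0.001105, 0.001886, 0.001579],
    "RET_2": [0.004822, -0.000827, 0.000013, 0.000974, 0.004163],
})

# If dates were contiguous, RET_2 on a row would equal RET_1 on the previous row.
previous_ret_1 = sample["RET_1"].shift(1)
mismatch = (sample["RET_2"] - previous_ret_1).abs() > 1e-6

# Ignore the first row, which has no predecessor.
n_mismatches = int(mismatch.iloc[1:].sum())
print(f"{n_mismatches} of {len(sample) - 1} consecutive pairs disagree")
```

On the full frame, the same comparison could be run per allocation by sorting on `TS` and applying `groupby("ALLOCATION")["RET_1"].shift(1)`.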


In [6]:
y_train = pd.read_csv("data/y_train.csv")
y_train.head()

Unnamed: 0,ROW_ID,target
0,0,0.000962
1,1,-0.002046
2,2,0.00163
3,3,-0.001154
4,4,-0.00186


In [7]:
y_train.shape

(180245, 2)
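
Since the task is to predict a sign, the continuous `target` can be joined to the features on `ROW_ID` and turned into a binary label. This is a minimal sketch on small stand-in frames built from the heads shown above; the column name `target_up` is a hypothetical choice, and it assumes `target` holds the realized return $R_{S,t+1}$.

```python
import pandas as pd

# Small stand-ins for X_train and y_train (values copied from the heads above).
X_small = pd.DataFrame({
    "ROW_ID": [0, 1, 2, 3, 4],
    "ALLOCATION": [f"ALLOCATION_{i:02d}" for i in range(1, 6)],
})
y_small = pd.DataFrame({
    "ROW_ID": [0, 1, 2, 3, 4],
    "target": [0.000962, -0.002046, 0.00163, -0.001154, -0.00186],
})

# Join on ROW_ID; validate guards against duplicated identifiers on either side.
merged = X_small.merge(y_small, on="ROW_ID", how="inner", validate="one_to_one")

# Binary label: 1 when the realized return is positive, 0 otherwise.
merged["target_up"] = (merged["target"] > 0).astype(int)

print(merged)
```

Merging explicitly on `ROW_ID`, rather than relying on row order, keeps features and targets aligned even if one of the frames is later filtered or shuffled.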

In [9]:
X_test = pd.read_csv("data/X_test.csv")
X_test.head()

Unnamed: 0,ROW_ID,TS,ALLOCATION,RET_20,RET_19,RET_18,RET_17,RET_16,RET_15,RET_14,...,SIGNED_VOLUME_9,SIGNED_VOLUME_8,SIGNED_VOLUME_7,SIGNED_VOLUME_6,SIGNED_VOLUME_5,SIGNED_VOLUME_4,SIGNED_VOLUME_3,SIGNED_VOLUME_2,SIGNED_VOLUME_1,AVG_DAILY_TURNOVER
0,180245,DATE_2774,ALLOCATION_01,-0.006869,-0.001703,-0.003348,-0.003129,-0.003368,-0.002515,-0.001182,...,1.467264,1.11489,1.227472,0.835403,1.571033,0.946056,1.691493,3.561658,0.038297,0.01031
1,180246,DATE_2774,ALLOCATION_02,-0.002409,-0.000763,0.001349,-0.002069,-0.000147,0.002263,-0.004348,...,0.212862,0.571286,0.711297,0.815702,-0.10249,0.969315,1.738142,5.400385,2.127508,0.013252
2,180247,DATE_2774,ALLOCATION_03,-0.004936,-0.001041,-0.004108,-0.002354,-0.003627,0.000263,0.001072,...,1.16516,1.004918,1.159257,1.110129,1.473592,1.01356,1.622486,2.346189,0.676293,0.013975
3,180248,DATE_2774,ALLOCATION_04,-0.008992,-0.000644,0.001352,-0.004524,-0.004002,-0.004404,0.000524,...,1.557001,1.156245,1.688199,0.553323,2.060668,0.746113,2.300634,5.564923,-0.637974,0.017026
4,180249,DATE_2774,ALLOCATION_05,-0.002797,-0.001686,0.002453,-0.000645,0.000615,-0.000624,-0.004374,...,0.135133,0.115105,0.344619,0.312612,0.932051,0.335749,1.021416,2.351529,0.030891,0.006701


In [10]:
X_test.shape

(7735, 44)