# Understanding Winton Stock Market Data

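The challenge is to predict future stock returns from a few days of past data.<br>
Each sample spans five days: D-2, D-1, D, D+1, and D+2. You are given the returns for D-2 and D-1 plus the first part of day D, and must predict the returns for the rest of day D, D+1, and D+2.<br>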

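On day D, the minute-level returns Ret_2 to Ret_120 are given, and Ret_121 to Ret_180 must be predicted.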


<img src="images/winton-stock-data.png">

### **train.csv**
   * Feature_1 - Feature_25
   * Ret_MinusTwo, Ret_MinusOne
   * Ret_2 - Ret_120
   * Ret_121 - Ret_180: target variables
   * Ret_PlusOne, Ret_PlusTwo: target variables
   * Weight_Intraday, Weight_Daily
   
### **test.csv**
   * Feature_1 - Feature_25
   * Ret_MinusTwo, Ret_MinusOne
   * Ret_2 - Ret_120
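
The column groups listed for `train.csv` and `test.csv` follow a regular naming pattern, so they can be generated rather than typed out. The following is a minimal sketch; the constant names (`FEATURE_COLS`, `TARGET_INTRADAY_COLS`, and so on) are illustrative and not part of the dataset, and the `Id` column is left out.

```python
# Column groups built from the naming pattern in the lists above.
# The constant names are illustrative; only the column names come from the data.
FEATURE_COLS = [f"Feature_{i}" for i in range(1, 26)]         # Feature_1 .. Feature_25
PAST_DAILY_COLS = ["Ret_MinusTwo", "Ret_MinusOne"]
OBSERVED_INTRADAY_COLS = [f"Ret_{i}" for i in range(2, 121)]  # Ret_2 .. Ret_120
TARGET_INTRADAY_COLS = [f"Ret_{i}" for i in range(121, 181)]  # Ret_121 .. Ret_180
TARGET_DAILY_COLS = ["Ret_PlusOne", "Ret_PlusTwo"]
WEIGHT_COLS = ["Weight_Intraday", "Weight_Daily"]

# test.csv carries only the inputs; train.csv adds the targets and weights.
TEST_COLS = FEATURE_COLS + PAST_DAILY_COLS + OBSERVED_INTRADAY_COLS
TRAIN_COLS = TEST_COLS + TARGET_INTRADAY_COLS + TARGET_DAILY_COLS + WEIGHT_COLS
```

Selecting by these lists, for example `train[TARGET_INTRADAY_COLS]`, avoids depending on column positions.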
   

### Data Fields



* **Feature_1 ~ Feature_25:** various features relevant to the prediction.
* **Ret_MinusTwo:** the return from the close on D-2 to the close on D-1. (1 day)
* **Ret_MinusOne:** the return from the close on D-1 to the point on day D where the intraday returns start. (approximately 1/2 day)
* **Ret_2 ~ Ret_120:** returns over approximately one minute each at given points on day D. Ret_2 is the return between t=1 and t=2.
* **Ret_121 ~ Ret_180:** returns over approximately one minute each at given points on day D. **These are target variables, to be predicted as {id}_{1~60}.**
* **Ret_PlusOne:** the return from the point where Ret_180 ends to the close on D+1. (approximately 1 day) **This is a target variable, to be predicted as {id}_61.**
* **Ret_PlusTwo:** the return from the close on D+1 to the close on D+2. (1 day) **This is a target variable, to be predicted as {id}_62.**
* **Weight_Intraday:** the weight applied to the intraday returns Ret_121 to Ret_180 when predictions are scored.
* **Weight_Daily:** the weight applied to Ret_PlusOne and Ret_PlusTwo when predictions are scored.
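
The `{id}_{1~60}`, `{id}_61`, and `{id}_62` naming above means each input row expands into 62 prediction rows. A minimal sketch of that layout follows. The function name `to_submission` and the output column names `Id` and `Predicted` are assumptions for illustration; check the competition's sample submission for the exact header.

```python
import pandas as pd

def to_submission(row_id, intraday_preds, plus_one, plus_two):
    """Lay out the 62 predictions for one input row as {id}_1 .. {id}_62.

    Hypothetical helper: the 'Id' / 'Predicted' column names are assumed.
    """
    if len(intraday_preds) != 60:
        raise ValueError("expected 60 intraday predictions (Ret_121 .. Ret_180)")
    # Order: Ret_121..Ret_180 -> _1.._60, Ret_PlusOne -> _61, Ret_PlusTwo -> _62
    values = list(intraday_preds) + [plus_one, plus_two]
    ids = [f"{row_id}_{k}" for k in range(1, 63)]
    return pd.DataFrame({"Id": ids, "Predicted": values})

# Example: an all-zero prediction for the row with Id 1
sub = to_submission(1, [0.0] * 60, 0.0, 0.0)
```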

### Evaluation

* https://www.kaggle.com/c/the-winton-stock-market-challenge#evaluation

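The Winton Stock Market Challenge is evaluated with the [Weighted Mean Absolute Error](https://www.kaggle.com/wiki/WeightedMeanAbsoluteError).

$$ WMAE = \frac{1}{n}\sum\limits_{i=1}^{n} w_i \cdot \left|y_i - \hat{y}_i\right| $$

Here $w_i$ is the weight for the return being scored: Weight_Intraday for the intraday targets (Ret_121 to Ret_180) and Weight_Daily for the daily targets (Ret_PlusOne and Ret_PlusTwo). $y_i$ is the actual return, $\hat{y}_i$ is the predicted return, and $n$ is the number of predictions.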
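
The formula can be written directly in NumPy. This is a minimal sketch, not the official scoring code; the function name `wmae` and the toy numbers are made up for illustration.

```python
import numpy as np

def wmae(y_true, y_pred, weights):
    """Weighted mean absolute error: (1/n) * sum(w_i * |y_i - yhat_i|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.mean(weights * np.abs(y_true - y_pred)))

# Toy example: two returns scored against an all-zero prediction
score = wmae(y_true=[0.01, -0.02], y_pred=[0.0, 0.0], weights=[2.0, 1.0])
# (2.0 * 0.01 + 1.0 * 0.02) / 2 = 0.02
```

To score a full row, its Weight_Intraday value would be repeated for the 60 intraday targets and its Weight_Daily value for the two daily targets before calling the function.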

In [10]:
%pylab inline
import numpy as np
import pandas as pd

Populating the interactive namespace from numpy and matplotlib


## Loading Data

In [2]:
train = pd.read_csv('/dataset/winton-stock-market-challenge/train.csv')

In [3]:
train[['Id', 'Feature_1', 'Feature_25', 
       'Ret_MinusTwo', 'Ret_MinusOne', 
       'Ret_2', 'Ret_180', 
       'Ret_PlusOne', 'Ret_PlusTwo', 
       'Weight_Intraday', 'Weight_Daily']].head()

| | Id | Feature_1 | Feature_25 | Ret_MinusTwo | Ret_MinusOne | Ret_2 | Ret_180 | Ret_PlusOne | Ret_PlusTwo | Weight_Intraday | Weight_Daily |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | NaN | NaN | 0.055275 | -0.01077 | 3e-06 | -0.001974 | -0.019512 | 0.028846 | 1251508.0 | 1564385.0 |
| 1 | 2 | NaN | -0.709462 | 0.009748 | 0.002987 | -0.000487 | 2.7e-05 | -0.002939 | -0.010253 | 1733950.0 | 2167438.0 |
| 2 | 3 | NaN | -1.01937 | 0.003077 | 0.006181 | -0.000782 | 0.000784 | -0.024791 | 0.015711 | 1529197.0 | 1911497.0 |
| 3 | 4 | NaN | NaN | 0.000984 | 0.014106 | 0.000277 | 0.000341 | -0.00568 | -0.00219 | 1711569.0 | 2139462.0 |
| 4 | 5 | 6.0 | 3.21982 | -0.018224 | 0.011065 | -0.001232 | -4e-06 | 0.036104 | -0.026552 | 1267270.0 | 1584088.0 |


## Feature Summary

In [12]:
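# Select Feature_1 .. Feature_25 by name; the columns have string labels, not integers
features = train[['Feature_%d' % i for i in range(1, 26)]]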
features.describe()

| | Feature_1 | Feature_2 | Feature_3 | Feature_4 | Feature_5 | Feature_6 | Feature_7 | Feature_8 | Feature_9 | Feature_10 | ... | Feature_16 | Feature_17 | Feature_18 | Feature_19 | Feature_20 | Feature_21 | Feature_22 | Feature_23 | Feature_24 | Feature_25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 6687.0 | 30854.0 | 38763.0 | 32279.0 | 40000.0 | 38067.0 | 40000.0 | 39531.0 | 38125.0 | 20529.0 | ... | 39390.0 | 39354.0 | 39432.0 | 38810.0 | 32174.0 | 38982.0 | 38655.0 | 38289.0 | 39274.0 | 39345.0 |
| mean | 3.59025 | -0.117558 | 0.558392 | 0.405572 | 5.482775 | 0.430972 | 49244.971525 | 0.196958 | 10.680289 | 4.744703 | ... | 1.007362 | -0.549725 | 0.803059 | -1.205438 | 5.267359 | 0.605593 | -0.773089 | 0.799833 | -1.20929 | -0.329675 |
| std | 2.798532 | 1.23625 | 0.902233 | 0.799082 | 2.942324 | 1.498274 | 28242.409717 | 0.138485 | 2.850634 | 0.865096 | ... | 0.085488 | 0.936833 | 1.165442 | 0.642426 | 2.549227 | 1.319158 | 1.389229 | 1.28804 | 1.739656 | 0.958661 |
| min | 1.0 | -3.440521 | -4.643526 | -5.440596 | 1.0 | -0.936644 | 338.0 | 0.0098 | 0.0 | 1.0 | ... | 1.0 | -2.613987 | -5.758047 | -3.292909 | 2.0 | -1.514998 | -5.819912 | -7.221387 | -11.442205 | -1.903876 |
| 25% | 1.0 | -0.967186 | -0.110192 | -0.111696 | 2.0 | -0.265555 | 26143.0 | 0.0166 | 9.0 | 5.0 | ... | 1.0 | -1.021216 | 0.057598 | -1.619718 | 3.0 | -0.294925 | -1.787615 | 0.539979 | -1.838688 | -0.830749 |
| 50% | 3.0 | -0.389162 | 0.437228 | 0.403516 | 6.0 | 0.055564 | 48457.0 | 0.2138 | 11.0 | 5.0 | ... | 1.0 | -0.59905 | 0.587005 | -1.169327 | 5.0 | 0.308468 | -0.699112 | 0.96258 | -0.868435 | -0.55155 |
| 75% | 6.0 | 0.414442 | 1.064754 | 0.945944 | 8.0 | 0.559921 | 72387.0 | 0.3318 | 12.0 | 5.0 | ... | 1.0 | -0.184854 | 1.321231 | -0.735786 | 7.0 | 1.109743 | 0.282958 | 1.415303 | -0.129465 | -0.257543 |
| max | 10.0 | 4.17515 | 4.530405 | 2.953163 | 10.0 | 12.609885 | 99861.0 | 0.365 | 36.0 | 6.0 | ... | 2.0 | 7.683857 | 6.352352 | 0.898236 | 10.0 | 7.73702 | 2.284991 | 3.228906 | 2.526654 | 4.020332 |
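
The `count` row in the summary above is below 40000 for most features (6687 for Feature_1, 20529 for Feature_10). Since `describe()` counts only non-null values, this points to missing data. A minimal sketch for ranking columns by their share of missing values follows; `missing_fraction` is a hypothetical helper, and `demo` is a made-up frame standing in for `features`.

```python
import numpy as np
import pandas as pd

def missing_fraction(df):
    """Fraction of NaN values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)

# Made-up stand-in for `features`; the values are not from the dataset
demo = pd.DataFrame({
    "Feature_1": [np.nan, np.nan, 6.0, np.nan],
    "Feature_5": [1.0, 2.0, 3.0, 4.0],
})
print(missing_fraction(demo))
```

On the real data the call would be `missing_fraction(features)`.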


## Return Summary

In [47]:
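# Select the first and last return column of each group by name rather than by position
returns = train[['Ret_MinusTwo', 'Ret_MinusOne', 'Ret_2', 'Ret_120',
                 'Ret_121', 'Ret_180', 'Ret_PlusOne', 'Ret_PlusTwo']]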
returns.describe()

| | Ret_MinusTwo | Ret_MinusOne | Ret_2 | Ret_120 | Ret_121 | Ret_180 | Ret_PlusOne | Ret_PlusTwo |
|---|---|---|---|---|---|---|---|---|
| count | 40000.0 | 40000.0 | 38946.0 | 40000.0 | 40000.0 | 40000.0 | 40000.0 | 40000.0 |
| mean | 0.000784 | -0.000803 | 5.60093e-06 | -2.081735e-06 | -3.5392e-05 | -5.7e-05 | -0.00021 | 1.2e-05 |
| std | 0.028279 | 0.030569 | 0.0009501528 | 0.001207381 | 0.001095917 | 0.00127 | 0.025039 | 0.02416 |
| min | -0.536283 | -0.51472 | -0.01311777 | -0.02844145 | -0.02174075 | -0.058046 | -0.62769 | -0.450779 |
| 25% | -0.010687 | -0.01083 | -0.0003215495 | -0.0003465326 | -0.0003439227 | -0.000413 | -0.010521 | -0.010055 |
| 50% | 0.000112 | -0.000665 | -1.989567e-07 | -8.172886e-08 | 7.410876e-08 | -2e-06 | -0.000258 | -0.000258 |
| 75% | 0.010987 | 0.008976 | 0.0003122045 | 0.0003437908 | 0.0003389849 | 0.000364 | 0.010005 | 0.009772 |
| max | 0.894024 | 0.852139 | 0.03214902 | 0.1160767 | 0.05115931 | 0.026112 | 0.795602 | 0.303038 |
