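#### 1. This report shows how to quickly compute values for a subset of the alpha factors
#### 2. It also discusses and explores the harder parts of the implementation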
---

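#### 1. Every alpha formula in the paper uses only basic market data: open, high, low, and close prices, plus volume and amount (turnover)
#### 2. Each stock in the portfolio gets its alpha value from the formula; the stocks are then ranked and scored, and the scores serve as a reference for the next period's investment
#### 3. A concrete strategy can be built flexibly from the computed factor values, for example by setting each stock's position weight according to its factor value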
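As a minimal sketch of points 2 and 3, assume the factor values have already been arranged as a date × stock DataFrame. The code ranks stocks cross-sectionally on each date and turns the scores into long-only weights proportional to rank. The layout, the function names, and the weighting rule are illustrative assumptions, not the report's strategy.

```python
import pandas as pd

def rank_scores(factor: pd.DataFrame) -> pd.DataFrame:
    """Cross-sectional rank of each row (date), scaled to [0, 1]; 0 = lowest factor value."""
    ranks = factor.rank(axis=1)
    n = factor.count(axis=1)
    return (ranks - 1).div(n - 1, axis=0)

def rank_weights(factor: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical long-only rule: weight proportional to rank score, each row sums to 1."""
    scores = rank_scores(factor)
    return scores.div(scores.sum(axis=1), axis=0)

# Toy factor values for three stocks on two dates (made-up numbers).
factor = pd.DataFrame(
    {"A": [0.3, -0.1], "B": [0.1, 0.2], "C": [-0.2, 0.5]},
    index=pd.to_datetime(["2015-09-29", "2015-09-30"]),
)
weights = rank_weights(factor)
# Shift by one row so each day's factor values drive the *next* period's positions.
next_weights = weights.shift(1)
print(weights)
```

Shifting by one row keeps the example free of look-ahead: weights built from today's factor values only apply from the following trading day.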
---

#### Overview of the implementation framework

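* We use the five stocks 600029.SH, 601398.SH, 600030.SH, 600010.SH, and 600031.SH as a simple example portfolio to explore how to compute factor values quickly.

* Basic operators such as abs(x), log(x), ts_min(x, d), ts_max(x, d), ts_argmax(x, d), ts_argmin(x, d), correlation(x, y, d), and delay(x, d) can be implemented with built-in **pandas** and **numpy** functions.  
* The main difficulty is the **rank** function. In the paper, rank(x) is the ascending cross-sectional rank of a stock's x value within the stock pool, normalized to the closed interval [0, 1].  
* Formulas that contain **rank** fall into the following cases:
> 1. A single layer of rank:
>> If x in rank(x) is a field already present in the market data, such as volume, open, or close, it can be handled by passing in that field's name.  
>> If x is the output of some price/volume factor, the user could define x's formula and pass it in as a function argument (not implemented).
> 2. Nested rank:  
>> Forms such as rank(function(rank(x))) can only be built step by step with user-defined functions (not implemented).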
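The basic operators above map directly onto pandas `shift` and `rolling` methods (abs, log, and sign are simply `np.abs`, `np.log`, `np.sign`). The sketch below is one possible set of helpers; the names follow the paper's notation but the definitions are our own. In particular, the ts_argmax/ts_argmin convention (0-based position inside the window, 0 = oldest day) is an assumption, since implementations differ. `cs_rank` is a hypothetical helper for rank(x) over a date × stock DataFrame, scaled to the closed interval [0, 1].

```python
import numpy as np
import pandas as pd

# Time-series operators on one stock's pd.Series (one value per trading day).
def delay(x: pd.Series, d: int) -> pd.Series:
    return x.shift(d)                      # value d trading days ago

def delta(x: pd.Series, d: int) -> pd.Series:
    return x - x.shift(d)                  # today's value minus the value d days ago

def ts_min(x: pd.Series, d: int) -> pd.Series:
    return x.rolling(d).min()

def ts_max(x: pd.Series, d: int) -> pd.Series:
    return x.rolling(d).max()

def ts_argmax(x: pd.Series, d: int) -> pd.Series:
    # Assumed convention: 0-based position of the max within the window (0 = oldest day).
    return x.rolling(d).apply(np.argmax, raw=True)

def ts_argmin(x: pd.Series, d: int) -> pd.Series:
    return x.rolling(d).apply(np.argmin, raw=True)

def correlation(x: pd.Series, y: pd.Series, d: int) -> pd.Series:
    return x.rolling(d).corr(y)            # rolling Pearson correlation over d days

# Cross-sectional operator on a date x stock DataFrame.
def cs_rank(panel: pd.DataFrame) -> pd.DataFrame:
    """Ascending rank across stocks on each date, mapped to [0, 1] (lowest 0, highest 1)."""
    r = panel.rank(axis=1)
    return (r - 1).div(panel.count(axis=1) - 1, axis=0)

close = pd.Series([17.55, 17.01, 16.80, 16.81, 16.27])
print(ts_max(close, 3).tolist())
print(ts_argmax(close, 3).tolist())
print(cs_rank(pd.DataFrame({"A": [1, 5], "B": [3, 2], "C": [2, 9]})))
```

Everything except `cs_rank` works one stock at a time; `cs_rank` is the only operator that needs all stocks for the same date at once, which is why rank is the hard part of the framework.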

In [1]:
```python
import pandas as pd
import numpy as np
from wlkbacktest import Strategy
```

In [2]:
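```python
# Configure the strategy environment
def init(context):
    context.start = '20080101'
    context.end = '20151001'
    # A simple 5-stock portfolio used as the example universe
    context.securities = ['600029.SH', '601398.SH', '600030.SH', '600010.SH', '600031.SH']

stra = Strategy(init)            # initialize the strategy object
datas = stra.get_trading_data()  # fetch trading data for all securities
```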

In [3]:
```python
datas  # all trading data
```

```
<class 'pandas.core.panel.Panel'>
Dimensions: 5 (items) x 1885 (major_axis) x 6 (minor_axis)
Items axis: 600010.SH to 601398.SH
Major_axis axis: 2008-01-02 00:00:00 to 2015-09-30 00:00:00
Minor_axis axis: open to amount
```
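`pandas.Panel`, which `get_trading_data` returns here, was deprecated in pandas 0.20 and removed in 0.25. On current pandas, one common replacement is a dict of per-stock DataFrames with the same six columns; on an old pandas, a Panel can be converted with `{sec: datas[sec] for sec in datas.items}`. `pd.concat(..., axis=1)` then builds the field-by-stock view that `datas.swapaxes('items', 'minor')` provides later in this notebook. The numbers below are synthetic, and the sketch does not depend on wlkbacktest.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2015-09-24", periods=4, freq="B")
fields = ["open", "high", "close", "low", "volume", "amount"]
rng = np.random.default_rng(0)

# Synthetic stand-in for the Panel: {stock code: DataFrame(date x field)}.
datas = {
    code: pd.DataFrame(rng.random((len(dates), len(fields))), index=dates, columns=fields)
    for code in ["600010.SH", "600029.SH"]
}

# Columns become a (stock, field) MultiIndex.
wide = pd.concat(datas, axis=1)
# Pick one field across all stocks: a date x stock table, ready for cross-sectional rank.
volume = wide.xs("volume", axis=1, level=1)
print(volume.shape)
```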

### A quick look at the data

In [4]:
```python
datas['600029.SH'].head()
```

| date | open | high | close | low | volume | amount |
|---|---|---|---|---|---|---|
| 2008-01-02 | 17.14 | 17.99 | 17.55 | 17.03 | 26385079.0 | 759282152.0 |
| 2008-01-03 | 17.36 | 17.37 | 17.01 | 16.61 | 23295654.0 | 644080406.0 |
| 2008-01-04 | 17.07 | 17.19 | 16.8 | 16.37 | 29081275.0 | 795181029.0 |
| 2008-01-07 | 16.83 | 16.96 | 16.81 | 16.36 | 20952882.0 | 569655271.0 |
| 2008-01-08 | 16.86 | 17.1 | 16.27 | 16.06 | 27145581.0 | 732970680.0 |


In [5]:
```python
datas['600010.SH'].tail()
```

| date | open | high | close | low | volume | amount |
|---|---|---|---|---|---|---|
| 2015-09-24 | 3.7 | 3.78 | 3.75 | 3.7 | 55137572.0 | 206567904.0 |
| 2015-09-25 | 3.73 | 3.78 | 3.64 | 3.61 | 60486856.0 | 223005776.0 |
| 2015-09-28 | 3.67 | 3.7 | 3.66 | 3.59 | 34519800.0 | 125944416.0 |
| 2015-09-29 | 3.59 | 3.6 | 3.53 | 3.5 | 66740496.0 | 236192896.0 |
| 2015-09-30 | 3.55 | 3.59 | 3.55 | 3.51 | 53250280.0 | 188991264.0 |



#### Examples
* Below are examples of formulas without rank and with rank.  

#### Alpha 6 and Alpha 12: two formulas without rank
* Alpha 6: -1 * correlation(open, volume, 10)
* Alpha 12: sign(delta(volume, 1)) * (-1 * delta(close, 1))

#### Alpha 44: one layer of rank(x), where x is an existing market-data field
* Alpha 44: -1 * correlation(high, rank(volume), 5)
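Written out, with correlation(x, y, d) read as the Pearson correlation over the last d trading days (which is what pandas' rolling `corr` computes) and delta(x, 1) as the one-day difference:

$$
\operatorname{correlation}(x, y, d)_t = \frac{\sum_{i=0}^{d-1} (x_{t-i}-\bar x_t)(y_{t-i}-\bar y_t)}{\sqrt{\sum_{i=0}^{d-1}(x_{t-i}-\bar x_t)^2}\,\sqrt{\sum_{i=0}^{d-1}(y_{t-i}-\bar y_t)^2}},
\qquad \bar x_t = \frac{1}{d}\sum_{i=0}^{d-1} x_{t-i}
$$

$$
\text{Alpha}_6 = -\operatorname{correlation}(open, volume, 10), \qquad
\text{Alpha}_{12} = \operatorname{sign}(volume_t - volume_{t-1}) \cdot \big(-(close_t - close_{t-1})\big)
$$

$$
\text{Alpha}_{44} = -\operatorname{correlation}\big(high, \operatorname{rank}(volume), 5\big)
$$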

In [6]:
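```python
# Alpha 6: -1 * correlation(open, volume, 10)
def alpha_6(data, **kwargs):
    '''
    data: pandas.DataFrame for one stock, with open, high, low, close, volume, amount columns
    '''
    # Rolling 10-day Pearson correlation. The module-level pd.rolling_corr used originally
    # was deprecated in pandas 0.18 and later removed; .rolling().corr() is the replacement.
    res = -data['open'].rolling(window=10).corr(data['volume'])
    return res

# Alpha 12: sign(delta(volume, 1)) * (-1 * delta(close, 1))
def alpha_12(data, **kwargs):
    res = np.sign(data['volume'].diff()) * (-1 * data['close'].diff())
    return res

# Alpha 44: -1 * correlation(high, rank(volume), 5)
def alpha_44(data, **kwargs):
    '''
    data: pandas.DataFrame for one stock.
    The cross-sectional rank(volume) is computed by the caller (add_alpha),
    so data is assumed to already contain an 'r_volume' column,
    which keeps the formula logic simple.
    '''
    res = -data['high'].rolling(window=5).corr(data['r_volume'])
    return res
```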
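To sanity-check the Alpha 12 logic, here it is applied by hand to the close and volume values of the first five 600029.SH rows shown earlier. Only the two columns the formula needs are copied in.

```python
import numpy as np
import pandas as pd

# First five rows of 600029.SH from the table above (close and volume only).
df = pd.DataFrame(
    {"close": [17.55, 17.01, 16.80, 16.81, 16.27],
     "volume": [26385079.0, 23295654.0, 29081275.0, 20952882.0, 27145581.0]},
    index=pd.to_datetime(["2008-01-02", "2008-01-03", "2008-01-04", "2008-01-07", "2008-01-08"]),
)

# Alpha 12 = sign(delta(volume, 1)) * (-1 * delta(close, 1))
alpha_12 = np.sign(df["volume"].diff()) * (-1 * df["close"].diff())
print(alpha_12.round(2))
```

The first value is NaN because delta needs the previous day. After that the values are about -0.54, 0.21, 0.01, and 0.54. On 2008-01-04, for example, volume rose while the close fell by 0.21, giving a positive score: the formula rewards "volume up, price down" days.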

### Computing factor values quickly with add_alpha: pass in the market data, the alpha functions to compute, and the fields those functions need ranked

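**datas is the raw trading data, a pandas.Panel; each security's DataFrame contains only six fields: open, high, low, close, volume, and amount.**    
**funs is the list of alpha functions to compute.**  
**rank_items lists the fields x that appear as rank(x) in the formulas.**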

In [7]:
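```python
# Define the add_alpha helper
def add_alpha(datas, funs, rank_items=()):
    '''
    datas: pandas.Panel of per-stock DataFrames (modified in place and returned)
    funs: list of alpha functions, each taking one stock's DataFrame
    rank_items: fields to rank cross-sectionally, e.g. 'open', 'volume'; this creates the
                'r_open', 'r_volume', ... columns that the functions in funs expect
    '''
    secs = datas.items  # security codes
    nums = len(secs)    # number of securities
    if len(rank_items) != 0:  # only when some fields need ranking
        datas_ = datas.swapaxes('items', 'minor')  # items become fields: a date x stock table per field
        for item in rank_items:
            # Ascending rank across stocks on each date, scaled as (rank - 1) / nums.
            # This yields 0, 1/nums, ..., (nums - 1)/nums (0 to 0.8 for 5 stocks);
            # divide by (nums - 1) instead to get the closed interval [0, 1].
            x = (datas_[item].rank(1) - 1) / nums
            for sec in secs:
                datas[sec]['r_' + item] = x[sec]

    for sec in secs:
        df = datas[sec]
        for fun in funs:
            df[fun.__name__] = fun(df)  # new column named after the function, e.g. 'alpha_6'
    return datas
```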
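Because add_alpha relies on `pandas.Panel` (removed in pandas 0.25) and writes columns into the per-stock frames in place, a version of the same idea over a plain dict of DataFrames may be easier to run on current pandas. This is a sketch: the name `add_alpha_dict`, the dict layout, and the synthetic data are assumptions for illustration. It keeps the notebook's `(rank - 1) / nums` scaling so the `r_*` columns match the tables below.

```python
import numpy as np
import pandas as pd

def add_alpha_dict(datas, funs, rank_items=()):
    """
    Hypothetical dict-based variant of add_alpha for pandas without Panel.
    datas: {stock code: DataFrame indexed by date with open/high/low/close/volume/amount}
    funs: alpha functions taking one stock's DataFrame and returning a Series
    rank_items: raw fields whose cross-sectional rank is stored as 'r_<field>'
    Returns a new dict; the input DataFrames are left unchanged.
    """
    out = {sec: df.copy() for sec, df in datas.items()}
    nums = len(out)
    for item in rank_items:
        # date x stock table of one field, ranked across stocks on each date
        table = pd.concat({sec: df[item] for sec, df in out.items()}, axis=1)
        ranked = (table.rank(axis=1) - 1) / nums
        for sec in out:
            out[sec]["r_" + item] = ranked[sec]
    for sec, df in out.items():
        for fun in funs:
            df[fun.__name__] = fun(df)  # df is out[sec], so this adds the column in place
    return out

def alpha_44(data):
    # Same formula as alpha_44 above: -1 * correlation(high, rank(volume), 5)
    return -data["high"].rolling(5).corr(data["r_volume"])

rng = np.random.default_rng(42)
dates = pd.date_range("2015-09-01", periods=8, freq="B")
cols = ["open", "high", "low", "close", "volume", "amount"]
# Synthetic data for three hypothetical stocks A, B, C.
datas = {sec: pd.DataFrame(rng.random((8, 6)), index=dates, columns=cols) for sec in ["A", "B", "C"]}

out = add_alpha_dict(datas, funs=[alpha_44], rank_items=["volume"])
print(out["A"][["volume", "r_volume", "alpha_44"]].tail(3))
```

Returning a new dict instead of mutating the input avoids the in-place side effect, at the cost of copying each frame once.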

In [8]:
```python
new_datas = add_alpha(datas, funs=[alpha_6, alpha_12, alpha_44], rank_items=['open', 'volume'])
```

In [9]:
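```python
new_datas
```

```
<class 'pandas.core.panel.Panel'>
Dimensions: 5 (items) x 1885 (major_axis) x 6 (minor_axis)
Items axis: 600010.SH to 601398.SH
Major_axis axis: 2008-01-02 00:00:00 to 2015-09-30 00:00:00
Minor_axis axis: open to amount
```

The Panel summary still lists 6 minor-axis fields (open to amount): the new r_* and alpha_* columns appear on the per-stock DataFrames returned by new_datas[sec], as the tables below show, not on the Panel's own axes. Since add_alpha modifies datas in place and returns it, new_datas and datas are the same object.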

## Inspecting the results: 600030.SH as an example

In [10]:
```python
new_datas['600030.SH'].tail(20)
```

| date | open | high | close | low | volume | amount | r_open | r_volume | alpha_6 | alpha_12 | alpha_44 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-09-01 | 14.6 | 15.31 | 15.19 | 14.2 | 498759904.0 | 7443513344.0 | 0.8 | 0.4 | 0.9144 | 0.12 | -0.4939 |
| 2015-09-02 | 14.62 | 15.15 | 14.87 | 14.51 | 362296000.0 | 5389001216.0 | 0.8 | 0.4 | 0.8475 | -0.32 | -0.6294 |
| 2015-09-07 | 14.85 | 15.44 | 14.76 | 14.7 | 200888128.0 | 3032953856.0 | 0.8 | 0.4 | 0.4947 | -0.11 | -0.5676 |
| 2015-09-08 | 14.6 | 15.29 | 15.24 | 14.28 | 215004144.0 | 3183032832.0 | 0.8 | 0.6 | 0.0489 | -0.48 | 0.0489 |
| 2015-09-09 | 15.34 | 15.79 | 15.68 | 15.19 | 328061664.0 | 5101934080.0 | 0.8 | 0.6 | -0.4091 | -0.44 | -0.5408 |
| 2015-09-10 | 15.4 | 15.51 | 15.15 | 15.07 | 192552624.0 | 2949927680.0 | 0.8 | 0.8 | -0.0685 | -0.53 | -0.4399 |
| 2015-09-11 | 15.17 | 15.4 | 15.02 | 14.92 | 141370912.0 | 2142975360.0 | 0.8 | 0.8 | -0.0104 | -0.13 | 0.0255 |
| 2015-09-14 | 15.08 | 15.13 | 13.91 | 13.58 | 262223664.0 | 3739513600.0 | 0.8 | 0.4 | -0.0023 | 1.11 | -0.4287 |
| 2015-09-15 | 13.75 | 14.19 | 13.52 | 13.43 | 170885888.0 | 2354542848.0 | 0.8 | 0.4 | -0.1225 | -0.39 | -0.6472 |
| 2015-09-16 | 13.36 | 14.87 | 14.43 | 12.84 | 333957440.0 | 4553831424.0 | 0.8 | 0.8 | 0.1623 | -0.91 | -0.6244 |


## Inspecting the results: 601398.SH as an example

In [11]:
```python
new_datas['601398.SH'].tail(20)
```

| date | open | high | close | low | volume | amount | r_open | r_volume | alpha_6 | alpha_12 | alpha_44 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-09-01 | 3.98 | 4.3 | 4.27 | 3.96 | 1018541248.0 | 4491813888.0 | 0.2 | 0.8 | 0.4582 | -0.29 | -0.2272 |
| 2015-09-02 | 4.15 | 4.7 | 4.7 | 4.12 | 1394605696.0 | 6542697984.0 | 0.2 | 0.8 | 0.1058 | -0.43 | -0.5725 |
| 2015-09-07 | 4.49 | 4.51 | 4.29 | 4.27 | 648542784.0 | 2988526848.0 | 0.2 | 0.8 | -0.0894 | -0.41 | -0.8326 |
| 2015-09-08 | 4.22 | 4.31 | 4.27 | 4.13 | 404199392.0 | 1800877184.0 | 0.2 | 0.8 | -0.1202 | -0.02 | -0.7779 |
| 2015-09-09 | 4.23 | 4.3 | 4.27 | 4.2 | 366216704.0 | 1636333824.0 | 0.2 | 0.8 | -0.0804 | 0.0 | 0.0 |
| 2015-09-10 | 4.24 | 4.27 | 4.25 | 4.21 | 184433632.0 | 822389632.0 | 0.2 | 0.6 | 0.0496 | -0.02 | -0.4496 |
| 2015-09-11 | 4.23 | 4.25 | 4.21 | 4.2 | 128608432.0 | 572234112.0 | 0.2 | 0.6 | 0.1112 | -0.04 | -0.594 |
| 2015-09-14 | 4.2 | 4.37 | 4.29 | 4.15 | 426149920.0 | 1920341376.0 | 0.2 | 0.8 | 0.1215 | -0.08 | -0.7968 |
| 2015-09-15 | 4.25 | 4.35 | 4.31 | 4.22 | 382841472.0 | 1729309184.0 | 0.2 | 0.8 | 0.1995 | 0.02 | -0.8561 |
| 2015-09-16 | 4.26 | 4.36 | 4.3 | 4.22 | 238809952.0 | 1073991424.0 | 0.2 | 0.6 | 0.3845 | -0.01 | -0.6558 |


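**The add_alpha function computes factor values quickly.**  
**It also makes alpha functions with a single rank(x), where x is one of open, high, low, close, volume, or amount, easy to write: the function can simply assume that its data argument already has a column holding rank(x), named 'r_XXXX'. For example, 'r_volume' holds the values of rank(volume).**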
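The two cases marked "not implemented" earlier (rank of a derived value, and nested rank) become straightforward if every field is held as a date × stock DataFrame: rank then works row by row and the time-series operators work column by column. A minimal sketch under that assumption, using a nested formula of the shape -1 * rank(covariance(rank(close), rank(volume), 5)) as the example. `cs_rank` and `ts_cov` are hypothetical helpers, rank is scaled to the closed [0, 1] range here, and the prices and volumes are synthetic.

```python
import numpy as np
import pandas as pd

def cs_rank(x: pd.DataFrame) -> pd.DataFrame:
    """Cross-sectional rank per date (row), scaled to [0, 1]."""
    return (x.rank(axis=1) - 1).div(x.count(axis=1) - 1, axis=0)

def ts_cov(x: pd.DataFrame, y: pd.DataFrame, d: int) -> pd.DataFrame:
    """Rolling d-day covariance, computed stock by stock (column by column)."""
    return pd.DataFrame({c: x[c].rolling(d).cov(y[c]) for c in x.columns})

rng = np.random.default_rng(1)
dates = pd.date_range("2015-09-01", periods=12, freq="B")
codes = ["600010.SH", "600029.SH", "600030.SH", "601398.SH"]
close = pd.DataFrame(rng.random((12, 4)) + 10, index=dates, columns=codes)   # synthetic prices
volume = pd.DataFrame(rng.random((12, 4)) * 1e8, index=dates, columns=codes)  # synthetic volumes

# Inner ranks -> per-stock rolling covariance (a derived value) -> outer rank of that value.
factor = -1 * cs_rank(ts_cov(cs_rank(close), cs_rank(volume), 5))
print(factor.tail(3).round(3))
```

A single rank of a derived value (the first "not implemented" case) is the same pattern with one rank: compute the value per stock, lay it out as date × stock, and apply `cs_rank`.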