# 7. What Is Alpha101?
*Finding alpha opportunities with mathematical formulas*

## Contents

1. What is Alpha101?
2. Which algorithms does Alpha101 use?
3. How do you write Alpha001-010?
4. How do you design new alpha factors with TA_Lib?

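## What is Alpha101?
WorldQuant discovered 101 alphas through data mining, and reportedly about 80% of them were still effective and running in its investment strategies. The Alpha101 paper gives explicit formulas, which double as computer code, for 101 real-life quantitative trading alphas. Their average holding period ranges from roughly 0.6 to 6.4 days. The average pairwise correlation of these alphas is low, at 15.9%. Returns are strongly correlated with volatility but show no significant dependence on turnover, which directly confirms an earlier result obtained from a more indirect empirical analysis. The paper further finds empirically that turnover has poor explanatory power for alpha correlations.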

PDF download:

Python code download:

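## What are the main elements of Alpha101?
### 1. Factor components
- Price-volume factors (52/101):
    - HLOC (high, low, open, close prices)
    - ADV (average daily volume)
    - VWAP (volume-weighted average price)
    - Volume
- Price-volatility factors (21/101):
    - HLOC
    - Return
    - STD()
- Combined factors (8/101): combinations of price-volume and price-volatility factors
- Market-cap factor (1/101):
    - Return
    - Cap
- Sector-combined factors (19/101): sector classification combined with price-volume and price-volatility factors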

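### 2. Functions and operators
- 'x ? y : z' returns y if x is True, otherwise z.
- rank is a cross-sectional ranking: on each date, values are ranked across instruments.
- ts_rank is a time-series ranking: each instrument's latest value is ranked against its own trailing window.

See the original PDF for details.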
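
The three operators above can be sketched on a tiny table. This is only an illustration with made-up prices; the names `close`, `ternary`, `cs_rank`, and `ts_rank_3` are assumptions introduced here, not part of the Alpha101 code.

```python
import pandas as pd

# Toy close prices (made-up numbers): rows are dates, columns are instruments.
close = pd.DataFrame(
    {"A": [10.0, 11.0, 12.0, 11.0],
     "B": [20.0, 19.0, 21.0, 22.0],
     "C": [30.0, 33.0, 32.0, 31.0]},
    index=pd.date_range("2016-01-04", periods=4),
)
returns = close.pct_change()

# 'x ? y : z' -> DataFrame.mask replaces values where the condition is True.
# Here: (returns < 0) ? -1 : close
ternary = close.mask(returns < 0, -1.0)

# rank: cross-sectional, each row (date) is ranked across instruments.
cs_rank = close.rank(axis=1, pct=True)

# ts_rank: rank of the latest value inside each instrument's own 3-bar window.
ts_rank_3 = close.rolling(3).apply(
    lambda w: pd.Series(w).rank().iloc[-1], raw=True
)

print(ternary)
print(cs_rank)
print(ts_rank_3)
```

Note the different axes: `cs_rank` compares A, B, and C on the same day, while `ts_rank_3` compares each column only with its own recent history.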

## Which algorithms does Alpha101 use?
1. Helper functions needed by the alphas

In [1]:
# 1. Helper functions needed by the alphas
#coding=utf-8
import numpy as np
import pandas as pd
import talib as ta  # used by decay_linear below
from scipy.stats import rankdata

# Functions used when computing Alpha101
def ts_sum(df,window=10):
    return df.rolling(window).sum()

def ts_mean(df,window=10):
    return df.rolling(window).mean()

def stddev(df,window=10):
    return df.rolling(window).std()

def correlation(x,y,window=10):
    return x.rolling(window).corr(y)

def covariance(x,y,window=10):
    return x.rolling(window).cov(y)

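def rolling_rank(na):
    # Rank of the most recent value within the window
    return rankdata(na)[-1]

def ts_rank(df, window=10):
    # raw=True (pandas >= 0.23) passes each window as a NumPy array
    return df.rolling(window).apply(rolling_rank, raw=True)

def rolling_prod(na):
    return np.prod(na)

def product(df,window=10):
    return df.rolling(window).apply(rolling_prod, raw=True)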

def ts_min(df,window=10):
    return df.rolling(window).min()

def ts_max(df,window=10):
    return df.rolling(window).max()

def delta(df,period=1):
    return df.diff(period)

def delay(df,period=1):
    return df.shift(period)

def rank(df):
    return df.rank(axis=1, pct=True)

def scale(df,k=1):
    return df.mul(k).div(np.abs(df).sum())

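def ts_argmax(df,window=10):
    # 1-based position of the window maximum (1 = oldest bar in the window)
    return df.rolling(window).apply(np.argmax, raw=True)+1

def ts_argmin(df,window=10):
    # 1-based position of the window minimum (1 = oldest bar in the window)
    return df.rolling(window).apply(np.argmin, raw=True)+1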

def decay_linear(df,period=10):
    # Fill gaps on a copy (forward, then backward, then zero) so the caller's data is untouched
    if df.isnull().values.any():
        df = df.ffill().bfill().fillna(0)
    # ta.WMA is TA-Lib's weighted moving average (requires `import talib as ta`)
    return pd.DataFrame(
        {name: ta.WMA(item.values, period) for name, item in df.items()},
        index=df.index
    )
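
If TA-Lib is not available, the linearly decaying average can be approximated with NumPy alone. The following is a minimal sketch, assuming the weights run 1, 2, ..., `period` from the oldest to the newest bar and are rescaled to sum to 1; `decay_linear_np` is a hypothetical name, not part of the original code.

```python
import numpy as np
import pandas as pd

def decay_linear_np(df, period=10):
    """Linearly decaying weighted moving average (hypothetical helper)."""
    # Oldest observation gets weight 1, the newest gets weight `period`.
    weights = np.arange(1, period + 1, dtype=float)
    weights /= weights.sum()
    # raw=True passes each window as a NumPy array, oldest value first.
    return df.rolling(period).apply(lambda w: np.dot(w, weights), raw=True)

demo = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
result = decay_linear_np(demo, period=3)
print(result)
```

With `period=3` the third row is (1*1 + 2*2 + 3*3) / 6, so recent bars dominate the average.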

## How do you write Alpha001-010?
2. Define the class that computes the alphas
3. Write the factor functions
4. Pass in the stock pool data

In [2]:
# 2. Define the class that computes the alpha values
class alphas(object):
    def __init__(self, pn_data):
        """
        :param pn_data: pandas.Panel (items: stock codes, major axis: dates,
                        minor axis: fields). pd.Panel requires pandas < 0.25.
        """
        # Extract one (dates x stocks) DataFrame per field
        self.open = pd.DataFrame(pn_data.minor_xs('open'), dtype=np.float64)
        self.high = pd.DataFrame(pn_data.minor_xs('high'), dtype=np.float64)
        self.low = pd.DataFrame(pn_data.minor_xs('low'), dtype=np.float64)
        self.close = pd.DataFrame(pn_data.minor_xs('close'), dtype=np.float64)
        self.volume = pd.DataFrame(pn_data.minor_xs('volume'), dtype=np.float64)
        self.returns = self.close.pct_change()
        # 10-day average daily volume
        self.adv = ts_mean(self.volume, 10)
        # Daily bars carry no true VWAP, so approximate it with a 10-day volume-weighted close
        self.vwap = ts_sum(self.close*self.volume, 10)/ts_sum(self.volume, 10)

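# 3. Write the factor functions
    
    # alpha001: (rank(Ts_ArgMax(SignedPower(((returns < 0) ? stddev(returns, 20) : close), 2.), 5)) - 0.5)
    def alpha001(self):
        # Copy first so that self.close is not overwritten in place
        inner = self.close.copy()
        inner[self.returns < 0] = stddev(self.returns, 20)
        # inner is non-negative, so SignedPower(inner, 2) equals inner ** 2.
        # The formula's constant "- 0.5" is left out here; it shifts every value
        # equally and does not change the cross-sectional ordering.
        alpha = rank(ts_argmax(inner ** 2, 5))
        return alpha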
    
    #  alpha002:(-1 * correlation(rank(delta(log(volume), 2)), rank(((close - open) / open)), 6))
    def alpha002(self):
        alpha = -1 * correlation(rank(delta(np.log(self.volume), 2)), rank((self.close - self.open) / self.open), 6)
        return alpha.replace([-np.inf, np.inf], np.nan)

    # alpha003:(-1 * correlation(rank(open), rank(volume), 10))
    def alpha003(self):
        alpha = -1 * correlation(rank(self.open), rank(self.volume), 10)
        return alpha.replace([-np.inf, np.inf], np.nan)

    # alpha004: (-1 * Ts_Rank(rank(low), 9))
    def alpha004(self):
        alpha = -1 * ts_rank(rank(self.low), 9)
        return alpha
    
    # alpha005:(rank((open - (sum(vwap, 10) / 10))) * (-1 * abs(rank((close - vwap)))))
    def alpha005(self):
        alpha = (rank((self.open - (ts_sum(self.vwap, 10) / 10))) * (-1 * np.abs(rank((self.close - self.vwap)))))
        return alpha
    
    # alpha006: (-1 * correlation(open, volume, 10))
    def alpha006(self):
        alpha = -1 * correlation(self.open, self.volume, 10)
        return alpha
        
    # alpha007: ((adv20 < volume) ? ((-1 * ts_rank(abs(delta(close, 7)), 60)) * sign(delta(close, 7))) : (-1* 1))
    def alpha007(self):
        adv20 = ts_mean(self.volume, 20)
        alpha = -1 * ts_rank(abs(delta(self.close, 7)), 60) * np.sign(delta(self.close, 7))
        alpha[adv20 >= self.volume] = -1
        return alpha

    # alpha008: (-1 * rank(((sum(open, 5) * sum(returns, 5)) - delay((sum(open, 5) * sum(returns, 5)),10))))
    def alpha008(self):
        alpha = -1 * (rank(((ts_sum(self.open, 5) * ts_sum(self.returns, 5)) -
                          delay((ts_sum(self.open, 5) * ts_sum(self.returns, 5)), 10))))
        return alpha
    
    # alpha009:((0 < ts_min(delta(close, 1), 5)) ? delta(close, 1) : ((ts_max(delta(close, 1), 5) < 0) ?delta(close, 1) : (-1 * delta(close, 1))))
    def alpha009(self):
        delta_close = delta(self.close, 1)
        cond_1 = ts_min(delta_close, 5) > 0
        cond_2 = ts_max(delta_close, 5) < 0
        alpha = -1 * delta_close
        alpha[cond_1 | cond_2] = delta_close
        return alpha

    # alpha010: rank(((0 < ts_min(delta(close, 1), 4)) ? delta(close, 1) : ((ts_max(delta(close, 1), 4) < 0)? delta(close, 1) : (-1 * delta(close, 1)))))
    def alpha010(self):
        delta_close = delta(self.close, 1)
        cond_1 = ts_min(delta_close, 4) > 0
        cond_2 = ts_max(delta_close, 4) < 0
        alpha = -1 * delta_close
        alpha[cond_1 | cond_2] = delta_close
        # The formula wraps the whole expression in a cross-sectional rank
        return rank(alpha)

In [3]:
# 4. Pass in the stock pool data
if __name__ == '__main__':
    import pandas as pd
    import tushare as ts

    codes = ['000001', '601318', '600029', '000089', '000402', 
             '000895', '600006', '000858', '600036', '600050']
    stocks_dict = {}
    for c in codes:
        # Daily bars for 2016 with forward-adjusted (qfq) prices
        stock = ts.get_k_data(c, start='2016-01-01', end='2016-12-31', ktype='D', autype='qfq')
        stock.index = pd.to_datetime(stock['date'], format='%Y-%m-%d')
        stock.pop('date')
        stocks_dict[c] = stock
    
    # pd.Panel was removed in pandas 0.25, so this line needs an older pandas
    pn = pd.Panel(stocks_dict)

# Compute alpha001-010 and show the last three rows of each
print('one:', alphas(pn).alpha001().tail(3))
print('two:', alphas(pn).alpha002().tail(3))
print('three:', alphas(pn).alpha003().tail(3))
print('four:', alphas(pn).alpha004().tail(3))
print('five:', alphas(pn).alpha005().tail(3))
print('six:', alphas(pn).alpha006().tail(3))
print('seven:', alphas(pn).alpha007().tail(3))
print('eight:', alphas(pn).alpha008().tail(3))
print('nine:', alphas(pn).alpha009().tail(3))
print('ten:', alphas(pn).alpha010().tail(3))

one:             000001  000002  000089  000402  000858  000895  600006  600029  \
date                                                                         
2016-12-28    0.60    0.90    0.90    0.60    0.35    0.90    0.15    0.15   
2016-12-29    0.45    0.75    0.75    0.45    0.15    0.95    0.45    0.95   
2016-12-30    0.30    0.65    0.65    0.30    0.90    0.90    0.30    0.90   

            600036  600050  
date                        
2016-12-28    0.60    0.35  
2016-12-29    0.45    0.15  
2016-12-30    0.30    0.30  
two:               000001    000002    000089    000402    000858    000895  \
date                                                                     
2016-12-28 -0.441017 -0.782143 -0.173049 -0.764651  0.509797 -0.113961   
2016-12-29 -0.340427 -0.834709  0.204831 -0.760374  0.514727 -0.360704   
2016-12-30 -0.396615 -0.892770  0.384995 -0.627246  0.068359 -0.442627   

              600006    600029    600036    600050  
date                          
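
The class above relies on `pd.Panel`, which newer pandas releases no longer provide. One possible workaround, sketched below under the assumption that each stock's DataFrame has the columns `open`, `high`, `low`, `close`, and `volume`, is to build one (dates x codes) table per field with `pd.concat`. The helper names `fields_from_dict` and `make_toy` and the prices are made up for illustration.

```python
import numpy as np
import pandas as pd

def fields_from_dict(stocks_dict, fields=("open", "high", "low", "close", "volume")):
    """Return {field: DataFrame(dates x codes)} from {code: per-stock DataFrame}.

    Hypothetical helper, not part of the original code.
    """
    # Concatenating a dict along axis=1 yields (code, field) MultiIndex columns.
    wide = pd.concat(stocks_dict, axis=1)
    # Selecting one field on level 1 leaves the stock codes as columns.
    return {f: wide.xs(f, axis=1, level=1).astype(np.float64) for f in fields}

idx = pd.date_range("2016-01-04", periods=3)

def make_toy(base):
    # Made-up bars, only to show the resulting shapes.
    prices = np.array([base, base + 1.0, base + 2.0])
    return pd.DataFrame(
        {"open": prices, "high": prices + 0.5, "low": prices - 0.5,
         "close": prices + 0.2, "volume": [100, 200, 300]},
        index=idx,
    )

stocks_dict = {"000001": make_toy(10.0), "600036": make_toy(20.0)}
fields = fields_from_dict(stocks_dict)
close = fields["close"]
print(close)
```

Each entry of `fields` has the same layout as the result of `pn_data.minor_xs(...)`, so an `__init__` adapted to take this dict could assign `fields["close"]` to `self.close` and so on.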

## How do you design new alpha factors with TA_Lib?
Add a slope technical indicator.

In [4]:
import numpy as np
import pandas as pd
import talib as ta

def slope(df, period=10):
    # Rolling linear-regression slope of each column over `period` bars
    return pd.DataFrame(
        {name: ta.LINEARREG_SLOPE(item.values, period) for name, item in df.items()},
        index=df.index
    )

class alphas(object):
    def __init__(self, pn_data):
        if pn_data.isnull().values.any():
            pn_data.fillna(method='ffill',inplace=True)
        self.close = pd.DataFrame(pn_data.minor_xs('close'), 
                                  dtype=np.float64)

    # Custom factor: negated rolling regression slope of the close
    def slope001(self):
        alpha = -1 * slope(self.close)
        return alpha

print(alphas(pn).slope001().tail(5))

              000001    000002    000089    000402    000858    000895  \
date                                                                     
2016-12-26  0.035455  0.308667  0.027515  0.006545  0.144055  0.023727   
2016-12-27  0.028909  0.192182  0.013636 -0.022121  0.141430  0.020703   
2016-12-28  0.020667  0.119152  0.014970 -0.027273  0.190570  0.040285   
2016-12-29  0.017212  0.049879  0.021394 -0.022303  0.232873  0.048182   
2016-12-30  0.010606 -0.030303  0.015636 -0.018061  0.216691  0.035261   

              600006    600029    600036    600050  
date                                                
2016-12-26 -0.055697 -0.035818  0.137212 -0.112848  
2016-12-27 -0.048061 -0.037879  0.104061 -0.131030  
2016-12-28 -0.022848 -0.029697  0.065636 -0.127636  
2016-12-29  0.005394 -0.016606  0.058424 -0.098545  
2016-12-30  0.021394 -0.004727  0.040121 -0.050545  
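
For readers without TA-Lib, a rolling regression slope can also be sketched with NumPy. This assumes an ordinary least-squares line fitted against bar positions 0, 1, ..., `period - 1`, and `slope_np` is a hypothetical name. It is meant as a cross-check on toy data, not as a drop-in replacement for `ta.LINEARREG_SLOPE`.

```python
import numpy as np
import pandas as pd

def slope_np(df, period=10):
    """Rolling least-squares slope per column (hypothetical TA-Lib-free helper)."""
    x = np.arange(period, dtype=float)
    # np.polyfit(x, y, 1) returns [slope, intercept] of the best-fit line.
    return df.rolling(period).apply(lambda y: np.polyfit(x, y, 1)[0], raw=True)

# Made-up series: one rises by 2 per bar, the other falls by 1 per bar.
demo = pd.DataFrame({
    "up": np.arange(12, dtype=float) * 2.0,
    "down": 50.0 - np.arange(12, dtype=float),
})
slope_factor = -1 * slope_np(demo, period=10)
print(slope_factor.tail(3))
```

Because the factor is the negated slope, a steadily rising series gets a negative value and a falling one a positive value.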


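## Homework
Download the full Alpha101 code and study it, then design an effective alpha factor of your own and import it into Alphalens to evaluate its performance.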
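
As a starting point for the Alphalens step, the sketch below reshapes a wide (dates x assets) factor table into the (date, asset)-indexed Series that Alphalens works with. It assumes the `alphalens` package (or a fork that keeps the same import name) is installed for the tear-sheet call. `to_alphalens_factor` and `run_tear_sheet` are hypothetical helper names, and the factor values are made up.

```python
import numpy as np
import pandas as pd

def to_alphalens_factor(wide_factor):
    """Reshape a (dates x assets) table into a (date, asset)-indexed Series."""
    # Transposing first makes unstack() put the date in the outer index level.
    long_factor = wide_factor.T.unstack()
    long_factor.index = long_factor.index.set_names(["date", "asset"])
    return long_factor.dropna()

def run_tear_sheet(wide_factor, prices):
    """Not executed here: needs alphalens and a (dates x assets) price table."""
    import alphalens
    factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
        to_alphalens_factor(wide_factor), prices, quantiles=5, periods=(1, 5, 10)
    )
    alphalens.tears.create_full_tear_sheet(factor_data)

# Made-up factor values for two assets over three days.
idx = pd.date_range("2016-01-04", periods=3)
wide = pd.DataFrame(
    {"000001": [0.1, np.nan, 0.3], "600036": [0.2, 0.4, 0.6]}, index=idx
)
factor = to_alphalens_factor(wide)
print(factor)
```

In practice `wide` would be the output of a method such as `slope001()`, and `prices` the matching close-price table. The quantile count and forward-return periods shown are illustrative choices.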