# Trend Factor in China      
——Yang Liu, Guofu Zhou and Yingzi Zhu (Version: February 8, 2020)

# Introduction:
**Background**:  
China is the world's second largest stock market, so it is important to examine how well asset pricing theory previously developed for the US applies in China.  
**Motivation**:      
1.The Fama-French 3-factor model is one of the most important models for pricing US stocks, but its replication does not work well for Chinese stocks.  
2.LSY-3 (the 3-factor model of Liu, Stambaugh, and Yuan, which accounts for the unique features of small stocks in China) still cannot explain certain important anomalies.  
3.Another important feature of the Chinese market: individual investors contribute about 80% of the total trading volume.  
**Main Work**:      
1.construct a trend factor specific to China to summarize succinctly the impact of past price and volume trends on future expected stock returns     
2.propose a 4-factor model consisting of the market, size, value, and trend    
**Main Conclusions**:    
1.The trend factor stands out in terms of average return, Sharpe ratio, and maximum drawdown.  
2.Our 4-factor model improves on the state-of-the-art models with greater explanatory power. (GRS test)  
3.Our model is able to explain all reported pricing anomalies in China, including those not captured by LSY-3 or LSY-4. (GRS test)  
4.Our model also excels in explaining mutual fund returns, with smaller aggregate pricing errors than LSY-3 and LSY-4. (GRS test)  
5.The Sharpe ratio of our 4-factor model is substantially greater than that of the other models. (Barillas and Shanken (2017))  
6.Our 4-factor model substantially outperforms the replications of Fama and French's (2015) 5-factor model and Hou, Xue, and Zhang's (2015) q-factor model in China.  
**Innovations**:    
1.Our factor incorporates volume information (compared with Han, Zhou, and Zhu (2016)).  
2.A new asset pricing model.

# Methodology:
**Trend signals**:    
1.Compute the moving average (MA) price/volume signals with lag L (for each stock in each month).  
2.Normalize the MA price/volume signals by the closing price/trading volume on the last trading day of the month, for stationarity.  
3.Run a predictive cross-sectional regression in each month on both the price and the volume signals to get the coefficient of each MA price/volume signal.  
4.Forecast the coefficients of the price/volume MA signals (exponential moving average).  
5.Compute the expected return based on the trend signals for each stock in each month (ERtrend).  
Note:    
1.Following Brock, Lakonishok, and LeBaron (1992) and Han, Zhou, and Zhu (2016), we consider MA signals with lag lengths of 3, 5, 10, 20, 50, 100, 200, 300, and 400 days to capture short-, intermediate-, and long-term trends.  
2.We skip 38 months to estimate the forecasting coefficients; the effective sample period for our study is from January 2005 to July 2018.  
3.During a trading suspension, we fill in the data with the values from right before the suspension.
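
The five steps above can be written compactly as follows. This is a sketch in assumed notation (the symbols $P$, $V$, $\widetilde{MA}$, and $\lambda$ are labels chosen here, and $d$ denotes the last trading day of month $t$), written to agree with the code in this notebook rather than copied from the paper:

$$
\widetilde{MA}^{P}_{j,t,L} = \frac{1}{P_{j,d}} \cdot \frac{1}{L} \sum_{k=0}^{L-1} P_{j,d-k},
\qquad
\widetilde{MA}^{V}_{j,t,L} = \frac{1}{V_{j,d}} \cdot \frac{1}{L} \sum_{k=0}^{L-1} V_{j,d-k}
$$

$$
r_{j,t} = \beta_{0,t} + \sum_{i} \beta^{P}_{i,t}\, \widetilde{MA}^{P}_{j,t-1,L_i} + \sum_{i} \beta^{V}_{i,t}\, \widetilde{MA}^{V}_{j,t-1,L_i} + \epsilon_{j,t}
$$

$$
E_{t}[\beta_{i,t+1}] = (1-\lambda)\, E_{t-1}[\beta_{i,t}] + \lambda\, \beta_{i,t}
$$

$$
ER^{trend}_{j,t+1} = \sum_{i} E_{t}[\beta^{P}_{i,t+1}]\, \widetilde{MA}^{P}_{j,t,L_i} + \sum_{i} E_{t}[\beta^{V}_{i,t+1}]\, \widetilde{MA}^{V}_{j,t,L_i}
$$

The intercept is estimated in the regression but left out of $ER^{trend}$, which only matters for ranking stocks in the cross-section. The code in this notebook sets $\lambda = 0.02$ and starts the coefficient forecasts at zero.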

![0.png](https://i.loli.net/2020/06/20/P7HeFiVcyhSsfpr.png)
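
As a small illustration of steps 2 and 4, the sketch below computes a normalized MA signal and an exponentially smoothed coefficient forecast on toy data. The helper names `normalized_ma` and `ema_forecast` and all numbers are hypothetical and exist only for this example; it is not the implementation used in this notebook.

```python
import numpy as np
import pandas as pd


def normalized_ma(panel, end_pos, lag):
    """Average of the last `lag` rows ending at row `end_pos`, scaled by the last row."""
    window = panel.iloc[end_pos - lag + 1 : end_pos + 1]
    return window.mean(axis=0) / panel.iloc[end_pos]


def ema_forecast(betas, lam):
    """Row t holds (1 - lam) * forecast[t-1] + lam * betas[t], starting from zero."""
    forecast = np.zeros(betas.shape[1])
    path = []
    for beta_t in betas:
        forecast = (1 - lam) * forecast + lam * beta_t
        path.append(forecast.copy())
    return np.array(path)


# Toy daily price panel: 4 days, 2 hypothetical stocks.
prices = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [8.0, 8.0, 8.0, 8.0]})
signal = normalized_ma(prices, end_pos=3, lag=2)

# Toy monthly slope estimates for a single signal over two months.
betas = np.array([[1.0], [1.0]])
forecasts = ema_forecast(betas, lam=0.5)
```

Row `t` of `forecasts` uses slopes estimated through month `t`, so it would be paired with the signals observed at the end of month `t` to predict returns in month `t+1`.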

In [2]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import datetime as dt
from datetime import timedelta
import scipy.stats as st
from pandas.core.frame import DataFrame
from numpy.linalg import inv

In [None]:
# Daily closing price and trading value data, stored in 1.csv ... 12.csv
data = pd.concat([pd.read_csv(str(i) + '.csv') for i in range(1, 13)])
data['Trddt'] =  pd.to_datetime(data['Trddt'])
data['year'] = data['Trddt'].dt.year
data['month'] = data['Trddt'].dt.month
data = data[(data['Trddt'] <= '2018-07-02 00:00:00')]
# Take the last trading day of each month, from 1999.12 to 2018.7
trade1d = data.drop_duplicates(subset=['year','month'], keep='last').iloc[35:,:]['Trddt']

# Pivot the daily closing price, trading value, and return data (dates x stocks)
dClsprc = pd.pivot_table(data,index='Trddt',columns='Stkcd',values='Clsprc')
dDnvaltrd  = pd.pivot_table(data,index='Trddt',columns='Stkcd',values='Dnvaltrd')
dDretwd = pd.pivot_table(data,index='Trddt',columns='Stkcd',values='Dretwd')
# On days without trading, fill in the closing price and trading value with the previous trading day's data
def fill_in(dataset1, dataset2):
    allindex = list(pd.date_range('1997-1-2','2018-7-1',freq='D'))
    realindex = list(dataset1.index)
    needindex = []
    for i in allindex:
        if i not in realindex:
            needindex.append(i)
    need1 = pd.DataFrame(index=needindex,columns=dataset1.columns)
    need2 = pd.DataFrame(index=needindex,columns=dataset2.columns)
    for i in needindex:
        k=1
        while True:
            if i + timedelta(days = -k) in realindex:
                need1.loc[i,:] = dataset1.loc[i + timedelta(days = -k),:]
                need2.loc[i,:] = dataset2.loc[i + timedelta(days = -k),:]
                # print(k)
                break
            else:
                k=k+1
    dataset1_n = pd.concat([dataset1, need1])
    dataset2_n = pd.concat([dataset2, need2])
    return dataset1_n, dataset2_n

dClsprc_n, dDnvaltrd_n = fill_in(dClsprc, dDnvaltrd)
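
The day-by-day loop in `fill_in` can be slow on a long sample. A possible alternative, shown here only as a sketch on toy data, is to reindex the panel onto the calendar with `method='ffill'`, which copies the previous available row into each newly added date and leaves NaN values already present in the panel untouched. The helper name `fill_calendar_days` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_calendar_days(panel, start, end):
    """Add every calendar day in [start, end]; each new row copies the previous existing row."""
    panel = panel.sort_index()
    full_index = panel.index.union(pd.date_range(start, end, freq="D"))
    return panel.reindex(full_index, method="ffill")


# Toy panel: Thursday, Friday, and the following Monday; stock B has no record on Friday.
toy = pd.DataFrame(
    {"A": [10.0, 11.0, 12.0], "B": [5.0, np.nan, 6.0]},
    index=pd.to_datetime(["2018-01-04", "2018-01-05", "2018-01-08"]),
)
filled = fill_calendar_days(toy, "2018-01-04", "2018-01-08")
```

Because the result is already sorted by date, a separate sorting step would not be needed after this call.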

In [3]:
# dClsprc_n = pd.read_csv('dClsprc_n.csv',index_col=0)
# dDnvaltrd_n = pd.read_csv('dDnvaltrd_n.csv',index_col=0)
# Sort the closing price and trading value data chronologically
dClsprc_n = dClsprc_n.sort_index()
dDnvaltrd_n = dDnvaltrd_n.sort_index()

# Load monthly returns, 2000.1-2018.7
ret_month = pd.read_csv('TRD_Mnth111.csv')
ret_month['Trdmnt'] = pd.to_datetime(ret_month['Trdmnt'])
ret_month = ret_month[(ret_month['Trdmnt'] <= '2018-07-02 00:00:00')]
ret = pd.pivot_table(ret_month,index='Trdmnt',columns='Stkcd',values='Mretwd')
valid = [x for x in dClsprc_n.columns if x in ret.columns]
dClsprc_n = dClsprc_n.loc[:, valid]
dDnvaltrd_n = dDnvaltrd_n.loc[:, valid]

In [5]:
names = locals()
def ERtrend_month(dClsprc_n,dDnvaltrd_n,ret,trade1d,lamda):
    # Locate each month-end trading day (trade1d) in the daily closing price and trading value data
    all_index = pd.to_datetime(dClsprc_n.index).tolist()
    #pr_index = pd.to_datetime(dDretwd.index).tolist()
    index_num = [all_index.index(x) for x in trade1d]
    #pr_num = [pr_index.index(x) for x in trade1d]
    # Compute the normalized MA_p and MA_v, from 1999.12 to 2018.7
    MA_list = pd.DataFrame()
    for L in [3,5,10,20,50,100,200,300,400]:
        names['nMAp'+str(L)] = pd.DataFrame()
        names['nMAv'+str(L)] = pd.DataFrame()
        for i in index_num:
            temp1 = dClsprc_n.iloc[i-L+1:i+1,:].mean(axis = 0)/dClsprc_n.iloc[i,:]
            temp1 = temp1.to_frame()
            temp2 = dDnvaltrd_n.iloc[i-L+1:i+1,:].mean(axis = 0)/dDnvaltrd_n.iloc[i,:]
            temp2 = temp2.to_frame()
            print(i)
            names['nMAp'+str(L)] = pd.concat([names['nMAp'+str(L)], temp1], axis = 1)
            names['nMAv'+str(L)] = pd.concat([names['nMAv'+str(L)], temp2], axis = 1)
    # Estimate the regression coefficients on MA_p and MA_v for 2000.1-2018.7; returns for 2000.1-2018.7 are matched with the MA signals from 1999.12-2018.6
    coeff = pd.DataFrame(index = ['nMAp3','nMAv3','nMAp5','nMAv5','nMAp10','nMAv10','nMAp20','nMAv20','nMAp50','nMAv50','nMAp100','nMAv100','nMAp200','nMAv200','nMAp300','nMAv300','nMAp400','nMAv400'])
    coeff_t = pd.DataFrame(index = ['nMAp3','nMAv3','nMAp5','nMAv5','nMAp10','nMAv10','nMAp20','nMAv20','nMAp50','nMAv50','nMAp100','nMAv100','nMAp200','nMAv200','nMAp300','nMAv300','nMAp400','nMAv400'])
    for i in range(len(index_num)-1):
        MA_list = pd.concat([ret.iloc[i,:],nMAp3.iloc[:,i],nMAv3.iloc[:,i],nMAp5.iloc[:,i],nMAv5.iloc[:,i],nMAp10.iloc[:,i],nMAv10.iloc[:,i],nMAp20.iloc[:,i],nMAv20.iloc[:,i],nMAp50.iloc[:,i],nMAv50.iloc[:,i],nMAp100.iloc[:,i],nMAv100.iloc[:,i],nMAp200.iloc[:,i],nMAv200.iloc[:,i],nMAp300.iloc[:,i],nMAv300.iloc[:,i],nMAp400.iloc[:,i],nMAv400.iloc[:,i]], axis=1)
        #MA_list = MA_list.apply(pd.to_numeric, errors='ignore')
        MA_list.columns = ['rt','nMAp3','nMAv3','nMAp5','nMAv5','nMAp10','nMAv10','nMAp20','nMAv20','nMAp50','nMAv50','nMAp100','nMAv100','nMAp200','nMAv200','nMAp300','nMAv300','nMAp400','nMAv400']
        #model = smf.ols('rt ~ nMAp3+nMAv3+nMAp5+nMAv5+nMAp10+nMAv10+nMAp20+nMAv20+nMAp50+nMAv50+nMAp100+nMAv100+nMAp200+nMAv200+nMAp300+nMAv300+nMAp400+nMAv400',MA_list).fit(cov_type='HAC',cov_kwds={'maxlags':6})
        model = smf.ols('rt ~ nMAp3+nMAv3+nMAp5+nMAv5+nMAp10+nMAv10+nMAp20+nMAv20+nMAp50+nMAv50+nMAp100+nMAv100+nMAp200+nMAv200+nMAp300+nMAv300+nMAp400+nMAv400',data = MA_list).fit()
        print(i)
        coeff = pd.concat([coeff, model.params], axis = 1)
        coeff_t = pd.concat([coeff_t, model.tvalues], axis = 1)
    coeff.columns = range(len(index_num)-1)
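    # Use the exponential moving average of the 2000.1-2018.7 regression coefficients as the forecast of next period's coefficients
    # The forecast is initialized at 0; forecasts cover 2000.1-2018.8, and column i holds the forecast for period i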
    temp = pd.DataFrame([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],index = ['nMAp3','nMAv3','nMAp5','nMAv5','nMAp10','nMAv10','nMAp20','nMAv20','nMAp50','nMAv50','nMAp100','nMAv100','nMAp200','nMAv200','nMAp300','nMAv300','nMAp400','nMAv400'])
    coeff_f = temp
    for i in range(len(index_num)-1):  # in practice only the 2000.1-2018.7 coefficient forecasts are needed
        temp0 = pd.DataFrame(lamda*coeff.loc[['nMAp3','nMAv3','nMAp5','nMAv5','nMAp10','nMAv10','nMAp20','nMAv20','nMAp50','nMAv50','nMAp100','nMAv100','nMAp200','nMAv200','nMAp300','nMAv300','nMAp400','nMAv400'],i])
        temp = (1-lamda)*temp[0] + temp0[i]  # the intercept is dropped
        coeff_f = pd.concat([coeff_f,temp], axis = 1)
    coeff_f = coeff_f.iloc[:,:-1]
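    # Forecast returns from the MA_p and MA_v signals to build ERtrend, starting at period 60 (2005.1) and ending in 2018.7
    # The MA signals cover 1999.12-2018.7 and the coefficient forecasts cover 2000.1-2018.7, so the resulting ERtrend covers 2000.1-2018.7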
    ER_trend = np.zeros((len(MA_list),len(index_num)-60-1), dtype = float)
    ER_trendP = np.zeros((len(MA_list),len(index_num)-60-1), dtype = float)
    ER_trendV = np.zeros((len(MA_list),len(index_num)-60-1), dtype = float)
    for i in range(60,len(index_num)-1):
        print(i)
        MA_list = pd.concat([nMAp3.iloc[:,i],nMAv3.iloc[:,i],nMAp5.iloc[:,i],nMAv5.iloc[:,i],nMAp10.iloc[:,i],nMAv10.iloc[:,i],nMAp20.iloc[:,i],nMAv20.iloc[:,i],nMAp50.iloc[:,i],nMAv50.iloc[:,i],nMAp100.iloc[:,i],nMAv100.iloc[:,i],nMAp200.iloc[:,i],nMAv200.iloc[:,i],nMAp300.iloc[:,i],nMAv300.iloc[:,i],nMAp400.iloc[:,i],nMAv400.iloc[:,i]], axis=1).apply(pd.to_numeric, errors='ignore')
        for j in range(len(MA_list)):
            ER_trend[j,i-60] = np.nansum(coeff_f.iloc[:,i].values*MA_list.iloc[j,:].values)
            ER_trendP[j,i-60] = np.nansum(coeff_f.iloc[[0,2,4,6,8,10,12,14,16],i].values*MA_list.iloc[j,[0,2,4,6,8,10,12,14,16]].values)
            ER_trendV[j,i-60] = np.nansum(coeff_f.iloc[[1,3,5,7,9,11,13,15,17],i].values*MA_list.iloc[j,[1,3,5,7,9,11,13,15,17]].values)
    ER_trend,ER_trendP,ER_trendV = pd.DataFrame(ER_trend),pd.DataFrame(ER_trendP),pd.DataFrame(ER_trendV)
    ER_trend,ER_trendP,ER_trendV = ER_trend.replace(0, np.nan),ER_trendP.replace(0, np.nan),ER_trendV.replace(0, np.nan)
    ER_trend.index,ER_trendP.index,ER_trendV.index= ret.columns,ret.columns,ret.columns
    ER_trend.columns,ER_trendP.columns,ER_trendV.columns= range(1,164),range(1,164),range(1,164)
    ER_trendP = ER_trendP.stack().reset_index().rename(columns={'level_0': 'Stkcd','level_1': 'mon_num',0: 'ER'})
    ER_trendP['mon_num'] = ER_trendP['mon_num']+1
    ER_trendV = ER_trendV.stack().reset_index().rename(columns={'level_0': 'Stkcd','level_1': 'mon_num',0: 'ER'})
    ER_trendV['mon_num'] = ER_trendV['mon_num']+1
    ER_trend = ER_trend.stack().reset_index().rename(columns={'level_0': 'Stkcd','level_1': 'mon_num',0: 'ER'})
    ER_trend['mon_num'] = ER_trend['mon_num']+1
    return ER_trend,ER_trendP,ER_trendV

ER_trend,ER_trendP,ER_trendV = ERtrend_month(dClsprc_n,dDnvaltrd_n,ret,trade1d,0.02)

**Trend Factor and Others**:   
We use the trend-based expected return (ERtrend) along with the market capitalization (Size) and earnings-to-price ratio (EP) to construct the trend factor (Trend), the size factor (SMB), and the value factor (VMG) in our 4-factor model.   
1.Exclude the smallest 30% of stocks each month.  
2.Sort the remaining 70% of stocks independently into two size groups (SizeSmall and SizeBig) by the median market capitalization, three EP groups (EPLow, EPMid, and EPHigh) by the 30th and 70th percentiles of EP, and three trend groups (TrendLow, TrendMid, and TrendHigh) by the 30th and 70th percentiles of ERtrend, all at the end of each month.  
3.Define the factors from the resulting 2\*3\*3 = 18 portfolios. The trend factor (Trend) is the average of the value-weighted (VW) returns of the 6 portfolios in the TrendHigh group minus that of the 6 portfolios in the TrendLow group. The size factor (SMB) is the average of the VW returns of the 9 portfolios in the SizeSmall group minus that of the 9 portfolios in the SizeBig group. The value factor (VMG) is the average of the VW returns of the 6 portfolios in the EPHigh group minus that of the 6 portfolios in the EPLow group. The market factor (MKT) is the return on the VW portfolio of the top 70% of stocks, in excess of the risk-free rate.  
Note: each stock is weighted by the market capitalization of all its outstanding A-shares.
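
The sorting and weighting steps above can be sketched for a single month as follows. This is a minimal illustration on synthetic data, not the notebook's implementation: the column names `size`, `ep`, `er`, and `ret` and the helpers `assign_groups` and `trend_factor` are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def assign_groups(month_df):
    """Drop the smallest 30% by size, then sort independently into 2 x 3 x 3 groups."""
    cutoff = month_df["size"].quantile(0.3)
    kept = month_df[month_df["size"] > cutoff].copy()
    kept["size_g"] = pd.qcut(kept["size"], [0, 0.5, 1], labels=False)
    kept["ep_g"] = pd.qcut(kept["ep"], [0, 0.3, 0.7, 1], labels=False)
    kept["trend_g"] = pd.qcut(kept["er"], [0, 0.3, 0.7, 1], labels=False)
    return kept


def trend_factor(kept):
    """Average VW return of the TrendHigh cells minus that of the TrendLow cells."""
    kept = kept.assign(w_ret=kept["ret"] * kept["size"])
    cells = kept.groupby(["size_g", "ep_g", "trend_g"])
    vw = cells["w_ret"].sum() / cells["size"].sum()  # value-weighted return per cell
    return vw.xs(2, level="trend_g").mean() - vw.xs(0, level="trend_g").mean()


# Synthetic one-month cross-section (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 500
month = pd.DataFrame({
    "size": rng.lognormal(mean=10.0, sigma=1.0, size=n),
    "ep": rng.normal(size=n),
    "er": rng.normal(size=n),
})
# Returns are set proportional to the trend signal, so the spread must be positive.
month["ret"] = 0.01 * month["er"]

kept = assign_groups(month)
trend = trend_factor(kept)
```

SMB and VMG follow the same pattern with `level="size_g"` (group 0 minus group 1) and `level="ep_g"` (group 2 minus group 0).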

In [7]:
# Load total market capitalization data, 2004.12-2018.6
mktcap = pd.read_csv('TRD_Mnth1.csv')
mktcap['Trdmnt'] = pd.to_datetime(mktcap['Trdmnt'])
mktcap['year'] = mktcap['Trdmnt'].dt.year
mktcap['month'] = mktcap['Trdmnt'].dt.month
mktcap['mon_num'] = 12*(mktcap['year']-2005)+mktcap['month']+1  # EP uses the previous month's total market cap
# Load net profit data
earning = pd.read_csv('FS_Comins1.csv')
earning = earning[earning['Typrep']=='A']
earning['Accper'] = pd.to_datetime(earning['Accper'])
earning['year'] = earning['Accper'].dt.year
earning['month'] = earning['Accper'].dt.month
earning['match_num'] = 12*(earning['year']-2005)+earning['month']
earning['B002000201'] = earning['B002000201'].fillna(0)
# Fill in monthly net profit from the financial statements and compute EP
def get_EP(mktcap, earning):
    # .copy() so that adding 'match_num' does not write into a slice of mktcap
    mkt0 = mktcap[mktcap['month'] == 12].copy()
    mkt0['match_num'] = 12*(mkt0['year']-2005+1)-3
    mkt1 = mktcap[mktcap['month'].isin([1, 2, 3])].copy()
    mkt1['match_num'] = 12*(mkt1['year']-2005)-3
    mkt2 = mktcap[mktcap['month'].isin([4, 5, 6, 7])].copy()
    mkt2['match_num'] = 12*(mkt2['year']-2005)+3
    mkt3 = mktcap[mktcap['month'].isin([8, 9])].copy()
    mkt3['match_num'] = 12*(mkt3['year']-2005)+6
    mkt4 = mktcap[mktcap['month'].isin([10, 11])].copy()
    mkt4['match_num'] = 12*(mkt4['year']-2005)+9
    mktcap_m = pd.concat([mkt0, mkt1, mkt2, mkt3, mkt4])
    mktcap_m = pd.merge(mktcap_m,earning,on=['Stkcd','match_num'],how='left')
    mktcap_m['ep'] = (mktcap_m['B002000000']-mktcap_m['B002000201']) / mktcap_m['Msmvttl']/1000
    ep_ratio = mktcap_m[['Stkcd','Trdmnt','ep','mon_num']]
    return ep_ratio
ep_ratio = get_EP(mktcap, earning)


In [8]:
# Monthly individual stock returns, 2005.1-2018.7
ret_month = pd.read_csv('TRD_Mnth11.csv')
ret_month['Trdmnt'] = pd.to_datetime(ret_month['Trdmnt'])
ret_month['year'] = ret_month['Trdmnt'].dt.year
ret_month['month'] = ret_month['Trdmnt'].dt.month
ret_month['mon_num'] = 12*(ret_month['year']-2005)+ret_month['month']
# Tradable (free-float) market cap, used for value-weighted returns
mktcap = pd.read_csv('TRD_Mnth1vw.csv')
mktcap['Trdmnt'] = pd.to_datetime(mktcap['Trdmnt'])
mktcap['year'] = mktcap['Trdmnt'].dt.year
mktcap['month'] = mktcap['Trdmnt'].dt.month
mktcap['mon_num'] = 12*(mktcap['year']-2005)+mktcap['month']+1  # size groups and weights use the previous month's tradable market cap

factor = pd.merge(ep_ratio[['Stkcd','mon_num','ep']], mktcap[['Stkcd','mon_num','Msmvosd']], on=['Stkcd','mon_num'])
factor = pd.merge(factor, ER_trend, on=['Stkcd','mon_num'], how = 'left')
factor_ret = pd.merge(ret_month[['Stkcd','Mretwd','mon_num']], factor, on=['Stkcd','mon_num'])

In [9]:
# Build the group labels
def get_group(factor_ret, group1n, x, group2n, y, group3n, z):
    X = pd.DataFrame()
    factor_ret = factor_ret.dropna()
    factor_ret['ret*mkt'] = factor_ret['Mretwd']*factor_ret['Msmvosd']
    for i in range(1,164):
        # .copy() so that the group labels are not written into a slice of factor_ret
        temp_value = factor_ret[factor_ret['mon_num'] == i].copy()
        temp_value['sample'] = pd.qcut(temp_value['Msmvosd'],[0,0.3,1],labels=False,duplicates='drop')
        temp_value[group1n+'_n'] = pd.qcut(temp_value[(temp_value['sample'] == 1)][group1n],[0,0.5,1],labels=False,duplicates='drop')
        temp_value[group2n+'_n'] = pd.qcut(temp_value[(temp_value['sample'] == 1)][group2n],[0,0.3,0.7,1],labels=False,duplicates='drop')
        temp_value[group3n+'_n'] = pd.qcut(temp_value[(temp_value['sample'] == 1)][group3n],[0,0.3,0.7,1],labels=False,duplicates='drop')
        X = pd.concat([X,temp_value],axis=0)
    factor_ret = X[(X['sample'] == 1)]  # drop the smallest 30% of stocks by market cap
    return factor_ret

factor_ret = get_group(factor_ret, 'Msmvosd', 2, 'ep', 3, 'ER', 3)


In [10]:
# Load the monthly risk-free rate
rf = pd.read_csv('TRD_Nrrate1.csv')
rf['Clsdt'] = pd.to_datetime(rf['Clsdt'])
rf['year'] = rf['Clsdt'].dt.year
rf['month'] = rf['Clsdt'].dt.month
rf['mon_num'] = 12*(rf['year']-2005)+rf['month']
rf = rf.drop_duplicates(subset=['year','month'], keep='first')[['Nrrmtdt','mon_num']]

In [16]:
# Sort on market cap, EP, and ERtrend (2x3x3), then compute the Trend, SMB, VMG, and MKT factors
def get_factor(factor_ret, group1n, group2n, group3n):
    Size_ep_T = pd.DataFrame(index=range(1,164))
    for i in range(2):
        for j in range(3):
            for k in range(3):
                for t in range(1,164):
                    temp = factor_ret[factor_ret['mon_num'] == t]
                    #print(t)
                    Size_ep_T.loc[t,str(i)+str(j)+str(k)] = (temp[(temp[group1n+'_n'] == i)&(temp[group2n+'_n'] == j)&(temp[group3n+'_n'] == k)]['ret*mkt']/(temp[(temp[group1n+'_n'] == i)&(temp[group2n+'_n'] == j)&(temp[group3n+'_n'] == k)]['Msmvosd'].sum())).sum()
    Trend = Size_ep_T[['002','012','022','102','112','122']].mean(axis=1) - Size_ep_T[['000','010','020','100','110','120']].mean(axis=1)
    smb = Size_ep_T[['000','001','002','010','011','012','020','021','022']].mean(axis=1) - Size_ep_T[['100','101','102','110','111','112','120','121','122']].mean(axis=1)
    vmg = Size_ep_T[['020','021','022','120','121','122']].mean(axis=1) - Size_ep_T[['000','001','002','100','101','102']].mean(axis=1)
    mkt = []
    for t in range(1,164):
        temp = factor_ret[factor_ret['mon_num'] == t]
        #print(t)
        mkt.append((temp['Mretwd']*temp['Msmvosd']/(temp['Msmvosd'].sum())).sum())
    mkt = mkt - 0.01* rf.iloc[:,0]
    mkt.index = range(1,164)
    mkt_avg = mkt.mean()
    return Trend, smb, vmg, mkt, mkt_avg

Trend, smb, vmg, mkt, mkt_avg = get_factor(factor_ret, 'Msmvosd', 'ep', 'ER')

# Empirical results
**Summary statistics**:    
1.LSY-4 factors     
2.our size factor (SMB\*), value factor (VMG\*) and trend factor (Trend)

In [17]:
def NWtest(a, lags=4):
    adj_a = np.array(a)
    # Regress on a constant to get the Newey-West t-statistic of the mean
    model = sm.OLS(adj_a, [1] * len(adj_a)).fit(cov_type='HAC', cov_kwds={'maxlags': lags})
    return 100*adj_a.mean(), float(model.tvalues[0])

def get_Sharpe(data, rf):
    mean = data.mean() * 12
    STD = data.std() * np.sqrt(12)
    sharp = (mean - 0.01*rf.iloc[:,0].mean() * 12) / STD
    return sharp

def get_MDD(data):
    data = data +1
    data = data.values.cumprod()
    index_j = np.argmax(np.maximum.accumulate(data) - data)  # end position (trough)
    index_i = np.argmax(data[:index_j])  # start position (peak before the trough)
    d = (data[index_j] - data[index_i]) / data[index_i]  # maximum drawdown
    return -d

def Summary_statistic(data, rf): 
    result = pd.DataFrame(index=['Mean',' ','SD','Sharpe','Skew','MDD'], columns=['MKT','SMB','VMG','PMO','SMB*','VMG*','Trend'])
    i=0
    for temp in data: 
        mean, t = NWtest(temp)
        result.iloc[:,i] = [mean, t, temp.std(), get_Sharpe(temp,rf), temp.skew(), get_MDD(temp)*100]
        i=i+1
    return result

In [18]:
# Load the LSY-4 factor data
Ch4 = pd.read_csv('CH_4_fac.csv',index_col = 0).iloc[60:223,:]/100
Ch4.index = range(1,164)

statistic = Summary_statistic([Ch4.mktrf, Ch4.SMB, Ch4.VMG, Ch4.PMO, smb, vmg, Trend], rf).applymap(lambda x:round(x, 3))
statistic

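| | MKT | SMB | VMG | PMO | SMB\* | VMG\* | Trend |
|---|---|---|---|---|---|---|---|
| Mean (%) | 0.931 | 1.000 | 1.095 | 0.886 | 0.765 | 1.046 | 0.528 |
| (t-stat) | 1.164 | 2.498 | 4.307 | 3.348 | 2.482 | 3.996 | 2.671 |
| SD | 0.084 | 0.050 | 0.040 | 0.039 | 0.041 | 0.039 | 0.026 |
| Sharpe | 0.296 | 0.550 | 0.768 | 0.594 | 0.473 | 0.744 | 0.417 |
| Skew | -0.384 | -0.052 | 0.216 | -0.740 | 0.044 | 0.156 | -0.654 |
| MDD (%) | 69.068 | 26.064 | 19.694 | 25.693 | 25.493 | 19.160 | 18.481 |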


![1.png](https://i.loli.net/2020/06/20/4cC7r3ePguDx1AV.png)

In [19]:
def Corr(data_list):
    result = []
    for data1 in data_list:
        result_i = []
        for data2 in data_list:
            result_i.append(data1.corr(data2)) 
        result.append(result_i)
    result = pd.DataFrame(result,index=['MKT','SMB','VMG','PMO','SMB*','VMG*','Trend'], columns=['MKT','SMB','VMG','PMO','SMB*','VMG*','Trend']).T
    return result

corr = Corr([Ch4.mktrf, Ch4.SMB, Ch4.VMG, Ch4.PMO, smb, vmg, Trend]).applymap(lambda x:round(x, 2))
corr

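| | MKT | SMB | VMG | PMO | SMB\* | VMG\* | Trend |
|---|---|---|---|---|---|---|---|
| MKT | 1.00 | 0.14 | -0.30 | -0.28 | 0.07 | -0.26 | -0.08 |
| SMB | 0.14 | 1.00 | -0.63 | 0.10 | 0.95 | -0.63 | 0.37 |
| VMG | -0.30 | -0.63 | 1.00 | -0.03 | -0.62 | 0.90 | -0.17 |
| PMO | -0.28 | 0.10 | -0.03 | 1.00 | 0.14 | 0.01 | 0.34 |
| SMB\* | 0.07 | 0.95 | -0.62 | 0.14 | 1.00 | -0.62 | 0.37 |
| VMG\* | -0.26 | -0.63 | 0.90 | 0.01 | -0.62 | 1.00 | -0.07 |
| Trend | -0.08 | 0.37 | -0.17 | 0.34 | 0.37 | -0.07 | 1.00 |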


![2.png](https://i.loli.net/2020/06/20/dIqLzhBtik91u3F.png)

**Conclusion:**    
1.PMO and Trend are positively correlated (0.34): if PMO is trending up, say because of its long-leg stocks only, Trend should capture these stocks in its long leg.  
2.The size (value) factor of LSY is strongly correlated with ours, at 0.95 (0.90).  
3.As smaller stocks tend to be growth stocks, SMB (SMB\*) and VMG (VMG\*) exhibit a strong negative correlation of -0.63 (-0.62).

**Comparison of PMO and Trend in sub-samples:**  
We report the average monthly returns of the turnover factor (PMO) and our trend factor (Trend) in sub-samples formed by 2\*3\*3 independent sorts.  
1.Stocks are independently sorted by two control variables (2\*3) and by abnormal turnover or ERtrend (\*3).  
2.In each of the 6 sub-samples, Trend is the return of the Trend-High portfolio minus that of the Trend-Low portfolio, and PMO is the return of the AbTurn-Low portfolio minus that of the AbTurn-High portfolio.

In [20]:
# Load the abnormal turnover data
AbTurn = pd.read_csv('ret_turn_abn.csv',index_col = 0).iloc[71:-5,:]
AbTurn.columns = pd.to_numeric(AbTurn.columns)
valid = [x for x in AbTurn.columns if x in ret.columns]
AbTurn = AbTurn.loc[:, valid]
AbTurn.columns = ret.columns
AbTurn.index = range(1,164)
AbTurn = AbTurn.stack().reset_index().rename(columns={'level_0': 'mon_num',0: 'AbTurn'})
factor = pd.merge(ep_ratio[['Stkcd','mon_num','ep']], mktcap[['Stkcd','mon_num','Msmvosd']], on=['Stkcd','mon_num'])
factor = pd.merge(factor, ER_trend, on=['Stkcd','mon_num'])
factor = pd.merge(factor, AbTurn, on=['Stkcd','mon_num'])
factor_ret0 = pd.merge(ret_month[['Stkcd','Mretwd','mon_num']], factor, on=['Stkcd','mon_num'])

In [21]:
def get_group4(factor_ret, group1n, group2n, group3n, group4n):
    X = pd.DataFrame()
    factor_ret = factor_ret.dropna()
    factor_ret['ret*mkt'] = factor_ret['Mretwd']*factor_ret['Msmvosd']
    for i in range(1,164):
        temp_value = factor_ret[factor_ret['mon_num'] == i].copy()
        temp_value['sample'] = pd.qcut(temp_value['Msmvosd'],[0,0.3,1],labels=False,duplicates='drop')
        temp_value[group1n+'_n'] = pd.qcut(temp_value[(temp_value['sample'] == 1)][group1n],[0,0.5,1],labels=False,duplicates='drop')
        temp_value[group2n+'_n'] = pd.qcut(temp_value[(temp_value['sample'] == 1)][group2n],[0,0.3,0.7,1],labels=False,duplicates='drop')
        temp_value[group3n+'_n'] = pd.qcut(temp_value[(temp_value['sample'] == 1)][group3n],[0,0.3,0.7,1],labels=False,duplicates='drop')
        temp_value[group4n+'_n'] = pd.qcut(temp_value[(temp_value['sample'] == 1)][group4n],[0,0.3,0.7,1],labels=False,duplicates='drop')
        X = pd.concat([X,temp_value],axis=0)
    factor_ret = X[(X['sample'] == 1)]
    return factor_ret

factor_ret_g = get_group4(factor_ret0, 'Msmvosd', 'ep', 'AbTurn', 'ER')


In [22]:
def get_Tab2(factor_ret_g, fac_n, control1, control2, s_minus_b = 1):
    result = pd.DataFrame(index = range(8), columns = ['Size-Small','Size-Big','Size-Average'])
    temp_fac = pd.DataFrame(index=range(1,164))
    for t in range(1,164):
        temp = factor_ret_g[factor_ret_g['mon_num'] == t].copy()
        # sort on the factor once per month instead of once per (i, j) cell
        temp['last_n'] = pd.qcut(temp[fac_n],[0,0.3,0.7,1],labels=False,duplicates='drop')
        for i in range(2):
            for j in range(3):
                if s_minus_b == 1:
                    temp_fac.loc[t,str(i)+str(j)] = (temp[(temp[control1+'_n'] == i)&(temp[control2+'_n'] == j)&(temp['last_n'] == 0)]['ret*mkt']/(temp[(temp[control1+'_n'] == i)&(temp[control2+'_n'] == j)&(temp['last_n'] == 0)]['Msmvosd'].sum())).sum()\
                        - (temp[(temp[control1+'_n'] == i)&(temp[control2+'_n'] == j)&(temp['last_n'] == 2)]['ret*mkt']/(temp[(temp[control1+'_n'] == i)&(temp[control2+'_n'] == j)&(temp['last_n'] == 2)]['Msmvosd'].sum())).sum()
                else:
                    temp_fac.loc[t,str(i)+str(j)] = (temp[(temp[control1+'_n'] == i)&(temp[control2+'_n'] == j)&(temp['last_n'] == 2)]['ret*mkt']/(temp[(temp[control1+'_n'] == i)&(temp[control2+'_n'] == j)&(temp['last_n'] == 2)]['Msmvosd'].sum())).sum()\
                        - (temp[(temp[control1+'_n'] == i)&(temp[control2+'_n'] == j)&(temp['last_n'] == 0)]['ret*mkt']/(temp[(temp[control1+'_n'] == i)&(temp[control2+'_n'] == j)&(temp['last_n'] == 0)]['Msmvosd'].sum())).sum()
    temp_fac.loc[:,'A0'] = temp_fac[['00','10']].mean(axis = 1)
    temp_fac.loc[:,'A1'] = temp_fac[['01','11']].mean(axis = 1)
    temp_fac.loc[:,'A2'] = temp_fac[['02','12']].mean(axis = 1)
    temp_fac.loc[:,'AA'] = temp_fac[['A0','A1','A2']].mean(axis = 1)
    # note: the Size-Small and Size-Big averages below use only the Low and Mid columns
    temp_fac.loc[:,'0A'] = temp_fac[['00','01']].mean(axis = 1)
    temp_fac.loc[:,'1A'] = temp_fac[['10','11']].mean(axis = 1)
    for i in range(2):
        for j in range(3):
            result.iloc[2*j,i], result.iloc[2*j+1,i] = NWtest(temp_fac.iloc[:, i*3+j])
    result.iloc[0:2,2] = NWtest(temp_fac.loc[:,'A0'])
    result.iloc[2:4,2] = NWtest(temp_fac.loc[:,'A1'])
    result.iloc[4:6,2] = NWtest(temp_fac.loc[:,'A2'])
    result.iloc[6:8,2] = NWtest(temp_fac.loc[:,'AA'])
    result.iloc[6:8,0] = NWtest(temp_fac.loc[:,'0A'])
    result.iloc[6:8,1] = NWtest(temp_fac.loc[:,'1A'])
    result.index = [control2+'-Low','',control2+'-Mid','',control2+'-High','','Average','']
    return result

In [23]:
# Control for Size and EP: PMO & Trend
PanelA_PMO = get_Tab2(factor_ret_g, 'AbTurn','Msmvosd','ep')
PanelA_Trend = get_Tab2(factor_ret_g, 'ER','Msmvosd','ep', s_minus_b = 0)
pd.concat([PanelA_PMO,PanelA_Trend],axis=1)



| | PMO: Size-Small | PMO: Size-Big | PMO: Size-Average | Trend: Size-Small | Trend: Size-Big | Trend: Size-Average |
|---|---|---|---|---|---|---|
| ep-Low | 1.68445 | 0.545741 | 1.1151 | 0.426534 | 1.20075 | 0.813642 |
| (t-stat) | 6.66051 | 1.20163 | 3.55752 | 1.93266 | 2.73156 | 2.70358 |
| ep-Mid | 1.25743 | 0.555118 | 0.906273 | 0.267717 | 0.46416 | 0.365939 |
| (t-stat) | 5.11431 | 1.62978 | 3.52782 | 1.44157 | 1.42138 | 1.67628 |
| ep-High | 1.32883 | -0.109645 | 0.609593 | 0.393692 | 0.449088 | 0.42139 |
| (t-stat) | 4.56679 | -0.320191 | 2.31301 | 1.90549 | 1.65579 | 2.25056 |
| Average | 1.47094 | 0.55043 | 0.876988 | 0.347126 | 0.832455 | 0.533657 |
| (t-stat) | 6.83707 | 1.53207 | 3.62017 | 1.95685 | 2.44088 | 2.72243 |


![3.png](https://i.loli.net/2020/06/20/ByGjAIPHiD7dbLF.png)

**Conclusion:**     
1.PMO earns a monthly return of 0.87% (t-statistic: 3.62), with contributions mainly from small stocks.   
2.Trend earns a monthly return of 0.53% (t-statistic: 2.72), with contributions mainly from big stocks?

In [24]:
# Control for Size and ER: PMO
PanelB_PMO = get_Tab2(factor_ret_g, 'AbTurn','Msmvosd','ER')
PanelB_PMO



| | Size-Small | Size-Big | Size-Average |
|---|---|---|---|
| ER-Low | 1.67115 | 0.305616 | 0.988385 |
| (t-stat) | 6.03694 | 0.772603 | 3.41912 |
| ER-Mid | 1.50802 | 0.406677 | 0.957349 |
| (t-stat) | 5.57431 | 1.0458 | 3.43636 |
| ER-High | 1.30436 | -0.0556964 | 0.624334 |
| (t-stat) | 5.20587 | -0.129373 | 2.3016 |
| Average | 1.58959 | 0.356146 | 0.856689 |
| (t-stat) | 6.57609 | 0.995167 | 3.6716 |


![4.png](https://i.loli.net/2020/06/20/DGxco83YIMRpO6T.png)

**Conclusion:**     
1.PMO earns a significant monthly return with contributions mainly from small stocks.   
2.The predictability of turnover can't be subsumed by the trend signals?

In [25]:
# Control for Size and AbTurn: Trend
PanelC_Trend = get_Tab2(factor_ret_g, 'ER','Msmvosd','AbTurn', s_minus_b = 0)
PanelC_Trend



| | Size-Small | Size-Big | Size-Average |
|---|---|---|---|
| AbTurn-Low | 0.0357256 | 0.604582 | 0.320154 |
| (t-stat) | 0.179181 | 1.64418 | 1.33802 |
| AbTurn-Mid | 0.0833079 | 0.718061 | 0.400685 |
| (t-stat) | 0.391995 | 2.18353 | 1.88109 |
| AbTurn-High | 0.402515 | 0.965894 | 0.684205 |
| (t-stat) | 1.6255 | 2.03184 | 2.24859 |
| Average | 0.0595167 | 0.661322 | 0.468348 |
| (t-stat) | 0.367344 | 2.55095 | 2.43075 |


![5.png](https://i.loli.net/2020/06/20/wsjWYHShV8gePmb.png)

**Conclusion:**    
1.Our trend measure provides independent information beyond size, EP, and turnover.     
2.It is able to capture trends in large stocks?

**Model performances in explaining factors in other models:**     
1.We compare the explanatory power of our 4-factor with LSY-3, LSY-4 as well as the replications of Hou, Xue, and Zhang's (2015) q-factor model (q-4) and Fama and French's (2015) 5-factor model (FF-5).     
2.We report the average absolute monthly alpha (%), the average absolute t-statistic, the aggregate pricing error △, and the Gibbons, Ross, and Shanken (1989) (GRS) F-statistic with its associated p-value.
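
For reference, the two aggregate statistics can be written out as below. This is a sketch that mirrors the `GRS_test` function used in this notebook, not a quotation from the paper. The symbols are assumptions chosen for illustration: $\hat\alpha$ is the $N \times 1$ vector of regression intercepts, $\hat\Sigma$ the residual covariance matrix, $\bar\mu$ and $\hat\Omega$ the sample mean and covariance of the $L$ explanatory factors, and $T$ the number of months.

$$
\Delta = \hat\alpha^{\top}\hat\Sigma^{-1}\hat\alpha,
\qquad
\mathrm{GRS} = \frac{T}{N}\cdot\frac{T-N-L}{T-L-1}\cdot
\frac{\Delta}{1+\bar\mu^{\top}\hat\Omega^{-1}\bar\mu}
\;\sim\; F_{N,\;T-N-L}
$$

The F distribution holds under the null hypothesis that all intercepts are zero and the residuals are normal. A small △ and a large p-value both indicate that the explanatory model prices the test factors well.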

In [26]:
# Our 4-factor model
Our4 = pd.concat([pd.DataFrame(Trend), pd.DataFrame(smb), pd.DataFrame(vmg), pd.DataFrame(mkt)], axis = 1)
Our4.columns = ['Trend', 'smb', 'vmg', 'mkt_r']
# q-factor model (q-4)
q4factor = pd.read_csv('q4factor.csv',index_col = 0).iloc[96:-17,[3,4,5]]
# Fama-French 5-factor model (FF-5)
FF5_monthly = pd.read_csv('FF5_monthly.csv',index_col = 0).iloc[131:-9,[0,1,2,8,9]]
FF5_monthly.index = range(1,164)
# LSY 3-factor model (LSY-3)
Ch3 = pd.read_csv('CH_3_fac.csv',index_col = 0).iloc[60:223,1:]/100
Ch3.index = range(1,164)
# LSY 4-factor model (LSY-4)
Ch4 = pd.read_csv('CH_4_fac.csv',index_col = 0).iloc[60:223,1:]/100
Ch4.index = range(1,164)

In [27]:
def GRS_test(factor, resid, alpha):
    resid = resid.fillna(0)
    T, N = resid.shape
    L = factor.shape[1]
    mu_mean = factor.mean(0)
    cov_e = np.cov(resid.T)
    cov_f = np.cov(factor.T).reshape((L, L))
    alpha = np.asmatrix(alpha).reshape(N, 1)
    mu_mean = np.asmatrix(mu_mean).reshape(L, 1)
    # matrix operation with np.ndarray
    delta = alpha.T @ inv(cov_e) @ alpha
    GRS = float((T / N) * ((T - N - L) / (T - L - 1))) * (alpha.T @ inv(cov_e) @ alpha) / (1 + mu_mean.T @ inv(cov_f) @ mu_mean)
    GRS = GRS[0, 0]
    GRSp = st.f.sf(GRS, N, (T - N - L))
    grs = [GRS, GRSp]
    return  delta[:, 0][0, 0], grs

def fac_in_others(model1, model2):
    alpha = []
    t = []
    re = pd.DataFrame()
    for i in model1.columns:
        Y = model1[[i]]
        model = sm.OLS(Y.values,sm.add_constant(model2).values).fit()
        alpha.append(model.params[0])
        t.append(abs(model.tvalues[0]))
        residual = pd.DataFrame(model.resid)
        re = pd.concat([re, residual], axis=1)
    absalpha = [abs(x) for x in alpha]
    alpha = pd.DataFrame(alpha)
    delta, grs = GRS_test(model2, re, alpha)
    return np.mean(absalpha)*100, np.mean(t), delta, grs

def get_Tab3():
    results = pd.DataFrame()
    for i in [[Our4,Ch3],[Ch3,Our4],[Our4,Ch4],[Ch4,Our4],[Our4,q4factor],[q4factor,Our4],[Our4,FF5_monthly],[FF5_monthly,Our4]]:
        a,b,c,d = fac_in_others(i[0], i[1])
        temp=pd.DataFrame([a,b,c]+d)
        results = pd.concat([results, temp], axis=1)
    results.index = ['ave_abs_α','ave_abs_t','△','GRS','GRS_p']
    results.columns = ['LSY3','Our4','LSY4','Our4','q4','Our4','FF5','Our4']
    return results

Tab3 = get_Tab3()
Tab3

Unnamed: 0,LSY3,Our4,LSY4,Our4.1,q4,Our4.2,FF5,Our4.3
ave_abs_α,0.153797,0.18709,0.105423,0.334027,0.858361,0.348225,0.472609,0.301758
ave_abs_t,0.931625,1.331932,0.726685,1.594597,2.382261,2.071139,2.658894,1.699075
△,0.035676,0.059569,0.026658,0.119689,0.268083,0.153109,0.384799,0.231339
GRS,1.046389,2.386166,0.74407,3.572753,8.754998,6.133141,12.97136,5.488812
GRS_p,0.385211,0.071219,0.563364,0.008111,2e-06,0.000572,3.977894e-09,0.000112


![6.png](https://i.loli.net/2020/06/20/fHESRs5gdBqitxZ.png)

**Model performances in explaining anomalies:**     
We compare the pricing ability of the factor models in explaining 14 stock return anomalies in China, and report the average absolute monthly alpha (%), the average absolute t-statistic, the aggregate pricing error △, and the Gibbons, Ross, and Shanken (1989) (GRS) F-statistic with its associated p-value.    
1.Exclude the smallest 30% of stocks when forming all the anomalies.     
2.Compute the standard long-short return spread between the extreme decile portfolios sorted by the corresponding anomaly variable in the most recent month, and rebalance the portfolios monthly.
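
The two steps above can be sketched in vectorized form. This is a minimal illustration on simulated data, not the notebook's own implementation: the helper `long_short_spread` and the column names `month`, `cap`, `signal`, and `ret` are hypothetical, and value weighting by market cap is assumed.

```python
import numpy as np
import pandas as pd

def long_short_spread(panel, signal, ret="ret", cap="cap", month="month"):
    """Value-weighted top-minus-bottom decile spread per month (hypothetical helper)."""
    df = panel.dropna(subset=[signal, ret, cap]).copy()
    # Step 1: drop the smallest 30% of stocks by market cap within each month.
    cutoff = df.groupby(month)[cap].transform(lambda s: s.quantile(0.3))
    df = df[df[cap] > cutoff].copy()
    # Step 2: assign deciles on the anomaly variable within each month.
    df["decile"] = df.groupby(month)[signal].transform(
        lambda s: pd.qcut(s, 10, labels=False, duplicates="drop")
    )
    # Step 3: value-weighted return of each decile, then high minus low.
    df["ret_x_cap"] = df[ret] * df[cap]
    sums = df.groupby([month, "decile"])[["ret_x_cap", cap]].sum()
    vw = (sums["ret_x_cap"] / sums[cap]).unstack("decile")
    return vw[9] - vw[0]

# Simulated panel: 3 months, 200 stocks per month.
rng = np.random.default_rng(0)
n_months, n_stocks = 3, 200
demo = pd.DataFrame({
    "month": np.repeat(np.arange(1, n_months + 1), n_stocks),
    "cap": rng.lognormal(mean=0.0, sigma=1.0, size=n_months * n_stocks),
    "signal": rng.normal(size=n_months * n_stocks),
})
# Returns rise with the signal by construction, so the spread is positive.
demo["ret"] = 0.01 * demo["signal"]
spread = long_short_spread(demo, "signal")
print(spread)
```

Grouping once per step avoids the month-by-month loop and the chained assignments that trigger pandas' `SettingWithCopyWarning`.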

In [28]:
# Load and preprocess the data for the 14 anomalies
bm = pd.read_csv('bm.csv',index_col = 0).iloc[72:-5,:]
cp = pd.read_csv('cp.csv',index_col = 0).iloc[72:-5,:]
ep = pd.read_csv('ep.csv',index_col = 0).iloc[72:-5,:]
illq = pd.read_csv('illq.csv',index_col = 0).iloc[69:-5,:]
me = pd.read_csv('me.csv',index_col = 0).iloc[60:-5,:]
ret_max = pd.read_csv('ret_max.csv',index_col = 0).iloc[69:-5,:]
ret_rev = pd.read_csv('ret_rev.csv',index_col = 0).iloc[69:-5,:]
ret_turn = pd.read_csv('ret_turn.csv',index_col = 0).iloc[60:-5,:]
roe = pd.read_csv('roe.csv',index_col = 0).iloc[72:-5,:]
vol_1m = pd.read_csv('vol_1m.csv',index_col = 0).iloc[69:-5,:]

bm.columns, cp.columns, ep.columns, illq.columns, me.columns, ret_max.columns, ret_rev.columns, ret_turn.columns, roe.columns, vol_1m.columns = pd.to_numeric(bm.columns), pd.to_numeric(cp.columns), pd.to_numeric(ep.columns), pd.to_numeric(illq.columns), pd.to_numeric(me.columns), pd.to_numeric(ret_max.columns), pd.to_numeric(ret_rev.columns), pd.to_numeric(ret_turn.columns), pd.to_numeric(roe.columns), pd.to_numeric(vol_1m.columns)
valid = [x for x in bm.columns if x in ret.columns]
bm,cp,ep,illq,me,ret_max,ret_rev,ret_turn,roe,vol_1m = bm.loc[:, valid],cp.loc[:, valid],ep.loc[:, valid],illq.loc[:, valid],me.loc[:, valid],ret_max.loc[:, valid],ret_rev.loc[:, valid],ret_turn.loc[:, valid],roe.loc[:, valid],vol_1m.loc[:, valid]
bm.index, cp.index, ep.index, illq.index, me.index, ret_max.index, ret_rev.index, ret_turn.index, roe.index, vol_1m.index = range(1,164),range(1,164),range(1,164),range(1,164),range(1,164),range(1,164),range(1,164),range(1,164),range(1,164),range(1,164)

ER_trend = pd.pivot_table(ER_trend,index='mon_num',columns='Stkcd',values='ER')
ER_trendP = pd.pivot_table(ER_trendP,index='mon_num',columns='Stkcd',values='ER')
ER_trendV = pd.pivot_table(ER_trendV,index='mon_num',columns='Stkcd',values='ER')

In [31]:
mktcap = mktcap[['Stkcd','mon_num','Msmvosd']]  # market cap, used for value weighting
# Compute the high-minus-low decile return for each anomaly
def get_anomalies(factor, ret_month, anomalies, size = 0):
    factor_ret = pd.merge(ret_month[['Stkcd','Mretwd','mon_num']], mktcap, on=['Stkcd','mon_num'])
    if size == 0:
        factor = factor.stack().reset_index().rename(columns={'level_0': 'mon_num','level_1': 'Stkcd',0: anomalies})
        factor_ret = pd.merge(factor_ret, factor, on=['Stkcd','mon_num'], how = 'left')
    X = pd.DataFrame()
    factor_ret = factor_ret.dropna()
    factor_ret['ret*mkt'] = factor_ret['Mretwd']*factor_ret['Msmvosd']
    for i in range(1,164):
        temp_value = factor_ret[factor_ret['mon_num'] == i].copy()  # copy to avoid SettingWithCopyWarning
        temp_value['sample'] = pd.qcut(temp_value['Msmvosd'],[0,0.3,1],labels=False,duplicates='drop')  # flag the smallest 30% by market cap for removal
        temp_value[anomalies+'_n'] = pd.qcut(temp_value[(temp_value['sample'] == 1)][anomalies],10,labels=False,duplicates='drop')
        X = pd.concat([X,temp_value],axis=0)
    factor_ret = X[(X['sample'] == 1)]
    anomaliesHL = pd.DataFrame(index=range(1,164))
    for i in [0,9]:
        for t in range(1,164):
            temp = factor_ret[factor_ret['mon_num'] == t]
            #print(t)
            anomaliesHL.loc[t,i] = (temp[temp[anomalies+'_n'] == i]['ret*mkt']/(temp[temp[anomalies+'_n'] == i]['Msmvosd'].sum())).sum()
    anomalies_fac = pd.DataFrame(anomaliesHL[9] - anomaliesHL[0],index = range(1,164))
    return anomalies_fac

In [32]:
size_f = get_anomalies(mktcap, ret_month, 'Msmvosd', size = 1)
ER_f = get_anomalies(ER_trend, ret_month, 'ER')
ERp_f = get_anomalies(ER_trendP, ret_month, 'ERp')
ERv_f = get_anomalies(ER_trendV, ret_month, 'ERv')
bm_f = get_anomalies(bm, ret_month, 'bm')
cp_f = get_anomalies(cp, ret_month, 'cp')
ep_f = get_anomalies(ep, ret_month, 'ep')
illq_f = get_anomalies(illq, ret_month, 'illq')
me_f = get_anomalies(me, ret_month, 'me')
ret_max_f = get_anomalies(ret_max, ret_month, 'ret_max')
ret_rev_f = get_anomalies(ret_rev, ret_month, 'ret_rev')
ret_turn_f = get_anomalies(ret_turn, ret_month, 'ret_turn')
roe_f = get_anomalies(roe, ret_month, 'roe')
vol_1m_f = get_anomalies(vol_1m, ret_month, 'vol_1m')
# Combine all anomaly spreads into one DataFrame
all_anomalies = pd.concat([size_f,ER_f,ERp_f,ERv_f,bm_f,cp_f,ep_f,illq_f,me_f,ret_max_f,ret_rev_f,ret_turn_f,roe_f,vol_1m_f], axis=1)
all_anomalies.columns = range(14)


In [34]:
def get_Tab4(anomalies, models):
    results = pd.DataFrame()
    for model in models:
        a,b,c,d = fac_in_others(anomalies, model)
        temp=pd.DataFrame([a,b,c]+d)
        results = pd.concat([results, temp], axis=1)
    results.index = ['ave_abs_α','ave_abs_t','△','GRS','GRS_p']
    results.columns = ['LSY3','LSY4','q4','FF5','Our4']#
    return results

all_anomalies1 = all_anomalies.iloc[:,1:]
Tab4 = get_Tab4(all_anomalies1, [Ch3, Ch4, q4factor, FF5_monthly, Our4])
Tab4

Unnamed: 0,LSY3,LSY4,q4,FF5,Our4
ave_abs_α,0.482395,0.350055,1.021899,0.871186,0.48993
ave_abs_t,1.173493,0.855464,2.289888,2.340053,1.265763
△,0.261769,0.239754,0.316712,0.456817,0.25813
GRS,2.428045,2.115499,3.270893,4.86637,2.435868
GRS_p,0.006597,0.019119,0.000329,1e-06,0.006443


![7.png](https://i.loli.net/2020/06/20/YSVKqGQLk1wfybl.png)

**Model performances in explaining mutual fund returns:**     
We sort the funds at the end of each month by assets under management (AUM) into ten decile portfolios, examine how the various models perform in explaining the fund returns, and report the average absolute monthly alpha (%), the average absolute t-statistic, the aggregate pricing error △, and the Gibbons, Ross, and Shanken (1989) (GRS) F-statistic with its associated p-value.    

In [36]:
def get_Tab5():
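    fund = pd.read_csv('fund.csv',encoding='gbk')
    # AUM proxy used for sorting: latest fund shares ('最新基金份额') x net asset value per unit ('单位净值')
    fund['size'] = fund['最新基金份额']*fund['单位净值']
    # '月度' holds the month as YYYYMM; split it into year ('年') and month ('月')
    fund['年'] = [int(x) for x in fund['月度']/100]
    fund['月'] = fund['月度']%100
    fund['mon_num'] = 12*(fund['年']-2005)+fund['月']
    X = pd.DataFrame()
    fund = fund.dropna()
    # '净值增长率' is the NAV growth rate, i.e. the fund return
    fund['ret*mkt'] = fund['净值增长率']*fund['size']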
    for i in range(1,164):
        temp_value = fund[fund['mon_num'] == i].copy()  # copy to avoid SettingWithCopyWarning
        temp_value['size_n'] = pd.qcut(temp_value['size'],10,labels=False,duplicates='drop')
        X = pd.concat([X,temp_value],axis=0)
    fund = X
    fund_fac = pd.DataFrame(index=range(1,164))
    # Build the value-weighted returns of the ten AUM decile portfolios
    for i in range(10):
        for t in range(1,164):
            temp = fund[fund['mon_num'] == t]
            #print(t)
            fund_fac.loc[t,i] = (temp[temp['size_n'] == i]['ret*mkt']/(temp[temp['size_n'] == i]['size'].sum())).sum()
    Tab5 = get_Tab4(fund_fac, [Ch3, Ch4, q4factor, FF5_monthly, Our4])
    Tab5.iloc[0,:] = Tab5.iloc[0,:]*0.01
    return Tab5

Tab5 = get_Tab5()
Tab5


Unnamed: 0,LSY3,LSY4,q4,FF5,Our4
ave_abs_α,0.13663,0.168494,0.709988,0.193179,0.13394
ave_abs_t,0.439625,0.529377,1.231936,0.68986,0.408834
△,0.082716,0.099757,0.1296,0.061977,0.106085
GRS,0.933128,1.070639,1.627865,0.803129,1.217636
GRS_p,0.504713,0.388423,0.1037,0.625906,0.284089


![8.png](https://i.loli.net/2020/06/20/KMPz98rjDqc5Gog.png)

**Sharpe ratio tests:**    
We conduct the Sharpe ratio test of Barillas and Shanken (2017), which compares the explanatory power of factor models without using test assets. The statistic for each model is the squared Sharpe ratio of the tangency portfolio spanned by its factors.
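
As an illustration of the statistic, the sketch below computes the maximum squared Sharpe ratio, mu' Sigma^{-1} mu, on simulated factor returns. The function name `max_squared_sharpe` and the simulated inputs are assumptions made for this example. It only computes the point estimates; it does not implement a significance test for the differences between models.

```python
import numpy as np

def max_squared_sharpe(factors):
    """Maximum squared Sharpe ratio mu' Sigma^{-1} mu of a T x L array of factor returns."""
    factors = np.asarray(factors, dtype=float)
    mu = factors.mean(axis=0)
    sigma = np.atleast_2d(np.cov(factors, rowvar=False))
    return float(mu @ np.linalg.solve(sigma, mu))

rng = np.random.default_rng(1)
base = rng.normal(loc=0.005, scale=0.03, size=(163, 3))   # three simulated factors
extra = rng.normal(loc=0.004, scale=0.02, size=(163, 1))  # one additional simulated factor
sh3 = max_squared_sharpe(base)
sh4 = max_squared_sharpe(np.hstack([base, extra]))
print(sh3, sh4)
```

In sample, adding a factor can never lower this statistic, because the tangency portfolio of the larger set can always put zero weight on the new factor. Comparisons between nested models therefore say little without a formal test, while comparisons between non-nested models (such as a 4-factor model against FF-5) can go either way.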

In [39]:
def get_Sh(factor):
    L = factor.shape[1]
    mu_mean = factor.mean(0)
    cov_f = np.cov(factor.T).reshape((L, L))
    mu_mean = np.asmatrix(mu_mean).reshape(L, 1)
    # matrix operation with np.ndarray
    Sh = mu_mean.T @ inv(cov_f) @ mu_mean
    return  Sh[:, 0][0, 0]

def get_Tab6(fac_list):
    Sh = []
    for i in range(len(fac_list)):
        Sh.append(get_Sh(fac_list[i]))
    Sh = pd.DataFrame(Sh).T
    Sh.columns = ['LSY3','LSY4','q4','FF5','Our4']
    
    Sh_diff = pd.DataFrame(index = range(5),columns = ['LSY3','LSY4','q4','FF5','Our4'])
    for i in range(len(Sh.iloc[0,:])):
        for j in range(len(Sh.iloc[0,:])):
            Sh_diff.iloc[i,j] = Sh.iloc[0,j] - Sh.iloc[0,i]
    Tab6 = pd.concat([Sh,Sh_diff], axis = 0)
    return Tab6

Tab6 = get_Tab6([Ch3, Ch4, q4factor, FF5_monthly, Our4])
Tab6.index = ['Sh', 'LSY3', 'LSY4', 'q4', 'FF5', 'Our4']
Tab6

Unnamed: 0,LSY3,LSY4,q4,FF5,Our4
Sh,0.363114,0.432251,0.224246,0.18576,0.33922
LSY3,0.0,0.0691371,-0.138867,-0.177353,-0.0238933
LSY4,-0.0691371,0.0,-0.208005,-0.24649,-0.0930304
q4,0.138867,0.208005,0.0,-0.0384857,0.114974
FF5,0.177353,0.24649,0.0384857,0.0,0.15346
Our4,0.0238933,0.0930304,-0.114974,-0.15346,0.0


![9.png](https://i.loli.net/2020/06/20/j1BwR5At2ozIbn7.png)

# Robustness:     
1.The trend factor is robust to alternative formations and transaction costs, and it remains strong after controlling for major firm characteristics.     
2.The paper further shows that the trend factor also holds in the US, albeit with smaller volume effects due to less individual trading (not reported here).

**Performances of the trend factor under alternative formations:**    
Exponential moving average: lambda is the parameter in the paper's equation that determines the weights of the coefficients over different horizons.
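
To make the role of lambda concrete, here is a minimal sketch of an exponential moving average of monthly regression coefficients. It assumes the usual recursion E_t = (1 - lambda) * E_{t-1} + lambda * beta_t, started at the first observation; the helper `ema_forecast` and the simulated coefficients are hypothetical, and the notebook's `ERtrend_month` may initialize or align the series differently.

```python
import numpy as np
import pandas as pd

def ema_forecast(betas, lam):
    """Recursive EMA: E_t = (1 - lam) * E_{t-1} + lam * beta_t, started at beta_1."""
    out = np.empty(len(betas))
    out[0] = betas[0]
    for t in range(1, len(betas)):
        out[t] = (1.0 - lam) * out[t - 1] + lam * betas[t]
    return out

rng = np.random.default_rng(2)
betas = rng.normal(size=24)  # simulated monthly cross-sectional coefficients
manual = ema_forecast(betas, lam=0.03)
# pandas reproduces the same recursion when adjust=False.
via_pandas = pd.Series(betas).ewm(alpha=0.03, adjust=False).mean().to_numpy()
print(np.allclose(manual, via_pandas))
```

A smaller lambda puts more weight on the distant past and gives a smoother coefficient forecast; a larger lambda reacts faster to the most recent month.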

In [43]:
def get_Tab7():
    ER_trend003,_,_ = ERtrend_month(dClsprc_n,dDnvaltrd_n,ret,trade1d,0.03) # lambda = 0.03
    ER_trend001,_,_ = ERtrend_month(dClsprc_n,dDnvaltrd_n,ret,trade1d,0.01) # lambda = 0.01
    ER_trend005,_,_ = ERtrend_month(dClsprc_n,dDnvaltrd_n,ret,trade1d,0.05) # lambda = 0.05
    factor0 = pd.merge(ep_ratio[['Stkcd','mon_num','ep']], mktcap[['Stkcd','mon_num','Msmvosd']], on=['Stkcd','mon_num'])
    result = pd.DataFrame()
    for ER_trend in [ER_trend001, ER_trend003, ER_trend005]:
        factor = pd.merge(factor0, ER_trend, on=['Stkcd','mon_num'], how = 'left')
        factor_ret = pd.merge(ret_month[['Stkcd','Mretwd','mon_num']], factor, on=['Stkcd','mon_num'])
        factor_ret = get_group(factor_ret, 'Msmvosd', 2, 'ep', 3, 'ER', 3)
        Trend,_,_,_,_ = get_factor(factor_ret, 'Msmvosd', 'ep', 'ER')
        alpha= []
        tvalue = []
        mean, t = NWtest(Trend)
        alpha.append(mean)
        tvalue.append(t)
        for i in [[0],[0,1,2],[0,1,2,3]]:
            Y = Ch4.iloc[:,i]
            model = sm.OLS(Trend.values,sm.add_constant(Y).values).fit(cov_type='HAC',cov_kwds={'maxlags':4})
            alpha.append(model.params[0]*100)
            tvalue.append(abs(model.tvalues[0]))
        result = pd.concat([result, pd.DataFrame([alpha,tvalue])],axis = 0)
    result.columns = ['Mean','CAPM_α','LSY3_α','LSY4_α']
    result.index = [0.01,'',0.03,'',0.05,'']
    return result

Tab7 = get_Tab7()
Tab7.applymap(lambda x:round(x, 3))


Unnamed: 0,Mean,CAPM_α,LSY3_α,LSY4_α
0.01,0.717,0.718,0.418,0.282
,3.33,3.473,1.828,1.211
0.03,0.421,0.388,0.141,0.083
,1.676,1.559,0.56,0.328
0.05,0.389,0.434,0.322,0.242
,1.849,2.157,1.477,0.971


![10.png](https://i.loli.net/2020/06/20/4gOUWfj8EwxdTPl.png)

**Transaction costs:**     
We report the turnover rate and the break-even transaction costs (BETCs) of the trend factor (Trend) and of the turnover factor (PMO), following Grundy and Martin (2001) and Barroso and Santa-Clara (2015).      
1.Zero return: BETCs that would completely offset the returns or the risk-adjusted returns (CAPM alpha)    
2.5% Insignificant: BETCs that make the returns or the risk-adjusted returns insignificant at the 5% level
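
The two break-even definitions can be illustrated on simulated returns. The sketch below is a simplification under stated assumptions: it treats the cost as a constant monthly deduction, uses a plain t-statistic instead of a Newey-West one, and uses the arithmetic mean for the zero-return cost. The names `t_stat` and `break_even_cost` are hypothetical.

```python
import numpy as np

def t_stat(x):
    """Plain t-statistic of the mean (no Newey-West adjustment)."""
    x = np.asarray(x, dtype=float)
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

def break_even_cost(returns, t_crit=1.96, tol=1e-10):
    """Bisection for the monthly cost c at which t_stat(returns - c) equals t_crit."""
    returns = np.asarray(returns, dtype=float)
    # Assumes the raw returns are significant, so the root lies in [0, mean].
    lo, hi = 0.0, returns.mean()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_stat(returns - mid) > t_crit:
            lo = mid  # still significant: the break-even cost is larger
        else:
            hi = mid  # no longer significant: the break-even cost is smaller
    return 0.5 * (lo + hi)

rng = np.random.default_rng(3)
factor = rng.normal(loc=0.01, scale=0.02, size=163)  # simulated monthly factor returns
zero_return_cost = factor.mean()                     # cost that fully offsets the mean return
insig_cost = break_even_cost(factor)
print(zero_return_cost, insig_cost)
```

With a plain t-statistic the answer also has a closed form, mean minus 1.96 standard errors, because subtracting a constant leaves the standard deviation unchanged. Bisection remains useful when the t-statistic comes from a regression or a HAC estimator, as in `get_Cost_005`.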

In [44]:
def get_Cost_005(factor, adjust):
    a = 0.01
    b = -0.01
    while True:
        i = (a+b)/2
        factor0 = factor - i
        if adjust == 0:
            _, t = NWtest(factor0)
        else:
            model = sm.OLS(factor0.values,sm.add_constant(mkt).values).fit()
            t = model.tvalues[0]
        if t<2:
            a = i
        elif t>2.05:
            b = i
        else:
            break
    return i

def get_Tab8(factor,fac_name):
    result = pd.DataFrame(columns = ['Zero return', '5% Insignificant'], index = [fac_name+'-Return',fac_name+'-CAPM alpha'])
    result.iloc[0,0] = pow(((factor+1).prod(axis = 0)),1/163)-1
    result.iloc[0,1] = get_Cost_005(factor, 0)
    model = sm.OLS(factor.values,sm.add_constant(mkt).values).fit(cov_type='HAC',cov_kwds={'maxlags':4})
    result.iloc[1,0] = pow((((factor-model.params[1]*mkt)+1).prod(axis = 0)),1/163)-1
    result.iloc[1,1] = get_Cost_005(factor, 1)
    return result*100

Tab8 = pd.concat([get_Tab8(Trend,'Trend'),get_Tab8(Ch4.loc[:,'PMO'],'PMO')], axis = 0).applymap(lambda x:round(x, 2))
Tab8

Unnamed: 0,Zero return,5% Insignificant
Trend-Return,0.49,0.12
Trend-CAPM alpha,0.52,0.13
PMO-Return,0.81,0.34
PMO-CAPM alpha,0.97,0.44


![11.png](https://i.loli.net/2020/06/20/qfEibc12AKawMDp.png)

**Performance after controlling for firm characteristics:**      
Consider the performance of the trend factor after controlling for size, EP, BM, beta, R_1, R_6_2, R_12_2, IVOL, ILLIQ, and turnover.    
1.first sort the stocks by one of the control variables into five quintile groups, and then in each quintile stocks are further sorted into five trend quintile portfolios      
2.then average the resulting 5\*5 portfolios across the five quintiles of the control variable to form five new trend quintile portfolios
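
The sequential sort described above can be sketched with grouped transforms. This is an illustration on simulated data under stated assumptions: the helper names `_quintile` and `control_then_trend` and the column names `month`, `cap`, `control`, `trend`, and `ret` are hypothetical, and portfolios are value weighted.

```python
import numpy as np
import pandas as pd

def _quintile(s):
    """Quintile label 0..4 within a group."""
    return pd.qcut(s, 5, labels=False, duplicates="drop")

def control_then_trend(panel, control, trend="trend", ret="ret", cap="cap", month="month"):
    """Dependent 5x5 sort; trend-quintile returns averaged over the control quintiles."""
    df = panel.dropna(subset=[control, trend, ret, cap]).copy()
    # First sort: control quintiles within each month.
    df["g_control"] = df.groupby(month)[control].transform(_quintile)
    # Second sort: trend quintiles within each month and control quintile.
    df["g_trend"] = df.groupby([month, "g_control"])[trend].transform(_quintile)
    # Value-weighted return of each of the 25 portfolios.
    df["ret_x_cap"] = df[ret] * df[cap]
    sums = df.groupby([month, "g_control", "g_trend"])[["ret_x_cap", cap]].sum()
    vw = sums["ret_x_cap"] / sums[cap]
    # Average across control quintiles, leaving five trend portfolios per month.
    out = vw.groupby(level=[month, "g_trend"]).mean().unstack("g_trend")
    out["H-L"] = out[4] - out[0]
    return out

# Simulated panel: 2 months, 500 stocks per month.
rng = np.random.default_rng(4)
n_months, n_stocks = 2, 500
demo = pd.DataFrame({
    "month": np.repeat(np.arange(1, n_months + 1), n_stocks),
    "cap": rng.lognormal(size=n_months * n_stocks),
    "control": rng.normal(size=n_months * n_stocks),
    "trend": rng.normal(size=n_months * n_stocks),
})
demo["ret"] = 0.01 * demo["trend"]  # returns rise with the trend signal by construction
result = control_then_trend(demo, "control")
print(result)
```

Because the trend sort happens inside each control quintile, every trend portfolio holds a similar mix of the control characteristic, so the H-L spread is not driven by that characteristic.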

In [45]:
# Load firm characteristics
beta = pd.read_csv('beta.csv',index_col = 0).T.iloc[60:-17,:]
valid = [x for x in beta.columns if x in ret.columns]
beta = beta.loc[:, valid]
beta.index = range(1,164)
idio = pd.read_csv('idio_ff_1m.csv',index_col = 0).iloc[95:-17,:]
idio.index = range(1,164)
import re
idio.columns = [re.sub(r"\D", "", x) for x in idio.columns]  # keep only the digits of each stock code
idio.columns = pd.to_numeric(idio.columns)
valid = sorted([x for x in idio.columns if x in ret.columns])
idio = idio.loc[:, valid]

In [46]:
#the prior month return
r_1 = ret.shift().iloc[60:,:]
r_1.index = range(1,164)

def get_R(ret, L):
    ret0 = ret+1
    X = pd.DataFrame()
    for i in range(61,224):
        temp = pd.DataFrame(ret0.iloc[i-L-1:i-2,:].prod(axis = 0)).T
        X = pd.concat([X, temp],axis=0)
    X.index = range(1,164)
    return X
#the past six-month cumulative return skipping the last month
#the past twelve-month cumulative return skipping the last month
r_6_2 = get_R(ret, 6)
r_12_2 = get_R(ret, 12)

In [48]:
def get_Tab9(data1,data2,rt):
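    '''
    Double-sort stocks on two variables and report value-weighted portfolio
    returns with Newey-West t-statistics (maxlags=4 in the code below).
    
    Parameters
    data1: first sorting variable; panel with three columns (stock code, month index, value)
    data2: second sorting variable, same format
    rt: stock returns, same format
    
    Returns
    DataFrame of average returns (%) for groups 1-5, the 5-1 spread and the average across data1 groups, each followed by a row of t-statistics
    ''' 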
    # grouping
    X = pd.DataFrame()
    data = pd.merge(data1,data2,on=['Stkcd','mon_num'])
    data = pd.merge(data,rt,on=['Stkcd','mon_num'])
    data = data.dropna()
    data['ret*mkt'] = data['Mretwd']*data['Msmvosd']
    data1_name = data1.columns[2]
    data2_name = data2.columns[2]
    
    for i in range(1,164):
        temp_value = data[data['mon_num'] == i].copy()  # copy to avoid SettingWithCopyWarning
        temp_value['sample'] = pd.qcut(temp_value['Msmvosd'],[0,0.3,1],labels=False,duplicates='drop')
        X = pd.concat([X,temp_value],axis=0)# panel data with group index
    data = X[(X['sample'] == 1)]
    del data['sample']
    X = pd.DataFrame()
    for i in range(1,164):
        temp_value = data[data['mon_num'] == i].copy()  # copy to avoid SettingWithCopyWarning
        temp_value['group1'] = pd.qcut(temp_value[data1_name],5,labels=False,duplicates='drop')
        x = pd.DataFrame()
        for j in range(5):
            temp_value2 = temp_value[temp_value['group1']==j].copy()
            temp_value2['group2'] = pd.qcut(temp_value2[data2_name],5,labels=False,duplicates='drop')
            x = pd.concat([x,temp_value2],axis=0)
        X = pd.concat([X,x],axis=0)# panel data with group index
    temp = (X.groupby(['mon_num','group1','group2'])['ret*mkt'].sum()/X.groupby(['mon_num','group1','group2'])['Msmvosd'].sum()).reset_index()
    temp = temp.rename(columns={0:'rt_w'})
    temp = pd.pivot_table(temp,index = ['mon_num','group2'],columns = 'group1')['rt_w'].reset_index()
    # avg and 5-1
    temp.loc[:,5] = temp.iloc[:,2:7].mean(axis=1)
    df = pd.DataFrame()
    for i in list(set(temp['mon_num'])):
        x = temp[temp['mon_num'] == i].reset_index(drop=True)
        x.loc[5,0:] = x.iloc[4,2:]-x.iloc[0,2:]
        x.iloc[5,:2] = [i,5]
        df = pd.concat([df,x],axis=0)
    df = df.reset_index(drop=True)
    # Average return of every group, in percent
    table = df.groupby('group2').mean().reset_index(drop=True)*100
    del table['mon_num']
    # Newey-West t-statistics for every group, including the 5-1 spread and the average column
    avg_t = pd.DataFrame(index = range(6), columns = range(6))
    for j in range(6):
        temp1 = df[df['group2']==j]
        temp1.index = temp1['mon_num']
        for i in range(6):
            model = sm.OLS(temp1.iloc[:,2+i],[1]*len(temp1.iloc[:,2+i]), missing='drop').fit(cov_type='HAC',cov_kwds={'maxlags':4})
            avg_t.iloc[j,i] = model.tvalues[0]
    tableWhole = pd.DataFrame()
    for i in range(6):
        tableWhole = pd.concat([tableWhole,table.T.iloc[i,:],avg_t.T.iloc[i,:]],axis=1)
    return tableWhole.T

Tab9PanelA = get_Tab9(mktcap[['Stkcd','mon_num','Msmvosd']],ER_trend,ret_month[['Stkcd','Mretwd','mon_num']])
Tab9PanelA.columns = ['Trend-L',2,3,4,'Trend-H','Trend-H-L']
Tab9PanelA.index = ['Size-Small','',2,'',3,'',4,'','Size-Big','','Size-Average','']
Tab9PanelA.applymap(lambda x:round(x, 2))


Unnamed: 0,Trend-L,2,3,4,Trend-H,Trend-H-L
Size-Small,1.73,2.38,2.43,2.57,2.0,0.27
,1.69,2.31,2.37,2.58,2.15,1.01
2,1.37,1.84,1.85,2.19,1.87,0.5
,1.43,1.89,1.9,2.2,1.97,2.16
3,1.71,1.78,1.68,1.88,1.47,-0.24
,1.78,1.86,1.76,1.99,1.58,-0.96
4,1.24,1.54,1.42,1.69,1.57,0.33
,1.32,1.6,1.45,1.76,1.68,1.14
Size-Big,0.94,1.46,1.53,1.22,1.53,0.58
,1.08,1.57,1.7,1.43,1.8,1.9


![12.png](https://i.loli.net/2020/06/20/dzGKeOEx8nj6CVJ.png)

In [50]:
def get_Tab9B(control_list,name_list):
    i=0
    X = pd.DataFrame()
    for control in control_list:
        control = control.stack().reset_index().rename(columns={'level_0': 'mon_num','level_1': 'Stkcd',0: name_list[i]})
        i = i+1
        control = pd.merge(control,mktcap[['Stkcd','mon_num','Msmvosd']],on=['Stkcd','mon_num'])
        temp = get_Tab9(control,ER_trend,ret_month[['Stkcd','Mretwd','mon_num']]).iloc[-2:,:]
        X = pd.concat([X,temp],axis=0)
    X.index = ['ep-Average','','bm-Average','','beta-Average','','R-1-Average','','R-6-2-Average','','R-12-2-Average','','IVOL-Average','','illq-Average','','turnover-Average','']
    X.columns = ['Trend-L',2,3,4,'Trend-H','Trend-H-L']
    return X

Tab9B = get_Tab9B([ep,bm,beta,r_1,r_6_2,r_12_2,idio,illq,ret_turn],['ep','bm','beta','R-1','R-6-2','R-12-2','IVOL','illq','ret_turn'])  # idio (IVOL) now precedes illq to match the labels
Tab9B.applymap(lambda x:round(x, 2))


Unnamed: 0,Trend-L,2,3,4,Trend-H,Trend-H-L
ep-Average,0.94,1.17,1.38,1.54,1.59,0.66
,1.02,1.34,1.49,1.71,1.78,2.53
bm-Average,1.14,1.28,1.52,1.59,1.79,0.65
,1.27,1.42,1.67,1.79,1.96,2.54
beta-Average,1.13,1.22,1.36,1.6,1.65,0.52
,1.16,1.29,1.47,1.76,1.82,1.82
R-1-Average,1.13,1.4,1.55,1.86,1.82,0.69
,1.21,1.57,1.74,2.01,2.07,2.19
R-6-2-Average,1.33,1.37,1.37,1.63,1.66,0.32
,1.39,1.48,1.52,1.86,1.9,1.14


![13.png](https://i.loli.net/2020/06/20/uZBjbo8Sze1IAXd.png)

**Conclusion:**    
The evidence suggests that our trend factor captures unique features of the Chinese stock market that cannot be replicated by the usual firm characteristics.

**The US evidence:(No data)**     
1.Even in the US, volume still provides incremental predictive information beyond price, albeit a small amount.    
2.Price trends are more important than volume trends in the US, whereas volume plays a larger role in China. This is consistent with the fact that trading in the Chinese stock market is dominated by individual investors.     

**Trend and the participation of retail investors:(No data)**    
We use the shareholding ratio of retail investors in each stock (data from the WIND database) to approximate the share of uninformed investors in the risky asset, and find that the greater the retail investor participation, the better the trend factor performs.     

**Trend and volatility of noise trader demand:**     
We construct the normalized residual volatility of trading volume to measure noise trader demand volatility (regress the monthly trading volume in month t on that in month t - 1 over the past 12 months), and find that the trend factor earns significantly higher returns in stocks with greater volatility of noise trader demand.
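
One possible reading of this measure, for a single stock, is sketched below. It is an assumption-heavy illustration: the notes do not state how the residual volatility is normalized, so dividing by the mean volume is a guess, and the helper `residual_volume_volatility` and the simulated volume series are hypothetical.

```python
import numpy as np

def residual_volume_volatility(volume):
    """Std of AR(1) residuals of monthly volume, scaled by mean volume (scaling is an assumption)."""
    v = np.asarray(volume, dtype=float)
    y, x = v[1:], v[:-1]                       # volume in month t and in month t - 1
    X = np.column_stack([np.ones_like(x), x])  # intercept plus lagged volume
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid.std(ddof=2) / v.mean()

rng = np.random.default_rng(5)
smooth = 100.0 * 1.05 ** np.arange(13)               # 13 months give 12 regression observations
noisy = smooth * np.exp(0.3 * rng.normal(size=13))   # same trend with simulated noise-trader shocks
print(residual_volume_volatility(smooth), residual_volume_volatility(noisy))
```

A perfectly predictable volume path gives a value near zero, while erratic volume gives a larger value. Stocks would then be sorted on this value to compare trend factor returns across groups.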