## Fama-French Three-Factor Model

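### Basic form of the FF model
The Fama-French model holds that a portfolio's return is jointly determined by three factors: a market factor, a size (market value) factor, and a book-to-market (B/M) factor. Each factor corresponds to the return of a portfolio: the market factor is the return of the market portfolio; the size factor is the return of a portfolio that is long small-cap companies and short large-cap companies; and the book-to-market factor is the return of a portfolio that is long high-B/M companies and short low-B/M companies. The model takes the form:
$$E(R_{it})-R_{ft}=b_{i}[E(R_{mt})-R_{ft}] + s_{i}E(SMB_{t})+h_{i}E(HML_{t})$$
where SMB (Small Minus Big) is the size factor, HML (High Minus Low) is the book-to-market factor, $R_{mt}$ is the return of the market portfolio, and $R_{ft}$ is the risk-free rate. According to the FF three-factor model, if a portfolio's return is fully explained by the three factors, its abnormal return (the part not explained by the factors) is zero.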

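In practice, however, empirical results show that portfolio returns are often not fully explained by the three-factor model. Empirical work therefore usually estimates the regression
$$R_{it}-R_{ft}=\alpha_{i} + b_{i}[R_{mt}-R_{ft}] + s_{i}SMB_{t}+h_{i}HML_{t}+\epsilon_{it}$$
where $\alpha_{i}$ is the abnormal return left unexplained by the three factors.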
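
To make the empirical form concrete, here is a minimal sketch that estimates $\alpha_i$, $b_i$, $s_i$ and $h_i$ by ordinary least squares. Everything in it is an assumption for illustration: the factor series are random numbers, the "true" loadings are made up, and `numpy.linalg.lstsq` stands in for whatever regression tool one would actually use.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 120  # number of monthly observations (synthetic)

# Synthetic factor returns: market excess return, SMB, HML
mkt = rng.normal(0.01, 0.05, T)
smb = rng.normal(0.002, 0.03, T)
hml = rng.normal(0.003, 0.03, T)

# Synthetic portfolio excess return generated from known (made-up) loadings
true_alpha, true_b, true_s, true_h = 0.001, 1.1, 0.4, -0.2
excess_ret = (true_alpha + true_b * mkt + true_s * smb + true_h * hml
              + rng.normal(0, 0.01, T))

# Design matrix: a column of ones for the intercept (alpha), then the three factors
X = np.column_stack([np.ones(T), mkt, smb, hml])
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
alpha_hat, b_hat, s_hat, h_hat = coef
print(alpha_hat, b_hat, s_hat, h_hat)
```

With real data, `mkt`, `smb` and `hml` would be the factor series built in the steps that follow, and an estimated intercept close to zero would indicate that the three factors explain the portfolio's return.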

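#### Step 1: Sort by market value

Sort all stocks in the universe by market value in ascending order. Stocks below the median form the Small group; stocks above the median form the Big group.

#### Step 2: Sort by book-to-market ratio

Sort all stocks in the universe by book-to-market ratio in ascending order. The bottom 30% form the Low group, the middle 40% the Medium group, and the top 30% the High group.

In Fama and French's paper, stocks are classified using market value at the end of June each year and the book-to-market ratio at the end of the preceding year. June is used because by then US-listed companies have disclosed their annual reports for that year, and the year-end date is used because the grouping is based on calendar years. For A-shares, annual reports are due by the end of April, so studies of the A-share market usually adjust these dates accordingly.

Suppose we apply the three-factor model to the CSI 300. We first sort the 300 constituents by market value at the end of April each year and split them at the 50% quantile, giving half small caps (S) and half large caps (B). We then use year-end book-to-market data to split the stocks into three groups: 30% high value (H), 40% medium value (M), and 30% low value (L). Because such studies usually span several years and listed companies keep changing, the groups need to be rebuilt every twelve months.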

In [1]:
import numpy as np
import pandas as pd
import datetime
import time
from jqdata import *

In [2]:
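# Fetch factor values at the start of the month
def get_factors(fdate,factors):
    stock_set = get_index_stocks('000001.XSHG',fdate) # 000001.XSHG is the SSE Composite Index
    q = query(
        valuation.code,
        # B/M = total owner's equity / total market cap; the extra division by 1e8
        # puts both on the same scale (market_cap is quoted in units of 100 million yuan)
        balance.total_owner_equities/valuation.market_cap/100000000,
        #balance.total_owner_equities, # balance-sheet item
        #total_owner_equities is the sum of share capital, capital reserve, surplus reserve
        #and retained earnings; it represents the shareholders' claim on the company's assets.
        valuation.market_cap, # total market cap
        valuation.circulating_market_cap # free-float market cap
        ).filter(
        valuation.code.in_(stock_set),
        #valuation.circulating_market_cap
    )
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    return fdf.iloc[:,-len(factors):]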

In [3]:
factors = ['B/M','MARKET_CAP', 'CMV']
fdf = get_factors('2015-04-30',factors)
fdf.head()

                  B/M  MARKET_CAP        CMV
code
600000.XSHG  0.856574   3370.6824  2696.5459
600004.XSHG  0.534592      171.58     171.58
600005.XSHG  0.567502    651.0488   651.0488
600006.XSHG  0.514272       156.0      156.0
600007.XSHG  0.295437     182.016    182.016


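Anonymous functions (lambda): an anonymous function is a function or subroutine defined without an identifier (a function name). In Python, `lambda` is a simple way to define such a function in a single line.

- Lambda functions are usually simple: the single-expression body means a lambda cannot express complex logic and is suited only to very simple tasks. Because what it does is obvious at a glance, it does not need a name of its own.

- A lambda function can take any number of arguments (including optional ones) and returns the value of a single expression.

- A lambda may contain only one expression, not statements; the value of that expression is the function's return value.

The basic syntax of a lambda expression is:

- lambda arg1,arg2,arg3… :<expression>

- arg1/arg2/arg3 are the function's parameters (its inputs); the expression plays the role of the function body, and its value is the result of the call.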

In [4]:
# Test a lambda function
f=lambda a,b,c,d:a*b*c*d
print(f(1,2,3,4))  # equivalent to the function below

def test01(a,b,c,d):
    return a*b*c*d
print(test01(1,2,3,4))

24
24


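Some Python functions take another function as an argument. Typical examples are:

- filter: the lambda specifies the condition for keeping elements. For example, filter(lambda x: x % 3 == 0, [1, 2, 3]) keeps the elements of [1, 2, 3] that are divisible by 3. In Python 3 it returns an iterator; wrapping it in list() gives [3].

- sorted: the lambda specifies the sort key. For example, sorted([1, 2, 3, 4, 5, 6, 7, 8, 9], key=lambda x: abs(5-x)) sorts the list by each element's distance from 5, in ascending order; the result is [5, 4, 6, 3, 7, 2, 8, 1, 9].

- map: the lambda specifies the operation applied to every element. For example, map(lambda x: x+1, [1, 2, 3]) adds 1 to each element of [1, 2, 3]. It also returns an iterator in Python 3; wrapping it in list() gives [2, 3, 4].

- reduce: the lambda specifies how two elements are combined. In Python 3 it is no longer a built-in and must be imported with from functools import reduce. For example, reduce(lambda a, b: '{}, {}'.format(a, b), [1, 2, 3, 4, 5, 6, 7, 8, 9]) combines the elements of [1, 2, 3, 4, 5, 6, 7, 8, 9] from left to right into a comma-separated string; the result is '1, 2, 3, 4, 5, 6, 7, 8, 9'.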
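
The four cases listed above can be checked with a short sketch. It assumes Python 3, where `filter` and `map` return iterators and `reduce` lives in `functools`; the variable names are made up for illustration.

```python
from functools import reduce  # reduce is not a built-in in Python 3

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# filter: keep elements divisible by 3 (list() materialises the iterator)
divisible_by_3 = list(filter(lambda x: x % 3 == 0, [1, 2, 3]))

# sorted: order by distance from 5; ties keep their original relative order
by_distance = sorted(nums, key=lambda x: abs(5 - x))

# map: add 1 to every element
plus_one = list(map(lambda x: x + 1, [1, 2, 3]))

# reduce: fold the list from left to right into one comma-separated string
joined = reduce(lambda a, b: '{}, {}'.format(a, b), nums)

print(divisible_by_3, by_distance, plus_one, joined)
```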

In [99]:
'''Because lambda is anonymous, combining map with lambda keeps the code concise:
square every item of list_x = [1, 2, 3, 4, 5, 6, 7, 8]'''
list_x = [1, 2, 3, 4, 5, 6, 7, 8]
r = map(lambda x:x*x,list_x)
print(list(r))

[1, 4, 9, 16, 25, 36, 49, 64]


In [100]:
# map is also a method of pandas Series
# import pandas as pd
pd.Series(list_x).map(lambda x:x*x)

0     1
1     4
2     9
3    16
4    25
5    36
6    49
7    64
dtype: int64

In [5]:
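# Series.map works element by element on a single column, so we select the Series
# fdf['MARKET_CAP'], not the one-column DataFrame fdf[['MARKET_CAP']]
# Split by market cap; the median is computed once instead of once per row
cap_median = fdf['MARKET_CAP'].median()
fdf['SB'] = fdf['MARKET_CAP'].map(lambda x: 'B' if x >= cap_median else 'S')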

In [6]:
fdf.head()

                  B/M  MARKET_CAP        CMV  SB
code
600000.XSHG  0.856574   3370.6824  2696.5459   B
600004.XSHG  0.534592      171.58     171.58   B
600005.XSHG  0.567502    651.0488   651.0488   B
600006.XSHG  0.514272       156.0      156.0   B
600007.XSHG  0.295437     182.016    182.016   B


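As with Python lists, you can iterate over a DataFrame or Series with a for loop, but this is slow, especially on large datasets. pandas offers a more concise and typically faster alternative: the apply() method.

- DataFrame.apply(func, axis)

- Series.apply(func)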

In [102]:
df=pd.DataFrame(np.random.randn(4,3),columns=list('bde'),\
                index=['utah','ohio','texas','oregon'])
print(df)

# Apply a function to the 1-D arrays formed by each column or row, using DataFrame.apply
f=lambda x:x.max()-x.min()
# By default the function is applied column by column
t1=df.apply(f)
print(t1)
t2=df.apply(f,axis=1)
print(t2)

               b         d         e
utah   -1.409926 -0.717322 -2.628604
ohio    1.611696  1.472424 -0.737575
texas   0.932880  0.266641  0.131721
oregon  0.908448 -0.848910  0.134432
b    3.021621
d    2.321334
e    2.763036
dtype: float64
utah      1.911281
ohio      2.349271
texas     0.801159
oregon    1.757358
dtype: float64


In [14]:
# Split companies into high, medium and low book-to-market groups
border_down, border_up = fdf['B/M'].quantile([0.3, 0.7])
#border_down, border_up
fdf['HML'] = fdf['B/M'].map(lambda x: 'H' if x >= border_up else 'M')
fdf['HML'] = fdf.apply(lambda row: 'L' if row['B/M'] <= border_down else row['HML'],\
                       axis=1)

In [199]:
fdf.head()

                  B/M  MARKET_CAP        CMV  SB  HML
code
600000.XSHG  0.856574   3370.6824  2696.5459   B    H
600004.XSHG  0.534592      171.58     171.58   B    H
600005.XSHG  0.567502    651.0488   651.0488   B    H
600006.XSHG  0.514272       156.0      156.0   B    H
600007.XSHG  0.295437     182.016    182.016   B    M
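
The size and book-to-market splits can also be written without row-wise lambdas. The sketch below is one possible vectorised version using `numpy.where` and `numpy.select`; the ten-stock universe, its codes and its values are made up, and the cut-offs follow the same rules as the cells above (median for size, 30% and 70% quantiles for B/M, with Low taking priority when thresholds coincide).

```python
import numpy as np
import pandas as pd

# Synthetic universe; codes and values are invented for illustration
toy = pd.DataFrame(
    {'B/M': [0.2, 0.9, 0.5, 0.4, 0.7, 0.1, 0.6, 0.3, 0.8, 1.0],
     'MARKET_CAP': [50, 400, 120, 80, 900, 30, 300, 60, 700, 150]},
    index=['stock%02d' % i for i in range(10)])

# Size split at the median (>= median is Big)
toy['SB'] = np.where(toy['MARKET_CAP'] >= toy['MARKET_CAP'].median(), 'B', 'S')

# B/M split at the 30% and 70% quantiles; the first matching condition wins
border_down, border_up = toy['B/M'].quantile([0.3, 0.7])
toy['HML'] = np.select(
    [toy['B/M'] <= border_down, toy['B/M'] >= border_up],
    ['L', 'H'],
    default='M')

# Number of stocks in each of the six size/value portfolios
print(toy.groupby(['SB', 'HML']).size())
```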


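#### Step 3: Combined grouping

Combining the results of Step 1 and Step 2 yields six portfolios: S/L, S/M, S/H, B/L, B/M and B/H, where the first letter denotes the size group and the second the book-to-market group. Because the study spans several years and listed companies keep changing, these groups are rebuilt every twelve months.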

In [15]:
# Split the universe into six portfolios
fdf_SL = fdf.query('(SB=="S") & (HML=="L")')
fdf_SM = fdf.query('(SB=="S") & (HML=="M")')
fdf_SH = fdf.query('(SB=="S") & (HML=="H")')
fdf_BL = fdf.query('(SB=="B") & (HML=="L")')
fdf_BM = fdf.query('(SB=="B") & (HML=="M")')
fdf_BH = fdf.query('(SB=="B") & (HML=="H")')

In [107]:
type(fdf_SL.index)

pandas.core.indexes.base.Index

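#### Step 4: Compute portfolio returns
Each portfolio's return is the market-value-weighted average of the returns of its constituent stocks. Taking the B/M portfolio as an example:
$$BM_{t} = \sum_{k=1}^{K}\frac{M_{kt}}{\sum_{j=1}^{K}M_{jt}}R_{kt}$$
where $K$ is the number of stocks in the portfolio, $M_{kt}$ is the market value of stock $k$, and $R_{kt}$ is the return of stock $k$.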
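
As a small worked example of the weighting formula, the sketch below computes a value-weighted return with pandas. The prices, market values and codes are made up, and it deliberately aligns returns and weights on the stock code rather than on row position, which is an illustrative design choice and not necessarily how the notebook's own function works.

```python
import pandas as pd

# Made-up start/end prices and market values, all indexed by stock code
p_start = pd.Series({'A': 10.0, 'B': 20.0, 'C': 5.0})
p_end = pd.Series({'C': 5.5, 'A': 9.0, 'B': 22.0})  # deliberately in a different order
mv = pd.Series({'A': 100.0, 'B': 300.0, 'C': 50.0, 'D': 999.0})  # D is not in the portfolio

ret = p_end / p_start - 1      # pandas aligns the two Series on the code labels
w = mv.reindex(ret.index)      # keep only the weights of the portfolio's own stocks
vw_return = (ret * w).sum() / w.sum()
print(vw_return)
```

Here the returns are -10%, +10% and +10% with weights 100, 300 and 50, so the result is (-10 + 30 + 5) / 450.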

In [9]:
def caculate_port_return(port,startdate,enddate,nextdate,CMV):
    # Value-weighted return of the stocks in `port`, measured from the first trading
    # day on or after startdate to the first trading day on or after enddate.
    # Returns and weights are paired by position, so CMV should contain exactly the
    # stocks in `port`, in the same order as the rows returned by get_price.
    date_list1 = get_trade_days(start_date=startdate,end_date=enddate)
    date_list2 = get_trade_days(start_date=enddate,  end_date=nextdate)
    close1     = get_price(port, date_list1[0], date_list1[-1], 'daily', ['close'], \
                           panel=False)
    # closing prices on the first trading day of the starting month
    close1_m   = close1.loc[close1.time ==date_list1[0].strftime('%Y-%m-%d'),:]
    close2     = get_price(port, date_list2[0], date_list2[-1], 'daily',['close'],\
                           panel=False)
    # closing prices on the first trading day of the following month
    close2_m   = close2.loc[close2.time ==date_list2[0].strftime('%Y-%m-%d'),:]
    CMV        = CMV.reset_index(drop=True)
    # one-month return of each stock
    m_return   = close2_m['close'].reset_index(drop=True)/\
                  close1_m['close'].reset_index(drop=True)-1
    # weight each return by free-float market value
    weighted_m_return = (m_return*CMV).sum()/(CMV.sum())
    return weighted_m_return

In [229]:
SL_t = caculate_port_return(fdf_SL.index.tolist(),\
                            '2015-06-01','2015-07-01','2015-08-01',fdf_SL['CMV'])
SL_t

-0.18951292792229663

In [4]:
startdate = '2015-06-01'
enddate   = '2015-07-01'
nextdate  = '2015-08-01'

In [9]:
date_list1 = get_trade_days(start_date=startdate,end_date=enddate)
date_list2 = get_trade_days(start_date=enddate,  end_date=nextdate)

In [11]:
date_list2

array([2015-07-01, 2015-07-02, 2015-07-03, 2015-07-06, 2015-07-07,
       2015-07-08, 2015-07-09, 2015-07-10, 2015-07-13, 2015-07-14,
       2015-07-15, 2015-07-16, 2015-07-17, 2015-07-20, 2015-07-21,
       2015-07-22, 2015-07-23, 2015-07-24, 2015-07-27, 2015-07-28,
       2015-07-29, 2015-07-30, 2015-07-31], dtype=object)

In [16]:
close1=get_price(fdf_SL.index.tolist(), date_list1[0], date_list1[-1], 'daily', ['close'],\
                           panel=False)

In [17]:
close1

         time         code  close
0  2015-06-01  600053.XSHG  12.49
1  2015-06-02  600053.XSHG  12.49
2  2015-06-03  600053.XSHG  12.49
3  2015-06-04  600053.XSHG  12.49
4  2015-06-05  600053.XSHG  12.49
5  2015-06-08  600053.XSHG  12.49
6  2015-06-09  600053.XSHG  12.49
7  2015-06-10  600053.XSHG  12.49
8  2015-06-11  600053.XSHG  12.49
9  2015-06-12  600053.XSHG  12.49


In [18]:
close1_m   = close1.loc[close1.time ==date_list1[0].strftime('%Y-%m-%d'),:]

In [19]:
close1_m

           time         code  close
0    2015-06-01  600053.XSHG  12.49
22   2015-06-01  600055.XSHG  23.23
44   2015-06-01  600069.XSHG   5.40
66   2015-06-01  600071.XSHG  29.02
88   2015-06-01  600072.XSHG  39.39
110  2015-06-01  600076.XSHG  14.62
132  2015-06-01  600083.XSHG  14.16
154  2015-06-01  600088.XSHG  39.12
176  2015-06-01  600090.XSHG  23.23
198  2015-06-01  600091.XSHG  15.20


In [20]:
close2=get_price(fdf_SL.index.tolist(), date_list2[0], date_list2[-1], 'daily',['close'],\
                           panel=False)
close2_m   = close2.loc[close2.time ==date_list2[0].strftime('%Y-%m-%d'),:]

In [21]:
close2_m

           time         code  close
0    2015-07-01  600053.XSHG  12.49
23   2015-07-01  600055.XSHG  19.73
46   2015-07-01  600069.XSHG   4.97
69   2015-07-01  600071.XSHG  22.81
92   2015-07-01  600072.XSHG  25.84
115  2015-07-01  600076.XSHG  11.02
138  2015-07-01  600083.XSHG  13.59
161  2015-07-01  600088.XSHG  24.43
184  2015-07-01  600090.XSHG  16.40
207  2015-07-01  600091.XSHG  12.83


In [22]:
CMV = fdf_SL['CMV'].reset_index(drop=True)

In [23]:
m_return   = close2_m['close'].reset_index(drop=True)/\
close1_m['close'].reset_index(drop=True)-1

In [24]:
weighted_m_return = (m_return*CMV).sum()/(CMV.sum())
weighted_m_return

-0.18951292792229663

In [187]:
# Compute the return of each portfolio
SL_t = caculate_port_return(fdf_SL.index.tolist(),'2015-06-01','2015-07-01','2015-08-01',\
                            fdf_SL['CMV'])
SM_t = caculate_port_return(fdf_SM.index.tolist(),'2015-06-01','2015-07-01','2015-08-01',\
                            fdf_SM['CMV'])
SH_t = caculate_port_return(fdf_SH.index.tolist(),'2015-06-01','2015-07-01','2015-08-01',\
                            fdf_SH['CMV'])

In [188]:
BL_t = caculate_port_return(fdf_BL.index.tolist(),'2015-06-01','2015-07-01','2015-08-01',\
                            fdf_BL['CMV'])
BM_t = caculate_port_return(fdf_BM.index.tolist(),'2015-06-01','2015-07-01','2015-08-01',\
                            fdf_BM['CMV'])
BH_t = caculate_port_return(fdf_BH.index.tolist(),'2015-06-01','2015-07-01','2015-08-01',\
                            fdf_BH['CMV'])

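#### Step 5: Compute SMB and HML

SMB is the average return of the three small portfolios minus that of the three big portfolios. HML is the average return of the two high-B/M portfolios minus that of the two low-B/M portfolios, that is, long high B/M and short low B/M.

$$SMB_{t} = \frac{1}{3}(SL_{t}+SM_{t}+SH_{t})-\frac{1}{3}(BL_{t}+BM_{t}+BH_{t})$$
$$HML_{t} = \frac{1}{2}(SH_{t}+BH_{t})-\frac{1}{2}(SL_{t}+BL_{t})$$

In [189]:
SMB_t = (SL_t + SM_t + SH_t)/3 - (BL_t + BM_t + BH_t)/3
HML_t = (SH_t + BH_t)/2    - (SL_t + BL_t)/2   # High minus Low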
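
For a quick numerical check of the two factor definitions, here is a minimal sketch. The helper name `smb_hml` and the six returns are hypothetical; HML is taken as high minus low, matching the model description (long high-B/M, short low-B/M).

```python
def smb_hml(r):
    """r maps 'SL', 'SM', 'SH', 'BL', 'BM', 'BH' to one-period portfolio returns."""
    # Small minus Big: average of the small portfolios minus average of the big ones
    smb = (r['SL'] + r['SM'] + r['SH']) / 3 - (r['BL'] + r['BM'] + r['BH']) / 3
    # High minus Low: average of the high-B/M portfolios minus average of the low ones
    hml = (r['SH'] + r['BH']) / 2 - (r['SL'] + r['BL']) / 2
    return smb, hml

# Made-up returns, for illustration only
example = {'SL': 0.02, 'SM': 0.03, 'SH': 0.05, 'BL': 0.01, 'BM': 0.02, 'BH': 0.03}
smb, hml = smb_hml(example)
print(smb, hml)
```

With these inputs SMB is 0.10/3 - 0.06/3 and HML is 0.04 - 0.015.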

## Using a Single Factor

In [5]:
score = fdf['CMV'].sort_values()
score.head()

code
603010.XSHG    6.6200
603009.XSHG    7.4729
603268.XSHG    7.4976
603519.XSHG    7.7000
603601.XSHG    8.6564
Name: CMV, dtype: float64

In [6]:
len_5 = int(len(score)/5)
port1 = list(score.index)[: len_5]
port2 = list(score.index)[ len_5: 2*len_5]
port3 = list(score.index)[ 2*len_5: -2*len_5]
port4 = list(score.index)[ -2*len_5: -len_5]
port5 = list(score.index)[ -len_5: ]

In [7]:
def caculate_benchmark_return(port,startdate,enddate,nextdate):
    date_list1 = get_trade_days(start_date=startdate,end_date=enddate)
    date_list2 = get_trade_days(start_date=enddate,  end_date=nextdate)
    close1     = get_price(port, date_list1[0], date_list1[-1], 'daily', ['close'], \
                           panel=False)
    close1_m   = close1.loc[close1.time ==date_list1[0].strftime('%Y-%m-%d'),:]
    close2     = get_price(port, date_list2[0], date_list2[-1], 'daily',['close'],\
                           panel=False)
    close2_m   = close2.loc[close2.time ==date_list2[0].strftime('%Y-%m-%d'),:]
    m_return   = close2_m['close'].reset_index(drop=True)/\
                  close1_m['close'].reset_index(drop=True)-1
    # The benchmark is a single index, so the sum is simply that index's return
    weighted_m_return = m_return.sum()
    return weighted_m_return

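Compute the monthly return of the stocks in each market-value group. Note that `caculate_port_return` pairs returns with weights by position, and the cell below passes the full-universe `fdf['CMV']` for every portfolio. Each stock is therefore not weighted by its own market value, and the denominator is the market value of the whole universe; a true value-weighted return requires restricting and aligning the weights to the portfolio's own stocks.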

In [10]:
df = {}
CMV = fdf['CMV']
benchmark_return = caculate_benchmark_return(['000001.XSHG'],startdate,enddate,nextdate)
df['port1'] = caculate_port_return(port1,startdate,enddate,nextdate,CMV)
df['port2'] = caculate_port_return(port2,startdate,enddate,nextdate,CMV)
df['port3'] = caculate_port_return(port3,startdate,enddate,nextdate,CMV)
df['port4'] = caculate_port_return(port4,startdate,enddate,nextdate,CMV)
df['port5'] = caculate_port_return(port5,startdate,enddate,nextdate,CMV)
print(pd.Series(df))
print('benchmark_return %s'%benchmark_return)

port1   -0.047702
port2   -0.028474
port3   -0.037784
port4   -0.036370
port5   -0.048579
dtype: float64
benchmark_return -0.16050563915224259


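We now consider a simple investment strategy: rebalance monthly by market value and always hold the stocks in a fixed market-value group. Note: 1) the market-value data used are those of the day before startdate; because startdate is defined as the first day of each month, the universe is split using the previous month's market value; 2) the return computed is the return of the current month.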

In [21]:
factors = ['B/M','MARKET_CAP', 'CMV']

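# In the research module, fundamentals default to the day before the research date,
# so we build our own date series and fetch the data month by month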
year = ['2009','2010','2011','2012','2013','2014','2015']
month = ['01','02','03','04','05','06','07','08','09','10','11','12']
result = {}

for i in range(7*12):
    startdate = year[i//12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)//12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2016-01-01'
    try:
        nextdate = year[(i+2)//12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2016-01-01':
            nextdate = '2016-02-01'
        else:
            nextdate = '2016-01-01'
    print('time %s'%startdate)
    fdf = get_factors(startdate,factors)
    CMV = fdf['CMV']
    # 5 portfolios, one benchmark, 3 factors
    df = pd.DataFrame(np.zeros(6*len(factors)).reshape(6,len(factors)),\
                   index = ['port1','port2','port3','port4','port5','benchmark'],\
                   columns = factors)
    for fac in factors:
        score = fdf[fac].sort_values()
        len_5 = int(len(score)/5)
        port1 = list(score.index)[: len_5]
        port2 = list(score.index)[ len_5: 2*len_5]
        port3 = list(score.index)[ 2*len_5: -2*len_5]
        port4 = list(score.index)[ -2*len_5: -len_5]
        port5 = list(score.index)[ -len_5: ]
        df.loc['port1',fac] = caculate_port_return(port1,startdate,enddate,nextdate,CMV)
        df.loc['port2',fac] = caculate_port_return(port2,startdate,enddate,nextdate,CMV)
        df.loc['port3',fac] = caculate_port_return(port3,startdate,enddate,nextdate,CMV)
        df.loc['port4',fac] = caculate_port_return(port4,startdate,enddate,nextdate,CMV)
        df.loc['port5',fac] = caculate_port_return(port5,startdate,enddate,nextdate,CMV)
        df.loc['benchmark',fac] = caculate_benchmark_return(['000001.XSHG'],startdate,\
                                                            enddate,nextdate)
        print('factor %s'%fac)
    result[startdate] = df
#monthly_return = pd.Panel(result)

time 2009-01-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-02-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-03-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-04-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-05-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-06-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-07-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-08-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-09-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-10-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-11-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-12-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-01-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-02-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-03-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-04-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-05-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-06-01
factor B/M
factor MARKET_CAP
fac

In [11]:
14//12

1
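
The loop above builds its dates by indexing the `year` and `month` lists with `i//12` and `i%12`, and catches `IndexError` when the index runs past the last year. A sketch of an alternative, assuming pandas is available, is to let `pd.date_range` generate the month starts directly; the helper name `month_triples` is hypothetical.

```python
import pandas as pd

def month_triples(first, n_months):
    """Return (startdate, enddate, nextdate) strings for n_months consecutive months."""
    # n_months + 2 month starts are needed so the last month still has an enddate and a nextdate
    starts = pd.date_range(first, periods=n_months + 2, freq='MS').strftime('%Y-%m-%d')
    return [(starts[i], starts[i + 1], starts[i + 2]) for i in range(n_months)]

triples = month_triples('2009-01-01', 7 * 12)
print(triples[0])
print(triples[-1])
```

The last triple ends in 2016 without any special-casing, which is what the `except IndexError` branches handle by hand.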

The 'try' block lets you test a block of code for errors.

The 'except' block lets you handle the error.

In [13]:
# x is not defined
try:
  print(x)
except:
  print("An exception occurred")

An exception occurred


In [12]:
print(x)

NameError: name 'x' is not defined

In [14]:
try:
  print(x)
except NameError:
  print("Variable x is not defined")
except:
  print("Something else went wrong")

Variable x is not defined


In [37]:
try:
  print("Hello")
except:
  print("Something went wrong")
else:
  print("Nothing went wrong")

Hello
Nothing went wrong


In [15]:
factors = ['B/M','MARKET_CAP', 'CMV']

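# In the research module, fundamentals default to the day before the research date,
# so we build our own date series and fetch the data month by month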
year = ['2009','2010']
month = ['01','02','03','04','05','06','07','08','09','10','11','12']
result = {}
#dateind = []

for i in range(len(year)*len(month)):
    startdate = year[i//12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)//12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2011-01-01'
    try:
        nextdate = year[(i+2)//12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2011-01-01':
            nextdate = '2011-02-01'
        else:
            nextdate = '2011-01-01'
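    # %s formats an object as a string and inserts it at the %s placeholder.
    # The % character marks the start of a conversion specifier: the format string goes
    # to the left of %, and the value to be formatted goes to the right.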
    print('time %s'%startdate)
    fdf = get_factors(startdate,factors)
    CMV = fdf['CMV']
    # 5 portfolios, one benchmark, 3 factors
    df = pd.DataFrame(np.zeros(6*len(factors)).reshape(6,len(factors)),\
                   index = ['port1','port2','port3','port4','port5','benchmark'],\
                   columns = factors)
    for fac in factors:
        score = fdf[fac].sort_values()
        len_5 = int(len(score)/5)
        port1 = list(score.index)[: len_5]
        port2 = list(score.index)[ len_5: 2*len_5]
        port3 = list(score.index)[ 2*len_5: -2*len_5]
        port4 = list(score.index)[ -2*len_5: -len_5]
        port5 = list(score.index)[ -len_5: ]
        df.loc['port1',fac] = caculate_port_return(port1,startdate,enddate,nextdate,CMV)
        df.loc['port2',fac] = caculate_port_return(port2,startdate,enddate,nextdate,CMV)
        df.loc['port3',fac] = caculate_port_return(port3,startdate,enddate,nextdate,CMV)
        df.loc['port4',fac] = caculate_port_return(port4,startdate,enddate,nextdate,CMV)
        df.loc['port5',fac] = caculate_port_return(port5,startdate,enddate,nextdate,CMV)
        df.loc['benchmark',fac] = caculate_benchmark_return(['000001.XSHG'],startdate,\
                                                            enddate,nextdate)
        print('factor %s'%fac)
    result[startdate] = df

time 2009-01-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-02-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-03-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-04-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-05-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-06-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-07-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-08-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-09-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-10-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-11-01
factor B/M
factor MARKET_CAP
factor CMV
time 2009-12-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-01-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-02-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-03-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-04-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-05-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-06-01
factor B/M
factor MARKET_CAP
factor CMV
time 2010-07-01
factor B/M
fac

In [16]:
result

{'2009-01-01':                 B/M  MARKET_CAP       CMV
 port1      0.039124    0.046456  0.047465
 port2      0.041022    0.054009  0.052614
 port3      0.052947    0.056120  0.040766
 port4      0.038200    0.056400  0.046542
 port5      0.057812    0.054762  0.055300
 benchmark  0.069633    0.069633  0.069633,
 '2009-02-01':                 B/M  MARKET_CAP       CMV
 port1      0.028673    0.041435  0.033686
 port2      0.009645    0.028787  0.027942
 port3      0.015562    0.033632  0.033435
 port4      0.022094    0.031442  0.029802
 port5      0.026895    0.011912  0.015857
 benchmark  0.040648    0.040648  0.040648,
 '2009-03-01':                 B/M  MARKET_CAP       CMV
 port1      0.093686    0.075040  0.078816
 port2      0.077238    0.078304  0.084600
 port3      0.066776    0.064671  0.078915
 port4      0.074310    0.072289  0.069646
 port5      0.073496    0.071329  0.060857
 benchmark  0.150264    0.150264  0.150264,
 '2009-04-01':                 B/M  MARKET_CAP      

In [17]:
monthly_return = pd.concat(result, axis = 0)

In [18]:
monthly_return

                       B/M  MARKET_CAP       CMV
2009-01-01 port1  0.039124    0.046456  0.047465
           port2  0.041022    0.054009  0.052614
           port3  0.052947    0.056120  0.040766
           port4  0.038200    0.056400  0.046542
           port5  0.057812    0.054762  0.055300
           benchmark  0.069633    0.069633  0.069633
2009-02-01 port1  0.028673    0.041435  0.033686
           port2  0.009645    0.028787  0.027942
           port3  0.015562    0.033632  0.033435
           port4  0.022094    0.031442  0.029802


In [19]:
monthly_return.index

MultiIndex(levels=[['2009-01-01', '2009-02-01', '2009-03-01', '2009-04-01', '2009-05-01', '2009-06-01', '2009-07-01', '2009-08-01', '2009-09-01', '2009-10-01', '2009-11-01', '2009-12-01', '2010-01-01', '2010-02-01', '2010-03-01', '2010-04-01', '2010-05-01', '2010-06-01', '2010-07-01', '2010-08-01', '2010-09-01', '2010-10-01', '2010-11-01', '2010-12-01'], ['port1', 'port2', 'port3', 'port4', 'port5', 'benchmark']],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23], [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2,

In [22]:
monthly_return.columns

Index(['B/M', 'MARKET_CAP', 'CMV'], dtype='object')

In [43]:
idx = pd.IndexSlice
return_CMV = monthly_return.loc[idx[:, :], idx['CMV']].unstack()

In [20]:
monthly_return['CMV']

2009-01-01  port1        0.047465
            port2        0.052614
            port3        0.040766
            port4        0.046542
            port5        0.055300
            benchmark    0.069633
2009-02-01  port1        0.033686
            port2        0.027942
            port3        0.033435
            port4        0.029802
            port5        0.015857
            benchmark    0.040648
2009-03-01  port1        0.078816
            port2        0.084600
            port3        0.078915
            port4        0.069646
            port5        0.060857
            benchmark    0.150264
2009-04-01  port1        0.199099
            port2        0.029968
            port3        0.037521
            port4        0.015147
            port5        0.045904
            benchmark    0.063077
2009-05-01  port1        0.024599
            port2        0.026452
            port3        0.017738
            port4        0.005977
            port5        0.005990
            be

In [21]:
return_CMV = monthly_return['CMV'].unstack()

In [22]:
return_CMV

                port1      port2      port3      port4      port5  benchmark
2009-01-01   0.047465   0.052614   0.040766   0.046542     0.0553   0.069633
2009-02-01   0.033686   0.027942   0.033435   0.029802   0.015857   0.040648
2009-03-01   0.078816     0.0846   0.078915   0.069646   0.060857   0.150264
2009-04-01   0.199099   0.029968   0.037521   0.015147   0.045904   0.063077
2009-05-01   0.024599   0.026452   0.017738   0.005977    0.00599   0.063037
2009-06-01   0.051858   0.018657   0.019232   0.034271   0.020074   0.105417
2009-07-01   0.065285   0.042574   0.064111   0.049174   0.054529    0.15107
2009-08-01  -0.052363   -0.05666  -0.054922  -0.071989  -0.064893  -0.224939
2009-09-01   0.037376   0.020742   0.021226   0.031873   0.026839   0.084957
2009-10-01   0.034087   0.028856    0.02462   0.023141   0.024307   0.056643


In [49]:
(return_CMV+1).cumprod()

                port1      port2      port3      port4      port5  benchmark
2009-01-01   1.047465   1.052614   1.040766   1.046542     1.0553   1.069633
2009-02-01    1.08275   1.082026   1.075564   1.077731   1.072034   1.113111
2009-03-01   1.168089   1.173566   1.160442   1.152791   1.137275   1.280371
2009-04-01   1.400654   1.208736   1.203983   1.170253   1.189481   1.361133
2009-05-01   1.435108   1.240709   1.225339   1.177247   1.196606   1.446935
2009-06-01    1.50953   1.263857   1.248904   1.217592   1.220627   1.599467
2009-07-01    1.60808   1.317665   1.328973   1.277466   1.287186   1.841098
2009-08-01   1.523876   1.243007   1.255984   1.185503   1.203656   1.426964
2009-09-01   1.580832   1.268789   1.282644   1.223289   1.235962   1.548194
2009-10-01   1.634717   1.305401   1.314222   1.251597   1.266004   1.635889
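
The reshaping steps used in this part of the notebook (`pd.concat` on a dict, selecting one factor, `unstack`, then `cumprod`) can be reproduced on a tiny example. The two monthly tables below are made up and only mimic the layout of `result`; this is a sketch of the mechanics, not a rerun of the analysis.

```python
import pandas as pd

# Two made-up monthly result tables with the same layout as `result`
rows = ['port1', 'port2', 'benchmark']
toy_result = {
    '2009-01-01': pd.DataFrame({'B/M': [0.01, 0.02, 0.03], 'CMV': [0.04, 0.05, 0.06]}, index=rows),
    '2009-02-01': pd.DataFrame({'B/M': [0.07, 0.08, 0.09], 'CMV': [0.10, 0.11, 0.12]}, index=rows),
}

# Dict keys become the outer index level, the row labels the inner level
stacked = pd.concat(toy_result, axis=0)

# Select one factor and pivot the inner level into columns: one row per month
cmv_wide = stacked['CMV'].unstack()

# Cumulative growth of one unit invested in each portfolio
growth = (cmv_wide + 1).cumprod()
print(growth)
```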
