# Alphalens and Pyfolio integration

Alphalens can simulate the performance of a portfolio in which the factor values are used to weight stocks. Once the portfolio is built, it can be analyzed with Pyfolio. For details on how this portfolio is built, see:
- alphalens.performance.factor_returns
- alphalens.performance.cumulative_returns 
- alphalens.performance.create_pyfolio_input
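
As a rough illustration of what "factor values are used to weight stocks" means, the sketch below builds a long-short portfolio for a single date with plain pandas. The helper `toy_factor_weights` and all numbers are hypothetical; this is not the Alphalens implementation, only a minimal version of the idea under the assumption that weights are proportional to demeaned factor values.

```python
import pandas as pd

# Made-up factor values and 1-period forward returns for one date.
factor = pd.Series({"A": 2.0, "B": 1.0, "C": -1.0, "D": -2.0})
forward_returns = pd.Series({"A": 0.02, "B": 0.01, "C": -0.01, "D": -0.03})


def toy_factor_weights(values: pd.Series, demeaned: bool = True) -> pd.Series:
    """Hypothetical helper: scale factor values so absolute weights sum to 1."""
    if demeaned:
        # Long-short portfolio: subtracting the mean makes the weights sum to zero.
        values = values - values.mean()
    return values / values.abs().sum()


weights = toy_factor_weights(factor)

# The portfolio return for the period is the weighted sum of asset returns.
portfolio_return = (weights * forward_returns).sum()
print(weights)
print(portfolio_return)
```

Repeating this for every date gives a return series, which is the kind of input Pyfolio analyzes.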

In [1]:
# %pylab inline --no-import-all
import alphalens
import pyfolio
import pandas as pd
import numpy as np
import datetime

First, load some stock data.

# Parameters

In [2]:
lookahead_bias_days = 5
start_date = "2019-01-01"
end_date = '2020-07-15'

# Data

In [3]:
from pathlib import Path

In [4]:
fp_sector = Path('sector.pkl')
fp_pricing = Path('b_open.pkl')

In [5]:
from zipline.research import get_pricing, get_sector_mappings

In [6]:
if not fp_sector.exists():
    # asset -> sector name
    sector_mappings = get_sector_mappings()
    # sample 100 distinct assets (replace=False avoids drawing the same asset twice)
    tickers = np.random.choice(list(sector_mappings.keys()), 100, replace=False)
    sector_mappings = {k:sector_mappings[k] for k in tickers}
    s = pd.Series(sector_mappings)
    s.to_pickle(str(fp_sector))
    
sector_mappings = pd.read_pickle(str(fp_sector)).to_dict()

In [7]:
if not fp_pricing.exists():
    pricing = get_pricing(sector_mappings.keys(), start_date, end_date, fields='b_open')
    pricing.to_pickle(str(fp_pricing))
    
pricing = pd.read_pickle(str(fp_pricing))

In [8]:
# mean reversion factor: the negated return over the last `lookahead_bias_days` days
factor = -pricing.pct_change(lookahead_bias_days)
# optional: introduce look-ahead bias and make the factor predictive
# factor = factor.shift(-lookahead_bias_days)
factor = factor.stack()
factor.index = factor.index.set_names(['date', 'asset'])

We compute a simple mean reversion factor from recent stock performance: the factor is the negated return over the last 5 days, so stocks that performed well over that window get a low rank and stocks that performed poorly get a high rank.
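
To make the sign of the factor concrete, here is a small sketch on made-up prices for two hypothetical tickers, `UP` and `DOWN`. It applies the same `-pricing.pct_change(...)` expression as the cell above and then ranks the assets on each date. Because the return is negated, the stock that rallied ends up with the lower rank.

```python
import pandas as pd

window = 5  # plays the role of lookahead_bias_days in the notebook

dates = pd.date_range("2019-01-01", periods=7, freq="D", tz="UTC")
prices = pd.DataFrame(
    {
        "UP": [10, 11, 12, 13, 14, 15, 16],       # steadily rising
        "DOWN": [10, 9.5, 9, 8.5, 8, 7.5, 7],     # steadily falling
    },
    index=dates,
    dtype=float,
)

# Negated trailing return: recent winners get low values, recent losers high values.
toy_factor = -prices.pct_change(window)

# Cross-sectional rank on each date (1 = lowest factor value).
ranks = toy_factor.rank(axis=1)

print(toy_factor.dropna())
print(ranks.dropna())
```

The first `window` rows are NaN because there is not yet enough history to compute the trailing return.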

The pricing data passed to alphalens should contain the entry price for the assets, so it must reflect the next available price after a factor value was observed at a given timestamp. Those prices must not be used in the calculation of the factor values for that time. Always double-check that you are not introducing lookahead bias into your study.

The pricing data must also contain the exit price for the assets: for period 1 the price at the next timestamp will be used, for period 2 the price after 2 timestamps will be used, and so on.

There are no restrictions or assumptions on the time frequency at which a factor is computed, nor on the specific time at which a factor is traded (trading at the open vs. trading at the close vs. intraday trading). It is only required that the factor and price DataFrames are properly aligned according to the rules above.

In our example, before trading starts each day, we observe yesterday's factor values. The price we pass to alphalens is the next available price after that factor observation: the daily open price, which is used as the assets' entry price. We are not adding any additional prices, so the assets' exit price will be the open price of a following day (how many days later depends on the 'periods' argument). The returns computed by Alphalens will therefore be based on the assets' open prices.
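
The entry and exit rule can be sketched with a few made-up open prices. The helper `toy_forward_returns` below is hypothetical and only illustrates the arithmetic described above, assuming one row per trading day: the entry price is the price at the factor timestamp, and the exit price is the price `period` rows later.

```python
import pandas as pd

dates = pd.date_range("2019-01-02", periods=5, freq="B", tz="UTC")
open_prices = pd.Series([100.0, 102.0, 101.0, 104.0, 106.0], index=dates)


def toy_forward_returns(prices: pd.Series, period: int) -> pd.Series:
    """Hypothetical helper: return from the price at t to the price `period` rows later."""
    exit_prices = prices.shift(-period)  # price observed `period` timestamps ahead
    return exit_prices / prices - 1


fwd_1 = toy_forward_returns(open_prices, 1)  # enter at today's open, exit at the next open
fwd_2 = toy_forward_returns(open_prices, 2)  # exit at the open two days later

print(pd.DataFrame({"1D": fwd_1, "2D": fwd_2}))
```

The last `period` rows have no exit price and come out as NaN, which is one reason some entries are dropped when forward returns are computed.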

In [9]:
pricing = pricing.iloc[1:]
pricing.head()

b_open,四方精创(300468),华纺股份(600448),开尔新材(300234),云天化(600096),中国核电(601985),光弘科技(300735),宇环数控(002903),优刻得(688158),火炬电子(603678),久远银海(002777),...,森霸传感(300701),新劲刚(300629),昊海生科(688366),南天信息(000948),上海雅仕(603329),安宁股份(002978),益佰制药(600594),威创股份(002308),凯发电气(300407),中电环保(300172)
2019-01-03 00:00:00+00:00,14.407,5.125,10.73,8.53,5.4,8.816,25.633,,38.046,25.432,...,20.982,23.146,,9.103,15.338,,58.343,19.28,30.053,27.323
2019-01-04 00:00:00+00:00,14.223,5.208,10.692,8.39,5.371,8.189,24.322,,38.579,25.638,...,20.121,23.056,,8.957,14.993,,58.125,19.24,28.4,27.274
2019-01-07 00:00:00+00:00,14.906,5.233,10.972,8.65,5.469,8.693,25.594,,40.881,27.123,...,21.449,25.495,,9.394,15.535,,60.262,20.19,31.337,28.001
2019-01-08 00:00:00+00:00,15.037,5.158,11.082,8.75,5.547,8.977,25.822,,42.639,27.465,...,21.581,28.123,,9.456,15.772,,60.793,21.23,31.865,28.05
2019-01-09 00:00:00+00:00,15.051,5.15,11.268,8.72,5.547,9.261,25.762,,41.967,27.356,...,21.423,26.664,,9.433,15.811,,60.793,20.88,32.064,28.364


# Prepare data and run Alphalens

Pyfolio requires the timezone to be set to UTC; otherwise it refuses to work.

In [10]:
# Verify tz == 'UTC'
pricing.index.tz

<UTC>

In [11]:
factor.index.levels[0].tz

<UTC>
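
If either check had returned `None` or a different timezone, the index would need to be adjusted before running the analysis. The sketch below shows one way to do that with pandas; `ensure_utc` is a hypothetical helper, not part of Alphalens or Pyfolio, and the sample frame is made up.

```python
import pandas as pd


def ensure_utc(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Hypothetical helper: return the same timestamps as a UTC-aware index."""
    if index.tz is None:
        # Naive timestamps: attach UTC without shifting the clock time.
        return index.tz_localize("UTC")
    # Aware timestamps: convert to UTC (this shifts the clock time).
    return index.tz_convert("UTC")


# Made-up price frame with a timezone-naive index.
naive_prices = pd.DataFrame(
    {"X": [1.0, 2.0]},
    index=pd.to_datetime(["2019-01-03", "2019-01-04"]),
)
naive_prices.index = ensure_utc(naive_prices.index)
print(naive_prices.index.tz)
```

For a factor with a (date, asset) MultiIndex, the same helper could be applied to the date level through `index.set_levels(..., level=0)`. Note that converting daily data stamped at local midnight moves the timestamps to a different clock time, so localizing naive dates is usually the simpler route.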

In [12]:
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(factor,
                                                                   pricing,
                                                                   periods=(1, 3, 5),
                                                                   quantiles=5,
                                                                   bins=None)

Dropped 1.4% entries from factor data: 1.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!


In [13]:
alphalens.tears.create_summary_tear_sheet(factor_data)

# Prepare data for Pyfolio

The Alphalens analysis shows that quantiles 1 and 5 are the most predictive, so we'll build the portfolio using only those quantiles.

In [14]:
pf_returns, pf_positions, pf_benchmark = \
    alphalens.performance.create_pyfolio_input(factor_data,
                                               period='1D',
                                               capital=100000,
                                               long_short=True,
                                               group_neutral=False,
                                               equal_weight=True,
                                               quantiles=[1,5],
                                               groups=None,
                                               benchmark_period='1D')
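
To give an intuition for `quantiles=[1,5]` combined with `long_short=True` and `equal_weight=True`, the sketch below builds equal weights for one date by hand: short the bottom quantile, long the top quantile, with each side taking half of the gross exposure. The data is made up and the logic is an assumption about the general idea, not a copy of what `create_pyfolio_input` does internally.

```python
import pandas as pd

# Made-up factor values for one date, already bucketed into 5 quantiles.
toy = pd.DataFrame(
    {
        "factor": [-0.9, -0.5, -0.1, 0.0, 0.2, 0.4, 0.8, 1.0, 1.5, 2.0],
        "factor_quantile": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    },
    index=list("ABCDEFGHIJ"),
)

# Keep only the extreme quantiles.
selected = toy[toy["factor_quantile"].isin([1, 5])]

# Short quantile 1, long quantile 5.
side = selected["factor_quantile"].map({1: -1.0, 5: 1.0})

# Split each side equally among its assets; each side gets half of gross exposure.
assets_per_side = side.groupby(side).transform("count")
weights = side / assets_per_side / 2

print(weights)
```

With two assets on each side, every position is 25% of the portfolio in absolute terms, and the long and short sides cancel out.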

Now that we have prepared the data, we can run the Pyfolio functions.

In [15]:
pyfolio.tears.create_full_tear_sheet(pf_returns,
                                     positions=pf_positions,
                                     benchmark_rets=pf_benchmark)

Start date,2019-01-09
End date,2020-07-08
Total months,26
,Backtest
Annual return,16.912%
Cumulative returns,40.378%
Annual volatility,9.899%
Sharpe ratio,1.63
Calmar ratio,1.57
Stability,0.70
Max drawdown,-10.796%
Omega ratio,1.42
Sortino ratio,2.39
Skew,-0.53


Worst drawdown periods,Net drawdown in %,Peak date,Valley date,Recovery date,Duration
0,10.8,2020-01-14,2020-02-25,2020-07-06,125.0
1,3.35,2019-01-22,2019-01-31,2019-02-12,16.0
2,2.53,2020-07-07,2020-07-08,NaT,
3,2.44,2019-07-15,2019-08-16,2019-08-28,33.0
4,2.4,2019-02-14,2019-02-25,2019-02-28,11.0


Stress events,mean,min,max
COVID-19,-0.37%,-4.14%,1.14%


Top 10 long positions of all time (asset),max
10,2.78%
603,2.78%
829,2.78%
2013,2.78%
2103,2.78%
2182,2.78%
2207,2.78%
2219,2.78%
2242,2.78%
2251,2.78%


Top 10 short positions of all time (asset),max
603,-2.94%
2207,-2.94%
2251,-2.94%
2367,-2.94%
2368,-2.94%
2777,-2.94%
2778,-2.94%
300522,-2.94%
300735,-2.94%
601128,-2.94%


Top 10 positions of all time (asset),max
603,2.94%
2207,2.94%
2251,2.94%
2367,2.94%
2368,2.94%
2777,2.94%
2778,2.94%
300522,2.94%
300735,2.94%
601128,2.94%


## Analyzing subsets of data

Sometimes it is useful to analyze subsets of your factor data. For example, it could be interesting to compare how the factor behaves on different days of the week. Below we select and analyze the factor data corresponding to Mondays; the positions will be held for a period of 5 days.

In [16]:
monday_factor_data = factor_data[factor_data.index.get_level_values('date').weekday == 0]
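
The same weekday filter can be tried on a tiny made-up frame with a (date, asset) MultiIndex to see exactly which rows survive. The names `toy_factor_data` and `mondays_only` are illustrative only.

```python
import pandas as pd

# Ten business days starting on Monday 2019-01-07, two hypothetical assets.
dates = pd.date_range("2019-01-07", periods=10, freq="B", tz="UTC")
index = pd.MultiIndex.from_product([dates, ["A", "B"]], names=["date", "asset"])
toy_factor_data = pd.DataFrame({"factor": list(range(len(index)))}, index=index)

# weekday is 0 for Monday through 6 for Sunday.
weekdays = toy_factor_data.index.get_level_values("date").weekday
mondays_only = toy_factor_data[weekdays == 0]

print(mondays_only)
```

Changing the `0` to another value selects a different day of the week, which makes it easy to repeat the comparison for each weekday.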

In [17]:
pf_returns, pf_positions, pf_benchmark = \
    alphalens.performance.create_pyfolio_input(monday_factor_data,
                                               period='5D',
                                               capital=100000,
                                               long_short=True,
                                               group_neutral=False,
                                               equal_weight=True,
                                               quantiles=[1,5],
                                               groups=None,
                                               benchmark_period='1D')

In [18]:
pyfolio.tears.create_full_tear_sheet(pf_returns,
                                     positions=pf_positions,
                                     benchmark_rets=pf_benchmark)

Start date,2019-01-14
End date,2020-07-06
Total months,25
,Backtest
Annual return,4.615%
Cumulative returns,10.15%
Annual volatility,8.514%
Sharpe ratio,0.57
Calmar ratio,0.53
Stability,0.27
Max drawdown,-8.747%
Omega ratio,1.30
Sortino ratio,0.88
Skew,0.72


Worst drawdown periods,Net drawdown in %,Peak date,Valley date,Recovery date,Duration
0,8.75,2019-03-17,2019-08-12,2020-01-06,211
1,6.41,2020-01-19,2020-02-03,2020-03-30,51
2,5.72,2020-04-12,2020-05-18,2020-06-15,46
3,2.65,2019-01-20,2019-01-28,2019-02-25,26
4,0.13,2020-06-21,2020-06-22,2020-06-29,6


Stress events,mean,min,max
COVID-19,-0.26%,-3.56%,0.69%


Top 10 long positions of all time (asset),max
2103,2.78%
2353,2.78%
2368,2.78%
2389,2.78%
2681,2.78%
2825,2.78%
2903,2.78%
300252,2.78%
300389,2.78%
300468,2.78%


Top 10 short positions of all time (asset),max
2207,-2.78%
2219,-2.78%
2242,-2.78%
2251,-2.78%
2308,-2.78%
2515,-2.78%
2670,-2.78%
2777,-2.78%
300420,-2.78%
300434,-2.78%


Top 10 positions of all time (asset),max
2103,2.78%
2207,2.78%
2219,2.78%
2242,2.78%
2251,2.78%
2308,2.78%
2353,2.78%
2368,2.78%
2389,2.78%
2515,2.78%
