* Set the `DB_URI` environment variable to point to the database.
* Set the `DATAYES_TOKEN` environment variable to your DataYes (通联数据) login token.

In [None]:
%matplotlib inline
import os
from matplotlib import pyplot as plt
import uqer
import numpy as np
import pandas as pd
from uqer import DataAPI as api
from alphamind.api import *
from alphamind.data.neutralize import neutralize

plt.style.use('ggplot')

In [None]:
_ = uqer.Client(token=os.environ['DATAYES_TOKEN'])

In [None]:
ref_date = '2017-06-23'
factor = 'EPS'

engine = SqlEngine(os.environ['DB_URI'])
universe = Universe('custom', ['zz800'])

# Algorithm Description
--------------------------

Conjectured formula for the ``neutralize`` residual $\bar Res$:

$$\bar Res_{i,k} = \bar f_{i,k} - \sum_j \beta_{j,k} \times \bar Ex_{i, j, k}$$

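Here $k$ is the industry classification, $i$ indexes the $i$-th stock within that industry, and $j$ indexes the $j$-th risk factor. $\bar f$ is the factor series and $\bar Ex$ is the risk exposure matrix. The coefficients $\beta_{j,k}$ are determined by OLS.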
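
The formula above can be sketched with NumPy on synthetic data. This is only an illustration of the conjectured formula, not the UQER or alpha-mind implementation: the helper `neutralize_by_group` and all array names are hypothetical, and it fits a separate OLS per industry as the index $k$ on $\beta_{j,k}$ suggests. The cells later in this notebook instead run one pooled regression with industry columns in the exposure matrix.

```python
import numpy as np


def neutralize_by_group(exposure, factor_values, groups):
    """Per-group OLS residual: Res = f - Ex @ beta, with beta fitted inside each group.

    Hypothetical helper for illustration only.
    """
    residual = np.empty_like(factor_values, dtype=float)
    for g in np.unique(groups):
        mask = groups == g
        # Least-squares coefficients beta_{j,k} for this industry k
        beta, *_ = np.linalg.lstsq(exposure[mask], factor_values[mask], rcond=None)
        residual[mask] = factor_values[mask] - exposure[mask] @ beta
    return residual


# Synthetic data: 120 stocks, 3 risk factors, 4 industries of 30 stocks each
rng = np.random.default_rng(0)
n_stocks, n_risk = 120, 3
exposure = rng.standard_normal((n_stocks, n_risk))
groups = np.repeat(np.arange(4), n_stocks // 4)
factor_values = rng.standard_normal(n_stocks)

residual = neutralize_by_group(exposure, factor_values, groups)
```

Within each industry, the residual is orthogonal to every risk-factor column, which is the property the exposure check later in the notebook relies on.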

In the sections below, we compare three ``neutralize`` approaches:

* **UQER Neutralize**

    Computes the factor residual with the UQER (优矿) SDK.


* **Alpha-Mind Neutralize**

    Computes the factor residual with alpha-mind, which can be installed from:
    
    ```
    https://github.com/wegamekinglc/alpha-mind
    ```

* **Direct Weighted Least Squares Fit Implementation**

    Computes the factor residual directly with the weighted least squares (WLS) regression in statsmodels.


# Raw Data
---------------------------

In [None]:
codes = engine.fetch_codes(ref_date, universe)
factor_data = engine.fetch_factor(ref_date, factor, codes)
risk_cov, risk_exposure = engine.fetch_risk_model(ref_date, codes)
# Join factor values with risk exposures on stock code and drop incomplete rows
total_data = pd.merge(factor_data, risk_exposure, on=['code']).dropna()

In [None]:
total_data['ticker'] = total_data.code.apply(lambda x: '{0:06}'.format(x))
total_data.set_index('ticker', inplace=True)

In [None]:
len(total_data)

# UQER Neutralize
-----------------------

In [None]:
%%timeit
neutralized_factor_uqer = uqer.neutralize(total_data[factor],
                                          target_date=ref_date.replace('-', ''),
                                          industry_type='short')

In [None]:
neutralized_factor_uqer = uqer.neutralize(total_data[factor],
                                          target_date=ref_date.replace('-', ''),
                                          industry_type='short')
df = pd.DataFrame(neutralized_factor_uqer, columns=['uqer'])
df.head(10)

In [None]:
len(neutralized_factor_uqer)

In [None]:
risk_exposure_uqer = uqer.DataAPI.RMExposureDayGet(tradeDate=ref_date.replace('-', '')).set_index('ticker')
targeted_secs = risk_exposure_uqer.loc[neutralized_factor_uqer.index]

style_exposure = neutralized_factor_uqer.values @ targeted_secs[risk_styles].values
industry_exposure = neutralized_factor_uqer.values @ targeted_secs[industry_styles].values

exposure = pd.Series(np.concatenate([style_exposure, industry_exposure]), index=risk_styles+industry_styles)
exposure
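
As a reference point for reading the exposures above: if the neutralization is an OLS projection onto exactly these style and industry columns, every entry should be zero up to floating-point error. The sketch below shows this on synthetic data. All names (`x_demo`, `y_demo`, and so on) are hypothetical, and it does not assume anything about how UQER builds its exposures internally.

```python
import numpy as np

rng = np.random.default_rng(42)
n_stocks, n_styles, n_industries = 300, 4, 5

# Synthetic style exposures plus one-hot industry dummies
style_x = rng.standard_normal((n_stocks, n_styles))
industry_x = np.eye(n_industries)[rng.integers(0, n_industries, n_stocks)]
x_demo = np.hstack([style_x, industry_x])
y_demo = rng.standard_normal(n_stocks)

# Pooled OLS residual
beta, *_ = np.linalg.lstsq(x_demo, y_demo, rcond=None)
resid_demo = y_demo - x_demo @ beta

# Same check as in the cell above: residual dotted with each exposure column
exposure_demo = resid_demo @ x_demo
max_abs_exposure = np.abs(exposure_demo).max()
```

Entries that are clearly away from zero in the real data would suggest that the regressors or the stock universe differ from the ones used in the check.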

# Alpha-Mind Neutralize
--------------------------

In [None]:
x = total_data[risk_styles + industry_styles].values
y = total_data[factor].values

In [None]:
%%timeit
neutralized_factor_alphamind = neutralize(x, y, weights=np.ones(len(y)))

In [None]:
neutralized_factor_alphamind = neutralize(x, y, weights=np.ones(len(y)))
alphamind_series = pd.Series(neutralized_factor_alphamind.flatten(), index=total_data.index)
df['alpha-mind'] = alphamind_series
df.head()

In [None]:
len(alphamind_series)

# Tickers Missing in UQER but Still in Alpha-Mind
-----------------------------------

In [None]:
missed_codes = [c for c in alphamind_series.index if c not in neutralized_factor_uqer.index]

In [None]:
total_data.loc[missed_codes]
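
The list comprehension above can also be written as a set difference on the indexes. A minimal sketch with made-up tickers, assuming both indexes hold zero-padded ticker strings:

```python
import pandas as pd

# Hypothetical tickers for illustration only
full_index = pd.Index(['000001', '000002', '600000', '600036'])
partial_index = pd.Index(['000001', '600000'])

# Elements of full_index that are absent from partial_index (sorted, unique)
missing = full_index.difference(partial_index)
```

Note that `Index.difference` returns sorted unique values, whereas the list comprehension keeps the original order.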

# Direct Weighted Least Squares Fit Implementation
------------------------

In [None]:
import statsmodels.api as sm

In [None]:
mod = sm.WLS(y, x, weights=np.ones(len(y))).fit()
lg_series = pd.Series(mod.resid, index=total_data.index)

In [None]:
df['ols'] = lg_series
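
Because every weight is 1, the weighted fit above reduces to ordinary least squares, which is why the column is labelled `ols`. The sketch below illustrates this with NumPy only. It is not the statsmodels implementation, and `wls_residual` is a hypothetical helper: it solves the weighted problem by scaling each row by the square root of its weight.

```python
import numpy as np


def wls_residual(x, y, weights):
    """Residual y - x @ beta, where beta minimizes sum(w * (y - x @ beta)**2).

    Hypothetical helper for illustration only.
    """
    sqrt_w = np.sqrt(weights)
    # Scaling rows by sqrt(w) turns the weighted problem into a plain least-squares one
    beta, *_ = np.linalg.lstsq(x * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return y - x @ beta


rng = np.random.default_rng(1)
x_demo = rng.standard_normal((50, 3))
y_demo = rng.standard_normal(50)

# Unit weights: identical to the unweighted OLS residual
unit_resid = wls_residual(x_demo, y_demo, np.ones(len(y_demo)))
beta_ols, *_ = np.linalg.lstsq(x_demo, y_demo, rcond=None)
ols_resid = y_demo - x_demo @ beta_ols
```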

# Comparison
------------------

In [None]:
df['uqer - ols'] = df['uqer'] - df['ols']
df['alphamind - ols'] = df['alpha-mind'] - df['ols']

In [None]:
df[['uqer - ols', 'alphamind - ols']].plot(figsize=(14, 7), ylim=(-1e-4, 1e-4))
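
A numeric summary can complement the plot. The sketch below uses a hypothetical helper, `summarize_gap`, on made-up arrays. It skips NaN entries in case the two series do not cover the same tickers.

```python
import numpy as np


def summarize_gap(a, b):
    """Max absolute difference and RMSE between two residual vectors, ignoring NaN pairs.

    Hypothetical helper for illustration only.
    """
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    valid = ~np.isnan(diff)
    return {
        'n_compared': int(valid.sum()),
        'max_abs_diff': float(np.abs(diff[valid]).max()),
        'rmse': float(np.sqrt(np.mean(diff[valid] ** 2))),
    }


# Made-up residuals; the last pair is dropped because one side is NaN
demo_a = np.array([0.10, -0.20, 0.05, np.nan])
demo_b = np.array([0.10, -0.20001, 0.05, 0.30])
summary = summarize_gap(demo_a, demo_b)
```

In this notebook the same helper could be called as `summarize_gap(df['uqer'], df['ols'])` once those columns exist.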

In [None]:
df.head()