The `data` folder of alpha-mind provides utility functions for ranking factor data and computing its quantiles.

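### Factor ranking: *rank*
- Ranks values in ascending order and returns each value's 0-based rank.
- Ranking can be done across the whole cross-section or within each industry (group).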
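To make the ranking semantics concrete, here is a minimal NumPy sketch under stated assumptions. `rank_sketch` is a hypothetical helper, not alpha-mind's source: it computes 0-based ascending ranks by applying `argsort` twice and, when `groups` is given, ranks each group separately. Ties are broken by original position, which may not match alpha-mind's tie handling.

```python
import numpy as np


def rank_sketch(x, groups=None):
    """Hypothetical helper: 0-based ascending rank, optionally within groups.

    Assumption: ties are broken by original position (stable sort); this may
    differ from how alphamind.data.rank.rank treats ties.
    """
    x = np.asarray(x, dtype=float)
    if groups is None:
        # First argsort: indices in sorted order; second argsort inverts that
        # permutation, giving each element's position in the sorted sequence.
        return np.argsort(np.argsort(x, kind="stable"), kind="stable")
    groups = np.asarray(groups)
    ranks = np.empty(x.size, dtype=int)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)  # positions belonging to group g
        ranks[idx] = np.argsort(np.argsort(x[idx], kind="stable"), kind="stable")
    return ranks


x = np.array([0.3, 0.1, 0.4, 0.2])
print(rank_sketch(x))                       # [2 0 3 1]
print(rank_sketch(x, groups=[1, 1, 2, 2]))  # [1 0 1 0]
```

The double `argsort` is a common NumPy idiom: the first call returns the order, the second turns that order into ranks, so no explicit sorting loop is needed.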

In [1]:
```python
import numpy as np
import pandas as pd
from alphamind.data.rank import rank

# Suppose there are 10 stocks, each with 2 factors, forming a 10 x 2 matrix
factors = pd.DataFrame(np.random.rand(10, 2))
factors.columns = ['factor_1', 'factor_2']
factors['rank_1'] = rank(factors['factor_1'].values)
factors['rank_2'] = rank(factors['factor_2'].values)

factors
```


|   | factor_1 | factor_2 | rank_1 | rank_2 |
|---|---|---|---|---|
| 0 | 0.441676 | 0.559719 | 4.0 | 8.0 |
| 1 | 0.984315 | 0.765652 | 9.0 | 9.0 |
| 2 | 0.303978 | 0.050506 | 2.0 | 0.0 |
| 3 | 0.528663 | 0.153304 | 5.0 | 2.0 |
| 4 | 0.533509 | 0.509512 | 6.0 | 7.0 |
| 5 | 0.634676 | 0.075323 | 7.0 | 1.0 |
| 6 | 0.255377 | 0.221123 | 1.0 | 3.0 |
| 7 | 0.803129 | 0.423008 | 8.0 | 5.0 |
| 8 | 0.322453 | 0.449833 | 3.0 | 6.0 |
| 9 | 0.198229 | 0.22945 | 0.0 | 4.0 |


In [2]:
```python
# Suppose there are 10 stocks, each with 1 factor
factors = pd.DataFrame(np.random.rand(10, 1))
factors.columns = ['factor_1']

# Suppose the 10 stocks belong to two industries: the first 5 to one, the last 5 to the other
industry = np.concatenate([np.array([1.0]*5), np.array([2.0]*5)])

factors['rank'] = rank(factors['factor_1'].values, groups=industry)
factors
```

|   | factor_1 | rank |
|---|---|---|
| 0 | 0.473528 | 1 |
| 1 | 0.036662 | 0 |
| 2 | 0.699256 | 2 |
| 3 | 0.939615 | 4 |
| 4 | 0.762472 | 3 |
| 5 | 0.054182 | 0 |
| 6 | 0.90634 | 4 |
| 7 | 0.141024 | 1 |
| 8 | 0.387577 | 2 |
| 9 | 0.868477 | 3 |
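As a cross-check, assuming pandas is available, the same within-industry ranks can be reproduced with pandas' own `groupby(...).rank`. This is a pandas alternative, not alpha-mind's implementation; `method="first"` (tie-breaking by position) is an assumption about how ties should be treated.

```python
import pandas as pd

# Values and industry labels copied from the table above
df = pd.DataFrame({
    "factor_1": [0.473528, 0.036662, 0.699256, 0.939615, 0.762472,
                 0.054182, 0.90634, 0.141024, 0.387577, 0.868477],
    "industry": [1.0] * 5 + [2.0] * 5,
})

# pandas ranks are 1-based floats; convert to int and subtract 1 for 0-based ranks
df["rank_pd"] = df.groupby("industry")["factor_1"].rank(method="first").astype(int) - 1
print(df["rank_pd"].tolist())  # [1, 0, 2, 4, 3, 0, 4, 1, 2, 3]
```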


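### Factor quantiles: *quantile*
- Given the number of bins *(n_bins)*, sorts the values in ascending order, splits them into that many bins, and returns the 0-based bin each factor value falls into.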
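One way to express equal-count binning is through ranks: a value with 0-based ascending rank r among n values goes to bin `r * n_bins // n`. The sketch below assumes that rule; `quantile_sketch` is a hypothetical helper, and alpha-mind's exact treatment of uneven bin sizes is not confirmed here. On the factor values from the grouped-ranking example it reproduces the labels shown in the next output.

```python
import numpy as np


def quantile_sketch(x, n_bins):
    """Hypothetical helper: assign each value to one of n_bins equal-count bins.

    Assumption: bin = rank * n_bins // n, where rank is the 0-based ascending
    rank (ties broken by position). When n is not a multiple of n_bins,
    bin sizes differ by at most one.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")  # 0-based ranks
    return ranks * n_bins // n


x = np.array([0.473528, 0.036662, 0.699256, 0.939615, 0.762472,
              0.054182, 0.90634, 0.141024, 0.387577, 0.868477])
print(quantile_sketch(x, n_bins=5))  # [2 0 2 4 3 0 4 1 1 3]
```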

In [3]:
```python
from alphamind.data.quantile import quantile

factors['quantile'] = quantile(factors['factor_1'].values, n_bins=5)
factors
```

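|   | factor_1 | rank | quantile |
|---|---|---|---|
| 0 | 0.473528 | 1 | 2 |
| 1 | 0.036662 | 0 | 0 |
| 2 | 0.699256 | 2 | 2 |
| 3 | 0.939615 | 4 | 4 |
| 4 | 0.762472 | 3 | 3 |
| 5 | 0.054182 | 0 | 0 |
| 6 | 0.90634 | 4 | 4 |
| 7 | 0.141024 | 1 | 1 |
| 8 | 0.387577 | 2 | 1 |
| 9 | 0.868477 | 3 | 3 |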
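Note that `quantile` here bins the whole cross-section, so the bins ignore the industry split used for `rank`. For comparison, assuming pandas is available, `pd.qcut` applied to position-broken ranks yields the same labels on this data. This is a pandas alternative, not how alpha-mind computes `quantile`; when the sample size is not a multiple of `n_bins`, `qcut`'s interpolated bin edges can place boundary values differently.

```python
import pandas as pd

s = pd.Series([0.473528, 0.036662, 0.699256, 0.939615, 0.762472,
               0.054182, 0.90634, 0.141024, 0.387577, 0.868477])

# Ranking first (ties broken by position) keeps qcut's bin edges unique
q = pd.qcut(s.rank(method="first"), q=5, labels=False)
print(q.tolist())  # [2, 0, 2, 4, 3, 0, 4, 1, 1, 3]
```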
