# 3.6 Multi-level Indexing

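So far we have worked with one-dimensional and two-dimensional data, which Pandas stores in Series and DataFrame objects. In practice we often need to store higher-dimensional data, that is, data indexed by more than one or two keys. Older versions of Pandas offered Panel and Panel4D objects for three- and four-dimensional data (both have since been removed from the library). A more intuitive approach in practice is hierarchical indexing (also called multi-indexing), in which a single index carries several levels. This lets higher-dimensional data be represented in the familiar one-dimensional Series and two-dimensional DataFrame forms.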

![](figures/0306 多级索引.jpg)
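As a minimal sketch of the idea, the snippet below stores prices keyed by (stock code, date) in an ordinary Series with a two-level index, then pivots the date level into columns. The four values and the variable names `prices` and `wide` are illustrative assumptions, not the dataset loaded later in this section.

```python
import pandas as pd

# Hypothetical sample: closing prices for two stocks on two dates.
index = pd.MultiIndex.from_product(
    [[516, 600000], pd.to_datetime(['2011-01-04', '2011-01-05'])],
    names=['Stkcd', 'Trddt'],
)
prices = pd.Series([5.67, 5.69, 12.61, 12.71], index=index, name='Clsprc')

# unstack() moves the chosen index level into the columns, giving a 2-D view.
wide = prices.unstack(level='Trddt')

print(prices)
print(wide)
```

The same data can therefore be viewed either as a one-dimensional Series with two index levels or as a stock-by-date table.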

In [70]:
import pandas as pd

data = pd.read_csv(r'data/股价-关注度数据.csv', encoding='ISO-8859-1', sep='\t')
data['Trddt'] = pd.to_datetime(data['Trddt'])
# Note: the Trddt column is read in as strings; convert it to datetime so that the date level of the index set below holds real dates
data.head()

Unnamed: 0,Stkcd,Trddt,Clsprc,Attention
0,600000,2011-01-04,12.61,12117
1,600000,2011-01-05,12.71,12209
2,600000,2011-01-06,12.67,10929
3,600000,2011-01-07,13.23,11567
4,600000,2011-01-10,13.07,13310
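The date conversion can also be requested at read time. The sketch below assumes a small in-memory, tab-separated sample (`sample`) standing in for the real file, and uses the `parse_dates` parameter of `pd.read_csv`.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for the real CSV file.
sample = io.StringIO(
    "Stkcd\tTrddt\tClsprc\tAttention\n"
    "600000\t2011-01-04\t12.61\t12117\n"
    "600000\t2011-01-05\t12.71\t12209\n"
)

# parse_dates converts the listed columns to datetime while reading.
df = pd.read_csv(sample, sep='\t', parse_dates=['Trddt'])
print(df.dtypes)
```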


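The default row index is 0, 1, 2, ..., which is inconvenient to work with. Below we set the row index to a two-level index, (stock code, date), and sort by index value in ascending order, so that the data can be sliced by stock.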

In [71]:
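data.set_index(['Stkcd', 'Trddt'], inplace=True)  # set a multi-level index: level 0 is 'Stkcd', level 1 is 'Trddt'; inplace=True modifies data itself, otherwise a new DataFrame is returned and data is unchanged
data.sort_index(ascending=True, inplace=True)  # sort by the index, i.e. ascending by 'Stkcd' and then 'Trddt'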
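An alternative to `inplace=True` is to chain the calls and keep the returned object. This is a sketch on hypothetical, deliberately unsorted rows; `raw` and `indexed` are illustrative names.

```python
import pandas as pd

# Hypothetical rows, deliberately out of order.
raw = pd.DataFrame({
    'Stkcd': [600000, 516, 600000, 516],
    'Trddt': pd.to_datetime(['2011-01-05', '2011-01-05', '2011-01-04', '2011-01-04']),
    'Clsprc': [12.71, 5.69, 12.61, 5.67],
})

# set_index and sort_index return new objects by default, so they can be chained.
indexed = raw.set_index(['Stkcd', 'Trddt']).sort_index()

print(raw.columns.tolist())  # raw still has its original columns
print(indexed)
```

Here `raw` keeps its default integer index, while `indexed` carries the sorted two-level index.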

In [72]:
data.head()

Stkcd,Trddt,Clsprc,Attention
516,2011-01-04,5.67,0
516,2011-01-05,5.69,0
516,2011-01-06,5.69,0
516,2011-01-07,5.7,0
516,2011-01-10,5.5,63


Locate a row by the new index:

In [77]:
data.loc[(516, '2011-01-10')]  # (516, '2011-01-10') is treated as a single key into the two-level index

Clsprc        5.5
Attention    63.0
Name: (516, 2011-01-10 00:00:00), dtype: float64

Locate one row and one column:

In [79]:
data.loc[(516, '2011-01-10'), 'Attention']  # loc[row key, column name]

Stkcd  Trddt     
516    2011-01-10    63
Name: Attention, dtype: int64
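Note that the output above is a one-element Series rather than a single number, presumably because the date string was matched as a partial date. As a sketch, passing an explicit `pd.Timestamp` in the key is one way to ask for an exact match. The two-row frame below is a hypothetical stand-in for `data`.

```python
import pandas as pd

# Hypothetical two-row stand-in for the indexed table.
df = pd.DataFrame(
    {'Clsprc': [5.70, 5.50], 'Attention': [0, 63]},
    index=pd.MultiIndex.from_tuples(
        [(516, pd.Timestamp('2011-01-07')), (516, pd.Timestamp('2011-01-10'))],
        names=['Stkcd', 'Trddt'],
    ),
)

key = (516, pd.Timestamp('2011-01-10'))
row = df.loc[key]                  # one row, returned as a Series
value = df.loc[key, 'Attention']   # one cell

print(row)
print(value)
```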

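How do we get all the stock codes? **You need to be able to explore on your own, with the help of `help`.**

In [102]:
data.index  # the output contains a list of lists [[], []]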

MultiIndex(levels=[[516, 600000, 600004, 601118], [2011-01-04 00:00:00, 2011-01-05 00:00:00, 2011-01-06 00:00:00, 2011-01-07 00:00:00, 2011-01-10 00:00:00, 2011-01-11 00:00:00, 2011-01-12 00:00:00, 2011-01-13 00:00:00, 2011-01-14 00:00:00, 2011-01-17 00:00:00, 2011-01-18 00:00:00, 2011-01-19 00:00:00, 2011-01-20 00:00:00, 2011-01-21 00:00:00, 2011-01-24 00:00:00, 2011-01-25 00:00:00, 2011-01-26 00:00:00, 2011-01-27 00:00:00, 2011-01-28 00:00:00, 2011-01-31 00:00:00, 2011-02-01 00:00:00, 2011-02-09 00:00:00, 2011-02-10 00:00:00, 2011-02-11 00:00:00, 2011-02-14 00:00:00, 2011-02-15 00:00:00, 2011-02-16 00:00:00, 2011-02-17 00:00:00, 2011-02-18 00:00:00, 2011-02-21 00:00:00, 2011-02-22 00:00:00, 2011-02-23 00:00:00, 2011-02-24 00:00:00, 2011-02-25 00:00:00, 2011-02-28 00:00:00, 2011-03-01 00:00:00, 2011-03-02 00:00:00, 2011-03-03 00:00:00, 2011-03-04 00:00:00, 2011-03-07 00:00:00, 2011-03-08 00:00:00, 2011-03-09 00:00:00, 2011-03-10 00:00:00, 2011-03-11 00:00:00, 2011-03-14 00:00:00, 2011

In [99]:
data.index[0]  # try this; it turns out not to be what we want

(516, Timestamp('2011-01-04 00:00:00'))

In [103]:
help(data.index)  # there is a levels parameter; combine this with the output of data.index above

Help on MultiIndex in module pandas.core.indexes.multi object:

class MultiIndex(pandas.core.indexes.base.Index)
 |  A multi-level, or hierarchical, index object for pandas objects.
 |  
 |  Parameters
 |  ----------
 |  levels : sequence of arrays
 |      The unique labels for each level.
 |  codes : sequence of arrays
 |      Integers for each level designating which label at each location.
 |  
 |      .. versionadded:: 0.24.0
 |  labels : sequence of arrays
 |      Integers for each level designating which label at each location.
 |  
 |      .. deprecated:: 0.24.0
 |          Use ``codes`` instead
 |  sortorder : optional int
 |      Level of sortedness (must be lexicographically sorted by that
 |      level).
 |  names : optional sequence of objects
 |      Names for each of the index levels. (name is accepted for compat).
 |  copy : bool, default False
 |      Copy the meta-data.
 |  verify_integrity : bool, default True
 |      Check that the levels/codes are consistent and val

In [105]:
data.index.levels[0]  # try this; it works

Int64Index([516, 600000, 600004, 601118], dtype='int64', name='Stkcd')
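One caveat worth knowing: `levels` lists the labels defined for a level, which after filtering may include labels that no longer appear in any row. The sketch below, on a hypothetical four-row frame, shows `get_level_values(...).unique()` and `remove_unused_levels()` as ways to get only the codes actually present.

```python
import pandas as pd

# Hypothetical four-row frame with a (Stkcd, Trddt) index.
idx = pd.MultiIndex.from_product(
    [[516, 600000], pd.to_datetime(['2011-01-04', '2011-01-05'])],
    names=['Stkcd', 'Trddt'],
)
df = pd.DataFrame({'Clsprc': [5.67, 5.69, 12.61, 12.71]}, index=idx)

# Keep only stock 600000 using a boolean mask on the first level.
sub = df[df.index.get_level_values('Stkcd') == 600000]

print(list(sub.index.levels[0]))  # may still list 516
codes = list(sub.index.get_level_values('Stkcd').unique())
print(codes)                      # codes actually present in the rows
print(list(sub.index.remove_unused_levels().levels[0]))
```

On an unfiltered table such as `data` here, both approaches give the same set of codes.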

Loop over the stocks and print the data for each one:

In [108]:
for stkcd in data.index.levels[0]:
    print('Stock code: ' + str(stkcd))
    print(data.loc[stkcd])

Stock code: 516
            Clsprc  Attention
Trddt                        
2011-01-04    5.67          0
2011-01-05    5.69          0
2011-01-06    5.69          0
2011-01-07    5.70          0
2011-01-10    5.50         63
2011-01-11    5.52          0
2011-01-12    5.55          0
2011-01-13    5.52          0
2011-01-14    5.42          0
2011-01-17    5.15          0
2011-01-18    5.60         63
2011-01-19    5.62          0
2011-01-20    5.37          0
2011-01-21    5.41          0
2011-01-24    5.33          0
2011-01-25    5.27          0
2011-01-26    5.33          0
2011-01-27    5.39          0
2011-01-28    5.45          0
2011-01-31    5.54          0
2011-02-01    5.56          0
2011-02-09    5.51          0
2011-02-10    5.69         61
2011-02-11    5.66          0
2011-02-14    5.74          0
2011-02-15    5.82          0
2011-02-16    5.91          0
2011-02-17    5.84          0
2011-02-18    5.80          0
2011-02-21    5.99          0
...            ...        ...
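The same per-stock loop can also be written with `groupby(level=...)`, which yields each stock code together with its rows. This is a sketch on a hypothetical four-row frame, not a replacement for the loop above.

```python
import pandas as pd

# Hypothetical four-row frame with a (Stkcd, Trddt) index.
idx = pd.MultiIndex.from_product(
    [[516, 600000], pd.to_datetime(['2011-01-04', '2011-01-05'])],
    names=['Stkcd', 'Trddt'],
)
df = pd.DataFrame({'Clsprc': [5.67, 5.69, 12.61, 12.71]}, index=idx)

# Group rows by the first index level; each group holds one stock's rows.
for stkcd, group in df.groupby(level='Stkcd'):
    print('Stock code:', stkcd)
    # Drop the now-constant Stkcd level so only the dates remain in the index.
    print(group.droplevel('Stkcd'))
```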
