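#### Hierarchical indexing in pandas: MultiIndex
Why learn hierarchical indexing?
+ A hierarchical index has several index levels on a single axis, which lets a Series or DataFrame represent higher-dimensional data;
+ It makes filtering data more convenient, and lookups are faster when the index is sorted;
+ Operations such as groupby return a hierarchical index when given multiple keys, so you need to know how to work with the result;
+ You rarely need to build one by hand (constructors exist but are seldom used directly).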
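For reference, the constructors mentioned in the last bullet look roughly like this. This is a minimal sketch, not part of the original notebook: the level names `company` and `date` and the variable names are illustrative assumptions, and the dates are plain strings.

```python
import pandas as pd

# Build a two-level index explicitly from the Cartesian product of two lists.
# The level names "company" and "date" are illustrative, not from the notebook.
index = pd.MultiIndex.from_product(
    [["BIDU", "JD"], ["2023-02-15", "2023-02-16"]],
    names=["company", "date"],
)
prices = pd.Series([152.04, 149.01, 52.23, 54.10], index=index, name="close")

print(index.nlevels)  # number of levels in the index
print(prices)
```

In practice the same structure usually falls out of `groupby` or `set_index`, which is why the notebook does not call the constructor itself.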

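Demo data: stock prices for four companies (Baidu, Alibaba, iQIYI, and JD.com); the file loaded below holds 12 rows, three trading days per company.  
Data source: Investing.com (英为财经) <https://cn.investing.com/>  
Column names are kept in Chinese as they appear in the file: 日期 (date), 公司 (company), 收盘 (close), 开盘 (open), 高 (high), 低 (low), 交易量 (volume), 涨跌幅 (change).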

Outline of this demo:

1. MultiIndex on a Series
2. Selecting data from a Series with a MultiIndex
3. MultiIndex on a DataFrame
4. Selecting data from a DataFrame with a MultiIndex

In [1]:
import pandas as pd
%matplotlib inline

In [2]:
stocks=pd.read_excel('./files/companies_stock.xlsx')
stocks.head(3)

,日期,公司,收盘,开盘,高,低,交易量,涨跌幅
0,2023-02-17,JD,53.02,53.14,53.68,52.46,6.34M,-0.02
1,2023-02-16,JD,54.1,53.79,54.77,53.49,8.94M,0.0358
2,2023-02-15,JD,52.23,51.64,52.27,51.08,4.64M,-0.0017


In [3]:
stocks['公司'].unique()

array(['JD', 'BIDU', 'IQ', 'BABA'], dtype=object)

In [4]:
stocks.index

RangeIndex(start=0, stop=12, step=1)

In [5]:
stocks.groupby('公司')['收盘'].mean()

公司
BABA    102.066667
BIDU    147.586667
IQ        7.210000
JD       53.116667
Name: 收盘, dtype: float64

#### 1. MultiIndex on a Series

In [6]:
ser=stocks.groupby(['公司','日期'])['收盘'].mean()
ser

公司    日期        
BABA  2023-02-15    103.08
      2023-02-16    103.11
      2023-02-17    100.01
BIDU  2023-02-15    152.04
      2023-02-16    149.01
      2023-02-17    141.71
IQ    2023-02-15      7.48
      2023-02-16      7.20
      2023-02-17      6.95
JD    2023-02-15     52.23
      2023-02-16     54.10
      2023-02-17     53.02
Name: 收盘, dtype: float64

When a MultiIndex is printed, a blank in an outer level means the label is the same as in the row above.
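The blanks are only a display convention; every entry still stores its full key. A small sketch on hand-built data (the values are copied from the output above, and the dates are plain strings here) that lists the keys and turns off the sparse display through the `display.multi_sparse` option:

```python
import pandas as pd

ser = pd.Series(
    [103.08, 103.11, 152.04, 149.01],
    index=pd.MultiIndex.from_tuples(
        [("BABA", "2023-02-15"), ("BABA", "2023-02-16"),
         ("BIDU", "2023-02-15"), ("BIDU", "2023-02-16")],
        names=["公司", "日期"],
    ),
    name="收盘",
)

# Each entry carries a complete (公司, 日期) tuple, even where the display is blank.
full_keys = ser.index.tolist()
print(full_keys)

# With sparse display turned off, the outer label is repeated on every row.
with pd.option_context("display.multi_sparse", False):
    dense_repr = repr(ser)
print(dense_repr)
```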

In [7]:
ser.index

MultiIndex([('BABA', '2023-02-15'),
            ('BABA', '2023-02-16'),
            ('BABA', '2023-02-17'),
            ('BIDU', '2023-02-15'),
            ('BIDU', '2023-02-16'),
            ('BIDU', '2023-02-17'),
            (  'IQ', '2023-02-15'),
            (  'IQ', '2023-02-16'),
            (  'IQ', '2023-02-17'),
            (  'JD', '2023-02-15'),
            (  'JD', '2023-02-16'),
            (  'JD', '2023-02-17')],
           names=['公司', '日期'])

In [8]:
# unstack() pivots the innermost index level (日期) into columns
ser.unstack()

日期,2023-02-15,2023-02-16,2023-02-17
公司,,,
BABA,103.08,103.11,100.01
BIDU,152.04,149.01,141.71
IQ,7.48,7.2,6.95
JD,52.23,54.1,53.02
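As a hedged illustration of how `unstack` behaves, the sketch below rebuilds a small slice of the series by hand (dates as plain strings), unstacks either level, and then reverses the operation with `stack`. The variable names are assumptions for this example only.

```python
import pandas as pd

ser = pd.Series(
    [152.04, 149.01, 52.23, 54.10],
    index=pd.MultiIndex.from_product(
        [["BIDU", "JD"], ["2023-02-15", "2023-02-16"]],
        names=["公司", "日期"],
    ),
    name="收盘",
)

wide = ser.unstack()                    # default: innermost level (日期) becomes columns
by_company = ser.unstack(level="公司")  # move the outer level into columns instead
restored = wide.stack()                 # stack() moves the columns back into the index

print(wide)
print(by_company)
```

Passing `level=` is useful when the level you want as columns is not the innermost one.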


In [9]:
ser

公司    日期        
BABA  2023-02-15    103.08
      2023-02-16    103.11
      2023-02-17    100.01
BIDU  2023-02-15    152.04
      2023-02-16    149.01
      2023-02-17    141.71
IQ    2023-02-15      7.48
      2023-02-16      7.20
      2023-02-17      6.95
JD    2023-02-15     52.23
      2023-02-16     54.10
      2023-02-17     53.02
Name: 收盘, dtype: float64

In [10]:
ser.reset_index()

,公司,日期,收盘
0,BABA,2023-02-15,103.08
1,BABA,2023-02-16,103.11
2,BABA,2023-02-17,100.01
3,BIDU,2023-02-15,152.04
4,BIDU,2023-02-16,149.01
5,BIDU,2023-02-17,141.71
6,IQ,2023-02-15,7.48
7,IQ,2023-02-16,7.2
8,IQ,2023-02-17,6.95
9,JD,2023-02-15,52.23
10,JD,2023-02-16,54.1
11,JD,2023-02-17,53.02


#### 2. Selecting data from a Series with a MultiIndex

In [11]:
ser

公司    日期        
BABA  2023-02-15    103.08
      2023-02-16    103.11
      2023-02-17    100.01
BIDU  2023-02-15    152.04
      2023-02-16    149.01
      2023-02-17    141.71
IQ    2023-02-15      7.48
      2023-02-16      7.20
      2023-02-17      6.95
JD    2023-02-15     52.23
      2023-02-16     54.10
      2023-02-17     53.02
Name: 收盘, dtype: float64

In [13]:
ser.loc['BIDU']

日期
2023-02-15    152.04
2023-02-16    149.01
2023-02-17    141.71
Name: 收盘, dtype: float64

In [14]:
# Select from a MultiIndex with a tuple: one key per level
ser.loc[('BIDU','2023-2-15')]

152.04

In [15]:
# Filter on the second level only; `:` selects everything on the first level
ser.loc[:,'2023-2-15']

公司
BABA    103.08
BIDU    152.04
IQ        7.48
JD       52.23
Name: 收盘, dtype: float64
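An alternative the notebook does not use is `Series.xs`, which takes a cross-section on a named level. The sketch below uses hand-built data with string dates and compares it with the `.loc[:, key]` form shown above.

```python
import pandas as pd

ser = pd.Series(
    [152.04, 149.01, 52.23, 54.10],
    index=pd.MultiIndex.from_product(
        [["BIDU", "JD"], ["2023-02-15", "2023-02-16"]],
        names=["公司", "日期"],
    ),
    name="收盘",
)

# Cross-section on the inner level, addressed by name rather than position.
same_day = ser.xs("2023-02-15", level="日期")

# The .loc form from the notebook: ":" keeps every label on the first level.
same_day_loc = ser.loc[:, "2023-02-15"]

print(same_day)
```

Naming the level can read more clearly than counting positions once an index has three or more levels.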

#### 3. MultiIndex on a DataFrame

In [16]:
stocks.head()

,日期,公司,收盘,开盘,高,低,交易量,涨跌幅
0,2023-02-17,JD,53.02,53.14,53.68,52.46,6.34M,-0.02
1,2023-02-16,JD,54.1,53.79,54.77,53.49,8.94M,0.0358
2,2023-02-15,JD,52.23,51.64,52.27,51.08,4.64M,-0.0017
3,2023-02-17,BIDU,141.71,143.49,144.13,140.01,3.39M,-0.049
4,2023-02-16,BIDU,149.01,149.0,151.34,147.86,3.54M,-0.0199


In [17]:
stocks.set_index(['公司','日期'],inplace=True)
stocks

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
JD,2023-02-17,53.02,53.14,53.68,52.46,6.34M,-0.02
JD,2023-02-16,54.1,53.79,54.77,53.49,8.94M,0.0358
JD,2023-02-15,52.23,51.64,52.27,51.08,4.64M,-0.0017
BIDU,2023-02-17,141.71,143.49,144.13,140.01,3.39M,-0.049
BIDU,2023-02-16,149.01,149.0,151.34,147.86,3.54M,-0.0199
BIDU,2023-02-15,152.04,153.06,155.0,150.8,3.85M,0.0199
IQ,2023-02-17,6.95,7.01,7.16,6.93,6.82M,-0.0347
IQ,2023-02-16,7.2,7.26,7.43,7.15,8.62M,-0.0374
IQ,2023-02-15,7.48,7.48,7.77,6.99,14.17M,0.0275
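BABA,2023-02-17,100.01,101.0,101.09,99.25,18.12M,-0.0301
BABA,2023-02-16,103.11,102.71,104.52,102.03,19.66M,0.0003
BABA,2023-02-15,103.08,102.39,103.44,102.0,16.76M,-0.0109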


In [19]:
stocks.index

MultiIndex([(  'JD', '2023-02-17'),
            (  'JD', '2023-02-16'),
            (  'JD', '2023-02-15'),
            ('BIDU', '2023-02-17'),
            ('BIDU', '2023-02-16'),
            ('BIDU', '2023-02-15'),
            (  'IQ', '2023-02-17'),
            (  'IQ', '2023-02-16'),
            (  'IQ', '2023-02-15'),
            ('BABA', '2023-02-17'),
            ('BABA', '2023-02-16'),
            ('BABA', '2023-02-15')],
           names=['公司', '日期'])

In [20]:
stocks.sort_index(inplace=True)
stocks
# Tip: lookups are faster on a sorted index, and label slicing on a MultiIndex requires one

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BABA,2023-02-15,103.08,102.39,103.44,102.0,16.76M,-0.0109
BABA,2023-02-16,103.11,102.71,104.52,102.03,19.66M,0.0003
BABA,2023-02-17,100.01,101.0,101.09,99.25,18.12M,-0.0301
BIDU,2023-02-15,152.04,153.06,155.0,150.8,3.85M,0.0199
BIDU,2023-02-16,149.01,149.0,151.34,147.86,3.54M,-0.0199
BIDU,2023-02-17,141.71,143.49,144.13,140.01,3.39M,-0.049
IQ,2023-02-15,7.48,7.48,7.77,6.99,14.17M,0.0275
IQ,2023-02-16,7.2,7.26,7.43,7.15,8.62M,-0.0374
IQ,2023-02-17,6.95,7.01,7.16,6.93,6.82M,-0.0347
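JD,2023-02-15,52.23,51.64,52.27,51.08,4.64M,-0.0017
JD,2023-02-16,54.1,53.79,54.77,53.49,8.94M,0.0358
JD,2023-02-17,53.02,53.14,53.68,52.46,6.34M,-0.02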
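Sorting matters for more than speed. The following sketch (hand-built data, string dates, hypothetical variable names) checks `is_monotonic_increasing` before and after `sort_index`, and shows that a label slice across both levels is rejected on the unsorted index.

```python
import pandas as pd

df = pd.DataFrame(
    {"收盘": [53.02, 52.23, 141.71, 152.04]},
    index=pd.MultiIndex.from_tuples(
        [("JD", "2023-02-17"), ("JD", "2023-02-15"),
         ("BIDU", "2023-02-17"), ("BIDU", "2023-02-15")],
        names=["公司", "日期"],
    ),
)

sorted_before = df.index.is_monotonic_increasing  # rows are still in file order

# Slicing between two full keys needs a lexicographically sorted index.
slice_failed = False
try:
    df.loc[("BIDU", "2023-02-15"):("JD", "2023-02-15")]
except pd.errors.UnsortedIndexError as exc:
    slice_failed = True
    print("slice failed:", exc)

sorted_df = df.sort_index()
sorted_after = sorted_df.index.is_monotonic_increasing
result = sorted_df.loc[("BIDU", "2023-02-15"):("JD", "2023-02-15")]

print(sorted_before, sorted_after)
print(result)
```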


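#### 4. Selecting data from a DataFrame with a MultiIndex
Important when selecting data:
+ A tuple (key1, key2) selects across index levels: key1 matches the first level and key2 the second, e.g. key1=JD, key2=2023-02-15
+ A list [key1, key2] selects several keys on the same level: key1 and key2 are sibling labels, e.g. key1=JD, key2=BIDU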
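The tuple and list rules can be tried on a small hand-built frame before applying them to `stocks`. This is a sketch under the assumption that the dates are plain strings, so an exact `(company, date)` tuple plus a column name returns a single value.

```python
import pandas as pd

df = pd.DataFrame(
    {"收盘": [152.04, 149.01, 52.23, 54.10],
     "开盘": [153.06, 149.00, 51.64, 53.79]},
    index=pd.MultiIndex.from_product(
        [["BIDU", "JD"], ["2023-02-15", "2023-02-16"]],
        names=["公司", "日期"],
    ),
)

# Tuple: one key per level -> (公司, 日期)
one_cell = df.loc[("JD", "2023-02-15"), "收盘"]

# List: several keys on the same (first) level
two_companies = df.loc[["BIDU", "JD"]]

# Tuple containing a list: several companies, one date
one_day = df.loc[(["BIDU", "JD"], "2023-02-15"), :]

print(one_cell)
print(two_companies)
print(one_day)
```

The two forms compose: a list may appear inside a tuple wherever one level needs more than one label.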

In [21]:
stocks.loc['BIDU']

日期,收盘,开盘,高,低,交易量,涨跌幅
2023-02-15,152.04,153.06,155.0,150.8,3.85M,0.0199
2023-02-16,149.01,149.0,151.34,147.86,3.54M,-0.0199
2023-02-17,141.71,143.49,144.13,140.01,3.39M,-0.049


In [22]:
stocks.loc[('BIDU','2023-02-15'),:]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BIDU,2023-02-15,152.04,153.06,155.0,150.8,3.85M,0.0199


In [23]:
stocks.loc[('BIDU','2023-02-15'),'开盘']

公司    日期        
BIDU  2023-02-15    153.06
Name: 开盘, dtype: float64

In [26]:
stocks.loc[['BIDU','JD']]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BIDU,2023-02-15,152.04,153.06,155.0,150.8,3.85M,0.0199
BIDU,2023-02-16,149.01,149.0,151.34,147.86,3.54M,-0.0199
BIDU,2023-02-17,141.71,143.49,144.13,140.01,3.39M,-0.049
JD,2023-02-15,52.23,51.64,52.27,51.08,4.64M,-0.0017
JD,2023-02-16,54.1,53.79,54.77,53.49,8.94M,0.0358
JD,2023-02-17,53.02,53.14,53.68,52.46,6.34M,-0.02


In [27]:
stocks.loc[['BIDU','JD'],:]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BIDU,2023-02-15,152.04,153.06,155.0,150.8,3.85M,0.0199
BIDU,2023-02-16,149.01,149.0,151.34,147.86,3.54M,-0.0199
BIDU,2023-02-17,141.71,143.49,144.13,140.01,3.39M,-0.049
JD,2023-02-15,52.23,51.64,52.27,51.08,4.64M,-0.0017
JD,2023-02-16,54.1,53.79,54.77,53.49,8.94M,0.0358
JD,2023-02-17,53.02,53.14,53.68,52.46,6.34M,-0.02


In [32]:
stocks.loc[(['BIDU','JD'],'2023-02-15'),:]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BIDU,2023-02-15,152.04,153.06,155.0,150.8,3.85M,0.0199
JD,2023-02-15,52.23,51.64,52.27,51.08,4.64M,-0.0017


In [33]:
stocks.loc[(['BIDU','JD'],'2023-02-15'),'收盘']

公司    日期        
BIDU  2023-02-15    152.04
JD    2023-02-15     52.23
Name: 收盘, dtype: float64

In [35]:
stocks.loc[('BIDU',['2023-02-15','2023-02-16']),'收盘']

公司    日期        
BIDU  2023-02-15    152.04
      2023-02-16    149.01
Name: 收盘, dtype: float64

In [36]:
# slice(None) selects every label on that index level
stocks.loc[(slice(None),['2023-02-15','2023-02-16']),:]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BABA,2023-02-15,103.08,102.39,103.44,102.0,16.76M,-0.0109
BABA,2023-02-16,103.11,102.71,104.52,102.03,19.66M,0.0003
BIDU,2023-02-15,152.04,153.06,155.0,150.8,3.85M,0.0199
BIDU,2023-02-16,149.01,149.0,151.34,147.86,3.54M,-0.0199
IQ,2023-02-15,7.48,7.48,7.77,6.99,14.17M,0.0275
IQ,2023-02-16,7.2,7.26,7.43,7.15,8.62M,-0.0374
JD,2023-02-15,52.23,51.64,52.27,51.08,4.64M,-0.0017
JD,2023-02-16,54.1,53.79,54.77,53.49,8.94M,0.0358
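pandas also provides `pd.IndexSlice`, which is not used in the notebook but lets the same selection be written with `:` instead of `slice(None)`. A minimal sketch on hand-built data (string dates, hypothetical variable names):

```python
import pandas as pd

df = pd.DataFrame(
    {"收盘": [152.04, 149.01, 52.23, 54.10]},
    index=pd.MultiIndex.from_product(
        [["BIDU", "JD"], ["2023-02-15", "2023-02-16"]],
        names=["公司", "日期"],
    ),
)

idx = pd.IndexSlice

# Explicit form: slice(None) means "every label on this level".
by_slice_none = df.loc[(slice(None), ["2023-02-15"]), :]

# IndexSlice form: the same selection written with ":".
by_index_slice = df.loc[idx[:, ["2023-02-15"]], :]

print(by_index_slice)
```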


In [37]:
stocks.reset_index()
# Both index levels become ordinary columns again

,公司,日期,收盘,开盘,高,低,交易量,涨跌幅
0,BABA,2023-02-15,103.08,102.39,103.44,102.0,16.76M,-0.0109
1,BABA,2023-02-16,103.11,102.71,104.52,102.03,19.66M,0.0003
2,BABA,2023-02-17,100.01,101.0,101.09,99.25,18.12M,-0.0301
3,BIDU,2023-02-15,152.04,153.06,155.0,150.8,3.85M,0.0199
4,BIDU,2023-02-16,149.01,149.0,151.34,147.86,3.54M,-0.0199
5,BIDU,2023-02-17,141.71,143.49,144.13,140.01,3.39M,-0.049
6,IQ,2023-02-15,7.48,7.48,7.77,6.99,14.17M,0.0275
7,IQ,2023-02-16,7.2,7.26,7.43,7.15,8.62M,-0.0374
8,IQ,2023-02-17,6.95,7.01,7.16,6.93,6.82M,-0.0347
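9,JD,2023-02-15,52.23,51.64,52.27,51.08,4.64M,-0.0017
10,JD,2023-02-16,54.1,53.79,54.77,53.49,8.94M,0.0358
11,JD,2023-02-17,53.02,53.14,53.68,52.46,6.34M,-0.02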
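Unlike the `inplace=True` calls earlier, `reset_index()` without that argument returns a new DataFrame and leaves the original untouched. The sketch below (hand-built data, string dates) shows the round trip back through `set_index`, plus flattening only one level with the `level` argument.

```python
import pandas as pd

df = pd.DataFrame(
    {"收盘": [152.04, 149.01, 52.23, 54.10]},
    index=pd.MultiIndex.from_product(
        [["BIDU", "JD"], ["2023-02-15", "2023-02-16"]],
        names=["公司", "日期"],
    ),
)

flat = df.reset_index()                    # new DataFrame; df keeps its MultiIndex
restored = flat.set_index(["公司", "日期"])  # rebuild the two-level index
only_date = df.reset_index(level="日期")    # turn just one level into a column

print(flat)
print(only_date)
```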
