# Hierarchical indexing in pandas

## 1. Creating a multi-level row index

### 1) Implicit construction

The most common approach is to pass two or more arrays to the index parameter of the DataFrame constructor.

- A Series can be given a multi-level index in the same way
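
A minimal sketch of the implicit form described above. The names `dancer` and `lucy` are borrowed from later cells, and the scores are random placeholders, not data from this notebook.

```python
import numpy as np
import pandas as pd

# Two parallel arrays: the first becomes the outer level, the second the inner level.
index = [['期中', '期中', '期中', '期末', '期末', '期末'],
         ['python', 'php', 'java', 'python', 'php', 'java']]
columns = ['dancer', 'lucy']  # placeholder names

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame(rng.integers(0, 100, size=(6, 2)), index=index, columns=columns)

# A Series accepts the same list-of-arrays index.
s = pd.Series(rng.integers(0, 100, size=6), index=index)

print(type(df.index).__name__)              # the index is a MultiIndex
print(df.index.nlevels, s.index.nlevels)    # number of levels on each object
```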

### 2) Explicit construction with pd.MultiIndex

- Using arrays

- Using tuples

- Using product

    The simplest option, and the recommended one
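
The three constructors listed above can be compared side by side. This is a sketch using the same labels as the rest of the notebook; the variable names are illustrative only.

```python
import pandas as pd

terms = ['期中', '期末']
subjects = ['python', 'php', 'java']

# from_arrays: one array per level, all of the same length
by_arrays = pd.MultiIndex.from_arrays([
    ['期中', '期中', '期中', '期末', '期末', '期末'],
    ['python', 'php', 'java', 'python', 'php', 'java'],
])

# from_tuples: one tuple per row
by_tuples = pd.MultiIndex.from_tuples([
    ('期中', 'python'), ('期中', 'php'), ('期中', 'java'),
    ('期末', 'python'), ('期末', 'php'), ('期末', 'java'),
])

# from_product: the Cartesian product of the level values
by_product = pd.MultiIndex.from_product([terms, subjects])

# All three describe the same six (term, subject) pairs.
print(by_arrays.equals(by_tuples), by_tuples.equals(by_product))
```

`from_product` is the shortest to write whenever every combination of the level values is wanted, which is presumably why it is the recommended form here.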

============================================

Exercise 8:

1. Create a DataFrame showing the midterm and final scores of Zhang San (张三) and Li Si (李四) in each subject

============================================

## 2. Multi-level column index

Besides the row index (index), the column index (columns) can be given multiple levels in exactly the same way.
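
A short sketch of a multi-level column index, assuming a plain two-row index and placeholder values from `np.arange`:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([['期中', '期末'], ['python', 'php', 'java']])
df = pd.DataFrame(np.arange(12).reshape(2, 6),
                  index=['dancer', 'lucy'],   # placeholder row labels
                  columns=columns)

print(df.columns.nlevels)                  # two column levels
print(df['期中'])                          # outer label selects a sub-frame of three columns
print(df['期中', 'php'])                   # full column key selects one column
print(df.loc['dancer', ('期末', 'java')])  # row label plus full column key selects one cell
```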

## 3. Indexing and slicing objects with a multi-level index

### 1) Series operations

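[Important] For a Series, label-based access with plain brackets [] behaves like .loc[] in most cases. Prefer .loc[] for both indexing and slicing, because it is always label-based and never falls back to positions.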

(1) Indexing

(2) Slicing
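
A sketch of both operations on a small Series. The scores are placeholders, and the index is built in sorted order because label slicing on a MultiIndex expects a sorted index.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['期中', '期末'], ['java', 'php', 'python']])
s = pd.Series([45, 39, 44, 14, 58, 45], index=index)  # placeholder scores

# (1) Indexing
midterm = s.loc['期中']            # outer label: sub-Series indexed by subject
one_score = s.loc['期中', 'php']   # full key: a single value

# (2) Slicing: label slices include both endpoints
all_rows = s.loc['期中':'期末']                       # slice on the outer level
middle = s.loc[('期中', 'php'):('期末', 'java')]      # slice between two full keys

print(midterm)
print(one_score)
print(all_rows)
print(middle)
```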

### 2) DataFrame operations

(1) Columns can be indexed directly by column name

Indexing and slicing with a multi-level row index

Indexing and slicing with a multi-level column index

In [2]:
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
data = np.random.randint(0,100,size=(5,6))
columns = pd.MultiIndex.from_product([['期中','期末'],['python','php','java']])
df = DataFrame(data=data,columns=columns)
df

Unnamed: 0_level_0,期中,期中,期中,期末,期末,期末
Unnamed: 0_level_1,python,php,java,python,php,java
0,62,84,81,96,6,18
1,74,70,3,56,59,27
2,99,93,52,89,95,42
3,50,66,64,26,86,10
4,49,29,74,49,31,6


In [5]:
# Row slice (with .loc, both endpoints are included)
df.loc[0:3]

Unnamed: 0_level_0,期中,期中,期中,期末,期末,期末
Unnamed: 0_level_1,python,php,java,python,php,java
0,62,84,81,96,6,18
1,74,70,3,56,59,27
2,99,93,52,89,95,42
3,50,66,64,26,86,10


In [10]:
# Column slice
df['期中'].loc[:,'python':'php']

Unnamed: 0,python,php
0,62,84
1,74,70
2,99,93
3,50,66
4,49,29


In [13]:
# Column slice through the values attribute
df['期中'].values[:,0:2]

array([[62, 84],
       [74, 70],
       [99, 93],
       [50, 66],
       [49, 29]])

In [16]:
# Column slice by position (implicit index)
df.iloc[:,0:2]

Unnamed: 0_level_0,期中,期中
Unnamed: 0_level_1,python,php
0,62,84
1,74,70
2,99,93
3,50,66
4,49,29


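(2) Row indexing requires an indexer such as .loc[] or .iloc[] (the older .ix indexer was removed in pandas 1.0)

[Extremely important] .loc[] is the recommended choice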

In [22]:
index = pd.MultiIndex.from_product([['期中','期末'],['python','php','java']])
columns = ['dancer','lucy','tom','mery']
data = np.random.randint(0,100,size=(6,4))
df = DataFrame(data=data,columns=columns,index=index)
df

Unnamed: 0,Unnamed: 1,dancer,lucy,tom,mery
期中,python,44,53,20,97
期中,php,39,34,72,36
期中,java,45,39,96,24
期末,python,45,78,47,88
期末,php,58,9,59,41
期末,java,14,83,83,15


In [32]:
# Access starting from the column
df['dancer']
# df.dancer.loc['期中','python']
# df.dancer['期中']['python']
# df.dancer[0]
df.dancer.iloc[0]

44

In [38]:
# Access starting from the row
# df.loc['期中','python']['dancer']
# df.iloc[0]['dancer']
df.iloc[0,0]

44

In [42]:
# Row slice
# With plain [], a single label selects a column, while a slice selects rows
df.loc['期中']['python':'php']
# Slice by position
df.iloc[0:2]

Unnamed: 0,Unnamed: 1,dancer,lucy,tom,mery
期中,python,44,53,20,97
期中,php,39,34,72,36


In [47]:
# Column slice
# df.loc[:,'dancer':'lucy']
df.iloc[:,0:2]

Unnamed: 0,Unnamed: 1,dancer,lucy
期中,python,44,53
期中,php,39,34
期中,java,45,39
期末,python,45,78
期末,php,58,9
期末,java,14,83


In [51]:
df1 = df.loc['期中']
df1

Unnamed: 0,dancer,lucy,tom,mery
python,44,53,20,97
php,39,34,72,36
java,45,39,96,24


In [52]:
df1.loc[['python','java']]

Unnamed: 0,dancer,lucy,tom,mery
python,44,53,20,97
java,45,39,96,24


In [57]:
# Get the python scores for both the midterm and the final
df.iloc[[0,3]]
# df.loc[[('期中','python'),('期末','python')]]

Unnamed: 0,Unnamed: 1,dancer,lucy,tom,mery
期中,python,44,53,20,97
期末,python,45,78,47,88


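Note: when the first row level still has several values, passing a second-level label on its own to .loc[] does not work, because a bare label is matched against the first level. Either select on the first level first, so that the second level becomes the only remaining level, or address the second level explicitly, for example with df.xs('python', level=1) or df.loc[(slice(None), 'python'), :].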
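
A sketch of selecting on the inner row level without first stripping the outer one. The frame mirrors the shape of `df` above but uses placeholder values and a sorted subject order; `py_xs`, `py_loc`, and `py_idx` are illustrative names.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['期中', '期末'], ['java', 'php', 'python']])
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=index,
                  columns=['dancer', 'lucy', 'tom', 'mery'])  # placeholder values

# xs: cross-section on level 1; the selected level is dropped from the result
py_xs = df.xs('python', level=1)

# .loc with a tuple: slice(None) means "every label on this level"
py_loc = df.loc[(slice(None), 'python'), :]

# pd.IndexSlice is a more readable spelling of the same tuple
idx = pd.IndexSlice
py_idx = df.loc[idx[:, 'python'], :]

print(py_xs)    # rows labelled 期中 / 期末
print(py_loc)   # both index levels are kept
```

`xs` is convenient when the selected level is no longer needed in the result; the `.loc` forms keep the full index, which can be preferable when the result is assigned back into the frame.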

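Note: whenever .loc[] can do the job, use .loc[]; and when one pair of brackets is enough, avoid chaining two, because the second pair may operate on a temporary copy.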
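
A sketch of assigning to one cell with a single indexer call rather than a chain of brackets. The `score` frame here is a small stand-in with placeholder values, not the notebook's own data.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([['期中', '期末'], ['java', 'php', 'python']])
score = pd.DataFrame(np.full((2, 6), 60), index=['dancer', 'lucy'],
                     columns=columns)  # placeholder scores

# One .loc call: row label first, then the full column key as a tuple.
score.loc['dancer', ('期中', 'python')] = 0

# The same idea by position, again in a single call.
score.iloc[1, 0] = 0

print(score)
```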

The copy() method
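
The notebook only names copy() here. One common use, shown as a sketch under the assumption that the goal is to edit a selection without touching the original frame (values are placeholders):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['期中', '期末'], ['java', 'php', 'python']])
df = pd.DataFrame(np.full((6, 2), 60), index=index,
                  columns=['dancer', 'lucy'])  # placeholder scores

# Take an explicit copy of the midterm rows, then edit the copy freely.
midterm = df.loc['期中'].copy()
midterm.loc['java', 'dancer'] = 0

print(midterm.loc['java', 'dancer'])        # the copy changed
print(df.loc[('期中', 'java'), 'dancer'])   # the original did not
```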

============================================

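Exercise 9:

1. Compare the various ways of indexing a Series and a DataFrame, and get comfortable with .loc[]

2. Suppose Zhang San once again skipped the midterm English exam for special reasons. How would you record that?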

============================================

In [83]:
score = df.unstack(level=0).stack(level=0).unstack(level=0)
score

Unnamed: 0_level_0,期中,期中,期中,期末,期末,期末
Unnamed: 0_level_1,java,php,python,java,php,python
dancer,45,39,44,14,58,45
lucy,39,34,53,83,9,78
mery,24,36,97,15,41,88
tom,96,72,20,83,59,47


In [102]:
# Chained assignment: sets the last entry of dancer's row (期末, python)
score.loc['dancer'].iloc[-1]=0
score

Unnamed: 0_level_0,期中,期中,期中,期末,期末,期末
Unnamed: 0_level_1,java,php,python,java,php,python
dancer,45,39,0,14,58,0
lucy,0,34,53,83,9,78
mery,24,36,97,15,0,88
tom,96,0,20,83,59,47


In [87]:
# Chained assignment; a single score.loc['dancer', ('期中','python')] = 0 is more reliable
score.loc['dancer']['期中','python'] = 0
score

Unnamed: 0_level_0,期中,期中,期中,期末,期末,期末
Unnamed: 0_level_1,java,php,python,java,php,python
dancer,45,39,0,14,58,45
lucy,39,34,53,83,9,78
mery,24,36,97,15,41,88
tom,96,72,20,83,59,47


In [92]:
# Three chained lookups: row lucy, then 期中, then java
score.loc['lucy'].loc['期中']['java'] = 0
score

Unnamed: 0_level_0,期中,期中,期中,期末,期末,期末
Unnamed: 0_level_1,java,php,python,java,php,python
dancer,45,39,0,14,58,45
lucy,0,34,53,83,9,78
mery,24,36,97,15,41,88
tom,96,72,20,83,59,47


In [95]:
score.iloc[3,1] = 0
score

Unnamed: 0_level_0,期中,期中,期中,期末,期末,期末
Unnamed: 0_level_1,java,php,python,java,php,python
dancer,45,39,0,14,58,45
lucy,0,34,53,83,9,78
mery,24,36,97,15,41,88
tom,96,0,20,83,59,47


In [99]:
# Column first, then row: still a chained assignment
score['期末','php'].loc['mery'] = 0
score

Unnamed: 0_level_0,期中,期中,期中,期末,期末,期末
Unnamed: 0_level_1,java,php,python,java,php,python
dancer,45,39,0,14,58,45
lucy,0,34,53,83,9,78
mery,24,36,97,15,0,88
tom,96,0,20,83,59,47


## 4. Stacking the index (stack)

- ``stack()``
- ``unstack()``

In [62]:
df

Unnamed: 0,Unnamed: 1,dancer,lucy,tom,mery
期中,python,44,53,20,97
期中,php,39,34,72,36
期中,java,45,39,96,24
期末,python,45,78,47,88
期末,php,58,9,59,41
期末,java,14,83,83,15


In [65]:
# unstack moves a row index level into the columns
DataFrame(df.unstack(level=0).unstack())

Unnamed: 0,Unnamed: 1,Unnamed: 2,0
dancer,期中,java,45
dancer,期中,php,39
dancer,期中,python,44
dancer,期末,java,14
dancer,期末,php,58
dancer,期末,python,45
lucy,期中,java,39
lucy,期中,php,34
lucy,期中,python,53
lucy,期末,java,83


In [71]:
# For both row and column labels, levels are numbered from 0 at the outermost level, increasing inward
df.stack().unstack(level=1).unstack(level=0).stack(level=0)

Unnamed: 0,Unnamed: 1,期中,期末
dancer,java,45,14
dancer,php,39,58
dancer,python,44,45
lucy,java,39,83
lucy,php,34,9
lucy,python,53,78
tom,java,96,83
tom,php,72,59
tom,python,20,47
mery,java,24,15


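[Tip] With stack(), the column level named by level disappears from the columns and reappears in the rows.

[Tip] With unstack(), the row level named by level disappears from the rows and reappears in the columns.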
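
The two tips can be checked on a small frame. This is a sketch with placeholder values; `wide` and `tall` are illustrative names.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['期中', '期末'], ['java', 'php', 'python']])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index,
                  columns=['dancer', 'lucy'])  # placeholder values

# unstack(level=0): row level 0 (期中/期末) leaves the rows and joins the columns
wide = df.unstack(level=0)
print(wide.index.tolist())    # only the subjects remain as rows
print(wide.columns.nlevels)   # the columns now have two levels

# stack(level=0): column level 0 (the names) leaves the columns and joins the rows
tall = wide.stack(level=0)
print(tall.columns.tolist())  # only 期中/期末 remain as columns
print(tall.index.nlevels)     # the rows now have two levels: subject, name
```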

============================================

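Exercise 10:

1. Use unstack() to reshape ddd into two rows, one for the midterm and one for the final

2. Use unstack() to reshape ddd into four rows, one for each of the four subjects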

============================================

In [104]:
df.unstack(level=1)

Unnamed: 0_level_0,dancer,dancer,dancer,lucy,lucy,lucy,tom,tom,tom,mery,mery,mery
Unnamed: 0_level_1,java,php,python,java,php,python,java,php,python,java,php,python
期中,45,39,44,39,34,53,96,72,20,24,36,97
期末,14,58,45,83,9,78,83,59,47,15,41,88


In [106]:
df.unstack(level=0)

Unnamed: 0_level_0,dancer,dancer,lucy,lucy,tom,tom,mery,mery
Unnamed: 0_level_1,期中,期末,期中,期末,期中,期末,期中,期末
java,45,14,39,83,96,83,24,15
php,39,58,34,9,72,59,36,41
python,44,45,53,78,20,47,97,88


In [109]:
df.unstack(level=1).stack(level=0)

Unnamed: 0,Unnamed: 1,java,php,python
期中,dancer,45,39,44
期中,lucy,39,34,53
期中,mery,24,36,97
期中,tom,96,72,20
期末,dancer,14,58,45
期末,lucy,83,9,78
期末,mery,15,41,88
期末,tom,83,59,47


## 5. Aggregation

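[Note]

- Specify axis explicitly (the default is axis=0)

- [Tip] As with the level passed to unstack(), the axis you name is the one that is collapsed: axis=0 aggregates down the rows and keeps the column labels, while axis=1 aggregates across the columns and keeps the row labels.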
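
A small sketch of how axis determines which labels survive an aggregation, using a hand-checkable frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3), index=['r0', 'r1'], columns=['A', 'B', 'C'])
#     A  B  C
# r0  0  1  2
# r1  3  4  5

col_totals = df.sum(axis=0)   # rows collapse; result is labelled by A, B, C
row_totals = df.sum(axis=1)   # columns collapse; result is labelled by r0, r1

print(col_totals.tolist())    # [3, 5, 7]
print(row_totals.tolist())    # [3, 12]
```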

In [None]:
Aggregations available on an ndarray
sum  np.nansum
mean
std
max
min
argmax
argmin

Aggregation here means operations such as the mean, variance, maximum, minimum, and so on.

In [115]:
data = np.random.random(size=(5,5))
columns = list('ABCDE')
df = DataFrame(data=data,columns=columns)
df

Unnamed: 0,A,B,C,D,E
0,0.839465,0.226659,0.539291,0.343774,0.180513
1,0.251058,0.188429,0.744377,0.887086,0.180182
2,0.28409,0.292882,0.929772,0.055396,0.361624
3,0.767415,0.498463,0.182891,0.182202,0.611375
4,0.277419,0.780826,0.150085,0.354248,0.875335


In [119]:
# axis=1 sums across the columns, giving one total per row (the default is axis=0)
df.sum(axis=1)

0    2.129702
1    2.251130
2    1.923765
3    2.242347
4    2.437912
dtype: float64

In [120]:
df.loc[0,'B'] = None
df

Unnamed: 0,A,B,C,D,E
0,0.839465,,0.539291,0.343774,0.180513
1,0.251058,0.188429,0.744377,0.887086,0.180182
2,0.28409,0.292882,0.929772,0.055396,0.361624
3,0.767415,0.498463,0.182891,0.182202,0.611375
4,0.277419,0.780826,0.150085,0.354248,0.875335


In [121]:
# DataFrame aggregations skip missing values (NaN) by default
df.sum()

A    2.419448
B    1.760600
C    2.546416
D    1.822706
E    2.209029
dtype: float64
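
A sketch contrasting the default NaN handling in pandas with plain NumPy, using a tiny hand-made frame rather than the random data above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0], 'B': [np.nan, 4.0]})

skipped = df.sum()                    # NaN in B is skipped by default
propagated = df.sum(skipna=False)     # NaN propagates when skipna is turned off

b = df['B'].to_numpy()
plain = np.sum(b)                     # plain NumPy sum returns nan
nan_aware = np.nansum(b)              # np.nansum ignores the NaN

print(skipped)
print(propagated)
print(plain, nan_aware)
```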

============================================

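Exercise 11:

1. Compute the average midterm and final score for each subject

2. Compute the highest score of Zhang San and Li Si in each subject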

============================================

In [122]:
index = ['张三','李四']
columns = pd.MultiIndex.from_product([['期中','期末'],['python','php','java']])
data = np.random.randint(0,100,size=(2,6))
score = DataFrame(data=data,index=index,columns=columns)
score

Unnamed: 0_level_0,期中,期中,期中,期末,期末,期末
Unnamed: 0_level_1,python,php,java,python,php,java
张三,41,28,19,37,16,27
李四,57,39,49,79,29,66


In [123]:
score.mean(axis=0)

期中  python    49.0
    php       33.5
    java      34.0
期末  python    58.0
    php       22.5
    java      46.5
dtype: float64

In [124]:
score.max(axis=1)

张三    41
李四    79
dtype: int32

In [129]:
score.stack(level=0).max(axis=1)

张三  期中    41
    期末    37
李四  期中    57
    期末    79
dtype: int32