# Task03 Indexing
## 1 Knowledge Review (Key Points to Memorize)

### 1.1 Indexers

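#### 1.1.1 The loc indexer
The `loc` indexer selects data by row and column labels, in the form `loc[*, *]`. Each position accepts:
- A single label: when both a row label and a column label are given, the result is a scalar if the row label is unique and a Series if it is duplicated
- A list of labels
- A slice of labels (both endpoints are included)
- A Boolean list or Series: works like the filter `df[conditions]`; combine conditions with `|` (or), `&` (and), and `~` (not)
- A function (callable) that takes the DataFrame and returns one of the forms above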
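
As a rough illustration of the five input forms, here is a minimal sketch on a made-up table. The frame `df_loc`, its labels, and its values are hypothetical and are not taken from the datasets used later in these notes.

```python
import pandas as pd

# Hypothetical toy table; "Ann" is duplicated in the index on purpose.
df_loc = pd.DataFrame(
    {"School": ["X", "Y", "X", "Z"], "Weight": [46.0, 70.0, 89.0, 41.0]},
    index=["Ann", "Bob", "Ann", "Dan"],
)

single_unique = df_loc.loc["Bob", "Weight"]        # unique row label -> scalar
single_dup = df_loc.loc["Ann", "Weight"]           # duplicated row label -> Series
by_list = df_loc.loc[["Bob", "Dan"], ["School"]]   # lists of labels
by_slice = df_loc.loc["Bob":"Dan"]                 # label slice, both ends included
by_bool = df_loc.loc[(df_loc.Weight > 45) & ~(df_loc.School == "Y")]  # Boolean mask
by_func = df_loc.loc[lambda d: d.Weight < 50]      # callable receives the DataFrame
```

The slice works here because both endpoints, `"Bob"` and `"Dan"`, are unique labels; slicing from or to the duplicated label `"Ann"` would raise an error on this unsorted index.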

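#### 1.1.2 The iloc indexer
The `iloc` indexer works like `loc` but selects by integer position rather than by label: slices exclude the end position, and a Boolean filter must be passed as an array (for example via `.values`) instead of a Series.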
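
A small sketch of position-based selection, assuming a hypothetical frame `df_iloc` invented for this example:

```python
import pandas as pd

# Hypothetical toy table with a default RangeIndex.
df_iloc = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": list("wxyz")}
)

first_cell = df_iloc.iloc[0, 0]             # single position -> scalar
rows_cols = df_iloc.iloc[[0, 2], [0, -1]]   # rows 0 and 2, first and last column
sliced = df_iloc.iloc[1:3]                  # rows 1 and 2; position 3 is excluded
mask = (df_iloc.a % 2 == 0).values          # convert the Boolean Series to an array
evens = df_iloc.iloc[mask]                  # rows where column a is even
```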

#### 1.1.3 The query method
The `query` method takes a SQL-like expression string, which makes compound conditions concise to write.
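
The sketch below shows a compound condition, a reference to an outside variable with `@`, and a column name containing a space wrapped in backticks. The frame `df_q`, its columns, and the variable `limit` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy table; one column name contains a space.
df_q = pd.DataFrame(
    {
        "age": [25, 41, 38, 33],
        "dept": ["Dairy", "Bakery", "Meats", "Bakery"],
        "Cocoa Percent": [0.70, 0.80, 0.65, 0.75],
    }
)

limit = 40  # outside variable, referenced inside the expression as @limit
young_staff = df_q.query('age <= @limit and dept in ["Dairy", "Bakery"]')

# Column names that are not valid Python identifiers go in backticks.
high_cocoa = df_q.query("`Cocoa Percent` > 0.7")
```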

### 1.2 Multi-level Indexes

#### 1.2.1 MultiIndex and its table structure
`.index.get_level_values(x)` returns the values of level `x` of the index; calling `.tolist()` on the result converts them to a list.

In [1]:
import pandas as pd
import numpy as np

np.random.seed(0)
L1,L2 = ['A','B','C'],['a','b']
mul_index1 = pd.MultiIndex.from_product([L1,L2],names=('Upper', 'Lower'))
L3,L4 = ['D','E'],['d','e','f']
mul_index2 = pd.MultiIndex.from_product([L3,L4],names=('Big', 'Small'))
df = pd.DataFrame(np.random.randint(-6,7,(6,6)), index=mul_index1, columns=mul_index2)
df

Unnamed: 0_level_0,Big,D,D,D,E,E,E
Unnamed: 0_level_1,Small,d,e,f,d,e,f
Upper,Lower,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
A,a,6,-1,-6,-3,5,-3
A,b,1,3,-3,-1,-4,-2
B,a,1,0,2,2,6,4
B,b,-5,0,1,1,2,-5
C,a,-1,3,2,3,-2,-3
C,b,-6,-3,-1,-6,-4,-3


In [2]:
df.index.get_level_values(1).tolist()

['a', 'b', 'a', 'b', 'a', 'b']

#### 1.2.2 The IndexSlice object
An `IndexSlice` object lets slices be combined with conditions (Boolean lists or functions) when selecting data.

In [3]:
# Select the rows up to 'A' and the columns whose sum is greater than 0
idx = pd.IndexSlice

df.loc[idx[:'A', lambda x:x.sum()>0]]

Unnamed: 0_level_0,Big,D,E
Unnamed: 0_level_1,Small,e,e
Upper,Lower,Unnamed: 2_level_2,Unnamed: 3_level_2
A,a,-1,5
A,b,3,-4


#### 1.2.3 Constructing a MultiIndex
- from_tuples
- from_arrays 
- from_product
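
The three constructors can build the same index from different inputs. A minimal sketch, with level names borrowed from the example above and otherwise arbitrary values:

```python
import pandas as pd

names = ["Upper", "Lower"]

# from_tuples: one tuple per row of the index
from_tuples = pd.MultiIndex.from_tuples(
    [("A", "a"), ("A", "b"), ("B", "a"), ("B", "b")], names=names
)

# from_arrays: one array per level, all of the same length
from_arrays = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], ["a", "b", "a", "b"]], names=names
)

# from_product: the Cartesian product of the given level values
from_product = pd.MultiIndex.from_product([["A", "B"], ["a", "b"]], names=names)
```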

### 1.3 Common Index Methods

#### 1.3.1 Swapping and dropping index levels

In [4]:
# Swap level 0 and level 1 (the first and second levels) of the column index
df.swaplevel(1,0,axis=1).head()

Unnamed: 0_level_0,Small,d,e,f,d,e,f
Unnamed: 0_level_1,Big,D,D,D,E,E,E
Upper,Lower,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
A,a,6,-1,-6,-3,5,-3
A,b,1,3,-3,-1,-4,-2
B,a,1,0,2,2,6,4
B,b,-5,0,1,1,2,-5
C,a,-1,3,2,3,-2,-3


In [5]:
# The integers in the list refer to level positions in the original index
df.reorder_levels([1,0],axis=0).head()

Unnamed: 0_level_0,Big,D,D,D,E,E,E
Unnamed: 0_level_1,Small,d,e,f,d,e,f
Lower,Upper,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
a,A,6,-1,-6,-3,5,-3
b,A,1,3,-3,-1,-4,-2
a,B,1,0,2,2,6,4
b,B,-5,0,1,1,2,-5
a,C,-1,3,2,3,-2,-3


In [6]:
# Drop level 1 (counting from 0, i.e. the `Small` level) of the column index
df.droplevel(1,axis=1)

Unnamed: 0_level_0,Big,D,D,D,E,E,E
Upper,Lower,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
A,a,6,-1,-6,-3,5,-3
A,b,1,3,-3,-1,-4,-2
B,a,1,0,2,2,6,4
B,b,-5,0,1,1,2,-5
C,a,-1,3,2,3,-2,-3
C,b,-6,-3,-1,-6,-4,-3


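#### 1.3.2 Modifying index attributes
- `rename_axis` changes the names of index levels; the usual way is to pass a dict mapping old names to new names
- `rename` changes the values in an index; for a multi-level index, specify the level to modify with `level`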
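
To make the difference between level names and level values concrete, here is a short sketch on a hypothetical frame `df_r`. The new names `Outer` and `x` are arbitrary.

```python
import pandas as pd

# Hypothetical frame with a two-level row index named Upper / Lower.
row_index = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"]], names=["Upper", "Lower"]
)
df_r = pd.DataFrame({"v": range(4)}, index=row_index)

# rename_axis touches only the level names.
names_changed = df_r.rename_axis(index={"Upper": "Outer"})

# rename touches the values; level=1 restricts the mapping to the Lower level.
values_changed = df_r.rename(index={"a": "x"}, level=1)
```

Both calls return new objects and leave `df_r` unchanged.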

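### 1.4 Index Operations
- $S_A \cap S_B$: `S_A.intersection(S_B)`, `S_A & S_B`
- $S_A \cup S_B$: `S_A.union(S_B)`, `S_A | S_B`
- $S_A - S_B$: `S_A.difference(S_B)`, `(S_A ^ S_B) & S_A`
- $S_A \triangle S_B$: `S_A.symmetric_difference(S_B)`, `S_A ^ S_B`

The operator forms act as set operations only in older pandas releases. Since pandas 2.0, `&`, `|`, and `^` on an `Index` are element-wise logical operations, so the method forms are the portable choice.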
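
A minimal sketch of the four set operations using the method forms, with two small made-up indexes:

```python
import pandas as pd

# Two hypothetical indexes without duplicates.
S_A = pd.Index(["a", "b", "c"])
S_B = pd.Index(["b", "c", "d"])

inter = S_A.intersection(S_B)        # elements in both
union = S_A.union(S_B)               # elements in either
diff = S_A.difference(S_B)           # elements in S_A but not in S_B
sym = S_A.symmetric_difference(S_B)  # elements in exactly one of the two
```

If an index contains duplicates, calling `.unique()` on it first keeps the results in line with the mathematical set definitions.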

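## 2 Quick Exercises

### 2.1 Question 1
`select_dtypes` is a handy function that selects the columns of the given types from a table; to select all numeric columns, `.select_dtypes('number')` is enough. Implement the same functionality on the `learn_pandas` dataset using Boolean-list selection together with the `dtypes` attribute of `DataFrame`.

**My answer:**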

In [7]:
df = pd.read_csv('../data/learn_pandas.csv', usecols = ['School', 'Grade', 'Name', 'Gender', 'Weight', 'Transfer'])
df.head()

Unnamed: 0,School,Grade,Name,Gender,Weight,Transfer
0,Shanghai Jiao Tong University,Freshman,Gaopeng Yang,Female,46.0,N
1,Peking University,Freshman,Changqiang You,Male,70.0,N
2,Shanghai Jiao Tong University,Senior,Mei Sun,Male,89.0,N
3,Fudan University,Sophomore,Xiaojuan Sun,Female,41.0,N
4,Fudan University,Sophomore,Gaojuan You,Male,74.0,N


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   School    200 non-null    object 
 1   Grade     200 non-null    object 
 2   Name      200 non-null    object 
 3   Gender    200 non-null    object 
 4   Weight    189 non-null    float64
 5   Transfer  188 non-null    object 
dtypes: float64(1), object(5)
memory usage: 9.5+ KB


In [9]:
# Only the Weight column qualifies as a numeric (number) type
df.select_dtypes('number').head()

Unnamed: 0,Weight
0,46.0
1,70.0
2,89.0
3,41.0
4,74.0


In [10]:
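# Build a Boolean mask from the DataFrame's dtypes attribute and select columns with it.
# Comparing dtypes to np.number with == is unreliable (it can miss integer columns),
# so test each dtype with is_numeric_dtype instead (note: it also counts bool as numeric).
df.loc[:, df.dtypes.map(pd.api.types.is_numeric_dtype)].head()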

Unnamed: 0,Weight
0,46.0
1,70.0
2,89.0
3,41.0
4,74.0


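### 2.2 Question 2
As with a single-level index, slicing cannot be used when the index contains duplicate elements. Remove the duplicate index entries, then give an example of slicing by elements.

**My answer:**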

In [11]:
df_multi = df.set_index(['School', 'Grade'])
df_multi = df_multi.sort_index()
df_multi.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Name,Gender,Weight,Transfer
School,Grade,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Fudan University,Freshman,Changqiang Yang,Female,49.0,N
Fudan University,Freshman,Gaoqiang Qin,Female,63.0,N
Fudan University,Freshman,Gaofeng Zhao,Female,43.0,N
Fudan University,Freshman,Yanquan Wang,Female,55.0,N
Fudan University,Freshman,Feng Wang,Male,74.0,N


In [12]:
# Keep the first row of each (School, Grade) pair so that the index becomes unique
df_dup = df_multi.reset_index().drop_duplicates(subset=['School','Grade'], keep='first').set_index(['School','Grade'])
df_dup

Unnamed: 0_level_0,Unnamed: 1_level_0,Name,Gender,Weight,Transfer
School,Grade,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Fudan University,Freshman,Changqiang Yang,Female,49.0,N
Fudan University,Junior,Yanli You,Female,48.0,N
Fudan University,Senior,Chengpeng Zheng,Female,38.0,N
Fudan University,Sophomore,Xiaojuan Sun,Female,41.0,N
Peking University,Freshman,Changqiang You,Male,70.0,N
Peking University,Junior,Juan Xu,Female,,N
Peking University,Senior,Changli Lv,Female,41.0,N
Peking University,Sophomore,Changmei Xu,Female,43.0,N
Shanghai Jiao Tong University,Freshman,Gaopeng Yang,Female,46.0,N
Shanghai Jiao Tong University,Junior,Feng Zheng,Female,51.0,N


In [13]:
df_dup.loc[('Fudan University', 'Freshman'):('Peking University', 'Junior')]

Unnamed: 0_level_0,Unnamed: 1_level_0,Name,Gender,Weight,Transfer
School,Grade,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Fudan University,Freshman,Changqiang Yang,Female,49.0,N
Fudan University,Junior,Yanli You,Female,48.0,N
Fudan University,Senior,Chengpeng Zheng,Female,38.0,N
Fudan University,Sophomore,Xiaojuan Sun,Female,41.0,N
Peking University,Freshman,Changqiang You,Male,70.0,N
Peking University,Junior,Juan Xu,Female,,N


### 2.3 Question 3
Try using a function in `rename_axis` to achieve the same result as in the example.

In [14]:
np.random.seed(0)
L1,L2,L3 = ['A','B'],['a','b'],['alpha','beta']
mul_index1 = pd.MultiIndex.from_product([L1,L2,L3], names=('Upper', 'Lower','Extra'))
L4,L5,L6 = ['C','D'],['c','d'],['cat','dog']
mul_index2 = pd.MultiIndex.from_product([L4,L5,L6], names=('Big', 'Small', 'Other'))
df_ex = pd.DataFrame(np.random.randint(-9,10,(8,8)), index=mul_index1,  columns=mul_index2)
df_ex

Unnamed: 0_level_0,Unnamed: 1_level_0,Big,C,C,C,C,D,D,D,D
Unnamed: 0_level_1,Unnamed: 1_level_1,Small,c,c,d,d,c,c,d,d
Unnamed: 0_level_2,Unnamed: 1_level_2,Other,cat,dog,cat,dog,cat,dog,cat,dog
Upper,Lower,Extra,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3
A,a,alpha,3,6,-9,-6,-6,-2,0,9
A,a,beta,-5,-3,3,-8,-3,-2,5,8
A,b,alpha,-4,4,-1,0,7,-4,6,6
A,b,beta,-9,9,-6,8,5,-2,-9,-8
B,a,alpha,0,-9,1,-6,2,9,-7,-9
B,a,beta,-9,-5,-4,-3,-1,8,6,-5
B,b,alpha,0,1,-8,-8,-2,0,-6,-3
B,b,beta,2,5,9,-9,5,-6,3,1


**My answer:**

Original version:

In [15]:
df_ex.rename_axis(index={'Upper':'Changed_row'}, columns={'Other':'Changed_Col'}).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Big,C,C,C,C,D,D,D,D
Unnamed: 0_level_1,Unnamed: 1_level_1,Small,c,c,d,d,c,c,d,d
Unnamed: 0_level_2,Unnamed: 1_level_2,Changed_Col,cat,dog,cat,dog,cat,dog,cat,dog
Changed_row,Lower,Extra,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3
A,a,alpha,3,6,-9,-6,-6,-2,0,9
A,a,beta,-5,-3,3,-8,-3,-2,5,8
A,b,alpha,-4,4,-1,0,7,-4,6,6
A,b,beta,-9,9,-6,8,5,-2,-9,-8
B,a,alpha,0,-9,1,-6,2,9,-7,-9


Implemented with a function:

In [16]:
df_ex.rename_axis(index=lambda x: 'Changed_row' if x == 'Upper' else x, 
                  columns=lambda x: 'Changed_Col' if x == 'Other' else x).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Big,C,C,C,C,D,D,D,D
Unnamed: 0_level_1,Unnamed: 1_level_1,Small,c,c,d,d,c,c,d,d
Unnamed: 0_level_2,Unnamed: 1_level_2,Changed_Col,cat,dog,cat,dog,cat,dog,cat,dog
Changed_row,Lower,Extra,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3
A,a,alpha,3,6,-9,-6,-6,-2,0,9
A,a,beta,-5,-3,3,-8,-3,-2,5,8
A,b,alpha,-4,4,-1,0,7,-4,6,6
A,b,beta,-9,9,-6,8,5,-2,-9,-8
B,a,alpha,0,-9,1,-6,2,9,-7,-9


## 3 Exercises
### 3.1 Ex1: Company employee dataset
Here is a company employee dataset:

In [17]:
df = pd.read_csv('../data/company.csv')
df.head(3)

Unnamed: 0,EmployeeID,birthdate_key,age,city_name,department,job_title,gender
0,1318,1/3/1954,61,Vancouver,Executive,CEO,M
1,1319,1/3/1957,58,Vancouver,Executive,VP Stores,F
2,1320,1/2/1955,60,Vancouver,Executive,Legal Counsel,F


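1. Using only `query` and only `loc`, respectively, select the men aged 40 or younger who work in the `Dairy` or `Bakery` department.
2. Select the 1st, 3rd, and second-to-last columns of the rows whose employee `ID` is odd.
3. Perform the following index operations in order:

* Set the last three columns as the index, then swap the inner and outer levels
* Restore the middle level to a column
* Rename the outer index level to `Gender`
* Join the two row-index levels with an underscore
* Split the row index back into its previous state
* Change the index names back to those of the original table
* Restore the default index and put the columns back in their original relative order

**My answer:**

**Part 1:**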

In [18]:
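# Using the loc indexer. "No older than 40" strictly means age <= 40;
# the filter here uses age < 40 and therefore leaves out the 40-year-olds.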
df.loc[(df.age < 40) & (df.department.isin(['Dairy', 'Bakery'])) & (df.gender == 'M')].head()

Unnamed: 0,EmployeeID,birthdate_key,age,city_name,department,job_title,gender
3722,5902,1/12/1976,39,New Westminster,Dairy,Dairy Person,M
3724,5904,1/16/1976,39,Kelowna,Dairy,Dairy Person,M
3725,5905,1/19/1976,39,Burnaby,Dairy,Dairy Person,M
3727,5907,1/30/1976,39,Cranbrook,Bakery,Baker,M
3730,5910,2/5/1976,39,New Westminster,Dairy,Dairy Person,M


In [19]:
# Using the query method with the same conditions
df.query('age < 40 & department == ["Dairy", "Bakery"] & gender == "M"').head()

Unnamed: 0,EmployeeID,birthdate_key,age,city_name,department,job_title,gender
3722,5902,1/12/1976,39,New Westminster,Dairy,Dairy Person,M
3724,5904,1/16/1976,39,Kelowna,Dairy,Dairy Person,M
3725,5905,1/19/1976,39,Burnaby,Dairy,Dairy Person,M
3727,5907,1/30/1976,39,Cranbrook,Bakery,Baker,M
3730,5910,2/5/1976,39,New Westminster,Dairy,Dairy Person,M


**Part 2:**  
Use the `iloc` indexer: filter the rows with the condition `df.EmployeeID%2==1` (converted to an array with `.values`) and select the columns `[0, 2, -2]`.

In [20]:
df.iloc[(df.EmployeeID % 2 == 1).values, [0, 2, -2]].head()

Unnamed: 0,EmployeeID,age,job_title
1,1319,58,VP Stores
3,1321,56,VP Human Resources
5,1323,53,"Exec Assistant, VP Stores"
6,1325,51,"Exec Assistant, Legal Counsel"
8,1329,48,Store Manager


**Part 3:**

In [21]:
df_copy = df.copy()

In [22]:
df_copy.columns[-3:].tolist()

['department', 'job_title', 'gender']

In [23]:
# Set the last three columns as the index, then swap the inner and outer levels
df_copy = df_copy.set_index(df_copy.columns[-3:].tolist()).swaplevel(0,2)
df_copy.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,EmployeeID,birthdate_key,age,city_name
gender,job_title,department,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
M,CEO,Executive,1318,1/3/1954,61,Vancouver
F,VP Stores,Executive,1319,1/3/1957,58,Vancouver
F,Legal Counsel,Executive,1320,1/2/1955,60,Vancouver
M,VP Human Resources,Executive,1321,1/2/1959,56,Vancouver
M,VP Finance,Executive,1322,1/9/1958,57,Vancouver


In [24]:
# Restore the middle level to a column
df_copy = df_copy.reset_index(level=1)
df_copy.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,job_title,EmployeeID,birthdate_key,age,city_name
gender,department,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
M,Executive,CEO,1318,1/3/1954,61,Vancouver
F,Executive,VP Stores,1319,1/3/1957,58,Vancouver
F,Executive,Legal Counsel,1320,1/2/1955,60,Vancouver
M,Executive,VP Human Resources,1321,1/2/1959,56,Vancouver
M,Executive,VP Finance,1322,1/9/1958,57,Vancouver


In [25]:
# Rename the outer index level to Gender
df_copy = df_copy.rename_axis(index={'gender':'Gender'})
df_copy.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,job_title,EmployeeID,birthdate_key,age,city_name
Gender,department,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
M,Executive,CEO,1318,1/3/1954,61,Vancouver
F,Executive,VP Stores,1319,1/3/1957,58,Vancouver
F,Executive,Legal Counsel,1320,1/2/1955,60,Vancouver
M,Executive,VP Human Resources,1321,1/2/1959,56,Vancouver
M,Executive,VP Finance,1322,1/9/1958,57,Vancouver


In [26]:
# Join the two row-index levels with an underscore
df_copy.index = df_copy.index.map(lambda x: '_'.join(x))
df_copy.head()

Unnamed: 0,job_title,EmployeeID,birthdate_key,age,city_name
M_Executive,CEO,1318,1/3/1954,61,Vancouver
F_Executive,VP Stores,1319,1/3/1957,58,Vancouver
F_Executive,Legal Counsel,1320,1/2/1955,60,Vancouver
M_Executive,VP Human Resources,1321,1/2/1959,56,Vancouver
M_Executive,VP Finance,1322,1/9/1958,57,Vancouver


In [27]:
# Split the row index back into its previous two-level state
df_copy.index = df_copy.index.map(lambda x:tuple(x.split('_')))
df_copy.head()

Unnamed: 0,Unnamed: 1,job_title,EmployeeID,birthdate_key,age,city_name
M,Executive,CEO,1318,1/3/1954,61,Vancouver
F,Executive,VP Stores,1319,1/3/1957,58,Vancouver
F,Executive,Legal Counsel,1320,1/2/1955,60,Vancouver
M,Executive,VP Human Resources,1321,1/2/1959,56,Vancouver
M,Executive,VP Finance,1322,1/9/1958,57,Vancouver


In [28]:
# Change the index names back to those of the original table
df_copy = df_copy.rename_axis(['gender', 'department'], axis=0)
df_copy.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,job_title,EmployeeID,birthdate_key,age,city_name
gender,department,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
M,Executive,CEO,1318,1/3/1954,61,Vancouver
F,Executive,VP Stores,1319,1/3/1957,58,Vancouver
F,Executive,Legal Counsel,1320,1/2/1955,60,Vancouver
M,Executive,VP Human Resources,1321,1/2/1959,56,Vancouver
M,Executive,VP Finance,1322,1/9/1958,57,Vancouver


In [29]:
# Restore the default index and put the columns back in their original relative order
df_copy = df_copy.reset_index()
df_copy.head()

Unnamed: 0,gender,department,job_title,EmployeeID,birthdate_key,age,city_name
0,M,Executive,CEO,1318,1/3/1954,61,Vancouver
1,F,Executive,VP Stores,1319,1/3/1957,58,Vancouver
2,F,Executive,Legal Counsel,1320,1/2/1955,60,Vancouver
3,M,Executive,VP Human Resources,1321,1/2/1959,56,Vancouver
4,M,Executive,VP Finance,1322,1/9/1958,57,Vancouver


The column order differs from the original table, so use `reindex` with the original column names passed as the `columns` argument.

In [30]:
df_copy = df_copy.reindex(columns=df.columns)
df_copy.head()

Unnamed: 0,EmployeeID,birthdate_key,age,city_name,department,job_title,gender
0,1318,1/3/1954,61,Vancouver,Executive,CEO,M
1,1319,1/3/1957,58,Vancouver,Executive,VP Stores,F
2,1320,1/2/1955,60,Vancouver,Executive,Legal Counsel,F
3,1321,1/2/1959,56,Vancouver,Executive,VP Human Resources,M
4,1322,1/9/1958,57,Vancouver,Executive,VP Finance,M


In [31]:
assert df_copy.equals(df)

### 3.2 Ex2: Chocolate dataset
Here is a dataset of chocolate ratings:

In [32]:
df = pd.read_csv('../data/chocolate.csv')
df.head(3)

Unnamed: 0,Company,Review\r\nDate,Cocoa\r\nPercent,Company\r\nLocation,Rating
0,A. Morin,2016,63%,France,3.75
1,A. Morin,2015,70%,France,2.75
2,A. Morin,2015,70%,France,3.0


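1. Replace the `\n` in the column names with a space (as the output above shows, the names in this copy of the file actually contain `\r\n`).
2. The chocolate `Rating` ranges from 1 to 5 in steps of 0.25. Select the samples rated 2.75 or lower whose cocoa content `Cocoa Percent` is above the median.
3. After setting `Review Date` and `Company Location` as the index, select the samples whose `Review Date` is after 2012 and whose `Company Location` is not one of `France, Canada, Amsterdam, Belgium`.

**My answer:**  
**Part 1:**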

In [33]:
df_demo = df.rename(columns=lambda x:str.replace(x, '\r\n', ' '))
df_demo.head()

Unnamed: 0,Company,Review Date,Cocoa Percent,Company Location,Rating
0,A. Morin,2016,63%,France,3.75
1,A. Morin,2015,70%,France,2.75
2,A. Morin,2015,70%,France,3.0
3,A. Morin,2015,70%,France,3.5
4,A. Morin,2015,70%,France,3.5


**Part 2:**

In [34]:
# Convert percentage strings such as '63%' to floats such as 0.63
df_demo['Cocoa Percent'] = df_demo['Cocoa Percent'].apply(lambda x: float(x[:-1])/100)

In [35]:
df_demo.query('Rating <=2.75 & `Cocoa Percent` > `Cocoa Percent`.median()').head()

Unnamed: 0,Company,Review Date,Cocoa Percent,Company Location,Rating
33,Akesson's (Pralus),2010,0.75,Switzerland,2.75
34,Akesson's (Pralus),2010,0.75,Switzerland,2.75
36,Alain Ducasse,2014,0.75,France,2.75
38,Alain Ducasse,2013,0.75,France,2.5
39,Alain Ducasse,2013,0.75,France,2.5


In [36]:
df_demo[(df_demo['Rating'] <=2.75) & (df_demo['Cocoa Percent'] > df_demo['Cocoa Percent'].median())].head()

Unnamed: 0,Company,Review Date,Cocoa Percent,Company Location,Rating
33,Akesson's (Pralus),2010,0.75,Switzerland,2.75
34,Akesson's (Pralus),2010,0.75,Switzerland,2.75
36,Alain Ducasse,2014,0.75,France,2.75
38,Alain Ducasse,2013,0.75,France,2.5
39,Alain Ducasse,2013,0.75,France,2.5


**Part 3:**

In [37]:
idx = pd.IndexSlice

In [38]:
# Set Review Date and Company Location as the index, then sort it so that slicing works
df_demo = df_demo.set_index(['Review Date', 'Company Location']).sort_index(level=0)
df_demo.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Company,Cocoa Percent,Rating
Review Date,Company Location,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2006,Belgium,Cote d' Or (Kraft),0.7,1.0
2006,Belgium,Dolfin (Belcolade),0.7,1.5
2006,Belgium,Neuhaus (Callebaut),0.73,2.0
2006,Belgium,Neuhaus (Callebaut),0.75,2.75
2006,Belgium,Neuhaus (Callebaut),0.71,3.0


In [39]:
# Select the samples with Review Date from 2012 onward (the label slice 2012: includes 2012 itself)
# and Company Location not in France, Canada, Amsterdam, Belgium
df_demo.loc[idx[2012:, df_demo.index.get_level_values(1).difference(['France', 'Canada', 'Amsterdam', 'Belgium'])], :].head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Company,Cocoa Percent,Rating
Review Date,Company Location,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2012,Australia,Bahen & Co.,0.7,3.0
2012,Australia,Bahen & Co.,0.7,2.5
2012,Australia,Bahen & Co.,0.7,2.5
2012,Australia,Cravve,0.75,3.25
2012,Australia,Cravve,0.65,3.25
