# CH6 Data Selection (Picking the Dishes)

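Data selection, simply put, means filtering out the data you need. There are three main ways to do it:<br>
1. Column selection<br>
2. Row selection<br>
3. Selecting rows and columns together<br>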

In [2]:
import pandas as pd

## 6.1 Column Selection

With pandas, to select only some columns, put a column name in the square brackets after df, or pass a list of names to select several columns.

In [3]:
data_dict = {"订单编号":["A1","A2","A3","A4","A5"],
             "客户姓名":["张通","李谷","孙凤","赵恒","赵恒"],
             "唯一识别码":[101,102,103,104,104],
             "成交时间":["2018-8-8","2018-8-9","2018-8-10","2018-8-11","2018-8-12"]}
df = pd.DataFrame(data_dict)
df[["订单编号","客户姓名"]]

| | 订单编号 | 客户姓名 |
|---|---|---|
| 0 | A1 | 张通 |
| 1 | A2 | 李谷 |
| 2 | A3 | 孙凤 |
| 3 | A4 | 赵恒 |
| 4 | A5 | 赵恒 |
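
The difference between passing a bare column name and passing a list can be sketched as follows. This is a minimal illustration on a trimmed copy of the sample data; the variable names `one_col`, `one_col_df`, and `two_cols` are made up for this example.

```python
import pandas as pd

# Trimmed copy of the sample data used in this chapter
df = pd.DataFrame({
    "订单编号": ["A1", "A2", "A3", "A4", "A5"],
    "客户姓名": ["张通", "李谷", "孙凤", "赵恒", "赵恒"],
    "唯一识别码": [101, 102, 103, 104, 104],
})

one_col = df["订单编号"]                 # a bare column name returns a Series
one_col_df = df[["订单编号"]]            # a one-element list returns a DataFrame
two_cols = df[["客户姓名", "订单编号"]]  # columns come back in the order listed

print(type(one_col).__name__)     # Series
print(type(one_col_df).__name__)  # DataFrame
print(list(two_cols.columns))     # ['客户姓名', '订单编号']
```

Note that the list form also lets you reorder columns, since the result follows the order of the names you pass.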


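Selecting by column name is called ordinary (label-based) indexing.<br>
You can also pass the positions of the columns to iloc to select data.<br>
Inside iloc[ , ], the part before the comma selects rows and the part after it selects columns; positions are counted from 0.<br>
For example, 0:3 selects the 1st through 3rd columns (positions 0, 1, and 2); written as an interval, this is [0, 3).<br>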

In [4]:
data_dict = {"订单编号":["A1","A2","A3","A4","A5"],
             "客户姓名":["张通","李谷","孙凤","赵恒","赵恒"],
             "唯一识别码":[101,102,103,104,104],
             "成交时间":["2018-8-8","2018-8-9","2018-8-10","2018-8-11","2018-8-12"]}
df = pd.DataFrame(data_dict)
df.iloc[:, 0:3]

| | 订单编号 | 客户姓名 | 唯一识别码 |
|---|---|---|---|
| 0 | A1 | 张通 | 101 |
| 1 | A2 | 李谷 | 102 |
| 2 | A3 | 孙凤 | 103 |
| 3 | A4 | 赵恒 | 104 |
| 4 | A5 | 赵恒 | 104 |
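
To make the half-open slice concrete, the sketch below checks that the position slice `0:3` picks the same columns as the explicit position list `[0, 1, 2]` and as the first three column names. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "订单编号": ["A1", "A2", "A3", "A4", "A5"],
    "客户姓名": ["张通", "李谷", "孙凤", "赵恒", "赵恒"],
    "唯一识别码": [101, 102, 103, 104, 104],
    "成交时间": ["2018-8-8", "2018-8-9", "2018-8-10", "2018-8-11", "2018-8-12"],
})

by_slice = df.iloc[:, 0:3]           # positions 0, 1, 2; position 3 is excluded
by_list = df.iloc[:, [0, 1, 2]]      # the same positions listed explicitly
by_name = df[list(df.columns[0:3])]  # the same columns selected by name

print(list(by_slice.columns))        # ['订单编号', '客户姓名', '唯一识别码']
print(by_slice.equals(by_list))      # True
print(by_slice.equals(by_name))      # True
```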


## 6.2 Row Selection

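As with column selection, there are two ways to select rows:<br>
1. Ordinary indexing: pass the row index labels, using loc<br>
2. Positional indexing: pass the row positions, using iloc<br>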

In [5]:
data_dict = {"订单编号":["A1","A2","A3","A4","A5"],
             "客户姓名":["张通","李谷","孙凤","赵恒","赵恒"],
             "唯一识别码":[101,102,103,104,104],
             "成交时间":["2018-8-8","2018-8-9","2018-8-10","2018-8-11","2018-8-12"]}
df = pd.DataFrame(data_dict)
df.index = ["一","二","三","四","五"]

For ease of demonstration, the cell above assigns new labels to the index.

In [6]:
df.loc["一"]

订单编号           A1
客户姓名           张通
唯一识别码         101
成交时间     2018-8-8
Name: 一, dtype: object

In [7]:
df.loc[["一","二"]]

| | 订单编号 | 客户姓名 | 唯一识别码 | 成交时间 |
|---|---|---|---|---|
| 一 | A1 | 张通 | 101 | 2018-8-8 |
| 二 | A2 | 李谷 | 102 | 2018-8-9 |


In [8]:
df.iloc[0]

订单编号           A1
客户姓名           张通
唯一识别码         101
成交时间     2018-8-8
Name: 一, dtype: object

In [9]:
df.iloc[[0,1]]

| | 订单编号 | 客户姓名 | 唯一识别码 | 成交时间 |
|---|---|---|---|---|
| 一 | A1 | 张通 | 101 | 2018-8-8 |
| 二 | A2 | 李谷 | 102 | 2018-8-9 |


In [10]:
df.iloc[0:2]

| | 订单编号 | 客户姓名 | 唯一识别码 | 成交时间 |
|---|---|---|---|---|
| 一 | A1 | 张通 | 101 | 2018-8-8 |
| 二 | A2 | 李谷 | 102 | 2018-8-9 |
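
One detail worth checking when switching between loc and iloc is how slices treat their end point. The sketch below compares a label slice with a position slice on the relabeled index; it is an illustration only, and the variable names are invented for it.

```python
import pandas as pd

df = pd.DataFrame({
    "订单编号": ["A1", "A2", "A3", "A4", "A5"],
    "客户姓名": ["张通", "李谷", "孙凤", "赵恒", "赵恒"],
})
df.index = ["一", "二", "三", "四", "五"]

by_label = df.loc["一":"三"]  # label slice: both end labels are included
by_pos = df.iloc[0:3]         # position slice: the end position is excluded

print(list(by_label.index))   # ['一', '二', '三']
print(list(by_pos.index))     # ['一', '二', '三']
print(len(df.iloc[0:2]))      # 2
```

So `df.loc["一":"三"]` and `df.iloc[0:3]` return the same three rows, even though the slice bounds look different.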


Of course, rows can also be filtered by a condition; this is called Boolean indexing.

In [11]:
data_dict = {"订单编号":["A1","A2","A3","A4","A5"],
             "客户姓名":["张通","李谷","孙凤","赵恒","赵恒"],
             "唯一识别码":[101,102,103,104,104],
             "年龄":[31,45,23,240,240],
             "成交时间":["2018-8-8","2018-8-9","2018-8-10","2018-8-11","2018-8-12"]}
df = pd.DataFrame(data_dict)

In [12]:
df[df["年龄"]< 200]

| | 订单编号 | 客户姓名 | 唯一识别码 | 年龄 | 成交时间 |
|---|---|---|---|---|---|
| 0 | A1 | 张通 | 101 | 31 | 2018-8-8 |
| 1 | A2 | 李谷 | 102 | 45 | 2018-8-9 |
| 2 | A3 | 孙凤 | 103 | 23 | 2018-8-10 |


In [13]:
df[(df["年龄"] < 200) & (df["唯一识别码"] < 102)]

| | 订单编号 | 客户姓名 | 唯一识别码 | 年龄 | 成交时间 |
|---|---|---|---|---|---|
| 0 | A1 | 张通 | 101 | 31 | 2018-8-8 |
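
Under the hood, the condition inside the brackets is a Boolean Series with one True/False per row. The sketch below prints that mask and shows a few other ways to build one (`|` for "or", `~` for "not", and `isin` for membership). The specific conditions are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "订单编号": ["A1", "A2", "A3", "A4", "A5"],
    "客户姓名": ["张通", "李谷", "孙凤", "赵恒", "赵恒"],
    "唯一识别码": [101, 102, 103, 104, 104],
    "年龄": [31, 45, 23, 240, 240],
})

mask = df["年龄"] < 200   # Boolean Series, one value per row
print(mask.tolist())      # [True, True, True, False, False]

# Each condition needs its own parentheses when combined with & or |
either = df[(df["年龄"] >= 200) | (df["唯一识别码"] == 101)]
negated = df[~mask]                                   # rows where the mask is False
named = df[df["客户姓名"].isin(["张通", "孙凤"])]     # rows whose name is in the list

print(list(either.index))   # [0, 3, 4]
print(list(negated.index))  # [3, 4]
print(list(named.index))    # [0, 2]
```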


## 6.3 Selecting Rows and Columns Together

In [14]:
data_dict = {"订单编号":["A1","A2","A3","A4","A5"],
             "客户姓名":["张通","李谷","孙凤","赵恒","赵恒"],
             "唯一识别码":[101,102,103,104,104],
             "成交时间":["2018-8-8","2018-8-9","2018-8-10","2018-8-11","2018-8-12"]}
df = pd.DataFrame(data_dict)
df.index = ["一","二","三","四","五"]

In [15]:
df

| | 订单编号 | 客户姓名 | 唯一识别码 | 成交时间 |
|---|---|---|---|---|
| 一 | A1 | 张通 | 101 | 2018-8-8 |
| 二 | A2 | 李谷 | 102 | 2018-8-9 |
| 三 | A3 | 孙凤 | 103 | 2018-8-10 |
| 四 | A4 | 赵恒 | 104 | 2018-8-11 |
| 五 | A5 | 赵恒 | 104 | 2018-8-12 |


### 6.3.1 Ordinary Index + Ordinary Index

Pass both the row labels and the column names to select data, using loc.

In [16]:
df.loc[["一","三"],["订单编号","唯一识别码"]]

| | 订单编号 | 唯一识别码 |
|---|---|---|
| 一 | A1 | 101 |
| 三 | A3 | 103 |


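### 6.3.2 Positional Index + Positional Index

Pass both the row positions and the column positions to select data, using iloc.<br>
The first list inside the brackets selects rows and the second list selects columns.<br>
Both are counted from 0.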

In [17]:
df.iloc[[0,1],[0,2]]

| | 订单编号 | 唯一识别码 |
|---|---|---|
| 一 | A1 | 101 |
| 二 | A2 | 102 |


### 6.3.3 Boolean Index + Ordinary Index

Logic: first select rows with a Boolean index, then select columns by name.

In [18]:
data_dict = {"订单编号":["A1","A2","A3","A4","A5"],
             "客户姓名":["张通","李谷","孙凤","赵恒","赵恒"],
             "唯一识别码":[101,102,103,104,104],
             "年龄":[31,45,23,240,240],
             "成交时间":["2018-8-8","2018-8-9","2018-8-10","2018-8-11","2018-8-12"]}
df = pd.DataFrame(data_dict)

In [19]:
df[df["年龄"] < 200][["订单编号","年龄"]] 

| | 订单编号 | 年龄 |
|---|---|---|
| 0 | A1 | 31 |
| 1 | A2 | 45 |
| 2 | A3 | 23 |
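
The two-step selection above can also be written as a single loc call that takes the Boolean mask for rows and the column names for columns. The sketch below checks that both forms give the same result; it is an alternative shown for comparison, not the form used in the original cell.

```python
import pandas as pd

df = pd.DataFrame({
    "订单编号": ["A1", "A2", "A3", "A4", "A5"],
    "客户姓名": ["张通", "李谷", "孙凤", "赵恒", "赵恒"],
    "唯一识别码": [101, 102, 103, 104, 104],
    "年龄": [31, 45, 23, 240, 240],
})

# Two steps: filter the rows first, then pick the columns
chained = df[df["年龄"] < 200][["订单编号", "年龄"]]

# One step: loc accepts a Boolean mask for rows and a list of names for columns
single = df.loc[df["年龄"] < 200, ["订单编号", "年龄"]]

print(chained.equals(single))  # True
```

The single-step form is usually the safer choice if you later want to assign values to the selection, because chained indexing may operate on a temporary copy.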


### 6.3.4 Slice Index + Slice Index

Pass position slices for both the rows and the columns to select data, using iloc.

In [21]:
data_dict = {"订单编号":["A1","A2","A3","A4","A5"],
             "客户姓名":["张通","李谷","孙凤","赵恒","赵恒"],
             "唯一识别码":[101,102,103,104,104],
             "成交时间":["2018-8-8","2018-8-9","2018-8-10","2018-8-11","2018-8-12"]}
df = pd.DataFrame(data_dict)
df.index = ["一","二","三","四","五"]
df

| | 订单编号 | 客户姓名 | 唯一识别码 | 成交时间 |
|---|---|---|---|---|
| 一 | A1 | 张通 | 101 | 2018-8-8 |
| 二 | A2 | 李谷 | 102 | 2018-8-9 |
| 三 | A3 | 孙凤 | 103 | 2018-8-10 |
| 四 | A4 | 赵恒 | 104 | 2018-8-11 |
| 五 | A5 | 赵恒 | 104 | 2018-8-12 |


In [22]:
df.iloc[0:3,1:3]

| | 客户姓名 | 唯一识别码 |
|---|---|---|
| 一 | 张通 | 101 |
| 二 | 李谷 | 102 |
| 三 | 孙凤 | 103 |


### 6.3.5 Slice Index + Ordinary Index

In [23]:
data_dict = {"订单编号":["A1","A2","A3","A4","A5"],
             "客户姓名":["张通","李谷","孙凤","赵恒","赵恒"],
             "唯一识别码":[101,102,103,104,104],
             "成交时间":["2018-8-8","2018-8-9","2018-8-10","2018-8-11","2018-8-12"]}
df = pd.DataFrame(data_dict)
df.index = ["一","二","三","四","五"]

In [24]:
# .ix was removed in pandas 1.0; take the row labels by position, then select by label with loc
df.loc[df.index[0:2], ["客户姓名","唯一识别码"]]


| | 客户姓名 | 唯一识别码 |
|---|---|---|
| 一 | 张通 | 101 |
| 二 | 李谷 | 102 |
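
The `.ix` indexer is not available in pandas 1.0 and later, so combining a positional row slice with column names has to go through loc or iloc. The sketch below shows two possible ways to do that; `col_pos`, `via_loc`, and `via_iloc` are names made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "订单编号": ["A1", "A2", "A3", "A4", "A5"],
    "客户姓名": ["张通", "李谷", "孙凤", "赵恒", "赵恒"],
    "唯一识别码": [101, 102, 103, 104, 104],
    "成交时间": ["2018-8-8", "2018-8-9", "2018-8-10", "2018-8-11", "2018-8-12"],
})
df.index = ["一", "二", "三", "四", "五"]

# Option 1: turn the row positions into labels, then use loc for everything
via_loc = df.loc[df.index[0:2], ["客户姓名", "唯一识别码"]]

# Option 2: turn the column names into positions, then use iloc for everything
col_pos = df.columns.get_indexer(["客户姓名", "唯一识别码"])
via_iloc = df.iloc[0:2, col_pos]

print(col_pos.tolist())          # [1, 2]
print(via_loc.equals(via_iloc))  # True
```

Either option keeps the whole selection in one indexer, so which one to prefer mostly depends on whether the rest of your code works with labels or with positions.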
