# Querying Data with pandas

### Ways to query data in pandas

1. loc: query by row and column labels
2. iloc: query by row and column integer positions
3. where
4. query
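
The rest of this notebook only demonstrates loc, so here is a minimal sketch of how the other three methods compare. The DataFrame is toy data made up for illustration (it is not the Chipotle dataset), and `where` is shown on a single column to keep the result easy to read.

```python
import pandas as pd

# Toy data, made up for illustration (not the Chipotle dataset)
df = pd.DataFrame(
    {"item": ["Izze", "Chicken Bowl", "Bottled Water"], "quantity": [1, 2, 5]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "item"]       # label-based: row "b", column "item"
by_position = df.iloc[1, 0]          # position-based: second row, first column

# where keeps the original shape; entries failing the condition become NaN
masked = df["quantity"].where(df["quantity"] > 1)

# query keeps only the rows for which the expression is true
filtered = df.query("quantity > 1")
```

Note that `where` preserves the length of its input and fills in NaN, whereas `loc` with a condition and `query` both drop the rows that do not match.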

loc can be used both to query data and to overwrite it.
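
As a small sketch of that read/write symmetry, the same selector can appear on either side of an assignment. The data and the cap value of 4 are made up for illustration.

```python
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame({"item": ["Izze", "Bottled Water"], "quantity": [1, 5]})

# Read: items on rows where quantity exceeds 4
bulk = df.loc[df["quantity"] > 4, "item"]

# Write: overwrite the quantity on those same rows
df.loc[df["quantity"] > 4, "quantity"] = 4
```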

### Querying data with df.loc

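1. Query by a single label
2. Batch query with a list of labels
3. Range query with a label slice
4. Query with a conditional expression
5. Query by passing a function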

### Notes
1. All of the query methods above apply to rows as well as columns
2. Queries reduce dimensionality: DataFrame > Series > single value
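
The dimensionality rule can be sketched as follows: every selector that is a scalar label (rather than a list or slice) removes one dimension from the result. The DataFrame is toy data made up for illustration.

```python
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame({"item": ["Izze", "Chicken Bowl"], "quantity": [1, 2]})

frame = df.loc[[0, 1], ["item", "quantity"]]  # list + list     -> DataFrame
column = df.loc[[0, 1], "item"]               # list + scalar   -> Series
row = df.loc[0, ["item", "quantity"]]         # scalar + list   -> Series
value = df.loc[0, "item"]                     # scalar + scalar -> single value
```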

In [1]:
import pandas as pd
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'

chipo = pd.read_csv(url, sep = '\t')

In [2]:
chipo.head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98


In [3]:
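# Overwrite the item_price column: drop the leading "$" and convert to float
# (note: recent pandas versions may keep dtype object when assigning via .loc[:, col])
chipo.loc[:,'item_price'] = chipo['item_price'].str[1:].astype('float64')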

In [4]:
chipo.head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,2.39
1,1,1,Izze,[Clementine],3.39
2,1,1,Nantucket Nectar,[Apple],3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",16.98


In [5]:
chipo.dtypes

order_id                int64
quantity                int64
item_name              object
choice_description     object
item_price            float64
dtype: object
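
The float64 result above comes from the pandas version used to run this notebook. In pandas 2.x, an assignment of the form `df.loc[:, col] = values` tries to write into the existing column in place, so the column may remain object. A minimal sketch of a more version-robust alternative, assuming the goal is simply to replace the whole column, is plain column assignment. The prices here are made-up stand-ins.

```python
import pandas as pd

# Toy stand-in for the item_price column (values made up)
prices = pd.DataFrame({"item_price": ["$2.39", "$16.98"]})

# Strip the leading "$", convert to float, and replace the whole column
prices["item_price"] = prices["item_price"].str.lstrip("$").astype("float64")
```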

In [7]:
# Query by a single label: the row selector and the column selector can each be a single value
chipo.loc[1,'item_name']  # 得到一个单值

'Izze'

In [8]:
# Batch query with a list of labels
chipo.loc[[1,2],'item_name']

1                Izze
2    Nantucket Nectar
Name: item_name, dtype: object

In [9]:
chipo.loc[1,['item_name','quantity']]

item_name    Izze
quantity        1
Name: 1, dtype: object

In [10]:
# Range query on rows with a label slice (both endpoints are included)
chipo.loc[1:5,'item_name']   # Series

1                                     Izze
2                         Nantucket Nectar
3    Chips and Tomatillo-Green Chili Salsa
4                             Chicken Bowl
5                             Chicken Bowl
Name: item_name, dtype: object

In [12]:
chipo.loc[1:5,['item_name','quantity']]   # DataFrame

Unnamed: 0,item_name,quantity
1,Izze,1
2,Nantucket Nectar,1
3,Chips and Tomatillo-Green Chili Salsa,1
4,Chicken Bowl,2
5,Chicken Bowl,1


In [16]:
# Range query on both rows and columns with label slices
chipo.loc[1:5,'order_id':'item_name'] 

Unnamed: 0,order_id,quantity,item_name
1,1,1,Izze
2,1,1,Nantucket Nectar
3,1,1,Chips and Tomatillo-Green Chili Salsa
4,2,2,Chicken Bowl
5,3,1,Chicken Bowl
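
Note that `1:5` above returned rows 1 through 5: label slices in loc include both endpoints, unlike positional slices in iloc. A short sketch of the difference, on toy data made up for illustration:

```python
import pandas as pd

# Toy data with the default integer index 0..5
df = pd.DataFrame({"item": list("abcdef")})

by_label = df.loc[1:3, "item"]    # labels 1, 2, 3: the end label is included
by_position = df.iloc[1:3, 0]     # positions 1, 2: the end position is excluded
```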


In [17]:
# Query with a conditional expression
chipo.loc[chipo['quantity']>4,:]

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
2441,970,5,Bottled Water,,7.5
3598,1443,15,Chips and Fresh Tomato Salsa,,44.25
3599,1443,7,Bottled Water,,10.5
3887,1559,8,Side of Chips,,13.52
4152,1660,10,Bottled Water,,15.0


In [18]:
chipo['quantity']>4

0       False
1       False
2       False
3       False
4       False
        ...  
4617    False
4618    False
4619    False
4620    False
4621    False
Name: quantity, Length: 4622, dtype: bool

In [20]:
chipo.loc[chipo['quantity']>4,'item_name':'item_price']

Unnamed: 0,item_name,choice_description,item_price
2441,Bottled Water,,7.5
3598,Chips and Fresh Tomato Salsa,,44.25
3599,Bottled Water,,10.5
3887,Side of Chips,,13.52
4152,Bottled Water,,15.0
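
Because a condition is just a boolean Series, several conditions can be combined element-wise before being handed to loc. The sketch below uses toy data made up for illustration. Each condition needs its own parentheses, since `&`, `|` and `~` bind more tightly than comparison operators.

```python
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame(
    {"item_name": ["Bottled Water", "Side of Chips", "Izze"], "quantity": [5, 8, 1]}
)

both = df.loc[(df["quantity"] > 4) & (df["item_name"] == "Bottled Water"), :]  # AND
either = df.loc[(df["quantity"] > 6) | (df["item_name"] == "Izze"), :]         # OR
negated = df.loc[~(df["quantity"] > 4), :]                                     # NOT
in_list = df.loc[df["item_name"].isin(["Izze", "Side of Chips"]), "quantity"]  # membership
```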


In [21]:
# Query by passing a function
chipo.loc[lambda x:x.quantity>4,:]  # the function is called once with the whole DataFrame as x and returns a boolean mask over the rows

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
2441,970,5,Bottled Water,,7.5
3598,1443,15,Chips and Fresh Tomato Salsa,,44.25
3599,1443,7,Bottled Water,,10.5
3887,1559,8,Side of Chips,,13.52
4152,1660,10,Bottled Water,,15.0


In [27]:
def query(df):
    return (df.item_name.str.startswith('B')) & (df.quantity >4)
chipo.loc[query,:]   # the core idea of functional programming: functions can be passed around like variables

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
2441,970,5,Bottled Water,,7.5
3599,1443,7,Bottled Water,,10.5
4152,1660,10,Bottled Water,,15.0
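
Taking the "functions as values" idea one step further, a function can also build and return the filter that is passed to loc. The sketch below is an illustration only: `make_filter` is a hypothetical helper, and the data is made up.

```python
import pandas as pd

def make_filter(prefix, min_quantity):
    """Build a filter function for .loc (hypothetical helper, for illustration)."""
    def mask(df):
        # .loc calls this with the whole DataFrame and expects a boolean Series back
        return df["item_name"].str.startswith(prefix) & (df["quantity"] > min_quantity)
    return mask

# Toy data, made up for illustration
df = pd.DataFrame(
    {"item_name": ["Bottled Water", "Side of Chips", "Burrito"], "quantity": [5, 8, 1]}
)

bottled = df.loc[make_filter("B", 4), :]
sides = df.loc[make_filter("S", 4), :]
```

This keeps the filtering logic in one place while letting the prefix and threshold vary per call.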
