## 2.3 DataFrame Indexing and Selection

In [3]:
import numpy as np
import pandas as pd

data = {
    "name": ["Alice", "Bob", "Charlie", "David", "Eva","Frank", "Grace", "Hannah", "Ian", "Jack"],
    "age": [25, 30, 35, 40, 28, 32, 29, 31, 27, 33],
    "city": ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix", "Philadelphia", "San Antonio", "San Diego", "Dallas", "San Jose"],
    "score": [85.5, 90.0, 78.5, 88.0, 92.5, 80.0, 87.5, 91.0, 76.5, 89.5],
    "gender": ["F", "M", "M", "M", "F", "M", "F", "F", "M", "M"]
}
df = pd.DataFrame(data, index=["S001", "S002", "S003", "S004", "S005", "S006", "S007", "S008", "S009", "S010"])
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, S001 to S010
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   name    10 non-null     object 
 1   age     10 non-null     int64  
 2   city    10 non-null     object 
 3   score   10 non-null     float64
 4   gender  10 non-null     object 
dtypes: float64(1), int64(1), object(3)
memory usage: 480.0+ bytes


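#### 1. Select with the plain [] operator. Note: a label or list of labels inside [] is read as column labels (column-first), so rows cannot be picked by label this way; not recommended

##### 1.1 Select a single column (returns a Series)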

In [6]:
s_age = df["age"]
s_age

S001    25
S002    30
S003    35
S004    40
S005    28
S006    32
S007    29
S008    31
S009    27
S010    33
Name: age, dtype: int64

##### 1.2 Select multiple columns (returns a DataFrame)

In [7]:
df_sub_1 = df[["name", "city", "score"]]
df_sub_1

         name          city  score
S001    Alice      New York   85.5
S002      Bob   Los Angeles   90.0
S003  Charlie       Chicago   78.5
S004    David       Houston   88.0
S005      Eva       Phoenix   92.5
S006    Frank  Philadelphia   80.0
S007    Grace   San Antonio   87.5
S008   Hannah     San Diego   91.0
S009      Ian        Dallas   76.5
S010     Jack      San Jose   89.5
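
The column-first behavior of `[]` has two exceptions worth knowing about, which is part of why it is easy to misread. The sketch below uses a small illustrative frame, `df_demo` (a hypothetical subset of the data above, not a variable from this notebook): a label or list of labels selects columns, while a slice or a boolean Series selects rows.

```python
import pandas as pd

# Small illustrative frame (hypothetical subset of the data above)
df_demo = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]},
    index=["S001", "S002", "S003"],
)

col = df_demo["age"]                         # single label -> one column (Series)
cols = df_demo[["name", "age"]]              # list of labels -> columns (DataFrame)
rows_by_slice = df_demo[0:2]                 # slice -> rows, end position excluded
rows_by_mask = df_demo[df_demo["age"] > 25]  # boolean Series -> rows

print(type(col).__name__, type(cols).__name__)
print(list(rows_by_slice.index))
print(list(rows_by_mask.index))
```

Because the same operator switches between columns and rows depending on what it receives, `loc` and `iloc` are usually the clearer choice.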


#### 2. Recommended method 1: select with loc[row label, column label]

##### 2.1 Select a single value

In [13]:
value_1 = df.loc["S003", "name"]
print("Name of S003:", value_1)
value_2 = df.loc["S007", "score"]
print("Score of S007:", value_2)

Name of S003: Charlie
Score of S007: 87.5


##### 2.2 Select multiple rows and columns

In [None]:
df_sub_3 = df.loc[["S001", "S005", "S009"], ["name", "city"]]
df_sub_3

##### 2.3 Select contiguous rows and columns (slicing)

In [21]:
df_sub_4 = df.loc["S001":"S003", "name":"score"]  # Note: label slices include both endpoints
df_sub_4

         name  age         city  score
S001    Alice   25     New York   85.5
S002      Bob   30  Los Angeles   90.0
S003  Charlie   35      Chicago   78.5
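
The output above contains both `S003` and `score`, the end labels of the two slices. A minimal sketch of that rule, using a hypothetical frame `df_demo` rather than the notebook's `df`, compares a label slice with the equivalent position slice via `iloc` (covered in section 3), where the end position is excluded:

```python
import pandas as pd

# Small illustrative frame (hypothetical subset of the data above)
df_demo = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie", "David"],
        "age": [25, 30, 35, 40],
        "score": [85.5, 90.0, 78.5, 88.0],
    },
    index=["S001", "S002", "S003", "S004"],
)

# Label slices: both "S003" and "age" are part of the result
by_label = df_demo.loc["S001":"S003", "name":"age"]

# Position slices: positions 3 (rows) and 2 (columns) are NOT part of the result
by_position = df_demo.iloc[0:3, 0:2]

print(by_label.shape)
print(by_label.equals(by_position))
```

Both expressions select the same three rows and two columns, even though the written end points differ.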


##### 2.4 Select a single row or column. Note: the result is a Series; pass a single label as a scalar, not as a list

In [25]:
# Select an entire row; the result is a Series
s_sub_1 = df.loc["S004",:]
s_sub_1

name        David
age            40
city      Houston
score        88.0
gender          M
Name: S004, dtype: object

In [24]:
# Select an entire column; the result is a Series
s_sub_2 = df.loc[:,"name"]
s_sub_2

S001      Alice
S002        Bob
S003    Charlie
S004      David
S005        Eva
S006      Frank
S007      Grace
S008     Hannah
S009        Ian
S010       Jack
Name: name, dtype: object
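
The warning about list form matters because it changes the return type. As a small sketch (again with a hypothetical `df_demo`), a scalar label gives a Series, while the same label wrapped in a list gives a one-row or one-column DataFrame:

```python
import pandas as pd

# Small illustrative frame (hypothetical subset of the data above)
df_demo = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]},
    index=["S001", "S002", "S003"],
)

row_as_series = df_demo.loc["S002", :]    # scalar label -> Series
row_as_frame = df_demo.loc[["S002"], :]   # list with one label -> DataFrame
col_as_series = df_demo.loc[:, "age"]     # scalar label -> Series
col_as_frame = df_demo.loc[:, ["age"]]    # list with one label -> DataFrame

print(type(row_as_series).__name__, row_as_series.shape)
print(type(row_as_frame).__name__, row_as_frame.shape)
print(type(col_as_series).__name__, col_as_series.shape)
print(type(col_as_frame).__name__, col_as_frame.shape)
```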

#### 3. Recommended method 2: select with iloc[row position, column position]

##### 3.1 Select a single value

In [26]:
value_3 = df.iloc[1, 3]  # Note: positional indices start at 0
print("Score of S002:", value_3)

Score of S002: 90.0


##### 3.2 Select multiple rows and columns

In [28]:
df_sub_5 = df.iloc[[1,2,3],[0,2,3]]
df_sub_5

         name         city  score
S002      Bob  Los Angeles   90.0
S003  Charlie      Chicago   78.5
S004    David      Houston   88.0


##### 3.3 Select contiguous rows and columns (slicing)

In [29]:
df_sub_6 = df.iloc[0:4, 1:4]  # Note: the slice excludes the end position
df_sub_6

      age         city  score
S001   25     New York   85.5
S002   30  Los Angeles   90.0
S003   35      Chicago   78.5
S004   40      Houston   88.0


##### 3.4 Select a single row or column. Note: the result is a Series

In [30]:
# Select an entire row; the result is a Series
s_sub_3 = df.iloc[1,:]
s_sub_3

name              Bob
age                30
city      Los Angeles
score            90.0
gender              M
Name: S002, dtype: object

In [31]:
# Select an entire column; the result is a Series
s_sub_4 = df.iloc[:,2]
s_sub_4

S001        New York
S002     Los Angeles
S003         Chicago
S004         Houston
S005         Phoenix
S006    Philadelphia
S007     San Antonio
S008       San Diego
S009          Dallas
S010        San Jose
Name: city, dtype: object
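
Since `iloc` only accepts positions, it can help to translate a known label into its position first. The sketch below, on a hypothetical `df_demo`, uses `Index.get_loc` for that translation and also shows that negative positions count from the end, as in ordinary Python sequences:

```python
import pandas as pd

# Small illustrative frame (hypothetical subset of the data above)
df_demo = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 35],
        "score": [85.5, 90.0, 78.5],
    },
    index=["S001", "S002", "S003"],
)

# Translate labels into integer positions (the labels here are unique)
row_pos = df_demo.index.get_loc("S002")
col_pos = df_demo.columns.get_loc("score")
value = df_demo.iloc[row_pos, col_pos]

# Negative positions count from the end: last row, last column
last_value = df_demo.iloc[-1, -1]

print(row_pos, col_pos, value, last_value)
```

When both row and column are known by label, `loc` is more direct; this pattern is mainly useful when one axis is known by label and the other by position.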

#### 4. Select with boolean indexing

##### 4.1 Basic boolean indexing

In [33]:
df_sub_7 = df.loc[df.loc[:,"age"] > 25]
# Equivalent: df_sub_7 = df.loc[df["age"] > 25]
df_sub_7

         name  age          city  score  gender
S002      Bob   30   Los Angeles   90.0       M
S003  Charlie   35       Chicago   78.5       M
S004    David   40       Houston   88.0       M
S005      Eva   28       Phoenix   92.5       F
S006    Frank   32  Philadelphia   80.0       M
S007    Grace   29   San Antonio   87.5       F
S008   Hannah   31     San Diego   91.0       F
S009      Ian   27        Dallas   76.5       M
S010     Jack   33      San Jose   89.5       M


##### 4.2 Boolean indexing combined with column selection

In [35]:
df_sub_8 = df.loc[df["score"] > 85, "name":"score"]
df_sub_8

        name  age         city  score
S001   Alice   25     New York   85.5
S002     Bob   30  Los Angeles   90.0
S004   David   40      Houston   88.0
S005     Eva   28      Phoenix   92.5
S007   Grace   29  San Antonio   87.5
S008  Hannah   31    San Diego   91.0
S010    Jack   33     San Jose   89.5
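
The examples above filter on one condition at a time. As a sketch of how several conditions might be combined (on a hypothetical `df_demo`, not the notebook's `df`): pandas uses the element-wise operators `&`, `|` and `~` rather than `and`, `or` and `not`, and each comparison needs its own parentheses because those operators bind more tightly than `>` or `==`.

```python
import pandas as pd

# Small illustrative frame (hypothetical subset of the data above)
df_demo = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie", "Eva"],
        "age": [25, 30, 35, 28],
        "score": [85.5, 90.0, 78.5, 92.5],
        "gender": ["F", "M", "M", "F"],
    },
    index=["S001", "S002", "S003", "S005"],
)

# AND: score above 85 and gender "F", keeping two columns
high_f = df_demo.loc[
    (df_demo["score"] > 85) & (df_demo["gender"] == "F"), ["name", "score"]
]

# OR: age at least 35, or score above 90
either = df_demo.loc[(df_demo["age"] >= 35) | (df_demo["score"] > 90)]

# NOT: everyone whose gender is not "M", keeping one column (a Series)
not_male = df_demo.loc[~(df_demo["gender"] == "M"), "name"]

print(list(high_f.index))
print(list(either.index))
print(list(not_male))
```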
