In [15]:
import pandas as pd
# Read the airline passenger data
df = pd.read_csv("./DataSet/airline_passenger_satisfaction.csv", sep=",")
df.head(10)

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,...,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,1,Male,48,First-time,Business,Business,821,2,5.0,3,...,3,5,2,5,5,5,3,5,5,Neutral or Dissatisfied
1,2,Female,35,Returning,Business,Business,821,26,39.0,2,...,5,4,5,5,3,5,2,5,5,Satisfied
2,3,Male,41,Returning,Business,Business,853,0,0.0,4,...,3,5,3,5,5,3,4,3,3,Satisfied
3,4,Male,50,Returning,Business,Business,1905,0,0.0,2,...,5,5,5,4,4,5,2,5,5,Satisfied
4,5,Female,49,Returning,Business,Business,3470,0,1.0,3,...,3,4,4,5,4,3,3,3,3,Satisfied
5,6,Male,43,Returning,Business,Business,3788,0,0.0,4,...,4,4,4,3,3,4,4,4,4,Satisfied
6,7,Male,43,Returning,Business,Business,1963,0,0.0,3,...,5,5,5,4,5,5,3,5,5,Satisfied
7,8,Female,60,Returning,Business,Business,853,0,3.0,3,...,3,4,4,4,4,3,4,3,3,Satisfied
8,9,Male,50,Returning,Business,Business,2607,0,0.0,1,...,4,3,4,3,3,4,4,4,4,Neutral or Dissatisfied
9,10,Female,38,Returning,Business,Business,2822,13,0.0,2,...,5,4,5,4,2,5,2,5,5,Satisfied


In [16]:
# Read the data dictionary table
dictionary = pd.read_csv("./DataSet/data_dictionary.csv", sep=",")
dictionary.head(10)

Unnamed: 0,Field,说明
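0,ID,Unique passenger identifier
1,Gender,Passenger gender (Female/Male)
2,Age,Passenger age
3,Customer Type,Airline customer type (First-time/Returning)
4,Type of Travel,Purpose of the flight (Business/Personal)
5,Class,Travel class of the passenger's seat on the aircraft
6,Flight Distance,Flight distance (miles)
7,Departure Delay,Departure delay (minutes)
8,Arrival Delay,Arrival delay (minutes)
9,Departure and Arrival Time Convenience,"Satisfaction with the convenience of departure and arrival times, from 1 (lowest) to 5 (highest); 0 means ""not applicable"""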


In [18]:
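# List every column name of the passenger data
airlines_columns = list(df.columns)
print(airlines_columns)
# Field descriptions from the data dictionary ('说明' = "description")
dValuesColumns = dictionary['说明']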

['ID', 'Gender', 'Age', 'Customer Type', 'Type of Travel', 'Class', 'Flight Distance', 'Departure Delay', 'Arrival Delay', 'Departure and Arrival Time Convenience', 'Ease of Online Booking', 'Check-in Service', 'Online Boarding', 'Gate Location', 'On-board Service', 'Seat Comfort', 'Leg Room Service', 'Cleanliness', 'Food and Drink', 'In-flight Service', 'In-flight Wifi Service', 'In-flight Entertainment', 'Baggage Handling', 'Satisfaction']
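The data dictionary pairs each field name (`Field`) with a description (`说明`). A minimal sketch of turning it into a lookup table; the three-row `dictionary` below is a hand-built stand-in for the real file (descriptions translated from the output above), and `field_descriptions` is a hypothetical name:

```python
import pandas as pd

# Hypothetical stand-in for the data dictionary (the real file has more rows)
dictionary = pd.DataFrame({
    "Field": ["ID", "Gender", "Age"],
    "说明": ["Unique passenger identifier", "Passenger gender (Female/Male)", "Passenger age"],
})

# Map each field name to its description
field_descriptions = dictionary.set_index("Field")["说明"].to_dict()

print(field_descriptions["Age"])  # Passenger age
```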


### Selecting individual columns

In [19]:
# passengerBaseInfoData: basic passenger information (ID, gender, age, customer type)
passengerBaseInfoData = df[['ID', 'Gender', 'Age', 'Customer Type']]
passengerBaseInfoData

Unnamed: 0,ID,Gender,Age,Customer Type
0,1,Male,48,First-time
1,2,Female,35,Returning
2,3,Male,41,Returning
3,4,Male,50,Returning
4,5,Female,49,Returning
...,...,...,...,...
129875,129876,Male,28,Returning
129876,129877,Male,41,Returning
129877,129878,Male,42,Returning
129878,129879,Male,50,Returning
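Passing a list of column names (double brackets) returns a DataFrame, while a single name returns a Series. A small sketch on toy data (the values are hypothetical) showing the difference:

```python
import pandas as pd

# Small stand-in for the passenger data (hypothetical values)
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Gender": ["Male", "Female", "Male"],
    "Age": [48, 35, 41],
})

ages = df["Age"]             # single name -> Series
subset = df[["ID", "Age"]]   # list of names -> DataFrame

print(type(ages).__name__)    # Series
print(type(subset).__name__)  # DataFrame
print(subset.shape)           # (3, 2)
```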


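The cell below demonstrates the characteristics and usage of the pandas `loc` indexer:

- **Purpose**: selects data by row index labels and column label names
- **Syntax**: `df.loc[row_labels, column_labels]`
- **Slicing behavior**: a slice includes both the start and the end label (closed interval), unlike ordinary Python slicing, which includes the start but excludes the end

`loc` is pandas' label-based data selection method.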
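To see the closed-interval behavior with non-numeric labels, here is a sketch on a toy frame whose string row labels `"a"` to `"d"` are hypothetical:

```python
import pandas as pd

# Toy frame with hypothetical string row labels
df = pd.DataFrame(
    {"Age": [48, 35, 41, 50], "Gender": ["Male", "Female", "Male", "Male"]},
    index=["a", "b", "c", "d"],
)

# Both label slices include their end label: rows b..d, columns Age..Gender
by_label = df.loc["b":"d", "Age":"Gender"]
print(by_label.shape)  # (3, 2)

# An ordinary Python slice stops before its end position
print(["a", "b", "c", "d"][1:3])  # ['b', 'c']
```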

In [24]:
# loc includes the end label 5, so rows 0 through 5 (six rows) are returned
df.loc[:5,['ID','Gender','Age','Customer Type']]

Unnamed: 0,ID,Gender,Age,Customer Type
0,1,Male,48,First-time
1,2,Female,35,Returning
2,3,Male,41,Returning
3,4,Male,50,Returning
4,5,Female,49,Returning
5,6,Male,43,Returning


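# The pandas iloc indexer in detail

## Characteristics
- **Position-based indexing**: selects data by the integer positions of rows and columns (starting from 0)
- **Syntax**: `df.iloc[row_position, column_position]`
- **Slicing rule**: half-open interval; the start position is included and the end position is excluded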

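## Main differences from loc

| Feature | loc | iloc |
|------|-----|------|
| Indexing | Label-based | Integer position-based |
| Slice range | Closed interval [start:end] | Half-open [start:end) |
| Example | `df.loc['a':'c']` includes 'a' through 'c' | `df.iloc[0:3]` includes rows 0 through 2 |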
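With the default integer index (as in this dataset), labels and positions look identical, so the same slice numbers give different results. A sketch on a toy five-row frame (hypothetical ages):

```python
import pandas as pd

# Default RangeIndex: labels 0..4 coincide with positions 0..4
df = pd.DataFrame({"Age": [48, 35, 41, 50, 49]})

print(len(df.loc[0:2]))   # 3 -> labels 0, 1, 2 (end label included)
print(len(df.iloc[0:2]))  # 2 -> positions 0, 1 (end position excluded)
```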

## Practical examples

Using the current dataset, here are some examples of iloc:

```python
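# First 3 rows (positions 0, 1, 2)
df.iloc[0:3]

# Rows at positions 1-2 and columns at positions 1-3 (the 2nd-3rd rows, 2nd-4th columns)
df.iloc[1:3, 1:4]

# All columns for specific row positions
df.iloc[[0, 2, 4]]  # the 1st, 3rd and 5th rows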
```


Position-based indexing is convenient when you know the exact row and column positions and do not need to refer to label names.
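Positions and labels stop matching as soon as rows are filtered out, because the filtered frame keeps its original index labels. A sketch on toy data (hypothetical ages) where `iloc` and `loc` reach the same row through different numbers:

```python
import pandas as pd

df = pd.DataFrame({"Age": [48, 35, 41, 60, 52]})

# After filtering, the index keeps the original labels: 0, 3, 4
older = df[df["Age"] > 45]
print(older.index.tolist())  # [0, 3, 4]

print(older.iloc[1]["Age"])  # 60 -> second row by position
print(older.loc[3, "Age"])   # 60 -> row whose label is 3
# older.loc[1] would raise KeyError because label 1 was filtered out
```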

In [29]:
# iloc: rows at positions 3-10 and columns at positions 3-8
df.iloc[3:11,3:9]

Unnamed: 0,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay
3,Returning,Business,Business,1905,0,0.0
4,Returning,Business,Business,3470,0,1.0
5,Returning,Business,Business,3788,0,0.0
6,Returning,Business,Business,1963,0,0.0
7,Returning,Business,Business,853,0,3.0
8,Returning,Business,Business,2607,0,0.0
9,Returning,Business,Business,2822,13,0.0
10,First-time,Business,Business,821,0,5.0


In [33]:
# Column summary of the passenger data: non-null counts and dtypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 129880 entries, 0 to 129879
Data columns (total 24 columns):
 #   Column                                  Non-Null Count   Dtype  
---  ------                                  --------------   -----  
 0   ID                                      129880 non-null  int64  
 1   Gender                                  129880 non-null  object 
 2   Age                                     129880 non-null  int64  
 3   Customer Type                           129880 non-null  object 
 4   Type of Travel                          129880 non-null  object 
 5   Class                                   129880 non-null  object 
 6   Flight Distance                         129880 non-null  int64  
 7   Departure Delay                         129880 non-null  int64  
 8   Arrival Delay                           129487 non-null  float64
 9   Departure and Arrival Time Convenience  129880 non-null  int64  
 10  Ease of Online Booking                  1298
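In the summary above, `Arrival Delay` reports 129487 non-null values out of 129880 entries, so 393 values are missing. A minimal sketch on a toy frame (hypothetical values) of counting missing values per column with `isna().sum()` and dropping the incomplete rows:

```python
import numpy as np
import pandas as pd

# Toy data with one missing arrival delay (hypothetical values)
df = pd.DataFrame({
    "Departure Delay": [2, 26, 0, 0],
    "Arrival Delay": [5.0, 39.0, np.nan, 1.0],
})

missing = df.isna().sum()        # number of missing values per column
print(missing["Arrival Delay"])  # 1

# Keep only rows that have an arrival delay
complete = df.dropna(subset=["Arrival Delay"])
print(len(complete))             # 3
```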

In [36]:
# Passengers older than 50
high_age_passenger = df[df['Age']>50]
high_age_passenger

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,...,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
7,8,Female,60,Returning,Business,Business,853,0,3.0,3,...,3,4,4,4,4,3,4,3,3,Satisfied
14,15,Male,52,Returning,Personal,Economy,853,0,0.0,4,...,3,2,3,2,2,5,2,2,5,Neutral or Dissatisfied
15,16,Male,70,Returning,Personal,Economy,821,0,0.0,5,...,4,5,2,5,5,5,2,5,4,Neutral or Dissatisfied
17,18,Female,61,Returning,Personal,Economy,821,0,0.0,5,...,5,5,2,1,4,5,2,5,5,Neutral or Dissatisfied
21,22,Female,70,Returning,Personal,Economy,853,0,0.0,4,...,1,4,1,4,2,1,1,1,1,Neutral or Dissatisfied
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129853,129854,Female,69,Returning,Personal,Economy,337,41,48.0,5,...,4,2,3,4,1,4,3,4,4,Neutral or Dissatisfied
129854,129855,Female,56,Returning,Business,Business,1972,0,0.0,0,...,3,5,3,5,3,3,0,3,3,Satisfied
129862,129863,Male,51,Returning,Business,Business,308,0,0.0,4,...,5,5,5,4,3,5,4,5,5,Satisfied
129865,129866,Male,64,Returning,Business,Economy Plus,337,27,18.0,3,...,1,2,2,2,2,3,2,2,3,Neutral or Dissatisfied
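The expression inside the brackets, `df['Age'] > 50`, is a boolean Series aligned with the index; rows where it is `True` are kept. A sketch on toy data (hypothetical values) that also combines the mask with a column list through `loc`:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Age": [48, 60, 52, 35],
    "Class": ["Business", "Business", "Economy", "Economy"],
})

mask = df["Age"] > 50          # boolean Series aligned with df's index
print(mask.tolist())           # [False, True, True, False]

# Filter rows and pick columns in one step
older = df.loc[mask, ["ID", "Class"]]
print(older["ID"].tolist())    # [2, 3]
```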


In [40]:
# Female passengers older than 40; each condition needs its own parentheses
my_data = df[(df['Age']>40) & (df['Gender'] == 'Female')]
my_data

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,...,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
4,5,Female,49,Returning,Business,Business,3470,0,1.0,3,...,3,4,4,5,4,3,3,3,3,Satisfied
7,8,Female,60,Returning,Business,Business,853,0,3.0,3,...,3,4,4,4,4,3,4,3,3,Satisfied
16,17,Female,48,Returning,Personal,Economy,821,0,0.0,3,...,5,1,2,2,1,5,2,5,5,Neutral or Dissatisfied
17,18,Female,61,Returning,Personal,Economy,821,0,0.0,5,...,5,5,2,1,4,5,2,5,5,Neutral or Dissatisfied
19,20,Female,42,Returning,Personal,Economy,821,4,0.0,3,...,1,4,3,3,1,1,3,1,1,Neutral or Dissatisfied
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129853,129854,Female,69,Returning,Personal,Economy,337,41,48.0,5,...,4,2,3,4,1,4,3,4,4,Neutral or Dissatisfied
129854,129855,Female,56,Returning,Business,Business,1972,0,0.0,0,...,3,5,3,5,3,3,0,3,3,Satisfied
129855,129856,Female,41,Returning,Business,Business,3926,0,0.0,0,...,2,4,2,5,3,2,0,2,2,Satisfied
129872,129873,Female,44,Returning,Personal,Economy Plus,308,0,22.0,4,...,3,4,3,5,5,3,3,3,3,Neutral or Dissatisfied
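Conditions are combined with `&` (and) or `|` (or) rather than Python's `and`/`or`, and each comparison needs parentheses because `&` binds more tightly than `>` and `==`. A sketch on toy data (hypothetical values) comparing the mask form with the equivalent `DataFrame.query` string:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [49, 60, 35, 42, 45],
    "Gender": ["Female", "Female", "Female", "Male", "Male"],
})

# Element-wise AND of two boolean masks
by_mask = df[(df["Age"] > 40) & (df["Gender"] == "Female")]

# The same filter written as a query string
by_query = df.query("Age > 40 and Gender == 'Female'")

print(by_mask.index.tolist())    # [0, 1]
print(by_mask.equals(by_query))  # True
```

In a query string, column names containing spaces, such as Customer Type, must be wrapped in backticks.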


In [50]:
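# Only the last expression in a cell is displayed, so only value_counts() appears below
df.Gender.unique() # distinct gender categories
df['Customer Type'].unique()  # shows only the category names
df['Gender'].value_counts() # shows each category together with its count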

Gender
Female    65899
Male      63981
Name: count, dtype: int64
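For a quick comparison of these methods, here is a sketch on a toy Series (hypothetical values) that also shows `nunique()` and the proportions returned by `value_counts(normalize=True)`:

```python
import pandas as pd

gender = pd.Series(["Male", "Female", "Male", "Male"], name="Gender")

print(gender.unique().tolist())  # ['Male', 'Female'] (order of first appearance)
print(gender.nunique())          # 2
print(gender.value_counts().to_dict())                # {'Male': 3, 'Female': 1}
print(gender.value_counts(normalize=True).to_dict())  # {'Male': 0.75, 'Female': 0.25}
```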