## 2. Merging Data

### concat()

Best suited to combining DataFrames that have the same columns.

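Parameters:
- ignore_index: whether to discard the original DataFrame index and renumber the rows from 0
- join: how to handle columns that are not shared; 'outer' (the default) keeps all columns, 'inner' keeps only the columns common to every input
- axis: direction of concatenation; 0 stacks vertically (rows), 1 places side by side (columns)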

In [1]:
import pandas as pd

In [2]:
df1 = pd.DataFrame(
    {"Name": {0: "Al", 1: "Bob"},
     "Program": {0: "NLP", 1: "CV"}}
)

df2 = pd.DataFrame(
    {"Name": {0: "Ciel", 1: "Dio"},
     "Program": {0: "DS", 1: "Law"}}
)

In [3]:
df1

Unnamed: 0,Name,Program
0,Al,NLP
1,Bob,CV


In [4]:
df2

Unnamed: 0,Name,Program
0,Ciel,DS
1,Dio,Law


In [5]:
pd.concat([df1, df2])  # vertical concatenation; the original index labels are kept

Unnamed: 0,Name,Program
0,Al,NLP
1,Bob,CV
0,Ciel,DS
1,Dio,Law
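
Keeping the original index means labels can repeat, which changes how label-based lookups behave. The following is a minimal sketch of that effect, rebuilding the two frames above from plain lists; the names `stacked`, `rows_at_0`, and `renumbered` are illustrative and not part of the original notebook.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Al", "Bob"], "Program": ["NLP", "CV"]})
df2 = pd.DataFrame({"Name": ["Ciel", "Dio"], "Program": ["DS", "Law"]})

# Without ignore_index, both inputs contribute the labels 0 and 1.
stacked = pd.concat([df1, df2])

# A lookup by label 0 now matches one row from each input.
rows_at_0 = stacked.loc[0]

# is_unique reports whether every index label occurs only once.
index_is_unique = stacked.index.is_unique

# ignore_index=True renumbers the rows, so each label is unique again.
renumbered = pd.concat([df1, df2], ignore_index=True)
```

If unique row labels matter downstream (for example, for `.loc` lookups), passing `ignore_index=True` is usually the simpler choice.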


In [6]:
pd.concat([df1, df2], ignore_index=True)  # vertical concatenation; the index is renumbered from 0

Unnamed: 0,Name,Program
0,Al,NLP
1,Bob,CV
2,Ciel,DS
3,Dio,Law


In [7]:
df3 = pd.DataFrame(
    {"Name": {0: "Ciel", 1: "Dio"},
     "Program": {0: "DS", 1: "Law"},
     "Grade": {0: 11, 1: 13}}
)

In [8]:
pd.concat([df1, df3], ignore_index=True)  # when the columns do not fully match, the missing entries are filled with NaN

Unnamed: 0,Name,Program,Grade
0,Al,NLP,
1,Bob,CV,
2,Ciel,DS,11.0
3,Dio,Law,13.0
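
The `11.0` and `13.0` in the output above are a side effect of the NaN fill: an integer column that gains missing values is converted to float. A small sketch of how one might inspect this and, if integers are needed, switch to pandas' nullable `Int64` dtype. The conversion step is a suggestion, not something the original notebook does.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Al", "Bob"], "Program": ["NLP", "CV"]})
df3 = pd.DataFrame(
    {"Name": ["Ciel", "Dio"], "Program": ["DS", "Law"], "Grade": [11, 13]}
)

combined = pd.concat([df1, df3], ignore_index=True)

# Grade was int64 in df3, but NaN forces a float column here.
grade_dtype = str(combined["Grade"].dtype)

# Count the rows that came from df1 and therefore have no Grade.
n_missing = int(combined["Grade"].isna().sum())

# Optional: the nullable Int64 dtype keeps integers alongside missing values.
grade_nullable = combined["Grade"].astype("Int64")
```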


In [9]:
pd.concat([df1, df3], ignore_index=True, join='inner')  # pass join='inner' to keep only the columns shared by both DataFrames

Unnamed: 0,Name,Program
0,Al,NLP
1,Bob,CV
2,Ciel,DS
3,Dio,Law


In [10]:
pd.concat([df1, df2], axis=1)  # horizontal concatenation (side by side)

Unnamed: 0,Name,Program,Name,Program
0,Al,NLP,Ciel,DS
1,Bob,CV,Dio,Law
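
Horizontal concatenation lines rows up by index label, not by position. The example above works cleanly only because both frames use the labels 0 and 1. A hedged sketch with deliberately mismatched indexes follows; the frames `left` and `right` are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({"Name": ["Al", "Bob"]}, index=[0, 1])
right = pd.DataFrame({"Program": ["CV", "DS"]}, index=[1, 2])

# Default join='outer': the result uses the union of the index labels,
# and cells with no matching label are filled with NaN.
wide = pd.concat([left, right], axis=1)

# join='inner': only labels present in both frames survive.
wide_inner = pd.concat([left, right], axis=1, join="inner")
```

Here only label 1 exists in both inputs, so it is the only row with both `Name` and `Program` filled in.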


### merge()

Best suited to joining two DataFrames on a shared key column.

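Parameters:
- how: type of join ('left', 'right', 'inner', or 'outer'); see the examples below
- on: name of the key column (or list of columns) to join on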
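
`on` requires the key column to have the same name in both frames. When the names differ, `pd.merge` also accepts `left_on` and `right_on`. A minimal sketch, assuming a hypothetical `grades` frame whose key column is called `Student` (neither the frame nor the column name comes from the original notebook):

```python
import pandas as pd

students = pd.DataFrame(
    {"Name": ["Adam", "Bob", "Ciel"], "Program": ["DS", "CV", "DS"]}
)
# Hypothetical frame: same keys as above, but stored under a different column name.
grades = pd.DataFrame({"Student": ["Adam", "Bob", "Dio"], "Grade": [10, 12, 13]})

# Match students["Name"] against grades["Student"].
matched = pd.merge(
    students, grades, how="inner", left_on="Name", right_on="Student"
)

# Both key columns are kept in the result; drop one if it is redundant.
matched_clean = matched.drop(columns="Student")
```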

In [11]:
df1 = pd.DataFrame(
    {"Name": {0: "Adam", 1: "Bob", 2: "Ciel"},
     "Program": {0: "DS", 1: "CV", 2: "DS"}}
)

df2 = pd.DataFrame(
    {"Name": {0: "Adam", 1: "Bob", 2: "Dio"},
     "Grade": {0: 10, 1: 12, 2: 13}}
)

In [12]:
df1

Unnamed: 0,Name,Program
0,Adam,DS
1,Bob,CV
2,Ciel,DS


In [13]:
df2

Unnamed: 0,Name,Grade
0,Adam,10
1,Bob,12
2,Dio,13


In [14]:
pd.merge(df1, df2,
         how='left', on='Name')  # keep every row of df1; Grade is NaN where df2 has no match

Unnamed: 0,Name,Program,Grade
0,Adam,DS,10.0
1,Bob,CV,12.0
2,Ciel,DS,


In [15]:
pd.merge(df1, df2,
         how='right', on='Name')  # keep every row of df2; Program is NaN where df1 has no match

Unnamed: 0,Name,Program,Grade
0,Adam,DS,10
1,Bob,CV,12
2,Dio,,13


In [16]:
pd.merge(df1, df2,
         how='inner', on='Name')  # keep only the rows whose Name appears in both DataFrames

Unnamed: 0,Name,Program,Grade
0,Adam,DS,10
1,Bob,CV,12


In [17]:
pd.merge(df1, df2,
         how='outer', on='Name')  # keep all rows from both DataFrames

Unnamed: 0,Name,Program,Grade
0,Adam,DS,10.0
1,Bob,CV,12.0
2,Ciel,DS,
3,Dio,,13.0
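
After an outer merge it is often useful to know which side each row came from. `pd.merge` has an `indicator` parameter for this, which adds a `_merge` column. The sketch below rebuilds the two frames from plain lists; the `origin` dictionary is only there for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Adam", "Bob", "Ciel"], "Program": ["DS", "CV", "DS"]})
df2 = pd.DataFrame({"Name": ["Adam", "Bob", "Dio"], "Grade": [10, 12, 13]})

# indicator=True adds a categorical _merge column with the values
# 'left_only', 'right_only', or 'both'.
merged = pd.merge(df1, df2, how="outer", on="Name", indicator=True)

# Map each name to the side(s) it was found on.
origin = dict(zip(merged["Name"], merged["_merge"].astype(str)))

# Rows that found no partner in df2.
unmatched_left = merged[merged["_merge"] == "left_only"]
```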
