## Notes before using `merge()`
1. The default is an `inner` join.
2. Two DataFrames can be joined on their `index` or on column values.

`merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)`
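
Note 2 above says the join can use either the index or column values. The two can also be mixed in one call: a minimal sketch, assuming two hypothetical frames (`orders` and `customers`, not part of this notebook) where the key is a column on the left and the index on the right.

```python
import pandas as pd

# Hypothetical frames for illustration only.
orders = pd.DataFrame({'customer': ['ann', 'bob', 'ann'],
                       'amount': [10, 20, 30]})
customers = pd.DataFrame({'city': ['Taipei', 'Tainan']},
                         index=['ann', 'bob'])

# Match the 'customer' column on the left against the index on the right.
merged = pd.merge(orders, customers, left_on='customer', right_index=True)
print(merged)
```

Each order row picks up the `city` of the customer whose index label equals its `customer` value.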

**`how` sets the join type, analogous to SQL joins**<br>
1) `inner` (intersection of keys), the default<br>
2) `outer` (union of keys)<br>
3) left / right / ...
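
To make the difference between the join types concrete, here is a small sketch on two hypothetical frames (`left` and `right`, which are assumptions and not the notebook's data). It also uses `indicator=True`, which adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Hypothetical toy frames; keys 'b' and 'c' appear on both sides.
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'rval': [20, 30, 40]})

# how='inner' is the default: only keys present in both frames survive.
inner = pd.merge(left, right, on='key')

# how='outer' keeps every key from either side; gaps are filled with NaN.
outer = pd.merge(left, right, on='key', how='outer', indicator=True)

# how='left' keeps every row of the left frame.
left_join = pd.merge(left, right, on='key', how='left')

print(inner)
print(outer)
print(left_join)
```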

In [19]:
import pandas as pd
import numpy as np

In [12]:
data_1 = {
    'name': ['Ann', 'Tim', 'Joe', 'John'],
    'email': ['min@gmail.com', 'hchang@gmail.com', 'laioding@gmail.com', 'hsulight@gmail.com'],
    'grades': [60, 77, 92, 43]
}

data_2 = {
    'name': ['Tony', 'Johnny', 'Louis', 'Timmy'],
    'email': ['ww@gmail.com', 'cc@gmail.com', 'bb@gmail.com', 'ee@gmail.com'],
    'grades': [70, 17, 32, 43]
}

data_3 = {
    'name': ['Tony', 'Johnny', 'Louis', 'Timmy'],
    'email': ['xxxx', 'xxxx', 'bb@gmail.com', 'ee@gmail.com'],
    'gender': ['male', 'male', 'male', 'female']
}

data_4 = {
    'col1': ['Tony', 'Johnny', 'Louis', 'Timmy'],
    'col2': ['male', 'male', 'male', 'female']
}

data_5 = {
    'aaa': ['Tony', 'Johnny', 'Louis', 'Timmy'],
    'bbb': ['xxxx', 'xxxx', 'bb@gmail.com', 'ee@gmail.com'],
    'ccc': ['male', 'male', 'male', 'female']
}

df1 = pd.DataFrame(data_1)
df2 = pd.DataFrame(data_2)
df3 = pd.DataFrame(data_3)
df4 = pd.DataFrame(data_4)
df5 = pd.DataFrame(data_5)

print('df1----------------------')
print(df1)
print('df2----------------------')
print(df2)
print('df3----------------------')
print(df3)
print('df4----------------------')
print(df4)
print('df5----------------------')
print(df5)

df1----------------------
   name               email  grades
0   Ann       min@gmail.com      60
1   Tim    hchang@gmail.com      77
2   Joe  laioding@gmail.com      92
3  John  hsulight@gmail.com      43
df2----------------------
     name         email  grades
0    Tony  ww@gmail.com      70
1  Johnny  cc@gmail.com      17
2   Louis  bb@gmail.com      32
3   Timmy  ee@gmail.com      43
df3----------------------
     name         email  gender
0    Tony          xxxx    male
1  Johnny          xxxx    male
2   Louis  bb@gmail.com    male
3   Timmy  ee@gmail.com  female
df4----------------------
     col1    col2
0    Tony    male
1  Johnny    male
2   Louis    male
3   Timmy  female
df5----------------------
      aaa           bbb     ccc
0    Tony          xxxx    male
1  Johnny          xxxx    male
2   Louis  bb@gmail.com    male
3   Timmy  ee@gmail.com  female


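## A couple of scenarios when using `merge()`

**Scenario 1.** The two DataFrames share several column names
- If no key is specified, `merge()` matches on all columns the two DataFrames have in common;
- A key can be set to choose which columns to match on (pass a list to select several).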

In [7]:
df_merge = pd.merge(df2,df3)
df_merge

    name         email  grades  gender
0  Louis  bb@gmail.com      32    male
1  Timmy  ee@gmail.com      43  female


In [9]:
## Choose the columns to match on by setting a key

df_merge_1 = pd.merge(df2,df3,left_on='name',right_on='name')
df_merge_1

     name       email_x  grades       email_y  gender
0    Tony  ww@gmail.com      70          xxxx    male
1  Johnny  cc@gmail.com      17          xxxx    male
2   Louis  bb@gmail.com      32  bb@gmail.com    male
3   Timmy  ee@gmail.com      43  ee@gmail.com  female
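
The output above shows that a shared non-key column (`email`) is split into `email_x` and `email_y`. As a sketch of how that naming can be controlled, the example below uses the `suffixes` parameter, and `on='name'` as shorthand for `left_on='name', right_on='name'`. The frames `scores` and `profile` are hypothetical stand-ins, not the notebook's `df2` and `df3`.

```python
import pandas as pd

# Hypothetical frames that share both 'name' and 'email'.
scores = pd.DataFrame({'name': ['Tony', 'Louis'],
                       'email': ['ww@gmail.com', 'bb@gmail.com'],
                       'grades': [70, 32]})
profile = pd.DataFrame({'name': ['Tony', 'Louis'],
                        'email': ['xxxx', 'bb@gmail.com'],
                        'gender': ['male', 'male']})

# Join on 'name' only; the overlapping 'email' column gets custom suffixes
# instead of the default '_x' and '_y'.
merged = pd.merge(scores, profile, on='name',
                  suffixes=('_scores', '_profile'))
print(merged.columns.tolist())
```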


In [26]:
## A list can be passed as the key to select multiple columns

df_merge_2 = pd.merge(df2,df3,left_on=['name','email'],right_on=['name','email'])
print('Specifying multiple key columns')
print(df_merge_2)
print()
print('---------------------------------')
print()
df_merge_3 = pd.merge(df2,df5,left_on=['name','email'],right_on=['aaa','bbb'])
print('Multiple key columns also work when the column names differ')
print(df_merge_3)

Specifying multiple key columns
    name         email  grades  gender
0  Louis  bb@gmail.com      32    male
1  Timmy  ee@gmail.com      43  female

---------------------------------

Multiple key columns also work when the column names differ
    name         email  grades    aaa           bbb     ccc
0  Louis  bb@gmail.com      32  Louis  bb@gmail.com    male
1  Timmy  ee@gmail.com      43  Timmy  ee@gmail.com  female
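
When the key columns have different names on each side, both sets are kept in the result, as the last output shows (`aaa` and `bbb` repeat `name` and `email`). One possible cleanup, sketched here on hypothetical frames shaped like `df2` and `df5`, is to drop the redundant right-hand keys after the merge.

```python
import pandas as pd

# Hypothetical frames shaped like df2 and df5 in this notebook.
left = pd.DataFrame({'name': ['Louis', 'Timmy'],
                     'email': ['bb@gmail.com', 'ee@gmail.com'],
                     'grades': [32, 43]})
right = pd.DataFrame({'aaa': ['Louis', 'Timmy'],
                      'bbb': ['bb@gmail.com', 'ee@gmail.com'],
                      'ccc': ['male', 'female']})

merged = pd.merge(left, right,
                  left_on=['name', 'email'], right_on=['aaa', 'bbb'])

# After an inner join, 'aaa' and 'bbb' hold the same values as
# 'name' and 'email', so they can be dropped.
tidy = merged.drop(columns=['aaa', 'bbb'])
print(tidy)
```

This only holds for an inner join; with `how='outer'` the two sets of key columns can differ on unmatched rows, so dropping one set would lose information.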


---

**Scenario 2.** Joining two DataFrames on their index, using `left_index=True` and `right_index=True`

In [27]:
df_idx_1 = pd.DataFrame(np.arange(12).reshape(3,4),index=list('abc'),columns=['v1','v2','v3','v4'])
df_idx_2 = pd.DataFrame(np.arange(12,24,1).reshape(3,4),index=list('abd'),columns=['v5','v6','v7','v8'])

print('df_idx_1')
print(df_idx_1)
print()
print('---------------------------------')
print()
print('df_idx_2')
print(df_idx_2)

df_idx_1
   v1  v2  v3  v4
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11

---------------------------------

df_idx_2
   v5  v6  v7  v8
a  12  13  14  15
b  16  17  18  19
d  20  21  22  23


In [25]:
df_idx_merger = pd.merge(df_idx_1, df_idx_2, left_index=True, right_index=True)
df_idx_merger

   v1  v2  v3  v4  v5  v6  v7  v8
a   0   1   2   3  12  13  14  15
b   4   5   6   7  16  17  18  19
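
The index merge above uses the default inner join, so only labels `a` and `b` survive. As a sketch of two variations, the example below keeps every label with `how='outer'` and shows `DataFrame.join`, which joins on the index. The small frames `a` and `b` are hypothetical, not the notebook's `df_idx_1` and `df_idx_2`.

```python
import numpy as np
import pandas as pd

# Hypothetical frames whose indexes overlap only on label 'b'.
a = pd.DataFrame(np.arange(4).reshape(2, 2),
                 index=list('ab'), columns=['v1', 'v2'])
b = pd.DataFrame(np.arange(4, 8).reshape(2, 2),
                 index=list('bd'), columns=['v3', 'v4'])

# Outer join on the index: every label from either frame is kept,
# with NaN where one side has no matching row.
outer = pd.merge(a, b, left_index=True, right_index=True, how='outer')

# DataFrame.join matches on the index; how='inner' keeps only
# the labels both frames share.
joined = a.join(b, how='inner')

print(outer)
print(joined)
```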
