# Merging, Joining, and Concatenating

There are three main ways to combine DataFrames: merging, joining, and concatenating. This lecture walks through each of them with examples.

____

### Example DataFrames

In [1]:
import pandas as pd

In [3]:
df11 = pd.DataFrame({'A':['A0','A1','A2','A3'],
                    'B':['B0','B1','B2','B3'],
                    'C':['C0','C1','C2','C3'],
                    'D':['D0','D1','D2','D3']})

In [5]:
df12 = pd.DataFrame({'A':['A4','A5','A6','A7'],
                    'B':['B4','B5','B6','B7'],
                    'C':['C4','C5','C6','C7'],
                    'D':['D4','D5','D6','D7']})

In [6]:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3'],
                        'C': ['C0', 'C1', 'C2', 'C3'],
                        'D': ['D0', 'D1', 'D2', 'D3']},
                        index=[0, 1, 2, 3])

In [7]:
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                        'B': ['B4', 'B5', 'B6', 'B7'],
                        'C': ['C4', 'C5', 'C6', 'C7'],
                        'D': ['D4', 'D5', 'D6', 'D7']},
                         index=[4,5,6,7]) 

In [8]:
df1

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


In [9]:
df11

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


In [10]:
df12

Unnamed: 0,A,B,C,D
0,A4,B4,C4,D4
1,A5,B5,C5,D5
2,A6,B6,C6,D6
3,A7,B7,C7,D7


In [11]:
df2

Unnamed: 0,A,B,C,D
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


## Concatenation

Concatenation glues DataFrames together along an axis. Keep in mind that the labels on the other axis should match: where they do not, pandas still aligns the frames and fills the gaps with NaN. You can use **pd.concat** and pass in a list of DataFrames to concatenate:

In [12]:
pd.concat([df1,df2])

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


In [13]:
pd.concat([df11,df12])

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
0,A4,B4,C4,D4
1,A5,B5,C5,D5
2,A6,B6,C6,D6
3,A7,B7,C7,D7
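The result above repeats the index labels 0 to 3 because both inputs use the default index. A minimal sketch of one way to get a fresh index, using two small stand-in frames (`top` and `bottom` are illustrative names, not part of the lecture):

```python
import pandas as pd

# Two frames that both carry the default index 0, 1 (like df11 and df12).
top = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
bottom = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Plain concatenation keeps each frame's own index labels.
stacked = pd.concat([top, bottom])

# ignore_index=True discards the old labels and renumbers the rows.
renumbered = pd.concat([top, bottom], ignore_index=True)

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Renumbering is usually what you want when the old index carries no meaning; keep the original labels when they identify the rows.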


In [14]:
pd.concat([df11,df12],axis=1)

Unnamed: 0,A,B,C,D,A,B,C,D
0,A0,B0,C0,D0,A4,B4,C4,D4
1,A1,B1,C1,D1,A5,B5,C5,D5
2,A2,B2,C2,D2,A6,B6,C6,D6
3,A3,B3,C3,D3,A7,B7,C7,D7


In [15]:
pd.concat([df1,df2],axis=1)

Unnamed: 0,A,B,C,D,A,B,C,D
0,A0,B0,C0,D0,,,,
1,A1,B1,C1,D1,,,,
2,A2,B2,C2,D2,,,,
3,A3,B3,C3,D3,,,,
4,,,,,A4,B4,C4,D4
5,,,,,A5,B5,C5,D5
6,,,,,A6,B6,C6,D6
7,,,,,A7,B7,C7,D7
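Concatenating along `axis=1` keeps the column labels of every input, so frames with the same column names produce repeated labels. One possible way to keep them apart is the `keys` argument of `pd.concat`, sketched here on two small stand-in frames (`first` and `second` are illustrative names):

```python
import pandas as pd

first = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
second = pd.DataFrame({'A': ['A4', 'A5'], 'B': ['B4', 'B5']})

# keys adds an outer column level, one label per input frame.
side_by_side = pd.concat([first, second], axis=1, keys=['first', 'second'])

print(side_by_side.columns.tolist())
# Selecting an outer label returns that frame's columns.
print(side_by_side['second'])
```

The outer level makes it possible to select one input's columns by name instead of by position.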


_____
## Example DataFrames

In [16]:
left1 = pd.DataFrame({'sr_no': ['0', '1', '2', '3'],
                     'A': ['Jitu', 'Mitu', 'Seju', 'Keju'],
                     'B': ['Tiku', 'Chiku', 'Tipu', 'Ripu']})
   
right1 = pd.DataFrame({'sr_no':['0', '1', '2', '3',],
                          'C':['Anil', 'Avi', 'Ketan', 'Yashodip'],
                          'D':['Sunil', 'Mavi', 'Chetan', 'Shiva']})    

In [17]:
left3 = pd.DataFrame({'sr_no': ['K0', 'K1', 'K2', 'K3','K4'],
                     'A': ['A0', 'A1', 'A2', 'A3','A4'],
                     'B': ['B0', 'B1', 'B2', 'B3','B4']})
   
right4 = pd.DataFrame({'sr_no': ['K0', 'K1', 'K2', 'K3','K5'],
                          'C': ['C0', 'C1', 'C2', 'C3','C5'],
                          'D': ['D0', 'D1', 'D2', 'D3','D5']}) 

In [18]:
left1

Unnamed: 0,sr_no,A,B
0,0,Jitu,Tiku
1,1,Mitu,Chiku
2,2,Seju,Tipu
3,3,Keju,Ripu


In [19]:
right1

Unnamed: 0,sr_no,C,D
0,0,Anil,Sunil
1,1,Avi,Mavi
2,2,Ketan,Chetan
3,3,Yashodip,Shiva


In [None]:
pd.concat([left1,right1])

Unnamed: 0,sr_no,A,B,C,D
0,0,Jitu,Tiku,,
1,1,Mitu,Chiku,,
2,2,Seju,Tipu,,
3,3,Keju,Ripu,,
0,0,,,Anil,Sunil
1,1,,,Avi,Mavi
2,2,,,Ketan,Chetan
3,3,,,Yashodip,Shiva
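When the frames do not share all of their columns, `pd.concat` aligns them by column name rather than by position. A short sketch of the two alignment modes, using cut-down stand-ins for the frames above (`people` and `others` are illustrative names):

```python
import pandas as pd

people = pd.DataFrame({'sr_no': ['0', '1'], 'A': ['Jitu', 'Mitu']})
others = pd.DataFrame({'sr_no': ['0', '1'], 'C': ['Anil', 'Avi']})

# Default join='outer': union of the columns, missing cells become NaN.
union = pd.concat([people, others])

# join='inner': keep only the columns present in every frame.
shared = pd.concat([people, others], join='inner')

print(union.columns.tolist())        # ['sr_no', 'A', 'C']
print(int(union['C'].isna().sum()))  # 2
print(shared.columns.tolist())       # ['sr_no']
```

If the goal is to line up rows that share an `sr_no`, a merge is usually the better tool than a row-wise concatenation.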


___

## Merging

The **merge** function combines DataFrames using logic similar to SQL joins: rows are matched on one or more key columns. When no key is given, pandas merges on the columns the two frames have in common. For example:

In [None]:
pd.merge(left1, right1, how='inner')

Unnamed: 0,sr_no,A,B,C,D
0,0,Jitu,Tiku,Anil,Sunil
1,1,Mitu,Chiku,Avi,Mavi
2,2,Seju,Tipu,Ketan,Chetan
3,3,Keju,Ripu,Yashodip,Shiva


In [None]:
pd.merge(left1, right1, how='outer')

Unnamed: 0,sr_no,A,B,C,D
0,0,Jitu,Tiku,Anil,Sunil
1,1,Mitu,Chiku,Avi,Mavi
2,2,Seju,Tipu,Ketan,Chetan
3,3,Keju,Ripu,Yashodip,Shiva
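The inner and outer results above are identical because every `sr_no` value appears in both frames. A minimal sketch with only partly overlapping keys shows where the two differ (`left_demo` and `right_demo` are illustrative frames, and the key is named explicitly with `on`):

```python
import pandas as pd

left_demo = pd.DataFrame({'sr_no': ['0', '1', '2'],
                          'A': ['Jitu', 'Mitu', 'Seju']})
right_demo = pd.DataFrame({'sr_no': ['1', '2', '3'],
                           'C': ['Avi', 'Ketan', 'Yashodip']})

# Inner: only keys found in both frames.
inner = pd.merge(left_demo, right_demo, on='sr_no', how='inner')

# Outer: every key from either frame, with NaN where one side is missing.
outer = pd.merge(left_demo, right_demo, on='sr_no', how='outer')

print(inner['sr_no'].tolist())          # ['1', '2']
print(sorted(outer['sr_no'].tolist()))  # ['0', '1', '2', '3']
```

Passing `on` explicitly also documents which column is the key, instead of relying on pandas to pick the shared column names.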


Or to show a more complicated example:

In [20]:
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                        'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3']})
    
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key4': ['K0', 'K0', 'K0', 'K0'],
                         'C': ['C0', 'C1', 'C2', 'C3'],
                         'D': ['D0', 'D1', 'D2', 'D3']})

In [21]:
left

Unnamed: 0,key1,key2,A,B
0,K0,K0,A0,B0
1,K0,K1,A1,B1
2,K1,K0,A2,B2
3,K2,K1,A3,B3


In [None]:
right

Unnamed: 0,key1,key4,C,D
0,K0,K0,C0,D0
1,K1,K0,C1,D1
2,K1,K0,C2,D2
3,K2,K0,C3,D3


In [None]:
pd.merge(left, right)

Unnamed: 0,key1,key2,A,B,key4,C,D
0,K0,K0,A0,B0,K0,C0,D0
1,K0,K1,A1,B1,K0,C0,D0
2,K1,K0,A2,B2,K0,C1,D1
3,K1,K0,A2,B2,K0,C2,D2
4,K2,K1,A3,B3,K0,C3,D3


In [22]:
pd.merge(left, right, how='outer', left_on=['key2'],right_on=['key4'])

Unnamed: 0,key1_x,key2,A,B,key1_y,key4,C,D
0,K0,K0,A0,B0,K0,K0,C0,D0
1,K0,K0,A0,B0,K1,K0,C1,D1
2,K0,K0,A0,B0,K1,K0,C2,D2
3,K0,K0,A0,B0,K2,K0,C3,D3
4,K1,K0,A2,B2,K0,K0,C0,D0
5,K1,K0,A2,B2,K1,K0,C1,D1
6,K1,K0,A2,B2,K1,K0,C2,D2
7,K1,K0,A2,B2,K2,K0,C3,D3
8,K0,K1,A1,B1,,,,
9,K2,K1,A3,B3,,,,
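In the result above, `key1` exists in both frames but is not a merge key, so pandas keeps both copies and appends `_x` and `_y` to tell them apart. A sketch of how the `suffixes` argument can give them more readable names (the frames and the suffix strings are illustrative choices):

```python
import pandas as pd

left_demo = pd.DataFrame({'key1': ['K0', 'K1'],
                          'key2': ['K0', 'K1'],
                          'A': ['A0', 'A1']})
right_demo = pd.DataFrame({'key1': ['K0', 'K2'],
                           'key4': ['K0', 'K0'],
                           'C': ['C0', 'C2']})

# Match left.key2 against right.key4; rename the clashing key1 columns.
merged = pd.merge(left_demo, right_demo,
                  left_on='key2', right_on='key4',
                  suffixes=('_left', '_right'))

print(merged.columns.tolist())
```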


In [None]:
pd.merge(left, right, how='right', on=['key1'])

Unnamed: 0,key1,key2,A,B,key4,C,D
0,K0,K0,A0,B0,K0,C0,D0
1,K0,K1,A1,B1,K0,C0,D0
2,K1,K0,A2,B2,K0,C1,D1
3,K1,K0,A2,B2,K0,C2,D2
4,K2,K1,A3,B3,K0,C3,D3


In [None]:
pd.merge(left, right, how='inner')

Unnamed: 0,key1,key2,A,B,key4,C,D
0,K0,K0,A0,B0,K0,C0,D0
1,K0,K1,A1,B1,K0,C0,D0
2,K1,K0,A2,B2,K0,C1,D1
3,K1,K0,A2,B2,K0,C2,D2
4,K2,K1,A3,B3,K0,C3,D3
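When it is unclear which rows a merge kept or dropped, the `indicator` argument can help. The sketch below uses two small illustrative frames and an outer merge; the extra `_merge` column records where each row came from:

```python
import pandas as pd

left_demo = pd.DataFrame({'key1': ['K0', 'K1', 'K2'],
                          'A': ['A0', 'A1', 'A2']})
right_demo = pd.DataFrame({'key1': ['K0', 'K1', 'K3'],
                           'C': ['C0', 'C1', 'C3']})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
audit = pd.merge(left_demo, right_demo, on='key1',
                 how='outer', indicator=True)

# Map each key to the side(s) it was found on.
origin = dict(zip(audit['key1'], audit['_merge'].astype(str)))
print(origin)
```

Filtering on `_merge == 'left_only'` is a common way to find rows that have no partner in the other frame.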


## Joining
Joining is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame. The **join** method matches rows on the index and performs a left join by default.

In [None]:
import pandas as pd 
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                      index=['K0', 'K1', 'K2']) 

right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                    'D': ['D0', 'D2', 'D3']},
                      index=['K0', 'K2', 'K3'])

In [None]:
left

Unnamed: 0,A,B
K0,A0,B0
K1,A1,B1
K2,A2,B2


In [None]:
right

Unnamed: 0,C,D
K0,C0,D0
K2,C2,D2
K3,C3,D3


In [None]:
left.join(right)

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K1,A1,B1,,
K2,A2,B2,C2,D2


In [None]:
left.join(right, how='inner')

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K2,A2,B2,C2,D2
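A join on the index can also be written as a merge with `left_index=True` and `right_index=True`. The sketch below, on cut-down illustrative frames, checks that the two spellings agree for the default left join and then shows an outer join:

```python
import pandas as pd

left_demo = pd.DataFrame({'A': ['A0', 'A1', 'A2']},
                         index=['K0', 'K1', 'K2'])
right_demo = pd.DataFrame({'C': ['C0', 'C2', 'C3']},
                          index=['K0', 'K2', 'K3'])

# Default join: keep every row of the left frame.
joined = left_demo.join(right_demo)

# The same operation expressed with pd.merge on both indexes.
merged = pd.merge(left_demo, right_demo,
                  left_index=True, right_index=True, how='left')
print(joined.equals(merged))

# how='outer' keeps the index labels from both frames.
outer = left_demo.join(right_demo, how='outer')
print(outer.index.tolist())
```

`join` is the shorter spelling when the keys already live in the index; `merge` is more flexible when the keys are ordinary columns.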
