# Merging, Joining and Concatenating 

There are three main ways of combining DataFrames: merging, joining and concatenating. This lecture walks through each of them with examples.

### Example DataFrames

In [76]:
import pandas as pd

In [77]:
# create data frame 1
df1 = pd.DataFrame(data={
    'A': 'A0 A1 A2 A3'.split(),
    'B': 'B0 B1 B2 B3'.split(),
    'C': 'C0 C1 C2 C3'.split(),
    'D': 'D0 D1 D2 D3'.split()
},
index=[0, 1, 2, 3])

In [78]:
# show the dataframe 1
df1

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


In [79]:
# create data frame 2
df2 = pd.DataFrame(data={
    'A': 'A4 A5 A6 A7'.split(),
    'B': 'B4 B5 B6 B7'.split(),
    'C': 'C4 C5 C6 C7'.split(),
    'D': 'D4 D5 D6 D7'.split()
},
index=[4, 5, 6, 7])

In [80]:
# show the dataframe 2
df2

Unnamed: 0,A,B,C,D
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


In [81]:
# create data frame 3
df3 = pd.DataFrame(data={
    'A': 'A8 A9 A10 A11'.split(),
    'B': 'B8 B9 B10 B11'.split(),
    'C': 'C8 C9 C10 C11'.split(),
    'D': 'D8 D9 D10 D11'.split()
},
index=[8, 9, 10, 11])

In [82]:
# show data frame 3
df3

Unnamed: 0,A,B,C,D
8,A8,B8,C8,D8
9,A9,B9,C9,D9
10,A10,B10,C10,D10
11,A11,B11,C11,D11


## Concatenation 

Concatenation glues DataFrames together along an axis. Keep in mind that the labels on the other axis should match: when stacking rows (the default, axis=0) the DataFrames should share the same columns, and any labels that do not line up are filled with NaN. Use **pd.concat** and pass in a list of DataFrames to concatenate:

In [83]:
pd.concat([df1, df2, df3])

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7
8,A8,B8,C8,D8
9,A9,B9,C9,D9
10,A10,B10,C10,D10
11,A11,B11,C11,D11
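
As a small sketch of two `pd.concat` options that go beyond the call above: `ignore_index=True` discards the original row labels and renumbers the result, while `keys=` records which frame each row came from. The two-row frames `top` and `bottom` are hypothetical stand-ins for df1 and df2, used only to keep the example short.

```python
import pandas as pd

# Hypothetical miniature frames: same columns, different index labels
top = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
bottom = pd.DataFrame({'A': ['A4', 'A5'], 'B': ['B4', 'B5']}, index=[4, 5])

# ignore_index=True drops the original labels and renumbers rows from 0
renumbered = pd.concat([top, bottom], ignore_index=True)

# keys= adds an outer index level naming the source frame of each row
labelled = pd.concat([top, bottom], keys=['top', 'bottom'])

print(renumbered)
print(labelled.loc['bottom'])  # only the rows that came from `bottom`
```

Renumbering is handy when the original labels carry no meaning, whereas `keys=` is useful when you later need to trace rows back to their source.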


We can specify the value of the axis parameter as 1 if we want to concatenate along the columns.

In [84]:
pd.concat([df1,df2,df3], axis=1)

Unnamed: 0,A,B,C,D,A,B,C,D,A,B,C,D
0,A0,B0,C0,D0,,,,,,,,
1,A1,B1,C1,D1,,,,,,,,
2,A2,B2,C2,D2,,,,,,,,
3,A3,B3,C3,D3,,,,,,,,
4,,,,,A4,B4,C4,D4,,,,
5,,,,,A5,B5,C5,D5,,,,
6,,,,,A6,B6,C6,D6,,,,
7,,,,,A7,B7,C7,D7,,,,
8,,,,,,,,,A8,B8,C8,D8
9,,,,,,,,,A9,B9,C9,D9
10,,,,,,,,,A10,B10,C10,D10
11,,,,,,,,,A11,B11,C11,D11


Notice that the column-wise concatenation contains many NaN values. This is because the DataFrames do not share index labels, so no single DataFrame has values for every index in the result. For example, df1 only has values for indices 0 to 3, so its columns are filled with NaN for the remaining indices up to 11.

Despite the many NaN values in this column-based example, most of our concatenations will be along axis=1 (i.e. column-based).
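
To illustrate the alignment behaviour described above, here is a minimal sketch using hypothetical frames (`names`, `scores` and `extra` are not part of the lecture data). When the frames share index labels, a column-wise concat produces no NaN. When they only partly overlap, the default outer alignment keeps every label, and `join='inner'` keeps only the labels common to all frames.

```python
import pandas as pd

# Hypothetical frames: `names` and `scores` share an index, `extra` only overlaps on label 1
names = pd.DataFrame({'name': ['Ann', 'Bob', 'Cy']}, index=[0, 1, 2])
scores = pd.DataFrame({'score': [90, 85, 70]}, index=[0, 1, 2])
extra = pd.DataFrame({'bonus': [5, 10]}, index=[1, 3])

# Identical index labels: rows line up and nothing is missing
aligned = pd.concat([names, scores], axis=1)

# Partly overlapping labels: the union of labels is kept, gaps become NaN
outer = pd.concat([names, extra], axis=1)

# join='inner' keeps only the labels present in every frame
inner = pd.concat([names, extra], axis=1, join='inner')

print(aligned)
print(outer)
print(inner)
```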

### Example DataFrames

In [85]:
left = pd.DataFrame(data={
    'key': 'K0 K1 K2 K3'.split(),
    'A'  : 'A0 A1 A2 A3'.split(),
    'B'  : 'B0 B1 B2 B3'.split()
})

right = pd.DataFrame(data={
    'key': 'K0 K1 K2 K3'.split(),
    'C'  : 'C0 C1 C2 C3'.split(),
    'D'  : 'D0 D1 D2 D3'.split()
})

In [86]:
# display the left dataframe
left

Unnamed: 0,key,A,B
0,K0,A0,B0
1,K1,A1,B1
2,K2,A2,B2
3,K3,A3,B3


In [87]:
# display the right dataframe
right

Unnamed: 0,key,C,D
0,K0,C0,D0
1,K1,C1,D1
2,K2,C2,D2
3,K3,C3,D3


## Merging

The **merge** function allows you to merge DataFrames using logic similar to joining SQL tables. For example:

In [88]:
pd.merge(left, right, how='inner', on='key')

Unnamed: 0,key,A,B,C,D
0,K0,A0,B0,C0,D0
1,K1,A1,B1,C1,D1
2,K2,A2,B2,C2,D2
3,K3,A3,B3,C3,D3


Or, to show a more complicated example:

In [89]:
left = pd.DataFrame(data={
    'key1': 'K0 K0 K1 K2'.split(),
    'key2': 'K0 K1 K0 K1'.split(),
       'A': 'A0 A1 A2 A3'.split(),
       'B': 'B0 B1 B2 B3'.split()
})

right = pd.DataFrame(data={
    'key1': 'K0 K1 K1 K2'.split(),
    'key2': 'K0 K0 K0 K0'.split(),
       'C': 'C0 C1 C2 C3'.split(),
       'D': 'D0 D1 D2 D3'.split()
})

In [90]:
pd.merge(left=left, right=right, on=['key1', 'key2'])
# By default the merge happens like an inner join in SQL

Unnamed: 0,key1,key2,A,B,C,D
0,K0,K0,A0,B0,C0,D0
1,K1,K0,A2,B2,C1,D1
2,K1,K0,A2,B2,C2,D2
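
Note that the A2/B2 row appears twice in this result because the key pair (K1, K0) occurs once in `left` but twice in `right`. As a hedged sketch of one way to check such assumptions, the `validate` parameter of `pd.merge` can assert the expected key relationship. The frames are re-created here so the block runs on its own.

```python
import pandas as pd

left = pd.DataFrame({
    'key1': 'K0 K0 K1 K2'.split(),
    'key2': 'K0 K1 K0 K1'.split(),
    'A': 'A0 A1 A2 A3'.split(),
    'B': 'B0 B1 B2 B3'.split(),
})
right = pd.DataFrame({
    'key1': 'K0 K1 K1 K2'.split(),
    'key2': 'K0 K0 K0 K0'.split(),
    'C': 'C0 C1 C2 C3'.split(),
    'D': 'D0 D1 D2 D3'.split(),
})

# Key pairs are unique in `left` but repeated in `right`, so one-to-many passes
checked = pd.merge(left, right, on=['key1', 'key2'], validate='one_to_many')

# Requiring unique keys on both sides raises MergeError for this data
try:
    pd.merge(left, right, on=['key1', 'key2'], validate='one_to_one')
    one_to_one_ok = True
except pd.errors.MergeError:
    one_to_one_ok = False

print(len(checked), one_to_one_ok)
```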


In [91]:
pd.merge(left=left, right=right, how='outer', on=['key1', 'key2'])
# This works like an outer join in SQL

Unnamed: 0,key1,key2,A,B,C,D
0,K0,K0,A0,B0,C0,D0
1,K0,K1,A1,B1,,
2,K1,K0,A2,B2,C1,D1
3,K1,K0,A2,B2,C2,D2
4,K2,K1,A3,B3,,
5,K2,K0,,,C3,D3


In [92]:
pd.merge(left=left, right=right, how='right', on=['key1', 'key2'])
# This merge works like a right outer join in SQL

Unnamed: 0,key1,key2,A,B,C,D
0,K0,K0,A0,B0,C0,D0
1,K1,K0,A2,B2,C1,D1
2,K1,K0,A2,B2,C2,D2
3,K2,K0,,,C3,D3
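
The remaining `how` option, `'left'`, mirrors the right join above by keeping every row of `left`. The sketch below also uses the `indicator=True` option of `pd.merge`, which adds a `_merge` column showing whether each row matched on both sides or came from one side only. The frames are re-created so the block is self-contained.

```python
import pandas as pd

left = pd.DataFrame({
    'key1': 'K0 K0 K1 K2'.split(),
    'key2': 'K0 K1 K0 K1'.split(),
    'A': 'A0 A1 A2 A3'.split(),
    'B': 'B0 B1 B2 B3'.split(),
})
right = pd.DataFrame({
    'key1': 'K0 K1 K1 K2'.split(),
    'key2': 'K0 K0 K0 K0'.split(),
    'C': 'C0 C1 C2 C3'.split(),
    'D': 'D0 D1 D2 D3'.split(),
})

# how='left' keeps every row of `left`; unmatched rows get NaN in C and D
left_joined = pd.merge(left, right, how='left', on=['key1', 'key2'])

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'
flagged = pd.merge(left, right, how='outer', on=['key1', 'key2'], indicator=True)
counts = flagged['_merge'].value_counts()

print(left_joined)
print(counts)
```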


## Joining 

Joining is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.

In [93]:
left = pd.DataFrame(data={
    'A': 'A0 A1 A2'.split(),
    'B': 'B0 B1 B2'.split(),
}, index='K0 K1 K2'.split())

right = pd.DataFrame(data={
    'C': 'C0 C2 C3'.split(),
    'D': 'D0 D2 D3'.split(),
}, index='K0 K2 K3'.split())

In [94]:
left.join(right)
# By default, join performs a left join on the index keys: every row of the left dataframe is kept

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K1,A1,B1,,
K2,A2,B2,C2,D2


The join operation can essentially be thought of as a merge operation, except that the keys we want to join on are in the index instead of the columns.

In [95]:
left.join(right, how='outer')

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K1,A1,B1,,
K2,A2,B2,C2,D2
K3,,,C3,D3
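
To make the relationship between join and merge concrete, here is a minimal sketch showing that `left.join(right)` gives the same result as a left merge on the index via `left_index=True` and `right_index=True`. The frames are re-created from the example above, and `how='inner'` is included only for comparison.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({
    'A': 'A0 A1 A2'.split(),
    'B': 'B0 B1 B2'.split(),
}, index='K0 K1 K2'.split())
right = pd.DataFrame({
    'C': 'C0 C2 C3'.split(),
    'D': 'D0 D2 D3'.split(),
}, index='K0 K2 K3'.split())

# join defaults to a left join on the index
joined = left.join(right)

# The same operation spelled out with merge
merged = pd.merge(left, right, how='left', left_index=True, right_index=True)

# Raises AssertionError if the two results differ
assert_frame_equal(joined, merged)

# An inner join keeps only index labels present in both frames
inner = left.join(right, how='inner')
print(inner)
```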
