# Concatenating, Merging and Joining DataFrames in Pandas

1. Concat
2. Append
3. Merge
4. Join

# Let's start with concatenation

In [2]:
# Let's create two small example DataFrames to work with
import pandas as pd

df1 = pd.DataFrame({'item': ['A', 'B', 'C', 'D'],
                    'value': [1, 2, 3, 5]})

df2 = pd.DataFrame({'item': ['E', 'F', 'G', 'H'],
                    'value': [5, 6, 7, 8]})

print(df1)
print(df2)

  item  value
0    A      1
1    B      2
2    C      3
3    D      5
  item  value
0    E      5
1    F      6
2    G      7
3    H      8


In [3]:
df1

  item  value
0    A      1
1    B      2
2    C      3
3    D      5


In [5]:
df2

  item  value
0    E      5
1    F      6
2    G      7
3    H      8


In [6]:
# Let's stack them vertically by using pd.concat

pd.concat([df1, df2])

  item  value
0    A      1
1    B      2
2    C      3
3    D      5
0    E      5
1    F      6
2    G      7
3    H      8
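The stacked result above keeps each input's original index labels, so 0 to 3 appear twice. A minimal sketch of two `pd.concat` options for dealing with that: `ignore_index=True` builds a fresh 0..n-1 index, and `keys=` tags each block with an outer index level. The frame names `top`/`bottom` and the key labels are illustrative only.

```python
import pandas as pd

# Two small frames whose default indexes overlap (both are 0..1)
top = pd.DataFrame({'item': ['A', 'B'], 'value': [1, 2]})
bottom = pd.DataFrame({'item': ['E', 'F'], 'value': [5, 6]})

# ignore_index=True discards the original labels and renumbers rows 0..n-1
renumbered = pd.concat([top, bottom], ignore_index=True)
print(renumbered)

# keys= adds an outer index level recording which input each row came from
labelled = pd.concat([top, bottom], keys=['df1', 'df2'])
print(labelled)

# One source block can then be selected back out by its key
print(labelled.loc['df2'])
```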


In [7]:
# What if the DataFrames have different column names?

df1 = pd.DataFrame({'item': ['A', 'B', 'C', 'D'],
                    'value': [1, 2, 3, 5]})

df2 = pd.DataFrame({'item': ['E', 'F', 'G', 'H'],
                    'quanity': [2, 2, 1, 5]})

pd.concat([df1, df2])

  item  value  quanity
0    A    1.0      NaN
1    B    2.0      NaN
2    C    3.0      NaN
3    D    5.0      NaN
0    E    NaN      2.0
1    F    NaN      2.0
2    G    NaN      1.0
3    H    NaN      5.0
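By default `pd.concat` takes the union of all columns (`join='outer'`) and fills the gaps with NaN, which is why `value` and `quanity` above became floats. A short sketch, assuming we only want the columns every input shares, uses `join='inner'` instead (the frames here are small stand-ins):

```python
import pandas as pd

left = pd.DataFrame({'item': ['A', 'B'], 'value': [1, 2]})
right = pd.DataFrame({'item': ['E', 'F'], 'quantity': [2, 1]})

# Default join='outer': union of columns, missing cells become NaN
outer = pd.concat([left, right], ignore_index=True)
print(outer)

# join='inner': keep only the columns present in every input ('item' here)
inner = pd.concat([left, right], join='inner', ignore_index=True)
print(inner)
```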


In [8]:
# What if the same item appears in both DataFrames?

df1 = pd.DataFrame({'item': ['A', 'B', 'C', 'D'],
                    'value': [1, 2, 3, 5]})

df2 = pd.DataFrame({'item': ['D', 'F', 'G', 'H'],
                    'quanity': [2, 2, 1, 5]})

pd.concat([df1, df2]).reset_index()

   index item  value  quanity
0      0    A    1.0      NaN
1      1    B    2.0      NaN
2      2    C    3.0      NaN
3      3    D    5.0      NaN
4      0    D    NaN      2.0
5      1    F    NaN      2.0
6      2    G    NaN      1.0
7      3    H    NaN      5.0
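The output above shows two things: `concat` never removes duplicates (item `D` appears twice), and `reset_index()` moves the old labels into a new `index` column. A minimal sketch of the usual follow-ups, with `df2` given a `value` column (rather than `quanity`) so the frames line up: `reset_index(drop=True)` to get a clean index without that extra column, and `drop_duplicates` if only one row per item is wanted.

```python
import pandas as pd

df1 = pd.DataFrame({'item': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'item': ['D', 'F', 'G', 'H'], 'value': [2, 2, 1, 5]})

# concat keeps every row, so item 'D' appears twice
stacked = pd.concat([df1, df2], ignore_index=True)

# reset_index(drop=True) gives the same fresh index without an 'index' column
same = pd.concat([df1, df2]).reset_index(drop=True)

# Keep only the first row seen for each item (here the one from df1)
unique_items = stacked.drop_duplicates(subset='item', keep='first')
print(unique_items)
```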


In [9]:
# Use axis=1 to stack horizontally (side by side, aligned on the index)

pd.concat([df1, df2], axis=1)

  item  value item  quanity
0    A      1    D        2
1    B      2    F        2
2    C      3    G        1
3    D      5    H        5
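The horizontal result above lines up neatly only because both frames share the index 0 to 3. With `axis=1`, rows are matched by index label, not by position. A small sketch with deliberately offset indexes (`a` and `b` are illustrative frames):

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 2, 3]}, index=[0, 1, 2])
b = pd.DataFrame({'y': [10, 20, 30]}, index=[1, 2, 3])

# axis=1 aligns on index labels; the default join='outer' keeps every label
side_by_side = pd.concat([a, b], axis=1)
print(side_by_side)

# join='inner' keeps only labels present in both frames (1 and 2)
overlap = pd.concat([a, b], axis=1, join='inner')
print(overlap)
```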


#### Vertical Concat

![Vertical concatenation with pd.concat](https://pandas.pydata.org/pandas-docs/stable/_images/merging_concat_basic.png)

#### Horizontal Concat

![Horizontal concatenation with pd.concat(axis=1)](https://pandas.pydata.org/pandas-docs/stable/_images/merging_concat_axis1.png)

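# Similarly, we can append multiple DataFrames onto another

`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; use `pd.concat` with a list of DataFrames instead.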

In [10]:
df1 = pd.DataFrame({'item': ['A', 'B', 'C', 'D'],
                    'value': [1, 2, 3, 5]})

df2 = pd.DataFrame({'item': ['D', 'F', 'G', 'H'],
                    'quanity': [2, 2, 1, 5]})

df3 = pd.DataFrame({'item': ['I', 'J', 'K', 'L'],
                    'quanity': [3, 4, 7, 25]})

# Equivalent to df1.append([df2, df3]) in pandas < 2.0
pd.concat([df1, df2, df3])

  item  value  quanity
0    A    1.0      NaN
1    B    2.0      NaN
2    C    3.0      NaN
3    D    5.0      NaN
0    D    NaN      2.0
1    F    NaN      2.0
2    G    NaN      1.0
3    H    NaN      5.0
0    I    NaN      3.0
1    J    NaN      4.0
2    K    NaN      7.0
3    L    NaN     25.0
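`append` was often called inside a loop, which copies the whole DataFrame on every call. A sketch of the common replacement pattern: collect the pieces in a Python list and call `pd.concat` once at the end. `make_batch` is a hypothetical stand-in for whatever produces each piece.

```python
import pandas as pd

def make_batch(i):
    # Hypothetical helper: stands in for whatever produces each piece
    return pd.DataFrame({'batch': [i, i], 'value': [i * 10, i * 10 + 1]})

# Collect the pieces in a plain Python list...
pieces = [make_batch(i) for i in range(3)]

# ...then concatenate once (one copy instead of one per iteration)
result = pd.concat(pieces, ignore_index=True)
print(result)
```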


# Pandas merge

Pandas has full-featured, high-performance in-memory join operations that are idiomatically very similar to those of relational databases such as SQL. There are three cases to understand:

- **one-to-one joins:** for example, joining two DataFrame objects on their indexes (which must contain unique values).
- **many-to-one joins:** for example, joining an index (unique) to one or more columns in a different DataFrame.
- **many-to-many joins:** joining columns on columns.

![alt text](https://i.stack.imgur.com/hMKKt.jpg)
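To check which of these cases a merge really is, `pd.merge` accepts a `validate` argument (`'1:1'`, `'1:m'`, `'m:1'`, `'m:m'`) and raises `pandas.errors.MergeError` when the keys do not fit the declared relationship. A small sketch using made-up `orders`, `customers` and `tags` tables:

```python
import pandas as pd

# Hypothetical tables: many orders per customer, one row per customer
orders = pd.DataFrame({'customer': ['c1', 'c1', 'c2'], 'amount': [10, 20, 5]})
customers = pd.DataFrame({'customer': ['c1', 'c2'], 'name': ['Ann', 'Bo']})
tags = pd.DataFrame({'customer': ['c1', 'c1'], 'tag': ['vip', 'new']})

# many-to-one: each order picks up its customer's name; validate checks the keys
m2o = pd.merge(orders, customers, on='customer', validate='m:1')
print(m2o)

# Declaring the same merge one-to-one fails, because 'c1' repeats in orders
try:
    pd.merge(orders, customers, on='customer', validate='1:1')
    one_to_one_ok = True
except pd.errors.MergeError:
    one_to_one_ok = False

# many-to-many: 2 'c1' orders x 2 'c1' tags -> 4 rows
# ('c2' has no tags, so the default inner join drops it)
m2m = pd.merge(orders, tags, on='customer', validate='m:m')
print(m2m)
```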

In [11]:
dfA = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})

dfB = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
dfA

  key   A   B
0  K0  A0  B0
1  K1  A1  B1
2  K2  A2  B2
3  K3  A3  B3


In [12]:
dfB

  key   C   D
0  K0  C0  D0
1  K1  C1  D1
2  K2  C2  D2
3  K3  C3  D3


In [13]:
# Merge on a key column whose values are unique in both frames
# (by default, pd.merge performs an inner join)

pd.merge(dfA, dfB, on='key')

  key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K1  A1  B1  C1  D1
2  K2  A2  B2  C2  D2
3  K3  A3  B3  C3  D3
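`on='key'` works because both frames name their key column `key`. When the names differ, `left_on` and `right_on` name each side separately. A sketch with a hypothetical `dfB2` whose key column is called `id`; note that both key columns are kept, so one is usually dropped afterwards.

```python
import pandas as pd

dfA = pd.DataFrame({'key': ['K0', 'K1'], 'A': ['A0', 'A1']})
# Hypothetical variant of dfB whose key column has a different name
dfB2 = pd.DataFrame({'id': ['K0', 'K1'], 'C': ['C0', 'C1']})

# Match dfA.key against dfB2.id; both columns appear in the result
merged_both = pd.merge(dfA, dfB2, left_on='key', right_on='id')
print(merged_both)

# Drop the redundant right-hand key column
merged = merged_both.drop(columns='id')
print(merged)
```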


In [14]:
# Left join: keep every row of dfA; keys with no match in dfB get NaN

dfA = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})

dfB = pd.DataFrame({'key': ['K4', 'K1', 'K2', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

pd.merge(dfA, dfB, how='left', on='key')

  key   A   B    C    D
0  K0  A0  B0  NaN  NaN
1  K1  A1  B1   C1   D1
2  K2  A2  B2   C2   D2
3  K3  A3  B3   C3   D3
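The right join is the mirror image of the left join above: `how='right'` keeps every key from `dfB`, so `K0` (present only in `dfA`) disappears and `K4` appears with NaN in `dfA`'s columns. A sketch with the same two frames rebuilt so the block runs on its own:

```python
import pandas as pd

dfA = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
dfB = pd.DataFrame({'key': ['K4', 'K1', 'K2', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

# Keep all keys from the right frame (dfB); unmatched dfA columns become NaN
right_joined = pd.merge(dfA, dfB, how='right', on='key')
print(right_joined)
```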


In [15]:
# Inner join: keep only keys present in both dfA and dfB

pd.merge(dfA, dfB, how='inner', on='key')

  key   A   B   C   D
0  K1  A1  B1  C1  D1
1  K2  A2  B2  C2  D2
2  K3  A3  B3  C3  D3


In [16]:
# Outer join: keep keys from either frame; missing values become NaN

pd.merge(dfA, dfB, how='outer', on='key')

  key    A    B    C    D
0  K0   A0   B0  NaN  NaN
1  K1   A1   B1   C1   D1
2  K2   A2   B2   C2   D2
3  K3   A3   B3   C3   D3
4  K4  NaN  NaN   C0   D0
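In an outer join it is not always obvious which side a row came from. `indicator=True` adds a `_merge` column with the values `'left_only'`, `'right_only'` or `'both'`, which is handy for auditing unmatched keys. A sketch with trimmed copies of `dfA` and `dfB`:

```python
import pandas as pd

dfA = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'], 'A': ['A0', 'A1', 'A2', 'A3']})
dfB = pd.DataFrame({'key': ['K4', 'K1', 'K2', 'K3'], 'C': ['C0', 'C1', 'C2', 'C3']})

# indicator=True records the origin of each row in a '_merge' column
audited = pd.merge(dfA, dfB, how='outer', on='key', indicator=True)
print(audited)

# For example, keys present in dfA but missing from dfB
missing_from_B = audited.loc[audited['_merge'] == 'left_only', 'key']
print(missing_from_B.tolist())
```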


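# Joining

`DataFrame.join` combines the columns of two DataFrames by aligning them on their index; by default it performs a left join.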

In [17]:
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])

left

     A   B
K0  A0  B0
K1  A1  B1
K2  A2  B2


In [18]:
right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                      'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])

right

     C   D
K0  C0  D0
K2  C2  D2
K3  C3  D3


In [19]:
left.join(right)

     A   B    C    D
K0  A0  B0   C0   D0
K1  A1  B1  NaN  NaN
K2  A2  B2   C2   D2
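`left.join(right)` is a left join on the index, so the same result can be produced with `pd.merge` by passing `left_index=True, right_index=True`. A sketch comparing the two, with the frames rebuilt so the block runs on its own:

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3'], 'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])

# join defaults to how='left' and matches on the index
via_join = left.join(right)

# The equivalent merge: match index to index, keep every row of `left`
via_merge = pd.merge(left, right, left_index=True, right_index=True, how='left')
print(via_join.equals(via_merge))
```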


In [20]:
left.join(right, how='outer')

      A    B    C    D
K0   A0   B0   C0   D0
K1   A1   B1  NaN  NaN
K2   A2   B2   C2   D2
K3  NaN  NaN   C3   D3
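The frames above have no column names in common. When they do, `join` raises a `ValueError` unless suffixes are supplied with `lsuffix`/`rsuffix`. A small sketch with two illustrative frames that both have a `value` column:

```python
import pandas as pd

left = pd.DataFrame({'value': [1, 2]}, index=['K0', 'K1'])
right = pd.DataFrame({'value': [10, 20]}, index=['K0', 'K1'])

# join refuses overlapping column names unless suffixes are given
try:
    left.join(right)
    overlap_error = False
except ValueError:
    overlap_error = True

# Suffixes disambiguate the two 'value' columns
joined = left.join(right, lsuffix='_left', rsuffix='_right')
print(joined)
```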
