# Combining and Merging

In [1]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

Data contained in pandas objects can be combined in a number of ways:
1. `pandas.merge`: connects rows in DataFrames based on one or more keys.
1. `pandas.concat`: concatenates or _stacks_ objects along an axis.
1. `combine_first`: splices together overlapping data, filling in missing values in one object with values from another.
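
As a rough orientation, the three operations listed above can be sketched side by side. The frames and series here (`left`, `right`, `s1`, `s2`, `a`, `b`) are made-up illustration data, not the objects used in the rest of this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only to keep the sketch short.
left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": ["a", "b"], "y": [10, 20]})

# 1. merge: connect rows of two DataFrames on a key column.
merged = pd.merge(left, right, on="key")

# 2. concat: stack objects along an axis (rows by default).
s1 = pd.Series([0, 1], index=["a", "b"])
s2 = pd.Series([2, 3], index=["c", "d"])
stacked = pd.concat([s1, s2])

# 3. combine_first: fill missing values in `a` with values from `b`.
a = pd.Series([np.nan, 2.5, np.nan], index=["x", "y", "z"])
b = pd.Series([1.0, 9.0, 3.0], index=["x", "y", "z"])
patched = a.combine_first(b)

print(merged)
print(stacked)
print(patched)
```

Note that `combine_first` keeps the existing value `2.5` at label `y` and only takes values from `b` where `a` is missing.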

## Database-Style DataFrame Joins
- `pandas.merge`: main entry point to achieve joins in DataFrames

In [2]:
df1 = pd.DataFrame({'key': list('bbacaab'),
                    'data1':pd.Series(range(7), dtype="Int64")})
df2 = pd.DataFrame({"key": ['a','b','d'],
                    'data2':pd.Series(range(3), dtype="Int64")})                    

In [3]:
df1

Unnamed: 0,key,data1
0,b,0
1,b,1
2,a,2
3,c,3
4,a,4
5,a,5
6,b,6


In [4]:
df2

Unnamed: 0,key,data2
0,a,0
1,b,1
2,d,2


- The merge below is an example of a _many-to-one_ join.
- The data in `df1` has multiple rows labeled `a` and `b`,
- whereas `df2` has only one row for each value in the `key` column.

In [5]:
pd.merge(df1, df2)

Unnamed: 0,key,data1,data2
0,b,0,1
1,b,1,1
2,a,2,0
3,a,4,0
4,a,5,0
5,b,6,1


- If we do not specify which column to join on, `pandas.merge` uses the overlapping column names as the keys.
- It is good practice to always specify the keys explicitly.
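
The default key selection described above can be checked directly. The sketch below rebuilds `df1` and `df2` with plain integer columns (an assumption made for brevity; the notebook uses the `Int64` dtype) and also uses the `validate` argument of `pandas.merge` to assert the many-to-one relationship.

```python
import pandas as pd

# Same keys as the notebook's df1/df2, with plain int columns for brevity.
df1 = pd.DataFrame({"key": list("bbacaab"), "data1": range(7)})
df2 = pd.DataFrame({"key": ["a", "b", "d"], "data2": range(3)})

# Column names present in both frames; these become the keys when `on` is omitted.
shared = df1.columns.intersection(df2.columns).tolist()

implicit = pd.merge(df1, df2)            # keys inferred from shared columns
explicit = pd.merge(df1, df2, on="key")  # keys stated explicitly
same_result = implicit.equals(explicit)

# validate="many_to_one" checks that the keys are unique in the right frame.
checked = pd.merge(df1, df2, on="key", validate="many_to_one")

# Swapping the frames puts the repeated keys on the right, so the check fails.
try:
    pd.merge(df2, df1, on="key", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True

print(shared, same_result, len(checked), raised)
```

Stating `on` explicitly guards against a silent change in behavior if another shared column name is added to either frame later.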

In [6]:
pd.merge(df1, df2, on='key')

Unnamed: 0,key,data1,data2
0,b,0,1
1,b,1,1
2,a,2,0
3,a,4,0
4,a,5,0
5,b,6,1

- If the key column names differ between the two objects, specify them separately with `left_on` and `right_on`.


In [12]:
df3 = pd.DataFrame({'lkey': list('bbacaab'),
                    'data1':pd.Series(range(7), dtype='Int64')})
df4 = pd.DataFrame({'rkey':list('abd'),
                    'data2':pd.Series(range(3), dtype='Int64')})

In [13]:
pd.merge(df3, df4, left_on='lkey', right_on='rkey')

Unnamed: 0,lkey,data1,rkey,data2
0,b,0,b,1
1,b,1,b,1
2,a,2,a,0
3,a,4,a,0
4,a,5,a,0
5,b,6,b,1


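- By default, `pandas.merge` performs an inner join: only keys found in both objects appear in the result.
- The other options for the `how` argument are:
    - `left`: uses all keys from the left object.
    - `right`: uses all keys from the right object.
    - `outer`: takes the union of the keys, combining the effect of applying both left and right joins.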
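
To make the difference between the join types concrete, the sketch below collects the set of keys that survives each value of `how`. It rebuilds `df1` and `df2` with plain integer columns (an assumption for brevity) and uses the `indicator` argument of `pandas.merge`, which adds a `_merge` column recording where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({"key": list("bbacaab"), "data1": range(7)})
df2 = pd.DataFrame({"key": ["a", "b", "d"], "data2": range(3)})

# Distinct keys present in the result of each join type.
keys_by_how = {
    how: sorted(pd.merge(df1, df2, on="key", how=how)["key"].unique())
    for how in ("inner", "left", "right", "outer")
}

# indicator=True labels each row as 'left_only', 'right_only', or 'both'.
outer = pd.merge(df1, df2, on="key", how="outer", indicator=True)

for how, keys in keys_by_how.items():
    print(how, keys)
print(outer[["key", "_merge"]].drop_duplicates())
```

Here `c` exists only in `df1` and `d` only in `df2`, so `c` is dropped by `inner` and `right`, while `d` is dropped by `inner` and `left`.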

In [9]:
pd.merge(df1, df2, how='outer')

Unnamed: 0,key,data1,data2
0,a,2.0,0.0
1,a,4.0,0.0
2,a,5.0,0.0
3,b,0.0,1.0
4,b,1.0,1.0
5,b,6.0,1.0
6,c,3.0,
7,d,,2.0


In [14]:
pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='outer')

Unnamed: 0,lkey,data1,rkey,data2
0,a,2.0,a,0.0
1,a,4.0,a,0.0
2,a,5.0,a,0.0
3,b,0.0,b,1.0
4,b,1.0,b,1.0
5,b,6.0,b,1.0
6,c,3.0,,
7,,,d,2.0


- _Many-to-many_ merges form the Cartesian product of the matching keys.

In [19]:
df1 = pd.DataFrame({'key':list('bbacab'),'data1':pd.Series(range(6), dtype='Int64')})
df2 = pd.DataFrame({'key':list('ababd'),'data2':pd.Series(range(5), dtype='Int64')})

In [16]:
df1

Unnamed: 0,key,data1
0,b,0
1,b,1
2,a,2
3,c,3
4,a,4
5,b,5


In [20]:
df2

Unnamed: 0,key,data2
0,a,0
1,b,1
2,a,2
3,b,3
4,d,4


In [21]:
pd.merge(df1, df2, on='key', how='left')

Unnamed: 0,key,data1,data2
0,b,0,1.0
1,b,0,3.0
2,b,1,1.0
3,b,1,3.0
4,a,2,0.0
5,a,2,2.0
6,c,3,
7,a,4,0.0
8,a,4,2.0
9,b,5,1.0
10,b,5,3.0


In [22]:
pd.merge(df1, df2, how='inner')

Unnamed: 0,key,data1,data2
0,b,0,1
1,b,0,3
2,b,1,1
3,b,1,3
4,a,2,0
5,a,2,2
6,a,4,0
7,a,4,2
8,b,5,1
9,b,5,3
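
The Cartesian-product behavior of many-to-many merges can be verified by counting: for each key, the number of result rows should equal the number of matching rows on the left times the number on the right. The sketch below rebuilds the two frames with plain integer columns (an assumption for brevity) and compares that prediction with the actual inner merge.

```python
import pandas as pd

df1 = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})
df2 = pd.DataFrame({"key": list("ababd"), "data2": range(5)})

# Occurrences of each key on each side.
left_counts = df1["key"].value_counts()
right_counts = df2["key"].value_counts()

# Predicted rows per key: left count times right count.
# fill_value=0 treats a key missing from one side as zero matches.
expected = left_counts.mul(right_counts, fill_value=0).astype(int)

inner = pd.merge(df1, df2, on="key", how="inner")
actual = inner["key"].value_counts()

print(expected)
print(actual)
print(int(expected.sum()), len(inner))
```

With three `b` rows on the left and two on the right, the inner merge produces six `b` rows; the two `a` rows on each side produce four.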


- To merge on multiple keys, pass a list of column names to `on`.

In [25]:
left = pd.DataFrame({'key1':['foo','foo','bar',],'key2':['one','two','one',],"lval": pd.Series([1,2,3], dtype='Int64')})
right = pd.DataFrame({'key1':['foo','foo','bar','bar',],'key2':['one','one','one','two',],"rval": pd.Series([4,5,6,7], dtype='Int64')})

In [26]:
left

Unnamed: 0,key1,key2,lval
0,foo,one,1
1,foo,two,2
2,bar,one,3


In [27]:
right

Unnamed: 0,key1,key2,rval
0,foo,one,4
1,foo,one,5
2,bar,one,6
3,bar,two,7


In [33]:
pd.merge(left, right, on=['key1','key2'], how='outer')

Unnamed: 0,key1,key2,lval,rval
0,bar,one,3.0,6.0
1,bar,two,,7.0
2,foo,one,1.0,4.0
3,foo,one,1.0,5.0
4,foo,two,2.0,

- When the two objects have overlapping column names that are not used as keys, `pandas.merge` appends the suffixes `_x` and `_y` by default; the `suffixes` argument overrides them.


In [34]:
pd.merge(left, right, on='key1')

Unnamed: 0,key1,key2_x,lval,key2_y,rval
0,foo,one,1,one,4
1,foo,one,1,one,5
2,foo,two,2,one,4
3,foo,two,2,one,5
4,bar,one,3,one,6
5,bar,one,3,two,7


In [35]:
pd.merge(left, right, on='key1', suffixes=('_left', '_right'))

Unnamed: 0,key1,key2_left,lval,key2_right,rval
0,foo,one,1,one,4
1,foo,one,1,one,5
2,foo,two,2,one,4
3,foo,two,2,one,5
4,bar,one,3,one,6
5,bar,one,3,two,7
