# Pandas Tutorial Day 9

We have two dataframes and want to merge them into a single dataframe.

In [None]:
import pandas as pd

df1  = pd.DataFrame({
    'city' : ['mumbai', 'delhi', 'bangalore'],
    'temperatur' : [32, 45, 30]
})

df2 = pd.DataFrame({
    'city' : ['delhi', 'mumbai', 'bangalore'],
    'humidity' : [80, 60, 78]
})

Now we merge the two dataframes into a single one using `merge`, matching rows on the `city` column.

In [None]:
df3 = pd.merge(df1, df2, on = 'city')
df3

Unnamed: 0,city,temperatur,humidity
0,mumbai,32,60
1,delhi,45,80
2,bangalore,30,78
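
The two dataframes above list the cities in a different order, so it is worth confirming that `merge` pairs rows by the value of `city` and not by row position. The sketch below rebuilds the same data and checks one city; the names `merged_fn`, `merged_method` and `mumbai` are illustrative only. It also uses the method form `df1.merge(...)`, which should behave the same as `pd.merge(df1, ...)`.

```python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['mumbai', 'delhi', 'bangalore'],
    'temperatur': [32, 45, 30],
})
df2 = pd.DataFrame({
    'city': ['delhi', 'mumbai', 'bangalore'],
    'humidity': [80, 60, 78],
})

# Function form and method form of the same merge
merged_fn = pd.merge(df1, df2, on='city')
merged_method = df1.merge(df2, on='city')

# mumbai is row 0 in df1 but row 1 in df2; the merge still pairs 32 with 60
mumbai = merged_fn[merged_fn['city'] == 'mumbai'].iloc[0]
print(mumbai['temperatur'], mumbai['humidity'])
```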


If the two dataframes do not contain exactly the same cities and we combine them with `merge`, we get only the cities that are present in both dataframes.

In [9]:
import pandas as pd

df1  = pd.DataFrame({
    'city' : ['mumbai', 'delhi', 'bangalore', 'baltimore'],
    'temperatur' : [32, 45, 30, 27]
})

df2 = pd.DataFrame({
    'city' : ['delhi', 'mumbai', 'pune'],
    'humidity' : [80, 60, 78]
})

df3 = pd.merge(df1, df2, on = 'city')
df3

Unnamed: 0,city,temperatur,humidity
0,mumbai,32,60
1,delhi,45,80
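
As a quick check of this behaviour, the sketch below compares the default merge with an explicit `how='inner'` and with a plain Python set intersection of the two `city` columns. The variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['mumbai', 'delhi', 'bangalore', 'baltimore'],
    'temperatur': [32, 45, 30, 27],
})
df2 = pd.DataFrame({
    'city': ['delhi', 'mumbai', 'pune'],
    'humidity': [80, 60, 78],
})

# merge defaults to how='inner', so these two calls should agree
default_merge = pd.merge(df1, df2, on='city')
inner_merge = pd.merge(df1, df2, on='city', how='inner')

# Cities that appear in both dataframes
common_cities = set(df1['city']) & set(df2['city'])

print(default_merge.equals(inner_merge))
print(set(inner_merge['city']) == common_cities)
```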


This happens because `merge` by default keeps only the keys that are common to both dataframes; that is, it performs an intersection of the two sets of cities (an inner join). With the `how` argument we can choose whether to perform an intersection (`how = 'inner'`) or a union (`how = 'outer'`). In a union, values that are missing from one dataframe are filled with `NaN`.

In [12]:
df3 = pd.merge(df1, df2, on = 'city', how = 'outer')
df3

Unnamed: 0,city,temperatur,humidity
0,baltimore,27.0,
1,bangalore,30.0,
2,delhi,45.0,80.0
3,mumbai,32.0,60.0
4,pune,,78.0
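
The outer merge has two visible side effects: cities missing from one side get `NaN`, and the integer columns turn into floats so that they can hold `NaN`. The following sketch, using the same data and illustrative variable names, inspects both.

```python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['mumbai', 'delhi', 'bangalore', 'baltimore'],
    'temperatur': [32, 45, 30, 27],
})
df2 = pd.DataFrame({
    'city': ['delhi', 'mumbai', 'pune'],
    'humidity': [80, 60, 78],
})

outer = pd.merge(df1, df2, on='city', how='outer')

# The result covers the union of the cities from both dataframes
all_cities = set(df1['city']) | set(df2['city'])
print(set(outer['city']) == all_cities)

# Count the gaps introduced by the union
print(outer[['temperatur', 'humidity']].isna().sum())

# The columns were int64 before the merge; NaN forces them to float64
print(outer.dtypes)
```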


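We can also use `how = 'left'` or `how = 'right'`. Here left and right refer to the order in which the dataframes are passed to `merge`: `left` keeps every row of the first dataframe, and `right` keeps every row of the second.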

In [None]:
# left merge: keep every row of df1 and add df2's columns where the city matches
df4 = pd.merge(df1, df2, on = 'city', how = 'left')
print(df4)

# right merge: keep every row of df2 and add df1's columns where the city matches
df5 = pd.merge(df1, df2, on = 'city', how = 'right')
print(df5)

        city  temperatur  humidity
0     mumbai          32      60.0
1      delhi          45      80.0
2  bangalore          30       NaN
3  baltimore          27       NaN
     city  temperatur  humidity
0   delhi        45.0        80
1  mumbai        32.0        60
2    pune         NaN        78
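
Because left and right depend only on argument order, a right merge should be the same as a left merge with the two dataframes swapped, apart from the order of the columns. The sketch below checks that with `pd.testing.assert_frame_equal`; the names `right` and `swapped` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['mumbai', 'delhi', 'bangalore', 'baltimore'],
    'temperatur': [32, 45, 30, 27],
})
df2 = pd.DataFrame({
    'city': ['delhi', 'mumbai', 'pune'],
    'humidity': [80, 60, 78],
})

# Keep every row of df2, written in two ways
right = pd.merge(df1, df2, on='city', how='right')
swapped = pd.merge(df2, df1, on='city', how='left')

# Only the column order differs, so align the columns before comparing.
# assert_frame_equal raises an AssertionError if the frames differ.
pd.testing.assert_frame_equal(right, swapped[right.columns])
print(list(right.columns))
print(list(swapped.columns))
```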


If we also want to know which dataframe a particular row of the merged result came from, we can use the `indicator` flag.

In [27]:
df5 = pd.merge(df1, df2, on = 'city', how = 'outer', indicator = True)
df5

Unnamed: 0,city,temperatur,humidity,_merge
0,baltimore,27.0,,left_only
1,bangalore,30.0,,left_only
2,delhi,45.0,80.0,both
3,mumbai,32.0,60.0,both
4,pune,,78.0,right_only
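
One use of the `_merge` column is to pick out the rows that exist in only one of the dataframes. The sketch below does this with ordinary boolean filtering; `only_in_df1` and `only_in_df2` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['mumbai', 'delhi', 'bangalore', 'baltimore'],
    'temperatur': [32, 45, 30, 27],
})
df2 = pd.DataFrame({
    'city': ['delhi', 'mumbai', 'pune'],
    'humidity': [80, 60, 78],
})

merged = pd.merge(df1, df2, on='city', how='outer', indicator=True)

# Cities found only in df1 (left_only) or only in df2 (right_only)
only_in_df1 = sorted(merged.loc[merged['_merge'] == 'left_only', 'city'])
only_in_df2 = sorted(merged.loc[merged['_merge'] == 'right_only', 'city'])

print(only_in_df1)  # ['baltimore', 'bangalore']
print(only_in_df2)  # ['pune']
```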


If both dataframes have columns with the same name (other than the key column) and we want to keep both versions in the final dataframe, we can use `suffixes` to tell them apart.

In [40]:
df1  = pd.DataFrame({
    'city' : ['mumbai', 'delhi', 'bangalore', 'baltimore'],
    'humidity' : [80, 60, 78, 45],
    'temperature' : [32, 45, 30, 27]
})

df2 = pd.DataFrame({
    'city' : ['delhi', 'mumbai', 'pune', 'ludhiana'],
    'humidity' : [80, 60, 78, 67],
    'temperature' : [32, 45, 30, 27]
})

df3 = pd.merge(df1, df2, on = 'city', how = 'outer', suffixes = ('_left', '_right'))
df3

Unnamed: 0,city,humidity_left,temperature_left,humidity_right,temperature_right
0,baltimore,45.0,27.0,,
1,bangalore,78.0,30.0,,
2,delhi,60.0,45.0,80.0,32.0
3,ludhiana,,,67.0,27.0
4,mumbai,80.0,32.0,60.0,45.0
5,pune,,,78.0,30.0
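
If `suffixes` is left out, pandas still keeps both sets of columns and falls back to its default suffixes `_x` and `_y`. The sketch below compares the column names with and without custom suffixes; the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['mumbai', 'delhi', 'bangalore', 'baltimore'],
    'humidity': [80, 60, 78, 45],
    'temperature': [32, 45, 30, 27],
})
df2 = pd.DataFrame({
    'city': ['delhi', 'mumbai', 'pune', 'ludhiana'],
    'humidity': [80, 60, 78, 67],
    'temperature': [32, 45, 30, 27],
})

# Without suffixes: overlapping columns get the defaults _x (left) and _y (right)
default_names = pd.merge(df1, df2, on='city', how='outer')
print(list(default_names.columns))

# With suffixes: the same columns, named after the side they came from
custom_names = pd.merge(df1, df2, on='city', how='outer',
                        suffixes=('_left', '_right'))
print(list(custom_names.columns))
```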
