## **Merging Data**

### **Left Merge**

#### The **left merge** returns a **DataFrame** that keeps **all rows of the DataFrame on the left side** of the merge() function. Rows of the left DataFrame that have **no matching key** in the **right DataFrame** receive **NaN** in the columns that come from the right DataFrame. A left row whose key matches several right rows appears once per match.


In [6]:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'pointer':['A', 'B', 'C', 'B', 'A', 'D'], 
                    'value_df1':[0,1,2,3,4,5]})

df2 = pd.DataFrame({'pointer':['B', 'C', 'B', 'D', 'E'], 
                    'value_df2':[6, 7, 8, 9, 12]})

display(df1)
display(df2)

print("Left Merged DataFrame\n")

display(pd.merge(df1, df2, how = 'left')) # Performing a left merge

Unnamed: 0,pointer,value_df1
0,A,0
1,B,1
2,C,2
3,B,3
4,A,4
5,D,5


Unnamed: 0,pointer,value_df2
0,B,6
1,C,7
2,B,8
3,D,9
4,E,12


Left Merged DataFrame



Unnamed: 0,pointer,value_df1,value_df2
0,A,0,
1,B,1,6.0
2,B,1,8.0
3,C,2,7.0
4,B,3,6.0
5,B,3,8.0
6,A,4,
7,D,5,9.0
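
#### To check which left rows found no match, merge() accepts an indicator=True argument that adds a `_merge` column. The snippet below is a small sketch that reuses the same df1 and df2 as above; the variable names `merged` and `unmatched` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'pointer': ['A', 'B', 'C', 'B', 'A', 'D'],
                    'value_df1': [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'pointer': ['B', 'C', 'B', 'D', 'E'],
                    'value_df2': [6, 7, 8, 9, 12]})

# indicator=True adds a '_merge' column; for a left merge its values
# are 'both' (key found in df2) or 'left_only' (key missing from df2).
merged = pd.merge(df1, df2, how='left', indicator=True)

# Keep only the rows of df1 whose 'pointer' has no counterpart in df2.
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched)
```

#### The rows flagged as left_only are exactly the ones that carry NaN in value_df2.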


### **Right Merge**

#### The right merge returns a DataFrame that keeps all rows of the DataFrame on the right side of the merge() function. Rows of the right DataFrame that have no matching key in the left DataFrame receive NaN in the columns that come from the left DataFrame.

In [10]:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'pointer':['A', 'B', 'C', 'B', 'A', 'D'], 
                    'value_df1':[0,1,2,3,4,5]})

df2 = pd.DataFrame({'pointer':['B', 'Z', 'C', 'B','D','E'], 
                    'value_df2':[6,7,8,9,10,11]})

display(df1)
display(df2)


print("Right Merged DataFrame\n")
display(pd.merge(df1, df2, how = 'right')) # Performing a right merge

Unnamed: 0,pointer,value_df1
0,A,0
1,B,1
2,C,2
3,B,3
4,A,4
5,D,5


Unnamed: 0,pointer,value_df2
0,B,6
1,Z,7
2,C,8
3,B,9
4,D,10
5,E,11


Right Merged DataFrame



Unnamed: 0,pointer,value_df1,value_df2
0,B,1.0,6
1,B,3.0,6
2,Z,,7
3,C,2.0,8
4,B,1.0,9
5,B,3.0,9
6,D,5.0,10
7,E,,11
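
#### A right merge can be read as a left merge with the two DataFrames swapped. The sketch below checks this on the same df1 and df2; the helper `normalise` is a hypothetical name introduced here so that column order and row order do not affect the comparison.

```python
import pandas as pd

df1 = pd.DataFrame({'pointer': ['A', 'B', 'C', 'B', 'A', 'D'],
                    'value_df1': [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'pointer': ['B', 'Z', 'C', 'B', 'D', 'E'],
                    'value_df2': [6, 7, 8, 9, 10, 11]})

right = pd.merge(df1, df2, how='right')         # keep every row of df2
swapped_left = pd.merge(df2, df1, how='left')   # same idea, arguments swapped

def normalise(frame):
    """Put columns and rows in a fixed order so two frames can be compared."""
    cols = ['pointer', 'value_df1', 'value_df2']
    return frame[cols].sort_values(cols).reset_index(drop=True)

same = normalise(right).equals(normalise(swapped_left))
print(same)
```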


### **Outer Merge**

#### The outer merge returns **all rows of both DataFrames given** to the merge() function. **Rows that have no match** in the other DataFrame receive **NaN** in the columns that come from that DataFrame.

In [12]:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'pointer':['A', 'B', 'C', 'B', 'A', 'D'], 
                    'value_df1':[0,1,2,3,4,5]})

df2 = pd.DataFrame({'pointer':['B', 'Z', 'C', 'B','D','E'], 
                    'value_df2':[6,7,8,9,10,11]})

display(df1)
display(df2)

print("Outer Merged DataFrame\n")
display(pd.merge(df1, df2, how = 'outer')) # Performing an outer merge

Unnamed: 0,pointer,value_df1
0,A,0
1,B,1
2,C,2
3,B,3
4,A,4
5,D,5


Unnamed: 0,pointer,value_df2
0,B,6
1,Z,7
2,C,8
3,B,9
4,D,10
5,E,11


Outer Merged DataFrame



Unnamed: 0,pointer,value_df1,value_df2
0,A,0.0,
1,A,4.0,
2,B,1.0,6.0
3,B,1.0,9.0
4,B,3.0,6.0
5,B,3.0,9.0
6,C,2.0,8.0
7,D,5.0,10.0
8,Z,,7.0
9,E,,11.0
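
#### As a quick comparison, the sketch below counts the rows produced by each value of the how argument on the same df1 and df2. The dictionary name `counts` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'pointer': ['A', 'B', 'C', 'B', 'A', 'D'],
                    'value_df1': [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'pointer': ['B', 'Z', 'C', 'B', 'D', 'E'],
                    'value_df2': [6, 7, 8, 9, 10, 11]})

# Number of rows in the result for every join type.
counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    counts[how] = len(pd.merge(df1, df2, how=how))

print(counts)
# {'inner': 6, 'left': 8, 'right': 8, 'outer': 10}
```

#### The inner merge keeps only the 6 matched rows. The left and right merges each add their own 2 unmatched rows, and the outer merge adds both sets.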


### **Merge on Multiple Columns**

In [2]:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'column1':['Pak', 'USA', 'Pak', 'UK', 'Ind','None'], #Column 1
                    'column2':['A', 'B', 'C', 'B', 'A', 'D'],            #Column 2
                    'value_df1':[0,1,2,3,4,5]})

df2 = pd.DataFrame({'column1':['USA', 'UK', 'None', 'USA', 'Pak','Ind'], #Column 1
                    'column2':['B', 'Z', 'C', 'B','D','E'],              #Column 2
                    'value_df2':[6,7,8,9,10,11]})

display(df1)
display(df2)

print("Outer Merged DataFrame on Multiple Columns\n")
display(pd.merge(df1, df2, on = ['column1', 'column2'], how = 'outer'))

Unnamed: 0,column1,column2,value_df1
0,Pak,A,0
1,USA,B,1
2,Pak,C,2
3,UK,B,3
4,Ind,A,4
5,,D,5


Unnamed: 0,column1,column2,value_df2
0,USA,B,6
1,UK,Z,7
2,,C,8
3,USA,B,9
4,Pak,D,10
5,Ind,E,11


Outer Merged DataFrame on Multiple Columns



Unnamed: 0,column1,column2,value_df1,value_df2
0,Pak,A,0.0,
1,USA,B,1.0,6.0
2,USA,B,1.0,9.0
3,Pak,C,2.0,
4,UK,B,3.0,
5,Ind,A,4.0,
6,,D,5.0,
7,UK,Z,,7.0
8,,C,,8.0
9,Pak,D,,10.0
10,Ind,E,,11.0
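
#### When a list of columns is passed to on, two rows match only if every listed column is equal. The sketch below contrasts this with a merge on column1 alone, using the same df1 and df2; the names `both_keys` and `one_key` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'column1': ['Pak', 'USA', 'Pak', 'UK', 'Ind', 'None'],
                    'column2': ['A', 'B', 'C', 'B', 'A', 'D'],
                    'value_df1': [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'column1': ['USA', 'UK', 'None', 'USA', 'Pak', 'Ind'],
                    'column2': ['B', 'Z', 'C', 'B', 'D', 'E'],
                    'value_df2': [6, 7, 8, 9, 10, 11]})

# Both keys must agree: only ('USA', 'B') appears in both DataFrames.
both_keys = pd.merge(df1, df2, on=['column1', 'column2'], how='inner')

# Only column1 must agree: many more pairs match, and the two
# 'column2' columns are kept apart with the suffixes _x and _y.
one_key = pd.merge(df1, df2, on='column1', how='inner')

print(len(both_keys), len(one_key))
print(one_key.columns.tolist())
```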


### **Merge on Index**

#### Merging on an index works almost the same way as merging on a column. The only difference is that a column of one DataFrame is matched against the index of the other DataFrame.

In [5]:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'pointer':['A', 'B', 'C', 'B', 'A', 'D'], 
                    'value_df1':[0,1,2,3,4,5]})

df2 = pd.DataFrame(np.arange(10,13,1), index = ['A', 'B','C'], columns = ['values'])

display(df1)
display(df2)

print("Merged on index\n")
print(pd.merge(df1, df2, left_on='pointer', right_index=True))

Unnamed: 0,pointer,value_df1
0,A,0
1,B,1
2,C,2
3,B,3
4,A,4
5,D,5


Unnamed: 0,values
A,10
B,11
C,12


Merged on index

  pointer  value_df1  values
0       A          0      10
4       A          4      10
1       B          1      11
3       B          3      11
2       C          2      12


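#### Here, left_on='pointer' selects the key column of the left DataFrame, and right_index=True tells merge() to match it against the index of the right DataFrame. Because how defaults to 'inner', the row with pointer D is dropped. The mirrored parameters right_on and left_index can be used in the same way, depending on the problem.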
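
#### The sketch below shows two variations on the same data: how='left' keeps the unmatched D row, and left_index with right_on performs the mirrored merge. The names `kept` and `flipped` are only illustrative, and df2 is built from a plain list instead of np.arange.

```python
import pandas as pd

df1 = pd.DataFrame({'pointer': ['A', 'B', 'C', 'B', 'A', 'D'],
                    'value_df1': [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'values': [10, 11, 12]}, index=['A', 'B', 'C'])

# how='left' keeps every row of df1; 'D' is not in df2's index,
# so its 'values' entry becomes NaN.
kept = pd.merge(df1, df2, left_on='pointer', right_index=True, how='left')
print(kept)

# Mirrored form: the index of the left DataFrame is matched
# against the 'pointer' column of the right DataFrame.
flipped = pd.merge(df2, df1, left_index=True, right_on='pointer')
print(flipped)
```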

## **Mapping Data**

#### This technique translates the values of a Series or a DataFrame column into other values. A lookup, such as a dictionary, pairs each **current value** with the **value it should become**, and the pandas **map()** function applies that lookup element by element. The result can replace the original values, or it can be used to fill in the **values of a new column.**

In [9]:
import pandas as pd

df = pd.DataFrame({'City':['Lahore', 'Mumbai', 'Karachi', 'London'],
                   'AQI':[147, 166, 152, 81]})

print("The Original DataFrame")
display(df)

dict_map = {'Lahore':'Pakistan', 'Karachi':'Pakistan', 'Mumbai':'India', 'London':'UK'}

df['Country'] = df['City'].map(dict_map)

print("The Mapped DataFrame")
display(df)

The Original DataFrame


Unnamed: 0,City,AQI
0,Lahore,147
1,Mumbai,166
2,Karachi,152
3,London,81


The Mapped DataFrame


Unnamed: 0,City,AQI,Country
0,Lahore,147,Pakistan
1,Mumbai,166,India
2,Karachi,152,Pakistan
3,London,81,UK
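
#### Two details of map() are worth a short sketch: a value that is missing from the dictionary becomes NaN, and map() also accepts a function. The dictionary `partial_map`, the 'Unknown' label, and the threshold of 100 below are assumptions made for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'City': ['Lahore', 'Mumbai', 'Karachi', 'London'],
                   'AQI': [147, 166, 152, 81]})

# 'London' is deliberately left out of the dictionary.
partial_map = {'Lahore': 'Pakistan', 'Karachi': 'Pakistan', 'Mumbai': 'India'}

# Cities without an entry in the dictionary are mapped to NaN.
country = df['City'].map(partial_map)

# One way to handle the gaps: replace NaN with a placeholder label.
country_filled = country.fillna('Unknown')

# map() also accepts a function, applied to each element in turn.
# The threshold 100 is an arbitrary illustrative cut-off.
df['Above100'] = df['AQI'].map(lambda aqi: aqi > 100)

print(country_filled)
print(df)
```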


## **Removing duplicated data**