### Merging 
A join combines two DataFrames on a shared key, and it stays efficient even when the DataFrames are very large. Each join operates on exactly two DataFrames at a time, referred to as the left and right tables.


The key is the common column that the two DataFrames will be joined on. It’s a good practice to use keys which have unique values throughout the column to avoid unintended duplication of row values.
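To make the duplication risk concrete, here is a minimal sketch. The `customers` and `orders` frames and their values are hypothetical and exist only for illustration. It shows how a repeated key multiplies rows, and how the `validate` argument of `pd.merge` can be used to catch this early.

```python
import pandas as pd

# Hypothetical data: the key 'K1' appears twice in `orders`.
customers = pd.DataFrame({'key': ['K0', 'K1'], 'name': ['Jai', 'Princi']})
orders = pd.DataFrame({'key': ['K0', 'K1', 'K1'], 'amount': [10, 20, 30]})

# The customer row for 'K1' is repeated once per matching order row.
merged = pd.merge(customers, orders, on='key')
print(len(customers), len(merged))

# validate= asks pandas to check the key relationship and raise on a violation.
try:
    pd.merge(customers, orders, on='key', validate='one_to_one')
    unique_keys = True
except pd.errors.MergeError:
    unique_keys = False
print(unique_keys)
```

Passing `validate='one_to_many'` instead would accept this data, since only the right-hand keys repeat.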





There are four basic ways to handle the join, depending on which rows must retain their data:

1. inner (default)
2. left
3. right
4. outer

Pandas provides a single function, merge, as the entry point for all standard database join operations between DataFrame objects −


pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=False)


Here, we have used the following parameters −

left − A DataFrame object.

right − Another DataFrame object.

on − Columns (names) to join on. Must be found in both the left and right DataFrame objects.

left_on − Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.

right_on − Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.

left_index − If True, use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.

right_index − Same usage as left_index for the right DataFrame.

how − One of 'left', 'right', 'outer', 'inner'. Defaults to inner. Each method has been described below.

sort − Sort the result DataFrame by the join keys in lexicographical order. Defaults to False; when it is False, the order of the join keys depends on the join type (the how argument).
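The key-selection parameters above can be sketched as follows. The frame names (`employees`, `addresses`) and the column names `emp_id` and `id` are assumptions made up for this illustration; they do not come from the examples later in this section.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
employees = pd.DataFrame({'emp_id': ['K0', 'K1', 'K2'],
                          'Name': ['Jai', 'Princi', 'Gaurav']})
addresses = pd.DataFrame({'id': ['K0', 'K1', 'K2'],
                          'Address': ['Nagpur', 'Kanpur', 'Allahabad']})

# left_on / right_on: name the key column on each side separately.
by_columns = pd.merge(employees, addresses, left_on='emp_id', right_on='id')

# right_index=True: match the left key column against the right frame's index.
addresses_indexed = addresses.set_index('id')
by_index = pd.merge(employees, addresses_indexed,
                    left_on='emp_id', right_index=True)

print(by_columns)
print(by_index)
```

Note that `left_on`/`right_on` keeps both key columns in the result, while joining against the index keeps only the left one.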

### Merging a dataframe with one unique key combination


In [1]:

import pandas as pd 
 
# Define a dictionary containing employee data
data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age': [27, 24, 22, 32]}

# Define a dictionary containing address and qualification data
data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}

# Convert each dictionary into a DataFrame
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

print("df1 is :\n",df1,"\n\n df2 is : \n\n" ,df2)




df1 is :
   key    Name  Age
0  K0     Jai   27
1  K1  Princi   24
2  K2  Gaurav   22
3  K3    Anuj   32 

 df2 is : 

   key    Address Qualification
0  K0     Nagpur         Btech
1  K1     Kanpur           B.A
2  K2  Allahabad          Bcom
3  K3    Kannuaj        B.hons


In [4]:
merge_inner = pd.merge(df1, df2, how='inner', on='key')
merge_inner

  key    Name  Age    Address Qualification
0  K0     Jai   27     Nagpur         Btech
1  K1  Princi   24     Kanpur           B.A
2  K2  Gaurav   22  Allahabad          Bcom
3  K3    Anuj   32    Kannuaj        B.hons


In [3]:
merge_left = pd.merge(df1, df2,how='left',on='key')
merge_left

  key    Name  Age    Address Qualification
0  K0     Jai   27     Nagpur         Btech
1  K1  Princi   24     Kanpur           B.A
2  K2  Gaurav   22  Allahabad          Bcom
3  K3    Anuj   32    Kannuaj        B.hons


In [5]:
merge_right = pd.merge(df1, df2,how='right',on='key')
merge_right

  key    Name  Age    Address Qualification
0  K0     Jai   27     Nagpur         Btech
1  K1  Princi   24     Kanpur           B.A
2  K2  Gaurav   22  Allahabad          Bcom
3  K3    Anuj   32    Kannuaj        B.hons


In [6]:
merge_outer = pd.merge(df1, df2,how='outer',on='key')
merge_outer

  key    Name  Age    Address Qualification
0  K0     Jai   27     Nagpur         Btech
1  K1  Princi   24     Kanpur           B.A
2  K2  Gaurav   22  Allahabad          Bcom
3  K3    Anuj   32    Kannuaj        B.hons


From the results above: the merged result is identical for the inner, left, right, and outer joins, because both DataFrames contain exactly the same key values.

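| Merge method | Join name (SQL) | Description |
|---|---|---|
| left | LEFT OUTER JOIN | Use keys from left frame only |
| right | RIGHT OUTER JOIN | Use keys from right frame only |
| outer | FULL OUTER JOIN | Use union of keys from both frames |
| inner | INNER JOIN | Use intersection of keys from both frames |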
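One way to see which frame each key comes from is the `indicator` argument of `pd.merge`. The sketch below uses small made-up frames named `left` and `right` (hypothetical data, not the DataFrames used elsewhere in this section).

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K8', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'C': ['C0', 'C1', 'C2']})

# indicator=True adds a `_merge` column: 'left_only', 'right_only', or 'both'.
tagged = pd.merge(left, right, how='outer', on='key', indicator=True)
print(tagged)

# Map each key to its origin so the four join types can be read off directly.
origin = dict(zip(tagged['key'], tagged['_merge'].astype(str)))
print(origin)
```

Keys tagged `both` are what an inner join keeps; a left join adds the `left_only` keys, a right join adds the `right_only` keys, and an outer join keeps all of them.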

### left and right join 

In [7]:
import pandas as pd
df1 = pd.DataFrame({'key': ['K0', 'K8', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
   
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                          'C': ['C0', 'C1', 'C2', 'C3'],
                          'D': ['D0', 'D1', 'D2', 'D3']})


In [8]:
df1

  key   A   B
0  K0  A0  B0
1  K8  A1  B1
2  K2  A2  B2
3  K3  A3  B3


In [9]:
df2

  key   C   D
0  K0  C0  D0
1  K1  C1  D1
2  K2  C2  D2
3  K3  C3  D3


In [10]:
# Left join: keep every key from df1 (the left table)
merge_left = pd.merge(df1, df2, how='left', on='key')
merge_left

  key   A   B    C    D
0  K0  A0  B0   C0   D0
1  K8  A1  B1  NaN  NaN
2  K2  A2  B2   C2   D2
3  K3  A3  B3   C3   D3


In [15]:
# Left join with df2 as the left table: keep every key from df2
merge_left1 = pd.merge(df2, df1, how='left', on='key')
merge_left1

  key   C   D    A    B
0  K0  C0  D0   A0   B0
1  K1  C1  D1  NaN  NaN
2  K2  C2  D2   A2   B2
3  K3  C3  D3   A3   B3


In [12]:
# Right join: keep every key from df2 (the right table)
merge_right = pd.merge(df1, df2, how='right', on='key')
merge_right

  key    A    B   C   D
0  K0   A0   B0  C0  D0
1  K1  NaN  NaN  C1  D1
2  K2   A2   B2  C2  D2
3  K3   A3   B3  C3  D3


In [16]:
# Right join with df1 as the right table: keep every key from df1
merge_right1 = pd.merge(df2, df1, how='right', on='key')
merge_right1

  key    C    D   A   B
0  K0   C0   D0  A0  B0
1  K8  NaN  NaN  A1  B1
2  K2   C2   D2  A2  B2
3  K3   C3   D3  A3  B3


### inner join (default)

An inner join combines the two DataFrames on a join key and returns a new DataFrame. The returned DataFrame contains only the rows whose key values appear in both of the original DataFrames.

In [17]:
# Inner join: only keys present in both DataFrames are kept
merge_inner = pd.merge(df1, df2, how='inner', on='key')
merge_inner

  key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K2  A2  B2  C2  D2
2  K3  A3  B3  C3  D3


In [18]:
# Outer join: every key from either DataFrame is kept
merge_outer = pd.merge(df1, df2, how='outer', on='key')
merge_outer

  key    A    B    C    D
0  K0   A0   B0   C0   D0
1  K8   A1   B1  NaN  NaN
2  K2   A2   B2   C2   D2
3  K3   A3   B3   C3   D3
4  K1  NaN  NaN   C1   D1
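As a compact recap of the four join types, the following sketch collects the set of keys that survives each one. It rebuilds trimmed copies of df1 and df2 (one value column each, an assumption made to keep the example short). Sets are compared instead of row order, because the row order of an outer join may differ between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K8', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3']})

# Collect the set of keys that survives each join type.
keys_by_how = {
    how: set(pd.merge(df1, df2, how=how, on='key')['key'])
    for how in ('inner', 'left', 'right', 'outer')
}

for how, keys in keys_by_how.items():
    print(how, sorted(keys))
```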


## Merging DataFrames whose non-key columns have the same names

In [19]:
import pandas as pd
df1 = pd.DataFrame({'key': ['K0', 'K8', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
   
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                          'A': ['C0', 'C1', 'C2', 'C3'],
                          'B': ['D0', 'D1', 'D2', 'D3']})


In [20]:
merge_left=pd.merge(df1,df2,how='left',on='key')
merge_left

  key A_x B_x  A_y  B_y
0  K0  A0  B0   C0   D0
1  K8  A1  B1  NaN  NaN
2  K2  A2  B2   C2   D2
3  K3  A3  B3   C3   D3


In the output above, pandas appended the suffixes _x and _y to the overlapping column names so that the columns coming from df1 and df2 stay distinguishable.
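The default suffixes can be replaced through the `suffixes` argument of `pd.merge`. A minimal sketch, using shortened two-row versions of the frames (an assumption for brevity):

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K8'], 'A': ['A0', 'A1']})
df2 = pd.DataFrame({'key': ['K0', 'K1'], 'A': ['C0', 'C1']})

# suffixes= replaces the default ('_x', '_y') pair for overlapping column names.
merged = pd.merge(df1, df2, how='left', on='key', suffixes=('_left', '_right'))
print(merged.columns.tolist())
```

Descriptive suffixes such as these tend to be easier to read than `_x` and `_y` once the merged frame is passed on to later steps.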

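### join

DataFrame.join() performs the same kinds of joins as merge() with a different syntax, but by default it matches rows on the index rather than on a key column. Unlike merge(), it does not add default suffixes to overlapping column names, so we must supply lsuffix and/or rsuffix ourselves to remove the ambiguity. In the example below, the rows are paired by their integer index (0 to 3), not by the values in key, which is why K1 ends up next to K8.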

In [21]:
# join() requires lsuffix/rsuffix when column names overlap; it adds no default suffixes
import pandas as pd
df1 = pd.DataFrame({'key': ['K0', 'K8', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
   
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                          'A': ['C0', 'C1', 'C2', 'C3'],
                          'B': ['D0', 'D1', 'D2', 'D3']})


In [22]:
df2.join(df1,how='inner',lsuffix="_left",rsuffix="_right")

  key_left A_left B_left key_right A_right B_right
0       K0     C0     D0        K0      A0      B0
1       K1     C1     D1        K8      A1      B1
2       K2     C2     D2        K2      A2      B2
3       K3     C3     D3        K3      A3      B3
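The output above pairs rows by their integer index, so K1 sits next to K8. If the intent is to match on the values in key, one option is to move key into the index before calling join(). The sketch below assumes trimmed versions of df1 and df2 with a single overlapping column, and shows the equivalent merge() call for comparison.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K8', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['C0', 'C1', 'C2', 'C3']})

# Move `key` into the index on both sides so join() aligns rows by key value.
joined = (
    df1.set_index('key')
       .join(df2.set_index('key'), how='inner',
             lsuffix='_left', rsuffix='_right')
       .reset_index()
)

# The same inner join expressed with merge().
merged = pd.merge(df1, df2, how='inner', on='key',
                  suffixes=('_left', '_right'))

print(joined)
print(merged)
```

Both calls keep only K0, K2, and K3, the keys present in both frames.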
