In [2]:
import pandas as pd

## Concat, Merge & Join in Pandas

- A DataFrame can be seen as a collection of Series arranged in a table-like structure. Data for an analysis often arrives as separate datasets that have to be combined before we can work with them. Pandas provides `concat`, `merge` and `join` for this.

**Concatenating DataFrames**

- Concatenation stacks DataFrames along one axis and aligns them on the labels of the other axis (`rows or columns`). We use the `pd.concat()` function to achieve this in Pandas. We can concatenate DataFrames both `vertically (axis=0)`, which is the default, and `horizontally (axis=1)`. We can also concatenate more than two DataFrames or Series at once, and passing them all in one list to a single call is generally more efficient than concatenating repeatedly in a loop.
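- As a minimal sketch of the points above, the cell below concatenates three DataFrames vertically and two Series horizontally. The names `df_a`, `df_b`, `df_c`, `cities` and `temps` are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical one-row frames, used only for this illustration
df_a = pd.DataFrame({"City": ["Simla"], "Temp": [15]})
df_b = pd.DataFrame({"City": ["Delhi"], "Temp": [33]})
df_c = pd.DataFrame({"City": ["Jaipur"], "Temp": [38]})

# axis=0 (the default): stack the rows of all three frames in one call
stacked = pd.concat([df_a, df_b, df_c], ignore_index=True)

# axis=1: place two Series side by side, aligned on their index
cities = pd.Series(["Simla", "Delhi"], name="City")
temps = pd.Series([15, 33], name="Temp")
side_by_side = pd.concat([cities, temps], axis=1)

print(stacked)
print(side_by_side)
```

- With `axis=1` the Series names become the column names of the result.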

In [3]:
dic1 =  {
    
        "City" : ["Simla", "Delhi", "Jaipur", "Bhopal"],
        "Temp" : [15, 33, 38, 35]
    
} 


dic2 = {
    
        "City" : ["Chennai", "Hyderabad", "Vizag", "Kochi"],
        "Temp" : [39, 34, 36, 30]
    
}

dfnorth=pd.DataFrame(dic1)
dfsouth=pd.DataFrame(dic2)

In [4]:
dfnorth

Unnamed: 0,City,Temp
0,Simla,15
1,Delhi,33
2,Jaipur,38
3,Bhopal,35


In [5]:
dfsouth

Unnamed: 0,City,Temp
0,Chennai,39
1,Hyderabad,34
2,Vizag,36
3,Kochi,30


In [6]:
display(dfnorth,dfsouth)

Unnamed: 0,City,Temp
0,Simla,15
1,Delhi,33
2,Jaipur,38
3,Bhopal,35


Unnamed: 0,City,Temp
0,Chennai,39
1,Hyderabad,34
2,Vizag,36
3,Kochi,30


In [7]:
pd.concat([dfnorth,dfsouth])      

Unnamed: 0,City,Temp
0,Simla,15
1,Delhi,33
2,Jaipur,38
3,Bhopal,35
0,Chennai,39
1,Hyderabad,34
2,Vizag,36
3,Kochi,30


- This creates a new DataFrame that combines the rows of the two DataFrames (`dfnorth` & `dfsouth`).
- Note that the index does not continue from the first DataFrame to the next: each input keeps its own labels, so 0 to 3 appear twice.
- To get one continuous index, pass `ignore_index=True` to `pd.concat()`.

In [8]:
pd.concat([dfnorth,dfsouth],ignore_index=True)   

Unnamed: 0,City,Temp
0,Simla,15
1,Delhi,33
2,Jaipur,38
3,Bhopal,35
4,Chennai,39
5,Hyderabad,34
6,Vizag,36
7,Kochi,30


- To keep track of which DataFrame each row came from, pass the `keys` parameter to `pd.concat()`. Each key becomes the outer level of a hierarchical index (MultiIndex) on the result.

In [11]:
df_keys=pd.concat([dfnorth, dfsouth],keys=["North","South"])

In [12]:
df_keys

Unnamed: 0,Unnamed: 1,City,Temp
North,0,Simla,15
North,1,Delhi,33
North,2,Jaipur,38
North,3,Bhopal,35
South,0,Chennai,39
South,1,Hyderabad,34
South,2,Vizag,36
South,3,Kochi,30


In [15]:
df_keys.loc["North"]

Unnamed: 0,City,Temp
0,Simla,15
1,Delhi,33
2,Jaipur,38
3,Bhopal,35


In [17]:
df_keys.loc["North"]["Temp"]

0    15
1    33
2    38
3    35
Name: Temp, dtype: int64
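- A small sketch of two other ways to work with the keyed result, assuming shortened two-row copies of `dfnorth` and `dfsouth`. The level names `Region` and `Row` are hypothetical labels chosen for this example.

```python
import pandas as pd

# Shortened copies of the frames above, for illustration only
dfnorth = pd.DataFrame({"City": ["Simla", "Delhi"], "Temp": [15, 33]})
dfsouth = pd.DataFrame({"City": ["Chennai", "Hyderabad"], "Temp": [39, 34]})

# names= labels the two index levels that keys= creates
df_keys = pd.concat(
    [dfnorth, dfsouth], keys=["North", "South"], names=["Region", "Row"]
)

# A (key, row) tuple selects one row; adding a column label selects one cell
cell = df_keys.loc[("South", 1), "City"]

# reset_index moves the Region level back into an ordinary column
flat = df_keys.reset_index(level="Region")

print(cell)
print(flat)
```

- Selecting with a single `.loc` call avoids the chained indexing used in `df_keys.loc["North"]["Temp"]`.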

- A caveat with `concat` is that it aligns on index or column labels, not on values. Wherever the labels differ, the missing entries are filled with NaN.

In [18]:
dic3 =  {
    
        "City" : ["Chennai", "Hyderabad", "Vizag", "Kochi"],
        "Temp" : [15, 33, 38, 35]
    
} 


dic4 = {
    
        "City" : ["Chennai", "Hyderabad", "Vizag", "Kochi"],
        "Humid" : [44, 40, 43, 42]
    
}


dftemp = pd.DataFrame(dic3)
dfhumid = pd.DataFrame(dic4)

In [19]:
display(dftemp, dfhumid)

Unnamed: 0,City,Temp
0,Chennai,15
1,Hyderabad,33
2,Vizag,38
3,Kochi,35


Unnamed: 0,City,Humid
0,Chennai,44
1,Hyderabad,40
2,Vizag,43
3,Kochi,42


In [23]:
pd.concat([dftemp, dfhumid])

Unnamed: 0,City,Temp,Humid
0,Chennai,15.0,
1,Hyderabad,33.0,
2,Vizag,38.0,
3,Kochi,35.0,
0,Chennai,,44.0
1,Hyderabad,,40.0
2,Vizag,,43.0
3,Kochi,,42.0


In [24]:
pd.concat([dftemp, dfhumid],axis=1)

Unnamed: 0,City,Temp,City.1,Humid
0,Chennai,15,Chennai,44
1,Hyderabad,33,Hyderabad,40
2,Vizag,38,Vizag,43
3,Kochi,35,Kochi,42
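- The NaN values and the repeated `City` column seen above can be avoided in a couple of ways. The sketch below shows two options on shortened copies of `dftemp` and `dfhumid`. The result names `common_only` and `by_city` are hypothetical.

```python
import pandas as pd

# Shortened copies of the frames above, for illustration only
dftemp = pd.DataFrame({"City": ["Chennai", "Kochi"], "Temp": [15, 35]})
dfhumid = pd.DataFrame({"City": ["Chennai", "Kochi"], "Humid": [44, 42]})

# join="inner" keeps only the column labels present in every input (here: City)
common_only = pd.concat([dftemp, dfhumid], join="inner", ignore_index=True)

# With City as the index, axis=1 aligns the rows by city name
# and the result has no duplicated City column
by_city = pd.concat(
    [dftemp.set_index("City"), dfhumid.set_index("City")], axis=1
)

print(common_only)
print(by_city)
```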


**Merging DataFrames**

- Merging combines two different datasets into one by aligning their rows on the values of `common attributes or columns`. The join is usually done on `columns or indexes`, where the key is the `common column shared by the two DataFrames`.

- Pandas uses the `pd.merge()` function (or the equivalent `DataFrame.merge()` method) to perform the merge. It supports the join types `inner`, `left`, `right` and `outer`, as well as `cross`.

In [25]:
display(dftemp, dfhumid)

Unnamed: 0,City,Temp
0,Chennai,15
1,Hyderabad,33
2,Vizag,38
3,Kochi,35


Unnamed: 0,City,Humid
0,Chennai,44
1,Hyderabad,40
2,Vizag,43
3,Kochi,42


In [26]:
pd.merge(dftemp, dfhumid)     # with no `on` given, merge uses the columns common to both DataFrames (here `City`)

Unnamed: 0,City,Temp,Humid
0,Chennai,15,44
1,Hyderabad,33,40
2,Vizag,38,43
3,Kochi,35,42


In [27]:
dic5 =  {
    
        "City" : ["Chennai", "Hyderabad", "Vizag", "Kochi"],
        "State" : ["TN", "TS","AP","KS"],
        "Temp" : [15, 33, 38, 35]
    
} 


dic6 = {
    
        "City" : ["Chennai", "Hyderabad", "Vizag", "Kochi", ],
        "State" : ["TN", "TS","AP","KS"],
        "Humid" : [44, 40, 43, 42]
    
}


dftemp = pd.DataFrame(dic5)
dfhumid = pd.DataFrame(dic6)

In [28]:
display(dftemp, dfhumid)

Unnamed: 0,City,State,Temp
0,Chennai,TN,15
1,Hyderabad,TS,33
2,Vizag,AP,38
3,Kochi,KS,35


Unnamed: 0,City,State,Humid
0,Chennai,TN,44
1,Hyderabad,TS,40
2,Vizag,AP,43
3,Kochi,KS,42


In [29]:
pd.merge(dftemp, dfhumid)

Unnamed: 0,City,State,Temp,Humid
0,Chennai,TN,15,44
1,Hyderabad,TS,33,40
2,Vizag,AP,38,43
3,Kochi,KS,35,42


- We can also use the `on` parameter to specify the column(s) on which the merge should happen.

In [31]:
pd.merge(dftemp, dfhumid, on="City")   # only `City` is the key, so the shared `State` column is kept twice as State_x and State_y

Unnamed: 0,City,State_x,Temp,State_y,Humid
0,Chennai,TN,15,TN,44
1,Hyderabad,TS,33,TS,40
2,Vizag,AP,38,AP,43
3,Kochi,KS,35,KS,42
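- The `_x` and `_y` suffixes can be avoided or renamed. The sketch below uses shortened copies of the frames above. The result names `both_keys` and `labelled`, and the suffix strings, are hypothetical choices.

```python
import pandas as pd

# Shortened copies of the frames above, for illustration only
dftemp = pd.DataFrame(
    {"City": ["Chennai", "Hyderabad"], "State": ["TN", "TS"], "Temp": [15, 33]}
)
dfhumid = pd.DataFrame(
    {"City": ["Chennai", "Hyderabad"], "State": ["TN", "TS"], "Humid": [44, 40]}
)

# Merging on both shared columns keeps a single State column
both_keys = pd.merge(dftemp, dfhumid, on=["City", "State"])

# When only City is the key, suffixes= controls how the two State columns are named
labelled = pd.merge(dftemp, dfhumid, on="City", suffixes=("_temp", "_humid"))

print(both_keys)
print(labelled)
```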


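- Join types in merge (`inner`, `left`, `right` and `outer`) are selected with the `how` parameter. Because both DataFrames below contain the same four cities, `inner`, `left` and `right` return identical results here.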

In [36]:
dic3 =  {
    
        "City" : ["Chennai", "Hyderabad", "Vizag", "Kochi"],
        "Temp" : [15, 33, 38, 35]
    
} 


dic4 = {
    
        "City" : ["Chennai", "Hyderabad", "Vizag", "Kochi"],
        "Humid" : [44, 40, 43, 42]
    
}


dftemp = pd.DataFrame(dic3)
dfhumid = pd.DataFrame(dic4)

In [37]:
display(dftemp, dfhumid)

Unnamed: 0,City,Temp
0,Chennai,15
1,Hyderabad,33
2,Vizag,38
3,Kochi,35


Unnamed: 0,City,Humid
0,Chennai,44
1,Hyderabad,40
2,Vizag,43
3,Kochi,42


In [38]:
pd.merge(dftemp, dfhumid)  # how="inner" is the default

Unnamed: 0,City,Temp,Humid
0,Chennai,15,44
1,Hyderabad,33,40
2,Vizag,38,43
3,Kochi,35,42


In [40]:
pd.merge(dftemp, dfhumid, how="inner")

Unnamed: 0,City,Temp,Humid
0,Chennai,15,44
1,Hyderabad,33,40
2,Vizag,38,43
3,Kochi,35,42


In [41]:
pd.merge(dftemp, dfhumid, how = "left" )

Unnamed: 0,City,Temp,Humid
0,Chennai,15,44
1,Hyderabad,33,40
2,Vizag,38,43
3,Kochi,35,42


In [42]:
pd.merge(dftemp, dfhumid, how = "right" )

Unnamed: 0,City,Temp,Humid
0,Chennai,15,44
1,Hyderabad,33,40
2,Vizag,38,43
3,Kochi,35,42


In [44]:
pd.merge(dftemp, dfhumid, how="cross")  # Cartesian product: pairs every row of dftemp with every row of dfhumid

Unnamed: 0,City_x,Temp,City_y,Humid
0,Chennai,15,Chennai,44
1,Chennai,15,Hyderabad,40
2,Chennai,15,Vizag,43
3,Chennai,15,Kochi,42
4,Hyderabad,33,Chennai,44
5,Hyderabad,33,Hyderabad,40
6,Hyderabad,33,Vizag,43
7,Hyderabad,33,Kochi,42
8,Vizag,38,Chennai,44
9,Vizag,38,Hyderabad,40
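- The join types only differ when the key values of the two DataFrames do not fully overlap. The sketch below uses two hypothetical frames, `left` and `right`, that share only Hyderabad and Vizag, and adds `indicator=True` to show where each row of the outer join came from.

```python
import pandas as pd

# Hypothetical frames with partially overlapping cities
left = pd.DataFrame({"City": ["Chennai", "Hyderabad", "Vizag"], "Temp": [15, 33, 38]})
right = pd.DataFrame({"City": ["Hyderabad", "Vizag", "Kochi"], "Humid": [40, 43, 42]})

inner = pd.merge(left, right, how="inner")       # only cities present in both
left_join = pd.merge(left, right, how="left")    # every city from left, NaN where right has none
right_join = pd.merge(left, right, how="right")  # every city from right, NaN where left has none

# outer keeps every city from either side; _merge records the source of each row
outer = pd.merge(left, right, how="outer", indicator=True)

print(inner)
print(left_join)
print(right_join)
print(outer)
```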


**Joining DataFrames**

- The `.join()` method combines the columns of two DataFrames by aligning them on their index, as a left join by default. We can join index-on-index, or use the `on` argument to match a column of the calling DataFrame against the index of the other.

In [54]:
dic7 =  {
    
        "NCity" : ["Simla", "Delhi", "Jaipur", "Bhopal"],
        "Temp" : [15, 33, 38, 35]
    
} 


dic8 = {
    
        "SCity" : ["Chennai", "Hyderabad", "Vizag", "Kochi"],
        "Humid" : [39, 34, 36, 30]
}
    
df1 = pd.DataFrame(dic7)
df2 = pd.DataFrame(dic8)

In [55]:
display(df1,df2)

Unnamed: 0,NCity,Temp
0,Simla,15
1,Delhi,33
2,Jaipur,38
3,Bhopal,35


Unnamed: 0,SCity,Humid
0,Chennai,39
1,Hyderabad,34
2,Vizag,36
3,Kochi,30


In [56]:
df1.join(df2)    # both frames share the index 0-3, so the result matches concat with axis=1

Unnamed: 0,NCity,Temp,SCity,Humid
0,Simla,15,Chennai,39
1,Delhi,33,Hyderabad,34
2,Jaipur,38,Vizag,36
3,Bhopal,35,Kochi,30


In [59]:
pd.concat([df1, df2], axis=1)    # same result as df1.join(df2) here, because the indexes are identical

Unnamed: 0,NCity,Temp,SCity,Humid
0,Simla,15,Chennai,39
1,Delhi,33,Hyderabad,34
2,Jaipur,38,Vizag,36
3,Bhopal,35,Kochi,30
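- `join` and `concat` agree above only because `df1` and `df2` have the same index. The sketch below shows `join` with the `on` argument and with differing indexes. The frames `temps` and `humid` are hypothetical and built only for this illustration.

```python
import pandas as pd

# Hypothetical frames: humid is indexed by city name and has no row for Vizag
temps = pd.DataFrame({"City": ["Chennai", "Kochi", "Vizag"], "Temp": [15, 35, 38]})
humid = pd.DataFrame({"Humid": [44, 42]}, index=["Chennai", "Kochi"])

# on="City" matches the caller's City column against the index of humid;
# the default how="left" keeps Vizag with NaN for Humid
joined = temps.join(humid, on="City")

# how="inner" keeps only the cities found in both frames
inner_rows = temps.join(humid, on="City", how="inner")

# Index-on-index join: move City into the index first
by_index = temps.set_index("City").join(humid)

print(joined)
print(inner_rows)
print(by_index)
```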
