# __Merge__

###  `pd.merge()` -- **SQL-like joins**

#### Parameters

| Parameter   | Default       | Meaning                                   |
| ----------- | ------------- | ----------------------------------------- |
| `left`      | required      | Left DataFrame                            |
| `right`     | required      | Right DataFrame                           |
| `on`        | `None`        | Column(s) to join on                      |
| `how`       | `'inner'`     | `'inner'`, `'left'`, `'right'`, `'outer'` |
| `left_on`   | `None`        | Left column if names differ               |
| `right_on`  | `None`        | Right column if names differ              |
| `suffixes`  | `('_x','_y')` | Rename overlapping columns                |
| `indicator` | `False`       | Adds column showing source (`_merge`)     |


### ***With no `on` argument, `pd.merge()` joins on the columns that share a name in both DataFrames; rows match only where the key values are equal***

In [1]:
import pandas as pd

In [2]:
df1 = pd.DataFrame({
    'cust_id': [1, 2, 3,5],
    'name': ['Ali', 'Sara', 'Usman','Haroon']
})

df2 = pd.DataFrame({
    'cust_id': [1, 2, 3,4],
    'order_id': [101, 102, 103,104]
})
pd.merge(df1,df2)
# pd.merge(df2,df1,on='cust_id')  # same rows, with df2's columns first

# 'cust_id' is the only column name shared by both DataFrames, so it is used as the join key

Unnamed: 0,cust_id,name,order_id
0,1,Ali,101
1,2,Sara,102
2,3,Usman,103
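
The rule above can be checked directly. The following is a minimal sketch that reuses the two frames from the cell above; the variable names `common`, `implicit`, and `explicit` are illustrative only. It computes the shared column names and confirms that passing them to `on` gives the same result as leaving `on` out.

```python
import pandas as pd

df1 = pd.DataFrame({'cust_id': [1, 2, 3, 5],
                    'name': ['Ali', 'Sara', 'Usman', 'Haroon']})
df2 = pd.DataFrame({'cust_id': [1, 2, 3, 4],
                    'order_id': [101, 102, 103, 104]})

# Column names present in both frames; with `on` omitted, merge joins on these
common = df1.columns.intersection(df2.columns).tolist()

implicit = pd.merge(df1, df2)             # join keys inferred from shared names
explicit = pd.merge(df1, df2, on=common)  # join keys spelled out

print(common)
print(implicit.equals(explicit))
```

Spelling out `on` is usually the safer habit: if a second shared column is added later, the inferred key silently changes.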


### `on='Column to join on'` -- **Parameter, on which column to join**

In [3]:
pd.merge(df1,df2,on='cust_id')

Unnamed: 0,cust_id,name,order_id
0,1,Ali,101
1,2,Sara,102
2,3,Usman,103


### `how= 'inner/left/right/outer'` -- **Parameter, type of join**

In [4]:
# A notebook cell displays only its last expression, so the output below is the outer join
pd.merge(df1,df2,how='inner')
pd.merge(df1,df2,how='left')
pd.merge(df1,df2,how='right')
pd.merge(df1,df2,how='outer')

Unnamed: 0,cust_id,name,order_id
0,1,Ali,101.0
1,2,Sara,102.0
2,3,Usman,103.0
3,4,,104.0
4,5,Haroon,
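
To see how the four join types differ on the same data, the sketch below runs each one and records which `cust_id` values survive. It assumes the same two frames as above; `keys_by_how` is just an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'cust_id': [1, 2, 3, 5],
                    'name': ['Ali', 'Sara', 'Usman', 'Haroon']})
df2 = pd.DataFrame({'cust_id': [1, 2, 3, 4],
                    'order_id': [101, 102, 103, 104]})

# Run every join type on the same key and collect the surviving cust_id values
keys_by_how = {}
for how in ['inner', 'left', 'right', 'outer']:
    result = pd.merge(df1, df2, on='cust_id', how=how)
    keys_by_how[how] = result['cust_id'].tolist()

for how, keys in keys_by_how.items():
    print(how, keys)
```

`inner` keeps only the keys found in both frames, `left` and `right` keep every key of one side, and `outer` keeps the keys of both sides, filling the missing cells with `NaN`.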


### `indicator= True/False` -- **Parameter, adds a `_merge` column showing each row's source**

In [5]:
pd.merge(df1,df2,how='outer',indicator=True)


Unnamed: 0,cust_id,name,order_id,_merge
0,1,Ali,101.0,both
1,2,Sara,102.0,both
2,3,Usman,103.0,both
3,4,,104.0,right_only
4,5,Haroon,,left_only
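
One common use of the `_merge` column is to find rows that exist on one side only. The sketch below, again assuming the frames from the start of this section, keeps the customers that have no matching order; `left_only` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'cust_id': [1, 2, 3, 5],
                    'name': ['Ali', 'Sara', 'Usman', 'Haroon']})
df2 = pd.DataFrame({'cust_id': [1, 2, 3, 4],
                    'order_id': [101, 102, 103, 104]})

# Outer join with the source indicator switched on
outer = pd.merge(df1, df2, on='cust_id', how='outer', indicator=True)

# Keep only rows that came from df1 alone, then drop the helper column
left_only = outer[outer['_merge'] == 'left_only'].drop(columns='_merge')

print(left_only)
```

Filtering on `'right_only'` instead would return the orders whose `cust_id` has no customer record.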


###  `left_index / right_index = True/False` -- **Parameters, join on matching row indexes**

* **`left_index`**: Tells Pandas to ignore columns and match using the **Index** (Row Labels) of the **Left** table.
* **`right_index`**: Tells Pandas to ignore columns and match using the **Index** (Row Labels) of the **Right** table.


In [6]:
df1 = pd.DataFrame({
    'cust_id': [1, 2, 3, 5],
    'order_id': [101, 102, 103,104]
})

df2 = pd.DataFrame({
    'cust_id': [1, 2, 3, 4],
    'order_id': [101, 102, 103,104]
})

In [7]:
# Join strictly based on their Index
pd.merge(df1,df2,left_index=True,right_index=True)


Unnamed: 0,cust_id_x,order_id_x,cust_id_y,order_id_y
0,1,101,1,101
1,2,102,2,102
2,3,103,3,103
3,5,104,4,104
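
The index and column styles can also be mixed, which is useful when one table stores its key in the index. The sketch below uses two hypothetical frames, `customers` and `orders`, that are not part of the notebook above.

```python
import pandas as pd

# Hypothetical lookup table: the index holds the customer id
customers = pd.DataFrame({'name': ['Ali', 'Sara', 'Usman']}, index=[1, 2, 3])

# Hypothetical fact table: the customer id is an ordinary column
orders = pd.DataFrame({'cust_id': [1, 2, 2, 4],
                       'order_id': [101, 102, 103, 104]})

# Match the `cust_id` column of `orders` against the index of `customers`
result = pd.merge(orders, customers, left_on='cust_id', right_index=True)

print(result)
```

Because the default join is `inner`, the order with `cust_id` 4 is dropped: no row of `customers` carries that index label.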


### `left_on / right_on = 'column'` -- **Parameters, join on columns whose names differ between the two tables**
* **`left_on`**: Tells Pandas to look for the matching key in a **Column** of the **Left** table.
* **`right_on`**: Tells Pandas to look for the matching key in a **Column** of the **Right** table.

In [8]:
left = pd.DataFrame({
    'cust_id': [1, 2, 3, 5,6],
    'transaction_id': [101, 102, 103,104,105]
})

right = pd.DataFrame({
    'cust_id': [1, 2, 3, 4,7],
    'order_id': [101, 102, 103,104,106]
})

In [9]:
pd.merge(left,right,left_on='transaction_id',right_on='order_id')


Unnamed: 0,cust_id_x,transaction_id,cust_id_y,order_id
0,1,101,1,101
1,2,102,2,102
2,3,103,3,103
3,5,104,4,104


In [10]:
pd.merge(left,right,left_index=True,right_index=True)


Unnamed: 0,cust_id_x,transaction_id,cust_id_y,order_id
0,1,101,1,101
1,2,102,2,102
2,3,103,3,103
3,5,104,4,104
4,6,105,7,106





### ⚡ Cheat Sheet

| Parameter | What it tells Pandas |
| --- | --- |
| `on='ID'` | Both tables have a column named `'ID'`. Use it. |
| `left_on='ID', right_on='uid'` | Left uses `'ID'`, Right uses `'uid'`. Match them. |
| `left_index=True` | Don't look for a column; use the **Left Index** as the key. |
| `right_index=True` | Don't look for a column; use the **Right Index** as the key. |

**In short:**
* **`_on`** = Use a Column.
* **`_index`** = Use the Index.

In [11]:
left = pd.DataFrame({
    'cust_id': [1, 2, 3, 5,6],
    'order_id': [101, 102, 103,104,105]
})

right = pd.DataFrame({
    'cust_id': [1, 2, 3, 4,7],
    'order_id': [101, 102, 103,104,106]
})

### `suffixes=('_x', '_y')` -- **Parameter, renames overlapping columns**


In [12]:
pd.merge(left,right,left_index=True,right_index=True, suffixes=('ID','Number'))


Unnamed: 0,cust_idID,order_idID,cust_idNumber,order_idNumber
0,1,101,1,101
1,2,102,2,102
2,3,103,3,103
3,5,104,4,104
4,6,105,7,106
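
Suffixes are applied only to overlapping columns that are not join keys. A short sketch of that, assuming the same `left` and `right` frames but joining on `cust_id` instead of the index; the suffix strings are arbitrary choices.

```python
import pandas as pd

left = pd.DataFrame({'cust_id': [1, 2, 3, 5, 6],
                     'order_id': [101, 102, 103, 104, 105]})
right = pd.DataFrame({'cust_id': [1, 2, 3, 4, 7],
                      'order_id': [101, 102, 103, 104, 106]})

# `cust_id` is the key, so only the clashing `order_id` column is renamed
merged = pd.merge(left, right, on='cust_id', suffixes=('_left', '_right'))

print(merged.columns.tolist())
```

Starting each suffix with an underscore keeps the resulting names readable, for example `order_id_left` rather than `order_idleft`.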


---
---
# **Concat**

###  `pd.concat()` -- **Stack or append data**

### 👉 Important Parameters

| Parameter      | Default   | Meaning                   |
| -------------- | --------- | ------------------------- |
| `objs`         | required  | List of Series/DataFrames |
| `axis`         | `0`       | `0` → rows, `1` → columns |
| `ignore_index` | `False`   | Reset index               |
| `join`         | `'outer'` | `'outer'` or `'inner'`    |
| `keys`         | `None`    | Create hierarchical index |


In [13]:
ser1=pd.Series([1,2,3,4])
ser2=pd.Series([11,12,13,14])

In [19]:
pd.concat([ser1,ser2])

0     1
1     2
2     3
3     4
0    11
1    12
2    13
3    14
dtype: int64
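
In the output above the index labels 0 to 3 appear twice, because each Series keeps its own index. The sketch below shows how `ignore_index=True` from the parameter table replaces them with a fresh running index; `stacked` is an illustrative name.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4])
ser2 = pd.Series([11, 12, 13, 14])

# Discard the original labels and number the result 0..n-1
stacked = pd.concat([ser1, ser2], ignore_index=True)

print(stacked.index.tolist())
```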

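In [None]:
df1=pd.DataFrame({'Name':['Haroon','Moiz','Moin','Sami'],'Dept':['CS','IT','SE','IT']})
df2=pd.DataFrame({ 'Name':['Ali','Jawad'],'Dept':['HR','FINANCE']})
# Stack df2 under df1; each frame keeps its own index, so labels 0 and 1 repeat
pd.concat([df1,df2])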

Unnamed: 0,Name,Dept
0,Haroon,CS
1,Moiz,IT
2,Moin,SE
3,Sami,IT
0,Ali,HR
1,Jawad,FINANCE


In [None]:
df1=pd.DataFrame({ 'Name':['Ali','Jawad']})
df2=pd.DataFrame({'Name':['Haroon','Moiz','Moin','Sami'],'Dept':['CS','IT','SE','IT']})

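pd.concat([df1,df2])         # row-wise; not displayed because it is not the last expression
pd.concat([df1,df2],axis=1)  # column-wise; rows are aligned on the index, gaps become NaN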


Unnamed: 0,Name,Name.1,Dept
0,Ali,Haroon,CS
1,Jawad,Moiz,IT
2,,Moin,SE
3,,Sami,IT


In [71]:
df1=pd.DataFrame({ 'B':[1,2,3,4,5]})
df2=pd.DataFrame({'A':[11,22,33,44],'C':[55,66,77,88]})
pd.concat([df1,df2])         # row-wise (not displayed)
pd.concat([df1,df2],axis=1)  # column-wise; df2 has no row 4, so A and C are NaN there


Unnamed: 0,B,A,C
0,1,11.0,55.0
1,2,22.0,66.0
2,3,33.0,77.0
3,4,44.0,88.0
4,5,,


In [30]:
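# Row Wise Concatenation (df1 and df2 are the Name/Dept frames again)
df1=pd.DataFrame({'Name':['Haroon','Moiz','Moin','Sami'],'Dept':['CS','IT','SE','IT']})
df2=pd.DataFrame({ 'Name':['Ali','Jawad'],'Dept':['HR','FINANCE']})
pd.concat([df1,df2])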

Unnamed: 0,Name,Dept
0,Haroon,CS
1,Moiz,IT
2,Moin,SE
3,Sami,IT
0,Ali,HR
1,Jawad,FINANCE


### `axis = 0/1` -- **Parameter, stack along rows (`0`) or columns (`1`)**

In [40]:
# Column Wise Concatenation
pd.concat([df1,df2],axis=1)

Unnamed: 0,Name,Dept,Name.1,Dept.1
0,Haroon,CS,Ali,HR
1,Moiz,IT,Jawad,FINANCE
2,Moin,SE,,
3,Sami,IT,,


### `join = 'inner/outer'` -- **Parameter, how to handle labels on the other axis**
**Outer: union (keep every label, fill gaps with `NaN`)**

**Inner: intersection (keep only the labels common to all objects)**


In [None]:
pd.concat([df1,df2],axis=1,join='inner')

Unnamed: 0,Name,Dept,Name.1,Dept.1
0,Haroon,CS,Ali,HR
1,Moiz,IT,Jawad,FINANCE


In [51]:
pd.concat([df1,df2],axis=1,join='outer')

Unnamed: 0,Name,Dept,Name.1,Dept.1
0,Haroon,CS,Ali,HR
1,Moiz,IT,Jawad,FINANCE
2,Moin,SE,,
3,Sami,IT,,
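
The same `join` rule applies when stacking rows: the "other axis" is then the columns. A minimal sketch with two small frames whose columns only partly overlap (the frames and result names here are illustrative, not the ones from the cells above):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Ali', 'Jawad']})
df2 = pd.DataFrame({'Name': ['Haroon', 'Moiz'], 'Dept': ['CS', 'IT']})

# inner: keep only the columns present in every frame
inner_rows = pd.concat([df1, df2], join='inner', ignore_index=True)

# outer: keep every column and fill the gaps with NaN
outer_rows = pd.concat([df1, df2], join='outer', ignore_index=True)

print(inner_rows.columns.tolist())
print(outer_rows.columns.tolist())
```

With `join='inner'` the `Dept` column disappears because `df1` does not have it; with `join='outer'` it is kept and the two rows from `df1` hold `NaN` there.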


### `keys=["labels"]` -- **Parameter, creates hierarchical labels**

In [60]:
pd.concat([df1,df2],axis=1,keys=['Df1','Df2'])


Unnamed: 0_level_0,Df1,Df1,Df2,Df2
Unnamed: 0_level_1,Name,Dept,Name,Dept
0,Haroon,CS,Ali,HR
1,Moiz,IT,Jawad,FINANCE
2,Moin,SE,,
3,Sami,IT,,


In [61]:
pd.concat([df1,df2],axis=0,keys=['Df1','Df2'])


Unnamed: 0,Unnamed: 1,Name,Dept
Df1,0,Haroon,CS
Df1,1,Moiz,IT
Df1,2,Moin,SE
Df1,3,Sami,IT
Df2,0,Ali,HR
Df2,1,Jawad,FINANCE
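
The outer level created by `keys` makes it easy to pull one source frame back out of the stacked result. A short sketch, assuming the same Name/Dept frames as above; `stacked` and `second` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Haroon', 'Moiz', 'Moin', 'Sami'],
                    'Dept': ['CS', 'IT', 'SE', 'IT']})
df2 = pd.DataFrame({'Name': ['Ali', 'Jawad'],
                    'Dept': ['HR', 'FINANCE']})

# Row-wise concat with one outer index label per source frame
stacked = pd.concat([df1, df2], keys=['Df1', 'Df2'])

# Select every row that came from the second frame
second = stacked.loc['Df2']

print(stacked.index.nlevels)
print(second)
```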
