# __Merge__

###  `pd.merge()` -- **SQL-like joins**

#### Parameters

| Parameter   | Default       | Meaning                                   |
| ----------- | ------------- | ----------------------------------------- |
| `left`      | (required)    | Left DataFrame                            |
| `right`     | (required)    | Right DataFrame                           |
| `on`        | `None`        | Column(s) to join on                      |
| `how`       | `'inner'`     | `'inner'`, `'left'`, `'right'`, `'outer'` |
| `left_on`   | `None`        | Left column if names differ               |
| `right_on`  | `None`        | Right column if names differ              |
| `suffixes`  | `('_x','_y')` | Suffixes added to overlapping column names |
| `indicator` | `False`       | Adds column showing source (`_merge`)     |


#### ***The join column should have the same name in both DataFrames (otherwise use `left_on` / `right_on`), and its values must overlap for rows to match***

In [None]:
import pandas as pd

In [None]:
df1 = pd.DataFrame({
    'cust_id': [1, 2, 3,5],
    'name': ['Ali', 'Sara', 'Usman','Haroon']
})

df2 = pd.DataFrame({
    'cust_id': [1, 2, 3,4],
    'order_id': [101, 102, 103,104]
})
pd.merge(df1,df2)
# pd.merge(df2,df1,on='cust_id')

# 'cust_id' is the only column name shared by both DataFrames,
# so pd.merge() uses it as the join key when `on` is omitted

### `on='Column to join on'` -- **Parameter, on which column to join**

In [None]:
pd.merge(df1,df2,on='cust_id')

### `how= 'inner/left/right/outer'` -- **Parameter, type of join**

In [None]:
pd.merge(df1,df2,how='inner')
pd.merge(df1,df2,how='left')
pd.merge(df1,df2,how='right')
pd.merge(df1,df2,how='outer')
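
In a notebook cell only the last expression is displayed, so the four calls above are easier to compare side by side. The sketch below rebuilds the same `df1` and `df2` and prints which `cust_id` values survive each join type; the `results` dictionary is just an illustrative helper, not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame({
    'cust_id': [1, 2, 3, 5],
    'name': ['Ali', 'Sara', 'Usman', 'Haroon']
})
df2 = pd.DataFrame({
    'cust_id': [1, 2, 3, 4],
    'order_id': [101, 102, 103, 104]
})

# Run every join type once and keep the results for comparison
results = {
    how: pd.merge(df1, df2, on='cust_id', how=how)
    for how in ('inner', 'left', 'right', 'outer')
}

for how, merged in results.items():
    print(how, sorted(merged['cust_id'].tolist()))
# inner [1, 2, 3]
# left [1, 2, 3, 5]
# right [1, 2, 3, 4]
# outer [1, 2, 3, 4, 5]
```

Keys that exist on only one side (`5` on the left, `4` on the right) get `NaN` in the columns that come from the other DataFrame.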

### `indicator= True/False` -- **Parameter adds column showing source `_merge`**

In [None]:
pd.merge(df1,df2,how='outer',indicator=True)
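
One common use of the `_merge` column is to find rows that failed to match. A minimal sketch, assuming the same `df1` and `df2` as above (the names `left_only` and `right_only` are illustrative variables):

```python
import pandas as pd

df1 = pd.DataFrame({
    'cust_id': [1, 2, 3, 5],
    'name': ['Ali', 'Sara', 'Usman', 'Haroon']
})
df2 = pd.DataFrame({
    'cust_id': [1, 2, 3, 4],
    'order_id': [101, 102, 103, 104]
})

merged = pd.merge(df1, df2, on='cust_id', how='outer', indicator=True)

# '_merge' holds 'left_only', 'right_only' or 'both' for every row
left_only = merged[merged['_merge'] == 'left_only']    # customers without an order
right_only = merged[merged['_merge'] == 'right_only']  # orders without a customer

print(left_only['cust_id'].tolist())   # [5]
print(right_only['cust_id'].tolist())  # [4]
```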


###  `left_index / right_index = True/False` -- **Parameters, join on matching row indexes**

* **`left_index`**: Tells Pandas to ignore columns and match using the **Index** (Row Labels) of the **Left** table.
* **`right_index`**: Tells Pandas to ignore columns and match using the **Index** (Row Labels) of the **Right** table.


In [None]:
df1 = pd.DataFrame({
    'cust_id': [1, 2, 3, 5],
    'order_id': [101, 102, 103,104]
})

df2 = pd.DataFrame({
    'cust_id': [1, 2, 3, 4],
    'order_id': [101, 102, 103,104]
})

In [None]:
# Join strictly based on their Index
pd.merge(df1,df2,left_index=True,right_index=True)
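
Because both DataFrames here have the same column names, an index join keeps all four columns and tells them apart with the default suffixes. The sketch below shows that, and also a mixed join where a column on one side is matched against the index of the other; the `names` DataFrame is a hypothetical lookup table added only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'cust_id': [1, 2, 3, 5], 'order_id': [101, 102, 103, 104]})
df2 = pd.DataFrame({'cust_id': [1, 2, 3, 4], 'order_id': [101, 102, 103, 104]})

# Index-on-index join: row 0 pairs with row 0, row 1 with row 1, ...
by_index = pd.merge(df1, df2, left_index=True, right_index=True)
print(by_index.columns.tolist())
# ['cust_id_x', 'order_id_x', 'cust_id_y', 'order_id_y']

# Hypothetical lookup table whose index holds the customer id
names = pd.DataFrame({'name': ['Ali', 'Sara', 'Usman', 'Haroon']},
                     index=[1, 2, 3, 5])

# Mixed join: a column on the left against the index on the right
mixed = pd.merge(df2, names, left_on='cust_id', right_index=True)
print(mixed[['cust_id', 'name']])
```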


### `left_on / right_on = 'column'`  -- **Parameters, join on key columns that have different names**
* **`left_on`**: Tells Pandas to look for the matching key in a **Column** of the **Left** table.
* **`right_on`**: Tells Pandas to look for the matching key in a **Column** of the **Right** table.

In [None]:
left = pd.DataFrame({
    'cust_id': [1, 2, 3, 5,6],
    'transaction_id': [101, 102, 103,104,105]
})

right = pd.DataFrame({
    'cust_id': [1, 2, 3, 4,7],
    'order_id': [101, 102, 103,104,106]
})

In [None]:
pd.merge(left,right,left_on='transaction_id',right_on='order_id')
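
To make the result of this call easier to read, the sketch below repeats it on the same `left` and `right` and inspects the output. Both key columns are kept, and since `cust_id` exists on both sides but is not the join key, it is split into `cust_id_x` and `cust_id_y`.

```python
import pandas as pd

left = pd.DataFrame({
    'cust_id': [1, 2, 3, 5, 6],
    'transaction_id': [101, 102, 103, 104, 105]
})
right = pd.DataFrame({
    'cust_id': [1, 2, 3, 4, 7],
    'order_id': [101, 102, 103, 104, 106]
})

merged = pd.merge(left, right, left_on='transaction_id', right_on='order_id')

print(merged.columns.tolist())
# ['cust_id_x', 'transaction_id', 'cust_id_y', 'order_id']

# Only ids 101-104 appear on both sides, so the inner join keeps 4 rows
print(len(merged))  # 4
```

Note that the row for id `104` pairs `cust_id_x = 5` with `cust_id_y = 4`: the join only compares the key columns and does not check that the other columns agree.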


In [None]:
pd.merge(left,right,left_index=True,right_index=True)





### ⚡ Cheat Sheet

| Parameter | What it tells Pandas |
| --- | --- |
| `on='ID'` | Both tables have a column named `'ID'`. Use it. |
| `left_on='ID', right_on='uid'` | Left uses `'ID'`, Right uses `'uid'`. Match them. |
| `left_index=True` | Don't look for a column; use the **Left Index** as the key. |
| `right_index=True` | Don't look for a column; use the **Right Index** as the key. |

**In short:**
* **`_on`** = Use a Column.
* **`_index`** = Use the Index.

In [None]:
left = pd.DataFrame({
    'cust_id': [1, 2, 3, 5,6],
    'order_id': [101, 102, 103,104,105]
})

right = pd.DataFrame({
    'cust_id': [1, 2, 3, 4,7],
    'order_id': [101, 102, 103,104,106]
})

### `suffixes=('_left', '_right')` -- **Parameter, suffixes appended to overlapping column names**


In [None]:
pd.merge(left,right,left_index=True,right_index=True, suffixes=('ID','Number'))
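
Suffixes are appended as-is, so they usually read better when they start with an underscore. A small sketch on the same `left` and `right`, joining on `cust_id`; the suffix values `'_left'` and `'_right'` are arbitrary choices for illustration.

```python
import pandas as pd

left = pd.DataFrame({
    'cust_id': [1, 2, 3, 5, 6],
    'order_id': [101, 102, 103, 104, 105]
})
right = pd.DataFrame({
    'cust_id': [1, 2, 3, 4, 7],
    'order_id': [101, 102, 103, 104, 106]
})

# 'cust_id' is the join key, so it appears once;
# 'order_id' overlaps and receives the suffixes
merged = pd.merge(left, right, on='cust_id', suffixes=('_left', '_right'))

print(merged.columns.tolist())
# ['cust_id', 'order_id_left', 'order_id_right']
```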


---
---
# **Concat**

###  `pd.concat()` -- **Stack or append data**

### 👉 Important Parameters

| Parameter      | Default   | Meaning                   |
| -------------- | --------- | ------------------------- |
| `objs`         | (required) | List of DataFrames        |
| `axis`         | `0`       | `0` → rows, `1` → columns |
| `ignore_index` | `False`   | Reset index               |
| `join`         | `'outer'` | `'outer'` or `'inner'`    |
| `keys`         | `None`    | Create hierarchical index |


In [None]:
ser1=pd.Series([1,2,3,4])
ser2=pd.Series([11,12,13,14])

In [None]:
pd.concat([ser1,ser2])
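
The plain call keeps each Series' original labels, so the result has duplicate index values. The sketch below, using the same `ser1` and `ser2`, shows how `ignore_index=True` from the parameter table replaces them with a fresh 0..n-1 index.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4])
ser2 = pd.Series([11, 12, 13, 14])

stacked = pd.concat([ser1, ser2])
print(stacked.index.tolist())      # [0, 1, 2, 3, 0, 1, 2, 3]

renumbered = pd.concat([ser1, ser2], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3, 4, 5, 6, 7]
```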

In [None]:
df1=pd.DataFrame({'Name':['Haroon','Moiz','Moin','Sami'],'Dept':['CS','IT','SE','IT']})
df2=pd.DataFrame({ 'Name':['Ali','Jawad'],'Dept':['HR','FINANCE']})

In [None]:
df1=pd.DataFrame({ 'Name':['Ali','Jawad']})
df2=pd.DataFrame({'Name':['Haroon','Moiz','Moin','Sami'],'Dept':['CS','IT','SE','IT']})

pd.concat([df1,df2])
pd.concat([df1,df2],axis=1)


In [None]:
df1=pd.DataFrame({ 'B':[1,2,3,4,5]})
df2=pd.DataFrame({'A':[11,22,33,44],'C':[55,66,77,88]})
pd.concat([df1,df2])
pd.concat([df1,df2],axis=1)


In [None]:
# Row Wise Concatenation
pd.concat([df1,df2])

### `axis = 0/1` 

In [None]:
# Column Wise Concatenation
pd.concat([df1,df2],axis=1)

### `join = 'inner/outer'`-- **Parameter**
**Outer: union (keeps every label found on the other axis, filling gaps with `NaN`)**

**Inner: intersection (keeps only the labels common to all objects)**


In [None]:
pd.concat([df1,df2],axis=1,join='inner')

In [None]:
pd.concat([df1,df2],axis=1,join='outer')
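
The difference between the two calls shows up in the row count. A minimal sketch with the same `df1` (5 rows) and `df2` (4 rows); the variable names `inner` and `outer` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'B': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [11, 22, 33, 44], 'C': [55, 66, 77, 88]})

# Inner: only index labels 0-3 exist in both frames
inner = pd.concat([df1, df2], axis=1, join='inner')
print(inner.shape)   # (4, 3)

# Outer: label 4 is kept, with NaN in the columns coming from df2
outer = pd.concat([df1, df2], axis=1, join='outer')
print(outer.shape)   # (5, 3)
print(outer.loc[4])
```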

### `keys=["labels"]` -- **Parameter, creates hierarchical labels**

In [None]:
pd.concat([df1,df2],axis=1,keys=['Df1','Df2'])


In [None]:
pd.concat([df1,df2],axis=0,keys=['Df1','Df2'])
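
The hierarchical labels are mainly useful for pulling the original pieces back out of the combined result. A short sketch on the same `df1` and `df2`, assuming the same key names `'Df1'` and `'Df2'` (the variables `stacked` and `wide` are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'B': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [11, 22, 33, 44], 'C': [55, 66, 77, 88]})

# axis=0: the keys become the outer level of the row index
stacked = pd.concat([df1, df2], axis=0, keys=['Df1', 'Df2'])
print(stacked.index.nlevels)   # 2
print(stacked.loc['Df2'])      # only the rows that came from df2

# axis=1: the keys become the outer level of the column index
wide = pd.concat([df1, df2], axis=1, keys=['Df1', 'Df2'])
print(wide.columns.tolist())
# [('Df1', 'B'), ('Df2', 'A'), ('Df2', 'C')]
print(wide['Df2'])             # only the columns that came from df2
```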


---
---