# Joining Data with Pandas

## Data Merging Basics

- ### Inner Join
Pandas is a powerful tool for manipulating and transforming data. An inner join returns only the rows whose key values appear in both tables.


Tables = DataFrames

Merging = Joining

```python
import pandas as pd

df_1 = pd.DataFrame({
    'id': ['1', '2', '3'],
    'name': ['Adham', 'Amy', 'Aya'],
    'age': [25, 26, 27]
})

# rename() returns a new DataFrame; df_1 itself is left unchanged
df_1.rename(columns={'name': 'godparents'})

df_2 = pd.DataFrame({
    'id': ['1', '2', '4'],
    'name': ['Ahmed', 'Ashraf', 'Ayman'],
    'hobby': ['football', 'basketball', 'sing']
})

# Displays the renamed copy; df_2 still has a 'name' column
df_2.rename(columns={'name': 'person_name'})
```

|   | id | person_name | hobby |
|---|----|-------------|-------|
| 0 | 1 | Ahmed | football |
| 1 | 2 | Ashraf | basketball |
| 2 | 4 | Ayman | sing |


```python
# Inner join (the default): only ids present in both tables are kept
df_merge = df_1.merge(df_2, on='id', suffixes=('_A', '_B'))
df_merge
```

|   | id | name_A | age | name_B | hobby |
|---|----|--------|-----|--------|-------|
| 0 | 1 | Adham | 25 | Ahmed | football |
| 1 | 2 | Amy | 26 | Ashraf | basketball |


- ### One-to-Many relationship

Note: we group the data by `columnA` and count the non-null values of `columnB` in each group.
```python
counted_df = data.groupby("columnA").agg({'columnB':'count'})
```
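A minimal sketch of how a one-to-many merge and the count above fit together. The `owners` and `pets` tables and their columns are made up for illustration; they are not part of the original notes.

```python
import pandas as pd

# Hypothetical data: one owner can have many pets (one-to-many)
owners = pd.DataFrame({"owner_id": [1, 2], "owner": ["Amy", "Aya"]})
pets = pd.DataFrame({
    "owner_id": [1, 1, 2],
    "pet": ["cat", "dog", "parrot"],
})

# Each owner row is repeated once per matching pet row
owner_pets = owners.merge(pets, on="owner_id")

# Count the pets per owner with the groupby/agg pattern
pet_counts = owner_pets.groupby("owner").agg({"pet": "count"})
print(pet_counts)
```

The merged table has three rows even though `owners` has only two, which is the typical sign of a one-to-many relationship.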

- #### Merging Multiple DataFrames
Pattern for joining more than two DataFrames:

```python
# Chain merge() calls; each call can use its own key column(s)
dataABC = dataA.merge(dataB, on=['col', 'col1']).merge(dataC, on='col')
```
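As a small illustration of the chained pattern, assuming three hypothetical tables (`sales`, `costs`, `regions`) that are not from the original notes:

```python
import pandas as pd

# Hypothetical tables: the first two share 'city' and 'year', the third only 'city'
sales = pd.DataFrame({"city": ["Cairo", "Giza"], "year": [2023, 2023], "sales": [100, 80]})
costs = pd.DataFrame({"city": ["Cairo", "Giza"], "year": [2023, 2023], "costs": [60, 50]})
regions = pd.DataFrame({"city": ["Cairo", "Giza"], "region": ["North", "North"]})

# First merge on two keys, then merge the result with the third table on one key
combined = (
    sales.merge(costs, on=["city", "year"])
         .merge(regions, on="city")
)
print(combined)
```

Wrapping the chain in parentheses keeps each `merge()` on its own line, which helps when more tables are added.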

#### Merging Tables With Different Join Types

- ##### Left join
Returns all rows from the left table and only the rows from the right table that match. Left rows without a match get `NaN` in the right table's columns.

```python
data_left_join = dataA.merge(dataB, on='colKey', how='left')
```
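A short sketch of a left join on small made-up tables (`people` and `hobbies` are illustrative names, loosely modeled on the inner join example earlier):

```python
import pandas as pd

# Hypothetical tables: id '3' exists only on the left, id '4' only on the right
people = pd.DataFrame({"id": ["1", "2", "3"], "age": [25, 26, 27]})
hobbies = pd.DataFrame({"id": ["1", "2", "4"], "hobby": ["football", "basketball", "sing"]})

# Keep every row of `people`; unmatched rows get NaN in the 'hobby' column
left_joined = people.merge(hobbies, on="id", how="left")
print(left_joined)
```

Here id `'3'` survives with a missing hobby, while id `'4'` is dropped because it never appears in the left table.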

- ##### Right join
Returns all rows from the right table and only the rows from the left table that match. Right rows without a match get `NaN` in the left table's columns.

```python
data_right_join = dataA.merge(dataB, on='colKey', how='right')
```
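One way to think about a right join is as a left join with the two tables swapped. The sketch below checks this on made-up data (`people` and `hobbies` are hypothetical); only the column order differs between the two results.

```python
import pandas as pd

# Hypothetical tables: id '3' exists only on the left, id '4' only on the right
people = pd.DataFrame({"id": ["1", "2", "3"], "age": [25, 26, 27]})
hobbies = pd.DataFrame({"id": ["1", "2", "4"], "hobby": ["football", "basketball", "sing"]})

# Keep every row of `hobbies`; unmatched rows get NaN in the 'age' column
right_joined = people.merge(hobbies, on="id", how="right")

# Same rows via a left join with the tables swapped
swapped = hobbies.merge(people, on="id", how="left")

# Align column order and row order before comparing
a = right_joined.sort_values("id").reset_index(drop=True)
b = swapped[right_joined.columns].sort_values("id").reset_index(drop=True)
print(a.equals(b))
```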

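- ##### Outer join

Returns all rows from both tables, matched where the key exists on both sides and filled with `NaN` elsewhere.

```python
data_outer_join = dataA.merge(dataB, on='col', how='outer', suffixes=('_A', '_B'))
```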
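A minimal outer join sketch on made-up tables that share a `name` column, so that the effect of `suffixes` is visible. The table names `a` and `b` are illustrative only.

```python
import pandas as pd

# Hypothetical tables with an overlapping non-key column 'name'
a = pd.DataFrame({"id": ["1", "2", "3"], "name": ["Adham", "Amy", "Aya"]})
b = pd.DataFrame({"id": ["1", "2", "4"], "name": ["Ahmed", "Ashraf", "Ayman"]})

# Keep all ids from both tables; the clashing 'name' columns get suffixes
outer = a.merge(b, on="id", how="outer", suffixes=("_A", "_B"))
print(outer)
```

The result has one row per distinct id. Id `'3'` has no `name_B` and id `'4'` has no `name_A`.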


- #### Merging a table to itself
Joining a table with itself (a self join) relates rows of the same table to each other.

```python
data_itself_join = data.merge(data, left_on=['colKey'], right_on=['colKey2'], suffixes=('_A', '_B'))
```
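A typical use of a self join is a hierarchy stored in one table. The sketch below assumes a hypothetical `employees` table in which `manager_id` points back at `emp_id`; the value `0` is an invented marker for "no manager".

```python
import pandas as pd

# Hypothetical table: manager_id refers to another row's emp_id
employees = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "name": ["Adham", "Amy", "Aya"],
    "manager_id": [0, 1, 1],  # 0 = assumed "no manager" marker
})

# Match each employee's manager_id against emp_id in the same table
with_manager = employees.merge(
    employees,
    left_on="manager_id",
    right_on="emp_id",
    suffixes=("_emp", "_mgr"),
)
print(with_manager[["name_emp", "name_mgr"]])
```

Because this is an inner join, Adham (who has no manager row to match) does not appear on the employee side of the result.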


## Advanced Merging and Concatenating

- ### Filtering joins

- ##### semi-join

**Semi-joins**
- Returns the intersection, similar to an inner join
- Returns only columns from the left table and not the right table
- No duplicate rows, even when a left row matches several right rows

```python
dataAB = dataA.merge(dataB, on='colKey')
data_semi_join = dataA[dataA['colKey'].isin(dataAB['colKey'])]
```
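The semi-join pattern can be tried on small made-up data. `customers` and `orders` are hypothetical tables; customer 1 has two orders, yet appears only once in the result.

```python
import pandas as pd

# Hypothetical tables: customer 1 has two orders, customer 2 has none
customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Adham", "Amy", "Aya"]})
orders = pd.DataFrame({"cust_id": [1, 1, 3], "amount": [10, 20, 30]})

# The inner join finds the matching keys (with duplicates for customer 1)
matched = customers.merge(orders, on="cust_id")

# Filter the left table by those keys: left columns only, no duplicated rows
semi = customers[customers["cust_id"].isin(matched["cust_id"])]
print(semi)
```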


- ##### anti-join

**Anti-joins**
- Returns the left table, excluding the intersection
- Returns only columns from the left table and not the right table

```python
dataAB = dataA.merge(dataB, on='colKey', how='left', indicator=True)
column_list = dataAB.loc[dataAB['_merge']=='left_only','colKey']
data_anti_join = dataA[dataA['colKey'].isin(column_list)]
```
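The anti-join pattern, sketched on the same kind of made-up data (`customers` and `orders` are hypothetical). It returns the customers that have no order at all.

```python
import pandas as pd

# Hypothetical tables: customer 2 has no orders
customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Adham", "Amy", "Aya"]})
orders = pd.DataFrame({"cust_id": [1, 1, 3], "amount": [10, 20, 30]})

# indicator=True adds a '_merge' column saying where each row came from
merged = customers.merge(orders, on="cust_id", how="left", indicator=True)

# Keys that exist only in the left table
left_only_keys = merged.loc[merged["_merge"] == "left_only", "cust_id"]

# Filter the left table by those keys
anti = customers[customers["cust_id"].isin(left_only_keys)]
print(anti)
```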


## Merging Ordered and Time-Series Data

- ### Using merge_ordered()

```python
pd.merge_ordered(aapl, mcd, on='date', suffixes=('_aapl', '_mcd'), fill_method='ffill', how='left')
```
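A small sketch of what `fill_method='ffill'` does, using invented price data for `aapl` and `mcd`. For simplicity it relies on the default `how='outer'`, which gives the same rows as a left join here because every `mcd` date also exists in `aapl`.

```python
import pandas as pd

# Hypothetical prices: mcd has no row for 2024-01-02
aapl = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "close": [10.0, 11.0, 12.0],
})
mcd = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-03"]),
    "close": [20.0, 21.0],
})

# Result is sorted by 'date'; gaps are filled with the previous row's value
merged = pd.merge_ordered(
    aapl, mcd, on="date", suffixes=("_aapl", "_mcd"), fill_method="ffill"
)
print(merged)
```

Without `fill_method`, `close_mcd` would be `NaN` on 2024-01-02; with forward fill it carries over the value from 2024-01-01.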

- ### Using merge_asof()

```python
pd.merge_asof(visa, ibm, on='date_time', suffixes=('_visa', '_ibm'), direction='forward')
```
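To see how `direction` changes the match, the sketch below runs `merge_asof()` on invented timestamps for `visa` and `ibm`. Both tables must already be sorted by the key column.

```python
import pandas as pd

# Hypothetical quotes with timestamps that never match exactly
visa = pd.DataFrame({
    "date_time": pd.to_datetime(["2024-01-01 09:00", "2024-01-01 09:05"]),
    "close": [100.0, 101.0],
})
ibm = pd.DataFrame({
    "date_time": pd.to_datetime(["2024-01-01 09:01", "2024-01-01 09:07"]),
    "close": [150.0, 151.0],
})

# forward: take the first ibm row at or after each visa timestamp
forward = pd.merge_asof(
    visa, ibm, on="date_time", suffixes=("_visa", "_ibm"), direction="forward"
)

# backward (the default): take the last ibm row at or before each visa timestamp
backward = pd.merge_asof(
    visa, ibm, on="date_time", suffixes=("_visa", "_ibm"), direction="backward"
)
print(forward)
print(backward)
```

With `direction='backward'`, the 09:00 row has no earlier `ibm` quote, so its `close_ibm` is `NaN`.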