# ADVANCED PANDAS: DATA JOINING & MERGING

## Course Outline:
- Introduction to Data Wrangling
    - Case Study: Data Preprocessing for Absolute Beginners
- Data Cleaning & Preparation
    - Data Cleaning (Missing & Duplicated Data)
    - String Manipulation (Regular Expression)
    - Data Transformation
- ***Merging, Joining, and Concatenating Data***
    - ***concat()***
    - ***merge()***
    - ***join()***
- Aggregation and Grouping
    - groupby()
- Reshaping and Pivoting
    - pivot()
    - pivot_table()
    - crosstab()

##### Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()

In [None]:
men04 = pd.read_csv('data/men2004.csv')

In [None]:
men08 = pd.read_csv('data/men2008.csv')

In [None]:
men04_det = pd.read_csv('data/men2004_det.csv')

In [None]:
men08_det = pd.read_csv('data/men2008_det.csv')

==========

# Merging, Joining, and Concatenating (Multiple DataFrames)

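##### Adding Rows to a DataFrame Using concat() & append()
Concatenation acts like glue: it stacks DataFrames along an axis (rows by default) without comparing their contents. `DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0, so `pd.concat()` is the supported way to add rows.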

In [None]:
men04.head()

In [None]:
men08.head()

In [None]:
# append() was removed in pandas 2.0; pd.concat() with ignore_index=True gives the same result
pd.concat([men04, men08], ignore_index=True)

In [None]:
# Concatenate with concat(); keys= labels each source frame in an outer index level named 'Year'
men_concat = pd.concat([men04, men08], keys=[2004, 2008], names=['Year'])
# Optional: flatten the index with .reset_index().drop(columns='level_1')
men_concat
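
The effect of `keys=` is easier to see on a tiny example. The sketch below uses two made-up frames (`df_2004` and `df_2008` are hypothetical stand-ins, since the real CSV contents are not shown here) and illustrates one possible way to select rows by source and to turn the `Year` level into an ordinary column.

```python
import pandas as pd

# Hypothetical stand-ins for men04 / men08; the values are invented for illustration
df_2004 = pd.DataFrame({"Athlete": ["A", "B"], "Medals": [2, 1]})
df_2008 = pd.DataFrame({"Athlete": ["B", "C"], "Medals": [3, 1]})

# keys= adds an outer index level recording which frame each row came from
stacked = pd.concat([df_2004, df_2008], keys=[2004, 2008], names=["Year"])

# Select every row that came from the second frame via the outer level
only_2008 = stacked.loc[2008]

# Move the 'Year' level into a regular column and discard the old row numbers
flat = stacked.reset_index(level="Year").reset_index(drop=True)
print(flat)
```

Keeping the year in the index is convenient for selection, while the flattened form is usually easier to group or plot.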

##### Merging Two DataFrames with merge()
merge() combines DataFrames based on their content (matching values in one or more key columns) rather than on their position; that is the big difference from concat(). It works much like a SQL JOIN.
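
As a minimal sketch of that idea, assuming two small invented frames (`left` and `right` are hypothetical, not the course data), a default merge keeps only the rows whose key appears in both frames, much like a SQL `INNER JOIN ... ON left.Athlete = right.Athlete`.

```python
import pandas as pd

# Hypothetical toy frames; the athlete names and medal counts are invented
left = pd.DataFrame({"Athlete": ["A", "B", "C"], "Medals": [2, 1, 4]})
right = pd.DataFrame({"Athlete": ["B", "C", "D"], "Medals": [3, 1, 2]})

# how="inner" is the default: only keys present in both frames survive
matched = left.merge(right, on="Athlete", suffixes=("_2004", "_2008"))
print(matched)
```

The `suffixes` argument disambiguates the two `Medals` columns, which would otherwise be named `Medals_x` and `Medals_y`.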

In [None]:
from IPython.display import Image
Image("data/merge.png")

In [None]:
men04.head()

In [None]:
men04.shape

In [None]:
men08.head()

In [None]:
men08.shape

In [None]:
# Inspect the concatenated DataFrame to check its shape
men_concat.shape

In [None]:
# Count the unique athletes in the concatenated DataFrame
men_concat['Athlete'].nunique()

#### OUTER JOIN

In [None]:
from IPython.display import Image
Image("data/join05.png")

In [None]:
# Merge the two DataFrames with an outer join; indicator=True adds a '_merge' column showing each row's source
men_outer = men04.merge(men08, how='outer', on='Athlete', suffixes=['_2004', '_2008'], indicator=True)#.fillna(0)
men_outer

In [None]:
men_outer['_merge'].value_counts()

In [None]:
# Outer join without the intersection
men_outer[men_outer['_merge'] != 'both'].reset_index(drop=True)

In [None]:
# Left join without the intersection
men_outer[men_outer['_merge'] == 'left_only'].reset_index(drop=True)

In [None]:
# Right join without the intersection
men_outer[men_outer['_merge'] == 'right_only'].reset_index(drop=True)
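
The `_merge` filters above can be tried on a small example. This is a sketch on invented data (`left` and `right` are hypothetical frames), showing how the indicator column separates rows found in both frames from rows found in only one.

```python
import pandas as pd

# Hypothetical toy frames; values are invented for illustration
left = pd.DataFrame({"Athlete": ["A", "B", "C"], "Medals": [2, 1, 4]})
right = pd.DataFrame({"Athlete": ["B", "C", "D"], "Medals": [3, 1, 2]})

outer = left.merge(right, how="outer", on="Athlete",
                   suffixes=("_2004", "_2008"), indicator=True)

# '_merge' takes the values 'left_only', 'right_only', or 'both'
print(outer["_merge"].value_counts())

# Athletes present in only one of the two frames
left_only = outer.loc[outer["_merge"] == "left_only", "Athlete"].tolist()
right_only = outer.loc[outer["_merge"] == "right_only", "Athlete"].tolist()
```

Medal columns from the side that had no match are filled with NaN, which is why filtering on `_merge` is more reliable than checking for missing values.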

In [None]:
# Rename the columns of a copy so that the key column is named differently in the two DataFrames
men2004 = men04.copy()
men2004.columns = ['Name', 'Medals']
men2004.head()

In [None]:
men08.head()

In [None]:
# Merging the two DataFrames when the key columns have different names
men_left_right = men2004.merge(men08, how='outer', left_on='Name', right_on='Athlete', suffixes=['_2004','_2008'], indicator=True)
men_left_right

In [None]:
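# Fill missing names from 'Athlete'; assign back rather than using inplace=True on a column selection
men_left_right['Name'] = men_left_right['Name'].fillna(men_left_right['Athlete'])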

In [None]:
men_left_right

In [None]:
men_left_right.drop(['Athlete','_merge'], axis=1, inplace=True)

In [None]:
men_left_right
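
The same steps can be followed end to end on a tiny example. The sketch below assumes two invented frames whose key columns are named `Name` and `Athlete`; it is one possible way to coalesce the two key columns after an outer merge, not the only one.

```python
import pandas as pd

# Hypothetical toy frames with differently named key columns
left = pd.DataFrame({"Name": ["A", "B"], "Medals": [2, 1]})
right = pd.DataFrame({"Athlete": ["B", "C"], "Medals": [3, 1]})

merged = left.merge(right, how="outer", left_on="Name", right_on="Athlete",
                    suffixes=("_2004", "_2008"))

# Both key columns are kept; each is NaN where its own side had no match,
# so fill 'Name' from 'Athlete' and then drop the redundant column
merged["Name"] = merged["Name"].fillna(merged["Athlete"])
merged = merged.drop(columns="Athlete")
print(merged)
```

An alternative is to rename the key column on one side before merging, which allows a plain `on=` merge and avoids the duplicate key column altogether.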

#### INNER JOIN

In [None]:
from IPython.display import Image
Image("data/join04.png")

In [None]:
men_inner = men04.merge(men08, how='inner', on='Athlete', suffixes=['_2004', '_2008'], indicator=True)
men_inner

#### LEFT JOIN

In [None]:
from IPython.display import Image
Image("data/join02.png")

In [None]:
men_left = men04.merge(men08, how='left', on='Athlete', suffixes=['_2004', '_2008'], indicator=True)
men_left.head()

#### RIGHT JOIN

In [None]:
from IPython.display import Image
Image("data/join03.png")

In [None]:
men_right = men04.merge(men08, how='right', on='Athlete', suffixes=['_2004', '_2008'], indicator=True)
men_right.head()
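
To compare the four join types side by side, the following sketch counts the rows each one returns on a pair of invented frames (`left` and `right` are hypothetical; each has three athletes, two of them shared).

```python
import pandas as pd

# Hypothetical toy frames: A is only on the left, D only on the right
left = pd.DataFrame({"Athlete": ["A", "B", "C"], "Medals": [2, 1, 4]})
right = pd.DataFrame({"Athlete": ["B", "C", "D"], "Medals": [3, 1, 2]})

# Row count of the result for each value of how=
sizes = {
    how: len(left.merge(right, how=how, on="Athlete"))
    for how in ("inner", "left", "right", "outer")
}
print(sizes)  # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}
```

Inner keeps the shared keys, left and right keep every key from one side, and outer keeps the union of keys.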

##### Merging on more than one column

In [None]:
men04_det

In [None]:
men08_det

In [None]:
men_det = men04_det.merge(men08_det, how='inner', on=['Athlete','Medal'], suffixes=['_2004','_2008'])
men_det

In [None]:
# Now you can use the merged DataFrame to get a summary table
# (the string 'sum' avoids the FutureWarning that newer pandas versions raise for aggfunc=np.sum)
men_det.pivot_table(index='Medal', aggfunc='sum', margins=True)
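
A merge on several columns only matches rows where every key agrees. The sketch below assumes a hypothetical layout with one row per athlete and medal type plus a `Count` column; the actual columns of the detail files may differ.

```python
import pandas as pd

# Hypothetical detail frames; the 'Count' column name and all values are invented
left = pd.DataFrame({"Athlete": ["A", "A", "B"],
                     "Medal": ["Gold", "Silver", "Gold"],
                     "Count": [1, 2, 1]})
right = pd.DataFrame({"Athlete": ["A", "A", "B"],
                      "Medal": ["Gold", "Silver", "Bronze"],
                      "Count": [2, 1, 1]})

# Rows match only when both 'Athlete' and 'Medal' are equal
both = left.merge(right, how="inner", on=["Athlete", "Medal"],
                  suffixes=("_2004", "_2008"))

# Summarise per medal type; margins=True appends an 'All' totals row
summary = both.pivot_table(index="Medal",
                           values=["Count_2004", "Count_2008"],
                           aggfunc="sum", margins=True)
print(summary)
```

Athlete B appears in both frames but with different medal types, so no row for B survives the inner merge.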

##### Merging Two DataFrames with join()

In [None]:
men04.head()

In [None]:
men08.head()

In [None]:
# join() aligns on the index by default, so use 'Athlete' as the index to match rows by athlete
men04.set_index('Athlete').join(men08.set_index('Athlete'), how='outer', lsuffix='_2004', rsuffix='_2008')
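
Unlike merge(), join() matches rows by index label unless told otherwise. The sketch below, on two invented frames, contrasts joining on the default integer index with joining after the key column has been moved into the index.

```python
import pandas as pd

# Hypothetical toy frames; values are invented for illustration
left = pd.DataFrame({"Athlete": ["A", "B"], "Medals": [2, 1]})
right = pd.DataFrame({"Athlete": ["B", "C"], "Medals": [3, 1]})

# Default behaviour: rows are paired by row number (0 with 0, 1 with 1),
# so athlete A ends up next to athlete B
by_position = left.join(right, lsuffix="_2004", rsuffix="_2008")

# With 'Athlete' as the index, rows are paired by athlete name instead
by_name = left.set_index("Athlete").join(
    right.set_index("Athlete"), how="outer",
    lsuffix="_2004", rsuffix="_2008")
print(by_name)
```

The second form gives the same pairing as an outer merge on `Athlete`, with the athlete names kept in the index rather than in a column.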

==========

# THANK YOU!