# <center> Pandas Merging & Joining Data </center>

- [Simple Joining with Concat Function](#section_1)
- [Complex Joining with Merge Function](#section_2)

<hr>

### Pandas Merging & Joining Data <a class="anchor" id="section_0"></a>

In this lesson, we will learn the two most common ways to combine DataFrames in the Pandas library:

* **pd.concat([DataFrame1, DataFrame2]): Simple stacking of two or more Pandas DataFrames, either row-wise or column-wise.**

* **pd.merge(DataFrame1, DataFrame2): SQL-like joining of two Pandas DataFrames on one or more key columns.**

In [1]:
import pandas as pd

In [4]:
# FIFA World Cup Winning Teams
df_fifa_world_cup_winners = pd.DataFrame({'year':[2018,2014,2010,2006,2002,1998],
                                         'winner':['France','Germany','Spain','Italy','Brazil','France'],
                                         'host_country':['Russia','Brazil','South Africa','Germany','South Korea','France']})
# Display DataFrame
df_fifa_world_cup_winners

Unnamed: 0,year,winner,host_country
0,2018,France,Russia
1,2014,Germany,Brazil
2,2010,Spain,South Africa
3,2006,Italy,Germany
4,2002,Brazil,South Korea
5,1998,France,France


In [9]:
# Rugby World Cup Winning Teams
df_rugby_world_cup_winners = pd.DataFrame({'year':[1999,2003,2007,2011,2015,2019],
                                          'winner':['Australia','England','South Africa','New Zealand','New Zealand','South Africa'],
                                          'host_country':['Wales','Australia','France','New Zealand','England','Japan'],
                                          'venue':['Millennium Stadium','Telstra Stadium','Stade de France','Eden Park', 'Twickenham','Nissan Stadium'],
                                          'attendance':[72500,82957,80430,61079,80125,70103]})
# Display DataFrame
df_rugby_world_cup_winners

Unnamed: 0,year,winner,host_country,venue,attendance
0,1999,Australia,Wales,Millennium Stadium,72500
1,2003,England,Australia,Telstra Stadium,82957
2,2007,South Africa,France,Stade de France,80430
3,2011,New Zealand,New Zealand,Eden Park,61079
4,2015,New Zealand,England,Twickenham,80125
5,2019,South Africa,Japan,Nissan Stadium,70103


### Simple Joining with Concat Function <a class="anchor" id="section_1"></a>

The `concat()` function joins two or more DataFrames along an axis: it stacks rows by default (`axis = 0`) and places the frames side by side with `axis = 1`.

In [12]:
# Stack the 2 DataFrames row-wise using the concat() function
df_teams = pd.concat([df_fifa_world_cup_winners[['year', 'winner', 'host_country']],
                     df_rugby_world_cup_winners[['year', 'winner', 'host_country']]])

# Display the DataFrame
df_teams

Unnamed: 0,year,winner,host_country
0,2018,France,Russia
1,2014,Germany,Brazil
2,2010,Spain,South Africa
3,2006,Italy,Germany
4,2002,Brazil,South Korea
5,1998,France,France
0,1999,Australia,Wales
1,2003,England,Australia
2,2007,South Africa,France
3,2011,New Zealand,New Zealand
4,2015,New Zealand,England
5,2019,South Africa,Japan


In [15]:
# Label each row with its source DataFrame using the keys argument
df_teams = pd.concat([df_fifa_world_cup_winners[['year','winner','host_country']],
                     df_rugby_world_cup_winners[['year','winner','host_country']]],
                    keys = ['soccer','rugby'])

# Display the DataFrame
df_teams

Unnamed: 0,Unnamed: 1,year,winner,host_country
soccer,0,2018,France,Russia
soccer,1,2014,Germany,Brazil
soccer,2,2010,Spain,South Africa
soccer,3,2006,Italy,Germany
soccer,4,2002,Brazil,South Korea
soccer,5,1998,France,France
rugby,0,1999,Australia,Wales
rugby,1,2003,England,Australia
rugby,2,2007,South Africa,France
rugby,3,2011,New Zealand,New Zealand
rugby,4,2015,New Zealand,England
rugby,5,2019,South Africa,Japan

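The `keys` argument builds a two-level index, so each source can be pulled back out by its label. The following is a minimal sketch of that idea, using small stand-in frames (`df_soccer`, `df_rugby`, and `df_both` are illustrative names, not part of the lesson's data):

```python
import pandas as pd

# Small stand-in frames; names and values are illustrative only
df_soccer = pd.DataFrame({'year': [2018, 2014], 'winner': ['France', 'Germany']})
df_rugby = pd.DataFrame({'year': [2019, 2015], 'winner': ['South Africa', 'New Zealand']})

# keys adds an outer index level that records each row's source
df_both = pd.concat([df_soccer, df_rugby], keys=['soccer', 'rugby'])

# Select every row that came from the rugby frame
rugby_rows = df_both.loc['rugby']

# Select a single row by its (source, original index) pair
first_soccer = df_both.loc[('soccer', 0)]

print(rugby_rows)
print(first_soccer['winner'])
```

Selecting on the outer level returns a regular DataFrame indexed by the original row labels, which is handy when the combined frame later needs to be split again.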

In [18]:
# Ignore old index values in the new DataFrame
df_teams = pd.concat([df_fifa_world_cup_winners[['year','winner','host_country']],
                     df_rugby_world_cup_winners[['year','winner','host_country']]],
                    ignore_index = True)

# Display the DataFrame
df_teams

Unnamed: 0,year,winner,host_country
0,2018,France,Russia
1,2014,Germany,Brazil
2,2010,Spain,South Africa
3,2006,Italy,Germany
4,2002,Brazil,South Korea
5,1998,France,France
6,1999,Australia,Wales
7,2003,England,Australia
8,2007,South Africa,France
9,2011,New Zealand,New Zealand
10,2015,New Zealand,England
11,2019,South Africa,Japan


In [21]:
# The new DataFrame will include all original columns
df_teams = pd.concat([df_fifa_world_cup_winners,
                     df_rugby_world_cup_winners])

# Display the DataFrame
df_teams

Unnamed: 0,year,winner,host_country,venue,attendance
0,2018,France,Russia,,
1,2014,Germany,Brazil,,
2,2010,Spain,South Africa,,
3,2006,Italy,Germany,,
4,2002,Brazil,South Korea,,
5,1998,France,France,,
0,1999,Australia,Wales,Millennium Stadium,72500.0
1,2003,England,Australia,Telstra Stadium,82957.0
2,2007,South Africa,France,Stade de France,80430.0
3,2011,New Zealand,New Zealand,Eden Park,61079.0
4,2015,New Zealand,England,Twickenham,80125.0
5,2019,South Africa,Japan,Nissan Stadium,70103.0

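When the inputs do not share all columns, `concat()` keeps the union of columns by default and fills the gaps with `NaN`. Its `join` parameter can restrict the result to the shared columns instead. A minimal sketch, assuming two tiny stand-in frames (`df_a` and `df_b` are hypothetical names):

```python
import pandas as pd

# Stand-in frames: df_b has one extra column
df_a = pd.DataFrame({'year': [2018], 'winner': ['France']})
df_b = pd.DataFrame({'year': [2019], 'winner': ['South Africa'],
                     'attendance': [70103]})

# Default join='outer': union of columns, missing values become NaN
df_outer = pd.concat([df_a, df_b], ignore_index=True)

# join='inner': only the columns present in every input are kept
df_inner = pd.concat([df_a, df_b], join='inner', ignore_index=True)

print(df_outer.columns.tolist())
print(df_inner.columns.tolist())
```

Because `NaN` is a floating-point value, an integer column such as `attendance` is converted to float in the outer result.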

In [23]:
# The new DataFrame will include all original columns aligned horizontally
df_teams = pd.concat([df_fifa_world_cup_winners,
                     df_rugby_world_cup_winners], axis = 1)

# Display the DataFrame
df_teams

Unnamed: 0,year,winner,host_country,year,winner,host_country,venue,attendance
0,2018,France,Russia,1999,Australia,Wales,Millennium Stadium,72500
1,2014,Germany,Brazil,2003,England,Australia,Telstra Stadium,82957
2,2010,Spain,South Africa,2007,South Africa,France,Stade de France,80430
3,2006,Italy,Germany,2011,New Zealand,New Zealand,Eden Park,61079
4,2002,Brazil,South Korea,2015,New Zealand,England,Twickenham,80125
5,1998,France,France,2019,South Africa,Japan,Nissan Stadium,70103

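With `axis = 1`, rows are paired by index label rather than by position, which only looks positional above because both frames share the default index 0 to 5. A small sketch with deliberately mismatched indexes (the frames `left` and `right` are illustrative, not from the lesson):

```python
import pandas as pd

# Two frames whose index labels only partly overlap
left = pd.DataFrame({'winner': ['France', 'Germany']}, index=[0, 1])
right = pd.DataFrame({'venue': ['Eden Park', 'Twickenham']}, index=[1, 2])

# axis=1 aligns rows by index label; unmatched labels are filled with NaN
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side)

# Resetting both indexes first pairs the rows by position instead
by_position = pd.concat([left.reset_index(drop=True),
                         right.reset_index(drop=True)], axis=1)
print(by_position)
```

If the goal is to match rows on a shared column such as `year`, `merge()` is usually the better fit than a column-wise `concat()`.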

### Complex Joining with Merge Function <a class="anchor" id="section_2"></a>

The Pandas `merge()` function joins DataFrame or named Series objects on shared key columns or indexes, much like a join in a relational database.
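
The `how` parameter selects the join type, mirroring SQL's inner, left, right, and full outer joins. The sketch below counts the rows each type produces for two small stand-in frames (`left`, `right`, and `row_counts` are illustrative names):

```python
import pandas as pd

# K0 and K1 appear in both frames; K4 only on the left, K6 only on the right
left = pd.DataFrame({'key': ['K0', 'K1', 'K4'], 'a': [1, 2, 3]})
right = pd.DataFrame({'key': ['K0', 'K1', 'K6'], 'b': [10, 20, 30]})

# Record the number of result rows for each join type
row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    merged = pd.merge(left, right, on='key', how=how)
    row_counts[how] = len(merged)

print(row_counts)
```

With these frames the inner join keeps the 2 shared keys, the left and right joins keep 3 rows each, and the outer join keeps all 4 distinct keys.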

In [24]:
# Create left DataFrame
df_left = pd.DataFrame({
    'key':['K0','K1','K2','K3','K4','K5'],
    'column_A':['A0','A1','A2','A3','A4','A5'],
    'column_B':['B0','B1','B2','B3','B4','B5']})

# Create right DataFrame
df_right = pd.DataFrame({
    'key':['K0','K1','K2','K3','K6'],
    'column_C':['C0','C1','C2','C3','C6'],
    'column_D':['D0','D1','D2','D3','D6']})

In [29]:
# Display DataFrame
df_right

Unnamed: 0,key,column_C,column_D
0,K0,C0,D0
1,K1,C1,D1
2,K2,C2,D2
3,K3,C3,D3
4,K6,C6,D6


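In [9]:
# Display DataFrame
df_left

Unnamed: 0,key,column_A,column_B
0,K0,A0,B0
1,K1,A1,B1
2,K2,A2,B2
3,K3,A3,B3
4,K4,A4,B4
5,K5,A5,B5
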

In [35]:
# Merge the two DataFrames using the common column
df_results = pd.merge(df_left, df_right, on = 'key',
                     how = 'outer', indicator = True)
# Display the DataFrame
df_results

Unnamed: 0,key,column_A,column_B,column_C,column_D,_merge
0,K0,A0,B0,C0,D0,both
1,K1,A1,B1,C1,D1,both
2,K2,A2,B2,C2,D2,both
3,K3,A3,B3,C3,D3,both
4,K4,A4,B4,,,left_only
5,K5,A5,B5,,,left_only
6,K6,,,C6,D6,right_only


In [3]:
# Create departments dataset
df_departments = pd.DataFrame(
    {'department_id':['D1','D2','D3','D4'],
    'department_name':['IT','SALES','HR','R&D'],
    'department_location':['location_1','location_1','location_2','location_2']})

# Create employees dataset
df_employees = pd.DataFrame(
    {'employee_name':['Michael','Alice','Max','Janet','Ali'],
    'salary':[2500,2000,1500,1000,500],
    'department_id':['D1','D1','D2','D3','D6']})

In [6]:
# Display DataFrame
df_employees

Unnamed: 0,employee_name,salary,department_id
0,Michael,2500,D1
1,Alice,2000,D1
2,Max,1500,D2
3,Janet,1000,D3
4,Ali,500,D6


In [None]:
# Display DataFrame
df_departments

Unnamed: 0,department_id,department_name,department_location
0,D1,IT,location_1
1,D2,SALES,location_1
2,D3,HR,location_2
3,D4,R&D,location_2


In [16]:
# Merge the two DataFrames using the common column
df_results = pd.merge(df_employees, df_departments,
                     on = 'department_id', how = 'outer',
                     indicator = True)

# Display the DataFrame
df_results

Unnamed: 0,employee_name,salary,department_id,department_name,department_location,_merge
0,Michael,2500.0,D1,IT,location_1,both
1,Alice,2000.0,D1,IT,location_1,both
2,Max,1500.0,D2,SALES,location_1,both
3,Janet,1000.0,D3,HR,location_2,both
4,Ali,500.0,D6,,,left_only
5,,,D4,R&D,location_2,right_only

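The `_merge` column produced by `indicator = True` can be filtered to list the rows that found no partner, which is a common way to check data quality after a join. A minimal sketch, using cut-down copies of the employees and departments data (the variable names `merged`, `unmatched_employees`, and `empty_departments` are illustrative):

```python
import pandas as pd

# Cut-down copies of the lesson's data, for illustration only
df_employees = pd.DataFrame({'employee_name': ['Michael', 'Ali'],
                             'department_id': ['D1', 'D6']})
df_departments = pd.DataFrame({'department_id': ['D1', 'D4'],
                               'department_name': ['IT', 'R&D']})

# Outer merge with an indicator column recording each row's origin
merged = pd.merge(df_employees, df_departments,
                  on='department_id', how='outer', indicator=True)

# Employees whose department_id has no match in df_departments
unmatched_employees = merged.loc[merged['_merge'] == 'left_only',
                                 'employee_name'].tolist()

# Departments that have no employees
empty_departments = merged.loc[merged['_merge'] == 'right_only',
                               'department_name'].tolist()

print(unmatched_employees)
print(empty_departments)
```

The row order of an outer merge can differ between pandas versions, so filtering on `_merge` is more reliable than relying on row position.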