# Merging DataFrames

In [None]:
import numpy as np
import pandas as pd

## Our Dataset
- Our datasets are spread across multiple files in this section. Each file has a `restaurant_` prefix.
- The `customers.csv` file stores our restaurant's customers.
- The `foods.csv` file stores our restaurant's menu items.
- The `week_1_sales.csv` and `week_2_sales.csv` files store our orders.

In [None]:
food = pd.read_csv('restaurant_foods.csv')
customers = pd.read_csv('restaurant_customers.csv')
week1 = pd.read_csv('restaurant_week_1_sales.csv')
week2 = pd.read_csv('restaurant_week_2_sales.csv')

## The pd.concat Function I
- The `concat` function concatenates one **DataFrame** to the end of another.
- The original index labels will be kept by default. Set `ignore_index` to `True` to generate a new index.
- The `keys` parameter creates a **MultiIndex** using the specified keys/labels.

In [None]:
pd.concat([week1, week2], ignore_index=False)

In [None]:
pd.concat([week1, week2], ignore_index=True)

In [None]:
pd.concat([week1, week2], keys=['First week', 'Second week'])
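The three behaviors above can be sketched on a pair of tiny frames. The `first` and `second` DataFrames below are hypothetical stand-ins for the weekly sales files, not the real data.

```python
import pandas as pd

# Hypothetical stand-ins for two weekly sales files
first = pd.DataFrame({"Customer ID": [1, 2], "Food ID": [9, 4]})
second = pd.DataFrame({"Customer ID": [3, 4], "Food ID": [1, 2]})

# Default: original labels are kept, so the labels 0 and 1 each appear twice
kept = pd.concat([first, second])

# ignore_index=True: a fresh 0..3 index replaces the original labels
fresh = pd.concat([first, second], ignore_index=True)

# keys: the outer level of a MultiIndex records which frame each row came from
labeled = pd.concat([first, second], keys=["First week", "Second week"])

# Select only the rows that originated in the second frame
second_rows = labeled.loc["Second week"]
```

The outer index level makes it possible to recover each original frame after concatenation, which is lost when `ignore_index=True` is used.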

## The pd.concat Function II
- Pandas will concatenate the **DataFrames** along the row/index axis.
- Pandas will include all columns that exist in either **DataFrame**. If there are no matching values, pandas will use `NaN` values.
- We can pass the `axis` parameter an argument of `"columns"` to concatenate on the column axis.

In [None]:
df1 = pd.DataFrame(data=[1, 2, 3], columns=['A'])
df1

In [None]:
df2 = pd.DataFrame(data=[4, 5, 6], columns=['B'])
df2

In [None]:
pd.concat([df1, df2])

In [None]:
pd.concat([df1, df2], axis='index')

In [None]:
pd.concat([df1, df2], axis='columns')
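The `df1`/`df2` example shares no columns at all. As a complementary sketch, the hypothetical `left` and `right` frames below share only column `B`, which shows how pandas fills the gaps with `NaN` on the row axis and aligns on index labels on the column axis.

```python
import pandas as pd

# Hypothetical frames that share only column "B"
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# Row axis: the result has columns A, B and C;
# A is NaN for rows from `right`, C is NaN for rows from `left`
stacked = pd.concat([left, right], ignore_index=True)

# Column axis: frames are placed side by side, aligned on index labels 0 and 1;
# the shared column name "B" appears twice
side = pd.concat([left, right], axis="columns")
```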

## Left Joins
- The `merge` method joins two **DataFrames** together based on shared values in a column or an index.
- A left join merges one **DataFrame** into another based on values in the first one.
- The "left" **DataFrame** is the one we invoke the `merge` method on.
- If the left **DataFrame's** value is not found in the right **DataFrame**, the row will hold `NaN` values.
<img src="SQL_Joins.png" width="800" height="800"/>

In [None]:
week1.merge(right=food, how='left', on='Food ID')
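To see the `NaN` behavior in isolation, here is a minimal sketch with made-up data. The `orders` and `menu` frames are hypothetical, and the Food ID `99` is deliberately missing from the menu.

```python
import pandas as pd

# Hypothetical orders; Food ID 99 does not exist on the menu
orders = pd.DataFrame({"Customer ID": [10, 11, 12], "Food ID": [1, 2, 99]})
menu = pd.DataFrame({"Food ID": [1, 2, 3],
                     "Food Item": ["Sushi", "Burrito", "Taco"]})

# Every row of the left frame (orders) is kept.
# The unmatched order gets NaN in the columns that come from the menu,
# and the menu item nobody ordered (Taco) does not appear at all.
result = orders.merge(right=menu, how="left", on="Food ID")
```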

## The left_on and right_on Parameters
- The `left_on` and `right_on` parameters designate the column names from each **DataFrame** to use in the merge.

In [None]:
week1

In [None]:
customers

In [None]:
week1.merge(right=customers, how='left', left_on='Customer ID', right_on='ID')

In [None]:
# axis='columns' and axis=1 are equivalent; both drop the redundant 'ID' column
week1.merge(right=customers, how='left', left_on='Customer ID', right_on='ID').drop('ID', axis='columns')
week1.merge(right=customers, how='left', left_on='Customer ID', right_on='ID').drop('ID', axis=1)

## Inner Joins I
- Inner joins merge two tables based on *shared*/*common* values in columns.
- If a value exists in only one **DataFrame**, pandas will exclude it from the final result set.
- If the same ID occurs multiple times, pandas will store each possible combination of the values.
- The design of the join ensures that the same matches are found regardless of which **DataFrame** the `merge` method is invoked on (only the row and column order may differ).
<img src="SQL_Joins.png" width="800" height="800"/>

In [None]:
week1

In [None]:
week2

In [None]:
week1.merge(right=week2, how='inner', on='Customer ID')

In [None]:
week1.merge(right=week2, how='inner', on='Customer ID', suffixes=(' - Week 1', ' - Week 2'))
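The "each possible combination" rule is easier to see on a small scale. In this sketch the hypothetical frames `a` and `b` each contain customer `1` twice, so the inner join should yield 2 x 2 = 4 rows for that customer, while customers `2` and `3` (present in only one frame) drop out.

```python
import pandas as pd

# Hypothetical weekly orders; customer 1 ordered twice in each week
a = pd.DataFrame({"Customer ID": [1, 1, 2], "Food ID": [5, 6, 7]})
b = pd.DataFrame({"Customer ID": [1, 1, 3], "Food ID": [8, 9, 4]})

# Customer 1 produces every pairing of its week-1 and week-2 orders;
# customers 2 and 3 appear in only one frame and are excluded
inner = a.merge(right=b, how="inner", on="Customer ID",
                suffixes=(" - Week 1", " - Week 2"))

# Swapping the caller finds the same matches; only the column roles swap
swapped = b.merge(right=a, how="inner", on="Customer ID",
                  suffixes=(" - Week 2", " - Week 1"))
```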

## Inner Joins II
- We can pass multiple arguments to the `on` parameter of the `merge` method. Pandas will require matches in both columns across the **DataFrames**.

In [None]:
week1

In [None]:
week2

In [None]:
week1.merge(right=week2, how='inner', on=['Customer ID', 'Food ID'])
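As a small illustration of matching on two columns at once, the hypothetical frames `w1` and `w2` below contain rows that agree on one key but not the other. Only the row where both keys agree should survive.

```python
import pandas as pd

# Hypothetical weekly orders
w1 = pd.DataFrame({"Customer ID": [1, 2, 3], "Food ID": [5, 6, 7]})
w2 = pd.DataFrame({"Customer ID": [1, 2, 4], "Food ID": [5, 9, 7]})

# Customer 2 matches on Customer ID only, and Food ID 7 matches on Food ID only;
# just (Customer ID 1, Food ID 5) matches on both columns
same_order = w1.merge(right=w2, how="inner", on=["Customer ID", "Food ID"])
```

Because both columns act as keys, no suffixes are needed: each key appears once in the result.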

## Full/Outer Join
- A **full/outer** join combines values that are found in either **DataFrame** or in both **DataFrames**.
- Pandas does not mind if a value exists in one **DataFrame** but not the other.
- If a value does not exist in one **DataFrame**, its columns from that **DataFrame** will hold `NaN`.

<img src="SQL_Joins.png" width="800" height="800"/>

In [None]:
week1

In [None]:
week2

In [None]:
merged = week1.merge(right=week2, how='outer', on='Customer ID', suffixes=(' - Week 1', ' - Week 2'), indicator=True)
merged

In [None]:
merged.rename(columns={'_merge': 'Merge'}, inplace=True)
merged

In [None]:
merged[merged['Merge'].isin(['left_only', 'right_only'])]

In [None]:
merged[merged['Merge'] == 'both']
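The same pattern on a toy scale, assuming two hypothetical frames `w1` and `w2` with one shared customer. The `_merge` column added by `indicator=True` labels each row as `left_only`, `right_only`, or `both`.

```python
import pandas as pd

# Hypothetical weekly orders; only customer 2 appears in both weeks
w1 = pd.DataFrame({"Customer ID": [1, 2], "Food ID": [5, 6]})
w2 = pd.DataFrame({"Customer ID": [2, 3], "Food ID": [7, 8]})

outer = w1.merge(right=w2, how="outer", on="Customer ID",
                 suffixes=(" - Week 1", " - Week 2"), indicator=True)

# Customers who ordered in exactly one of the two weeks
one_week_only = outer[outer["_merge"].isin(["left_only", "right_only"])]

# Customers who ordered in both weeks
both_weeks = outer[outer["_merge"] == "both"]
```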

## Merging by Indexes with the left_index and right_index Parameters
- Use the `on` parameter if the column(s) to be matched on have the same names in both **DataFrames**.
- Use the `left_on` and `right_on` parameters if the column(s) to be matched on have different names in the two **DataFrames**.
- Use the `left_index` or `right_index` parameters (set to `True`) if the values to be matched on are found in the index of a **DataFrame**.

In [62]:
week1

|  | Customer ID | Food ID |
|---|---|---|
| 0 | 537 | 9 |
| 1 | 97 | 4 |
| 2 | 658 | 1 |
| 3 | 202 | 2 |
| 4 | 155 | 9 |
| ... | ... | ... |
| 245 | 413 | 9 |
| 246 | 926 | 6 |
| 247 | 134 | 3 |
| 248 | 396 | 6 |


In [67]:
food = pd.read_csv('restaurant_foods.csv', index_col='Food ID')
food

| Food ID | Food Item | Price |
|---|---|---|
| 1 | Sushi | 3.99 |
| 2 | Burrito | 9.99 |
| 3 | Taco | 2.99 |
| 4 | Quesadilla | 4.25 |
| 5 | Pizza | 2.49 |
| 6 | Pasta | 13.99 |
| 7 | Steak | 24.99 |
| 8 | Salad | 11.25 |
| 9 | Donut | 0.99 |
| 10 | Drink | 1.75 |


In [70]:
customers = pd.read_csv('restaurant_customers.csv', index_col='ID')
customers

| ID | First Name | Last Name | Gender | Company | Occupation |
|---|---|---|---|---|---|
| 1 | Joseph | Perkins | Male | Dynazzy | Community Outreach Specialist |
| 2 | Jennifer | Alvarez | Female | DabZ | Senior Quality Engineer |
| 3 | Roger | Black | Male | Tagfeed | Account Executive |
| 4 | Steven | Evans | Male | Fatz | Registered Nurse |
| 5 | Judy | Morrison | Female | Demivee | Legal Assistant |
| ... | ... | ... | ... | ... | ... |
| 996 | Debra | Garcia | Female | Dazzlesphere | Structural Engineer |
| 997 | Douglas | Bishop | Male | Livepath | Developer I |
| 998 | Frank | Franklin | Male | Brainverse | Nurse Practicioner |
| 999 | Jessica | Burns | Female | Babbleblab | Financial Advisor |


In [74]:
week1.merge(right=customers, how='left', left_on='Customer ID', right_index=True).merge(right=food, how='left', left_on='Food ID', right_index=True)

|  | Customer ID | Food ID | First Name | Last Name | Gender | Company | Occupation | Food Item | Price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 537 | 9 | Cheryl | Carroll | Female | Zoombeat | Registered Nurse | Donut | 0.99 |
| 1 | 97 | 4 | Amanda | Watkins | Female | Ozu | Account Coordinator | Quesadilla | 4.25 |
| 2 | 658 | 1 | Patrick | Webb | Male | Browsebug | Community Outreach Specialist | Sushi | 3.99 |
| 3 | 202 | 2 | Louis | Campbell | Male | Rhynoodle | Account Representative III | Burrito | 9.99 |
| 4 | 155 | 9 | Carolyn | Diaz | Female | Gigazoom | Database Administrator III | Donut | 0.99 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | 413 | 9 | Diane | Bailey | Female | Wikibox | Technical Writer | Donut | 0.99 |
| 246 | 926 | 6 | Anne | Wagner | Female | Skyba | Legal Assistant | Pasta | 13.99 |
| 247 | 134 | 3 | Diana | Hall | Female | Quinu | Financial Advisor | Taco | 2.99 |
| 248 | 396 | 6 | Juan | Romero | Male | Zoonder | Analyst Programmer | Pasta | 13.99 |


## The join Method
- The `join` method is a shortcut for combining two **DataFrames** when merging by index labels.
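
A minimal sketch of how `join` relates to `merge`, using hypothetical `orders` and `people` frames rather than the restaurant files. It assumes the lookup values live in the index of the right-hand frame; `join` performs a left join by default, and its `on` parameter names the caller's column to match against the other frame's index.

```python
import pandas as pd

# Hypothetical orders and a customer table indexed by ID
orders = pd.DataFrame({"Customer ID": [2, 1, 2], "Food ID": [9, 4, 1]})
people = pd.DataFrame({"First Name": ["Joseph", "Jennifer"]},
                      index=pd.Index([1, 2], name="ID"))

# Explicit version with merge: match a left column against the right index
via_merge = orders.merge(right=people, how="left",
                         left_on="Customer ID", right_index=True)

# Shorter version with join: `on` names the caller's column,
# and the other frame is always matched by its index
via_join = orders.join(people, on="Customer ID")
```

Both calls attach the matching first name to each order; `join` simply needs fewer arguments when the right-hand frame is already indexed by the key.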