## Venn Diagram of Merging

Stack Overflow has an excellent discussion of the different types of merging, with visuals: https://stackoverflow.com/questions/38549/what-is-the-difference-between-inner-join-and-outer-join

Key Visual Summary: https://i.stack.imgur.com/hMKKt.jpg
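- **Concatenate** - Stack the dataframes and return all rows, with NaN wherever a column is missing from one dataframe
- **Inner Merge** - Return only the rows whose key columns match in both dataframes
- **Outer Merge** - Return the rows whose key columns appear in either dataframe, combining the rows that match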

## Concatenate

The idea behind concatenating is to take the two dataframes and stack them on top of one another. Any missing data (columns defined in one dataframe but not in the other) is filled with NaN. **There is no attempt to match and exclude data.**

In [None]:
import pandas as pd

df_1 = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4]})
df_2 = pd.DataFrame({'col_1': [11, 12], 'col_3': [13, 14]})
concat_df = pd.concat([df_1, df_2])

concat_df

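Notice that some values were changed from integers to floats. This happens because NaN is a floating-point value, so any column that receives a NaN is converted to float. We will deal with that in a future lesson.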
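As a small sketch of the two side effects of stacking (the variable names `stacked` and `renumbered` are just for illustration), the cell below prints the row labels and the column types. It assumes you want fresh row labels, which is what `ignore_index=True` provides; without it the original labels are kept and repeat.

```python
import pandas as pd

df_1 = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4]})
df_2 = pd.DataFrame({'col_1': [11, 12], 'col_3': [13, 14]})

# By default each dataframe keeps its own row labels, so they repeat.
stacked = pd.concat([df_1, df_2])
print(list(stacked.index))      # [0, 1, 0, 1]

# ignore_index=True renumbers the rows from 0.
renumbered = pd.concat([df_1, df_2], ignore_index=True)
print(list(renumbered.index))   # [0, 1, 2, 3]

# col_1 has no missing values and stays an integer column;
# col_2 and col_3 each received NaN, so they became float columns.
print(renumbered.dtypes)
```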

## Inner Merge

In [None]:
df_1 = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4], 'col_3': [13, 14]})
df_2 = pd.DataFrame({'col_1': [1, 2], 'col_3': [3, 24]})
inner_merge_df = pd.merge(df_1, df_2, how="inner")

inner_merge_df

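By default, a merge matches on every column the two dataframes share (here col_1 and col_3). No pair of rows agrees on both of those columns, so the output is empty. But what if we wanted to check for partial matches? We can define the columns we want to merge on with **on=[]**.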

In [None]:
df_1 = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4], 'col_3': [13, 14]})
df_2 = pd.DataFrame({'col_1': [1, 12], 'col_2': [100, 200], 'col_3': [13, 14]})
inner_merge_df = pd.merge(df_1, df_2, how="inner", on=['col_1', 'col_3'])
# The first rows of both dataframes have col_1 = 1 and col_3 = 13, so they are merged

inner_merge_df

This is a useful trick when looking for partial matches. Only the first rows match on both col_1 and col_3, so the inner merge returns a single row. Since col_2 was not one of the merge columns, its conflicting values from df_1 and df_2 are kept side by side as col_2_x and col_2_y.
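If the default `_x` and `_y` names are hard to read, `pd.merge` accepts a `suffixes` argument. The sketch below repeats the merge above with the suffixes `_left` and `_right`, which are arbitrary names chosen for this example.

```python
import pandas as pd

df_1 = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4], 'col_3': [13, 14]})
df_2 = pd.DataFrame({'col_1': [1, 12], 'col_2': [100, 200], 'col_3': [13, 14]})

# Same inner merge as above, but the conflicting col_2 columns get
# custom suffixes instead of the default _x and _y.
labeled_df = pd.merge(df_1, df_2, how="inner", on=['col_1', 'col_3'],
                      suffixes=('_left', '_right'))

print(list(labeled_df.columns))
labeled_df
```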

## Outer Merge

In [None]:
df_1 = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4], 'col_3': [13, 14]})
df_2 = pd.DataFrame({'col_1': [1, 12], 'col_2': [100, 200], 'col_3': [13, 14]})
outer_merge_df = pd.merge(df_1, df_2, how="outer", on=['col_1'])

outer_merge_df

Notice that we defined an outer merge on col_1. The rows that match on col_1 = 1 were combined, with the remaining columns from each dataframe kept under the _x and _y suffixes. The other two rows (col_1 = 2 and col_1 = 12) have no partner, but an outer merge keeps them anyway and fills the other dataframe's columns with NaN.
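To see which dataframe each row of an outer merge came from, one option is the `indicator=True` argument of `pd.merge`, which adds a `_merge` column. The sketch below reruns the merge above with it; the name `flagged_df` is just for illustration.

```python
import pandas as pd

df_1 = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4], 'col_3': [13, 14]})
df_2 = pd.DataFrame({'col_1': [1, 12], 'col_2': [100, 200], 'col_3': [13, 14]})

# indicator=True adds a _merge column with the value
# 'both', 'left_only', or 'right_only' for each row.
flagged_df = pd.merge(df_1, df_2, how="outer", on=['col_1'], indicator=True)

flagged_df[['col_1', '_merge']]
```

Here col_1 = 1 is marked `both`, col_1 = 2 is marked `left_only` (it only exists in df_1), and col_1 = 12 is marked `right_only`.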

In [None]:
df_1 = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4], 'col_3': [13, 14]})
df_2 = pd.DataFrame({'col_1': [1, 12], 'col_3': [13, 14]})
outer_merge_df = pd.merge(df_1, df_2, how="outer")

outer_merge_df

Notice that the first row of df_2 does not show up as a separate row. It matches the first row of df_1 on every column the two dataframes share (col_1 and col_3), so the two rows are combined into one.
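When **on** is left out, the merge uses the columns that appear in both dataframes. As a quick check, the sketch below builds that list by hand (the names `shared`, `implicit_df`, and `explicit_df` are made up for this example) and confirms that passing it explicitly gives the same result.

```python
import pandas as pd

df_1 = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4], 'col_3': [13, 14]})
df_2 = pd.DataFrame({'col_1': [1, 12], 'col_3': [13, 14]})

# Columns that exist in both dataframes.
shared = [col for col in df_1.columns if col in df_2.columns]
print(shared)

# Merging without on= and merging on the shared columns give the same result.
implicit_df = pd.merge(df_1, df_2, how="outer")
explicit_df = pd.merge(df_1, df_2, how="outer", on=shared)
print(implicit_df.equals(explicit_df))
```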

In [None]:
df_1 = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4]})
df_2 = pd.DataFrame({'col_1': [11, 12], 'col_3': [13, 14]})
outer_merge_df = pd.merge(df_1, df_2, how="outer")

outer_merge_df

When there are no matching rows, an outer merge will look like a concatenate. **The main difference here is that an outer merge TRIES to combine matching rows while concatenate does not!**
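The difference becomes visible as soon as the two dataframes share a row. The sketch below uses two small dataframes invented for this comparison, where the row (1, 3) appears in both.

```python
import pandas as pd

# The row col_1 = 1, col_2 = 3 appears in both dataframes.
df_1 = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4]})
df_2 = pd.DataFrame({'col_1': [1, 12], 'col_2': [3, 14]})

# Concatenate stacks everything, so the shared row appears twice.
stacked_df = pd.concat([df_1, df_2], ignore_index=True)
print(len(stacked_df))   # 4

# Outer merge combines the shared row into one.
merged_df = pd.merge(df_1, df_2, how="outer")
print(len(merged_df))    # 3
```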

## Note

If you looked at the key visual summary, you'll see there are more ways to combine data than concatenate, inner merge, and outer merge. However, these will be the three most common ways to merge that you will use in the course.