# Concatenation, join & merge

[Reference lesson](https://www.kaggle.com/residentmario/renaming-and-combining)<br>
Summary: `concat`, `join` and `merge` all combine DataFrames, in increasing order of complexity and flexibility.

### Initialization

In [1]:
import pandas as pd

df1 = pd.DataFrame({
    'col1': [1, 3, 5, 7],
    'col2': [10, 3, 1, 6],
}).rename_axis('indexes')

df2 = pd.DataFrame({
    'col1': [2, 3, 9, 3],
    'col2': [14, 8, 0, 1],
}).rename_axis('indexes')

df1

| indexes | col1 | col2 |
|---|---|---|
| 0 | 1 | 10 |
| 1 | 3 | 3 |
| 2 | 5 | 1 |
| 3 | 7 | 6 |


### Concatenation

In [2]:
# Concatenation stacks DataFrames vertically, aligning columns by name, so the frames should share the same columns.
# The DataFrames must be passed as a single iterable (e.g. a Python list).
df_concat = pd.concat([df1, df2])  # concat is a top-level pandas function; DataFrame objects have no concat() method
df_concat  # The original index labels are kept, so 0-3 appear twice

| indexes | col1 | col2 |
|---|---|---|
| 0 | 1 | 10 |
| 1 | 3 | 3 |
| 2 | 5 | 1 |
| 3 | 7 | 6 |
| 0 | 2 | 14 |
| 1 | 3 | 8 |
| 2 | 9 | 0 |
| 3 | 3 | 1 |
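
As a minimal sketch of two common variations on the cell above: `ignore_index=True` replaces the repeated labels with a fresh 0..n-1 index, and frames whose columns differ are still concatenated, with the gaps filled by `NaN`. The `df_extra` frame and its `col3` column are hypothetical and only there for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 3, 5, 7], 'col2': [10, 3, 1, 6]}).rename_axis('indexes')
df2 = pd.DataFrame({'col1': [2, 3, 9, 3], 'col2': [14, 8, 0, 1]}).rename_axis('indexes')

# Plain concatenation keeps each frame's labels, so the index is not unique
df_concat = pd.concat([df1, df2])

# ignore_index=True discards the old labels and numbers the rows 0..n-1
df_reindexed = pd.concat([df1, df2], ignore_index=True)

# Hypothetical frame with a column (col3) that df1 does not have
df_extra = pd.DataFrame({'col1': [4], 'col3': [99]})

# Columns are aligned by name; cells with no source value become NaN
df_mixed = pd.concat([df1, df_extra], ignore_index=True)
```

Resetting the index this way is usually what you want when the original labels are just row numbers and carry no meaning of their own.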


### Join

In [3]:
# join() is the next combining method in terms of complexity: it combines DataFrames horizontally, aligning rows on their indexes.
df_join = df1.join(df2, lsuffix='_firstDF', rsuffix='_secondDF')  # Suffixes are necessary when both DataFrames have columns with the same name
df_join  # df1 (the caller) is the left DataFrame; df2 (the argument) is the right one

| indexes | col1_firstDF | col2_firstDF | col1_secondDF | col2_secondDF |
|---|---|---|---|---|
| 0 | 1 | 10 | 2 | 14 |
| 1 | 3 | 3 | 3 | 8 |
| 2 | 5 | 1 | 9 | 0 |
| 3 | 7 | 6 | 3 | 1 |
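
In the cell above both frames have identical indexes, so every row finds a match. The sketch below shows what happens when the indexes only partly overlap. The `left` and `right` frames are made up for illustration; `how=` is a real parameter of `DataFrame.join` and defaults to `'left'`.

```python
import pandas as pd

# Hypothetical frames whose indexes overlap only on labels 1 and 2
left = pd.DataFrame({'a': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'b': [10, 20, 30]}, index=[1, 2, 3])

# Default how='left': keep every label of the caller; unmatched rows get NaN
left_join = left.join(right)

# how='inner': keep only labels present in both frames
inner_join = left.join(right, how='inner')

# how='outer': keep the union of labels from both frames
outer_join = left.join(right, how='outer')
```

No suffixes are needed here because the two frames have no column names in common.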


### Merge

In [4]:
# merge() is like a more customizable join(): on= lets you choose the key to join on (like a foreign key; it can be a column or an index level name), and how= lets you choose the type of join (inner, left, right, outer, etc.).
df_merge = df1.merge(df2, on='indexes', how='inner', suffixes=['_firstDF', '_secondDF'])  # Defaults: how='inner', suffixes=('_x', '_y')
df_merge  # df1 (the caller) is the left DataFrame; df2 (the argument) is the right one

| indexes | col1_firstDF | col2_firstDF | col1_secondDF | col2_secondDF |
|---|---|---|---|---|
| 0 | 1 | 10 | 2 | 14 |
| 1 | 3 | 3 | 3 | 8 |
| 2 | 5 | 1 | 9 | 0 |
| 3 | 7 | 6 | 3 | 1 |
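
Merging on the index, as above, gives the same result as `join()`. The difference shows up when the key is an ordinary column. The sketch below uses hypothetical `orders` and `customers` frames to show `on=` with a column key and the effect of `how=`; `indicator=True` is a real `merge` option that adds a `_merge` column saying where each row came from.

```python
import pandas as pd

# Hypothetical tables: customer_id acts as the foreign key in orders
orders = pd.DataFrame({'customer_id': [1, 2, 2, 4], 'amount': [50, 20, 35, 10]})
customers = pd.DataFrame({'customer_id': [1, 2, 3], 'name': ['Ana', 'Ben', 'Cy']})

# Default how='inner': only orders whose customer_id exists in customers
inner = orders.merge(customers, on='customer_id')

# how='left': keep every order; unmatched customers get NaN in 'name'
# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
left = orders.merge(customers, on='customer_id', how='left', indicator=True)
```

The order for customer 4 has no match, so it is dropped from `inner` and kept in `left` with a missing `name`.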
