In [1]:
import numpy as np
import pandas as pd

## Concatenation
• `pd.concat` stacks two dataframes either along the rows (axis = 0) or along the columns (axis = 1).

In [28]:
dataset1 = {
    "A": ["A1","A2","A3","A4"],
    "B": ["B1","B2","B3","B4"],
    "C": ["C1","C2","C3","C4"],
}

dataset2 = {
    "A": ["A5","A6","A7","A8"],
    "B": ["B5","B6","B7","B8"],
    "C": ["C5","C6","C7","C8"],
}

In [30]:
df1 = pd.DataFrame(dataset1, index = [1,2,3,4])
df2 = pd.DataFrame(dataset2, index = [5,6,7,8])

In [31]:
df1

Unnamed: 0,A,B,C
1,A1,B1,C1
2,A2,B2,C2
3,A3,B3,C3
4,A4,B4,C4


In [32]:
df2

Unnamed: 0,A,B,C
5,A5,B5,C5
6,A6,B6,C6
7,A7,B7,C7
8,A8,B8,C8


##### ◘ Let's add them:

In [33]:
pd.concat([df1,df2]) # You can check the parameters by pressing Shift+Tab inside the parentheses

Unnamed: 0,A,B,C
1,A1,B1,C1
2,A2,B2,C2
3,A3,B3,C3
4,A4,B4,C4
5,A5,B5,C5
6,A6,B6,C6
7,A7,B7,C7
8,A8,B8,C8


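Here the rows of df2 are appended below the rows of df1, because the default axis value is 0 (rows).\
\
**Note:** This works cleanly because the 2 dataframes have the same columns. Both of them include **"A B C"** columns. A column that exists in only one of them would be NaN for the rows of the other.\
\
Now let's try to concatenate them by columns (axis = 1)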

In [34]:
pd.concat([df1,df2], axis = 1) # As you can see, the result is mostly NaN

Unnamed: 0,A,B,C,A.1,B.1,C.1
1,A1,B1,C1,,,
2,A2,B2,C2,,,
3,A3,B3,C3,,,
4,A4,B4,C4,,,
5,,,,A5,B5,C5
6,,,,A6,B6,C6
7,,,,A7,B7,C7
8,,,,A8,B8,C8


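With axis = 1, rows are matched by index label. **df1** has no rows at indexes 5 to 8 (the "A5 B5 C5..." rows), that's why its columns are NaN there.\
**df2** has no rows at indexes 1 to 4 (the "A1 B1 C1..." rows), that's why its columns are NaN there.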
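
To illustrate the point about index labels, here is a minimal sketch that gives df2 the same index as df1 before concatenating by columns. The name `df2_aligned` is hypothetical and not part of the notebook above. With matching labels, the two dataframes sit side by side with no NaN.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A1", "A2", "A3", "A4"],
        "B": ["B1", "B2", "B3", "B4"],
        "C": ["C1", "C2", "C3", "C4"],
    },
    index=[1, 2, 3, 4],
)
df2 = pd.DataFrame(
    {
        "A": ["A5", "A6", "A7", "A8"],
        "B": ["B5", "B6", "B7", "B8"],
        "C": ["C5", "C6", "C7", "C8"],
    },
    index=[5, 6, 7, 8],
)

# Hypothetical copy of df2 relabelled with df1's index (1..4 instead of 5..8)
df2_aligned = df2.copy()
df2_aligned.index = df1.index

# Rows are matched by index label, so every cell is now filled
side_by_side = pd.concat([df1, df2_aligned], axis=1)
print(side_by_side)
```

Note that the result has two columns named A, two named B and two named C, so this is only a sketch of how alignment works, not a recommended layout.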

***
## Join
• Adds the columns of one dataframe to another, matching rows by index.

In [76]:
dataset1 = {
    "A" : ["A1","A2","A3","A4"],
    "B" : ["B1","B2","B3","B4"],
}

dataset2 = {
    "X" : ["X1","X2","X3"],
    "Y" : ["Y1","Y2","Y3"],
}

In [77]:
df1 = pd.DataFrame(dataset1, index = [1,2,3,4])
df2 = pd.DataFrame(dataset2, index = [1,2,3])

In [78]:
df1

Unnamed: 0,A,B
1,A1,B1
2,A2,B2
3,A3,B3
4,A4,B4


In [79]:
df2

Unnamed: 0,X,Y
1,X1,Y1
2,X2,Y2
3,X3,Y3


In [80]:
df1.join(df2) # In the Shift+Tab window, you see how = "left" as the default. That means every index of the left dataframe (df1) is kept

Unnamed: 0,A,B,X,Y
1,A1,B1,X1,Y1
2,A2,B2,X2,Y2
3,A3,B3,X3,Y3
4,A4,B4,,


It adds df2 to df1 with a _left_ join: every index of the left dataframe (**df1**) is kept, and the columns of **df2** are placed next to it. That means:\
**df1** includes **"A1 B1"** in index 1, **df2** includes **"X1 Y1"** in index 1. **X1 Y1** are added next to **A1 B1**, as you see in the output.\
\
It's the same for **A2 B2 X2 Y2** and **A3 B3 X3 Y3**.\
\
But at **index 4**, the **X Y** columns return **NaN**, because at that index there are values in **df1** but there are **no values in df2**. That's why only **A4 B4** are filled in.
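
The `how` parameter accepts other values besides "left". As a minimal sketch using the same two dataframes, the example below compares "inner" and "outer". The variable names `inner` and `outer` are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": ["A1", "A2", "A3", "A4"], "B": ["B1", "B2", "B3", "B4"]},
    index=[1, 2, 3, 4],
)
df2 = pd.DataFrame(
    {"X": ["X1", "X2", "X3"], "Y": ["Y1", "Y2", "Y3"]},
    index=[1, 2, 3],
)

# "inner": keep only the index labels present in both dataframes (1, 2, 3)
inner = df1.join(df2, how="inner")

# "outer": keep every index label found in either dataframe (1, 2, 3, 4)
outer = df1.join(df2, how="outer")

print(inner)
print(outer)
```

For these two dataframes the "outer" result has the same rows as the default "left" result, because every index of df2 also exists in df1.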

### Let's do it in reverse:

In [81]:
df2.join(df1)

Unnamed: 0,X,Y,A,B
1,X1,Y1,A1,B1
2,X2,Y2,A2,B2
3,X3,Y3,A3,B3


As you can see, there is **no index 4**, because we are joining **df1 to df2** and df2 has only 3 indexes (1, 2, 3). That's why only 3 rows are returned.
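
The same rows can also be obtained without swapping the dataframes, by passing `how="right"`. This is a small sketch on the same data. The names `right` and `reverse` are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": ["A1", "A2", "A3", "A4"], "B": ["B1", "B2", "B3", "B4"]},
    index=[1, 2, 3, 4],
)
df2 = pd.DataFrame(
    {"X": ["X1", "X2", "X3"], "Y": ["Y1", "Y2", "Y3"]},
    index=[1, 2, 3],
)

# Keep the indexes of the right dataframe (df2); columns stay in A, B, X, Y order
right = df1.join(df2, how="right")

# Reverse join from the notebook: same rows, but columns in X, Y, A, B order
reverse = df2.join(df1)

print(right)
print(reverse)
```

Only the column order differs between the two results.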

***
## Merge
• Combines dataframes by matching the values of a column.\
• You can choose which column will be the common (key) column, and rows from both dataframes are matched on that column.

In [108]:
dataset1 = {
    "A" : ["A1","A2","A3"],
    "B" : ["B1","B2","B3"],
    "key" : ["K1","K2","K3"]
}

dataset2 = {
    "X" : ["X1","X2","X3","X4"],
    "Y" : ["Y1","Y2","Y3","Y4"],
    "key" : ["K1","K2","K5","K4"]
}

In [109]:
df1 = pd.DataFrame(dataset1,index = [1,2,3]) 
df2 = pd.DataFrame(dataset2,index = [1,2,3,4])

In [110]:
df1

Unnamed: 0,A,B,key
1,A1,B1,K1
2,A2,B2,K2
3,A3,B3,K3


In [111]:
df2

Unnamed: 0,X,Y,key
1,X1,Y1,K1
2,X2,Y2,K2
3,X3,Y3,K5
4,X4,Y4,K4


In [115]:
pd.merge(df1,df2, on = "key")

Unnamed: 0,A,B,key,X,Y
0,A1,B1,K1,X1,Y1
1,A2,B2,K2,X2,Y2


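In the Shift+Tab window, there is **"on = None"** as the default. If you set it to **"key"**, _which is one of our columns_, the result includes only the rows with **common key values** (K1 and K2), because the default is **how = "inner"**. Note that the original indexes are replaced by a new index starting at 0.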
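
`pd.merge` also takes a `how` parameter. As a minimal sketch on the same data, the example below keeps the unmatched keys as well, and uses `indicator=True` to add a `_merge` column showing where each row came from. The variable names `left` and `outer` are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A1", "A2", "A3"],
        "B": ["B1", "B2", "B3"],
        "key": ["K1", "K2", "K3"],
    },
    index=[1, 2, 3],
)
df2 = pd.DataFrame(
    {
        "X": ["X1", "X2", "X3", "X4"],
        "Y": ["Y1", "Y2", "Y3", "Y4"],
        "key": ["K1", "K2", "K5", "K4"],
    },
    index=[1, 2, 3, 4],
)

# "left": keep every key of df1; K3 has no match in df2, so X and Y are NaN there
left = pd.merge(df1, df2, on="key", how="left")

# "outer": keep every key from both dataframes;
# indicator=True adds a "_merge" column (both / left_only / right_only)
outer = pd.merge(df1, df2, on="key", how="outer", indicator=True)

print(left)
print(outer)
```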