# Merging, Joining, Concatenating and Comparing Pandas DataFrames and Series

Pandas provides a variety of utility functions for combining `Series` and `DataFrame` objects. It also provides utility functions to compare two `Series` or `DataFrame` objects and summarize their differences [[1](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html)].

> This notebook uses the term `objects` to refer to `Series`, `DataFrame`, or both.

In [1]:
import pandas as pd

## Concatenating objects using the `concat()` function

The [concat()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) function concatenates pandas `objects` along a particular axis while applying optional set logic (union or intersection) to the indexes on the other axes.
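
As a minimal sketch of the axis and set-logic options described above (the frames `left` and `right` are hypothetical and not part of this notebook): `axis=1` places the objects side by side, and `join` controls whether the row labels on the other axis are unioned (`'outer'`, the default) or intersected (`'inner'`).

```python
import pandas as pd

# Two small frames whose row labels only partly overlap (illustrative data)
left = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[0, 1, 2])
right = pd.DataFrame({'B': ['B1', 'B2', 'B3']}, index=[1, 2, 3])

# Union of the row labels: labels missing from one frame are filled with NaN
outer = pd.concat([left, right], axis=1, join='outer')

# Intersection of the row labels: only labels present in both frames are kept
inner = pd.concat([left, right], axis=1, join='inner')

print(outer)
print(inner)
```

The sections below use the default `axis=0`, which stacks the objects vertically.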

### Concatenating DataFrames whose indexes are different

In [2]:
df1 = pd.DataFrame(
    {
        'A': ['A0', 'A1', 'A2', 'A3'],
        'B': ['B0', 'B1', 'B2', 'B3'],
        'C': ['C0', 'C1', 'C2', 'C3']
    },
    index=[0, 1, 2, 3]
)
df1

Unnamed: 0,A,B,C
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2
3,A3,B3,C3


In [3]:
df2 = pd.DataFrame(
    {
        'A': ['A4', 'A5', 'A6', 'A7'],
        'B': ['B4', 'B5', 'B6', 'B7'],
        'C': ['C4', 'C5', 'C6', 'C7']
    },
    index=[4, 5, 6, 7]
)
df2

Unnamed: 0,A,B,C
4,A4,B4,C4
5,A5,B5,C5
6,A6,B6,C6
7,A7,B7,C7


In [4]:
df3 = pd.DataFrame(
    {
        'A': ['A8', 'A9', 'A10', 'A11'],
        'B': ['B8', 'B9', 'B10', 'B11'],
        'C': ['C8', 'C9', 'C10', 'C11']
    },
    index=[8, 9, 10, 11]
)
df3

Unnamed: 0,A,B,C
8,A8,B8,C8
9,A9,B9,C9
10,A10,B10,C10
11,A11,B11,C11


In [5]:
combined_df = pd.concat([df1, df2, df3])
combined_df

Unnamed: 0,A,B,C
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2
3,A3,B3,C3
4,A4,B4,C4
5,A5,B5,C5
6,A6,B6,C6
7,A7,B7,C7
8,A8,B8,C8
9,A9,B9,C9
10,A10,B10,C10
11,A11,B11,C11


In [6]:
# Name the index so that the index column gets a header when the dataframe is saved to a file
combined_df.index.name = 'Index'
combined_df

Index,A,B,C
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2
3,A3,B3,C3
4,A4,B4,C4
5,A5,B5,C5
6,A6,B6,C6
7,A7,B7,C7
8,A8,B8,C8
9,A9,B9,C9
10,A10,B10,C10
11,A11,B11,C11


In [7]:
# Save the combined dataframe to a csv file
combined_df.to_csv('./outputs/combined_df.csv')
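
`to_csv()` does not create missing directories, so the call above assumes that `./outputs` already exists. A small sketch of creating the directory first; the temporary directory and the `demo` frame are stand-ins so the example runs on its own and leaves no files behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for combined_df (illustrative data only)
demo = pd.DataFrame({'A': ['A0', 'A1']})
demo.index.name = 'Index'

# A temporary directory stands in for the notebook's working directory
with tempfile.TemporaryDirectory() as workdir:
    output_dir = Path(workdir) / 'outputs'
    # Create the target directory (and any parents) if it is missing
    output_dir.mkdir(parents=True, exist_ok=True)

    csv_path = output_dir / 'demo.csv'
    demo.to_csv(csv_path)

    # The index name becomes the header of the first CSV column
    header = csv_path.read_text().splitlines()[0]

print(header)
```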

In [8]:
# Read data from the csv file that was just output
file_combined_df = pd.read_csv('./outputs/combined_df.csv')
file_combined_df

Unnamed: 0,Index,A,B,C
0,0,A0,B0,C0
1,1,A1,B1,C1
2,2,A2,B2,C2
3,3,A3,B3,C3
4,4,A4,B4,C4
5,5,A5,B5,C5
6,6,A6,B6,C6
7,7,A7,B7,C7
8,8,A8,B8,C8
9,9,A9,B9,C9
10,10,A10,B10,C10
11,11,A11,B11,C11
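
Because `read_csv()` is called without `index_col`, the saved `Index` column comes back as an ordinary column and a fresh default index is added. A sketch of restoring the original index follows; it uses an in-memory buffer instead of the `./outputs` file so that it runs on its own, and the `demo` frame is a stand-in for `combined_df`.

```python
import io

import pandas as pd

# Stand-in for combined_df (illustrative data only)
demo = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
demo.index.name = 'Index'

# Write the CSV into memory, then rewind so it can be read back
buffer = io.StringIO()
demo.to_csv(buffer)
buffer.seek(0)

# index_col tells read_csv which column to use as the index
restored = pd.read_csv(buffer, index_col='Index')
print(restored)
```

With the notebook's own file, the equivalent call would be `pd.read_csv('./outputs/combined_df.csv', index_col='Index')`.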


### Concatenating DataFrames whose indexes are the same

In [9]:
df4 = pd.DataFrame(
    [
        [1, 2, 3, 4],
        [10, 20, 30, 40],
        [100, 200, 300, 400]
    ],
    columns=['A', 'B', 'C', 'D']
)
df4

Unnamed: 0,A,B,C,D
0,1,2,3,4
1,10,20,30,40
2,100,200,300,400


In [10]:
df5 = df4 * 2
df5

Unnamed: 0,A,B,C,D
0,2,4,6,8
1,20,40,60,80
2,200,400,600,800


In [11]:
df6 = df4 * 3
df6

Unnamed: 0,A,B,C,D
0,3,6,9,12
1,30,60,90,120
2,300,600,900,1200


In [12]:
combined_df2 = pd.concat([df4, df5, df6])
combined_df2

Unnamed: 0,A,B,C,D
0,1,2,3,4
1,10,20,30,40
2,100,200,300,400
0,2,4,6,8
1,20,40,60,80
2,200,400,600,800
0,3,6,9,12
1,30,60,90,120
2,300,600,900,1200
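
The result keeps each input's original labels, so `0`, `1` and `2` each appear three times. If unique labels are wanted, two `concat()` parameters may help: `ignore_index=True` renumbers the rows, and `verify_integrity=True` raises an error when labels overlap. A sketch with small stand-in frames (`a` and `b` are hypothetical):

```python
import pandas as pd

a = pd.DataFrame({'A': [1, 10], 'B': [2, 20]})
b = a * 2

# Discard the original labels and renumber the rows 0..n-1
renumbered = pd.concat([a, b], ignore_index=True)
print(renumbered)

# Ask concat to refuse inputs whose index labels overlap
try:
    pd.concat([a, b], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True

print(overlap_detected)
```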


In [13]:
combined_df2.iloc[3]

A    2
B    4
C    6
D    8
Name: 0, dtype: int64
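
`iloc[3]` selects by position, which is why it returns a single row even though the label `0` is duplicated. Selecting by label with `loc` behaves differently on a non-unique index. A sketch with small stand-in frames (`a` and `b` are hypothetical):

```python
import pandas as pd

a = pd.DataFrame({'A': [1, 10], 'B': [2, 20]})
b = a * 2
stacked = pd.concat([a, b])  # index is 0, 1, 0, 1

# Position-based: always exactly one row, returned as a Series
by_position = stacked.iloc[2]

# Label-based: every row labelled 0, returned as a DataFrame
by_label = stacked.loc[0]

print(by_position)
print(by_label)
```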

In [14]:
# Name the index so that the index column gets a header when the dataframe is saved to a file
combined_df2.index.name = 'Index'
combined_df2

Index,A,B,C,D
0,1,2,3,4
1,10,20,30,40
2,100,200,300,400
0,2,4,6,8
1,20,40,60,80
2,200,400,600,800
0,3,6,9,12
1,30,60,90,120
2,300,600,900,1200


In [15]:
# Save the combined result dataframe to a csv file
combined_df2.to_csv('./outputs/combined_df2.csv')

In [16]:
file_combined_df2 = pd.read_csv('./outputs/combined_df2.csv')
file_combined_df2

Unnamed: 0,Index,A,B,C,D
0,0,1,2,3,4
1,1,10,20,30,40
2,2,100,200,300,400
3,0,2,4,6,8
4,1,20,40,60,80
5,2,200,400,600,800
6,0,3,6,9,12
7,1,30,60,90,120
8,2,300,600,900,1200
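
After the round trip, the `Index` column repeats 0, 1, 2 and no longer records which frame each row came from. If that provenance matters, one option is the `keys` parameter of `concat()`, which adds an outer index level. A sketch with small stand-in frames; the key labels and the level name `Source` are illustrative choices, not part of this notebook.

```python
import pandas as pd

a = pd.DataFrame({'A': [1, 10], 'B': [2, 20]})
b = a * 2

# keys adds an outer index level; names labels both index levels
keyed = pd.concat([a, b], keys=['a', 'b'], names=['Source', 'Index'])
print(keyed)

# Rows from one input can be selected through the outer level
print(keyed.loc['b'])

# reset_index turns both levels into ordinary columns, e.g. before saving
flat = keyed.reset_index()
print(flat)
```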
