### Data Combination

Data combination is the process of joining, merging or concatenating multiple pandas data structures into a single data structure.
The most common ways to combine dataframes are concatenation, merging and joining.

### Concatenation

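The `concat` function combines two or more dataframes either vertically (stacking rows, `axis=0`) or horizontally (placing columns side by side, `axis=1`). Labels on the other axis are aligned, so dataframes with the same columns stack without gaps.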

In [5]:
import pandas as pd

In [6]:
df_1 = pd.DataFrame({'A': [4, 2, 6], 'B': ['m', 'n', 'o']})
df_2 = pd.DataFrame({'A': [45, 8, 9], 'B': ['x', 'y', 'z']})

In [7]:
df_1

Unnamed: 0,A,B
0,4,m
1,2,n
2,6,o


In [8]:
df_2

Unnamed: 0,A,B
0,45,x
1,8,y
2,9,z


In [9]:
# Vertical concatenation (row-wise)
vertical_com = pd.concat([df_1, df_2], axis=0, keys=["df_1", "df_2"]) 
vertical_com

Unnamed: 0,Unnamed: 1,A,B
df_1,0,4,m
df_1,1,2,n
df_1,2,6,o
df_2,0,45,x
df_2,1,8,y
df_2,2,9,z


In [10]:
vertical_com.loc["df_1"]

Unnamed: 0,A,B
0,4,m
1,2,n
2,6,o


In [11]:
vertical_com.loc["df_1",0]

A    4
B    m
Name: (df_1, 0), dtype: object

In [12]:
vertical_com.loc["df_1",0]["A"]

4

In [13]:
vertical_com.loc["df_1",0]["A":"B"]

A    4
B    m
Name: (df_1, 0), dtype: object
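
The chained lookups above can also be written as a single `.loc` call, with a tuple addressing both index levels. The following is a minimal sketch that rebuilds the same dataframes so it runs on its own; the names `value` and `block` are only illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({'A': [4, 2, 6], 'B': ['m', 'n', 'o']})
df_2 = pd.DataFrame({'A': [45, 8, 9], 'B': ['x', 'y', 'z']})

# keys= adds an outer index level recording which dataframe each row came from
vertical_com = pd.concat([df_1, df_2], axis=0, keys=["df_1", "df_2"])

# A tuple selects on both index levels at once; the second argument picks the column
value = vertical_com.loc[("df_1", 0), "A"]

# xs() returns every row under one outer key and drops that level from the index
block = vertical_com.xs("df_2")
```

Selecting the row and column in one `.loc` call avoids building an intermediate Series just to index it again.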

In [14]:
# Vertical concatenation (row-wise), renumbering the index from 0
vertical_com = pd.concat([df_1, df_2], axis=0, ignore_index=True) 
vertical_com

Unnamed: 0,A,B
0,4,m
1,2,n
2,6,o
3,45,x
4,8,y
5,9,z


In [15]:
# Horizontal concatenation (column-wise)
horizontal_com = pd.concat([df_1, df_2], axis=1, keys=["df_1","df_2"]) 
horizontal_com

Unnamed: 0_level_0,df_1,df_1,df_2,df_2
Unnamed: 0_level_1,A,B,A,B
0,4,m,45,x
1,2,n,8,y
2,6,o,9,z


In [16]:
horizontal_com["df_1"]

Unnamed: 0,A,B
0,4,m
1,2,n
2,6,o


In [17]:
horizontal_com["df_1"][["A","B"]]

Unnamed: 0,A,B
0,4,m
1,2,n
2,6,o


In [18]:
Accra = pd.DataFrame({'Town': ['Circle', 'Madina', 'East Legon'],
                      'Humidity': [32, 29, 34],
                      'Temperature': [28, 30, 25]})
Kumasi = pd.DataFrame({'Town': ['Kejetia', 'Amakom', 'Mantia'],
                       'Humidity': [27, 39, 36],
                       'Temperature': [30, 26, 24]})


In [19]:
df_3 = pd.concat([Accra,Kumasi], ignore_index=True)
df_3

Unnamed: 0,Town,Humidity,Temperature
0,Circle,32,28
1,Madina,29,30
2,East Legon,34,25
3,Kejetia,27,30
4,Amakom,39,26
5,Mantia,36,24


In [20]:
df_4 = pd.DataFrame({'Windspeed':[8,5,7,8,6,4,9,10,5]})
df_4

Unnamed: 0,Windspeed
0,8
1,5
2,7
3,8
4,6
5,4
6,9
7,10
8,5


In [21]:
pd.concat([df_3,df_4], ignore_index=True, axis=0)

Unnamed: 0,Town,Humidity,Temperature,Windspeed
0,Circle,32.0,28.0,
1,Madina,29.0,30.0,
2,East Legon,34.0,25.0,
3,Kejetia,27.0,30.0,
4,Amakom,39.0,26.0,
5,Mantia,36.0,24.0,
6,,,,8.0
7,,,,5.0
8,,,,7.0
9,,,,8.0
10,,,,6.0
11,,,,4.0
12,,,,9.0
13,,,,10.0
14,,,,5.0


In [22]:
pd.concat([df_3,df_4], ignore_index=True, axis=1)

Unnamed: 0,0,1,2,3
0,Circle,32.0,28.0,8
1,Madina,29.0,30.0,5
2,East Legon,34.0,25.0,7
3,Kejetia,27.0,30.0,8
4,Amakom,39.0,26.0,6
5,Mantia,36.0,24.0,4
6,,,,9
7,,,,10
8,,,,5
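
The missing values in the two results above come from alignment: by default `concat` keeps the union of the labels on the other axis (`join="outer"`) and fills the gaps with `NaN`. The sketch below uses two small hypothetical dataframes, `left` and `right`, to show how `join="inner"` keeps only the labels that both share.

```python
import pandas as pd

# Hypothetical frames with different lengths, so their indexes only partly overlap
left = pd.DataFrame({'Humidity': [32, 29, 34]})        # index 0..2
right = pd.DataFrame({'Windspeed': [8, 5, 7, 8, 6]})   # index 0..4

# Default join="outer": every index label is kept, gaps become NaN
outer = pd.concat([left, right], axis=1)

# join="inner": only index labels present in both frames are kept
inner = pd.concat([left, right], axis=1, join="inner")

print(outer)
print(inner)
```

Whether to keep or drop the unmatched labels depends on the analysis; the inner form is handy when rows without a counterpart are not meaningful.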


### Merge


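The `merge` function combines two dataframes based on one or more common columns. The `how` argument controls which keys are kept: `inner` (keys in both), `outer` (keys in either), `left` (all keys from the left dataframe) or `right` (all keys from the right dataframe).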

In [23]:
df_5 = pd.DataFrame({'Names': ['Ama', 'Barry', 'Celestine', 'Dela'], 'Score1': [1, 2, 3, 4]})
df_6 = pd.DataFrame({'Names': ['Barry', 'Dela', 'Emma', 'Frank'], 'Score2': [5, 6, 7, 8]})

In [24]:
df_5

Unnamed: 0,Names,Score1
0,Ama,1
1,Barry,2
2,Celestine,3
3,Dela,4


In [25]:
df_6

Unnamed: 0,Names,Score2
0,Barry,5
1,Dela,6
2,Emma,7
3,Frank,8


In [26]:
pd.merge(df_5, df_6, on='Names', how='inner')

Unnamed: 0,Names,Score1,Score2
0,Barry,2,5
1,Dela,4,6


In [27]:
pd.merge(df_5, df_6, on='Names', how='outer')

Unnamed: 0,Names,Score1,Score2
0,Ama,1.0,
1,Barry,2.0,5.0
2,Celestine,3.0,
3,Dela,4.0,6.0
4,Emma,,7.0
5,Frank,,8.0


In [28]:
pd.merge(df_5, df_6, on='Names', how='left')

Unnamed: 0,Names,Score1,Score2
0,Ama,1,
1,Barry,2,5.0
2,Celestine,3,
3,Dela,4,6.0


In [29]:
pd.merge(df_5, df_6, on='Names', how='right')

Unnamed: 0,Names,Score1,Score2
0,Barry,2.0,5
1,Dela,4.0,6
2,Emma,,7
3,Frank,,8
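
The four `how` options differ only in which key values survive the merge. As a rough summary, the sketch below reruns the merges above and collects the `Names` kept by each; the dictionary `kept` is just an illustrative helper.

```python
import pandas as pd

df_5 = pd.DataFrame({'Names': ['Ama', 'Barry', 'Celestine', 'Dela'], 'Score1': [1, 2, 3, 4]})
df_6 = pd.DataFrame({'Names': ['Barry', 'Dela', 'Emma', 'Frank'], 'Score2': [5, 6, 7, 8]})

# Collect the surviving key values for each join type
kept = {
    how: pd.merge(df_5, df_6, on='Names', how=how)['Names'].tolist()
    for how in ('inner', 'outer', 'left', 'right')
}

for how, names in kept.items():
    print(f"{how:>5}: {names}")
```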


In [30]:
# The key columns have different names in the two dataframes
df_7 = pd.DataFrame({'Name1': ['Ama', 'Barry', 'Celestine', 'Dela'], 'Score': [1, 2, 3, 4]})
df_8 = pd.DataFrame({'Name2': ['Barry', 'Dela', 'Emma', 'Frank'], 'Score': [5, 6, 7, 8]})
merged_df = pd.merge(df_7, df_8, left_on='Name1', right_on='Name2', how='inner', suffixes=('_df7', '_df8'))
merged_df

Unnamed: 0,Name1,Score_df7,Name2,Score_df8
0,Barry,2,Barry,5
1,Dela,4,Dela,6


In [31]:
# The same merge written as a DataFrame method
df_7.merge(df_8, left_on='Name1', right_on='Name2', how='inner', suffixes=('_df7', '_df8'))

Unnamed: 0,Name1,Score_df7,Name2,Score_df8
0,Barry,2,Barry,5
1,Dela,4,Dela,6
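
When `left_on` and `right_on` are used, both key columns are kept in the result even though they hold the same values. One possible cleanup, sketched below, drops the redundant column and renames the remaining one; the names `tidy` and `Name` are arbitrary choices for illustration.

```python
import pandas as pd

df_7 = pd.DataFrame({'Name1': ['Ama', 'Barry', 'Celestine', 'Dela'], 'Score': [1, 2, 3, 4]})
df_8 = pd.DataFrame({'Name2': ['Barry', 'Dela', 'Emma', 'Frank'], 'Score': [5, 6, 7, 8]})

merged_df = pd.merge(df_7, df_8, left_on='Name1', right_on='Name2',
                     how='inner', suffixes=('_df7', '_df8'))

# For an inner merge Name1 and Name2 are identical, so keep a single key column
tidy = merged_df.drop(columns='Name2').rename(columns={'Name1': 'Name'})
print(tidy)
```

For outer, left or right merges the two key columns can differ (one side may be missing), so dropping one is only safe after checking which side holds the values.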


### Join

The `join()` method is used to combine two dataframes on their indexes.

In [32]:
df_9 = pd.DataFrame({'value1': [1, 2, 3, 4]}, index=['A', 'B', 'C', 'D'])
df_10 = pd.DataFrame({'value2': [5, 6, 7, 8]}, index=['B', 'D', 'E', 'F'])


In [33]:
df_9

Unnamed: 0,value1
A,1
B,2
C,3
D,4


In [34]:
df_10

Unnamed: 0,value2
B,5
D,6
E,7
F,8


In [39]:
df_9.join(df_10, how="inner")

Unnamed: 0,value1,value2
B,2,5
D,4,6


In [35]:
# lsuffix/rsuffix are only applied to overlapping column names; there are none here
joined = df_9.join(df_10, how='left', lsuffix='_left', rsuffix='_right')
joined

Unnamed: 0,value1,value2
A,1,
B,2,5.0
C,3,
D,4,6.0
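
An index-based join can also be expressed with `merge` by passing `left_index=True` and `right_index=True`. The sketch below compares the two forms on the same dataframes; `via_join`, `via_merge` and `same` are illustrative names.

```python
import pandas as pd

df_9 = pd.DataFrame({'value1': [1, 2, 3, 4]}, index=['A', 'B', 'C', 'D'])
df_10 = pd.DataFrame({'value2': [5, 6, 7, 8]}, index=['B', 'D', 'E', 'F'])

# Index-on-index join using the DataFrame method
via_join = df_9.join(df_10, how='left')

# The same operation written with merge, matching on both indexes
via_merge = pd.merge(df_9, df_10, left_index=True, right_index=True, how='left')

# equals() treats NaN values in the same positions as equal
same = via_join.equals(via_merge)
print(same)
```

`join` is the shorter spelling when the keys already live in the index; `merge` is more flexible when the keys are ordinary columns.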


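### Append

The `append` method adds the rows of one DataFrame to the end of another. It was deprecated in pandas 1.4 and removed in pandas 2.0, so the cells below only run on older versions; on current versions `pd.concat` does the same job.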

In [42]:
df_11 = pd.DataFrame({'A': [5, 2, 6], 'B': [9, 15, 5]})
df_12 = pd.DataFrame({'A': [12, 8, 19], 'B': [6, 15, 12]})

In [43]:
df_11

Unnamed: 0,A,B
0,5,9
1,2,15
2,6,5


In [44]:
df_12

Unnamed: 0,A,B
0,12,6
1,8,15
2,19,12


In [45]:
df_11.append(df_12)


Unnamed: 0,A,B
0,5,9
1,2,15
2,6,5
0,12,6
1,8,15
2,19,12


In [46]:
df_11.append(df_12, ignore_index=True)


Unnamed: 0,A,B
0,5,9
1,2,15
2,6,5
3,12,6
4,8,15
5,19,12
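
On pandas 2.0 and later, where `DataFrame.append` is no longer available, the two calls above can be written with `pd.concat`. A minimal sketch on the same dataframes, with `stacked` and `renumbered` as illustrative names:

```python
import pandas as pd

df_11 = pd.DataFrame({'A': [5, 2, 6], 'B': [9, 15, 5]})
df_12 = pd.DataFrame({'A': [12, 8, 19], 'B': [6, 15, 12]})

# Counterpart of df_11.append(df_12): original index labels are kept
stacked = pd.concat([df_11, df_12])

# Counterpart of df_11.append(df_12, ignore_index=True): index renumbered from 0
renumbered = pd.concat([df_11, df_12], ignore_index=True)

print(stacked)
print(renumbered)
```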
