# Concatenating Along an Axis

Another kind of data combination operation is referred to interchangeably as concatenation, binding, or stacking. NumPy's concatenate function does this with raw NumPy arrays:

In [1]:
import numpy as np
import pandas as pd
from pandas import DataFrame, Series

In [2]:
arr = np.arange(12).reshape(3, 4)

arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [3]:
np.concatenate([arr, arr], axis=1)

array([[ 0,  1,  2,  3,  0,  1,  2,  3],
       [ 4,  5,  6,  7,  4,  5,  6,  7],
       [ 8,  9, 10, 11,  8,  9, 10, 11]])
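For comparison, here is a minimal sketch of the same call along each axis. The names `rows` and `cols` are illustrative only and do not appear in the original notes.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# axis=0 stacks the arrays vertically: rows are appended, the column count stays 4
rows = np.concatenate([arr, arr], axis=0)

# axis=1 places the arrays side by side: columns are appended, the row count stays 3
cols = np.concatenate([arr, arr], axis=1)

print(rows.shape)  # (6, 4)
print(cols.shape)  # (3, 8)
```

The arrays must agree in shape on every axis except the one being concatenated.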

In the context of pandas objects such as Series and DataFrame, having labeled axes enables you to further generalize array concatenation. In particular, you have a number of additional things to think about:

- If the objects are indexed differently on the other axes, should the collection of axes be unioned or intersected?
- Do the groups need to be identifiable in the resulting object?
- Does the concatenation axis matter at all?

The concat function in pandas provides a consistent way to address each of these concerns. I’ll give a number of examples to illustrate how it works. Suppose we have three Series with no index overlap:

In [4]:
s1 = Series([0, 1], index=list('ab'))

s2 = Series([2, 3, 4], index=list('cde'))

s3 = Series([5, 6], index=list('fg'))

In [5]:
s1,s2,s3

(a    0
 b    1
 dtype: int64,
 c    2
 d    3
 e    4
 dtype: int64,
 f    5
 g    6
 dtype: int64)

In [6]:
pd.concat([s1,s2,s3])

a    0
b    1
c    2
d    3
e    4
f    5
g    6
dtype: int64

By default concat works along axis=0, producing another Series. If you pass axis=1, the result will instead be a DataFrame (axis=1 is the columns):

In [7]:
pd.concat([s1, s2, s3], axis=1)

     0    1    2
a  0.0  NaN  NaN
b  1.0  NaN  NaN
c  NaN  2.0  NaN
d  NaN  3.0  NaN
e  NaN  4.0  NaN
f  NaN  NaN  5.0
g  NaN  NaN  6.0


In this case there is no overlap on the other axis, which, as you can see, is the sorted union (the 'outer' join) of the indexes. You can instead intersect them by passing join='inner'. To demonstrate, s4 below shares the labels a and b with s1:

In [8]:
s4 = pd.concat([s1*5, s3])

s4

a    0
b    5
f    5
g    6
dtype: int64

In [9]:
pd.concat([s1, s4], axis=1)

     0  1
a  0.0  0
b  1.0  5
f  NaN  5
g  NaN  6


In [10]:
pd.concat([s1, s4], axis=1, join='inner')

   0  1
a  0  0
b  1  5


> concat defaults to join='outer', while merge defaults to how='inner'.
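The difference in defaults can be checked with a small sketch. The frames `left` and `right` and their `key`, `x`, and `y` columns are hypothetical, introduced only so that merge has something comparable to work on.

```python
import pandas as pd

s1 = pd.Series([0, 1], index=list('ab'))
s4 = pd.Series([0, 5, 5, 6], index=list('abfg'))

# concat with no join argument keeps the union of the index labels
outer = pd.concat([s1, s4], axis=1)
# join='inner' keeps only the labels present in every input
inner = pd.concat([s1, s4], axis=1, join='inner')

# Hypothetical frames carrying the same labels in a 'key' column
left = pd.DataFrame({'key': list('ab'), 'x': [0, 1]})
right = pd.DataFrame({'key': list('abfg'), 'y': [0, 5, 5, 6]})
# merge with no how argument keeps only the keys found on both sides
merged = pd.merge(left, right, on='key')

print(list(outer.index))    # ['a', 'b', 'f', 'g']
print(list(inner.index))    # ['a', 'b']
print(list(merged['key']))  # ['a', 'b']
```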

You can even specify the labels to be used on the other axis by calling reindex on the result:

In [11]:
data = pd.concat([s1, s4], axis=1)

In [12]:
data.reindex(['a', 'c', 'b', 'e'])

     0    1
a  0.0  0.0
c  NaN  NaN
b  1.0  5.0
e  NaN  NaN


One issue is that the concatenated pieces are not identifiable in the result. Suppose instead you wanted to create a hierarchical index on the concatenation axis. To do this, use the keys argument:

In [13]:
res = pd.concat([s1, s1, s3], keys=['one', 'two', 'three'])

In [14]:
res

one    a    0
       b    1
two    a    0
       b    1
three  f    5
       g    6
dtype: int64

In [15]:
res.unstack()

         a    b    f    g
one    0.0  1.0  NaN  NaN
two    0.0  1.0  NaN  NaN
three  NaN  NaN  5.0  6.0
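Because the keys form the outer level of the index, each original piece can be pulled back out by its key. A minimal sketch, where the name `piece` is illustrative only:

```python
import pandas as pd

s1 = pd.Series([0, 1], index=list('ab'))
s3 = pd.Series([5, 6], index=list('fg'))

res = pd.concat([s1, s1, s3], keys=['one', 'two', 'three'])

# Selecting on the outer level returns the piece that was labeled with that key
piece = res.loc['three']

print(res.index.nlevels)  # 2
print(list(piece.index))  # ['f', 'g']
print(list(piece))        # [5, 6]
```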


In the case of combining Series along axis=1, the keys become the DataFrame column headers:

In [16]:
pd.concat([s1, s2, s3], axis=1, keys=['one', 'two', 'three'])

   one  two  three
a  0.0  NaN    NaN
b  1.0  NaN    NaN
c  NaN  2.0    NaN
d  NaN  3.0    NaN
e  NaN  4.0    NaN
f  NaN  NaN    5.0
g  NaN  NaN    6.0


The same logic extends to DataFrame objects:

In [17]:
df1 = DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                columns=['one', 'two'])

In [18]:
df2 = DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                columns=['three', 'four'])

In [19]:
df1, df2

(   one  two
 a    0    1
 b    2    3
 c    4    5,
    three  four
 a      5     6
 c      7     8)

In [20]:
pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])

  level1     level2
     one two  three four
a      0   1    5.0  6.0
b      2   3    NaN  NaN
c      4   5    7.0  8.0


If you pass a dict of objects instead of a list, the dict’s keys will be used for the keys option:

In [23]:
pd.concat({'level1': df1, 'level2':df2}, axis=1)

  level1     level2
     one two  three four
a      0   1    5.0  6.0
b      2   3    NaN  NaN
c      4   5    7.0  8.0
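A quick way to confirm that the two spellings agree is to build both and compare them. This is a sketch only; the names `from_list` and `from_dict` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

# Explicit keys alongside a list of objects
from_list = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])
# The dict's keys play the same role as the keys argument
from_dict = pd.concat({'level1': df1, 'level2': df2}, axis=1)

# Both results carry the same two-level column index and the same values
print(from_dict.equals(from_list))
```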


The names argument assigns names to the levels of the resulting hierarchical index:

In [26]:
pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],
            names=['upper', 'lower'])

upper level1     level2
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0
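Named levels can then be referred to by name instead of by position. The following sketch assumes the same df1 and df2 as above; `result`, `level2`, and `ones` are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

result = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],
                   names=['upper', 'lower'])

print(list(result.columns.names))  # ['upper', 'lower']

# Selecting a top-level key returns the columns that came from that piece
level2 = result['level2']

# xs takes a cross-section on a named level of the column index
ones = result.xs('one', axis=1, level='lower')

print(list(level2.columns))  # ['three', 'four']
print(list(ones.columns))    # ['level1']
```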


A last consideration concerns DataFrames in which the row index is not meaningful in the context of the analysis:

In [27]:
df1 = DataFrame(np.random.randn(3, 4), columns=['a', 'b', 'c', 'd'])

df2 = DataFrame(np.random.randn(2, 3), columns=['b', 'd', 'a'])

In [28]:
df1, df2

(          a         b         c         d
 0 -2.823207  0.354491  0.989606 -1.626060
 1  2.388484 -2.187131 -0.946199 -0.802956
 2  1.619345 -0.297916 -0.019669  1.832497,
           b         d         a
 0 -0.834046 -1.369765  0.963209
 1 -1.321913 -0.684172 -0.349918)

In this case, you can pass ignore_index=True:

In [31]:
pd.concat([df1, df2], ignore_index=True)

          a         b         c         d
0 -2.823207  0.354491  0.989606 -1.626060
1  2.388484 -2.187131 -0.946199 -0.802956
2  1.619345 -0.297916 -0.019669  1.832497
3  0.963209 -0.834046       NaN -1.369765
4 -0.349918 -1.321913       NaN -0.684172
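To see what ignore_index changes, compare the row labels with and without it. This sketch substitutes deterministic integer data for the random values above so the result is reproducible; `kept` and `fresh` are illustrative names.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for the random frames (an assumption of this sketch)
df1 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['b', 'd', 'a'])

# Without ignore_index, each piece keeps its own row labels, so labels repeat
kept = pd.concat([df1, df2])
# With ignore_index=True, the rows are relabeled 0..n-1
fresh = pd.concat([df1, df2], ignore_index=True)

print(list(kept.index))     # [0, 1, 2, 0, 1]
print(list(fresh.index))    # [0, 1, 2, 3, 4]
print(list(fresh.columns))  # ['a', 'b', 'c', 'd']
```

Columns are still aligned by name in both cases; only the labels on the concatenation axis are discarded.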


![concat function arguments](../../Pictures/concat%20function%20arguments.png)