In [6]:
import pandas as pd
import numpy as np

For convenience, define a helper function that creates a `DataFrame` of a particular form:

In [4]:
def make_df(cols, ind):
    """Quickly make a DataFrame"""
    data = {c: [str(c) + str(i) for i in ind] 
            for c in cols}
    print(data)
    
    return pd.DataFrame(data, ind)

In [5]:
make_df('ABC', range(3))

{'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2'], 'C': ['C0', 'C1', 'C2']}


Unnamed: 0,A,B,C
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2


Create a quick class to display multiple `DataFrame`s side by side:

In [24]:
class Display(object):
    """Display HTML representation of multiple objects"""
    
    template = """<div style="float: left; padding: 10px;">
    <p style='font-family:"Courier New", Courier, monospace'>{0}</p>{1}
    </div>"""
    
    def __init__(self, *args):
        self.args = args
        
        
    # IPython uses `_repr_html_` to implement its rich object display
    def _repr_html_(self):
        return '\n'.join(self.template.format(a, eval(a)._repr_html_())
                         for a in self.args)
    
    def __repr__(self):
        return '\n\n'.join(a + '\n' + repr(eval(a))
                           for a in self.args)

## Recall: Concatenation of NumPy Arrays

In [10]:
x = [1, 2, 3]
y = [4, 5, 6]
z = [7, 8, 9]
np.concatenate([x, y, z])

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [11]:
x = [[1, 2],
     [3, 4]]

np.concatenate([x, x], axis=1)

array([[1, 2, 1, 2],
       [3, 4, 3, 4]])

Concatenation of `Series` and `DataFrame` objects is very similar to concatenation of NumPy arrays.

## Simple Concatenation with `pd.concat`

`Series`:

In [12]:
ser1 = pd.Series(list('ABC'), index=range(1, 4))
ser1

1    A
2    B
3    C
dtype: object

In [13]:
ser2 = pd.Series(list('DEF'), index=range(4, 7))
ser2

4    D
5    E
6    F
dtype: object

In [14]:
pd.concat([ser1, ser2])

1    A
2    B
3    C
4    D
5    E
6    F
dtype: object

`DataFrame`:

In [15]:
df1 = make_df('AB', [1, 2])
df1

{'A': ['A1', 'A2'], 'B': ['B1', 'B2']}


Unnamed: 0,A,B
1,A1,B1
2,A2,B2


In [18]:
df2 = make_df('AB', [3, 4])
df2

{'A': ['A3', 'A4'], 'B': ['B3', 'B4']}


Unnamed: 0,A,B
3,A3,B3
4,A4,B4


In [26]:
pd.concat([df1, df2])  # concatenates row-wise by default

Unnamed: 0,A,B
1,A1,B1
2,A2,B2
3,A3,B3
4,A4,B4


In [28]:
df3 = make_df('AB', [0, 1])
df3

{'A': ['A0', 'A1'], 'B': ['B0', 'B1']}


Unnamed: 0,A,B
0,A0,B0
1,A1,B1


In [29]:
df4 = make_df('CD', [0, 1])
df4

{'C': ['C0', 'C1'], 'D': ['D0', 'D1']}


Unnamed: 0,C,D
0,C0,D0
1,C1,D1


In [32]:
# Column-wise concatenate
pd.concat([df3, df4], axis='columns')

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1


### Duplicate indices

Pandas concatenation preserves indices, even if the result will have duplicate indices! 

In [33]:
x = make_df('AB', [0, 1])
x

{'A': ['A0', 'A1'], 'B': ['B0', 'B1']}


Unnamed: 0,A,B
0,A0,B0
1,A1,B1


In [34]:
y = make_df('AB', [2, 3])
y.index = x.index
y

{'A': ['A2', 'A3'], 'B': ['B2', 'B3']}


Unnamed: 0,A,B
0,A2,B2
1,A3,B3


In [35]:
pd.concat([x, y], axis='rows')

Unnamed: 0,A,B
0,A0,B0
1,A1,B1
0,A2,B2
1,A3,B3


Notice the repeated indices in the result. While this is valid within `DataFrame`s, the outcome is often undesirable.
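To see why repeated labels can be a problem, here is a minimal sketch. It rebuilds `x` and `y` directly with `pd.DataFrame` (rather than through `make_df`) so that it runs on its own, and the variable names `combined` and `rows` are only illustrative.

```python
import pandas as pd

# Two frames that share the same index labels (0 and 1)
x = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
y = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[0, 1])

combined = pd.concat([x, y])

# After concatenation the index contains each label twice
print(combined.index.is_unique)

# A label-based lookup now returns every row carrying that label,
# so the result is a two-row DataFrame rather than a single row
rows = combined.loc[0]
print(rows)
```

Checking `index.is_unique` after a concatenation is one inexpensive way to notice the situation before it affects later lookups.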

#### Catch the repeats as an error

If you'd like to simply verify that the indices in the result of `pd.concat()` do not overlap, you can specify the `verify_integrity` flag. **With this set to True, the concatenation will raise an exception if there are duplicate indices.**

In [36]:
try:
    pd.concat([x, y], verify_integrity=True)
except ValueError as e:
    print("ValueError:", e)

ValueError: Indexes have overlapping values: Int64Index([0, 1], dtype='int64')


#### Ignore the index

Sometimes the index itself does not matter, and we would prefer it to simply be ignored. 

This option can be specified using the `ignore_index` flag. **With this set to `True`, the concatenation will create a new integer index for the result.**

In [37]:
pd.concat([x, y], ignore_index=True)

Unnamed: 0,A,B
0,A0,B0
1,A1,B1
2,A2,B2
3,A3,B3


#### Add `MultiIndex` keys

Use the `keys` option to specify a label for each data source; the result will be a hierarchically indexed object (here, a `DataFrame` with a `MultiIndex`) containing the data.

In [39]:
pd.concat([x, y], keys=['x', 'y'])

Unnamed: 0,Unnamed: 1,A,B
x,0,A0,B0
x,1,A1,B1
y,0,A2,B2
y,1,A3,B3
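The keys become the outer level of the resulting `MultiIndex`, which makes it possible to recover each source afterwards. The following is a small sketch of that idea; the frames are rebuilt inline so the snippet is self-contained, and the names `combined`, `from_y`, and `row` are illustrative only.

```python
import pandas as pd

x = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
y = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[0, 1])

# Label each source; the labels form the outer index level
combined = pd.concat([x, y], keys=['x', 'y'])
print(combined.index.nlevels)

# Select every row that came from the second source
from_y = combined.loc['y']
print(from_y)

# Select a single row by its (source, original label) pair
row = combined.loc[('x', 1), :]
print(row)
```

Because the original labels are kept in the inner level, this approach avoids ambiguous duplicates without discarding the index, unlike `ignore_index=True`.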


### `append`

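Because direct array concatenation is so common, `Series` and `DataFrame` objects have an `append` method that can accomplish the same thing in fewer keystrokes. Note that `append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the example below only runs on older versions; `pd.concat([df1, df2])` is the equivalent call.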

In [40]:
df1

Unnamed: 0,A,B
1,A1,B1
2,A2,B2


In [41]:
df2

Unnamed: 0,A,B
3,A3,B3
4,A4,B4


In [42]:
df1.append(df2)

Unnamed: 0,A,B
1,A1,B1
2,A2,B2
3,A3,B3
4,A4,B4


The `append()` method in Pandas does not modify the original object; instead, it creates a new object with the combined data.

It is also not very efficient, because it involves creating a new index *and* data buffer. Thus, **if you plan to do multiple `append` operations, it is generally better to build a list of `DataFrame`s and pass them all at once to the `concat()` function.**
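The list-then-concatenate pattern might look like the sketch below. The loop that builds one-row frames is a stand-in for whatever actually produces the pieces (reading files, processing chunks, and so on), and the names `frames` and `result` are illustrative.

```python
import pandas as pd

# Collect the pieces in an ordinary Python list; appending to a list
# is cheap and does not copy any DataFrame data
frames = []
for i in range(3):
    piece = pd.DataFrame({'A': [f'A{i}'], 'B': [f'B{i}']}, index=[i])
    frames.append(piece)

# Build the combined index and data buffer once, at the end
result = pd.concat(frames)
print(result)
```

This calls `pd.concat` a single time, so the combined data is allocated once instead of once per piece.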