# Appending and Concatenating DataFrames

Pandas provides `append` (a method on Series and DataFrames) and `concat` (a top-level function) for combining dataframes or series. Note that `append` was deprecated in pandas 1.4 and removed in 2.0, so the `append` examples below need an older pandas version; `pd.concat` covers the same cases.

`append` - stacks the series/dataframe vertically, general syntax `s1.append(s2)`. It appends objects without adjusting the index values.

`concat` - joins series or dataframes either vertically (row-wise stacking) or horizontally (column-wise). It is a function of the pandas module, `pd`, and takes a list of series/dataframes to be concatenated, e.g. `pd.concat([s1, s2, s3])`. Pass `axis='rows'` (the default, which stacks vertically) or `axis='columns'` to concatenate horizontally.
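As a minimal sketch of the two directions, the snippet below concatenates two small series along each axis. The series values and the names `left` and `right` are made up for illustration.

```python
import pandas as pd

# Illustrative series; the names become column labels when concatenating horizontally
s1 = pd.Series([1, 2, 3], name='left')
s2 = pd.Series([10, 20, 30], name='right')

# Default axis (rows): stack vertically, keeping each object's own index labels
stacked = pd.concat([s1, s2])

# axis='columns' (or axis=1): place side by side, aligned on the index
side_by_side = pd.concat([s1, s2], axis='columns')

print(stacked.shape)        # (6,)
print(side_by_side.shape)   # (3, 2)
```

Stacking returns a longer series, while concatenating along the columns returns a dataframe with one column per input series.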

In [8]:
import pandas as pd
import numpy as np
from glob import glob

one = pd.Series(list('abcdef'))
two = pd.Series(list('ghijkl'))
three = pd.Series(list('mnopqr'))

one.append(two).append(three).index

Int64Index([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5], dtype='int64')
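The repeated labels in that index matter for label-based lookups. The sketch below uses `pd.concat` (which keeps the original index values in the same way) on two shortened, made-up series to show the effect.

```python
import pandas as pd

one = pd.Series(list('abc'))
two = pd.Series(list('def'))

stacked = pd.concat([one, two])   # index is 0, 1, 2, 0, 1, 2

# The index is no longer unique ...
print(stacked.index.is_unique)    # False

# ... so a label lookup returns every matching row, not a single value
print(stacked.loc[0].tolist())    # ['a', 'd']

# Positional access is unaffected by the repeated labels
print(stacked.iloc[3])            # d
```

This is the main reason to reset or ignore the index after stacking when the original labels carry no meaning.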

In [9]:
# ensure that the current index is dropped with 'drop=True'
one.append(two).append(three).reset_index(drop=True).index

RangeIndex(start=0, stop=18, step=1)

In [11]:
# the same can be done with concat, again you have to reset the index
pd.concat([one, two, three]).reset_index(drop=True).index

RangeIndex(start=0, stop=18, step=1)

Alternatively, if we call `pd.concat` with the `ignore_index=True` argument, there is no need to reset the index afterwards.

In [13]:
pd.concat([one, two, three], ignore_index=True).index

RangeIndex(start=0, stop=18, step=1)

In [15]:
jan = pd.read_csv('./data/Sales/sales-jan-2015.csv', parse_dates=True, index_col='Date')
feb = pd.read_csv('./data/Sales/sales-feb-2015.csv', parse_dates=True, index_col='Date')
mar = pd.read_csv('./data/Sales/sales-mar-2015.csv', parse_dates=True, index_col='Date')

In [23]:
quarter = pd.concat([jan, feb, mar])

In [24]:
quarter.loc['jan 27, 2015':'feb 2, 2015']

Date,Company,Product,Units
2015-01-27 07:11:55,Streeplex,Service,18
2015-02-02 08:33:01,Hooli,Software,3
2015-02-02 20:54:49,Mediacore,Hardware,9


In [25]:
quarter.loc['feb 26, 2015':'mar 7, 2015']

Date,Company,Product,Units
2015-02-26 08:57:45,Streeplex,Service,4
2015-02-26 08:58:51,Streeplex,Service,1
2015-03-06 10:11:45,Mediacore,Software,17
2015-03-06 02:03:56,Mediacore,Software,17


`concat` also accepts a list of series, here the `Units` column from each month.

In [27]:
pd.concat([jan['Units'], feb['Units'], mar['Units']])[:5]

Date
2015-01-21 19:13:21    11
2015-01-09 05:23:51     8
2015-01-06 17:19:34    17
2015-01-02 09:51:06    16
2015-01-11 14:51:02    11
Name: Units, dtype: int64

When using `append` to stack two (or more) dataframes with columns that do not match, pandas stacks them as normal, keeps the union of the columns, and fills in `NaN` wherever a row's source dataframe does not possess that column.

In [28]:
cars = pd.read_csv('./data/cars.csv', index_col=0)
cars

Unnamed: 0,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
IN,18,India,False
RU,200,Russia,True
MOR,70,Morocco,True
EG,45,Egypt,True


In [29]:
brics = pd.read_csv('./data/brics.csv', index_col=0)
brics

Unnamed: 0,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.1,143.5
IN,India,New Delhi,3.286,1252.0
CH,China,Beijing,9.597,1357.0
SA,South Africa,Pretoria,1.221,52.98


In [32]:
cars.append(brics)

Unnamed: 0,area,capital,cars_per_cap,country,drives_right,population
US,,,809.0,United States,True,
AUS,,,731.0,Australia,False,
JAP,,,588.0,Japan,False,
IN,,,18.0,India,False,
RU,,,200.0,Russia,True,
MOR,,,70.0,Morocco,True,
EG,,,45.0,Egypt,True,
BR,8.516,Brasilia,,Brazil,,200.4
RU,17.1,Moscow,,Russia,,143.5
IN,3.286,New Delhi,,India,,1252.0


`concat` by default (`axis='rows'` or `axis=0`) does the same. Passing `sort=True` sorts the resulting columns alphabetically.

In [35]:
pd.concat([cars, brics], sort=True)

Unnamed: 0,area,capital,cars_per_cap,country,drives_right,population
US,,,809.0,United States,True,
AUS,,,731.0,Australia,False,
JAP,,,588.0,Japan,False,
IN,,,18.0,India,False,
RU,,,200.0,Russia,True,
MOR,,,70.0,Morocco,True,
EG,,,45.0,Egypt,True,
BR,8.516,Brasilia,,Brazil,,200.4
RU,17.1,Moscow,,Russia,,143.5
IN,3.286,New Delhi,,India,,1252.0
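The `NaN` filling can be reproduced without the CSV files. The following is a small sketch with made-up frames; it also shows `join='inner'`, an option of `pd.concat` not used elsewhere in this section, which keeps only the columns shared by every input.

```python
import pandas as pd

# Made-up frames for illustration only
left = pd.DataFrame({'country': ['Japan', 'Egypt'], 'cars_per_cap': [588, 45]},
                    index=['JAP', 'EG'])
right = pd.DataFrame({'country': ['Brazil'], 'capital': ['Brasilia']},
                     index=['BR'])

# Default join='outer': union of the columns, missing cells become NaN
outer = pd.concat([left, right], sort=True)
print(outer.columns.tolist())     # ['capital', 'cars_per_cap', 'country']
print(outer.isna().sum().sum())   # 3

# join='inner': keep only the columns present in every frame
inner = pd.concat([left, right], join='inner')
print(inner.columns.tolist())     # ['country']
```

Because `NaN` is a floating-point value, an integer column such as `cars_per_cap` is converted to float once missing entries are introduced, which is why the output above shows `809.0` rather than `809`.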


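Calling `concat` with `axis='columns'` (or `axis=1`) concatenates the dataframes horizontally, matching rows on their index labels. Where a label is missing from one of the dataframes, that dataframe's columns are filled with `NaN`.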

In [36]:
pd.concat([cars, brics], axis='columns', sort=True)

Unnamed: 0,cars_per_cap,country,drives_right,country,capital,area,population
AUS,731.0,Australia,False,,,,
BR,,,,Brazil,Brasilia,8.516,200.4
CH,,,,China,Beijing,9.597,1357.0
EG,45.0,Egypt,True,,,,
IN,18.0,India,False,India,New Delhi,3.286,1252.0
JAP,588.0,Japan,False,,,,
MOR,70.0,Morocco,True,,,,
RU,200.0,Russia,True,Russia,Moscow,17.1,143.5
SA,,,,South Africa,Pretoria,1.221,52.98
US,809.0,United States,True,,,,
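A reduced sketch of the same index alignment, using two made-up single-column frames, is given below. It also illustrates that `concat` does not rename columns that appear in both inputs; the `keys` argument is one way to tell them apart (the labels `cars` and `brics` are chosen here purely for illustration).

```python
import pandas as pd

# Made-up frames sharing the label 'IN' and the column name 'country'
a = pd.DataFrame({'country': ['India', 'Japan']}, index=['IN', 'JAP'])
b = pd.DataFrame({'country': ['India', 'China']}, index=['IN', 'CH'])

wide = pd.concat([a, b], axis='columns', sort=True)

print(wide.index.tolist())       # ['CH', 'IN', 'JAP']
print(wide.columns.tolist())     # ['country', 'country']
print(wide.loc['IN'].tolist())   # ['India', 'India']

# keys= adds an outer column level naming the source of each block
labelled = pd.concat([a, b], axis='columns', keys=['cars', 'brics'])
print(labelled.columns.tolist())  # [('cars', 'country'), ('brics', 'country')]
```

Only the row labelled `IN` has values from both frames; the other rows contain `NaN` on the side where the label was absent.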


It is often convenient to build a large DataFrame by parsing many files as DataFrames and concatenating them all at once.

**Note**:

The file name is built using string interpolation with the loop variable `medal`. The expression `"%s_top5.csv" % medal` evaluates to a string in which the value of `medal` replaces `%s` in the format string.
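As a quick check of that interpolation, the sketch below builds the three file names with the `%` operator and compares them with the equivalent f-string form (available from Python 3.6). The variable names `percent_style` and `f_style` are illustrative.

```python
medal_types = ['bronze', 'silver', 'gold']

# %-style interpolation, as used in this section
percent_style = ["%s_top5.csv" % medal for medal in medal_types]

# Equivalent f-string form
f_style = [f"{medal}_top5.csv" for medal in medal_types]

print(percent_style)             # ['bronze_top5.csv', 'silver_top5.csv', 'gold_top5.csv']
print(percent_style == f_style)  # True
```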

In [40]:
medal_types = ['bronze', 'silver', 'gold']
medals = []

for medal in medal_types:

    # Create the file name: file_name
    file_name = "./data/Summer Olympic medals/%s_top5.csv" % medal
    
    # Create list of column names: columns
    columns = ['Country', medal]
    
    # Read file_name into a DataFrame: medal_df
    # (header=0 discards the file's own header row; names= supplies the new labels)
    medal_df = pd.read_csv(file_name, header=0, index_col='Country', names=columns)

    # Append medal_df to medals
    medals.append(medal_df)

# Concatenate medals horizontally: medals
medals = pd.concat(medals, axis='columns', sort=False)

# Print medals
print(medals)

                bronze  silver    gold
United States   1052.0  1195.0  2088.0
Soviet Union     584.0   627.0   838.0
United Kingdom   505.0   591.0   498.0
France           475.0   461.0     NaN
Germany          454.0     NaN   407.0
Italy              NaN   394.0   460.0
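The same read-then-concatenate pattern can be tried without the medal files. The sketch below substitutes made-up CSV text held in memory (via `io.StringIO`) for files on disk; the counts and the `Total` header are invented for illustration.

```python
import io
import pandas as pd

# Made-up CSV contents standing in for files on disk
fake_files = {
    'bronze': "Country,Total\nUnited States,3\nFrance,2\n",
    'silver': "Country,Total\nUnited States,5\nItaly,1\n",
}

frames = []
for medal, text in fake_files.items():
    # header=0 discards the file's own header row; names= supplies new labels
    df = pd.read_csv(io.StringIO(text), header=0,
                     names=['Country', medal], index_col='Country')
    frames.append(df)

# Collect first, then concatenate once, aligned on the Country index
medals = pd.concat(frames, axis='columns', sort=True)
print(medals)
```

Collecting the dataframes in a list and calling `pd.concat` once at the end avoids re-copying the accumulated data on every loop iteration.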
