## Concatenating multiple DataFrames together
- concat joins two or more DataFrames (or Series) together, either vertically or horizontally
- Changing its parameter values (keys, names, axis, join) yields different results

Read in the 2016 and 2017 stock datasets, and make the ticker symbol the index of each

In [1]:
import pandas as pd

In [4]:
stocks_2016 = pd.read_csv('data/stocks_2016.csv', index_col='Symbol')
stocks_2016

Symbol,Shares,Low,High
AAPL,80,95,110
TSLA,50,80,130
WMT,40,55,70


In [5]:
stocks_2017 = pd.read_csv('data/stocks_2017.csv', index_col='Symbol')
stocks_2017

Symbol,Shares,Low,High
AAPL,50,120,140
GE,100,30,40
IBM,87,75,95
SLB,20,55,85
TXN,500,15,23
TSLA,100,100,300


Place all the stock datasets into a single list, and then call the concat method to concatenate them together

In [6]:
s_list = [stocks_2016, stocks_2017]
pd.concat(s_list)

Symbol,Shares,Low,High
AAPL,80,95,110
TSLA,50,80,130
WMT,40,55,70
AAPL,50,120,140
GE,100,30,40
IBM,87,75,95
SLB,20,55,85
TXN,500,15,23
TSLA,100,100,300
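
Note that AAPL and TSLA now appear twice in the index, with nothing to show which year each row came from. The sketch below checks this; it rebuilds the two DataFrames inline from the values shown above (an assumption made so the block runs without the CSV files).

```python
import pandas as pd

# Rebuilt inline from the tables above instead of reading the CSV files.
stocks_2016 = pd.DataFrame(
    {"Shares": [80, 50, 40], "Low": [95, 80, 55], "High": [110, 130, 70]},
    index=pd.Index(["AAPL", "TSLA", "WMT"], name="Symbol"),
)
stocks_2017 = pd.DataFrame(
    {
        "Shares": [50, 100, 87, 20, 500, 100],
        "Low": [120, 30, 75, 55, 15, 100],
        "High": [140, 40, 95, 85, 23, 300],
    },
    index=pd.Index(["AAPL", "GE", "IBM", "SLB", "TXN", "TSLA"], name="Symbol"),
)

combined = pd.concat([stocks_2016, stocks_2017])

# The index is no longer unique: AAPL and TSLA each appear twice.
index_is_unique = combined.index.is_unique

# Selecting a repeated label returns every matching row (here, two rows).
aapl_rows = combined.loc["AAPL"]
```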


- By default, concat stacks DataFrames vertically, one on top of the other
- To record which original DataFrame each row came from, label the pieces with the keys parameter
    * Each label appears in the outermost index level of the concatenated frame, which forces the creation of a MultiIndex
    * The names parameter names the levels of that MultiIndex

In [7]:
pd.concat(s_list, keys=['2016', '2017'], names=['Year', 'Symbol'])

Year,Symbol,Shares,Low,High
2016,AAPL,80,95,110
2016,TSLA,50,80,130
2016,WMT,40,55,70
2017,AAPL,50,120,140
2017,GE,100,30,40
2017,IBM,87,75,95
2017,SLB,20,55,85
2017,TXN,500,15,23
2017,TSLA,100,100,300
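
With the pieces labeled, each one can be pulled back out of the result by its key. A minimal sketch, again rebuilding the data inline from the tables above rather than reading the CSV files; the variable names are illustrative only.

```python
import pandas as pd

# Rebuilt inline from the tables above instead of reading the CSV files.
stocks_2016 = pd.DataFrame(
    {"Shares": [80, 50, 40], "Low": [95, 80, 55], "High": [110, 130, 70]},
    index=pd.Index(["AAPL", "TSLA", "WMT"], name="Symbol"),
)
stocks_2017 = pd.DataFrame(
    {
        "Shares": [50, 100, 87, 20, 500, 100],
        "Low": [120, 30, 75, 55, 15, 100],
        "High": [140, 40, 95, 85, 23, 300],
    },
    index=pd.Index(["AAPL", "GE", "IBM", "SLB", "TXN", "TSLA"], name="Symbol"),
)

by_year = pd.concat(
    [stocks_2016, stocks_2017], keys=["2016", "2017"], names=["Year", "Symbol"]
)

# Selecting on the outer level returns that year's rows, indexed by Symbol.
only_2016 = by_year.loc["2016"]

# A (Year, Symbol) tuple plus a column name pinpoints a single value.
tsla_2017_high = by_year.loc[("2017", "TSLA"), "High"]
```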


Concatenate horizontally by setting the axis parameter to 'columns' or 1

In [8]:
pd.concat(s_list, keys=['2016', '2017'], axis=1, names=['Year', None])

Year,2016,2016,2016,2017,2017,2017
,Shares,Low,High,Shares,Low,High
AAPL,80.0,95.0,110.0,50.0,120.0,140.0
GE,,,,100.0,30.0,40.0
IBM,,,,87.0,75.0,95.0
SLB,,,,20.0,55.0,85.0
TSLA,50.0,80.0,130.0,100.0,100.0,300.0
TXN,,,,500.0,15.0,23.0
WMT,40.0,55.0,70.0,,,


- By default, concat uses an outer join, keeping every row from each DataFrame in the list and filling the gaps with NaN
- Setting join='inner' keeps only the rows whose index labels appear in every DataFrame

In [9]:
pd.concat(s_list, keys=['2016', '2017'], axis=1, join='inner', names=['Year', None])

Year,2016,2016,2016,2017,2017,2017
,Shares,Low,High,Shares,Low,High
Symbol,,,,,,
AAPL,80,95,110,50,120,140
TSLA,50,80,130,100,100,300
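
One way to see the difference between the two joins is to compare the resulting row labels with the union and intersection of the original indexes. A rough sketch, with the data rebuilt inline from the tables above (an assumption so the block runs without the CSV files):

```python
import pandas as pd

# Rebuilt inline from the tables above instead of reading the CSV files.
stocks_2016 = pd.DataFrame(
    {"Shares": [80, 50, 40], "Low": [95, 80, 55], "High": [110, 130, 70]},
    index=pd.Index(["AAPL", "TSLA", "WMT"], name="Symbol"),
)
stocks_2017 = pd.DataFrame(
    {
        "Shares": [50, 100, 87, 20, 500, 100],
        "Low": [120, 30, 75, 55, 15, 100],
        "High": [140, 40, 95, 85, 23, 300],
    },
    index=pd.Index(["AAPL", "GE", "IBM", "SLB", "TXN", "TSLA"], name="Symbol"),
)
pieces = [stocks_2016, stocks_2017]

outer = pd.concat(pieces, keys=["2016", "2017"], axis=1)
inner = pd.concat(pieces, keys=["2016", "2017"], axis=1, join="inner")

# Outer join: row labels are the union of both indexes.
all_symbols = stocks_2016.index.union(stocks_2017.index)
# Inner join: row labels are the intersection of both indexes.
shared_symbols = stocks_2016.index.intersection(stocks_2017.index)

# Symbols missing from 2016 (GE, IBM, SLB, TXN) become NaN in the outer result,
# which is why those columns are displayed as floats.
missing_2016_shares = int(outer[("2016", "Shares")].isna().sum())
```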


### Notes

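- The append method is a watered-down version of concat
- Internally, append calls concat
- DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat instead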

In [11]:
# Same result as pd.concat(s_list); requires pandas < 2.0
stocks_2016.append(stocks_2017)

Symbol,Shares,Low,High
AAPL,80,95,110
TSLA,50,80,130
WMT,40,55,70
AAPL,50,120,140
GE,100,30,40
IBM,87,75,95
SLB,20,55,85
TXN,500,15,23
TSLA,100,100,300
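
Since DataFrame.append no longer exists in pandas 2.0 and later, a version-independent way to get the same table is to call pd.concat directly. The sketch below rebuilds the data inline from the tables above (an assumption so it runs without the CSV files) and compares against append only when the installed pandas still provides it.

```python
import pandas as pd

# Rebuilt inline from the tables above instead of reading the CSV files.
stocks_2016 = pd.DataFrame(
    {"Shares": [80, 50, 40], "Low": [95, 80, 55], "High": [110, 130, 70]},
    index=pd.Index(["AAPL", "TSLA", "WMT"], name="Symbol"),
)
stocks_2017 = pd.DataFrame(
    {
        "Shares": [50, 100, 87, 20, 500, 100],
        "Low": [120, 30, 75, 55, 15, 100],
        "High": [140, 40, 95, 85, 23, 300],
    },
    index=pd.Index(["AAPL", "GE", "IBM", "SLB", "TXN", "TSLA"], name="Symbol"),
)

# Works on every pandas version.
stacked = pd.concat([stocks_2016, stocks_2017])

# Only older pandas (< 2.0) still has DataFrame.append; compare when available.
if hasattr(pd.DataFrame, "append"):
    legacy = stocks_2016.append(stocks_2017)
    same_as_append = legacy.equals(stacked)
else:
    same_as_append = None  # append is not available to compare against
```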
