# Combining and Merging Datasets
*Curtis Miller*

In this notebook I demonstrate how to combine datasets using **pandas**. As an example, we will combine Apple's stock closing prices (ticker symbol AAPL) with Apple's earnings per share (EPS). (Data obtained from [Quandl](https://www.quandl.com) and [Nasdaq](https://www.nasdaq.com).)

Let's load the CSV files containing these datasets.

In [None]:
import pandas as pd
from pandas import Series, DataFrame
%matplotlib inline

In [None]:
close = pd.read_csv("AAPL-Close-15-17.csv", index_col="date")
close.head()

In [None]:
close.plot()

In [None]:
eps = pd.read_csv("AAPL-EPS-15-17.csv", index_col="Date")
eps.head()

In [None]:
eps.plot()

We can use the `DataFrame` method `join()` to combine these two datasets. By default, `join()` matches rows on the index of both `DataFrame`s (though this can be changed, for example with the `on` parameter), and the result is a new `DataFrame` containing the columns of both. The `how` parameter selects the type of join, and defaults to `"left"`. There are four main types of joins (analogous to joins in SQL):

* `"inner"`: Only rows whose index value appears in both datasets will be kept.
* `"outer"`: Every index value present in either dataset will have a row; where a value appears in only one dataset, the columns coming from the other dataset are filled with missing values (`np.nan`).
* `"left"`: Every row in the left dataset (the one calling `join()`, in this case) will be included in the end result; index values found only in the right dataset are dropped.
* `"right"`: Every row in the right dataset (the argument of `join()`, in this case) will be included in the end result; index values found only in the left dataset are dropped.
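
Before turning to the Apple data, the four join types can be sketched on a toy example. The frames `prices` and `earnings` and all values in them are made up for illustration; only one date is shared between the two.

```python
import pandas as pd

# Hypothetical toy data: "prices" has three dates, "earnings" has two,
# and only 2017-01-02 appears in both indexes.
prices = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0]},
    index=pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03"]),
)
earnings = pd.DataFrame(
    {"eps": [1.5, 2.5]},
    index=pd.to_datetime(["2017-01-02", "2017-01-04"]),
)

inner = prices.join(earnings, how="inner")   # shared index values only
outer = prices.join(earnings, how="outer")   # union of both indexes
left = prices.join(earnings, how="left")     # every row of prices
right = prices.join(earnings, how="right")   # every row of earnings

# Compare the number of rows each join type keeps
for name, frame in [("inner", inner), ("outer", outer),
                    ("left", left), ("right", right)]:
    print(name, frame.shape)
```

The inner join keeps a single row, the outer join keeps four, and the left and right joins keep as many rows as `prices` and `earnings` respectively, with `NaN` wherever the other frame has no matching date.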

I demonstrate these below.

In [None]:
# Inspect the index before converting both datasets to a common datetime format
close.index    # Notice the index is not being treated as dates

In [None]:
eps.index

In [None]:
pd.to_datetime(close.index)    # This is better

In [None]:
pd.to_datetime(eps.index)

In [None]:
close.index = pd.to_datetime(close.index)
eps.index = pd.to_datetime(eps.index)
close.head()

In [None]:
eps.head()
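
To see why the conversion matters, here is a minimal sketch using made-up CSV snippets in place of the real files. The two date formats and all values are assumptions chosen for illustration, not the contents of the actual AAPL files.

```python
import io
import pandas as pd

# Hypothetical CSV snippets standing in for the two files; the date
# formats deliberately differ, and the numbers are invented.
close_csv = io.StringIO("date,close\n2017-01-03,100.0\n2017-01-04,101.0\n")
eps_csv = io.StringIO("Date,EPS\n01/03/2017,1.0\n")

close_demo = pd.read_csv(close_csv, index_col="date")
eps_demo = pd.read_csv(eps_csv, index_col="Date")

# With string indexes the labels "2017-01-03" and "01/03/2017" are
# different values, so the inner join finds nothing in common.
as_strings = close_demo.join(eps_demo, how="inner")

# After conversion both indexes hold timestamps and the shared date lines up.
close_demo.index = pd.to_datetime(close_demo.index)
eps_demo.index = pd.to_datetime(eps_demo.index)
as_dates = close_demo.join(eps_demo, how="inner")

print(len(as_strings), len(as_dates))
```

In this sketch the join on string labels returns no rows, while the join on converted indexes returns the one shared date. As an alternative to converting afterwards, `pd.read_csv()` also accepts `parse_dates=True`, which asks pandas to try parsing the index as dates when the file is loaded.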

In [None]:
close.join(eps, how="inner")    # Inner join

In [None]:
close.join(eps, how="outer").head()    # Outer join

In [None]:
close.join(eps, how="left").head()    # Left join (compare to "outer")

In [None]:
close.join(eps, how="right")    # Right join (compare to "inner")