Pandas Plotting Demo
====================

This demo loads stock market data into a DataFrame so that students can follow along with the pandas visualization slides. The data are the adjusted close prices between 2005 and 2009 for a few major stocks: Apple, Google, Microsoft, Procter and Gamble, and Exxon Mobil.

Feel free to substitute your own data into this demo.

In [None]:
%matplotlib inline

In [None]:
import pandas as pd
from pandas.plotting import autocorrelation_plot, scatter_matrix

In [None]:
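# The first three columns are combined into a single date column named
# "year_month_day"; "-" marks a missing value. Note that recent pandas
# releases deprecate the nested-list form of parse_dates.
df = pd.read_table("adj_close_stock_data_yahoo_2005_2010.txt", parse_dates=[[0,1,2]], sep=r"\s+", na_values=["-"])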
df = df.set_index("year_month_day")
df.head()
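
The loading step above can also be written without nested `parse_dates`, by building the date index explicitly with `pd.to_datetime`. The following is a minimal sketch, not the demo's own method: it assumes the file's first three columns are named `year`, `month` and `day` (inferred from the `year_month_day` index name), and it reads a tiny in-memory sample whose prices are placeholder values, not real quotes.

```python
import io

import pandas as pd

# Hypothetical three-row sample. The column layout is an assumption about the
# real file, and the numbers are placeholders rather than actual prices.
sample = io.StringIO(
    "year month day AAPL MSFT\n"
    "2005 1 3 10.0 20.0\n"
    "2005 1 4 11.0 -\n"
    "2005 1 5 12.0 22.0\n"
)

# Whitespace-separated columns; "-" is read as a missing value (NaN).
raw = pd.read_csv(sample, sep=r"\s+", na_values=["-"])

# Assemble a DatetimeIndex from the year/month/day columns.
raw.index = pd.to_datetime(raw[["year", "month", "day"]])
raw.index.name = "year_month_day"

# Keep only the price columns.
sample_df = raw.drop(columns=["year", "month", "day"])
print(sample_df)
```

The result has the same shape as `df` above: a date index and one column per ticker, with `NaN` wherever the file contained `-`.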

In [None]:
# Let's extract a few time series from that DataFrame
ts = df["AAPL"]
ts2 = df["MSFT"]

In [None]:
ts.plot()

In [None]:
# Import matplotlib's pyplot, since the following cells call its functions directly:
import matplotlib.pyplot as plt

In [None]:
ts.plot(figsize=(12,9))
plt.xlabel("")
plt.savefig("aapl.png")

In [None]:
ts.plot()
ts2.plot()
plt.xlabel("")
plt.legend()
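
Two series that share an index can also be drawn in one call by selecting both columns from the DataFrame, in which case pandas adds the legend itself. This is a small sketch on synthetic stand-in data (the `prices` frame and its values are invented for illustration); in the notebook the equivalent call would be on `df[["AAPL", "MSFT"]]`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook

import numpy as np
import pandas as pd

# Synthetic stand-in for the real price data (not actual quotes).
idx = pd.date_range("2005-01-03", periods=30, freq="B")
prices = pd.DataFrame(
    {"AAPL": np.linspace(1.0, 2.0, 30), "MSFT": np.linspace(3.0, 2.0, 30)},
    index=idx,
)

# One call draws both columns on the same axes, with a legend entry per column.
ax = prices[["AAPL", "MSFT"]].plot()
ax.set_xlabel("")
```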

In [None]:
# Because the second curve is plotted on a different y-axis, the x-label must
# be cleared after each call to plot. For the same reason, the calls to ylabel
# are interleaved with the calls to plot.
ts.plot()
plt.xlabel("")
plt.ylabel("AAPL")
ts2.plot(secondary_y=True)
plt.xlabel("")
plt.ylabel("MSFT")
plt.legend()
# Note: at the time of writing, a limitation in how pandas and matplotlib
# interact means that only one line appears in the legend when secondary_y is
# used. The fix is beyond the scope of this demo:
# http://stackoverflow.com/questions/21988196/legend-only-shows-one-label-when-plotting-with-pandas
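
One possible workaround for the single-entry legend is to collect the line handles from both axes and pass them to a single `legend` call. The sketch below is an illustration under assumptions, not necessarily the fix described at the link above: it uses matplotlib's `twinx` instead of `secondary_y`, and the two series are synthetic stand-ins rather than the demo's data.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two price series (not real data).
idx = pd.date_range("2005-01-03", periods=50, freq="B")
aapl = pd.Series(np.linspace(1.0, 2.0, 50), index=idx, name="AAPL")
msft = pd.Series(np.linspace(5.0, 4.0, 50), index=idx, name="MSFT")

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis

aapl.plot(ax=ax_left, color="C0")
msft.plot(ax=ax_right, color="C1")
ax_left.set_xlabel("")
ax_left.set_ylabel("AAPL")
ax_right.set_ylabel("MSFT")

# Gather the handles and labels from both axes and draw one combined legend.
handles_l, labels_l = ax_left.get_legend_handles_labels()
handles_r, labels_r = ax_right.get_legend_handles_labels()
legend = ax_left.legend(handles_l + handles_r, labels_l + labels_r, loc="best")
```

Each series is labelled with its `name`, so the combined legend lists both curves even though they live on different axes.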

In [None]:
df.plot(subplots=True, style='b')
plt.xlabel("")

In [None]:
from numpy.random import rand
df2 = pd.DataFrame(rand(10,4), columns=list('abcd'))
df2.plot(kind='bar', stacked=True)

In [None]:
# Kernel density estimation (KDE) cannot handle missing values, so drop the
# days on which any value is missing.
df2 = df.dropna()
df2.plot(kind="kde")
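
To make the effect of `dropna` concrete: with its default settings it removes every row in which at least one column is missing, so a gap in a single stock discards that day for all stocks. A tiny sketch on an invented frame (the values are placeholders):

```python
import numpy as np
import pandas as pd

# Placeholder values; the middle row has a gap in one column only.
frame = pd.DataFrame({"AAPL": [1.0, np.nan, 3.0], "MSFT": [4.0, 5.0, 6.0]})

# Default how="any": a row is dropped if any of its columns is NaN.
clean = frame.dropna()

# Dropping per column instead keeps the valid MSFT value for that day.
msft_only = frame["MSFT"].dropna()

print(len(frame), len(clean), len(msft_only))
```

If losing whole days matters, the columns can be cleaned and plotted one at a time instead.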

In [None]:
bp = df.boxplot()

In [None]:
sm = scatter_matrix(df)
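
The imports at the top also bring in `autocorrelation_plot`, which the cells above never call. A minimal sketch of its use follows, on a synthetic random walk rather than the demo's data; in the notebook, a natural candidate would be a single series with missing values removed, such as `ts.dropna()` (an assumption, since the demo does not show this call).

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook

import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot

# Synthetic random walk standing in for a price series (not real data).
rng = np.random.default_rng(0)
walk = pd.Series(rng.normal(size=250).cumsum(), name="synthetic")

# Plots the autocorrelation of the series against the lag and returns the axes.
ax = autocorrelation_plot(walk)
ax.set_title("Autocorrelation of a synthetic random walk")
```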