# 3.1 Time series data viz

pandas is particularly well suited to working with **time series** data: data whose index (or at least one level of the index) is made up of datetime values recording when each observation was taken.

Time series data is abundant:
 - Stock market prices
 - Data from sensors and IoT devices
 - Event streams from applications and services
 - KPIs and performance data
 
In pandas, we have a variety of tools for manipulating time series data, all built around a pandas Series or DataFrame with a datetime index.
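As a minimal sketch of what a datetime index makes possible, the example below builds a small Series and selects a whole month using a date string. The readings are made up for illustration and are not part of the course data.

```python
import pandas as pd

# Hypothetical daily readings; the values are invented for this example.
idx = pd.date_range("2024-01-30", periods=5, freq="D")
readings = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], index=idx)

# date_range gives us a DatetimeIndex.
print(type(readings.index).__name__)

# Partial string indexing: select every row that falls in February 2024.
feb = readings.loc["2024-02"]
print(feb)
```

Selecting by a partial date string like this only works because the index holds real datetime values rather than plain strings.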

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Let's load our Tesla share price data again.

In [None]:
df = pd.read_csv(
    'data/tsla_share_price.csv',
    index_col='Date', # Use the Date column as the index
    parse_dates=True, # Ask pandas to try to parse the index as dates; it does a good job with this file
    dayfirst=False, # Our CSV uses the American (MM/DD) date format, not the European (DD/MM) one
)

# We can inspect the index to check that pandas has interpreted it correctly.
df.index
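To see why `dayfirst` matters, here is a small sketch that parses one ambiguous date string both ways with `pd.to_datetime`. The string is a made-up example and is not taken from the Tesla file.

```python
import pandas as pd

# Hypothetical ambiguous date string: could be 4 March or 3 April.
ambiguous = "03/04/2021"

us_style = pd.to_datetime(ambiguous, dayfirst=False)  # month first: 4 March 2021
eu_style = pd.to_datetime(ambiguous, dayfirst=True)   # day first: 3 April 2021

print(us_style.date(), eu_style.date())
```

Both calls succeed without any warning, so a wrong `dayfirst` setting can silently shift dates. Checking the parsed index against a few known dates is a cheap safeguard.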

## Plotting

The first thing to notice is that pandas (with the help of matplotlib) does a good job of plotting time series data, placing ticks at sensible points on the x-axis.

In [None]:
# Create a larger figure to plot to.
fig, ax = plt.subplots(figsize=(20,10))

# Pass ax explicitly so the line is drawn on the figure we just created.
df['Close'].plot(ax=ax)

## Resampling

Pandas also has some nice tools for aggregating data over time, for example calculating monthly statistics from our daily data.


In [None]:
# Here we take our data and calculate the min, max, mean and standard deviation of the price each month.
# Note: from pandas 2.2 the month-end alias is "ME"; "M" still works there but emits a FutureWarning.
monthly_df = (
    df.resample("1M")                     # This creates a resampler object, much like the groupby object we created in the data transformation notebook.
    ['Close']                             # We select the close price from the dataframe.
    .agg(["min","max","mean","std"])      # Here we tell the resampler to calculate several summary statistics for the close price.
)
)

# If we look at this data, you'll see we now have one row per month.
display(monthly_df.head())
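To make the link between resampling and grouping concrete, the sketch below uses made-up daily values and computes monthly statistics in two ways. It uses the month-start alias `"MS"` rather than the month-end alias in the cell above, purely so that it runs unchanged across pandas versions; that swap is an assumption of this example, not part of the lesson.

```python
import pandas as pd

# Hypothetical daily prices: the values 1..59 across January and February 2023.
idx = pd.date_range("2023-01-01", "2023-02-28", freq="D")
prices = pd.Series(range(1, len(idx) + 1), index=idx, dtype="float64")

# "MS" labels each monthly bin with the first day of the month.
monthly = prices.resample("MS").agg(["min", "max", "mean"])

# The same grouping expressed as a groupby on the calendar month of each row.
by_period = prices.groupby(prices.index.to_period("M")).agg(["min", "max", "mean"])

print(monthly)
print(by_period)
```

The two results hold the same numbers and differ only in how the rows are labelled: timestamps for `resample`, periods for the `groupby` version.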

Let's plot this data to see how it looks.

In [None]:
# We'll construct a more elaborate plot here with two subplots.
fig, axes = plt.subplots(
    nrows=2,
    ncols=1,
    sharex=True,
    figsize=(16,16)
)

# Plot the mean value and the difference between the min and max on one subplot.
axes[0].plot(
    monthly_df.index,
    monthly_df['mean'], 
    label="Mean Closing Price"
)
axes[0].fill_between(
    monthly_df.index,
    monthly_df['min'], 
    monthly_df['max'], 
    alpha=0.2, 
    label="Closing Price Range - Min to Max"
)

# Set a title for the subplot
axes[0].set_title('Tesla Closing Price')

# Add a legend using the labels we used for plotting
axes[0].legend()

# Plot the standard deviation on a separate subplot, since its scale is different.
axes[1].plot(
    monthly_df.index, monthly_df['std'], label='Standard Deviation in Closing Price'
)

# Add a title, and a legend so the label we set above is shown
axes[1].set_title("Closing Price - Standard Deviation")
axes[1].legend()

# show our plot
plt.show()

## Exercise

Load the example sales data from `data/sample-sales-data.csv`.
- Use `ORDERNUMBER` as the index.
- Parse `ORDERDATE` as a date (in American format). Hint: you can give the `parse_dates` parameter a list of the columns containing datetime data, and pandas will try to parse them all.
- Filter out *Cancelled*, *On Hold* or *Disputed* orders using the `STATUS` column.
- Create a summary of daily sales volumes by country with `pivot_table()`, using the order date for the index and the country for the columns. Fill any nulls with zeroes.
- Resample the daily summary with `resample()` to calculate quarterly sales volumes for each country.
- Add a column to the quarterly summary for the total sales in all regions.
- Plot the total sales for all regions along with the sales for the UK as a line chart.


In [None]:
# Load the data
df = pd.read_csv(
    'data/sample-sales-data.csv',
    index_col="ORDERNUMBER",
    parse_dates=["ORDERDATE"],
    dayfirst=False
)
df.head()

In [None]:
# Filter out cancelled, on hold and disputed orders
bad_orders = df['STATUS'].isin(['Cancelled','On Hold','Disputed'])

df = df.loc[~bad_orders]
df.head()
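The filtering step combines `isin()` with the `~` (not) operator. Here is a minimal sketch on a made-up Series of statuses; the values are hypothetical and may not match every status in the sample file.

```python
import pandas as pd

# Hypothetical order statuses, invented for this example.
status = pd.Series(["Shipped", "Cancelled", "On Hold", "Resolved"])

# True wherever the status is one we want to exclude.
excluded = status.isin(["Cancelled", "On Hold", "Disputed"])

# ~ flips the boolean mask, keeping only the rows that were NOT excluded.
kept = status[~excluded]
print(kept.tolist())
```

Listing a status that never appears in the data (here `Disputed`) is harmless: `isin()` simply finds no matches for it.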

In [None]:
# Create daily summary by country
daily_sales = df.pivot_table(
    index='ORDERDATE',
    columns='COUNTRY',
    values="SALES",
    aggfunc='sum'
)

# We can use inplace=True to fill nulls in the existing dataframe rather than returning a new one.
daily_sales.fillna(0, inplace=True)

daily_sales
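As an alternative sketch, `pivot_table()` also accepts a `fill_value` argument, which fills the missing date/country combinations during the pivot itself. The orders below are made up; only the column names mirror the exercise.

```python
import pandas as pd

# Hypothetical orders; the column names follow the exercise, the values are invented.
orders = pd.DataFrame({
    "ORDERDATE": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"]),
    "COUNTRY": ["UK", "France", "UK"],
    "SALES": [100.0, 50.0, 25.0],
})

# fill_value=0 replaces combinations with no orders (France on 2 January) with zero.
daily = orders.pivot_table(
    index="ORDERDATE",
    columns="COUNTRY",
    values="SALES",
    aggfunc="sum",
    fill_value=0,
)
print(daily)
```

Either approach should give the same table here; which one reads better is largely a matter of taste.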

In [None]:
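# Resample to quarterly sales.
# Note: from pandas 2.2 the quarter-end alias is "QE"; "Q" still works there but emits a FutureWarning.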
quarterly_sales = daily_sales.resample("Q").sum()
quarterly_sales

In [None]:
# Add a column for total sales
quarterly_sales['Total'] = quarterly_sales.sum(axis='columns')
quarterly_sales

In [None]:
# Plot UK sales and total sales on the same axes
fig, ax = plt.subplots(figsize=(16,10))

ax.plot(quarterly_sales.index, quarterly_sales['UK'], label="UK")
ax.plot(quarterly_sales.index, quarterly_sales['Total'], label="Total Sales")
ax.legend()

plt.show()