# Time Series Windows Practical Exercises

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interact
import re

%matplotlib inline

We will continue working with the alcohol consumption dataset used previously, so we again need our function to parse quarters.


### Parsing quarter function
The function `parse_quarter` takes a string of the form `YYYYQN` and converts it to a `pandas.Timestamp` object set to the last day of quarter N.


In [None]:
def parse_quarter(string):
    """
    Converts a string of the form YYYYQN into a pd.Timestamp at the end of quarter N.
    """

    # Note: you could also just retrieve the first four elements of the string
    # and the last one... Regex is fun but often not necessary
    year, qn = re.search(r"^(20[0-9][0-9])(Q[1-4])$", string).group(1, 2)

    # year and qn are strings; pd.Timestamp expects an integer year.
    year = int(year)

    date = None

    if qn == "Q1":
        date = pd.Timestamp(year, 3, 31)
    elif qn == "Q2":
        date = pd.Timestamp(year, 6, 30)
    elif qn == "Q3":
        date = pd.Timestamp(year, 9, 30)
    else:
        date = pd.Timestamp(year, 12, 31)

    return date


# Check that it works!
print(parse_quarter("2000Q3"))  # should show 2000-09-30 00:00:00
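
As a cross-check, pandas can also interpret strings such as `2000Q3` as quarterly periods. The sketch below is an alternative, not the function used in the rest of this notebook; the helper name `parse_quarter_via_period` is hypothetical.

```python
import pandas as pd


def parse_quarter_via_period(string):
    """Hypothetical alternative: let pandas read 'YYYYQN' as a quarterly period."""
    period = pd.Period(string, freq="Q")
    # end_time is the last instant of the quarter; normalize() resets it to midnight
    return period.end_time.normalize()


print(parse_quarter_via_period("2000Q3"))
```

Unlike the regex version, this sketch is not restricted to years starting with `20`.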

### Giving the parser to pandas

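Pandas can parse dates using a custom-made parser such as the one you just defined. To do so, pass your function to the `date_parser` option of `read_csv`. Note that `date_parser` is deprecated as of pandas 2.0, so recent versions may emit a warning when running the cell below.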

In [None]:
# reload the data using your parser, set the index to the date
alcohol_consumption = pd.read_csv(
    "data/NZAlcoholConsumption.csv",
    parse_dates=["DATE"],
    date_parser=parse_quarter,
    index_col="DATE",
)
alcohol_consumption.sort_index(inplace=True)
alcohol_consumption.head()

## Exercise: Moving Windows

In the cells below you will explore the effect of applying a rolling average to the data: take a window of successive points, compute their average, and assign that average to a single position in the window (either its right edge or its center).

* Use the `rolling` method from `pd.Series` ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.rolling.html#pandas.Series.rolling)) with the `TotalWine` variable.
* Specify a window of 4 points and apply `mean`.

The code in the following cell then plots the averaged line alongside the original time series; it expects your result in a variable named `rolling_mean`.
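
To make the right-edge versus center distinction concrete, here is a minimal sketch on a toy series. The values and the names `toy`, `trailing` and `centered` are made up for illustration and are not part of the alcohol consumption data.

```python
import pandas as pd

# Toy values, purely illustrative (not the alcohol consumption data)
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Trailing window: each average is stored at the right edge of its window
trailing = toy.rolling(window=4).mean()

# Centered window: each average is stored near the middle of its window
centered = toy.rolling(window=4, center=True).mean()

print(pd.DataFrame({"toy": toy, "trailing": trailing, "centered": centered}))
```

In both cases, positions where a full window of 4 points is not available are filled with `NaN`.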

In [None]:
# Your code here


In [None]:
plt.figure(figsize=(8, 6))
plt.plot(alcohol_consumption.TotalWine, "-o", label="wine consumption")
plt.plot(rolling_mean, label="trend")
plt.legend(fontsize=12)

The rolling mean curve captures the trend nicely and removes much of the seasonal movement.
This curve makes it easier to appreciate the overall increase in wine consumption over time, as well as the dip in consumption in 2008.

To explore this rolling average further, it helps to use interactive widgets. Have a look at the cell below and modify it at will.

In [None]:
def rolling_avg_plot(window_size):
    plt.plot(alcohol_consumption.TotalWine, "-o", label="wine consumption")
    rolling = alcohol_consumption.TotalWine.rolling(window=window_size).mean()
    plt.plot(rolling, label="trend")
    plt.legend()
    plt.show()


interact(rolling_avg_plot, window_size=(0, 10))

### Exponential moving average

We will now compute the exponential moving average, which weights recent points more heavily than older ones. Can you see any differences from the simple moving average?

In [None]:
def ewm_avg_plot(window_size):
    plt.plot(alcohol_consumption.TotalWine, "-o", label="wine consumption")
    # span must be at least 1, hence the +1 when the slider is at 0
    ewm_mean = alcohol_consumption.TotalWine.ewm(
        span=window_size + 1, adjust=False
    ).mean()
    plt.plot(ewm_mean, label="trend")
    plt.legend()
    plt.show()


interact(ewm_avg_plot, window_size=(0, 10))
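
To see what the `ewm` call above computes, the sketch below reproduces it by hand on a toy series. With `adjust=False`, the smoothing factor is `alpha = 2 / (span + 1)` and each value is a weighted blend of the new observation and the previous average. The values and the names `toy`, `manual` and `from_pandas` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

toy = pd.Series([10.0, 12.0, 9.0, 14.0, 11.0])  # illustrative values only
span = 3
alpha = 2 / (span + 1)  # smoothing factor implied by `span`

# Recursion used when adjust=False:
#   y_0 = x_0,  y_t = (1 - alpha) * y_{t-1} + alpha * x_t
manual = [toy.iloc[0]]
for x in toy.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

from_pandas = toy.ewm(span=span, adjust=False).mean()
print(np.allclose(manual, from_pandas))
```

Because every past point keeps some weight, the exponential average has no leading `NaN` values, unlike the fixed-size rolling window.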

### Exercise: plot the moving sum with a window of width 4
Use `rolling` again with the `TotalWine` variable and a window of 4, but this time apply `sum` as the aggregation function and store the result in `rolling_sum`. The code will then plot both the original series and the rolling sum.
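
On quarterly data, a rolling sum over 4 points is a trailing one-year total, and it equals 4 times the rolling mean over the same window. The sketch below checks this on made-up values; the names `toy`, `toy_sum` and `toy_mean` are hypothetical and the numbers are not from the dataset.

```python
import numpy as np
import pandas as pd

# Two years of made-up quarterly values (not the real dataset)
index = pd.to_datetime(
    ["2000-03-31", "2000-06-30", "2000-09-30", "2000-12-31",
     "2001-03-31", "2001-06-30", "2001-09-30", "2001-12-31"]
)
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0], index=index)

toy_sum = toy.rolling(window=4).sum()    # trailing four-quarter total
toy_mean = toy.rolling(window=4).mean()  # trailing four-quarter average

# The two differ only by the constant factor 4 (the window size)
print(np.allclose(toy_sum.dropna(), 4 * toy_mean.dropna()))
```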

In [None]:
# Your code here


In [None]:
plt.figure(figsize=(8, 6))
plt.plot(alcohol_consumption.TotalWine, "-o", label="wine consumption")
plt.plot(rolling_sum, label="rolling sum")
plt.legend(fontsize=12)

### Exercise: custom function

Using `.apply` with a lambda function, we can apply any transformation we like to each window of our data. This is common when creating features from time series data.

Use the `autocorr()` method of `pd.Series` inside `rolling(...).apply(...)` to create a rolling autocorrelation with window size 4, and store the result in `rolling_autocorr`.
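
The general pattern can be sketched on a toy series as follows. The values and the names `toy` and `toy_autocorr` are illustrative assumptions, and the lag of 1 is simply the default of `autocorr`.

```python
import pandas as pd

toy = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0])  # illustrative values only

# raw=False hands each window to the lambda as a pd.Series,
# so Series methods such as autocorr() are available
toy_autocorr = toy.rolling(window=4).apply(lambda w: w.autocorr(lag=1), raw=False)

print(toy_autocorr)
```

With a window of 4, each lag-1 autocorrelation is estimated from only 3 pairs of points, so the resulting curve can be quite noisy.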

In [None]:
# Your code here
plt.figure(figsize=(8, 6))
plt.plot(rolling_autocorr)