### Pandas Lab: Time Shifts & Multi Level Indexing

This lab introduces working with time in a more granular way, and building features when your data has hierarchies or panels, i.e., repeated observations of the same objects over time.

This is an important concept because many statistical methods don't explicitly account for values that are naturally correlated with one another over time.  

But lots of data **is** highly correlated over time!  

By the time you're done with this lab, you'll have built 9 columns that capture a variety of information about how an observed value is changing with respect to itself.

**Question 1:** Create columns in your dataset that capture the following parts of each timestamp:

  - What quarter it's in
  - What month it's in
  - What year it's in
  - The numeric value of the `visit_date` column (i.e., turn it into an integer)

If you want to try adding different pandas date parts, you can find them here:  https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#time-date-components
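
As a starting point, here is a minimal sketch of the `.dt` accessor on a toy frame. Only the column name `visit_date` comes from the lab; the dates and the new column names are made up for illustration. The integer encoding shown (days since 1970-01-01) is just one reasonable choice, not the only one.

```python
import pandas as pd

# Toy data: `visit_date` is the lab's column name, the values are invented
toy = pd.DataFrame(
    {"visit_date": pd.to_datetime(["2016-01-15", "2016-04-01", "2017-12-31"])}
)

# Date parts are exposed through the .dt accessor on a datetime column
toy["quarter"] = toy["visit_date"].dt.quarter
toy["month"] = toy["visit_date"].dt.month
toy["year"] = toy["visit_date"].dt.year

# One possible integer encoding (an assumption): whole days since 1970-01-01
toy["date_int"] = (toy["visit_date"] - pd.Timestamp("1970-01-01")).dt.days

print(toy)
```

If `visit_date` was read in as plain strings, it likely needs to go through `pd.to_datetime` first, since `.dt` only works on datetime-like columns.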

In [None]:
# your answer here

**Question 2:** Set the multi-level index so the first level is the store id, and the second level is the date.  Make sure the date column is sorted in ascending order.  You might have to use the `sort_index(level=0)` method to get the values straight.
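
For reference, a small sketch of how a two-level index can be built and sorted. The column names `id` and `visitors` and all of the values are hypothetical stand-ins for whatever your dataset uses.

```python
import pandas as pd

# Toy data, deliberately out of order; names and values are invented
toy = pd.DataFrame({
    "id": ["b", "a", "b", "a"],
    "visit_date": pd.to_datetime(
        ["2016-01-02", "2016-01-02", "2016-01-01", "2016-01-01"]
    ),
    "visitors": [5, 7, 3, 9],
})

# First level: store id, second level: date.
# sort_index() with no arguments sorts by every level, left to right.
indexed = toy.set_index(["id", "visit_date"]).sort_index()

print(indexed)
```

After sorting, each store's rows sit together with their dates in ascending order, which is what the shift and rolling operations later in the lab rely on.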

In [None]:
# your answer here

**Question 3:** Time Series Embedding

Often, if you're trying to predict the value of something tomorrow, the most important piece of information is its value today, yesterday, and so on.

However, your data won't really "know" about those values unless they can be observed alongside the current observation.  Data is read in as rows, not columns.  

To that end, make three columns that capture the value of the following:

 - What the previous recorded attendance for each restaurant was
 - The attendance from two days ago
 - The attendance from 7 days ago (ie, week over week)
 
**Remember:** This has to be done on a particular level of the index so make sure it's getting applied appropriately!
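
To see why the index level matters, here is a sketch on toy data (the names `id` and `visitors` are hypothetical). It compares a shift applied within each store against a naive shift over the whole column.

```python
import pandas as pd

# Two toy stores with three consecutive days each; values are invented
idx = pd.MultiIndex.from_product(
    [["a", "b"], pd.date_range("2016-01-01", periods=3)],
    names=["id", "visit_date"],
)
toy = pd.DataFrame({"visitors": [1, 2, 3, 10, 20, 30]}, index=idx)

# Shift within each store: the first row of every store has no previous value
toy["prev_visit"] = toy.groupby(level=0)["visitors"].shift(1)

# Naive shift: store b's first row wrongly inherits store a's last value
toy["naive_prev"] = toy["visitors"].shift(1)

print(toy)
```

Note that `shift(n)` moves values by `n` rows, not `n` calendar days. The two only agree when every store has exactly one row per day with no gaps.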

In [None]:
# your answer here

**Question 4:** Window Statistics

Often, we want to capture some idea of momentum, or how a value compares with what's usually observed.

For example, if we had 48 purchases in a store today, how does that number compare to what's happened in the last 14 days?  Are things trending up or trending down?  

This also allows us to get a clearer picture of general trends in values, even if there are irregular daily spikes.

To handle these sorts of issues, pandas provides a family of window statistics through the `rolling` method. It works like this:

In [9]:
# I'll create a sample dataframe with 36 days' worth of values
import numpy as np
import pandas as pd
index = pd.date_range(start='01/01/2020', end='02/05/2020')
sample_df = pd.DataFrame(np.random.randn(36), index=index, columns=['Value'])
# and here's what it looks like
sample_df.head()

Unnamed: 0,Value
2020-01-01,-0.253379
2020-01-02,-0.838158
2020-01-03,-1.131807
2020-01-04,-1.708901
2020-01-05,-0.1963


In [11]:
# and now we'll see rolling 10 day averages
sample_df.rolling(10).mean()

Unnamed: 0,Value
2020-01-01,
2020-01-02,
2020-01-03,
2020-01-04,
2020-01-05,
2020-01-06,
2020-01-07,
2020-01-08,
2020-01-09,
2020-01-10,-0.366059


You can specify the number of observations to calculate, and choose your aggregator -- `mean()`, `min()`, `sum()`, etc, although `mean()` is the most common.
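
A short sketch of swapping aggregators on a tiny series with made-up values. It also shows the `min_periods` argument, which the text above does not cover: it lets a window produce a result before it is completely full.

```python
import pandas as pd

index = pd.date_range(start="2020-01-01", periods=6)
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index)

# Full 3-observation windows: the first two results are NaN
roll_mean = s.rolling(3).mean()   # NaN, NaN, 2.0, 3.0, 4.0, 5.0
roll_max = s.rolling(3).max()     # NaN, NaN, 3.0, 4.0, 5.0, 6.0

# Allow partial windows so the early rows are filled in as well
partial = s.rolling(3, min_periods=1).mean()  # 1.0, 1.5, 2.0, 3.0, 4.0, 5.0

print(pd.DataFrame({"mean": roll_mean, "max": roll_max, "partial": partial}))
```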

**Your Turn:** Calculate the rolling 7, 25, and 60 day moving averages for visits for each restaurant inside the dataset.

**Note:** Do *not* try to merge them back into your dataset yet. Just make sure the values show up, and save them as variables.

And be mindful of performing these on the appropriate levels of your dataset.

In [None]:
# your answer here

If you take a look at the index, you should notice that it has *three* levels to it, and not just two like before.  

Combining datasets with differing numbers of levels is cumbersome, and there's a decent amount of churn in what methods work from one version of Pandas to another.  

For now, try and get these values back into your original dataset by just using the `values` attribute, which will strip away the index and just return the values from the calculations.
 
So as a quick example, it would sort of work like this:

`five_day = df.groupby(level=0)['Visits'].your_stuff_here.values`

Take the values from your previous calculations, and use them to create new columns for each one.
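
Here is a minimal sketch of that pattern on toy data, with a 2-row window so the numbers are easy to check by hand. The index name `id` and the new column name `two_day` are hypothetical; `Visits` follows the example line above.

```python
import pandas as pd

# Two toy stores with four consecutive days each; values are invented
idx = pd.MultiIndex.from_product(
    [["a", "b"], pd.date_range("2016-01-01", periods=4)],
    names=["id", "visit_date"],
)
toy = pd.DataFrame(
    {"Visits": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]}, index=idx
).sort_index()

# Rolling mean within each store; the result carries an extra index level
rolled = toy.groupby(level=0)["Visits"].rolling(2).mean()

# .values drops the index, so the assignment is purely positional
toy["two_day"] = rolled.values

print(toy)
```

Because the assignment is positional, it only lines up if the dataset is already sorted by store and then by date, since the grouped result comes back in group order. It is worth spot-checking a few rows before trusting the new columns.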

In [None]:
# your answer here