### Pandas Lab: Time Shifts & Multi Level Indexing

This lab is designed to introduce you to working with time in a more granular way, and to building features when your data has hierarchies or panels.  

That is, data where you have repeated observations for the same objects.  This is an important concept because lots of statistical methods don't explicitly account for values that are naturally correlated with one another over time.  

But lots of data **is** highly correlated over time!  

By the time you're done with this lab, you'll have built 9 columns that capture a variety of information about how an observed value is changing with respect to itself.

**Question 1:** Set the multi-level index so the first level is the Stock symbol itself, and the second level is the date.  Make sure the date column is sorted in ascending order.  You might have to use the `sort_index(level=0)` method to get the values straight.
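If the mechanics are unfamiliar, here is a minimal sketch on a made-up dataframe. The column names `Symbol`, `Date`, and `Price` are assumptions for illustration, so swap in whatever your dataset actually uses.

```python
import pandas as pd

# Toy data, deliberately out of order; column names are hypothetical
toy = pd.DataFrame({
    'Symbol': ['B', 'A', 'B', 'A'],
    'Date': pd.to_datetime(['2020-01-02', '2020-01-02', '2020-01-01', '2020-01-01']),
    'Price': [20.0, 11.0, 19.0, 10.0],
})

# Symbol becomes level 0 and Date becomes level 1.
# sort_index() with no arguments sorts by every level, in order.
toy_indexed = toy.set_index(['Symbol', 'Date']).sort_index()
print(toy_indexed)
```

After sorting, every symbol's rows sit together with the dates ascending inside each symbol.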

In [None]:
# your answer here

**Question 2:** To capture some other aspects of dates, create columns in your dataset that capture these aspects of each timestamp:

  - What quarter it's in
  - Whether or not it's the last day of the month/quarter
  - What day it is (i.e., do price changes vary by day?)
  
**Hint:** You don't use the `dt` attribute to get date parts from index values.  Multi indices are also a little tricky.  

To get what you want, try this: `df.index.get_level_values(level=1).your_datepart_here`
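Here is a small sketch of that pattern on a toy multi-index, using two date parts the question does *not* ask for, so the exercise is still yours. The names `Symbol`, `Date`, `Price`, `Month`, and `IsMonthStart` are made up for this example.

```python
import pandas as pd

# Toy two-level index: symbol on level 0, date on level 1 (names are hypothetical)
idx = pd.MultiIndex.from_product(
    [['A', 'B'], pd.to_datetime(['2020-03-31', '2020-04-01'])],
    names=['Symbol', 'Date'],
)
toy = pd.DataFrame({'Price': [10.0, 11.0, 20.0, 21.0]}, index=idx)

# get_level_values returns a DatetimeIndex, so date parts are plain
# attributes on it; there is no .dt accessor in between
dates = toy.index.get_level_values(1)
toy['Month'] = dates.month
toy['IsMonthStart'] = dates.is_month_start
print(toy)
```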

In [None]:
# your answer here

**Question 3:** Time Series Embedding

Lots of times if you're trying to predict the value of something tomorrow, the most important piece of information is what the value of something is today, and yesterday, and so on.

Try and create columns that capture previously observed values for each stock.  

Make two columns that capture the value of the following:

 - What the previous recorded price for each stock was
 - The stock price from two observations ago
 
**Remember:** This has to be done on a particular level of the index to make sure it's getting applied appropriately!
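To see why the level matters, here is a rough sketch on toy data (the names `Symbol`, `Date`, `Price`, and `PrevPrice` are assumptions). It uses `shift` inside a `groupby` on the first index level and compares that with a plain `shift`, which ignores the stock boundaries.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [['A', 'B'], pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03'])],
    names=['Symbol', 'Date'],
)
toy = pd.DataFrame({'Price': [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]}, index=idx)

# Shifting within each symbol: the first row of every stock has no previous value
toy['PrevPrice'] = toy.groupby(level=0)['Price'].shift(1)

# A plain shift leaks stock A's last price into stock B's first row
naive = toy['Price'].shift(1)
print(toy)
print(naive)
```

In the grouped version the first row for stock B is missing, as it should be. In the naive version it holds stock A's final price.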

In [None]:
# your answer here

**Question 4:** How did each stock price change compared to the S&P 500? 

Lots of times it's useful to see how the item you're trying to track moves relative to something else.  

In the data folder is a file called `s&p.csv`, and it contains the price history of the S&P 500 index for each day since its inception. See if you can load it and merge the `adj close` column into your dataset, so there's a column that displays the observed value of the index for every single price observation we have in our dataset.

**Hints:**
 - Merging on multi-level indices is tricky and prone to failure.  To make this a little bit easier, just use `reset_index()` to pop out the date column in the multi-index, and merge on it as if it were a regular column.
 - Make sure both date columns are actually encoded as dates, rather than strings, or else the merge won't work.
 - You'll want to go back to the multi-level index when you're done with this step.
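The three hints above can be sketched roughly like this. Everything here is a stand-in: the benchmark frame is built by hand instead of read from `s&p.csv`, its numbers are invented, and the names `Symbol`, `Date`, `Price`, and `IndexClose` are hypothetical.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [['A', 'B'], pd.to_datetime(['2020-01-01', '2020-01-02'])],
    names=['Symbol', 'Date'],
)
stocks = pd.DataFrame({'Price': [10.0, 11.0, 20.0, 21.0]}, index=idx)

# Stand-in for the index file, with invented values.
# The dates start out as strings, as they often do straight after read_csv.
benchmark = pd.DataFrame({
    'Date': ['2020-01-01', '2020-01-02'],
    'IndexClose': [100.0, 101.0],
})
benchmark['Date'] = pd.to_datetime(benchmark['Date'])  # strings -> real dates

flat = stocks.reset_index()                              # pop the index out into columns
merged = flat.merge(benchmark, on='Date', how='left')    # keep every stock row
merged = merged.set_index(['Symbol', 'Date']).sort_index()  # back to the multi-index
print(merged)
```

A left merge is used here so that no stock observation is dropped if the benchmark happens to be missing a date; whether that is the right choice for your data is a judgment call.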

In [None]:
# your answer here

**Question 5:** Window Statistics

Lots of times, we want to capture some idea of momentum, or how some value compares with what's usually observed.

I.e., if we had 48 purchases in a store today, how does that number compare to what's happened in the last 14 days?  Are things trending up or trending down?  

This also allows us to get a clearer picture of general trends in values, even if there are irregular daily spikes.

To handle these sorts of issues, pandas has an entire section for calculating window statistics, accessed through the `rolling` method. It works like this:

In [9]:
# I'll create a sample dataframe with 36 days' worth of values
import numpy as np
import pandas as pd
index = pd.date_range(start='01/01/2020', end='02/05/2020')
sample_df = pd.DataFrame(np.random.randn(36), index=index, columns=['Value'])
# and here's what it looks like
sample_df.head()

| | Value |
|---|---|
| 2020-01-01 | -0.253379 |
| 2020-01-02 | -0.838158 |
| 2020-01-03 | -1.131807 |
| 2020-01-04 | -1.708901 |
| 2020-01-05 | -0.1963 |


In [11]:
# and now we'll see rolling 10 day averages
sample_df.rolling(10).mean()

| | Value |
|---|---|
| 2020-01-01 | NaN |
| 2020-01-02 | NaN |
| 2020-01-03 | NaN |
| 2020-01-04 | NaN |
| 2020-01-05 | NaN |
| 2020-01-06 | NaN |
| 2020-01-07 | NaN |
| 2020-01-08 | NaN |
| 2020-01-09 | NaN |
| 2020-01-10 | -0.366059 |


You can specify the number of observations to calculate, and choose your aggregator -- `mean()`, `min()`, `sum()`, etc, although `mean()` is the most common.
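As a quick illustration of swapping aggregators, here is a small sketch on a hand-made series with a 3-observation window. The column names `mean_3`, `min_3`, and `sum_3` are just labels for this example.

```python
import pandas as pd

s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range('2020-01-01', periods=5),
)

# Same window, three different aggregators
windows = pd.DataFrame({
    'mean_3': s.rolling(3).mean(),
    'min_3': s.rolling(3).min(),
    'sum_3': s.rolling(3).sum(),
})
print(windows)
```

The first two rows are empty in every column, because a 3-observation window has nothing to report until it has seen three values.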

**Your Turn:** Calculate the rolling 5 & 10 day moving averages for each stock inside the dataset.

**Note:** Do *not* try and merge them back into your dataset yet, just make sure you have the values showing up.

In [None]:
# your answer here

If you take a look at the index, you should notice that it has *three* levels to it, and not just two like before.  

Combining datasets with differing numbers of levels is cumbersome, and there's a decent amount of churn in what methods work from one version of Pandas to another.  

For now, try and get these values back into your original dataset by taking the following steps:

 - calculate the 5 & 10 day rolling averages for each stock price on the multilevel index and save them as variables. Then use the `values` attribute on each one to drop the index and keep just the column values (ask me about this if you have questions)
 - use `reset_index()` to flatten the index on your original dataframe into regular columns
 - create new columns for the 5 & 10 day moving averages in the original dataset, using the values from the first step.
 
So as a quick example, it would sort of work like this:

`five_day = df.groupby(level=0)['Price'].your_stuff_here.values`

And then use this as the basis to make your new column from your original dataframe with the reset index.
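Here is a rough end-to-end sketch of those steps on toy data, using a 2-observation window so the numbers are easy to check by hand. The names `Symbol`, `Date`, `Price`, and `MA2` are assumptions, and the approach relies on the rows already being sorted by symbol and then date.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [['A', 'B'], pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03'])],
    names=['Symbol', 'Date'],
)
toy = pd.DataFrame(
    {'Price': [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]}, index=idx
).sort_index()

# Rolling inside each symbol; the result carries an extra index level
two_obs = toy.groupby(level=0)['Price'].rolling(2).mean()

# Flatten the original frame, then assign by position.
# .values drops the index, so this is only safe because the rows are sorted
# by symbol then date, which matches the order groupby returns.
flat = toy.reset_index()
flat['MA2'] = two_obs.values
print(flat)
```

Because the assignment is positional, it is worth eyeballing a few rows afterwards to confirm each average sits next to the right stock and date.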

In [None]:
# your answer here