### Pandas Lab: Time Shifts & Multi Level Indexing

This lab introduces you to working with time in a more granular way, and to building features when your data has hierarchies or panels.  

I.e., when you have repeated observations for the same objects.  This is an important concept because lots of statistical methods don't explicitly account for values that might naturally be correlated with one another over time.  

But lots of data **is** highly correlated over time!  

By the time you're done with this lab, you'll have built 9 columns that capture a variety of information about how an observed value is changing with respect to itself.

In [48]:
import numpy as np
import pandas as pd
from datetime import datetime

In [95]:
df = pd.read_csv("/Users/PRSmb/OneDrive/General-Assembly/my-1019-repo/ClassMaterial/Unit2/data/restaurants.csv", parse_dates=['visit_date'])

In [96]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 252108 entries, 0 to 252107
Data columns (total 11 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   id                252108 non-null  object        
 1   visit_date        252108 non-null  datetime64[ns]
 2   visitors          252108 non-null  int64         
 3   calendar_date     252108 non-null  object        
 4   day_of_week       252108 non-null  object        
 5   holiday           252108 non-null  int64         
 6   genre             252108 non-null  object        
 7   area              252108 non-null  object        
 8   latitude          252108 non-null  float64       
 9   longitude         252108 non-null  float64       
 10  reserve_visitors  108394 non-null  float64       
dtypes: datetime64[ns](1), float64(3), int64(2), object(5)
memory usage: 21.2+ MB


**Question 1:** To capture some other aspects of dates, create columns in your dataset that capture the following aspects of each timestamp:

  - What quarter it's in
  - What month it's in
  - What year it's in
  - The numeric value of the `visit_date` column (i.e., turn it into an integer)

If you want to try adding different pandas date parts, you can find them here:  https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#time-date-components

In [None]:
# your answer here

In [51]:
df['year'] = df.visit_date.dt.year
df['quarter'] = df.visit_date.dt.quarter
df['month'] = df.visit_date.dt.month

In [52]:
# Vectorized: format each date as YYYYMMDD text, then cast to a 64-bit integer
df['date_as_int'] = df['visit_date'].dt.strftime('%Y%m%d').astype('int64')

In [53]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 252108 entries, 0 to 252107
Data columns (total 15 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   id                252108 non-null  object        
 1   visit_date        252108 non-null  datetime64[ns]
 2   visitors          252108 non-null  int64         
 3   calendar_date     252108 non-null  object        
 4   day_of_week       252108 non-null  object        
 5   holiday           252108 non-null  int64         
 6   genre             252108 non-null  object        
 7   area              252108 non-null  object        
 8   latitude          252108 non-null  float64       
 9   longitude         252108 non-null  float64       
 10  reserve_visitors  108394 non-null  float64       
 11  year              252108 non-null  int64         
 12  quarter           252108 non-null  int64         
 13  month             252108 non-null  int64         
 14  date

### Jonathan's answers Q1:

In [99]:
df['month'] = df['visit_date'].dt.month
df['quarter'] = df['visit_date'].dt.quarter
df['year'] = df['visit_date'].dt.year



In [105]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 252108 entries, 0 to 252107
Data columns (total 14 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   id                252108 non-null  object        
 1   visit_date        252108 non-null  datetime64[ns]
 2   visitors          252108 non-null  int64         
 3   calendar_date     252108 non-null  object        
 4   day_of_week       252108 non-null  object        
 5   holiday           252108 non-null  int64         
 6   genre             252108 non-null  object        
 7   area              252108 non-null  object        
 8   latitude          252108 non-null  float64       
 9   longitude         252108 non-null  float64       
 10  reserve_visitors  108394 non-null  float64       
 11  month             252108 non-null  int64         
 12  quarter           252108 non-null  int64         
 13  year              252108 non-null  int64         
dtypes: d

In [104]:
df.drop('ear',inplace=True,axis=1)

In [107]:
df.head(25)

Unnamed: 0,id,visit_date,visitors,calendar_date,day_of_week,holiday,genre,area,latitude,longitude,reserve_visitors,month,quarter,year
0,air_ba937bf13d40fb24,2016-01-13,25,2016-01-13,Wednesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
1,air_ba937bf13d40fb24,2016-01-14,32,2016-01-14,Thursday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
2,air_ba937bf13d40fb24,2016-01-15,29,2016-01-15,Friday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
3,air_ba937bf13d40fb24,2016-01-16,22,2016-01-16,Saturday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
4,air_ba937bf13d40fb24,2016-01-18,6,2016-01-18,Monday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
5,air_ba937bf13d40fb24,2016-01-19,9,2016-01-19,Tuesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
6,air_ba937bf13d40fb24,2016-01-20,31,2016-01-20,Wednesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
7,air_ba937bf13d40fb24,2016-01-21,21,2016-01-21,Thursday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
8,air_ba937bf13d40fb24,2016-01-22,18,2016-01-22,Friday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
9,air_ba937bf13d40fb24,2016-01-23,26,2016-01-23,Saturday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016


In [110]:
df['visit_date'].astype(np.int64) 

# Expresses the date as Unix time!
# Counts nanoseconds (not milliseconds) since 1 Jan 1970
# https://en.wikipedia.org/wiki/Unix_time

# A 64-bit integer is needed because the values are so large

0         1452643200000000000
1         1452729600000000000
2         1452816000000000000
3         1452902400000000000
4         1453075200000000000
                 ...         
252103    1492732800000000000
252104    1492819200000000000
252105    1490486400000000000
252106    1489968000000000000
252107    1491696000000000000
Name: visit_date, Length: 252108, dtype: int64
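
To make the integer representation concrete, here is a minimal sketch on two made-up dates (the `dates` Series is a toy, not the lab data). It assumes nanosecond-resolution timestamps, which is what the output above shows, and demonstrates that the conversion can be reversed with `pd.to_datetime`.

```python
import pandas as pd

# Hypothetical toy dates; forced to nanosecond resolution to match the lab output
dates = pd.Series(pd.to_datetime(["2016-01-13", "2016-01-14"])).astype("datetime64[ns]")

# Nanoseconds since 1970-01-01 (the Unix epoch)
as_ns = dates.astype("int64")

# Integer division by 10**9 gives whole seconds since the epoch
as_seconds = as_ns // 10**9

# pd.to_datetime treats plain integers as nanoseconds by default
round_trip = pd.to_datetime(as_ns)

print(as_ns.tolist())
print(as_seconds.tolist())
print(round_trip.tolist())
```

Dividing down to seconds (or days) can be handy if a model struggles with very large feature values, though whether that matters depends on the model.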

**Question 2:** Set the multi-level index so the first level is the store id, and the second level is the date.  Make sure the date column is sorted in ascending order.  You might have to use the `sort_index(level=0)` method to get the values straight.
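
As a rough sketch of the idea on a tiny, made-up frame (the `toy` data and its values are assumptions, only the column names mirror the lab), setting a two-level index and sorting it might look like this:

```python
import pandas as pd

# Hypothetical toy data, deliberately out of order
toy = pd.DataFrame({
    "id": ["b", "a", "b", "a"],
    "visit_date": pd.to_datetime(
        ["2016-01-02", "2016-01-02", "2016-01-01", "2016-01-01"]
    ),
    "visitors": [5, 7, 3, 9],
})

# First level: id, second level: visit_date; sort_index() orders by both levels
indexed = toy.set_index(["id", "visit_date"]).sort_index()

# A quick check that the index really is in ascending order
is_sorted = indexed.index.is_monotonic_increasing

print(indexed)
print(is_sorted)
```

Checking `is_monotonic_increasing` is a cheap way to confirm the sort before relying on row order in later steps.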

In [None]:
# your answer here

In [54]:
df.set_index(['id','visit_date'],inplace=True)

In [55]:
df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,visitors,calendar_date,day_of_week,holiday,genre,area,latitude,longitude,reserve_visitors,year,quarter,month,date_as_int
id,visit_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
air_ba937bf13d40fb24,2016-01-13,25,2016-01-13,Wednesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160113
air_ba937bf13d40fb24,2016-01-14,32,2016-01-14,Thursday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160114
air_ba937bf13d40fb24,2016-01-15,29,2016-01-15,Friday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160115
air_ba937bf13d40fb24,2016-01-16,22,2016-01-16,Saturday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160116
air_ba937bf13d40fb24,2016-01-18,6,2016-01-18,Monday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160118


### Jonathan's answers Q2:

In [117]:
df.set_index(['id','visit_date'], inplace=True)

In [118]:
df

Unnamed: 0_level_0,Unnamed: 1_level_0,visitors,calendar_date,day_of_week,holiday,genre,area,latitude,longitude,reserve_visitors,month,quarter,year
id,visit_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
air_ba937bf13d40fb24,2016-01-13,25,2016-01-13,Wednesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
air_ba937bf13d40fb24,2016-01-14,32,2016-01-14,Thursday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
air_ba937bf13d40fb24,2016-01-15,29,2016-01-15,Friday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
air_ba937bf13d40fb24,2016-01-16,22,2016-01-16,Saturday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
air_ba937bf13d40fb24,2016-01-18,6,2016-01-18,Monday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,1,1,2016
...,...,...,...,...,...,...,...,...,...,...,...,...,...
air_a17f0778617c76e2,2017-04-21,49,2017-04-21,Friday,0,Italian/French,Hyōgo-ken Kōbe-shi Kumoidōri,34.695124,135.197852,6.0,4,2,2017
air_a17f0778617c76e2,2017-04-22,60,2017-04-22,Saturday,0,Italian/French,Hyōgo-ken Kōbe-shi Kumoidōri,34.695124,135.197852,37.0,4,2,2017
air_a17f0778617c76e2,2017-03-26,69,2017-03-26,Sunday,0,Italian/French,Hyōgo-ken Kōbe-shi Kumoidōri,34.695124,135.197852,35.0,3,1,2017
air_a17f0778617c76e2,2017-03-20,31,2017-03-20,Monday,1,Italian/French,Hyōgo-ken Kōbe-shi Kumoidōri,34.695124,135.197852,3.0,3,1,2017


In [120]:
df.sort_index(level=[0,1],inplace=True)

In [121]:
df.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,visitors,calendar_date,day_of_week,holiday,genre,area,latitude,longitude,reserve_visitors,month,quarter,year
id,visit_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
air_00a91d42b08b08d9,2016-07-01,35,2016-07-01,Friday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016
air_00a91d42b08b08d9,2016-07-02,9,2016-07-02,Saturday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,4.0,7,3,2016
air_00a91d42b08b08d9,2016-07-04,20,2016-07-04,Monday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016
air_00a91d42b08b08d9,2016-07-05,25,2016-07-05,Tuesday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016
air_00a91d42b08b08d9,2016-07-06,29,2016-07-06,Wednesday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016
air_00a91d42b08b08d9,2016-07-07,34,2016-07-07,Thursday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016
air_00a91d42b08b08d9,2016-07-08,42,2016-07-08,Friday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016
air_00a91d42b08b08d9,2016-07-09,11,2016-07-09,Saturday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016
air_00a91d42b08b08d9,2016-07-11,25,2016-07-11,Monday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016
air_00a91d42b08b08d9,2016-07-12,24,2016-07-12,Tuesday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016


**Question 3:** Time Series Embedding

Lots of times if you're trying to predict the value of something tomorrow, the most important piece of information is what the value of something is today, and yesterday, and so on.

However, your data won't really "know" about those values unless they can be observed alongside the current observation.  Data is read in as rows, not columns.  

To that end, make three columns that capture the value of the following:

 - What the previous recorded attendance for each restaurant was
 - The attendance from two days ago
 - The attendance from 7 days ago (i.e., week over week)
 
**Remember:** This has to be done on a particular level of the index so make sure it's getting applied appropriately!
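
To illustrate why the index level matters, here is a small sketch on made-up data (the `toy` frame and the column names `prev_visit` and `naive_prev` are hypothetical). It compares a per-restaurant shift with a plain shift that ignores the grouping.

```python
import pandas as pd

# Hypothetical toy panel: restaurant "a" has 3 visits, "b" has 2
toy = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "visit_date": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2016-01-04", "2016-01-01", "2016-01-02"]
    ),
    "visitors": [10, 20, 30, 1, 2],
}).set_index(["id", "visit_date"]).sort_index()

# Shift within each restaurant: the first row of every group becomes NaN
toy["prev_visit"] = toy.groupby(level=0)["visitors"].shift(1)

# Plain shift ignores the groups: "b" inherits the last value of "a"
toy["naive_prev"] = toy["visitors"].shift(1)

print(toy)
```

Note that `shift` moves by rows, not calendar days: for restaurant "a", the previous value for 2016-01-04 comes from 2016-01-02 because 2016-01-03 has no record.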

In [None]:
# your answer here

In [59]:
df['visitors_yesterday'] = df.groupby(level=0)['visitors'].shift()

In [61]:
df['visitors_2daysago'] = df.groupby(level=0)['visitors'].shift(2)

In [62]:
df['visitors_7daysago'] = df.groupby(level=0)['visitors'].shift(7)

In [63]:
df.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 252108 entries, ('air_ba937bf13d40fb24', Timestamp('2016-01-13 00:00:00')) to ('air_a17f0778617c76e2', Timestamp('2017-04-09 00:00:00'))
Data columns (total 16 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   visitors            252108 non-null  int64  
 1   calendar_date       252108 non-null  object 
 2   day_of_week         252108 non-null  object 
 3   holiday             252108 non-null  int64  
 4   genre               252108 non-null  object 
 5   area                252108 non-null  object 
 6   latitude            252108 non-null  float64
 7   longitude           252108 non-null  float64
 8   reserve_visitors    108394 non-null  float64
 9   year                252108 non-null  int64  
 10  quarter             252108 non-null  int64  
 11  month               252108 non-null  int64  
 12  date_as_int         252108 non-null  int64  
 13  visitors_yesterday  25127

In [65]:
df.head(15)

Unnamed: 0_level_0,Unnamed: 1_level_0,visitors,calendar_date,day_of_week,holiday,genre,area,latitude,longitude,reserve_visitors,year,quarter,month,date_as_int,visitors_yesterday,visitors_2daysago,visitors_7daysago
id,visit_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
air_ba937bf13d40fb24,2016-01-13,25,2016-01-13,Wednesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160113,,,
air_ba937bf13d40fb24,2016-01-14,32,2016-01-14,Thursday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160114,25.0,,
air_ba937bf13d40fb24,2016-01-15,29,2016-01-15,Friday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160115,32.0,25.0,
air_ba937bf13d40fb24,2016-01-16,22,2016-01-16,Saturday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160116,29.0,32.0,
air_ba937bf13d40fb24,2016-01-18,6,2016-01-18,Monday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160118,22.0,29.0,
air_ba937bf13d40fb24,2016-01-19,9,2016-01-19,Tuesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160119,6.0,22.0,
air_ba937bf13d40fb24,2016-01-20,31,2016-01-20,Wednesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160120,9.0,6.0,
air_ba937bf13d40fb24,2016-01-21,21,2016-01-21,Thursday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160121,31.0,9.0,25.0
air_ba937bf13d40fb24,2016-01-22,18,2016-01-22,Friday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160122,21.0,31.0,32.0
air_ba937bf13d40fb24,2016-01-23,26,2016-01-23,Saturday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160123,18.0,21.0,29.0


### Jonathan's answers Q3:

In [None]:
# Key point is that we're doing the shift on a specific group

In [126]:
df.groupby(level=0)['visitors'].shift()

id                    visit_date
air_00a91d42b08b08d9  2016-07-01     NaN
                      2016-07-02    35.0
                      2016-07-04     9.0
                      2016-07-05    20.0
                      2016-07-06    25.0
                                    ... 
air_fff68b929994bfbd  2017-04-18     3.0
                      2017-04-19     6.0
                      2017-04-20     2.0
                      2017-04-21     2.0
                      2017-04-22     4.0
Name: visitors, Length: 252108, dtype: float64

In [127]:
df.groupby(level=0)['visitors'].shift().loc['air_fff68b929994bfbd']

visit_date
2016-07-01    NaN
2016-07-02    3.0
2016-07-05    3.0
2016-07-06    7.0
2016-07-07    6.0
             ... 
2017-04-18    3.0
2017-04-19    6.0
2017-04-20    2.0
2017-04-21    2.0
2017-04-22    4.0
Name: visitors, Length: 269, dtype: float64

In [128]:
df['visitors1'] = df.groupby(level=0)['visitors'].shift()
df['visitors2'] = df.groupby(level=0)['visitors'].shift(2)
df['visitors7'] = df.groupby(level=0)['visitors'].shift(7)

In [131]:
df.loc['air_fff68b929994bfbd'].head(10)

Unnamed: 0_level_0,visitors,calendar_date,day_of_week,holiday,genre,area,latitude,longitude,reserve_visitors,month,quarter,year,visitors1,visitors2,visitors7
visit_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
2016-07-01,3,2016-07-01,Friday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,,,
2016-07-02,3,2016-07-02,Saturday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,4.0,7,3,2016,3.0,,
2016-07-05,7,2016-07-05,Tuesday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,3.0,3.0,
2016-07-06,6,2016-07-06,Wednesday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,7.0,3.0,
2016-07-07,1,2016-07-07,Thursday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,6.0,7.0,
2016-07-08,5,2016-07-08,Friday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,1.0,6.0,
2016-07-09,6,2016-07-09,Saturday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,5.0,1.0,
2016-07-11,5,2016-07-11,Monday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,6.0,5.0,3.0
2016-07-12,7,2016-07-12,Tuesday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,5.0,6.0,3.0
2016-07-13,2,2016-07-13,Wednesday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,7.0,5.0,7.0


**Question 4:** Window Statistics

Lots of times, we want to capture some idea of momentum, or how some value compares with what's usually observed.

I.e., if we had 48 purchases in a store today, how does that number compare to what's happened in the last 14 days?  Are things trending up or trending down?  

This also allows us to get a clearer picture of general trends in values, even if there are irregular daily spikes.

To handle these sorts of issues, pandas has an entire set of window statistics available through the `rolling` method.  It works like this:

In [9]:
# I'll create a sample dataframe with 36 days' worth of values
import numpy as np
index = pd.date_range(start='01/01/2020', end='02/05/2020')
sample_df = pd.DataFrame(np.random.randn(36), index=index, columns=['Value'])
# and here's what it looks like
sample_df.head()

Unnamed: 0,Value
2020-01-01,-0.253379
2020-01-02,-0.838158
2020-01-03,-1.131807
2020-01-04,-1.708901
2020-01-05,-0.1963


In [11]:
# and now we'll see rolling 10 day averages
sample_df.rolling(10).mean()

Unnamed: 0,Value
2020-01-01,
2020-01-02,
2020-01-03,
2020-01-04,
2020-01-05,
2020-01-06,
2020-01-07,
2020-01-08,
2020-01-09,
2020-01-10,-0.366059


You can specify the number of observations to calculate, and choose your aggregator -- `mean()`, `min()`, `sum()`, etc., although `mean()` is the most common.

**Your Turn:** Calculate the rolling 7, 25, and 60 day moving averages for visits for each restaurant inside the dataset.

**Note:** Do *not* try and merge them back into your dataset yet, just make sure you have the values showing up and save them as variables

And be mindful of performing these on the appropriate levels of your dataset.
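
A minimal sketch of a per-group rolling mean, using a made-up frame and a window of 2 so the numbers are easy to check by hand (the `toy` data is an assumption, not the lab data):

```python
import pandas as pd

# Hypothetical toy panel: restaurant "a" has 4 visits, "b" has 3
toy = pd.DataFrame({
    "id": ["a", "a", "a", "a", "b", "b", "b"],
    "visit_date": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04",
         "2016-01-01", "2016-01-02", "2016-01-03"]
    ),
    "visitors": [2, 4, 6, 8, 10, 20, 30],
}).set_index(["id", "visit_date"]).sort_index()

# The window restarts for every restaurant, so the first row of each group is NaN
rolled = toy.groupby(level=0)["visitors"].rolling(2).mean()

print(rolled)
print(rolled.index.nlevels)
```

As with `shift`, an integer window counts observations rather than calendar days, so gaps in the dates are not accounted for.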

In [None]:
# your answer here

In [75]:
ma_visitors_7 = df.groupby(level=0)['visitors'].rolling(7).mean()
ma_visitors_25 = df.groupby(level=0)['visitors'].rolling(25).mean()
ma_visitors_60 = df.groupby(level=0)['visitors'].rolling(60).mean()

In [74]:
ma_visitors_7.head(10)

id                    id                    visit_date
air_00a91d42b08b08d9  air_00a91d42b08b08d9  2016-07-01          NaN
                                            2016-07-02          NaN
                                            2016-07-04          NaN
                                            2016-07-05          NaN
                                            2016-07-06          NaN
                                            2016-07-07          NaN
                                            2016-07-08    27.714286
                                            2016-07-09    24.285714
                                            2016-07-11    26.571429
                                            2016-07-12    27.142857
Name: visitors, dtype: float64

In [76]:
ma_visitors_25[:40]

id                    id                    visit_date
air_00a91d42b08b08d9  air_00a91d42b08b08d9  2016-07-01      NaN
                                            2016-07-02      NaN
                                            2016-07-04      NaN
                                            2016-07-05      NaN
                                            2016-07-06      NaN
                                            2016-07-07      NaN
                                            2016-07-08      NaN
                                            2016-07-09      NaN
                                            2016-07-11      NaN
                                            2016-07-12      NaN
                                            2016-07-13      NaN
                                            2016-07-14      NaN
                                            2016-07-15      NaN
                                            2016-07-16      NaN
                                            2016-

In [81]:
ma_visitors_60[58:70]

id                    id                    visit_date
air_00a91d42b08b08d9  air_00a91d42b08b08d9  2016-09-16          NaN
                                            2016-09-17    24.600000
                                            2016-09-20    24.200000
                                            2016-09-21    24.616667
                                            2016-09-23    24.766667
                                            2016-09-24    24.666667
                                            2016-09-26    24.483333
                                            2016-09-27    24.250000
                                            2016-09-28    24.016667
                                            2016-09-29    24.183333
                                            2016-09-30    24.450000
                                            2016-10-01    24.983333
Name: visitors, dtype: float64

If you take a look at the index, you should notice that it has *three* levels to it, and not just two like before.  

Combining datasets with differing numbers of levels is cumbersome, and there's a decent amount of churn in what methods work from one version of Pandas to another.  

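For now, try and get these values back into your original dataset by just using the `values` attribute, which will strip away the index and just return the values from the calculations.  Because this assigns by position rather than by label, it only lines up correctly when your dataset's rows are already sorted by store id and then date, as in Question 2.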
 
So as a quick example, it would sort of work like this:

`five_day = df.groupby(level=0)['visitors'].your_stuff_here.values`

Take the values from your previous calculations, and use them to create new columns for each one.
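
As a hedged aside, here is a sketch of what can go wrong with positional assignment, plus one label-based alternative. The `toy` frame is made up and deliberately left unsorted, and the column names `by_position` and `by_label` are hypothetical. Dropping the duplicated outer index level with `droplevel(0)` lets pandas align on `(id, visit_date)` instead of row position.

```python
import pandas as pd

# Hypothetical toy panel, deliberately NOT sorted by id
toy = pd.DataFrame({
    "id": ["b", "b", "a", "a"],
    "visit_date": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2016-01-01", "2016-01-02"]
    ),
    "visitors": [100, 200, 1, 3],
}).set_index(["id", "visit_date"])

# The grouped result comes back ordered by group key: "a" first, then "b"
rolled = toy.groupby(level=0)["visitors"].rolling(2).mean()

# Positional assignment: restaurant "b" silently receives the values of "a"
toy["by_position"] = rolled.values

# Label-based assignment: drop the extra outer level and align on the index
toy["by_label"] = rolled.droplevel(0)

print(toy)
```

On a frame that is already sorted by id and date the two columns agree, so `.values` is fine there; the label-based version is simply less sensitive to row order.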

In [82]:
# your answer here

vma_visitors_7 = df.groupby(level=0)['visitors'].rolling(7).mean().values
vma_visitors_25 = df.groupby(level=0)['visitors'].rolling(25).mean().values
vma_visitors_60 = df.groupby(level=0)['visitors'].rolling(60).mean().values

In [84]:
vma_visitors_7

array([       nan,        nan,        nan, ..., 4.        , 4.14285714,
       3.85714286])

In [85]:
df['ma_7_days'] = vma_visitors_7
df['ma_25_days'] = vma_visitors_25
df['ma_60_days'] = vma_visitors_60

In [86]:
df

Unnamed: 0_level_0,Unnamed: 1_level_0,visitors,calendar_date,day_of_week,holiday,genre,area,latitude,longitude,reserve_visitors,year,quarter,month,date_as_int,visitors_yesterday,visitors_2daysago,visitors_7daysago,ma_7_days,ma_25_days,ma_60_days
id,visit_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
air_ba937bf13d40fb24,2016-01-13,25,2016-01-13,Wednesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160113,,,,,,
air_ba937bf13d40fb24,2016-01-14,32,2016-01-14,Thursday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160114,25.0,,,,,
air_ba937bf13d40fb24,2016-01-15,29,2016-01-15,Friday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160115,32.0,25.0,,,,
air_ba937bf13d40fb24,2016-01-16,22,2016-01-16,Saturday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160116,29.0,32.0,,,,
air_ba937bf13d40fb24,2016-01-18,6,2016-01-18,Monday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160118,22.0,29.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
air_a17f0778617c76e2,2017-04-21,49,2017-04-21,Friday,0,Italian/French,Hyōgo-ken Kōbe-shi Kumoidōri,34.695124,135.197852,6.0,2017,2,4,20170421,22.0,25.0,88.0,3.285714,3.96,4.383333
air_a17f0778617c76e2,2017-04-22,60,2017-04-22,Saturday,0,Italian/French,Hyōgo-ken Kōbe-shi Kumoidōri,34.695124,135.197852,37.0,2017,2,4,20170422,49.0,22.0,61.0,4.000000,4.00,4.433333
air_a17f0778617c76e2,2017-03-26,69,2017-03-26,Sunday,0,Italian/French,Hyōgo-ken Kōbe-shi Kumoidōri,34.695124,135.197852,35.0,2017,1,3,20170326,60.0,49.0,26.0,4.000000,3.96,4.333333
air_a17f0778617c76e2,2017-03-20,31,2017-03-20,Monday,1,Italian/French,Hyōgo-ken Kōbe-shi Kumoidōri,34.695124,135.197852,3.0,2017,1,3,20170320,69.0,60.0,19.0,4.142857,3.80,4.233333


In [88]:
df.head(60)

Unnamed: 0_level_0,Unnamed: 1_level_0,visitors,calendar_date,day_of_week,holiday,genre,area,latitude,longitude,reserve_visitors,year,quarter,month,date_as_int,visitors_yesterday,visitors_2daysago,visitors_7daysago,ma_7_days,ma_25_days,ma_60_days
id,visit_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
air_ba937bf13d40fb24,2016-01-13,25,2016-01-13,Wednesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160113,,,,,,
air_ba937bf13d40fb24,2016-01-14,32,2016-01-14,Thursday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160114,25.0,,,,,
air_ba937bf13d40fb24,2016-01-15,29,2016-01-15,Friday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160115,32.0,25.0,,,,
air_ba937bf13d40fb24,2016-01-16,22,2016-01-16,Saturday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160116,29.0,32.0,,,,
air_ba937bf13d40fb24,2016-01-18,6,2016-01-18,Monday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160118,22.0,29.0,,,,
air_ba937bf13d40fb24,2016-01-19,9,2016-01-19,Tuesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160119,6.0,22.0,,,,
air_ba937bf13d40fb24,2016-01-20,31,2016-01-20,Wednesday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160120,9.0,6.0,,27.714286,,
air_ba937bf13d40fb24,2016-01-21,21,2016-01-21,Thursday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160121,31.0,9.0,25.0,24.285714,,
air_ba937bf13d40fb24,2016-01-22,18,2016-01-22,Friday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160122,21.0,31.0,32.0,26.571429,,
air_ba937bf13d40fb24,2016-01-23,26,2016-01-23,Saturday,0,Dining bar,Tōkyō-to Minato-ku Shibakōen,35.658068,139.751599,,2016,1,1,20160123,18.0,21.0,29.0,27.142857,,


### Jonathan's answers Q4:

In [134]:
df.groupby(level=0)['visitors'].rolling(7).mean()

id                    id                    visit_date
air_00a91d42b08b08d9  air_00a91d42b08b08d9  2016-07-01         NaN
                                            2016-07-02         NaN
                                            2016-07-04         NaN
                                            2016-07-05         NaN
                                            2016-07-06         NaN
                                                            ...   
air_fff68b929994bfbd  air_fff68b929994bfbd  2017-04-18    5.000000
                                            2017-04-19    4.428571
                                            2017-04-20    4.571429
                                            2017-04-21    4.428571
                                            2017-04-22    4.142857
Name: visitors, Length: 252108, dtype: float64

In [138]:
df.groupby(level=0)['visitors'].rolling(7).mean().loc['air_fff68b929994bfbd'].head(20)

id                    visit_date
air_fff68b929994bfbd  2016-07-01         NaN
                      2016-07-02         NaN
                      2016-07-05         NaN
                      2016-07-06         NaN
                      2016-07-07         NaN
                      2016-07-08         NaN
                      2016-07-09    4.428571
                      2016-07-11    4.714286
                      2016-07-12    5.285714
                      2016-07-13    4.571429
                      2016-07-14    5.285714
                      2016-07-15    5.571429
                      2016-07-16    6.000000
                      2016-07-17    5.857143
                      2016-07-19    6.000000
                      2016-07-20    6.000000
                      2016-07-21    6.714286
                      2016-07-22    5.428571
                      2016-07-23    6.142857
                      2016-07-24    5.428571
Name: visitors, dtype: float64

In [139]:
# This produces an error since the indexes don't match
df['rolling7'] = df.groupby(level=0)['visitors'].rolling(7).mean()

TypeError: incompatible index of inserted column with frame index

In [140]:
# So use .values to get the data in numpy array

df['rolling7'] = df.groupby(level=0)['visitors'].rolling(7).mean().values
df['rolling25'] = df.groupby(level=0)['visitors'].rolling(25).mean().values
df['rolling60'] = df.groupby(level=0)['visitors'].rolling(60).mean().values

In [141]:
df.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,visitors,calendar_date,day_of_week,holiday,genre,area,latitude,longitude,reserve_visitors,month,quarter,year,visitors1,visitors2,visitors7,rolling7,rolling25,rolling60
id,visit_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
air_00a91d42b08b08d9,2016-07-01,35,2016-07-01,Friday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016,,,,,,
air_00a91d42b08b08d9,2016-07-02,9,2016-07-02,Saturday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,4.0,7,3,2016,35.0,,,,,
air_00a91d42b08b08d9,2016-07-04,20,2016-07-04,Monday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016,9.0,35.0,,,,
air_00a91d42b08b08d9,2016-07-05,25,2016-07-05,Tuesday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016,20.0,9.0,,,,
air_00a91d42b08b08d9,2016-07-06,29,2016-07-06,Wednesday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016,25.0,20.0,,,,
air_00a91d42b08b08d9,2016-07-07,34,2016-07-07,Thursday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016,29.0,25.0,,,,
air_00a91d42b08b08d9,2016-07-08,42,2016-07-08,Friday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016,34.0,29.0,,27.714286,,
air_00a91d42b08b08d9,2016-07-09,11,2016-07-09,Saturday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016,42.0,34.0,35.0,24.285714,,
air_00a91d42b08b08d9,2016-07-11,25,2016-07-11,Monday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016,11.0,42.0,9.0,26.571429,,
air_00a91d42b08b08d9,2016-07-12,24,2016-07-12,Tuesday,0,Italian/French,Tōkyō-to Chiyoda-ku Kudanminami,35.694003,139.753595,,7,3,2016,25.0,11.0,20.0,27.142857,,


In [144]:
df.loc['air_fff68b929994bfbd'].head(30)

Unnamed: 0_level_0,visitors,calendar_date,day_of_week,holiday,genre,area,latitude,longitude,reserve_visitors,month,quarter,year,visitors1,visitors2,visitors7,rolling7,rolling25,rolling60
visit_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
2016-07-01,3,2016-07-01,Friday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,,,,,,
2016-07-02,3,2016-07-02,Saturday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,4.0,7,3,2016,3.0,,,,,
2016-07-05,7,2016-07-05,Tuesday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,3.0,3.0,,,,
2016-07-06,6,2016-07-06,Wednesday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,7.0,3.0,,,,
2016-07-07,1,2016-07-07,Thursday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,6.0,7.0,,,,
2016-07-08,5,2016-07-08,Friday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,1.0,6.0,,,,
2016-07-09,6,2016-07-09,Saturday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,5.0,1.0,,4.428571,,
2016-07-11,5,2016-07-11,Monday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,6.0,5.0,3.0,4.714286,,
2016-07-12,7,2016-07-12,Tuesday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,5.0,6.0,3.0,5.285714,,
2016-07-13,2,2016-07-13,Wednesday,0,Bar/Cocktail,Tōkyō-to Nakano-ku Nakano,35.708146,139.666288,,7,3,2016,7.0,5.0,7.0,4.571429,,


This becomes the lead-in to Unit 3, where we can start applying gradient boosting.