### Prepping Data Challenge: Rolling Weekly Revenue (week 26)

This week's challenge is about creating moving calculations. As an example, take 5th January (yes, British date format): to understand a rolling week's values around that date, you can include the 3 days before the 5th (i.e. the 2nd, 3rd and 4th), the 5th itself, and the 3 days after the 5th (i.e. the 6th, 7th and 8th).

Clearly you need to define what your rolling period includes. A rolling week could look backwards over the current date plus the 6 days before it, or over the 7 days before it if the current date is excluded. You could look forward over the same period instead, but ultimately you have to tell your audience exactly what is covered. The nature of the data might also influence the decision.
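
To make the difference between window definitions concrete, here is a minimal sketch comparing a backward-looking week with a centred week (3 days either side). It uses pandas `rolling`, which is not the approach taken in the notebook cells further down. The revenue values are invented, and the sketch assumes one row per consecutive calendar day.

```python
import pandas as pd

# Hypothetical daily revenue for one destination (values invented for illustration)
s = pd.Series(
    [10, 20, 30, 40, 50, 60, 70, 80, 90],
    index=pd.date_range("2021-01-01", periods=9, freq="D"),
)

# Trailing week: the current day plus the 6 days before it
trailing = s.rolling(window=7, min_periods=1).sum()

# Centred week: 3 days before, the current day, and 3 days after
centred = s.rolling(window=7, center=True, min_periods=1).sum()

# On 5 Jan the trailing total is 150 (1st to 5th) and the centred total is 350 (2nd to 8th)
print(pd.DataFrame({"revenue": s, "trailing": trailing, "centred": centred}))
```

`min_periods=1` keeps dates near the start and end of the data, where fewer than 7 days are available.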

### Challenge
Create a rolling weekly total and average for each Prep Air destination and an overall number for all destinations. The rolling week is as detailed above, 3 days before and 3 days after a date as well as that day itself. 

### Requirements
 - Input data
 - Create a data set that gives 7 rows per date (unless those dates aren't included in the data set). 
    - i.e. 1st Jan only has 4 rows of data (1st, 2nd, 3rd & 4th)
 - Remove any additional fields you don't need 
 - Create the Rolling Week Total and Rolling Week Average per destination
 - Records that have fewer than 7 days of data should remain included in the output
 - Create the Rolling Week Total and Rolling Week Average for the whole data set
 - Pull the data together for the previous two requirements
 - Output the data

In [1]:
import pandas as pd

In [2]:
#Input the data
df = pd.read_csv('WK26-Input.csv', parse_dates=['Date'], dayfirst=True)

In [3]:
df.head()

Unnamed: 0,Destination,Date,Revenue
0,London,2021-01-01,232572
1,London,2021-01-02,105610
2,London,2021-01-03,149849
3,London,2021-01-04,164463
4,London,2021-01-05,129130


In [4]:
#Create a data set that gives 7 rows per date (unless those dates aren't included in the data set)
df = df.sort_values(by='Date')
df_1 = pd.DataFrame( {'Date' : df['Date'].unique()} )
# For each date, list the 7 dates from 3 days before to 3 days after
df_1['date_row'] = [[d + pd.Timedelta(days=n) for n in range(-3, 4)]
                         for d in df_1['Date']]
df_1 = df_1.explode('date_row')
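
The step above can be sketched on a tiny frame to show what `explode` produces. The two dates below are illustrative and not taken from the challenge input.

```python
import pandas as pd

# Two illustrative dates (not taken from the challenge input)
dates = pd.DataFrame({"Date": pd.to_datetime(["2021-01-01", "2021-01-05"])})

# For each date, list the 7 dates from 3 days before to 3 days after
dates["date_row"] = [
    [d + pd.Timedelta(days=offset) for offset in range(-3, 4)]
    for d in dates["Date"]
]

# explode() turns each list element into its own row,
# so every original date now appears on 7 rows
windows = dates.explode("date_row")
print(windows)
```

Dates outside the data (here 29th to 31st December) are still generated at this point; they only disappear later, when an inner join finds no matching revenue row.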

<div class="alert alert-block alert-info">
    
Pandas <strong>Timedeltas</strong> are differences in times, expressed in different units, e.g. days, hours, minutes, seconds. They can be both positive and negative. Timedelta objects can be created using various arguments.
    
<strong>String</strong><br> 
By passing a string literal, we can create a timedelta object.
    
pd.Timedelta('2 days 2 hours 15 minutes 30 seconds')<br>
2 days 02:15:30
    
<strong>Integer</strong><br>
print(pd.Timedelta(6, unit='h'))<br>
0 days 06:00:00
    
<strong>Data Offsets</strong><br>
Data offsets such as weeks, days, hours, minutes, seconds, milliseconds, microseconds and nanoseconds can also be used in construction.

print(pd.Timedelta(days=2))<br>
2 days 00:00:00
    
<strong>to_timedelta()</strong><br>
Using the top-level pd.to_timedelta, you can convert a scalar, array, list, or Series from a recognized timedelta format or value into a Timedelta type. It returns a Series if the input is a Series, a scalar if the input is scalar-like, and a TimedeltaIndex otherwise.
    
<strong>Operations</strong><br>
You can operate on Series/ DataFrames and construct timedelta64[ns] Series through subtraction operations on datetime64[ns] Series, or Timestamps.
  
https://www.tutorialspoint.com/python_pandas/python_pandas_timedelta.htm
</div>
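
The note above gives no example for `to_timedelta` or for the subtraction operations, so here is a short sketch of both. The dates are made up for illustration.

```python
import pandas as pd

# to_timedelta: a list of strings becomes a TimedeltaIndex
offsets = pd.to_timedelta(["1 days", "2 days 06:00:00"])

# Operations: subtracting one datetime Series from another yields a timedelta Series
start = pd.Series(pd.to_datetime(["2021-01-01", "2021-01-05"]))
end = pd.Series(pd.to_datetime(["2021-01-08", "2021-01-06"]))
gap = end - start

print(offsets)
print(gap)
print(gap.dt.days.tolist())  # whole days in each gap: [7, 1]
```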

In [5]:
# join back to the original data on the join date
df_total = df.merge(df_1, on='Date', how='inner')\
            .merge(df, left_on=['Destination', 'date_row'], right_on=['Destination', 'Date'],
                   how='inner', suffixes=['', '_r'])
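
To see why dates near the edges of the data end up with fewer than 7 rows, the double merge can be sketched on a small invented data set (one destination, five consecutive days). The explicit `pd.to_datetime` call is a precaution added for this sketch, so that both merge keys are certain to share a datetime dtype.

```python
import pandas as pd

# Illustrative data: one destination, five consecutive days (values invented)
df = pd.DataFrame({
    "Destination": ["London"] * 5,
    "Date": pd.date_range("2021-01-01", periods=5, freq="D"),
    "Revenue": [10, 20, 30, 40, 50],
})

# Build the 7 candidate window dates for every date
windows = pd.DataFrame({"Date": df["Date"].unique()})
windows["date_row"] = [
    [d + pd.Timedelta(days=offset) for offset in range(-3, 4)]
    for d in windows["Date"]
]
windows = windows.explode("date_row")
# Make sure the exploded column is datetime so both merge keys share a dtype
windows["date_row"] = pd.to_datetime(windows["date_row"])

# First merge attaches the window dates; the second (inner) merge keeps
# only window dates that actually have a revenue row
joined = (
    df.merge(windows, on="Date", how="inner")
      .merge(df, left_on=["Destination", "date_row"],
             right_on=["Destination", "Date"],
             how="inner", suffixes=("", "_r"))
)

# Number of window rows that survive for each date
rows_per_date = joined.groupby("Date").size()
print(rows_per_date)
```

With only five days of data, 1st and 5th January keep 4 rows each and the dates in between keep 5, which mirrors the "1st Jan only has 4 rows" note in the requirements.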

In [6]:
# Create the Rolling Week Total and Rolling Week Average per destination
df_date_agg = df_total.groupby(['Destination', 'Date']).agg(Rolling_Week_Avg=('Revenue_r', 'mean'),
                                                           Rolling_Week_Total=('Revenue_r', 'sum'))\
                     .reset_index()
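
The aggregation above uses pandas named aggregation, where each keyword becomes an output column and each tuple gives the source column and the function. A minimal sketch on invented window rows:

```python
import pandas as pd

# Invented window rows for a single date: three for London, two for Paris
joined = pd.DataFrame({
    "Destination": ["London", "London", "London", "Paris", "Paris"],
    "Date": pd.to_datetime(["2021-01-01"] * 5),
    "Revenue_r": [10, 20, 30, 4, 6],
})

# keyword = output column name, tuple = (source column, aggregation function)
agg = (
    joined.groupby(["Destination", "Date"])
          .agg(Rolling_Week_Avg=("Revenue_r", "mean"),
               Rolling_Week_Total=("Revenue_r", "sum"))
          .reset_index()
)
print(agg)
```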

In [7]:
#Create the Rolling Week Total and Rolling Week Average for the whole data set
#(the overall average here is the mean of the per-destination averages)
df_total_agg = df_date_agg.groupby('Date').agg(Rolling_Week_Avg=('Rolling_Week_Avg', 'mean'),
                                               Rolling_Week_Total=('Rolling_Week_Total', 'sum'))\
                          .reset_index()
df_total_agg['Destination'] = 'All'
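
One thing to be aware of: the overall average in the cell above is a mean of per-destination means. That equals the mean over all underlying rows only when every destination contributes the same number of days to a window. Whether that holds for the challenge input is not verified here. The sketch below uses invented numbers to show how the two can differ when the counts are unequal.

```python
import pandas as pd

# Invented window rows: London has 3 days in the window, Paris only 1
window_rows = pd.DataFrame({
    "Destination": ["London", "London", "London", "Paris"],
    "Revenue_r": [10, 20, 30, 100],
})

# Mean of the per-destination means: (20 + 100) / 2
per_destination_avg = window_rows.groupby("Destination")["Revenue_r"].mean()
mean_of_means = per_destination_avg.mean()

# Mean over every underlying row: (10 + 20 + 30 + 100) / 4
pooled_mean = window_rows["Revenue_r"].mean()

print(mean_of_means, pooled_mean)
```

Which of the two is wanted depends on how the "overall" average is defined for the audience, so it is worth stating explicitly.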

In [8]:
#Pull the data together for the previous two requirements
df_all = pd.concat([df_date_agg, df_total_agg])
df_all.columns = [c.replace('_', ' ') for c in df_all.columns]

In [9]:
df_all.head()

Unnamed: 0,Destination,Date,Rolling Week Avg,Rolling Week Total
0,London,2021-01-01,163123.5,652494
1,London,2021-01-02,156324.8,781624
2,London,2021-01-03,168986.666667,1013920
3,London,2021-01-04,159290.857143,1115036
4,London,2021-01-05,149116.142857,1043813


In [10]:
df_all.to_csv('WK26-output.csv', index=False)