### Prepping Data Challenge: Excelling through aggregation (week 32)

Excel's SUMIF (or SUMIFS, when there are multiple conditions) scans a data set and sums only the values that meet the conditions you specify; AVERAGEIFS does the same for averages. When working with large tables that have many entries per category, this is a quick way to build totals that help you analyse the data. Whilst SUMIF doesn't exist in Tableau Prep, an IF calculation followed by an Aggregate step produces the same effect. This notebook follows the same approach in pandas, using `np.where` for the IF and `groupby` for the aggregation.
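For comparison, the direct pandas counterpart of a single SUMIFS or AVERAGEIFS call is a boolean mask combined with `&`, and computing it for every category at once is a `groupby`. A minimal sketch on a hypothetical four-row table (values invented for illustration):

```python
import pandas as pd

# Hypothetical miniature of the ticket-sales table (invented values)
sales = pd.DataFrame({
    'Flight': ['London to Perth', 'London to Perth', 'London to Paris', 'London to Perth'],
    'Class': ['Economy', 'Business', 'Economy', 'Economy'],
    'Ticket Sales': [100, 250, 80, 120],
})

# SUMIFS equivalent: combine the conditions with & and sum the matching rows
mask = (sales['Flight'] == 'London to Perth') & (sales['Class'] == 'Economy')
sumifs = sales.loc[mask, 'Ticket Sales'].sum()        # 100 + 120 = 220

# AVERAGEIFS equivalent: same mask, mean instead of sum
averageifs = sales.loc[mask, 'Ticket Sales'].mean()   # (100 + 120) / 2 = 110.0

# One total per Flight/Class combination in a single call
totals = sales.groupby(['Flight', 'Class'])['Ticket Sales'].sum()
print(totals)
```

The mask version answers one question at a time, like a single spreadsheet cell; the `groupby` version is what the rest of this notebook uses.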

### Requirements
- Input data
- Form the Flight name
- Work out how many days there are between the sale and the flight departing
- Classify daily sales of a flight as:
  - Less than 7 days before departure
  - 7 or more days before departure
- Mimic the SUMIFS and AVERAGEIFS functions by aggregating the fields from the previous requirement by each Flight and Class
- Round all data to zero decimal places
- Output the data

```python
import pandas as pd
import numpy as np
```

```python
# Input the data; both date columns are parsed as day-first dates
df = pd.read_csv('WK32-Input.csv', parse_dates=['Date', 'Date of Flight'], dayfirst=True)
```
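`dayfirst=True` matters when the raw dates are ambiguous strings such as `01/02/2021`. The sketch below assumes the input file stores dates as day/month/year text (the raw file isn't shown here) and uses a hypothetical two-column CSV held in memory:

```python
import io
import pandas as pd

# Hypothetical one-row CSV; assumes the real file uses day/month/year dates
csv_text = "Date,Date of Flight\n01/02/2021,28/02/2021\n"
sample = pd.read_csv(io.StringIO(csv_text),
                     parse_dates=['Date', 'Date of Flight'],
                     dayfirst=True)
print(sample['Date'][0])        # 1 February 2021

# Without dayfirst, the same ambiguous string is read month-first
ambiguous = pd.to_datetime('01/02/2021')
print(ambiguous)                # 2 January 2021
```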

```python
df.head()
```

| | Departure | Destination | Date | Class | Date of Flight | Ticket Sales |
|---|---|---|---|---|---|---|
| 0 | London | Perth | 2021-01-01 | Economy | 2021-01-31 | 572 |
| 1 | London | Perth | 2021-01-02 | Economy | 2021-01-31 | 1111 |
| 2 | London | Perth | 2021-01-03 | Economy | 2021-01-31 | 845 |
| 3 | London | Perth | 2021-01-04 | Economy | 2021-01-31 | 862 |
| 4 | London | Perth | 2021-01-05 | Economy | 2021-01-31 | 1087 |


```python
# Form the Flight name, e.g. "London to Perth"
df['Flight'] = df['Departure'] + ' to ' + df['Destination']
```
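An equivalent spelling uses `Series.str.cat`, which makes the separator an explicit argument. A sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as the input
routes = pd.DataFrame({'Departure': ['London', 'Perth'],
                       'Destination': ['Perth', 'London']})

# Same result as routes['Departure'] + ' to ' + routes['Destination']
routes['Flight'] = routes['Departure'].str.cat(routes['Destination'], sep=' to ')
print(routes['Flight'].tolist())   # ['London to Perth', 'Perth to London']
```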

```python
# Work out how many days there are between the sale and the flight departing
df['no. of days'] = (df['Date of Flight'] - df['Date']).dt.days
```
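Subtracting two datetime columns gives a timedelta column, and `.dt.days` extracts the whole number of days. A sketch with hypothetical sale dates against the 31 January flight shown in the preview above:

```python
import pandas as pd

# Hypothetical sale dates for a flight departing on 2021-01-31
sale_dates = pd.to_datetime(pd.Series(['2021-01-01', '2021-01-25', '2021-01-31']))
flight_date = pd.Timestamp('2021-01-31')

# datetime - datetime -> timedelta; .dt.days -> integer day counts
days_before = (flight_date - sale_dates).dt.days
print(days_before.tolist())   # [30, 6, 0]
```

A sale on the day of departure gives 0, so it falls in the "less than 7 days" window.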

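```python
# Classify daily sales of a flight as:
#   - Less than 7 days before departure
#   - 7 or more days before departure
# Each sale is kept in the column for its window and set to NaN in the other,
# so the later sum/mean only pick up the matching rows (the "IF" part of SUMIFS)
df['Less than 7 days before departure'] = np.where(df['no. of days'] < 7, df['Ticket Sales'], np.nan)
df['7 or more days before departure'] = np.where(df['no. of days'] < 7, np.nan, df['Ticket Sales'])
```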
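Filling the other window with NaN rather than 0 is what lets the later averages behave like AVERAGEIFS: pandas' `mean` skips NaN, whereas zeros would be counted as extra days. A sketch on hypothetical toy rows, with shortened column names `late`/`early` chosen for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: days until the flight and the sales made that day
toy = pd.DataFrame({'no. of days': [10, 3, 7, 0],
                    'Ticket Sales': [100, 200, 300, 400]})

is_late = toy['no. of days'] < 7
# Same pattern as the notebook: keep the sale in its window, NaN elsewhere
toy['late'] = np.where(is_late, toy['Ticket Sales'], np.nan)
toy['early'] = np.where(is_late, np.nan, toy['Ticket Sales'])

# Series.where builds the same "late" column without NumPy
late_alt = toy['Ticket Sales'].where(is_late)

late_mean = toy['late'].mean()             # NaN skipped: (200 + 400) / 2 = 300.0
zero_mean = toy['late'].fillna(0).mean()   # zeros counted: 600 / 4 = 150.0
print(late_mean, zero_mean)
```

Note that a sale exactly 7 days out lands in the "7 or more" column, matching the requirement.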

```python
# Mimic the SUMIFS and AVERAGEIFS functions by aggregating the new fields by each Flight and Class
# (sum and mean skip NaN, so each window only counts its own sales)
# Round all data to zero decimal places
output = (df.groupby(['Flight', 'Class'])
            .agg(Avg_daily_sales_7_more=('7 or more days before departure', 'mean'),
                 Avg_daily_sales_less=('Less than 7 days before departure', 'mean'),
                 Sales_less_than_7_days=('Less than 7 days before departure', 'sum'),
                 Sales_7_days_or_more=('7 or more days before departure', 'sum'))
            .round(0)
            .astype(int)
            .reset_index())
```
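To check that the grouped figures really match SUMIFS/AVERAGEIFS, the sketch below (hypothetical toy rows) compares one group against an explicit filter. It also shows an edge case: if a Flight/Class pair had no sales in a window, its sum is 0 but its mean is NaN, and `.astype(int)` would then raise an error. Casting to pandas' nullable `'Int64'` type keeps such a value as missing instead; this is an optional safeguard, not something the challenge data requires.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; 'C to A' has no sales less than 7 days out
toy = pd.DataFrame({
    'Flight': ['A to B', 'A to B', 'A to B', 'B to A', 'C to A'],
    'Class': ['Economy'] * 5,
    'no. of days': [10, 3, 1, 2, 20],
    'Ticket Sales': [100, 200, 400, 50, 75],
})
toy['late'] = np.where(toy['no. of days'] < 7, toy['Ticket Sales'], np.nan)

# One SUMIFS/AVERAGEIFS pair per Flight/Class combination
grouped = toy.groupby(['Flight', 'Class'])['late'].agg(['sum', 'mean'])

# The same numbers for one combination, computed the SUMIFS way
cond = (toy['Flight'] == 'A to B') & (toy['Class'] == 'Economy') & (toy['no. of days'] < 7)
sumifs = toy.loc[cond, 'Ticket Sales'].sum()        # 600
averageifs = toy.loc[cond, 'Ticket Sales'].mean()   # 300.0

# All-NaN group: sum is 0, mean is NaN; nullable Int64 keeps it as <NA>
safe = grouped.round(0).astype('Int64')
print(safe)
```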

```python
# Give the aggregated columns their final, reader-friendly names
output.rename(columns={'Avg_daily_sales_7_more': 'Avg. daily sales 7 days or more until flight',
                       'Avg_daily_sales_less': 'Avg. daily sales less than 7 days until flight',
                       'Sales_less_than_7_days': 'Sales less than 7 days until the flight',
                       'Sales_7_days_or_more': 'Sales 7 days or more until the flight'}, inplace=True)
```

```python
# Put the columns in the output order
output = output[['Flight', 'Class',
                 'Avg. daily sales 7 days or more until flight',
                 'Avg. daily sales less than 7 days until flight',
                 'Sales less than 7 days until the flight',
                 'Sales 7 days or more until the flight']]
```
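As a design alternative, the rename and reorder steps can be folded into the aggregation itself: named aggregation accepts output names that aren't valid Python identifiers when they are passed as an unpacked dictionary, and the dictionary order fixes the column order. A sketch on hypothetical toy rows, with shortened source column names `late`/`early`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows for a single Flight/Class pair
toy = pd.DataFrame({
    'Flight': ['A to B'] * 3,
    'Class': ['Economy'] * 3,
    'no. of days': [10, 3, 1],
    'Ticket Sales': [100, 200, 400],
})
toy['late'] = np.where(toy['no. of days'] < 7, toy['Ticket Sales'], np.nan)
toy['early'] = np.where(toy['no. of days'] < 7, np.nan, toy['Ticket Sales'])

# Output name -> (source column, aggregation); order here = column order
spec = {
    'Avg. daily sales 7 days or more until flight': ('early', 'mean'),
    'Avg. daily sales less than 7 days until flight': ('late', 'mean'),
    'Sales less than 7 days until the flight': ('late', 'sum'),
    'Sales 7 days or more until the flight': ('early', 'sum'),
}
result = (toy.groupby(['Flight', 'Class'])
             .agg(**spec)
             .round(0)
             .astype(int)
             .reset_index())
print(result)
```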

```python
output.head(10)
```

| | Flight | Class | Avg. daily sales 7 days or more until flight | Avg. daily sales less than 7 days until flight | Sales less than 7 days until the flight | Sales 7 days or more until the flight |
|---|---|---|---|---|---|---|
| 0 | London to Paris | Business | 1071 | 994 | 41746 | 148804 |
| 1 | London to Paris | Economy | 239 | 245 | 10287 | 33204 |
| 2 | London to Perth | Business | 1598 | 1568 | 65863 | 222149 |
| 3 | London to Perth | Economy | 756 | 746 | 31333 | 105126 |
| 4 | Paris to London | Business | 1037 | 1007 | 42309 | 144181 |
| 5 | Paris to London | Economy | 245 | 250 | 10515 | 34087 |
| 6 | Perth to London | Business | 1691 | 1816 | 76261 | 235035 |
| 7 | Perth to London | Economy | 765 | 738 | 31008 | 106322 |
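When checking these averages against an Excel workbook built with AVERAGEIFS and ROUND, note that pandas' `round` (like NumPy's) rounds exact halves to the nearest even number, while Excel's ROUND rounds halves away from zero, so an average ending in exactly .5 could differ by one. A minimal sketch of both behaviours; the half-away-from-zero expression is an illustrative helper, not a pandas built-in:

```python
import numpy as np
import pandas as pd

halves = pd.Series([0.5, 1.5, 2.5, -2.5])

# pandas/NumPy: ties go to the nearest even number
half_even = halves.round(0)                                    # 0.0, 2.0, 2.0, -2.0

# Round half away from zero (Excel-style). Simple approach: values a hair
# below .5 can be misrounded because of floating-point addition.
half_away = np.sign(halves) * np.floor(np.abs(halves) + 0.5)   # 1.0, 2.0, 3.0, -3.0
print(half_even.tolist(), half_away.tolist())
```

The sales totals are sums of whole numbers, so only the average columns are affected.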


```python
# Output the data
output.to_csv('wk32-output.csv', index=False)
```