### Prepping Data Challenge: Excelling at adding one more row (week 33)

The challenge is to classify each order as new (the first report it appears in), unfulfilled (any subsequent report it appears in) or fulfilled (the week after it last appears in a report). The fulfilled week never appears in the input, so one extra row per fulfilled order has to be added to record whether and when the order was fulfilled.

The input is five worksheets in one Excel file, all with the same format.

### Requirements
- Input the data
- Create one complete data set
- Use the Table Names field to create the Reporting Date
- Find the Minimum and Maximum date where an order appeared in the reports
- Add one week on to the maximum date to show when an order was fulfilled by
- Apply this logic:
  - The first time an order appears it should be classified as a 'New Order'
  - The week after the last time an order appears in a report (the maximum date) is when the order is classed as 'Fulfilled' 
  - Any week between 'New Order' and 'Fulfilled' status is classed as an 'Unfulfilled Order' 
- Pull all of the data sets together
- Remove any unnecessary fields
- Output the data

In [1]:
import pandas as pd
import numpy as np

In [2]:
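#Input the data
#Create one complete data set
frames = []
with pd.ExcelFile('Wk33-Input.xlsx') as xl:
    for s in xl.sheet_names:
        # read each worksheet and record which sheet it came from
        df_new = pd.read_excel(xl, sheet_name=s)
        df_new['sheet_name'] = s
        frames.append(df_new)
# concatenate once at the end; each sheet keeps its own row index
df = pd.concat(frames)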
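
An alternative to looping over the sheets is to let pandas read them all at once: `pd.read_excel(..., sheet_name=None)` returns a dict mapping each sheet name to a DataFrame, and `pd.concat` accepts that dict directly. The sketch below is only an illustration. It uses a hand-built dict with made-up sheet names and values in place of the real workbook, so it runs without the input file.

```python
import pandas as pd

# Stand-in for pd.read_excel('Wk33-Input.xlsx', sheet_name=None), which
# returns {sheet name: DataFrame}. Sheet names and values here are made up.
sheets = {
    '2021-01-01': pd.DataFrame({'Orders': ['A', 'B'],
                                'Sale Date': ['2020-12-29', '2020-12-31']}),
    '2021-01-08': pd.DataFrame({'Orders': ['B'],
                                'Sale Date': ['2020-12-31']}),
}

# Concatenating a dict uses its keys as an outer index level. Naming that
# level and resetting it turns the sheet name into an ordinary column.
combined = (
    pd.concat(sheets, names=['sheet_name', 'row'])
      .reset_index(level='sheet_name')
      .reset_index(drop=True)
)
print(combined)
```

Unlike the loop in the cell above, this version discards the per-sheet row index in favour of a single running index.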

In [3]:
#Use the Table Names field to create the Reporting Date
df.rename(columns={'sheet_name':'Reporting Date'}, inplace=True)
df['Sale Date'] = pd.to_datetime(df['Sale Date'], format = '%Y/%m/%d')
df['Reporting Date'] = pd.to_datetime(df['Reporting Date'], format = '%Y/%m/%d')

In [4]:
#Find the Minimum and Maximum date where an order appeared in the reports
df['min_date'] = df.groupby('Orders')['Reporting Date'].transform('min')
df['max_date'] = df.groupby('Orders')['Reporting Date'].transform('max')
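
`transform` is used here rather than `agg` because it returns one value per original row, so the group minimum and maximum can be assigned straight back as columns. A minimal sketch on made-up data (the frame `toy` and its values are hypothetical):

```python
import pandas as pd

# Toy data: order A appears in one report, order B in three.
toy = pd.DataFrame({
    'Orders': ['A', 'B', 'B', 'B'],
    'Reporting Date': pd.to_datetime(
        ['2021-01-01', '2021-01-01', '2021-01-08', '2021-01-15']),
})

by_order = toy.groupby('Orders')['Reporting Date']
# transform broadcasts each group's result back onto every row of the group
min_date = by_order.transform('min')
max_date = by_order.transform('max')
toy = toy.assign(min_date=min_date, max_date=max_date)
print(toy)
```

Every row of order B carries the same `min_date` and `max_date`, which is what makes the row-wise comparisons in the next step possible.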

In [5]:
# Apply this logic:
#The first time an order appears it should be classified as a 'New Order'
#The week after the last time an order appears in a report (the maximum date) is when the order is classed as 'Fulfilled'
#Any week between 'New Order' and 'Fulfilled' status is classed as an 'Unfulfilled Order'
# Reporting Date can never exceed max_date on an existing row, so these rows are
# only ever 'New Order' or 'Unfulfilled Order'; 'Fulfilled' rows are added separately.
df['Order Status'] = np.where(df['Reporting Date'] == df['min_date'],
                              'New Order', 'Unfulfilled Order')

In [6]:
df.head()

|   | Orders | Sale Date | Reporting Date | min_date | max_date | Order Status |
|---|--------|-----------|----------------|----------|----------|--------------|
| 0 | A | 2020-12-29 | 2021-01-01 | 2021-01-01 | 2021-01-01 | New Order |
| 1 | B | 2020-12-31 | 2021-01-01 | 2021-01-01 | 2021-01-29 | New Order |
| 2 | C | 2021-01-01 | 2021-01-01 | 2021-01-01 | 2021-01-08 | New Order |
| 0 | B | 2020-12-31 | 2021-01-08 | 2021-01-01 | 2021-01-29 | Unfulfilled Order |
| 1 | C | 2021-01-01 | 2021-01-08 | 2021-01-01 | 2021-01-08 | Unfulfilled Order |


In [7]:
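# Take each order's last reported row, skipping orders that still appear in the
# latest report (they have not been fulfilled yet)
fulfilled = df.loc[(df['Reporting Date'] == df['max_date'])
                   & (df['Reporting Date'] != df['Reporting Date'].max())].copy()
fulfilled['Order Status'] = 'Fulfilled'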

In [8]:
#Add one week on to the maximum date to show when an order was fulfilled by
fulfilled['Reporting Date'] = fulfilled['Reporting Date'] + pd.Timedelta('7 day')
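
The effect of these two cells can be seen on a small made-up example. In the sketch below (the names `toy`, `done`, `is_last_row` and `still_open` are hypothetical), order A last appears a week before the latest report and gets a fulfilled row, while order B is still in the latest report and does not.

```python
import pandas as pd

# Toy data: two weekly reports; A is only in the first, B is in both.
toy = pd.DataFrame({
    'Orders': ['A', 'B', 'B'],
    'Reporting Date': pd.to_datetime(
        ['2021-01-01', '2021-01-01', '2021-01-08']),
})
toy['max_date'] = toy.groupby('Orders')['Reporting Date'].transform('max')
latest_report = toy['Reporting Date'].max()

is_last_row = toy['Reporting Date'] == toy['max_date']   # last sighting of each order
still_open = toy['max_date'] == latest_report            # order is in the latest report

# Copy the last sighting of every closed order and move it one week forward
done = toy.loc[is_last_row & ~still_open].copy()
done['Order Status'] = 'Fulfilled'
done['Reporting Date'] = done['Reporting Date'] + pd.Timedelta(weeks=1)
print(done)
```

Note that `max_date` on the new row still holds the last sighting, not the fulfilled date; that does not matter here because the helper columns are dropped before output.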

In [9]:
fulfilled.head()

|   | Orders | Sale Date | Reporting Date | min_date | max_date | Order Status |
|---|--------|-----------|----------------|----------|----------|--------------|
| 0 | A | 2020-12-29 | 2021-01-08 | 2021-01-01 | 2021-01-01 | Fulfilled |
| 1 | C | 2021-01-01 | 2021-01-15 | 2021-01-01 | 2021-01-08 | Fulfilled |
| 3 | E | 2021-01-07 | 2021-01-15 | 2021-01-08 | 2021-01-08 | Fulfilled |
| 2 | F | 2021-01-08 | 2021-01-22 | 2021-01-08 | 2021-01-15 | Fulfilled |
| 4 | H | 2021-01-14 | 2021-01-22 | 2021-01-15 | 2021-01-15 | Fulfilled |


In [10]:
#Pull all of the data sets together
df = pd.concat([df, fulfilled])

In [11]:
#Remove any unnecessary fields
df = df[['Order Status','Orders','Sale Date','Reporting Date']]

In [12]:
df.head(10)

|   | Order Status | Orders | Sale Date | Reporting Date |
|---|--------------|--------|-----------|----------------|
| 0 | New Order | A | 2020-12-29 | 2021-01-01 |
| 1 | New Order | B | 2020-12-31 | 2021-01-01 |
| 2 | New Order | C | 2021-01-01 | 2021-01-01 |
| 0 | Unfulfilled Order | B | 2020-12-31 | 2021-01-08 |
| 1 | Unfulfilled Order | C | 2021-01-01 | 2021-01-08 |
| 2 | New Order | D | 2021-01-04 | 2021-01-08 |
| 3 | New Order | E | 2021-01-07 | 2021-01-08 |
| 4 | New Order | F | 2021-01-08 | 2021-01-08 |
| 0 | Unfulfilled Order | B | 2020-12-31 | 2021-01-15 |
| 1 | Unfulfilled Order | D | 2021-01-04 | 2021-01-15 |


In [13]:
#Output the data
df.to_csv('wk33-output.csv', index=False)
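
Before trusting the output, a quick sanity check can help: every order should have exactly one 'New Order' row and at most one 'Fulfilled' row. The sketch below runs that check on a small made-up frame (`result` and its values are hypothetical) with the same column names as the output above; the same lines could be pointed at `df` instead.

```python
import pandas as pd

# Toy stand-in for the final output
result = pd.DataFrame({
    'Order Status': ['New Order', 'Fulfilled', 'New Order', 'Unfulfilled Order'],
    'Orders': ['A', 'A', 'B', 'B'],
    'Reporting Date': pd.to_datetime(
        ['2021-01-01', '2021-01-08', '2021-01-01', '2021-01-08']),
})

# Count rows per order and status; reindex so a status that never occurs
# still gets a column of zeros instead of raising a KeyError
counts = (
    pd.crosstab(result['Orders'], result['Order Status'])
      .reindex(columns=['New Order', 'Unfulfilled Order', 'Fulfilled'],
               fill_value=0)
)

one_new = bool((counts['New Order'] == 1).all())
at_most_one_fulfilled = bool((counts['Fulfilled'] <= 1).all())
print(counts)
print(one_new, at_most_one_fulfilled)
```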