### Prepping Data Challenge: Prep Air Project Overruns (week 18)

#### The Challenge 
This week's challenge is focused on Dates and the calculation functions available to you. 

This week we would like you to prepare your data for building a Gantt chart and supporting information on a dashboard (you don't have to build the dashboard, but bonus points if you do!). Prep Air (our fake airline) has had a number of projects over-running, and the leadership team wants to know why.

#### Requirements
Here's what we're asking of you:
 - Input the data
 - Work out the 'Completed Date' by adding the number of days each task took to complete to its Scheduled Date
 - Rename 'Completed In Days from Scheduled Date' to 'Days Difference to Schedule'
 - Your workflow will likely branch into two at this point:
1. Pivot Task to become column headers with the Completed Date as the values in the column
   - You will need to remove some data fields to ensure the result of the pivot is a single row for each project, sub-project and owner combination. 
   - Calculate the difference between Scope to Build time
   - Calculate the difference between Build to Delivery time
   - Pivot the Build, Deliver and Scope columns to re-create the 'Completed Date' field and the Task field
      - You will need to rename these
      
2. You don't need to do anything else to this second flow

Now you will need to:
 - Join Branch 1 and Branch 2 back together
   - Hint: there are 3 join clauses for this join
 - Calculate which weekday each task was completed on, as the dashboard needs to show whether tasks were completed at the weekend
 - Clean up the data set to remove any fields that are not required
 - Output as a csv file

In [1]:
import pandas as pd

In [2]:
#Input the data
with pd.ExcelFile('WK18-Input.xlsx') as xlsx:
    df = pd.read_excel(xlsx)

In [3]:
df.head()

Unnamed: 0,Project,Sub-project,Task,Owner,Scheduled Date,Completed In Days from Scheduled Date
0,New Loyalty Scheme,Marketing,Scope,Tom,2021-04-19,0.0
1,New Loyalty Scheme,Marketing,Build,Tom,2021-04-21,2.0
2,New Loyalty Scheme,Marketing,Deliver,Tom,2021-04-30,5.0
3,New Loyalty Scheme,Operations,Scope,Jenny,2021-04-15,0.0
4,New Loyalty Scheme,Operations,Build,Jenny,2021-04-23,3.0


In [4]:
#Work out the 'Completed Date' by adding the number of days each task took to the Scheduled Date
df['Completed Date'] = df['Scheduled Date'] \
                       + pd.to_timedelta(df['Completed In Days from Scheduled Date'], unit='D')
#pd.to_timedelta() converts its argument to a timedelta (not a datetime).
#Timedeltas are differences between times, expressed in units such as days, hours, minutes or seconds.
#With unit='D', each number in the column is read as a count of days.
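
The date arithmetic in this step can be tried on a tiny hand-made frame. This is only a sketch: the frame name `demo` and the short column name `Days` are made up for illustration, the first two rows mirror values from the preview above, and the third row (a missing day count) is an invented case to show that a missing value becomes `NaT` rather than raising an error.

```python
import pandas as pd

# Hypothetical mini-frame: two rows taken from the preview, one invented row with a missing day count
demo = pd.DataFrame({
    'Scheduled Date': pd.to_datetime(['2021-04-21', '2021-04-30', '2021-05-03']),
    'Days': [2.0, 5.0, None],
})

# Read each number as a count of days, then add it to the scheduled date
demo['Completed Date'] = demo['Scheduled Date'] + pd.to_timedelta(demo['Days'], unit='D')

print(demo)
```

Passing the whole column to `pd.to_timedelta` avoids a Python-level loop over the rows, which is why it is usually preferred over `apply` for this kind of calculation.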

In [5]:
#Rename 'Completed In Days from Scheduled Date' to 'Days Difference to Schedule'
df.rename(columns={'Completed In Days from Scheduled Date' : 'Days Difference to Schedule'}, inplace=True)

### 1. Pivot Task to become column headers with the Completed Date as the values in the column

In [6]:
#Only the three key fields, Task and Completed Date go into the pivot. Leaving the other fields out
#ensures the result is a single row for each project, sub-project and owner combination.
df_complete = df.pivot(index=['Project', 'Sub-project', 'Owner'], columns='Task', values='Completed Date').reset_index()
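
`DataFrame.pivot` needs each index/column combination to appear only once, otherwise it raises a `ValueError`. The sketch below, on a hand-made frame named `tasks` (a hypothetical name; the values are the Tom / Marketing rows from the outputs in this notebook), checks for duplicates first and then pivots. It assumes a pandas version that accepts a list for `index`.

```python
import pandas as pd

# Hypothetical mini-frame with one project / sub-project / owner combination
tasks = pd.DataFrame({
    'Project': ['New Loyalty Scheme'] * 3,
    'Sub-project': ['Marketing'] * 3,
    'Owner': ['Tom'] * 3,
    'Task': ['Scope', 'Build', 'Deliver'],
    'Completed Date': pd.to_datetime(['2021-04-19', '2021-04-23', '2021-05-05']),
})

keys = ['Project', 'Sub-project', 'Owner']

# pivot() fails on repeated (keys, Task) combinations, so check before reshaping
has_duplicates = tasks.duplicated(subset=keys + ['Task']).any()

# One row per key combination, one column per task
wide = tasks.pivot(index=keys, columns='Task', values='Completed Date').reset_index()

print(has_duplicates)
print(wide)
```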

In [7]:
#Calculate the difference between Scope to Build time
df_complete['Scope to Build Time'] = df_complete['Build'] - df_complete['Scope']

In [8]:
df_complete.head()

Task,Project,Sub-project,Owner,Build,Deliver,Scope,Scope to Build Time
0,New Loyalty Scheme,Marketing,Tom,2021-04-23,2021-05-05,2021-04-19,4 days
1,New Loyalty Scheme,Operations,Jenny,2021-04-26,2021-05-02,2021-04-15,11 days
2,New Trolley Inventory,Marketing,Tom,2021-05-07,2021-05-17,2021-05-02,5 days
3,New Trolley Inventory,Operations,Jenny,2021-05-07,2021-05-17,2021-04-30,7 days
4,Spring Sale,Marketing,Carl,2021-05-05,2021-05-07,2021-04-22,13 days


In [9]:
#Calculate the difference between Build to Delivery time
df_complete['Build to Delivery Time'] = df_complete['Deliver'] - df_complete['Build']
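
The requirements also ask for the Build, Deliver and Scope columns to be pivoted back into a Task field and a Completed Date field. This notebook skips that step and merges the two duration columns onto the original frame instead. For reference, a minimal sketch of the pivot-back using `DataFrame.melt` is shown below; the frames `wide` and `long` are hypothetical names, and the single row reuses the Tom / Marketing values from the output above.

```python
import pandas as pd

# Hypothetical one-row version of the pivoted branch
wide = pd.DataFrame({
    'Project': ['New Loyalty Scheme'],
    'Sub-project': ['Marketing'],
    'Owner': ['Tom'],
    'Build': pd.to_datetime(['2021-04-23']),
    'Deliver': pd.to_datetime(['2021-05-05']),
    'Scope': pd.to_datetime(['2021-04-19']),
})
wide['Scope to Build Time'] = wide['Build'] - wide['Scope']
wide['Build to Delivery Time'] = wide['Deliver'] - wide['Build']

# Turn the three task columns back into rows, naming the new fields directly
long = wide.melt(
    id_vars=['Project', 'Sub-project', 'Owner', 'Scope to Build Time', 'Build to Delivery Time'],
    value_vars=['Build', 'Deliver', 'Scope'],
    var_name='Task',
    value_name='Completed Date',
)

print(long)
```

Setting `var_name` and `value_name` covers the "you will need to rename these" part of the requirement in the same call.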

### 2. No further changes to the second branch; join the branches back together

In [10]:
#Join Branch 1 and Branch 2 back together 
df = df.merge(df_complete[['Project','Sub-project','Owner'] + ['Scope to Build Time', 'Build to Delivery Time']], 
              on=['Project','Sub-project','Owner'], how='inner')
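
One way to guard a join like this is the `validate` argument of `DataFrame.merge`, which raises an error if the right-hand frame has more than one row per key combination. The sketch below uses hypothetical frames `tasks` and `durations` built from the Tom / Marketing values shown in this notebook; it is an optional check, not part of the original workflow.

```python
import pandas as pd

keys = ['Project', 'Sub-project', 'Owner']

# Hypothetical task-level branch (one row per task)
tasks = pd.DataFrame({
    'Project': ['New Loyalty Scheme'] * 3,
    'Sub-project': ['Marketing'] * 3,
    'Owner': ['Tom'] * 3,
    'Task': ['Scope', 'Build', 'Deliver'],
})

# Hypothetical duration branch (one row per key combination)
durations = pd.DataFrame({
    'Project': ['New Loyalty Scheme'],
    'Sub-project': ['Marketing'],
    'Owner': ['Tom'],
    'Scope to Build Time': [pd.Timedelta(days=4)],
    'Build to Delivery Time': [pd.Timedelta(days=12)],
})

# many_to_one: many task rows may match, but each key must be unique on the right
merged = tasks.merge(durations, on=keys, how='inner', validate='many_to_one')

print(merged)
```

With a valid many-to-one join, the merged frame keeps exactly one row per task, so comparing `len(merged)` with `len(tasks)` is a cheap extra sanity check.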

In [11]:
#Calculate the completed weekday
df['Completed Weekday'] = df['Completed Date'].dt.day_name()
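
Since the dashboard needs to know whether a task was completed at the weekend, a boolean flag can be derived alongside the weekday name. This is a sketch only: the column name `Completed on Weekend` and the frame `demo` are assumptions, and the three dates are completed dates taken from the outputs in this notebook.

```python
import pandas as pd

# Hypothetical mini-frame of completed dates
demo = pd.DataFrame({
    'Completed Date': pd.to_datetime(['2021-04-19', '2021-04-23', '2021-05-02']),
})

# Weekday name, as in the notebook
demo['Completed Weekday'] = demo['Completed Date'].dt.day_name()

# dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday
demo['Completed on Weekend'] = demo['Completed Date'].dt.dayofweek >= 5

print(demo)
```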

In [12]:
#Keep only the fields required for the output, in the order they should appear
df = df[['Completed Weekday','Task','Scope to Build Time','Build to Delivery Time','Days Difference to Schedule',
         'Project','Sub-project','Owner','Scheduled Date','Completed Date']]

In [13]:
df.head()

Unnamed: 0,Completed Weekday,Task,Scope to Build Time,Build to Delivery Time,Days Difference to Schedule,Project,Sub-project,Owner,Scheduled Date,Completed Date
0,Monday,Scope,4 days,12 days,0.0,New Loyalty Scheme,Marketing,Tom,2021-04-19,2021-04-19
1,Friday,Build,4 days,12 days,2.0,New Loyalty Scheme,Marketing,Tom,2021-04-21,2021-04-23
2,Wednesday,Deliver,4 days,12 days,5.0,New Loyalty Scheme,Marketing,Tom,2021-04-30,2021-05-05
3,Thursday,Scope,11 days,6 days,0.0,New Loyalty Scheme,Operations,Jenny,2021-04-15,2021-04-15
4,Monday,Build,11 days,6 days,3.0,New Loyalty Scheme,Operations,Jenny,2021-04-23,2021-04-26


In [15]:
df.to_csv('WK18-output.csv', index=False) 
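
The two duration fields are timedeltas, so they are written to the csv as text rather than as plain numbers. If the dashboard tool is expected to treat them as numeric day counts (an assumption, not something the challenge states), they can be converted with `.dt.days` before export. The sketch below writes to an in-memory buffer instead of a file; the frame `demo` and the column name `Scope to Build Days` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical mini-frame with a timedelta column
demo = pd.DataFrame({
    'Task': ['Scope', 'Build'],
    'Scope to Build Time': [pd.Timedelta(days=4), pd.Timedelta(days=4)],
})

# Whole days as integers instead of timedelta values
demo['Scope to Build Days'] = demo['Scope to Build Time'].dt.days

# Write only the numeric version to an in-memory csv
buffer = io.StringIO()
demo[['Task', 'Scope to Build Days']].to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```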