### Prepping Data Challenge: Prep Air Project Details (Week 19)

#### The Challenge 
This week's challenge is all about String Calculations. 

This week we want to dig into what is behind the project overruns at Prep Air (every data prepper's favourite airline). To get more detail than the data shared last week, we've uncovered the commentary log that sits behind our project management system. Like most systems that write the detail behind their interface to a log file, it holds plenty of detail, but in an unfriendly format.

We need your help to get stuck into the messy data and extract the useful details.

#### Requirements
 - Input the data
 
There are lots of different ways to do this challenge, so rather than a step-by-step set of requirements, feel free to create each of these data fields in whatever order you like:
 - 'Week' with the word week and week number together 'Week x' 
 - 'Project' with the full project name
 - 'Sub-Project' with the full sub-project name
 - 'Task' with the full type of task
 - 'Name' with the task owner's full name (Week 18's output can help you check these if needed)
 - 'Days Noted' with the number of days a task might take, if the comment mentions one; otherwise leave it null
 - 'Detail' with the description from the system output, with the project codes in the [ ] removed

- Output the file
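Taking the 'Detail' requirement as an example: it is the comment text with the bracketed codes stripped off the front. Below is a minimal sketch using Python's standard `re` module, on an entry pieced together from the Week 17 rows shown later (the input preview and the final output). `BRACKET_PREFIX` is a hypothetical name, and this is only one way to derive the field; the notebook below captures the detail as a regex group instead.

```python
import re

# Hypothetical pattern: a leading "[...]" block plus any whitespace after it.
BRACKET_PREFIX = re.compile(r"^\[[^\]]*\]\s*")

entry = "[NLS/Op-Bu] Build kickoff but long project. jen."

# Remove the bracketed codes, keeping only the comment text.
detail = BRACKET_PREFIX.sub("", entry)
print(detail)  # Build kickoff but long project. jen.
```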

```python
import pandas as pd
```

```python
# Input the data: one DataFrame per sheet of the workbook
with pd.ExcelFile('WK19-Input.xlsx') as xlsx:
    P_updates = pd.read_excel(xlsx, 'Project Schedule Updates')
    P_lookup = pd.read_excel(xlsx, 'Project Lookup Table')
    Sub_project = pd.read_excel(xlsx, 'Sub-Project Lookup Table')
    Task = pd.read_excel(xlsx, 'Task Lookup Table')
    Owner = pd.read_excel(xlsx, 'Owner Lookup Table')
```

```python
P_updates.head()
```

|   | Week | Commentary |
|---|---|---|
| 0 | 16 | [NLS/Op-Sc] Delivered scope for the project. R... |
| 1 | 17 | [NLS/Op-Bu] Build kickoff but long project. je... |
| 2 | 18 | [NLS/Op-De] Long delivery process has begun at... |
| 3 | 19 | [NTI/Mar-Bu] Project build commences. Will be ... |
| 4 | 20 | [NTI/Mar-De] Delivery next week around 8 days.... |


```python
P_lookup.head()
```

|   | Project Code | Project |
|---|---|---|
| 0 | NLS | New Loyalty Scheme |
| 1 | NTI | New Trolley Inventory |
| 2 | SPS | Spring Sale |


```python
Sub_project.head()
```

|   | Sub-Project Code | Sub-Project |
|---|---|---|
| 0 | mar | Marketing |
| 1 | op | Operations |


```python
Task.head()
```

|   | Task Code | Task |
|---|---|---|
| 0 | Sc | Scope |
| 1 | Bu | Build |
| 2 | De | Deliver |


```python
Owner.head()
```

|   | Abbreviation | Name |
|---|---|---|
| 0 | Tom | Tom |
| 1 | Jen | Jenny |
| 2 | Jon | Jonathan |
| 3 | Car | Carl |


```python
# 'Week': build the 'Week x' label and use it as the index,
# so it is carried along when the commentary is split in the next step
P_updates.index = 'Week ' + P_updates['Week'].astype(str)
```

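```python
# Split each week's commentary into one row per [project] entry:
# break on whitespace that is followed by '[', explode the lists into rows,
# then move the 'Week' index back into a column
P_updates = (
    P_updates['Commentary']
    .str.split(r'\s+(?=\[)')
    .explode()
    .str.strip()
    .reset_index()
)
```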
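The lookahead `(?=\[)` is what keeps each `[` attached to its entry: the pattern consumes only the whitespace in front of a bracket, never the bracket itself, and spaces inside a comment are left alone because no `[` follows them. A minimal sketch with the standard `re` module (pandas `str.split` applies the same regex); the sample string is hypothetical, stitched together from two Week 17 entries in the final output, and the `[NLS/Mar-Sc]` code is inferred from the lookup tables rather than copied from the input.

```python
import re

# Hypothetical commentary for one week: two log entries run together.
commentary = (
    "[NLS/Op-Bu] Build kickoff but long project. jen. "
    "[NLS/Mar-Sc] Scope completed. tom."
)

# Split only on whitespace whose next character is '[' (zero-width lookahead),
# then trim any leftover spaces from each piece.
entries = [part.strip() for part in re.split(r"\s+(?=\[)", commentary)]

for entry in entries:
    print(entry)
# [NLS/Op-Bu] Build kickoff but long project. jen.
# [NLS/Mar-Sc] Scope completed. tom.
```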

```python
# Parse each entry: [Project/Sub-Project-Task] followed by the detail text
P_updates[['Project Code', 'Sub-Project Code', 'Task Code', 'Detail']] = (
    P_updates['Commentary'].str.extract(r'\[(.*?)/(.*?)-(.*?)\]\s+(.*)')
)
```
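Each code group in the pattern is lazy (`.*?`), so it stops at the first `/`, `-` or `]` it meets; that is what separates the project, sub-project and task codes, while the final greedy `(.*)` takes the rest of the entry as the detail. A minimal sketch of the same pattern on one entry with the standard `re` module (pandas `str.extract` returns one column per group); `ENTRY_PATTERN` is a hypothetical name.

```python
import re

# Three lazy groups for the codes inside the brackets, then the detail text.
ENTRY_PATTERN = re.compile(r"\[(.*?)/(.*?)-(.*?)\]\s+(.*)")

entry = "[NLS/Op-Bu] Build kickoff but long project. jen."
match = ENTRY_PATTERN.search(entry)
project_code, sub_project_code, task_code, detail = match.groups()

print(project_code, sub_project_code, task_code)  # NLS Op Bu
print(detail)  # Build kickoff but long project. jen.
```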

```python
# Owner abbreviation: the last word before the final full stop (e.g. '... jen.')
P_updates['Abbreviation'] = P_updates['Detail'].str.extract(r'.*\s+(.*)\.\s*', expand=False)
```
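The owner pattern relies on greedy matching: the leading `.*` runs as far right as it can, so `\s+(.*)\.` ends up on the last whitespace-separated token before a trailing full stop, which is where the log puts the owner's initials. A minimal sketch with the standard `re` module on details taken from the final output; `OWNER_PATTERN` is a hypothetical name. A detail that does not end in a full stop would not match (NaN in pandas, `None` from `search` here).

```python
import re

# Greedy '.*' pushes the match to the last whitespace, so the group captures
# the final token before a trailing full stop (the owner abbreviation).
OWNER_PATTERN = re.compile(r".*\s+(.*)\.\s*")

details = [
    "Build kickoff but long project. jen.",
    "Scope completed. tom.",
    "Marketing Build complete. tom.",
]

owners = [OWNER_PATTERN.search(d).group(1) for d in details]
print(owners)  # ['jen', 'tom', 'tom']
```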

```python
# 'Days Noted': the number before 'days' in the comment, converted to float
# so comments that mention no number of days stay NaN (null)
P_updates['Days Noted'] = P_updates['Detail'].str.extract(r'(\d+)\sdays', expand=False).astype(float)
```

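```python
# Blank out missing day counts (to_csv would also write NaN as an empty field);
# assign the result back rather than calling fillna(inplace=True) on a column
P_updates['Days Noted'] = P_updates['Days Noted'].fillna('')
```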
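A minimal sketch of the same extraction for a single comment, using the standard `re` module; `days_noted` is a hypothetical helper, and the samples are comments visible in the tables above (the first is cut at `days` because the input preview is truncated there). The pattern only matches the plural `days`, so a comment such as `1 day` would come back empty; `days?` would catch both if that case occurs.

```python
import re

# Hypothetical helper built on the core of the notebook's pattern: the first
# run of digits followed by one whitespace character and the word 'days'.
DAYS_PATTERN = re.compile(r"(\d+)\sdays")


def days_noted(detail):
    """Return the number of days mentioned in a comment, or None if none is mentioned."""
    match = DAYS_PATTERN.search(detail)
    return int(match.group(1)) if match else None


print(days_noted("Delivery next week around 8 days"))  # 8
print(days_noted("Scope completed. tom."))             # None
```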

```python
P_updates['Sub-Project Code'].unique()
```

array(['Op', 'Mar', 'Ops'], dtype=object)
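This output shows why the case fix below is needed: the commentary uses `Op`, `Mar` and `Ops`, while the Sub-Project Lookup Table uses `op` and `mar`, so none of them would match as they stand. A minimal sketch of checking for unmatched codes before joining, in plain Python; `ALIASES` and `normalise` are hypothetical names. In pandas itself, `merge(..., indicator=True)` adds a `_merge` column whose `left_only` rows are the ones that found no match.

```python
# Codes found in the commentary (from the unique() output above)
# and the codes in the Sub-Project Lookup Table.
found_codes = ["Op", "Mar", "Ops"]
lookup_codes = {"mar", "op"}

# Hypothetical alias map for spellings that differ from the lookup table.
ALIASES = {"ops": "op"}


def normalise(code):
    """Lower-case a code and map known aliases onto the lookup spelling."""
    code = code.lower()
    return ALIASES.get(code, code)


unmatched_before = sorted(set(found_codes) - lookup_codes)
unmatched_after = sorted({normalise(c) for c in found_codes} - lookup_codes)

print(unmatched_before)  # ['Mar', 'Op', 'Ops']
print(unmatched_after)   # []
```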

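```python
# Normalise case so the codes match the lookup tables:
# sub-project codes are lower case there ('Ops' is a variant spelling of 'op'),
# owner abbreviations are title case ('jen' -> 'Jen')
P_updates['Sub-Project Code'] = P_updates['Sub-Project Code'].str.lower().replace('ops', 'op')
P_updates['Abbreviation'] = P_updates['Abbreviation'].str.title()
```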

```python
# Join to the lookup tables to swap each code for its full name
# (left joins keep every entry, even if a code finds no match)
P_updates = (
    P_updates.merge(P_lookup, on='Project Code', how='left')
    .merge(Sub_project, on='Sub-Project Code', how='left')
    .merge(Task, on='Task Code', how='left')
    .merge(Owner, on='Abbreviation', how='left')
)
```

```python
output = P_updates[['Week', 'Project', 'Sub-Project', 'Task', 'Name', 'Days Noted', 'Detail']]
```

```python
output.head()
```

|   | Week | Project | Sub-Project | Task | Name | Days Noted | Detail |
|---|---|---|---|---|---|---|---|
| 0 | Week 16 | New Loyalty Scheme | Operations | Scope | Jenny | | Delivered scope for the project. Resourcing fi... |
| 1 | Week 17 | New Loyalty Scheme | Operations | Build | Jenny | | Build kickoff but long project. jen. |
| 2 | Week 17 | New Loyalty Scheme | Marketing | Scope | Tom | | Scope completed. tom. |
| 3 | Week 17 | New Loyalty Scheme | Marketing | Build | Tom | | Marketing Build complete. tom. |
| 4 | Week 17 | Spring Sale | Marketing | Scope | Carl | 3.0 | Completed but late in the week due (3 days nee... |


```python
# Output the file
output.to_csv('WK19-output.csv', index=False)
```