### Prepping Data Challenge: Timesheet checks (week 17)

#### The Challenge 
My employees log their hours daily and are contracted to 8 hours per day, so I want to check their average number of hours worked per day over the last 2 weeks. I also allow 20% of their time (not including Chats) for their own special projects, meaning they should spend at least 80% of their time on Client work, so I also want to check that they are sticking to instructions by calculating the % of total hours spent on Client work. The task has three sets of requirements, as the stakeholder is quite specific.

#### Requirements
 - Remove the ‘Totals’ Rows
 - Pivot Dates to rows and rename fields 'Date' and 'Hours'
 - Split the ‘Name, Age, Area of Work’ field into 3 Fields and Rename
 - Remove unnecessary fields
 - Remove the row where Dan was on Annual Leave and check the data type of the Hours Field.
 - Total up the number of hours spent on each area of work for each date by each employee.

 - First we are going to work out the avg number of hours per day worked by each employee
   - Calculate the total number of hours worked and days worked per person
   - Calculate the avg hours and remove unnecessary fields.

 - Now we are going to work out what % of their day (not including Chats) was spent on Client work.
   - Filter out Work related to Chats.
   - Calculate total number of hours spent working on each area for each employee
   - Calculate total number of hours spent working on both areas together for each employee
   - Join these totals together
   - Calculate the % of total and remove unnecessary fields
   - Filter the data to just show Client work
   - Join to the table with Avg hours to create your final output

### Input the data

In [1]:
import pandas as pd

In [2]:
with pd.ExcelFile('WK17-Input.xlsx') as xlsx:
    df = pd.read_excel(xlsx)

In [3]:
#Remove the ‘Totals’ Rows
df = df[df['Name, Age, Area of Work'].notnull()]

In [4]:
df.head()

|  | Name, Age, Area of Work | Project | 2021-02-01 00:00:00 | 2021-02-02 00:00:00 | 2021-02-03 00:00:00 | 2021-02-04 00:00:00 | 2021-02-05 00:00:00 | 2021-02-08 00:00:00 | 2021-02-09 00:00:00 | 2021-02-10 00:00:00 | 2021-02-11 00:00:00 | 2021-02-12 00:00:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dan, 28: Client | Client Meetings | NaN | 2.0 | NaN | 1.0 | NaN | 1.5 | 0.5 | NaN | NaN | Annual Leave |
| 1 | Dan, 28: Client | Client Issues | 1.0 | 1.5 | 4.5 | 3.5 | 1.0 | 2.0 | 1.0 | 2.0 | 3.0 | Annual Leave |
| 2 | Dan, 28: Client | Monthly Reports | NaN | NaN | NaN | NaN | 2.0 | 1.0 | 1.0 | 2.0 | 1.0 | Annual Leave |
| 3 | Dan, 28: Client | Client Emails | 2.0 | 0.5 | 0.5 | 0.5 | 1.0 | 1.0 | 1.0 | NaN | NaN | Annual Leave |
| 4 | Dan, 28: Client | Client Communications | 1.0 | 1.0 | NaN | NaN | NaN | 0.5 | NaN | NaN | NaN | Annual Leave |


In [5]:
#Pivot Dates to rows and rename fields 'Date' and 'Hours'
df = df.melt(id_vars=['Name, Age, Area of Work','Project'], var_name='Date', value_name='Hour')
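
To make the reshaping step easier to picture, here is a minimal sketch of `melt` on a toy frame. The rows, the `Team Chat` project and the variable names `wide` and `long` are made up for illustration and are not taken from the input file.

```python
import pandas as pd

# Toy wide-format timesheet; the values are invented for illustration only.
wide = pd.DataFrame({
    'Name, Age, Area of Work': ['Dan, 28: Client', 'Dan, 28: Chats'],
    'Project': ['Client Issues', 'Team Chat'],
    '2021-02-01': [1.0, 0.5],
    '2021-02-02': [1.5, None],
})

# Columns listed in id_vars are repeated on every output row; every other
# column is stacked into two new columns named by var_name and value_name.
long = wide.melt(id_vars=['Name, Age, Area of Work', 'Project'],
                 var_name='Date', value_name='Hour')

print(long.shape)  # 2 rows x 2 date columns -> (4, 4)
```

Each original row appears once per date column, so the row count grows with the number of dates while the identifier columns stay unchanged.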

In [6]:
#Split the ‘Name, Age, Area of Work’ field into 3 Fields and Rename
df[['Name','Age,Area of work']] = df['Name, Age, Area of Work'].str.split(',', expand=True)

In [7]:
df[['Age','Area of work']] = df['Age,Area of work'].str.split(':', expand=True)
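
One thing to watch with splitting on a bare `,` or `:` is that the space after the delimiter stays in the result. The sketch below shows this on two sample strings (the second one is invented) and one possible way to avoid it, assuming every value follows the `Name, Age: Area` pattern.

```python
import pandas as pd

# Sample strings in the same shape as the input column; 'Sam, 30' is made up.
s = pd.Series(['Dan, 28: Client', 'Sam, 30: Special Projects'])

# Splitting on the bare delimiter keeps the space that follows it.
raw = s.str.split(',', expand=True)
print(repr(raw[1][0]))  # ' 28: Client'

# One option: split on either delimiter followed by optional whitespace.
# A multi-character pattern is treated as a regular expression by default.
parts = s.str.split(r'[,:]\s*', expand=True)
parts.columns = ['Name', 'Age', 'Area of work']
print(parts)
```

Calling `.str.strip()` on the new columns after the original two-step split would be an equally reasonable alternative.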

In [8]:
#Remove unnecessary fields
df = df.drop(['Name, Age, Area of Work','Age,Area of work'], axis=1)

In [9]:
df.head()

|  | Project | Date | Hour | Name | Age | Area of work |
|---|---|---|---|---|---|---|
| 0 | Client Meetings | 2021-02-01 | NaN | Dan | 28 | Client |
| 1 | Client Issues | 2021-02-01 | 1.0 | Dan | 28 | Client |
| 2 | Monthly Reports | 2021-02-01 | NaN | Dan | 28 | Client |
| 3 | Client Emails | 2021-02-01 | 2.0 | Dan | 28 | Client |
| 4 | Client Communications | 2021-02-01 | 1.0 | Dan | 28 | Client |


In [10]:
#Remove the row where Dan was on Annual Leave and check the data type of the Hours Field.
df = df.loc[df['Hour'] != 'Annual Leave']
df['Hour'].dtypes

dtype('O')

In [11]:
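#astype() returns a new Series, so assign the result back to the column
df['Hour'] = df['Hour'].astype('float')
#Treat missing entries as zero hours
df['Hour'] = df['Hour'].fillna(0)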
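
As an aside, an alternative way to get a numeric column is `pd.to_numeric` with `errors='coerce'`, which turns any non-numeric text into `NaN` instead of raising. The sketch below uses a made-up Series (`hours`) rather than the notebook's data.

```python
import pandas as pd

# Mixed object column similar in spirit to the Hour field; values are invented.
hours = pd.Series([1.0, None, 'Annual Leave', 2.5], dtype='object')

# errors='coerce' converts anything that is not a number into NaN.
numeric = pd.to_numeric(hours, errors='coerce')

print(numeric.dtype)            # float64
print(numeric.fillna(0).sum())  # 3.5
```

Note that coercing keeps the 'Annual Leave' rows in the data as `NaN`, so the leave date would still be counted as a distinct day unless those rows are removed first, as the notebook does.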

In [12]:
#First we are going to work out the avg number of hours per day worked by each employee
#Calculate the total number of hours worked and days worked per person
totals = df.groupby('Name').agg(total_days=('Date', 'nunique'),
                                  total_hours=('Hour', 'sum')).reset_index()
#Calculate the avg hours
totals['Avg Number of Hours worked per day'] = totals['total_hours'] / totals['total_days']

In [13]:
totals.head()

|  | Name | total_days | total_hours | Avg Number of Hours worked per day |
|---|---|---|---|---|
| 0 | Dan | 9 | 72.25 | 8.027778 |
| 1 | George | 10 | 84.0 | 8.4 |
| 2 | Sam | 10 | 77.0 | 7.7 |
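
The requirement to total up hours for each area of work, date and employee could be sketched as a three-key `groupby`. This is only an illustration: the frame `df_long` and its values are made up, and the column names simply mirror the ones used in this notebook.

```python
import pandas as pd

# Toy long-format data; values are invented for illustration only.
df_long = pd.DataFrame({
    'Name': ['Dan', 'Dan', 'Dan', 'Sam'],
    'Area of work': ['Client', 'Client', 'Chats', 'Client'],
    'Date': ['2021-02-01', '2021-02-01', '2021-02-01', '2021-02-01'],
    'Hour': [1.0, 2.0, 0.5, 4.0],
})

# One row per employee, area of work and date, with the hours summed.
daily = df_long.groupby(['Name', 'Area of work', 'Date'], as_index=False)['Hour'].sum()

print(daily)
```

Because the later steps sum hours again at a coarser level, aggregating at this finer level first does not change the per-employee totals.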


In [15]:
#Now we are going to work out what % of their day (not including Chats) was spent on Client work.
#Filter out work related to Chats
df_ex_chats = df[~df['Area of work'].str.contains('Chats')]
#Total hours per employee and area of work
df_ex_chats = df_ex_chats.groupby(['Name','Area of work'])['Hour'].sum().reset_index()

# % of day (not including Chats)
df_ex_chats['% of Total'] = (df_ex_chats['Hour'] / df_ex_chats.groupby('Name')['Hour'].transform('sum'))\
                        .map('{:.0%}'.format)
#Series.map() applies the given function to every value in the Series.
#'{:.0%}'.format multiplies the value by 100, rounds it to 0 decimal places and appends '%'
#(for example 0.75 becomes '75%'), so the resulting column holds strings, not numbers.

Further reading: https://pyformat.info/
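
A quick sketch of what the percentage format specifier does, using Dan's Client and Special Projects hours from the output (40.5 and 13.5). The variable names are just for illustration.

```python
# Share of Client hours out of Client plus Special Projects hours.
share = 40.5 / (40.5 + 13.5)

no_decimals = '{:.0%}'.format(share)     # multiply by 100, 0 decimals, add '%'
one_decimal = '{:.1%}'.format(share)     # same, with 1 decimal place
left_aligned = '{:<8.0%}'.format(share)  # '<8' is what left-aligns, padding to 8 chars

print(no_decimals)         # 75%
print(one_decimal)         # 75.0%
print(repr(left_aligned))  # '75%     '
```

The formatted values are strings, which is convenient for display but means they need converting back before any numeric comparison.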

In [16]:
df_ex_chats.head()

|  | Name | Area of work | Hour | % of Total |
|---|---|---|---|---|
| 0 | Dan | Client | 40.5 | 75% |
| 1 | Dan | Special Projects | 13.5 | 25% |
| 2 | George | Client | 56.5 | 81% |
| 3 | George | Special Projects | 13.0 | 19% |
| 4 | Sam | Client | 53.0 | 87% |


In [12]:
#Merge
df_output = pd.merge(df_ex_chats, totals,on='Name', how='left')

In [13]:
#Drop the Special Projects rows so only Client work remains
df_output = df_output[~df_output['Area of work'].str.contains('Special Projects')]

In [14]:
df_output = df_output[['Name', 'Area of work', '% of Total', 'Avg Number of Hours worked per day']]
df_output.head()

|  | Name | Area of work | % of Total | Avg Number of Hours worked per day |
|---|---|---|---|---|
| 0 | Dan | Client | 75% | 8.027778 |
| 2 | George | Client | 81% | 8.4 |
| 4 | Sam | Client | 87% | 7.7 |
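
To act on the output, the two checks described in the challenge (at least 80% Client work, and an 8 hour daily average) could be expressed as boolean flags. This is a hedged sketch, not part of the original solution: the frame `out` is rebuilt by hand from the values shown above, and the two flag column names are hypothetical.

```python
import pandas as pd

# Final output values copied from the table above.
out = pd.DataFrame({
    'Name': ['Dan', 'George', 'Sam'],
    '% of Total': ['75%', '81%', '87%'],
    'Avg Number of Hours worked per day': [8.027778, 8.4, 7.7],
})

# The percentage column holds strings, so convert it back to a number first.
client_share = out['% of Total'].str.rstrip('%').astype(float) / 100

# Hypothetical flag columns for the two thresholds named in the challenge.
out['Meets 80% client target'] = client_share >= 0.80
out['Meets 8 hour average'] = out['Avg Number of Hours worked per day'] >= 8

print(out[['Name', 'Meets 80% client target', 'Meets 8 hour average']])
```

Keeping the percentage numeric until the very end and formatting it only for export would avoid the string round trip.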


In [15]:
df_output.to_csv('WK17-output.csv', index=False)