# Timesheet Analysis

This notebook analyzes my timesheet of working hours. The questions I want to answer are:

- How many extra hours I have accumulated

- What the mean number of extra hours is per day/week/month

- On which days I worked the most extra hours (excluding weekends)

Future goals:

- Predict how many extra hours I will work in a given week/month

Some important information about the analysis:

- The date range begins on 09/15/2016 because that was the day when the timesheet became "official"

- The daily workload is 08h45m including a 1h lunch break, i.e. 07h45m of actual work
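As a quick worked example of the workload rule, here is a minimal sketch that computes the extra hours for a single day. The clock-in and clock-out times are sample values, and the variable names (`workload`, `lunch`, `time_at_office`, `net_work`) are introduced only for illustration.

```python
import pandas as pd

# Daily workload including the 1h lunch break, as stated above
workload = pd.to_timedelta('08:45:00')
lunch = pd.to_timedelta('01:00:00')

# Sample clock-in / clock-out times (illustrative values)
hour_in = pd.to_timedelta('08:15:00')
hour_out = pd.to_timedelta('19:37:00')

# Time between clock-in and clock-out, lunch included
time_at_office = hour_out - hour_in

# Extra hours are whatever exceeds the 08h45m workload
extra_hours = time_at_office - workload

# Net working time expected per day once lunch is removed
net_work = workload - lunch

print(time_at_office, extra_hours, net_work)
```

Because the workload already includes lunch, the lunch break never has to be subtracted from the clocked time.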

In [20]:
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd
import datetime

In [21]:
# Connecting to Google Spreadsheet API and getting the desired spreadsheet

client_secret_path = '/home/aiquis/study/timesheet-analysis/client_secret.json'
sheet_name = "Controle de ponto - EI"

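# client.open() looks the spreadsheet up by title, which also requires the Drive scope
scope = ['https://www.googleapis.com/auth/spreadsheets',
         'https://www.googleapis.com/auth/drive']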
creds = ServiceAccountCredentials.from_json_keyfile_name(client_secret_path, scope)
client = gspread.authorize(creds)

sheet = client.open(sheet_name).sheet1

In [22]:
# Storing the content of the spreadsheet on a DataFrame

df = pd.DataFrame(sheet.get_all_records(), columns = ['Data', 'Hora Entrada', 'Hora Saída', 'Obs'])

df.columns = ['date', 'hour_in', 'hour_out', 'obs']
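The renaming step could also be written with an explicit mapping, so that each Portuguese header is tied to its English name regardless of column order. This is only a sketch: `records` is a hypothetical stand-in for what `sheet.get_all_records()` returns, and `column_names` and `sample_df` are names introduced here.

```python
import pandas as pd

# Hypothetical rows shaped like the output of sheet.get_all_records()
records = [
    {'Data': '01/06/2016', 'Hora Entrada': '08:15', 'Hora Saída': '19:37', 'Obs': ''},
    {'Data': '04/06/2016', 'Hora Entrada': '', 'Hora Saída': '', 'Obs': 'Sábado'},
]

# Portuguese spreadsheet header -> English column name
column_names = {
    'Data': 'date',
    'Hora Entrada': 'hour_in',
    'Hora Saída': 'hour_out',
    'Obs': 'obs',
}

# Select the wanted columns, then rename them through the mapping
sample_df = pd.DataFrame(records, columns=list(column_names)).rename(columns=column_names)

print(sample_df)
```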

In [None]:
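# Casting column data types

# errors='coerce' turns unparsable values into NaT; errors='ignore' would silently
# leave the whole column as strings and break the date operations further down
df['date'] = pd.to_datetime(df['date'], errors='coerce', format="%d/%m/%Y")
# ':00' is appended on the assumption that the sheet stores times as HH:MM
df['hour_in'] = pd.to_timedelta(df.hour_in + ':00', errors='coerce')
df['hour_out'] = pd.to_timedelta(df.hour_out + ':00', errors='coerce')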

df.info()
df.head()
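To illustrate how the casts behave on missing values, here is a small sketch on hypothetical rows. It uses `errors='coerce'` for every column, so empty or unparsable cells become `NaT` instead of raising. The `sample` DataFrame and its values are made up for the example.

```python
import pandas as pd

# Hypothetical rows: one worked day and one day with empty cells
sample = pd.DataFrame({
    'date': ['01/06/2016', ''],
    'hour_in': ['08:15', ''],
    'hour_out': ['19:37', ''],
})

# Dates are day/month/year; values that cannot be parsed become NaT
sample['date'] = pd.to_datetime(sample['date'], format='%d/%m/%Y', errors='coerce')

# Times are HH:MM strings, so ':00' is appended to get HH:MM:SS
sample['hour_in'] = pd.to_timedelta(sample['hour_in'] + ':00', errors='coerce')
sample['hour_out'] = pd.to_timedelta(sample['hour_out'] + ':00', errors='coerce')

print(sample.dtypes)
print(sample)
```

Rows that end up with `NaT` are the ones a later `dropna` call removes.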

In [None]:
# Handling null (NaT) values in the hour columns and setting column 'date' as the DataFrame index

# Dropping rows with NaT because they represent days that I didn't work, so they're useless for the analysis

df = df.dropna(axis=0, how='any')

df = df.set_index('date')

df.info()
df.head()
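The introduction says the analysis should start on 09/15/2016, but none of the cells above restrict the date range. A minimal sketch of how that cut-off could be applied once `date` is the index, using hypothetical sample rows; `official_start` and `official` are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample indexed by date, mimicking df after set_index('date')
sample = pd.DataFrame(
    {'hour_in': pd.to_timedelta(['08:15:00', '08:30:00', '09:00:00'])},
    index=pd.to_datetime(['2016-06-01', '2016-09-15', '2016-09-16']),
)
sample.index.name = 'date'

# First day on which the timesheet became "official"
official_start = pd.Timestamp('2016-09-15')

# A boolean mask works even if the index is not sorted
official = sample[sample.index >= official_start]

print(official)
```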

In [25]:
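# Daily workload including the 1h lunch break ("carga horária" is Portuguese for workload)
carga_horaria = pd.to_timedelta('08:45:00')

# Name of the weekday, e.g. 'Monday'
df['week_day'] = df.index.strftime('%A')

# Time between clock-in and clock-out (lunch included)
df['worked_hours'] = df['hour_out'] - df['hour_in']

# Positive when I stayed longer than the workload, negative otherwise
df['extra_hours'] = df['worked_hours'] - carga_horaria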

df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 286 entries, 2016-06-01 to 2017-08-18
Data columns (total 6 columns):
hour_in         286 non-null timedelta64[ns]
hour_out        286 non-null timedelta64[ns]
obs             286 non-null object
week_day        286 non-null object
worked_hours    286 non-null timedelta64[ns]
extra_hours     286 non-null timedelta64[ns]
dtypes: object(2), timedelta64[ns](4)
memory usage: 15.6+ KB


| date | hour_in | hour_out | obs | week_day | worked_hours | extra_hours |
|---|---|---|---|---|---|---|
| 2016-06-01 | 08:15:00 | 19:37:00 | | Wednesday | 11:22:00 | 02:37:00 |
| 2016-06-02 | 08:26:00 | 17:31:00 | | Thursday | 09:05:00 | 00:20:00 |
| 2016-06-03 | 08:08:00 | 21:31:00 | | Friday | 13:23:00 | 04:38:00 |
| 2016-06-06 | 09:31:00 | 17:50:00 | | Monday | 08:19:00 | -1 days +23:34:00 |
| 2016-06-07 | 07:59:00 | 19:00:00 | | Tuesday | 11:01:00 | 02:16:00 |
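With `extra_hours` available, the questions from the introduction could be answered along these lines. This is a sketch, not part of the original analysis: it rebuilds the five rows of the output above as a small sample, converts the timedeltas to float hours to keep the aggregations simple, and reads "mean per week/month" as the mean of the weekly/monthly totals, which is an assumption.

```python
import pandas as pd

# extra_hours of the five rows shown above, in minutes (-26 is '-1 days +23:34:00')
sample = pd.DataFrame(
    {'extra_hours': pd.to_timedelta([157, 20, 278, -26, 136], unit='min')},
    index=pd.to_datetime(
        ['2016-06-01', '2016-06-02', '2016-06-03', '2016-06-06', '2016-06-07']),
)

# Work in float hours so that sums and means behave like plain numbers
hours = sample['extra_hours'].dt.total_seconds() / 3600

# Total and mean extra hours per worked day
total_hours = hours.sum()
mean_per_day = hours.mean()

# Totals per calendar week and month, then the mean of those totals
weekly_totals = hours.groupby(hours.index.to_period('W')).sum()
monthly_totals = hours.groupby(hours.index.to_period('M')).sum()
mean_per_week = weekly_totals.mean()
mean_per_month = monthly_totals.mean()

# Days with the most extra hours, excluding weekends (Monday=0 ... Sunday=6)
weekdays_only = hours[hours.index.dayofweek < 5]
top_days = weekdays_only.nlargest(3)

print(total_hours, mean_per_day, mean_per_week, mean_per_month)
print(top_days)
```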


Next problems to solve:

- Column extra_hours shows an unexpected (but correct) value when the result is negative, e.g. `-1 days +23:34:00`. Format it to show plain negative hours instead

- Consider all hours done on weekends and holidays as extra hours

- Consider different weights for the hours depending on the day
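The first two items could be approached roughly as follows. This is a hedged sketch under simple assumptions: `format_signed` and `extra_hours_for` are hypothetical helpers, weekends are detected from the date alone, and holidays and per-day weights are left out because they would need a calendar and a weighting rule that the notebook does not define yet.

```python
import pandas as pd


def format_signed(td):
    """Render a Timedelta as [-]HH:MM:SS instead of '-1 days +23:34:00'."""
    sign = '-' if td < pd.Timedelta(0) else ''
    total_seconds = int(abs(td).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f'{sign}{hours:02d}:{minutes:02d}:{seconds:02d}'


def extra_hours_for(date, worked, workload=pd.Timedelta('08:45:00')):
    """Extra hours for one day; on weekends every worked hour counts as extra."""
    # dayofweek: Monday=0 ... Saturday=5, Sunday=6
    if date.dayofweek >= 5:
        return worked
    return worked - workload


# A Monday with less than the full workload, and a Saturday shift
monday = extra_hours_for(pd.Timestamp('2016-06-06'), pd.Timedelta('08:19:00'))
saturday = extra_hours_for(pd.Timestamp('2016-06-04'), pd.Timedelta('03:00:00'))

print(format_signed(monday))
print(format_signed(saturday))
```

On the real DataFrame, something like `df['extra_hours'].apply(format_signed)` would produce a display-only string column, leaving the timedelta column itself usable for arithmetic.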