# Timesheet Analysis

The goal of this notebook is to analyze the timesheet of my working hours. Things I want to find out:

- How many extra hours I have worked

- The mean number of extra hours per day/week/month

- The weekdays on which I worked the most extra hours (excluding weekends)

Future goals:

- Predict how many extra hours I will work in a given week/month

In [2]:
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd

In [12]:
# Connecting to the Google Sheets API and opening the desired spreadsheet

client_secret_path = '/home/aiquis/EI/timesheet/client_secret.json'
sheet_name = "Controle de ponto - EI"

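# Recent gspread versions look the spreadsheet up by title through the Drive API,
# so the Drive scope is included alongside the Sheets scope
scope = ['https://www.googleapis.com/auth/spreadsheets',
         'https://www.googleapis.com/auth/drive']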
creds = ServiceAccountCredentials.from_json_keyfile_name(client_secret_path, scope)
client = gspread.authorize(creds)

sheet = client.open(sheet_name).sheet1

In [13]:
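# Storing the content of the spreadsheet in a DataFrame, keeping only the columns of interest

df = pd.DataFrame(sheet.get_all_records(), columns=['Data', 'Hora Entrada', 'Hora Saída', 'Obs'])

# Renaming the columns to lowercase identifiers without spaces or accents
df.columns = ['data', 'hora_entrada', 'hora_saida', 'obs']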
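
The cell above relies on `get_all_records()` returning a list of dictionaries, one per spreadsheet row, keyed by the header row. A minimal sketch of what the `columns` argument and the rename do, using hypothetical records instead of the real sheet (the `Total` key and all values are made up for illustration):

```python
import pandas as pd

# Hypothetical records, shaped like the list of dicts returned by get_all_records()
records = [
    {"Data": "01/06/2016", "Hora Entrada": "08:15", "Hora Saída": "19:37", "Obs": "", "Total": "x"},
    {"Data": "02/06/2016", "Hora Entrada": "08:26", "Hora Saída": "17:31", "Obs": "", "Total": "y"},
]

# `columns` selects and orders the keys to keep; "Total" is left out
sample = pd.DataFrame(records, columns=["Data", "Hora Entrada", "Hora Saída", "Obs"])

# Renaming through an explicit mapping does not depend on the column order
sample = sample.rename(columns={
    "Data": "data",
    "Hora Entrada": "hora_entrada",
    "Hora Saída": "hora_saida",
    "Obs": "obs",
})
```

Assigning a list to `df.columns`, as the notebook does, gives the same result as long as the list order matches the column order.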

In [None]:
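# Casting the columns to proper data types

# errors='coerce' turns values that cannot be parsed into NaT instead of leaving the column as strings
df['data'] = pd.to_datetime(df['data'], errors='coerce', format="%d/%m/%Y")
# The sheet stores times as HH:MM, so ':00' is appended to get the HH:MM:SS form
df['hora_entrada'] = pd.to_timedelta(df['hora_entrada'] + ':00', errors='coerce')
df['hora_saida'] = pd.to_timedelta(df['hora_saida'] + ':00', errors='coerce')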

df.info()
df.head()
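
To illustrate the casting step, here is a small sketch on hypothetical rows. It assumes that a day without an entry arrives as an empty string, so that appending `':00'` yields a string that cannot be parsed and `errors='coerce'` turns it into `NaT`. The sample values and the `raw` name are made up.

```python
import pandas as pd

# Hypothetical sample: the second row stands for a day with no clock-in/clock-out
raw = pd.DataFrame({
    "data": ["01/06/2016", "04/06/2016"],
    "hora_entrada": ["08:15", ""],
    "hora_saida": ["19:37", ""],
})

# Day-first dates are parsed with an explicit format
raw["data"] = pd.to_datetime(raw["data"], format="%d/%m/%Y")

# "08:15" + ":00" -> "08:15:00" parses; "" + ":00" -> ":00" does not and becomes NaT
raw["hora_entrada"] = pd.to_timedelta(raw["hora_entrada"] + ":00", errors="coerce")
raw["hora_saida"] = pd.to_timedelta(raw["hora_saida"] + ":00", errors="coerce")

# Rows with NaT in the hour columns can then be dropped
worked_days = raw.dropna(subset=["hora_entrada", "hora_saida"])
```

Storing the times as `timedelta` values rather than strings is what later makes it possible to subtract the arrival time from the departure time.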

In [16]:
# Handling null (NaT) values in the hour columns and setting the 'data' column as the DataFrame index

# Rows with NaT are dropped because they represent days on which I didn't work, so they are useless for the analysis

df = df.dropna(axis=0, how='any')

df = df.set_index('data')

df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 281 entries, 2016-06-01 to 2017-08-11
Data columns (total 3 columns):
hora_entrada    281 non-null timedelta64[ns]
hora_saida      281 non-null timedelta64[ns]
obs             281 non-null object
dtypes: object(1), timedelta64[ns](2)
memory usage: 8.8+ KB


| data | hora_entrada | hora_saida | obs |
|---|---|---|---|
| 2016-06-01 | 08:15:00 | 19:37:00 | |
| 2016-06-02 | 08:26:00 | 17:31:00 | |
| 2016-06-03 | 08:08:00 | 21:31:00 | |
| 2016-06-06 | 09:31:00 | 17:50:00 | |
| 2016-06-07 | 07:59:00 | 19:00:00 | |
