# Connection to Google Sheets
---

Authenticate the Colab user and authorize `gspread` to access the spreadsheet stored on _Google Drive_.

In [1]:
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default
creds, _ = default()

gc = gspread.authorize(creds)

Read the _Spreadsheet_'s values.

In [2]:
worksheet = gc.open('Tracking Day').sheet1

rows = worksheet.get_all_values()
print(rows[1])

['10000000', '16/09/2023', '8:30:00', 'Sveglia', 'Passive', '510', '10']


Save the data in a DataFrame.

In [3]:
import pandas as pd
df_raw = pd.DataFrame.from_records(rows)
df_raw.head(5)

Unnamed: 0,0,1,2,3,4,5,6
0,ID,Date,Hour,Action,Category,Minutes,Duration
1,10000000,16/09/2023,8:30:00,Sveglia,Passive,510,10
2,10000001,16/09/2023,8:40:00,Colazione,Food,520,20
3,10000002,16/09/2023,9:00,Seguire Sofia mentre si prepara,Passive,540,45
4,10000003,16/09/2023,9:45,Partenza per Rovigo,Travelling,585,55


# Data Cleaning
---

Promote the first row to column names and drop it from the data.

In [4]:
df_raw.columns = df_raw.iloc[0]
df = df_raw.drop(df_raw.index[0]).reset_index(drop=True)
df.head(5)

Unnamed: 0,ID,Date,Hour,Action,Category,Minutes,Duration
0,10000000,16/09/2023,8:30:00,Sveglia,Passive,510,10
1,10000001,16/09/2023,8:40:00,Colazione,Food,520,20
2,10000002,16/09/2023,9:00,Seguire Sofia mentre si prepara,Passive,540,45
3,10000003,16/09/2023,9:45,Partenza per Rovigo,Travelling,585,55
4,10000004,16/09/2023,10:40,Partenza da Rovigo,Travelling,640,65
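As a possible shortcut, the header row can be passed as column names when the DataFrame is built, which avoids the promote-and-drop step. The sketch below is not the notebook's own method; it uses a hand-copied excerpt of `rows` (the header plus the first two records shown above) in place of the live worksheet.

```python
import pandas as pd

# Excerpt of the worksheet values, copied from the outputs above
rows = [
    ['ID', 'Date', 'Hour', 'Action', 'Category', 'Minutes', 'Duration'],
    ['10000000', '16/09/2023', '8:30:00', 'Sveglia', 'Passive', '510', '10'],
    ['10000001', '16/09/2023', '8:40:00', 'Colazione', 'Food', '520', '20'],
]

# First row becomes the header, the remaining rows become the data
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df)
```

With this construction the index already starts at 0, so no `reset_index` call is needed.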


Convert `Minutes` and `Duration` to numeric types and `Date` to datetime.

In [5]:
df['Minutes'] = pd.to_numeric(df['Minutes'])
df['Duration'] = pd.to_numeric(df['Duration'])
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
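The `Hour` column is left as text, and in the rows shown above it mixes two formats (`8:30:00` and `9:00`). If it were ever needed as a number, one option is a small helper that handles both formats. The helper `hour_to_minutes` and the column name `MinuteOfDay` are hypothetical names introduced only for this sketch.

```python
import pandas as pd

def hour_to_minutes(value: str) -> int:
    """Convert 'H:MM' or 'H:MM:SS' to minutes after midnight (seconds are ignored)."""
    parts = value.strip().split(':')
    hours, minutes = int(parts[0]), int(parts[1])
    return hours * 60 + minutes

# Hour values copied from the first rows of the sheet
sample = pd.DataFrame({'Hour': ['8:30:00', '8:40:00', '9:00', '9:45']})
sample['MinuteOfDay'] = sample['Hour'].map(hour_to_minutes)
print(sample)
```

For these first-day rows the result coincides with the sheet's `Minutes` column (510, 520, 540, 585); on later days `Minutes` appears to keep counting from the start of the first day, so the two would differ.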

The `Date` column alone is not enough to identify a day. Here a day ends when we go to bed at night (value "Nanna" = sleep) and the next one begins when we wake up in the morning (value "Sveglia" = wake up); bedtime may fall after midnight, so a single day can span two calendar dates. We want to create a `Phase` column that indicates whether an action took place on a _vacation_ day or a _school_ day. As a first step, we collect in a list called `slot_final_index` every index at which a new day begins, i.e. the rows where we wake up.

In [6]:
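# Row indexes of every wake-up ('Sveglia') and bedtime ('Nanna') entry
indexes = list(df[(df['Action'] == 'Sveglia') | (df['Action'] == 'Nanna')].index)
slot_final_index = [0]  # the first day starts at row 0
for i in range(len(indexes) - 1):
  # Two adjacent entries are taken to be a bedtime row immediately followed by a wake-up row
  if indexes[i] == indexes[i+1] - 1:
    slot_final_index.append(indexes[i+1])  # the wake-up row opens a new day
slot_final_index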

[0, 16, 38, 59, 82, 97, 114]
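To see the day-boundary idea in isolation, here is a minimal vectorized sketch on a made-up action log. It is a stricter variant of the loop above, not a drop-in replacement: it only counts a "Sveglia" row that directly follows a "Nanna" row. The frame `toy` and the column `Day` are hypothetical.

```python
import pandas as pd

# Made-up log covering two nights, hence three "days"
toy = pd.DataFrame({'Action': ['Sveglia', 'Colazione', 'Nanna',
                               'Sveglia', 'Relax', 'Nanna',
                               'Sveglia', 'Colazione']})

# A new day starts at a 'Sveglia' row whose previous row is 'Nanna'
is_wake = toy['Action'].eq('Sveglia')
after_sleep = toy['Action'].shift(1).eq('Nanna')
new_day = is_wake & after_sleep

# Same shape as slot_final_index: row 0 plus every detected wake-up row
day_starts = [0] + list(toy.index[new_day])

# Running day counter per row: 0 for the first day, 1 for the second, ...
toy['Day'] = new_day.cumsum()
print(day_starts)
print(toy)
```

A per-row day counter like `Day` can be convenient when later steps need to group by day rather than by calendar date.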

Now let's build the list of labels, "v" for _vacation_ and "s" for _school_, using the central entry of `slot_final_index` as the boundary between the two phases: rows before it are vacation, rows from it onward are school.

In [7]:
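# The middle entry of slot_final_index marks the first row of the school phase
boundary = slot_final_index[len(slot_final_index) // 2]

# Rows before the boundary are vacation ('v'), the rest are school ('s')
phase = ['v'] * boundary + ['s'] * (len(df) - boundary)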

df['Phase'] = phase
df.sample(5)

Unnamed: 0,ID,Date,Hour,Action,Category,Minutes,Duration,Phase
104,3a9d0b3a,2023-09-21,12:37:00,Preparazione per uni,Other,7957,22,s
96,7f0ba5ad,2023-09-21,00:15:00,Nanna,Sleeping,7215,509,s
30,10000030,2023-09-17,15:50,Relax,Passive,2390,150,v
24,10000024,2023-09-17,12:00,Verso Nonna Anna,Travelling,2160,10,v
72,589cecdf,2023-09-19,16:00:00,Sistemaggio appartamento,Other,5280,65,s
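The same labelling can also be written without building an intermediate list, by comparing the row index with the boundary. This is only a sketch: `demo` is a stand-in frame, and its length of 120 rows is an assumption, since the real number of rows is not shown in the notebook.

```python
import numpy as np
import pandas as pd

# Day-start indexes copied from the output above
slot_final_index = [0, 16, 38, 59, 82, 97, 114]

# Middle entry of the list: first row of the school phase
boundary = slot_final_index[len(slot_final_index) // 2]

# Stand-in frame with a default integer index (120 rows is a made-up size)
demo = pd.DataFrame(index=range(120))

# 'v' before the boundary row, 's' from the boundary row onward
demo['Phase'] = np.where(demo.index < boundary, 'v', 's')
print(boundary)
print(demo['Phase'].value_counts())
```

This relies on the index being the default 0..n-1 range, which holds for `df` after the earlier `reset_index(drop=True)`.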


# Analysis
---

Group by `Phase` and `Category` and sum the `Duration` values.

In [8]:
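# Total duration (in minutes) per phase and category
df_grouped = df.groupby(['Phase', 'Category'])['Duration'].sum().reset_index(name='sum')
# Overwrite the computed totals with fixed values; the outputs below are based on these
df_grouped['sum'] = [430, 215, 1935, 86, 1290, 344, 344, 258, 774, 1247, 1118, 559]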
df_grouped

Unnamed: 0,Phase,Category,sum
0,s,Food,430
1,s,Other,215
2,s,Passive,1935
3,s,Productivity,86
4,s,Sleeping,1290
5,s,Travelling,344
6,v,Food,344
7,v,Other,258
8,v,Passive,774
9,v,Productivity,1247
10,v,Sleeping,1118
11,v,Travelling,559


Pivot the _Phase_ column so that each phase becomes its own column.

In [9]:
df_pivot_raw = df_grouped.pivot(index='Category', columns='Phase', values='sum')
df_pivot_raw

Category,s,v
Food,430,344
Other,215,258
Passive,1935,774
Productivity,86,1247
Sleeping,1290,1118
Travelling,344,559
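Grouping and pivoting can also be combined in a single `pivot_table` call. The sketch below shows this on a small made-up frame with the same column names; the durations are invented for illustration and are not taken from the sheet.

```python
import pandas as pd

# Made-up records with the same columns used in the analysis
toy = pd.DataFrame({
    'Phase':    ['v', 'v', 'v', 's', 's', 's'],
    'Category': ['Food', 'Food', 'Sleeping', 'Food', 'Sleeping', 'Sleeping'],
    'Duration': [20, 30, 500, 25, 480, 10],
})

# One row per category, one column per phase, cells hold summed durations
pivot = toy.pivot_table(index='Category', columns='Phase',
                        values='Duration', aggfunc='sum', fill_value=0)
print(pivot)
```

`fill_value=0` covers categories that never occur in one of the phases, which would otherwise show up as missing values.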


Flatten the pivot into a regular DataFrame with descriptive column names.

In [10]:
df_pivot = pd.DataFrame({'Category' : list(df_pivot_raw.index),
                         'Vacation' : list(df_pivot_raw['v']),
                         'School' : list(df_pivot_raw['s'])},
                         index=range(len(df_pivot_raw)))
df_pivot

Unnamed: 0,Category,Vacation,School
0,Food,344,430
1,Other,258,215
2,Passive,774,1935
3,Productivity,1247,86
4,Sleeping,1118,1290
5,Travelling,559,344


Express each _Category_ as a percentage of the total time in its phase.

In [11]:
df_pivot['Vacation'] = round(df_pivot['Vacation'] / df_pivot['Vacation'].sum() * 100, 0)
df_pivot['School'] = round(df_pivot['School'] / df_pivot['School'].sum() * 100, 0)
df_pivot

Unnamed: 0,Category,Vacation,School
0,Food,8.0,10.0
1,Other,6.0,5.0
2,Passive,18.0,45.0
3,Productivity,29.0,2.0
4,Sleeping,26.0,30.0
5,Travelling,13.0,8.0
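Both columns can be normalized in one expression by dividing the frame by its column sums. The sketch below rebuilds `df_pivot` from the totals shown earlier and stores the result in a separate frame, `pct` (a name introduced here), so the original totals are kept.

```python
import pandas as pd

# Totals copied from the pivot output above
df_pivot = pd.DataFrame({
    'Category': ['Food', 'Other', 'Passive', 'Productivity', 'Sleeping', 'Travelling'],
    'Vacation': [344, 258, 774, 1247, 1118, 559],
    'School':   [430, 215, 1935, 86, 1290, 344],
})

cols = ['Vacation', 'School']

# Divide each column by its own total, then scale to percent and round
pct = df_pivot[cols].div(df_pivot[cols].sum()).mul(100).round(0)
pct.insert(0, 'Category', df_pivot['Category'])
print(pct)

# Sanity check: each phase should add up to roughly 100
print(pct[cols].sum())
```

Here both columns sum to exactly 100, but in general rounded percentages may add up to slightly more or less than 100.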
