Health and Exercise Tracker Analysis Notebook

The following cell pulls the live tracker from Google Drive so I don't have to re-download the file to a data folder every time.

In [1]:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import pandas as pd
import requests
from io import BytesIO

spreadsheetId = "16EZhzrGxpV86c_1Axe9RpDaQNBbIbFWPVzkUyhOMIiA"  # <--- Please set the Spreadsheet ID.

# 1. Download the Google Spreadsheet as XLSX format.
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
url = "https://www.googleapis.com/drive/v3/files/" + spreadsheetId + "/export?mimeType=application%2Fvnd.openxmlformats-officedocument.spreadsheetml.sheet"
res = requests.get(url, headers={"Authorization": "Bearer " + gauth.attr['credentials'].access_token})
res.raise_for_status()  # fail early if the export request was rejected

# 2. The downloaded XLSX data is read with `pd.read_excel`.
sheet = "Sheet1"
df = pd.read_excel(BytesIO(res.content), usecols=None, sheet_name=sheet)

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?client_id=895221966072-ivfclv15clemuid8o8fphc3205ccooh2.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2F&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&access_type=offline&response_type=code

Authentication successful.


Here is a preview of the DataFrame.

In [2]:
df

Unnamed: 0,date,weight_kg,hours_slept,sleep_quality,bpm,walk_distance_km,walk_time,run_distance_km,run_time,cycle_distance_km,cycle_time,weights_intensity,weights_time,notes
0,07.03.2025,77.0,06:30:00,4,54,,,5.1,0:30:32,,,,,
1,08.03.2025,75.0,10:36:00,7,56,,,,,,,,,
2,09.03.2025,76.2,07:07:00,5,58,,,,,,,3.0,,
3,10.03.2025,77.5,08:32:00,6,55,3.74,41.02,3.77,37.46,,,,,"St Ramon (Uphill), sprained ankle, slow walk back"
4,11.03.2025,76.6,08:41:00,5,54,,,,,19.69,56.19,,,


And here we can see the data types of the individual columns.

In [3]:
df.dtypes

date                  object
weight_kg            float64
hours_slept           object
sleep_quality          int64
bpm                    int64
walk_distance_km     float64
walk_time            float64
run_distance_km      float64
run_time              object
cycle_distance_km    float64
cycle_time           float64
weights_intensity    float64
weights_time         float64
notes                 object
dtype: object

Several columns (date, hours_slept and run_time) are stored with the generic object dtype, which will make life difficult for us later on.
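
An object dtype only says "some Python objects", so it helps to look at what is actually stored in each column. The sketch below does that on a small made-up sample that mimics the preview; the helper name `python_types` and the sample values are assumptions for illustration, and the real types inside `df` may differ.

```python
import datetime as dt
import pandas as pd

# Made-up sample that mimics three of the object columns from the preview.
sample = pd.DataFrame({
    "date": ["07.03.2025", "10.03.2025"],
    "hours_slept": [dt.time(6, 30), dt.time(8, 32)],
    "run_time": ["0:30:32", 37.46],  # mixed text and number, as in the preview
})

def python_types(frame, columns):
    """Return the sorted names of the Python types found in each column."""
    return {col: sorted({type(v).__name__ for v in frame[col]}) for col in columns}

type_names = python_types(sample, ["date", "hours_slept", "run_time"])
print(type_names)
```

A column that reports more than one type (like run_time in this sample) needs cleaning before any single conversion will work on it.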

We could just change the data types in the original Google Sheets document, but that wouldn't be good practice.

Let's instead see if we can convert these columns to standard types in pandas.

First we'll convert the date column to a datetime dtype:

In [8]:
# The sheet stores dates as day.month.year, so state the format explicitly.
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y")
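
The preview rows run from 07.03.2025 to 11.03.2025 on consecutive lines, which suggests the dates are day.month.year. Without a hint, a parser may read 07.03 as the 3rd of July, so here is a minimal sketch of parsing with an explicit format. The `raw` values are copied from the preview; the day-first reading is my assumption.

```python
import pandas as pd

# Date strings copied from the preview (assumed to be day.month.year).
raw = pd.Series(["07.03.2025", "08.03.2025", "09.03.2025", "10.03.2025", "11.03.2025"])

# An explicit format removes any day/month guessing.
parsed = pd.to_datetime(raw, format="%d.%m.%Y")

# Sanity check: a daily tracker should advance one day per row.
steps = parsed.diff().dropna()
print(parsed.dt.strftime("%Y-%m-%d").tolist())
print((steps == pd.Timedelta(days=1)).all())
```

If the day-by-day check fails on the real data, that points to skipped days or to a wrong format assumption.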

In [9]:
df.dtypes

date                 datetime64[ns]
weight_kg                   float64
hours_slept                  object
sleep_quality                 int64
bpm                           int64
walk_distance_km            float64
walk_time                   float64
run_distance_km             float64
run_time                     object
cycle_distance_km           float64
cycle_time                  float64
weights_intensity           float64
weights_time                float64
notes                        object
weight                      float64
dtype: object

We can also cast individual columns explicitly. The cell below (In [7]) ran before the dtypes check above, which is why a new "weight" column already appears in that output:

In [7]:
df["weight"] = df['weight_kg'].astype(float)

In [10]:
df["hours_slept"] = pd.to_datetime(df['hours_slept'])

TypeError: <class 'datetime.time'> is not convertible to datetime, at position 0
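
The error says the hours_slept cells are `datetime.time` objects, which describe a time of day and carry no date, so `pd.to_datetime` refuses them. Since the column really holds a duration, one option is to convert each value to a `Timedelta` and then to decimal hours. This is a minimal sketch on made-up sample values; `time_to_timedelta` is a hypothetical helper, not something pandas provides.

```python
import datetime as dt
import pandas as pd

def time_to_timedelta(value):
    """Turn a datetime.time (read as a duration) into a pandas Timedelta."""
    if pd.isna(value):
        return pd.NaT  # keep missing entries missing
    return pd.Timedelta(hours=value.hour, minutes=value.minute, seconds=value.second)

# Made-up sample mimicking the hours_slept column, including a missing entry.
sample = pd.Series([dt.time(6, 30), dt.time(10, 36), None], dtype=object)

slept = pd.to_timedelta(sample.apply(time_to_timedelta))
hours = slept.dt.total_seconds() / 3600  # decimal hours, NaN where missing
print(hours)
```

Decimal hours are easy to average and plot, while the `Timedelta` column keeps the exact duration if that is needed later.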

In [4]:
# df["weight_kg"] = df["weight_kg"].apply(pd.to_numeric)