# Training with Zwift

> "Every time I see an adult on a bicycle, I no longer despair for the future of the human race." 
<br>**H. G. Wells**<br><br>
“Learn to ride a bicycle. You will not regret it if you live.” 
<br>**Mark Twain**

I've mostly biked indoors over the past few years, due to COVID and a shoulder injury. I joined [Zwift](https://us.zwift.com/) last year on the recommendation of a friend. Zwift gamifies bike training, offering an array of virtual routes that vary in difficulty. I'm currently using the app to train for a bikepacking trip [~1400 miles along the Pacific Coast](https://www.adventurecycling.org/routes-and-maps/adventure-cycling-route-network/pacific-coast/) next summer.

One of Zwift's best features is its ability to create insightful visualizations from user data, acquired from a Bluetooth-enabled bike trainer. I recently learned that users can download their ride data, which gives me an opportunity to further characterize my training experience.<br><br>

**GOAL:** *(1)* Characterize bike training patterns over time, starting from March 1, 2022; and *(2)* describe anticipated level of functioning on May 1, 2023.<br>
**DATA:** Ride data downloaded from personal Zwift (v5.62) account.<br>
**ANALYSIS:** Exploratory data analysis; Bayesian modeling.<br>
**ETHICAL CONSIDERATIONS:** There are no apparent issues with transparency, accountability, or equity in terms of available data. To avoid any unforeseen privacy issues, I will not be posting the raw ride data on GitHub. I will do my best to characterize the data in this Notebook to justify any insights drawn from the analyses.<br>
**ADDITIONAL CONSIDERATIONS:** None.<br>

## Load libraries

In [1]:
import os

# data wrangling/analysis
import pandas as pd
import numpy as np

# data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# zwift file mgmt
import fitdecode
from datetime import datetime, timedelta
from typing import Dict, Union, Optional, Tuple

## Load data

Zwift exports ride data as a *FIT* file that contains the following variables:

Field | Variables
---|---------
`file_id` | serial_number, time_created, manufacturer, product, number, type
`device_info` | timestamp, serial_number, cum_operating_time, manufacturer, product, software_version, battery_voltage, device_index, device_type, hardware_version, battery_status
`event` | timestamp, timer_trigger, data16, event, event_type, event_group
`record` | timestamp, position_lat, position_long, distance, time_from_course, speed, compressed_speed_distance, heart_rate, enhanced_altitude, altitude, enhanced_speed, power, grade, cadence, resistance, cycle_length, temperature
`lap` | timestamp, start_time, start_position_lat, start_position_long, end_position_lat, end_position_long, total_elapsed_time, total_timer_time, total_distance, total_strokes, message_index, total_calories, total_fat_calories, enhanced_avg_speed, avg_speed, enhanced_max_speed, max_speed, avg_power, max_power, total_ascent, total_descent, event, event_type, avg_heart_rate, max_heart_rate, avg_cadence, max_cadence, intensity, lap_trigger, sport, event_group
`session` | timestamp, start_time, start_position_lat, start_position_long, total_elapsed_time, total_timer_time, total_distance, total_strokes, nec_lat, nec_long, swc_lat, swc_long, message_index, total_calories, total_fat_calories, enhanced_avg_speed, avg_speed, enhanced_max_speed, max_speed, avg_power, max_power, total_ascent, total_descent, first_lap_index, num_laps, event, event_type, sport, sub_sport, avg_heart_rate, max_heart_rate, avg_cadence, max_cadence, total_training_effect, event_group, trigger
`activity` | timestamp, total_timer_time, local_timestamp, num_sessions, type, event, event_type, event_group


**Note:** There is a `record` field for each second of data recorded. There is also an `event` field at the beginning and end of the `record` fields. 
<br><br>

Data in these fields can be accessed with the `fitdecode` method `get_value`. Most of the fields are either redundant or irrelevant to me right now. The most relevant will likely be `record` or `session`. Summary statistics, like those reported in `session`, can be calculated from the second-by-second `record` data, so I'll only extract data from `record`.
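To illustrate how `session`-style summaries can be recovered from per-second `record` data, here is a minimal sketch. The `summarize_records` helper and the list-of-dictionaries layout are assumptions made for illustration and are not part of `fitdecode`. The sample values are the first five records of the first ride printed later in this notebook. It also assumes that `distance` is a cumulative reading in meters, which matches the printed output, and the results may not match Zwift's own `session` values exactly.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def summarize_records(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute a few session-style summaries from per-second records.

    Hypothetical helper: each record is a dict with optional 'timestamp',
    'power', and 'distance' keys; missing or None values are skipped.
    """
    powers = [r["power"] for r in records if r.get("power") is not None]
    distances = [r["distance"] for r in records if r.get("distance") is not None]
    stamps = [r["timestamp"] for r in records if r.get("timestamp") is not None]

    return {
        "avg_power": sum(powers) / len(powers) if powers else None,
        "max_power": max(powers) if powers else None,
        # 'distance' is assumed cumulative, so the last reading is the total
        "total_distance": distances[-1] if distances else None,
        "total_elapsed_time": (
            (max(stamps) - min(stamps)).total_seconds() if stamps else None
        ),
    }


# First five records of the 2022-07-12 ride, copied from the printed output
sample_records = [
    {"timestamp": datetime(2022, 7, 12, 11, 31, 53, tzinfo=timezone.utc), "power": 45, "distance": 2.41},
    {"timestamp": datetime(2022, 7, 12, 11, 31, 54, tzinfo=timezone.utc), "power": 45, "distance": 4.28},
    {"timestamp": datetime(2022, 7, 12, 11, 31, 55, tzinfo=timezone.utc), "power": 79, "distance": 6.67},
    {"timestamp": datetime(2022, 7, 12, 11, 31, 56, tzinfo=timezone.utc), "power": 79, "distance": 9.82},
    {"timestamp": datetime(2022, 7, 12, 11, 31, 57, tzinfo=timezone.utc), "power": 79, "distance": 13.12},
]

summary = summarize_records(sample_records)
print(summary)
```

The same idea carries over to a full ride once the records are in a DataFrame, where the equivalent aggregations would be column-wise means, maxima, and differences.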

First, we need to define a few functions that will help us parse the imported data. I followed [this tutorial](https://towardsdatascience.com/parsing-fitness-tracker-data-with-python-a59e7dc17418) and tweaked [the accompanying code](https://github.com/bunburya/fitness_tracker_data_parsing/blob/main/parse_fit.py) to parse the *FIT* files downloaded from Zwift. Note that missing information is coded as a null value in the final dataframe. 

In [2]:
# Column names for the second-by-second data dataframe
RECORD_COLUMN_NAMES = ['timestamp', 'speed', 'distance', 'altitude', 'power', 'grade', 'cadence']

In [3]:
def get_record_data(frame: fitdecode.records.FitDataMessage) -> Dict[str, Union[float, datetime, timedelta, int]]:
    """
    Extract some data from a FIT frame representing a record and return it as a dictionary.
    """
    data: Dict[str, Union[float, datetime, timedelta, int]] = {}
    
    for field in RECORD_COLUMN_NAMES:  
        if frame.has_field(field):
            data[field] = frame.get_value(field)
    
    return data

In [4]:
def get_dataframe(fname: str) -> pd.DataFrame:
    """
    Takes the path to a FIT file (as a string) and returns a DataFrame with data from each record.
    """
    record_data = []
    with fitdecode.FitReader(fname) as fit_file:
        for frame in fit_file:
            if isinstance(frame, fitdecode.records.FitDataMessage):
                if frame.name == 'record':
                    single_record_data = get_record_data(frame)
                    record_data.append(single_record_data)

    record_df = pd.DataFrame(record_data, columns=RECORD_COLUMN_NAMES)

    # Expose the row index as a 'second' column (elapsed seconds, given one record per second)
    record_df = record_df.reset_index().rename(columns={'index': 'second'})
    
    return record_df

Next, I loop through the downloaded *FIT* files located on the path.
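The file names in this notebook are listed by hand. As an alternative, the sketch below collects every `.fit` file in a directory and sorts the files chronologically. It assumes that each file name follows the `YYYY-MM-DD-HH-MM-SS.fit` pattern seen in this notebook and that the name encodes the ride's start time; `list_fit_files` is a hypothetical helper, and a file that does not match the pattern would raise a `ValueError`.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Union


def list_fit_files(directory: Union[str, Path]) -> List[Path]:
    """Return the .fit files in `directory`, sorted by the time in their names.

    Hypothetical helper: assumes names like '2022-07-12-07-31-42.fit'.
    """
    def start_time(path: Path) -> datetime:
        # path.stem drops the '.fit' extension before parsing
        return datetime.strptime(path.stem, "%Y-%m-%d-%H-%M-%S")

    return sorted(Path(directory).glob("*.fit"), key=start_time)
```

Calling `list_fit_files(os.getcwd())` would return `Path` objects, which can be converted with `str(path)` before being passed to `get_dataframe`.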

In [5]:
# define path to the directory containing the .fit files
dir_name = os.getcwd() + '/' 

In [6]:
files = ['2022-07-12-07-31-42.fit', '2022-07-16-07-47-29.fit']

In [7]:
# loop through list of FIT files 
for file in files:
    file_path = dir_name + file
    record_df = get_dataframe(file_path) 
    
    print(record_df)

      second                 timestamp speed  distance  altitude  power grade  \
0          0 2022-07-12 11:31:53+00:00  None      2.41       6.2     45  None   
1          1 2022-07-12 11:31:54+00:00  None      4.28       6.2     45  None   
2          2 2022-07-12 11:31:55+00:00  None      6.67       6.0     79  None   
3          3 2022-07-12 11:31:56+00:00  None      9.82       6.0     79  None   
4          4 2022-07-12 11:31:57+00:00  None     13.12       6.0     79  None   
...      ...                       ...   ...       ...       ...    ...   ...   
1239    1239 2022-07-12 11:52:32+00:00  None  10950.25       1.6     35  None   
1240    1240 2022-07-12 11:52:33+00:00  None  10958.71       1.6     23  None   
1241    1241 2022-07-12 11:52:34+00:00  None  10967.59       1.6     23  None   
1242    1242 2022-07-12 11:52:35+00:00  None  10976.01       1.6     15  None   
1243    1243 2022-07-12 11:52:36+00:00  None  10984.27       1.6     15  None   

      cadence  
0          