# Training with Zwift

> "Every time I see an adult on a bicycle, I no longer despair for the future of the human race." 
<br>**H. G. Wells**<br><br>
“Learn to ride a bicycle. You will not regret it if you live.” 
<br>**Mark Twain**

Biking is the only childhood hobby I've continued to enjoy throughout my life. I remember riding in the local Fourth of July parade in elementary school. As a college freshman, I salvaged a '70s Univega ("Maria"), built it up from the frame, and kept customizing it over the next few years. I rode that bike everywhere. It was stolen while I was in grad school, plucked off the porch at a BBQ. Since then, I've had a [2017 Fuji Touring Bike](https://www.cyclingabout.com/2017-fuji-touring-bike/) ("björn"). björn has never failed me.

Most of my riding over the past few years has been indoors due to COVID and a shoulder injury. I joined [Zwift](https://us.zwift.com/) last year on the recommendation of a friend. Zwift gamifies bike training by offering an array of virtual routes that vary in difficulty. I'm currently using the app to train for a bikepacking trip [~1400 miles along the Pacific Coast](https://www.adventurecycling.org/routes-and-maps/adventure-cycling-route-network/pacific-coast/) next summer.

One of Zwift's best features is its ability to create insightful visualizations from user data, which it acquires from a Bluetooth-enabled bike trainer. I recently learned that users can download their ride data, which gives me an opportunity to further characterize my training experience.<br><br>

**GOAL:** *(1)* Characterize bike training patterns over time, starting from March 1, 2022; and *(2)* describe anticipated level of functioning on May 1, 2023.<br>
**DATA:** Ride data downloaded from personal Zwift (v5.62) account.<br>
**ANALYSIS:** Exploratory data analysis; Bayesian modeling.<br>
**ETHICAL CONSIDERATIONS:** There are no apparent issues with transparency, accountability, or equity in terms of available data. To avoid any unforeseen privacy issues, I will not be posting the raw ride data on GitHub. I will do my best to characterize the data in this Notebook to justify any insights drawn from the analyses.<br>
**ADDITIONAL CONSIDERATIONS:** None.<br>

## Load libraries

In [1]:
import os

# data wrangling/analysis
import pandas as pd
import numpy as np

# data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# zwift file mgmt
import fitdecode
from datetime import datetime, timedelta
from typing import Dict, Union, Optional, Tuple

The columns extracted from the FIT data use the same names as the FIT fields, which makes the data easier to parse.

The extracted data are then loaded into a DataFrame. Any information missing from a particular record shows up as a null value (`None` or `NaN`) in the DataFrame.

> ***
`file_id` serial_number, time_created, manufacturer, product, number, type
> ***
`device_info` timestamp, serial_number, cum_operating_time, manufacturer, product, software_version, battery_voltage, device_index, device_type, hardware_version, battery_status
> ***
`event` timestamp, timer_trigger, timer_trigger, data16, event, event_type, event_group

**Note:** There is an `event` frame at the beginning and end of the second-by-second `record` frames.
> ***
`record`  ***timestamp***, position_lat, position_long, ***distance***, time_from_course, ***speed***, distance, compressed_speed_distance, heart_rate, enhanced_altitude, ***altitude***, enhanced_speed, speed, ***power, grade, cadence***, resistance, cycle_length, temperature

**Note:** There is a `record` frame for each second of data recorded.
> ***
`lap` timestamp, ***start_time***, start_position_lat, start_position_long, end_position_lat, end_position_long, total_elapsed_time, ***total_timer_time, total_distance***, total_strokes, message_index, total_calories, total_fat_calories, enhanced_avg_speed, ***avg_speed***, enhanced_max_speed, ***max_speed, avg_power, max_power, total_ascent, total_descent***, event, event_type, avg_heart_rate, max_heart_rate, ***avg_cadence, max_cadence***, intensity, lap_trigger, sport, event_group
> ***
`session` timestamp, ***start_time***, start_position_lat, start_position_long, total_elapsed_time, ***total_timer_time, total_distance***, total_strokes, nec_lat, nec_long, swc_lat, swc_long, message_index, total_calories, total_fat_calories, enhanced_avg_speed, ***avg_speed***, enhanced_max_speed, ***max_speed, avg_power, max_power, total_ascent, total_descent***, first_lap_index, num_laps, event, event_type, sport, sub_sport, avg_heart_rate, max_heart_rate, ***avg_cadence, max_cadence***, total_training_effect, event_group, trigger
> ***
`activity` timestamp, total_timer_time, local_timestamp, num_sessions, type, event, event_type, event_group
***

The data in these fields can be accessed with the `fitdecode` method `get_value`. Most of the fields are either redundant or irrelevant to me right now. Summary data from the training session are located in the `lap` and `session` frames, which contain many of the same variables. Because summary statistics like those can be calculated from the second-by-second `record` data, I am only going to extract data from the `record` frames.
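As a rough illustration of that last point, the sketch below recomputes a few `session`-style totals from per-second records. The data are made up, the function name `summarize` is hypothetical, and the units (metres, seconds, watts, rpm) are an assumption about how the decoded values are scaled.

```python
import pandas as pd

# Hypothetical per-second records; column names follow the `record` fields above.
records = pd.DataFrame({
    "timestamp": pd.date_range("2022-07-16 11:47:43", periods=5, freq="s", tz="UTC"),
    "distance": [2.0, 6.0, 12.0, 20.0, 30.0],    # cumulative distance (assumed metres)
    "altitude": [12.8, 12.8, 13.0, 13.4, 13.2],  # assumed metres
    "power": [80, 90, 100, 110, 120],            # watts
    "cadence": [50, 60, 70, 80, 90],             # rpm
})


def summarize(df: pd.DataFrame) -> dict:
    """Recompute a few session-style summary values from per-second records."""
    elapsed = (df["timestamp"].iloc[-1] - df["timestamp"].iloc[0]).total_seconds()
    covered = df["distance"].iloc[-1] - df["distance"].iloc[0]
    return {
        "total_elapsed_time": elapsed,
        "total_distance": df["distance"].iloc[-1],
        "avg_speed": covered / elapsed,  # distance per second between first and last record
        "avg_power": df["power"].mean(),
        "max_power": df["power"].max(),
        "avg_cadence": df["cadence"].mean(),
        "max_cadence": df["cadence"].max(),
        # total ascent: sum of the positive altitude changes only
        "total_ascent": df["altitude"].diff().clip(lower=0).sum(),
    }


summary = summarize(records)
print(summary)
```

These values will not necessarily match what Zwift writes to the `session` frame (for example, if it excludes paused time), so they are best treated as approximations.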

I followed [this tutorial](https://towardsdatascience.com/parsing-fitness-tracker-data-with-python-a59e7dc17418) and adjusted [this accompanying code](https://github.com/bunburya/fitness_tracker_data_parsing/blob/main/parse_fit.py) to parse the .fit files downloaded from Zwift.

In [2]:
# define path to .fit file
file = os.path.join(os.getcwd(), '2022-07-16-07-47-29.fit')
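Since the goal is to look at training over time, more than one file will eventually need to be read. The sketch below assumes that every downloaded file follows the same `YYYY-MM-DD-HH-MM-SS.fit` naming pattern as the one above, and that the name encodes the ride's local start time. Both helper functions are hypothetical names introduced for illustration.

```python
from datetime import datetime
from pathlib import Path


def ride_start_from_filename(path: str) -> datetime:
    """Parse the start time encoded in a file name like 2022-07-16-07-47-29.fit."""
    # Raises ValueError if the name does not follow the assumed pattern.
    return datetime.strptime(Path(path).stem, "%Y-%m-%d-%H-%M-%S")


def list_fit_files(folder: str) -> list:
    """Return the .fit files in `folder`, ordered by the time in their names."""
    return sorted(
        Path(folder).glob("*.fit"),
        key=lambda p: ride_start_from_filename(p.name),
    )


print(ride_start_from_filename("2022-07-16-07-47-29.fit"))
```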

In [3]:
# Column names for the second-by-second data dataframe
RECORD_COLUMN_NAMES = ['timestamp', 'distance', 'speed', 'altitude', 'power', 'grade', 'cadence']

In [4]:
def get_record_data(frame: fitdecode.records.FitDataMessage) -> Dict[str, Union[float, datetime, timedelta, int]]:
    """
    Extract some data from a FIT frame representing a record and return it as a dict.
    """
    data: Dict[str, Union[float, datetime, timedelta, int]] = {}
    
    for field in RECORD_COLUMN_NAMES:  
        if frame.has_field(field):
            data[field] = frame.get_value(field)
    
    return data
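Because `get_record_data` only adds a key when the frame has that field, individual records can end up with different keys, and a field that is present can still hold `None`. The sketch below uses made-up rows and a shortened column list to show how pandas treats both cases as missing when the rows are later collected into a DataFrame.

```python
import pandas as pd

# Shortened, illustrative column list (a subset of RECORD_COLUMN_NAMES)
COLUMNS = ["distance", "speed", "power"]

rows = [
    {"distance": 2.40, "speed": None, "power": 78},  # field present, but its value is None
    {"distance": 4.17, "speed": None},               # 'power' key missing entirely
]

# Passing `columns` fixes the column order and fills absent keys with a null value
df = pd.DataFrame(rows, columns=COLUMNS)

print(df)
print(df.isna().sum())  # isna() counts None and NaN alike
```

Either way, `isna()` flags the value as missing, so the two cases can be handled the same way downstream.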

In [5]:
def get_dataframe(fname: str) -> pd.DataFrame:
    """
    Takes the path to a FIT file (as a string) and returns a Pandas
    DataFrame containing the second-by-second data (RECORD), with one
    row per `record` frame.
    """
    record_data = []
    record_no = 1
    with fitdecode.FitReader(fname) as fit_file:
        for frame in fit_file:
            # Only data messages carry values; skip headers, definitions and CRCs
            if isinstance(frame, fitdecode.records.FitDataMessage):
                if frame.name == 'record':
                    single_record_data = get_record_data(frame)
                    single_record_data['number'] = record_no
                    record_data.append(single_record_data)
                    record_no += 1
    
    # 'number' is not in RECORD_COLUMN_NAMES, so it is dropped here and the
    # DataFrame keeps its default integer index
    record_df = pd.DataFrame(record_data, columns=RECORD_COLUMN_NAMES)
    
    return record_df

In [6]:
if __name__ == '__main__':
    
    fname = file  # Path to FIT file
    record_df = get_dataframe(fname)
    print('RECORD:')
    print(record_df)

RECORD:
                     timestamp  distance speed  altitude  power grade  cadence
0    2022-07-16 11:47:43+00:00      2.40  None      12.8     78  None       22
1    2022-07-16 11:47:44+00:00      4.17  None      12.8     78  None       49
2    2022-07-16 11:47:45+00:00      6.39  None      12.8     78  None       49
3    2022-07-16 11:47:46+00:00      8.98  None      12.8     92  None       49
4    2022-07-16 11:47:47+00:00     11.92  None      12.8     92  None       49
...                        ...       ...   ...       ...    ...   ...      ...
2640 2022-07-16 12:31:46+00:00  22896.36  None      13.8      0  None        0
2641 2022-07-16 12:31:47+00:00  22897.31  None      13.8      0  None        0
2642 2022-07-16 12:31:48+00:00  22897.57  None      13.8      0  None        0
2643 2022-07-16 12:31:49+00:00  22897.57  None      13.8      0  None        0
2644 2022-07-16 12:31:50+00:00  22897.57  None      13.8      0  None        0

[2645 rows x 7 columns]
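Two things stand out in this output: `speed` and `grade` are `None` in every row shown, and 2645 rows is slightly fewer than the number of seconds between the first and last timestamps, which hints at a few skipped seconds. The sketch below shows one way to check both and to derive a speed from the cumulative distance. It runs on a small hypothetical stand-in for `record_df` (the first printed rows, with one second deliberately left out), and it assumes `distance` is in metres.

```python
import pandas as pd

# Hypothetical stand-in for record_df, with the 11:47:46 record left out on purpose
sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-07-16 11:47:43+00:00",
        "2022-07-16 11:47:44+00:00",
        "2022-07-16 11:47:45+00:00",
        "2022-07-16 11:47:47+00:00",
    ]),
    "distance": [2.40, 4.17, 6.39, 11.92],  # cumulative distance (assumed metres)
    "speed": [None] * 4,
})

# Columns that contain no usable values at all
empty_cols = [col for col in sample.columns if sample[col].isna().all()]

# Seconds between consecutive records; anything above 1 marks a gap
dt = sample["timestamp"].diff().dt.total_seconds()
n_gaps = int((dt > 1).sum())

# Speed derived from the change in cumulative distance over elapsed time
sample["speed_derived"] = sample["distance"].diff() / dt

print("Empty columns:", empty_cols)
print("Gaps longer than 1 s:", n_gaps)
print(sample)
```

The first row of `speed_derived` is always null, because there is no previous record to difference against.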
