Aaron Schroeder edited this page May 9, 2023 · 8 revisions

Data Format Reference

.fit files

.fit files are the preferred type. They record the most comprehensive data and are also the smallest. They have room for more summary data and describe more clearly what was going on during the activity. For example, they record every button-press event: start, pause, resume, lap, and stop. This richness allows heartandsole to index the trackpoints DataFrame by record number, lap, and "bout" (a term I coined to describe the time between hitting "start" and "pause").
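To make the lap/bout idea concrete, here is a minimal sketch of labelling records from a stream of button-press events. It is not heartandsole's implementation. It assumes:

- The records and events have already been decoded from the .fit file into plain Python values.
- A new bout begins at every "start" or "resume" event.
- A new lap begins at every "lap" event.

The `Event` class, the simplified event names, and `label_records` are all hypothetical. They are not the FIT SDK's actual event fields.

```python
from dataclasses import dataclass

# Hypothetical, simplified event names (not the FIT SDK's event/event_type fields).
BOUT_STARTS = {"start", "resume"}


@dataclass
class Event:
    timestamp: float  # seconds since the activity began (assumed representation)
    kind: str         # "start", "pause", "resume", "lap" or "stop"


def label_records(record_times, events):
    """Return (record_number, lap, bout) for each record timestamp.

    Every event at or before a record's timestamp is applied before the
    record is labelled: "start"/"resume" open a new bout, "lap" opens a
    new lap, and "pause"/"stop" change nothing because no records are
    logged while the timer is stopped.
    """
    events = sorted(events, key=lambda e: e.timestamp)
    labels = []
    lap, bout = 0, -1
    i = 0
    for record_number, t in enumerate(sorted(record_times)):
        while i < len(events) and events[i].timestamp <= t:
            if events[i].kind in BOUT_STARTS:
                bout += 1
            elif events[i].kind == "lap":
                lap += 1
            i += 1
        labels.append((record_number, lap, bout))
    return labels
```

For example, with events start at 0 s, pause at 10 s, resume at 20 s, lap at 25 s and stop at 30 s:

- Records at 0, 5 and 10 s land in lap 0, bout 0.
- The record at 20 s lands in lap 0, bout 1.
- Records at 25 and 30 s land in lap 1, bout 1.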

There are many types of .fit files, but the following are the most relevant to this project. Every .fit file must begin with a File ID message.

  • Activity
    • Activity > Session > Lap > Record
      • Timestamp and at least one other value are required for each Record message.
  • Course
    • Course > Lap > Record
      • "The Record messages contain the latitude, longitude, altitude, and distance values that define the course."
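The two structural rules above can be checked on an already-decoded message list: the file begins with a File ID message, and every Record has a timestamp plus at least one other value. This is a sketch under assumptions:

- Messages are represented as (name, fields) pairs.
- Missing or invalid fields are None.
- Names like "file_id" and "record" follow the snake_case style of common Python FIT decoders. Treat the exact representation as an assumption.

`check_fit_messages` is a hypothetical helper.

```python
def check_fit_messages(messages):
    """Return a list of problems found in a decoded .fit message sequence.

    `messages` is assumed to be a list of (message_name, fields) pairs,
    where fields maps field name -> value (None when the field is missing).
    """
    problems = []
    # Rule 1: every .fit file must begin with a File ID message.
    if not messages or messages[0][0] != "file_id":
        problems.append("file does not begin with a file_id message")
    # Rule 2: each Record needs a timestamp and at least one other value.
    for index, (name, fields) in enumerate(messages):
        if name != "record":
            continue
        has_timestamp = fields.get("timestamp") is not None
        has_other = any(v is not None for k, v in fields.items() if k != "timestamp")
        if not (has_timestamp and has_other):
            problems.append(f"record message {index} needs a timestamp and one other value")
    return problems
```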

Refs

.tcx files

.tcx files have a limited schema and can represent only a subset of the data found in a .fit file. The individual trackpoints retain all the relevant information from .fit files, but the schema does not let you know for sure when your device was paused. The activity is divided into laps, which contain tracks, which in turn contain trackpoints. When you hit the "new lap" button on your device, a new lap element begins in the .tcx file. However, when you hit the "pause" button, nothing explicit happens in the file: no new element is created, and trackpoints simply stop being recorded. If your device records a trackpoint every second, it is apparent when the device was paused. But if your device uses Garmin's so-called "Smart Recording" feature, the time between trackpoints varies. A seven-second gap could mean the device was paused during that time, or that the device simply waited that long to record a new point.
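The ambiguity shows up clearly in a simple gap check. The sketch below is a hypothetical helper, not heartandsole's pause-inference method. It only flags intervals longer than an assumed recording interval (one second by default). With every-second recording, a flagged gap very likely means a pause; with Smart Recording, it may or may not.

```python
from datetime import datetime, timedelta


def find_gaps(times, max_interval=timedelta(seconds=1)):
    """Return (earlier, later) pairs of consecutive trackpoint times that are
    more than `max_interval` apart.

    `times` is assumed to be a sorted list of datetimes. A flagged gap is
    only a candidate pause: the device may just have skipped recording.
    """
    return [
        (earlier, later)
        for earlier, later in zip(times, times[1:])
        if later - earlier > max_interval
    ]


# Trackpoints at 0, 1, 2, 9 and 10 seconds: one seven-second gap.
start = datetime(2023, 5, 9, 12, 0, 0)
times = [start + timedelta(seconds=s) for s in (0, 1, 2, 9, 10)]
gaps = find_gaps(times)
```

Raising `max_interval` to seven seconds makes the same gap disappear. That is exactly why a fixed threshold cannot settle the question for Smart Recording files.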

I am working on a method for inferring device pauses in .tcx files, but for now, heartandsole assumes there are none. In general, the inconsistency between devices and files leads me to simply avoid pausing my device during a workout: more data points are rarely a problem, and the analysis is easier if I am certain the device was never paused. I suggest you do the same! Or use .fit files, which are the industry standard. As a compromise, if you use .tcx files and want to manually mark stopped and moving periods, start a new lap where you would normally pause your device, and start another lap when you get moving again.

If we think of a TCX file as a tree, the nodes that occur over and over represent sequences of data points. There are a few different types of data point nodes:

  • Trackpoint/
    • Time
    • Position/LatitudeDegrees
    • Position/LongitudeDegrees
    • AltitudeMeters
    • DistanceMeters
    • HeartRateBpm/Value
    • Cadence
    • SensorState
      • Options: "Present", "Absent"
    • Extensions/
      • TPX/
        • Speed
        • RunCadence
        • Watts
        • Extensions/...
      • ...
  • Coursepoint/
    • Name
    • Time
    • Position/LatitudeDegrees
    • Position/LongitudeDegrees
    • AltitudeMeters
    • PointType
      • Options: "Generic", "Summit", "Valley", "Water", "Food", "Danger",
        "Left", "Right", ...
    • Notes
    • Extensions/...
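Here is a minimal sketch of pulling the Trackpoint fields listed above out of a .tcx file with Python's standard library. It assumes:

- The file uses the standard TCX v2 namespace.
- Speed lives in Garmin's ActivityExtension v2 (TPX) namespace.

The function and dictionary key names are hypothetical, and heartandsole's own reader may differ. Missing fields come back as None, which is common when a trackpoint has no GPS fix.

```python
import xml.etree.ElementTree as ET

# TCX v2 namespace, plus Garmin's ActivityExtension v2 namespace used by TPX.
NS = {
    "tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2",
    "tpx": "http://www.garmin.com/xmlschemas/ActivityExtension/v2",
}


def _float(element, path):
    """Return the text at `path` below `element` as a float, or None if absent."""
    text = element.findtext(path, namespaces=NS)
    return float(text) if text else None


def parse_tcx_trackpoints(xml_text):
    """Return one dict per Trackpoint element, wherever it sits in the tree."""
    root = ET.fromstring(xml_text)
    points = []
    for tp in root.iter(f"{{{NS['tcx']}}}Trackpoint"):
        points.append({
            "time": tp.findtext("tcx:Time", namespaces=NS),
            "lat": _float(tp, "tcx:Position/tcx:LatitudeDegrees"),
            "lon": _float(tp, "tcx:Position/tcx:LongitudeDegrees"),
            "altitude": _float(tp, "tcx:AltitudeMeters"),
            "distance": _float(tp, "tcx:DistanceMeters"),
            "heart_rate": _float(tp, "tcx:HeartRateBpm/tcx:Value"),
            "speed": _float(tp, "tcx:Extensions/tpx:TPX/tpx:Speed"),
        })
    return points


# Made-up example: one full trackpoint and one with only a timestamp.
SAMPLE = """<TrainingCenterDatabase
    xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"
    xmlns:ns3="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
  <Activities><Activity Sport="Running">
    <Id>2023-05-09T12:00:00Z</Id>
    <Lap StartTime="2023-05-09T12:00:00Z"><Track>
      <Trackpoint>
        <Time>2023-05-09T12:00:00Z</Time>
        <Position>
          <LatitudeDegrees>40.0</LatitudeDegrees>
          <LongitudeDegrees>-105.0</LongitudeDegrees>
        </Position>
        <AltitudeMeters>1600.0</AltitudeMeters>
        <DistanceMeters>0.0</DistanceMeters>
        <HeartRateBpm><Value>120</Value></HeartRateBpm>
        <Extensions><ns3:TPX><ns3:Speed>2.5</ns3:Speed></ns3:TPX></Extensions>
      </Trackpoint>
      <Trackpoint><Time>2023-05-09T12:00:01Z</Time></Trackpoint>
    </Track></Lap>
  </Activity></Activities>
</TrainingCenterDatabase>"""

points = parse_tcx_trackpoints(SAMPLE)
```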

There are a few different tree hierarchies that lead to these data points. They all start from a common root:

  • TrainingCenterDatabase/
    • Folders/Courses/
      • Track/Trackpoint
      • Coursepoint
    • Activities/
      • Activity/Lap/Track/Trackpoint
      • MultiSportSession/
        • FirstSport/Activity/Lap/Track/Trackpoint
        • NextSport/Activity/Lap/Track/Trackpoint
    • Courses/
      • Track/Trackpoint
      • Coursepoint
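Following one specific branch of this hierarchy preserves the lap structure that a flat search for Trackpoints throws away. The sketch below handles only the TrainingCenterDatabase/Activities/Activity/Lap/Track/Trackpoint path. It assumes the standard TCX v2 namespace. The MultiSportSession and Courses branches would need their own paths. `trackpoint_times_by_lap` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}


def trackpoint_times_by_lap(xml_text):
    """Return (lap_number, time) for every Trackpoint under
    Activities/Activity/Lap/Track, numbering laps from 0 across the file.
    A lap may contain several Track elements; all of them are read.
    """
    root = ET.fromstring(xml_text)  # root is TrainingCenterDatabase
    rows = []
    laps = root.findall("tcx:Activities/tcx:Activity/tcx:Lap", NS)
    for lap_number, lap in enumerate(laps):
        for tp in lap.findall("tcx:Track/tcx:Trackpoint", NS):
            rows.append((lap_number, tp.findtext("tcx:Time", namespaces=NS)))
    return rows


# Made-up example: two laps, the second split across two Track elements.
SAMPLE = """<TrainingCenterDatabase
    xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities>
    <Activity Sport="Running">
      <Lap StartTime="2023-05-09T12:00:00Z">
        <Track>
          <Trackpoint><Time>2023-05-09T12:00:00Z</Time></Trackpoint>
        </Track>
      </Lap>
      <Lap StartTime="2023-05-09T12:05:00Z">
        <Track>
          <Trackpoint><Time>2023-05-09T12:05:00Z</Time></Trackpoint>
        </Track>
        <Track>
          <Trackpoint><Time>2023-05-09T12:06:00Z</Time></Trackpoint>
        </Track>
      </Lap>
    </Activity>
  </Activities>
</TrainingCenterDatabase>"""

rows = trackpoint_times_by_lap(SAMPLE)
```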

Refs

.gpx files

.gpx files have the most limited schema of all. In effect, the .tcx schema can be thought of as an extension of the .gpx schema, so all the limitations mentioned above apply to .gpx files too. GPX trackpoints can only represent latitude, longitude, time, elevation, heart rate, and cadence. Unlike TCX trackpoints, they lack speed and distance. These quantities can be calculated from the GPS coordinates, but your device calculates them differently, in a way that tends to be more accurate. Because .gpx files have no slot for these fields, your device's values are simply lost.
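Distance can be recomputed from the GPS coordinates. One common approach is the haversine formula on a spherical Earth. The sketch below is illustrative only: it is not necessarily what your device or heartandsole does. The mean Earth radius of 6,371 km is an assumed constant.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical-Earth assumption


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cumulative_distance(points):
    """Running distance in meters along a list of (lat, lon) pairs."""
    if not points:
        return []
    distances = [0.0]
    for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]):
        distances.append(distances[-1] + haversine_m(lat1, lon1, lat2, lon2))
    return distances
```

One degree of latitude comes out to about 111.2 km under this model. Summing point-to-point distances on noisy GPS fixes tends to overestimate the true distance, which is one reason device-computed values are usually preferable.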

In addition to the limitations of .tcx files, as far as I know, .gpx files only record trackpoints with valid GPS coordinates. In other words, if your device doesn't have a GPS signal at the beginning of your run, no data is recorded, not even your heart rate and cadence. Also, typical .gpx files do not divide the activity into laps or indicate pauses. The .gpx file format is really bare-bones: just a few data fields recorded with each trackpoint, and no summary data to speak of.

  • *pt/
    • ele
    • time
    • lat
    • lon
    • extensions/
      • TrackPointExtension/
        • hr
        • cad
        • speed
        • ...
      • ...
    • name
    • type
    • link
    • ...
  • GPX/
    • wpt
    • rte/rtept
    • trk/trkseg/trkpt
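Here is a minimal sketch of reading the trk/trkseg/trkpt branch with Python's standard library. In GPX 1.1, lat and lon are attributes of the trkpt element rather than child elements. The sketch assumes:

- The file uses the GPX 1.1 namespace.
- Heart rate and cadence sit in Garmin's TrackPointExtension v1 namespace.

Other extension schemas would need different paths. The function and key names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "gpx": "http://www.topografix.com/GPX/1/1",
    "gpxtpx": "http://www.garmin.com/xmlschemas/TrackPointExtension/v1",
}


def _text_as(element, path, convert):
    """Convert the text at `path` below `element`, or return None if absent."""
    if element is None:
        return None
    text = element.findtext(path, namespaces=NS)
    return convert(text) if text else None


def parse_gpx_trackpoints(xml_text):
    """Return one dict per gpx/trk/trkseg/trkpt element."""
    root = ET.fromstring(xml_text)  # root is the gpx element
    points = []
    for pt in root.findall("gpx:trk/gpx:trkseg/gpx:trkpt", NS):
        ext = pt.find("gpx:extensions/gpxtpx:TrackPointExtension", NS)
        points.append({
            "time": pt.findtext("gpx:time", namespaces=NS),
            "lat": float(pt.get("lat")),  # required attribute in GPX 1.1
            "lon": float(pt.get("lon")),
            "ele": _text_as(pt, "gpx:ele", float),
            "hr": _text_as(ext, "gpxtpx:hr", int),
            "cad": _text_as(ext, "gpxtpx:cad", int),
        })
    return points


# Made-up example: the second point has no elevation or extensions.
SAMPLE = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">
  <trk><trkseg>
    <trkpt lat="40.0" lon="-105.0">
      <ele>1600.0</ele>
      <time>2023-05-09T12:00:00Z</time>
      <extensions><gpxtpx:TrackPointExtension>
        <gpxtpx:hr>120</gpxtpx:hr><gpxtpx:cad>85</gpxtpx:cad>
      </gpxtpx:TrackPointExtension></extensions>
    </trkpt>
    <trkpt lat="40.001" lon="-105.0">
      <time>2023-05-09T12:00:05Z</time>
    </trkpt>
  </trkseg></trk>
</gpx>"""

points = parse_gpx_trackpoints(SAMPLE)
```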

Refs

Strava API JSON response

The Strava API provides an endpoint that returns time series data for one of your activities as a list of dicts.

If called successfully, the endpoint /activities/$ID/streams returns any available streams specified in the keys URL query parameter. The keys should be formatted as a comma-separated list, e.g. key1,key2,...

The possible keys are:

  • time
  • latlng
  • distance
  • altitude
  • velocity_smooth
  • heartrate
  • cadence
  • watts
  • temp
  • moving
  • grade_smooth
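Here is a minimal sketch of building the request URL from the keys above, assuming Strava's v3 base URL https://www.strava.com/api/v3. It only constructs the URL. A real request also needs an OAuth access token sent as an "Authorization: Bearer ..." header, and this sketch does not handle that or any other query parameters. `streams_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_BASE = "https://www.strava.com/api/v3"  # Strava API v3 base URL

# The stream keys listed above.
STREAM_KEYS = {
    "time", "latlng", "distance", "altitude", "velocity_smooth",
    "heartrate", "cadence", "watts", "temp", "moving", "grade_smooth",
}


def streams_url(activity_id, keys):
    """Build the GET URL for /activities/{id}/streams with a comma-separated keys list."""
    unknown = set(keys) - STREAM_KEYS
    if unknown:
        raise ValueError(f"unknown stream keys: {sorted(unknown)}")
    # safe="," keeps the commas literal instead of percent-encoding them.
    query = urlencode({"keys": ",".join(keys)}, safe=",")
    return f"{API_BASE}/activities/{activity_id}/streams?{query}"


url = streams_url(123, ["time", "heartrate"])
```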

Each dict contains the name of the time series in stream["type"], its data points in stream["data"], and some other metadata.
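Because each stream arrives as its own dict, the response usually needs to be pivoted into columns before analysis. The sketch below does this under two assumptions:

- Every requested stream has the same length.
- The "latlng" stream holds [lat, lng] pairs.

The sample response is made up, with the other metadata fields omitted. `streams_to_columns` is a hypothetical helper.

```python
def streams_to_columns(streams):
    """Turn the list-of-dicts stream response into {column name: list of values}.

    "latlng" pairs are split into separate "lat" and "lon" columns; any
    other metadata in each dict is ignored.
    """
    columns = {}
    for stream in streams:
        if stream["type"] == "latlng":
            columns["lat"] = [pair[0] for pair in stream["data"]]
            columns["lon"] = [pair[1] for pair in stream["data"]]
        else:
            columns[stream["type"]] = list(stream["data"])
    return columns


# Made-up response for illustration (metadata fields omitted).
response = [
    {"type": "time", "data": [0, 1, 2]},
    {"type": "latlng", "data": [[40.0, -105.0], [40.0001, -105.0], [40.0002, -105.0]]},
    {"type": "heartrate", "data": [120, 121, 123]},
]
columns = streams_to_columns(response)
```

With pandas installed, pandas.DataFrame(columns) turns this dict of equal-length lists into a DataFrame with one column per stream.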

Refs
