# Adam Visokay Strava Data
This script takes my Strava activities csv from Strava's API and reformats it into a concise dataframe with the following layout:
<br>
<table style="border-collapse:collapse;border-spacing:0" class="tg">
<tr><th>Activity Date</th><th>Activity Name</th><th>Time(hh:mm:ss)</th><th>Distance(mi)</th><th>Description</th></tr>
<tr><td>Most Recent</td><td></td><td></td><td></td><td></td></tr>
<tr><td>Oldest</td><td></td><td></td><td></td><td></td></tr>
</table>

First I will import some libraries: pandas for creating and modifying dataframes, and time for reformatting time objects. 

In [12]:
import pandas as pd
import time

Through my account with Strava, I downloaded a csv file containing all of my training data. I'll use that file's unique file path on my local computer to create a pandas dataframe object and have a look at what it contains straight from Strava's API.

In [13]:
#read file into pandas and name the dataframe activities
activities = pd.read_csv('/Users/Adam/Desktop/Code Projects/GitHub/strava/activities.csv')
activities.head()

Unnamed: 0,Activity ID,Activity Date,Activity Name,Activity Type,Activity Description,Elapsed Time,Distance,Relative Effort,Commute,Activity Gear,Filename
0,374842848,"Aug 22, 2015, 4:30:00 PM",Morning Run,Run,Felt like myself again this morning. The weath...,6060,27.35,,False,,
1,395696426,"Sep 19, 2015, 9:00:00 PM",Hybrid Long Run/Tempo/Interval,Run,10 miles through grounds in 70 minutes. Then ...,6720,28.96,,False,,
2,1202650848,"Sep 25, 2017, 5:00:00 PM",Getting back into it,Run,Once I get a job (or win some prize money) I w...,4020,16.09,,False,,
3,1202653151,"Sep 26, 2017, 12:00:00 AM",Double,Run,Nice easy 5 for my afternoon double. Trying to...,2100,8.04,,False,Cliffs,
4,1208564029,"Sep 27, 2017, 10:30:00 PM",Easy Day,Run,Easy recovery run today at first landing.,3720,14.48,,False,Cliffs,


It contains a lot of information that I don't really care about. 'Activity ID' is useless to me and 'Relative Effort' is going to be null for every observation, so let's drop those columns, along with 'Activity Type', 'Commute', 'Activity Gear', and 'Filename', from the dataframe.

In [14]:
#drop unnecessary columns from the activities dataframe
activities.drop(columns = ['Activity ID', 'Activity Type', 'Relative Effort', 'Commute', 'Activity Gear', 'Filename'], inplace=True)
activities.head()

Unnamed: 0,Activity Date,Activity Name,Activity Description,Elapsed Time,Distance
0,"Aug 22, 2015, 4:30:00 PM",Morning Run,Felt like myself again this morning. The weath...,6060,27.35
1,"Sep 19, 2015, 9:00:00 PM",Hybrid Long Run/Tempo/Interval,10 miles through grounds in 70 minutes. Then ...,6720,28.96
2,"Sep 25, 2017, 5:00:00 PM",Getting back into it,Once I get a job (or win some prize money) I w...,4020,16.09
3,"Sep 26, 2017, 12:00:00 AM",Double,Nice easy 5 for my afternoon double. Trying to...,2100,8.04
4,"Sep 27, 2017, 10:30:00 PM",Easy Day,Easy recovery run today at first landing.,3720,14.48
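
An alternative to dropping columns after loading is to name the columns worth keeping when the file is read. The sketch below is only an illustration: the two-row `sample_csv` is a hypothetical stand-in for the real export, and its column layout is assumed from the output shown above.

```python
import io

import pandas as pd

# Hypothetical two-row sample; the real export's layout is assumed, not confirmed.
sample_csv = io.StringIO(
    "Activity ID,Activity Date,Activity Name,Activity Type,Activity Description,"
    "Elapsed Time,Distance,Relative Effort,Commute,Activity Gear,Filename\n"
    '374842848,"Aug 22, 2015, 4:30:00 PM",Morning Run,Run,Felt good,6060,27.35,,False,,\n'
    '1202653151,"Sep 26, 2017, 12:00:00 AM",Double,Run,Easy five,2100,8.04,,False,Cliffs,\n'
)

# Columns to keep; everything else is skipped while parsing.
keep = ["Activity Date", "Activity Name", "Activity Description", "Elapsed Time", "Distance"]

activities_sample = pd.read_csv(sample_csv, usecols=keep)
print(activities_sample.columns.tolist())
```

With `usecols`, the resulting columns follow the order in the file rather than the order of the list, so a later reorder step is still needed.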


Next up, I notice that 'Elapsed Time' and 'Distance' don't have units, so I have to do some extra thinking to figure out that the data are measured in seconds and kilometers, respectively. I would like to create new variables where distance is measured in miles and time is in the format hh:mm:ss, and make sure that the column titles include those units.

In [15]:
#create a list of seconds from 'Elapsed Time', convert it to hh:mm:ss and add it to the activities dataframe as a new column
sec = []
for n in activities['Elapsed Time']:
    sec.append(time.strftime('%H:%M:%S', time.gmtime(n)))
activities['Time(hh:mm:ss)'] = sec

#convert 'Distance', which is measured in kilometers, to a new column measured in miles (rounded to the nearest whole mile)
activities['Distance(mi)'] = (activities['Distance']/1.609).round()

# drop 'Elapsed Time' and 'Distance' from the data frame
activities.drop(columns = ['Elapsed Time', 'Distance'], inplace=True)
activities.head()

Unnamed: 0,Activity Date,Activity Name,Activity Description,Time(hh:mm:ss),Distance(mi)
0,"Aug 22, 2015, 4:30:00 PM",Morning Run,Felt like myself again this morning. The weath...,01:41:00,17.0
1,"Sep 19, 2015, 9:00:00 PM",Hybrid Long Run/Tempo/Interval,10 miles through grounds in 70 minutes. Then ...,01:52:00,18.0
2,"Sep 25, 2017, 5:00:00 PM",Getting back into it,Once I get a job (or win some prize money) I w...,01:07:00,10.0
3,"Sep 26, 2017, 12:00:00 AM",Double,Nice easy 5 for my afternoon double. Trying to...,00:35:00,5.0
4,"Sep 27, 2017, 10:30:00 PM",Easy Day,Easy recovery run today at first landing.,01:02:00,9.0
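
One thing to keep in mind about the `time.gmtime` approach is that it treats the number as a clock time, so an activity longer than 24 hours would wrap back around to `00:00:00`. The sketch below shows one possible alternative using plain arithmetic. The helper names `format_duration` and `km_to_miles` are hypothetical and not part of the script above, and the two-decimal rounding is an assumption for anyone who wants more precision than whole miles.

```python
import time

KM_PER_MILE = 1.609344  # exact length of the international mile in kilometers


def format_duration(seconds):
    """Format a non-negative number of seconds as hh:mm:ss without wrapping at 24 hours."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def km_to_miles(km, decimals=2):
    """Convert kilometers to miles, rounded to the given number of decimals."""
    return round(km / KM_PER_MILE, decimals)


print(format_duration(6060))   # 01:41:00, same as the gmtime approach
print(format_duration(90000))  # 25:00:00
print(time.strftime('%H:%M:%S', time.gmtime(90000)))  # 01:00:00, wrapped past one day
print(km_to_miles(27.35))
```

Both helpers can be applied to a column with `Series.apply`, for example `activities['Elapsed Time'].apply(format_duration)`.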


This is looking much better! Now I want to reorganize the columns so that the description of the run is last. 

In [24]:
# reorder the columns by name so that 'Activity Description' comes last
new_order = ['Activity Date', 'Activity Name', 'Time(hh:mm:ss)', 'Distance(mi)', 'Activity Description']
activities = activities[new_order]
activities.head()

Unnamed: 0,Activity Date,Activity Name,Time(hh:mm:ss),Distance(mi),Activity Description
0,"Aug 22, 2015, 4:30:00 PM",Morning Run,01:41:00,17.0,Felt like myself again this morning. The weath...
1,"Sep 19, 2015, 9:00:00 PM",Hybrid Long Run/Tempo/Interval,01:52:00,18.0,10 miles through grounds in 70 minutes. Then ...
2,"Sep 25, 2017, 5:00:00 PM",Getting back into it,01:07:00,10.0,Once I get a job (or win some prize money) I w...
3,"Sep 26, 2017, 12:00:00 AM",Double,00:35:00,5.0,Nice easy 5 for my afternoon double. Trying to...
4,"Sep 27, 2017, 10:30:00 PM",Easy Day,01:02:00,9.0,Easy recovery run today at first landing.
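
A property worth checking for a reorder step in a notebook is whether running the cell twice gives the same result as running it once. The sketch below compares a positional permutation with a selection by name on an empty, hypothetical dataframe that only carries the column names from this step.

```python
import pandas as pd

# Hypothetical empty dataframe with the column order that exists before the reorder.
df = pd.DataFrame(columns=[
    "Activity Date", "Activity Name", "Activity Description",
    "Time(hh:mm:ss)", "Distance(mi)",
])

# Positional reorder: the result depends on the current column order.
by_position = [0, 1, 4, 2, 3]
once = df[df.columns[by_position]]
twice = once[once.columns[by_position]]

# Reorder by name: the result is the same however many times it runs.
wanted = [
    "Activity Date", "Activity Name", "Time(hh:mm:ss)",
    "Distance(mi)", "Activity Description",
]
by_name = df[wanted]
by_name_again = by_name[wanted]

print(once.columns.tolist())
print(twice.columns.tolist())
print(by_name_again.columns.tolist())
```

The positional version only lands on the description-last layout on its second run, while the name-based version gets there on the first run and stays there.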


Finally, I will reformat the Activity Date to mm/dd/yyyy and sort with the most recent activity first.

In [27]:
#reformat date to mm/dd/yyyy, then reverse the row order (the export lists the oldest activity first, so this puts the most recent first)
activities['Activity Date'] = pd.to_datetime(activities['Activity Date']).dt.strftime('%m/%d/%Y')
activities = activities.sort_index(ascending=False)
activities.head()

Unnamed: 0,Activity Date,Activity Name,Time(hh:mm:ss),Distance(mi),Activity Description
1057,11/21/2019,Morning Run,01:05:51,9.0,
1056,11/20/2019,Morning Run,01:01:20,8.0,
1055,11/19/2019,Afternoon Run,00:32:37,4.0,
1054,11/19/2019,Morning Run,00:14:01,2.0,
1053,11/19/2019,Hard mile after tempo,00:04:52,1.0,Didn’t end up picking it up as much as I would...
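
Reversing the index works because the rows arrive oldest first. If that ordering could not be relied on, one option would be to sort on the parsed timestamps before turning them into strings, since mm/dd/yyyy strings do not sort chronologically. This is a minimal sketch on a hypothetical three-row sample, with the date format string assumed from the values shown earlier (the `%b` month abbreviation also assumes an English locale).

```python
import pandas as pd

# Hypothetical sample, deliberately out of chronological order.
sample = pd.DataFrame({
    "Activity Date": [
        "Sep 25, 2017, 5:00:00 PM",
        "Aug 22, 2015, 4:30:00 PM",
        "Sep 26, 2017, 12:00:00 AM",
    ],
    "Activity Name": ["Getting back into it", "Morning Run", "Double"],
})

# Parse to real timestamps first; the format string is an assumption.
parsed = pd.to_datetime(sample["Activity Date"], format="%b %d, %Y, %I:%M:%S %p")

# Sort on the timestamps, newest first, then format them for display.
sample = sample.assign(_parsed=parsed).sort_values("_parsed", ascending=False)
sample["Activity Date"] = sample["_parsed"].dt.strftime("%m/%d/%Y")
sample = sample.drop(columns="_parsed").reset_index(drop=True)

print(sample)
```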


In [1]:
#create a new csv file out of this concise dataframe
#activities.to_csv('/Users/Adam/************/strava_activities.csv')
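
When writing the file, note that `to_csv` includes the row index by default as a first column with an empty header, which pandas labels `Unnamed: 0` if the file is read back in. The sketch below writes a hypothetical one-row dataframe to an in-memory buffer instead of a real path, to show the effect of `index=False`.

```python
import io

import pandas as pd

# Hypothetical one-row dataframe in the final column layout.
final = pd.DataFrame({
    "Activity Date": ["11/21/2019"],
    "Activity Name": ["Morning Run"],
    "Time(hh:mm:ss)": ["01:05:51"],
    "Distance(mi)": [9.0],
    "Activity Description": [""],
})

with_index = io.StringIO()
final.to_csv(with_index)  # default: the index is written as an unnamed first column

without_index = io.StringIO()
final.to_csv(without_index, index=False)  # only the five data columns are written

print(with_index.getvalue().splitlines()[0])
print(without_index.getvalue().splitlines()[0])
```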

Now, as I continue logging my runs on Strava, I can just download the updated activity csv from Strava's API and run it through this script to get it into the format that I want. Happy running and happy coding!<br>
-Adam