## OTF Workouts
Updated through 12/11/21; HP OTF stats also updated for classes through 12/11/21.

### Imports

In [2]:
import csv
import datetime as dt
import glob
import os
import re
import secrets
from collections import Counter
from pprint import pprint
from string import punctuation

import nltk
import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from pmaw import PushshiftAPI

# Pushshift client used for all comment pulls below
api = PushshiftAPI()
# English stopword list (requires the NLTK 'stopwords' corpus to be downloaded)
sw = stopwords.words('english')

## Pull Comments

In [12]:
before = int(dt.datetime(2021,11,30).timestamp()) 
after = int(dt.datetime(2012,12,1,11).timestamp())
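
`datetime.timestamp()` treats a naive datetime as local time, so the window above shifts by a few hours depending on where the notebook runs. A minimal sketch of pinning the same bounds to UTC, assuming UTC boundaries are what is wanted (`before_utc` and `after_utc` are illustrative names, not part of the original notebook):

```python
import datetime as dt

# Same bounds as above, but explicitly anchored to UTC so the epoch
# values do not depend on the local time zone of the machine.
before_utc = int(dt.datetime(2021, 11, 30, tzinfo=dt.timezone.utc).timestamp())
after_utc = int(dt.datetime(2012, 12, 1, 11, tzinfo=dt.timezone.utc).timestamp())

print(before_utc, after_utc)
```

These values could be passed as `before=` and `after=` in place of the originals if a fixed UTC window matters.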

I wanted to pull the daily workout information posted to the r/Orangetheory subreddit. The sub has gone through several variations of how this information is shared. Currently, an automated post goes up with the date and a generic body; once the mods have the workout intel for the day, it is posted by u/splat_bot (which is a bot).

I initially tried the PRAW API but could not get all the data I wanted because of the API's limitations, so I switched to Pushshift through `pmaw`.

In [13]:
# Generate comments from splat_bot
subreddit="orangetheory"
username =  'splat_bot' #otfmonitorbot
limit=100000
sb_comments = api.search_comments(subreddit=subreddit, author=username, limit=limit, before=before, after=after)
#Clay helped with this, mine was pulling into a txt file which was hard to clean

INFO:pmaw.PushshiftAPIBase:9990720 result(s) not found in Pushshift
INFO:pmaw.PushshiftAPIBase:Checkpoint:: Success Rate: 82.65% - Requests: 98 - Batches: 10 - Items Remaining: 2341
INFO:pmaw.PushshiftAPIBase:Total:: Success Rate: 83.69% - Requests: 141 - Batches: 17 - Items Remaining: 0


In [14]:
# Generate comments from otfmonitorbot
subreddit="orangetheory"
username =  'otfmonitorbot'
limit=100000
om_comments = api.search_comments(subreddit=subreddit, author=username, limit=limit, before=before, after=after)


INFO:pmaw.PushshiftAPIBase:99982 result(s) not found in Pushshift
INFO:pmaw.PushshiftAPIBase:Total:: Success Rate: 100.00% - Requests: 8 - Batches: 1 - Items Remaining: 0


In [16]:
#total comments pulled
#len(om_comments) + len(sb_comments)

Create Pandas DF of comments from u/splat_bot

In [17]:
sb_df = pd.DataFrame(sb_comments)

# preview the comments data
#sb_df 

Create Pandas DF of comments from u/otfmonitorbot

In [18]:
om_df = pd.DataFrame(om_comments)

# preview the comments data
#om_df 

Make DF of body and date

In [19]:
# Pull just columns wanted
workout_df = pd.DataFrame(sb_df, columns = ['body', 'created_utc'])
workout_df

Unnamed: 0,body,created_utc
0,This submission was removed by the moderators....,1609854056
1,Reposting content courtesy of /u/dc031114. \n...,1609826433
2,Reposting content courtesy of /u/splat_bot. \...,1609826427
3,This submission was removed by the moderators....,1609822987
4,It looks like the topic you posted about alrea...,1609815820
...,...,...
9275,This submission was removed by the moderators....,1627653137
9276,This submission was removed by the moderators....,1632961559
9277,I found some information that could be relevan...,1632959630
9278,This submission was removed by the moderators....,1632957311


In [23]:
# Pull just columns wanted
workout_df2 = pd.DataFrame(om_df, columns = ['body', 'created_utc'])
workout_df2

Unnamed: 0,body,created_utc
0,This thread is now locked. Please continue the...,1559884362
1,Reposting early intel courtesy of /u/LivingMem...,1559884359
2,This thread is now locked. Please continue the...,1559797884
3,Reposting early intel courtesy of /u/LivingMem...,1559797880
4,This thread is now locked. Please continue the...,1559797876
5,Reposting early intel courtesy of /u/LivingMem...,1559797873
6,This thread is now locked. Please continue the...,1559711489
7,Reposting early intel courtesy of /u/matthotli...,1559711486
8,This thread is now locked. Please continue the...,1559625051
9,Reposting early intel courtesy of /u/matthotli...,1559625047


Save DFs as CSVs

In [20]:
# header=True and all columns are the defaults, so only index=False is needed
workout_df.to_csv('otf_splatbot.csv', index=False)

In [24]:
workout_df2.to_csv('otf_otfmonitor.csv', index=False)

Combine the two CSVs

In [25]:
# Every backslash is escaped; the original "\h" was an invalid escape sequence
path = "\\Users\\heath\\OneDrive - The University of Montana\\Applied_Data_Analytics\\data_project_2\\original_pulls\\"

all_files = glob.glob(os.path.join(path, "otf*.csv"))
df_from_each_file = (pd.read_csv(f, sep=',') for f in all_files)
df_merged   = pd.concat(df_from_each_file, ignore_index=True)
df_merged.to_csv("combined_wos.csv", index=False)
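
Since both frames are already in memory, the CSV round trip is optional. A small sketch of the same combination done with `pd.concat` directly, using two toy frames in place of `workout_df` and `workout_df2` (the `source` column is an illustrative addition, not something the original notebook tracks):

```python
import pandas as pd

# Toy stand-ins for the two per-bot frames built above.
splat = pd.DataFrame({"body": ["Reposting content ..."], "created_utc": [1609826433]})
monitor = pd.DataFrame({"body": ["Reposting early intel ..."], "created_utc": [1559884359]})

# Tag each frame with its bot (hypothetical column), then stack them
# with a fresh 0..n-1 index.
combined = pd.concat(
    [splat.assign(source="splat_bot"), monitor.assign(source="otfmonitorbot")],
    ignore_index=True,
)
print(combined)
```

Keeping a `source` column makes it possible to tell the two bots apart later, which the file-based merge does not preserve.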

Read in the combined file

In [28]:
workout = pd.read_csv("combined_wos.csv") 
# created_utc is epoch seconds; keep only the calendar date (UTC) as datetime64
workout["date"] = pd.to_datetime(workout["created_utc"], unit="s").dt.normalize()
workout.drop("created_utc", axis=1, inplace=True)
workout

Unnamed: 0,body,date
0,This thread is now locked. Please continue the...,2019-06-07
1,This thread is now locked. Please continue the...,2019-06-06
2,Reposting early intel courtesy of /u/LivingMem...,2019-06-06
3,This thread is now locked. Please continue the...,2019-06-06
4,Reposting early intel courtesy of /u/LivingMem...,2019-06-06
...,...,...
6961,I found a possible match to a frequently-asked...,2019-09-11
6962,This submission was removed by the moderators....,2019-09-11
6963,This submission was removed by the moderators....,2019-09-11
6964,I found a possible match to a frequently-asked...,2019-09-11


### Clean up

In [29]:
# lowercase, then keep only word characters (letters, digits, underscore)
workout.body = workout.body.str.lower()
workout.body = workout.body.apply(lambda x:' '.join(re.findall(r'\w+', x)))
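
To make the effect of this clean-up concrete, here is a small sketch on a made-up comment string (the sample text is illustrative). Note that `\w+` splits on apostrophes, so "wasn't" becomes "wasn t", and that underscores and digits survive:

```python
import re

sample = "Reposting content courtesy of /u/dc031114. It wasn't too bad!"

# Lowercase, then keep runs of word characters (letters, digits, underscore)
# joined by single spaces; punctuation and line breaks are dropped.
cleaned = " ".join(re.findall(r"\w+", sample.lower()))
print(cleaned)
# reposting content courtesy of u dc031114 it wasn t too bad
```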

Remove comments that are not related to workouts

In [33]:
# Keep only rows whose body contains 'reposting'; those are the comments that carry workouts
clean_workout = workout[workout['body'].apply(lambda x: 'reposting' in x)]

In [34]:
clean_workout

Unnamed: 0,body,date
2,reposting early intel courtesy of u livingmemo...,2019-06-06
4,reposting early intel courtesy of u livingmemo...,2019-06-06
7,reposting early intel courtesy of u matthotlip...,2019-06-04
8,reposting early intel courtesy of u livingmemo...,2019-06-03
9,reposting early intel courtesy of u livingmemo...,2019-06-02
...,...,...
6883,reposting content courtesy of u kaorian saturd...,2019-10-19
6899,reposting content courtesy of u superspykay fr...,2019-10-18
6900,reposting content courtesy of u livingmemory 3...,2019-10-18
6912,reposting content courtesy of u livingmemory 3...,2019-10-17


In [35]:
#This works, but I found a more efficient way after writing it

#workout_df = workout_df[workout_df["body"].str.contains("i found some information")==False]
#workout_df = workout_df[workout_df["body"].str.contains("this submission")==False]
#workout_df = workout_df[workout_df["body"].str.contains("this thread is now locked")==False]
#workout_df = workout_df[workout_df["body"].str.contains("please see previous message")==False]
#workout_df = workout_df[workout_df["body"].str.contains("i found a possible match")==False]
#workout_df = workout_df[workout_df["body"].str.contains("this is a very common question")==False]
#workout_df = workout_df[workout_df["body"].str.contains("it looks like")==False]
#workout_df = workout_df[workout_df["body"].str.contains("reminder from the mods")==False]
#workout_df = workout_df[workout_df["body"].str.contains("my heart beats")==False]
#workout_df = workout_df[workout_df["body"].str.contains("note from the moderators")==False]
#workout_df = workout_df[workout_df["body"].str.contains("oh no not the delete")==False]
#workout_df = workout_df[workout_df["body"].str.contains("per the wiki")==False]
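
If the exclusion approach is ever needed again, the repeated filters can be collapsed into one pass. A hedged sketch on toy data, using a few of the phrases from the commented-out lines (the `boilerplate` list and `toy` frame are illustrative):

```python
import re
import pandas as pd

# Boilerplate openers taken from the commented-out filters above.
boilerplate = [
    "i found some information",
    "this submission",
    "this thread is now locked",
    "it looks like",
]
# One alternation pattern; re.escape guards against regex metacharacters.
pattern = "|".join(re.escape(p) for p in boilerplate)

toy = pd.DataFrame({"body": [
    "this thread is now locked please continue",
    "reposting content courtesy of u someone",
    "it looks like the topic you posted about",
]})

# ~ negates the mask, so only rows matching none of the phrases remain.
kept = toy[~toy["body"].str.contains(pattern, na=False)]
print(kept)
```

Filtering on the single keyword `'reposting'` is still shorter, but this form helps when the rows to keep have no common marker.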

Save DFs as CSVs

In [41]:
# Save cleaned dataframe as a csv
clean_workout.to_csv('clean_workouts.csv', index=False)

In [42]:
#len(clean_workout)

In [43]:
#bring in my stats
#hp_stats = open("updated_otfstats.csv",'r', encoding='utf-8').read()
hp_stats = pd.read_csv("updated_otfstats.csv") 
hp_stats['Date'] = pd.to_datetime(hp_stats.Date)
hp_stats

Unnamed: 0,Class Type,Focus,Coach,Time,Date,Day of Week,Grey,Blue,Green,Orange,...,Splat Points,Avg HR,Max HR,Total Calories,Distance,Avg Speed,Max Speed,Avg Pace,Elevation,Notes
0,2G,ESP,Brittney,11:15 AM,2017-10-28,Saturday,1,1,7,36,...,43,169,185.0,586,,,,,,
1,2G,ESP,Shania,11:15 AM,2017-11-04,Saturday,2,1,21,33,...,34,162,180.0,615,,,,,,
2,3G,Endurance,Other,5:00 AM,2017-11-20,Monday,0,3,19,38,...,41,165,184.0,695,,,,,,
3,2G,Power,Gabriella,5:30 PM,2017-11-22,Wednesday,1,4,36,16,...,16,155,177.0,572,,,,,,
4,2G,Power,Bri,7:00 AM,2017-11-25,Saturday,1,13,36,10,...,10,150,173.0,560,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
252,3G,Power,Katie,7:30 AM,2021-09-18,Saturday,4,3,16,30,...,35,149,190.0,537,0.92,3.6,5.5,16:33,168.8,
253,3G,Endurance,Katie,6:45 PM,2021-09-20,Monday,6,11,25,11,...,15,136,176.0,474,1.07,4.1,6,14:41,56.4,
254,3G,ESP,Paul,6:45 PM,2021-09-27,Monday,2,5,19,18,...,30,153,192.0,545,1.1,5,6.5,12:06,53.9,Mile Benchmark -- PR 11:22
255,3G,Strength,Shania,6:45 PM,2021-09-29,Wednesday,2,13,22,17,...,19,141,177.0,491,1.3,3.6,5.4,16:46,294.7,


### Join my stats with the workout information
Below I combine the data I have collected from my own OTF workouts, such as average heart rate, splat points, and distance, with the workout details. I worked with the stats data in the summer class; after the web scraping assignments I wanted to see if I could include the details about each workout too. The comments I collected are joined to my stats on the dates I have data for.

Some variables are missing in my stats data because the information Orangetheory collects and shares has changed over time.

In [45]:
stats_w_wo = pd.merge(left=hp_stats, right=clean_workout, how='left', left_on='Date', right_on='date')
stats_w_wo.rename(columns={'body': 'Workout Details'}, inplace=True)


In [46]:
stats_w_wo.drop('date', axis=1, inplace=True)

In [47]:
stats_w_wo

Unnamed: 0,Class Type,Focus,Coach,Time,Date,Day of Week,Grey,Blue,Green,Orange,...,Avg HR,Max HR,Total Calories,Distance,Avg Speed,Max Speed,Avg Pace,Elevation,Notes,Workout Details
0,2G,ESP,Brittney,11:15 AM,2017-10-28,Saturday,1,1,7,36,...,169,185.0,586,,,,,,,
1,2G,ESP,Shania,11:15 AM,2017-11-04,Saturday,2,1,21,33,...,162,180.0,615,,,,,,,
2,3G,Endurance,Other,5:00 AM,2017-11-20,Monday,0,3,19,38,...,165,184.0,695,,,,,,,
3,2G,Power,Gabriella,5:30 PM,2017-11-22,Wednesday,1,4,36,16,...,155,177.0,572,,,,,,,
4,2G,Power,Bri,7:00 AM,2017-11-25,Saturday,1,13,36,10,...,150,173.0,560,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
273,3G,Power,Katie,7:30 AM,2021-09-18,Saturday,4,3,16,30,...,149,190.0,537,0.92,3.6,5.5,16:33,168.8,,
274,3G,Endurance,Katie,6:45 PM,2021-09-20,Monday,6,11,25,11,...,136,176.0,474,1.07,4.1,6,14:41,56.4,,reposting content courtesy of u rag_monkey cli...
275,3G,ESP,Paul,6:45 PM,2021-09-27,Monday,2,5,19,18,...,153,192.0,545,1.1,5,6.5,12:06,53.9,Mile Benchmark -- PR 11:22,
276,3G,Strength,Shania,6:45 PM,2021-09-29,Wednesday,2,13,22,17,...,141,177.0,491,1.3,3.6,5.4,16:46,294.7,,reposting content courtesy of u rag_monkey cli...
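
The merged frame's index runs to 276 where `hp_stats` stopped at 255, which suggests some class dates matched more than one reposted comment. A hedged sketch of one way to keep one row per class, assuming it is acceptable to join all comments for a date into a single string (toy data; `per_day` is an illustrative name):

```python
import pandas as pd

stats = pd.DataFrame({"Date": pd.to_datetime(["2021-09-20", "2021-09-27"]),
                      "Focus": ["Endurance", "ESP"]})
comments = pd.DataFrame({"date": pd.to_datetime(["2021-09-20", "2021-09-20"]),
                         "body": ["reposting content a", "reposting content b"]})

# Collapse to one row per date before merging so the left frame keeps its row count.
per_day = comments.groupby("date")["body"].agg(" | ".join).reset_index()

# validate="one_to_one" raises if either side still has duplicate keys.
merged = stats.merge(per_day, how="left", left_on="Date", right_on="date",
                     validate="one_to_one").drop(columns="date")
print(merged)
```

Whether to concatenate, keep the first comment, or keep the longest one is a judgment call that depends on how the workout text will be used.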


### Pull a random workout from those pulled, to use on your own
I thought it might be fun to pull a random workout from the scraped ones to use on my own.

In [58]:
wo_list = list(clean_workout.body)

In [61]:
pprint(secrets.choice(wo_list))

('reposting content courtesy of u dc031114 click here to view the comments on '
 'the original post https www reddit com r orangetheory comments l9la78 '
 'monday_1_february_2021_endurance_3g_60_minutes monday 1 february 2021 '
 'endurance 3g 60 minutes respectable template today not much recovery on the '
 'base and the row block will also get your heart rate going not a huge fan of '
 'the floor with the palm to elbows but it wasn t too bad but it won t get '
 'your heart rate up tread block 1 90 sec push 1 min base 2 min push 1 min '
 'base 1 min ao tread block 2 90 sec push 45 sec base 2 min push 45 sec base 1 '
 'min ao row block 14 minutes 600m row 10 x medicine ball ground to press 10 x '
 'iso squat front press 4 x 150m row 10 x medicine ball ground to press 4 x '
 '150m row 10 x medicine ball iso squat from press bonus 150m row 10 x '
 'medicine ball ground to press 10 x iso squat front press floor block 1 6 5 '
 'minutes 5 l x static lunge 5 l x iso lunge to hammer curl 5 r x