Analysis code that counts instances of a phrase on Reddit (e.g. "ask your advisor")
.Rproj.user/958858BC/console06
data
.gitignore
README.Rmd
README.md
bigquery-reddit.Rproj

README.md

Obtain Data

Run the following query on Google BigQuery.

SELECT author, body
FROM `fh-bigquery.reddit_comments.20*`
WHERE subreddit = 'UIUC'
AND REGEXP_CONTAINS(body, r'(?i)ask your advisor')

Save data as a CSV. The data used for this analysis can be found in the data/ folder.
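
To see which comment bodies the `(?i)ask your advisor` pattern would keep, the same case-insensitive match can be sketched locally in R. This is only an illustration on made-up strings: the `bodies` vector is hypothetical and not part of the data set, and base R's `grepl()` with `ignore.case = TRUE` stands in for BigQuery's `REGEXP_CONTAINS`.

```r
# Hypothetical comment bodies, for illustration only
bodies = c("Ask your advisor.",
           "You should ASK YOUR ADVISOR about that",
           "Talk to the registrar instead")

# Case-insensitive search for the phrase anywhere in the text
has_phrase = grepl("ask your advisor", bodies, ignore.case = TRUE)

# Keep only the matching bodies
matched_bodies = bodies[has_phrase]
```

Because the pattern is a plain literal phrase, the match does not depend on differences between regex engines.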

Post-processing

Once we have the data, let’s bring it into R.

# Load in the data, keeping text columns as character rather than factor
comments = read.csv("data/results-20181114-075107.csv", stringsAsFactors = FALSE)

User Overview

Let’s take a quick peek at the different contributors using this phrase.

# Figure out user counts
count_users = table(comments$author)

# Number of Unique Users
n_users_unique = length(count_users)

# Obtain a leaderboard of comments.
top_users = sort(count_users, decreasing = TRUE)

# Get the name and comment count of the top user
top_username = names(top_users)[1]
top_user_posts = top_users[1]

There were 348 unique users who used a variation of the phrase “ask your advisor”. The user with the most comments was /u/IDKAskYourAdvisor, with 30.

All users with at least 4 comments containing the phrase are listed next in descending order of frequency.

Username Frequency
IDKAskYourAdvisor 30
cleverdragon1 11
Mobius118f 8
pissblasta3 8
theillini19 8
[deleted] 7
IlliniTy 6
Mosquite_Leaf 6
CertainTackle 5
DragonZaid 5
uiucrower 5
csdude007 4
GenjoKodo 4
jeffgerickson 4
JRDSandstorm 4
mathuiuc 4
Moi_Username 4
MrAcurite 4
ProgramTheWorld 4
schreiberbj 4
TheFearlessChuaEater 4
UIUCEngineering 4
WUTDO11231235 4
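
A listing like the one above can be produced by filtering the sorted frequency table. The following is a minimal sketch, assuming a small made-up `authors` vector in place of `comments$author`; the `min_comments`, `frequent_users`, and `leaderboard` names are illustrative and do not come from the original analysis.

```r
# Hypothetical authors standing in for comments$author
authors = c("a", "b", "a", "c", "a", "a", "b", "b", "b", "a")

# Count comments per author and sort from most to fewest
top_users = sort(table(authors), decreasing = TRUE)

# Keep only authors at or above the threshold
min_comments = 4
frequent_users = top_users[top_users >= min_comments]

# Convert to a two-column data frame for display
leaderboard = data.frame(Username = names(frequent_users),
                         Frequency = as.vector(frequent_users))
```

With the real data, replacing `authors` with `comments$author` should give a table in the same shape as the listing.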

Number of Words Used Per Post

# Compute the mean number of words per comment for each user
library("dplyr")

comments %>%
  group_by(author) %>%
  summarise(mean_nwords = mean(stringr::str_count(body, "\\S+")),
            n_entries = n()) %>%
  arrange(desc(n_entries), desc(mean_nwords)) %>%
  filter(n_entries >= 4) %>%
  knitr::kable()

author mean_nwords n_entries
IDKAskYourAdvisor 5.766667 30
cleverdragon1 37.090909 11
theillini19 8.125000 8
pissblasta3 5.500000 8
Mobius118f 3.125000 8
[deleted] 44.857143 7
IlliniTy 5.500000 6
Mosquite_Leaf 3.166667 6
uiucrower 9.000000 5
CertainTackle 3.600000 5
DragonZaid 3.000000 5
mathuiuc 66.000000 4
ProgramTheWorld 30.000000 4
jeffgerickson 29.250000 4
schreiberbj 17.750000 4
Moi_Username 12.500000 4
JRDSandstorm 10.000000 4
csdude007 6.500000 4
UIUCEngineering 6.000000 4
WUTDO11231235 5.000000 4
GenjoKodo 3.000000 4
MrAcurite 3.000000 4
TheFearlessChuaEater 3.000000 4
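
The word counts above treat every run of non-whitespace characters (`\\S+`) as one word, so punctuation attached to a word does not add to the count. A minimal base R sketch of the same counting rule on made-up strings follows; the `bodies` vector and the `count_words` helper are hypothetical and stand in for `stringr::str_count`.

```r
# Hypothetical comment bodies, for illustration only
bodies = c("Ask your advisor", "ask   your advisor, please!", "")

# Count runs of non-whitespace characters in each string
count_words = function(x) {
  lengths(regmatches(x, gregexpr("\\S+", x, perl = TRUE)))
}

# Per-comment word counts and their mean
n_words = count_words(bodies)
mean_nwords = mean(n_words)
```

Under this rule a three-word comment such as "Ask your advisor" counts as 3, which is consistent with several users in the table having a mean of exactly 3.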