Web APIs & Classification

Reddit calls itself the front page of the internet, and with good reason. With an average of 542 million monthly visitors (234 million of them unique), Reddit is the third most visited site in the U.S. and ranks sixth in the world. Reddit is divided into subreddits: themed discussion boards that users create and fill with links, text, videos, and images. These subreddits span an endless array of interests, including world news, sports, economics, movies, music, and fitness. Reddit members discuss each posted topic in the comments section, where the most popular comments are "up-voted" to the top of the discussion board. In 2015, Reddit users submitted nearly 75 million posts and followed up with nearly three quarters of a billion comments. With so many users, submissions, comments, and up-votes, it can seem impossible to craft a post that will ever see the light of day.

For this project, I was tasked with analyzing a subset of Reddit's "Hot Posts" section to identify which features of a post, if any, determine its popularity. Data science techniques such as exploratory data analysis, predictive modeling, and natural language processing let me search a sample of 5,000 posts far faster than I could have by reading them manually.
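This README does not spell out the notebook's modeling steps, so what follows is only a minimal, standard-library sketch of one common way to frame "what makes a post popular" as a classification problem. It assumes posts arrive in the shape of a Reddit JSON listing (`data` -> `children` -> `data`), labels a post "popular" (1) when its comment count is above the sample median, and trains a multinomial naive Bayes model with add-one smoothing on title words. The toy titles and counts, the choice of `num_comments` as the target, the median cut, and every function name (`rows_from_listing`, `label_by_median`, `train_nb`, `predict_nb`) are hypothetical and are not the project's actual code.

```python
import math
import re
from collections import Counter
from statistics import median

# Toy data shaped like a Reddit JSON listing (data -> children -> data).
# Titles and comment counts are invented for illustration, not real results.
TOY_LISTING = {"kind": "Listing", "data": {"children": [
    {"kind": "t3", "data": {"title": "What is the best movie you watched this year?", "num_comments": 900}},
    {"kind": "t3", "data": {"title": "What song always makes you happy?", "num_comments": 750}},
    {"kind": "t3", "data": {"title": "What is your favorite workout?", "num_comments": 640}},
    {"kind": "t3", "data": {"title": "Local council approves budget report", "num_comments": 12}},
    {"kind": "t3", "data": {"title": "Quarterly economics report released", "num_comments": 30}},
    {"kind": "t3", "data": {"title": "Council meeting minutes posted", "num_comments": 8}},
]}}


def rows_from_listing(listing):
    """Flatten each child post into a dict holding only the fields modeled here."""
    return [
        {"title": child["data"]["title"], "num_comments": child["data"]["num_comments"]}
        for child in listing["data"]["children"]
    ]


def tokenize(text):
    """Crude tokenizer: lowercase, then keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def label_by_median(posts, key="num_comments"):
    """Hypothetical binary target: 1 if a post is above the sample median, else 0."""
    cut = median(p[key] for p in posts)
    return [1 if p[key] > cut else 0 for p in posts]


def train_nb(titles, labels):
    """Count title words per class for a multinomial naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    for title, y in zip(titles, labels):
        word_counts[y].update(tokenize(title))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, Counter(labels), vocab


def predict_nb(model, title):
    """Return the class with the higher log-posterior; words never seen in training are skipped."""
    word_counts, class_counts, vocab = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for y in (0, 1):
        total = sum(word_counts[y].values())
        # Add-one smoothing on both the class prior and the word likelihoods.
        score = math.log((class_counts[y] + 1) / (n + 2))
        for word in tokenize(title):
            if word in vocab:
                score += math.log((word_counts[y][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = y, score
    return best_class


posts = rows_from_listing(TOY_LISTING)
labels = label_by_median(posts)
model = train_nb([p["title"] for p in posts], labels)
print(labels)                                       # [1, 1, 1, 0, 0, 0]
print(predict_nb(model, "What is the best song?"))  # 1
print(predict_nb(model, "Council budget report"))   # 0
```

Naive Bayes is used here only because it fits in a few dozen lines without third-party packages; on a real sample of 5,000 posts, a library such as scikit-learn would make it easier to compare several models and text features.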

With an endless supply of things to read on the internet, it can seem impossible to write a post that anyone but your mom will read. But with a few (hundred) keystrokes, even a platform as wild as Reddit can be neatly distilled into a handful of targeted insights.

thedatasleuth/Reddit-NLP