<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# <u><b>Project 3:</b></u> Subreddit suitability moderation between the r/Discworld and r/Cosmere subreddits using NLP classification modelling

<img src="https://www.redditinc.com/assets/images/site/reddit-logo.png" style="float: left; margin: 20px; height: 55px"><br>
### <u><b>About Reddit:</b></u> 
<br>
Reddit is a social aggregation and discussion website. Registered users (commonly referred to as "Redditors") submit posts, which other Redditors then vote on and discuss. A post can be a link, text, an image, or a video. With over 50 million daily active users worldwide, interests are diverse and not every post will interest everyone. To encourage robust discussion, Redditors are therefore required to submit each post to the topic-specific, user-created community, or "subreddit" (commonly referred to as a "sub"), that suits it best.  
<br>  
<br>  
Reddit administrators (Reddit employees) moderate the website as a whole, whereas each subreddit is moderated by unpaid volunteers (commonly referred to as "Mods"). Reddit relies on the Mods to maintain the standard of content within each sub. Mods may also settle disputes, set rules on what is and isn't appropriate, and delete or edit content deemed unsuitable for the site. 
<br>  
<br>  
Since Mods perform their roles only in their free time, it makes sense to use data science for routine moderation tasks such as assessing whether a post is on topic. Posting to the wrong sub is so prevalent that there is even a sub dedicated to discussing it: r/lostredditors 

### <u><b>Problem statement:</b></u> 
<b>This project aims to help the mods of r/Discworld and r/Cosmere (two subs that this author is a stan of) with the routine task of identifying posts that are better suited to the other sub than to their own, so that they can focus their time on more value-added work and the post's Original Poster (commonly referred to as "OP") can reach their intended audience. </b>

--- 
### Part 1: Scrape hot posts from r/Discworld and r/Cosmere
---

### <u> 1.1 Scrape hot posts from r/Discworld and r/Cosmere </u>

In [1]:
# Import libraries
import praw
import pandas as pd
import numpy as np
import prettyprinter as pprint
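
The cells that follow use a `reddit` object whose creation is not shown in this notebook. The sketch below is one possible way to build it, assuming the API credentials are stored in environment variables. The variable names (`REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USER_AGENT`) and both helper functions are hypothetical and not part of the original notebook; `praw.Reddit` itself accepts `client_id`, `client_secret` and `user_agent` keyword arguments.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def reddit_credentials(env=None):
    """Collect the keyword arguments for praw.Reddit from a mapping of variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }


def make_reddit(env=None):
    """Build a read-only praw.Reddit client; praw is imported only when called."""
    import praw

    return praw.Reddit(**reddit_credentials(env))


# Usage, once the three variables are set:
# reddit = make_reddit()
```

Keeping the credentials outside the notebook means the notebook can be shared without exposing them.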

In [3]:
# PRAW dynamically provides the attributes that Reddit returns via the API.
# Since those attributes are subject to change on Reddit’s end, PRAW makes no effort to document any new/removed/changed attributes.
# The code below lists the attributes currently available on a Submission object.
# `reddit` is assumed to be an authenticated praw.Reddit instance created in an earlier cell (not shown here).
submission = reddit.submission(id="11kieej")
print(submission.title) # accessing an attribute makes PRAW fetch the lazy object
pprint.pprint(vars(submission))

Built my own Lego Terry Pratchett!
{
    'comment_limit': 2048,
    'comment_sort': 'confidence',
    'id': '11kieej',
    '_reddit': <praw.reddit.Reddit object at 0x117a3c970>,
    '_fetched': True,
    '_comments_by_id': {
        't1_jb7lx43': Comment(id='jb7lx43'),
        't1_jb7m4l3': Comment(id='jb7m4l3'),
        't1_jb99ww0': Comment(id='jb99ww0'),
        't1_jb9qjav': Comment(id='jb9qjav'),
        't1_jb7ycvt': Comment(id='jb7ycvt'),
        't1_jb7qat9': Comment(id='jb7qat9'),
        't1_jb7rg0k': Comment(id='jb7rg0k'),
        't1_jb9182j': Comment(id='jb9182j'),
        't1_jb9agzb': Comment(id='jb9agzb'),
        't1_jb7tcrn': Comment(id='jb7tcrn'),
        't1_jb8u418': Comment(id='jb8u418'),
        't1_jb9bege': Comment(id='jb9bege'),
        't1_jb9cz99': Comment(id='jb9cz99'),
        't1_jb7wasm': Comment(id='jb7wasm'),
        't1_jb7y5oi': Comment(id='jb7y5oi'),
        't1_jb8fukd': Comment(id='jb8fukd'),
        't1_jb98xdj': Comment(id='jb98xdj'),
        ... (output truncated)
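
The dump above mixes PRAW's internal bookkeeping (names starting with `_`, such as `_reddit` and `_comments_by_id`) with the fields Reddit actually returned. A minimal sketch of filtering the output down to the public attributes is shown below. The helper `public_attributes` and the `FakeSubmission` stand-in are hypothetical and used only so the example runs without a network connection; in the notebook the helper would be called on the real `submission` object.

```python
def public_attributes(obj):
    """Return the instance attributes of obj whose names do not start with '_'."""
    return {name: value for name, value in vars(obj).items() if not name.startswith("_")}


class FakeSubmission:
    """Stand-in for a fetched submission, used only for illustration."""

    def __init__(self):
        self.id = "11kieej"
        self.title = "Built my own Lego Terry Pratchett!"
        self._fetched = True  # internal flag, filtered out below


print(sorted(public_attributes(FakeSubmission())))
# prints ['id', 'title']
```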

In [4]:
# define dictionary
discworld_posts = {"title": [], "post_text": [],
                   "is_text": [], "is_media": [], "flair_text": [], "is_spoiler": [],
                   "score": [], "upvote_ratio": [], "total_comments": [], "total_awards": [],
                   "author": [], "id": [], "post_url": [], "subreddit": []
                  }
# Scrape hot posts from the r/discworld subreddit
for post in reddit.subreddit('discworld').hot(limit=1000):
    # text for NLP
    discworld_posts["title"].append(post.title)                           #The title of the submission.
    discworld_posts["post_text"].append(post.selftext)                    #The submissions’ selftext - an empty string if a link post.
    # submission features
    discworld_posts["is_text"].append(post.is_self)                       #Whether or not the submission is a selfpost (text-only).
    discworld_posts["is_media"].append(post.is_reddit_media_domain)       #Whether or not the submission links to a Reddit-hosted media domain (e.g. i.redd.it).
    discworld_posts["flair_text"].append(post.link_flair_text)            #The link flair’s text content, or None if not flaired
    discworld_posts["is_spoiler"].append(post.spoiler)                    #Whether or not the submission has been marked as a spoiler.
    # submission reception
    discworld_posts["score"].append(post.score)                           #The number of upvotes for the submission.
    discworld_posts["upvote_ratio"].append(post.upvote_ratio)             #The percentage of upvotes from all votes on the submission.
    discworld_posts["total_comments"].append(post.num_comments)           #The number of comments on the submission.
    discworld_posts["total_awards"].append(post.total_awards_received)    #The number of awards for the submission.
    # submission details
    discworld_posts["author"].append(post.author)                         #Provides an instance of Redditor. 
    discworld_posts["id"].append(post.id)                                 #ID of the submission.
    discworld_posts["post_url"].append(post.url)                          #The URL the submission links to, or the permalink if a selfpost.
    discworld_posts["subreddit"].append(post.subreddit)                   #Provides an instance of Subreddit.

# Saving the data in a pandas dataframe
discworld_posts = pd.DataFrame(discworld_posts)

In [5]:
discworld_posts.head(5)

| | title | post_text | is_text | is_media | flair_text | is_spoiler | score | upvote_ratio | total_comments | total_awards | author | id | post_url | subreddit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GNU Terry Pratchett | `>In the Ramtop village where they dance the re...` | True | False | GNU | False | 854 | 0.99 | 129 | 7 | Faithful_jewel | ukigit | https://www.reddit.com/r/discworld/comments/uk... | discworld |
| 1 | Sub Updates for 2023, including Mod Recruiting... | Greetings everyone! The Mod Team at r/Discworl... | True | False | Mod Announcement | False | 30 | 0.96 | 15 | 0 | Faithful_jewel | 10mjmpn | https://www.reddit.com/r/discworld/comments/10... | discworld |
| 2 | This feels like something Death and many other... | | False | True | Memes/Humour | False | 166 | 0.96 | 3 | 0 | Panda-Sandwich | 11pxjvp | https://i.redd.it/e013vqitoena1.jpg | discworld |
| 3 | GNU Sir Terry, and thank you. | | False | True | RoundWorld | False | 480 | 0.99 | 7 | 0 | gordielaboom | 11plpi2 | https://i.redd.it/3hrm44c2scna1.jpg | discworld |
| 4 | In honor of Sir Terry Pratchett my family and ... | | False | False | RoundWorld | False | 136 | 0.94 | 15 | 0 | Sorellin-Grimm | 11pu17r | https://www.reddit.com/gallery/11pu17r | discworld |


In [6]:
# define dictionary
cosmere_posts = {"title": [], "post_text": [],
                 "is_text": [], "is_media": [], "flair_text": [], "is_spoiler": [],
                 "score": [], "upvote_ratio": [], "total_comments": [], "total_awards": [],
                 "author": [], "id": [], "post_url": [], "subreddit": []
                }

# Scrape hot posts from the r/cosmere subreddit
for post in reddit.subreddit('cosmere').hot(limit=1000):
    # text for NLP
    cosmere_posts["title"].append(post.title)                           #The title of the submission.
    cosmere_posts["post_text"].append(post.selftext)                    #The submissions’ selftext - an empty string if a link post.
    # submission features
    cosmere_posts["is_text"].append(post.is_self)                       #Whether or not the submission is a selfpost (text-only).
    cosmere_posts["is_media"].append(post.is_reddit_media_domain)       #Whether or not the submission links to a Reddit-hosted media domain (e.g. i.redd.it).
    cosmere_posts["flair_text"].append(post.link_flair_text)            #The link flair’s text content, or None if not flaired
    cosmere_posts["is_spoiler"].append(post.spoiler)                    #Whether or not the submission has been marked as a spoiler.
    # submission reception
    cosmere_posts["score"].append(post.score)                           #The number of upvotes for the submission.
    cosmere_posts["upvote_ratio"].append(post.upvote_ratio)             #The percentage of upvotes from all votes on the submission.
    cosmere_posts["total_comments"].append(post.num_comments)           #The number of comments on the submission.
    cosmere_posts["total_awards"].append(post.total_awards_received)    #The number of awards for the submission.
    # submission details
    cosmere_posts["author"].append(post.author)                         #Provides an instance of Redditor. 
    cosmere_posts["id"].append(post.id)                                 #ID of the submission.
    cosmere_posts["post_url"].append(post.url)                          #The URL the submission links to, or the permalink if a selfpost.
    cosmere_posts["subreddit"].append(post.subreddit)                   #Provides an instance of Subreddit.

# Saving the data in a pandas dataframe
cosmere_posts = pd.DataFrame(cosmere_posts)
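
The two scraping cells above differ only in the subreddit name, so the shared logic could be factored into a single helper. The sketch below is one way to do that under stated assumptions: `FIELDS` and `posts_to_rows` are hypothetical names, not part of the original notebook, and the helper converts `author` and `subreddit` to strings on the assumption that calling `str()` on PRAW's `Redditor` and `Subreddit` objects yields their names. A `SimpleNamespace` stands in for a real submission so the example runs offline.

```python
from types import SimpleNamespace

# Column name -> attribute name on the submission, mirroring the two loops above.
FIELDS = {
    "title": "title",
    "post_text": "selftext",
    "is_text": "is_self",
    "is_media": "is_reddit_media_domain",
    "flair_text": "link_flair_text",
    "is_spoiler": "spoiler",
    "score": "score",
    "upvote_ratio": "upvote_ratio",
    "total_comments": "num_comments",
    "total_awards": "total_awards_received",
    "author": "author",
    "id": "id",
    "post_url": "url",
    "subreddit": "subreddit",
}


def posts_to_rows(posts):
    """Turn an iterable of submission-like objects into a list of dicts."""
    rows = []
    for post in posts:
        row = {column: getattr(post, attr) for column, attr in FIELDS.items()}
        # Store plain names instead of objects; a deleted author is None.
        row["author"] = None if row["author"] is None else str(row["author"])
        row["subreddit"] = str(row["subreddit"])
        rows.append(row)
    return rows


# Offline demonstration with a fake submission.
fake_post = SimpleNamespace(
    title="GNU Terry Pratchett", selftext="", is_self=True,
    is_reddit_media_domain=False, link_flair_text="GNU", spoiler=False,
    score=854, upvote_ratio=0.99, num_comments=129, total_awards_received=7,
    author=None, id="ukigit", url="https://example.com", subreddit="discworld",
)
demo_rows = posts_to_rows([fake_post])

# In the notebook this could replace both loops, for example:
# cosmere_posts = pd.DataFrame(posts_to_rows(reddit.subreddit("cosmere").hot(limit=1000)))
```

Driving the loop from a single mapping keeps the column list in one place, so adding or renaming a feature only needs one change.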

In [7]:
cosmere_posts.head(5)

| | title | post_text | is_text | is_media | flair_text | is_spoiler | score | upvote_ratio | total_comments | total_awards | author | id | post_url | subreddit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SECRET PROJECT 1 \| Full Book Discussion | `**Full Book Discussion**\n\nUse the comments o...` | True | False | Tress (SP1) | False | 241 | 1.0 | 1278 | 0 | jofwu | zzmzil | https://www.reddit.com/r/Cosmere/comments/zzmz... | Cosmere |
| 1 | SECRET PROJECT 1 \| Cosmere Discussion | `**Cosmere Discussion**\n\nUse the comments of ...` | True | False | Cosmere + Tress (SP1) | False | 240 | 1.0 | 1966 | 0 | jofwu | zzmzea | https://www.reddit.com/r/Cosmere/comments/zzmz... | Cosmere |
| 2 | My Nightblood Themed Birthday Cake | | False | True | Warbreaker | False | 809 | 0.98 | 31 | 0 | storming_jguy | 11pmq26 | https://i.redd.it/0e7uofctdena1.jpg | Cosmere |
| 3 | I just got to chapter 21 in TLM and... | ...Hoid accepting Sir Squeekins from Wayne is ... | True | False | Mistborn | True | 141 | 0.97 | 12 | 0 | NorthBall | 11pdwyd | https://www.reddit.com/r/Cosmere/comments/11pd... | Cosmere |
| 4 | So what exactly is the Stormfather? | Stormfather seems to be a combination of a few... | True | False | Cosmere | True | 17 | 0.9 | 15 | 0 | Simon_Drake | 11pqio6 | https://www.reddit.com/r/Cosmere/comments/11pq... | Cosmere |


In [8]:
posts = pd.concat([discworld_posts, cosmere_posts], axis=0, ignore_index=True)

In [9]:
posts

| | title | post_text | is_text | is_media | flair_text | is_spoiler | score | upvote_ratio | total_comments | total_awards | author | id | post_url | subreddit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GNU Terry Pratchett | `>In the Ramtop village where they dance the re...` | True | False | GNU | False | 854 | 0.99 | 129 | 7 | Faithful_jewel | ukigit | https://www.reddit.com/r/discworld/comments/uk... | discworld |
| 1 | Sub Updates for 2023, including Mod Recruiting... | Greetings everyone! The Mod Team at r/Discworl... | True | False | Mod Announcement | False | 30 | 0.96 | 15 | 0 | Faithful_jewel | 10mjmpn | https://www.reddit.com/r/discworld/comments/10... | discworld |
| 2 | This feels like something Death and many other... | | False | True | Memes/Humour | False | 166 | 0.96 | 3 | 0 | Panda-Sandwich | 11pxjvp | https://i.redd.it/e013vqitoena1.jpg | discworld |
| 3 | GNU Sir Terry, and thank you. | | False | True | RoundWorld | False | 480 | 0.99 | 7 | 0 | gordielaboom | 11plpi2 | https://i.redd.it/3hrm44c2scna1.jpg | discworld |
| 4 | In honor of Sir Terry Pratchett my family and ... | | False | False | RoundWorld | False | 136 | 0.94 | 15 | 0 | Sorellin-Grimm | 11pu17r | https://www.reddit.com/gallery/11pu17r | discworld |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1991 | Way of Kings as a gift | I'm thinking about gifting my father a signed ... | True | False | No Spoilers | False | 4 | 0.75 | 2 | 0 | HauntingGold | 10evvrr | https://www.reddit.com/r/Cosmere/comments/10ev... | Cosmere |
| 1992 | Literary Analysis for SP1 | I created a short literary analysis on Tress a... | True | False | Tress (SP1) | True | 3 | 0.67 | 0 | 0 | Ok-Consequence-8106 | 10evdo8 | https://www.reddit.com/r/Cosmere/comments/10ev... | Cosmere |
| 1993 | Lord Ruler-God Theory | I do not want any spoilers for Mistborn. I am ... | True | False | Stormlight / early Mistborn: Final Empire | True | 0 | 0.46 | 34 | 0 | Any_Drag3177 | 10f764r | https://www.reddit.com/r/Cosmere/comments/10f7... | Cosmere |
| 1994 | do you invent or discover Awakening Commands? | like with maths | True | False | Warbreaker | False | 28 | 0.98 | 26 | 0 | spaghetto_guy | 10eedye | https://www.reddit.com/r/Cosmere/comments/10ee... | Cosmere |


In [10]:
posts.to_csv('../Data/posts.csv', index=False) 