# Introduction

This project explores the Reddit API through PRAW, the Python Reddit API Wrapper. The first section covers basic use of PRAW's models, such as creating subreddit and comment instances, using the popular subreddit AmItheAsshole (AITA). Those instances are then used to build a DataFrame of comments from several submissions and to score each comment's sentiment with NLTK's VADER analyzer. The second section applies the same approach to rank 10 popular subreddits by the average sentiment of their comments.

### Setup

In [1]:
# Get these values from https://www.reddit.com/prefs/apps/
# Use Kaggle Secrets or paste in values here
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
REDDIT_CLIENT_ID = user_secrets.get_secret("REDDIT_CLIENT_ID")
REDDIT_CLIENT_SECRET = user_secrets.get_secret("REDDIT_CLIENT_SECRET")
REDDIT_USER_AGENT = user_secrets.get_secret("REDDIT_USER_AGENT")
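`kaggle_secrets` only exists inside Kaggle notebooks. A minimal sketch for running the notebook elsewhere, assuming the three credentials are exported as environment variables with the same names used above; `load_reddit_credentials` is a hypothetical helper, not part of PRAW or Kaggle:

```python
import os

# Names match the variables used in the Kaggle cell above.
CREDENTIAL_NAMES = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def load_reddit_credentials(env=None):
    """Hypothetical helper: read Reddit app credentials from environment variables.

    Raises KeyError listing every credential that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_NAMES if not env.get(name)]
    if missing:
        raise KeyError("Missing Reddit credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_NAMES}


# Example with a plain dict standing in for os.environ (placeholder values).
example_env = {
    "REDDIT_CLIENT_ID": "my-client-id",
    "REDDIT_CLIENT_SECRET": "my-client-secret",
    "REDDIT_USER_AGENT": "sentiment-notebook/0.1 by u/your_username",
}
creds = load_reddit_credentials(example_env)
print(sorted(creds))
```

The returned values can then be passed on as `praw.Reddit(client_id=creds["REDDIT_CLIENT_ID"], client_secret=creds["REDDIT_CLIENT_SECRET"], user_agent=creds["REDDIT_USER_AGENT"])`.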

In [2]:
!pip install praw
import praw
import pandas as pd
from itertools import islice
from tqdm.auto import tqdm
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# VADER needs its lexicon; nltk.download skips it if it is already up to date
nltk.download('vader_lexicon', quiet=True)

Collecting praw
  Downloading praw-7.6.1-py3-none-any.whl (188 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 188.8/188.8 kB 351.0 kB/s eta 0:00:00
Collecting prawcore<3,>=2.1
  Downloading prawcore-2.3.0-py3-none-any.whl (16 kB)
Installing collected packages: prawcore, praw
Successfully installed praw-7.6.1 prawcore-2.3.0

In [3]:
reddit = praw.Reddit(client_id=REDDIT_CLIENT_ID,client_secret=REDDIT_CLIENT_SECRET,user_agent=REDDIT_USER_AGENT)

# Section 1: Performing Sentiment Analysis on Comments from the AmItheAsshole Subreddit Using the Reddit API

### Creating an Instance of the AmItheAsshole Subreddit

In [4]:
aita = reddit.subreddit("AmItheAsshole")

### Pulling Content from a Reddit Submission, Sorting by New

In [5]:
for post in aita.new(limit=1):
    print(post.id)
    print(post.title)

10xmdlb
AITA for being fed up with this girl??


### Pulling Comments from a Reddit Submission, Sorting by New

In [6]:
for post in aita.new(limit=1):
    post.comments.replace_more(limit=None)
    print(post.id, post.title)
    for comment in post.comments.list():
        print(comment.id, comment.body)

10xmdlb AITA for being fed up with this girl??
j7t502p Welcome to /r/AmITheAsshole. Please view our [voting guide here](https://www.reddit.com/r/AmItheAsshole/wiki/faq#wiki_what.2019s_with_these_acronyms.3F_what_do_they_mean.3F), and remember to use **only one** judgement in your comment.

OP has offered the following explanation for why they think they might be the asshole:

 > I believe I simply may have taken things too far being known as a light-hearted guy and bringing up her intelligence.

Help keep the sub engaging!

#Don’t downvote assholes!

Do upvote interesting posts!

 [Click Here For Our Rules](https://www.reddit.com/r/AmItheAsshole/about/rules) and [Click Here For Our FAQ](https://www.reddit.com/r/AmItheAsshole/wiki/faq)

---

*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](https://www.reddit.com/message/compose/?to=/r/AmItheAsshole) if you have any questions or concerns.*

*Contest mode is 1.5 hours long on this
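The cell above uses `post.comments.list()`, which flattens the whole reply tree, while iterating `post.comments` directly yields only top-level comments. PRAW's documentation describes `list()` as breadth-first. The stdlib sketch below mimics both behaviors on a toy tree; `Node` is a hypothetical stand-in for a PRAW `Comment`, not a PRAW class:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a PRAW comment: an id plus its direct replies."""
    id: str
    replies: list = field(default_factory=list)


def top_level_ids(forest):
    # Iterating the forest itself only visits the top-level comments.
    return [node.id for node in forest]


def flatten_breadth_first(forest):
    # Visit every comment level by level, the way CommentForest.list() is documented to.
    queue = deque(forest)
    flat = []
    while queue:
        node = queue.popleft()
        flat.append(node.id)
        queue.extend(node.replies)
    return flat


forest = [
    Node("a", [Node("a1", [Node("a1x")]), Node("a2")]),
    Node("b", [Node("b1")]),
]
print(top_level_ids(forest))
print(flatten_breadth_first(forest))
```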

### Creating a DataFrame from Comments on the 10 Newest AITA Posts

In [7]:
sub_data = []
comment_data = []

for post in tqdm(aita.new(limit=10), total=10):
    sub_data.append({
        'post_id': post.id,
        'post_sub': post.subreddit_id,
        'post_text': post.selftext
    })

    # Expand every "load more comments" stub, then flatten the tree so that
    # replies are collected along with top-level comments
    post.comments.replace_more(limit=None)
    all_comments = post.comments.list()
    for comment in tqdm(all_comments, leave=False):
        comment_data.append({
            'comment_sub': comment.subreddit_id,
            'post_id': post.id,
            'comment_id': comment.id,
            'comment_text': comment.body
        })

comment_data = pd.DataFrame(comment_data)
sub_data = pd.DataFrame(sub_data)
# Left join: one row per comment, keyed on the post it belongs to
aita_data = sub_data.join(comment_data.set_index('post_id'), on='post_id')

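`DataFrame.join` performs a left join by default, so a post with no comments still produces one row, with `NaN` in every comment column. A small sketch with made-up IDs shows the effect; rows like this would otherwise reach the sentiment function with a missing `comment_text`:

```python
import pandas as pd

# Toy data with made-up IDs: post "p2" has no comments.
posts = pd.DataFrame({"post_id": ["p1", "p2"], "post_text": ["first", "second"]})
comments = pd.DataFrame({
    "post_id": ["p1", "p1"],
    "comment_id": ["c1", "c2"],
    "comment_text": ["Nice post", "NTA"],
})

# join(on=...) defaults to how="left": every post is kept, and posts with
# no matching comments get NaN in the comment columns.
joined = posts.join(comments.set_index("post_id"), on="post_id")
print(joined)
print(joined["comment_text"].isna().sum())
```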

### Function to Compute Sentiment of Comments

In [8]:
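# Build the analyzer once; constructing it reloads the VADER lexicon every time
sid = SentimentIntensityAnalyzer()

def get_sentiment(sentence):
    """Return VADER's compound score, a normalized value in [-1, 1]."""
    return sid.polarity_scores(sentence)['compound']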
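`get_sentiment` keeps only VADER's `compound` score. As we understand VADER's reference implementation, that score comes from summing the valence of the words in a text (after VADER's rule-based adjustments) and squashing the sum with x / sqrt(x^2 + alpha), alpha = 15. The sketch below reimplements only that final normalization step, plus the +/-0.05 labelling thresholds suggested in VADER's README; it is not a replacement for `SentimentIntensityAnalyzer`, and `normalize` and `label` are illustrative names:

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded summed valence score into [-1, 1], VADER-style."""
    norm_score = score / math.sqrt(score * score + alpha)
    # The formula already stays inside (-1, 1); the clamp mirrors VADER's own guard.
    return max(-1.0, min(1.0, norm_score))


def label(compound, threshold=0.05):
    """Bucket a compound score with the +/-0.05 cut-offs suggested in VADER's README."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for raw in (-4.0, 0.0, 0.1, 2.0):
    compound = normalize(raw)
    print(f"raw={raw:+.1f} compound={compound:+.4f} label={label(compound)}")
```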

In [9]:
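# Posts without comments leave empty rows after the join; drop them before scoring
aita_data = aita_data.dropna(subset=['comment_text'])
aita_data['comment_sentiment'] = aita_data['comment_text'].apply(get_sentiment)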

# Section 2: Ranking Popular Subreddits by Average Sentiment of Submission Comments

### Selecting Popular Subreddits

In [10]:
pop_subreddits = ['announcements','funny','AskReddit','gaming','aww','Music','worldnews','movies','science','todayilearned']

### Creating a Function to Get a Subreddit Instance

In [11]:
def call_sub_instance(sub):
    return reddit.subreddit(str(sub))

### Creating a DataFrame from Comments on the Top 10 Hot Posts of Each Subreddit (up to 10 Comments per Post)

In [12]:
POSTS_PER_SUBREDDIT = 10
COMMENTS_PER_POST = 10

pop_sub_instances = list(map(call_sub_instance, pop_subreddits))

pop_data = []
pop_comment_data = []

for sub in pop_sub_instances:
    for post in tqdm(sub.hot(limit=POSTS_PER_SUBREDDIT), total=POSTS_PER_SUBREDDIT):
        pop_data.append({
            'post_id': post.id,
            'post_sub': post.subreddit_id,
            'post_text': post.selftext,
            'subreddit': post.subreddit_name_prefixed
        })

        # replace_more's limit counts "load more comments" requests, not comments.
        # limit=0 just drops those stubs; the first top-level comments are
        # normally loaded together with the submission
        post.comments.replace_more(limit=0)
        top_comments = list(islice(post.comments, COMMENTS_PER_POST))
        for comment in tqdm(top_comments, leave=False):
            pop_comment_data.append({
                'comment_sub': comment.subreddit_id,
                'post_id': post.id,
                'comment_id': comment.id,
                'comment_text': comment.body
            })

pop_data = pd.DataFrame(pop_data)
pop_comment_data = pd.DataFrame(pop_comment_data)
pop_sub_data = pop_data.join(pop_comment_data.set_index('post_id'), on='post_id')

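The collection loop above mixes API calls with record building, which makes it hard to check offline. A sketch that factors the record building into a function, tested here against `types.SimpleNamespace` objects standing in for PRAW submissions and comments; `collect_records`, `fake_comment`, `fake_post` and the `t5_example` ID are all hypothetical:

```python
from itertools import islice
from types import SimpleNamespace


def collect_records(posts, comments_per_post):
    """Build post rows and capped comment rows from PRAW-like submission objects.

    Each post needs `id`, `subreddit_id`, `selftext`, `subreddit_name_prefixed`
    and an iterable `comments` attribute (a PRAW CommentForest or a plain list).
    """
    post_rows, comment_rows = [], []
    for post in posts:
        post_rows.append({
            "post_id": post.id,
            "post_sub": post.subreddit_id,
            "post_text": post.selftext,
            "subreddit": post.subreddit_name_prefixed,
        })
        # Real CommentForests may hold "load more" stubs; drop them without extra
        # requests. Plain lists used for testing have no replace_more method.
        if hasattr(post.comments, "replace_more"):
            post.comments.replace_more(limit=0)
        # islice stops after the first `comments_per_post` top-level comments.
        for comment in islice(post.comments, comments_per_post):
            comment_rows.append({
                "comment_sub": comment.subreddit_id,
                "post_id": post.id,
                "comment_id": comment.id,
                "comment_text": comment.body,
            })
    return post_rows, comment_rows


# Hypothetical stand-ins for PRAW Submission and Comment objects.
def fake_comment(i):
    return SimpleNamespace(id=f"c{i}", subreddit_id="t5_example", body=f"comment {i}")


def fake_post(post_id, n_comments):
    return SimpleNamespace(
        id=post_id,
        subreddit_id="t5_example",
        selftext="",
        subreddit_name_prefixed="r/example",
        comments=[fake_comment(i) for i in range(n_comments)],
    )


post_rows, comment_rows = collect_records([fake_post("p1", 15), fake_post("p2", 0)], comments_per_post=10)
print(len(post_rows), len(comment_rows))
```

With live data, each subreddit would then need only `collect_records(sub.hot(limit=POSTS_PER_SUBREDDIT), COMMENTS_PER_POST)`, at the cost of losing the per-post progress bars.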

### Cleaning Data

In [13]:
# Drop rows with no comment_text (posts that returned no comments)
pop_sub_data = pop_sub_data.dropna(axis='index', subset=['comment_text'])
# Make sure every comment is a plain string before scoring
pop_sub_data['comment_text'] = pop_sub_data['comment_text'].astype(str)
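Reddit typically replaces the body of a deleted comment with `[deleted]` and of a moderator-removed comment with `[removed]`. Such placeholders say nothing about the commenter's sentiment, so one optional extra cleaning step, sketched here under that assumption with a hypothetical `drop_placeholder_comments` helper and made-up rows, is to filter them out before scoring:

```python
import pandas as pd

# Placeholder bodies Reddit typically shows for deleted or removed comments.
PLACEHOLDER_BODIES = {"[deleted]", "[removed]"}


def drop_placeholder_comments(df, column="comment_text"):
    """Return a copy of `df` without rows whose text is only a placeholder."""
    text = df[column].astype(str).str.strip()
    return df.loc[~text.isin(PLACEHOLDER_BODIES)].copy()


sample = pd.DataFrame({
    "subreddit": ["r/aww", "r/aww", "r/funny"],
    "comment_text": ["So cute!", "[removed]", "[deleted]"],
})
cleaned = drop_placeholder_comments(sample)
print(cleaned)
```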

### Computing Average Comment Sentiment and Ranking Subreddits

In [14]:
pop_sub_data['comment_sentiment'] = pop_sub_data['comment_text'].apply(get_sentiment)

In [15]:
avg_sentiment = (
    pop_sub_data[['comment_sentiment', 'subreddit']]
    .groupby(by='subreddit')
    .mean()
    .sort_values(by='comment_sentiment', ascending=False)
    .reset_index()
)
avg_sentiment

|   | subreddit | comment_sentiment |
|---|---|---|
| 0 | r/aww | 0.314361 |
| 1 | r/Music | 0.278952 |
| 2 | r/movies | 0.259721 |
| 3 | r/science | 0.141755 |
| 4 | r/gaming | 0.111468 |
| 5 | r/funny | 0.106482 |
| 6 | r/todayilearned | 0.095689 |
| 7 | r/AskReddit | 0.007265 |
| 8 | r/worldnews | -0.044542 |
| 9 | r/announcements | -0.063 |
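The ranking above reports only means, and the number of comments behind each mean can differ between subreddits. A sketch that keeps the comment count next to each mean, with an optional minimum-count filter, makes small samples visible; `rank_subreddits` and `min_comments` are hypothetical names, and the sample scores are made up:

```python
import pandas as pd


def rank_subreddits(df, min_comments=1):
    """Rank subreddits by mean comment sentiment and report how many comments back each mean."""
    summary = df.groupby("subreddit")["comment_sentiment"].agg(
        mean_sentiment="mean", n_comments="count"
    )
    # Optionally hide subreddits whose average rests on very few comments.
    summary = summary[summary["n_comments"] >= min_comments]
    return summary.sort_values("mean_sentiment", ascending=False).reset_index()


# Made-up scores for illustration only.
sample = pd.DataFrame({
    "subreddit": ["r/a", "r/a", "r/b", "r/c"],
    "comment_sentiment": [0.5, 0.3, 0.9, -0.2],
})
print(rank_subreddits(sample))
print(rank_subreddits(sample, min_comments=2))
```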
