<h1><center>My Reddit monitoring dashboard for Python 🐍</center></h1>


<center><i>A simple dashboard to monitor keywords in Reddit, made with <a href="https://github.com/voila-dashboards/voila">Voila</a>, <a href="https://pandas.pydata.org/">Pandas</a>, <a href="https://plot.ly/python/plotly-express/">Plotly Express</a> and <a href="https://textblob.readthedocs.io/en/dev/">TextBlob</a>.</i></center>

<center><i><a href="https://github.com/NaquibAlam/voila_heroku_demo_2">Source code on github</a></i></center>

In [14]:
# import libraries

import requests
import pandas
import textblob
import plotly.express as px
import nltk
import matplotlib.pyplot as plt
nltk.download('punkt')
pandas.set_option('display.max_colwidth', -1) # don't cut my pandas dataframes

[nltk_data] Downloading package punkt to /Users/m0a04ut/nltk_data...
[nltk_data]   Package punkt is already up-to-date!

Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. Instead, use None to not limit the column width.



In [18]:
# define variables

COMMENT_COLOR         = "blueviolet"
SUBMISSION_COLOR      = "darkorange"
TEXT_PREVIEW_SIZE     = 240
TERM_OF_INTEREST      = "covid"
SUBREDDIT_OF_INTEREST = "covid"
TIMEFRAME             = "48h" # see more options in the pushshift api docs: https://github.com/pushshift/api
SIZE= 500 #no of records to return

In [19]:
# a couple of helper functions

def get_reddit_data(data_type, **kwargs):
    """
    Gets data from the pushshift api.
    
    Data type can be 'comment' or 'submission'
    The rest of the args as interpreted as payload.
    
    Read more: https://github.com/pushshift/api
    """
    
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    payload = kwargs
    
    request = requests.get(base_url, params=payload)
    
    return request.json()


def make_clickable(val):
    """
    Makes a pandas column clickable.
    """
    
    return '<a href="{}">Link</a>'.format(val)

# Figure Index

- [Comment activity](#1)
- [Submission activity](#2)
- [Most upvoted comments](#3) 
- [Most commented submissions](#4) 
- [/r/python comment sentiment timeline](#5)

## Comment activity <a class="anchor" id="1"></a>

In [39]:
data = get_reddit_data(data_type="comment", q=TERM_OF_INTEREST, after=TIMEFRAME, size= SIZE, aggs="subreddit").get("data")

df = pandas.DataFrame(data)["subreddit"].value_counts()[0:10]
x = df.keys()
y = df.values

px.bar(df,
       x=x,
       y=y,
       title=f"Subreddits with most comments having term '{TERM_OF_INTEREST}' in the last {TIMEFRAME}",
       labels={"x": "Subreddits", "y": "Number of comments"},
       color_discrete_sequence=[COMMENT_COLOR],
       height=500,
       width=800)

## Submission activity <a class="anchor" id="2"></a>

In [27]:
data = get_reddit_data(data_type="submission", q=TERM_OF_INTEREST, after=TIMEFRAME, size=1000, aggs="subreddit").get("data")

df = pandas.DataFrame(data)["subreddit"].value_counts()[0:10]
x = df.keys()
y = df.values


px.bar(df,
       x=x,
       y=y,
       title=f"Subreddits with most submissions having term '{TERM_OF_INTEREST}' in the last {TIMEFRAME}",
       labels={"x": "Subreddits", "y": "Number of submissions"},
       color_discrete_sequence=[COMMENT_COLOR],
       height=500,
       width=800)

## Most upvoted comments <a class="anchor" id="3"></a>

In [41]:
data = get_reddit_data(data_type="comment", q=TERM_OF_INTEREST, after=TIMEFRAME, size=10, sort_type="score", sort="desc").get("data")
# to see what columns interest you, simply list(df)
df = pandas.DataFrame(data)[["author", "subreddit", "score", "body", "permalink"]]

# we only keep the first X characters of the body 
df.body = df.body.str[0:100] + "..."

# we append the string to all the permalink entries
df.permalink = "https://reddit.com" + df.permalink.astype(str)

# print 
print(f"\nTop 10 most upvoted comments with '{TERM_OF_INTEREST}' in the past {TIMEFRAME}\n")

# style the last column to be clickable and print
df.style.format({'permalink': make_clickable})


Top 10 most upvoted comments with 'covid' in the past 48h



Unnamed: 0,author,subreddit,score,body,permalink
0,Gold-Giant,PublicFreakout,131,Everybody knows Covid can’t get you while you’re eating cookies....,Link
1,GobtheCyberPunk,baseball,74,That 90% number is also misleading because thats 90% chance of being *infected at all* if exposed to...,Link
2,MobiuS_360,TheRightCantMeme,55,"I already see it, 40 years from now they're going to use another famous person as propaganda who die...",Link
3,Sp_Gamer_Live,baseball,55,Covid putting up Wilt numbers...,Link
4,centaurius_,NYYankees,54,#GUMBY: GOOD #AARON: JUDGE AND JURY #GIANCARLO: SLAMTONIAN #LeMACHINE: ONLINE #LUIS: suCESSAful ...,Link
5,Ugadead1991,baseball,53,Covid knows not to disrespect the anthem! 🦅 ♥️ 🤍 💙 🎆...,Link
6,JamesWithaG,baseball,52,"In a strange way I felt like the whole thing went perfectly. None of it was intentional, Castellanos...",Link
7,caltheon,interestingasfuck,49,You’d be surprised. Since Covid I started tracking my distances more closely and found I walk roughl...,Link
8,austnoli,Torontobluejays,41,Imagine risking covid just to watch your team lose...,Link
9,andersmike,baseball,40,Everybody could have also stayed home. I doubt anybody that goes to a baseball game to drink beer an...,Link


## Most commented submissions <a class="anchor" id="4"></a>

In [42]:
data = get_reddit_data(data_type="submission", q=TERM_OF_INTEREST, after=TIMEFRAME, size=10, sort_type="num_comments", sort="desc").get("data")


# to see what columns interest you, simply list(df)
df = pandas.DataFrame.from_records(data)[["author", "subreddit", "num_comments", "title", "permalink"]]

# we only keep the first X characters of the body 
df.title = df.title[0:100] + "..."

# we append the string to all the permalink entries
df.permalink = "https://reddit.com" + df.permalink.astype(str)

# print 
print(f"\nTop 10 most commented submissions with '{TERM_OF_INTEREST}' in the past {TIMEFRAME}\n")

# style the last column to be clickable and print
df.style.format({'permalink': make_clickable})



Top 10 most commented submissions with 'covid' in the past 48h



Unnamed: 0,author,subreddit,num_comments,title,permalink
0,ukpolbot,ukpolitics,1783,Daily Megathread - 05/04/2021...,Link
1,Lightneng,PublicFreakout,1774,"Packed restaurant in Vancouver, BC chant ""Get out!"" at COVID-19 health inspectors...",Link
2,ukpolbot,ukpolitics,1418,Daily Megathread - 06/04/2021...,Link
3,nogoyolo,AmItheAsshole,1417,AITA for following through and not going to Easter because I'm tired of EVERY family thing being about the kids?...,Link
4,Vulphere,indonesia,1236,06 April 2021- Daily Chat Thread...,Link
5,TX908,science,1118,New study suggests that masks and a good ventilation system are more important than social distancing for reducing the airborne spread of COVID-19 in classrooms....,Link
6,throwaway5272,politics,1025,Biden set to announce he's moving deadline for all US adults to be eligible for Covid vaccine to April 19...,Link
7,AutoModerator,Coronavirus,1014,"Daily Discussion Thread | April 05, 2021...",Link
8,TheyCallHerBlossom,soccer,665,Comunicado Oficial: Raphaël Varane tests positive for Covid-19...,Link
9,Lyrtil,italy,656,Megathread Coronavirus * 05/04/21 - 11/04/21...,Link


## /r/covid comment sentiment timeline <a class="anchor" id="5"></a>

In [46]:
data = get_reddit_data(data_type="comment", after=TIMEFRAME, size=1000, sort_type="score", sort="desc", subreddit=SUBREDDIT_OF_INTEREST).get("data")
df = pandas.DataFrame.from_records(data)[["author", "body", "created_utc", "score", "permalink"]]

df["sentiment_polarity"] = df.apply(lambda row: textblob.TextBlob(row["body"]).sentiment.polarity, axis=1)
df["sentiment_subjectivity"] = df.apply(lambda row: textblob.TextBlob(row["body"]).sentiment.subjectivity, axis=1)
df["sentiment"] = df.apply(lambda row: "positive" if row["sentiment_polarity"] >= 0 else "negative", axis=1)

df["preview"] = df["body"].str[0:50]

df["date"] = pandas.to_datetime(df['created_utc'],unit='s')

px.scatter(df, x="date", 
               y="sentiment_polarity",
               hover_data=["author", "permalink", "preview"],
               color_discrete_map={"positive": "lightseagreen", "negative": "indianred"},
               color="sentiment",
               size_max=10,
               labels={"sentiment_polarity": "Comment positivity", "date": "Date comment was posted on"},
               title=f"Comment sentiment in /r/{SUBREDDIT_OF_INTEREST} for the past {TIMEFRAME}",
          )
