<a href="https://colab.research.google.com/github/dhonysilva/inicio/blob/master/Testes_com_os_dados_do_Reddit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
import requests

# Query the Pushshift comment search endpoint for comments mentioning "python"
url = "https://api.pushshift.io/reddit/search/comment/?q=python"
response = requests.get(url)
json_response = response.json()

In [0]:
def get_pushshift_data(data_type, **kwargs):
    """
    Gets data from the Pushshift API.

    data_type can be 'comment' or 'submission'.
    All other keyword arguments are sent as query parameters (the payload).

    Read more: https://github.com/pushshift/api
    """
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    response = requests.get(base_url, params=kwargs)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    return response.json()
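
To see what the keyword payload turns into, the sketch below builds the same kind of URL with only the standard library. It is an illustration rather than a description of what `requests` does internally: `build_search_url` is a hypothetical helper, and the exact encoding `requests` produces may differ in detail.

```python
from urllib.parse import urlencode


def build_search_url(data_type, **kwargs):
    """Hypothetical helper: show the URL that a keyword payload maps to."""
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    # urlencode turns {"q": "python", "size": 10} into "q=python&size=10"
    return f"{base_url}?{urlencode(kwargs)}" if kwargs else base_url


example_url = build_search_url("comment", q="python", after="48h", size=10)
print(example_url)
```

Each keyword argument becomes one `name=value` pair in the query string, which is why new filters can be passed without changing the function.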

In [4]:
get_pushshift_data(data_type="comment",     # give me comments
                   q="python",              # that mention 'python'
                   after="48h",             # in the last 48 hours
                   size=10,                 # return 10 of them (maximum is 1000)
                   sort_type="score",       # sort them by score
                   sort="desc")             # sort descending

{'data': [{'all_awardings': [],
   'associated_award': None,
   'author': '-Josh',
   'author_flair_background_color': None,
   'author_flair_css_class': None,
   'author_flair_richtext': [],
   'author_flair_template_id': None,
   'author_flair_text': None,
   'author_flair_text_color': None,
   'author_flair_type': 'text',
   'author_fullname': 't2_7uki3',
   'author_patreon_flair': False,
   'author_premium': True,
   'awarders': [],
   'body': 'It’s the subject of a classic [Monty Python sketch](https://m.youtube.com/watch?v=VAdlkunflRs) from 1974, so the gag is at bare minimum 45 years ago and is likely significantly older than that.',
   'collapsed_because_crowd_control': None,
   'created_utc': 1577641303,
   'gildings': {},
   'id': 'fcgl2x0',
   'is_submitter': False,
   'link_id': 't3_eh676m',
   'locked': False,
   'no_follow': False,
   'parent_id': 't1_fcfvi5r',
   'permalink': '/r/OutOfTheLoop/comments/eh676m/whats_the_deal_with_the_how_my_parents_say_they/fcgl2x0/',
   ...

In [0]:
data = get_pushshift_data(data_type="comment",
                          q="python",
                          after="48h",
                          size=1000,
                          aggs="subreddit")

In [0]:
data = data.get("aggs").get("subreddit")

In [0]:
import pandas

In [0]:
df = pandas.DataFrame.from_records(data)[0:15]

In [12]:
df.head()

,doc_count,key
0,227,Python
1,221,learnpython
2,122,learnprogramming
3,118,ProgrammerHumor
4,114,programming
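
The aggregation result is used above as a list of records, each with a `key` (the subreddit) and a `doc_count` (the number of matching comments). A minimal sketch of that shape in plain Python, using the five rows from the output above as sample data; `top_subreddits` is a hypothetical helper and not part of the Pushshift API:

```python
# Sample records copied from the output above (not a live API response)
sample_aggs = [
    {"doc_count": 227, "key": "Python"},
    {"doc_count": 221, "key": "learnpython"},
    {"doc_count": 122, "key": "learnprogramming"},
    {"doc_count": 118, "key": "ProgrammerHumor"},
    {"doc_count": 114, "key": "programming"},
]


def top_subreddits(records, n):
    """Hypothetical helper: return the n busiest subreddits as (name, count) pairs."""
    ranked = sorted(records, key=lambda record: record["doc_count"], reverse=True)
    return [(record["key"], record["doc_count"]) for record in ranked[:n]]


print(top_subreddits(sample_aggs, 3))
```

Sorting explicitly avoids relying on the order in which the API happens to return the buckets.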


In [21]:
import plotly.express as px

px.bar(df,              # our dataframe
       x="key",         # x will be the 'key' column of the dataframe
       y="doc_count",   # y will be the 'doc_count' column of the dataframe
       title="Subreddits with most activity - comments with 'python' in the last 48h",
       labels={"doc_count": "# comments","key": "Subreddits"}, # the axis names
       color_discrete_sequence=["green"], # the colors used
       height=500,
       width=800)

In [0]:
def make_clickable(val):
    """Makes a pandas column clickable by wrapping each value in an HTML link."""
    return '<a href="{}">Link</a>'.format(val)
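
Since the value is pasted straight into an HTML attribute, a URL containing `&` or a quote character could produce malformed markup. The variant below is a sketch under that assumption; `make_clickable_safe` is a hypothetical name, and escaping may be unnecessary for the permalinks used in this notebook.

```python
from html import escape


def make_clickable_safe(val):
    """Hypothetical variant of make_clickable that HTML-escapes the URL first."""
    return '<a href="{}">Link</a>'.format(escape(str(val), quote=True))


print(make_clickable_safe("https://reddit.com/r/Python/"))
print(make_clickable_safe("https://example.com/?a=1&b=2"))
```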

In [27]:
# get the data we need using the function
data = get_pushshift_data(data_type="comment", q="python", after="7d", size=10, sort_type="score", sort="desc").get("data")

# we only care about certain columns
df = pandas.DataFrame.from_records(data)[["author", "subreddit", "score", "body", "permalink"]]

# we only keep the first X characters of the body of the comment (sometimes they are too big)
df['body'] = df['body'].str[0:400] + "..."

# we append the string to all the permalink entries so that we have a link to the comment
df['permalink'] = "https://reddit.com" + df['permalink'].astype(str)

# style the last column to be clickable and print
df.style.format({'permalink': make_clickable})

,author,subreddit,score,body,permalink
0,-Josh,OutOfTheLoop,37,"It’s the subject of a classic [Monty Python sketch](https://m.youtube.com/watch?v=VAdlkunflRs) from 1974, so the gag is at bare minimum 45 years ago and is likely significantly older than that....",Link
1,unfixpoint,Python,32,"1. What does belong in this sub!? I already know Python, this has nothing to do with me learning Python but is about the language itself. 2. This is a pretty vague answer but I'll go with it, though both `exec` and `print` are implemented in C so they're very intrinsic to the language too....",Link
2,feketegy,webdev,26,"This may be unpopular, but besides Java and Python I know all those things on that list. I’m not a “master”, but I know enough to set it up for a medium project. I don’t need nor want Facebook or Google level expertise. So yes, full-stack developers exist....",Link
3,ihasbedhead,Python,23,"Tldr; probably much simpler to have a keyword, and little reason to have a function. All flow control statements must be statements in high level languages because anything else would be massively confusing and not really useful. You may notice that many of the statement keywords in Python (break, continue, return, if, else, try, catch, yield, await, finally) have to do with skipping over code...",Link
4,etnguyen03,ProgrammerHumor,22,"*Image Transcription:* --- [*A coffee cup with the Python (programming language) logo printed on it, on a table. There is coffee inside the cup.*] --- ^^I'm&#32;a&#32;human&#32;volunteer&#32;content&#32;transcriber&#32;for&#32;Reddit&#32;and&#32;you&#32;could&#32;be&#32;too!&#32;[If&#32;you'd&#32;like&#32;more&#32;information&...",Link
5,homelesspancake,Python,22,">\# Import an image processing tool > > > >from PIL import Image > > > > > > > >\# Decide what image to convert, and make it 1/3 the size > > > >image = \[Image.open\](https://Image.open)(""ctycgchr36231.jpg"") > > > >image = image.resize((int(image.width / 3), int(image.height / 3))) > >...",Link
6,newp,webdev,20,"I'm actually kinda confused by this. Every person on our team codes in Python and PHP daily. They use Postgres, Redis, DynamoDB. We use Terraform, Cloudformation, Docker, ECS, etc. Everything is on \*nix. Why is it unheard of for developers to know all of these things?...",Link
7,Raggedhawk520,MakeMeSuffer,17,"I've actually held one of these insects at a reptile expo I went to to get rats for my Burmese Python. They are docile, and the insects into attack unless they experience the pain first, but they're well armored so j wouldn't worry about it....",Link
8,skyblueandblack,LosAngeles,14,"The main problem with backyard breeders is that they're irresponsible. Say their dog had two blind littermates -- that'd strongly indicate it's an inherited trait, and he's a likely carrier of that gene, so as far as reputable breeders are concerned, that removes him from consideration as a breeding dog; he'd be considered pet, rather than show quality. And usually, he'd be neutered and vaccinated...",Link
9,purpleappletrees,programmingcirclejerk,14,"Lisp /uj Personally I like Haskell, Ocaml, and Rust, and I'm fine with Python and C++. It's just fun to laugh at the flaws or stereotypes of each language. However, I do unironically agree with most comments here shitting on Go....",Link
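
In the table above every body ends in "...", including comments shorter than 400 characters, because the cell appends the suffix unconditionally. A minimal sketch of a helper that marks only the texts it actually cuts; `truncate` is a hypothetical name introduced for illustration:

```python
def truncate(text, limit=400, suffix="..."):
    """Hypothetical helper: shorten text to `limit` characters, adding `suffix` only if it was cut."""
    if len(text) <= limit:
        return text
    return text[:limit] + suffix


print(truncate("short comment"))
print(len(truncate("x" * 405)))
```

With pandas it could be applied as `df["body"].apply(truncate)` in place of the slice-and-append step.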


In [0]:
# get the data with our function
data = get_pushshift_data(data_type="comment",
                          after="2d",
                          size=1000,
                          sort_type="score",
                          sort="desc",
                          subreddit="python").get("data")

# define a list of columns we want to keep
columns_of_interest = ["author", "body", "created_utc", "score", "permalink"]

# transform the response into a dataframe
df = pandas.DataFrame.from_records(data)[columns_of_interest]

In [29]:
import textblob

sentence1 = "Portugal is a horrible country. People drive like crazy animals."
print(textblob.TextBlob(sentence1).sentiment)
# -> Sentiment(polarity=-0.8, subjectivity=0.95)
# negative and subjective

sentence2 = "Portugal is the most beautiful country in the world because beaches face west."
print(textblob.TextBlob(sentence2).sentiment)
# -> Sentiment(polarity=0.675, subjectivity=0.75)
# positive and less subjective

Sentiment(polarity=-0.8, subjectivity=0.95)
Sentiment(polarity=0.675, subjectivity=0.75)
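
TextBlob reports polarity in the range -1.0 to 1.0 and subjectivity in the range 0.0 to 1.0. A minimal sketch of turning a polarity score into a label follows; `label_polarity` is a hypothetical helper, and the neutral band of 0.05 is an arbitrary assumption, not a TextBlob setting.

```python
def label_polarity(polarity, neutral_band=0.05):
    """Hypothetical helper: map a polarity score in [-1, 1] to a text label."""
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# The two scores printed above, plus a perfectly neutral one
for score in (-0.8, 0.0, 0.675):
    print(score, label_polarity(score))
```

A neutral band keeps texts with no detectable sentiment (polarity 0.0) from being counted as positive.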


In [0]:
# create a column with sentiment polarity
df["sentiment_polarity"] = df["body"].apply(lambda text: textblob.TextBlob(text).sentiment.polarity)

# create a column with sentiment subjectivity
df["sentiment_subjectivity"] = df["body"].apply(lambda text: textblob.TextBlob(text).sentiment.subjectivity)

# create a column with 'positive' or 'negative' depending on sentiment_polarity
# (a polarity of exactly 0 is counted as 'positive')
df["sentiment"] = df["sentiment_polarity"].apply(lambda polarity: "positive" if polarity >= 0 else "negative")

# create a column with a text preview that shows the first 50 characters
df["preview"] = df["body"].str[0:50]

# take the created_utc column and transform it into a datetime column
df["date"] = pandas.to_datetime(df["created_utc"], unit="s")
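
The `created_utc` field is a Unix timestamp in seconds, which is why the conversion uses seconds as its unit. The same conversion for a single value with only the standard library, using the `created_utc` value from the API output shown earlier:

```python
from datetime import datetime, timezone

created_utc = 1577641303  # value copied from the API output shown earlier

# Interpret the timestamp as seconds since 1970-01-01 00:00:00 UTC
posted_at = datetime.fromtimestamp(created_utc, tz=timezone.utc)
print(posted_at.isoformat())  # → 2019-12-29T17:41:43+00:00
```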

In [32]:
px.scatter(df,
           x="date",                # date on the x axis
           y="sentiment_polarity",  # sentiment on the y axis
           hover_data=["author", "permalink", "preview"],  # data to show on hover
           color_discrete_sequence=["lightseagreen", "indianred"],  # colors to use
           color="sentiment",       # what should the color depend on?
           size="score",            # the more votes, the bigger the circle
           size_max=10,             # not too big
           labels={"sentiment_polarity": "Comment positivity", "date": "Date comment was posted"},  # axis names
           title="Comment sentiment in /r/python for the past 48h")  # title of figure