# Demo 14 - Reddit API

In [1]:
import numpy as np
import pandas as pd

## PSAW

`psaw` is a Python module that wraps the Pushshift API. It provides access to publicly available Reddit submissions and comments.

Code and examples can be found on [GitHub](https://github.com/dmarx/psaw), and documentation is available [online](https://psaw.readthedocs.io/en/latest/#).

**Question:** How do we install psaw?

<details>
<summary>Solution</summary>
   !pip install psaw

</details>

In [5]:
!pip install psaw



In [4]:
import psaw

In [6]:
from psaw import PushshiftAPI

In [7]:
api = PushshiftAPI()
api

<psaw.PushshiftAPI.PushshiftAPI at 0x7faf0b4e2ac0>

In [None]:
api.search_comments()

In [46]:
api_request_generator = api.search_submissions(subreddit='AMITheAsshole', size=25)
api_request_generator

<generator object PushshiftAPIMinimal._search at 0x7faf47cc7f90>

In [40]:
type(api_request_generator)

generator

### Python Generators Explained

#### Iterable

**Question:** What is an `Iterable`?

<details>
<summary>Solution</summary>
    An iterable is any object in Python which has an __iter__ or a __getitem__ method defined which returns an iterator or can take indexes. In short an iterable is any object which can provide us with an iterator.
    
    <br>
    <br>
    https://book.pythontips.com/en/latest/generators.html (Quotes below are from this site as well)

</details>
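As a minimal sketch of the definition above, the hypothetical `Squares` class below defines only `__getitem__`, yet a `for`-style loop still accepts it, because Python falls back to indexing from 0 until an `IndexError` is raised. The class name and its contents are illustrative and not part of the original demo.

```python
class Squares:
    """Hypothetical sequence-like class that defines __getitem__ but not __iter__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # Raising IndexError tells Python where the sequence ends.
        if index < 0 or index >= self.n:
            raise IndexError(index)
        return index ** 2


squares = Squares(4)
has_iter = hasattr(squares, "__iter__")   # this class has no __iter__ method
values = [value for value in squares]     # iteration falls back to __getitem__
print(has_iter, values)
```

So an object can be iterable without having `__iter__` in its `dir()` listing, as long as it supports integer indexing.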

##### Examples of Iterable objects

In [10]:
np_array = np.arange(10)
np_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [11]:
"__iter__" in dir(np_array)

True

In [12]:
for number in np_array:
    print(number)

0
1
2
3
4
5
6
7
8
9


In [None]:
"__getitem__" in dir(np_array)

In [13]:
dictionary = {}
for key in 'abcdefhi':
    dictionary[key] = np.random.rand()
dictionary

{'a': 0.24905767287063585,
 'b': 0.8523255237584075,
 'c': 0.7024317651265927,
 'd': 0.2094614583863874,
 'e': 0.7664520700602468,
 'f': 0.29900172129809444,
 'h': 0.07475177676595168,
 'i': 0.4869221032738906}

In [17]:
np_array[0]

0

In [18]:
"__iter__" in dir(np_array)

True

In [19]:
"__getitem__" in dir(np_array)

True

#### Iterator

An iterator is any object in Python which has a `next` (Python 2) or `__next__` (Python 3) method defined. That’s it. That’s an iterator.
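To make the definition concrete, here is a small sketch of a hand-written iterator. The `Countdown` class is a hypothetical example, not something from the demo or from any library; it only shows where `__next__` fits.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__, so it also works in a for loop.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first = next(countdown)      # calls countdown.__next__()
rest = list(countdown)       # consumes whatever is left
print(first, rest)
```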

In [22]:
"__next__" in dir(np_array)

False

In [23]:
"__next__" in dir(iter(np_array))

True

In [25]:
iterator = iter(np_array)
iterator, np_array

(<iterator at 0x7faf0b42d520>, array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))

In [26]:
next(iterator)

0

In [27]:
next(iterator)

1

In [28]:
next(iterator)

2

In [29]:
iterator = iter(np_array[:1])
iterator, np_array[:1]

(<iterator at 0x7faf0b42d100>, array([0]))

In [30]:
next(iterator)

0

In [31]:
next(iterator)

StopIteration: 

The previous cell raises `StopIteration` because the iterator is exhausted: its only element was already consumed by the first call to `next`.
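A `for` loop never shows this error because it catches `StopIteration` itself. The sketch below spells out roughly what a `for` loop does, and also shows the optional default argument of `next()`; the variable names are made up for illustration.

```python
numbers = [10, 20, 30]

# Roughly what `for number in numbers:` does behind the scenes.
iterator = iter(numbers)
seen = []
while True:
    try:
        number = next(iterator)
    except StopIteration:
        # The iterator is exhausted, so the loop ends quietly.
        break
    seen.append(number)

# next() also accepts a default that is returned instead of raising.
after_end = next(iterator, None)
print(seen, after_end)
```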

In [32]:
iterator = iter(np_array)
iterator, np_array

(<iterator at 0x7faf0b42db20>, array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))

In [33]:
for number in iterator:
    print(number)

0
1
2
3
4
5
6
7
8
9


In [36]:
for number in iterator:
    print(number)

In [37]:
next(iterator)

StopIteration: 

#### Iteration

In simple words, it is the process of taking items one at a time from something, e.g. a list. When we use a loop to loop over something, it is called iteration. It is the name given to the process itself. Now that we have a basic understanding of these terms, let’s understand generators.

In [20]:
for num in np_array:
    print(num)

0
1
2
3
4
5
6
7
8
9


In [None]:
for num in iterator:
    print(num)

In [21]:
iterator = iter(np_array)
iterator

for num in iterator:
    print(num)

0
1
2
3
4
5
6
7
8
9


#### Generators

Generators are iterators, but you can only iterate over them once. It’s because they do not store all the values in memory, they generate the values on the fly. You use them by iterating over them, either with a ‘for’ loop or by passing them to any function or construct that iterates. Most of the time generators are implemented as functions. However, they do not return a value, they yield it. Here is a simple example of a generator function:

In [38]:
def generator_function():
    for i in range(3):
        yield i

gen = generator_function()
print(next(gen))
# Output: 0
print(next(gen))
# Output: 1
print(next(gen))
# Output: 2
print(next(gen))
# Output: Traceback (most recent call last):
#            File "<stdin>", line 1, in <module>
#         StopIteration

0
1
2


StopIteration: 

A generator function is a convenient way to build an iterator: calling it returns a generator object that supports `next()` and `for` loops.
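As a small sketch of the "iterate only once" behavior described above, the hypothetical `squares_up_to` generator function below is consumed twice. The second pass yields nothing, and calling the function again is what produces a fresh generator.

```python
def squares_up_to(n):
    """Hypothetical generator function: yields 0, 1, 4, ... lazily."""
    for i in range(n):
        yield i * i


gen = squares_up_to(4)
first_pass = list(gen)    # runs the generator to completion
second_pass = list(gen)   # nothing left: the generator is already exhausted

# Calling the function again creates a fresh, independent generator.
fresh = list(squares_up_to(4))
print(first_pass, second_pass, fresh)
```

This matters for the API generators used below: once a search generator has been drained into a list or DataFrame, it has to be recreated before it can be used again.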

### Search Submissions

In [48]:
api_request_generator, type(api_request_generator)

(<generator object PushshiftAPIMinimal._search at 0x7faf47cc7f90>, generator)

In [62]:
submission = next(api_request_generator)
submission.selftext

"This is my first time posting so I'm not exactly sure how this should go. Today my friends/roommates and I all went out to a bar to celebrate our senior year(full vaccinated obviously), two of my friends/roommates (L and G) decided to go home while my other roommate and I went with some friends to a different bar. After we got a drink we decided to go home and had a mutual friend who hadn't been drinking drive us home. As we were on our way home said roommate (U) got a phone call finding out a family member was in the hospital after some medical complications. My roommate was very upset in the car but they did not tell me what was happening I only pieced things together from what they were saying on the phone and they were crying.\n\nWe were almost home and I asked U if they wanted me to come to their room with them, but they said no they wanted to be alone. Our two other roommates (L and G) were upstairs watching TV and I said that our other roommate was drunk and just wanted to go t

Potential subreddits for class:

- https://www.reddit.com/r/columbia/
- https://www.reddit.com/r/wallstreetbets/
- https://www.reddit.com/r/datasets/
- https://www.reddit.com/r/AskReddit/

In [67]:
api_request_generator = api.search_submissions(subreddit='AmITheAsshole', score=">2000", size=5)
api_request_generator

<generator object PushshiftAPIMinimal._search at 0x7faf47ef1970>

In [65]:
[submission.d_ for submission in api_request_generator]

KeyboardInterrupt: 
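The list comprehension above keeps pulling items until the generator is exhausted, which can take a long time for a large result set. One way to cap the work is `itertools.islice`, sketched here on `fake_search`, a hypothetical stand-in generator that makes no network calls (it is not part of psaw). The psaw README also appears to document a `limit` keyword for capping the total number of results, which would be worth checking before relying on it.

```python
from itertools import count, islice


def fake_search():
    """Hypothetical stand-in for a long-running API generator (no network calls)."""
    for i in count():
        yield {"id": i, "title": f"post {i}"}


# Take at most 5 items instead of draining the generator.
first_five = list(islice(fake_search(), 5))
print([item["id"] for item in first_five])
```

The same pattern should work with any generator, for example `islice(api_request_generator, 100)`.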

In [68]:
aita_subs_df = pd.DataFrame([submission.d_ for submission in api_request_generator])
aita_subs_df.shape



(3238, 78)

In [87]:
sorted(aita_subs_df.keys())

['all_awardings',
 'allow_live_comments',
 'author',
 'author_cakeday',
 'author_flair_background_color',
 'author_flair_css_class',
 'author_flair_richtext',
 'author_flair_template_id',
 'author_flair_text',
 'author_flair_text_color',
 'author_flair_type',
 'author_fullname',
 'author_patreon_flair',
 'author_premium',
 'awarders',
 'banned_by',
 'can_mod_post',
 'contest_mode',
 'created',
 'created_utc',
 'domain',
 'edited',
 'full_link',
 'gilded',
 'gildings',
 'id',
 'is_crosspostable',
 'is_meta',
 'is_original_content',
 'is_reddit_media_domain',
 'is_robot_indexable',
 'is_self',
 'is_video',
 'link_flair_background_color',
 'link_flair_css_class',
 'link_flair_richtext',
 'link_flair_template_id',
 'link_flair_text',
 'link_flair_text_color',
 'link_flair_type',
 'locked',
 'media_only',
 'no_follow',
 'num_comments',
 'num_crossposts',
 'og_description',
 'og_title',
 'over_18',
 'parent_whitelist_status',
 'permalink',
 'pinned',
 'post_hint',
 'preview',
 'pwls',
 'remo

- author
- title
- selftext
- created_utc
- score
- num_comments
- subreddit
- url


In [74]:
aita_subs_df['title'].map(lambda x: x.startswith("AITA ")).value_counts()

True     2777
False     461
Name: title, dtype: int64

In [77]:
aita_subs_df[~aita_subs_df['title'].map(lambda x: x.startswith("AITA "))]['title']

1       UPDATE: AITA for telling my wife I think she’s...
15      Aita for leaving a friend group chat after bei...
20      Update: AITA for not attending my friend's wed...
30      WIBTA if I made my husband choose between me a...
37      Update: WIBTAH if I fired a kid because his mo...
                              ...                        
3192    WIBTA If I don't share my rather large (to me ...
3196    AITA: I ditched a girl 40 minutes from her hou...
3203    WIBTA if I make my gf start paying some of the...
3229    WIBTA if I post pictures of this girl that wen...
3231                                  UPVOTE THE ASSHOLES
Name: title, Length: 461, dtype: object
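The titles that fail the `startswith("AITA ")` check are mostly variants such as `Aita`, `AITA:`, `WIBTA`, or an `UPDATE:` prefix. A rough sketch of a more tolerant classification using a regular expression follows. The sample titles are shortened by hand from the output above, and the `post_kind` helper is hypothetical.

```python
import re
from collections import Counter

# Illustrative titles modeled on the output above (shortened by hand).
titles = [
    "AITA for skipping a party?",
    "UPDATE: AITA for telling my wife the truth?",
    "Aita for leaving a friend group chat?",
    "WIBTA if I made my husband choose?",
    "UPVOTE THE ASSHOLES",
]

# Optional "update:" prefix, then AITA / WIBTA / WIBTAH, case-insensitive.
pattern = re.compile(r"^(?:update:\s*)?(aita|wibtah?)\b", re.IGNORECASE)


def post_kind(title):
    """Return the normalized tag found at the start of a title, or OTHER."""
    match = pattern.match(title)
    return match.group(1).upper() if match else "OTHER"


kinds = Counter(post_kind(title) for title in titles)
print(kinds)
```

On the DataFrame, the same helper could be applied with `aita_subs_df['title'].map(post_kind).value_counts()`.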

In [78]:
aita_subs_df['selftext']

0       Me (M31) and my GF (F24)  have a great relatio...
1       [original post ](https://www.reddit.com/r/AmIt...
2                                               [deleted]
3                                               [deleted]
4       I (25F) started my Masters program last year r...
                              ...                        
3233    Hi, so my girlfriend and i watched a horror mo...
3234                                                     
3235    Recently my daughter has 'come out' to me as n...
3236    I was going to McDonald's for a quick bite to ...
3237    So over two years ago a cat appeared in my yar...
Name: selftext, Length: 3238, dtype: object

In [81]:
[column for column in aita_subs_df.columns if 'created' in column]

['created_utc', 'created']

In [83]:
aita_subs_df['created_utc']

0       1614937372
1       1614922936
2       1614916559
3       1614914439
4       1614913736
           ...    
3233    1535751589
3234    1535648431
3235    1535276025
3236    1529374025
3237    1526009287
Name: created_utc, Length: 3238, dtype: int64

In [84]:
from datetime import datetime

In [85]:
aita_subs_df.head(5)['created_utc'].apply(datetime.fromtimestamp)

0   2021-03-05 09:42:52
1   2021-03-05 05:42:16
2   2021-03-05 03:55:59
3   2021-03-05 03:20:39
4   2021-03-05 03:08:56
Name: created_utc, dtype: datetime64[ns]
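Note that `datetime.fromtimestamp` without a time zone argument converts to the local time of the machine running the notebook, so the same cell can print different times on different computers. A minimal sketch of a time-zone-aware conversion, using the first `created_utc` value shown above:

```python
from datetime import datetime, timezone

created_utc = 1614937372  # first value of the created_utc column shown above

# Passing tz=timezone.utc makes the result independent of the machine's time zone.
created = datetime.fromtimestamp(created_utc, tz=timezone.utc)
print(created.isoformat())
```

For a whole column, `pd.to_datetime(aita_subs_df['created_utc'], unit='s', utc=True)` is a vectorized alternative.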

In [86]:
aita_subs_df['num_comments']

0        610
1        197
2        515
3       1467
4        745
        ... 
3233     405
3234     104
3235     905
3236      90
3237     169
Name: num_comments, Length: 3238, dtype: int64

#### Search submission by keyword

In [None]:
api_request_generator = api.search_submissions(q='Missy Elliott', score=">2000")
missy_elliot_df = pd.DataFrame([submission.d_ for submission in api_request_generator])


### Search Comments

In [None]:
search_generator = api.search_comments(size=25)
search_generator

In [None]:
comments_df = pd.DataFrame([submission.d_ for submission in search_generator])
comments_df.shape

In [None]:
for submission in search_generator:
    break  # keep the first item so it can be inspected in the cells below

In [None]:
type(submission)

In [None]:
dir(submission)

In [None]:
submission.d_['stickied']

#### Search Comments by Multiple Keywords

**Question:** Can someone explain the next two lines?

In [89]:
api_request_generator = api.search_comments(q='(George Orwell)|(J. R. R. Tolkien)')
api_request_generator = api.search_comments(q='(George Orwell)AND(J. R. R. Tolkien)')

In [None]:
api_request_generator = api.search_comments(q='(Shakespeare)&(Beyonce)')

## PRAW

PRAW is a popular Python wrapper for accessing Reddit data. Unlike PSAW, it uses the Reddit API directly rather than Pushshift's collection of Reddit data.

To use PRAW, you need a Reddit account and a Reddit app registered under that account.
We'll skip this during today's demo.

### Making a Reddit App


Go to https://www.reddit.com/prefs/apps/ and click on the button that says 
`are you a developer? create an app...`

[API Access overview](https://www.reddit.com/wiki/api)


## Cleaning text

In [111]:
aita_subs_df['selftext'][20], aita_subs_df['url'][20]

("Link to original [post](https://www.reddit.com/r/AmItheAsshole/comments/ljaump/aita_for_not_attending_my_friends_wedding_on_the/)\n\nHello Reddit, I want to start off by saying thank you to everyone who replied, I was not expecting my post to get this level of attention. I have read all of your comments, and thank you once again. I am in tears and became an emotional mess after reading your thoughtful responses, thank you for the love and support. I have always felt as some sort of emotional burden to my friends, so thank you for your encouraging messages and telling me to keep my head up high. Even though we are all strangers on the internet, it sort of felt like I had gained new family. My heart goes out to the people who have messaged me privately of their experiences with losing their loved ones to drunk driving as well. Please know, I have felt your pain, you are not alone in this and I wish you all the love in this world.\n\nNow for the update, I did what some of you suggested 

In [None]:
import pandas as pd
aita_subs_df = pd.DataFrame([submission.d_ for submission in api_request_generator])


In [None]:
submission = next(api_request_generator)
submission.selftext

In [107]:
!pip install redditcleaner



In [108]:
import redditcleaner

In [114]:
redditcleaner.clean(aita_subs_df['selftext'][23])

'Throwaway. I 32(F) is the only child of my parents. It became clear growing up that my dad wanted a boy but made the best of what he got. He taught me how to do everything, like fixing stuff around the house, putting DIY furniture together(he’s big on that), woodworking, basically a lot of stuff that is traditionally expected of men to do, my dad taught me how. My husband (33M) and I had our first child last year. We got most of the baby furniture from IKEA, I ended up putting all of them together(I’m not saying this to complain, it was relaxing for me). I also put together any of our child’s toys or chairs or whatever else that needs to get done. Due to the on pandemic, my husband’s family hadn’t been able to visit. My SIL and her husband recently got the vaccine( they work in health care) and decided to visit us for the first time. SIL made a comment about how nice the nursery looked, and asked where we got the nursery furniture from. I told her we got them from ikea and then she sa
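To give a feel for the kind of cleanup involved, here is a deliberately simplified sketch using only the standard library. It is not how `redditcleaner` is implemented and covers far fewer cases; the `rough_clean` helper and the sample URL are hypothetical.

```python
import re


def rough_clean(text):
    """Simplified illustration only, not how redditcleaner is implemented."""
    # Replace Markdown links [label](url) with just the label.
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Collapse newlines and repeated spaces into single spaces.
    return re.sub(r"\s+", " ", text).strip()


raw = "Link to original [post](https://www.reddit.com/r/example/)\n\nHello Reddit"
cleaned = rough_clean(raw)
print(cleaned)
```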

In [127]:
aita_subs_df['title'].iloc[-2]

'AITA for throwing a soda on the ground near the dude I bought it for?'

In [120]:
redditcleaner.clean(aita_subs_df['selftext'].iloc[-3])

"Recently my daughter has 'come out' to me as non binary, meaning that she supposedly does not believe she is a man or a woman. I heard her out and let her speak, and tried to calmly ask her how she has come to this conclusion. The conversation was civil until I told her I did not believe she is anything but a woman. At this point, she started crying, calling me a bigot, and my wife had to take over. My wife tells me I am being insensitive. They want me to refer to them as 'they', which I have agreed to do in order to avoid conflict. I understand that there are people who decide to live as the opposite gender, and many of them take hormones and have operations to make the effect more believable. I understand that - like being gay - this is something people cannot control, it is what they feel they truly are inside. My daughter says that gender is on a spectrum. While I am glad my daughter isn't planning on having any operations, throughout her upbringing, I have never seen any sign tha

In [124]:
aita_subs_df['selftext'].iloc[-7]

"Guys, please, this is for the good of our community. \n\nI know it's counter-intuitive, your instinct is to downvote when you see an asshole, but it's just not in the spirit of this subreddit to do that here.\n\nWe shouldn't have to sort by controversial to find assholes here. We should be upvoting them so that everyone can see their assholery from their front page.\n\nPlease, please, please upvote the assholes! "