# Reddit Data Collection — Workbook Solutions

*Don't forget to rename this notebook if you want to save changes!*

In this lesson, we're going to learn how to collect Reddit posts with [PSAW](https://github.com/dmarx/psaw), a Python wrapper for the Pushshift API.

> Why do people use Pushshift’s API instead of the official Reddit API?

> In short, Pushshift makes it much easier for researchers to query and retrieve historical Reddit data, provides extended functionality by providing fulltext search against comments and submissions, and has larger single query limits.

>— Jason Baumgartner, et al., ["The Pushshift Reddit Dataset"](https://arxiv.org/pdf/2001.08435.pdf)

## Install PSAW

First, we're going to install the PSAW package with pip. The `!` allows us to run a command that is normally used on the command line.

In [None]:
!pip install psaw

Then we will import pandas and set the default display options.

In [2]:
import pandas as pd
pd.options.display.max_colwidth =  400
pd.options.display.max_columns = 50

Next we will import a specific part of the PSAW package, PushshiftAPI.

In [3]:
from psaw import PushshiftAPI

Then we will "initialize" the PushshiftAPI, so we can work with it below.

In [4]:
api = PushshiftAPI()

## Collect Reddit submissions for a subreddit

The way PSAW works is a little unique. First, we will set up an "API request generator," then we will loop through the generator to extract individual Reddit posts.
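One practical consequence of working with a generator is that it can only be looped through once. The sketch below uses a plain Python generator as a stand-in (the function `request_generator` and its sample posts are hypothetical, not part of PSAW) to show why a search cell needs to be rerun before looping a second time.

```python
def request_generator(posts):
    """Hypothetical stand-in for an API request generator."""
    for post in posts:
        yield post


generator = request_generator(["post A", "post B", "post C"])

# The first loop consumes every item the generator has to offer
first_pass = [post for post in generator]

# The second loop finds nothing, because the generator is already exhausted
second_pass = [post for post in generator]

print(len(first_pass), len(second_pass))
```

If a loop over search results unexpectedly produces an empty list, an already consumed generator is a likely cause.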

In [55]:
api_request_generator = api.search_submissions(subreddit='TodayILearned',
                                               score = ">10000",
                                               limit=100)

Here we loop through the API request generator and, for each Reddit post, keep its data, which is stored as a dictionary in the attribute `submission.d_`.

In [56]:
all_submissions = []
for submission in api_request_generator:
    all_submissions.append(submission.d_)

How would we calculate the length of the list `all_submissions`?

In [57]:
len(all_submissions)

100

How would we examine the first item in the list `all_submissions`?

In [58]:
all_submissions[0]

{'all_awardings': [{'award_sub_type': 'APPRECIATION',
   'award_type': 'global',
   'awardings_required_to_grant_benefits': None,
   'coin_price': 250,
   'coin_reward': 100,
   'count': 1,
   'days_of_drip_extension': 0,
   'days_of_premium': 0,
   'description': 'When a thing immediately combusts your brain. Gives %{coin_symbol}100 Coins to both the author and the community.',
   'end_date': None,
   'giver_coin_reward': None,
   'icon_format': None,
   'icon_height': 2048,
   'icon_url': 'https://i.redd.it/award_images/t5_22cerq/wa987k0p4v541_MindBlown.png',
   'icon_width': 2048,
   'id': 'award_9583d210-a7d0-4f3c-b0c7-369ad579d3d4',
   'is_enabled': True,
   'is_new': False,
   'name': 'Mind Blown',
   'penny_donate': None,
   'penny_price': None,
   'resized_icons': [{'height': 16,
     'url': 'https://preview.redd.it/award_images/t5_22cerq/wa987k0p4v541_MindBlown.png?width=16&amp;height=16&amp;auto=webp&amp;s=3429167a3ea031ab4a4574baf47b2140c7b59aa8',
     'width': 16},
    {'he

How would we create a DataFrame from `all_submissions`?

In [59]:
pd.DataFrame(all_submissions)

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_text,author_flair_type,author_fullname,author_patreon_flair,author_premium,awarders,can_mod_post,contest_mode,created_utc,domain,full_link,gildings,id,is_crosspostable,is_meta,is_original_content,is_reddit_media_domain,is_robot_indexable,is_self,is_video,...,spoiler,stickied,subreddit,subreddit_id,subreddit_subscribers,subreddit_type,thumbnail,thumbnail_height,thumbnail_width,title,total_awards_received,treatment_tags,upvote_ratio,url,url_overridden_by_dest,whitelist_status,wls,created,link_flair_template_id,link_flair_text,removed_by_category,link_flair_css_class,gilded,author_flair_background_color,author_flair_text_color
0,"[{'award_sub_type': 'APPRECIATION', 'award_type': 'global', 'awardings_required_to_grant_benefits': None, 'coin_price': 250, 'coin_reward': 100, 'count': 1, 'days_of_drip_extension': 0, 'days_of_premium': 0, 'description': 'When a thing immediately combusts your brain. Gives %{coin_symbol}100 Coins to both the author and the community.', 'end_date': None, 'giver_coin_reward': None, 'icon_forma...",True,HBombBrohan,,[],,text,t2_8lhmp,False,False,[],False,False,1614910685,youtube.com,https://www.reddit.com/r/todayilearned/comments/ly1ndi/til_that_ring_was_on_shark_tank_and_walked_away/,{'gid_1': 2},ly1ndi,True,False,False,False,True,False,False,...,False,False,todayilearned,t5_2qqjc,24986494,public,https://b.thumbs.redditmedia.com/8j3D96OGw1mZOKe1Z7LfkTlaRPuoC7Nc-SCBjMobgAA.jpg,105.0,140.0,TIL that Ring was on Shark Tank and walked away without a deal. Ring later sold to Amazon for $1 billion.,11,[],0.88,https://www.youtube.com/watch?v=zcEDiOwv6Ro,https://www.youtube.com/watch?v=zcEDiOwv6Ro,all_ads,6,1.614925e+09,,,,,,,
1,"[{'award_sub_type': 'APPRECIATION', 'award_type': 'global', 'awardings_required_to_grant_benefits': None, 'coin_price': 250, 'coin_reward': 100, 'count': 1, 'days_of_drip_extension': 0, 'days_of_premium': 0, 'description': 'The more you know... Gives %{coin_symbol}100 Coins to both the author and the community.', 'end_date': None, 'giver_coin_reward': None, 'icon_format': None, 'icon_height': ...",True,Rod_Me,,[],,text,t2_7du5z3fo,False,False,[],False,False,1614889519,en.wikipedia.org,https://www.reddit.com/r/todayilearned/comments/lxuavz/til_isaac_asimov_wrote_500_books_his_first_novel/,{'gid_1': 1},lxuavz,False,False,False,False,False,False,False,...,False,False,todayilearned,t5_2qqjc,24985209,public,default,140.0,140.0,"TIL Isaac Asimov wrote 500 books. His first novel was published in 1950, and he died in 1992. This means he wrote aprox 1 book per month for 42 straight years.",7,[],0.97,https://en.wikipedia.org/wiki/Isaac_Asimov,https://en.wikipedia.org/wiki/Isaac_Asimov,all_ads,6,1.614904e+09,b8ca7634-7786-11e6-9d21-0e18b019564b,"R2, R5",moderator,,,,
2,"[{'award_sub_type': 'GLOBAL', 'award_type': 'global', 'awardings_required_to_grant_benefits': None, 'coin_price': 150, 'coin_reward': 0, 'count': 25, 'days_of_drip_extension': 0, 'days_of_premium': 0, 'description': 'Thank you stranger. Shows the award.', 'end_date': None, 'giver_coin_reward': None, 'icon_format': None, 'icon_height': 2048, 'icon_url': 'https://i.redd.it/award_images/t5_22cerq...",True,Hambgex,,[],,text,t2_3tyrtztv,False,False,[],False,False,1614874741,en.wikipedia.org,https://www.reddit.com/r/todayilearned/comments/lxocip/til_that_at_an_allied_checkpoint_during_the/,{'gid_1': 18},lxocip,True,False,False,False,True,False,False,...,False,False,todayilearned,t5_2qqjc,24984092,public,https://b.thumbs.redditmedia.com/u60BEH8Wlu_TRzVkEvk6TA3WN3sMGMRzlZR6M63TSAM.jpg,100.0,140.0,"TIL that at an Allied checkpoint during the Battle of the Bulge, US General Omar Bradley was detained as a possible spy when he correctly identified Springfield as the capital of Illinois. The American military police officer who questioned him mistakenly believed the capital was Chicago",72,[],0.93,https://en.wikipedia.org/wiki/Battle_of_the_Bulge#Operation_Greif_and_Operation_W%C3%A4hrung,https://en.wikipedia.org/wiki/Battle_of_the_Bulge#Operation_Greif_and_Operation_W%C3%A4hrung,all_ads,6,1.614889e+09,,,,,,,
3,"[{'award_sub_type': 'GLOBAL', 'award_type': 'global', 'awardings_required_to_grant_benefits': None, 'coin_price': 150, 'coin_reward': 0, 'count': 4, 'days_of_drip_extension': 0, 'days_of_premium': 0, 'description': 'Thank you stranger. Shows the award.', 'end_date': None, 'giver_coin_reward': None, 'icon_format': None, 'icon_height': 2048, 'icon_url': 'https://i.redd.it/award_images/t5_22cerq/...",True,NewAccountEachYear,,[],,text,t2_6ick3b75,False,False,[],False,False,1614837213,en.wikipedia.org,https://www.reddit.com/r/todayilearned/comments/lxdwe1/til_the_first_country_to_recognize_greek/,{'gid_1': 4},lxdwe1,True,False,False,False,True,False,False,...,False,False,todayilearned,t5_2qqjc,24981120,public,https://b.thumbs.redditmedia.com/dh_ot3EVBz7dBUzbxNdb6YzAHZW8YJgpFjULPkqMRvA.jpg,140.0,140.0,"TIL the first country to recognize Greek independence was not any of the western powers, but Haiti, who alledgedly sent 25ton of Coffee beans to finance their rebellion",12,[],0.97,https://en.wikipedia.org/wiki/Jean-Pierre_Boyer#Greek_War_of_Independence,https://en.wikipedia.org/wiki/Jean-Pierre_Boyer#Greek_War_of_Independence,all_ads,6,1.614852e+09,,,,,,,
4,"[{'award_sub_type': 'PREMIUM', 'award_type': 'global', 'awardings_required_to_grant_benefits': None, 'coin_price': 30, 'coin_reward': 0, 'count': 1, 'days_of_drip_extension': 0, 'days_of_premium': 0, 'description': 'A glowing commendation for all to see', 'end_date': None, 'giver_coin_reward': None, 'icon_format': 'APNG', 'icon_height': 2048, 'icon_url': 'https://www.redditstatic.com/gold/awar...",True,breck,,[],,text,t2_30zub,False,True,[],False,False,1614833870,en.wikipedia.org,https://www.reddit.com/r/todayilearned/comments/lxd1z3/til_that_100s_of_years_ago_scribes_drew_colorful/,{'gid_1': 4},lxd1z3,False,False,False,False,False,False,False,...,False,False,todayilearned,t5_2qqjc,24980860,public,default,80.0,140.0,"TIL that 100's of years ago scribes drew colorful ¶ (""pilcrows"") to mark new paragraphs. They'd leave blank spaces for the ¶'s then switch inks and add them. They would often miss some so readers evolved to interpret a blank space (indent) as the start of a new paragraph. Voila.",13,[],0.97,https://en.wikipedia.org/wiki/Pilcrow,https://en.wikipedia.org/wiki/Pilcrow,all_ads,6,1.614848e+09,8c746876-aee8-11e1-949b-12313d28619b,(R.2) Editorializing,moderator,removed r3,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,"[{'award_sub_type': 'GLOBAL', 'award_type': 'global', 'awardings_required_to_grant_benefits': None, 'coin_price': 150, 'coin_reward': 0, 'count': 8, 'days_of_drip_extension': 0, 'days_of_premium': 0, 'description': 'Thank you stranger. Shows the award.', 'end_date': None, 'giver_coin_reward': None, 'icon_format': None, 'icon_height': 2048, 'icon_url': 'https://i.redd.it/award_images/t5_22cerq/...",True,bhaggith,,[],,text,t2_8v37f,False,False,[],False,False,1612100751,sciencenewsforstudents.org,https://www.reddit.com/r/todayilearned/comments/l9ckgq/til_that_babies_born_with_extra_digits_are_not/,{'gid_1': 3},l9ckgq,False,False,False,False,False,False,False,...,False,False,todayilearned,t5_2qqjc,24759320,public,default,74.0,140.0,"TIL That babies born with extra digits are not that rare, and those with 6 fingers on one hand are able to play a complicated video game with a single hand. What’s more, their brains have no trouble controlling the more complex movements of their extra digits.",15,[],0.91,https://www.sciencenewsforstudents.org/article/sixth-finger-can-prove-extra-handy,https://www.sciencenewsforstudents.org/article/sixth-finger-can-prove-extra-handy,all_ads,6,1.612115e+09,db5e66ec-bccd-11e1-b9fd-12313d051e91,(R.5) Omits Essential Info,moderator,removed r5,,,
96,"[{'award_sub_type': 'GLOBAL', 'award_type': 'global', 'awardings_required_to_grant_benefits': None, 'coin_price': 150, 'coin_reward': 0, 'count': 11, 'days_of_drip_extension': 0, 'days_of_premium': 0, 'description': 'Thank you stranger. Shows the award.', 'end_date': None, 'giver_coin_reward': None, 'icon_format': None, 'icon_height': 2048, 'icon_url': 'https://i.redd.it/award_images/t5_22cerq...",True,Bulgogi_Pupusas,,[],,text,t2_a13qdjpe,False,False,[],False,False,1612097521,en.wikipedia.org,https://www.reddit.com/r/todayilearned/comments/l9bpy2/til_that_the_common_method_for_a_spacecraft_to/,{'gid_1': 7},l9bpy2,True,False,False,False,True,False,False,...,False,False,todayilearned,t5_2qqjc,24759085,public,https://b.thumbs.redditmedia.com/QBnvM7tdkN4dYTpcaupA8zdLGnNWSBPok78Uh71-Ifs.jpg,140.0,140.0,"TIL that the common method for a spacecraft to shift between two orbits is called a Hohmann Transfer, and that the guy who calculated it (in 1925) was inspired by a science fiction book written in 1897, which gave a generally correct explanation of the concept of orbit trajectory",27,[],0.95,https://en.wikipedia.org/wiki/Hohmann_transfer_orbit,https://en.wikipedia.org/wiki/Hohmann_transfer_orbit,all_ads,6,1.612112e+09,,,,,,,
97,"[{'award_sub_type': 'GROUP', 'award_type': 'global', 'awardings_required_to_grant_benefits': 5, 'coin_price': 75, 'coin_reward': 100, 'count': 1, 'days_of_drip_extension': 0, 'days_of_premium': 7, 'description': 'All aboard! Every five Party Train Awards gives the author 100 Reddit Coins and a week of r/lounge access and ad-free browsing. Rack up the awards and watch the train level-up!', 'end...",True,ScreenExtension,,[],,text,t2_8tkyk48v,False,False,[],False,False,1608340113,alearned.com,https://www.reddit.com/r/todayilearned/comments/kfy30w/til_12_counties_in_africa_are_building_a_9_mile/,{'gid_1': 2},kfy30w,False,False,False,False,False,False,False,...,False,False,todayilearned,t5_2qqjc,24392378,public,https://b.thumbs.redditmedia.com/SJiyfSoO49dl0zKlgW7ujBFMDrYqlVN1cSvgDzjUH2A.jpg,111.0,140.0,TIL 12 counties in Africa are building a 9 mile wide tree wall all the way across the continent to prevent the Sahara from advancing,32,[],0.96,http://www.alearned.com/green-wall-of-africa/,http://www.alearned.com/green-wall-of-africa/,all_ads,6,1.608355e+09,,,moderator,,,,
98,"[{'award_sub_type': 'GLOBAL', 'award_type': 'global', 'awardings_required_to_grant_benefits': None, 'coin_price': 300, 'coin_reward': 0, 'count': 1, 'days_of_drip_extension': 0, 'days_of_premium': 0, 'description': 'When an upvote just isn't enough, smash the Rocket Like.', 'end_date': None, 'giver_coin_reward': None, 'icon_format': None, 'icon_height': 2048, 'icon_url': 'https://i.redd.it/awa...",True,duevigilance,,[],,text,t2_defeemc,False,False,[],False,False,1608330864,theguardian.com,https://www.reddit.com/r/todayilearned/comments/kfvcl3/til_that_legendary_hollywood_director_martin/,{'gid_1': 2},kfvcl3,True,False,False,False,True,False,False,...,False,False,todayilearned,t5_2qqjc,24391920,public,https://a.thumbs.redditmedia.com/s8eU4TyzSw2MnU8QK3ObGCbmXlPa25o7Z76_EY4Psr0.jpg,73.0,140.0,"TIL that legendary Hollywood director Martin Scorsese, best known for his violent gangster films, has used the same female editor, Thelma Schoonmaker, on every movie he's made since Raging Bull in 1980.",8,[],0.95,https://www.theguardian.com/film/2019/feb/10/thelma-schoonmaker-martin-scorsese-raging-bull-goodfellas-bafta-fellowship-2019,https://www.theguardian.com/film/2019/feb/10/thelma-schoonmaker-martin-scorsese-raging-bull-goodfellas-bafta-fellowship-2019,all_ads,6,1.608345e+09,,,,,,,


In [60]:
reddit_submissions = pd.DataFrame(all_submissions)

We could do all of the above in a single line of code by building the list with a list comprehension directly inside `pd.DataFrame(...)`.
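A minimal sketch of that one-liner follows. So that it runs without a network connection, it replaces the real API request generator with hypothetical stand-in objects (`fake_generator`, built from `SimpleNamespace`) that only mimic the `.d_` attribute; the titles and scores are made up for illustration.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical stand-ins for PSAW results: each object keeps its data in `.d_`
fake_generator = (
    SimpleNamespace(d_={"title": title, "score": score})
    for title, score in [("First post", 12000), ("Second post", 15000)]
)

# The loop, the append, and the DataFrame creation collapse into one expression
reddit_submissions = pd.DataFrame([submission.d_ for submission in fake_generator])

print(reddit_submissions.shape)
```

With a real search, `fake_generator` would be replaced by a freshly created `api_request_generator`.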

## Examine Data

Check what columns/metadata exist in this data:

In [61]:
reddit_submissions.columns

Index(['all_awardings', 'allow_live_comments', 'author',
       'author_flair_css_class', 'author_flair_richtext', 'author_flair_text',
       'author_flair_type', 'author_fullname', 'author_patreon_flair',
       'author_premium', 'awarders', 'can_mod_post', 'contest_mode',
       'created_utc', 'domain', 'full_link', 'gildings', 'id',
       'is_crosspostable', 'is_meta', 'is_original_content',
       'is_reddit_media_domain', 'is_robot_indexable', 'is_self', 'is_video',
       'link_flair_background_color', 'link_flair_richtext',
       'link_flair_text_color', 'link_flair_type', 'locked', 'media',
       'media_embed', 'media_only', 'no_follow', 'num_comments',
       'num_crossposts', 'over_18', 'parent_whitelist_status', 'permalink',
       'pinned', 'post_hint', 'preview', 'pwls', 'retrieved_on', 'score',
       'secure_media', 'secure_media_embed', 'selftext', 'send_replies',
       'spoiler', 'stickied', 'subreddit', 'subreddit_id',
       'subreddit_subscribers', 'subreddit_t

In [62]:
reddit_submissions[['title', 'score']].sample(5)

Unnamed: 0,title,score
57,TIL Lithuania withdrew from the 1992 Olympics due to the lack of money after the fall of the USSR. The Grateful Dead agreed to fund transportation costs for the basketball team along with Grateful Dead designs for the team's jerseys and shorts. They went on to win the Bronze.,37337
77,"TIL in 2013 a Canadian bank robber obsessed with Taylor Swift stole a Cessna 172 from a flight school, crossed the US border and flew to Nashville undetected. The plane crashed at Nashville International Airport, killing him instantly. No one noticed the burning wreck for five hours.",10147
27,"TIL At the height of the Japanese Real Estate bubble in 1989, Tokyo, on paper, was worth twice the value of the whole United States",11642
21,TIL Kevin Smith’s Dogma is unavailable to stream or purchase digitally and is out of print on home media.,43954
18,"TIL Marilyn Monroe’s signature breathy speaking voice was actually a tactic the actress used to overcome a childhood stutter. A speech therapist reportedly trained her to adopt the throaty style, and it ended up becoming one of her standout traits as an actress and singer.",51196


Convert the `created_utc` column, which stores each post's creation time as a Unix timestamp in seconds, to a readable date:

In [63]:
reddit_submissions['date'] = pd.to_datetime(reddit_submissions['created_utc'], utc=True, unit='s')
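To see what the conversion above does to a single value, here is a small check that uses only Python's standard library rather than pandas. The timestamp is the `created_utc` value from the first row of the TodayILearned results shown earlier.

```python
from datetime import datetime, timezone

# `created_utc` value copied from the first row of the TodayILearned results
created_utc = 1614910685

# Interpret the number as seconds elapsed since 1970-01-01 00:00:00 UTC
posted_at = datetime.fromtimestamp(created_utc, tz=timezone.utc)

print(posted_at.isoformat())  # 2021-03-05T02:18:05+00:00
```

The `unit='s'` and `utc=True` arguments in the pandas call play the same role as the seconds-based timestamp and `tz=timezone.utc` here.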

Select columns of interest

In [65]:
reddit_submissions = reddit_submissions[['score', 'title', 'author', 'selftext',
                  'url', 'subreddit',  'num_comments',
                  'num_crossposts']]
reddit_submissions

Unnamed: 0,score,title,author,selftext,url,subreddit,num_comments,num_crossposts
0,56030,TIL that Ring was on Shark Tank and walked away without a deal. Ring later sold to Amazon for $1 billion.,HBombBrohan,,https://www.youtube.com/watch?v=zcEDiOwv6Ro,todayilearned,2139,7
1,20440,"TIL Isaac Asimov wrote 500 books. His first novel was published in 1950, and he died in 1992. This means he wrote aprox 1 book per month for 42 straight years.",Rod_Me,,https://en.wikipedia.org/wiki/Isaac_Asimov,todayilearned,1315,2
2,82402,"TIL that at an Allied checkpoint during the Battle of the Bulge, US General Omar Bradley was detained as a possible spy when he correctly identified Springfield as the capital of Illinois. The American military police officer who questioned him mistakenly believed the capital was Chicago",Hambgex,,https://en.wikipedia.org/wiki/Battle_of_the_Bulge#Operation_Greif_and_Operation_W%C3%A4hrung,todayilearned,5113,10
3,18793,"TIL the first country to recognize Greek independence was not any of the western powers, but Haiti, who alledgedly sent 25ton of Coffee beans to finance their rebellion",NewAccountEachYear,,https://en.wikipedia.org/wiki/Jean-Pierre_Boyer#Greek_War_of_Independence,todayilearned,727,2
4,14369,"TIL that 100's of years ago scribes drew colorful ¶ (""pilcrows"") to mark new paragraphs. They'd leave blank spaces for the ¶'s then switch inks and add them. They would often miss some so readers evolved to interpret a blank space (indent) as the start of a new paragraph. Voila.",breck,,https://en.wikipedia.org/wiki/Pilcrow,todayilearned,519,4
...,...,...,...,...,...,...,...,...
95,42563,"TIL That babies born with extra digits are not that rare, and those with 6 fingers on one hand are able to play a complicated video game with a single hand. What’s more, their brains have no trouble controlling the more complex movements of their extra digits.",bhaggith,,https://www.sciencenewsforstudents.org/article/sixth-finger-can-prove-extra-handy,todayilearned,2598,5
96,49707,"TIL that the common method for a spacecraft to shift between two orbits is called a Hohmann Transfer, and that the guy who calculated it (in 1925) was inspired by a science fiction book written in 1897, which gave a generally correct explanation of the concept of orbit trajectory",Bulgogi_Pupusas,,https://en.wikipedia.org/wiki/Hohmann_transfer_orbit,todayilearned,1797,5
97,21311,TIL 12 counties in Africa are building a 9 mile wide tree wall all the way across the continent to prevent the Sahara from advancing,ScreenExtension,,http://www.alearned.com/green-wall-of-africa/,todayilearned,956,2
98,12922,"TIL that legendary Hollywood director Martin Scorsese, best known for his violent gangster films, has used the same female editor, Thelma Schoonmaker, on every movie he's made since Raging Bull in 1980.",duevigilance,,https://www.theguardian.com/film/2019/feb/10/thelma-schoonmaker-martin-scorsese-raging-bull-goodfellas-bafta-fellowship-2019,todayilearned,1066,1


## Your Turn!

Sort the DataFrame to look at the 10 Reddit posts with the highest upvote score (note that the upvote score is stored in the column `score`):

In [67]:
reddit_submissions.sort_values(by='score', ascending=False)[:10]

Unnamed: 0,score,title,author,selftext,url,subreddit,num_comments,num_crossposts
64,129640,"TIL that William Whipple, one of the 56 signers of the Declaration of Independence, freed his slave after signing it because he believed one cannot simultaneously fight for freedom and hold another person in bondage.",NikMarus,,https://en.wikipedia.org/wiki/William_Whipple,todayilearned,2475,13
73,110110,"TIL Shia LaBeouf came under heavy fire for plagiarizing his directorial debut in 2012. When he publicly apologized to the original artist, Dan Clowes, people discovered that Shia's apology was itself plagiarized verbatim off a Yahoo Answers post from 2010.",IAmTheBraAndTheKet,,https://time.com/6094/shia-labeouf-plagiarism-scandal/,todayilearned,7226,12
5,108682,"TIL that the F.B.I. and C.I.A. recruit heavily from the Mormon population because they are usually cheaper to do a security clearance on, they often speak another language from their mission trips and they usually have a low risk lifestyle.",Intagvalley,,https://www.atlasobscura.com/articles/why-mormons-make-great-fbi-recruits,todayilearned,12340,21
46,95921,TIL GoldeneEye 007’s multiplayer mode was so last-minute that neither Rare nor Nintendo management knew about it. The first time executives saw anything was when programmers were playing it.,redmambo_no6,,https://www.engadget.com/2012-08-14-goldeneye-007s-multiplayer-was-added-last-minute-unknown-to-ra.html,todayilearned,7099,9
85,93337,"TIL Judge Judy earns $47 MILLION a year for taping 'Judge Judy.' In an interview she said that every 3 years she would present her salary request to the CBS TV Distribution President. Once, when he gave her a counter offer in an envelope, she refused to open it saying ""This isn't a negotiation.""",thejohnblog,,https://www.goodhousekeeping.com/life/entertainment/a31193785/judy-sheindlin-net-worth/,todayilearned,6605,3
47,93251,"TIL Joseph Bazalgette, the man who designed London's sewers in the 1860's, said 'Well, we're only going to do this once and there's always the unforeseen' and doubled the pipe diameter. If he had not done this, it would have overflowed in the 1960's (its still in use today).",james8475,,https://en.wikipedia.org/wiki/Joseph_Bazalgette,todayilearned,5950,10
39,89959,"TIL on the set of Blade: Trinity, Jessica Biel was supposed to fire an arrow directly at the camera, so the camera was surrounded by Plexiglass except for a 2"" x 2"" square in front of the camera lens. Biel managed to shoot the arrow through the hole and destroy the $300,000 camera.",Joe_Shroe,,https://youtu.be/VApoQKeCcVk,todayilearned,5502,13
76,87351,"TIL that graffiti artist Banksy sought to trademark his image of a protester throwing flowers. The trademark office denied it on the grounds of him having no interest in selling his work. In the ruling they used a quote from one of Banksy's books: ""copyright is for losers""",NewAccountEachYear,,https://www.bbc.com/news/entertainment-arts-54189113,todayilearned,3712,9
87,85163,"TIL in 2014, four tenants refused to move out of their homes when developers wanted to create one of the most exclusive residences in Manhattan. Eventually, they all received huge payouts. The last tenant was so savvy and stubborn he received $17 million, plus use of a $2 million residence for life.",WhileFalseRepeat,,https://www.independent.co.uk/news/world/americas/hermit-herbert-sukenik-paid-ps10-million-move-out-mould-covered-new-york-room-9165290.html,todayilearned,3574,8
2,82402,"TIL that at an Allied checkpoint during the Battle of the Bulge, US General Omar Bradley was detained as a possible spy when he correctly identified Springfield as the capital of Illinois. The American military police officer who questioned him mistakenly believed the capital was Chicago",Hambgex,,https://en.wikipedia.org/wiki/Battle_of_the_Bulge#Operation_Greif_and_Operation_W%C3%A4hrung,todayilearned,5113,10
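Sorting and then slicing is one way to get the top rows. As an alternative sketch, pandas also provides `DataFrame.nlargest`, which does both steps in one call. The small DataFrame below is rebuilt by hand from the scores and authors of the first five rows shown earlier, so it is an illustration rather than the full result set.

```python
import pandas as pd

# Scores and authors copied from the first five rows of the earlier results
sample = pd.DataFrame({
    "score": [56030, 20440, 82402, 18793, 14369],
    "author": ["HBombBrohan", "Rod_Me", "Hambgex", "NewAccountEachYear", "breck"],
})

# Equivalent in effect to sort_values(by='score', ascending=False)[:3]
top3 = sample.nlargest(3, "score")

print(top3)
```

Either approach keeps the original row labels, which is why the index in the sorted output above is not in order.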


Now choose your own subreddit to collect data from:

In [68]:
subreddit = 'wallstreetbets'

In [71]:
api_request_generator = api.search_submissions(subreddit=subreddit,
                                               score = ">3000", limit=100)

In [72]:
reddit_submissions = pd.DataFrame([submission.d_ for submission in api_request_generator])
reddit_submissions['date'] = pd.to_datetime(reddit_submissions['created_utc'], utc=True, unit='s')
reddit_submissions = reddit_submissions[['date','score', 'title', 'author', 'selftext',
                  'url', 'subreddit',  'num_comments',
                  'num_crossposts']]
reddit_submissions

Unnamed: 0,date,score,title,author,selftext,url,subreddit,num_comments,num_crossposts
0,2021-03-13 23:14:32+00:00,6068,A message to WSB from the Director of the Dian Fossey Gorilla Fund!,kampingcarl,,https://v.redd.it/l3stpkyrnvm61,wallstreetbets,6072,2
1,2021-03-04 23:46:03+00:00,38793,Taking my GAINS paying off my house I just built.. I’m laying off the market for awhile.. my head hurts,Unrealforreal112,,https://i.redd.it/388ngnvxl3l61.jpg,wallstreetbets,2262,1
2,2021-03-04 22:57:08+00:00,7760,UPDATE: $STONKS TALKING TO ME TODAY,Tiptoedbymyself,,https://i.redd.it/xl2idpd7d3l61.png,wallstreetbets,140,0
3,2021-03-04 22:41:03+00:00,3111,A tale of two screenshots,Responsible-Gain-721,,https://i.redd.it/popjblkca3l61.png,wallstreetbets,116,0
4,2021-03-04 22:23:58+00:00,3102,"We broke through the $130 gate today. It's been bouncing off that roof all week, but this afternoon's slight volume pickup gave the push needed to breakthrough. This is going to set up for a crazy day tomorrow.",about9_9andahalf,,https://i.redd.it/gnxp3nfs63l61.png,wallstreetbets,373,0
...,...,...,...,...,...,...,...,...,...
95,2021-02-27 04:40:16+00:00,4932,You’re god damn right we going 🚀💥🌙,SleepNowInTheFire666,,https://v.redd.it/7okll2mr8yj61,wallstreetbets,110,1
96,2021-02-27 03:42:57+00:00,15014,WSB gonna give it to ya,twosons21,,https://v.redd.it/ab5ayuz9yxj61,wallstreetbets,198,1
97,2021-02-27 03:14:15+00:00,6349,First Week of February,Penguin_0X,,https://v.redd.it/m092bnsktxj61,wallstreetbets,131,1
98,2021-02-27 03:01:51+00:00,7057,I shopped at GameStop today and it blew my mind.,nrouns,"I went into a GameStop with my girlfriend so she could buy her normal little Pokemon Knick knacks and t-shirts.... BUT.... My visit made me rethink that GameStop really is doing better than I thought.\n\nFirst of all, this location had about 15 customers in there not including us. Two people bought switches, and three bought memberships before it got to our turn in line.\n\nBut seriously, I'm ...",https://www.reddit.com/r/wallstreetbets/comments/ltep3k/i_shopped_at_gamestop_today_and_it_blew_my_mind/,wallstreetbets,821,0


Sort the DataFrame to look at the 10 Reddit posts with the highest upvote score:

In [73]:
reddit_submissions.sort_values(by='score', ascending=False)[:10]

Unnamed: 0,date,score,title,author,selftext,url,subreddit,num_comments,num_crossposts
30,2021-03-04 10:55:01+00:00,87938,Hard hitting investigative journalism &amp; pultizer prize winning stuff right here,MIA4real,,https://i.redd.it/rpn6hpydszk61.jpg,wallstreetbets,2448,3
82,2021-02-27 16:13:39+00:00,66192,I spent more time on this than I like to admit,nerooooooo,,https://v.redd.it/p9vjrz43o1k61,wallstreetbets,1289,9
94,2021-02-27 04:52:08+00:00,59050,Chappelle has our backs 🦍🚀💎🤲🏼,Sculpzilla,,https://v.redd.it/53q5jn8oayj61,wallstreetbets,637,21
42,2021-03-03 21:33:21+00:00,54087,Dont piss off the mods,Fool_Take_5,,https://v.redd.it/567vinvatvk61,wallstreetbets,2639,18
21,2021-03-04 15:29:07+00:00,53083,When WSB hates Robinhood yet still posts a bunch of position screenshots from RH:,RingoDingo92,,https://v.redd.it/bhwkvfxp41l61,wallstreetbets,2735,2
10,2021-03-04 19:36:34+00:00,52762,GME and AMC Holders Right Now,ShermanWert,,https://v.redd.it/48prwht5d2l61,wallstreetbets,2726,5
65,2021-03-03 03:25:00+00:00,48178,UPDATE: $GME in full Phineas mode,Tiptoedbymyself,,https://i.redd.it/gy7tlv01fqk61.jpg,wallstreetbets,1619,7
80,2021-02-27 18:23:22+00:00,43554,To the moon we go! 🚀🚀🚀🌙,RevolutionaryHold401,,https://v.redd.it/vxcbqvppb2k61,wallstreetbets,721,9
64,2021-03-03 03:41:45+00:00,41949,How to pick your next buy guys!,SamoBomb,,https://v.redd.it/j019ttxyhqk61,wallstreetbets,2080,14
88,2021-02-27 13:01:28+00:00,39196,Don't get Psych'd out! Hang in there...,AlastorAugustus,,https://v.redd.it/1xa1npl6o0k61,wallstreetbets,995,8


## Collect Reddit submissions based on search keyword

Now search through Reddit posts based on a query word.

In [78]:
query = 'Missy Elliott'

In [79]:
api_request_generator = api.search_submissions(q= query,
                                                score = ">2000", limit=100)

In [80]:
reddit_submissions = pd.DataFrame([submission.d_ for submission in api_request_generator])
reddit_submissions['date'] = pd.to_datetime(reddit_submissions['created_utc'], utc=True, unit='s')
reddit_submissions = reddit_submissions[['date','score', 'title', 'author', 'selftext',
                  'url', 'subreddit',  'num_comments',
                  'num_crossposts']]

Find all the subreddits where this query appears, and count how many matching posts come from each one (the subreddit name is stored in the column `subreddit`):

In [81]:
reddit_submissions['subreddit'].value_counts()

hiphopheads           7
Music                 2
The_Donald            2
savedyouaclick        1
BlackPeopleTwitter    1
reactiongifs          1
Showerthoughts        1
rupaulsdragrace       1
Name: subreddit, dtype: int64
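`value_counts()` reports how many posts come from each subreddit. If only the number of distinct subreddits is needed, `nunique()` answers that directly. The sketch below rebuilds a `subreddit` column from the counts printed above, so the data is a reconstruction for illustration rather than a fresh API result.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts_shown = {
    "hiphopheads": 7, "Music": 2, "The_Donald": 2, "savedyouaclick": 1,
    "BlackPeopleTwitter": 1, "reactiongifs": 1, "Showerthoughts": 1,
    "rupaulsdragrace": 1,
}

# Repeat each subreddit name once per post to rebuild the column
subreddits = pd.Series(
    [name for name, n in counts_shown.items() for _ in range(n)],
    name="subreddit",
)

posts_per_subreddit = subreddits.value_counts()  # posts in each subreddit
number_of_subreddits = subreddits.nunique()      # how many distinct subreddits

print(number_of_subreddits, len(subreddits))
```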

## Bonus (If You Finish Early or Want to Explore More)

### Collect Reddit *comments* based on search keyword

In [82]:
api_request_generator = api.search_comments(q='Missy Elliott',
                                            score = ">2000")
reddit_submissions = pd.DataFrame([submission.d_ for submission in api_request_generator])
reddit_submissions['date'] = pd.to_datetime(reddit_submissions['created_utc'], utc=True, unit='s')
reddit_submissions = reddit_submissions[['date','score', 'subreddit','body', 'author']]
reddit_submissions.head()

Unnamed: 0,date,score,subreddit,body,author
0,2018-12-30 17:17:40+00:00,9649,videos,"*cracks knuckles* it's my time to shine..\n\nJust to refresh your memories:\n\n- She publicized the private healthcare information of Method Man's wife, broadcasting that she had cancer and that Method Man was having an affair with his wife's physician. (which was a lie).\n\n- She's the one that spread the rumours that Tupac was raped in jail and had one testicle\n\n- She's proudly claimed th...",VulcanHobo
1,2018-03-19 03:40:30+00:00,4359,rage,"This is just the tip of the iceberg. Here's a list of shitty things Wendy Williams has done (compiled by /u/VulcanHobo):\n\n- She publicized the private healthcare information of Method Man's wife, broadcasting that she had cancer and that Method Man was having an affair with his wife's physician. (which was a lie).\n\n- She's the one that spread the rumours that Tupac was raped in jail and ha...",schoolboy43
2,2017-11-17 17:20:38+00:00,5581,videos,"credit to [/u/vulcanhobo] (https://www.reddit.com/r/videos/comments/7diwik/wendy_williams_says_that_terry_crews_coming/dpy8b31/)\n\nTold you guys in that last discussion on her (when she collapsed on halloween) that she deserves no respect and is a piece of shit person.\n\nJust to refresh your memories:\n\n- She publicized the private healthcare information of Method Man's wife, broadcasting t...",PhadedMonk
3,2017-11-17 07:54:52+00:00,14878,videos,"Told you guys in that last discussion on her (when she collapsed on halloween) that she deserves no respect and is quite the bitch.\n\nJust to refresh your memories:\n\n- She publicized the private healthcare information of Method Man's wife, broadcasting that she had cancer and that Method Man was having an affair with his wife's physician. (which was a lie).\n\n- She's the one that spread th...",VulcanHobo
4,2016-01-09 02:19:07+00:00,2202,gifs,"Who would have thought a background dancer for Janet Jackson, 'N Sync, Sean Combs, Toni Braxton, Celine Dion, Pink, Missy Elliott, Ricky Martin, and Christina Aguilera would have better moves than Channing.",HeyBayBeeUWanTSumFuk


### Collect Reddit submissions/comments based on multiple search keywords

To search for comments that contain either of two phrases (George Orwell OR J. R. R. Tolkien), wrap each phrase in parentheses and join them with the `|` (OR) operator.

In [83]:
api_request_generator = api.search_comments(q='(George Orwell)|(J. R. R. Tolkien)', limit=100)
reddit_submissions = pd.DataFrame([submission.d_ for submission in api_request_generator])
reddit_submissions['date'] = pd.to_datetime(reddit_submissions['created_utc'], utc=True, unit='s')
reddit_submissions = reddit_submissions[['date','score', 'subreddit','body', 'author']]
reddit_submissions.head()

Unnamed: 0,date,score,subreddit,body,author
0,2021-04-05 18:57:42+00:00,1,ukpolitics,"There is practically no difference between an Anarchist and an An-Com. Both want the same thing- a society with no hierarchies.\n\n\nSo should George Orwell, one of the most prominent anti-USSR writers, have been put in jail for supporting the Anarchists and fighting as part of an AnCom group in the Spanish Civil War?",Codimus123
1,2021-04-05 18:55:04+00:00,1,ukpolitics,"Yes or no - would you say that the British Government should have put George Orwell in prison for supporting Anarchists?\n\n\nThere is no definition of Communism beyond ‘wanting a classless stateless society’. Anybody who wants that is a Communist, whether they’re Marxist or not. So, should the creators of Star Trek also be criminalised?",Codimus123
2,2021-04-05 18:46:55+00:00,1,ukpolitics,"So do you want to criminalise George Orwell’s works as well, such as *Homage to Catalonia*, *1984* and *Animal Farm*?\n\n\nGiven that Orwell supported the Anarchists and fought as part of an An-Com group?",Codimus123
3,2021-04-05 18:34:48+00:00,1,exmormon,"You're acting like a punk ass college kid. All you're doing is repeating right wing propaganda that is easily dis-proven...\n\nYou live in a red state? So you agree with how they shame the poor, homeless, people who have to depend on welfare?\n\n""Cancel culture"" is literally free market capitalism.... \n\nHigh taxes pay for the system that makes your capitalism possible. Infrastructure require...",Kon385
4,2021-04-05 18:02:36+00:00,1,AskReddit,Anything by George Orwell. It's all stupid and a waste of time if you ask me,EvilChocolateCookie


To search for comments that contain both terms (Shakespeare AND Beyonce), wrap each term in parentheses and join them with the `&` (AND) operator.

In [84]:
api_request_generator = api.search_comments(q='(Shakespeare)&(Beyonce)')
reddit_submissions = pd.DataFrame([submission.d_ for submission in api_request_generator])
reddit_submissions['date'] = pd.to_datetime(reddit_submissions['created_utc'], utc=True, unit='s')
reddit_submissions = reddit_submissions[['date','score', 'subreddit','body', 'author']]
reddit_submissions.head()

Unnamed: 0,date,score,subreddit,body,author
0,2021-04-04 21:54:03+00:00,1,indieheads,"# #26: Megan Thee Stallion - Savage Remix (feat. Beyonce)\n\n---\n\n**Average:** 6.343 **// Total Points:** 1129.1 **// Controversy:** 2.083 **// [Listen here](https://youtu.be/meSqIO3VqqE)**\n\n---\n\n**Highest scores:**\n\n(10 x6) Bosphorus\_f\_e\_d, ExtraEater, kappyko, MorrisFae, OllyTrolly, thebigscratch\n\n(9.5 x1) Shenanagoats\n\n(9.4 x2) NRuxin12, SRTviper\n\n(9 x7) adamjm99, Cashew\_F...",Srtviper
1,2021-04-03 11:14:59+00:00,1,musictheory,"In the way that music has become much more accessible to learn and consume, I think the concept of fame and what makes an artist great has become stratified. Artists can now be great, wildly renowned, famous, or praised for much smaller micro-genres. You can be famous for being the best Jazz Guitarist or Saxophonist, or you can be a great film scorer, a great lyricist, or a great bedroom ambie...",FrogDojo
2,2020-09-08 04:03:08+00:00,1,brasilnoticias,"A antropóloga e historiadora Lilia Schwarcz, convidada do Roda Viva nesta segunda-feira (7), relembrou o episódio em que foi criticada nas redes sociais após a o artigo sobre o [filme ""Black Is King"" de Beyoncé publicado na **Folha**.](https://www1.folha.uol.com.br/ilustrada/2020/08/filme-de-beyonce-erra-ao-glamorizar-negritude-com-estampa-de-oncinha.shtml)\n\n Ela disse não ser uma vítima d...",williambotter
3,2020-08-23 12:01:56+00:00,1,brasilnoticias,"Dou a palavra aqui a dois afro-brasileiros jovens, de pele preta (evitam a palavra “negro”), sobre [um artigo recente nesta **Folha**](https://www1.folha.uol.com.br/ilustrada/2020/08/filme-de-beyonce-erra-ao-glamorizar-negritude-com-estampa-de-oncinha.shtml) (da [antropóloga branca Lilia Schwarcz](https://www1.folha.uol.com.br/ilustrissima/2020/08/o-cancelamento-da-antropologa-branca-e-a-paut...",williambotter
4,2020-08-13 13:09:41+00:00,1,stupidpol,"I'm getting really tired of the high/low cultural relativism that is absolutely rife on Reddit and in left spaces as a supposedly egalitarian position. \n\nBeyonce and Eminem aren't as good as Wagner and Bach, and *Game of Thrones* and *Harry Potter* aren't as good as Shakespeare and Naipaul. Fucking get over it.",124876720
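The two query strings above follow the same pattern: each phrase in parentheses, joined by `|` or `&`. A minimal sketch of building such strings from a list of phrases is shown here. `build_query` is a hypothetical helper, not part of PSAW, and it assumes the `q` parameter interprets `|` and `&` the way the cells above use them.

```python
def build_query(phrases, operator="|"):
    """Join search phrases into one query string, e.g. '(a)|(b)'.

    Hypothetical helper (not part of PSAW). It only builds the string;
    the `|` (OR) and `&` (AND) syntax is taken from the cells above.
    """
    if operator not in ("|", "&"):
        raise ValueError("operator must be '|' (OR) or '&' (AND)")
    # Wrap every phrase in parentheses, then join with the chosen operator
    return operator.join(f"({phrase})" for phrase in phrases)


or_query = build_query(["George Orwell", "J. R. R. Tolkien"])
and_query = build_query(["Shakespeare", "Beyonce"], operator="&")
print(or_query)
print(and_query)
```

The resulting string could then be passed as `q=or_query` or `q=and_query` to `api.search_comments`.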


### Collect Reddit submissions/comments with start and end dates

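To restrict a search to a date range, here January 1, 2020 to January 10, 2020, convert the start and end dates to epoch timestamps (in seconds) and pass them as `after` and `before`.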

In [85]:
import datetime as dt
# Epoch timestamps in seconds; naive datetimes are interpreted in the local time zone
start_epoch = int(dt.datetime(2020, 1, 1).timestamp())
end_epoch = int(dt.datetime(2020, 1, 10).timestamp())

api_request_generator = api.search_comments(q='(Shakespeare)&(Beyonce)', after=start_epoch, before=end_epoch)
reddit_submissions = pd.DataFrame([submission.d_ for submission in api_request_generator])
reddit_submissions['date'] = pd.to_datetime(reddit_submissions['created_utc'], utc=True, unit='s')
reddit_submissions = reddit_submissions[['date','score', 'subreddit','body', 'author']]
reddit_submissions.head()

Unnamed: 0,date,score,subreddit,body,author
0,2020-01-05 06:09:22+00:00,6,popheads,"It's not hard to have an intelligent discussion from my end. Maybe it's because you're unable to see Taylor's lyrics unless it's from behind rose-tinted glasses is the reason why something that seems so obvious to you doesn't seem obvious to me.\n\nTaylor is a talented songwriter. I never denied that. But my point is some, if not MOST, of her biggest songs are every bit as vapid, repetitive an...",bleedin_liberal
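
One detail in the cell above may matter for reproducibility: `dt.datetime(2020, 1, 1)` carries no time zone, so `.timestamp()` interprets it in the local time zone of the machine running the notebook, while `created_utc` is in UTC. The sketch below shows one possible time-zone-aware variant. `to_utc_epoch` is a hypothetical helper name, not part of PSAW.

```python
import datetime as dt


def to_utc_epoch(year, month, day):
    """Return the epoch timestamp (seconds) for midnight UTC on the given date."""
    aware = dt.datetime(year, month, day, tzinfo=dt.timezone.utc)
    return int(aware.timestamp())


start_epoch = to_utc_epoch(2020, 1, 1)
end_epoch = to_utc_epoch(2020, 1, 10)
print(start_epoch, end_epoch)

# Round trip: turn an epoch value back into a readable UTC datetime
print(dt.datetime.fromtimestamp(start_epoch, tz=dt.timezone.utc))
```

These values can be passed as `after=start_epoch` and `before=end_epoch` in the same way as in the cell above, and they stay the same regardless of where the notebook runs.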
