# Getting data from an API
This notebook walks through collecting Reddit data with the Pushshift.io API.

We will use the **Python Pushshift.io API Wrapper (PSAW)**, documented at https://psaw.readthedocs.io/en/latest/

### Import packages
PSAW lets us search public Reddit submissions and comments; pandas holds the results as DataFrames.

In [21]:
from psaw import PushshiftAPI
import pandas as pd

api = PushshiftAPI()

https://api.pushshift.io/meta


### Get the 5 most recent posts in all of Reddit

In [22]:
posts = api.search_submissions(limit=5, filter=['full_link','author', 'title', 'subreddit', 'created_utc'])
results = list(posts)

https://api.pushshift.io/reddit/submission/search?limit=5&filter=full_link&filter=author&filter=title&filter=subreddit&filter=created_utc&metadata=true&sort=desc
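The URL printed under the cell is the request PSAW sent to Pushshift. As a sketch for debugging, or for calling the endpoint directly (for example with `requests`), the same query string can be rebuilt with the standard library. The parameter mapping is inferred from the logged URL, not taken from PSAW's source: each field in `filter` becomes its own `filter=` pair, and `metadata=true&sort=desc` appear to be added by the wrapper.

```python
from urllib.parse import urlencode

# Endpoint taken from the logged URL above.
BASE = "https://api.pushshift.io/reddit/submission/search"
FIELDS = ["full_link", "author", "title", "subreddit", "created_utc"]

# A list of pairs (not a dict) lets `filter` repeat once per field.
params = [
    ("limit", 5),
    *[("filter", field) for field in FIELDS],
    ("metadata", "true"),  # assumption: added by PSAW, as seen in its log
    ("sort", "desc"),      # assumption: added by PSAW, as seen in its log
]

url = f"{BASE}?{urlencode(params)}"
print(url)
```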


In [23]:
results[0]

submission(author='Tall-Video-2156', created_utc=1615977620, full_link='https://www.reddit.com/r/WidescreenWallpaper/comments/m6xkeo/can_anyone_help_me_find_this_wallpaper/', subreddit='WidescreenWallpaper', title='Can anyone help me find this wallpaper?', created=1615948820.0, d_={'author': 'Tall-Video-2156', 'created_utc': 1615977620, 'full_link': 'https://www.reddit.com/r/WidescreenWallpaper/comments/m6xkeo/can_anyone_help_me_find_this_wallpaper/', 'subreddit': 'WidescreenWallpaper', 'title': 'Can anyone help me find this wallpaper?', 'created': 1615948820.0})
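Each result carries both `created_utc` and `created`. `created_utc` is a Unix timestamp in seconds; in this output `created` is exactly 28,800 seconds (8 hours) earlier, which looks like a local-time adjustment made by PSAW on a machine running at UTC+8. That reading is an assumption, so this sketch relies only on `created_utc` and converts it explicitly.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the submission printed above.
created_utc = 1615977620
created = 1615948820.0

# created_utc is seconds since the Unix epoch, in UTC.
posted_at = datetime.fromtimestamp(created_utc, tz=timezone.utc)
print(posted_at)  # 2021-03-17 10:40:20+00:00

# Gap between the two fields, in hours.
offset_hours = (created_utc - created) / 3600
print(offset_hours)  # 8.0

# The same instant at a fixed UTC+8 offset (assumed to be Philippine time).
print(posted_at.astimezone(timezone(timedelta(hours=8))))  # 2021-03-17 18:40:20+08:00
```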

### Get the 5 most recent posts from r/philippines

In [24]:
posts = api.search_submissions(limit=5, subreddit="philippines", filter=['full_link','author', 'title', 'subreddit', 'created_utc'])
posts_df = pd.DataFrame([thing.d_ for thing in posts])

https://api.pushshift.io/reddit/submission/search?limit=5&subreddit=philippines&filter=full_link&filter=author&filter=title&filter=subreddit&filter=created_utc&metadata=true&sort=desc


In [25]:
posts_df

| | author | created_utc | full_link | subreddit | title | created |
|---|---|---|---|---|---|---|
| 0 | HiromiSai | 1615977610 | https://www.reddit.com/r/Philippines/comments/... | Philippines | That one outlier | 1615949000.0 |
| 1 | tiredlemonade | 1615977298 | https://www.reddit.com/r/Philippines/comments/... | Philippines | RDO and BIR 1700 | 1615948000.0 |
| 2 | FooTVph | 1615976790 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Rico Blanco and Maris, March 17, 2021 | 1615948000.0 |
| 3 | pleaseris | 1615976650 | https://www.reddit.com/r/Philippines/comments/... | Philippines | seaweeds!!!! whenever I eat at this unli resto... | 1615948000.0 |
| 4 | salmonisafish420 | 1615975923 | https://www.reddit.com/r/Philippines/comments/... | Philippines | If you we're the president of the Philippines,... | 1615947000.0 |


In [26]:
posts_df.loc[0, 'full_link']

'https://www.reddit.com/r/Philippines/comments/m6xkbo/that_one_outlier/'

### Get the 10 most recent r/philippines posts before March 11, 2021 (UTC)

In [27]:
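sub = "philippines"
start = "2021-03-11"

# pd.to_datetime gives a timezone-naive Timestamp; .timestamp() treats it as UTC,
# so this is midnight UTC on March 11 (1615420800).
start_epoch = int(pd.to_datetime(start).timestamp())

# `before` is a cutoff: this returns the newest posts created before start_epoch.
posts = api.search_submissions(limit=10,
                               subreddit=sub,
                               before=start_epoch,
                               filter=['full_link', 'author', 'title', 'subreddit', 'created_utc'])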
posts_df = pd.DataFrame([thing.d_ for thing in posts])

https://api.pushshift.io/reddit/submission/search?limit=10&subreddit=philippines&before=1615420800&filter=full_link&filter=author&filter=title&filter=subreddit&filter=created_utc&metadata=true&sort=desc
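The `before` value in the logged URL, 1615420800, is midnight UTC on 2021-03-11. That is because `pd.Timestamp.timestamp()` treats a timezone-naive timestamp as UTC, whereas a naive `datetime.datetime` would be interpreted in the machine's local time. A quick check that both routes agree once UTC is made explicit:

```python
from datetime import datetime, timezone

import pandas as pd

# Naive pandas Timestamp: .timestamp() assumes UTC.
naive_epoch = int(pd.to_datetime("2021-03-11").timestamp())

# Same instant with the UTC assumption written out explicitly.
explicit_epoch = int(datetime(2021, 3, 11, tzinfo=timezone.utc).timestamp())

print(naive_epoch, explicit_epoch)  # 1615420800 1615420800
```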


In [28]:
posts_df

| | author | created_utc | full_link | subreddit | title | created |
|---|---|---|---|---|---|---|
| 0 | Intelligent_Ear3155 | 1615420404 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Cuzette is a good jewelry brand, they offer go... | 1615392000.0 |
| 1 | ladyfromthedarkside | 1615419908 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Makati’s strict implementation of wearing of f... | 1615391000.0 |
| 2 | Logical_Ad_3556 | 1615419483 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Hong Kong Toymakers Are Philippines’ New Targe... | 1615391000.0 |
| 3 | setardo | 1615418893 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Early Morning Coconut Trees View - Siargao, Ph... | 1615390000.0 |
| 4 | CommunicationFar116 | 1615418058 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Filipino on Guam Musician | 1615389000.0 |
| 5 | Reach_Round | 1615417483 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Crypto to Peso ? | 1615389000.0 |
| 6 | VeterinarianDry7601 | 1615415742 | https://www.reddit.com/r/Philippines/comments/... | Philippines | https://app.shopback.com/pK2fNgYuweb | 1615387000.0 |
| 7 | luvie06 | 1615414525 | https://www.reddit.com/r/Philippines/comments/... | Philippines | PLS ANSWER I need this for my research :(( | 1615386000.0 |
| 8 | the_yaya | 1615413301 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Daily random discussion - Mar 11, 2021 | 1615385000.0 |
| 9 | threehappypenguins | 1615411232 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Mail Forwarding Service | 1615382000.0 |
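Because `before` is a cutoff rather than a date filter, these are the ten r/philippines posts created just before 00:00 UTC on March 11. Converting their `created_utc` values (copied from the table above) shows they span roughly 21:20 to 23:53 UTC on March 10, which is early morning of March 11 in Philippine time (UTC+8).

```python
import pandas as pd

BEFORE = 1615420800  # the `before` value in the logged URL above

# created_utc values copied from the ten rows in the table above.
created_utc = pd.Series([1615420404, 1615419908, 1615419483, 1615418893,
                         1615418058, 1615417483, 1615415742, 1615414525,
                         1615413301, 1615411232])

# Every post should predate the cutoff.
print((created_utc < BEFORE).all())  # True

# Interpret the epochs as UTC to see the time span the results cover.
times = pd.to_datetime(created_utc, unit="s", utc=True)
print(times.min())  # 2021-03-10 21:20:32+00:00
print(times.max())  # 2021-03-10 23:53:24+00:00
```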


In [29]:
import json

# Convert to a plain dict ("split" layout: columns, index, data), then write it indented.
result = posts_df.to_json(orient="split")
parsed = json.loads(result)
with open('march11.json', 'w') as json_file:
    json.dump(parsed, json_file, indent=4)
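The cell above round-trips through `json.loads` only to reformat the output. As an alternative sketch, pandas can write the "split" layout straight to disk and read it back with the same `orient`. The two-row frame here is a stand-in for `posts_df`, built from rows in the table above, and the file goes to a temporary folder so nothing real is overwritten.

```python
import json
import os
import tempfile

import pandas as pd

# Two rows copied from the r/philippines table above, standing in for posts_df.
df = pd.DataFrame({
    "author": ["Reach_Round", "the_yaya"],
    "created_utc": [1615417483, 1615413301],
    "title": ["Crypto to Peso ?", "Daily random discussion - Mar 11, 2021"],
})

# Temporary location so the example does not touch march11.json.
path = os.path.join(tempfile.mkdtemp(), "march11.json")

# pandas writes the "split" layout itself; `indent` requires pandas >= 1.0.
df.to_json(path, orient="split", indent=4)

# The file holds three keys: column names, index labels, and row data.
with open(path) as f:
    raw = json.load(f)
print(sorted(raw))  # ['columns', 'data', 'index']

# Reading back with the same orient rebuilds the DataFrame.
back = pd.read_json(path, orient="split")
print(back)
```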


### Get r/philippines posts before March 12, 2021 (UTC)

In [30]:
sub = "philippines"
start = "2021-03-12"

# Midnight UTC on March 12 (1615507200).
start_epoch = int(pd.to_datetime(start).timestamp())

posts = api.search_submissions(limit=20, 
                               subreddit=sub, 
                               before=start_epoch,
                               filter=['full_link','author', 'title', 'subreddit', 'created_utc'])
posts_df = pd.DataFrame([thing.d_ for thing in posts])

https://api.pushshift.io/reddit/submission/search?limit=20&subreddit=philippines&before=1615507200&filter=full_link&filter=author&filter=title&filter=subreddit&filter=created_utc&metadata=true&sort=desc


In [11]:
posts_df

| | author | created_utc | full_link | subreddit | title | created |
|---|---|---|---|---|---|---|
| 0 | HackdogDestroyer | 1615507085 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Para sa pangmabilisang pagkatuto ng Baybayin | 1615478000.0 |
| 1 | AngBatas | 1615507030 | https://www.reddit.com/r/Philippines/comments/... | Philippines | gulaman and rice kayo dyan | 1615478000.0 |
| 2 | decadentrebel | 1615506980 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Two years later, #trashtag no longer holds the... | 1615478000.0 |
| 3 | Sensitive_Cycle_679 | 1615505583 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Legal Advice | 1615477000.0 |
| 4 | uria046 | 1615505465 | https://www.reddit.com/r/Philippines/comments/... | Philippines | On this day March 12th 2021 marked the first a... | 1615477000.0 |
| 5 | Panthercm | 1615505391 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Philippines: 4 New People’s Army (NPA) rebels ... | 1615477000.0 |
| 6 | LecheKaFlan | 1615503678 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Anti-dam group asks court to stop construction... | 1615475000.0 |
| 7 | ConclusionNo9516 | 1615502417 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Please Support po! Panagbenga Festival Kit Unb... | 1615474000.0 |
| 8 | fake__username | 1615501945 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Excellent as per as Duterte govt standard | 1615473000.0 |
| 9 | the_yaya | 1615499701 | https://www.reddit.com/r/Philippines/comments/... | Philippines | Daily random discussion - Mar 12, 2021 | 1615471000.0 |
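The two cutoffs used so far, 1615420800 and 1615507200 (both in the logged URLs), are consecutive UTC midnights. To collect the posts *from* one day, rather than the N posts preceding midnight, one option is to pass both bounds: `after` appears alongside `before` in PSAW's documentation examples, though this sketch does not establish whether Pushshift treats either bound as inclusive. `day_window` is a hypothetical helper; its `utc_offset_hours` parameter picks which local day you mean (8 for Philippine time).

```python
from datetime import datetime, timedelta, timezone


def day_window(day: str, utc_offset_hours: int = 0) -> tuple[int, int]:
    """Return (after, before) epoch seconds bracketing one calendar day.

    Hypothetical helper: `utc_offset_hours` is the fixed UTC offset of the
    day you mean (0 for UTC, 8 for Philippine time).
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=tz)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


after, before = day_window("2021-03-11")
print(after, before)  # 1615420800 1615507200

# Hypothetical call with both bounds (not run here):
# posts = api.search_submissions(subreddit="philippines", after=after,
#                                before=before, filter=[...])
```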


In [31]:
# Convert to a plain dict ("split" layout), then write it indented.
result = posts_df.to_json(orient="split")
parsed = json.loads(result)
with open('march12.json', 'w') as json_file:
    json.dump(parsed, json_file, indent=4)

### Get the 5 most recent posts from r/anime

In [40]:
posts = api.search_submissions(limit=5, subreddit="anime", filter=['full_link', 'author', 'title', 'subreddit', 'created_utc'])
posts_df = pd.DataFrame([thing.d_ for thing in posts])

https://api.pushshift.io/reddit/submission/search?limit=5&subreddit=anime&filter=full_link&filter=author&filter=title&filter=subreddit&filter=created_utc&metadata=true&sort=desc


In [34]:
posts_df.head()

| | author | created_utc | full_link | subreddit | title | created |
|---|---|---|---|---|---|---|
| 0 | alex_honk1234 | 1615978093 | https://www.reddit.com/r/anime/comments/m6xoie... | anime | Get a life lmao | 1615949000.0 |
| 1 | Creative_Service_777 | 1615977907 | https://www.reddit.com/r/anime/comments/m6xmrm... | anime | Anime donut deaths(spoilers) | 1615949000.0 |
| 2 | kwassisking | 1615977336 | https://www.reddit.com/r/anime/comments/m6xi1b... | anime | Anime Rap - Yea or Nay | 1615949000.0 |
| 3 | AGN30 | 1615977335 | https://www.reddit.com/r/anime/comments/m6xi19... | anime | Outside me laugh, inside me hurt. | 1615949000.0 |
| 4 | quasarlel | 1615977205 | https://www.reddit.com/r/anime/comments/m6xgze... | anime | They Know What I Like I Guess 🤔 | 1615948000.0 |


### Get r/anime posts before March 11, 2021 (UTC)

In [41]:
sub = "anime"
start = "2021-03-11"

start_epoch = int(pd.to_datetime(start).timestamp())

posts = api.search_submissions(limit=10, 
                               subreddit=sub, 
                               before=start_epoch,
                               filter=['full_link','author', 'title', 'subreddit', 'created_utc'])
posts_dfa = pd.DataFrame([thing.d_ for thing in posts])

https://api.pushshift.io/reddit/submission/search?limit=10&subreddit=anime&before=1615420800&filter=full_link&filter=author&filter=title&filter=subreddit&filter=created_utc&metadata=true&sort=desc


In [42]:
# Convert to a plain dict ("split" layout), then write it indented.
result = posts_dfa.to_json(orient="split")
parsed = json.loads(result)
with open('march11_anime.json', 'w') as json_file:
    json.dump(parsed, json_file, indent=4)

### Get r/anime posts before March 12, 2021 (UTC)

In [43]:
sub = "anime"
start = "2021-03-12"

start_epoch = int(pd.to_datetime(start).timestamp())

posts = api.search_submissions(limit=10, 
                               subreddit=sub, 
                               before=start_epoch,
                               filter=['full_link','author', 'title', 'subreddit', 'created_utc'])
posts_dfa2 = pd.DataFrame([thing.d_ for thing in posts])

https://api.pushshift.io/reddit/submission/search?limit=10&subreddit=anime&before=1615507200&filter=full_link&filter=author&filter=title&filter=subreddit&filter=created_utc&metadata=true&sort=desc


In [44]:
# Convert to a plain dict ("split" layout), then write it indented.
result = posts_dfa2.to_json(orient="split")
parsed = json.loads(result)
with open('march12_anime.json', 'w') as json_file:
    json.dump(parsed, json_file, indent=4)
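The four fetch-and-save cells differ only in subreddit, date, limit, and file name. A hypothetical helper like `save_posts_before` could fold them into one function. It reuses the same PSAW call and the pandas "split" layout, but writes the file with `DataFrame.to_json` instead of the `json.loads`/`json.dump` round trip.

```python
import pandas as pd

FIELDS = ["full_link", "author", "title", "subreddit", "created_utc"]


def save_posts_before(api, subreddit, day, path, limit=10):
    """Fetch up to `limit` posts from `subreddit` created before `day`
    (midnight UTC) and save them to `path` in pandas' "split" JSON layout.

    Hypothetical helper: it only bundles the calls used in the cells above.
    """
    # Naive date string -> Timestamp; .timestamp() treats it as UTC midnight.
    before = int(pd.to_datetime(day).timestamp())
    posts = api.search_submissions(limit=limit, subreddit=subreddit,
                                   before=before, filter=FIELDS)
    df = pd.DataFrame([post.d_ for post in posts])
    df.to_json(path, orient="split", indent=4)  # `indent` needs pandas >= 1.0
    return df


# Usage with the real client (api = PushshiftAPI()), mirroring the cells above:
# for sub, day, path, limit in [("philippines", "2021-03-11", "march11.json", 10),
#                               ("philippines", "2021-03-12", "march12.json", 20),
#                               ("anime", "2021-03-11", "march11_anime.json", 10),
#                               ("anime", "2021-03-12", "march12_anime.json", 10)]:
#     save_posts_before(api, sub, day, path, limit=limit)
```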