# Getting data from an API
This notebook walks through collecting data from Reddit using the Pushshift.io API.

We will use the **Python Pushshift.io API Wrapper (PSAW)**, which is documented at https://psaw.readthedocs.io/en/latest/

Other arguments and search parameters that you can use are listed at https://github.com/pushshift/api
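
To make the idea of "search parameters" concrete, here is a minimal sketch of how keyword arguments can be turned into a query URL. The endpoint path and the `size` and `fields` parameter names are assumptions taken from the Pushshift README and may have changed; `build_search_url` is a hypothetical helper, not part of PSAW, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint, based on the Pushshift README; it may have changed.
BASE_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_search_url(**params):
    """Hypothetical helper: encode search parameters into a query URL."""
    cleaned = {}
    for key, value in params.items():
        if value is None:
            continue  # skip parameters that were not set
        if isinstance(value, (list, tuple)):
            value = ",".join(value)  # lists are sent as comma-separated text
        cleaned[key] = value
    return BASE_URL + "?" + urlencode(cleaned)


url = build_search_url(subreddit="philippines", size=5, fields=["author", "title"])
print(url)
```

In the rest of this notebook PSAW handles this step for us, so we only pass Python arguments.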

### Import package
This wrapper package lets you search public Reddit submissions and comments.

In [140]:
from psaw import PushshiftAPI
import pandas as pd
from datetime import datetime

api = PushshiftAPI()

### Get the 5 most recent posts in all of Reddit

Note that this pulls the 5 most recent posts across all of Reddit, regardless of subreddit.

In [143]:
posts = api.search_submissions(limit=5,
                               filter=['full_link','author', 'title', 'subreddit', 'created_utc'])
results = list(posts)

In [144]:
results

[submission(author='YoungManWitTheplan', created_utc=1649155377, full_link='https://www.reddit.com/r/InternetCity/comments/tws1e5/_/', subreddit='InternetCity', title='😭😭', created=1649126577.0, d_={'author': 'YoungManWitTheplan', 'created_utc': 1649155377, 'full_link': 'https://www.reddit.com/r/InternetCity/comments/tws1e5/_/', 'subreddit': 'InternetCity', 'title': '😭😭', 'created': 1649126577.0}),
 submission(author='MGUM44', created_utc=1649155377, full_link='https://www.reddit.com/r/placeAtlas2/comments/tws1e4/new_submission_aqua_cat_vtuber/', subreddit='placeAtlas2', title='New Submission Aqua Cat (Vtuber)', created=1649126577.0, d_={'author': 'MGUM44', 'created_utc': 1649155377, 'full_link': 'https://www.reddit.com/r/placeAtlas2/comments/tws1e4/new_submission_aqua_cat_vtuber/', 'subreddit': 'placeAtlas2', 'title': 'New Submission Aqua Cat (Vtuber)', 'created': 1649126577.0}),
 submission(author='Stillcouldbeworse', created_utc=1649155377, full_link='https://www.reddit.com/r/grantu
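
The printed results look like named tuples: each field requested in `filter` is available as an attribute, and the same values are collected in the `d_` dictionary. The sketch below uses a stand-in built with `collections.namedtuple` (an assumption about how the result behaves, not PSAW's own class) so it runs without calling the API. The values are copied from the output above.

```python
from collections import namedtuple

# Stand-in for one PSAW result; this is NOT the real PSAW class.
Submission = namedtuple(
    "submission", ["author", "subreddit", "title", "created_utc", "d_"]
)

fields = {
    "author": "MGUM44",
    "subreddit": "placeAtlas2",
    "title": "New Submission Aqua Cat (Vtuber)",
    "created_utc": 1649155377,
}
post = Submission(**fields, d_=fields)

# Attribute access and dictionary access return the same value.
print(post.author)
print(post.d_["title"])
```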

### Get the most recent post from r/philippines

In [145]:
posts = api.search_submissions(limit=5,
                               subreddit="philippines",
                               filter=['full_link','author', 'title', 'subreddit', 'created_utc'])

In [146]:
# `posts` can be iterated; each result stores the fields requested in `filter` as a dictionary in `.d_`.
# Build a list of those dictionaries and look at the first one.
[thing.d_ for thing in posts][0]

{'author': 'reiner26',
 'created_utc': 1649155540,
 'full_link': 'https://www.reddit.com/r/Philippines/comments/tws2w0/unity_sa_philippine_television_abscbn_x_gma/',
 'subreddit': 'Philippines',
 'title': 'unity sa philippine television: ABS-CBN x GMA',
 'created': 1649126740.0}
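
One thing to keep in mind: the search call returns a lazy iterator, so it can only be looped over once. That is why the search is repeated before building the DataFrame. The sketch below shows the same behaviour with a plain Python generator; `fake_search` is a hypothetical stand-in, not a PSAW function.

```python
def fake_search(limit):
    """Hypothetical stand-in for a search call: yields results lazily."""
    for i in range(limit):
        yield {"title": f"post {i}"}


posts = fake_search(3)
first_pass = [p["title"] for p in posts]
second_pass = [p["title"] for p in posts]  # the generator is already exhausted

print(first_pass)
print(second_pass)
```

If you need the results more than once, store them in a list first, as done earlier with `results = list(posts)`.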

In [147]:
# Here we convert the list of dictionaries into a DataFrame.
# The search is run again because the previous cell already consumed `posts`.
posts = api.search_submissions(limit=5, subreddit="philippines", filter=['full_link','author', 'title', 'subreddit', 'created_utc'])
posts_df = pd.DataFrame([thing.d_ for thing in posts])
posts_df

Unnamed: 0,author,created_utc,full_link,subreddit,title,created
0,reiner26,1649155540,https://www.reddit.com/r/Philippines/comments/...,Philippines,unity sa philippine television: ABS-CBN x GMA,1649127000.0
1,redblackshirt,1649155476,https://www.reddit.com/r/Philippines/comments/...,Philippines,Malapit na sa hukay balimbing pa rin?,1649127000.0
2,jrv993,1649155314,https://www.reddit.com/r/Philippines/comments/...,Philippines,Hopefully this will end Network Wars and Exclu...,1649127000.0
3,MainhaySiKarlo,1649155192,https://www.reddit.com/r/Philippines/comments/...,Philippines,I was playing Gangstar Vegas and I came across...,1649126000.0
4,Illustrious_Elk9405,1649155108,https://www.reddit.com/r/Philippines/comments/...,Philippines,Baler - Where to stay for 7 days? Need a local...,1649126000.0


In [148]:
# Find index, select column
posts_df.loc[0, 'full_link']

'https://www.reddit.com/r/Philippines/comments/tws2w0/unity_sa_philippine_television_abscbn_x_gma/'

In [150]:
# Select Column, then find index
posts_df['full_link'][0]

'https://www.reddit.com/r/Philippines/comments/tws2w0/unity_sa_philippine_television_abscbn_x_gma/'
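
Both forms above look up the row by its index label, which happens to be `0` here. A small sketch of why that matters, using a hand-made DataFrame with values copied from the table above: after filtering, the original labels are kept, so label `0` may no longer exist and position-based `.iloc` is the safer way to get "the first row".

```python
import pandas as pd

posts_df = pd.DataFrame({
    "author": ["reiner26", "redblackshirt", "jrv993"],
    "created_utc": [1649155540, 1649155476, 1649155314],
})

# Label-based lookup: row label 0, column "author"
first_by_label = posts_df.loc[0, "author"]

# Filtering keeps the original row labels (here 1 and 2)
older = posts_df[posts_df["created_utc"] < 1649155500]
labels_left = list(older.index)

# Position-based lookup: first row of the filtered frame
first_by_position = older.iloc[0]["author"]

print(first_by_label, labels_left, first_by_position)
```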

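### Get posts made after March 11, 2021 in r/philippines

Use the `after` parameter to restrict results to a specific time period. It only sets a lower bound, so the results below are still the most recent posts.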

In [157]:
sub="philippines"
start="2021-03-11"
# Convert the date string to a pandas Timestamp, then to integer epoch seconds
start_date=pd.to_datetime(start)
start_epoch=int(start_date.timestamp())


posts = api.search_submissions(limit=20, 
                               subreddit=sub, 
                               after=start_epoch,
                               filter=['full_link','author', 'title', 'subreddit', 'created_utc'])
posts_df = pd.DataFrame([thing.d_ for thing in posts])

In [158]:
start_date

Timestamp('2021-03-11 00:00:00')

In [159]:
start_epoch

1615420800
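
The value above corresponds to midnight UTC on the start date. As a cross-check, here is a minimal sketch of the same conversion using only the standard library, with the timezone stated explicitly so the result does not depend on the machine's settings.

```python
from datetime import datetime, timezone

start = "2021-03-11"

# Parse the date and mark it as UTC before converting to epoch seconds
start_date = datetime.strptime(start, "%Y-%m-%d").replace(tzinfo=timezone.utc)
start_epoch = int(start_date.timestamp())

# Convert back to confirm the round trip
round_trip = datetime.fromtimestamp(start_epoch, tz=timezone.utc)

print(start_epoch)
print(round_trip.isoformat())
```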

In [160]:
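# Note the two timestamp columns: created_utc (requested in `filter`) and created
# (not requested, but present in the results; in this output it is about 8 hours behind created_utc).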
posts_df.head()

Unnamed: 0,author,created_utc,full_link,subreddit,title,created
0,CharlieNyoMarupok,1649156178,https://www.reddit.com/r/Philippines/comments/...,Philippines,Pro-Isko group in Visayas says switching to Le...,1649127000.0
1,thecuteLizzieNoen,1649156076,https://www.reddit.com/r/Philippines/comments/...,Philippines,My brother is now a Manny Pacquiao supporter,1649127000.0
2,springheeledjack69,1649155897,https://www.reddit.com/r/Philippines/comments/...,Philippines,Anybody remember the Sef Gonzales case? Found ...,1649127000.0
3,nobalutpls1231,1649155811,https://www.reddit.com/r/Philippines/comments/...,Philippines,what is star magarine and is it healthy?,1649127000.0
4,OutrageousPiglet1007,1649155659,https://www.reddit.com/r/Philippines/comments/...,Philippines,Bukas na ang Paglaem sa Palawan! Tuloy lang ka...,1649127000.0


In [161]:
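# Convert `created` from epoch seconds to a readable string.
# Note: datetime.fromtimestamp() uses the machine's local timezone, and %I is a 12-hour clock (no AM/PM is shown).
posts_df['created'] = [datetime.fromtimestamp(i).strftime("%A, %B %d, %Y %I:%M:%S") for i in posts_df['created']]

# Convert `created_utc` the same way. After this step the column shows local time, not UTC, despite its name.
posts_df['created_utc'] = [datetime.fromtimestamp(i).strftime("%A, %B %d, %Y %I:%M:%S") for i in posts_df['created_utc']]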

In [162]:
posts_df

Unnamed: 0,author,created_utc,full_link,subreddit,title,created
0,CharlieNyoMarupok,"Tuesday, April 05, 2022 06:56:18",https://www.reddit.com/r/Philippines/comments/...,Philippines,Pro-Isko group in Visayas says switching to Le...,"Tuesday, April 05, 2022 10:56:18"
1,thecuteLizzieNoen,"Tuesday, April 05, 2022 06:54:36",https://www.reddit.com/r/Philippines/comments/...,Philippines,My brother is now a Manny Pacquiao supporter,"Tuesday, April 05, 2022 10:54:36"
2,springheeledjack69,"Tuesday, April 05, 2022 06:51:37",https://www.reddit.com/r/Philippines/comments/...,Philippines,Anybody remember the Sef Gonzales case? Found ...,"Tuesday, April 05, 2022 10:51:37"
3,nobalutpls1231,"Tuesday, April 05, 2022 06:50:11",https://www.reddit.com/r/Philippines/comments/...,Philippines,what is star magarine and is it healthy?,"Tuesday, April 05, 2022 10:50:11"
4,OutrageousPiglet1007,"Tuesday, April 05, 2022 06:47:39",https://www.reddit.com/r/Philippines/comments/...,Philippines,Bukas na ang Paglaem sa Palawan! Tuloy lang ka...,"Tuesday, April 05, 2022 10:47:39"
5,reiner26,"Tuesday, April 05, 2022 06:45:40",https://www.reddit.com/r/Philippines/comments/...,Philippines,unity sa philippine television: ABS-CBN x GMA,"Tuesday, April 05, 2022 10:45:40"
6,redblackshirt,"Tuesday, April 05, 2022 06:44:36",https://www.reddit.com/r/Philippines/comments/...,Philippines,Malapit na sa hukay balimbing pa rin?,"Tuesday, April 05, 2022 10:44:36"
7,jrv993,"Tuesday, April 05, 2022 06:41:54",https://www.reddit.com/r/Philippines/comments/...,Philippines,Hopefully this will end Network Wars and Exclu...,"Tuesday, April 05, 2022 10:41:54"
8,MainhaySiKarlo,"Tuesday, April 05, 2022 06:39:52",https://www.reddit.com/r/Philippines/comments/...,Philippines,I was playing Gangstar Vegas and I came across...,"Tuesday, April 05, 2022 10:39:52"
9,Illustrious_Elk9405,"Tuesday, April 05, 2022 06:38:28",https://www.reddit.com/r/Philippines/comments/...,Philippines,Baler - Where to stay for 7 days? Need a local...,"Tuesday, April 05, 2022 10:38:28"
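
The two columns above differ by 8 hours and use a 12-hour clock without AM/PM, which is easy to misread. A minimal sketch of a less ambiguous conversion, using the `created_utc` value of the first row: the timezone is passed explicitly and `%H` gives a 24-hour clock. The UTC+8 offset is an assumption inferred from the 8-hour difference between the two columns.

```python
from datetime import datetime, timezone, timedelta

created_utc = 1649156178  # first row of the table above

# Assumption: the notebook was run on a machine set to UTC+8
local_tz = timezone(timedelta(hours=8))

fmt = "%A, %B %d, %Y %H:%M:%S"  # %H = 24-hour clock
as_utc = datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime(fmt)
as_local = datetime.fromtimestamp(created_utc, tz=local_tz).strftime(fmt)

print("UTC:  ", as_utc)
print("UTC+8:", as_local)
```

Read this way, the `10:56:18` shown in the `created` column is the UTC time, and the `06:56:18` shown in `created_utc` is 6:56 PM local time.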


# Other Usable Parameters

You can combine `after`, `before`, `author`, and `q` (a search term) in a single query.

In [164]:
# Select subreddit
sub="philippines"

# Select start date and convert it to epoch seconds
start="2021-05-11"
start_date=pd.to_datetime(start)
start_epoch=int(start_date.timestamp())

# Select end date and convert it to epoch seconds
end="2022-04-05"
end_date=pd.to_datetime(end)
end_epoch=int(end_date.timestamp())

# Select author
author_search = 'Harakata'

# Optional search term; uncomment this line and the `q` argument below to use it
# search_term = 'League of Legends'

posts = api.search_submissions(limit=20,
                               subreddit=sub,
                               after=start_epoch,
                               before=end_epoch,
                               author=author_search,
                               # q=search_term,
                               filter=['full_link','author', 'title', 'subreddit', 'created_utc'])

# Converting to DataFrame
posts_df = pd.DataFrame([thing.d_ for thing in posts])


# Convert `created` to a readable string (local timezone, 12-hour clock without AM/PM)
posts_df['created'] = [datetime.fromtimestamp(i).strftime("%A, %B %d, %Y %I:%M:%S") for i in posts_df['created']]

# Convert `created_utc` the same way; the result is shown in local time, not UTC
posts_df['created_utc'] = [datetime.fromtimestamp(i).strftime("%A, %B %d, %Y %I:%M:%S") for i in posts_df['created_utc']]
posts_df

Unnamed: 0,author,created_utc,full_link,subreddit,title,created
0,Harakata,"Thursday, January 20, 2022 01:20:48",https://www.reddit.com/r/Philippines/comments/...,Philippines,"Sa mga nagwork sa fast-food chain, anong mga p...","Thursday, January 20, 2022 05:20:48"
1,Harakata,"Monday, July 12, 2021 09:54:12",https://www.reddit.com/r/Philippines/comments/...,Philippines,How to use online shopping apps properly and n...,"Monday, July 12, 2021 01:54:12"
2,Harakata,"Monday, July 12, 2021 09:43:56",https://www.reddit.com/r/Philippines/comments/...,Philippines,HOW TO SHOP ONLINE IN THE PHILIPPINES:,"Monday, July 12, 2021 01:43:56"
3,Harakata,"Wednesday, June 30, 2021 09:13:02",https://www.reddit.com/r/Philippines/comments/...,Philippines,What are good breakfasts and dinners to cook w...,"Wednesday, June 30, 2021 01:13:02"
4,Harakata,"Monday, June 28, 2021 06:54:17",https://www.reddit.com/r/Philippines/comments/...,Philippines,May computer subject po ba sa curriulum ng bac...,"Monday, June 28, 2021 10:54:17"
5,Harakata,"Friday, June 18, 2021 07:42:38",https://www.reddit.com/r/Philippines/comments/...,Philippines,Pano ba dapat ako makitungo sa mga taong ginag...,"Friday, June 18, 2021 11:42:38"
6,Harakata,"Saturday, June 12, 2021 06:51:39",https://www.reddit.com/r/Philippines/comments/...,Philippines,Sino na dito na food poison ng Daing na Bangus?,"Saturday, June 12, 2021 10:51:39"
7,Harakata,"Tuesday, May 25, 2021 07:27:38",https://www.reddit.com/r/Philippines/comments/...,Philippines,"What's your worst ""nawala ko yung resibo"" story?","Monday, May 24, 2021 11:27:38"
8,Harakata,"Sunday, May 23, 2021 10:24:30",https://www.reddit.com/r/Philippines/comments/...,Philippines,"Sa mga nadelay ang board exams, musta na?","Sunday, May 23, 2021 02:24:30"
9,Harakata,"Thursday, May 20, 2021 08:23:10",https://www.reddit.com/r/Philippines/comments/...,Philippines,Penge naman ng tips kung san dapat ako titingi...,"Thursday, May 20, 2021 12:23:10"
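
When several optional parameters are in play, it can help to collect them in one place and leave out the ones that are not set. The sketch below does this with two hypothetical helpers, `to_epoch` and `build_search_kwargs` (neither is part of PSAW). Dates are treated as midnight UTC, which is an assumption that matches the epoch values produced earlier in this notebook.

```python
from datetime import datetime, timezone


def to_epoch(date_string):
    """Convert a YYYY-MM-DD string to epoch seconds, treating it as UTC."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def build_search_kwargs(subreddit, start, end, author=None, q=None, limit=20):
    """Hypothetical helper: collect search arguments, skipping unset ones."""
    kwargs = {
        "subreddit": subreddit,
        "after": to_epoch(start),
        "before": to_epoch(end),
        "author": author,
        "q": q,
        "limit": limit,
    }
    return {key: value for key, value in kwargs.items() if value is not None}


kwargs = build_search_kwargs(
    "philippines", "2021-05-11", "2022-04-05", author="Harakata"
)
print(kwargs)

# With the `api` object from this notebook, the call would then be:
# posts = api.search_submissions(**kwargs)
```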


# Other APIs where you can pull data from:

- Twitter API
- Facebook API
- Yahoo Finance API (stock data)
  - I will prepare a sample Jupyter notebook for this so you can explore time series data as well.