## Project 3: Web APIs and NLP

Contents:

- [Pushshift API Request Example](#request)
- [Chess Subreddit Data Collection](#chess_data)
- [Poker Subreddit Data Collection](#poker_data)
- [Combine DataFrames and save to CSV](#concat)

### Collecting posts via API

```python
# Imports
import requests
import pandas as pd
import datetime as dt
import time
```

<a class="anchor" id="request"></a>

### Example of one Pushshift API request

```python
# base URL of the Pushshift submission search endpoint
url = 'https://api.pushshift.io/reddit/search/submission'
```

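```python
# params for the chess subreddit
params = {
    'subreddit': 'chess',
    'size': 500,          # requested batch size; only 100 posts came back (see below)
    'after': 1515851746   # Unix timestamp: posts created after 2018-01-13 13:55:46 UTC
}
```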
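The `after` parameter here (and `before` in the loops further down) is a Unix timestamp in seconds, UTC. Below is a minimal sketch of converting between those integers and readable dates with the standard `datetime` module, which the notebook already imports as `dt`. `to_epoch` and `from_epoch` are hypothetical helper names and are not part of the original notebook.

```python
import datetime as dt


def to_epoch(year, month, day, hour=0, minute=0, second=0):
    """Return the Unix timestamp (seconds, UTC) for a calendar date and time."""
    moment = dt.datetime(year, month, day, hour, minute, second,
                         tzinfo=dt.timezone.utc)
    return int(moment.timestamp())


def from_epoch(seconds):
    """Return a timezone-aware UTC datetime for a Unix timestamp."""
    return dt.datetime.fromtimestamp(seconds, tz=dt.timezone.utc)


# the 'after' value used above
print(from_epoch(1515851746))              # 2018-01-13 13:55:46+00:00
print(to_epoch(2018, 1, 13, 13, 55, 46))   # 1515851746
```

Using an explicit `tz=dt.timezone.utc` avoids the local-time conversion that `fromtimestamp` applies by default.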

```python
# send a GET request; requests appends params to the base URL as a query string
res = requests.get(url, params=params)
```
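`requests.get` encodes the `params` dict into the query string of the URL. For simple values like these, the standard library's `urllib.parse.urlencode` produces an equivalent string. That makes it easy to inspect the request, or paste it into a browser. After a real call, requests exposes the final URL as `res.url`. The sketch below assumes the same `url` and `params` as above.

```python
from urllib.parse import urlencode

url = 'https://api.pushshift.io/reddit/search/submission'
params = {'subreddit': 'chess', 'size': 500, 'after': 1515851746}

# build the full request URL; dict order is preserved and ints become strings
full_url = f'{url}?{urlencode(params)}'
print(full_url)
# https://api.pushshift.io/reddit/search/submission?subreddit=chess&size=500&after=1515851746
```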

```python
# status code for the request (200 = OK)
res.status_code
```

200
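A status of 200 means the request succeeded. A rate-limited or overloaded API can instead answer 429 (Too Many Requests) or a 5xx server error. In that case, reading `res.json()['data']` in the loops below may fail partway through a collection run.

Below is a minimal retry sketch with exponential backoff. `get_with_retry` and `RETRY_STATUSES` are hypothetical names. The `get` argument is any `requests.get`-compatible callable, so in the notebook it would be called as `get_with_retry(requests.get, url, params)`.

```python
import time

# status codes treated as temporary (an assumption; adjust to the API's behavior)
RETRY_STATUSES = {429, 500, 502, 503, 504}


def get_with_retry(get, url, params, retries=3, backoff=2.0, sleep=time.sleep):
    """Call get(url, params=params), retrying on rate-limit or server errors.

    The wait doubles after each failed attempt: 2 s, 4 s, 8 s, ... with the
    default backoff. Returns the last response even if it still failed.
    """
    for attempt in range(retries + 1):
        res = get(url, params=params)
        if res.status_code not in RETRY_STATUSES:
            return res
        if attempt < retries:
            sleep(backoff * 2 ** attempt)
    return res  # still failing after all retries; the caller decides what to do
```

Injecting `get` and `sleep` keeps the helper testable without network access or real waiting.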

```python
# parse the JSON response body into a Python dict
data = res.json()
```

```python
# the submissions are a list of dicts under the 'data' key
posts = data['data']
```

```python
# number of posts returned: 100, even though size was set to 500
len(posts)
```

100

```python
# load posts into a DataFrame (one row per submission, one column per field)
df = pd.DataFrame(posts)
```

```python
# columns to inspect first
df[['subreddit', 'selftext', 'title']]
```

|     | subreddit | selftext | title |
|-----|-----------|----------|-------|
| 0 | chess | | One of my favorite chess puzzles of all time. ... |
| 1 | chess | In Pump Up Your Rating, Axel Smith says:\n\n&g... | Do you "say" moves in your mind when calculating? |
| 2 | chess | | Experience sheer luxury with the Sher-E-Punjab... |
| 3 | chess | [removed] | Kingscrusher forces subscribing to watch his T... |
| 4 | chess | | I have this thing where I love revealed mates. |
| ... | ... | ... | ... |
| 95 | chess | Chess.com has tactics and daily puzzles but re... | Short Chess puzzles for dementia patients |
| 96 | chess | | White to move (Tactic I missed) |
| 97 | chess | | Would be much appreciated if you could vote fo... |
| 98 | chess | | GM norm requirements are more complicated than... |
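`pd.DataFrame(posts)` works because `posts` is a list of dicts: each dict becomes a row and each key becomes a column. Submissions do not necessarily carry identical keys. A key missing from one post becomes `NaN` in that row rather than raising an error.

The small illustration below uses made-up posts. Every field value is hypothetical.

```python
import pandas as pd

# hypothetical submissions shaped like the items in data['data']
posts = [
    {'subreddit': 'chess', 'title': 'Post A', 'selftext': '',
     'created_utc': 1600000000},
    {'subreddit': 'chess', 'title': 'Post B', 'selftext': 'Body text',
     'created_utc': 1600000100, 'link_flair_text': 'Question'},
]

df = pd.DataFrame(posts)
# columns are the union of keys, in order of first appearance
print(df.columns.tolist())
# ['subreddit', 'title', 'selftext', 'created_utc', 'link_flair_text']
print(df['link_flair_text'].isna().tolist())  # [True, False]
```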


<a class="anchor" id="function"></a>

### Running multiple requests in a loop
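The chess and poker loops below differ only in the subreddit name, the starting timestamp, and the variable names. The shared logic can therefore be factored into one function. It pages backwards by passing the oldest `created_utc` seen so far as the next `before` value.

Below is a minimal sketch that assumes the same four Pushshift fields. `collect_posts` and its `fetch` argument are hypothetical names. `fetch(params)` stands in for `requests.get(url, params=params).json()['data']`, so the pagination logic can run without network access. The sketch requests `size=100`, the most the API returned per call in the pulls shown here. It also collects each batch in a list and concatenates once at the end, which avoids copying a growing DataFrame on every pull.

```python
import time

import pandas as pd

FIELDS = ['title', 'subreddit', 'selftext', 'created_utc']


def collect_posts(subreddit, before, pulls, fetch, pause=3.0):
    """Page backwards through a subreddit with at most `pulls` requests.

    fetch(params) must return a list of submission dicts (the API's 'data').
    Each request asks for posts created before the oldest one seen so far.
    """
    batches = []
    for pull in range(pulls):
        params = {'subreddit': subreddit, 'size': 100, 'before': before}
        data = fetch(params)
        if not data:  # nothing older left, or an empty response
            break
        batch = pd.DataFrame(data)[FIELDS]
        batches.append(batch)
        before = int(batch['created_utc'].min())  # next page starts here
        print(f'pull number {pull + 1} Complete')
        if pause:
            time.sleep(pause)  # be gentle with the API between requests
    if not batches:
        return pd.DataFrame(columns=FIELDS)
    return pd.concat(batches, ignore_index=True)
```

In the notebook this would be called as `collect_posts('chess', 1605448546, 10, fetch)`, with a `fetch` that wraps `requests.get` and returns `res.json()['data']`.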

<a class="anchor" id="chess_data"></a>

#### Chess Subreddit Data Collection

```python
# chess posts master: an empty DataFrame that accumulates every pull
posts_master = pd.DataFrame(columns=['title', 'subreddit', 'selftext', 'created_utc'])
posts_master
```

|     | title | subreddit | selftext | created_utc |
|-----|-------|-----------|----------|-------------|


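```python
# paginate backwards through r/chess: each pull asks for posts created
# before the oldest timestamp collected so far
utc = 1605448546  # 2020-11-15 13:55:46 UTC
for pull in range(10):
    query = 'https://api.pushshift.io/reddit/search/submission'
    params = {'subreddit': 'chess',
              'size': 500,  # the API returned at most 100 posts per request
              'before': utc}
    res = requests.get(query, params=params)
    res.raise_for_status()  # stop on HTTP errors instead of failing later on the body
    data = res.json()['data']
    # keep only the four fields of interest from each submission
    pull_dict = {
        'title': [],
        'subreddit': [],
        'selftext': [],
        'created_utc': [],
    }
    for i in data:
        pull_dict['title'].append(i['title'])
        pull_dict['subreddit'].append(i['subreddit'])
        pull_dict['selftext'].append(i['selftext'])
        pull_dict['created_utc'].append(i['created_utc'])
    temp_posts = pd.DataFrame(pull_dict)
    posts_master = pd.concat([posts_master, temp_posts])
    # the oldest post so far becomes the next 'before' cutoff; astype converts
    # the object dtype inherited from the empty starting frame
    utc = posts_master['created_utc'].astype('int64').min()
    time.sleep(3)  # pause between requests to go easy on the API
    print(f'pull number {pull + 1} Complete')
```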


```
pull number 1 Complete
pull number 2 Complete
pull number 3 Complete
pull number 4 Complete
pull number 5 Complete
pull number 6 Complete
pull number 7 Complete
pull number 8 Complete
pull number 9 Complete
pull number 10 Complete
```


```python
# display the accumulated chess posts
posts_master
```

|     | title | subreddit | selftext | created_utc |
|-----|-------|-----------|----------|-------------|
| 0 | Atalik-Sevgi, from Turkish Chess Championship ... | chess | | 1605447514 |
| 1 | Searching for a chess board | chess | [deleted] | 1605446338 |
| 2 | How did I possibly lose this game? | chess | [deleted] | 1605446041 |
| 3 | What could I have done better? | chess | I am a quite a new player, and I just got this... | 1605444239 |
| 4 | The Giuocco Piano for black | chess | Hello, I am looking for a book or video series... | 1605442798 |
| ... | ... | ... | ... | ... |
| 95 | Mate in 9. Look for the back and forth! | chess | | 1604792036 |
| 96 | why did he resign? | chess | &amp;#x200B;\n\n[Hey guys, I've just started p... | 1604791686 |
| 97 | Here's an interactive guide I just wrote for a... | chess | | 1604790408 |
| 98 | I chose one of my losses at random and an anal... | chess | Check out this #chess game: Matty5812 vs juanm... | 1604789841 |
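Several values in this output carry no usable text for NLP. Link and image posts have an empty `selftext`, and some posts show the `[deleted]` or `[removed]` placeholders.

The minimal cleaning sketch below assumes those placeholders should count as empty text and that title and body will be modeled together. The `text` column, `PLACEHOLDERS`, and `add_text_column` are hypothetical names. The sample rows reuse titles from the table above.

```python
import pandas as pd

PLACEHOLDERS = ['[deleted]', '[removed]']


def add_text_column(posts):
    """Return a copy of posts with a 'text' column: title plus usable selftext."""
    out = posts.copy()
    # treat missing bodies and removal placeholders as empty strings
    body = out['selftext'].fillna('').replace(PLACEHOLDERS, '')
    out['text'] = (out['title'].fillna('') + ' ' + body).str.strip()
    return out


sample = pd.DataFrame({
    'title': ['Searching for a chess board', 'What could I have done better?'],
    'selftext': ['[deleted]', 'I am a quite a new player'],
})
print(add_text_column(sample)['text'].tolist())
# ['Searching for a chess board', 'What could I have done better? I am a quite a new player']
```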


```python
# minimum value of the created_utc column (oldest chess post collected)
posts_master['created_utc'].min()
```

1604789841
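`created_utc` holds Unix timestamps, so converting the first `before` cutoff and the minimum above to dates shows the time window the ten chess pulls actually cover. The sketch below uses `pd.to_datetime(..., unit='s')`. Because the accumulated column has object dtype after the concat onto an empty frame, it is cast to `int64` first, as in the loop.

```python
import pandas as pd

# the first 'before' cutoff and the minimum printed above
start_utc, oldest_utc = 1605448546, 1604789841

# object dtype mimics posts_master['created_utc'] after the concat onto an empty frame
created = pd.Series([start_utc, oldest_utc], dtype=object)
stamps = pd.to_datetime(created.astype('int64'), unit='s')  # naive UTC timestamps

print(stamps.min())                 # 2020-11-07 22:57:21
print(stamps.max())                 # 2020-11-15 13:55:46
print(stamps.max() - stamps.min())  # 7 days 14:58:25
```

On the real data, the same conversion applies directly to `posts_master['created_utc']`. The poker pulls further down start at 1602770146 and reach 1601184366, a span of 18 days 08:29:40. Ten r/poker pulls therefore reach back more than twice as far as ten r/chess pulls.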

<a class="anchor" id="poker_data"></a>

#### Poker Subreddit Data Collection

```python
# poker posts master: an empty DataFrame that accumulates every pull
posts_master_poker = pd.DataFrame(columns=['title', 'subreddit', 'selftext', 'created_utc'])
posts_master_poker
```

|     | title | subreddit | selftext | created_utc |
|-----|-------|-----------|----------|-------------|


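```python
# paginate backwards through r/poker: each pull asks for posts created
# before the oldest timestamp collected so far
utc = 1602770146  # 2020-10-15 13:55:46 UTC
for pull in range(10):
    query = 'https://api.pushshift.io/reddit/search/submission'
    params = {'subreddit': 'poker',
              'size': 500,  # the API returned at most 100 posts per request
              'before': utc}
    res = requests.get(query, params=params)
    res.raise_for_status()  # stop on HTTP errors instead of failing later on the body
    data = res.json()['data']
    # keep only the four fields of interest from each submission
    pull_dict = {
        'title': [],
        'subreddit': [],
        'selftext': [],
        'created_utc': [],
    }
    for i in data:
        pull_dict['title'].append(i['title'])
        pull_dict['subreddit'].append(i['subreddit'])
        pull_dict['selftext'].append(i['selftext'])
        pull_dict['created_utc'].append(i['created_utc'])
    temp_posts_poker = pd.DataFrame(pull_dict)
    posts_master_poker = pd.concat([posts_master_poker, temp_posts_poker])
    # the oldest post so far becomes the next 'before' cutoff; astype converts
    # the object dtype inherited from the empty starting frame
    utc = posts_master_poker['created_utc'].astype('int64').min()
    time.sleep(3)  # pause between requests to go easy on the API
    print(f'pull number {pull + 1} Complete')
```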


```
pull number 1 Complete
pull number 2 Complete
pull number 3 Complete
pull number 4 Complete
pull number 5 Complete
pull number 6 Complete
pull number 7 Complete
pull number 8 Complete
pull number 9 Complete
pull number 10 Complete
```


```python
# display the accumulated poker posts
posts_master_poker
```

|     | title | subreddit | selftext | created_utc |
|-----|-------|-----------|----------|-------------|
| 0 | I completely butchered this hand. Villain had ... | poker | | 1602767797 |
| 1 | Poker in Poland | poker | Are there live poker rooms in Poland? I know o... | 1602767632 |
| 2 | What's the most profitable way to exchange tou... | poker | I have several thousand bucks tournament money... | 1602765918 |
| 3 | Hand reading vs weak opponent! | poker | | 1602764506 |
| 4 | When I'm trying to divide my range on the flop... | poker | Thank you:) | 1602764486 |
| ... | ... | ... | ... | ... |
| 95 | Help with chip denominations and blinds? | poker | Playing a friendly game between friends (4 of ... | 1601203176 |
| 96 | MP who reshoved had 100+BB and i had 70BB on t... | poker | | 1601197576 |
| 97 | What do you do in this situation | poker | I have AdQd in middle position (6 person cash ... | 1601192473 |
| 98 | HUNL against an opponent who always seems to h... | poker | Any tips for HUNL against an opponent who seem... | 1601191875 |


```python
# minimum value of the created_utc column (oldest poker post collected)
posts_master_poker['created_utc'].min()
```

1601184366

<a class="anchor" id="concat"></a>

### Combine Dataframes

```python
# concatenate the 'chess' and 'poker' subreddit DataFrames;
# ignore_index=True replaces the repeated per-pull labels with a clean 0..n-1 index
df = pd.concat([posts_master, posts_master_poker], ignore_index=True, sort=False)
```
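Both frames were built by concatenating ten pulls without `ignore_index`, so the row labels restart at 0 for every pull. Concatenating the two frames again keeps those duplicate labels unless `ignore_index=True` is passed.

The sketch below uses toy frames, with hypothetical rows standing in for `posts_master` and `posts_master_poker`. It shows the difference, then adds a guard against repeated rows and a class-balance check with `value_counts`.

```python
import pandas as pd

# hypothetical stand-ins for posts_master and posts_master_poker
chess = pd.DataFrame({'title': ['a', 'b'], 'subreddit': 'chess'})
poker = pd.DataFrame({'title': ['c', 'c'], 'subreddit': 'poker'})

kept = pd.concat([chess, poker])                      # labels repeat per frame
clean = pd.concat([chess, poker], ignore_index=True)  # labels renumbered
print(kept.index.tolist(), clean.index.tolist())      # [0, 1, 0, 1] [0, 1, 2, 3]

# drop exact duplicate rows, then count how many posts each class contributes
clean = clean.drop_duplicates(ignore_index=True)
print(clean['subreddit'].value_counts().to_dict())    # {'chess': 2, 'poker': 1}
```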

```python
# save df to a CSV file; index=False keeps the row labels out of the file
df.to_csv('data/combined_df.csv', index=False)
```
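By default, `to_csv` writes the index as an unnamed first column. When the file is read back with `pd.read_csv`, that column shows up as `Unnamed: 0`.

The sketch below uses an in-memory buffer (`io.StringIO`) in place of `data/combined_df.csv` and compares the default with `index=False`. Reading the file back with `index_col=0` is the other common way to handle a file that already contains the index.

```python
import io

import pandas as pd

df = pd.DataFrame({'title': ['a', 'b'], 'subreddit': ['chess', 'poker']})

with_index = io.StringIO()
df.to_csv(with_index)  # default: index written as the first column
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())     # ['Unnamed: 0', 'title', 'subreddit']

without_index = io.StringIO()
df.to_csv(without_index, index=False)  # only the data columns are written
without_index.seek(0)
print(pd.read_csv(without_index).columns.tolist())  # ['title', 'subreddit']
```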