<img src="../images/GA.png" style="width:600px; height:200px"/>

# Project 3 - Web APIs and NLP

# Background

We are a data science team at GG, a PC building company similar to Aftershock that specialises in constructing high-performance gaming terminals for its customers. According to customer feedback, the component that raises the most concerns and requests for assistance is the graphics card.

# Problem Statement

Management has asked the IT team to build a bot that directs customers with graphics card problems to the relevant subreddits in case they need help after office hours. The software engineers **requested our assistance** to **classify keywords** so that the bot can direct customers to the appropriate subreddit accurately.

# Summary

The title and selftext of posts from the following subreddits were scraped using the Pushshift API at https://api.pushshift.io:  
**AMD and NVIDIA**.

This is a classification problem and it is approached by using **Count Vectorization** and **TF-IDF Vectorization** with 2 classifier models.  
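
To make the difference between the two vectorizers concrete, here is a minimal pure-Python sketch. The three toy documents are hypothetical stand-ins for cleaned post titles, and the smoothed idf formula is one common variant, assumed here for illustration; the library implementation used later in the project may differ in details such as row normalisation.

```python
import math
from collections import Counter

# Toy documents: hypothetical stand-ins for cleaned post titles.
docs = [
    "rtx driver crash",
    "ryzen driver update",
    "rtx rtx price",
]

tokenized = [d.split() for d in docs]
vocab = sorted({tok for doc in tokenized for tok in doc})

# Count vectorization: one row per document, one column per vocabulary term.
count_matrix = [[Counter(doc)[term] for term in vocab] for doc in tokenized]

# Document frequency: number of documents that contain each term.
n_docs = len(tokenized)
doc_freq = {term: sum(term in doc for doc in tokenized) for term in vocab}

# Smoothed inverse document frequency (an assumed, commonly used variant).
idf = {term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1 for term in vocab}

# TF-IDF: raw counts reweighted by idf (no row normalisation in this sketch).
tfidf_matrix = [
    [row[j] * idf[term] for j, term in enumerate(vocab)]
    for row in count_matrix
]

print(vocab)
print(count_matrix)
print([[round(v, 3) for v in row] for row in tfidf_matrix])
```

Terms that appear in fewer documents (such as `crash`) get a larger idf weight than terms shared across documents (such as `driver`), which is the main behavioural difference from plain counts.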

Grid Search CV was used to find the optimal hyperparameters for each classifier model.  
The models chosen are:  
    1. **Logistic Regression**  
    2. **Multinomial Naive Bayes**  
    
The train and test scores and the area under the curve (AUC) of each model were used to gauge classifier performance.

The criteria for selecting the models are the following:

- **High Accuracy**
- **Minimize False Positives (High Specificity)**
- **Minimize False Negatives (High Sensitivity)**
- **Good Generalization**

High sensitivity is required as we want to direct customers to the correct subreddit and not the wrong ones.
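
As a quick reference for these criteria, the sketch below computes accuracy, sensitivity and specificity from a confusion matrix in plain Python. The labels are made-up illustration data, and treating class `1` as the positive class is an assumption, since the choice of positive subreddit is not fixed in this section.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity for a binary problem.

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity (recall): share of actual positives that were found.
        "sensitivity": tp / (tp + fn),
        # Specificity: share of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity drops as false negatives increase, while specificity drops as false positives increase.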

All models provided similar accuracy (0.9) and similar sensitivity (0.9).

In summary, **Multinomial Naive Bayes** with **Count Vectorization** was chosen as the classifier. With the other criteria being equal, it generalized best, showing the smallest gap between its train score (0.90) and its test score (0.89).


# Background Research





#### 1. Reddit Score is upvote minus downvote
https://guides.co/g/a-beginners-guide-to-reddit/9668

#### 2. AMD is a semiconductor company specialising in computer graphics cards and microprocessors
https://www.amd.com/en/corporate/about-amd

#### 3. NVIDIA is a semiconductor company specialising in computer graphics cards
https://www.nvidia.com/en-sg/
https://www.newegg.com/Components/Store/ID-1

#### 4. A graphics card or a graphics processing unit is a graphics rendering accelerator
https://www.tomshardware.com/reviews/gpu-graphics-card-definition,5742.html

<img src="../images/AMD and NVIDIA.png" style="width:6000px; height:150px"/>

#### 5. RTX vs GTX - model names for NVIDIA Graphics Cards
https://www.geeksforgeeks.org/gtx-vs-rtx-which-is-better/

#### 6. Ryzen vs Intel - AMD's Ryzen CPU line compared with Intel CPUs
https://www.gamingscan.com/amd-ryzen-vs-intel-for-gaming/

#### 7. GeForce Experience - Application for optimizing games
https://www.nvidia.com/en-sg/geforce/geforce-experience/faq/#:~:text=GeForce%20Experience%20is%20the%20companion,greatest%20gaming%20moments%20with%20friends.&text=GeForce%20Experience%20provides%20optimal%20settings%20for%20over%20350%20games

#### 8. GeForce Experience vs NVIDIA Control Panel - Separate applications for different uses
https://www.quora.com/Does-GeForce-Experience-replace-Nvidia-Control-Panel

#### 9. GPU prices - a comparison chart
https://www.techspot.com/article/2294-gpu-pricing-2021-update/

#### 10. AMD on cryptocurrency mining - no blocking, but cards are better for gaming
https://www.pcgamer.com/amd-cryptocurrency-mining-limiter-ethereum/

#### 11. GPU Ranking for gaming with price 
https://benchmarks.ul.com/compare/best-gpus

#### 12. Breakeven pricing for mining
https://www.tomshardware.com/best-picks/best-mining-gpus-benchmarked-and-ranked

#### 13. Aftershock PC - Our "company" basically models this company
https://www.aftershockpc.com/welcome/About

#### 1. Importing Libraries and Settings

In [1]:
#import libraries

import pandas as pd
import numpy as np
import requests
import time
import nltk
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns
import re

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

In [2]:
pd.set_option('display.max_columns', 4000)
pd.set_option('display.max_rows', 4000)

#### 2. Checking API and parameter settings
Subreddits identified: amd and nvidia

In [3]:
#using push shift api for subreddits
url = 'https://api.pushshift.io/reddit/search/submission?subreddit=amd'
url2 = 'https://api.pushshift.io/reddit/search/submission?subreddit=nvidia'

In [4]:
#setting parameters for the 1st subreddit
params = {
    'subreddit': 'amd',
    'size' : 100 #100 seems to be the max, even if this is set to a greater size
#    'before': ''
}

In [5]:
#setting parameters for the 2nd subreddit
params2 = {
    'subreddit': 'nvidia',
    'size' : 100 #100 seems to be the max, even if this is set to a greater size
#    'before': ''
}
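
The URLs above already carry `subreddit=...` in their query string, and the `params` dictionaries repeat the same key, so the subreddit is likely sent twice in each request. A minimal sketch of one way to avoid this, keeping the endpoint bare and building the whole query from a single dictionary, is shown below. The helper name `build_url` is hypothetical and only the standard library is used.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Bare endpoint with no query string attached.
base_url = "https://api.pushshift.io/reddit/search/submission"


def build_url(subreddit, size=100, **extra):
    """Build a request URL with every query parameter defined in one place.

    Hypothetical helper for illustration; `extra` can carry keys such as `after`.
    """
    query = {"subreddit": subreddit, "size": size, **extra}
    return f"{base_url}?{urlencode(query)}"


amd_url = build_url("amd")
nvidia_url = build_url("nvidia")

# Parsing the query back confirms that each key appears exactly once.
parsed = parse_qs(urlsplit(amd_url).query)
print(amd_url)
print(parsed)
```

The same effect can be had by passing the bare endpoint together with the `params` dictionary to `requests.get`.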

In [6]:
#checking for request status for 1st url
res = requests.get(url, params) 
res.status_code

200

In [7]:
#checking for request status for 2nd url
res2 = requests.get(url2, params2)
res2.status_code

200

In [8]:
#assigning response in json format to a variable
data1 = res.json()
#copy out data from dictionary
posts1 = data1['data'][0]

#verify what we got
#print(orig_posts)
print(posts1)
print(type(posts1))
#checking keys
print(posts1.keys())

{'all_awardings': [], 'allow_live_comments': False, 'author': 'G-Mas', 'author_flair_css_class': None, 'author_flair_richtext': [], 'author_flair_text': None, 'author_flair_type': 'text', 'author_fullname': 't2_7twu2lhx', 'author_is_blocked': False, 'author_patreon_flair': False, 'author_premium': False, 'awarders': [], 'can_mod_post': False, 'contest_mode': False, 'created_utc': 1627245839, 'domain': 'self.Amd', 'full_link': 'https://www.reddit.com/r/Amd/comments/orjkhx/wandering_if_my_5950x_is_getting_to_hot/', 'gildings': {}, 'id': 'orjkhx', 'is_created_from_ads_ui': False, 'is_crosspostable': False, 'is_meta': False, 'is_original_content': False, 'is_reddit_media_domain': False, 'is_robot_indexable': False, 'is_self': True, 'is_video': False, 'link_flair_background_color': '#ff9800', 'link_flair_richtext': [], 'link_flair_template_id': 'a0256696-9254-11e6-936c-0e8d317f3e82', 'link_flair_text': 'Discussion', 'link_flair_text_color': 'light', 'link_flair_type': 'text', 'locked': Fals

In [9]:
#repeat for 2nd subreddit
data2 = res2.json()
posts2 = data2['data'][0]
print(posts2)
print(type(posts2))
print(posts2.keys())

{'all_awardings': [], 'allow_live_comments': False, 'author': 'WaterXS', 'author_flair_css_class': None, 'author_flair_richtext': [], 'author_flair_text': None, 'author_flair_type': 'text', 'author_fullname': 't2_2lz7xps7', 'author_is_blocked': False, 'author_patreon_flair': False, 'author_premium': False, 'awarders': [], 'can_mod_post': False, 'contest_mode': False, 'created_utc': 1627246635, 'domain': 'self.nvidia', 'full_link': 'https://www.reddit.com/r/nvidia/comments/orjtst/does_pcie_40_work_on_a_pcie_30_slot/', 'gildings': {}, 'id': 'orjtst', 'is_created_from_ads_ui': False, 'is_crosspostable': False, 'is_meta': False, 'is_original_content': False, 'is_reddit_media_domain': False, 'is_robot_indexable': False, 'is_self': True, 'is_video': False, 'link_flair_background_color': '#e91e63', 'link_flair_css_class': 'question', 'link_flair_richtext': [{'e': 'text', 't': 'Question'}], 'link_flair_template_id': 'fb6f8e52-4086-11e6-b284-0ee96c7aff3d', 'link_flair_text': 'Question', 'link_f

In [10]:
#checking scraped data in df
results_df = pd.DataFrame(data1['data'])
results_df2 = pd.DataFrame(data2['data'])
results_df

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_text,author_flair_type,author_fullname,author_is_blocked,author_patreon_flair,author_premium,awarders,can_mod_post,contest_mode,created_utc,domain,full_link,gildings,id,is_created_from_ads_ui,is_crosspostable,is_meta,is_original_content,is_reddit_media_domain,is_robot_indexable,is_self,is_video,link_flair_background_color,link_flair_richtext,link_flair_template_id,link_flair_text,link_flair_text_color,link_flair_type,locked,media_only,no_follow,num_comments,num_crossposts,over_18,parent_whitelist_status,permalink,pinned,pwls,removed_by_category,retrieved_on,score,selftext,send_replies,spoiler,stickied,subreddit,subreddit_id,subreddit_subscribers,subreddit_type,suggested_sort,thumbnail,title,total_awards_received,treatment_tags,upvote_ratio,url,whitelist_status,wls,link_flair_css_class,thumbnail_height,thumbnail_width,post_hint,preview,author_flair_background_color,author_flair_text_color,banned_by,url_overridden_by_dest,media,media_embed,secure_media,secure_media_embed,author_flair_template_id,crosspost_parent,crosspost_parent_list,gallery_data,is_gallery,media_metadata,author_cakeday
0,[],False,G-Mas,,[],,text,t2_7twu2lhx,False,False,False,[],False,False,1627245839,self.Amd,https://www.reddit.com/r/Amd/comments/orjkhx/w...,{},orjkhx,False,False,False,False,False,False,True,False,#ff9800,[],a0256696-9254-11e6-936c-0e8d317f3e82,Discussion,light,text,False,False,True,0,0,False,all_ads,/r/Amd/comments/orjkhx/wandering_if_my_5950x_i...,False,6,reddit,1627245850,1,[removed],True,False,False,Amd,t5_2rw0n,1021671,public,confidence,self,Wandering if my 5950x is getting to hot,0,[],1.0,https://www.reddit.com/r/Amd/comments/orjkhx/w...,all_ads,6,,,,,,,,,,,,,,,,,,,,
1,[],False,w3lmatatan,,[],,text,t2_gguqm,False,False,False,[],False,False,1627245823,self.Amd,https://www.reddit.com/r/Amd/comments/orjkbp/n...,{},orjkbp,False,False,False,False,False,False,True,False,#ff9800,[],a0256696-9254-11e6-936c-0e8d317f3e82,Discussion,light,text,False,False,True,0,0,False,all_ads,/r/Amd/comments/orjkbp/new_bios_update_for_str...,False,6,reddit,1627245834,1,[removed],True,False,False,Amd,t5_2rw0n,1021669,public,confidence,self,New BIOS update for Strix X570-I Gaming,0,[],1.0,https://www.reddit.com/r/Amd/comments/orjkbp/n...,all_ads,6,,,,,,,,,,,,,,,,,,,,
2,[],False,tacotaco87,,[],,text,t2_4mfhwn8n,False,False,False,[],False,False,1627245741,self.Amd,https://www.reddit.com/r/Amd/comments/orjjg9/d...,{},orjjg9,False,False,False,False,False,False,True,False,#ffc107,[],28a601d8-9255-11e6-9c8f-0eda72ca337c,Tech Support,light,text,False,False,True,0,0,False,all_ads,/r/Amd/comments/orjjg9/do_i_need_to_upgrade_my...,False,6,reddit,1627245752,1,[removed],True,False,False,Amd,t5_2rw0n,1021667,public,confidence,self,Do I need to upgrade my B450 board to a B550? ...,0,[],1.0,https://www.reddit.com/r/Amd/comments/orjjg9/d...,all_ads,6,,,,,,,,,,,,,,,,,,,,
3,[],False,nikkdizzle,,[],,text,t2_vc6gf,False,False,False,[],False,False,1627245556,self.Amd,https://www.reddit.com/r/Amd/comments/orjhcp/m...,{},orjhcp,False,False,False,False,False,False,True,False,#ed1c24,[],f43b65fc-3634-11e8-bd1d-0e028dbf07e4,Battlestation,light,text,False,False,True,0,0,False,all_ads,/r/Amd/comments/orjhcp/my_first_pc_build_and_t...,False,6,reddit,1627245568,1,[removed],True,False,False,Amd,t5_2rw0n,1021664,public,confidence,https://a.thumbs.redditmedia.com/nbK5bMDJbE-pr...,My first PC build and things escalated quickly.,0,[],1.0,https://www.reddit.com/r/Amd/comments/orjhcp/m...,all_ads,6,battlestation,140.0,140.0,,,,,,,,,,,,,,,,,
4,[],False,xdtolm,,[],,text,t2_h7oc5,False,False,False,[],False,False,1627242595,self.Amd,https://www.reddit.com/r/Amd/comments/oriior/v...,{},oriior,False,False,False,False,False,False,True,False,#2196f3,[],a2ff72e4-9254-11e6-ba56-0ea8bdcdb688,News,light,text,False,False,True,0,0,False,all_ads,/r/Amd/comments/oriior/vkfft_can_now_perform_f...,False,6,reddit,1627242606,1,[removed],True,False,False,Amd,t5_2rw0n,1021627,public,confidence,self,VkFFT can now perform Fast Fourier Transforms ...,0,[],1.0,https://www.reddit.com/r/Amd/comments/oriior/v...,all_ads,6,,,,self,"{'enabled': False, 'images': [{'id': 'Mf9YapcR...",,,,,,,,,,,,,,,
5,[],False,[deleted],,,,,,False,,,[],False,False,1627241937,self.Amd,https://www.reddit.com/r/Amd/comments/oribd6/n...,{},oribd6,False,False,False,False,False,False,True,False,#c0ca33,[],222a5170-d73f-11e8-bcc5-0ea0ee8d0508,Benchmark,light,text,False,False,True,0,0,False,all_ads,/r/Amd/comments/oribd6/nvidai/,False,6,deleted,1627241947,1,,True,False,False,Amd,t5_2rw0n,1021617,public,confidence,default,Nvidai,0,[],1.0,https://www.reddit.com/r/Amd/comments/oribd6/n...,all_ads,6,,,,,,,dark,moderators,,,,,,,,,,,,
6,[],False,A_bit_disappointing,,[],,text,t2_7v35u11a,False,False,False,[],False,False,1627240587,/r/Amd/comments/orhwb5/i_have_7_usage_on_my_59...,https://www.reddit.com/r/Amd/comments/orhwb5/i...,{},orhwb5,False,True,False,False,False,True,False,True,#ffc107,[],28a601d8-9255-11e6-9c8f-0eda72ca337c,Tech Support,light,text,False,False,True,0,0,False,all_ads,/r/Amd/comments/orhwb5/i_have_7_usage_on_my_59...,False,6,,1627240599,1,,True,False,False,Amd,t5_2rw0n,1021601,public,confidence,https://b.thumbs.redditmedia.com/3V5fH-aPRETcm...,I have ~7% usage on my 5900x. But the temps ar...,0,[],1.0,https://v.redd.it/vgxz32dored71,all_ads,6,,140.0,140.0,hosted:video,"{'enabled': False, 'images': [{'id': 'egVCKbSN...",,,,https://v.redd.it/vgxz32dored71,,,,,,,,,,,
7,[],False,bobmunciee,,[],,text,t2_50l38lop,False,False,False,[],False,False,1627240227,i.redd.it,https://www.reddit.com/r/Amd/comments/orhs44/o...,{},orhs44,False,True,False,False,True,True,False,False,#ed1c24,[],f43b65fc-3634-11e8-bd1d-0e028dbf07e4,Battlestation,light,text,False,False,True,0,0,False,all_ads,/r/Amd/comments/orhs44/one_more_quick_photo_of...,False,6,,1627240238,1,,True,False,False,Amd,t5_2rw0n,1021592,public,confidence,https://b.thumbs.redditmedia.com/8U1UVAHft8Au9...,One more Quick photo of my R5 5600X 3070Ti Bui...,0,[],1.0,https://i.redd.it/n40y0kl4red71.jpg,all_ads,6,battlestation,140.0,140.0,image,"{'enabled': True, 'images': [{'id': '_0cLYH9US...",,,,https://i.redd.it/n40y0kl4red71.jpg,,,,,,,,,,,
8,[],False,Dxterity,,[],,text,t2_itz9f,False,False,False,[],False,False,1627239500,self.Amd,https://www.reddit.com/r/Amd/comments/orhjp4/m...,{},orhjp4,False,False,False,False,False,False,True,False,#ffc107,[],28a601d8-9255-11e6-9c8f-0eda72ca337c,Tech Support,light,text,False,False,True,0,0,False,all_ads,/r/Amd/comments/orhjp4/motherboard_for_5900x/,False,6,reddit,1627239512,1,[removed],True,False,False,Amd,t5_2rw0n,1021576,public,confidence,self,Motherboard for 5900X,0,[],1.0,https://www.reddit.com/r/Amd/comments/orhjp4/m...,all_ads,6,,,,,,,,,,,,,,,,,,,,
9,[],False,milesoc,,[],,text,t2_4rws5,False,False,False,[],False,False,1627238568,self.Amd,https://www.reddit.com/r/Amd/comments/orh9cq/5...,{},orh9cq,False,False,False,False,False,False,True,False,#ffc107,[],28a601d8-9255-11e6-9c8f-0eda72ca337c,Tech Support,light,text,False,False,True,0,0,False,all_ads,/r/Amd/comments/orh9cq/5950x_hotter_on_idle_th...,False,6,reddit,1627238579,1,[removed],True,False,False,Amd,t5_2rw0n,1021562,public,confidence,self,5950x - Hotter on idle than under load?,0,[],1.0,https://www.reddit.com/r/Amd/comments/orh9cq/5...,all_ads,6,,,,,,,,,,,,,,,,,,,,


In [11]:
#narrowing down variables
subfields = ['title', 'selftext', 'subreddit', 'created_utc', 'author', 'num_comments', 'score']
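
The raw posts do not all share the same keys (the wide table above has many empty columns), so selecting fields directly from the dictionaries needs a fallback for missing keys. The sketch below shows one way to do that with `dict.get`. The helper `pick_fields` is hypothetical, and the second post is deliberately incomplete to show the fallback.

```python
subfields = ['title', 'selftext', 'subreddit', 'created_utc', 'author', 'num_comments', 'score']


def pick_fields(post, fields=subfields, default=None):
    """Keep only the chosen fields; keys missing from the post get `default`."""
    return {field: post.get(field, default) for field in fields}


# Values copied from the first r/Amd post shown earlier, plus one extra key.
sample_post = {
    'title': 'Wandering if my 5950x is getting to hot',
    'selftext': '[removed]',
    'subreddit': 'Amd',
    'created_utc': 1627245839,
    'author': 'G-Mas',
    'num_comments': 0,
    'score': 1,
    'is_video': False,
}

# Hypothetical post with missing keys, for illustration only.
incomplete_post = {'title': 'Nvidai', 'subreddit': 'Amd'}

rows = [pick_fields(p) for p in (sample_post, incomplete_post)]
print(rows)
```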

In [12]:
#checking
#.copy() gives independent DataFrames, so the in-place edits below do not act on a slice
results_df = results_df[subfields].copy()
results_df2 = results_df2[subfields].copy()
results_df.head(40)

Unnamed: 0,title,selftext,subreddit,created_utc,author,num_comments,score
0,Wandering if my 5950x is getting to hot,[removed],Amd,1627245839,G-Mas,0,1
1,New BIOS update for Strix X570-I Gaming,[removed],Amd,1627245823,w3lmatatan,0,1
2,Do I need to upgrade my B450 board to a B550? ...,[removed],Amd,1627245741,tacotaco87,0,1
3,My first PC build and things escalated quickly.,[removed],Amd,1627245556,nikkdizzle,0,1
4,VkFFT can now perform Fast Fourier Transforms ...,[removed],Amd,1627242595,xdtolm,0,1
5,Nvidai,,Amd,1627241937,[deleted],0,1
6,I have ~7% usage on my 5900x. But the temps ar...,,Amd,1627240587,A_bit_disappointing,0,1
7,One more Quick photo of my R5 5600X 3070Ti Bui...,,Amd,1627240227,bobmunciee,0,1
8,Motherboard for 5900X,[removed],Amd,1627239500,Dxterity,0,1
9,5950x - Hotter on idle than under load?,[removed],Amd,1627238568,milesoc,0,1


In [13]:
#removing duplicates
results_df.drop_duplicates(inplace=True)
results_df2.drop_duplicates(inplace=True)

In [14]:
#converting timestamp
results_df['timestamp'] = results_df['created_utc'].map(dt.date.fromtimestamp)
results_df2['timestamp'] = results_df2['created_utc'].map(dt.date.fromtimestamp)
results_df['timestamp']


0     2021-07-26
1     2021-07-26
2     2021-07-26
3     2021-07-26
4     2021-07-26
5     2021-07-26
6     2021-07-26
7     2021-07-26
8     2021-07-26
9     2021-07-26
10    2021-07-26
11    2021-07-26
12    2021-07-26
13    2021-07-26
14    2021-07-26
15    2021-07-26
16    2021-07-26
17    2021-07-26
18    2021-07-26
19    2021-07-25
20    2021-07-25
21    2021-07-25
22    2021-07-25
23    2021-07-25
24    2021-07-25
25    2021-07-25
26    2021-07-25
27    2021-07-25
28    2021-07-25
29    2021-07-25
30    2021-07-25
31    2021-07-25
32    2021-07-25
33    2021-07-25
34    2021-07-25
35    2021-07-25
36    2021-07-25
37    2021-07-25
38    2021-07-25
39    2021-07-25
40    2021-07-25
41    2021-07-25
42    2021-07-25
43    2021-07-25
44    2021-07-25
45    2021-07-25
46    2021-07-25
47    2021-07-25
48    2021-07-25
49    2021-07-25
50    2021-07-25
51    2021-07-25
52    2021-07-25
53    2021-07-25
54    2021-07-25
55    2021-07-25
56    2021-07-25
57    2021-07-25
58    2021-07-
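
Note that `dt.date.fromtimestamp` converts using the local timezone of the machine running the notebook, so the same `created_utc` value can land on different calendar dates on different machines. The sketch below makes the timezone explicit. The UTC+8 offset is an assumption about where this notebook was run, chosen because it reproduces the `2021-07-26` date shown for the first post.

```python
import datetime as dt

# created_utc of the first r/Amd post in the sample output above.
created_utc = 1627245839

# Explicit UTC conversion: independent of the machine's local timezone.
utc_date = dt.datetime.fromtimestamp(created_utc, tz=dt.timezone.utc).date()

# Assumption: the notebook ran on a UTC+8 machine (for example, Singapore).
utc_plus_8 = dt.timezone(dt.timedelta(hours=8))
local_date = dt.datetime.fromtimestamp(created_utc, tz=utc_plus_8).date()

print(utc_date)    # calendar date in UTC
print(local_date)  # calendar date in UTC+8
```

For day-level analysis this one-day shift rarely matters, but fixing the timezone keeps the results reproducible.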

In [25]:
# checking shape of pulled data test
results_df.shape

(100, 8)

#### 3. Setting up function for automatic scraping of data

In [26]:
#Function setting

def fetch_posts_test(subreddit, day_window = 30, n = 10000):
    
    #urls
    base_url = 'https://api.pushshift.io/reddit/search/submission'
    full_url = f'{base_url}?subreddit={subreddit}&size=100'
    print(full_url)
    
    #creating an empty list
    posts = []
    
#     #first 100 posts
#     first_url = full_url
#     res_1 = requests.get(first_url)
#     scraped_dict = res.json()['data']
#     df = pd.DataFrame.from_dict(scraped_dict)
#     posts.append(df)
    
    # iterations
    for i in range(1, n+1):
        URL = '{}&after={}d'.format(full_url, day_window *i)
        print('Show after date: ' + URL)
        
        # request each window once; skip it if the request fails or is not OK
        try:
            res = requests.get(URL)
            assert res.status_code == 200
        except (requests.RequestException, AssertionError):
            continue
        
        scraped_dict = res.json()['data']
        df = pd.DataFrame.from_dict(scraped_dict)
        posts.append(df)
        total_scraped = sum(len(x) for x in posts)
        
        # print(len(posts))
        print(total_scraped)
        # if there are more than 10,000, stop
        if total_scraped > 10_000:
            break
        
        #wait for 1 second
        time.sleep(1)
            
    
    full_df = pd.concat(posts, sort=False)
    full_df['timestamp'] = full_df['created_utc'].map(dt.date.fromtimestamp)
    print(full_df.shape)
    return full_df.reset_index(drop=True)
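
In the function above, a failed request skips to the next window without pausing, so a persistent outage would run through the remaining iterations back to back. The sketch below shows one possible way to structure the same paging loop with a cap on consecutive failures. It is an illustration under stated assumptions, not the code used for the scrape: `collect_posts`, `fetch_page`, `fake_fetch`, `target` and `max_failures` are hypothetical names, and the fetch step is passed in as a function so the loop can be tried without network access.

```python
def collect_posts(fetch_page, day_window=30, max_requests=10_000,
                  target=10_000, max_failures=5):
    """Request one page per `day_window`-day step until `target` is exceeded.

    `fetch_page` takes an `after` value such as '30d' and returns a list of posts.
    The loop gives up after `max_failures` consecutive failed requests.
    """
    posts = []
    failures = 0
    for i in range(1, max_requests + 1):
        try:
            page = fetch_page(f"{day_window * i}d")
        except Exception:
            failures += 1
            if failures >= max_failures:
                break
            continue
        failures = 0  # reset the counter after any successful request
        posts.extend(page)
        if len(posts) > target:
            break
    return posts


def fake_fetch(after):
    """Stand-in for the real request: always returns 100 dummy posts."""
    return [{"after": after, "n": k} for k in range(100)]


def always_fail(after):
    """Stand-in for an API outage."""
    raise ConnectionError("simulated outage")


collected = collect_posts(fake_fetch, target=250)
print(len(collected), collected[-1]["after"])
print(len(collect_posts(always_fail)))
```

In real use, `fetch_page` would wrap the `requests.get` call and the one-second `time.sleep` from the function above.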


#### 4. Executing scraping processes

# Warning
The code from here on will scrape data automatically.
Please expect to wait at least 30-45 minutes for the scraping of each subreddit.

In [21]:
# Pulling data from AMD subreddit
amd_test = fetch_posts_test('amd')

https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=30d
100
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=60d
200
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=90d
300
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=120d
400
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=150d
500
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=180d
600
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=210d
700
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=240d
800
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&

7680
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=2340d
7780
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=2370d
7880
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=2400d
7980
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=2430d
8080
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=2460d
8180
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=2490d
8280
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=2520d
8380
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=2550d
8480
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=amd&size=100&after=2580d
8580
Show after date: https://api.pus

In [22]:
# Checking pulled AMD data
amd_test.head()

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_text,author_flair_type,author_fullname,author_patreon_flair,author_premium,awarders,can_mod_post,contest_mode,created_utc,domain,full_link,gallery_data,gildings,id,is_created_from_ads_ui,is_crosspostable,is_gallery,is_meta,is_original_content,is_reddit_media_domain,is_robot_indexable,is_self,is_video,link_flair_background_color,link_flair_richtext,link_flair_template_id,link_flair_text,link_flair_text_color,link_flair_type,locked,media_metadata,media_only,no_follow,num_comments,num_crossposts,over_18,parent_whitelist_status,permalink,pinned,pwls,retrieved_on,score,selftext,send_replies,spoiler,stickied,subreddit,subreddit_id,subreddit_subscribers,subreddit_type,suggested_sort,thumbnail,thumbnail_height,thumbnail_width,title,total_awards_received,treatment_tags,upvote_ratio,url,url_overridden_by_dest,whitelist_status,wls,author_flair_background_color,author_flair_template_id,author_flair_text_color,post_hint,preview,removed_by_category,link_flair_css_class,author_cakeday,media,media_embed,secure_media,secure_media_embed,crosspost_parent,crosspost_parent_list,banned_by,edited,steward_reports,updated_utc,gilded,author_id,rte_mode,brand_safe,previous_visits,approved_at_utc,author_created_utc,banned_at_utc,distinguished,mod_reports,user_reports,timestamp
0,[],False,TheHillxer,,[],,text,t2_a13bii2l,False,False,[],False,False,1624754123,reddit.com,https://www.reddit.com/r/Amd/comments/o8mm1f/f...,"{'items': [{'id': 53287492, 'media_id': 'kt0j9...",{},o8mm1f,False,True,True,False,False,False,True,False,False,#009688,[],a5500b08-9254-11e6-8389-0ee9c105291a,Photo,light,text,False,"{'f6trfzvnep771': {'e': 'Image', 'id': 'f6trfz...",False,True,1,0.0,False,all_ads,/r/Amd/comments/o8mm1f/finally_done/,False,6.0,1624754134,1,,True,False,False,Amd,t5_2rw0n,987572.0,public,confidence,https://b.thumbs.redditmedia.com/3nj-I_ca2-ktI...,140.0,140.0,Finally done!!,0.0,[],1.0,https://www.reddit.com/gallery/o8mm1f,https://www.reddit.com/gallery/o8mm1f,all_ads,6.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2021-06-27
1,[],False,TheHillxer,,[],,text,t2_a13bii2l,False,True,[],False,False,1624754677,reddit.com,https://www.reddit.com/r/Amd/comments/o8mr60/f...,"{'items': [{'id': 53289049, 'media_id': '6w42b...",{},o8mr60,False,True,True,False,False,False,True,False,False,#009688,[],a5500b08-9254-11e6-8389-0ee9c105291a,Photo,light,text,False,"{'6w42b65bgp771': {'e': 'Image', 'id': '6w42b6...",False,True,15,0.0,False,all_ads,/r/Amd/comments/o8mr60/finally_done_5950x_6900xt/,False,6.0,1624754688,1,,True,False,False,Amd,t5_2rw0n,987572.0,public,confidence,https://a.thumbs.redditmedia.com/zFlqIxipq8JhJ...,140.0,140.0,Finally done!! 5950x 6900xt,0.0,[],1.0,https://www.reddit.com/gallery/o8mr60,https://www.reddit.com/gallery/o8mr60,all_ads,6.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2021-06-27
2,[],False,noobtablet9,,[],,text,t2_gkzy0,False,False,[],False,False,1624754751,self.Amd,https://www.reddit.com/r/Amd/comments/o8mrvt/m...,,{},o8mrvt,False,True,,False,False,False,True,True,False,#ffc107,[],28a601d8-9255-11e6-9c8f-0eda72ca337c,Tech Support,light,text,False,,False,True,2,0.0,False,all_ads,/r/Amd/comments/o8mrvt/my_vega_64_is_stuck_in_...,False,6.0,1624754762,1,Is the card dead? It is no longer under warran...,True,False,False,Amd,t5_2rw0n,987573.0,public,confidence,self,,,My Vega 64 is stuck in a cycle of NO SIGNAL,0.0,[],1.0,https://www.reddit.com/r/Amd/comments/o8mrvt/m...,,all_ads,6.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2021-06-27
3,[],False,destiny2sk,amd,[],3900X | 32GB | RTX,text,t2_7x2azwx,False,False,[],False,False,1624755170,reddit.com,https://www.reddit.com/r/Amd/comments/o8mvoz/3...,"{'items': [{'id': 53290278, 'media_id': '7vowc...",{},o8mvoz,False,True,True,False,False,False,True,False,False,#ff9800,[],a0256696-9254-11e6-936c-0e8d317f3e82,Discussion,light,text,False,"{'7vowcgc6hp771': {'e': 'Image', 'id': '7vowcg...",False,True,18,0.0,False,all_ads,/r/Amd/comments/o8mvoz/3900x_core_location_on_...,False,6.0,1624755182,1,,True,False,False,Amd,t5_2rw0n,987577.0,public,confidence,https://b.thumbs.redditmedia.com/Cd_oQpmXjKmNC...,139.0,140.0,3900x core location on the ccds,0.0,[],1.0,https://www.reddit.com/gallery/o8mvoz,https://www.reddit.com/gallery/o8mvoz,all_ads,6.0,#f26621,78761856-ef59-11e4-a27c-22000b39e724,light,,,,,,,,,,,,,,,,,,,,,,,,,,,2021-06-27
4,[],False,ceetoee,amd,[],5600x PBO | B550-F WiFi | 32GB 3733MHz CL16,text,t2_8i0uz42e,False,False,[],False,False,1624756643,self.Amd,https://www.reddit.com/r/Amd/comments/o8n99s/a...,,{},o8n99s,False,True,,False,False,False,True,True,False,#ff9800,[],a0256696-9254-11e6-936c-0e8d317f3e82,Discussion,light,text,False,,False,True,10,0.0,False,all_ads,/r/Amd/comments/o8n99s/asus_motherboard_rog_st...,False,6.0,1624756655,1,**ASUS B550-F BIOS 2403**\n\nDoes PBO need to ...,True,False,False,Amd,t5_2rw0n,987595.0,public,confidence,self,,,ASUS Motherboard ROG STRIX B550-F PBO options ...,0.0,[],1.0,https://www.reddit.com/r/Amd/comments/o8n99s/a...,,all_ads,6.0,#f26621,78761856-ef59-11e4-a27c-22000b39e724,light,self,"{'enabled': False, 'images': [{'id': 'D6qXdt-n...",,,,,,,,,,,,,,,,,,,,,,,,,2021-06-27


In [23]:
# Pulling data from NVIDIA subreddit
nvidia_test = fetch_posts_test('nvidia')
nvidia_test.head()

https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=30d
100
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=60d
200
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=90d
300
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=120d
400
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=150d
500
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=180d
600
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=210d
700
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=240d
800
Show after date: https://api.pushshift.io/reddit/search/submiss

7490
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=2280d
7590
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=2310d
7690
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=2340d
7790
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=2370d
7890
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=2400d
7990
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=2430d
8090
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=2460d
8190
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=2490d
8290
Show after date: https://api.pushshift.io/reddit/search/submission?subreddit=nvidia&size=100&after=2520d
8390
Show 

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_text,author_flair_type,author_fullname,author_patreon_flair,author_premium,awarders,can_mod_post,contest_mode,created_utc,domain,full_link,gildings,id,is_created_from_ads_ui,is_crosspostable,is_meta,is_original_content,is_reddit_media_domain,is_robot_indexable,is_self,is_video,link_flair_background_color,link_flair_css_class,link_flair_richtext,link_flair_template_id,link_flair_text,link_flair_text_color,link_flair_type,locked,media_only,no_follow,num_comments,num_crossposts,over_18,parent_whitelist_status,permalink,pinned,pwls,retrieved_on,score,selftext,send_replies,spoiler,stickied,subreddit,subreddit_id,subreddit_subscribers,subreddit_type,thumbnail,title,total_awards_received,treatment_tags,upvote_ratio,url,whitelist_status,wls,media_metadata,thumbnail_height,thumbnail_width,post_hint,preview,is_gallery,removed_by_category,url_overridden_by_dest,gallery_data,author_flair_background_color,author_flair_template_id,author_flair_text_color,crosspost_parent,crosspost_parent_list,media,media_embed,secure_media,secure_media_embed,edited,banned_by,author_cakeday,poll_data,distinguished,steward_reports,updated_utc,gilded,author_id,rte_mode,brand_safe,approved_at_utc,banned_at_utc,suggested_sort,author_created_utc,mod_reports,user_reports,timestamp
0,[],False,snorkles01,,[],,text,t2_10rq35,False,False,[],False,False,1624756389,self.nvidia,https://www.reddit.com/r/nvidia/comments/o8n6w...,{},o8n6wy,False,True,False,False,False,True,True,False,#e91e63,question,"[{'e': 'text', 't': 'Question'}]",fb6f8e52-4086-11e6-b284-0ee96c7aff3d,Question,light,richtext,False,False,True,0,0.0,False,all_ads,/r/nvidia/comments/o8n6wy/buzzing_sound_from_3...,False,6.0,1624756399,1,I just got my 3060 Ti FE installed a couple da...,True,False,False,nvidia,t5_2rlgy,913721.0,public,self,Buzzing sound from 3060 Ti FE,0.0,[],1.0,https://www.reddit.com/r/nvidia/comments/o8n6w...,all_ads,6.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2021-06-27
1,[],False,dolemite79,,[],,text,t2_fel3tsz,False,False,[],False,False,1624757040,self.nvidia,https://www.reddit.com/r/nvidia/comments/o8ncx...,{},o8ncxq,False,True,False,False,False,True,True,False,#e91e63,question,"[{'e': 'text', 't': 'Question'}]",fb6f8e52-4086-11e6-b284-0ee96c7aff3d,Question,light,richtext,False,False,False,4,0.0,False,all_ads,/r/nvidia/comments/o8ncxq/system_crash_now_no_...,False,6.0,1624757051,1,"So I've had my 3080 since November, no issues ...",True,False,False,nvidia,t5_2rlgy,913727.0,public,self,System crash now no video signal from evga 3080,0.0,[],1.0,https://www.reddit.com/r/nvidia/comments/o8ncx...,all_ads,6.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2021-06-27
2,[],False,makisekurisudesu,,[],,text,t2_482yjyco,False,False,[],False,False,1624759428,self.nvidia,https://www.reddit.com/r/nvidia/comments/o8nyc...,{},o8nyc6,False,True,False,False,False,True,True,False,#ff9800,discussion,"[{'e': 'text', 't': 'Discussion'}]",cf69204c-61e5-11e4-890f-12313b0e8c78,Discussion,light,richtext,False,False,True,12,0.0,False,all_ads,/r/nvidia/comments/o8nyc6/dlss_in_necromunda_h...,False,6.0,1624759439,1,"I think what I want to like the clarity, alias...",True,False,False,nvidia,t5_2rlgy,913766.0,public,https://b.thumbs.redditmedia.com/cHrgEl1jPBpu1...,DLSS in Necromunda Hired Gun Image Quality Com...,0.0,[],1.0,https://www.reddit.com/r/nvidia/comments/o8nyc...,all_ads,6.0,"{'6ahjk81csp771': {'e': 'Image', 'id': '6ahjk8...",52.0,140.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2021-06-27
3,[],False,TJOcraft8,,[],,text,t2_40q6b23,False,False,[],False,False,1624759484,self.nvidia,https://www.reddit.com/r/nvidia/comments/o8nyu...,{},o8nyuc,False,True,False,False,False,True,True,False,#e91e63,question,"[{'e': 'text', 't': 'Question'}]",fb6f8e52-4086-11e6-b284-0ee96c7aff3d,Question,light,richtext,False,False,True,5,0.0,False,all_ads,/r/nvidia/comments/o8nyuc/how_come_my_cat_ears...,False,6.0,1624759495,1,"come on man, IM ASKING YOU REDDIT. IM A EMO GO...",True,False,False,nvidia,t5_2rlgy,913769.0,public,self,How come my cat ears WONT show up on NVIDIA BR...,0.0,[],1.0,https://www.reddit.com/r/nvidia/comments/o8nyu...,all_ads,6.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2021-06-27
4,[],False,FriendleyComdrade,,[],,text,t2_7t0sprwu,False,False,[],False,False,1624759975,self.nvidia,https://www.reddit.com/r/nvidia/comments/o8o39...,{},o8o39j,False,True,False,False,False,True,True,False,#e91e63,question,"[{'e': 'text', 't': 'Question'}]",fb6f8e52-4086-11e6-b284-0ee96c7aff3d,Question,light,richtext,False,False,True,3,0.0,False,all_ads,/r/nvidia/comments/o8o39j/filters_arent_working/,False,6.0,1624759986,1,Every time i open Valorant to add a certain fi...,True,False,False,nvidia,t5_2rlgy,913773.0,public,self,Filters arent working,0.0,[],1.0,https://www.reddit.com/r/nvidia/comments/o8o39...,all_ads,6.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2021-06-27


#### 5. Saving dfs to csv for export and cleaning

In [24]:
#saving to csv
amd_test.to_csv('../data/amd.csv', index=False)
nvidia_test.to_csv('../data/nvidia.csv', index=False)