# Project 3: Subreddit Classification

## Problem Statement

Recently, the Apple and Android subreddits have seen an influx of irrelevant posts, and Reddit has received many complaints from users. To protect its reputation, Reddit wants a lasting fix: a machine learning model that distinguishes Apple subreddit posts from Android subreddit posts. Such a model could ease the load on content moderators and help millions of users find relevant posts on each subreddit.

The goal of this project is to develop a binary classifier, using natural language processing techniques, that separates posts into the Apple or Android subreddit, and to gain insight into the features that drive that separation. The candidate models are evaluated using the area under the Receiver Operating Characteristic curve (ROC-AUC), which measures how well a classifier distinguishes between Apple and Android posts. The success criterion is a ROC-AUC above 0.95 (95%) while preserving model generalizability.
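
To make the metric concrete, ROC-AUC can be read as the probability that a randomly chosen post from the positive class receives a higher score than a randomly chosen post from the negative class, with ties counted as half. The sketch below computes it by pairwise comparison in plain Python. The function name `roc_auc`, the choice of 1 as the positive label, and the toy scores are illustrative assumptions, not part of the project code.

```python
def roc_auc(labels, scores):
    """Pairwise (rank-based) ROC-AUC for binary labels 0/1.

    Assumption for illustration: label 1 is the positive class.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example from each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: two negative and two positive posts with predicted scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the 0.95 target leaves room for only a small share of mis-ranked pairs.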

## Executive Summary






### Contents:
- [Data Collection](#Data-Collection)

## Data Collection

The Reddit API allows posts to be downloaded remotely from each subreddit, subject to a cap of 1000 posts. Since a typical request returns only 25 entries, the process has to be automated with a loop that requests page after page without overstressing the server, which could get the user blocked. This method sends a custom User-agent header, so the request is not identified as coming from a default Python script, and calls time.sleep() to pause between requests.

The goal is to gather as much data as possible from each subreddit so that the machine learning classifier can reasonably capture the relationships between the input features and the output label.
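
The paging logic described above can be wrapped in a reusable function. The following is a minimal sketch under stated assumptions, not the code used in this notebook: the names `fetch_listing_page` and `collect_posts` and the User-agent string are hypothetical, and the standard-library `urllib` is swapped in for `requests` so the sketch has no third-party dependency. Unlike the loops in the cells below, it stops when Reddit returns no `after` token and skips posts it has already seen.

```python
import json
import random
import time
import urllib.request


def fetch_listing_page(subreddit, after=None,
                       user_agent="subreddit-classifier-demo/0.1"):
    """Download one page of a subreddit listing and return the parsed JSON."""
    url = f"https://www.reddit.com/r/{subreddit}.json"
    if after is not None:
        url += f"?after={after}"
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_posts(fetch_page, max_pages=40, sleep=time.sleep):
    """Follow the `after` token page by page, keeping each post only once.

    `fetch_page` is any callable that takes the `after` token (or None for
    the first page) and returns the parsed listing as a dict.
    """
    posts, seen, after = [], set(), None
    for _ in range(max_pages):
        listing = fetch_page(after)["data"]
        for child in listing["children"]:
            post = child["data"]
            if post["name"] not in seen:
                seen.add(post["name"])
                posts.append(post)
        after = listing["after"]
        if after is None:
            break  # no further pages; stop instead of restarting from page one
        sleep(random.randint(1, 5))  # pause between requests
    return posts
```

A call such as `collect_posts(lambda after: fetch_listing_page("apple", after))` would then download the Apple listing. Passing the page fetcher in as an argument keeps the paging logic testable without network access.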

In [1]:
import requests
import pandas as pd
import time
import random

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

%matplotlib inline

### Scraping Apple Subreddit

In [2]:
posts = []
after = None

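for _ in range(40):
    url = 'https://www.reddit.com/r/apple.json'
    # `after` is None on the first request; note that it is also None once
    # Reddit reports no further pages, in which case the loop starts again
    # from the first page and collects duplicates (dropped further below)
    if after is None:
        current_url = url
    else:
        current_url = url + '?after=' + after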
    print(current_url)
    
    # send request to url
    res = requests.get(current_url, headers={'User-agent': 'BBC Inc 1.0'})
    
    # check for errors
    if res.status_code != 200:
        print('Status error', res.status_code)
        break
    
    # get posts and add to [posts]
    current_dict = res.json()
    current_posts = [p['data'] for p in current_dict['data']['children']]
    posts.extend(current_posts)
    
    # get tag of last post on the page
    after = current_dict['data']['after']

    # generate a random sleep duration to look more 'natural'
    sleep_duration = random.randint(1,5)
    print(sleep_duration)
    time.sleep(sleep_duration)

https://www.reddit.com/r/apple.json
3
https://www.reddit.com/r/apple.json?after=t3_j06wzm
1
https://www.reddit.com/r/apple.json?after=t3_j02p2w
3
https://www.reddit.com/r/apple.json?after=t3_j05ws0
1
https://www.reddit.com/r/apple.json?after=t3_j0670r
3
https://www.reddit.com/r/apple.json?after=t3_iyu4vk
2
https://www.reddit.com/r/apple.json?after=t3_iybsyi
5
https://www.reddit.com/r/apple.json?after=t3_izcdcw
1
https://www.reddit.com/r/apple.json?after=t3_iysjo5
1
https://www.reddit.com/r/apple.json?after=t3_ix32rn
1
https://www.reddit.com/r/apple.json?after=t3_iwswjl
3
https://www.reddit.com/r/apple.json?after=t3_ivx31l
2
https://www.reddit.com/r/apple.json?after=t3_ivl1pj
4
https://www.reddit.com/r/apple.json?after=t3_iwa65i
1
https://www.reddit.com/r/apple.json?after=t3_ivxlf0
3
https://www.reddit.com/r/apple.json?after=t3_iuiasz
5
https://www.reddit.com/r/apple.json?after=t3_iuxujh
5
https://www.reddit.com/r/apple.json?after=t3_itphm8
2
https://www.reddit.com/r/apple.json?after=t3

In [3]:
len(posts)

985

In [4]:
# Check for unique posts
len(set([p['name'] for p in posts]))

808
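
The gap between the 985 downloaded posts and the 808 unique names means 177 entries are repeats. A quick way to see which posts are repeated, and how often, is to count the names. This is a small sketch; the helper name `repeated_names` and the sample data are made up for illustration.

```python
from collections import Counter


def repeated_names(posts):
    """Return {name: count} for every post name that appears more than once."""
    counts = Counter(post["name"] for post in posts)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical sample in the same shape as the scraped `posts` list.
sample = [{"name": "t3_a"}, {"name": "t3_b"}, {"name": "t3_a"}]
print(repeated_names(sample))  # {'t3_a': 2}
```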

In [5]:
df_apple = pd.DataFrame(posts)

In [6]:
# Drop duplicates 
df_apple.drop_duplicates(subset='name', keep="first", inplace=True)
df_apple.head()

Unnamed: 0,approved_at_utc,subreddit,selftext,author_fullname,saved,mod_reason_title,gilded,clicked,title,link_flair_richtext,subreddit_name_prefixed,hidden,pwls,link_flair_css_class,downs,thumbnail_height,top_awarded_type,hide_score,name,quarantine,link_flair_text_color,upvote_ratio,author_flair_background_color,subreddit_type,ups,total_awards_received,media_embed,thumbnail_width,author_flair_template_id,is_original_content,user_reports,secure_media,is_reddit_media_domain,is_meta,category,secure_media_embed,link_flair_text,can_mod_post,score,approved_by,author_premium,thumbnail,edited,author_flair_css_class,author_flair_richtext,gildings,post_hint,content_categories,is_self,mod_note,created,link_flair_type,wls,removed_by_category,banned_by,author_flair_type,domain,allow_live_comments,selftext_html,likes,suggested_sort,banned_at_utc,view_count,archived,no_follow,is_crosspostable,pinned,over_18,preview,all_awardings,awarders,media_only,can_gild,spoiler,locked,author_flair_text,treatment_tags,visited,removed_by,num_reports,distinguished,subreddit_id,mod_reason_by,removal_reason,link_flair_background_color,id,is_robot_indexable,report_reasons,author,discussion_type,num_comments,send_replies,whitelist_status,contest_mode,mod_reports,author_patreon_flair,author_flair_text_color,permalink,parent_whitelist_status,stickied,url,subreddit_subscribers,created_utc,num_crossposts,media,is_video,url_overridden_by_dest,link_flair_template_id
0,,apple,\n\nWelcome to the daily Tech Support thread f...,t2_6l4z3,False,,0,False,Daily Tech Support Thread - [September 27],"[{'e': 'text', 't': 'Official Megathread'}]",r/apple,False,6,megathread,0,,,False,t3_j0so5j,False,dark,0.68,,public,7,0,{},,,False,[],,False,False,,{},Official Megathread,False,7,,True,self,False,,[],{},self,,True,,1601248000.0,richtext,6,,,text,self.apple,False,"&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt...",,new,,,False,True,False,False,False,{'images': [{'source': {'url': 'https://extern...,[],[],False,False,False,False,,[],False,,,moderator,t5_2qh1f,,,,j0so5j,True,,AutoModerator,,66,False,all_ads,False,[],False,,/r/apple/comments/j0so5j/daily_tech_support_th...,all_ads,True,https://www.reddit.com/r/apple/comments/j0so5j...,1800820,1601219000.0,0,,False,,
1,,apple,"\n\nHello /r/Apple, and welcome to ""Shortcuts ...",t2_6l4z3,False,,0,False,Shortcuts Sunday - [September 27],"[{'e': 'text', 't': 'Official Megathread'}]",r/apple,False,6,megathread,0,,,False,t3_j0que7,False,dark,0.81,,public,20,0,{},,,False,[],,False,False,,{},Official Megathread,False,20,,True,self,False,,[],{},self,,True,,1601241000.0,richtext,6,,,text,self.apple,True,"&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt...",,,,,False,True,False,False,False,{'images': [{'source': {'url': 'https://extern...,[],[],False,False,False,False,,[],False,,,moderator,t5_2qh1f,,,,j0que7,True,,AutoModerator,,5,False,all_ads,False,[],False,,/r/apple/comments/j0que7/shortcuts_sunday_sept...,all_ads,True,https://www.reddit.com/r/apple/comments/j0que7...,1800820,1601212000.0,0,,False,,
2,,apple,,t2_78w5nxue,False,,0,False,iOS 14: 'Phoenix 2' Space Shooter Delivers Pla...,"[{'e': 'text', 't': 'iOS'}]",r/apple,False,6,ios,0,83.0,,False,t3_j0rz3p,False,dark,0.97,,public,1660,3,{},140.0,,False,[],,False,False,,{},iOS,False,1660,,False,https://b.thumbs.redditmedia.com/i8UjN90iT4042...,False,,[],{},link,,False,,1601246000.0,richtext,6,,,text,macrumors.com,True,,,,,,False,False,False,False,False,{'images': [{'source': {'url': 'https://extern...,"[{'giver_coin_reward': None, 'subreddit_id': N...",[],False,False,False,False,,[],False,,,,t5_2qh1f,,,,j0rz3p,True,,redditer_09,,92,True,all_ads,False,[],False,,/r/apple/comments/j0rz3p/ios_14_phoenix_2_spac...,all_ads,False,https://www.macrumors.com/2020/09/26/phoenix-g...,1800820,1601217000.0,0,,False,https://www.macrumors.com/2020/09/26/phoenix-g...,e9b1d532-5701-11e9-87f1-0edf28c73d02
3,,apple,,t2_th4cg,False,,0,False,iOS 14: How to stop your AirPods automatically...,"[{'e': 'text', 't': 'AirPods'}]",r/apple,False,6,,0,70.0,,False,t3_j0yob9,False,dark,0.95,,public,233,0,{},140.0,,False,[],,False,False,,{},AirPods,False,233,,False,https://b.thumbs.redditmedia.com/E6OTLOdto3Ths...,False,,[],{},link,,False,,1601269000.0,richtext,6,,,text,9to5mac.com,True,,,,,,False,False,False,False,False,{'images': [{'source': {'url': 'https://extern...,[],[],False,False,False,False,,[],False,,,,t5_2qh1f,,,,j0yob9,True,,PJ09,,44,False,all_ads,False,[],False,,/r/apple/comments/j0yob9/ios_14_how_to_stop_yo...,all_ads,False,https://9to5mac.com/2020/09/27/disable-airpods...,1800820,1601240000.0,0,,False,https://9to5mac.com/2020/09/27/disable-airpods...,db824c80-5701-11e9-b066-0e168b9ea574
4,,apple,,t2_15c325,False,,4,False,There was no NASA Astronomy Picture of the Day...,"[{'e': 'text', 't': 'Promo Saturday'}]",r/apple,False,6,promo,0,73.0,,False,t3_j0k9n1,False,dark,0.97,,public,4167,30,{},140.0,,False,[],,False,False,,{},Promo Saturday,False,4167,,True,https://b.thumbs.redditmedia.com/rWpCHZ42VAGd8...,False,,[],"{'gid_2': 4, 'gid_3': 1}",link,,False,,1601208000.0,richtext,6,,,text,apps.apple.com,True,,,,,,False,False,False,False,False,{'images': [{'source': {'url': 'https://extern...,"[{'giver_coin_reward': 0, 'subreddit_id': None...",[],False,False,False,False,,[],False,,,,t5_2qh1f,,,,j0k9n1,True,,TrailWhale,,158,True,all_ads,False,[],False,,/r/apple/comments/j0k9n1/there_was_no_nasa_ast...,all_ads,False,https://apps.apple.com/us/app/apod-astronomy-p...,1800820,1601179000.0,1,,False,https://apps.apple.com/us/app/apod-astronomy-p...,854c34e2-5702-11e9-bf73-0e73ef6cdf98


In [2]:
col_keep = ['subreddit', 'selftext', 'title', 'upvote_ratio', 'score', 'num_comments', 'author']

In [8]:
df_apple = df_apple[col_keep]

In [9]:
df_apple.head()

Unnamed: 0,subreddit,selftext,title,upvote_ratio,score,num_comments,author
0,apple,\n\nWelcome to the daily Tech Support thread f...,Daily Tech Support Thread - [September 27],0.68,7,66,AutoModerator
1,apple,"\n\nHello /r/Apple, and welcome to ""Shortcuts ...",Shortcuts Sunday - [September 27],0.81,20,5,AutoModerator
2,apple,,iOS 14: 'Phoenix 2' Space Shooter Delivers Pla...,0.97,1660,92,redditer_09
3,apple,,iOS 14: How to stop your AirPods automatically...,0.95,233,44,PJ09
4,apple,,There was no NASA Astronomy Picture of the Day...,0.97,4167,158,TrailWhale


In [10]:
# Export to csv
df_apple.to_csv('../data/apple_subreddit v2.csv', index = False)

### Scraping Android Subreddit

In [2]:
posts = []
after = None

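for _ in range(40):
    url = 'https://www.reddit.com/r/Android.json'
    # `after` is None on the first request; note that it is also None once
    # Reddit reports no further pages, in which case the loop starts again
    # from the first page and collects duplicates (dropped further below)
    if after is None:
        current_url = url
    else:
        current_url = url + '?after=' + after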
    print(current_url)
    
    # send request to url
    res = requests.get(current_url, headers={'User-agent': 'BBC Inc 1.0'})
    
    # check for errors
    if res.status_code != 200:
        print('Status error', res.status_code)
        break
    
    # get posts and add to [posts]
    current_dict = res.json()
    current_posts = [p['data'] for p in current_dict['data']['children']]
    posts.extend(current_posts)
    
    # get tag of last post on the page
    after = current_dict['data']['after']

    # generate a random sleep duration to look more 'natural'
    sleep_duration = random.randint(1,5)
    print(sleep_duration)
    time.sleep(sleep_duration)

https://www.reddit.com/r/Android.json
5
https://www.reddit.com/r/Android.json?after=t3_j04ato
1
https://www.reddit.com/r/Android.json?after=t3_iyysc1
2
https://www.reddit.com/r/Android.json?after=t3_ixszud
3
https://www.reddit.com/r/Android.json?after=t3_iy9bir
5
https://www.reddit.com/r/Android.json?after=t3_ivrc4d
2
https://www.reddit.com/r/Android.json?after=t3_ivm5uq
3
https://www.reddit.com/r/Android.json?after=t3_iuffvg
1
https://www.reddit.com/r/Android.json?after=t3_itfsx0
5
https://www.reddit.com/r/Android.json?after=t3_isv51c
1
https://www.reddit.com/r/Android.json?after=t3_irwf64
1
https://www.reddit.com/r/Android.json?after=t3_iqoyfi
5
https://www.reddit.com/r/Android.json?after=t3_iphepi
3
https://www.reddit.com/r/Android.json?after=t3_ioxv11
4
https://www.reddit.com/r/Android.json?after=t3_io6oae
3
https://www.reddit.com/r/Android.json?after=t3_inkb4u
4
https://www.reddit.com/r/Android.json?after=t3_imfsj4
1
https://www.reddit.com/r/Android.json?after=t3_il5i6u
5
https://

In [3]:
len(posts)

977

In [4]:
# Check for unique posts
len(set([p['name'] for p in posts]))

726

In [5]:
df_android = pd.DataFrame(posts)

In [6]:
# Drop duplicates 
df_android.drop_duplicates(subset='name', keep="first", inplace=True)
df_android.head()

Unnamed: 0,approved_at_utc,subreddit,selftext,author_fullname,saved,mod_reason_title,gilded,clicked,title,link_flair_richtext,subreddit_name_prefixed,hidden,pwls,link_flair_css_class,downs,thumbnail_height,top_awarded_type,hide_score,name,quarantine,link_flair_text_color,upvote_ratio,author_flair_background_color,subreddit_type,ups,total_awards_received,media_embed,thumbnail_width,author_flair_template_id,is_original_content,user_reports,secure_media,is_reddit_media_domain,is_meta,category,secure_media_embed,link_flair_text,can_mod_post,score,approved_by,author_premium,thumbnail,edited,author_flair_css_class,author_flair_richtext,gildings,content_categories,is_self,mod_note,created,link_flair_type,wls,removed_by_category,banned_by,author_flair_type,domain,allow_live_comments,selftext_html,likes,suggested_sort,banned_at_utc,view_count,archived,no_follow,is_crosspostable,pinned,over_18,all_awardings,awarders,media_only,can_gild,spoiler,locked,author_flair_text,treatment_tags,visited,removed_by,num_reports,distinguished,subreddit_id,mod_reason_by,removal_reason,link_flair_background_color,id,is_robot_indexable,report_reasons,author,discussion_type,num_comments,send_replies,whitelist_status,contest_mode,mod_reports,author_patreon_flair,author_flair_text_color,permalink,parent_whitelist_status,stickied,url,subreddit_subscribers,created_utc,num_crossposts,media,is_video,post_hint,url_overridden_by_dest,preview,link_flair_template_id,author_cakeday
0,,Android,"Note 1. Join us at /r/MoronicMondayAndroid, a ...",t2_6l4z3,False,,0,False,Moronic Monday (Sep 28 2020) - Your weekly que...,[],r/Android,False,6,,0,,,False,t3_j1a3yp,False,dark,0.82,,public,21,0,{},,,False,[],,False,False,,{},,False,21,,True,self,False,robot,[],{},,True,,1601321000.0,text,6,,,text,self.Android,False,"&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt...",,new,,,False,False,False,False,False,[],[],False,False,False,False,*beep boop*,[],False,,,moderator,t5_2qlqh,,,,j1a3yp,True,,AutoModerator,,45,False,all_ads,False,[],False,dark,/r/Android/comments/j1a3yp/moronic_monday_sep_...,all_ads,True,https://www.reddit.com/r/Android/comments/j1a3...,2268461,1601292000.0,0,,False,,,,,
1,,Android,,t2_65wds,False,,0,False,The Home Depot is selling Google’s new Chromec...,[],r/Android,False,6,,0,73.0,,False,t3_j1e68n,False,dark,0.97,,public,3317,0,{},140.0,,False,[],,False,False,,{},,False,3317,,False,https://b.thumbs.redditmedia.com/scixK6uf0GnqF...,False,,[],{},,False,,1601336000.0,text,6,,,text,theverge.com,True,,,,,,False,False,False,False,False,[],[],False,False,False,False,,[],False,,,,t5_2qlqh,,,,j1e68n,True,,zbhoy,,469,True,all_ads,False,[],False,,/r/Android/comments/j1e68n/the_home_depot_is_s...,all_ads,False,https://www.theverge.com/2020/9/28/21459849/go...,2268461,1601307000.0,4,,False,link,https://www.theverge.com/2020/9/28/21459849/go...,{'images': [{'source': {'url': 'https://extern...,,
2,,Android,,t2_31mkizvx,False,,0,False,Developer boots Android 11 on 22 older devices...,[],r/Android,False,6,,0,93.0,,False,t3_j1gjze,False,dark,0.95,,public,138,0,{},140.0,,False,[],,False,False,,{},,False,138,,False,https://a.thumbs.redditmedia.com/nDtW5EcfvEXqa...,False,,[],{},,False,,1601343000.0,text,6,,,text,xda-developers.com,False,,,,,,False,False,False,False,False,[],[],False,False,False,False,,[],False,,,,t5_2qlqh,,,,j1gjze,True,,apmcruZ,,23,True,all_ads,False,[],False,,/r/Android/comments/j1gjze/developer_boots_and...,all_ads,False,https://www.xda-developers.com/developer-boots...,2268461,1601314000.0,0,,False,link,https://www.xda-developers.com/developer-boots...,{'images': [{'source': {'url': 'https://extern...,,
3,,Android,,t2_74ql25l0,False,,0,False,[MKBHD] The Galaxy S20 Fan Edition: Hear Me Out!,[],r/Android,False,6,,0,105.0,,True,t3_j1oahj,False,dark,0.75,,public,38,0,"{'content': '&lt;iframe width=""600"" height=""33...",140.0,,False,[],"{'type': 'youtube.com', 'oembed': {'provider_u...",False,False,,"{'content': '&lt;iframe width=""600"" height=""33...",,False,38,,True,https://b.thumbs.redditmedia.com/iLaeKGbAuwDcA...,False,,[],{},,False,,1601367000.0,text,6,,,text,youtube.com,False,,,,,,False,False,False,False,False,[],[],False,False,False,False,,[],False,,,,t5_2qlqh,,,,j1oahj,True,,HayashiSawaryo,,42,True,all_ads,False,[],False,,/r/Android/comments/j1oahj/mkbhd_the_galaxy_s2...,all_ads,False,https://www.youtube.com/watch?v=azrdcp4yYas,2268461,1601338000.0,0,"{'type': 'youtube.com', 'oembed': {'provider_u...",False,rich:video,https://www.youtube.com/watch?v=azrdcp4yYas,{'images': [{'source': {'url': 'https://extern...,,
4,,Android,Recently a member of my family got Samsung Not...,t2_qdd9ax7,False,,0,False,Realised how annoying updating is without A/B ...,[],r/Android,False,6,,0,,,False,t3_j1c1yg,False,dark,0.88,,public,131,0,{},,cd5c020e-ff93-11e0-aba5-12313d265470,False,[],,False,False,,{},,False,131,,False,self,False,userDarkPink,[],{},,True,,1601329000.0,text,6,,,text,self.Android,False,"&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt...",,,,,False,False,False,False,False,[],[],False,False,False,False,Dark Pink,[],False,,,,t5_2qlqh,,,,j1c1yg,True,,whamenrespecter69,,86,True,all_ads,False,[],False,dark,/r/Android/comments/j1c1yg/realised_how_annoyi...,all_ads,False,https://www.reddit.com/r/Android/comments/j1c1...,2268461,1601300000.0,0,,False,,,,,


In [9]:
df_android = df_android[col_keep]
df_android.head()

Unnamed: 0,subreddit,selftext,title,upvote_ratio,score,num_comments,author
0,Android,"Note 1. Join us at /r/MoronicMondayAndroid, a ...",Moronic Monday (Sep 28 2020) - Your weekly que...,0.82,21,45,AutoModerator
1,Android,,The Home Depot is selling Google’s new Chromec...,0.97,3317,469,zbhoy
2,Android,,Developer boots Android 11 on 22 older devices...,0.95,138,23,apmcruZ
3,Android,,[MKBHD] The Galaxy S20 Fan Edition: Hear Me Out!,0.75,38,42,HayashiSawaryo
4,Android,Recently a member of my family got Samsung Not...,Realised how annoying updating is without A/B ...,0.88,131,86,whamenrespecter69


In [27]:
df_android.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 731 entries, 0 to 730
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   subreddit     731 non-null    object 
 1   selftext      731 non-null    object 
 2   title         731 non-null    object 
 3   upvote_ratio  731 non-null    float64
 4   score         731 non-null    int64  
 5   num_comments  731 non-null    int64  
 6   author        731 non-null    object 
dtypes: float64(1), int64(2), object(4)
memory usage: 45.7+ KB


In [10]:
# Export to csv
df_android.to_csv('../data/android_subreddit v3.csv', index = False)

### Merging data collection

In [11]:
df_android_v1 = pd.read_csv('../data/android_subreddit.csv')
df_apple_v1 = pd.read_csv('../data/apple_subreddit.csv')
df_android_v2 = pd.read_csv('../data/android_subreddit v2.csv')
df_apple_v2 = pd.read_csv('../data/apple_subreddit v2.csv')
df_android_v3 = pd.read_csv('../data/android_subreddit v3.csv')

In [12]:
# merge dataframes
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
combined_android_df = pd.concat([df_android_v2, df_android_v1], ignore_index=True)
combined_apple_df = pd.concat([df_apple_v2, df_apple_v1], ignore_index=True)

In [13]:
combined_android_df = pd.concat([combined_android_df, df_android_v3], ignore_index=True)

In [14]:
# Drop duplicates 
combined_android_df.drop_duplicates(subset='title', keep="first", inplace=True)
combined_android_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 776 entries, 0 to 1476
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   subreddit     776 non-null    object 
 1   selftext      200 non-null    object 
 2   title         776 non-null    object 
 3   upvote_ratio  776 non-null    float64
 4   score         776 non-null    int64  
 5   num_comments  776 non-null    int64  
 6   author        776 non-null    object 
dtypes: float64(1), int64(2), object(4)
memory usage: 48.5+ KB


In [15]:
combined_apple_df.drop_duplicates(subset='title', keep="first", inplace=True)
combined_apple_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 937 entries, 0 to 1607
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   subreddit     937 non-null    object 
 1   selftext      225 non-null    object 
 2   title         937 non-null    object 
 3   upvote_ratio  937 non-null    float64
 4   score         937 non-null    int64  
 5   num_comments  937 non-null    int64  
 6   author        937 non-null    object 
dtypes: float64(1), int64(2), object(4)
memory usage: 58.6+ KB


In [16]:
# Export to csv
combined_android_df.to_csv('../data/android_subreddit_final.csv', index = False)
combined_apple_df.to_csv('../data/apple_subreddit_final.csv', index = False)
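
The merge-then-deduplicate pattern used above can be seen on a toy example. The sketch below uses two made-up batches (`batch_1`, `batch_2`) that share one title. It also resets the index after dropping duplicates, which the cells above do not do; that is why their `info()` output reports index ranges such as `0 to 1476` for 776 rows.

```python
import pandas as pd

# Two hypothetical scrape batches that overlap on "Post B".
batch_1 = pd.DataFrame({"title": ["Post A", "Post B"], "score": [10, 20]})
batch_2 = pd.DataFrame({"title": ["Post B", "Post C"], "score": [25, 30]})

# Stack the batches, then keep the first occurrence of each title.
combined = pd.concat([batch_1, batch_2], ignore_index=True)
deduped = (
    combined
    .drop_duplicates(subset="title", keep="first")
    .reset_index(drop=True)  # renumber rows 0..n-1 after dropping
)
print(deduped)
```

Because `keep="first"` retains the earliest row, the order in which the batches are concatenated decides which copy of a repeated post survives.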

### Data Dictionary

|Feature|Type|Dataset|Description|
|---|---|---|---|
|subreddit|str|android/ apple subreddits|The name of the subreddit|
|selftext|str|android/ apple subreddits|The body text of each subreddit post|
|title|str|android/ apple subreddits|The title of each subreddit post|
|upvote_ratio|float|android/ apple subreddits|The number of upvotes a post has, divided by the total number of votes the post received|
|score|int|android/ apple subreddits|The net score of a post (upvotes minus downvotes)|
|num_comments|int|android/ apple subreddits|The number of comments a post has|
|author|str|android/ apple subreddits|The username of the post's author|
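
As a worked example of the `upvote_ratio` definition in the table, the sketch below computes the ratio from made-up vote counts. The function name and the numbers are illustrative assumptions; the scraped data already contains the ratio, so nothing here is needed to reproduce the dataset.

```python
def upvote_ratio(upvotes, downvotes):
    """Share of all votes on a post that are upvotes."""
    total = upvotes + downvotes
    if total == 0:
        raise ValueError("post has no votes")
    return upvotes / total


# Hypothetical post with 97 upvotes and 3 downvotes.
print(upvote_ratio(97, 3))  # 0.97
```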