# Data Collection

This notebook is dedicated to the initial data collection of my term project. The first thing that I need to do is install PRAW, which is an API wrapper used to scrape data from Reddit. Information regarding PRAW and its documentation can be found [here](https://praw.readthedocs.io/en/stable/).

In [3]:
pip install praw

Note: you may need to restart the kernel to use updated packages.


The next thing that I will do is import all libraries that I need for data collection. The libraries and their uses are as follows:
- **PRAW**: Reddit Reddit API Wrapper that allows users to work with various aspects of Reddit(subreddits, posts, etc.)
- **PANDAS**: Data Analysis library that allows users to easily work with data 
- **NUMPY**: A library for matrices, arrays, and mathematical functions

In [1]:
# This cell contains all of the imports that I will be using
import praw
import pandas as pd
import numpy as np

After importing everything, I can now work with the API to select and clean up data from Reddit.

In [19]:
# This initializes the API with information that is given by the Reddit API when you register a script on their website
pi = open("privateInfo.txt", "r")
privateInfo = []
[privateInfo.append(x) for x in pi.read().split('\n')]
reddit = praw.Reddit(client_id=privateInfo[0], client_secret=privateInfo[1], user_agent=privateInfo[2])

### Gaming Subreddit

For the purposes of my project, I will be analyzing posts from various subreddits on Reddit. The first subreddit that I will be collecting and cleaning up posts from is [r/gaming](https://www.reddit.com/r/gaming/).

In [20]:
# Initializing gamer_posts to the subreddit titled "gaming"
gamer_posts = reddit.subreddit('gaming')

In [21]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to the discussion of all things related to gaming
gamer_posts.description

'**If your submission does not appear, do not delete it. Simply [message the moderators](https://www.reddit.com/message/compose?to=%2Fr%2Fgaming) and ask us to look into it.**\n\n*Do NOT private message or use reddit chat to contact moderators about moderator actions. Only message the team via the link above. Directly messaging individual moderators may result in a temporary ban.*\n\n\n\n---\n#Community Rules\n\n1. **Submissions must be directly gaming-related**, not just a "forced" connection via the title or a caption added to the content.  Note that we do not allow non-gaming meme templates as submissions. **Discussion prompts must be made as text posts.**\n\n\n\n1. No bandwagon/raid/"pass it on" or direct reply posts.\n\n1. No piracy, even "abandonware".\n\n1. Mark your spoilers and NSFW submissions, comments and links. Spoiler tags are `>!X kills Y!<`  . Cosplay posts from content creators who focus primarily is adult content will be removed.\n\n1. No Giveaways / Trades / Contests

In [8]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in gamer_posts.new(limit = 4000)]
[authors.append(x.author) for x in gamer_posts.new(limit = 4000)]
[titles.append(x.title) for x in gamer_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in gamer_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in gamer_posts.new(limit = 4000)]
[scores.append(x.score) for x in gamer_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in gamer_posts.new(limit = 4000)]

# New dataframe
gaming_df = pd.DataFrame()

# Assigning lists to columns
gaming_df['Title'] = titles
gaming_df['Id'] = ids
gaming_df['Text'] = texts
gaming_df['Author'] = authors
gaming_df['Number of Comments'] = numComments
gaming_df['Number of Upvotes'] = scores
gaming_df['Ratio of Upvotes'] = upvoteRatios

# Print head
gaming_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,hey people how do I get rid of this dust?,11akfej,,Lychee247,4,1,1.0
1,What game studio do you trust with your eyes c...,11akf5y,,olatungijidejj,3,1,1.0
2,atomic heart anal,11ak8mn,is atomic heart have anal? can you anal? if ca...,ishitmyselfhard,9,0,0.15
3,God of War should have won GOTY,11ak6wl,I am currently playing God of War and it shoul...,msr2020,10,0,0.38
4,"I finally finished Super Mario 64, it only too...",11ajv4m,,forgiven_obscenity,5,17,0.95
5,"Another ""Live service"" with a battle pass...",11ajrq4,,DynamiteSuren,5,8,0.83
6,Epic gamer moment,11ajrkd,,nathnhart1,0,10,0.86
7,Maybe not an every day question - is there a t...,11a5knm,I've been looking to get something like this t...,CreativeFun228,1,3,1.0
8,"But guys, real communism has never really been...",11ajntx,,_Killj0y_,9,0,0.41
9,Sons of The Forest early access living up to a...,11ajnbk,,Mysterii00,2,5,0.86


In [9]:
# Printing out the shape, essentially the number of entries and columns
gaming_df.shape

(902, 7)

The next thing that I am going to do is fill all empty values with NaN. After this, I will remove all entries with NaN values. This will significantly reduce the number of entries, which I can account for through adding more entries at a later date if needed. This step will occur for all subreddits that I collect posts from.

In [10]:
gaming_df.replace('', np.nan, inplace = True)
gaming_df.dropna(inplace = True)
gaming_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
2,atomic heart anal,11ak8mn,is atomic heart have anal? can you anal? if ca...,ishitmyselfhard,9,0,0.15
3,God of War should have won GOTY,11ak6wl,I am currently playing God of War and it shoul...,msr2020,10,0,0.38
7,Maybe not an every day question - is there a t...,11a5knm,I've been looking to get something like this t...,CreativeFun228,1,3,1.0
17,Switch or Steam Deck?,11aj1w1,I wanna buy one or the other..Ive had a switch...,brokenheatherrrrr,13,2,1.0
19,Games like southpark TFBW,11ailip,Hello to everyone. As a big fan of the southpa...,FatzoChewkovski,17,3,0.83
21,Really need some help regarding Xbox Series X ...,11aidu3,I currently have a 3 year Xbox Gamepass Ultima...,TheObviousBurnerAcc,25,1,0.6
22,What's a good game for somebody who doesn't ha...,11aiczn,I'm just looking for a game that doesn't requi...,Dmanduck,14,2,1.0
25,how do I start playing fps games?,11ahyvd,"I only ever play games like Minecraft, terrari...",XC3LL1UM,8,1,0.67
27,Samurai Games like Vagabond,11ahuie,Recently started reading a manga called vagabo...,YaBoiMibb,6,2,1.0
28,Playstation Franchises,11ahrg8,It seems and these are just my own thoughts an...,undefeated_Equality,2,2,0.75


In [11]:
gaming_df.shape

(463, 7)

In [133]:
gaming_df.to_csv('gamingData.csv')

### Lawyertalk subreddit

The second subreddit that I will be collecting and cleaning up posts from is [r/lawyertalk](https://www.reddit.com/r/Lawyertalk/).

In [12]:
# Initializing lawyer_posts to the subreddit titled "lawyertalk"
lawyer_posts = reddit.subreddit('lawyertalk')

In [13]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to discussion between lawyers
lawyer_posts.description

'This is a place for practicing lawyers to discuss their profession and everything associated with it.  Unlike [/r/law](http://www.reddit.com/r/law/), this is not a place for posting articles or updates about the legal world at large.  Rather, this subreddit is for discussion about lawyering itself.    \n\nBasically, this a great place to:\n\n* Discuss/lament the culture of your firm/non-profit/whatever\n\n* Get advice from other practicing lawyers on anything.\n\n* Vent about issues only other lawyers would find interesting (AEDPA anyone?)\n\n* Post esoteric memes\n\n_______________________________________\n\nRelated Subereddits:\n\n[/r/law](http://www.reddit.com/r/law/) - For discussion about legal news, and law in the abstract\n\n[/r/lawschool](http://www.reddit.com/r/lawschool/) - For discussion about law school\n\n[/r/lawfirm](http://www.reddit.com/r/lawfirm/) - For discussion about solo/small firm practice'

In [14]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in lawyer_posts.new(limit = 4000)]
[authors.append(x.author) for x in lawyer_posts.new(limit = 4000)]
[titles.append(x.title) for x in lawyer_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in lawyer_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in lawyer_posts.new(limit = 4000)]
[scores.append(x.score) for x in lawyer_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in lawyer_posts.new(limit = 4000)]

# New dataframe
lawyer_df = pd.DataFrame()

# Assigning lists to columns
lawyer_df['Title'] = titles
lawyer_df['Id'] = ids
lawyer_df['Text'] = texts
lawyer_df['Author'] = authors
lawyer_df['Number of Comments'] = numComments
lawyer_df['Number of Upvotes'] = scores
lawyer_df['Ratio of Upvotes'] = upvoteRatios

# Print head
lawyer_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,Legal advice,11akhq9,"So basically on discord, I was in a crypto tra...",Charming_Anteater_73,0,1,1.0
1,Has anyone ever worked with or against Alex Mu...,11ah03n,"If so, what’s he like?",Tracy_Turnblad,1,2,1.0
2,JAG opinion,11aglti,How do you guys feel about JAGs?\n\nnot the tv...,tyrionthedrunk,3,1,1.0
3,Private sub for practicing r/Prosecutors,11aeag7,We recently started a new sub for practicing r...,weirdbeardwolf,1,0,0.33
4,Just another day as a commercial lawyer.,11adfsj,,KillerDadBod,15,63,0.97
5,Looking for a quote about lawyers defending gu...,11abm2s,Hello! I heard a quote a while ago about the m...,EarlyInterview1274,3,0,0.33
6,If I want to go to law school & be a criminal ...,11abler,,Interesting-Bar6885,24,0,0.47
7,Best SEO companies that get results and charge...,11aauk1,I am working with an SEO company who has gotte...,Fragrant_Self_7104,2,0,0.33
8,Paralegal was rude to me. Am I being unreasona...,11a9u7r,"New hire at an ID firm. Lot of work, lot of pr...",Mission_Ad5628,29,4,0.65
9,Interview?,11a84q5,"Interview? Hello everyone, I'm currently in co...",Then-Poem7543,4,3,0.67


In [15]:
lawyer_df.shape

(891, 7)

In [16]:
lawyer_df.replace('', np.nan, inplace = True)
lawyer_df.dropna(inplace = True)
lawyer_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,Legal advice,11akhq9,"So basically on discord, I was in a crypto tra...",Charming_Anteater_73,0,1,1.0
1,Has anyone ever worked with or against Alex Mu...,11ah03n,"If so, what’s he like?",Tracy_Turnblad,1,2,1.0
2,JAG opinion,11aglti,How do you guys feel about JAGs?\n\nnot the tv...,tyrionthedrunk,3,1,1.0
3,Private sub for practicing r/Prosecutors,11aeag7,We recently started a new sub for practicing r...,weirdbeardwolf,1,0,0.33
5,Looking for a quote about lawyers defending gu...,11abm2s,Hello! I heard a quote a while ago about the m...,EarlyInterview1274,3,0,0.33
7,Best SEO companies that get results and charge...,11aauk1,I am working with an SEO company who has gotte...,Fragrant_Self_7104,2,0,0.33
8,Paralegal was rude to me. Am I being unreasona...,11a9u7r,"New hire at an ID firm. Lot of work, lot of pr...",Mission_Ad5628,29,4,0.65
9,Interview?,11a84q5,"Interview? Hello everyone, I'm currently in co...",Then-Poem7543,4,3,0.67
10,Do you generally hate other lawyers?,11a7f7y,"Hate is defined as, you wouldn't totally feel ...",SaltMembership9044,22,1,0.5
11,"In the last 30 hours, I’ve billed 24 of them.",11a5b2z,"Just needed to vent, and also something of a P...",olemiss18,56,117,0.95


In [17]:
lawyer_df.shape

(719, 7)

In [134]:
lawyer_df.to_csv('lawyerData.csv')

### Cryptocurrency subreddit

The third subreddit that I will be collecting and cleaning up posts from is [r/cryptocurrency](https://www.reddit.com/r/CryptoCurrency/).

In [18]:
# Initializing cryptocurrency_posts to the subreddit titled "cryptocurrency"
cryptocurrency_posts = reddit.subreddit('cryptocurrency')

In [19]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to discussion and news about cryptocurrency
cryptocurrency_posts.description



In [22]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in cryptocurrency_posts.new(limit = 4000)]
[authors.append(x.author) for x in cryptocurrency_posts.new(limit = 4000)]
[titles.append(x.title) for x in cryptocurrency_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in cryptocurrency_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in cryptocurrency_posts.new(limit = 4000)]
[scores.append(x.score) for x in cryptocurrency_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in cryptocurrency_posts.new(limit = 4000)]

# New dataframe
cryptocurrency_df = pd.DataFrame()

# Assigning lists to columns
cryptocurrency_df['Title'] = titles
cryptocurrency_df['Id'] = ids
cryptocurrency_df['Text'] = texts 
cryptocurrency_df['Author'] = authors
cryptocurrency_df['Number of Comments'] = numComments
cryptocurrency_df['Number of Upvotes'] = scores
cryptocurrency_df['Ratio of Upvotes'] = upvoteRatios

# Print head
cryptocurrency_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,BUSD fell to $0.20 on Binance after trader sol...,11akgt0,,sublimeload420,11,0,0.43
1,10 Underrated Cryptocurrencies to Watch in 2023,11akfst,1.) Cardano (ADA): A proof-of-stake blockchain...,Fptmike,28,5,0.65
2,Individual reddit sub currencies.,11akf6a,Moons are a currency you can earn in this sub....,Weary_Dark510,19,6,0.88
3,Massachusetts man charged after mining crypto ...,11ak9bt,,alecz123,12,7,1.0
4,There are now more than 130 U.S. banks activel...,11ak6rd,,kryptoNoob69420,11,5,1.0
5,If art can be a store of value then so can Bit...,11ak2xk,I see BTC as a testament to a combination of i...,Frosty-Cone,38,12,0.8
6,Dead brands & web3 - a cash grab or innovation?,11ajbq4,[Radioshack Token:](https://coinmarketcap.com/...,Wack0Wizard,11,12,1.0
7,ETH and Decepticons in OK? - P1,11ajww6,"Hey there fellow investors, I've got some exci...",My_mate_Miyaguchi,17,8,0.83
8,A neat trick to backtest your trading strategy...,11ajpdc,Most of the people who purchase premium versio...,Chysce,8,5,1.0
9,SEC is not the appropriate regulator for stabl...,11ajm81,,z0uNdz,17,13,1.0


In [23]:
cryptocurrency_df.shape

(911, 7)

In [24]:
cryptocurrency_df.replace('', np.nan, inplace = True)
cryptocurrency_df.dropna(inplace = True)
cryptocurrency_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
1,10 Underrated Cryptocurrencies to Watch in 2023,11akfst,1.) Cardano (ADA): A proof-of-stake blockchain...,Fptmike,28,5,0.65
2,Individual reddit sub currencies.,11akf6a,Moons are a currency you can earn in this sub....,Weary_Dark510,19,6,0.88
5,If art can be a store of value then so can Bit...,11ak2xk,I see BTC as a testament to a combination of i...,Frosty-Cone,38,12,0.8
6,Dead brands & web3 - a cash grab or innovation?,11ajbq4,[Radioshack Token:](https://coinmarketcap.com/...,Wack0Wizard,11,12,1.0
7,ETH and Decepticons in OK? - P1,11ajww6,"Hey there fellow investors, I've got some exci...",My_mate_Miyaguchi,17,8,0.83
8,A neat trick to backtest your trading strategy...,11ajpdc,Most of the people who purchase premium versio...,Chysce,8,5,1.0
15,"For your own sake, stop mentioning crypto to p...",11aj2j4,It is definitely exciting to talk about crypto...,fatfk69,116,32,0.94
18,Need Ideas For My First Crypto Focused Website...,11aiq66,Hello everyone I’m a webmaster and hold a Degr...,Goal2030_1B,27,2,0.5
19,"Liquid staking, Interchain Security and the fi...",11aiinb,The Cosmos Hub is rapidly gaining a centraliza...,Jcook_14,13,3,0.64
20,Sat/byte,11ai8q5,Can I get help understanding the concept of 1 ...,Smok_eater,15,5,0.78


In [25]:
cryptocurrency_df.shape

(323, 7)

In [135]:
cryptocurrency_df.to_csv('cryptoData.csv')

### Sports subreddit

The fourth subreddit that I will be collecting and cleaning up posts from is [r/nfl](https://www.reddit.com/r/nfl/).

In [26]:
# Initializing sports_posts to the subreddit titled "nfl"
sports_posts = reddit.subreddit('nfl')

In [27]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to discussion about nfl
sports_posts.description

'[Follow us on Twitter @nflreddit](https://twitter.com/nflreddit "ad-twitter")\n\n[Reddit is AntiCSS - read what the redesign means to r/nfl](https://www.reddit.com/r/nfl/comments/8glmp1/rnfl_the_redesign_and_the_future_of_reddit/ "pro-css")\n\n[Rules/Posting Guidelines](/r/nfl/wiki/postingguidelines "subreddit-rules")\n\n####FAQ\n\n[FAQ Page](/r/nfl/w/faq)  \n\n**[Select your team flairs here](https://www.reddit.com/r/nfl/wiki/flair/primary)**\n\n[Which team should I root for?](https://www.reddit.com/r/nfl/wiki/new/pickateam)\n\n#### [](/blank "START scores")\n\n## Super Bowl Schedule\n\nTime|Away||@||Home\n|:--|:--:|--:|:--:|:--|:--:|\nSun 06:30PM|[*KC*](/r/kansascitychiefs)|38|@|35|[*PHI*](/r/eagles)\n\n\n#### [](/blank "END scores")\n\n*All times (EST)*\n\n[Which games will be shown locally?](http://506sports.com/nfl/)\n\n\n\n#### [](/blank "START standings")\n\n## Standings\n\n|AFC|North|AFC|South|\n|--:|:--|--:|:--|\n|[*CIN*](/r/bengals)|12-4^xz|[*JAX*](/r/jaguars)|9-8^xz|\n|[*BA

In [28]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in sports_posts.new(limit = 4000)]
[authors.append(x.author) for x in sports_posts.new(limit = 4000)]
[titles.append(x.title) for x in sports_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in sports_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in sports_posts.new(limit = 4000)]
[scores.append(x.score) for x in sports_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in sports_posts.new(limit = 4000)]

# New dataframe
sports_df = pd.DataFrame()

# Assigning lists to columns
sports_df['Title'] = titles
sports_df['Id'] = ids
sports_df['Text'] = texts 
sports_df['Author'] = authors
sports_df['Number of Comments'] = numComments
sports_df['Number of upvotes'] = scores
sports_df['Ratio of Upvotes'] = upvoteRatios

# Print head
sports_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Heres a video on the famous cave,11ajmr3,,Niners22,6,0,0.26
1,[Fowler] The #Lions are signing defensive coor...,11aje9q,,politicallyMarston,10,22,0.96
2,does the meaningless of fall/winter weather fo...,11aj97q,i’ve been pondering. it’s been… so windy these...,bear-the-bear,12,0,0.26
3,[PFT] Former NFL receiver Sam Hurd released fr...,11ahof2,,OneAngryPanda,96,201,0.97
4,[PFF] Lane Johnson did not allow a single QB h...,11ahcyg,,SourBerry1425,42,148,0.94
5,Found this Bleacher Report article from 2011 -...,11afx2e,https://syndication.bleacherreport.com/amp/697...,Eravian,81,46,0.86
6,Looking at Chris Simms' Top 40 QB List from 2....,11aflur,https://sports.nbcsports.com/2020/07/07/top-40...,kid-vicious,80,64,0.77
7,How much longer do you think the longer tenure...,11afbcv,How many more seasons do you think there will ...,Fearless-Muffin,45,15,0.72
8,32 teams / 32 days: Day x - The New Orleans Sa...,11ae2ye,***32 teams / 32 days: Day x - The New Orleans...,Firefawkes17,46,61,0.86
9,"[Rosenthal] its scary for NFL owners, but yes ...",11adcu9,,Jay_Dubbbs,262,498,0.89


In [29]:
sports_df.shape

(965, 7)

In [30]:
sports_df.replace('', np.nan, inplace = True)
sports_df.dropna(inplace = True)
sports_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
2,does the meaningless of fall/winter weather fo...,11aj97q,i’ve been pondering. it’s been… so windy these...,bear-the-bear,12,0,0.26
5,Found this Bleacher Report article from 2011 -...,11afx2e,https://syndication.bleacherreport.com/amp/697...,Eravian,81,46,0.86
6,Looking at Chris Simms' Top 40 QB List from 2....,11aflur,https://sports.nbcsports.com/2020/07/07/top-40...,kid-vicious,80,64,0.77
7,How much longer do you think the longer tenure...,11afbcv,How many more seasons do you think there will ...,Fearless-Muffin,45,15,0.72
8,32 teams / 32 days: Day x - The New Orleans Sa...,11ae2ye,***32 teams / 32 days: Day x - The New Orleans...,Firefawkes17,46,61,0.86
11,Total Number of Subscribers in Each NFL Team S...,11acnbq,"Hello, I felt like making a chart/graph thing ...",fluffy_77,44,58,0.86
13,Who is off the trade table completely?,11abs63,If the Chiefs were offered 5 number 1s for Mah...,LocalSteve504,336,161,0.83
14,Is Geno Smith the next Case Keenum? Or is he a...,11abgbb,When you think about it the situations are pre...,usernamesniping,101,105,0.88
22,Free agency fits for all 32 NFL teams:,119xski,&#x200B;\n\nhttps://preview.redd.it/toejshpdbx...,hallach_halil,0,0,0.48
27,"Other than Patrick Mahomes and Brett Favre, wh...",11a8r9z,"I could see Brady, Unitas, and Rodgers getting...",nickelst92,459,304,0.78


In [31]:
sports_df.shape

(285, 7)

In [136]:
sports_df.to_csv('sportsData.csv')

### College subreddit

The fifth subreddit that I will be collecting and cleaning up posts from is [r/college](https://www.reddit.com/r/college/).

In [32]:
# Initializing cryptocurrency_posts to the subreddit titled "college"
college_posts = reddit.subreddit('college')

In [33]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to discussions about various aspects of college
college_posts.description

'#####Please see our [rules](https://www.reddit.com/r/college/about/rules/) before posting here.\n\n/r/college is a place for discussion related to college and collegiate life.\n\nTo maintain the quality of the discourse, we remove some types of content and ban users for certain violations of community norms. *Help the mods improve this subreddit/enforce these rules by reporting posts that are irrelevant, pointless, or of poor quality.*\n\n**Behavior that will result in a user ban:**\n\n1. **Posting spam** - including but not limited to SURVEYS, blog posts, links to low quality/crowdsourced websites, discord, copypasta, etc.\n2. **Seeking personal gain** – including but not limited to referrals, contests/giveaways, requests for votes/money, any attempt to sell or advertise a product/service/website, etc.\n3. **Engaging in threats or harassment.** \n4. **Advocating for dangerous or illegal activities** – including but not limited to cheating, copyright violation, fraud, etc. \n5. **Spre

In [34]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in college_posts.new(limit = 4000)]
[authors.append(x.author) for x in college_posts.new(limit = 4000)]
[titles.append(x.title) for x in college_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in college_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in college_posts.new(limit = 4000)]
[scores.append(x.score) for x in college_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in college_posts.new(limit = 4000)]

# New dataframe
college_df = pd.DataFrame()

# Assigning lists to columns
college_df['Title'] = titles
college_df['Id'] = ids
college_df['Text'] = texts 
college_df['Author'] = authors
college_df['Number of Comments'] = numComments
college_df['Number of upvotes'] = scores
college_df['Ratio of Upvotes'] = upvoteRatios

# Print head
college_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Dorm room,11ake77,"Hey guys, so I don’t have a roommate this seme...",Own-Ambition6657,1,1,1.0
1,AP class choices in high school,11ak4w6,I’ve been told that in high school rather than...,Competitive_Contact5,1,1,1.0
2,It's been a while to go college after almost 4...,11ak3ys,Hi. I'm (21F) and I will go to the college soo...,Wary-Unrest,0,1,0.99
3,Would double majoring in community college be ...,11ajplp,"The title, basically. I'm currently in communi...",CallOfTheQueer,2,3,1.0
4,What classes to take senior year?,11aiken,I have been fortunate to satisfy *all* of my g...,PizzaZealousideal,0,2,1.0
5,Is getting a job during school worth it?,11aihvd,My parents think I should be working during co...,Strange616616,4,2,1.0
6,How much harder is a BS in psychology than a B...,11ahzma,I struggle with my mental health and I’m worri...,th1399,11,3,1.0
7,"A college ""corrected my FAFSA"" and now my EFC ...",11ahn18,"Hey everyone! So I am a high school senior, an...",MaximumP0wer,0,6,1.0
8,"I’m lost, I don’t know what to do! Depression ...",11ah6tu,"This semester Im taking 4 courses, Calc2, disc...",Ok_Ad_9986,2,1,1.0
9,Should I attend a lower quality college with v...,11afnhk,I am accepted into both University of Mary Har...,av4n_iv,4,7,1.0


In [35]:
college_df.shape

(956, 7)

In [36]:
college_df.replace('', np.nan, inplace = True)
college_df.dropna(inplace = True)
college_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Dorm room,11ake77,"Hey guys, so I don’t have a roommate this seme...",Own-Ambition6657,1,1,1.0
1,AP class choices in high school,11ak4w6,I’ve been told that in high school rather than...,Competitive_Contact5,1,1,1.0
2,It's been a while to go college after almost 4...,11ak3ys,Hi. I'm (21F) and I will go to the college soo...,Wary-Unrest,0,1,0.99
3,Would double majoring in community college be ...,11ajplp,"The title, basically. I'm currently in communi...",CallOfTheQueer,2,3,1.0
4,What classes to take senior year?,11aiken,I have been fortunate to satisfy *all* of my g...,PizzaZealousideal,0,2,1.0
5,Is getting a job during school worth it?,11aihvd,My parents think I should be working during co...,Strange616616,4,2,1.0
6,How much harder is a BS in psychology than a B...,11ahzma,I struggle with my mental health and I’m worri...,th1399,11,3,1.0
7,"A college ""corrected my FAFSA"" and now my EFC ...",11ahn18,"Hey everyone! So I am a high school senior, an...",MaximumP0wer,0,6,1.0
8,"I’m lost, I don’t know what to do! Depression ...",11ah6tu,"This semester Im taking 4 courses, Calc2, disc...",Ok_Ad_9986,2,1,1.0
9,Should I attend a lower quality college with v...,11afnhk,I am accepted into both University of Mary Har...,av4n_iv,4,7,1.0


In [37]:
college_df.shape

(934, 7)

In [137]:
college_df.to_csv('collegeData.csv')

### Explain Like I'm Five subreddit

The sixth subreddit that I will be collecting and cleaning up posts from is [r/explainlikeimfive](https://www.reddit.com/r/explainlikeimfive/).

In [38]:
# Initializing cryptocurrency_posts to the subreddit titled "Explain Like I'm Five"
elif_posts = reddit.subreddit('explainlikeimfive')

In [39]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people asking for concepts to be explained very simply for them
elif_posts.description

"[Request an explanation](/r/explainlikeimfive/submit?selftext=true&title=ELI5%3A)\n\n[Rules](https://www.reddit.com/r/explainlikeimfive/wiki/detailed_rules#)\n\n\n---\n\n[Have an idea to improve ELI5?  r/IdeasForELI5](http://www.reddit.com/r/ideasforeli5)\n\n---\n\n###Before posting \n\n* Make sure to [ read the rules!](https://www.reddit.com/r/explainlikeimfive/wiki/detailed_rules)\n\n* This subreddit is for asking for objective explanations. It is not a repository for any question you may have.\n\n* E is for Explain - merely answering a question is not enough.\n\n* LI5 means friendly, simplified and layperson-accessible explanations - not responses aimed at literal five-year-olds.\n\n* Perform a keyword search, you may find good explanations in past threads. You should also consider looking for your question in the FAQ.\n\n* Don't post to argue a point of view.\n\n\n* Flair your question after you've submitted it.\n\n---\n###Category filters\n\n---\n[Mathematics](https://ma.reddit.c

In [40]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in elif_posts.new(limit = 4000)]
[authors.append(x.author) for x in elif_posts.new(limit = 4000)]
[titles.append(x.title) for x in elif_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in elif_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in elif_posts.new(limit = 4000)]
[scores.append(x.score) for x in elif_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in elif_posts.new(limit = 4000)]

# New dataframe
elif_df = pd.DataFrame()

# Assigning lists to columns
elif_df['Title'] = titles
elif_df['Id'] = ids
elif_df['Text'] = texts 
elif_df['Author'] = authors
elif_df['Number of Comments'] = numComments
elif_df['Number of upvotes'] = scores
elif_df['Ratio of Upvotes'] = upvoteRatios

# Print head
elif_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,ELI5: How do hotels ensure there is plenty of ...,11akm7x,Particularly in the morning when many guess ar...,iBarbo,2,3,1.0
1,ELI5: How do cash registers/card readers work?,11ajr0n,I.e. if the cashier were to hit 'paid' on the ...,DJ_Jonezy,4,2,1.0
2,ELI5: How can companies give a “discount” for ...,11ajaiu,Isn’t paying more (not getting a discount) bec...,chickenlickeniii,4,4,1.0
3,ELI5 Difference between depth charges and mines,11aiyih,And which one is more dangerous to a ship or s...,yestext,5,3,0.75
4,ELI5:How can time be relative when the univers...,11ai4aw,It seems to me that we ought to be able to use...,AromaticDetective565,8,7,0.8
5,"eli5, ive seen videos with blue and red lights...",11ah8uy,,migthylord,9,6,1.0
6,ELI5: How does my 128gb iPhone store dozens of...,11agafr,,1stnate,3,0,0.25
7,ELI5 : What actually is time?,11afy46,I can't wrap my head around the theory of rela...,stoic_payyan,9,0,0.38
8,ELI5: Why is wild caught fish healthier than f...,11af2wr,,Public_Tomatillo_966,40,61,0.79
9,ELI5: how does high affinity mean high reactivty?,11aeyby,"im having a hard time grasping this concept, h...",Conscious-Comment568,10,2,0.75


In [41]:
elif_df.shape

(984, 7)

In [42]:
elif_df.replace('', np.nan, inplace = True)
elif_df.dropna(inplace = True)
elif_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,ELI5: How do hotels ensure there is plenty of ...,11akm7x,Particularly in the morning when many guess ar...,iBarbo,2,3,1.0
1,ELI5: How do cash registers/card readers work?,11ajr0n,I.e. if the cashier were to hit 'paid' on the ...,DJ_Jonezy,4,2,1.0
2,ELI5: How can companies give a “discount” for ...,11ajaiu,Isn’t paying more (not getting a discount) bec...,chickenlickeniii,4,4,1.0
3,ELI5 Difference between depth charges and mines,11aiyih,And which one is more dangerous to a ship or s...,yestext,5,3,0.75
4,ELI5:How can time be relative when the univers...,11ai4aw,It seems to me that we ought to be able to use...,AromaticDetective565,8,7,0.8
7,ELI5 : What actually is time?,11afy46,I can't wrap my head around the theory of rela...,stoic_payyan,9,0,0.38
9,ELI5: how does high affinity mean high reactivty?,11aeyby,"im having a hard time grasping this concept, h...",Conscious-Comment568,10,2,0.75
10,eli5 the differences in engineering between tr...,11adfuh,What are the differences between the Undergrou...,lankabrit,2,2,0.67
12,ELI5: How does the tongue taste things so spec...,11ad0df,Example: If i was to eat chicken and steak (as...,ibetonsport,7,0,0.55
16,"ELI5: Why are there no B4, B8, B10, or B11?",11aal46,I was curious and looked up how many B vitamin...,tokenwhitegirlx,10,25,0.72


In [43]:
elif_df.shape

(586, 7)

In [138]:
elif_df.to_csv('elifData.csv')

### Anime subreddit

The seventh subreddit that I will be collecting and cleaning up posts from is [r/anime](https://www.reddit.com/r/anime/).

In [44]:
# Initializing anime_posts to the subreddit titled "anime"
anime_posts = reddit.subreddit('anime')

In [45]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people who like anime
elif_posts.description

"[Request an explanation](/r/explainlikeimfive/submit?selftext=true&title=ELI5%3A)\n\n[Rules](https://www.reddit.com/r/explainlikeimfive/wiki/detailed_rules#)\n\n\n---\n\n[Have an idea to improve ELI5?  r/IdeasForELI5](http://www.reddit.com/r/ideasforeli5)\n\n---\n\n###Before posting \n\n* Make sure to [ read the rules!](https://www.reddit.com/r/explainlikeimfive/wiki/detailed_rules)\n\n* This subreddit is for asking for objective explanations. It is not a repository for any question you may have.\n\n* E is for Explain - merely answering a question is not enough.\n\n* LI5 means friendly, simplified and layperson-accessible explanations - not responses aimed at literal five-year-olds.\n\n* Perform a keyword search, you may find good explanations in past threads. You should also consider looking for your question in the FAQ.\n\n* Don't post to argue a point of view.\n\n\n* Flair your question after you've submitted it.\n\n---\n###Category filters\n\n---\n[Mathematics](https://ma.reddit.c

In [48]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in anime_posts.new(limit = 4000)]
[authors.append(x.author) for x in anime_posts.new(limit = 4000)]
[titles.append(x.title) for x in anime_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in anime_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in anime_posts.new(limit = 4000)]
[scores.append(x.score) for x in anime_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in anime_posts.new(limit = 4000)]

# New dataframe
anime_df = pd.DataFrame()

# Assigning lists to columns
anime_df['Title'] = titles
anime_df['Id'] = ids
anime_df['Text'] = texts 
anime_df['Author'] = authors
anime_df['Number of Comments'] = numComments
anime_df['Number of upvotes'] = scores
anime_df['Ratio of Upvotes'] = upvoteRatios

# Print head
anime_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,anyone know somewhere you can watch anime with...,11akstj,I'm trying to show a class anime but not every...,kunfookitten,3,0,0.5
1,What are some Anime that have great concepts/i...,11akn99,For me a good example would be Attack on Titan...,Styari,0,0,0.2
2,missing anime like golden time and my little m...,11akbls,its been so long since i could find an anime l...,Maclordsybyr,3,1,1.0
3,"detective Conan fans ,why do you guys still wa...",11akb7p,I used to be fan of detective Conan myself but...,Shillofnoone,3,0,0.33
4,Anime with bittersweet moments,11ak05k,Something like Yui from Angel Beats or Sayori ...,IntelligentFinance59,5,0,0.5
5,Rank your top 5 anime movies of all time!!,11ajrsb,5.) Mugen Train\n\n4.) AKIRA\n \n3.) Nausicaa ...,KnightofAmethyst,11,0,0.4
6,Lycoris Recoil Episode 6,11ajrmq,Is there a reason Chisato got within hand-to-h...,WhoGAF,14,0,0.36
7,"40yo father, watched some new anime with his kids",11ajo7x,"Hi there, this might come off as a strange pos...",Current-Rub9309,4,4,0.75
8,"In your opinion, what anime has the best intro...",11ajfrb,Mine is either More than a Married Couple but ...,StuffedCheeseBoi,5,2,0.71
9,Any anime like this?,11aiwl8,"Anime that'll make you believe in something, a...",weishenmyguy,9,1,0.67


In [49]:
anime_df.shape

(964, 7)

In [50]:
anime_df.replace('', np.nan, inplace = True)
anime_df.dropna(inplace = True)
anime_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,anyone know somewhere you can watch anime with...,11akstj,I'm trying to show a class anime but not every...,kunfookitten,3,0,0.5
1,What are some Anime that have great concepts/i...,11akn99,For me a good example would be Attack on Titan...,Styari,0,0,0.2
2,missing anime like golden time and my little m...,11akbls,its been so long since i could find an anime l...,Maclordsybyr,3,1,1.0
3,"detective Conan fans ,why do you guys still wa...",11akb7p,I used to be fan of detective Conan myself but...,Shillofnoone,3,0,0.33
4,Anime with bittersweet moments,11ak05k,Something like Yui from Angel Beats or Sayori ...,IntelligentFinance59,5,0,0.5
5,Rank your top 5 anime movies of all time!!,11ajrsb,5.) Mugen Train\n\n4.) AKIRA\n \n3.) Nausicaa ...,KnightofAmethyst,11,0,0.4
6,Lycoris Recoil Episode 6,11ajrmq,Is there a reason Chisato got within hand-to-h...,WhoGAF,14,0,0.36
7,"40yo father, watched some new anime with his kids",11ajo7x,"Hi there, this might come off as a strange pos...",Current-Rub9309,4,4,0.75
8,"In your opinion, what anime has the best intro...",11ajfrb,Mine is either More than a Married Couple but ...,StuffedCheeseBoi,5,2,0.71
9,Any anime like this?,11aiwl8,"Anime that'll make you believe in something, a...",weishenmyguy,9,1,0.67


In [51]:
anime_df.shape

(761, 7)

In [139]:
anime_df.to_csv('animeData.csv')

### CS Career Questions subreddit

The eighth subreddit that I will be collecting and cleaning up posts from is [r/cscareerquestions](https://www.reddit.com/r/cscareerquestions/)

In [52]:
# Initializing ccq_posts to the subreddit titled "CS Career Questions"
ccq_posts = reddit.subreddit('cscareerquestions')

In [53]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people asking CS career questions
ccq_posts.description

'**Welcome, one and all, to CSCareerQuestions!** \n\nHere we discuss careers in Computer Science, Computer Engineering, Software Engineering, and related fields. Please keep the conversation professional, adhere to the [reddiquette](https://www.reddit.com/wiki/reddiquette), and remember to [READ OUR RULES](/r/cscareerquestions/w/posting_rules).\n\n---\n\n# Discord\n\nCSCQ regular u/Kevincav runs a discord called CS Career Hub. Please check it out for your chatting needs: https://discord.gg/cscareerhub\n\nPlease note that **we, the CSCQ mod team are not in charge of this discord.**\n\n---\n\n#Want to ask a question?\n\n* **First**: [Read the rules](https://www.reddit.com/r/cscareerquestions/wiki/posting_rules)\n\n* **Second**: [Check out this awesome "quick answers to common questions" thread](https://www.reddit.com/r/cscareerquestions/comments/4qurgo/in_which_i_attempt_to_answer_like_90_of_a_normal/)\n\n* **Third**: [Check the FAQ](http://www.reddit.com/r/cscareerquestions/wiki/index)\

In [56]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in ccq_posts.new(limit = 4000)]
[authors.append(x.author) for x in ccq_posts.new(limit = 4000)]
[titles.append(x.title) for x in ccq_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in ccq_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in ccq_posts.new(limit = 4000)]
[scores.append(x.score) for x in ccq_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in ccq_posts.new(limit = 4000)]

# New dataframe
ccq_df = pd.DataFrame()

# Assigning lists to columns
ccq_df['Title'] = titles
ccq_df['Id'] = ids
ccq_df['Text'] = texts 
ccq_df['Author'] = authors
ccq_df['Number of Comments'] = numComments
ccq_df['Number of upvotes'] = scores
ccq_df['Ratio of Upvotes'] = upvoteRatios

# Print head
ccq_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Does the “junior dev pushed to main” trope act...,11akxtx,In terms of software that is developed by a te...,kingcammyg,0,1,1.0
1,30 yo mlops engineer off-work for a year. Need...,11ajgeu,I am a 30yo Research engineer with computer vi...,barelyawake_3am,1,1,1.0
2,Should I be gunning for big tech internship?,11aibrb,Most of my friends are gunning for prestigious...,roninja2,4,1,0.67
3,The salary talk for a position,11ahewk,"So, I have done 4 out of 5 of my interviews fo...",CrunchyAl,6,4,0.84
4,Please help: I’m a veteran and recent grad in ...,11ah4rc,After hundreds of applications no one is inter...,TylerGatsby,48,33,0.86
5,Switch from Front End in failing crypto compan...,11ah1yf,Hi this may be a stupid question and I've been...,SlopDoggo,2,2,1.0
6,Problem solving accountability server.,11agl9o,Hello all! A quick introduction I am a SWE wit...,SwiftlyNarrow,0,0,0.5
7,SQL or JavaScript?,11agkzx,"From zero experience, if I want a career in ei...",LondonChels1,17,0,0.43
8,Do companies check if you have completed all y...,11agh3u,I received an offer that is a new grad role. T...,CSStudentCareer,6,1,1.0
9,What does a job in CS look like? I am becoming...,11agbbs,Hello! I am currently an aerospace engineering...,collegecolloquial,6,0,0.25


In [57]:
ccq_df.shape

(983, 7)

In [58]:
ccq_df.replace('', np.nan, inplace = True)
ccq_df.dropna(inplace = True)
ccq_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Does the “junior dev pushed to main” trope act...,11akxtx,In terms of software that is developed by a te...,kingcammyg,0,1,1.0
1,30 yo mlops engineer off-work for a year. Need...,11ajgeu,I am a 30yo Research engineer with computer vi...,barelyawake_3am,1,1,1.0
2,Should I be gunning for big tech internship?,11aibrb,Most of my friends are gunning for prestigious...,roninja2,4,1,0.67
3,The salary talk for a position,11ahewk,"So, I have done 4 out of 5 of my interviews fo...",CrunchyAl,6,4,0.84
4,Please help: I’m a veteran and recent grad in ...,11ah4rc,After hundreds of applications no one is inter...,TylerGatsby,48,33,0.86
5,Switch from Front End in failing crypto compan...,11ah1yf,Hi this may be a stupid question and I've been...,SlopDoggo,2,2,1.0
6,Problem solving accountability server.,11agl9o,Hello all! A quick introduction I am a SWE wit...,SwiftlyNarrow,0,0,0.5
7,SQL or JavaScript?,11agkzx,"From zero experience, if I want a career in ei...",LondonChels1,17,0,0.43
8,Do companies check if you have completed all y...,11agh3u,I received an offer that is a new grad role. T...,CSStudentCareer,6,1,1.0
9,What does a job in CS look like? I am becoming...,11agbbs,Hello! I am currently an aerospace engineering...,collegecolloquial,6,0,0.25


In [59]:
ccq_df.shape

(982, 7)

In [132]:
ccq_df.to_csv('ccqData.csv')

### Rant! subreddit

The tenth subreddit that I will be collecting and cleaning up posts from is [r/rant](https://www.reddit.com/r/rant/)

In [60]:
# Initializing rant_posts to the subreddit titled "Rant!"
rant_posts = reddit.subreddit('cscareerquestions')

In [61]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people ranting
rant_posts.description

'**Welcome, one and all, to CSCareerQuestions!** \n\nHere we discuss careers in Computer Science, Computer Engineering, Software Engineering, and related fields. Please keep the conversation professional, adhere to the [reddiquette](https://www.reddit.com/wiki/reddiquette), and remember to [READ OUR RULES](/r/cscareerquestions/w/posting_rules).\n\n---\n\n# Discord\n\nCSCQ regular u/Kevincav runs a discord called CS Career Hub. Please check it out for your chatting needs: https://discord.gg/cscareerhub\n\nPlease note that **we, the CSCQ mod team are not in charge of this discord.**\n\n---\n\n#Want to ask a question?\n\n* **First**: [Read the rules](https://www.reddit.com/r/cscareerquestions/wiki/posting_rules)\n\n* **Second**: [Check out this awesome "quick answers to common questions" thread](https://www.reddit.com/r/cscareerquestions/comments/4qurgo/in_which_i_attempt_to_answer_like_90_of_a_normal/)\n\n* **Third**: [Check the FAQ](http://www.reddit.com/r/cscareerquestions/wiki/index)\

In [64]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in rant_posts.new(limit = 4000)]
[authors.append(x.author) for x in rant_posts.new(limit = 4000)]
[titles.append(x.title) for x in rant_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in rant_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in rant_posts.new(limit = 4000)]
[scores.append(x.score) for x in rant_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in rant_posts.new(limit = 4000)]

# New dataframe
rant_df = pd.DataFrame()

# Assigning lists to columns
rant_df['Title'] = titles
rant_df['Id'] = ids
rant_df['Text'] = texts 
rant_df['Author'] = authors
rant_df['Number of Comments'] = numComments
rant_df['Number of upvotes'] = scores
rant_df['Ratio of Upvotes'] = upvoteRatios

# Print head
rant_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Does the “junior dev pushed to main” trope act...,11akxtx,In terms of software that is developed by a te...,kingcammyg,0,1,1.0
1,30 yo mlops engineer off-work for a year. Need...,11ajgeu,I am a 30yo Research engineer with computer vi...,barelyawake_3am,1,1,1.0
2,Should I be gunning for big tech internship?,11aibrb,Most of my friends are gunning for prestigious...,roninja2,4,1,0.67
3,The salary talk for a position,11ahewk,"So, I have done 4 out of 5 of my interviews fo...",CrunchyAl,6,5,1.0
4,Please help: I’m a veteran and recent grad in ...,11ah4rc,After hundreds of applications no one is inter...,TylerGatsby,48,38,0.86
5,Switch from Front End in failing crypto compan...,11ah1yf,Hi this may be a stupid question and I've been...,SlopDoggo,2,2,1.0
6,Problem solving accountability server.,11agl9o,Hello all! A quick introduction I am a SWE wit...,SwiftlyNarrow,0,0,0.5
7,SQL or JavaScript?,11agkzx,"From zero experience, if I want a career in ei...",LondonChels1,17,0,0.43
8,Do companies check if you have completed all y...,11agh3u,I received an offer that is a new grad role. T...,CSStudentCareer,6,1,1.0
9,What does a job in CS look like? I am becoming...,11agbbs,Hello! I am currently an aerospace engineering...,collegecolloquial,6,0,0.25


In [65]:
rant_df.shape

(983, 7)

In [66]:
rant_df.replace('', np.nan, inplace = True)
rant_df.dropna(inplace = True)
rant_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Does the “junior dev pushed to main” trope act...,11akxtx,In terms of software that is developed by a te...,kingcammyg,0,1,1.0
1,30 yo mlops engineer off-work for a year. Need...,11ajgeu,I am a 30yo Research engineer with computer vi...,barelyawake_3am,1,1,1.0
2,Should I be gunning for big tech internship?,11aibrb,Most of my friends are gunning for prestigious...,roninja2,4,1,0.67
3,The salary talk for a position,11ahewk,"So, I have done 4 out of 5 of my interviews fo...",CrunchyAl,6,5,1.0
4,Please help: I’m a veteran and recent grad in ...,11ah4rc,After hundreds of applications no one is inter...,TylerGatsby,48,38,0.86
5,Switch from Front End in failing crypto compan...,11ah1yf,Hi this may be a stupid question and I've been...,SlopDoggo,2,2,1.0
6,Problem solving accountability server.,11agl9o,Hello all! A quick introduction I am a SWE wit...,SwiftlyNarrow,0,0,0.5
7,SQL or JavaScript?,11agkzx,"From zero experience, if I want a career in ei...",LondonChels1,17,0,0.43
8,Do companies check if you have completed all y...,11agh3u,I received an offer that is a new grad role. T...,CSStudentCareer,6,1,1.0
9,What does a job in CS look like? I am becoming...,11agbbs,Hello! I am currently an aerospace engineering...,collegecolloquial,6,0,0.25


In [67]:
rant_df.shape

(982, 7)

In [140]:
rant_df.to_csv('rantData.csv')

### Pittsburgh subreddit

The eleventh subreddit that I will be collecting and cleaning up posts from is [r/pittsburgh](https://www.reddit.com/r/pittsburgh/)

In [68]:
# Initializing rant_posts to the subreddit titled "Pittsburgh"
pgh_posts = reddit.subreddit('pittsburgh')

In [69]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people discussing various things related to Pittsburgh
pgh_posts.description

"This is the front page of Pittsburgh's place on the internet, curated by our community.\n\n[Share](/r/pittsburgh/submit) news, events, and thoughts with/about the Pittsburgh community.\n\nThis is a moderated subreddit. Posts may be removed if they violate /r/pittsburgh [rules](/r/pittsburgh/about/rules)\n\n##### [Search](/r/pittsburgh/search?q=subreddit%3Apittsburgh) and check [FAQ](/r/pittsburgh/wiki/faq) before posting. [Picturesque PittsburghPorn (City Pictures)](/r/PittsburghPorn)!\n___\n\n# [Rules & FAQ](/r/pittsburgh/wiki/faq)\n# [**COMING TO PGH?**](https://www.reddit.com/r/pittsburgh/search?q=pittsburgh+neighborhood+%28moving+OR+visiting%29&restrict_sr=on&include_over_18=on&sort=relevance&t=all) \n# [**JOBS**](/r/pittsburghjobs) \n# [**CLASSIFIEDS**](/r/pittsburghList)\n# [**Good Deeds (give/receive)**](/r/pittsburghgooddeeds)\n[**Recent Comments**](/r/pittsburgh/comments) • [FAQ (editable!)](/r/pittsburgh/wiki/faq) •  [Wiki](/r/pittsburgh/wiki/pages/)-[recent revisions](/r/pi

In [72]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in pgh_posts.new(limit = 4000)]
[authors.append(x.author) for x in pgh_posts.new(limit = 4000)]
[titles.append(x.title) for x in pgh_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in pgh_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in pgh_posts.new(limit = 4000)]
[scores.append(x.score) for x in pgh_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in pgh_posts.new(limit = 4000)]

# New dataframe
pgh_df = pd.DataFrame()

# Assigning lists to columns
pgh_df['Title'] = titles
pgh_df['Id'] = ids
pgh_df['Text'] = texts 
pgh_df['Author'] = authors
pgh_df['Number of Comments'] = numComments
pgh_df['Number of upvotes'] = scores
pgh_df['Ratio of Upvotes'] = upvoteRatios

# Print head
pgh_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,The fall of Pittsburgh Bodegas,11aiqge,I’ve been seeing conversations the last few da...,HarpPgh,5,2,0.75
1,Hope you like bugs.,11aiaol,,ravia,1,0,0.38
2,do they still make these?!,11ai20m,,jermavenus,8,17,0.9
3,Best Pittsburgh area for women in 30s?,11ai1a8,\n\n[View Poll](https://www.reddit.com/poll/11...,ZookeepergameNo1364,5,0,0.14
4,"Umm, can someone explain why the sky is flashi...",11ahstj,,mandlet,35,41,0.86
5,"A loud ""bang""",11ahqr6,Anyone hear a massively loud bang in Swissvale...,Everlucidd,11,8,0.72
6,anyone else's power go out for a couple second...,11ahqbn,,DroningBrightnessAV,21,14,0.79
7,"Activism in Pittsburgh, anti war and Norfolk S...",11agq8z,Hello\n\nI'm a journalist and activist. \n\nWo...,vulpesgato,5,0,0.19
8,Petition for North Park golf course to take te...,11ag517,Got a taste of summer weather but already drea...,EducationalMessage51,5,0,0.41
9,Ow Ou & Ouch,11afjjc,"Why do kids and adults alike say ou, ow, or ou...",kevvypoop313,6,0,0.11


In [73]:
pgh_df.shape

(978, 7)

In [74]:
pgh_df.replace('', np.nan, inplace = True)
pgh_df.dropna(inplace = True)
pgh_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,The fall of Pittsburgh Bodegas,11aiqge,I’ve been seeing conversations the last few da...,HarpPgh,5,2,0.75
3,Best Pittsburgh area for women in 30s?,11ai1a8,\n\n[View Poll](https://www.reddit.com/poll/11...,ZookeepergameNo1364,5,0,0.14
5,"A loud ""bang""",11ahqr6,Anyone hear a massively loud bang in Swissvale...,Everlucidd,11,8,0.72
7,"Activism in Pittsburgh, anti war and Norfolk S...",11agq8z,Hello\n\nI'm a journalist and activist. \n\nWo...,vulpesgato,5,0,0.19
8,Petition for North Park golf course to take te...,11ag517,Got a taste of summer weather but already drea...,EducationalMessage51,5,0,0.41
9,Ow Ou & Ouch,11afjjc,"Why do kids and adults alike say ou, ow, or ou...",kevvypoop313,6,0,0.11
11,Know of any good cooking classes?,11aefk3,Ideally not too expensive (less than $80 a per...,espressodepresso420,8,3,0.71
14,Arsenal Park Renovation Timeline,11ad6tr,Anybody know what the timeline looks like for ...,pyrojoe121,1,0,0.5
15,"Please don’t shame me, but is the County of Al...",11ad13h,I know I have one tax for my home that my escr...,toolatetobeoriginal,14,5,0.72
18,I remember when I could order large pizza & br...,11a94tw,"Oh my goodness, I just ordered Papa John's for...",Cautious-One-7770,66,1,0.51


In [75]:
pgh_df.shape

(624, 7)

In [141]:
pgh_df.to_csv('pghData.csv')

### Broadway

The twelfth subreddit that I will be collecting and cleaning up posts from is [r/broadway](https://www.reddit.com/r/Broadway/)

In [76]:
# Initializing rant_posts to the subreddit titled "Broadway"
bway_posts = reddit.subreddit('broadway')

In [77]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people discussing various things related to Broadway
bway_posts.description

"**Welcome to /r/broadway!**\n\nThis subreddit is dedicated to anything related to Broadway, including shows, music, actors, actresses, etc. If you work on or near Broadway, feel free to do an AMA.\n\nIf you want more information on Broadway, visit these helpful links:\n\n\n* [**List of shows on and off Broadway**](http://www.broadway.com/)\n\n* [**Ratings of shows currently on Broadway**](http://theater.nytimes.com/readersreviews/theater/highlyrated/broadway/index.html)\n\n* [**Broadway news, forums, articles etc.**](http://www.broadwayworld.com)\n\n* [**Cheap Broadway Tickets**](http://www.broadwayforbrokepeople.com)\n\n* [**Broadway Lottery/Rush/SRO Policies**]\n(http://www.playbill.com/article/broadway-rush-lottery-and-standing-room-only-policies-com-116003)\n\nIf you are requesting help with finding songs for performance or auditions, please give us as much information as possible, including: age, gender, voice type and range, a list of other songs in your book, and/or links to au

In [78]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in bway_posts.new(limit = 4000)]
[authors.append(x.author) for x in bway_posts.new(limit = 4000)]
[titles.append(x.title) for x in bway_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in bway_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in bway_posts.new(limit = 4000)]
[scores.append(x.score) for x in bway_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in bway_posts.new(limit = 4000)]

# New dataframe
bway_df = pd.DataFrame()

# Assigning lists to columns
bway_df['Title'] = titles
bway_df['Id'] = ids
bway_df['Text'] = texts 
bway_df['Author'] = authors
bway_df['Number of Comments'] = numComments
bway_df['Number of upvotes'] = scores
bway_df['Ratio of Upvotes'] = upvoteRatios

# Print head
bway_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,bad cinderella is a completely insane show,11akidb,spoilers incoming!!!\n\ni saw bad cinderella o...,yelizabetta,3,2,0.75
1,Baritone/low tenor belting songs? Suggestions?,11aibs2,,theatrekid_anonymous,2,1,0.67
2,“Bad Cinderella” was worse than bad. review be...,11ahzpo,Let me start off by saying: is it the worst sh...,Nice-Jackfruit-9894,5,1,0.56
3,Need some advice on preparing for Broadway trip!,11ahuto,"Hi! I’m going to NYC, for the first time in 10...",vindur_i,5,1,0.67
4,"The Mayor was at Parade tonight, addressing th...",11ahrsi,,katieg1970,6,40,0.92
5,Opening night for the INTO THE WOODS transfer ...,11ahoys,,ghdawg6197,17,49,0.98
6,Remember this one? Was listening to the 13 Goi...,11ah2lj,,Caroline0330,6,5,1.0
7,"Skip ""A Doll's House"" on broadway, and everyth...",11ah20t,"The director of this ""A Doll's House,"" Jamie L...",darthva,4,23,1.0
8,Back to the Future Broadway Marquee Is Up!,11agkif,,Dsvkb,4,12,0.89
9,Aaron Tveit Moulin Rogue,11aggyn,Does anyone know when Aaron Tveit's limited ru...,AnnaliseKeatingStan1,2,0,0.4


In [79]:
bway_df.shape

(983, 7)

In [80]:
bway_df.replace('', np.nan, inplace = True)
bway_df.dropna(inplace = True)
bway_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,bad cinderella is a completely insane show,11akidb,spoilers incoming!!!\n\ni saw bad cinderella o...,yelizabetta,3,2,0.75
2,“Bad Cinderella” was worse than bad. review be...,11ahzpo,Let me start off by saying: is it the worst sh...,Nice-Jackfruit-9894,5,1,0.56
3,Need some advice on preparing for Broadway trip!,11ahuto,"Hi! I’m going to NYC, for the first time in 10...",vindur_i,5,1,0.67
7,"Skip ""A Doll's House"" on broadway, and everyth...",11ah20t,"The director of this ""A Doll's House,"" Jamie L...",darthva,4,23,1.0
9,Aaron Tveit Moulin Rogue,11aggyn,Does anyone know when Aaron Tveit's limited ru...,AnnaliseKeatingStan1,2,0,0.4
10,Tina for non-Tina fan?,11ag63g,I had gotten Tina: Tina Turner musical tickets...,BunnyLuv13,5,1,0.67
12,Sweeney Todd,11ae2q4,Hi everyone! I hope you are all doing well! I ...,calamari04,21,0,0.5
13,Hamilton tour lottery question,11adz9b,I entered the ham4ham lottery in Boston for th...,comefromawayfan2022,2,0,0.2
15,"Hadestown, SLIH or Kimberly Akimbo?",11abj2x,Hi everyone! \nThis is my first ever Reddit po...,plynguis,7,0,0.5
16,First Date Audition Song Recommendations,11ab75n,I am auditioning for Frist Dat at my local the...,icerosequeen06,0,0,0.17


In [81]:
bway_df.shape

(642, 7)

In [142]:
bway_df.to_csv('bwayData.csv')

### Highschool subreddit

The thirteenth subreddit that I will be collecting and cleaning up posts from is [r/highschool](https://www.reddit.com/r/highschool/)

In [82]:
# Initializing hs_posts to the subreddit titled "Highschool"
hs_posts = reddit.subreddit('highschool')

In [83]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people discussing high school
hs_posts.description

"Talk about anything to do with high school.\n\n**Related Subreddits**\n\n/r/studytips\n\n/r/studying\n\nGet better at studying!\n\n/r/SAT  \nFor your assistance in preparation of the SAT\n\n/r/ACT\n\nFor your assistance in preparation of the ACT\n\n\n[/r/Under18](http://www.reddit.com/r/under18)\n\nA subreddit for all those under the age of 18\n\n[/r/APStudents](http://www.reddit.com/r/apstudents)\n\nFor those high-achieving students\n\n[/r/AP_Central](http://www.reddit.com/r/AP_Central)\n\nHelping AP Students excel in their individual classes\n\n/r/applyingtocollege\n\nIt's never too early!\n\n/r/AskHSTeacher\n\nAsk anything you want!\n\n/r/gedready \n\nNeed help with getting your GED?\n\n[/r/HomeworkHelp](http://www.reddit.com/r/homeworkhelp)\n\nA subreddit for help with your homework. \n\n[/r/Tutor](http://www.reddit.com/r/tutor)\n\nA place where a tutor and student can meet.\n\n[/r/Teenagers](http://www.reddit.com/r/teenagers)\n\nA subreddit for actual teenagers. \n\n/r/askredditt

In [84]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in hs_posts.new(limit = 4000)]
[authors.append(x.author) for x in hs_posts.new(limit = 4000)]
[titles.append(x.title) for x in hs_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in hs_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in hs_posts.new(limit = 4000)]
[scores.append(x.score) for x in hs_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in hs_posts.new(limit = 4000)]

# New dataframe
hs_df = pd.DataFrame()

# Assigning lists to columns
hs_df['Title'] = titles
hs_df['Id'] = ids
hs_df['Text'] = texts 
hs_df['Author'] = authors
hs_df['Number of Comments'] = numComments
hs_df['Number of upvotes'] = scores
hs_df['Ratio of Upvotes'] = upvoteRatios

# Print head
hs_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Duolingo XP Bot,11al237,Hello Reddit!\n\nMy name is Pablo and I'm a hi...,ADAN_RAIDEN_BUDA,0,1,1.0
1,Switching back to public school in Texas!!,11akjig,last year some stuff happened at public school...,CasualCartonOfMilk,0,1,1.0
2,stuck,11ajvnu,someone has been talking badly about someone t...,stanningmikan,1,1,1.0
3,my junior yr schedule! any feedback?,11ajb99,,Swimming_Sound6235,1,1,1.0
4,I need to be more popular and I don’t know how,11aj5o7,I say “more” because many people say I am alre...,quinnquack,4,1,1.0
5,Doctor's Note,11ai0e8,* I've been super sick and out of school for w...,Doritoscarfingbunny,0,1,1.0
6,"finally have a 4 year plan properly envision, ...",11afs23,9th grade: (current) (with grades)\n\nEnglish ...,2007erTheSpudFan,2,2,1.0
7,"senioritis. (usa, texas)",11afn2j,i know i’m not alone but i need to hear other ...,Effective-Barber-136,4,4,1.0
8,"Accreditation Question (USA, VA, JUNIOR)",11afmlq,"Lately, people from my school and a teacher ha...",agieuge,0,1,1.0
9,what sciences should I take if i want to go to...,11adgzj,today's the last day to lock my courses for se...,iggnnii,0,1,1.0


In [85]:
hs_df.shape

(975, 7)

In [86]:
hs_df.replace('', np.nan, inplace = True)
hs_df.dropna(inplace = True)
hs_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Duolingo XP Bot,11al237,Hello Reddit!\n\nMy name is Pablo and I'm a hi...,ADAN_RAIDEN_BUDA,0,1,1.0
1,Switching back to public school in Texas!!,11akjig,last year some stuff happened at public school...,CasualCartonOfMilk,0,1,1.0
2,stuck,11ajvnu,someone has been talking badly about someone t...,stanningmikan,1,1,1.0
4,I need to be more popular and I don’t know how,11aj5o7,I say “more” because many people say I am alre...,quinnquack,4,1,1.0
5,Doctor's Note,11ai0e8,* I've been super sick and out of school for w...,Doritoscarfingbunny,0,1,1.0
6,"finally have a 4 year plan properly envision, ...",11afs23,9th grade: (current) (with grades)\n\nEnglish ...,2007erTheSpudFan,2,2,1.0
7,"senioritis. (usa, texas)",11afn2j,i know i’m not alone but i need to hear other ...,Effective-Barber-136,4,4,1.0
8,"Accreditation Question (USA, VA, JUNIOR)",11afmlq,"Lately, people from my school and a teacher ha...",agieuge,0,1,1.0
9,what sciences should I take if i want to go to...,11adgzj,today's the last day to lock my courses for se...,iggnnii,0,1,1.0
10,Nation Wide High School Grant!,11ad6hs,Calling all high school AI clubs! SAILea is of...,TimeEclipser,0,0,0.5


In [87]:
hs_df.shape

(774, 7)

In [143]:
hs_df.to_csv('hsData.csv')

### Medicine subreddit

The fourteenth subreddit that I will be collecting and cleaning up posts from is [r/medicine](https://www.reddit.com/r/medicine/)

In [95]:
# Initializing pol_posts to the subreddit titled "Medicine"
med_posts = reddit.subreddit('medicine')

In [96]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to doctors to talk about medicine
med_posts.description



In [97]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in med_posts.new(limit = 4000)]
[authors.append(x.author) for x in med_posts.new(limit = 4000)]
[titles.append(x.title) for x in med_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in med_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in med_posts.new(limit = 4000)]
[scores.append(x.score) for x in med_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in med_posts.new(limit = 4000)]

# New dataframe
med_df = pd.DataFrame()

# Assigning lists to columns
med_df['Title'] = titles
med_df['Id'] = ids
med_df['Text'] = texts 
med_df['Author'] = authors
med_df['Number of Comments'] = numComments
med_df['Number of upvotes'] = scores
med_df['Ratio of Upvotes'] = upvoteRatios

# Print head
med_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Cochran review on masking seems to suggest no ...,11ak4il,,MarinerBlue,4,0,0.33
1,UPMC Susquehanna Williamsport total power fail...,11ag7wy,No generators kick on. Reportedly completely b...,Mitthrawnuruo,26,185,0.98
2,Interview red/green flags,11aft7s,I’m about halfway done in a one-year palliativ...,killajoule_jewelkill,16,23,0.93
3,Can someone explain physician’s dues to me?,11abbf3,Why do physicians need to pay the hospital for...,spicymemesdotcom,30,66,0.92
4,Wisconsin DSPS medical licensing delays,11a930t,I’ve heard anywhere from 6 months to 1 year. C...,nonjudiciablepeaches,12,27,0.92
5,stand alone e-rx program?,11a50ma,Quanum is shutting down its e-rx. I need to ch...,maydaymayday99,16,39,0.94
6,Best family medicine boards anki deck?,119wscv,Any recommendations? Thank you,bwis311,6,4,0.6
7,"Biweekly Careers Thread: February 23, 2023",119uzet,"Questions about medicine as a career, about wh...",AutoModerator,17,30,0.98
8,Anyone know how to quickly remove hand sanitiz...,119n49b,"After a shift, my hands always have a residue ...",Mohrisbetr,52,23,0.75
9,Books on the history of medicine,119fzb1,I’m looking for books on the general history o...,Adhocfin,44,58,0.98


In [98]:
med_df.shape

(995, 7)

In [99]:
med_df.replace('', np.nan, inplace = True)
med_df.dropna(inplace = True)
med_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
1,UPMC Susquehanna Williamsport total power fail...,11ag7wy,No generators kick on. Reportedly completely b...,Mitthrawnuruo,26,185,0.98
2,Interview red/green flags,11aft7s,I’m about halfway done in a one-year palliativ...,killajoule_jewelkill,16,23,0.93
3,Can someone explain physician’s dues to me?,11abbf3,Why do physicians need to pay the hospital for...,spicymemesdotcom,30,66,0.92
4,Wisconsin DSPS medical licensing delays,11a930t,I’ve heard anywhere from 6 months to 1 year. C...,nonjudiciablepeaches,12,27,0.92
5,stand alone e-rx program?,11a50ma,Quanum is shutting down its e-rx. I need to ch...,maydaymayday99,16,39,0.94
6,Best family medicine boards anki deck?,119wscv,Any recommendations? Thank you,bwis311,6,4,0.6
7,"Biweekly Careers Thread: February 23, 2023",119uzet,"Questions about medicine as a career, about wh...",AutoModerator,17,30,0.98
8,Anyone know how to quickly remove hand sanitiz...,119n49b,"After a shift, my hands always have a residue ...",Mohrisbetr,52,23,0.75
9,Books on the history of medicine,119fzb1,I’m looking for books on the general history o...,Adhocfin,44,58,0.98
10,Rise in animosity between the different levels...,11990c2,I'm currently going through paramedic school a...,CowCatThe3rd,95,146,0.89


In [100]:
med_df.shape

(759, 7)

In [144]:
med_df.to_csv('medData.csv')

### Adulting subreddit

The fifteenth subreddit that I will be collecting and cleaning up posts from is [r/adulting](https://www.reddit.com/r/adulting/)

In [126]:
# Initializing pol_posts to the subreddit titled "Adulting"
ad_posts = reddit.subreddit('adulting')

In [127]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to adulting
ad_posts.description

'**Welcome to /r/Adulting!**\n-\nUrban Dictionary defines adulting as "Doing something grown-up and responsible" and that is what this subreddit is all about. \n\nWhether it is getting an apartment, paying bills in a timely manner, budgeting, getting a job, furthering higher education or anything else responsible, this is the place to talk about it.\n\nWe welcome **all content related to being responsible and put together.** Victories, tips, questions and struggles are all welcome. \n\n**Rules**\n-\n1. **Don\'t be a dick.** - Everyone\'s adulting journey is different and should be respected. Disrespectful / rude comments will be removed.\n2. **No medical advice.** - Do not ask for or provide medical advice. The only correct answer is to ask your doctor. Do *not* post your random bug bites for identification.\n3. **No NSFW content.** - No porn, OnlyFans, FeetFinder, escorts, etc. There\'s 100+ other subs for that. Keep it out of here.\n\n**Related Subreddits**\n- \n- /r/Frugal\n- /r/per

In [128]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in ad_posts.new(limit = 4000)]
[authors.append(x.author) for x in ad_posts.new(limit = 4000)]
[titles.append(x.title) for x in ad_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in ad_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in ad_posts.new(limit = 4000)]
[scores.append(x.score) for x in ad_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in ad_posts.new(limit = 4000)]

# New dataframe
ad_df = pd.DataFrame()

# Assigning lists to columns
ad_df['Title'] = titles
ad_df['Id'] = ids
ad_df['Text'] = texts 
ad_df['Author'] = authors
ad_df['Number of Comments'] = numComments
ad_df['Number of upvotes'] = scores
ad_df['Ratio of Upvotes'] = upvoteRatios

# Print head
ad_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,I miss payless,11ajwpn,"Fellow adults with office jobs, where do you b...",Jazzlike-Student-859,5,1,1.0
1,Wives of girl dads! As an Dad to be of baby gi...,11aaxy0,Curious what it would be? What have you learne...,MtnLion27,2,1,0.67
2,Tired,11aillp,It’s crazy how for me personally I go to bed t...,Top_Wonder6145,3,4,1.0
3,"Who is making good money, living comfortably, ...",11agspj,I have a BA in comm. my background is in film ...,Far_Feeling_1492,5,4,0.83
4,"30 F, so lost.",11agngq,Like a lot of people my 20s were turbulent. I ...,Far_Feeling_1492,14,20,0.92
5,wedding gift questions,11afpe5,I am in a wedding soon (bridesmaid) and am not...,fairyhaus,2,1,1.0
6,Moving twice in a very short span - how should...,11ad944,Hey everyone! I'm not sure where else to ask a...,FeliciaFailure,1,5,0.81
7,Making doctor appointments,11actff,"The GP near my house is so dumb, you can only ...",alasfinallyaname,2,1,0.99
8,Laziness or depression?,11acsss,Does anyone else feel like this?\n\nFor the pa...,alasfinallyaname,17,38,0.94
9,Should military spouses and ex spouses be elig...,11a9xma,I’m just curious. I have never misrepresented ...,Ok_Ad_7966,75,0,0.39


In [129]:
ad_df.shape

(962, 7)

In [130]:
ad_df.replace('', np.nan, inplace = True)
ad_df.dropna(inplace = True)
ad_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,I miss payless,11ajwpn,"Fellow adults with office jobs, where do you b...",Jazzlike-Student-859,5,1,1.0
1,Wives of girl dads! As an Dad to be of baby gi...,11aaxy0,Curious what it would be? What have you learne...,MtnLion27,2,1,0.67
2,Tired,11aillp,It’s crazy how for me personally I go to bed t...,Top_Wonder6145,3,4,1.0
3,"Who is making good money, living comfortably, ...",11agspj,I have a BA in comm. my background is in film ...,Far_Feeling_1492,5,4,0.83
4,"30 F, so lost.",11agngq,Like a lot of people my 20s were turbulent. I ...,Far_Feeling_1492,14,20,0.92
5,wedding gift questions,11afpe5,I am in a wedding soon (bridesmaid) and am not...,fairyhaus,2,1,1.0
6,Moving twice in a very short span - how should...,11ad944,Hey everyone! I'm not sure where else to ask a...,FeliciaFailure,1,5,0.81
7,Making doctor appointments,11actff,"The GP near my house is so dumb, you can only ...",alasfinallyaname,2,1,0.99
8,Laziness or depression?,11acsss,Does anyone else feel like this?\n\nFor the pa...,alasfinallyaname,17,38,0.94
9,Should military spouses and ex spouses be elig...,11a9xma,I’m just curious. I have never misrepresented ...,Ok_Ad_7966,75,0,0.39


In [131]:
ad_df.shape

(823, 7)

In [145]:
ad_df.to_csv('adData.csv')

### Legal Advice subreddit

The sixteenth subreddit that I will be collecting and cleaning up posts from is [r/legaladvice](https://www.reddit.com/r/legaladvice/)

In [107]:
# Initializing pol_posts to the subreddit titled "Legal Advice"
legal_posts = reddit.subreddit('legaladvice')

In [108]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to asking legal questions 
legal_posts.description

"***A place to ask simple legal questions.  Advice here is for informational purposes only and should not be considered final or official advice.  See a local attorney for the best answer to your questions.***\n\n* [**READ OUR RULES**](https://www.reddit.com/r/legaladvice/wiki/index#wiki_general_rules) before posting or commenting.\n\n\n* Get answers to our most common questions, pointers to other sites about the law, and information about finding a lawyer of your own at the [/r/legaladvice wiki](https://www.reddit.com/r/legaladvice/wiki/index).\n\n* See our [list of megathreads](https://www.reddit.com/r/legaladvice/wiki/megathreads) before posting your question.\n\n\n\n\n* For a list of other location-specific legal subreddits, such as the United Kingdom, Ireland, Australia, New Zealand, France, Canada, Mexico, The Netherlands, or the EU [please see here](https://www.reddit.com/r/legaladvice/wiki/index#wiki_other_subreddits). \n\n* For a more relaxed and humorous meta discussion of th

In [110]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
[ids.append(x.id) for x in legal_posts.new(limit = 4000)]
[authors.append(x.author) for x in legal_posts.new(limit = 4000)]
[titles.append(x.title) for x in legal_posts.new(limit = 4000)]
[texts.append(x.selftext) for x in legal_posts.new(limit = 4000)]
[numComments.append(x.num_comments) for x in legal_posts.new(limit = 4000)]
[scores.append(x.score) for x in legal_posts.new(limit = 4000)]
[upvoteRatios.append(x.upvote_ratio) for x in legal_posts.new(limit = 4000)]

# New dataframe
legal_df = pd.DataFrame()

# Assigning lists to columns
legal_df['Title'] = titles
legal_df['Id'] = ids
legal_df['Text'] = texts 
legal_df['Author'] = authors
legal_df['Number of Comments'] = numComments
legal_df['Number of upvotes'] = scores
legal_df['Ratio of Upvotes'] = upvoteRatios

# Print head
legal_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Could this be considered “Selling Food Stamps”...,11alnny,I had been asked by a customer if I’d be willi...,ManAlsoMan,0,1,1.0
1,Please help me,11alkcw,"I apologize for the length of this post, ...",k0rum1j0,0,1,1.0
2,Whole family being falsely accused by ex siste...,11ald1n,I want to start of by saying in my culture it'...,philismydunphy,0,0,0.5
3,Rear ended by drunk driver,11al82q,"Hello, I was recently rear ended by a drunk dr...",Worldly_Ad_4866,0,0,0.5
4,Credit Hack? Can I do this?,11al3kt,\n\nLately I've been thinking about how I can...,Revolutionary_You560,1,0,0.5
5,An odd question about use of video footage.,11al1kp,So this is going to be an odd question regardi...,Jackie_Daytona-Human,3,1,1.0
6,I need help figuring out concealed carry laws ...,11al1h9,So for some background Im 18 and im homeless i...,Weak-Lengthiness-660,2,1,0.67
7,Probate Fraud Question,11al04r,"When I was 13 years old, my biological father ...",WhiteWitchFae,0,0,0.5
8,Broad Non-Compete Advice,11akzkk,"Hello,\n\nI’m about to land a new role and it ...",ThrowawayCareer4,2,0,0.5
9,"Help, I refunded $3000 from someone and they a...",11akygi,Not too long ago I was not doing so well menta...,clersidjm,5,0,0.5


In [111]:
legal_df.shape

(990, 7)

In [112]:
legal_df.replace('', np.nan, inplace = True)
legal_df.dropna(inplace = True)
legal_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Could this be considered “Selling Food Stamps”...,11alnny,I had been asked by a customer if I’d be willi...,ManAlsoMan,0,1,1.0
1,Please help me,11alkcw,"I apologize for the length of this post, ...",k0rum1j0,0,1,1.0
2,Whole family being falsely accused by ex siste...,11ald1n,I want to start of by saying in my culture it'...,philismydunphy,0,0,0.5
3,Rear ended by drunk driver,11al82q,"Hello, I was recently rear ended by a drunk dr...",Worldly_Ad_4866,0,0,0.5
4,Credit Hack? Can I do this?,11al3kt,\n\nLately I've been thinking about how I can...,Revolutionary_You560,1,0,0.5
5,An odd question about use of video footage.,11al1kp,So this is going to be an odd question regardi...,Jackie_Daytona-Human,3,1,1.0
6,I need help figuring out concealed carry laws ...,11al1h9,So for some background Im 18 and im homeless i...,Weak-Lengthiness-660,2,1,0.67
7,Probate Fraud Question,11al04r,"When I was 13 years old, my biological father ...",WhiteWitchFae,0,0,0.5
8,Broad Non-Compete Advice,11akzkk,"Hello,\n\nI’m about to land a new role and it ...",ThrowawayCareer4,2,0,0.5
9,"Help, I refunded $3000 from someone and they a...",11akygi,Not too long ago I was not doing so well menta...,clersidjm,5,0,0.5


In [113]:
legal_df.shape

(988, 7)

In [146]:
legal_df.to_csv('legalData.csv')