# Data Collection

**Outline**:

1. [Setting Up](#Setting-Up)

2. [Gaming Subreddit](#Gaming-Subreddit)

3. [Lawyer Talk Subreddit](#Lawyer-Talk-Subreddit)

4. [Cryptocurrency Subreddit](#Cryptocurrency-Subreddit)

5. [NBA Subreddit](#NBA-Subreddit)

6. [College Subreddit](#College-Subreddit)

7. [Explain Like I'm Five Subreddit](#Explain-Like-I'm-Five-Subreddit)

8. [Anime Subreddit](#Anime-Subreddit)

9. [CS Career Questions Subreddit](#CS-Career-Questions-Subreddit)

10. [Rant Subreddit](#Rant-Subreddit)

11. [Pittsburgh Subreddit](#Pittsburgh-Subreddit)

12. [Broadway Subreddit](#Broadway-Subreddit)

13. [Highschool Subreddit](#Highschool-Subreddit)

14. [Medicine Subreddit](#Medicine-Subreddit)

15. [Adulting Subreddit](#Adulting-Subreddit)

16. [Legal Advice Subreddit](#Legal-Advice-Subreddit)



## Setting Up

This notebook covers the initial data collection for my term project. The first step is to install PRAW (the Python Reddit API Wrapper), which collects data from Reddit through Reddit's official API. PRAW's documentation can be found [here](https://praw.readthedocs.io/en/stable/).

In [3]:
%pip install praw

Note: you may need to restart the kernel to use updated packages.


Next, I import all of the libraries I need for data collection. The libraries and their uses are as follows:
- **PRAW**: Python Reddit API Wrapper, which lets users work with various parts of Reddit (subreddits, posts, etc.)
- **pandas**: A data analysis library for working with tabular data
- **NumPy**: A library for arrays, matrices, and mathematical functions

In [4]:
# This cell contains all of the imports that I will be using
import praw
import pandas as pd
import numpy as np

After importing everything, I can now work with the API to select and clean up data from Reddit.

In [5]:
# This initializes the API with the credentials Reddit provides when you register a script app on its website
# privateInfo.txt holds client_id, client_secret, and user_agent on separate lines, in that order
with open("privateInfo.txt", "r") as pi:
    privateInfo = pi.read().splitlines()
reddit = praw.Reddit(client_id=privateInfo[0], client_secret=privateInfo[1], user_agent=privateInfo[2])
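The cell above assumes `privateInfo.txt` contains exactly three lines. A slightly more defensive loader is sketched below under that same assumption; the function name `load_reddit_credentials` is hypothetical. It ignores blank lines (such as a trailing newline) and raises a clear error if a value is missing. The returned dictionary's keys match the `client_id`, `client_secret`, and `user_agent` keyword arguments that `praw.Reddit` accepts.

```python
from pathlib import Path


def load_reddit_credentials(path):
    """Read client_id, client_secret, and user_agent from a text file, one per line.

    Assumes the same layout as privateInfo.txt: the three values in that order.
    Blank lines are skipped, so a trailing newline does not shift the values.
    """
    lines = [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError(f"expected 3 non-empty lines in {path}, found {len(lines)}")
    client_id, client_secret, user_agent = lines[:3]
    return {"client_id": client_id, "client_secret": client_secret, "user_agent": user_agent}


# Usage (requires praw and real credentials):
# reddit = praw.Reddit(**load_reddit_credentials("privateInfo.txt"))
```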

## Gaming Subreddit

For the purposes of my project, I will be analyzing posts from various subreddits on Reddit. The first subreddit that I will be collecting and cleaning up posts from is [r/gaming](https://www.reddit.com/r/gaming/).

In [15]:
# Initializing gamer_posts to the subreddit titled "gaming"
gamer_posts = reddit.subreddit('gaming')

In [16]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to the discussion of all things related to gaming
gamer_posts.description

'**If your submission does not appear, do not delete it. Simply [message the moderators](https://www.reddit.com/message/compose?to=%2Fr%2Fgaming) and ask us to look into it.**\n\n*Do NOT private message or use reddit chat to contact moderators about moderator actions. Only message the team via the link above. Directly messaging individual moderators may result in a temporary ban.*\n\n\n\n---\n#Community Rules\n\n1. **Submissions must be directly gaming-related**, not just a "forced" connection via the title or a caption added to the content.  Note that we do not allow non-gaming meme templates as submissions. **Discussion prompts must be made as text posts.**\n\n\n\n1. No bandwagon/raid/"pass it on" or direct reply posts.\n\n1. No piracy, even "abandonware".\n\n1. Mark your spoilers and NSFW submissions, comments and links. Spoiler tags are `>!X kills Y!<`  . Cosplay posts from content creators who focus primarily is adult content will be removed.\n\n1. No Giveaways / Trades / Contests

In [287]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
# Note: most Reddit listings contain at most about 1000 posts, regardless of the limit requested
for x in gamer_posts.new(limit = 15000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
gaming_df = pd.DataFrame()

# Assigning lists to columns
gaming_df['Title'] = titles
gaming_df['Id'] = ids
gaming_df['Text'] = texts
gaming_df['Author'] = authors
gaming_df['Number of Comments'] = numComments
gaming_df['Number of Upvotes'] = scores
gaming_df['Ratio of Upvotes'] = upvoteRatios

# Print head
gaming_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,Building bases is so enjoyable in Sons of the ...,11wisal,,DanintheVortex,4,3,0.62
1,Buying games in 2023,11wiqaw,,NotToddHoward,52,195,0.82
2,Least Unhinged Fallout Fan:,11wiocn,,ChemFeind360,0,1,0.54
3,"seriously, why was that never a thing?",11wi4ik,,AlyksTheSage,16,25,0.63
4,So I have been Massacring Dragons lately and t...,11wh7ko,,KNIGHTMARE098,9,0,0.5
5,"Guys, Prime Gaming finally fixed Forspoken!",11wh358,,2o2i,6,6,0.61
6,I made this scene from The Last of Us II out o...,11wh0to,,Squiffybodge,4,54,0.92
7,What?,11wgnk9,,LesbainNaga,28,66,0.95
8,Please help me see PROS/CONS for getting ps5 o...,11wg0u2,**PLEASE READ MY ARGUMENTS!**\n**They are a bi...,NoonYsk,6,0,0.33
9,Anime-Inspired Mobile Games: Why They Fail to ...,11wg0nr,,sillyymood,18,0,0.44


In [288]:
# Printing out the shape, essentially the number of entries and columns
gaming_df.shape

(828, 7)
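Although `limit = 15000` was requested, only 828 posts came back. PRAW's documentation notes that most Reddit listings contain at most 1,000 items, so `limit=None` already asks for as many posts as a listing allows. Because every subreddit below repeats the same collect-then-build cell, that cell could be factored into a helper. The sketch below uses a hypothetical `collect_posts` function that builds the same seven columns from any iterable of submissions; unlike the cells in this notebook, it stores the author's username (`x.author.name`) instead of the Redditor object, with `None` for deleted accounts. The offline check uses `SimpleNamespace` stand-ins, not real PRAW objects.

```python
from types import SimpleNamespace

import pandas as pd

COLUMNS = ["Title", "Id", "Text", "Author", "Number of Comments", "Number of Upvotes", "Ratio of Upvotes"]


def collect_posts(submissions):
    """Build the seven-column DataFrame used in this notebook from an iterable of submissions.

    `submissions` could be a PRAW listing such as reddit.subreddit("gaming").new(limit=None).
    """
    rows = []
    for x in submissions:
        rows.append({
            "Title": x.title,
            "Id": x.id,
            "Text": x.selftext,
            # Deleted accounts have author None; otherwise keep only the username string
            "Author": x.author.name if x.author is not None else None,
            "Number of Comments": x.num_comments,
            "Number of Upvotes": x.score,
            "Ratio of Upvotes": x.upvote_ratio,
        })
    return pd.DataFrame(rows, columns=COLUMNS)


# Offline check with stand-in objects that only mimic the attribute names of PRAW submissions
fake_posts = [
    SimpleNamespace(title="Buying games in 2023", id="11wiqaw", selftext="",
                    author=SimpleNamespace(name="NotToddHoward"),
                    num_comments=52, score=195, upvote_ratio=0.82),
    SimpleNamespace(title="hypothetical post", id="abc123", selftext="body text",
                    author=None, num_comments=0, score=1, upvote_ratio=1.0),
]
demo_df = collect_posts(fake_posts)
```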

Next, I replace every empty string with NaN and then drop every row that contains NaN. Link and image posts have an empty body (`selftext`), so this step keeps only text posts; it also drops posts whose author account was deleted, since PRAW returns `None` for those authors. This significantly reduces the number of entries (here from 828 to 371), which I can make up for by collecting more posts later if needed. I apply this step to every subreddit I collect posts from.
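The effect of `replace('', np.nan)` followed by `dropna()` can be seen on a small toy DataFrame (made-up rows, not Reddit data): a link post with empty text, a text post, and a text post from a deleted account. Only the text post with a known author survives, and it keeps its original index label, which is why the cleaned tables below have gaps in their index.

```python
import numpy as np
import pandas as pd

# Toy rows: an empty-body link post, a normal text post, and a post by a deleted account
toy = pd.DataFrame({
    "Title": ["link post", "text post", "deleted author"],
    "Text": ["", "some body text", "more body text"],
    "Author": ["user_a", "user_b", None],
})

# Empty strings become NaN, then any row with a missing value is dropped
cleaned = toy.replace("", np.nan).dropna()
print(cleaned)
```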

In [289]:
gaming_df.replace('', np.nan, inplace = True)
gaming_df.dropna(inplace = True)
gaming_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
8,Please help me see PROS/CONS for getting ps5 o...,11wg0u2,**PLEASE READ MY ARGUMENTS!**\n**They are a bi...,NoonYsk,6,0,0.33
10,I have now finished every mainline Devil May C...,11wfl35,Hello everyone! After years of looking at the ...,-Megaflare-,1,0,0.43
16,How long does it take to get good at games lik...,11we5gd,Or any of those old games where you pretty muc...,Mister_Sasquatch,8,0,0.43
19,I just can't stay on one game for very long...,11wbfj1,"So I'm a big gamer, ya see? I play a lot of ga...",DarkFluo,35,1,0.55
21,Looking for a game.,11waj5u,"I remember playing this game a while back, it ...",myPikachu12,2,0,0.5
22,What single player game stories do you think c...,11w9o6c,By this I mean....something like the Legacy Of...,Call_It_Luck,13,0,0.36
23,Which is your favourite safehouse/save locatio...,11w6tv1,"A little too open ended, but I think it might ...",Madman1939,6,4,0.7
24,Is steam deck right for me?,11w2zxl,"Hi all,\n\n\nI am considering buying a steam d...",JohnnyBravo655,8,5,0.78
25,question,11w28hn,\n\nso im new to pc gaming and wanted to ask\...,BigOlHaggis_Sco,7,1,0.67
26,What Is More Satisfying?,11w1crk,A new game living up to your hype? \n\nor \n\n...,Aron_Blue,1,3,1.0


In [290]:
gaming_df.shape

(371, 7)

In [291]:
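import os

# mode='a' appends across runs; write the header only when the file does not exist yet,
# so repeated runs do not insert extra header rows in the middle of the data
gaming_df.to_csv('gamingData5.csv', mode = 'a', header = not os.path.exists('gamingData5.csv'))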

In [295]:
gamer_posts2 = reddit.subreddit('gaming')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in gamer_posts2.hot(limit=10000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
gaming_df2 = pd.DataFrame()

# Assigning lists to columns
gaming_df2['Title'] = titles
gaming_df2['Id'] = ids
gaming_df2['Text'] = texts
gaming_df2['Author'] = authors
gaming_df2['Number of Comments'] = numComments
gaming_df2['Number of Upvotes'] = scores
gaming_df2['Ratio of Upvotes'] = upvoteRatios


In [296]:
gaming_df2.replace('', np.nan, inplace = True)
gaming_df2.dropna(inplace = True)
gaming_df2.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,Making Friends Monday! Share your game tags here!,11waa6q,Use this post to look for new friends to game ...,AutoModerator,2,2,0.59
48,"I usually suck at accuracy, but RE4 in VR make...",11vr809,This is for the Quest 2 version.,lemmiewinxs,10,28,0.79
49,I have now finished every mainline Devil May C...,11wfl35,Hello everyone! After years of looking at the ...,-Megaflare-,1,0,0.43
50,Which is your favourite safehouse/save locatio...,11w6tv1,"A little too open ended, but I think it might ...",Madman1939,6,5,0.78
54,Are there any lesser known unreleased games yo...,11w0gyc,"So, we all know there are a ton of BIG game r...",Tazx14,51,8,0.76
55,How long does it take to get good at games lik...,11we5gd,Or any of those old games where you pretty muc...,Mister_Sasquatch,8,0,0.5
57,What are some engaging games that can be playe...,11w40cq,I broke my elbow and am not allowed to hold an...,sgcorona,33,2,0.61
58,Question about Vampire: The Masquerade,11vz7b9,I was looking for old RPG games with a good s...,sebsonion,32,6,0.69
60,Is steam deck right for me?,11w2zxl,"Hi all,\n\n\nI am considering buying a steam d...",JohnnyBravo655,8,5,0.78
61,Please help me see PROS/CONS for getting ps5 o...,11wg0u2,**PLEASE READ MY ARGUMENTS!**\n**They are a bi...,NoonYsk,6,0,0.33


In [297]:
gaming_df2.to_csv('gamingData6.csv')
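The "new" and "hot" listings overlap: for example, post `11wfl35` appears in both `gaming_df` and `gaming_df2`, and the same post can have different vote counts in each snapshot (`11w6tv1` shows 4 upvotes in one and 5 in the other). If the two files are later combined, duplicates should be removed by post Id. The sketch below assumes both DataFrames share the same columns; the helper name `combine_listings` is hypothetical, and it keeps the first occurrence of each Id.

```python
import pandas as pd


def combine_listings(*frames):
    """Stack DataFrames from different listings and keep one row per post Id.

    The first occurrence wins, so pass the frame whose vote counts you prefer first.
    """
    combined = pd.concat(frames, ignore_index=True)
    return combined.drop_duplicates(subset="Id", keep="first").reset_index(drop=True)


# Usage in this notebook: gaming_all = combine_listings(gaming_df, gaming_df2)
```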

## Lawyer Talk Subreddit

The second subreddit that I will be collecting and cleaning up posts from is [r/lawyertalk](https://www.reddit.com/r/Lawyertalk/).

In [26]:
# Initializing lawyer_posts to the subreddit titled "lawyertalk"
lawyer_posts = reddit.subreddit('lawyertalk')

In [27]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to discussion between lawyers
lawyer_posts.description

'This is a place for practicing lawyers to discuss their profession and everything associated with it.  Unlike [/r/law](http://www.reddit.com/r/law/), this is not a place for posting articles or updates about the legal world at large.  Rather, this subreddit is for discussion about lawyering itself.    \n\nBasically, this a great place to:\n\n* Discuss/lament the culture of your firm/non-profit/whatever\n\n* Get advice from other practicing lawyers on anything.\n\n* Vent about issues only other lawyers would find interesting (AEDPA anyone?)\n\n* Post esoteric memes\n\n_______________________________________\n\nRelated Subereddits:\n\n[/r/law](http://www.reddit.com/r/law/) - For discussion about legal news, and law in the abstract\n\n[/r/lawschool](http://www.reddit.com/r/lawschool/) - For discussion about law school\n\n[/r/lawfirm](http://www.reddit.com/r/lawfirm/) - For discussion about solo/small firm practice'

In [28]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in lawyer_posts.new(limit = 5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
lawyer_df = pd.DataFrame()

# Assigning lists to columns
lawyer_df['Title'] = titles
lawyer_df['Id'] = ids
lawyer_df['Text'] = texts
lawyer_df['Author'] = authors
lawyer_df['Number of Comments'] = numComments
lawyer_df['Number of Upvotes'] = scores
lawyer_df['Ratio of Upvotes'] = upvoteRatios

# Print head
lawyer_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,Decisions,11vwtr8,I have been working at a work comp and pi firm...,Significant_Hornet78,0,1,1.0
1,how to learn drafting at home as a new advocat...,11vwd2y,,Raj_deep_,0,0,0.5
2,Are you happy?,11vq9oq,"This may seem like a silly question, but I am ...",maplesyrupluv3r,20,9,0.84
3,Advice for leaving litigation for transactiona...,11vpnx8,"After trying my best in several positions, I’m...",Unhappy_Pickle22,10,17,0.95
4,Lemon law,11vjine,Anyone practice lemon law and breach of warran...,sea_screen6314,2,2,0.75
5,Help with considering becoming a lawyer and wh...,11vaz7p,"Hi, I am looking at maybe doing law school and...",No_Inflation_4001,13,0,0.33
6,How do you think AI will change the practice o...,11v6eej,"Hello fellow attorneys,\n\nI'm a criminal lawy...",Half-W,5,1,0.56
7,Lying clients..,11v3xju,I'm getting really tired of clients lying to m...,geshupenst,49,90,0.98
8,What should I expect as a soon-to-be first yea...,11v21m8,"I have an idea of what I will be doing, but as...",Constant_Airline_356,11,5,0.78
9,Desk computer setups?,11uts0z,"Share everything, please. I’ve never had an of...",msteel2015,9,8,1.0


In [29]:
lawyer_df.shape

(885, 7)

In [16]:
lawyer_df.replace('', np.nan, inplace = True)
lawyer_df.dropna(inplace = True)
lawyer_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,Legal advice,11akhq9,"So basically on discord, I was in a crypto tra...",Charming_Anteater_73,0,1,1.0
1,Has anyone ever worked with or against Alex Mu...,11ah03n,"If so, what’s he like?",Tracy_Turnblad,1,2,1.0
2,JAG opinion,11aglti,How do you guys feel about JAGs?\n\nnot the tv...,tyrionthedrunk,3,1,1.0
3,Private sub for practicing r/Prosecutors,11aeag7,We recently started a new sub for practicing r...,weirdbeardwolf,1,0,0.33
5,Looking for a quote about lawyers defending gu...,11abm2s,Hello! I heard a quote a while ago about the m...,EarlyInterview1274,3,0,0.33
7,Best SEO companies that get results and charge...,11aauk1,I am working with an SEO company who has gotte...,Fragrant_Self_7104,2,0,0.33
8,Paralegal was rude to me. Am I being unreasona...,11a9u7r,"New hire at an ID firm. Lot of work, lot of pr...",Mission_Ad5628,29,4,0.65
9,Interview?,11a84q5,"Interview? Hello everyone, I'm currently in co...",Then-Poem7543,4,3,0.67
10,Do you generally hate other lawyers?,11a7f7y,"Hate is defined as, you wouldn't totally feel ...",SaltMembership9044,22,1,0.5
11,"In the last 30 hours, I’ve billed 24 of them.",11a5b2z,"Just needed to vent, and also something of a P...",olemiss18,56,117,0.95


In [17]:
lawyer_df.shape

(719, 7)

In [134]:
lawyer_df.to_csv('lawyerData.csv')

In [32]:
lawyer_posts2 = reddit.subreddit('lawyertalk')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in lawyer_posts2.hot(limit = 5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
lawyer_df2 = pd.DataFrame()

# Assigning lists to columns
lawyer_df2['Title'] = titles
lawyer_df2['Id'] = ids
lawyer_df2['Text'] = texts
lawyer_df2['Author'] = authors
lawyer_df2['Number of Comments'] = numComments
lawyer_df2['Number of Upvotes'] = scores
lawyer_df2['Ratio of Upvotes'] = upvoteRatios



In [33]:
lawyer_df2.replace('', np.nan, inplace = True)
lawyer_df2.dropna(inplace = True)
lawyer_df2.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,This is not a sub for requesting legal advice,divoge,"All visitors, please note that this is not a c...",Amicus_Conundrum,6,139,1.0
1,20k Subscribers and Proposed New Rules,wml6ba,"Hi everyone,\n\n**First,** I want to note that...",Amicus_Conundrum,50,87,0.98
2,Advice for leaving litigation for transactiona...,11vpnx8,"After trying my best in several positions, I’m...",Unhappy_Pickle22,10,16,0.91
3,Are you happy?,11vq9oq,"This may seem like a silly question, but I am ...",maplesyrupluv3r,20,8,0.83
4,Lying clients..,11v3xju,I'm getting really tired of clients lying to m...,geshupenst,49,93,0.98
5,Decisions,11vwtr8,I have been working at a work comp and pi firm...,Significant_Hornet78,0,1,1.0
7,Lemon law,11vjine,Anyone practice lemon law and breach of warran...,sea_screen6314,2,2,0.75
8,Are issues with deadlines in discovery normal ...,11us6jh,I'm relatively new to litigation and hit the g...,baesipsa,34,36,0.93
9,What should I expect as a soon-to-be first yea...,11v21m8,"I have an idea of what I will be doing, but as...",Constant_Airline_356,11,5,0.86
10,Desk computer setups?,11uts0z,"Share everything, please. I’ve never had an of...",msteel2015,9,7,0.9


In [34]:
lawyer_df2.to_csv('lawyerData2.csv')
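The "hot" listings tend to start with moderator posts, such as "This is not a sub for requesting legal advice" here and the "Daily Discussion Thread" in r/nba, which appear to be pinned (stickied) rather than ordinary user posts. PRAW submissions expose a `stickied` attribute, so such posts could be skipped during collection if they are not wanted in the dataset. The generator below is a minimal sketch (the name `without_stickied` is hypothetical), checked offline with stand-in objects whose ids and flags are made up.

```python
from types import SimpleNamespace


def without_stickied(submissions):
    """Yield only submissions that are not pinned to the top of the subreddit."""
    for x in submissions:
        # Treat objects without the attribute as not stickied
        if not getattr(x, "stickied", False):
            yield x


# Usage in this notebook: for x in without_stickied(lawyer_posts2.hot(limit=None)): ...
posts = [
    SimpleNamespace(id="pinned1", stickied=True),
    SimpleNamespace(id="regular1", stickied=False),
    SimpleNamespace(id="regular2", stickied=False),
]
kept_ids = [p.id for p in without_stickied(posts)]
```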

## Cryptocurrency Subreddit

The third subreddit that I will be collecting and cleaning up posts from is [r/cryptocurrency](https://www.reddit.com/r/CryptoCurrency/).

In [35]:
# Initializing cryptocurrency_posts to the subreddit titled "cryptocurrency"
cryptocurrency_posts = reddit.subreddit('cryptocurrency')

In [36]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to discussion and news about cryptocurrency
cryptocurrency_posts.description



In [277]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in cryptocurrency_posts.new(limit = 10000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
cryptocurrency_df = pd.DataFrame()

# Assigning lists to columns
cryptocurrency_df['Title'] = titles
cryptocurrency_df['Id'] = ids
cryptocurrency_df['Text'] = texts 
cryptocurrency_df['Author'] = authors
cryptocurrency_df['Number of Comments'] = numComments
cryptocurrency_df['Number of Upvotes'] = scores
cryptocurrency_df['Ratio of Upvotes'] = upvoteRatios

# Print head
cryptocurrency_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,Arbitrum airdrop hype helps zkSync addresses j...,11wiydj,,Squidsoda,2,1,1.0
1,Rattled crypto industry could emerge stronger ...,11wjizv,,Jocogui,12,3,0.8
2,Forbes: Crypto Takeaways From Recent Bank Fail...,11wjhvd,,SoylentYellow05,4,1,1.0
3,We must listen the arguments of those who are ...,11wjepy,Greetings from Portugal.\n\nSo our main goal i...,Not_a__Lawyer,26,5,0.78
4,Dapp to convert all dust?,11wjdg6,"Hey,\nI'm sure I'm like a lot of others in tha...",SpecificTrading,3,3,1.0
5,Interesting divergence with the ALTs?,11wj4un,"Usually, as we all know, when Bitcoin pumps we...",R0Y-BATTY,24,6,1.0
6,"Buying moons price impact, mexc vs sushiswap",11wj4ht,"I just swapped 4.605 eth to moons and got 25,0...",GMEthLoopring,55,15,1.0
7,Binance upcoming Launchpad.,11wiy26,Binance upcoming Launchpad.\n\nSo i have been ...,agus61lll,17,4,0.83
8,The DLT Science Foundation Makes its Public La...,11wivw9,,Perfect_Ability_1190,7,2,0.75
9,How do you respond to people who say that cryp...,11wiulk,This is a very common claim I hear from skepti...,solarsalmon777,83,13,0.84


In [278]:
cryptocurrency_df.shape

(935, 7)

In [279]:
cryptocurrency_df.replace('', np.nan, inplace = True)
cryptocurrency_df.dropna(inplace = True)
cryptocurrency_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
3,We must listen the arguments of those who are ...,11wjepy,Greetings from Portugal.\n\nSo our main goal i...,Not_a__Lawyer,26,5,0.78
4,Dapp to convert all dust?,11wjdg6,"Hey,\nI'm sure I'm like a lot of others in tha...",SpecificTrading,3,3,1.0
5,Interesting divergence with the ALTs?,11wj4un,"Usually, as we all know, when Bitcoin pumps we...",R0Y-BATTY,24,6,1.0
6,"Buying moons price impact, mexc vs sushiswap",11wj4ht,"I just swapped 4.605 eth to moons and got 25,0...",GMEthLoopring,55,15,1.0
7,Binance upcoming Launchpad.,11wiy26,Binance upcoming Launchpad.\n\nSo i have been ...,agus61lll,17,4,0.83
9,How do you respond to people who say that cryp...,11wiulk,This is a very common claim I hear from skepti...,solarsalmon777,83,13,0.84
11,Schrödinger's BTC: It will and will not hit 1 ...,11wimxf,I'm doing this post to show how Arthur Hayes's...,the_Conficker,64,13,0.76
13,What do you believe it was Bitcoin's biggest m...,11wigtr,"Hi fellow Redditors, a few days ago we passed ...",Va3V1ctis,68,18,0.95
15,Hong Kong Monetary Authority works on a regula...,11wicv3,The Hong Kong Monetary Authority is working on...,NEOKnightOne,8,6,1.0
16,Matt Damon is doing another Ad for CDC and Thi...,11wibqa,March 2024 and it's just green dildos all over...,BeingMe007,41,0,0.35


In [280]:
cryptocurrency_df.shape

(449, 7)

In [281]:
cryptocurrency_df.to_csv('cryptoData4.csv')

In [42]:
cryptocurrency_posts2 = reddit.subreddit('cryptocurrency')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in cryptocurrency_posts2.hot(limit = 5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
cryptocurrency_df2 = pd.DataFrame()

# Assigning lists to columns
cryptocurrency_df2['Title'] = titles
cryptocurrency_df2['Id'] = ids
cryptocurrency_df2['Text'] = texts 
cryptocurrency_df2['Author'] = authors
cryptocurrency_df2['Number of Comments'] = numComments
cryptocurrency_df2['Number of Upvotes'] = scores
cryptocurrency_df2['Ratio of Upvotes'] = upvoteRatios

# Print head
cryptocurrency_df2.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,Moon Week 37,11t058d,Hello everyone and welcome to Moon Week for ro...,MoonWeek,668,102,0.88
1,"Daily General Discussion - March 19, 2023 (GMT+0)",11v58f9,**Welcome to the Daily General Discussion thre...,CryptoDaily-,5390,82,0.94
2,I went to the supermarket here in Venezuela an...,11vnevt,"Hi guys, as you probably know I'm Venezuelan l...",WorkingLime,630,979,0.85
3,$18 Trillion Is Needed to Get BTC to $1M Withi...,11vh0j1,,RafaelNobre,680,980,0.83
4,Having An Emergency Fund Is One Of The Most Im...,11vqd67,I rarely hear people recommend having an emerg...,kirtash93,172,87,0.8
5,Caroline Ellison made $6 million at Alameda – ...,11vmpic,,bingorunner,190,136,0.86
6,I just realized i lost about 900$,11vw1ps,"I was kinda active at one time in here, moved ...",KermitTheFrogo01,88,30,0.86
7,"Bullish Trends are Highly Manipulated, Says Cr...",11vmxvj,,JayReyd,300,82,0.72
8,Scam Warning: We have seen dozens of Arbitrum ...,11vvp9z,"With many big crypto events, scammers will be ...",CryptoMaximalist,70,29,0.97
9,What’s your expectation of cryptocurrency in 2...,11vqh35,I have a few: \n\n- The top 100 marketcap cryp...,genjitenji,119,38,0.84


In [43]:
cryptocurrency_df2.replace('', np.nan, inplace = True)
cryptocurrency_df2.dropna(inplace = True)
cryptocurrency_df2.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
0,Moon Week 37,11t058d,Hello everyone and welcome to Moon Week for ro...,MoonWeek,668,102,0.88
1,"Daily General Discussion - March 19, 2023 (GMT+0)",11v58f9,**Welcome to the Daily General Discussion thre...,CryptoDaily-,5390,82,0.94
2,I went to the supermarket here in Venezuela an...,11vnevt,"Hi guys, as you probably know I'm Venezuelan l...",WorkingLime,630,979,0.85
4,Having An Emergency Fund Is One Of The Most Im...,11vqd67,I rarely hear people recommend having an emerg...,kirtash93,172,87,0.8
6,I just realized i lost about 900$,11vw1ps,"I was kinda active at one time in here, moved ...",KermitTheFrogo01,88,30,0.86
8,Scam Warning: We have seen dozens of Arbitrum ...,11vvp9z,"With many big crypto events, scammers will be ...",CryptoMaximalist,70,29,0.97
9,What’s your expectation of cryptocurrency in 2...,11vqh35,I have a few: \n\n- The top 100 marketcap cryp...,genjitenji,119,38,0.84
10,"Trust me, babe, I will use the pullout method ...",11vk3si,"Listen, babe, everyone knows that using protec...",SenseiRaheem,157,84,0.69
11,"If people ask me how bitcoin works, I have fou...",11vobdz,1. First method is just don't do it. Don't bot...,AverageLiberalJoe,94,45,0.82
13,Coinbase is looking for a new overseas headqua...,11uycvy,**The company is seeking a new platform for cr...,plug_and_pray,480,1190,0.94


In [44]:
cryptocurrency_df2.to_csv('cryptoData2.csv')

## NBA Subreddit

The fourth subreddit that I will be collecting and cleaning up posts from is [r/nba](https://www.reddit.com/r/nba/).

In [6]:
# Initializing sports_posts to the subreddit titled "nba"
sports_posts = reddit.subreddit('nba')

In [7]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to discussion about the NBA
sports_posts.description

'#Submit\r\n\r\n|||||\r\n|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|\r\n|[Posting Rules/Guidelines](/r/nba/wiki/rules)|\r\n\r\n||||||||||||||\r\n|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|\r\n|[Filters](/2o2g15/)|[ds](http://ds.reddit.com/r/nba/?limit=100#ds)|[nw](http://nw.reddit.com/r/nba/?limit=100#nw)|[rm](http://rm.reddit.com/r/nba/?limit=100#rm)|[tt](http://tt.reddit.com/r/nba/?limit=100#tt)|[rb](http://rb.reddit.com/r/nba/?limit=100#rb)|[mt](http://mt.reddit.com/r/nba/?limit=200#mt)|[gt](http://gt.reddit.com/r/nba/?limit=100#gt)|[mm](http://mm.reddit.com/r/nba/?limit=100#mm)|[ns](http://ns.reddit.com/r/nba/#ns)|[hl](http://hl.reddit.com/r/nba/?limit=100#hl)|[am](http://am.reddit.com/r/nba/?limit=100#am)|[su](http://su.reddit.com/r/nba/?limit=100#su)|\r\n\r\n\r\n||\r\n|---|\r\n||\r\n\r\n\r\n>* [](//www.reddit.com)\r\n>* [](//www.twitter.com/nba_reddit "@NBA_Reddit")\r\n>* [](//www.discord.gg/rnba "discord")\r\n\r\n> * **CHI** 10

In [15]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in sports_posts.new(limit = 20000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
sports_df = pd.DataFrame()

# Assigning lists to columns
sports_df['Title'] = titles
sports_df['Id'] = ids
sports_df['Text'] = texts 
sports_df['Author'] = authors
sports_df['Number of Comments'] = numComments
sports_df['Number of upvotes'] = scores
sports_df['Ratio of Upvotes'] = upvoteRatios

# Print head
sports_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Name a player on your team that has done a lot...,11xr7wo,Title. I want to learn more about what players...,Vince_-,7,1,0.67
1,[Charania] Hall of Famer Willis Reed has passe...,11xqq07,,DRAZZILB1424,20,106,0.98
2,Daily Discussion Thread + Game Thread Index,11xqmb9,"# Game Threads Index (March 21, 2023):\n\n|Tip...",NBA_MOD,1,1,1.0
3,Suppose there’s a 3 way trade where Giannis go...,11xqjfs,I’d say the Bucks become worse because they’re...,Ziawn,42,3,0.58
4,[Buha] Darvin Ham wouldn’t confirm that Anthon...,11xqj91,,iksnet,12,13,0.93
5,Willis Reed has passed away at the age of 80. ...,11xpqdc,,HokageEzio,13,146,0.99
6,Raptors' Scottie Barnes day-to-day with wrist ...,11xppbr,,CazOnReddit,8,8,0.9
7,Who makes the All Defense teams as of today?,11xpczz,"For me: \n\n1st Team: Jrue, OG, Giannis, JJJ, ...",DrOzmitazBuckshank,91,12,0.7
8,"If you were an elite East team, of the possibl...",11xpb8t,Let's assume the seeding 1-5 has the same 5 te...,allknowerofknowing,33,2,0.67
9,Willis Reed has sadly passed away at the age o...,11xp8mk,"[Just received word that Willis Reed, 80, pass...",Unique-Warning7798,35,266,0.99


In [16]:
sports_df.shape

(973, 7)

In [17]:
sports_df.replace('', np.nan, inplace = True)
sports_df.dropna(inplace = True)
sports_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Name a player on your team that has done a lot...,11xr7wo,Title. I want to learn more about what players...,Vince_-,7,1,0.67
2,Daily Discussion Thread + Game Thread Index,11xqmb9,"# Game Threads Index (March 21, 2023):\n\n|Tip...",NBA_MOD,1,1,1.0
3,Suppose there’s a 3 way trade where Giannis go...,11xqjfs,I’d say the Bucks become worse because they’re...,Ziawn,42,3,0.58
7,Who makes the All Defense teams as of today?,11xpczz,"For me: \n\n1st Team: Jrue, OG, Giannis, JJJ, ...",DrOzmitazBuckshank,91,12,0.7
8,"If you were an elite East team, of the possibl...",11xpb8t,Let's assume the seeding 1-5 has the same 5 te...,allknowerofknowing,33,2,0.67
9,Willis Reed has sadly passed away at the age o...,11xp8mk,"[Just received word that Willis Reed, 80, pass...",Unique-Warning7798,35,266,0.99
10,[Goodwill] Does race play into MVP voting? The...,11xoryn,[Tweet](https://twitter.com/vincegoodwill/stat...,lopea182,241,36,0.55
11,Why did Curry have such a big drop off in steals?,11xom0v,He averaged 2.1 steals per game during his mvp...,_Pho-Dac-Biet_,18,9,0.76
12,When was the last time a coach as good and you...,11xoeq2,I feel like it’s overlooked how odd and unusua...,SqueakyRadish,15,8,0.72
13,[Malcolm Brogdon] I think Joker is probably No...,11xo355,Malcolm Brogdon for Bally Sports:\n\n**Q: Niko...,Wonderful-Balance711,100,291,0.83


In [18]:
sports_df.shape

(649, 7)
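The share of posts that survive cleaning differs quite a bit between subreddits. The numbers below are copied from the `.shape` outputs above for gaming, cryptocurrency, and NBA; r/lawyertalk is left out because its before and after outputs come from different runs (the post Ids in the two tables do not match). The helper name `retention` is only for illustration.

```python
# (rows before cleaning, rows after cleaning), copied from the .shape outputs in this notebook
shapes = {
    "gaming": (828, 371),
    "cryptocurrency": (935, 449),
    "nba": (973, 649),
}


def retention(before, after):
    """Fraction of collected posts that survive the empty-text / missing-value filter."""
    return after / before


for name, (before, after) in shapes.items():
    print(f"{name}: kept {after}/{before} = {retention(before, after):.1%}")
```

This prints roughly 44.8% for gaming, 48.0% for cryptocurrency, and 66.7% for NBA, so r/nba has a noticeably larger share of text posts in this sample.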

In [19]:
sports_df.to_csv('sportsData3.csv')

In [12]:
sports_posts2 = reddit.subreddit('nba')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in sports_posts2.hot(limit = 10000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
sports_df2 = pd.DataFrame()

# Assigning lists to columns
sports_df2['Title'] = titles
sports_df2['Id'] = ids
sports_df2['Text'] = texts 
sports_df2['Author'] = authors
sports_df2['Number of Comments'] = numComments
sports_df2['Number of upvotes'] = scores
sports_df2['Ratio of Upvotes'] = upvoteRatios

# Print head
sports_df2.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Daily Discussion Thread + Game Thread Index,11xqmb9,"# Game Threads Index (March 21, 2023):\n\n|Tip...",NBA_MOD,1,1,1.0
1,[SERIOUS NEXT DAY THREAD] Post-Game Discussion...,11xepk6,"Here is a place to have in depth, x's and o's,...",NBA_MOD,24,20,0.88
2,[Logan Murdock] Jaylen Brown on remaining in B...,11xm8ax,[Source](https://www.theringer.com/nba/2023/3/...,AashyLarry,620,1858,0.96
3,[Charania] There is optimism that Minnesota Ti...,11xhyt6,,DRAZZILB1424,196,1642,0.98
4,"[Domantas Sabonis] ""The goal was to change the...",11xj060,Domantas Sabonis in a recent interview for ESP...,Wonderful-Balance711,84,1169,0.98
5,[TalkBasket] Klay Thompson after facing Rocket...,11xkkc8,> Asked in postgame about his mentality headin...,JohnLemonnn69,151,912,0.96
6,[The Ringer] “[KD] and JT are friends. They wa...,11xjanp,,Wonderful-Balance711,372,944,0.95
7,[Highlight] Coby White leaves Embiids ankles i...,11x1m2d,,_coed_,640,10107,0.92
8,"[Damichael Cole] Dillon Brooks has lost $248,2...",11xmlcn,Dillon Brooks has lost about as much money thr...,Unique-Warning7798,126,365,0.93
9,[NBA Stats] Domantas Sabonis is on pace to bec...,11xkf4l,,Wonderful-Balance711,32,436,0.97


In [13]:
sports_df2.replace('', np.nan, inplace = True)
sports_df2.dropna(inplace = True)
sports_df2.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Daily Discussion Thread + Game Thread Index,11xqmb9,"# Game Threads Index (March 21, 2023):\n\n|Tip...",NBA_MOD,1,1,1.0
1,[SERIOUS NEXT DAY THREAD] Post-Game Discussion...,11xepk6,"Here is a place to have in depth, x's and o's,...",NBA_MOD,24,20,0.88
2,[Logan Murdock] Jaylen Brown on remaining in B...,11xm8ax,[Source](https://www.theringer.com/nba/2023/3/...,AashyLarry,620,1858,0.96
4,"[Domantas Sabonis] ""The goal was to change the...",11xj060,Domantas Sabonis in a recent interview for ESP...,Wonderful-Balance711,84,1169,0.98
5,[TalkBasket] Klay Thompson after facing Rocket...,11xkkc8,> Asked in postgame about his mentality headin...,JohnLemonnn69,151,912,0.96
8,"[Damichael Cole] Dillon Brooks has lost $248,2...",11xmlcn,Dillon Brooks has lost about as much money thr...,Unique-Warning7798,126,365,0.93
10,[Spears] “I want bigger things for my wife and...,11xk6f0,"> Josh Hart and his wife, Shannon, dated in hi...",iksnet,75,361,0.97
12,[Malcolm Brogdon] I think Joker is probably No...,11xo355,Malcolm Brogdon for Bally Sports:\n\n**Q: Niko...,Wonderful-Balance711,91,222,0.82
14,Old social media posts show that Austin Reaves...,11wzu96,Austin Reaves was a former LeBron Hater nephew...,2789334,825,7120,0.9
15,Willis Reed has sadly passed away at the age o...,11xp8mk,"[Just received word that Willis Reed, 80, pass...",Unique-Warning7798,26,174,0.99
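The cleaning cell above converts empty strings to `NaN` and then drops every row that contains a missing value. A minimal sketch on made-up toy rows (not real Reddit data) shows what that removes. Link and image posts have an empty `selftext`, and posts from deleted accounts have `None` as their author, so both kinds of row disappear. The sketch uses the non-`inplace` form, which should behave the same as the notebook's `inplace = True` calls.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for scraped posts (hypothetical values, not real data)
toy = pd.DataFrame({
    'Title': ['Text post', 'Link post', 'Deleted author'],
    'Id': ['a1', 'b2', 'c3'],
    'Text': ['Some body text', '', 'More text'],
    'Author': ['user_one', 'user_two', None],
})

# Same two steps as the notebook: '' -> NaN, then drop rows with any NaN/None
cleaned = toy.replace('', np.nan).dropna()

print(cleaned['Id'].tolist())  # ['a1']
```

Because `dropna()` looks at every column, a deleted author removes a post even when its text is present. If only empty text should count, `dropna(subset=['Text'])` is one alternative.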


In [14]:
sports_df2.to_csv('sportsData2.csv')

## College Subreddit

The fifth subreddit that I will be collecting and cleaning up posts from is [r/college](https://www.reddit.com/r/college/).

In [53]:
# Initializing college_posts to the subreddit titled "college"
college_posts = reddit.subreddit('college')

In [54]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to discussions about various aspects of college
college_posts.description

'#####Please see our [rules](https://www.reddit.com/r/college/about/rules/) before posting here.\n\n/r/college is a place for discussion related to college and collegiate life.\n\nTo maintain the quality of the discourse, we remove some types of content and ban users for certain violations of community norms. *Help the mods improve this subreddit/enforce these rules by reporting posts that are irrelevant, pointless, or of poor quality.*\n\n**Behavior that will result in a user ban:**\n\n1. **Posting spam** - including but not limited to SURVEYS, blog posts, links to low quality/crowdsourced websites, discord, copypasta, etc.\n2. **Seeking personal gain** – including but not limited to referrals, contests/giveaways, requests for votes/money, any attempt to sell or advertise a product/service/website, etc.\n3. **Engaging in threats or harassment.** \n4. **Advocating for dangerous or illegal activities** – including but not limited to cheating, copyright violation, fraud, etc. \n5. **Spre

In [262]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in college_posts.new(limit = 10000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
college_df = pd.DataFrame()

# Assigning lists to columns
college_df['Title'] = titles
college_df['Id'] = ids
college_df['Text'] = texts 
college_df['Author'] = authors
college_df['Number of Comments'] = numComments
college_df['Number of upvotes'] = scores
college_df['Ratio of Upvotes'] = upvoteRatios

# Print head
college_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,characteristics of those who make six figures ...,11wjclz,what are some characteristics that manifest a ...,tfamidoingwrong,0,1,0.99
1,Didn’t realize my English class assigned work ...,11wixoz,All of my classes (in person and online) - wer...,grateful_todd,0,2,1.0
2,Question regarding undergraduate education in ...,11wiix6,I have the following options to choose from fo...,Wanderer_Of_Space,1,2,1.0
3,Comp sci internships/jobs,11wih0n,I'm having hard time finding internships/jobs ...,CelebrationNo9442,1,1,1.0
4,Scholarships transfer student,11wifrr,Im planning on transferring out of my cc next ...,katquest11,0,1,1.0
5,1 major and 3 minors ?!or double major,11wi20c,Hi i am 21 I am planning on majoring in mathem...,AmalQWEEN,17,1,1.0
6,How often do you pull all nighters?,11whorv,Incoming engineering freshman here. I was jus...,tellgoldtomahwak,10,3,1.0
7,Chico State Admission,11whl41,How rigorous is it?,Dyatlov69,0,0,0.5
8,Is it wrong to not let someone join your group,11wh1nr,So I'm taking a few classes at community colle...,hiddenconcert,1,1,1.0
9,Is it too late to start over?,11wg3b9,I (20f) am currently getting a bachelors degr...,mycatisatux,0,2,1.0
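Every subreddit repeats the same collection cell with only the listing changing. A minimal sketch of folding that into one helper follows. The helper name `posts_to_dataframe` and the `COLUMNS` mapping are hypothetical. The attribute names are the ones the notebook already reads from PRAW submissions. It is exercised here with offline `SimpleNamespace` stand-ins rather than live API calls. Reddit's listing endpoints generally stop at about 1,000 posts, which is consistent with the 10,000-post request leaving `college_df` with 965 rows.

```python
from types import SimpleNamespace

import pandas as pd

# Column name -> attribute on a PRAW Submission, in the notebook's column order
COLUMNS = {
    'Title': 'title',
    'Id': 'id',
    'Text': 'selftext',
    'Author': 'author',
    'Number of Comments': 'num_comments',
    'Number of upvotes': 'score',
    'Ratio of Upvotes': 'upvote_ratio',
}

def posts_to_dataframe(submissions):
    """Build one row per submission from an iterable such as subreddit.new(limit=...)."""
    rows = [{col: getattr(post, attr) for col, attr in COLUMNS.items()}
            for post in submissions]
    return pd.DataFrame(rows, columns=list(COLUMNS))

# Offline stand-ins with the same attribute names as PRAW submissions
fake_posts = [
    SimpleNamespace(title='First', id='x1', selftext='body', author='alice',
                    num_comments=3, score=10, upvote_ratio=0.9),
    SimpleNamespace(title='Second', id='x2', selftext='', author='bob',
                    num_comments=0, score=1, upvote_ratio=1.0),
]
df = posts_to_dataframe(fake_posts)
print(df.shape)  # (2, 7)
```

With a live `reddit` instance, the call would presumably look like `college_df = posts_to_dataframe(college_posts.new(limit = 10000))`.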


In [263]:
college_df.shape

(965, 7)

In [264]:
college_df.replace('', np.nan, inplace = True)
college_df.dropna(inplace = True)
college_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,characteristics of those who make six figures ...,11wjclz,what are some characteristics that manifest a ...,tfamidoingwrong,0,1,0.99
1,Didn’t realize my English class assigned work ...,11wixoz,All of my classes (in person and online) - wer...,grateful_todd,0,2,1.0
2,Question regarding undergraduate education in ...,11wiix6,I have the following options to choose from fo...,Wanderer_Of_Space,1,2,1.0
3,Comp sci internships/jobs,11wih0n,I'm having hard time finding internships/jobs ...,CelebrationNo9442,1,1,1.0
4,Scholarships transfer student,11wifrr,Im planning on transferring out of my cc next ...,katquest11,0,1,1.0
5,1 major and 3 minors ?!or double major,11wi20c,Hi i am 21 I am planning on majoring in mathem...,AmalQWEEN,17,1,1.0
6,How often do you pull all nighters?,11whorv,Incoming engineering freshman here. I was jus...,tellgoldtomahwak,10,3,1.0
7,Chico State Admission,11whl41,How rigorous is it?,Dyatlov69,0,0,0.5
8,Is it wrong to not let someone join your group,11wh1nr,So I'm taking a few classes at community colle...,hiddenconcert,1,1,1.0
9,Is it too late to start over?,11wg3b9,I (20f) am currently getting a bachelors degr...,mycatisatux,0,2,1.0


In [265]:
college_df.shape

(950, 7)

In [266]:
college_df.to_csv('collegeData3.csv')
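By default, `to_csv` also writes the DataFrame index. After `dropna`, the surviving row labels (0, 1, 2, 4, ...) therefore end up in an unnamed first column when the file is read back. This section doesn't show whether later notebooks rely on that column. If they don't, a minimal sketch of the two usual options is to pass `index=False` when writing, or `index_col=0` when reading:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Title': ['a', 'b'], 'Id': ['x1', 'x2']}, index=[0, 5])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.csv')

    # Default: the index is written as an unnamed first column
    df.to_csv(path)
    with_index = pd.read_csv(path)

    # index=False writes only the data columns
    df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(list(with_index.columns))     # ['Unnamed: 0', 'Title', 'Id']
print(list(without_index.columns))  # ['Title', 'Id']
```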

In [65]:
college_posts2 = reddit.subreddit('college')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in college_posts2.hot(limit = 5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
college_df2 = pd.DataFrame()

# Assigning lists to columns
college_df2['Title'] = titles
college_df2['Id'] = ids
college_df2['Text'] = texts 
college_df2['Author'] = authors
college_df2['Number of Comments'] = numComments
college_df2['Number of upvotes'] = scores
college_df2['Ratio of Upvotes'] = upvoteRatios


In [66]:
college_df2.replace('', np.nan, inplace = True)
college_df2.dropna(inplace = True)

In [67]:
college_df2.to_csv('collegeData2.csv')

## Explain Like I'm Five Subreddit

The sixth subreddit that I will be collecting and cleaning up posts from is [r/explainlikeimfive](https://www.reddit.com/r/explainlikeimfive/).

In [190]:
# Initializing elif_posts to the subreddit titled "Explain Like I'm Five"
elif_posts = reddit.subreddit('explainlikeimfive')

In [191]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people asking for concepts to be explained very simply for them
elif_posts.description

"[Request an explanation](/r/explainlikeimfive/submit?selftext=true&title=ELI5%3A)\n\n[Rules](https://www.reddit.com/r/explainlikeimfive/wiki/detailed_rules#)\n\n\n---\n\n[Have an idea to improve ELI5?  r/IdeasForELI5](http://www.reddit.com/r/ideasforeli5)\n\n---\n\n###Before posting \n\n* Make sure to [ read the rules!](https://www.reddit.com/r/explainlikeimfive/wiki/detailed_rules)\n\n* This subreddit is for asking for objective explanations. It is not a repository for any question you may have.\n\n* E is for Explain - merely answering a question is not enough.\n\n* LI5 means friendly, simplified and layperson-accessible explanations - not responses aimed at literal five-year-olds.\n\n* Perform a keyword search, you may find good explanations in past threads. You should also consider looking for your question in the FAQ.\n\n* Don't post to argue a point of view.\n\n\n* Flair your question after you've submitted it.\n\n---\n###Category filters\n\n---\n[Mathematics](https://ma.reddit.c

In [252]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in elif_posts.new(limit = 10000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
elif_df = pd.DataFrame()

# Assigning lists to columns
elif_df['Title'] = titles
elif_df['Id'] = ids
elif_df['Text'] = texts 
elif_df['Author'] = authors
elif_df['Number of Comments'] = numComments
elif_df['Number of upvotes'] = scores
elif_df['Ratio of Upvotes'] = upvoteRatios

# Print head
elif_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,ELI5: What makes objects feel softer or harder?,11wj4ew,Partially inspired by the recent post asking a...,TsukiSora,0,1,1.0
1,ELI5 What is 5G network slicing?,11wiehn,,sakshiiidee,0,0,0.5
2,ELI5 War with Iraq began based on the presumpt...,11wh2om,,MetalGuru94,24,0,0.48
3,Eli5: What's the difference between a pump sho...,11wh1fn,,Big_carrot_69,14,0,0.44
4,"ELI5: If weapons like flamethrowers, poison ga...",11wgnda,,arson1tez,10,0,0.4
5,ELI5: How do high voltage power line insulator...,11wgksp,On large power line towers there are large ins...,gone270,6,15,0.95
6,ELI5 What does it mean to “absorb the cost”?,11wfbi4,Example question being “If discounts or specia...,R3dF0r3,11,2,0.63
7,ELI5: Why are EV chargers restricted to chargi...,11wer5x,I recently got a charger that can charge via s...,McStroyer,14,9,0.74
8,ELI5: Why do electric cars have a shorter rang...,11wej9s,Given that the the overall tyre circumference ...,6425,6,0,0.5
9,ELI5 Haberdasher vs. Tailor vs. Hatter,11wegzt,In the US is seems like haberdasheries made me...,falcon3251,2,0,0.44


In [253]:
elif_df.shape

(992, 7)

In [254]:
elif_df.replace('', np.nan, inplace = True)
elif_df.dropna(inplace = True)
elif_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,ELI5: What makes objects feel softer or harder?,11wj4ew,Partially inspired by the recent post asking a...,TsukiSora,0,1,1.0
5,ELI5: How do high voltage power line insulator...,11wgksp,On large power line towers there are large ins...,gone270,6,15,0.95
6,ELI5 What does it mean to “absorb the cost”?,11wfbi4,Example question being “If discounts or specia...,R3dF0r3,11,2,0.63
7,ELI5: Why are EV chargers restricted to chargi...,11wer5x,I recently got a charger that can charge via s...,McStroyer,14,9,0.74
8,ELI5: Why do electric cars have a shorter rang...,11wej9s,Given that the the overall tyre circumference ...,6425,6,0,0.5
9,ELI5 Haberdasher vs. Tailor vs. Hatter,11wegzt,In the US is seems like haberdasheries made me...,falcon3251,2,0,0.44
11,ELI5 How does 6-pack abs form?,11wdewd,Also why are there only 6 and why are the form...,Taimo-kun,13,11,0.65
12,"ELI5: If Neanderthals were a separate species,...",11wd7yw,Cross-breeding species almost never produces a...,superblinky,52,75,0.83
14,ELI5: Are second stages of deep sleep helpful ...,11wbour,ELI5: If I wake in the night I often go into a...,CalorieStar,5,0,0.5
15,ELI5 Why so many people say ChatGPT is not dat...,11wa7ua,I just don’t get it why some hate it while oth...,Ok-Brother-1055,5,0,0.5


In [255]:
elif_df.shape

(594, 7)
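Cleaning shrinks the ELI5 data much more than the other subreddits, from 992 rows to 594. The head output suggests why: many ELI5 questions are title-only, with an empty `Text`. A minimal sketch of checking which column drives the drop before discarding rows follows. The helper `missing_report` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

def missing_report(df):
    """Count, per column, the values that the notebook's cleaning step treats as missing."""
    return df.replace('', np.nan).isna().sum()

# Hypothetical toy rows shaped like the ELI5 data (title-only questions have empty Text)
toy = pd.DataFrame({
    'Title': ['ELI5: question one', 'ELI5: question two', 'ELI5: question three'],
    'Text': ['', 'Some detail', ''],
    'Author': ['user_a', None, 'user_c'],
})

print({col: int(n) for col, n in missing_report(toy).items()})  # {'Title': 0, 'Text': 2, 'Author': 1}
```

Running `missing_report(elif_df)` before the `replace`/`dropna` cell would show how many of the lost rows come from empty text and how many from deleted authors.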

In [256]:
elif_df.to_csv('elifData4.csv')

In [73]:
elif_posts2 = reddit.subreddit('explainlikeimfive')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in elif_posts2.hot(limit = 5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
elif_df2 = pd.DataFrame()

# Assigning lists to columns
elif_df2['Title'] = titles
elif_df2['Id'] = ids
elif_df2['Text'] = texts 
elif_df2['Author'] = authors
elif_df2['Number of Comments'] = numComments
elif_df2['Number of upvotes'] = scores
elif_df2['Ratio of Upvotes'] = upvoteRatios

# Print head
elif_df2.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,ELI5 is looking for new moderators!,11o5bp8,"Hi everyone,\n\nELI5 is looking for new modera...",ELI5_Modteam,28,18,0.76
1,Bots and AI generated answers on r/explainlike...,zh922v,"Recently, there's been a surge in ChatGPT gene...",ELI5_Modteam,432,2237,0.96
2,ELI5: Newton's Third Law of Motion,11vkt94,"Newton's Third Law of Motion states that ""Ever...",USA_Ball,125,662,0.9
3,Eli5: how have supply chains not recovered ove...,11uxffz,"I understand how they got delayed initially, b...",ernirn,1904,9686,0.95
4,ELI5: why do card readers say to remove card “...,11v8kho,,knguyen2525,270,1501,0.93
5,eli5: how are the floppy-wobbly kung fu swords...,11vmr8e,,msawrlz,86,153,0.82
6,"Eli5: If I have 2% neanderthal genes, can we s...",11vsgum,Edit: The number of generations cannot be less...,user0199,34,65,0.81
7,ELI5: Can someone explain what is in catnip th...,11vop71,,Sanjuro7880,16,99,0.86
8,Eli5: How do countries recover from hyper infl...,11vgg1x,Pretty much what it says on the tin. I just se...,chocobobleh,63,101,0.9
9,ELI5: When a band appears on a TV show and it ...,11ujjj3,"What I mean by that is, obviously the guitars ...",docvoit,355,5071,0.93
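The `hot` listing begins with moderator announcements from `ELI5_Modteam`, which are likely pinned to the top of the subreddit. If pinned posts shouldn't count as ordinary questions, one option is to skip them while collecting. A minimal sketch, assuming the boolean `stickied` attribute on PRAW submissions. The generator name `without_stickied` is hypothetical, and the posts are offline stand-ins:

```python
from types import SimpleNamespace

def without_stickied(submissions):
    """Yield only posts that are not pinned (stickied) by moderators."""
    for post in submissions:
        if not getattr(post, 'stickied', False):
            yield post

# Offline stand-ins; real PRAW submissions expose a boolean `stickied` attribute
posts = [
    SimpleNamespace(id='mod1', stickied=True),
    SimpleNamespace(id='q1', stickied=False),
    SimpleNamespace(id='q2', stickied=False),
]
print([p.id for p in without_stickied(posts)])  # ['q1', 'q2']
```

In the collection cell, this would wrap the listing, e.g. `for x in without_stickied(elif_posts2.hot(limit = 5000)):`.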


In [74]:
elif_df2.replace('', np.nan, inplace = True)
elif_df2.dropna(inplace = True)

In [75]:
elif_df2.to_csv('elifData2.csv')

## Anime Subreddit

The seventh subreddit that I will be collecting and cleaning up posts from is [r/anime](https://www.reddit.com/r/anime/).

In [183]:
# Initializing anime_posts to the subreddit titled "anime"
anime_posts = reddit.subreddit('anime')

In [184]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people who like anime
anime_posts.description

'>> [Results of the 2022 /r/anime Awards!](/comments/11c1yg8)\n\n>> [The Start of Winter 2023 Survey Results](/comments/zz5h8q) *See what everyone\'s looking forward to this season!*\n\n>> [Check out the new user flair options!](/comments/zulfds) *Visit https://flair.r-anime.moe to pick one for yourself.*\n\n>> [Spoiler Tag Changes](/comments/q28ulr) *Tag your spoilers, it\'s easier than ever!*\n\n>> [A Quick-Start Guide to /r/anime](/comments/id248i) *Learn all about the subreddit and why CDF is our Roanapur.*\n\n\n> [Rules](/r/anime/wiki/rules)\n\n> [Subreddit wiki](#/)\n*[Wiki index](/r/anime/wiki)\n[FAQ](/r/anime/w/faq)\n[Mods](/r/anime/w/mods)\n[Comment Faces by Category](/r/anime/w/commentfacescategorized)\n[AMAs](/r/anime/w/amas)\n[Events](/r/anime/w/events)\n[Related Subreddits](/r/anime/w/related_subreddits)\n[Related Websites](/r/anime/w/related_sites)\n[Watch This! Archive](/r/anime/w/watchthisarchive)\n[Writing Club](/r/anime/w/writing_archive)\n[All pages](/r/anime/w/pages

In [185]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in anime_posts.new(limit = 5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
anime_df = pd.DataFrame()

# Assigning lists to columns
anime_df['Title'] = titles
anime_df['Id'] = ids
anime_df['Text'] = texts 
anime_df['Author'] = authors
anime_df['Number of Comments'] = numComments
anime_df['Number of upvotes'] = scores
anime_df['Ratio of Upvotes'] = upvoteRatios

# Print head
anime_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,[Do You Remember Love - Macross Franchise 40th...,11vz77h,#**[Macross Delta](https://i.imgur.com/tVIIist...,Shimmering-Sky,30,11,1.0
1,What was the last trully agressive Tsundere in...,11vyygv,I used to love to watch anime that the main lo...,KirinoSussy,11,0,0.5
2,How to get lots of exercise in as an anime fan,11vyybb,,DontThrowAwayPies,0,0,0.33
3,Gamotan! Ribbitan! Gamotan! Ribbitan! [Occulti...,11vxznk,,NyaaPower,0,2,1.0
4,True courage is not planned! [Dragon Quest: Da...,11vxu4f,,unaviable,1,6,1.0
5,[Rewatch] Uma Musume: Pretty Derby Season 2 Ep...,11vxjgp,Season 2 Episode 2: Never Gonna Give it up!\n\...,Tetraika,38,15,0.94
6,【Watashi no Yuri wa Oshigoto desu！】Character P...,11vx47q,,inspyral,0,10,0.92
7,‘KONOSUBA - An Explosion on This Wonderful Wor...,11vw51d,,MarvelsGrantMan136,8,74,0.95
8,A look at Yuka in Blue Period,11vvzfd,"This is a video I made about Yuka, a non-gende...",shaggyjebus,0,1,0.6
9,McDonald's x Suzume collab advertisment.,11vuscp,,martinsallai666,31,248,0.96


In [186]:
anime_df.shape

(875, 7)

In [187]:
anime_df.replace('', np.nan, inplace = True)
anime_df.dropna(inplace = True)
anime_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,[Do You Remember Love - Macross Franchise 40th...,11vz77h,#**[Macross Delta](https://i.imgur.com/tVIIist...,Shimmering-Sky,30,11,1.0
1,What was the last trully agressive Tsundere in...,11vyygv,I used to love to watch anime that the main lo...,KirinoSussy,11,0,0.5
5,[Rewatch] Uma Musume: Pretty Derby Season 2 Ep...,11vxjgp,Season 2 Episode 2: Never Gonna Give it up!\n\...,Tetraika,38,15,0.94
8,A look at Yuka in Blue Period,11vvzfd,"This is a video I made about Yuka, a non-gende...",shaggyjebus,0,1,0.6
12,[FMAB] Thе problem of the scene of Miles and E...,11vu7cz,I'm talking about the moment in the story when...,Dioduo,4,0,0.5
13,The Familiar of Zero Lent In Violent Easter Re...,11vu7aa,Hello everyone! I am Holofan4life.\n\nWelcome ...,Holofan4life,4,2,0.67
14,What Have You Watched This Past Week That is N...,11vu718,Title says it all - talk about the anime you w...,MetaThPr4h,54,17,0.95
15,[Gintama 2023 Rewatch - Discussion] - Week 11(...,11vu6jp,# [Welcome to eleventh weekly discussion of Gi...,Shocketheth,64,23,0.96
16,How people felt about CloverWorks in 2021 vs. ...,11vtrnl,"So, in 2021, most of the anime community reall...",thealienredditor,15,0,0.38
17,Mou Ippon! • Ippon again! - Episode 11 discussion,11vrmop,"*Mou Ippon!*, episode 11\n\n\n\n# [Rate this e...",AutoLovepon,14,61,0.91


In [188]:
anime_df.shape

(593, 7)

In [189]:
anime_df.to_csv('animeData3.csv')

In [85]:
anime_posts2 = reddit.subreddit('anime')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in anime_posts2.hot(limit = 5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
anime_df2 = pd.DataFrame()

# Assigning lists to columns
anime_df2['Title'] = titles
anime_df2['Id'] = ids
anime_df2['Text'] = texts 
anime_df2['Author'] = authors
anime_df2['Number of Comments'] = numComments
anime_df2['Number of upvotes'] = scores
anime_df2['Ratio of Upvotes'] = upvoteRatios

# Print head
anime_df2.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,"Anime Questions, Recommendations, and Discussi...",11vh9ih,This is a daily megathread for general chatter...,AnimeMod,127,23,0.88
1,Seeking Feedback for /r/anime's New User Exper...,11v7wbv,Hello everyone!\n\nWe're back with another fee...,AnimeMod,124,53,0.86
2,“Atelier Ryza” Anime Announced (Teaser Visual),11vl8yg,,zenzen_0,320,6723,0.99
3,/r/anime Karma Ranking & Discussion | Week 11 ...,11vkss2,,Abysswatcherbel,214,984,0.98
4,"""THE iDOLM@STER Shiny Colors"" Anime Announced",11vjbyg,,dorkmax_executives,76,1099,0.97
5,“The Elusive Samurai” Teaser Visual,11vnla8,,zenzen_0,50,457,0.98
6,'3-nen Z-gumi Ginpachi-sensei' (Gintama Spin-O...,11vjjr4,,RobotiSC,105,765,0.97
7,"Benriya Saitou-san, Isekai ni Iku • Handyman S...",11vmxtp,"*Benriya Saitou-san, Isekai ni Iku*, episode 1...",AutoLovepon,93,393,0.98
8,Kaiju No. 8 - New Character Visual,11viefm,,MarvelsGrantMan136,82,632,0.97
9,French Animation Legend confirms break from hi...,11vbihs,,Psych-roxx,62,1452,0.97


In [86]:
anime_df2.replace('', np.nan, inplace = True)
anime_df2.dropna(inplace = True)
anime_df2.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,"Anime Questions, Recommendations, and Discussi...",11vh9ih,This is a daily megathread for general chatter...,AnimeMod,127,23,0.88
1,Seeking Feedback for /r/anime's New User Exper...,11v7wbv,Hello everyone!\n\nWe're back with another fee...,AnimeMod,124,53,0.86
7,"Benriya Saitou-san, Isekai ni Iku • Handyman S...",11vmxtp,"*Benriya Saitou-san, Isekai ni Iku*, episode 1...",AutoLovepon,93,393,0.98
10,Kyokou Suiri Season 2 • In/Spectre Season 2 - ...,11vnqib,"*Kyokou Suiri Season 2*, episode 11\n\n\n\n# [...",AutoLovepon,48,194,0.96
14,Mou Ippon! • Ippon again! - Episode 11 discussion,11vrmop,"*Mou Ippon!*, episode 11\n\n\n\n# [Rate this e...",AutoLovepon,11,52,0.95
18,Kami-tachi ni Hirowareta Otoko Season 2 • By t...,11vktpi,"*Kami-tachi ni Hirowareta Otoko Season 2*, epi...",AutoLovepon,18,76,0.88
19,Blue Lock - Episode 23 discussion,11utvjx,"*Blue Lock*, episode 23\n\n\n\n# [Rate this ep...",AutoLovepon,433,1958,0.97
20,"Best OP/ED of 2022 AnimeBracket: Round 4, Group A",11vouw7,"Edit: Things are getting short, so here are di...",Wuff_the_Dog,22,28,0.89
22,D4DJ All Mix - Episode 11 discussion,11vptvu,"*D4DJ All Mix*, episode 11\n\nAlternative name...",AutoLovepon,4,23,0.89
23,"Ijiranaide, Nagatoro-san 2nd Attack • Don't To...",11uu9xn,"*Ijiranaide, Nagatoro-san 2nd Attack*, episode...",AutoLovepon,216,1435,0.96


In [87]:
anime_df2.to_csv('animeData2.csv')

## CS Career Questions Subreddit

The eighth subreddit that I will be collecting and cleaning up posts from is [r/cscareerquestions](https://www.reddit.com/r/cscareerquestions/).

In [88]:
# Initializing ccq_posts to the subreddit titled "CS Career Questions"
ccq_posts = reddit.subreddit('cscareerquestions')

In [89]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people asking CS career questions
ccq_posts.description

'**Welcome, one and all, to CSCareerQuestions!** \n\nHere we discuss careers in Computer Science, Computer Engineering, Software Engineering, and related fields. Please keep the conversation professional, adhere to the [reddiquette](https://www.reddit.com/wiki/reddiquette), and remember to [READ OUR RULES](/r/cscareerquestions/w/posting_rules).\n\n---\n\n# Discord\n\nCSCQ regular u/Kevincav runs a discord called CS Career Hub. Please check it out for your chatting needs: https://discord.gg/cscareerhub\n\nPlease note that **we, the CSCQ mod team are not in charge of this discord.**\n\n---\n\n#Want to ask a question?\n\n* **First**: [Read the rules](https://www.reddit.com/r/cscareerquestions/wiki/posting_rules)\n\n* **Second**: [Check out this awesome "quick answers to common questions" thread](https://www.reddit.com/r/cscareerquestions/comments/4qurgo/in_which_i_attempt_to_answer_like_90_of_a_normal/)\n\n* **Third**: [Check the FAQ](http://www.reddit.com/r/cscareerquestions/wiki/index)\

In [90]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in ccq_posts.new(limit = 5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
ccq_df = pd.DataFrame()

# Assigning lists to columns
ccq_df['Title'] = titles
ccq_df['Id'] = ids
ccq_df['Text'] = texts 
ccq_df['Author'] = authors
ccq_df['Number of Comments'] = numComments
ccq_df['Number of upvotes'] = scores
ccq_df['Ratio of Upvotes'] = upvoteRatios

# Print head
ccq_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Elective for CS student,11vxo8j,I realize this could be asked in csMajors subr...,pignoob123,0,0,0.5
1,"Do use softwares like ""Anydesk"" as remote worker?",11vxnkz,"Hey , iam wondering if you guys use any softwa...",Single-Sound-1865,0,0,0.5
2,Cogent Info Tech?,11vx77c,Hey guys! I’ll make this post short but my bro...,chimbucket,0,1,1.0
3,I got call for assessment from one my dream co...,11vx2ap,One of the job requirements was “Your responsi...,mmddev,0,2,1.0
4,Is AI Development more fun than Software Devel...,11vv11k,ChatGPT and DallE are probably the coolest pie...,CsInquirer,4,0,0.33
5,What is your WFH schedule ?,11vukdv,"Fresher here. My whole team follows wfh, recen...",Smarty_guy7,6,7,0.89
6,Summer after internship,11vugob,So after 2 years of the CS grind I landed a su...,ATG_076,2,0,0.5
7,Really conflicted between ML/CS PhD vs. fully ...,11vudmc,Crossposting from r/AskAcademia \- thought I c...,6ottle,1,1,1.0
8,Career switch into Software Development,11vtlig,"Hi everyone, \n\nI'm a 38M, and I'm in the mid...",-smee-is-me-,6,1,1.0
9,I am underqualified for my job. Should I quit?,11vt5sa,"I have been struggling a lot in this job, but ...",Momo-cupcakes,10,1,0.6


In [91]:
ccq_df.shape

(991, 7)

In [92]:
ccq_df.replace('', np.nan, inplace = True)
ccq_df.dropna(inplace = True)
ccq_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Elective for CS student,11vxo8j,I realize this could be asked in csMajors subr...,pignoob123,0,0,0.5
1,"Do use softwares like ""Anydesk"" as remote worker?",11vxnkz,"Hey , iam wondering if you guys use any softwa...",Single-Sound-1865,0,0,0.5
2,Cogent Info Tech?,11vx77c,Hey guys! I’ll make this post short but my bro...,chimbucket,0,1,1.0
3,I got call for assessment from one my dream co...,11vx2ap,One of the job requirements was “Your responsi...,mmddev,0,2,1.0
4,Is AI Development more fun than Software Devel...,11vv11k,ChatGPT and DallE are probably the coolest pie...,CsInquirer,4,0,0.33
5,What is your WFH schedule ?,11vukdv,"Fresher here. My whole team follows wfh, recen...",Smarty_guy7,6,7,0.89
6,Summer after internship,11vugob,So after 2 years of the CS grind I landed a su...,ATG_076,2,0,0.5
7,Really conflicted between ML/CS PhD vs. fully ...,11vudmc,Crossposting from r/AskAcademia \- thought I c...,6ottle,1,1,1.0
8,Career switch into Software Development,11vtlig,"Hi everyone, \n\nI'm a 38M, and I'm in the mid...",-smee-is-me-,6,1,1.0
9,I am underqualified for my job. Should I quit?,11vt5sa,"I have been struggling a lot in this job, but ...",Momo-cupcakes,10,1,0.6


In [59]:
ccq_df.shape

(982, 7)

In [96]:
ccq_df.to_csv('ccqData.csv')

In [93]:
ccq_posts2 = reddit.subreddit('cscareerquestions')

# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in ccq_posts2.hot(limit = 5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
ccq_df2 = pd.DataFrame()

# Assigning lists to columns
ccq_df2['Title'] = titles
ccq_df2['Id'] = ids
ccq_df2['Text'] = texts 
ccq_df2['Author'] = authors
ccq_df2['Number of Comments'] = numComments
ccq_df2['Number of upvotes'] = scores
ccq_df2['Ratio of Upvotes'] = upvoteRatios

# Print head
ccq_df2.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,"Big N Discussion - March 19, 2023",11ve46y,Please use this thread to have discussions abo...,CSCQMods,7,5,0.73
1,"Daily Chat Thread - March 19, 2023",11ve5o1,"Please use this thread to chat, have casual di...",CSCQMods,0,1,0.6
2,Is it acceptable to do lunch 12-1pm at work? A...,11voie0,Asking as a new grad who is trying to understa...,TheCockatoo,214,225,0.74
3,Number of Open Tech Jobs has increased for 2 c...,11vqmgd,https://www.trueup.io/job-trend\n\nThis is a f...,TheCopyPasteLife,46,95,0.89
4,How to enforce good practices in my workplace?,11viy3c,"My team doesn't enforce good practices, and my...",Old-Fennel9061,58,149,0.91
5,What offer should I take?,11vps2i,CA @ 80k w/ 5k relocation to a beach city 60mi...,NoKarmaHalp,85,45,0.87
6,I guess I should just be happy to have a job i...,11vljaw,\#RANT\_ALERT \nI've been doing shitty work t...,Ok-Branch6704,19,33,0.8
7,What is your WFH schedule ?,11vukdv,"Fresher here. My whole team follows wfh, recen...",Smarty_guy7,6,8,0.9
8,2 YoE mentoring a 1 YoE. Tips?,11vr7au,This is my first job and I've been called a to...,sdePanda,6,7,0.9
9,Why are data engineers paid more than software...,11urc5z,Why is their work considered more valuable tha...,CsInquirer,197,524,0.87


In [94]:
ccq_df2.replace('', np.nan, inplace = True)
ccq_df2.dropna(inplace = True)

In [95]:
ccq_df2.to_csv('ccqData2.csv')
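The `new` and `hot` pulls can overlap. Post `11vukdv` ("What is your WFH schedule ?") appears in both `ccq_df` and `ccq_df2`, with a score of 7 in one and 8 in the other because the two snapshots were taken at different times. If the two CSV files are later combined, a minimal sketch of keeping one row per post `Id` follows. It uses small toy frames built from rows in the outputs above.

```python
import pandas as pd

# Toy frames standing in for ccq_df (new) and ccq_df2 (hot), using rows from the outputs above
new_df = pd.DataFrame({'Id': ['11vxo8j', '11vukdv'], 'Number of upvotes': [0, 7]})
hot_df = pd.DataFrame({'Id': ['11voie0', '11vukdv'], 'Number of upvotes': [225, 8]})

# Stack both pulls, then keep one row per post Id (the first occurrence wins)
combined = (
    pd.concat([new_df, hot_df], ignore_index=True)
      .drop_duplicates(subset='Id', keep='first')
      .reset_index(drop=True)
)
print(combined['Id'].tolist())  # ['11vxo8j', '11vukdv', '11voie0']
```

`keep='first'` keeps the `new` snapshot's numbers. `keep='last'` would keep the `hot` snapshot's numbers instead.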

## Rant Subreddit

The ninth subreddit that I will be collecting and cleaning up posts from is [r/rant](https://www.reddit.com/r/rant/).

In [109]:
# Initializing rant_posts to the subreddit titled "Rant!"
rant_posts = reddit.subreddit('rant')

In [110]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people ranting
rant_posts.description
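Unlike the other subreddits, `rant_posts.description` displays nothing here, so the sidebar text is presumably empty or `None` for r/rant. A minimal sketch of falling back to the shorter `public_description` attribute that PRAW `Subreddit` objects also expose follows. The helper `subreddit_blurb` is hypothetical, and the stand-in text is a placeholder, not r/rant's real description.

```python
from types import SimpleNamespace

def subreddit_blurb(subreddit):
    """Return the sidebar description, falling back to the short public description."""
    return subreddit.description or subreddit.public_description or ''

# Offline stand-in; the text below is a placeholder, not r/rant's actual description
fake_rant = SimpleNamespace(description=None, public_description='Placeholder blurb')
print(subreddit_blurb(fake_rant))  # Placeholder blurb
```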



In [267]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in rant_posts.new(limit = 10000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
rant_df = pd.DataFrame()

# Assigning lists to columns
rant_df['Title'] = titles
rant_df['Id'] = ids
rant_df['Text'] = texts 
rant_df['Author'] = authors
rant_df['Number of Comments'] = numComments
rant_df['Number of upvotes'] = scores
rant_df['Ratio of Upvotes'] = upvoteRatios

# Print head
rant_df.head(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Coworker is adamant that we “don’t need to bab...,11wjdp3,I work in property casualty insurance. As a cl...,Aggravating-Fig8495,0,1,1.0
1,Banks are fucking stupid,11wigld,One day it shows my bank account is current an...,puppyknuckles_,0,1,1.0
2,Chewed out a bully because she was picking on ...,11wieoc,Tl;dr: Regina George wannabe kept picking on s...,somewhatrecalcitrant,2,5,1.0
3,I’m not a good man.,11wibx9,"I’m not a good man, and I’m not capable of rea...",United-Hovercraft-32,0,1,1.0
4,Don’t make kids if you can’t afford taking the...,11whfdy,I’m 25 in a month and this year was supposed t...,Timeishere58,0,0,0.5
5,"I wish a psycho prison warden kidnapped me, fo...",11wh0cr,I have this fantasy where some psycho warden ...,Weak-Sand9779,1,0,0.33
6,"Dear parents, being an art major doesn't mean ...",11wgttb,"Yes they are artist who can do that, but im n...",TheGoldminor,4,3,1.0
7,I'm in constant physical pain and I'm exhausted,11wgnxk,"I (26F) suffer from sciatica and sometimes, th...",KarrieDarling,0,3,1.0
8,My boyfriend completely disregards my feelings...,11wgkay,\n\nI see my bf around once every two weeks. I...,ReporterMaleficent41,3,1,1.0
9,Mid twenties with a bleak future where i'll ne...,11wfbbx,Just have to get this off my chest but lately ...,Giedy5,0,1,0.67


In [268]:
rant_df.shape

(979, 7)

In [269]:
rant_df.replace('', np.nan, inplace = True)
rant_df.dropna(inplace = True)
rant_df.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Coworker is adamant that we “don’t need to bab...,11wjdp3,I work in property casualty insurance. As a cl...,Aggravating-Fig8495,0,1,1.0
1,Banks are fucking stupid,11wigld,One day it shows my bank account is current an...,puppyknuckles_,0,1,1.0
2,Chewed out a bully because she was picking on ...,11wieoc,Tl;dr: Regina George wannabe kept picking on s...,somewhatrecalcitrant,2,5,1.0
3,I’m not a good man.,11wibx9,"I’m not a good man, and I’m not capable of rea...",United-Hovercraft-32,0,1,1.0
4,Don’t make kids if you can’t afford taking the...,11whfdy,I’m 25 in a month and this year was supposed t...,Timeishere58,0,0,0.5
5,"I wish a psycho prison warden kidnapped me, fo...",11wh0cr,I have this fantasy where some psycho warden ...,Weak-Sand9779,1,0,0.33
6,"Dear parents, being an art major doesn't mean ...",11wgttb,"Yes they are artist who can do that, but im n...",TheGoldminor,4,3,1.0
7,I'm in constant physical pain and I'm exhausted,11wgnxk,"I (26F) suffer from sciatica and sometimes, th...",KarrieDarling,0,3,1.0
8,My boyfriend completely disregards my feelings...,11wgkay,\n\nI see my bf around once every two weeks. I...,ReporterMaleficent41,3,1,1.0
9,Mid twenties with a bleak future where i'll ne...,11wfbbx,Just have to get this off my chest but lately ...,Giedy5,0,1,0.67


In [270]:
rant_df.shape

(934, 7)
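The cleaning cell turns empty strings into `NaN` and then calls `dropna()` with no arguments, which drops a row if *any* column is missing. That removes posts with no body text, but it also removes text posts whose author deleted their account, because PRAW reports such an author as `None`. A minimal sketch on hypothetical toy rows (not real scraped data) compares the notebook's behavior with restricting the check to the `Text` column via `subset`:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: a normal post, a post with no body text, and a post by a deleted account
posts = pd.DataFrame({
    "Text": ["Body text", "", "More body text"],
    "Author": ["user_a", "user_b", None],
})

posts = posts.replace("", np.nan)
drop_any = posts.dropna()                  # notebook behavior: checks every column
drop_text = posts.dropna(subset=["Text"])  # only requires body text to be present

print(len(drop_any), len(drop_text))  # 1 2
```

Which variant is right depends on whether posts by deleted accounts should stay in the dataset.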

In [271]:
rant_df.to_csv('rantData3.csv')
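By default `to_csv` writes the DataFrame index as an unlabeled first column, so reading a file like this back with a plain `pd.read_csv` produces an extra column named `Unnamed: 0`. A minimal sketch of two ways to avoid that, using a hypothetical temporary file instead of the notebook's CSV files:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Title": ["a", "b"], "Id": ["x1", "x2"]}, index=[3, 7])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")

    # Default: the index is written as an unnamed first column
    df.to_csv(path)
    naive = pd.read_csv(path)                   # extra 'Unnamed: 0' column
    with_index = pd.read_csv(path, index_col=0)  # option 1: read it back as the index

    # Option 2: do not write the index at all
    df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(naive.columns.tolist())  # ['Unnamed: 0', 'Title', 'Id']
```

Option 1 keeps the original row labels (useful after `dropna` leaves gaps); option 2 gives a cleaner file if the labels are not needed.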

In [116]:
rant_posts2 = reddit.subreddit('rant')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

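# Appending various aspects of each post to the lists
# (Reddit listings return at most about 1000 posts, so a larger limit has no extra effect)
for post in rant_posts2.hot(limit = 5000):
    ids.append(post.id)
    authors.append(post.author)
    titles.append(post.title)
    texts.append(post.selftext)
    numComments.append(post.num_comments)
    scores.append(post.score)  # net score: upvotes minus downvotes
    upvoteRatios.append(post.upvote_ratio)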

# New dataframe
rant_df2 = pd.DataFrame()

# Assigning lists to columns
rant_df2['Title'] = titles
rant_df2['Id'] = ids
rant_df2['Text'] = texts 
rant_df2['Author'] = authors
rant_df2['Number of Comments'] = numComments
rant_df2['Number of upvotes'] = scores
rant_df2['Ratio of Upvotes'] = upvoteRatios

# Print head
rant_df2.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,Its bullshit that my parent's income dictates ...,11voa1t,"I'm 22, have been on my own for four years, an...",thereaintshitcaptain,36,174,0.95
1,what the FUCK.,11v5g59,WHY THE FUCK IS THERE NO CORN DOG EMOJI. I WAN...,doot-doot-doot-doot9,54,382,0.9
2,TikTok should go burn in the depths of Hell wh...,11vr5tl,There is literally no other application or pro...,Dragonceratops,9,14,0.82
3,I fucking hate group works.,11vqz2s,I had a group work in math the other day. The ...,heyitsmelolhaha,2,9,1.0
4,Well fuck me,11vvbnp,"Thanks for the nice email heads-up! \n\n""Here,...",Orimeia,4,3,1.0
5,I really wish there was some medication I coul...,11vuau7,,Kaje26,1,3,1.0
6,It's hard for me to make new friends,11vkbsc,I've just been to a convention recently by mys...,AndlenaRaines,7,8,0.84
7,lets admit that the high prices is more from g...,11um5z4,I have to call complete bs on the high prices....,tragically_,87,861,0.95
8,Retirement is a scam,11vte4r,I put money into my 401k to handle the expense...,safely_beyond_redemp,0,2,1.0
9,"3 months later, my ex proved to be the most an...",11vphb9,"we broke up in august, had absolutely no commu...",VastLiving1302,3,3,0.72


In [117]:
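# Convert empty strings (e.g. posts without body text) to NaN, then drop any row with a missing value
rant_df2.replace('', np.nan, inplace = True)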
rant_df2.dropna(inplace = True)

In [118]:
rant_df2.to_csv('rantData2.csv')
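The `new` and `hot` listings of one subreddit can contain the same post, since a recent post can also rank as hot, and the two scrapes are saved to separate CSV files. If they are combined later, a sketch like the following (on hypothetical stand-in DataFrames) keeps one row per post `Id`:

```python
import pandas as pd

# Hypothetical stand-ins for the 'new' and 'hot' DataFrames of one subreddit
new_df = pd.DataFrame({"Id": ["p1", "p2", "p3"], "Title": ["A", "B", "C"]})
hot_df = pd.DataFrame({"Id": ["p3", "p4"], "Title": ["C", "D"]})

# Stack both listings, then keep only the first row seen for each post Id
combined = (
    pd.concat([new_df, hot_df], ignore_index=True)
    .drop_duplicates(subset="Id", keep="first")
    .reset_index(drop=True)
)

print(combined["Id"].tolist())  # ['p1', 'p2', 'p3', 'p4']
```

Scores and comment counts can differ between the two snapshots, so `keep="first"` here simply prefers the copy from the first DataFrame; that choice is arbitrary.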

## Pittsburgh Subreddit

The tenth subreddit that I will be collecting and cleaning up posts from is [r/pittsburgh](https://www.reddit.com/r/pittsburgh/).

In [119]:
# Initializing pgh_posts to the subreddit titled "Pittsburgh"
pgh_posts = reddit.subreddit('pittsburgh')

In [120]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people discussing various things related to Pittsburgh
pgh_posts.description

"This is the front page of Pittsburgh's place on the internet, curated by our community.\n\n[Share](/r/pittsburgh/submit) news, events, and thoughts with/about the Pittsburgh community.\n\nThis is a moderated subreddit. Posts may be removed if they violate /r/pittsburgh [rules](/r/pittsburgh/about/rules)\n\n##### [Search](/r/pittsburgh/search?q=subreddit%3Apittsburgh) and check [FAQ](/r/pittsburgh/wiki/faq) before posting. [Picturesque PittsburghPorn (City Pictures)](/r/PittsburghPorn)!\n___\n\n# [Rules & FAQ](/r/pittsburgh/wiki/faq)\n# [**COMING TO PGH?**](https://www.reddit.com/r/pittsburgh/search?q=pittsburgh+neighborhood+%28moving+OR+visiting%29&restrict_sr=on&include_over_18=on&sort=relevance&t=all) \n# [**JOBS**](/r/pittsburghjobs) \n# [**CLASSIFIEDS**](/r/pittsburghList)\n# [**Good Deeds (give/receive)**](/r/pittsburghgooddeeds)\n[**Recent Comments**](/r/pittsburgh/comments) • [FAQ (editable!)](/r/pittsburgh/wiki/faq) •  [Wiki](/r/pittsburgh/wiki/pages/)-[recent revisions](/r/pi

In [257]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

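# Appending various aspects of each post to the lists
# (Reddit listings return at most about 1000 posts, so a larger limit has no extra effect)
for post in pgh_posts.new(limit = 10000):
    ids.append(post.id)
    authors.append(post.author)
    titles.append(post.title)
    texts.append(post.selftext)
    numComments.append(post.num_comments)
    scores.append(post.score)  # net score: upvotes minus downvotes
    upvoteRatios.append(post.upvote_ratio)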

# New dataframe
pgh_df = pd.DataFrame()

# Assigning lists to columns
pgh_df['Title'] = titles
pgh_df['Id'] = ids
pgh_df['Text'] = texts 
pgh_df['Author'] = authors
pgh_df['Number of Comments'] = numComments
pgh_df['Number of upvotes'] = scores
pgh_df['Ratio of Upvotes'] = upvoteRatios

# Print head
pgh_df.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,"Pittsburgh at night, on a plane",11wjahh,,JKray5_Reddit,0,1,0.99
1,Anyone know a reputable mechanic in the south ...,11wi893,"I need to replace struts in my car, I bought n...",InvisibleAdmin,2,0,0.33
2,Relocating from Denver - pls help!!,11wi7rw,Hi Burgh Reddit!!! I just matched to residency...,SearchingForPanacea,9,1,0.55
3,Workshop?,11wi6wk,Looking for a place to style and work on a car...,PropertyUnhappy1755,1,1,1.0
4,When was spring in full bloom last year?,11wi5jj,Just realized my short/medium term memory loss...,KSMO,2,2,1.0
5,Clifford “Cliff” Morrison seated at gun repair...,11wg5uu,,Yinzerman1992,2,24,0.93
6,Can we get some love for Nino Bonaccorsi?,11wfl3h,,Dr-Chim-Richolds,4,34,0.85
7,PSA: Someone is throwing stuff at cars on 65,11we8hl,Someone threw something at my car this morning...,chad4359,9,70,0.91
8,What's going on this week? Events/Discussion/C...,11w94zp,Visiting? \nWondering what's happening this w...,AutoModerator,2,0,0.5
9,Thoughts on the Sleepy Hollow Development?,11w87pa,The way I see it is the NIMBYs are opposed to ...,nateroon,6,0,0.18


In [258]:
pgh_df.shape

(983, 7)

In [259]:
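# Convert empty strings (e.g. posts without body text) to NaN, then drop any row with a missing value
pgh_df.replace('', np.nan, inplace = True)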
pgh_df.dropna(inplace = True)
pgh_df.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
1,Anyone know a reputable mechanic in the south ...,11wi893,"I need to replace struts in my car, I bought n...",InvisibleAdmin,2,0,0.33
2,Relocating from Denver - pls help!!,11wi7rw,Hi Burgh Reddit!!! I just matched to residency...,SearchingForPanacea,9,1,0.55
3,Workshop?,11wi6wk,Looking for a place to style and work on a car...,PropertyUnhappy1755,1,1,1.0
4,When was spring in full bloom last year?,11wi5jj,Just realized my short/medium term memory loss...,KSMO,2,2,1.0
7,PSA: Someone is throwing stuff at cars on 65,11we8hl,Someone threw something at my car this morning...,chad4359,9,70,0.91
8,What's going on this week? Events/Discussion/C...,11w94zp,Visiting? \nWondering what's happening this w...,AutoModerator,2,0,0.5
9,Thoughts on the Sleepy Hollow Development?,11w87pa,The way I see it is the NIMBYs are opposed to ...,nateroon,6,0,0.18
10,Pittsburgh short film.,11w86ek,Tip of my tongue hasn't been very helpful so I...,C-WhiteD,0,0,0.5
11,Birthday this weekend and I have no idea where...,11w7616,"Fellow elder millennials, help me out please! ...",a-dizzle-dizzle,9,10,0.73
13,Lost Letter - Retrieved,11w5g7r,This is a long shot. But on my flight to Dalla...,igarcia18,2,1,0.56


In [260]:
pgh_df.shape

(606, 7)
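Comparing the shapes before and after cleaning shows how much of each scrape survives. A short sketch using the row counts printed above shows that r/rant keeps most of its posts while r/pittsburgh loses far more, presumably because many r/pittsburgh posts are links or images with no body text (as in the empty `Text` cells of the raw output):

```python
# Row counts copied from the .shape outputs in this notebook: (before, after cleaning)
counts = {
    "rant": (979, 934),
    "pittsburgh": (983, 606),
}

# Fraction of scraped posts that remain after replace('', np.nan) + dropna()
retention = {name: after / before for name, (before, after) in counts.items()}

for name, kept in retention.items():
    print(f"r/{name}: kept {kept:.1%} of posts")
# r/rant: kept 95.4% of posts
# r/pittsburgh: kept 61.6% of posts
```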

In [261]:
pgh_df.to_csv('pghData3.csv')

In [126]:
pgh_posts2 = reddit.subreddit('pittsburgh')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

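# Appending various aspects of each post to the lists
# (Reddit listings return at most about 1000 posts, so a larger limit has no extra effect)
for post in pgh_posts2.hot(limit = 5000):
    ids.append(post.id)
    authors.append(post.author)
    titles.append(post.title)
    texts.append(post.selftext)
    numComments.append(post.num_comments)
    scores.append(post.score)  # net score: upvotes minus downvotes
    upvoteRatios.append(post.upvote_ratio)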

# New dataframe
pgh_df2 = pd.DataFrame()

# Assigning lists to columns
pgh_df2['Title'] = titles
pgh_df2['Id'] = ids
pgh_df2['Text'] = texts 
pgh_df2['Author'] = authors
pgh_df2['Number of Comments'] = numComments
pgh_df2['Number of upvotes'] = scores
pgh_df2['Ratio of Upvotes'] = upvoteRatios

# Print head
pgh_df2.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,What's going on this week? Events/Discussion/C...,11q0n9o,Visiting? \nWondering what's happening this w...,AutoModerator,13,7,0.77
1,"Can we get an ""East Palestine Train Derailment...",1108q17,Some people keep asking questions in good fait...,zappafrank2112,286,580,0.93
2,Caught the Red Line going through Arlington th...,11vk6ne,,Present-Purchase-761,32,575,0.95
3,How does a small Pennsylvania borough get its ...,11vm3py,,Aggravating_Foot_528,2,17,0.83
4,Pirate programs,11vb2m8,,827xxx,2,83,0.96
5,Good warehouse jobs in the area?,11vps2v,Preferably safer warehouses. Looking to move t...,plant-milk,18,7,0.73
6,Pittsburgh’s Strip District attracts 'melting ...,11vvvzo,,Aggravating_Foot_528,5,4,0.7
7,This tunnel in Australia installed green light...,11uto83,,hereforthebeers,69,385,0.96
8,"Men of Pittsburgh, where could one go to buy a...",11vwp1a,Don't own a suit and my buddy is getting marri...,WouldYouKindly1417,10,2,0.75
9,Operator of 8 McDonald's in Pittsburgh region ...,11v6zpl,,oldschoolskater,46,62,0.87


In [127]:
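# Convert empty strings (e.g. posts without body text) to NaN, then drop any row with a missing value
pgh_df2.replace('', np.nan, inplace = True)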
pgh_df2.dropna(inplace = True)

In [128]:
pgh_df2.to_csv('pghData2.csv')
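Some rows, such as the r/pittsburgh weekly events thread, are posted by the `AutoModerator` bot rather than by a person. If those recurring posts are unwanted, one option is to filter them out by author name. In the notebook the `Author` column holds PRAW `Redditor` objects, so this sketch compares on the string form of each author; the rows below are hypothetical:

```python
import pandas as pd

# Hypothetical rows; astype(str) turns PRAW Redditor objects (or plain strings) into usernames
posts = pd.DataFrame({
    "Title": ["What's going on this week?", "Relocating from Denver", "Workshop?"],
    "Author": ["AutoModerator", "SearchingForPanacea", "PropertyUnhappy1755"],
})

# Keep only rows not written by the AutoModerator bot
human_posts = posts[posts["Author"].astype(str) != "AutoModerator"]

print(len(human_posts))  # 2
```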

## Broadway Subreddit

The eleventh subreddit that I will be collecting and cleaning up posts from is [r/broadway](https://www.reddit.com/r/Broadway/).

In [129]:
# Initializing bway_posts to the subreddit titled "Broadway"
bway_posts = reddit.subreddit('broadway')

In [130]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people discussing various things related to Broadway
bway_posts.description

"**Welcome to /r/broadway!**\n\nThis subreddit is dedicated to anything related to Broadway, including shows, music, actors, actresses, etc. If you work on or near Broadway, feel free to do an AMA.\n\nIf you want more information on Broadway, visit these helpful links:\n\n\n* [**List of shows on and off Broadway**](http://www.broadway.com/)\n\n* [**Ratings of shows currently on Broadway**](http://theater.nytimes.com/readersreviews/theater/highlyrated/broadway/index.html)\n\n* [**Broadway news, forums, articles etc.**](http://www.broadwayworld.com)\n\n* [**Cheap Broadway Tickets**](http://www.broadwayforbrokepeople.com)\n\n* [**Broadway Lottery/Rush/SRO Policies**]\n(http://www.playbill.com/article/broadway-rush-lottery-and-standing-room-only-policies-com-116003)\n\nIf you are requesting help with finding songs for performance or auditions, please give us as much information as possible, including: age, gender, voice type and range, a list of other songs in your book, and/or links to au

In [247]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

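# Appending various aspects of each post to the lists
# (Reddit listings return at most about 1000 posts, so a larger limit has no extra effect)
for post in bway_posts.new(limit = 10000):
    ids.append(post.id)
    authors.append(post.author)
    titles.append(post.title)
    texts.append(post.selftext)
    numComments.append(post.num_comments)
    scores.append(post.score)  # net score: upvotes minus downvotes
    upvoteRatios.append(post.upvote_ratio)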

# New dataframe
bway_df = pd.DataFrame()

# Assigning lists to columns
bway_df['Title'] = titles
bway_df['Id'] = ids
bway_df['Text'] = texts 
bway_df['Author'] = authors
bway_df['Number of Comments'] = numComments
bway_df['Number of upvotes'] = scores
bway_df['Ratio of Upvotes'] = upvoteRatios

# Print head
bway_df.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,‘Brokeback Mountain’ Adapted For West End Stag...,11wg8kb,,SanaR11,13,43,0.94
1,Help settle an argument...,11wddeg,"Is it correct to use the term ""Broadway"" for s...",OldChemistry8220,39,0,0.5
2,&Juliet SRO + Stage Door Experience,11wctbe,,zck13,1,5,1.0
3,Coming to NYC!,11wci5q,I have just booked flights (from UK) to come t...,smezme2,4,1,1.0
4,Help Me Decide - Show to Pair with Parade,11wb0c8,I want next weekend to be a mind-blowing weeke...,ILoveYourPuppies,5,1,1.0
5,School board bans 'Addams Family' musical over...,11wav0u,,TicoDreams,2,8,0.9
6,Video: CHICAGO's Jinkx Monsoon Speaks Out Agai...,11wa72m,,TicoDreams,1,13,0.81
7,Sweeney Todd felt more religious-ly than I exp...,11w8t8q,There was a song called “God that’s good” and ...,FuzzyAd1627,19,41,0.67
8,Recommendations for mid April?,11w88m3,I'm visiting NY April 14-19 and would love any...,amarama,2,0,0.25
9,I saw White Girl In Danger today,11w7l5x,,RapGamePterodactyl,6,2,1.0


In [248]:
bway_df.shape

(995, 7)

In [249]:
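# Convert empty strings (e.g. posts without body text) to NaN, then drop any row with a missing value
bway_df.replace('', np.nan, inplace = True)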
bway_df.dropna(inplace = True)
bway_df.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
1,Help settle an argument...,11wddeg,"Is it correct to use the term ""Broadway"" for s...",OldChemistry8220,39,0,0.5
3,Coming to NYC!,11wci5q,I have just booked flights (from UK) to come t...,smezme2,4,1,1.0
4,Help Me Decide - Show to Pair with Parade,11wb0c8,I want next weekend to be a mind-blowing weeke...,ILoveYourPuppies,5,1,1.0
7,Sweeney Todd felt more religious-ly than I exp...,11w8t8q,There was a song called “God that’s good” and ...,FuzzyAd1627,19,41,0.67
8,Recommendations for mid April?,11w88m3,I'm visiting NY April 14-19 and would love any...,amarama,2,0,0.25
10,What's best way to get last-minute Broadway ai...,11w6w7k,Strange request I know but I work in Times Squ...,gold_and_diamond,3,0,0.4
13,Camelot or Sweeney Todd,11w6itt,Next month I have a short window where I have ...,gmviking,4,2,0.75
15,White Girl In Danger is what Musical Theater n...,11w5tsf,Saw it and was blown away by the imagination a...,OddAcanthocephala419,8,0,0.46
17,Just saw Six - Amazing!,11w53p7,"Long time lurker, first time poster. Today we ...",Nisi-Marie,3,25,0.96
18,Hamilton Angelica Tour Review,11w4you,"Hello, first review. Today I saw the Hamilton ...",Minirth22,9,7,1.0


In [250]:
bway_df.shape

(629, 7)
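The `Number of upvotes` column actually stores PRAW's `score`, which is the net score (upvotes minus downvotes); rows with a score of 0 and a ratio of 0.5 only make sense that way. Writing the net score as $s = u - d$ and the upvote ratio as $r = u/(u+d)$, the total vote count is $s/(2r-1)$, so upvotes can be roughly estimated as $u = rs/(2r-1)$. A minimal sketch of that arithmetic with a hypothetical `estimate_upvotes` helper; because `upvote_ratio` is rounded to two decimals and Reddit fuzzes vote counts, treat the result as an approximation, and note that the formula is undefined at $r = 0.5$:

```python
def estimate_upvotes(score, ratio):
    """Rough upvote estimate from net score and upvote ratio (hypothetical helper).

    From s = u - d and r = u / (u + d): total votes = s / (2r - 1), upvotes = r * total.
    Returns None when r is 0.5, where the formula divides by zero.
    """
    denominator = 2 * ratio - 1
    if abs(denominator) < 1e-9:
        return None
    return ratio * score / denominator


# Values from the r/broadway output above: score 41, upvote ratio 0.67
print(round(estimate_upvotes(41, 0.67)))  # 81
```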

In [251]:
bway_df.to_csv('bwayData3.csv')

In [135]:
bway_posts2 = reddit.subreddit('broadway')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

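# Appending various aspects of each post to the lists
# (Reddit listings return at most about 1000 posts, so a larger limit has no extra effect)
for post in bway_posts2.hot(limit = 5000):
    ids.append(post.id)
    authors.append(post.author)
    titles.append(post.title)
    texts.append(post.selftext)
    numComments.append(post.num_comments)
    scores.append(post.score)  # net score: upvotes minus downvotes
    upvoteRatios.append(post.upvote_ratio)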

# New dataframe
bway_df2 = pd.DataFrame()

# Assigning lists to columns
bway_df2['Title'] = titles
bway_df2['Id'] = ids
bway_df2['Text'] = texts 
bway_df2['Author'] = authors
bway_df2['Number of Comments'] = numComments
bway_df2['Number of upvotes'] = scores
bway_df2['Ratio of Upvotes'] = upvoteRatios


In [136]:
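# Convert empty strings (e.g. posts without body text) to NaN, then drop any row with a missing value
bway_df2.replace('', np.nan, inplace = True)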
bway_df2.dropna(inplace = True)

In [137]:
bway_df2.to_csv('bwayData2.csv')
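Every subreddit repeats the same collect-then-build pattern. A sketch of a hypothetical helper, `posts_to_df`, that builds the same seven columns from any iterable of submissions could replace those repeated cells. The example uses `SimpleNamespace` stand-ins instead of live PRAW objects so it runs without API credentials:

```python
from types import SimpleNamespace

import pandas as pd

# Column names used throughout the notebook
COLUMNS = [
    "Title", "Id", "Text", "Author",
    "Number of Comments", "Number of upvotes", "Ratio of Upvotes",
]


def posts_to_df(posts):
    """Build the notebook's seven-column DataFrame from an iterable of submissions.

    Hypothetical helper: any object with id, author, title, selftext,
    num_comments, score and upvote_ratio attributes works, such as a PRAW submission.
    """
    rows = [
        [p.title, p.id, p.selftext, p.author, p.num_comments, p.score, p.upvote_ratio]
        for p in posts
    ]
    return pd.DataFrame(rows, columns=COLUMNS)


# Stand-in submissions with made-up values
fake_posts = [
    SimpleNamespace(id="abc123", author="user_a", title="Hello", selftext="Body",
                    num_comments=2, score=5, upvote_ratio=0.9),
    SimpleNamespace(id="def456", author=None, title="Link post", selftext="",
                    num_comments=0, score=1, upvote_ratio=1.0),
]
df = posts_to_df(fake_posts)
print(df.shape)  # (2, 7)
```

With PRAW, the call would look like `posts_to_df(reddit.subreddit('broadway').hot(limit = 1000))`. Passing `columns` explicitly keeps the seven columns even when a listing returns no posts.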

## Highschool Subreddit

The twelfth subreddit that I will be collecting and cleaning up posts from is [r/highschool](https://www.reddit.com/r/highschool/).

In [138]:
# Initializing hs_posts to the subreddit titled "Highschool"
hs_posts = reddit.subreddit('highschool')

In [139]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to people discussing high school
hs_posts.description

"Talk about anything to do with high school.\n\n**Related Subreddits**\n\n/r/studytips\n\n/r/studying\n\nGet better at studying!\n\n/r/SAT  \nFor your assistance in preparation of the SAT\n\n/r/ACT\n\nFor your assistance in preparation of the ACT\n\n\n[/r/Under18](http://www.reddit.com/r/under18)\n\nA subreddit for all those under the age of 18\n\n[/r/APStudents](http://www.reddit.com/r/apstudents)\n\nFor those high-achieving students\n\n[/r/AP_Central](http://www.reddit.com/r/AP_Central)\n\nHelping AP Students excel in their individual classes\n\n/r/applyingtocollege\n\nIt's never too early!\n\n/r/AskHSTeacher\n\nAsk anything you want!\n\n/r/gedready \n\nNeed help with getting your GED?\n\n[/r/HomeworkHelp](http://www.reddit.com/r/homeworkhelp)\n\nA subreddit for help with your homework. \n\n[/r/Tutor](http://www.reddit.com/r/tutor)\n\nA place where a tutor and student can meet.\n\n[/r/Teenagers](http://www.reddit.com/r/teenagers)\n\nA subreddit for actual teenagers. \n\n/r/askredditt

In [140]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

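# Appending various aspects of each post to the lists
# (Reddit listings return at most about 1000 posts, so a larger limit has no extra effect)
for post in hs_posts.new(limit = 5000):
    ids.append(post.id)
    authors.append(post.author)
    titles.append(post.title)
    texts.append(post.selftext)
    numComments.append(post.num_comments)
    scores.append(post.score)  # net score: upvotes minus downvotes
    upvoteRatios.append(post.upvote_ratio)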

# New dataframe
hs_df = pd.DataFrame()

# Assigning lists to columns
hs_df['Title'] = titles
hs_df['Id'] = ids
hs_df['Text'] = texts 
hs_df['Author'] = authors
hs_df['Number of Comments'] = numComments
hs_df['Number of upvotes'] = scores
hs_df['Ratio of Upvotes'] = upvoteRatios

# Print head
hs_df.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,I’m in pain due to my period (have cramps.) Sh...,11vxv6a,Like Toy Story 3 should I ask my dad\n\n[View ...,foxxyfafalove99,0,1,1.0
1,Can't wait to leave my 'friends' behind once I...,11vu3qi,"They've made my life an absolute misery, self ...",sewer_dweller_,1,3,0.72
2,is it alright to be anxious about the future?,11vsh4a,i just don’t know what i wanna do or where i w...,ughflower,4,4,1.0
3,What do I do about my friends I feel so anxious,11vnuwz,"Hi, I'm an 18-year-old currently in high schoo...",Popular-Cup2705,2,4,1.0
4,Which device is better,11vmgb8,"So rn I’m in eighth grade, and next year I’m g...",QuailEmbarrassed420,1,1,1.0
5,"""how to study when you don't want to"" by Maria...",11vkxbi,,SSCharles,0,6,0.8
6,I’m looking forward to graduating from high sc...,11vk32y,I still have to make sure that I have all of m...,foxxyfafalove99,1,8,0.91
7,Research about books!,11viati,"Hello everyone, \n\nI hope this community will...",BootJustice,1,1,1.0
8,Poll,11vepzo,Do you feel more confident with your hair long...,NarenNaren07,1,2,1.0
9,Never had a BF before and there is someone I w...,11veesg,"I'm at a new school, and started about a week ...",Mariiuyu,0,2,1.0


In [141]:
hs_df.shape

(984, 7)

In [142]:
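# Convert empty strings (e.g. posts without body text) to NaN, then drop any row with a missing value
hs_df.replace('', np.nan, inplace = True)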
hs_df.dropna(inplace = True)
hs_df.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,I’m in pain due to my period (have cramps.) Sh...,11vxv6a,Like Toy Story 3 should I ask my dad\n\n[View ...,foxxyfafalove99,0,1,1.0
1,Can't wait to leave my 'friends' behind once I...,11vu3qi,"They've made my life an absolute misery, self ...",sewer_dweller_,1,3,0.72
2,is it alright to be anxious about the future?,11vsh4a,i just don’t know what i wanna do or where i w...,ughflower,4,4,1.0
3,What do I do about my friends I feel so anxious,11vnuwz,"Hi, I'm an 18-year-old currently in high schoo...",Popular-Cup2705,2,4,1.0
4,Which device is better,11vmgb8,"So rn I’m in eighth grade, and next year I’m g...",QuailEmbarrassed420,1,1,1.0
6,I’m looking forward to graduating from high sc...,11vk32y,I still have to make sure that I have all of m...,foxxyfafalove99,1,8,0.91
7,Research about books!,11viati,"Hello everyone, \n\nI hope this community will...",BootJustice,1,1,1.0
8,Poll,11vepzo,Do you feel more confident with your hair long...,NarenNaren07,1,2,1.0
9,Never had a BF before and there is someone I w...,11veesg,"I'm at a new school, and started about a week ...",Mariiuyu,0,2,1.0
10,Advice for making female friends,11vdoic,I’m a male junior in high school currently. I ...,Good_Channel_4484,0,3,1.0


In [143]:
hs_df.shape

(801, 7)

In [143]:
hs_df.to_csv('hsData.csv')

In [144]:
hs_posts2 = reddit.subreddit('highschool')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

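# Appending various aspects of each post to the lists
# (Reddit listings return at most about 1000 posts, so a larger limit has no extra effect)
for post in hs_posts2.hot(limit = 5000):
    ids.append(post.id)
    authors.append(post.author)
    titles.append(post.title)
    texts.append(post.selftext)
    numComments.append(post.num_comments)
    scores.append(post.score)  # net score: upvotes minus downvotes
    upvoteRatios.append(post.upvote_ratio)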

# New dataframe
hs_df2 = pd.DataFrame()

# Assigning lists to columns
hs_df2['Title'] = titles
hs_df2['Id'] = ids
hs_df2['Text'] = texts 
hs_df2['Author'] = authors
hs_df2['Number of Comments'] = numComments
hs_df2['Number of upvotes'] = scores
hs_df2['Ratio of Upvotes'] = upvoteRatios


In [145]:
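# Convert empty strings (e.g. posts without body text) to NaN, then drop any row with a missing value
hs_df2.replace('', np.nan, inplace = True)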
hs_df2.dropna(inplace = True)

In [146]:
hs_df2.to_csv('hsData2.csv')

## Medicine Subreddit

The thirteenth subreddit that I will be collecting and cleaning up posts from is [r/medicine](https://www.reddit.com/r/medicine/).

In [147]:
# Initializing med_posts to the subreddit titled "Medicine"
med_posts = reddit.subreddit('medicine')

In [148]:
# Printing out a description of the subreddit. Essentially, it is a subreddit where medical professionals discuss medicine
med_posts.description



In [149]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

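# Appending various aspects of each post to the lists
# (Reddit listings return at most about 1000 posts, so a larger limit has no extra effect)
for post in med_posts.new(limit = 5000):
    ids.append(post.id)
    authors.append(post.author)
    titles.append(post.title)
    texts.append(post.selftext)
    numComments.append(post.num_comments)
    scores.append(post.score)  # net score: upvotes minus downvotes
    upvoteRatios.append(post.upvote_ratio)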

# New dataframe
med_df = pd.DataFrame()

# Assigning lists to columns
med_df['Title'] = titles
med_df['Id'] = ids
med_df['Text'] = texts 
med_df['Author'] = authors
med_df['Number of Comments'] = numComments
med_df['Number of upvotes'] = scores
med_df['Ratio of Upvotes'] = upvoteRatios

# Print head
med_df.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,If medicine in the future relies on chat gpt w...,11vxqgb,If everyone starts using chat gpt and char gpt...,jay1982k,0,1,1.0
1,Surgical Positions at Veteran's Hospitals. Wha...,11vuyie,"I'm a mid-career surgical subspecialist, fello...",eric-incognito,4,7,0.89
2,Tennessee shuns federal HIV funds,11vuta2,,DarlinThatSmile,3,16,1.0
3,Finish your course of antibiotics or else..,11vtjo9,Finish your course of antibiotics. Do not stop...,PhysicalGazelle2889,13,0,0.13
4,New California bill would protect doctors who ...,11vspkz,,DarlinThatSmile,8,153,0.99
5,Idaho hospital to stop labor and delivery serv...,11vmsy0,,TeaorTisane,122,510,0.98
6,Sublingual Dexmedetomidine in Psych,11vkxq1,Background: am 7-on-7-off overnight pharmacist...,thatoneguyintheback,26,16,0.91
7,Just for fun: how old is the oldest medication...,11v7x0z,This survey was inspired by an LVN who told me...,formerpharmama,121,116,0.97
8,How Melbourne doctor’s challenge to US naturop...,11v7lrs,,retvets,32,331,0.99
9,At what point does a publication become practice?,11v6esn,I’m curious what others do in their practice w...,getridofwires,32,29,0.9


In [150]:
med_df.shape

(995, 7)

In [151]:
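# Convert empty strings (e.g. posts without body text) to NaN, then drop any row with a missing value
med_df.replace('', np.nan, inplace = True)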
med_df.dropna(inplace = True)
med_df.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,If medicine in the future relies on chat gpt w...,11vxqgb,If everyone starts using chat gpt and char gpt...,jay1982k,0,1,1.0
1,Surgical Positions at Veteran's Hospitals. Wha...,11vuyie,"I'm a mid-career surgical subspecialist, fello...",eric-incognito,4,7,0.89
3,Finish your course of antibiotics or else..,11vtjo9,Finish your course of antibiotics. Do not stop...,PhysicalGazelle2889,13,0,0.13
6,Sublingual Dexmedetomidine in Psych,11vkxq1,Background: am 7-on-7-off overnight pharmacist...,thatoneguyintheback,26,16,0.91
7,Just for fun: how old is the oldest medication...,11v7x0z,This survey was inspired by an LVN who told me...,formerpharmama,121,116,0.97
9,At what point does a publication become practice?,11v6esn,I’m curious what others do in their practice w...,getridofwires,32,29,0.9
10,Teplizumab: A very exciting monoclonal antibod...,11uz0mx,Would love to start a discussion/raise awarene...,DrLegoHair,72,249,0.87
11,Xylazine and skin necrosis,11ur68a,It has been reported in the media that xylazin...,SomeRG,53,77,0.98
12,Do Inpatient Podiatrists Clip Nails?,11unohy,My patient asked me if a podiatrist can come a...,Registered-Nurse,154,136,0.97
14,"It's friday, it's Saint Patricks day. Good luc...",11ubhlk,"I hope you got your helmets strapped on, your ...",oilchangefuckup,71,398,0.98


In [152]:
med_df.shape

(766, 7)

In [144]:
med_df.to_csv('medData.csv')

In [153]:
med_posts2 = reddit.subreddit('medicine')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

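# Appending various aspects of each post to the lists
# (Reddit listings return at most about 1000 posts, so a larger limit has no extra effect)
for post in med_posts2.hot(limit = 5000):
    ids.append(post.id)
    authors.append(post.author)
    titles.append(post.title)
    texts.append(post.selftext)
    numComments.append(post.num_comments)
    scores.append(post.score)  # net score: upvotes minus downvotes
    upvoteRatios.append(post.upvote_ratio)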

# New dataframe
med_df2 = pd.DataFrame()

# Assigning lists to columns
med_df2['Title'] = titles
med_df2['Id'] = ids
med_df2['Text'] = texts 
med_df2['Author'] = authors
med_df2['Number of Comments'] = numComments
med_df2['Number of upvotes'] = scores
med_df2['Ratio of Upvotes'] = upvoteRatios

# Print head
med_df2.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,"Biweekly Careers Thread: March 09, 2023",11mpcdz,"Questions about medicine as a career, about wh...",AutoModerator,7,7,1.0
1,Idaho hospital to stop labor and delivery serv...,11vmsy0,,TeaorTisane,122,511,0.98
2,New California bill would protect doctors who ...,11vspkz,,DarlinThatSmile,8,155,0.99
3,Tennessee shuns federal HIV funds,11vuta2,,DarlinThatSmile,3,16,1.0
4,How Melbourne doctor’s challenge to US naturop...,11v7lrs,,retvets,32,332,0.99
5,Surgical Positions at Veteran's Hospitals. Wha...,11vuyie,"I'm a mid-career surgical subspecialist, fello...",eric-incognito,4,6,0.81
6,Sublingual Dexmedetomidine in Psych,11vkxq1,Background: am 7-on-7-off overnight pharmacist...,thatoneguyintheback,26,14,0.86
7,Just for fun: how old is the oldest medication...,11v7x0z,This survey was inspired by an LVN who told me...,formerpharmama,121,112,0.97
8,Teplizumab: A very exciting monoclonal antibod...,11uz0mx,Would love to start a discussion/raise awarene...,DrLegoHair,72,252,0.87
9,If medicine in the future relies on chat gpt w...,11vxqgb,If everyone starts using chat gpt and char gpt...,jay1982k,0,1,1.0


In [None]:
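# Convert empty strings (e.g. posts without body text) to NaN, then drop any row with a missing value
med_df2.replace('', np.nan, inplace = True)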
med_df2.dropna(inplace = True)

In [154]:
med_df2.to_csv('medData2.csv')
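The cleaning cell just before this save is marked `In [None]`, meaning it had not been executed, so `medData2.csv` may still contain posts with empty text (several r/medicine hot posts have an empty `Text` cell in the output above). A small check run before each `to_csv` call can catch this. A minimal sketch with a hypothetical `is_clean` helper, shown on toy rows:

```python
import numpy as np
import pandas as pd


def is_clean(df):
    """Hypothetical check: True only if no cell is an empty string or missing."""
    has_empty = (df == "").any().any()
    has_missing = df.isna().any().any()
    return not (has_empty or has_missing)


# Toy rows: one post with body text and one without
dirty = pd.DataFrame({"Text": ["Body text", ""], "Author": ["user_a", "user_b"]})
clean = dirty.replace("", np.nan).dropna()

print(is_clean(dirty), is_clean(clean))  # False True
```

In the notebook this could take the form of `assert is_clean(med_df2)` placed directly before `med_df2.to_csv('medData2.csv')`.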

## Adulting Subreddit

The fourteenth subreddit that I will be collecting and cleaning up posts from is [r/adulting](https://www.reddit.com/r/adulting/).

In [155]:
# Initializing ad_posts to the subreddit titled "Adulting"
ad_posts = reddit.subreddit('adulting')

In [156]:
# Printing out a description of the subreddit. Essentially, it is a subreddit dedicated to the responsibilities of adult life, such as bills, budgeting, and jobs
ad_posts.description

'**Welcome to /r/Adulting!**\n-\nUrban Dictionary defines adulting as "Doing something grown-up and responsible" and that is what this subreddit is all about. \n\nWhether it is getting an apartment, paying bills in a timely manner, budgeting, getting a job, furthering higher education or anything else responsible, this is the place to talk about it.\n\nWe welcome **all content related to being responsible and put together.** Victories, tips, questions and struggles are all welcome. \n\n**Rules**\n-\n1. **Don\'t be a dick.** - Everyone\'s adulting journey is different and should be respected. Disrespectful / rude comments will be removed.\n2. **No medical advice.** - Do not ask for or provide medical advice. The only correct answer is to ask your doctor. Do *not* post your random bug bites for identification.\n3. **No NSFW content.** - No porn, OnlyFans, FeetFinder, escorts, etc. There\'s 100+ other subs for that. Keep it out of here.\n\n**Related Subreddits**\n- \n- /r/Frugal\n- /r/per

In [162]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

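# Appending various aspects of each post to the lists
# (Reddit listings return at most about 1000 posts, so a larger limit has no extra effect)
for post in ad_posts.new(limit = 5000):
    ids.append(post.id)
    authors.append(post.author)
    titles.append(post.title)
    texts.append(post.selftext)
    numComments.append(post.num_comments)
    scores.append(post.score)  # net score: upvotes minus downvotes
    upvoteRatios.append(post.upvote_ratio)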

# New dataframe
ad_df = pd.DataFrame()

# Assigning lists to columns
ad_df['Title'] = titles
ad_df['Id'] = ids
ad_df['Text'] = texts 
ad_df['Author'] = authors
ad_df['Number of Comments'] = numComments
ad_df['Number of upvotes'] = scores
ad_df['Ratio of Upvotes'] = upvoteRatios

# Print head
ad_df.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,20 Foods to avoid for patients with chronic ki...,11vxz10,,rockz3r,0,1,1.0
1,Is it worth going university in my mid 20s?,11vxo7a,The degree I’m interested in is Human Resource...,Interesting_Inside22,0,1,0.67
2,What are some non-routine personal care / hygi...,11vxby3,I just wrapped up a very difficult project at ...,throw_somewhere,3,2,1.0
3,Help! How to keep a house clean,11vwxyv,I grew up in a “not so clean” house to put it ...,minmister,4,4,1.0
4,Is it a bad look to ask to go home early at a ...,11vvhqj,Started 4 days ago. The job is easy and so pai...,tylun,5,2,1.0
5,Rent requirement,11vv7y2,Is it possible for another person to pick up r...,Proof-Drama-8310,2,1,1.0
6,how do I clean it up or is the carpet ruined? ...,11vudbz,,Minute_Airport1906,24,8,0.83
7,What speeds should I switch at on a 5 gear man...,11vu9rl,I’ve never driven a manual before and just bou...,LfLnDNat5,8,2,1.0
8,Friends Preferences Dictating Groups Activitie...,11vu0l8,My friend introduced me to another friend of h...,Brilliant-Fall1687,1,2,1.0
9,Advice: should I move in with my boyfriend at ...,11vrcvb,"My boyfriend, 19, and I, 19, have been togethe...",No_Discount_193,21,1,0.6


In [163]:
ad_df.shape

(971, 7)

In [164]:
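# Convert empty strings (e.g. posts without body text) to NaN, then drop any row with a missing value
ad_df.replace('', np.nan, inplace = True)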
ad_df.dropna(inplace = True)
ad_df.head(10)

,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
1,Is it worth going university in my mid 20s?,11vxo7a,The degree I’m interested in is Human Resource...,Interesting_Inside22,0,1,0.67
2,What are some non-routine personal care / hygi...,11vxby3,I just wrapped up a very difficult project at ...,throw_somewhere,3,2,1.0
3,Help! How to keep a house clean,11vwxyv,I grew up in a “not so clean” house to put it ...,minmister,4,4,1.0
4,Is it a bad look to ask to go home early at a ...,11vvhqj,Started 4 days ago. The job is easy and so pai...,tylun,5,2,1.0
5,Rent requirement,11vv7y2,Is it possible for another person to pick up r...,Proof-Drama-8310,2,1,1.0
7,What speeds should I switch at on a 5 gear man...,11vu9rl,I’ve never driven a manual before and just bou...,LfLnDNat5,8,2,1.0
8,Friends Preferences Dictating Groups Activitie...,11vu0l8,My friend introduced me to another friend of h...,Brilliant-Fall1687,1,2,1.0
9,Advice: should I move in with my boyfriend at ...,11vrcvb,"My boyfriend, 19, and I, 19, have been togethe...",No_Discount_193,21,1,0.6
10,I feel like such a grownup 😇,11vra6t,I’m baking a quiche so I don’t have to think a...,DutchieCrochet,10,137,0.99
11,Adulting bitches,11viv4a,Have you ever dressed up for a date and then y...,Nascent_soul05,4,3,0.57


In [165]:
ad_df.shape

(825, 7)
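
The cleaning step relies on a detail worth making explicit: `dropna` only removes genuine missing values, so empty strings have to be turned into `NaN` first, which is what `replace('', np.nan)` does. This is why rows 0 and 6, whose `Text` is empty, no longer appear after cleaning. Because `dropna()` is called without a `subset`, it also drops rows where any other column is missing, for example a post whose author is `None` (PRAW typically reports deleted accounts this way). A minimal sketch on a made-up three-row frame (the values are illustrative, not scraped):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for ad_df: one post with an empty body, one with a missing author
toy = pd.DataFrame({
    "Title": ["post A", "post B", "post C"],
    "Text": ["some text", "", "more text"],
    "Author": ["user_a", "user_b", None],
})

# Step 1: empty strings are not NaN, so dropna() would ignore them; convert them first
cleaned = toy.replace("", np.nan)

# Step 2: drop every row that has a missing value in any column
cleaned = cleaned.dropna()

print(cleaned["Title"].tolist())  # ['post A']
```

Skipping step 1 would keep "post B", since an empty string counts as a present value.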

In [145]:
ad_df.to_csv('adData.csv')
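
With the default settings, `to_csv` also writes the pandas index as an unnamed first column. When such a file is read back with `pd.read_csv`, that column comes back under the header `Unnamed: 0`. A sketch of two ways to avoid the extra column, using an in-memory buffer (`io.StringIO`) in place of `adData.csv` and a toy two-row frame:

```python
import io

import pandas as pd

df = pd.DataFrame({"Title": ["a", "b"], "Id": ["11vxo7a", "11vxby3"]})

# Default: the index is written as an unnamed first column
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
roundtrip_default = pd.read_csv(buf)
print(roundtrip_default.columns.tolist())  # ['Unnamed: 0', 'Title', 'Id']

# Option 1: do not write the index at all
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
no_index = pd.read_csv(buf)
print(no_index.columns.tolist())  # ['Title', 'Id']

# Option 2: keep the file as written and tell read_csv that column 0 is the index
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index_col = pd.read_csv(buf, index_col=0)
print(with_index_col.columns.tolist())  # ['Title', 'Id']
```

Option 2 preserves the gaps left in the index by `dropna`, which can be handy for tracing a row back to the uncleaned frame.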

In [166]:
# Second pass over r/adulting, this time pulling posts from the "hot" listing
ad_posts2 = reddit.subreddit('adulting')

In [167]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in ad_posts2.hot(limit=5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
ad_df2 = pd.DataFrame()

# Assigning lists to columns
ad_df2['Title'] = titles
ad_df2['Id'] = ids
ad_df2['Text'] = texts 
ad_df2['Author'] = authors
ad_df2['Number of Comments'] = numComments
ad_df2['Number of upvotes'] = scores
ad_df2['Ratio of Upvotes'] = upvoteRatios
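
The collection cell is repeated almost verbatim for every subreddit and listing. One way to cut that duplication is a small helper that turns any iterable of submissions into the same seven-column frame. This is a sketch rather than the notebook's own code: the name `posts_to_df` is hypothetical, and the example feeds it a `SimpleNamespace` stand-in instead of a live PRAW `Submission`, so it only assumes each item exposes the attributes used above (`title`, `id`, `selftext`, `author`, `num_comments`, `score`, `upvote_ratio`). Unlike the original cell, it stores the author as a string (in PRAW, `str()` of a Redditor gives the username).

```python
from types import SimpleNamespace

import pandas as pd

# Column names match the ones used throughout this notebook
COLUMNS = ["Title", "Id", "Text", "Author", "Number of Comments",
           "Number of upvotes", "Ratio of Upvotes"]


def posts_to_df(submissions):
    """Hypothetical helper: build the seven-column post frame from any iterable of submissions."""
    rows = []
    for post in submissions:
        rows.append({
            "Title": post.title,
            "Id": post.id,
            "Text": post.selftext,
            # Keep the username as a string; a missing author (e.g. deleted account) stays None
            "Author": str(post.author) if post.author is not None else None,
            "Number of Comments": post.num_comments,
            "Number of upvotes": post.score,
            "Ratio of Upvotes": post.upvote_ratio,
        })
    # Passing columns= keeps the column order and yields a (0, 7) frame for an empty listing
    return pd.DataFrame(rows, columns=COLUMNS)


# Stand-in object with the same attribute names as a PRAW Submission (toy values)
fake_post = SimpleNamespace(title="Example title", id="abc123", selftext="Example body",
                            author="example_user", num_comments=2, score=1, upvote_ratio=1.0)
print(posts_to_df([fake_post]).shape)  # (1, 7)
```

With an authenticated `reddit` instance, the cell above would reduce to something like `ad_df2 = posts_to_df(reddit.subreddit('adulting').hot(limit=5000))`.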


In [168]:
# Same cleaning as before: empty strings become NaN, then rows with any missing value are dropped
ad_df2.replace('', np.nan, inplace=True)
ad_df2.dropna(inplace=True)

In [169]:
ad_df2.to_csv('adData2.csv')

## Legal Advice Subreddit

The fifteenth subreddit that I will be collecting and cleaning posts from is [r/legaladvice](https://www.reddit.com/r/legaladvice/).

In [170]:
# Initializing legal_posts to the subreddit titled "Legal Advice"
legal_posts = reddit.subreddit('legaladvice')

In [171]:
# Displaying the subreddit's description. Essentially, it is a subreddit dedicated to asking simple legal questions
legal_posts.description

"***A place to ask simple legal questions.  Advice here is for informational purposes only and should not be considered final or official advice.  See a local attorney for the best answer to your questions.***\n\n* [**READ OUR RULES**](https://www.reddit.com/r/legaladvice/wiki/index#wiki_general_rules) before posting or commenting.\n\n\n* Get answers to our most common questions, pointers to other sites about the law, and information about finding a lawyer of your own at the [/r/legaladvice wiki](https://www.reddit.com/r/legaladvice/wiki/index).\n\n* See our [list of megathreads](https://www.reddit.com/r/legaladvice/wiki/megathreads) before posting your question.\n\n\n\n\n* For a list of other location-specific legal subreddits, such as the United Kingdom, Ireland, Australia, New Zealand, France, Canada, Mexico, The Netherlands, or the EU [please see here](https://www.reddit.com/r/legaladvice/wiki/index#wiki_other_subreddits). \n\n* For a more relaxed and humorous meta discussion of th

In [173]:
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
# (in practice, Reddit listings return at most about 1,000 posts, so limit=5000 is effectively capped)
for x in legal_posts.new(limit=5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
legal_df = pd.DataFrame()

# Assigning lists to columns
legal_df['Title'] = titles
legal_df['Id'] = ids
legal_df['Text'] = texts 
legal_df['Author'] = authors
legal_df['Number of Comments'] = numComments
legal_df['Number of upvotes'] = scores
legal_df['Ratio of Upvotes'] = upvoteRatios

# Display the first 10 rows
legal_df.head(10)

|   | Title | Id | Text | Author | Number of Comments | Number of upvotes | Ratio of Upvotes |
|---|---|---|---|---|---|---|---|
| 0 | IP Infringement claim for an App | 11vyi7x | Hi,\n\nI made an app as a hobby a couple of mo... | Fun_Clock_4421 | 0 | 1 | 1.0 |
| 1 | Identity stolen in CT to get unemployment bene... | 11vyeaa | I received a letter to my parents’ home within... | TIFUstorytime | 0 | 1 | 1.0 |
| 2 | How do I start my divorce? | 11vycr4 | My husband and I have grown apart and decided ... | MaleficentDinner1615 | 1 | 1 | 1.0 |
| 3 | Carbon County Citations | 11vyc9z | Hello, I was driving through Rawlins Wyoming y... | bvrdy | 0 | 1 | 1.0 |
| 4 | My Sister is being blackmailed | 11vy8s1 | My sister (25f) and her abusive boyfriend (26m... | RedeemDaydream | 2 | 1 | 1.0 |
| 5 | Solar Farm / Taking? | 11vy1wh | Gist: company buys 15 wet acres in a residenti... | 404freedom14liberty | 8 | 2 | 0.75 |
| 6 | What is the punishment for driving alone with ... | 11vy06a | I am sixteen and a half and want to start driv... | Throwaway9888273 | 11 | 1 | 1.0 |
| 7 | "Friend" of mine offered to hold my items whil... | 11vxzsy | I saved the text messages where she agreed to ... | That_nerd4 | 1 | 1 | 1.0 |
| 8 | SCRA question | 11vxyii | My husband is active duty and is currently try... | bubbles0034 | 1 | 1 | 1.0 |
| 9 | Help? | 11vxw5n | So I am 16 in the state of iowa. I am wanting ... | Altruistic-Escape836 | 2 | 1 | 1.0 |


In [174]:
legal_df.shape

(987, 7)

In [175]:
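# Treat empty strings (e.g. posts with no body text) as missing, then drop every row with a missing value
legal_df.replace('', np.nan, inplace=True)
legal_df.dropna(inplace=True)
legal_df.head(10)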

|   | Title | Id | Text | Author | Number of Comments | Number of upvotes | Ratio of Upvotes |
|---|---|---|---|---|---|---|---|
| 0 | IP Infringement claim for an App | 11vyi7x | Hi,\n\nI made an app as a hobby a couple of mo... | Fun_Clock_4421 | 0 | 1 | 1.0 |
| 1 | Identity stolen in CT to get unemployment bene... | 11vyeaa | I received a letter to my parents’ home within... | TIFUstorytime | 0 | 1 | 1.0 |
| 2 | How do I start my divorce? | 11vycr4 | My husband and I have grown apart and decided ... | MaleficentDinner1615 | 1 | 1 | 1.0 |
| 3 | Carbon County Citations | 11vyc9z | Hello, I was driving through Rawlins Wyoming y... | bvrdy | 0 | 1 | 1.0 |
| 4 | My Sister is being blackmailed | 11vy8s1 | My sister (25f) and her abusive boyfriend (26m... | RedeemDaydream | 2 | 1 | 1.0 |
| 5 | Solar Farm / Taking? | 11vy1wh | Gist: company buys 15 wet acres in a residenti... | 404freedom14liberty | 8 | 2 | 0.75 |
| 6 | What is the punishment for driving alone with ... | 11vy06a | I am sixteen and a half and want to start driv... | Throwaway9888273 | 11 | 1 | 1.0 |
| 7 | "Friend" of mine offered to hold my items whil... | 11vxzsy | I saved the text messages where she agreed to ... | That_nerd4 | 1 | 1 | 1.0 |
| 8 | SCRA question | 11vxyii | My husband is active duty and is currently try... | bubbles0034 | 1 | 1 | 1.0 |
| 9 | Help? | 11vxw5n | So I am 16 in the state of iowa. I am wanting ... | Altruistic-Escape836 | 2 | 1 | 1.0 |


In [176]:
legal_df.shape

(986, 7)
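
Only one r/legaladvice row was removed by cleaning (from 987 to 986), compared with 146 for r/adulting (from 971 to 825). Because `dropna()` is called without a `subset`, a row disappears if *any* column is missing, so the row count alone does not say whether empty bodies or missing authors caused the drop. A sketch of a per-column check that could be run before cleaning; `missing_report` is a hypothetical helper and the frame is toy data:

```python
import numpy as np
import pandas as pd


def missing_report(df):
    """Per-column count of values dropna() would treat as missing once '' is mapped to NaN."""
    return df.replace("", np.nan).isna().sum()


# Toy frame (made-up values): one empty body, one missing author
toy = pd.DataFrame({
    "Title": ["post A", "post B", "post C"],
    "Text": ["some text", "", "more text"],
    "Author": ["user_a", "user_b", None],
})

report = missing_report(toy)
print(report)

# Rows the notebook's cleaning step would remove from this frame
rows_dropped = len(toy) - len(toy.replace("", np.nan).dropna())
print(rows_dropped)  # 2
```

If only empty bodies should disqualify a post, `dropna(subset=['Text'])` restricts the drop to that column.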

In [177]:
legal_df.to_csv('legalData.csv')

In [178]:
# Second pass over r/legaladvice, this time pulling posts from the "hot" listing
legal_posts2 = reddit.subreddit('legaladvice')
# New lists
titles, texts, ids, authors, numComments, scores, upvoteRatios = [], [], [], [], [], [], []

# Appending various aspects of posts to lists
for x in legal_posts2.hot(limit=5000):
    ids.append(x.id)
    authors.append(x.author)
    titles.append(x.title)
    texts.append(x.selftext)
    numComments.append(x.num_comments)
    scores.append(x.score)
    upvoteRatios.append(x.upvote_ratio)

# New dataframe
legal_df2 = pd.DataFrame()

# Assigning lists to columns
legal_df2['Title'] = titles
legal_df2['Id'] = ids
legal_df2['Text'] = texts 
legal_df2['Author'] = authors
legal_df2['Number of Comments'] = numComments
legal_df2['Number of upvotes'] = scores
legal_df2['Ratio of Upvotes'] = upvoteRatios


In [179]:
# Same cleaning as before: empty strings become NaN, then rows with any missing value are dropped
legal_df2.replace('', np.nan, inplace=True)
legal_df2.dropna(inplace=True)

In [180]:
legal_df2.to_csv('legalData2.csv')
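
`legal_df` (from the "new" listing) and `legal_df2` (from the "hot" listing) sample the same subreddit, so a recent post can appear in both. If the two files are combined later, deduplicating on the post `Id` avoids counting such posts twice. Combining them is an assumption about later use, not something this notebook does; the sketch below uses toy frames with made-up ids:

```python
import pandas as pd

# Toy stand-ins for the "new" and "hot" frames; "id2" appears in both
new_df = pd.DataFrame({"Id": ["id1", "id2"], "Title": ["post one", "post two"]})
hot_df = pd.DataFrame({"Id": ["id2", "id3"], "Title": ["post two", "post three"]})

combined = (
    pd.concat([new_df, hot_df], ignore_index=True)
    # Keep the first occurrence of each Id (here, the copy from the "new" frame)
    .drop_duplicates(subset="Id", keep="first")
    .reset_index(drop=True)
)
print(combined["Id"].tolist())  # ['id1', 'id2', 'id3']
```

Deduplicating on `Id` rather than on every column matters because fields such as the score and comment count can differ between the two snapshots of the same post.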