# Data Organization

Before reading through this notebook, I recommend going through [Data Collection](https://github.com/Data-Science-for-Linguists-2023/For-Reddit-Grammaticality-Analysis/blob/main/notebooks/dataCollection.ipynb) first.

[nbviewer](https://nbviewer.org/github/Data-Science-for-Linguists-2023/For-Reddit-Grammaticality-Analysis/blob/main/notebooks/dataOrganization.ipynb)

**Outline**
1. [Legal Talk](#Legal-Talk)
2. [Adulting](#Adulting)
3. [Medicine](#Medicine)
4. [High School](#High-School)
5. [Broadway](#Broadway)
6. [Pittsburgh](#Pittsburgh)
7. [Rant](#Rant)
8. [CS Career Questions](#CS-Career-Questions)
9. [Anime](#Anime)
10. [Explain Like I'm Five](#Explain-Like-I'm-Five)
11. [College](#College)
12. [NBA](#NBA)
13. [Cryptocurrency](#Cryptocurrency)
14. [Lawyer Talk](#Lawyer-Talk)
15. [Gaming](#Gaming)
16. [Calculations](#Calculations)
17. [Dropping Entries](#Dropping-Entries)
18. [Cleaning Up](#Cleaning-Up)
19. [Final Dataframes](#Final-Dataframes)
20. [New CSV Files](#New-CSV-Files)

## Purpose


The purpose of this notebook is to combine the .csv files for each subreddit and to trim every subreddit to the same number of posts for analysis. Each subreddit has at least one .csv file of 'hot' posts and one of 'new' posts.

In [1]:
import pandas as pd

In [2]:
# This will keep track of the size of all of the combined csv files
entriesCount = []

## Legal Talk

In [3]:
# Reading in the csv files
legalDataNew = pd.read_csv("../data/legalData.csv")
legalDataHot = pd.read_csv("../data/legalData2.csv")

In [4]:
# Getting the shape
legalDataNew.shape

(986, 8)

In [5]:
legalDataHot.shape

(990, 8)

In [6]:
# Concatenating the dataframes into one big dataframe
legalData = pd.concat([legalDataHot, legalDataNew])
legalData.shape

(1976, 8)

In [7]:
# Dropping duplicates (keep=False removes every copy of a duplicated row)
legalData = legalData.drop_duplicates(keep=False)

In [8]:
# Final shape
legalData.shape

(1962, 8)
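One detail of the `drop_duplicates` call is worth spelling out: with `keep=False`, a row that appears identically in both files is removed entirely instead of being kept once. The sketch below uses invented toy data (the `Id` and `Title` values are placeholders, not real posts) to contrast this with the pandas default, `keep='first'`.

```python
import pandas as pd

# Toy stand-ins for a 'hot' and a 'new' file; post "b" appears in both
hot = pd.DataFrame({"Id": ["a", "b"], "Title": ["first", "second"]})
new = pd.DataFrame({"Id": ["b", "c"], "Title": ["second", "third"]})
combined = pd.concat([hot, new])

# keep=False drops both copies of "b"
noCopies = combined.drop_duplicates(keep=False)

# keep='first' (the pandas default) retains one copy of "b"
oneCopy = combined.drop_duplicates(keep="first")

print(noCopies["Id"].tolist())  # ['a', 'c']
print(oneCopy["Id"].tolist())   # ['a', 'b', 'c']
```

Which behavior is preferable depends on whether a post that shows up in both the 'hot' and 'new' files should still count once in the combined data.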

In [9]:
# Keeping track of the size of all of the final dataframes
entriesCount.append(legalData.shape[0])

The process outlined above for legalData is repeated for every subreddit.
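Because the same read, concatenate, and de-duplicate steps recur in every section, they could also be wrapped in a small helper. This is only a sketch and not part of the original notebook: `combine_scrapes` is a hypothetical name, and it assumes all files for a subreddit share the same columns.

```python
import pandas as pd

def combine_scrapes(paths):
    """Read several scrape files, stack them, and drop exact duplicate rows.

    Hypothetical helper mirroring the cells above. `paths` is a list of
    .csv paths (or file-like objects) that all have the same columns.
    """
    frames = [pd.read_csv(path) for path in paths]
    combined = pd.concat(frames)
    # Same call as in the notebook: every copy of a duplicated row is removed
    return combined.drop_duplicates(keep=False)
```

With such a helper, the Legal Talk section would reduce to something like `legalData = combine_scrapes(["../data/legalData2.csv", "../data/legalData.csv"])` followed by `entriesCount.append(legalData.shape[0])`.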

## Adulting

In [10]:
adDataNew = pd.read_csv("../data/adData.csv")
adDataHot = pd.read_csv("../data/adData2.csv")

In [11]:
adDataNew.shape

(823, 8)

In [12]:
adDataHot.shape

(847, 8)

In [13]:
adData = pd.concat([adDataHot, adDataNew])
adData.shape

(1670, 8)

In [14]:
adData = adData.drop_duplicates(keep=False)

In [15]:
adData.shape

(1670, 8)

In [16]:
entriesCount.append(adData.shape[0])

## Medicine

In [17]:
medDataNew = pd.read_csv("../data/medData.csv")
medDataHot = pd.read_csv("../data/medData2.csv")

In [18]:
medDataNew.shape

(759, 8)

In [19]:
medDataHot.shape

(973, 8)

In [20]:
medData = pd.concat([medDataHot, medDataNew])
medData.shape

(1732, 8)

In [21]:
medData = medData.drop_duplicates(keep=False)

In [22]:
medData.shape

(1732, 8)

In [23]:
entriesCount.append(medData.shape[0])

## High School

In [24]:
hsDataNew = pd.read_csv("../data/hsData.csv")
hsDataHot = pd.read_csv("../data/hsData2.csv")

In [25]:
hsDataNew.shape

(774, 8)

In [26]:
hsDataHot.shape

(805, 8)

In [27]:
hsData = pd.concat([hsDataHot, hsDataNew])
hsData.shape

(1579, 8)

In [28]:
hsData = hsData.drop_duplicates(keep=False)

In [29]:
hsData.shape

(1579, 8)

In [30]:
entriesCount.append(hsData.shape[0])

## Broadway

In [31]:
bwayDataNew = pd.read_csv("../data/bwayData.csv")
bwayDataNew2 = pd.read_csv("../data/bwayData3.csv")
bwayDataHot = pd.read_csv("../data/bwayData2.csv")

In [32]:
bwayDataNew.shape

(642, 8)

In [33]:
bwayDataHot.shape

(633, 8)

In [34]:
bwayDataNew2.shape

(629, 8)

In [35]:
bwayData = pd.concat([bwayDataHot, bwayDataNew, bwayDataNew2])
bwayData.shape

(1904, 8)

In [36]:
bwayData = bwayData.drop_duplicates(keep=False)

In [37]:
bwayData.shape

(1900, 8)

In [38]:
entriesCount.append(bwayData.shape[0])

## Pittsburgh

In [39]:
pghDataNew = pd.read_csv("../data/pghData.csv")
pghDataNew2 = pd.read_csv("../data/pghData3.csv")
pghDataHot = pd.read_csv("../data/pghData2.csv")

In [40]:
pghDataNew.shape

(624, 8)

In [41]:
pghDataNew2.shape

(606, 8)

In [42]:
pghDataHot.shape

(549, 8)

In [43]:
pghData = pd.concat([pghDataHot, pghDataNew, pghDataNew2])
pghData.shape

(1779, 8)

In [44]:
pghData = pghData.drop_duplicates(keep=False)

In [45]:
pghData.shape

(1775, 8)

In [46]:
entriesCount.append(pghData.shape[0])

## Rant

In [47]:
rantDataNew = pd.read_csv("../data/rantData.csv")
rantDataNew2 = pd.read_csv("../data/rantData3.csv")
rantDataHot = pd.read_csv("../data/rantData2.csv")

In [48]:
rantDataNew.shape

(936, 8)

In [49]:
rantDataNew2.shape

(934, 8)

In [50]:
rantDataHot.shape

(537, 8)

In [51]:
rantData = pd.concat([rantDataHot, rantDataNew, rantDataNew2])
rantData.shape

(2407, 8)

In [52]:
rantData = rantData.drop_duplicates(keep=False)

In [53]:
rantData.shape

(2395, 8)

In [54]:
entriesCount.append(rantData.shape[0])

## CS Career Questions

In [55]:
ccqDataNew = pd.read_csv("../data/ccqData.csv")
ccqDataHot = pd.read_csv("../data/ccqData2.csv")

In [56]:
ccqDataNew.shape

(990, 8)

In [57]:
ccqDataHot.shape

(581, 8)

In [58]:
ccqData = pd.concat([ccqDataHot, ccqDataNew])
ccqData.shape

(1571, 8)

In [59]:
ccqData = ccqData.drop_duplicates(keep=False)

In [60]:
ccqData.shape

(1565, 8)

In [61]:
entriesCount.append(ccqData.shape[0])

## Anime

In [62]:
animeDataNew = pd.read_csv("../data/animeData.csv")
animeDataNew2 = pd.read_csv("../data/animeData3.csv")
animeDataHot = pd.read_csv("../data/animeData2.csv")

In [63]:
animeDataNew.shape

(761, 8)

In [64]:
animeDataNew2.shape

(593, 8)

In [65]:
animeDataHot.shape

(194, 8)

In [66]:
animeData = pd.concat([animeDataHot, animeDataNew, animeDataNew2])
animeData.shape

(1548, 8)

In [67]:
animeData = animeData.drop_duplicates(keep=False)

In [68]:
animeData.shape

(1548, 8)

In [69]:
entriesCount.append(animeData.shape[0])

## Explain Like I'm Five

In [70]:
elifDataNew = pd.read_csv("../data/elifData.csv")
elifDataNew2 = pd.read_csv("../data/elifData3.csv")
elifDataNew3 = pd.read_csv("../data/elifData4.csv")
elifDataHot = pd.read_csv("../data/elifData2.csv")

In [71]:
elifDataNew.shape

(588, 8)

In [72]:
elifDataNew2.shape

(589, 8)

In [73]:
elifDataNew3.shape

(594, 8)

In [74]:
elifDataHot.shape

(316, 8)

In [75]:
elifData = pd.concat([elifDataHot, elifDataNew, elifDataNew2, elifDataNew3])
elifData.shape

(2087, 8)

In [76]:
elifData = elifData.drop_duplicates(keep=False)

In [77]:
elifData.shape

(2077, 8)

In [78]:
entriesCount.append(elifData.shape[0])

## College

In [79]:
collegeDataNew = pd.read_csv("../data/collegeData.csv")
collegeDataHot = pd.read_csv("../data/collegeData2.csv")

In [80]:
collegeDataNew.shape

(934, 8)

In [81]:
collegeDataHot.shape

(749, 8)

In [82]:
collegeData = pd.concat([collegeDataHot, collegeDataNew])
collegeData.shape

(1683, 8)

In [83]:
collegeData = collegeData.drop_duplicates(keep=False)

In [84]:
collegeData.shape

(1683, 8)

In [85]:
entriesCount.append(collegeData.shape[0])

## NBA

In [86]:
sportsDataNew = pd.read_csv("../data/sportsData1.csv")
sportsDataNew2 = pd.read_csv("../data/sportsData3.csv")
sportsDataHot = pd.read_csv("../data/sportsData2.csv")

In [87]:
sportsDataNew.shape

(649, 8)

In [88]:
sportsDataHot.shape

(511, 8)

In [89]:
sportsDataNew2.shape

(649, 8)

In [90]:
sportsData = pd.concat([sportsDataHot, sportsDataNew, sportsDataNew2])
sportsData.shape

(1809, 8)

In [91]:
sportsData = sportsData.drop_duplicates(keep=False)

In [92]:
sportsData.shape

(1809, 8)

In [93]:
entriesCount.append(sportsData.shape[0])

## Cryptocurrency

In [94]:
cryptoDataNew = pd.read_csv("../data/cryptoData.csv")
cryptoDataNew2 = pd.read_csv("../data/cryptoData3.csv")
cryptoDataNew3 = pd.read_csv("../data/cryptoData4.csv")
cryptoDataHot = pd.read_csv("../data/cryptoData2.csv")

In [95]:
cryptoDataNew.shape

(323, 8)

In [96]:
cryptoDataHot.shape

(309, 8)

In [97]:
cryptoDataNew2.shape

(445, 8)

In [98]:
cryptoDataNew3.shape

(449, 8)

In [99]:
cryptoData = pd.concat([cryptoDataHot, cryptoDataNew, cryptoDataNew2, cryptoDataNew3])
cryptoData.shape

(1526, 8)

In [100]:
cryptoData = cryptoData.drop_duplicates(keep=False)

In [101]:
cryptoData.shape

(1526, 8)

In [102]:
entriesCount.append(cryptoData.shape[0])

## Lawyer Talk

In [103]:
lawyerDataNew = pd.read_csv("../data/lawyerData.csv")
lawyerDataHot = pd.read_csv("../data/lawyerData2.csv")

In [104]:
lawyerDataNew.shape

(719, 8)

In [105]:
lawyerDataHot.shape

(799, 8)

In [106]:
lawyerData = pd.concat([lawyerDataHot, lawyerDataNew])
lawyerData.shape

(1518, 8)

In [107]:
lawyerData = lawyerData.drop_duplicates(keep=False)

In [108]:
lawyerData.shape

(1518, 8)

In [109]:
entriesCount.append(lawyerData.shape[0])

## Gaming

In [110]:
gamingDataNew = pd.read_csv("../data/gamingData.csv")
gamingDataNew2 = pd.read_csv("../data/gamingData3.csv")
gamingDataNew3 = pd.read_csv("../data/gamingData4.csv")
gamingDataNew4 = pd.read_csv("../data/gamingData5.csv")
gamingDataHot = pd.read_csv("../data/gamingData2.csv")
gamingDataHot2 = pd.read_csv("../data/gamingData6.csv")

In [111]:
gamingDataNew.shape

(463, 8)

In [112]:
gamingDataNew2.shape

(368, 8)

In [113]:
gamingDataNew3.shape

(368, 8)

In [114]:
gamingDataNew4.shape

(371, 8)

In [115]:
gamingDataHot.shape

(89, 8)

In [116]:
gamingData = pd.concat([gamingDataHot, gamingDataHot2, gamingDataNew, gamingDataNew2, gamingDataNew3, gamingDataNew4])
gamingData.shape

(1742, 8)

In [117]:
gamingData = gamingData.drop_duplicates(keep=False)

In [118]:
gamingData.shape

(1542, 8)

In [119]:
entriesCount.append(gamingData.shape[0])

### Notes

Some subreddits have more csv files than the usual two or three because the initial queries did not return enough posts. As discussed below, I decided that every final dataframe would have 1500 rows. Some subreddits did not have enough posts to reach 1500 rows, so I went back to the dataCollection notebook and created additional csv files with more posts. gamingData gave me the most trouble, which I suspect is because many posts on that subreddit contain only images or videos in the main post.

## Calculations

In [120]:
# Show all of the counts
entriesCount

[1962,
 1670,
 1732,
 1579,
 1900,
 1775,
 2395,
 1565,
 1548,
 2077,
 1683,
 1809,
 1526,
 1518,
 1542]

In [121]:
# Ensuring that all 15 subreddits are in the list
len(entriesCount)

15

In [122]:
# Finding the minimum to see where all of the dataframes can be cut off to be the same size
min(entriesCount)

1518
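To see which subreddit sets the limit, the counts can be paired with labels. The sketch below reuses the counts printed above; the `names` list is my own labeling based on the section order of this notebook, and rounding the minimum down to the nearest hundred is just one way to arrive at the 1500 used in the next section (the notebook does not say how that number was picked).

```python
# Counts copied from the output above, in notebook order
entriesCount = [1962, 1670, 1732, 1579, 1900, 1775, 2395, 1565,
                1548, 2077, 1683, 1809, 1526, 1518, 1542]

# Labels assumed from the section order of this notebook
names = ["legal", "adulting", "medicine", "high school", "broadway",
         "pittsburgh", "rant", "cs career", "anime", "eli5", "college",
         "nba", "crypto", "lawyer", "gaming"]

counts = dict(zip(names, entriesCount))

# Subreddit with the fewest rows after combining its files
smallest = min(counts, key=counts.get)

# Round the minimum down to the nearest hundred to get a common cutoff
cutoff = (counts[smallest] // 100) * 100

print(smallest, counts[smallest], cutoff)  # lawyer 1518 1500
```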

## Dropping Entries

In this part, I drop posts from the dataframes so that they all have 1500 rows.

In [123]:
legalData = legalData[:1500]

In [124]:
adData = adData[:1500]

In [125]:
medData = medData[:1500]

In [126]:
hsData = hsData[:1500]

In [127]:
bwayData = bwayData[:1500]

In [128]:
pghData = pghData[:1500]

In [129]:
rantData = rantData[:1500]

In [130]:
ccqData = ccqData[:1500]

In [131]:
animeData = animeData[:1500]

In [132]:
elifData = elifData[:1500]

In [133]:
collegeData = collegeData[:1500]

In [134]:
sportsData = sportsData[:1500]

In [135]:
cryptoData = cryptoData[:1500]

In [136]:
lawyerData = lawyerData[:1500]

In [137]:
gamingData = gamingData[:1500]
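The fifteen slicing cells above could also be written as a single loop over a dictionary of dataframes. This is a sketch under the assumption that the frames are collected in a dict; the `frames` dict and its toy contents are placeholders for the real data, and `ROW_LIMIT` is a name introduced here for illustration.

```python
import pandas as pd

ROW_LIMIT = 1500  # the common size chosen in this notebook

# Placeholder frames standing in for legalData, lawyerData, and so on
frames = {
    "legal": pd.DataFrame({"Id": range(1962)}),
    "lawyer": pd.DataFrame({"Id": range(1518)}),
}

# .iloc[:ROW_LIMIT] keeps the first 1500 rows by position, like df[:1500]
trimmed = {name: df.iloc[:ROW_LIMIT] for name, df in frames.items()}

for name, df in trimmed.items():
    print(name, df.shape)
```

Note that slicing keeps rows in concatenation order, so the 'hot' posts (concatenated first) are always retained and only the tail of the 'new' posts is cut. If a random subset were preferred, `df.sample(n=ROW_LIMIT, random_state=0)` would be one alternative.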

## Cleaning Up

Reading in the data produced an extra `Unnamed: 0` column. I remove this column from all of the dataframes.

In [138]:
gamingData = gamingData.drop('Unnamed: 0', axis=1)

In [139]:
lawyerData = lawyerData.drop('Unnamed: 0', axis=1)

In [140]:
cryptoData = cryptoData.drop('Unnamed: 0', axis=1)

In [141]:
sportsData = sportsData.drop('Unnamed: 0', axis=1)

In [142]:
collegeData = collegeData.drop('Unnamed: 0', axis=1)

In [143]:
elifData = elifData.drop('Unnamed: 0', axis=1)

In [144]:
animeData = animeData.drop('Unnamed: 0', axis=1)

In [145]:
ccqData = ccqData.drop('Unnamed: 0', axis=1)

In [146]:
rantData = rantData.drop('Unnamed: 0', axis=1)

In [147]:
pghData = pghData.drop('Unnamed: 0', axis=1)

In [148]:
bwayData = bwayData.drop('Unnamed: 0', axis=1)

In [149]:
hsData = hsData.drop('Unnamed: 0', axis=1)

In [150]:
medData = medData.drop('Unnamed: 0', axis=1)

In [151]:
adData = adData.drop('Unnamed: 0', axis=1)

In [152]:
legalData = legalData.drop('Unnamed: 0', axis=1)
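Since the same `drop` call is repeated for fifteen separately named variables, a loop over a dictionary is less prone to naming slips. A minimal sketch with toy frames (the `frames` dict is a placeholder for the real dataframes):

```python
import pandas as pd

# Toy frames standing in for the real ones; only one still has the extra column
frames = {
    "gaming": pd.DataFrame({"Unnamed: 0": [0, 1], "Title": ["a", "b"]}),
    "lawyer": pd.DataFrame({"Title": ["c", "d"]}),
}

# errors="ignore" skips frames where the column is already gone
cleaned = {
    name: df.drop(columns="Unnamed: 0", errors="ignore")
    for name, df in frames.items()
}

for name, df in cleaned.items():
    print(name, list(df.columns))
```

Using `errors="ignore"` also makes the cell safe to re-run, because dropping a column that no longer exists would otherwise raise a `KeyError`.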

## Final Dataframes

Display a sample of each dataframe to check that the columns and sizes look correct.

In [153]:
gamingData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
42,Does anyone even like motion blur?,11u1930,Why is it the default? I do not know a single ...,postALEXpress,97,44,0.75
131,Dom decided to take a vacation,11tg2kk,"I decided to replay the gears of War series, ...",herpderpomygerp,3,0,0.42
103,What are some good open eorld driving games to...,11tyztm,Apart frim FH4 and FH5.,isti44,13,1,0.54
280,"Searching for: a game that uses the ""New Game ...",11qiun9,Hi people! I was thinking on playing a game th...,NastyBastard2000,23,6,0.63
295,Need some recommendations...,11pw0p0,"Anyone know of any good space shooters, or gam...",OddUnion6902,19,8,0.84
66,PS5,11uovkv,I've been gifted a PS5 for my birthday 😀 and w...,Wulfbayne1066,5,0,0.35
79,Alien: Isolation - On sale for $7.99 and I’m p...,11uo35i,Any opinions will be welcomed.,X---VIPER---X,132,61,0.77
3,[OC] MMO Region name generator,11w43sl,[https://codesandbox.io/s/mmo-region-generator...,isospeedrix,0,2,0.67
254,Is offline single player games dying because o...,11qlrs7,So is offline single player games dying and wi...,Obvious_Ad5912,24,0,0.2
127,Free Talk Friday!,11tip8y,"Use this post to discuss life, post memes, or ...",AutoModerator,24,8,0.73


In [154]:
lawyerData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
98,Judge Says This In Hearing,112v7lu,I HAIL FROM THE GREAT STATE OF GEORGIA!\n\n\nO...,Madre-says,34,17,0.73
600,Stay or Move On,10fbzbg,\n\nI am an claims associate at a medium size...,Culturalistic,7,3,0.71
449,Am I screwed?,107734i,3.8 GPA graduating. 159 as my only LSAT taken....,msteel2015,15,0,0.22
649,What would you say about an attorney who has o...,10a9g9q,Upd. Just checking if there would be a recomme...,Erinrin13,53,16,0.68
327,Voluminous Records on Appeal,116lnme,I am in a position where many of my cases' rec...,terwen1400,3,6,0.88
695,Anyone use an iPad for your work,106zj6d,Any attorneys here using iPads for their work?...,BLParks12,19,7,1.0
259,Real-life utilization rates,11bnfzb,"Hi everyone, so I’ve been doing research into ...",sergeinfreiman,2,0,0.25
376,Realizing this Area of Law is Not for Me,111nvv7,Hey! I’m joining this community on a dummy acc...,JuggernautMediocre86,6,1,0.56
44,Advice Needed,11siaoz,I have 7 years of experience in property taxes...,Free-will_Illusion,0,0,0.4
354,Transitioning from Litigation to Transactional...,1132mqw,"I practice commercial litigation, with about 1...",Satories,4,2,0.67


In [155]:
cryptoData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of Upvotes,Ratio of Upvotes
248,Why you're witnessing history with crypto like...,11tl7m4,The parallel between crypto and the early stag...,Socialinfluencing,151,49,0.73
305,How to make it simple for non believers?,1172kqx,"Hi fellow kids, *puts down skateboard*\n\nEver...",SleemanAbad,73,0,0.42
272,Belgian Politician (and member of the EU parli...,11u1tqa,"This morning, a Belgian politician from the NV...",SnowyMountain__,55,22,0.84
362,Arbitrum Airdrop Check Your Eligibility,11tj198,"Hey guys, the much awaited Arbitrum Airdrop th...",TheOtherCoolCat,50,10,0.78
325,"""Sayonara"" Bitcoin has officially dead the 473...",11tuw47,"On 14 March 2023 Bitcoin died 473rd time, Robi...",mesutdmn,122,152,0.77
225,Recommend me math papers that could apply for ...,11uplvl,As probably most of you I read news about a to...,Saschb2b,11,2,0.63
191,"Don't check last ATH price, check the market cap!",11uwqtx,"I read the post "" **Now that we are on the ups...",mishaog,81,44,0.74
430,"Europol Cracks Down on Another Coin Mixer, Sei...",11spg1e,The operation is being touted as one of the la...,o_LUCIFER_o,6,3,0.71
96,Euphoria is bad for investing,11vic0i,We have had several green weeks lately. Return...,Aseira,92,17,0.66
110,...and now i know why bitcoin maxi's exist.,11uwi6m,back in 2021 i fell for the hype and bought a ...,Novel-Counter-8093,89,19,0.65


In [156]:
sportsData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
119,George Karl on why he doesn't think Joel Embii...,11vtd66,"\n\n> ""I don't want to bad mouth Embiid becaus...",Wonderful-Balance711,758,2098,0.84
405,Nikola Jokic tonight to snap the Nugget’s losi...,11tcyqn,[Source](https://www.espn.com/nba/boxscore/_/g...,Michael_B_Lopez,634,3256,0.89
507,Was the Jrue Holiday trade for the Bucks one o...,11swod5,"On November 24, 2020, Jrue Holiday was traded ...",Disastrous_Day_2166,97,242,0.91
218,The Mavericks collected Lukaless road wins ove...,11vplpl,The Mavs played every Pacific Division team on...,BayonettaBasher,11,48,0.82
446,GAME THREAD: Washington Wizards (32-37) @ Clev...,11u6224,##General Information\n **TIME** |**MED...,NBA_MOD,7,12,0.8
382,In last 10 clutch games Wolves’ opponents are ...,11udo85,"Including against Mavericks (6/11), Wizards (5...",PlayInChampions,11,30,0.86
109,Jalen Green tonight: 40 points on 11/22 shooti...,11w4fwz,Jalen had a big all around game with 27/6/7 in...,Few_Mulberry5372,181,846,0.97
500,The Kings have the best record in the NBA post...,11tgcce,Their only losses come to the TWolves and the ...,InsaneCookies21,57,757,0.95
503,[Azarly] Kawhi when asked about his 10-43 star...,11tfyy8,https://twitter.com/TomerAzarly/status/1636468...,MVPG2022,42,359,0.96
388,We are around 70 games into the season and the...,11ucywf,This could change tonight if the Mavs beat the...,Ted_Dance_Son,20,50,0.83


In [157]:
collegeData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
17,Have you ever felt guilty for giving yourself ...,11usg27,Lately I’ve been having this feeling like the ...,Left-Strawberry-1725,23,172,0.99
529,Advice on Pursuing my Bachelor's Degree,112httu,"Hi all,\n\nThis year, I'll be receiving my Ass...",sauceonmynips,1,1,1.0
467,Anyone here using a second brain app for note-...,11mpwgy,I know folks like to use all sorts of differen...,AliveFault,3,1,1.0
646,Humanities vs STEM,1105o2k,My dad has wanted me to enter the STEM field e...,_Saini_,20,18,0.9
532,Academic accommodations,112g1th,I’m in a bad situation with college in my firs...,_KingDawg72_,1,0,0.5
245,Pre-Vet undergrad,11r8qie,Hi! I am super interested in becoming a vet an...,SchnauzerLover55,2,1,1.0
173,Go back to school?,117xjvi,Hey guys! I’m 24 years old and I have a pretty...,MhmSomethingClever,5,3,0.81
38,I cannot decide what to major in,11a3a28,"As the title suggests, I cannot decide what to...",banxanamilk,1,0,0.5
663,Canvas quiz monitoring,10zso15,"Hi, I just took an online quiz on Canvas which...",ThisResolve,6,7,0.85
224,Master's versus post graduate certificate,117fkpj,I'm at crossroads. I want to specialize in cop...,NidhiOnATree,2,1,1.0


In [158]:
elifData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
541,eli5 / why are zoomed lights in movies compose...,11jfsfn,I mean..in movies when there are a lot zoomed ...,giansolcia,9,1,0.57
345,ELI5: What is higher order bacteria? Does bact...,11of1co,Is there anywhere I can read more about bacter...,lunar1412,1,2,0.6
443,ELI5: How do plants know when to branch?,11m91xt,"What causes a plant to branch, and why are som...",Smitttycakes,4,6,0.65
251,"ELI5: What is the ""Presence Range"" in audio",11qii25,"What exactly does it mean, and how does it the...",El_Cowboyz,3,8,0.68
75,ELI5 - How do all the recent “rent to own” hou...,11u70vv,There has been an influx of scam posts all acr...,AlertSanity,11,6,0.64
90,ELI5: Is there any confirmed explanation of ho...,11txwe0,"OK, so further explanation of my question. I k...",Fisherman_2727,167,229,0.75
162,ELI5 How does a Gurney flap in F1 cars work,11t1dnf,How does a Gurney flap in F1 cars work,theNthd0ct0R,0,0,0.5
40,ELI5 what causes baby fever?,11v495y,Why do a majority of people experience baby fe...,MothersMiIk,6,0,0.33
57,ELI5: How does ice make things cold?,11uohj3,"If ice has atoms which are not moving as much,...",JustTransportation51,10,0,0.29
563,eli5: Doesn’t chaos theory just prove we lack ...,11iqqtk,I don’t understand this concept of “chaos” in ...,justmikewilldo,14,5,0.62


In [159]:
animeData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
51,Blue Lock - Episode 23 discussion,11utvjx,"*Blue Lock*, episode 23\n\n\n\n# [Rate this ep...",AutoLovepon,434,1980,0.97
155,Intense action anime that’s mature (Seinen) wi...,119158i,Just need a good old anime action series \nI l...,CornbeefandRicee,10,0,0.5
394,"Best OP/ED of 2022 AnimeBracket: Round 1, Group D",11mx7g8,Check out the themes on [animethemes.moe](htt...,Wuff_the_Dog,49,50,0.91
503,How Do I Get My GF Into Anime???,115k0dd,She refuses to watch even the most main stream...,Sea-Technician-119,52,0,0.21
55,is “you’re under arrest” anime a full adaptati...,11uskbh,does anyone know if it adapted the manga fully?,sxleepy,11,4,0.65
417,Fumetsu no Anata e Season 2 - Episode 17 discu...,116acd7,"*Fumetsu no Anata e Season 2*, episode 17\n\nA...",AutoLovepon,307,1177,0.98
95,What's an anime song where you only recognize ...,11ubuq8,"For example, when I sing the very first openin...",SuperAlloyBerserker,17,0,0.43
526,Fumetsu no Anata e Season 2 - Episode 19 discu...,11ix8kj,"*Fumetsu no Anata e Season 2*, episode 19\n\nA...",AutoLovepon,317,1112,0.98
427,[NyanWatch] Non Non Biyori - Episode 10 Discus...,11lyhyr,# **Episode 10 - [We Watched the First Sunrise...,ChonkyOdango,83,74,0.92
379,"Casual Discussion Fridays - Week of March 10, ...",11n8ews,This is a weekly thread to get to know /r/anim...,AutoModerator,9721,71,0.93


In [160]:
ccqData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
28,CS job in Canada or US as international?,11vem6v,I am still a junior in a college but looking a...,Antique-Wrongdoer-15,15,0,0.5
243,I Have no Idea How to Conduct Myself as a Prof...,11sg5l8,Long story short here's what happened to me to...,TheChanceToBeAlive,15,2,0.57
90,"If am from a non-CS background, is survival in...",11ue3v8,I graduated from Biotech from an IIT. Was into...,Natural-Suspect8881,0,4,0.7
408,"15 yo from jordan, wondering what's the next step",11r2hln,"Hey there, I've been lurking around the whole ...",zezo_idrees,12,0,0.36
278,What is Everyone’s Most Ideal Third-Party Recr...,11s625n,"Hey everyone,\n\nI’m curious for everyone’s op...",anonyyy69420,1,1,0.6
82,Should I still pursue this career and become a...,11v17d4,"Hello all,\n\nI will probably get downvoted fo...",Status_Marsupial6265,19,0,0.42
134,"Nearly 1000 applications, where am I going wro...",11ty5k7,Brief summary: I'm going to graduate in May th...,ManWhoWantsToLearn,74,43,0.84
132,"What problems do you encounter at your job, da...",11ts4st,Wanna help me brainstorm problems in workplace...,EnvironmentalData287,3,3,0.64
726,Where do older software engineers go?,11mpzvx,I'm a a bit concerned to go into computer scie...,Zane2156,287,273,0.89
427,"Got my first role, what should I do to make my...",11qntsd,"Hello, some background I am doing a CS degree ...",Immediate-Ad1653,4,1,0.67


In [161]:
rantData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
138,I think i’ll never be enough or do enough for ...,11tbatg,Everyone always complains about what i don’t d...,Business_Relative_55,3,4,1.0
4,Don’t make kids if you can’t afford taking the...,11whfdy,I’m 25 in a month and this year was supposed t...,Timeishere58,0,0,0.5
246,I'm really worried about rightwing/conservativ...,11r9zir,And they're getting worse and worse and worse ...,Trash_man_can,3,10,0.74
19,I wish that I had never met you.,11w9b1c,I wish I didn't even know you existed. I don't...,lifeisntdaisies,3,7,0.77
137,"So I was using a hotel a few weeks ago, I used...",11tbitf,As it says. I was staying in another city so i...,BodybuilderChris2023,2,2,1.0
290,web designers who prevent you pasting in compl...,11qkws8,"ETA: instead of the ""designers"" I should have...",ButtercupsUncle,5,12,0.99
194,I get personally offended when people say the ...,11sb702,I don't have any special needs. But I struggle...,Affectionate_Hat494,2,3,0.8
811,I accidentally ate something that had my food ...,11esnyh,I projectile vomited 3-4 times and threw away ...,SnooMaps6193,0,2,1.0
694,My molester is moving back in,11hly0s,I am stuck living with my parents as I cannot ...,7Stargazer77,28,21,0.87
913,The way Reddit handles Loli content is ridiculous,11ctee5,I was scrolling through an anime art subreddit...,KingTheSleepyKing,2,5,0.73


In [162]:
pghData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
191,Does anyone else notice that the tap water sme...,11459g1,"For context, we're in Bethel Park. Is this due...",byzvntine,24,0,0.2
216,where can I get a custom car paint job done?,113btli,i’m thinking about getting my car repainted an...,WTF__Steve,7,0,0.46
333,"second try: Toynbee resurrected, the re-tiling?",10zqtkm,"Hi folks, \nDisclaimer: I apologize, I'm not...",StevInPitt,9,8,0.74
383,Mr. Smalls Theatre parking,11illw4,Have never been to this venue and google doesn...,SquintyB,7,0,0.31
528,Living,11bw3io,So me and my girlfriend this summer are lookin...,tsr112,20,0,0.36
261,Help finding tattoo artist,11mwuwq,Hey my wife and I are looking for a tattoo art...,SteelCityChip,9,0,0.38
241,Hello everyone!,112wfw1,I am looking for someone who’s able to build a...,DariusTheChaotic,6,0,0.31
172,ISO Part time office,11qgdlp,I’m struggling to find what I want on Craigsli...,Outrageous-Copy29,8,0,0.5
104,Is this an easy fix or a sign of bigger problems?,11777uu,My brother just sent me this:\n\nThe issue is ...,whataworld2020,11,2,0.53
198,Max and Erma’s - Cranberry Township Location. ...,113x6ix,I just received the email and I enjoyed this l...,DarkLuc1d1ty,127,85,0.9


In [163]:
bwayData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
463,Why is the musical theater industry so centere...,10qo41h,I know New York is the epicenter and it makes ...,MembersClubs,23,0,0.21
13,a review of the shows I saw on Broadway includ...,11v99is,Long time Broadway lover and lurker of this pa...,katie_vk,3,14,0.89
22,Lottery odds - one ticket vs two?,119zt2o,"As I’m sure many of you saw, they just opened ...",Yikes_Brigade,14,9,0.8
571,Any intel on when The Notebook is transferring...,10lsxu4,"I saw it in Chicago, and it was so good! Surpr...",Abm0506,4,8,0.91
608,Piano Lesson,10k4zdm,With this being the final week of The Piano Le...,DistributionDry8301,10,3,1.0
29,Jukebox musical audiences,11v4ekj,"In your experience, are jukebox musical audien...",aaes12,8,1,0.67
538,On this the International Holocaust Remembranc...,10mu64i,"&#x200B;\n\n[ Fiddler on the Roof, National Yi...",MikermanS,3,44,0.98
481,Tier List Just Because,11hdvcn,&#x200B;\n\nhttps://preview.redd.it/ibnxyb7v2l...,Careless-Will6982,0,0,0.33
176,Blumenthal in Charlotte NC announcing season M...,11qpzqc,Leading with Moulin Rouge!,moonbee1010,1,0,0.5
474,"Song Suggestions? Meta-Musicals ""Cabaret"" at m...",11h98qf,Hi everyone! My college is hosting a cabaret (...,flowerfox69,7,0,0.5


In [164]:
hsData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
269,Math specific requirements for EU unis,11icfdy,Is Canadian Calculus and Vectors (non-AP optio...,depressingfridge,0,1,1.0
299,Please help me decide which classes I should t...,10xdu3h,"I’m a junior who has taken 2 years of math, bu...",Tammisthrowaway2147,2,3,1.0
308,FREE CHEGG DISCORD,11frqir,"This chegg server is awesome, instant replies ...",Einsight22,2,0,0.33
287,Menstrual and Sexual Health Education in High ...,10xyg3a,Hi guys!\n\nI am a college student working on ...,leedle2135,0,1,1.0
720,I made a website where you can manage your hig...,10udvpk,"You can keep track of your courses, check the ...",111The1The111,12,15,0.83
130,Take a 4th year of math?,1158umu,High School Advice\n\nI’m currently a junior i...,ethanxdeleon,1,1,1.0
187,Foreign languages class help,112p3pl,"Hi, I'm a junior and I need to know how to get...",Jolly-Mistake-107,3,3,0.8
423,High school sports,11945dh,So I'm in high school right I'm a freshman pla...,Stanchpumpkin,7,3,0.71
686,Time Management w/ Hobbies,10wiri0,I feel like all I’ve done recently is work on ...,mercifie,1,1,1.0
696,I got THE ONE GUY I DIDN'T WANT FOR SEMESTER 2,10vrgy3,So my semester 2 is not that bad to be totally...,MrWingoTingo,1,4,1.0


In [165]:
medData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
672,Classic Reddit: “Some of the dumbest people I'...,zsauqs,,lovelydayfortoast,212,325,0.86
464,I’m currently interviewing for a new job in a ...,10hif6m,Thoughts appreciated. I’m in a small to averag...,weebvibes99,7,6,0.8
933,Are there non-arrhythmic reasons to give Amiod...,yxdcox,Hi all! Hoping someone can help put my mind at...,annabethtf13,66,36,0.91
204,Looking for additional information on mayo ECM...,117pxwl,https://www.jems.com/patient-care/mayo-clinic-...,Mitthrawnuruo,25,8,0.72
764,ChatGPT manages Acute MI,zhb5ba,"Like everyone else on the internet, I've been ...",c3fepime,56,138,0.97
14,Thank you’s using internal hospital email,118mgem,My wife and I had to take our son to the ED at...,readitonreddit34,63,155,0.97
333,Delay in Time to Antibiotics for De Novo Inpat...,100jw9b,With all the fuss made around getting neutrope...,roccmyworld,45,82,0.82
217,Doctors At 2 Allina Health Campuses Seek Union...,114jhmu,,Ejdubs,40,464,0.99
148,Spanish for Healthcare Professionals Courses -...,11dvd9w,I have recently been looking into getting my c...,phidelt649,38,56,0.91
853,Assembly Bill (AB) 1278,z7jnn9,Got this email from Medical Board of Californ...,sanarezai,2,3,1.0


In [166]:
adData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
510,What do I do if I lose my social security card?,1194kqx,"I moved out of my house for good last year, an...",Ok-Chemical-2275,3,2,1.0
364,Feel as if I’m constantly being scammed,11dy4a8,I’ve grown a little bit wiser now that I’m alm...,BrianArmstro,182,336,0.96
384,can I send priority mail from my home?,11edi70,I have some important documents to send ASAP t...,AcatSkates,3,2,1.0
644,Getting an autism report from GP?,1132aj5,"Hi, I know this isn't the best subreddit to as...",alasfinallyaname,5,2,0.75
127,"25 going on 26, what the fuck am I doing",115vuio,I am in a career I don’t know where I want to ...,doodoobopbop,4,9,1.0
104,What would you tell your younger self to help ...,11rhkz6,From a teen who wants to be ready,Nerdy-AND-Cool,7,2,0.75
450,Jury duty?,11bfbk5,"I was summoned for jury duty last year in May,...",somuchregretti,13,12,0.81
172,"Those of you who get paid hourly, do you have ...",11o46fc,Vice versa. Do those with a salary get less be...,712588Kf,7,6,1.0
78,Can't seem to find balance in my life.,117s70n,I sleep around 2am - 230am and wake up at 730a...,jaeha99,85,96,0.95
469,How to manage stress when you have no choice b...,10pv7ub,All the stress management advice on the intern...,shanwithareddit,55,105,0.98


In [167]:
legalData.sample(10)

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
261,I am in a video game development team that use...,11v5t3g,"Basically, the game is an alternate history sp...",PolarisStar05,0,2,0.67
667,Are legal jurisdictions in contracts enforceab...,11twaqo,If a vehicle was purchased in County A and pur...,Fit-Pitch-9550,1,1,1.0
736,My job is threatening to fire me over a raise,11twasu,Hello to all the people here. I right now work...,ColinTheBossReal,3,0,0.25
14,Oregon. Got “secret admirer” texts. Found out ...,11vdf9x,"For the last few days, I have been getting tex...",ChinUpNoseDown,23,73,0.9
382,Monterey park CA,11ut7gl,Accused of Robery during sale of a car\n\nRefu...,Individual_Trade394,12,0,0.36
792,Employer told me I had to stay at work after c...,11te209,I work at a veterinary office in Utah (USA). W...,Savings_Telephone_42,3,2,0.75
43,Security Deposit from renting,11vrvg9,So I lived in an apartment for a little over 4...,T-rev3,5,2,1.0
363,Medical Malpractice / Fuddled healthcare turns...,11v2xuw,I am wondering if anyone has any information o...,SonaraSounds,10,0,0.38
369,min hour contracts,11uv630,My employer has me on minimum hour contract. I...,Obvious_Owl_4349,6,0,0.33
337,min hour contracts,11uv630,My employer has me on minimum hour contract. I...,Obvious_Owl_4349,6,0,0.33
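The last two rows of this sample are the same post (Id `11uv630`) under two different index values, so some duplicates survived the earlier `drop_duplicates` calls. Those calls compare entire rows, and two copies of a post that differ in any column (likely the `Unnamed: 0` column here, or a vote count that changed between queries) are not treated as duplicates. One possible remedy, assuming `Id` uniquely identifies a post, is to de-duplicate on that column alone. The toy data below is made up to mirror this situation.

```python
import pandas as pd

# Toy rows: the same post collected twice with a different saved index
combined = pd.DataFrame({
    "Unnamed: 0": [337, 369, 14],
    "Id": ["11uv630", "11uv630", "11vdf9x"],
    "Title": ["min hour contracts", "min hour contracts", "another post"],
})

# Whole-row comparison sees three distinct rows
byRow = combined.drop_duplicates()

# Comparing on Id alone keeps one row per post
byId = combined.drop_duplicates(subset="Id", keep="first")

print(len(byRow), len(byId))  # 3 2
```

Applying this to the real data would change the counts reported in the sections above, so the 1500-row cutoff would need to be rechecked.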


## New CSV Files

In [168]:
# Writing each final dataframe to a new csv file in the notebook's working directory
legalData.to_csv('finalLegalData.csv')
adData.to_csv('finalAdData.csv')
medData.to_csv('finalMedData.csv')
hsData.to_csv('finalHsData.csv')
bwayData.to_csv('finalBwayData.csv')
pghData.to_csv('finalPghData.csv')
rantData.to_csv('finalRantData.csv')
ccqData.to_csv('finalCcqData.csv')
animeData.to_csv('finalAnimeData.csv')
elifData.to_csv('finalElifData.csv')
collegeData.to_csv('finalCollegeData.csv')
sportsData.to_csv('finalSportsData.csv')
cryptoData.to_csv('finalCryptoData.csv')
lawyerData.to_csv('finalLawyerData.csv')
gamingData.to_csv('finalGamingData.csv')
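By default, `to_csv` also writes the dataframe index as an unnamed first column, and that column comes back as `Unnamed: 0` when the file is read again with `read_csv`. If the column is not wanted in the final files, passing `index=False` leaves it out. The round trip below is a sketch that uses an in-memory buffer and toy data instead of the real files.

```python
import io
import pandas as pd

df = pd.DataFrame({"Title": ["a", "b"], "Id": ["x1", "x2"]})

# Default: the index is written and comes back as 'Unnamed: 0'
withIndex = io.StringIO()
df.to_csv(withIndex)
withIndex.seek(0)
colsWithIndex = list(pd.read_csv(withIndex).columns)
print(colsWithIndex)  # ['Unnamed: 0', 'Title', 'Id']

# index=False: only the data columns are written
withoutIndex = io.StringIO()
df.to_csv(withoutIndex, index=False)
withoutIndex.seek(0)
colsWithoutIndex = list(pd.read_csv(withoutIndex).columns)
print(colsWithoutIndex)  # ['Title', 'Id']
```

Whether to change the calls above depends on how later notebooks read these files; if they already drop or ignore the first column, switching to `index=False` would require adjusting them.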