# Reddit Data Collection
To collect data, we will use the Reddit API through PRAW (the Python Reddit API Wrapper), a library built for interacting with that API. The subreddits we will be looking at are:

- r/AskReddit
- r/Politics
- r/Soccer
- r/happy
- r/Music

We will pull a number of posts specifically to look at their comments, because we want to observe how conversations and discourse develop within an online forum.

In [7]:
import sys
!{sys.executable} -m pip install praw



#### Initializing packages
We will use PRAW, the `requests` package for simple interaction with the Reddit API, and pandas for data storage and manipulation.

In [8]:
import praw
from praw.models import MoreComments
import requests
import pandas as pd
client_id = 'Ec_e0japI4_xhi-yc7wbyw'

A secret key and password are read from files in a local directory:

In [9]:
with open('data/secret.txt', 'r') as f:
    secret_key = f.read().strip()  # drop any trailing newline left by the editor

In [10]:
auth_token = requests.auth.HTTPBasicAuth(client_id, secret_key)

In [11]:
with open('data/pw.txt', 'r') as f:
    pwd = f.read().strip()  # drop any trailing newline left by the editor

In [12]:
login_data = {
    'grant_type': 'password',
    'username': 'Primary-Ad3149',
    'password': pwd
}

Let's connect to Reddit via PRAW...

In [13]:
reddit = praw.Reddit(
    client_id=client_id,
    client_secret=secret_key,
    password=pwd,
    user_agent="Comment Extraction for Discourse Analysis (by u/Primary-Ad3149)",
    username="Primary-Ad3149",
)

and look at specific subreddits, starting with **AskReddit**:

## Investigating r/AskReddit

In [14]:
import inspect

In [15]:
askReddit = reddit.subreddit("AskReddit")
print(inspect.getmembers(askReddit))



There's a lot of unnecessary data here, so let's pull only what we need and insert it into a pandas dataframe. We'll start by pulling from rising posts of the subreddit, since they will likely have small discussions that will be easy to display/format.

In [69]:
# Get the information we're interested in: the post id, author, title, the text it contains, the number of comments, and the comments themselves.
# Iterate over the listing once so that every list describes the same posts in the same order
# (calling rising() once per field sends six separate requests, and the ranking can change between them).
id_askReddit, author_askReddit, title_askReddit, text_askReddit, numComments_askReddit, comments_askReddit = [], [], [], [], [], []
for x in askReddit.rising(limit=20):
    comments_askReddit.append(x.comments)
    id_askReddit.append(x.id)
    author_askReddit.append(x.author)
    title_askReddit.append(x.title)
    text_askReddit.append(x.selftext)
    numComments_askReddit.append(x.num_comments)


Convert this data into a dataframe:

In [70]:
askReddit_df = pd.DataFrame({'id':id_askReddit, 'author':author_askReddit, 'title':title_askReddit, 'text':text_askReddit, 'numComments':numComments_askReddit, 'comments':comments_askReddit})

We'll keep only threads with more than 10 comments, so that each remaining post has at least a small discussion to look at:

In [71]:
askReddit_df = askReddit_df[askReddit_df['numComments'] > 10]
askReddit_df = askReddit_df.set_index('id')
askReddit_df.head(10)

| id | author | title | text | numComments | comments |
|---|---|---|---|---|---|
| 11aqsxa | jlu7lilstrongst | (Serious) why did you quit your last job? | | 12 | (j9tf652, j9tfhu9, j9tff6n, j9tfk2r, j9tfu7h, ... |
| 11aqsc2 | plurBUDDHA | Needing feedback for a school project! How do ... | | 12 | (j9tfeuv, j9tfkfw, j9tfamk, j9tg5xh) |
| 11aqal2 | SolidAd6757 | what food is underrated? | | 88 | (j9tekur, j9tcc3o, j9td9a6, j9tepmf, j9tf0s4, ... |
| 11aqlzs | haighaighaig | What's the most embarrassing thing that happen... | | 29 | (j9te7d5, j9te55j, j9tf11w, j9tg6ul, j9tevqk, ... |
| 11apmgf | arielmeso | what if god gave you 1 question to ask him? | | 229 | (j9t9dcy, j9t8r62, j9t8x1o, j9taf66, j9t8u2g, ... |
| 11aq7lx | YuckBrusselSprouts | Has any politician ever been a bigger fail tha... | | 35 | (j9tbx2n, j9tc9e2, j9teyo1, j9tc0j5, j9tc9z0, ... |
| 11aqng3 | Man_in-Black_ | Atheists who promote their sex-positive views ... | | 13 | (j9te9p6, j9tegnk, j9tfdgl, j9teib7, j9teyyl, ... |
| 11aphjs | AggresiveKoala40 | What is the healthiest diet to follow? | | 21 | (j9t8h2w, j9t9u8y, j9t903q, j9t83tt, j9tasbl, ... |
| 11aqdrl | Ginger_Giant2002 | What is your favourite sex position, why? | | 28 | (j9tcu6w, j9tctga, j9tcy62, j9td3e2, j9tdgg3, ... |
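The same collect, filter and index steps are repeated for every subreddit below, so they could be folded into one function. This is a minimal sketch, assuming only that the object passed in has a `rising(limit=...)` method yielding submission-like objects (a PRAW `Subreddit` does); `collect_rising`, `FIELDS` and the offline `FakeSubreddit` are hypothetical names used for illustration, not part of PRAW.

```python
from types import SimpleNamespace

import pandas as pd

# DataFrame column name -> submission attribute it is read from
FIELDS = {"id": "id", "author": "author", "title": "title",
          "text": "selftext", "numComments": "num_comments", "comments": "comments"}


def collect_rising(subreddit, limit=20, min_comments=10):
    """Fetch rising posts in a single pass and keep those with more than min_comments comments."""
    rows = [{col: getattr(post, attr) for col, attr in FIELDS.items()}
            for post in subreddit.rising(limit=limit)]
    df = pd.DataFrame(rows, columns=list(FIELDS))
    df = df[df["numComments"] > min_comments]
    return df.set_index("id")


class FakeSubreddit:
    """Hypothetical offline stand-in that yields three fake posts."""

    def rising(self, limit):
        for i, n in enumerate([5, 12, 40][:limit]):
            yield SimpleNamespace(id=f"p{i}", author="someone", title=f"Post {i}",
                                  selftext="", num_comments=n, comments=None)


print(collect_rising(FakeSubreddit()))
```

In the notebook this could be called as, for example, `politics_df = collect_rising(politics)` or `happy_df = collect_rising(happy, limit=40)`. Reading every field from each post inside one loop keeps the columns aligned to the same posts.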


The `comments` field holds a PRAW `CommentForest`: the post's top-level comments, each of which has its own replies, which in turn have their own replies, and so on. Let's take a look at a single post's comments:

In [88]:
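for comment in askReddit_df.loc['11aphjs'].comments.list():
    # Skip "load more comments" placeholders, which have no body
    if isinstance(comment, MoreComments):
        continue
    # Using ____ to demarcate different comments
    print(comment.body + "\n _________")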

The Mediterranean diet is usually ranked at the top or close to it every year.
 _________
A balanced diet. You need a variety of foods (both plant-based and animal-based) to maintain optimal health and physical / cognitive performance. The 'healthiest diet' is one that excludes all damaging, unhealthy, inflammatory foods and at the same time includes lots of high-quality, properly-sourced, nutritious foods. Our body needs a specific quantity and ratio of nutrients to function properly and perform its metabolic functions. The vegan and carnivore diet are both dietary extremes. A nice diet that emphasizes nutrient density, digestibility, gut health, and at the same time excludes inflammatory, gut-irritating and immunogenic / immunostimulatory foods is the Paleo diet. Even more so, the autoimmune Paleo (AIP) diet. AIP is primarily targeted towards autoimmune patients who want to reverse / manage their autoimmune condition naturally. Both Paleo and autoimmune Paleo (AIP) are ancestral diet

Finally, let's export this to a CSV:

In [116]:
askReddit_df.to_csv('askReddit.csv')
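One caveat with this export: the `comments` column holds `CommentForest` objects rather than text, so the CSV cannot contain the comments themselves, only a string form of each forest. A minimal sketch for keeping the text is to write one row per comment instead. It assumes each comment exposes `id`, `parent_id` and `body` (PRAW `Comment` objects do); `flatten_comments`, `COLUMNS` and the column layout are hypothetical choices, and the demo uses `SimpleNamespace` stand-ins rather than real Reddit data.

```python
from types import SimpleNamespace

import pandas as pd

COLUMNS = ["post_id", "comment_id", "parent_id", "body"]


def flatten_comments(post_id, comments):
    """Return one DataFrame row per comment in a flat iterable of comments.

    Assumes each item has ``id``, ``parent_id`` and ``body`` attributes.
    Items without a ``body`` (placeholders such as "load more comments")
    are skipped.
    """
    rows = []
    for c in comments:
        body = getattr(c, "body", None)
        if body is None:
            continue  # placeholder with no text to store
        rows.append({"post_id": post_id, "comment_id": c.id,
                     "parent_id": c.parent_id, "body": body})
    return pd.DataFrame(rows, columns=COLUMNS)


# Offline demo with stand-in objects (hypothetical data, not from Reddit)
demo = [
    SimpleNamespace(id="c1", parent_id="t3_post1", body="Top-level comment"),
    SimpleNamespace(id="c2", parent_id="t1_c1", body="A reply"),
    SimpleNamespace(id="more"),  # no body, like a "load more comments" placeholder
]
print(flatten_comments("post1", demo))
```

With real data, calling `forest.replace_more(limit=0)` before `forest.list()` removes the `MoreComments` placeholders up front, and the rows for every post could be gathered with `pd.concat(flatten_comments(pid, forest.list()) for pid, forest in askReddit_df['comments'].items())` before calling `to_csv`.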

This data is challenging to work with because of the order in which PRAW flattens the comment tree: `CommentForest.list()` traverses it breadth-first, so comments come out in this order:  

top_level, top_level, ..., second_level, second_level, ... third_level  

Luckily, each comment has a `parent_id` that links it to its parent (another comment, or the post itself for top-level comments). PRAW is also [open-source on GitHub](https://github.com/praw-dev/praw/blob/5ee4b1820c2591117e32be45778372e7c03a5f56/praw/models/comment_forest.py#L83), so we can adapt its `list()` method to traverse the tree depth-first (DFS) instead:
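```python
from praw.models import MoreComments


def list_dfs(comm):
    """Return a depth-first (pre-order) list of all Comments in ``comm``.

    This list may contain :class:`.MoreComments` instances if
    :meth:`.replace_more` was not called first.
    """
    comments = []
    queue = list(comm)
    while queue:
        comment = queue.pop(0)
        comments.append(comment)
        if not isinstance(comment, MoreComments):
            # Put the replies at the front of the queue so each reply
            # (and its whole subtree) is visited before the next sibling
            queue[0:0] = comment.replies
    return comments
```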
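To see the difference in ordering without calling Reddit, here is a small offline sketch. `FakeComment` is a hypothetical stand-in with only an `id` and `replies`; `bfs_ids` follows the queue-based order of PRAW's `list()`, while `dfs_ids` produces the depth-first (pre-order) order that `list_dfs` aims for, using an explicit stack instead of inserting at the front of a list.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FakeComment:
    """Hypothetical stand-in for a PRAW Comment: just an id and its replies."""
    id: str
    replies: list = field(default_factory=list)


def bfs_ids(forest):
    """Breadth-first: all top-level comments, then all second-level ones, ..."""
    out, queue = [], deque(forest)
    while queue:
        c = queue.popleft()
        out.append(c.id)
        queue.extend(c.replies)
    return out


def dfs_ids(forest):
    """Depth-first pre-order: each reply thread is finished before the next sibling."""
    out, stack = [], list(reversed(forest))
    while stack:
        c = stack.pop()
        out.append(c.id)
        stack.extend(reversed(c.replies))  # reversed so the first reply is popped first
    return out


forest = [
    FakeComment("A", [FakeComment("A1", [FakeComment("A1a")]), FakeComment("A2")]),
    FakeComment("B", [FakeComment("B1")]),
]
print(bfs_ids(forest))  # ['A', 'B', 'A1', 'A2', 'B1', 'A1a']
print(dfs_ids(forest))  # ['A', 'A1', 'A1a', 'A2', 'B', 'B1']
```

In the DFS order every reply appears directly after the comment it answers, which keeps each conversation thread contiguous.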
This will likely be a significant portion of my project, so for now let's pull the relevant data from our other subreddits of interest, starting with **r/politics**:

## Investigating r/Politics

In [74]:
politics = reddit.subreddit("politics")

Let's pull our data from rising:

In [82]:
# One pass over the listing keeps all six lists aligned to the same posts
id_politics, author_politics, title_politics, text_politics, numComments_politics, comments_politics = [], [], [], [], [], []
for x in politics.rising(limit=20):
    comments_politics.append(x.comments)
    id_politics.append(x.id)
    author_politics.append(x.author)
    title_politics.append(x.title)
    text_politics.append(x.selftext)
    numComments_politics.append(x.num_comments)


And then place it into a dataframe:

In [83]:
politics_df = pd.DataFrame({'id':id_politics, 'author':author_politics, 'title':title_politics, 'text':text_politics, 'numComments':numComments_politics, 'comments':comments_politics})

In [84]:
politics_df = politics_df[politics_df['numComments'] > 10]
politics_df = politics_df.set_index('id')
politics_df.head(10)

| id | author | title | text | numComments | comments |
|---|---|---|---|---|---|
| 11aqd9b | WaterChi | Tucker Carlson vs. Chelsea Handler: Why the ri... | | 25 | (j9tcn69, j9tdwtj, j9tdj0x, j9tgd57, j9tdii8, ... |
| 11aqfak | WaterChi | Trump is preparing for more American carnage | | 34 | (j9tcy7x, j9tdsrz, j9tfiug, j9tdid4, j9teis0, ... |
| 11ad9q1 | Fr1sk3r | “He should be apologizing”: Critics call out T... | | 392 | (j9rakdi, j9rawz4, j9rcf0g, j9rhsbs, j9rd2wu, ... |
| 11a85me | A_Queff_In_Time | All U.S. extremist mass killings in 2022 linke... | | 613 | (j9qdm9i, j9qe5pz, j9s8dga, j9qe1vo, j9qu4u0, ... |
| 11a775d | Sharp_Literature_739 | A ‘national divorce’ would destroy red states.... | | 2622 | (j9q795j, j9q88gv, j9qadzd, j9qb1fc, j9qcl0z, ... |
| 11acjs8 | Scarlettail | California bill would eventually ban all tobac... | | 724 | (j9r63ls, j9r8s4i, j9rdbs4, j9r7vsl, j9rnfkr, ... |
| 11a96us | thenewrepublic | Students Across Florida Walk Out in Protest of... | | 267 | (j9qkikg, j9qtbm6, j9qkqpi, j9qmtvg, j9relyx, ... |
| 11a8wnf | cameronj | White House blames Trump administration and Re... | | 674 | (j9qils7, j9qrd9b, j9qmoik, j9qke6g, j9qneqm, ... |
| 11afzj5 | Beckles28nz | Marjorie Taylor Greene Mocked for Tweeting '6 ... | | 231 | (j9rs0uj, j9rt9pf, j9rsaxa, j9rww5v, j9rsls2, ... |
| 11apbou | e-r_bridge | How a box with classified documents ended up i... | | 20 | (j9t75vo, j9t7hja, j9t7vom, j9t7mqj, j9t7hdv, ... |


Let's look at some basic statistics about the number of comments on these posts:

In [90]:
politics_df.describe()

| | numComments |
|---|---|
| count | 18.0 |
| mean | 485.166667 |
| std | 635.309026 |
| min | 19.0 |
| 25% | 45.0 |
| 50% | 267.0 |
| 75% | 684.5 |
| max | 2622.0 |


In [87]:
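for comment in politics_df.loc['11apbou'].comments.list():
    # Skip "load more comments" placeholders, which have no body
    if isinstance(comment, MoreComments):
        continue
    # Using ____ to demarcate different comments
    print(comment.body + "\n _________")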


As a reminder, this subreddit [is for civil discussion.](/r/politics/wiki/index#wiki_be_civil)

In general, be courteous to others. Debate/discuss/argue the merits of ideas, don't attack people. Personal insults, shill or troll accusations, hate speech, any suggestion or support of harm, violence, or death, and other rule violations can result in a permanent ban. 

If you see comments in violation of our rules, please report them.

 For those who have questions regarding any media outlets being posted on this subreddit, please click [here](https://www.reddit.com/r/politics/wiki/approveddomainslist) to review our details as to our approved domains list and outlet criteria.
 
 **Special announcement:**
 
 r/politics is currently accepting new moderator applications.  If you want to help make this community a better place, consider [applying here today](https://www.reddit.com/r/politics/comments/sskg6a/rpolitics_is_looking_for_more_moderators/)!

***


*I am a bot, and this action was pe

Lastly, we convert the dataframe to a csv:

In [117]:
politics_df.to_csv('politics.csv')

Now, let's move on to **r/soccer**:

## Investigating r/Soccer

In [92]:
soccer = reddit.subreddit("soccer")

In [93]:
# One pass over the listing keeps all six lists aligned to the same posts
id_soccer, author_soccer, title_soccer, text_soccer, numComments_soccer, comments_soccer = [], [], [], [], [], []
for x in soccer.rising(limit=20):
    comments_soccer.append(x.comments)
    id_soccer.append(x.id)
    author_soccer.append(x.author)
    title_soccer.append(x.title)
    text_soccer.append(x.selftext)
    numComments_soccer.append(x.num_comments)


In [94]:
soccer_df = pd.DataFrame({'id':id_soccer, 'author':author_soccer, 'title':title_soccer, 'text':text_soccer, 'numComments':numComments_soccer, 'comments':comments_soccer})

In [95]:
soccer_df = soccer_df[soccer_df['numComments'] > 10]
soccer_df = soccer_df.set_index('id')
soccer_df.head(10)

| id | author | title | text | numComments | comments |
|---|---|---|---|---|---|
| 11aqjtk | Heimebane | [Simon Stone] Erik ten Hag calls @NUFC an 'ann... | | 131 | (j9tdowi, j9tdt99, j9tfaoa, j9tds8r, j9te5je, ... |
| 11ap3bl | RevertBackwards | Europa League RO16 draw | | 721 | (j9t62gz, j9t621z, j9t63w9, j9t6526, j9t677r, ... |
| 11a9co3 | PSGAcademy | Manchester United [2] - 1 Barcelona [4-3 on ag... | | 987 | (j9qlmbs, j9qluv8, j9qlqci, j9qltx9, j9qlswu, ... |
| 11appc5 | RevertBackwards | Djed Spence ushers the cameraman away after St... | | 138 | (j9t977n, j9ta55b, j9tafau, j9tfguk, j9t9kb2, ... |
| 11aoc70 | EmotionalMillionaire | [Everton FC] Pickford Signs New Long-Term Ever... | | 146 | (j9t55cr, j9t2iyn, j9t2tcp, j9t5p95, j9t6x1j, ... |
| 11aq3wv | akskeleton_47 | ECL Round of 16 draw | | 167 | (j9tb8uk, j9tcmpj, j9tb995, j9tbwbr, j9tbd50, ... |
| 11a929i | PSGAcademy | Bruno smashes the ball into De Jong | | 1381 | (j9qjnpe, j9qjy2h, j9qrulv, j9qlj3n, j9qx5z5, ... |
| 11a9wvt | dotuan | Raphael Varane clearance against Barcelona 90+4' | | 414 | (j9qpfzq, j9qpodo, j9qpt9d, j9qpiwq, j9qpife, ... |
| 11a8pyz | PSGAcademy | Manchester United [1] - 1 Barcelona [3-3 on ag... | | 255 | (j9qhcg0, j9qhfin, j9qhe23, j9qhgfy, j9qhdu6, ... |
| 11a9xmw | LampseederBroDude51 | Post Match Thread: Manchester United 2-1 FC Ba... | `\n#**FT: Manchester United [2-1](#bar-3-white...` | 3723 | (j9qpn2s, j9qpu4j, j9qsg9k, j9qsl5p, j9qqgo5, ... |


Let's look at the comment statistics...

In [101]:
soccer_df.describe()

| | numComments |
|---|---|
| count | 16.0 |
| mean | 666.5 |
| std | 901.463921 |
| min | 13.0 |
| 25% | 144.0 |
| 50% | 427.0 |
| 75% | 782.0 |
| max | 3723.0 |


As well as the comments themselves for the first post. We can limit ourselves to the first 20 for this one:

In [115]:
print(soccer_df.loc['11aqjtk'].title+"\n")
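for comment in soccer_df.loc['11aqjtk'].comments.list()[:20]:
    # Skip "load more comments" placeholders, which have no body
    if isinstance(comment, MoreComments):
        continue
    # Using ____ to demarcate different comments
    print(comment.body + "\n _________")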

[Simon Stone] Erik ten Hag calls @NUFC an 'annoying team'. Says their 'effective time' is the lowest in PL. "They are quite successful at it."

**This is a quotes thread. Remember that there's only one quotes post allowed per interview/press conference, so new quotes with the same origin will be removed. Feel free to comment other quotes/the whole interview as a reply to this comment so users can see them too!**


*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/soccer) if you have any questions or concerns.*
 _________
Full quotes:  


Ten Hag on #NUFC: "Annoying. They try to annoy you. If you see from the FA, referees want to play effective time. They have the lowest in the league, they are quite successful with it. It's up to us to get speed in the game, but we are dependent on the refereeing as well." #MUFC

Ten Hag: "I am not in the instructions of the opponents so I don't know, I can't influence th

Csv-ify:

In [118]:
soccer_df.to_csv('soccer.csv')

Let's move on to **r/happy**:

## Enjoying r/Happy :)

In [103]:
happy = reddit.subreddit("happy")

We'll take the same approach as we did for the previous 3 subreddits, but pull from a larger pool of posts due to the lower level of activity:

In [108]:
# One pass over the listing keeps all six lists aligned to the same posts
id_happy, author_happy, title_happy, text_happy, numComments_happy, comments_happy = [], [], [], [], [], []
for x in happy.rising(limit=40):
    comments_happy.append(x.comments)
    id_happy.append(x.id)
    author_happy.append(x.author)
    title_happy.append(x.title)
    text_happy.append(x.selftext)
    numComments_happy.append(x.num_comments)


In [109]:
happy_df = pd.DataFrame({'id':id_happy, 'author':author_happy, 'title':title_happy, 'text':text_happy, 'numComments':numComments_happy, 'comments':comments_happy})

In [110]:
happy_df = happy_df[happy_df['numComments'] > 10]
happy_df = happy_df.set_index('id')
happy_df.head(10)

| id | author | title | text | numComments | comments |
|---|---|---|---|---|---|
| 119wk3m | Space_Velvet | An acrylic painting I made called "Valley of t... | | 71 | (j9ocrqh, j9oe9bo, j9oobr9, j9p2pg7, j9opwxx, ... |
| 119dd9i | msolu10 | Today is my 25th Birthday. After 25 years of m... | | 33 | (j9ljkf7, j9luyg8, j9n8zgu, j9npe4c, j9m7gsu, ... |
| 119hoip | NRmusiccringe | This semester and life in general has thrown a... | | 11 | (j9m5lye, j9n1ydn, j9p6xtg, j9mfzd2, j9omze6, ... |
| 1197uub | captivatedconcious69 | I am sonhappy and excited today! I'm meeting m... | | 27 | (j9kwz4f, j9l6spw, j9l7gct, j9lfs3u, j9li28o, ... |
| 118th5p | GaylordTurner | Crafting Delicate Blooms: The Art of Button Fl... | | 11 | (j9j18j1, j9j7ltn, j9j366x, j9k5xie, j9m18sa, ... |
| 1194s51 | nomilkyno | Just found out I’m pregnant! I can’t tell any ... | | 12 | (j9kdew8, j9kgio0, j9kmks1, j9l5yrt, j9low5j, ... |
| 118umot | eyeballresort | My biggest childhood dream came true. Not only... | | 12 | (j9j6e36, j9jnycp, j9k0n8w, j9kmquu, j9mi4cz, ... |


In [111]:
happy_df.describe()

| | numComments |
|---|---|
| count | 7.0 |
| mean | 25.285714 |
| std | 22.035685 |
| min | 11.0 |
| 25% | 11.5 |
| 50% | 12.0 |
| 75% | 30.0 |
| max | 71.0 |
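To put these numbers next to the other subreddits, the separate `describe()` outputs could be combined into one table. A small sketch; `compare_comment_counts` is a hypothetical helper, and the two demo frames hold made-up numbers standing in for DataFrames like `politics_df` and `happy_df`.

```python
import pandas as pd


def compare_comment_counts(frames):
    """Summarise the ``numComments`` column of several DataFrames side by side.

    ``frames`` maps a label such as "r/happy" to a DataFrame with a
    ``numComments`` column; each label becomes one column of the result.
    """
    return pd.DataFrame({name: df["numComments"].describe() for name, df in frames.items()})


# Made-up stand-in frames; in the notebook these would be politics_df, soccer_df, happy_df, ...
busy = pd.DataFrame({"numComments": [20, 300, 2500]})
quiet = pd.DataFrame({"numComments": [11, 12, 70]})
print(compare_comment_counts({"busy": busy, "quiet": quiet}))
```

The medians (the `50%` row) are the fairer point of comparison here, because these distributions are skewed: in r/politics the mean (485.2) sits well above the median (267), since a few very large threads pull the average up.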


It appears that posts on r/happy receive fewer comments than those on the other subreddits we've looked at (a median of 12, compared with 267 on r/politics and 427 on r/soccer). Let's see what the comments look like:

In [114]:
print(happy_df.loc['119dd9i'].title+"\n")
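for comment in happy_df.loc['119dd9i'].comments.list()[:20]:
    # Skip "load more comments" placeholders, which have no body
    if isinstance(comment, MoreComments):
        continue
    # Using ____ to demarcate different comments
    print(comment.body + "\n _________")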

Today is my 25th Birthday. After 25 years of my grandparents making me meals, today I got to make them one. It was the best gift I could’ve asked for.

Welcome to /r/happy where we support people in their endeavours! This is a place of positivity, if you can't think of anything good to say then don't say anything at all.

If you want to give tips/suggestions, make them constructive from the start and be supportive (even if you don't feel it's "enough"), if you don't know how to do that then don't give them.

We celebrate the good things in life and the change people strive for in /r/happy. If you find this post offensive or this community ridiculous, you're welcome to not hang around.


*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/happy) if you have any questions or concerns.*
 _________
Your grandparents look so happy and sweet! Love them while they're still here! Happy Birthday 🎂 💜
 _________
So, w

Let's convert this to a csv:

In [119]:
happy_df.to_csv('happy.csv')

Finally, let's look at **r/music**:

## Listening to r/Music

In [121]:
music = reddit.subreddit("music")

In [122]:
# One pass over the listing keeps all six lists aligned to the same posts
id_music, author_music, title_music, text_music, numComments_music, comments_music = [], [], [], [], [], []
for x in music.rising(limit=40):
    comments_music.append(x.comments)
    id_music.append(x.id)
    author_music.append(x.author)
    title_music.append(x.title)
    text_music.append(x.selftext)
    numComments_music.append(x.num_comments)


In [123]:
music_df = pd.DataFrame({'id':id_music, 'author':author_music, 'title':title_music, 'text':text_music, 'numComments':numComments_music, 'comments':comments_music})

In [124]:
music_df = music_df[music_df['numComments'] > 10]
music_df = music_df.set_index('id')
music_df.head(10)

| id | author | title | text | numComments | comments |
|---|---|---|---|---|---|
| 11ari4r | BloatedBanana9 | 90s Bands That Are Still Making Good Music | Recently I’ve been digging into the later disc... | 34 | (j9tk2p3, j9tkole, j9tkw8q, j9tnctx, j9tnqek, ... |
| 11aezk5 | bodamfuonua1 | R. Kelly Sentenced to 20 Years for Child Sex C... | | 722 | (j9s742q, j9s7jrt, j9rm757, j9rovy2, j9s8ez6, ... |
| 119yqbf | I-Skeleton | Yeah Yeah Yeahs - Maps [Indie Rock] | | 267 | (j9ptum0, j9p25h7, j9opday, j9pc9m7, j9p4zgk, ... |
| 11afkp6 | mrxexon | Alice In Chains - Man in the Box (Official Vid... | | 25 | (j9rpp8f, j9s5s9n, j9rppaz, j9rrjsh, j9sdhhv, ... |
| 11akf30 | Human_Capital_2518 | Marilyn Manson Accuser Recants Sexual Assault ... | | 36 | (j9tpmje, j9sko8k, j9t4ynu, j9st8ba, j9sktg1, ... |
| 11a0egf | Pristine2268 | Are there any rock songs out there featuring a... | Also interested in any artists that frequently... | 1202 | (j9ozehg, j9p264n, j9p2s9e, j9pbwnd, j9p24e1, ... |
| 119lk2u | mrxexon | Fatboy Slim - Weapon Of Choice [Rock] | | 255 | (j9nqgl6, j9n5gpv, j9n232f, j9neclb, j9nopb6, ... |
| 119zftm | dragonoid296 | Ol' Dirty Bastard - Shimmy Shimmy Ya [Hip Hop] | | 25 | (j9p5oih, j9pfeek, j9pthh1, j9pgfil, j9popf4, ... |
| 11ab8mo | Sc00ter7622 | Creed | Go ahead and crucify me, but for some reason I... | 217 | (j9rawmt, j9rg6a8, j9rfopi, j9rb7ap, j9rq67a, ... |
| 11apc6v | SassiMarzare | Sexy Rock Music? | Comment below if u have any suggestions to sex... | 46 | (j9t7we2, j9t7wz7, j9tbj09, j9t7ezq, j9t7m3d, ... |


Let's look at the comment statistics...

In [125]:
music_df.describe()

| | numComments |
|---|---|
| count | 12.0 |
| mean | 240.333333 |
| std | 364.801399 |
| min | 18.0 |
| 25% | 31.75 |
| 50% | 41.5 |
| 75% | 258.0 |
| max | 1202.0 |


As well as the comments:

In [126]:
print(music_df.loc['11ari4r'].title+"\n")
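for comment in music_df.loc['11ari4r'].comments.list()[:20]:
    # Skip "load more comments" placeholders, which have no body
    if isinstance(comment, MoreComments):
        continue
    # Using ____ to demarcate different comments
    print(comment.body + "\n _________")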

90s Bands That Are Still Making Good Music

Radiohead
 _________
Hum
 _________
Garbage

Skunk Anansie

Tori Amos

The Cardigans

Sheryl Crow
 _________
Yo La Tengo

Built to Spill
 _________
Beck
 _________
Pearl Jam
 _________
Phish.
 _________
Foo Fighters were still going with new music as of a few years ago, although unfortunately their drummer Taylor Hawkins passed away recently.
 _________
Pearl Jam: One of the most influential grunge bands of the 90s, Pearl Jam released their eleventh studio album Gigaton in 2020, which received critical acclaim and a Grammy nomination for Best Rock Album  


Green Day: The punk rock trio that dominated the 90s with hits like Basket Case and Good Riddance (Time of Your Life) released their thirteenth studio album Father of All Motherfuckers in 2020, which featured a more garage rock sound and shorter songs  


Weezer: The geek rock band that gave us classics like Buddy Holly and Say It Ain’t So released two albums in 2021: OK Human, which was i

Lastly, let's convert to CSV:

In [127]:
music_df.to_csv('music.csv')

## Moving Forward
The comment data collected will need further cleaning before any analysis can be performed. Top posts may also be examined, but they will take longer to pull from Reddit because of the sheer amount of data contained in each of their CommentForests.
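As a sketch of what that cleaning might involve: the comment dumps above open with moderator bot notices ("I am a bot, and this action was performed automatically"), and Reddit shows removed or deleted comments as `[removed]` or `[deleted]`. The code below assumes the comments have already been flattened into dicts with hypothetical `comment_id`, `parent_id` and `body` keys. It also uses `parent_id` to recover each comment's reply depth: Reddit fullnames prefix comments with `t1_` and submissions with `t3_`, so a `t3_` parent marks a top-level comment. Depths are computed before filtering so that dropping a comment does not cut its replies' chains short.

```python
BOT_FOOTER = "I am a bot, and this action was performed automatically"
REMOVED_BODIES = {"[deleted]", "[removed]"}


def reply_depths(rows):
    """Map each comment_id to its depth: 0 = top-level, 1 = reply to a top-level comment, ...

    A parent_id starting with "t1_" points at another comment; anything else
    (a "t3_" submission id, or a parent we never fetched) ends the walk.
    """
    parent = {r["comment_id"]: r["parent_id"] for r in rows}
    depths = {}
    for cid, pid in parent.items():
        depth = 0
        while pid.startswith("t1_"):
            depth += 1
            pid = parent.get(pid[3:], "")  # unknown parent -> "" -> loop stops
        depths[cid] = depth
    return depths


def clean_comments(rows):
    """Drop moderator-bot notices and deleted/removed comments."""
    return [r for r in rows
            if BOT_FOOTER not in r["body"] and r["body"].strip() not in REMOVED_BODIES]


# Hypothetical flattened rows for one post (ids and text are made up)
rows = [
    {"comment_id": "m1", "parent_id": "t3_post1",
     "body": "*I am a bot, and this action was performed automatically.*"},
    {"comment_id": "a", "parent_id": "t3_post1", "body": "Top-level comment"},
    {"comment_id": "b", "parent_id": "t1_a", "body": "Reply to a"},
    {"comment_id": "c", "parent_id": "t1_b", "body": "[deleted]"},
]
depths = reply_depths(rows)  # computed before filtering so reply chains stay intact
kept = clean_comments(rows)
print([(r["comment_id"], depths[r["comment_id"]]) for r in kept])  # [('a', 0), ('b', 1)]
```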