# Reddit AITA Dataset Creation


This notebook takes two input files, both created with datafile_filtering.py:
1. A .csv file of AITA submissions
2. A .csv file of the top-level comments on those submissions

It produces one output file:
1. A .csv file in which each row is an AITA submission together with its top 10 comments, ranked by comment score

## Prepare Environment

In [1]:
%pip install zstandard pandas

Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd
import zstandard as zstd

In [4]:
%pwd

'c:\\Users\\mattb\\Documents\\GitHub\\CPSS-24-Reddit-Conflict-Resolution\\dataset_creation'

## Creation of AITA submissions dataframe

In [70]:
# load submissions csv

submissions_df = pd.read_csv('aita-datafiles/2022/submissions_2022_score_50.csv')

In [71]:
submissions_df

Unnamed: 0,id,score,title,selftext,url,created_utc
0,rt72fp,22321,AITA telling my co worker that I will report h...,\nI F33 have been working in this company for ...,https://www.reddit.com/r/AmItheAsshole/comment...,1640996378
1,rt75nt,79,AITA for sitting in the back seat of the car e...,My ex picked me and our son up from the airpor...,https://www.reddit.com/r/AmItheAsshole/comment...,1640996653
2,rt7a13,80,[deleted by user],[removed],https://www.reddit.com/r/AmItheAsshole/comment...,1640997055
3,rt7bve,687,AITA for asking the receptionist to mute the t...,"TLDR: I asked the receptionist to mute the tv,...",https://www.reddit.com/r/AmItheAsshole/comment...,1640997223
4,rt7nbf,7813,AITA for grounding my daughter for sneaking a ...,[removed],https://www.reddit.com/r/AmItheAsshole/comment...,1640998285
...,...,...,...,...,...,...
49239,zzepv6,91,AITA: For coming home a day late from a work t...,So I've been out on a remote island for a work...,https://www.reddit.com/r/AmItheAsshole/comment...,1672442341
49240,zzeyxc,78,AITA For not refusing to letting my friend cra...,"I M43, have a friend, Laura F40, who was evict...",https://www.reddit.com/r/AmItheAsshole/comment...,1672442988
49241,zzf0d9,701,AITA for not wanting my 19yo stepdaughter to c...,[deleted],,1672443087
49242,zzf5jk,3563,AITA for not giving my husband a pass on doing...,So a little back story… I am a new grad rn at ...,https://www.reddit.com/r/AmItheAsshole/comment...,1672443461
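
The `Unnamed: 0` column in the output above is what pandas typically shows when a CSV was written with its index and then read back without declaring that index. The sketch below reproduces the behavior on made-up data; the toy frame and the in-memory buffer are illustrative assumptions, not the project's files.

```python
import io
import pandas as pd

# Toy stand-in for a submissions file that was saved with its index (assumption)
toy = pd.DataFrame({'id': ['rt72fp', 'rt75nt'], 'score': [22321, 79]})
csv_text = toy.to_csv()  # index=True by default, so an unnamed first column is written

# Reading it back as-is turns the saved index into an 'Unnamed: 0' column
with_extra = pd.read_csv(io.StringIO(csv_text))

# Declaring the first column as the index avoids the extra column
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(clean.columns))
```

If the extra column is not wanted in the final dataset, passing `index_col=0` when loading is one option; the notebook as written keeps it.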


In [72]:
# rename columns so that they better reflect their data

submissions_df = submissions_df.rename(columns={'id': 'submission_id',
                                      'score': 'submission_score',
                                      'title': 'submission_title',
                                      'selftext': 'submission_text',
                                      'url': 'submission_url'})

In [73]:
submissions_df

Unnamed: 0,submission_id,submission_score,submission_title,submission_text,submission_url,created_utc
0,rt72fp,22321,AITA telling my co worker that I will report h...,\nI F33 have been working in this company for ...,https://www.reddit.com/r/AmItheAsshole/comment...,1640996378
1,rt75nt,79,AITA for sitting in the back seat of the car e...,My ex picked me and our son up from the airpor...,https://www.reddit.com/r/AmItheAsshole/comment...,1640996653
2,rt7a13,80,[deleted by user],[removed],https://www.reddit.com/r/AmItheAsshole/comment...,1640997055
3,rt7bve,687,AITA for asking the receptionist to mute the t...,"TLDR: I asked the receptionist to mute the tv,...",https://www.reddit.com/r/AmItheAsshole/comment...,1640997223
4,rt7nbf,7813,AITA for grounding my daughter for sneaking a ...,[removed],https://www.reddit.com/r/AmItheAsshole/comment...,1640998285
...,...,...,...,...,...,...
49239,zzepv6,91,AITA: For coming home a day late from a work t...,So I've been out on a remote island for a work...,https://www.reddit.com/r/AmItheAsshole/comment...,1672442341
49240,zzeyxc,78,AITA For not refusing to letting my friend cra...,"I M43, have a friend, Laura F40, who was evict...",https://www.reddit.com/r/AmItheAsshole/comment...,1672442988
49241,zzf0d9,701,AITA for not wanting my 19yo stepdaughter to c...,[deleted],,1672443087
49242,zzf5jk,3563,AITA for not giving my husband a pass on doing...,So a little back story… I am a new grad rn at ...,https://www.reddit.com/r/AmItheAsshole/comment...,1672443461


## Creation of AITA comments dataframe

In [74]:
# load comments csv

comments_df = pd.read_csv('aita-datafiles/2022/top_level_comments_2022_score_5_submission_score_50.csv')

In [75]:
# Strip the 't3_' prefix from link_id so that it matches the submission id

comments_df['link_id'] = comments_df['link_id'].str.slice(3)
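
`str.slice(3)` drops the first three characters of every value, whatever they are. A hedged alternative, assuming pandas 1.4 or later, is `str.removeprefix`, which strips `t3_` only when it is actually present. The values below are illustrative, and the unprefixed one is a hypothetical edge case.

```python
import pandas as pd

# Illustrative link_id values; the last one has no prefix (hypothetical edge case)
link_ids = pd.Series(['t3_rt72fp', 't3_rt75nt', 'rt7a13'])

sliced = link_ids.str.slice(3)               # always drops three characters
stripped = link_ids.str.removeprefix('t3_')  # drops the prefix only if present

print(sliced.tolist())
print(stripped.tolist())
```

When every value is known to carry the prefix, both approaches give the same result.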

In [76]:
# rename columns so that they better reflect their data

comments_df = comments_df.rename(columns={'id': 'comment_id',
                                      'score': 'comment_score',
                                      'body': 'comment_text'})

## Merging of AITA submission and comments dataframes

In [77]:
# Create a dataframe of the top 10 comments for each submission

# Inner-join each submission with its top-level comments
merged_df = submissions_df.merge(comments_df, left_on='submission_id', right_on='link_id')
# link_id duplicates submission_id after the merge, so remove it
merged_df = merged_df.drop('link_id', axis=1)
# For each submission, collect the text of its 10 highest-scoring comments as a list
top_10_comments = merged_df.groupby('submission_id').apply(lambda x: x.nlargest(10, 'comment_score')['comment_text'].tolist())
# Expand each list into columns comment_0 ... comment_9; shorter lists are padded with missing values
top_10_comments_df = pd.DataFrame(top_10_comments.tolist(), index=top_10_comments.index).add_prefix('comment_')
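
Recent pandas releases can emit a deprecation warning when `groupby(...).apply` runs on a frame that still contains the grouping column. One way to get the same wide layout without `apply` is to sort, keep the first rows per group, and pivot. This is a sketch on made-up data, not the notebook's method; `TOP_N` and the toy frame are assumptions, and tie-breaking between equal scores may differ from `nlargest`.

```python
import pandas as pd

# Toy comments table using the notebook's column names (values are made up)
toy_comments = pd.DataFrame({
    'submission_id': ['a', 'a', 'a', 'b'],
    'comment_score': [5, 50, 20, 7],
    'comment_text': ['low', 'high', 'mid', 'only'],
})

TOP_N = 2  # the notebook uses 10; 2 keeps the example small

# Sort by score, then keep the first TOP_N rows of each submission
top = (toy_comments
       .sort_values('comment_score', ascending=False, kind='stable')
       .groupby('submission_id')
       .head(TOP_N)
       .copy())

# Rank within each submission (0 = highest score), then pivot ranks into columns
top['rank'] = top.groupby('submission_id').cumcount()
wide = (top.pivot(index='submission_id', columns='rank', values='comment_text')
           .add_prefix('comment_'))

print(wide)
```

Submissions with fewer than `TOP_N` comments end up with missing values in the trailing columns, as in the list-based version.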


In [78]:
# Merge submissions_df and top_10_comments_df on submission_id
# Result is a dataframe with both submissions and their top 10 comments

submissions_with_top_10_comments = submissions_df.merge(top_10_comments_df, on='submission_id')
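
`merge` defaults to an inner join, so any submission without a matching row in `top_10_comments_df` is dropped at this step. That would be consistent with the row index ending at 49242 in the submissions output and at 48052 in the final output, although the notebook does not state the cause. The sketch below, on made-up frames, shows one way to see which rows an inner join would drop.

```python
import pandas as pd

toy_submissions = pd.DataFrame({'submission_id': ['a', 'b', 'c'],
                                'submission_title': ['t1', 't2', 't3']})
# Submission 'c' has no comments in this toy example
toy_top_comments = pd.DataFrame({'submission_id': ['a', 'b'],
                                 'comment_0': ['first', 'second']})

# Default how='inner': only submissions present in both frames survive
inner = toy_submissions.merge(toy_top_comments, on='submission_id')

# A left join keeps every submission; indicator=True adds a '_merge' column
left = toy_submissions.merge(toy_top_comments, on='submission_id',
                             how='left', indicator=True)
dropped = left.loc[left['_merge'] == 'left_only', 'submission_id'].tolist()

print(len(inner), len(left), dropped)
```

Whether uncommented submissions belong in the dataset is a design decision; the inner join used here excludes them.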

In [79]:
# Convert UTC timestamps to datetime

submissions_with_top_10_comments['created_utc'] = pd.to_datetime(submissions_with_top_10_comments['created_utc'], unit='s')
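
As a quick check of the conversion, the first `created_utc` value in the submissions table (1640996378) should map to the first `submission_date` in the final output. The `utc=True` variant is an optional extra, not something the notebook uses.

```python
import pandas as pd

# First timestamp from the submissions table, interpreted as seconds since the epoch
ts = pd.to_datetime(1640996378, unit='s')
print(ts)  # 2022-01-01 00:19:38

# utc=True returns a timezone-aware value instead of a naive one
ts_utc = pd.to_datetime(1640996378, unit='s', utc=True)
print(ts_utc.tzinfo is not None)
```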


In [80]:
# Rename timestamp and top comment columns for improved clarity

submissions_with_top_10_comments = submissions_with_top_10_comments.rename(columns={'created_utc': 'submission_date',
                                                                                    'comment_0': 'top_comment_1',
                                                                                    'comment_1': 'top_comment_2',
                                                                                    'comment_2': 'top_comment_3',
                                                                                    'comment_3': 'top_comment_4',
                                                                                    'comment_4': 'top_comment_5',
                                                                                    'comment_5': 'top_comment_6',
                                                                                    'comment_6': 'top_comment_7',
                                                                                    'comment_7': 'top_comment_8',
                                                                                    'comment_8': 'top_comment_9',
                                                                                    'comment_9': 'top_comment_10'})
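
The same mapping can be generated rather than typed out, which avoids off-by-one slips if the number of comments changes. This is a sketch; `rename_map` is a name introduced here for illustration.

```python
# Map comment_0..comment_9 to top_comment_1..top_comment_10
rename_map = {f'comment_{i}': f'top_comment_{i + 1}' for i in range(10)}
rename_map['created_utc'] = 'submission_date'

print(rename_map['comment_0'], rename_map['comment_9'])
```

The resulting dictionary can be passed to `rename(columns=rename_map)` in place of the literal one.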

In [81]:
# Remove the submission_id column: it served as the merge key and is not kept in the final dataset

submissions_with_top_10_comments = submissions_with_top_10_comments.drop('submission_id', axis=1)

In [82]:
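# Swap the positions of the submission_score and submission_text columns:
# exchange their values, then exchange their names so each label matches its data again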

submissions_with_top_10_comments[['submission_score', 'submission_text']] = submissions_with_top_10_comments[['submission_text', 'submission_score']]
submissions_with_top_10_comments = submissions_with_top_10_comments.rename(columns={'submission_score': 'submission_text', 'submission_text': 'submission_score'})
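
The net effect of the two statements above is that the two columns trade places. A more direct way to express that, sketched here on a made-up frame, is to reorder the column labels and select them in the new order.

```python
import pandas as pd

toy = pd.DataFrame({'submission_score': [22321, 79],
                    'submission_title': ['title 1', 'title 2'],
                    'submission_text': ['text 1', 'text 2']})

# Build the new column order by exchanging the positions of the two labels
cols = list(toy.columns)
i, j = cols.index('submission_score'), cols.index('submission_text')
cols[i], cols[j] = cols[j], cols[i]
reordered = toy[cols]

print(list(reordered.columns))
```

Selecting by a reordered label list leaves the data in each column untouched, so no follow-up rename is needed.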

In [83]:
submissions_with_top_10_comments

Unnamed: 0,submission_text,submission_title,submission_score,submission_url,submission_date,top_comment_1,top_comment_2,top_comment_3,top_comment_4,top_comment_5,top_comment_6,top_comment_7,top_comment_8,top_comment_9,top_comment_10
0,\nI F33 have been working in this company for ...,AITA telling my co worker that I will report h...,22321,https://www.reddit.com/r/AmItheAsshole/comment...,2022-01-01 00:19:38,"NTA. What if it hadn't been yours at all, but ...",NTA. For those questioning this:\n\nHow about ...,Nice gesture? All it was was a lot of scene-st...,As someone who had a baby in the last year and...,NTA and I have to be honest I'm a little baffl...,"""Our joy"".... Does he think he's the father? 🤔",NTA\n\n\nYou coworkers are really strange. Thi...,NTA.\n\nReport.him like yesterday.\n\nIt does ...,NTA\n\nHe was looking for an excuse to be the ...,NTA. Your pregnancy announcement is yours to ...
1,My ex picked me and our son up from the airpor...,AITA for sitting in the back seat of the car e...,79,https://www.reddit.com/r/AmItheAsshole/comment...,2022-01-01 00:24:13,If I were you I'd make sure there wasn't a nex...,NTA. There's no way you could be. You sat in...,"Damn, I’m glad you broke up with him. Sorry yo...",NTA. There's a reason why he's your ex.,NTA. I can see why sitting next to him would m...,It's a bit odd... but INFO\n\nIf you aren't co...,NTA- You don't owe him anything and you have e...,NTA but this is a thing with some people. My s...,NTA. Sitting in the back doesn’t mean you were...,NTA I can see why he’s an ex. You are entitled...
2,[removed],[deleted by user],80,https://www.reddit.com/r/AmItheAsshole/comment...,2022-01-01 00:30:55,Keep taking care of the dog. See if she is mic...,NTA but when you bring the dog to the vet to c...,Nta but be real careful. I am not a lawyer but...,Not the ass. That's your baby now they don't d...,[removed],[deleted],This is a tough one because all the signs of n...,,,
3,"TLDR: I asked the receptionist to mute the tv,...",AITA for asking the receptionist to mute the t...,687,https://www.reddit.com/r/AmItheAsshole/comment...,2022-01-01 00:33:43,I’m really surprised by all of the Y T A votes...,Info: Why couldn't you ask to turn the volume ...,Info: could you have moved chairs away from th...,NTA. You asked about one question to the recep...,NTA. The staff could have turned the volume do...,NTA! I hate having a TV in the waiting room. I...,Info: Did you read the room and see if anyone ...,NTA I don’t understand why televisions have to...,&gt;Why doesn’t the same principle apply to no...,"NTA. This is a place of business, not that guy..."
4,[removed],AITA for grounding my daughter for sneaking a ...,7813,https://www.reddit.com/r/AmItheAsshole/comment...,2022-01-01 00:51:25,NTA. OMG!!! Fair has nothing to do with it. ...,NTA We are all the most unfair father ever. En...,NTA. She’s 12. Sneaking boys over with the doo...,NTA. Rules have been established (I assume). I...,NTA\n\nShe is 12. She has absolutely no busine...,"NTA, but my concern is that handling this with...",NTA bc she broke the rules. I would suggest af...,NTA\n\nIt wouldn't even matter if was another ...,NTA. but I would have a talk with her. \n\nLet...,"Welp here comes the downvotes uh YTA, although..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48049,So I've been out on a remote island for a work...,AITA: For coming home a day late from a work t...,91,https://www.reddit.com/r/AmItheAsshole/comment...,2022-12-30 23:19:01,YTA\n\nYour wife is right. You should have con...,YTA \n\nYou did a FREE favour for your boss an...,Yeah you should have called your wife before a...,How long did it take you to get to the dock? I...,YTA. She and the kids miss you and will be dis...,Info:\n\n- how old are the kids? \n- is daycar...,"NAH, but an amazing opportunity to have a conv...",YTA. Your wife is doing all the child minding ...,Not an AH but a great learning experience for ...,YTA\n\nAt the very least you should have commu...
48050,"I M43, have a friend, Laura F40, who was evict...",AITA For not refusing to letting my friend cra...,78,https://www.reddit.com/r/AmItheAsshole/comment...,2022-12-30 23:29:48,"NTA, and I would be worried she would never mo...",NTA. You have a roommate. If you’re not okay w...,"NTA-No is a complete sentence. Besides that, y...",NTA. If you can’t then you can’t she has no ri...,,,,,,
48051,[deleted],AITA for not wanting my 19yo stepdaughter to c...,701,,2022-12-30 23:31:27,YTA I'm 35 and when I visit my mom we 100% wil...,YTA\n\nMy girls and I lay on our bed all the t...,Another post that is the prime example that so...,"YTA, you are talking about her like she is a p...",Say you hate your step daughter without saying...,"YTA, and you are making this creepy.",YTA. Your stepdaughter isn't doing this out of...,YTA. This is an incredibly weird and disturbin...,YTA.\n\nI would understand if it was because s...,INFO: How long has she been your stepdaughter?...
48052,So a little back story… I am a new grad rn at ...,AITA for not giving my husband a pass on doing...,3563,https://www.reddit.com/r/AmItheAsshole/comment...,2022-12-30 23:37:41,NTA and honestly if he can't wrap his head aro...,I am so sorry. I was reading this and feeling ...,So your husband just had to wait until about 7...,NTA They should have waited. If that was too h...,How are you putting up with this? Do you thin...,NTA. As an RN who has worked graveyard shift ...,NTA. Your husband sounds awful. When is the la...,NTA\n\nTime to have a talk with your husband a...,NTA and you're completely valid. You put in a ...,I don't even have kids and I feel terrible for...


## Saving of Reddit AITA dataset

In [84]:
# save the dataframe as a csv
output_file = 'aita-datafiles/2022/Reddit_AITA_2022_Raw.csv'
submissions_with_top_10_comments.to_csv(output_file, index=False)
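
A round-trip check can confirm that a file written this way loads as expected. The sketch below uses a made-up one-row frame and a temporary directory; the file name is hypothetical. Because CSV stores dates as text, the date column is parsed again on load.

```python
import os
import tempfile
import pandas as pd

toy = pd.DataFrame({'submission_text': ['text 1'],
                    'submission_date': pd.to_datetime([1640996378], unit='s'),
                    'top_comment_1': ['NTA']})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_output.csv')  # hypothetical file name
    toy.to_csv(path, index=False)
    # Dates are stored as text in CSV, so parse them again when loading
    loaded = pd.read_csv(path, parse_dates=['submission_date'])

print(loaded.dtypes)
```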