# Sample and label comment questions
Now that we've collected comments from some advice subreddits, let's label them to determine if they are valid questions.

## Table of contents
1. Preprocess data
    1. [Load data](#Load-data)
    2. [Extract questions](#Extract-questions)
    3. [Sample data](#Sample-for-crowdsourcing)
2. Examine labels
    1. [Reload crowdsourced data](#Examine-crowdsourced-labelled-data)
    2. Compare text similarity for valid questions
        1. [Word overlap](#Determining-question-validity:-word-overlap)
        2. [Semantic similarity](#Determining-question-validity:-semantic-similarity)

### Load data

In [8]:
import pandas as pd
import os
data_dir = '../../data/reddit_data/'
comment_data_files = list(filter(lambda x: x.startswith('subreddit_comments'), os.listdir(data_dir)))
comment_data_files = list(map(lambda x: os.path.join(data_dir, x), comment_data_files))
## load all data
import json
import gzip
def load_json_data(data_file):
    data = []
    for l_i in gzip.open(data_file, 'rt'):
        data_i = json.loads(l_i.strip())
        data.append(data_i)
    data = pd.DataFrame(data)
    return data
comment_data = pd.concat(list(map(lambda x: load_json_data(x), comment_data_files)), axis=0)
display(comment_data.head())
print(f'{comment_data.shape[0]} comments')

Unnamed: 0,author,author_flair_text,author_fullname,body,created_utc,edited,id,parent_id,score,subreddit
0,grumpypantsoldman,,t2_27ps6lxw,NTA. I think you dodged a bullet. Who needs a ...,1541030405,False,e8tkic1,t3_9t3n27,14,AmItheAsshole
1,unknown_salmon,,t2_q57txtp,I do feel for you. Do you think you could be b...,1541030426,False,e8tkj26,t3_9t1u2e,2,Advice
2,vld-s,,t2_14p3sr,YNTA. He made you uncomfortable and you distan...,1541030467,False,e8tkkjt,t3_9t3xz3,3,AmItheAsshole
3,Prepperpoints2Ponder,,t2_11lpfa,Manufacturing here. Me and spouse will be payi...,1541030470,False,e8tkkns,t3_9t2e8e,2,personalfinance
4,juliej891,,t2_1mwezovi,I think you’re going to go out of your way to ...,1541030493,False,e8tklij,t3_9t4afp,1,Advice


1510689 comments


In [23]:
## add submission data lololol
# submission_data = load_json_data('../../data/reddit_data/subreddit_submissions_2018-01_2019-12.gz')
submission_data.rename(columns={'id' : 'parent_id'}, inplace=True)
display(submission_data.head())

Unnamed: 0,author,author_flair_text,created_utc,edited,parent_id,num_comments,score,selftext,subreddit,title,category,author_fullname
0,deepsouthsloth,,1514764840,False,7nby0l,7,1,26M/married/2 kids\n\nEmployer match is 50% up...,personalfinance,Should I continue with 401k despite terrible e...,,
1,CapableCounteroffer,,1514764890,False,7nby5t,5,0,"On November 24th, I called AT&amp;T to inquire...",legaladvice,[FL] Issue getting AT&amp;T to pay early termi...,,
2,pinkcrayon69,,1514764948,False,7nbybf,9,3,I live in south OC but I need to move out of m...,personalfinance,I need to move out in a month. What should I p...,,
3,bobshellby,Needs 64bit Windows...,1514765040,False,7nbykz,6,0,Are there keycaps for the Microsoft wireless k...,pcmasterrace,Keyboard keycap help,,
4,j0sh135742,,1514765064,1514765420.0,7nbyno,4,0,"So in MGL Part 1, Title 15, Chapter 94G, Secti...",legaladvice,Quick question about Medical Marijuana.,,


In [49]:
# cleanup, etc.
# fix parent ID
comment_data = comment_data.assign(**{'parent_id' : comment_data.loc[:, 'parent_id'].apply(lambda x: x.split('_')[-1])})
# fix edited names
submission_data.rename(columns={'edited' : 'submission_edited'}, inplace=True)
comment_data.rename(columns={'edited' : 'comment_edited'}, inplace=True)
comment_submission_data = pd.merge(submission_data.loc[:, ['title', 'selftext', 'parent_id', 'submission_edited']], 
                                   comment_data, on='parent_id')
# remove edited submissions
comment_submission_data = comment_submission_data.assign(**{'submission_edited' : comment_submission_data.loc[:, 'submission_edited'].apply(lambda x: x if type(x) is bool else x > 0)})
comment_submission_data = comment_submission_data[~comment_submission_data.loc[:, 'submission_edited']]
display(comment_submission_data.head())

Unnamed: 0,title,selftext,parent_id,submission_edited,author,author_flair_text,author_fullname,body,created_utc,comment_edited,id,score,subreddit,comment_questions,valid_question
0,Should I continue with 401k despite terrible e...,26M/married/2 kids\n\nEmployer match is 50% up...,7nby0l,False,slalomz,,,You'd get slightly lower expense ratios in an ...,1514765131,False,ds0nchz,3,personalfinance,[],False
1,Should I continue with 401k despite terrible e...,26M/married/2 kids\n\nEmployer match is 50% up...,7nby0l,False,DaveAlot,,,Tax-advantaged investing beats taxable investi...,1514765137,False,ds0ncoa,4,personalfinance,[How much do you want to invest per year and h...,False
2,Should I continue with 401k despite terrible e...,26M/married/2 kids\n\nEmployer match is 50% up...,7nby0l,False,20000to0,,,Tax-advantage accounts are KING.\n\nIt does su...,1514765436,False,ds0nlee,3,personalfinance,[],False
3,[FL] Issue getting AT&amp;T to pay early termi...,"On November 24th, I called AT&amp;T to inquire...",7nby5t,False,lucasrva,,,&gt;I asked if I needed to trade in my phones ...,1514765548,False,ds0noqx,5,legaladvice,[],False
4,[FL] Issue getting AT&amp;T to pay early termi...,"On November 24th, I called AT&amp;T to inquire...",7nby5t,False,swalsh411,Quality Contributor,,That isn't even legal mumbo jumbo buried in a ...,1514766103,False,ds0o4yq,6,legaladvice,[],False


### Extract questions

In [50]:
from data_helpers import extract_questions_all_data
comment_submission_data = comment_submission_data.assign(**{
    'comment_questions' : extract_questions_all_data(comment_submission_data.loc[:, 'body'].values)
})
# check for stand-alone questions
comment_submission_data = comment_submission_data.assign(**{
    'valid_question' : comment_submission_data.apply(lambda x: len(x.loc['comment_questions']) == 1 and x.loc['comment_questions'][0]==x.loc['body'], axis=1)
})

In [51]:
## how many comments have at least one question
print(f'{comment_submission_data[comment_submission_data.loc[:, "comment_questions"].apply(lambda x: len(x)>0)].shape[0]}/{comment_data.shape[0]} comments with at least one question')

202349/1510689 comments with at least one question


In [52]:
question_data = comment_submission_data[comment_submission_data.loc[:, 'valid_question']]
print(f'{question_data.shape[0]}/{comment_submission_data.shape[0]} valid questions')
print(f'subreddit distribution\n{question_data.loc[:, "subreddit"].value_counts()}')

14593/940911 valid questions
subreddit distribution
personalfinance    5253
legaladvice        3765
pcmasterrace       3358
Advice             1717
AmItheAsshole       500
Name: subreddit, dtype: int64


Let's see a sample of questions and determine how valid they are.

In [59]:
import numpy as np
np.random.seed(123)
sample_size = 50
sample_data = []
for subreddit_i, data_i in question_data.groupby('subreddit'):
#     print(f'sample questions for subreddit {subreddit_i}')
    sample_idx_i = np.random.choice(data_i.index, sample_size, replace=False)
    sample_data_i = data_i.loc[sample_idx_i, :]
    sample_data.append(sample_data_i.loc[:, ['parent_id', 'id', 'subreddit', 'title', 'selftext', 'body']])
sample_data = pd.concat(sample_data, axis=0)
sample_data_file = '../../data/reddit_data/sample_advice_subreddit_questions.tsv'
sample_data.to_csv(sample_data_file, sep='\t', index=False)

### Sample for crowdsourcing

Now that we've sampled the data, let's take the sub-sample of valid questions and send them to AMT for crowdsourced validation.

In [30]:
import pandas as pd
label_sample_data = pd.read_csv('../../data/reddit_data/sample_advice_subreddit_question_labels.tsv', sep='\t', index_col=False)
display(label_sample_data.head())

Unnamed: 0,parent_id,id,subreddit,title,selftext,body,question_is_relevant,question_is_clarification,submission_contains_answer,submission_can_include_question_answer
0,9hh9hv,e6bv7f2,Advice,"My brother, who abused me up until i was 14, w...","So, like the title says, my brother wants to t...",Do you want to forgive and forget or are you c...,1,1,0,1
1,9ag3ri,e4w8nbk,Advice,GF is pissed because I brought my best friends...,Basically my GF is freaking out on me and i do...,Were you going to tell her and how much time d...,1,1,0,1
2,9l8d4z,e74tijj,Advice,Fresh peppers,"Hi everyone! I hate fresh peppers green, red a...",Why do you wanna force yourself to do somethin...,1,1,0,1
3,9dqali,e5jf0y2,Advice,Should I buy a 1000$ pc for my best friend so ...,My best friend and I have been together ever s...,Have you considered buying it then have him ma...,1,0,0,1
4,87mh3y,dwdzcz7,Advice,Just wondering why it is when you get older yo...,I’m a (41m) and used to have a ton of close fr...,"Good to know, but would you be honest and say ...",1,0,0,1


In [None]:
valid_label_sample_data = label_sample_data[(label_sample_data.loc[:, 'question_is_relevant']==1) & 
                                            (label_sample_data.loc[:, 'question_is_clarification']==1) & 
                                            (label_sample_data.loc[:, 'submission_contains_answer']==0) & 
                                            (label_sample_data.loc[:, 'submission_can_include_question_answer']==1)]
print(f'{valid_label_sample_data.shape[0]}/{label_sample_data.shape[0]} valid questions')
## get per-subreddit counts
per_subreddit_valid_label_sample_data_counts = valid_label_sample_data.loc[:, 'subreddit'].value_counts() / label_sample_data.loc[:, 'subreddit'].value_counts()
per_subreddit_valid_label_sample_data_counts.sort_values(inplace=True, ascending=False)
print(per_subreddit_valid_label_sample_data_counts)

In [32]:
## output sample to file for labeling
import numpy as np
# np.random.seed(123)
crowdsource_label_data = []
samples_per_subreddit = 10
for subreddit_i, data_i in valid_label_sample_data.groupby('subreddit'):
    idx_i = np.random.choice(data_i.index, samples_per_subreddit, replace=False)
    data_i = data_i.loc[idx_i, :]
    data_i.rename(columns={'selftext' : 'text', 'body' : 'question'}, inplace=True)
    crowdsource_label_data.append(data_i.loc[:, ['subreddit', 'parent_id', 'title', 'id', 'text', 'question']])
crowdsource_label_data = pd.concat(crowdsource_label_data, axis=0)
display(crowdsource_label_data.head())
crowdsource_label_data.to_csv('../../data/reddit_data/advice_subreddit_question_data_for_crowdsource.csv', sep=',', index=False)

154/250 valid questions
legaladvice        0.88
pcmasterrace       0.66
personalfinance    0.60
Advice             0.54
AmItheAsshole      0.40
Name: subreddit, dtype: float64


Unnamed: 0,subreddit,parent_id,title,id,text,question
41,Advice,7osnuf,Constantly feeling worthless because of math (...,dsbxfx5,I'm not bad at math or most subjects. I make s...,Don't take this the wrong way but are you asia...
9,Advice,8zpq1u,Chronically ill and stuck on what to do about ...,e2kkpvv,"Hello, r/advice! I wanted to take a shot and s...",mind if i ask what disability you've been diag...
14,Advice,8e1nu9,There's a bigot in my social group.,dxrph8r,I regularly hang out with a circle of friends ...,What do the other folks in your group say abou...
17,Advice,9ul9hg,Should I move to Florida with my dad or stay h...,e9571n3,Live in NYC metro area and go to college in up...,"In the other hand, you’re an adult and will mo..."
43,Advice,935qpa,Neighbors cat staying on my porch,e3aztz7,"My neighbor has 2 cats, but there's one she ke...",Are you certain it's not a stray?


### Examine crowdsourced labelled data
We submitted some of the valid questions above to MTurk, so now let's see what the results look like and whether the annotators agreed.

In [1]:
import pandas as pd
crowdsource_label_complete_data = pd.read_csv('../../data/reddit_data/advice_subreddit_question_data_for_crowdsource_labels.csv', sep=',', index_col=False)
crowdsource_label_complete_data.fillna('', inplace=True)
print(crowdsource_label_complete_data.columns)
display(crowdsource_label_complete_data.head())

Index(['HITId', 'HITTypeId', 'Title', 'Description', 'Keywords', 'Reward',
       'CreationTime', 'MaxAssignments', 'RequesterAnnotation',
       'AssignmentDurationInSeconds', 'AutoApprovalDelayInSeconds',
       'Expiration', 'NumberOfSimilarHITs', 'LifetimeInSeconds',
       'AssignmentId', 'WorkerId', 'AssignmentStatus', 'AcceptTime',
       'SubmitTime', 'AutoApprovalTime', 'ApprovalTime', 'RejectionTime',
       'RequesterFeedback', 'WorkTimeInSeconds', 'LifetimeApprovalRate',
       'Last30DaysApprovalRate', 'Last7DaysApprovalRate', 'Input.parent_id',
       'Input.title', 'Input.id', 'Input.text', 'Input.question',
       'Answer.contains_answer', 'Answer.contains_answer.label',
       'Answer.more_info', 'Answer.more_info.label', 'Answer.relevant',
       'Answer.relevant.label', 'Answer.rewrite', 'Answer.rewrite.label',
       'Approve', 'Reject'],
      dtype='object')


Unnamed: 0,HITId,HITTypeId,Title,Description,Keywords,Reward,CreationTime,MaxAssignments,RequesterAnnotation,AssignmentDurationInSeconds,...,Answer.contains_answer,Answer.contains_answer.label,Answer.more_info,Answer.more_info.label,Answer.relevant,Answer.relevant.label,Answer.rewrite,Answer.rewrite.label,Approve,Reject
0,3T5ZXGO9EHGWLZ088OISI2DUZU6QZO,3QH037L2CQYJUFX42Y7FDEXZI34WAH,Is the question valid for the post?,Determine if the question is valid for the giv...,"text, classification, similarity",$0.50,Fri Feb 26 16:03:29 PST 2021,2,BatchId:4348053;OriginalHitTemplateId:929448123;,86400,...,Post does not contain answer,,Looking for more information,,Relevant,,,Post author could rewrite,,
1,3T5ZXGO9EHGWLZ088OISI2DUZU6QZO,3QH037L2CQYJUFX42Y7FDEXZI34WAH,Is the question valid for the post?,Determine if the question is valid for the giv...,"text, classification, similarity",$0.50,Fri Feb 26 16:03:29 PST 2021,2,BatchId:4348053;OriginalHitTemplateId:929448123;,86400,...,Post does not contain answer,,Looking for more information,,Relevant,,,Unsure,,
2,36AZSFEY07SS89T9O9WZ275ZPJZBVX,3QH037L2CQYJUFX42Y7FDEXZI34WAH,Is the question valid for the post?,Determine if the question is valid for the giv...,"text, classification, similarity",$0.50,Fri Feb 26 16:03:29 PST 2021,2,BatchId:4348053;OriginalHitTemplateId:929448123;,86400,...,,Post does not contain answer,,Looking for more information,,Relevant,,Post author could rewrite,,
3,36AZSFEY07SS89T9O9WZ275ZPJZBVX,3QH037L2CQYJUFX42Y7FDEXZI34WAH,Is the question valid for the post?,Determine if the question is valid for the giv...,"text, classification, similarity",$0.50,Fri Feb 26 16:03:29 PST 2021,2,BatchId:4348053;OriginalHitTemplateId:929448123;,86400,...,,Post does not contain answer,Looking for more information,,Relevant,,Post author could rewrite,,,
4,3QGHA0EA1MS5NYTEEJ1VO9ODWS1BW7,3QH037L2CQYJUFX42Y7FDEXZI34WAH,Is the question valid for the post?,Determine if the question is valid for the giv...,"text, classification, similarity",$0.50,Fri Feb 26 16:03:29 PST 2021,2,BatchId:4348053;OriginalHitTemplateId:929448123;,86400,...,Post does not contain answer,,Looking for more information,,,Relevant,Post author could rewrite,,,


In [2]:
## combine "label" columns
label_cols = [
    'Answer.contains_answer',
    'Answer.more_info',
    'Answer.relevant',
    'Answer.rewrite',
]
for label_col in label_cols:
    print(label_col)
    label_col_name = label_col.split('.')[-1]
    crowdsource_label_complete_data = crowdsource_label_complete_data.assign(**{
        label_col_name : crowdsource_label_complete_data.apply(lambda x: x.loc[label_col] + x.loc[f'{label_col}.label'], axis=1)
    })
    print(f'values for label {label_col}\n{crowdsource_label_complete_data.loc[:, label_col_name].value_counts()}')

Answer.contains_answer
values for label Answer.contains_answer
Post does not contain answer    71
Post contains answer            17
Unsure                          12
Name: contains_answer, dtype: int64
Answer.more_info
values for label Answer.more_info
Looking for more information        76
Not looking for more information    17
Unsure                               7
Name: more_info, dtype: int64
Answer.relevant
values for label Answer.relevant
Relevant        92
Not relevant     4
Unsure           4
Name: relevant, dtype: int64
Answer.rewrite
values for label Answer.rewrite
Post author could rewrite        53
Post author could not rewrite    28
Unsure                           19
Name: rewrite, dtype: int64


OK! Most of the questions are relevant, seek more information, reference information that the post does not contain, and contain information that the post author could use to rewrite the original post.

What is the overall proportion of "valid" questions using all 4 criteria?

In [55]:
valid_question_crowdsource_label_complete_data = crowdsource_label_complete_data[(crowdsource_label_complete_data.loc[:, 'contains_answer']=='Post does not contain answer') & 
                                                                                 (crowdsource_label_complete_data.loc[:, 'more_info']=='Looking for more information') & 
                                                                                 (crowdsource_label_complete_data.loc[:, 'relevant']=='Relevant') & 
                                                                                 (crowdsource_label_complete_data.loc[:, 'rewrite']=='Post author could rewrite')]
print(f'{valid_question_crowdsource_label_complete_data.shape[0]} / {crowdsource_label_complete_data.shape[0]} annotations identified valid questions')

38 / 100 annotations identified valid questions


That seems pretty low overall. What if we relax the constraint to ignore the "rewrite" condition?

In [56]:
valid_question_crowdsource_label_complete_data = crowdsource_label_complete_data[(crowdsource_label_complete_data.loc[:, 'contains_answer']=='Post does not contain answer') & 
                                                                                 (crowdsource_label_complete_data.loc[:, 'more_info']=='Looking for more information') & 
                                                                                 (crowdsource_label_complete_data.loc[:, 'relevant']=='Relevant')]
print(f'{valid_question_crowdsource_label_complete_data.shape[0]} / {crowdsource_label_complete_data.shape[0]} annotations identified valid questions')

56 / 100 annotations identified valid questions


The "rewrite" condition seems to be the bottleneck, and it is also probably not critical for identifying useful questions in general.

What is the agreement among annotators?

In [5]:
## first: map string labels to numbers for easier handling
label_value_numbers = {
    'Post contains answer' : 1,
    'Post does not contain answer' : 0, 
    'Unsure' : -1,
    'Looking for more information' : 1,
    'Not looking for more information' : 0,
    'Relevant' : 1,
    'Not relevant' : 0,
    'Post author could rewrite' : 1,
    'Post author could not rewrite' : 0,
}
clean_label_cols = ['contains_answer', 'more_info', 'relevant', 'rewrite']
for label_col in clean_label_cols:
    crowdsource_label_complete_data = crowdsource_label_complete_data.assign(**{
        f'{label_col}.num' : crowdsource_label_complete_data.loc[:, label_col].apply(lambda x: label_value_numbers.get(x))
    })
num_label_cols = list(map(lambda x: f'{x}.num', clean_label_cols))

In [6]:
## convert labels to annotator matrix, compute agreement
from scipy.stats import norm
def permutation_match_test(annotator_data, permutes=10000):
    ## randomly shuffle values in each annotator's distribution
    ## compute P of observing actual match score given random score assignment
    np.random.seed(123)
    M, N = labels_per_annotator_data.shape
    exact_match_score = (annotator_data[0, :] == annotator_data[1, :]).sum() / N
    permute_match_scores = []
    copy_data = np.array(annotator_data)
    for i in range(permutes):
        for j in range(M):
            np.random.shuffle(copy_data[j, :])
        match_score_i = (copy_data[0, :]==copy_data[1, :]).sum() / N
        permute_match_scores.append(match_score_i)
    # compute p-value of rejecting null
    permute_mean = np.mean(permute_match_scores)
    permute_std = np.std(permute_match_scores)
    norm_match_score = (exact_match_score - permute_mean) / permute_std
    p_obs = norm.pdf(exact_match_score)
    return permute_mean, p_obs
# from krippendorff import alpha
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import fleiss_kappa, aggregate_raters, cohens_kappa, to_table
import numpy as np
annotator_count = 2
for label_col in num_label_cols:
    print(f'testing label {label_col.split(".")[0]}')
    labels_per_annotator = crowdsource_label_complete_data.groupby('HITId').apply(lambda x: x.loc[:, label_col].values)
    # remove answers with too few annotators
    labels_per_annotator = labels_per_annotator[labels_per_annotator.apply(lambda x: len(x) == annotator_count)]
    # recombine into matrix
    labels_per_annotator_data = np.vstack(labels_per_annotator.values).T
    # tmp debugging: "inflate" data
#     labels_per_annotator_data = np.hstack([labels_per_annotator_data,]*10)
    # get rid of unsure annotations (all columns that contain at least one "-1" value)
    labels_per_annotator_data = labels_per_annotator_data[:, np.where(labels_per_annotator_data.min(axis=0) != -1)[0]]
    # compute exact match score
    exact_match_score = (labels_per_annotator_data[0, :] == labels_per_annotator_data[1, :]).sum() / labels_per_annotator_data.shape[1]
    permute_match_score, p_obs_match_score = permutation_match_test(labels_per_annotator_data)
    print(f'\texact matches = {"{:1.3f}".format(exact_match_score)} (permute score={"{:1.3f}".format(permute_match_score)}, p={"{:1.3f}".format(p_obs_match_score)})')
    agreement = cohen_kappa_score(labels_per_annotator_data[0, :], labels_per_annotator_data[1, :])
#     aggregate_label_data, _ = aggregate_raters(labels_per_annotator_data.T)
#     agreement = fleiss_kappa(aggregate_label_data)
#     aggregate_label_data, _ = to_table(labels_per_annotator_data.T)
#     agreement = cohens_kappa(aggregate_label_data)
    print(f'\thas agreement={"{:1.3f}".format(agreement)}')
    print(f'raw labels\n{labels_per_annotator_data}')

testing label contains_answer
	exact matches = 0.667 (permute score=0.632, p=0.319)
	has agreement=0.093
raw labels
[[0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 1 1 0 1 1 0 1 0 1 0 0 0 1 1 0 1 0 0
  0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
  0 1 0]]
testing label more_info
	exact matches = 0.767 (permute score=0.726, p=0.297)
	has agreement=0.150
raw labels
[[1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1
  1 1 1 0 1 1 1]
 [1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 1
  1 1 1 1 1 0 1]]
testing label relevant
	exact matches = 0.913 (permute score=0.916, p=0.263)
	has agreement=-0.034
raw labels
[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1
  1 1 0 1 1 1 1 1 1 1]]
testing label rewrite
	exact matches = 0.697 (permute score=0.569, p=0.313)
	has agreement=0.295
raw labels
[[1 0 1 1 1

These agreement scores are pretty low. Exact matches are high, but these scores are about what we expect under a random distribution of labels from each annotator (high p-values.

The low agreement scores may have to do with how agreement is computed: if both annotators have a high prior probability of assigning $\text{label}=1$, this may result in a artificially low agreement score due to the formula "washing out" the observed probability in favor of the random chance probability.

How much does the agreement vary by subreddit?

In [7]:
## rejoin with original data
label_sample_data = pd.read_csv('../../data/reddit_data/sample_advice_subreddit_question_labels.tsv', sep='\t', index_col=False)
# display(label_sample_data.head())
crowdsource_label_complete_data.rename(columns={'Input.parent_id' : 'parent_id'}, inplace=True)
crowdsource_label_subreddit_data = pd.merge(crowdsource_label_complete_data,
                                            label_sample_data.loc[:, ['parent_id', 'subreddit']],
                                            on='parent_id')

In [8]:
## overall label counts per subreddit
for subreddit_i, data_i in crowdsource_label_subreddit_data.groupby('subreddit'):
    print(f'---- testing subreddit {subreddit_i} ----')
    for label_col in num_label_cols:
        label_counts = data_i.loc[:, label_col].value_counts()
        print(f'label {label_col} has counts\n{label_counts}')

---- testing subreddit Advice ----
label contains_answer.num has counts
 0    15
-1     3
 1     2
Name: contains_answer.num, dtype: int64
label more_info.num has counts
 1    17
 0     2
-1     1
Name: more_info.num, dtype: int64
label relevant.num has counts
 1    17
-1     3
Name: relevant.num, dtype: int64
label rewrite.num has counts
 1    12
-1     6
 0     2
Name: rewrite.num, dtype: int64
---- testing subreddit AmItheAsshole ----
label contains_answer.num has counts
 0    15
 1     4
-1     3
Name: contains_answer.num, dtype: int64
label more_info.num has counts
 1    11
 0     8
-1     3
Name: more_info.num, dtype: int64
label relevant.num has counts
 1    18
 0     3
-1     1
Name: relevant.num, dtype: int64
label rewrite.num has counts
 0    11
 1     7
-1     4
Name: rewrite.num, dtype: int64
---- testing subreddit legaladvice ----
label contains_answer.num has counts
 0    12
-1     4
 1     4
Name: contains_answer.num, dtype: int64
label more_info.num has counts
 1    15


There is less uncertainty in `personalfinance`, `pcmasterrace`, and (to some degree) `legaladvice`, as compared to the more generic, relationship-oriented subreddits `Advice` and `AmITheAsshole`.

In [9]:
annotator_count = 2
for subreddit_i, data_i in crowdsource_label_subreddit_data.groupby('subreddit'):
    print(f'{subreddit_i} has N={data_i.shape[0]}')
    for label_col in num_label_cols:
        print(f'label = {label_col.split(".")[0]}')
        ## compute alpha etc.
        labels_per_annotator = data_i.groupby('HITId').apply(lambda x: x.loc[:, label_col].values)
        # remove answers with too few annotators
        labels_per_annotator = labels_per_annotator[labels_per_annotator.apply(lambda x: len(x) == annotator_count)]
        # recombine into matrix
        labels_per_annotator_data = np.vstack(labels_per_annotator.values).T
        # get rid of unsure annotations (all columns that contain at least one "-1" value)
        # NOTE: makes super small sample sizes FML
#         labels_per_annotator_data = labels_per_annotator_data[:, np.where(labels_per_annotator_data.min(axis=0) != -1)[0]]
        print(f'\tN={labels_per_annotator_data.shape[1]}')
        exact_match_score = (labels_per_annotator_data[0, :] == labels_per_annotator_data[1, :]).sum() / labels_per_annotator_data.shape[1]
        print(f'\texact matches = {"{:1.3f}".format(exact_match_score)}')
        label_agreement = cohen_kappa_score(labels_per_annotator_data[0, :], labels_per_annotator_data[1, :])
        ## edge case: NAN => perfect agreement
        if(np.isnan(label_agreement)):
              label_agreement = 1.
        print(f'\thas kappa={"{:1.3f}".format(label_agreement)}')
        print(f'raw labels:\n{labels_per_annotator_data}')

Advice has N=20
label = contains_answer
	N=10
	exact matches = 0.500
	has kappa=0.000
raw labels:
[[ 0  0 -1 -1 -1  1  1  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0]]
label = more_info
	N=10
	exact matches = 0.700
	has kappa=-0.111
raw labels:
[[ 1  1  0  1 -1  1  1  1  1  1]
 [ 1  1  1  1  1  0  1  1  1  1]]
label = relevant
	N=10
	exact matches = 0.700
	has kappa=0.000
raw labels:
[[ 1 -1 -1  1 -1  1  1  1  1  1]
 [ 1  1  1  1  1  1  1  1  1  1]]
label = rewrite
	N=10
	exact matches = 0.500
	has kappa=0.074
raw labels:
[[ 1 -1  0 -1 -1  1  1  1  1  1]
 [ 1  1 -1 -1  1  1  0  1 -1  1]]
AmItheAsshole has N=22
label = contains_answer
	N=9
	exact matches = 0.444
	has kappa=-0.154
raw labels:
[[ 0  0  0  1 -1  1  0  0 -1]
 [ 0  0 -1  0  0  0  0  0  0]]
label = more_info
	N=9
	exact matches = 0.444
	has kappa=0.000
raw labels:
[[ 1  1  1  0 -1  1  0  1 -1]
 [ 1  1 -1  1  0  1  1  1  0]]
label = relevant
	N=9
	exact matches = 0.778
	has kappa=-0.059
raw labels:
[[ 1  1  1  1  1  1  1 -1  1]
 

  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)


- `legaladvice` has the highest scores consistently, 
- In general the sample sizes are very small, which may lead to very low agreement.

Which questions tend to have "unsure" annotations?

In [10]:
pd.set_option('display.max_colwidth', 500)
for label_col in num_label_cols:
    print(f'unsure data for label {label_col.split(".")[0]}')
    unsure_label_data = crowdsource_label_subreddit_data[crowdsource_label_subreddit_data.loc[:, label_col]==-1]
    display(unsure_label_data.loc[:, ['subreddit', 'Input.title', 'Input.text', 'Input.question']].head(10))

unsure data for label contains_answer


Unnamed: 0,subreddit,Input.title,Input.text,Input.question
12,Advice,My friend is demanding me to pay her,"English is not my first language, but I'll do my best. \n\nSo, last friday I was with this girl in school. Tbh I never considered myself a close friend of her, but I wanted to talk a bit with her. I was lying on the floor, she was next to me seated on the floor too. She put her glasses in my belly, I didn't feel anything, she neither told me anything. So, I seated too, glasses felt on the ground and apparently such a small fall made them to (a line in middle of the lense? It's not a crack li...",Would you ask her to pay if she knocked your glasses on the floor?
14,Advice,Fresh peppers,"Hi everyone! I hate fresh peppers green, red and yellow i can only eat pickled hot peppers but i want to be able to eat fresh peppers too. I nerd help any ideas?","Why do you wanna force yourself to do something you don't enjoy, op?"
16,Advice,Help I’m insecure of my girlfriend going to gym and going places by herself,"I work most of he time so there isn’t much time for therapy\n\nBut I’m always insecure of my girlfriend working Uber late night, going to the gym trying to lose weight because I don’t want guys talking to her or looking at her, and on her phone sometimes too, I get scared she is texting another guy.\n\n\nI really don’t want to lose her, help?",if you don’t trust her then why are you dating her ?
24,AmItheAsshole,WIBTA for dating my cousin’s (f) old crush (m)?,A year or two ago she was trying to date this guy who had a mutual interest in her but it just didn’t work out because it wasn’t the right time in life you know what I mean?\nShe now has a boyfriend and they’re pretty serious but the guy and I matched on Tinder and hit it off really well. I really feel like she wouldn’t care but maybe be jealous? So WIBTA for dating him without asking how she feels first?,"NTA, but why would you want your cousin's hand me downs?"
26,AmItheAsshole,AITA for wanting full rent?,"Some backstory is necessary to understand this.\n\nA couple of months ago i inherited an apartment from my grandpa. He bought it in the 70s and its worth about $400 000 right now. It's in a really nice neighbourhood where rent for similar apartments is around $2000/ month. \nSince I own the apartment the rent is about $400-500 a month, and I moved in a couple of weeks ago. \nIt's a 3 bedroom 2 bathroom + kitchen which is obviously too big for one person, which is why I've mentioned to some p...",Why are you paying rent on a place you own?
37,AmItheAsshole,AITA For Kicking a Car That Almost Hit Me?,"I was riding my bike yesterday through the city around rush hour, it was raining. I generally stay pretty safe, use crosswalks at big intersections- I'm not the bike person you see swerving in and out of traffic and running lights and stop signs.\n\nI was waiting at a crosswalk on my bike for the light to turn. A car stopped at the crosswalk (she didn't need to, her light was green but people do that sometimes). There was only one car in sight and it was directly behind her. I started go...","Her light was green, so I'm guessing the crosswalk was on the 'no walk/stop' symbol at this time?"
42,legaladvice,Is it technically illegal to brew alcohol under 21 (in America or PA in general),"I tried to brew alcohol in a mental institution as an impatient teen. (Long story) I’m aware purchasing, ingesting, trying to purchase, sell, or transporting is illegal (and possessing?) but if I’m brewing it in a mental hospital, does that count as possessing it?\n\nProbably a stupid question but I have my reasons for why I want an answer.",Why are you brewing alcohol in a mental hospital?
52,legaladvice,car dealership screwed my fiance over,"today my fiance went and bought a 5000 dollar car in full, then not even 5 hours later the car broke down on the side of the highway and we barely got it home. my fiance and 9 month old baby were stuck on the side of the highway for 2 hours till i could get to them. does the dealship have a responsibility to fix the car or take it back? or can i sue them for something if they refuse to help us out? \(live in kansas\)",What did your independent mechanic say when they inspected the car before you purchased it?
53,legaladvice,car dealership screwed my fiance over,"today my fiance went and bought a 5000 dollar car in full, then not even 5 hours later the car broke down on the side of the highway and we barely got it home. my fiance and 9 month old baby were stuck on the side of the highway for 2 hours till i could get to them. does the dealship have a responsibility to fix the car or take it back? or can i sue them for something if they refuse to help us out? \(live in kansas\)",What did your independent mechanic say when they inspected the car before you purchased it?
59,legaladvice,Advice: driving Uber under someone else’s account,"Hypothetically, if my friend wanted to drive lyft (sorry not Uber) under his dads account and they look similar: What risks would he be running? Anything criminal or just civil in the case of an accident? (Florida)",Is there a reason he can't get his own rideshare account?


unsure data for label more_info


Unnamed: 0,subreddit,Input.title,Input.text,Input.question
14,Advice,Fresh peppers,"Hi everyone! I hate fresh peppers green, red and yellow i can only eat pickled hot peppers but i want to be able to eat fresh peppers too. I nerd help any ideas?","Why do you wanna force yourself to do something you don't enjoy, op?"
24,AmItheAsshole,WIBTA for dating my cousin’s (f) old crush (m)?,A year or two ago she was trying to date this guy who had a mutual interest in her but it just didn’t work out because it wasn’t the right time in life you know what I mean?\nShe now has a boyfriend and they’re pretty serious but the guy and I matched on Tinder and hit it off really well. I really feel like she wouldn’t care but maybe be jealous? So WIBTA for dating him without asking how she feels first?,"NTA, but why would you want your cousin's hand me downs?"
26,AmItheAsshole,AITA for wanting full rent?,"Some backstory is necessary to understand this.\n\nA couple of months ago i inherited an apartment from my grandpa. He bought it in the 70s and its worth about $400 000 right now. It's in a really nice neighbourhood where rent for similar apartments is around $2000/ month. \nSince I own the apartment the rent is about $400-500 a month, and I moved in a couple of weeks ago. \nIt's a 3 bedroom 2 bathroom + kitchen which is obviously too big for one person, which is why I've mentioned to some p...",Why are you paying rent on a place you own?
37,AmItheAsshole,AITA For Kicking a Car That Almost Hit Me?,"I was riding my bike yesterday through the city around rush hour, it was raining. I generally stay pretty safe, use crosswalks at big intersections- I'm not the bike person you see swerving in and out of traffic and running lights and stop signs.\n\nI was waiting at a crosswalk on my bike for the light to turn. A car stopped at the crosswalk (she didn't need to, her light was green but people do that sometimes). There was only one car in sight and it was directly behind her. I started go...","Her light was green, so I'm guessing the crosswalk was on the 'no walk/stop' symbol at this time?"
47,legaladvice,"(USA, Virginia) Lawyers my gf hired to handle a settlement case didn’t confirm with the hospital that the bill was paid for and now her credit score is tanked because of a $9000 bill that they claim was never paid for","My girlfriend and I were in a car accident last year and she decided on going with a settlement case. She was sent to the ER that night and when she got the bill to pay for the ER, it was sent to the lawyers for the use to go towards the settlement case. The lawyers said not to worry and just send the bills their way. \n\nAfter the case was done, the lawyers had documented that the hospital bill was paid for by insurance, so we thought that that was a done deal. \n\nMy gf checked her credit ...",Aren't you supposed to reimburse insurance in cases like this?
52,legaladvice,car dealership screwed my fiance over,"today my fiance went and bought a 5000 dollar car in full, then not even 5 hours later the car broke down on the side of the highway and we barely got it home. my fiance and 9 month old baby were stuck on the side of the highway for 2 hours till i could get to them. does the dealship have a responsibility to fix the car or take it back? or can i sue them for something if they refuse to help us out? \(live in kansas\)",What did your independent mechanic say when they inspected the car before you purchased it?
83,personalfinance,Should I really invest in a 401k if I'll be leaving the US in ten years?,"I'm an Indian national working in the US, in my late 20s. I'm currently enrolled in my employer's 401k match program, and 8% of every paycheck goes into 401k (company pays 50% of the first 6%). What I want to know is if it makes sense to contribute this much into retirement savings, if I'm going to leave the country in a decade? If I want to buy a house in a couple of years, am I better off using this money for that purpose?\n\nI understand the advantages of doing this:\n\n1. Company match (...",If you're leaving in 10 years or less... Why buy a house?


unsure data for label relevant


Unnamed: 0,subreddit,Input.title,Input.text,Input.question
8,Advice,Chronically ill and stuck on what to do about money.,"Hello, r/advice! I wanted to take a shot and see what this subreddit had to say about my situation. Recently I've been diagnosed with a couple conditions and they've made it so I'm unable to work. On the other hand, I found out my medicaid had been changed so now it won't pay for anything. They said if I get my termination letter from my last job that they will review my case and probably reinstate my full medicaid. But, they said I would also have to stay unemployed to receive benefits. Wha...",mind if i ask what disability you've been diagnosed with?
12,Advice,My friend is demanding me to pay her,"English is not my first language, but I'll do my best. \n\nSo, last friday I was with this girl in school. Tbh I never considered myself a close friend of her, but I wanted to talk a bit with her. I was lying on the floor, she was next to me seated on the floor too. She put her glasses in my belly, I didn't feel anything, she neither told me anything. So, I seated too, glasses felt on the ground and apparently such a small fall made them to (a line in middle of the lense? It's not a crack li...",Would you ask her to pay if she knocked your glasses on the floor?
14,Advice,Fresh peppers,"Hi everyone! I hate fresh peppers green, red and yellow i can only eat pickled hot peppers but i want to be able to eat fresh peppers too. I nerd help any ideas?","Why do you wanna force yourself to do something you don't enjoy, op?"
30,AmItheAsshole,AITA first this prank I pulled?,"My first time here so sorry if this breaks any rules or anything. I played a prank on my friend yesterday. She applied for her first job at a local ski place. I thought it would be funny as a prank to text her, from a different number obviously, as someone from the ski lodge saying she got the job. They're not telling anyone if they've been hired until the 9th and she knows that because shes the one that told me that. She texted back thanking for the opportunity and being very grateful. I te...","Somehow, I feel like we're not getting the full story on this one... Any other information that's material to the story?"


unsure data for label rewrite


Unnamed: 0,subreddit,Input.title,Input.text,Input.question
1,Advice,Help motivate me PLEASE,This is a little bit pathetic but I’m just sat here doing basically nothing. I NEED to do my homework as it’s due in tomorrow but I just can’t be bothered. Please help motivate me x,How long will it take you to do your homework?
8,Advice,Chronically ill and stuck on what to do about money.,"Hello, r/advice! I wanted to take a shot and see what this subreddit had to say about my situation. Recently I've been diagnosed with a couple conditions and they've made it so I'm unable to work. On the other hand, I found out my medicaid had been changed so now it won't pay for anything. They said if I get my termination letter from my last job that they will review my case and probably reinstate my full medicaid. But, they said I would also have to stay unemployed to receive benefits. Wha...",mind if i ask what disability you've been diagnosed with?
13,Advice,My friend is demanding me to pay her,"English is not my first language, but I'll do my best. \n\nSo, last friday I was with this girl in school. Tbh I never considered myself a close friend of her, but I wanted to talk a bit with her. I was lying on the floor, she was next to me seated on the floor too. She put her glasses in my belly, I didn't feel anything, she neither told me anything. So, I seated too, glasses felt on the ground and apparently such a small fall made them to (a line in middle of the lense? It's not a crack li...",Would you ask her to pay if she knocked your glasses on the floor?
14,Advice,Fresh peppers,"Hi everyone! I hate fresh peppers green, red and yellow i can only eat pickled hot peppers but i want to be able to eat fresh peppers too. I nerd help any ideas?","Why do you wanna force yourself to do something you don't enjoy, op?"
16,Advice,Help I’m insecure of my girlfriend going to gym and going places by herself,"I work most of he time so there isn’t much time for therapy\n\nBut I’m always insecure of my girlfriend working Uber late night, going to the gym trying to lose weight because I don’t want guys talking to her or looking at her, and on her phone sometimes too, I get scared she is texting another guy.\n\n\nI really don’t want to lose her, help?",if you don’t trust her then why are you dating her ?
17,Advice,Help I’m insecure of my girlfriend going to gym and going places by herself,"I work most of he time so there isn’t much time for therapy\n\nBut I’m always insecure of my girlfriend working Uber late night, going to the gym trying to lose weight because I don’t want guys talking to her or looking at her, and on her phone sometimes too, I get scared she is texting another guy.\n\n\nI really don’t want to lose her, help?",if you don’t trust her then why are you dating her ?
24,AmItheAsshole,WIBTA for dating my cousin’s (f) old crush (m)?,A year or two ago she was trying to date this guy who had a mutual interest in her but it just didn’t work out because it wasn’t the right time in life you know what I mean?\nShe now has a boyfriend and they’re pretty serious but the guy and I matched on Tinder and hit it off really well. I really feel like she wouldn’t care but maybe be jealous? So WIBTA for dating him without asking how she feels first?,"NTA, but why would you want your cousin's hand me downs?"
28,AmItheAsshole,AITA? Friend insisted on doing me what she thought was a kindness even though I specifically asked her not too....,"Aita? I was hospitalized For about a week unexpectedly. A very good friend of mine called and I told her I was in the hospital. She asked if I needed anything I said I didn’t then later she called and said ""we(her family) are brining you dinner what do You want?"" I specifically asked her not to come, I was In Isolation and not supposed to ha e visitors as I was being tested for TB and airborne illness and really didn’t want to have her come and really didn’t want any food. She is a strong pe...",Did you tell her you were in isolation or just that you were sick?
32,AmItheAsshole,AITA for not answering my bosses text?,"Context: It's my anniversary weekend. Yesterday I ignored my phone majority of the day except to check on my mom in the hospital. \n\nMy boss texted me twice and I ignored it along with ignoring my group chat and other phone calls. I read the texts through the notifications panel of my phone and saw it wasn't anything important, so didn't make the effort to unlock my phone and text back. \n\nThis morning I responded and she had a break down about me not responding and it being rude. I told h...","Almost definitely NTA, but I’m curious — what did the texts say?"
37,AmItheAsshole,AITA For Kicking a Car That Almost Hit Me?,"I was riding my bike yesterday through the city around rush hour, it was raining. I generally stay pretty safe, use crosswalks at big intersections- I'm not the bike person you see swerving in and out of traffic and running lights and stop signs.\n\nI was waiting at a crosswalk on my bike for the light to turn. A car stopped at the crosswalk (she didn't need to, her light was green but people do that sometimes). There was only one car in sight and it was directly behind her. I started go...","Her light was green, so I'm guessing the crosswalk was on the 'no walk/stop' symbol at this time?"


- `contains_answer`: "why" questions and complex reasoning.
- `more_info`: "why" questions, further inference about situation.
- `relevant`: more background information that may be tangential to situation.
- `rewrite`: "why" questions, more background information that author may not feel capable of providing ("what did the texts say" -> privacy)

In [11]:
## compare against answers with "ideal" labels
ideal_label_vals = {
    'contains_answer.num' : 0,
    'more_info.num' : 1,
    'relevant.num' : 1,
    'rewrite.num' : 1,
}
num_annotators = 2
for label_col in num_label_cols:
    print(f'agreement data for label {label_col.split(".")[0]}')
    ## get tasks with agreement
    ideal_label_val = ideal_label_vals.get(label_col)
    ideal_val_questions = crowdsource_label_subreddit_data.groupby('HITId').apply(lambda x: (x.loc[:, label_col]==ideal_label_val).sum() == num_annotators)
    ideal_val_question_ids = ideal_val_questions[ideal_val_questions].index.tolist()
    ideal_val_data = crowdsource_label_subreddit_data[crowdsource_label_subreddit_data.loc[:, 'HITId'].isin(ideal_val_question_ids)].drop_duplicates('HITId')
    display(ideal_val_data.loc[:, ['subreddit', 'Input.title', 'Input.text', 'Input.question']].head(10))

agreement data for label contains_answer


Unnamed: 0,subreddit,Input.title,Input.text,Input.question
0,Advice,Help motivate me PLEASE,This is a little bit pathetic but I’m just sat here doing basically nothing. I NEED to do my homework as it’s due in tomorrow but I just can’t be bothered. Please help motivate me x,How long will it take you to do your homework?
2,Advice,"My brother, who abused me up until i was 14, wants to talk to me.","So, like the title says, my brother wants to talk to me for the first time since I was 14. When we were younger my brother used to beat me relentlessly, even causing permanent damage, as I have had to deal with a studder since I was 13 after a particularly bad beating. \n\nI'm not sure how I should approach this, we haven't talked in years, as I'm now 21 and he's 25, and I feel the only reason he wants to talk to me is because he broke up with his girlfriend of 3 years while I am engaged to ...",Do you want to forgive and forget or are you content with not having a relationship with him?
4,Advice,Should I move to Florida with my dad or stay home with my friends,"Live in NYC metro area and go to college in upstate NY. Recently found out fathers job is going to move to Orlando (either he moves or is laid off). He’s convinced he won’t find a new job at his age because of younger people taking over. Mom is probably going to follow him, older sister is applying to grad schools in NY and FL. College I go to is not known in FL and has very good connections in NYC. My thought is it is not worth moving my home life down to FL if I’m graduating into the workf...","In the other hand, you’re an adult and will most likely fly the coop soon anyways...so do you have the funds to live alone?"
8,Advice,Chronically ill and stuck on what to do about money.,"Hello, r/advice! I wanted to take a shot and see what this subreddit had to say about my situation. Recently I've been diagnosed with a couple conditions and they've made it so I'm unable to work. On the other hand, I found out my medicaid had been changed so now it won't pay for anything. They said if I get my termination letter from my last job that they will review my case and probably reinstate my full medicaid. But, they said I would also have to stay unemployed to receive benefits. Wha...",mind if i ask what disability you've been diagnosed with?
10,Advice,Should I be offended if people give me homophobic abuse even though I'm straight?,Should I be offended if people give me homophobic abuse even though I'm straight? Would that even warrant a call to the police?,Can you explain what homophobic abuse they are giving you?
20,AmItheAsshole,AITA for hacking my calculator and saving notes on it to cheat.,"New school, teachers are absolute hell, we will write only one real test per year and the rest of it are pop up tests so you have to study 24/7 for every subject and hope for the best",How does this help you in the long run?
22,AmItheAsshole,AITA? Dude follows me for hours on end and I refuse to work nights from now on,"I work at a small family owned business that does kids birthday parties and fun types of entertainment. We also offer specific type of indoor sport that the store does team nights occasionally.\n\nFor context I am an 18 year old female who often works nights until 9/10pm. \n\nMy issue is that there is this guy (who is about 28 years old) comes in every single Friday, Saturday and Sunday night who does nothing but follow me around. Sometimes for up to six hours at a time. \n\nHe supposedly ju...",Have you mentioned to your boss how he is making your life miserable by not doing anything about Mr creepo?
30,AmItheAsshole,AITA first this prank I pulled?,"My first time here so sorry if this breaks any rules or anything. I played a prank on my friend yesterday. She applied for her first job at a local ski place. I thought it would be funny as a prank to text her, from a different number obviously, as someone from the ski lodge saying she got the job. They're not telling anyone if they've been hired until the 9th and she knows that because shes the one that told me that. She texted back thanking for the opportunity and being very grateful. I te...","Somehow, I feel like we're not getting the full story on this one... Any other information that's material to the story?"
34,AmItheAsshole,AITA for putting restrictions on spousal friendships?,"My spouse went on a solo vacation with a mutual friend of the opposite gender. I strongly argued against the trip, but I couldn't persuade my spouse it was a bad idea, so I reluctantly gave my ""permission"". In general my spouse has difficulty making and keeping friends, so I was trying to be supportive. During the trip they talked about many deep emotional topics, and my spouse found this connection to be very beneficial. They also drank heavily the entire trip and fell asleep together in a ...",Wtf were they doing laying in a hammock together?
38,AmItheAsshole,AITA For telling her to fuck off?,"I was dating this girl for almost two years, and everything is going fine. We didn't have any arguments or fights or other ""couple drama"" that you'd think of. Our sex life was... scarce to say the least? She never wanted to go ""all-the-way"" to say the least. However, she expects me to get her off by other means but never does anything towards me. It was a point that really irritated me, but as I'm really antisocial and hate conflicts, I leave it be because I genuinely loved her. \n\nCue four...",What feelings do you still have for someone who did that to you?


agreement data for label more_info


Unnamed: 0,subreddit,Input.title,Input.text,Input.question
0,Advice,Help motivate me PLEASE,This is a little bit pathetic but I’m just sat here doing basically nothing. I NEED to do my homework as it’s due in tomorrow but I just can’t be bothered. Please help motivate me x,How long will it take you to do your homework?
2,Advice,"My brother, who abused me up until i was 14, wants to talk to me.","So, like the title says, my brother wants to talk to me for the first time since I was 14. When we were younger my brother used to beat me relentlessly, even causing permanent damage, as I have had to deal with a studder since I was 13 after a particularly bad beating. \n\nI'm not sure how I should approach this, we haven't talked in years, as I'm now 21 and he's 25, and I feel the only reason he wants to talk to me is because he broke up with his girlfriend of 3 years while I am engaged to ...",Do you want to forgive and forget or are you content with not having a relationship with him?
4,Advice,Should I move to Florida with my dad or stay home with my friends,"Live in NYC metro area and go to college in upstate NY. Recently found out fathers job is going to move to Orlando (either he moves or is laid off). He’s convinced he won’t find a new job at his age because of younger people taking over. Mom is probably going to follow him, older sister is applying to grad schools in NY and FL. College I go to is not known in FL and has very good connections in NYC. My thought is it is not worth moving my home life down to FL if I’m graduating into the workf...","In the other hand, you’re an adult and will most likely fly the coop soon anyways...so do you have the funds to live alone?"
6,Advice,Stepped On a Rusty Nail at Work Today,"While I was working a yard, I stepped on something really sharp. I noticed that I wasn't really feeling anything sort of cut, or sharp stinging pain that you'd get from something sharp. It hurt like a blunt hit, like if you were punched instead of being scratched. Because I was wearing shoes, I was sceptical if it actually stabbed into my foot. I worked the rest of the day pretty fine. I was bleeding just a little bit but it formed a bruise. It's sore now but that's to assume when you step o...",Why does your title say stepped on today if it was not today?
8,Advice,Chronically ill and stuck on what to do about money.,"Hello, r/advice! I wanted to take a shot and see what this subreddit had to say about my situation. Recently I've been diagnosed with a couple conditions and they've made it so I'm unable to work. On the other hand, I found out my medicaid had been changed so now it won't pay for anything. They said if I get my termination letter from my last job that they will review my case and probably reinstate my full medicaid. But, they said I would also have to stay unemployed to receive benefits. Wha...",mind if i ask what disability you've been diagnosed with?
10,Advice,Should I be offended if people give me homophobic abuse even though I'm straight?,Should I be offended if people give me homophobic abuse even though I'm straight? Would that even warrant a call to the police?,Can you explain what homophobic abuse they are giving you?
16,Advice,Help I’m insecure of my girlfriend going to gym and going places by herself,"I work most of he time so there isn’t much time for therapy\n\nBut I’m always insecure of my girlfriend working Uber late night, going to the gym trying to lose weight because I don’t want guys talking to her or looking at her, and on her phone sometimes too, I get scared she is texting another guy.\n\n\nI really don’t want to lose her, help?",if you don’t trust her then why are you dating her ?
20,AmItheAsshole,AITA for hacking my calculator and saving notes on it to cheat.,"New school, teachers are absolute hell, we will write only one real test per year and the rest of it are pop up tests so you have to study 24/7 for every subject and hope for the best",How does this help you in the long run?
30,AmItheAsshole,AITA first this prank I pulled?,"My first time here so sorry if this breaks any rules or anything. I played a prank on my friend yesterday. She applied for her first job at a local ski place. I thought it would be funny as a prank to text her, from a different number obviously, as someone from the ski lodge saying she got the job. They're not telling anyone if they've been hired until the 9th and she knows that because shes the one that told me that. She texted back thanking for the opportunity and being very grateful. I te...","Somehow, I feel like we're not getting the full story on this one... Any other information that's material to the story?"
32,AmItheAsshole,AITA for not answering my bosses text?,"Context: It's my anniversary weekend. Yesterday I ignored my phone majority of the day except to check on my mom in the hospital. \n\nMy boss texted me twice and I ignored it along with ignoring my group chat and other phone calls. I read the texts through the notifications panel of my phone and saw it wasn't anything important, so didn't make the effort to unlock my phone and text back. \n\nThis morning I responded and she had a break down about me not responding and it being rude. I told h...","Almost definitely NTA, but I’m curious — what did the texts say?"


agreement data for label relevant


Unnamed: 0,subreddit,Input.title,Input.text,Input.question
0,Advice,Help motivate me PLEASE,This is a little bit pathetic but I’m just sat here doing basically nothing. I NEED to do my homework as it’s due in tomorrow but I just can’t be bothered. Please help motivate me x,How long will it take you to do your homework?
2,Advice,"My brother, who abused me up until i was 14, wants to talk to me.","So, like the title says, my brother wants to talk to me for the first time since I was 14. When we were younger my brother used to beat me relentlessly, even causing permanent damage, as I have had to deal with a studder since I was 13 after a particularly bad beating. \n\nI'm not sure how I should approach this, we haven't talked in years, as I'm now 21 and he's 25, and I feel the only reason he wants to talk to me is because he broke up with his girlfriend of 3 years while I am engaged to ...",Do you want to forgive and forget or are you content with not having a relationship with him?
4,Advice,Should I move to Florida with my dad or stay home with my friends,"Live in NYC metro area and go to college in upstate NY. Recently found out fathers job is going to move to Orlando (either he moves or is laid off). He’s convinced he won’t find a new job at his age because of younger people taking over. Mom is probably going to follow him, older sister is applying to grad schools in NY and FL. College I go to is not known in FL and has very good connections in NYC. My thought is it is not worth moving my home life down to FL if I’m graduating into the workf...","In the other hand, you’re an adult and will most likely fly the coop soon anyways...so do you have the funds to live alone?"
6,Advice,Stepped On a Rusty Nail at Work Today,"While I was working a yard, I stepped on something really sharp. I noticed that I wasn't really feeling anything sort of cut, or sharp stinging pain that you'd get from something sharp. It hurt like a blunt hit, like if you were punched instead of being scratched. Because I was wearing shoes, I was sceptical if it actually stabbed into my foot. I worked the rest of the day pretty fine. I was bleeding just a little bit but it formed a bruise. It's sore now but that's to assume when you step o...",Why does your title say stepped on today if it was not today?
10,Advice,Should I be offended if people give me homophobic abuse even though I'm straight?,Should I be offended if people give me homophobic abuse even though I'm straight? Would that even warrant a call to the police?,Can you explain what homophobic abuse they are giving you?
16,Advice,Help I’m insecure of my girlfriend going to gym and going places by herself,"I work most of he time so there isn’t much time for therapy\n\nBut I’m always insecure of my girlfriend working Uber late night, going to the gym trying to lose weight because I don’t want guys talking to her or looking at her, and on her phone sometimes too, I get scared she is texting another guy.\n\n\nI really don’t want to lose her, help?",if you don’t trust her then why are you dating her ?
18,Advice,Weird feeling and don't know what to do.,"Hello everyone. So to try and keep this short , there's this married gay couple on a TV show and seeing them just made me feel like I wanted to be them so bad. I wanted to be in one of those spot of the couple so i could experience it. It's almost like love but it dont really think it is like that. It's really messing me up and I just need someone to help me with this.",You want to be them or in a relationship like them?
20,AmItheAsshole,AITA for hacking my calculator and saving notes on it to cheat.,"New school, teachers are absolute hell, we will write only one real test per year and the rest of it are pop up tests so you have to study 24/7 for every subject and hope for the best",How does this help you in the long run?
22,AmItheAsshole,AITA? Dude follows me for hours on end and I refuse to work nights from now on,"I work at a small family owned business that does kids birthday parties and fun types of entertainment. We also offer specific type of indoor sport that the store does team nights occasionally.\n\nFor context I am an 18 year old female who often works nights until 9/10pm. \n\nMy issue is that there is this guy (who is about 28 years old) comes in every single Friday, Saturday and Sunday night who does nothing but follow me around. Sometimes for up to six hours at a time. \n\nHe supposedly ju...",Have you mentioned to your boss how he is making your life miserable by not doing anything about Mr creepo?
24,AmItheAsshole,WIBTA for dating my cousin’s (f) old crush (m)?,A year or two ago she was trying to date this guy who had a mutual interest in her but it just didn’t work out because it wasn’t the right time in life you know what I mean?\nShe now has a boyfriend and they’re pretty serious but the guy and I matched on Tinder and hit it off really well. I really feel like she wouldn’t care but maybe be jealous? So WIBTA for dating him without asking how she feels first?,"NTA, but why would you want your cousin's hand me downs?"


agreement data for label rewrite


Unnamed: 0,subreddit,Input.title,Input.text,Input.question
2,Advice,"My brother, who abused me up until i was 14, wants to talk to me.","So, like the title says, my brother wants to talk to me for the first time since I was 14. When we were younger my brother used to beat me relentlessly, even causing permanent damage, as I have had to deal with a studder since I was 13 after a particularly bad beating. \n\nI'm not sure how I should approach this, we haven't talked in years, as I'm now 21 and he's 25, and I feel the only reason he wants to talk to me is because he broke up with his girlfriend of 3 years while I am engaged to ...",Do you want to forgive and forget or are you content with not having a relationship with him?
4,Advice,Should I move to Florida with my dad or stay home with my friends,"Live in NYC metro area and go to college in upstate NY. Recently found out fathers job is going to move to Orlando (either he moves or is laid off). He’s convinced he won’t find a new job at his age because of younger people taking over. Mom is probably going to follow him, older sister is applying to grad schools in NY and FL. College I go to is not known in FL and has very good connections in NYC. My thought is it is not worth moving my home life down to FL if I’m graduating into the workf...","In the other hand, you’re an adult and will most likely fly the coop soon anyways...so do you have the funds to live alone?"
10,Advice,Should I be offended if people give me homophobic abuse even though I'm straight?,Should I be offended if people give me homophobic abuse even though I'm straight? Would that even warrant a call to the police?,Can you explain what homophobic abuse they are giving you?
18,Advice,Weird feeling and don't know what to do.,"Hello everyone. So to try and keep this short , there's this married gay couple on a TV show and seeing them just made me feel like I wanted to be them so bad. I wanted to be in one of those spot of the couple so i could experience it. It's almost like love but it dont really think it is like that. It's really messing me up and I just need someone to help me with this.",You want to be them or in a relationship like them?
26,AmItheAsshole,AITA for wanting full rent?,"Some backstory is necessary to understand this.\n\nA couple of months ago i inherited an apartment from my grandpa. He bought it in the 70s and its worth about $400 000 right now. It's in a really nice neighbourhood where rent for similar apartments is around $2000/ month. \nSince I own the apartment the rent is about $400-500 a month, and I moved in a couple of weeks ago. \nIt's a 3 bedroom 2 bathroom + kitchen which is obviously too big for one person, which is why I've mentioned to some p...",Why are you paying rent on a place you own?
34,AmItheAsshole,AITA for putting restrictions on spousal friendships?,"My spouse went on a solo vacation with a mutual friend of the opposite gender. I strongly argued against the trip, but I couldn't persuade my spouse it was a bad idea, so I reluctantly gave my ""permission"". In general my spouse has difficulty making and keeping friends, so I was trying to be supportive. During the trip they talked about many deep emotional topics, and my spouse found this connection to be very beneficial. They also drank heavily the entire trip and fell asleep together in a ...",Wtf were they doing laying in a hammock together?
56,legaladvice,I’m fifteen and my mom refuses to enroll me in school.,"I’m 15 and I live in Minnesota where minors must go to school until age sixteen. I want to graduate and be successful but I obviously can’t if I’m not allowed to be in school. Any ideas on what I can do? I really want to go to school but my mom (my only legal guardian, father is not involved) will not allow me to. I’m going into my sophomore year of high school. I attended my freshman year, but had trouble doing so die to mental and physical health issues. I feel stuck.","Out of curiosity, if it's not too personal, what's the reason for yout mother's stance on this?"
58,legaladvice,Advice: driving Uber under someone else’s account,"Hypothetically, if my friend wanted to drive lyft (sorry not Uber) under his dads account and they look similar: What risks would he be running? Anything criminal or just civil in the case of an accident? (Florida)",Is there a reason he can't get his own rideshare account?
60,legaladvice,"Renting a condo with a cat, no pets allowed.","Location is Alberta. TL;DR at bottom.\n\nWe recently signed a 1 year lease on a condo in Alberta. We are renting from the owner. Before we looked at the place we said we had a cat and he said 1 cat was fine. When we went to look at the unit we noticed a sign at the front entrance saying no pets allowed. We asked about this and he said only cats are allowed on the building but they need to be shown to the building manager and approved. \n\nWe really liked the place, send the information requ...",OP- What does your lease say about pets?
62,pcmasterrace,RX 580 vs. GTX 1070,Hey guys. So I was just messing around and then I saw the price of the RX and benchmarks on both of the cards(Sapphire nitro+ vs. Gigabyte windforce oc). The two seem too close for this price gap(it's way more expensive In Bulgaria). Any advice on which one to pick is welcome.,How much is each card where you are at?


- `contains_answer`: concrete details about experience, hypothetical situations.
- `more_info`: concrete details about experience (including "why" questions), addressing logical inconsistencies (""trip" in a wheelchair").
- `relevant`: concrete details about experience, mentioning entities/concepts from post and title.
- `rewrite`: reasons for narrative choices (including "why" questions), confirmation of details with likely minimal effort to include.

### Determining question validity: word overlap
Ideally we don't want to have to manually annotate hundreds of questions to get a reasonable collection of data for training/testing.

Let's figure out if we can automatically identify valid questions, according to crowdworkers, based on their semantic overlap with the post title and post text.

We are expecting that the $\text{sim}(\text{valid_question}, \text{post}) > \text{sim}(\text{invalid_question}, \text{post})$.

In [12]:
## easy similarity: word overlap
from nltk.stem.snowball import PorterStemmer
from nltk.tokenize import WordPunctTokenizer, PunktSentenceTokenizer
## prepare text
def clean_text(text, stemmer, word_tokenizer, sent_tokenizer):
    text_sents = sent_tokenizer.tokenize(text)
    # tokenize and stem
    text_sent_tokens = list(map(lambda x:list(map(lambda y:stemmer.stem(y),word_tokenizer.tokenize(x))),text_sents))
    return text_sent_tokens
stemmer = PorterStemmer()
word_tokenizer = WordPunctTokenizer()
sent_tokenizer = PunktSentenceTokenizer()
crowdsource_label_subreddit_data = crowdsource_label_subreddit_data.assign(**{
    'title_tokens' : crowdsource_label_subreddit_data.loc[:, 'Input.title'].apply(lambda x: clean_text(x, stemmer, word_tokenizer, sent_tokenizer)),
    'text_tokens' : crowdsource_label_subreddit_data.loc[:, 'Input.text'].apply(lambda x: clean_text(x, stemmer, word_tokenizer, sent_tokenizer)),
    'question_tokens' : crowdsource_label_subreddit_data.loc[:, 'Input.question'].apply(lambda x: clean_text(x, stemmer, word_tokenizer, sent_tokenizer)),
})

In [13]:
import numpy as np
from itertools import product
def compute_word_overlap(text_1, text_2):
    # Jaccard similarity
    word_overlap = set(text_1) & set(text_2)
    word_union = set(text_1) | set(text_2)
    word_overlap_sim = len(word_overlap) / len(word_union)
    return word_overlap_sim
def compute_sent_word_overlap(text_1, text_2):
    # compute word overlap for all pairs of sentences
    # then get max score
    sent_pairs = list(product(text_1, text_2))
    sent_word_overlap_scores = np.array([compute_word_overlap(sent_i, sent_j) for sent_i, sent_j in sent_pairs])
    max_word_overlap_score = max(sent_word_overlap_scores)
    max_word_overlap_sent_pair = sent_pairs[np.argmax(sent_word_overlap_scores)]
    return max_word_overlap_score, max_word_overlap_sent_pair
crowdsource_label_subreddit_data = crowdsource_label_subreddit_data.assign(**{
    'question_title_overlap' : crowdsource_label_subreddit_data.apply(lambda x: compute_sent_word_overlap(x.loc['title_tokens'], x.loc['question_tokens']), axis=1),
    'question_text_overlap' : crowdsource_label_subreddit_data.apply(lambda x: compute_sent_word_overlap(x.loc['text_tokens'], x.loc['question_tokens']), axis=1),
})
crowdsource_label_subreddit_data = crowdsource_label_subreddit_data.assign(**{
    'question_title_overlap_score' : crowdsource_label_subreddit_data.loc[:, 'question_title_overlap'].apply(lambda x: x[0]),
    'question_title_overlap_sent' : crowdsource_label_subreddit_data.loc[:, 'question_title_overlap'].apply(lambda x: x[1][0]),
    'question_text_overlap_score' : crowdsource_label_subreddit_data.loc[:, 'question_text_overlap'].apply(lambda x: x[0]),
    'question_text_overlap_sent' : crowdsource_label_subreddit_data.loc[:, 'question_text_overlap'].apply(lambda x: x[1][0]),
})

Do we find a higher level of overlap for "ideal" questions?

In [28]:
from scipy.stats import mannwhitneyu
score_vars = ['question_text_overlap_score', 'question_title_overlap_score']
for score_var in score_vars:
    print(f'---- testing score var {score_var} ----')
    for label_col in num_label_cols:
        print(f'**** testing label {label_col} ****')
        ideal_label_val = ideal_label_vals.get(label_col)
        non_ideal_label_val = 1 - ideal_label_val
        ideal_val_questions = crowdsource_label_subreddit_data.groupby('HITId').apply(lambda x: (x.loc[:, label_col]==ideal_label_val).sum() == num_annotators)
        non_ideal_val_questions = crowdsource_label_subreddit_data.groupby('HITId').apply(lambda x: (x.loc[:, label_col]==non_ideal_label_val).sum() == num_annotators)
        ideal_val_question_ids = ideal_val_questions[ideal_val_questions].index.tolist()
        non_ideal_val_question_ids = non_ideal_val_questions[non_ideal_val_questions].index.tolist()
        ideal_val_data = crowdsource_label_subreddit_data[crowdsource_label_subreddit_data.loc[:, 'HITId'].isin(ideal_val_question_ids)].drop_duplicates('HITId')
        non_ideal_val_data = crowdsource_label_subreddit_data[crowdsource_label_subreddit_data.loc[:, 'HITId'].isin(non_ideal_val_question_ids)].drop_duplicates('HITId')
        ## compare score distributions
        ideal_val_data_overlap_scores = ideal_val_data.loc[:, score_var]
        non_ideal_val_data_overlap_scores = non_ideal_val_data.loc[:, score_var]
        ideal_question_overlap_median = np.median(ideal_val_data_overlap_scores)
        non_ideal_question_overlap_median = np.median(non_ideal_val_data_overlap_scores)
        test_stat, p_val = mannwhitneyu(ideal_val_data_overlap_scores, non_ideal_val_data_overlap_scores)
        print(f'ideal question has median = {"{:1.3f}".format(ideal_question_overlap_median)}')
        print(f'non-ideal question has median = {"{:1.3f}".format(non_ideal_question_overlap_median)}')
        print(f'difference has test stat = {"{:1.3f}".format(test_stat)} (p={"{:1.3f}".format(p_val)})')

---- testing score var question_text_overlap_score ----
**** testing label contains_answer.num ****
ideal question has median = 0.146
non-ideal question has median = 0.152
difference has test stat = 31.000 (p=0.328)
**** testing label more_info.num ****
ideal question has median = 0.143
non-ideal question has median = 0.167
difference has test stat = 7.500 (p=0.208)
**** testing label relevant.num ****
ideal question has median = 0.133
non-ideal question has median = 0.207
difference has test stat = 5.500 (p=0.111)
**** testing label rewrite.num ****
ideal question has median = 0.129
non-ideal question has median = 0.155
difference has test stat = 22.500 (p=0.134)
---- testing score var question_title_overlap_score ----
**** testing label contains_answer.num ****
ideal question has median = 0.053
non-ideal question has median = 0.056
difference has test stat = 34.500 (p=0.425)
**** testing label more_info.num ****
ideal question has median = 0.043
non-ideal question has median = 0.088


OK! The differences aren't significant, but the "ideal" questions for each label type seem to have less overlap with the text and title than the non-ideal questions.

In [58]:
## same thing but with right label combinations
def question_labels_all_valid(data, label_cols, ideal_label_vals):
    num_annotators = data.shape[0]
#     print(f'num annotators {num_annotators}')
    label_col_valid = []
    for label_col in label_cols:
#         print(data.loc[:, label_col])
        labels_are_valid = (data.loc[:, label_col] == ideal_label_vals[label_col]).astype(int).sum() == num_annotators
#         print(labels_are_valid)
        label_col_valid.append(labels_are_valid)
#     print(label_col_valid)
    return all(label_col_valid)
valid_check_label_cols = ['contains_answer.num', 'more_info.num', 'relevant.num']
ideal_question_id_data = crowdsource_label_subreddit_data.groupby('HITId').apply(lambda x: question_labels_all_valid(x, valid_check_label_cols, ideal_label_vals))
ideal_question_ids = ideal_question_id_data[ideal_question_id_data].index.tolist()
ideal_val_data = crowdsource_label_subreddit_data[crowdsource_label_subreddit_data.loc[:, 'HITId'].isin(ideal_question_ids)]
# non-ideal questions should have consistent agreement between annotators
# NOPE that reduces sample size too much
# agreement_question_id_data = crowdsource_label_subreddit_data.groupby('HITId').apply(lambda x: (x.loc[:, valid_check_label_cols].iloc[0, :] == x.loc[:, valid_check_label_cols].iloc[1, :]).sum() == len(valid_check_label_cols))
# agreement_question_ids = agreement_question_id_data[agreement_question_id_data].index.tolist()
non_ideal_question_ids = set(crowdsource_label_subreddit_data.loc[:, 'HITId'].unique()) - set(ideal_question_ids)
non_ideal_val_data = crowdsource_label_subreddit_data[crowdsource_label_subreddit_data.loc[:, 'HITId'].isin(non_ideal_question_ids)]
# deduplicate
ideal_val_data.drop_duplicates('HITId', inplace=True)
non_ideal_val_data.drop_duplicates('HITId', inplace=True)
for score_var in score_vars:
    print(f'---- testing score var {score_var} ----')
    ## compare score distributions
    ideal_val_data_overlap_scores = ideal_val_data.loc[:, score_var]
    non_ideal_val_data_overlap_scores = non_ideal_val_data.loc[:, score_var]
    ideal_question_overlap_median = np.median(ideal_val_data_overlap_scores)
    non_ideal_question_overlap_median = np.median(non_ideal_val_data_overlap_scores)
    test_stat, p_val = mannwhitneyu(ideal_val_data_overlap_scores, non_ideal_val_data_overlap_scores)
    print(f'ideal question has median = {"{:1.3f}".format(ideal_question_overlap_median)}')
    print(f'non-ideal question has median = {"{:1.3f}".format(non_ideal_question_overlap_median)}')
    print(f'difference has test stat = {"{:1.3f}".format(test_stat)} (N={ideal_val_data.shape[0]+non_ideal_val_data.shape[0]}, p={"{:1.3f}".format(p_val)})')

---- testing score var question_text_overlap_score ----
ideal question has median = 0.143
non-ideal question has median = 0.146
difference has test stat = 233.500 (N=50, p=0.170)
---- testing score var question_title_overlap_score ----
ideal question has median = 0.026
non-ideal question has median = 0.048
difference has test stat = 228.000 (N=50, p=0.139)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ideal_val_data.drop_duplicates('HITId', inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  non_ideal_val_data.drop_duplicates('HITId', inplace=True)


OK! We see an even stronger result than before: ideal questions tend to have less overlap with the original.

What are some example sentences for ideal questions and non-ideal questions with high/low overlap scores?

In [60]:
question_var = 'Input.question'
for score_var in score_vars:
    print(f'---- testing score var {score_var} ----')
    ## compare score distributions
    sent_score_var = score_var.replace('_score', '_sent')
    print('ideally labelled questions')
    display(ideal_val_data.loc[:, [question_var, score_var, sent_score_var]].sort_values(score_var, inplace=False, ascending=True).head(10))
    print('non-ideally labelled questions')
    display(non_ideal_val_data.loc[:, [question_var, score_var, sent_score_var]].sort_values(score_var, inplace=False, ascending=False).head(10))
#     ideal_val_data_overlap_scores = ideal_val_data.loc[:, score_var]
#     non_ideal_val_data_overlap_scores = non_ideal_val_data.loc[:, score_var]
#     ideal_question_overlap_median = np.median(ideal_val_data_overlap_scores)
#     non_ideal_question_overlap_median = np.median(non_ideal_val_data_overlap_scores)
#     test_stat, p_val = mannwhitneyu(ideal_val_data_overlap_scores, non_ideal_val_data_overlap_scores)
#     print(f'ideal question has median = {"{:1.3f}".format(ideal_question_overlap_median)}')
#     print(f'non-ideal question has median = {"{:1.3f}".format(non_ideal_question_overlap_median)}')
#     print(f'difference has test stat = {"{:1.3f}".format(test_stat)} (p={"{:1.3f}".format(p_val)})')

---- testing score var question_text_overlap_score ----
ideally labelled questions


Unnamed: 0,Input.question,question_text_overlap_score,question_text_overlap_sent
78,What CPU are you using and what games are you even trying to play?,0.037037,"[I, bought, thi, for, 144, HZ, game, as, I, ', ll, be, get, a, monitor, soon, .]"
20,How does this help you in the long run?,0.045455,"[new, school, ,, teacher, are, absolut, hell, ,, we, will, write, onli, one, real, test, per, year, and, the, rest, of, it, are, pop, up, test, so, you, have, to, studi, 24, /, 7, for, everi, subject, and, hope, for, the, best]"
62,How much is each card where you are at?,0.052632,"[ani, advic, on, which, one, to, pick, is, welcom, .]"
98,Did they deliver it directly to you or did they leave it somewhere?,0.0625,"[pleas, forward, a, pdf, ,, jpg, or, png, copi, of, the, report, to, the, follow, address, :, uk, -, deliveryinvestig, @, amazon, ., co, ., uk, .]"
4,"In the other hand, you’re an adult and will most likely fly the coop soon anyways...so do you have the funds to live alone?",0.092593,"[My, thought, is, it, is, not, worth, move, my, home, life, down, to, FL, if, I, ’, m, graduat, into, the, workforc, in, 3, year, ,, like, to, onli, find, a, job, in, NY, .]"
66,Did you reinstall Windows or are you using the same installation from the previous build?,0.105263,"[So, what, ', s, the, problem, ?]"
56,"Out of curiosity, if it's not too personal, what's the reason for yout mother's stance on this?",0.115385,"[ani, idea, on, what, I, can, do, ?]"
100,How did you guys decide who would share a room?,0.121212,"[now, one, of, the, guy, that, is, go, to, be, share, a, bedroom, say, that, he, should, be, pay, less, becaus, he, doesn, ', t, get, a, room, to, himself, .]"
76,does your laptop have an m.2 slot?,0.142857,"[bought, a, sandisk, m, ., 2, and, want, to, migrat, it, from, my, 5400, rpm, hdd, ., it, in, a, laptop, and, there, both, instal, help]"
2,Do you want to forgive and forget or are you content with not having a relationship with him?,0.142857,"[should, I, involv, my, famili, as, much, as, possibl, or, should, I, talk, to, him, 1, on, 1, ?]"


non-ideally labelled questions


Unnamed: 0,Input.question,question_text_overlap_score,question_text_overlap_sent
48,"Did the estate attorney say your and your brother's names ""were on"" the policies, or ""were not"" on them?",0.333333,"[also, ,, everybodi, is, say, they, don, ', t, have, copi, of, the, polici, with, mine, and, my, brother, ', s, name, on, them, .]"
36,"Her light was green, so I'm guessing the crosswalk was on the 'no walk/stop' symbol at this time?",0.277778,"[A, car, stop, at, the, crosswalk, (, she, didn, ', t, need, to, ,, her, light, wa, green, but, peopl, do, that, sometim, ).]"
42,Why are you brewing alcohol in a mental hospital?,0.272727,"[but, if, I, ’, m, brew, it, in, a, mental, hospit, ,, doe, that, count, as, possess, it, ?]"
16,if you don’t trust her then why are you dating her ?,0.263158,"[I, realli, don, ’, t, want, to, lose, her, ,, help, ?]"
82,If you're leaving in 10 years or less... Why buy a house?,0.225806,"[If, I, want, to, buy, a, hous, in, a, coupl, of, year, ,, am, I, better, off, use, thi, money, for, that, purpos, ?]"
12,Would you ask her to pay if she knocked your glasses on the floor?,0.208333,"[I, wa, lie, on, the, floor, ,, she, wa, next, to, me, seat, on, the, floor, too, .]"
38,What feelings do you still have for someone who did that to you?,0.206897,"[I, feel, bad, becaus, I, still, do, have, a, lot, of, feel, for, her, ,, but, I, don, ', t, want, anyth, to, do, with, her, .]"
64,"I just found a Xeon E5450 for 5 bucks, is that a good server processer?",0.181818,"[I, know, it, is, old, ,, but, I, just, want, to, know, what, the, best, one, is, to, tri, to, restor, an, old, dell, vostro, to, a, file, server, .]"
8,mind if i ask what disability you've been diagnosed with?,0.178571,"[recent, I, ', ve, been, diagnos, with, a, coupl, condit, and, they, ', ve, made, it, so, I, ', m, unabl, to, work, .]"
18,You want to be them or in a relationship like them?,0.166667,"[So, to, tri, and, keep, thi, short, ,, there, ', s, thi, marri, gay, coupl, on, a, TV, show, and, see, them, just, made, me, feel, like, I, want, to, be, them, so, bad, .]"


---- testing score var question_title_overlap_score ----
ideally labelled questions


Unnamed: 0,Input.question,question_title_overlap_score,question_title_overlap_sent
0,How long will it take you to do your homework?,0.0,"[help, motiv, me, pleas]"
86,"Did she file 0, 1, or 2 on her W-4?",0.0,"[tax, troubl]"
80,Is it set to 144hz in the Windows settings?,0.0,"[‘, motion, blur, ’, on, dell, s2716dg, monitor]"
66,Did you reinstall Windows or are you using the same installation from the previous build?,0.0,"[PC, refus, to, download, driver, !]"
56,"Out of curiosity, if it's not too personal, what's the reason for yout mother's stance on this?",0.0,"[I, ’, m, fifteen, and, my, mom, refus, to, enrol, me, in, school, .]"
62,How much is each card where you are at?,0.0,"[RX, 580, vs, .]"
20,How does this help you in the long run?,0.0,"[aita, for, hack, my, calcul, and, save, note, on, it, to, cheat, .]"
100,How did you guys decide who would share a room?,0.0,"[split, colleg, apart, rent]"
4,"In the other hand, you’re an adult and will most likely fly the coop soon anyways...so do you have the funds to live alone?",0.026316,"[should, I, move, to, florida, with, my, dad, or, stay, home, with, my, friend]"
60,OP- What does your lease say about pets?,0.052632,"[rent, a, condo, with, a, cat, ,, no, pet, allow, .]"


non-ideally labelled questions


Unnamed: 0,Input.question,question_title_overlap_score,question_title_overlap_sent
82,If you're leaving in 10 years or less... Why buy a house?,0.222222,"[should, I, realli, invest, in, a, 401k, if, I, ', ll, be, leav, the, US, in, ten, year, ?]"
38,What feelings do you still have for someone who did that to you?,0.166667,"[aita, for, tell, her, to, fuck, off, ?]"
12,Would you ask her to pay if she knocked your glasses on the floor?,0.15,"[My, friend, is, demand, me, to, pay, her]"
26,Why are you paying rent on a place you own?,0.142857,"[aita, for, want, full, rent, ?]"
64,"I just found a Xeon E5450 for 5 bucks, is that a good server processer?",0.136364,"[what, is, the, best, cpu, for, lga775, motherboard, ?]"
42,Why are you brewing alcohol in a mental hospital?,0.130435,"[Is, it, technic, illeg, to, brew, alcohol, under, 21, (, in, america, or, PA, in, gener, )]"
6,Why does your title say stepped on today if it was not today?,0.105263,"[step, On, a, rusti, nail, at, work, today]"
30,"Somehow, I feel like we're not getting the full story on this one... Any other information that's material to the story?",0.103448,"[aita, first, thi, prank, I, pull, ?]"
32,"Almost definitely NTA, but I’m curious — what did the texts say?",0.090909,"[aita, for, not, answer, my, boss, text, ?]"
44,Would it be more appropriate legally to change the phrasing to 'Learn to throw fast punches like Amir **can**' is that slightly more vague and thus more appropriate to use?,0.088235,"[use, someon, ', s, name, to, promot, a, cours]"


This echoes what we saw before: 

- The non-ideal questions may have high overlap because they are referencing complicated or hypothetical aspects of the scenario, which could be hard to judge as "question is looking for more information."
- The non-ideal questions may also only engage with the title and not the content of the post (higher title overlap).

### Determining question validity: semantic similarity

Let's repeat this analysis with semantic similarity rather than word overlap.

In [79]:
## load sentence encoder
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
encoder_model = AutoModelForSeq2SeqLM.from_pretrained('facebook/bart-base', cache_dir='../../data/model_cache/')
encoder_model.eval()
encoder_tokenizer = AutoTokenizer.from_pretrained('facebook/bart-base', cache_dir='../../data/model_cache/')

In [80]:
test_sent = 'this is a test sentence'
test_sent_tokens = encoder_tokenizer([test_sent], return_tensors='pt')
test_sent_output = encoder_model(**test_sent_tokens)
print(test_sent_output.encoder_last_hidden_state.shape)

torch.Size([1, 7, 768])


In [107]:
## get sentences to compute similarity
from nltk.tokenize import PunktSentenceTokenizer
sent_tokenizer = PunktSentenceTokenizer()
crowdsource_label_subreddit_data = crowdsource_label_subreddit_data.assign(**{
    'text_sents' : crowdsource_label_subreddit_data.loc[:, 'Input.text'].apply(lambda x: sent_tokenizer.tokenize(x))
})

In [109]:
## convert all questions, post sentences, title to embeddings
def tokenize_encode(sent, tokenizer, model):
    tokens = tokenizer([sent], return_tensors='pt')
    encoding = model(**tokens)
    encoding_output = encoding.encoder_last_hidden_state
    # remove pad tokens
    encoding_output = encoding_output[:, 1:-1, :]
    # average over all tokens
    encoding_output = encoding_output.mean(axis=1).detach().numpy()[0]
    return encoding_output
import torch
with torch.no_grad():
    crowdsource_label_subreddit_data = crowdsource_label_subreddit_data.assign(**{
        'question_encoding' : crowdsource_label_subreddit_data.loc[:, 'Input.question'].apply(lambda x: tokenize_encode(x, encoder_tokenizer, encoder_model))
    })
    # for posts: get encodings for all sentences separately
    crowdsource_label_subreddit_data = crowdsource_label_subreddit_data.assign(**{
        'post_encoding' : crowdsource_label_subreddit_data.loc[:, 'text_sents'].apply(lambda x: list(map(lambda y: tokenize_encode(y, encoder_tokenizer, encoder_model), x)))
    })
    crowdsource_label_subreddit_data = crowdsource_label_subreddit_data.assign(**{
        'title_encoding' : crowdsource_label_subreddit_data.loc[:, 'Input.title'].apply(lambda x: tokenize_encode(x, encoder_tokenizer, encoder_model))
    })

In [110]:
## compute similarity
from scipy.spatial.distance import cosine
crowdsource_label_subreddit_data = crowdsource_label_subreddit_data.assign(**{
    'question_post_sent_sim' : crowdsource_label_subreddit_data.apply(lambda x: list(map(lambda y: 1-cosine(x.loc['question_encoding'], y), x.loc['post_encoding'])), axis=1),
    'question_title_sim' : crowdsource_label_subreddit_data.apply(lambda x: 1-cosine(x.loc['question_encoding'], x.loc['title_encoding']), axis=1),
})
# get max sim for post, sentence with max sim
crowdsource_label_subreddit_data = crowdsource_label_subreddit_data.assign(**{
    'question_post_sim' : crowdsource_label_subreddit_data.loc[:, 'question_post_sent_sim'].apply(lambda x: max(x)),
    'question_post_sim_sent' : crowdsource_label_subreddit_data.apply(lambda x: x.loc['text_sents'][np.argmax(np.array(x.loc['question_post_sent_sim']))], axis=1)
})

In [112]:
## test each label separately
from scipy.stats import mannwhitneyu
num_annotators = 2
sim_vars = ['question_post_sim', 'question_title_sim']
for sim_var in sim_vars:
    print(f'**** testing {sim_var} ****')
    for label_col in num_label_cols:
        print(f'---- testing label={label_col} ----')
        ideal_val = ideal_label_vals[label_col]
        ideal_val_data_ids = crowdsource_label_subreddit_data.groupby('HITId').apply(lambda x: (x.loc[:, label_col]==ideal_val).sum() == num_annotators)
        non_ideal_val_data_ids = crowdsource_label_subreddit_data.groupby('HITId').apply(lambda x: (x.loc[:, label_col]==(1-ideal_val)).sum() == num_annotators)
        ideal_val_data_ids = ideal_val_data_ids[ideal_val_data_ids].index.tolist()
        non_ideal_val_data_ids = non_ideal_val_data_ids[non_ideal_val_data_ids].index.tolist()
        ideal_val_data = crowdsource_label_subreddit_data[crowdsource_label_subreddit_data.loc[:, 'HITId'].isin(ideal_val_data_ids)].drop_duplicates('HITId')
        non_ideal_val_data = crowdsource_label_subreddit_data[crowdsource_label_subreddit_data.loc[:, 'HITId'].isin(non_ideal_val_data_ids)].drop_duplicates('HITId')
        ideal_val_data_sim_vals = ideal_val_data.loc[:, sim_var]
        non_ideal_val_data_sim_vals = non_ideal_val_data.loc[:, sim_var]
        ideal_val_data_sim_median = np.median(ideal_val_data_sim_vals)
        non_ideal_val_data_sim_median = np.median(non_ideal_val_data_sim_vals)
        test_stat, p_val = mannwhitneyu(ideal_val_data_sim_vals, non_ideal_val_data_sim_vals)
        print(f'ideal question similarity={"{:1.3f}".format(ideal_val_data_sim_median)} vs. non-ideal question similarity={"{:1.3f}".format(non_ideal_val_data_sim_median)}')
        print(f'difference has test stat={"{:1.3f}".format(test_stat)} (N={ideal_val_data.shape[0] + non_ideal_val_data.shape[0]}, p={"{:1.3f}".format(p_val)})')

**** testing question_post_sim ****
---- testing label=contains_answer.num ----
ideal question similarity=0.745 vs. non-ideal question similarity=0.784
difference has test stat=31.500 (N=28, p=0.341)
---- testing label=more_info.num ----
ideal question similarity=0.766 vs. non-ideal question similarity=0.799
difference has test stat=10.000 (N=32, p=0.294)
---- testing label=relevant.num ----
ideal question similarity=0.783 vs. non-ideal question similarity=0.786
difference has test stat=19.500 (N=44, p=0.453)
---- testing label=rewrite.num ----
ideal question similarity=0.758 vs. non-ideal question similarity=0.819
difference has test stat=13.000 (N=22, p=0.028)
**** testing question_title_sim ****
---- testing label=contains_answer.num ----
ideal question similarity=0.683 vs. non-ideal question similarity=0.738
difference has test stat=20.500 (N=28, p=0.110)
---- testing label=more_info.num ----
ideal question similarity=0.683 vs. non-ideal question similarity=0.612
difference has tes

In [116]:
## same thing but using all conditions for ideal questions
ideal_val_data_ids = crowdsource_label_subreddit_data.groupby('HITId').apply(lambda x: question_labels_all_valid(x, valid_check_label_cols, ideal_label_vals))
ideal_val_data_ids = ideal_val_data_ids[ideal_val_data_ids].index.tolist()
ideal_val_data = crowdsource_label_subreddit_data[crowdsource_label_subreddit_data.loc[:, 'HITId'].isin(ideal_val_data_ids)].drop_duplicates('HITId')
non_ideal_val_data = crowdsource_label_subreddit_data[~crowdsource_label_subreddit_data.loc[:, 'HITId'].isin(ideal_val_data_ids)].drop_duplicates('HITId')
for sim_var in sim_vars:
    print(f'**** testing {sim_var} ****')
    ideal_val_data_sim_vals = ideal_val_data.loc[:, sim_var]
    non_ideal_val_data_sim_vals = non_ideal_val_data.loc[:, sim_var]
    ideal_val_data_sim_median = np.median(ideal_val_data_sim_vals)
    non_ideal_val_data_sim_median = np.median(non_ideal_val_data_sim_vals)
    test_stat, p_val = mannwhitneyu(ideal_val_data_sim_vals, non_ideal_val_data_sim_vals)
    print(f'ideal question similarity={"{:1.3f}".format(ideal_val_data_sim_median)} vs. non-ideal question similarity={"{:1.3f}".format(non_ideal_val_data_sim_median)}')
    print(f'difference has test stat={"{:1.3f}".format(test_stat)} (N={ideal_val_data.shape[0] + non_ideal_val_data.shape[0]}, p={"{:1.3f}".format(p_val)})')

**** testing question_post_sim ****
ideal question similarity=0.742 vs. non-ideal question similarity=0.786
difference has test stat=196.000 (N=50, p=0.043)
**** testing question_title_sim ****
ideal question similarity=0.687 vs. non-ideal question similarity=0.710
difference has test stat=247.000 (N=50, p=0.250)


What are some examples of questions with high similarity to the posts and titles?

In [119]:
print('ideal questions with low similarity')
display(ideal_val_data.sort_values('question_post_sim', inplace=False, ascending=True).loc[:, ['Input.question', 'question_post_sim', 'question_post_sim_sent']].head(5))
print('non-ideal questions with high similarity')
display(non_ideal_val_data.sort_values('question_post_sim', inplace=False, ascending=False).loc[:, ['Input.question', 'question_post_sim', 'question_post_sim_sent']].head(5))

ideal questions with low similarity


Unnamed: 0,Input.question,question_post_sim,question_post_sim_sent
20,How does this help you in the long run?,0.549635,"New school, teachers are absolute hell, we will write only one real test per year and the rest of it are pop up tests so you have to study 24/7 for every subject and hope for the best"
0,How long will it take you to do your homework?,0.709109,I NEED to do my homework as it’s due in tomorrow but I just can’t be bothered.
62,How much is each card where you are at?,0.709393,Any advice on which one to pick is welcome.
78,What CPU are you using and what games are you even trying to play?,0.715473,Been dissapointed in the gaming experience as I've only been getting 60-80 fps on most games on ultra or high.
34,Wtf were they doing laying in a hammock together?,0.71773,Am I in the wrong here?


non-ideal questions with high similarity


Unnamed: 0,Input.question,question_post_sim,question_post_sim_sent
22,Have you mentioned to your boss how he is making your life miserable by not doing anything about Mr creepo?,0.868096,I've now refused to work night shifts because my store refuses to do anything about this man or how it makes me uncomfortable.
24,"NTA, but why would you want your cousin's hand me downs?",0.85669,So WIBTA for dating him without asking how she feels first?
48,"Did the estate attorney say your and your brother's names ""were on"" the policies, or ""were not"" on them?",0.849153,"Also, everybody is saying they don't have copies of the policies with mine and my brother's names on them."
82,If you're leaving in 10 years or less... Why buy a house?,0.843991,"If I want to buy a house in a couple of years, am I better off using this money for that purpose?"
42,Why are you brewing alcohol in a mental hospital?,0.840838,"but if I’m brewing it in a mental hospital, does that count as possessing it?"
