# Negation Model
I have observed that negation is a common mechanism for creating unanswerable questions. Looked at in isolation, however, negations occur at a similar rate in answerable and unanswerable questions. The context is more informative: if there is **no** negation in the context, the likelihood of a negation in the question is very low. Here we encode this observation as a model.

## Model
We will only look at questions with negations, and we train only on answerable questions.

### Model 1 - Naive / binary
P(q) = P(negation | context, question)
This will not work: if we only train on answerable questions that have negations, the probability is always 1.

### Model 2 - Bayesian
P(q) = P(DEP | context, question)P(neg | DEP, context, question)
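
One way to read Model 2 is as a chain-rule factorization. This is an interpretation, not something the notes state: it assumes DEP stands for the head word $h$ that a negation would attach to in the dependency parse, with $c$ the context and $q$ the question.

$$
P(h, \text{neg} \mid c, q) = P(h \mid c, q)\, P(\text{neg} \mid h, c, q)
$$

Under that reading, the second factor is the part that the (context, question, head word, label) tuples built later in this notebook would be used to train.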

## Pretask 1:
- What are the dep tags of negations?
- What are the parents (heads) of the negation tokens?
- How do we build a balanced set?

In [15]:
import json
import pickle
import numpy as np
from collections import Counter

In [9]:
questions = []
contexts = []
examples = []
labels = []
id2idx = {}
train_questions = []
train_contexts = []
train_examples = []
train_labels = []
train_id2idx = {}
dev_questions = []
dev_contexts = []
dev_examples = []
dev_labels = []
dev_id2idx = {}
is_train = []
files = ["dataset/train-v2.0.json",  "dataset/dev-v2.0.json"]
for file in files:
    with open(file, 'r') as handle:
        jdata = json.load(handle)
        data = jdata['data']
    for i in range(len(data)):
        section = data[i]['paragraphs']
        for sec in section:
            context = sec['context']
            contexts.append(context)
            other_context = train_contexts if file == files[0] else dev_contexts
            other_context.append(context)
            qas = sec['qas']
            for j in range(len(qas)):
                question = qas[j]['question']
                is_imp = qas[j]['is_impossible']
                qid = qas[j]['id']
                questions.append(question)
                if file == files[0]:
                    is_train.append(True)
                else:
                    is_train.append(False)
                other_questions = train_questions if file == files[0] else dev_questions
                other_questions.append(question)
                labels.append(is_imp)
                other_labels = train_labels if file == files[0] else dev_labels
                other_labels.append(is_imp)
                examples.append((len(contexts)-1, len(questions)-1))
                other_examples = train_examples if file == files[0] else dev_examples
                other_examples.append((len(contexts)-1, len(questions)-1))
                id2idx[qid] = len(questions)-1
                other_id2idx = train_id2idx if file == files[0] else dev_id2idx
                other_id2idx[qid] = len(questions)-1

In [8]:
negation_terms = [' not ',  "n't"]
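
The substring test on these two terms is repeated in several cells below. A minimal sketch of the same check as a helper (`has_negation` and `NEGATION_TERMS` are names introduced here, not part of the notebook) also makes its blind spots easy to see: it is case-sensitive and misses forms such as a sentence-initial "Not", "cannot", or "never".

```python
# Same two markers as negation_terms above; the helper name is hypothetical.
NEGATION_TERMS = (" not ", "n't")


def has_negation(question, terms=NEGATION_TERMS):
    """Return True if any negation marker occurs as a substring of the question."""
    return any(term in question for term in terms)


samples = [
    "Who didn't win the presidency?",  # matched via "n't"
    "What is not a metric unit?",      # matched via " not "
    "Not all birds do what?",          # missed: capitalized, no leading space
    "Who cannot vote?",                # missed: "cannot" has no space before "not"
    "What can never happen?",          # missed: "never" is not in the term list
]
for s in samples:
    print(has_negation(s), s)
```

Whether the missed forms matter depends on how often they occur in the questions; the notebook does not measure that.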

In [25]:
question_neg_counter = Counter()
q_with_neg = []
# Counter key: (is_train, is_unanswerable)
for i, question in enumerate(questions):
    for neg in negation_terms:
        if neg in question:
            question_neg_counter[(is_train[i], labels[i])] += 1
            q_with_neg.append(i)
            break
    

In [14]:
question_neg_counter

Counter({(True, False): 1381,
         (True, True): 3763,
         (False, True): 644,
         (False, False): 126})

In [17]:
# Naive classifier: predict unanswerable whenever the question contains a negation.
# "Acc" = share of negated questions that are unanswerable.
# "Coverage" = share of all questions in the split that are negated and unanswerable.
print("Naive Classifier Train Acc:", question_neg_counter[(True, True)] / (question_neg_counter[(True, False)] +  question_neg_counter[(True, True)]))
print("Naive Classifier Train Coverage:", question_neg_counter[(True, True)] / np.sum(is_train))

print("Naive Classifier Dev Acc:", question_neg_counter[(False, True)] / (question_neg_counter[(False, False)] +  question_neg_counter[(False, True)]))
print("Naive Classifier Dev Coverage:", question_neg_counter[(False, True)] / (len(questions) - np.sum(is_train)))



Naive Classifier Train Acc: 0.7315318818040435
Naive Classifier Train Coverage: 0.02887529830646337
Naive Classifier Dev Acc: 0.8363636363636363
Naive Classifier Dev Coverage: 0.05424071422555378
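
The four figures can be reproduced from the counter alone, which makes the arithmetic easier to check. This is a minimal sketch: `naive_metrics` and `totals` are names introduced here, and the split sizes are assumed to be the SQuAD 2.0 question counts (130,319 train, 11,873 dev) rather than read from the loaded data.

```python
from collections import Counter

# Counts copied from the question_neg_counter output; key = (is_train, is_unanswerable).
counts = Counter({
    (True, False): 1381,
    (True, True): 3763,
    (False, True): 644,
    (False, False): 126,
})

# Assumption: number of questions in each split (True = train, False = dev).
totals = {True: 130319, False: 11873}


def naive_metrics(counts, totals, is_train):
    """Return (share of negated questions that are unanswerable,
    share of all questions in the split that are negated and unanswerable)."""
    hits = counts[(is_train, True)]
    flagged = hits + counts[(is_train, False)]
    return hits / flagged, hits / totals[is_train]


for name, split in (("Train", True), ("Dev", False)):
    acc, coverage = naive_metrics(counts, totals, split)
    print(name, acc, coverage)
```

The first number behaves like a precision (how often the rule is right when it fires), so a high value does not say much about the unanswerable questions that contain no negation.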


In [19]:
import spacy
nlp = spacy.load("en_core_web_sm")

In [41]:
for t in nlp("who couldn't win the presidency?"):
    print(t, t.dep_, t.head)

who nsubj win
could aux win
n't neg win
win ROOT win
the det presidency
presidency dobj win
? punct win


In [27]:
neg_questions = []
for i in q_with_neg:
    q = questions[i]
    neg_questions.append(nlp(q))

In [43]:
dep_counter = Counter()
pos_counter = Counter()
tok_counter = Counter()
for q in neg_questions:
    for t in q:
        if t.dep_ == 'neg':
            tok_counter[t.head.lemma_] += 1
            dep_counter[t.head.dep_] += 1
            pos_counter[t.head.pos_] += 1

In [32]:
dep_counter.most_common()

[('ROOT', 3886),
 ('ccomp', 524),
 ('relcl', 342),
 ('advcl', 244),
 ('acl', 173),
 ('conj', 151),
 ('xcomp', 118),
 ('pcomp', 94),
 ('nsubj', 70),
 ('acomp', 65),
 ('amod', 48),
 ('auxpass', 46),
 ('prep', 33),
 ('aux', 22),
 ('csubj', 19),
 ('compound', 18),
 ('attr', 17),
 ('dep', 16),
 ('advmod', 15),
 ('oprd', 7),
 ('pobj', 7),
 ('nsubjpass', 7),
 ('nmod', 5),
 ('csubjpass', 5),
 ('appos', 4),
 ('npadvmod', 3),
 ('dobj', 3),
 ('intj', 2),
 ('cc', 1),
 ('parataxis', 1),
 ('mark', 1),
 ('det', 1)]

In [33]:
pos_counter.most_common()

[('VERB', 5479),
 ('NOUN', 217),
 ('ADJ', 158),
 ('ADP', 36),
 ('PROPN', 29),
 ('ADV', 24),
 ('DET', 2),
 ('CCONJ', 1),
 ('INTJ', 1),
 ('NUM', 1)]

In [39]:
for q in neg_questions:
    for t in q:
        if t.dep_ == 'neg':
            if t.head.pos_ == 'NUM':
                print(t.head)
                print(q)
                assert False

one
What is economic liberalism not one of the causes of?


AssertionError: 

In [42]:
# Idea: as the negative (non-negated) set, sample sentences that contain the same head word and POS but no negation.

In [44]:
tok_counter.most_common(30)

[('be', 1311),
 ('have', 275),
 ('use', 190),
 ('do', 95),
 ('allow', 94),
 ('consider', 72),
 ('require', 58),
 ('want', 56),
 ('make', 50),
 ('find', 50),
 ('include', 47),
 ('take', 45),
 ('know', 42),
 ('need', 35),
 ('define', 34),
 ('give', 34),
 ('believe', 34),
 ('exist', 33),
 ('locate', 32),
 ('become', 32),
 ('change', 31),
 ('support', 30),
 ('call', 29),
 ('contain', 27),
 ('help', 27),
 ('go', 26),
 ('recognize', 25),
 ('develop', 25),
 ('influence', 25),
 ('play', 25)]

## Sampling for labeling
- Sample 1381 answerable questions w/o negations (one per answerable train question with a negation)
- Match the distribution of head POS tags
- Within each head POS tag, match the distribution of head terms
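
The term-matching part of this plan can be sketched as quota sampling: each head lemma gets a quota equal to its count among the negated questions, and a candidate is accepted when it contains a lemma whose quota is still open. The names `sample_by_quota`, `candidates`, and `quotas` are introduced here for illustration, and the sketch matches lemma counts only, not POS tags.

```python
from collections import Counter


def sample_by_quota(candidates, quotas):
    """Pick candidates until every lemma quota is filled.

    candidates: iterable of (item_id, list_of_lemmas)
    quotas: Counter mapping lemma -> number of examples wanted
    Returns (picked, unfilled), where picked is a list of (item_id, lemma)
    and unfilled is a Counter of quotas that could not be met.
    """
    remaining = Counter(quotas)
    picked = []
    for item_id, lemmas in candidates:
        for lemma in lemmas:
            if remaining[lemma] > 0:
                remaining[lemma] -= 1
                picked.append((item_id, lemma))
                break  # use each candidate at most once
        if sum(remaining.values()) == 0:
            break  # all quotas filled
    return picked, +remaining  # unary plus drops zero counts


# Toy data: lemmas per question and the target head-lemma counts.
quotas = Counter({"be": 2, "have": 1})
candidates = [
    (0, ["what", "be"]),
    (1, ["who", "have"]),
    (2, ["be"]),
    (3, ["be"]),
    (4, ["have"]),
]
picked, unfilled = sample_by_quota(candidates, quotas)
print(picked, unfilled)
```

A candidate is matched on the first open lemma it contains, so a frequent lemma such as "be" fills early; shuffling the candidates first would avoid always drawing from the start of the dataset.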


In [63]:
train_q_with_neg = []
for i, question in enumerate(questions):
    for neg in negation_terms:
        if neg in question and is_train[i] and not labels[i]:
            train_q_with_neg.append(i)
            break
train_neg_questions = []
for i in train_q_with_neg:
    q = questions[i]
    train_neg_questions.append(nlp(q))
train_dep_counter = Counter()
train_pos_counter = Counter()
train_tok_counter = Counter()
for q in train_neg_questions:
    for t in q:
        if t.dep_ == 'neg':
            train_tok_counter[t.head.lemma_] += 1
            train_dep_counter[t.head.dep_] += 1
            train_pos_counter[t.head.pos_] += 1

In [65]:
for pos, count in train_pos_counter.most_common():
    print(pos, count / question_neg_counter[(True, False)] )

VERB 0.9275887038377987
ADJ 0.02896451846488052
NOUN 0.02606806661839247
ADP 0.011585807385952208
ADV 0.0065170166545981175
PROPN 0.004344677769732078
DET 0.000724112961622013


In [71]:
sm = 0
for x, c in train_tok_counter.most_common(100):
    print(x, c/question_neg_counter[(True, False)])
    sm += c/question_neg_counter[(True, False)]
print("Sum", sm)

be 0.18464880521361332
have 0.06010137581462708
use 0.03258508327299059
allow 0.02172338884866039
do 0.017378711078928313
want 0.015930485155684286
consider 0.010861694424330196
exist 0.010861694424330196
support 0.010137581462708182
require 0.010137581462708182
make 0.00941346850108617
take 0.007965242577842143
change 0.00724112961622013
include 0.0065170166545981175
find 0.0065170166545981175
know 0.0065170166545981175
need 0.005792903692976104
like 0.005792903692976104
follow 0.005792903692976104
define 0.005792903692976104
believe 0.005792903692976104
recognize 0.005792903692976104
agree 0.005068790731354091
give 0.005068790731354091
see 0.005068790731354091
identify 0.005068790731354091
become 0.005068790731354091
go 0.004344677769732078
intend 0.004344677769732078
happen 0.004344677769732078
hold 0.004344677769732078
work 0.004344677769732078
join 0.003620564808110065
pay 0.003620564808110065
get 0.003620564808110065
part 0.003620564808110065
in 0.003620564808110065
understand 0.

In [59]:
pos_counter.most_common()

[('VERB', 5479),
 ('NOUN', 217),
 ('ADJ', 158),
 ('ADP', 36),
 ('PROPN', 29),
 ('ADV', 24),
 ('DET', 2),
 ('CCONJ', 1),
 ('INTJ', 1),
 ('NUM', 1)]

In [61]:
question_neg_counter[(True, True)]

3763

In [338]:
ignore_terms = ['what', 'in', 'who', 'as', 'of', 'record', 'do', 'many']
# build training data:
q_with_neg_set = set(q_with_neg)
key_words = set(train_tok_counter.keys())
progress_counter = Counter()
pos_examples = []
q_without_neg = set()
q_without_neg_list = []
q_without_neg_heads = []
prev_prog = 0
for i, q in enumerate(questions):
    if not is_train[i] or i in q_with_neg_set or  labels[i] == True:
        continue
    parsed_q = nlp(q)
    for t in parsed_q:
        if t.lemma_ in key_words:
            progress_counter[t.lemma_] += 1
            if progress_counter[t.lemma_] >= train_tok_counter[t.lemma_]:
                key_words.remove(t.lemma_)
            pos_examples.append((q, parsed_q))
            q_without_neg.add(i)
            q_without_neg_list.append(i)
            q_without_neg_heads.append(t.text)
            break
    pct_done = sum(progress_counter.values()) / sum(train_tok_counter.values()) 
    rounded_pct_done = round( pct_done * 100 ) 
    if pct_done > .96:
        break
    if rounded_pct_done % 10 == 0 and rounded_pct_done != prev_prog:
        print("finished", pct_done)
        prev_prog = rounded_pct_done
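i = 0  # restart the scan from the first question (was `i == 0`, a no-op comparison)
old_left = 99999
while len(pos_examples) < question_neg_counter[(True, False)]:
    q = questions[i]
    # skip dev questions, negated questions, already-sampled questions, and unanswerable ones
    if not is_train[i] or i in q_with_neg_set or i in q_without_neg or labels[i] == True:
        i += 1
        continue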
    parsed_q = nlp(q)
    for t in parsed_q:
        if t.lemma_ in train_tok_counter.keys():
            if t.lemma_ in ignore_terms:
                continue
            pos_examples.append((q, parsed_q))
            progress_counter[t.lemma_] += 1
            q_without_neg.add(i)
            q_without_neg_list.append(i)
            q_without_neg_heads.append(t.text)
            break
    i += 1
    left = question_neg_counter[(True, False)]- len(pos_examples)
    if left != old_left:
        print("Left", left)
        old_left = left
        

finished 0.09503239740820735
finished 0.1951043916486681
finished 0.29517638588912887
finished 0.3952483801295896
finished 0.4953203743700504
finished 0.5953923686105111
finished 0.6954643628509719
finished 0.7955363570914327
finished 0.8956083513318934
Left 47
Left 46
Left 45
Left 44
Left 43
Left 42
Left 41
Left 40
Left 39
Left 38
Left 37
Left 36
Left 35
Left 34
Left 33
Left 32
Left 31
Left 30
Left 29
Left 28
Left 27
Left 26
Left 25
Left 24
Left 23
Left 22
Left 21
Left 20
Left 19
Left 18
Left 17
Left 16
Left 15
Left 14
Left 13
Left 12
Left 11
Left 10
Left 9
Left 8
Left 7
Left 6
Left 5
Left 4
Left 3
Left 2
Left 1
Left 0


In [339]:
len(pos_examples)

1381

In [340]:
question_neg_counter[(True, False)]

1381

In [341]:
pct_done > .96

True

# Training data
- training set
    - train questions w/ and w/o negation - all answerable
    - for each question w/ negation, remove the negation
    - find head words in non-negation questions
    - (c, q-negation, head_word, label)

- test set
    - dev questions with negation and labels
    - for each question, find the head word and remove the negation
    - (c, q-negation, head_word, label)
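
The tuple layout above can be written down as a small record type, which also makes the meaning of `label` explicit. This is a sketch: `NegationExample` is a name introduced here, and the example values are illustrative rather than taken from the dataset. In the cells below, `label` for training rows records whether the original question contained a negation, while for dev rows it is the dataset's `is_impossible` flag.

```python
from typing import NamedTuple


class NegationExample(NamedTuple):
    """One (c, q-negation, head_word, label) row; the class name is hypothetical."""
    context: str
    question: str   # question text with any negation removed
    head_word: str  # token the negation attaches to (or a sampled head word)
    label: bool     # train: question was negated; dev: is_impossible


# Illustrative values only.
train_row = NegationExample(
    context="(paragraph text)",
    question="Who will command the reserves?",
    head_word="command",
    label=True,
)

# A NamedTuple unpacks like the plain tuples the notebook pickles.
context, question, head_word, label = train_row
print(question, head_word, label)
```

Because a `NamedTuple` is still a tuple, rows of this type can be mixed with the plain tuples built below without changing the code that reads them.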

In [145]:
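from spacy.tokens import Doc
import random

# spaCy splits "can't" into "ca" + "n't" and "won't" into "wo" + "n't",
# so the auxiliary has to be restored once the negation is dropped.
replace_map = {'ca': 'can', 'wo': 'will'}

def remove_negation(sent):
    """Return the text of a parsed question with its negation token removed."""
    neg_term = None
    for t in sent:
        if t.dep_ == 'neg':
            neg_term = t  # if there are several, the last one is removed
    if neg_term is None:
        return sent.text  # no token tagged 'neg': nothing to remove
    start = neg_term.idx
    end = neg_term.idx + len(neg_term.text)
    prev = sent[neg_term.i - 1].text if neg_term.i > 0 else None
    if prev in replace_map:
        # "ca"/"wo" sit directly before "n't"; swap the stem for the full auxiliary
        new_doc = sent.text[:start - len(prev)] + replace_map[prev] + sent.text[end:]
    else:
        new_doc = sent.text[:start] + sent.text[end:]
    new_doc = new_doc.replace("  ", " ")
    return new_doc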

In [206]:
for q in random.sample(neg_questions, 1000):
    if " won't " not in q.text:
        continue
    print("BEFORE:", q)
    print("AFTER: ", remove_negation(q))
    break

BEFORE: Who won't command the reserves?
AFTER:  Who will command the reserves?


## Note: You might want to convert "did not do" to "does" since it's more grammatical

In [342]:
training_data = []
for ii, i in enumerate(q_without_neg_list[:1374]):
    head_word = q_without_neg_heads[ii]
    ci, _ = examples[i]
    context = contexts[ci]
    question = questions[i]
    label = False
    training_data.append((context, question, head_word, label))
    
for ci, qi in examples:
    if not is_train[qi] or labels[qi]:
        continue
    context = contexts[ci]
    question = questions[qi]
    found = False
    for neg in negation_terms:
        if neg in question:
            found = True
            break
    if found:
        label = True
        head_word = None
        parsed_q = nlp(question)
        for t in parsed_q:
            if t.dep_ == 'neg':
                head_word = t.head.text
                break
        if not head_word:
            continue
        # assert head_word is not None, parsed_q.text + "||" + " ".join([t.dep_ for t in parsed_q])
        modified_q = remove_negation(parsed_q)
        training_data.append((context, modified_q, head_word, label))
        

In [401]:
test_data = []
for ci, qi in examples:
    if is_train[qi]:
        continue
    context = contexts[ci]
    question = questions[qi]
    found = False
    for neg in negation_terms:
        if neg in question:
            found = True
            break
    if found:
        label = labels[qi]
        head_word = None
        parsed_q = nlp(question)
        for t in parsed_q:
            if t.dep_ == 'neg':
                head_word = t.head.text
                break
        if not head_word:
            continue
        # assert head_word is not None, parsed_q.text + "||" + " ".join([t.dep_ for t in parsed_q])
        if "who did the mongols" in parsed_q.text.lower():
            print("WTF")
        modified_q = remove_negation(parsed_q)
        if "who did the mongols" in modified_q.lower():
            print("WTF")
        #print(modified_q)
        test_data.append((context, modified_q, head_word, label))
        
        

WTF


In [366]:
## This is just for testing and finding words to skip
# for i in range(len(questions)):
#     q = questions[i]
#     if i in q_with_neg_set or  i in q_without_neg or labels[i] == True:
#         continue
#     parsed_q = nlp(q)
#     for t in parsed_q:
#         if t.lemma_ in train_tok_counter.keys() and t.pos_ == 'VERB':
#             if t.lemma_ in ['what', 'in', 'who', 'as', 'of', 'record', 'do', 'many']:
#                 continue
#             print(i)
#             print(t.lemma_)
#             print(q)
#             break
#     if i > 200:
#         break

In [371]:
import pickle

In [373]:
with open("negation_training_data.pkl", 'wb') as f:
    pickle.dump(training_data, f)
with open("negation_test_data.pkl", 'wb') as f:
    pickle.dump(test_data, f)   

In [374]:
training_data[0]

('Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".',
 'When did Beyonce start becoming popular?',
 'did',
 False)

In [376]:
labels[138282]

False

In [379]:
for t in nlp(questions[138282]):
    print(t, t.dep_)

Who dobj
did aux
the det
Mongols nsubj
send ROOT
to prep
Bukhara pobj
as prep
administrators pobj
? punct


In [380]:
for neg in negation_terms:
    print(neg in questions[138282])

False
False


In [399]:
for t in test_data:
    if "who did the mongols" in t[1].lower():
        print("WTF", t[1])

WTF  Who did the Mongols send to Bukhara as administrators?


In [388]:
t[1]

'What does have a metric counterpart?'

In [395]:
for ci, qi in examples:
    if is_train[qi]:
        continue
    context = contexts[ci]
    question = questions[qi]
    found = False
    for neg in negation_terms:
        if neg in question:
            found = True
            break
    if found:
        if qi == 138282:
            print("WTF")
        if "who did the mongols" in question.lower():
            print("WTF")
        head_word = None
        parsed_q = nlp(question)
        for t in parsed_q:
            if t.dep_ == 'neg':
                head_word = t.head.text
                break
        if not head_word:
            continue
        modified_q = remove_negation(parsed_q)     

In [410]:
for i, q in enumerate(questions):
    if "william trent" in q.lower():
        print(i,q)

141607 When did British begin to build fort under William Trent?
141611 When didn't British begin to build fort under William Trent?
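
The William Trent pair suggests that a negated question can have a non-negated twin elsewhere in the dataset, so removing the negation can reproduce the text of a different, existing question. A minimal sketch for listing such pairs follows; `strip_negation_text` and `find_negation_pairs` are names introduced here, and the string-level stripping is a crude stand-in for the parser-based `remove_negation` (it does not restore "ca" to "can", for example).

```python
def strip_negation_text(question):
    """Crude string-level negation removal; an assumption made for this sketch."""
    return question.replace("n't", "").replace(" not ", " ")


def find_negation_pairs(questions):
    """Return (plain_index, negated_index) pairs whose texts differ only by a negation."""
    # Map normalized question text to the first index where it appears.
    index = {}
    for i, q in enumerate(questions):
        index.setdefault(q.strip().lower(), i)

    pairs = []
    for i, q in enumerate(questions):
        stripped = strip_negation_text(q)
        if stripped == q:
            continue  # no negation marker found
        j = index.get(stripped.strip().lower())
        if j is not None:
            pairs.append((j, i))
    return pairs


# Toy list: the first two questions are the pair printed above.
toy_questions = [
    "When did British begin to build fort under William Trent?",
    "When didn't British begin to build fort under William Trent?",
    "who couldn't win the presidency?",
]
print(find_negation_pairs(toy_questions))
```

Run over the full `questions` list, a check like this would show how many rewritten questions collide with an existing question, which is worth knowing before treating the rewritten text as a new example.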
