## CommonsenseQA
CommonsenseQA is a multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answer. It contains 12,102 questions, each with one correct answer and four distractor answers. The training split loaded below holds 9,741 of them.

In [3]:
!wget https://s3.amazonaws.com/commensenseqa/train_rand_split.jsonl

--2021-06-24 15:53:38--  https://s3.amazonaws.com/commensenseqa/train_rand_split.jsonl
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.21.69
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.21.69|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3785890 (3.6M) [binary/octet-stream]
Saving to: ‘train_rand_split.jsonl.1’


2021-06-24 15:53:40 (2.45 MB/s) - ‘train_rand_split.jsonl.1’ saved [3785890/3785890]



In [4]:
import pandas as pd
import json

#### Data Prep from JSON file

In [5]:
with open('train_rand_split.jsonl', encoding="utf8") as f:
    data = f.readlines()
    data = [json.loads(line) for line in data] #convert string to dict format

In [6]:
len(data)

9741

In [7]:
ques = []
ans = []
ch_a = []
ch_b = []
ch_c = []
ch_d = []
ch_e = []

for i in range(len(data)):
  question = data[i]['question']
  ques.append(question['stem'])

  for j in range(5):
    choice = question['choices'][j]
    # the correct answer is the choice whose label matches answerKey
    if choice['label'] == data[i]['answerKey']:
      ans.append(choice['text'])

    # choices are stored in order A..E, so position j selects the column
    if j == 0: ch_a.append(choice['text'])
    if j == 1: ch_b.append(choice['text'])
    if j == 2: ch_c.append(choice['text'])
    if j == 3: ch_d.append(choice['text'])
    if j == 4: ch_e.append(choice['text'])
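
The extraction loop above can also be written as one function per record. The following is a minimal sketch, assuming every record has the `answerKey` and `question` fields used in this notebook; `flatten_record` and `sample` are hypothetical names introduced for illustration, and the sample record is a trimmed copy of one question from the data with only the fields the function reads.

```python
def flatten_record(record):
    """Turn one CommonsenseQA record into a flat row (hypothetical helper)."""
    choices = record['question']['choices']
    # Map each label ('A'..'E') to its answer text.
    text_by_label = {c['label']: c['text'] for c in choices}
    row = {
        'ques': record['question']['stem'],
        'ans': text_by_label[record['answerKey']],
    }
    # One column per choice: ch_a, ch_b, ...
    for label, text in text_by_label.items():
        row['ch_' + label.lower()] = text
    return row


# Illustrative record in the same shape as the lines of the JSONL file.
sample = {
    'answerKey': 'B',
    'question': {
        'stem': 'Sammy wanted to go to where the people were.  Where might he go?',
        'choices': [
            {'label': 'A', 'text': 'race track'},
            {'label': 'B', 'text': 'populated areas'},
            {'label': 'C', 'text': 'the desert'},
            {'label': 'D', 'text': 'apartment'},
            {'label': 'E', 'text': 'roadblock'},
        ],
    },
}

row = flatten_record(sample)
print(row['ans'])
```

Under the same assumption, `pd.DataFrame([flatten_record(r) for r in data])` should produce the `ques`, `ans`, and `ch_a` to `ch_e` columns without the seven intermediate lists.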

In [8]:
data[i]['question']

{'choices': [{'label': 'A', 'text': 'put in to the water'},
  {'label': 'B', 'text': 'cause fire'},
  {'label': 'C', 'text': 'produce heat'},
  {'label': 'D', 'text': 'short fuse'},
  {'label': 'E', 'text': 'shock'}],
 'question_concept': 'electricity',
 'stem': "I forgot to pay the electricity bill, now what can't I do with my ground pump?"}

In [9]:
len(ans) == len(ques)

True

In [10]:
data[i]['question']['choices']

[{'label': 'A', 'text': 'put in to the water'},
 {'label': 'B', 'text': 'cause fire'},
 {'label': 'C', 'text': 'produce heat'},
 {'label': 'D', 'text': 'short fuse'},
 {'label': 'E', 'text': 'shock'}]

In [11]:
df = pd.DataFrame({'ques': ques, 'ans': ans, 'ch_a': ch_a, 'ch_b': ch_b, 'ch_c': ch_c, 'ch_d': ch_d, 'ch_e': ch_e}).reset_index()
df

| | index | ques | ans | ch_a | ch_b | ch_c | ch_d | ch_e |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | The sanctions against the school were a punish... | ignore | ignore | enforce | authoritarian | yell at | avoid |
| 1 | 1 | Sammy wanted to go to where the people were. ... | populated areas | race track | populated areas | the desert | apartment | roadblock |
| 2 | 2 | To locate a choker not located in a jewelry bo... | jewelry store | jewelry store | neck | jewlery box | jewelry box | boutique |
| 3 | 3 | Google Maps and other highway and street GPS s... | atlas | united states | mexico | countryside | atlas | oceans |
| 4 | 4 | The fox walked from the city into the forest, ... | natural habitat | pretty flowers. | hen house | natural habitat | storybook | dense forest |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9736 | 9736 | What would someone need to do if he or she wan... | telling all | consequences | being ridiculed | more money | more funding | telling all |
| 9737 | 9737 | Where might you find a chair at an office? | cubicle | stadium | kitchen | porch | cubicle | living room |
| 9738 | 9738 | Where would you buy jeans in a place with a la... | shopping mall | shopping mall | laundromat | hospital | clothing store | thrift store |
| 9739 | 9739 | John fell down the well. he couldn't believe ... | fairytale | fairytale | farm yard | farm country | michigan | horror movie |


In [12]:
print('Ques: ', df.ques[1])
print('Ans : ', df.ans[1])
print('Ch A: ', df.ch_a[1])
print('Ch B: ', df.ch_b[1])
print('Ch C: ', df.ch_c[1])
print('Ch D: ', df.ch_d[1])
print('Ch E: ', df.ch_e[1])

Ques:  Sammy wanted to go to where the people were.  Where might he go?
Ans :  populated areas
Ch A:  race track
Ch B:  populated areas
Ch C:  the desert
Ch D:  apartment
Ch E:  roadblock


In [29]:
!python -m spacy download en_core_web_lg

✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_lg')


In [16]:
import spacy

nlp = spacy.load('en_core_web_lg')  # make sure to use larger package!
doc1 = nlp(df.ques[1])
doc2 = nlp(df.ch_a[1])
doc3 = nlp(df.ch_b[1])
doc4 = nlp(df.ch_c[1])
doc5 = nlp(df.ch_d[1])
doc6 = nlp(df.ch_e[1])



# Similarity of two documents
print(doc1, "<->", doc2, doc1.similarity(doc2))
print(doc1, "<->", doc3, doc1.similarity(doc3))
print(doc1, "<->", doc4, doc1.similarity(doc4))
print(doc1, "<->", doc5, doc1.similarity(doc5))
print(doc1, "<->", doc6, doc1.similarity(doc6))

Sammy wanted to go to where the people were.  Where might he go? <-> race track 0.5016555105213444
Sammy wanted to go to where the people were.  Where might he go? <-> populated areas 0.4213782561167121
Sammy wanted to go to where the people were.  Where might he go? <-> the desert 0.6109081264891498
Sammy wanted to go to where the people were.  Where might he go? <-> apartment 0.3583721811057241
Sammy wanted to go to where the people were.  Where might he go? <-> roadblock 0.15306034075432243
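
The scores above come from `Doc.similarity`, which by default is the cosine similarity between the two documents' vectors, where a document's vector is the average of its token vectors. The sketch below reproduces that computation in plain Python. The `toy_vectors` table and its 3-dimensional values are made up for illustration (the vectors in `en_core_web_lg` have 300 dimensions), so the printed numbers say nothing about the real model.

```python
import math

# Hypothetical 3-d word vectors; a real model ships learned vectors.
toy_vectors = {
    'people': [1.0, 0.0, 0.0],
    'go': [0.0, 1.0, 0.0],
    'populated': [0.8, 0.2, 0.0],
    'areas': [0.6, 0.0, 0.4],
    'the': [0.1, 0.9, 0.1],
    'desert': [0.0, 0.1, 0.9],
}


def doc_vector(tokens):
    """Average the vectors of the given tokens, dimension by dimension."""
    dims = len(next(iter(toy_vectors.values())))
    return [sum(toy_vectors[t][d] for t in tokens) / len(tokens)
            for d in range(dims)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


question = doc_vector(['people', 'go'])
for choice in (['populated', 'areas'], ['the', 'desert']):
    print(choice, round(cosine(question, doc_vector(choice)), 3))
```

Because every token contributes equally to the average, function words such as "the" can move the score as much as content words do. That may be one reason a choice like "the desert" outscores "populated areas" in the output above.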


In [52]:
import random

## Answering the Questions using Doc (sequence of Tokens) similarity
1. Randomly select a question.
2. Compute the similarity between the question and each of its five answer choices.
3. Predict the choice with the highest similarity score.
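
Steps 2 and 3 can be sketched as a small helper that does not depend on spaCy. This is an illustration only: `predict` and `word_overlap` are hypothetical names, `word_overlap` is a crude stand-in for `Doc.similarity` (Jaccard overlap of word sets, not word vectors), and the question and choices are made up.

```python
def predict(question, choices, similarity):
    """Return the choice with the highest similarity to the question."""
    scores = [similarity(question, c) for c in choices]
    return choices[scores.index(max(scores))]


def word_overlap(a, b):
    """Stand-in similarity: Jaccard overlap of lowercase word sets."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


# Made-up example, not taken from the dataset.
question = 'What do you use to open a locked door?'
choices = ['a key to the door', 'banana', 'cloud']
best = predict(question, choices, word_overlap)
print(best)
```

Passing a different `similarity` function swaps the scoring method without touching the selection logic. With spaCy, one option would be to pass `Doc` objects and `lambda a, b: a.similarity(b)`.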

In [55]:
for _ in range(10):
  i = random.randrange(len(df))  # pick any row, including the first and last
  pred = []
  print('*'*30)
  print('Ques : ', df.ques[i])
  ques_doc = nlp(df.ques[i])  # separate name so the `ques` list is not overwritten
  for j in range(5):
    # columns 3..7 of df hold the five choices ch_a..ch_e
    choice = nlp(df.iloc[i,3+j])
    pred.append(ques_doc.similarity(choice))

  m = max(pred)
  #print(pred)
  #print(pred.index(m))
  print('Pred : ', df.iloc[i,3+pred.index(m)])
  print('Actual : ',df.ans[i])



******************************
Ques :  If something's weight isn't light, what is it?
Pred :  heavy
Actual :  heavy
******************************
Ques :  The knowledge was expensive to get, where was it being gained?
Pred :  book
Actual :  university
******************************
Ques :  The baker got a new delivery of sugar, he opened the bag and poured it in the storage what?
Pred :  neighbor's house
Actual :  container
******************************
Ques :  After loading up his plate with all the food, what did he look for to wash it all down?
Pred :  falling down
Actual :  cup
******************************
Ques :  Sitting down in a cushy chair will lead to what?
Pred :  sitting
Actual :  comfort
******************************
Ques :  The child dreaded fall and having to what?
Pred :  go to school
Actual :  go to school
******************************
Ques :  Where might a chess set outside get frozen?
Pred :  toy store
Actual :  michigan
******************************
Ques :  Whe