In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [13]:
# Build the consensus answer table by joining the IAA tags file to the schema file
schema = pd.read_csv('Covid_Evidence2020_03_21-Schema.csv')
iaa = pd.read_csv('Covid_Evidencev1.IAA-2020-11-17T0612-Tags.csv')

answers = iaa.merge(schema, how="left", on="answer_uuid")
answers = answers[["article_sha256", "schema_sha256", "answer_uuid", "source_task_uuid",
                     "target_text", "question_uuid", "question_label", "answer_label",
                     "question_type", "answer_count"]]

# Replace missing values with empty strings so that an empty target_text can be matched with ''
answers = answers.fillna('')
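
Because the merge above is a left join, a tag row whose answer_uuid does not appear in the schema is kept, and its schema columns come back as NaN before being blanked to empty strings. The following is a minimal sketch, on made-up toy frames rather than the real CSV files, of how the `indicator=True` option of `merge` could surface such rows. The frame names, UUIDs, and labels are all hypothetical.

```python
import pandas as pd

# Toy stand-ins for the real files; every value here is hypothetical.
toy_iaa = pd.DataFrame({
    "answer_uuid": ["a1", "a2", "a3"],
    "source_task_uuid": ["t1", "t1", "t2"],
})
toy_schema = pd.DataFrame({
    "answer_uuid": ["a1", "a2"],
    "answer_label": ["T1.Q1.A1", "T1.Q1.A2"],
})

# indicator=True adds a "_merge" column; for a left join its values are
# "both" (matched in the schema) or "left_only" (no schema row found).
checked = toy_iaa.merge(toy_schema, how="left", on="answer_uuid", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "answer_uuid"].tolist()
print(unmatched)
```

An empty `unmatched` list would suggest that every tag found its schema row.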

This function takes **question_label**, **source_task_uuid/quiz_task_uuid**, and **target_text**, and returns a list of all the corresponding consensus answers. The list is empty if there is no consensus for that question, has exactly one element if the question is a "select one" question (including ordinal questions), and may have several elements if it is a "select all"/checkbox question.

If consensus answers exist, they will be in the form T1.Q_.A_

In [14]:
def get_consensus(question_label, quiz_task_uuid, target_text):
    answer_df = answers.loc[(answers["question_label"] == question_label)
                         & (answers["source_task_uuid"] == quiz_task_uuid)
                         & (answers["target_text"] == target_text)]
    
    return answer_df["answer_label"].tolist()

In [15]:
# Checklist example
get_consensus('T1.Q2',
              '5985e15b-66bd-41c5-9659-86fab8376b8d',
              '')

['T1.Q2.A2', 'T1.Q2.A3']

In [16]:
# "Select one" example
get_consensus('T1.Q9',
              'bb5c9329-144f-4d11-b54f-ca30121028a4',
              'For COVID-19, the R0 is estimated to be 3.28, though studies are still ongoing and this number will probably change. This means that for herd immunity, about 70% of the UK population would need to be immune to COVID-1')

['T1.Q9.A3']
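
If many lookups are needed, one possible alternative to filtering the table on every call is to group it once and keep the result in a dictionary. The sketch below assumes a toy table that reuses the notebook's column names. The task IDs, the target text, and the name `consensus_lookup` are hypothetical.

```python
import pandas as pd

# Toy consensus table; column names follow the notebook, values are made up.
toy_answers = pd.DataFrame({
    "question_label": ["T1.Q2", "T1.Q2", "T1.Q9"],
    "source_task_uuid": ["task-a", "task-a", "task-b"],
    "target_text": ["", "", "some highlighted text"],
    "answer_label": ["T1.Q2.A2", "T1.Q2.A3", "T1.Q9.A3"],
})

# Group once by the three lookup keys and collect the answer labels per group.
keys = ["question_label", "source_task_uuid", "target_text"]
consensus_lookup = toy_answers.groupby(keys)["answer_label"].apply(list).to_dict()

# A missing key means no consensus, so fall back to an empty list.
checklist = consensus_lookup.get(("T1.Q2", "task-a", ""), [])
select_one = consensus_lookup.get(("T1.Q9", "task-b", "some highlighted text"), [])
no_consensus = consensus_lookup.get(("T1.Q5", "task-a", ""), [])
print(checklist, select_one, no_consensus)
```

This relies on the key columns containing no NaN values, since `groupby` drops NaN keys by default. Blanking NaN to empty strings beforehand keeps those rows in the lookup.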

This function takes **question_label**, **source_task_uuid/quiz_task_uuid**, and **target_text**, and returns a **tuple containing the question type and the number of answer choices**, which will help with scoring questions.

If the returned question type and number of answer choices work as expected, we shouldn't need to hardcode the question schema data anymore.

Returned question type will be one of:  
RADIO  
CHECKBOX  
SELECT_SUBTOPIC  
TEXT  
DATE  
TIME  


In [21]:
def get_question_meta(question_label, quiz_task_uuid, target_text):
    answer_df = answers.loc[(answers["question_label"] == question_label)
                         & (answers["source_task_uuid"] == quiz_task_uuid)
                         & (answers["target_text"] == target_text)]
    
    return (answer_df["question_type"].iloc[0], answer_df["answer_count"].iloc[0])

In [22]:
get_question_meta('T1.Q2',
              '5985e15b-66bd-41c5-9659-86fab8376b8d',
              '')

('CHECKBOX', 9)

In [23]:
get_question_meta('T1.Q9',
              'bb5c9329-144f-4d11-b54f-ca30121028a4',
              'For COVID-19, the R0 is estimated to be 3.28, though studies are still ongoing and this number will probably change. This means that for herd immunity, about 70% of the UK population would need to be immune to COVID-1')

('RADIO', 3)
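
The metadata lookup above reads from the consensus table, so a question with no consensus rows would make `.iloc[0]` raise an `IndexError`. The following is a minimal sketch of a more defensive variant, assuming it is acceptable to return `None` in that case. The name `get_question_meta_safe`, the extra `df` parameter, and the toy rows are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Toy consensus table; column names follow the notebook, values are made up.
toy_answers = pd.DataFrame({
    "question_label": ["T1.Q2", "T1.Q2"],
    "source_task_uuid": ["task-a", "task-a"],
    "target_text": ["", ""],
    "question_type": ["CHECKBOX", "CHECKBOX"],
    "answer_count": [9, 9],
})

def get_question_meta_safe(df, question_label, quiz_task_uuid, target_text):
    """Return (question_type, answer_count), or None when no row matches."""
    rows = df.loc[(df["question_label"] == question_label)
                  & (df["source_task_uuid"] == quiz_task_uuid)
                  & (df["target_text"] == target_text)]
    if rows.empty:
        # No consensus rows for this question, so there is nothing to read.
        return None
    first = rows.iloc[0]
    return (first["question_type"], int(first["answer_count"]))

found = get_question_meta_safe(toy_answers, "T1.Q2", "task-a", "")
missing = get_question_meta_safe(toy_answers, "T1.Q9", "task-a", "")
print(found, missing)
```

Whether `None`, a raised exception, or a lookup against the schema file is the right behaviour for a question with no consensus depends on how the scoring code uses the result.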