# Analysis of cleaned data from CrowdFlower job #743229

2015-06-19 Tong Shu Li

Now that we have finally cleaned up the data, let's see if we can get anything usable out of it.

Before moving on, though, I would like to note that CrowdFlower's quiz mode failed to work properly this time. Based on the answers of the cheaters and the answers I gave to the test questions, I am forced to conclude that:

1. For CML:checkboxes, a judgement is considered correct as long as the worker selects at least one choice that matches the gold, regardless of how many wrong choices were also selected.
2. Working faster than the minimum time limit per page does not automatically eliminate workers from your job, but rather "flags" them.

Due to conclusion 1, we need to examine the responses by hand to determine their accuracy scores.

In [1]:
from __future__ import division
import pandas as pd

In [2]:
from src.filter_data import filter_data
from src.aggregate_votes import aggregate_votes

### Grab data:

In [3]:
settings = {
    "loc": "data/crowdflower",
    "fname": "cleaned_job_743229_full.csv",
    "data_subset": "normal",
    "min_accuracy": 0.7,
    "max_accuracy": 1.0
}

cleaned_data = filter_data(settings)

In [4]:
len(cleaned_data)

167

Since it seems the CrowdFlower quiz interface is counting a judgement as correct if even one of the correct choices was chosen, we will need to determine the accuracy score of each contributor manually using two grading criteria:

1. Exact match with the gold (must choose all of the correct choices and none of the incorrect choices) [extremely strict]
2. Loose match with the gold (must choose no incorrect choices, but may choose any number of the correct choices) [less strict]
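The two criteria, together with the behaviour the quiz appeared to have, can be sketched as set comparisons. This is only an illustration: the function names and the `choice_N` labels are hypothetical and are not part of the notebook's own code.

```python
def exact_match(user_choices, gold_choices):
    """Criterion 1: the selection is identical to the gold standard."""
    return set(user_choices) == set(gold_choices)


def loose_match(user_choices, gold_choices):
    """Criterion 2: every selected choice is in the gold (no wrong choices)."""
    return set(user_choices) <= set(gold_choices)


def any_overlap(user_choices, gold_choices):
    """What the quiz seemed to do: a single shared choice is enough."""
    return len(set(user_choices) & set(gold_choices)) > 0


# Hypothetical gold answer and three hypothetical worker responses.
gold = ["choice_1", "choice_3"]
responses = {
    "all correct": ["choice_1", "choice_3"],
    "subset": ["choice_1"],
    "one right, one wrong": ["choice_1", "choice_2"],
}

# For each response: (exact match, loose match, any overlap)
results = {}
for name, chosen in responses.items():
    results[name] = (
        exact_match(chosen, gold),
        loose_match(chosen, gold),
        any_overlap(chosen, gold),
    )
```

Only the last response separates the loose criterion from the overlap rule: it contains a wrong choice, so it fails both of our criteria even though the overlap rule would accept it.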

In [5]:
settings = {
    "loc": "data/crowdflower",
    "fname": "job_743229_full_results.csv",
    "data_subset": "gold",
    "min_accuracy": 0.0,
    "max_accuracy": 1.0
}

gold_data = filter_data(settings)

In [6]:
len(gold_data)

448

In [7]:
settings = {
    "loc": "data/crowdflower",
    "fname": "job_743229_full_results.csv",
    "data_subset": "all",
    "min_accuracy": 0.0,
    "max_accuracy": 1.0
}

all_data = filter_data(settings)

In [8]:
len(all_data)

714

---

### For each worker who passed the quiz, evaluate their accuracy score manually:

We can refine the accuracy score even further by giving partial credit.

Each worker judgement is given a real-valued score from 0 to 1:


1. In order for a judgement to be correct, it must not contain choices which are wrong. An incorrect judgement receives a score of 0.
2. If there are N choices which are correct according to the gold standard, then the judgement is given a score of M/N where M is the number of correct choices chosen.

Therefore a judgement is assigned a score of 1 if and only if the worker chose all of the correct answers and none of the incorrect answers. A score > 0 but < 1 represents a worker who chose some subset of the correct choices and none of the incorrect choices. A score of 0 represents a worker who chose a non-zero number of incorrect choices, and any number of correct choices.
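As a worked example of the M/N rule for a single judgement, here is a minimal sketch. The function name `judgement_score` and the single-letter choice labels are made up for illustration.

```python
def judgement_score(user_choices, gold_choices):
    """Score one judgement: 0 if any wrong choice was selected, else M / N."""
    user = set(user_choices)
    gold = set(gold_choices)

    # Any selected choice outside the gold makes the whole judgement wrong.
    if user - gold:
        return 0.0

    # M = number of correct choices selected, N = number of correct choices.
    return float(len(user)) / len(gold)


full_credit = judgement_score(["a", "b"], ["a", "b"])     # M = 2, N = 2
partial_credit = judgement_score(["a"], ["a", "b", "c"])  # M = 1, N = 3
no_credit = judgement_score(["a", "d"], ["a", "b"])       # "d" is wrong
```

A worker's overall accuracy is then the mean of these per-judgement scores over the test questions they saw.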


In [9]:
def gold_responses(worker_id):
    # given a worker id, print the gold choices and the worker's choices
    # for every test question that worker saw
    
    res = gold_data.query("_worker_id == {0}".format(worker_id))
    for idx, response in res.iterrows():
        correct_choices = response["chemical_disease_relationships_gold"].split('\n')
        user_choices = response["chemical_disease_relationships"].split('\n')
        
        print "correct"
        print correct_choices
        print "user"
        print user_choices
        print

### Our new accuracy score algorithm removes even more people:

In [12]:
def accuracy_score(worker_id):
    # given a worker's responses to the test questions, returns the mean
    # partial-credit score: sum of per-question scores / num test questions seen
    
    res = gold_data.query("_worker_id == {0}".format(worker_id))
    if len(res) == 0:
        # no test questions seen, so there is nothing to average
        return 0.0
    
    total_score = 0.0
    for idx, response in res.iterrows():
        correct_choices = set(response["chemical_disease_relationships_gold"].split('\n'))
        user_choices = set(response["chemical_disease_relationships"].split('\n'))
        
        # less strict than exact match: any wrong choice scores 0; otherwise the
        # score is the fraction of the correct choices that were selected
        if user_choices <= correct_choices:
            total_score += len(user_choices) / len(correct_choices)
            
    return total_score / len(res)

### Results from new accuracy algorithm:

In [13]:
good_workers = dict()
for worker_id in all_data["_worker_id"].unique():
    toby_trust = accuracy_score(worker_id)
    
    res = gold_data.query("_worker_id == {0}".format(worker_id))
    work_done = len(cleaned_data.query("_worker_id == {0}".format(worker_id)))
    
    if toby_trust >= 0.7 and work_done > 0:
        print worker_id, len(res), work_done
        print "cf trust:", res["_trust"].iloc[0]
        print "toby trust:", toby_trust
        print
        
        good_workers[worker_id] = toby_trust


31001914 6 3
cf trust: 1.0
toby trust: 0.833333333333

31668998 7 7
cf trust: 0.8571
toby trust: 0.714285714286

28175348 6 4
cf trust: 1.0
toby trust: 0.916666666667

29825265 6 4
cf trust: 0.8333
toby trust: 0.833333333333



### Results from old accuracy algorithm:

In [11]:
good_workers = dict()
for worker_id in all_data["_worker_id"].unique():
    toby_trust = accuracy_score(worker_id)
    
    res = gold_data.query("_worker_id == {0}".format(worker_id))
    work_done = len(cleaned_data.query("_worker_id == {0}".format(worker_id)))
    
    if toby_trust >= 0.7 and work_done > 0:
        print worker_id, len(res), work_done
        print "cf trust:", res["_trust"].iloc[0]
        print "toby trust:", toby_trust
        print
        
        good_workers[worker_id] = toby_trust


11000920 10 20
cf trust: 0.8
toby trust: 0.7

16567674 8 12
cf trust: 0.75
toby trust: 0.75

31001914 6 3
cf trust: 1.0
toby trust: 0.833333333333

31668998 7 7
cf trust: 0.8571
toby trust: 0.714285714286

28175348 6 4
cf trust: 1.0
toby trust: 1.0

17839436 7 8
cf trust: 0.7143
toby trust: 0.714285714286

29825265 6 4
cf trust: 0.8333
toby trust: 0.833333333333



### There were 7 workers who maintained at least 70% accuracy under the old, less strict grading scheme and performed at least one unit of actual work:

In [12]:
good_workers

{11000920: 0.7,
 16567674: 0.75,
 17839436: 0.7142857142857143,
 28175348: 1.0,
 29825265: 0.8333333333333334,
 31001914: 0.8333333333333334,
 31668998: 0.7142857142857143}

Now we filter the data down to the work done by these 7 people, and see if there are any useful responses:

In [13]:
final_data = cleaned_data.query("_worker_id in {0}".format(good_workers.keys()))

In [14]:
len(final_data)

58

We have a total of 58 responses in the final, trustworthy data.
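The same kind of filter can also be written with `Series.isin`, which avoids formatting a list of ids into a query string. The sketch below uses a small made-up frame in place of `cleaned_data` and a made-up dict in place of `good_workers`; the ids and scores are not from the job.

```python
import pandas as pd

# Toy stand-ins for cleaned_data and good_workers (all values are made up).
cleaned = pd.DataFrame({
    "_worker_id": [101, 102, 103, 101],
    "uniq_id": ["bcv_id_1", "bcv_id_1", "bcv_id_2", "bcv_id_2"],
})
trusted = {101: 0.8, 103: 0.75}

# isin() accepts any list-like of ids; list(trusted) gives the dict's keys.
subset = cleaned[cleaned["_worker_id"].isin(list(trusted))]
```

Passing `list(trusted)` rather than `trusted.keys()` keeps the call working the same way regardless of whether `keys()` returns a list or a view.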

In [15]:
final_data.head()

Unnamed: 0,_unit_id,_created_at,_golden,_id,_missed,_started_at,_tainted,_channel,_trust,_worker_id,...,choice_2_ids,choice_2_label,choice_3_ids,choice_3_label,choice_4_ids,choice_4_label,form_abstract,form_title,pmid,uniq_id
10,739660089,6/19/2015 00:44:51,False,1665425696,,6/19/2015 00:39:14,False,clixsense,1.0,31001914,...,D002945_induces_D003643,"<span class=""chemical"">cisplatin</span> contri...",D002945_induces_D009503,"<span class=""chemical"">cisplatin</span> contri...",D002945_induces_D002289,"<span class=""chemical"">cisplatin</span> contri...","<p>BACKGROUND: <span class=""chemical"">Cisplati...","Paclitaxel, <span class=""chemical"">cisplatin</...",11135224,bcv_id_3
11,739660089,6/19/2015 01:09:45,False,1665448869,,6/19/2015 01:06:44,False,elite,0.8571,31668998,...,D002945_induces_D003643,"<span class=""chemical"">cisplatin</span> contri...",D002945_induces_D009503,"<span class=""chemical"">cisplatin</span> contri...",D002945_induces_D002289,"<span class=""chemical"">cisplatin</span> contri...","<p>BACKGROUND: <span class=""chemical"">Cisplati...","Paclitaxel, <span class=""chemical"">cisplatin</...",11135224,bcv_id_3
24,739660095,6/19/2015 00:44:51,False,1665425703,,6/19/2015 00:39:14,False,clixsense,1.0,31001914,...,D013874_induces_D010146,"<span class=""chemical"">Thiopentone</span> cont...",D013874_induces_D014474,"<span class=""chemical"">Thiopentone</span> cont...",D008012_induces_D010146,"<span class=""chemical"">lidocaine</span> contri...","This study investigated <span class=""chemical""...","<span class=""chemical"">Thiopentone</span> pret...",8595686,bcv_id_9
25,739660095,6/19/2015 01:09:45,False,1665448873,,6/19/2015 01:06:44,False,elite,0.8571,31668998,...,D013874_induces_D010146,"<span class=""chemical"">Thiopentone</span> cont...",D013874_induces_D014474,"<span class=""chemical"">Thiopentone</span> cont...",D008012_induces_D010146,"<span class=""chemical"">lidocaine</span> contri...","This study investigated <span class=""chemical""...","<span class=""chemical"">Thiopentone</span> pret...",8595686,bcv_id_9
28,739660097,6/19/2015 00:29:23,False,1665411021,,6/19/2015 00:27:14,False,neodev,0.8,11000920,...,D010862_induces_D028361,"<span class=""chemical"">pilocarpine</span> cont...",D010862_induces_D004827,"<span class=""chemical"">pilocarpine</span> cont...",D010862_induces_D004833,"<span class=""chemical"">pilocarpine</span> cont...","<span class=""disease"">Mitochondrial abnormalit...",Investigation of mitochondrial involvement in ...,16337777,bcv_id_11


In [16]:
len(final_data["pmid"].unique())

24

In [17]:
len(final_data["uniq_id"].unique())

31

### Replace the crowdflower trust scores with our own:

In [18]:
for idx, row in final_data.iterrows():
    worker_id = row["_worker_id"]
    final_data.loc[idx, "_trust"] = good_workers[worker_id]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
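The warning above appears because `final_data` was produced by filtering another DataFrame, so pandas cannot tell whether the assignment is meant to reach the original. One common way to avoid it, sketched here on made-up data rather than the real job results, is to take an explicit copy when filtering and then replace the column in one step with `Series.map`.

```python
import pandas as pd

# Toy stand-ins for cleaned_data and good_workers (all values are made up).
cleaned = pd.DataFrame({
    "_worker_id": [101, 102, 101],
    "_trust": [1.0, 0.9, 1.0],
})
trusted = {101: 0.8}

# .copy() makes the filtered frame independent of `cleaned`,
# so later assignments are unambiguous.
final = cleaned[cleaned["_worker_id"].isin(list(trusted))].copy()

# Series.map looks up each worker id in the dict, replacing the row-by-row loop.
final["_trust"] = final["_worker_id"].map(trusted)
```

The original frame is left untouched, and only the copy carries the recomputed trust scores.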


### Write to file:

In [19]:
final_data.to_csv("data/crowdflower/job_743229_final_data.csv", sep = ",", index = False)